Massive Technical Interviews Tips: How to Design Distributed Priority Queue

Saturday, March 12, 2016

How to Design Distributed Priority Queue

http://en.clouddesignpattern.org/index.php/CDP:Priority_Queue_Pattern

There are cases where a large number of batch jobs may need processing, and where the the jobs may need to be re-prioritized.

A queue is used in controlling batch jobs. The queue need only be provided with priority numbers. Job requests are controlled by the queue, and the job requests in the queue are processed by a batch server. In Cloud computing, a highly reliable queue is provided as a service, which you can use to structure a highly reliable batch system with ease. You may prepare multiple queues depending on priority levels, with job requests put into the queues depending on their priority levels, to apply prioritization to batch processes.

You can increase or decrease the number of servers for processing jobs to change automatically the processing speeds of the priority queues and secondary queues.
You can handle performance and service requirements through merely increasing or decreasing the number of EC2 instances used in job processing.
Even if an EC2 were to fail, the messages (jobs) would remain in the queue service, enabling processing to be continued immediately upon recovery of the EC2 instance, producing a system that is robust to failure.

Use SQS to prepare multiple queues for the individual priority levels.
Place those processes to be executed immediately (job requests) in the high priority queue.
Prepare numbers of batch servers, for processing the job requests of the queues, depending on the priority levels.
Queues have a message "Delayed Send" function. You can use this to delay the time for starting a process.

https://www.quora.com/Distributed-Systems/How-to-implement-a-high-availability-priority-Queue-that-can-easily-scale
create one SQS queue per priority level (assuming a fixed set of priorities). At that point you have a number of options of how to pop off of these queues. The algorithm that I settled on is as follows:

I issue N pop requests against the highest priority queue then M requests against the next highest queue (where M < N) and so on such that each queue gets fewer pop requests as the priority decreases. Since N can be quite large if at any point that particular queue is drained I short circuit the loop and go to the next lowest priority queue.

http://curator.apache.org/curator-recipes/distributed-priority-queue.html

queue.put(aMessage, priority);

https://cwiki.apache.org/confluence/display/CURATOR/TN4

ZooKeeper makes a very bad Queue source.

The ZooKeeper recipes page lists Queues as a possible use-case for ZooKeeper. Curator includes several Queue recipes. In our experience, however, it is a bad idea to use ZooKeeper as a Queue:

ZooKeeper has a 1MB transport limitation. In practice this means that ZNodes must be relatively small. Typically, queues can contain many thousands of messages.

ZooKeeper can slow down considerably on startup if there are many large ZNodes. This will be common if you are using ZooKeeper for queues. You will need to significantly increase initLimit and syncLimit.

If a ZNode gets too big it can be extremely difficult to clean. getChildren() will fail on the node. At Netflix we had to create a special-purpose program that had a huge value for jute.maxbuffer in order to get the nodes and delete them.

ZooKeeper can start to perform badly if there are many nodes with thousands of children.

The ZooKeeper database is kept entirely in memory. So, you can never have more messages than can fit in memory.

https://www.quora.com/Whats-a-good-distributed-queue

1. 如何设计一个priorityqueue service，client可以submit job request然后server按照priority执行

https://www.quora.com/Distributed-Systems-How-to-implement-a-high-availability-priority-Queue-that-can-easily-scale

Build something with Redis, which supports list operations. If you want to write your own List/Queue as API, go for it. Or if you want to use SQS, use it.

Do the replication yourself if you are unsure about your Queue unavailability.

Shard it yourself based on time, task queue name or priority. Let's go with the priority approach.

Step 2:

Here is where you can work out a variety of logical ways to dequeue:

Assign weights to each queue. At every dequeue step, add weights to each respective queue per unit time, and subtract one for each task you dequeue. To avoid starvation, use a round robin for each time you dequeue, and you shift to the next.

Return a set of tasks to each worker.

Step 3:

High availability priority queue that can scale

The trick I'm currently using is to create one SQS queue per priority level (assuming a fixed set of priorities). At that point you have a number of options of how to pop off of these queues. The algorithm that I settled on is as follows:

I issue N pop requests against the highest priority queue then M requests against the next highest queue (where M < N) and so on such that each queue gets fewer pop requests as the priority decreases. Since N can be quite large if at any point that particular queue is drained I short circuit the loop and go to the next lowest priority queue.

https://github.com/Netflix/curator/wiki/Distributed-Priority-Queue

Saturday, March 12, 2016

How to Design Distributed Priority Queue

Labels

Popular Posts