In the book "Mastering Concurrency in Python", chapter 6 ("Working with Processes in Python"), section "Message passing between several workers", example 7, there is an implementation of a task queue. Here is its code: https://github.com/PacktPublishing/Mastering-Concurrency-in-Python/blob/master/Chapter06/example7.py
The author states the problem with this example:
Everything seems to be working, but if we look closely at the messages our processes have printed out, we will notice that most of the tasks were executed by either Consumer-2 or Consumer-3, and that Consumer-4 executed only one task while Consumer-1 failed to execute any. What happened here?
I tried to understand the explanation of the problem that the author gives, and it looks like it is wrong:
Essentially, when one of our consumers—let's say Consumer-3—finished executing a task,
it tried to look for another task to execute immediately after. Most of the time, it would get priority over other consumers, since it was already being run by the main program. So
while Consumer-2 and Consumer-3 were constantly finishing their tasks' executions and
picking up other tasks to execute, Consumer-4 was only able to "squeeze" itself in once,
and Consumer-1 failed to do this altogether.
To address this issue, a technique has been developed, to stop consumers from immediately taking the next item from the task queue, called poison pill. The idea is that, after setting up the real tasks in the task queue, we also add in dummy tasks that contain "stop" values
and that will have the current consumer hold and allow other consumers to get the next
item in the task queue first; hence the name "poison pill."
The problem seems different: since the consumers are started before the queue is filled with tasks, most of them find the queue empty and exit without processing anything. Adding poison pills helps because it eliminates the race between the empty() and get() calls, not because it lowers the priority of any consumer. The poison pills only come into play once the queue has no real tasks left, which is why they cannot influence which consumer takes a task.
Moreover, there seems to be a bug in this example: if another consumer steals a task from the queue between our consumer's calls to empty() and get(), then our consumer will block indefinitely on get(), which actually happens on my laptop.
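To make the race concrete, here is a minimal sketch of the two consumer loops (not the book's exact code; task_queue is the shared queue, and process() is a placeholder for the real task execution):

    # Unsafe pattern: another consumer can drain the queue between the
    # empty() check and the get(), leaving this get() blocked forever.
    while not task_queue.empty():
        task = task_queue.get()
        process(task)  # placeholder for the real task handling

    # Poison-pill pattern: block on get() alone and exit on a sentinel,
    # so there is no window between "check" and "take" at all.
    while True:
        task = task_queue.get()
        if task is None:  # the poison pill
            break
        process(task)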
Can someone please validate this?
Related
We have started one celery worker reading from Rabbitmq (one queue):
celery -A tasks worker -c 1 (one process)
We send to RabbitMq 2 chains (3 tasks in each chain):
chain(*tasks_to_chain1).apply_async() (let's call it C1 and its tasks C1.1, C1.2, C1.3)
chain(*tasks_to_chain2).apply_async() (let's call it C2 and its tasks C2.1, C2.2, C2.3)
We expected the tasks to be run in this order: C1.1, C1.2, C1.3, C2.1, C2.2, C2.3.
However we are seeing this instead: C1.1, C2.1, C1.2, C2.2, C1.3, C2.3.
We don't get why. Can someone shed some light on what's happening?
Many thanks,
Edit: more generally speaking, we observe that chain 2 starts before chain 1 ends.
Without access to the full code it is not possible to test, but a plausible explanation is that your asynchronous sender just happens to do that. I would also assume the order is not deterministic: you will likely get somewhat different results if you keep repeating this.
When you execute apply_async(), an asynchronous task is created that will (probably, not sure without seeing code) start submitting those tasks to the queue, but as the call is not blocking, your main program immediately proceeds to the second apply_async() that creates another background task to submit things to the queue.
These two background tasks will run in the background handled by a scheduler. What you now see in your output is that each task submits one item to the queue but then passes control to the other task, which again submits one and then hands back control.
If you do not want this to happen asynchronously, use apply instead of apply_async. This is a blocking call, and your main program's execution does not proceed until the first three tasks have been submitted. With asynchronous calls you can never be sure of the exact order of execution between tasks: you know C1.1 will happen before C1.2, but you cannot guarantee how the C1 and C2 tasks are interleaved.
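For reference, a minimal sketch of the two submission styles contrasted above (tasks_to_chain1 is the list from the question):

    from celery import chain

    # Non-blocking: hands the chain to the broker and returns immediately,
    # so a second apply_async() call can interleave its tasks with this one.
    chain(*tasks_to_chain1).apply_async()

    # Blocking: apply() executes the chain eagerly in the calling process
    # and does not return until all of its tasks have finished.
    chain(*tasks_to_chain1).apply()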
I'm working on a new monitoring system that can measure Celery queue throughput and help alert the team when the queue is getting backed up. Over the course of my work, I've come across some peculiar behaviors that I don't understand (and are not well documented in the Celery specs).
For testing purposes, I've set up an endpoint that will populate the queue with 16 long-running tasks that can be used to simulate a backed-up queue. The framework is Flask and the queue broker is Redis. Celery is configured for each worker to work on up to 4 tasks in parallel, and I have 2 workers running.
api/health.py
from flask import Blueprint, make_response

from jobs import sleepy_job

def health():
    health = Blueprint("health", __name__)

    @health.route("/api/debug/create-long-queue", methods=["GET"])
    def long_queue():
        for i in range(16):
            sleepy_job.delay()
        return make_response({}, 200)

    return health
jobs.py
import time

@celery.task(priority=HIGH_PRIORITY)  # `celery` app and HIGH_PRIORITY are defined elsewhere in the project
def sleepy_job(*args, **kwargs):
    time.sleep(30)
Here's what I do to simulate a backed-up production queue:
I call /api/debug/create-long-queue to simulate a back-up in my queue. Based on the math above, the workers should be kept busy sleeping for about 1 minute (together they can handle 8 tasks concurrently, each task just sleeps for 30 seconds, and there are 16 tasks in total).
I make another API call shortly after (< 5 s), which kicks off a different job with real business logic (processing of an inbound webhook API call). We'll call this job handle_incoming_message.
Here's what I see when using Flower to inspect the queue:
While all workers are blocked by the first 8 sleepy_job tasks, I see no sign of the new handle_incoming_message on the queue, even though I am certain handle_incoming_message.delay() has been called as a result of the 2nd API call.
After the first 8 sleepy_job tasks have been completed (~30 s), I see the new handle_incoming_message on the queue with state RECEIVED.
After the second (and final) 8 sleepy_job tasks have been completed, I now see handle_incoming_message has state STARTED (and I can confirm this as the UI updates with the new data that was received and processed in that task.)
Questions
So it seems clear that when the workers are momentarily unblocked after handling the first 8 sleepy_job tasks, they are doing something to mark/acknowledge the new handle_incoming_message task in a way that is visible to flower. But this leaves several unanswered questions:
What is the state of the new handle_incoming_message task when the workers are blocked?
What changes after workers are unblocked that makes it so flower now has visibility into the new handle_incoming_message task?
What does the "RECEIVED" state actually mean?
(Bonus: How can I get visibility into tasks that are queued while workers are blocked?)
When all workers are blocked, SOME tasks can be in the RECEIVED state because of prefetching (look in the documentation for that). So chances are very high that your tasks are simply in the queue, waiting to be received by the Celery workers (the coordinating processes; these are not the actual worker processes).
Flower is a simple service built upon a Celery feature called "task events". In simple terms, it (Flower) subscribes itself as a receiver of all events (received, succeeded, started, failed, etc.) and then visually represents them to the web clients. More about it here. So when a task gets received by a Celery worker, a "task-received" event is sent; Flower fetches this event and changes the state of that task in the dashboard.
When a task is "received", it means that a particular Celery worker took that task off the queue; it may be executed immediately (if there is a free worker process to execute it), or the Celery worker will wait for a worker process to become ready to run it. I have already mentioned prefetching: Celery workers will often take more tasks than there are available worker processes.
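If prefetching is the cause, it can be reined in with two standard Celery settings (a sketch; `celery` here is the application instance from the question's code):

    # Each worker process reserves only one task at a time instead of a batch.
    celery.conf.worker_prefetch_multiplier = 1
    # Acknowledge a task only after it has finished, not when it is received.
    celery.conf.task_acks_late = True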
Celery does not give users a way to list what is in a particular queue. That is why you will see many similar questions, including this one, which offers answers; you will see my short answer there among others. In short, it depends on your broker of choice. If it is Redis, you simply go through the list of objects. If it is RabbitMQ, you can use their tools to inspect queues. I think the decision not to provide this is a good one, as this information is never reliable: by the time you list all the tasks in a particular queue, there may be thousands of new ones...
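For example, with the Redis broker, the waiting (not yet prefetched) messages live in a Redis list named after the queue ("celery" by default), so "going through the list of objects" looks roughly like this:

    import redis

    r = redis.Redis()                      # assumes the broker's default host/port
    print(r.llen("celery"))                # number of messages still waiting in the queue
    for raw in r.lrange("celery", 0, 9):   # peek at the first ten raw task messages
        print(raw)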
What is the difference between a Queue and a JoinableQueue in multiprocessing in Python? This question has already been asked here, but as some comments point out, the accepted answer is not helpful because all it does is quote the documentation. Could someone explain the difference in terms of when to use one versus the other? For example, why would one choose Queue over JoinableQueue, given that JoinableQueue is pretty much the same thing except for offering the two extra methods join() and task_done()? Additionally, the other answer in the post I linked to mentions that "based on the documentation, it's hard to be sure that Queue is actually empty", which again raises the question: why would I want to use a Queue over a JoinableQueue? What advantages does it offer?
multiprocessing patterns its queues off of queue.Queue. In that model, Queue keeps a "task count" of everything put on the queue. There are generally two ways to use this queue. Producers could just put things on the queue and ignore what happens to them in the long run. The producer may wait from time to time if the queue is full, but doesn't care if any of the things put on the queue are actually processed by the consumer. In this case the queue's task count grows, but who cares?
Alternately, the producer can "join" the queue. That means that it waits until the last task on the queue has been processed and the task count has gone to zero. But to do this, the producer needs the consumer's help. A consumer gets an item from the queue, but that doesn't decrease the task count. The consumer has to actively call task_done (typically when the task is done...) and the join will wait until every put has a task_done.
Fast forward to multiprocessing. The task_done mechanism requires communication between processes, which is relatively expensive. If you are the first type of producer (call it type A), one that doesn't play the join game, use a multiprocessing.Queue and save a bit of CPU time. If you are the second type (type B), use multiprocessing.JoinableQueue. But remember that the consumer also has to play the task_done game, or the producer will hang.
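A minimal sketch of the type-B arrangement, where the producer joins and the consumer plays the task_done game (a None sentinel stops the worker):

    import multiprocessing as mp

    def consumer(q):
        while True:
            item = q.get()
            if item is None:        # sentinel: no more work is coming
                q.task_done()
                break
            print("processing", item)
            q.task_done()           # without this, join() below would hang forever

    if __name__ == "__main__":
        q = mp.JoinableQueue()
        p = mp.Process(target=consumer, args=(q,))
        p.start()
        for i in range(5):
            q.put(i)
        q.put(None)
        q.join()                    # returns only when every put() has a matching task_done()
        p.join()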
I am fairly new to Python programming, and threads aren't my area of expertise. I have a problem with which I hope people here can help me out.
Task: as part of my master's thesis, I need to make a mixed-reality game with multiplayer capability. In my game design, each player can set a bunch of traps, each of which is active for a specific time period, e.g. 30 secs. In order to maintain a consistent game state across all the players, all the time checks need to be done on the server side, which is implemented in Python.
I decided to start a Python thread every time a new trap is laid by a player and run a timer on that thread. All this part is fine, but the real problem arises when I need to notify the main thread that the time is up for this particular trap, so that I can communicate the same to the client (an Android device).
I tried creating a queue and inserting information into it when the task is done, but I can't do a queue.join(), since that would put the main thread on hold till the task is done, which is neither what I need nor ideal in my case: the main thread is constantly communicating with the clients, and if it were halted, all communication with the players would come to a standstill.
I need the secondary thread, which is running a timer, to tell the main thread as soon as the time runs out, and to send the ID of the trap so that I can pass this information to the Android client to remove it. How can I achieve this?
Any other suggestions on how this task can be achieved without starting a gazillion threads are also welcome. :) :)
Thanks in advance for the help.
Cheers
I have finally found a nice little task scheduler written in Python which is quite light and quite handy for scheduling events for a later time or date, with a callback mechanism that allows the child thread to pass a value back to the main thread, notifying it of its status and whether the job was done successfully or not.
People who need functionality similar to the one in the question, and don't want to haggle with threads, can use this scheduler to schedule their events and get a callback when an event is done.
Here is the link to APScheduler.
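For completeness, a sketch of the idea with the APScheduler 3.x API (trap_expired and notify_client are hypothetical stand-ins for the game-server logic):

    from datetime import datetime, timedelta
    from apscheduler.schedulers.background import BackgroundScheduler

    def trap_expired(trap_id):
        notify_client(trap_id)  # hypothetical: push the removal to the Android client

    scheduler = BackgroundScheduler()
    scheduler.start()

    # one-shot job that fires 30 seconds from now, without blocking the main thread
    scheduler.add_job(trap_expired, "date",
                      run_date=datetime.now() + timedelta(seconds=30),
                      args=["trap-42"])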
It may be easier to have the timers all handled in the main thread: keep a list of timers that you keep appending new ones to. Each timer doesn't actually do anything; it just records the time when it goes off (which is easier if you work in arbitrary 'rounds' than in real time, but still doable). Each interval, the main loop should check all of them and see whether it is time (or past time) for them to expire; if it is, remove them from the list (of course, be careful about removing items from a list you're iterating over; it mightn't do what you expect).
If you have a lot of timers, and profiling shows that running through all of them every interval is costing you too much time, a simple optimisation is to keep them in a heapq. This keeps them sorted for you, so once you reach the first timer that hasn't expired yet, you know none of the rest have either. Something like:
while q:
    timer = heapq.heappop(q)
    if timer.expiry <= currenttime:
        fire(timer)  # trigger events (fire() is a placeholder for your handler)
    else:
        # not expired yet; put it back and stop scanning
        heapq.heappush(q, timer)
        break
This still costs you one unnecessary pop/push pair, but it's hard to see how you could do better. Again, doing something like:
for timer in q:
    if timer.expiry <= currenttime:
        heapq.heappop(q)  # remove the expired timer
        # trigger events
    else:
        break
can have subtle bugs, because list iterators work by keeping track of what index they are up to (for some reason the heapq functions operate on plain sequences through side effects, rather than there being a full-fledged heapq class). So if you remove the current element, you shift everything after it one index to the left and end up skipping the next one.
The only important thing is that currenttime is consistently updated each interval in the main loop (or, if your heart is set on real time, based on the system clock), and that timer.expiry is measured in the same units. If you have a concept of 'rounds', and a trap lasts six rounds, then when it is placed you would do heapq.heappush(q, Timer(expiry=currenttime + 6)).
If you do want to do it the multithreaded way, your producer/consumer queue for cleanup will work; you just need to not use Queue.join(). Instead, as the timer in a thread runs out, it calls q.put() and then dies. The main loop would use q.get(False), which avoids blocking, or else q.get(True, 0.1), which blocks for at most 0.1 seconds. The timeout can be any positive number; tune it carefully for the best tradeoff between blocking long enough that clients notice and having events go off late because they only just missed being in the queue in time.
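A sketch of that main-loop polling step (queue.Empty is the exception get() raises when nothing is available; notify_client is a hypothetical stand-in):

    import queue

    def poll_expired_traps(q):
        while True:
            try:
                trap_id = q.get(False)  # non-blocking; use q.get(True, 0.1) to wait briefly
            except queue.Empty:
                break                   # nothing has expired since the last interval
            notify_client(trap_id)      # hypothetical: tell the Android client to remove the trap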
The main thread creates a queue and a bunch of worker threads that pull tasks from the queue. As long as the queue is empty, all worker threads block and do nothing. When a task is put into the queue, a random worker thread acquires it, does its job, and goes back to waiting as soon as it is done. That way you can reuse a thread over and over again without creating new worker threads.

When you need to stop the threads, you put a kill object into the queue that tells the thread to shut down instead of blocking on the queue.
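A minimal sketch of that pattern with threading and queue (a None sentinel plays the role of the kill object):

    import queue
    import threading

    def worker(q):
        while True:
            task = q.get()     # blocks while the queue is empty
            if task is None:   # the kill object: shut this thread down
                break
            task()             # do the job, then loop back for the next one

    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
    for t in threads:
        t.start()

    q.put(lambda: print("task ran"))  # submit work as callables
    for _ in threads:
        q.put(None)                   # one kill object per worker thread
    for t in threads:
        t.join()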
The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the submissions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses on channel [B], and the consumer waits for requests on channel [A] and sends responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so only one task is executed on each consumer at a time.
What I need in the end:
the producer submits its tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests to each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, updates the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of their tasks will be queued after the previous user's (I guess this is automatically true from the queue system, but I haven't thought about the implications in a threaded environment)
Tasks have an order to be submitted, but the order they are replied is not important
UPDATE
I have studied a bit further, and my actual problem seems to be that I use a simple function as the callback to pika's SelectConnection.channel.basic_consume() function. My last (still unimplemented) idea is to pass a thread-spawning function instead of a regular one, so the callback would not block and the consumer could keep listening.
As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO), you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one piece of Python code is ever running in a single Python process. This means that if you are doing lots of computation in Python code, these solutions will likely not be much faster than what you already have.
Another option is to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful for parallel work. You could implement this either by using a Queue, with the parent process being the consumer and farming out work to its children, or by simply starting multiple processes that each consume on their own. I would suggest, unless your application is highly concurrent (thousands of workers), simply starting multiple workers, each of which consumes from its own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than the request simply being lost.
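A sketch of the "multiple workers, each on its own connection" option, using the pika 1.x BlockingConnection API (queue name "tasks" and handle_task are assumptions, not names from the question):

    import multiprocessing
    import pika

    def consume():
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        ch = conn.channel()
        ch.queue_declare(queue="tasks", durable=True)
        ch.basic_qos(prefetch_count=1)  # hand this process one message at a time

        def on_message(channel, method, properties, body):
            handle_task(body)  # hypothetical: the real work
            channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only once done

        ch.basic_consume(queue="tasks", on_message_callback=on_message)
        ch.start_consuming()

    if __name__ == "__main__":
        for _ in range(4):  # one consumer process per concurrent task slot
            multiprocessing.Process(target=consume).start()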
A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery for several large projects and have found it to be very well written. It will also handle the multiple consumer issues for you with the appropriate configuration.
Your setup sounds good to me. And you are right: you can simply set the callback to start a thread, and chain that to a separate callback, invoked when the thread finishes, that queues the response back over channel B.
Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via channel A, it should be stored in the queue shared between Pika's main thread and the worker threads in the thread pool. As soon as it is queued, Pika should respond with an ACK, and a worker thread will wake up and start processing.
Once the worker is done with its work, it queues the result on a separate result queue and issues a callback to the main thread to send it back to the consumer.
You should take care to make sure that the worker threads are not interfering with each other if they use any shared resources, but that's a separate topic.
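A rough sketch of that hand-off, using pika's add_callback_threadsafe on a BlockingConnection (with SelectConnection the equivalent lives on the ioloop); compute and publish_result are hypothetical stand-ins for the business logic and the channel-B publish:

    import functools

    def worker(connection, channel, body):
        # runs in a pool thread; must not touch the channel directly
        result = compute(body)  # hypothetical: the real business logic
        # schedule the reply on the connection's own (main) thread:
        connection.add_callback_threadsafe(
            functools.partial(publish_result, channel, result))  # hypothetical publisher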
Being inexperienced in threading, my setup would run multiple consumer processes (the number basically being your prefetch count). Each would connect to the two queues and process jobs happily, unknowing of each other's existence.