We have started one Celery worker reading from RabbitMQ (one queue):
celery -A tasks worker -c 1 (one process)
We send 2 chains to RabbitMQ (3 tasks in each chain):
chain(*tasks_to_chain1).apply_async() (let's call it C1 and its tasks C1.1, C1.2, C1.3)
chain(*tasks_to_chain2).apply_async() (let's call it C2 and its tasks C2.1, C2.2, C2.3)
We expected the tasks to be run in this order: C1.1, C1.2, C1.3, C2.1, C2.2, C2.3.
However we are seeing this instead: C1.1, C2.1, C1.2, C2.2, C1.3, C2.3.
We don't get why. Can someone shed some light on what's happening?
Many thanks,
Edit: more generally speaking, we observe that chain 2 starts before chain 1 ends.
Without access to the full code it is not possible to test, but a plausible explanation is that your asynchronous sender just happens to interleave the submissions. I would also assume the order is not deterministic: you will likely get somewhat different results if you repeat the experiment.
When you execute apply_async(), an asynchronous task is created that will (probably; I can't be sure without seeing the code) start submitting those tasks to the queue, but as the call is not blocking, your main program immediately proceeds to the second apply_async(), which creates another background task to submit things to the queue.
These two background tasks are handled by a scheduler. What you see in your output is each task submitting one item to the queue and then passing control to the other, which submits one of its own and hands control back.
If you do not want this to happen asynchronously, use apply instead of apply_async. apply is a blocking call, and your main program does not proceed until the first three tasks have run. With asynchronous calls you can never be sure of the exact order of execution between tasks: you know C1.1 will happen before C1.2, but you cannot guarantee how the C1 and C2 tasks are interleaved.
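A minimal sketch of the blocking variant suggested above, reusing the names from the question: apply() executes each chain locally in the calling process and returns only when it has finished, so the second chain cannot start before the first completes.

from celery import chain

# apply() runs the whole chain in the calling process and blocks until done,
# so C2.1 can never run before C1.3 has finished.
chain(*tasks_to_chain1).apply()
chain(*tasks_to_chain2).apply()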
In the book "Mastering Concurrency in Python", chapter 6, "Working with Processes in Python", in the section "Message passing between several workers", example 7, there is an implementation of a task queue. Here is its code: https://github.com/PacktPublishing/Mastering-Concurrency-in-Python/blob/master/Chapter06/example7.py
The author describes a problem with this example:
Everything seems to be working, but if we look closely at the messages our processes have printed out, we will notice that most of the tasks were executed by either Consumer-2 or Consumer-3, and that Consumer-4 executed only one task while Consumer-1 failed to execute any. What happened here?
I tried to understand the author's explanation of the problem, and it looks like it is wrong:
Essentially, when one of our consumers—let's say Consumer-3—finished executing a task, it tried to look for another task to execute immediately after. Most of the time, it would get priority over other consumers, since it was already being run by the main program. So while Consumer-2 and Consumer-3 were constantly finishing their tasks' executions and picking up other tasks to execute, Consumer-4 was only able to "squeeze" itself in once, and Consumer-1 failed to do this altogether.
To address this issue, a technique has been developed, to stop consumers from immediately taking the next item from the task queue, called poison pill. The idea is that, after setting up the real tasks in the task queue, we also add in dummy tasks that contain "stop" values and that will have the current consumer hold and allow other consumers to get the next item in the task queue first; hence the name "poison pill."
The problem seems different: since the consumers are started before the queue is filled with tasks, most of them find the queue empty at the empty() check and exit without processing anything. Adding poison pills helps because it eliminates the race between the empty() and get() calls, not because it lowers the priority of any consumer. The poison pills only take effect once the queue has no more real tasks, which is why they cannot influence which consumer takes which task.
Moreover, there seems to be a bug in this example: if another consumer steals a task from the queue between our consumer's calls to empty() and get(), then our consumer will block indefinitely on get(), which actually happens on my laptop.
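To make the race concrete, here is a minimal sketch (not the book's code; the squaring task is made up) of a consumer loop that avoids the empty()/get() race by blocking on get() and relying only on the pills to terminate:

import multiprocessing

def consumer(task_queue, result_queue):
    # Never check task_queue.empty() before get(): another consumer can
    # steal the item in between, leaving this get() blocked forever.
    while True:
        item = task_queue.get()        # blocks until a task or a pill arrives
        if item is None:               # poison pill: no more real tasks
            break
        result_queue.put(item * item)  # stand-in for real work

if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    consumers = [multiprocessing.Process(target=consumer, args=(tasks, results))
                 for _ in range(4)]
    for c in consumers:
        c.start()
    for n in range(10):
        tasks.put(n)
    for _ in consumers:
        tasks.put(None)                # one pill per consumer
    for c in consumers:
        c.join()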
Can someone validate this, please?
Celery sends tasks to idle workers.
I have a task that runs every 5 seconds, and I want this task to be sent to only one specific worker.
The other tasks can share the remaining workers.
Can Celery do this?
I also want to know what this parameter means: CELERY_TASK_RESULT_EXPIRES.
Does it mean that the task will not be sent to a worker in the queue?
Or does it stop the task if it runs too long?
Sure, you can. The best way to do it is to run separate Celery workers that consume from different queues. You just need to make sure that the task you care about goes to a separate queue and that one worker listens to that particular queue.
The long story is here: http://docs.celeryproject.org/en/latest/userguide/routing.html
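As a rough sketch (the task and queue names are made up, and the old-style setting names are used to match your CELERY_TASK_RESULT_EXPIRES):

# celeryconfig.py: route the periodic task to its own queue
CELERY_ROUTES = {
    'tasks.every_five_seconds': {'queue': 'dedicated'},
}

Then start one worker that consumes only that queue, and let the others share the default one:

celery -A tasks worker -Q dedicated -c 1
celery -A tasks worker -Q celery -c 4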
Just to answer your second question: CELERY_TASK_RESULT_EXPIRES is the time in seconds for which the result of a task is persisted. After a task finishes, its result is saved in your result backend and kept there for the amount of time specified by that parameter. This is useful when a task result might be accessed by different callers.
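For example, as a plain settings sketch (the default is one day):

CELERY_TASK_RESULT_EXPIRES = 3600  # keep results for one hour instead of the default 86400 seconds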
This probably has nothing to do with your problem. As for the first question, as already stated, you have to use multiple queues. However, be aware that you cannot assign a task to a specific worker process, only to a specific worker, which will then assign it to one of its worker processes.
I have a Django application that needs to run an optimization algorithm. This algorithm is composed of two parts: the first part is an evolutionary algorithm, and it calls a certain number of tasks of the second part, which is a simulated annealing algorithm.
The problem is that Celery doesn't allow a task to call an asynchronous task.
I have tried the code below:
sa_list = []
for cromossomo in self._populacao:
    sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T, get_mutacao_str(self._mutacao_SA), self._argumentos))
job = group(sa_list)
result = job.apply_async()
resultados = result.get()
This code is part of the evolutionary algorithm, which is itself a Celery task.
When I tried to run it, Celery showed this message:
[2015-12-02 16:20:15,970: WARNING/Worker-1] /home/arthur/django-user/local/lib/python2.7/site-packages/celery/result.py:45: RuntimeWarning: Never call result.get() within a task!
See http://docs.celeryq.org/en/latest/userguide/tasks.html#task-synchronous-subtasks
In Celery 3.2 this will result in an exception being
raised instead of just being a warning.
Despite this being just a warning, Celery seems to get clogged with tasks and locks.
I have searched for a lot of solutions, but none of them worked.
One way to deal with this is to have a 2-stage pipeline:
@app.task  # registered as a task so the caller can use apply_async()
def first_task():
    # self._populacao and the other self attributes come from the
    # question's context
    sa_list = []
    for cromossomo in self._populacao:
        sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T, get_mutacao_str(self._mutacao_SA), self._argumentos))
    job = group(sa_list)
    result = job.apply_async()
    result.save()  # requires a result backend that can store GroupResults
    return result.id
Then call it like this:
from path.to.tasks import app, first_task
result_1 = first_task.apply_async()
result_2_id = result_1.get()
result_2 = app.GroupResult.restore(result_2_id)
resultados = result_2.get()
There are other ways to do this that involve more work; for example, you could use a chord to gather the results of the group.
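A rough sketch of the chord variant, reusing the names from the question (collect_results and process are made-up placeholders): the callback runs once, after every task in the header group has finished, so nothing ever calls get() inside a task.

from celery import chord

@app.task
def collect_results(resultados):
    # Receives the list of return values of all header tasks.
    process(resultados)  # hypothetical post-processing step

header = [simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T,
                                  get_mutacao_str(self._mutacao_SA),
                                  self._argumentos)
          for cromossomo in self._populacao]
chord(header)(collect_results.s())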
The problem is not that Celery doesn't allow the execution of async tasks in your example; it's that you'll run into a deadlock, hence the warning:
Let's assume you have a task A that spawns a number of subtasks B through apply_async(). Each of those tasks is executed by a worker. The problem is that if the number of B tasks is larger than the number of available workers, task A is still waiting for their results (in your example, at least; it's not waiting by default). While task A is still running, the workers that have executed a B task will not execute another one; they are blocked until task A is finished. (I don't know exactly why, but I had this problem just a few weeks ago.)
This means that Celery can't execute anything more until you manually shut down the workers.
Solutions
This depends entirely on what you want to do with your task results. If you need them to execute the next subtask, you can chain them through linking with callbacks or by hardcoding it into the respective tasks (so that you call the first, which calls the second, and so on).
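For example (task_a and task_b are placeholder tasks):

from celery import chain

# task_b runs only after task_a succeeds and receives task_a's result:
task_a.apply_async(link=task_b.s())

# Equivalently, as a chain:
chain(task_a.s(), task_b.s()).apply_async()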
If you only need to see whether they executed successfully or not, you can use flower to monitor your tasks.
If you need to process the output of all the subtasks further, I recommend writing the results to a file, for example XML: have task A call all the B tasks, and once they are done, execute a task C that processes the results. Maybe there are more elegant solutions, but this avoids the deadlock for sure.
Here is the code:
@celery.task()
def some_recursive_task():
    # Do some stuff and schedule the task to run again later.
    # Note that the next run is not scheduled on a fixed basis, like crontabs,
    # but based on the history of some object.
    # The actual task is found here:
    # https://github.com/rafaelsierra/cheddar/blob/master/src/feeds/tasks.py#L39
    # Then it calls itself again:
    countdown = bla.get_countdown()
    some_recursive_task.apply_async(countdown=countdown)
This task will run somewhere between the next 10 minutes and the next 12 hours, but it also calls other tasks that should run now: one to download stuff and another to parse it.
The problem is that the main function is called for every single record in the database. Let's assume a few hundred tasks are running; but, considering that these tasks run on average every few hours, the total number of tasks is not a big deal.
The problem starts when I try to run this with a single worker. When I start the worker, I have it consume all queues with 8 concurrent processes. It starts and begins to acknowledge the tasks, but it seems that, no matter how far in the future a task is scheduled, a worker will grab it and wait for its scheduled run, meaning that the worker is locked until then.
I know that I can just split the two other functions into different queues, which I have already done, but my concern is that workers will acknowledge tasks scheduled to run 12 hours ahead and will not run the ones that should run within 30 minutes.
Shouldn't workers ignore scheduled tasks until their time comes, and run the ones that are just delayed without an ETA?
I don't think periodic tasks are a solution, or I don't know how they could be.
See points 5 & 6 there. Please keep in mind that countdown is no different from the eta argument of the task.
In short, you're right. A single worker (or any number of workers) should not block on scheduled (eta or countdown) tasks.
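To illustrate the countdown/eta equivalence (some_task is a placeholder):

from datetime import datetime, timedelta

# These two calls schedule the same thing; countdown is simply converted
# into an eta when the task message is published:
some_task.apply_async(countdown=600)
some_task.apply_async(eta=datetime.utcnow() + timedelta(seconds=600))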
How can you tell that the workers are locked? Scheduled tasks are prefetched from the queue, but they are not acknowledged until they are executed.
Also, please keep in mind that all scheduled tasks are kept in RAM until they're executed, so you want them to be as light as possible. From what I understand, your scheduled task doesn't pass around big chunks of data, probably only some URI, so this shouldn't be a problem.
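If you ever do need heavier data, the usual pattern is to pass only an identifier and re-fetch inside the task. A sketch (Feed, fetch_and_parse and some_feed are hypothetical):

@celery.task()
def update_feed(feed_id):
    # Re-fetch the object inside the task so the queued message
    # carries only an integer id, not the whole object.
    feed = Feed.objects.get(pk=feed_id)  # Feed is a hypothetical model
    feed.fetch_and_parse()               # hypothetical work

# The prefetched message held in worker RAM is then tiny:
update_feed.apply_async(args=(some_feed.pk,), countdown=countdown)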
The links you've pasted return 404. Are you sure cheddar isn't a private repository?
I'm using Celery to queue jobs from a CGI application I made. The way I've set it up, Celery runs each job one or two at a time by setting CELERYD_CONCURRENCY = 1 or = 2 (so they don't crowd the processor or thrash from memory consumption). The queue works great, thanks to advice I got on StackOverflow.
Each of these jobs takes a fair amount of time (~30 minutes serially) but is embarrassingly parallel, so I was using Pool.map to split the work and run it in parallel. It worked great from the command line, and I got runtimes around 5 minutes using a new many-core chip.
Unfortunately, there is some limitation that does not allow daemonic processes to have child processes, and when I run the fancy parallelized code within the CGI queue, I get this error:
AssertionError: daemonic processes are not allowed to have children
I noticed other people have had similar questions, but I can't find an answer that wouldn't require abandoning Pool.map altogether and writing more complicated thread code.
What is the appropriate design choice here? I can easily run my serial jobs using my Celery queue. I can also run my much faster parallelized jobs without a queue. How should I approach this, and is it possible to get what I want (both the queue and the per-job parallelization)?
A couple of ideas I've had (some are quite hacky):
1. The job sent to the Celery queue simply calls the command-line program. That program can use Pool as it pleases, and then saves the result figures and data to a file (just as it does now). Downside: I won't be able to check on the status of the job or see whether it terminated successfully. Also, system calls from CGI may cause security issues.
2. Obviously, if the queue is very full of jobs, I can make use of the CPU resources (by setting CELERYD_CONCURRENCY = 6 or so); this will allow many people to be "at the front of the queue" at once. Downside: each job will spend a lot of time at the front of the queue; if the queue isn't full, there will be no speedup. Also, many partially finished jobs will be stored in memory at the same time, using much more RAM.
3. Use Celery's @task to parallelize within sub-jobs. Then, instead of setting CELERYD_CONCURRENCY = 1, I would set it to 6 (or however many sub-jobs I'd like to allow in memory at a time). Downside: first of all, I'm not sure whether this will successfully avoid the "task-within-task" problem. But also, the notion of queue position may be lost, and many partially finished jobs may end up in memory at once.
4. Perhaps there is a way to call Pool.map and specify that the processes are non-daemonic? Or perhaps there is something more lightweight I can use instead of Pool.map? This is similar to an approach taken in another open StackOverflow question (see the sketch after this list). Also, I should note that the parallelization I exploit via Pool.map is similar to linear algebra, and there is no inter-process communication (each worker just runs independently and returns its result without talking to the others).
5. Throw away Celery and use multiprocessing.Queue. Then maybe there'd be some way to use the same "process depth" for every process I use (i.e. maybe all of the jobs could use the same Pool, avoiding nesting)?
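To make idea 4 concrete, here is the kind of non-daemonic Pool I mean (a sketch against the 2.7-era multiprocessing internals; untested, and newer Pythons restructured Pool, so it may need adapting):

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # Make the daemon attribute always read False, so workers of this
    # type are allowed to have children of their own.
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

class NoDaemonPool(multiprocessing.pool.Pool):
    # A Pool whose workers may themselves spawn processes.
    Process = NoDaemonProcess

# Drop-in for multiprocessing.Pool inside the Celery job:
# pool = NoDaemonPool(processes=6)
# results = pool.map(do_chunk, chunks)  # do_chunk is a placeholder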
Thanks a lot in advance.
What you need is a workflow management system (WFMS) that manages
task concurrency
task dependency
task nesting
among other things.
From a very high-level view, a WFMS sits on top of a task pool like Celery and submits to the pool the tasks that are ready to execute. It is also responsible for opening up a nest and submitting the tasks within the nest accordingly.
I've developed a system to do just that. It's called pomsets. Try it out, and feel free to send me any questions.
I am using multiprocess daemons based on Twisted with forking, and querying Gearman jobs works normally.
Take a look at Gearman.