Can someone tell me whether Celery executes a task in a thread or in a separate child process? The documentation doesn't seem to explain it (I've read it about three times). If it is a thread, how does it get past the GIL (in particular, who notifies whom when an event occurs)?
How would you compare Celery's async model with Twisted's reactor model? Is Celery using a reactor model after all?
Thanks,
Can someone tell me whether Celery executes a task in a thread or in a
separate child process?
Neither; the task will be executed in a separate process, possibly on a different machine. It is not a child process of the thread where you call delay(). The -c and -P options control how the worker process manages its own concurrency. The worker processes get tasks through a message broker, which is also completely independent.
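To make that concrete, here is a minimal sketch (module name, broker URL and task body are just placeholders):

# tasks.py -- minimal sketch; broker URL and task body are placeholders
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    # this body runs inside a worker pool process, possibly on another machine
    return x + y

# In the calling process (a web app, a shell, anything) only a message is sent:
#     add.delay(2, 3)
# The call returns an AsyncResult immediately; no thread or child process is
# created in the caller -- the broker delivers the message to a worker.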
How would you compare celery's async with Twisted's reactor model? Is
celery using reactor model after all?
Twisted is built around an event loop (the reactor). It is asynchronous, but it is not designed for parallel processing.
-c and -P are the concurrency-related options for celery worker:
-c CONCURRENCY, --concurrency=CONCURRENCY
    Number of child processes processing the queue. The default is the number of CPUs available on your system.
-P POOL_CLS, --pool=POOL_CLS
    Pool implementation: processes (default), eventlet, gevent, solo or threads.
using eventlet:
http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#enabling-eventlet
http://docs.celeryproject.org/en/latest/internals/reference/celery.concurrency.processes.html
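If you prefer configuration over command-line flags, the same can usually be set in the config module as well (these are the old-style setting names, so double-check them against your Celery version):

# celeryconfig.py -- equivalent of -c / -P as old-style settings
# (verify the exact names against the Celery version you are running)
CELERYD_CONCURRENCY = 4        # same as: celery worker -c 4
CELERYD_POOL = 'prefork'       # same as: -P processes; alternatives are
                               # 'eventlet', 'gevent', 'solo', 'threads'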
Cheers,
I have a Celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem is that all workers are always bound to both queues, even when I tell them to consume from only one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may be required to chain through their respective results
A node is a dynamically dispatched sub-task, which also is a regular celery task but itself can be considered an atomic unit.
My task can thus be a more complex setup of nodes where the results of one or more nodes serve as input for one or more subsequent nodes, and so on. Since my tasks can take longer and will only finish when all their nodes have been deployed, it is essential that they are handled by dedicated workers, so that a sufficient number of workers stays free to consume the nodes. Otherwise the system can get stuck: when a lot of tasks are dispatched, each one is consumed by another worker, and their respective nodes are only queued but never consumed, because all workers are blocked.
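To illustrate the pattern, here is a heavily simplified sketch of what I mean (not my actual code; the names match the routing configuration below):

from celery import Celery, group

app = Celery('deployment', broker='amqp://')

@app.task(name='deploy_node')
def deploy_node(payload):
    # atomic unit of work
    return {'node': payload, 'status': 'done'}

@app.task(name='deploy_task')
def deploy_task(payloads):
    # dynamically dispatch the nodes, then wait for all of their results --
    # this waiting is what keeps the worker slot occupied for a long time
    job = group(deploy_node.s(p) for p in payloads).apply_async()
    return job.get()  # blocks until every node has finished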
If this is a bad design in general, please suggest how I can improve it. I did not yet manage to build one of these processes using Celery's built-in canvas primitives. Help me, if you can?!
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task'),
)
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume tasks from both queues. I can see this in the Flower web UI when I inspect one of the deployed tasks: e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore shows that all started workers are set up as consumers for both queues.
Thus, I ask you...
... what did I do wrong?
Where is the issue with my setup or worker start call? How can I stop the workers from always consuming from both queues? Do I really have to make additional settings at runtime (which I certainly do not want to do)?
Thanks for your time and answers!
You can create two separate workers, one for each queue, and tell each one which queue to get tasks from using the -Q command line argument.
If you want to keep the total number of processes the same (by default one process is started per core for each worker), you can use the --concurrency flag (see the Celery docs for more info).
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with the 'queue' argument for the different types of jobs
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry in configuration file
CELERY_CREATE_MISSING_QUEUES = True
3) When starting a worker, pass -Q queue_name as an argument so that it consumes only from the desired queue.
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2
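For completeness, a sketch of the worker-side tasks that those send_task calls resolve to by name (the names and bodies here are placeholders matching the example above):

# proj/tasks.py -- placeholder definitions whose registered names match the
# strings passed to celery.send_task() above
from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')

@app.task(name='job_type1')
def job_type1(*args, **kwargs):
    pass  # consumed by worker1 via -Q queue_name_1

@app.task(name='job_type2')
def job_type2(*args, **kwargs):
    pass  # consumed by worker2 via -Q queue_name_2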
For the gevent pool, we can use the parameter '-c' to specify the number of greenlets, as the following shows:
celery worker -A celerytasks.celery_worker_init -P gevent -c 1000 --loglevel=info
But what is an appropriate value for it?
And what does the 'greenlet number' really mean? If I specify 1000 greenlets, what happens when they are all in use?
The -c parameter is a Celery parameter that allows you to specify how many tasks will be executed at the same time by the worker; here is an extract from the docs:
The number of worker processes/threads can be changed using the --concurrency argument and defaults to the number of CPUs available on the machine.
When running Celery and instructing it to use gevent, this means that instead of using prefork, Celery will use green threads to process your tasks.
In this case you can refer to the page about Eventlet in the Celery docs; it is not the same as gevent, but the same concepts apply.
Here is an extract that can help you understand:
The prefork pool can take use of multiple processes, but how many is often limited to a few processes per CPU. With Eventlet you can efficiently spawn hundreds, or thousands of green threads.
However, note that gevent or eventlet is not always better than prefork; it often depends on what type of task you need to execute, as explained in this part of the docs:
Celery supports Eventlet as an alternative execution pool implementation. It is in some cases superior to prefork, but you need to ensure your tasks do not perform blocking calls, as this will halt all other operations in the worker until the blocking call returns.
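To make the distinction concrete, a rough sketch (hypothetical tasks; it assumes the worker is started with -P gevent or -P eventlet so that socket I/O is monkey-patched):

from celery import Celery
import requests

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def fetch_status(url):
    # I/O bound: the greenlet yields while waiting on the network, so
    # hundreds or thousands of these can be in flight with -c 1000
    return requests.get(url, timeout=10).status_code

@app.task
def crunch(numbers):
    # CPU bound: this never yields, so it blocks every other greenlet in
    # the worker -- this kind of task is better served by the prefork pool
    return sum(n * n for n in numbers)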
Sizing is hard to answer, because it also depends strongly on your type of tasks and average load.
In my personal experience it is often better not to go over 1000-3000 for a single Celery worker.
One last note: from the Celery docs it looks like gevent support is still in an experimental state, so you may want to use eventlet instead.
I need to create a semaphore to restrict the parallel count of a particular subprocess. I am using gunicorn with eventlet workers and allow many simultaneous connections. Mostly these are waiting on remote data. However, they all enter a processing phase at some point and this involves calling a subprocess. This subprocess though should not be run too often in parallel as it is memory/CPU hungry.
Is threading.Semaphore correctly monkey_patch'd and usable with eventlet inside gunicorn?
As I understand the problem:
one gunicorn process (this is crucial) spawns N green threads
each worker may spawn one or more subprocesses
you want to limit total number of subprocesses
In this case, yes, semaphore will work as expected.
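A minimal sketch of that single-process case (the limit and the command are placeholders; it assumes gunicorn runs the eventlet worker class, so the threading module is monkey-patched):

import subprocess
import threading

# module level, so all green threads in this one gunicorn process share it;
# under eventlet monkey-patching this behaves as an eventlet semaphore
MAX_PARALLEL = 2
_slots = threading.BoundedSemaphore(MAX_PARALLEL)

def run_heavy_tool(args):
    # only the current green thread waits here until a slot frees up
    with _slots:
        return subprocess.check_output(['heavy-tool'] + list(args))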
However, if you have more than one process, each will have its own instance of the semaphore and you will observe more subprocesses than intended. In that case, I recommend moving the subprocess responsibilities to a separate application running on the same machine and calling it via whatever API you like (RPC/socket/message queue/D-Bus/etc.). You could design the system like this:
user -> gunicorn (any number of processes)
gunicorn -> one subprocess manager
manager -> N subprocesses
The manager listens for jobs from gunicorn, spawns a subprocess if needed, and maybe reuses existing subprocesses. You may like a job queue system such as Beanstalk, Celery or Gearman, or you may wish to build a custom solution on top of an existing message transport like NSQ, RabbitMQ or ZeroMQ.
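A very rough sketch of such a manager, just to show the shape of it (the address, auth key, job format and command are all made up):

# manager.py -- standalone process that owns the global subprocess limit;
# address, authkey, job format and command below are made-up placeholders
import subprocess
import threading
from multiprocessing.connection import Listener

MAX_PARALLEL = 2
_slots = threading.BoundedSemaphore(MAX_PARALLEL)

def handle(conn):
    job = conn.recv()                      # e.g. {'args': ['--input', 'x']}
    with _slots:                           # one limit shared by all gunicorn processes
        out = subprocess.check_output(['heavy-tool'] + job['args'])
    conn.send(out)
    conn.close()

def main():
    with Listener(('localhost', 6000), authkey=b'change-me') as listener:
        while True:
            threading.Thread(target=handle,
                             args=(listener.accept(),),
                             daemon=True).start()

if __name__ == '__main__':
    main()

# The gunicorn side would connect with
# multiprocessing.connection.Client(('localhost', 6000), authkey=b'change-me')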
If I hook up a callback to the celery task_success signal handler, which process does it get executed in? The child or the worker process?
The documentation does not explicitly list it. (It lists it for the signal task_sent, but not for the other signals: http://docs.celeryproject.org/en/latest/userguide/signals.html#task-sent)
thanks...
There's no such thing as a "child" process; there is the process sending the task (which can be any Python process, including a celery worker, or celery beat, or anything else) and there is the worker that processes the task.
All the task signals except task_sent are executed in the worker that processes the task; in fact they can't possibly execute anywhere else. Celery signals (like Django signals) are not like operating system events, or like Celery tasks, which can originate in one process and trigger something in another process; they get processed in the same process as that in which they originate. They have nothing to do with the Python standard library signal module.
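For example, connect a handler like the following and it will run inside the worker process that just executed the task (the handler name is arbitrary):

from celery.signals import task_success

@task_success.connect
def on_task_success(sender=None, result=None, **kwargs):
    # runs in the same process that ran the task body;
    # `sender` is the task, `result` is its return value
    print('task %s succeeded with %r' % (sender.name, result))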
I'm using Django celery for my project requirements.
How can I process two queues in parallel using a single worker?
I don't want one queue to wait for the tasks in the other queue to finish.
Please advise.
Why not just use a concurrency > 1? Like so:
celeryd -c 5
That should start a worker with 5 processes that can work on things in parallel.