django celery - multiprocessing - python

I'm using django-celery for my project.
How can I process two queues in parallel using a single worker?
I don't want one queue to have to wait for a task in the other queue to finish.
Please advise.

Why not just use concurrency of >1 ? Like so:
celeryd -c 5
That should start a worker with 5 processes that can work on things in parallel.
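As a minimal sketch (the project name, queue names, and tasks below are made up for illustration), one worker can subscribe to both queues and, with a concurrency greater than one, drain them in parallel so that neither queue waits on the other:

# Hypothetical layout; only the -c / -Q mechanics are the point here.
#
# Start a single worker that consumes from both queues with 5 pool processes:
#   celery -A myproject worker -Q emails,reports -c 5 -l info
# (with the old django-celery tooling: python manage.py celeryd -Q emails,reports -c 5)
from celery import Celery

app = Celery('myproject', broker='amqp://localhost//')

@app.task
def send_email(address):
    """Placeholder work item pushed to the 'emails' queue."""

@app.task
def build_report(report_id):
    """Placeholder work item pushed to the 'reports' queue."""

# Route each call to its queue at call time; the worker above drains both.
send_email.apply_async(args=['user@example.com'], queue='emails')
build_report.apply_async(args=[42], queue='reports')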

Related

Django Celery & Django-Celery-Beat

I'm new to asynchronous tasks; I'm using django-celery and was hoping to use django-celery-beat to schedule periodic tasks.
However, it looks like celery beat doesn't pick up one-off tasks. Do I need two Celery instances, one as a worker for one-off tasks and one as beat for scheduled tasks, for this to work?
Pass the -B parameter to your worker; it tells the worker to also run the beat schedule. That worker will handle all tasks, both the ones sent by beat and the one-off ones; it makes no difference to the worker.
So the full command looks like:
celery -A flock.celery worker -l DEBUG -B -E
If you have multiple periodic tasks executing, for example, every 10 seconds, they should all point to the same schedule object (see the django-celery-beat documentation).
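As a minimal sketch of the schedule side (the task paths and names below are invented), with django-celery-beat both periodic tasks can point at the same IntervalSchedule row, and the -B worker described above picks them up:

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# One shared 10-second schedule object, reused by every task that runs at that interval.
every_10s, _ = IntervalSchedule.objects.get_or_create(
    every=10, period=IntervalSchedule.SECONDS,
)

PeriodicTask.objects.get_or_create(
    interval=every_10s, name='Poll feed', task='myapp.tasks.poll_feed',
)
PeriodicTask.objects.get_or_create(
    interval=every_10s, name='Flush cache', task='myapp.tasks.flush_cache',
)

# One process that is both worker and beat, as in the answer:
#   celery -A flock.celery worker -l INFO -B -E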

Single Celery task suspend all other Celery workers

I have a Celery configuration (via django-celery) with RabbitMQ as the broker and a concurrency of 20 threads.
One of the tasks takes a really long time (about an hour) to execute. A few minutes after that task starts running, all the other concurrent threads stop working until the task finishes. Why is this happening?
Thanks!
You need multiple workers (or enough concurrency) to pick up the other tasks.
For example, the environment needs to have at least 2 CPUs.
http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency
You can use Celery Flower to inspect the workers.
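If Flower is not at hand, the same information can be pulled programmatically; a small sketch, assuming the Celery app object lives at myproject.celery (a placeholder path):

from myproject.celery import app  # placeholder import path

insp = app.control.inspect()
print(insp.stats())     # per-worker details, including the pool's max concurrency
print(insp.active())    # tasks each worker is executing right now
print(insp.reserved())  # tasks already prefetched by a worker but not yet started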

Stop celery workers to consume from all queues

Cheers,
I have a Celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem is that all workers are always bound to both queues, even when I configure them to consume from only one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular Celery task that may take quite some time, because it dynamically dispatches other Celery tasks and may need to chain through their respective results.
A node is a dynamically dispatched sub-task, which is also a regular Celery task but can itself be considered an atomic unit.
A task can thus be a more complex arrangement of nodes where the results of one or more nodes serve as input for one or more subsequent nodes, and so on. Since my tasks can take longer and only finish when all of their nodes have been processed, it is essential that they are handled by dedicated workers, so that a sufficient number of workers stays free to consume the nodes. Otherwise the system could get stuck: if a lot of tasks are dispatched, each is consumed by a worker, and their respective nodes are only queued but never consumed, because all workers are blocked.
If this is a bad design in general, please suggest how I can improve it. I have not yet managed to build one of these processes using Celery's built-in canvas primitives. Help me if you can!
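Purely as an illustration of the layering described above (the task names, helpers, and app module here are invented, not the asker's code), the task/node hierarchy roughly maps onto a chord: a parallel group of nodes plus a callback, so no worker sits blocked while the nodes run:

from celery import Celery, chord, group

app = Celery('my_deployment', broker='amqp://localhost//')  # chords also need a result backend configured

@app.task
def deploy_node(payload):
    # atomic unit of work
    return do_node_work(payload)        # hypothetical helper

@app.task
def collect_results(node_results):
    # runs only after every node in the group has finished
    return merge(node_results)          # hypothetical helper

def launch_deployment(node_payloads):
    # A "task" dispatched as a chord never occupies a worker while its nodes run;
    # the callback is delivered to a worker once all node results are in.
    return chord(group(deploy_node.s(p) for p in node_payloads))(collect_results.s())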
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
CELERY_QUEUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task'),
)
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume celery tasks from both queues. I can see this when I start and open up the flower web UI and inspect one of the deployed tasks, where e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore tells me, that all started workers are set up as consumers for both queues.
Thus, I ask you...
... what did I do wrong?
Where is the issue with my setup or with the worker start call? How can I prevent the workers from always consuming from both queues? Do I really have to change settings at runtime (which I certainly do not want to do)?
Thanks for your time and answers!
You can create two separate workers, one for each queue, and tell each one which queue to get tasks from using the -Q command-line argument.
If you want to keep the number of processes the same (by default one process is started per core, per worker), you can use the --concurrency flag (see the Celery docs for more info).
Celery allows configuring a worker with a specific queue.
1) Specify the queue with the 'queue' argument for the different types of jobs:
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry to the configuration file:
CELERY_CREATE_MISSING_QUEUES = True
3) When starting the worker, pass -Q 'queue_name' as an argument so it consumes from the desired queue:
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2

What is the appropriate number of gevent greenlets in the Celery startup command line

For the gevent pool, we can use the '-c' parameter to specify the number of greenlets, as the following shows:
celery worker -A celerytasks.celery_worker_init -P gevent -c 1000 --loglevel=info
But what is an appropriate value for it?
And what does the 'greenlet number' really mean? If I specify 1000 greenlets, what happens when they are all in use?
The -c parameter is a Celery option that specifies how many tasks the worker will execute at the same time; here is an extract from the docs:
The number of worker processes/threads can be changed using the --concurrency argument and defaults to the number of CPUs available on the machine.
When you run Celery and instruct it to use gevent, this means that instead of prefork, Celery will use green threads to process your tasks.
In this case you can refer to the page about Eventlet in the Celery docs; it is not the same as gevent, but the same concepts apply.
Here is an extract that can help you understand:
The prefork pool can take use of multiple processes, but how many is often limited to a few processes per CPU. With Eventlet you can efficiently spawn hundreds, or thousands of green threads.
However, note that gevent or eventlet is not always better than prefork; it often depends on what type of task you need to execute, as explained in this part of the docs:
Celery supports Eventlet as an alternative execution pool implementation. It is in some cases superior to prefork, but you need to ensure your tasks do not perform blocking calls, as this will halt all other operations in the worker until the blocking call returns.
As for sizing, it is hard to give a number, because it depends strongly on your type of tasks and average load.
In my personal experience, it is often better not to go over 1000-3000 greenlets for one Celery worker.
One last note: from the Celery docs it looks like gevent support is still experimental, so you may want to use eventlet instead.
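As a rough sketch of the kind of task a large green-thread pool suits (the import path is taken from the question's command line; the task body is invented), the work should be I/O-bound so the greenlets spend their time waiting rather than computing:

#   celery worker -A celerytasks.celery_worker_init -P gevent -c 1000 --loglevel=info
#
# With -c 1000, at most 1000 of these run concurrently in the worker; once all
# greenlets are busy, further messages simply wait in the queue until one frees up.
import requests

from celerytasks.celery_worker_init import app   # placeholder import of the Celery app

@app.task
def fetch_status(url):
    # Blocking network I/O yields cooperatively under the gevent/eventlet pool's
    # monkey patching, so thousands of these can be in flight at once.
    return requests.get(url, timeout=30).status_code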

Is celery's apply_async thread or process?

Can someone tell me whether Celery executes a task in a thread or in a separate child process? The documentation doesn't seem to explain it (I've read it about 3 times). If it is a thread, how does it get past the GIL (in particular, who notifies an event, and how)?
How would you compare Celery's async model with Twisted's reactor model? Is Celery using a reactor model after all?
Thanks,
Can someone tell me whether Celery executes a task in a thread or in a
separate child process?
Neither: the task will be executed in a separate worker process, possibly on a different machine. It is not a child of the process where you call 'delay'. The -c and -P options control how the worker process manages its own concurrency. The worker processes receive tasks through a message broker, which is also completely independent.
How would you compare celery's async with Twisted's reactor model? Is
celery using reactor model after all?
Twisted is built around an event loop (the reactor). It is asynchronous, but it is not designed for parallel processing.
-c and -P are the concurrency-related options for celery worker.
-c CONCURRENCY, --concurrency=CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL_CLS, --pool=POOL_CLS
Pool implementation: processes (default), eventlet,
gevent, solo or threads.
using eventlet:
http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#enabling-eventlet
http://docs.celeryproject.org/en/latest/internals/reference/celery.concurrency.processes.html
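A small sketch of that split (the module and task names are invented): apply_async only publishes a message to the broker and returns immediately; a separate worker process, possibly on another machine, runs the task and stores the result:

# proj.py -- single-module example
from celery import Celery

app = Celery('proj', broker='amqp://localhost//', backend='rpc://')

@app.task
def add(x, y):
    # executes inside a worker pool process, e.g. one started with:
    #   celery -A proj worker -c 4
    return x + y

# In the calling process (a Django view, a shell, ...):
result = add.apply_async(args=(2, 3))   # never runs the task locally
print(result.get(timeout=10))           # blocks until a worker has replied: 5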
