I started running celery for tasks in a Python/Django web project, hosted on a single VM with 8 cores or CPUs. I need to improve the configuration now - I've made rookie mistakes.
I use supervisor to handle celery workers and beat. In /etc/supervisor/conf.d/, I have two worker-related conf files - celery1.conf and celery2.conf. Should I...
1) Remove one of them? Both spawn different workers. That is, the former conf file has command=python manage.py celery worker -l info -n celeryworker1, while the latter has command=python manage.py celery worker -l info -n celeryworker2. And it's authoritatively stated here to run 1 worker per machine.
2) Tinker with numprocs in the conf? Currently in celery1.conf I've defined numprocs=2, and in celery2.conf I've defined numprocs=3* (see the footnote below). At the same time, in /etc/default/celeryd, I have CELERYD_OPTS="--time-limit=300 --concurrency=8". So what's going on? Does supervisor's numprocs take precedence over celeryd's concurrency, or what? Should I set numprocs=0?
*Total numprocs over both files = 2+3 = 5. This checks out: sudo supervisorctl shows 5 celery worker processes. But in New Relic I see 45 processes running for celeryd. What the heck?! Even if each process created by supervisor is actually spawning 8 processes (via celeryd), total numprocs x concurrency = 5 x 8 = 40. That's 5 fewer than the 45 shown by New Relic. I need guidance in righting these wrongs.
(Screenshots compared the supervisorctl output with the New Relic process list.)
it's authoritatively stated here to run 1 worker per machine
Actually, it's advised ("I would suggest") to only run one worker per machine for this given use case.
You may have pretty good reasons to do otherwise (for example having different workers for different queues...), and the celery docs state that how many workers / how many processes per worker (concurrency) work best really depends on your tasks, usage, machine and so on.
Regarding numprocs in the supervisor conf and concurrency in celery: these are totally unrelated (well, almost...) things. A celery "worker" is actually one main process spawning concurrency children (which are the ones effectively handling your tasks). Supervisor's numprocs tells supervisor how many processes (here: the celery parent workers) it should launch. So if you have one celery conf with numprocs = 2 and another one with numprocs = 3, you launch a total of 5 parent worker processes - each of them spawning n child processes, where - by default - n is your server's CPU count. This means you have a total of 5 + (5*8) = 45 worker processes running.
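For illustration, here is roughly what a single-worker supervisor program could look like, leaving concurrency to celery; the program name, directory and user below are placeholders, not taken from the question:

[program:celeryworker]
command=python manage.py celery worker -l info -n celeryworker1 --concurrency=8
directory=/path/to/your/project
user=www-data
numprocs=1
autostart=true
autorestart=true
stopwaitsecs=600

With numprocs=1, supervisor launches one parent worker, and --concurrency=8 controls how many child processes that worker spawns. Note that CELERYD_OPTS in /etc/default/celeryd is read by the celeryd init script, so it only applies if workers are started that way rather than through supervisor.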
Whether you actually need that many workers is a question only you can answer ;)
Related
In my case I'm still trying to understand something about this. Running long tasks (they take from 20 minutes to 2 hours), I hit a weird scenario in which my celery workers, after a while (15-20 minutes), pass from status=online to offline, yet they still show active=1.
After this I see the same task started in another celery worker, and the process repeats. This happens until I have the same task running three times at once in different workers, all of them offline with active=1 after a while.
What does it mean to have a celery worker status=offline with active=1?
What can be the reason to have a worker on this state?
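For reference, the two indicators can be queried separately; here is a minimal sketch (the app name and broker URL are placeholders). ping() reflects whether a worker's main process answers a heartbeat, while active() lists the tasks its pool processes are still executing, which may be why the two can disagree:

from celery import Celery

app = Celery("my_app", broker="amqp://localhost//")  # placeholder app name and broker

insp = app.control.inspect(timeout=5)
print(insp.ping())    # which workers answer a heartbeat ping: the online/offline part
print(insp.active())  # tasks each worker's pool is still executing: the active=1 part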
This question is regarding the use of multiple remote Celery workers on separate machines. The implementation of the App can be conceptualized as:
My App (Producer) will be adding multiple tasks (say 50) to the queue every 5 minutes (imagine a Python for loop iterating over a list of tasks to be performed asynchronously at every 5-minute interval). I want the celery workers (which will be remote machines) to pick these tasks up as soon as they are pushed.
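For concreteness, a rough sketch of such a producer loop (the task name, broker URL and sleep-based schedule are placeholders; in practice the 5-minute trigger might be celery beat or cron):

import time
from celery import Celery

app = Celery("producer_app", broker="amqp://rabbit-host//")  # placeholder broker URL

def enqueue_batch(items):
    for item in items:
        # one message per task; any idle remote worker consuming the queue can take it
        app.send_task("tasks.process_item", args=[item])

if __name__ == "__main__":
    while True:
        enqueue_batch(list(range(50)))
        time.sleep(300)  # repeat every 5 minutes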
My question is: will Celery/RabbitMQ automatically handle task distribution (so that no worker picks up a task that has already been picked up by another worker from the queue, i.e. work is not duplicated) and distribute the tasks evenly so that no worker is left lazying about while other workers are working hard, or do these have to be configured/programmed in the settings?
I would most appreciate it if someone could forward me relevant documentation (I was checking out Celery docs but couldn't find this specific info regarding remote celery workers in this context.)
Automatically, but you need to be aware of the prefetching feature, which is described here: http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits (read until the end of the page).
In short, prefetching works on two levels: the worker level and the process level, since a worker may have multiple processes. To limit prefetching at the worker level you need to set worker_prefetch_multiplier = 1 in the celery settings; to disable it at the process level you need to pass the -Ofair option on the worker's command line.
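A sketch of both settings together (proj and celeryconfig.py are placeholder names; CELERYD_PREFETCH_MULTIPLIER is the old-style equivalent of the first setting):

# in celeryconfig.py
worker_prefetch_multiplier = 1  # worker level: reserve at most one message per pool process

# on the command line, for the process level
celery -A proj worker -l info -Ofair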
So after digging around in the RabbitMQ docs, it seems that the default exchange type is the direct exchange (ref https://www.rabbitmq.com/tutorials/amqp-concepts.html), which means that tasks in a single queue will be distributed to its workers in a round-robin manner.
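In other words, as long as all the remote workers consume the same queue, each message is delivered to exactly one of them; a sketch (the project name and node names are placeholders):

# on remote machine A
celery -A proj worker -l info -n worker1@%h

# on remote machine B
celery -A proj worker -l info -n worker2@%h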
I have a celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem that arises is that all workers are always bound to both queues, even when I specify that they should only consume from one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may be required to chain through their respective results
A node is a dynamically dispatched sub-task, which also is a regular celery task but itself can be considered an atomic unit.
My task thus can be a more complex setup of nodes where the results of one or more nodes serve as input for one or more subsequent nodes, and so on. Since my tasks can take longer and will only finish when all their nodes have been deployed, it is essential that they are handled by dedicated workers, to keep a sufficient number of workers free to consume the nodes. Otherwise the system could get stuck: if a lot of tasks are dispatched, each consumed by a different worker, their respective nodes only get queued and are never consumed, because all workers are blocked.
If this is a bad design in general, please make any suggestions on how I can improve it. I have not yet managed to build one of these processes using celery's built-in canvas primitives. Help me if you can!
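This is not a fix for the queue problem, just an illustration of one way a "run the nodes, then combine their results" step could be written with canvas primitives (group/chord); the task names, specs and broker below are hypothetical, not the asker's actual code, and chords require a result backend to be configured:

from celery import Celery, chord, group

app = Celery("my_deployment_system", broker="amqp://localhost//")  # placeholder broker

@app.task(name="deploy_node")
def deploy_node(node_spec):
    # atomic unit of work (placeholder body)
    return {"node": node_spec, "status": "ok"}

@app.task(name="collect_results")
def collect_results(node_results):
    # chord callback: runs once all nodes in the group have finished
    return {"nodes": node_results, "status": "done"}

# dispatch all nodes in parallel, then feed their collected results into the callback
workflow = chord(group(deploy_node.s(spec) for spec in ["n1", "n2", "n3"]))(collect_results.s())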
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task'),
)
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume celery tasks from both queues. I can see this when I open up the flower web UI and inspect one of the deployed tasks, where e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore tells me, that all started workers are set up as consumers for both queues.
Thus, I ask you:
What did I do wrong?
Where is the issue with my setup or worker start call?
How can I prevent the workers from always consuming from both queues?
Do I really have to make additional settings at runtime (which I certainly do not want to)?
Thanks for your time and answers!
You can create 2 separate workers, one for each queue, and for each one define which queue it should get tasks from using the -Q command line argument.
If you want to keep the number of processes the same (by default, a process is opened for each core for each worker), you can use the --concurrency flag (see the Celery docs for more info).
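For example, using the asker's queue names (the node names and concurrency values here are only illustrative):

celery -A my.deployment.system worker -n w_task_01@%h -Q prod_tasks -c 2 -l INFO
celery -A my.deployment.system worker -n w_node_01@%h -Q prod_nodes -c 8 -l INFO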
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with the 'queue' argument for the different types of jobs:
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry in the configuration file:
CELERY_CREATE_MISSING_QUEUES = True
3) When starting the worker, pass -Q 'queue_name' as an argument so it consumes from the desired queue:
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2
How do you limit the number of instances of a specific Celery task that can be run simultaneously?
I have a task that processes large files. I'm running into a problem where a user may launch several tasks, causing the server to run out of CPU and memory as it tries to process too many files at once. I want to ensure that only N instances of this one type of task are run at any given time, and that other tasks will sit queued in the scheduler until the others complete.
I see there's a rate_limit option in the task decorator, but I don't think this does what I want. If I'm understanding the docs correctly, it will just limit how quickly the tasks are launched, but it won't restrict the overall number of tasks running, so it will just make my server crash more slowly... but it will still crash nonetheless.
You have to set up an extra queue and set the desired concurrency level for it. From Routing Tasks:
# Old config style
CELERY_ROUTES = {
'app.tasks.limited_task': {'queue': 'limited_queue'}
}
or
from kombu import Exchange, Queue

# the default exchange referenced below must be defined
default_exchange = Exchange('default', type='direct')

celery.conf.task_queues = (
    Queue('default', default_exchange, routing_key='default'),
    Queue('limited_queue', default_exchange, routing_key='limited_queue'),
)
And start extra worker, serving only limited_queue:
$ celery -A celery_app worker -Q limited_queue --loglevel=info -c 1 -n limited_queue
Then you can check that everything is running smoothly using Flower or the inspect command:
$ celery -A celery_app inspect --help
What you can do is push these tasks to a specific queue and have X workers processing them. Having two workers on a queue with 100 items will ensure that only two tasks are processed at the same time.
I am not sure you can do that in Celery. What you can do is check how many tasks of that name are currently running when a request arrives, and if that exceeds the maximum, either return an error or add a mechanism that periodically checks whether there are open slots for the tasks and runs them. (If you add such a mechanism, you don't need to double-check; just add each request to its queue.)
In order to check running tasks, you can use the inspect command.
In short:
app = Celery(...)
i = app.control.inspect()
i.active()
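Building on that, here is a rough sketch of the "check before you launch" idea (the task name, broker and limit are placeholders; note the check is not atomic, so two requests arriving at the same moment could still both pass it):

from celery import Celery

app = Celery("celery_app", broker="amqp://localhost//")  # placeholder broker

MAX_RUNNING = 2  # placeholder limit

def count_running(task_name):
    # active() maps each worker to the list of tasks it is currently executing
    active = app.control.inspect().active() or {}
    return sum(1 for tasks in active.values() for t in tasks if t["name"] == task_name)

def launch_if_free(task_name, *args, **kwargs):
    if count_running(task_name) >= MAX_RUNNING:
        return None  # caller can return an error or retry later
    return app.send_task(task_name, args=args, kwargs=kwargs)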
I have a celery task which uses subprocess.Popen() to call out to an executable which does some CPU-intensive data crunching. It works well but does not take full advantage of the celery worker concurrency.
If I start up celeryd with --concurrency 8 -P prefork, I can confirm with ps aux | grep celeryd that 8 child processes have been spawned. OK.
Now when I start e.g. 3 tasks in parallel, I see the three tasks each picked up by one of the child workers:
[2014-05-08 13:41:23,839: WARNING/Worker-2] running task a...
[2014-05-08 13:41:23,839: WARNING/Worker-4] running task b...
[2014-05-08 13:41:24,661: WARNING/Worker-7] running task c...
... and they run for several minutes before completing successfully. However, when you observe the CPU usage during that time, it's clear that all three tasks are sharing the same CPU, despite another core being free.
If I add two more tasks, each subprocess takes ~20% of that one CPU, etc.
I would expect each child celery process (created using multiprocessing.Pool via the prefork method) to be able to operate independently and not be constrained to a single core. If not, how can I take full advantage of multiple CPU cores with a CPU-bound celery task?
According to http://bugs.python.org/issue17038 and https://stackoverflow.com/a/15641148/519385, there is a problem whereby some C extensions mess with core affinity and prevent multiprocessing from accessing all available CPUs. The solution is a total hack but appears to work.
import os

# reset the CPU affinity of the current process so it may run on any of the first 8 cores
os.system("taskset -p 0xff %d" % os.getpid())
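If you'd rather not add this to every task, one option (a sketch, not the answer's exact method) is to run the same hack once per pool process via Celery's worker_process_init signal; it assumes the taskset binary is available and that 0xff (the first 8 cores) matches your machine:

import os

from celery.signals import worker_process_init

@worker_process_init.connect
def reset_cpu_affinity(**kwargs):
    # undo any inherited CPU affinity so this pool process can be scheduled on any of the first 8 cores
    os.system("taskset -p 0xff %d" % os.getpid())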