Celery concurrency settings: CELERYD_CONCURRENCY and CELERYD_PREFETCH_MULTIPLIER

I have a question about CELERYD_CONCURRENCY and CELERYD_PREFETCH_MULTIPLIER.
My English is not good enough to fully understand the description in the official documentation, so I want to check my understanding.
I set CELERYD_CONCURRENCY=40, which I think means that 40 workers will be used to run tasks.
But in the logs I usually see INFO/MainProcess and seldom see INFO/Worker-n.
Is that because the tasks are fast, so they don't have to be assigned to a worker?
Here is the task architecture: period_task is a Celery periodic task, and mail_it is a normal Celery task.
    from celery import shared_task

    @shared_task
    def period_task():
        do_something()
        ...
        for mail in mail_list:
            mail_it.delay(mail)
My second question is about CELERYD_PREFETCH_MULTIPLIER, whose default value is 4.
Does that mean each worker can fetch 4 tasks from the queue at a time? So with 40 workers, can 40 * 4 tasks be fetched?

My understanding:
CELERYD_CONCURRENCY:
This is the number of THREADS/GREENTHREADS a given worker will have. Celery calls these "processes". This is the number of tasks a single worker can execute in parallel. I believe Celery creates this number PLUS ONE internally, and that the additional one is for managing and assigning work to the others (in your case, 40 of them!). In my experience, you likely don't need or want 40 (closer to 1 or 2 per CPU), but your mileage may vary.
CELERYD_PREFETCH_MULTIPLIER:
According to the docs, prefetch is how many tasks are reserved per "process". It's a bit like a mini-queue just for that specific thread. This would indeed mean that your ONE started worker could 'reserve' up to 40 * 4 tasks. Keep in mind that these reserved tasks cannot be "stolen" or sent to another worker or thread, so if they are long-running you may wish to disable this feature so that faster workers can pick up the slack of slower ones.
If this isn't clear in your current setup, I might suggest adding a sleep() to your task to be able to observe it.
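The arithmetic in this answer can be sketched as follows. This is only an illustration, not Celery's own code: the function name max_reserved_tasks is hypothetical, and it assumes the documented rule that the prefetch limit is the multiplier times the number of concurrent processes, with a multiplier of 0 meaning no limit.

```python
from typing import Optional


def max_reserved_tasks(concurrency: int, prefetch_multiplier: int = 4) -> Optional[int]:
    """Upper bound on tasks one worker may reserve from the broker at once.

    Hypothetical helper for illustration only. Returns None when the
    multiplier is 0, which the Celery docs describe as "no limit".
    """
    if concurrency < 1 or prefetch_multiplier < 0:
        raise ValueError("concurrency must be >= 1 and multiplier >= 0")
    if prefetch_multiplier == 0:
        return None
    return concurrency * prefetch_multiplier


print(max_reserved_tasks(40))     # 160
print(max_reserved_tasks(40, 1))  # 40
```

With the settings from the question, a single worker could hold up to 160 messages, so a second worker started later may sit idle until the first one's reservation drains.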

Related

Django Celery, Celery beat workers

In Celery I have 3 types of tasks. The first runs every 3 minutes and takes almost 1 minute to complete. The second is periodic, runs every Monday, and takes almost 10 minutes to complete. The third sends users emails for registration and forgotten passwords. I'm confused about how many workers and Celery beat instances I should use. Can anyone help me out, please?

Usually you'll have only one Celery beat instance to schedule your tasks. If you have more than one instance, each task will be scheduled once per beat instance, so you will get duplicates.
There's no hard and fast rule for how many Celery workers you should have. Start with a handful, e.g. three, and keep an eye on the metrics (you can use something like https://flower.readthedocs.io/en/latest/index.html to create dashboards).

Rate limit a celery task without blocking other tasks

I am trying to limit the rate of one celery task. Here is how I am doing it:
from project.celery import app
app.control.rate_limit('task_a', '10/m')
It is working well. However, there is a catch. Other tasks that this worker is responsible for are being blocked as well.
Let's say, 100 of task_a have been scheduled. As it is rate-limited, it will take 10 minutes to execute all of them. During this time, task_b has been scheduled as well. It will not be executed until task_a is done.
Is it possible to not block task_b?
By the looks of it, this is just how it works. I just didn't get that impression after reading the documentation.
Other options include:
Separate worker and queue only for this task
Adding an eta to task_a so that all of them are scheduled to run during the night
What is the best practice in such cases?

This should be part of the task declaration to work on a per-task basis. Setting it via control is probably why it has this side effect on other tasks.
    @app.task(rate_limit='10/m')
    def task_a():
        ...
After more reading, the documentation says:
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
You will probably have to do this in a separate queue.
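A separate queue for the rate-limited task might be configured roughly like this. It is a sketch only: the queue names "rate_limited" and "default" are hypothetical, the task name is taken from the question, and the setting names are the old-style ones used elsewhere on this page.

```python
# celeryconfig.py (sketch): send task_a to its own queue so that a
# dedicated worker can apply the rate limit without delaying task_b.
# The queue names "rate_limited" and "default" are hypothetical.

# Old-style setting names; newer Celery versions spell these
# task_default_queue and task_routes.
CELERY_DEFAULT_QUEUE = "default"
CELERY_ROUTES = {
    "task_a": {"queue": "rate_limited"},
}

# Then start one worker per queue, for example:
#   celery -A project worker -Q rate_limited -c 1
#   celery -A project worker -Q default
```

Because the limit is per worker instance, running a single worker on the rate-limited queue is what keeps the effective rate at 10 per minute overall.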

The easiest (no coding required) way is separating the task into its own queue and running a dedicated worker just for this purpose.
There's no shame in that, it is totally fine to have many Celery queues and workers, each dedicated just for a specific type of work. As an added bonus you may get some more control over the execution, you can easily turn workers ON/OFF to pause certain processes if needed, etc.
On the other hand, having lots of specialized workers idle most of the time (waiting for a specific job to be queued) is not particularly memory-efficient.
Thus, if you need to rate limit more tasks and expect the specialized workers to be idle most of the time, you may consider improving efficiency by implementing a Token Bucket. With that, all your workers can be general-purpose and you can scale them naturally as your overall load increases, knowing that the work distribution will no longer be crippled by a single task's rate limit.
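For reference, a token bucket can be sketched in a few lines of plain Python. This is a minimal in-process version, not the answerer's implementation: the class and method names are hypothetical, and sharing the limit across several workers would additionally require keeping the bucket state in a shared store, which is not shown here.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_consume(self, amount: float = 1.0) -> bool:
        """Take `amount` tokens if available; return False otherwise."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# 10 per minute, as in the question, with bursts of at most 1.
bucket = TokenBucket(rate=10 / 60, capacity=1)
print(bucket.try_consume())  # True
print(bucket.try_consume())  # False
```

Inside a task, a failed try_consume() could be answered by retrying the task later instead of sleeping, so the worker stays free for other work in the meantime.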

Spread Celery Django tasks out over 24 hours

I have tasks that do a get request to an API.
I have around 70,000 requests to make, and I want to spread them out over 24 hours, so that they do not all run at the same time (for example, all at 10 AM).
How would I do that with Celery and Django? I have been searching for hours but can't find a good, simple solution.
The database has a list of games that need to be refreshed. Currently I have a cron job that creates tasks every hour. Would it be better to create a task for every game and make it repeat every hour?

The typical approach is to send the tasks whenever you need the work done, no matter how many there are (even hundreds of thousands). Execution is then controlled by how many workers (and worker processes) are subscribed to a dedicated queue. The dedicated queue is the key: it is a common way of keeping all workers from starting on the newly created tasks at once. This goes beyond basic Celery usage. You need to use celery multi for this use case, or start two or more separate Celery workers manually with different queues.
If you do not want to over-complicate things, you can keep your current setup but give these tasks the lowest priority, so that any new, more important task is executed first. The problem with this approach is that, as far as I know, only the Redis and RabbitMQ brokers support priorities.
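If the goal is an even spread rather than just a capped execution rate, another option is to compute a delay per task and pass it as the countdown when enqueuing. The sketch below only does the arithmetic; the helper name spread_countdowns and the task name refresh_game in the comment are hypothetical.

```python
from typing import List


def spread_countdowns(n_tasks: int, window_seconds: float = 24 * 60 * 60) -> List[float]:
    """Evenly spaced delays (in seconds) that cover the given window."""
    if n_tasks <= 0:
        return []
    step = window_seconds / n_tasks
    return [i * step for i in range(n_tasks)]


delays = spread_countdowns(70_000)
print(round(delays[1], 3))   # 1.234
print(delays[-1] < 86_400)   # True

# With a Celery task (hypothetical name refresh_game), each delay could be
# passed as the countdown:
#   for game_id, delay in zip(game_ids, delays):
#       refresh_game.apply_async(args=[game_id], countdown=delay)
```

For 70,000 requests this works out to roughly one request every 1.23 seconds. Tasks with a countdown are typically held by a worker until they are due, so for a batch this large it may be gentler to enqueue one hour's worth at a time from the existing cron job.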

Understanding celery task prefetching

I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER (docs). The default is 4, but (I believe) I want the prefetching off or as low as possible. I set it to 1 now, which is close enough to what I'm looking for, but there's still some things I don't understand:
Why is this prefetching a good idea? I don't really see a reason for it, unless there's a lot of latency between the message queue and the workers (in my case, they are currently running on the same host and at worst might eventually run on different hosts in the same data center). The documentation only mentions the disadvantages, but fails to explain what the advantages are.
Many people seem to set this to 0, expecting to be able to turn off prefetching that way (a reasonable assumption in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching, doesn't that entirely eliminate the concurrency/asynchronicity you introduced a task queue for in the first place?
Why can prefetching not be turned off? It might not be a good idea for performance to turn it off in most cases, but is there a technical reason for this not to be possible? Or is it just not implemented?
Sometimes, this option is connected to CELERY_ACKS_LATE. For example, Roger Hu writes «[…] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements […]» I don't understand how these two options are connected and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why the two options are connected?

Prefetching can improve performance. Workers don't need to wait for the next message from the broker before they can process it. Communicating with the broker once and processing many messages gives a performance gain, because getting a message from a broker (even a local one) is expensive compared to local memory access. Workers are also allowed to acknowledge messages in batches.
Prefetching set to zero means "no specific limit" rather than unlimited.
Setting prefetching to 1 is documented to be equivalent to turning it off, but this may not always be the case (see https://stackoverflow.com/a/33357180/71522).
Prefetching allows messages to be acked in batches. CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker.
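The performance argument can be made concrete with a toy cost model. It is a deliberate simplification that assumes one broker round trip delivers a whole batch of prefetched messages and ignores pipelining; the function name and the latency figures are made up for illustration.

```python
import math


def total_time(n_messages: int, prefetch: int,
               round_trip: float, work: float) -> float:
    """Toy model: each broker round trip delivers `prefetch` messages."""
    trips = math.ceil(n_messages / prefetch)
    return trips * round_trip + n_messages * work


# 1000 short tasks (1 ms each) with a 5 ms broker round trip (made-up numbers).
print(total_time(1000, prefetch=1, round_trip=0.005, work=0.001))  # about 6.0 s
print(total_time(1000, prefetch=4, round_trip=0.005, work=0.001))  # about 2.25 s
```

The gain shrinks as tasks get longer: with tasks that run for minutes, the round trip is negligible and the cost of messages sitting reserved behind a slow task dominates instead.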

Old question, but still adding my answer in case it helps someone. My understanding from some initial testing was same as that in David Wolever's answer. I just tested this more in celery 3.1.19 and -Ofair does work. Just that it is not meant to disable prefetch at the worker node level. That will continue to happen. Using -Ofair has a different effect which is at the pool worker level. In summary, to disable prefetch completely, do this:
1. Set CELERYD_PREFETCH_MULTIPLIER = 1
2. Set CELERY_ACKS_LATE = True at a global level or task level
3. Use -Ofair while starting the workers
If you set concurrency to 1, then step 3 is not needed. If you want a higher concurrency, then step 3 is essential to avoid tasks getting backed up in a node that could be running long-running tasks.
Adding some more details:
I found that the worker node will always prefetch by default. You can only control how many tasks it prefetches by using CELERYD_PREFETCH_MULTIPLIER. If set to 1, it will only prefetch as many tasks as the number of pool workers (concurrency) in the node. So if you had concurrency = n, the max tasks prefetched by the node will be n.
Without the -Ofair option, what happened for me was that if one of the pool worker processes was executing a long-running task, the other workers in the node would also stop processing the tasks already prefetched by the node. With -Ofair, that changed: even though one of the workers in the node was executing a long-running task, the others would not stop and would continue to process the tasks prefetched by the node. So I see two levels of prefetching, one at the worker node level and the other at the individual pool worker level. Using -Ofair seemed to disable it at the pool worker level for me.
How is ACKS_LATE related? ACKS_LATE = True means that the task will be acknowledged only when the task succeeds. If not, I suppose it would happen when it is received by a worker. In case of prefetch, the task is first received by the worker (confirmed from logs) but will be executed later. I just realized that prefetched messages show up under "unacknowledged messages" in rabbitmq. So I'm not sure if setting it to True is absolutely needed. We anyway had our tasks set that way (late ack) for other reasons.
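Put together, the two settings from this answer would look roughly like this in a config module. This is a sketch with the old-style setting names used above; the app name proj in the comment is a placeholder.

```python
# celeryconfig.py (sketch), old-style setting names as used in this answer.
CELERYD_PREFETCH_MULTIPLIER = 1  # reserve at most one message per pool process
CELERY_ACKS_LATE = True          # acknowledge after the task has run, not on receipt

# The -Ofair part is a command-line option rather than a setting:
#   celery -A proj worker -Ofair
```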

Just a warning: as of my testing with the redis broker + Celery 3.1.15, all of the advice I've read pertaining to CELERYD_PREFETCH_MULTIPLIER = 1 disabling prefetching is demonstrably false.
To demonstrate this:
1. Set CELERYD_PREFETCH_MULTIPLIER = 1
2. Queue up 5 tasks that will each take a few seconds (e.g., time.sleep(5))
3. Start watching the length of the task queue in Redis: watch redis-cli -c llen default
4. Start celery worker -c 1
5. Notice that the queue length in Redis will immediately drop from 5 to 3
CELERYD_PREFETCH_MULTIPLIER = 1 does not prevent prefetching, it simply limits the prefetching to 1 task per queue.
-Ofair, despite what the documentation says, also does not prevent prefetching.
Short of modifying the source code, I haven't found any method for entirely disabling prefetching.
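One explanation that is consistent with this observation is that, with early acknowledgement, the task being executed no longer counts against the prefetch limit, so one more message is reserved on top of it. The toy model below encodes that assumption; it is not Celery's code, and the function name is hypothetical.

```python
from collections import deque


def queue_length_after_start(queued: int, concurrency: int,
                             prefetch_multiplier: int) -> int:
    """Toy model of a worker that acknowledges messages on receipt.

    Each pool process takes one message and acknowledges it immediately,
    which frees the broker to deliver up to concurrency * prefetch_multiplier
    further messages that wait, reserved, inside the worker.
    """
    queue = deque(range(queued))
    # Messages that start executing right away (already acknowledged).
    for _ in range(min(concurrency, len(queue))):
        queue.popleft()
    # Messages reserved on top of those, up to the prefetch limit.
    limit = concurrency * prefetch_multiplier
    for _ in range(min(limit, len(queue))):
        queue.popleft()
    return len(queue)


print(queue_length_after_start(5, concurrency=1, prefetch_multiplier=1))  # 3
```

Under this model, late acknowledgement would keep the executing task counted against the limit, which may be why the other answers pair the multiplier of 1 with CELERY_ACKS_LATE = True.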

I cannot comment on David Wolever's answers, since my stackcred isn't high enough, so I've framed my comment as an answer to share my experience with Celery 3.1.18 and a MongoDB broker. I managed to stop prefetching with the following:
add CELERYD_PREFETCH_MULTIPLIER = 1 to the celery config
add CELERY_ACKS_LATE = True to the celery config
Start the celery worker with the options: --concurrency=1 -Ofair
With CELERY_ACKS_LATE left at its default, the worker still prefetches. Just like the OP, I don't fully grasp the link between prefetching and late acks. I understand what David says, "CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker", but I fail to understand why late acks would be incompatible with prefetch. In theory, a prefetch would still allow acking late, right, even if it is not coded that way in Celery?

I experienced something a little bit different with SQS as broker.
The setup was:
CELERYD_PREFETCH_MULTIPLIER = 1
ACKS_ON_FAILURE_OR_TIMEOUT=False
CELERY_ACKS_LATE = True
CONCURRENCY=1
After a task failed (an exception was raised), the worker became unavailable, since the message was not acked in either the local or the remote queue.
The solution that made the workers continue consuming work was setting
CELERYD_PREFETCH_MULTIPLIER = 0
I can only speculate that acks_late was not taken into consideration when the SQS transport was written.

Celery rate_limit affecting multiple tasks

I have a setup with rabbitmq and celery, with workers running on 4 machines with 4 instances each. I have two task functions defined, which basically call the same backend function, but one of them named process_transaction with no rate_limit defined, and another called slow_process_transaction, with rate_limit="6/m". The tasks go to different queues on rabbitmq, slow and normal.
The strange thing is the rate_limit being enforced for both tasks. If I try to change the rate_limit using celery.control.rate_limit, doing it with the process_transaction doesn't change the effective rate, and using the slow_process_transaction name changes the effective rate for both.
Any ideas on what is wrong?

By reading the bucket source code I figured out celery implements rate limiting by sleeping the time delta after finishing a task, so if you mix tasks with different rate limits in the same workers, they affect each other.
Separating the workers solved my problem, but it's not the optimal solution.
You can separate the workers by using node names and named parameters on your celeryd call. For instance, you have nodes 'fast' and 'slow', and you want them to consume separate queues with concurrency 5 and 1 respectively:
celeryd <other opts> -Q:fast fast_queue -c:fast 5 -Q:slow slow_queue -c:slow 1
