Configure Python Celery for Long Running Tasks

I have a setup where I run long idempotent tasks on AWS spot instances but I can't work out how to set up Celery to elegantly handle workers being killed mid task.
At the moment, if a worker is killed, the task is marked as failed (WorkerLostError). I found the documentation on the subject to be a bit lean, but it suggests that you should use CELERY_ACKS_LATE for this scenario. This isn't working for me; the task is still marked as failed.
When I had CELERY_ACKS_LATE=False the task just stayed stuck as PENDING, so at least now I can tell that it has failed, which is a good start.
Here are my config settings at the moment:
# I'm using rabbit-mq as the broker
BROKER_HEARTBEAT = 10
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_TRACK_STARTED = True
I have a task spinning on a master server that checks for the results of outstanding tasks and handles updating my local db to mark the tasks as complete (and performs work with the results). At this stage I think I'm going to have to catch the 'Worker exited prematurely: signal 15 (SIGTERM)' scenario and retry the task.
It feels like this should all be handled by celery, so I feel like I've missed something fundamental in my config.
Given idempotent tasks and workers that will fail, what is the best way to configure celery so that those tasks are picked up by a different worker?
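For reference, the combination that the related threads below converge on (late acks plus rejecting the task when the worker is lost) would look roughly like this; a sketch only, assuming Celery 4+ lowercase setting names (the CELERY_* names above are the older equivalents):
app.conf.update(
    task_acks_late=True,              # ack only after the task has finished
    task_reject_on_worker_lost=True,  # re-queue the message if the worker process is killed mid-task
    worker_prefetch_multiplier=1,     # reserve at most one extra task per worker process
)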

Related

Celery worker prefetching too many tasks on other worker network failure ignoring worker_prefetch_multiplier

Recently we encountered an issue with Celery where, when one worker X has network connectivity issues, the tasks appear to be redelivered to other workers. There's nothing wrong with that, but the problem is that a few workers suddenly prefetch around 150 of these redelivered tasks. This happens even when we set worker_prefetch_multiplier to 1.
So when we re-launch worker X, which had the connectivity issues, it doesn't fetch any tasks anymore, because they were already prefetched by the other workers, even though we didn't want the other workers to prefetch more than 1 task at a time.
Prefetching works fine on regular task submission and processing; however, in a situation like the one described above, where one of the workers has connectivity issues, weird things start to happen.
We know that the other workers are absorbing most of these messages because we tried to stop all of them. Upon pressing CTRL+C we see messages on two of the workers like this:
Restoring 150 unacknowledged message(s)
instead of usual message (when things are fine)
Restoring 1 unacknowledged message(s)
How is this possible that it prefetched 150 tasks upon other worker failure, when we set worker_prefetch_multiplier to 1?
We inspected Redis once the tasks had been restored to the queue to see what we could find there. Not a Celery expert here, but this line from the Celery internals seemed interesting:
\", \"kwargsrepr\": \"{}\", \"origin\": \"WORKER_NAME_WHICH_HAD_CONNECTIVITY_ISSUES\", \"ignore_result\": false, \"redelivered\": true},
Origin pointed to the worker which had connection issues and redelivered is set to true.
Does anyone have an idea why a few of the workers suddenly had 150 unacknowledged tasks each, even when we had worker_prefetch_multiplier set to 1?
This happened only when another worker had network issues. Under normal conditions the prefetching appears to work fine.
Below is how we launch the worker and the tasks:
# How the worker is started (shell):
#   celery -A tasks worker -n "someworkername" --loglevel=INFO -Q "hardcodedqueuename" -c 1

from celery import group

# Worker/broker configuration
app.conf['worker_prefetch_multiplier'] = 1
app.conf.broker_transport_options = {"visibility_timeout": 6 * 3600}

# Build the task signatures and chain a finalize step after the group
signatures = []
signatures.append(scan_task.s(arg1, args).set(queue=queue_name))
finalize_func = finalize.s(context)
group_task = group(signatures, queue=queue_name) | finalize_func
group_task()

Does restarting celery cause duplicate tasks?

I have an email task in celery that has an eta of 10 days from now(). However, I'm finding that some people are getting 5-6 duplicate emails at a time. I've come across this problem before when the visibility_timeout in BROKER_TRANSPORT_OPTIONS was set too low. Now I have this in my settings file:
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 2592000} #30 days
So that shouldn't be a problem any more. I'm just wondering if there is anything else that can cause it, e.g. restarting celery. Celery gets restarted every time I deploy new code, and that can happen 5 or more times a week, so it's the only thing I can think of.
Any ideas?
Thanks.
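For context, the relationship being described looks roughly like this (the task name and arguments are hypothetical; the point is that the visibility_timeout must exceed the longest eta in use, otherwise the broker re-delivers the still-unacknowledged message and duplicates appear):
from datetime import datetime, timedelta

BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 2592000}  # 30 days, longer than the 10 day eta

# Hypothetical email task scheduled 10 days out, as in the question
send_email.apply_async(args=[user_id], eta=datetime.utcnow() + timedelta(days=10))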
Task duplication is possible if the worker/beat processes did not stop correctly. How do you restart the celery workers/beat? Check the server for zombie celery worker and beat processes. Try to stop all celery processes, check that no celery processes remain, and start them again. Afterwards, check that ps ax | grep celery shows fresh workers and only one beat.
Tasks won't restart after an incorrect worker stop if you set CELERY_ACKS_LATE = False. In that case the task is marked as acknowledged immediately after it is consumed. See the docs.
Also make sure that your tasks don't have retry enabled. If an exception happens inside a task, it might be retried with the same input arguments.
Another possible cause: your tasks are written incorrectly and each run selects the same set of recipients.
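Along the lines of that last point, one way to make the email task idempotent is to record what has already been sent and turn a duplicate run into a no-op. A rough sketch, with hypothetical model and helper names:
@app.task
def send_reminder(recipient_id):
    recipient = Recipient.objects.get(pk=recipient_id)  # hypothetical model
    if recipient.reminder_sent:
        return  # a redelivered duplicate becomes a no-op
    send_reminder_email(recipient)                      # hypothetical helper that actually sends
    recipient.reminder_sent = True
    recipient.save(update_fields=["reminder_sent"])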

Persistent Long Running Tasks in Celery

I'm working on a Python based system, to enqueue long running tasks to workers.
The tasks originate from an outside service that generates a "token", but once they're created based on that token, they should run continuously, and be stopped only when explicitly removed by code.
The task starts a WebSocket and loops on it. If the socket is closed, it reopens it. Basically, the task shouldn't reach conclusion.
My goals in architecting this solution are:
When gracefully restarting a worker (for example to load new code), the task should be re-added to the queue, and picked up by some worker.
Same thing should happen when ungraceful shutdown happens.
2 workers shouldn't work on the same token.
Other processes may create more tasks that should be directed to the same worker that's handling a specific token. This will be resolved by sending those tasks to a queue named after the token, which the worker should start listening to after starting the token's task. I am listing this requirement as an explanation of why a task engine is even required here.
Independent servers, fast code reload, etc. - Minimal downtime per task.
All our server side is Python, and it looks like Celery is the best platform for it.
Are we using the right technology here? Any other architectural choices we should consider?
Thanks for your help!
According to the docs
When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates, so if these tasks are important you should wait for it to finish before doing anything drastic (like sending the KILL signal).
If the worker won’t shutdown after considerate time, for example because of tasks stuck in an infinite-loop, you can use the KILL signal to force terminate the worker, but be aware that currently executing tasks will be lost (unless the tasks have the acks_late option set).
You may get something like what you want by using retry or acks_late
Overall I reckon you'll need to implement some extra application-side job control, plus, maybe, a lock service.
But, yes, overall you can do this with celery. Whether there are better technologies... that's out of the scope of this site.
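A rough sketch of the kind of application-side lock mentioned above, combining acks_late with a Redis lock so that two workers never work on the same token (the redis client, the run_websocket_loop helper, and the timeout value are assumptions, not part of any Celery API):
import redis
from celery import Celery

app = Celery('tokens', broker='redis://localhost:6379/0')
redis_client = redis.Redis()

@app.task(acks_late=True)
def handle_token(token):
    lock = redis_client.lock(f"token-lock:{token}", timeout=3600)
    if not lock.acquire(blocking=False):
        return  # another worker already owns this token
    try:
        run_websocket_loop(token)  # hypothetical: opens the WebSocket and loops, reopening it if closed
    finally:
        lock.release()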

What happens to a Celery Worker's scheduled (eta) tasks when it shuts down?

I've been learning about celery and haven't been able to find the answer to a conceptual question and have had odd results experimenting.
When there are scheduled tasks (by scheduled, I don't mean periodic but scheduled to run in the future using eta=x) submitted to Celery, they seem to be consumed from the queue by a worker right away (rather than staying in the Redis default celery key/queue). Presumably, the worker will actually execute the tasks at eta.
What happens if that worker were to be shut down or restarted (to update its registered tasks, for example)? Would those scheduled tasks be lost? They are not "running", so a warm shutdown wouldn't wait for them to finish, of course.
Is there a way to force those tasks to be return to the queue and consumed by the next available worker?
I suppose, manually, one could dump the tasks before shutting down a worker:
http://celery.readthedocs.org/en/latest/userguide/workers.html#inspecting-workers
and resubmit them when a new worker is back up... but is this supposed to happen automatically?
Would really appreciate any help with this
Thanks
Take a look at acks_late
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.acks_late
If set to true Celery will keep the task in the queue until it has been successfully executed.
Update: Celery 5.1
Even if acks_late is enabled, the worker will still acknowledge the message when the process executing the task exits abruptly. This is the default and intentional behaviour set forth by the library. [Ref]
To change the default settings and re-queue your unfinished tasks, you can use the task_reject_on_worker_lost config. [Ref]
Although keep in mind that this could lead to a message loop and can cause unintended effects if your tasks are not idempotent.
Specifically for eta tasks, queues wait for workers to acknowledge the tasks before deleting them. With default settings, celery workers ack right before the task is executed, and with acks_late, when the task has finished executing.
So when workers fail to ack a task, typically because of a shutdown/restart/lost connection, or because the Redis/SQS visibility_timeout was exceeded [ref], the queue will redeliver the message to any available worker.
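In newer Celery versions this combination can also be set per task rather than globally; a sketch with an illustrative task name:
@app.task(acks_late=True, reject_on_worker_lost=True)
def send_report(report_id):
    # if the worker dies before this returns, the broker re-delivers the message
    ...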

Understanding celery task prefetching

I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER (docs). The default is 4, but (I believe) I want prefetching off, or as low as possible. I've set it to 1 now, which is close enough to what I'm looking for, but there are still some things I don't understand:
Why is this prefetching a good idea? I don't really see a reason for it, unless there's a lot of latency between the message queue and the workers (in my case, they are currently running on the same host and at worst might eventually run on different hosts in the same data center). The documentation only mentions the disadvantages, but fails to explain what the advantages are.
Many people seem to set this to 0, expecting to be able to turn off prefetching that way (a reasonable assumption in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching? Doesn't that entirely eliminate the concurrency/asynchronicity you introduced a task queue for in the first place?
Why can prefetching not be turned off? It might not be a good idea for performance to turn it off in most cases, but is there a technical reason for this not to be possible? Or is it just not implemented?
Sometimes, this option is connected to CELERY_ACKS_LATE. For example, Roger Hu writes «[…] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements […]». I don't understand how these two options are connected and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why the two options are connected?
Prefetching can improve performance: workers don't need to wait for the next message from the broker before processing. Communicating with the broker once and fetching a batch of messages gives a performance gain, because getting a message from a broker (even a local one) is expensive compared to local memory access. Workers are also allowed to acknowledge messages in batches.
Prefetching set to zero means "no specific limit" rather than unlimited.
Setting prefetching to 1 is documented to be equivalent to turning it off, but this may not always be the case (see https://stackoverflow.com/a/33357180/71522)
Prefetching allows workers to ack messages in batches. CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker.
Old question, but still adding my answer in case it helps someone. My understanding from some initial testing was the same as that in David Wolever's answer. I just tested this further in celery 3.1.19, and -Ofair does work; it's just not meant to disable prefetching at the worker node level. That will continue to happen. Using -Ofair has a different effect, at the pool worker level. In summary, to disable prefetch completely, do this:
Set CELERYD_PREFETCH_MULTIPLIER = 1
Set CELERY_ACKS_LATE = True at a global level or task level
Use -Ofair while starting the workers
If you set concurrency to 1, then step 3 is not needed. If you want a higher concurrency, then step 3 is essential to avoid tasks getting backed up in a node that could be running long-running tasks.
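Putting the three steps together, a sketch using the Celery 3.x-style names from this thread (the app name and concurrency value are placeholders):
# celeryconfig.py
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_ACKS_LATE = True

# shell: start the worker with fair scheduling
#   celery -A proj worker --concurrency=4 -Ofair --loglevel=INFO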
Adding some more details:
I found that the worker node will always prefetch by default. You can only control how many tasks it prefetches by using CELERYD_PREFETCH_MULTIPLIER. If set to 1, it will only prefetch as many tasks as there are pool workers (concurrency) in the node. So if you have concurrency = n, the max number of tasks prefetched by the node will be n.
Without the -Ofair option, what happened for me was that if one of the pool worker processes was executing a long-running task, the other workers in the node would also stop processing the tasks already prefetched by the node. By using -Ofair, that changed: even though one of the workers in the node was executing a long-running task, the others would not stop processing and would continue to process the tasks prefetched by the node. So I see two levels of prefetching: one at the worker node level, the other at the individual worker level. Using -Ofair, for me, seemed to disable it at the worker level.
How is ACKS_LATE related? ACKS_LATE = True means that the task will be acknowledged only when the task succeeds. If not set, I suppose it happens when the task is received by a worker. In the case of prefetch, the task is first received by the worker (confirmed from the logs) but is executed later. I just realized that prefetched messages show up under "unacknowledged messages" in RabbitMQ. So I'm not sure if setting it to True is absolutely needed. We had our tasks set that way (late ack) anyway, for other reasons.
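If you want to confirm this on the broker side, RabbitMQ can list ready versus unacknowledged (i.e. prefetched or in-progress) messages per queue:
# shell
rabbitmqctl list_queues name messages_ready messages_unacknowledged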
Just a warning: as of my testing with the Redis broker + Celery 3.1.15, all of the advice I've read claiming that CELERYD_PREFETCH_MULTIPLIER = 1 disables prefetching is demonstrably false.
To demonstrate this:
Set CELERYD_PREFETCH_MULTIPLIER = 1
Queue up 5 tasks that will each take a few seconds (e.g., time.sleep(5))
Start watching the length of the task queue in Redis: watch redis-cli -c llen default
Start celery worker -c 1
Notice that the queue length in Redis will immediately drop from 5 to 3
CELERYD_PREFETCH_MULTIPLIER = 1 does not prevent prefetching; it simply limits the prefetching to 1 task per queue.
-Ofair, despite what the documentation says, also does not prevent prefetching.
Short of modifying the source code, I haven't found any method for entirely disabling prefetching.
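For anyone who wants to reproduce this, a minimal task along the lines of the steps above (the module name and broker URL are placeholders):
import time
from celery import Celery

app = Celery('repro', broker='redis://localhost:6379/0')

@app.task
def slow(seconds=5):
    # keep the worker busy so the prefetch behaviour is visible in redis-cli
    time.sleep(seconds)

# from a shell/REPL, queue five of them:
#   for _ in range(5): slow.delay()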
I cannot comment on David Wolever's answers, since my stackcred isn't high enough. So I've framed my comment as an answer, since I'd like to share my experience with Celery 3.1.18 and a MongoDB broker. I managed to stop prefetching with the following:
add CELERYD_PREFETCH_MULTIPLIER = 1 to the celery config
add CELERY_ACKS_LATE = True to the celery config
Start celery worker with options: --concurrency=1 -Ofair
Leaving CELERY_ACKS_LATE at the default, the worker still prefetches. Just like the OP, I don't fully grasp the link between prefetching and late acks. I understand what David says ("CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker"), but I fail to understand why late acks would be incompatible with prefetching. In theory, prefetching would still allow acking late, right, even if Celery isn't coded that way?
I experienced something a little bit different with SQS as broker.
The setup was:
CELERYD_PREFETCH_MULTIPLIER = 1
ACKS_ON_FAILURE_OR_TIMEOUT=False
CELERY_ACKS_LATE = True
CONCURRENCY=1
After a task failed (an exception was raised), the worker became unavailable, since the message was not acked in either the local or the remote queue.
The solution that made the workers continue consuming work was setting
CELERYD_PREFETCH_MULTIPLIER = 0
I can only speculate that acks_late was not taken into consideration when writing the SQS transport.
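Written out as config, the combination that ended up working looked roughly like this (new-style setting names assumed; the SQS broker URL is a placeholder):
app.conf.update(
    broker_url="sqs://",
    task_acks_late=True,
    task_acks_on_failure_or_timeout=False,
    worker_prefetch_multiplier=0,   # the workaround that let the worker keep consuming
)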
