Force Celery to run next task in queue?

Is there any way to make Celery recheck if there are any tasks in the main queue ready to be started? Will the remote command add_consumer() do the job?
The reason: I am running several concurrent tasks, each of which spawns multiple sub-processes. When a task is done, its sub-processes sometimes take a few seconds to finish, and because the concurrency limit is still maxed out by those sub-processes, no new task from the queue is started. Celery does not check again once the sub-processes finish, so the queue stalls with no active tasks. Therefore I want to add a periodic task that tells Celery to recheck the queue and start the next task. How do I tell Celery to do this?

From the docs:
The add_consumer control command will tell one or more workers to start consuming from a queue. This operation is idempotent.
Yes, add_consumer does what you want. You could also combine it with a periodic task to "recheck the queue and start the next task" every so often (depending on your needs).

Related

Given a task_id, execute the task

I'm creating Celery tasks in a situation where there are more task producers than consumers (workers). Since my queues are filling up and the workers consume tasks in FCFS (first come, first served) order, can I execute a specific task (given a task_id) immediately?
For example:
My queue is filled like this: [1,2,3,4,5,6,7,8,9,0]. Tasks are fetched starting from index zero. Now a situation arises where I want to execute task 8 before all the others. How can I do this?
The worker need not execute that task (because there can be situations where a worker is already occupied). It can be run directly from the application. And when the task is completed (either by the worker or directly from the application), it should be deleted from the queue.
I know how to forcefully revoke a task (given a task_id), but how can I execute a task given an id?

how can I execute a task given an id?
The short answer is: you can't. Celery workers pull tasks off the broker backend as they become available.
Why not?
Note that this is not a limitation of Celery as such; rather, it is a characteristic of message queuing systems (MQS) in general. The point of an MQS is to decouple an application's components so that the producer can go on to do other work while workers execute the tasks asynchronously. In other words, once a task has been sent off it cannot be modified (but it can be removed as long as it has not been started yet).
What options are there?
Celery offers you several options for dealing with lower vs. higher priority, or short- and long-running tasks, at task submission time:
Routing - tasks can be routed to different workers. So if your tasks [0 .. 9] are all long-running, except for task 8, you could route task 8 to a worker, or a set of workers, that deal with short-running tasks.
Timed execution - specify a countdown or an estimated time of arrival (eta) for each task. That's a good option if you know that some tasks can be delayed for later execution, i.e. when the system will be less busy. This leaves workers free for the tasks that need to be executed immediately.
Task expiry - specify an expiry countdown or time with a callback. This way the task will be revoked if it didn't execute within the time allotted to it, and the callback can start an alternative course of action.
Periodic result checks - check on task results periodically and revoke a task if it didn't start executing within some time. Note this is different from task expiry, where the revoking only happens once a worker has fetched the task from the queue; if the queue is full, the revoking may happen too late for your use case. Checking results periodically means you have another component in your system that does this and determines an alternate course of action.

Can Celery assign a task to a specific worker?

Celery sends tasks to idle workers.
I have a task that runs every 5 seconds, and I want this task to be sent only to one specific worker.
Other tasks can share the remaining workers.
Can Celery do this?
I also want to know what this parameter means: CELERY_TASK_RESULT_EXPIRES
Does it mean that the task will not be sent to a worker in the queue?
Or does it stop the task if it runs too long?

Sure, you can. The best way to do it is to separate Celery workers by using different queues. You just need to make sure that the task you need goes to a separate queue, and that your worker listens on that particular queue.
The long story: http://docs.celeryproject.org/en/latest/userguide/routing.html

Just to answer your second question: CELERY_TASK_RESULT_EXPIRES is the time in seconds for which the result of a task is persisted. After a task is over, its result is saved into your result backend, and it is kept there for the amount of time specified by that parameter. This is useful when a task result might be accessed by different callers.
This probably has nothing to do with your problem. As for your first question, as already stated, you have to use multiple queues. However, be aware that you cannot assign the task to a specific worker process, only to a specific worker, which will then assign it to one of its worker processes.

Looking for ways and options to stop the worker process after task completion in celery?

The question context is from the standpoint of disposable infrastructure, using Docker for Celery workers.
I am looking for an option or a way to achieve the following:
I start a Docker container, which starts the Celery worker [along with concurrency]
It waits and picks up one task per worker
the workers shut down after the task is completed [success or failure]
The main process dies after all workers are down.
I dispose of the container and start a new one.

Celery limit number of specific task in queue

I'm using Celery 3.1.x with 2 tasks. The first task (TaskOne) is enqueued when Celery starts up through the celeryd_after_setup signal:
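from celery.signals import celeryd_after_setup

@celeryd_after_setup.connect
def enqueue_task_one(*args, **kwargs):
    # enqueue TaskOne to run 5 seconds after the worker has been set up
    TaskOne().apply_async(countdown=5)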
When TaskOne is run, it does some calculations and then enqueues TaskTwo. Imagine the following workflow:
I start celery, thus the signal is fired and TaskOne is enqueued
after the countdown (5) TaskTwo is enqueued
then I stop celery (the TaskTwo remains in the queue)
afterwards I restart celery
the workflow is run again and TaskTwo is enqueued again
So we have two TaskTwo instances in the queue. That is a problem for my workflow, because I only want one TaskTwo in the queue and want to prevent a second one from being enqueued.
My question: How can I achieve this?
With celery.app.control.Inspect.scheduled() (see the docs) I can get a list of the scheduled tasks, buried in a combination of lists and dicts. This might be a way, but going through its result does not feel right. Is there a better way?

An easy-to-implement solution would be to add the --purge switch to your worker command. It clears the queue, so the worker starts with no scheduled jobs.
But beware: that's a global, unrecoverable action affecting all jobs. If there are other scheduled jobs you depend on, this is not the solution for you.

After considering several options I chose to use app.control.inspect.
It's not a really beautiful solution, but it works:
# inside TaskOne.run(), before enqueuing TaskTwo:
# fetch all scheduled (eta/countdown) tasks; scheduled() returns None when no worker replies
scheduled_tasks = app.control.inspect().scheduled() or {}
# iterate the scheduled task values, see http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#dump-of-scheduled-eta-tasks
for task_values in scheduled_tasks.values():
    # task_values is a list of dicts, one per scheduled task on that worker
    for task in task_values:
        if task['request']['name'] == '{}.{}'.format(TaskTwo.__module__, TaskTwo.__name__):
            logger.info('TaskTwo is already scheduled, skipping additional run')
            return
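The nested loop can also be pulled out into a small pure function, which is easier to test in isolation. This is a sketch that assumes the dump format from the linked docs (worker name mapped to a list of dicts, each with a `request` dict holding the task `name`); `is_task_scheduled` is a hypothetical helper name.

```python
def is_task_scheduled(scheduled, task_name):
    """Return True if `task_name` appears in the output of inspect().scheduled().

    `scheduled` is assumed to map worker names to lists of dicts, each with a
    "request" dict holding the task "name". scheduled() returns None when no
    worker replies, so None is treated as "nothing scheduled".
    """
    for entries in (scheduled or {}).values():
        for entry in entries:
            if entry.get("request", {}).get("name") == task_name:
                return True
    return False
```

Inside TaskOne it would be called with `app.control.inspect().scheduled()` and the same task name string as above, returning early when it is True. Note that scheduled() only reports eta/countdown tasks that a worker has already received; a message still waiting in the broker is not listed.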

What happens to a Celery Worker's scheduled (eta) tasks when it shuts down?

I've been learning about celery and haven't been able to find the answer to a conceptual question and have had odd results experimenting.
When there are scheduled tasks (by scheduled, I don't mean periodic but scheduled to run in the future using eta=x) submitted to Celery, they seem to be consumed from the queue by a worker right away (rather than staying in the Redis default celery key/queue). Presumably, the worker will actually execute the tasks at eta.
What happens if that worker is shut down or restarted (to update its registered tasks, for example)? Would those scheduled tasks be lost? They are not "running", so a warm shutdown wouldn't wait for them to finish, of course.
Is there a way to force those tasks to be returned to the queue and consumed by the next available worker?
I suppose, manually, one could dump the tasks before shutting down a worker:
http://celery.readthedocs.org/en/latest/userguide/workers.html#inspecting-workers
and resubmit them when a new worker is back up... but is this supposed to happen automatically?
Would really appreciate any help with this
Thanks

Take a look at acks_late
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.acks_late
If set to True, Celery acknowledges the message only after the task has executed, so the task stays in the queue until then.

Update: Celery 5.1
Even with acks_late enabled, workers will acknowledge the message if the worker process executing the task exits abruptly or is killed. This is the default, intentional behavior of the library. [Ref]
To change this default and re-queue your unfinished tasks, you can use the task_reject_on_worker_lost setting. [Ref]
Keep in mind, though, that this could lead to a message loop and can cause unintended effects if your tasks are not idempotent.
Specifically for eta tasks, queues wait for workers to acknowledge the tasks before deleting them. With default settings, Celery workers ack right before the task is executed; with acks_late, they ack when the task has finished executing.
So when a worker fails to ack a task, for example because of a shutdown, restart or lost connection, or because the Redis/SQS visibility_timeout was exceeded [ref], the queue will redeliver the message to any available worker.
