I'm using Celery 3.1.x with 2 tasks. The first task (TaskOne) is enqueued when Celery starts up through the celeryd_after_setup signal:
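from celery.signals import celeryd_after_setup

@celeryd_after_setup.connect
def enqueue_task_one(*args, **kwargs):
    # handler renamed so it does not shadow the imported signal
    TaskOne().apply_async(countdown=5)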
When TaskOne is run, it does some calculations and then enqueues TaskTwo. Imagine the following workflow:
1. I start Celery, so the signal fires and TaskOne is enqueued.
2. After the countdown (5 seconds), TaskOne runs and enqueues TaskTwo.
3. I stop Celery (TaskTwo remains in the queue).
4. I restart Celery.
5. The workflow runs again and TaskTwo is enqueued a second time.
So we now have two TaskTwo instances in the queue. That is a problem for my workflow because I want at most one TaskTwo in the queue at any time, so a second one must not be enqueued.
My question: How can I achieve this?
With celery.app.control.Inspect.scheduled() (Docs) I can get a list of which tasks are scheduled, hidden in a combination of lists and dicts. This is maybe a way, but going through the result of this does not feel right. Is there any better way?
An easy-to-implement solution would be to add the --purge switch to your worker command. It clears the queue, so the worker starts with no scheduled jobs.
But beware: that's a kind of job-global, unrecoverable action. When there are other scheduled jobs you depend on, this is not your solution.
After considering several options I chose to use app.control.inspect.
It's not a really beautiful solution, but it works:
# fetch all scheduled tasks (scheduled() returns None if no worker replies)
scheduled_tasks = inspect().scheduled() or {}
# iterate the scheduled task values, see http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#dump-of-scheduled-eta-tasks
for task_values in scheduled_tasks.values():
    # task_values is a list of dicts
    for task in task_values:
        if task['request']['name'] == '{}.{}'.format(TaskTwo.__module__, TaskTwo.__name__):
            logger.info('TaskTwo is already scheduled, skipping additional run')
            return
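The same check can be pulled into a small helper that is easier to reuse and test. This is only a sketch: `is_task_scheduled` and the sample payload are hypothetical names made up for illustration, and the payload merely mimics the worker-name-to-list-of-dicts shape described above.

```python
def is_task_scheduled(scheduled, task_name):
    """Return True if any worker reports a scheduled task named task_name.

    `scheduled` is the value returned by inspect().scheduled(): a dict
    mapping worker names to lists of dicts, or None if no worker replied.
    """
    for worker_tasks in (scheduled or {}).values():
        for entry in worker_tasks:
            if entry.get('request', {}).get('name') == task_name:
                return True
    return False


# Hypothetical payload shaped like the output of inspect().scheduled()
sample = {
    'worker1@example.com': [
        {'eta': '2014-01-01 12:00:05', 'priority': 6,
         'request': {'name': 'tasks.TaskTwo', 'id': 'abc'}},
    ],
}

print(is_task_scheduled(sample, 'tasks.TaskTwo'))
print(is_task_scheduled(None, 'tasks.TaskTwo'))
```

Keep in mind that scheduled() only reports ETA/countdown tasks a worker has already received, so a task still sitting in the broker would not show up in this check.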
Related
I'm creating a Celery task in a situation where there are more task producers than consumers (workers). Since my queues are filling up and the workers consume in FCFS order, can I execute a specific task (given a task_id) immediately?
For example:
My tasks are queued in the following order: [1,2,3,4,5,6,7,8,9,0]. Tasks are fetched starting from index zero. Now a situation arises where I want to execute task 8 ahead of all the others. How can I do this?
The task does not have to be executed by a worker (because all workers may already be occupied). It can be run directly from the application. And when the task is completed (either by a worker or directly from the application), it should be deleted from the queue.
I know how to forcefully revoke a task (given a task_id), but how can I execute a task given an id?
how can I execute a task given an id?
The short answer is that you can't. Celery workers pull tasks off the broker backend as they become available.
Why not?
Note that this is not a limitation of Celery as such; rather, it is a characteristic of message queuing systems (MQS) in general. The point of an MQS is to decouple an application's components so that the producer can go on to do other work while workers execute the tasks asynchronously. In other words, once a task has been sent off it cannot be modified (but it can be removed as long as it has not been started yet).
What options are there?
Celery offers you several options to deal with lower- vs. higher-priority or short- and long-running tasks at task submission time:
Routing - tasks can be routed to different workers. So if your tasks [0 .. 9] are all long-running, except for task 8, you could route task 8 to a worker, or a set of workers, that deal with short-running tasks.
Timed execution - specify a countdown or estimated time of arrival (eta) for each task. That's a good option if you know that some tasks can be delayed for later execution i.e. when the system will be less busy. This leaves workers ready for those tasks that need to be executed immediately.
Task expiry - specify an expiry countdown or time, with a callback. This way the task will be revoked if it did not execute within the time allotted to it, and the callback can start an alternative course of action.
Check on task results periodically, revoke a task if it didn't start executing within some time. Note this is different from task expiry where the revoking only happens once a worker has fetched the task from the queue - if the queue is full the revoking may happen too late for your use case. Checking results periodically means you have another component in your system that does this and determines an alternate course of action.
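The first three options above map onto keyword arguments of apply_async (queue, countdown, eta, expires). As a rough sketch, the helper below only builds those keyword arguments; `submission_options`, the queue name 'short_tasks' and the concrete delays are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def submission_options(kind, now=None):
    """Build keyword arguments for apply_async() for one of the options above.

    The queue name and the delays are illustrative assumptions.
    """
    now = now or datetime.now(timezone.utc)
    if kind == 'routing':
        # send the task to a queue served by workers for short-running tasks
        return {'queue': 'short_tasks'}
    if kind == 'countdown':
        # run no earlier than 10 minutes from now
        return {'countdown': 600}
    if kind == 'eta':
        # run no earlier than a fixed point in time
        return {'eta': now + timedelta(hours=1)}
    if kind == 'expiry':
        # discard the task if it has not started within 2 minutes
        return {'expires': 120}
    raise ValueError('unknown option: {}'.format(kind))


# Intended use, assuming some_task is a Celery task:
#     some_task.apply_async(args=(8,), **submission_options('routing'))
print(submission_options('routing'))
```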
Celery sends tasks to idle workers.
I have a task that runs every 5 seconds, and I want this task to be sent to one specific worker only.
Other tasks can share the remaining workers.
Can Celery do this?
I would also like to know what this parameter is: CELERY_TASK_RESULT_EXPIRES
Does it mean that the task will not be sent to a worker in the queue?
Or does it stop the task if it runs too long?
Sure, you can. The best way to do it is to separate Celery workers by giving them different queues. You just need to make sure that the task in question goes to its own queue and that your worker listens on that particular queue.
The long version: http://docs.celeryproject.org/en/latest/userguide/routing.html
Just to answer your second question: CELERY_TASK_RESULT_EXPIRES is the time in seconds for which the result of the task is persisted. So after a task is over, its result is saved into your result backend. The result is kept there for the amount of time specified by that parameter. That is used when a task result might be accessed by different callers.
This has probably nothing to do with your problem. As for the first solution, as already stated you have to use multiple queues. However be aware that you cannot assign the task to a specific Worker Process, just to a specific Worker which will then assign it to a specific Worker Process.
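The separate-queue setup described in the answers above could look roughly like this. It is a sketch under assumptions: the task name 'proj.tasks.every_five_seconds' and the queue name 'dedicated' are invented, and `queue_for` is only a toy lookup to show the effect of the setting, not Celery's actual router.

```python
# Old-style (Celery 3.x) routing setting; task and queue names are hypothetical.
CELERY_ROUTES = {
    'proj.tasks.every_five_seconds': {'queue': 'dedicated'},
}


def queue_for(task_name, routes=CELERY_ROUTES, default='celery'):
    """Toy lookup: routed tasks go to their queue, everything else
    goes to the default queue (named 'celery' unless configured otherwise)."""
    return routes.get(task_name, {}).get('queue', default)


print(queue_for('proj.tasks.every_five_seconds'))
print(queue_for('proj.tasks.something_else'))
```

One worker would then be started with -Q dedicated so that it consumes only that queue, while the remaining workers are started with -Q celery and share all other tasks.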
How can I configure Celery so that one worker always runs the same task and, once the task has finished, starts it again on the same worker?
It looks like you will need to take two steps:
1. Create a separate queue for this task and route the task to that queue
2a. Create an infinite loop that calls your particular task, such as this answer
OR
2b. Have a recursive task that calls itself on completion (this could get messy)
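Option 2a can be sketched as a plain loop that submits the task, blocks until it finishes, and then submits it again. Nothing here comes from the linked answer: `run_repeatedly` and `FakeResult` are hypothetical names, and `FakeResult` only stands in for Celery's AsyncResult so the sketch runs without a broker.

```python
class FakeResult(object):
    """Stand-in for celery.result.AsyncResult (assumption for this sketch)."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


def run_repeatedly(submit, max_runs=None):
    """Submit a task, wait for it to finish, then submit it again.

    `submit` is any callable returning an object with a blocking get().
    With max_runs=None the loop never ends; the bound exists for testing.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        submit().get()  # block until this run has finished
        runs += 1
    return runs


print(run_repeatedly(lambda: FakeResult('done'), max_runs=3))
```

With Celery, `submit` would be something like `lambda: my_task.apply_async(queue='dedicated')`, where `my_task` and the queue name are again placeholders; the loop has to run outside of a task, since waiting on a result inside another task is discouraged.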
Is there any way to make Celery recheck if there are any tasks in the main queue ready to be started? Will the remote command add_consumer() do the job?
The reason: I am running several concurrent tasks, which spawn multiple sub-processes. When the tasks are done, the sub-processes sometimes take a few seconds to finish, so because the concurrency limit is maxed out by sub-processes, a new task from the queue is never started. And because Celery does not check again when the sub-processes finish, the queue stalls with no active tasks. Therefore I want to add a periodic task that tells Celery to recheck the queue and start the next task. How do I tell Celery this?
From the docs:
The add_consumer control command will tell one or more workers to start consuming from a queue. This operation is idempotent.
Yes, add_consumer does what you want. You could also combine that with a periodic task to "recheck the queue and start the next task" every so often (depending on your needs).
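A periodic task of that kind could be declared with a beat schedule entry along these lines. This is a config sketch only: the entry name, the task name 'proj.tasks.recheck_queue' and the 30-second interval are assumptions, and the task body itself (which would call something like app.control.add_consumer('celery', reply=True)) is not shown.

```python
from datetime import timedelta

# Old-style (Celery 3.x) beat setting; all names and the interval are hypothetical.
CELERYBEAT_SCHEDULE = {
    'recheck-main-queue': {
        # a task that would call app.control.add_consumer(...) in its body
        'task': 'proj.tasks.recheck_queue',
        'schedule': timedelta(seconds=30),
    },
}

print(CELERYBEAT_SCHEDULE['recheck-main-queue']['schedule'].total_seconds())
```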
I have to spawn Celery tasks that need to be namespaced in some way (for example by user id).
So I spawn them like this:
scrapper_start.apply_async((request.user.id,), queue=account.Account_username)
app.control.add_consumer(account.Account_username, reply=True)
The tasks are also spawned recursively, from other tasks.
Now I have to check whether the tasks of a queue are still executing. I tried checking the list length in Redis, but it only returns the correct number before Celery starts executing.
How can I solve this problem? I only need to check whether a queue or consumer is still executing or is already empty. Thanks
If you just want to inspect the queue, you can do this from a Python shell.
from celery.task.control import inspect
i = inspect()  # all workers; pass a list of worker hostnames to narrow it down
i.active()     # get a list of active tasks
In addition to checking which are currently executing, you can also do the following.
i.registered()  # get a list of registered tasks
i.scheduled()   # get a list of tasks waiting for their ETA/countdown
i.reserved()    # tasks that have been received but are waiting to be executed
Inspecting from a shell like this is good if you only want to check once in a while.
If, for some reason, you want to monitor them continuously, you can use Flower, which provides a beautiful interface to monitor workers.
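To turn the inspection calls above into a yes/no answer, the results of active() and reserved() can be reduced to a single flag. This is a minimal sketch; `queue_is_idle` is a hypothetical helper, and it assumes both results have the worker-name-to-list shape (or are None when no worker replies).

```python
def queue_is_idle(active, reserved):
    """True when no worker reports a running or prefetched task.

    Both arguments are shaped like the return values of inspect().active()
    and inspect().reserved(): {worker_name: [task_dict, ...]} or None.
    Note that None (no worker replied) is treated as idle here.
    """
    for report in (active, reserved):
        for tasks in (report or {}).values():
            if tasks:
                return False
    return True


# Hypothetical payloads
print(queue_is_idle({'w1@host': []}, {'w1@host': []}))
print(queue_is_idle({'w1@host': [{'name': 'scrapper_start'}]}, {}))
```

Because this only covers tasks the workers have already picked up, it would have to be combined with the Redis list length mentioned in the question to also catch messages still waiting in the broker.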