Celery add task dynamically to chain - python

I am using celery 3 with Django.
I have a list of jobs in the database. A user can start a particular job, which starts a Celery task.
Now I want the user to be able to start multiple jobs; these should be added to the Celery queue and processed one after the other, not in parallel as they would be with plain async calls.
In short, I am trying to create a job scheduler with Celery where the user can select the jobs to execute and they will be executed sequentially.
If I use chain() then I cannot add new tasks to the chain dynamically.
What is the best solution?

A better primitive for you to use would be link instead of chain in Celery.
From the documentation:
s = add.s(2, 2)          # signature for add(2, 2)
s.link(mul.s(4))         # run mul(result, 4) once add completes
s.link(log_result.s())   # also run log_result(result) once add completes
You can see how this allows you to dynamically add a task to be executed by iterating through the required tasks in a loop, and linking each one's signature. After the loop you would want to call something like s.apply_async(...) to execute them.
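As a minimal sketch of that loop (run_job is a hypothetical task; each signature links the next one, so the jobs run strictly one after another):

from celery import shared_task

@shared_task
def run_job(job_id):
    # hypothetical task that executes a single job from the database
    ...

def start_jobs(job_ids):
    # build the first signature, then link each task to the one before it
    first = run_job.si(job_ids[0])
    current = first
    for job_id in job_ids[1:]:
        nxt = run_job.si(job_id)  # .si() = immutable: ignore the parent's result
        current.link(nxt)
        current = nxt
    first.apply_async()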

Related

Accessing Celery Group Task Results inside a Celery Worker

I have to spawn certain tasks and have them execute in parallel. However, I also need all of their results to be collected and updated centrally.
Is it possible to access the results of all these tasks within a parent task somehow? I know I can't call task_result.get() from a task since Celery doesn't allow it; is there any other way to achieve this?
You can make Celery wait for the result of a subtask (see the disable_sync_subtasks parameter to get()); it's just not recommended because you could deadlock the worker (see here for more details). So if you use it, you should know what you are doing.
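For completeness, the opt-in looks roughly like this inside a task body (child_task is a hypothetical subtask; this can deadlock if no worker is free to run it):

result = child_task.delay(some_arg)
value = result.get(disable_sync_subtasks=False)  # explicit opt-in to the risky synchronous wait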
The recommended way for your use case is to use a chord:
A chord is just like a group but with a callback. A chord consists of a header group and a body, where the body is a task that should execute after all of the tasks in the header are complete.
This would indeed require you to refactor your logic a bit so you don't need the subtasks' results inside the parent task but to process it in the chord's body.
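A minimal sketch of that shape (the task names here are hypothetical):

from celery import chord, shared_task

@shared_task
def fetch_part(n):
    # header task: one unit of the parallel work
    return n * 2

@shared_task
def aggregate(results):
    # body task: receives the list of all header results once they complete
    return sum(results)

# header group of parallel tasks, then the fan-in body
chord(fetch_part.s(i) for i in range(10))(aggregate.s())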

Flask Celery: Event synchronization type

I have a fairly unusual behavior I need to achieve with Celery. I understand that it is not recommended to have tasks block at all; however, I think it is necessary here, as I describe below. Pseudocode:
Task 1:
- Set event to false
- Start group of Task 2 instances
- Scrape website every few seconds to check for changes
- If changes found, set event

Task 2:
- Log into website with Selenium
- Block until event from Task 1 is set
- Perform website action with Selenium
I would want task2 to be executed multiple times in parallel for multiple users. Therefore checking the website for updates in each instance of task2 would result in a large number of requests to the website which is not acceptable.
For a normal flow like this, I would use task1 to start the login tasks in a group, and start another group after the condition has been met to execute the action tasks. However, the web action is time-sensitive and I don't want to re-open a new Selenium instance (which would defeat the purpose of having this structure in the first place).
I've seen examples like this: Flask Celery task locking, but using a Redis cache seems unnecessary for this application (and it does not need to be atomic, because the 'lock' is only modified by task1). I've also looked into Celery's remote control, but I'm not sure whether it can block until a signal is received.
There is a similar question here which was solved by splitting the task I want to block into two separate tasks, but again, I can't do this.
Celery tasks can themselves enqueue tasks, so it's possible to wait for an event like "it's 9am", and then spawn off a bunch of parallel tasks. If you need to launch an additional task on the completion of a group of parallel tasks (i.e., if you need a fan-in task at the completion of all fan-out tasks), the mechanism you want is chords.
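A rough sketch of that pattern under the question's constraints (the task names and the change check are hypothetical placeholders):

import time

from celery import group, shared_task

def site_changed():
    # placeholder for the actual scraping check
    return True

@shared_task
def perform_action(user_id):
    # hypothetical time-sensitive web action for one user
    return user_id

@shared_task
def watcher(user_ids):
    # a single watcher polls the site, so only one task issues requests
    while not site_changed():
        time.sleep(5)
    # once the condition is met, fan out the per-user actions in parallel
    group(perform_action.s(uid) for uid in user_ids).apply_async()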

Config celery to start task again after it has been finished

How can I configure Celery so that one worker always runs the same task, and starts it again on the same worker after it has finished?
It looks like you will need to take two steps:
1. Create a separate queue for this task, and route the task to that queue.
2a. Create an infinite loop that calls your particular task, as in this answer,
OR
2b. Have a recursive task that calls itself on completion (this could get messy); see the sketch after this list.
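A minimal sketch of option 2b (repeating_task and the queue name are hypothetical; the queue routing is step 1):

from celery import shared_task

@shared_task
def repeating_task():
    # ... do the actual work here ...
    # re-enqueue this task on its dedicated queue once the work is done
    repeating_task.apply_async(queue="repeat_queue")

# kick off the first run; a single worker consumes only this queue
repeating_task.apply_async(queue="repeat_queue")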

How to ensure task execution order per user using Celery, RabbitMQ and Django?

I'm running Django, Celery and RabbitMQ. What I'm trying to achieve is to ensure that tasks related to one user are executed in order (specifically, one at a time; I don't want task concurrency per user):
- Whenever a new task is added for a user, it should depend on the most recently added task.
- Additional functionality might include not adding a task to the queue if a task of this type is already queued for this user and has not yet started.
I've done some research and:
I couldn't find a way to link a newly created task with an already queued one in Celery itself; chains seem to be able to link only new tasks.
I think that both functionalities could be implemented with a custom RabbitMQ message handler, though it might be hard to code.
I've also read about celery-tasktree, and this might be the easiest way to ensure execution order, but how do I link a new task with a task tree that has already been applied_async, or with the queue? Is there any way I could implement that additional no-duplicate functionality using this package?
Edit: There is also this "lock" example in the Celery cookbook, and while the concept is fine, I can't see a way to make it work as intended in my case - if I can't acquire the lock for a user, the task would have to be retried, but this means pushing it to the end of the queue.
What would be the best course of action here?
If you configure the Celery workers so that they can only execute one task at a time (see the worker_concurrency setting), then you could enforce the concurrency that you need on a per-user basis. Using a method like

NUMBER_OF_CELERY_WORKERS = 10

def get_task_queue_for_user(user):
    # map each user to one of the N per-worker queues by user id
    return "user_queue_{}".format(user.id % NUMBER_OF_CELERY_WORKERS)
to get the task queue based on the user id, every task will be assigned to the same queue for each user. The workers would need to be configured to only consume tasks from a single task queue.
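For illustration, dispatching might then look like this (process_user_data is a hypothetical task name):

# route the task onto the user's dedicated queue at call time
process_user_data.apply_async(args=[user.id], queue=get_task_queue_for_user(user))

# each worker is then started against exactly one queue, e.g.:
#   celery worker -Q user_queue_9 --concurrency=1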
It would play out like this:
User 49 triggers a task
The task is sent to user_queue_9
When the one and only celery worker that is listening to user_queue_9 is ready to consume a new task, the task is executed
This is a hacky answer though, because
requiring just a single celery worker for each queue is a brittle system -- if the celery worker stops, the whole queue stops
the workers are running inefficiently

How to retrieve Celery tasks that have not yet started, using Django?

From reading the Celery documentation, it looks like I should be able to use the following python code to list tasks on the queue that have not yet been picked up:
from celery.task.control import inspect  # Celery 3-era import path

i = inspect()
tasks = i.reserved()  # tasks prefetched by workers but not yet executed
However, when running this code, the list of tasks is empty, even if there are items waiting in the queue (I have verified they are definitely in the queue using django-admin). The same is true for using the command line equivalent:
$ celeryctl inspect reserved
So I'm guessing this is not in fact what this command is for? If not, what is the accepted way to retrieve a list of tasks that have not yet started? Do I have to maintain my own list of task IDs in the code in order to query them?
The reason I ask is because I am trying to handle a situation where two tasks are queued which perform a write operation on the same object in the database. If both tasks execute in parallel and task 1 takes longer than task 2, it will overwrite the output from task 2, but I want the output from the most recent task, i.e. task 2. So my plan was to cancel any pending tasks that operate on an object each time a new task is added which will write to the same object.
Thanks
Tom
You can see pending tasks using scheduled instead of reserved.
$ celeryctl inspect scheduled
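The Python equivalent, using the same Celery 3-era import as in the question:

from celery.task.control import inspect

i = inspect()
pending = i.scheduled()  # tasks with an ETA/countdown, received but not yet run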
