I have to spawn certain tasks and have them execute in parallel. However, I also need the results of all these tasks to be collected centrally.
Is it possible to access the results of all these tasks within a parent task somehow? I know I can't call task_result.get() from inside a task since Celery doesn't allow it; is there any other way to achieve this?
You can make Celery wait for the result of a subtask (see the disable_sync_subtasks parameter to get()); it's just not recommended because you could deadlock the worker (see here for more details). So if you use it, you should know what you are doing.
The recommended way for your use case is to use a chord:
A chord is just like a group but with a callback. A chord consists of a header group and a body, where the body is a task that should execute after all of the tasks in the header are complete.
This would indeed require you to refactor your logic a bit so that you don't need the subtasks' results inside the parent task, but instead process them in the chord's body.
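For illustration, a minimal sketch of what that could look like; process_item and collect_results are hypothetical tasks standing in for your own, and app is your Celery instance:

from celery import chord

@app.task
def process_item(item):
    # stand-in for one of your parallel tasks
    return item * 2

@app.task
def collect_results(results):
    # chord body: receives the list of all header task results
    return sum(results)

# Run the header tasks in parallel; collect_results runs once they all finish.
chord(process_item.s(i) for i in range(10))(collect_results.s())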
Related
Pardon my ignorance as I am learning how I can use celery for my purposes.
Suppose I have two tasks: create_ticket and add_message_to_ticket. Usually the create_ticket task is created and completed before add_message_to_ticket tasks are created, possibly multiple times.
import random
import time

@app.task
def create_ticket(ticket_id):
    time.sleep(random.uniform(1.0, 4.0))  # replace with code that processes ticket creation
    return f"Successfully processed ticket creation: {ticket_id}"

@app.task
def add_message_to_ticket(ticket_id, who, when, message_contents):
    # TODO add code that checks whether the create_ticket task for ticket_id has already completed
    time.sleep(random.uniform(1.0, 4.0))  # replace with code that handles the added message
    return f"Successfully processed message for ticket {ticket_id} by {who} at {when}"
Now suppose that these tasks are created out of order because the Python server receives the events from an external web service out of order. For example, one add_message_to_ticket.delay(82, "auroranil", 1599039427, "This issue also occurs on Microsoft Edge on Windows 10.") gets called a few seconds before create_ticket.delay(82) gets called. How would I solve the following problems?
How would I fetch the result of the celery task create_ticket by specifying ticket_id within the task add_message_to_ticket? All I can think of is to maintain a database that stores ticket state and check whether a particular ticket has been created, but I want to know if I am able to use Celery's result backend somehow.
If I receive an add_message_to_ticket task with a ticket id for which I find that the corresponding create_ticket task has not completed, do I reject that task and put it back in the queue?
Do I need to ensure that the tasks are idempotent? I know that is good practice, but is it a requirement for this to work?
Is there a better approach to solving this problem? I am aware of the Celery Canvas workflow with primitives such as chain, but I am not sure how I can ensure that these events are processed in order, or put tasks in a pending state while they wait for the tasks they depend on to complete, based on arguments I want Celery to check, which in this case is ticket_id.
I am not particularly worried if I receive multiple user messages for a particular ticket with timestamps out of order, as that is not as important as knowing that a ticket has been created before messages are added to it. The point I am making is that I am coding up several tasks where some events crucially depend on others, whereas the ordering of other events does not matter as much for the Python server to function.
Edit:
Partial solutions:
Use task_id to identify Celery tasks, with a formatted string containing argument values which identifies that task. For example, task_id="create_ticket(\"TICKET000001\")"
Retry tasks that do not meet dependency requirements (a rough sketch is below this list). Blocking on subtasks to complete is bad, as a subtask may never complete and would hog a process on one of the worker machines.
Store arguments as part of the result of a completed task, so that you can use information that is not otherwise available in later tasks.
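A rough sketch of how the first two ideas could fit together; the task_id format, the retry delay, and the AsyncResult status check are assumptions on my part, not tested code:

from celery.result import AsyncResult

def create_ticket_task_id(ticket_id):
    # deterministic task_id so later tasks can look the result up
    return 'create_ticket("{}")'.format(ticket_id)

# enqueue create_ticket under that predictable id, e.g.:
# create_ticket.apply_async(args=(82,), task_id=create_ticket_task_id(82))

@app.task(bind=True, default_retry_delay=10, max_retries=None)
def add_message_to_ticket(self, ticket_id, who, when, message_contents):
    # if the ticket has not been created yet, retry later instead of blocking
    if AsyncResult(create_ticket_task_id(ticket_id)).status != 'SUCCESS':
        raise self.retry()
    # ... handle the message as before ...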
Relevant links:
Where do you set the task_id of a celery task?
Retrieve result from 'task_id' in Celery from unknown task
Find out whether celery task exists
More questions:
How do I ensure that I send a task only once per task_id? For instance, I want the create_ticket task to be applied asynchronously only once. This is an alternative to making all tasks idempotent.
How do I use AsyncResult in add_message_to_ticket to check the status of the create_ticket task? Is it possible to specify a chain somehow even though the first task may have already been completed?
How do I fetch all results of tasks given a task name derived from the name of the function definition?
Most importantly, should I use the Celery result backend to abstract stored data away from dealing with a database? Or should I scrap this idea and just go ahead with designing a database schema instead?
I am using celery 3 with Django.
I have a list of jobs in database. User can start a particular job which starts a celery task.
Now I want the user to be able to start multiple jobs; they should be added to the Celery queue and processed one after the other, not in parallel as they would be with plain asynchronous calls.
I am trying to create a job scheduler with celery where user can select the jobs to execute and they will be executed in sequential fashion.
If I use chain() then I cannot add new tasks to the chain dynamically.
What is the best solution?
A better primitive for you to use would be link instead of chain in Celery.
From the documentation:
s = add.s(2, 2)
s.link(mul.s(4))
s.link(log_result.s())
You can see how this allows you to dynamically add a task to be executed by iterating through the required tasks in a loop, and linking each one's signature. After the loop you would want to call something like s.apply_async(...) to execute them.
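For example, a minimal sketch of that loop, assuming a hypothetical run_job task wrapping one of your database jobs; .si() makes the signatures immutable so one job's return value is not passed into the next:

job_ids = [1, 2, 3]  # whatever jobs the user selected, for illustration

sigs = [run_job.si(job_id) for job_id in job_ids]
s = sigs[0]
prev = s
for sig in sigs[1:]:
    prev.link(sig)  # run this job once the previous one has finished
    prev = sig

s.apply_async()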
I'm trying to use Celery to handle background tasks. I currently have the following setup:
from celery import group

@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to happen some time from now, asynchronously, in no particular order. n seconds later, test_maintask will fire again (and again) but with none of the subtasks executing.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely because of one task calling another (and not something because of the main task being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks. Calling a subtask directly does work however:
celery_funcs.test_subtask.delay(10)
Sigh... just found out the answer, I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
Gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
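For reference, roughly what the fix looked like; the broker URL is a placeholder:

from celery import Celery

BROKER = 'amqp://user:password@my-broker-host//'  # placeholder

app = Celery('celery_app', broker=BROKER)
app.conf.update(
    BROKER_URL=BROKER,  # without this, current_app.conf.BROKER_URL was None inside tasks
)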
How can I configure Celery so that one worker always runs the same task, and starts it again on the same worker after it finishes?
It looks like you will need to take two steps:
1. Create a separate queue for this task and route the task to that queue.
2a. Create an infinite loop that calls your particular task, such as this answer
OR
2b. Have a recursive task that calls itself on completion (this could get messy); a rough sketch of step 1 plus option 2b is below.
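A minimal sketch of step 1 plus option 2b; the queue name, task name, and module name are illustrative assumptions:

app.conf.CELERY_ROUTES = {'tasks.repeat_forever': {'queue': 'repeat_queue'}}

@app.task(bind=True)
def repeat_forever(self):
    # ... do the actual work here ...
    # when finished, queue the next run of the same task on the same queue
    self.apply_async(queue='repeat_queue')

You would then start a dedicated worker with something like celery -A tasks worker -Q repeat_queue --concurrency=1 so that only that worker picks the task up.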
I'm running Django, Celery and RabbitMQ. What I'm trying to achieve is to ensure that tasks related to one user are executed in order (specifically, one at a time; I don't want task concurrency per user).
Whenever a new task is added for a user, it should depend on the most recently added task. Additional functionality might include not adding a task to the queue if a task of this type is already queued for this user and has not yet started.
I've done some research and:
I couldn't find a way to link a newly created task with an already queued one in Celery itself; chains seem to be only able to link new tasks.
I think that both functionalities are possible to implement with custom RabbitMQ message handler, though it might be hard to code after all.
I've also read about celery-tasktree, and this might be the easiest way to ensure execution order, but how do I link a new task with a task tree or queue that has already been applied_async? Is there any way I could implement that additional no-duplicate functionality using this package?
Edit: There is also this "lock" example in the Celery cookbook, and while the concept is fine, I can't see a way to make it work as intended in my case: if I can't acquire the lock for a user, the task has to be retried, but that means pushing it to the end of the queue.
What would be the best course of action here?
If you configure the celery workers so that they can only execute one task at a time (see worker_concurrency setting), then you could enforce the concurrency that you need on a per user basis. Using a method like
NUMBER_OF_CELERY_WORKERS = 10

def get_task_queue_for_user(user):
    return "user_queue_{}".format(user.id % NUMBER_OF_CELERY_WORKERS)
to get the task queue based on the user id, every task for a given user will be assigned to the same queue. The workers would need to be configured to only consume tasks from a single task queue.
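Dispatching would then look something like this (process_user_data is a hypothetical task name standing in for your own):

# send the task to that user's dedicated queue
process_user_data.apply_async(
    args=(user.id,),
    queue=get_task_queue_for_user(user),
)

Each worker would be started with -Q user_queue_<n> and --concurrency=1 so it only consumes its own queue, one task at a time.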
It would play out like this:
User 49 triggers a task
The task is sent to user_queue_9
When the one and only celery worker that is listening to user_queue_9 is ready to consume a new task, the task is executed
This is a hacky answer though, because
requiring just a single celery worker for each queue is a brittle system -- if the celery worker stops, the whole queue stops
the workers are running inefficiently