If I understood the tutorial correctly, a Celery subtask supports almost the same API as a task, but has the additional advantage that it can be passed around to other functions or processes.
Clearly, if that were the case, Celery would simply have replaced tasks with subtasks instead of keeping both (e.g., the @app.task decorator would convert a function to a subtask instead of a task, etc.). So I must be misunderstanding something.
What can a task do that a subtask can't?
The Celery API has changed quite a bit over time; my question is specific to version 3.1 (currently the latest).
Edit:
I know the docs say subtasks are intended to be called from other tasks. My question is what prevents Celery from getting rid of tasks completely and using subtasks everywhere? They seem to be strictly more flexible/powerful than tasks:
# tasks.py
from celery import Celery

app = Celery(backend='rpc://')

@app.task
def add(x, y):
    # just print out a log line for testing purposes
    print(x, y)
# client.py
from tasks import add
add_subtask = add.subtask()
# in this context, it seems the following two lines do the same thing
add.delay(2, 2)
add_subtask.delay(2, 2)
# when we need to pass the call (with its arguments) to other tasks, we must use add_subtask
# so it seems add_subtask is strictly better than add
You will appreciate the difference once you start using complex workflows with Celery.
A signature() wraps the arguments, keyword arguments, and execution
options of a single task invocation in a way such that it can be
passed to functions or even serialized and sent across the wire.
Signatures are often nicknamed “subtasks” because they describe a task
to be called within a task.
Also:
subtasks are objects used to pass around the signature of a task
invocation (for example, to send it over the network)
A task is just a function definition wrapped with a decorator, whereas a subtask is a task with its parameters already bound but not yet started. You may serialize the subtask and transfer it over the network or, more commonly, call it within a group/chain/chord.
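A minimal sketch of that distinction (the task names and broker/backend URLs here are placeholders, not from the question):
from celery import Celery, chain

app = Celery(broker='amqp://localhost//', backend='rpc://')

@app.task
def fetch(url):
    return len(url)  # stand-in for real work

@app.task
def store(size, label):
    print(label, size)

fetch.delay('http://example.com')            # a task: runs as soon as a worker picks it up

sig = fetch.s('http://example.com')          # a subtask/signature: arguments bound, nothing runs yet
workflow = chain(sig, store.s('page size'))  # the frozen invocation can be composed into a workflow
workflow.delay()                             # only now is anything sent to the broker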
Related
I have to spawn certain tasks and have them execute in parallel. However, I also need all of their results to be collected and updated centrally.
Is it possible to access the results of all these tasks within a parent task somehow? I know I can't call task_result.get() from a task since Celery doesn't allow it; is there any other way to achieve this?
You can make Celery wait for the result of a subtask (see the disable_sync_subtasks parameter to get()); it's just not recommended because you could deadlock the worker (see the documentation for more details). So if you use it, you should know what you are doing.
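For completeness, a minimal (and discouraged) sketch of that pattern, reusing the hypothetical app and add task from the snippets above:
@app.task
def parent():
    child = add.delay(2, 2)
    # explicitly opt out of Celery's safety check; this can deadlock the worker
    # if no process is free to execute the child task
    return child.get(disable_sync_subtasks=False)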
The recommended way for your use case is to use a chord:
A chord is just like a group but with a callback. A chord consists of a header group and a body, where the body is a task that should execute after all of the tasks in the header are complete.
This would indeed require you to refactor your logic a bit so that you don't need the subtasks' results inside the parent task but instead process them in the chord's body.
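A rough sketch of that shape, with hypothetical task names (app is assumed to be your Celery instance):
from celery import chord

@app.task
def fetch_part(i):
    return i * i  # stand-in for the real parallel work

@app.task
def combine(results):
    # called once with the list of all header results
    return sum(results)

# the header group runs in parallel; combine runs after every header task completes
chord(fetch_part.s(i) for i in range(10))(combine.s())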
In my Python project I want to use Celery to build a pipeline of tasks: some tasks will be grouped, and this group is part of a chain. Schema of the pipeline:
task_chain = chain(
    group(
        chain(task1.s(uid=uid, index=i), task2.s(uid=uid, index=i))
        for i in range(len(collection))
    ),
    task3.s(uid=uid),
    task4.s(uid=uid),
    reduce_job_results_from_pages.s(job_uid=job_uid),
    push_metrics.s(job_uid=job_uid))
Should I use a result backend in this case, or is the broker alone enough?
I don't understand what mechanism Celery uses to synchronize task results and pass the result of the previous task or group of tasks to the next one in the chain.
Thank you!
The answer is somewhat provided in the Important Notes section on the Canvas page:
Tasks used within a chord must not ignore their results. In practice this means that you must enable a result_backend in order to use chords. Additionally, if task_ignore_result is set to True in your configuration, be sure that the individual tasks to be used within the chord are defined with ignore_result=False. This applies to both Task subclasses and decorated tasks.
You may think you can get away without it since you do not explicitly use a chord, but Celery will transform any chain with a group in it into a chord.
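In practice that means configuring a result backend even though the code only spells out a chain; a minimal sketch, with placeholder URLs:
from celery import Celery

# the chord that Celery synthesizes from the group-inside-a-chain needs a result
# backend to collect the group's results before invoking the next task in the chain
app = Celery('pipeline',
             broker='amqp://localhost//',
             backend='redis://localhost:6379/0')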
I want to temporarily convert the distributed behavior of a celery task to serial behavior. That is to say, I want the process to run the task code as if the task decorator were not present. I need this for debugging purposes.
I could swear there was an environment variable or setting which handles this, but I can't seem to find it in the documentation.
For example:
@celery.task()
def add_together(a, b):
    return a + b
When add_together is called, I do not want it sent to a Celery worker.
I think you mean eager mode, which can be turned on with the task_always_eager setting. With that turned on, all tasks will be executed locally instead of being sent to the queue.
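A minimal sketch, assuming the app object from the question is named celery and Celery 4+ style lowercase setting names:
celery.conf.task_always_eager = True      # run tasks in the calling process, synchronously
celery.conf.task_eager_propagates = True  # optional: re-raise task exceptions for easier debugging

result = add_together.delay(1, 2)  # executes locally, no worker or broker round-trip
print(result.get())                # -> 3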
I'm trying to use Celery to handle background tasks. I currently have the following setup:
from celery import group

@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, that none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to run some time from now, asynchronously, in no particular order. n seconds later, test_maintask fires again (and again), but none of the subtasks execute.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely because of one task calling another (and not because the main task is being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but, again, not the subtasks'. Calling a subtask directly does work, however:
celery_funcs.test_subtask.delay(10)
Sigh... just found out the answer. I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
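For reference, a sketch of the fix described above (the broker URL is the same placeholder used in the question):
from celery import Celery

app = Celery('celery_app', broker='<my_broker_here>')
# also record the broker in the configuration so that work dispatched via
# current_app (e.g. the group spawned inside test_maintask) uses the same broker
app.conf.update(BROKER_URL='<my_broker_here>')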
I'm using Celery in Python to run background tasks and couldn't find a definitive answer to the question of whether I can split the Celery task definition from the task implementation.
For example, take the really simple task below:
@celery_app.task
def add_numbers(num1, num2):
    return num1 + num2
The definition and implementation are in the same file, i.e. when the caller imports this module to call add_numbers, both the definition and the implementation are imported.
In this case, not so bad. But my tasks are a bit more complex, importing multiple modules and packages that the caller certainly doesn't need and I'd like to keep out of the caller.
So, does Celery provide a way to do this? Or am I going against the framework? Is this even a problem?
I have seen this question: Celery dynamic tasks / hiding Celery implementation behind an interface, but it is well over two years old, more than enough time for a lot to change.
There's a feature called signatures which allows calling tasks without importing them. You will need the Celery app instance to be available:
sig = celery_app.signature('myapp.add_numbers', args=(1,2))
sig.delay()
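A slightly fuller caller-side sketch, assuming the worker registers the task under the name 'myapp.add_numbers' and that both sides share the broker/backend shown (the URLs are placeholders):
from celery import Celery

# lightweight app for the caller; it never imports myapp or its heavy dependencies
celery_app = Celery('caller',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

sig = celery_app.signature('myapp.add_numbers', args=(1, 2))
result = sig.delay()           # dispatched by name; executed by a worker that does have myapp
print(result.get(timeout=10))  # -> 3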