Get the current Celery task id anywhere in the thread - Python

I'd like to get the task id inside a running task,
without knowing which task I'm in.
(That's why I can't use https://stackoverflow.com/a/8096086/245024)
I'd like it to be something like this:
@task
def my_task():
    foo()

def foo():
    logger.log(current_task_id)
This pattern returns in many different tasks, and I don't want to carry the task context to every inner method call.
One option could be to use thread-local storage, but then I would need to initialize it before the task starts and clean it up after it finishes.
Is there something simpler?

from celery import current_task
print(current_task.request.id)
I'm just copying this from the comment, because it should be an answer, so thanks to @asksol.
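Wired into the pattern from the question, foo() can look the task id up by itself; a minimal sketch (the logging setup is an assumption, not from the original):

import logging
from celery import current_task

logger = logging.getLogger(__name__)

def foo():
    # current_task is a proxy to whatever task this worker is
    # currently executing; no context needs to be passed in.
    logger.info("inside task %s", current_task.request.id)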

Related

multiprocessing.Pool: get notified when a task is started

I use multiprocessing.Pool like so to execute a number of tasks.
def execute(task):
    ...  # run task, return result

def on_completion(task_result):
    ...  # process task result

async_results = [pool.apply_async(execute,
                                  args=[task],
                                  callback=on_completion)
                 for task in self.tasks]
# wait for results
My completion handler is invoked by the pool in a nice, serialized way so I don't have to worry about thread safety in its implementation.
However, I would also like to be notified when a task is started. Is there an elegant way to accomplish the following?
def on_start(arg):  # whatever arg(s) were passed to the execute function
    ...  # called when the task starts to run

pool.apply_async(run_task,
                 args=[task],
                 start_callback=on_start,
                 completion_callback=on_completion)
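Pool doesn't offer a start callback, but one workaround is to have the worker announce itself before doing the real work. A rough sketch, assuming a multiprocessing.Manager queue for the notifications (the do_work body is a stand-in):

import time
from multiprocessing import Pool, Manager

def do_work(task):
    time.sleep(0.1)              # stand-in for the real work
    return task * 2

def execute(task, start_queue):
    start_queue.put(task)        # announce the start from inside the worker
    return do_work(task)

if __name__ == '__main__':
    manager = Manager()
    start_queue = manager.Queue()  # a Manager queue proxy can be passed to pool workers
    pool = Pool(processes=4)
    tasks = list(range(10))
    results = [pool.apply_async(execute, args=(t, start_queue))
               for t in tasks]
    pool.close()
    for _ in tasks:              # drain one "started" message per task
        print('started:', start_queue.get())
    pool.join()
    print([r.get() for r in results])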

Python celery - how to wait for all subtasks in chord

I am unit testing celery tasks.
I have chained tasks that also contain groups, so a chord results.
The test should look like:
run celery task (delay)
wait for task and all subtasks
assert
I tried the following:
def wait_for_result(result):
    result.get()
    for child in result.children or list():
        if isinstance(child, GroupResult):
            # tried looping over the task results in the group
            # until the tasks are ready, but without success
            pass
        wait_for_result(child)
This creates a deadlock, chord_unlock being retried forever.
I am not interested in task results.
How can I wait for all the subtasks to finish?
Although this is an old question, I just wanted to share how I got rid of the deadlock issue, just in case it helps somebody.
As the Celery logs say, never use get() inside a task. This will indeed create a deadlock.
I have a similar set of celery tasks which includes a chain of group tasks, hence making it a chord. I'm calling these tasks from Tornado, by making HTTP requests. So what I did was something like this:
@task
def someFunction():
    ...

@task
def someTask():
    ...

@task
def celeryTask():
    groupTask = group([someFunction.s(i) for i in range(10)])
    job = (groupTask | someTask.s()).delay()
    return job
When celeryTask() is called by Tornado, the chain will start executing, and the UUID of someTask() will be held in job. It will look something like
AsyncResult: 765b29a8-7873-4b28-b05c-7e19c33e950c
This UUID is returned and celeryTask() exits before the chain even finishes executing (ideally), hence leaving room for another process to run.
I then used the Tornado layer to check the status of the task. Details on the Tornado layer can be found in this stackoverflow question
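The polling side can stay outside the workers entirely. A minimal sketch, assuming the UUID returned above is handed to the HTTP layer:

from celery.result import AsyncResult

def check_status(task_id):
    # Poll the chord callback from outside any worker process,
    # so no get() ever runs inside a task.
    res = AsyncResult(task_id)
    if res.ready():
        return {'state': res.state, 'result': res.result}
    return {'state': res.state}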
Have you tried chord + callback?
http://docs.celeryproject.org/en/latest/userguide/canvas.html#chords
>>> callback = tsum.s()
>>> header = [add.s(i, i) for i in range(100)]
>>> result = chord(header)(callback)
>>> result.get()
9900

Show a progress bar for my multithreaded process

I have a simple Flask web app that makes many HTTP requests to an external service when a user pushes a button. On the client side I have an AngularJS app.
The server side of the code looks like this (using multiprocessing.dummy):
worker = MyWorkerClass()
pool = Pool(processes=10)
result_objs = [pool.apply_async(worker.do_work, (q,))
               for q in queries]
pool.close()  # close the pool
pool.join()   # wait for all tasks to finish
errors = not all(obj.successful() for obj in result_objs)
# extract results only from successful tasks
items = [obj.get() for obj in result_objs if obj.successful()]
As you can see, I'm using apply_async because I want to inspect each task later and extract its result only if the task didn't raise an exception.
I understood that in order to show a progress bar on client side, I need to publish somewhere the number of completed tasks so I made a simple view like this:
@app.route('/api/v1.0/progress', methods=['GET'])
def view_progress():
    return jsonify(dict(progress=session['progress']))
That will show the content of a session variable. Now, during the process, I need to update that variable with the number of completed tasks (the total number of tasks to complete is fixed and known).
Any ideas about how to do that? Am I working in the right direction?
I have seen similar questions on SO like this one, but I wasn't able to adapt the answer to my case.
Thank you.
For interprocess communication you can use a multiprocessing.Queue, and your workers can put_nowait tuples with progress information on it while doing their work. Your main process can update whatever your view_progress is reading until all results are ready.
A bit like in this example usage of a Queue, with a few adjustments:
In the writers (workers) I'd use put_nowait instead of put, because working is more important than waiting to report that you are working (but perhaps you judge otherwise and decide that informing the user is part of the task and should never be skipped).
The example just puts strings on the queue; I'd use collections.namedtuple for more structured messages. On tasks with many steps, this lets you raise the resolution of your progress report and report more to the user.
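A hedged sketch of that idea, assuming the thread-backed multiprocessing.dummy pool from the question; the work body and the message fields are illustrative:

import collections
from multiprocessing.dummy import Pool  # threads, as in the question
from queue import Queue

Progress = collections.namedtuple('Progress', ['query', 'step'])
progress_queue = Queue()

def do_work(q):
    progress_queue.put_nowait(Progress(q, 'started'))
    result = len(q)  # stand-in for the real HTTP request
    progress_queue.put_nowait(Progress(q, 'done'))
    return result

queries = ['a', 'bb', 'ccc']
pool = Pool(processes=10)
results = [pool.apply_async(do_work, (q,)) for q in queries]
pool.close()

done = 0
while done < len(queries):   # the main thread drains messages as work runs
    msg = progress_queue.get()
    if msg.step == 'done':
        done += 1
pool.join()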
In general the approach you are taking is okay; I do it in a similar way.
To calculate the progress you can use an auxiliary function that counts the completed tasks:
def get_progress(result_objs):
    done = 0
    errors = 0
    for r in result_objs:
        if r.ready():
            done += 1
            if not r.successful():
                errors += 1
    return (done, errors)
Note that as a bonus this function returns how many of the "done" tasks ended in errors.
The big problem is for the /api/v1.0/progress route to find the array of AsyncResult objects.
Unfortunately AsyncResult objects cannot be serialized to a session, so that option is out. If your application supports a single set of async tasks at a time, then you can just store this array as a global variable. If you need to support multiple clients, each with a different set of async tasks, then you will need to figure out a strategy to keep client session data on the server.
I implemented the single client solution as a quick test. My view functions are as follows:
results = None

@app.route('/')
def index():
    global results
    results = [pool.apply_async(do_work) for n in range(20)]
    return render_template('index.html')

@app.route('/api/v1.0/progress')
def progress():
    global results
    total = len(results)
    done, errored = get_progress(results)
    return jsonify({'total': total, 'done': done, 'errored': errored})
I hope this helps!
I think you should be able to update the number of completed tasks using multiprocessing.Value and multiprocessing.Lock.
In your main code, use:
processes = multiprocessing.Value('i', 10)
lock = multiprocessing.Lock()
And then, when you call worker.dowork, pass a lock object and the value to it:
worker.dowork(lock, processes)
In your worker.dowork code, decrease "processes" by one when the code is finished:
lock.acquire()
processes.value -= 1
lock.release()
Now, "processes.value" should be accessible from your main code, and be equal to the number of remaining processes. Make sure you acquire the lock before acessing processes.value, and release the lock afterwards

django celery: how to set task to run at specific interval programmatically

I found that I can set a task to run at a specific interval or at specific times from here, but that is only done at task declaration time. How do I set a task to run periodically, dynamically?
The schedule is derived from a setting, and thus seems to be immutable at runtime.
You can probably accomplish what you're looking for using Task ETAs. This guarantees that your task won't run before the desired time, but doesn't promise to run the task at the designated time; if the workers are overloaded at the designated ETA, the task may run later.
If that restriction isn't an issue, you could write a task which would first run itself like:
@task
def mytask():
    keep_running = ...  # Boolean: should the task keep running?
    if keep_running:
        run_again = ...  # calculate when to run again
        mytask.apply_async(eta=run_again)
    # ... do the stuff you came here to do ...
The major downside of this approach is that you are relying on the task store to remember the tasks in flight. If one of them fails before firing off the next one, then the task will never run again. If your broker isn't persisted to disk and it dies (taking all in-flight tasks with it), then none of those tasks will run again.
You could solve these issues with some kind of transaction logging and a periodic "nanny" task whose job it is to find such repeating tasks that died an untimely death and revive them.
If I had to implement what you've described, I think this is how I would approach it.
celery.task.base.PeriodicTask defines is_due which determines when the next run should be. You could override this function to contain your custom dynamic running logic. See the docs here: http://docs.celeryproject.org/en/latest/reference/celery.task.base.html?highlight=is_due#celery.task.base.PeriodicTask.is_due
An example:
import random
from datetime import timedelta
from celery.task import PeriodicTask

class MyTask(PeriodicTask):
    # run_every is required by PeriodicTask; the actual timing
    # is decided by is_due below
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running my task")

    def is_due(self, last_run_at):
        # Add your logic for when to run. Mine is random.
        if random.random() < 0.5:
            # Run now, and check again in 60 secs
            return (True, 60)
        else:
            # Don't run now, but check again in 10 secs
            return (False, 10)
See here: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
I don't think you can make it dynamic... the best way is to create a task from within a task :D
For example, if you want to run something X seconds later, you create a new task with an X-second delay, and inside that task you create another task with an N*X-second delay, and so on...
This should help you some... http://celery.readthedocs.org/en/latest/faq.html#can-i-change-the-interval-of-a-periodic-task-at-runtime
Once you've defined a custom schedule, assign it to your task as asksol has suggested above.
CELERYBEAT_SCHEDULE = {
    "my_name": {
        "task": "myapp.tasks.task",
        "schedule": myschedule(),
    },
}
You might also want to modify CELERYBEAT_MAX_LOOP_INTERVAL if you want your schedule to update more often than every five minutes.
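For completeness, a hedged sketch of what such a custom schedule might look like; the interval lookup is a hypothetical stand-in for whatever runtime source you use:

from celery.schedules import schedule

def get_current_interval():
    # hypothetical: read the interval from a DB row, cache key, etc.
    return 30.0

class myschedule(schedule):
    def is_due(self, last_run_at):
        interval = get_current_interval()
        elapsed = (self.now() - last_run_at).total_seconds()
        if elapsed >= interval:
            return (True, interval)         # due now; check back after `interval`
        return (False, interval - elapsed)  # not due yet; check back when it is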

In Celery, how do I run a task, and then have that task run another task, and keep it going?

# tasks.py
import random
from celery.task import Task

class Randomer(Task):
    def run(self, **kwargs):
        # run Randomer again!!!
        return random.randrange(0, 1000000)
>>> from tasks import Randomer
>>> r = Randomer()
>>> r.delay()
Right now I run this simple task and it returns a random number. But how do I make it run another task, inside that task?
You can call other_task.delay() from inside Randomer.run; in this case you may want to set Randomer.ignore_result = True (and other_task.ignore_result, and so on).
Remember that a Celery task's delay() returns instantly, so if you don't put any limit or wait time on the nested (or recursive) calls, you can reach meltdown pretty quickly.
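A minimal sketch of that advice, using a countdown as the pacing mechanism; the rounds limit is an illustrative safeguard, not part of the original answer:

import random
from celery.task import Task

class Randomer(Task):
    ignore_result = True

    def run(self, rounds=10, **kwargs):
        value = random.randrange(0, 1000000)
        if rounds > 1:
            # re-queue ourselves with a delay so the recursion can't melt down
            self.apply_async(kwargs={'rounds': rounds - 1}, countdown=60)
        return value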
Instead of recursion or nested tasks, you should consider an infinite loop to avoid stack overflow (no pun intended).
import time
from celery.task import Task

class Randomer(Task):
    def run(self, **kwargs):
        while True:
            do_something(**kwargs)  # the actual work goes here
            time.sleep(600)
You can chain subtasks as described here: http://docs.celeryproject.org/en/latest/userguide/canvas.html#chains
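A small hedged example of such a chain; the broker URL and task bodies are assumptions:

import random
from celery import Celery, chain

app = Celery('tasks', broker='redis://localhost')  # assumed broker

@app.task
def draw(n):
    return random.randrange(0, n)

@app.task
def report(value):
    print('drew', value)

# draw's return value is passed on to report
chain(draw.s(1000000), report.s()).apply_async()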
