Tell completion for a set of celery tasks - python

How can I run another method, action(), automatically when a set of Celery tasks is finished? Is there a simple way to trigger another function call on completion?
# tasks.py
@app.task
def rank(item):
    # Update database
    ...

# main.py
import tqdm
from tasks import rank

def action():
    print('Tasks have been finished.')

ans = list()
for item in tqdm.tqdm(all_items):
    rank.apply_async(([{"_id": item["_id"], "max": item["max"]}]))

In a previous message very similar to this one, which you deleted, I explained how to do this without using the Chord workflow primitive that you for some reason decided to avoid... You even left some parts of that code here that do nothing (ans = list()). I will put that part of the answer here, as it explains how what you need can be accomplished:
Without some code changes your code will not work. For starters, apply_async() does not return the task result (it returns an AsyncResult). So, after you modify the code to ans.append(rank.apply_async(([{"_id": item["_id"], "max": item["max"]}])).get()) it should work as you want, but unfortunately it will not distribute tasks (which is why we use Celery!). So, in order to emulate the logic that Chord provides, you would need to call apply_async() as you do, store the task IDs, and periodically poll for their state; whenever a task is finished, get its result, and keep doing this until all of them are finished.
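A minimal sketch of that polling approach, reusing rank, all_items and action() from the question and assuming a result backend is configured:

import time

results = [rank.apply_async(([{"_id": item["_id"], "max": item["max"]}]))
           for item in all_items]

# Poll the stored AsyncResult objects until every task has reached a ready state
while not all(r.ready() for r in results):
    time.sleep(1)

action()  # all tasks have finished at this point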
Solution B would be to use the Group primitive: schedule your tasks to be executed in a group, obtain the GroupResult object, and do the same as above, periodically polling for the individual results.
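Roughly, that group-based variant could look like this (again assuming a result backend, and reusing rank and action() from the question):

import time
from celery import group

job = group(rank.s({"_id": item["_id"], "max": item["max"]}) for item in all_items)
group_result = job.apply_async()

# GroupResult.ready() becomes True once every task in the group has finished
while not group_result.ready():
    time.sleep(1)

action()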
If you do this polling in a loop, then you can simply call action() after the loop, as it will only be reached after all tasks are finished. Once you implement this you will understand why many experienced Celery users use Chord instead...
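For comparison, the Chord version would be something like the sketch below, assuming you turn action into a task so it can serve as the chord callback:

from celery import chord

@app.task
def action(results):
    print('Tasks have been finished.')

header = [rank.s({"_id": item["_id"], "max": item["max"]}) for item in all_items]
chord(header)(action.s())  # action runs once every rank task has completed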

Related

Task scheduler for python jobs

I have one main function which I want to execute with different arguments. It's a function which plays a video on a Raspberry Pi using omxplayer.
I would like to use a scheduler which lets me plan the execution of a specific task: it should let me define the time when the task will be executed and/or maintain a queue, so that if I execute this main function, the scheduler places the task at the end of the queue.
I have tried Python-RQ and it's good, but the problem is that I don't know how I can add a new task at the end of the queue if I don't know the name of the previous job.
I have a function which should add jobs to the queue:
def add_movie(path):
    q.enqueue(run_movie2, '{0}'.format(path))

which executes:

def run_movie2(path):
    subprocess.Popen(['omxplayer', '-o', 'hdmi', '/home/bart/FlaskApp/movies/{0}'.format(path)])
    return "Playing {0}".format(path)
Do you know a scheduler which meets these requirements?
What can you advise with python-rq? Is there any way to run the jobs one by one? How can I always add jobs at the end of the queue?
Thank you.

Python Celery subtask group not executing

I'm trying to use Celery to handle background tasks. I currently have the following setup:
@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to happen some time from now, asynchronously, in no particular order. n seconds later, test_maintask will fire again (and again) but with none of the subtasks executing.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely because of one task calling another (and not because the main task is being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks. Calling a subtask directly does work however:
celery_funcs.test_subtask.delay(10)
Sigh... just found out the answer. I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
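For reference, this is roughly what the fixed configuration looked like ('<my_broker_here>' stands in for the real broker URL):

app = Celery('celery_app', broker='<my_broker_here>')
app.conf.update(
    # without this, group() calls made inside a task fell back to the default broker
    BROKER_URL='<my_broker_here>',
)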

Chain where all the stages aren't known ahead of time

My app spiders various websites, and uses Celery to do it in a nice, distributed way.
The spidering can be split into stages, kind of like a chain, except I don't know exactly what tasks are going to be in each stage ahead of time. For example, a lot of spiders run one task to get a list of results, then run another task for each result to get more information on the result. I'm going to call this new kind of thing an "unknown chain".
My problem is how to implement the unknown chain. I'd like to be able to use it wherever a chain can be used, such as waiting for it synchronously, running it with a callback, or (most importantly) putting it into a chord.
My current solution is to have the task for each stage return the signature for the next stage. I can then create one function that synchronously waits for the unknown chain to complete:
def run_unknown_chain_sync(unknown_chain):
    result = unknown_chain.delay().get()
    while isinstance(result, Signature):
        result = result.delay().get()
    return result
And another function + task that does it asynchronously with a callback:
def run_query_async(unknown_chain, callback):
    unknown_chain_advance.delay(unknown_chain, callback)

@app.task
def unknown_chain_advance(unknown_chain, callback):
    if isinstance(unknown_chain, Signature):
        chain(unknown_chain, unknown_chain_advance.s(callback)).delay()
    else:
        callback.delay(unknown_chain)
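For concreteness, a stage task in this scheme might look roughly like the sketch below (fetch_listing and get_details are made-up placeholders for my spider code):

@app.task
def get_result_list(url):
    results = fetch_listing(url)   # hypothetical helper that scrapes the listing page
    return get_details.s(results)  # hand back the next stage as a Signature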
The main problem with this solution is that the unknown chain can't be used in a chord.
Other ideas I came up with:
Do some kind of yucky messing around with Celery's innards and somehow create a new kind of task that represents an unknown chain. If it looks like a task, it should work like a task.
It would intercept whatever is reporting the task as finished, and check if the task is actually done or just returning the next stage. If it's returning the next stage, it would "forget" to report the task as finished and start the next stage, and chain something onto that which repeats the process.
Not a very good idea because it will break when I update Celery. Also, I haven't looked too close at the Celery codebase, but I suspect this might be impossible.
Create a new kind of primitive, kind of like chain, but called unknown_chain. I doubt this can be done because from my reading of the celery code, Celery is not designed to allow you to make new kinds of signatures like this.
Invent my own way of chording unknown chains, like I invented my own way of running them with a callback. The question is, how the hell would you do that?

Create celery tasks then run synchronously

My app gathers a bunch of phone numbers on a page. Once the user hits the submit button, I create a Celery task to call each number and give a reminder message, then redirect them to a page where they can see live updates about the calls. I am using web sockets to live-update the status of each call, and I need the tasks to execute synchronously as I only have access to dial out from one number.
So once the first call/task is completed, I want the next one to fire off.
I took a look at the CELERY_ALWAYS_EAGER setting, but it just went through the first iteration and stopped.
@task
def reminder(number):
    # CODE THAT CALLS NUMBER HERE....
    ...

def make_calls(request):
    for number in phone_numbers:
        reminder.delay(number)
    return redirect('live_call_updates')
If you look at the celery DOCS on tasks you see that to call a task synchronously, you use the apply() method as opposed to the apply_async() method.
So in your case you could use:
reminder.apply(args=[number])
The DOCS also note that:
If the CELERY_ALWAYS_EAGER setting is set, it will be replaced by a local apply() call instead.
Thanks to @JivanAmara, who in the comments reiterated that when using apply(), the task will run locally (on the server/computer on which it's called). This can have ramifications if you intend to run your tasks across multiple servers/machines.
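Applied to the loop in the question, that would be roughly:

def make_calls(request):
    for number in phone_numbers:
        reminder.apply(args=[number])  # blocks until this reminder task has run
    return redirect('live_call_updates')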
If you want to fire each call one after another, why don't you wrap all the calls in one task?
@task
def make_a_lot_of_calls(numbers):
    for num in numbers:
        # Assuming that reminder blocks till the call finishes
        reminder(num)

def make_calls(request):
    make_a_lot_of_calls.delay(phone_numbers)
    return redirect('live_call_updates')
You can use a Celery chain.
from celery import chain
tasks = [reminder.s(number) for number in phone_numbers]
chain(*tasks).apply_async()
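Each task in the chain starts only after the previous one has finished, so the calls still go out one by one, but the work stays on the worker rather than blocking the web request.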

How to know if a particular task inside a queue is complete?

I have a doubt with respect to Python queues.
I have written a threaded class whose run() method executes tasks from a queue.
import threading
import Queue

class AThread(threading.Thread):
    def __init__(self, arg1):
        self.file_resource = arg1
        threading.Thread.__init__(self)
        self.queue = Queue.Queue()

    def __myTask(self):
        self.file_resource.write()
        ''' Method that will access a common resource
            Needs to be synchronized.
            Returns a Boolean based on the outcome
        '''

    def run(self):
        while True:
            cmd = self.queue.get()
            # cmd is actually a call to a method
            exec("self.__" + cmd)
            self.queue.task_done()

# The problem I have here is while invoking the thread
a = AThread()
a.queue.put("myTask()")
print "Hai"
The same instance of AThread (a = AThread()) will have tasks loaded onto its queue from different places in the code.
Hence the print statement at the bottom should wait for the task added to the queue by the statement above it, wait for a definite period, and also receive the value returned from executing the task.
Is there a simple way to achieve this? I have searched a lot regarding this; kindly review this code.
Also, why are Python's acquire and release locks not per instance of the class? In the scenario mentioned, the instances a and b of AThread need not be synchronized with each other, yet myTask runs synchronized across both a and b when acquire and release locks are applied.
Kindly provide suggestions.
There are lots of approaches you could take, depending on the particular contours of your problem.
If your print "Hai" just needs to happen after myTask completes, you could put it into a task of its own and have myTask put that task on the queue when it finishes (if you're a CS theory sort of person, you can think of this as being analogous to continuation-passing style).
If your print "Hai" has a more elaborate dependency on multiple tasks, you might look into futures or promises.
You could take a step into the world of Actor-based concurrency, in which case there would probably be a synchronous message send method that does more or less what you want.
If you don't want to use futures or promises, you can achieve a similar thing manually by introducing a condition variable. Set the condition variable before myTask starts and pass it to myTask, then wait for it to be cleared. You'll have to be very careful as your program grows and constantly rethink your locking strategy to make sure it stays simple and comprehensible - this is the stuff of which difficult concurrency bugs are made.
The smallest sensible step to get what you want is probably to provide a blocking version of Queue.put() which does the condition variable thing. Make sure you think about whether you want to block until the queue is empty, or until the thing you put on the queue is removed from the queue, or until the thing you put on the queue has finished processing. And then make sure you implement the thing you decided to implement when you were thinking about it.
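A minimal sketch of such a blocking put, assuming you want to block until the item you put on the queue has finished processing (using a per-item Event rather than a shared condition variable):

import threading
import Queue

class AThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue()

    def blocking_put(self, cmd):
        done = threading.Event()
        self.queue.put((cmd, done))
        done.wait()                      # block until run() has processed this item

    def run(self):
        while True:
            cmd, done = self.queue.get()
            # ... execute cmd against the shared resource here ...
            self.queue.task_done()
            done.set()                   # wake up the caller blocked in blocking_put()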
