Python Celery subtask group not executing

I'm trying to use Celery to handle background tasks. I currently have the following setup:
from celery import group

@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing or result constraints for these subtasks; I just want them to run at some point, asynchronously, in no particular order. n seconds later, test_maintask fires again (and again), but none of the subtasks ever execute.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be caused purely by one task calling another (and not by the main task being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks. Calling a subtask directly does work however:
celery_funcs.test_subtask.delay(10)

Sigh... just found the answer. I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this broker setting is not picked up inside the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
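Roughly, the working configuration ends up looking like this (the broker URL is left as the same placeholder as above):

from celery import Celery

app = Celery('celery_app', broker='<my_broker_here>')
app.conf.update(
    BROKER_URL='<my_broker_here>',  # without this, tasks spawned from inside other tasks went to the default broker
)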

Related

When will a Celery task execute if we call it with .delay()?

My friends keep talking about doing time-consuming tasks with Celery. Since I don't have a computer science background, I can't work out exactly when a Celery task is executed. The Celery documentation talks about a daemon when calling .delay(), but I couldn't find out what a daemon is. So, when exactly will a Celery task be executed if we call it with .delay()? :)
For example, if I have the code below, when will my_task be executed? function.py:
def test():
    my_task.delay()
    second = 0
    while second < 10:
        second += 1  # assume each iteration takes a second
1 - exactly when the test() function finishes (about 10 seconds after test() is called)
2 - in the middle of the while loop
3 - after test() finishes, when there aren't too many requests and the server has the time and resources to do the task (maybe Celery is intelligent and knows the best time to execute a task)
4 - whenever it wants :)
5 - the correct answer, which I haven't listed :)
If it depends on the configuration, I should say that I used the default configuration from the Celery documentation. Thank you.
Imagine that you don't have this one task alone but several of them. Invoking my_task.delay() puts the task on a queue. There are one or more workers that simply pick the next open task off the queue and execute it.
So the right answer would be:
"Whenever the responsible worker is free." That could be almost immediately, just before you enter your while second < 10: loop, but it could also take several seconds or minutes if the worker is currently busy.

How to make a celery task call asynchronous tasks?

I have a Django application that needs to run an optimization algorithm. This algorithm is composed of two parts: the first part is an evolutionary algorithm, which calls a certain number of tasks of the second part, a simulated annealing algorithm.
The problem is that Celery doesn't allow a task to call an asynchronous task.
I have tried this code below:
sa_list = []
for cromossomo in self._populacao:
    sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T,
                                           get_mutacao_str(self._mutacao_SA), self._argumentos))
job = group(sa_list)
result = job.apply_async()
resultados = result.get()
This code is part of the evolutionary algorithm which is a celery task.
When I try to run it, Celery shows this message:
[2015-12-02 16:20:15,970: WARNING/Worker-1] /home/arthur/django-user/local/lib/python2.7/site-packages/celery/result.py:45: RuntimeWarning: Never call result.get() within a task!
See http://docs.celeryq.org/en/latest/userguide/tasks.html#task-synchronous-subtasks
In Celery 3.2 this will result in an exception being
raised instead of just being a warning.
Despite it being just a warning, Celery seems to get full of tasks and locks.
I searched for a lot of solutions but none of them worked.
One way to deal with this is to use a two-stage pipeline:
@app.task
def first_task():
    sa_list = []
    # self._populacao and the other attributes refer to the object from the question
    for cromossomo in self._populacao:
        sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T,
                                               get_mutacao_str(self._mutacao_SA), self._argumentos))
    job = group(sa_list)
    result = job.apply_async()
    result.save()  # requires a result backend that can store GroupResults
    return result.id
then call it like this:
from path.to.tasks import app, first_task
result_1 = first_task.apply_async()
result_2_id = result_1.get()
result_2 = app.GroupResult.restore(result_2_id)
resultados = result_2.get()
There are other ways to do this that involve more work; for example, you could use a chord to gather the results of the group.
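As a rough sketch of the chord variant (collect_results is a hypothetical callback; the signatures and attributes are taken from the question's code):

from celery import chord

@app.task
def collect_results(resultados):
    # called once with the list of results from every task in the header
    return resultados

def start_chord(self):
    # same header signatures as in the question
    header = [
        simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T,
                                get_mutacao_str(self._mutacao_SA), self._argumentos)
        for cromossomo in self._populacao
    ]
    # the chord runs collect_results only after every header task has finished,
    # so the parent task never has to block on result.get()
    return chord(header)(collect_results.s())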
The problem is not that Celery doesn't allow the execution of async tasks in your example, but that you'll run into a deadlock, hence the warning:
Let's assume you have a task A that spawns a number of subtasks B through apply_async(). Each of those tasks is executed by a worker. The problem is that if the number of B tasks is larger than the number of available workers, task A is still waiting for their results (in your example, at least; it isn't by default). While task A is still running, the workers that have executed a B task will not pick up another one; they are blocked until task A is finished. (I don't know exactly why, but I had this problem just a few weeks ago.)
This means that Celery can't execute anything else until you manually shut down the workers.
Solutions
This depends entirely on what you want to do with your task results. If you need them to trigger the next subtask, you can chain them through linking with callbacks, or by hardcoding it into the respective tasks (so that you call the first, which calls the second, and so on); a small sketch of the callback approach follows below.
If you only need to see whether they executed and were successful or not, you can use Flower to monitor your tasks.
If you need to process the output of all the subtasks further, I recommend writing the results to an XML file: have task A call all the B tasks, and once they are done, execute a task C that processes the results. Maybe there are more elegant solutions, but this avoids the deadlock for sure.
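A minimal sketch of the callback-style chaining (task names and values are made up): instead of waiting on results inside task A, each B hands its result to the next step via link, so no task ever blocks on another.

@app.task
def task_b(data):
    # hypothetical "work": just transform the input
    return data * 2

@app.task
def task_c(result_of_b):
    # hypothetical follow-up step, receives task_b's return value
    print('got', result_of_b)

# task A only enqueues; Celery calls task_c once task_b has finished
task_b.apply_async(args=(21,), link=task_c.s())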

Spawn Asynchronous Python Process From Flask [duplicate]

I have to do some long work in my Flask app. And I want to do it async. Just start working, and then check status from javascript.
I'm trying to do something like:
from multiprocessing import Process

@app.route('/sync')
def sync():
    p = Process(target=routine, args=('abc',))
    p.start()
    return "Working..."
But this creates defunct gunicorn workers.
How can it be solved? Should I use something like Celery?
There are many options. You can develop your own solution, use Celery or Twisted (I'm sure there are more already-made options out there but those are the most common ones).
Developing your in-house solution isn't difficult. You can use the multiprocessing module of the Python standard library:
When a task arrives, you insert a row in your database with the task id and status.
Then you launch a process to perform the work, which updates the row's status when it finishes.
You can have a view to check whether the task is finished, which actually just checks the status in the corresponding row.
Of course you have to think where you want to store the result of the computation and what happens with errors.
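A rough sketch of that in-house approach, assuming a plain sqlite3 database with a tasks(id, status) table (all names here are made up for illustration):

import sqlite3
import uuid
from multiprocessing import Process

DB = 'tasks.db'  # hypothetical database file

def init_db():
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY, status TEXT)")
    conn.commit()
    conn.close()

def _set_status(task_id, status):
    conn = sqlite3.connect(DB)
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (status, task_id))
    conn.commit()
    conn.close()

def _run(task_id, data):
    # ... the long-running work happens here ...
    _set_status(task_id, 'finished')

def launch(data):
    # insert a row for the new task, then hand the work off to a separate process
    task_id = str(uuid.uuid4())
    conn = sqlite3.connect(DB)
    conn.execute("INSERT INTO tasks (id, status) VALUES (?, 'running')", (task_id,))
    conn.commit()
    conn.close()
    Process(target=_run, args=(task_id, data)).start()
    return task_id

def status(task_id):
    # a status-check view would call this and return the result as JSON
    conn = sqlite3.connect(DB)
    row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    conn.close()
    return row[0] if row else None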
Going with Celery is also easy. It would look like the following.
To define a function to be executed asynchronously:
@celery.task
def mytask(data):
    # ... do a lot of work ...
Then instead of calling the task directly, like mytask(data), which would execute it straight away, use the delay method:
result = mytask.delay(mydata)
Finally, you can check if the result is available or not with ready:
result.ready()
However, remember that to use Celery you have to run an external worker process.
I haven't ever taken a look at Twisted, so I can't tell you whether it is more or less complex than this (but it should also be able to do what you want).
In any case, any of those solutions should work fine with Flask. To check the result, it doesn't matter at all whether you use JavaScript: just make the view that checks the status return JSON (you can use Flask's jsonify).
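For the Celery route, a minimal sketch of what the start and status-check views could look like (the routes and names are assumptions, and checking ready() requires a result backend):

from flask import Flask, jsonify

flask_app = Flask(__name__)

@flask_app.route('/start')
def start():
    result = mytask.delay('some data')    # enqueue and return immediately
    return jsonify(task_id=result.id)

@flask_app.route('/status/<task_id>')
def status(task_id):
    result = mytask.AsyncResult(task_id)  # look the task up by id
    return jsonify(ready=result.ready())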
I would use a message broker such as RabbitMQ or ActiveMQ. The Flask process would add jobs to the message queue, and a long-running worker process (or pool of worker processes) would take jobs off the queue to complete them. The worker process could update a database so the Flask server knows the current status of the job and can pass this information to the clients.
Using Celery seems to be a nice way to do this.

How to track revoked tasks across multiple celeryd processes

I have a reminder-type app that schedules tasks in Celery using the "eta" argument. If the parameters of the reminder object change (e.g. the time of the reminder), I revoke the previously sent task and queue a new one.
I was wondering if there's any good way of keeping track of revoked tasks across celeryd restarts. I'd like to have the ability to scale celeryd processes up/down on the fly, and it seems that any celeryd processes started after the revoke command was sent will still execute that task.
One way of doing it is to keep a list of revoked task ids, but this method will result in the list growing arbitrarily. Pruning this list requires guarantees that the task is no longer in the RabbitMQ queue, which doesn't seem to be possible.
I've also tried using a shared --statedb file for each of the celeryd workers, but it seems that the statedb file is only updated on termination of the workers and thus not suitable for what I would like to accomplish.
Thanks in advance!
Interesting problem, I think it should be easy to solve using broadcast commands.
When a new worker starts up, it can request that all the other workers dump their revoked tasks to it. This means adding two new remote control commands; you can easily add new commands by using @Panel.register.
Module control.py:
from celery.worker import state
from celery.worker.control import Panel

@Panel.register
def bulk_revoke(panel, ids):
    state.revoked.update(ids)

@Panel.register
def broadcast_revokes(panel, destination):
    panel.app.control.broadcast("bulk_revoke",
                                arguments={"ids": list(state.revoked)},
                                destination=destination)
Add it to CELERY_IMPORTS:
CELERY_IMPORTS = ("control", )
The only missing piece now is connecting it so that the new worker triggers broadcast_revokes at startup. I guess you could use the worker_ready signal for this:
from celery import current_app as celery
from celery.signals import worker_ready

@worker_ready.connect
def request_revokes_at_startup(sender=None, **kwargs):
    celery.control.broadcast("broadcast_revokes",
                             destination=sender.hostname)
I had to do something similar in my project and used celerycam with django-admin-monitor. The monitor takes a snapshot of the tasks and saves them in the database periodically, and there is a nice user interface to browse and check the status of all tasks. You can use it even if your project is not Django based.
I implemented something similar to this some time ago, and the solution I came up with was very similar to yours.
The way I solved this problem was to have the worker fetch the Task object from the database when the job ran (by passing it the primary key, as the documentation recommends). In your case, before the reminder is sent, the worker should check that the task is still "ready" to run. If it isn't, it should simply return without doing any work (on the assumption that the ETA has changed and another worker will pick up the new job).
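A rough sketch of that pattern, assuming a Django model named Reminder with remind_at and sent fields (all names here are hypothetical):

from myapp.models import Reminder  # hypothetical model: pk, remind_at (datetime), sent (bool)

@app.task
def send_reminder(reminder_pk, scheduled_for):
    reminder = Reminder.objects.get(pk=reminder_pk)
    # if the reminder was rescheduled after this job was queued, do nothing;
    # the job queued with the new ETA will handle it instead
    if reminder.sent or reminder.remind_at != scheduled_for:
        return
    # ... actually send the reminder ...
    reminder.sent = True
    reminder.save()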
