I have to do some long work in my Flask app. And I want to do it async. Just start working, and then check status from javascript.
I'm trying to do something like:
#app.route('/sync')
def sync():
p = Process(target=routine, args=('abc',))
p.start()
return "Working..."
But this it creates defunct gunicorn workers.
How can it be solved? Should I use something like Celery?
There are many options. You can develop your own solution, use Celery or Twisted (I'm sure there are more already-made options out there but those are the most common ones).
Developing your in-house solution isn't difficult. You can use the multiprocessing module of the Python standard library:
When a task arrives you insert a row in your database with the task id and status.
Then launch a process to perform the work which updates the row status at finish.
You can have a view to check if the task is finished, which actually just checks the status in the corresponding.
Of course you have to think where you want to store the result of the computation and what happens with errors.
Going with Celery is also easy. It would look like the following.
To define a function to be executed asynchronously:
#celery.task
def mytask(data):
... do a lot of work ...
Then instead of calling the task directly, like mytask(data), which would execute it straight away, use the delay method:
result = mytask.delay(mydata)
Finally, you can check if the result is available or not with ready:
result.ready()
However, remember that to use Celery you have to run an external worker process.
I haven't ever taken a look to Twisted so I cannot tell you if it more or less complex than this (but it should be fine to do what you want to do too).
In any case, any of those solutions should work fine with Flask. To check the result it doesn't matter at all if you use Javascript. Just make the view that checks the status return JSON (you can use Flask's jsonify).
I would use a message broker such as rabbitmq or activemq. The flask process would add jobs to the message queue and a long running worker process (or pool or worker processes) would take jobs off the queue to complete them. The worker process could update a database to allow the flask server to know the current status of the job and pass this information to the clients.
Using celery seems to be a nice way to do this.
Related
I am trying to build a Tornado web server which takes requests from multiple clients. The request consists of:
a. For a given directory name passed through an URL, zip the files, etc and FTP it out.
b. Providing a status of sorts if the task is completed.
So, rather than making it a synchronous and linear process, I wanted to break it down into multiple subtasks. The client will submit the URL request and then simply receive a response of sorts 'job submitted'. A bit later, the client can come along asking status on this job. During this time the job obviously has to finish its task.
I am confused between what modules to use - Tornado Subprocess, Popen contructor, Subprocess.Call, etc. I've read Python docs but can't find anything where the task is running longer and Tornado is not supposed to wait for it to finish. So, I need a mechanism to start a job, let it run its course but relinquish the client and then when asked by client provide a status on it.
Any help is appreciated. Thanks.
Python programmers widely use Celery for a set of processes to manage a queue of tasks. Set up Celery with RabbitMQ and write a Celery worker (perhaps with Celery Canvas that does the work you need: zips a directory, ftps it to somewhere, etc.
The Tornado-Celery integration package provides something that appears close to what you need to integrate your Tornado application with Celery.
This is all a lot of moving parts to install and configure at first, of course, but it will prepare you for a maintainable application architecture.
I have to do some long work in my Flask app. And I want to do it async. Just start working, and then check status from javascript.
I'm trying to do something like:
#app.route('/sync')
def sync():
p = Process(target=routine, args=('abc',))
p.start()
return "Working..."
But this it creates defunct gunicorn workers.
How can it be solved? Should I use something like Celery?
There are many options. You can develop your own solution, use Celery or Twisted (I'm sure there are more already-made options out there but those are the most common ones).
Developing your in-house solution isn't difficult. You can use the multiprocessing module of the Python standard library:
When a task arrives you insert a row in your database with the task id and status.
Then launch a process to perform the work which updates the row status at finish.
You can have a view to check if the task is finished, which actually just checks the status in the corresponding.
Of course you have to think where you want to store the result of the computation and what happens with errors.
Going with Celery is also easy. It would look like the following.
To define a function to be executed asynchronously:
#celery.task
def mytask(data):
... do a lot of work ...
Then instead of calling the task directly, like mytask(data), which would execute it straight away, use the delay method:
result = mytask.delay(mydata)
Finally, you can check if the result is available or not with ready:
result.ready()
However, remember that to use Celery you have to run an external worker process.
I haven't ever taken a look to Twisted so I cannot tell you if it more or less complex than this (but it should be fine to do what you want to do too).
In any case, any of those solutions should work fine with Flask. To check the result it doesn't matter at all if you use Javascript. Just make the view that checks the status return JSON (you can use Flask's jsonify).
I would use a message broker such as rabbitmq or activemq. The flask process would add jobs to the message queue and a long running worker process (or pool or worker processes) would take jobs off the queue to complete them. The worker process could update a database to allow the flask server to know the current status of the job and pass this information to the clients.
Using celery seems to be a nice way to do this.
Are there any pure wsgi implementation of background task?
I want to use local variables under the same context directly, not serialize/deserialize to another daemon process via a broker.
Is it possible to make this happen under the current wsgi infrastructure? E.g. after return response yield, run some callback functions?
This is a duplicate of question asked on the Python WEB-SIG. I reference the same page as provided in response to the question on the Python WEB-SIG so others can see it:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
In doing this though, it ties up the request thread and so it would not be able to handle other requests until your task has finished.
Creating background threads at the end of a request is not a good idea unless you do it using a pooling mechanism such that you limit the number of worker threads for your tasks. Because the process can crash or be shutdown, you loose the job as only in memory and thus not persistent.
Better to use Celery, or if you think that is too heavy weight, have a look at Redis Queue (RQ) instead.
You could look at Django async. It uses an in-database queue and so handles transactions much better. All arguments need to be JSONable as does the return type. In some cases this means you may need to schedule a wrapper function, but that oughtn't to cause you any headaches.
http://pypi.python.org/pypi/django-async
You don't want to be doing this sort of thing inside the web server -- it's absolutely not the right place to do it. Django async provides a manage.py command for flushing the queue which you can run in a loop, possible on another machine from the web server.
I have a reminder type app that schedules tasks in celery using the "eta" argument. If the parameters in the reminder object changes (e.g. time of reminder), then I revoke the task previously sent and queue a new task.
I was wondering if there's any good way of keeping track of revoked tasks across celeryd restarts. I'd like to have the ability to scale celeryd processes up/down on the fly, and it seems that any celeryd processes started after the revoke command was sent will still execute that task.
One way of doing it is to keep a list of revoked task ids, but this method will result in the list growing arbitrarily. Pruning this list requires guarantees that the task is no longer in the RabbitMQ queue, which doesn't seem to be possible.
I've also tried using a shared --statedb file for each of the celeryd workers, but it seems that the statedb file is only updated on termination of the workers and thus not suitable for what I would like to accomplish.
Thanks in advance!
Interesting problem, I think it should be easy to solve using broadcast commands.
If when a new worker starts up it requests all the other workers to dump its revoked
tasks to the new worker. Adding two new remote control commands,
you can easily add new commands by using #Panel.register,
Module control.py:
from celery.worker import state
from celery.worker.control import Panel
#Panel.register
def bulk_revoke(panel, ids):
state.revoked.update(ids)
#Panel.register
def broadcast_revokes(panel, destination):
panel.app.control.broadcast("bulk_revoke", arguments={
"ids": list(state.revoked)},
destination=destination)
Add it to CELERY_IMPORTS:
CELERY_IMPORTS = ("control", )
The only missing problem now is to connect it so that the new worker
triggers broadcast_revokes at startup. I guess you could use the worker_ready
signal for this:
from celery import current_app as celery
from celery.signals import worker_ready
def request_revokes_at_startup(sender=None, **kwargs):
celery.control.broadcast("broadcast_revokes",
destination=sender.hostname)
I had to do something similar in my project and used celerycam with django-admin-monitor. The monitor takes a snapshot of tasks and saves them in the database periodically. And there is a nice user interface to browse and check the status of all tasks. And you can even use it even if your project is not Django based.
I implemented something similar to this some time ago, and the solution I came up with was very similar to yours.
The way I solved this problem was to have the worker fetch the Task object from the database when the job ran (by passing it the primary key, as the documentation recommends). In your case, before the reminder is sent the worker should perform a check to ensure that the task is "ready" to be run. If not, it should simply return without doing any work (assuming that the ETA has changed and another worker will pick up the new job).
So I have a background procss that I need to expose/control as a web service. I have wrapped the process to be able to accept commands via a pipe, but now am trying to find out how to control it.
Requirements are as follows:
Need to be able to start the process via the web
Need to be able to send cmds
Need to be able to return results from cmds
Process once started is alive until killed
I think the main question is how do I get django to own the process? Own in the sense, keep a valid save the pipe for future communication with the background process. Right now its something along the lines (just an example):
if __name__ == '__main__':
to_process_pipe, process_pipe = Pipe()
node = PFacade(process_pipe)
p.start()
to_process_pipe.send(['connect'])
print to_process_pipe.recv()
p.killed = True
p.join()
I think I need a better way to be able to communicate, bc I am not sure how I could store the Pipe in DJango.
And please, if you are going to respond with use Celery, please give me a good explination of how.
ok, so you want a process to be up and running and accepting commands from the django workers?
In such a case celery will not be a good solution as it is not providing communication after task is spawned.
IMHO a good solution will be to have a deamon (implemented as django management command) with a infinite main loop, some sleep between runs, listening to the commands from the specific queue.
For the communication - kombu/django-kombu will be great (that was a part of celery).
My final solution was to write a custom "mailbox" based on pidbox.Mailbox. Their implementation was horribly broken but the algorithm was solid.
I basically stood up a REST API hosted via django and then had that rest api send a message to a AMQP Queue(QPID implementation).
I then had a process that sits, monitors the queues, and passess along any commands as they came in.
It worked well and was pretty awesome when it came together.
Maybe Celery (asynchronous task queue/job queue based on distributed message passing) fits your bill.