To minimize the request time I want to execute the method after return 200 to client.
#app.route('/register', methods=['POST'])
def register():
#code and code
return 200
send_email_with_validation_url()
How can I do it? With threads?
You can do it with threads, but without some control you could end up with lots of threads choking resources. You could also end up with processes crashing without you being aware.
This is the job for a queue system. Celery would be a good fit. Something along the lines of:
from celery import Celery
app = Celery('tasks', broker='amqp://guest#localhost//')
#app.task
send_email_job(address):
send_email_with_validation_url()
#app.route('/register', methods=['POST'])
def register():
#code and code
send_email_job.delay(address)
return 200
In this example, send_email_job will be scheduled run in the background (in a different thread or process or even machine if you want) with the given arguments and your server will return immediately.
Celery is great but if the task isn't critical asyncio would be a great option to explore, see this
Related
Let's say I have a (websocket) API, api.py, as such:
from flask import Flask, request
from flask_socketio import SocketIO, emit
from worker import Worker
app = Flask()
socketio = SocketIO(app)
worker = Worker()
worker.start()
#socketio.on('connect')
def connect():
print("Client", request.sid, "connected")
#socketio.on('get_results')
def get_results(query):
"""
The only endpoing of the API.
"""
print("Client", request.sid, "requested results for query", query)
# Set the worker to work, wait for results to be ready, and
# send the results back to the client.
worker.task_queue.put(query)
results = worker.result_queue.get()
emit("results", results)
#socketio.on('disconnect')
def disconnect():
print("Client", request.sid, "disconnected, perhaps before results where ready")
# What to do here?
socketio.run(app, host='')
The a API will serve many clients but only has a single worker to produce the results that should be served. worker.py:
from multiprocessing import Process, Queue
class Worker(Process):
def __init__(self):
super().__init__()
self.task_queue = Queue()
self.result_queue = Queue()
self.some_stateful_variable = 0
# Do other computationally expensive work
def reset_state(self):
# Computationally inexpensive.
pass
def do_work(self, task):
# Computationally expensive. Takes long time.
# Modifies internal state.
pass
def run(self):
while True:
task = self.task_queue.get()
results = self.do_work(task)
self.result_queue.put(results)
The worker gets a request, i.e. a task to do, and sets forth producing a result. When the result is ready, the client will be served it.
But not all clients are patient. They may leave, i.e. disconnect from the API, before the results are ready. They don't want them, and the worker therefore ends up working on a task that does not need to finish. That makes other client in queue wait unnecessarily. How to avoid this situation, and get the worker to abort executing do_work for a task that does not need to finish?
In client side: when user closes browser tab or leave the page send request to your Flask server, the request should contain id of the task you would like to cancel.
In server side put cancel status for the task in database or any shared variable between Flask Server and your Worker Process
Divide Task processing in several stages and check status of task in database before each stage, if status is canceled - stop the task processing.
Another choice for point 1 is to do some monitoring in Server side in separate Process - count interval between status requests from client side.
I've handled similar problems by launching an entirely separate process via:
sp.call('start python path\\worker.py', shell=True)
worker.py would then report its PID back to the api.py via redis, then its straightforward to kill the process at any point from api.py
Of course, how viable that is for you will depend on how much data resides within api.py and is shared to worker.py - whether its feasible for that to also pass via redis or not is for you to decide.
The added benefit is you decouple socket from heavy compute - and you can go quasi-multi-core (single thread per worker.py). You could go full multi core by incorporating multiprocessing into each worker.py if you wished.
For learning purpose I want to implement the next thing:
I have a script that runs selenium for example in the background and I have some log messages that help me to see what is going on in the terminal.
But I want to get the same messages in my REST request to the Angular app.
print('Started')
print('Logged in')
...
print('Processing')
...
print('Success')
In my view.py file
class RunTask(viewsets.ViewSet):
queryset = Task.objects.all()
#action(detail=False, methods=['GET'], name='Run Test Script')
def run(self, request, *args, **kwargs):
task = task()
if valid['success']:
return Response(data=task)
else:
return Response(data=task['message'])
def task()
print('Staring')
print('Logged in')
...
print('Processing')
...
print('Success')
return {
'success': True/False,
'message': 'my status message'
}
Now it shows me only the result of the task. But I want to get the same messages to indicate process status in frontend.
And I can't understand how to organize it.
Or how I can tell angular about my process status?
Unfortunately, it's not that simple. Indeed, the REST API lets you start the task, but since it runs in the same thread, the HTTP request will block until the task is finished before sending the response. Your print statements won't appear in the HTTP response but on your server output (if you look at the shell where you ran python manage.py runserver, you'll see those print statements).
Now, if you wish to have those output in real-time, you'll have to look for WebSockets. They allow you to open a "tunnel" between the browser and the server, and send/receive messages in real-time. The django-channels library allow you to implement them.
However, for long-running background tasks (like a Selenium scraper), I would advise to look into the Celery task queue. Basically, your Django process will schedule task into the queue. The tasks into the queue will then be executed by one (or more !) "worker" processes. The advantage of this is that your Django process won't be blocked by the long task: it justs add some work into the queue and then respond.
When you add tasks in the queue, Celery will give you a unique identifier for this task, that you can return in the HTTP response. You can then very well implement another endpoint which takes a task id in parameter and return the state of the task (is it pending ? done ? failed ?).
For this to work, you'll have to setup a "broker", a kind of database that will store the tasks to do and their results (typically RabbitMQ or Redis). Celery documentation explains this well: https://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
Either way you choose, it's not a trivial thing and will need quite some work before having some results ; but it's interesting to see how it expands the possibilities of a classical HTTP server.
I have designed a REST API which receives inputs through POST requests and then applies some logic to the inputs and returns to the callback uri which is part of the inputs.
This design was working fine for single input, but then i want to implement multithreading so that i can handle multiple POST requests. I have tried using 'app.run(threaded=True)' but was not successful.
I am running this code on linux platform. Not sure what is wrong in the following code, and am not so good at using threads in python, would appreciate if someone can let me know where the issue is:
I am able to get the '200' response once there is a POST request and the inputs are appended to 'inp_params', after which there is no processing in the thread.
from flask import Flask, jsonify, request
import time
import json
import os
import threading
import Queue
import test_func_module as tf
app = Flask(__name__)
inp_params = []
# Create the queue and threader
q = Queue.Queue()
#app.route('/', methods = ['GET', 'POST'] )
def get_data():
if request.method == 'GET':
return 'RESTful API'
elif request.method == 'POST':
global inp_params
inputs = {"fileName": request.json["fileName"], "fileId": request.json["fileId"], "ModuleId": request.json["ModuleId"], "WorkflowId": request.json["WorkflowId"],"Language": request.json["Language"], "callbackuri": request.json["callbackuri"]}
inp_params.append(inputs)
return '200'
def test_integrate(worker):
TF_output = tf.test_func(worker)
return TF_output
def threader():
while True:
# gets an worker from the queue
worker = q.get()
# Run the example job with the avail worker in queue (thread)
test_integrate(worker)
# completed with the job
q.task_done()
if __name__ == '__main__':.
for worker in inp_params:
q.put(worker)
for x in range(4): #4 cores
t = threading.Thread(target=threader)
# classifying as a daemon, so they will die when the main dies
t.daemon = True
# begins, must come after daemon definition
t.start()
# wait until the thread terminates.
q.join()
app.run(threaded=True)
#Shilparani Since you mentioned
I have tried using 'app.run(threaded=True)' but was not successful.
May not be exact answer for your question but I would like to share my experience for achieving concurrency through uwsgi/gunicorn :
Keep it simple by coding Flask for REST endpoints and move Multithreading , MultiProcessing logic to gunicorn or uwsgi where you can mention threads and workers which help for achieving concurrency , parallelism if that's what you are trying to achieve.
gunicorn -b localhost:8080 -w 4 -t 4 app:app
Based on your need and operations:
If tasks are CPU intensive try to keep #workers as #CPU-cores
If tasks are I/O intensive may be safe to try with more threads
Is there a way in Flask to send the response to the client and then continue doing some processing? I have a few book-keeping tasks which are to be done, but I don't want to keep the client waiting.
Note that these are actually really fast things I wish to do, thus creating a new thread, or using a queue, isn't really appropriate here. (One of these fast things is actually adding something to a job queue.)
QUICK and EASY method.
We will use pythons Thread Library to acheive this.
Your API consumer has sent something to process and which is processed by my_task() function which takes 10 seconds to execute.
But the consumer of the API wants a response as soon as they hit your API which is return_status() function.
You tie the my_task to a thread and then return the quick response to the API consumer, while in the background the big process gets compelete.
Below is a simple POC.
import os
from flask import Flask,jsonify
import time
from threading import Thread
app = Flask(__name__)
#app.route("/")
def main():
return "Welcome!"
#app.route('/add_')
def return_status():
"""Return first the response and tie the my_task to a thread"""
Thread(target = my_task).start()
return jsonify('Response asynchronosly')
def my_task():
"""Big function doing some job here I just put pandas dataframe to csv conversion"""
time.sleep(10)
import pandas as pd
pd.DataFrame(['sameple data']).to_csv('./success.csv')
return print('large function completed')
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8080)
Sadly teardown callbacks do not execute after the response has been returned to the client:
import flask
import time
app = flask.Flask("after_response")
#app.teardown_request
def teardown(request):
time.sleep(2)
print("teardown_request")
#app.route("/")
def home():
return "Success!\n"
if __name__ == "__main__":
app.run()
When curling this you'll note a 2s delay before the response displays, rather than the curl ending immediately and then a log 2s later. This is further confirmed by the logs:
teardown_request
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
The correct way to execute after a response is returned is to use WSGI middleware that adds a hook to the close method of the response iterator. This is not quite as simple as the teardown_request decorator, but it's still pretty straight-forward:
import traceback
from werkzeug.wsgi import ClosingIterator
class AfterResponse:
def __init__(self, app=None):
self.callbacks = []
if app:
self.init_app(app)
def __call__(self, callback):
self.callbacks.append(callback)
return callback
def init_app(self, app):
# install extension
app.after_response = self
# install middleware
app.wsgi_app = AfterResponseMiddleware(app.wsgi_app, self)
def flush(self):
for fn in self.callbacks:
try:
fn()
except Exception:
traceback.print_exc()
class AfterResponseMiddleware:
def __init__(self, application, after_response_ext):
self.application = application
self.after_response_ext = after_response_ext
def __call__(self, environ, start_response):
iterator = self.application(environ, start_response)
try:
return ClosingIterator(iterator, [self.after_response_ext.flush])
except Exception:
traceback.print_exc()
return iterator
Which you can then use like this:
#app.after_response
def after():
time.sleep(2)
print("after_response")
From the shell you will see the response return immediately and then 2 seconds later the after_response will hit the logs:
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
after_response
This is a summary of a previous answer provided here.
I had a similar problem with my blog. I wanted to send notification emails to those subscribed to comments when a new comment was posted, but I did not want to have the person posting the comment waiting for all the emails to be sent before he gets his response.
I used a multiprocessing.Pool for this. I started a pool of one worker (that was enough, low traffic site) and then each time I need to send an email I prepare everything in the Flask view function, but pass the final send_email call to the pool via apply_async.
You can find an example on how to use celery from within Flask
here https://gist.github.com/jzempel/3201722
The gist of the idea (pun intended) is to define the long, book-keeping tasks as #celery.task and use apply_async1 or delay to from within the view to start the task
Sounds like Teardown Callbacks would support what you want. And you might want to combine it with the pattern from Per-Request After-Request Callbacks to help with organizing the code.
You can do this with WSGI's close protocol, exposed from the Werkzeug Response object's call_on_close decorator. Explained in this other answer here: https://stackoverflow.com/a/63080968/78903
Im new to task queue api in google app engine. I have created a new queue and added a task in it using the taskqueue.add() function. I have defined the url of the task and have written down the logic for the task the url. But the task is NOT HAPPENING ASYNCHRONOUSLY as the app is waiting for the task to complete and then it continues executing the statement after the taskqueue.add() function. How do i make the task asynchronous? Any help on this issue is appreciated.
the code looks like this
class botinitiate(webapp.RequestHandler):
def get(self):
# some more statements here
template_values = {'token': token,
'me': user.user_id()
}
taskqueue.add(url='/autobot', params={'key':game_key},queue_name='autobot')
path = os.path.join(os.path.dirname(__file__), 'index.html')
self.response.out.write(template.render(path, template_values))
class autobot(webapp.RequestHandler):
def post(self):
# task logic goes here
application = webapp.WSGIApplication([('/botinitiate',botinitiate),('/autobot',autobot)],debug=True)
def main():
run_wsgi_app(application)
if __name__ == "__main__":
main()
Thanks
The recently developed dev_appserver2 provides concurrency between user requests and task queue requests, for a more accurate emulation of production.
Task queues on App Engine are asynchronous; there's no way for the request that enqueued the task to know when the task is run (short of making RPC calls or other deliberate communication). What you may be observing is the single-threaded nature of the dev_appserver development environment; this certainly won't be the case in production.
So you'd use:
add_async(task, transactional=False, rpc=None)
Source: https://developers.google.com/appengine/docs/python/taskqueue/queues
You'd need to read the docs at the above URL and apply it to your own code.