How to use grequests in flask? - python

I am experimenting with gunicorn with gevent workers and Flask. In a view function, I need to make a GET request to a URL. Previously, I was using the requests library to do that. But since I want to use gevent workers, I believe I have to make it async, so I use the grequests library.
This is my code:
from gevent import monkey
monkey.patch_all()

import grequests
from flask import Flask
from flask import jsonify

app = Flask(__name__)
pool = grequests.Pool()

@app.route('/<search_term>')
def find_result(search_term):
    return get_result(search_term) or ''

def get_result(search_term):
    request = grequests.get('https://www.google.com/search?q=' + search_term)
    job = grequests.send(request, pool=pool)
    async_resp = job.get()  # this blocks
    resp = async_resp.response
    if resp.status_code == 200:
        return resp.text
Is this the correct way to use grequests, given that I'm calling the blocking job.get()? I want to exploit the fact that my problem is IO-bound.
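For comparison, here is a minimal sketch of the same view using plain requests, which is already cooperative once monkey.patch_all() has run. This assumes gunicorn is started with the gevent worker class, and is an alternative, not necessarily the answer:

from gevent import monkey
monkey.patch_all()

import requests
from flask import Flask

app = Flask(__name__)

@app.route('/<search_term>')
def find_result(search_term):
    # Under gevent this blocks only the current greenlet; the worker
    # keeps serving other requests while waiting on the socket.
    resp = requests.get('https://www.google.com/search', params={'q': search_term})
    return resp.text if resp.status_code == 200 else ''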

Related

Returning 'still loading' response with Flask API

I have a scikit-learn classifier running as a Dockerised Flask app, launched with gunicorn. It receives input data in JSON format as a POST request, and responds with a JSON object of results.
When the app is first launched with gunicorn, a large model (serialised with joblib) is read from a database, and loaded into memory before the app is ready for requests. This can take 10-15 minutes.
A reproducible example isn't feasible, but the basic structure is illustrated below:
from flask import Flask, jsonify, request, Response
import joblib
import json

def classifier_app(model_name):
    # Line below takes 10-15 mins to complete
    classifier = _load_model(model_name)

    app = Flask(__name__)

    @app.route('/classify_invoice', methods=['POST'])
    def apicall():
        query = request.get_json()
        results = _build_results(query['data'])
        return Response(response=results,
                        status=200,
                        mimetype='application/json')

    print('App loaded!')
    return app
How do I configure Flask or gunicorn to return a 'still loading' response (or a suitable error message) to any incoming HTTP requests while _load_model is still running?
Basically, you want to return two responses for one request, and there are two ways to do that.
The first is to run the time-consuming task in the background and ping the server with simple AJAX requests every couple of seconds to check whether the task has completed. If it has, return the result; if not, return a "Please standby" string or similar (see the sketch below).
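A minimal sketch of this polling approach, assuming an in-memory job store; the endpoint names and the jobs dict are illustrative, and your_heavy_function() stands in for the real work:

import uuid
from threading import Thread

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> result, or None while the task is still running
           # (in-memory store; use Redis or similar in production)

def your_heavy_function():
    return "result"  # stand-in for the real time-consuming work

def run_job(job_id):
    jobs[job_id] = your_heavy_function()

@app.route('/start', methods=['POST'])
def start():
    job_id = str(uuid.uuid4())
    jobs[job_id] = None
    Thread(target=run_job, args=(job_id,)).start()
    return jsonify({'job_id': job_id})

@app.route('/status/<job_id>')
def status(job_id):
    # The client polls this endpoint every couple of seconds.
    result = jobs.get(job_id)
    if result is None:
        return jsonify({'status': 'Please standby'})
    return jsonify({'status': 'done', 'result': result})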
The second is to use WebSockets via the flask-socketio extension.
Basic server code would be something like this:
from flask import Flask, Response
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def do_work():
    result = your_heavy_function()
    socketio.emit("result", {"result": result}, namespace="/test/")

@app.route("/api/", methods=["POST"])
def start():
    socketio.start_background_task(target=do_work)
    # return intermediate response
    return Response()
On the client side, you would do something like this:
var socket = io.connect('http://' + document.domain + ':' + location.port + '/test/');
socket.on('result', function(msg) {
    // Process the result here
});
For further details, see this blog post, the flask-socketio documentation for the server side, and the socket.io documentation for the client side.
P.S. Using WebSockets this way, you can implement a progress bar too; see the sketch below.
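For example, the background task could emit periodic "progress" events that the client subscribes to; the event name and heavy_step() here are hypothetical:

def do_work():
    for step in range(10):
        heavy_step(step)  # hypothetical unit of the heavy work
        # Each emit advances the client-side progress bar.
        socketio.emit("progress", {"percent": (step + 1) * 10}, namespace="/test/")
    socketio.emit("result", {"result": "done"}, namespace="/test/")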

Process Multiple Requests Simultaneously and return the result using Klein Module Python

Hi, I am using the Klein Python module for my web server.
I need to run each request separately as a thread and also return the result. But Klein waits until one request completes before processing the next.
I also tried deferToThread from the twisted module, but it likewise processes a request only after the first one completes. The @inlineCallbacks method produces the same result.
Note: these approaches work perfectly when there is nothing to return, but I need to return the result.
Here is a sample code snippet:
import time

import klein
import requests
from twisted.internet import threads

def test():
    print("started")
    x = requests.get("http://google.com")
    time.sleep(10)
    return x.text

app = klein.Klein()

@app.route('/square/submit', methods=['GET'])
def square_submit(request):
    return threads.deferToThread(test)

app.run('localhost', 8000)
As @notorious.no suggested, the code is valid and it works.
To prove it, check out this code:
# app.py
from datetime import datetime
import json
import time
import random
import string

import requests
import treq
from klein import Klein
from twisted.internet import task
from twisted.internet import threads
from twisted.web.server import Site
from twisted.internet import reactor, endpoints

app = Klein()

def test(y):
    print(f"test called at {datetime.now().isoformat()} with arg {y}")
    x = requests.get("http://www.example.com")
    time.sleep(10)
    return json.dumps([{
        "time": datetime.now().isoformat(),
        "text": x.text[:10],
        "arg": y
    }])

@app.route('/<string:y>', methods=['GET'])
def index(request, y):
    return threads.deferToThread(test, y)

def send_requests():
    # send 3 concurrent requests
    rand_letter = random.choice(string.ascii_letters)
    for i in range(3):
        y = rand_letter + str(i)
        print(f"request send at {datetime.now().isoformat()} with arg {y}")
        d = treq.get(f'http://localhost:8080/{y}')
        d.addCallback(treq.content)
        d.addCallback(lambda r: print("response", r.decode()))

loop = task.LoopingCall(send_requests)
loop.start(15)  # repeat every 15 seconds

reactor.suggestThreadPoolSize(3)

# disable unwanted logs: instead of app.run("localhost", 8080),
# start the site manually so the reactor logs only the print calls
web_server = endpoints.serverFromString(reactor, "tcp:8080")
web_server.listen(Site(app.resource()))
reactor.run()
Install treq and klein and run it
$ python3.6 -m pip install treq klein requests
$ python3.6 app.py
The output should be
request send at 2019-12-28T13:22:27.771899 with arg S0
request send at 2019-12-28T13:22:27.779702 with arg S1
request send at 2019-12-28T13:22:27.780248 with arg S2
test called at 2019-12-28T13:22:27.785156 with arg S0
test called at 2019-12-28T13:22:27.786230 with arg S1
test called at 2019-12-28T13:22:27.786270 with arg S2
response [{"time": "2019-12-28T13:22:37.853767", "text": "<!doctype ", "arg": "S1"}]
response [{"time": "2019-12-28T13:22:37.854249", "text": "<!doctype ", "arg": "S0"}]
response [{"time": "2019-12-28T13:22:37.859076", "text": "<!doctype ", "arg": "S2"}]
...
As you can see Klein does not block the requests.
Furthermore, if you decrease thread pool size to 2
reactor.suggestThreadPoolSize(2)
Klein will execute the first 2 requests and wait until there is a free thread again.
And "async alternatives", suggested by #notorious.no are discussed here.
"But Klein waits until the completion of a single request to process another request."
This is not true. In fact, there's absolutely nothing wrong with the code you've provided. Simply running your example server at tcp:localhost:8000 and using the following curl commands invalidates your claim:
curl http://localhost:8000/square/submit & # run in background
curl http://localhost:8000/square/submit
Am I correct in assuming you're testing the code in a web browser? If so, you're experiencing a "feature" of most modern browsers: the browser makes a single request per URL at a given time. One way around this in the browser is to add a bogus query string to the end of the URL, like so:
http://localhost:8000/square/submit
http://localhost:8000/square/submit?bogus=0
http://localhost:8000/square/submit?bogus=1
http://localhost:8000/square/submit?bogus=2
However, a very common mistake new Twisted/Klein developers tend to make is to write blocking code, thinking that Twisted will magically make it async. Example:
@app.route('/square/submit')
def square_submit(request):
    print("started")
    x = requests.get('https://google.com')  # blocks the reactor
    time.sleep(5)  # blocks the reactor
    return x.text
Code like this will handle requests sequentially and should be modified to use async alternatives, as in the sketch below.
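For instance, here is a hedged sketch of a non-blocking version of that handler, using treq for the HTTP call and task.deferLater() in place of time.sleep():

import treq
from klein import Klein
from twisted.internet import reactor, task
from twisted.internet.defer import inlineCallbacks, returnValue

app = Klein()

@app.route('/square/submit')
@inlineCallbacks
def square_submit(request):
    response = yield treq.get(b'https://google.com')  # does not block the reactor
    text = yield response.text()
    yield task.deferLater(reactor, 5, lambda: None)  # non-blocking "sleep"
    returnValue(text)

app.run('localhost', 8000)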

Handling async requests using multiple APIs - Python Flask Redis

I'm working on an application that has to consult multiple APIs for information and, after processing the data, output an answer to a client. The client connects through a browser to a web server, which forwards the request; the web server then gathers the needed information from the multiple APIs, joins their responses, and returns an answer to the client.
The web server was built using Flask, and a module that extracts the needed information from each API was also implemented (in Python). Since consulting each API takes time, I would like to give the web server a response timeout: after the requests are sent, only those that come back within the time buffer will be used.
My proposed solution:
Use a Redis queue and an RQ worker to enqueue the requests for each API and store the responses on the queue, then wait for the timeout and collect the responses that arrived within the allowed time. Afterwards, process the information and respond to the user. A rough sketch of that idea follows.
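This sketch assumes each API call is wrapped in a plain function that RQ can enqueue; the gather_with_timeout() helper and its timeout value are illustrative, not part of RQ:

import time

import redis
from rq import Queue

conn = redis.from_url('redis://localhost:6379')
q = Queue(connection=conn)

def gather_with_timeout(funcs, question, timeout=5.0):
    # Enqueue one job per API, then poll until the deadline.
    jobs = [q.enqueue(f, question) for f in funcs]
    deadline = time.time() + timeout
    while time.time() < deadline and not all(j.is_finished for j in jobs):
        time.sleep(0.1)
    # Keep only the responses that arrived before the timeout.
    return [j.result for j in jobs if j.is_finished]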
The Flask web server is set up something like this:
@app.route('/result', methods=["POST"])
def show_result():
    inputText = request.form["question"]
    tweetModule = Twitter()
    tweeterResponse = tweetModule.ask(params=inputText)
    redditObject = RedditModule()
    redditResponse = redditObject.ask(params=inputText)
    edmunds = Edmunds()
    edmundsJson = edmunds.ask(params=inputText)
    # More APIs could be consulted here
    # Send each request async and then synchronize the responses from the queue
    template = env.get_template('templates/result.html')
    return render_template(template, resp=resp)
The worker:
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
And let's assume each module handles its own queueing process.
I can see some problems ahead:
What happens to the information stored on the queue that did not make it to the timeout?
How can I make Flask wait and then extract the responses from the Queue?
Is it possible that information could get mixed if two clients ask in the same time-frame?
Is there a better way to handle the async requests and then synchronize the response?
Thanks!
In such cases I prefer a combination of HTTPX and flask[async].
First - HTTPX
HTTPX offers a standard synchronous API by default, but also gives you the option of an async client if you need it.
Async is a concurrency model that is far more efficient than multi-threading, and can provide significant performance benefits and enable the use of long-lived network connections such as WebSockets.
If you're working with an async web framework then you'll also want to use an async client for sending outgoing HTTP requests.
>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
Second - Using async and await in Flask
Routes, error handlers, before request, after request, and teardown functions can all be coroutine functions if Flask is installed with the async extra (pip install flask[async]). It requires Python 3.7+ where contextvars.ContextVar is available. This allows views to be defined with async def and use await.
For example, you should do something like this:
import asyncio

import httpx
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/async', methods=['GET', 'POST'])
async def async_form():
    if request.method == 'POST':
        ...
        async with httpx.AsyncClient() as client:
            tweeterResponse, redditResponse, edmundsJson = await asyncio.gather(
                client.get(f'https://api.tweeter....../id?id={request.form["tweeter_id"]}', timeout=None),
                client.get(f'https://api.redditResponse.....?key={APIKEY}&reddit={request.form["reddit_id"]}'),
                client.post('https://api.edmundsJson.......', data=inputText)
            )
        ...
        resp = {
            "tweeter_response": tweeterResponse,
            "reddit_response": redditResponse,
            "edmunds_json": edmundsJson
        }
        template = env.get_template('templates/result.html')
        return render_template(template, resp=resp)

Is a Python requests Session object shared between gevent greenlets, thread-safe (between greenlets)?

Can a requests library session object be used across greenlets safely in a gevented program?
EDIT - ADDING MORE EXPLANATION:
When a greenlet yields because it has made a socket call to send the request to the server, can the same socket (inside the shared session object) be used safely by another greenlet to send its own request?
END EDIT
I attempted to test this with the code posted here - https://gist.github.com/donatello/0b399d0353cb29dc91b0 - and got no errors or unexpected results, but that alone does not validate thread safety.
In the test, I use a shared session object to make lots of requests and try to see whether the object gets the requests mixed up. It is somewhat naive, but I do not get any exceptions.
For convenience, I am re-pasting the code here:
client.py
import gevent
from gevent.monkey import patch_all
patch_all()

import requests
import json

s = requests.Session()

def make_request(s, d):
    r = s.post("http://127.0.0.1:5000", data=json.dumps({'value': d}))
    if r.content.strip() != str(d*2):
        print("Sent %s got %s" % (r.content, str(d*2)))
    if r.status_code != 200:
        print(r.status_code)
        print(r.content)

gevent.joinall([
    gevent.spawn(make_request, s, v)
    for v in range(300)
])
server.py
from gevent.wsgi import WSGIServer
from gevent.monkey import patch_all
patch_all()

from flask import Flask
from flask import request
import time
import json

app = Flask(__name__)

@app.route('/', methods=['POST', 'GET'])
def hello_world():
    d = json.loads(request.data)
    return str(d['value']*2)

if __name__ == '__main__':
    http_server = WSGIServer(('', 5000), app)
    http_server.serve_forever()
Exact library versions:
requirements.txt
Flask==0.10.1
Jinja2==2.7.2
MarkupSafe==0.23
Werkzeug==0.9.4
argparse==1.2.1
gevent==1.0.1
greenlet==0.4.2
gunicorn==18.0
itsdangerous==0.24
requests==2.3.0
wsgiref==0.1.2
Is there some other test that can check greenlet thread safety? The requests documentation is not too clear on this point.
The author of requests has also created a gevent-integration package: grequests. Use that instead.
It supports passing in a session with the session keyword:
import json

import grequests
import requests

s = requests.Session()
reqs = [grequests.post("http://127.0.0.1:5000",
                       data=json.dumps({'value': d}), session=s)
        for d in range(300)]
responses = grequests.map(reqs)

# grequests.map() returns responses in request order, so the index
# recovers the value each request was sent with.
for d, r in enumerate(responses):
    if r.content.strip() != str(d*2):
        print("Sent %s got %s" % (r.content, str(d*2)))
    if r.status_code != 200:
        print(r.status_code)
        print(r.content)
Note that while the session is shared, concurrent requests must each use a separate connection (socket); HTTP/1.x cannot multiplex multiple queries over one connection.
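If you expect many greenlets in flight at once, one option is to size the shared session's connection pool to match; the numbers below are assumptions about your concurrency level, not requirements:

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# One socket per in-flight request; size the pool for ~300 concurrent greenlets.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=300)
s.mount('http://', adapter)
s.mount('https://', adapter)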

Python & URLLIB2 - Request webpage but don't wait for response

In Python, how would I go about making an HTTP request without waiting for a response? I don't care about getting any data back; I just need the server to register a page request.
Right now I use this code:
urllib2.urlopen("COOL WEBSITE")
But obviously this pauses the script until a response is returned; I just want to fire off the request and move on.
How would I do this?
What you want here is called threading or asynchronous I/O.
Threading:
Wrap the call to urllib2.urlopen() in a threading.Thread()
Example:
from threading import Thread
import urllib2

def open_website(url):
    return urllib2.urlopen(url)

Thread(target=open_website, args=["http://google.com"]).start()
Asynchronous:
Unfortunately there is no standard way of doing this in the Python standard library.
Use the requests library, which had this support (note: the async module has since been removed from requests; its functionality lives on in the separate grequests package).
Example:
from requests import async
async.get("http://google.com")
There is also a third option using the restclient library, which has had built-in asynchronous support for some time:
from restclient import GET
res = GET("http://google.com", async=True, resp=True)
Use a thread:
import threading
import urllib2

threading.Thread(target=urllib2.urlopen, args=('COOL WEBSITE',)).start()
Don't forget that the args argument should be a tuple; that's why there's a trailing ,.
You can do this with the requests library as follows:
import requests

try:
    requests.get("http://127.0.0.1:8000/test/", timeout=10)
except requests.exceptions.ReadTimeout:
    # this confirms that the request reached the server
    do_something()  # placeholder for your follow-up logic
except:
    print("unable to reach server")
    raise
With the code above you can send requests without waiting for the full response. Specify the timeout according to your needs; if you omit it, the request will never time out.
gevent may be a proper choice.
First patch the socket:
import gevent
from gevent import monkey

monkey.patch_socket()
monkey.patch_ssl()
Then use gevent.spawn() to encapsulate your requests as greenlets. It will not block the main thread and is very fast! A minimal sketch follows.
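This sketch reuses the urllib2 call from the question; join the greenlet only if you later need the result:

import gevent
from gevent import monkey
monkey.patch_socket()
monkey.patch_ssl()

import urllib2  # cooperative after monkey-patching

# Fire the request in a greenlet and move on immediately.
job = gevent.spawn(urllib2.urlopen, 'http://example.com/')
# ... other work here ...
# job.join()  # only if you decide you need the response after all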
Here's a simple tutorial.
