When I send a 10 MB HTTP request from my server to my client, both running on the same machine, it takes ~2 s to complete. I would expect it to be faster, since all the data is just moving over the localhost loopback. I followed this article to test my localhost speed and it showed ~2000 MB/s; at that rate, a 10 MB request should take about 5 ms.
Why does my HTTP request take so long and how can I speed it up?
The context for why I'm asking: I am trying to build a web-based GUI for my Python project, and it needs to display large data tables. I thought a web-based GUI could be just as fast as native frameworks like tkinter, since over localhost the HTTP request time should be negligible. I tried FastAPI, Flask, Tornado, and ExpressJS, but they all showed similarly slow performance.
server.py
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware
import uvicorn
from fastapi.responses import PlainTextResponse

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_methods=["*"],
    allow_headers=["*"],
    allow_origins=["*"],
)

DATA = 'x' * 10 ** 7  # 10 MB payload

@app.get("/api/data", response_class=PlainTextResponse)
def _():
    return DATA

uvicorn.run(app, port=8000)
client.py
import requests
import time

start = time.time()
response = requests.get("http://localhost:8000/api/data")
end = time.time()
duration = end - start
print(duration)
print(f"Response size: {len(response.content) / 1048576} MB")
Related
I'm setting up a web API using flask_restful.
The backend is just a bare Flask app with a POST method.
from flask import Flask, request
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

class Test(Resource):
    def post(self):
        return {'you sent': request.form}, 201

api.add_resource(Test, '/')

if __name__ == '__main__':
    app.run(debug=True)
I tried calling this with requests.post, but I was surprised at how slow it was, so I tried curl, which was significantly faster.
import requests
import time
import os

t0 = time.time()
# Note: os.popen() returns immediately without waiting for curl to finish
os.popen('curl --data "" http://localhost:5000')
print(time.time() - t0)

t0 = time.time()
response = requests.post('http://localhost:5000', data="")
print(time.time() - t0)
This results in the output:
0.276999950409
2.03600001335
My experience with coding web-based things is very limited, so maybe there is something obvious that I'm missing here, but why is requests.post approximately eight times slower than invoking curl?
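For what it's worth, here is a variant that waits for the curl process to exit before stopping the clock, so both timings measure a completed request (a sketch; capture_output requires Python 3.7+):

import subprocess
import time

t0 = time.time()
# subprocess.run() blocks until curl has finished
subprocess.run(['curl', '--data', '', 'http://localhost:5000'],
               capture_output=True)
print(time.time() - t0)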
This API will be the main portal for a suite of desktop applications, so responsiveness is quite important.
I have a scikit-learn classifier running as a Dockerised Flask app, launched with gunicorn. It receives input data in JSON format as a POST request, and responds with a JSON object of results.
When the app is first launched with gunicorn, a large model (serialised with joblib) is read from a database, and loaded into memory before the app is ready for requests. This can take 10-15 minutes.
A reproducible example isn't feasible, but the basic structure is illustrated below:
from flask import Flask, jsonify, request, Response
import joblib
import json

def classifier_app(model_name):
    # Line below takes 10-15 mins to complete
    classifier = _load_model(model_name)

    app = Flask(__name__)

    @app.route('/classify_invoice', methods=['POST'])
    def apicall():
        query = request.get_json()
        results = _build_results(query['data'])
        return Response(response=results,
                        status=200,
                        mimetype='application/json')

    print('App loaded!')
    return app
How do I configure Flask or gunicorn to return a 'still loading' response (or a suitable error message) to any incoming HTTP requests while _load_model is still running?
Basically, you want to return two responses for one request, so there are two different possibilities.
The first is to run the time-consuming task in the background and poll the server with simple AJAX requests every couple of seconds to check whether the task has completed. If it has, return the result; if not, return a "please stand by" message or something similar. A minimal sketch of this polling approach follows.
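Here is a minimal sketch of the polling approach, assuming the model is loaded by a background thread that flips a flag when done (the endpoint name and the stand-in loader are illustrative, not part of any library API):

from threading import Thread
import time

from flask import Flask, jsonify

app = Flask(__name__)
model_ready = False

def _load_model(name):
    time.sleep(5)  # stand-in for the 10-15 minute joblib load
    return object()

def load_in_background():
    global classifier, model_ready
    classifier = _load_model('my_model')
    model_ready = True

Thread(target=load_in_background, daemon=True).start()

@app.route('/status')
def status():
    # Clients poll this endpoint every couple of seconds
    if model_ready:
        return jsonify({'status': 'ready'})
    return jsonify({'status': 'loading'}), 503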
The second is to use WebSockets and the flask-socketio extension.
Basic server code would be something like this:
from flask import Flask, Response
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def do_work():
    result = your_heavy_function()
    socketio.emit("result", {"result": result}, namespace="/test/")

@app.route("/api/", methods=["POST"])
def start():
    socketio.start_background_task(target=do_work)
    # Return an intermediate response right away
    return Response()
On the client side, you would do something like this:
var socket = io.connect('http://' + document.domain + ':' + location.port + '/test/');
socket.on('result', function(msg) {
    // Process the result here
});
For further details, see this blog post, the flask-socketio documentation for the server-side reference, and the socket.io documentation for the client-side reference.
P.S. Using WebSockets this way, you can build a progress bar too; a rough sketch follows.
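For instance, a sketch of the progress-bar idea, reusing the socketio object from the server snippet above (the step loop, event names, and payload shape are illustrative):

import time

def do_work_with_progress():
    total_steps = 10
    for i in range(total_steps):
        time.sleep(1)  # stand-in for one unit of real work
        socketio.emit("progress",
                      {"percent": 100 * (i + 1) // total_steps},
                      namespace="/test/")
    socketio.emit("result", {"result": "done"}, namespace="/test/")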
# packages
greenlet==0.4.11
Flask==0.11.1

# CentOS, /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
This is my test code (Python 3.5). I ran it and watched memory usage.
At first, it started at 30 MB of memory with 3 threads.
But after sending a bulk of "/do" requests to this server, memory increased to 60 MB with 12 threads, and even after every request had completed, that memory usage did not change.
from gevent import monkey; monkey.patch_all(thread=False)
import gevent
from flask import Flask, request
from gevent.pywsgi import WSGIServer
import requests

app = Flask(__name__)

@app.route("/do", methods=['GET', 'POST'])
def ping():
    data = request.get_json()
    gevent.spawn(send_request, data)
    return 'pong'

def send_request(data):
    resp = requests.get("http://127.0.0.1:25000/ping", data=data)
    if resp.text != 'pong':
        app.logger.error(resp.text)

if __name__ == "__main__":
    http = WSGIServer(("0.0.0.0", 9999), app)
    http.serve_forever()

    # Never reached: serve_forever() blocks until the server stops
    end_server = True
    app.logger.info("Server will be closed")
I think this Python process ends up using all 65535 available file descriptors.
How can I make Python use fewer file descriptors than I configured in limits.conf?
Python does not seem to reuse sockets while it is busy, so when sending requests from spawned greenlets it creates new sockets over and over, up to the nofile limit from limits.conf.
So I simply set a limit for this Python process:
import resource
resource.setrlimit(resource.RLIMIT_NOFILE, (1024, 1024))
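To verify the new limit took effect, you can read it back:

import resource

# Returns the (soft, hard) limits for open file descriptors
print(resource.getrlimit(resource.RLIMIT_NOFILE))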
== updated ==
But the requests library still consumed a lot of memory.
So I decided to use the Tornado HTTP server and AsyncHTTPClient with the options below:
AsyncHTTPClient.configure("tornado.simple_httpclient.SimpleAsyncHTTPClient", max_clients=1000)
tornado.netutil.Resolver.configure("tornado.netutil.ThreadedResolver")
You need to put this configuration code at module level, just below the imports.
I also used gen.moment after finishing the response, so that it is sent to the client immediately:
@gen.coroutine
def get(self):
    self.write("pong")
    self.finish()
    yield gen.moment
    resp = yield self.application.http_client.fetch(
        "...url...", method='POST',
        headers={"Content-Type": "application/json"},
        body=json.dumps({..data..}))
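For completeness, a sketch of the wiring the snippet above assumes: the configure() calls at module level, and an http_client hung off the Application instance so handlers can reach it via self.application.http_client (the handler class name is illustrative):

import tornado.ioloop
import tornado.netutil
import tornado.web
from tornado.httpclient import AsyncHTTPClient

# Module-level configuration, just below the imports
AsyncHTTPClient.configure("tornado.simple_httpclient.SimpleAsyncHTTPClient",
                          max_clients=1000)
tornado.netutil.Resolver.configure("tornado.netutil.ThreadedResolver")

class PingHandler(tornado.web.RequestHandler):
    # the @gen.coroutine get() shown above goes here
    pass

application = tornado.web.Application([(r"/ping", PingHandler)])
application.http_client = AsyncHTTPClient()

if __name__ == "__main__":
    application.listen(9999)
    tornado.ioloop.IOLoop.current().start()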
I currently have a Flask app that makes a call to S3 as well as to an external API, with the following structure, before rendering the data in JavaScript:
from flask import Flask, render_template, make_response
from flask import request
import requests
import requests_cache
import redis
from boto3.session import Session
import json

app = Flask(__name__)

@app.route('/test')
def test1():
    bucket_root = 'testbucket'
    session = Session(
        aws_access_key_id='s3_key',
        aws_secret_access_key='s3_secret_key')
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_root)
    testvalues = json.dumps(s3.Object(bucket_root, 'all1.json').get()['Body'].read())
    r = requests.get(api_link)
    return render_template('test_html.html', json_s3_test_response=r.content,
                           limit=limit, testvalues=testvalues)

@app.route('/test2')
def test2():
    bucket_root = 'testbucket'
    session = Session(
        aws_access_key_id='s3_key',
        aws_secret_access_key='s3_secret_key')
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_root)
    testvalues = json.dumps(s3.Object(bucket_root, 'all2.json').get()['Body'].read())
    r = requests.get(api_link)
    return render_template('test_html.html', json_s3_test_response=r.content,
                           limit=limit, testvalues=testvalues)

@app.errorhandler(500)
def internal_error(error):
    return "500 error"

@app.errorhandler(404)
def not_found(error):
    return "404 error", 404

@app.errorhandler(400)
def custom400(error):
    return "400 error", 400

# Catch-all?
@app.errorhandler(Exception)
def all_exception_handler(error):
    return 'error', 500
Obviously I have a lot of inefficiencies here, but my main question is:
It seems like I'm calling S3 and the external API for each client, every time they refresh the page. This increases the chance of the app crashing due to timeouts (and my poor error handling) and hurts performance. I would like to resolve this by periodically caching the S3 results (say, every 10 minutes) in a local Redis server (already set up and running), and by pinging the external API just once from the server every few seconds before passing the result on to ALL clients.
I have code that can store the data in Redis every 10 minutes in a regular Python script (roughly like the sketch below), but I'm not sure where to place this within the Flask server. Do I put it in its own function, or keep the call to Redis inside the @app.route() handler?
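Roughly, the standalone script looks like this (the S3 helper here is a hypothetical stand-in for the boto3 calls shown above, and the key name is illustrative):

import json
import time

import redis

r = redis.Redis(host='localhost', port=6379)

def fetch_s3_json(key):
    # stand-in for the boto3 call from the Flask app above
    return {"key": key, "fetched_at": time.time()}

while True:
    payload = fetch_s3_json('all1.json')
    r.set('s3:all1', json.dumps(payload))
    time.sleep(600)  # refresh every 10 minutes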
Thank you everyone for your time and effort; any help would be appreciated! I'm new to Flask, so some of this has been confusing.
I am trying to respond to incoming web requests simultaneously, while the processing of a request includes a quite long IO call. I'm going to use gevent, since it's supposed to be "non-blocking".
The problem I found is that requests are processed sequentially even though I have a lot of gevent threads; for some reason, requests get served by a single green thread.
I have nginx (with a default config, which I don't think is relevant here), plus uWSGI and a simple WSGI app that emulates an IO-blocking call with gevent.sleep(). Here they are:
uwsgi.ini
[uwsgi]
chdir = /srv/website
home = /srv/website/env
module = wsgi:app
socket = /tmp/uwsgi_mead.sock
# daemonize = /data/work/zx900/mob-effect.mead/logs/uwsgi.log
processes = 1
gevent = 100
gevent-monkey-patch = true
wsgi.py
import gevent
import time
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    t0 = time.time()
    gevent.sleep(10.0)
    t1 = time.time()
    return "{1} - {0} = {2}".format(t0, t1, t1 - t0)
Then I opened two tabs in my browser (almost) simultaneously, and here is what I got:
1392297388.98 - 1392297378.98 = 10.0021491051
# first tab: started at 1392297378.98, finished at 1392297388.98
1392297398.99 - 1392297388.99 = 10.0081849098
# second tab: started at 1392297388.99, i.e. only after the first had finished
As you can see, the first call blocked execution of the view. What did I do wrong?
Send the requests with curl or anything other than a browser, since browsers limit the number of simultaneous connections per site or per address; or use two different browsers. A quick Python sketch that fires two requests at once is shown below.
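A minimal sketch, assuming the app from the question is reachable through nginx at http://localhost/:

import threading
import time

import requests

def hit(tag):
    t0 = time.time()
    r = requests.get("http://localhost/")
    print(tag, r.text, "took", round(time.time() - t0, 2))

# Start both requests at (almost) the same instant
threads = [threading.Thread(target=hit, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()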