Flask slow at retrieving POST data from request?

I'm writing a Flask application that accepts POST requests with JSON data. I noticed huge differences in response time based on the size of the data being passed to the application. After debugging, I narrowed the issue down to the line where I retrieve the JSON data from the request object. It may be important to note that testing was done on the Flask development server.
# inside a Flask view function; assumes time and json are imported
start = time.time()
resp = json.dumps(request.json)
return str(time.time() - start)
I timed this line: for payloads of 1024 characters or fewer (probably not a coincidence) it took about 0.002s, and for anything over 1024 characters it took over 1 second!
What is happening here? Is this a limitation of the development server?
EDIT:
The same thing happens when getting POST data through request.form.get('somedata') with a content length over 1024.
EDIT:
I couldn't replicate the issue with the same example served by Apache.
EDIT:
I started digging into the Werkzeug module and found that the slowness occurs while the request body is read in self._read(to_read) in the wsgi.py module, where the stream is passed in from BaseHTTPRequestHandler. I still don't know why it is so slow.
Here are the environment details:
Ubuntu - 10.04
Python - 2.6.5
Flask - 0.9
Werkzeug - 0.8.3

The Flask development server is expected to be slow. From http://flask.pocoo.org/docs/deploying/:
You can use the builtin server during development, but you should use a full deployment option for production applications. (Do not use the builtin development server in production.)
As Marcus mentioned in the comments, another WSGI server like gunicorn or Tornado would be much faster and more reliable, so definitely use one of those for deployment and benchmarking.
If you're worried about working quickly during development, you can use gunicorn in development just as you would in deployment. If you're deploying to Heroku, for example, you can run "foreman start" and the gunicorn server will start right up.
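For instance, here is a minimal sketch of serving a Flask app with Tornado's WSGI container, mirroring the standalone-Tornado recipe in the Flask deployment docs (it assumes a Tornado 4-era API and that your app object is importable from a module named yourapplication, which is a placeholder):
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from yourapplication import app  # hypothetical module name

# wrap the Flask WSGI app and serve it on port 5000
http_server = HTTPServer(WSGIContainer(app))
http_server.listen(5000)
IOLoop.instance().start()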

I had this problem on a line like this in a Flask POST handler; it was taking about 1.0 second!
username=request.form.get('username')
I was testing it with curl -F:
curl -F username="x" http://127.0.0.1:5000/func
I just changed -F to -d and it took 0.0004 seconds!
curl -d username="x" http://127.0.0.1:5000/func
I think Flask has a problem retrieving "multipart/form-data" content. (Note that -F sends the body as multipart/form-data, while -d sends it as application/x-www-form-urlencoded.)
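If you want to reproduce the difference from Python rather than curl, here is a small sketch using the requests library against the same example endpoint (files= produces a multipart/form-data body like -F, while data= produces application/x-www-form-urlencoded like -d):
import requests

# like curl -F: multipart/form-data body (the slow case)
requests.post('http://127.0.0.1:5000/func', files={'username': (None, 'x')})

# like curl -d: application/x-www-form-urlencoded body (the fast case)
requests.post('http://127.0.0.1:5000/func', data={'username': 'x'})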

If you use curl to send the request, Expect: 100-continue might cause the behavior. I ran into similar behavior with uWSGI, Flask and curl. What happens in my case is the following:
If the request body is larger than 1024 bytes, curl posts the data with an Expect: 100-continue header.
However, uWSGI can't deal with that header, so it doesn't respond with 100 Continue.
curl then waits for the 100-continue response until a roughly one-second timeout expires.
The post "When curl sends 100-continue" on Georg's Log was useful for me in understanding the curl behavior.
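If this header is the culprit, a standard curl workaround is to suppress it by sending it empty (large.json below is just a placeholder for any body over 1024 bytes):
curl -H "Expect:" -d @large.json http://127.0.0.1:5000/func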

Related

How to manually invoke SpellCheck avoiding Browser timeout

I'm trying to update the SpellChecker in my local MoinMoin installation according to this documentation page: https://moinmo.in/HelpOnSpellCheck.
I followed the steps, got a new dictionary file and symlinked it into the data/dict directory in the MoinMoin installation path. I then deleted data/cache/spellchecker.dict, which should be rebuilt upon invoking the SpellCheck action. If I visit my wiki and use SpellCheck, the browser times out while the SpellCheck database is being built, as expected according to the link above.
In the documentation it says: "If your browser or the webserver timeouts before the file is completely built, one solution is to telnet into your webserver, and manually request the page." This is what I'm trying to do. Unfortunately the request does not seem to invoke the database creation and quickly returns the requested page.
Here is how I requested the page (I'm hosting it locally over port 8085):
telnet 192.168.1.199 8085
Trying 192.168.1.199...
Connected to 192.168.1.199.
Escape character is '^]'.
HEAD /wiki/FrontPage?action=SpellCheck HTTP/1.1
Host: 192.168.1.199
HTTP/1.1 200 OK
...
I would expect the request to invoke the database creation, as it does in a web browser. This should take a few minutes, and afterwards I should be able to find the created database in data/cache/. Unfortunately this does not happen. (One possible difference: the telnet session sends a HEAD request, which the server may handle differently from the GET a browser issues.)
If anyone else is interested, here is how I ended up solving the problem: I narrowed the possible timeouts down to either the webserver (nginx) or uWSGI, so I made the following changes to the config files:
In /etc/moin/uwsgi.ini:
harakiri = 9999
In /etc/nginx/nginx.conf:
uwsgi_read_timeout 9999;
uwsgi_send_timeout 9999;
I then used the Python package requests to send the GET request to the server:
import requests
r = requests.get('http://192.168.1.199:8085/wiki/FrontPage',
                 params={'action': 'SpellCheck'}, timeout=9999)
This ran for about 2-3 minutes. Afterwards, the word count shown when performing a spelling check was 629182, reflecting the number of words present in my dictionary.

Simultaneous requests with turbogears2

I'm very new to web dev, and I'm trying to build a simple web interface with Ajax calls to refresh data, with TurboGears2 as the backend.
My Ajax calls work fine and periodically hit my TurboGears2 server; however, these calls take time to complete (some requests make the server run remote SSH commands on other machines, which take up to 3-4 seconds to complete).
My problem is that TurboGears waits for each request to complete before handling the next one, so all my concurrent Ajax calls are queued instead of being processed in parallel.
Refreshing N values takes 3*N seconds when it could take just 3 seconds with concurrency.
Any idea how to fix this?
Here is my current server-side code (get_load is the method called via Ajax):
class RootController(TGController):
    @expose()
    def index(self):
        with open("index.html") as data:
            index = data.read()
        return index

    @expose()
    def get_load(self, ip):
        command = "bash get_cpu_load.sh"
        # capture stdout so communicate() returns the command's output
        request = subprocess.Popen(["ssh", "-o", "ConnectTimeout=2", ip, command],
                                   stdout=subprocess.PIPE)
        load = str(request.communicate()[0])
        return load
Your problem is probably caused by the fact that you are serving requests with the Gearbox wsgiref server. By default the wsgiref server is single-threaded and so can serve a single request at a time. That can be changed by setting the wsgiref.threaded = true configuration option in the server section of your development.ini (the same section where the IP address and port are specified). See https://github.com/TurboGears/gearbox#gearbox-http-servers and http://turbogears.readthedocs.io/en/latest/turbogears/gearbox.html#changing-http-server for additional details.
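A sketch of the relevant development.ini section, assuming the default Gearbox wsgiref setup (your host and port values will differ):
[server:main]
use = egg:gearbox#wsgiref
host = 127.0.0.1
port = 8080
# serve concurrent requests with threads
wsgiref.threaded = true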
Note that wsgiref is the development server for TurboGears and its use in production is usually discouraged. You should consider using something like Waitress, Chaussette or mod_wsgi when deploying your application; see http://turbogears.readthedocs.io/en/latest/cookbook/deploy/index.html?highlight=deploy

Bottle with CherryPy does not behave multi-threaded when same end-point is called

As far as I know, Bottle, when used with the CherryPy server, should behave multi-threaded. I have a simple test program:
from bottle import Bottle, run
import time

app = Bottle()

@app.route('/hello')
def hello():
    time.sleep(5)

@app.route('/hello2')
def hello2():
    time.sleep(5)

run(app, host='0.0.0.0', server="cherrypy", port=8080)
When I call localhost:8080/hello by opening two tabs and refreshing them at the same time, they don't return at the same time; one completes after 5 seconds and the other after 5 more seconds.
But when I call /hello in one tab and /hello2 in another at the same time, they finish at the same time.
Why does Bottle not behave multi-threaded when the same endpoint is called twice? Is there a way to make it multi-threaded?
Python version: 2.7.6
Bottle version: 0.12.8
CherryPy version: 3.7.0
OS: Tried on both Ubuntu 14.04 64-Bit & Windows 10 64-Bit
I have already met this behaviour while answering another question, and it had me confused. If you search around for related questions, the list goes on and on.
The suspect was some incorrect server-side handling of Keep-Alive, HTTP pipelining, cache policy or the like. But in fact it has nothing to do with the server side at all. Concurrent requests to the same URL are serialised because of the browser's cache implementation (Firefox, Chromium). The best answer I found before searching the bug trackers directly says:
Necko's cache can only handle one writer per cache entry. So if you make multiple requests for the same URL, the first one will open the cache entry for writing and the later ones will block on the cache entry open until the first one finishes.
Indeed, if you disable the cache in Firebug or DevTools, the effect goes away.
Thus, if your clients are not browsers (API clients, for example), just ignore the issue. Otherwise, if you really need to make concurrent requests from one browser to the same URL (normal requests or XHRs), add a random query string parameter to make the request URLs unique, e.g. http://example.com/concurrent/page?nocache=1433247395.
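To see that the serialisation is browser-side, here is a quick sketch of a client script (hypothetical, written for Python 2 to match the versions above) that fires two concurrent requests at the same handler from outside a browser; with the CherryPy server both should finish after about 5 seconds total rather than 10:
import threading, time, urllib2

def fetch(url):
    # time a single request
    start = time.time()
    urllib2.urlopen(url).read()
    print url, 'took', round(time.time() - start, 2), 's'

# two concurrent requests to the same endpoint; no browser cache involved
threads = [threading.Thread(target=fetch, args=('http://localhost:8080/hello',))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()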
It's almost certainly your browser that's serialising the requests. Try using two different browsers, or better yet a real client. It doesn't reproduce for me using curl.

POST flask server with XML from python

I have a Flask server up and running on PythonAnywhere, and I am trying to write a Python script I can run locally to trigger a particular response; let's say the server time, for the sake of this discussion.
There is plenty of documentation on how to write the Flask server side of this process, but very little on how to write something that can trigger the Flask app to run.
I have tried sending XML in the form of a simple curl command e.g.
curl -X POST -d '<From>Jack</From><Body>Hello, it worked!</Body>' URL
But this doesn't seem to work (errors about referral headers).
Could someone let me know the correct way to compose some XML that can be sent to a listening Flask server?
Thanks,
Jack
First, I would add -H "Content-Type: text/xml" to the headers in the cURL call so the server knows what to expect. It would be helpful if you posted the server code (not necessarily everything, but at least what's failing).
To debug this I would use:
@app.before_request
def before_request():
    if True:
        print "HEADERS", request.headers
        print "REQ_path", request.path
        print "ARGS", request.args
        print "DATA", request.data
        print "FORM", request.form
It's a bit rough, but it helps to see what's going on with each request. Turn it on and off using the if statement as needed while debugging.
Running your request without the XML header in the cURL call sends the data to the request.form dictionary. Adding the XML header definition results in the data appearing in request.data. Without knowing where your server fails, the above should at least give you a hint on how to proceed.
EDIT, referring to the comment below:
I would use the excellent xmltodict library. Use this to test:
import xmltodict

@app.before_request
def before_request():
    print xmltodict.parse(request.data)['xml']['From']
with this cURL call:
curl -X POST -d '<xml><From>Jack</From><Body>Hello, it worked!</Body></xml>' localhost:5000 -H "Content-Type: text/xml"
'Jack' prints out without issues.
Note that this call has been modified from your question: the 'xml' tag has been added, since XML requires a root node (it's called an XML tree for a reason..). Without this tag you'll get a parsing error from xmltodict (or any other parser you choose).
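Since you wanted a local Python script rather than curl, here is the equivalent request as a sketch using the requests library (the URL is the local test endpoint from above; substitute your PythonAnywhere URL):
import requests

xml = '<xml><From>Jack</From><Body>Hello, it worked!</Body></xml>'
r = requests.post('http://localhost:5000/',
                  data=xml,
                  headers={'Content-Type': 'text/xml'})
print r.status_code, r.text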

Timeout with Python Requests + Clojure HttpKit Server but not Ring server

I have some Ring routes which I'm running in one of two ways:
lein ring server, with the lein-ring plugin
using org.httpkit.server, like (hs/run-server app {:port 3000})
It's a web app (being consumed by an Angular.js browser client).
I have some API tests written in Python using the Requests library:
my_r = requests.post(MY_ROUTE,
                     data=MY_DATA,
                     headers={"Content-Type": "application/json"},
                     timeout=10)
When I use lein ring server, this request works fine in the JS client and the Python tests.
When I use httpkit, this works fine in the JS client, but the Python client times out with
socket.timeout: timed out
I can't figure out why the Python client is timing out. It happens with httpkit but not with lein-ring, so I can only assume that the cause is related to the difference.
I've looked at the traffic in WireShark and both look like they give the correct response. Both have the same Content-Length field (15 bytes).
I've raised the number of threads to 10 (it shouldn't be needed) and there was no change.
Any ideas what's wrong?
I found out how to fix this, but I have no satisfactory explanation.
I was using the wrap-json-response Ring middleware to take a hash map and convert it to JSON. I switched to doing my own conversion in my handler with json/write-str, and this fixed it.
At a guess it might be something to do with how the server handles output buffering, but that's speculation.
I've combed through the Wireshark dumps and I can't see any relevant differences between the two. The sent Content-Length fields are identical. The 'bytes in flight' differ, at 518 and 524.
No clue as to why the web browser was happy with this but Python Requests wasn't, nor whether this is a bug in Requests, httpkit, ring-middleware-format or my own code.
