Some background: I work for a corporation that uses a proxy. Ping and nslookup are blocked, and I think this may be contributing to the following problem. The operating system is Windows, and the version of Python I'm testing is 3.4.3.
I'm trying to create an application that communicates with a webservice, and this application will run inside our network. However, all requests take over 10 seconds to complete, while in the web browser the same resource loads in under a second. Note that these requests succeed; they just take too long to be usable.
I profiled the application with the cProfile module and found that it spends 11 seconds in gethostbyaddr and 4 seconds in gethostbyname.
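For reference, the profile can be reproduced with something like this (a minimal sketch; the URL is a placeholder):
import cProfile
import requests

# Sort by cumulative time to see where the request spends its time
# (for me, almost all of it is in gethostbyaddr/gethostbyname).
cProfile.run("requests.get('https://example.com')", sort='cumulative')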
I'm not familiar enough with networks, but is this a timeout? Why does the request go through despite the timeout? How do I disable these operations? And if I can't, is there a library that does not use these operations?
I tried both the requests and urllib modules. Pip is also exceedingly slow, possibly for the same reason.
Thanks in advance for any help or information on this subject.
Edit
I just tried monkey patching socket.gethostbyaddr and socket.gethostbyname, and the delay was gone. This doesn't feel like a proper solution, though.
import requests
import socket

# Replace the slow name-lookup functions with no-ops.
def do_nothing(*args):
    return None

socket.gethostbyaddr = do_nothing
socket.gethostbyname = do_nothing

r = requests.get('https://google.com')
print(r.status_code)  # prints 200
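A possibly less invasive workaround, assuming the lookups come from proxy autodetection (I haven't verified this), would be to disable that detection instead of patching socket:
import requests

# trust_env=False skips proxy detection via environment variables
# (and, on Windows, the registry); this assumes the target host is
# reachable without the corporate proxy.
session = requests.Session()
session.trust_env = False

r = session.get('https://google.com')
print(r.status_code)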
Related
I am creating an application on Python 3.6 with Flask (via connexion) and uWSGI. The logic is very simple; it just returns an OK in the response:
# main
import connexion

def main():
    app = connexion.App(__name__)
    app.add_api('openapi.yaml',
                arguments={'title': 'Service'})
    app.run(port=8080)

# endpoint
def health_get():
    return "OK"
As the code above shows, it basically responds to a health check. On the client side, I am using the requests library to send a REST request:
s = requests.Session()
...
response = s.get('http://localhost:8080/health')
...
I measured the time spent on the request s.get('http://localhost:8080/health'). It takes about 3 milliseconds. Both server and client run on localhost, so there is no network latency. I can't think of any further improvement; it seems that the framework takes most of the time. Is it possible to get the response time below 1 millisecond?
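For reference, I measure it roughly like this (sketch):
import time
import requests

s = requests.Session()
start = time.perf_counter()
response = s.get('http://localhost:8080/health')
elapsed_ms = (time.perf_counter() - start) * 1000
print('status=%s, elapsed=%.3f ms' % (response.status_code, elapsed_ms))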
This is not a heavy-load test case; a few requests per second is good enough.
If it is not possible, would a websocket connection be an option?
I don't think this is something you'd be able to improve with better Python code alone. Looking at a faster language like C++ or Java could help, as they are compiled whereas Python is interpreted, which makes it slower.
If you are committed to Python, your other option is increasing your bandwidth, for example by using a wired Ethernet connection.
I have a proposal for you: if you are designing just a heartbeat app and want to stick with Python, connexion and requests are not the right candidates. You can try the bjoern server together with Python's http.client. This removes a lot of wrapper logic from the code and should improve performance, though I cannot predict the benchmark at this point.
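A rough, unbenchmarked sketch of that combination (the handler name and port are placeholders):
# server: a bare WSGI app served by bjoern, with no framework layers
import bjoern

def health_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'OK']

bjoern.run(health_app, '127.0.0.1', 8080)

# client (separate process): http.client over one persistent connection
import http.client

conn = http.client.HTTPConnection('localhost', 8080)
conn.request('GET', '/health')
resp = conn.getresponse()
print(resp.status, resp.read())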
As far as I know, Bottle, when used with the CherryPy server, should behave multi-threaded. I have a simple test program:
from bottle import Bottle, run
import time

app = Bottle()

@app.route('/hello')
def hello():
    time.sleep(5)

@app.route('/hello2')
def hello2():
    time.sleep(5)

run(app, host='0.0.0.0', server="cherrypy", port=8080)
When I open localhost:8080/hello in two tabs and refresh them at the same time, they don't return together: one completes after 5 seconds and the other after 5 more seconds.
But when I call /hello in one tab and /hello2 in another at the same time, they finish at the same time.
Why does Bottle not behave multi-threaded when the same endpoint is called twice? Is there a way to make it multi-threaded?
Python version: 2.7.6
Bottle version: 0.12.8
CherryPy version: 3.7.0
OS: Tried on both Ubuntu 14.04 64-Bit & Windows 10 64-Bit
I had already come across this behaviour while answering another question, and it confused me too. If you search around for related questions, the list goes on and on.
The suspect was some incorrect server-side handling of Keep-Alive, HTTP pipelining, cache policy or the like. But in fact it has nothing to do with the server side at all. Concurrent requests to the same URL are serialised by the browser's cache implementation (Firefox, Chromium). The best answer I found, before going to the bug trackers directly, says:
Necko's cache can only handle one writer per cache entry. So if you make multiple requests for the same URL, the first one will open the cache entry for writing and the later ones will block on the cache entry open until the first one finishes.
Indeed, if you disable cache in Firebug or DevTools, the effect doesn't persist.
Thus, if your clients are not browsers (an API, for example), just ignore the issue. Otherwise, if you really need to make concurrent requests from one browser to the same URL (normal requests or XHRs), add a random query string parameter to make each request URL unique, e.g. http://example.com/concurrent/page?nocache=1433247395.
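If the client is scriptable, generating such a unique URL takes one line; for example, in Python:
import time

# A throwaway parameter makes every request URL unique,
# so cache entries can never collide.
url = 'http://example.com/concurrent/page?nocache=%d' % int(time.time() * 1000)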
It's almost certainly your browser that's serializing the requests. Try using two different browsers, or better yet a real client. It doesn't reproduce for me using curl.
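For example, a quick check with two concurrent threads instead of two tabs (a sketch, assuming the Bottle app above is running locally):
import threading
import time
import requests

def timed_get():
    start = time.time()
    requests.get('http://localhost:8080/hello')
    print('finished after %.1f s' % (time.time() - start))

# Two concurrent requests to the SAME URL: with a multi-threaded
# server, both should print about 5 s, not 5 s and 10 s.
threads = [threading.Thread(target=timed_get) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()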
OpenStack Swift uses eventlet.green.httplib for its buffered HTTP connections.
When I benchmark write performance, I observe that write throughput drops even when only one replica node is overloaded.
As far as I know, the write quorum is 2 out of 3 replicas, so overloading only one replica should not affect throughput.
When I dug deeper, I observed that subsequent requests are blocked until the responses to the previous requests arrive. This is mainly because of BufferedHTTPConnection, which stops issuing new requests until the previous response is read.
Why does OpenStack Swift use such a method?
Is this the usual behaviour of eventlet.green.httplib.HTTPConnection?
This does not make sense from a write-quorum point of view, because it amounts to waiting for all the responses rather than just a quorum.
Any ideas or workarounds to stop this behaviour while keeping the same library?
It is not a problem with the library but a limitation of the OpenStack Swift configuration: the "workers" setting in the account, container and object server configs was set to 1.
Regarding the library
When new connections are made using eventlet.green.httplib.HTTPConnection, it does not block. But if requests share the same connection, subsequent requests are blocked until the previous response is fully read.
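The same one-request-at-a-time rule can be seen in the standard library's httplib (http.client in Python 3), which the green version wraps; a sketch, assuming some HTTP server on localhost:8080:
import http.client

conn = http.client.HTTPConnection('localhost', 8080)
conn.request('GET', '/a')

try:
    # A second request before the first response is read is
    # refused by the connection's state machine.
    conn.request('GET', '/b')
except http.client.CannotSendRequest as exc:
    print('blocked:', exc)

resp = conn.getresponse()
resp.read()                # only after the response is fully read...
conn.request('GET', '/b')  # ...can the next request be sent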
I'm using gevent + bottle for the following:
Call an API method on a remote server.
Process the result from the API.
Return HTML.
I've set a timeout for the API call (httplib/socket), but if it's set to 5 seconds (for example), my Python script is busy for that time and can't return any other pages (which is normal).
Question:
Can I somehow make clever use of gevent (in a separate script, maybe?) to handle such long requests?
I was thinking of starting a separate API-interrogating script on localhost:8080 and putting it behind a load balancer (as "the Internet" suggested), but I'm sure there must be a better way.
I am not an experienced programmer, so thank you for your help!
Actually, your problem should not exist. The gevent server backend can handle any number of requests at the same time. If one is blocked for 5 seconds, that does not affect the other requests arriving at the server. That's the point of the gevent server backend.
1) Are you sure that you are using the gevent server backend properly, and not just a monkey-patched version of the default wsgiref server (which is single-threaded)?
2) Did you start the server via bottle.py --server gevent? If not, did you call gevent.monkey.patch_all() before importing all the other socket-related modules (including bottle)?
Example:
from gevent import monkey
monkey.patch_all()

import bottle
import urllib2

@bottle.route(...)
def callback():
    urllib2.urlopen(...)

bottle.run(server='gevent')
I am using the Facebook Python Graph API. When I call put_object to write to the news feed, it takes about 12-14 seconds to complete. When I run the same call from the command line using curl with the same parameters, I get the response back in 1.2 seconds.
I ran the profiler on the Python code, and I see that it spends 99.5% of the time in socket.recv. I am not sure whether the problem is in the Facebook Python SDK or something else.
I am on Python 2.6. I see from facebook.py that it uses urllib:
file = urllib.urlopen("https://graph.facebook.com/" + path + "?" +
                      urllib.urlencode(args), post_data)
Has anyone experienced a similar slowdown? Any suggestions would be highly appreciated.
Command-line curl is bound to be faster than urllib or urllib2. If you want speed, you could replace the call with pycurl (which is also a C extension), whereas urllib is a pure-Python module written on top of httplib.
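A pycurl version of that call might look roughly like this (a sketch; the URL is a placeholder and the access token is omitted):
import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://graph.facebook.com/me')
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body
c.perform()
c.close()
print(buf.getvalue())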
Furthermore, if you're flexible enough to use a Tornado server, you could use Tornado's async HTTP client, which talks to the sockets directly and is asynchronous.
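With Tornado's AsyncHTTPClient, in the old callback style to match this era of Python, a sketch could be:
from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient

def handle_response(response):
    print(response.body)
    ioloop.IOLoop.instance().stop()

AsyncHTTPClient().fetch('https://graph.facebook.com/me',
                        callback=handle_response)
ioloop.IOLoop.instance().start()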
Also, if none of these options is feasible, try replacing urllib with urllib2 and creating a non-blocking caller with callbacks. This is what I've done to improve the native third-party wrappers for Facebook/Twitter/Amazon and the like.
Are you behind an HTTP proxy server? curl honors the proxy environment variables, while urllib doesn't do so by default, and it also doesn't support calling an HTTPS URL (such as https://graph.facebook.com) through a proxy server.
In any event, I expect this is more likely a network issue than a Python-vs-C issue. Yes, C is faster, but this isn't a CPU-bound task, and there's no way you're burning 12-14 seconds inside the Python interpreter just to make this call.
If curl is happy but urllib is not, perhaps trying pycurl will solve your problem. http://pycurl.sourceforge.net/