How to make Python requests.get cut off or fail faster?

Why does Python's requests.get() take so long to time out (or fail) when the target URL is a remote host (i.e. not localhost) that can't be reached?
If the target URL is localhost and it can't be reached, the call fails almost immediately, so the issue only occurs with remote URLs.
How can I make it fail faster?

A timeout will do the trick, but make sure the remote server still has enough time to execute the request (if it's supposed to): don't set a 1s timeout on a request that takes 5s to complete.
Requests Docs for Timeouts
requests.get('https://github.com/', timeout=0.001)
Alternatively, you could point to a server running on localhost and keep a variable that holds the "correct" server address depending on whether the server you're targeting is local or remote.
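As a minimal sketch (the URL and timeout values are placeholders), you can also pass a (connect, read) timeout tuple and catch the timeout explicitly, so an unreachable remote host fails fast while a slow-but-reachable server still gets time to respond:
import requests

try:
    # (connect, read) timeout tuple: give up quickly if the host is
    # unreachable, but still allow the server a few seconds to respond.
    response = requests.get('https://example.com/', timeout=(1, 5))
except requests.exceptions.Timeout:
    # Raised when either the connect or the read timeout elapses.
    print('request timed out')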

Related

Nginx/PHP-FPM/Laravel cannot handle concurrent requests with single IP

We have two servers:
Nginx/PHP-FPM/Laravel
Python on a different server
Some API calls to Laravel need data from the Python server and vice versa, but when both of these occur the requests appear to block:
1. My IP makes a call to the Laravel server.
2. The Laravel server makes a call (via the Guzzle HTTP transport) to the Python server and waits for a response.
3. The Python server makes a request back to the Laravel server as part of completing the request. This call never completes under these conditions.
I can test the call in (3) by making the appropriate call to the Python server directly, and it completes without issue. It appears that the call in (2) blocks the server from processing any additional requests. I am confused why this is occurring, as PHP-FPM should allow simultaneous connections.
While waiting for (1) in my browser, additional requests from my browser are also blocked. During this time I can make requests to the Laravel server from a different IP and they complete without issue.

Is it possible to recreate a request from the packets programmatically?

For a script I am making, I need to be able to see the parameters that are sent with a request.
This is possible through Fiddler, but I am trying to automate the process.
Here are some screenshots to start with. As you can see in the first picture of Fiddler, I can see the URL of a request and the parameters sent with that request.
I tried some packet sniffing with scapy, using the code below, to see if I could get a similar result, but what I get is in the second picture. Basically, I can get the source and destination of a packet as IP addresses, but the packets themselves are just bytes.
import time
from scapy.all import AsyncSniffer

def sniffer():
    # Sniff 10 packets in the background, printing a one-line summary of each.
    t = AsyncSniffer(prn=lambda x: x.summary(), count=10)
    t.start()
    time.sleep(8)
    results = t.results
    print(len(results))
    print(results)
    print(results[0])
From my understanding, after we establish a TCP connection, the request is broken down into several IP packets and then sent over to the destination. I would like to be able to replicate the functionality of Fiddler, where I can see the url of the request and then the values of parameters being sent over.
Would it be feasible to recreate the information of a request through only the information gathered from the packets?
Or is the difference because my sniffing happens at Layer 2, whereas Fiddler operates above the transport layer, before the request is broken into IP packets (or after they are reassembled), so it sees the content of the original request rather than the individual packets? If my understanding is wrong, please correct me.
Basically, my question boils down to: "Is there a python module I can use to replicate the features of Fiddler to identify the destination url of a request and the parameters sent along with that request?"
The sniffed traffic is HTTPS traffic, so just by sniffing you won't see any details of the HTTP request/response, because it is encrypted via SSL/TLS.
Fiddler is a proxy with HTTPS interception, which is something totally different from sniffing traffic at the network level. For the client application Fiddler "mimics" the server, and for the server Fiddler mimics the client. This allows Fiddler to decrypt the requests/responses and show them to you.
If you want to perform request interception at the Python level, I would recommend using mitmproxy instead of Fiddler. This proxy can also perform HTTPS interception, but it is written in Python and is therefore much easier to integrate into your Python environment.
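For example, a minimal mitmproxy addon might look like this (the file name is illustrative; run it with mitmdump -s sniff_params.py):
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    # Called once for every intercepted request.
    print(flow.request.pretty_url)        # full URL of the request
    print(dict(flow.request.query))       # query-string parameters
    if flow.request.urlencoded_form:      # form-encoded POST parameters
        print(dict(flow.request.urlencoded_form))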
Alternatively, if you just want to see the request/response details of a Python program, it may be easier to set the log level appropriately. See for example this question: Log all requests from the python-requests module
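A rough sketch of that logging approach (assuming Python 3; it enables http.client's debug output plus urllib3's own loggers):
import logging
import http.client

# Print the raw request/response lines from the underlying HTTP connection.
http.client.HTTPConnection.debuglevel = 1

# Also surface urllib3's log records (new connections, retries, ...).
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('urllib3').setLevel(logging.DEBUG)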

Python - best way to wait until a remote system is booted

I am using wake on lan to start a certain server in a python script.
The server is online when I can make a successful API request, such as:
return requests.get(
    url + path,
    auth=('user', user_password),
    headers={'Content-Type': 'application/json'},
    verify=False,
    timeout=0.05
).json()
What is the best method to wait for the server bootup process (until it is reachable via API) without spamming the network with requests in a loop?
I believe you're very close. Why not put that request inside while and try/except blocks?
while True:
    try:
        return requests.head(...)  # the request from the question
    except requests.exceptions.ConnectionError:
        time.sleep(0.5)
Your two choices are to poll the remote service until it responds or configure the service to in some way notify you that it's up.
There's really nothing wrong with sending requests in a loop, as long as you don't do so unnecessarily often. If the service takes ~10 seconds to come up, checking once a second would be reasonable. If it takes ~10 minutes, every 30 seconds or so would probably be fine.
The alternative - some sort of push notification - is more elegant, but it requires you having some other service up and running already, listening for the notification. For example you could start a simple webserver locally before restarting the remote service and have the remote service make a request against your server when it's ready to start handling requests.
Generally speaking I would start with the polling approach since it's easier and involves fewer moving parts. Just be sure you design your polling in a fault-tolerant way; in particular be sure to specify a maximum time to wait or number of polling attempts to make before giving up. Otherwise your script will just hang if the remote service never comes up.
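A minimal sketch of that fault-tolerant polling (the function name, attempt count, and delay are illustrative):
import time
import requests

def wait_for_server(url, attempts=60, delay=1.0):
    """Poll `url` until it responds, giving up after `attempts` tries."""
    for _ in range(attempts):
        try:
            requests.head(url, timeout=2)
            return True
        except requests.exceptions.RequestException:
            time.sleep(delay)
    return False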

Python Requests Not Cleaning up Connections and Causing Port Overflow?

I'm doing something fairly outside of my comfort zone here, so hopefully I'm just doing something stupid.
I have an Amazon EC2 instance which I'm using to run a specialized database, which is controlled through a webapp inside of Tomcat that provides a REST API. On the same server, I'm running a Python script that uses the Requests library to make hundreds of thousands of simple queries to the database (I don't think it's possible to consolidate the queries, though I am going to try that next.)
The problem: after running the script for a bit, I suddenly get a broken pipe error on my SSH terminal. When I try to log back in with SSH, I keep getting "operation timed out" errors. So I can't even log back in to terminate the Python process and instead have to reboot the EC2 instance (which is a huge pain, especially since I'm using ephemeral storage)
My theory is that each time Requests makes a REST call, it opens a pair of ports between Python and Tomcat but never closes them when it's done. So Python keeps grabbing more and more ports, and eventually it either somehow grabs and locks the SSH port (booting me off), or it simply uses up all the ports and the network stack gives out somehow (as I said, I'm out of my depth).
I also tried using httplib2, and was getting a similar problem.
Any ideas? If my port theory is correct, is there a way to force requests to surrender the port when it's done? Or otherwise is there at least a way to tell Ubuntu to keep the SSH port off-limits so that I can at least log back in and terminate the process?
Or is there some sort of best practice to using Python to make lots and lots of very simple REST calls?
Edit:
Solved...do:
s = requests.session()
s.config['keep_alive'] = False
before making the request, to force Requests to release connections when it's done.
My speculation:
https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L539 sets conn to connectionpool.connection_from_url(url)
That leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L562, which leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L167.
This eventually leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L185:
def _new_conn(self):
    """
    Return a fresh :class:`httplib.HTTPConnection`.
    """
    self.num_connections += 1
    log.info("Starting new HTTP connection (%d): %s" %
             (self.num_connections, self.host))
    return HTTPConnection(host=self.host, port=self.port)
I would suggest hooking a handler up to that logger, and listening for lines that match that one. That would let you see how many connections are being created.
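A sketch of that idea (the logger name assumes the urllib3 copy vendored inside Requests at the time):
import logging

# Make the "Starting new HTTP connection" lines from _new_conn() visible.
logging.basicConfig(level=logging.INFO)
logging.getLogger('requests.packages.urllib3.connectionpool').setLevel(logging.INFO)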
Figured it out...Requests has a default 'Keep Alive' policy on connections which you have to explicitly override by doing
s = requests.session()
s.config['keep_alive'] = False
before you make a request.
From the doc:
"""
Keep-Alive
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set prefetch to True or read the content property of the Response object.
If you’d like to disable keep-alive, you can simply set the keep_alive configuration to False:
s = requests.session()
s.config['keep_alive'] = False
"""
There may be a subtle bug in Requests here because I WAS reading the .text and .content properties and it was still not releasing the connections. But explicitly passing 'keep alive' as false fixed the problem.
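Note that on later versions of Requests the session.config dict no longer exists; as far as I know, sending a Connection: close header achieves roughly the same effect. A sketch, not verified against every version:
import requests

s = requests.Session()
# Ask the server to close the connection after each response instead of
# keeping it open for reuse.
s.headers['Connection'] = 'close'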

How can I ignore the server response to save bandwidth?

I am using a server to send some piece of information to another server every second. The problem is that the other server's response is a few kilobytes, and this consumes bandwidth on the first server (about 2 GB in an hour). I would like to send the request and ignore the response (not even receive it, to save bandwidth).
I use a small Python script (using urllib) for this task. I don't mind using another tool or even another language if that will make only the request.
A 5K reply is small stuff and is probably below the standard TCP window size of your OS. This means that even if you close your network connection just after sending the request and checking only the very first bytes of the reply (to be sure the request was really received), the server has probably already sent you the whole answer, and the packets are already on the wire or on your computer.
If you cannot control (i.e. trim down) what the server sends in reply to your notification, the only alternative I can think of is to add another service on the remote machine that waits for a simple command, performs the real request locally, and sends back just the result code. This can be done very easily, maybe even with bash/perl/python using, for example, netcat/wget locally.
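A sketch of that relay idea in Python 3 (the port and the local URL are placeholders): the remote machine performs the real request against itself and returns only the status code, so only a few bytes travel back over the network.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

class RelayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Perform the real (large) request locally on the remote machine.
        status = urllib.request.urlopen('http://localhost/real-endpoint').status
        # Send back only the status code: a handful of bytes.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(str(status).encode())

HTTPServer(('', 8001), RelayHandler).serve_forever()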
By the way, there is something strange in your math, as Glenn Maynard correctly wrote in a comment.
For HTTP, you can send a HEAD request instead of GET or POST:
import urllib2
request = urllib2.Request('https://stackoverflow.com/q/5049244/')
request.get_method = lambda: 'HEAD' # override get_method
response = urllib2.urlopen(request) # make request
print response.code, response.url
Output
200 https://stackoverflow.com/questions/5049244/how-can-i-ignore-server-response-to-save-bandwidth
See How do you send a HEAD HTTP request in Python?
Sorry, but this does not make much sense and is likely a violation of the HTTP protocol. I consider such an idea weird and broken by design. Either make the remote server stop sending the data, or configure your application (or whatever runs on the remote server) to use a smarter protocol with less bandwidth usage. Anything else is hard to see as anything but nonsense.
