How can I force urllib2 to time out? - python

I want to test my application's handling of timeouts when grabbing data via urllib2, and I want to have some way to force the request to time out.
Short of finding a very very slow internet connection, what method can I use?
I seem to remember an interesting application/suite for simulating these sorts of things. Maybe someone knows the link?

I usually use netcat to listen on port 80 of my local machine:
nc -l 80
Then I use http://localhost/ as the request URL in my application. Netcat will answer on the HTTP port but will never send a response, so the request is guaranteed to time out, provided that you have specified a timeout either in your urllib2.urlopen() call or via socket.setdefaulttimeout().
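A minimal client-side sketch, assuming the nc listener above is running locally and a 2-second timeout is acceptable (URL and timeout value are illustrative):
import socket
import urllib2

try:
    # nc -l 80 accepts the connection but never sends a response,
    # so this call must hit the 2-second timeout
    urllib2.urlopen("http://localhost/", timeout=2)
except (urllib2.URLError, socket.timeout) as e:
    print "request timed out:", e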

You could set the default timeout as suggested above, but since Python 2.6 there is also a timeout argument in the urlopen method, so you can use a mix of both:
import urllib2
import socket
try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError, e:
    print "Oops, timed out?"
except socket.timeout:
    print "Timed out!"
The default timeout for urllib2 is infinite, and importing socket ensures that you can catch the timeout as a socket.timeout exception.

import socket
socket.setdefaulttimeout(2)  # set the default timeout to 2 seconds
If you want to set the timeout for each request, you can use the timeout argument of urlopen.
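For example, the same limit applied per request rather than process-wide (URL is illustrative):
import urllib2

# the timeout keyword (Python 2.6+) overrides the global default for this call only
response = urllib2.urlopen("http://example.com", timeout=2)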

Why not write a very simple CGI script in bash that just sleeps for the required timeout period?
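As a sketch of the idea in Python (file name and sleep length are arbitrary; a bash script calling sleep works just as well):
#!/usr/bin/env python
# slow.cgi -- deliberately sleeps longer than any client timeout before responding
import time

time.sleep(30)  # longer than the timeout you want to trigger
print "Content-Type: text/plain"
print ""
print "finally responded"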

If you're running on a Mac, speedlimit is very cool.
There's also dummynet. It's a lot more hardcore, but it also lets you do some vastly more interesting things. Here's a pre-configured VM image.
If you're running on a Linux box already, there's netem.
I believe I've heard of a Windows-based tool called TrafficShaper, but I haven't verified that one.

Related

How to send HTTP request in SSH?

I'm trying to make a simple HTTP request in Python in an SSH terminal:
from requests import get
r = get("https://www.google.com")
However, this command just stalls indefinitely. This does not happen when I am not in SSH.
Is there any way to send the request such that it goes through?
Thanks ahead of time.
EDIT: Running the logging in Joran's link yields only the following line:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com
First, check that you can reach the URL with a system-wide tool such as curl: curl -I "https://www.google.com". If that returns a successful response without timing out, my answer is not for you :)
Your code can run forever simply because no timeout is defined for the socket connection, and if for some reason your system cannot read from the socket (at a low level), you will wait a long time.
http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
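A sketch of the approach that page describes (the 5-second value is arbitrary):
import requests

try:
    # the call raises Timeout instead of hanging if the server does not respond in time
    r = requests.get("https://www.google.com", timeout=5)
    print r.status_code
except requests.exceptions.Timeout:
    print "request timed out"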
Try this (assuming you are using Python 3):
from urllib.request import urlopen
r = urlopen('https://www.google.com').read()
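If the stall is the real problem, note that urlopen also accepts a timeout argument, so a bounded variant (10 seconds here, arbitrary) would be:
from urllib.request import urlopen

# raises an error instead of hanging if nothing arrives within 10 seconds
r = urlopen('https://www.google.com', timeout=10).read()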

ZMQ socket connect timeout

I'm trying to connect a socket to an endpoint until the socket receives data from that endpoint. This is because the endpoint might not exist at that time.
Currently the connect call stalls; I'm guessing that's because it can't resolve the hostname and that takes a while.
Is there any way to set a timeout on a socket connect?
import zmq
import time

endpoint = 'tcp://doesnt_exist:12345'
ctx = zmq.Context.instance()
s = ctx.socket(zmq.SUB)
t = time.time()
try:
    s.connect(endpoint)
except Exception:
    pass
print time.time() - t
If you provide a host name to connect, ZeroMQ uses synchronous DNS resolution via a call to getaddrinfo, which is why you see the connect call blocking.
If you really need to connect in a controllable way, I suggest you do the DNS resolution on your own, using one of the asynchronous DNS resolvers already available for Python (check this example based on pyuv/pycares).
Also see my reply to similar question.
The problem is not the connection, but the DNS lookup. The blocking is done at the OS level, on the gethostbyname call.
Since the timeout is controlled by the OS, working around it is hard (but feasible). My suggestion is that you simply hardcode the IP, or resolve the name yourself, as sketched below.
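A rough sketch of doing the lookup yourself so the blocking call cannot stall the main flow (hostname, port, and the 2-second budget are illustrative, and the resolver helper is hypothetical):
import socket
import threading
import zmq

def resolve(host, result):
    # blocking DNS lookup, run in a throwaway thread
    try:
        result.append(socket.gethostbyname(host))
    except socket.error:
        pass

result = []
t = threading.Thread(target=resolve, args=('doesnt_exist', result))
t.daemon = True
t.start()
t.join(2.0)  # wait at most 2 seconds for DNS

if result:
    ctx = zmq.Context.instance()
    s = ctx.socket(zmq.SUB)
    s.connect('tcp://%s:12345' % result[0])  # connect with the IP, no blocking lookup
else:
    print 'could not resolve host within 2 seconds'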

Python Requests Not Cleaning up Connections and Causing Port Overflow?

I'm doing something fairly outside of my comfort zone here, so hopefully I'm just doing something stupid.
I have an Amazon EC2 instance which I'm using to run a specialized database, which is controlled through a webapp inside of Tomcat that provides a REST API. On the same server, I'm running a Python script that uses the Requests library to make hundreds of thousands of simple queries to the database (I don't think it's possible to consolidate the queries, though I am going to try that next.)
The problem: after running the script for a bit, I suddenly get a broken pipe error on my SSH terminal. When I try to log back in with SSH, I keep getting "operation timed out" errors. So I can't even log back in to terminate the Python process and instead have to reboot the EC2 instance (which is a huge pain, especially since I'm using ephemeral storage)
My theory is that each time requests makes a REST call, it activates a pair of ports between Python and Tomcat, but that it never closes the ports when it's done. So python keeps trying to grab more and more ports and eventually either somehow grabs away and locks the SSH port (booting me off), or it just uses all the ports and that causes the network system to crap out somehow (as I said, I'm out of my depth.)
I also tried using httplib2, and was getting a similar problem.
Any ideas? If my port theory is correct, is there a way to force requests to surrender the port when it's done? Or otherwise is there at least a way to tell Ubuntu to keep the SSH port off-limits so that I can at least log back in and terminate the process?
Or is there some sort of best practice to using Python to make lots and lots of very simple REST calls?
Edit:
Solved...do:
s = requests.session()
s.config['keep_alive'] = False
before making the request, to force Requests to release connections when it's done.
My speculation:
https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L539 sets conn to connectionpool.connection_from_url(url)
That leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L562, which leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L167.
This eventually leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L185:
def _new_conn(self):
    """
    Return a fresh :class:`httplib.HTTPConnection`.
    """
    self.num_connections += 1
    log.info("Starting new HTTP connection (%d): %s" %
             (self.num_connections, self.host))
    return HTTPConnection(host=self.host, port=self.port)
I would suggest hooking a handler up to that logger, and listening for lines that match that one. That would let you see how many connections are being created.
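A sketch of that hookup, using the logger name from the log line in the question (the counting handler is just an illustration):
import logging

class ConnectionCounter(logging.Handler):
    """Counts how many 'Starting new HTTP connection' lines the pool emits."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.count = 0
    def emit(self, record):
        if 'Starting new HTTP' in record.getMessage():
            self.count += 1

counter = ConnectionCounter()
pool_logger = logging.getLogger('requests.packages.urllib3.connectionpool')
pool_logger.addHandler(counter)
pool_logger.setLevel(logging.INFO)

# ... run the request loop, then inspect counter.count to see how many
# fresh connections were opened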
Figured it out...Requests has a default 'Keep Alive' policy on connections which you have to explicitly override by doing
s = requests.session()
s.config['keep_alive'] = False
before you make a request.
From the doc:
"""
Keep-Alive
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set prefetch to True or read the content property of the Response object.
If you’d like to disable keep-alive, you can simply set the keep_alive configuration to False:
s = requests.session()
s.config['keep_alive'] = False
"""
There may be a subtle bug in Requests here because I WAS reading the .text and .content properties and it was still not releasing the connections. But explicitly passing 'keep alive' as false fixed the problem.

Python httplib.HTTPSConnection timeout -- connection vs. response

When creating an HTTPSConnection with httplib, it is easy enough to set a timeout:
connection = httplib.HTTPSConnection('some.server.com', timeout=10)
connection.request('POST', '/api', xml, headers={'Content-Type': 'text/xml'})
response = connection.getresponse().read()
There are various parts to this operation, e.g. the connection being accepted and a response being received.
Does the timeout apply to the entire operation? Will it still time out if the remote host accepts the connection but never sends back a response? I want to be sure that setting the timeout ensures that the operation blocks for a maximum of 10 seconds.
Some context:
I am connecting to an external API and want the operation to block. Just not for more than 10 seconds, and if it is blocking for more than 10 seconds, stop blocking and raise an exception. I'm correctly handling the case when an external API is unreachable, but unsure about when it accepts my connection but fails to respond.
It seems the standard library implementation does not support a timeout on the socket read operations. You would have to make the HTTPSConnection (technically the HTTPResponse._safe_read method) non-blocking for this.
There is a similar question here, which might also help:
Does python's httplib.HTTPConnection block?
I would use gevent for the whole application if that's possible in your case, that supports fully non-blocking I/O and you can implement any timeout scheme you want, even for multiple connections at once.
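As a rough gevent-based sketch of capping the whole operation (server, path, and body are the placeholders from the question; the pattern, not the exact values, is the point):
import gevent
from gevent import monkey
monkey.patch_all()  # make socket/httplib cooperative so the timeout can fire

import httplib

xml = "<request/>"  # placeholder request body

try:
    with gevent.Timeout(10):  # hard 10-second cap on connect, send, and read combined
        connection = httplib.HTTPSConnection('some.server.com')
        connection.request('POST', '/api', xml, headers={'Content-Type': 'text/xml'})
        response = connection.getresponse().read()
except gevent.Timeout:
    print "operation took longer than 10 seconds"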

TLS connection with timeouts (and a few other difficulties)

I have an HTTP client in Python which needs to use TLS. I need not only to make encrypted connections but also to retrieve info from the remote machine, such as the certificate issuer. I need to make connections to many HTTP servers, often badly behaved, so I absolutely need to have a timeout. With non-TLS connections, mysocket.settimeout(5) does what I want.
Among the many TLS Python modules:
* python-gnutls does not allow the use of settimeout() on sockets because it uses non-blocking sockets: gnutls.errors.OperationWouldBlock: Function was interrupted.
* python-openssl has a similar issue: OpenSSL.SSL.WantReadError
* The ssl module of the standard library does not work with Python 2.5.
* Other libraries like TLSlite apparently do not give access to the metadata of the certificate.
The program is threaded so I cannot use signals. I need detailed control over the HTTP dialog so I cannot use a standard library like urllib2.
Background: this is the survey project DNSwitness. Relevant SO threads: Timeout on a Python function call and How to limit execution time of a function call in Python.
Although I've never used it for exactly this purpose, Twisted should do what you want. The only downside is that it's a rather large library, and you will also need to install PyOpenSSL (Twisted depends on it). If you've never used it before, Twisted's callback-based architecture can take some getting used to (you really want to read the tutorials before starting).
But aside from that, it's designed around the idea of managing a lot of connections, it of course lets you specify timeouts, reconnects, etc., and you can retrieve certificate info (see here).
I assume the problem you're having is the following: you're opening a connection using PyOpenSSL and you always get a WantReadError exception, and you can't distinguish between this error and a timeout. Consider the following example:
#!/usr/bin/python
import OpenSSL
import socket
import struct

context = OpenSSL.SSL.Context(OpenSSL.SSL.TLSv1_METHOD)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
connection = OpenSSL.SSL.Connection(context, s)
connection.connect(("www.gmail.com", 443))

# Put the socket in blocking mode
connection.setblocking(1)

# Set the receive timeout using setsockopt
tv = struct.pack('ii', int(6), int(0))
connection.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, tv)

print "Connected to ", connection.getpeername()
print "State ", connection.state_string()

while True:
    try:
        connection.do_handshake()
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass

print "State ", connection.state_string()
print connection.send("koekoek\r\n")

while True:
    try:
        recvstr = connection.recv(1024)
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass

print recvstr
This will open an SSL connection to gmail, send an invalid string, read the response and print it. Note that:
* the connection is explicitly set to blocking mode
* the recv timeout is explicitly set, in this case to 6 seconds.
Now, what will the behavior be? When the timeout occurs, the WantReadError exception will be thrown, in this case after waiting for 6 seconds. (You can remove the while True loops to avoid the retry; I added them here for testing.) The timeout set on the socket only appears to be effective in the connect() call.
An alternative, which probably applies to the GnuTLS case as well, would be to keep the sockets in non-blocking mode and perform the timekeeping yourself: record the time when you launch the call, and inside the while True / try / except WantReadError loop check each time whether you have been waiting too long.
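In code, that manual timekeeping could look roughly like this, reusing the connection object from the example above (the 6-second budget is arbitrary):
import time
import OpenSSL

deadline = time.time() + 6  # overall budget for the handshake
while True:
    try:
        connection.do_handshake()
        break
    except OpenSSL.SSL.WantReadError:
        if time.time() > deadline:
            raise RuntimeError("TLS handshake took longer than 6 seconds")
        time.sleep(0.1)  # brief pause before retrying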
I would also recommend Twisted, and using M2Crypto for the TLS parts.
One simple solution could be to change the socket type depending on the operation. I tested this with gnutls and it worked:
Do settimeout() on the socket before doing connect() on the bare socket wrapped by GnuTLS; that way connect() is subject to the timeout, as you wanted.
Make sure you remove the timeout with settimeout(None) or setblocking(1) BEFORE GnuTLS's handshake().
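The same ordering, sketched with the standard library's ssl module rather than python-gnutls, just to illustrate the idea (host, port, and timeout are illustrative):
import socket
import ssl

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)                     # connect() is now bounded by 5 seconds
sock.connect(('www.example.com', 443))
sock.settimeout(None)                  # back to blocking before the handshake
tls = ssl.wrap_socket(sock)            # the TLS handshake happens here
print tls.cipher()                     # connection is up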
