Python httplib.HTTPSConnection timeout -- connection vs. response - python

When creating an HTTPSConnection with httplib, it is easy enough to set a timeout:
connection = httplib.HTTPSConnection('some.server.com', timeout=10)
connection.request('POST', '/api', xml, headers={'Content-Type': 'text/xml'})
response = connection.getresponse().read()
There are various parts to this operation, e.g. the connection being accepted and a response being received.
Does the timeout apply to the entire operation? Will it still time out if the remote host accepts the connection but never sends back a response? I want to be sure that setting the timeout ensures that the operation blocks for a maximum of 10 seconds.
Some context:
I am connecting to an external API and want the operation to block, just not for more than 10 seconds; if it blocks for more than 10 seconds, stop blocking and raise an exception. I'm correctly handling the case where the external API is unreachable, but I'm unsure about the case where it accepts my connection but never responds.

It seems the standard library implementation does not support a timeout on the socket read operations. You would have to make the HTTPSConnection (technically the HTTPResponse._safe_read method) non-blocking for this.
There is a similar question here, which might also help:
Does python's httplib.HTTPConnection block?
I would use gevent for the whole application if that's possible in your case; it supports fully non-blocking I/O, and you can implement any timeout scheme you want, even for multiple connections at once.
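If gevent is an option, a minimal sketch of capping the whole operation at 10 seconds could look like this (the host, path and XML body are placeholders from the question; it assumes monkey-patching the standard library is acceptable for your application):
from gevent import monkey
monkey.patch_all()   # make socket/httplib cooperative

import gevent
import httplib

def post_with_deadline(host, path, xml, seconds=10):
    try:
        with gevent.Timeout(seconds):
            connection = httplib.HTTPSConnection(host)
            connection.request('POST', path, xml,
                               headers={'Content-Type': 'text/xml'})
            return connection.getresponse().read()
    except gevent.Timeout:
        raise Exception("no response within %d seconds" % seconds)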


python3 sockets never stops trying to read data

I'm using a TCP socket to read data from a website, HTTP requests to be exact. I want to use sockets and not requests or pycurl, so please do not suggest any higher-level library.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = wrap_socket(s)
response_bytes = b""
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((website))
s.send(all of this works good)
#this is where my problems occur
while True:
    response_bytes += s.recv(4096)
    if not response_bytes:
        break
This solution should work perfectly according to multiple Stack Overflow posts. I want to use the most efficient way, without a timeout. If I use try/except and set a socket timeout it works fine, but that's not very good IMO. The code above seems to hang forever and try to read infinitely. Is there any reason it is doing this?
s.send(all of this works good)
Let me guess: this is doing an HTTP request with an explicit or implicit Connection: keep-alive. This header is implicit when doing an HTTP/1.1 request. Because of this, the server keeps the TCP connection open, because it is awaiting the next request from the client.
I want to use the most efficient way without a timeout.
The correct way is to properly understand the HTTP protocol, extract the size of the response body from the response header, and read exactly as much data as that size specifies. The easy way is to just do an HTTP/1.0 request without enabling HTTP keep-alive. In this case the server will close the TCP connection immediately after the response has been sent.
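For illustration, a rough sketch of that "read exactly Content-Length bytes" idea (a hypothetical helper; it ignores chunked transfer encoding and the many other cases the actual specification covers):
def read_http_response(sock):
    # read until the end of the response headers
    raw = b""
    while b"\r\n\r\n" not in raw:
        chunk = sock.recv(4096)
        if not chunk:
            break
        raw += chunk
    header_blob, _, body = raw.partition(b"\r\n\r\n")

    # extract the body size promised by the Content-Length header
    length = 0
    for line in header_blob.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # read exactly as many body bytes as the header promised
    while len(body) < length:
        chunk = sock.recv(min(4096, length - len(body)))
        if not chunk:
            break
        body += chunk
    return header_blob, body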
I want to use sockets and not requests or pycurl so please do not suggest me any higher level library.
It looks like you want to implement HTTP yourself. There is a standard you should read in this case which describes the fairly complex behavior of HTTP. Don't try to guess the protocol; read the actual specification.
this solution should work perfectly according to multiple stack posts
No, you missed an important detail.
while True:
    response_bytes += s.recv(4096)
    if not response_bytes:
        break
If response_bytes is ever non-empty then it stays non-empty, so the break is never reached and recv() ends up blocking forever waiting for data the server will never send. Instead, do something like
while True:
    buf = s.recv(2048)
    if not buf:
        break
    response_bytes += buf

make HTTP request from python and wait a long time for a response

I'm using Python to access a REST API that sometimes takes a long time to run (more than 5 minutes). I'm using pyelasticsearch to make the request, and tried setting the timeout to 10 minutes like this:
es = ElasticSearch(config["es_server_url"], timeout=600)
results = es.send_request("POST",
                          [config["es_index"], "_search_with_clusters"],
                          cluster_query)
but it times out after 5 minutes (not 10) with requests.exceptions.ConnectionError (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)
I tried setting the socket timeout and using requests directly like this:
socket.setdefaulttimeout(600)
try:
    r = requests.post(url, data=post, timeout=600)
except:
    print "timed out"
and it times out after approximately 5 minutes every time.
How can I make my script wait longer until the request returns?
The err "Connection reset by peer", aka ECONNRESET, means that the server—or some router or proxy between you and the server—closed the connection forcibly.
So, specifying a longer timeout on your end isn't going to make any difference. You need to figure out who's closing the connection and configure it to wait longer.
Plausible places to look are the server application itself, whatever server program drives that application (e.g., if you're using Apache with mod_wsgi, Apache), a load-balancing router or front-end server or reverse proxy in front of that server, or a web proxy in front of your client.
Once you figure out which component is closing the connection, if it's something you can't reconfigure yourself, you may be able to work around it by trickling data from the server to the client: have it send something useless but harmless (an HTTP 100, an extra header, some body text that your client knows how to skip over, whatever) every 120 seconds. This may or may not work, depending on which component is hanging up.
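For instance, a very rough server-side sketch of that trickle idea as a WSGI app (run_long_query, the padding byte, and the 120-second interval are all hypothetical, and whether this helps depends on which component is hanging up):
import threading

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    result = []
    worker = threading.Thread(target=lambda: result.append(run_long_query()))
    worker.start()
    while worker.is_alive():
        worker.join(120)      # wait up to 120 seconds at a time
        if worker.is_alive():
            yield b' '        # harmless padding the client can skip over
    yield result[0]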

Python Requests Not Cleaning up Connections and Causing Port Overflow?

I'm doing something fairly outside of my comfort zone here, so hopefully I'm just doing something stupid.
I have an Amazon EC2 instance which I'm using to run a specialized database, which is controlled through a webapp inside of Tomcat that provides a REST API. On the same server, I'm running a Python script that uses the Requests library to make hundreds of thousands of simple queries to the database (I don't think it's possible to consolidate the queries, though I am going to try that next.)
The problem: after running the script for a bit, I suddenly get a broken pipe error on my SSH terminal. When I try to log back in with SSH, I keep getting "operation timed out" errors. So I can't even log back in to terminate the Python process and instead have to reboot the EC2 instance (which is a huge pain, especially since I'm using ephemeral storage)
My theory is that each time requests makes a REST call, it activates a pair of ports between Python and Tomcat, but that it never closes the ports when it's done. So python keeps trying to grab more and more ports and eventually either somehow grabs away and locks the SSH port (booting me off), or it just uses all the ports and that causes the network system to crap out somehow (as I said, I'm out of my depth.)
I also tried using httplib2, and was getting a similar problem.
Any ideas? If my port theory is correct, is there a way to force requests to surrender the port when it's done? Or otherwise is there at least a way to tell Ubuntu to keep the SSH port off-limits so that I can at least log back in and terminate the process?
Or is there some sort of best practice to using Python to make lots and lots of very simple REST calls?
Edit:
Solved...do:
s = requests.session()
s.config['keep_alive'] = False
before making the request, to force Requests to release connections when it's done.
My speculation:
https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L539 sets conn to connectionpool.connection_from_url(url)
That leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L562, which leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L167.
This eventually leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L185:
def _new_conn(self):
    """
    Return a fresh :class:`httplib.HTTPConnection`.
    """
    self.num_connections += 1
    log.info("Starting new HTTP connection (%d): %s" %
             (self.num_connections, self.host))
    return HTTPConnection(host=self.host, port=self.port)
I would suggest hooking a handler up to that logger, and listening for lines that match that one. That would let you see how many connections are being created.
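For example, a quick sketch of such a handler (the logger name is assumed here to be the vendored requests.packages.urllib3.connectionpool module):
import logging

class ConnectionCounter(logging.Handler):
    """Counts how often the pool logs 'Starting new HTTP connection'."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.count = 0

    def emit(self, record):
        if record.getMessage().startswith("Starting new HTTP connection"):
            self.count += 1

counter = ConnectionCounter()
pool_logger = logging.getLogger("requests.packages.urllib3.connectionpool")
pool_logger.setLevel(logging.INFO)
pool_logger.addHandler(counter)
# ... make your requests, then inspect counter.count ...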
Figured it out...Requests has a default 'Keep Alive' policy on connections which you have to explicitly override by doing
s = requests.session()
s.config['keep_alive'] = False
before you make a request.
From the doc:
"""
Keep-Alive
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set prefetch to True or read the content property of the Response object.
If you’d like to disable keep-alive, you can simply set the keep_alive configuration to False:
s = requests.session()
s.config['keep_alive'] = False
"""
There may be a subtle bug in Requests here, because I WAS reading the .text and .content properties and it was still not releasing the connections. But explicitly setting keep_alive to False fixed the problem.
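Put together, the workaround looks roughly like this (url and queries are placeholders; s.config is the old Requests API quoted above):
import requests

s = requests.session()
s.config['keep_alive'] = False   # old Requests config API quoted above

for query in queries:            # hypothetical iterable of query payloads
    r = s.post(url, data=query)
    body = r.content             # read the body so the connection can be released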

Close inactive connections in Twisted

I'm running a Twisted server with the LineReceiver protocol. Sometimes clients will disconnect silently, so Twisted keeps the connection open. And because the server doesn't send anything unless requested of it, there's never a TCP timeout. In other words, some connections are never closed server-side.
How can I have Twisted close a connection that's been inactive for a few hours?
You can schedule timed events using reactor.callLater. Based on this, there's a helper for adding timeouts to protocols, twisted.protocols.policies.TimeoutMixin.
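A minimal sketch of the TimeoutMixin approach (the protocol name and the idle period are arbitrary):
from twisted.protocols.basic import LineReceiver
from twisted.protocols.policies import TimeoutMixin

class IdleTimeoutProtocol(LineReceiver, TimeoutMixin):
    def connectionMade(self):
        # start the idle timer; drop the connection after a few hours of silence
        self.setTimeout(3 * 60 * 60)

    def lineReceived(self, line):
        self.resetTimeout()   # any activity from the client restarts the timer
        # ... handle the request as usual ...

    def timeoutConnection(self):
        # called by TimeoutMixin when the idle period expires
        self.transport.loseConnection()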
Another approach is to use TCP keep-alives, which you can enable using the transport's setTcpKeepAlive method.
And another approach is to use application-level keep-alives. Essentially, send a "noop" once in a while. It doesn't need a response. If the connection has been lost, the extra data in the send buffer will cause the TCP stack to eventually notice.
See also the FAQ entry.

TLS connection with timeouts (and a few other difficulties)

I have an HTTP client in Python which needs to use TLS. I need not only to make encrypted connections but also to retrieve info from the remote machine, such as the certificate issuer. I need to make connections to many HTTP servers, often badly behaved, so I absolutely need to have a timeout. With non-TLS connections, mysocket.settimeout(5) does what I want.
Among the many TLS Python modules:
* python-gnutls does not allow using settimeout() on sockets because it uses non-blocking sockets: gnutls.errors.OperationWouldBlock: Function was interrupted.
* python-openssl has a similar issue: OpenSSL.SSL.WantReadError
* The ssl module of the standard library does not work with Python 2.5.
* Other libraries like TLSlite apparently do not give access to the metadata of the certificate.
The program is threaded so I cannot use signals. I need detailed control over the HTTP dialog, so I cannot use a standard library like urllib2.
Background: this is the survey project DNSwitness. Relevant SO threads: Timeout on a Python function call and How to limit execution time of a function call in Python.
Although I've never used it for exactly this purpose, Twisted should do what you want. The only downside is that it's a rather large library, and you will also need to install PyOpenSSL (Twisted depends on it). If you've never used it before, Twisted's callback-based architecture can take some getting used to (you really want to read the tutorials before starting).
But aside from that, it's designed around the idea of managing a lot of connections, it of course lets you specify timeouts, reconnects, etc., and you can retrieve certificate info (see here).
I assume the problem you're having is the following: you're opening a connection using PyOpenSSL and you always get a WantReadError exception, and you can't distinguish between this error and a timeout. Consider the following example:
#!/usr/bin/python
import OpenSSL
import socket
import struct

context = OpenSSL.SSL.Context(OpenSSL.SSL.TLSv1_METHOD)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
connection = OpenSSL.SSL.Connection(context, s)
connection.connect(("www.gmail.com", 443))
# Put the socket in blocking mode
connection.setblocking(1)
# Set the receive timeout using setsockopt
tv = struct.pack('ii', int(6), int(0))
connection.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, tv)
print "Connected to ", connection.getpeername()
print "State ", connection.state_string()
while True:
    try:
        connection.do_handshake()
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass
print "State ", connection.state_string()
print connection.send("koekoek\r\n")
while True:
    try:
        recvstr = connection.recv(1024)
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass
print recvstr
This will open an SSL connection to gmail, send an invalid string, read the response and print it. Note that:
* the connection is explicitly set to blocking mode
* the recv timeout is explicitly set, in this case to 6 seconds
Now, when the timeout occurs, the WantReadError exception will be thrown, in this case after waiting for 6 seconds. (You can remove the while True loops to avoid the retries, but I added them here for testing.) The timeout set on the socket only appears to be effective for the connect() call.
An alternative, which probably applies to the GNUTLS case as well, is to keep the sockets in non-blocking mode and perform the timekeeping yourself: record the time when you launch the call, and inside the while True / except WantReadError loop check on each iteration whether you have been waiting for too long.
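For example, roughly (reusing the connection object from the example above; the 6-second deadline is arbitrary):
import time

deadline = time.time() + 6
while True:
    try:
        connection.do_handshake()
        break
    except OpenSSL.SSL.WantReadError:
        if time.time() > deadline:
            raise socket.timeout("handshake did not complete in time")
        time.sleep(0.1)   # brief pause so the retry loop doesn't spin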
I would also recommend Twisted, and using M2Crypto for the TLS parts.
One simple solution could be to change the socket type depending on the operation. I tested this with gnutls and it worked:
Do settimeout() on the socket before doing connect() on the bare socket wrapped by gnutls; that way connect() is subject to the timeout as you wanted.
Make sure you remove the timeout with settimeout(None) or setblocking(1) BEFORE GnuTLS's handshake().
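A sketch of that sequence (wrap_with_gnutls stands in for however you create the GnuTLS session object around the bare socket; host is a placeholder):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)                 # connect() will honour this timeout
session = wrap_with_gnutls(sock)   # hypothetical helper wrapping the bare socket
session.connect((host, 443))       # subject to the 5-second timeout
sock.settimeout(None)              # back to blocking mode...
session.handshake()                # ...before the GnuTLS handshake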
