I'm using a TCP socket to read data from a website, HTTP requests to be exact. I want to use sockets and not requests or pycurl so please do not suggest me any higher level library.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = wrap_socket(s)
response_bytes = b""
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((website))
s.send(all of this works good)
#this is where my problems occur
while True:
    response_bytes+=s.recv(4096)
    if not response_bytes: break
This solution should work perfectly according to multiple Stack Overflow posts. I want to use the most efficient way without a timeout. If I use try/except and set a socket timeout it works fine, but that's not very good in my opinion. Without the timeout, the code hangs forever and keeps trying to read. Is there any reason it is doing this?
s.send(all of this works good)
Let me guess: this is an HTTP request with an explicit or implicit Connection: keep-alive. Keep-alive is implicit in an HTTP/1.1 request, where connections are persistent by default. The server therefore keeps the TCP connection open while it waits for the client's next request, so recv() never sees the end of the stream.
I want to use the most efficient way without a timeout.
The correct way is to properly understand the HTTP protocol, extract the size of the response body from the response header (Content-Length, or the chunk sizes when Transfer-Encoding: chunked is used), and read exactly as much data as specified. The easy way is to send an HTTP/1.0 request without enabling HTTP keep-alive. In this case the server will close the TCP connection immediately after the response has been sent.
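As a rough illustration of the "correct way", here is a minimal Python 3 sketch that reads the header block, looks up Content-Length, and then reads exactly that many body bytes. The function name read_http_response is made up for this example, and it assumes the server sends a Content-Length header (no chunked encoding, no HEAD or 204/304 responses), so treat it as a starting point rather than a complete HTTP client.

```python
import socket


def read_http_response(sock):
    """Read one HTTP response that declares a Content-Length header."""
    buf = b""
    # Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # Header names are case-insensitive, so lower-case them before lookup.
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # Assumption: Content-Length is present and the body is not chunked.
    length = int(headers[b"content-length"])
    while len(body) < length:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before body was complete")
        body += chunk
    return head, body[:length]
```

Because the loop stops after the declared number of bytes, it returns even when the server keeps the connection open.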
I want to use sockets and not requests or pycurl so please do not suggest me any higher level library.
It looks like you want to implement HTTP yourself. There is a standard you should read in this case which describes the fairly complex behavior of HTTP. Don't try to guess a protocol but read the actual specification.
this solution should work perfectly according to multiple stack posts
No, you missed an important detail.
while True:
    response_bytes+=s.recv(4096)
    if not response_bytes: break
If response_bytes is ever non-empty then it stays non-empty, and this becomes an infinite loop. Instead, do something like
while True:
    buf = s.recv(2048)
    if not buf:
        break
    response_bytes+=buf
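Combined with the HTTP/1.0 approach mentioned above, the corrected loop could be packaged roughly like this (Python 3). Both function names are invented for the example, and the loop only terminates if the server really closes the connection after responding, which is the assumption behind using HTTP/1.0 without keep-alive.

```python
import socket


def build_http10_get(host, path="/"):
    """Build a minimal HTTP/1.0 GET request (no keep-alive requested)."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")


def recv_until_closed(sock, bufsize=4096):
    """Collect data until recv() returns b'', i.e. the peer closed the connection."""
    chunks = []
    while True:
        buf = sock.recv(bufsize)
        if not buf:          # test the new chunk, not the accumulated data
            break
        chunks.append(buf)
    return b"".join(chunks)
```

Collecting the chunks in a list and joining once at the end avoids repeatedly copying a growing bytes object.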
Related
In my code I wrote something like this:
try:
    s.sendall(data)
except Exception as e:
    print e
Now, can I assume that if there wasn't any exception thrown by sendall that the other side of the socket (its kernel) did receive 'data'? If not then that means I need to send an application ack which seems unreasonable to me.
If I can assume that the other side's kernel did receive 'data', then 'sendall' must return only once it has seen a TCP ACK for all the bytes I put in 'data'. I couldn't find any documentation for this, though. On the contrary, from searching the web I got the feeling that I cannot assume an ACK was received.
can I assume that if there wasn't any exception thrown by sendall that the other side of the socket (its kernel) did receive 'data'?
No, you can't. All it tells you is that the local system accepted the data for sending, i.e. it was copied into the kernel's send buffer. It will not wait for the peer to ACK the data (i.e. data received by the peer's OS kernel), let alone wait until the data has been processed by the peer application. This behavior is not specific to Python.
And usually it does not matter much whether the peer system's kernel received the data and put it into the application's socket buffer. What really counts is whether the application received and processed the data, which might involve complex things like inserting the data into a database and waiting for a successful commit, or even forwarding the data to yet another system. Since it is up to the application to decide when the data has really been processed, you have to implement an application-specific ACK to signal successful processing.
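To make the idea of an application-specific ACK concrete, here is a small Python 3 sketch. The line-based framing, the ACK token and both function names are assumptions made up for this example (the payload must not contain a newline); a real protocol would define its own framing and error replies.

```python
import socket
import threading

ACK = b"OK\n"  # hypothetical acknowledgement token for this sketch


def send_with_ack(sock, payload, timeout=5.0):
    """Send one newline-terminated message and wait for the peer's ACK."""
    sock.settimeout(timeout)
    sock.sendall(payload + b"\n")
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(64)
        if not chunk:
            raise ConnectionError("peer closed before acknowledging")
        reply += chunk
    return reply == ACK


def handle_one_message(sock, process):
    """Receiver side: read one line, process it, then acknowledge."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        data += chunk
    process(data[:-1])   # e.g. insert into a database and commit
    sock.sendall(ACK)    # acknowledge only after processing succeeded
```

The sender learns that the message was processed, not merely delivered, because the receiver replies only after process() has returned.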
Yes you can :)
According to the socket.sendall docs:
socket.sendall(string[, flags]) Send data to the socket. The socket
must be connected to a remote socket. The optional flags argument has
the same meaning as for recv() above. Unlike send(), this method
continues to send data from string until either all data has been sent
or an error occurs. None is returned on success. On error, an
exception is raised, and there is no way to determine how much data,
if any, was successfully sent.
Specifically:
socket.sendall() will continue to send all data until it has completed or an error has occurred.
Update: To answer your comment about what's going on under the hood:
Looking at the socketmodule.c source code it looks like it repeatedly tries to "send all data" until there is no more data left to send. You can see this on L3611 } while (len > 0);. Hopefully this answers your question.
I have a TCP server with code that looks like this (in a loop):
r, w, e = select([self.sock], [self.sock], [self.sock], 0)
if r or e:
    try:
        data = self.sock.recv(2048)
    except:
        debug("%s: .recv() crashed!"%self.id)
        debug(traceback.format_exc())
        break
For some reason, my connection from the client to this server randomly breaks, but I only see that it broke once I try to send data; only then do I get the error from recv(). Is there any way to detect the error without trying to send data?
Depending on how the connection was closed, you may not know it's closed until you attempt a send. By default, TCP won't automatically detect if the remote machine disappears without sending a disconnect.
What you're seeing is the correct behavior and something you need to handle. Make sure you don't assume that all exceptions from recv are "crashes", though. I don't know Python, but there are likely different exceptions (including disconnects) that you need to handle and that the code you posted doesn't deal with properly.
You should either enable TCP keepalive or send application-layer no-op packets to determine if your connection is still open.
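For the TCP keepalive option, a sketch in Python 3 might look like the following. SO_KEEPALIVE is widely available; the three tuning options are platform-specific (they exist on Linux), so the code skips any that the socket module does not expose. The function name and the default values are arbitrary choices for this example.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing TCP socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning knobs: seconds of idleness before the first probe, seconds
    # between probes, and number of failed probes before giving up.
    # Not every platform exposes them, so look each one up by name.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Once the probes fail, a pending or later recv() on the socket reports an error, so a dead peer is noticed without the application having to send data itself.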
I'm trying to write an application where I send an initial HTTP POST message to a server and leave the connection open. The application then sits around until the server sends data back. Once the server sends data back I want to read it and write it to a file (easy enough).
The part I'm having trouble with is actua
Basically I do this:
h=http.HTTPConnection(sever, port, timeout)
h.putrequest('POST', selector)
h.putheaders(...)
h.endheaders()
h.send(body)
buffering = False
while 1:
    r = h.getresponse(buffering)
    f=open(unique_filename, 'w')
    f.write(r.read())
    f.close()
What I expect is that the app should block in the loop and when data arrives it gets written to the file. I suspect I'm using read the wrong way, but looking at the httplib source didn't help.
Also, the Python documentation site mentions a httplib.fileno() that returns the socket httplib uses. I'm using 2.7.0 and the website documentation is for 2.7.2; I can't find the fileno() method. I suspect taking the socket from httplib and calling recv myself is the best way to go. Is that a good idea?
Any help is appreciated with one exception: please don't tell me to use some other library.
I am using a server to send a piece of information to another server every second. The problem is that the other server's response is a few kilobytes, and this consumes bandwidth on the first server (about 2 GB in an hour). I would like to send the request and ignore the response (not even receive it, to save bandwidth).
I use a small python script for this task using (urllib). I don't mind using any other tool or even any other language if this is going to make the request only.
A 5K reply is small stuff and is probably below the standard TCP window size of your OS. This means that even if you close your network connection just after sending the request and checking just the very first bytes of the reply (to be sure that request has been really received) probably the server already sent you the whole answer and the packets are already on the wire or on your computer.
If you cannot control (i.e. trim down) the server's reply to your notification, the only alternative I can think of is to add another server on the remote machine that waits for a simple command, performs the real request locally, and sends only the result code back to you. This can be done very easily, maybe even with just bash/perl/python using, for example, netcat/wget locally.
By the way there is something strange in your math as Glenn Maynard correctly wrote in a comment.
For HTTP, you can send a HEAD request instead of GET or POST:
import urllib2
request = urllib2.Request('https://stackoverflow.com/q/5049244/')
request.get_method = lambda: 'HEAD' # override get_method
response = urllib2.urlopen(request) # make request
print response.code, response.url
Output
200 https://stackoverflow.com/questions/5049244/how-can-i-ignore-server-response-to-save-bandwidth
See How do you send a HEAD HTTP request in Python?
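The snippet above is Python 2 (urllib2). On Python 3 the same idea could be sketched as follows; the two helper names are made up for this example, and whether the remote server treats a HEAD request the same way as the original notification is an assumption that has to be checked against that server.

```python
import urllib.request


def build_head_request(url):
    """Create a request object that will be sent with the HEAD method."""
    # The `method` argument is available in Python 3.3 and later.
    return urllib.request.Request(url, method="HEAD")


def head_status(url, timeout=10):
    """Send the HEAD request and return (status code, final URL)."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return resp.status, resp.geturl()
```

A HEAD response carries the status line and headers but no body, which is where the bandwidth saving comes from.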
Sorry, but this does not make much sense and is likely a violation of the HTTP protocol. I consider such an idea weird and broken by design. Either make the remote server shut up, or reconfigure your application (or whatever is running on the remote server) to work on a different protocol level, using a smarter protocol with less bandwidth usage. Anything else can hardly be considered anything but nonsense.
I have a HTTP client in Python which needs to use TLS. I need not only
to make encrypted connections but also to retrieve info from the
remote machine, such as the certificate issuer. I need to make
connection to many HTTP servers, often badly behaved, so I absolutely
need to have a timeout. With non-TLS connections,
mysocket.settimeout(5) does what I want.
Among the many TLS Python modules:
* python-gnutls does not allow the use of settimeout() on sockets because it uses non-blocking sockets: gnutls.errors.OperationWouldBlock: Function was interrupted.
* python-openssl has a similar issue: OpenSSL.SSL.WantReadError
* The SSL module of the standard library does not work with Python 2.5.
* Other libraries like TLSlite apparently do not give access to the metadata of the certificate.
The program is threaded so I cannot use signals. I need detailed
control on the HTTP dialog so I cannot use a standard library like urllib2.
Background: this is the survey project DNSwitness. Relevant SO threads: Timeout on a Python function call and How to limit execution time of a function call in Python.
Although I've never used it for exactly this purpose, Twisted should do what you want. The only downside is that it's a rather large library, and you will also need to install PyOpenSSL (Twisted depends on it). If you've never used it before, Twisted's callback-based architecture can take some getting used to (you really want to read the tutorials before starting).
But aside from that, it's designed around the idea of managing a lot of connections, it of course lets you specify timeouts, reconnects, etc., and you can retrieve certificate info (see here).
I assume the problem you're having is the following: you're opening a connection using PyOpenSSL and you always get a WantReadError exception, and you can't distinguish between this error and a timeout. Consider the following example:
#!/usr/bin/python
import OpenSSL
import socket
import struct
context = OpenSSL.SSL.Context(OpenSSL.SSL.TLSv1_METHOD)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
connection = OpenSSL.SSL.Connection(context,s)
connection.connect(("www.gmail.com",443))
# Put the socket in blocking mode
connection.setblocking(1)
# Set the timeout using the setsockopt
tv = struct.pack('ii', int(6), int(0))
connection.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, tv)
print "Connected to " , connection.getpeername()
print "State " , connection.state_string()
while True:
    try:
        connection.do_handshake()
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass
print "State " , connection.state_string()
print connection.send("koekoek\r\n")
while True:
    try:
        recvstr = connection.recv(1024)
        break
    except OpenSSL.SSL.WantReadError:
        print "Exception"
        pass
print recvstr
This will open an SSL connection to gmail, send an invalid string, read the response and print it. Note that:
* the connection is explicitly set to blocking mode
* the recv timeout is explicitly set, in this case to 6 seconds.
Now, what is the behavior when the timeout occurs? The WantReadError exception will be thrown, in this case after waiting for 6 seconds. (You can remove the while True to avoid the retry, but in this case I added it for testing.) The timeout set on the socket only appears to be effective in the connect() call.
An alternative, when keeping the sockets in non-blocking mode (which probably applies to the GnuTLS case as well), is to do the timekeeping yourself: record the time when you launch the call, and inside the while True / try / except WantReadError loop, check on every iteration whether you have been waiting for too long.
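The do-it-yourself timekeeping could be sketched like this in Python 3. The helper name retry_until_deadline is invented for the example, and the "not ready yet" exception is passed in as a parameter (for PyOpenSSL that would be OpenSSL.SSL.WantReadError) so the sketch does not depend on any TLS library. It only waits for readability; a WantWriteError-style condition would need the socket in select's write list instead.

```python
import select
import socket
import time


def retry_until_deadline(operation, sock, timeout, retry_on):
    """Call operation() until it succeeds or `timeout` seconds have passed.

    `retry_on` is the exception type (or tuple of types) meaning
    "no data yet, try again later".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return operation()
        except retry_on:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("operation did not complete in time")
            # Sleep until the socket is readable or the time budget runs out,
            # instead of spinning in a busy loop.
            select.select([sock], [], [], remaining)
```

A call such as retry_until_deadline(connection.do_handshake, s, 5, OpenSSL.SSL.WantReadError) would then bound the handshake, and a genuine timeout surfaces as TimeoutError rather than as yet another WantReadError.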
I would also recommend Twisted, and using M2Crypto for the TLS parts.
One simple solution could be to change the socket type depending on the operation. I tested this with gnutls and it worked:
Do settimeout() on the socket before doing connect() on the bare socket wrapped by gnutls, that way connect() is subject to the timeout as you wanted.
Make sure you remove the timeout with settimeout(None) or setblocking(1) BEFORE GnuTLS's handshake()
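The two steps above might look like this for the plain-socket part (Python 3). The helper name is made up, and the GnuTLS wrapping and handshake are deliberately left out because the exact python-gnutls calls are not shown in this thread; the returned socket is simply what you would hand to the TLS layer.

```python
import socket


def connect_with_timeout(address, timeout=5.0):
    """Connect with a bounded connect(), then switch back to blocking mode."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)      # connect() is now subject to the timeout
    try:
        sock.connect(address)
    except Exception:
        sock.close()
        raise
    sock.settimeout(None)         # plain blocking socket again, before the TLS handshake
    return sock
```

If connect() does not finish in time it raises socket.timeout, and the socket is closed before the exception propagates.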