I'm a bit confused about how to keep calling recv() when using select(). This code isn't complete, but it demonstrates the issue. Let's assume we are receiving a decent amount of data from each connection (10 to 20 MB).
Should you keep looping using recv() until you get the desired number of bytes after the call to select()?
while True:
    r,w,e = select.select(r_ready, w_ready, [], timeout)
    for client in r:
        if client == sock:
            acceptConnection(sock)
        else:
            chunks = []
            bytesRead = 0
            while bytesRead < desiredBytes:
                chunk = client.recv(1024)
                bytesRead += len(chunk)
Or should you only call recv() once after each select() loop?
clientBuffers = {}
while True:
    r,w,e = select.select(r_ready, w_ready, [], timeout)
    for client in r:
        if client == sock:
            acceptConnection(sock)
        else:
            chunk = client.recv(1024)
            clientBuffers[client].append(chunk)
Should you keep looping using recv() until you get the desired number
of bytes after the call to select()?
In general, no, because you have no way of knowing how long that will take. For all you know, the client might not send (or the network might not deliver) the entire sequence of desired bytes until an hour after the first bytes arrive. If you stay in a loop calling recv() until you have all of the bytes, every other client may get no response from your server for a very long time, which is clearly not desirable behavior for a multi-client server.
Instead, just get as many bytes from recv() as you currently can, and if you didn't receive enough bytes to take action yet, then store the received bytes in a buffer somewhere for later and go back to your regular select() call. select() should be the only place in your event loop that you ever block. Making all of your sockets non-blocking is highly recommended, in order to guarantee that you won't ever accidentally block inside a recv() call.
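A minimal sketch of that pattern, assuming fixed-size messages and a non-blocking client socket. The helper name on_readable, the buffers dict and the desired parameter are illustrative and do not come from the question:

```python
import socket

def on_readable(client, buffers, desired):
    """Handle one 'readable' event from select() without ever blocking.

    Returns a complete message of `desired` bytes, or None if more data
    is still needed (or the peer closed the connection).
    """
    try:
        chunk = client.recv(4096)       # one recv per select() wake-up
    except BlockingIOError:
        return None                     # nothing to read after all
    if not chunk:
        buffers.pop(client, None)       # peer closed: drop any partial data
        return None
    buf = buffers.setdefault(client, bytearray())
    buf += chunk                        # bytearray grows in place
    if len(buf) < desired:
        return None                     # not enough yet: go back to select()
    message = bytes(buf[:desired])
    del buf[:desired]                   # keep surplus bytes for the next message
    return message
```

The event loop would call this once for each client that select() reports as readable and act only when it returns a message.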
Related
I am using Python's select library to asynchronously read data off of two sockets. Since the size of the packets that I receive can vary, and I don't know if there is a maximum possible size for the data that I am reading, I have implemented a function called recvAll(sock) to get all of the data off of a socket:
def recvAll(sock):
    buffer = ''
    data = []
    try:
        while True:
            buffer = sock.recv(8192)
            if not buffer:
                break
            data.append(buffer)
    except error, (errorCode, message):
        if errorCode != 10035:
            print 'error: ', str(errorCode), ' ', message
    return "".join(data)
I am calling the select library like this:
rlist, wlist, elist = select.select([sock1, sock2], [], [])
for sock in rlist:
    if sock == sock1:
        #data1 = sock.recv(8192)
        data1 = recvAll(sock)
    else:
        #data2 = sock.recv(8192)
        data2 = recvAll(sock)
In the for loop for each socket I process, if I call sock.recv directly, I am able to properly get data1 and data2. However, if I first pass sock to recvAll I am only able to get data1. It does not appear that recvAll is being called on sock2 at all. Why is this the case?
Most likely what is going on is that, since your socket is not set as non-blocking (sock.setblocking(0)), your recvAll function is blocking, waiting for more input from the socket. recv won't return an empty string until the socket is closed by the other end, so the loop only ends on that or when an error occurs.
The way to fix this is to structure your code to combine the functions of recvAll with your select. Each time your select returns with an indication that there is data waiting on the socket, read from the socket ONLY ONCE, append the data to the buffer for that socket, then loop back into select.
After each recv, look at what you got and decide what to do next. For some protocols, for example, a \n in the buffer indicates that you have a complete message and need to do something with it. In your case, it seems that the closing of the socket is the indication that your message is complete, so you should look for recv returning a zero-length string.
If the socket was in fact closed, then you need to remove it from the list of sockets you are passing into select.
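A rough sketch of that structure, assuming (as in the question) that each peer signals the end of its message by closing the connection. The function name read_until_closed is made up for illustration:

```python
import select
import socket

def read_until_closed(socks):
    """Collect everything each socket sends until its peer closes it."""
    buffers = {s: [] for s in socks}
    open_socks = list(socks)
    while open_socks:
        readable, _, _ = select.select(open_socks, [], [])
        for s in readable:
            chunk = s.recv(8192)        # exactly one recv per readiness event
            if chunk:
                buffers[s].append(chunk)
            else:
                open_socks.remove(s)    # zero-length read: peer closed
    return {s: b"".join(parts) for s, parts in buffers.items()}
```

Because each socket is read only once per select() round, a slow sender on one socket cannot starve the other.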
I am attempting to send a string to my server from my client with a specific filename and then send that file to the client. For some reason it hangs even after it's received all of the file. It hangs on the:
m = s.recv(1024)
client.py
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.1.2", 54321))
s.send(b"File:test.txt")
f = open("newfile.txt", "wb")
data = None
while True:
    m = s.recv(1024)
    data = m
    if m:
        while m:
            m = s.recv(1024)
            data += m
        else:
            break
f.write(data)
f.close()
print("Done receiving")
server.py
import socket
import os
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 54321))
s.listen(1)
# `c` was never defined in the posted code; an accept() call is assumed here
c, addr = s.accept()
while True:
    client_input = c.recv(1024)
    # recv returns bytes, so split on a bytes separator
    command = client_input.split(b":")[0]
    if command == b"File":
        command_parameter = client_input.split(b":")[1]
        f = open(command_parameter, "rb")
        l = os.path.getsize(command_parameter)
        m = f.read(l)
        c.sendall(m)
        f.close()
TLDR
recv blocks because the socket connection is not shut down after the file data has been sent. The implementation currently has no way to know when the communication is over, which results in a deadlock between the two remote processes. To avoid this, close the socket connection in the server, which will generate an end-of-file event in the client (i.e. recv returns a zero-length string).
More insight
Whenever you design any software where two processes communicate with each other, you have to define a protocol that disambiguates the communication such that both peers know exactly which state they are in at all times. Typically this involves using the syntax of the communication to help guide the interpretation of the data.
Currently, there are some problems with your implementation: it doesn't define an adequate protocol to resolve potential ambiguity. This becomes apparent when you consider the fact that each call to send in one peer doesn't necessarily correspond to exactly one call to recv in the other. That is, the calls to send and recv are not necessarily one-to-one. Consider sending the file name to the server on a heavily congested network: perhaps only half of the file name makes it to the server when the first call to recv returns. The server has no way (currently) to know if it has finished receiving the file name. The same is true in the client: how does the client know when the file has finished?
To work around this, we can introduce some syntax into the protocol and some logic into the server to ensure we get the complete file name before continuing. A simple solution would be to use an EOL character, i.e. \n to denote the end of the client's message. Now, 99.99% of the time in your testing this will take a single call to recv to read in. However you have to anticipate the cases in which it might take more than one call to recv. This can be implemented using a loop, obviously.
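A small sketch of such a loop, assuming the client terminates its request with \n. The function name recv_line and the pending parameter are illustrative, not part of the original code:

```python
import socket

def recv_line(sock, pending=b""):
    """Read until a newline; return (line, leftover_bytes).

    `pending` holds bytes left over from an earlier call.
    Raises ConnectionError if the peer closes before a newline arrives.
    """
    data = pending
    while b"\n" not in data:
        chunk = sock.recv(1024)         # may return only part of the line
        if not chunk:
            raise ConnectionError("peer closed before end of line")
        data += chunk
    line, _, rest = data.partition(b"\n")
    return line, rest
```

The server would call recv_line(c) instead of a single c.recv(1024) and parse the returned line. Any bytes after the newline come back as the second element so they are not lost.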
The client end is simpler for this demo. If the communication is over after the sending of the file, then that event can be used to denote the end of the data stream. This happens when the server closes the connection on its end.
If we were to expand the implementation to, say, allow for requests for multiple, back-to-back files, then we'd have to introduce some mechanism in the protocol for distinguishing the end of one file and the beginning of the next. Note that this also means the server would need to potentially buffer extra bytes that it reads in on previous iterations in case there is overlap. A stream implementation is generally useful for these sorts of things.
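One common mechanism, shown here only as a sketch, is to prefix every file with its length. The 8-byte big-endian header and the helper names below are assumptions for illustration, not part of the original protocol:

```python
import socket
import struct

HEADER = struct.Struct("!Q")            # assumed framing: 8-byte big-endian length

def send_blob(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, looping over short reads."""
    parts = []
    while n > 0:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        parts.append(chunk)
        n -= len(chunk)
    return b"".join(parts)

def recv_blob(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Since recv_exact never asks for more bytes than the current message needs, it does not read into the next file, so no extra buffering is required with this particular framing.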
The Python 3 Socket Programming HOWTO presents this code snippet:
class MySocket:
    """demonstration class only
      - coded for clarity, not efficiency
    """

    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                            socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def mysend(self, msg):
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent

    def myreceive(self):
        chunks = []
        bytes_recd = 0
        while bytes_recd < MSGLEN:
            chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
            if chunk == b'':
                raise RuntimeError("socket connection broken")
            chunks.append(chunk)
            bytes_recd = bytes_recd + len(chunk)
        return b''.join(chunks)
where the send loop is interrupted if the socket send method returns 0.
The logic behind this snippet is that when the send method returns '0 bytes sent', the sending side of a socket connection should give up its efforts to send data. This is for sure true for the recv method, where zero bytes read for a socket in blocking mode should be interpreted as EOF, and therefore the reading side should give up.
However, I cannot understand under which situations the send method could return zero. My understanding of Python sockets is that send returns immediately due to buffering at the OS level. If the buffer is full, send will block, and if the connection is closed at the remote side, an exception is raised.
Finally suppose send returns zero without raising an exception: does this really indicate that all future send calls will return zero?
I've done some testing (although using only socket connected to ::1 on OS X) and was not able to find a situation in which send returns 0.
Edit
The HOWTO states:
But if you plan to reuse your socket for further transfers, you need
to realize that there is no EOT on a socket. I repeat: if a socket
send or recv returns after handling 0 bytes, the connection has been
broken. If the connection has not been broken, you may wait on a recv
forever, because the socket will not tell you that there’s nothing
more to read (for now).
It is pretty easy to find a situation in which recv returns 0: when the remote (sending) side calls socket.shutdown(SHUT_WR), further recv on the receiving side will return 0 and not raise any exception.
I'm looking for a concrete example showing that receiving 0 from send indicates a broken connection (one which will continue to return 0 on send).
Upon seeing the question I was somehow stunned, because a send C call can return 0 bytes and the connection is of course still alive (the socket cannot simply send more bytes at that given moment in time)
https://github.com/python/cpython/blob/master/Modules/socketmodule.c
I decided to "use the source" and unless I am very wrong (which can always be and often is) this is a bug in the HOWTO.
Chain:
send is an alias for sock_send
sock_send calls in turn sock_call
sock_call calls in turn sock_call_ex
sock_call_ex calls in turn sock_send_impl (which has been passed down the chain starting with sock_send)
Unwinding:
sock_send_impl returns true or false (1 or 0) with return (ctx->result >= 0)
sock_call_ex returns
  -1 if sock_send_impl returns false
  0 if sock_send_impl returns true
sock_call returns this value transparently.
sock_send
  returns NULL for a -1 (because an error has been set and an exception will be raised)
  returns ctx->result for 0 from sock_call
And ctx->result is the number of bytes written by the C call send in sock_send_impl.
The chain shows that if 0 bytes have been sent, there is no error and this actually is a potential real life socket situation.
If my logic is wrong, someone please let me know.
I might be wrong, but I think you are looking for an impossible situation...
As @mementum has shown in his answer, it is theoretically possible for a socket send to return zero when there is no error, but also no data sent.
However, as shown elsewhere on SO this can only happen in very specific scenarios. In my experience (and also covered in the comments to the accepted answer) you would only ever get a zero result on a non-blocking socket when the network is congested. Now Python sockets are blocking by default, which means that the kernel should wait until there is room to take some more data then return how many bytes were queued. By definition, this can never be zero.
So, putting it all together: since your snippet doesn't change the socket mode (e.g. using the setblocking method), it is using blocking sockets, so send cannot return zero and the code cannot hit the path mementum identified.
This is backed up by the fact that you have been unable to trigger the specific line of code no matter what you do.
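For anyone who wants to probe the non-blocking case themselves, here is a small experiment, offered as a sketch rather than a proof. It assumes socket.socketpair() is available on your platform, and the function name fill_send_buffer is made up. In CPython, an EAGAIN/EWOULDBLOCK result from the C send call is raised as BlockingIOError, so a full send buffer shows up as an exception and not as a return value of 0:

```python
import socket

def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel accepts no more data.

    Returns (bytes_queued, saw_zero_return).
    """
    sock.setblocking(False)
    queued = 0
    while True:
        try:
            sent = sock.send(chunk)
        except BlockingIOError:
            # Send buffer full: reported as an exception, not as 0.
            return queued, False
        if sent == 0:
            return queued, True         # the case the question asks about
        queued += sent
```

Calling it on one end of a socketpair() whose other end is never read fills the buffer quickly. The second element of the result tells you whether a zero return was ever observed before the buffer filled up.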
I'm building a simple server-client app using sockets. Right now, I am trying to get my client to print to console only when it received a specific message (actually, when it doesn't receive a specific message), but for some reason, every other time I run it, it goes through the other statement in my code and is really inconsistent - sometimes it will work as it should and then it will randomly break for a couple uses.
Here is the code on my client side:
def post_checker(client_socket):
    response = client_socket.recv(1024)
    #check if response is "NP" for a new post from another user
    if response == "NP":
        new_response = client_socket.recv(1024)
        print new_response
    else: #print original message being sent
        print response
where post_checker is called in the main function as simply "post_checker(client_socket)" Basically, sometimes I get "NPray" printed to my console (when the client only expects to receive the username "ray") and other times it will print correctly.
Here is the server code correlated to this
for sublist in user_list:
    client_socket.send("NP")
    client_socket.send(sublist[1] + " ")
where user_list is a nested list and sublist[1] is the username I wish to print out on the client side.
What's going on here?
The nature of your problem is that TCP is a streaming protocol. The bufsize in recv(bufsize) is a maximum size. The recv function will always return data when available, even if not all of the bytes have been received.
See the documentation for details.
This causes problems when only half of the bytes have arrived but you've already started processing the data. I suggest you take a look at the "recvall" concept from this site, or you can also consider using UDP sockets (which would solve this problem but may create a host of others, as UDP does not guarantee delivery or ordering).
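Another way to get message boundaries back on top of the stream is sketched below. It assumes the server is changed to end every message with a newline (the posted server does not do this) and the function name read_messages is hypothetical. socket.makefile() wraps the socket in a buffered file object, so line iteration reassembles messages however TCP split or merged them:

```python
import socket

def read_messages(sock):
    """Yield newline-terminated messages from a blocking socket."""
    with sock.makefile("rb") as stream:     # buffered file-like wrapper
        for line in stream:                 # ends when the peer closes
            yield line.rstrip(b"\n")
```

With this framing, "NP" and the username always arrive as two separate items, even if they travelled in the same TCP segment.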
You may also want to let the Python packages handle some of the underlying framework for you. Consider using SocketServer (the socketserver module in Python 3).
buffer = []

def recv(sock):
    global buffer
    message = b""
    while True:
        if not (b"\r\n" in b"".join(buffer)):
            chunk = sock.recv(1024)
            if not chunk:
                break
            buffer.append(chunk)
        concat = b"".join(buffer)
        if (b"\r\n" in concat):
            message = concat[:concat.index(b"\r\n")]
            concat = concat[concat.index(b"\r\n") + 2:]
            buffer = [concat]
            break
    return message

def send(sock, data):
    sock.send(data + b"\r\n")
I have tested this and, in my experience, it works well.
My use case: I have two scripts that send data quickly, so every now and then a single recv returns more than one message glued together. This function keeps whatever arrives beyond the first message in the buffer and continues receiving until a line break (\r\n) appears in the data. It then splits at the line break, saves the rest for the next call, and returns the message cleanly separated.
(I translated this, so please excuse me if anything is wrong or misunderstood)
I am trying to stop recv from waiting endlessly for input.
First I tried:
recv = bytes('','UTF-8')
while True:
    data_recv = self.socketclient.recv(1024*768)
    if not data_recv:
        break
    else:
        recv += data_recv
return recv
On the server side I send a picture, and then the server just waits after host.sendall(string).
So I thought that after a couple of receives (since the picture is big, the client has to call recv several times) it would detect that data_recv is empty and stop, but that doesn't happen.
My second try was with select()
do_read = False
recv = bytes('','UTF-8')
while True:
    read_sockets,write_sockets,error_sockets = select.select([self.socketclient],[],[])
    do_read = bool(read_sockets)
    print (do_read)
    if do_read:
        print ("read")
        data_recv = self.socketclient.recv(640*480)
        recv += data_recv
    else:
        break
return recv
With this, print(do_read) only ever shows True, and then the client also just stops and waits endlessly. Here too I had expected to see False at some point.
So my question is: how do I make my client read the entire string from the socket and stop once nothing more is being sent?
I had some success with self.socketclient.settimeout(), but I would rather not use it, since it always wastes time in the program and is more of a workaround.
You have to stop your while loop when all data has been received. After each recv, check whether the total length received so far equals the length of the file you requested. If it does, break out of the loop and do something with the data.
recv cannot know whether the sender is done, because it doesn't know how long the data is. It simply keeps waiting until you or the other endpoint closes the connection.
You can also use a non-blocking socket (which is better, by the way).
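A sketch of the length check, assuming the client already knows the picture's size in bytes (for example because the server sends it first, which the posted code does not do). The function name recv_known_length is illustrative. It uses recv_into to fill a preallocated buffer instead of concatenating chunks:

```python
import socket

def recv_known_length(sock, length):
    """Read exactly `length` bytes into a preallocated buffer."""
    buf = bytearray(length)
    view = memoryview(buf)
    received = 0
    while received < length:
        # recv_into fills at most the remaining space in the buffer
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (received, length))
        received += n
    return bytes(buf)
```

The loop stops as soon as the expected number of bytes has arrived, so the client never sits in recv waiting for data that will not come.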