Python sockets really unreliable - python

I have been trying to do some coding with sockets recently and found out that I often get broken pipe errors, especially when working with bad connections.
In an attempt to solve this problem I made sure to sleep after every socket operation. It works but it is slow and ugly.
Is there any other way of making a socket connection more stable?

...server and client getting out of sync
Basically you say that your application is buggy. The way to make the connection more stable is therefore to fix these bugs, not to work around them with explicit sleeps.
While you don't show any code, a common cause of "getting out of sync" is the assumption that a send on one side is matched exactly by a recv on the other side. Another common assumption is that send will actually send all data given and recv(n) will receive exactly n bytes of data.
All of these assumptions are wrong. TCP is not a message-based protocol but a byte stream. Any message semantics need to be explicitly added on top of this byte stream, for example by prefixing messages with a length, by using a unique message separator, or by using a fixed message size. The results of send and recv also need to be checked to make sure that all data has been sent or all expected data has been received; if not, further send or recv calls are needed until all data is processed.
Adding some sleep often seems to "fix" some of these problems by effectively using "time" as a message separator. But it is not a real fix: it hurts performance and it is still not 100% reliable.
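As a rough illustration of the length-prefix approach, here is a minimal sketch. The 4-byte big-endian header and the names send_msg, recv_exact and recv_msg are assumptions made for this example, not anything from the question's (unseen) code.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte big-endian payload length


def send_msg(sock, payload):
    """Send one length-prefixed message; sendall() keeps going until every byte is written."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Call recv() repeatedly until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:  # b"" means the peer closed the connection
            raise ConnectionError("peer closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read the length prefix, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this in place, one send_msg() on one side corresponds to exactly one recv_msg() on the other, no matter how TCP happens to split or merge the bytes in transit, and no sleep is needed.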

I've been using Python's Sockets for a long time and I can tell that as long as your code (which you unfortunately didn't provide) is clean and synchronized in itself you shouldn't get any problems. I use Sockets for small applications where I don't necessarily want/need to write/use an API, and it works like a dream.
As @Steffen already mentioned in his answer, TCP is not a message-based protocol. It is a "stream-oriented protocol", which means that it sends data as a stream of bytes and not message-by-message.
Take a look at this thread and this paper to get a better understanding about the differences.
I would also suggest taking a look at this great answer to know how to sync your messages between your server and your client(s).

Related

Socket programming for Python 2.7.10 with PyPy 5.1.2: Flush -- send data immediately

I have a server and many clients implemented in Python, talking over sockets. I want messages sent from either side to be rushed immediately to the other side.
I found this article which talks about calling .flush() on the socket. However, I don't see any flush() api in the python doc here.
Here is a stackoverflow answer regarding the similar question.
It suggests a naive option of closing the socket. However, I want to keep one connection and not re-connect again and again (to avoid code complexity and a useless burden on the system).
It has another answer suggesting converting the socket to a text-file-like object and then calling flush on it. However, comments there question its credibility, it is for Python 3, and it is not the accepted answer either.
I do NOT necessarily need TCP socket, UDP is fine for me as well. But the only thing I want is that data should be sent immediately. Any suggestions for achieving that? I would prefer clean solution, however, hacks are welcome as well.

ZeroMQ: socket per data type or just one socket?

I've got a program which receives information from about 10 other (sensor reading) programs (all controlled by myself). I now want to make them communicate using ZeroMQ.
For most of the queues the important thing is that the central receiving program always has the latest sensor data, all older messages are not important anymore. If a couple messages get lost I don't care. So for all of them I started out with a separate PUB/SUB socket; one for each program. But I'm not sure if that is the right way to do it. As far as I understand I have two options:
Make a separate socket for every program and read them out in a loop. That way I know by the socket what the information is I'm receiving (I'm often just sending an int).
Make one socket to which all the programs connect, and with every message I send a string which tells the receiving end what the message is about.
All connections are on a PUB/SUB basis, so creating one socket would work out well. I'm just not sure if that is the most efficient way to do it.
All tips are welcome!
- PUB/SUB is fine and allows an easy conversion from N-sensors:1-logger into N-sensors:2+-loggers
- one might also benefit from a conceptual separation of a socket from an access-port, where more than one socket may get connected
How to get always JUST THE ACTUAL ( LAST ) SENSOR READOUT:
If not bound, due to system-integration constraints, to some early ZeroMQ API, there is a lovely feature exactly for this via a .setsockopt( ZMQ_CONFLATE, True ) method:
ZMQ_CONFLATE: Keep only last message
If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options. Does not support multi-part messages, in particular, only one part of it is kept in the socket internal queue.
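To make the "keep only last message" behaviour concrete, here is a pure-Python analogy. It is not ZeroMQ code and does not use pyzmq; the class name LatestOnly is made up for this sketch, which only mimics the queue semantics described in the quoted documentation.

```python
from collections import deque


class LatestOnly:
    """Keep only the newest value, discarding anything that was not read in time."""

    def __init__(self):
        # A deque with maxlen=1 silently drops the old item when a new one is appended.
        self._slot = deque(maxlen=1)

    def put(self, value):
        self._slot.append(value)

    def get(self, default=None):
        """Return and remove the newest value, or default if nothing is waiting."""
        return self._slot.pop() if self._slot else default
```

If three readings are put before the receiver gets around to calling get(), only the third is returned and the first two are gone, which is the trade-off the option makes.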
On design dilemma:
Unless your real-time control stability introduces some hard-real-time limit, the PUB side freely decides how often a new value is passed to .send() for the SUB(-s). No magic is needed here, even less so with the ZMQ_CONFLATE option set on the internally managed outgoing queue.
The receiving SUB(-s) side will also benefit from the ZMQ_CONFLATE option set on the internally managed incoming queue. Given that a set of individual .bind()-s instantiates separate landing ports for the different sensor readouts, each "last" value will consistently remain the last readout of its own sensor. If all readouts went into a common landing pad, the receiving process would lose all readouts except the one that happened to be the "last" right before .recv() took place, which would not help much, would it?
If some I/O-performance tweaking becomes necessary, the .Context( n_IO_threads ) + ZMQ_AFFINITY-mapping options may increase and prioritise the resources available to the I/O data pump.
Unless you're up against a tight real time requirement there's not much point in having more sockets than necessary. ZMQ's fair queuing ought to take care of giving each sensor program equal attention (see Figure 6 in the guide)
If your sensor programs are on other devices connected by Ethernet, the ultimate performance of your programs is limited by the bandwidth of the Ethernet NIC in your computer. A single thread program handling a single PULL socket stands a good chance of being able to process the data coming in faster than it can transit the NIC.
If that's so, then you may as well stick to a single socket and enjoy the simpler code. It's not very hard dealing with multiple sockets, but it's far easier to deal with one. For example, with one single socket you don't have to tell each sensor program what network port to connect to - it can be a constant.
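With a single socket, each message has to say which sensor it came from. A minimal sketch of one way to do that, independent of the transport: the wire layout (1-byte name length, UTF-8 name, 4-byte signed int) and the function names are assumptions for illustration only.

```python
import struct


def encode_reading(sensor, value):
    """Pack a sensor name and an integer reading into one message (name must be < 256 bytes)."""
    name = sensor.encode("utf-8")
    return struct.pack("!B", len(name)) + name + struct.pack("!i", value)


def decode_reading(message):
    """Inverse of encode_reading(): return (sensor_name, value)."""
    name_len = message[0]
    name = message[1:1 + name_len].decode("utf-8")
    (value,) = struct.unpack("!i", message[1 + name_len:1 + name_len + 4])
    return name, value


def update_latest(latest, message):
    """Record the newest reading per sensor in the dict `latest`."""
    sensor, value = decode_reading(message)
    latest[sensor] = value
    return latest
```

The receiver then needs just one loop that reads a message and calls update_latest(); the dict always holds the most recent value seen for each sensor.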
PUSH/PULL sounds like a more natural pattern for your situation than PUB/SUB, but that won't make much difference.
Lastness
Lastness is going to be your (potential) problem. The whole point of things like ZMQ is that they will deliver messages in the order they're sent. Thus when you read a message, it is by definition the "last" message so far as the recipient is concerned. The recipient has no idea as to whether or not there is another message on the way, in transit.
This is a feature of Actor model architectures (which is what ZMQ is). Messages get buffered up in the transport, and there's no information about the newness of the message to be learned when it's read. All you know is that it was sent some time beforehand. There is no execution rendezvous with the sender.
Now, you either process it as if it is the last message, or you wait for a period of time to see if another one comes along before processing it. The easiest thing to do is to simply process each message as if it is the last.
Contrast this with a Communicating Sequential Processes architecture. It's basically the same as an Actor model architecture, except that the transport does not buffer messages. Message sends block until the recipient has called message read.
Thus when you read a message, the recipient knows that it is the last one sent by the sender. And the sender knows that the message it has sent has been received at that very instant by the recipient. So the knowledge of lastness is absolute - the message received really is the last one sent.
However, unless you have something fairly heavyweight going on I wouldn't worry about it. You are quite likely to be able to keep up with your sensor data stream even if the messages you're reading aren't the latest in the queue.
You can nearly make ZMQ into CSP by setting the high water limit on the sending end's socket to 1. That means that you can buffer up at most 1 message. That's not the same as 0, and unfortunately setting the HWM to 0 means "unlimited size buffer".
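The difference between "a buffer of 1" and "no buffer" can be seen with an analogy from Python's standard library. This is not ZeroMQ code, only a sketch of the same idea using queue.Queue, which coincidentally also treats a size of 0 as unlimited.

```python
import queue

one_slot = queue.Queue(maxsize=1)      # analogous to a high water mark of 1
one_slot.put_nowait("reading 1")       # fits: the single slot was empty
try:
    one_slot.put_nowait("reading 2")   # slot still full; a blocking put() would wait here
    accepted = True
except queue.Full:
    accepted = False

unbounded = queue.Queue(maxsize=0)     # 0 means "no limit", not "no buffering"
for n in range(1000):
    unbounded.put_nowait(n)            # never raises queue.Full
```

The first message is accepted even though nobody is reading yet, so the sender still cannot tell when it was actually received. That is why a limit of 1 only approximates a true rendezvous.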

How to manage socket connections for a chat server (Python) via sockets and select module

Sorry to bother everyone with this, but I've been stumped for a while now.
The problem is that I decided to reconfigure this chat program I had using sockets so that instead of a client and a server/client, it would have a server, and then two separate clients.
I asked earlier as to how I might get my server to 'manage' these connections of the clients so that it could redirect the data between them. And I got a fantastic answer that provided me with exactly the code I would apparently need to do this.
The problem is I don't understand how it works, and I did ask in the comments but I didn't get much of a reply except for some links to documentation.
Here's what I was given:
connections = []
while True:
    rlist,wlist,xlist = select.select(connections + [s],[],[])
    for i in rlist:
        if i == s:
            conn,addr = s.accept()
            connections.append(conn)
            continue
        data = i.recv(1024)
        for q in connections:
            if q != i and q != s:
                q.send(data)
As far as I understand, the select module gives the ability to make waitable objects in the case of select.select.
I've got the rlist, the pending to be read list, the wlist, the pending to be written list, and then the xlist, the pending exceptional condition.
He's assigning the pending to be written list to "s" which in my part of the chat server, is the socket that is listening on the assigned port.
That's about as much as I feel I understand clearly enough. But I would really really like some explanation.
If you don't feel like I asked an appropriate question, tell me in the comments and I'll delete it. I don't want to violate any rules, and I'm pretty sure I am not duplicating threads as I did do research for a while before resorting to asking.
Thanks!
Note: my explanation here assumes you're talking about TCP sockets, or at least some type which is connection-based. UDP and other datagram (i.e. non-connection-based) sockets are similar in some ways, but the way you use select on them is slightly different.
Each socket is like an open file which can have data read and written to it. Data that you write goes into a buffer inside the system waiting to be sent out on the network. Data that arrives from the network is buffered inside the system until you read it. Lots of clever stuff is going on underneath, but when you're using a socket that's all you really need to know (at least initially).
It's often useful to remember that the system is doing this buffering in the explanation that follows, because you'll realise that the TCP/IP stack in the OS sends and receives data independently of your application - this is done so your application can have a simple interface (that's what the socket is, a way of hiding all the TCP/IP complexity from your code).
One way of doing this reading and writing is blocking. Using that system, when you call recv(), for example, if there is data waiting in the system then it will be returned immediately. However, if there is no data waiting then the call blocks - that is, your program halts until there is data to read. Sometimes you can do this with a timeout, but with pure blocking IO you really can wait forever until the other end either sends some data or closes the connection.
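For reference, a blocking read with a timeout can look something like the sketch below. The helper name recv_with_timeout is made up for this example; settimeout() and the socket.timeout exception are part of the standard socket module.

```python
import socket


def recv_with_timeout(sock, nbytes, seconds):
    """Return received bytes, or None if nothing arrived within `seconds`.

    An empty bytes result (b"") still means the peer closed the connection.
    """
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None
    finally:
        sock.settimeout(None)  # back to plain blocking mode
```

This avoids waiting forever on one socket, but it still only watches a single connection at a time.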
This doesn't work too badly for some simple cases, but only where you're talking to one other machine - when you're talking on more than one socket, you can't just wait for data from one machine because the other one may be sending you stuff. There are also other issues which I won't cover in too much detail here - suffice to say it's not a good approach.
One solution is to use different threads for each connection, so the blocking is OK - other threads for other connections can be blocked without affecting each other. In this case you'd need two threads for each connection, one to read and one to write. However, threads can be tricky beasts - you need to carefully synchronise your data between them, which can make coding a little complicated. Also, they're somewhat inefficient for a simple task like this.
The select module allows you a single-threaded solution to this problem - instead of blocking on a single connection, it allows you a function which says "go to sleep until at least one of these sockets has some data I can read on it" (that's a simplification which I'll correct in a moment). So, once that call to select.select() returns, you can be certain that one of the connections you're waiting on has some data, and you can safely read it (even with blocking IO, if you're careful - since you're sure there's data there, you won't ever block waiting for it).
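A tiny self-contained demonstration of that behaviour, using a connected pair of local sockets in place of a real network connection (an assumption made purely to keep the example runnable on its own):

```python
import select
import socket

a, b = socket.socketpair()

# Nothing has been sent yet, so a zero-timeout select() reports no readable sockets.
readable_before, _, _ = select.select([b], [], [], 0)

a.sendall(b"ping")

# Now b has buffered data: select() returns it, and recv() will not block.
readable_after, _, _ = select.select([b], [], [], 1.0)
data = b.recv(1024) if b in readable_after else None

a.close()
b.close()
```

Here readable_before is an empty list, while readable_after contains b once the data has been written at the other end.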
When you first start your application, you have only a single socket which is your listening socket. So, you only pass that in the call to select.select(). The simplification I made earlier is that actually the call accepts three lists of sockets for reading, writing and errors. The sockets in the first list are watched for reading - so, if any of them have data to read, the select.select() function returns control to your program. The second list is for writing - you might think you can always write to a socket, but actually if the other end of the connection isn't reading data fast enough then your system's write buffer can fill up and you can temporarily be unable to write. It looks like the person who gave you your code ignored this complexity, which isn't too bad for a simple example because usually the buffers are big enough you're unlikely to hit problems in simple cases like this, but it's an issue you should address in the future once the rest of your code works. The final list is watched for errors - this isn't widely used, so I'll skip it for now. Passing the empty list is fine here.
At this point someone connects to your server - as far as select.select() is concerned this counts as making the listen socket "readable", so the function returns and the list of readable sockets (the first return value) will include the listen socket.
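The same can be seen for a listening socket. This sketch assumes the loopback interface is available and lets the OS pick a free port; the variable names are illustrative only.

```python
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
listener.listen(5)

# No client has connected yet, so the listener is not "readable".
idle, _, _ = select.select([listener], [], [], 0)

client = socket.create_connection(listener.getsockname())

# A connection is now waiting, which select() reports as readability.
pending, _, _ = select.select([listener], [], [], 1.0)

conn, addr = listener.accept()    # does not block: select() said it was ready

for sock in (conn, client, listener):
    sock.close()
```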
The next part runs over all the connections which have data to read, and you can see the special case for your listen socket s. The code calls accept() on it which will take the next waiting new connection from the listen socket and turn it into a brand new socket for that connection (the listen socket continues to listen and may have other new connections also waiting on it, but that's fine - I'll cover this in a second). The brand new socket is added to the connections list and that's the end of handling the listen socket - the continue will move on to the next connection returned from select.select(), if any.
For other connections that are readable, the code calls recv() on them to recover the next 1024 bytes (or whatever is available if less than 1024 bytes). Important note - if you hadn't used select.select() to make sure the connection was readable, this call to recv() could block and halt your program until data arrived on that specific connection - hopefully this illustrates why the select.select() is required.
Once some data has been read the code runs over all the other connections (if any) and uses the send() method to copy the data down them. The code correctly skips the same connection as the data just arrived on (that's the business about q != i) and also skips s, but as it happens this isn't required since as far as I can see it's never actually added to the connections list.
Once all readable connections have been processed, the code returns to the select.select() loop to wait for more data. Note that if a connection still has data, the call returns immediately - this is why accepting only a single connection from the listen socket is OK. If there are more connections, select.select() will return again immediately and the loop can handle the next available connection. You can use non-blocking IO to make this a bit more efficient, but it makes things more complicated so let's keep things simple for now.
This is a reasonable illustration, but unfortunately it suffers from some problems:
As I mentioned, the code assumes you can always call send() safely, but if you have one connection where the other end isn't receiving properly (maybe that machine is overloaded) then your code here could fill up the send buffer and then hang when it tries to call send().
The code doesn't cope with connections closing, which will often result in an empty string being returned from recv(). This should result in the connection being closed and removed from the connections list, but this code doesn't do it.
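The second problem is easy to check for in isolation. A minimal sketch, with a made-up helper name, of telling "peer has gone" apart from "here is some data":

```python
import socket


def read_or_none(sock, nbytes=1024):
    """Return data from sock, or None if the peer has closed or reset the connection."""
    try:
        data = sock.recv(nbytes)
    except OSError:        # e.g. connection reset by peer
        return None
    return data or None    # recv() returns b"" on an orderly close
```

A caller that gets None back should close the socket and stop passing it to select.select().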
I've updated the code slightly to try and solve these two issues:
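import select
import socket

# s is the listening socket created elsewhere (already bound and listening)
connections = []
buffered_output = {}   # socket -> bytes still waiting to be sent to it
while True:
    # Only ask about writability for sockets that have output queued
    rlist,wlist,xlist = select.select(connections + [s],list(buffered_output),[])
    for i in rlist:
        if i == s:
            # New client: accept it and start watching it
            conn,addr = s.accept()
            connections.append(conn)
            continue
        try:
            data = i.recv(1024)
        except socket.error:
            data = b""
        if data:
            # Queue the data for every other client instead of sending directly
            for q in connections:
                if q != i:
                    buffered_output[q] = buffered_output.get(q, b"") + data
        else:
            # Empty read (or error): the peer has gone, so drop the connection
            i.close()
            connections.remove(i)
            if i in buffered_output:
                del buffered_output[i]
    for i in wlist:
        # Skip sockets that were closed in the read loop above
        if i not in buffered_output:
            continue
        # send() may accept only part of the data; keep the rest for next time
        bytes_sent = i.send(buffered_output[i])
        buffered_output[i] = buffered_output[i][bytes_sent:]
        if not buffered_output[i]:
            del buffered_output[i]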
I should point out here that I've assumed that if the remote end closes the connection, we also want to close immediately here. Strictly speaking this ignores the potential for TCP half-close, where the remote end has sent a request and closes its end, but still expects data back. I believe very old versions of HTTP used to sometimes do this to indicate the end of the request, but in practice this is rarely used any more and probably isn't relevant to your example.
Also it's worth noting that a lot of people make their sockets non-blocking when using select - this means that a call to recv() or send() which would otherwise block will instead return an error (raise an exception in Python terms). This is done partly for safety, to make sure a careless bit of code doesn't end up blocking the application; but it also allows some slightly more efficient approaches, such as reading or writing data in multiple chunks until there's none left. Using blocking IO this is impossible because the select.select() call only guarantees there's some data to read or write - it doesn't guarantee how much. So you can only safely call a blocking send() or recv() once on each connection before you need to call select.select() again to see whether you can do so again. The same applies to the accept() on a listening socket.
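To show what "reading in multiple chunks until there's none left" can look like, here is a sketch for a socket that has already been switched to non-blocking mode with setblocking(False). The function name drain and its return shape are assumptions for this example.

```python
import socket


def drain(sock, chunk_size=4096):
    """Read everything currently buffered on a non-blocking socket.

    Returns (data, closed); closed is True if the peer shut the connection down.
    """
    chunks = []
    closed = False
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except BlockingIOError:   # nothing left to read right now
            break
        if not chunk:             # b"" means the peer closed the connection
            closed = True
            break
        chunks.append(chunk)
    return b"".join(chunks), closed
```

With a blocking socket the second recv() in this loop could hang, which is exactly why the loop is only safe in non-blocking mode.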
The efficiency savings generally only matter on systems which have a large number of busy connections, however, so in your case I'd keep things simple and not worry about blocking for now. In your case, if your application seems to hang up and become unresponsive then chances are you're doing a blocking call somewhere where you shouldn't.
Finally, if you want to make this code portable and/or faster, it might be worth looking at something like libev, which essentially has several alternatives to select.select() which work well on different platforms. The principles are broadly similar, however, so it's probably best to focus on select for now until you get your code running, and then investigate changing it later.
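This is not libev, but within Python itself the standard selectors module (Python 3.4+) plays a comparable role: it chooses an efficient mechanism for the current platform behind one interface. A minimal sketch, again using a local socket pair as a stand-in for real connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # uses epoll, kqueue, etc. where available
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, data="peer b")

a.sendall(b"hello")
events = sel.select(timeout=1.0)    # list of (key, mask) pairs for ready sockets
received = [(key.data, key.fileobj.recv(1024)) for key, mask in events]

sel.unregister(b)
sel.close()
a.close()
b.close()
```

The data argument to register() is a convenient place to attach per-connection state, such as a pending output buffer.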
Also, I note that a commenter has suggested Twisted which is a framework which offers a higher-level abstraction so that you don't need to worry about all of the details. Personally I've had some issues with it in the past, such as it being difficult to trap errors in a convenient way, but many people use it very successfully - it's just an issue of whether their approach suits the way you think about things. Worth investigating at the very least to see whether its style suits you better than it does me. I come from a background writing networking code in C/C++ so perhaps I'm just sticking to what I know (the Python select module is quite close to the C/C++ version on which it's based).
Hopefully I've explained things sufficiently there - if you still have questions, let me know in the comments and I can add more detail to my answer.

Issues with Python socket module

So I'm working on a Python IRC framework, and I'm using Python's socket module. Do I feel like using Twisted? No, not really.
Anyway, I have an infinite loop reading and processing data from socket.recv(xxxx), where xxxx is really irrelevant in this situation. I split the received data into messages using str.split("\r\n") and process them one by one.
My problem is that I have to set a specific 'read size' in socket.recv() to define how much data to read from the socket. When I receive a burst of data (for example, when I connect to the IRC server and receive the MOTD, etc.), there's always a message that spans two 'reads' of the socket (i.e. part of the line is read in one socket.recv() and the rest is read in the next iteration of the infinite loop).
I can't process half-received messages, and I'm not sure if there's even a way of detecting them. In an ideal situation I'd receive everything that's in the buffer, but it doesn't look like socket provides a method for doing that.
Any help?
You should really be using select or poll, e.g. via the asyncore or select modules, or Twisted (which you would prefer not to use).
Reading from a socket you never know how much you'll receive in each read. You could receive several messages in one go, or have one message split into many reads. You should always collect the data in a buffer until you can make use of it, then remove the data you've used from the buffer (but leave data you haven't used yet).
Since you know your input makes sense line by line, then your receive loop can look something like:
while true:
    Append new data to buffer
    Look for EOLs, process and remove all complete lines
Stream-mode sockets (e.g, TCP) never guarantee that you'll receive messages in any sort of neatly framed format. If you receive partial lines of input -- which will inevitably happen sometimes -- you need to hold onto the partial line until the rest of the line shows up.
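A minimal sketch of such a buffer for CRLF-terminated lines, as used by IRC. The class name LineBuffer and its feed() method are made up for this example.

```python
class LineBuffer:
    """Accumulate received bytes and hand back only complete, terminated lines."""

    def __init__(self, separator=b"\r\n"):
        self._separator = separator
        self._pending = b""   # bytes of a line whose terminator has not arrived yet

    def feed(self, data):
        """Add newly received data; return the list of lines it completed."""
        self._pending += data
        # Everything before the last separator is complete; the tail is kept for later.
        *lines, self._pending = self._pending.split(self._separator)
        return lines
```

In the receive loop this would be used roughly as `for line in buf.feed(sock.recv(4096)): handle(line)`, where handle stands for whatever processes one IRC message. The read size passed to recv() then stops mattering, because partial lines simply wait in the buffer.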
Using Twisted will save you a lot of time. Better yet, you may want to look into using an existing IRC framework -- there are a number of them already available.

Speeding up socket send behavior (in Python)

I have a script which sends 5-10 requests a second to the server. The most crucial requirement I have is a limit of requests per second. It must always be the specific figure, no more and no less. To do it I send requests after a given interval of time (minus the time required to send the previous request).
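The pacing described here can be sketched roughly as follows. This is an illustration rather than the actual script: the function name and its parameters are assumptions, and it schedules against absolute deadlines so that one slow call does not push back every later one.

```python
import time


def run_at_fixed_rate(action, rate_hz, count, clock=time.monotonic, sleep=time.sleep):
    """Call action() `count` times, aiming call k at start + k / rate_hz.

    Returns how late (in seconds) each call started relative to its deadline.
    """
    interval = 1.0 / rate_hz
    start = clock()
    lateness = []
    for k in range(count):
        deadline = start + k * interval
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)              # wait out the rest of this slot
        lateness.append(max(0.0, clock() - deadline))
        action()                      # e.g. a function that calls sock.sendall(request)
    return lateness
```

The clock and sleep parameters exist only so the timing logic can be tested without real waiting. In real use, sleep granularity and a blocked sendall() still introduce jitter that this loop can measure but not remove.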
Problem: some requests are sent fast enough, but others take too much time at the sock.sendall() step. I believe this is because the send buffer is full and execution is blocked until the buffer is cleared.
What can I do to flush that buffer quicker?
One of the options I tried is to disable Nagle:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
but it didn't seem to improve things.
Another option, which sounds too wrong to even try, is to set the send buffer to the length of the request before each sendall() call.
Is there anything I can do to get more predictable requests per second?
One more option I just thought about: have several processes which each do a small number of requests per second; hopefully that will make results more predictable.
OS in question is Centos.
Update: It seems that my error was setting the socket options after connect. It looks like the buffer size can only be set prior to the connect() call. Same with TCP_NODELAY. I haven't yet had time to test whether it makes any difference.
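A minimal sketch of applying both options to a socket before it is connected. The helper name and its parameter are made up for this example, and whether this ordering changes the timing is, as said above, untested.

```python
import socket


def make_tuned_socket(send_buffer_bytes=None):
    """Create an unconnected TCP socket with Nagle disabled and an optional send-buffer hint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if send_buffer_bytes is not None:
        # The kernel treats this as a hint and may round or adjust the value.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return sock
```

The caller would then call connect() on the returned socket as usual; getsockopt() with the same constants can be used to check what the OS actually applied.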
The most crucial requirement I have is a limit of requests per second.
It must always be the specific figure, no more and no less.
That requirement is completely unimplementable via TCP. You would also need real-time guarantees of service times at the peer.
(From How can I force a socket to send the data in its buffer?)
You can't force it. Period. TCP makes up its own mind as to when it can send data. Now, normally when you call write() on a TCP socket, TCP will indeed send a segment, but there's no guarantee and no way to force this. There are lots of reasons why TCP will not send a segment: a closed window and the Nagle algorithm are two things to come immediately to mind.
Read the full post; it is quite in-depth and clarified some things for me, e.g. when disabling the Nagle algorithm makes sense.
