send one big packet or many small over TCP/UDP? - python

I am sending coordinates of points to a client-side visualization script via TCP over the internet. I wonder which option I should use:
concat the coordinates into a large string and send them together, or
send them one by one
I don't know which one is faster. I have some other questions too:
Which one should I use?
Is there a maximum size of packet of TCP? (python: maximum size of string for client.send(string))
As it is a visualization project should I use UDP instead of TCP?
Could you please tell me a bit about lost packets? When do they occur? How do I deal with them?
Sorry for the many questions, but I really struggle with this issue...

When you send a string, it might be sent in multiple TCP packets. If you send multiple strings, they might all be sent in one TCP packet. You are not exposed to the packets; a TCP socket is just a continuous stream of data. Do not expect that every call to recv() is paired with a single call to send(), because that isn't true. You might send "abcd" and "efg", and might read in "a", "bcde", and "fg" from recv().
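A minimal sketch of the point above, assuming a local socket pair stands in for the TCP connection (the function name stream_demo is made up for illustration). Two send calls are read back with a small recv size, so the chunk boundaries on the receiving side have nothing to do with the boundaries on the sending side:

```python
import socket

def stream_demo():
    # A connected pair of stream sockets stands in for a TCP connection.
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"abcd")
        sender.sendall(b"efg")
        sender.close()  # closing signals end-of-stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(3)  # up to 3 bytes, possibly fewer
            if not chunk:  # empty bytes object means the peer closed
                break
            chunks.append(chunk)
        return chunks
    finally:
        sender.close()
        receiver.close()

chunks = stream_demo()
print(chunks, b"".join(chunks))
```

The exact split into chunks is not guaranteed; only the order and content of the joined bytes are.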
It is probably best to send data as soon as you get it, so that the networking stack has as much information about what you're sending, as soon as possible. It will decide exactly what to do. You can send as big a string as you like, and if necessary, it will be broken up to send over the wire. All automatically.
Since in TCP you don't deal with packets, things like lost packets also aren't your concern. That's all handled automatically -- either the data gets through eventually, or the connection closes.
As for UDP - you probably don't want UDP. :)

UDP has a maximum theoretical packet size of 65535 bytes; in reality the usable size is much lower and depends on operating systems and routing hardware. UDP does not have any mechanism to ensure data delivery; this makes it much faster, but unreliable without some hand-crafted data protection mechanism.
With TCP, you don't have to worry about packet size: if you try to send a bigger chunk than possible, it will be automatically and transparently split by the operating system for you. Sending small chunks of data can be inefficient due to the overhead of the TCP communication, but even here, by default you have Nagle's algorithm enabled on most OSes (see Nagle's algorithm). The algorithm basically tries to join small chunks of data; for most scenarios this is fast and efficient.
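For completeness, Nagle's algorithm can be switched off per socket with the TCP_NODELAY option when latency matters more than throughput. This is only a sketch of the option, not a recommendation from the answers here; the function name make_low_latency_socket is made up for illustration:

```python
import socket

def make_low_latency_socket():
    # Create an (unconnected) TCP socket and disable Nagle's algorithm on it.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# A non-zero value means small writes are sent without waiting to be coalesced.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```

With the default setting (Nagle enabled), many small send calls are usually cheap enough, which is why sending coordinates as they arrive is a reasonable starting point.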

Related

Python sockets really unreliable

I have been trying to do some coding with sockets recently and found out that I often get broken pipe errors, especially when working with bad connections.
In an attempt to solve this problem I made sure to sleep after every socket operation. It works but it is slow and ugly.
Is there any other way of making a socket connection more stable?
...server and client getting out of sync
Basically you say that your application is buggy. And the way to make the connection more stable is therefore to fix these bugs, not to work around them with some explicit sleep.
While you don't show any code, a common cause of "getting out of sync" is the assumption that a send on one side is matched exactly by a recv on the other side. Another common assumption is that send will actually send all data given and recv(n) will receive exactly n bytes of data.
All of these assumptions are wrong. TCP is not a message-based protocol but a byte stream. Any message semantics need to be explicitly added on top of this byte stream, for example by prefixing messages with a length, by having a unique message separator, or by having a fixed message size. And the results of send and recv need to be checked to be sure that all data have been sent or all expected data have been received. If not, more send or recv calls need to be made until all data are processed.
Adding some sleep often seems to "fix" some of these problems by basically adding "time" as a message separator. But it is not a real fix, i.e. it affects performance but it is also not 100% reliable either.
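A minimal sketch of the length-prefix approach mentioned above, assuming a 4-byte big-endian length header (the header format and the names send_msg, recv_exactly and recv_msg are illustrative choices, not part of any standard API):

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    # sendall() keeps calling send() until every byte has been handed over.
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, n):
    # recv(n) may return fewer than n bytes, so loop until we have them all.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)

def recv_msg(sock):
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

# Usage with a local socket pair standing in for client and server.
a, b = socket.socketpair()
send_msg(a, b"abcd")
send_msg(a, b"efg")
print(recv_msg(b), recv_msg(b))
a.close()
b.close()
```

With framing like this, each recv_msg call returns exactly one message regardless of how the bytes were split in transit, so no sleep is needed as a separator.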
I've been using Python's Sockets for a long time and I can tell that as long as your code (which you unfortunately didn't provide) is clean and synchronized in itself you shouldn't get any problems. I use Sockets for small applications where I don't necessarily want/need to write/use an API, and it works like a dream.
As @Steffen already mentioned in his answer, TCP is not a message-based protocol. It is a "stream-oriented protocol", which means that it sends data byte-by-byte and not message-by-message.
Take a look at this thread and this paper to get a better understanding about the differences.
I would also suggest taking a look at this great answer to know how to sync your messages between your server and your client(s).

ZeroMQ: socket per data type or just one socket?

I've got a program which receives information from about 10 other (sensor reading) programs (all controlled by myself). I now want to make them communicate using ZeroMQ.
For most of the queues the important thing is that the central receiving program always has the latest sensor data, all older messages are not important anymore. If a couple messages get lost I don't care. So for all of them I started out with a separate PUB/SUB socket; one for each program. But I'm not sure if that is the right way to do it. As far as I understand I have two options:
Make a separate socket for every program and read them out in a loop. That way I know by the socket what the information is I'm receiving (I'm often just sending an int).
Make one socket to which all the programs connect, and with every message I send a string which tells the receiving end what the message is about.
All connections are on a PUB/SUB basis, so creating one socket would well work out. I'm just not sure if that is the most efficient way to do it.
All tips are welcome!
- PUB/SUB is fine and allows an easy conversion from N-sensors:1-logger into N-sensors:2+-loggers
- one might also benefit from a conceptual separation of a socket from an access-port, where more than one socket may get connected
How to get always JUST THE ACTUAL ( LAST ) SENSOR READOUT:
If not bound, due to system-integration constraints, to some early ZeroMQ API, there is a lovely feature exactly for this via a .setsockopt( ZMQ_CONFLATE, True ) method:
ZMQ_CONFLATE: Keep only last message
If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options. Does not support multi-part messages, in particular, only one part of it is kept in the socket internal queue.
On design dilemma:
Unless your real-time control stability introduces some hard-real-time limit, the PUB side freely decides how often a new value is passed to .send() for the SUB(-s). No magic is needed here, even less so with the ZMQ_CONFLATE option set on the internally managed outgoing queue.
The SUB-side receiver(s) will also benefit from the ZMQ_CONFLATE option set on the internally managed incoming queue, provided that a set of individual .bind()-s instantiates separate landing ports for delivery of the different individual sensor readouts; then your "last" values will remain consistently the "last" readouts. If all readouts went into a common landing pad, your receiving process would lose all readouts except the one that happened to be the "last" right before .recv() took place, which would not help much, would it?
If some I/O-performance related tweaking becomes necessary, the .Context( n_IO_threads ) + ZMQ_AFFINITY-mapping options may increase and prioritise the resources the ioDataPump may harness for increased IO-performance
Unless you're up against a tight real-time requirement there's not much point in having more sockets than necessary. ZMQ's fair queuing ought to take care of giving each sensor program equal attention (see Figure 6 in the guide).
If your sensor programs are on other devices connected by Ethernet, the ultimate performance of your programs is limited by the bandwidth of the Ethernet NIC in your computer. A single thread program handling a single PULL socket stands a good chance of being able to process the data coming in faster than it can transit the NIC.
If that's so, then you may as well stick to a single socket and enjoy the simpler code. It's not very hard dealing with multiple sockets, but it's far easier to deal with one. For example, with one single socket you don't have to tell each sensor program what network port to connect to - it can be a constant.
PUSH/PULL sounds like a more natural pattern for your situation than PUB/SUB, but that won't make much difference.
Lastness
Lastness is going to be your (potential) problem. The whole point of things like ZMQ is that they will deliver messages in the order they're sent. Thus you read a message, it is by definition the "last" message so far as the recipient is concerned. The recipient has no idea as to whether or not there is another message on the way, in transit.
This is a feature of Actor model architectures (which is what ZMQ is). Messages get buffered up in the transport, and there's no information about the newness of the message to be learned when it's read. All you know is that it was sent some time beforehand. There is no execution rendezvous with the sender.
Now, you either process it as if it is the last message, or you wait for a period of time to see if another one comes along before processing it. The easiest thing to do is to simply process each message as if it is the last.
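One way to treat each message as the last while still ending up with the newest value per sensor is to drain whatever is pending and let later messages overwrite earlier ones. The sketch below is library-agnostic and does not use the ZMQ API: a standard queue.Queue stands in for the receiving socket, and the (tag, value) message shape and the name drain_latest are assumptions made for illustration:

```python
import queue

def drain_latest(inbox):
    """Read everything currently pending and keep the newest value per tag."""
    latest = {}
    while True:
        try:
            tag, value = inbox.get_nowait()
        except queue.Empty:
            return latest
        latest[tag] = value  # a later message for the same tag overwrites

# Usage: three pending messages, two of them from the same sensor.
inbox = queue.Queue()
for message in [("temp", 1), ("rpm", 900), ("temp", 2)]:
    inbox.put(message)
print(drain_latest(inbox))
```

Because each message carries a tag, this also works when all sensors share a single socket, as in the second option from the question.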
Contrast this with a Communicating Sequential Processes architecture. It's basically the same as an Actor model architecture, except that the transport does not buffer messages. Message sends block until the recipient has called message read.
Thus when you read a message, the recipient knows that it is the last one sent by the sender. And the sender knows that the message it has sent has been received at that very instant by the recipient. So the knowledge of lastness is absolute - the message received really is the last one sent.
However, unless you have something fairly heavyweight going on I wouldn't worry about it. You are quite likely to be able to keep up with your sensor data stream even if the messages you're reading aren't the latest in the queue.
You can nearly make ZMQ into CSP by setting the high water limit on the sending end's socket to 1. That means that you can buffer up at most 1 message. That's not the same as 0, and unfortunately setting the HWM to 0 means "unlimited size buffer".

Yet another confusion about sending/receiving large amount of data over (unix-) socket

I have a C++ program which reads frames from a high speed camera and write each frame to a socket (unix socket). Each write is of 4096 bytes. Each frame is roughly 5MB. ( There is no guarantee that frame size would be constant but it is always a multiple of 4096 bytes. )
There is a Python script which reads from the socket: 10 * 4096 bytes at each call of recv. Often I get unexpected behavior, which I think boils down to understanding the following about sockets. I believe both of my programs are writing and receiving in blocking mode.
Can I write whole frame in one go (write call with 5MB of data)? Is it recommended? Speed is major concern here.
If the Python client fails to read, or reads more slowly than the writer writes, does it mean that after some time write operations on the socket would no longer add to the buffer? Or would they overwrite the buffer? If no one is reading the socket, I'd not mind overwriting the buffer.
Ideally, I'd like my application to write to the socket as fast as possible. If no one is reading the data, then overwriting is fine. If someone is reading the data from the socket but not reading fast enough, I'd like to store all data in the buffer. Then how can I force my socket to increase the buffer size when reading is slow?
Can I write whole frame in one go (write call with 5MB of data)? Is it
recommended? Speed is major concern here.
Well, you can certainly try, but don't be too surprised if the call to socket.send() only sends a portion of the bytes you've asked it to send. In particular, you should always check the return value of socket.send() to see how many bytes it actually accepted from you, because that value may be greater than zero but less than the number of bytes you passed in to the call. (If it is less, then you'll probably want to call socket.send() again to send the remaining bytes from your buffer that weren't handled by the first call... and repeat as necessary; or alternatively you can call socket.sendall() instead of socket.send(), and that will do the necessary looping and re-calling of the socket.send() command for you, so you don't have to worry about it... the tradeoff is that socket.sendall() might not return for a long time, depending on the speed of your network connection and how much data you told socket.sendall() to send)
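The retry loop described above can be sketched as follows. The helper name send_fully is made up for illustration, and a local socket pair with a reader thread stands in for the camera writer and the Python reader:

```python
import socket
import threading

def send_fully(sock, data):
    """Call send() until all of data is accepted; return the number of calls."""
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    calls = 0
    while len(view) > 0:
        sent = sock.send(view)  # may accept only part of the view
        view = view[sent:]
        calls += 1
    return calls

def demo(size=1 << 20):
    writer, reader = socket.socketpair()
    payload = b"x" * size
    received = bytearray()

    def drain():
        # Keep reading until the whole payload has arrived.
        while len(received) < size:
            chunk = reader.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=drain)
    thread.start()
    calls = send_fully(writer, payload)
    thread.join()
    writer.close()
    reader.close()
    return calls, bytes(received) == payload

calls, intact = demo()
print(calls, intact)
```

On a blocking socket, socket.sendall() does the same looping internally; the explicit loop is mainly useful when you want to interleave other work or report progress between calls.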
Note that when sending datagrams it is common for maximum packet sizes to be enforced; packets larger than that will either be fragmented into smaller packets for transmission (and hopefully re-assembled on the receiving side) or they might simply be discarded. For example, when sending UDP packets over Ethernet, it is common to have an MTU of 1500 bytes. When sending over a Unix socket the MTU will likely be larger than that, but likely there will still be a limit.
If python client fails to read or read slowly than write, does it mean
that after some time write operation on socket would not add to
buffer? Or, would they overwrite the buffer? If no-one is reading the
socket, I'd not mind overwriting the buffer.
If you are sending on a stream-style socket (SOCK_STREAM), then a slow client will cause your server's send() calls to block if/when the buffer fills up. If you are sending on a datagram-style socket (SOCK_DGRAM) and the buffer fills up, the "overflow" datagrams will simply be discarded.
Then how can I force my socket to increase the buffer size when
reading is slow?
You can set the socket's send-buffer size via socket.setsockopt(SOL_SOCKET, SO_SNDBUF, xxx). Note that this is typically done in advance (e.g. right after the socket is created) rather than trying to do it "on the fly" in response to a slow reader.
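A small sketch of that call on a Unix domain stream socket, set right after creation. The function name and the requested size are arbitrary illustrations. Reading the option back is worthwhile because the kernel may adjust the requested value (Linux, for example, doubles it and applies a system-wide cap):

```python
import socket

def make_socket_with_send_buffer(requested_bytes):
    # Set the send-buffer size immediately after creating the socket.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    # The effective size is whatever the kernel actually granted.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return sock, effective

sock, effective = make_socket_with_send_buffer(1 << 20)
print("effective send buffer:", effective)
sock.close()
```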
It sounds like a design flaw that you need to send this much data over the socket to begin with and that there is this risk of the reader not keeping up with the writer. As an alternative, you may want to consider using a delta-encoding, where you alternate between "key frame"s (whole frames) and multiple frames encoded as deltas from the prior frame. You may also want to consider writing the data to a local buffer and then, on your UNIX domain socket, implementing a custom protocol that allows reading a sequence of frames starting at a given timestamp or a single frame given a timestamp. If all reads go through such a buffer rather than directly from the source, I imagine you could also add additional encoding / compression options in that protocol. Also, if the server application that exports the data to a UNIX socket is a separate application from the one that is reading in the data and writing it to a buffer, you won't need to worry about your data ingestion being blocked by slow readers.

Build a buffer in file for stream of TCP packets

I am trying to build a proxy to buffer some packets according to some schedules.
I have two TCP connections, one from host A to the proxy and the other one from the proxy to host B.
The proxy forwards the packets between A and B. The proxy will buffer the packets according to the scheduled instructions. At certain time, it will buffer the packets. After the buffering period is over, it will forward the packets in the buffer and also do its normal forwarding work.
I am using python. Which module would be the best in this situation? I tried pickle but it is difficult to remove and append elements in the file. Any suggestions? Thanks!
I recommend you join the two scripts into one and just use memory.
If you can't join the scripts for some reason, create a unix-domain socket to pass the raw, binary data directly from one to the other. These fifos have limited size so you'll still have to do in-memory buffering on one side or another, probably the B side.
If the data is too big for memory, you can write it out to a temporary file and re-read it when it's time to pass it on. It'll be easiest if the same script both writes and reads the file, as then you won't have to guess when to write, advance to a new file, or deal with separate readers and writers.
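A minimal sketch of the temporary-file approach, assuming a single script both writes and reads the file. The class name FileBackedBuffer and its methods are hypothetical. Since TCP is a byte stream, the buffer stores raw bytes and does not try to preserve packet boundaries:

```python
import tempfile

class FileBackedBuffer:
    """Append raw bytes while buffering; replay them in order afterwards."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # removed automatically on close
        self._read_pos = 0

    def append(self, data):
        self._file.seek(0, 2)  # always write at the end of the file
        self._file.write(data)

    def drain(self, chunk_size=65536):
        # Yield buffered bytes in order; safe even if append() runs in between.
        while True:
            self._file.seek(self._read_pos)
            chunk = self._file.read(chunk_size)
            if not chunk:
                break
            self._read_pos += len(chunk)
            yield chunk
        # Everything has been replayed, so reclaim the space.
        self._file.seek(0)
        self._file.truncate()
        self._read_pos = 0

    def close(self):
        self._file.close()

# Usage: buffer data from host A, then forward it to host B later.
buf = FileBackedBuffer()
buf.append(b"abc")
buf.append(b"defg")
print(b"".join(buf.drain(chunk_size=4)))
buf.close()
```

In the proxy, each yielded chunk would be passed to sendall() on the connection to host B once the buffering period is over.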

FIFO (named pipe) messaging obstacles

I plan to use Unix named pipes (mkfifo) for simple multi-process messaging.
A message would be just a single line of text.
Would you discourage me from that? What obstacles should I expect?
I have noticed these limitations:
A sender cannot continue until the message is received.
A receiver is blocked until there are some data. Nonblocking IO would be needed
when we need to stop the reading. For example, another thread could ask for that.
The receiver could obtain many messages in a single read. These have to be processed
before quitting.
The max length of an atomic message is limited to 4096 bytes. That is the PIPE_BUF limit on Linux (see man 7 pipe).
I will implement the messaging in Python. But the obstacles hold in general.
Lack of portability - they are mainly a Unix thing. Sockets are more portable.
Harder to scale out to multiple systems (another sockets+)
On the other hand, I believe pipes are faster than sockets for processes on the same machine (less communication overhead).
As to your limitations,
You can "select" on pipes, to do a non-blocking read.
I normally (in perl) print out my messages on pipes separated by "\n", and read a line from them to get one message at a time.
Do be careful with the atomic length.
I find perlipc to be a good discussion between the various options, though it has perl specific code.
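The same ideas carry over to Python. The sketch below (Unix only) uses select with a timeout so the reader never blocks indefinitely, and splits the incoming bytes on "\n" so that several messages arriving in one read, or one message arriving in pieces, are both handled. The class name LineReader and the FIFO path are made up for illustration:

```python
import os
import select
import tempfile

class LineReader:
    """Collect newline-terminated messages from a file descriptor."""

    def __init__(self, fd):
        self.fd = fd
        self.buf = b""

    def poll(self, timeout):
        # Wait up to `timeout` seconds for data instead of blocking forever.
        ready, _, _ = select.select([self.fd], [], [], timeout)
        if not ready:
            return []
        self.buf += os.read(self.fd, 4096)
        # Everything before the final "\n" is complete; the rest stays buffered.
        *lines, self.buf = self.buf.split(b"\n")
        return [line.decode() for line in lines]

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")
        os.mkfifo(path)
        # A non-blocking open of the read end returns without waiting for a writer.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY)
        try:
            reader = LineReader(rfd)
            os.write(wfd, b"first\nsecond\npar")  # two messages and a fragment
            got = reader.poll(1.0)
            os.write(wfd, b"tial\n")  # the rest of the third message
            got += reader.poll(1.0)
            return got
        finally:
            os.close(wfd)
            os.close(rfd)

print(demo())
```

Each write here is far below PIPE_BUF, so it reaches the pipe as one unit.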
The blocking, both on the sender side and the receiver side, can be worked around via non-blocking I/O.
Further limitations of FIFOs:
Only one client at a time.
After the client closes the FIFO, the server needs to re-open its endpoint.
Unidirectional.
I would use UNIX domain sockets instead, which have none of the above limitations.
As an added benefit, if you want to scale it to communicate between multiple machines, it's barely any change at all. For example, just take the Python documentation page on socket and replace socket.AF_INET with socket.AF_UNIX, (HOST, PORT) with filename, and it just works.
SOCK_STREAM will give you stream-like behavior; that is, two sends may be merged into one receive or vice versa. AF_UNIX also supports SOCK_DGRAM: datagrams are guaranteed to be sent and read all as one unit or not at all. (Analogously, AF_INET+SOCK_STREAM=TCP, AF_INET+SOCK_DGRAM=UDP.)
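A short sketch of the datagram behavior on AF_UNIX (Unix only), using a socket pair so no filename is needed. The function name datagram_demo is made up for illustration. Unlike the stream case, each recv returns exactly one message as it was sent:

```python
import socket

def datagram_demo():
    # A connected pair of Unix datagram sockets; message boundaries are kept.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        a.send(b"abcd")
        a.send(b"efg")
        # Each recv returns one whole datagram, never a merged or split one.
        return [b.recv(4096), b.recv(4096)]
    finally:
        a.close()
        b.close()

print(datagram_demo())
```

For single-line messages this removes the need for a separator or length prefix, at the cost of a per-datagram size limit.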
