Weird behavior of send() and recv()


Why is it that when I have two send() calls on the server and two recv() calls on the client, the first recv() sometimes also returns the content of the second send(), instead of returning only the content of the first send() and leaving the second send()'s content for the second recv()?
How can I make this work differently?

This is by design.
A TCP stream is a channel on which you can send bytes between two endpoints but the transmission is stream-based, not message based.
If you want to send messages then you need to encode them... for example by prepending a "size" field that will inform the receiver how many bytes to expect for the body.
If you send 100 bytes and then another 100 bytes, it's quite possible that the receiver will instead see 200 bytes at once, or even 50 + 150 in two different reads. If you want message boundaries, you have to put them in the data yourself.
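A minimal sketch of the size-prefix idea, assuming a 4-byte unsigned big-endian length header. The header layout and the helper names send_msg, recv_exact and recv_msg are illustrative choices, not a standard API; socket.socketpair() is used only to get two connected stream sockets for the demo, and the helpers work the same way on a TCP connection.

```python
import socket
import struct

# Assumption: a 4-byte unsigned length in network (big-endian) byte order.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Prefix the body with its length so the receiver knows where it ends.
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until we have n.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    # Read the fixed-size header first, then exactly that many body bytes.
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()  # two connected stream sockets, demo only
    send_msg(a, b"first")
    send_msg(a, b"second")
    print(recv_msg(b), recv_msg(b))  # b'first' b'second'
    a.close()
    b.close()
```

Both messages may well sit in the receive buffer together, but recv_msg() only ever consumes one header plus one body at a time.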
There is a lower layer (datagrams) that lets you send messages; however, datagrams are limited in size and delivery is not guaranteed (a message may get lost, be duplicated, or arrive in a different order from the one in which it was sent).
TCP is built on top of this datagram service and implements all the logic needed to transfer data reliably between the two endpoints.
As an alternative there are libraries designed to provide reliable message-passing between endpoints, like ZeroMQ.

Most probably you are using a SOCK_STREAM socket. That is a TCP socket, which means that the data you push in at one end comes out at the other end in the same order and without missing chunks, but there are no delimiters. So send() just sends data, and recv(bufsize) returns whatever data has arrived so far, up to bufsize bytes, regardless of how it was split across send() calls.
You can use SOCK_DGRAM instead, and then UDP will be used. In that case every send() sends one datagram and each recv() receives one whole datagram. But there is no guarantee that your datagrams will not be reordered or lost, so you will have to deal with those problems yourself. There is also a limit on the maximum datagram size.
Or you can stick with TCP, but then you have to add the delimiters yourself.
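A minimal sketch of the SOCK_DGRAM behavior on the loopback interface (the OS picks the port; the 65535-byte buffer is simply large enough for any UDP payload). Each recvfrom() call returns exactly one datagram even though the buffer could hold both; on a real network, though, datagrams can still be lost or reordered as described above.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)         # don't block forever if a datagram is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"first", addr)    # one sendto() -> one datagram
sender.sendto(b"second", addr)

# Each recvfrom() returns one whole datagram; UDP itself keeps the
# boundaries, so the two messages are never merged into one read.
first, _ = receiver.recvfrom(65535)
second, _ = receiver.recvfrom(65535)
print(first, second)

sender.close()
receiver.close()
```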

Related

Python TCP socket for a lot of data

We (as a project group) are currently stuck on how to handle live data sent to our server.
We receive data updates every second, and we would like to insert them into our database (security is currently not an issue, because it is a school project). We tried Python's SocketServer and asyncio to create a TCP server to which the data can be sent.
We got this working with different libraries. But if we keep the connection to the client open (in this case hardware that sends data every second), we can't split the separate JSON or XML messages: they all arrive concatenated.
We know this is because TCP only guarantees ordering, not message boundaries.
Any thoughts on how to handle this, so that every message sent is separated from the others?
Opening a new connection for every message doesn't seem like the right option, if I recall correctly.

What you will have to do is ensure that each message has a clear delimiter. For example, the first 6 characters of every message could be its length: whatever reads from the socket decodes the length, then reads that number of bytes and passes the data on to whatever needs it. Another way, if there is a character or byte that never appears in the content, is to send it immediately before each message: for example, control-A (binary value 1) could be the lead-in character and control-B (binary value 2) the lead-out. Again, the server looks for these bytes framing each message.
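A sketch of the lead-in/lead-out variant, assuming the bytes 0x01 and 0x02 never occur inside a message body. FrameParser is a hypothetical helper: you feed it whatever recv() returns, and it hands back the complete messages found so far while keeping any partial frame for the next call.

```python
LEAD_IN = b"\x01"   # control-A, sent immediately before each message
LEAD_OUT = b"\x02"  # control-B, sent immediately after each message


class FrameParser:
    """Collects raw bytes and extracts bodies framed as LEAD_IN ... LEAD_OUT.

    Assumes neither framing byte ever appears inside a message body.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        self._buf += data
        messages = []
        while True:
            start = self._buf.find(LEAD_IN)
            if start == -1:
                self._buf.clear()  # no frame has started: discard stray bytes
                break
            end = self._buf.find(LEAD_OUT, start + 1)
            if end == -1:
                del self._buf[:start]  # keep the partial frame for later
                break
            messages.append(bytes(self._buf[start + 1:end]))
            del self._buf[:end + 1]
        return messages
```

For example, feeding b"\x01hello\x02\x01wor" returns [b"hello"], and a following b"ld\x02" returns [b"world"].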

If you can't change the client side (the thing sending the data), then you are going to have to parse the input. You can't just add a delimiter to something that you don't control.
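If the hardware sends bare JSON objects back to back and can't be changed, one way to parse the input is json.JSONDecoder.raw_decode(), which decodes one JSON document starting at a given index and reports where it ended. A sketch, assuming every message is a complete JSON object; split_json_stream is a hypothetical helper name.

```python
import json
from typing import Any, List, Tuple


def split_json_stream(buffer: str) -> Tuple[List[Any], str]:
    """Return (complete JSON objects found in buffer, unparsed leftover)."""
    decoder = json.JSONDecoder()
    objects = []
    idx = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while idx < len(buffer) and buffer[idx].isspace():
            idx += 1
        if idx == len(buffer):
            break
        try:
            obj, end = decoder.raw_decode(buffer, idx)
        except json.JSONDecodeError:
            # Assume the rest is an incomplete object and wait for more data.
            # (Truly malformed input would stay stuck here; a real server
            # would need a size limit or an error path for that.)
            break
        objects.append(obj)
        idx = end
    return objects, buffer[idx:]
```

On each read you would append the decoded text to the leftover from the previous call and parse again.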
An alternative is to use a header that encodes the size of the message that follows. Let's say you use a 4-byte header: the client first sends the server a header containing the size of the message to come, then sends the message itself (up to 2^32 - 1 bytes, about 4 GB). The server knows that it must first read 4 bytes (the header). It decodes the size n from the header, then reads exactly n bytes from the socket, looping over recv() if necessary, since a single call may return fewer bytes than requested. That way you are guaranteed to have read exactly one message. Using special delimiters is dangerous, because you MUST know every possible value a client can send.
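Since the project already tried asyncio, here is a sketch of the same 4-byte header scheme with asyncio streams, where StreamReader.readexactly() performs the "read exactly n bytes" loop for you. The header layout (unsigned 32-bit, network byte order), the function names and the JSON payloads are assumptions made for the example.

```python
import asyncio
import struct

HEADER = struct.Struct("!I")  # assumption: 4-byte unsigned length, big-endian


def encode_message(payload: bytes) -> bytes:
    return HEADER.pack(len(payload)) + payload


async def read_message(reader: asyncio.StreamReader) -> bytes:
    # readexactly() waits until exactly n bytes have arrived, or raises
    # IncompleteReadError if the peer closes the connection first.
    header = await reader.readexactly(HEADER.size)
    (length,) = HEADER.unpack(header)
    return await reader.readexactly(length)


async def demo() -> list:
    received = []
    done = asyncio.Event()

    async def handle(reader, writer):
        try:
            while True:
                received.append(await read_message(reader))
        except asyncio.IncompleteReadError:
            pass  # the client closed the connection
        finally:
            writer.close()
            done.set()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    # Two messages written back to back; the headers keep them apart.
    writer.write(encode_message(b'{"temp": 21}') + encode_message(b'{"temp": 22}'))
    await writer.drain()
    writer.close()
    await writer.wait_closed()

    await done.wait()
    server.close()
    await server.wait_closed()
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```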
It really depends on the type of data you are receiving, the type of connection, the latency, and so on. If there is a pause of one second between packets and your connection is consistent, you could probably get away with first reading the entire buffer once to clear it, then reading whatever is available as soon as data arrives and clearing the buffer again. It's not a great approach, but it might work for what you need, and no parsing is involved.

Socket in python and socket.recv()

I am sending some requests from the server to my client, but I have a problem.
When I send many messages to the client, I receive all of them at once with a single socket.recv() call.
Is there a way to get the messages one by one?
Thanks

You need to use some kind of protocol over otherwise bare sockets.
See Python's Twisted, or use something like nanomsg or ZeroMQ if you want a simple drop-in replacement that is message-oriented.
They are not interoperable with plain sockets, though: they only work if they are used on both ends.

No. TCP is a byte stream. There are no messages larger than one byte.

I assume that you are using TCP. TCP is a streaming protocol, not a datagram protocol. This means that the data are not a series of messages, but instead a single data stream without any message boundaries. If you need something like this either switch the protocol (UDP is datagram, but has other problems) or make your own protocol on top of TCP which knows about messages.
Typical message based protocols on top of TCP either use some message delimiter (often newline) or prefix each message with its size.

Python TCP programming

I have a TCP server and a client written in Python. The server sends some numbers, one after the other, to the client, and the client should process each number in a separate thread.
The server loops over the list of numbers and sends each one to the client:
for num in nums:
    client_sock.send(str(num))
The client loop is:
while True:
    data = tcpClientSock.recv(BUFSIZE)
    thread.start_new_thread(startFunction, (data,))
The problem is that even though the server sends each number in a separate send() call, the client receives them all at once.
How can I avoid this? Should I use UDP instead of TCP in this situation?

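You'll have to terminate each message on the sending end, e.g. by appending a CR/LF (since you're sending strings), so that the receiver can split the stream on it. Note that this does not flush the socket or force a separate packet; it only marks where each message ends.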

TCP is a stream based protocol and not message based. This means there are no message boundaries for each time the server calls send(). In fact, each time send() is called, the bytes of data are just added to the stream.
On the receiving end, you'll receive bytes of the stream as they arrive. Since there are no message boundaries, you may receive part of a message or whole messages or whole + part of the next message.
In order to send message over a TCP stream, your protocol needs to establish message boundaries. This allows the receiver to interpret whether it has received a partial, full, or multiple messages.
In your example, the server is sending strings, so a string terminator (for example a newline appended to each number) can serve as the message boundary. On the receiving side, you should parse the strings out of the stream and handle receiving partial strings.
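A sketch of that approach for the numbers example, assuming the server is changed to send a newline after each number (e.g. client_sock.sendall(f"{num}\n".encode())). NumberReceiver and client_loop are hypothetical names, and threading.Thread stands in for the question's thread.start_new_thread. The receiver keeps any unterminated tail, such as b"3" from b"1\n2\n3", until the rest arrives.

```python
import socket
import threading


class NumberReceiver:
    """Splits a byte stream into newline-terminated integers."""

    def __init__(self) -> None:
        self._pending = b""  # bytes of a number that isn't terminated yet

    def feed(self, data: bytes) -> list:
        self._pending += data
        # Everything before the last b"\n" is complete; keep the tail.
        *complete, self._pending = self._pending.split(b"\n")
        return [int(line) for line in complete]


def client_loop(sock: socket.socket, handle) -> list:
    # Start one thread per complete number; return the threads so the
    # caller can join() them if needed.
    receiver = NumberReceiver()
    threads = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server closed the connection
            break
        for num in receiver.feed(data):
            t = threading.Thread(target=handle, args=(num,))
            t.start()
            threads.append(t)
    return threads
```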

Send messages to ZeroMQ server using conventional TCP, possible?

I'm not sure if I'm doing this right, but I would like to be able to send messages to my server running ZMQ from normal TCP connections. The server is running Python ZMQ on port 5555 using a TCP transport. I would like to be able to send messages to it using different clients (Python, Java, PHP) that use conventional TCP. This is what I have so far:
SERVER
import zmq
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")
while True:
    message = socket.recv()
    print message
    socket.send('{"name":"someone"}')
CLIENT
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 5555))
s.send('Hello, World!')
data = s.recv(1024)
print data
Printing data on the client does not give me the message I am expecting. I get this: �. I tried doing bytes(data).decode('utf8') thinking what I'm getting was an array of bytes, but I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
I am just wondering: Is this possible at all? Or am I doing something wrong? Also, is it recommended?
My reason for not using ZMQ on the clients is that I want to reduce the number of dependencies (ZeroMQ being one).
Thank you for your help.

ZeroMQ is a protocol that sits on top of TCP (well, technically, ZMTP is the protocol; ZeroMQ, also written 0MQ, is the library that implements ZMTP, and pyzmq is its Python binding). All the signaling it does, and all the framing (dividing the stream of bytes into separate messages), is done by sending bytes over the socket. A client that doesn't know how to do ZeroMQ signaling and framing is just going to see a bunch of garbage.
What you're trying to do is exactly like trying to write a web client that just reads off a socket without knowing anything about HTTP. You're going to get a bunch of framing stuff that you don't know what to do with. The only difference is that in the case of HTTP, the framing is sometimes (but not always—you can have MIME envelopes, gzip transport-encoding, chunked transport, …) just a bunch of human-readable ASCII that comes before the data, while with ZMTP it's never human-readable…
If you want to send data over a plain TCP socket, you have to do that by creating a plain TCP socket on the server side and calling its send (or sendall, etc.) method. If you want to send data over a ZMTP channel, you have to do that by parsing ZMTP, or using a library that does it for you (like ZeroMQ), on the client side.
One more thing to keep in mind is that, unlike ZMTP, TCP is not a message-oriented protocol; all TCP sends is a stream of bytes. From the receiving side, there's no way to know when one send ends and the next one begins. So, for almost anything but a "send a request, get a response, hang up" protocol, you need to write some kind of framing of your own. This can be as simple as "messages are strings that have no newlines in them, and each message is separated by a newline" (in which case you can just use socket.makefile), but often the message format has to be more complicated, or you have to send "commands" rather than just data, etc.
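A minimal sketch of the "one message per line" framing using socket.makefile(), assuming messages never contain a newline themselves. socket.socketpair() stands in for a connected TCP client/server pair.

```python
import socket

a, b = socket.socketpair()  # stand-in for a connected TCP pair, demo only

# Sender: one message per line.
a.sendall(b"hello\nworld\n")
a.close()  # closing lets the reader see end-of-stream

# Receiver: makefile() wraps the socket in a buffered file object, and
# iterating over it yields one complete line at a time, however the
# bytes were split across recv() calls internally.
with b.makefile("r", encoding="utf-8", newline="\n") as f:
    messages = [line.rstrip("\n") for line in f]
b.close()

print(messages)  # ['hello', 'world']
```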
Since tdelaney's answer has been deleted (which I think means it's invisible to anyone under 10K rep), and had a useful suggestion, I'll repeat it here, with credit as due: You can (using the ZeroMQ library) write a piece of middleware that talks ZMTP to the server but your own simple TCP-based protocol to the clients. ZeroMQ has been specifically designed to make this reasonably easy; as tdelaney put it, it's "kinda like Lego, you build a robust communication infrastructure by building different communicating parts. Not all of the parts need to be zeromq."

Need some clarification on how socket.recv behaves

I'm trying to write an IRC bot but I'm not exactly sure how the receiving of data works. What I currently have:
while True:
    data = socket.recv(1024)
    # process data
Let's say that for whatever reason processing the data takes longer: what happens if something is sent during that time? Will it be skipped, or will it be added to some sort of queue and processed after the current data is done?

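The behavior depends on the protocol. With TCP, nothing is skipped: incoming data waits in the operating system's receive buffer until you call recv() again, and flow control stops the sender once that buffer is full.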
TCP:
The TCP RFC clearly states:
TCP provides a means for the receiver to govern the amount of data sent by the sender. This is achieved by returning a "window" with every ACK indicating a range of acceptable sequence numbers beyond the last segment successfully received. The window indicates an allowed number of octets that the sender may transmit before receiving further permission.
Wikipedia describes it similarly:
TCP uses an end-to-end flow control protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must regulate the data flow so as not to be overwhelmed. TCP uses a sliding window flow control protocol. In each TCP segment, the receiver specifies in the receive window field the amount of additionally received data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host.
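A small loopback experiment illustrating this for the IRC-bot question: the receiver doesn't call recv() while the sender writes two lines, yet nothing is skipped; the bytes wait in the receive buffer and come out of later recv() calls. The sleep() stands in for slow processing and the lines are placeholders. Note that both lines may come back from a single recv() call, so a bot still has to split on "\r\n" itself.

```python
import socket
import time

# A loopback TCP connection, just for the experiment.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
sender = socket.create_connection(listener.getsockname())
receiver, _ = listener.accept()

sender.sendall(b"first line\r\n")
time.sleep(0.2)                     # the receiver is "busy processing"...
sender.sendall(b"second line\r\n")  # ...while more data is sent
sender.close()                      # lets the receiver see end-of-stream

received = b""
while True:
    chunk = receiver.recv(1024)
    if not chunk:  # b"" means the peer closed the connection
        break
    received += chunk
print(received)  # b'first line\r\nsecond line\r\n'

receiver.close()
listener.close()
```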
UDP:
UDP has no flow-control mechanism like TCP's. However, there are protocols built on top of UDP, such as RUDP, that add some of TCP's features, like flow control.
Here is another interesting link on the differences between TCP and UDP.
