Can a Python socket receive buffer fill if not read? - python

If you don't read from a Python TCP socket, will it fill up and cause an error?
In my code I use .send() and there seems to be an ACK reply from the device I'm talking to. If I don't read these out, will they build up and create a problem? Does it just keep storing them all indefinitely? Surely this would cause a memory issue eventually...
Thanks.

If you don't read from a TCP socket, then the recv buffer on the receiving end and the send buffer on the sending end will fill up, at which point your program will block on further send() calls.
How much memory each process will use depends on the size of those buffers, which depends on the operating system and socket options. For example, on Linux you would get into a situation like this:
$ ss -tpn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 2595384 127.0.0.1:3333 127.0.0.1:2222 users:(("python3",pid=13088,fd=3))
ESTAB 964588 0 127.0.0.1:2222 127.0.0.1:3333 users:(("python3",pid=13087,fd=4))
The first line shows the sending process (full send queue, ~2.6MB), the second line the receiving process (full recv queue, ~1MB).
This happens because during data transfer using TCP, with each ACK the receiver tells the sender how much data it is ready to accept for the next transmission. If the recv buffer is full, the send buffer will also fill up and then no more data can be sent.
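The behaviour described above can be reproduced locally with a small experiment. This is only a sketch, assuming a loopback TCP connection whose receiving side never calls recv(); the function name fill_send_path and the max_bytes safety cap are made up for illustration. The sender is switched to non-blocking mode so that a full buffer raises BlockingIOError instead of hanging the script.

```python
import socket

def fill_send_path(chunk_size=65536, max_bytes=256 * 1024 * 1024):
    """Send to a loopback peer that never calls recv() and return the
    number of bytes the kernel accepted before send() would have blocked."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()      # accepted, but never read from

    sender.setblocking(False)            # raise instead of blocking when full
    chunk = b"x" * chunk_size
    queued = 0
    try:
        while queued < max_bytes:        # safety cap (an assumption of this sketch)
            queued += sender.send(chunk)
    except BlockingIOError:
        pass                             # the kernel cannot accept more data right now
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return queued

if __name__ == "__main__":
    print("bytes queued before send() would block:", fill_send_path())
```

The number printed depends on the operating system and its buffer settings, so it will differ between machines.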

Related

python server for real time telemetry: how to skip packets to avoid "falling behind"

I've got a real-time telemetry problem I'd like guidance on. The basics are obvious: a stream of UDP multicast packets is arriving at a given address/port. msg = recv(), then decode and display that msg, and all's well. But what if the next packet arrives before the first msg has been fully processed, or worse, what if the next TWO packets have arrived before we've gotten back to the recv() loop? Since this is a real-time telemetry situation, I'd like to (a) recognize that there's more than one packet to be dealt with, and then (b) delete/ignore all packets that are in the multicast reception buffer except for the most recent.
I have spent a few hours trying to get recv() to work in a non-blocking mode and to return a status showing how many separate packets are in the received multicast queue, to no avail.
I'm using matplotlib/tkinter for the display of the data; that (plus hey, it's Python) is why operating on each packet is a lengthy operation.
Am I way off in the weeds here? Is there a better approach?
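One possible way to get "only the most recent packet" is to wait for the first datagram and then drain the queue without blocking, keeping the last one read. The sketch below assumes an already-bound UDP socket (multicast group membership would be set up separately); the helper name latest_datagram and its parameters are hypothetical. Note that it leaves the socket in non-blocking mode.

```python
import select
import socket

def latest_datagram(sock, bufsize=65535, timeout=1.0):
    """Wait up to `timeout` seconds for a datagram, then drain everything
    queued on the socket and return only the newest datagram (or None)."""
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        return None                      # nothing arrived within the timeout
    sock.setblocking(False)              # recv() now raises instead of waiting
    latest = None
    while True:
        try:
            latest = sock.recv(bufsize)  # each call returns one whole datagram
        except BlockingIOError:
            return latest                # queue is empty: keep the last one read
```

The display loop would then call latest_datagram() once per redraw and silently skip any older packets that piled up while the previous frame was being drawn.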

Python socket recv function

In the Python socket module, the documentation for the recv method, socket.recv(bufsize[, flags]), states:
Receive data from the socket. The return value is a bytes object representing the data received.
The maximum amount of data to be received at once is specified by bufsize.
I'm aware that bufsize represents the MAX amount of data received at once, and that if the amount of data received is LESS than bufsize, that means that the number of bytes sent by the socket on the other end is less than bufsize.
Is it possible that the data returned from the 1st call to socket.recv(bufsize) is < bufsize but there is still data left in the network buffer?
Eg.
data = socket.recv(10)
print(len(data)) # outputs 5
data = socket.recv(10) # calling `socket.recv(10)` returns more data without the
# socket on the other side doing `socket.send(data)`
Can the scenario in the example ever occur, and does this apply to Unix domain sockets as well as regular TCP/IP sockets?
The real problem in network communication is that the receiver cannot control when and how the network delivers the data.
If the size of the data returned by recv is less than the requested size, that means that at the moment of the recv call, no more data was available in the local network buffer. So if you can make sure that:
the sender has stopped sending data
the network could deliver all the data
then a new recv call will block.
The problem is that in real-world cases, you can never be sure of the two assumptions above. TCP is a stream protocol, which only guarantees that all sent bytes will reach the receiver in the correct order. But it offers no guarantee on the timing, and sent packets can be fragmented or reassembled by the network (starting from the TCP stack on the sender and ending at the TCP stack on the receiver).
Found a similar post that follows up on this: How can I reliably read exactly n bytes from a TCP socket?
Basically use socket.makefile() to return a file object and call read(num_bytes) to return exactly the amount requested, else block.
fh = socket.makefile(mode='b')
data = fh.read(LENGTH_BYTES)
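An alternative to makefile() is to loop over recv() until the requested number of bytes has arrived. This is a minimal sketch rather than the linked post's code; the helper name recv_exactly is made up here, and raising ConnectionError when the peer closes early is one possible design choice.

```python
import socket

def recv_exactly(sock, n):
    """Read exactly n bytes from a stream socket, or raise if the peer
    closes the connection before n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # may return fewer bytes than requested
        if not chunk:                    # b"" means the peer closed the connection
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)
```

Because TCP is a byte stream, how the sender split its send() calls has no effect on the result: the loop simply keeps reading until n bytes are collected.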

Receive an high rate of UDP packets with python

I'm working with python in order to receive a stream of UDP packets from an FPGA, trying to lose as few packets as possible.
The packet rate goes from around 5kHz up to some MHz and we want to take data in a specific time window (acq_time in the code).
We have this code now:
BUFSIZE=4096
dataSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
dataSock.settimeout(0.1)
dataSock.bind((self.HOST_IP, self.HOST_PORT))

def fast_acquisition(data_list_tmp):
    data, addr = dataSock.recvfrom(self.BUFSIZE)
    data_list_tmp.append(data)
    return len(data)

time0=time.time()
data_list = []
while time.time()-time0<acq_time:
    fast_acquisition(data_list)
And after the acquisition we save our data_list on disk.
This code is meant to be as simple and fast as possible, but it's still too slow and we lose too many packets even at 5kHz. We think this happens because while we read one packet, store it in the list and check the time, the next one (or ones) arrives and is lost.
Is there any way to keep the socket open? Can we open multiple sockets "in series" with parallel processing, so that when we are saving the file from the first the second can receive another packet?
We could even consider using another language only to receive and store the packets on disk.
You could use tcpdump (which is implemented in C) to capture the UDP traffic, since it's faster than python:
#!/bin/bash
iface=$1 # interface, 1st arg
port=$2 # port, 2nd arg
tcpdump -i $iface -G acq_time_in_seconds -n udp port $port -w traffic.pcap
And then you can use e.g. scapy to process that traffic
#!/usr/bin/env python
from scapy.all import *
scapy_cap = rdpcap('traffic.pcap')
for packet in scapy_cap:
    pass  # process packets...
There are several reasons why UDP packets can be lost, and certainly the speed of being able to take them off the socket queue and store them can be a factor, at least eventually. However, even if you had a dedicated C language program handling them, it's unlikely you'll be able to receive all the UDP packets if you expect to receive more than a million a second.
The first thing I'd do is to determine if python performance is actually your bottleneck. It is more likely in my experience that, first and foremost, you're simply running out of receive buffer space. The kernel will store UDP datagrams on your socket's receive queue until the space is exhausted. You might be able to extend that capacity a little with a C program, but you will still exhaust the space faster than you can drain the socket if packets are coming in at high enough speed.
Assuming you're running on Linux, take a look at this answer for how to configure the socket's receive buffer space -- and examine the system-wide maximum value, which is also configurable and might need to be increased.
https://stackoverflow.com/a/30992928/1076479
(If you're not on Linux, I can't really give any specific guidance, although the same factors are likely to apply.)
It is possible that you will be unable to receive packets fast enough, even with more buffer space and even in a C program. In that case, @game0ver's idea of using tcpdump might work better if you only need to withstand a short intense burst of packets, as it uses a much lower-level interface to obtain packets (and is highly optimized). But then of course you won't just have the UDP payload; you'll have entire raw packets and will need to strip the IP and Ethernet layer headers as well before you can process them.
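The receive buffer configuration mentioned above could look roughly like this in Python. It is a sketch only: the helper name make_udp_receiver is made up, and the size to request is something you would have to tune for your packet rate.

```python
import socket

def make_udp_receiver(host, port, rcvbuf_bytes):
    """Create a bound UDP socket and request a larger kernel receive buffer.
    Returns the socket and the buffer size the kernel reports afterwards."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a bigger receive queue; the OS may grant less than requested.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind((host, port))
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted

if __name__ == "__main__":
    sock, granted = make_udp_receiver("127.0.0.1", 0, 8 * 1024 * 1024)
    print("kernel reports a receive buffer of", granted, "bytes")
    sock.close()
```

On Linux the reported value can differ from the request: the kernel doubles the requested size for bookkeeping overhead and caps it at the system-wide limit (net.core.rmem_max), so it is worth printing what was actually granted.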

Why does sending consecutive UDP messages cause messages to arrive late?

I've written a server python script in Windows 7 to send Ethernet UDP packets to a UNIX system running a C client receiving program that sends the message back to the server. However, sometimes (not always) a message in the last port (and always the last port) that python sends to won't arrive until the next batch of 4 messages is sent. This causes the timing of the message received for the last port incorrect to when it was sent, and I cannot have two messages back to back on the same port.
I have been able to verify this in Wireshark by locating two messages that arrived around the same time because the one that wasn't received was processed with the other. I have also checked the timing right after the recv() function and it shows a long delay and then a short delay because it basically had two packets received.
Things I have tried, none of which has helped me explain or solve the problem: I can add a delay between each sendto() and then I successfully send and receive all messages with correct timing, but I want the test to work the way I've written it below; I've increased the priority of the receive thread, thinking that my Ethernet receive was not getting the signal to pick up the packet or that some process was taking too long, but this didn't work, and 20ms should be WAY more than necessary to process the data; I have removed ports C and D, after which port B misses messages (having only one port doesn't cause issues), although I thought reducing the number of ports would improve timing; sending to a dummy PORTE immediately after PORTD lets me receive all of the messages with correct timing (I assume the problem is transferred to PORTE); I have also reproduced the Python script in a UNIX environment in C code and have had the same issue, pointing me to a receiving issue; I've also set my recv function to time out every 1ms, hoping that it could recover somehow even though the timing would be off a bit, but I still saw messages back to back. I've also checked that no UDP packets have been dropped and that the buffer is large enough to hold those 4 messages. Any new ideas would help.
This is the core of the code: the Python script sends 4 packets, one 20-byte message to each corresponding waiting thread in C, and then delays for 20ms.
A representation of the Python code looks something like:
msg_cnt = 5000
cnt = 0
while cnt < msg_cnt:
    UDPsocket.sendto(data, (IP, PORTA))
    UDPsocket.sendto(data, (IP, PORTB))
    UDPsocket.sendto(data, (IP, PORTC))
    UDPsocket.sendto(data, (IP, PORTD))
    time.sleep(.02)
    cnt += 1
The C code has 4 threads waiting to receive on their corresponding ports. Essentially each thread should receive its packet, process it, and send it back to the server. This process should take less than 20ms, before the next set of messages arrives.
void * receiveEthernetThread(){
    uint8_t ethRxBuff[1024];
    ssize_t byteCnt;
    /* socketForPort is the already-bound UDP socket for this thread's port */
    if((byteCnt = recv(socketForPort, ethRxBuff, 1024, 0)) < 0){
        perror("recv");
    }else{
        //Process Data, cannot have back to back messages on the same port
        //Send back to the server
    }
    return NULL;
}
I found out the reason I was missing messages a while back and wanted to answer my question. I was running the program on a Zynq-7000 and didn't realize this would be an issue.
In the Xilinx Zynq-7000-TRM, there is a known issue describing:
" It is possible to have the last frame(s) stuck in the RX FIFO with the software having no way to get the last frame(s) out of there. The GEM only initiates a descriptor request when it receives another frame. Therefore, when a frame is in the FIFO and a descriptor only becomes available at a later time and no new frames arrive, there is no way to get that frame out or even know that it exists.
This issue does not occur under typical operating conditions. Typical operating conditions are when the system always has incoming Ethernet frames. The above mentioned issue occurs when the MAC stops receiving the Ethernet frames.
Workaround: There is no workaround in the software except to ensure a continual flow of Ethernet frames."
It was fixed basically by having continuous incoming Ethernet traffic. Sorry for missing that crucial information.
"This causes the timing of the message received for the last port incorrect to when it was sent, and I cannot have two messages back to back on the same port."
The short explanation is you are using UDP, and that protocol gives no guarantees about delivery or order.
That aside, what you are describing most definitely sounds like a buffering issue. Unfortunately, there is no real way to "flush" a socket.
You will either need to use a protocol that guarantees what you need (TCP), or implement your needs on top of UDP.
I believe your real problem though is how the data is being parsed on the server side. If your application is completely dependent on a 20ms interval of four separate packets from the network, that's just asking for trouble. If it's possible, I would fix that rather than attempt fixing (normal) socket buffering issues.
A hacky solution though, since I love hacky things:
Set up a fifth socket on the server. After sending your four time-sensitive packets, send "enough" packets to the fifth port to force any remaining time-sensitive packets through. What is "enough" packets is up to you. You can send a static number, assuming it will work, or have the fifth port send you a message back the moment it starts recv-ing.
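The idea above can be sketched on the Python sending side as follows. The helper name send_batch, the flush_port argument and the one-byte filler payload are all made up for illustration, and how many filler datagrams count as "enough" is left as a parameter.

```python
import socket

def send_batch(sock, ip, ports, flush_port, data, n_flush=1):
    """Send `data` to each time-sensitive port, then `n_flush` filler
    datagrams to `flush_port`. Returns the number of datagrams sent."""
    sent = 0
    for port in ports:
        sock.sendto(data, (ip, port))
        sent += 1
    for _ in range(n_flush):
        sock.sendto(b"\x00", (ip, flush_port))  # filler payload, content is arbitrary
        sent += 1
    return sent
```

In the sending loop this would replace the four sendto() calls, for example send_batch(UDPsocket, IP, [PORTA, PORTB, PORTC, PORTD], PORTE, data), followed by the usual 20ms sleep.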

How to check the amount of packets in a receive buffer of a raw socket

I have written a Linux server that receives packets of a specific Ethernet type (using a raw socket) and sends them out on a different Ethernet device. The thing is, the rate at which I need to receive the packets is greater than the rate at which I can send them to the other interface. So I'm using the socket buffer until it gets full, and then I expect packets to be dropped.
I have set the buffer size using
setsockopt(socket, SOL_SOCKET, SO_RCVBUF, 20 * 1024 * 1024)
Validating with getsockopt, I do see that the socket was configured correctly.
The thing is, I start to drop packets much sooner than I expected (nearly 10 times sooner).
What I want to do is get the number of packets in the socket buffer, so that I can print the time left until it is full.
(The server is written in Python, yet I would be able to "translate" from other languages)
