I've written a Python server script on Windows 7 that sends Ethernet UDP packets to a UNIX system running a C client program, which receives each message and sends it back to the server. However, sometimes (not always) a message on the last port (and always the last port) that Python sends to won't arrive until the next batch of 4 messages is sent. This makes the receive timing for the last port's message wrong relative to when it was sent, and I cannot have two messages back to back on the same port.
I have been able to verify this in Wireshark by locating two messages that arrived at around the same time: the one that hadn't been received earlier was processed together with the next one. I have also checked the timing right after the recv() call, and it shows a long delay followed by a short delay, because the thread effectively received two packets back to back.
Things I have done to try to fix this, which haven't solved it but have helped me characterize the problem:
- I can add a delay between each sendto() and then all messages are sent and received with correct timing, but I want the test to work the way I've written it below.
- I've increased the priority of the receive thread, thinking that the Ethernet receive wasn't being signaled to pick up the packet or that some process was taking too long, but this didn't help, and 20 ms should be far more than necessary to process the data.
- I have removed ports C and D, and then port B misses messages (having only one port causes no issues); I thought reducing the number of ports would improve the timing.
- Sending to a dummy PORTE immediately after PORTD lets me receive all of the messages with correct timing (I assume the problem is transferred to PORTE).
- I have reproduced the Python script in a UNIX environment and in C code and have had the same issue, which points me toward a receiving problem.
- I've set my recv function to time out every 1 ms, hoping it could recover somehow even if the timing would be off a bit, but I still saw messages back to back.
- I've also checked that no UDP packets have been dropped and that the buffer is large enough to hold those 4 messages.
Any new ideas would help.
This is the core of the code: the Python script sends 4 packets, one 20-byte message to each corresponding waiting thread in C, then delays for 20 ms.
A representation of the python code looks something like
msg_cnt = 5000
cnt = 0
while cnt < msg_cnt:
    UDPsocket.sendto(data, (IP, PORTA))
    UDPsocket.sendto(data, (IP, PORTB))
    UDPsocket.sendto(data, (IP, PORTC))
    UDPsocket.sendto(data, (IP, PORTD))
    time.sleep(0.02)
    cnt += 1
The C code has 4 threads, each waiting to receive on its corresponding port. Essentially each thread should receive its packet, process it, and send it back to the server. This should take less than 20 ms, before the next set of messages arrives.
void *receiveEthernetThread(void *arg) {
    uint8_t ethRxBuff[1024];
    ssize_t byteCnt;

    if ((byteCnt = recv(socketForPort, ethRxBuff, sizeof(ethRxBuff), 0)) < 0) {
        perror("recv");
    } else {
        // Process data, cannot have back-to-back messages on the same port
        // Send the response back to the server
    }
    return NULL;
}
I found out a while back why I was missing messages and wanted to answer my own question. I was running the program on a Zynq-7000 and didn't realize this would be an issue.
In the Xilinx Zynq-7000 TRM, there is a known issue described as follows:
" It is possible to have the last frame(s) stuck in the RX FIFO with the software having no way to get the last frame(s) out of there. The GEM only initiates a descriptor request when it receives another frame. Therefore, when a frame is in the FIFO and a descriptor only becomes available at a later time and no new frames arrive, there is no way to get that frame out or even know that it exists.
This issue does not occur under typical operating conditions. Typical operating conditions are when the system always has incoming Ethernet frames. The above mentioned issue occurs when the MAC stops receiving the Ethernet frames.
Workaround: There is no workaround in the software except to ensure a continual flow of Ethernet frames."
It was basically fixed by ensuring continuous incoming Ethernet traffic; sorry for missing that crucial information.
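For illustration, here is a minimal sketch of what "ensuring a continual flow" can look like on the sending side. PORT_KEEPALIVE is a hypothetical unused port on the receiver, and the 5 ms interval is arbitrary:

import threading
import time

# Keep a trickle of frames flowing so the GEM keeps pulling frames out of its RX FIFO.
def keepalive(sock, ip, port, interval=0.005):
    while True:
        sock.sendto(b'\x00', (ip, port))  # tiny dummy datagram
        time.sleep(interval)

# Usage (names from the question's script):
# threading.Thread(target=keepalive, args=(UDPsocket, IP, PORT_KEEPALIVE), daemon=True).start()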
"This makes the receive timing for the last port's message wrong relative to when it was sent, and I cannot have two messages back to back on the same port."
The short explanation is that you are using UDP, and that protocol gives no guarantees about delivery or order.
That aside, what you are describing most definitely sounds like a buffering issue. Unfortunately, there is no real way to "flush" a socket.
You will either need to use a protocol that guarantees what you need (TCP), or implement your needs on top of UDP.
I believe your real problem though is how the data is being parsed on the server side. If your application is completely dependent on a 20ms interval of four separate packets from the network, that's just asking for trouble. If it's possible, I would fix that rather than attempt fixing (normal) socket buffering issues.
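If you do go the route of implementing your needs on top of UDP, one minimal sketch is to tag every datagram with a sequence number and a send timestamp so the receiver can detect late, duplicated, or reordered arrivals. The header layout below is just an example, not anything from the question:

import struct
import time

seq = 0

def tag(payload):
    """Prepend a 4-byte sequence number and an 8-byte nanosecond timestamp."""
    global seq
    header = struct.pack('!IQ', seq, time.time_ns())
    seq += 1
    return header + payload

def untag(datagram):
    """Split a tagged datagram back into (sequence, send time, payload)."""
    seq_no, sent_ns = struct.unpack('!IQ', datagram[:12])
    return seq_no, sent_ns, datagram[12:]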
A hacky solution though, since I love hacky things:
Set up a fifth socket on the server. After sending your four time-sensitive packets, send "enough" packets to the fifth port to force any remaining time-sensitive packets through. How many packets count as "enough" is up to you: you can send a fixed number and assume it works, or have the fifth port send a message back the moment it starts recv-ing.
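A rough sketch of that hack, where PORTE and FLUSH_COUNT are hypothetical:

FLUSH_COUNT = 3

def send_with_flush(sock, data, ip, ports, flush_port):
    for port in ports:                       # the four time-sensitive packets
        sock.sendto(data, (ip, port))
    for _ in range(FLUSH_COUNT):             # "enough" filler packets to push the last one through
        sock.sendto(b'flush', (ip, flush_port))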
I've got a real-time telemetry problem I'd like guidance on. The basics are obvious: a stream of UDP multicast packets arrives at a given address/port; msg = recv(), then decode and display that msg, and all's well. But what if the next packet arrives before the first msg has been fully processed, or worse, what if the next TWO packets have arrived before we've gotten back to the recv() loop? Since this is a real-time telemetry situation, I'd like to (a) recognize that there's more than one packet to be dealt with, and then (b) delete/ignore all packets in the multicast receive buffer except for the most recent.
I have spent a few hours trying to get recv() to work in a non-blocking mode and to return a status showing how many separate packets are in the received multicast queue, to no avail.
I'm using matplotlib/tkinter for the display of the data, that (plus hey, it's Python) is why operating on each packet is a lengthy operation.
Am I way off in the weeds here? Is there a better approach?
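A minimal sketch of what I'm after, drain the queue and keep only the newest datagram, assuming sock is the already-bound multicast socket, might look like this:

def latest_packet(sock, bufsize=2048):
    sock.setblocking(False)
    latest = None
    try:
        while True:
            latest, _ = sock.recvfrom(bufsize)
    except BlockingIOError:
        pass                    # kernel queue is empty
    finally:
        sock.setblocking(True)  # restore blocking behaviour for normal recv()
    return latest               # None if nothing was waiting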
I'm working with python in order to receive a stream of UDP packets from an FPGA, trying to lose as few packets as possible.
The packet rate ranges from around 5 kHz up to a few MHz, and we want to take data in a specific time window (acq_time in the code).
We have this code now:
BUFSIZE = 4096

dataSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
dataSock.settimeout(0.1)
dataSock.bind((HOST_IP, HOST_PORT))

def fast_acquisition(data_list_tmp):
    data, addr = dataSock.recvfrom(BUFSIZE)
    data_list_tmp.append(data)
    return len(data)

time0 = time.time()
data_list = []
while time.time() - time0 < acq_time:
    fast_acquisition(data_list)
And after the acquisition we save our data_list on disk.
This code is meant to be as simple and fast as possible, but it's still too slow and we lose too many packets even at 5 kHz. We think this happens because, while we read one packet, store it in the list, and check the time, the next one (or ones) arrives and is lost.
Is there any way to keep the socket open? Can we open multiple sockets "in series" with parallel processing, so that while we are saving data from the first, the second can receive another packet?
We can even think to use another language only to receive and store the packets on disk.
You could use tcpdump (which is implemented in C) to capture the UDP traffic, since it's faster than python:
#!/bin/bash
iface=$1 # interface, 1st arg
port=$2 # port, 2nd arg
tcpdump -i $iface -G acq_time_in_seconds -n udp port $port -w traffic.pcap
And then you can use e.g. scapy to process that traffic
#!/usr/bin/env python
from scapy.all import *

scapy_cap = rdpcap('traffic.pcap')
for packet in scapy_cap:
    # process each packet here...
    pass
There are several reasons why UDP packets can be lost, and certainly the speed of being able to take them off the socket queue and store them can be a factor, at least eventually. However, even if you had a dedicated C language program handling them, it's unlikely you'll be able to receive all the UDP packets if you expect to receive more than a million a second.
The first thing I'd do is to determine if python performance is actually your bottleneck. It is more likely in my experience that, first and foremost, you're simply running out of receive buffer space. The kernel will store UDP datagrams on your socket's receive queue until the space is exhausted. You might be able to extend that capacity a little with a C program, but you will still exhaust the space faster than you can drain the socket if packets are coming in at high enough speed.
Assuming you're running on linux, take a look at this answer for how to configure the socket's receive buffer space -- and examine the system-wide maximum value, which is also configurable and might need to be increased.
https://stackoverflow.com/a/30992928/1076479
(If you're not on linux, I can't really give any specific guidance, although the same factors are likely to apply.)
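As a minimal sketch (Linux assumed), you can request a larger receive buffer and check what the kernel actually granted; the grant is capped by net.core.rmem_max, so that sysctl may need raising as well, as the linked answer explains:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8 * 1024 * 1024)
# The kernel reports roughly double the requested value to account for bookkeeping overhead.
print("receive buffer granted:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))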
It is possible that you will be unable to receive packets fast enough, even with more buffer space and even in a C program. In that case, #game0ver's idea of using tcpdump might work better if you only need to withstand a short intense burst of packets as it uses a much lower-level interface to obtain packets (and is highly optimized). But then of course you won't just have the UDP payload, you'll have entire raw packets and will need to strip the IP and Ethernet layer headers as well before you can process them.
I have written a Linux server that receives packets of a specific Ethernet type (using a raw socket) and sends them out on a different Ethernet device. The thing is, the rate at which I need to receive the packets is greater than the rate at which I can send them to the other interface. So I'm using the socket buffer until it gets full, and then I expect packets to drop.
I have set the buffer size using
setsockopt(socket, SOL_SOCKET, SO_RCVBUF, 20 * 1024 * 1024)
And validating using getsockopt, I do see the socket was configured correctly.
The thing is, I start to drop packets much faster than I expected (nearly 10 times faster).
What I want to do is get the amount of data queued in the socket buffer, so that I can print the time left until it is full.
(The server is written in Python, yet I would be able to "translate" from other languages)
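Something like the following sketch is what I have in mind (Linux only, and it reports queued bytes rather than a packet count; also, an AF_PACKET raw socket would show up in /proc/net/packet rather than /proc/net/udp, so this is just an illustration for a UDP socket bound to local_port):

def rx_queue_bytes(local_port):
    """Return the bytes currently queued in the kernel receive buffer of the
    UDP socket bound to local_port, read from /proc/net/udp."""
    with open('/proc/net/udp') as f:
        next(f)                                          # skip the header line
        for line in f:
            fields = line.split()
            port = int(fields[1].split(':')[1], 16)      # local "addr:port", hex
            if port == local_port:
                return int(fields[4].split(':')[1], 16)  # "tx_queue:rx_queue", hex
    return 0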
I have a Python 3 program which sends short commands to a host and gets short responses back (both 20 bytes). It's not doing anything complicated.
The socket is opened like this:
self.conn = socket.create_connection( ( self.host, self.port ) )
self.conn.settimeout( POLL_TIME )
and used like this:
while True:
    buf = self.conn.recv(256)
    # append buffer to bigger buffer, parse packet once we've got enough bytes
After my program has been running for a while (hours, usually), sometimes it goes into a strange mode: if I use tcpdump, I can see a response packet arriving at the local machine, but recv doesn't give me the packet until 30 s (Windows) to 1 min (Linux) later. The delay varies randomly by about +/- ten seconds. I wondered if the packet was being delayed until the next packet arrived, but this doesn't seem to be true.
In the meantime, the same program is also operating a second socket connection using the same code on a different thread, which continues to work normally.
This doesn't happen all the time, but it's happened several times in a month. Sometimes it's preceded by a few seconds of packets taking longer and longer to arrive, but most of the time it just goes straight from OK to completely broken. Most of the time it stays broken for hours until I restart the server, but last night I noticed it recovering and going back to normal operation, so it's not irrecoverable.
CPU usage is almost zero, and nothing else is running on the same machine.
The weirdest thing is that this happens both on the Windows Subsystem for Linux (two different laptops) and on Linux (an AWS tiny instance running Amazon Linux).
I had a look at the CPython implementation of socket.recv() using GDB. Looking at the source code, it looks like it passes calls to socket.recv() straight through to the underlying recv(). However, while the outer function sock_recv() (which implements socket.recv() ) gets called frequently, it only calls recv() when there's actually data to read from the socket, using the socket_call() function to call poll()/select() to see if there's any data waiting. Calls to recv() happen directly before the app receives a packet, so the delay is somewhere before that point, rather than between recv() and my code.
Any ideas on how to troubleshoot this?
(Both the Linux and Windows machines are updated to the most recent everything, and the Python is Python 3.6.2)
[edit] The issue gets even weirder. I got fed up and wrote a method to detect the issue (looking for ten late-arriving packets in a row with near-identical roundtrip times), drop the connection and reconnect (by closing the previous connection and creating a new socket object) ... and it didn't work. Even with a new socket object, the delayed packets remain delayed by exactly the same amount. So I altered the method to completely kill the thread that was running that code and restart it, reasoning that perhaps there was some thread-local state. That still didn't work. The only resort I have left is killing the entire program and having a watchdog to restart it...
[edit2] Killing the entire program and restarting it with an external watchdog worked. It's a terrible solution, but at least it's a solution.
I was wondering if there is a way I can tell python to wait until it gets a response from a server to continue running.
I am writing a turn-based game. I make the first move, and it sends the move to the server, which then forwards it to the other computer. The problem comes here. As it is no longer my turn, I want my game to wait until it gets a response from the server (wait until the other player makes a move). But my line:
data=self.sock.recv(1024)
hangs because (I think) it's not getting something immediately. So I want to know how I can make it wait for something to happen and then keep going.
Thanks in advance.
The socket programming howto is relevant to this question, specifically this part:
Now we come to the major stumbling block of sockets - send and recv operate on the
network buffers. They do not necessarily handle all the bytes you hand them (or expect
from them), because their major focus is handling the network buffers. In general, they
return when the associated network buffers have been filled (send) or emptied (recv).
They then tell you how many bytes they handled. It is your responsibility to call them
again until your message has been completely dealt with.
...
One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass recv an
arbitrary chunk size, you may end up reading the start of a following message. You’ll
need to put that aside and hold onto it, until it's needed.
Prefixing the message with it’s length (say, as 5 numeric characters) gets more complex,
because (believe it or not), you may not get all 5 characters in one recv. In playing
around, you’ll get away with it; but in high network loads, your code will very quickly
break unless you use two recv loops - the first to determine the length, the second to
get the data part of the message. Nasty. This is also when you’ll discover that send
does not always manage to get rid of everything in one pass. And despite having read
this, you will eventually get bit by it!
The main takeaways from this are:
you'll need to establish either a FIXED message size, OR you'll need to send the size of the message at the beginning of the message
when calling socket.recv, pass the number of bytes you actually want (and I'm guessing you don't actually want 1024 bytes). Then use loops, because you are not guaranteed to get everything you want in a single call (see the sketch below).
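A minimal sketch of both points, assuming each message is prefixed with a 4-byte big-endian length (the framing here is an example, not something from the question):

import struct

def recv_exact(sock, n):
    """Loop until exactly n bytes have been read or the peer closes the connection."""
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_message(sock):
    (length,) = struct.unpack('!I', recv_exact(sock, 4))  # read the length header first
    return recv_exact(sock, length)                       # then read exactly that many payload bytes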
That line, sock.recv(1024), blocks until at least some data has arrived (it returns up to 1024 bytes, not necessarily all of them) or the OS detects a socket error. You still need some way to know the full message size -- this is why HTTP messages include the Content-Length.
You can set a timeout with socket.settimeout, so that recv raises a timeout exception instead of blocking forever if the data doesn't arrive in time.
You can also explore Python's non-blocking sockets using setblocking(0).
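A small sketch of the settimeout approach for the turn-based wait, with hypothetical HOST, PORT, and timeout values:

import socket

HOST, PORT = "example.com", 9000

sock = socket.create_connection((HOST, PORT))
sock.settimeout(5.0)
while True:
    try:
        data = sock.recv(1024)   # blocks until the opponent's move arrives or the timeout fires
    except socket.timeout:
        continue                 # nothing yet; loop again (or update the UI here)
    if not data:
        break                    # server closed the connection
    # accumulate data and parse a complete move message here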