Significant Delay in Receiving Data Using Python select()

I have a Python script that is used to receive data associated with a radio station audio event (such as a song or commercial) from the machine playing the audio. The script will parse and process the data and then send portions of it to various other destinations.
First the socket is set up:
client_socket_1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    print 'trying to open socket 1'
    client_socket_1.connect((TCP_RCV_IP_CR1, TCP_RCV_PORT_CR1))
    client_socket_1.setblocking(0)
except socket.error, e:
    print 'Error', e, TCP_RCV_IP_CR1, '\n\n\n'
else:
    SOCK1 = 1
    print 'Successful connection to ',TCP_RCV_IP_CR1,'\n'
Now we wait until data is available to be read. I used select() and when the socket is ready to be read, the thread that parses and processes the data is spawned.
ready_1 = select.select([client_socket_1], [], [], 1) # select tells us when data is available at the socket
if ready_1[0] and SOCK1: # Don't run this code if there is no connection on client_socket_1 or no data available
    t1 = Thread(target=processdata1) # Set up the thread
    t1.start() # Call the process to process available data as a thread
It is important that the data be read as quickly as possible as it will be transported via TCP or UDP (depending on the particular data chunks and program specifications) along with the associated audio, and the function of one of the data items we are handling can create an on-air 'hiccup' in the audio if not received in a timely fashion. (TMI: It causes a 'replacement' commercial to play at the receiving end which is supposed to 'cover' the commercial audio we are sending. If the replacement spot doesn't start quickly enough listeners will hear the beginning of the commercial we are sending, then the local replacement one will start when our data is received and it sounds like a hiccup on the air.)
To confirm that my script is not always receiving the data quickly enough, I telnetted to the port it is listening to and watched the data as it was received in the telnet window, then looked at the Python output (which sends received data to stdout as soon as it is received). I see about a 1.5-second delay between the telnet output and the Python output. This is the same amount of delay we have observed in normal on-air operation.
I chose to use select() because I was asked to multi-thread the script and I thought that would be a good way to know when to trigger a data-processing thread. My original idea was to simply loop through attempting to read data from each of the three systems we are monitoring and, when data is found, process it.
The thought was that if data is being processed from one system when another system has data ready to be read, it might cause a delay in processing and sending out the data from that machine. However, I can't see that delay being as significant as what I am experiencing now. I am considering going back to the original plan.
I would rather stick with what I have, which works flawlessly as long as data is received in a timely fashion. Any thoughts on why the delay is so long?

I think it has to do with your timeout parameter in combination with the wlist and xlist parameters.
Consider this piece of code:
write_list = []
exception_list = []
select.select([client_socket_1], write_list, exception_list)
select() takes an optional timeout parameter, which is how you are using it. The documentation says:
"select() also takes an optional fourth parameter which is the number of seconds to wait before breaking off monitoring if no channels have become active. Using a timeout value lets a main program call select() as part of a larger processing loop, taking other actions in between checking for network input."
It might be that the call always waits one second before returning because of the empty lists. Try:
ready_1 = select.select(
    [client_socket_1],
    [client_socket_1],
    [client_socket_1], 1
)
Or you can use a timeout value of 0, which "specifies a poll and never blocks."
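One way to check how the timeout actually behaves is to time a select() call directly. The sketch below is written for Python 3 and uses socket.socketpair() as a stand-in for the real connection; the helper name wait_readable and the short timeouts are made up for illustration. As far as select() itself is concerned, the timeout is an upper bound on the wait: the call returns as soon as a monitored socket is ready, whether or not the other two lists are empty.

```python
import select
import socket
import time

def wait_readable(sock, timeout):
    """Run one select() call; return (is_readable, seconds_waited)."""
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable), time.monotonic() - start

# Two connected sockets standing in for the audio machine and the script.
sender, receiver = socket.socketpair()

# Case 1: data is already waiting, so select() should return well before the timeout.
sender.sendall(b"event")
ready_with_data, waited_with_data = wait_readable(receiver, 1.0)
receiver.recv(1024)  # drain the socket again

# Case 2: nothing to read, so select() waits for (roughly) the full timeout.
ready_without_data, waited_without_data = wait_readable(receiver, 0.2)

# Case 3: a connected socket with free buffer space is reported as writable.
_, writable, _ = select.select([], [sender], [], 1.0)
sender_writable = bool(writable)

sender.close()
receiver.close()
```

Because a connected socket is writable almost all of the time, putting it in the write list tends to make select() return immediately on every call, which turns the surrounding loop into a busy poll rather than a wait for incoming data.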

Related

Python socket stop queuing

A program not made by me is sending coordinates using a python UDP socket.
I am able to receive and process the data, but there is a problem: my program lags further and further behind the tag that transmits the data as time passes. I assume some kind of queuing is happening because my program is too slow. My current workaround is to process one coordinate and then quickly read a bunch more to catch up before processing again. While this works, I don't like it. Shouldn't it be possible to take some kind of LIFO approach and flush the remaining data after reading the latest coordinates? Attached is a snippet of my code; some of the processing steps that take a long time have been removed.
import re
import socket
def main():
    sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    adress = ("IP",PORT)  # placeholders for the real address and port
    sock.bind(adress)
    while True:
        data, addr = sock.recvfrom(1024)
        split_by_letter = re.findall('[A-Z][^A-Z]*', data.decode('ascii'))
        final_coord = str(split_by_letter)[3:-2]
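The "read the newest and flush the rest" idea from the question could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name recv_latest is hypothetical, the demo uses a loopback socket with an OS-chosen port instead of the real tag, and older datagrams are simply thrown away. The helper blocks for one datagram, then switches the socket to non-blocking mode and keeps overwriting its result until the receive queue is empty.

```python
import socket
import time

def recv_latest(sock, bufsize=1024):
    """Block for one datagram, then discard any backlog and return the newest one."""
    sock.setblocking(True)
    latest = sock.recvfrom(bufsize)        # wait until at least one datagram arrives
    sock.setblocking(False)
    try:
        while True:                         # replace it with anything already queued
            latest = sock.recvfrom(bufsize)
    except BlockingIOError:
        pass                                # queue is empty; `latest` is the newest
    finally:
        sock.setblocking(True)
    return latest

# Demo on the loopback interface (addresses and payloads are made up).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                   # port 0 lets the OS pick a free port
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for payload in (b"A1", b"B2", b"C3"):
    tx.sendto(payload, rx.getsockname())
time.sleep(0.1)                             # give the datagrams time to queue up

data, addr = recv_latest(rx)                # the slow processing step would use `data`
tx.close()
rx.close()
```

In the loop from the question, the call to sock.recvfrom(1024) would be replaced by recv_latest(sock), so each slow processing pass starts from the most recent coordinate instead of the oldest queued one.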

What is the most efficient way to run independent processes from the same application in Python

I have a script that in the end executes two functions. It polls for data on a time interval (runs as daemon - and this data is retrieved from a shell command run on the local system) and, once it receives this data will: 1.) function 1 - first write this data to a log file, and 2.) function 2 - observe the data and then send an email IF that data meets certain criteria.
The logging will happen every time, but the alert may not. The issue is, in cases that an alert needs to be sent, if that email connection stalls or takes a lengthy amount of time to connect to the server, it obviously causes the next polling of the data to stall (for an undisclosed amount of time, depending on the server), and in my case it is very important that the polling interval remains consistent (for analytics purposes).
What is the most efficient way, if any, to keep the email process working independently of the logging process while still operating within the same application and depending on the same data? I was considering creating a separate thread for the mailer, but that kind of seems like overkill in this case.
I'd rather not set a short timeout on the email connection, because I want to give the process some chance to connect to the server, while still allowing the logging to be written consistently on the given interval. Some code:
def send(self,msg_):
    """
    Send the alert message
    :param str msg_: the message to send
    """
    self.msg_ = msg_
    ar = alert.Alert()
    ar.send_message(msg_)
def monitor(self):
    """
    Post to the log file and
    send the alert message when
    applicable
    """
    read = r.SensorReading()
    msg_ = read.get_message() # the data
    if msg_: # if there is data in general...
        x = read.get_failed() # store bad data
        msg_ += self.write_avg(read)
        msg_ += "==============================================="
        self.ctlog.update_templog(msg_) # write general data to log
        if x:
            self.send(x) # if bad data, send...
This is exactly the kind of case you want to use threading/subprocesses for. Fork off a thread for the email, which times out after a while, and keep your daemon running normally.
Possible approaches that come to mind:
Multiprocessing
Multithreading
Parallel Python
My personal choice would be multiprocessing as you clearly mentioned independent processes; you wouldn't want a crashing thread to interrupt the other function.
You may also refer to this before making your design choice: Multiprocessing vs Threading Python
Thanks everyone for the responses. They helped very much. I went with threading, but also updated the code to be sure it handled failing threads. I ran some regressions and found that the subsequent processes were no longer being interrupted by stalled connections and the log was being updated on a consistent schedule. Thanks again!!
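The threading approach that was chosen might look roughly like the sketch below. It is not the poster's actual code: send_alert_in_background and slow_send are hypothetical names, and slow_send only simulates a mail server that takes a while to respond. The send runs in a daemon thread, exceptions are caught inside the thread so a failed send cannot disturb the poller, and the caller gets control back immediately.

```python
import threading
import time

def send_alert_in_background(send_func, message, errors):
    """Run send_func(message) in a daemon thread so the caller is not held up."""
    def worker():
        try:
            send_func(message)
        except Exception as exc:    # a failed send must not kill the polling loop
            errors.append(exc)
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

sent, errors = [], []

def slow_send(message):
    """Hypothetical stand-in for a mail call that stalls on a slow server."""
    time.sleep(0.3)
    sent.append(message)

start = time.monotonic()
t = send_alert_in_background(slow_send, "sensor 3 failed", errors)
elapsed_before_join = time.monotonic() - start   # logging and polling would continue here
t.join()                                         # only the demo waits for the send
```

In the monitor() method from the question, the call self.send(x) would be handed to such a helper, so the log is written on schedule even while an alert is still being delivered.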

Adding a time.sleep to a multithreaded program solves a UnicodeDecodeError in python

Here's a basic idea of the threads that I am creating in my program:
Main thread
|
ListenerCreator(The WebSocketServer thread) ---> Several listener threads(using log())
So the main thread creates a ListenerCreator thread, which connects to a number of clients and creates a listener thread for each client. Here's briefly what a listener thread does:
EDIT1 :
I'm using WebSockets to read/write data off my client. I've made my own server for this purpose. There is a framing protocol which the standard specifies -- and I am using that. On the client side I am simply using WebSocket.send() and "unmasking" the messages according to the instructions given in the protocol(see section 5.3 in the link above).
I would be willing to provide the server code if someone requests it, however, here's a brief outline:
class WebSocketServer:
    def start():
        #Open server socket, bind to host:port
        while True:
            #Accept client socket, start a new listener thread for self.log(client)
    def log(client):
        #Receive data using socket.socket.recv(1024)
        #Unmask data as per the protocol
        #Decode using data.decode("utf-8")
        #Append to data_q while holding data_q_lock
There are other methods - those to facilitate sending, closing, handshaking and so on.
Meanwhile in the main thread:
while breaking!=len(client_list):
    #time.sleep(0.5)
    with data_q_lock:
        for i in range(len(data_q)):
            mes = data_q.pop()
            for m in client_list:
                if "#DONE"== mes:
                    breaking += 1
                if(mes[:len("#COUNT:")] == "#COUNT:"):
                    print(mes)
So basically what this loop does is: loop through the data_q, print the message if it starts with "#COUNT", and exit the loop after getting a certain number of "#DONE" messages.
If the time.sleep is uncommented, then this code works; without time.sleep, however, I get a UnicodeDecodeError in the log function.
Also, I only get the error sometimes; at other times the program works perfectly.
(The client is sending the same data every time, by the way)
So, my question is, why is the time.sleep required?
I thought it was something to do with the GIL in Python, as time.sleep releases the GIL. However, even after reading about it I couldn't answer the question.
Currently there is no information about how the listener is reading data off the socket. It seems likely, however, that this is caused by the usual misunderstanding of sockets.
Data sent down a socket is not "framed" in any way by the socket. Imagine if I sent the message "hello" three times down a socket. Then, like writing to a file without line breaks, the following would flow on the socket:
hellohellohello
Now consider the reader ... when reading the data, how does it know where one message ("hello") ends and the next begins? It cannot, unless the sender and receiver agree about how that data should be "framed". This could be done by agreeing on some protocol like:
null-terminating data; or
fixed size messages; or
size prefixed messages.
It gets more complicated, of course. Even once you've decided how the data should be framed, you cannot guarantee that socket.recv will return a "whole" message ... it will simply return whatever data happens to be in the buffer at the time. It may be half a message, or a message and a half. It's your job to collate the data read from the socket and divide it into messages.
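As an illustration of the "size prefixed messages" option, here is a minimal Python 3 sketch. The FrameBuffer class, the frame helper and the 4-byte big-endian length header are all assumptions made for the example; this is generic framing, not the WebSocket frame format. The buffer accepts whatever chunks recv happens to return and hands back only complete messages.

```python
import struct

class FrameBuffer:
    """Collect raw chunks and split them into length-prefixed messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add bytes from recv(); return every complete message now available."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 4:
            (size,) = struct.unpack("!I", self._buf[:4])   # 4-byte big-endian length
            if len(self._buf) < 4 + size:
                break                                      # payload not complete yet
            messages.append(bytes(self._buf[4:4 + size]))
            del self._buf[:4 + size]
        return messages

def frame(payload):
    """Prefix payload with its length so the reader can find the boundary."""
    return struct.pack("!I", len(payload)) + payload

# "hello" sent three times, arriving in chunks that ignore message boundaries.
stream = frame(b"hello") * 3
buf = FrameBuffer()
first = buf.feed(stream[:7])      # header plus part of the first payload
second = buf.feed(stream[7:20])   # completes two messages, starts a third
third = buf.feed(stream[20:])     # completes the last message
```

The reader never has to guess: a message is released only once its header and its full payload have both arrived, and leftover bytes stay in the buffer for the next call.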
Turning to your problem: you are sending utf-8 data, so how does the reader know it has read a full utf-8 message? Most likely, what is happening here is that you have only received a partial message ... there is still more to arrive.
In particular, a valid utf-8 character may consist of more than one byte. So if your partial message ends in the middle of a multi-byte utf-8 representation of a character, then you certainly cannot decode it.
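The split-character case is easy to reproduce. In the sketch below (Python 3, with a made-up sample string), the data is cut in the middle of the two-byte encoding of "é": decoding the first chunk on its own fails, while an incremental decoder from the standard codecs module holds on to the incomplete byte until the rest arrives.

```python
import codecs

text = "héllo wörld"
data = text.encode("utf-8")

# "é" is encoded as two bytes (indices 1 and 2); cut the stream between them.
chunks = [data[:2], data[2:]]

# Decoding each chunk on its own fails on the incomplete character.
try:
    chunks[0].decode("utf-8")
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

# An incremental decoder buffers the partial character across calls.
decoder = codecs.getincrementaldecoder("utf-8")()
decoded = "".join(decoder.decode(chunk) for chunk in chunks)
decoded += decoder.decode(b"", final=True)   # flush; raises if bytes are still missing
```

An incremental decoder only fixes the character boundary problem. Message boundaries still need framing of some kind, as described above.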

A Process to check if Infinite Loop is still running in Python3

I am unable to work out how to handle the following scenario using general programming concepts:
Note: All data transmission in this scenario is done via UDP packets using the socket module of Python 3.
I have a Server which sends a certain amount of data, say 300 packets, over a WiFi channel.
At the other end, I have a receiver which runs a decoding process to decode the data. This decoding process is a kind of infinite loop which returns a Boolean value, True or False, at every iteration depending on certain aspects which can be neglected for now.
A rough code snippet is as follows (Python 3):
incomingPacket = next(bringNextFromBuffer)
if decoder.consume_data(incomingPacket):
    # this if condition is inside an infinite loop;
    # until the condition becomes True,
    # keep consuming data in a forever for loop
    print("Data has been received")
At the moment everything works, since the Server and Client are in proximity and the data can be decoded. But in practical scenarios I want to check on the loop mentioned above. For instance, if the loop is still in the forever (infinite) state after a certain amount of time, I would like to send something back to the server so that it starts sending the data again.
I am not very clear on the concept of multithreading, but can I use a thread in this scenario?
For example:
Run a thread for a certain amount of time that keeps checking the decoder.consume_data() function; if the time expires and the output is still False, can I then send some kind of feedback to the server using struct.pack() over sockets?
Of course, the networking logic need NOT be addressed for now. But is Python capable of MONITORING THIS INFINITE LOOP VIA A PARALLEL THREAD OR SOME OTHER PROGRAMMING CONCEPT?
Caveats
Unfortunately the receiver in question is a dumb receiver, i.e. no user control is available. The only thing the receiver can do is decode the data and perhaps send feedback to the server stating whether the data was received, and that is possible only when the loop mentioned above has completed.
What is a possible solution here?
(Would be happy to share more information on request)
Yes you can do this. Roughly it'll look like this:
from threading import Thread
from time import sleep
state = 'running'
def monitor():
    while True:
        if state == 'running':
            tell_client() # placeholder: send the feedback message
        sleep(1) # to prevent too much happening here
Thread(target=monitor).start()
while state == 'running':
    receive_data() # placeholder: the decoding loop
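A variation on the same idea that fires only when a deadline passes can be built on threading.Event. The sketch below is an assumption-laden illustration rather than a drop-in solution: run_with_watchdog, slow_decode, fast_decode and request_resend are hypothetical names, and the sleeps stand in for the real decoding loop. The watchdog thread waits on the event with a timeout, and Event.wait() returns False if the timeout expires before the event is set.

```python
import threading
import time

def run_with_watchdog(work, on_timeout, timeout):
    """Run work() in this thread; call on_timeout() if it is still running after timeout seconds."""
    finished = threading.Event()

    def watchdog():
        # wait() returns False when the timeout expires before the event is set
        if not finished.wait(timeout):
            on_timeout()

    watcher = threading.Thread(target=watchdog, daemon=True)
    watcher.start()
    try:
        return work()
    finally:
        finished.set()      # tell the watchdog that the work is done
        watcher.join()

feedback = []

def request_resend():
    """Stand-in for sending feedback to the server."""
    feedback.append("resend")

def fast_decode():
    return True             # decoding finished before the deadline

def slow_decode():
    time.sleep(0.3)         # decoding still running when the deadline passes
    return True

run_with_watchdog(fast_decode, request_resend, timeout=0.2)
calls_after_fast = len(feedback)
run_with_watchdog(slow_decode, request_resend, timeout=0.1)
calls_after_slow = len(feedback)
```

Here the callback runs at most once per call. For repeated reminders, the watchdog could loop on finished.wait(timeout) until the event is set.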

Using multiprocessing.Pipe for blocking until event occurs (with timeout)

I am using multiprocessing.Pipe in a rather simple script that has 2 processes, where A reads data from an external source (an Arduino connected on a serial port) and sends an event to B. This way I can make B block until it receives that one specific event from A. However, the external source is not able to detect the event very reliably at times (it's ~75% reliable), so I would like to implement a timeout around this event. I would also like to drop an erroneous event that arrives after the timeout has already expired, since nothing stops it from occurring.
Is there a better abstraction that I can use for this purpose? One thing I'd like to be able to do is b.recv(timeout=N), but for some reason that is not currently possible with multiprocessing.Pipe.
You could use the Connection's poll method; it has a timeout parameter:
import multiprocessing as mp
receiver, sender = mp.Pipe()
...
if receiver.poll(timeout): # timeout is in seconds; True if data is ready to be read
    data = receiver.recv()
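Building on poll(), a b.recv(timeout=N) style helper and a way to discard events that arrived too late might look like the sketch below. The names wait_for_event and drop_stale are hypothetical, and for brevity the demo uses both ends of the pipe in a single process instead of two real processes.

```python
import multiprocessing as mp

def wait_for_event(conn, timeout):
    """Return the next event from conn, or None if nothing arrives within timeout seconds."""
    if conn.poll(timeout):
        return conn.recv()
    return None

def drop_stale(conn):
    """Discard events that are already queued; return how many were dropped."""
    dropped = 0
    while conn.poll():          # poll() without an argument returns immediately
        conn.recv()
        dropped += 1
    return dropped

# One-way pipe: the first connection can only receive, the second can only send.
receiver, sender = mp.Pipe(duplex=False)

timed_out = wait_for_event(receiver, 0.1)   # nothing was sent, so this times out
sender.send("late event")                   # arrives after B has given up waiting
dropped = drop_stale(receiver)              # B clears it before waiting again
sender.send("fresh event")
fresh = wait_for_event(receiver, 0.1)
```

Calling drop_stale just before each new wait is one simple policy. Whether a late event should really be discarded, or tagged and inspected instead, depends on the application.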
