I'm reading a constant flow of data over a serial port, using pyserial:
import serial
s = serial.Serial('/dev/ttyACM0', 9600)
If I use the .read() method, like this:
while True:
    print(s.read(1000))
it consumes around 1-2% CPU.
But if I instead read the data as a list of lines, which is more convenient, like this:
while True:
    print(s.readlines(1000))
CPU usage suddenly spikes to 50%, which seems unreasonable when the only difference is that the output is split on newlines.
Am I doing something wrong? Is there a way to get the readlines() method to use the CPU more sparingly?
Thank you
My guess is that readlines and readline busily poll the serial line for new characters in order to fulfill your request for a full line (or lines), whereas .read only reads and returns when there actually is new data. You'll probably have to implement the buffering and splitting into lines yourself (code untested since I don't have anything on a serial line right now :-) ):
import serial

def read_lines(s, sep=b"\n"):
    buffer = b""
    while True:
        buffer += s.read(1000)
        while sep in buffer:
            line, _, buffer = buffer.partition(sep)
            yield line

s = serial.Serial("/dev/ttyACM0", 9600)
for line in read_lines(s):
    print(line)
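One caveat, assuming pyserial's usual blocking semantics: with the default timeout=None, s.read(1000) won't return until a full 1000 bytes have arrived, so short lines can sit unread for a while. Opening the port with a small timeout makes read() return early with whatever has accumulated:

# assumption: a short timeout so read(1000) returns early with the bytes
# that have arrived, instead of blocking for the full 1000
s = serial.Serial("/dev/ttyACM0", 9600, timeout=0.1)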
I have a client and a server, where the server needs to send a number of text files to the client.
The send file function receives the socket and the path of the file to send:
import os

CHUNKSIZE = 1_000_000

def send_file(sock, filepath):
    with open(filepath, 'rb') as f:
        sock.sendall(f'{os.path.getsize(filepath)}'.encode() + b'\r\n')
        # Send the file in chunks so large files can be handled.
        while True:
            data = f.read(CHUNKSIZE)
            if not data:
                break
            sock.send(data)
And the receive file function receives the client socket and the path where to save the incoming file:
CHUNKSIZE = 1_000_000

def receive_file(sock, filepath):
    with sock.makefile('rb') as file_socket:
        length = int(file_socket.readline())
        # Read the data in chunks so it can handle large files.
        with open(filepath, 'wb') as f:
            while length:
                chunk = min(length, CHUNKSIZE)
                data = file_socket.read(chunk)
                if not data:
                    break
                f.write(data)
                length -= len(data)
        if length != 0:
            print('Invalid download.')
        else:
            print('Done.')
It works by sending the file size as the first line, then sending the file contents in chunks.
Both are invoked in loops in the client and the server, so that files are sent and saved one by one.
It works fine if I put a breakpoint and invoke these functions slowly. But if I let the program run uninterrupted, it fails when reading the size of the second file:
File "/home/stark/Work/test/networking.py", line 29, in receive_file
length = int(file_socket.readline())
ValueError: invalid literal for int() with base 10: b'00,1851,-34,-58,782,-11.91,13.87,-99.55,1730,-16,-32,545,-12.12,19.70,-99.55,1564,-8,-10,177,-12.53,24.90,-99.55,1564,-8,-5,88,-12.53,25.99,-99.55,1564,-8,-3,43,-12.53,26.54,-99.55,0,60,0\r\n'
Clearly a lot more data is being received by that length = int(file_socket.readline()) line.
My questions: why is that? Shouldn't that line read only the size given that it's always sent with a trailing \n?
How can I fix this so that multiple files can be sent in a row?
Thanks!
It seems like you're reusing the same connection, and the problem is that file_socket is buffered: your read loop has actually recv'd more from the socket than you think. That is, the buffered wrapper consumes extra data from the socket, so the next time you attempt to readline() you end up reading leftover file content up to whatever newline it happens to contain, rather than the next length line.
This also means some of the incoming data has effectively been skipped, which is why the next line you read is not the int you expected, hence the observed failure.
You can say:
with sock.makefile('rb', buffering=0) as file_socket:
instead, to force the file-like access to be unbuffered. Or actually handle the receiving, buffering, and parsing of the incoming bytes (understanding where one file ends and the next one begins) on your own, instead of relying on the file-like wrapper and readline().
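If you go the do-it-yourself route, a minimal sketch (untested; the helper name is invented for this example) of reading an exact number of bytes straight off the socket might look like this:

def recv_exactly(sock, n):
    # Read exactly n bytes from the socket; raise if it closes early.
    chunks = []
    while n > 0:
        data = sock.recv(min(n, 65536))
        if not data:
            raise ConnectionError('socket closed mid-transfer')
        chunks.append(data)
        n -= len(data)
    return b''.join(chunks)

Since recv() never consumes more than you ask for, no read-ahead buffer gets thrown away between files.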
You have to understand that socket communication is based on TCP/IP; it does not matter whether it's the same machine (loopback is used in that case) or different machines. So you've got some IP addresses between which the connection is established. Going further, it involves accessing your network adapter, which takes relatively long compared to accessing, e.g., RAM. Additionally, the adapter itself manages when to send particular data frames (lower ISO/OSI layers). Basically, TCP requires an ACK for delivery, but a standard PC is not some industrial, real-time Ethernet setup.
So, in your code, you've got a while True loop without any sleep, and you don't check what sock.send returns. Even if something goes wrong with a particular chunk, you ignore it and try to send the next one. At first glance it appears that something was cached and the receiver got what was flushed once the connection was re-established.
So, the first thing you should do is check whether sock.send indeed returned the number of bytes you passed it; if not, the remainder should be re-sent. Another thing I strongly recommend in such cases is to design a small custom protocol (usually called the application layer in the OSI/ISO stack). For example, you might have 4 types of frames: START, FILESIZE, DATA, and END, assign each a unique ID, and begin every frame with that identifier. START would be empty, FILESIZE would contain a single uint16, DATA would contain {FILE NUMBER, LINE NUMBER, LINE_LENGTH, LINE}, and END would be empty. Then, once you've got an entire frame on the client, you can safely assemble the information you received.
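As a rough illustration of such a framing scheme (the frame IDs and header layout here are invented for the example, not any standard, and the payload encoding is left out):

import struct

# hypothetical frame IDs for this sketch
START, FILESIZE, DATA, END = 0, 1, 2, 3

def send_frame(sock, frame_id, payload=b''):
    # each frame: 1-byte ID + 4-byte big-endian payload length + payload
    sock.sendall(struct.pack('!BI', frame_id, len(payload)) + payload)

def recv_frame(sock):
    def recv_exactly(n):
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('socket closed mid-frame')
            buf += chunk
        return buf
    frame_id, length = struct.unpack('!BI', recv_exactly(5))
    return frame_id, recv_exactly(length)

With that in place, the receiver can wait for START, read FILESIZE, collect DATA frames, and treat END as the file boundary.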
It's worth mentioning up front that while I have a background in CS, the number of Python scripts I've written could likely be counted on the toes of a sloth's paw. That said, I started playing with PySerial to read from a USB barcode scanner. One problem I'm having is the timeout: if I set it too low, I miss scans; if I set it too high, the processor utilization is huge. Of course, this is mentioned in the documentation for PySerial:
Be careful when using readline(). Do specify a timeout when opening the serial port, otherwise it could block forever if no newline character is received. Also note that readlines() only works with a timeout. readlines() depends on having a timeout and interprets that as EOF (end of file). It raises an exception if the port is not opened correctly.
Right. So, here's my simple code:
#!/usr/bin/env python
import serial

ser = serial.Serial('/dev/ttyACM0', rtscts=True, dsrdtr=True, timeout=0.05)
ser.baudrate = 115200

while True:
    s = ser.readline()
    if s:
        print(s)
How do I appropriately read from a serial device without risking missed scans? Sure, the odds are incredibly low with that small a timeout, but I want to use this for production purposes at my business, so let's assume that this is mission-critical. What's the proper way to approach this problem (again, assuming that my understanding of Python is nil)?
Thanks, everyone!
EDIT: Possible solution?
I came up with the following, which doesn't use a timeout and simply reads a single character at a time until it reaches a newline. It seems pretty light on processor utilization (which was the whole issue I was having). Of course, I need to account for other newline possibilities from different scanners, but is there any reason why this wouldn't work?
#!/usr/bin/env python
import serial

ser = serial.Serial('/dev/ttyACM0', rtscts=True, dsrdtr=True)
ser.baudrate = 115200

string = ""
while 1:
    char = ser.read(1)
    string += char
    if char == '\r':
        print(string)
        string = ""
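One thing to watch, assuming you ever run this under Python 3: there ser.read() returns bytes, so string += char would fail. A bytes-based version of the same loop (a sketch, untested on real hardware) would be:

buf = b""
while True:
    char = ser.read(1)  # blocks until one byte arrives (no timeout set)
    buf += char
    if char == b'\r':
        print(buf)
        buf = b""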
From what I know about barcode scanners, you can configure them so that they only trigger a scan when you send them a specific write command over serial; you can use that to your advantage.
ser = serial.Serial('/dev/ttyUSBx', timeout=y)
ser.write('<trigger scan>')
value = ser.readline()
ser.close()
For continuous reading, the best way of doing it is to keep reading bytes in a timeout loop, like this:
import datetime

time_start = datetime.datetime.now()
time_end = time_start + datetime.timedelta(seconds=timeout)
output = []
while datetime.datetime.now() < time_end:
    output.append(ser.read(100))
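If you then need the whole scan as a single value afterwards, the collected chunks can be joined (assuming Python 3, where ser.read() returns bytes):

scan = b''.join(output)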
My experience with setting the timeout to a high value is the opposite of your assertion. A high timeout ensures that Python is not waking up to poll the serial buffer many times a second; that's the point of a serial buffer, to store input until it is read. Note that pyserial's timeout is in seconds, so timeout=0.05 means readline() can return empty and be called again up to 20 times per second while the line is idle. I set it to 10 seconds below (at most 6 wakeups per minute). Of course, if Python encounters a newline sooner, readline() returns without timing out.
#!/usr/bin/env python
import serial

ser = serial.Serial('/dev/ttyACM0', rtscts=True, dsrdtr=True, timeout=10)
ser.baudrate = 115200

while True:
    s = ser.readline()
    if s:
        print(s)
However, if your UART has no buffer and discards anything past one character, you could lose input; this depends on your hardware and setup. If the barcode fits in the buffer, you should not encounter any problems.
I'm working on a project using an STM32F407_VG board that uses an RS232 connection to send batches of data of different sizes (~400 bytes) over a serial port; those data must be written to a file. On the desktop side I'm using a Python 3 script with pyserial 3.3.
I've tried reading a single byte at a time with ser.read(), but I think it's too slow, because I'm losing some of the data. So I'm now sending the size of the batch as an integer before the batch itself, in order to reduce the overhead, and writing the data to file in the interval between one batch and the next.
PROBLEM IS: ser.read(n) behaves in a very strange way, and 99% of the time it blocks when it's time to read the batch and does not return. Sometimes it reads the first batch and writes it to file successfully but then blocks on the second loop iteration. It's strange, because I can use ser.read(4) to get the batch size with zero problems, and I use ser.readline() at the beginning of the script when listening for the start signal, but I cannot read the data.
I'm sure the data is there and is well formed, because I checked with a logic analyzer, and I've already tried enabling and disabling flow control and setting different baud rates on both the board and in the script. I think it could be a configuration problem on the Python side, but I've actually run out of ideas.
PYTHON SCRIPT CODE SNIPPET
import sys
import serial

ser = serial.Serial(str(sys.argv[1]),
                    int(sys.argv[2]),
                    stopbits=serial.STOPBITS_ONE,
                    parity=serial.PARITY_NONE,
                    bytesize=serial.EIGHTBITS,
                    timeout=None)

outputFile = open(sys.argv[3], "wb")

# wait for begin string
beginSignal = "ready"
word = ""
while word != beginSignal:
    word = ser.readline().decode()
    word = word.split("\n")[0]
print("Started receiving...")

while True:
    # read size of next batch
    nextBatchSize = ser.read(4)
    nextBatchSize = int.from_bytes(nextBatchSize, byteorder='little', signed=True)
    # reads the batch:
    # THIS IS THE ONE THAT CREATES PROBLEMS
    batch = ser.read(nextBatchSize)
    # write data to file
    outputFile.write(batch)
BOARD CODE SNIPPET
// this function sends the size of the batch and the batch itself
void sendToSerial(unsigned char* mp3data, int size){
    // send actual size of the batch
    write(STDOUT_FILENO, &size, sizeof(int));
    // send the batch of data
    write(STDOUT_FILENO, mp3data, size);
}
Any idea? Thanks!
You might have already tried this, but print out nextBatchSize to confirm that it is what you expect, just in case the byte order is reversed. If it is wrong, your Python code could be trying to read far too many bytes, and would therefore block.
Also you can check ser.in_waiting to see how many bytes are available to be read from the input buffer before your read attempt:
print(ser.in_waiting)
batch = ser.read(nextBatchSize)
You should also check the return value of write() in your C code, which is the number of bytes actually written, or -1 if there was an error. write() is not guaranteed to write all of the bytes in the given buffer, so some might remain unwritten; or there might have been an error.
Data loss suggests a flow control issue. You need to ensure that it is enabled at both ends. You've said that you tried it, but in your posted code it is not enabled. How are you opening and configuring the serial port at the board end?
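For reference, on the pyserial side hardware (RTS/CTS) flow control is just a flag when opening the port; the board side depends on how your UART is configured. A minimal sketch, reusing the command-line arguments from the question's script:

# assumption: same argv layout as the question's script
ser = serial.Serial(str(sys.argv[1]), int(sys.argv[2]), rtscts=True)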
So I am very new to networking, and I was using the Python socket library to connect to a server that is transmitting a stream of location data.
Here is the code used.
import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))

try:
    while True:
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        print(data)
except KeyboardInterrupt:
    s.close()
The issue is that the data arrives in inconsistent forms.
Most of the time it arrives in the correct form, like this:
2016-01-21 22:40:07,441,-84.404153,33.778685,5,3
Yet other times it can arrive split up into two lines like so:
2016-01-21
22:40:07,404,-84.396004,33.778085,0,0
The interesting thing is that when I establish a raw connection to the server using PuTTY, I only ever get the correct form and never the split one. So I imagine that something must be splitting the message, or PuTTY is doing something to always assemble it correctly.
What I need is for the variable data to contain the proper line always. Any idea how to accomplish this?
It is best to think of a socket as a continuous stream of data that may arrive in dribs and drabs, or in a flood.
In particular, it is the receiver's job to break the data up into the "records" it should consist of; the socket does not magically know how to do this for you. Here the records are lines, so you must read the data and split it into lines yourself.
You cannot guarantee that a single recv will be a single full line. It could be:
just part of a line;
or several lines;
or, most probably, several lines and another part line.
Try something like this (untested):
# we'll use this to collate partial data
data = ""
while 1:
    # receive the next batch of data
    chunk = s.recv(BUFFER_SIZE).decode('utf-8')
    if not chunk:
        break  # socket closed
    data += chunk
    # split the data into lines
    lines = data.splitlines(keepends=True)
    # the last of these may be a part line
    full_lines, last_line = lines[:-1], lines[-1]
    # print (or do something else!) with the full lines
    for l in full_lines:
        print(l, end="")
    # was the last line received a full line, or just half a line?
    if last_line.endswith("\n"):
        # print it (or do something else!)
        print(last_line, end="")
        # and reset our partial data to nothing
        data = ""
    else:
        # reset our partial data to this part line
        data = last_line
The easiest way to fix your code is to print the received data without adding a newline; the print statement (Python 2) and the print() function (Python 3) add one by default. Like this:
Python 2:
print data,
Python 3:
print(data, end='')
Now print will not add its own newline character to the end of each printed value, and only the newlines present in the received data will be printed. The result is that each line is printed whole, without being split according to how much data each socket.recv() call happened to return. For example:
from __future__ import print_function
import socket

s = socket.socket()
s.connect(('gump.gatech.edu', 756))

while True:
    data = s.recv(3).decode('utf8')
    if not data:
        break  # socket closed, all data read
    print(data, end='')
Here I have used a very small buffer size of 3 which helps to highlight the problem.
Note that this only fixes the problem from the POV of printing the data. If you wanted to process the data line by line, you would need to do your own buffering of the incoming data and process a line whenever you receive a newline or the socket is closed.
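For completeness, a compact sketch of that buffering (handle_line is a hypothetical callback standing in for whatever per-line processing you need):

buf = ''
while True:
    chunk = s.recv(1024).decode('utf8')
    if not chunk:
        break  # socket closed
    buf += chunk
    while '\n' in buf:
        line, _, buf = buf.partition('\n')
        handle_line(line)  # hypothetical handler for one complete line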
Edit:
socket.recv() is blocking, and as the others said, you won't get an exact line each time you call the method. The socket waits for data, gets what it can, and then returns. When you print this, Python's default end argument means you may get more newlines than you expected. So to get the raw stream from your server, use this:
import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))

try:
    while True:
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        if not data:
            break
        print(data, end="")
except KeyboardInterrupt:
    s.close()
Is there a way to read only the current data from stdin?
I would like to pipe some never-ending input data (from a mouse-like device) into a Python script and grab only the most recent line of data.
The input x,y data looks like this and arrives at 600 lines per second:
0.123,0.123
0.244,0.566
etc.
So far I have tried something like this:
import sys, time

while 1:
    data = sys.stdin.readline()
    my_slow_function(data)
Python seems to buffer the data so nothing is skipped. I would like to skip everything except the current line.
Just spin up a separate thread to read stdin into a global variable. Make it a daemon thread so that you don't have to close it later on. The thread reads the data as it arrives and keeps discarding the old stuff. Have your regular program read last_line when it wants to.
I added an event so that the regular program can wait when no new data is available. If that's not what you want, take it out.
import sys
import threading

last_line = ''
new_line_event = threading.Event()

def keep_last_line():
    global last_line, new_line_event
    for line in sys.stdin:
        last_line = line
        new_line_event.set()

keep_last_line_thread = threading.Thread(target=keep_last_line)
keep_last_line_thread.daemon = True
keep_last_line_thread.start()
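A possible consumer loop for the main thread (my_slow_function standing in for the processing from the question):

while True:
    new_line_event.wait()  # block until the reader thread stores a fresh line
    new_line_event.clear()
    my_slow_function(last_line)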
Keep the current line, only act on the last line.
buffer = None
for line in sys.stdin:
    buffer = line
my_slow_function(buffer)