Unexpected behaviour of read(n) in pySerial 3.3

I'm writing a project using an STM32F407_VG board that uses an RS232 connection to send batches of data of varying size (~400 bytes) over a serial port; the data must be written to a file. On the desktop side I'm using a Python 3 script with pySerial 3.3.
I've tried reading a single byte at a time with ser.read(), but I think it's too slow because I'm losing some of the data. So I'm now sending the size of each batch as an integer before the batch itself, to reduce the overhead, and writing the data to the file in the interval between one batch and the next.
THE PROBLEM: ser.read(n) behaves very strangely; 99% of the time it blocks when it's time to read the batch and never returns. Sometimes it reads the first batch and writes it to the file successfully, but then blocks on the second loop iteration. It's strange because I can use ser.read(4) to get the batch size with no problem, and I use ser.readline() at the beginning of the script when listening for a start signal, but I cannot read the data.
I'm sure the data is there and well formed because I checked with a logic analyzer, and I've already tried enabling and disabling flow control and setting different baud rates on both the board and the script. I think it could be a configuration problem on the Python side, but I've run out of ideas.
PYTHON SCRIPT CODE - SNIPPET
import sys
import serial

ser = serial.Serial(sys.argv[1],
                    int(sys.argv[2]),
                    stopbits=serial.STOPBITS_ONE,
                    parity=serial.PARITY_NONE,
                    bytesize=serial.EIGHTBITS,
                    timeout=None)
outputFile = open(sys.argv[3], "wb")
# wait for begin string
beginSignal = "ready"
word = ""
while word != beginSignal:
    word = ser.readline().decode()
    word = word.split("\n")[0]
print("Started receiving...")
while True:
    # read size of next batch
    nextBatchSize = ser.read(4)
    nextBatchSize = int.from_bytes(nextBatchSize, byteorder='little', signed=True)
    # reads the batch:
    # THIS IS THE ONE THAT CREATES PROBLEMS
    batch = ser.read(nextBatchSize)
    # write data to file
    outputFile.write(batch)
BOARD CODE - SNIPPET
// this function sends the size of the batch and the batch itself
void sendToSerial(unsigned char* mp3data, int size){
    // send actual size of the batch
    write(STDOUT_FILENO,&size,sizeof(int));
    // send the batch of data
    write(STDOUT_FILENO,mp3data,size);
}
Any idea? Thanks!

You might have already tried this, but print out nextBatchSize to confirm that it is what you expect, just in case the byte order is reversed. If this is wrong your Python code could be trying to read too many bytes, and would therefore block.
Also you can check ser.in_waiting to see how many bytes are available to be read from the input buffer before your read attempt:
print(ser.in_waiting)
batch = ser.read(nextBatchSize)
You should also check the return value of write() in your C code, which is the number of bytes actually written, or -1 if there is an error. write() is not guaranteed to write all of the bytes in the given buffer, so there might remain some that have not been written. Or there might have been an error.
Data loss suggests a flow control issue. You need to ensure that it is enabled at both ends. You've said that you tried it, but in your posted code it is not enabled. How are you opening and configuring the serial port at the board end?
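One way to keep a short or wrong batch size from hanging the script forever is to open the port with a finite timeout and loop until the expected number of bytes has arrived. The sketch below assumes only that the port object has a read(n) method that may return fewer than n bytes (as pySerial does when a timeout is set); read_exact, deadline_s and FakePort are hypothetical names used for illustration, not part of pySerial.

```python
import time


def read_exact(port, n, deadline_s=2.0):
    """Collect exactly n bytes from port.read(), or raise TimeoutError."""
    chunks = []
    remaining = n
    deadline = time.monotonic() + deadline_s
    while remaining > 0:
        data = port.read(remaining)  # may return fewer bytes, or b'' on timeout
        if data:
            chunks.append(data)
            remaining -= len(data)
        elif time.monotonic() > deadline:
            raise TimeoutError(f"wanted {n} bytes, got {n - remaining}")
    return b"".join(chunks)


class FakePort:
    """Hypothetical stand-in for a serial port that delivers data in small pieces."""

    def __init__(self, payload, piece=3):
        self._payload = payload
        self._piece = piece

    def read(self, size=1):
        take = min(size, self._piece)
        out, self._payload = self._payload[:take], self._payload[take:]
        return out


# Same framing as the question: 4-byte little-endian size, then the batch.
port = FakePort((5).to_bytes(4, "little") + b"hello")
size = int.from_bytes(read_exact(port, 4), "little")
batch = read_exact(port, size)
```

With a real port, something like serial.Serial(name, baud, timeout=1) would replace FakePort; a TimeoutError then points at a size mismatch or lost bytes instead of a silent hang.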

Related

Why these Python send / receive socket functions work if invoked slowly, but fail if invoked quickly in a row?

I have a client and a server, where the server needs to send a number of text files to the client.
The send file function receives the socket and the path of the file to send:
import os

CHUNKSIZE = 1_000_000
def send_file(sock, filepath):
    with open(filepath, 'rb') as f:
        sock.sendall(f'{os.path.getsize(filepath)}'.encode() + b'\r\n')
        # Send the file in chunks so large files can be handled.
        while True:
            data = f.read(CHUNKSIZE)
            if not data:
                break
            sock.send(data)
And the receive file function receives the client socket and the path where to save the incoming file:
CHUNKSIZE = 1_000_000
def receive_file(sock, filepath):
    with sock.makefile('rb') as file_socket:
        length = int(file_socket.readline())
        # Read the data in chunks so it can handle large files.
        with open(filepath, 'wb') as f:
            while length:
                chunk = min(length, CHUNKSIZE)
                data = file_socket.read(chunk)
                if not data:
                    break
                f.write(data)
                length -= len(data)
        if length != 0:
            print('Invalid download.')
        else:
            print('Done.')
It works by sending the file size as the first line, then sending the text file line by line.
Both are invoked in loops in the client and the server, so that files are sent and saved one by one.
It works fine if I put a breakpoint and invoke these functions slowly. But if I let the program run uninterrupted, it fails when reading the size of the second file:
  File "/home/stark/Work/test/networking.py", line 29, in receive_file
    length = int(file_socket.readline())
ValueError: invalid literal for int() with base 10: b'00,1851,-34,-58,782,-11.91,13.87,-99.55,1730,-16,-32,545,-12.12,19.70,-99.55,1564,-8,-10,177,-12.53,24.90,-99.55,1564,-8,-5,88,-12.53,25.99,-99.55,1564,-8,-3,43,-12.53,26.54,-99.55,0,60,0\r\n'
Clearly a lot more data is being received by that length = int(file_socket.readline()) line.
My questions: why is that? Shouldn't that line read only the size given that it's always sent with a trailing \n?
How can I fix this so that multiple files can be sent in a row?
Thanks!

It seems you're reusing the same connection, and because file_socket is buffered, you've actually received more from the socket than your read loop consumed.
In other words, the buffered reader pulls more data from the socket than you asked for, and that read-ahead is gone once the wrapper is closed. The next time you call readline() on a new wrapper, you end up reading the rest of an earlier file, up to the next newline it contains or up to the next length line.
This also means the real damage happened earlier than the point where the error shows up: the line you read next is not the int you expected, hence the observed failure.
You can write:
with sock.makefile('rb', buffering=0) as file_socket:
instead, to force the file-like access to be unbuffered. Or handle the receiving, buffering, and parsing of incoming bytes yourself (working out where one file ends and the next begins) instead of relying on the file-like wrapper and readline().
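Another option, not mentioned in the answer but following from the same reasoning, is to create the buffered wrapper once per connection and reuse it for every file, so that whatever it has read ahead is still there for the next call. The sketch below is a minimal illustration under that assumption; send_blob and receive_blob are hypothetical helpers, and socket.socketpair() stands in for a real client/server connection.

```python
import socket


def send_blob(sock, payload):
    # Length line first, then the raw bytes; sendall() retries partial sends.
    sock.sendall(f"{len(payload)}".encode() + b"\r\n" + payload)


def receive_blob(reader):
    # reader is ONE buffered file object shared by every call, so bytes it
    # has already read ahead stay available for the next blob.
    length = int(reader.readline())
    data = reader.read(length)
    if len(data) != length:
        raise ConnectionError("stream ended early")
    return data


a, b = socket.socketpair()
with b.makefile("rb") as reader:
    # Two "files" sent back to back, with no pause between them.
    send_blob(a, b"1,2,3\r\n4,5,6\r\n")
    send_blob(a, b"second file")
    first = receive_blob(reader)
    second = receive_blob(reader)
a.close()
b.close()
```

The payload may itself contain newlines without confusing the reader, because only the length line is parsed with readline(); the body is read by byte count.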

You have to understand that socket communication here is based on TCP/IP, regardless of whether both ends are on the same machine (in which case the loopback interface is used) or on different machines. So there are IP addresses between which the connection is established. Going further, it involves accessing your network adapter, which takes relatively long compared with accessing, e.g., RAM. Additionally, the adapter itself decides when to send particular data frames (lower ISO/OSI layers). TCP does require ACKs, but a standard PC is usually not running some industrial, real-time Ethernet.
In your code you've got a while True loop without any sleep, and you don't check what sock.send returns. Even if something goes wrong with a particular chunk, you ignore it and try to send the next one. At first glance it appears that something was cached and the receiver got what was flushed once the connection was re-established.
So the first thing you should do is check whether sock.send actually returned the number of bytes you passed in. If not, I believe the unsent part should be sent again (sock.sendall does this for you). Another thing I strongly recommend in such cases is to define a small custom protocol (usually called the application layer in the context of the OSI/ISO stack). For example, you might have four types of frames: START, FILESIZE, DATA, and END; assign each a unique ID and start each frame with that identifier. START would be empty, FILESIZE would contain a single uint16, DATA would contain {FILE NUMBER, LINE NUMBER, LINE_LENGTH, LINE}, and END would be empty. Then, once you have an entire frame on the client, you can safely assemble the information you received.
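A framed protocol of this kind could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's exact layout: the numeric frame IDs are arbitrary, and each frame carries an explicit 4-byte payload length after the type byte (a simplification of the per-type fields described above) so the receiver can tell where a frame ends. pack_frame and unpack_frames are hypothetical names.

```python
import struct

# Hypothetical frame IDs; the values are arbitrary choices for illustration.
START, FILESIZE, DATA, END = 1, 2, 3, 4
HEADER = struct.Struct("!BI")  # 1-byte frame type, 4-byte payload length


def pack_frame(frame_type, payload=b""):
    """Prefix the payload with its frame type and length."""
    return HEADER.pack(frame_type, len(payload)) + payload


def unpack_frames(buffer):
    """Return (complete_frames, leftover_bytes) from a received byte buffer."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        frame_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((frame_type, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


stream = (
    pack_frame(START)
    + pack_frame(FILESIZE, struct.pack("!H", 11))
    + pack_frame(DATA, b"hello world")
    + pack_frame(END)
)
# Simulate the last 3 bytes arriving in a later recv() call.
frames, leftover = unpack_frames(stream[:-3])
more_frames, rest = unpack_frames(leftover + stream[-3:])
```

The receiver keeps appending whatever recv() returns to the leftover bytes and calls unpack_frames again, so it never depends on how the data happened to be split in transit.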

Python - Reading Serial Data that is constantly sending for parsing

I have a control box and a Raspberry Pi which communicate over Serial (Serial to RJ45), and I need the commands sent from the control box which are sent every 50ms. I am able to read the code, but here's the issue. When I start reading, the starting byte is incorrect so I am unable to parse it.
For example (The output I am currently getting):
b'\0x21\0x21\0x98\0x98\0x21\0x21\0x18\0x12\0x21\0x12\0x02\0x32\0x11
The starting byte I need has to be 0x98, so I need it to be like this
b'\0x98\0x98\0x21\0x21\0x18\0x12\0x21\0x12\0x02\0x32\0x11\0x12\0x11
I need it this way so I can parse the line and say grab Byte[4]-(0x21) or something like that.
In terms of research, I ran into Struct. I have no idea how to use this though, and I have no idea if I even need to use it.
I currently don't have a full version of the code on me at this moment, but here is a quick example of what I currently have:
import serial
import time
port = serial.Serial("/dev/ttyS0", baudrate=9600)
while True:
    output = port.read(13) # --- In Total there are 13 Bytes
    print(output)

Since you are getting another lot of data every 50 ms, you need to be able to sync with the start of the data:
buffer = b''
header = b'\x98\x98'  # the two start bytes
while True:
    if port.in_waiting:
        buffer += port.read(port.in_waiting)
    # drop leading bytes until the buffer starts with the header
    while len(buffer) >= 2:
        if buffer.startswith(header):
            break
        buffer = buffer[1:]
    if len(buffer) >= 13:
        print(buffer[:13])  # or otherwise process latest buffer
        buffer = buffer[13:]
This code starts with an empty buffer and then reads whatever data arrives at the serial port. While the buffer does not start with the two header bytes, any excess at the front is discarded. Once the buffer starts with the right header and is long enough, the 13 bytes are printed here (but you might want to call another function to process a whole packet), and then that packet is thrown away, ready to start with whatever arrives next.
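The same resynchronisation idea can be pulled into a small function that is easy to test without hardware. This is a sketch, assuming 13-byte packets that begin with two 0x98 bytes as described in the question; extract_packets is a hypothetical helper name.

```python
HEADER = b"\x98\x98"  # two start bytes, as described in the question
PACKET_SIZE = 13


def extract_packets(buffer):
    """Split buffer into complete packets plus the unconsumed remainder."""
    packets = []
    while True:
        start = buffer.find(HEADER)
        if start < 0:
            # Keep a trailing 0x98: it may be the first half of a header.
            if buffer.endswith(HEADER[:1]):
                return packets, buffer[-1:]
            return packets, b""
        if len(buffer) - start < PACKET_SIZE:
            return packets, buffer[start:]  # wait for the rest of the packet
        packets.append(buffer[start:start + PACKET_SIZE])
        buffer = buffer[start + PACKET_SIZE:]


packet = b"\x98\x98" + bytes(range(11))
# Two stray bytes, one whole packet, then the first 5 bytes of the next one.
packets, remainder = extract_packets(b"\x21\x21" + packet + packet[:5])
```

Indexing a bytes object gives an int, so a field can then be read as, for example, packets[0][4]. The struct module is only needed when several bytes have to be combined into multi-byte values.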

PySerial skips/loses data on serial acquisition

I have a data acquisition system that produces ASCII data. The data is acquired over USB using a serial communication protocol (virtual serial, according to the manufacturer of the box). I have a Python program that uses PySerial with a PySide GUI to plot the acquired data and save it to HDF5 files. I'm having a very weird problem and I don't know how to tackle it. I hope you can help me and advise how you would debug it.
How the problem shows up: if I use software such as Eltima Data Logger, the acquired data looks fine. However, if I use my own software (with PySerial), some chunks of the data seem to be missing. What's weird is that the missing chunks don't match the way I read. I read line by line, yet what is missing is chunks of around 100 or 64 bytes that sometimes include newlines! I know what's missing because the device buffers the data on an SD card before sending it to the computer. For a long time this made me believe the hardware had a problem, until I used Eltima, which showed that the data is acquired fine.
The following is the configuration of Eltima:
My configuration:
This whole thing is running in a QThread.
The following are the methods I use in my code (with some minor polishing to make it reusable here):
self.obj = serial.Serial()
self.obj.port = instrumentName
self.obj.baudrate = 115200
self.obj.bytesize = serial.EIGHTBITS
self.obj.parity = serial.PARITY_ODD
self.obj.stopbits = serial.STOPBITS_ONE
self.obj.timeout = 1
self.obj.xonxoff = False
self.obj.rtscts = False
self.obj.dsrdtr = False
self.obj.writeTimeout = 2
self.obj.open()
The algorithm I use for reading, is that I have a loop that looks for a specific header line, and once found, it keeps pushing lines into buffer until a specific end line is found; and this data is finally processed. Following is my code:
try:
    # keep reading until a header line is found that indicates the beginning of a batch of data
    while not self.stopped:
        self.line = self.readLine()
        self.writeDumpFileLine(self.line)
        if self.line == DataBatch.d_startString:
            print("Acquiring batch, line by line...")
            self.dataStrQueue.append(self.line)
            break
    # after the header line, keep reading until a specific string is found
    while not self.stopped:
        self.line = self.readLine()
        self.writeDumpFileLine(self.line)
        self.dataStrQueue.append(self.line)
        if self.line == DataBatch.d_endString:
            break
except Exception as e1:
    print("Exception while trying to read. Error: " + str(e1))
The self.writeDumpFileLine() takes the line from the device and dumps it in a file directly before processing for debugging purposes. These dump files have confirmed the problem of missing chunks.
The implementation of self.readLine() is quite simple:
def readLine(self):
    lineData = decodeString(self.obj.readline())
    lineData = lineData.replace(acquisitionEndlineChar, "")
    return lineData
I would like to point out that I also have an implementation that pulls thousands of lines and parses them based on inWaiting(), and this method has the same problem too!
Now I'm starting to wonder: Is it PySerial? What else could be causing this problem?
Thank you so much for any efforts. If you require any additional information, please ask!
UPDATE:
Actually I have just confirmed that the problem can be reproduced by getting the system to lag a little bit. I use PyCharm to program this software, and while the program is running, if I press Ctrl+S to save, the GUI of PyCharm freezes a little bit (and hence its terminal). Repeating this many times causes the problem in a reproducible manner!!!!

pySerial read() only reads up to 116 characters? Is this an issue with PythonSerial or the PLC I'm connecting to?

I am writing a command to my PLC to read 50 registers each register returning 4 characters of information (and around 8 characters for other information). For some reason though, the most I can read is 27 registers using
datarecv = ser.read(116)
where normally I'd want to use ser.read(208), but this freezes the program for some reason. A workaround I use is to do two separate reads, one reading 26 registers and the other reading 24. What is the reason for the limit above, and is it possible to get all the information in one read?

How long does it take to read all registers when you use 2 separate reads?
To debug the problem you may try the following:
1) Set a reasonable timeout for the connection, say 10 seconds or however long it takes:
ser = serial.Serial(comDev, 115200, timeout=10)
2) Write your command and don't flush anything.
3) Read from the serial port character by character and, by checking the buffer, see where it hangs (it shouldn't, unless your PLC needs additional writes or something else before it returns all registers):
buffer = []
while True:
    ch = ser.read()
    if not ch:  # empty result: the timeout expired with no data
        break
    buffer.append(ch)
    print(buffer)

TCP Socket file transfer

I'm trying to write a secure file transfer program using Python and AES, and I've got a problem I don't totally understand. I send my file by splitting it into 1024-byte chunks and sending them over, but the server side that receives the data crashes (I use AES CBC, so my data length must be a multiple of 16 bytes) and the error I get says that it is not.
I printed the length of the data sent on the client side and the length of the data received on the server. The client sends exactly 1024 bytes each time, as it's supposed to, but the server shows that at some point a received packet is shorter than 1024 bytes (for example 743 bytes).
I tried putting a time.sleep(0.5) between each socket send on the client side and it seems to work. Is it possible that this is some kind of socket buffer failure on the server side? That too much data is being sent too fast by the client, and that this somehow breaks the socket buffer on the server side, so the data is corrupted or vanishes and recv(1024) only receives a broken chunk? That's the only thing I could think of, but it may also be completely wrong. If anyone has an idea why this is not working properly, that would be great ;)
Following my idea I tried:
self.s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32768000)
print socket.SO_RCVBUF
I tried to set a 32 MB buffer on the server side, but on Windows XP the print shows 4098 and on Linux it shows only 8. I don't know how to interpret this; the only thing I know is that it doesn't seem to have a 32 MB buffer, so the code doesn't work.
Well, it's been a really long post; I hope some of you had the courage to read it all the way here! I'm totally lost, so if anyone has any idea about this, please share it :D
Thanks to Faisal my code is here :
Server Side: ( count is my filesize/1024 )
while 1:
    txt=self.s.recv(1024)
    if txt == " ":
        break
    txt = self.cipher.decrypt(txt)
    if countbis == count:
        txt = txt.rstrip()
    tfile.write(txt)
    countbis+=1
Client side :
while 1:
    txt= tfile.read(1024)
    if not txt:
        self.s.send(" ")
        break
    txt += ' ' * (-len(txt) % 16)
    txt = self.cipher.encrypt(txt)
    self.s.send(txt)
Thanks in advance,
Nolhian

Welcome to network programming! You've just fallen into the same mistaken assumption that everyone makes the first time through: assuming that client sends and server receives should be symmetric. Unfortunately, this is not the case. The OS allows reception to occur in arbitrarily sized chunks. It's fairly easy to work around, though: just buffer your data until the amount you've read equals the amount you wish to receive. Something along these lines will do the trick:
buff = b''
while len(buff) < 1024:
    data = s.recv(1024 - len(buff))
    if not data:
        break  # connection closed before 1024 bytes arrived
    buff += data

TCP is a stream protocol; it doesn't preserve message boundaries, as you have just discovered.

As others have pointed out you're probably processing an incomplete message. You need to either have fixed sized messages or have a delimiter (don't forget to escape your data!) so you know when a complete message has been received.

What TCP can guarantee is that all your data arrives, in the right order, at some point. (Unless something unexpected happens, in which case it won't arrive.) But it's very possible that the data you send will still arrive in chunks, largely because of limited send and receive buffers. What you should do is keep calling recv until you have enough data to process. You might also have to call send multiple times; use its return value to keep track of how much data has been sent/buffered so far.
When you do print socket.SO_RCVBUF, you actually print the symbolic SO_RCVBUF constant (except that Python doesn't really have constants), the one used to tell setsockopt what you want to change. To get the current value, you should instead call getsockopt.
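For illustration, the difference between the option identifier and the buffer size actually in effect can be seen with a few lines. This is a minimal sketch; the requested size of 65536 is an arbitrary value chosen for the example, and the operating system is free to adjust whatever is requested.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_RCVBUF itself is just the option's numeric ID, not a buffer size.
option_id = socket.SO_RCVBUF

# Request a receive buffer size (arbitrary example value).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)

# getsockopt returns the size the OS actually applied, which may differ
# from the requested value.
actual_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(option_id, actual_size)
sock.close()
```

The small number printed for the constant varies by platform, which is consistent with the different values seen on Windows XP and Linux in the question.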

Not related to TCP (as that has been answered already), but appending to a string repeatedly will be rather inefficient if you're expecting to receive a lot. It might be better to append to a list and then turn the list into a string when you finished receiving by using ''.join(list).
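Putting the two suggestions together (keep calling recv until enough data has arrived, and collect the pieces in a list), a receive helper might look like the sketch below. recv_exact is a hypothetical name, and socket.socketpair() stands in for the real client/server connection so the example runs on its own.

```python
import socket


def recv_exact(sock, size):
    """Receive exactly size bytes, collecting the pieces in a list."""
    pieces = []
    remaining = size
    while remaining:
        piece = sock.recv(remaining)
        if not piece:
            raise ConnectionError(f"peer closed with {remaining} bytes missing")
        pieces.append(piece)
        remaining -= len(piece)
    return b"".join(pieces)  # one join instead of repeated concatenation


left, right = socket.socketpair()
# Two 1024-byte blocks sent back to back, as in the question's client loop.
left.sendall(b"A" * 1024 + b"B" * 1024)
first_block = recv_exact(right, 1024)
second_block = recv_exact(right, 1024)
left.close()
right.close()
```

Because each call returns exactly 1024 bytes regardless of how the stream was split, every block handed to the cipher stays a multiple of 16 bytes.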

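For many applications, the complexities of TCP are neatly abstracted by Python's asynchat module. (Note that asynchat has been deprecated since Python 3.6 and was removed in Python 3.12; asyncio is the suggested replacement in current versions.)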

Here is a snippet of code that I wrote some time ago. It may not be the best, but it could be a good example of transferring big files over a local network: http://setahost.com/sending-files-in-local-network-with-python/

As mentioned above
TCP is a stream protocol
You can try this code, where the data is your original data, you can read it from the file or user input
Sender
import socket as s
import time as t
sock = s.socket(s.AF_INET, s.SOCK_STREAM)
sock.connect((addr, 5000))  # addr: the receiver's host name or IP address
sock.sendall(data)          # data: the bytes to send
finish = t.time()
Receiver
import socket as s
sock = s.socket(s.AF_INET, s.SOCK_STREAM)
sock.setsockopt(s.SOL_SOCKET, s.SO_REUSEADDR, 1)
sock.bind(("", 5000))
sock.listen(1)
conn, _ = sock.accept()
pack = []
while True:
    piece = conn.recv(8192)
    if not piece:
        break
    pack.append(piece.decode())
