struct.unpack_from() buffer reading issue with data longer than 255 bytes


I am using zmq to pass messages back and forth between a client and a server. I am using pickle to deserialize the message, so I can parse information out of it. When I send a message that is less than 256 bytes, everything works as expected, and I can grab the message ID from the 12th byte. However, if I send a message with a buffer length of more than 255 bytes, struct.unpack_from doesn't seem to be reading the fields correctly, and gives the wrong message ID. I printed out the bytes, and they are still being sent correctly. It seems like struct.unpack_from can no longer properly find the 12th byte. Somehow the size is changing its behavior. Suggestions?
Client:
buf = ctypes.create_string_buffer(256)
struct.pack_into("!qII", buf, 0, message.Timestamp, message.MessageID, message.Payload)
# Send reply back to client with version
socket.send(buf)
Server:
message = socket.recv()
message = pickle.dumps(message) # Serializes object
(timestamp, message_ID) = struct.unpack_from('!qi', message, 4) # Start reading on 4th byte (pickle adds header)

Q : Suggestions?
Just follow the documented behaviour of the struct module.
The current code goes wrong on both sides: the data is packed with one layout in struct.pack_into(), and struct.unpack_from() then tries to read it back with a different layout, and from a different offset, than the one used to assemble the payload.
struct.pack_into( "!qII",            # packing LAYOUT, "!" = BIG_ENDIAN byte-ordering ( "network"-alike )
                  buf,               # BUFFER[256]
                  0,                 # start at offset 0
                  message.Timestamp, # a: q = ( signed ) long long, 8 B ~ int64
                  message.MessageID, # b: I = unsigned int,         4 B ~ uint32
                  message.Payload    # c: I = unsigned int,         4 B ~ uint32
                  )
# Packed bytes ( offsets in hex ):
#   |0.1.2.3.4.5.6.7|8.9.A.B|C.D.E.F|
#   |<----- a ----->|<- b ->|<- c ->|  c gets just 4 bytes
# Send reply back to client w version
socket.send( buf )                   # if len( message.Payload ) > 4 ... Houston, we have a problem ...
#
# ... the payload is transported over the network and delivered unchanged ...
#
message = socket.recv()
( timestamp,
  message_ID
  ) = struct.unpack_from( '!qi',     # unpacking LAYOUT, "!" = BIG_ENDIAN byte-ordering ( "network"-alike )
                          message,   # PAYLOAD-DATA
                          4          # start at offset 4 (!)
                          )          # q = ( signed ) long long, 8 B ~ int64; i = signed int, 4 B ~ int32
# Delivered bytes ( offsets in hex ):
#   |0.1.2.3|4.5.6.7.8.9.A.B|C.D.E.F|
#           |<- timestamp ->|<- ID ->|
#   timestamp  <- last 4 bytes of a, followed by all 4 bytes of b
#   message_ID <- the 4 bytes of c
So, unless there is some hidden intent to cross-breed a mix of bytes (taking the last 4 bytes of the sender's int64 message.Timestamp, joining them with the 4 bytes of the sender's uint32 message.MessageID, decoding that mix as a new int64 into the receiver's timestamp, and then decoding the remaining 4 bytes, which were message.Payload on the sending side, into message_ID), we will have to repair the code, at least to avoid the current byte cross-breeding.
So, either use the very same struct format-string TEMPLATE on both sides, or design an adaptive data-storage layout in which the PAYLOAD assembler puts the correct byte length into a known, fixed position (best coded as the first I-uint32, or as a 10s-string representation if easy, human-readable wireline packet inspection should be kept).
The PAYLOAD receiver will then first decode the data at this known, fixed position to learn the actual length of the whole PAYLOAD, and as a second step it will adaptively compose a format-string TEMPLATE that matches the learned length:
...
TEMPLATE_MASK = "!10s I I {0:}s"
...
aRecvdMSG = socket.recv()
TEMPLATE  = TEMPLATE_MASK.format( int( struct.unpack_from( "!10s", aRecvdMSG )[0] ) # unpack_from() reads just the leading field
                                  - 10 # SUB _10s_ for _str__ used for advice of payload length
                                  -  4 # SUB ___I_ for _int32_ used for ...
                                  -  4 # SUB ___I_ for _int32_ used for ID
                                  )
...
( aPayloadByteLENGTH,
  aVersionNUM,
  aMessageID,
  aPayload,                            # the TEMPLATE has four fields, so four names are needed
  ) = struct.unpack( TEMPLATE, aRecvdMSG )
The same principle applies on the sending Python side, where the PAYLOAD assembly can use the very same adaptively composed TEMPLATE_MASK.format(...) call, so that struct.pack() gets all fields in order and well aligned. This avoids both the buffer overflows described above and the order/alignment-related "cross-breeding", where wrongly mapped bytes from the delivered PAYLOAD end up in the hands of a receiving-side Python interpreter that trusts them blindly.
Having used this for more than a decade for many-item message-PAYLOAD assembly/disassembly in low-latency TAT distributed-computing systems, I can say you can rely on re-using this know-how for any less demanding ZeroMQ-served data interchange where data portability must be guaranteed.
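The length-prefixed idea above can be sketched roughly as follows. This is only an illustration under assumptions: the header layout (three uint32 fields: total length, version, message ID) and the helper names pack_message / unpack_message are hypothetical, not taken from the question.

```python
import struct

# Hypothetical fixed header: total length, version, message ID (uint32, big-endian).
HEADER = "!III"
HEADER_SIZE = struct.calcsize(HEADER)


def pack_message(version, message_id, payload):
    """Pack the header followed by a variable-length payload (bytes)."""
    total = HEADER_SIZE + len(payload)
    fmt = "{}{}s".format(HEADER, len(payload))
    return struct.pack(fmt, total, version, message_id, payload)


def unpack_message(raw):
    """Read the length field first, then build the matching format string."""
    (total,) = struct.unpack_from("!I", raw, 0)
    fmt = "{}{}s".format(HEADER, total - HEADER_SIZE)
    _, version, message_id, payload = struct.unpack(fmt, raw[:total])
    return version, message_id, payload
```

Because both sides derive the format string from the same header definition, the payload can grow past 255 bytes without the field offsets shifting.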

Related

Shared memory between C and python

I want to share memory between a program in C and another in python.
The c program uses the following structure to define the data.
struct Memory_LaserFrontal {
char Data[372]; // original data
float Med[181]; // Measurements in [m]
charD; // 'I': Invalid -- 'V': Valid
charS; // 'L': Clean -- 'S': Dirty
char LaserStatus[2];
};
From Python I have managed to read the variable in memory using sysv_ipc, but the data has no structure and is seen as a plain data array. How can I restructure it?
python code:
from time import sleep
import sysv_ipc
# Create shared memory object
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
print (memory_value)
print (len(memory_value))
while True:
    memory_value = memory.read()
    print (float(memory_value[800]))
    sleep(0.1)
I have captured and printed the data in Python. When I modify the sensor reading, the data read also changes, which confirms that it corresponds to the data in the sensor's shared memory. But without the proper structure I can't use the data.

You need to unpack your binary data structure into Python types. The Python modules struct and array can do this for you.
import struct
import array
NB: Some C compilers, but not the common ones, may pad your member variables to align each of them with the expected width for your CPU (almost always 4 bytes). This means that padding bytes may be added. If this is the case, you may have to experiment with the struct format character 'x' between the appropriate parts of your struct. Python's struct module does not expect aligned or padded types by default; you need to inform it. See my note at the very end for a guess on what the padding might look like. Again, per @Max's comment, this is unlikely.
NB: I think the members charD and charS are really char D; and char S;
Assuming you want the floats as a Python list or equivalent, we have to do some more work with the Python array module. The same goes for the char[] Data.
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, b for byte etc.
my_chars.frombytes(memory_value[:372]) # it so happens that 372 C chars is 372 bytes.
Data = my_chars.tolist() # a list of small integers, one per byte
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
Now use the array module to turn the float bytes into a Python list
my_floats = array.array("f") # f for float, b for byte etc.
my_floats.frombytes(floats_as_bytes)
Med = my_floats.tolist()
Now Data is a list of integers. If you want text instead, decode the raw bytes with your preferred string encoding. Usually .decode('utf-8') is good enough.
Data_S = my_chars.tobytes().decode('utf-8') # get a usable string in Data_S
Padding
struct Memory_LaserFrontal {
    char Data[372]; // 372 is a multiple of 4, probably no padding
    float Med[181]; // floats are 4 bytes, probably no padding
    char D; // single char, expect 3 padding bytes after
    char S; // single char, expect 3 padding bytes after
    char LaserStatus[2]; // double char, expect 2 padding bytes after.
};
So the last Python line above might be - where the 'x' indicates a padding byte that can be ignored.
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cxxxcxxxccxx", memory_value[end_of_Med:] )
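As an alternative to slicing by hand, the whole C struct can be described with a single format string. The sketch below assumes a little-endian machine and no padding between members; with these member types that assumption is plausible, since the char members need no alignment and the total size (1100 bytes) is already a multiple of 4. The names LAYOUT and parse_laser_frame are illustrative only.

```python
import struct

# '<' selects little-endian, standard sizes and no padding (an assumption
# about how the C side lays out Memory_LaserFrontal).
LAYOUT = struct.Struct("<372s181fcc2s")


def parse_laser_frame(raw):
    """Split the shared-memory bytes into the members of the C struct."""
    fields = LAYOUT.unpack_from(raw)   # ignores any bytes after the struct
    data = fields[0]                   # char Data[372] as bytes
    med = list(fields[1:182])          # float Med[181] as a list of floats
    d, s, laser_status = fields[182:]  # char D, char S, char LaserStatus[2]
    return data, med, d, s, laser_status
```

If LAYOUT.size does not match sizeof(struct Memory_LaserFrontal) on the C side, that is a hint that the compiler did insert padding and 'x' bytes are needed.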

I always like to leave the full source code of the problem solved so others can use it if they have a similar problem.
thanks a lot all!
from time import sleep
import sysv_ipc
import struct
import array
# Create shared memory object
while True:
    memory = sysv_ipc.SharedMemory(1234)
    # Read value from shared memory
    memory_value = memory.read()
    #print (memory_value)
    #print (len(memory_value))
    # Get the initial char array - you can turn it into a string if you need to later.
    my_chars = array.array("b") # f for float, c for char etc.
    #my_chars.from_bytes(memory_value[:372]) # happens that 372 chars is 372 bytes.
    Data = my_chars.tolist() # Could be bytes list
    # advance to the member after Data
    end_of_Data = struct.calcsize("372c")
    # get the length in bytes that 181 floats take up
    end_of_Med = struct.calcsize("181f") + end_of_Data
    # now we know where the floats are
    floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
    # unpack the remaining parts
    ( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
    print(len(floats_as_bytes)/4)
    a=[]
    for i in range(0,len(floats_as_bytes),4):
        a.append(struct.unpack('<f', floats_as_bytes[i:i+4]))
    print (a[0])
    sleep(0.1)

unable to unpack information between custom Preamble in Python and telnetlib

I have an industrial sensor which provides me information via telnet over port 10001.
It has a Data Format as follows:
Also the manual:
All the measuring values are transmitted int32 or uint32 or float depending on the sensors
Code
import telnetlib
import struct
import time
# IP Address, Port, timeout for Telnet
tn = telnetlib.Telnet("169.254.168.150", 10001, 10)
while True:
    op = tn.read_eager() # currently read information limit this till preamble
    print(op[::-1]) # make little-endian
    if not len(op[::-1]) == 0: # initially an empty bit starts (b'')
        data = struct.unpack('!4c', op[::-1]) # unpacking `MEAS`
    time.sleep(0.1)
my initial attempt:
Connect to the sensor
read data
make it to little-endian
OUTPUT
b''
b'MEAS\x85\x8c\x8c\x07\xa7\x9d\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x86\x8c\x8c\x07\xa7\x9e\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x85\x8c\x8c\x07\xa7\x9f\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa0\x01\x0c'
b'\xa7\xa2\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa1\x01\x0c'
b'\x8c\x07\xa7\xa3\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07'
b'\x88\x8c\x8c\x07\xa7\xa4\x01\x0c\x15\x04\xf6MEAS\x88\x8c'
b'MEAS\x8b\x8c\x8c\x07\xa7\xa5\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xa6\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xa7\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa8\x01\x0c'
b'\x01\x0c\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa9\x01\x0c'
b'\x8c\x07\xa7\xab\x01\x0c\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xaa'
b'\x8c\x8c\x07\xa7\xac\x01\x0c\x15\x04\xf6MEAS\x8c\x8c'
b'AS\x89\x8c\x8c\x07\xa7\xad\x01\x0c\x15\x04\xf6MEAS\x8a'
b'MEAS\x88\x8c\x8c\x07\xa7\xae\x01\x0c\x15\x04\xf6ME'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xaf\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb0\x01\x0c'
b'\x0c\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb1\x01\x0c'
b'\x07\xa7\xb3\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c\x07\xa7\xb2\x01'
b'\x8c\x8c\x07\xa7\xb4\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c'
b'\x85\x8c\x8c\x07\xa7\xb5\x01\x0c\x15\x04\xf6MEAS\x84'
b'MEAS\x87\x8c\x8c\x07\xa7\xb6\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb7\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb8\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb9\x01\x0c'
b'\xa7\xbb\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xba\x01\x0c'
try to unpack the preamble !?
How do I read information like Article number, Serial number, Channel, Status, Measuring Value between the preamble?
The payload size seems to be fixed here for 22 Bytes (via Wireshark)

Parsing the reversed buffer is just weird; please use struct's support for endianness. Using big-endian '!' in a little-endian context is also odd.
The first four bytes are a text constant. Ok, fine perhaps you'll need to reverse those. But just those, please.
After that, use struct.unpack to parse out 'IIQI'. So far, that was kind of working OK with your approach, since all fields consume 4 bytes or a pair of 4 bytes. But finding frame M's length is the fly in the ointment since it is just 2 bytes, so parse it with 'H', giving you a combined 'IIQIH'. After that, you'll need to advance by only that many bytes, and then expect another 'MEAS' text constant once you've exhausted that set of measurements.
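To illustrate why reversing the buffer is awkward, here is a small sketch with a made-up two-field record (a uint16 followed by a uint32). It is not the sensor's actual frame layout, which would have to come from the manual.

```python
import struct

# Hypothetical record: uint16 then uint32, sent little-endian on the wire.
raw = struct.pack("<HI", 1, 2)

# Direct parse with '<': fields come out in wire order.
direct = struct.unpack("<HI", raw)

# Parsing the reversed buffer "works" only if the format is mirrored too,
# and the fields then come out in reverse order.
mirrored = struct.unpack(">IH", raw[::-1])
```

With '<' in the format string the byte order is handled per field, so neither the buffer nor the field order has to be flipped.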

I managed to avoid telnetlib altogether and created a TCP client using Python 3. I already had the payload size from my Wireshark dump (22 bytes), hence I keep receiving 22 bytes of information at a time. Apparently the module sends two distinct 22-byte payloads:
The first (frame) payload has the preamble, serial, article and channel information
The second (frame) payload has information such as bytes per frame, measuring value counter, measuring value Channel 1, measuring value Channel 2, measuring value Channel 3
The information is in int32 and thus needs a formula to be converted to real readings (mentioned in the instruction manual)
(The unpacking is as @J_H described in his answer, with small changes.)
Code
import socket
import time
import struct
DRANGEMIN = 3261
DRANGEMAX = 15853
MEASRANGE = 50
OFFSET = 35
# Create a TCP/IP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('169.254.168.150', 10001)
print('connecting to %s port %s' % server_address)
sock.connect(server_address)
def value_mm(raw_val):
    return (((raw_val - DRANGEMIN) * MEASRANGE) / (DRANGEMAX - DRANGEMIN) + OFFSET)
if __name__ == '__main__':
    while True:
        Laser_Value = 0
        data = sock.recv(22)
        preamble, article, serial, x1, x2 = struct.unpack('<4sIIQH', data)
        if not preamble == b'SAEM':
            status, bpf, mValCounter, CH1, CH2, CH3 = struct.unpack('<hIIIII',data)
            #print(CH1, CH2, CH3)
            Laser_Value = CH3
            print(str(value_mm(Laser_Value)) + " mm")
            #print('RAW: ' + str(len(data)))
            print('\n')
        #time.sleep(0.1)
Sure enough, this provides me with the information that is needed, and I compared it against the proprietary software which the company provides.

Python struct as networking data packets (unknown byte sequence)

I am working on a server engine in Python, for my game made in GameMaker Studio 2. I'm currently having some issues with making and sending a packet.
I've successfully managed to establish a connection and send the first packet, but I can't find a way to handle data whose layout depends on the message: if the first value in the packed struct equals a given message ID, the rest of the data should be unpacked according to the sequence defined for that ID.
Example:
types = 'hhh' #(message_id, x, y) example
message_id = 0
x = 200
y = 200
buffer = pack(types, 0,x, y)
On the server side:
data = conn.recv(BUFFER_SIZE)
mid = unpack('h', data)[0]
if not data: break
if mid == 0:
    sequnce = 'hhh'
    x = unpack(sequnce, data)[1]
    y = unpack(sequnce, data)[2]

It looks like your subsequent decoding is going to vary based on the message ID?
If so, you will likely want to use unpack_from which allows you to pull only the first member from the data (as written now, your initial unpack call will generate an exception because the buffer you're handing it is not the right size). You can then have code that varies the unpacking format string based on the message ID. That code could look something like this:
from struct import pack, unpack, unpack_from
while True:
    data = conn.recv(BUFFER_SIZE)
    # End of file, bail out of loop
    if not data: break
    mid = unpack_from('!h', data)[0]
    if mid == 0:
        # Message ID 0
        types = '!hhh'
        _, x, y = unpack(types, data)
        # Process message type 0
        ...
    elif mid == 1:
        types = '!hIIq'
        _, v, w, z = unpack(types, data)
        # Process message type 1
        ...
    elif mid == 2:
        ...
Note that we're unpacking the message ID again in each case along with the ID-specific parameters. You could avoid that if you like by using the optional offset argument to unpack_from:
x, y = unpack_from('!hh', data, offset=2)
One other note of explanation: If you are sending messages between two different machines, you should consider the "endianness" (byte ordering). Not all machines are "little-endian" like x86. Accordingly it's conventional to send integers and other structured numerics in a certain defined byte order - traditionally that has been "network byte order" (which is big-endian) but either is okay as long as you're consistent. You can easily do that by prepending each format string with '!' or '<' as shown above (you'll need to do that for every format string on both sides).
Finally, the above code probably works fine for a simple "toy" application but as your program increases in scope and complexity, you should be aware that there is no guarantee that your single recv call actually receives all the bytes that were sent and no other bytes (such as bytes from a subsequently sent buffer). In other words, it's often necessary to add a buffering layer, or otherwise ensure that you have received and are operating on exactly the number of bytes you intended.
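A minimal sketch of such a buffering layer is shown below. The helper names recv_exact and read_message and the BODY_FORMATS table are hypothetical; the two body formats simply mirror the message types 0 and 1 used in the example above. The only socket call assumed is recv().

```python
import struct


def recv_exact(conn, count):
    """Keep calling conn.recv() until exactly `count` bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:  # peer closed the connection early
            raise ConnectionError("%d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Hypothetical table: format of the body that follows each 2-byte message ID.
BODY_FORMATS = {0: "!hh", 1: "!IIq"}


def read_message(conn):
    """Read one message: the ID first, then exactly the body it implies."""
    (mid,) = struct.unpack("!h", recv_exact(conn, 2))
    body = struct.Struct(BODY_FORMATS[mid])
    return mid, body.unpack(recv_exact(conn, body.size))
```

Reading the ID and the body as two exact-size reads means a short or merged recv no longer matters, at the cost of one extra call per message.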

Could you unpack whole data to list, and then check its elements in the loop? What is the reason to unpack it 3 times? I guess, you could unpack it once, and then work with that list - check its length first, if not empty -> check first element -> if equal to special one, continue on list parsing. Did you try like that?

Deserializing byte stream to objects

I am building a python project that receives bytes from a serial port. The bytes are responses to commands sent (also via serial port). The responses have no identifying marks, i.e. from the bytes alone, I don't know which command response this is. The decoder would of course need to know in advance which command this is a response to.
I would like to have the incoming byte sequence represented as a nested object, indicating the frame, header, payload, decoded payload, etc. I would much prefer to push 1 byte at a time to the decoder and have it call a callback once it has received enough bytes for a full object (or errorCallback if there are errors or timeout).
The actual frame has a start byte and an end byte. It has a header with a few bytes (some id, command status (basically ok/fail)), one is a data length byte. This is followed by the data which is followed by a checksum (single byte). The data is the response to the command.
The response is predictable in that the previous bytes decide what the coming bytes mean.
Example response:
aa:00:0c:00:01:00:00:d3:8d:d4:5c:50:01:04:e0:6e:bb
Broken down:
aa: start frame
00: id
0c: data length (incl status): 12 bytes
00: command status (byte 1)
01: 1 data frame (byte 2)
00:00: flags of first data frame (byte 3-4)
d3:8d:d4:5c:50:01:04:e0: first data (aa and bb could be in it) (byte 5-12)
6e: checksum (includes all bytes except start, checksum, end bytes)
bb: end frame
This being serial port communication, bytes may be lost (and extra produced) and I expect to use timeout to handle resets (no responses are expected without first a command being sent).
I really would like an object oriented approach where the decoder would produce an object that when serialized, would produce the same byte sequence again. I am using python 2.7, but really any object oriented language would do (as long as I could convert it to python).
I am just not sure how to structure the decoder to make it neat looking. I am not looking for a full solution, just something that would get me going in the right direction (right direction being somewhat subjective here).

I don't completely understand what you want to do but if you want to receive fixed-length responses from some device and make them attributes of some object, would something like this be okay?:
import time
import serial # pyserial
START_FRAME = 0xAA
END_FRAME = 0xBB
TIMEOUT = 2
class Response:
    def __init__(self, response):
        # a valid response is: start + id + length + status + frame count + n * 10 frame bytes + checksum + end
        if len(response) >= 7 and (len(response) - 7) % 10 == 0 and response[0] == START_FRAME and response[-1] == END_FRAME:
            self.header = {} # build header
            self.header['id'] = response[1]
            self.header['length'] = response[2]
            self.header['status'] = response[3]
            self.r_checksum = response[-2] # get checksum from response
            self.checksum = self.header['id'] ^ self.header['length'] ^ self.header['status'] # start the XOR checksum with the header
            self.payload = response[4:-2] # get raw payload slice
            self.decode(self.payload) # parse payload (also finishes the checksum)
            if self.r_checksum == self.checksum: # final check
                self.parsed = True
            else:
                self.parsed = False
        else: # if response didn't follow the pattern
            self.parsed = False
    def decode(self, packet): # decode payload
        self.frm_count = packet[0] # get number of data frames
        self.checksum ^= self.frm_count
        self.decoded = [] # hold decoded payload
        frames = packet[1:]
        for c in range(self.frm_count): # iterate through data frames
            flags = frames[(c*10):(c*10 + 2)] # 2 flag bytes
            for f in flags:
                self.checksum ^= f
            data = frames[(c*10 + 2):(c+1)*10] # 8 data bytes
            for d in data:
                self.checksum ^= d
            self.decoded.append({'frame': c+1, 'flags': flags, 'data':data})
    def serialize(self): # reconstruct response
        res = bytearray()
        res.append(START_FRAME)
        res.extend([self.header['id'], self.header['length'], self.header['status']])
        res.extend(self.payload)
        res.extend([self.checksum, END_FRAME])
        return res
response = bytearray()
ser = serial.Serial('COM3', 9600)
last_read = time.time()
while time.time() - last_read < TIMEOUT: # give up after 2 seconds of silence
    while ser.inWaiting() > 0:
        response.extend(ser.read()) # read() returns a single byte
        last_read = time.time()
decoded_res = Response(response)
if decoded_res.parsed:
    pass # do stuff
else:
    print('Invalid response!')
This code assumes there may be more than one data frame, with the data frames immediately preceded by a byte indicating the number of data frames.
Parsing a packet is fast compared to the time taken for serial comms (even at 115200 baud). The whole thing is roughly O(n), I think.
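For the byte-at-a-time, callback-driven style asked for in the question, a rough sketch could look like the following. It is written for Python 3 and assumes the frame layout from the example (start, id, length, data, checksum, end) and an XOR checksum over id, length and data, which is consistent with the example frame (it yields 0x6e). The class name FrameDecoder and the callback signatures are hypothetical, and timeout handling is left out.

```python
START, END = 0xAA, 0xBB


class FrameDecoder:
    """Feed one byte at a time; a callback fires once a frame is complete."""

    def __init__(self, on_frame, on_error):
        self.on_frame = on_frame  # called as on_frame(frame_id, data)
        self.on_error = on_error  # called as on_error(raw_body)
        self.body = None          # None means "waiting for the start byte"

    def feed(self, byte):
        if self.body is None:
            if byte == START:
                self.body = bytearray()
            return                # ignore noise between frames
        self.body.append(byte)
        # body layout: id, length, data[length], checksum, end
        if len(self.body) < 2 or len(self.body) < self.body[1] + 4:
            return                # frame not complete yet
        body, self.body = self.body, None
        frame_id, length = body[0], body[1]
        data = bytes(body[2:2 + length])
        expected = 0
        for b in body[:-2]:       # XOR over id, length and data
            expected ^= b
        if body[-1] != END or body[-2] != expected:
            self.on_error(bytes(body))
        else:
            self.on_frame(frame_id, data)
```

Because the decoder counts bytes using the length field instead of searching for the end byte, 0xAA and 0xBB may appear inside the data without confusing it.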

Creating 16 and 24-bit integers from binary file

I'm modifying an existing Python app that reads a binary file. The format of the file is changing a bit. Currently, one field is defined as bytes 35-36 of the record. The specs also state that "...fields in the records will be character fields written in ASCII." Here's what the current working code looks like:
def to_i16( word ):
    xx = struct.unpack( '2c', word )
    xx = ( ord( xx[ 0 ] ) << 8 ) + ord( xx[ 1 ] )
    return xx
val = to_i16( reg[ 34:36 ] )
But that field is being redefined as bytes 35-37, so it'll be a 24-bit value. I detest working with binary files and am horrible at bit-twiddling. How do I turn that 3-byte value into a 24-bit integer? I've tried a couple of code bits that I've found by googling, but I don't think they are correct. It is hard to be sure, since I'm still waiting on the people that sent the sample 'new format' file to send me a text representation that shows the values I should be coming up with.

Simply read 24 bit (I assume in big endian, since the original code is in that format as well ):
val = struct.unpack('>I', b'\x00' + reg[34:37])[0] # unpack() returns a tuple, so take the first item
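On Python 3, int.from_bytes offers another way to do the same conversion. The sketch below shows both variants side by side, assuming an unsigned, big-endian value as in the original to_i16; the function names are illustrative.

```python
import struct


def to_i24(three_bytes):
    """Interpret 3 bytes as an unsigned big-endian 24-bit integer."""
    return int.from_bytes(three_bytes, "big")


def to_i24_struct(three_bytes):
    """Same result via struct: pad to 4 bytes and unpack as uint32."""
    return struct.unpack(">I", b"\x00" + three_bytes)[0]
```

For example, to_i24(reg[34:37]) would replace the to_i16(reg[34:36]) call once the field is three bytes wide.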
