Can someone explain Python struct unpacking?

Can someone explain Python struct unpacking? - python

I have a binary file made from C structs that I want to parse in Python. I know the exact format and layout of the binary but I am confused on how to use Python Struct unpacking to read this data.
Would I have to traverse the whole binary unpacking a certain number of bytes at a time based on what the members of the struct are?
C File Format:
typedef struct {
int data1;
int data2;
int data4;
} datanums;
typedef struct {
datanums numbers;
char *name;
} personal_data;
Lets say the binary file had personal_data structs repeatedly after another.

Assuming the layout is a static binary structure that can be described by a simple struct pattern, and the file is just that structure repeated over and over again, then yes, "traverse the whole binary unpacking a certain number of bytes at a time" is exactly what you'd do.
For example:
record = struct.Struct('>HB10cL')
with open('myfile.bin', 'rb') as f:
while True:
buf = f.read(record.size)
if not buf:
break
yield record.unpack(buf)
If you're worried about the efficiency of only reading 17 bytes at a time and you want to wrap that up by buffering 8K at a time or something… well, first make sure it's an actual problem worth optimizing; then, if it is, loop over unpack_from instead of unpack. Something like this (untested, top-of-my-head code):
buf, offset = b'', 0
with open('myfile.bin', 'rb') as f:
if len(buf) < record.size:
buf, offset = buf[offset:] + f.read(8192), 0
if not buf:
break
yield record.unpack_from(buf, offset)
offset += record.size
Or, even simpler, as long as the file isn't too big for your vmsize, just mmap the whole thing and unpack_from on the mmap itself:
with open('myfile.bin', 'rb') as f:
with mmap.mmap(f, 0, access=mmap.ACCESS_READ) as m:
for offset in range(0, m.size(), record.size):
yield record.unpack_from(m, offset)

You can unpack a few at a time. Let's start with this example:
In [44]: a = struct.pack("iiii", 1, 2, 3, 4)
In [45]: a
Out[45]: '\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00'
If you're using a string, you can just use a subset of it, or use unpack_from:
In [49]: struct.unpack("ii",a[0:8])
Out[49]: (1, 2)
In [55]: struct.unpack_from("ii",a,0)
Out[55]: (1, 2)
In [56]: struct.unpack_from("ii",a,4)
Out[56]: (2, 3)
If you're using a buffer, you'll need to use unpack_from.

Related

Shared memory between C and python

I want to share memory between a program in C and another in python.
The c program uses the following structure to define the data.
struct Memory_LaserFrontal {
char Data[372]; // original data
float Med[181]; // Measurements in [m]
charD; // 'I': Invalid -- 'V': Valid
charS; // 'L': Clean -- 'S': Dirty
char LaserStatus[2];
};
From python I have managed to read the variable in memory using sysv_ipc but they have no structure and is seen as a data array. How can I restructure them?
python code:
from time import sleep
import sysv_ipc
# Create shared memory object
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
print (memory_value)
print (len(memory_value))
while True:
memory_value = memory.read()
print (float(memory_value[800]))
sleep(0.1)
I have captured and printed the data in python, I have modified the sensor reading and the read data is also modified, confirming that the read data corresponds to the data in the sensor's shared memory. But without the proper structure y cant use the data.

You need to unpack your binary data structure into Python types. The Python modules struct and array can do this for you.
import struct
import array
NB: Some C compilers, but not the comomn ones, may pad your member variables to align each of them with the expected width for your CPU ( almost always 4 bytes ). This means that it may add padding bytes. You may have to experiment with the struct format parameter 'x' between the appropriate parts of your struct if this is the case. Python's struct module does not expect aligned or padded types by default, you need to inform it. See my note at the very end for a guess on what the padding might look like. Again, per #Max's comment, this is unlikely.
NB: I think the members charD and charS are really char D; and char S;
Assuming you want the floats as a Python list or equivalent we have to do some more work with the Python array module . Same for the char[] Data.
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, b for byteetc.
my_chars.from_bytes(memory_value[:372]) # happens that 372 C chars is 372 bytes.
Data = my_chars.tolist() # Could be bytes list
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
Now use the array module to unpack to make a Python list
my_floats = array.array("f") # f for float, c for char etc.
my_floats.from_bytes(floats_as_bytes)
Now Data might be a list of Python bytes type that you need to convert to your preferred string encoding. Usually .decode('utf-8') is good enough.
Data_S = "".join(Data).decode('utf-8') # get a usable string in Data_S
Padding
struct Memory_LaserFrontal {
char Data[372]; // 372 is a multiple of 4, probably no padding
float Med[181]; // floats are 4 bytes, probably no padding
charD; // single char, expect 3 padding bytes after
charS; // single char, expect 3 padding bytes after
char LaserStatus[2]; // double char expect 2 padding bytes after.
};
So the last Python line above might be - where the 'x' indicates a padding byte that can be ignored.
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cxxxcxxxccxx", memory_value[end_of_Med:] )

I always like to leave the full source code of the problem solved so others can use it if they have a similar problem.
thanks a lot all!
from time import sleep
import sysv_ipc
import struct
import array
# Create shared memory object
while True:
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
#print (memory_value)
#print (len(memory_value))
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, c for char etc.
#my_chars.from_bytes(memory_value[:372]) # happens that 372 chars is 372 bytes.
Data = my_chars.tolist() # Could be bytes list
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
print(len(floats_as_bytes)/4)
a=[]
for i in range(0,len(floats_as_bytes),4):
a.append(struct.unpack('<f', floats_as_bytes[i:i+4]))
print (a[0])
sleep(0.1)

How to get multiple 32bit values with byte array?

I need to extract some number values out of a binary data stream.
the code below is working for me, but for sure there is a more suitable way to do this in python. Especially I was struggling a lot to find a better way to iterate over the array and get 4 byte as byte arrays from the buffer.
some hint for me?
outfile = io.BytesIO()
outfile.writelines(some binary data stream)
buf = outfile.getvalue()
blen = int(len(buf) / 4 );
for i in range(blen):
a = bytearray([0,0,0,0])
a[0] = buf[i*4]
a[1] = buf[i*4+1]
a[2] = buf[i*4+2]
a[3] = buf[i*4+3]
data = struct.unpack('<l', a)[0]
do something with data

Your question and accompanying pseudo-code are somewhat hazy in my opinion, but here's something that uses slices of buf to obtain the each group of 4 bytes needed—so if nothing else it's at least a bit more succinct (assuming I've correctly interpreted what you're asking):
import io
import struct
outfile = io.BytesIO()
outfile.writelines([b'\x00\x01\x02\x03',
b'\x04\x05\x06\x07'])
buf = outfile.getvalue()
for i in range(0, len(buf), 4):
data = struct.unpack('<l', buf[i:i+4])[0]
print(hex(data))
Output:
0x3020100
0x7060504

Reading binary file into different hex "types" (8bit, 16bit, 32bit, ...)

I have a file which contains binary data. The content of this file is just one long line.
Example: 010101000011101010101
Originaly the content was an array of c++ objects with the following data types:
// Care pseudo code, just for visualisation
int64 var1;
int32 var2[50];
int08 var3;
I want to skip var1 and var3 and only extract the values of var2 into some readable decimal values. My idea was to read the file byte by byte and convert them into hex values. In the next step I though I could "combine" (append) 4 of those hex values to get one int32 value.
Example: 0x10 0xAA 0x00 0x50 -> 0x10AA0050
My code so far:
def append_hex(a, b):
return (a << 4) | b
with open("file.dat", "rb") as f:
counter = 0
tickdifcounter = 0
current_byte=" "
while True:
if (counter >= 8) and (counter < 208):
tickdifcounter+=1
if (tickdifcounter <= 4):
current_byte = append_hex(current_byte, f.read(1))
if (not current_byte):
break
val = ord(current_byte)
if (tickdifcounter > 4):
print hex(val)
tickdifcounter = 0
current_byte=""
counter+=1
if(counter == 209): #209 bytes = int64 + (int32*50) + int08
counter = 0
print
Now I have the problem that my append_hex is not working because the variables are strings so the bitshift is not working.
I am new to python so please give me hints when I can do something in a better way.

You can use struct module for reading binary files.
This can help you Reading a binary file into a struct in Python

A character can be converted to a int using the ord(x) method. In order to get the integer value of a multi-byte number, bitshift left. For example, from a earlier project:
def parseNumber(string, index):
return ord(string[index])<<24 + ord(string[index+1])<<16 + \
ord(string[index+2])<<8+ord(string[index+3])
Note this code assumes big-endian system, you will need to reverse the index for parsing little-endian code.
If you know exaclty what the size of the struct is going to be, (or can easily calculate it based on the size of the file) you are probably better of using the "struct" module.

reading struct in python from created struct in c

I am very new at using Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.
I have function in C that creates a .dat file containing data. I am opening the file using Python to read the file. One of the things I need to read are a struct that was created in the C function and printed in binary. In my Python code I am at the appropriate line of the file to read in the struct. I have tried both unpacking the stuct item by item and as a whole without success. Most of the items in the struct were declared 'real' in the C code. I am working on this code with someone else and the main source code is his and has declared the variables as 'real'. I need to put this in a loop because I want to read all of the files in the directory that end in '.dat'. To start the loop I have:
for files in os.listdir(path):
if files.endswith(".dat"):
part = open(path + files, "rb")
for line in part:
Which then I read all of the lines previous to the one containing the struct. Then I get to that line and have:
part_struct = part.readline()
r = struct.unpack('<d8', part_struct[0])
I'm trying to just read the first thing stored in the struct. I saw an example of this somewhere on here. And when I try this I'm getting an error that reads:
struct.error: repeat count given without format specifier
I will take any and all tips someone can give me. I have been stuck on this for a few days and have tried many different things. To be honest, I think I don't understand the struct module but I've read as much as I could on it.
Thanks!

You could use ctypes.Structure or struct.Struct to specify format of the file. To read structures from the file produced by C code in #perreal's answer:
"""
struct { double v; int t; char c;};
"""
from ctypes import *
class YourStruct(Structure):
_fields_ = [('v', c_double),
('t', c_int),
('c', c_char)]
with open('c_structs.bin', 'rb') as file:
result = []
x = YourStruct()
while file.readinto(x) == sizeof(x):
result.append((x.v, x.t, x.c))
print(result)
# -> [(12.100000381469727, 17, 's'), (12.100000381469727, 17, 's'), ...]
See io.BufferedIOBase.readinto(). It is supported in Python 3 but it is undocumented in Python 2.7 for a default file object.
struct.Struct requires to specify padding bytes (x) explicitly:
"""
struct { double v; int t; char c;};
"""
from struct import Struct
x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
result = []
while True:
buf = file.read(x.size)
if len(buf) != x.size:
break
result.append(x.unpack_from(buf))
print(result)
It produces the same output.
To avoid unnecessary copying Array.from_buffer(mmap_file) could be used to get an array of structs from a file:
import mmap # Unix, Windows
from contextlib import closing
with open('c_structs.bin', 'rb') as file:
with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
result = (YourStruct * 3).from_buffer(mm) # without copying
print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))

Some C code:
#include <stdio.h>
typedef struct { double v; int t; char c;} save_type;
int main() {
save_type s = { 12.1f, 17, 's'};
FILE *f = fopen("output", "w");
fwrite(&s, sizeof(save_type), 1, f);
fwrite(&s, sizeof(save_type), 1, f);
fwrite(&s, sizeof(save_type), 1, f);
fclose(f);
return 0;
}
Some Python code:
import struct
with open('output', 'rb') as f:
chunk = f.read(16)
while chunk != "":
print len(chunk)
print struct.unpack('dicccc', chunk)
chunk = f.read(16)
Output:
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
but there is also the padding issue. The padded size of save_type is 16, so we read 3 more characters and ignore them.

A number in the format specifier means a repeat count, but it has to go before the letter, like '<8d'. However you said you just want to read one element of the struct. I guess you just want '<d'. I guess you are trying to specify the number of bytes to read as 8, but you don't need to do that. d assumes that.
I also noticed you are using readline. That seems wrong for reading binary data. It will read until the next carriage return / line feed, which will occur randomly in binary data. What you want to do is use read(size), like this:
part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
Actually, you should be careful, as read can return less data than you request. You need to repeat it if it does.
part_struct = b''
while len(part_struct) < 8:
data = part.read(8 - len(part_struct))
if not data: raise IOException("unexpected end of file")
part_struct += data
r = struct.unpack('<d', part_struct)

I had same problem recently, so I had made module for the task, stored here: http://pastebin.com/XJyZMyHX
example code:
MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
uint8_t u8;
uint16_t u16;
uint32_t u32;
uint64_t u64;
int8_t i8;
int16_t i16;
int32_t i32;
int64_t i64;
long long int lli;
float flt;
double dbl;
char string[12];
uint64_t array[5];
} debugInfo;"""
PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06#\x14\xaeG\xe1z\x14\x08#testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'
if __name__ == '__main__':
print "String:"
print depack_bytearray_to_str(PACKED_STRUCT,MY_STRUCT,"<" )
print "Bytes in Stuct:"+str(structSize(MY_STRUCT))
nt=depack_bytearray_to_namedtuple(PACKED_STRUCT,MY_STRUCT,"<" )
print "Named tuple nt:"
print nt
print "nt.string="+nt.string
The result should be:
String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)
Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'

Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that defines the memory layout of your c-struct.
For example, here is some C++ code defining a struct (should work just as well for C structs, though I'm not a C expert):
struct MyStruct {
uint16_t FieldA;
uint16_t pad16[3];
uint32_t FieldB;
uint32_t pad32[2];
char FieldC[4];
uint64_t FieldD;
uint64_t FieldE;
};
void write_struct(const std::string& fname, MyStruct h) {
// This function serializes a MyStruct instance and
// writes the binary data to disk.
std::ofstream ofp(fname, std::ios::out | std::ios::binary);
ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}
Based on the advice I found at stackoverflow.com/a/5397638, I've included some padding in the struct (the pad16 and pad32 fields) so that serialization will happen in a more predictable way. I think that this is a C++ thing; it might not be necessary when using plain ol' C structs.
Now, in python, we create a numpy.dtype object describing the memory-layout of MyStruct:
import numpy as np
my_struct_dtype = np.dtype([
("FieldA" , np.uint16 , ),
("pad16" , np.uint16 , (3,) ),
("FieldB" , np.uint32 ),
("pad32" , np.uint32 , (2,) ),
("FieldC" , np.byte , (4,) ),
("FieldD" , np.uint64 ),
("FieldE" , np.uint64 ),
])
Then use numpy's fromfile to read the binary file where you've saved your c-struct:
# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]
FieldA = struct_data["FieldA"]
FieldB = struct_data["FieldB"]
FieldC = struct_data["FieldC"]
FieldD = struct_data["FieldD"]
FieldE = struct_data["FieldE"]
if FieldA != expected_value_A:
raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...
The count=1 argument in the above call np.fromfile(..., count=1) is so that the returned array will have only one element; this means "read the first struct instance from the file". Note that I am indexing [0] to get that element out of the array.
If you have appended the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for the np.fromfile and np.frombuffer functions, means "read all of the data", resulting in a 1-dimensional array of shape (number_of_struct_instances,).
You can also use the offset keyword argument to np.fromfile to control where in the file the data read will begin.
To conclude, here are some numpy functions that will be useful once your custom dtype has been defined:
Reading binary data as a numpy array:
np.frombuffer(bytes_data, dtype=...):
Interpret the given binary data (e.g. a python bytes instance)
as a numpy array of the given dtype. You can define a custom
dtype that describes the memory layout of your c struct.
np.fromfile(filename, dtype=...):
Read binary data from filename. Should be the same result as
np.frombuffer(open(filename, "rb").read(), dtype=...).
Writing a numpy array as binary data:
ndarray.tobytes():
Construct a python bytes instance containing
raw data from the given numpy array. If the array's data has dtype
corresponding to a c-struct, then the bytes coming from
ndarray.tobytes can be deserialized
by c/c++ and interpreted as an (array of) instances of that c-struct.
ndarray.tofile(filename):
Binary data from the array is written to filename.
This data could then be deserialized by c/c++.
Equivalent to open("filename", "wb").write(a.tobytes()).

Python Socket Receiving Doubles

I need a python program to use a TCP connection(to another program) to retrieve sets of 9 bytes.
The first of the nine bytes represents a char, and the rest represent a double.
How can I extract this information from a python socket? Will I have to do the maths manually to convert the stream data or is there a better way?

Take a look at python struct
http://docs.python.org/library/struct.html
So something like
from struct import unpack
unpack('cd',socket_read_buffer)
--> ('c', 3.1415)
Be careful of Endianness.

If both client and server are written in python, I'd suggest you use pickle. It lets you convert python variables to bytes and then back to python variables with their original type.
#SENDER
import pickle, struct
#convert the variable to bytes with pickle
bytes = pickle.dumps(original_variable)
#convert its size to a 4 bytes int (in bytes)
#I: 4 bytes unsigned int
#!: network (= big-endian)
length = struct.pack("!I", len(bytes))
a_socket.sendall(length)
a_socket.sendall(bytes)
#RECEIVER
import pickle, struct
#This function lets you receive a given number of bytes and regroup them
def recvall(socket, count):
buf = b""
while count > 0:
newbuf = socket.recv(count)
if not newbuf: return None
buf += newbuf
count -= len(newbuf)
return buf
length, = struct.unpack("!I", recvall(socket, 4)) #we know the first reception is 4 bytes
original_variable = pickle.loads(recval(socket, length))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can someone explain Python struct unpacking? - python

Related

Shared memory between C and python

How to get multiple 32bit values with byte array?

Reading binary file into different hex "types" (8bit, 16bit, 32bit, ...)

reading struct in python from created struct in c

Python Socket Receiving Doubles

Categories

Resources