How to read structured binary data from a file?


The following C++ code writes a header to a file:
#include <cstdint>
#include <cstdio>
struct Header
{
uint16_t name;
uint8_t type;
uint8_t padding;
uint32_t width, height;
uint32_t depth1, depth2;
float dMin, dMax;
};
int main()
{
Header header;
header.name = *reinterpret_cast<const uint16_t*>("XO");
header.type = true;
header.width = (uint32_t)512;
header.height = (uint32_t)600;
header.depth1 = (uint32_t)16;
header.depth2 = (uint32_t)25;
header.dMin = 5.0;
header.dMax = 8.6;
FILE* f = fopen("header.bin", "wb");
fwrite(&header, sizeof(Header), 1, f);
fclose(f);
}
I am looking to read these header.bin files using Python. In C++ I would be doing something like:
fread(&header, sizeof(Header), 1, f)
But I'm unsure how to read the bytes and convert them into the corresponding fields that the Header struct has in Python?

Use the struct module to define the binary layout of a C-like struct and de-/serialise it:
import struct
# Format string describing the data layout (native byte order and alignment)
layout = "H B x 2I 2I 2f"
# Object representing the layout, including size
header = struct.Struct(layout)
with open("header.bin", "rb") as in_stream:
    print(header.unpack(in_stream.read(header.size)))
The layout is a format string describing the fields in order, e.g. H for uint16_t, B for uint8_t, x for a pad byte, I for uint32_t, and f for float. Note that L means unsigned long, which is 8 bytes in native mode on most 64-bit Unix systems, so I is the safer match for uint32_t.
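As a complement to the tuple returned by unpack, the fields can be given names. The following is only a sketch under stated assumptions: the file is assumed to have been written on a little-endian machine (hence the explicit "<" prefix), the two name bytes are read as raw bytes with 2s instead of H, and HEADER, Header and parse_header are illustrative names that do not appear in the question.

```python
import struct
from collections import namedtuple

# "<" selects little-endian byte order with standard sizes and no implicit
# padding, so the single pad byte of the C++ struct is spelled out as "x".
HEADER = struct.Struct("<2s B x 2I 2I 2f")
Header = namedtuple("Header", "name type width height depth1 depth2 dMin dMax")

def parse_header(raw):
    """Unpack the first HEADER.size bytes of raw into named fields."""
    return Header._make(HEADER.unpack_from(raw))

# Sample bytes built in memory, standing in for the contents of header.bin.
sample = HEADER.pack(b"XO", 1, 512, 600, 16, 25, 5.0, 8.6)
header = parse_header(sample)
print(header.name, header.width, header.height)
```

With a real file, the same function could be fed in_stream.read(HEADER.size). Float fields come back as Python floats rounded through 32-bit precision, so dMax will be close to 8.6 but not exactly equal.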

I would do this with Python's ctypes, partly so that the C++ and Python code can share the Header layout.
Create a subclass of ctypes.Structure to map the types:
import ctypes
class StructHeader(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_uint16),
        ("type", ctypes.c_uint8),
        ...
    ]
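And create a function that does what you want, with a signature like the following (a pointer rather than a C++ reference, so that it matches the ctypes.POINTER argument type; if compiled as C++, it also needs extern "C" linkage):
int header(struct Header *buffer)
{
// open the file and write to buffer
// opportunity for other features
}
Then compile it into a shared object that exposes this function: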
gcc -shared -Wl,-soname,your_soname \
-o library_name file_list library_list
And call out with ctypes.CDLL to read the headers
header = ctypes.CDLL("mylib.so.1").header # function named header
header.argtypes = [ctypes.POINTER(StructHeader)]
header.restype = ctypes.c_int
# allocate struct for write
buffer = StructHeader()
# call out to function to write buffer
header(buffer)
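If a shared library is more than is needed, the same ctypes.Structure can also be filled directly from the file's bytes with from_buffer_copy. This is a sketch, not the approach described above: the field list is completed here from the Header struct in the question, read_header is a hypothetical helper name, and the temporary file only stands in for header.bin.

```python
import ctypes
import os
import tempfile

class StructHeader(ctypes.Structure):
    # Field list completed from the C++ Header struct in the question.
    _fields_ = [
        ("name", ctypes.c_uint16),
        ("type", ctypes.c_uint8),
        ("padding", ctypes.c_uint8),
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("depth1", ctypes.c_uint32),
        ("depth2", ctypes.c_uint32),
        ("dMin", ctypes.c_float),
        ("dMax", ctypes.c_float),
    ]

def read_header(path):
    """Read sizeof(StructHeader) bytes from path and copy them into a struct."""
    with open(path, "rb") as f:
        raw = f.read(ctypes.sizeof(StructHeader))
    return StructHeader.from_buffer_copy(raw)

# Round trip through a temporary file standing in for header.bin.
# 0x4F58 is what the bytes "XO" look like as a uint16 on a little-endian machine.
original = StructHeader(0x4F58, 1, 0, 512, 600, 16, 25, 5.0, 8.6)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "header.bin")
    with open(path, "wb") as f:
        f.write(bytes(original))
    loaded = read_header(path)
print(loaded.width, loaded.height)
```

from_buffer_copy raises ValueError when the file is shorter than the structure, which gives a cheap check against truncated input.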

Related

Python ctypes writing data to be read by C executable

I'm trying to learn how to use the Python ctypes library to write data to a file that can easily be read by C executables. In the little test case that I've put together, I'm running into some problems with reading/writing character arrays.
At the moment, I have three source files. write_struct.py creates a simple struct with two entries, an integer value called git and a character array called command, then writes the struct to a file using ctypes.fwrite. read_struct.c and read_struct.h compile into an executable that internally defines an identical struct to the one in write_struct.py, then reads in the data written by the python script and prints it out.
At the moment, the following values are assigned in the python file (not literally in the manner shown below, scroll down to see the actual code):
git = 1
command = 'cp file1 file2'
And when run, the C executable prints the following:
git: 1
command:
I realize that the problem is almost certainly in how the command variable is being assigned in the python script. I have read that c_char_p() (the function I'm currently using to initialize the data in that variable) does not create a pointer to mutable memory, and create_string_buffer() should be used instead, however I'm not sure about how this works with either adding that data to a struct, or writing it to a file. I guess I'm also confused about how writing pointers/their data to a file works in the first place. What is the best way to go about doing this?
Thanks in advance to anyone that is able to help!!
The code of my three files is below for reference:
write_struct.py:
"""
write_struct.py
"""
from ctypes import *
libc = cdll.LoadLibrary("libc.so.6")
class DataStruct(Structure):
    _fields_ = [("git", c_int),
                ("command", c_char_p)
                ]
def main():
    pydata = DataStruct(1, c_char_p("cp file1 file2"))
    libc.fopen.argtypes = c_char_p, c_char_p
    libc.fopen.restype = c_void_p
    libc.fwrite = libc.fwrite
    libc.fwrite.argtypes = c_void_p, c_size_t, c_size_t, c_void_p
    libc.fwrite.restype = c_size_t
    libc.fclose = libc.fclose
    libc.fclose.argtypes = c_void_p,
    libc.fclose.restype = c_int
    f = libc.fopen("stored_data", "wb")
    libc.fwrite(byref(pydata), sizeof(pydata), 1, f)
    libc.fclose(f)
    return 0
main()
read_struct.c:
/*
* read_struct.c
*
*/
#include "read_struct.h"
int main()
{
data_struct cdata = malloc(DATASIZE);
FILE *fp;
if ((fp = fopen("stored_data", "r")) != NULL) {
fread(cdata, DATASIZE, 1, fp);
printf("git: %i\n", cdata->git);
printf("command:");
printf("%s\n", cdata->command);
fclose(fp);
} else {
printf("Could not open file\n");
exit(1);
}
return 0;
}
read_struct.h:
/*
* read_struct.h
*
*/
#include <stdio.h>
#include <stdlib.h>
typedef struct _data_struct *data_struct;
struct _data_struct {
int git;
char command[40];
};
#define DATASIZE sizeof(struct _data_struct)

You can write binary data directly with Python. ctypes can be used to create the structure and supports bit fields and unions, or for simple structures the struct module can be used.
from ctypes import *
class DataStruct(Structure):
    _fields_ = [("git", c_int),
                ("command", c_char * 40)]  # You want an array here, not a pointer
pydata = DataStruct(1, b'cp file1 file2')  # byte string for initialization
with open('stored_data', 'wb') as f:  # write file in binary mode
    f.write(pydata)  # ctypes structures support conversion to bytes
The equivalent using the struct module:
import struct
# See struct docs for formatting codes
# i = int (native-endian. Use <i to force little-endian, >i for big-endian)
# 40s = char[40] (zero-padded if initializer is shorter)
pydata = struct.pack('i40s', 1, b'cp file1 file2')
with open('stored_data2', 'wb') as f:
    f.write(pydata)
Ref: https://docs.python.org/3/library/struct.html#format-strings
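To check that the two approaches produce the same bytes, and to read such a record back in Python, something along these lines should work. It is a minimal sketch that keeps everything in memory instead of touching stored_data; the names via_ctypes, git and command are illustrative.

```python
import ctypes
import struct

class DataStruct(ctypes.Structure):
    _fields_ = [("git", ctypes.c_int),
                ("command", ctypes.c_char * 40)]

# The bytes that f.write(pydata) would put into the file.
raw = bytes(DataStruct(1, b"cp file1 file2"))

# Option 1: rebuild the ctypes structure from the raw bytes.
via_ctypes = DataStruct.from_buffer_copy(raw)

# Option 2: unpack with the struct module and strip the zero padding.
git, command = struct.unpack("i40s", raw)
command = command.rstrip(b"\0")

print(via_ctypes.git, via_ctypes.command)
print(git, command)
```

Reading a c_char array field through ctypes stops at the first NUL byte, while the 40s format returns all 40 bytes, which is why the second option strips the padding explicitly.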

How to access the value of a ctypes.LP_c_char pointer?

I have defined a struct :
class FILE_HANDLE(Structure):
    _fields_ = [
        ("handle_bytes", c_uint),
        ("handle_type", c_int),
        ("f_handle", POINTER(c_char))
    ]
The struct is initialised :
buf = create_string_buffer(f_handle.handle_bytes)
fh = FILE_HANDLE(c_uint(8), c_int(0), buf)
I am passing it by reference to a function that populates it.
ret = libc.name_to_handle_at(dirfd, pathname, byref(fh), byref(mount_id), flags)
I can check with strace that the call works, but I have not been able to figure out how to access the value of fh.f_handle
fh.f_handle type is <ctypes.LP_c_char object at 0x7f1a7ca17560>
fh.f_handle.contents type is <ctypes.LP_c_char object at 0x7f1a7ca17560> but I get a SIGSEGV if I try to access its value.
How could I get 8 bytes from f_handle into a string or array ?

Everything actually looks right for what you've shown, but without seeing the explicit C definition of the structure and function you are calling it is difficult to see the problem.
Here's an example that works with what you have shown. I inferred what the C definitions should be from what you have declared in Python, but most likely your definition is different if you get a segfault.
C Code (Windows)
#include <stdio.h>
struct FILE_HANDLE
{
unsigned int handle_bytes;
int handle_type;
char* f_handle;
};
__declspec(dllexport) int name_to_handle_at(int dirfd, char* pathname, struct FILE_HANDLE* fh, int* mount_id, int flags)
{
unsigned int i;
printf("dirfd=%d pathname=%s fh->handle_bytes=%u fh->handle_type=%d flags=%d\n", dirfd, pathname, fh->handle_bytes, fh->handle_type, flags);
for(i = 0; i < fh->handle_bytes; ++i)
fh->f_handle[i] = 'A' + i;
*mount_id = 123;
return 1;
}
Python code (Works in Python 2 and 3):
from __future__ import print_function
from ctypes import *
class FILE_HANDLE(Structure):
    _fields_ = [("handle_bytes", c_uint),
                ("handle_type", c_int),
                ("f_handle", POINTER(c_char))]
buf = create_string_buffer(8)
fh = FILE_HANDLE(8, 0, buf)
libc = CDLL('test.dll')
mount_id = c_int(0)
ret = libc.name_to_handle_at(1,b'abc',byref(fh),byref(mount_id),7)
print('mount_id =',mount_id.value)
print('fh.f_handle =',fh.f_handle[:fh.handle_bytes])
Output
dirfd=1 pathname=abc fh->handle_bytes=8 fh->handle_type=0 flags=7
mount_id = 123
fh.f_handle = b'ABCDEFGH'
Note that since the structure is declared as a pointer to a single character, printing fh.f_handle.contents would only print b'A'. Using slicing, I've instructed Python to index the pointer up to the length allocated.
If this doesn't work for you, provide a Minimal, Complete, and Verifiable example (as I have) to reproduce your error exactly.
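The slicing behaviour described above can be tried without compiling any DLL. In this sketch ctypes.memmove stands in for the C function that fills the buffer, which is an assumption made purely for illustration; ctypes.string_at is shown as an alternative to slicing.

```python
import ctypes

class FILE_HANDLE(ctypes.Structure):
    _fields_ = [("handle_bytes", ctypes.c_uint),
                ("handle_type", ctypes.c_int),
                ("f_handle", ctypes.POINTER(ctypes.c_char))]

buf = ctypes.create_string_buffer(8)
fh = FILE_HANDLE(8, 0, ctypes.cast(buf, ctypes.POINTER(ctypes.c_char)))

# Stand-in for the C function: write 8 bytes through the pointer.
ctypes.memmove(fh.f_handle, b"ABCDEFGH", fh.handle_bytes)

by_slice = fh.f_handle[:fh.handle_bytes]                        # slice the pointer
by_string_at = ctypes.string_at(fh.f_handle, fh.handle_bytes)   # copy N bytes from the address
first_only = fh.f_handle.contents.value                         # only the first char

print(by_slice, by_string_at, first_only)
```

Both the slice and string_at need an explicit length, because a POINTER(c_char) carries no size information of its own.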

fh.f_handle is shown as LP_c_char because you defined the struct that way.
buf = create_string_buffer(8)
print(type(buf))
fh = FILE_HANDLE(c_uint(8), c_int(0), buf)
print(type(fh.f_handle))
Will output
<class 'ctypes.c_char_Array_8'>
<class 'ctypes.LP_c_char'>
You have defined your struct to accept a pointer to a c_char. So when you try to access fh.f_handle it will expect the value to be a memory address containing the address to the actual single c_char.
But by trying to input a c_char * 8 from the string buffer it will convert the first part of your buffer to a pointer.
Python tries to dereference your char[0] which means that it will look for a memory address with the value of the character you have defined in char[0]. That memory address is not valid, so your interpreter will signal a SIGSEGV.
Creating a class that properly handles a variable-length buffer is quite difficult. An easier option is to pass the buffer as an opaque handle; to access it afterwards, cast it back to a char array.
Example:
class FILE_HANDLE(Structure):
    _fields_ = [
        ("handle_bytes", c_uint),
        ("handle_type", c_int),
        ("f_handle", c_void_p)
    ]
buf = create_string_buffer(8)
buf = cast(buf, c_void_p)
fh = FILE_HANDLE(c_uint(8), c_int(0), buf)
f_handle_value = (c_char * fh.handle_bytes).from_address(fh.f_handle)
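A self-contained version of the opaque-handle idea might look like the sketch below. The buffer is filled from Python here, where the real code would call the C function, and the name view is only illustrative.

```python
import ctypes

class FILE_HANDLE(ctypes.Structure):
    _fields_ = [("handle_bytes", ctypes.c_uint),
                ("handle_type", ctypes.c_int),
                ("f_handle", ctypes.c_void_p)]

# Keep a reference to buf: the struct only stores its address as an integer.
buf = ctypes.create_string_buffer(8)
buf.raw = b"ABCDEFGH"          # stand-in for data written by the C function
fh = FILE_HANDLE(8, 0, ctypes.addressof(buf))

# Reinterpret the stored address as a char array of handle_bytes elements.
view = (ctypes.c_char * fh.handle_bytes).from_address(fh.f_handle)
print(view.raw)
```

from_address does not copy or own the memory, so view is only valid for as long as buf is alive.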

How to parse serialized C structs from binary file in python?

I have a handful of different type of C-structs that are all compressed into a binary file.
struct-id serialized-struct struct-id serialized-struct ...
If it were the same struct over and over, it would make sense to use the struct package, but I want to switch between previously defined structs all the time.
STRUCT1_ID = b'\xAA'
STRUCT2_ID = b'\xBB'
STRUCT_IDS = frozenset([STRUCT1_ID, STRUCT2_ID])
struct1s = []
struct2s = []
def create_test_file(filepath):
    with open(filepath, 'wb') as f:
        # Write an example struct1 id followed by struct
        f.write(STRUCT1_ID)
        f.write(b'\x01\x02\x03\x04\x05\x06')
        # Write an example struct2 id followed by struct
        f.write(STRUCT2_ID)
        f.write(b'\x07\x08\x09\x0A')
def parse_test_file(filepath):
    with open(filepath, 'rb') as f:
        msg_type = f.read(1)
        while msg_type:
            print(msg_type)
            if msg_type in STRUCT_IDS:
                # Parse the next however many bytes needed by struct
                # logic breaks down here
                struct1s.append(turnIntoStruct(f.read(?)))
                msg_type = f.read(1)
            else:
                print('Corrupted file. Unrecognized id')
                break
In C, the structs would be:
typedef struct struct1_s {
uint16_t a;
uint16_t b;
uint16_t c;
} struct1_t;
typedef struct struct2_s {
uint16_t d;
uint16_t e;
} struct2_t;
// Declare and initialize the structs
struct1_t s1 = {
.a = 0x0201,
.b = 0x0403,
.c = 0x0605
};
struct2_t s2 = {
.d = 0x0807,
.e = 0x0A09
};
I'm more fluent in C than in Python right now. I also haven't been able to get the construct library working on Python 3.4.3.

Map the ID to the struct pattern, and use the appropriate one.
structmap = {
    b'\xaa': ('3H', struct1s),
    b'\xbb': ('2H', struct2s)
}
...
structmap[msg_type][1].append(struct.unpack(structmap[msg_type][0],
    f.read(struct.calcsize(structmap[msg_type][0]))))
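Put together, the mapping approach could look like the sketch below. It assumes the file is little-endian (the "<" prefix), which matches the example values in the question, and it reads from an in-memory stream so it can be run as is. STRUCTS and parse_stream are illustrative names.

```python
import io
import struct

# One precompiled layout per struct id; "<" assumes little-endian data.
STRUCTS = {
    b"\xaa": struct.Struct("<3H"),   # struct1_t: a, b, c
    b"\xbb": struct.Struct("<2H"),   # struct2_t: d, e
}

def parse_stream(stream):
    """Return {id: [tuple, ...]} for every id/struct pair in the stream."""
    records = {msg_id: [] for msg_id in STRUCTS}
    while True:
        msg_id = stream.read(1)
        if not msg_id:               # clean end of file
            break
        layout = STRUCTS.get(msg_id)
        if layout is None:
            raise ValueError(f"unrecognized id {msg_id!r}")
        payload = stream.read(layout.size)
        if len(payload) != layout.size:
            raise ValueError("truncated record")
        records[msg_id].append(layout.unpack(payload))
    return records

data = b"\xaa\x01\x02\x03\x04\x05\x06" + b"\xbb\x07\x08\x09\x0a"
parsed = parse_stream(io.BytesIO(data))
print(parsed)
```

For a real file, the stream would be the object returned by open(filepath, 'rb'). Raising on an unknown id avoids the endless loop that a bare print in the else branch would cause.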

C equivalent to python pickle (object serialization)?

What would be the C equivalent to this python code?
Thanks.
data = gather_me_some_data()
# where data = [ (metric, datapoints), ... ]
# and datapoints = [ (timestamp, value), ... ]
serialized_data = cPickle.dumps(data, protocol=-1)
length_prefix = struct.pack("!L", len(serialized_data))
message = length_prefix + serialized_data

C has no built-in serialization mechanism, because type information is not available at run time. You must inject the type information yourself and then construct the required object from it. So define all your possible structs:
typedef struct {
int myInt;
float myFloat;
unsigned char myData[MY_DATA_SIZE];
} MyStruct_1;
typedef struct {
unsigned char myUnsignedChar;
double myDouble;
} MyStruct_2;
Then define an enum that lists all the struct types you have:
typedef enum {
ST_MYSTRUCT_1,
ST_MYSTRUCT_2
} MyStructType;
Define a helper function that returns the size of any struct:
int GetStructSize(MyStructType structType) {
switch (structType) {
case ST_MYSTRUCT_1:
return sizeof(MyStruct_1);
case ST_MYSTRUCT_2:
return sizeof(MyStruct_2);
default:
// OOPS no such struct in our pocket
return 0;
}
}
Then define serialize function:
void BinarySerialize(
MyStructType structType,
void * structPointer,
unsigned char * serializedData) {
int structSize = GetStructSize(structType);
if (structSize != 0) {
// copy struct metadata to serialized bytes
memcpy(serializedData, &structType, sizeof(structType));
// copy struct itself
memcpy(serializedData+sizeof(structType), structPointer, structSize);
}
}
And de-serialization function:
void BinaryDeserialize(
MyStructType structTypeDestination,
void ** structPointer,
unsigned char * serializedData)
{
// get source struct type
MyStructType structTypeSource;
memcpy(&structTypeSource, serializedData, sizeof(structTypeSource));
// get source struct size
int structSize = GetStructSize(structTypeSource);
if (structTypeSource == structTypeDestination && structSize != 0) {
*structPointer = malloc(structSize);
memcpy(*structPointer, serializedData+sizeof(structTypeSource), structSize);
}
}
Serialization usage example:
MyStruct_2 structInput = {0x69, 0.1};
MyStruct_1 * structOutput_1 = NULL;
MyStruct_2 * structOutput_2 = NULL;
unsigned char testSerializedData[SERIALIZED_DATA_MAX_SIZE] = {0};
// serialize structInput
BinarySerialize(ST_MYSTRUCT_2, &structInput, testSerializedData);
// try to de-serialize to something
BinaryDeserialize(ST_MYSTRUCT_1, (void **)&structOutput_1, testSerializedData);
BinaryDeserialize(ST_MYSTRUCT_2, (void **)&structOutput_2, testSerializedData);
// determine which object was de-serialized
// (plus you will get code-completion support about object members from IDE)
if (structOutput_1 != NULL) {
// do something with structOutput_1
free(structOutput_1);
}
else if (structOutput_2 != NULL) {
// do something with structOutput_2
free(structOutput_2);
}
I think this is the simplest serialization approach in C. But it has some problems:
The struct must not contain pointers, because you cannot know how much memory to allocate when serializing pointers, nor where or how to serialize the data they point to.
This example is sensitive to system endianness: you need to be careful about whether data is stored in memory in big-endian or little-endian fashion, and reverse bytes if needed [when casting char * to an integral type such as an enum] (...or refactor the code to be more portable).
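For comparison with the Python side of the question, the endianness problem goes away once the byte order of the wire format is fixed explicitly. The sketch below uses Python's struct module and a hypothetical format (a little-endian uint32 type tag followed by an unpadded payload). This layout is an assumption for illustration and is not the layout the C code above produces, which depends on the compiler's enum size and struct padding.

```python
import struct

# Hypothetical wire format: uint32 type tag, then the payload, all little-endian.
TAG = struct.Struct("<I")
PAYLOADS = {
    1: struct.Struct("<B d"),   # mirrors MyStruct_2: unsigned char + double, no padding
}

def serialize(tag, *fields):
    """Prefix the packed fields with their type tag."""
    return TAG.pack(tag) + PAYLOADS[tag].pack(*fields)

def deserialize(blob):
    """Read the tag, then unpack the payload with the matching layout."""
    (tag,) = TAG.unpack_from(blob, 0)
    return tag, PAYLOADS[tag].unpack_from(blob, TAG.size)

blob = serialize(1, 0x69, 0.1)
tag, fields = deserialize(blob)
print(len(blob), tag, fields)
```

Because both the tag and the payload have a declared byte order and no implicit padding, the same bytes decode identically on any machine.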

If you can use C++, there is the PicklingTools library

python using ctypes to work with dll - structure OUT argument

In the header file of the dll I have the following structure
typedef struct USMC_Devices_st{
DWORD NOD; // Number of the devices ready to work
char **Serial; // Array of 16 byte ASCII strings
char **Version; // Array of 4 byte ASCII strings
} USMC_Devices; // Structure representing connected devices
I would like to call a dll function:
DWORD USMC_Init( USMC_Devices &Str );
I tried with this:
class USMCDevices(Structure):
    _fields_ = [("NOD", c_long),
                ("Serial", c_char_p),
                ("Version", c_char_p)]
usmc = cdll.USMCDLL #this is the dll file
init = usmc.USMC_Init
init.restype = c_int32; # return type
init.argtypes = [USMCDevices]; # argument
dev = USMCDevices()
init(dev)
I get an error here. I guess the problem is with "Serial" and "Version" which both are array corresponding to the NOD (number of devices).
Any ideas how to solve this problem?
I really appreciate your help!!!

Use POINTER(c_char_p) for the char ** pointers. Indexing Serial or Version creates a Python string for the given null-terminated string. Note that indexing in the array beyond NOD - 1 either produces garbage values or will crash the interpreter.
C:
#include <windows.h>
typedef struct USMC_Devices_st {
DWORD NOD; // Number of the devices ready to work
char **Serial; // Array of 16 byte ASCII strings
char **Version; // Array of 4 byte ASCII strings
} USMC_Devices;
char *Serial[] = {"000000000000001", "000000000000002"};
char *Version[] = {"001", "002"};
__declspec(dllexport) DWORD USMC_Init(USMC_Devices *devices) {
devices->NOD = 2;
devices->Serial = Serial;
devices->Version = Version;
return 0;
}
// build: cl usmcdll.c /LD
Python:
import ctypes
from ctypes import wintypes
class USMCDevices(ctypes.Structure):
    _fields_ = [("NOD", wintypes.DWORD),
                ("Serial", ctypes.POINTER(ctypes.c_char_p)),
                ("Version", ctypes.POINTER(ctypes.c_char_p))]
usmc = ctypes.cdll.USMCDLL
init = usmc.USMC_Init
init.restype = wintypes.DWORD
init.argtypes = [ctypes.POINTER(USMCDevices)]
dev = USMCDevices()
init(ctypes.byref(dev))
devices = [dev.Serial[i] + b':' + dev.Version[i]
for i in range(dev.NOD)]
print('\n'.join(d.decode('ascii') for d in devices))
Output:
000000000000001:001
000000000000002:002
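The indexing behaviour of POINTER(c_char_p) can also be tried without the DLL. In this sketch the two arrays are built in Python to stand in for what USMC_Init would fill in, and c_uint32 replaces wintypes.DWORD so that the snippet is not tied to Windows. Both substitutions are assumptions made for illustration.

```python
import ctypes

class USMCDevices(ctypes.Structure):
    _fields_ = [("NOD", ctypes.c_uint32),   # DWORD is a 32-bit unsigned integer
                ("Serial", ctypes.POINTER(ctypes.c_char_p)),
                ("Version", ctypes.POINTER(ctypes.c_char_p))]

# Stand-ins for the arrays the DLL would point the struct at.
serials = (ctypes.c_char_p * 2)(b"000000000000001", b"000000000000002")
versions = (ctypes.c_char_p * 2)(b"001", b"002")

dev = USMCDevices(2, serials, versions)

# Indexing a POINTER(c_char_p) yields bytes for each null-terminated string.
devices = [(dev.Serial[i] + b":" + dev.Version[i]).decode("ascii")
           for i in range(dev.NOD)]
print("\n".join(devices))
```

The loop is bounded by dev.NOD on purpose, since the pointers themselves carry no length and indexing past the last device reads arbitrary memory.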
