How to create a binary file from a data structure in Python?

I am working on an image build project and am trying to use Python to create a data structure.
I know it is very easy to create a data structure in C; here is a C example.
#include <stdint.h>
#include <stdio.h>
#include <string.h>
struct Books {
char title[50];
char author[50];
char subject[100];
uint32_t book_id;
uint8_t book_ver;
uint16_t book_location;
};
int main(void) {
FILE *fptr;
struct Books Book1; /* Declare Book1 of type Book */
strcpy( Book1.title, "C Programming");
strcpy( Book1.author, "Anna Ali");
strcpy( Book1.subject, "C Programming Tutorial");
Book1.book_id = 649507;
Book1.book_ver = 2;
Book1.book_location= 308;
fptr = fopen("book_struct.bin","wb");
fwrite(&Book1, sizeof(struct Books), 1, fptr);
fclose(fptr);
}
How can I create a binary file from a data structure like this in Python?
Could someone provide some Python reference code?
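A minimal sketch of one way to do this with Python's struct module follows. The format string is an assumption: it supposes a little-endian target and the usual alignment rules, under which the C struct above occupies 208 bytes with one padding byte between book_ver and book_location. The helper name pack_book is made up for illustration. Unlike strcpy into an uninitialised struct, the "s" format pads unused bytes with NULs, and it silently truncates strings that are too long.

```python
import struct

# Assumed layout (little-endian, typical alignment for the C struct above):
# char[50], char[50], char[100], uint32, uint8, 1 pad byte, uint16
BOOK_FORMAT = "<50s50s100sIBxH"
book_struct = struct.Struct(BOOK_FORMAT)


def pack_book(title, author, subject, book_id, book_ver, book_location):
    """Return the packed bytes for one record; strings are NUL-padded."""
    return book_struct.pack(
        title.encode("ascii"),
        author.encode("ascii"),
        subject.encode("ascii"),
        book_id,
        book_ver,
        book_location,
    )


record = pack_book("C Programming", "Anna Ali", "C Programming Tutorial",
                   649507, 2, 308)

# Write the record to disk, mirroring the fwrite() call in the C example.
with open("book_struct.bin", "wb") as f:
    f.write(record)
```

If the file has to match a specific compiler's output byte for byte, it is worth comparing book_struct.size with sizeof(struct Books) on the target before relying on this layout.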

Related

How to read structured binary data from a file?

The following C++ code writes a header to a file:
#include <iostream>
struct Header
{
uint16_t name;
uint8_t type;
uint8_t padding;
uint32_t width, height;
uint32_t depth1, depth2;
float dMin, dMax;
};
int main()
{
Header header;
header.name = *reinterpret_cast<const uint16_t*>("XO");
header.type = true;
header.width = (uint32_t)512;
header.height = (uint32_t)600;
header.depth1 = (uint32_t)16;
header.depth2 = (uint32_t)25;
header.dMin = 5.0;
header.dMax = 8.6;
FILE* f = fopen("header.bin", "wb");
fwrite(&header, sizeof(Header), 1, f);
}
I am looking to read these header.bin files using Python. In C++ I would be doing something like:
fread(&header, sizeof(Header), 1, f)
But I'm unsure how to read the bytes and convert them into the corresponding fields that the Header struct has in Python?
Use the struct module to define the binary layout of a C-like struct and de-/serialise it:
import struct
# Format string describing the data layout.
# "I" (unsigned int) is 4 bytes like uint32_t; "L" is 8 bytes on many 64-bit platforms.
layout = "H B x 2I 2I 2f"
# Object representing the layout, including size
header = struct.Struct(layout)
with open("header.bin", "rb") as in_stream:
print(header.unpack(in_stream.read(header.size)))
The layout is a format string describing the fields in-order, e.g. H for uint16_t, B for uint8_t, x for a pad byte, and so on.
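To make the unpacked tuple easier to work with, the fields can be given names. The sketch below is only an illustration: it assumes a little-endian file, uses a namedtuple called Header (a name chosen here, not part of the struct module), and builds stand-in bytes with pack instead of reading a real header.bin.

```python
import struct
from collections import namedtuple

# "<" = little-endian with no implicit padding; "x" is the explicit pad byte.
header_struct = struct.Struct("<HBx4I2f")
Header = namedtuple("Header", "name type width height depth1 depth2 dMin dMax")

# Stand-in bytes mimicking what the C++ program writes on a little-endian machine.
sample = header_struct.pack(
    int.from_bytes(b"XO", "little"), 1, 512, 600, 16, 25, 5.0, 8.6
)

# Unpack into a tuple, then attach the field names.
header = Header._make(header_struct.unpack(sample))
print(header.width, header.height)
print(header.name.to_bytes(2, "little"))
```

With a real file, sample would be replaced by in_stream.read(header_struct.size).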
I would do this with Python's ctypes, partly so that the Header definition can be shared between the C side and the Python side.
Create a class derived from ctypes.Structure to map the types:
import ctypes
class StructHeader(ctypes.Structure):
_fields_ = [
("name", ctypes.c_uint16),
("type", ctypes.c_uint8),
...
]
And create a C function which does what you want, with a signature like
int header(struct Header *buffer)
{
// open the file and write to buffer
// opportunity for other features
}
Then you can compile a shared object that reads the file and fills in that type
gcc -shared -Wl,-soname,your_soname \
-o library_name file_list library_list
And call out with ctypes.CDLL to read the headers
header = ctypes.CDLL("mylib.so.1").header # function named header
header.argtypes = [ctypes.POINTER(StructHeader)]
header.restype = ctypes.c_int
# allocate struct for write
buffer = StructHeader()
# call out to function to write buffer
header(buffer)
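If compiling a shared object is more than the task needs, ctypes can also fill a Structure directly from bytes with from_buffer_copy. The sketch below is an illustration under assumptions: the full field list is inferred from the C++ Header in the question, parse_header is a made-up helper name, and the bytes are a stand-in for what would be read from header.bin on a machine with the same byte order as the writer.

```python
import ctypes
import struct


class StructHeader(ctypes.Structure):
    # Field list assumed from the C++ Header struct in the question.
    _fields_ = [
        ("name", ctypes.c_uint16),
        ("type", ctypes.c_uint8),
        ("padding", ctypes.c_uint8),
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("depth1", ctypes.c_uint32),
        ("depth2", ctypes.c_uint32),
        ("dMin", ctypes.c_float),
        ("dMax", ctypes.c_float),
    ]


def parse_header(raw):
    """Copy raw bytes into a new StructHeader instance."""
    return StructHeader.from_buffer_copy(raw)


# Stand-in for open("header.bin", "rb").read(ctypes.sizeof(StructHeader)).
raw = struct.pack("=HBB4I2f", 0x4F58, 1, 0, 512, 600, 16, 25, 5.0, 8.6)
hdr = parse_header(raw)
print(hdr.width, hdr.height, hdr.dMin)
```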

Writing to hdf5-file in C++ results in data being truncated at some point

Consider the following code:
#include <H5Cpp.h>
#include <vector>
#include <eigen3/Eigen/Dense>
#include <iostream>
double* matrix_to_array(Eigen::MatrixXd const &input){
int const NX = input.rows();
int const NY = input.cols();
double *data = new double[NX*NY];
for(std::size_t i=0; i<NX; i++){
for(std::size_t j=0; j<NY; j++){
data[j+i*NX] = input(i,j);
}
}
return data;
}
int main() {
Eigen::MatrixXd data = Eigen::MatrixXd::Random(124, 4654);
data.fill(3);
H5::H5File file("data.hdf5", H5F_ACC_TRUNC);
hsize_t dimsf[2] = {data.rows(), data.cols()};
H5::DataSpace dataspace(2, dimsf);
H5::DataSet dataset = file.createDataSet("test_data_set",
H5::PredType::NATIVE_DOUBLE,
dataspace);
auto data_arr = matrix_to_array(data);
dataset.write(data_arr, H5::PredType::NATIVE_DOUBLE);
delete[] data_arr;
}
It compiles just fine using the following CMakeLists.txt
cmake_minimum_required(VERSION 2.8)
project(test)
find_package(HDF5 REQUIRED COMPONENTS C CXX)
include_directories(${HDF5_INCLUDE_DIRS})
add_executable(hdf5 hdf5.cpp)
target_link_libraries(hdf5 ${HDF5_HL_LIBRARIES} ${HDF5_CXX_LIBRARIES} ${HDF5_LIBRARIES})
After executing it I thought everything was fine, but upon running the following Python code (which basically just prints the data row by row)
import h5py
import numpy as np
hf = h5py.File("build/data.hdf5", "r")
keys = list(hf.keys())
data_set = hf.get(keys[0])
data_set_np = np.array(data_set)
for row in data_set_np:
print(row)
I realized that the first 18000 or so entries of the matrix were properly written to the hdf5-file, while the rest was set to zero for some reason. I checked data and data_arr in the above C++ code, and all the entries of both matrices are set to 0, so the error must happen somewhere in the writing process to the hdf5-file... The issue is, I don't see where. What exactly am I missing?
After some trying out and consulting the examples of the H5 group, I got it to work.
#include <iostream>
#include <string>
#include "H5Cpp.h"
#include <eigen3/Eigen/Dense>
using namespace H5;
int main (void){
const H5std_string FILE_NAME( "data.h5" );
const H5std_string DATASET_NAME( "DOUBLEArray" );
const int NX = 123; // dataset dimensions
const int NY = 4563;
const int RANK = 2;
Eigen::MatrixXd data = Eigen::MatrixXd::Random(NX, NY);
int i, j;
double data_arr[NX][NY]; // buffer for data to write
for (j = 0; j < NX; j++)
{
for (i = 0; i < NY; i++)
data_arr[j][i] = data(j,i);
}
H5File file( FILE_NAME, H5F_ACC_TRUNC );
hsize_t dimsf[2]; // dataset dimensions
dimsf[0] = NX;
dimsf[1] = NY;
DataSpace dataspace( RANK, dimsf );
/*
* Define datatype for the data in the file.
* We will store little endian DOUBLE numbers.
*/
FloatType datatype( PredType::NATIVE_DOUBLE );
datatype.setOrder( H5T_ORDER_LE );
DataSet dataset = file.createDataSet( DATASET_NAME, datatype, dataspace );
dataset.write( data_arr, PredType::NATIVE_DOUBLE );
}
As far as I can tell the only thing that changes is that we specify the order of elements here explicitly, i.e.
FloatType datatype( PredType::NATIVE_DOUBLE );
datatype.setOrder( H5T_ORDER_LE );
while in the question we just pass PredType::NATIVE_DOUBLE as argument. I can't really comment on why or if this solves the problem...
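There is one more difference between the two programs that may matter more than the byte order: the question's matrix_to_array computes the flat index as j+i*NX, while a row-major buffer with NY columns needs i*NY+j, which is what indexing data_arr[j][i] on a 2D array gives in the working version. With NX = 124 and NY = 4654, the question's index never exceeds 123*124 + 4653, so only the first 19906 slots are ever written, which is consistent with the roughly 18000 good entries observed. The Python sketch below reproduces the effect on a tiny matrix; the function names are made up for illustration.

```python
def flatten_row_major(matrix):
    """Flatten a list-of-rows matrix using the index i * n_cols + j."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    flat = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            flat[i * n_cols + j] = matrix[i][j]
    return flat


def flatten_with_row_stride_bug(matrix):
    """Same loop, but with the question's stride (i * n_rows + j)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    flat = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            flat[i * n_rows + j] = matrix[i][j]
    return flat


matrix = [[3.0] * 5 for _ in range(2)]  # 2 rows, 5 columns, all 3.0
good = flatten_row_major(matrix)
bad = flatten_with_row_stride_bug(matrix)
print(good.count(0.0), bad.count(0.0))  # number of slots never written
```

In the C++ version the unwritten tail of the new double[] buffer is uninitialised and not guaranteed to be zero, so zeros in the file are only the likely outcome.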

Map data in C++ to memory and read data in Python

I am mapping integers to memory in C++ (Process 1) and trying to read them in Python (Process 2).
Current Results:
1) map integer 3 in C++ ==> Python (b'\x03\x00\x00\x00')
2) map integer 4 in C++ ==> Python (b'\x04\x00\x00\x00'), and so on.
code:
Process 1
#include <windows.h>
#include <iostream>
using namespace std;
void main()
{
auto name = "new";
auto size = 4;
HANDLE hSharedMemory = CreateFileMapping(NULL, NULL, PAGE_READWRITE, NULL, size, name);
auto pMemory = (int*)MapViewOfFile(hSharedMemory, FILE_MAP_ALL_ACCESS, NULL, NULL, size);
for (int i = 0; i < 10; i++)
{
* pMemory = i;
cout << i << endl;
Sleep(1000);
}
UnmapViewOfFile(pMemory);
CloseHandle(hSharedMemory);
}
Process 2
import time
import mmap
bufSize = 4
FILENAME = 'new'
for i in range(10):
data = mmap.mmap(0, bufSize, tagname=FILENAME, access=mmap.ACCESS_READ)
dataRead = data.read(bufSize)
print(dataRead)
time.sleep(1)
However, my goal is to map an array that is 320*240 in size but when I try a simple array as below
int arr[4] = {1,2,3,4};
and attempt to map to memory by * pMemory = arr;
I am getting the error "a value of type int* cannot be assigned to an entity of type int" and error code "0x80070002".
Any ideas on how to solve this problem?
P.S. For some reason integer 9 is mapped as b'\t\x00\x00\x00' in Python. What am I missing?
Use memcpy to copy the array to shared memory.
#include <cstring>
#include <windows.h>
int main() {
int array[320*240];
const int size = sizeof(array);
const char *name = "new";
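// INVALID_HANDLE_VALUE requests a mapping backed by the system paging file;
// the ANSI variant is used because name is a const char *.
HANDLE hSharedMemory = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, size, name);
void *pMemory = MapViewOfFile(hSharedMemory, FILE_MAP_ALL_ACCESS, 0, 0, size);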
std::memcpy(pMemory, array, size);
UnmapViewOfFile(pMemory);
CloseHandle(hSharedMemory);
}
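On the Python side, the bytes returned by mmap.read() still have to be decoded. The b'\t\x00\x00\x00' from the P.S. is just how Python prints the byte 0x09 (the tab character), so the value is already correct. A minimal sketch of the decoding, assuming 4-byte little-endian ints as on typical Windows machines, with made-up helper names and stand-in bytes in place of a real mapping:

```python
import struct


def decode_int32_le(raw):
    """Interpret 4 bytes as one little-endian signed 32-bit integer."""
    return struct.unpack("<i", raw)[0]


def decode_int32_array_le(raw):
    """Interpret a buffer as consecutive little-endian signed 32-bit integers."""
    count = len(raw) // 4
    return list(struct.unpack("<%di" % count, raw[:count * 4]))


print(decode_int32_le(b"\t\x00\x00\x00"))

# Stand-in for the bytes mmap.read() would return for int arr[4] = {1, 2, 3, 4}.
print(decode_int32_array_le(struct.pack("<4i", 1, 2, 3, 4)))
```

For the 320*240 array, the read size in the Python process would need to be 320 * 240 * 4 bytes instead of 4.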

How to parse serialized C structs from binary file in python?

I have a handful of different types of C structs that are all packed into a binary file:
struct-id serialized-struct struct-id serialized-struct ...
If it were the same struct over and over, it would make sense to use the struct package, but I need to switch between several previously defined structs as I read.
STRUCT1_ID = b'\xAA'  # bytes, since the file is opened in binary mode
STRUCT2_ID = b'\xBB'
STRUCT_IDS = frozenset([STRUCT1_ID, STRUCT2_ID])
struct1s = []
struct2s = []
def create_test_file(filepath):
with open(filepath, 'wb') as f:
# Write an example struct1 id followed by struct
f.write(STRUCT1_ID)
f.write(b'\x01\x02\x03\x04\x05\x06')
# Write an example struct2 id followed by struct
f.write(STRUCT2_ID)
f.write(b'\x07\x08\x09\x0A')
def parse_test_file(filepath):
with open(filepath, 'rb') as f:
msg_type = f.read(1)
while msg_type:
print(msg_type)
if msg_type in STRUCT_IDS:
# Parse the next however many bytes needed by struct
# logic breaks down here
struct1s.append(turnIntoStruct(f.read(?)))
msg_type = f.read(1)
else:
print('Corrupted file. Unrecognized id')
In C, the structs would be:
typedef struct struct1_s {
uint16_t a;
uint16_t b;
uint16_t c;
} struct1_t;
typedef struct struct2_s {
uint16_t d;
uint16_t e;
} struct2_t;
// Declare and initialize the structs
struct1_t s1 = {
.a = 0x0201,
.b = 0x0403,
.c = 0x0605
};
struct2_t s2 = {
.d = 0x0807,
.e = 0x0A09
};
I'm more fluent in C than in Python right now. I also seem unable to get construct working on Python 3.4.3.
Map the ID to the struct pattern, and use the appropriate one.
structmap = {
b'\xaa': ('3H', struct1s),
b'\xbb': ('2H', struct2s)
}
...
structmap[msg_type][1].append(struct.unpack(structmap[msg_type][0],
f.read(struct.calcsize(structmap[msg_type][0]))))
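A fuller sketch of this dispatch idea follows, as an illustration only. It assumes little-endian uint16 fields, as implied by the example bytes and initialisers in the question, and uses hypothetical names (STRUCT_MAP, parse_stream). An in-memory stream stands in for the real file.

```python
import io
import struct

# Assumed little-endian layouts: struct1_t is 3 x uint16, struct2_t is 2 x uint16.
STRUCT_MAP = {
    b"\xaa": struct.Struct("<3H"),
    b"\xbb": struct.Struct("<2H"),
}


def parse_stream(stream):
    """Return a list of (id_byte, field_tuple) records read from a binary stream."""
    records = []
    while True:
        msg_type = stream.read(1)
        if not msg_type:
            break  # clean end of file
        layout = STRUCT_MAP.get(msg_type)
        if layout is None:
            raise ValueError("Unrecognized id %r" % msg_type)
        payload = stream.read(layout.size)
        if len(payload) != layout.size:
            raise ValueError("Truncated record for id %r" % msg_type)
        records.append((msg_type, layout.unpack(payload)))
    return records


sample = b"\xaa\x01\x02\x03\x04\x05\x06" + b"\xbb\x07\x08\x09\x0a"
records = parse_stream(io.BytesIO(sample))
print(records)
```

Raising on an unknown id stops the loop, which avoids spinning forever on a corrupted file; for a real file, pass the object returned by open(filepath, 'rb') instead of the BytesIO.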

C equivalent to python pickle (object serialization)?

What would be the C equivalent to this python code?
Thanks.
data = gather_me_some_data()
# where data = [ (metric, datapoints), ... ]
# and datapoints = [ (timestamp, value), ... ]
serialized_data = cPickle.dumps(data, protocol=-1)
length_prefix = struct.pack("!L", len(serialized_data))
message = length_prefix + serialized_data
C doesn't support a direct serialization mechanism, because in C you can't get type information at run time. You have to inject some type information yourself and then construct the required object from it. So define all your possible structs:
typedef struct {
int myInt;
float myFloat;
unsigned char myData[MY_DATA_SIZE];
} MyStruct_1;
typedef struct {
unsigned char myUnsignedChar;
double myDouble;
} MyStruct_2;
Then define an enum which lists all the structs you have:
typedef enum {
ST_MYSTRUCT_1,
ST_MYSTRUCT_2
} MyStructType;
Define a helper function which returns the size of any of these structs:
int GetStructSize(MyStructType structType) {
switch (structType) {
case ST_MYSTRUCT_1:
return sizeof(MyStruct_1);
case ST_MYSTRUCT_2:
return sizeof(MyStruct_2);
default:
// OOPS no such struct in our pocket
return 0;
}
}
Then define the serialization function:
void BinarySerialize(
MyStructType structType,
void * structPointer,
unsigned char * serializedData) {
int structSize = GetStructSize(structType);
if (structSize != 0) {
// copy struct metadata to serialized bytes
memcpy(serializedData, &structType, sizeof(structType));
// copy struct itself
memcpy(serializedData+sizeof(structType), structPointer, structSize);
}
}
And the de-serialization function:
void BinaryDeserialize(
MyStructType structTypeDestination,
void ** structPointer,
unsigned char * serializedData)
{
// get source struct type
MyStructType structTypeSource;
memcpy(&structTypeSource, serializedData, sizeof(structTypeSource));
// get source struct size
int structSize = GetStructSize(structTypeSource);
if (structTypeSource == structTypeDestination && structSize != 0) {
*structPointer = malloc(structSize);
memcpy(*structPointer, serializedData+sizeof(structTypeSource), structSize);
}
}
Serialization usage example:
MyStruct_2 structInput = {0x69, 0.1};
MyStruct_1 * structOutput_1 = NULL;
MyStruct_2 * structOutput_2 = NULL;
unsigned char testSerializedData[SERIALIZED_DATA_MAX_SIZE] = {0};
// serialize structInput
BinarySerialize(ST_MYSTRUCT_2, &structInput, testSerializedData);
// try to de-serialize to something
BinaryDeserialize(ST_MYSTRUCT_1, &structOutput_1, testSerializedData);
BinaryDeserialize(ST_MYSTRUCT_2, &structOutput_2, testSerializedData);
// determine which object was de-serialized
// (plus you will get code-completion support about object members from IDE)
if (structOutput_1 != NULL) {
// do something with structOutput_1
free(structOutput_1);
}
else if (structOutput_2 != NULL) {
// do something with structOutput_2
free(structOutput_2);
}
I think this is the simplest serialization approach in C, but it has some problems:
The struct must not contain pointers, because you cannot know how much memory to allocate when serializing a pointer, or where and how to restore the data it points to.
This example is sensitive to system endianness: you need to know whether the data is stored in memory in big-endian or little-endian fashion, and reverse the bytes if needed [when casting char * to an integral type such as an enum] (or refactor the code to be more portable).
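The endianness caveat can be seen from the Python side, where the question's code already picks network byte order with "!L". The sketch below is only an illustration: it assumes the enum tag is stored as a 4-byte unsigned integer (the size of a C enum is implementation-defined), and read_tag is a made-up helper name.

```python
import struct
import sys

value = 1  # e.g. an enum tag such as ST_MYSTRUCT_2

native = struct.pack("=I", value)   # byte order of the machine running this
network = struct.pack("!I", value)  # big-endian ("network") order, same everywhere

print(sys.byteorder, native, network)


def read_tag(buffer):
    """Decode a 4-byte big-endian tag from the start of a buffer."""
    return struct.unpack_from("!I", buffer, 0)[0]
```

A memcpy of the enum in C produces the native form, so a reader on a machine with the other byte order would decode a different number unless both sides agree on a fixed order.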
If you can use C++, there is the PicklingTools library
