I have an 8-byte packed string (bytes) with the following structure:
typedef struct _entry_t {
uint start;
ushort size;
ushort id;
} _entry_t;
I want to know how to unpack the entire string according to the format above and extract the member values in the easiest way possible (in one line, maybe).
Take a look at the struct module.
Suppose you get the data as bytes and have it stored in the variable input, then you can decode it with the following code:
import struct
start, size, id = struct.unpack('IHH', input)
Depending on the platform the C code runs on, you may need to think about endianness (add ">" or "<" as a prefix to the format string) and about whether the struct needs __attribute__((packed)). I assumed that on your platform an int is 32 bits long and a short is 16 bits long.
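As a minimal sketch of the endianness point, the example below assumes the 8 bytes were written little-endian with no padding; the sample bytes in `raw` and the name `entry_id` are made up for illustration. With an explicit `<` prefix, `struct` uses standard sizes (4-byte `I`, 2-byte `H`) and adds no alignment padding, so the result does not depend on the machine running Python:

```python
import struct

# Hypothetical sample data: start=1, size=2, id=3, little-endian, no padding.
raw = b'\x01\x00\x00\x00\x02\x00\x03\x00'

# '<' selects little-endian byte order, standard sizes and no padding.
ENTRY_FORMAT = '<IHH'

# Sanity check: the format must describe exactly 8 bytes.
assert struct.calcsize(ENTRY_FORMAT) == len(raw)

start, size, entry_id = struct.unpack(ENTRY_FORMAT, raw)
print(start, size, entry_id)
```

If the data comes from a big-endian producer, the same code should work with `>` in place of `<`.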
I'm using a binary Python library that returns a Buffer object. This object is basically a wrapper of a C object containing a pointer to the actual memory buffer. What I need is to get the memory address contained in that pointer from Python, the problem is that the Buffer object doesn't have a Python method to obtain it, so I need to do some hacky trick to get it.
For the moment I found an ugly and unsafe way to get the pointer value:
I know the internal structure of the C object:
typedef struct _Buffer {
PyObject_VAR_HEAD
PyObject *parent;
int type; /* GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT */
int ndimensions;
int *dimensions;
union {
char *asbyte;
short *asshort;
int *asint;
float *asfloat;
double *asdouble;
void *asvoid;
} buf;
} Buffer;
So I wrote this Python code:
import ctypes
import struct
import sys

# PyObject_VAR_HEAD size (assumed to equal sys.getsizeof(0))
# + 8 bytes from PyObject *parent
# + 4 bytes from int type
# + 4 bytes from int ndimensions
# + 8 bytes from int *dimensions
# = PyObject_VAR_HEAD size + 24
offset = sys.getsizeof(0) + 24
buffer_pointer_addr = id(buffer) + offset
buffer_pointer_data = ctypes.string_at(buffer_pointer_addr, 8)
buffer_pointer_value = struct.unpack('Q', buffer_pointer_data)[0]
This is working consistently for me. As you can see I'm getting the memory address of the Python Buffer object with id(buffer), but as you may know that's not the actual pointer to the buffer, but just a Python number that in CPython happens to be the memory address to the Python object.
So then I'm adding the offset that I calculated by adding the sizes of all the variables in the C struct. I'm hardcoding the byte sizes (which is obviously completely unsafe) except for the PyObject_VAR_HEAD, that I get with sys.getsizeof(0).
By adding the offset I get the memory address that contains the pointer to the actual buffer. I then extract it with ctypes.string_at, hardcoding the size of the pointer as 8 bytes (I'm on a 64-bit OS), and use struct.unpack to convert it to an actual Python int.
So now my question is: is there a safer solution that doesn't hardcode all the sizes? Maybe something with ctypes? It's OK if it only works on CPython.
I found a safer solution after reading about C struct padding. It is based on the following assumptions:
The code will only be used on CPython.
The buffer pointer is at the end of the C struct.
The buffer pointer size can be safely taken from the void * C type, as it is going to be the biggest member of the union in the C struct. In any case, data pointer types all have the same size on most modern operating systems.
The C struct members are exactly the ones shown in the question.
Based on all these assumptions and the rules found here: https://stackoverflow.com/a/38144117/8861787,
we can safely say that there will be no padding at the end of the struct and we can extract the pointer without hardcoding anything:
import ctypes
import sys

# Get the size of the Buffer Python object
buffer_obj_size = sys.getsizeof(buffer)
# Get the size of the void * C type
buffer_pointer_size = ctypes.sizeof(ctypes.c_void_p)
# Calculate the address of the pointer, assuming that it's at the end of the C struct
buffer_pointer_addr = id(buffer) + buffer_obj_size - buffer_pointer_size
# Get the actual pointer value as a Python int
buffer_pointer_value = ctypes.c_void_p.from_address(buffer_pointer_addr).value
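Another way to avoid hardcoded sizes is to let ctypes compute the field offset by mirroring the C struct. The following is only a sketch: `BufferLayout` and `read_buf_pointer` are hypothetical names, the header fields assume a default (non-debug, non-free-threaded) CPython build, and the union is replaced by a single `void *` on the assumption that all its members are data pointers of the same size. It is exercised on a ctypes-allocated instance, not on a real Buffer object:

```python
import ctypes


class BufferLayout(ctypes.Structure):
    """Hypothetical ctypes mirror of the C Buffer struct from the question."""
    _fields_ = [
        # PyObject_VAR_HEAD as laid out on a default CPython build (assumption)
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        # Members of Buffer, in declaration order
        ("parent", ctypes.c_void_p),
        ("type", ctypes.c_int),
        ("ndimensions", ctypes.c_int),
        ("dimensions", ctypes.POINTER(ctypes.c_int)),
        # Every union member is a data pointer, so one void * stands in for it
        ("buf", ctypes.c_void_p),
    ]


def read_buf_pointer(obj_address):
    """Return the pointer stored in the buf field of the struct at obj_address."""
    field_address = obj_address + BufferLayout.buf.offset
    return ctypes.c_void_p.from_address(field_address).value


# Self-check on memory owned by ctypes instead of a real Buffer object.
demo = BufferLayout()
demo.buf = 0x1234
print(read_buf_pointer(ctypes.addressof(demo)))
```

With a real object the call would presumably be `read_buf_pointer(id(buffer))`, which still relies on `id()` returning the object's address and therefore only works on CPython.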
Say I have the following code in C++:
union {
int32_t i;
uint32_t ui;
};
i = SomeFunc();
std::string test(std::to_string(ui));
std::ofstream outFile(test);
And say I had the value of i somehow in Python, how would I be able to get the name of the file?
For those of you who are unfamiliar with C++: I am writing a value in signed 32-bit integer format to i and then interpreting the same bit pattern as an unsigned 32-bit integer through ui. I am taking the same 32 bits and interpreting them in two different ways.
How can I do this in Python? There does not seem to be any explicit type specification in Python, so how can I reinterpret some set of bits in a different way?
EDIT: I am using Python 2.7.12
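I would use Python's struct module to interpret the same bits in different ways.
Something like the following prints -12 as an unsigned integer:
import struct
# "=" selects native byte order with standard sizes, so "i" and "I" are both 4 bytes
p = struct.pack("=i", -12)
print("{}".format(struct.unpack("=I", p)[0]))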
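For comparison, here is a small sketch of two other ways to get the same reinterpretation without a pack/unpack round trip: masking with `0xFFFFFFFF`, and `ctypes.c_uint32`. The helper name `as_uint32` is made up for illustration, and the sketch assumes the input fits in 32 bits; it should behave the same on Python 2.7 and Python 3:

```python
import ctypes


def as_uint32(value):
    """Reinterpret a signed 32-bit value as unsigned by keeping the low 32 bits."""
    return value & 0xFFFFFFFF


i = -12

# Two's-complement masking: -12 becomes 2**32 - 12.
masked = as_uint32(i)

# ctypes does no overflow checking, so it wraps the value the same way.
wrapped = ctypes.c_uint32(i).value

# The file name from the C++ snippet would then be the decimal string.
file_name = str(masked)
print(file_name)
```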
I have a Python server talking to a C/C++ client. The client expects an unsigned char array to be passed back, so I need a way to convert a long to an unsigned char* of length 64.
Edit:
My initial thought was to look into the memory for the long and grab 8-bit chunks to try to create the needed unsigned chars. I'm testing that now.
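The "grab 8-bit chunks" idea can be sketched with shifts and masks instead of reading raw memory. This is only an illustration under several assumptions the question does not confirm: the value is non-negative, "length 64" means 64 bytes, and the client expects the least significant byte first. The name `long_to_uchar_array` is hypothetical:

```python
def long_to_uchar_array(value, length=64):
    """Split a non-negative integer into `length` bytes, least significant first."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value >> (8 * length):
        raise OverflowError("value does not fit in %d bytes" % length)
    # Shift the value right one byte at a time and keep the low 8 bits.
    return bytearray((value >> (8 * index)) & 0xFF for index in range(length))


payload = long_to_uchar_array(0x0102)
print(len(payload), payload[0], payload[1])
```

On Python 3, `value.to_bytes(64, 'little')` should produce the same bytes; the bytearray can be sent over a socket as is.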
Is there a way in Python to unpack C structures created using #pragma pack(x) or __attribute__((packed)) with the struct module?
Alternatively, how can I determine how Python's struct handles padding?
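Use the struct module.
It is flexible in terms of byte order (big vs. little endian) and alignment (packing); see Byte Order, Size, and Alignment in its documentation. It defaults to native byte order, size, and alignment (that is, whatever the C compiler that built Python uses). With an explicit "<", ">", "=" or "!" prefix it uses standard sizes and adds no padding, which matches a packed C struct.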
Native example
C:
struct foo {
int bar;
char t;
char x;
};
Python:
struct.pack('ibb', bar, t, x)
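One way to see how struct handles padding is to compare `struct.calcsize` for the same fields under different prefixes. The sketch below uses a made-up layout (a `char` followed by an `int`) and does not assert exact native sizes, since those depend on the platform:

```python
import struct

# Native mode ('@', the default) aligns each field as the C compiler would.
native_size = struct.calcsize('@BI')

# '<' (like '>', '=' and '!') uses standard sizes and inserts no padding,
# which corresponds to a packed struct.
packed_size = struct.calcsize('<BI')

print(native_size, packed_size)

# Native mode does not add trailing padding on its own. Ending the format
# with a zero-count code pads the total size to that type's alignment.
without_tail = struct.calcsize('@ibb')
with_tail = struct.calcsize('@ibb0i')
print(without_tail, with_tail)
```

On a typical platform with 4-byte aligned ints, the first pair would likely differ (padding after the `char`), and `'@ibb0i'` would match what C's `sizeof` reports for the example struct.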
I need to convert between Python objects and C strings of various encodings. Going from a C string to a unicode object was fairly simple using PyUnicode_Decode, but I'm not sure how to go the other way.
//char* can be a wchar_t or any other element size, just make sure it is correctly terminated for its encoding
Unicode(const char *str, size_t bytes, const char *encoding="utf-16", const char *errors="strict")
:Object(PyUnicode_Decode(str, bytes, encoding, errors))
{
//check for any python exceptions
ExceptionCheck();
}
I want to create another function that takes the Python Unicode string and puts it in a buffer using a given encoding, e.g.:
//fills buffer with a null terminated string in encoding
void AsCString(char *buffer, size_t bufferBytes,
const char *encoding="utf-16", const char *errors="strict")
{
...
}
I suspect it has something to do with PyUnicode_AsEncodedString, but that returns a PyObject, so I'm not sure how to put that into my buffer...
Note: both methods above are members of a C++ Unicode class that wraps the Python API
I'm using Python 3.0
I suspect it has something to do with PyUnicode_AsEncodedString, but that returns a PyObject, so I'm not sure how to put that into my buffer...
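In Python 3 the PyObject returned is a bytes object (a PyBytesObject; in Python 2 it was a PyStringObject), so you just need to use PyBytes_Size and PyBytes_AsString (PyString_Size and PyString_AsString in Python 2) to get the length and a pointer to the string's buffer, and memcpy it to your own buffer. Check that the encoded data plus its terminator fits in bufferBytes before copying.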
If you're looking for a way to go directly from a PyUnicode object into your own char buffer, I don't think that you can do that.