What I have in Python is a string of length 4 that I can unpack to get some values.
Is there a C equivalent of this Python function?
In C, there is no concept of "packing" like this. Whenever you have a char buffer such as
char buf[128];
whether you treat it like a string or a complex data structure is up to you. The simplest way would be to define a struct and copy data back and forth from your array.
struct MyStruct {
    int data1;
    int data2;
};

char buf[sizeof(struct MyStruct)];
struct MyStruct myStruct;
myStruct.data1 = 1;
myStruct.data2 = 2;
memcpy(buf, &myStruct, sizeof(struct MyStruct));
Please note that there IS some padding that MAY happen here. For example, if you have a short in your struct, the compiler MAY use 4 bytes for it anyway. This approach also fails when the structure contains pointers, such as char* strings: only the pointer value is copied, not the data it points to.
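For comparison, the Python side of such an exchange can be sketched with the struct module, assuming two native ints matching struct MyStruct above:

```python
import struct

# Pack two native C ints into a bytes buffer, the analogue of the
# memcpy above; '@ii' means native byte order, size and alignment:
buf = struct.pack('@ii', 1, 2)

# Unpacking recovers the two fields:
data1, data2 = struct.unpack('@ii', buf)
print(data1, data2)  # 1 2
```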
I got stuck on a trivial problem with the reinterpret_cast operator. In C++, I have a float variable which is used to create a uint32_t variable using reinterpret_cast, as shown below:
float x = 2.2949836e-38;
uint32_t rgb = *reinterpret_cast<uint32_t*>(&x);
printf("rgb=%u", rgb); // prints rgb=16377550
I want to achieve the same in python. Please note that conventional int casting isn't producing the expected result.
You can use pack, unpack from struct module:
from struct import pack, unpack
b = pack('f', 2.2949836e-38)
print(unpack('i', b)[0])
Prints:
16377550
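A variant with an explicit size and byte order; 'I' interprets the four bytes as unsigned, matching uint32_t (for this particular value the signed and unsigned results coincide):

```python
from struct import pack, unpack

x = 2.2949836e-38
# '<f' packs x as a little-endian IEEE 754 float; '<I' reads those
# same four bytes back as a little-endian unsigned 32-bit integer:
rgb = unpack('<I', pack('<f', x))[0]
print(rgb)  # 16377550
```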
On a 64-bit system an integer in Python takes 24 bytes. This is 3 times the memory that would be needed in e.g. C for a 64-bit integer. Now, I know this is because Python integers are objects. But what is the extra memory used for? I have my guesses, but it would be nice to know for sure.
Remember that the Python int type does not have a limited range like C int has; the only limit is the available memory.
Memory goes to storing the value, the current size of the integer storage (the storage size is variable to support arbitrary sizes), and the standard Python object bookkeeping (a reference to the relevant object and a reference count).
You can look up the longintrepr.h source (the Python 3 int type was traditionally known as the long type in Python 2); it makes effective use of the PyVarObject C type to track integer size:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
The ob_digit array stores 'digits' of either 15 or 30 bits wide (depending on your platform); so on my 64-bit OS X system, an integer up to (2 ^ 30) - 1 uses 1 'digit':
>>> sys.getsizeof((1 << 30) - 1)
28
but if you use 2 30-bit digits in the number an additional 4 bytes are needed, etc:
>>> sys.getsizeof(1 << 30)
32
>>> sys.getsizeof(1 << 60)
36
>>> sys.getsizeof(1 << 90)
40
The base 24 bytes then are the PyObject_VAR_HEAD structure, holding the object size, the reference count and the type pointer (each 8 bytes / 64 bits on my 64-bit OS X platform).
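The digit accounting can be verified directly; a small sketch, assuming a 64-bit CPython build with 30-bit digits:

```python
import sys

# 1 << 30, 1 << 60 and 1 << 90 need 2, 3 and 4 digits respectively,
# so each step should add sizeof(digit) == 4 bytes:
sizes = [sys.getsizeof(1 << (30 * k)) for k in (1, 2, 3)]
deltas = [b - a for a, b in zip(sizes, sizes[1:])]
print(deltas)  # [4, 4] on a 30-bit-digit build
```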
On Python 2, integers <= sys.maxint but >= -sys.maxint - 1 are stored using a simpler structure storing just the single value:
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
Because this uses PyObject_HEAD instead of PyObject_VAR_HEAD, there is no ob_size field in the struct, and the memory size is fixed at 24 bytes: 8 for the long value, 8 for the reference count and 8 for the type object pointer.
From longintrepr.h, we see that a Python 'int' object is defined with this C structure:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
A digit is an unsigned value of which 15 or 30 bits are used, depending on the build. The bulk of the space is taken by the variable size object header. From object.h, we can find its definition:
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
We can see that a Py_ssize_t (64 bits on a 64-bit system) stores the count of "digits" in the value, which is possibly wasteful. We can also see that the general object header has a 64-bit reference count and a pointer to the object type, which is also 64 bits of storage. The reference count is necessary for Python to know when to deallocate the object, and the type pointer is necessary to know that we have an int and not, say, a string, since C structures have no way to test the type of an object from an arbitrary pointer.
_PyObject_HEAD_EXTRA is defined to nothing on most builds of python, but can be used to store a linked list of all Python objects on the heap if the build enables that option, using another two pointers of 64-bits each.
I'm trying to pack integers as bytes in python and unpack them in C. So in my python code I have something like
testlib = ctypes.CDLL('/something.so')
testlib.process(repr(pack('B',10)))
which packs 10 as a byte and calls the function "process" in my C code.
What do I need in my C code to unpack this packed data? That is, what do I need to do to get 10 back from the given packed data.
Assuming you have a 10 byte string containing 10 integers, just copy the data.
char packed_data[10];
int unpacked[10];
int i;

for (i = 0; i < 10; ++i)
    unpacked[i] = packed_data[i];
... or using memcpy
On the other hand, if you're using 4 bytes per int when packing, you can copy 4 bytes at a time into an int on the C side (atoi only applies if the data is ASCII text rather than binary). How are you exchanging data between Python and C?
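As a sketch of the Python side (the library name and process signature come from the question; the key point is to pass the raw bytes, not repr(...) of them):

```python
from struct import pack, unpack

# Pack ten unsigned bytes, one per value:
buf = pack('10B', *range(10))
assert len(buf) == 10

# The C function would receive these ten raw bytes, e.g.:
#   testlib.process(buf, len(buf))
# and could copy them out exactly as the C loop does.
vals = unpack('10B', buf)
print(vals[0], vals[9])  # 0 9
```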
I'm working on talking to a library that handles strings as wchar_t arrays. I need to convert these to char arrays so that I can hand them over to Python (using SWIG and Python's PyString_FromString function). Obviously not all wide characters can be converted to chars. According to the documentation for wcstombs, I ought to be able to do something like
wcstombs(NULL, wideString, wcslen(wideString))
to test the string for unconvertable characters -- it's supposed to return -1 if there are any. However, in my test case it's always returning -1. Here's my test function:
void getString(wchar_t* target, int size) {
    int i;
    for (i = 0; i < size; ++i) {
        target[i] = L'a' + i;
    }
    printf("Generated %d characters, nominal length %d, compare %d\n", size,
           wcslen(target), wcstombs(NULL, target, size));
}
This is generating output like this:
Generated 32 characters, nominal length 39, compare -1
Generated 16 characters, nominal length 20, compare -1
Generated 4 characters, nominal length 6, compare -1
Any idea what I'm doing wrong?
On a related note, if you know of a way to convert directly from wchar_t*s to Python unicode strings, that'd be welcome. :) Thanks!
Clearly, as you found, it's essential to zero-terminate your input data.
Regarding the final paragraph, I would convert from wide to UTF8 and call PyUnicode_FromString.
Note that I am assuming you are using Python 2.x, it's presumably all different in Python 3.x.
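If SWIG isn't a hard requirement, note that ctypes does the wchar_t conversion for you; a minimal sketch:

```python
import ctypes

# create_unicode_buffer allocates a zero-terminated wchar_t array;
# .value converts it back to a Python string:
buf = ctypes.create_unicode_buffer("abcd")
print(buf.value)  # abcd
```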
Python says I need 4 bytes for a format code of "BH":
struct.error: unpack requires a string argument of length 4
Here is the code, I am putting in 3 bytes as I think is needed:
major, minor = struct.unpack("BH", self.fp.read(3))
"B" Unsigned char (1 byte) + "H" Unsigned short (2 bytes) = 3 bytes (!?)
struct.calcsize("BH") says 4 bytes.
EDIT: The file is ~800 MB and this is in the first few bytes of the file so I'm fairly certain there's data left to be read.
The struct module mimics C structures. It takes more CPU cycles for a processor to read a 16-bit word on an odd address or a 32-bit dword on an address not divisible by 4, so structures add "pad bytes" to make structure members fall on natural boundaries. Consider:
struct {
    char a;
    short b;
    char c;
    int d;
};

/* layout (offsets 0-11): a x b b c x x x d d d d */
This structure will occupy 12 bytes of memory (x being pad bytes).
Python works similarly (see the struct documentation):
>>> import struct
>>> struct.pack('BHBL',1,2,3,4)
'\x01\x00\x02\x00\x03\x00\x00\x00\x04\x00\x00\x00'
>>> struct.calcsize('BHBL')
12
Compilers usually have a way of eliminating padding. In Python, any of =<>! will eliminate padding:
>>> struct.calcsize('=BHBL')
8
>>> struct.pack('=BHBL',1,2,3,4)
'\x01\x02\x00\x03\x04\x00\x00\x00'
Beware of letting struct handle padding. In C, these structures:
struct A {
    short a;
    char b;
};

struct B {
    int a;
    char b;
};
are typically 4 and 8 bytes, respectively. The padding occurs at the end of the structure in case the structures are used in an array. This keeps the 'a' members aligned on correct boundaries for structures later in the array. Python's struct module does not pad at the end:
>>> struct.pack('LB',1,2)
'\x01\x00\x00\x00\x02'
>>> struct.pack('LBLB',1,2,3,4)
'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
By default, on many platforms the short will be aligned to an offset at a multiple of 2, so there will be a padding byte added after the char.
To disable this, use: struct.unpack("=BH", data). This selects standard size and alignment, which adds no padding:
>>> struct.calcsize('=BH')
3
The = character will use native byte ordering. You can also use < or > instead of = to force little-endian or big-endian byte ordering, respectively.
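Applied to the original read, a short sketch (the file contents here are made up for illustration):

```python
import io
import struct

fmt = '<BH'  # little-endian, standard sizes: 1 + 2 = 3 bytes, no padding
fp = io.BytesIO(b'\x01\x02\x00' + b'...rest of file...')

# Read exactly calcsize(fmt) bytes so unpack gets the length it expects:
major, minor = struct.unpack(fmt, fp.read(struct.calcsize(fmt)))
print(major, minor)  # 1 2
```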