I'm stuck on a seemingly trivial problem with the reinterpret_cast operator. In C++, I have a float variable that is used to create a uint32_t variable via reinterpret_cast, as shown below:
float x = 2.2949836e-38;
uint32_t rgb = *reinterpret_cast<uint32_t*>(&x);
printf("rgb=%u", rgb); // prints rgb=16377550
I want to achieve the same in Python. Please note that a conventional int() cast doesn't produce the expected result.
You can use pack and unpack from the struct module:
from struct import pack, unpack
b = pack('f', 2.2949836e-38)  # the 4 raw bytes of the float
print(unpack('I', b)[0])  # 'I' reads them as an unsigned int, like uint32_t
Prints:
16377550
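For reuse, the same idea can be wrapped in two small helpers. This is only a sketch: the names float_to_bits and bits_to_float are made up for illustration, and the '<' prefix is an assumption that a fixed little-endian, 4-byte layout is wanted instead of the platform's native one.

```python
import struct

def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(bits):
    """Inverse of float_to_bits: reinterpret a 32-bit pattern as a float."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

rgb = float_to_bits(2.2949836e-38)
print(rgb)  # 16377550
```

Because pack and unpack use the same byte order here, the result does not depend on the endianness of the machine running the script.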
I want to understand more about Cython's awesome typed-memoryviews and the memory layout indirect_contiguous.
According to the documentation indirect_contiguous is used when "the list of pointers is contiguous".
There's also an example usage:
# contiguous list of pointers to contiguous lists of ints
cdef int[::view.indirect_contiguous, ::1] b
Please correct me if I'm wrong, but I assume a "contiguous list of pointers to contiguous lists of ints" means something like the array created by the following C++ dummy code:
// we want to create a 'contiguous list of pointers to contiguous lists of ints'
int** array;
// allocate row-pointers
// This is the 'contiguous list of pointers' related to the first dimension:
array = new int*[ROW_COUNT];
// allocate some rows, each row is a 'contiguous list of ints'
array[0] = new int[COL_COUNT]{1,2,3};
So, if I understand correctly, it should be possible in my Cython code to get a memoryview from an int** like this:
cdef int** list_of_pointers = get_pointers()
cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers
But I get a compile error:
cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers
^
------------------------------------------------------------
memview_test.pyx:76:116: Pointer base type does not match cython.array base type
What did I do wrong?
Am I missing any casts or did I misunderstand the concept of indirect_contiguous?
Let's set the record straight: a typed memory view can only be used with objects that implement the buffer protocol.
Raw C pointers obviously don't implement the buffer protocol. But you might ask why something like the following quick-and-dirty code works:
%%cython
from libc.stdlib cimport calloc
def f():
    cdef int* v=<int *>calloc(4, sizeof(int))
    cdef int[:] b = <int[:4]>v
    return b[0] # leaks memory, so what?
Here, a pointer (v) is used to construct a typed memory view (b). However, more is going on under the hood (as can be seen in the cythonized C file):
a Cython array (i.e. cython.view.array) is constructed, which wraps the raw pointer and can expose it via the buffer protocol
this array is then used to create the typed memory view.
Your understanding of what view.indirect_contiguous is for is right: it describes exactly the layout you want. The problem is view.array, which cannot handle this kind of data layout.
view.indirect and view.indirect_contiguous correspond to PyBUF_INDIRECT in buffer-protocol parlance, and for this the field suboffsets must contain meaningful values (i.e. >=0 for some dimensions). However, as can be seen in the source code, view.array doesn't have this member at all, so it has no way of representing such an indirect memory layout.
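To see the buffer-protocol fields mentioned here from plain Python, a memoryview exposes them as attributes. The following is a small sketch using the standard array module as the exporter (chosen only for illustration); like view.array, it exports a direct layout, so suboffsets comes back empty.

```python
from array import array

# array.array implements the buffer protocol, so memoryview can wrap it.
mv = memoryview(array('i', [1, 2, 3, 4]))

print(mv.format, mv.itemsize, mv.ndim)  # element type code, bytes per element, dimensions
print(mv.shape, mv.strides)             # one dimension, stride equals itemsize
print(mv.suboffsets)                    # empty tuple: no dimension is indirect
```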
Where does that leave us? As pointed out by @chrisb and @DavidW in your other question, you will have to implement a wrapper which can expose your data structure via the buffer protocol.
There are data structures in Python that use the indirect memory layout, most prominently the PIL arrays. A good starting point for understanding how suboffsets are supposed to work is this piece of documentation:
void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
    char *pointer = (char*)buf;    // A
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];    // B
        if (suboffsets[i] >=0 ) {
            pointer = *((char**)pointer) + suboffsets[i];    // C
        }
    }
    return (void*)pointer;    // D
}
In your case, strides and suboffsets would be
strides=[sizeof(int*), sizeof(int)] (i.e. [8,4] on usual x86_64 machines)
suboffsets=[0,-1], i.e. only the first dimension is indirect.
Getting the address of element [x,y] would then happen as follows:
in the line A, pointer is set to buf, let's assume BUF.
first dimension:
in line B, pointer becomes BUF+x*8 and points to the location of the pointer to the x-th row.
because suboffsets[0]>=0, we dereference the pointer in line C, so it now points to address ROW_X, the start of the x-th row.
second dimension:
in line B we get the address of the y-th element using strides, i.e. pointer=ROW_X+4*y
the second dimension is direct (signaled by suboffsets[1]<0), so no dereferencing is needed.
we are done, pointer points to the desired address and is returned in line D.
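The walk-through above can be replayed from Python with ctypes. This is only a sketch: get_item_address is a hypothetical port of get_item_pointer that works on integer addresses, and the 2x3 table built with ctypes stands in for the int** from the question.

```python
import ctypes

def get_item_address(ndim, buf, strides, suboffsets, indices):
    """Port of get_item_pointer: buf and the result are plain integer addresses."""
    pointer = buf                                    # A
    for i in range(ndim):
        pointer += strides[i] * indices[i]           # B
        if suboffsets[i] >= 0:
            # C: read the pointer stored at this address, then add the suboffset
            pointer = ctypes.c_void_p.from_address(pointer).value + suboffsets[i]
    return pointer                                   # D

# Two contiguous rows of ints, plus a contiguous array of pointers to them.
rows = [(ctypes.c_int * 3)(1, 2, 3), (ctypes.c_int * 3)(4, 5, 6)]
IntPtr = ctypes.POINTER(ctypes.c_int)
row_ptrs = (IntPtr * 2)(*[ctypes.cast(r, IntPtr) for r in rows])

strides = [ctypes.sizeof(ctypes.c_void_p), ctypes.sizeof(ctypes.c_int)]
suboffsets = [0, -1]  # only the first dimension is indirect

addr = get_item_address(2, ctypes.addressof(row_ptrs), strides, suboffsets, [1, 2])
value = ctypes.c_int.from_address(addr).value
print(value)  # 6, i.e. element [1, 2]
```

The rows list has to stay alive while the addresses are in use, since row_ptrs only stores raw pointers into those arrays.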
FWIW, I have implemented a library which is able to export int** and similar memory layouts via buffer protocol: https://github.com/realead/indirect_buffer.
In Python, I have a string of length 4 that I can unpack to get some values.
Is there an equivalent in C of Python's unpack function?
In C, there is no concept of "packing" like this. Whenever you have a char buffer such as
char buf[128];
whether you treat it like a string or a complex data structure is up to you. The simplest way would be to define a struct and copy data back and forth from your array.
#include <string.h>  /* memcpy */

struct MyStruct{
    int data1;
    int data2;
};
char buf[sizeof(struct MyStruct)];
struct MyStruct myStruct;
myStruct.data1 = 1;
myStruct.data2 = 2;
/* copy the raw bytes of the struct into the char buffer */
memcpy(buf, &myStruct, sizeof(struct MyStruct));
Please note that there IS some packing/padding that MAY happen here. For example, if you have a short in your struct, the compiler MAY use 4 bytes for it anyway. This approach also breaks down when the struct contains pointers, such as char* strings, because only the pointer value is copied, not the data it points to.
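The padding effect can be observed from the Python side as well. The sketch below uses a variant of MyStruct with a short as its first member (an assumption made only to provoke padding) and compares the native layout with a packed one.

```python
import ctypes
import struct

class MyStruct(ctypes.Structure):
    # Hypothetical variant of the struct above: a short followed by an int.
    _fields_ = [("data1", ctypes.c_short), ("data2", ctypes.c_int)]

native_size = ctypes.sizeof(MyStruct)   # layout the C compiler would use, padding included
aligned_size = struct.calcsize('@hi')   # struct module with native alignment
packed_size = struct.calcsize('<hi')    # standard sizes, no padding: 2 + 4 bytes

raw = bytes(MyStruct(1, 2))             # same bytes a memcpy of the struct would copy
print(native_size, aligned_size, packed_size)
print(struct.unpack('@hi', raw))        # (1, 2)
```

On common platforms the native size is larger than the packed size, which is why both sides of an exchange have to agree on the format string or struct definition.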
I have some C code that reinterprets a string as an integer through a pointer:
char s[]="efgh";
int * p;
p=(int *) s;
printf("%d",*p);
This gives me an output of:
1751606885
which is a 32-bit integer.
I'm analyzing a network packet in Python and need the same functionality there.
I have a string
s="efgh"
and want to read it as a 32-bit integer (at the byte level).
How can I do it?
You can try struct.unpack:
>>> import struct
>>> struct.unpack('<I', b'efgh')  # bytes literal; on Python 3 a str is rejected
(1751606885,)
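Since the data comes from a network packet, byte order matters. As a sketch, the same four bytes can be read both ways; int.from_bytes is a built-in alternative to struct.unpack, and '!' selects network (big-endian) order.

```python
import struct

data = b'efgh'

little = int.from_bytes(data, 'little')  # same value as the C cast on a little-endian host
big = int.from_bytes(data, 'big')        # the bytes read in network order
net, = struct.unpack('!I', data)         # '!' means network byte order in struct

print(hex(little))  # 0x68676665
print(hex(big))     # 0x65666768
```

Which of the two is right depends on how the protocol defines the field, not on the machine doing the parsing.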
I'm trying to pack integers as bytes in Python and unpack them in C. So in my Python code I have something like
testlib = ctypes.CDLL('/something.so')
testlib.process(repr(pack('B',10)))
which packs 10 as a byte and calls the function "process" in my C code.
What do I need in my C code to unpack this packed data? That is, what do I need to do to get 10 back from the given packed data.
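Assuming you have a 10-byte string containing 10 integers, one per byte, just copy the data:
unsigned char packed_data[10];  /* unsigned, to match pack('B', ...) */
int unpacked[10];
int i;
for(i = 0; i < 10; ++i)
    unpacked[i] = packed_data[i];
... or use memcpy, if the destination is a byte array as well.
On the other hand, if you're using 4 bytes per int when packing, split the buffer into 4-byte chunks in C and memcpy each chunk into an int (atoi would only help if the data were decimal text rather than packed bytes). How are you passing the data from Python to C?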
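Both layouts can be tried out without a C compiler by letting ctypes play the role of the C side. This is a sketch: the values are arbitrary, and '=' is used so that the packed ints have native byte order and a fixed 4-byte size, matching c_int32.

```python
import ctypes
from struct import pack

# One byte per integer, as with pack('B', ...)
packed_bytes = pack('10B', *range(10))
as_uchar = (ctypes.c_ubyte * 10).from_buffer_copy(packed_bytes)
unpacked = [int(b) for b in as_uchar]   # what the C copy loop produces

# Four bytes per integer, native byte order
packed_ints = pack('=3i', 10, -2, 70000)
as_int32 = (ctypes.c_int32 * 3).from_buffer_copy(packed_ints)

print(unpacked)
print(list(as_int32))
```

When calling into the shared library, the packed bytes object itself would be passed, not its repr(), since repr() produces a printable string rather than the raw bytes.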
From http://www.cs.bell-labs.com/cm/cs/pearls/sol01.html:
The C code looks like this:
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
I found ctypes, BitArrays and numpy, but I'm not sure whether they can be as efficient as the C code above.
For example, if I write code like this:
from ctypes import c_int
a=[c_int(9)]*1024*1024
would the space used be 1 MB, or much more?
Does anyone know of good libraries that can do the same thing in Python?
Numpy or ctypes are both good choices. But are you sure your Python code really needs to be as efficient as C, and are you sure this code is a performance hotspot?
The best thing to do is to use the Python profiler to ensure that this code really does need to be as efficient as C. If it truly does, then it will probably be easiest to just keep the code in C and link to it using something like ctypes or SWIG.
Edit: To answer your updated question, a numpy array of size N with element size M will contain N*M bytes of contiguous memory, plus a header and some bytes for views.
Here are a couple of related links:
Python memory usage of numpy arrays
Memory usage of numpy-arrays
You can also check the built-in array module:
>>> import array
>>> help(array)
Help on built-in module array:

NAME
    array

FILE
    (built-in)

DESCRIPTION
    This module defines an object type which can efficiently represent
    an array of basic values: characters, integers, floating point
    numbers. Arrays are sequence types and behave very much like lists,
    except that the type of objects stored in them is constrained. The
    type is specified at object creation time by using a type code, which
    is a single character. The following type codes are defined:

        Type code   C Type              Minimum size in bytes
        'b'         signed integer      1
        'B'         unsigned integer    1
        'u'         Unicode character   2 (see note)
        'h'         signed integer      2
        'H'         unsigned integer    2
        'i'         signed integer      2
        'I'         unsigned integer    2
        'l'         signed integer      4
        'L'         unsigned integer    4
        'f'         floating point      4
        'd'         floating point      8
This:
a=[c_int()]
makes a list which contains a reference to a c_int object.
Multiplying the list merely duplicates the references, so:
a = [c_int()] * 1024 * 1024
actually creates a list of 1024 * 1024 references to the same single c_int object.
If you want an array of 1024 * 1024 c_ints, create the array type and instantiate it:
a = (c_int * (1024 * 1024))()
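The difference is easy to check on a small scale. In this sketch, 4 elements stand in for 1024 * 1024, and the variable names are only illustrative.

```python
from ctypes import c_int, sizeof

# A list built by multiplication: four references to one c_int object.
refs = [c_int(9)] * 4
refs[0].value = 1
shared = refs[3].value           # also 1, because every slot is the same object

# A real ctypes array: four independent ints in one contiguous block.
arr = (c_int * 4)(9, 9, 9, 9)
arr[0] = 1
independent = arr[3]             # still 9

print(shared, independent)
print(sizeof(arr) == 4 * sizeof(c_int))   # True: no per-element object overhead
```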