Why is PyBytes_AsStringAndSize() writing the wrong size byte array? - python

I am working on a research project that calls some Python functions from C, and I am trying to return a 256-byte bytearray from a Python script to my C program using the Python/C API. I want to store the returned byte array as a char array in my C program so I can later write it to a file. However, when I convert the bytes PyObject using PyBytes_AsStringAndSize(), it appears to write only 8 bytes of data even though I specify 256. Could anyone explain what is causing this behaviour? I have scoured the documentation and searched online but haven't found an answer. Any ideas would be much appreciated!
int len = PyBytes_Size(pValue); //pValue is the object returned from our python function
printf("C:object returned. Length of pValue Object: %i\n", len);
PyObject *pBytes = PyBytes_FromObject(pValue);
int bLen = PyBytes_Size(pBytes);
printf("C:object converted to bytes. Length of pBytes: %i\n", bLen);
Py_ssize_t size = len;
char * test;
PyBytes_AsStringAndSize(pBytes,&test,&size); //Stores contents of returned pyobject into char array test
printf("Length of new byte array: %lu \n", sizeof(test));
I have looked all throughout the C/python api documentation and online but haven't found any clues so far.
Whenever the code is run, it produces the following output:
C:object returned. Length of pValue Object: 256
C:object converted to bytes. Length of pBytes: 256
Length of new byte array: 8
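The final line of output is consistent with sizeof(test) reporting the size of the char * pointer itself (8 bytes on a typical 64-bit build) rather than the length of the data it points to. The length that PyBytes_AsStringAndSize() reports is written to the size variable. The same distinction can be sketched from Python with ctypes. This is only an illustration, and the names payload, buf and ptr are hypothetical:

```python
import ctypes

# A 256-byte payload standing in for the bytes object returned to C.
payload = bytes(range(256))

# A real 256-byte char array, and a char * pointing at it.
buf = ctypes.create_string_buffer(payload, len(payload))
ptr = ctypes.cast(buf, ctypes.c_char_p)

# Size of the pointer variable itself: what sizeof(test) measures in the C code.
pointer_size = ctypes.sizeof(ptr)

# Length of the data: what the size out-parameter holds in the C code.
data_length = len(payload)

print("pointer size:", pointer_size)
print("data length:", data_length)
```

In the C program, printing size (for example with the %zd format) instead of sizeof(test) should therefore report the expected length.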

Related

How to extract a memory address from inside a Python object

I'm using a binary Python library that returns a Buffer object. This object is basically a wrapper of a C object containing a pointer to the actual memory buffer. What I need is to get the memory address contained in that pointer from Python, the problem is that the Buffer object doesn't have a Python method to obtain it, so I need to do some hacky trick to get it.
For the moment I found an ugly and unsafe way to get the pointer value:
I know the internal structure of the C object:
typedef struct _Buffer {
    PyObject_VAR_HEAD
    PyObject *parent;
    int type; /* GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT */
    int ndimensions;
    int *dimensions;
    union {
        char *asbyte;
        short *asshort;
        int *asint;
        float *asfloat;
        double *asdouble;
        void *asvoid;
    } buf;
} Buffer;
So I wrote this Python code:
import ctypes
import struct
import sys
# PyObject_VAR_HEAD size (obtained below with sys.getsizeof(0))
# + 8 bytes from PyObject *parent
# + 4 bytes from int type
# + 4 bytes from int ndimensions
# + 8 bytes from int *dimensions
# = PyObject_VAR_HEAD size + 24
offset = sys.getsizeof(0) + 24
buffer_pointer_addr = id(buffer) + offset
buffer_pointer_data = ctypes.string_at(buffer_pointer_addr, 8)
buffer_pointer_value = struct.unpack('Q', buffer_pointer_data)[0]
This is working consistently for me. As you can see I'm getting the memory address of the Python Buffer object with id(buffer), but as you may know that's not the actual pointer to the buffer, but just a Python number that in CPython happens to be the memory address to the Python object.
So then I'm adding the offset that I calculated by adding the sizes of all the variables in the C struct. I'm hardcoding the byte sizes (which is obviously completely unsafe) except for the PyObject_VAR_HEAD, that I get with sys.getsizeof(0).
By adding the offset I get the memory address that contains the pointer to the actual buffer, then I use ctypes to extract it with ctypes.string_at hardcoding the size of the pointer as 8 bytes (I'm on a 64bit OS), then I use struct.unpack to convert it to an actual Python int.
So now my question is: how could I implement a safer solution without hardcoding all the sizes? (if it exists). Maybe something with ctypes? It's OK if it only works on CPython.
I found a safer solution after reading up on C struct padding. It is based on the following assumptions:
The code will only be used on CPython.
The buffer pointer is at the end of the C Struct.
The buffer pointer size can be taken from the void * C type, since it is at least as large as any other member of the union in the C struct. In any case, data pointer types all have the same size on most modern platforms.
The C Struct members are going to be exactly the ones shown in the question
Based on all these assumptions and the rules found here: https://stackoverflow.com/a/38144117/8861787,
we can safely say that there will be no padding at the end of the struct and we can extract the pointer without hardcoding anything:
import ctypes
import sys
# Get the size of the Buffer Python object
buffer_obj_size = sys.getsizeof(buffer)
# Get the size of void * C-type
buffer_pointer_size = ctypes.sizeof(ctypes.c_void_p)
# Calculate the address to the pointer assuming that it's at the end of the C Struct
buffer_pointer_addr = id(buffer) + buffer_obj_size - buffer_pointer_size
# Get the actual pointer value as a Python Int
buffer_pointer_value = (ctypes.c_void_p).from_address(buffer_pointer_addr).value
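Another way to avoid hardcoded sizes is to describe the struct to ctypes and let it compute the field offset, padding included. The sketch below is not the library's API. BufferLayout and read_buf_pointer are hypothetical names, and it assumes a regular (non-debug, non-free-threaded) CPython build where PyObject_VAR_HEAD expands to a reference count, a type pointer and a size field. It is exercised on a fake in-memory instance rather than on a real Buffer object:

```python
import ctypes

class BufferLayout(ctypes.Structure):
    # Assumed layout: PyObject_VAR_HEAD followed by the members from the question.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_VAR_HEAD
        ("ob_type", ctypes.c_void_p),      # PyObject_VAR_HEAD
        ("ob_size", ctypes.c_ssize_t),     # PyObject_VAR_HEAD
        ("parent", ctypes.c_void_p),       # PyObject *parent
        ("type", ctypes.c_int),
        ("ndimensions", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),   # int *dimensions
        ("buf", ctypes.c_void_p),          # the union of data pointers
    ]

def read_buf_pointer(address):
    """Read the pointer stored in the buf member of the struct at address."""
    field_address = address + BufferLayout.buf.offset
    return ctypes.c_void_p.from_address(field_address).value

# Demonstration on a fake struct, not on a real Buffer object.
fake = BufferLayout(buf=0xDEADBEEF)
recovered = read_buf_pointer(ctypes.addressof(fake))
print(hex(recovered))
```

With a real object on CPython, the call would be read_buf_pointer(id(buffer)), under the same assumptions about the struct members as listed above.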

Python socket: cannot receive int array

I am trying to set up inter-process communication between a Python program and a C program via Winsock sockets. Sending a string works, but now I am trying to send an int array from the C socket to the Python socket.
I already found out that I have to use htonl() to convert the int array into a byte stream as the send function of winsock2 cannot send int arrays directly.
Now I want to use ntohl() in the python socket but the receive function returns bytes whereas ntohl() needs an integer value as input.
Here is my code
C-Side (just relevant parts):
uint32_t a[1] = {1231};
uint32_t a_converted[1]={0};
a_converted[0] = htonl(a[0]);
iResult = send( ConnectSocket, ( char *) a_converted, sizeof( a_converted), 0 );
Python Side (just relevant parts):
data = connection.recv(16)
data_i = socket.ntohl(data)
What you received is a string of bytes. Did ntohl not raise an exception?
You can use the struct module to unpack it. For 16 bytes:
struct.unpack('!4I', data)
This unpacks 4 unsigned 32-bit integers in network byte order.
See the struct module documentation.
(I cannot test it, so try it on your own.)
EDIT:
Oops, I did not read your comment through. According to the socket docs, recv should return an object of type bytes. If it returns an object of type str, you should convert it to bytes; in Python 3 that would be data.encode().
PS: Which Python are you on?
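Note that struct.unpack requires the data length to match the format exactly, and the C code in the question sends a single uint32_t (4 bytes), not 16 bytes. A minimal sketch that derives the count from the received length is shown below. The helper name unpack_network_uint32s is hypothetical, and the socket is simulated with struct.pack so that the example runs on its own:

```python
import struct

def unpack_network_uint32s(data):
    """Decode a bytes object holding big-endian (network order) uint32 values."""
    if len(data) % 4 != 0:
        raise ValueError("expected a multiple of 4 bytes, got %d" % len(data))
    count = len(data) // 4
    # '!' selects network byte order, 'I' is an unsigned 32-bit integer.
    return struct.unpack("!%dI" % count, data)

# Stand-in for what htonl() + send() produce on the C side for the value 1231.
simulated_recv = struct.pack("!I", 1231)

print(unpack_network_uint32s(simulated_recv))
```

In the real program, simulated_recv would be replaced by the result of connection.recv(...). Because recv may return fewer bytes than requested, a robust receiver would keep reading until a full multiple of 4 bytes has arrived.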
You said you have managed to send strings over the connection. I assume you sent a char* and received it in python as a string. What you have done is sent a stream of bytes.
Now you want to send an array of integers. In the memory, the integers are again stored as bytes.
Each integer could occupy 4 or 8 bytes. You can check this beforehand by printing
printf("Size of integer is %zu", sizeof(int));
Okay, great, now we know how many bytes we need to send. Say it is 4 for now.
We also need to know the endianness of the integers, but let's assume little endian for now.
This means the least significant byte comes first and the most significant byte comes last.
So now you can send the integer array exactly like you sent the string, by casting the array to char* and sending sizeof(array) bytes.
On the receiving side though, you just have a stream of bytes. To convert it to array of integers you need to get 4 bytes at a time and combine it into an integer.
We can do that as follows.
Say there are total 10 integers. You have to pass this information on separately somehow.
data = connection.recv(10*4)
array = []
for i in range(10):
    x = ord(data[i*4+0])
    x += ord(data[i*4+1]) << 8
    x += ord(data[i*4+2]) << 16
    x += ord(data[i*4+3]) << 24
    array += [x]
    print(x)
And you will be able to see your array of integers.
Here the function ord converts a one-character string to its integer byte value. This is Python 2 code; in Python 3, indexing a bytes object already yields integers, so the ord calls should be dropped.
Side notes:
Now, if your system has an integer size of 8 instead of 4, you need to extend the body of the loop in Python: the shifts will go up to 56, and each index into the received bytes will be i*8+...
Similarly, if the endianness is different, the order of the elements will change: the indices into the received bytes will run from i*4+3 down to i*4+0.
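The manual byte combination described in this answer can also be written with int.from_bytes, which takes the byte order as a parameter. The following is a sketch for Python 3. The function name decode_int_array and its parameters are hypothetical, and the test data is built with struct.pack instead of coming from a socket:

```python
import struct

def decode_int_array(data, count, int_size=4, byteorder="little"):
    """Split data into count unsigned integers of int_size bytes each."""
    values = []
    for i in range(count):
        chunk = data[i * int_size:(i + 1) * int_size]
        values.append(int.from_bytes(chunk, byteorder))
    return values

# The same three values laid out in little-endian and big-endian order.
raw_little = struct.pack("<3I", 1, 1231, 70000)
raw_big = struct.pack(">3I", 1, 1231, 70000)

print(decode_int_array(raw_little, 3))
print(decode_int_array(raw_big, 3, byteorder="big"))
```

Changing int_size to 8 or byteorder to "big" covers the two side notes without rewriting the loop body.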

How can we shift ob_item pointer in Python tuple if it is static array?

I am currently reading the Python 2.7 source code and got stuck with the following piece of code, in tupleobject.h:
PyObject *ob_item[1];
and in tupleobject.c (PyTuple_SetItem):
p = ((PyTupleObject *)op)->ob_item + i;
How can we shift pointer by i if ob_item is an array of one PyObject?
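This works because arrays and pointers can be used interchangeably here: in an expression, the array ob_item decays to a pointer to its first element, so adding i is ordinary pointer arithmetic. The tuple object is allocated with enough room for all of its items, so ob_item[1] only declares where the storage starts. So it's equivalent to
p = &((PyTupleObject *)op)->ob_item[i];
The question "Is an array name a pointer?" goes into a little more detail.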
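The layout trick can be imitated from Python with ctypes to show that "ob_item + i" is just base address plus i times the element size. This is only an illustration: FakeTuple, make_fake_tuple, item_address and get_item are hypothetical names, and the struct is a simplified stand-in for PyTupleObject, not the real one:

```python
import ctypes

class FakeTuple(ctypes.Structure):
    # Simplified stand-in: an item count plus a one-element array.
    _fields_ = [("ob_size", ctypes.c_ssize_t),
                ("ob_item", ctypes.c_void_p * 1)]

SLOT = ctypes.sizeof(ctypes.c_void_p)

def item_address(raw, i):
    # Equivalent of ((FakeTuple *)raw)->ob_item + i in C.
    return ctypes.addressof(raw) + FakeTuple.ob_item.offset + i * SLOT

def make_fake_tuple(values):
    # Allocate the struct plus extra room for the items beyond the first.
    extra = max(len(values) - 1, 0) * SLOT
    raw = ctypes.create_string_buffer(ctypes.sizeof(FakeTuple) + extra)
    ctypes.c_ssize_t.from_address(ctypes.addressof(raw)).value = len(values)
    for i, value in enumerate(values):
        ctypes.c_void_p.from_address(item_address(raw, i)).value = value
    return raw

def get_item(raw, i):
    return ctypes.c_void_p.from_address(item_address(raw, i)).value

fake = make_fake_tuple([11, 22, 33])
print([get_item(fake, i) for i in range(3)])
```

The array is declared with one element, yet items 1 and 2 are valid because the allocation was made large enough to hold them.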

Convert a large long in python to unsigned char array

I have a Python server talking to a C/C++ client. The client expects an unsigned char array to be passed back, so I need a way to convert a long to an unsigned char* of length 64.
Edit:
My initial thought was to look into the memory for the long and grab 8-bit chunks to create the needed unsigned chars. I'm testing that now.
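In Python 3, one way to get such a fixed-length byte string is int.to_bytes. The sketch below assumes a non-negative value and big-endian byte order. The question states neither, so both are assumptions, and the helper name long_to_fixed_bytes is hypothetical:

```python
def long_to_fixed_bytes(value, length=64, byteorder="big"):
    """Return value as exactly length bytes, zero-padded on the high end.

    Raises OverflowError if value is negative or does not fit in length bytes.
    """
    return value.to_bytes(length, byteorder)

packed = long_to_fixed_bytes(2**200 + 12345)
print(len(packed))

# Round trip back to the original integer.
restored = int.from_bytes(packed, "big")
print(restored == 2**200 + 12345)
```

The resulting bytes object can be sent over the socket as-is; the client must agree on the same byte order.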

Python c-api and unicode strings

I need to convert between Python objects and C strings of various encodings. Going from a C string to a unicode object was fairly simple using PyUnicode_Decode, but I'm not sure how to go the other way.
//char* can be a wchar_t or any other element size, just make sure it is correctly terminated for its encoding
Unicode(const char *str, size_t bytes, const char *encoding="utf-16", const char *errors="strict")
    : Object(PyUnicode_Decode(str, bytes, encoding, errors))
{
    //check for any python exceptions
    ExceptionCheck();
}
I want to create another function that takes the Python Unicode string and puts it in a buffer using a given encoding, e.g.:
//fills buffer with a null terminated string in encoding
void AsCString(char *buffer, size_t bufferBytes,
               const char *encoding="utf-16", const char *errors="strict")
{
    ...
}
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
Note: both methods above are members of a c++ Unicode class that wraps the python api
I'm using Python 3.0
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
In Python 2 the PyObject returned is a PyStringObject, so you just need to use PyString_Size and PyString_AsString to get a pointer to the string's buffer and memcpy it to your own buffer. In Python 3 the returned object is a bytes object, so the corresponding calls are PyBytes_Size and PyBytes_AsString.
If you're looking for a way to go directly from a PyUnicode object into your own char buffer, I don't think that you can do that.
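The encode-then-copy approach described above can be sketched at the Python level with ctypes standing in for the caller's char buffer. This is an illustration of the steps, not the C++ wrapper itself. The function name as_c_buffer is hypothetical, and UTF-8 is used as the default so that a single zero byte is a valid terminator (wider encodings such as UTF-16 need a terminator of two or more zero bytes):

```python
import ctypes

def as_c_buffer(text, buffer_bytes, encoding="utf-8", errors="strict"):
    """Encode text and copy it into a zero-filled buffer of buffer_bytes bytes."""
    encoded = text.encode(encoding, errors)   # analogous to PyUnicode_AsEncodedString
    if len(encoded) >= buffer_bytes:
        # Leave room for at least one terminating zero byte.
        raise ValueError("buffer too small for encoded string")
    buf = ctypes.create_string_buffer(buffer_bytes)  # zero-initialized
    ctypes.memmove(buf, encoded, len(encoded))       # analogous to memcpy in C
    return buf

buf = as_c_buffer("héllo", 16)
print(buf.value)
```

The size check mirrors what AsCString would need to do with bufferBytes before calling memcpy.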
