Convert a large long in python to unsigned char array - python

I have a Python server talking to a C/C++ client. The client expects an unsigned char array to be passed back, so I need a way to convert a long to an unsigned char* of length 64.
Edit:
My initial thought was to look into the memory for the long and grab 8bit chunks to try and create the needed unsigned chars. I'm testing that now.
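A minimal sketch of one way to do this, assuming Python 3 (where long and int are the same type) and assuming the client wants big-endian byte order; the sample value is made up. int.to_bytes produces a fixed-length bytes object that can be sent as-is and read on the C side as unsigned char[64]:

```python
# Hypothetical large value; anything that fits in 512 bits works.
value = 2**500 + 12345

# 64 bytes = 512 bits. Raises OverflowError if the value does not fit.
# byteorder="big" is an assumption; use "little" if the client expects that.
buf = value.to_bytes(64, byteorder="big", signed=False)

# Round trip to confirm nothing was lost.
restored = int.from_bytes(buf, byteorder="big", signed=False)

print(len(buf))
print(restored == value)
```

The resulting buf can be passed directly to socket.sendall(); no manual inspection of the integer's memory is needed.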

Related

Why is PyBytes_AsStringAndSize() writing the wrong size byte array?

I am working on a research project calling some python functions from C and am trying to return a 256-byte bytearray from a python script to my C program using the python/C API. I am trying to store the returned byte array as a char array in my C program so I can later write it to a file, however when I try to convert the bytes PyObject using PyBytes_AsStringAndSize(), it only appears to write 8 bytes of data despite me specifying 256. Could anyone explain what is causing this behaviour? I have tried scouring the documentation and online and haven't found help. Any ideas would be much appreciated!
int len = PyBytes_Size(pValue); //pValue is the object returned from our python function
printf("C:object returned. Length of pValue Object: %i\n", len);
PyObject *pBytes = PyBytes_FromObject(pValue);
int bLen = PyBytes_Size(pBytes);
printf("C:object converted to bytes. Length of pBytes: %i\n", bLen);
Py_ssize_t size = len;
char * test;
PyBytes_AsStringAndSize(pBytes,&test,&size); //Stores contents of returned pyobject into char array test
printf("Length of new byte array: %lu \n", sizeof(test));
I have looked all throughout the C/python api documentation and online but haven't found any clues so far.
Whenever the code is run, it produces the following output:
C:object returned. Length of pValue Object: 256
C:object converted to bytes. Length of pBytes: 256
Length of new byte array: 8
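One likely reading of this output: in C, sizeof(test) is the size of the char * pointer itself (8 bytes on a 64-bit build), not the length of the data it points to. The length reported by PyBytes_AsStringAndSize() is stored in size. The same distinction can be sketched from Python with ctypes, where c_char_p stands in for char *; the 256-byte payload is a made-up stand-in for the returned object:

```python
import ctypes

# Stand-in for the 256-byte object returned by the Python function.
payload = bytes(range(256))

# ctypes.c_char_p plays the role of `char *test` in the C code.
pointer_size = ctypes.sizeof(ctypes.c_char_p)

# The data length is a property of the object, not of the pointer type.
data_length = len(payload)

print(pointer_size)   # size of a pointer on this platform
print(data_length)
```

Under that reading, printing size (for example with %zd) instead of sizeof(test) would report the real length in the C program.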

How to extract a memory address from inside a Python object

I'm using a binary Python library that returns a Buffer object. This object is basically a wrapper of a C object containing a pointer to the actual memory buffer. What I need is to get the memory address contained in that pointer from Python, the problem is that the Buffer object doesn't have a Python method to obtain it, so I need to do some hacky trick to get it.
For the moment I found an ugly and unsafe way to get the pointer value:
I know the internal structure of the C object:
typedef struct _Buffer {
PyObject_VAR_HEAD PyObject *parent;
int type; /* GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT */
int ndimensions;
int *dimensions;
union {
char *asbyte;
short *asshort;
int *asint;
float *asfloat;
double *asdouble;
void *asvoid;
} buf;
} Buffer;
So I wrote this Python code:
# + PyObject_VAR_HEAD size
# + 8 bytes PyObject_VAR_HEAD PyObject *parent
# + 4 bytes from int type
# + 4 bytes from int ndimensions
# + 8 bytes from int *dimensions
# = 24
offset = sys.getsizeof(0) + 24
buffer_pointer_addr = id(buffer) + offset
buffer_pointer_data = ctypes.string_at(buffer_pointer_addr, 8)
buffer_pointer_value = struct.unpack('Q', buffer_pointer_data)[0]
This is working consistently for me. As you can see I'm getting the memory address of the Python Buffer object with id(buffer), but as you may know that's not the actual pointer to the buffer, but just a Python number that in CPython happens to be the memory address to the Python object.
So then I'm adding the offset that I calculated by adding the sizes of all the variables in the C struct. I'm hardcoding the byte sizes (which is obviously completely unsafe) except for the PyObject_VAR_HEAD, that I get with sys.getsizeof(0).
By adding the offset I get the memory address that contains the pointer to the actual buffer, then I use ctypes to extract it with ctypes.string_at hardcoding the size of the pointer as 8 bytes (I'm on a 64bit OS), then I use struct.unpack to convert it to an actual Python int.
So now my question is: how could I implement a safer solution without hardcoding all the sizes? (if it exists). Maybe something with ctypes? It's OK if it only works on CPython.
I found a safer solution after reading about C struct padding. It is based on the following assumptions:
The code will only be used on CPython.
The buffer pointer is at the end of the C struct.
The buffer pointer size can be safely taken from the void * C type, since it is going to be the biggest member of the union{} in the C struct. In any case, data pointer types all have the same size on most modern OSes.
The C struct members are exactly the ones shown in the question.
Based on all these assumptions and the rules found here: https://stackoverflow.com/a/38144117/8861787,
we can safely say that there will be no padding at the end of the struct and we can extract the pointer without hardcoding anything:
# Get the size of the Buffer Python object
buffer_obj_size = sys.getsizeof(buffer)
# Get the size of void * C-type
buffer_pointer_size = ctypes.sizeof(ctypes.c_void_p)
# Calculate the address to the pointer assuming that it's at the end of the C Struct
buffer_pointer_addr = id(buffer) + buffer_obj_size - buffer_pointer_size
# Get the actual pointer value as a Python Int
buffer_pointer_value = (ctypes.c_void_p).from_address(buffer_pointer_addr).value
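Another way to avoid hardcoded sizes is to mirror the C struct with a ctypes.Structure and let ctypes compute the field offsets, including padding. The sketch below is an illustration only: BufferLayout is a hypothetical name, and it assumes PyObject_VAR_HEAD expands to a reference count, a type pointer, and a size, which holds for standard (non-debug, non-free-threaded) CPython builds. It is demonstrated on a ctypes-allocated instance rather than on a real Buffer object:

```python
import ctypes

class BufferLayout(ctypes.Structure):
    # Hypothetical mirror of the C struct from the question.
    # The first three fields stand in for PyObject_VAR_HEAD (an assumption).
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("parent", ctypes.c_void_p),
        ("type", ctypes.c_int),
        ("ndimensions", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),
        ("buf", ctypes.c_void_p),  # the union contains only pointers
    ]

# Offset of the pointer, computed by ctypes instead of by hand.
buf_offset = BufferLayout.buf.offset

# Demonstration on a fake instance: store a value, read it back by address.
fake = BufferLayout()
fake.buf = 0x1234
read_back = ctypes.c_void_p.from_address(
    ctypes.addressof(fake) + buf_offset
).value

print(buf_offset, hex(read_back))
```

With a real object on CPython, BufferLayout.from_address(id(buffer)).buf would read the pointer the same way, provided the layout assumptions hold. Note also that sys.getsizeof() adds the garbage-collector header for GC-tracked objects, so the "size minus pointer size" calculation only works for types that are not GC-tracked.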

Python unpack a Packed Struct in One Line

I have an 8-byte packed string (bytes) with the following structure:
typedef struct _entry_t {
uint start;
ushort size;
ushort id;
} _entry_t;
I want to know how I can unpack the entire string in the above format and extract the member values in the easiest way possible (one line, maybe).
Take a look at the struct package.
Suppose you get the data as bytes and have it stored in the variable input, then you can decode it with the following code:
import struct
start, size, id = struct.unpack('IHH', input)
Depending on the platform the C code runs on, you might want to think about endianness (add ">" or "<" as a prefix to the format string) and whether the struct needs __attribute__((packed)). I assumed that on your platform an int is 32 bits long and a short is 16 bits long.
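As a small illustration of the endianness prefix, here is a sketch that assumes the data is little endian and tightly packed; the sample values and the name entry_id are made up. With an explicit "<" prefix the struct module uses standard sizes (4 bytes for I, 2 for H) and no padding, independent of the platform running Python:

```python
import struct

# "<" = little endian, standard sizes, no padding (an assumption about the data).
ENTRY = struct.Struct("<IHH")

# Made-up sample standing in for the 8 bytes received from C.
raw = struct.pack("<IHH", 0x1000, 32, 7)

start, size, entry_id = ENTRY.unpack(raw)

print(ENTRY.size)             # total bytes described by the format
print(start, size, entry_id)
```

Using names such as raw and entry_id also avoids shadowing the built-ins input and id.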

Pad byte in python struct

What is a pad byte (x) in the python struct type? Is it an unsigned char with value 0, or what exactly does it look like / why is it one of the types that are available in the struct object?
For example, what would be the difference between doing one of the following:
>>> struct.pack('BB', 0, ord('a'))
b'\x00a'
>>> struct.pack('xB', ord('a'))
b'\x00a'
It's useful for matching required length of another system.
For example, in my work there is a server that sends a fixed-size header and expects fixed-size messages. This guarantees that, let's say, the first 20 bytes are a header, with bytes 0-8 being the message size.
It doesn't really matter what type the pad is. It's basically junk data. unsigned char 0 is a good choice though and the one that struct.pack uses.
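The practical difference from B shows up in the argument list: a pad byte consumes no argument when packing and yields no value when unpacking. A short sketch; the 20-byte header layout at the end is a hypothetical example, not the actual protocol described above:

```python
import struct

# Packing: "x" takes no argument and writes a zero byte.
packed_pad = struct.pack("xB", ord("a"))
packed_zero = struct.pack("BB", 0, ord("a"))

# Unpacking: "x" skips the byte, whatever its value; "B" returns it.
unpacked_pad = struct.unpack("xB", b"\xffa")
unpacked_two = struct.unpack("BB", b"\xffa")

# Hypothetical fixed-size header: 8-byte size field plus 12 pad bytes.
header = struct.pack("!Q12x", 1024)

print(packed_pad == packed_zero)
print(unpacked_pad, unpacked_two)
print(len(header))
```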

Python socket: cannot receive int array

I am trying to set up inter-process communication between a Python program and a C program via Winsock. Sending a string works, but now I am trying to send an int array from the C socket to the Python socket.
I already found out that I have to use htonl() to convert the int array into a byte stream as the send function of winsock2 cannot send int arrays directly.
Now I want to use ntohl() in the python socket but the receive function returns bytes whereas ntohl() needs an integer value as input.
Here is my code
C-Side (just relevant parts):
uint32_t a[1] = {1231};
uint32_t a_converted[1]={0};
a_converted[0] = htonl(a[0]);
iResult = send( ConnectSocket, ( char *) a_converted, sizeof( a_converted), 0 );
Python Side (just relevant parts):
data = connection.recv(16)
data_i = socket.ntohl(data)
What you received is a string of bytes; didn't ntohl raise an exception?
You can use the struct module to unpack it. For 16 bytes:
struct.unpack('!4I', data)
Meaning - unpack 4 unsigned 32-bit integers in network order
RTM
(I cannot test it - try it on your own)
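Note that struct.unpack requires the buffer to match the format size exactly, and the C snippet in the question sends a single uint32_t (4 bytes), so '!4I' would raise struct.error on that data. A minimal sketch that derives the count from the received length instead; the function name is hypothetical, and the sample buffers stand in for what recv would return:

```python
import struct

def unpack_uint32_network(data):
    """Unpack as many network-order uint32 values as data fully contains."""
    count = len(data) // 4
    return struct.unpack("!%dI" % count, data[:count * 4])

# Stand-ins for received data: one value (as in the question) and four values.
one = unpack_uint32_network(struct.pack("!I", 1231))
four = unpack_uint32_network(struct.pack("!4I", 1, 2, 3, 4))

print(one)
print(four)
```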
EDIT:
Oops, I did not read your comment through. According to the socket docs, recv should return an object of type bytes. If it returns an object of type str, you should convert it to bytes; in Python 3 that would be data.encode().
PS Which Python are you on?
You said you have managed to send strings over the connection. I assume you sent a char* and received it in python as a string. What you have done is sent a stream of bytes.
Now you want to send an array of integers. In the memory, the integers are again stored as bytes.
Each integer could occupy 4 or 8 bytes. You can check this beforehand by printing:
printf("Size of integer is %zu", sizeof(int));
Okay, great now we know how many bytes we need to send. Say it is 4 for now.
We also need to know the endianness of the integers, but let's assume little endian for now.
This means the least significant byte will be first and the most significant byte at the end.
So now you can send the integer array exactly like you sent the string, by casting the array to char* and sending sizeof(array).
On the receiving side though, you just have a stream of bytes. To convert it to array of integers you need to get 4 bytes at a time and combine it into an integer.
We can do that as follows.
Say there are total 10 integers. You have to pass this information on separately somehow.
# bytearray: indexing yields integers on both Python 2 and Python 3
data = bytearray(connection.recv(10 * 4))
array = []
for i in range(10):
    # Combine 4 bytes, least significant byte first
    x = data[i*4+0]
    x += data[i*4+1] << 8
    x += data[i*4+2] << 16
    x += data[i*4+3] << 24
    array.append(x)
    print(x)
And you will be able to see your array of integers.
Here bytearray makes each indexed element an integer byte value (0-255), so no call to ord is needed.
Side notes:
If integers on your system are 8 bytes instead of 4, you need to extend the body of the loop in Python: the shifts continue up to 56, and each index into data becomes i*8+...
Similarly, if the endianness is different, the order of the elements changes: the indices into data go from i*4+3 down to i*4+0.
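The manual shifting and the side notes can be cross-checked against the struct module. This is a sketch assuming Python 3; decode_manual is a hypothetical helper, and the sample values are made up. It shows that "<" matches the least-significant-byte-first loop, that ">" reads the same bytes differently, and that the 8-byte case works the same way with shifts up to 56:

```python
import struct

def decode_manual(data, count, width=4):
    """Decode `count` unsigned integers, least significant byte first."""
    values = []
    for i in range(count):
        x = 0
        for k in range(width):
            x += data[i * width + k] << (8 * k)  # shifts 0, 8, 16, ...
        values.append(x)
    return values

# Made-up sample data standing in for what recv() returned.
raw4 = struct.pack("<3I", 1231, 1, 0xDEADBEEF)
via_loop = decode_manual(raw4, 3)
via_struct = list(struct.unpack("<3I", raw4))

# Same bytes interpreted as big endian give different numbers.
via_big = list(struct.unpack(">3I", raw4))

# 8-byte integers: width=8, shifts go up to 56.
raw8 = struct.pack("<2Q", 2**40 + 5, 7)
via_loop8 = decode_manual(raw8, 2, width=8)

print(via_loop, via_big, via_loop8)
```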
