How to extract a memory address from inside a Python object

I'm using a binary Python library that returns a Buffer object. This object is basically a wrapper around a C object containing a pointer to the actual memory buffer. What I need is to get the memory address stored in that pointer from Python. The problem is that the Buffer object doesn't expose a Python method to obtain it, so I need some hacky trick to get it.
For the moment I found an ugly and unsafe way to get the pointer value:
I know the internal structure of the C object:
typedef struct _Buffer {
    PyObject_VAR_HEAD
    PyObject *parent;
    int type; /* GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT */
    int ndimensions;
    int *dimensions;
    union {
        char *asbyte;
        short *asshort;
        int *asint;
        float *asfloat;
        double *asdouble;
        void *asvoid;
    } buf;
} Buffer;
So I wrote this Python code:
import sys
import ctypes
import struct

# + PyObject_VAR_HEAD size (approximated with sys.getsizeof(0))
# + 8 bytes from PyObject *parent
# + 4 bytes from int type
# + 4 bytes from int ndimensions
# + 8 bytes from int *dimensions
# = 24
offset = sys.getsizeof(0) + 24
buffer_pointer_addr = id(buffer) + offset
buffer_pointer_data = ctypes.string_at(buffer_pointer_addr, 8)
buffer_pointer_value = struct.unpack('Q', buffer_pointer_data)[0]
This is working consistently for me. As you can see, I'm getting the memory address of the Python Buffer object with id(buffer), but as you may know, that's not the actual pointer to the buffer, just a Python number that in CPython happens to be the memory address of the Python object.
Then I'm adding the offset that I calculated by summing the sizes of all the members in the C struct. I'm hardcoding the byte sizes (which is obviously completely unsafe) except for the PyObject_VAR_HEAD, which I get with sys.getsizeof(0).
By adding the offset I get the memory address that contains the pointer to the actual buffer. I then use ctypes to extract it with ctypes.string_at, hardcoding the size of the pointer as 8 bytes (I'm on a 64-bit OS), and finally use struct.unpack to convert it to an actual Python int.
So now my question is: how could I implement a safer solution without hardcoding all the sizes? (if it exists). Maybe something with ctypes? It's OK if it only works on CPython.

I found a safer solution after investigating C struct padding, based on the following assumptions:
The code will only be used on CPython.
The buffer pointer is at the end of the C struct.
The buffer pointer size can safely be taken from the void * C type, since it's the biggest member of the union in the C struct. In any case, data pointer types all have the same size on most modern platforms.
The C struct members are exactly the ones shown in the question.
Based on all these assumptions and the rules found here: https://stackoverflow.com/a/38144117/8861787,
we can safely say that there will be no padding at the end of the struct, and we can extract the pointer without hardcoding anything:
import sys
import ctypes

# Get the size of the Buffer Python object
buffer_obj_size = sys.getsizeof(buffer)
# Get the size of the void * C type
buffer_pointer_size = ctypes.sizeof(ctypes.c_void_p)
# Calculate the address of the pointer, assuming it sits at the end of the C struct
buffer_pointer_addr = id(buffer) + buffer_obj_size - buffer_pointer_size
# Read the actual pointer value as a Python int
buffer_pointer_value = ctypes.c_void_p.from_address(buffer_pointer_addr).value
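For reference, here is a minimal sketch of a related approach (my own illustration, not part of the original answer), assuming a regular non-debug CPython build and exactly the struct layout shown in the question: mirror the C struct with a ctypes.Structure so that ctypes computes the member offsets and padding instead of hardcoding them.

import ctypes

class _Buffer(ctypes.Structure):
    # PyObject_VAR_HEAD expands to ob_refcnt, ob_type and ob_size
    # on a non-debug CPython build
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("parent", ctypes.c_void_p),
        ("type", ctypes.c_int),
        ("ndimensions", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),
        ("buf", ctypes.c_void_p),  # the union, read through its void* member
    ]

buffer_pointer_value = _Buffer.from_address(id(buffer)).buf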

Related

Why is PyBytes_AsStringAndSize() writing the wrong size byte array?

I am working on a research project calling some Python functions from C, and I'm trying to return a 256-byte bytearray from a Python script to my C program using the Python/C API. I'm trying to store the returned byte array as a char array in my C program so I can later write it to a file. However, when I try to convert the bytes PyObject using PyBytes_AsStringAndSize(), it only appears to write 8 bytes of data despite me specifying 256. Could anyone explain what is causing this behaviour? I have tried scouring the documentation and online and haven't found help. Any ideas would be much appreciated!
int len = PyBytes_Size(pValue); // pValue is the object returned from our Python function
printf("C:object returned. Length of pValue Object: %i\n", len);
PyObject *pBytes = PyBytes_FromObject(pValue);
int bLen = PyBytes_Size(pBytes);
printf("C:object converted to bytes. Length of pBytes: %i\n", bLen);
Py_ssize_t size = len;
char *test;
PyBytes_AsStringAndSize(pBytes, &test, &size); // points test at the internal buffer of pBytes; no copy is made
printf("Length of new byte array: %lu \n", sizeof(test)); // sizeof(test) is the size of the char* pointer itself (8 on 64-bit), not the data length
I have looked all throughout the Python/C API documentation and online but haven't found any clues so far.
Whenever the code is run, it produces the following output:
C:object returned. Length of pValue Object: 256
C:object converted to bytes. Length of pBytes: 256
Length of new byte array: 8

Python unpack a Packed Struct in One Line

I have an 8-byte packed string (bytes), which has the following structure:
typedef struct _entry_t {
    uint start;
    ushort size;
    ushort id;
} _entry_t;
I want to know how I can unpack the entire string in the above format and extract those member values in the easiest way possible (one line, maybe).
Take a look at the struct module.
Suppose you get the data as bytes and have it stored in the variable input, then you can decode it with the following code:
import struct
start, size, id = struct.unpack('IHH', input)
Depending on the platform the C code runs on, you might want to think about endianness (add ">" or "<" as a prefix to the format string) and whether the struct needs the attribute __attribute__((packed)). I assumed that on your platform an int is 32 bits long and a short is 16 bits long.
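For instance, a quick worked example (the input bytes here are made up for illustration, and the variable is named entry_id to avoid shadowing the built-in id); the "<" prefix selects little-endian standard sizes, which also matches a packed struct on a little-endian platform:

import struct

data = b'\x10\x00\x00\x00\x00\x02\x07\x00'  # hypothetical 8-byte entry
start, size, entry_id = struct.unpack('<IHH', data)
# -> (16, 512, 7)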

how to memset a unicode string in python 2.7

I have a unicode string f. I want to memset it to 0, so that print f displays null (\0).
I am using ctypes.memset to achieve this -
>>> f
u'abc'
>>> print ("%s" % type(f))
<type 'unicode'>
>>> import ctypes
>>> ctypes.memset(id(f)+50, 0, 6)
4363962530
>>> f
u'abc'
>>> print f
abc
Why did the memory location not get memset in the case of a unicode string?
It works perfectly for a str object.
Thanks for the help.
First, this is almost certainly a very bad idea. Python expects strings to be immutable. There's a reason that even the C API won't let you change their contents after they're flagged ready. If you're just doing this to play around with the interpreter's implementation, that can be fun and instructive, but if you're doing it for any real-life purpose, you're probably doing something wrong.
In particular, if you're doing it for "security", what you almost certainly really want to do is to not create a unicode in the first place, but instead create, say, a bytearray with the UTF-16 or UTF-32 encoding of your string, which can be zeroed out in a way that's safe, portable, and a lot easier.
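A minimal sketch of that safer approach (the names here are illustrative):

# keep the sensitive text in a mutable bytearray instead of an immutable unicode
secret = bytearray(u'abc'.encode('utf-16-le'))
# ... use secret ...
for i in range(len(secret)):
    secret[i] = 0  # zeroed in place, no interpreter internals involved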
Anyway, there's no reason to expect that two completely different types should store their buffers at the same offset.
In CPython 2.x, a str is a PyStringObject:
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];
} PyStringObject;
That ob_sval is the buffer; its offset should be 36 on 64-bit builds and (I think) 20 on 32-bit builds.
In a comment, you say:
I read it somewhere, and also the offset for a string type is 37 on my system, which is what sys.getsizeof('') shows (>>> sys.getsizeof('') gives 37).
The offset for a string buffer is actually 36, not 37. And the fact that it's even that close is just a coincidence of the way str is implemented. (Hopefully you can understand why by looking at the struct definition—if not, you definitely shouldn't be writing code like this.) There's no reason to expect the same trick to work for some other type without looking at its implementation.
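To make that concrete, a quick sketch (assuming a 64-bit CPython 2.x build): the str buffer really does live inline at offset 36, so reading there finds the characters directly:

>>> import ctypes
>>> s = 'abc'
>>> ctypes.string_at(id(s) + 36, len(s))
'abc'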
A unicode is a PyUnicodeObject:
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;    /* Length of raw Unicode data in buffer */
    Py_UNICODE *str;      /* Raw Unicode buffer */
    long hash;            /* Hash value; -1 if not set */
    PyObject *defenc;     /* (Default) Encoded version as Python
                             string, or NULL; this is used for
                             implementing the buffer protocol */
} PyUnicodeObject;
Its buffer is not even inside the object itself; that str member is a pointer to the buffer (which is not guaranteed to be right after the struct). Its offset should be 24 on 64-bit builds, and (I think) 12 on 32-bit builds. So, to do the equivalent, you'd need to read the pointer there, then follow it to find the location to memset.
If you're using a narrow-Unicode build, it should look like this:
>>> ctypes.POINTER(ctypes.c_uint16 * len(g)).from_address(id(g)+24).contents[:]
[97, 98, 99]
That's the ctypes translation of reading the Py_UNICODE * stored at ((char *)g) + 24 and then reading the array that starts at that pointer and ends len(g) code units later, which is what you'd have to do if you were writing C code and didn't have access to the unicodeobject.h header.
(In the test I just quoted, g is at 0x10a598090, while its str member points to 0x10a3b09e0, so the buffer is not immediately after the struct, or anywhere near it; it's about 2MB before it.)
For a wide-Unicode build, the same thing with c_uint32.
So, that should show you what you want to memset.
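Putting it together, a sketch under the same assumptions as above (CPython 2.x, 64-bit, narrow-Unicode build, str pointer at offset 24):

import ctypes

g = u'abc'
# follow the str pointer at offset 24, then memset the buffer it points to
buf_addr = ctypes.c_void_p.from_address(id(g) + 24).value
ctypes.memset(buf_addr, 0, len(g) * 2)  # 2 bytes per Py_UNICODE code unit
print repr(g)  # u'\x00\x00\x00'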
And you should also see a serious implication for your attempt at "security" here. (If I have to point it out, that's yet another indication that you should not be writing this code.)

How to read packed structures using ctypes

Is there a way in Python to unpack C structures created using #pragma pack(x) or __attribute__((packed)) using the struct module?
Alternatively, how can you determine the manner in which Python's struct module handles padding?
Use the struct module.
It is flexible in terms of byte order (big vs. little endian) and alignment (packing). See Byte Order, Size, and Alignment. It defaults to native byte order (pretty much meaning however Python was compiled).
Native example
C:
struct foo {
    int bar;
    char t;
    char x;
};
Python:
struct.pack('IBB', bar, t, x)
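To see how the alignment options play out, a quick illustration (sizes as on a typical 64-bit platform):

import struct

struct.calcsize('IBB')    # 6 - native order/alignment, but no trailing pad is added
struct.calcsize('IBB0I')  # 8 - '0I' adds the trailing pad a C compiler would
struct.calcsize('=IBB')   # 6 - standard sizes, no alignment: matches a packed struct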

Python c-api and unicode strings

I need to convert between Python objects and C strings of various encodings. Going from a C string to a unicode object was fairly simple using PyUnicode_Decode; however, I'm not sure how to go the other way.
// char* can be a wchar_t or any other element size; just make sure it is correctly terminated for its encoding
Unicode(const char *str, size_t bytes, const char *encoding="utf-16", const char *errors="strict")
    : Object(PyUnicode_Decode(str, bytes, encoding, errors))
{
    // check for any Python exceptions
    ExceptionCheck();
}
I want to create another function that takes the Python Unicode string and puts it in a buffer using a given encoding, e.g.:
// fills buffer with a null-terminated string in the given encoding
void AsCString(char *buffer, size_t bufferBytes,
               const char *encoding="utf-16", const char *errors="strict")
{
    ...
}
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
Note: both methods above are members of a C++ Unicode class that wraps the Python API.
I'm using Python 3.0
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
The PyObject returned is a bytes object (in Python 3; it was PyStringObject in Python 2), so you just need to use PyBytes_Size and PyBytes_AsString to get a pointer to the string's buffer and memcpy it to your own buffer.
If you're looking for a way to go directly from a PyUnicode object into your own char buffer, I don't think that you can do that.
