Ctypes Offset Into A Buffer - python

I have a string buffer: b = create_string_buffer(numb), where numb is a number of bytes.
In my wrapper I need to slice up this buffer. When calling a function that expects a POINTER(c_char) I can do: myfunction(self, byref(b, offset)), but with a Structure:
class mystruct(Structure):
    _fields_ = [("buf", POINTER(c_char))]
I am unable to do this and get an argument type exception. So my question is: how can I assign .buf to be an offset into b? Direct assignment (.buf = b) works, but it is unsuitable: Python does not hold up too well against ~32,000 such buffers being created every second, hence my desire to slice up a single shared buffer.

Use ctypes.cast:
>>> import ctypes
>>> b = ctypes.create_string_buffer(500)
>>> b[:6] = b'foobar'
>>> ctypes.cast(ctypes.byref(b, 4), ctypes.POINTER(ctypes.c_char))
<ctypes.LP_c_char object at 0x100756e60>
>>> _.contents
c_char(b'a')
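The cast result is itself a POINTER(c_char) instance, so it can be assigned to the structure field directly. A minimal sketch continuing the session above (note that b must be kept alive separately, since the structure only stores the raw pointer):
>>> class mystruct(ctypes.Structure):
...     _fields_ = [("buf", ctypes.POINTER(ctypes.c_char))]
...
>>> s = mystruct()
>>> s.buf = ctypes.cast(ctypes.byref(b, 4), ctypes.POINTER(ctypes.c_char))
>>> s.buf[0]
b'a'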

Safe use of ctypes.create_string_buffer?

Usually in Python when you do an assignment of a variable, you don't get a copy - you just get a second reference to the same object.
a = b'Hi'
b = a
a is b # shows True
Now when you use ctypes.create_string_buffer to get a buffer to e.g. interact with a Windows API function, you can use the .raw attribute to access the bytes. But what if you want to access those bytes after you've deleted the buffer?
c = ctypes.create_string_buffer(b'Hi')
d = c.raw
e = c.raw
d is e # shows False?
d == e # shows True as you'd expect
c.raw is c.raw # shows False!
del c
At this point are d and e still safe to use? From my experimentation it looks like the .raw attribute makes copies when you access it, but I can't find anything in the official documentation to support that.
.raw returns a separate, immutable Python bytes object each time it is called. It may be an interned version (so d is e could return True), but it is safe to use.
An easy test:
>>> import ctypes
>>> x = ctypes.create_string_buffer(1)
>>> a = x.raw
>>> x[0] = 255
>>> b = x.raw
>>> a
b'\x00'
>>> b
b'\xff'
The documentation comments on this for the ctypes type c_char_p, but it applies to other ctypes types like c_char_Array as well (emphasis mine):
>>> s = c_char_p()
>>> s.value = b"abc def ghi"
>>> s.value
b'abc def ghi'
>>> s.value is s.value
False
...
Why is it printing False? ctypes instances are objects containing a memory block plus some descriptors accessing the contents of the memory. Storing a Python object in the memory block does not store the object itself, instead the contents of the object is stored. Accessing the contents again constructs a new Python object each time!
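The practical upshot for the question: because .raw copies the buffer contents into an ordinary bytes object, that copy stays valid after the buffer itself is deleted. A quick sketch:
>>> import ctypes
>>> c = ctypes.create_string_buffer(b'Hi')
>>> d = c.raw
>>> del c
>>> d
b'Hi\x00'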

Python convert string back to object

I have an object:
c = Character(...)
I convert it to a string by using:
p = "{0}".format(c)
print(p)
<Character.Character object at 0x000002267ED6DA50>
How do I get the object back so I can run this code?
p.get_name()
You absolutely can if you are using CPython (where the id is the memory address). Different implementations may not work the same way.
>>> import ctypes
>>> class A:
...     pass
...
>>> a = A()
>>> id(a)
140669136944864
>>> b = ctypes.cast(id(a), ctypes.py_object).value
>>> b
<__main__.A object at 0x7ff015f03ee0>
>>> a is b
True
So we've de-referenced the id of a back into a py_object and snagged its value.
If your main goal is to serialize and deserialize objects (i.e. turn objects into a byte string and back while preserving all the data), you can use pickle: pickle.dumps converts any picklable object into a bytes string and pickle.loads converts it back into an object (see the pickle docs).
>>> import pickle
>>> class Student:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...
>>> a = Student("name", 20)
>>> pickle.dumps(a)
b'\x80\x04\x951\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x07Student\x94\x93\x94)\x81\x94}\x94(\x8c\x04name\x94h\x05\x8c\x03age\x94K\x14ub.'
>>> s = pickle.dumps(a)
>>> b = pickle.loads(s)
>>> b
<__main__.Student object at 0x7f04a856c910>
>>> b.name == a.name and b.age == a.age
True
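If you need to keep the serialized form around, the same round trip works through a file with pickle.dump and pickle.load. A minimal sketch (the file name is just for illustration):
import pickle

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

with open("student.pkl", "wb") as f:    # hypothetical file name
    pickle.dump(Student("name", 20), f)

with open("student.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.name, restored.age)      # name 20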
It is not possible in the general case to use the ID embedded in the default __str__ implementation to retrieve an object.
Borrowing from g.d.d.c's answer, let's define a function to do this:
import ctypes

def get_object_by_id(obj_id):
    return ctypes.cast(obj_id, ctypes.py_object).value

def get_object_by_repr(obj_repr):
    # Assumes the repr ends with a zero-padded '0x' + 16-hex-digit address, as shown below.
    return get_object_by_id(int(obj_repr[-19:-1], 16))
It works for any object that is still in scope, provided that it's using the default __repr__/__str__ implementation that includes the hex-encoded id at the end of the string:
>>> class A:
...     pass
...
>>> a = A()
>>> r = str(a)
>>> r
'<__main__.A object at 0x000001C584E8BC10>'
>>> get_object_by_repr(r)
<__main__.A object at 0x000001C584E8BC10>
But what if our original A has gone out of scope?
>>> def get_a_repr():
...     a = A()
...     return str(a)
...
>>> r = get_a_repr()
>>> get_object_by_repr(r)
(crash)
(and I don't mean an uncaught exception, I mean Python itself crashes)
You don't necessarily need to define a in a function to do this; it can also happen if you just rebind a in the local scope (note: GC isn't guaranteed to run the moment you rebind the variable, so this may not behave 100% deterministically, but the below is a real example):
>>> a = A()
>>> r = str(a)
>>> get_object_by_repr(r)
<__main__.A object at 0x000001C73C73BBE0>
>>> a = A()
>>> get_object_by_repr(r)
<__main__.A object at 0x000001C73C73BBE0>
>>> a
<__main__.A object at 0x000001C73C73B9A0>
>>> get_object_by_repr(r)
(crash)
I'd expect this to also happen if you passed this string between processes, stored it in a file for later use by the same script, or any of the other things you'd normally be doing with a serialized object.
The reason this happens is that unlike C, Python garbage-collects objects that have gone out of scope and which do not have any references -- and the id value itself (which is just an int), or the string representation of the object that has the id embedded in it, is not recognized by the interpreter as a live reference! And because ctypes lets you reach right into the guts of the interpreter (usually a bad idea if you don't know exactly what you're doing), you're telling it to dereference a pointer to freed memory, and it crashes.
In other situations, you might actually get the far more insidious bug of getting a different object because that memory address has since been repurposed to hold something else (I'm not sure how likely this is).
To actually solve the problem of turning a str() representation into the original object, the object must be serialized, i.e. turned into a string that contains all the data needed to reconstruct an exact copy of the object, even if the original object no longer exists. How to do this depends entirely on the actual content of the object, but a pretty standard (language-agnostic) solution is to make the class JSON-serializable; check out How to make a class JSON serializable.
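As a minimal sketch of that idea (the Character class and its fields here are hypothetical stand-ins for whatever your real object holds):
import json

class Character:
    def __init__(self, name, level):
        self.name = name
        self.level = level

    def to_dict(self):
        # Only plain data goes into the dict, so json can handle it.
        return {"name": self.name, "level": self.level}

    @classmethod
    def from_dict(cls, d):
        return cls(d["name"], d["level"])

c = Character("Alice", 3)
s = json.dumps(c.to_dict())               # a string you can store or send anywhere
c2 = Character.from_dict(json.loads(s))   # reconstructs an equivalent object
print(c2.name, c2.level)                  # Alice 3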

Python Ctypes - Memmove not working correctly

I would like to move data from one variable to another.
I have the following code:
import ctypes

a = 'a'  # attempting to move the contents of b into here
b = 'b'
obj = ctypes.py_object.from_address(id(a))
obj2 = ctypes.py_object.from_address(id(b))
ptr = ctypes.pointer(obj)
ptr2 = ctypes.pointer(obj2)
ctypes.memmove(ptr, ptr2, ctypes.sizeof(obj2))
print(a, b) # expected result: b b
a does not change, and gives no errors.
Is this simply not possible, or is it something I am doing wrong?
NOT RECOMMENDED, but interesting for learning...
It's possible on CPython due to the implementation detail that id(obj) returns the address of the internal PyObject, but it is a very bad idea. Python strings are immutable, so corrupting their inner workings is going to break things. Python objects carry internal data like reference counts, type, and length that will be corrupted by blindly copying over them.
import ctypes as ct
import sys
# Using strings that are more unique and less likely to be used inside Python
# (lower reference counts).
a = '123'
b = '456'
# Create ctypes byte buffers that reference the same memory as a and b
bytes_a = (ct.c_ubyte * sys.getsizeof(a)).from_address(id(a))
bytes_b = (ct.c_ubyte * sys.getsizeof(b)).from_address(id(b))
# View the bytes as hex. The first bytes are the reference counts.
# The last bytes are the ASCII bytes of the strings.
print(bytes(bytes_a).hex())
print(bytes(bytes_b).hex())
ct.memmove(bytes_b, bytes_a, len(bytes_a))
# Does what you want, but Python crashes on exit in my case
print(a,b)
Output:
030000000000000060bc9563fc7f00000300000000000000bf4fda89331c3232e5a5a97d1b020000000000000000000031323300
030000000000000060bc9563fc7f00000300000000000000715a1b84492b4696e5feaf7d1b020000000000000000000034353600
123 123
Exception ignored deletion of interned string failed:
KeyError: '123'
Safe way to make a copy of the memory and view it
import ctypes as ct
import sys
a = '123'
# Copy memory at address to a Python bytes object.
bytes_a = ct.string_at(id(a), sys.getsizeof(a))
print(bytes_a.hex())
Output:
020000000000000060bc5863fc7f000003000000000000001003577d19c6d60be59f53919b010000000000000000000031323300
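For completeness: if the goal is simply the effect the question describes, the ordinary Python-level answer needs no ctypes at all. Rebind the name, or use an explicitly mutable type when two names really must share storage. A minimal sketch:
# Rebinding: names are just references, so this is usually all "moving" means.
a = 'a'
b = 'b'
a = b
print(a, b)               # b b

# If two names must share mutable storage, use a mutable type such as bytearray.
buf_a = bytearray(b'a')
buf_b = bytearray(b'b')
buf_a[:] = buf_b          # copies the bytes in place
print(buf_a, buf_b)       # bytearray(b'b') bytearray(b'b')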

Python struct.unpack(ing) when there are multiple byte-orders?

I have a function that reads a binary file and then unpacks the file's contents using struct.unpack(). My function works just fine. It is faster if/when I unpack the whole of the file using a long 'format' string. The problem is that sometimes the byte order changes, so my format string (which would be invalid) would look like '<10sHHb>llh' (this is just an example; they are usually way longer). Is there any ultra slick/pythonic way of handling this situation?
Nothing super-slick, but if speed counts: the struct module's top-level functions are wrappers that have to recheck a cache for the actual struct.Struct instance corresponding to the format string on every call. You still need separate format strings for the separate byte orders, but you can solve part of your speed problem by building the Struct objects once and avoiding that repeated cache check.
Instead of doing:
import struct

buffer = memoryview(somedata)   # somedata: the bytes read from your file
allresults = []
while buffer:
    allresults += struct.unpack_from('<10sHHb', buffer)
    buffer = buffer[struct.calcsize('<10sHHb'):]
    allresults += struct.unpack_from('>llh', buffer)
    buffer = buffer[struct.calcsize('>llh'):]
You'd do:
buffer = memoryview(somedata)
structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')
allresults = []
while buffer:
    allresults += structa.unpack_from(buffer)
    buffer = buffer[structa.size:]
    allresults += structb.unpack_from(buffer)
    buffer = buffer[structb.size:]
No, it's not much nicer looking, and the speed gains aren't likely to blow you away. But you've got weird data, so this is the least brittle solution.
If you want an unnecessarily clever/brittle solution, you could do this with custom ctypes Structures, nesting a BigEndianStructure inside a LittleEndianStructure or vice versa. For your example format:
from ctypes import *

class BEStruct(BigEndianStructure):
    _fields_ = [('x', 2 * c_long), ('y', c_short)]
    _pack_ = 1

class MainStruct(LittleEndianStructure):
    _fields_ = [('a', 10 * c_char), ('b', 2 * c_ushort), ('c', c_byte), ('big', BEStruct)]
    _pack_ = 1
would give you a structure such that you could do:
mystruct = MainStruct()
memoryview(mystruct).cast('B')[:] = bytes(range(25))
and you'd then get results in the expected order, e.g.:
>>> hex(mystruct.b[0]) # Little endian as expected in main struct
'0xb0a'
>>> hex(mystruct.big.x[0]) # Big endian from inner big endian structure
'0xf101112'
While clever in a way, it's likely it will run slower (ctypes attribute lookup is weirdly slow in my experience), and unlike struct module functions, you can't just unpack into top-level named variables in a single line, it's attribute access all the way.
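If the real goal is to parse bytes you already have rather than to fill the struct by hand, the same classes can be used with Structure.from_buffer_copy; a short sketch under the same layout assumptions as above:
data = bytes(range(sizeof(MainStruct)))   # stand-in for a real binary record
rec = MainStruct.from_buffer_copy(data)   # copies, so immutable bytes are fine
print(hex(rec.b[0]), hex(rec.big.x[0]))   # little-endian and big-endian fields decoded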

How to get ctypes type object from an ctypes array

Actually, I'm trying to convert ctypes arrays to python lists and back.
I found this thread, but it assumes that we know the type at compile time.
Is it possible to retrieve a ctypes type for an element?
I have a python list that contains at least one element. I want to do something like this:
import ctypes
arr = (type(pyarr[0]) * len(pyarr))(*pyarr)
This obviously doesn't work, because type() doesn't return a ctypes-compatible class. But even if the list contains objects created directly from ctypes, the above code doesn't work, because the elements are then instances of the ctypes type rather than plain values the array constructor accepts.
Is there any way to perform this task?
[EDIT]
OK, here is the code that works for me. I'm using it to convert input parameters from a comtypes server method into python lists, and return values into array pointers:
from ctypes import cast, POINTER

def list(count, p_items):
    """Returns a python list for the given items represented by a pointer and the number of items"""
    # Note: this shadows the built-in list().
    items = []
    for i in range(count):
        items.append(p_items[i])
    return items

def p_list(items):
    """Returns a pointer to a list of items"""
    c_items = (type(items[0]) * len(items))(*items)
    p_items = cast(c_items, POINTER(type(items[0])))
    return p_items
As explained before, p_list(items) requires at least one element.
I don't think that's directly possible, because multiple ctypes types map to a single Python type. For example, c_int/c_long/c_ulong/c_ulonglong all map to Python int. Which type would you choose? You could create a map of your preferences:
>>> from ctypes import c_int, c_double
>>> D = {int: c_int, float: c_double}
>>> pyarr = [1.2,2.4,3.6]
>>> arr = (D[type(pyarr[0])] * len(pyarr))(*pyarr)
>>> arr
<__main__.c_double_Array_3 object at 0x023540D0>
>>> arr[0]
1.2
>>> arr[1]
2.4
>>> arr[2]
3.6
Also, the _type_ attribute of a ctypes array tells you its element type.
>>> arr._type_
<class 'ctypes.c_double'>
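So when you start from a ctypes array, a round trip needs no hand-written type map at all, since the array carries its element type. A minimal sketch continuing the session above (using the built-in list(), not the helper defined earlier):
>>> py_list = list(arr)                            # ctypes array -> plain Python list
>>> py_list
[1.2, 2.4, 3.6]
>>> arr2 = (arr._type_ * len(py_list))(*py_list)   # and back again
>>> arr2[1]
2.4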
