I'm programming an interface with 3M document scanners.
I am calling a function called MMMReader_GetData
MMMReaderErrorCode MMMReader_GetData(MMMReaderDataType aDataType, void* aDataPtr, int* aDataLen);
Description:
After a data item has been read from a document it may be obtained via
this API. The buffer supplied in the aDataPtr parameter will be
written to with the data, and aDataLen updated to be the length of the
data.
The problem is: how do I create a void* aDataPtr, and how do I get the data out of it?
I have tried:
from ctypes import *
lib=cdll.LoadLibrary('MMMReaderHighLevelAPI.dll')
CD_CODELINE = 0
aDataLen = c_int()
aDataPtr = c_void_p()
index= c_int(0)
r = lib.MMMReader_GetData(CD_CODELINE,byref(aDataPtr),byref(aDataLen),index)
aDataLen always returns a value but aDataPtr returns None
What you need to do is allocate a "buffer". The address of the buffer will be passed as the void* parameter, and the size of the buffer in bytes will be passed as the aDataLen parameter. Then the function will put its data in the buffer you gave it, and then you can read the data back out of the buffer.
In C or C++ you would use malloc or something similar to create a buffer. When using ctypes, you can use ctypes.create_string_buffer to make a buffer of a certain length, and then pass the buffer and the length to the function. Then once the function fills it in, you can read the data out of the buffer you created, which works like a list of characters with [] and len().
With ctypes, it is best to define the argument types and return value for better error checking, and declaring pointer types is especially important on 64-bit systems.
from ctypes import *
MMMReaderErrorCode = c_int # Set to an appropriate type
MMMReaderDataType = c_int # ditto...
lib = CDLL('MMMReaderHighLevelAPI')
lib.MMMReader_GetData.argtypes = MMMReaderDataType,c_void_p,POINTER(c_int)
lib.MMMReader_GetData.restype = MMMReaderErrorCode
CD_CODELINE = 0
# Make sure to pass in the original buffer size.
# Assumption: the API should update it on return with the actual size used (or needed)
# and will probably return an error code if the buffer is not large enough.
aDataLen = c_int(256)
# Allocate a writable buffer of the correct size.
aDataPtr = create_string_buffer(aDataLen.value)
# aDataPtr is already a pointer, so no need to pass it by reference,
# but aDataLen is a reference so the value can be updated.
r = lib.MMMReader_GetData(CD_CODELINE,aDataPtr,byref(aDataLen))
On return you can access just the returned portion of the buffer by string slicing, e.g.:
>>> from ctypes import *
>>> aDataLen = c_int(10)
>>> aDataPtr = create_string_buffer(aDataLen.value)
>>> aDataPtr.raw
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> aDataLen.value = 5 # Value gets updated
>>> aDataPtr[:aDataLen.value] # Get the valid portion of buffer
'\x00\x00\x00\x00\x00'
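The allocate, call, then slice pattern described above can be tried without the scanner DLL. The following is only a sketch: fake_get_data is a hypothetical stand-in written in Python, and its "buffer too small" return code and the PAYLOAD bytes are assumptions, not documented behaviour of the real MMMReader_GetData.

```python
import ctypes

# Same shape as the documented signature: (data type, void* buffer, int* length).
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int,
                         ctypes.c_void_p, ctypes.POINTER(ctypes.c_int))

PAYLOAD = b"P<UTOERIKSSON<<ANNA"  # made-up sample data


@PROTO
def fake_get_data(data_type, data_ptr, data_len):
    """Hypothetical stand-in for MMMReader_GetData."""
    capacity = data_len[0]          # caller passes the buffer size in
    data_len[0] = len(PAYLOAD)      # callee reports the size used (or needed)
    if capacity < len(PAYLOAD):
        return 1                    # assumed "buffer too small" error code
    ctypes.memmove(data_ptr, PAYLOAD, len(PAYLOAD))
    return 0


length = ctypes.c_int(64)                        # buffer size going in
buf = ctypes.create_string_buffer(length.value)  # writable buffer
err = fake_get_data(0, buf, ctypes.byref(length))
codeline = buf.raw[:length.value]                # only the valid portion
print(err, length.value, codeline)
```

Note that buf is passed directly (an array already converts to a pointer), while length is passed with byref so the callee can update it.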
There are several issues with your code:
You need to allocate the buffer pointed to by aDataPtr.
You need to pass the buffer length in aDataLen. According to [1], if the buffer isn't big enough, MMMReader_GetData will reallocate it as needed.
You should pass aDataPtr directly, not byref.
You are passing an extra argument to the method (the index argument) based on the method descriptor of MMMReader_GetData you provided.
Try the following:
import ctypes
lib = ctypes.cdll.LoadLibrary('MMMReaderHighLevelAPI.dll')
CD_CODELINE = 0
aDataLen = ctypes.c_int(1024)
aDataPtr = ctypes.create_string_buffer(aDataLen.value)
err = lib.MMMReader_GetData(CD_CODELINE, aDataPtr, ctypes.byref(aDataLen))
Then you can read the content of the buffer as a regular character array. The actual length is returned back for you in aDataLen.
[1] 3M Page Reader Programmers' Guide: https://wenku.baidu.com/view/1a16b6d97f1922791688e80b.html
I would like to move data from one variable to another.
I have the following code:
a = 'a' # attempting to move the contents of b into here
b = 'b'
obj = ctypes.py_object.from_address(id(a))
obj2 = ctypes.py_object.from_address(id(b))
ptr = ctypes.pointer(obj)
ptr2 = ctypes.pointer(obj2)
ctypes.memmove(ptr, ptr2, ctypes.sizeof(obj2))
print(a, b) # expected result: b b
a does not change, and gives no errors.
Is this simply not possible, or is it something I am doing wrong?
NOT RECOMMENDED But interesting for learning...
It's possible on CPython due to the implementation detail that id(obj) returns the address of the internal PyObject, but a very bad idea. Python strings are immutable, so corrupting their inner workings is going to break things. Python objects have internal data like reference counts, type, length that will be corrupted by blindly copying over them.
import ctypes as ct
import sys
# Using strings that are more unique and less likely to be used inside Python
# (lower reference counts).
a = '123'
b = '456'
# Create ctypes byte buffers that reference the same memory as a and b
bytes_a = (ct.c_ubyte * sys.getsizeof(a)).from_address(id(a))
bytes_b = (ct.c_ubyte * sys.getsizeof(b)).from_address(id(b))
# View the bytes as hex. The first bytes are the reference counts.
# The last bytes are the ASCII bytes of the strings.
print(bytes(bytes_a).hex())
print(bytes(bytes_b).hex())
ct.memmove(bytes_b, bytes_a, len(bytes_a))
# Does what you want, but Python crashes on exit in my case
print(a,b)
Output:
030000000000000060bc9563fc7f00000300000000000000bf4fda89331c3232e5a5a97d1b020000000000000000000031323300
030000000000000060bc9563fc7f00000300000000000000715a1b84492b4696e5feaf7d1b020000000000000000000034353600
123 123
Exception ignored deletion of interned string failed:
KeyError: '123'
111
A safe way to make a copy of the memory and view it:
import ctypes as ct
import sys
a = '123'
# Copy memory at address to a Python bytes object.
bytes_a = ct.string_at(id(a), sys.getsizeof(a))
print(bytes_a.hex())
Output:
020000000000000060bc5863fc7f000003000000000000001003577d19c6d60be59f53919b010000000000000000000031323300
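If the underlying goal is just to move bytes from one place to another, ctypes.memmove is well defined when both sides are mutable ctypes buffers rather than Python str objects. A minimal sketch (the names src and dst are only illustrative):

```python
import ctypes

# Mutable, ctypes-owned buffers; each holds 3 characters plus a trailing NUL.
src = ctypes.create_string_buffer(b"456")
dst = ctypes.create_string_buffer(b"123")

# Copy the raw bytes of src over dst. Both buffers have the same size,
# so nothing is written outside dst.
ctypes.memmove(dst, src, ctypes.sizeof(src))

print(dst.value, src.value)
```

Because no Python object header is overwritten, reference counts and the string interning table stay intact and the interpreter exits cleanly.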
I have a function that reads a binary file and then unpacks the file's contents using struct.unpack(). My function works just fine. It is faster if/when I unpack the whole of the file using a long 'format' string. Problem is that sometimes the byte-alignment changes so my format string (which is invalid) would look like '<10sHHb>llh' (this is just an example (they are usually way longer)). Is there any ultra slick/pythonic way of handling this situation?
Nothing super-slick, but if speed counts, the struct module top-level functions are wrappers that have to repeatedly recheck a cache for the actual struct.Struct instance corresponding to the format string; while you must make separate format strings, you might solve part of your speed problem by avoiding that repeated cache check.
Instead of doing:
import struct
buffer = memoryview(somedata)  # somedata: the bytes read from the file
allresults = []
while buffer:
    allresults += struct.unpack_from('<10sHHb', buffer)
    buffer = buffer[struct.calcsize('<10sHHb'):]
    allresults += struct.unpack_from('>llh', buffer)
    buffer = buffer[struct.calcsize('>llh'):]
You'd do:
import struct
buffer = memoryview(somedata)  # somedata: the bytes read from the file
# Compile each format once instead of looking it up in the cache on every call.
structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')
allresults = []
while buffer:
    allresults += structa.unpack_from(buffer)
    buffer = buffer[structa.size:]
    allresults += structb.unpack_from(buffer)
    buffer = buffer[structb.size:]
No, it's not much nicer looking, and the speed gains aren't likely to blow you away. But you've got weird data, so this is the least brittle solution.
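For a self-contained version of the same idea, here is a sketch that builds some sample data and unpacks it with two precompiled Struct objects. It uses the offset argument of unpack_from instead of re-slicing the memoryview; the sample values and the helper name unpack_records are made up for illustration.

```python
import struct

LITTLE = struct.Struct('<10sHHb')  # little-endian part of each record
BIG = struct.Struct('>llh')        # big-endian part of each record


def unpack_records(data):
    """Unpack consecutive mixed-endian records into a list of tuples."""
    records = []
    record_size = LITTLE.size + BIG.size
    offset = 0
    while offset + record_size <= len(data):
        head = LITTLE.unpack_from(data, offset)
        tail = BIG.unpack_from(data, offset + LITTLE.size)
        records.append(head + tail)
        offset += record_size
    return records


# Two identical sample records; '10s' pads the name with NUL bytes.
sample = (LITTLE.pack(b'header', 1, 2, -3) + BIG.pack(100000, -5, 7)) * 2
records = unpack_records(sample)
print(records[0])
```

Tracking an integer offset avoids creating a new memoryview slice on every step, though whether that matters for speed depends on your data.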
If you want unnecessarily clever/brittle solutions, you could do this with ctypes custom Structures, nesting BigEndianStructure(s) inside a LittleEndianStructure or vice-versa. For your example format:
from ctypes import *
class BEStruct(BigEndianStructure):
    # c_int32 matches the 4-byte standard size of struct's 'l';
    # c_long is 8 bytes on most 64-bit Unix systems.
    _fields_ = [('x', 2 * c_int32), ('y', c_short)]
    _pack_ = 1  # no padding between fields
class MainStruct(LittleEndianStructure):
    _fields_ = [('a', 10 * c_char), ('b', 2 * c_ushort), ('c', c_byte), ('big', BEStruct)]
    _pack_ = 1  # no padding between fields
would give you a structure such that you could do:
mystruct = MainStruct()
memoryview(mystruct).cast('B')[:] = bytes(range(25))
and you'd then get results in the expected order, e.g.:
>>> hex(mystruct.b[0]) # Little endian as expected in main struct
'0xb0a'
>>> hex(mystruct.big.x[0]) # Big endian from inner big endian structure
'0xf101112'
While clever in a way, it's likely it will run slower (ctypes attribute lookup is weirdly slow in my experience), and unlike struct module functions, you can't just unpack into top-level named variables in a single line, it's attribute access all the way.
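As a quick sanity check of the nested-structure idea, the sketch below loads 25 bytes with from_buffer_copy and reads a few fields back. It uses the fixed-width c_int32 for the big-endian longs, which is my assumption to keep the layout at 25 bytes on every platform.

```python
import ctypes as ct


class BEStruct(ct.BigEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [('x', 2 * ct.c_int32), ('y', ct.c_short)]


class MainStruct(ct.LittleEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [('a', 10 * ct.c_char), ('b', 2 * ct.c_ushort),
                ('c', ct.c_byte), ('big', BEStruct)]


# Build an instance from a copy of the raw bytes 0x00..0x18.
mystruct = MainStruct.from_buffer_copy(bytes(range(25)))

print(ct.sizeof(MainStruct))      # total packed size in bytes
print(hex(mystruct.b[0]))         # little-endian field
print(hex(mystruct.big.x[0]))     # big-endian field in the nested struct
print(hex(mystruct.big.y))
```

from_buffer_copy leaves the source bytes untouched; from_buffer would instead share memory with a writable source such as a bytearray.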
I have a question about ctypes in python3.
I am trying to get a c_char_p as a python bytes object.
The following code is trying to get its value as a python3 bytes object.
How to get its value as bytes object?
from ctypes import *
libc = cdll.LoadLibrary("libSystem.B.dylib")
s1 = create_string_buffer(b"abc") # create a null-terminated string buffer
s2 = create_string_buffer(b"bc") # same as above
g = libc.strstr(s1, s2) # call strstr (this function returns a character pointer)
print(g) # print the returned value as an integer
matched_point = c_char_p(g) # cast to c_char_p
print(matched_point.value) # try to get the value as a bytes object (causes a segmentation fault here)
I found the answer to my own question.
According to the official Python ctypes documentation, foreign functions are assumed to return a C int by default, so on a 64-bit system the returned pointer is truncated.
The fix is to specify the return type with the restype attribute before calling the C function.
Corrected code example:
from ctypes import *
libc = cdll.LoadLibrary("libSystem.B.dylib")
s1 = create_string_buffer(b"abc") # create a null-terminated string buffer
s2 = create_string_buffer(b"bc") # same as above
libc.strstr.restype = c_char_p # specify the type of the return value
g = libc.strstr(s1, s2) # call strstr (this function returns a character pointer)
print(g) # => b"bc" (g is a bytes object)
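The same fix can be written in a slightly more portable form. This is a sketch that assumes a POSIX system where ctypes.util.find_library("c") can locate the C library (on macOS this resolves to the system libc); it also declares argtypes, which is optional here but catches wrong argument types early.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system; find_library("c") locates the C runtime.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the full signature: char *strstr(const char *, const char *)
libc.strstr.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.strstr.restype = ctypes.c_char_p

match = libc.strstr(b"abc", b"bc")     # pointer result converted to bytes
missing = libc.strstr(b"abc", b"xyz")  # NULL result converted to None
print(match, missing)
```

With restype set to c_char_p, ctypes copies the C string into a new bytes object, and a NULL return shows up as None instead of crashing.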
I am trying to wrap a C function using ctypes, which returns a character array of unknown size. The function is from the gdal c api, but my question is not specific to that function.
I would like to know if there is a general way of deconstructing the output of a function returning a char** array object of unknown size. In ctypes, this would be POINTER(c_char_p * X) where X is not known.
Using the tips from an answer to a similar question, I was able to get the following to work:
# Define the function wrapper.
f = ctypes.CDLL('libgdal.so.20').GDALGetMetadata
MAX_OUTPUT_LENGTH = 10
f.restype = ctypes.POINTER(ctypes.c_char_p * MAX_OUTPUT_LENGTH)
f.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
# Example call (the second argument can be null).
result = []
counter = 0
output = f(ptr, None).contents[counter]
while output:
    result.append(output)
    counter += 1
    output = f(ptr, None).contents[counter]
Where output is the resulting array and ptr is a ctypes pointer to an open GDALRaster. The limitation to this is that I have to construct an array with a fixed length before calling the function. I can guess what the maximum length could be in practical cases, and simply use that. But that is arbitrary, and I wonder if there is a way of getting an array pointer without specifying the array's length. In other words:
Is there a way to do something similar to the example above, but without specifying an arbitrary maximum length?
It turns out that you can simply use a pointer to c_char_p, without specifying a length, as the restype if the function returns a null-terminated array of strings. You then loop through the result until you reach the null element, which marks the end of the array.
So the following works beautifully for my use case:
# Define the function wrapper, the restype can simply be a
# pointer to c_char_p (without length!).
f = ctypes.CDLL('libgdal.so.20').GDALGetMetadata
f.restype = ctypes.POINTER(ctypes.c_char_p)
f.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
# Prepare python result array.
result = []
# Call C function.
output = f(ptr, None)
# Ensure that output is not a null pointer.
if output:
    # Get first item from array.
    counter = 0
    item = output[counter]
    # Get more items, until the array accessor returns null.
    # The function output (at least in my use case) is a null
    # terminated char array.
    while item:
        result.append(item)
        counter += 1
        item = output[counter]
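The loop can be tried without GDAL by building a null-terminated char** in ctypes itself. In this sketch the helper name read_string_list and the sample strings are made up; the check uses `is None` rather than truthiness, so that an empty string in the list does not end the loop early.

```python
import ctypes
from itertools import count


def read_string_list(ptr):
    """Collect items from a NULL-terminated char** into a list of bytes."""
    items = []
    if not ptr:                 # NULL pointer: nothing to read
        return items
    for i in count():
        item = ptr[i]           # bytes, or None for a NULL entry
        if item is None:
            break
        items.append(item)
    return items


# Simulated function result: two strings followed by the NULL terminator.
backing = (ctypes.c_char_p * 3)(b"AREA_OR_POINT=Area", b"UNITS=m", None)
output = ctypes.cast(backing, ctypes.POINTER(ctypes.c_char_p))

result = read_string_list(output)
print(result)
```

Indexing a POINTER(c_char_p) does no bounds checking, so this is only safe when the C side really guarantees the terminating NULL.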
I am using llvmpy to (attempt to) generate IR code. However, I am stuck using printf and an array of int8.
The following is an excerpt of what is giving me issues:
# Defining the print function.
# -----
fntype = Type.function(Type.void(), [Type.pointer(Type.int(8))])
myprint = module.add_function(fntype, 'print')
cb = CBuilder(myprint)
x = cb.printf("%s\n", cb.args[0])
cb.ret()
cb.close()
# Creating the string.
# -----
chartype = Type.int(8)
chars = [Constant.int(chartype, ord(c)) for c in "Hello World!"]
string = Constant.array(chartype, chars)
ptr = string.gep([Constant.int(Type.int(8)), 0])
# Calling the print function.
# -----
cfn = exe.get_ctype_function(module.get_function_named('print'),
                             None, ct.c_char_p)
cfn(ptr)
When I run this code I receive
ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: wrong type
What am I doing wrong? I feel that my usage of .gep() is at fault, but I'm not sure in what way. Or is there something else that I don't understand?
Also, is there a way to get the expected type from the function?
Yes, your usage of gep is incorrect:
The gep method receives a collection of indices, so not sure what a type is doing there.
The receiver of the gep method needs to be a pointer (or a pointer vector), while your receiver is an array.
But the fundamental problem here is that you are trying to get the address of a compile-time constant - i.e., the address of something which is never allocated any memory.
The proper way to do what you're trying to do is to create a global variable which is initialized to your "hello world" string and marked as constant.
Such a variable is allocated an address (of type pointer to i8 array) - and then you can use gep or bitcast constant-expressions to get i8* and send it to your print function.
For an example, try to compile a c program with a string literal into LLVM IR and you'll see the string literal was placed in such a global variable.