I'm writing an application that works with plugins. There are two types of plugins: Engine and Model. Engine objects have an update() method that calls the Model.velocity() method.
For performance reasons these methods are allowed to be written in C. This means that sometimes they will be written in Python and sometimes written in C.
The problem is that this forces Engine.update() to make an expensive Python function call to Model.velocity() (and also to reacquire the GIL). I thought about adding something like Model.get_velocity_c_func() to the API, which would allow Model implementations to return a pointer to the C version of their velocity() method if available, making it possible for Engine to do a faster C function call.
What data type should I use to pass the function pointer? And is this a good design at all, or is there an easier way?
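The CObject (PyCObject) data type exists for this purpose. It holds a void*, but you can store any data you wish. You do have to be careful not to pass the wrong CObject to the wrong functions, as some other library's CObjects will look just like your own. (PyCObject was removed in Python 3.2; its replacement, PyCapsule, attaches a name to the pointer so that mix-ups of this kind can be detected.)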
If you want more type security, you could easily roll your own PyType for this; all it has to do, after all, is contain a pointer of the right type.
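As a rough illustration of the idea, here is a minimal sketch that passes a function pointer between two Python objects inside a PyCapsule (the successor to CObject), driven from Python through ctypes. Everything named here is an assumption made for the example: the capsule name, the double-to-double signature, and the Model and Engine classes. The "C function" is a ctypes callback standing in for a real compiled one, and a real plugin would create and unpack the capsule in C rather than through ctypes.

```python
import ctypes

# CPython-only: declare the C API functions used below.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical signature for velocity(): double velocity(double t).
VELOCITY_FUNC = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

# The capsule stores this pointer without copying the string, so the
# bytes object is kept alive at module level.
CAPSULE_NAME = b"myapp.velocity_func"


@VELOCITY_FUNC
def _velocity_impl(t):
    # Stand-in for a function that would really be written in C.
    return 2.0 * t


class Model:
    def get_velocity_c_func(self):
        # Wrap the raw function address in a named capsule.
        address = ctypes.cast(_velocity_impl, ctypes.c_void_p).value
        return PyCapsule_New(address, CAPSULE_NAME, None)


class Engine:
    def update(self, model, t):
        capsule = model.get_velocity_c_func()
        # The name must match, otherwise GetPointer raises an exception.
        address = PyCapsule_GetPointer(capsule, CAPSULE_NAME)
        velocity = VELOCITY_FUNC(address)
        return velocity(t)


result = Engine().update(Model(), 3.0)
```

The capsule name is what provides the extra check: a capsule created by another library under a different name is rejected instead of being silently misinterpreted.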
Question
Suppose that I have implemented two Python types using the C extension API and that the types are identical (same data layouts/C struct) with the exception of their names and a few methods. Assuming that all methods respect the data layout, can you safely change the type of an object from one of these types into the other in a C function?
Notably, as of Python 3.9, there appears to be a function Py_SET_TYPE, but the documentation is not clear as to whether/when this is safe to do. I'm interested in knowing both how to use this function safely and whether types can be safely changed prior to version 3.9.
Motivation
I'm writing a Python C extension to implement a Persistent Hash Array Mapped Trie (PHAMT); in case it's useful, the source code is here (as of writing, it is at this commit). A feature I would like to add is the ability to create a Transient Hash Array Mapped Trie (THAMT) from a PHAMT. THAMTs can be created from PHAMTs in O(1) time and can be mutated in-place efficiently. Critically, THAMTs have the exact same underlying C data-structure as PHAMTs—the only real difference between a PHAMT and a THAMT is a few methods encapsulated by their Python types. This common structure allows one to very efficiently turn a THAMT back into a PHAMT once one has finished performing a set of edits. (This pattern typically reduces the number of memory allocations when performing a large number of updates to a PHAMT).
A very convenient way to implement the conversion from THAMT to PHAMT would be to simply change the type pointers of the THAMT objects from the THAMT type to the PHAMT type. I am confident that I can write code that safely navigates this change, but I can imagine that doing so might, for example, break the Python garbage collector.
(To be clear: the motivation is just context as to how the question arose. I'm not looking for help implementing the structures described in the Motivation, I'm looking for an answer to the Question, above.)
The supported way
It is officially possible to change an object's type in Python, as long as the memory layouts are compatible... but this is mostly limited to types not implemented in C. With some restrictions, it is possible to do
# Python attribute assignment, not C struct member assignment
obj.__class__ = some_new_class
to change an object's class, with one of the restrictions being that both the old and new classes must be "heap types", which all classes implemented in Python are and most classes implemented in C are not. (types.ModuleType and subclasses of that type are also specifically permitted, despite types.ModuleType not being a heap type. See the source for exact restrictions.)
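A small sketch of the supported route, using two classes defined in Python (which are heap types). The class names Transient and Persistent are hypothetical and only loosely echo the question; they are not the real PHAMT/THAMT implementation.

```python
class Transient:
    """Hypothetical mutable variant: allows in-place edits."""

    def __init__(self, items):
        self.items = list(items)

    def add(self, item):
        self.items.append(item)


class Persistent:
    """Hypothetical read-only variant sharing the same attributes."""

    def add(self, item):
        raise TypeError("Persistent objects cannot be modified in place")


obj = Transient([1, 2])
obj.add(3)

# Both classes are heap types with compatible layouts, so this is allowed.
obj.__class__ = Persistent

# The same assignment is rejected for a built-in, non-heap type such as int.
int_error = None
try:
    (1).__class__ = Persistent
except TypeError as exc:
    int_error = exc
```

After the assignment, obj keeps its attributes but dispatches methods through Persistent, so obj.add(4) would now raise TypeError.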
If you want to create a heap type from C, you can, but the interface is pretty different from the normal way of defining Python types from C. Plus, for __class__ assignment to work, you have to not set the Py_TPFLAGS_IMMUTABLETYPE flag, and that means that people will be able to monkey-patch your classes in ways you might not like (or maybe you see that as an upside).
If you want to go that route, I suggest looking at the CPython 3.10 _functools module source code for an example. (They set the Py_TPFLAGS_IMMUTABLETYPE flag, which you'll have to make sure not to do.)
The unsupported way
There was an attempt at one point to allow __class__ assignment for non-heap types, as long as the memory layouts worked. It got abandoned because it caused problems with some built-in immutable types, where the interpreter likes to reuse instances. For example, allowing (1).__class__ = SomethingElse would have caused a lot of problems. You can read more in the big comment in the source code for the __class__ setter. (The comment is slightly out of date, particularly regarding the Py_TPFLAGS_IMMUTABLETYPE flag, which was added after the comment was written.)
As far as I know, this was the only problem, and I don't think any more problems have been added since then. The interpreter isn't going to aggressively reuse instances of your classes, so as long as you're not doing anything like that, and the memory layouts are compatible, I think changing the type of your objects should work for now, even for non-heap-types. However, it is not officially supported, so even if I'm right about this working for now, there's no guarantee it'll keep working.
Py_SET_TYPE only sets an object's type pointer. It doesn't do any refcount fixing that might be needed. It's a very low-level operation. If neither the old class nor the new class is a heap type, no extra refcount fixing is needed, but if the old class is a heap type, you will have to decref the old class, and if the new class is a heap type, you will have to incref the new class.
If you need to decref the old class, make sure to do it after changing the object's class and possibly incref'ing the new class.
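The bookkeeping described above can be observed from Python, because the supported __class__ assignment performs the incref and decref that a bare Py_SET_TYPE call leaves to the caller. This is only a sketch, assuming a standard (non-free-threaded) CPython build; the class names Old and New are made up for the example.

```python
import sys


class Old:
    pass


class New:
    pass


obj = Old()  # an instance of a heap type holds one reference to its class

old_before = sys.getrefcount(Old)
new_before = sys.getrefcount(New)

obj.__class__ = New  # increfs New, swaps the type pointer, then decrefs Old

old_delta = sys.getrefcount(Old) - old_before
new_delta = sys.getrefcount(New) - new_before
```

On such a build the old class should lose exactly one reference and the new class should gain exactly one, which is the adjustment a C extension has to make by hand around Py_SET_TYPE when heap types are involved.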
According to the language reference, chapter 3 "Data model" (see here):
An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable.[1]
which, to my mind, states that the type must never change, and that changing it would be illegal because it would break the language specification. The footnote, however, states that
[1] It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.
I don't know of any method to change the type of an object from within Python itself, so the "possible" may indeed refer to the CPython function.
As far as I can see a PyObject is defined internally as a
struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;
};
So the reference counting should still work. On the other hand you will segfault the interpreter if you set the type to something that is not a PyTypeObject, or if the pointer is free()d, so the usual caveats.
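For the curious, the two header fields can be inspected from Python with ctypes. This is a sketch under stated assumptions: a standard release build of CPython where _PyObject_HEAD_EXTRA expands to nothing and the build is not free-threaded, so the header is just a pointer-sized reference count followed by the type pointer. The PyObjectHead and Example names are invented for the illustration.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # Mirrors the struct above, assuming _PyObject_HEAD_EXTRA is empty.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


class Example:
    pass


obj = Example()

# In CPython, id() is the object's address, so the header can be read there.
head = PyObjectHead.from_address(id(obj))

# The stored type pointer is the address of the type object itself.
type_matches = head.ob_type == id(type(obj))
```

Reading the header like this is harmless; writing to ob_type through the same structure is exactly the unchecked operation discussed above and can crash the interpreter.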
Apart from that, I agree that the specification is a little ambiguous, but the question of "legality" may not have a good answer. The long and short of it seems to me to be "do not change types unless you know what you are doing, and if you are not hacking on CPython itself you do not know what you are doing".
Edit: The Py_SET_TYPE function was added in Python 3.9 based on this commit. Apparently, people used to just set the type using
Py_TYPE(obj) = typeobj;
So the inclusion (without being formally announced, as far as I can see) is more akin to adding a convenience function.
For a project idea of mine, I have the following need, which is quite precise:
I would like to be able to execute Python code (pre-compiled before hand if necessary) on a per-bytecode-instruction basis. I also need to access what's inside the Python VM (frame stack, data stacks, etc.). Ideally, I would also like to remove a lot of Python built-in features and reimplement a few of them my own way (such as file writing).
All of this must be coded in C# (I'm using Unity).
I'm okay with losing a few of Python's actual features, especially concerning complicated stuff with imports, etc. However, I would like most of it to stay intact.
I looked a little bit into IronPython's code but it remains very obscure to me and it seems quite enormous too. I began translating Byterun (a Python bytecode interpreter written in Python) but I face a lot of difficulties as Byterun leverages a lot of Python's features to... interpret Python.
Today, I don't ask for a pre-made solution (except if you have one in mind?), but rather for some advice, places to look at, etc. Do you have any ideas about the things I should research first?
I've tried to do my own implementation of the Python VM in the distant past and learned a lot but never came even close to a fully working implementation. I used the C implementation as a starting point, specifically everything in https://github.com/python/cpython/tree/main/Objects and
https://github.com/python/cpython/blob/main/Python/ceval.c (look for switch(opcode))
Here are some pointers:
Come to grips with the Python object model. Implement an abstract PyObject class with the necessary methods for instancing, attribute access, indexing and slicing, calling, comparisons, arithmetic operations and representation. Provide concrete implementations for None, booleans, ints, floats, strings, tuples, lists and dictionaries.
Implement the core of your VM: a Frame object that loops over the opcodes and dispatches, using a giant switch statement (following the C implementation here), to the corresponding methods of the PyObject. The frame should maintain a stack of PyObjects for the operands of the opcodes. Depending on the opcode, arguments are popped from and pushed onto this stack. A dict can be used to store and retrieve local variables. Use the Frame object to create a PyObject for function objects.
Get familiar with the idea of a namespace and the way Python builds on the concept of namespaces. Implement a module, a class and an instance object, using the dict to map (attribute)names to objects.
Finally, add as many builtin functions as you think you need to get a useful implementation.
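To make the Frame idea concrete, here is a toy sketch of a dispatch loop with a value stack and a dict of locals. The opcode names are borrowed loosely from CPython, but the instruction format (a list of (opcode, argument) tuples) is invented for this example and is not real CPython bytecode. The step() method executes exactly one instruction, which is the per-instruction control the question asks for.

```python
class Frame:
    """Toy frame: executes (opcode, argument) pairs one at a time."""

    def __init__(self, instructions):
        self.instructions = instructions
        self.stack = []        # operand stack
        self.locals = {}       # local variables by name
        self.pc = 0            # index of the next instruction
        self.return_value = None
        self.finished = False

    def step(self):
        """Execute a single instruction and advance the program counter."""
        opcode, arg = self.instructions[self.pc]
        self.pc += 1
        if opcode == "LOAD_CONST":
            self.stack.append(arg)
        elif opcode == "STORE_NAME":
            self.locals[arg] = self.stack.pop()
        elif opcode == "LOAD_NAME":
            self.stack.append(self.locals[arg])
        elif opcode == "BINARY_ADD":
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(left + right)
        elif opcode == "RETURN_VALUE":
            self.return_value = self.stack.pop()
            self.finished = True
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(self):
        """Step until a RETURN_VALUE instruction has been executed."""
        while not self.finished:
            self.step()
        return self.return_value


# Roughly: x = 2; return x + 40
program = [
    ("LOAD_CONST", 2),
    ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"),
    ("LOAD_CONST", 40),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = Frame(program).run()
```

Because all interpreter state lives on the Frame object, a host program can call step() itself and inspect stack, locals and pc between instructions.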
I think it is easy to underestimate the amount of work you're getting yourself into, but ... have fun!
I found the following paradigm quite useful and I would love to be able to reproduce it somehow in Julia to take advantage of Julia's speed and C wrapping capabilities.
I normally maintain a set of objects in Python/Matlab that represent the blocks (algorithms) of a pipeline. Set up unit-tests etc.
Then develop the equivalent C/C++ code by having equivalent python/Matlab objects (same API) that wrap C/C++ to implement the same functionality and have to pass the same tests (by this I mean the exact same tests written in python/Matlab where either I generate synthetic data or I load recorded data).
I will maintain the full-python and python/C++ objects in parallel enforcing parity with extensive test suites. The python only and python/C++ versions are fully interchangeable.
Every time I need to modify the behavior of the pipeline, or debug an issue, I first use the fully pythonic version of the specific object/block I need to modify, typically in conjunction with other blocks running in python/C++ mode for speed, then update the tests to match the behavior of the modified python block and finally update the C++ version until it reaches parity and passes the updated tests.
Every time I instantiate the Python/C++ version of the block, the constructor runs a "make" that rebuilds the C++ code if there was any modification, to make sure I always test the latest version of the C++.
Is there any elegant way to reproduce the same paradigm with the Julia/C++ combination, maintaining the Julia and C++ versions in parallel via automatic testing?
That is, how do I check/rebuild the C++ only once, when I instantiate the object, and not per function call (which would be way too slow)?
I guess I could call the "make" once at the test-suite level before I run all the tests of the different blocks. But then I will have to manually call it if I'm writing a quick python script for a debugging session.
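For reference, the "rebuild at most once per session" behaviour on the Python side can be sketched with a class-level flag. This is an illustration only: CppBackedBlock, run_make and the bare "make" invocation are hypothetical names, and the build step is injectable so the example can run without a Makefile.

```python
import subprocess


def run_make():
    # Hypothetical default build step; assumes a Makefile in the working directory.
    subprocess.run(["make"], check=True)


class CppBackedBlock:
    """Wrapper whose constructor triggers the build only the first time."""

    _build_done = False  # shared by all instances of the class

    def __init__(self, build=run_make):
        cls = type(self)
        if not cls._build_done:
            build()
            cls._build_done = True


# Demonstration with a fake build step that just records each call.
calls = []
first = CppBackedBlock(build=lambda: calls.append("make"))
second = CppBackedBlock(build=lambda: calls.append("make"))
```

Only the first instantiation runs the build, so a quick debugging script still gets a fresh binary without paying for "make" on every construction or method call.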
Let's pick the example of a little filter object with a configure method that changes the filter parameters and a filter method that filters the incoming data.
We will have something like:
f1 = filter('python');
f2 = filter('C++'); % rebuild C++ as needed
f1.configure(0.5);
f2.configure(0.5);
x1 = data;
x2 = data;
xf1 = f1.filter(x1);
xf2 = f2.filter(x2);
assert( xf1 == xf2 )
In general there will be a bunch of tests that instantiate the objects in both python-only mode or python/C++ mode and test them.
I guess what I'm trying to say is that since in Julia the paradigm is to have a filter type, and then "external" methods that modify/use the filter type, there is no centralized way to check/rebuild all the methods that wrap C code, unless the type contains a list of variables that keep track of the relevant methods. That seems awkward.
I would appreciate comments / ideas.
Is there a reason why you can't wrap your functions in a struct like this?
struct Filter
    FilterStuff::String
    Param::Int
    function Filter(stuff::String, param::Int)
        # Make the C++ code here
        # Return the created object here
        obj = new(stuff, param)
    end
end
I have heard many times that C and Python/Ruby code can be integrated.
Now, my question is, can I use, for example a Python/Ruby ORM from within C?
Yes, but the API would be unlikely to be very nice, especially because the point of an ORM is to return objects and C doesn't have objects, hence making access to the nice OOP API unwieldy.
Even in C++ it would be problematic, as the objects would be Python/Ruby objects and the values Python/Ruby objects/values, and you would need to convert back and forth.
You would be better off using a nice database layer especially made for C.
For Ruby, yes, you can by using the Ruby C API. After including ruby.h you can use rb_funcall:
To invoke methods directly, you can use the function below
VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)
This function invokes a method on the recv, with the method name specified by the symbol mid.
This will allow you to call any Ruby method, and thus use any Ruby code from C. It won’t be pretty, though. There are a lot of good resources in SO’s Ruby C API tag wiki.
I'm learning how to use Qt with PyQt, and I have a QTableView with a QStandardItemModel. I've populated the model successfully and hooked up the itemChanged signal to a slot. I'd like to mess around with whatever object is returned in IPython, so currently I have the lines:
def itemChangedSlot(epw, item):
    new_data = item.data()
    print new_data
    print item
which prints
<PyQt4.QtGui.QStandardItem object at 0x07C5F930>
<PyQt4.QtCore.QVariant object at 0x07D331F0>
In the IPython session is it possible to get the object using this memory address? I'm not seeing anything on Google, maybe I don't have my terminology right?
You need to hold a reference to an object (i.e. assign it to a variable or store it in a list).
There is no language support for going from an object address directly to an object (i.e. pointer dereferencing).
You're almost certainly asking the wrong question, and Raymond Hettinger's answer is almost certainly what you really want.
Something like this might be useful when trying to dig into the internals of the CPython interpreter for learning purposes or auditing it for security holes or something… But even then, you're probably better off embedding the Python interpreter into a program and writing functions that expose whatever you want into the Python interpreter, or at least writing a C extension module that lets you manipulate CPython objects.
But, on the off chance that you really do need to do this…
First, there is no reliable way to even get the address from the repr. Most objects with a useful eval-able representation will give you that instead. For example, the repr of ('1', 1) is "('1', 1)", not <tuple at 0x10ed51908>. Also, even for objects that have no useful representation, returning <TYPE at ADDR> is just an unstated convention that many types follow (and a default for user-defined classes), not something you can rely on.
However, since you presumably only care about CPython, you can rely on id:
CPython implementation detail: This is the address of the object in memory.
(Of course if you have the object to call id (or repr) on, you don't need to dereference it via pointer, and if you don't have the object, it's probably been garbage collected so there's nothing to dereference, but maybe you still have it and just can't remember where you put it…)
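The points about repr and id can be checked with a short sketch. It assumes CPython, where the default object repr embeds the same address that id() returns; the Widget class is a made-up example.

```python
import re


class Widget:
    pass


w = Widget()

# A tuple has an eval-able repr, so no address appears in it.
tuple_repr = repr(("1", 1))

# The default repr follows the "<TYPE object at ADDR>" convention.
default_repr = object.__repr__(w)
match = re.search(r"at 0x([0-9a-fA-F]+)>$", default_repr)
address_from_repr = int(match.group(1), 16)

# CPython implementation detail: id() is the same address.
same_address = address_from_repr == id(w)
```

Parsing the repr only works for types that happen to follow the convention, which is why id() is the more dependable source of the address when the object is still at hand.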
Next, what do you do with this address? Well, Python doesn't expose any functions to do the opposite of id. But the Python C API is well documented—and, if your Python is built around a shared library, that C API can be accessed via ctypes, just by loading it up. In fact, ctypes provides a special variable that automatically loads the right shared library to call the C API on, ctypes.pythonapi.
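As a harmless first experiment with ctypes.pythonapi, the sketch below calls a C API function that takes no arguments and only returns a string. It assumes CPython; Py_GetVersion is part of the documented C API, and its return type has to be declared because ctypes otherwise assumes a C int.

```python
import ctypes
import sys

# Declare the return type: const char *Py_GetVersion(void)
ctypes.pythonapi.Py_GetVersion.restype = ctypes.c_char_p

# Call straight into the interpreter's own shared library.
version_from_c_api = ctypes.pythonapi.Py_GetVersion().decode()

# The version number at the start should agree with sys.version.
same_version = version_from_c_api.split()[0] == sys.version.split()[0]
```

Functions that accept or return object pointers work the same way, but for those the argument and return types must be declared correctly or the interpreter can crash.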
In very old versions of ctypes, you may have to find and load it explicitly, like pydll = ctypes.cdll.LoadLibrary('/usr/lib/libpython2.5.so'). (This is for Linux with Python 2.5 installed into /usr/lib; obviously if any of those details differ, the exact call will differ.)
Of course it's much easier to crash the Python interpreter doing this than to do anything useful, but it's not impossible to do anything useful, and you may have fun experimenting with it.
I think the answer is here. This way, I accessed an object using its address:
import ctypes
myObj = ctypes.cast(0x249c8c6cac0, ctypes.py_object).value
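A safer way to try this out is to take the address from id() of an object that is known to be alive, rather than typing in a literal. The sketch below assumes CPython, and the Payload class is invented for the example. If the address does not point at a live object, the cast can crash the interpreter or return garbage.

```python
import ctypes


class Payload:
    def __init__(self, value):
        self.value = value


original = Payload(42)

# CPython implementation detail: id() is the object's memory address.
address = id(original)

# Reinterpret the address as a PyObject* and fetch the object behind it.
recovered = ctypes.cast(address, ctypes.py_object).value

# The result is the very same object, not a copy.
is_same_object = recovered is original
```

Holding on to original is what keeps the address valid here; once the last reference is dropped, the memory may be reused and the address must not be dereferenced again.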