I ran into a situation involving pure Python and a C extension module.
To summarize: how can I accept and manipulate a Python object in a C module?
My Python part looks like this.
#!/usr/bin/env python
import os, sys
from c_hello import *

class Hello:
    busyHello = _sayhello_obj

class Man:
    def __init__(self, name):
        self.name = name
    def getName(self):
        return self.name

h = Hello()
h.busyHello( Man("John") )
In C, two things need to be resolved.
First, how can I receive the object?
Second, how can I call a method on the object?
static PyObject *
_sayhello_obj(PyObject *self, PyObject *args)
{
    PyObject *obj;
    // How can I fill obj?
    char s[1024];
    // How can I fill s, from obj.getName() ?
    printf("Hello, %s\n", s);
    return Py_None;
}
To extract an argument from an invocation of your method, you need to look at the functions documented in Parsing arguments and building values, such as PyArg_ParseTuple. (That's for if you're only taking positional args! There are others for positional-and-keyword args, etc.)
The object you get back from PyArg_ParseTuple doesn't have its reference count increased. For simple C functions, you probably don't need to worry about this. If you're interacting with other Python/C functions, or if you're releasing the global interpreter lock (i.e. allowing threading), you need to think very carefully about object ownership.
static PyObject *
_sayhello_obj(PyObject *self, PyObject *args)
{
    PyObject *obj = NULL;
    // How can I fill obj?
    static const char *fmt_string = "O"; // For "object"
    int parse_result = PyArg_ParseTuple(args, fmt_string, &obj);
    if(!parse_result)
    {
        // Don't worry about using PyErr_SetString, all the exception stuff should be
        // done in PyArg_ParseTuple()
        return NULL;
    }
    // Of course, at this point you need to do your own verification of whatever
    // constraints might be on your argument.
For calling a method on an object, you need to use either PyObject_CallMethod or PyObject_CallMethodObjArgs, depending on how you construct the argument list and method name. And see my comment in the code about object ownership!
Quick digression just to make sure you're not setting yourself up for a fall later: If you really are just getting the string out to print it, you're better off just getting the object reference and passing it to PyObject_Print. Of course, maybe this is just for illustration, or you know better than I do what you want to do with the data ;)
    char s[1024];
    // How can I fill s, from obj.getName() ?
    // Name of the method
    static const char *method_name = "getName";
    // No arguments? Score! We just need NULL here
    const char *method_fmt_string = NULL;
    PyObject *objname = PyObject_CallMethod(obj, method_name, method_fmt_string);
    // This is really important! What we have here now is a Python object with a newly
    // incremented reference count! This means you own it, and are responsible for
    // decrementing the ref count when you're done. See below.
    // If there's a failure, we'll get NULL
    if(objname == NULL)
    {
        // Again, this should just propagate the exception information
        return NULL;
    }
Now there are a number of functions in the String/Bytes Objects section of the Concrete Objects Layer docs; use whichever works best for you.
But do not forget this bit:
    // Now that we're done with the object we obtained, decrement the reference count
    Py_XDECREF(objname);
    // You didn't mention whether you wanted to return a value from here, so let's just
    // return the "None" singleton.
    // Note: this macro includes the "return" statement!
    Py_RETURN_NONE;
}
Note the use of Py_RETURN_NONE there, and note that it's not return Py_RETURN_NONE!
PS. The structure of this code is dictated to a great extent by personal style (e.g. early returns, static char format strings inside the function, initialisation to NULL). Hopefully the important information is clear enough apart from stylistic conventions.
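If you want to poke at these calls without compiling anything, here is a rough sketch that drives the same C API from Python through ctypes.pythonapi. It is only an illustration, not the extension itself: say_hello is a made-up helper name, and I use PyObject_GetAttrString plus PyObject_CallObject instead of PyObject_CallMethod because the latter is variadic, which is awkward to call through ctypes. Declaring restype as py_object lets ctypes take over the new reference that the C function returns, so no manual DECREF is needed here.

```python
import ctypes


class Man:
    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name


api = ctypes.pythonapi  # the running interpreter's own C API

# PyObject *PyObject_GetAttrString(PyObject *o, const char *attr_name)
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object

# PyObject *PyObject_CallObject(PyObject *callable, PyObject *args)
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object


def say_hello(obj):
    # Hypothetical helper: look up the bound method, then call it with no arguments.
    method = api.PyObject_GetAttrString(obj, b"getName")
    name = api.PyObject_CallObject(method, ())
    return "Hello, %s" % name


print(say_hello(Man("John")))
```

In a real extension you would do the same two steps (or the single PyObject_CallMethod call) in C and release the returned reference yourself.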
Viewing the source code of CPython on GitHub, I saw the method here:
https://github.com/python/cpython/blob/main/Python/bltinmodule.c
And more specifically:
static PyObject *
builtin_sorted(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    PyObject *newlist, *v, *seq, *callable;

    /* Keyword arguments are passed through list.sort() which will check
       them. */
    if (!_PyArg_UnpackStack(args, nargs, "sorted", 1, 1, &seq))
        return NULL;

    newlist = PySequence_List(seq);
    if (newlist == NULL)
        return NULL;

    callable = _PyObject_GetAttrId(newlist, &PyId_sort);
    if (callable == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }

    assert(nargs >= 1);
    v = _PyObject_FastCallKeywords(callable, args + 1, nargs - 1, kwnames);
    Py_DECREF(callable);
    if (v == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }
    Py_DECREF(v);
    return newlist;
}
I am not a C master, but I don't see any implementation of any of the known sorting algorithms, let alone the special sort that Python uses (I think it's called Timsort? - correct me if I'm wrong)
I would highly appreciate it if you could help me "digest" this code and understand it, because as of right now I've got:
PyObject *newlist, *v, *seq, *callable;
Which is creating a new list, even though a list is mutable, no? Then why create a new one?
It also creates some other pointers, and I'm not sure why...
Then we unpack the rest of the arguments as the comment suggests, and if they don't match the arguments expected there (for the function 'sorted', for example) then we break out.
I am pretty sure I am reading this all completely wrong, so I stopped here...
Thanks for the help in advance, and sorry for the multiple questions, but this block of code is blowing my mind and learning to read it would help me a lot!
The actual sorting is done by list.sort. sorted simply creates a new list from whatever iterable argument it is given, sorts that list in-place, then returns it. A pure Python implementation of sorted might look like
def sorted(itr, *, key=None):
    newlist = list(itr)
    newlist.sort(key=key)
    return newlist
Most of the C code is just boilerplate for working with the underlying C data structures, detecting and propagating errors, and doing memory management.
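To line the C up with Python step by step, here is a small sketch; sorted_sketch is just an illustrative name, and the comments name the C calls from builtin_sorted that each line roughly corresponds to. It also shows why a new list is created: the input may not be a list at all, and it must be left untouched.

```python
def sorted_sketch(iterable, *, key=None, reverse=False):
    newlist = list(iterable)                 # PySequence_List(seq)
    sort_method = getattr(newlist, "sort")   # _PyObject_GetAttrId(newlist, &PyId_sort)
    result = sort_method(key=key, reverse=reverse)  # the fast call with forwarded keywords
    assert result is None                    # list.sort() sorts in place and returns None
    return newlist


data = (3, 1, 2)           # a tuple: immutable, so it cannot be sorted in place
out = sorted_sketch(data)
print(out, data)
```

The temporary v in the C code is that None result, which is why it is simply DECREF'ed and thrown away.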
The actual sorting algorithm is spread throughout Objects/listobject.c; start here. If you are really interested in what the algorithm is, rather than how it is implemented in C, you may want to start with https://github.com/python/cpython/blob/main/Objects/listsort.txt instead.
The list sort implementation isn't in that function. It is a wrapper that fetches the list's sort method via PyId_sort:
callable = _PyObject_GetAttrId(newlist, &PyId_sort);
object.h contains a macro using token pasting to define the PyId_xxx objects
#define _Py_IDENTIFIER(varname) _Py_static_string(PyId_##varname, #varname)
... and I stopped digging after that. There could be more macro magic involved in order to enforce a coherent naming through the whole python codebase.
The implementation is located here:
https://github.com/python/cpython/blob/main/Objects/listobject.c
More precisely around line 2240
static PyObject *
list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
/*[clinic end generated code: output=57b9f9c5e23fbe42 input=cb56cd179a713060]*/
{
Comments read:
/* An adaptive, stable, natural mergesort. See listsort.txt.
* Returns Py_None on success, NULL on error. Even in case of error, the
* list will be some permutation of its input state (nothing is lost or
* duplicated).
*/
Now it takes some effort to understand the details of the algorithm but it's there.
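If you only care about the observable behaviour, the "stable" part of that comment is easy to check from Python. A small sketch with made-up sample data: records that compare equal under the key keep their original relative order, also with reverse=True.

```python
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

# Sort by the count only; ties keep the order they had in the input.
by_count = sorted(records, key=lambda r: r[1])

# reverse=True flips the key order but still preserves the order of ties.
by_count_desc = sorted(records, key=lambda r: r[1], reverse=True)

print(by_count)
print(by_count_desc)
```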
I have a function in C++ that receives an initialised instance of a Python class as a PyObject.
The Python class is:
import inspect

class Expression:
    def __init__(self, obj):
        self.obj = obj
    def get_source(self):
        #Check if the object whose source is being obtained is a function.
        if inspect.isfunction(self.obj):
            source = inspect.getsourcelines(self.obj)[0][1:]
            ls = len(source[0]) - len(source[0].lstrip())
            source = [line[ls:] for line in source]
            #get rid of comments from the source
            source = [item for item in source if item.lstrip()[0] != '#']
            source = ''.join(source)
            return source
        else:
            raise Exception("Expression object is not a function.")
The C++ code receives this:
Expression(somefunctogetsource)
From C++, how do I call the get_source method of the Expression object?
So far I've read the python c-api docs and tried things like this:
PyObject* baseClass = (PyObject*)expression->ob_type;
PyObject* func = PyObject_GetAttrString(baseClass, "get_source");
PyObject* result = PyObject_CallFunctionObjArgs(func, expression, NULL);
And convert the result to a string, but this doesn't work.
Simpler than you're making it. You don't need to retrieve anything from the base class directly. Just do:
PyObject* result = PyObject_CallMethod(expression, "get_source", NULL);
if (result == NULL) {
    // Exception occurred, return your own failure status here
}
// result is a PyObject* (in this case, it should be a str, i.e. a PyUnicodeObject)
PyObject_CallMethod takes an object to call a method of, a C-style string for the method name, and a format string + varargs for the arguments. When no arguments are needed, the format string can be NULL.
The resulting PyObject* isn't super useful to C++ code (it has runtime determined 1, 2 or 4 byte characters, depending on the ordinals involved, so straight memory copying from it into std::string or std::wstring won't work), but PyUnicode_AsUTF8AndSize can be used to get a UTF-8 encoded version and length, which can be used to efficiently construct a std::string with equivalent data.
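To see why the size matters, here is a Python-side sketch of what the UTF-8 view looks like; the sample text is made up. The byte length that you would pass to the std::string constructor is generally not the number of characters in the str.

```python
text = "na\u00efve \u20ac"        # 7 characters, two of them non-ASCII
utf8 = text.encode("utf-8")      # the same bytes a UTF-8 view of the str would expose

print(len(text))   # number of code points
print(len(utf8))   # number of bytes: \u00ef takes 2, \u20ac takes 3
```

So on the C++ side, construct the string from the pointer and the reported byte size rather than from a character count.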
If performance counts, you may want to explicitly make a PyObject* representing "get_source" during module load, e.g. with a global like:
PyObject *get_source_name;
which is initialized in the module's PyMODINIT_FUNC with:
get_source_name = PyUnicode_InternFromString("get_source");
Once you have that, you can use the more efficient PyObject_CallMethodObjArgs with:
PyObject* result = PyObject_CallMethodObjArgs(expression, get_source_name, NULL);
The savings there are largely in avoiding constructing a Python level str from a C char* over and over, and by using PyUnicode_InternFromString to construct the string, you're using the interned string, making the lookup more efficient (since the name of get_source is itself automatically interned when def-ed in the interpreter, no actual memory comparison of the contents takes place; it realizes the two strings are both interned, and just checks if they point to the same memory or not).
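You can observe the interning from Python as well. This is a sketch that relies on CPython behaviour (identifiers in compiled code are interned); the stripped-down Expression class is only a stand-in for the real one.

```python
import sys


class Expression:
    def get_source(self):
        return "source"


# Build the name at run time so it starts out as an ordinary, non-interned str.
runtime_name = "".join(["get_", "source"])
interned = sys.intern(runtime_name)

# The key stored in the class namespace when the method was def-ed.
key_in_class = next(k for k in vars(Expression) if k == "get_source")

# On CPython both refer to the single interned "get_source" object.
same_object = key_in_class is interned
print(same_object)
```

That identity is what lets the attribute lookup skip comparing the characters.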
I'm writing an alternative to the functools.partial object that accumulates arguments until there are enough of them to make the call.
I use the C API and I have a tp_call implementation which, when it's called, returns either a modified version of self or a PyObject*.
At first I followed the Defining New Types guide and then realized that I just can't return different types (PyObject * and MyObject *) from a tp_call implementation.
Then I tried not to use a struct with a MyObject * definition and to use PyObject_SetAttrString in tp_init instead, just like we do in Python. But in that case I got an AttributeError, because you can't set arbitrary attributes on object instances in Python.
What I need here is to make my tp_call implementation polymorphic, so that it can return either a MyObject, which is a subclass of PyObject, or a plain PyObject.
What is the sane way to do that?
UPDATE #0
That's the code:
static PyObject *Curry_call(Curry *self, PyObject *args,
                            PyObject *kwargs) {
    PyObject * old_args = self->args;
    self->args = PySequence_Concat(self->args, args);
    Py_DECREF(old_args);
    if (self->kwargs == NULL && kwargs != NULL) {
        self->kwargs = kwargs;
        Py_INCREF(self->kwargs);
    } else if (self->kwargs != NULL && kwargs != NULL) {
        PyDict_Merge(self->kwargs, kwargs, 1);
    }
    if ((PyObject_Size(self->args) +
         (self->kwargs != NULL ? PyObject_Size(self->kwargs) : 0)) >=
        self->num_args) {
        return PyObject_Call(self->fn, self->args, self->kwargs);
    } else {
        return (PyObject *)self;
    }
}
UPDATE #1
I initially abandoned this implementation because I got a segfault on subsequent calls of the partial object. I thought it happened because of issues with casting Curry * to PyObject *. But now I have fixed the segfault by adding Py_INCREF(self); before return (PyObject *)self;. This seems very strange to me. Should I really INCREF self if I return it, according to the C API ownership rules?
If you've defined your MyObject type correctly, you should be able to simply cast your MyObject * to a PyObject * and return that. The first member of a MyObject is a PyObject, and C lets you cast a pointer to a struct to a pointer to the struct's first member and vice versa. I believe the feature exists specifically to allow things like this.
I don't really know your whole code, but as long as MyObject is a PyObject (compatible, i.e. has the same "header" fields, make sure you have a length field), CPython is designed to just take your MyObject as a PyObject; simply cast the pointer to PyObject before returning it.
As you can see here, that is one of the things that is convenient when using C++: You can actually have subclasses with type safety, and you don't have to worry about someone just copying over half of your subclass' instance, for example.
EDIT: because it was asked "isn't this unsafe": yes. It is. But it's only as unsafe as type handling in user code gets; CPython lets you do this, because it stores and checks the PyTypeObject *ob_type member of the PyObject struct contained. That's about as safe as for example C++'s runtime type checking is -- but it's implemented by python developers as opposed to GCC/clang/MSVC/icc/... developers.
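For comparison, this is roughly what the tp_call logic looks like when written in Python, where "return self or return the result" needs no casting at all. It is a sketch with assumed names (Curry, fn, num_args mirror the question's struct fields), not the C implementation. Python does the reference counting for you here; in C a function like tp_call has to return a new reference, which is why returning self needs the Py_INCREF mentioned in the update.

```python
class Curry:
    def __init__(self, fn, num_args):
        self.fn = fn
        self.num_args = num_args
        self.args = ()
        self.kwargs = {}

    def __call__(self, *args, **kwargs):
        # Accumulate positional and keyword arguments across calls.
        self.args = self.args + args
        self.kwargs.update(kwargs)
        if len(self.args) + len(self.kwargs) >= self.num_args:
            # Enough arguments: return whatever the wrapped function returns.
            return self.fn(*self.args, **self.kwargs)
        # Not enough yet: return the same object so calls can be chained.
        return self


add3 = Curry(lambda a, b, c: a + b + c, 3)
step = add3(1)
total = step(2)(3)
print(step is add3, total)
```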
This question is related to a previous question I asked. Namely this one if anyone is interested. Basically, what I want to do is to expose a C array to Python using a Py_buffer wrapped in a memoryview-object. I've gotten it to work using PyBuffer_FillInfo (work = I can manipulate the data in Python and write it to stdout in C), but if I try to roll my own buffer I get a segfault after the C function returns.
I need to create my own buffer because PyBuffer_FillInfo assumes that the format is char, making the itemsize field 1. I need to be able to provide items of size 1, 2, 4 and 8.
Some code, this is a working example:
Py_buffer *buf = (Py_buffer *) malloc(sizeof(*buf));
int r = PyBuffer_FillInfo(buf, NULL, malloc(sizeof(char) * 4), 4, 0, PyBUF_CONTIG);
PyObject *mv = PyMemoryView_FromBuffer(buf);
//Pack the memoryview object into an argument list and call the Python function
for (blah)
printf("%c\n", *buf->buf++); //this prints the values i set in the Python function
Looking at the implementation of PyBuffer_FillInfo, which is really simple, I rolled my own function to be able to provide custom itemsizes:
//buffer creation function
Py_buffer *getReadWriteBuffer(int nitems, int itemsize, char *fmt) {
    Py_buffer *buf = (Py_buffer *) malloc(sizeof(*buf));
    buf->obj = NULL;
    buf->buf = malloc(nitems * itemsize);
    buf->len = nitems * itemsize;
    buf->readonly = 0;
    buf->itemsize = itemsize;
    buf->format = fmt;
    buf->ndim = 1;
    buf->shape = NULL;
    buf->strides = NULL;
    buf->suboffsets = NULL;
    buf->internal = NULL;
    return buf;
}
How i use it:
Py_buffer *buf = getReadWriteBuffer(32, 2, "h");
PyObject *mv = PyMemoryView_FromBuffer(buf);
// pack the memoryview into an argument list and call the Python function as before
for (blah)
printf("%d\n", *buf->buf); //this prints all zeroes even though i modify the array in Python
return 0;
//the segfault happens somewhere after here
The result of using my own buffer object is a segfault after the C function returns. I really don't understand why this happens at all. Any help would be most appreciated.
EDIT
According to this question, which I failed to find before, itemsize > 1 might not even be supported at all. Which makes this question even more interesting. Maybe I could use PyBuffer_FillInfo with a large enough block of memory to hold what I want (32 C floats for example). In that case, the question is more about how to assign Python floats to the memoryview object in the Python function. Questions questions.
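On the Python side of that idea, a byte-oriented buffer can be reinterpreted with memoryview.cast (available in Python 3.3 and later), which may be enough to assign shorts or floats into it. This is a sketch using a bytearray as a stand-in for the block that the C code would hand over; item values are stored in native byte order.

```python
import struct

raw = bytearray(32 * 2)              # stand-in for a C block of 32 two-byte items
shorts = memoryview(raw).cast("h")   # view the same bytes as signed shorts

shorts[0] = 258                      # writes two bytes into raw
print(shorts.itemsize, len(shorts), bytes(raw[:2]))

float_block = bytearray(32 * 4)      # room for 32 C floats
floats = memoryview(float_block).cast("f")
floats[0] = 1.5                      # a Python float stored as a C float
print(floats[0])
```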
So, for lack of answers, I decided to take a different approach from the one I originally intended. Leaving this here in case someone else hits the same snag.
Basically, instead of creating a buffer (or bytearray, equiv.) in C and passing it to Python for the extension user to modify, I simply redesigned the code slightly, so that the user returns a bytearray (or any type that supports the buffer interface) from the Python callback function. This way I need not even worry about the size of the items since, in my case, all the C code does with the returned object is to extract its buffer and copy it to another buffer with a simple memcpy.
Code:
PYGILSTATE_ACQUIRE; //a macro i made
PyObject *result = PyEval_CallObject(python_callback, NULL);
if (!PyObject_CheckBuffer(result))
    ; //raise exception
Py_buffer *view = (Py_buffer *) malloc(sizeof(*view));
int error = PyObject_GetBuffer(result, view, PyBUF_SIMPLE);
if (error)
    ; //raise exception
memcpy(my_other_buffer, view->buf, view->len);
PyBuffer_Release(view);
free(view); //the Py_buffer struct itself was malloc'ed above
Py_DECREF(result);
PYGILSTATE_RELEASE; //another macro
I hope this helps someone.
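Seen from Python, the same flow looks roughly like the sketch below: the callback returns anything that supports the buffer protocol and the consumer copies its raw bytes without caring about the item size. python_callback and copy_out are illustrative names, and array.array is only one possible return type.

```python
import array


def python_callback():
    # Any buffer-protocol object works: bytes, bytearray, array.array, ...
    return array.array("f", [1.0, 2.5, -3.0])


def copy_out(callback):
    result = callback()
    # memoryview() raises TypeError if the object has no buffer interface.
    with memoryview(result) as view:   # acquire the buffer, release it on exit
        return bytes(view)             # copy the raw bytes, like the memcpy


copied = copy_out(python_callback)
print(len(copied))
```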
There is a libx.so which exports two functions and a struct:
typedef struct Tag {
    int num;
    char *name;
}Tag;
Tag *create(int n, char *s)
{
    Tag *t = malloc(sizeof(Tag));
    t->num = n;
    t->name = s;
    return t;
}
void use(Tag *t)
{
    printf("%d, %s\n", t->num, t->name);
}
I want to call create in Python and then save the Tag *res returned by create, later I will call use and pass the Tag *res saved before to use, here is it (just to demonstrate):
>>> libx = ctypes.CDLL("./libx.so")
>>> res = libx.create(c_int(1), c_char_p("a"))
>>> libx.use(res)
The above code might be wrong, just to demonstrate what I want to do.
And my problem is that, how could I save the result returned by create? Because it returns a pointer to a user-defined struct, and I don't want to construct struct Tag's counterpart in Python, would c_void_p do the trick?
UPDATE
From @David's answer, I still don't quite understand one thing:
the pointer (c_char_p("a")) is only valid for the duration of the
call to create. As soon as create returns then that pointer is no
longer valid.
And I assign c_char_p("a") to t->name in create, when the call to create finishes, is t->name a dangling pointer? Because according to the quoted words, that pointer is no longer valid after create. Why c_char_p("a") is no longer valid?
The C code that you present is simply not going to work. You need to be much more precise about which party allocates and is responsible for the heap memory.
In your current example you pass c_char_p("a") to the C code. However, the pointer to that ctypes memory is only valid for the duration of the call to create. As soon as create returns then that pointer is no longer valid. But you took a copy of the pointer inside create. Thus the subsequent call to use is liable to fail.
You are going to need to take a copy of the contents of that string and store it in the struct. If you do that then you can use libx.create.restype = c_void_p safely.
But if you want the memory you allocated to be deallocated you will have to provide a destroy function to match the create function. With these changes the C code would look like this:
Tag *create(int n, char *s)
{
    Tag *t = malloc(sizeof(Tag));
    t->num = n;
    t->name = strdup(s);
    return t;
}
void destroy(Tag *t)
{
    free(t->name);
    free(t);
}
The Python code would look like this:
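from ctypes import CDLL, c_char_p, c_int, c_void_p

libx = CDLL("./libx.so")
libx.create.restype = c_void_p
libx.create.argtypes = [c_int, c_char_p]
# Without argtypes the handle would be passed as a C int and could be truncated on 64-bit platforms
libx.use.argtypes = [c_void_p]
libx.destroy.argtypes = [c_void_p]
res = libx.create(1, b"a")  # on Python 3, c_char_p takes bytes
libx.use(res)
libx.destroy(res)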
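On the lifetime question, here is a sketch that needs no shared library. An address taken from a ctypes buffer is only meaningful while Python keeps that buffer alive, which is the reason the C side has to make its own copy. The names name_buf and handle are illustrative.

```python
import ctypes

# A Python-owned, NUL-terminated buffer; Python frees it once nothing references it.
name_buf = ctypes.create_string_buffer(b"a")

# An opaque handle: just the address, with no knowledge of what it points to.
handle = ctypes.cast(name_buf, ctypes.c_void_p)

# While name_buf is alive, reading through the address is fine.
print(type(handle.value).__name__, ctypes.string_at(handle.value))
```

A temporary such as c_char_p("a") passed directly in a call has no such long-lived owner, so its address must not be stored by the callee beyond the call.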
Python does reference counting. You'll have to use Py_INCREF() and friends for objects that are returned from "external" libraries.
UPDATE: I don't know about .so loading by python, maybe the method proposed by @David Hefferman does this automagically.
UPDATE2: delete me!