I'm writing a V8 add-on to convert JavaScript objects to Python, and vice versa. I'm able to convert all sorts of types, but PyDateTime_FromTimestamp (which is specified as existing in the CPython docs: https://docs.python.org/2/c-api/datetime.html#c.PyDateTime_FromTimestamp) is apparently undefined, causing compilation to fail.
../src/py_object_wrapper.cc:189:13: error: use of undeclared identifier
'PyDateTime_FromTimestamp'
return PyDateTime_FromTimestamp(value->NumberValue());
Anybody know what's going on?
Since you haven't given us enough information to debug anything, I'm going to take a wild guess at the most likely problem.
Notice that at the top of the documentation you linked to it says:
Various date and time objects are supplied by the datetime module. Before using any of these functions, the header file datetime.h must be included in your source (note that this is not included by Python.h), and the macro PyDateTime_IMPORT must be invoked, usually as part of the module initialisation function. The macro puts a pointer to a C structure into a static variable, PyDateTimeAPI, that is used by the following macros.
If you just forgot the macro, this would compile but then crash at runtime, as PyDateTimeAPI will be NULL.
But if you forgot to #include datetime.h, that would cause exactly the error you're seeing.
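For concreteness, here is a minimal Python 2 sketch of both requirements (the module and helper names are made up; in an embedding/add-on context you would run PyDateTime_IMPORT once after Py_Initialize() instead). Note also that PyDateTime_FromTimestamp takes an argument tuple, not a bare double:

#include <Python.h>
#include <datetime.h>   /* not pulled in by Python.h */

/* hypothetical helper: convert a Unix timestamp to a datetime.datetime;
   the macro expects an argument tuple suitable for datetime.fromtimestamp() */
static PyObject *
timestamp_to_datetime(double ts)
{
    PyObject *args = Py_BuildValue("(d)", ts);
    PyObject *dt;
    if (args == NULL)
        return NULL;
    dt = PyDateTime_FromTimestamp(args);  /* goes through PyDateTimeAPI */
    Py_DECREF(args);
    return dt;
}

PyMODINIT_FUNC
initmymodule(void)
{
    PyObject *m = Py_InitModule("mymodule", NULL);
    if (m == NULL)
        return;
    PyDateTime_IMPORT;   /* fills in PyDateTimeAPI; without it you crash at run time */
}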
Related
While exploring Cython compile steps, I found I need to link C libraries like the math library explicitly in setup.py. However, no such step was needed for numpy. Why is that? Is numpy being imported through the usual Python import mechanism? If that is the case, does it mean we never need to explicitly link an extension module in Cython?
I tried to rummage through the official documentation, but unfortunately there was no explanation as to when explicit linking is required and when it is handled automatically.
A call to a cdef function corresponds more or less to a jump to an address in memory - the address from which the next instruction should be read/executed. The question is how this address is provided. There are several cases to consider:
A. inline functions
The code of these functions is either inlined or defined in the same translation unit, so the address is known to the linker at link time (or even to the compiler at compile time) - no additional libraries are needed.
An example is a header-only library.
Consequences: Only include path(s) need to be provided in setup.py.
B. static linking
The definition/functionality we need is in another translation unit/library - the target address of the jump is calculated at link time and cannot be changed afterwards.
Examples are additional C/C++ files or static libraries which are added to the extension definition.
Consequences: The static library should be added to setup.py, i.e. the library path and library name along with the include paths.
C. dynamic linking
The necessary functionality is provided in a shared object/dll. The address to jump to is calculated at run time by the loader and can be changed at program start by exchanging the loaded shared objects.
Examples are libstdc++ (usually added automatically by g++) or libm, which is not automatically added to the linker command by gcc.
Consequences: The dynamic library should be added to setup.py, i.e. the library path and library name, possibly an rpath, plus the include paths. The shared object/dll must be available at run time. More information (than one probably would like to know) about Cython/Python using dynamic libraries can be found in this SO post.
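To make case C concrete, here is a tiny C example (nothing Cython-specific): this translation unit only sees the declaration of sqrt from math.h, so the linker must be told about libm - e.g. libraries=["m"] in the Extension definition in setup.py - and libm must be present again at run time:

/* gcc main.c -lm     (leaving out -lm gives "undefined reference to `sqrt'"
   on many toolchains; libm.so is then loaded by the dynamic loader) */
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    double x = argc + 1.0;      /* runtime value, so the call isn't folded away */
    printf("%f\n", sqrt(x));    /* declaration from math.h, definition in libm */
    return 0;
}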
D. Calling via a pointer
A linker is needed only when we call a function via its name. If we call it via a function pointer, we don't need a linker/loader because the address of the function is already known - it is the value stored in the function pointer.
Example: Cython-generated modules use this machinery to give access to the cdef functions they export through a pxd file. Such a module creates a data structure of function pointers (stored as the variable __pyx_capi__ in the module itself), which is filled when the shared object/dll is loaded via dlopen (or Windows' equivalent). The lookup in the dictionary happens only once when the module is imported and the addresses of the functions are cached, so calls at run time have almost no overhead.
We can inspect it, for example via
#foo.pyx:
cdef void doit():
print("doit")
#foo.pxd
cdef void doit()
>>> cythonize -3 -i foo.pyx
>>> python -c "import foo; print(foo.__pyx_capi__)"
{'doit': <capsule object "void (void)" at 0x7f7b10bb16c0>}
Now, calling a cdef function from another module is just jumping to the corresponding address.
Consequences: We need to cimport the needed functionality.
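For illustration, this is roughly what happens on the consuming side, written as a hand-written C sketch (the names foo and doit are from the toy example above; error checking omitted):

#include <Python.h>

typedef void (*doit_t)(void);

static doit_t load_doit(void)
{
    PyObject *mod = PyImport_ImportModule("foo");
    PyObject *capi = PyObject_GetAttrString(mod, "__pyx_capi__");
    PyObject *capsule = PyDict_GetItemString(capi, "doit");   /* borrowed reference */
    /* the capsule name encodes the C signature, "void (void)" as seen above */
    doit_t fp = (doit_t)PyCapsule_GetPointer(capsule, "void (void)");
    Py_DECREF(capi);
    Py_DECREF(mod);
    return fp;   /* Cython caches this address, so later calls are plain indirect calls */
}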
Numpy is a little bit more complicated as it uses a sophisticated combination of A and D in order to postpone the resolution of symbols until the run time, thus not needing shared-object/dlls at link time (but at run time!).
Some functionality in numpy's pxd file can be used directly because it is inlined (or is even just a define), for example PyArray_NDIM - basically everything from ndarraytypes.h. This is the reason one can use Cython's ndarrays without much ado.
Other functionality (basically everything from ndarrayobject.h) cannot be accessed without calling np.import_array() in an initialization step, for example PyArray_FromAny. Why?
The answer is in the header __multiarray_api.h, which is included in ndarrayobject.h but cannot be found in the git repository, as it is generated during installation. There the definition of, for example, PyArray_CheckFromAny can be looked up:
...
static void **PyArray_API=NULL; //usually...
...
#define PyArray_CheckFromAny \
(*(PyObject * (*)(PyObject *, PyArray_Descr *, int, int, int, PyObject *)) \
PyArray_API[108])
...
PyArray_CheckFromAny isn't the name of a function, but a define for a function pointer saved in PyArray_API, which is not initialized (i.e. is NULL) when the module is first loaded! By the way, there is also a (private) function called PyArray_CheckFromAny, which is what the function pointer actually points to - and because the public version is a define there is no name collision when linking...
The last piece of the puzzle: the function _import_array (more or less the workhorse behind np.import_array) is an inline function (case A), so only the include path is needed to be able to use it.
_import_array uses a similar approach to Cython's __pyx_capi__ to get the function pointers: The field is called _ARRAY_API and can be inspected via:
>>> import numpy.core._multiarray_umath as macore
>>> macore._ARRAY_API
<capsule object NULL at 0x7f17d85f3810>
More info about how PyArray_API can be initialized can be found in this SO-answer of mine.
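The same convention looks like this in a hand-written Python 2 C extension (a minimal sketch; the module name npclient is made up): until import_array() runs and fills PyArray_API from the _ARRAY_API capsule, every PyArray_* call would dereference a NULL table.

#include <Python.h>
#include <numpy/arrayobject.h>

PyMODINIT_FUNC
initnpclient(void)
{
    PyObject *m = Py_InitModule("npclient", NULL);
    if (m == NULL)
        return;
    import_array();   /* macro around _import_array(); sets an ImportError and returns on failure */
}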
However, when using functionality from numpy/math.pxd, one has to statically link numpy's math library (see for example this SO question).
I work with Python most of the time, but for some reasons I now also need to use C++.
I find Python's import XXX as X very neat, for example:
import numpy as np
a = np.array([1,2,3])
where it's clear just by looking at my code that the array() function is provided by the numpy module.
However, when working with C++, if I do:
#include<cstdio>
std::remove(filename);
It's not clear to me at first sight that the remove() function under the std namespace is provided by <cstdio>.
So I'm wondering if there is a way to do it in C++ as the import XXX as X way in Python?
Nope.
It'll be slightly clearer if you write std::remove (which you should be doing anyway; there's no guarantee that the symbol is available in the global namespace) because then at least you'll know it comes from a standard header.
Beyond that, it's up to your memory.
Some people try to introduce hacks like:
namespace SomeThing {
#include <cstdio>
}
// Now it's SomeThing::std::remove
That might work for your own headers (though I'd still discourage it even then). But it'll cause all manner of chaos with standard headers for sure and is not permitted:
[using.headers]/1: The entities in the C++ standard library are defined in headers, whose contents are made available to a translation unit when it contains the appropriate #include preprocessing directive.
[using.headers]/3: A translation unit shall include a header only outside of any declaration or definition, and shall include the header lexically before the first reference in that translation unit to any of the entities declared in that header. No diagnostic is required.
Recall that #include and import are fundamentally different things. C++ modules may go some way towards this sort of functionality, perhaps, but an #include merely pastes source code into your translation unit; it gives you no handle on the namespaces of the symbols that code declares.
No, there is no way to force this syntax. The person who developed the code that you include is free to organize it however they like. Generally people split their code into namespaces, which can result in syntax like this:
#include <MyLibrary.h>
int main()
{
MyLibrary::SayHello();
return 0;
}
But you have no guarantee about how the code in the header is written.
C++ #include<XXX.h> equivalent of Python's import XXX as X
There is no equivalent in C++.
When you include a file into another, you get every single declaration from the included file, and you have no option of changing their names.
You can add aliases for types and namespaces though, and references to objects, as well as write wrapper functions to do some of what the as X part does in Python.
It's not clear to me at first sight that remove() is provided by <cstdio>.
The std namespace at least tells you that it is provided by the standard library.
What I like to do, is document which header provides the used declarations:
#include<cstdio> // std::remove
std::remove(filename);
That said, most IDEs can show you where an identifier is declared by ctrl-clicking or hovering over it (although this doesn't always work well when there are overloads in different headers). My primary use for inclusion comments is checking which includes can be removed after refactoring.
I am creating Python bindings for a C library.
In C the code to use the functions would look like this:
Ihandle *foo;
foo = MethFunc();
SetAttribute(foo, 's');
I am trying to get this into Python. Where I have MethFunc() and SetAttribute() functions that could be used in my Python code:
import mymodule
foo = mymodule.MethFunc()
mymodule.SetAttribute(foo)
So far my C code to return the function looks like this:
static PyObject * _MethFunc(PyObject *self, PyObject *args) {
return Py_BuildValue("O", MethFunc());
}
But that fails by crashing (with no error message).
I have also tried return MethFunc(); but that failed.
How can I return the function foo (or if what I am trying to achieve is completely wrong, how should I go about passing MethFunc() to SetAttribute())?
The problem here is that MethFunc() returns an IHandle *, but you're telling Python to treat it as a PyObject *. Presumably those are completely unrelated types.
A PyObject * (or any struct you or Python defines that starts with an appropriate HEAD macro) begins with a reference count and a pointer to a type, and the first thing Python is going to do with any object you hand it is deal with those fields. So, if you give it an object that instead starts with, say, two ints, Python is going to end up trying to access a type at 0x00020001 or similar, which is almost certain to segfault.
If you need to pass around a pointer to some C object, you have to wrap it up in a Python object. There are three ways to do this, from hackiest to most solid.
First, you can just cast the IHandle * to a size_t, then PyLong_FromSize_t it.
This is dead simple to implement. But it means these objects are going to look exactly like numbers from the Python side, because that's all they are.
Obviously you can't attach a method to this number; instead, your API has to be a free function that takes a number, then casts that number back to an IHandle* and calls a method.
It's more like, e.g., C's stdio, where you have to keep passing stdin or f as an argument to fread, instead of Python's io, where you call methods on sys.stdin or f.
But even worse, because there's no type checking, static or dynamic, to protect you from some Python code accidentally passing you the number 42. Which you'll then cast to an IHandle * and try to dereference, leading to a segfault…
And if you were hoping Python's garbage collector would help you know when the object is still referenced, you're out of luck. You need to make your users manually keep track of the number and call some CloseHandle function when they're done with it.
Really, this isn't that much better than accessing your code from ctypes, so hopefully that inspires you to keep reading.
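A minimal sketch of this first option, assuming the question's IHandle/MethFunc/SetAttribute come from the C library's header (the wrapper names here are made up):

#include <Python.h>
/* #include "mylibrary.h"  -- assumed to declare IHandle, MethFunc, SetAttribute */

static PyObject *
methfunc_as_int(PyObject *self, PyObject *args)
{
    IHandle *h = MethFunc();
    return PyLong_FromSize_t((size_t)h);       /* the handle is now just a number */
}

static PyObject *
setattribute_from_int(PyObject *self, PyObject *args)
{
    Py_ssize_t raw;
    if (!PyArg_ParseTuple(args, "n", &raw))    /* any integer is accepted... */
        return NULL;
    SetAttribute((IHandle *)raw, 's');         /* ...so passing 42 segfaults here */
    Py_RETURN_NONE;
}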
A better solution is to cast the IHandle * to a void *, then PyCapsule_New it.
If you haven't read about capsules, you need to at least skim the main chapter. But the basic idea is that it wraps up a void* as a Python object.
So, it's almost as simple as passing around numbers, but solves most of the problems. Capsules are opaque values which your Python users can't accidentally do arithmetic on; they can't send you 42 in place of a capsule; you can attach a function that gets called when the last reference to a capsule goes away; you can even give it a nice name to show up in the repr.
But you still can't attach any behavior to capsules.
So, your API will still have to be a MethSetAttribute(mymodule, foo) instead of mymeth.SetAttribute(foo) if mymodule is a capsule, just as if it's an int. (Except now it's type-safe.)
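A hedged sketch of the capsule version (again, the wrapper and capsule names are invented; the destructor runs when the last Python reference to the capsule goes away):

#include <Python.h>
/* #include "mylibrary.h"  -- assumed to declare IHandle, MethFunc, SetAttribute, CloseHandle */

static void
ihandle_destructor(PyObject *capsule)
{
    IHandle *h = (IHandle *)PyCapsule_GetPointer(capsule, "mymodule.IHandle");
    if (h != NULL)
        CloseHandle(h);                        /* whatever cleanup the C library expects */
}

static PyObject *
methfunc_as_capsule(PyObject *self, PyObject *args)
{
    return PyCapsule_New(MethFunc(), "mymodule.IHandle", ihandle_destructor);
}

static PyObject *
setattribute_from_capsule(PyObject *self, PyObject *args)
{
    PyObject *capsule;
    if (!PyArg_ParseTuple(args, "O", &capsule))
        return NULL;
    IHandle *h = (IHandle *)PyCapsule_GetPointer(capsule, "mymodule.IHandle");
    if (h == NULL)
        return NULL;                           /* not our capsule: exception already set */
    SetAttribute(h, 's');
    Py_RETURN_NONE;
}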
Finally, you can build a new Python extension type for a struct that contains an IHandle *.
This is a lot more work. And if you haven't read the tutorial on Defining Extension Types, you need to go thoroughly read through that whole chapter.
But it means that you have an actual Python type, with everything that goes with it.
You can give it a SetAttribute method, and Python code can just call that method. You can give it whatever __str__ and __repr__ you want. You can give it a __doc__. Python code can do isinstance(mymodule, MyMeth). And so on.
If you're willing to use C++, or D, or Rust instead of C, there are some great libraries (PyCxx, boost::python, Pyd, rust-python, etc.) that can do most of the boilerplate for you. You just declare that you want a Python class and how you want its attributes and methods bound to your C attributes and methods, and you get something you can use like a C++ class, except that it's actually a PyObject * under the covers. (And it'll even take care of all the refcounting cruft for you via RAII, which will save you endless weekends debugging segfaults and memory leaks…)
Or you can use Cython, which lets you write C extension modules in a language that's basically Python, but extended to interface with C code. So your wrapper class is just a class, but with a special private cdef attribute that holds the IHandle *, and your SetAttribute(self, s) can just call the C SetAttribute function with that private attribute.
Or, as suggested by user, you can also use SWIG to generate the C bindings for you. For simple cases, it's pretty trivial: just feed it your C API, and it gives you back the code to build your Python .so. For less simple cases, I personally find it a lot more painful than something like PyCxx, but it definitely has a lower learning curve if you don't already know C++.
How does module loading work in CPython under the hood? Especially, how does the dynamic loading of extensions written in C work? Where can I learn about this?
I find the source code itself rather overwhelming. I can see that trusty ol' dlopen() and friends are used on systems that support them, but without any sense of the bigger picture it would take a long time to figure this out from the source code.
An enormous amount could be written on this topic but as far as I can tell, almost nothing has been - the abundance of webpages describing the Python language itself makes this difficult to search for. A great answer would provide a reasonably brief overview and references to resources where I can learn more.
I'm mostly concerned with how this works on Unix-like systems simply because that's what I know but I am interested in if the process is similar elsewhere.
To be more specific (but also risk assuming too much), how does CPython use the module methods table and initialization function to "make sense" of dynamically loaded C?
TLDR short version bolded.
References to the Python source code are based on version 2.7.6.
Python imports most extensions written in C through dynamic loading. Dynamic loading is an esoteric topic that isn't well documented but it's an absolute prerequisite. Before explaining how Python uses it, I must briefly explain what it is and why Python uses it.
Historically C extensions to Python were statically linked against the Python interpreter itself. This required Python users to recompile the interpreter every time they wanted to use a new module written in C. As you can imagine, and as Guido van Rossum describes, this became impractical as the community grew. Today, most Python users never compile the interpreter once. We simply "pip install module" and then "import module" even if that module contains compiled C code.
Linking is what allows us to make function calls across compiled units of code. Dynamic loading solves the problem of linking code when the decision for what to link is made at runtime. That is, it allows a running program to interface with the linker and tell the linker what it wants to link with. For the Python interpreter to import modules with C code, this is what's called for. Writing code that makes this decision at runtime is quite uncommon and most programmers would be surprised that it's possible. Simply put, a C function has an address, it expects you to put certain data in certain places, and it promises to have put certain data in certain places upon return. If you know the secret handshake, you can call it.
The challenge with dynamic loading is that it's incumbent upon the programmer to get the handshake right and there are no safety checks. At least, they're not provided for us. Normally, if we try to call a function name with an incorrect signature we get a compile or linker error. With dynamic loading we ask the linker for a function by name (a "symbol") at runtime. The linker can tell us if that name was found but it can't tell us how to call that function. It just gives us an address - a void pointer. We can try to cast to a function pointer of some sort but it is solely up to the programmer to get the cast correct. If we get the function signature wrong in our cast, it's too late for the compiler or linker to warn us. We'll likely get a segfault after the program careens out of control and ends up accessing memory inappropriately. Programs using dynamic loading must rely on pre-arranged conventions and information gathered at runtime to make proper function calls. Here's a small example before we tackle the Python interpreter.
File 1: main.c
/* gcc-4.8 -o main main.c -ldl */
#include <dlfcn.h> /* key include, also in Python/dynload_shlib.c */
/* used for cast to pointer to function that takes no args and returns nothing */
typedef void (say_hi_type)(void);
int main(void) {
/* get a handle to the shared library dyload1.so */
void* handle1 = dlopen("./dyload1.so", RTLD_LAZY);
/* acquire function ptr through string with name, cast to function ptr */
say_hi_type* say_hi1_ptr = (say_hi_type*)dlsym(handle1, "say_hi1");
/* dereference pointer and call function */
(*say_hi1_ptr)();
return 0;
}
/* error checking normally follows both dlopen() and dlsym() */
File 2: dyload1.c
/* gcc-4.8 -o dyload1.so dyload1.c -shared -fpic */
/* compile as C, C++ does name mangling -- changes function names */
#include <stdio.h>
void say_hi1() {
puts("dy1: hi");
}
These files are compiled and linked separately but main.c knows to go looking for ./dyload1.so at runtime. The code in main assumes that dyload1.so will have a symbol "say_hi1". It gets a handle to dyload1.so's symbols with dlopen(), gets the address of a symbol using dlsym(), assumes it is a function that takes no arguments and returns nothing, and calls it. It has no way to know for sure what "say_hi1" is -- a prior agreement is all that keeps us from segfaulting.
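For completeness, the error checking mentioned in the comment above would look roughly like this with dlerror() (same toy names as before):

#include <dlfcn.h>
#include <stdio.h>

typedef void (say_hi_type)(void);

int main(void)
{
    void *handle = dlopen("./dyload1.so", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    dlerror();                                   /* clear any stale error before dlsym */
    say_hi_type *say_hi1_ptr = (say_hi_type *)dlsym(handle, "say_hi1");
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        return 1;
    }
    (*say_hi1_ptr)();
    dlclose(handle);
    return 0;
}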
What I've shown above is the dlopen() family of functions. Python is deployed on many platforms, not all of which provide dlopen() but most have similar dynamic loading mechanisms. Python achieves portable dynamic loading by wrapping the dynamic loading mechanisms of several operating systems in a common interface.
This comment in Python/importdl.c summarizes the strategy.
/* ./configure sets HAVE_DYNAMIC_LOADING if dynamic loading of modules is
supported on this platform. configure will then compile and link in one
of the dynload_*.c files, as appropriate. We will call a function in
those modules to get a function pointer to the module's init function.
*/
As referenced, in Python 2.7.6 we have these dynload*.c files:
Python/dynload_aix.c Python/dynload_beos.c Python/dynload_hpux.c
Python/dynload_os2.c Python/dynload_stub.c Python/dynload_atheos.c
Python/dynload_dl.c Python/dynload_next.c Python/dynload_shlib.c
Python/dynload_win.c
They each define a function with this signature:
dl_funcptr _PyImport_GetDynLoadFunc(const char *fqname, const char *shortname,
const char *pathname, FILE *fp)
These functions contain the different dynamic loading mechanisms for different operating systems. The mechanism for dynamic loading on Mac OS newer than 10.2 and most Unix(-like) systems is dlopen(), which is called in Python/dynload_shlib.c.
Skimming over dynload_win.c, the analogous function for Windows is LoadLibraryEx(). Its use looks very similar.
At the bottom of Python/dynload_shlib.c you can see the actual call to dlopen() and to dlsym().
handle = dlopen(pathname, dlopenflags);
/* error handling */
p = (dl_funcptr) dlsym(handle, funcname);
return p;
Right before this, Python composes the string with the function name it will look for. The module name is in the shortname variable.
PyOS_snprintf(funcname, sizeof(funcname),
LEAD_UNDERSCORE "init%.200s", shortname);
Python simply hopes there's a function called init{modulename} and asks the linker for it. Starting here, Python relies on a small set of conventions to make dynamic loading of C code possible and reliable.
Let's look at what C extensions must do to fulfill the contract that makes the above call to dlsym() work. For compiled C Python modules, the first convention that allows Python to access the compiled C code is the init{shared_library_filename}() function. For a module named spam compiled as a shared library named "spam.so", we might provide this initspam() function:
PyMODINIT_FUNC
initspam(void)
{
PyObject *m;
m = Py_InitModule("spam", SpamMethods);
if (m == NULL)
return;
}
If the name of the init function does not match the filename, the Python interpreter cannot know how to find it. For example, renaming spam.so to notspam.so and trying to import gives the following.
>>> import spam
ImportError: No module named spam
>>> import notspam
ImportError: dynamic module does not define init function (initnotspam)
If the naming convention is violated there's simply no telling if the shared library even contains an initialization function.
The second key convention is that once called, the init function is responsible for initializing itself by calling Py_InitModule. This call adds the module to a "dictionary"/hash table kept by the interpreter that maps module name to module data. It also registers the C functions in the method table. After calling Py_InitModule, modules may initialize themselves in other ways such as adding objects. (Ex: the SpamError object in the Python C API tutorial). (Py_InitModule is actually a macro that creates the real init call but with some info baked in like what version of Python our compiled C extension used.)
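Those "other ways" of initializing can be seen in the tutorial's SpamError example, which extends initspam roughly like this:

static PyObject *SpamError;

PyMODINIT_FUNC
initspam(void)
{
    PyObject *m;
    m = Py_InitModule("spam", SpamMethods);
    if (m == NULL)
        return;
    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);   /* exposes spam.error to Python */
}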
If the init function has the proper name but does not call Py_InitModule(), we get this:
SystemError: dynamic module not initialized properly
Our methods table happens to be called SpamMethods and looks like this.
static PyMethodDef SpamMethods[] = {
{"system", spam_system, METH_VARARGS,
"Execute a shell command."},
{NULL, NULL, 0, NULL}
};
The method table itself and the function signature contracts it entails are the third and final key convention necessary for Python to make sense of dynamically loaded C. The method table is an array of struct PyMethodDef with a final sentinel entry. A PyMethodDef is defined in Include/methodobject.h as follows.
struct PyMethodDef {
const char *ml_name; /* The name of the built-in function/method */
PyCFunction ml_meth; /* The C function that implements it */
int ml_flags; /* Combination of METH_xxx flags, which mostly
describe the args expected by the C func */
const char *ml_doc; /* The __doc__ attribute, or NULL */
};
The crucial part here is that the second member is a PyCFunction. We passed in the address of a function, so what is a PyCFunction? It's a typedef, also in Include/methodobject.h
typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);
PyCFunction is a typedef for a pointer to a function that returns a pointer to a PyObject and takes as arguments two pointers to PyObjects. As a lemma to convention three, C functions registered with the method table all have the same signature.
Python circumvents much of the difficulty in dynamic loading by using a limited set of C function signatures. One signature in particular is used for most C functions. Pointers to C functions that take additional arguments can be "snuck in" by casting to PyCFunction. (See the keywdarg_parrot example in the Python C API tutorial.) Even C functions that back Python functions taking no arguments in Python will take two arguments in C (shown below). All functions are also expected to return something (which may just be the None object). Functions that take multiple positional arguments in Python have to unpack those arguments from a single object in C.
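For reference, the spam_system function registered in the table above (from the C API tutorial) shows that single signature in action: both parameters are PyObject pointers, the real arguments are unpacked from the args tuple, and something is always returned.

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))  /* unpack Python args from one object */
        return NULL;
    sts = system(command);                       /* needs <stdlib.h> */
    return Py_BuildValue("i", sts);              /* always hand back a PyObject* */
}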
That's how the data for interfacing with dynamically loaded C functions is acquired and stored. Finally, here's an example of how that data is used.
The context here is that we're evaluating Python "opcodes", instruction by instruction, and we've hit a function call opcode. (see https://docs.python.org/2/library/dis.html. It's worth a skim.) We've determined that the Python function object is backed by a C function. In the code below we check if the function in Python takes no arguments (in Python) and if so, call it (with two arguments in C).
Python/ceval.c.
if (flags & (METH_NOARGS | METH_O)) {
PyCFunction meth = PyCFunction_GET_FUNCTION(func);
PyObject *self = PyCFunction_GET_SELF(func);
if (flags & METH_NOARGS && na == 0) {
C_TRACE(x, (*meth)(self,NULL));
}
It does of course take arguments in C - exactly two. Since everything is an object in Python, it gets a self argument. You can see that meth is assigned a function pointer which is then dereferenced and called. The return value ends up in x.
I'm trying to compile Lunatic Python on Windows with MinGW. The command is as follows:
gcc.exe -shared -DLUA_BUILD_AS_DLL src\luainpython.c src\pythoninlua.c liblua.a
libpython27.a -IC:\Python27\include -IC:\LUA\include
This gives me undefined reference errors, but I cannot find any Lua API change reference telling me what I should replace these with.
src\luainpython.c:350:14: warning: 'LuaObject_Type' redeclared without dllimport
attribute after being referenced with dll linkage [enabled by default]
C:\Users\Wiz\AppData\Local\Temp\cccm0nAN.o:luainpython.c:(.text+0x7a): undefined
reference to `lua_strlen'
C:\Users\Wiz\AppData\Local\Temp\cccm0nAN.o:luainpython.c:(.text+0x557): undefine
d reference to `_imp__LuaObject_Type'
C:\Users\Wiz\AppData\Local\Temp\cccm0nAN.o:luainpython.c:(.text+0xc3a): undefine
d reference to `luaL_getn'
C:\Users\Wiz\AppData\Local\Temp\cccm0nAN.o:luainpython.c:(.text+0x1036): undefin
ed reference to `luaopen_loadlib'
c:/mingw32/bin/../lib/gcc/i686-w64-mingw32/4.7.1/../../../../i686-w64-mingw32/bi
n/ld.exe: C:\Users\Wiz\AppData\Local\Temp\cccm0nAN.o: bad reloc address 0x0 in s
ection `.data'
collect2.exe: error: ld returned 1 exit status
The original Lunatic-Python codebase has many known problems -- the build issue you're running into above being one of them. Unfortunately, it doesn't seem like the original author is still maintaining this project -- if the last modification date here is any indication.
If you're still trying to get it to work, I would highly recommend going with one of the more recent forks. In particular, the Lunatic-Python fork on GitHub incorporates many of my fixes and improvements.
Getting back to your question, many of the undefined references are due to improper forward declarations in the headers, or because of defined macros that cause the forward declaration to be incorrect. For example, the original luainpython.h contains:
PyAPI_DATA(PyTypeObject) LuaObject_Type;
On Windows, after preprocessing, this expands into:
extern __declspec(dllimport) PyTypeObject LuaObject_Type;
In other words, the linker's going to try and find the definition of LuaObject_Type from an import library. This is of course wrong since that new type is created and implemented by lunatic in luainpython.c. The proper prototype should be extern PyTypeObject LuaObject_Type; instead.
Also note that luaopen_loadlib is deprecated in Lua 5.1, which explains another of the undefined references you're getting. In fact, lunatic-python's use of all of the following is deprecated:
luaopen_base(L);
luaopen_table(L);
luaopen_io(L);
luaopen_string(L);
luaopen_debug(L);
luaopen_loadlib(L);
and should be replaced with this instead:
luaL_openlibs(L);
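The remaining undefined references in the log (lua_strlen, luaL_getn) are old Lua 5.0 names as well; against Lua 5.1 headers they can be mapped onto lua_objlen, roughly like this (a hedged sketch, not part of the original patch):

/* Lua 5.1 compatibility shims for the old 5.0 names used by lunatic-python */
#define lua_strlen(L, i)   lua_objlen(L, (i))
#define luaL_getn(L, i)    ((int)lua_objlen(L, (i)))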