I am trying to make one unix executable file from my python source files.
I have two files, p1.py and p2.py
p1.py :-
from p2 import test_func
print (test_func())
p2.py :-
def test_func():
    return ('Test')
Now, as we can see, p1.py depends on p2.py. I want to make an executable file by combining the two files. I am using Cython.
I changed the file names to p1.pyx and p2.pyx respectively.
Now, I can make the file executable using Cython:
cython p1.pyx --embed
It will generate a C source file called p1.c. Next, we can use gcc to make it executable:
gcc -Os -I /usr/include/python3.5m -o test p1.c -lpython3.5m -lpthread -lm -lutil -ldl
But how do I combine the two files into one executable?
People are tempted to do this because it's fairly easy to do for the simplest case (one module, no dependencies). @ead's answer is good but honestly pretty fiddly, and it handles only the next simplest case (two modules that you have complete control of, no dependencies).
In general a Python program will depend on a range of external modules. Python comes with a large standard library which most programs use to an extent. There's a wide range of third party libraries for maths, GUIs, web frameworks. Even tracing those dependencies through the libraries and working out what you need to build is complicated, and tools such as PyInstaller attempt it but aren't 100% reliable.
When you're compiling all these Python modules you're likely to come across a few Cython incompatibilities/bugs. It's generally pretty good, but struggles with features like introspection, so it's unlikely a large project will compile cleanly and entirely.
On top of that, many of those modules are compiled modules written either in C or using tools such as SWIG, F2Py, Cython, boost-python, etc. These compiled modules may have their own unique idiosyncrasies that make them difficult to link together into one large blob.
In summary, it may be possible, but for non-trivial programs it is not a good idea however appealing it seems. Tools like PyInstaller and Py2Exe that use a much simpler approach (bundle everything into a giant zip file) are much more suitable for this task (and even then they struggle to be really robust).
Note this answer is posted with the intention of making this question a canonical duplicate for this problem. While an answer showing how it might be done is useful, "don't do this" is probably the best solution for the vast majority of people.
There are some hoops you have to jump through to make it work.
First, you must be aware that the resulting executable is a very slim layer which just delegates the whole work to (i.e. calls functions from) libpythonX.Ym.so. You can see this dependency by calling
ldd test
...
libpythonX.Ym.so.1.0 => not found
...
So, to run the program you either need to have LD_LIBRARY_PATH pointing to the location of libpythonX.Ym.so or build the executable with the -rpath option; otherwise, at the start-up of test, the dynamic loader will throw an error similar to
/test: error while loading shared libraries: libpythonX.Ym.so.1.0: cannot open shared object file: No such file or directory
The generic build command would look like the following:
gcc -fPIC <other flags> -o test p1.c -I<path_python_include> -L<path_python_lib> -Wl,-rpath=<path_python_lib> -lpython3.6m <other_needed_libs>
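For example, on a system with CPython 3.6 installed under /usr (the paths below are only an illustration; python3-config --includes and python3-config --ldflags report the right values for your installation), the concrete command could be:
gcc -fPIC -Os -o test p1.c -I/usr/include/python3.6m -L/usr/lib -Wl,-rpath=/usr/lib -lpython3.6m -lpthread -lm -lutil -ldl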
It is also possible to build against the static version of the Python library, thus eliminating the runtime dependency on libpythonX.Ym; see for example this SO-post.
The resulting executable test behaves exactly as a Python interpreter would. This means that test will now fail because it cannot find the module p2.
One simple solution would be to cythonize the p2-module in place (cythonize p2.pyx -i): you would get the desired behavior; however, you would have to distribute the resulting shared object p2.so along with test.
It is easy to bundle both extensions into one executable - just pass both cythonized C files to gcc:
# creates p1.c:
cython --embed p1.pyx
# creates p2.c:
cython p2.pyx
gcc ... -o test p1.c p2.c ...
But now a new (or old) problem arises: the resulting test-executable once again cannot find the module p2, because there is no p2.py and no p2.so on the Python path.
There are two similar SO questions about this problem, here and here. In your case the proposed solutions are somewhat overkill; here it is enough to initialize the p2 module in the p1.pyx-file before it gets imported:
# making init-function from other modules accessible:
cdef extern object PyInit_p2();
#init/load p2-module manually
PyInit_p2() #Cython handles error, i.e. if NULL returned
# actually using already cached imported module
# no search in python path needed
from p2 import test_func
print(test_func())
Calling the init-function of a module prior to importing it (actually the module will not really be imported a second time, only looked up in the cache) also works if there are cyclic dependencies between modules - for example if module p2 imports module p3, which in its turn imports p2.
Warning: Since Cython 0.29, Cython uses multi-phase initialization by default for Python>=3.5, thus calling PyInit_p2 is not enough (see e.g. this SO-post). To switch off this multi-phase initialization, -DCYTHON_PEP489_MULTI_PHASE_INIT=0 should be passed to gcc (or the equivalent to other compilers).
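For example, the combined build from above might then become (the Python version and paths are again only assumptions):
gcc -fPIC -DCYTHON_PEP489_MULTI_PHASE_INIT=0 -o test p1.c p2.c -I/usr/include/python3.6m -L/usr/lib -Wl,-rpath=/usr/lib -lpython3.6m -lpthread -lm -lutil -ldl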
Note: However, even after all of the above, the embedded interpreter will need its standard libraries (see for example this SO-post) - there is much more work to do to make it truly standalone! So maybe one should heed @DavidW's advice:
"don't do this" is probably the best solution for the vast majority of
people.
A word of warning: if we declare PyInit_p2() as
from cpython cimport PyObject
cdef extern PyObject *PyInit_p2();
PyInit_p2(); # TODO: error handling if NULL is returned
Cython will no longer handle the errors, and it becomes our responsibility. Instead of
PyObject *__pyx_t_1 = NULL;
__pyx_t_1 = PyInit_p2(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
which is produced for the object-version, the generated code becomes just:
(void)(PyInit_p2());
i.e. no error checking!
On the other hand using
cdef extern from *:
"""
PyObject *PyInit_p2(void);
"""
object PyInit_p2()
will not work with g++ - one has to add extern "C" to the declaration.
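One possible (here untested) way to keep the declaration working for both gcc and g++ is to guard it with the usual preprocessor check:
cdef extern from *:
    """
    #ifdef __cplusplus
    extern "C"
    #endif
    PyObject *PyInit_p2(void);
    """
    object PyInit_p2()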
Related
I have a project which involves some (fairly simple) C-code generation as part of a build system. In essence, I have some version information associated with a project (embedded C) which I want to expose in my binary, so that I can easily determine what firmware version was programmed to a particular device for debugging purposes.
I'm writing some simplistic python tools to do this, and I want to make sure they're thoroughly tested. In general, this has been fairly straightforward, but I'm unsure what the best strategy is for the code-generation portion. Essentially, I want to make sure that the generated files both:
Are syntactically correct
Contain the necessary information
The second, I can (I believe) achieve to a reasonable degree with regex matching. The first, however, is something of a bigger task. I could probably use something like pycparser and examine the resulting AST to accomplish both goals, but that seems like an unnecessarily heavyweight solution.
Edit: A dataflow diagram of my build hierarchy
Thanks for the diagram! Since you are not testing for coverage, if it were me, I would just compile the generated C code and see if it worked :) . You didn't mention your toolchain, but in a Unix-like environment, gcc <whatever build flags> -c generated-file.c || echo 'Oops!' should be sufficient.
Now, it may be that the generated code isn't a freestanding compilation unit. No problem there: write a shim. Example shim.c:
#include <stdio.h>
#include "generated-file.c"
int main(void) {
    printf("%s\n", GENERATED_VERSION); //or whatever is in generated-file.c
    return 0;
}
Then gcc -o shim shim.c && diff <(./shim) "name of a file holding the expected output" || echo 'Oops!' should give you a basic test. (The <() is bash process substitution.) The file holding the expected results may already be in your git repo, or you might be able to use your Python routine to write it to disk somewhere.
Edit 2 This approach can work even if your actual toolchain isn't amenable to automation. To test syntactic validity of your code, you can use gcc even if you are using a different compiler for your target processor. For example, compiling with gcc -ansi will disable a number of GNU extensions, which means code that compiles with gcc -ansi is more likely to compile on another compiler than is code that compiles with full-on, GNU-extended gcc. See the gcc page on "C Dialect Options" for all the different flavors you can use (ditto C++).
Edit Incidentally, this is the same approach GNU autoconf uses: write a small test program to disk (autoconf calls it conftest.c), compile it, and see if the compilation succeeded. The test program is (preferably) the bare minimum necessary to test if everything is OK. Depending on how complicated your Python is, you might want to test several different aspects of your generated code with respective, different shims.
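If you would rather drive this from your Python test suite than from a shell script, a minimal sketch (assuming pytest, gcc on the PATH and the shim above; file names and the expected output are made up) could be:
import subprocess

def test_generated_file(tmp_path):
    exe = tmp_path / "shim"
    # a compilation failure raises CalledProcessError and therefore fails the test
    subprocess.run(["gcc", "-o", str(exe), "shim.c"], check=True)
    # run the shim and compare its output to the expected version string
    result = subprocess.run([str(exe)], check=True, capture_output=True, text=True)
    assert result.stdout == "1.2.3\n"  # example value only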
I need more performance running my neural network, so I thought that building it with Cython would be a good idea. I am building my code like this:
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules = cythonize("my_code.pyx")
)
But will it build external python files that I use? Like pybrain, skimage and PIL in my case.
If not, how can I force Cython to build them?
No, external python files will not be cythonized and compiled unless you specifically add them to your setup.py as an extension. As far as I know there is no trivial way to do this.
This means that all calls to the external files will be handled in 'Python-space' and hence cannot use the full potential of Cython. For example, all calls to an external file will be type-checked, which wastes a lot of time. You can see this if you cythonize a file using cython -a yourfile.pyx and take a look at the created C code: the more yellow there is, the more 'Python-like' your code is.
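To illustrate, compare an untyped function with a typed one (a toy example, not taken from your code): the first shows up almost entirely yellow in the cython -a report, while the loop in the second compiles down to plain C and stays mostly white.
# untyped: every addition goes through Python objects
def py_sum(values):
    total = 0
    for v in values:
        total += v
    return total

# typed: the loop body needs no Python API calls
def c_sum(double[:] values):
    cdef double total = 0
    cdef Py_ssize_t i
    for i in range(values.shape[0]):
        total += values[i]
    return total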
You have the following options:
Find libraries / packages that offer Cython or C-level access. Unfortunately, chances are low that you will find good ones (or any at all) using Cython, and building a wrapper for a C library is a lot of work. Note that packages that are themselves implemented in C (like numpy, for example) are already reasonably fast. I do not know how this plays out with the packages in question; pybrain seems to be pure Python from what I saw at first glance.
Get the source code of the packages you want to use and compile them yourself with Cython. This might be an awful lot of work and not worth the time.
Find the bottlenecks using a profiler like line_profiler / kernprof (this should always be the first step when optimizing) and try to cythonize only the runtime bottlenecks.
I personally would go with option three, as options one and two both might require a lot of work on your side with questionable outcome.
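For option three, a minimal sketch with line_profiler/kernprof could look like this (the function name is made up; the profile decorator is injected by kernprof at run time, so no import is needed):
# my_script.py
@profile
def forward_pass(inputs):
    # ... the code you suspect to be the bottleneck ...
    return [x * 2 for x in inputs]

if __name__ == "__main__":
    forward_pass(range(1000))
Run it with kernprof -l -v my_script.py and cythonize only the lines that dominate the per-line timings.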
I am trying to embed a piece of Cython code in a C++ project, such that I can compile a binary that has no dependencies on Python 2.7 (so users can run the executable without having Python installed). The Cython source is not pure Cython: There is also Python code in there.
I am compiling my Cython code using distutils in the following script (setup.py):
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules = cythonize("test.pyx")
)
I then run the script using python setup.py build_ext --inplace. This generates a couple of files: test.c, test.h, test.pyd and some library files: test.exp, test.obj and test.lib.
What would be the proper procedure to import this into C++? I managed to get it working by including test.c and test.h during compilation and test.lib during linking.
I am then able to call the Cython functions after I issue
Py_Initialize();
inittest();
in my C++ code.
The issue is that there are numerous dependencies on Python, both during compilation (e.g., in test.h) as well as during linking. The bottom line is that in order to run the executable, Python has to be installed (otherwise I get errors about a missing python27.dll).
Am I going in the right direction with this approach? There are so many options that I am just very confused about how to proceed. Conceptually, it also does not make sense why I should call Py_Initialize() if I want the whole thing to be Python-independent. Furthermore, this is apparently the 'Very High Level Embedding' method instead of a low-level Cython embedding, but this is just how I got it to work.
If anybody has any insights on this, that would be really appreciated.
Cython cannot make Python code Python-independent; it calls into the Python library in order to handle Python types and function calls. If you want your program to be Python-independent then you should not write any Python code.
(This is primarily extra detail to
Ignacio Vazquez-Abrams's answer which says that you can't eliminate the Python dependency)
If you don't want to force your users to have Python installed themselves, you could always bundle python27.dll with your application (read the license agreement, but I'm almost certain it's fine!).
However, as soon as you do an import in your code, you either have to bundle the relevant module, or make sure it (and anything it imports!) is compiled with Cython. Unless you're doing something very trivial, you could end up spending a lot of time chasing dependencies. This includes the majority of the standard library.
I'm working with Python, but I have a basic understanding of packaging with C. However, I don't know how to build the C 'path'. Also, my Google searches seem to be failing me, returning results on C++. Or is that my solution?
The objective is to include qrencode.h. I can surely put it in the same folder, but I'd like to know how to link to it instead.
Thanks!
PS. As always, addition to read material that is relevant would be much appreciated!
You use an include directive to include the *.h file in your C/C++ code:
#include "qrencode.h"
As @Ignacio Vazquez-Abrams says, though, that's just a header, which declares functions; you need the actual functions, and they'll be in a *.dylib or *.so file, which needs to be linked into an executable. Compiling is turning one *.c file into a *.o file; linking is when you put all the *.o files and libraries together into an application. The -L option on the linker command line tells it where to look for libraries; the -l option tells it to include a library.
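If you are building your extension with distutils/setuptools, the same -I/-L/-l information is passed via include_dirs, library_dirs and libraries. A minimal sketch (the module name and the paths are placeholders for illustration):
from distutils.core import setup, Extension

setup(
    name="myqr",
    ext_modules=[
        Extension(
            "myqr",                               # extension module name (placeholder)
            sources=["myqr.c"],                   # your C source that does #include "qrencode.h"
            include_dirs=["/usr/local/include"],  # where qrencode.h lives (-I)
            library_dirs=["/usr/local/lib"],      # where libqrencode lives (-L)
            libraries=["qrencode"],               # link against libqrencode (-l)
        )
    ],
)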
I'm trying to compile python source code foo.py to C using cython.
In foo.py:
print "Hello World"
The command I'm running is cython foo.py.
The problem is that when compiling foo.c using gcc, I get the error:
undefined reference to 'main'.
When converting the code from Python to C, Cython produces C code that can be compiled into a shared object.
In order to make it executable, you should add "--embed" to the Cython conversion command. This flag adds the 'main' function you need, so you can compile the C code into an executable file.
Please note that you'll need the Python .so runtime libraries in order to run the executable.
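A minimal sketch of the whole flow (the include path and library name below are assumptions for a Python 2.7 system; adjust them to your installation, and note that newer Cython versions may need the -2 flag for the Python 2 print statement):
cython --embed foo.py
gcc -I/usr/include/python2.7 -o foo foo.c -lpython2.7 -lpthread -lm -lutil -ldl
./foo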
Read the Cython documentation. This will also (hopefully) teach you what Cython is and what it isn't. Cython is for creating python extensions (not a general-purpose Python-to-C-compiler), which are shared objects/dlls. Dynamically loaded libraries don't have a main function like standalone programs, but compilers assume that they are ultimately linking an executable. You have to tell them otherwise via flags (-shared methinks, but again, refer to the Cython documentation) - or even better, don't compile yourself, use a setup.py for this (yet again, read the Cython documentation).
The usual way is to use distutils to compile the cython-generated file. This also gives you all the include directories you need in a portable way.