Embedding Cython in C++

I am trying to embed a piece of Cython code in a C++ project, such that I can compile a binary that has no dependencies on Python 2.7 (so users can run the executable without having Python installed). The Cython source is not pure Cython: There is also Python code in there.
I am compiling my Cython code using distutils in the following script (setup.py):
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize("test.pyx")
)
I then run the script using python setup.py build_ext --inplace. This generates a couple of files: test.c, test.h, test.pyd and some library files: test.exp, test.obj and test.lib.
What would be the proper procedure to import this into C++? I managed to get it working by including test.c and test.h during compilation and test.lib during linking.
I am then able to call the Cython functions after I issue
Py_Initialize();
inittest();
in my C++ code.
The issue is that there are numerous dependencies on Python, both during compilation (e.g., in test.h) and during linking. The bottom line is that Python has to be installed in order to run the executable (otherwise I get errors about a missing python27.dll).
Am I going in the right direction with this approach? There are so many options that I am very confused about how to proceed. Conceptually, it also does not make sense that I should have to call Py_Initialize() if I want the whole thing to be Python-independent. Furthermore, this is apparently the 'Very High Level Embedding' method instead of a low-level Cython embedding, but this is just how I got it to work.
If anybody has any insights on this, that would be really appreciated.

Cython cannot make Python code Python-independent; it calls into the Python library in order to handle Python types and function calls. If you want your program to be Python-independent then you should not write any Python code.

(This is primarily extra detail to Ignacio Vazquez-Abrams's answer, which says that you can't eliminate the Python dependency.)
If you don't want to force your users to have Python installed themselves, you could always bundle python27.dll with your application (read the license agreement, but I'm almost certain it's fine!).
However, as soon as you do an import in your code, you either have to bundle the relevant module or make sure it (and anything it imports!) is compiled with Cython. This includes the majority of the standard library. Unless you're doing something very trivial, you could end up spending a lot of time chasing dependencies.
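As a rough illustration of what bundling Python involves, the sketch below asks the running interpreter where its headers and standard library live; these are the pieces an embedding executable ends up depending on. The function name embedding_requirements is made up for this example, and the paths describe the interpreter that runs the script, which is not necessarily the one your C++ project links against.

```python
import sysconfig


def embedding_requirements():
    """Report where the running interpreter keeps the files an embedder needs.

    Hypothetical helper for illustration only.
    """
    paths = sysconfig.get_paths()
    return {
        # Directory that contains Python.h, needed at compile time
        "include_dir": paths["include"],
        # Pure-Python standard library, needed at run time for most imports
        "stdlib_dir": paths["stdlib"],
        # Major.minor version, which determines the runtime library name
        "version": sysconfig.get_python_version(),
    }


if __name__ == "__main__":
    for key, value in embedding_requirements().items():
        print(f"{key}: {value}")
```

Anything listed under stdlib_dir that your code imports has to ship with the application or be compiled in, which is where the dependency chasing starts.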

Related

How to compile my python code in cython with external python libs like pybrain

I need more performance when running my neural network, so I thought that building it with Cython would be a good idea. I am building my code like this:
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize("my_code.pyx")
)
But will it build the external Python files that I use, like pybrain, skimage and PIL in my case?
If not, how can I force Cython to build them?

No, external python files will not be cythonized and compiled unless you specifically add them to your setup.py as an extension. As far as I know there is no trivial way to do this.
This means that all calls to the external files will be handled in 'Python-space' and hence can not use the full potential of Cython. For example all calls to an external file will be type checked, which wastes a lot of time. You can see this if you cythonize a file using cython -a yourfile.pyx and take a look at the created C code. The more yellow there is the more pythony your code is.
You have the following options:
1. Find libraries / packages that offer Cython or C-level access. Unfortunately, chances are low that you will find good ones (or any at all) using Cython, and building a wrapper for a C library is a lot of work. Note that packages that are themselves implemented in C (like numpy, for example) are already reasonably fast. I do not know how this applies to the packages in question; pybrain seems to be pure Python from what I saw at first glance.
2. Get the source code of the packages you want to use and compile them yourself with Cython. This might be an awful lot of work and not worth the time.
3. Find the bottlenecks using a profiler like line_profiler / kernprof (this should always be the first step when optimizing) and try to cythonize only the runtime bottlenecks.
I personally would go with option three, as options one and two might both require a lot of work on your side with a questionable outcome.
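A minimal sketch of the profiling step, assuming the standard library's cProfile as a stand-in for line_profiler / kernprof (which need a separate install and report per line rather than per function). The functions slow_sum, fast_path and workload are invented here purely to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately slow pure-Python loop: the kind of code worth cythonizing
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_path(n):
    # Cheap call that should barely register in the profile
    return n + 1


def workload():
    fast_path(10)
    return slow_sum(200_000)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

Functions that dominate the cumulative time column are the candidates to move into a .pyx file; everything else can stay as ordinary Python.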

How to install python binding of a C++ library

Imagine that we are given the finished C++ source code of a library called MyAwesomeLib. The goal is to expose some of its power to Python, so we create a wrapper using SWIG and generate a Python package called PyMyAwesomeLib.
The directory structure now looks like
root_dir
|-src/
|-lib/
| |- libMyAwesomeLib.so
| |- _PyMyAwesomeLib.so
|-swig/
| |- PyMyAwesomeLib.py
|-python/
  |- Script_using_myawesomelib.py
So far so good. Ideally, all we want to do next is to copy lib/*.so swig/*.py and python/*.py into the corresponding directory in site-packages in a pythonic way, i.e. using
python setup.py install
However, I got very confused when trying to achieve this simple goal using setuptools and distutils. Both tools handle the compilation of Python extensions through an internal system, where the source files, compiler flags etc. are passed using setup(ext_modules=[Extension(...)]). But this is ridiculous, since MyAwesomeLib has a fully functioning build system that is based on makefiles. Porting the logic embedded in the makefiles would be redundant and completely unnecessary work.
After some research, it seems there are two options left: I can either override setuptools.command.build and setuptools.command.install to use the existing makefile and copy the results directly, or I can somehow let setuptools know about these files and ask it to copy them during installation. The second way is more appealing, but it is what gives me the most headaches. I have tried the following options without success:
package_data and include_package_data do not work because the *.so files are not under version control and are not inside any package.
data_files does not seem to work, since the files only get included when running python setup.py sdist but are ignored when running python setup.py install. This is the opposite of what I want. The .so files should not be included in the source distribution, but should get copied during the installation step.
MANIFEST.in failed for the same reason as data_files.
eager_resources does not work either, but honestly I do not know the difference between eager_resources and data_files or MANIFEST.in.
I think this is actually a common situation, and I hope there is a simple solution to it. Any help would be greatly appreciated.

"Porting the logic embedded in makefiles would be redundant and completely unnecessary work."
Unfortunately, that's exactly what I had to do. I've been struggling with this same issue for a while now.
Porting it over actually wasn't too bad. distutils does understand SWIG extensions, but this was implemented rather haphazardly on their part. Running SWIG creates Python files, and the current build order assumes that all Python files have been accounted for before running build_ext. That one wasn't too hard to fix, but it's annoying that they would claim to support SWIG without mentioning this. Distutils attempts to be cross-platform when compiling things, so there is still an advantage to using it.
If you don't want to port your entire build system over, use the system's package manager. Many complex libraries do this (but they also try their best with setup.py). For example, to get numpy and lxml on Ubuntu you'd just do:
sudo apt-get install python-numpy python-lxml. No pip.
I realize you'd rather write one setup file instead of dealing with every package manager ever so this is probably not very helpful.
If you do try to go the setuptools route there is one fatal flaw I ran into: dependencies.
For instance, if you are distributing a SWIG-based project, it's going to need libpython. If they don't have it, an error like this happens:
#include <Python.h>
error: File not found
That's pretty unhelpful to the average user.
Even worse, if you require a shared library but the user's library is out of date, the user can get some crazy errors. You're at the mercy of their C++ compiler to output Google-friendly error messages so they can figure it out.
The long-term solution would be to get setuptools/distutils to get better at detecting non-python libraries, hopefully as good as Ruby's gem. I pretty much had to roll my own. For instance, in this setup.py I'm working on you can see a few functions at the top I hacked together for dependency detection (still doesn't work on all systems...definitely not Windows).
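A small sketch of that kind of dependency detection, using only the standard library. This is not the code from the setup.py mentioned above; the helper names are hypothetical, and ctypes.util.find_library relies on platform tools, so a library can be reported missing even when a compiler could still find it.

```python
import ctypes.util
import os
import sysconfig


def has_python_headers():
    """Return True if Python.h exists for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


def check_dependencies(libraries):
    """Return a list of human-readable problems; empty means all found."""
    problems = []
    if not has_python_headers():
        problems.append("Python development headers (Python.h) not found")
    for name in libraries:
        # find_library takes the bare name, e.g. "z" for libz.so
        if ctypes.util.find_library(name) is None:
            problems.append(f"shared library '{name}' not found")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies(["m", "MyAwesomeLib"]):
        print("missing:", problem)
```

Running a check like this at the top of setup.py lets the install fail with a readable message instead of a raw compiler error.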

Better Way of Debugging Cython Packages

I currently use Cython to build a module that is mostly written in C. I would like to be able to debug quickly by simply calling a python file that imports the "new" Cython module and test it. The problem is that I import GSL and therefore pyximport will not work. So I'm left with "python setup.py build; python setup.py install" and then running my test script.
Is this the only way? I was wondering if anyone else uses any shortcuts or scripts to help them debug faster?

I usually just throw all the commands I need to build and test into a shell script, and run it when I want to test. It's a lot easier than futzing with crazy Python test runners.
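The same idea can be written as a small Python driver instead of a shell script. This is only a sketch: the test script name test_module.py is a placeholder, and build_ext --inplace is assumed so that the freshly built module can be imported from the working directory without an install step.

```python
import subprocess
import sys


def build_and_test_commands(test_script="test_module.py"):
    """Return the commands to rebuild the extension and run the tests."""
    return [
        # Rebuild the Cython extension next to its sources
        [sys.executable, "setup.py", "build_ext", "--inplace"],
        # Then run the test script against the fresh build
        [sys.executable, test_script],
    ]


def run_all(commands, runner=subprocess.check_call):
    """Run each command in order; check_call stops at the first failure."""
    for command in commands:
        runner(command)


if __name__ == "__main__":
    run_all(build_and_test_commands())
```

Because check_call raises on a non-zero exit status, a failed build never reaches the test step.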

How to compile a Python package to a dll

Well, I have a Python package. I need to compile it to a DLL before distributing it, in a way that keeps it easily importable. How? You may suggest *.pyc, but I read somewhere that any *.pyc can be easily decompiled!
Update:
To summarize:
1) I wrote a Python package
2) I want to distribute it
3) I do NOT want to distribute the source
4) *.pyc is decompilable >> the source can be extracted!
5) dll is standard

Write everything you want to hide in Cython, and compile it to pyd. That's as close as you can get to making compiled python code.
Also, dll is not a standard, not in Python world. They're not portable, either.

Nowadays a simple solution exists: use the Nuitka compiler, as described in the Nuitka User Manual:
Use Case 2 - Extension Module compilation
If you want to compile a single extension module, all you have to do is this:
python -m nuitka --module some_module.py
The resulting file some_module.so can then be used instead of some_module.py.
You need to compile for each platform you want to support and write some initialization code to import so/pyd file ~~appropriate for given platform/python version etc.~~
[EDIT 2021-12]: Actually in python 3 the proper so/dll is determined automatically based on the file name (if it includes python version and platform - can't find PEP for this feature at the moment but Nuitka creates proper names for compiled modules). So for python 2.7 the library name would be something.pyd or something.so whereas for python 3 this would change to something.cp36-win32.pyd (32-bit Python 3.6 on Windows) or something.cpython-36m-x86_64-linux-gnu.so (64-bit Python 3.6 on Linux).
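To see which file-name suffixes the running interpreter will accept for compiled modules, you can ask the import machinery directly. The two function names below are only for illustration; the values they return depend on your platform and Python build.

```python
import importlib.machinery
import sysconfig


def extension_suffixes():
    """All suffixes the import system recognizes for extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


def preferred_suffix():
    """The fully tagged suffix that build tools use by default (may be None)."""
    return sysconfig.get_config_var("EXT_SUFFIX")


if __name__ == "__main__":
    print("accepted:", extension_suffixes())
    print("preferred:", preferred_suffix())
```

A compiled module named with any of the accepted suffixes is picked up by a plain import statement, so no hand-written platform dispatch is needed.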
The result is not DLL as requested but Python-native compiled binary format (it is not bytecode like in pyc files; the so/pyd format cannot be easily decompiled - Nuitka compiles to machine code through C++ translation)
EDIT [2020-01]: The compiled module is still open to inspection using Python's standard mechanisms, e.g. it can be imported like any other module and have its methods listed. Securing the implementation from being exposed that way takes more work than just compiling to a binary module.
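The following sketch shows the kind of inspection meant here. It uses the standard library's math module as a stand-in for a Nuitka- or Cython-compiled module, on the assumption that any compiled extension module exposes its public names in the same way.

```python
import inspect
import math


def public_callables(module):
    """List the names of all public callables a module exposes."""
    return sorted(
        name
        for name, _value in inspect.getmembers(module, callable)
        if not name.startswith("_")
    )


if __name__ == "__main__":
    # math is implemented in C, yet its function names are fully visible
    print(public_callables(math))
```

Compilation hides the implementation of each function, but not the module's public interface.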

You can use py2exe.org to convert Python scripts into Windows executables. Granted, this will only work on Windows, but it's better than nothing.

You can embed python inside C. The real trick is converting between C values and Python values. Once you've done that, though, making a DLL is pretty straightforward.
However, why do you need to make a dll? Do you need to use this from a non-python program?

Python embedding is supported in CFFI version 1.5, you can create a .dll file which can be used by a Windows C application.

I would also use Cython to generate pyd files, like Dikei wrote.
But if you really want to secure your code, you should write the important stuff in C++. The best approach would be to combine C++ and Python. The idea: you leave the Python code open for adjustments, so that you don't have to compile everything over and over again. That means you write the "core" in C++ (which is the most secure solution these days) and use those dll files in your Python code. It really depends on what kind of tool or program you are building and how you want to execute it. I mostly create an executable file (exe, app) once I finish a tool or a program, but this is more for the end user. This can be done with py2exe and py2app (both 64-bit compatible). If you bundle the interpreter, the end user's machine doesn't need to have Python installed.
A pyd file is the same as a dll and is fully supported inside Python, so you can import your module normally. You can find more information about it here.
Using and generating pyd files is the fastest and easiest way to create safe and portable python code.
You could also write real dll files in C++ and import them with ctypes to use them (here a good post and here the python description of how it works)
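As a minimal ctypes sketch, the example below calls strlen from the C runtime rather than from a DLL you wrote yourself; loading your own library works the same way by passing its path to ctypes.CDLL. The helper names are made up, and the msvcrt name used on Windows is an assumption about which C runtime is available there.

```python
import ctypes
import sys


def load_c_runtime():
    """Load the C runtime library of the current platform."""
    if sys.platform.startswith("win"):
        # Assumption: the classic Microsoft C runtime is present
        return ctypes.CDLL("msvcrt")
    # On POSIX, None gives access to symbols already loaded in the process
    return ctypes.CDLL(None)


def c_strlen(text):
    """Call the C function strlen on the UTF-8 encoding of text."""
    lib = load_c_runtime()
    # Declaring argument and return types avoids silent truncation
    lib.strlen.argtypes = [ctypes.c_char_p]
    lib.strlen.restype = ctypes.c_size_t
    return lib.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

Declaring argtypes and restype for every function is the main chore when wrapping a larger C++ core this way.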

To expand on the answer by Nick ODell
You must be on Windows for DLLs to work, they are not portable.
However the code below is cross platform and all platforms support run-times so this can be re-compiled for each platform you need it to work on.
Python does not (yet) provide an easy tool to create a dll, however you can do it in C/C++
First you will need a compiler (Windows does not have one by default), such as Cygwin, MinGW or Visual Studio.
A basic knowledge of C is also necessary (since we will be coding mainly in C).
You will also need to include the necessary headers, I will skip this so it does not become horribly long, and will assume everything is set up correctly.
For this demonstration I will print a traditional hello world:
Python code we will be converting to a DLL:
def foo(): print("hello world")
C code:
#include "Python.h" // Includes everything needed to use the Python C API
int foo(void); // Declare foo
int foo(void) { // Name of our function in our DLL
    Py_Initialize(); // Initialise Python
    PyRun_SimpleString("print('hello world')"); // Run the Python commands
    return 0; // Finish execution
}
Here is the tutorial for embedding Python. There are a few extra things that should be added here, but for brevity I have left those out.
Compile it and you should have a DLL. :)
That is not all. You will need to distribute whatever dependencies are needed, that will mean the python36.dll run-time and some other components to run the Python script.
My C coding is not perfect, so if anyone can spot any improvements please comment and I will do my best to fix it.
It might also be possible in C# from this answer How do I call a specific Method from a Python Script in C#?, since C# can create DLLs, and you can call Python functions from C#.

You can use PyInstaller to convert .py files into an executable, with all required packages bundled alongside it as .dll files.
Step 1. Run pip install pyinstaller.
Step 2. Create a new Python file; let's name it code.py.
Step 3. Write some lines of code, e.g. print("Hello World").
Step 4. Open a Command Prompt in the same location, type pyinstaller code.py and hit Enter.
Last step: two folders named build and dist will be created in the same location. Inside the dist folder there is a folder named code, and inside that folder there is an executable, code.exe, along with the required .dll files.

If your only goal is to hide your source code, it is much simpler to just compile your code to an executable (using PyInstaller, for example) and use a module with readable source for communication.
NOTE: You might need more converter functions as shown in this example.
Example:
Module:
import subprocess
import codecs
def _encode_str(text):
    # UTF-32 followed by base64 keeps arbitrary text safe on the command line
    encoded = text.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
def strlen(s: str):  # return the length of s as an int
    proc = subprocess.Popen(["path_to_your_exe.exe", "strlen", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return int(proc.stdout.read())
def random_char_from_string(s):
    proc = subprocess.Popen(["path_to_your_exe.exe", "randchr", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return _decode_str(proc.stdout.read())
Executable:
import sys
import codecs
import random
def _encode_str(text):
    encoded = text.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
command = sys.argv[1]
if command == "strlen":
    s = _decode_str(sys.argv[2].encode("ascii"))
    print(len(s))  # plain integer, parsed by the module with int()
if command == "randchr":
    s = _decode_str(sys.argv[2].encode("ascii"))
    print(_encode_str(random.choice(s)).decode("ascii"))
You might also want to think about compiling different executables for different platforms, if your package isn't a windows-only package anyways.

This is my idea. It might work, but I don't know for sure.
1. Create your *.py files.
2. Rename them to *.pyx.
3. Convert them into *.c files using Cython.
4. Compile the *.c files into *.dll files.
But I don't recommend it, because the result won't work on any platform other than Windows.

Grab Visual Studio Express and IronPython and do it that way? You'll be in Python 2.7.6 world though.

Compiling Python to C using Cython

I'm trying to compile python source code foo.py to C using cython.
In foo.py:
print "Hello World"
The command I'm running is cython foo.py.
The problem is that when compiling foo.c using gcc, I get the error:
undefined reference to 'main'.

When Cython converts the code from Python to C, it produces C code that is meant to be compiled into a shared object.
To make it executable, add "--embed" to the cython command. This flag adds the 'main' function you need, so you can compile the C code into an executable file.
Please note that you'll still need the Python .so runtime libraries in order to run the executable.
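The two steps can be sketched as command lines assembled in Python. Treat this as an assumption-laden outline, not a verified recipe: the gcc flags are derived from sysconfig for the running interpreter, the helper name is invented, and real builds often need the extra linker flags that python3-config reports.

```python
import sysconfig


def embed_build_commands(source="foo.py", output="foo"):
    """Return (cython_cmd, gcc_cmd) for building a standalone executable."""
    c_file = output + ".c"
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR") or ""
    # LDVERSION is unset on some platforms; fall back to major.minor
    version = (sysconfig.get_config_var("LDVERSION")
               or sysconfig.get_python_version())
    # Step 1: generate C code that includes a main() function
    cython_cmd = ["cython", "--embed", source, "-o", c_file]
    # Step 2: compile and link against the Python runtime library
    gcc_cmd = ["gcc", c_file, "-I" + include_dir, "-o", output]
    if lib_dir:
        gcc_cmd.append("-L" + lib_dir)
    gcc_cmd.append("-lpython" + version)
    return cython_cmd, gcc_cmd


if __name__ == "__main__":
    for cmd in embed_build_commands():
        print(" ".join(cmd))
```

The -lpython flag in the second command is the visible form of the runtime dependency mentioned above: the executable still loads the Python library when it starts.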

Read the Cython documentation. This will also (hopefully) teach you what Cython is and what it isn't. Cython is for creating python extensions (not a general-purpose Python-to-C-compiler), which are shared objects/dlls. Dynamically loaded libraries don't have a main function like standalone programs, but compilers assume that they are ultimately linking an executable. You have to tell them otherwise via flags (-shared methinks, but again, refer to the Cython documentation) - or even better, don't compile yourself, use a setup.py for this (yet again, read the Cython documentation).

The usual way is to use distutils to compile the cython-generated file. This also gives you all the include directories you need in a portable way.
