Static linking Python library to C (C++) with Numpy

I'm developing a C++ library that embeds Python. What I would like to do is to statically link Python library, so that there won't be configuration issues, when I switch to production server. So far, I'm able to link libpython3.5m.a statically (I had to build Python from source, though, because it seems that the packaged libraries aren't compiled with the -fPIC flag). However, I ran into a problem: NumPy appears to be missing. When I run an application that uses my library, it fails with this error:
ImportError: numpy.core.multiarray failed to import
This error is raised by the import_array1() macro, which (AFAIK) is used to import the NumPy routines into C++. I tried linking libnpymath.a as well as libnpysort.a, which I found in the NumPy build directory, but to no avail. Do you happen to know whether such static linking is possible, and how to do it? I guess it should be possible, since numpy is written in C...

What I would like to do is to statically link Python library, so that there won't be configuration issues, when I switch to production server.
This would only be the Python core, it would exclude all of the Python libraries. You still need to ship all of the Python code.
...since numpy is written in C...
This is incorrect. NumPy is written about half in C and half in Python. It looks like the C part is the part that's not loading here, since numpy.core.multiarray is written in C, and you wouldn't normally import that yourself, it would normally be imported by the Python part of NumPy.
Linking in the C code is not enough anyway; you also need to load and initialize the associated Python modules exported by the C code. Without static linking, Python would just find the multiarray.so file in the right place and load it. When you build Python statically, you would normally edit the Modules/Setup.local file to list the modules you want statically compiled into Python. However, this is not designed to work with arbitrary third-party modules like NumPy. See: Compile the Python interpreter statically?
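To see the difference between modules compiled into the interpreter and modules loaded from a separate shared library, a small sketch like the following can help. The helper name classify_module is made up for this illustration, and the result for a given module depends on how your Python was built.

```python
import importlib.machinery
import importlib.util
import sys


def classify_module(name):
    """Describe how the import system would provide the top-level module `name`."""
    if name in sys.builtin_module_names:
        # Compiled into the interpreter binary itself (e.g. listed in Modules/Setup).
        return "built-in"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        # A separate shared library that is located on sys.path at import time.
        return "extension file"
    return "other"


for module_name in ("sys", "math", "json"):
    print(module_name, "->", classify_module(module_name))
```

In a regular NumPy installation the multiarray module would be expected to fall into the "extension file" group, which is why linking libpython statically does not bring it along.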
Honestly, if you are just trying to make sure that the same version of Python runs on both development and production systems, there are vastly easier ways to do this, like virtualenv. CPython is simply not designed to be statically linked.

Related

Can Cython compiled .so extensions be imported into other languages, eg. Java?

I'm in the process of learning Cython and I wasn't able to find a direct answer to this. Also please bear with me as my understanding of C is limited as of now. As far as I understand, with the cythonize command, .pyx files are converted to C and compiled to platform-specific libraries (.so / .pyd). My questions are:
If .pyx files are fully converted to C, does it mean that these generated extensions are no longer dependent on python runtime after compilation?
Assuming the same host architecture, can these generated extensions be loaded into other languages, eg. via Java's JNI? If so, are there any hello world examples on this?

Cython extensions are fully C, but they heavily use the Python C API. This means they can't be run independent of libpython (and usually the Python standard library). However it is possible to load libpython into other languages and then use a Cython extension. Also bear in mind that anything you import within Cython is not compiled but needs to be available for import.
I don't plan to answer this fully, but both ImageJ (in Java) and Julia support Python interoperability layers, so Cython extensions would work there. You're much better off searching for "Python-Java interoperability layer" than trying to create your own way of using Cython specifically.

Cython PYD is a DLL, is there an easy way to embed Python into an executable calling it?

I'm trying to embed Python into a C++ .exe using NumPy, Pandas, SciPy, and some compiled Cython PYD files. There are some instructions for embedding Python but these don't really extend to additional libraries: to ensure this works on Windows machines lacking my version of Python, I am using the embedded library for 3.5.1 here which has little documentation on its use: https://docs.python.org/3.5/using/windows.html#embedded-distribution
And before calling Py_Initialize(), I extract the python35.zip file from the embedded distribution and place it in the project directory, then reference it as described here: http://www.myoddweb.com/2016/02/27/embed-python-in-your-c-application-without-python-installed-on-guest-machine/ i.e.
#include <Python.h>
std::wstring python_path;
python_path += L"\\pythonlibs\\python35.zip";
Py_SetPath(python_path.c_str());
Py_Initialize();
For some reason, I cannot get the compiled version to run. I have tried to keep Python separate from my system build. I'm still referencing the Numpy includes from my system path though C:\Anaconda3\Lib\site-packages\numpy\core\include, not sure if that is problematic. All my code is fast because I'm using NumPy MemoryViews and optimized Pandas data manipulation, which I need to keep. Not a MSVC expert here so just looking for anyone who can guide me in the right direction embedding Python. Much appreciated.

Compile a Python application to C

I have made an application in Python. It contains several plugins, organized into different subdirectories. I need to compile entirely to C code to improve security of source code. I have dealt with Cython, but cannot find how to compile the entire directory, with all plugin dependencies. I need a way to compile each of the dependencies to C, and that the application runs from C compiled.
http://docs.cython.org/src/quickstart/build.html
How to compile and link multiple python modules (or packages) using cython?

Python does not compile to native code. Scripts can be "frozen" with a few different tools, which makes them into standalone executables, but it's not actually compiling to C, it's packaging the script (or just its Python byte code representation) with a copy of the interpreter and all of its dependencies; the binary still has all the Python code (or the trivial byte code transform thereof) in it.
Cython lets you compile a syntactic variant of Python into Python C extensions, but they still run on the Python interpreter, and they still expose enough information to reverse the transformation.
Get the proper legal protections in place and freeze your Python executable if you like (freezing is enough to make the source code "non-obvious" even if anyone who went to trivial effort could get it back), but Python does not compile to plain C directly (if it did, I'd expect the CPython reference interpreter to do that more often just to gain performance with the built-in modules, yet they only write C accelerators by hand).
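As a rough illustration of how much survives in byte code, the sketch below compiles a small function, serializes the code object with marshal (the same serialization used for the body of a .pyc file), and reads the names and constants back. The function check_license and the string "hunter2" are invented for this example.

```python
import dis
import marshal

SOURCE = '''
def check_license(key):
    secret_key = "hunter2"
    return key == secret_key
'''

# Compile to a code object, then serialize it the way a .pyc body is stored.
code = compile(SOURCE, "<hidden>", "exec")
blob = marshal.dumps(code)

# Someone holding only the blob can load it back and inspect it.
recovered = marshal.loads(blob)
function_code = next(c for c in recovered.co_consts if hasattr(c, "co_name"))

print(function_code.co_name)      # name of the function
print(function_code.co_varnames)  # argument and local variable names
print(function_code.co_consts)    # literal constants used in the body
dis.dis(function_code)            # readable instruction listing
```

Function names, variable names and string literals all come back unchanged, which is what decompilers build on.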

import pygr into jython failing on C library

I am trying to import pygr:
It fails on:
>>> import seqfmt
ImportError: No module named seqfmt
The program that uses this works fine in Python. However, it's calling a C library called seqfmt (which has a C file and a PYX file). Is it possible to import this into Jython, or am I out of luck since it's C?

.PYX is the file extension used by cython, a tool for writing C extensions for python in a python-like syntax. Cython creates an intermediate file (that's presumably the .C file you see, at least it's not in the git repository) and compiles it into a python extension.
Jython does not support CPython extensions yet. From its homepage:
There are a number of differences. First, Jython programs cannot currently use CPython
extension modules written in C. These modules usually have files with the extension .so,
.pyd or .dll. If you want to use such a module, you should look for an
equivalent written in pure Python or Java. However, it is technically
feasible to support such extensions, as demonstrated by IronPython.
For the next release of Jython, we plan to support the C Python
Extension API.
Some cython modules can be easily translated into python, and seqfmt is one of them, but pygr has a second cython module, cnestedlist, which involves C calls: The lines
cdef extern from "apps/maf2nclist.h":
[..]
int readMAFrecord(IntervalMap im[],int n,SeqIDMap seqidmap[],int nseq,
int lpoStart,int *p_block_len,FILE *ifile,int maxseq,
long long linecode_count[],int *p_has_continuation)
define an external library call. You'd have to translate this library to Python, too.
Just as a side-note regarding the translation: cython can not only be used to wrap C libraries, but also to simply speed up certain parts of a program. In these cases it is pretty straight-forward to translate them into a python module. Have a look into the seqfmt.pyx source file, it's pretty self-explanatory if you know python.
That all being said, there is a project related to Jython, JyNI, which aims to support CPython extensions from Jython. It is work-in-progress, so I can't tell if your libraries are supported by it. There are some examples in the github repository, maybe you can get it to work. The readme file claims binary compatibility, so with JyNI enabled you should be able to run your code without any recompiling.

How to compile a Python package to a dll

Well, I have a Python package. I need to compile it to a DLL before distributing it, in a way that keeps it easily importable. How? You may suggest *.pyc, but I read somewhere that any *.pyc can be easily decompiled!
Update:
Follow these:
1) I wrote a python package
2) want to distribute it
3) do NOT want distribute the source
4) *.pyc is decompilable >> source can be extracted!
5) dll is standard

Write everything you want to hide in Cython, and compile it to pyd. That's as close as you can get to making compiled python code.
Also, dll is not a standard, not in Python world. They're not portable, either.

Nowadays a simple solution exists: use the Nuitka compiler as described in the Nuitka User Manual
Use Case 2 - Extension Module compilation
If you want to compile a single extension module, all you have to do is this:
python -m nuitka --module some_module.py
The resulting file some_module.so can then be used instead of some_module.py.
You need to compile for each platform you want to support and write some initialization code to import so/pyd file ~~appropriate for given platform/python version etc.~~
[EDIT 2021-12]: Actually, in Python 3 the proper so/dll is determined automatically based on the file name (if it includes the Python version and platform - can't find the PEP for this feature at the moment, but Nuitka creates proper names for compiled modules). So for Python 2.7 the library name would be something.pyd or something.so, whereas for Python 3 this would change to something.cp36-win32.pyd (32-bit Python 3.6 on Windows) or something.cpython-36m-x86_64-linux-gnu.so (64-bit Python 3.6 on x86-64 Linux).
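To check which file names your own interpreter would accept for a compiled module, you can ask the import machinery directly. This is a small sketch; candidate_filenames is a helper invented here, and the printed suffixes differ per platform and Python version.

```python
import importlib.machinery

# Suffixes this interpreter accepts for extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)


def candidate_filenames(module_name):
    """File names the import system would accept for an extension module."""
    return [module_name + suffix for suffix in suffixes]


print(candidate_filenames("some_module"))
```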
The result is not DLL as requested but Python-native compiled binary format (it is not bytecode like in pyc files; the so/pyd format cannot be easily decompiled - Nuitka compiles to machine code through C++ translation)
EDIT [2020-01]: The compiled module is still open to inspection through standard Python mechanisms - e.g. it can be imported like any other module, have its methods listed, etc. To keep the implementation from being exposed that way, more work is needed than just compiling to a binary module.
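A quick way to see what a compiled module still reveals is to introspect one. The sketch below uses the standard math module as a stand-in for any compiled module (that choice, and the helper public_api, are assumptions made for this illustration).

```python
import math  # stand-in for any compiled module


def public_api(module):
    """Map every public name in `module` to the type name of its value."""
    names = [n for n in dir(module) if not n.startswith("_")]
    return {n: type(getattr(module, n)).__name__ for n in names}


api = public_api(math)
print(sorted(api)[:5])    # a few of the exposed names
print(math.sqrt.__doc__)  # docstrings remain readable as well
```

Names, docstrings and call signatures stay visible even though the implementation itself is machine code.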

You can use py2exe.org to convert python scripts into windows executables. Granted this will only work on windows, but it's better than nothing.

You can embed python inside C. The real trick is converting between C values and Python values. Once you've done that, though, making a DLL is pretty straightforward.
However, why do you need to make a dll? Do you need to use this from a non-python program?

Python embedding is supported in CFFI version 1.5, you can create a .dll file which can be used by a Windows C application.

I would also use Cython to generate pyd files, like Dikei wrote.
But if you really want to secure your code, you should better write the important stuff in C++. The best would be to combine both C++ and Python. The idea: you would leave the python code open for adjustments, so that you don't have to compile everything over and over again. That means, you would write the "core" in C++ (which is the most secure solution these days) and use those dll files in your python code. It really depends what kind of tool or program you are building and how you want to execute it. I create mostly an execution file (exe,app) once I finish a tool or a program, but this is more for the end user. This could be done with py2exe and py2app (both 64 bit compatible). If you implement the interpreter, the end user's machine doesn't have to have python installed on the system.
A pyd file is the same as a dll and fully supported inside python. So you can import your module normally. You can find more information about it here.
Using and generating pyd files is the fastest and easiest way to create safe and portable python code.
You could also write real dll files in C++ and import them with ctypes to use them (here a good post and here the python description of how it works)
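As a minimal sketch of the ctypes route, the example below loads the C runtime library instead of a self-written DLL (an assumption made so that the example runs without compiling anything) and calls its strlen function. With your own library you would pass its path to ctypes.CDLL instead.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the platform's C runtime as a stand-in for your own DLL."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # If find_library returns None, CDLL(None) exposes the symbols already
    # loaded into the process, which on typical POSIX systems includes libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Return the length of a bytes object as computed by the C library."""
    return libc.strlen(data)


print(c_strlen(b"hello world"))
```

Declaring argtypes and restype up front is worth the two extra lines, since ctypes otherwise assumes int for everything.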

To expand on the answer by Nick ODell
You must be on Windows for DLLs to work, they are not portable.
However the code below is cross platform and all platforms support run-times so this can be re-compiled for each platform you need it to work on.
Python does not (yet) provide an easy tool to create a dll, however you can do it in C/C++
First you will need a compiler (Windows does not have one by default) notably Cygwin, MinGW or Visual Studio.
A basic knowledge of C is also necessary (since we will be coding mainly in C).
You will also need to include the necessary headers, I will skip this so it does not become horribly long, and will assume everything is set up correctly.
For this demonstration I will print a traditional hello world:
Python code we will be converting to a DLL:
def foo(): print("hello world")
C code:
#include "Python.h" // Includes everything to use the Python-C API
int foo(void); // Declare foo
int foo(void) { // Name of our function in our DLL
    Py_Initialize(); // Initialise Python
    PyRun_SimpleString("print('hello world')"); // Run the Python commands
    return 0; // Finish execution
}
Here is the tutorial for embedding Python. There are a few extra things that should be added here, but for brevity I have left those out.
Compile it and you should have a DLL. :)
That is not all. You will need to distribute whatever dependencies are needed, that will mean the python36.dll run-time and some other components to run the Python script.
My C coding is not perfect, so if anyone can spot any improvements please comment and I will do my best to fix it.
It might also be possible in C# from this answer How do I call a specific Method from a Python Script in C#?, since C# can create DLLs, and you can call Python functions from C#.

You can use PyInstaller to convert the .py files into an executable, along with the required packages and .dll files.
Step 1. Run pip install pyinstaller.
Step 2. Create a new Python file; let's name it code.py.
Step 3. Write some lines of code, e.g. print("Hello World").
Step 4. Open a Command Prompt in the same location, type pyinstaller code.py and hit Enter.
Step 5. In the same location, two folders named build and dist will be created. Inside the dist folder there is a folder named code, and inside that folder there is an executable, code.exe, along with the required .dll files.

If your only goal is to hide your source code, it is much simpler to just compile your code to an executable (use PyInstaller, for example), and use a module with readable source for communication.
NOTE: You might need more converter functions as shown in this example.
Example:
Module:
import subprocess
import codecs
def _encode_str(s):
    # UTF-32 plus base64 keeps any string safe to pass as a command-line argument
    encoded = s.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
def strlen(s: str):  # return the length of s as an int
    proc = subprocess.Popen(["path_to_your_exe.exe", "strlen", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return int(proc.stdout.read())
def random_char_from_string(s):
    proc = subprocess.Popen(["path_to_your_exe.exe", "randchr", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return _decode_str(proc.stdout.read())
Executable:
import sys
import codecs
import random
def _encode_str(s):
    encoded = s.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
command = sys.argv[1]
# The second argument is the base64-encoded string sent by the module
s = _decode_str(sys.argv[2].encode("ascii"))
if command == "strlen":
    print(len(s))
elif command == "randchr":
    print(_encode_str(random.choice(s)).decode("ascii"))
You might also want to think about compiling different executables for different platforms, if your package isn't a windows-only package anyways.
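If you want to try the module/executable exchange before freezing anything, the following sketch runs the "executable" side as a child script under the current interpreter. This is a simplified stand-in, not the code from the example above: CHILD_SOURCE, encode_arg and remote_strlen are names invented here, and it uses the base64 module with UTF-8 instead of codecs with UTF-32.

```python
import base64
import subprocess
import sys

# Stand-in for the frozen executable: a tiny script run with "python -c".
CHILD_SOURCE = r'''
import base64, sys
command, payload = sys.argv[1], sys.argv[2]
text = base64.b64decode(payload).decode("utf-8", "surrogatepass")
if command == "strlen":
    print(len(text))
'''


def encode_arg(text):
    """Encode a string so it can be passed safely as a command-line argument."""
    return base64.b64encode(text.encode("utf-8", "surrogatepass")).decode("ascii")


def remote_strlen(text):
    """Ask the child process for the length of `text`."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, "strlen", encode_arg(text)],
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout.decode("ascii"))


print(remote_strlen("hello world"))
```

Once the protocol works this way, the list passed to subprocess.run can be swapped for the path of the frozen executable.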

This is my idea. It might work, but I haven't tested it.
1.Create your *.py files.
2.Rename them into *.pyx
3.Convert them into *.c files using Cython
4.Compile *.c into *.dll files.
But I don't recommend it, because the resulting *.dll files won't work on any platform other than Windows.

Grab Visual Studio Express and IronPython and do it that way? You'll be in Python 2.7.6 world though.
