Can I statically link Cython modules into an executable which embeds python? - python

I currently have an executable compiled from C++ that embeds Python. The executable runs a Python script which loads several Cython modules. Both the Cython modules and the executable are linked against a shared library.
I want to move the shared library into the executable by statically linking the shared library against the executable.
Can I statically link the Cython modules into the executable which embeds python? What is the best way to handle this situation?

Yes, it's possible, but only if you have control over the Python interpreter. What I'm going to describe has been done for Python on the iOS platform. You'll need to investigate how to let Python know about your module if you don't want to touch the original interpreter. (Replace TEST everywhere with your own tag/lib name.)
One possible way to do it is:
Compile your own Python with a dynload patch that prefers not to dlopen() your module, but instead uses dlsym() directly to check whether the module is already in memory.
Create a libTEST.a that includes all the .o files generated during the build process (not the .so). You can usually find them in build/temp.*, and do something like this:
ar rc libTEST.a build/temp.*/*.o
ranlib libTEST.a
When compiling the main executable, you need to add a dependency on that new libTEST.a by appending to the compilation command line:
-lTEST -L.
The result will be an executable with all the symbols from your Cython modules, and Python will be able to find them in memory.
(As an example, I'm using an enhanced wrapper that redirects ld during compilation so it does not produce .so files, and creates a .a at the end. In the kivy-ios project, you can grab liblink, which is used to produce the .o files, and biglink, which gathers all the .o files from directories and produces a .a. You can see how they're used in build_kivy.sh)

Related

difference between *.so file created by pybind and regular linux dynamic libs

When one uses pybind to create Python-C++ bindings, upon compilation pybind creates a *.so file. AFAIK the compilation step in pybind just uses the C++ compiler, so this should be no different from the regular shared libs one would create for normal C++ code. How does the Python interpreter introspect these *.so files to notice that there are Python-compatible modules in them?
Ultimately, you'll want to look at the CPython docs for how C extensions work. From the docs: https://docs.python.org/3/extending/building.html
A C extension for CPython is a shared library (e.g. a .so file on Linux, .pyd on Windows), which exports an initialization function.
As it says here, the primary difference is that it defines its initialization / entry point function.
All pybind does is wrap this entry point via PYBIND11_MODULE:
https://pybind11.readthedocs.io/en/stable/basics.html#creating-bindings-for-a-simple-function
https://github.com/pybind/pybind11/blob/25abf7e/include/pybind11/detail/common.h#L283
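You can actually see that exported entry point from Python itself. A small sketch, assuming a typical Linux CPython build where _ctypes is itself shipped as a shared-object extension (on some builds it may be compiled into the interpreter instead):

```python
import ctypes
import importlib.util

# _ctypes is a C extension shipped with CPython; on most Linux builds it is a .so
spec = importlib.util.find_spec("_ctypes")
print(spec.origin)  # path to the shared object file

# dlopen() the same file and look up its exported initialization function,
# which is exactly the symbol the import system calls after loading it
lib = ctypes.PyDLL(spec.origin)
print(lib.PyInit__ctypes)  # the PyInit_<modulename> entry point
```

The same lookup works for a pybind11 module: PYBIND11_MODULE(name, m) expands to a PyInit_name function, which is why the interpreter treats the resulting .so as importable.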

Package shared object (library) with Python Egg / Wheel

I've done this so far:
Created MANIFEST.in with: include path/to/libfoo.so
Created setup.py that, after calling setup.py install, puts libfoo.so into /usr/local/lib/python/site-packages/foo.egg/path/to/libfoo.so.
This, of course, doesn't help Python to find libfoo when it's required at run time. What do I need to do to make Python actually find this library?
Note
This library doesn't have Python bindings, it's just a shared library with some native code in it. It is called from another shared library which sits in /usr/local/lib/python/site-packages/foo.egg/path/wrapped.cpython-36m-x86_64-linux-gnu.so.
If you want to hard-code the location of the shared library, you can use the rpath option. For that you would do something like:
python setup.py build_ext --rpath=/usr/local/lib/python/site-packages/foo.egg/path/to
Where the setup.py above is the script used to build wrapped.cpython-36m-x86_64-linux-gnu.so, and the rpath is the path to libfoo.so. Of course, you should be able to put this directly inside the build script, depending on what that process looks like.
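If you control the setup.py, you can also bake the rpath into the extension itself via runtime_library_dirs, which becomes -Wl,-rpath,... at link time. A sketch, assuming setuptools; the module and install path come from the question, while the source file name is hypothetical:

```python
from setuptools import setup, Extension

ext = Extension(
    "path.wrapped",                 # the wrapping extension module
    sources=["path/wrapped.c"],     # hypothetical source file name
    libraries=["foo"],              # link against libfoo.so
    library_dirs=["path/to"],       # where libfoo.so lives at build time
    runtime_library_dirs=[          # embedded as an rpath in the built .so
        "/usr/local/lib/python/site-packages/foo.egg/path/to",
    ],
)

setup(name="foo", ext_modules=[ext])
```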
-rpath=dir
Add a directory to the runtime library search path. This is used when
linking an ELF executable with shared objects. All -rpath arguments
are concatenated and passed to the runtime linker, which uses them to
locate shared objects at runtime. The -rpath option is also used when
locating shared objects which are needed by shared objects explicitly
included in the link
If it's not an option to update the build process for wrapped.cpython-36m-x86_64-linux-gnu.so, I think your only option is to put libfoo.so somewhere in the load library path or manually add the location at run time.
In answer to a few of your follow-on questions...
The system load library locations come from /etc/ld.so.conf, which references the locations in the ld.so.conf.d directory. The ldconfig command rebuilds the cache of shared libraries from this data, so if you change things, be sure to run this command.
At the command line or in your .bashrc you can use export LD_LIBRARY_PATH=.... to add additional directories to the search path.
You can manually load shared objects. See https://docs.python.org/2/library/ctypes.html Loading shared libraries.
I haven't tried this myself, but I've read that if you manually load a subordinate shared library in your Python code and then import the higher-level library, the loader won't have to go out and find the lower one, since it's already loaded. This would look something like...
import ctypes
foolib = ctypes.CDLL('/full/path/to/libfoo.so')
import wrapped
There's a number of examples on StackOverflow on how to do this and lots of additional info/examples on manipulating the library search paths.
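A runnable variant of that preload trick, assuming a POSIX system with ctypes available (libm stands in here for libfoo.so, which is hypothetical): loading the dependency with RTLD_GLOBAL makes its symbols visible to shared objects loaded afterwards, which is what a dependent extension module needs.

```python
import ctypes
import ctypes.util

# Load the dependency with RTLD_GLOBAL so extensions dlopen()ed later see its symbols.
# libm is used as a stand-in for libfoo.so from the question.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)

# Once loaded, its functions are also directly callable via ctypes:
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))  # -> 3.0
```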

How to compile a Python package to a dll

Well, I have a Python package. I need to compile it as a dll before distributing it in an easily importable way. How? You may suggest *.pyc. But I read somewhere that any *.pyc can be easily decompiled!
Update:
Follow these:
1) I wrote a python package
2) want to distribute it
3) do NOT want distribute the source
4) *.pyc is decompilable >> source can be extracted!
5) dll is standard
Write everything you want to hide in Cython, and compile it to pyd. That's as close as you can get to making compiled python code.
Also, dll is not a standard, not in Python world. They're not portable, either.
Nowadays a simple solution exists: use the Nuitka compiler as described in the Nuitka User Manual
Use Case 2 - Extension Module compilation
If you want to compile a single extension module, all you have to do is this:
python -m nuitka --module some_module.py
The resulting file some_module.so can then be used instead of some_module.py.
You need to compile for each platform you want to support and write some initialization code to import so/pyd file ~~appropriate for given platform/python version etc.~~
[EDIT 2021-12]: Actually, in Python 3 the proper so/dll is determined automatically based on the file name (if it includes the Python version and platform - I can't find the PEP for this feature at the moment, but Nuitka creates proper names for compiled modules). So for Python 2.7 the library name would be something.pyd or something.so, whereas for Python 3 this changes to something.cp36-win32.pyd (32-bit Python 3.6 on Windows) or something.cpython-36m-x86_64-linux-gnu.so (64-bit Python 3.6 on Linux).
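You can check which suffixes your own interpreter accepts for extension modules (the exact tags depend on the platform and Python version):

```python
import importlib.machinery

# The import system tries these suffixes in order, most specific first,
# when looking for a compiled extension module
print(importlib.machinery.EXTENSION_SUFFIXES)
# e.g. ['.cpython-310-x86_64-linux-gnu.so', '.abi3.so', '.so'] on 64-bit Linux
```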
The result is not a DLL as requested, but a Python-native compiled binary format (it is not bytecode as in pyc files; the so/pyd format cannot be easily decompiled - Nuitka compiles to machine code through C++ translation).
EDIT [2020-01]: The compiled module is still open to inspection through standard Python mechanisms - e.g. it can be imported like any other module and have its methods listed, etc. To keep the implementation from being exposed that way, there is more work to be done than just compiling to a binary module.
You can use py2exe.org to convert Python scripts into Windows executables. Granted, this will only work on Windows, but it's better than nothing.
You can embed python inside C. The real trick is converting between C values and Python values. Once you've done that, though, making a DLL is pretty straightforward.
However, why do you need to make a dll? Do you need to use this from a non-python program?
Python embedding is supported in CFFI version 1.5, you can create a .dll file which can be used by a Windows C application.
I would also use Cython to generate pyd files, as Dikei wrote.
But if you really want to secure your code, you should write the important stuff in C++. The best approach is to combine both C++ and Python. The idea: leave the Python code open for adjustments, so that you don't have to compile everything over and over again. That means you write the "core" in C++ (which is the most secure solution these days) and use those dll files in your Python code. It really depends what kind of tool or program you are building and how you want to execute it. I mostly create an execution file (exe, app) once I finish a tool or a program, but this is more for the end user. This can be done with py2exe and py2app (both 64-bit compatible). If you bundle the interpreter, the end user's machine doesn't have to have Python installed on the system.
A pyd file is the same as a dll and is fully supported inside Python, so you can import your module normally. You can find more information about it here.
Using and generating pyd files is the fastest and easiest way to create safe and portable python code.
You could also write real dll files in C++ and import them with ctypes to use them (here a good post and here the python description of how it works)
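As a minimal sketch of the ctypes route, with the C runtime library standing in for a hand-written DLL/.so (this assumes a POSIX system; on Windows you would pass the DLL's path to ctypes.CDLL instead):

```python
import ctypes
import ctypes.util

# Load a shared library; libc stands in for your own compiled DLL/.so
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

print(libc.strlen(b"hello"))  # -> 5
```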
To expand on the answer by Nick ODell
You must be on Windows for DLLs to work, they are not portable.
However, the code below is cross-platform, and all platforms support C run-times, so this can be re-compiled for each platform you need it to work on.
Python does not (yet) provide an easy tool to create a dll, however you can do it in C/C++
First you will need a compiler (Windows does not have one by default) notably Cygwin, MinGW or Visual Studio.
A basic knowledge of C is also necessary (since we will be coding mainly in C).
You will also need to include the necessary headers, I will skip this so it does not become horribly long, and will assume everything is set up correctly.
For this demonstration I will print a traditional hello world:
Python code we will be converting to a DLL:
def foo(): print("hello world")
C code:
#include "Python.h" // Includes everything to use the Python-C API

int foo(void); // Declare foo

int foo(void) { // Name of our function in our DLL
    Py_Initialize(); // Initialise Python
    PyRun_SimpleString("print('hello world')"); // Run the Python commands
    return 0; // Finish execution
}
Here is the tutorial for embedding Python. There are a few extra things that should be added here, but for brevity I have left those out.
Compile it and you should have a DLL. :)
That is not all. You will need to distribute whatever dependencies are needed, that will mean the python36.dll run-time and some other components to run the Python script.
My C coding is not perfect, so if anyone can spot any improvements, please comment and I will do my best to fix it.
It might also be possible in C# from this answer How do I call a specific Method from a Python Script in C#?, since C# can create DLLs, and you can call Python functions from C#.
You can use pyinstaller to convert .py files into an executable, with all the required packages bundled as .dll files.
Step 1. pip install pyinstaller.
Step 2. Create a new Python file; let's name it code.py.
Step 3. Write some lines of code, i.e. print("Hello World").
Step 4. Open a command prompt in the same location, write pyinstaller code.py, and hit enter.
Last step: see in the same location that two folders named build and dist have been created. Inside the dist folder there is a folder code, and inside that folder there is an exe file code.exe along with the required .dll files.
If your only goal is to hide your source code, it is much simpler to just compile your code to an executable (use PyInstaller, for example) and use a module with readable source for communication.
NOTE: You might need more converter functions as shown in this example.
Example:
Module:
import subprocess
import codecs
def _encode_str(s):
    encoded = s.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
def strlen(s: str):  # return length of str; int
    proc = subprocess.Popen(["path_to_your_exe.exe", "strlen", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return int(proc.stdout.read())
def random_char_from_string(s):
    proc = subprocess.Popen(["path_to_your_exe.exe", "randchr", _encode_str(s).decode("ascii")], stdout=subprocess.PIPE)
    return _decode_str(proc.stdout.read())
Executable:
import sys
import codecs
import random
def _encode_str(s):
    encoded = s.encode("utf-32", "surrogatepass")
    return codecs.encode(encoded, "base64").replace(b"\n", b"")
def _decode_str(b64):
    return codecs.decode(b64, "base64").decode("utf-32", "surrogatepass")
command = sys.argv[1]
if command == "strlen":
    s = _decode_str(sys.argv[2].encode("ascii"))
    print(len(s))
if command == "randchr":
    s = _decode_str(sys.argv[2].encode("ascii"))
    print(_encode_str(random.choice(s)).decode("ascii"))
You might also want to think about compiling different executables for different platforms, if your package isn't a windows-only package anyways.
This is my idea; it might work. I don't know whether it will work or not.
1.Create your *.py files.
2.Rename them into *.pyx
3.Convert them into *.c files using Cython
4.Compile *.c into *.dll files.
But I don't recommend it, because it won't work on any other platform except Windows.
Grab Visual Studio Express and IronPython and do it that way? You'll be in Python 2.7.6 world though.

Compiling Python to C using Cython

I'm trying to compile python source code foo.py to C using cython.
In foo.py:
print "Hello World"
The command I'm running is cython foo.py.
The problem is that when compiling foo.c using gcc, I get the error:
undefined reference to 'main'.
When converting the code from Python to C, Cython produces C code that can only be compiled into a shared object.
In order to make it an executable, you should add "--embed" to the cython command. This flag adds the 'main' function you need, so you can compile the C code into an executable file.
Please note you'll need the Python .so runtime libraries in order to run the executable.
Read the Cython documentation. This will also (hopefully) teach you what Cython is and what it isn't. Cython is for creating python extensions (not a general-purpose Python-to-C-compiler), which are shared objects/dlls. Dynamically loaded libraries don't have a main function like standalone programs, but compilers assume that they are ultimately linking an executable. You have to tell them otherwise via flags (-shared methinks, but again, refer to the Cython documentation) - or even better, don't compile yourself, use a setup.py for this (yet again, read the Cython documentation).
The usual way is to use distutils to compile the cython-generated file. This also gives you all the include directories you need in a portable way.
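A minimal setup.py along those lines, assuming Cython is installed; cythonize() generates foo.c and hands it to the compiler with the correct Python include directories, so you never have to invoke gcc by hand:

```python
from setuptools import setup
from Cython.Build import cythonize  # assumes the Cython package is installed

# cythonize() turns foo.py into foo.c and builds it as an extension module
# (run with: python setup.py build_ext --inplace)
setup(ext_modules=cythonize("foo.py"))
```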

Compiling C-dll for Python OR SWIG-module creation, how to continue?

I reference this file "kbdext.c" and its headerfile listed on http://www.docdroppers.org/wiki/index.php?title=Writing_Keyloggers (the listings are at the bottom).
I've been trying to compile this into a dll for use in Python or Visual Basic, but have not succeeded. I'm not familiar enough with C or GCC to sort out the problems or do the dll compile correctly. (I also get an error about snprintf not being declared when doing a regular compile of all the files.)
What are the steps I should do to make all functions available for other languages and external apps?
Or is it perhaps easier to use SWIG and make a python module, instead of compiling a DLL?
I've succeeded in compiling the dll with GCC, and am able to import its functions in C. I have yet to test the import in VB and Python but can't see why it would pose problems.
