I have a Python module implemented entirely in pure Python (for portability reasons).
The implementation of a small part of it has been duplicated in a Cython module to improve performance where possible.
I know how to install the .c modules created by Cython with distutils. However, if a machine has no compiler installed, I suspect the setup will fail, even though the module is still usable in pure-Python mode.
Is there a way to compile the .c module if possible, but fail gracefully and install the package without it if compiling is not possible?
I guess you will have to make some modifications both in your setup.py and in one __init__.py file in your module.
Let's say the name of your package is "module" and you have a piece of functionality, sub, for which you have pure Python code in the sub subfolder and the equivalent C code in the c_sub subfolder.
For example, in your setup.py:
import logging
from setuptools import setup
from setuptools.extension import Extension
from setuptools.command.build_ext import build_ext
from distutils.errors import CCompilerError, DistutilsExecError, DistutilsPlatformError

logging.basicConfig()
log = logging.getLogger(__file__)

ext_errors = (CCompilerError, DistutilsExecError, DistutilsPlatformError, IOError, SystemExit)

class BuildFailed(Exception):
    pass

def construct_build_ext(build_ext):
    # Wrap the build_ext command class so a compilation failure raises
    # BuildFailed instead of aborting the whole installation.
    class WrappedBuildExt(build_ext):
        def run(self):
            try:
                build_ext.run(self)
            except DistutilsPlatformError as x:
                raise BuildFailed(x)
        def build_extension(self, ext):
            try:
                build_ext.build_extension(self, ext)
            except ext_errors as x:
                raise BuildFailed(x)
    return WrappedBuildExt

setup_args = {'name': 'module', 'license': 'BSD', 'author': 'xxx',
              'packages': ['module', 'module.sub', 'module.c_sub'],
              'cmdclass': {'build_ext': construct_build_ext(build_ext)}}

ext_modules = [Extension("module.c_sub._sub", ["module/c_sub/_sub.c"])]

try:
    # Try building with the C extension:
    setup(ext_modules=ext_modules, **setup_args)
except (BuildFailed,) + ext_errors as ex:
    log.warn(ex)
    log.warn("The C extension could not be compiled")
    # Retry the installation without the C extension:
    # remove the wrapped build_ext command class.
    del setup_args['cmdclass']['build_ext']
    # If this second 'setup' call does not fail, the module
    # is installed successfully, without the C extension:
    setup(**setup_args)
    log.info("Plain-Python installation succeeded.")
Now you will need to include something like this in your __init__.py file (or at any place relevant in your case):
try:
    from .c_sub import *
except ImportError:
    from .sub import *
This way the C version is used if it was built, otherwise the plain Python version is used. It assumes that sub and c_sub provide the same API.
You can find an example of a setup file that works this way in the Shapely package. Actually, most of the code I posted was copied (the construct_build_ext function) or adapted (the lines after it) from that file.
The Extension class has an optional parameter in its constructor:
optional - specifies that a build failure in the extension should not
abort the build process, but simply skip the extension.
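For example, a minimal sketch of using it (package and file names follow the example above):
from setuptools import setup
from setuptools.extension import Extension

# With optional=True a failed compilation only skips this extension
# instead of aborting the whole installation.
ext_modules = [Extension("module.c_sub._sub", ["module/c_sub/_sub.c"], optional=True)]

setup(name='module',
      packages=['module', 'module.sub', 'module.c_sub'],
      ext_modules=ext_modules)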
Here is also a link to the quite interesting history of the piece of code proposed by mgc.
The question How should I structure a Python package that contains Cython code
is related; there the question is how to fall back from Cython to the already generated C code. You could use a similar strategy to select which of the .py or the .pyx code to install.
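As a rough sketch of that idea (assuming the same layout as above, with the pre-generated _sub.c shipped next to _sub.pyx):
from setuptools.extension import Extension

try:
    # Build from the Cython source when Cython is available ...
    from Cython.Build import cythonize
    ext_modules = cythonize([Extension("module.c_sub._sub", ["module/c_sub/_sub.pyx"])])
except ImportError:
    # ... otherwise fall back to the already generated C file.
    ext_modules = [Extension("module.c_sub._sub", ["module/c_sub/_sub.c"])]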
I am trying to run C++ code, wrapped with SWIG into Python 3.8 modules, on a remote computing cluster (where I don't have root access). I wrote the code on my own computer (macOS Monterey), while the cluster runs Ubuntu 20.04. The code makes use of the libraries quadedge, nlopt and gsl. Using the latter two, I perform an optimisation with nlopt and an integration with gsl at every step. However, this only works on my own Mac; on the Ubuntu cluster there is some undefined behaviour at runtime, i.e. my rate of change during the integration is zero in around 50% of the integration steps (on the Mac everything works flawlessly).
To install my code as python modules, I use a setup.py. The procedure and module structure I use is currently the following:
install quadedge
install polygeo (own module that depends on quadedge)
install plantdevelopment (own module that depends on polygeo)
Both polygeo and plantdevelopment contain two extensions in one module. My setup.py for polygeo on my Mac looks like this:
from setuptools import setup, find_packages, Extension
from setuptools.command.build_py import build_py as _build_py
import os
import glob
class build_py(_build_py):
    def run(self):
        self.run_command("build_ext")
        return super().run()
polygeoDir = 'polygeo/'
surfnloptDir = 'polygeo/surfnlopt/'
includeDir = 'polygeo/include/'
srcDir = 'polygeo/src/'
headerFiles = glob.glob(includeDir + '*.hpp')
srcFiles = glob.glob(srcDir + '*.cpp')
srcFiles.append(polygeoDir + 'polygeo.i')
surfnloptHeaderFiles = glob.glob(surfnloptDir + 'include/*.hpp')
surfnloptSrcFiles = glob.glob(surfnloptDir + 'src/*.cpp')
surfnloptSrcFiles.append(surfnloptDir + 'surfnlopt.i')
allHeaders = []
allHeaders += headerFiles
allHeaders += surfnloptHeaderFiles
polygeoExt = Extension('_polygeo',
                       sources=srcFiles,
                       include_dirs=[includeDir],
                       library_dirs=[],
                       libraries=[],
                       swig_opts=['-c++'],
                       extra_compile_args=['-std=c++11'],)

surfnloptExt = Extension('surfnlopt._surfnlopt',
                         sources=surfnloptSrcFiles,
                         include_dirs=[includeDir,
                                       surfnloptDir + '/include'],
                         library_dirs=[],
                         libraries=['m', 'nlopt'],
                         swig_opts=['-c++'],
                         extra_compile_args=['-std=c++11'],)

setup(name='polygeo',
      version='1.0',
      packages=find_packages(),
      ext_package='polygeo',
      ext_modules=[polygeoExt,
                   surfnloptExt],
      install_requires=['quadedge'],
      headers=allHeaders,
      cmdclass={'build_py': build_py},
      )
Interestingly, on my Mac it does not seem to be necessary to add extra paths or anything, but on the remote machine I have to edit the .bash_profile in order to set the necessary include paths etc.
On top of that, on Ubuntu I also have to add to each of the extensions extra_link_args=['/home/.local/lib/python3.8/site-packages/quadedge/_quadedge.cpython-38-x86_64-linux-gnu.so'], which points to the CPython .so that I already generated with the setup.py for quadedge.
Also, I need to add the source files of the polygeo extension to the source files of surfnlopt, i.e. sources=srcFiles+surfnloptSrcFiles, otherwise it throws a "symbols not found" error when importing the modules in Python, because surfnlopt depends on the source files of polygeo (the same happens if I don't include the .so, as it also needs functions from quadedge).
For installing plantdevelopment my script looks almost the same, except that there I have to include the polygeo .so instead of the quadedge .so, and I have to add swig_opts=['-c++','-I'+polygeoIncDir], which points to the directory where the header files from the polygeo setup end up.
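Put together, the Ubuntu variant of the surfnlopt extension from the setup.py above roughly ends up looking like this (the quadedge .so path is the one from my machine, and the variables are the ones defined above):
# Rough sketch of the surfnlopt extension as adapted for the Ubuntu cluster.
quadedgeSo = ('/home/.local/lib/python3.8/site-packages/quadedge/'
              '_quadedge.cpython-38-x86_64-linux-gnu.so')

surfnloptExt = Extension('surfnlopt._surfnlopt',
                         # polygeo sources added as well to avoid "symbols not found"
                         sources=srcFiles + surfnloptSrcFiles,
                         include_dirs=[includeDir,
                                       surfnloptDir + '/include'],
                         libraries=['m', 'nlopt'],
                         swig_opts=['-c++'],
                         extra_compile_args=['-std=c++11'],
                         # link against the already built quadedge extension
                         extra_link_args=[quadedgeSo])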
The first thing I was wondering is whether it causes problems to have the extensions depend on the previously installed ones, as plantdevelopment then also depends on polygeo and its extensions.
Could adding the source files to the extensions again have an impact due to ambiguity/duplication when running the code? Could this be a memory leak?
So far I tried the following:
tried to have as few includes/sources as possible to avoid duplicate definitions in the setup.py
tried to install Python locally on the Ubuntu remote with Homebrew and pyenv, with all software (as well as the compiler) at the same versions as on the Mac
tried out Python 3.10, as there apparently used to be a memory-leak issue at some point
tried to install everything on the Mac without root privileges (it still works flawlessly)
tried to put ALL files into one single module (which doesn't work due to the abstract classes and the C++ code structure)
tested that the C++ code works properly without Python, to make sure it's not about the C libraries (unfortunately I need the Python API for the project and it's not my choice)
Using the guppy memory profiler, some of the runs (~50%) actually work out properly when I just import it and print the heap status, but there seems to be a lot of randomness...
In my program I use guppy in the following way:
from guppy import hpy

heap = hpy()
print("Heap Status At Starting : ")
heap_status1 = heap.heap()
print("Heap Size : ", heap_status1.size, " bytes\n")
print(heap_status1)
heap.setref()
print("\nHeap Status After Setting Reference Point : ")
heap_status2 = heap.heap()
print("Heap Size : ", heap_status2.size, " bytes\n")
print(heap_status2)
I can't really think of a reason why just importing guppy and printing the heap info should change the outcome of my optimization/integration.
Could this be an issue with the different OS?
Or maybe the code structure is problematic with Python, building everything on top of each other / having dependencies between extensions in a single module?
I'd be happy for any kind of help and will of course provide any further necessary info.
I'm creating a C++ extension module for Python 3. The setup.py compiles everything just fine, but when I go to import my new module, I get:
ImportError: libMyLib.so: cannot open shared object file: No such file or directory
this is the path to my .so:
/path/to/lib-cc7/libMyLib.so
I've tried to import my libraries in the setup.py in different ways. I have tried setting and re-setting my LD_LIBRARY_PATH variable in the terminal as well as in my .bash_profile, and I have tried setting the paths in sys.path.
When I run this code before the import statement:
print(os.environ.get("LD_LIBRARY_PATH"))
print(os.environ.get("PYTHONPATH"))
I get the path to the correct library directory.
When I run strace, the paths to the other .so's that I need show up, and I see it searching for libMyLib.so, but it seems to search all of the other directories except /path/to/lib-cc7/. When searching for other libraries it does check /path/to/lib-cc7/.
I have sanity checked that the library is there about 5 times.
It seems like no matter what I do,
import MyModule.MySubModule as SubModule
always returns the same import error. Is there anything else that I haven't tried? Why does it seem like Python is looking in the wrong place for my .so?
EDIT 1:
This is what my setup.py (in essence) looks like:
Submodule1 = Extension('Module.Submodule1', sources=['Module/Submodule1/submod1.cpp'],
                       language='c++', libraries=[..long list of libraries..],)
Submodule2 = Extension('Module.Submodule2', sources=['Module/Submodule2/submod2.cpp'],
                       language='c++', libraries=[..long list of libraries..],)
setup(name = "Module", version = '1.0',
packages = ['Module', 'Module.Submodule1', 'Module.Submodule2'],
ext_modules = [Submodule1, Submodule2], )
I have a build process that creates a Python wheel using the following command:
python setup.py bdist_wheel
The build process can be run on many platforms (Windows, Linux, py2, py3 etc.) and I'd like to keep the default output names (e.g. mapscript-7.2-cp27-cp27m-win_amd64.whl) to upload to PyPI.
Is there any way to get the generated wheel's filename (e.g. mapscript-7.2-cp27-cp27m-win_amd64.whl) and save it to a variable, so I can then install the wheel later on in the script for testing?
Ideally the solution would be cross-platform. My current approach is to clear the folder, list all files, and select the first (and only) file in the list; however, this seems like a very hacky solution.
setuptools
If you are using a setup.py script to build the wheel distribution, you can use the bdist_wheel command to query the wheel file name. The drawback of this method is that it uses bdist_wheel's private API, so the code may break on wheel package update if the authors decide to change it.
from setuptools.dist import Distribution


def wheel_name(**kwargs):
    # create a fake distribution from arguments
    dist = Distribution(attrs=kwargs)
    # finalize bdist_wheel command
    bdist_wheel_cmd = dist.get_command_obj('bdist_wheel')
    bdist_wheel_cmd.ensure_finalized()
    # assemble wheel file name
    distname = bdist_wheel_cmd.wheel_dist_name
    tag = '-'.join(bdist_wheel_cmd.get_tag())
    return f'{distname}-{tag}.whl'
The wheel_name function accepts the same arguments you pass to the setup() function. Example usage:
>>> wheel_name(name="mydist", version="1.2.3")
mydist-1.2.3-py3-none-any.whl
>>> wheel_name(name="mydist", version="1.2.3", ext_modules=[Extension("mylib", ["mysrc.pyx", "native.c"])])
mydist-1.2.3-cp36-cp36m-linux_x86_64.whl
Notice that the source files for native libs (mysrc.pyx or native.c in the above example) don't have to exist to assemble the wheel name. This is helpful in case the sources for the native lib don't exist yet (e.g. you are generating them later via SWIG, Cython or whatever).
This makes wheel_name easily reusable in the setup.py script where you define the distribution metadata:
# setup.py
from setuptools import setup, find_packages, Extension
from setup_helpers import wheel_name
setup_kwargs = dict(
    name='mydist',
    version='1.2.3',
    packages=find_packages(),
    ext_modules=[Extension(...), ...],
    ...
)
file = wheel_name(**setup_kwargs)
...
setup(**setup_kwargs)
If you want to use it outside of the setup script, you have to organize access to the setup() args yourself (e.g. by reading them from a setup.cfg file or whatever).
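For example, a rough sketch of doing that with a setup.cfg file (this assumes a [metadata] section with name and version keys, and that the wheel_name helper above lives in setup_helpers.py):
import configparser
from setup_helpers import wheel_name  # the helper defined above

# Read the distribution metadata from setup.cfg instead of setup.py.
config = configparser.ConfigParser()
config.read('setup.cfg')
print(wheel_name(name=config['metadata']['name'],
                 version=config['metadata']['version']))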
This part is loosely based on my other answer to setuptools, know in advance the wheel filename of a native library
poetry
Things can be simplified a lot (it's practically a one-liner) if you use poetry because all the relevant metadata is stored in the pyproject.toml. Again, this uses an undocumented API:
from clikit.io import NullIO
from poetry.factory import Factory
from poetry.masonry.builders.wheel import WheelBuilder
from poetry.utils.env import NullEnv


def wheel_name(rootdir='.'):
    builder = WheelBuilder(Factory().create_poetry(rootdir), NullEnv(), NullIO())
    return builder.wheel_filename
The rootdir argument is the directory containing your pyproject.toml file.
flit
AFAIK flit can't build wheels with native extensions, so it can give you only the purelib name. Nevertheless, it may be useful if your project uses flit for distribution building. Notice this also uses an undocumented API:
from flit_core.wheel import WheelBuilder
from io import BytesIO
from pathlib import Path


def wheel_name(rootdir='.'):
    config = str(Path(rootdir, 'pyproject.toml'))
    builder = WheelBuilder.from_ini_path(config, BytesIO())
    return builder.wheel_filename
Implementing your own solution
I'm not sure whether it's worth it. Still, if you want to choose this path, consider using packaging.tags before you find some old deprecated stuff or even decide to query the platform yourself. You will still have to fall back to private stuff to assemble the correct wheel name, though.
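For example, a rough sketch with packaging.tags (the distribution name and version are placeholders you would supply from your own metadata):
from packaging.tags import sys_tags

# sys_tags() yields the tags supported by the running interpreter,
# most specific first, e.g. cp36-cp36m-linux_x86_64.
name, version = 'mydist', '1.2.3'
tag = next(iter(sys_tags()))
print(f'{name}-{version}-{tag}.whl')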
My current approach to install the wheel is to point pip to the folder containing the wheel and let it search itself:
python -m pip install --no-index --find-links=build/dist mapscript
twine can also be pointed directly at a folder without needing to know the exact wheel name.
I used a modified version of hoefling's solution. My goal was to copy the build to a "latest" wheel file. The setup() function returns an object with all the info you need, so you can find out what it actually built, which seems simpler than the solution above. Assuming you have a variable version in use, the following gets the file name of the wheel that was just built and then copies it.
import shutil
import setuptools

setup = setuptools.setup(
    # whatever options you currently have
)

wheel_built = 'dist/{}-{}.whl'.format(
    setup.command_obj['bdist_wheel'].wheel_dist_name,
    '-'.join(setup.command_obj['bdist_wheel'].get_tag()))
wheel_latest = wheel_built.replace(version, 'latest')
shutil.copy(wheel_built, wheel_latest)
print('Copied {} >> {}'.format(wheel_built, wheel_latest))
I guess one possible drawback is you have to actually do the build to get the name, but since that was part of my workflow, I was ok with that. hoefling's solution has the benefit of letting you plan the name without doing the build, but it seems more complex.
I manually compiled python-openzwave to work with the C++ library.
I would like to use it as a Kodi addon (OpenELEC running on a Pi 3), so I cannot use the standard installation.
I've compiled everything, downloaded the missing six and louie libs, and am now trying to run hello_world.py.
My current dirs structure is the following:
- root
- bin
- .lib
- config
Alarm.o
...
libopenzwave.a
libopenzwave.so
libopenzwave.so.1.4
...
- libopenzwave
driver.pxd
group.pxd
...
- louie
__init__.py
dispatcher.py
...
- openzwave
__init__.py
command.py
...
six.py
hello_world.py
But when I run hello_world.py, I get the following error:
Traceback (most recent call last):
File "hello_world.py", line 40, in <module>
from openzwave.controller import ZWaveController
File "/storage/.kodi/addons/service.multimedia.open-zwave/openzwave/controller.py", line 34, in <module>
from libopenzwave import PyStatDriver, PyControllerState
ImportError: No module named libopenzwave
If I move libopenzwave.a and libopenzwave.so to the root folder, then I get the following error:
Traceback (most recent call last):
File "hello_world.py", line 40, in <module>
from openzwave.controller import ZWaveController
File "/storage/.kodi/addons/service.multimedia.open-zwave/openzwave/controller.py", line 34, in <module>
from libopenzwave import PyStatDriver, PyControllerState
ImportError: dynamic module does not define init function (initlibopenzwave)
What is wrong with my setup?
In general, the required steps consist of a call to make build, which handles building the .cpp files for openzwave and downloading all dependencies (including Cython), and make install, which runs setup-api.py, setup-lib.py (this setup script also creates the C++ Python extension for openzwave), setup-web.py and setup-manager.py.
Since you cannot run make install as you specified and are instead using the archive they provide, the only other option for creating the Python extension, after building the openzwave library with make build, is to generate the .so files for it without installing to standard locations.
Building the .so for the Cython extension in the same folder as the Cython scripts is done by running:
python setup.py build_ext --inplace
This should create a shared library named libopenzwave.so in src-lib (it is different from the libopenzwave.so contained in the bin/ directory), which contains all the functionality specified in the extension module. You could try adding that to the libopenzwave folder.
If you pass special compiler flags during make build for building the openzwave library, you should specify the same ones when executing the setup-lib.py script. This can be done by setting CFLAGS before executing it (as specified here); otherwise you might run into issues like error adding symbols: File in wrong format.
Here's the description of the python-openzwave's build from the question's perspective. Almost all the steps correspond to the root Makefile's targets.
Prerequisites. There are several independent targets with little to no organization. Most use Debian-specific commands.
Cython is not needed if building from an archive (details below)
openzwave C++ library (openzwave openzwave/.lib/ target).
Build logic: openzwave/Makefile, invoked without parameters (but with inherited environment).
Inputs: openzwave/ subtree (includes libhidapi and libtinyxml, statically linked).
Outputs: openzwave/.lib/libopenzwave.{a,so}
Accepts PREFIX as envvar (/usr/local by default)
The only effect that affects us is: $(PREFIX)/etc/openzwave/ is assigned to a macro which adds a search location for config files (Options.cpp): config/ -> /etc/openzwave/ -> <custom location>.
libopenzwave Python C extension module (install-lib target - yes, the stock Makefile cannot just build it; the target doesn't even have the dependency on the library).
Build logic: setup-lib.py
Inputs: src-lib/, openzwave/.lib/libopenzwave.a
Outputs: build/<...>/libopenzwave.so - yes, the same name as openzwave's output, so avoid confusing them
By default, openzwave is linked statically with the module so you don't need to include the former into a deployment
The module does, however, need the config folder from the library. It is included by the build script when making a package.
Contrary to what Jim says, Cython is not needed to build from an archive; the archive already includes the generated .cpp.
Now, the catch is: the module itself uses pkg_resources to locate its data. So you cannot just drop the .so and config into the current directory and call it a day. You need to make pkg_resources.get_distribution('libopenzwave') succeed.
pkg_resources claims to support "normal filesystem packages, .egg files, and unpacked .egg files."
In particular, I was able to pull this off: make an .egg (setup-lib.py bdist_egg), unpack it into the current directory and rename EGG-INFO to libopenzwave.egg-info (like it is in site-packages). A UserWarning is issued if I don't specifically add the .so's location to PYTHONPATH/sys.path before importing the module (see the sketch after this list).
openzwave, pyozwman and pyozwweb Python packages (install)
These are pure Python packages. The first one uses the C extension module; the others use the first one.
Build logic: setup-api.py, setup-manager.py, setup-web.py
Input: src-*/
Output: (pure Python)
They only use pkg_resources.declare_namespace() so you're gonna be fine with just the proper files/dirs on sys.path without any .egg-info's
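As a rough sketch of the arrangement described above for the libopenzwave extension module (the addon directory path is only illustrative):
import os
import sys

# The directory holding libopenzwave.so and the renamed libopenzwave.egg-info
# must be on sys.path before the import, so pkg_resources finds the metadata.
addon_dir = os.path.dirname(os.path.abspath(__file__))
if addon_dir not in sys.path:
    sys.path.insert(0, addon_dir)

import pkg_resources
pkg_resources.get_distribution('libopenzwave')  # fails if the .egg-info is missing
import libopenzwave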
I built a DLL in VS2010 with boost::python to export some functions to a Python module:
myDLL.cpp:
#include <boost/python.hpp>
#include <string>

using namespace boost::python;

std::string greet() { return "hello, world"; }
int square(int number) { return number * number; }

BOOST_PYTHON_MODULE(getting_started1)
{
    // Add regular functions to the module.
    def("greet", greet);
    def("square", square);
}
Up to here, everything compiles just fine. I then get the myDLL.dll and myDLL.lib files in c:\myDLL\Debug.
According to the boost docs (http://wiki.python.org/moin/boost.python/SimpleExample), I need to add this directory to PYTHONPATH, so I added c:\myDLL\Debug to it:
PYTHONPATH:
C:\Python27;c:\myDLL\Debug;
then, from my .py file, I try to import it:
import getting_started1
print getting_started1.greet()
number = 11
print number, '*', number, '=', getting_started1.square(number)
I have also tried from myDLL import getting_started1, and from getting_started1 import *, and all possible combinations of that sort.
Can anyone tell me how am I supposed to call my module? Thanks
EDIT:
According to cgohlke, there should be a getting_started1.pyd somewhere in my PYTHONPATH when I compile in VS? This file is nonexistent... Do I have to set something different in VS2010? I have a default win32 DLL project.
But the boost doc says "If we build this shared library and put it on our PYTHONPATH" - isn't a shared library on Windows a DLL? Ergo, the DLL should be in the PYTHONPATH?
The standard, portable way to build Python extensions is via distutils. However, Visual Studio 2010 is not a supported compiler for Python 2.7. The following setup.py works for me with Visual Studio 2008 and boost_1_48_0. The build command is python setup.py build_ext --inplace.
# setup.py
from distutils.core import setup
from distutils.extension import Extension

setup(name="getting_started1",
      ext_modules=[
          Extension("getting_started1", ["getting_started1.cpp"],
                    include_dirs=['boost_1_48_0'],
                    libraries=['boost_python-vc90-mt-1_48'],
                    extra_compile_args=['/EHsc', '/FD', '/DBOOST_ALL_DYN_LINK=1'])
      ])
For your Visual Studio 2010 project, try to change the linker output file to getting_started1.pyd instead of myDLL.dll.
I managed to get it working only in Release configuration and not in Debug.
From the project properties, on the General tab, modify Target Extension to .pyd
The project should indeed be a DLL, as you have it
In the Python script you need to specify the location of the dll as in this example:
import sys
sys.path.append("d:\\TheProjectl\\bin\\Release")
import getting_started #Dll name without the extension