setup.py adding options (aka setup.py --enable-feature)

I'm looking for a way to include an optional feature in a Python (extension) module at installation time.
In practical terms:
I have a Python library that has 2 implementations of the same function, one internal (slow) and one that depends on an external library (fast, in C).
I want this external library to be optional, activated at compile/install time using a flag like:
python setup.py install # (it doesn't include the fast library)
python setup.py --enable-fast install
I have to use distutils; however, all solutions are welcome!

The docs for distutils include a section on extending the standard functionality. The relevant suggestion seems to be to subclass the relevant classes from the distutils.command.* modules (such as build_py or install) and tell setup to use your new versions (through the cmdclass argument, which is a dictionary mapping commands to classes which are to be used to execute them). See the source of any of the command classes (e.g. the install command) to get a good idea of what one has to do to add a new option.
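For instance, here is a minimal sketch of that approach. It is not the asker's actual project: the module name, source path and option spelling are illustrative, and with a command-level option the flag goes after the command (python setup.py install --enable-fast) rather than before it.

# Sketch: a custom install command that adds an --enable-fast switch.
# All names here (mymod, src/fast.c) are placeholders.
from distutils.core import setup, Extension
from distutils.command.install import install

fast_ext = Extension('mymod._fast', sources=['src/fast.c'])

class InstallCommand(install):
    user_options = install.user_options + [
        ('enable-fast', None, 'also build the fast C implementation'),
    ]
    boolean_options = install.boolean_options + ['enable-fast']

    def initialize_options(self):
        install.initialize_options(self)
        self.enable_fast = False

    def finalize_options(self):
        install.finalize_options(self)
        if not self.enable_fast:
            # drop the optional extension so only the pure-Python code is built
            self.distribution.ext_modules = [
                e for e in (self.distribution.ext_modules or [])
                if e is not fast_ext
            ]

setup(name='mymod',
      version='0.1',
      packages=['mymod'],
      ext_modules=[fast_ext],
      cmdclass={'install': InstallCommand})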

An example of exactly what you want is SQLAlchemy's cextensions, which exist specifically for the same purpose: a faster C implementation. To see how SQLAlchemy implemented it, you need to look at 2 files:
1) setup.py. As you can see from the extract below, they handle the cases with setuptools and distutils:
try:
    from setuptools import setup, Extension, Feature
except ImportError:
    from distutils.core import setup, Extension
    Feature = None
Later there is an if Feature: check, and the extension is configured appropriately for each case using a variable named extra, which is then passed to the setup() call, roughly as sketched below.
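The pattern looks roughly like this (paraphrased from SQLAlchemy's setup.py of that era, with the source path illustrative rather than exact; note that the Feature class has since been removed from newer setuptools):

if Feature:
    extra = {
        'features': {
            'cextensions': Feature(
                "optional C speed-enhancements",
                standard=False,
                ext_modules=[
                    Extension('sqlalchemy.cresultproxy',
                              sources=['lib/sqlalchemy/cresultproxy.c']),
                ],
            ),
        },
    }
else:
    extra = {}

setup(name='SQLAlchemy',
      # ... the usual metadata ...
      **extra)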
2) base.py: here look at how BaseRowProxy is defined:
try:
    from sqlalchemy.cresultproxy import BaseRowProxy
except ImportError:
    class BaseRowProxy(object):
        #....
So basically, once the C extensions are installed (using the --with-cextensions flag during setup), the C implementation will be used. Otherwise, the pure Python implementation of the class/function is used.


CMake C++ library includes toolchain name

I am building a Python extension in C++ using pybind11 and scikit-build. I based it on the example provided at https://github.com/pybind/scikit_build_example/blob/master/setup.py.
My CMakeLists.txt boils down to this:
pybind11_add_module(_mylib MODULE ${SOURCE_FILES})
install(TARGETS _mylib DESTINATION .)
setup.py:
setup(
    name="mylib",
    version="0.0",
    packages=['mylib'],
    cmake_install_dir="mylib",
)
And on the python side I have in mylib/__init__.py:
from _mylib import *
This all works great. I can install the package with pip and importing mylib successfully imports the library by proxy. This proxy is necessary because I do more in the library than just the C++ library.
Except there is one problem: the name of the built library includes the toolchain. On my system it looks like _mylib.cpython-38-x86_64-linux-gnu.so where I expected _mylib.so. __init__.py cannot find the library unless I either adjust the name on the Python side or change the .so name.
How can I resolve this issue?
Conclusion: as Alex said, this part of the name is necessary; see https://www.python.org/dev/peps/pep-3149/. Python will automatically figure out that it can use _mylib.cpython-38-x86_64-linux-gnu.so when you import _mylib.
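To see why the suffix is not a problem, you can check which extension suffixes the interpreter will load (a quick illustrative check, not part of the build):

import importlib.machinery
print(importlib.machinery.EXTENSION_SUFFIXES)
# On the system above this prints something like
# ['.cpython-38-x86_64-linux-gnu.so', '.abi3.so', '.so'],
# so "import _mylib" happily resolves to _mylib.cpython-38-x86_64-linux-gnu.so.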

How should a Python module use code generation?

I have a Python module that is built around a native extension written in C. This extension includes code generated using the GNU Bison and (not GNU) Flex tools. That means the build process for my C extension involves calling these tools and then including their outputs (C source files) in the extension sources.
To get this to work when calling python setup.py install, I extended the setuptools.command.build_ext class to call both Flex and Bison and then add the generated source to the Extension source before calling the super class run method.
This means my setup.py looks like:
import os
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

c_extension = Extension('_mymod',
                        include_dirs = ['inc'],
                        sources = [
                            os.path.join('src', 'lib.c'),
                            os.path.join('src', 'etc.c')
                        ])

class MyBuild(build_ext):
    def run(self):
        parser_dir = os.path.join(self.build_temp, 'parser')
        # add the parser directory to include_dirs
        self.include_dirs.append(parser_dir)
        # add the source files to the sources
        self.extensions[0].sources.extend([
            os.path.join(parser_dir, 'lex.yy.c'),
            os.path.join(parser_dir, 'parse.tab.c')
        ])
        # honor the --dry-run flag
        if not self.dry_run:
            self.mkpath(parser_dir)
            os.system('flex -o ' + os.path.join(parser_dir, 'lex.yy.c') + ' ' + os.path.join('src', 'lex.l'))
            os.system('bison -d -o ' + os.path.join(parser_dir, 'parse.tab.c') + ' ' + os.path.join('src', 'parse.y'))
        # call the super class method
        return build_ext.run(self)

setup(name = 'MyMod',
      version = '0.1',
      description = 'A module that uses external code generation tools',
      author = 'Sean Kauffman',
      packages = ['MyMod'],
      ext_modules = [c_extension],
      cmdclass = {'build_ext': MyBuild},
      python_requires = '>=3',
      zip_safe = False)
Now, however, I am trying to package this module for distribution and I have a problem. Either users who want to install my package need Bison and Flex installed, or I need to run these tools when I build the source distribution. I see two possible solutions:
1) I validate that flex and bison are on the system's execution PATH. This keeps the custom builder as-is. I have found no documentation suggesting I can validate that system executables like bison and flex exist. The closest is using the libraries field of the Extension, but it seems I would need some real hackery to check the entire PATH for executables. I haven't tried this yet because my first choice would be option 2.
2) I move code generation to occur when the source distribution is created. This means the source distribution will contain the output files from bison and flex, so people installing the package don't need these tools. This seems like the cleaner option. I have tried extending the sdist command instead of build_ext like above, but it isn't clear how I can add the generated files to the MANIFEST so they are included. Furthermore, I want to ensure that it still works to build using python setup.py install, but I don't think this command will run sdist before building.
It's fine for any solution to only work on Linux and OS X.
The usual solution for distributing code requiring (f)lex and bison/yacc is to bundle the generated scanner and parser, but be prepared to generate them if they are not present. The second part makes development a little easier and also gives people the option of using their own flex/bison version if they feel they have a good reason to do so. I suppose this advice would also apply to Python modules.
(IANAL but my understanding is that there is a licence exception for the code generated by bison, making it possible to distribute even in non-GPL projects. Flex is not GPL to start with, and afaik there are no distribution restrictions.)
To conditionally build the scanner and parser in a source distribution, you could use the code you have already provided, after verifying that the generated files don't exist. (Ideally, you would regenerate when a generated file doesn't exist or is older than the respective source file. That depends on the file dates not being altered on their voyage through an archive. It will work fine on Linux and OS X, but it might not be completely portable.)
The assumption is that the package is built before executing the sdist command. sdist should normally exclude object files built in the source tree, so it shouldn't be necessary to manually clean the source. However, if you wanted to ensure that the generated files were present when you execute sdist, you could override it in your setup.py the same way you override build_ext, invoking bison and flex prior to calling the base sdist command.
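A sketch of both ideas, building on the MyBuild class from the question: regenerate only when the outputs are missing or older than their sources, and hook sdist so the generated files exist before the archive is rolled. Generating into src/parser (instead of build_temp) is an assumption made here so the files live in the source tree; they still need a MANIFEST.in entry (e.g. recursive-include src/parser *.c *.h) to actually be included.

import os
from setuptools.command.sdist import sdist

def needs_regen(output, source):
    # regenerate if the output is missing or older than its source
    return (not os.path.exists(output)
            or os.path.getmtime(output) < os.path.getmtime(source))

def generate_parser(out_dir):
    lex_out = os.path.join(out_dir, 'lex.yy.c')
    parse_out = os.path.join(out_dir, 'parse.tab.c')
    if needs_regen(lex_out, os.path.join('src', 'lex.l')):
        os.system('flex -o %s %s' % (lex_out, os.path.join('src', 'lex.l')))
    if needs_regen(parse_out, os.path.join('src', 'parse.y')):
        os.system('bison -d -o %s %s' % (parse_out, os.path.join('src', 'parse.y')))

class MySdist(sdist):
    def run(self):
        out_dir = os.path.join('src', 'parser')
        # honor the --dry-run flag, as in MyBuild above
        if not self.dry_run:
            self.mkpath(out_dir)
            generate_parser(out_dir)
        sdist.run(self)

# and in setup(): cmdclass={'build_ext': MyBuild, 'sdist': MySdist}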

Creating a Python package for a C extension-only module which is pre-built

I want to create a package for a project that does not contain any .py source files, but is completely implemented as a Python C extension (resulting in an .so). Additionally, assume that the .so is already built by a separate build process (say CMake).
I know that setuptools/distutils minimally requires a directory structure:
mymodule
    __init__.py
But what I really want is for mymodule to be provided by a C extension (say mymodule.so) such that after installing the package, import mymodule has the same effect as directly importing mymodule.so.
I know that I could have this kind of directory structure:
mymodule
    __init__.py
    mymodule_native.so
and have __init__.py be:
from mymodule_native import *
This kind of works, but an object A imported from mymodule will actually look like mymodule.mymodule_native.A.
Is there a more direct way?
It's possible if the extension is configured by setuptools.
For example:
from setuptools import setup, Extension

extension = Extension('mymodule', sources=[<..>])

setup(name='mymodule', ext_modules=[extension])
Once installed, the extension is available as import mymodule. Note that find_packages is not used.
Note that this has to go through setuptools: without ext_modules, setup() would otherwise require a packages setting.
However, this installs the .so directly under the site-packages directory, and it will conflict with any non-extension Python module of the same name.
This is generally considered bad practice; most libraries use a plain Python package with a single __init__.py, under which the extension is made available.
You may, for example, later want to add Python code to your module and wish to separate the pure Python code from the extension code. Or you may want to add more than one extension, which is not possible this way, at least not while keeping the same package name.
So a structure like mymodule.<python modules> and mymodule.my_extension makes sense.
Personally I'd keep separate namespaces for extension code and Python code, and not do from <ext mod> import * in __init__.py.
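As a sketch of that recommendation (names are illustrative), the extension can be given a dotted name so it is installed inside the package rather than at top level:

from setuptools import setup, Extension

setup(
    name='mymodule',
    packages=['mymodule'],          # contains __init__.py and pure-Python code
    ext_modules=[
        # the dotted name places the compiled module inside the package,
        # importable as mymodule.my_extension
        Extension('mymodule.my_extension', sources=['src/my_extension.c']),
    ],
)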

How to make my package importable without initializing the GPU

I'm writing a Python package that does GPU computing using the PyCUDA library. PyCUDA needs to initialize a GPU device (usually by importing pycuda.autoinit) before any of its submodules can be imported.
In my own modules I import whatever submodules and functions I need from PyCUDA, which means that my own modules are not importable without first initializing PyCUDA. That's fine mostly, because my package does nothing useful without a GPU present. However, now I want to write documentation and Sphinx Autodoc needs to import my package to read the docstrings. It works fine if I put import pycuda.autoinit into docs/conf.py, but I would like for the documentation to be buildable on machines that don't have an NVIDIA GPU such as my own laptop or readthedocs.org.
What's the most elegant way to defer the import of my dependencies so that I can import my own submodules on machines that don't have all the dependencies installed?
The autodoc mechanism requires that all modules to be documented are importable. When this requirement is a problem, mocking (replacing parts of the system with mock objects) can be a solution.
Here is an article that explains how mock objects can be used when working with Sphinx: http://blog.rtwilson.com/how-to-make-your-sphinx-documentation-compile-with-readthedocs-when-youre-using-numpy-and-scipy/.
The gist of the article is that it should work if you add something like this to conf.py:
import sys
import mock  # See http://www.voidspace.org.uk/python/mock/

MOCK_MODULES = ['module1', 'module2', ...]
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = mock.Mock()
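As an aside, reasonably recent Sphinx versions can do this mocking for you via a conf.py setting; listing the top-level package should cover its submodules as well:

# in docs/conf.py
autodoc_mock_imports = ['pycuda']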
The usual method I've seen is to have a module-level function like foo.init() that sets up the GPU/display/whatever that you need at runtime but don't want automatically initialized on import.
You might also consider exposing initialization options here: what if I have 2 CUDA-capable GPUs, but only want to use one of them?
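A minimal sketch of that idea for the PyCUDA case described in the question (the function and variable names here are made up): nothing touches the GPU at import time, and the caller can pick the device.

_context = None

def init(device=0):
    """Explicitly initialize the GPU; safe to call more than once."""
    global _context
    if _context is None:
        import pycuda.driver as cuda   # imported lazily, only when init() is called
        cuda.init()
        _context = cuda.Device(device).make_context()
    return _context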

Adding Version Control / Numbering (?) to Python Project

With my Java projects at present, I have full version control by declaring it as a Maven project. However, I now have a Python project that I'm about to tag 0.2.0 which has no version control. Therefore, should I come across this code at a later date, I won't know what version it is.
How do I add version control to a Python project, in the same way Maven does it for Java?
First, maven is a build tool and has nothing to do with version control. You don't need a build tool with Python -- there's nothing to "build".
Some folks like to create .egg files for distribution. It's as close to a "build" as you get with Python. This is a simple setup.py file.
You can use SVN keyword replacement in your source like this. Remember to enable keyword replacement for the modules that will have this.
__version__ = "$Revision$"
That will assure that the version or revision strings are forced into your source by SVN.
You should also include version keywords in your setup.py file.
Create a distutils setup.py file. This is the Python equivalent to maven pom.xml, it looks something like this:
from distutils.core import setup

setup(name='foo',
      version='1.0',
      py_modules=['foo'],
      )
If you want dependency management like maven, take a look at setuptools.
Ants's answer is correct, but I would like to add that your modules can define a __version__ variable, according to PEP 8, which can be populated manually or via Subversion or CVS, e.g. if you have a module thingy, with a file thingy/__init__.py:
__version__ = '0.2.0'
You can then import this version in setup.py:
from distutils.core import setup
import thingy

setup(name='thingy',
      version=thingy.__version__,
      py_modules=['thingy'],
      )
