Identifying Packages Using Python's Stable ABI

I have a Python package containing a number of C/C++ extensions built as a single wheel. I'm trying to understand how to ensure the wheel and shared libraries it contains correctly advertise that they use the stable ABI at a particular API version. I build the package using a setup.py that I run this way.
% python setup.py bdist_wheel --py-limited-api=cp34
I think the cp34 part is how I indicate that I'm using the stable ABI and at most the Python 3.4 API. The resulting wheel is named goober-1.2-cp34-abi3-linux_x86_64.whl. The cp34-abi3 part shows the Python and ABI tags. Without --py-limited-api, that part is cp38-cp38, matching my Python 3.8. Is that enough to advertise that my wheel should work with all Python 3.x starting from 3.4, without recompiling? I guess I'd specify cp3 to indicate all 3.x versions.
For the shared libraries, I compile the C/C++ source this way.
% gcc ... -DPy_LIMITED_API=0x03040000 ... blooper.c
In this case, the shared library is named blooper.cpython-38-x86_64-linux-gnu.so, with nothing indicating it supports the stable ABI and the 3.4 API. From PEP 3149 I expected to see that somewhere in the name. Otherwise, won't Python 3.8 be the only version willing to import this module?
Thanks.

It might be surprising, but adding --py-limited-api=cp34 only changes the name of the wheel, not its content - i.e. it will still be the "usual" version, pinned to the Python version with which it was built.
The first step is to create a setup.py which produces a C extension that uses the stable API and declares it as well. To my knowledge distutils has no support for the stable C-API, so setuptools should be used.
Here is a minimal example:
from setuptools import setup, Extension

my_extension = Extension(
    name='foo',
    sources=['foo.c'],
    py_limited_api=True,
    define_macros=[('Py_LIMITED_API', '0x03040000')],
)

kwargs = {
    'name': 'foo',
    'version': '0.1.0',
    'ext_modules': [my_extension],
}
setup(**kwargs)
Important details are:
py_limited_api should be set to True, so the resulting extension will have the correct suffix (e.g. abi3) once built.
the Py_LIMITED_API macro should be set to the correct value, otherwise the non-stable (or the wrong stable) C-API will be used.
The resulting suffix of the extension might also be surprising. The CPython documentation states:
On some platforms, Python will look for and load shared library files
named with the abi3 tag (e.g. mymodule.abi3.so). It does not check if
such extensions conform to a Stable ABI. The user (or their packaging
tools) need to ensure that, for example, extensions built with the
3.10+ Limited API are not installed for lower versions of Python.
"Some platforms" are Linux and MacOS, one can check it by looking at
from importlib.machinery import EXTENSION_SUFFIXES
print(EXTENSION_SUFFIXES)
# ['.cpython-38-x86_64-linux-gnu.so', '.abi3.so', '.so'] on Linux
# ['.cp38-win_amd64.pyd', '.pyd'] on Windows
that means on Linux the result will be foo.abi3.so, and just foo.pyd on Windows (see e.g. this code in setuptools).
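As a quick sanity check (assuming the foo extension built above is on sys.path), one can confirm which file the import machinery actually picks up:

import foo
print(foo.__file__)  # expect e.g. '.../foo.abi3.so' on Linux, '.../foo.pyd' on Windows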
Now, just running
python setup.py bdist_wheel
would build an extension which could be used with any Python version >= 3.4, but pip would not install it for anything other than CPython 3.8 on Linux, because the name of the wheel is foo-0.1.0-cp38-cp38-linux_x86_64.whl. This is the part where, as the documentation says, the packaging tools need to ensure that no version mismatch occurs.
To allow pip to install it for multiple Python versions, the wheel should be created with the --py-limited-api option:
python setup.py bdist_wheel --py-limited-api=cp34
Due to the resulting name foo-0.1.0-cp34-abi3-linux_x86_64.whl, pip will know it is safe to install it for CPython >= 3.4.
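To see for yourself which tags the running interpreter accepts, one can ask the packaging library (the same library pip uses internally); a small sketch:

# Requires the third-party "packaging" library (pip install packaging).
from packaging.tags import sys_tags

for tag in sys_tags():
    print(tag)
# On CPython 3.8/Linux the list includes cp38-cp38-..., cp38-abi3-...,
# and so on down to cp32-abi3-..., which is why a cp34-abi3 wheel matches.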
To be clear: CPython doesn't really know that a C extension with the suffix abi3.so (or .pyd on Windows) can actually be used by the interpreter (it just assumes so in good faith) - it is pip that ensures the right version is installed.

Related

Does Python have a dependency on C compilers like GCC on Linux? What OS libraries does Python depend on?

I'm trying to understand Linux OS library dependencies to effectively run Python 3.9 and make imported pip packages work. Is there a requirement for GCC to be installed for pip modules with C extension modules to run? What system libraries does Python's interpreter (CPython) depend on?
I'm trying to understand Linux OS library dependencies to effectively run Python 3.9 and make imported pip packages work.
Your questions may have pretty broad answers and depend on a bunch of input factors you haven't mentioned.
Is there a requirement for GCC to be installed for pip modules with C extension modules to run?
It depends on how the package is built and shipped. If it is available only as a source distribution (sdist), then yes: a compiler is obviously needed to take the .c files and produce a loadable binary extension (ELF or DLL). Some packages ship binary distributions, where the publisher does the compilation for you. Of course this is more of a burden on the publisher, as they must support many possible target machines.
What system libraries does Python's interpreter depend on?
It depends on a number of things, including which interpreter (there are multiple!) and how it was built and packaged. Even constraining the discussion to CPython (the canonical interpreter), this may vary widely.
The simplest thing to do is whatever your Linux distro has decided for you: just apt install python3 or whatever, and don't think too hard about it. Most distros ship dynamically-linked packages; these will depend on a small number of "common" libraries (e.g. libc, libz, etc.). Some distros statically link the Python library into the interpreter -- IOW the python3 executable will not depend on libpython3.so. Other distros dynamically link against libpython.
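If you want to check how your own CPython was linked, sysconfig exposes the relevant build-time variables (a sketch for Linux; on Windows these variables may be unset):

import sysconfig

# 1 means the interpreter links against a shared libpython, 0 means static
print(sysconfig.get_config_var('Py_ENABLE_SHARED'))
# e.g. 'libpython3.9.so' for a shared build, 'libpython3.9.a' for a static one
print(sysconfig.get_config_var('LDLIBRARY'))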
What dependencies will external modules (e.g. from PyPI) have? Well, that depends entirely on the package in question!
Hopefully this helps you understand the limitations of your question. If you need more specific answers, you'll need to either do your own research, or provide a more specific question.
Python depends on compilers and a lot of other tools if you're going to compile the source (from the repository). This is from the official repository, telling you what you need to compile it from source; check it out.
1.4. Install dependencies
This section explains how to install additional extensions (e.g. zlib) on Linux and macOS/OS X. On Windows, extensions are already included and built automatically.
1.4.1. Linux
For UNIX based systems, we try to use system libraries whenever available. This means optional components will only build if the relevant system headers are available. The best way to obtain the appropriate headers will vary by distribution, but the appropriate commands for some popular distributions are below.
However, if you just want to run Python programs, all you need is the Python binary (and the libraries your script wants to use). The binary is usually at /usr/bin/python3 or /usr/bin/python3.9.
Python GitHub Repository
For individual packages, it depends on the package.
Further reading:
What is PIP?
Official: Managing application dependencies

Create binary python package that depends on system-wide installed libssl.so

I am trying to create a binary python package that can be installed without compilation. The python package consists of only one extension module, written in C using the Python API. The extension module depends on the stable Python ABI by using Py_LIMITED_API with 0x03060000 (3.6). To my knowledge, this means the extension module can work with all CPython versions that are not older than 3.6. I managed to create the sdist package, and I explored the dumb, egg and wheel formats. I managed to create the binary packages, but none of them is perfect for my use case.
The problem is that the extension module depends on libssl.so, and this is its only "external" dependency. Because the python package itself is very stable, it doesn't require frequent releases. Therefore I wouldn't like to bundle libssl.so and take on the burden of releasing new versions because of security updates of OpenSSL (not to mention educating users to update their python package regularly). I think this rules out the wheel format, because linux_*.whl packages cannot be uploaded to PyPI, while a manylinux2014_*.whl package has to include libssl.so (and its dependencies). Dumb packages are not suitable for PyPI-based distribution, and the egg format is not supported by pip, so they are also ruled out.
Because of the stable ABI and the ubiquity of libssl.so, I think it should be possible to release a single binary package for most linux distributions and multiple python versions (similarly to the manylinux tags with wheel). Of course the package would require libssl to be installed on the machine, but that is something I can accept in exchange for better security. And that's where I am stuck.
My question is how can I create a binary python package, which
contains a python extension module,
depends on the system-wide installed libssl.so,
and can be uploaded to PyPI and installed via pip?
I tried to explore other possibilities, but I couldn't find anything else, so if you have any tips on other formats to look into, I would appreciate that as well.

How to structure and distribute Pybind11 extension with stubs?

I'm trying to create and distribute (with pip) a Python package that has Python code, and C++ code compiled to a .pyd file with Pybind11 (using Visual Studio 2019). I also want to include .pyi stub files, for VScode and other editors. I can't find much documentation on doing this correctly.
I'd like to be able to just install the package via pip as normal, and write from mymodule.mysubmodule import myfunc etc like a normal Python package, including autocompletes, type annotations, VScode intellisense etc using the .pyi files I'd write.
My C++ code is in multiple cpp and header files. It uses a few standard libraries, and a few external libraries (such as Boost). It defines a single module, and 2 submodules. I want to be able to distribute this on Windows and Linux, for both x86 and x64. I am currently targeting Python 3.9, and the C++17 standard.
How should I structure and distribute this package? Do I include the C++ source files, and create a setup.py similar to the Pybind11 example? If so, how do I include the external libraries? And how do I structure the .pyi stub files? Does this mean whoever tries to install my package would need a C++ compiler as well?
Or should I compile my C++ to a .pyd/.so file for each platform and architecture? If so, is there a way to specify which one gets installed through pip? And again, how would I structure the .pyi stubs?
Generating .pyi stubs
The pybind11 issue mentions a couple of tools (1, 2) to generate stubs for binary modules. There could be more, but I'm not aware of others. Unfortunately both are far from perfect, so you will probably still need to check and adjust the generated stubs manually.
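For reference, a stub is just ordinary type-hint syntax in a .pyi file, so a corrected stub might look like this minimal sketch (names here are hypothetical, loosely matching the question):

# mymodule/mysubmodule.pyi
def myfunc(value: int, scale: float = 1.0) -> float: ...

class MyClass:
    def method(self, name: str) -> bool: ...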
Distribution of .pyi stubs
After correcting the stubs, you just include those .pyi files in your distribution (e.g. in the wheel or as sources) along with a py.typed marker file, or, alternatively, distribute them separately as a standalone package (e.g. mypackage-stubs).
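A minimal setuptools sketch of the first option, i.e. shipping the stubs and the py.typed marker inside the wheel (package names are placeholders):

from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1.0',
    packages=find_packages(),
    # ship the stubs and the py.typed marker next to the code
    package_data={'mypackage': ['*.pyi', 'py.typed']},
    zip_safe=False,  # type checkers need the files on disk, not in a zip
)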
Building wheels
Wheels allow users of your library to install it in binary form, i.e. without compilation. Binary wheels are typically built with older compilers in order to be compatible with a greater number of systems/platforms, so you might face some trouble with a C++17 library. (C++11 is old enough and should pose no problems with wheels.)
Building wheels for various platforms is tedious; pybind11's python_example uses the cibuildwheel package to do that, and I would recommend this route if you are already using CI.
If a wheel is missing for the target platform, pip will attempt to install from source. This requires a compiler and the 3rd-party libraries you are using to be installed already.
Maybe conda?
If your setup is complex and requires a number of 3rd-party libraries, it might be worth writing a conda recipe and using conda-forge to generate binary versions of the package. Conda is superior to pip here, since it can manage non-Python dependencies as well.

pep 420 namespace_packages purpose in setup.py

What is the purpose of the namespace_packages argument in setup.py when working with PEP 420 namespace packages (the ones without __init__.py)?
I played with it and saw no difference whether I declared the namespace packages or not.
"setup.py install" and "pip install ." worked in any case.
I am building an automatic setup.py code generator and would be happy not to handle this if this is not necessary.
As long as you:
aim for Python 3.3 and newer, or Python 2.7 with the importlib2 dependency installed (a backport of importlib for Python 2),
use a recent version of setuptools for packaging (I think it should be 28.8 or newer)
and use a recent pip version for installing (9.0 and newer will be fine, 8.1.2 will probably also work, but you should test that yourself),
you are on the safe side and can safely omit the namespace_packages keyword arg in your setup scripts.
There is an official PyPA repository named sample-namespace-packages on GitHub that contains a suite of tests for the different possible scenarios of installed distributions containing namespace packages of each kind. As you can see, the sample packages using implicit namespace packages don't use the namespace_packages arg in their setup scripts (here is one of the scripts), and all of the tests of types pep420 and cross_pep420_pkgutil pass on Python 3; here is the complete results table.
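For illustration, a setup script for one part of an implicit (PEP 420) namespace package needs no namespace_packages at all; a minimal sketch with placeholder names, assuming example_ns/ has no __init__.py:

from setuptools import setup, find_namespace_packages

setup(
    name='example-ns-part-a',
    version='0.1.0',
    # find_namespace_packages (setuptools >= 40.1.0) picks up packages
    # that have no __init__.py, e.g. example_ns.part_a
    packages=find_namespace_packages(include=['example_ns.*']),
)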
Namespace packages are separate packages that are installed under one top-level name.
Usually two different packages (for example SQLObject and Cheetah3) install two (or more) different top-level packages (sqlobject and Cheetah in my examples).
But what if I have a library that I want to split into parts, allowing these parts to be installed without the rest of the library? I use namespace packages. Example: these two packages are 2 parts of one library: m_lib and m_lib.defenc. One installs m_lib/defenc.py, which can be used separately; the other installs the rest of the m_lib library. To install the entire library at once I also provide m_lib.full.
PS. All the mentioned packages are mine. Source code is available at GitHub or my personal git hosting.

How do I manage python versions in source control for application?

We have an application that uses pyenv/virtualenv to manage python dependencies. We want to ensure that everyone who works on the application will have the same python version. Coming from ruby, the analog is Gemfile. To a certain degree, .ruby-version.
What's the equivalent in python? Is it .python-version? I've seen quite a few .gitignore that have that in it and usually under a comment ".pyenv". What's the reason for that? And what's the alternative?
Recent versions of setuptools (24.2.0+) allow you to control the supported Python version at the distribution level.
For example, suppose you wanted to allow installation only on a (compatible) version of Python 3.6; you could specify:
# in setup.py
from setuptools import setup

setup(
    ...
    python_requires='~=3.6',
    ...
)
The distribution built by this setup will have associated metadata which prevents installation on an incompatible Python version. Your users need a current version of pip for this feature to work properly; older pip (<9.0.0) will not check this metadata.
If you must extend the requirement to people using older versions of pip, you may put an explicit check on sys.version somewhere at the module level of the setup.py file. However, note that with this workaround the package will still be downloaded by pip - it will only fail later, on the pip install attempt with an incorrect interpreter version.
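Such a guard is plain Python at the top of setup.py; a minimal sketch, assuming a 3.6 floor:

import sys

# Runs even under old pip, before setup() is reached.
if sys.version_info < (3, 6):
    sys.exit('This package requires Python 3.6 or newer.')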
