PEP 420 namespace_packages purpose in setup.py - Python

What is the purpose of the namespace_packages argument in setup.py when working with PEP 420 namespace packages (the ones without __init__.py)?
I played with it and saw no difference whether I declared the namespace packages or not.
"setup.py install" and "pip install ." worked in any case.
I am building an automatic setup.py code generator and would be happy not to handle this if this is not necessary.

As long as you:
aim for Python 3.3 and newer or Python 2.7 with importlib2 dependency installed (a backport of importlib for Python 2),
use a recent version of setuptools for packaging (I think it should be 28.8 or newer),
and use a recent pip version for installing (9.0 and newer will be fine, 8.1.2 will probably also work, but you should test that yourself),
you are on the safe side and can omit the namespace_packages keyword arg in your setup scripts.
There is an official PyPA repository named sample-namespace-packages on GitHub that contains a suite of tests for different possible scenarios of installed distributions containing namespace packages of each kind. As you can see, the sample packages using implicit namespace packages don't use the namespace_packages arg in their setup scripts (here is one of the scripts), and all of the tests of types pep420 and cross_pep420_pkgutil pass on Python 3; here is the complete results table.
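For illustration, here is a minimal sketch of a setup script for one part of such an implicit namespace package (all names are hypothetical, and it assumes a setuptools new enough to provide find_namespace_packages, i.e. 40.1.0+); note the absence of both the namespace_packages argument and a my_ns/__init__.py file:
# setup.py for a hypothetical distribution that contributes
# my_ns.part_a to the implicit (PEP 420) namespace my_ns
from setuptools import setup, find_namespace_packages

setup(
    name='my-ns-part-a',
    version='0.1.0',
    # picks up my_ns.part_a; my_ns itself has no __init__.py
    packages=find_namespace_packages(include=['my_ns.*']),
)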

Namespace packages are separate packages that are installed under one top-level name.
Usually two different packages (for example SQLObject and Cheetah3) install two (or more) different top-level packages (sqlobject and Cheetah in my examples).
But what if I have a library that I want to split into parts, allowing these parts to be installed without the rest of the library? I use namespace packages. Example: these two packages are two parts of one library: m_lib and m_lib.defenc. One installs m_lib/defenc.py, which can be used separately; the other installs the rest of the m_lib library. To install the entire library at once I also provide m_lib.full.
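A sketch of what such a split might look like on disk (illustrative, not the actual m_lib sources):
# distribution "m_lib.defenc" ships only:
#     m_lib/defenc.py
# distribution "m_lib" ships the rest:
#     m_lib/<the other modules>.py
# Neither ships m_lib/__init__.py when using PEP 420 implicit
# namespace packages, so after installing both distributions,
# the parts compose under the single top-level name m_lib.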
PS. All mentioned packages are mine. Source code is available on GitHub and on my personal git hosting.

Related

Identifying Packages Using Python's Stable ABI

I have a Python package containing a number of C/C++ extensions built as a single wheel. I'm trying to understand how to ensure the wheel and shared libraries it contains correctly advertise that they use the stable ABI at a particular API version. I build the package using a setup.py that I run this way.
% python setup.py bdist_wheel --py-limited-api=cp34
I think the cp34 part is how I indicate that I'm using the stable ABI and at most the Python 3.4 API. The resulting wheel is named goober-1.2-cp34-abi3-linux_x86_64.whl. The cp34-abi3 part shows the Python and ABI tags. Without --py-limited-api, that part is cp38-cp38, matching my Python 3.8. Is that enough to advertise that my wheel should work with all Python 3.x starting from 3.4, without recompiling? I guess I'd specify cp3 to indicate all 3.x versions.
For the shared libraries, I compile the C/C++ source this way.
% gcc ... -DPy_LIMITED_API=0x03040000 ... blooper.c
In this case, the shared library is named blooper.cpython-38-x86_64-linux-gnu.so, with nothing indicating it supports the stable ABI and the 3.4 API. From PEP 3149 I expected to see that somewhere in the name. Otherwise, won't Python 3.8 be the only version willing to import this module?
Thanks.
It might be surprising, but adding --py-limited-api=cp34 only changes the name of the wheel, not its content - i.e. it will still be the "usual" version, pinned to the Python version with which it was built.
The first step is to create a setup.py which produces a C extension that uses the stable API and declares it as well. To my knowledge distutils has no support for the stable C API, so setuptools should be used.
Here is a minimal example:
from setuptools import setup, Extension

my_extension = Extension(
    name='foo',
    sources=['foo.c'],
    py_limited_api=True,
    define_macros=[('Py_LIMITED_API', '0x03040000')],
)

setup(
    name='foo',
    version='0.1.0',
    ext_modules=[my_extension],
)
Important details are:
py_limited_api should be set to True, so that the resulting extension has the correct suffix (e.g. abi3) once built.
The Py_LIMITED_API macro should be set to the correct value, otherwise a non-stable or the wrong stable C API will be used.
The resulting suffix of the extension might also be surprising. The CPython documentation states:
On some platforms, Python will look for and load shared library files
named with the abi3 tag (e.g. mymodule.abi3.so). It does not check if
such extensions conform to a Stable ABI. The user (or their packaging
tools) need to ensure that, for example, extensions built with the
3.10+ Limited API are not installed for lower versions of Python.
"Some platforms" are Linux and MacOS, one can check it by looking at
from importlib.machinery import EXTENSION_SUFFIXES
print(EXTENSION_SUFFIXES)
# ['.cpython-38m-x86_64-linux-gnu.so', '.abi3.so', '.so'] on Linux
# ['.cp38-win_amd64.pyd', '.pyd'] on Windows
that means on Linux the result will be foo.abi3.so, and just foo.pyd on Windows (see e.g. this code in setuptools).
Now, just running
python setup.py bdist_wheel
would build an extension which could be used with any Python version >= 3.4, but pip would not install it for anything other than CPython 3.8 on Linux (because the name of the wheel is foo-0.1.0-cp38-cp38-linux_x86_64.whl). This is the part, from the documentation quoted above, where the packaging system needs to ensure that it doesn't come to a version mismatch.
To allow pip to install it for multiple Python versions, the wheel should be created with --py-limited-api:
python setup.py bdist_wheel --py-limited-api=cp34
Due to the resulting name foo-0.1.0-cp34-abi3-linux_x86_64.whl, pip will know it is safe to install for CPython >= 3.4.
To make it clear: CPython doesn't really know that the C extension with suffix abi3.so (or .pyd on Windows) can really be used by the interpreter (it just assumes so in good faith) - it is pip that ensures the right version is installed.

How does setuptools install test dependencies with the python setup.py test command

When I use the command python setup.py test, all of the documentation I've seen says setuptools will handle installing the testing dependencies. Where does it install them and are they deleted from the machine after the test suite runs? I've noticed none of the testing modules are actually installed into my virtual environment after this command completes.
I understand it takes all of the modules in the tests_require list and installs them somewhere but I'm not sure where, what it does with them afterward and why it does this. Also, is there any way to pass arguments to the command without using flags, like with a config file or something?
Avoid python setup.py test and tests_require; they're crufty and are now deprecated.
That old feature just downloads the test deps to the project's setup directory, which is seldom what the developer wanted or expected to happen! That doesn't work well in modern CI workflows with virtual environments, where you would want your dependencies installed to site-packages.
The recommended way to do it with setuptools these days is with an extras_require tag. See here for an example.
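For instance, a minimal sketch (package and extra names are hypothetical) that declares the test dependencies as an extra:
# setup.py
from setuptools import setup

setup(
    name='mypkg',
    version='0.1.0',
    packages=['mypkg'],
    extras_require={
        # installed into site-packages like any other dependency
        'test': ['pytest'],
    },
)
You would then run pip install -e .[test] in your virtual environment and invoke pytest directly, instead of relying on python setup.py test.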
It installs them into an automatically created subdirectory of the code base named .eggs, as eggs. That's because eggs are designed to be importable from any location.
This will thus most likely not work in a modern environment, because packages are no longer distributed as eggs (they lost the competition to wheels), so setuptools has to build them from source (with bdist_egg). This is likely to fail for many widely-used binary packages with nontrivial build requirements (not to mention the time needed, and the fact that packages are not tested as eggs either and may fail when packaged like this).
Instead, listing build requirements in requirements.txt and invoking pip install -r requirements.txt before the build seems to have become widespread practice. This does not make the package automatically buildable from source by pip, though.
I tried to install them myself from setup.py but this proved to be fragile (e.g. if the user doesn't have write access to site-packages).
The best solution, adopted by at least a number of high-profile projects, seems to be to just make setup.py fail if they are not present. This is especially useful if the requirements are not Python but C libraries, as setup.py doesn't know how to install those in the specific environment anyway. As you can see, this complements requirements.txt naturally.
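A sketch of that fail-fast approach (the checked module is hypothetical) would be to verify at the top of setup.py that the build requirements are importable:
# at the top of setup.py
import sys

try:
    import numpy  # hypothetical build-time requirement
except ImportError:
    sys.exit("numpy is required to build this package; "
             "run: pip install -r requirements.txt")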

requirements.txt vs setup.py

I started working with Python. I've added requirements.txt and setup.py to my project. But I am still confused about the purpose of both files. I have read that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things, but I am not certain this is accurate.
How are those two files truly intended to be used?
requirements.txt:
This helps you to set up your development environment.
Programs like pip can be used to install all packages listed in the file in one fell swoop. After that you can start developing your python script. Especially useful if you plan to have others contribute to the development or use virtual environments.
This is how you use it:
pip install -r requirements.txt
It can be produced easily by pip itself:
pip freeze > requirements.txt
pip automatically tries to only add packages that are not installed by default, so the produced file is pretty minimal.
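For example, the produced file might look like this (package names and versions are purely illustrative):
requests==2.24.0
urllib3==1.25.10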
setup.py:
This helps you to create packages that you can redistribute.
The setup.py script is meant to install your package on the end user's system, not to prepare the development environment as pip install -r requirements.txt does. See this answer for more details on setup.py.
The dependencies of your project are listed in both files.
The short answer is that requirements.txt is for listing package requirements only. setup.py on the other hand is more like an installation script. If you don't plan on installing the python code, typically you would only need requirements.txt.
The file setup.py describes, in addition to the package dependencies, the set of files and modules that should be packaged (or compiled, in the case of native modules (i.e., written in C)), and metadata to add to the python package listings (e.g. package name, package version, package description, author, ...).
Because both files list dependencies, this can lead to a bit of duplication. Read below for details.
requirements.txt
This file lists python package requirements. It is a plain text file (optionally with comments) that lists the package dependencies of your python project (one per line). It does not describe the way in which your python package is installed. You would generally consume the requirements file with pip install -r requirements.txt.
The filename of the text file is arbitrary, but is often requirements.txt by convention. When exploring source code repositories of other Python packages, you might stumble on other names, such as dev-dependencies.txt or dependencies-dev.txt. Those serve the same purpose as requirements.txt but generally list additional dependencies of interest to developers of the particular package, namely for testing the source code (e.g. pytest, pylint, etc.) before release. Users of the package generally wouldn't need the entire set of developer dependencies to run the package.
If multiple requirements-X.txt variants are present, then usually one will list runtime dependencies and the other build-time or test dependencies. Some projects also cascade their requirements files, i.e. one requirements file includes another file (example). Doing so can reduce repetition.
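A minimal sketch of such a cascade (the file names are conventional, the contents hypothetical):
# requirements-dev.txt
-r requirements.txt   # pull in the runtime dependencies first
pytest==6.0.1         # developer-only additions
pylint==2.6.0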
setup.py
This is a Python script that uses the setuptools module to define a Python package (name, files included, package metadata, and installation). Like requirements.txt, it will also list runtime dependencies of the package. Setuptools is the de-facto way to build and install Python packages, but it has its shortcomings, which over time have spurred the development of new "meta-package managers", like pip. Example shortcomings of setuptools are its inability to install multiple versions of the same package and its lack of an uninstall command.
When a python user does pip install ./pkgdir_my_module (or pip install my-module), pip will run setup.py in the given directory (or module). Similarly, any module which has a setup.py can be pip-installed, e.g. by running pip install . from the same folder.
Do I really need both?
Short answer is no, but it's nice to have both. They achieve different purposes, but they can both be used to list your dependencies.
There is one trick you may consider to avoid duplicating your list of dependencies between requirements.txt and setup.py. If you have written a fully working setup.py for your package already, and your dependencies are mostly external, you could consider having a simple requirements.txt with only the following:
# requirements.txt
#
# installs dependencies from ./setup.py, and the package itself,
# in editable mode
-e .
# (the -e above is optional). You could also just install the package
# normally with just the line below (after uncommenting)
# .
The -e is a special pip install option that installs the given package in editable mode. When pip install -r requirements.txt is run on this file, pip will install your dependencies via the list in ./setup.py. The editable option will place a symlink in your install directory (instead of an egg or archived copy). It allows developers to edit code in place from the repository without reinstalling.
You can also take advantage of what's called "setuptools extras" when you have both files in your package repository. You can define optional packages in setup.py under a custom category, and install those packages from just that category with pip:
# setup.py
from setuptools import setup
setup(
    name="FOO",
    ...
    extras_require={
        'dev': ['pylint'],
        'build': ['requests'],
    },
    ...
)
and then, in the requirements file:
# install packages in the [build] category, from setup.py
# (path/to/mypkg is the directory where setup.py is)
-e path/to/mypkg[build]
This would keep all your dependency lists inside setup.py.
Note: You would normally execute pip and setup.py from a sandbox, such as those created with the program virtualenv. This will avoid installing python packages outside the context of your project's development environment.
For the sake of completeness, here is how I see it from 4 different angles.
Their design purposes are different
This is the precise description quoted from the official documentation (emphasis mine):
Whereas install_requires (in setup.py) defines the dependencies for a single project, Requirements Files are often used to define the requirements for a complete Python environment.
Whereas install_requires requirements are minimal, requirements files often contain an exhaustive listing of pinned versions for the purpose of achieving repeatable installations of a complete environment.
But it might still not be easy to understand, so in the next section there are two concrete examples demonstrating how the two approaches are supposed to be used differently.
Their actual usages are therefore (supposed to be) different
If your project foo is going to be released as a standalone library (meaning others would probably do import foo), then you (and your downstream users) would want a flexible declaration of dependencies, so that your library would not (and must not) be "picky" about the exact versions of YOUR dependencies. So, typically, your setup.py would contain lines like this:
install_requires=[
    'A>=1,<2',
    'B>=2',
]
If you just want to somehow "document" or "pin" your EXACT current environment for your application bar, meaning you or your users would like to use your application bar as-is, i.e. running python bar.py, you may want to freeze your environment so that it always behaves the same. In that case, your requirements file would look like this:
A==1.2.3
B==2.3.4
# It could even contain some dependencies NOT strictly required by your library
pylint==3.4.5
In reality, which one do I use?
If you are developing an application bar which will be used by python bar.py, even if that is "just a script for fun", you are still recommended to use requirements.txt because, who knows, next week (which happens to be Christmas) you might receive a new computer as a gift, and you would need to set up your exact environment there again.
If you are developing a library foo which will be used by import foo, you have to prepare a setup.py. Period.
But you may still choose to also provide a requirements.txt at the same time, which can:
(a) either be in the A==1.2.3 style (as explained in #2 above);
(b) or just contain a magical single .
.
The latter essentially uses the conventional requirements.txt habit to document that your installation step is pip install ., which means "install the requirements based on setup.py", without duplication. Personally I consider this last approach to somewhat blur the line and add to the confusion, but it is nonetheless a convenient way to explicitly opt out of dependency pinning when running in a CI environment. The trick was derived from an approach mentioned by Python packaging maintainer Donald Stufft in his blog post.
Different lower bounds.
Assuming there is an existing engine library with this history:
engine 1.1.0 Use steam
...
engine 1.2.0 Internal combustion is invented
engine 1.2.1 Fix engine leaking oil
engine 1.2.2 Fix engine overheat
engine 1.2.3 Fix occasional engine stalling
engine 2.0.0 Introducing nuclear reactor
You follow the above 3 criteria and correctly decide that your new library hybrid-engine would use a setup.py to declare its dependency engine>=1.2.0,<2, and that your separate application reliable-car would use requirements.txt to declare its dependency engine>=1.2.3,<2 (or you may want to just pin engine==1.2.3). As you can see, your choices for their lower bounds are still subtly different, and neither of them uses the latest engine==2.0.0. Here is why.
hybrid-engine depends on engine>=1.2.0 because the needed add_fuel() API was first introduced in engine 1.2.0, and that capability is a necessity for hybrid-engine, regardless of whether there might be some (minor) bugs in that version that were fixed in the subsequent versions 1.2.1, 1.2.2 and 1.2.3.
reliable-car depends on engine>=1.2.3 because that is the earliest version WITHOUT known issues, so far. Sure, there are new capabilities in later versions, i.e. the "nuclear reactor" introduced in engine 2.0.0, but they are not necessarily desirable for project reliable-car. (Yet another new project of yours, time-machine, would likely use engine>=2.0.0, but that is a different topic.)
TL;DR
requirements.txt lists concrete dependencies
setup.py lists abstract dependencies
A common misunderstanding with respect to dependency management in Python is whether you need to use a requirements.txt or setup.py file in order to handle dependencies.
The chances are you may have to use both in order to ensure that dependencies are handled appropriately in your Python project.
The requirements.txt file is supposed to list the concrete dependencies. In other words, it should list pinned dependencies (using the == specifier). This file will then be used in order to create a working virtual environment that will have all the dependencies installed, with the specified versions.
On the other hand, the setup.py file should list the abstract dependencies. This means that it should list the minimal dependencies for running the project. Apart from dependency management though, this file also serves the package distribution (say on PyPI).
For a more comprehensive read, you can read the article requirements.txt vs setup.py in Python on TDS.
Now, going forward, as of PEP 517 and PEP 518, you may have to use a pyproject.toml file to specify that you want to use setuptools as the build tool, and an additional setup.cfg file to specify the details.
For more details you can read the article setup.py vs setup.cfg in Python.
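A minimal pyproject.toml for such a setuptools-based build might look like this (a sketch per PEP 518; the version pins are illustrative):
# pyproject.toml
[build-system]
requires = ["setuptools>=40.8.0", "wheel"]
build-backend = "setuptools.build_meta"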

How do I manage Python versions in source control for an application?

We have an application that uses pyenv/virtualenv to manage python dependencies. We want to ensure that everyone who works on the application will have the same python version. Coming from ruby, the analog is Gemfile. To a certain degree, .ruby-version.
What's the equivalent in Python? Is it .python-version? I've seen quite a few .gitignore files that have it in them, usually under a ".pyenv" comment. What's the reason for that? And what's the alternative?
Recent versions of setuptools (24.2.0+) allow you to control the Python version at the distribution level.
For example, suppose you wanted to allow installation only on a (compatible) version of Python 3.6; you could specify:
# in setup.py
from setuptools import setup
setup(
    ...
    python_requires='~=3.6',
    ...
)
The distribution built by this setup would have associated metadata preventing installation on an incompatible Python version. Your clients need a current version of pip for this feature to work properly; older pip (<9.0.0) will not check this metadata.
If you must extend the requirement to people using older versions of pip, you may put an explicit check on sys.version somewhere at module level in the setup.py file. However, note that with this workaround the package will still be downloaded by pip - it will only fail later, on the pip install attempt with an incorrect interpreter version.
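Such a module-level guard might look like this (a sketch; the version bound is illustrative):
# at the top of setup.py
import sys

if sys.version_info < (3, 6):
    sys.exit("This package requires Python 3.6 or newer.")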

Current state of python namespace packages

I would like to have several python sub modules inside a main module, but I want to distribute them as separated python packages. So package A should provide 'my_data.source_a', package B should provide 'my_data.source_b', ... and so on.
I found out that I have to use a namespace package for this, but while trying to figure out the details, I found multiple PEPs covering the problem. PEP 420 seems to be the latest one, which builds upon PEP 402 and PEP 382.
To me it's not clear what the status of the different PEPs and implementations is. So my question is: is http://pythonhosted.org/distribute/setuptools.html#namespace-packages still the way to go, or how should I build my namespace package?
The Python documentation has a good description of the three ways of creating namespace packages in Python, including guidelines for when to use each of the three methods. Furthermore, this topic is discussed in great depth in a different StackOverflow thread which has a good accepted answer. Finally, if you are someone who would rather read code than documentation, the sample-namespace-packages repo contains examples of namespace packages created using each of the three available methods.
In brief, if you intend your packages to work with Python versions 3.3 and above, you should use the native namespace packages method. If you intend your packages to work with older versions of Python, you should use the pkgutil method. If you intend to add a namespace package to a namespace that is already using the pkg_resources method, you should continue to use that method.
With native namespace packages, we can remove __init__.py from both packages and modify our setup.py files to look as follows:
# setup.py file for my_data.source_a
from setuptools import setup, find_namespace_packages
setup(
    name="my_data.source_a",
    version="0.1",
    packages=find_namespace_packages(include=['my_data.*'])
)

# setup.py file for my_data.source_b
from setuptools import setup, find_namespace_packages
setup(
    name="my_data.source_b",
    version="0.1",
    packages=find_namespace_packages(include=['my_data.*'])
)
We need to add the include=['my_data.*'] argument because, by default, find_namespace_packages() is rather lenient in the folders that it includes as namespace packages, as described here.
This is the recommended approach for packages supporting Python 3.3 and above.
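For reference, a sketch of the corresponding source layout (note that the my_data directories deliberately contain no __init__.py):
my_data_source_a/
    setup.py
    my_data/
        source_a/
            __init__.py
my_data_source_b/
    setup.py
    my_data/
        source_b/
            __init__.py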
With pkgutil-style namespace packages, we need to add the following line to the my_data.__init__.py files in each of our packages:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
This is the approach used by the backports namespace, and by different packages in the google-cloud-python repo, and it is the recommended approach for supporting older versions of Python.
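The layout is the same as in the native example above, except that each distribution now also ships a my_data/__init__.py containing only that line (a sketch):
my_data_source_a/
    setup.py
    my_data/
        __init__.py    # contains only the pkgutil line above
        source_a/
            __init__.py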
The latest version of Python, which is Python 3.7, uses the native namespace packages approach to create namespace packages, which are defined in PEP 420.
There are currently three different approaches to creating namespace packages:
Use native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
Use pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
Use pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
Reference: Packaging namespace packages
