Current state of Python namespace packages

I would like to have several Python submodules inside one main package, but I want to distribute them as separate Python packages. So package A should provide 'my_data.source_a', package B should provide 'my_data.source_b', and so on.
I found out that I have to use a namespace package for this, but while trying to figure out the details, I found multiple PEPs covering the problem. PEP 420 seems to be the latest one, building upon PEP 402 and PEP 382.
To me it's not clear what the status of the different PEPs and implementations is. So my question is: is http://pythonhosted.org/distribute/setuptools.html#namespace-packages still the way to go, or how should I build my namespace package?

The Python documentation has a good description of the three ways of creating namespace packages in Python, including guidelines for when to use each of the three methods. Furthermore, this topic is discussed in great depth in a different StackOverflow thread which has a good accepted answer. Finally, if you are someone who would rather read code than documentation, the sample-namespace-packages repo contains examples of namespace packages created using each of the three available methods.
In brief, if you intend your packages to work with Python 3.3 and above, you should use the native namespace packages method. If you intend your packages to work with older versions of Python, you should use the pkgutil method. If you intend to add a package to a namespace that is already using the pkg_resources method, you should continue to use that method.
With native namespace packages, we can remove __init__.py from both packages and modify our setup.py files to look as follows:
# setup.py file for my_data.source_a
from setuptools import setup, find_namespace_packages
setup(
    name="my_data.source_a",
    version="0.1",
    packages=find_namespace_packages(include=['my_data.*'])
)
# setup.py file for my_data.source_b
from setuptools import setup, find_namespace_packages
setup(
    name="my_data.source_b",
    version="0.1",
    packages=find_namespace_packages(include=['my_data.*'])
)
We need to add the include=['my_data.*'] argument because, by default, find_namespace_packages() is rather lenient about which folders it includes as namespace packages, as described here.
This is the recommended approach for packages supporting Python 3.3 and above.
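For reference, here is a sketch of the source layout this assumes (directory names are illustrative). Each distribution ships its own my_data/ directory, and neither contains a my_data/__init__.py:
my_data_source_a/          # hypothetical distribution for my_data.source_a
    setup.py
    my_data/               # no __init__.py at this level
        source_a/
            __init__.py
my_data_source_b/          # hypothetical distribution for my_data.source_b
    setup.py
    my_data/               # no __init__.py at this level
        source_b/
            __init__.py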
With pkgutil-style namespace packages, we need to add the following line to the my_data.__init__.py files in each of our packages:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
This is the approach used by the backports namespace, and by different packages in the google-cloud-python repo, and it is the recommended approach for supporting older versions of Python.
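A sketch of the corresponding layout (again with illustrative names): unlike the native approach, each distribution keeps a my_data/__init__.py, and it contains exactly the extend_path line above:
my_data_source_a/          # hypothetical distribution
    setup.py
    my_data/
        __init__.py        # contains only the extend_path line
        source_a/
            __init__.py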

As of the latest Python release (Python 3.7 at the time of writing), the recommended way to create namespace packages is the native approach defined in PEP 420.
There are currently three different approaches to creating namespace packages:
Use native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
Use pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
Use pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
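For completeness, the pkg_resources-style declaration from the third approach looks like this; every distribution in the namespace puts this line in its my_data/__init__.py and also declares the namespace in setup() (the package name is illustrative):
# my_data/__init__.py (pkg_resources-style, legacy)
__import__('pkg_resources').declare_namespace(__name__)

# in setup.py:
# setup(..., namespace_packages=['my_data'])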
Reference: Packaging namespace packages

Related

What are ways of using external modules and libraries in Python?

The book "Learning Python" by Mark Lutz states that in Python one can use various types of external modules, which include .py files, .zip archives, C/C++ compiled libraries and others. My question is, how does one usually handle installation of each type of module?
For example, I know that to use a .py module, I simply need to locate it with import. What about something like .dll or .a? Or for example, I found an interesting library on GitHub, which has no installation manual. How do I know which files to import?
Also, are there any ways of installing modules besides pip?
The answer depends on what you want to do.
You can use Ninja, for example, to build C++ modules, and Cython for C; there are packages for almost any type of compiled code.
You can install packages via pip from the PyPI package repository, or from cloned repositories that have a setup.py file inside, as sketched below.
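As a concrete sketch (the repository URL is a placeholder), the two pip routes look like this:
pip install requests                         # from PyPI
git clone https://github.com/user/repo.git   # hypothetical repository
cd repo
pip install .                                # uses the repo's setup.py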
Any other Python-based repo can be used either through a custom build script (which the authors will provide) or by directly importing the relevant Python files. The latter requires you to dive into the code and check which files are relevant.
Also, are there any ways of installing modules besides pip?
Yes. According to Installing Python Modules (Legacy version), modules packaged using distutils should be downloaded, unpacked, and installed by running
python setup.py install
or a similar command. Beware that
The entire distutils package has been deprecated and will be removed
in Python 3.12.
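If you are maintaining such a setup.py-based project today, the usual replacement is to let pip drive the installation instead of invoking setup.py directly:
python -m pip install .   # run from the unpacked source directory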

Python setup config install_requires "good practices"

My question here may seem really naive but I never found any clue about it on web resources.
The question is: for the install_requires argument of the setup() function (or of the setup.cfg file), is it good practice to mention every package used, even built-in Python ones such as os?
One can assume that any Python environment has those common packages, so is it problematic to explicitly mention them in the setup, making it potentially over-verbose?
Thanks
install_requires should include non-standard library requirements, and constraints on their versions (as needed).
For example, this would declare minimal versions for numpy and scipy, but allow any version of scikit-learn:
setup(
    # ...
    install_requires=["numpy>=1.13.3", "scipy>=0.19.1", "scikit-learn"]
)
Packages such as os and sys are part of Python's standard library, so they should not be included.
As @sinoroc mentioned, only direct third-party dependencies should be declared here. Dependencies-of-your-dependencies are handled automatically. (For example, scikit-learn depends on joblib; when the former is required, the latter will be installed.)
The standard library modules are listed here: https://docs.python.org/3/library/
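On Python 3.10 and newer you can also check programmatically whether a top-level module name belongs to the standard library:
import sys

# sys.stdlib_module_names is a frozenset of top-level standard library names (Python 3.10+)
print("os" in sys.stdlib_module_names)      # True: standard library
print("numpy" in sys.stdlib_module_names)   # False: third-party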
I've found it helpful to read other packages and see how their setup.py files are defined.
imbalanced-learn
pandas
You should list top-level 3rd party dependencies.
Don't list packages and modules from Python's standard library.
Do list 3rd party dependencies your code directly depends on, i.e. the projects containing:
the packages and modules your code imports;
the binaries your code directly calls (in subprocesses for example).
Don't list dependencies of your dependencies.
install_requires should mention only packages that are not part of the standard library.
If you're afraid some standard module might be missing on older interpreters, you can instead specify that your package requires a Python version greater than or equal to X.
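A minimal sketch of that constraint (the version bound here is illustrative):
setup(
    # ...
    python_requires=">=3.6"
)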
Note that packaging Python projects is notoriously hairy.
I'd recommend you take a look at pyproject.toml-based projects; these can be pip-installed like any normal package and are used by some of the more modern tools like Poetry.
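As an illustrative sketch (the project name and dependency are made up), a minimal pyproject.toml using setuptools as the build backend might look like this:
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"                 # hypothetical project name
version = "0.1"
requires-python = ">=3.8"
dependencies = ["numpy>=1.13.3"]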

In python 3.6 onwards, do we still need to use the __init__.py file to make python treat a folder as package?

I have read that there is no longer a need to add the __init__.py file in recent versions of Python to treat a folder as a package. However, the official Python documentation does not say this; for example, it still shows examples and documentation using the __init__.py file:
The __init__.py files are required to make Python treat directories
containing the file as packages.
https://docs.python.org/3/tutorial/modules.html#packages
Do we still need to use the __init__.py file to make python treat a folder as package? And are there any advantages/disadvantages to adding/removing this file?
That is true, but only for namespace packages.
There are currently three different approaches to creating namespace packages:
Use native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
Use pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
Use pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
You are probably thinking of native namespace packages.
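A quick sketch of the practical difference (directory and module names are made up). On Python 3.3+ both layouts below are importable, but only the first is a regular package:
# regular package: has __init__.py, lives in exactly one location
# mypkg/
#     __init__.py
#     mod.py

# namespace package: no __init__.py, portions can be spread across sys.path
# mypkg/
#     mod.py

import mypkg
print(mypkg)   # for a namespace package this prints something like: <module 'mypkg' (namespace)>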

pep 420 namespace_packages purpose in setup.py

What is the purpose of the namespace_packages argument in setup.py when working with PEP420 namespace packages (the ones without __init__.py)?
I played with it and saw no difference whether I declared the namespace packages or not.
Both "setup.py install" and "pip install ." worked either way.
I am building an automatic setup.py code generator and would be happy not to handle this if this is not necessary.
As long as you:
aim for Python 3.3 and newer or Python 2.7 with importlib2 dependency installed (a backport of importlib for Python 2),
use a recent version of setuptools for packaging (I think it should be 28.8 or newer)
and use a recent pip version for installing (9.0 and newer will be fine, 8.1.2 will probably also work, but you should test that yourself),
then you can safely omit the namespace_packages keyword argument from your setup scripts.
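In other words, for PEP 420 packages a setup script like the following sketch is enough; the commented-out line is the legacy declaration you can drop (names are illustrative):
from setuptools import setup, find_namespace_packages

setup(
    name="my_data.source_a",   # hypothetical distribution name
    version="0.1",
    packages=find_namespace_packages(include=['my_data.*']),
    # namespace_packages=['my_data'],   # legacy, only needed for pkg_resources-style packages
)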
There is a PyPA's official repository named sample-namespace-packages on GitHub that contains a suite of tests for different possible scenarios of distributions installed that contain namespace packages of each kind. As you can see, the sample packages using the implicit namespace packages don't use namespace_packages arg in their setup scripts (here is one of the scripts) and all of the tests of types pep420 and cross_pep420_pkgutil pass on Python 3; here is the complete results table.
Namespace packages are separate packages that are installed under one top-level name.
Usually two different packages (for example SQLObject and Cheetah3) install two (or more) different top-level packages (sqlobject and Cheetah in my examples).
But what if I have a library that I want to split into parts, allowing those parts to be installed without the rest of the library? I use namespace packages. Example: these two packages are two parts of one library: m_lib and m_lib.defenc. One installs m_lib/defenc.py, which can be used separately; the other installs the rest of the m_lib library. To install the entire library at once I also provide m_lib.full.
PS. All mentioned packages are mine. Source code is provided at GitHub or my personal git hosting.

Is there any reason to list standard library dependencies when using setuptools?

I've gone down the Python packaging and distribution rabbit-hole and am wondering:
Is there ever any reason to provide standard library modules/packages as dependencies to setup() when using setuptools?
And a side note: the only setuptools docs I have found are terrible. Is there anything better?
Thanks
In a word, no. The point of declaring dependencies is to make sure they are available when someone installs your library. Since the standard library is always available with a Python installation, you do not need to include its modules.
For a more user-friendly guide to packaging, check out Hynek's guide:
Sharing Your Labor of Love: PyPI Quick and Dirty
No. In fact, you should never specify standard modules as setup() requirements, because those requirements are only for downloading packages from PyPI (or VCS repositories or other package indices). Adding, say, "itertools" to install_requires would mean that your package fails to install, because its dependencies can't be satisfied: there is (currently) no package named "itertools" on PyPI.
Some standard modules do share their name with a project on PyPI. In the best case (e.g., argparse), the PyPI project is compatible with the standard module and only exists as a separate project for historical reasons/backwards compatibility. In the worst case... well, use your imagination.
