My question here may seem really naive, but I never found any clue about it in web resources.
The question is: concerning the install_requires argument of the setup() function (or the setup.cfg file), is it good practice to mention every package used, even Python built-in ones such as os?
One can assume that any Python environment has those common packages, so is it problematic to mention them explicitly in the setup, making it potentially over-verbose?
Thanks
install_requires should include third-party (non-standard-library) requirements, and constraints on their versions (as needed).
For example, this would declare minimal versions for numpy and scipy, but allow any version of scikit-learn:
from setuptools import setup

setup(
# ...
install_requires=["numpy>=1.13.3", "scipy>=0.19.1", "scikit-learn"]
)
Modules such as os and sys are part of Python's standard library, so they should not be included.
As @sinoroc mentioned, only direct third-party dependencies should be declared here. Dependencies of your dependencies are handled automatically (for example, scikit-learn depends on joblib, so when the former is required, the latter will be installed as well).
The standard library modules are listed here: https://docs.python.org/3/library/
I've found it helpful to read other packages' setup.py files and see how they declare their requirements, for example:
imbalanced-learn
pandas
You should list top-level 3rd party dependencies.
Don't list packages and modules from Python's standard library.
Do list 3rd party dependencies your code directly depends on, i.e. the projects containing:
the packages and modules your code imports;
the binaries your code directly calls (in subprocesses for example).
Don't list dependencies of your dependencies.
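The "direct imports" part of this rule can be approximated mechanically. The sketch below is a hypothetical helper (not part of setuptools or pip) that uses the standard ast module to list the top-level modules a piece of source code imports. It cannot see binaries called via subprocess or imports done dynamically, and you would still drop standard-library names and map import names to project names by hand.

```python
import ast


def top_level_imports(source):
    """Collect the top-level module names imported anywhere in some Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import ("from . import x"): your own code
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


code = """
import os
import numpy as np
from scipy import linalg
from . import helpers
"""
print(top_level_imports(code))  # ['numpy', 'os', 'scipy']
```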
install_requires should mention only packages that don't ship with the standard library.
If you're worried that some module might be missing because of the Python version, you can specify that your package requires Python X or later (the python_requires argument of setuptools' setup()).
Note: packaging Python projects is a notoriously hairy thing.
I'd recommend taking a look at pyproject.toml: projects configured with it can be pip-installed like any normal package, and it is used by some of the more modern tools, such as Poetry.
Related
I've gone down the Python packaging and distribution rabbit-hole and am wondering:
Is there ever any reason to provide standard library modules/packages as dependencies to setup() when using setuptools?
And a side note: the only setuptools docs I have found are terrible. Is there anything better?
Thanks
In a word, no. The point of declaring dependencies is to make sure they are available when someone installs your library. Since the standard library is always available with a Python installation, you do not need to include it.
For a more user-friendly guide to packaging, check out Hynek's guide:
Sharing Your Labor of Love: PyPI Quick and Dirty
No. In fact, you should never specify standard modules as setup() requirements, because those requirements are only for downloading packages from PyPI (or VCS repositories or other package indices). Adding, say, "itertools" to install_requires means your package will fail to install: its dependencies can't be satisfied, since there's (currently) no package named "itertools" on PyPI. Some standard modules do share their name with a project on PyPI; in the best case (e.g., argparse), the PyPI project is compatible with the standard module and only exists as a separate project for historical reasons/backwards compatibility. In the worst case... well, use your imagination.
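The distinction this answer relies on, between an importable module and an installed distribution, can be observed with importlib.metadata (Python 3.8+). This sketch assumes an ordinary environment in which nobody has installed a PyPI project called "itertools": the module imports fine, yet no distribution metadata exists for it, because standard-library modules are not distributions at all. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def is_installed_distribution(name):
    """Return True if an installed distribution (project) with this name has metadata."""
    try:
        version(name)
    except PackageNotFoundError:
        return False
    return True


# A standard-library module is importable...
import itertools  # noqa: F401

# ...but there is no installed distribution named "itertools" to satisfy a requirement.
print(is_installed_distribution("itertools"))  # False
```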
I would like to have several Python submodules inside a main package, but I want to distribute them as separate Python packages. So package A should provide 'my_data.source_a', package B should provide 'my_data.source_b', and so on.
I found out that I have to use a namespace package for this, but while trying to figure out the details, I found multiple PEPs covering the problem. PEP 420 seems to be the latest one, and it builds upon PEP 402 and PEP 382.
It's not clear to me what the status of the different PEPs and implementations is. So my question is: is http://pythonhosted.org/distribute/setuptools.html#namespace-packages still the way to go, or how should I build my namespace package?
The Python documentation has a good description of the three ways of creating namespace packages in Python, including guidelines for when to use each of the three methods. Furthermore, this topic is discussed in great depth in a different StackOverflow thread which has a good accepted answer. Finally, if you are someone who would rather read code than documentation, the sample-namespace-packages repo contains examples of namespace packages created using each of the three available methods.
In brief, if you intend your packages to work with Python 3.3 and above, you should use the native namespace packages method. If you intend your packages to work with older versions of Python, you should use the pkgutil method. If you intend to add a namespace package to a namespace that is already using the pkg_resources method, you should continue to use that method.
With native namespace packages, we can remove my_data/__init__.py from both packages and modify our setup.py files to look as follows:
# setup.py file for my_data.source_a
from setuptools import setup, find_namespace_packages
setup(
name="my_data.source_a",
version="0.1",
packages=find_namespace_packages(include=['my_data.*'])
)
# setup.py file for my_data.source_b
from setuptools import setup, find_namespace_packages
setup(
name="my_data.source_b",
version="0.1",
packages=find_namespace_packages(include=['my_data.*'])
)
We need to add the include=['my_data.*'] argument because, by default, find_namespace_packages() is rather lenient about which folders it includes as namespace packages, as described here.
This is the recommended approach for packages supporting Python 3.3 and above.
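To see the native (PEP 420) mechanism at work without installing anything, the sketch below fakes two installed distributions as two temporary directories on sys.path, each shipping one portion of my_data and no my_data/__init__.py. The directory names dist_source_a and dist_source_b and the NAME attribute are purely illustrative assumptions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Simulate two separately installed distributions, each providing one portion
# of the "my_data" namespace and deliberately no my_data/__init__.py.
root = Path(tempfile.mkdtemp())
for portion in ("source_a", "source_b"):
    pkg_dir = root / f"dist_{portion}" / "my_data" / portion
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"NAME = {portion!r}\n")
    sys.path.append(str(root / f"dist_{portion}"))

importlib.invalidate_caches()
import my_data.source_a
import my_data.source_b

print(my_data.source_a.NAME, my_data.source_b.NAME)  # source_a source_b
print(list(my_data.__path__))  # both dist_* directories contribute a portion
```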
With pkgutil-style namespace packages, we need to add the following line to the my_data/__init__.py file in each of our packages:
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
This is the approach used by the backports namespace, and by different packages in the google-cloud-python repo, and it is the recommended approach for supporting older versions of Python.
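A comparable sketch for the pkgutil style, again faking two installed distributions with temporary directories (the names are illustrative assumptions). Here every portion ships an identical my_data/__init__.py containing the extend_path line, and that line is what lets the second portion be found after the first regular package is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Every portion ships an identical my_data/__init__.py with the pkgutil line.
INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"

root = Path(tempfile.mkdtemp())
for portion in ("source_a", "source_b"):
    dist_dir = root / f"dist_{portion}"
    (dist_dir / "my_data" / portion).mkdir(parents=True)
    (dist_dir / "my_data" / "__init__.py").write_text(INIT)
    (dist_dir / "my_data" / portion / "__init__.py").write_text(f"NAME = {portion!r}\n")
    sys.path.append(str(dist_dir))

importlib.invalidate_caches()
import my_data.source_a  # imports the first my_data, whose __init__ extends __path__
import my_data.source_b  # found through the extended __path__

print(my_data.source_a.NAME, my_data.source_b.NAME)  # source_a source_b
```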
Python 3.7, the latest version of Python at the time of writing, supports the native namespace packages approach defined in PEP 420.
There are currently three different approaches to creating namespace packages:
Use native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
Use pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
Use pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
Reference: Packaging namespace packages
My package depends on the latest version of the jsonpickle package. Older versions are installable via pip, but I need the latest version (i.e. the one on GitHub) for it to work. Is it generally considered OK in this situation to bundle the latest version of jsonpickle in my code? Is there some other solution? I'd rather not ask my users to clone from GitHub.
I'm thinking of organising my package like this:
My package/
|-- __init__.py
|-- file1.py
|-- file2.py
\-- jsonpickle/  (latest)
i.e. doing what was done here: Python: importing a sub-package or sub-module
As kag says, this is generally not a good idea. It's not that it's "frowned upon" as being unfriendly to the other packages, it's that it causes maintenance burdens for you and your users. (Imagine that there's a bug that's fixed in jsonpickle that affects your users, but you haven't picked up the fix yet. If you'd done things normally, all they'd have to do is upgrade jsonpickle, but if you're using an internal copy, they have to download the jsonpickle source and yours, hack up your package, and install it all manually.)
Sometimes, it's still worth doing. For example, the very popular requests module includes its own copy of other packages like urllib3. And yes, it does face both of the costs described above. But it also means that each version of requests can rely on an exact, specific version of urllib3. Since requests makes heavy use of urllib3's rarely-used interfaces, and even has workarounds for some of its known bugs, that can be valuable.
In your case, that doesn't sound like the issue. You just need a bleeding-edge version of jsonpickle temporarily, until the upstream maintainers upload a new version to PyPI. The problem isn't that you don't want your users all having different versions; it's that you don't want to force them to clone the repo and figure out how to install it manually. Fortunately, pip takes care of that for you, by wrapping most of the difficulties up in one line:
pip install git+https://github.com/foo/bar
It's not a beautiful solution, but it's only temporary, right?
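One refinement to the one-liner above is to pin the Git install to a tag or commit, so the temporary dependency stays reproducible until the release reaches PyPI. The sketch only builds strings (it does not run pip); foo/bar comes from the example above, while the tag v1.2.3 and both helper names are hypothetical. The second form is the PEP 508 "name @ url" direct reference that requirement lists accept; as far as I know, PyPI refuses uploads whose requirements contain such URLs, so it suits requirements files or private distribution better than a published package.

```python
import sys


def pip_git_install_command(repo_url, ref):
    """Build (but do not run) a pip command installing a Git repository at a fixed ref.

    `ref` may be a tag, branch or commit hash.
    """
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}@{ref}"]


def direct_reference(project, repo_url, ref):
    """PEP 508 direct reference, e.g. for a requirements file or install_requires."""
    return f"{project} @ git+{repo_url}@{ref}"


cmd = pip_git_install_command("https://github.com/foo/bar", "v1.2.3")
print(" ".join(cmd[1:]))  # -m pip install git+https://github.com/foo/bar@v1.2.3
print(direct_reference("bar", "https://github.com/foo/bar", "v1.2.3"))
# bar @ git+https://github.com/foo/bar@v1.2.3
```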
It's generally not the best idea to bundle some dependency with your project. Some projects do it anyway, or bundle it as an alternative if there's no system package available. (This is mostly found in C projects, not Python.)
You didn't mention what "latest" means exactly. Is this the latest version on PyPI?
The best way to make sure a specific version of a package, or any version newer than a baseline, is installed is to properly specify the requirement in setup.py. Read more about the requires section here [1]; note that with setuptools, the field pip actually acts on is install_requires. This way pip can take care of resolving dependencies, and if the required version is available on PyPI, it will be installed automatically.
[1] http://docs.python.org/2/distutils/setupscript.html#relationships-between-distributions-and-packages
I have packages A and B, both have their own git repository, PyPI page, etc... Package A depends on package B, and by using the install_requires keyword I can get A to automatically download and install B.
But suppose I want to go a step further for my especially non-savvy users; I want to actually include package B within the tar/zip for package A, so no download is necessary (this also lets them potentially make any by-hand edits to package B's setup.cfg)
Is there a suggested (ideally automated) way to,
Include B in A when I call sdist for A
Tell setuptools that B is bundled with A for resolving the dependency (something like a local dependency_links)
Thanks!
This is called 'vendorizing' and no, there is no support for such an approach.
It is also a bad idea; you want to leave installation to the specialized tools, which not only manage dependencies but also what versions are installed. With tools like buildout or pip's requirements.txt format you can control very precisely what versions are being used.
By bundling a version of a dependency inside, you either force upon your users what version they get to use, or make it harder for such tools to ensure the used versions for a given installation are consistent. In addition, you are potentially wasting bandwidth and space; if other packages have also included the same requirements, you now have multiple copies. And if your dependency is updated to fix a critical security issue, you have to re-release any package that bundles it.
In the past, some packages did vendor dependencies into their distribution. requests was a prime example; they stepped away from this approach because it complicated their release process. Every time there was a bug in one of the vendored packages, for example, they had to produce a new release to include the fix.
If you do want to persist in including packages, you'll have to write your own support. I believe requests just manually added the vendored packages to their repository, so they kept a static copy that they updated from time to time. Alternatively, you could extend setup.py to download the code when you are creating a distribution.
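If you do write your own support, one possible shape (sketched with a hypothetical mypkg._vendor location; this is not a setuptools feature) is to copy the dependency into a private subpackage and prefer an installed copy when one exists. This fallback keeps the drawbacks described above: different users may silently end up running different versions, and catching ImportError can also mask a broken installed copy.

```python
import importlib


def import_preferring_installed(name, vendored_name):
    """Import `name` if it is installed, otherwise fall back to a vendored copy.

    `vendored_name` would be a dotted path such as "mypkg._vendor.jsonpickle"
    (hypothetical; your own package layout decides it).
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return importlib.import_module(vendored_name)


# Demonstration with standard-library stand-ins: the first name does not exist,
# so the "vendored" module is used instead.
mod = import_preferring_installed("surely_not_installed_xyz", "json")
print(mod.__name__)  # json
```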
I've had my fair share of struggles with Python module management, and every time it's a challenge: packaging is not something people do every day, so it is a burden to learn and a burden to remember, even once you have actually done it, since it normally happens only once.
I would like to collect here the definitive overview of how importing, package management and distribution work in Python, so that this question becomes the definitive explanation for all the magic that happens under the hood. Although I understand the broad scope of the question, these things are so intertwined that no focused answer will solve the main problem: understanding how it all works, what is outdated, what is current, what are just alternatives for the same task, and what the quirks are.
The list of keywords to refer to is the following, but this is just a sample out of the bunch. There's a lot more and you are welcome to add additional details.
PyPI
setuptools / Distribute
distutils
eggs
egg-link
pip
zipimport
site.py
site-packages
.pth files
virtualenv
handling of compiled modules in eggs (with and without installation via easy_install)
use of get_data()
pypm
bento
PEP 376
the cheese shop
eggsecutable
Linking to other answers is probably a good idea. As I said, this question is for the high-level overview.
For the most part, this is an attempt to look at the packaging/distribution side, not the mechanics of import. Unfortunately, packaging is the place where Python provides way more than one way to do it. I'm just trying to get the ball rolling, hopefully others will help fill what I miss or point out mistakes.
First of all, there's some messy terminology here. A directory containing an __init__.py file is a package. However, most of what we're talking about here are specific versions of packages published on PyPI, one of its mirrors, or in a vendor-specific package management system like Debian's Apt, Red Hat's Yum, Fink, MacPorts, Homebrew, or ActiveState's pypm.
These published packages are what folks are trying to call "distributions" going forward, in an attempt to use "package" only for the Python language construct. You can see some of that usage in PEP 376.
Now, your list of keywords relate to several different aspects of the Python Ecosystem:
Finding and publishing python distributions:
PyPI (aka the cheese shop)
PyPI Mirrors
Various package management tools / systems: apt, yum, fink, macports, homebrew
pypm (ActiveState's alternative to PyPI)
The above are all services that provide a place to publish Python distributions in various formats. Some, like PyPI mirrors and apt / yum repositories, can be run on your local machine or within your company's network, but folks typically use the official ones. Most, if not all, provide a tool (or multiple tools in the case of PyPI) to help find and download distributions.
Libraries used to create and install distributions:
setuptools / Distribute
distutils
Distutils is the standard infrastructure on which Python packages are compiled and built into distributions. There's a ton of functionality in distutils but most folks just know:
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward@python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
And to some extent that's most of what you need. With the 9 lines of code above, you have enough information to install a pure Python package, plus the minimal metadata required to publish that package as a distribution on PyPI.
Setuptools provides the hooks necessary to support the Egg format and all of its features and foibles. Distribute is an alternative to Setuptools that adds some features while trying to be mostly backwards compatible. I believe Distribute is going to be included in Python 3 as the successor to distutils' from distutils.core import setup. (In the end this did not happen: Distribute was merged back into setuptools in 2013.)
Both Setuptools and Distribute provide a custom version of the distutils setup command that does useful things like supporting the Egg format.
Python Distribution Formats:
source
eggs
Distributions are typically provided either as source archives (tarball or zipfile) or as eggs. The standard way to install a source distribution is to download and uncompress the archive and then run the setup.py file inside.
For example, the following will download, build, and install the Pygments syntax highlighting library:
curl -O -G http://pypi.python.org/packages/source/P/Pygments/Pygments-1.4.tar.gz
tar -zxvf Pygments-1.4.tar.gz
cd Pygments-1.4
python setup.py build
sudo python setup.py install
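Alternatively, you can let an installer download and install the package for you. Typically this is done with easy_install (which can also install an Egg file directly) or pip (which does not install eggs; it builds and installs from other formats, such as the source distribution):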
sudo easy_install pygments
or
sudo pip install pygments
Eggs were inspired by Java's JAR files, and they have quite a few features you should read about here
Python Package Formats:
uncompressed directories
zipimport (zip compressed directories)
A normal python package is just a directory containing an __init__.py file and an arbitrary number of additional modules or sub-packages. Python also has support for finding and loading source code within *.zip files as long as they are included on the PYTHONPATH (sys.path).
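A small sketch of zip importing: it builds a throwaway archive containing a package (the name zipped_pkg is a made-up example), puts the archive itself on sys.path, and imports from it. No extraction happens; the standard zipimport machinery reads the module straight out of the archive.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a zip archive containing a tiny package (hypothetical name "zipped_pkg").
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_pkg/__init__.py", "GREETING = 'hello from a zip'\n")

# Putting the archive itself on sys.path lets zipimport find modules inside it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
import zipped_pkg

print(zipped_pkg.GREETING)  # hello from a zip
print(zipped_pkg.__file__)  # a path inside bundle.zip
```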
Installing Python Packages:
easy_install: the original egg installation tool, depends on setuptools
pip: currently the most popular way to install python packages. Similar to easy_install but more flexible and has some nice features like requirements files to help document dependencies and reproduce deployments.
pypm, apt, yum, fink, etc
Environment Management / Automated Deployment:
bento
buildout
virtualenv (and virtualenvwrapper)
The above tools are used to help automate and manage dependencies for a Python project. Basically they give you tools to describe what distributions your application requires and automate the installation of those specific versions of your dependencies.
Locations of Packages / Distributions:
site-packages
PYTHONPATH
the current working directory (depends on your OS and environment settings)
By default, installing a python distribution is going to drop it into the site-packages directory. That directory is usually something like /usr/lib/pythonX.Y/site-packages.
A simple programmatic way to find your site-packages directory:
from distutils import sysconfig  # Python < 3.12 only; distutils has since been removed
print(sysconfig.get_python_lib())
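Because distutils was deprecated by PEP 632 and removed in Python 3.12, a lookup based on it fails on current interpreters. A sketch of the same lookup with the standard library's own sysconfig and site modules follows; in the common case the purelib path matches what get_python_lib() returned, though some distributions patch their layouts, so treat the equivalence as usual rather than guaranteed.

```python
import site
import sysconfig

# Standard-library sysconfig (not distutils.sysconfig): the directory where
# pure-Python distributions are installed for the running interpreter.
print(sysconfig.get_paths()["purelib"])

# site-packages directories known to the site module; inside a virtual
# environment these point at the environment's own site-packages.
print(site.getsitepackages())
```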
Ways to modify your PYTHONPATH:
Python's import statement will only find packages that are located in one of the directories included in your PYTHONPATH.
You can inspect and change your path from within Python by accessing:
import sys
print(sys.path)
sys.path.append("/home/myname/lib")
Besides that, you can set the PYTHONPATH environment variable like you would any other environment variable on your OS or you could use:
.pth files: *.pth files located in a site directory (such as site-packages) are read at start-up, and each line of the *.pth file that names an existing directory is added to sys.path. Basically, any time you would copy a package into site-packages, you could instead create a mypackages.pth pointing at its location. Read more about *.pth files: site module
egg-link files: as described in "Internal structure of Python Eggs", they are a cross-platform alternative to symbolic links. Creating an egg-link file is similar to creating a .pth file.
site.py modifications
To add the above /home/myname/lib to sys.path with a *.pth file, you'd create a *.pth file in site-packages. The name of the file doesn't matter, but you should still probably choose something sensible.
Let's create myname.pth:
# myname.pth
/home/myname/lib
That's it. Drop that into the directory returned by sysconfig.get_python_lib() on your system (or any other site directory), and /home/myname/lib will be added to the path.
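To try the .pth mechanism without touching your real site-packages, you can point the site module at a temporary directory: site.addsitedir() applies the same .pth processing that start-up applies to site-packages. Both temporary directories below are stand-ins (assumptions) for site-packages and /home/myname/lib. Note that a .pth line is only added if the directory it names exists.

```python
import site
import sys
import tempfile
from pathlib import Path

# Stand-ins for a site directory and for /home/myname/lib (both temporary).
site_dir = Path(tempfile.mkdtemp())
extra_lib = Path(tempfile.mkdtemp())

# Equivalent of myname.pth: one directory per line.
(site_dir / "myname.pth").write_text(f"{extra_lib}\n")

# addsitedir() adds site_dir to sys.path and processes the .pth files inside it.
site.addsitedir(str(site_dir))

print(str(extra_lib) in sys.path)  # True
```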
For packaging questions, this should help: http://guide.python-distribute.org/
For import, the old article from Fredrik Lundh, http://effbot.org/zone/import-confusion.htm, is still a very good starting point.
I recommend Tarek Ziadé's book on Python. There's a chapter dedicated to packaging and distribution.
I don't think import needs to be explored (Python's namespacing and importing functionality is intuitive IMHO).
I use pip exclusively now. I haven't run into any issues with it.
However, the topic of packaging and distribution is something worth exploring. Instead of giving a lengthy answer, I will say this:
I learned how to package and distribute my own "packages" by simply copying how Pylons or many other open-source packages do it. I then combined that sort-of template with reading up of the docs to flesh it out even further and have come up with a solid distribution method.
When you grok package management and distribution for python (distutils and pypi) it's actually quite powerful. I like it a lot.
[edit]
I also wanted to add a bit about virtualenv. USE IT. I create a virtualenv for every project and I always use --no-site-packages; I install all the packages I need for that particular project (even if it's something common amongst them all, like lxml) inside the virtualenv. It keeps everything isolated, and it's much easier for me to maintain the grouping in my head (rather than trying to keep track of what's where and for which version of Python!)
[/edit]
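Since Python 3.3 the standard library ships venv, a built-in counterpart of virtualenv, and in both tools isolation from the system site-packages is now the default, so the old --no-site-packages flag is no longer needed. A minimal sketch that creates a throwaway environment (the name project-env is arbitrary, and pip is skipped only to keep it fast) and checks the isolation setting recorded in its pyvenv.cfg:

```python
import tempfile
import venv
from pathlib import Path

# Create an isolated environment; system_site_packages=False is the default
# and corresponds to virtualenv's old --no-site-packages behaviour.
env_dir = Path(tempfile.mkdtemp()) / "project-env"
venv.create(env_dir, system_site_packages=False, with_pip=False)

cfg = (env_dir / "pyvenv.cfg").read_text()
print("include-system-site-packages = false" in cfg)  # True
```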