Python Package and module version management

Python Package and module version management - python

I plan to develop a medium scale web application (with lot of back-end work) using multiple packages (internal and external), since it's my first experience with such scale, can I get some advice on:
How to ensure that all dependencies are satisfied (no required package/module is missing, packages were imported from right location etc.)
How to manage versions of packages/modules. I want to be able to introduce new versions of packages/modules with minimal code changes in other connected packages/modules.
Please let me know if there is a tool that can be of help in this.
If there is any other challenge (which I may not even be aware of) that comes in managing code with this scale then please caution me and provide ways to resolve it.
What I know currently (and think that it may help in code management):
__all__ to define exportable modules.
Use of preceding single underscore to prevent modules from getting imported
__init__.py to manage imports at package level (so when I do import package_name then __init__.py can centrally control other imports)
PEP 328 to manage relative imports, importlib and other such stuff around importing.
I suppose some of the 3rd party packages also define __version__ or VERSION variable to check the version at runtime, but I am not sure if I can rely on it.
If there are books, blogs etc. that can help then please let me know.
EDIT: I am not sure if this type of question fits here on SO. If not then I apologize.

How to ensure that all dependencies are satisfied (no required package/module is missing, packages were imported from right location etc.)
Use a virtualenv and freeze the packages with pip to a requirements file:
$ pip freeze > requirements.txt
$ cat requirements.txt
argparse==1.2.1
distribute==0.6.31
wsgiref==0.1.2
Basically, use virtualenv, distribute and pip whenever you have some code that'll need a module installed.
To install packages according to the version specified in your requirements.txt:
pip install -r requirements.txt

Related

How to make pip check for already installed pgks from multiple directories when installing to a --target dir?

For internal reasons my group shares a conda environment with a number of different groups. This limits flexibility of the package installation, because we don't want to accidentally update dependent packages (I know we live in the past...) To get around the inflexibility my group wants install the packages we develop in a remote directory. Using pip to install the packages works fine using the --target flag to designate the new/remote install folder. We will then modify our PYTHONPATHin our .bashrc to access our newly installed packages via standard import x.
The issue I have is the packages in our setup.py defined in the install_requires=['pandas==0.24.1']are also being installed in the remote directory, even though that requirement is satisfied by the shared python site_packages. What appears to be happening is that pip is installing the dependencies only looking in the remote packages directory. Is there some way install our packages while also having pip look in multiple places for package requirement satisfaction, specifically our python installation's site-packages?
I was thinking pip would use PYTHONPATH to check if a dependency is met, but that does not seem to be the case.
Please let me know if this does not make sense, packaging is still new to me. So i am sure I used the wrong terms all over the place.

I believe using "path configuration files" might help.
Say you have some packages installed in /path/to/external-packages and the regular location for site packages in the current environment is /path/to/site-packages.
Then you could add a file /path/to/site-packages/external-packages.pth with the following content:
/path/to/external-packages
I believe this should at least work for some pip commands: check, list, show, maybe more.
Be careful to read about and experiment with this technique, as it may have undesired side effects. Additionally, if I am not mistaken, there should be no need for modification to PYTHONPATH environment variable.

Are there naming conventions for pip requirements files?

Are there conventions for storing multiple requirements.txt files in a Python code repository. For example, one file for simply running the program, another for day-to-day development, another for making a Windows build.
Some repositories contain two files, requirements.txt and requirements_dev.txt, or requirements.txt and requirements_win.txt - this seems pretty ad-hoc.
I have others with a requires subfolder. But I'm not sure what the meaning of requires/requirements.txt is in this context -- running the application, or for development?
There is no mention of storing multiple requirements files in Structuring your project (Hitchhiker's guide to Python) or pip install (pip documentation).

As far as I am aware, there are no hard-and-fast rules here. At least not via PEP (but someone feel free to correct me).
The Hitchhiker's Guide to Python recommends putting the pip requirements file at the root of your project.
There does not appear to be any requirement that a pip requirements file has to be called requirements.txt. The pip install documentation even uses the example of pip install -r example-requirements.txt.
I would think that the conventions vary from project-to-project and largely depend on your deployment process and project-specific documentation.

requirements.txt vs setup.py

I started working with Python. I've added requirements.txt and setup.py to my project. But, I am still confused about the purpose of both files. I have read that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things. But I am not certain this is accurate.
How are those two files truly intended to be used?

requirements.txt:
This helps you to set up your development environment.
Programs like pip can be used to install all packages listed in the file in one fell swoop. After that you can start developing your python script. Especially useful if you plan to have others contribute to the development or use virtual environments.
This is how you use it:
pip install -r requirements.txt
It can be produced easily by pip itself:
pip freeze > requirements.txt
pip automatically tries to only add packages that are not installed by default, so the produced file is pretty minimal.
setup.py:
This helps you to create packages that you can redistribute.
The setup.py script is meant to install your package on the end user's system, not to prepare the development environment as pip install -r requirements.txt does. See this answer for more details on setup.py.
The dependencies of your project are listed in both files.

The short answer is that requirements.txt is for listing package requirements only. setup.py on the other hand is more like an installation script. If you don't plan on installing the python code, typically you would only need requirements.txt.
The file setup.py describes, in addition to the package dependencies, the set of files and modules that should be packaged (or compiled, in the case of native modules (i.e., written in C)), and metadata to add to the python package listings (e.g. package name, package version, package description, author, ...).
Because both files list dependencies, this can lead to a bit of duplication. Read below for details.
requirements.txt
This file lists python package requirements. It is a plain text file (optionally with comments) that lists the package dependencies of your python project (one per line). It does not describe the way in which your python package is installed. You would generally consume the requirements file with pip install -r requirements.txt.
The filename of the text file is arbitrary, but is often requirements.txt by convention. When exploring source code repositories of other python packages, you might stumble on other names, such as dev-dependencies.txt or dependencies-dev.txt. Those serve the same purpose as dependencies.txt but generally list additional dependencies of interest to developers of the particular package, namely for testing the source code (e.g. pytest, pylint, etc.) before release. Users of the package generally wouldn't need the entire set of developer dependencies to run the package.
If multiplerequirements-X.txt variants are present, then usually one will list runtime dependencies, and the other build-time, or test dependencies. Some projects also cascade their requirements file, i.e. when one requirements file includes another file (example). Doing so can reduce repetition.
setup.py
This is a python script which uses the setuptools module to define a python package (name, files included, package metadata, and installation). It will, like requirements.txt, also list runtime dependencies of the package. Setuptools is the de-facto way to build and install python packages, but it has its shortcomings, which over time have sprouted the development of new "meta-package managers", like pip. Example shortcomings of setuptools are its inability to install multiple versions of the same package, and lack of an uninstall command.
When a python user does pip install ./pkgdir_my_module (or pip install my-module), pip will run setup.py in the given directory (or module). Similarly, any module which has a setup.py can be pip-installed, e.g. by running pip install . from the same folder.
Do I really need both?
Short answer is no, but it's nice to have both. They achieve different purposes, but they can both be used to list your dependencies.
There is one trick you may consider to avoid duplicating your list of dependencies between requirements.txt and setup.py. If you have written a fully working setup.py for your package already, and your dependencies are mostly external, you could consider having a simple requirements.txt with only the following:
# requirements.txt
#
# installs dependencies from ./setup.py, and the package itself,
# in editable mode
-e .
# (the -e above is optional). you could also just install the package
# normally with just the line below (after uncommenting)
# .
The -e is a special pip install option which installs the given package in editable mode. When pip -r requirements.txt is run on this file, pip will install your dependencies via the list in ./setup.py. The editable option will place a symlink in your install directory (instead of an egg or archived copy). It allows developers to edit code in place from the repository without reinstalling.
You can also take advantage of what's called "setuptools extras" when you have both files in your package repository. You can define optional packages in setup.py under a custom category, and install those packages from just that category with pip:
# setup.py
from setuptools import setup
setup(
name="FOO"
...
extras_require = {
'dev': ['pylint'],
'build': ['requests']
}
...
)
and then, in the requirements file:
# install packages in the [build] category, from setup.py
# (path/to/mypkg is the directory where setup.py is)
-e path/to/mypkg[build]
This would keep all your dependency lists inside setup.py.
Note: You would normally execute pip and setup.py from a sandbox, such as those created with the program virtualenv. This will avoid installing python packages outside the context of your project's development environment.

For the sake of completeness, here is how I see it in 3 4 different angles.
Their design purposes are different
This is the precise description quoted from the official documentation (emphasis mine):
Whereas install_requires (in setup.py) defines the dependencies for a single project, Requirements Files are often used to define the requirements for a complete Python environment.
Whereas install_requires requirements are minimal, requirements files often contain an exhaustive listing of pinned versions for the purpose of achieving repeatable installations of a complete environment.
But it might still not easy to be understood, so in next section, there come 2 factual examples to demonstrate how the 2 approaches are supposed to be used, differently.
Their actual usages are therefore (supposed to be) different
If your project foo is going to be released as a standalone library (meaning, others would probably do import foo), then you (and your downstream users) would want to have a flexible declaration of dependency, so that your library would not (and it must not) be "picky" about what exact version of YOUR dependencies should be. So, typically, your setup.py would contain lines like this:
install_requires=[
'A>=1,<2',
'B>=2'
]
If you just want to somehow "document" or "pin" your EXACT current environment for your application bar, meaning, you or your users would like to use your application bar as-is, i.e. running python bar.py, you may want to freeze your environment so that it would always behave the same. In such case, your requirements file would look like this:
A==1.2.3
B==2.3.4
# It could even contain some dependencies NOT strickly required by your library
pylint==3.4.5
In reality, which one do I use?
If you are developing an application bar which will be used by python bar.py, even if that is "just script for fun", you are still recommended to use requirements.txt because, who knows, next week (which happens to be Christmas) you would receive a new computer as a gift, so you would need to setup your exact environment there again.
If you are developing a library foo which will be used by import foo, you have to prepare a setup.py. Period.
But you may still choose to also provide a requirements.txt at the same time, which can:
(a) either be in the A==1.2.3 style (as explained in #2 above);
(b) or just contain a magical single .
.
The latter is essentially using the conventional requirements.txt habit to document your installation step is pip install ., which means to "install the requirements based on setup.py" while without duplication. Personally I consider this last approach kind of blurs the line, adds to the confusion, but it is nonetheless a convenient way to explicitly opt out for dependency pinning when running in a CI environment. The trick was derived from an approach mentioned by Python packaging maintainer Donald in his blog post.
Different lower bounds.
Assuming there is an existing engine library with this history:
engine 1.1.0 Use steam
...
engine 1.2.0 Internal combustion is invented
engine 1.2.1 Fix engine leaking oil
engine 1.2.2 Fix engine overheat
engine 1.2.3 Fix occasional engine stalling
engine 2.0.0 Introducing nuclear reactor
You follow the above 3 criteria and correctly decided that your new library hybrid-engine would use a setup.py to declare its dependency engine>=1.2.0,<2, and then your separated application reliable-car would use requirements.txt to declare its dependency engine>=1.2.3,<2 (or you may want to just pin engine==1.2.3). As you see, your choice for their lower bound number are still subtly different, and neither of them uses the latest engine==2.0.0. And here is why.
hybrid-engine depends on engine>=1.2.0 because, the needed add_fuel() API was first introduced in engine 1.2.0, and that capability is the necessity of hybrid-engine, regardless of whether there might be some (minor) bugs inside such version and been fixed in subsequent versions 1.2.1, 1.2.2 and 1.2.3.
reliable-car depends on engine>=1.2.3 because that is the earliest version WITHOUT known issues, so far. Sure there are new capabilities in later versions, i.e. "nuclear reactor" introduced in engine 2.0.0, but they are not necessarily desirable for project reliable-car. (Your yet another new project time-machine would likely use engine>=2.0.0, but that is a different topic, though.)

TL;DR
requirements.txt lists concrete dependencies
setup.py lists abstract dependencies
A common misunderstanding with respect to dependency management in Python is whether you need to use a requirements.txt or setup.py file in order to handle dependencies.
The chances are you may have to use both in order to ensure that dependencies are handled appropriately in your Python project.
The requirements.txt file is supposed to list the concrete dependencies. In other words, it should list pinned dependencies (using the == specifier). This file will then be used in order to create a working virtual environment that will have all the dependencies installed, with the specified versions.
On the other hand, the setup.py file should list the abstract dependencies. This means that it should list the minimal dependencies for running the project. Apart from dependency management though, this file also serves the package distribution (say on PyPI).
For a more comprehensive read, you can read the article requirements.txt vs setup.py in Python on TDS.
Now going forward and as of PEP-517 and PEP-518, you may have to use a pyproject.toml in order to specify that you want to use setuptools as the build-tool and an additional setup.cfg file to specify the details.
For more details you can read the article setup.py vs setup.cfg in Python.

packaging scientific project in python

I am trying to build a package for an apps in python. It uses sklearn, pandas, numpy, boto and some other scientific module from anaconda. Being very unexperienced with python packaging, I have various questions:
1- I have some confidential files .py in my project which I don't want anyone to be able to see. In java I would have defined private files and classes but I am completely lost in python. What is the "good practice" to deal with these private modules? Can anyone link me some tutorial?
2- What is the best way to package my apps? I don't want to publish anything on Pypi, I only need it to execute on Google App engine for instance. I tried a standalone package with PyInstaller but I could not finish it because of numpy and other scipy packages which makes it hard. Is there a simple way to package in a private way python projects made with anaconda?
3- Since I want to build more apps in a close future, shall I try to make sub-packages in order to use them for other apps?

The convention is to lead with a single underscore _ if something is internal. Note that this is a convention. If someone really wants to use it, they still can. Your code is not strictly confidential.
Take a look at http://python-packaging-user-guide.readthedocs.org/en/latest/. You don't need to publish to pypi to create a Python package that uses tools such as pip. You can create a project with a setup.py file and a requirements.txt file and then use pip to install your package from wherever you have it (e.g., a local directory or a repository on github). If you take this approach then pip will install all the dependencies you list.
If you want to reuse your package, just include it in requirements.txt and the install_requires parameter in setup.py (see http://python-packaging-user-guide.readthedocs.org/en/latest/requirements/). For example, if you install your package with pip install https://github/myname/mypackage.git then you could include https://github/myname/mypackage.git in your requirements.txt file in future projects.

Delete unused packages from requirements file

Is there any easy way to delete no-more-using packages from requirements file?
I wrote a bash script for this task but, it doesn't work as I expected. Because, some packages are not used following their PyPI project names. For example;
dj-database-url
package is used as
dj_database_url
My project has many packages in its own requirements file, so, searching them one-by-one is too messy, error-prone and takes too much time. As I searched, IDEs don't have this property, yet.

You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in,
PyCharm go to Code -> Inspect code....
Choose Whole project option in dialog and click OK.
In inspection results panel locate Package requirements section under Python (note that this section will be showed only if there is any requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt if called for section.
If you want, you can add them one by one, just right click the inspection corresponding to certain package and choose Apply Fix 'Add requirements '<package>' to requirements.txt', repeat for each inspection of this kind.
After that you can create clean virtual environment and install packages from new requirements.txt.
Also note that PyCharm has import optimisation feature, see Optimize imports.... It can be useful to use this feature before any other steps listed above.

The best bet is to use a (fresh) python venv/virtual-env with no packages, or only those you definitely know you need, test your package - installing missing packages with pip as you hit problems which should be quite quick for most software then use the pip freeze command to list the packages you really need. Better you you could use pip wheel to create a wheel with the packages in.
The other approach would be to:
Use pylint to check each file for unused imports and delete them, (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well it is advisable to avoid conditional import and import within functions.
Also note that to be sure you have everything then it is a good idea to build a new venv/virtual-env and install from your dependencies list then re-test your code.

You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.

In pycharm go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.

I've used with success pip-check-reqs.
With command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory
Install it with pip install pip-check-reqs.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.