Python upgrade and requirements compatibility

Python upgrade and requirements compatibility - python

I have a Python 3.6 project that I want to migrate to Python 3.8. I also have a
requirements.txt file generated with pip freeze, which is used by the Python 3.6 project. Is there a way to know if the packages listed in requirements.txt, with their own specific pinned version, support/are compatible with Python 3.8?
I could imagine some ways to do that, like check the packages classifiers or look at tox.ini and so on, but the requirements file has ~300 packages listed and doing that manually would be cumbersome at best.

If you do a pip install -r requirements.txt under the new version of Python (in a venv), it'll tell you if it can't find that particular version of a package for it
If there are several missing, it'll be a bit annoying, because it'll only tell you one at a time, but hopefully there won't be too many
Not sure if there's an official documented procedure in the docs, but this should give you a quick idea, especially on the happy path where it just works

Related

Python dependency management best practices

I have a little Python side project which is experiencing some growing pains, wondering how people on larger Python projects manage this issue.
The project is Python/Flask/Docker deployed to AWS. Listed dependencies (that we import directly in the project) are installed from a requirements.txt file with explicit version numbers. We added the version numbers after noticing our new deployments (which rebuild Docker/dependencies etc) would sometimes install newer versions of the packages, causing the project to break.
The issue we're facing now is that an onboarding developer is setting up her environment and facing the same issue - this time with sub-dependencies of the original dependencies. (For example, Flask might install Werkskreug, Jinja2, etc and if some of these are the wrong version, the app breaks.) The obvious solution is to go through each sub-dependency and list out every package, with explicit versions, in requirements.txt. But this is a bit of a pain so I'm asking around to see what people do on Real Projects.
You guys can't be doing this all manually, right? In JS we have NPM and package.lock files and so on - they're automatically built. Is there some equivalent in Python? Have I missed something basic that we should be using here?
Thanks in advance

I wrote a tool that might be helpful for this called realreq.. You can install it from pip pip install realreq. It will generate the requirements you have by reading through your source files and recursively specifying their requirements.
realreq --deep -s /path/to/source will fully specify your dependencies and their sub-dependencies. Note that if you are using a virtual environment you need to have it activated for realreq to be able to find the dependencies, and they must be installed. (i.e realreq needs to be ran in an environment where the dependencies are installed). One of your engineers who has a setup env can run it and then pass the output as a requirements.txt file to your new engineers.

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?

You can think of virtualenv as an isolation for every package you install using pip. It is a simple way to handle different versions of python and packages. For instance you have two projects which use same packages but different versions of them. So, by using virtualenv you can isolate those two projects and install different version of packages separately, not on your working system.
Now, let's say, you want work on a project with your friend. In order to have the same packages installed you have to share somehow what versions and which packages your project depends on. If you are delivering a reusable package (a library) then you need to distribute it and here where setup.py helps. You can learn more in Quick Start
However, if you work on a web site, all you need is to put libraries versions into a separate file. Best practice is to create separate requirements for tests, development and production. In order to see the format of the file - write pip freeze. You will be presented with a list of packages installed on the system (or in the virtualenv) right now. Put it into the file and you can install it later on another pc, with completely clear virtualenv using pip install -r development.txt
And one more thing, please do not put strict versions of packages like pip freeze shows, most of time you want >= at least X.X version. And good news here is that pip handles dependencies by its own. It means you do not have to put dependent packages there, pip will sort it out.
Talking about deploy, you may want to check tox, a tool for managing virtualenvs. It helps a lot with deploy.

Python default package path always point to system environment, that need Administrator access to install. Virtualenv able to localised the installation to an isolated environment.
For deployment/distribution of package, you can choose to
Distribute by source code. User need to run python setup.py --install, or
Pack your python package and upload to Pypi or custom Devpi. So the user can simply use pip install <yourpackage>
However, as you notice the issue on top : without virtualenv, they user need administrator access to install any python package.
In addition, the Pypi package worlds contains a certain amount of badly tested package that doesn't work out of the box.
Note : virtualenv itself is actually a hack to achieve isolation.

My version of python have multiple versions of packages in 'site-packages', is it safe to delete some of them?

I have, in /usr/local/lib/python2.7/site-packages multiple versions of the same package.
E.g. I have django-angular-0.7.13-py2.6.egg, django-angular-0.7.13-py2.6.egg-info, and django-angular-0.7.13-py2.7.egg.
Is it safe to delete the two files that are, ostensibly, the wrong version?
When I got into the python interpreter, import the package/module, it tells me it's run from <module 'django_angular' from '/usr/local/lib/python2.7/site-packages/django-angular-0.7.13-py2.6.egg/django_angular/__init__.py'>...
I'm concern I'll irreparably damage my python packages, as is so easy when you mess with these things.
All I could find on the topic is this article, but that's windows specific, and doesn't address having multiple versions of the same thing.

You shouldn't go and delete the files manually, although it is probably safe. It is quite unlikely that your system depends on Django and Python always uses only one version of the installed packages anyway.
However, I would strongly recommend keeping your development environment separated from your system packages. If you haven't already, take a look at pip and virtualenv. Here is one tutorial for them.
With pip you can also uninstall your system libraries if you really want to do that. However, pip only sees one version at a time, so if you want to uninstall the above packages you have to run pip uninstall multiple times. But after all the versions are gone you can install the version you really want.
Also, a better option for displaying the installed versions is to use yolk. With that you don't have to browse the site-packages manually.

Migrating to pip+virtualenv from setuptools

So pip and virtualenv sound wonderful compared to setuptools. Being able to uninstall would be great. But my project is already using setuptools, so how do I migrate? The web sites I've been able to find so far are very vague and general. So here's an anthology of questions after reading the main web sites and trying stuff out:
First of all, are virtualenv and pip supposed to be in a usable state by now? If not, please disregard the rest as the ravings of a madman.
How should virtualenv be installed? I'm not quite ready to believe it's as convoluted as explained elsewhere.
Is there a set of tested instructions for how to install matplotlib in a virtual environment? For some reason it always wants to compile it here instead of just installing a package, and it always ends in failure (even after build-dep which took up 250 MB of disk space). After a whole bunch of warnings it prints src/mplutils.cpp:17: error: ‘vsprintf’ was not declared in this scope.
How does either tool interact with setup.py? pip is supposed to replace easy_install, but it's not clear whether it's a drop-in or more complicated relationship.
Is virtualenv only for development mode, or should the users also install it?
Will the resulting package be installed with the minimum requirements (like the current egg), or will it be installed with sources & binaries for all dependencies plus all the build tools, creating a gigabyte monster in the virtual environment?
Will the users have to modify their $PATH and $PYTHONPATH to run the resulting package if it's installed in a virtual environment?
Do I need to create a script from a text string for virtualenv like in the bad old days?
What is with the #egg=Package URL syntax? That's not part of the standard URL, so why isn't it a separate parameter?
Where is #rev included in the URL? At the end I suppose, but the documentation is not clear about this ("You can also include #rev in the URL").
What is supposed to be understood by using an existing requirements file as "as a sort of template for the new file"? This could mean any number of things.

Wow, that's quite a set of questions. Many of them would really deserve their own SO question with more details. I'll do my best:
First of all, are virtualenv and pip
supposed to be in a usable state by
now?
Yes, although they don't serve everyone's needs. Pip and virtualenv (along with everything else in Python package management) are far from perfect, but they are widely used and depended upon nonetheless.
How should virtualenv be installed?
I'm not quite ready to believe it's as
convoluted as explained elsewhere.
The answer you link is complex because it is trying to avoid making any changes at all to your global Python installation and install everything in ~/.local instead. This has some advantages, but is more complex to setup. It's also installing virtualenvwrapper, which is a set of convenience bash scripts for working with virtualenv, but is not necessary for using virtualenv.
If you are on Ubuntu, aptitude install python-setuptools followed by easy_install virtualenv should get you a working virtualenv installation without doing any damage to your global python environment (unless you also had the Ubuntu virtualenv package installed, which I don't recommend as it will likely be an old version).
Is there a set of tested instructions
for how to install matplotlib in a
virtual environment? For some reason
it always wants to compile it here
instead of just installing a package,
and it always ends in failure (even
after build-dep which took up 250 MB
of disk space). After a whole bunch of
warnings it prints
src/mplutils.cpp:17: error: ‘vsprintf’
was not declared in this scope.
It "always wants to compile" because pip, by design, installs only from source, it doesn't install pre-compiled binaries. This is a controversial choice, and is probably the primary reason why pip has seen widest adoption among Python web developers, who use more pure-Python packages and commonly develop and deploy in POSIX environments where a working compilation chain is standard.
The reason for the design choice is that providing precompiled binaries has a combinatorial explosion problem with different platforms and build architectures (including python version, UCS-2 vs UCS-4 python builds, 32 vs 64-bit...). The way easy_install finds the right binary package on PyPI sort of works, most of the time, but doesn't account for all these factors and can break. So pip just avoids that issue altogether (replacing it with a requirement that you have a working compilation environment).
In many cases, packages that require C compilation also have a slower-moving release schedule and it's acceptable to simply install OS packages for them instead. This doesn't allow working with different versions of them in different virtualenvs, though.
I don't know what's causing your compilation error, it works for me (on Ubuntu 10.10) with this series of commands:
virtualenv --no-site-packages tmp
. tmp/bin/activate
pip install numpy
pip install -f http://downloads.sourceforge.net/project/matplotlib/matplotlib/matplotlib-1.0.1/matplotlib-1.0.1.tar.gz matplotlib
The "-f" link is necessary to get the most recent version, due to matplotlib's unusual download URLs on PyPI.
How does either tool interact with
setup.py? pip is supposed to replace
easy_install, but it's not clear
whether it's a drop-in or more
complicated relationship.
The setup.py file is a convention of distutils, the Python standard library's package management "solution." distutils alone is missing some key features, and setuptools is a widely-used third-party package that "embraces and extends" distutils to provide some additional features. setuptools also uses setup.py. easy_install is the installer bundled with setuptools. Setuptools development stalled for several years, and distribute was a fork of setuptools to fix some longstanding bugs. Eventually the fork was resolved with a merge of distribute back into setuptools, and setuptools development is now active again (with a new maintainer).
distutils2 was a mostly-rewritten new version of distutils that attempted to incorporate the best ideas from setuptools/distribute, and was supposed to become part of the Python standard library. Unfortunately this effort failed, so for the time being setuptools remains the de facto standard for Python packaging.
Pip replaces easy_install, but it does not replace setuptools; it requires setuptools and builds on top of it. Thus it also uses setup.py.
Is virtualenv only for development
mode, or should the users also install
it?
There's no single right answer to that; it can be used either way. In the end it's really your user's choice, and your software ideally should be able to be installed inside or out of a virtualenv; though you might choose to document and emphasize one approach or the other. It depends very much on who your users are and what environments they are likely to need to install your software into.
Will the resulting package be
installed with the minimum
requirements (like the current egg),
or will it be installed with sources &
binaries for all dependencies plus all
the build tools, creating a gigabyte
monster in the virtual environment?
If a package that requires compilation is installed via pip, it will need to be compiled from source. That also applies to any dependencies that require compilation.
This is unrelated to the question of whether you use a virtualenv. easy_install is available by default in a virtualenv and works just fine there. It can install pre-compiled binary eggs, just like it does outside of a virtualenv.
Will the users have to modify their
$PATH and $PYTHONPATH to run the
resulting package if it's installed in
a virtual environment?
In order to use anything installed in a virtualenv, you need to use the python binary in the virtualenv's bin/ directory (or another script installed into the virtualenv that references this binary). The most common way to do this is to use the virtualenv's activate or activate.bat script to temporarily modify the shell PATH so the virtualenv's bin/ directory is first. Modifying PYTHONPATH is not generally useful or necessary with virtualenv.
Do I need to create a script from a
text string for virtualenv like in the
bad old days?
No.
What is with the #egg=Package URL
syntax? That's not part of the
standard URL, so why isn't it a
separate parameter?
The "#egg=projectname-version" URL fragment hack was first introduced by setuptools and easy_install. Since easy_install scrapes links from the web to find candidate distributions to install for a given package name and version, this hack allowed package authors to add links on PyPI that easy_install could understand, even if they didn't use easy_install's standard naming conventions for their files.
Where is #rev included in the URL? At
the end I suppose, but the
documentation is not clear about this
("You can also include #rev in the
URL").
A couple sentences after that quoted fragment there is a link to "read the requirements file format to learn about other features." The #rev feature is fully documented and demonstrated there.
What is supposed to be understood by
using an existing requirements file as
"as a sort of template for the new
file"? This could mean any number of
things.
The very next sentence says "it will keep the packages listed in devel-req.txt in order and preserve comments." I'm not sure what would be a better concise description.

I can't answer all your questions, but hopefully the following helps.
Both virtualenv and pip are very usable. Many Python devs use these everyday.
Since you have a working easy_install, the easiest way to install both is the following:
easy_install pip
easy_install virtualenv
Once you have virtualenv, just type virtualenv yourEnvName and you'll get your new python virtual environment in a directory named yourEnvName.
From there, it's as easy as source yourEnvName/bin/activate and the virtual python interpreter will be your active. I know nothing about matplotlib, but following the installation interactions should work out ok unless there are weird hard-coded path issues.
If you can install something via easy_install you can usually install it via pip. I haven't found anything that easy_install could do that pip couldn't.
I wouldn't count on users being able to install virtualenv (it depends on who your users are). Technically, a virtual python interpreter can be treated as a real one for most cases. It's main use is not cluttering up the real interpreter's site-packages and if you have two libraries/apps that require different and incompatible versions of the same library.
If you or a user install something in a virtualenv, it won't be available in other virtualenvs or the system Python interpreter. You'll need to use source /path/to/yourvirtualenv/bin/activate command to switch to a virtual environment you installed the library on.
What they mean by "as a sort of template for the new file" is that the pip freeze -r devel-req.txt > stable-req.txt command will create a new file stable-req.txt based on the existing file devel-req.txt. The only difference will be anything installed not already specified in the existing file will be in the new file.

Questions about Setuptools and alternatives

I've seen a good bit of setuptools bashing on the internets lately. Most recently, I read James Bennett's On packaging post on why no one should be using setuptools. From my time in #python on Freenode, I know that there are a few souls there who absolutely detest it. I would count myself among them, but I do actually use it.
I've used setuptools for enough projects to be aware of its deficiencies, and I would prefer something better. I don't particularly like the egg format and how it's deployed. With all of setuptools' problems, I haven't found a better alternative.
My understanding of tools like pip is that it's meant to be an easy_install replacement (not setuptools). In fact, pip uses some setuptools components, right?
Most of my packages make use of a setuptools-aware setup.py, which declares all of the dependencies. When they're ready, I'll build an sdist, bdist, and bdist_egg, and upload them to pypi.
If I wanted to switch to using pip, what kind of changes would I need to make to rid myself of easy_install dependencies? Where are the dependencies declared? I'm guessing that I would need to get away from using the egg format, and provide just source distributions. If so, how do i generate the egg-info directories? or do I even need to?
How would this change my usage of virtualenv? Doesn't virtualenv use easy_install to manage the environments?
How would this change my usage of the setuptools provided "develop" command? Should I not use that? What's the alternative?
I'm basically trying to get a picture of what my development workflow will look like.
Before anyone suggests it, I'm not looking for an OS-dependent solution. I'm mainly concerned with debian linux, but deb packages are not an option, for the reasons Ian Bicking outlines here.

pip uses Setuptools, and doesn't require any changes to packages. It actually installs packages with Setuptools, using:
python -c 'import setuptools; __file__="setup.py"; execfile(__file__)' \
install \
--single-version-externally-managed
Because it uses that option (--single-version-externally-managed) it doesn't ever install eggs as zip files, doesn't support multiple simultaneously installed versions of software, and the packages are installed flat (like python setup.py install works if you use only distutils). Egg metadata is still installed. pip also, like easy_install, downloads and installs all the requirements of a package.
In addition you can also use a requirements file to add other packages that should be installed in a batch, and to make version requirements more exact (without putting those exact requirements in your setup.py files). But if you don't make requirements files then you'd use it just like easy_install.
For your install_requires I don't recommend any changes, unless you have been trying to create very exact requirements there that are known to be good. I think there's a limit to how exact you can usefully be in setup.py files about versions, because you can't really know what the future compatibility of new libraries will be like, and I don't recommend you try to predict this. Requirement files are an alternate place to lay out conservative version requirements.
You can still use python setup.py develop, and in fact if you do pip install -e svn+http://mysite/svn/Project/trunk#egg=Project it will check that out (into src/project) and run setup.py develop on it. So that workflow isn't any different really.
If you run pip verbosely (like pip install -vv) you'll see a lot of the commands that are run, and you'll probably recognize most of them.

I'm writing this in April 2014. Be conscious of the date on anything written about Python packaging, distribution or installation. It looks like there's been some lessening of factiousness, improvement in implementations, PEP-standardizing and unifying of fronts in the last, say, three years.
For instance, the Python Packaging Authority is "a working group that maintains many of the relevant projects in Python packaging."
The python.org Python Packaging User Guide has Tool Recommendations and The Future of Python Packaging sections.
distribute was a branch of setuptools that was remerged in June 2013. The guide says, "Use setuptools to define projects and create Source Distributions."
As of PEP 453 and Python 3.4, the guide recommends, "Use pip to install Python packages from PyPI," and pip is included with Python 3.4 and installed in virtualenvs by pyvenv, which is also included. You might find the PEP 453 "rationale" section interesting.
There are also new and newish tools mentioned in the guide, including wheel and buildout.
I'm glad I read both of the following technical/semi-political histories.
By Martijn Faassen in 2009: A History of Python Packaging.
And by Armin Ronacher in June 2013 (the title is not serious): Python Packaging: Hate, hate, hate everywhere.

For starters, pip is really new. New, incomplete and largely un-tested in the real world.
It shows great promise but until such time as it can do everything that easy_install/setuptools can do it's not likely to catch on in a big way, certainly not in the corporation.
Easy_install/setuptools is big and complex - and that offends a lot of people. Unfortunately there's a really good reason for that complexity which is that it caters for a huge number of different use-cases. My own is supporting a large ( > 300 ) pool of desktop users, plus a similar sized grid with a frequently updated application. The notion that we could do this by allowing every user to install from source is ludicrous - eggs have proved themselves a reliable way to distribute my project.
My advice: Learn to use setuptools - it's really a wonderful thing. Most of the people who hate it do not understand it, or simply do not have the use-case for as full-featured distribution system.
:-)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.