How to specify Python requirements while allowing pre-releases?

I am somewhat confused about the right way to declare requirements for Python packages.
New builds that are not officially released yet have pre-release names like 0.2.3.dev20160513165655.
pip is smart enough to install pre-releases when we add the --pre option, and we use that option when building the develop branch. The master branch does not use it.
I discovered that if I put foobar>=0.2.3 in a requirements file, the development version will not be picked even if I specify the --pre parameter.
The pip documentation is not much help here because it says nothing about pre-releases.
I used the approach of putting foobar>0.2.2, which in conjunction with --pre does install the pre-release.
Still, even this is a bit flawed, because if we released a hotfix like 0.2.2.1 it might be picked instead.
So, what's the best approach to deal with this?
Side note: It would be highly desirable not to have to patch the requirements file when we make a release (a pull request from develop to master). Remember that the develop branch always uses --pre and master doesn't.

For anyone else coming across this question, the answer is in the same documentation:
If a Requirement specifier includes a pre-release or development version (e.g. >=0.0.dev0) then pip will allow pre-release and development versions for that requirement. This does not include the != flag.
Hence, specifying >=0.2.3.dev0 or similar should pick the "newest" prerelease.
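For example, a requirements file along these lines (foobar being the question's placeholder name) illustrates the difference:
# 0.2.3.dev* sorts before 0.2.3, so it can never satisfy this specifier
foobar>=0.2.3
# the .dev0 suffix makes pip consider pre-release and development versions too
foobar>=0.2.3.dev0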
Note that if you already have 0.2.3 released, it will always sort "newer" than prereleases such as 0.2.3.dev20160513165655. PEP 440 says the following:
The developmental release segment consists of the string .dev, followed by a non-negative integer value. Developmental releases are ordered by their numerical component, immediately before the corresponding release (and before any pre-releases with the same release segment), and following any previous release (including any post-releases).
It also says:
... publishing developmental releases of pre-releases to general purpose public index servers is strongly discouraged, as it makes the version identifier difficult to parse for human readers. If such a release needs to be published, it is substantially clearer to instead create a new pre-release by incrementing the numeric component.
Developmental releases of post-releases are also strongly discouraged ...
So ideally you would not use a datestamp, but something like dev1, dev2, dev3. I think the PEP is actually saying you should use 0.2.3.dev1, 0.2.4.dev1, 0.2.5.dev1, but either is equally readable. It really depends on how many builds you are producing.
In your case, if 0.2.3 is already released, all the subsequent development releases need to be 0.2.4.dev20160513165655 so that pip will see it as newer.
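To check that ordering yourself, the packaging library (assumed to be installed separately; pip itself vendors it) can compare the versions directly:
from packaging.version import Version

# a .dev release sorts immediately before its corresponding final release
assert Version("0.2.3.dev20160513165655") < Version("0.2.3")
# so once 0.2.3 is out, only 0.2.4.dev* builds sort as newer
assert Version("0.2.3") < Version("0.2.4.dev20160513165655")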

Related

As a Python package maintainer, how can I determine the lowest working requirements?

While it is possible to simply use pip freeze to capture the current environment, that is not suitable: it would require an environment as bleeding-edge as the one I am used to.
Moreover, some developer tooling is only available in recent versions of packages (think type annotations), but is not needed by users.
My target users may want to use my package on slowly upgrading machines, and I want to set my requirements as low as possible.
For example, I cannot require better than Python 3.6 (and even then I think some users may be unable to use the package).
Similarly, I want to avoid requiring the latest NumPy or Matplotlib versions.
Is there a (semi-)automatic way of determining the oldest compatible version of each dependency?
Alternatively, I can manually try to build a conda environment with old packages, but I would have to try more or less at random.
Unfortunately, I inherited a medium-sized codebase (~10 KLoC) with no automated tests yet (I plan on writing some, but it takes time, and it sadly cannot be my priority).
The requirements were not properly defined either, so I don't know what the code was run with two years ago.
Because semantic versioning is not always honored (and because it may be difficult, from a developer's standpoint, to determine exactly what counts as a minor or major change for each possible user), and because only a human can parse release notes to understand what has changed, there is no simple solution.
My technical approach would be to create a virtual environment with a known working combination of Python and library versions. From there, downgrade one version at a time, one library at a time, verifying that everything still works (which may be difficult if the check is manual and/or slow); a rough sketch of such a loop is given below.
My social solution would be to timebox the technical approach to no more than a few hours, then settle for what you have reached. Indicate in the README that the library requirements may be overblown and that help is welcome.
Without fast automated tests in which you are confident, there is no way to automate the exploration of the N-space (each library is a dimension) to find the minimum versions.
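The sketch below assumes a fresh virtual environment is active and that a quick check exists (a hypothetical smoke_test.py); the candidate versions are placeholders:
# walk one library down through successively older releases
for v in 1.16.0 1.15.0 1.14.0 1.13.0; do
    pip install "numpy==$v" || break
    python smoke_test.py || { echo "numpy $v is too old"; break; }
done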

pip: selecting the index URL based on the package name?

I have created a local private package repository. By convention, all those packages are named with an identifying prefix, for example foo-package. These packages may depend on public packages available on PyPI. Let's assume there's no risk of having a package on PyPI with the same name. By using --index-url together with --extra-index-url, I can make pip search both. This happens every single time.
Even when pip finds a package on PyPI, it will still try to find it on the extra URL as well. What I'd like to achieve is that pip only searches the extra URL when the package name is foo-*, and only searches PyPI for everything else. Is this possible somehow?
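For reference, the setup described above corresponds roughly to an invocation like this (the private index URL is a placeholder):
pip install foo-package \
    --index-url https://pypi.org/simple \
    --extra-index-url https://pypi.internal.example.com/simple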
As far as I understood, the philosophy from the point of view of pip and PyPI (and I guess the PyPA ecosystem in general) is that indexes should be indistinguishable and interchangeable. If two projects of the same name exist on two indexes, it should be assumed that they are the exact same project. And two distributions with the same name and version number should be assumed to be the exact same distribution, so it does not matter which one we fetch from. In other words:
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
-- https://github.com/pypa/pip/issues/5045#issuecomment-369521345
Short of relying on direct URLs (Library @ https://dists.tango.dev/library-1.2.3-xyz.whl), I do not see how it can be done right now. But maybe I am missing something obvious.
If one needs to circumvent this behaviour and regain control over the situation, they need to put something like devpi or pydist in place.
In the case of devpi, its "inheritance" feature seems of particular importance here. As far as I understood, this is the key feature that would prevent downloading a dependency from the "wrong" index (I am not sure exactly how that works or how to configure it, though).
For pydist: https://pydist.com/blog/extra-index-url
Probably also possible in other servers...
References:
Dependency notation including the index URL
https://github.com/pypa/pip/issues/5045#issuecomment-369521345
PyDist – Blog – The Problem with --extra-index-url

How to handle multiple major versions of a dependency

I'm wondering how to handle multiple major versions of a dependency library.
I have an open source library, Foo, at an early release stage. The library is a wrapper around another open source library, Bar. Bar has just launched a new major version. Foo currently only supports the previous version. As I'm guessing that a lot of people will be very slow to convert from the previous major version of Bar to the new major version, I'm reluctant to switch to the new version myself.
How is this best handled? As I see it I have these options
Switch to the new major version, potentially denying people on the old version.
Keep going with the old version, potentially denying people on the new version.
Have two different branches, updating both branches for all new features. Not sure how this works with PyPI. Wouldn't I have to release at different version numbers each time?
Separate the repository into two parts. Don't really want to do this.
The ideal solution for me would be to have the same code base, where I could have some sort of C/C++ macro-like thing where if the version is new, use new_bar_function, else use old_bar_function. When installing the library from PyPi, the already installed version of the major version dictates which version is used. If no version is installed, install the newest.
Would much appreciate some pointers.
Normally the package version information is available after import as package.__version__. You could parse that information from Bar and decide, based on it, what to do (choose the appropriate function calls, halt the program, raise an error, ...).
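A minimal sketch of that check, assuming the packaging library is available and using the question's placeholder names (bar, new_bar_function, old_bar_function):
import bar  # the wrapped library, "Bar" in the question
from packaging.version import Version

_BAR_IS_V2 = Version(bar.__version__) >= Version("2.0")

def wrapped_call(*args, **kwargs):
    # dispatch to whichever API the installed Bar provides
    if _BAR_IS_V2:
        return bar.new_bar_function(*args, **kwargs)
    return bar.old_bar_function(*args, **kwargs)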
You might also gain some insight from https://www.python.org/dev/peps/pep-0518/ for ways to control dependency installation.
It seems that if someone already has Bar installed, installing Foo only updates Bar if Foo explicitly requires the new version. See https://github.com/pypa/pip/pull/4500 and this answer
Have two different branches, updating both branches for all new features. Not sure how this works with PyPI. Wouldn't I have to release at different version numbers each time?
Yes, you could have a 1.x release (that supports the old version) and a 2.x release (that supports the new version) and release both simultaneously. This is a common pattern for packages that want to introduce a breaking change, but still want to continue maintaining the previous release as well.
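A sketch of how the two release lines could declare that in their packaging metadata (setuptools shown; the names, versions and bounds are placeholders):
# setup.py on the Foo 1.x maintenance branch (old Bar)
from setuptools import setup
setup(name="foo", version="1.5.0", install_requires=["bar>=1.0,<2.0"])

# setup.py on the Foo 2.x branch (new Bar)
from setuptools import setup
setup(name="foo", version="2.0.0", install_requires=["bar>=2.0,<3.0"])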

Is "locking" requirements in conda's meta.yaml a good idea?

In a recent conversation with a colleague we were discussing whether it is a best practice to "lock" or specify a certain major version in the meta.yaml file like so:
requirements:
  build:
    - python 3.5.*
    - pyserial 2.7.*
  run:
    - python 3.5.*
    - pyserial 2.7.*
instead of ...
requirements:
  build:
    - python >=3.5
    - pyserial >=2.7
  run:
    - python >=3.5
    - pyserial >=2.7
His concern, which I understand, is that the developers of pyserial, as an example, might change something significantly in, say, version 3.0 that will break our modules. Is this concern enough to justify locking down to fairly specific versions of the dependencies?
I argue that while dependencies could very well break our code, the people who write those dependencies take a lot of that into account, and if something should break it would be trivial to downgrade to a working version anyway. Also, I've not seen such a restrictive model as the one he is suggesting. Is there a reason for that?
With this:
requirements:
  build:
    - python 3.5.*
    - pyserial 2.7.*
  run:
    - python 3.5.*
    - pyserial 2.7.*
you are not locking on a major version but on a minor version; locking on a major version would require:
requirements:
  build:
    - python 3.*.*
    - pyserial 2.*.*
  run:
    - python 3.*.*
    - pyserial 2.*.*
(or maybe just one * instead of *.*).
This kind of locking is equivalent to using >= and < on the minor version number when installing with pip:
pip install 'some.package>=0.14.0,<0.15'
or on major version number:
pip install 'some.package>=0.0,<1.0'
There are multiple aspects to consider when locking:
The semantics
Preferably there are predefined semantics for what it means to change the major, minor and micro/build/revision number of a package. If changing the major version number of a package is defined as an API change that may break something, then not locking (i.e. using just >=) will break something sooner or later.
These semantics are not the same for each package. Especially if the major version is 0, a package might still change a lot, and you might want to lock on the minor version number (e.g. using 0.3.*, or in pip: >=0.3,<0.4).
The complexity of your dependencies
If you have multiple dependencies, and the dependencies depend on each other, you might end up with non-overlapping locking requirements. E.g. you specify pyserial 2.7.* and some.package 0.4.*, and then some.package requires pyserial 2.5.*.
How soon will you find out breakage, and how easy is it to fix it
If you have proper test coverage you should be able to find out if a new version of some package breaks your "build". It then depends on how easy it is to correct that (e.g. find the last working version of such package) and whether that can be done in time. If you have an urgent bug fix in your own software, and deployment is delayed because you didn't lock and now have to spend time finding the culprit package and locking it down, this might be unacceptable.
What do you lose if you don't have the latest version
You probably want all the bug fixes and none of the incompatibilities when a newer version of a package gets selected. On the other hand, if you have proper test coverage, you should already have found all of the bugs that might exist in the packages you depend upon, which is an argument for "tight" locking on specific versions.
Chances of breakage within the selected lock range
If a package, that you depend upon, has no—or little—test coverage itself, and/or has a reputation for breaking even on micro/build version number changes, you might want to lock it down completely.
What is actually best, as so often, depends on all of the above. If you cannot deal with delays in deployment (figuring out on which minor version a package went wrong), you have to lock down. But if your package dependency tree is complex, you might have to loosen restrictions in order to be able to install all packages.
Most important in all of this is knowing the consequences of what you are doing. Tight locking on complex projects might require considerable time to find the "allowed" version ranges for each package; just make sure you have that time when you are forced to change them.
So yes, there might be a reason for the tight locking your colleague suggests. But whether that is warranted in your situation depends on all the factors mentioned above.
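If a middle ground between 2.7.* and >=2.7 is wanted, conda match specifications also accept comma-separated bounds, so something along these lines (the bounds are chosen purely for illustration) avoids pinning the minor version while still guarding against the next major one:
requirements:
  run:
    - python >=3.5,<4.0
    - pyserial >=2.7,<3.0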

Make install new Python minor release over previous one

I have built and installed Python 2.7.8 from source on CentOS 6 with the following commands:
./configure --prefix /opt/Python27 --exec-prefix=/opt/Python27
make
make install
Now 2.7.9 is out and I would like to update my installation. Is it reasonable to expect everything to keep working if I uncompress it in a different directory from the previous one and install it with exactly the same commands, i.e. over the previous installation?
In practice, you're probably OK, and the worst-case scenario isn't that bad.
I'm not sure if Python 2.x ever guaranteed binary-API stability between micro versions.* But, according to the dev guide:
The only changes allowed to occur in a maintenance branch without debate are bug fixes. Also, a general rule for maintenance branches is that compatibility must not be broken at any point between sibling minor releases (3.4.1, 3.4.2, etc.). For both rules, only rare exceptions are accepted and must be discussed first.
So, in theory, there could have been a compatibility-breaking release between 2.7.8 and 2.7.9, and the only way to know for sure is to dig through the bug tracker and the python-dev mailing list and so on to see where it was discussed and accepted. And of course they could always have screwed up and made a breaking change without realizing it. But in practice, the first has only happened a few times in history, and the second has, as far as I know, never happened.
Another thing that can cause a problem is a major change, since your last build, to the required or optional dependencies that Python builds against. But this is pretty rare in practice. If you've, say, uninstalled zlib since the last build, then yes, that could break compatibility, but you're unlikely to have done anything like that.
So, what happens if either of those is true? It just means that any binary extensions, or embedding apps, that you've built need to be rebuilt.
Hopefully you've been using pip, in which case, if there's a problem, getting a list of all the extensions in your site-packages and force-reinstalling them is trivial (although it may take a while to run). And if you're using a lot of virtual environments, you may need to do the same for all of them. As for embedding, if you don't know about it, you're not doing it (unless you've built "semi-standalone" executables with something like PyInstaller, which I doubt you have).
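A rough sketch of that reinstall step, assuming pip is available in the rebuilt interpreter (2.7.9 bundles ensurepip) under the prefix used above:
/opt/Python27/bin/pip freeze > installed.txt
/opt/Python27/bin/pip install --force-reinstall --no-deps -r installed.txt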
So, not too terrible. And, remember, that's usually not a problem at all, it's just the worst-case scenario.
