In a recent conversation with a colleague we were discussing whether it is a best practice to "lock" or specify a certain major version in the meta.yaml file like so:
requirements:
  build:
    - python 3.5.*
    - pyserial 2.7.*
  run:
    - python 3.5.*
    - pyserial 2.7.*
instead of ...
requirements:
  build:
    - python >=3.5
    - pyserial >=2.7
  run:
    - python >=3.5
    - pyserial >=2.7
His concern, which I understand, is that the developers of pyserial, for example, might change something significant in, say, version 3.0 that will break our modules. Does this concern justify locking the dependencies down to fairly specific versions?
I argue that while dependencies could very well break our code, the people who write those dependencies take a lot of that into account, and if something should break it would be trivial to downgrade to a working version anyway. I have also not seen a model as restrictive as the one he is suggesting. Is there a reason for that?
With this:
requirements:
  build:
    - python 3.5.*
    - pyserial 2.7.*
  run:
    - python 3.5.*
    - pyserial 2.7.*
you are not locking on a major version but on a minor version; locking on a major version would require:
requirements:
  build:
    - python 3.*.*
    - pyserial 2.*.*
  run:
    - python 3.*.*
    - pyserial 2.*.*
(or maybe just one * instead of *.*).
This kind of locking is equivalent to combining >= and < when installing with pip, either on the minor version number:
pip install 'some.package>=0.14.0,<0.15'
or on the major version number:
pip install 'some.package>=0.0,<1.0'
There are multiple aspects to consider when locking:
The semantics
Preferably there are predefined semantics for what it means to change the major, minor and micro/build/revision number of a package. If changing the major version number of a package is defined as an API change that may break things, then not pinning it (i.e. using just >=) will eventually break something.
These semantics are not the same for every package. Especially if the major version is 0, a package might still be changing a lot, and you might want to pin on the minor version number (e.g. using 0.3.*, or in pip: >=0.3,<0.4).
The complexity of your dependencies
If you have multiple dependencies, and the dependencies are dependent on each other, you might end up with non-overlapping locking requirements. E.g. you specify pyserial 2.7.* and some.package 0.4.*, and then it turns out some.package requires pyserial 2.5.*.
How soon will you find out about breakage, and how easy is it to fix it
If you have proper test coverage you should be able to find out whether a new version of some package breaks your "build". It then depends on how easy it is to correct that (e.g. find the last working version of such a package) and whether that can be done in time. If you have an urgent bug fix in your own software, and deployment is delayed because you didn't lock and now have to spend time finding the culprit package and locking it down, this might be unacceptable.
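One way to surface such breakage early is a scheduled "canary" run that installs the newest versions your loose pins allow and reruns the test suite, so the culprit hunt does not happen during an urgent deployment. A minimal sketch, assuming a loosely pinned requirements file and a pytest-based suite (the file name and the test runner are only examples):
pip install --upgrade -r requirements-loose.txt   # pull in the newest versions the loose pins allow
pytest                                            # fail loudly if any of them break the build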
What do you lose if you don't have the latest version
You probably want all the bug fixes and none of the incompatibilities when selecting a newer version of a package. On the other hand, if you have proper test coverage you should already have found any bugs that exist in the packages you depend upon, which is an argument for "tight" locking on specific versions.
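The tightest form of such locking is to capture an exact known-good environment and reuse it as a lock file. A minimal sketch (the file name is only an example):
pip freeze > requirements-locked.txt     # record the exact versions of a known-good environment
pip install -r requirements-locked.txt   # reproduce that environment elsewhere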
Chances of breakage within the selected lock range
If a package that you depend upon has little or no test coverage itself, and/or has a reputation for breaking even on micro/build version number changes, you might want to lock it down completely.
What is actually best, as so often, depends on all of the above. If you cannot deal with delays in deployment (figuring out on which minor version a package went wrong), you have to lock down. But if your package dependency tree is complex, you might have to loosen restrictions in order to be able to install all packages.
Most important in all of this is knowing the consequences of what you are doing. Tight locking on complex projects might take considerable time as you work out the "allowed" version number ranges for each package; just make sure you have that time when you are forced to change them.
So yes, there might be a reason for the tight locking your colleague suggests. But whether that is warranted in your situation depends on all the factors mentioned above.
Related
I've read in a few places that, generally, Python doesn't provide backward compatibility, which means that any newer version of Python may break code that worked fine in earlier versions. If so, how can I, as a developer, know which versions of Python can execute my code successfully? Is there any set of rules/guarantees regarding this? Or should I just tell my users: just run this with Python 3.8 (for example), no more, no less?
99% of the time, if it works on Python 3.x, it'll work on 3.y where y >= x. Enabling warnings when running your code on the older version should pop DeprecationWarnings when you use a feature that's deprecated (and therefore likely to change/be removed in later Python versions). Aside from that, you can read the What's New docs for each version between the known good version and the later versions, in particular the Deprecated and Removed sections of each.
Beyond that, the only solution is good unit and component tests (you are using those, right? 😉) that you rerun on newer releases to verify stuff still works & behavior doesn't change.
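A concrete way to do the "enabling warnings" part is to escalate deprecation warnings to hard errors while the tests run, so anything scheduled for removal fails loudly instead of just printing a warning. A minimal sketch, assuming a pytest-based suite (pytest is only an example runner):
python -W error::DeprecationWarning -m pytest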
According to PEP 387, section "Making Incompatible Changes", a deprecation warning should appear in at least two minor Python versions of the same major version, or one minor version in an older major version, before an incompatible change is made. After that, it's fair game, in principle. This made me cringe with regard to safety: who knows whether people run airplanes on Python, and whether they always read the python-dev list. So if you have something that passes unit tests with 100% coverage and no deprecation warnings, your code should be safe for the next two minor releases.
You can avoid this issue and many others by containerizing your deployments.
tox is great for running unit tests against multiple Python versions. That’s useful for at least 2 major cases:
You want to ensure compatibility for a certain set of Python versions, say 3.7+, and to be told if you make any breaking changes.
You don’t really know what versions your code supports, but want to establish a baseline of supported versions for future work.
I don't use it for internal projects where I control the environment my code will be running in. It's lovely for people publishing apps or libraries to PyPI, though.
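For reference, a minimal tox.ini along these lines runs the same test suite against every listed interpreter it can find (a sketch; the Python versions and the pytest dependency are only examples):
[tox]
envlist = py37, py38, py39

[testenv]
deps = pytest
commands = pytest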
While it is possible to simply use pip freeze to get the current environment, that is not suitable here: I don't want to require an environment as bleeding-edge as the one I am used to.
Moreover, some developer tooling is only available in recent versions of packages (think type annotations), but is not needed by users.
My target users may want to use my package on slowly upgrading machines, and I want to get my requirements as low as possible.
For example, I cannot require anything newer than Python 3.6 (and even then I think some users may be unable to use the package).
Similarly, I want to avoid requiring the latest NumPy or Matplotlib versions.
Is there a (semi-)automatic way of determining the oldest compatible version of each dependency?
Alternatively, I can manually try to build a conda environment with old packages, but I would have to try pretty randomly.
Unfortunately, I inherited a medium-sized codebase (~10 KLoC) with no automated tests yet (I plan on writing some, but it takes time, and it sadly cannot be my priority).
The requirements were not properly defined either, so I don't know what it was being run with two years ago.
Because semantic versioning is not always honored (and because it may be difficult, from a developer's standpoint, to determine exactly what counts as a minor or major change for each possible user), and because only a human can parse release notes to understand what has changed, there is no simple solution.
My technical approach would be to create a virtual environment with a known working combination of Python and library versions. From there, downgrade one library at a time, one version at a time, verifying that everything still works (which may be tedious if checking is manual and/or slow).
My social solution would be to timebox the technical approach to take no more than a few hours. Then settle for what you have reached. Indicate in the README that lib requirements may be overblown and that help is welcome.
Without fast automated tests in which you are confident, there is no way to automate exploring that N-dimensional space (each library is a dimension) to find the minimum working versions.
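That said, if you do have even a quick smoke test you trust, the downgrade loop described above can be semi-automated along these lines (a rough sketch; numpy, the candidate version list and smoke_test.py are all placeholders for your own project):
# walk backwards through candidate minor versions until the install or the smoke test fails;
# the last version reported as working is your tentative lower bound
for v in 1.19 1.18 1.17 1.16; do
    pip install "numpy==$v.*" || break
    python smoke_test.py || { echo "numpy $v no longer works"; break; }
    echo "numpy $v still works"
done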
This question concerns any package, not just the Python version itself. To give some context: we are planning to build an internal package at work, which naturally will have many dependencies. To give our developers freedom and avoid messy version conflicts, I want to specify broad constraints in the package's requirements(.txt), for example pandas>=1.0 or pyspark>=1.0.0, <2.0.
Is there a way to efficiently determine/test the lowest required versions for a given codebase?
I could install pandas==0.2.4 and see if the code runs, and so on, but that approach seems to get out of hand pretty fast. This is the first time I've worked on package building, so I am kinda lost here. Looking at other packages' source code (on GitHub) didn't help, because I have no idea what methodology developers use to specify dependency constraints.
I am somewhat confused about the right way of declaring requirements of Python packages.
New builds that are not officially released yet do have pre-release names like 0.2.3.dev20160513165655.
pip is smart enough to install pre-releases when we add the --pre option, and we do use it when building the develop branch. The master branch does not use it.
I discovered that if I put foobar>=0.2.3 in a requirements file, the development version will not be picked up even if I specify the --pre parameter.
The pip documentation is not much help here because it doesn't say anything about pre-releases.
I used the approach of putting foobar>0.2.2, which, in conjunction with --pre, would install the pre-release.
Still, even this is a bit flawed, because if we release a hotfix like 0.2.2.1 it might be picked up instead.
So, what's the best approach to deal with this?
Side note: it would be highly desirable not to have to patch the requirements file when we make a release (a pull request from develop to master). Remember that the develop branch always uses --pre and master doesn't.
For anyone else coming across this question, the answer is in the same documentation:
If a Requirement specifier includes a pre-release or development version (e.g. >=0.0.dev0) then pip will allow pre-release and development versions for that requirement. This does not include the != flag.
Hence, specifying >=0.2.3.dev0 or similar should pick the "newest" prerelease.
Note that if you already have 0.2.3 released, it will always sort "newer" than prereleases such as 0.2.3.dev20160513165655. PEP 440 says the following:
The developmental release segment consists of the string .dev, followed by a non-negative integer value. Developmental releases are ordered by their numerical component, immediately before the corresponding release (and before any pre-releases with the same release segment), and following any previous release (including any post-releases).
It also says:
... publishing developmental releases of pre-releases to general purpose public index servers is strongly discouraged, as it makes the version identifier difficult to parse for human readers. If such a release needs to be published, it is substantially clearer to instead create a new pre-release by incrementing the numeric component.
Developmental releases of post-releases are also strongly discouraged ...
So ideally you would not use a datestamp, but something like dev1, dev2, dev3. I think the PEP is actually saying you should use 0.2.3.dev1, 0.2.4.dev1, 0.2.5.dev1, but either is equally readable. It really depends on how many builds you are producing.
In your case, if 0.2.3 is already released, all the subsequent development releases need to be 0.2.4.dev20160513165655 so that pip will see it as newer.
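Putting that together, the requirements line and the develop-branch install would look roughly like this (a sketch reusing the foobar example from the question):
# requirements.txt: the .dev0 lower bound makes pip consider pre-releases of foobar
foobar>=0.2.4.dev0
# develop branch install (master omits --pre):
pip install --pre -r requirements.txt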
I have built and installed Python 2.7.8 from source on CentOS 6 with the following commands:
./configure --prefix /opt/Python27 --exec-prefix=/opt/Python27
make
make install
Now 2.7.9 is out and I would like to update my installation. Is it reasonable to expect everything to keep working if I uncompress it in a different directory from the previous one and install it with exactly the same commands, i.e. over the previous installation?
In practice, you're probably OK, and the worst-case scenario isn't that bad.
I'm not sure if Python 2.x ever guaranteed binary-API stability between micro versions. But, according to the dev guide:
The only changes allowed to occur in a maintenance branch without debate are bug fixes. Also, a general rule for maintenance branches is that compatibility must not be broken at any point between sibling minor releases (3.4.1, 3.4.2, etc.). For both rules, only rare exceptions are accepted and must be discussed first.
So, in theory, there could have been a compatibility-breaking release between 2.7.8 and 2.7.9, and the only way to know for sure is to dig through the bug tracker, the python-dev mailing list and so on to see where it was discussed and accepted. And of course they could always have screwed up and made a breaking change without realizing it. But in practice, the first has only happened a few times in history, and the second has, as far as I know, never happened.
Another thing that can cause a problem is a major change, since your last build, to the required or optional dependencies that Python builds against. But this is pretty rare in practice. If you've, say, uninstalled zlib since the last build, then yes, that could break compatibility, but you're unlikely to have done anything like that.
So, what happens if either of those is true? It just means that any binary extensions, or embedding apps, that you've built need to be rebuilt.
Hopefully you've been using pip, in which case, if there's a problem, getting a list of all the extensions in your site-packages and force-reinstalling them is trivial (although it may take a while to run). And if you're using a lot of virtual environments, you might need to do the same for all of them. As for embedding, if you don't know about it, you're not doing it (unless you've built "semi-standalone" executables with something like PyInstaller, which I doubt you have).
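A rough sketch of that force-reinstall step, assuming a plain pip-managed environment with no editable installs:
pip freeze --local > pkgs.txt                              # list everything installed in this environment
pip install --force-reinstall --no-cache-dir -r pkgs.txt   # reinstall (and rebuild where needed) against the new interpreter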
So, not too terrible. And, remember, that's usually not a problem at all, it's just the worst-case scenario.