Releasing a non-backwards compatible library on PyPI?

I have a library on PyPI called foobar and it's currently at version 1.2.0 (using semantic versioning).
The next version doesn't preserve API compatibility with versions 1.x, so I'll release it as 2.0.0.
What is the best practice to publish this new version to PyPI, so that clients which are using the 1.x versions don't accidentally upgrade to 2.0.0 and break their code? (I'm assuming that there are people who didn't enforce a version dependency like >=1.0.0, <2.0.0 in their code).
Would it be better to create a completely new package called foobar2 on PyPI and push the new version there? How do other projects handle this?

I'd assert that API changes normally fall into two categories.
A new API B replaces an old API A, but A can be implemented entirely in terms of B. It is therefore feasible to maintain the old API alongside the new one.
This could be as simple as an API being moved to tidy up or rationalize your module, or something more involved such as converting positional args to kwargs.
A new API replaces the old one but cannot implement it, for whatever technical reason.
Given those categories, these are your options IMO. Which one you take will depend a lot on what changes you are making and how much you a) care about your users or b) are in contact with them and can talk to them (i.e. you can get away with a few unannounced breakages if it's just a few people on your team whom you can subsequently help fix their issues).
1. Provide both old and new in your new version.
Implement the old API using the new one but mark it as deprecated using the warnings module. You give people notice to convert and you can remove the old API at some point in the future.
This is best practice for API changes of the first type. It keeps everyone on the same stream and allows you to tidy up at some point in the future.
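A minimal sketch of such a deprecation shim (the function names here are illustrative, not from the question):

```python
import warnings

def new_api(data, scale=1):
    """The hypothetical 2.0 replacement."""
    return [x * scale for x in data]

def old_api(data, scale=1):
    """Deprecated 1.x entry point, kept as a thin wrapper over new_api."""
    warnings.warn(
        "old_api() is deprecated and will be removed in a future release; "
        "use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api(data, scale)
```

Since Python 2.7/3.2, DeprecationWarning is silenced by default, so ordinary users won't be spammed, while those running with warnings enabled (e.g. under pytest or python -W) will see the notice.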
2. Warn, then introduce new API.
If you are in situation 2, or in situation 1 but can't justify the resources to implement the old API using the new one, then you can easily release a version 1.2.1 that uses the warnings module to warn users that you are about to release a new version that will break their code, and that they should quickly peg the version in their requirements.txt.
Say when you're going to release version 2.0; then you've given them fair warning.
But this is only really fair if it's not too much effort to migrate from 1.2.0 to 2.0 for your users.
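A sketch of what that 1.2.1 warning might look like; FutureWarning is used here because, unlike DeprecationWarning, it is shown to end users by default:

```python
# foobar/__init__.py in the hypothetical 1.2.1 release
import warnings

warnings.warn(
    "foobar 2.0 will change the API in backwards-incompatible ways. "
    "Pin 'foobar>=1.2,<2.0' in your requirements.txt if you are not "
    "ready to migrate.",
    FutureWarning,
)
```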
3. Add a completely new package.
If there are profound differences, and it would be such a pain for your users to update their code that they would essentially need to rewrite it, then you shouldn't be afraid of just using a completely new package. We all make mistakes, and everyone in the Python community is well aware of that, given the differences between Python 2 and Python 3 :-). unittest2 is another such example from a while back.
What will people expect?
Personally, if I had automatic upgrades occurring on a system I cared about and if I didn't peg the versions to upgrade only maintenance releases, I would consider it my fault if someone released a major upgrade that my system automatically took but then stopped working because of it.
That gives you the moral high ground IMO, but that isn't much consolation for your lazy users (or worse, customers) who didn't check and got burnt.
What other people do
paramiko kept API backward compatibility from 1.x to 2.0
Beautiful Soup changed its name on PyPI (from BeautifulSoup to bs4)
django tends to deprecate features and remove them in later feature releases, and in general I would never upgrade a django install I cared about from 1.X to 1.(X+1) without testing it first. (I consider this to be the best practice, like a lot of things the django folk do.)
So the summary is: there is a mix, and it really is up to you. However the only completely safe ways to avoid user self-inflicted issues is to keep back-compatibility entirely or create a new package, as BeautifulSoup did.

Related

As a python package maintainer, how can I determine the lowest working requirements?

While it is possible to simply use pip freeze to get the current environment, it is not suitable to require an environment as bleeding-edge as what I am used to.
Moreover, some developer tooling is only available in recent versions of packages (think type annotations) but is not needed by users.
My target users may want to use my package on slowly upgrading machines, and I want to get my requirements as low as possible.
For example, I cannot require anything newer than Python 3.6 (and even then I think some users may be unable to use the package).
Similarly, I want to avoid requiring the latest NumPy or Matplotlib versions.
Is there a (semi-)automatic way of determining the oldest compatible version of each dependency?
Alternatively, I can manually try to build a conda environment with old packages, but I would have to try pretty randomly.
Unfortunately, I inherited a medium-sized codebase (~10KLoC) with no automated test yet (I plan on making some, but it takes some time, and it sadly cannot be my priority).
The requirements were not properly defined either, so I don't know which versions it was run with two years ago.
Because semantic versioning is not always honored (and because it may be difficult, from a developer's standpoint, to determine exactly what counts as a minor or major change for each possible user), and because only a human can parse release notes to understand what has changed, there is no simple solution.
My technical approach would be to create a virtual environment with a known working combination of Python and library versions. From there, downgrade version by version, one lib at a time, verifying that it still works fine (which may be difficult if the check is manual and/or slow).
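If you do have at least one fast smoke test, the downgrade loop can be semi-automated. A sketch, assuming a throwaway virtualenv and a pytest-based smoke suite under tests/smoke (both assumptions, not part of the question):

```python
# Walk one dependency's releases from newest to oldest until the
# smoke test breaks; the last passing version is a candidate minimum.
import subprocess

def works_with(package, version):
    install = subprocess.run(
        ["python", "-m", "pip", "install", "{}=={}".format(package, version)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if install.returncode != 0:  # the version may not even install here
        return False
    tests = subprocess.run(["python", "-m", "pytest", "tests/smoke"])
    return tests.returncode == 0

# One dimension of the N-space search; the version list is illustrative.
candidates = ["1.19.5", "1.18.5", "1.17.5", "1.16.6"]
oldest_ok = None
for version in candidates:
    if not works_with("numpy", version):
        break
    oldest_ok = version
print("oldest working numpy:", oldest_ok)
```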
My social solution would be to timebox the technical approach to take no more than a few hours. Then settle for what you have reached. Indicate in the README that lib requirements may be overblown and that help is welcome.
Without fast automated tests in which you are confident, there is no way to automate the exploration of the N-space (each library is a dimension) to find the minimums.

How to handle multiple major versions of dependency

I'm wondering how to handle multiple major versions of a dependency library.
I have an open source library, Foo, at an early release stage. The library is a wrapper around another open source library, Bar. Bar has just launched a new major version. Foo currently only supports the previous version. As I'm guessing that a lot of people will be very slow to convert from the previous major version of Bar to the new major version, I'm reluctant to switch to the new version myself.
How is this best handled? As I see it I have these options
Switch to the new major version, potentially denying people on the old version.
Keep going with the old version, potentially denying people on the new version.
Have two different branches, updating both branches for all new features. Not sure how this works with PyPI. Wouldn't I have to release at different version numbers each time?
Separate the repository into two parts. Don't really want to do this.
The ideal solution for me would be to have the same code base, with some sort of C/C++ macro-like mechanism: if the version is new, use new_bar_function, else use old_bar_function. When installing the library from PyPI, the already-installed major version of Bar would dictate which code path is used. If no version is installed, install the newest.
Would much appreciate some pointers.
Normally the package version information is available after import as package.__version__. You could parse that information from Bar and decide based on it what to do (choose the appropriate function calls, halt the program, raise an error, ...).
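A minimal sketch of that idea, assuming Bar's import name is bar and that it exposes __version__ as a "major.minor.patch" string (common, but an assumption; new_bar_function/old_bar_function are the names from the question):

```python
# foo/compat.py
import bar

_BAR_MAJOR = int(bar.__version__.split(".")[0])

if _BAR_MAJOR >= 2:
    from bar import new_bar_function as bar_function
else:
    from bar import old_bar_function as bar_function
```

The rest of Foo then imports bar_function from foo.compat and never touches the version-specific names directly.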
You might also gain some insight from https://www.python.org/dev/peps/pep-0518/ for ways to control dependency installation.
It seems that if someone already has Bar installed, installing Foo only updates Bar if Foo explicitly requires the new version. See https://github.com/pypa/pip/pull/4500 and this answer
Have two different branches, updating both branches for all new features. Not sure how this works with PyPI. Wouldn't I have to release at different version numbers each time?
Yes, you could have a 1.x release (that supports the old version) and a 2.x release (that supports the new version) and release both simultaneously. This is a common pattern for packages that want to introduce a breaking change, but still want to continue maintaining the previous release as well.
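A sketch of how the two release lines might pin Bar in their packaging metadata (the bounds are illustrative, not from the question):

```python
# setup.py on the Foo 1.x maintenance branch; the 2.x branch would
# instead declare install_requires=["Bar>=2.0,<3.0"].
from setuptools import setup

setup(
    name="Foo",
    version="1.4.0",
    packages=["foo"],
    install_requires=["Bar>=1.0,<2.0"],
)
```

With a sufficiently recent pip resolver, a user who constrains Bar to 1.x should then be given the matching 1.x line of Foo automatically.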

How do I find out all previous versions of python with which my code is compatible

I have created a medium-sized project in Python 2.7.3 containing around 100 modules. I wish to find out with which previous versions of Python (e.g. 2.6.x, 2.7.x) my code is compatible, before releasing my project into the public domain. What is the easiest way to find out?
Solutions I know -
Install multiple versions of Python and check in every version. But I don't have test cases defined yet, so I'd need to define those first.
Read and compare the changelogs of the various Python versions I wish to check compatibility for, and work it out from there.
Kindly provide better solutions.
I don't really know of a way to get around doing this without some test cases. Even if your code could run in an older version of Python, there is no guarantee that it works correctly without a suite of test cases that sufficiently tests your code.
No, what you named is pretty much how it's done, though the What's New pages and the documentation proper may be more useful than the full changelog. Compatibility with such a huge, moving target is infeasible to automate even partially. It's just not as much work as it sounds, because:
Some people do have test suites ;-)
You don't (usually) need to consider bugfix releases (such as 2.7.x for various x). It's possible that your code requires a bug fix, but generally the .0 releases are quite reliable and code compatible with x.y.0 can run on any x.y.z version.
Thanks to the backwards compatibility policy, it is enough to establish a minimum supported version, all later releases (of the same major version) will stay compatible. This doesn't help in your case as 2.7 is the last 2.x release ever, but if you target, say, 2.5 then you usually don't have to check for 2.6 or 2.7 compatibility.
If you keep your eyes open while coding, and have a bit of experience as well as a good memory, you'll know you used some functionality that was introduced in a recent version. Even if you don't know what version specifically, you can look it up quickly in the documentation.
Some people embark with the intent to support a specific version, and always keep that in mind when developing. Even if it happens to work on other versions, they'd consider it unsupported and don't claim compatibility.
So, you could either limit yourself to 2.7 (it's been out for three years), or perform tests on older releases. If you just want to determine whether it's compatible, not which incompatibilities there are and how they can be fixed, you can:
Search the What's New pages for new features, most importantly new syntax, which you used.
Check the version constraints of third party libraries you used.
Search the documentation of standard library modules you use for newly added functionality.
A lot easier with some test cases, but manual testing can give you a reasonable idea.
Take the furthest-back version that you would hope to support (I would suggest 2.5.x, but go further back if you must) and manually test with that version, keeping notes of what you did and especially where it fails, if anywhere. If it does fail, either address the issue or do a binary search to see at which version the failure point(s) disappear. This could work even better if you start from a version that you are quite sure will fail, maybe 2.0.
1) If you're going to maintain compatibility with previous versions, testing is the way to go. Even if your code happens to be compatible now, it can stop being so at any moment in the future if you don't pay attention.
2) If backwards compatibility is not an objective but just a "nice side-feature for those lucky enough", an easy way for OSS is to let users try it out, noting that "it was tested in <version> but may work in previous ones as well". If there's anyone in your user base interested in running your code in an earlier version (and maintain compatibility with it), they'll probably give you feedback. If there isn't, why bother?

What versions of Python are in current use, and which should packages support?

As a small-time Python package writer (cobs, simplerandom), I'm wondering what Python versions I should support.
I've heard anecdotally that Python 2.5 is still in use on enterprise type servers. So I thought 2.5 was the oldest that needed to be practically supported, here in 2011.
However, I saw this blog in which the author says he's still using 2.4. From memory, I saw an e-mail on the PyCrypto mailing list saying they aimed to keep support going back to 2.2 if possible.
Of course, then there's Python 3.x which is slowly gaining momentum. It would be good to know who is using that.
Then, there are also Jython and IronPython, and I have very little idea about them.
Is there any concrete and up-to-date Python installation/usage data available to enlighten us? Is there any "best practices" or other advice for what versions/flavours of Python a package writer should aim to support?
I think that this is a problem that's simply inherent in developing any software. Anyone could be running any version and would need support for that version (I wonder how many people are still running Windoze ME out there? ;)). Personally, when developing libraries, I'll only support the current version and up. If for no other reason, because I'm only one person and I don't have a team.
Having said that, I'd stick my packages up on github and take patches from anyone who wants support for previous versions (and is willing to put in the work).
Edit:
I've found that a good rule in software development (especially packages) is develop only for what is needed, not what you think might be needed. In other words, get it working for whatever version of Python you're running (or is dearest to you) and then take support requests if you want to implement them yourself as people need them.
I have servers that run Python 2.3. :-)
But no, you don't need to support it. Most servers like that are just running, and do not get any new modules installed.
When creating a new module today, 2.6, 2.7, 3.1 and 3.2 are the versions to support. For existing modules you can ask your users. :-)
You should support the latest of the 2.x series (2.7 as of now) and 3.x (3.2 as of now). Unless you have a specific need for supporting older versions, I don't think you need to go there.
As for alternate implementations like IronPython and Jython, you should support them only if needed. It can be time-consuming (although perhaps educational) to support your app on all these implementations.
As a side note, to test your app on multiple versions/implementations of Python, I recommend tox.
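For example, a minimal tox.ini covering the interpreters mentioned above might look like this (a sketch; it assumes your tests are collected by pytest):

```ini
[tox]
envlist = py26,py27,py31,py32

[testenv]
deps = pytest
commands = pytest
```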

When should I drop support for python2.4 on my public python library?

I maintain an open source Python project. Right now it supports Python 2.4, 2.5, and 2.6. I am looking to add support for Python 3. I guess it will be easier if I drop 2.4 support.
I know it is possible to support all of them, but it is very annoying to install four or five Python versions on my machine and run the tests on all of them. Although it is easy to avoid new features introduced in the language, I would like to make use of them! And what is the point of supporting something that possibly nobody uses? I do want to drop it, but I also don't want to lose users (existing and new).
When should I drop support for python 2.4? Is there any recommendation on this?
I'd say it depends on your target audience. For enterprise stuff I think RedHat (certainly CentOS 5) is still on 2.4 - so if you want typical RedHat/CentOS users to be able to install without resorting to third-party Python installations, then I think you need to keep 2.4 for a while. If most of your users are more 'desktop' based, running Fedora/Ubuntu, then they probably have 2.5/2.6 already, so it wouldn't be an issue for them.
You don't have to drop support for 2.4 in order to add support for 3.x, as I'm sure you know. I've made coverage.py run on 2.3 through 3.1 with the same code. It's not always pretty, but it's possible: Running the same code on Python 2.x and 3.x.
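For illustration, this is the general shape of the shims such a single codebase ends up with (the pattern, not coverage.py's actual code):

```python
import sys

if sys.version_info >= (3, 0):
    string_types = (str,)

    def iteritems(d):
        return iter(d.items())
else:
    string_types = (basestring,)  # noqa: F821 -- only defined on 2.x

    def iteritems(d):
        return d.iteritems()
```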
It's a matter of weighing pros and cons.
I suppose the real answer to this question is how many features there are in 2.5/2.6 that would really improve your library. It seems as though 2.4 becomes less and less worth supporting as time goes by.
On the other hand, there are still some people on Python 2.4. You have to decide if it's worth it to drop support for them to take advantage of newer features of Python 2.5.
You don't have to drop anything: what works on 2.4 works on 2.5 and 2.6. You can easily avoid incompatibilities by skipping "with", the ternary operator, and "from __future__" imports.
Now, once you have a very stable and fully featured version of your code and need to make a big architectural change, start writing for Python 3.0. No rush - it won't be massively used for a year or two.
A good indicator is to focus on projects that have the same audience as yours. When do they plan to switch on their roadmaps?
GNOME ?
Django ?
Inkscape ?
Track downloads of each version of your project. Graph the daily traffic (or weekly if there is too much variation day to day) for each version separately. Keep an eye on the trends and at some point you will see a distinctly downward trend for 2.4 compared to the rest. When that downward trend is well established, discontinue upgrades to the 2.4 version, but keep it available for download. You should include some kind of note in the README for the last 2.4 version, and maybe display a message when it is installed.
At this point, your work is done, unless you find some really glaring error that you want to fix. You don't ever have to actually discontinue the 2.4 version, just cease upgrading it.
And the graphs that you now produce every week will tell you when it is time to do the same for 2.5, and eventually 2.6.
Any answer here is going to be subjective. I suggest you make a feature and user list. There are 2 things to consider here.
1: How will your program benefit - what features are nicer/faster/less buggy in newer versions of Python? What extra dependent libraries can your program use by sticking to an older version? Not everything is ported to 3.x, or even 2.5, yet.
2: How will your users benefit - what benefits do users gain from older versions? How much bigger/smaller does your user base get by dropping 2.4 and adding 3.x? What does your user base look like currently?
The third is not really a point, since the direct benefit of open source to developers is kinda iffy - but what do you gain? E.g. less time needed to maintain, faster development, etc.
Hope making a summary will help you put things in perspective.
