I have built and installed Python 2.7.8 from source on CentOS 6 with the following commands:
./configure --prefix /opt/Python27 --exec-prefix=/opt/Python27
make
make install
Now 2.7.9 is out and I would like to update my installation. Is it reasonable to expect everything to keep working if I uncompress it in a different directory from the previous one and install it with exactly the same commands, i.e. over the previous installation?
In practice, you're probably OK, and the worst-case scenario isn't that bad.
I'm not sure if Python 2.x ever guaranteed binary-API stability between micro versions.* But, according to the dev guide:
The only changes allowed to occur in a maintenance branch without debate are bug fixes. Also, a general rule for maintenance branches is that compatibility must not be broken at any point between sibling minor releases (3.4.1, 3.4.2, etc.). For both rules, only rare exceptions are accepted and must be discussed first.
So, in theory, there could have been a compatibility-breaking release between 2.7.8 and 2.7.9, and the only way to know for sure is to dig through the bug tracker and the python-dev mailing list and so on to see where it was discussed and accepted. And of course they could always have screwed up and make a breaking change without realizing it. But in practice, the first has only happened a few times in history, and the second has as far as I know never happened.
Another thing that can cause a problem is major changes to the required or optional dependencies that Python builds against between your last build. But this is pretty rare in practice. If you've, say, uninstalled zlib since the last build, then yeah, that could break compatibility, but you're unlikely to have done anything like that.
So, what happens if either of those is true? It just means that any binary extensions, or embedding apps, that you've built need to be rebuilt.
Hopefully you've been using pip, in which case, if there's a problem, getting a list of all the extensions in your site-packages and force-reinstalling them is trivial (although it may take a while to run). And if you're using a lot of virtual environments, you could need to do the same for all of them. As for embedding, if you don't know about it, you're not doing it (unless you've built "semi-standalone" executables with something like pyInstaller, which I doubt you have).
So, not too terrible. And, remember, that's usually not a problem at all, it's just the worst-case scenario.
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 months ago.
Improve this question
I have yet to come across an answer that makes me WANT to start using virtual environments. I understand how they work, but what I don’t understand is how can someone (like me) have hundreds of Python projects on their drive, almost all of them use the same packages (like Pandas and Numpy), but if they were all in separate venv’s, you’d have to pip install those same packages over and over and over again, wasting so much space for no reason. Not to mention if any of those also require a package like tensorflow.
The only real benefit I can see to using venv’s in my case is to mitigate version issues, but for me, that’s really not as big of an issue as it’s portrayed. Any project of mine that becomes out of date, I update the packages for it.
Why install the same dependency for every project when you can just do it once for all of them on a global scale? I know you can also specify —-global-dependencies or whatever the tag is when creating a new venv, but since ALL of my python packages are installed globally (hundreds of dependencies are pip installed already), I don’t want the new venv to make use of ALL of them? So I can specify only specific global packages to use in a venv? That would make more sense.
What else am I missing?
UPDATE
I’m going to elaborate and clarify my question a bit as there seems to be some confusion.
I’m not so much interested in understanding HOW venv’s work, and I understand the benefits that can come with using them. What I’m asking is:
Why would someone with (for example) have 100 different projects that all require tensorflow to be installed into their own venv’s. That would mean you have to install tensorflow 100 separate times. That’s not just a “little” extra space being wasted, that’s a lot.
I understand they mitigate dependency versioning issues, you can “freeze” packages with their current working versions and can forget about them, great. And maybe I’m just unique in this respect, but the versioning issue (besides the obvious difference between python 2 and 3) really hasn’t been THAT big of an issue. Yes I’ve run into it, but isn’t it better practise to keep your projects up to date with the current working/stable versions than to freeze them with old, possibly no longer supported versions? Sure it works, but that doesn’t seem to be the “best” option to me either.
To reiterate on the second part of my question, what I would think is, if I have (for example) tensorflow installed globally, and I create a venv for each of my 100 tensorflow projects, is there not a way to make use of the already globally installed tensorflow inside of the venv, without having to install it again? I know in pycharm and possibly the command line, you can use a — system-site-packages argument (or whatever it is) to make that happen, but I don’t want to include ALL of the globally installed dependencies, cuz I have hundreds of those too. Is —-system-site-packages -tensorflow for example a thing?
Hope that helps to clarify what I’m looking for out of this discussion because so far, I have no use for venv’s, other than from everyone else claiming how great they are but I guess I see it a bit differently :P
(FINAL?) UPDATE
From the great discussions I've had with the contributors below, here is a summation of where I think venv's are of benefit and where they're not:
USE a venv:
You're working on one BIG project with multiple people to mitigate versioning issues among the people
You don't plan on updating your dependencies very often for all projects
To have a clearer separation of your projects
To containerize your project (again, for distribution)
Your portfolio is fairly small (especially in the data science world where packages like Tensorflow are large and used quite frequently across all of them as you'd have to pip install the same package to each venv)
DO NOT use a venv:
Your portfolio of projects is large AND requires a lot of heavy dependencies (like tensorflow) to mitigate installing the same package in every venv you create
You're not distributing your projects across a team of people
You're actively maintaining your projects and keeping global dependency versions up to date across all of them (maybe I'm the only one who's actually doing this, but whatever)
As was recently mentioned, I guess it depends on your use case. Working on a website that requires contribution from many people at once, it makes sense to all be working out of one environment, but for someone like me with a massive portfolio of Tensorflow projects, that do not have versioning issues or the need for other team members, it doesn't make sense. Maybe if you plan on containerizing or distributing the project it makes sense to do so on an individual basis, but to have (going back to this example) 100 Tensorflow projects in your portfolio, it makes no sense to have 100 different venv's for all of them as you'd have to install tensorflow 100 times into each of them, which is no different than having to pip install tensorflow==2.2.0 for specific old projects that you want to run, which in that case, just keep your projects up to date.
Maybe I'm missing something else major here, but that's the best I've come up with so far. Hope it helps someone else who's had a similar thought.
I'm a data scientist and sometimes I run into these things called "virtual environments" and I don't get what the use case is? I already have all of these packages and modules and widgets downloaded! Why should I set up a separate place where I manage all of the stuff I'm already managing globally?
Python is a very powerful tool. In this answer consider two such ways to swing the metaphorical hammer:
Data Science
Software Engineering
For a data scientist (working alone) using Python to write a poc for a research paper, make a lstm nn, or predict the price of TSLA dependent on the frequency of Elon Musk's tweets all that really matters is being able to use the best library (tensorflow, pytorch, sklearn, ...) for whatever task they're trying to get done. In whatever directory they're working in when they need it. It is very tempting to use one global Python installation and just use the same stuff everywhere. Frankly, this is probably fine. As it's just one person managing their own space. So the configuration of their machine would be one single Python environment and everything, everywhere uses it. Or if the data scientist wanted to they could have a single directory that contains a virtual environment and some sub directories containing all the scripts (projects) they work on.
Now consider a software engineer who has multiple git repos with complete CI/CD pipelines that each build into separate entities that then get deployed to some cloud environment. Them and the 9 other people on their team need to be able to be sure that they are all making changes that won't break any piece of the code. For example in Python 3.6 the function dict.popitem subtly changed from returning a random element in a dict to LIFO order guaranteed. It's pretty easy to see that that could cause issues if Jerry had implemented a function that relies on the original random nature of the function and Bob implemented a function with the LIFO behavior guaranteed. This team of engineers would have git repos that each contain a single virtual environment (a single isolated Python environment) that allows them to manage dependencies for that "project".
The data scientist has one Python installation/environment that allows them to do whatever.
The engineer has a Python installation and a bunch of environments so that they can work across multiple repos with multiple people and (hopefully) nothing breaks.
I can see where you're coming from with your question. It can seem like a lot of work to set up and maintain multiple virtual environments (venvs), especially when many of your projects might use similar or even the same packages.
However, there are some good reasons for using venvs even in cases where you might be tempted to just use a single global environment. One reason is that it can be helpful to have a clear separation between your different projects. This can be helpful in terms of organization, but it can also be helpful if you need to use different versions of packages in different projects.
If you try to share a single venv among all of your projects, it can be difficult to use different versions of packages in those projects when necessary. This is because the packages in your venv will be shared among all of the projects that use that venv. So, if you need to use a different version of a package in one project, you would need to change the version in the venv, which would then affect all of the other projects that use that venv. This can be confusing and make it difficult to keep track of what versions of packages are being used in which projects.
Another issue with sharing a single venv among all of your projects is that it can be difficult to share your code with others. This is because they would need to have access to the same environment (which contains lots of stuff unrelated to the single project you are trying to share). This can be confusing and inconvenient for them.
So, while it might seem like a lot of work to set up and maintain multiple virtual environments, there are some good reasons for doing so. In most cases, it is worth the effort in order to have a clear separation between your different projects and to avoid confusion when sharing your code with others.
It's the same principle as in monouser vs multiuser, virtualization vs no virtualization, containers vs no containers, monolithic apps vs micro services, etcetera; to avoid conflict, maintain order, easily identify a state of failure, among other reasons as scalability or portability. If necessary apply it, and always keeping in mind KISS philosophy as well, managing complexity, not creating more.
And as you have already mentioned, considering that resources are finite.
Besides, a set of projects that share the same base of dependencies of course that is not the best example of separation necessity.
In addition to that, technology evolve taking into account not redundancy of knowingly base of commonly used resources.
Well, there are a few advantages:
with virtual environments, you have knowledge about your project's dependencies: without virtual environments your actual environment is going to be a yarnball of old and new libraries, dependencies and so on, such that if you want to deploy a thing into somewhere else (which may mean just running it in your new computer you just bought) you can reproduce the environment it was working in
you're eventually going to run into something like the following issue: project alpha needs version7 of library A, but project beta needs library B, which runs on version3 of library A. if you install version3, A will probably die, but you really need to get B working.
it's really not that complicated, and will save you a lot of grief in the long term.
There are several motivations for venvs,
or for their moral equivalent: conda environments.
1. author a package
You create a cool "scrape my favorite site" package
which graphs a timeseries of some widget product.
Naturally it depends on BeautifulSoup.
You happened to have html5lib 1.1 lying around
due to some previous project, so you tested with that.
A user downloads your scrape-widget package from pypi,
happens to have lxml 4.7.1 available, and finds
that scraping crashes when using that library.
Wouldn't it have been better for your package
to specify that user shall run against the same
deps that you tested with?
2. use a package
Same scenario, but now you're using someone's scrape-widget
package. Author tested with lxml 4.7.1 but you have lxml 4.9.1,
which behaves differently, and this makes the app behave
differently, crashing in ways the author never saw.
3. use two packages
You want to run both scrape-frobozz-magic-widgets
and scrape-acme-widget. Their authors tested using
different versions of requests, and of lxml.
Changing dep changes the app behavior.
You can only use one or the other, unless you're
willing to re-run pip quite frequently.
4. collaborate on a team
You write code that has deps.
So does your colleague.
You have to coordinate things,
so testing on one laptop
instills confidence the test
would succeed on other laptops.
5. use CI
You have a teammate named Jenkins, and
want to communicate to him that you used
a specific version of a dep when you saw the test succeed.
6. get a new laptop
Things were working.
Then your laptop exploded,
you got a new one,
and you (quickly) want to see things work again.
Some of your deps were downrev, due to
recently released bugs and breaking changes.
Reading a file full of dep versions from your github repo
lets you immediately reproduce the state of the world
back when things were working.
I've read in few places that generally, Python doesn't provide backward compatibility, which means that any newer version of Python may break code that worked fine for earlier versions. If so, what is my way as a developer to know what versions of Python can execute my code successfully? Is there any set of rules/guarantees regarding this? Or should I just tell my users: Just run this with Python 3.8 (for example) - no more no less...?
99% of the time, if it works on Python 3.x, it'll work on 3.y where y >= x. Enabling warnings when running your code on the older version should pop DeprecationWarnings when you use a feature that's deprecated (and therefore likely to change/be removed in later Python versions). Aside from that, you can read the What's New docs for each version between the known good version and the later versions, in particular the Deprecated and Removed sections of each.
Beyond that, the only solution is good unit and component tests (you are using those, right? 😉) that you rerun on newer releases to verify stuff still works & behavior doesn't change.
According to PEP387, section "Making Incompatible Changes", before incompatible changes are made, a deprecation warning should appear in at least two minor Python versions of the same major version, or one minor version in an older major version. After that, it's a free game, in principle. This made me cringe with regards to safety. Who knows if people run airplanes on Python and if they don't always read the python-dev list. So if you have something that passes 100% coverage unit tests without deprecation warnings, your code should be safe for the next two minor releases.
You can avoid this issue and many others by containerizing your deployments.
tox is great for running unit tests against multiple Python versions. That’s useful for at least 2 major cases:
You want to ensure compatibility for a certain set of Python versions, say 3.7+, and to be told if you make any breaking changes.
You don’t really know what versions your code supports, but want to establish a baseline of supported versions for future work.
I don’t use it for internal projects where I can control over the environment where my code will be running. It’s lovely for people publishing apps or libraries to PyPI, though.
While it is possible to simply use pip freeze to get the current environment, it is not suitable to require an environment as bleeding edge as what I am used too.
Moreover, some developer tooling are only available on recent version of packages (think type annotations), but not needed for users.
My target users may want to use my package on slowly upgrading machines, and I want to get my requirements as low as possible.
For example, I cannot require better than Python 3.6 (and even then I think some users may be unable to use the package).
Similarly, I want to avoid requiring the last Numpy or Matplotlib versions.
Is there a (semi-)automatic way of determining the oldest compatible version of each dependency?
Alternatively, I can manually try to build a conda environment with old packages, but I would have to try pretty randomly.
Unfortunately, I inherited a medium-sized codebase (~10KLoC) with no automated test yet (I plan on making some, but it takes some time, and it sadly cannot be my priority).
The requirements were not properly defined either so that I don't know what it has been run with two years ago.
Because semantic versionning is not always honored (and because it may be difficult from a developper standpoint to determine what is a minor or major change exactly for each possible user), and because only a human can parse release notes to understand what has changed, there is no simple solution.
My technical approach would be to create a virtual environment with a known working combination of Python and libraries versions. From there, downgrade one version by one version, one lib at a time, verifying that it still works fine (may be difficult if it is manual and/or long to check).
My social solution would be to timebox the technical approach to take no more than a few hours. Then settle for what you have reached. Indicate in the README that lib requirements may be overblown and that help is welcome.
Without fast automated tests in which you are confident, there is no way to automate the exploration of the N-space (each library is a dimension) to find a some minimums.
To simplify my work I want to migrate from Python 2.7.6 To Python 2.7.9/2.7.10.
I need to justify that my Python 2.7.10 Will not break my software "working" with Python 2.7.6
I followed the steps describe in porting python 2 to python 3
Increase my test coverage from 0 to 40%
run pylint (no critical bug)
Learn the differences between Python 2.7.10 And 2.7.6 < I read the release notes
I can't be sure 100% that my code will not break, but how can I be confident?
For example, should I have to look at all the Core and Builtins bugs fixed between 2.7.6 And 2.7.10 And search into my code if we use those methods?
Does exists a better strategy?
100% code coverage is a good solution, but it may be harder to obtain than 50% coverage + 100% code using modified methods between 2.7.6 And 2.7.10 Are tested.
It is a very minor Python update that almost certainty won't break anything, even the without the above mentioned steps (Python 2 to Python 3 migration is a different matter entirely).
As to proving it, well, no amount of statical checking and reading the release notes with help, since all it will tell you, is that almost certainly it is backward compatible (which is the initial guess anyway).
A possible approach would be to reproduce your production environment with Python 2.7.10 in a virtual machine (valgrind, etc can help there) and check if everything runs as expected. No way around running it to be 100% sure.
Increasing coverage is a good idea. By itself though, even full coverage run with Python 2.7.6, doesn't tell you whether it will break with Python 2.7.10 or not.
My answer does not apply only to Python, but to software development in general.
First of all, as someone already stated, Python 2.7.10 is "just" a bug fixing release - this means that all regression tests are passing and that no backward incompatible changes are included. This also guarantees that a function signature does not change, therefore your code is likely to be working. Due to Python source code high coverage, it's also possible to say that even if a bug fix might have introduced a bug, this had been covered with regression tests - so either the bug is new or it was not covered by regression tests (the first does not imply the second).
In addition, having 100% coverage is technically not always possible - 90-95% is generally the way to go. And if that's not enough, you might try different scenarios on a local environment as suggested by rth.
However, consider going through your libraries/modules imported and check if they all support Python 2.7.10. If not, it doesn't mean that your project won't work, but it could happen that if you are using some low-level C libraries they might break - so be careful especially there.
In general, I suggest you to go through the changes and through the imported libraries. Adding coverage is always good - not just to update to a new version - so I join other users in saying that you should definitely increase your coverage.
As stated in the dev-cycle presentation:
To clarify terminology, Python uses a major.minor.micro nomenclature
for production-ready releases. So for Python 3.1.2 final, that is a
major version of 3, a minor version of 1, and a micro version of 2.
new major versions are exceptional; they only come when strongly incompatible changes are deemed necessary, and are planned very long
in advance;
new minor versions are feature releases; they get released roughly every 18 months, from the current in-development branch;
new micro versions are bugfix releases; they get released roughly every 6 months, although they can come more often if necessary; they
are prepared in maintenance branches
This means that updating from a micro version to another shouldn't (in theory) break anything.
It's the same for minor versions, which should only add features that are backward compatible.
Considering how widely used python is, you can be sure that many tests are made to ensure that this is respected.
However, there is no guarantee, but the whole point of micro versions is bugfixing, not introducing new bugs.
I have created a medium sized project in python 2.7.3 containing around 100 modules. I wish to find out with which previous versions of python (ex: 2.6.x, 2.7.x) is my code compatible (before releasing my project in public domain). What is the easiest way to find it out?
Solutions I know -
Install multiple versions of python and check in every versions. But I don't have test cases defined yet, so need to define those first.
Read and compare changelog of the various python versions I wish to check compatibility for, and accordingly find out.
Kindly provide better solutions.
I don't really know of a way to get around doing this without some test cases. Even if your code could run in an older version of python there is no guarantee that it works correctly without a suite of test cases that sufficiently test your code
No, what you named is pretty much how it's done, though the What's New pages and the documentation proper may be more useful than the full changelog. Compatibility to such a huge, moving target is infeasible to automate even partially. It's just not as much work as it sounds like, because:
Some people do have test suites ;-)
You don't (usually) need to consider bugfix releases (such as 2.7.x for various x). It's possible that your code requires a bug fix, but generally the .0 releases are quite reliable and code compatible with x.y.0 can run on any x.y.z version.
Thanks to the backwards compatibility policy, it is enough to establish a minimum supported version, all later releases (of the same major version) will stay compatible. This doesn't help in your case as 2.7 is the last 2.x release ever, but if you target, say, 2.5 then you usually don't have to check for 2.6 or 2.7 compatibility.
If you keep your eyes open while coding, and have a bit of experience as well as a good memory, you'll know you used some functionality that was introduced in a recent version. Even if you don't know what version specifically, you can look it up quickly in the documentation.
Some people embark with the intent to support a specific version, and always keep that in mind when developing. Even if it happens to work on other versions, they'd consider it unsupported and don't claim compatibility.
So, you could either limit yourself to 2.7 (it's been out for three years), or perform tests on older releases. If you just want to determine whether it's compatible, not which incompatibilities there are and how they can be fixed, you can:
Search the What's New pages for new features, most importantly new syntax, which you used.
Check the version constraints of third party libraries you used.
Search the documentation of standard library modules you use for newly added functionality.
A lot easier with some test cases but manual testing can give you a reasonably idea.
Take the furthest back version that you would hope to support, (I would suggest 2.5.x but further back if you must - manually test with that version keeping notes of what you did and especially where it fails if any where - if it does fail then either address the issue or do a binary search to see which version the failure point(s) disappear at. This could work even better if you start from a version that you are quite sure you will fail at, 2.0 maybe.
1) If you're going to maintain compatibility with previous versions, testing is the way to go. Even if your code happens to be compatible now, it can stop being so at any moment in the future if you don't pay attention.
2) If backwards compatibility is not an objective but just a "nice side-feature for those lucky enough", an easy way for OSS is to let users try it out, noting that "it was tested in <version> but may work in previous ones as well". If there's anyone in your user base interested in running your code in an earlier version (and maintain compatibility with it), they'll probably give you feedback. If there isn't, why bother?