Managing Multiple Python installations

Many modern software packages depend on Python and, as a consequence, install their own versions of Python with the libraries each particular package needs to work properly.
In my case, I have my own Python that I downloaded intentionally using the Anaconda distribution, but I also have the ones that came with ArcGIS, QGIS, and others.
I have difficulty distinguishing which Python, say, I am updating or adding libraries to when I reach them from the command line; these are not environments but rather full Python installations.
What is the best way to tackle that?
Is there a way to force new software to create new environments within one central Python distribution instead of losing track of all the copies existing on my computer?
Note that I am aware that QGIS can now be downloaded through conda, which reduces the size of my problem but doesn't completely solve it. Moreover, that version of QGIS comes with its own issues.
Thank you very much.

As Nukala suggested, that's exactly what virtual environments are for. A virtual environment contains a particular version of a Python interpreter and a set of libraries to be used by a single (or sometimes several) project(s). If you use an IDE such as PyCharm, it handles the venvs for you automatically.
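For illustration, a minimal sketch of creating and using a per-project venv; the directory name .venv and the requests package are just examples:

    python -m venv .venv            # create the environment next to the project
    source .venv/bin/activate       # activate it (on Windows: .venv\Scripts\activate)
    pip install requests            # packages now install only into .venv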

You can use pyenv to manage Python versions on your system. With pyenv you can easily switch between multiple versions.
And as suggested, each project can create a virtual environment. You have multiple options here: venv, virtualenv, virtualenvwrapper, pipenv, poetry, etc.
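A hedged sketch of the pyenv workflow; the version numbers are only examples:

    pyenv install 3.10.13      # download and build a specific interpreter
    pyenv install 3.11.7
    pyenv local 3.11.7         # pin the current project directory to 3.11.7
    python --version           # now reports Python 3.11.7 (via the pyenv shims)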

Related

Python virtualenv with relative paths

I would like to install a Python virtualenv with relative paths, so that I can move the virtualenv to another machine (that has the same operating system).
I googled for a solution; some suggest using the --portable option of the virtualenv command, but it is not available on my system. Maybe it is old?
Apart from changing the paths by hand, is there any other official way to create a portable virtualenv?
I am planning to use this on OSX and Linux (without mixing them of course).
The goal of virtualenv is to create isolated Python environments.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions.
Creating portable installations is not the supported use case:
Created python virtual environments are usually not self-contained. A complete python packaging is usually made up of thousands of files, so it’s not efficient to install the entire python again into a new folder. Instead virtual environments are mere shells, that contain little within themselves, and borrow most from the system python (this is what you installed, when you installed python itself). This does mean that if you upgrade your system python your virtual environments might break, so watch out.
You could always try to run a virtual environment on another machine that has the same OS, version, and packages. But be warned: having tried it myself in the past, it is very fragile and prone to weird errors.
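To see why moving a virtual environment usually breaks it, a small sketch you can run yourself; the /tmp/demo-env path is just an example:

    python3 -m venv /tmp/demo-env      # create a throwaway environment
    cat /tmp/demo-env/pyvenv.cfg       # 'home = ...' records an absolute path to the base Python
    head -n 1 /tmp/demo-env/bin/pip    # the script shebang is another absolute path into the venv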
There are other tools to do what you are looking for depending on your use case and target OS (e.g. single executable, MSI installer, Docker image, etc.). See this answer.

Why and when to use a new anaconda environment? (One environment for everything? A new environment for every project?)

I have a little bit of experience using Anaconda and am about to transition to using it much more, for all of my Python data work. Before embarking on this I have what feels like a simple question: "when should I use a new environment?"
I cannot find any good, practical advice on this on StackOverflow or elsewhere on the web.
I understand what environments are and their benefits, and that if I am working on a project that depends on a specific version of a library (different from, say, the latest version), then virtual environments are the answer; but I am looking for advice on how to practically approach their use in my day-to-day work on different data projects.
Logically there appear to be (at least) two approaches:
1. Use one environment until you absolutely need a separate environment for a specific project
2. Use a new environment for every single project
I can see some pros and cons to each approach and am wondering if there is any best practice that can be shared.
If I should use just one environment until I need a second one, should I just use the default "root" environment and load all my required dependent libraries into that or is it best to start off with my own environment that is called something else?
An answer to the question "why create new environment for install" by @codeblooded gives me some hints on how to use and approach conda environments and suggests a third way:
Create new environments on an as-needed basis; projects do not "live" inside environments but use them at runtime. You will end up with as many virtual environments as you need to run the projects that you regularly use on that machine; that may be just one environment, or it may be more.
Anyway, you can see that I am struggling to get my head around this; any help would be greatly appreciated. Thank you!
As a developer who works with data scientists, I would strongly recommend creating an environment for each project. The benefit of Python environments is that they encapsulate the requirements of a project away from all other Python projects.
In the case above, if you were to use Python36 for 8 different projects it would be very easy to accidentally upgrade a package or install a conflicting package that breaks other projects without you realising it.
In the work you do it might not be a big deal, but given how easy it is to create a separate environment for each project, the benefits outweigh the small time cost.
I can tell you that if any of the developers I work with was found to be using a single python environment for multiple development projects they would be instructed to stop doing that immediately.
Ok, I think I worked this one out for myself. Seems kind of obvious now.
You do NOT need to create an environment for every project.
However if particular projects require particular versions of libraries, a particular version of Python etc then you can create a virtual environment to capture all of those dependencies.
To give an example,
let's say you are working on a project that requires a library that has a dependency on a particular version of Python e.g. Python 3.6,
and your base (root) environment is Python 3.7,
you would create a new anaconda environment configured to use Python 3.6 (maybe call it "Python36")
and you would install all the required libraries in that environment and you would use that environment when running that project.
When you have another project that requires similar libraries, you may re-use your existing Python 3.6 environment (named "Python36") to run this new project;
you would not have to create a new Python 3.6 environment, in the same way that you would not have to install multiple copies of Python 3.6 in order to run multiple projects that require Python 3.6.
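A minimal sketch of that workflow with conda; the environment name "Python36" follows the example above, and the packages are only placeholders:

    conda create --name Python36 python=3.6    # new environment pinned to Python 3.6
    conda activate Python36                    # switch to it
    conda install numpy pandas                 # install the libraries the project needs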

What are the advantages of using different anaconda environments for different projects?

I just want to ask...maybe a very stupid question but I want to be sure...
What are the advantages of using different anaconda environments for different projects?
Also, why would it be bad to have just one with all the packages installed?
Advantages of using one environment per project include:
Being able to use different versions of the same packages in different projects.
Ensuring you only use dependencies you are aware of in your projects, so you are able to reproduce the environment for a specific project on other machines (see the sketch below).
Hence, some disadvantages of not using environments per project:
If you want to update a package in a way that requires updates to your code, you would need to update all projects which use that package.
You could forget to add dependencies to your project's environment file, because they are available from the global environment anyway and nothing forced you to declare them.
See the Python and Conda docs about environments for further information.
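As a concrete illustration of the reproducibility point above, a hedged sketch of exporting and recreating a conda environment; the names myproject and environment.yml are only examples:

    conda env export --name myproject > environment.yml   # record the exact dependencies
    conda env create --file environment.yml                # recreate the environment on another machine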

Python virtual environments and space management (Pipenv in particular)

I have recently grown interest in learning how to use virtual environments with Python.
As you probably already know, they are useful when you need multiple versions of the same package. As far as I understand, using pip you can't differentiate between versions, since it just uses the package's name.
I will take Pipenv as an example, which seems to be a powerful new tool that has also been announced as the new standard by the PyPA. I have a fair understanding of what Pipenv does, how, and why (at least the basics). What I don't understand (or rather, what puzzles me) is how Pipenv (or any Python virtual environment tool, as far as I know) manages space on the disk.
With pip you usually install packages in one place, then you simply import them in your code; that's it.
With Pipenv (or similar) you create a virtual environment in which you have everything installed and it cannot communicate with the external world (which is kind of the point, I know).
Now let's suppose I am working on ProjectA, then on ProjectB. Both will have their own environment (somewhere within ~/.virtualenvs, for Pipenv).
Let's also suppose that even though the two projects have different high-level dependencies, they have one sub-dependency in common. I mean, same name, same version.
When I do "pipenv install thatpackage" in each of the cases, it will be downloaded and stored separately in each case. Am I correct?
If I'm right, isn't this a waste of space? I would have 2 copies of the same package on my disk. If this is reiterated for many packages you can guess how much space is wasted when working on many different projects.
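For illustration, a hypothetical sketch of the situation described above; the project names and the requests package are placeholders, and ~/.virtualenvs is Pipenv's default location for environments:

    cd ~/ProjectA && pipenv install requests   # creates a venv under ~/.virtualenvs/ProjectA-<hash>
    cd ~/ProjectB && pipenv install requests   # creates a second venv with its own copy of requests
    ls ~/.virtualenvs                           # two separate environments, each with its own site-packages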

Virtualenv for multiple users or groups

I'm setting up a new system for a group of Python rookies to do a specific kind of scientific work using Python. It's got 2 different pythons on it (32 and 64 bit), and I want to install a set of common modules that users on the system will use.
(a) Some modules work out of the box for both pythons,
(b) some compile code and install differently depending on the python, and
(c) some don't work at all on certain pythons.
I've been told that virtualenv (+ wrapper) is good for this type of situation, but it's not clear to me how.
Can I use virtualenv to set up sandboxed modules across multiple user accounts without having to install each module for each user?
Can I use virtualenv to save me some time for case (a), i.e. install a module, but have all pythons see it?
I like the idea of isolating environments, and then having them just type "workon science32", "workon science64", depending on the issues with case (c).
Any advice is appreciated.
With virtualenv, you can allow each environment to use globally installed system packages simply by omitting the --no-site-packages option. This is the default behavior.
If you want to make each environment install all of their own packages, then use --no-site-packages and you will get a bare python installation to install your own modules. This is useful when you do not want packages to conflict with system packages. I normally do this just to keep system upgrades from interfering with working code.
I would be careful about thinking of these as sandboxes, because they are only partially isolated. The paths to the Python binary and libraries are modified to use the environment, but really that is all that is going on. Virtualenv does nothing to prevent running code from doing destructive things to the system. The best way to sandbox is to set Linux/Unix permissions properly and give them their own user accounts.
EDIT For Version 1.7+
The default for 1.7 is to not include system packages, so if you want the behavior of using system packages, use the --system-site-packages option. Check the docs for more info.
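A hedged sketch of the virtualenvwrapper workflow described above; the interpreter paths are placeholders for wherever the 32-bit and 64-bit Pythons actually live on the system:

    mkvirtualenv --python=/path/to/python32 --system-site-packages science32   # 32-bit env that also sees system packages
    mkvirtualenv --python=/path/to/python64 --system-site-packages science64   # 64-bit env
    workon science32    # switch to the 32-bit environment for a given project
    workon science64    # switch to the 64-bit environment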
