I have two related Python projects:
../src/project_a
../src/project_a/requirements.txt
../src/project_a/project_a.py
../src/project_b
../src/project_b/requirements.txt
../src/project_b/project_b.py
Both projects use the same version of Python. The respective requirements.txt files are similar but not identical.
Do I create a separate virtual environment for each project or can I create a "global" virtual environment at the ../src level?
Note: I'm obviously new to using virtual environments.
Virtual environments are meant to keep things isolated from each other.
If one project is a dependency of the other one, then they have to be installed in the same environment.
If two projects have dependencies that conflict with each other, then they have to be installed in different environments.
If two projects are meant to be run on different versions of the Python interpreter, then they have to be installed in different environments.
Those are basically the only rules (that I can think of). To me, the rest is just a mix of best practices, personal opinions, common sense, technical limitations, and so on.
One could think of the pets vs. cattle analogy (again), for example. Virtual environments can be seen as throwaway things that are created on demand (automatically, with tools such as tox), which is easy once the dependencies are clearly written down (in requirements.txt, for example).
In your case, I would probably start with a single Python virtual environment, and only start creating more when the need arises. Most likely this will happen once the projects grow in size. And eventually it could become an absolute necessity once a project requires specific versions of dependencies that conflict with the dependencies of the other.
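To see whether the two requirements.txt files already conflict, a rough check along these lines may help. This is only a sketch: it assumes every requirement is pinned as name==version, it ignores ranges, extras, and markers, and the file contents shown are made up for illustration.

```python
def parse_pins(text):
    """Return {package_name: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch ignores unpinned or range requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def conflicting_pins(reqs_a, reqs_b):
    """Return {name: (version_a, version_b)} for packages pinned differently."""
    a, b = parse_pins(reqs_a), parse_pins(reqs_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Hypothetical file contents, made up for illustration.
project_a = "requests==2.31.0\nnumpy==1.26.4\n"
project_b = "requests==2.31.0\nnumpy==2.0.0\npandas==2.2.2\n"

print(conflicting_pins(project_a, project_b))
# -> {'numpy': ('1.26.4', '2.0.0')}
```

An empty result suggests a single shared environment is workable for now; a non-empty one is the point where separate environments become necessary.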
Lately I have started to use venv virtual environments for my development. (Before, I was just using Docker images and conda environments.)
However, I notice that a virtual environment is created for each piece of code you have.
My question is: isn't that wasteful?
I mean, if we have 20 repos of code and they all need opencv, does having 20 virtual environments mean opencv gets installed 20 times?
What is under the hood of the virtual environment practice?
There's a classic trade-off involved here. YES, the liberal use of virtualenvs requires more disk space...but these days, disk space is super cheap. The common consensus, which you will have reached on your own by now if you're an old-timer like me, is that the benefits of having a separate virtualenv for each of your projects VASTLY outweigh the downside of using some extra disk space. Of course, you can get by with fewer or even no virtualenvs.
I used to try to have fewer than one virtualenv per project. But that sometimes led to problems when I went to package up and distribute a particular project. Only if you have one virtualenv per project can you be 100% sure that the runtime environment you are using for testing your code will be identical to the one you'll represent in a requirements.txt file when you distribute your app.
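One way to check that the environment you test in matches what you will ship is to compare the pinned versions against what is actually installed. The sketch below uses the standard library's importlib.metadata; the function name pin_mismatches and the example pin are hypothetical, and it assumes exact name==version pins.

```python
from importlib import metadata


def pin_mismatches(pins, get_version=metadata.version):
    """Compare pinned versions with what is installed in the running environment.

    Returns {name: (pinned, installed)}; installed is None if the package is missing.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical pin; the result depends on the environment this runs in.
print(pin_mismatches({"requests": "2.31.0"}))
```

In an environment shared by several projects, this kind of check tends to report extra or drifted packages, which is the problem described above.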
PS: I just bought a refurbished 4TB hard drive on Amazon to use for backups. It cost me $38, shipped to me in one day! And 1TB SSDs can now be had for about $75. It's amazing how cheap "disk" space is these days.
My question is: isn't that wasteful?
If you have N projects on the same machine and these projects use different versions of most of their imported libraries, venv is a good way to guarantee that all projects work as expected. Otherwise (if you have only a single Python installation) you have to test all N projects for compatibility with a single version of each library. Which is more wasteful: spending time checking every library for compatibility with the currently installed version, or just installing the multiple library versions that were checked and recommended by each library provider?
I mean, if we have 20 repos of code and they all need opencv, does having 20 virtual environments mean opencv gets installed 20 times?
Actually, you don't need to create a venv for every single repo (project). A venv is just a folder and may be created outside the source code directory (for example, I use pyenv with the virtualenv plugin, which lets me switch between environments independently of the repo). So if you have 10 repos using opencv x.x and 10 repos using opencv y.y, you may use only two venvs, x.x and y.y.
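As a rough illustration of that idea, repos can be grouped by their pinned requirements so that each group shares one environment. The repo names and the helper group_by_requirements are hypothetical, and "x.x" / "y.y" are the same placeholders as above, not real versions.

```python
from collections import defaultdict


def group_by_requirements(projects):
    """Group project names whose pinned requirements are identical."""
    groups = defaultdict(list)
    for name, pins in projects.items():
        key = frozenset(pins.items())  # hashable fingerprint of the pins
        groups[key].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical repos; version strings are placeholders.
projects = {
    "repo1": {"opencv-python": "x.x"},
    "repo2": {"opencv-python": "y.y"},
    "repo3": {"opencv-python": "x.x"},
}

print(group_by_requirements(projects))
# -> [['repo1', 'repo3'], ['repo2']]
```

Each inner list is a set of repos that could, in principle, run from one shared venv.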
What is under the hood of the virtual environment practice?
In these times of automated processes, a venv is a simple way to be sure that an automated process uses the specified versions of libraries and that there are no conflicts between versions on a single worker (server). A venv per project allows you to create or destroy a Python environment at any time without worrying about whether it was updated somewhere outside the project, or updated at all. You don't need to maintain the environment and keep it fresh after creating it. Just create → use → remove. Of course, if a couple of processes use the same versions of libraries and start and stop working at the same time (so the venv may be deleted at the same time), it is better to use a single environment for these projects.
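The create → use → remove cycle can be sketched with the standard library's venv module. This is a minimal illustration, not a full workflow: the helper names are hypothetical, and with_pip=False is used only to keep it fast, whereas a real job would normally want pip so it can install from requirements.txt.

```python
import tempfile
import venv
from pathlib import Path


def with_throwaway_env(action):
    """Create a venv in a temp directory, run action(env_dir), then delete it."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        # with_pip=False keeps this sketch fast; real jobs usually need pip.
        venv.EnvBuilder(with_pip=False).create(env_dir)
        return action(env_dir)


def has_config(env_dir):
    # The venv module writes a pyvenv.cfg file at the environment root.
    return (env_dir / "pyvenv.cfg").is_file()


print(with_throwaway_env(has_config))
```

Because the environment lives in a temporary directory, nothing needs to be maintained or cleaned up afterwards.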
Many modern software packages depend on the Python language and, as a consequence, install their own versions of Python with the libraries each particular package needs to work properly.
In my case, I have my own Python that I downloaded intentionally using the Anaconda distribution, but I also have the ones that came with ArcGIS, QGIS, and others.
I have difficulty telling which Python I am, say, updating or adding libraries to when I reach them from the command line, and these are not environments but full Python installations.
What is the best way to tackle that?
Is there a way to force new software to create new environments within one central Python distribution instead of losing track of all the copies existing on my computer?!
Note that I am aware that QGIS can be downloaded now through conda, which reduces the size of my problem, but doesn't completely solve it. Moreover, that version of QGIS comes with its own issues.
Thank you very much.
As Nukala suggested, that's exactly what virtual environments are for. A virtual environment contains a particular version of the Python interpreter and a set of libraries to be used by a single project (or sometimes several). If you use an IDE such as PyCharm, it handles the venvs for you automatically.
You can use pyenv to manage the Python versions on your system. Using pyenv, you can easily switch between multiple versions.
And as suggested - each project can create a virtual environment. You have multiple options here - venv, virtualenv, virtualenvwrapper, pipenv, poetry ... etc.
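To find out which Python you are actually talking to, you can ask the interpreter itself. A small sketch using only the sys module follows; the function name describe_interpreter is hypothetical. Note that the in_venv check targets venv/virtualenv-style environments; for a conda environment it will likely report False, and sys.prefix is the value to look at instead.

```python
import sys


def describe_interpreter():
    """Report which Python is running and whether it sits inside a venv."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,          # root of the active installation or env
        "in_venv": sys.prefix != sys.base_prefix,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

Running pip as "python -m pip install ..." with the same interpreter then installs into that Python rather than into whichever pip happens to come first on the PATH.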
I have a little bit of experience using Anaconda and am about to transition to using it much more, for all of my Python data work. Before embarking on this I have what feels like a simple question: "when should I use a new environment?"
I cannot find any good, practical advice on this on StackOverflow or elsewhere on the web.
I understand what environments are and what their benefits are: if I am working on a project that depends on a specific version of a library that differs from, say, the latest version of that library, then virtual environments are the answer. But I am looking for advice on how to approach their use practically in my day-to-day work on different data projects.
Logically there appear to be (at least) two approaches:
Use one environment until you absolutely need a separate environment for a specific project
Use a new environment for every single project
I can see some pros and cons to each approach and am wondering if there is any best practice that can be shared.
If I should use just one environment until I need a second one, should I just use the default "root" environment and load all my required dependent libraries into that or is it best to start off with my own environment that is called something else?
An answer to this question "why create new environment for install" by #codeblooded gives me some hints as to how to use and approach conda environments and suggests a third way,
Create new environments on an as-needed basis. Projects do not "live" inside environments but use environments at runtime. You will end up with as many different virtual environments as you need to run the projects that you regularly use on that machine; that may be just one environment or it may be more.
Anyway, you can see that I am struggling to get my head around this, any help would be greatly appreciated. Thank you!
As a developer who works with data scientists, I would strongly recommend creating an environment for each project. The benefit of Python environments is that they encapsulate the requirements of a project from all other Python projects.
In the case above, if you were to use Python36 for 8 different projects it would be very easy to accidentally upgrade a package or install a conflicting package that breaks other projects without you realising it.
In the work you do it might not be a big deal, but given how easy it is to create a separate environment for each project, the benefits outweigh the small time cost.
I can tell you that if any of the developers I work with was found to be using a single python environment for multiple development projects they would be instructed to stop doing that immediately.
Ok, I think I worked this one out for myself. Seems kind of obvious now.
You do NOT need to create an environment for every project.
However, if particular projects require particular versions of libraries, a particular version of Python, etc., then you can create a virtual environment to capture all of those dependencies.
To give an example,
let's say you are working on a project that requires a library that has a dependency on a particular version of Python e.g. Python 3.6,
and your base (root) environment is Python 3.7,
you would create a new anaconda environment configured to use Python 3.6 (maybe call it "Python36")
and you would install all the required libraries in that environment and you would use that environment when running that project.
When you have another project that requires similar libraries, you may re-use your now existing Python 3.6 environment (named "Python36") to run this new project,
you would not have to create a new Python 3.6 environment, in the same way that you would not have to install multiple instances of Python 3.6 in order to run multiple projects that required Python 3.6.
I just want to ask...maybe a very stupid question but I want to be sure...
What are the advantages of using different Anaconda environments for different projects?
Also, why would it be bad to have just one with all the packages installed?
Advantages of using environments per project include:
Be able to use different versions of the same packages in different projects.
Ensure only using dependencies you are aware of in your projects, so you are able to reproduce the environment for specific projects on other machines.
Hence some disadvantages when not using environments per project:
If you want to update a package that requires updates in your code, you would need to update all projects which use that package.
You could forget to add dependencies to your project's environment file, because they are in the global environment anyway and therefore nothing forced you to do so.
See the Python and Conda docs about environments for further information.
I have recently become interested in learning how to use virtual environments with Python.
As you probably already know, they are useful when in need of multiple versions of the same package. As far as I understood, using pip you can't differentiate between versions since it just uses the package's name.
I will take Pipenv as an example, which seems to be a powerful new tool also announced as the new standard by the PyPA. I understand fairly well what Pipenv does, and how and why it does the (basic) things. What I don't understand (or better, what puzzles me) is how Pipenv (or any virtual environment tool in Python, as far as I know) manages space on the disk.
With pip you usually install packages in one place, then you simply import them in your code, and that's it.
With Pipenv (or similar) you create a virtual environment in which you have everything installed and it cannot communicate with the external world (which is kind of the point, I know).
Now let's suppose I am working on ProjectA, then on ProjectB. Both will have their own environment (somewhere within ~/.virtualenvs, for Pipenv).
Let's also suppose that even if the two projects have different high-level dependencies, they have one sub-dependency in common. I mean same name, same version.
When I do "pipenv install thatpackage" in each of the cases, it will be downloaded and stored separately in each case. Am I correct?
If I'm right, isn't this a waste of space? I would have 2 copies of the same package on my disk. If this is repeated for many packages, you can guess how much space is wasted when working on many different projects.
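One way to put a number on the space question is to measure how much disk each environment directory actually takes. The sketch below uses only os.walk; the function name tree_size_bytes is hypothetical, and the demo runs on a small temporary directory rather than a real environment.

```python
import os
import tempfile


def tree_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):  # linked files are not counted
                total += os.path.getsize(path)
    return total


# Small self-contained demo: two files of 3 and 5 bytes.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.bin"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(demo_dir, "b.bin"), "wb") as f:
        f.write(b"12345")
    print(tree_size_bytes(demo_dir))
    # -> 8
```

Pointing the same function at the directory of each project's environment gives a per-project figure to weigh against the isolation benefits.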