I have several Python projects using different environments. These environments are managed using Conda and this works well, allowing the same environment to be used in production and dev/test for each project.
Conda yml files are used to define each environment.
There are a number of packages that I would like to use during development, such as autopep8. These don't need to be in the production environment so are not included in the yml file.
How can I install autopep8 and others so that they will work across any Python environment that I load in VS Code? So far I have had to manually install these packages as I switch environments.
Default Packages
One way of managing this without violating environment isolation[1] would be to use Conda's default packages functionality. The idea is to define default packages (such as autopep8) in a .condarc on the development systems only. The conda env create command will respect these and add them to every env you create, so you can still keep a single YAML that describes only the essentials needed in production.
Note that there are multiple options for where to store this .condarc, and Conda can load settings in a nested fashion. If all environments for your user are categorized as "development", then a sensible place to define the default packages would be ~/.condarc. There is additionally a --no-default-packages flag, which can be used to disable such default package installation when you don't need it.
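For illustration, a development machine's ~/.condarc might then contain something like the following (autopep8 comes from the question; pylint is just a stand-in for any other dev-only tool):
create_default_packages:
  - autopep8
  - pylint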
[1] While there are ways to include packages from outside a Conda environment (e.g., through PYTHONPATH), this should be regarded as substandard and only be used as a last resort. Conda is designed with an assumption of full isolation of environments - violating that can lead to undefined behavior.
Related
I have installed Anaconda and the base environment is using the Anaconda Python:
The command type python returns /home/ya/anaconda3/bin/python.
However, when I create a new environment, the default python under the new environment is the /usr/bin/python. How can I change it?
The information below is also covered in the documentation on creating environments.
New Environments are Empty
Conda is a general package and environment manager, not a Python-specific one. Hence, when creating a new environment it does not assume that one wants Python. Instead, if one wants Python installed, it must be specified, e.g.,
conda create -n foo python
Note that the create command can also take version constraints, as well as additional packages. It is best to specify all packages one expects at the time of creation as this simplifies the solving.
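For example, a single call can pin a Python version and pull in extra packages at creation time (the version and package names here are purely illustrative):
conda create -n foo python=3.9 numpy pandas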
Default Packages
The create_default_packages configuration option is available to specify a default set of packages that will always be installed at creation time. For example, if you know that you will always want Python installed, one could use:
conda config --add create_default_packages python
Note that a version can be specified here as well. All subsequent environment creations will then install Python.
One can also subsequently override such defaults with the flag --no-default-packages.
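As a sketch, one could pin the default Python version and then opt out for a one-off environment (the version and the environment name bar are illustrative):
conda config --add create_default_packages 'python=3.9'
conda create -n bar --no-default-packages scipy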
I'm working with a team. We each have our own Windows system. We have shared drives and a shared git repository. We want to have a shared virtual environment (in Python).
My understanding (from previous questions from myself and others) is that virtual environments do not include all files necessary for running python, in particular, the shared VE does not include the Python interpreter.
I can see how we can create a shared VE and it seems we could just copy that around, or put it on the shared drive, or put it in a git repository. But my understanding of this is that it does not eliminate the need for individuals to install their own local versions of python. Is that correct?
One of my colleagues has heard (or read) that "there is a package that allows teams to share their virtual environment configuration through a git-like interface. That way you can “pull” the updated configuration and it will install the new packages automatically. This allows each person to change the configuration and test it before releasing it to the team."
So is there a special package to enable this? Or is it just a regular venv that is included in the git repository with the other files? If we do this, then we must all put the venvs in the same place on our file systems OR we have to go in and manually change the VIRTUAL_ENV variable in activate.bat. Is that correct?
In any case, we do all have to install our own local versions of python anyway. Is that correct?
If the virtual environment is on a shared drive (group-readable), then your team members should be able to access it. A virtual environment is just a directory.
But my understanding of this is that it does not eliminate the need for individuals to install their own local versions of python. Is that correct?
Virtual environments have their own python binaries, which you can see when you run which python inside the virtual environment after it is activated.
So is there a special package to enable this? Or is it just a regular venv that is included in the git repository with the other files? If we do this, then we must all put the venvs in the same place on our file systems OR we have to go in and manually change the VIRTUAL_ENV variable in activate.bat. Is that correct?
I would advise against uploading a virtual environment directory to version control, since it contains binaries and configuration files that don't belong there. It's also unnecessary, because the dependencies are tracked in a requirements.txt file, which lists the pip dependencies and is committed to version control. Additionally, when the virtual environment is activated, the VIRTUAL_ENV environment variable is exported automatically, so there is no need to modify it.
Conclusion
For simplicity, it's probably best to have each user create their own virtual environment and install the dependencies from requirements.txt on their local machine. This also ensures users don't make a change to the virtual environment that will affect other users, which is a drawback of the shared-drive approach above.
If they want the latest requirements, pulling the latest changes with git pull and reinstalling the dependencies with pip install -r requirements.txt is good enough. You just have to ensure the virtual environment is activated, otherwise the dependencies will get installed system-wide. This is where the pipenv package also comes in handy.
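As a rough sketch of that per-developer workflow on Windows (the .venv directory name is just a common convention, not something from the question):
py -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
After a teammate updates requirements.txt, a git pull followed by re-running the last command brings the local environment up to date.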
Usually in my team projects, the README contains instructions to get this setup for each team member.
Additionally, as Daniel Farrell helpfully mentioned in the comments, pip won't be able to manage packages like libffi, openssl, python-devel, etc. inside a virtual environment. This is where Docker containers become useful, since you can install dependencies inside an isolated environment built on top of the host operating system. This ensures the dependencies don't interfere with the system-wide packages, which is a good practice to follow in any case.
An example Dockerfile I have used in the past:
FROM python:3.8-slim-buster
# Set environment variables:
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
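# With the venv's bin directory first on PATH, the pip and python calls below resolve to the venv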
# Create virtual environment:
RUN python3 -m venv $VIRTUAL_ENV
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
# Run the application:
COPY app.py .
CMD ["python", "app.py"]
I adapted this from the Elegantly activating a virtualenv in a Dockerfile article.
Containerization aims to solve the "where does Python come from?" problem. My developer teams usually use a Dockerfile that installs their requirements, together with a docker-compose setup that spins up a development environment for their applications. Unlike a virtual environment, containers offer a complete userspace solution that works well on Windows and macOS.
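For context, a minimal docker-compose.yml along those lines might look like the following, assuming the Dockerfile shown above sits in the repository root (the service name app is illustrative); in practice you would typically add volume mounts and port mappings for development:
version: "3.8"
services:
  app:
    build: .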
The typical command to export an Anaconda environment to a YAML file is:
conda env export --name my_env > myenv.yml
However, one huge issue is the readability of this file, as it includes hard specifications for all of the libraries and all of their dependencies. Is there a way for Anaconda to export the smallest subset of specifications that would subsume these dependencies, to make the YAML more readable? For example, if all you installed in a conda environment was pip and scipy, is there a way for Anaconda to realize that the file should just read:
name: my_env
channels:
- defaults
dependencies:
- scipy=1.3.1
- pip=19.2.3
That way, the Anaconda environment will still have the exact same specification, if not an improved one (if an upstream bug is fixed), and anyone who looks at the yml file will understand what is "required" to run the code, in the sense that if they didn't want to (or couldn't) use the conda environment, they would know what packages they needed to install.
Options from the Conda CLI
This is sort of what the --from-history flag is for, but not exactly. Instead of including exact build info for each package, it will include only what are called explicit specifications, i.e., the specifications that a user has explicitly requested via the CLI (e.g., conda install scipy=1.3.1). Give it a try:
conda env export --from-history --name my_env > myenv.yml
This will only include versions if the user originally included versions during installation. Hence, creating a new environment is very likely not going to use the exact same versions and builds. On the other hand, if the user originally included additional constraints beyond version and build, they will also be included (e.g., the channel specification in conda install conda-forge::numpy will lead to conda-forge::numpy in the export).
Another option worth noting is the --no-builds flag, which will export every package in the YAML but leave out the build specifiers. Note that the two flags are mutually exclusive.
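For example:
conda env export --no-builds --name my_env > myenv.yml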
conda-minify
If this is not sufficient, then there is an external utility called conda-minify that offers some functionality to export an environment that is minimized based on a dependency tree rather than through the user's explicit specifications.
Have a look at pipreqs. It creates a requirements.txt file based only on the imports that you are explicitly doing inside your project (and you even have a --no-pin option to ignore the version numbers). You can later use this file to populate a conda environment via conda install --file requirements.txt.
However, if you're aiming for an environment.yml file, you have to create it manually. But that's just copy and paste from the clean requirements.txt; you only have to separate the conda installs from the "pip-only" installs.
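As an illustration of that manual split, an environment.yml with a pip: subsection might look like the following (the package names are placeholders, not from the question):
name: my_env
channels:
  - defaults
dependencies:
  - scipy
  - pip
  - pip:
    - some-pip-only-package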
I am going through the painful process of learning how to manage packages/different (virtual) environments in Python/Anaconda. I was told that Anaconda is basically a Python installation with all the packages I need (e.g. numpy, scipy, scikit-learn, etc.).
However, when I create a new environment, none of these packages is readily available. I cannot import them when using PyCharm with the newly created environment. When I check the PyCharm project interpreter, or the Anaconda Navigator Environments tab, it seems that indeed none of these packages are installed in my new environments. Why is this? It doesn't make sense to me to provide all these packages, but then not make them ready for use when creating new environments. Do I have to install all these packages manually in new envs, or am I missing something?
Kindest regards, and thanks in advance.
The reason a newly created environment doesn't come with numpy is that you might not want numpy in that environment. Imagine writing an API (or general software package) where your users may or may not have access to numpy. You might want to run tests to make sure your software fails gracefully, or has a pure-Python fallback, if numpy is not installed on your user's machine. Conda environments provide this (insanely useful) benefit. Of course, the package in question doesn't have to be numpy; there are some more esoteric packages where this type of testing is useful.
Furthermore, you can create a conda environment with numpy pre-installed, or any other package you want pre-installed (just add them to the end of the conda create command):
conda create --name my-env-name numpy
Anaconda comes with packages such as numpy, scipy, and scikit-learn available, but if you want to use them within your environment, you must:
1) Create the environment:
conda create --name new_env
2) Activate the environment:
source activate new_env
3) Install the desired package using conda install:
conda install numpy
If you'd like to create a new environment that includes installations of all available Anaconda packages, see create anaconda python environment with all packages. You can include anaconda in the list of packages to install in the environment, which is a 'meta-package' meaning 'all the packages that come with the Anaconda installation'.
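For example:
conda create --name new_env anaconda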
I don't know about "conda" environments, but in general virtual environments are used to provide you with a "unique" environment. This might include different packages, different environment variables, etc.
The whole point of making a new virtual environment is to have a separate place where you can install all the binaries ( and other resources ) required for your project. If you have some pre-installed binaries in the environment, doesn't it defeat the purpose of creating one in the first place?
The fact that you can create multiple environments helps you to separate binaries that might be needed by one and not by the other.
For instance, if you are creating a project which requires numpy 1.1 but you have numpy 2.1 installed, then you have to change it. So basically, by not installing any other packages, they are not making assumptions about your project's requirements.
You can check the packages you have in your environment with the command:
conda list
If a package is not listed, you just have to add it with the command:
conda install numpy
I am new to conda environments and was setting up to use TensorFlow on Windows.
I came across a command -
source activate IntroToTensorFlow.
I understood IntroToTensorFlow is an environment we are creating, but does it mean we need to create this environment every time? I am using Jupyter Notebook, so if I shut down the kernel, will the environment get deactivated?
And if I restart my PC, should I activate the environment every time?
Conda is a package manager that installs and manages (usually) Python libraries and (sometimes) non-Python packages. A conda environment is similar to a virtualenv virtual environment; its typical use case is to have a Python interpreter (any version) along with your choice of compatible Python libraries (any version).
The following example will most likely pertain to you. Suppose you have downloaded the implementation of a very nice paper written in TF and you want to try it out, but the authors implemented it when TensorFlow was still young. The APIs have changed now, and so has the required CUDA version. Ideally, you want to work with the latest TF. Now, what do you do? An easy way to try out this implementation is to create a separate conda environment with the libraries needed for that implementation, run it in this environment, and, if you like it, perhaps consider upgrading the TF APIs and using it in your code.
Conda environments are also pretty simple in their construction. If you installed conda using Anaconda and the default options, you will have your environments in ~/anaconda3/envs. The environments are nothing but directories there, each holding a particular configuration of the Python interpreter and the libraries of your choice. (So when you shut down your PC or Jupyter, the environments will of course persist.) At the time of usage, you just switch between the environments to suit your needs. That is, when you source activate an environment, you will be able to use the Python interpreter and installed libraries from that environment. Note that if you source deactivate or start a new terminal session, you will be back in the root environment.
Besides, Jupyter Notebook, if set up with this plugin, will give you nice integration with conda environments, and you wouldn't even need to source activate every time you want to switch. You can choose between the various settings (or conda environments), which are presented as different kernels in the notebook, so it would be as simple as choosing an environment from a drop-down.
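If that plugin is not an option, one common alternative (not the plugin the answer links to, just a widely used approach) is to register the environment as its own Jupyter kernel with ipykernel:
source activate IntroToTensorFlow
conda install ipykernel
python -m ipykernel install --user --name IntroToTensorFlow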
source activate IntroToTensorFlow does not create an environment; it simply activates an environment that has already been created. To create that environment (with tensorflow installed), use conda create -n IntroToTensorFlow tensorflow.
You do not need to create the environment every time, but you do need to activate it every time in order to use the packages installed in it. This is done using source activate IntroToTensorFlow
If you shut down a kernel, the environment does not get deactivated automatically. To deactivate it, you have to explicitly say source deactivate, or activate a separate environment using source activate xxx, replacing xxx with whatever environment name you want (one that you have created previously).
When restarting your PC (or starting a new session at the command line), you have to manually activate your desired environment to use it. Otherwise, by default, you will be running in your root environment. So, if you've only installed tensorflow in the IntroToTensorFlow environment, you have to use source activate IntroToTensorFlow every time in order to use it.
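So a typical session after a reboot might look like this (with conda activate being the newer equivalent of source activate):
source activate IntroToTensorFlow
jupyter notebook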
Take a look here for more info