"Installing From Source" Within Anaconda Environment - python

What I would like to do:
I am using macOS and Anaconda 2.
I would like to install a Python package (specifically PyTorch) from source.
I would like to install all the dependencies and the package itself within an Anaconda environment.
I don't want this Anaconda environment to be the default/root Anaconda environment, but an environment I created specifically for installing this package and its dependencies from source.
What I have done:
First, I created the environment as follows:
conda create --name my_env python=3.5
Now, the instructions for installing PyTorch from source are as follows:
export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy pyyaml setuptools cmake cffi
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
Now, my questions are:
Following these instructions requires me to specify the Anaconda root directory for CMAKE_PREFIX_PATH. What should that directory be, given that I want everything set up in my_env?
Is it reasonable to create an extra environment for a package installed from source and its dependencies? Why would one do it or not do it? My motivation is mainly the fear that one day I may screw my system up big time, and hence I want things to be cleanly separated.
If you can only answer one of the two questions, that is already greatly appreciated. Thanks!

I received this answer from the Anaconda Google discussion group and re-post it here in case anyone else is interested.
It is the path to my_env. If you created it with -n my_env and you haven't otherwise changed your envs dir, it'll be in <anaconda root>/envs/my_env
Yes, this is definitely good practice. The cleanest way to use conda is to install miniconda, not anaconda, and to install as little as possible into the root environment.
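A minimal Python sketch of the first answer, assuming the envs directory was not moved: it builds <anaconda root>/envs/my_env, or, if run after conda activate my_env, simply uses the CONDA_PREFIX variable that conda sets to the active environment's root. The helper names and the ~/anaconda root are hypothetical placeholders.

```python
import os
from pathlib import Path
from typing import Mapping


def default_env_prefix(anaconda_root: str, env_name: str) -> Path:
    # Default location of an env created with `conda create --name <env_name>`,
    # assuming the envs directory has not been moved.
    return Path(anaconda_root) / "envs" / env_name


def cmake_prefix_path(anaconda_root: str, env_name: str = "my_env",
                      environ: Mapping[str, str] = os.environ) -> str:
    # After `conda activate my_env`, CONDA_PREFIX holds the active env's root,
    # so use it when present; otherwise fall back to the default location.
    return environ.get("CONDA_PREFIX") or str(default_env_prefix(anaconda_root, env_name))


if __name__ == "__main__":
    # "~/anaconda" is a hypothetical root directory; substitute your own.
    root = os.path.expanduser("~/anaconda")
    print(f"export CMAKE_PREFIX_PATH={cmake_prefix_path(root)}")
```

The printed line can be pasted in place of export CMAKE_PREFIX_PATH=[anaconda root directory]; the fallback only holds if my_env is really the environment you activated.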

Related

Do these commands do the same action?

Basically, I would like to know whether these two snippets do the same thing:
conda install -n myEnv myPackage
VS
conda activate myEnv
pip install myPackage
Or, put differently, does a pip install while a conda environment is activated amount to a conda install into myEnv?
EDIT: I thought it was obvious, but to be more precise: does the second snippet install the package only into the environment, or into the overall system?
PS: I'm asking because there's a package available with pip but not with conda, and I want it to be installed only in myEnv.
The Anaconda docs make it clear that if you use conda as your virtual environment manager, you should stick to conda install to install new packages as far as possible:
Unfortunately, issues can arise when conda and pip are used together
to create an environment, especially when the tools are used
back-to-back multiple times, establishing a state that can be hard to
reproduce. … Running conda after pip has the potential to
overwrite and potentially break packages installed via pip. Similarly,
pip may upgrade or remove a package which a conda-installed package
requires.
If you can't get all the packages you need from a conda channel, they say this, which is good advice even if you don't use pip:
If there is an expectation to install software using pip along-side
conda packages it is a good practice to do this installation into a
purpose-built conda environment to protect other environments from any
modifications that pip might make.
Finally the same document notes:
Use conda environments for isolation
create a conda environment to isolate any changes pip makes
environments take up little space thanks to hard links
care should be taken to avoid running pip in the “root” environment
Provided you activate the correct conda environment first, the pip install command(s) should use that environment's pip and install only into that environment.
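To check that claim on your own machine, a small sketch like the following (a hypothetical check_env.py, run with python check_env.py after conda activate myEnv) prints which interpreter is running, the site-packages directory pip would write to for that interpreter, and whether that interpreter belongs to the environment named by CONDA_PREFIX. Invoking pip as python -m pip install myPackage ties pip to that same interpreter, which avoids picking up a stray pip elsewhere on PATH.

```python
import os
import sys
import sysconfig
from pathlib import Path


def site_packages() -> Path:
    # Directory where pip places pure-Python packages for *this* interpreter.
    return Path(sysconfig.get_paths()["purelib"])


def running_in_active_conda_env() -> bool:
    # True if this interpreter belongs to the currently activated conda env
    # (CONDA_PREFIX is set by `conda activate`).
    prefix = os.environ.get("CONDA_PREFIX")
    return bool(prefix) and Path(sys.prefix).resolve() == Path(prefix).resolve()


if __name__ == "__main__":
    print("interpreter:            ", sys.executable)
    print("site-packages:          ", site_packages())
    print("inside active conda env:", running_in_active_conda_env())
```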
Yes and no.
pip downloads and installs the package from PyPI whereas conda does the same from Anaconda repositories.
There are packages in PyPI not present in Anaconda, and the other way around.
For managing the environment I would choose one way or the other: with pip you can freeze into a requirements.txt (pip freeze > requirements.txt), and with conda you can either export the whole environment (conda env export) or the list of packages (conda list --export > requirements.txt). However, if you try to use a conda-generated file with pip, it will most probably fail.
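The reason is visible in the file format itself. pip freeze writes pins such as numpy==1.21.2, while conda list --export writes name=version=build lines plus # comment lines; pip expects PEP 440 specifiers, so it typically rejects the conda lines as invalid requirements. The sketch below parses the conda format to make the difference explicit; the function and the sample build string are hypothetical.

```python
from typing import NamedTuple, Optional


class CondaPin(NamedTuple):
    name: str
    version: str
    build: str


def parse_conda_export_line(line: str) -> Optional[CondaPin]:
    """Parse one line written by `conda list --export`.

    Such lines look like `name=version=build`; comment lines start with '#'.
    Returns None for comments and blank lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "==" in line:
        # `pip freeze` style pin, e.g. "numpy==1.21.2"
        raise ValueError(f"pip-style requirement, not conda export format: {line!r}")
    parts = line.split("=")
    if len(parts) != 3:
        raise ValueError(f"unexpected conda export line: {line!r}")
    return CondaPin(*parts)
```

The build string (the third field) is conda-specific and has no counterpart in a pip requirement at all.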

Installing dependencies from (Conda) environment.yml without Conda?

I currently use Conda to capture my dependencies for a Python project in an environment.yml.
When I build a Docker service from the project, I need to reinstall these dependencies. I would like to avoid having to add (Mini)conda to my Docker image.
Is it possible to parse environment.yml with pip/pipenv or transform this into a corresponding requirements.txt?
(I don't want to leave conda just yet, as this is what MLflow captures, when I log models)
Nope.
conda automatically installs dependencies of conda packages. These are resolved differently by pip, so you'd have to resolve the Anaconda dependency tree in your transformation script.
Many conda packages are non-Python. You couldn't install those dependencies with pip at all.
Some conda packages contain binaries that were compiled with the Anaconda compiler toolchain. Even if the corresponding pip package can compile such binaries on installation, it wouldn't be using the Anaconda toolchain. What you'd get would be fundamentally different from the corresponding conda package.
Some conda packages have fixes applied, which are missing from corresponding pip packages.
I hope this is enough to convince you that your idea won't fly.
Installing Miniconda isn't really a big deal. Just do it :-)
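The only part of an environment.yml that maps directly onto pip is the nested pip: list under dependencies; everything else is a conda spec, which is exactly the part the answer above says pip cannot handle. As a sketch, assuming the file has already been loaded into a dict (for example with the third-party PyYAML's yaml.safe_load), the two groups can be separated like this; the example environment is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_dependencies(env: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split the `dependencies` of an already-parsed environment.yml into
    (conda specs, pip requirements)."""
    conda_specs: List[str] = []
    pip_reqs: List[str] = []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested `- pip: [...]` entry holds ordinary pip requirements.
            pip_reqs.extend(dep.get("pip", []))
        else:
            # Everything else is a conda spec (possibly a non-Python package).
            conda_specs.append(str(dep))
    return conda_specs, pip_reqs


# Hypothetical environment.yml, as yaml.safe_load() would return it.
example = {
    "name": "my-project",
    "dependencies": ["python=3.8", "numpy", "cudatoolkit", {"pip": ["mlflow"]}],
}

if __name__ == "__main__":
    conda_specs, pip_reqs = split_dependencies(example)
    print("conda only:", conda_specs)
    print("pip:       ", pip_reqs)
```

Whatever ends up in the first list still needs conda (or a hand-made replacement) to install.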

What does conda env do under the hood?

After searching and not finding, I must ask here:
How does conda env work under the hood, meaning, how does anaconda handle environments?
To clarify, I would like an answer or a reference to questions like:
What is kept in the envs/myenv folder?
What happens upon activate myenv?
What happens upon conda install ...?
Where can I find such information?
Conda envs
Basically, conda environments replicate the structure of your system: each environment contains its own /bin, /lib, /etc, /var, among other directories. This is more obvious on Unix systems, but the same concept holds on Windows (DLLs, libs, Scripts, ...).
More details in the official documentation.
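To see this system-like layout for yourself, a short sketch can list the top-level directories of the current interpreter's prefix (sys.prefix, which is the environment root when run from an activated env). It also checks for the conda-meta directory, where conda keeps its records of the packages installed in an environment. The helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import List


def env_layout(prefix: str = sys.prefix) -> List[str]:
    # Top-level directories of an environment prefix (bin, lib, include, ...).
    return sorted(p.name for p in Path(prefix).iterdir() if p.is_dir())


def is_conda_env(prefix: str = sys.prefix) -> bool:
    # conda keeps per-package metadata records in <prefix>/conda-meta.
    return (Path(prefix) / "conda-meta").is_dir()


if __name__ == "__main__":
    print(sys.prefix, "| conda env:", is_conda_env())
    print(env_layout())
```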
Conda install
The idea is that conda install PACKAGE will fetch a precompiled package from a channel (a conda packages repository), and install it under this system-like structure. Instead of relying on system dependencies, conda will install all dependencies of this package under the environment structure, using only conda packages.
Thus installing the same package at a given time point under different systems should result in reliably identical installs.
This is a way to standardize binaries, and it is only achieved by precompiling every package against given versions of libraries, which are shipped as dependencies of the conda environment. For instance, conda-forge and bioconda channels rely on cloud-based CI/CD pipelines to compile all packages on identical and completely clean system images.
Conda also stores metadata about these packages (version, build number, dependencies, license, ...), so it is able to solve fairly complex dependency trees and avoid package/library incompatibilities. That is what the Solving... step does each time you run conda install.
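As a toy illustration of "install all dependencies under the environment, using only package metadata", the sketch below walks a hypothetical dependency table and returns every package that must be installed, dependencies first. This is not conda's algorithm: conda's real solver is SAT-based and works on version and build constraints, which this sketch ignores entirely.

```python
from typing import Dict, List, Set

# Hypothetical, simplified metadata: package name -> names of its dependencies.
# Real conda metadata also carries versions, build strings and constraints.
METADATA: Dict[str, List[str]] = {
    "scipy": ["numpy", "libopenblas", "python"],
    "numpy": ["libopenblas", "python"],
    "libopenblas": [],
    "python": ["openssl", "zlib"],
    "openssl": [],
    "zlib": [],
}


def install_order(package: str, metadata: Dict[str, List[str]]) -> List[str]:
    """Return `package` and all its transitive dependencies, dependencies first."""
    order: List[str] = []
    visited: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle at {name!r}")
        in_progress.add(name)
        for dep in metadata[name]:
            visit(dep)  # install dependencies before the package itself
        in_progress.remove(name)
        visited.add(name)
        order.append(name)

    visit(package)
    return order


if __name__ == "__main__":
    print(install_order("scipy", METADATA))
```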
Conda activate
Then, when you run conda activate ENV, conda prepends the environment's $CONDA_PREFIX/bin to PATH, so that all executables installed in the environment are found first by the system (and take precedence over system-wide installs of the same executable).
You can imagine it like temporarily replacing the system executables with those from the environment.
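The PATH part of activation can be mimicked in a few lines. This sketch assumes the Unix layout (executables in <prefix>/bin) and covers only PATH; the real activate scripts also set CONDA_PREFIX, adjust the prompt and run activation hooks. Function names are hypothetical.

```python
import os
import shutil
from typing import Optional


def activated_path(prefix: str, path: str) -> str:
    """Return PATH as it looks after activating the env at `prefix`
    (Unix layout: executables live in <prefix>/bin)."""
    bin_dir = os.path.join(prefix, "bin")
    # Drop empty entries and any existing copy of bin_dir, then put bin_dir first.
    rest = [p for p in path.split(os.pathsep) if p and p != bin_dir]
    return os.pathsep.join([bin_dir] + rest)


def which_after_activation(cmd: str, prefix: str, path: str) -> Optional[str]:
    # The first match on the new PATH wins, so the env's copy shadows system ones.
    return shutil.which(cmd, path=activated_path(prefix, path))


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "/opt/anaconda/envs/my_env")
    print(activated_path(prefix, os.environ.get("PATH", "")))
    print(which_after_activation("python", prefix, os.environ.get("PATH", "")))
```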
More
This is a very basic explanation, not 100% accurate, and certainly not complete. If you want to learn more, read the documentation, experiment with conda, and maybe take an in-depth look at how conda-forge and Bioconda build packages, as everything is hosted on GitHub.

Installing Anaconda into a Virtual Environment

I've currently got a working installation of the Enthought Python Distribution on my machine that I don't want to necessarily disrupt, but I'd like to look at moving over to Anaconda from Continuum.
I can easily install Anaconda into the virtualenv directory I create, but I'm not sure how to tell that virtualenv to use the Anaconda version of Python. If I were telling my whole system to use it, I could alter .bash_profile with something like export PATH="/DIRECTORIES/anaconda/bin:$PATH". Is there a way to do that within a virtualenv?
I just tested the Anaconda 1.6 installer from http://continuum.io/downloads
After downloading, I did:
bash Anaconda-1.6.0-Linux-x86_64.sh
If you take the defaults, you'll end up with a directory anaconda in your home directory, completely separate from your EPD or system Python installation.
To activate the anaconda installation's default environment, do the following:
source $HOME/anaconda/bin/activate ~/anaconda
All Python commands will now come from the default Anaconda environment in $HOME/anaconda, which is itself a kind of a virtual environment. You can create sub-environments with e.g. conda create -n myenv1 ipython scipy, but this is not necessary.
As a side note, you can also use pip (also in $HOME/anaconda/bin) to install PyPI packages into your Anaconda default environment (it has pip installed by default) or any of the sub-environments (in which case you should first install pip into the sub-environment using conda install -n myenv1 pip).
It is possible to install parts of Anaconda manually into an existing virtualenv, but using their installer is by far the easiest way to test and use, without affecting any of your existing Python installations.
When you create your virtualenv use the -p flag to give it the path to the Python executable you want to use:
virtualenv -p /path/to/python-anaconda-version venv
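To confirm which base interpreter a virtualenv was built from, you can read the pyvenv.cfg file in the environment directory, whose home key names the directory of the base Python. This assumes a virtualenv recent enough to write pyvenv.cfg (version 20 and later, as well as the stdlib venv module, do; older virtualenv releases did not). The helpers and the sample path are hypothetical.

```python
from pathlib import Path
from typing import Dict


def read_pyvenv_cfg(env_dir: str) -> Dict[str, str]:
    """Parse <env_dir>/pyvenv.cfg into a dict of key -> value."""
    cfg: Dict[str, str] = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def base_python_home(env_dir: str) -> str:
    # `home` is the directory of the interpreter the env was created from,
    # e.g. /Users/me/anaconda/bin if -p pointed at Anaconda's python.
    return read_pyvenv_cfg(env_dir)["home"]


if __name__ == "__main__":
    print(base_python_home("venv"))
```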

How to install python packages without root privileges?

I am using numpy / scipy / pynest to do some research computing on Mac OS X. For performance, we rent a 400-node cluster (with Linux) from our university so that the tasks can be done in parallel. The problem is that we are NOT allowed to install any extra packages on the cluster (no sudo or any installation tool); they only provide the raw Python itself.
How can I run my scripts on the cluster then? Is there any way to integrate the modules (numpy and scipy also have some compiled binaries I think) so that it could be interpreted and executed without installing packages?
You don't need root privileges to install packages in your home directory. You can do that with a command such as
pip install --user numpy
or from source
python setup.py install --user
See https://stackoverflow.com/a/7143496/284795
The first alternative is much more convenient, so if the server doesn't have pip or easy_install, you should politely ask the admins to add it, explaining the benefit to them (they won't be bothered anymore by requests for individual packages).
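To see where --user installs end up on a given machine, the stdlib site module reports the user base and the user site-packages directory. On Linux the user base defaults to ~/.local, and console scripts installed with --user go to its bin subdirectory, which you may need to add to PATH yourself. The helper name is hypothetical.

```python
import site
import sys
from typing import Dict


def user_install_locations() -> Dict[str, object]:
    user_site = site.getusersitepackages()
    return {
        # Base directory for --user installs (~/.local by default on Linux).
        "user_base": site.getuserbase(),
        # site-packages directory that --user installs write to.
        "user_site": user_site,
        # Whether Python enables the user site directory at startup.
        "user_site_enabled": site.ENABLE_USER_SITE,
        # Whether that directory is currently importable.
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print(f"{key}: {value}")
```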
You could create a virtual environment through the virtualenv package.
This creates a folder (say venv) with a new copy of the Python executable and a new site-packages directory, into which you can "install" any number of packages without needing any kind of administrative access at all. Thus, activating the environment through source venv/bin/activate will give Python an environment that's equivalent to having those packages installed.
I know this works for SGE clusters, although how the virtual environment is activated might depend on your cluster's configuration.
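If the cluster's raw Python is 3.3 or newer, the stdlib venv module gives the same result as the virtualenv package without installing anything (a swapped-in technique, not the one the answer uses). A minimal sketch, assuming the venv module has not been stripped from the system Python; pip is not bootstrapped here because ensurepip is missing from some system builds. The helper name is hypothetical.

```python
import os
import venv
from pathlib import Path


def make_env(env_dir: str) -> Path:
    """Create a virtual environment at env_dir and return its Python path."""
    # with_pip=False skips bootstrapping pip via ensurepip.
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe


if __name__ == "__main__":
    print(make_env("venv"))
```

Activation then works as described above, with source venv/bin/activate.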
You can try installing virtualenv on your cluster within your own site-packages directory using the following steps:
Download the virtualenv source distribution and put it on your cluster
Install it using setup.py to a specific, local directory to serve as your own site-packages:
python setup.py build
python setup.py install --install-base /path/to/local-site-packages
Add that directory to your PYTHONPATH:
export PYTHONPATH="/path/to/local-site-packages:${PYTHONPATH}"
Create a virtualenv:
virtualenv venv
You can import a module from an arbitrary path by adding its directory to sys.path before importing it:
import sys
sys.path.append("/path/to/module-directory")
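A slightly fuller sketch of the same idea: make a directory importable and import a module from it by name. Because the directory is appended, modules of the same name earlier on sys.path (including anything from PYTHONPATH, whose entries come before site-packages) still take precedence. The helper and module names are hypothetical.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_from(directory: str, module_name: str) -> ModuleType:
    """Make `directory` importable (similar in effect to PYTHONPATH, but
    appended at the end) and import `module_name` from it."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.append(directory)
    # Clear finder caches in case the directory changed after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    mod = import_from("/path/to/module-directory", "my_module")
    print(mod)
```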
The Python distribution Anaconda solves many of the issues discussed in this question. Anaconda does not require admin or root access and can install to your home directory. Anaconda comes with many of the packages in question (scipy, numpy, sklearn, etc.) as well as the conda installer to install additional packages should they be necessary.
It can be downloaded from https://www.continuum.io/downloads
