I currently use Conda to capture the dependencies of a Python project in an environment.yml.
When I build a Docker service from the project I need to reinstall these dependencies. I would like to avoid having to add (Mini)conda to my Docker image.
Is it possible to parse environment.yml with pip/pipenv or transform this into a corresponding requirements.txt?
(I don't want to leave conda just yet, as this is what MLflow captures when I log models.)
Nope.
conda automatically installs dependencies of conda packages. These are resolved differently by pip, so you'd have to resolve the Anaconda dependency tree in your transformation script.
Many conda packages are non-Python. You couldn't install those dependencies with pip at all.
Some conda packages contain binaries that were compiled with the Anaconda compiler toolchain. Even if the corresponding pip package can compile such binaries on installation, it wouldn't be using the Anaconda toolchain. What you'd get would be fundamentally different from the corresponding conda package.
Some conda packages have fixes applied, which are missing from corresponding pip packages.
I hope this is enough to convince you that your idea won't fly.
Installing Miniconda isn't really a big deal. Just do it :-)
Related
I'm testing poetry and I was wondering whether it is possible to install prebuilt packages from conda-forge, such as cartopy, without relying on conda (so keeping a 100% poetry process). I googled this a bit, but the only way I found is to install poetry within a conda env using pip, then install from conda-forge using conda, and then tweak the poetry files to make poetry aware of the conda env so that the TOML is written properly.
Packages like cartopy are a pain to install if not from a prebuilt version. If possible, I'd change my conda stack to a poetry stack if something like poetry add [?conda-forge?] cartopy worked.
Thanks.
Not currently possible. Conda is a generic package manager, not just a Python package manager. Furthermore, there is no dedicated metadata in Conda packages to discriminate whether or not they are Python packages, which I think would be a prerequisite for Poetry being able to determine whether the Conda package is even valid for installation. Hence, what OP requests cannot be a thing, or at least it would be a major undertaking to make it one.
However, others have requested similar features, so someone hopeful for such functionality could subscribe to notifications on those, or follow the Feature Roadmap.
After searching and not finding, I must ask here:
How does conda env work under the hood, meaning, how does anaconda handle environments?
To clarify, I would like an answer or a reference to questions like:
What is kept in the envs/myenv folder?
What happens upon activate myenv?
What happens upon conda install ...?
Where can I find such information?
Conda envs
Basically, conda environments replicate the structure of your system, meaning each one stores its own /bin, /lib, /etc, /var, among other directories. This is more obvious on Unix systems, but the same concept holds on Windows (DLLs, libs, Scripts, ...).
More details in the official documentation.
Conda install
The idea is that conda install PACKAGE will fetch a precompiled package from a channel (a repository of conda packages), and install it under this system-like structure. Instead of relying on system dependencies, conda will install all dependencies of this package under the environment structure, using only conda packages.
Thus installing the same package at a given time point under different systems should result in reliably identical installs.
This is a way to standardize binaries, and it is only achieved by precompiling every package against given versions of libraries, which are shipped as dependencies of the conda environment. For instance, conda-forge and bioconda channels rely on cloud-based CI/CD pipelines to compile all packages on identical and completely clean system images.
Conda also stores metadata about these packages (version, build number, dependencies, license, ...) so it is able to solve pretty complex dependency trees and avoid package/library incompatibilities. This is the Solving... step you see each time you execute conda install.
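As a rough illustration of that metadata: conda keeps one JSON record per installed package in the environment's conda-meta directory. The sketch below reads those records with the standard library only. The helper name list_installed is hypothetical, and the keys it reads (name, version, build, depends) are an assumption about what a typical record contains, so missing keys are tolerated.

```python
import json
from pathlib import Path


def list_installed(env_prefix):
    """Return (name, version, build, depends) tuples for an environment.

    Assumes each installed package has a JSON record in
    <env_prefix>/conda-meta; non-JSON files there are ignored.
    """
    records = []
    meta_dir = Path(env_prefix) / "conda-meta"
    for record_file in sorted(meta_dir.glob("*.json")):
        with record_file.open(encoding="utf-8") as fh:
            record = json.load(fh)
        records.append(
            (
                record.get("name"),
                record.get("version"),
                record.get("build"),
                tuple(record.get("depends", [])),
            )
        )
    return records
```

Calling list_installed on the path of an environment (for example the value of CONDA_PREFIX after activation) should give a quick view of what the solver has to keep consistent.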
Conda activate
Then when you conda activate ENV, conda prepends the environment's executable directory, $CONDA_PREFIX/bin, to PATH, so that all executables installed in the environment will be found first by the system (and will take precedence over system-wide installs of the same executable).
You can imagine it like temporarily replacing the system executables with those from the environment.
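The effect of prepending to PATH can be sketched in a few lines of Python. This is only a simulation under stated assumptions: it targets a Unix-like system, the tool name mytool and the directory layout are made up, and real activation does more than this (it also sets CONDA_PREFIX, for instance).

```python
import os
import shutil
import tempfile


def activated_path(env_prefix, current_path):
    """Return a PATH string with the environment's bin directory first."""
    env_bin = os.path.join(env_prefix, "bin")
    return env_bin + os.pathsep + current_path


def demo():
    """Build a fake 'system' and a fake environment that both ship mytool."""
    root = tempfile.mkdtemp()
    system_bin = os.path.join(root, "system", "bin")
    env_prefix = os.path.join(root, "envs", "myenv")
    for bin_dir in (system_bin, os.path.join(env_prefix, "bin")):
        os.makedirs(bin_dir)
        tool = os.path.join(bin_dir, "mytool")
        with open(tool, "w") as fh:
            fh.write("#!/bin/sh\n")
        os.chmod(tool, 0o755)  # mark as executable so it can be found
    # Lookup before "activation": only the system directory is searched.
    before = shutil.which("mytool", path=system_bin)
    # Lookup after "activation": the environment's bin directory wins.
    after = shutil.which("mytool", path=activated_path(env_prefix, system_bin))
    return before, after
```

Running demo() returns two paths: the first points into the fake system directory, the second into the environment, because the lookup walks PATH from left to right and stops at the first match.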
More
This is a very basic explanation, not 100% accurate, and certainly not complete. If you want to learn more, go read the documentation, experiment with conda, and maybe take an in-depth look at how conda-forge and Bioconda build packages, as everything is hosted on GitHub.
Does conda allow you to install a dependency into an environment as a development dependency?
I'm thinking of something like how bower does this with --save-dev
AFAICT, no, it does not. This repo presents workaround options that might be useful elsewhere:
https://github.com/dazza-codes/conda_container
In short, it supplements a conda install with subsequent pip installs from a requirements.txt and/or a requirements.dev file. Since there can be inconsistencies between conda and pip packages (like different name variants etc.), there are use cases for having a combination of conda and pip. Conda also supports a pip list in an environment.yml file, but the version specs for conda vs. pip packages are not compatible. Liberal use of pip check is recommended for any combination of packages from different packaging systems.
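To make the conda/pip split in an environment.yml concrete, here is a minimal sketch that separates the two kinds of specs. It assumes the file has already been parsed into a dict (for example with a third-party YAML parser); the function name split_dependencies and the package names and versions in the example are made up for illustration.

```python
def split_dependencies(environment):
    """Split parsed environment.yml content into conda specs and pip specs.

    `environment` is assumed to be the already-parsed file as a dict.
    """
    conda_specs, pip_specs = [], []
    for entry in environment.get("dependencies", []):
        if isinstance(entry, dict):
            # The nested "pip" list holds pip-style requirement strings.
            pip_specs.extend(entry.get("pip", []))
        else:
            # Plain strings are conda specs, e.g. "numpy=1.19".
            conda_specs.append(entry)
    return conda_specs, pip_specs


# Hypothetical parsed environment.yml, for illustration only.
example = {
    "name": "my_env",
    "dependencies": [
        "python=3.8",
        "numpy=1.19",
        "pip",
        {"pip": ["requests==2.25.1"]},
    ],
}
conda_specs, pip_specs = split_dependencies(example)
```

Note the different pinning syntax in the example: the conda entries use a single = while the pip entry uses ==, which is one reason the two lists cannot simply be merged.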
What I would like to do:
I am using macOS and Anaconda 2.
I would like to install a Python package (specifically PyTorch) from source.
I would like to install all the dependencies and the package itself within an Anaconda environment.
I don't want this Anaconda environment to be the default/root Anaconda environment, but an environment I created specifically for installing this package and its dependencies from source.
What I have done:
First, I created the environment as follows
conda create --name my_env python=3.5
Now, the instructions for installing PyTorch from source are as follows:
export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy pyyaml setuptools cmake cffi
git clone --recursive https://github.com/pytorch/pytorch
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
Now, my questions are:
Following these instructions requires me to specify the Anaconda root directory for CMAKE_PREFIX_PATH. What should that directory be, given that I want everything set up in my_env?
Is it reasonable to create an extra environment for a package installed from source and its dependencies? Why would one do or not do it? My motivation is mainly fear that one day I may screw my system up big time and hence want things to be cleanly separated.
If you can only answer one of the two questions, that is already greatly appreciated. Thanks!
I received this answer from the Anaconda Google discussion group and re-post it here in case anyone else is interested.
It is the path to my_env. If you created it with -n my_env and you haven't otherwise changed your envs dir, it'll be in <anaconda root>/envs/my_env
Yes, this is definitely good practice. The cleanest way to use conda is to install miniconda, not anaconda, and to install as little as possible into the root environment.
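The path from the first point can be computed rather than typed by hand. The following is a small sketch, assuming the default envs directory described above; both function names are hypothetical. CONDA_PREFIX is the variable conda sets when an environment is activated, and sys.prefix is used as a fallback when it is not set.

```python
import os
import sys


def env_prefix(anaconda_root, env_name):
    """Default location of a named environment under the install root."""
    return os.path.join(anaconda_root, "envs", env_name)


def active_prefix():
    """Prefix of the currently active environment.

    Uses CONDA_PREFIX when activation has set it, else sys.prefix.
    """
    return os.environ.get("CONDA_PREFIX", sys.prefix)
```

With my_env activated, the value returned by active_prefix() is what would go into CMAKE_PREFIX_PATH in the build instructions quoted in the question.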
The Anaconda website mentions that the installer includes hundreds of pre-built packages. Even the installer size of 500 MB hints that there should be some pre-built packages.
Yet when we want to use any of the packages, we have to install them through a command, e.g. conda install nltk
This basically downloads the package from the internet and then installs it, which seems counterintuitive since the website already says that nltk is present in the installer.
Can anybody shed some light on this?
There are two parts:
Conda - Package & environment management system. This gives you the conda command and serves a similar function as pip and virtualenv.
Anaconda - Python package distribution containing 100's of scientific packages that are tested and verified to work together.
If you install Miniconda, you will just get conda without the full Anaconda distribution. If you install Anaconda, you will get both the conda management system and the Python distribution. You can also get Anaconda after only having installed conda by running conda install anaconda.