I'm developing a Python library that depends on multiple packages. I'm struggling to find the most straightforward way of managing all those dependencies with the following constraints:
Some of those dependencies are only available as conda packages (technically the source is available but the build process is not something I want to get into)
Other dependencies are only available via pip
I need to install my own library in editable or developer mode
I regularly need to keep the dependencies up-to-date
My current setup for the initial install:
Create a new conda environment
Install the conda-only dependencies with conda install ...
Install my library with pip install -e .
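In commands, roughly (some_conda_only_pkg and the Python version are placeholders):

conda create -n my_env python=3.8
conda activate my_env
conda install some_conda_only_pkg   # the conda-only dependencies
pip install -e .                    # my library in editable mode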
At this point, some packages were installed and are now managed by conda, others by pip. When I want to update my environment, I need to:
Update the conda part of the environment with conda update --all
Update the pip part of the environment by hand
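Spelled out, the two-step update looks like this (some_pip_only_pkg is a placeholder):

conda update --all                 # refreshes only the conda-managed packages
pip list --outdated                # check the pip-managed packages by hand
pip install -U some_pip_only_pkg   # then upgrade each one individually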
My problem is that this is unstable: when I update all the conda packages, conda ensures the consistency of the packages it manages. However, I can't guarantee that the environment as a whole stays consistent, and I just realized that I was missing some updates because I forgot to check for updates in the pip part of the environment.
What's the best way to do this? I've thought of:
Using conda's pip interoperability feature: this seems to work, but I've had some dubious results, probably because of my use of extras_require (a sketch of what I tried is after this list)
Since pip can see the conda packages, the initial install is consistent, which means I can simply reinstall everything when I want to update. This works but is not exactly elegant.
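For reference, here is roughly how I enabled the interoperability option (it is an experimental conda setting):

conda config --set pip_interop_enabled true   # lets conda's solver see pip-installed packages
conda update --all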
The recommendation in the official documentation for managing a Conda environment that also requires PyPI-sourced or pip-installed local packages is to define all dependencies (both Conda and Pip) in a YAML file. Something like:
env.yaml
name: my_env
channels:
  - defaults
dependencies:
  - python=3.8
  - numpy
  - pip
  - pip:
    - some_pypi_only_pkg
    - -e path/to/a/local/pkg
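The environment is then created in one step with:

conda env create -f env.yaml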
The workflow for updating in such an environment is to update the YAML file (which I would recommend keeping under version control) and then either create a new environment or use
conda env update -f env.yaml
Personally, I would tend to create new envs, rather than mutate (update) an existing one, and use minimal constraints (i.e., >=version) in the YAML. When creating a new env, it should automatically pull the latest consistent packages. Plus, one can keep the previous instances of the env around in case a regression turns up during the development lifecycle.
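For example (the env name here is just illustrative):

conda env create -f env.yaml -n my_env_v2   # -n overrides the name in the YAML
conda activate my_env_v2
# my_env is kept around in case a regression turns up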
I am trying to create a conda environment.yml file for users of a project. One dependency is not distributed by conda, but available with pip+github. I assume based on this example that I can do this:
dependencies:
  - pip
  - regular_conda_dep
  - depend_of_blah
  - pip:
    # Install in editable mode.
    - -e git+https://github.com/ourgroup/blah.git
But what happens to the dependencies of blah (depend_of_blah)? Will the pip installs run after the conda ones, so that as long as I am careful to include depend_of_blah it gets installed before blah? Later on, will blah update cleanly, getting as much as possible from conda?
Or do I need to add --no-deps to the pip line? Is it implied that this is done magically? I don't see a lot of advanced examples that deal with this, but my experience is that it is a real danger in pip/conda mixes not to use --no-deps, with pip essentially hijacking anything that hasn't been explicitly handled first.
Conda parses the YAML and partitions the dependency specifications into a Conda set and a Pip set (code). Only the Conda set is used to solve and create the initial environment.[1] Once the environment has been successfully created, Conda writes all the Pip specifications to a temporary requirements.txt (code) and then, using the python in the environment, runs the command:
python -m pip install -U -r <requirements.txt>
So, to explicitly answer the question: If all the dependencies of blah are installed via Conda and they have sufficient versions installed, then Pip should only install blah and leave the Conda versions untouched. This is because the default value for --upgrade-strategy is only-if-needed.
Otherwise, if the Conda dependencies section does not include all the dependencies of blah, then Pip will install the necessary dependencies.
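For illustration, the command Conda runs behaves the same as the following, with the default strategy spelled out explicitly:

python -m pip install -U --upgrade-strategy only-if-needed -r requirements.txt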
[1]: Technically, if create_default_packages is set in the Conda configuration, Conda will first create the environment with just those packages, then subsequently install the dependencies specified in the YAML file.
You can tell pip to ignore dependencies via environment variables:
PIP_NO_DEPS=1 conda env create -f myenv.yaml
From the documentation:
pip’s command line options can also be set with environment variables using the format PIP_<UPPER_LONG_NAME>. Dashes (-) have to be replaced with underscores (_).
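So PIP_NO_DEPS=1 corresponds to pip's --no-deps flag. To apply it to every pip invocation in the shell session rather than a single command:

export PIP_NO_DEPS=1
conda env create -f myenv.yaml
unset PIP_NO_DEPS   # otherwise later pip installs will also skip dependencies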
I'm testing Poetry, and I was wondering if it is possible to install prebuilt packages from conda-forge, such as cartopy, without relying on conda (so keeping a 100% Poetry process). I googled this a bit, but the only way I found is to install Poetry within a conda env using pip, install from conda-forge using conda, and then tweak the Poetry files to make it aware of the conda env so that the TOML is written properly.
Packages like cartopy are a pain to install if not from a prebuilt version. If possible, I'd change my conda stack to a Poetry stack if something like poetry add [?conda-forge?] cartopy worked.
Thanks.
Not currently possible. Conda is a generic package manager, not just a Python package manager. Furthermore, there is no dedicated metadata in Conda packages to discriminate whether or not they are Python packages, which I think would be a prerequisite for Poetry to determine whether a Conda package is even valid for installation. Hence, what OP requests cannot be a thing, or at least it would be a major undertaking to make it one.
However, others have requested similar features, so someone hopeful for such functionality could subscribe to notifications on those, or follow the Feature Roadmap.
Since conda install and pip install in many cases do essentially the same thing, what would be the best option? Is there a case when someone should stick to pip install only? Symmetrically, is there a case when one should stick to conda install only? Is there a way to shoot oneself in the foot by using both conda and pip install in a single environment?
If both approaches are essentially the same and don't contradict each other, there should be no reason to stick solely to one of them.
Don't mix conda install and pip install within a conda environment. Decide to use conda or virtualenv+pip once and for all. Here is how you decide which one suits you best:
Conda installs various (not only Python) conda-adopted packages within a conda environment. It gets your environments right if you are into environments.
Pip installs Python packages within a Python environment (virtualenv is one of them). It gets your Python packages installed right.
Safe way to use conda: don't rush for the latest stuff and stick to the available packages and you'll be fine.
Safe way to use pip+virtualenv: if you see a dependency issue or wish to remove and clean up after a package, don't. Just burn the house: abandon your old environment and create a new one. One command line and 2-5 minutes later, things will be nice and tidy again.
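In commands (assuming a virtualenv named .venv and a requirements.txt to rebuild from):

deactivate                        # leave the broken environment
rm -rf .venv                      # burn the house
virtualenv .venv                  # build a new one
. .venv/bin/activate
pip install -r requirements.txt   # move back in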
Pip is the better of the two for installing Python packages, since pip packages normally come out first and are only later adopted for conda (by conda staff or contributors). Chances are, after updating to or installing the latest version of Python, some packages will only be available through pip. And the latest, freshest versions of packages will only be available in pip. Mixing pip and conda packages together can be a nightmare (at least if you want to utilize conda's advantages).
Conda is the best when it comes to managing dependencies and replicating environments. When uninstalling a package, conda can properly clean up after itself, and it has better control over conflicting dependency versions. Also, conda can export an environment config and, if the planets are right at the moment and the new machine is not too different, replicate that environment somewhere else (a sketch follows). Also, conda has larger control over the environment and can, for example, have a different version of Python installed inside of it (virtualenv can only use the Python available on the system). You can always create a conda package when you have no freedom in choosing what to use.
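A sketch of that export/replicate round trip:

conda env export > environment.yml    # on the source machine
conda env create -f environment.yml   # on the (hopefully not too different) target machine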
Some relevant facts:
Conda takes more space and time to setup
Conda might be better if you don't have admin rights on the system
Conda will help when you have no system Python
virtualenv+pip will free you from having to know lots of details like that
Some outdated notions:
Conda used to be better for novice developers back in the day (2012ish). There is no usability gap anymore
Conda was tied too closely to Continuum Analytics. Now Conda itself is open source; the packages, not so much.
Depends on the complexity of your environment really.
Using pip for a few simple packages should not generate any issues.
Using more pip installs raises the question "Why not use a pip venv then?".
If you're not doing anything major, you might be able to have a mix of pip and conda installs.
There is an extensive explanation why mixing them can be a bad idea here: Using Pip in a Conda Environment.
To be on the safe side, I was advised to keep to
$ conda update anaconda
However, some tutorials on the internet recommend
$ conda update conda
$ conda update --all
The above two commands lead to the installation of custom packages. Will they cause Anaconda to become unstable? Is it safer to simply keep to conda update anaconda?
Custom packages refer to packages whose version string contains the word "custom"; they do not belong to the standard anaconda package set (see: conda packages with version name of 'custom').
I am using Anaconda Python version 3.
Open a Command Prompt or Anaconda Prompt and run:
conda update conda
conda update anaconda
It's a good idea to run both commands twice (one after the other) to be sure that all the basic files are updated.
This should put you back on the latest release, which contains packages selected by the people at Continuum to work well together.
If you want the last version of each package run (this can lead to an unstable environment):
conda update --all
If you are implementing some projects and they depend on previous versions of some packages, then running $ conda update --all can be disastrous: it may break some of the code in your project.
Running $ conda update anaconda is the safer option, because along with anaconda it will update all of the required dependencies to the required versions.
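If you do need conda update --all but your project depends on a specific version of a package, conda can pin that package first via the pinned file in the environment (somepkg and the version are placeholders):

echo "somepkg ==1.2.3" >> "$CONDA_PREFIX/conda-meta/pinned"   # hold somepkg at 1.2.3
conda update --all                                            # everything else updates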
Is it possible to create an anaconda environment with all of the packages from my other environments? It would be even better if it could dynamically stay up to date.
If the packages of interest were all pulled from pip, you could attempt a pip freeze and a requirements install, as discussed here:
Pip freeze vs. pip list
But I doubt that would work globally for every module. I remember back in the day trying to extend my base Python to include Bokeh, but all the dependency headaches eventually caused me to outright install Anaconda.
It looks like there is a means to do this:
$ conda list -e > req.txt
then you can re-create the environment using
$ conda create -n new_environment --file req.txt
These examples cover a one-off merge of a single source into a single target environment. If you want the union of several environments, you'd need to merge the req.txt files, taking the highest version of each package where they conflict, so some string parsing and a little bit of scripting is required to avoid installing conflicting versions funneled down from the various environments (a rough sketch follows below).
(I'm not able to test this directly at the moment)
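A rough sketch of that scripting, assuming GNU sort/awk and spec lines of the form name=version=build as produced by conda list -e:

grep -hv '^#' envA_req.txt envB_req.txt \
  | sort -t= -k1,1 -k2,2V \
  | awk -F= '{latest[$1] = $0} END {for (pkg in latest) print latest[pkg]}' \
  > merged_req.txt
conda create -n merged_env --file merged_req.txt

Sorting version-ascending per package and keeping the last line seen leaves the highest version of each package in merged_req.txt.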
Stack 'em:
Create environments for base_env (base packages) and app_env (just your application packages). Then:
conda activate base_env
conda activate --stack app_env
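After stacking, app_env's binaries take precedence while base_env's remain visible, which you can sanity-check with:

conda env list   # the active (stacked) env is marked with an asterisk
echo $PATH       # app_env's bin dir should come before base_env's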