Background
The official documentation and this blog post on the same website both recommend installing as many requirements as possible with conda and only then using pip. Apparently this is because conda is unaware of any changes pip makes to the dependencies and therefore cannot resolve dependencies correctly.
Question
Now, if one uses pip exclusively and installs nothing with conda, it seems reasonable to expect that conda does not need to be aware of any changes made by pip, since conda effectively becomes a mere tool for isolating dependencies and managing versions. However, this goes against the official recommendation, as one will NOT be installing as many requirements as possible with conda.
So the question remains: is there any known drawback from exclusively using pip in a conda environment?
Similar Topics
A similar topic has been touched on a bit here, but it does not cover the case of exclusively using pip in a conda environment. I have also been here:
Specific reasons to favor pip vs. conda when installing Python packages
What is the difference between pip and conda?
Using Pip to install packages to Anaconda Environment
Not sure one can give a comprehensive answer on this, but some of the major things that come to mind are:
Lack of deep support for non-Python dependency resolution. While more wheels that bundle non-Python resources have become available over time, it is nowhere near the coverage that Conda provides by being a general package manager rather than Python-specific. For anyone doing interoperable computing (e.g., reticulate), I would expect Conda to be favored.
Optimized libraries. Sort of related to the first point, but the Anaconda team has made an effort to build optimized versions of packages (e.g., MKL for numpy). Not sure if the equivalent is available through PyPI. [1]
Wasteful redundancy across environments. Conda uses hardlinking when packages and environments are on the same volume, and supports softlinking for spanning across volumes. This helps to minimize replicating any packages that are installed in multiple environments.
Complicates exporting. When exporting (conda env export), Conda doesn't pick up all pip-installed packages - only the ones that come from PyPI. That is, it will miss things installed from GitHub, etc. If one did go the pip-only route, I think a more reliable export strategy would be to use pip freeze > requirements.txt, and then make a YAML like
channels:
  - defaults
dependencies:
  - python=3.8  # specify the version
  - pip
  - pip:
    - -r requirements.txt
with which to recreate the environment.
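The export strategy above could be scripted roughly as follows. This is only a sketch: the helper names `build_environment_yaml` and `export_pip_only_environment` and the output file names are assumptions made for illustration, not part of Conda or pip. The only external command used is `python -m pip freeze`.

```python
import subprocess
import sys
from pathlib import Path


def build_environment_yaml(python_version: str,
                           requirements_file: str = "requirements.txt") -> str:
    """Return the text of a minimal environment.yml that leaves all
    package installation to pip via a requirements file."""
    lines = [
        "channels:",
        "  - defaults",
        "dependencies:",
        f"  - python={python_version}",
        "  - pip",
        "  - pip:",
        f"    - -r {requirements_file}",
    ]
    return "\n".join(lines) + "\n"


def export_pip_only_environment(target_dir: str = ".") -> None:
    """Write requirements.txt (from pip freeze) and a matching environment.yml."""
    target = Path(target_dir)
    # Use the interpreter running this script so the right pip is queried.
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True, capture_output=True, text=True,
    ).stdout
    (target / "requirements.txt").write_text(frozen)
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    (target / "environment.yml").write_text(build_environment_yaml(version))


if __name__ == "__main__":
    export_pip_only_environment()
```

Run it with the Python of the activated environment and keep the two generated files side by side; the environment can then be recreated with conda env create -f environment.yml.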
All that said, I could easily imagine that none of these matter to some people (most of them are conveniences), especially those who tend to work purely in Python. In such cases, however, I don't see why one would not simply forgo Conda altogether and use a Python-specific virtual environment manager.
[1] Someone please correct me if you know otherwise.
Related
TLDR: I want the ability to pip install whichever version of a package will require the minimal changes to my currently installed versions.
Long version:
I support a pretty complex base container for geoscience research.
My users sometimes fork the environment to add a new package for their specific use-case. The pip install of the new package often causes a cascade of upgrades, which inevitably breaks something. pip install --no-deps is no help -- it just means the new package won't work, because it's missing dependencies.
What I always seem to end up doing is manually walking the version history of the new package, looking for a version that will cause minimum disturbance to my existing packages.
Is there any way of automatically finding this "minimal-disturbance" historical version of the new package?
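The manual walk through the version history described above can be sketched as a small search. Everything here is hypothetical: the release table is made-up data (real dependency metadata would have to be collected from the package index), version parsing handles only plain dotted numbers, transitive dependencies are ignored, and none of this is an existing pip feature.

```python
from typing import Dict, Optional, Tuple

# (minimum version, inclusive; maximum version, exclusive, or None for no cap)
Range = Tuple[str, Optional[str]]


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '1.21.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def disturbance(requirements: Dict[str, Range], installed: Dict[str, str]) -> int:
    """Count how many dependencies would need installing or changing."""
    changes = 0
    for name, (minimum, below) in requirements.items():
        current = installed.get(name)
        if current is None:
            changes += 1  # dependency is missing entirely
            continue
        version = parse_version(current)
        too_old = version < parse_version(minimum)
        too_new = below is not None and version >= parse_version(below)
        if too_old or too_new:
            changes += 1
    return changes


def least_disturbing_release(releases: Dict[str, Dict[str, Range]],
                             installed: Dict[str, str]) -> str:
    """Pick the newest release among those with the fewest required changes."""
    scores = {ver: disturbance(reqs, installed) for ver, reqs in releases.items()}
    best = min(scores.values())
    candidates = [ver for ver, score in scores.items() if score == best]
    return max(candidates, key=parse_version)


# Hypothetical data for illustration only.
installed = {"numpy": "1.21.0", "pandas": "1.3.5"}
releases = {
    "2.0": {"numpy": ("1.24", None), "pandas": ("2.0", None)},
    "1.5": {"numpy": ("1.20", None), "pandas": ("1.4", None)},
    "1.4": {"numpy": ("1.19", "1.25"), "pandas": ("1.3", None)},
    "1.3": {"numpy": ("1.18", None), "pandas": ("1.1", None)},
}
chosen = least_disturbing_release(releases, installed)
```

The chosen version could then be pinned explicitly (pip install "package==version"), so that the resolver has less reason to upgrade anything else.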
I'm testing poetry and I was wondering if it is possible to install prebuilt packages from conda-forge, such as cartopy, without relying on conda (so keeping a 100% poetry process). I googled this a bit, but the only way I found is to install poetry within a conda environment using pip, then install from conda-forge using conda, and then tweak the poetry files to make it aware of the conda environment so that the TOML is written properly.
Packages like cartopy are a pain to install if not from a prebuilt version. If something like poetry add [?conda-forge?] cartopy worked, I'd switch my conda stack to a poetry stack.
Thanks.
Not currently possible. Conda is a generic package manager, not just a Python package manager. Furthermore, there is no dedicated metadata in Conda packages to discriminate whether or not they are Python packages, which I think would be a prerequisite for Poetry being able to determine whether the Conda package is even valid for installation. Hence, what OP requests cannot be a thing, or at least it would be a major undertaking to make it one.
However, others have requested similar features, so someone hopeful for such functionality could subscribe to notifications on those, or follow the Feature Roadmap.
I followed the instructions here: Can't find package on Anaconda Navigator. What to do next?
I clicked Open terminal from environment in Anaconda Navigator, and then ran "pip3 install lmfit" in the terminal. But after installing the lmfit package using pip3, I still cannot find it in conda list. What should I do?
The Problem
At the time of this question, Conda builds of pip had only just started including a pip3 entrypoint, [1] so pip3 is very likely referring to a non-Conda version of Python, and that is where the package was installed. Try running which pip3 to find out where it went.
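The same check can be done from inside Python. The sketch below uses only the standard library (`shutil.which`, `sys.prefix`); the helper names `is_inside_prefix` and `report_entrypoints` are made up for illustration. It reports whether the pip and pip3 found on PATH live under the prefix of the interpreter running the script.

```python
import os
import shutil
import sys


def is_inside_prefix(executable_path: str, prefix: str) -> bool:
    """Return True if executable_path is located under the directory prefix."""
    exe = os.path.abspath(executable_path)
    root = os.path.abspath(prefix)
    try:
        return os.path.commonpath([exe, root]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def report_entrypoints(names=("pip", "pip3")) -> None:
    """Print where each entrypoint resolves relative to the active interpreter."""
    for name in names:
        found = shutil.which(name)
        if found is None:
            print(f"{name}: not found on PATH")
        elif is_inside_prefix(found, sys.prefix):
            print(f"{name}: {found} (inside {sys.prefix})")
        else:
            print(f"{name}: {found} (OUTSIDE {sys.prefix})")


if __name__ == "__main__":
    report_entrypoints()
```

Run it with the environment activated. An entrypoint reported as outside the prefix installs into some other Python, which would explain a package not showing up in conda list.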
Recommendation
Conda First
Generally, it is preferable to use Conda to install packages in Conda environments, and in this case the package is available via the Conda Forge channel:
conda install -c conda-forge lmfit
Contrary to M. Newville's answer, this recommendation to prefer Conda packages is not about benefiting Conda developers, but instead a rule of thumb to help users avoid creating unstable or unreproducible environments. More info about the risks of mixing pip install and conda install can be found in the post "Using Pip in a Conda Environment".
Nevertheless, the comment that not all packages (lmfit specifically) are found in the default repository, and that this complicates installation by forcing users onto third-party channels, is a good point. In fact, because third parties are free to use different build stacks, there are known problems with mixing packages built by Anaconda and those from Conda Forge. However, these issues tend to be rare and limited to compiled code. Additionally, adding trusted channels to the configuration and setting channel priorities generally resolves the issue.
As for risks in using third-party channels, arbitrary Anaconda Cloud user channels are risky: one should only source packages from channels one trusts (just like anything else one installs). Conda Forge in particular is well-reputed and all feedstocks are freely available on GitHub. Moreover, many Python package builds on Conda Forge are simply wrappers around the PyPI build of the package.
PyPI Last
Sometimes it isn't possible to avoid using PyPI. When one must resort to installing from PyPI, it is better practice to use the pip entrypoint from an activated environment, rather than pip3, since only some Conda builds of pip include pip3. For example,
conda activate my_env
pip install lmfit
Again, following the recommendations in "Using Pip in a Conda Environment", one should operate under the assumption that any subsequent calls to conda (install|upgrade|remove) in the environment could have undefined behavior.
PyPI Only
For the sake of completeness, I will note that a stable way of using Conda that is consistent with the recommendations is to limit Conda to the role of environment creation and use pip for all package installation.
This strategy is perhaps the least burden on the Python-only user, who doesn't want to deal with things like finding the Conda-equivalent package name or searching non-default channels. However, its applicability seems limited to Python-only environments, since other libraries may still need to resort to conda install.
[1]: Conda Forge and Anaconda started consistently including pip3 entrypoints for the pip module after version 20.2.
Installing a pure Python package such as lmfit with the correct version of pip (pip install lmfit) should be fine.
Conda first is recommended to make the life of the conda maintainers and packagers easier, not the user's life. FWIW, I maintain both kinds of packages, and there is no reason to recommend conda install lmfit over pip install lmfit.
In fact, lmfit is not in the default anaconda repository so that installing it requires going to a third-party conda channel such as conda-forge. That adds complexity and risk to your conda environment.
Really, pip install lmfit should be fine.
Since conda install and pip install in many cases do essentially the same thing, what would be the best option? Is there a case when someone should stick to pip install only? Symmetrically, is there a case when one should stick to conda install only? Is there a way to shoot oneself in the foot by using both conda and pip install in a single environment?
If both approaches are essentially the same and don't contradict each other there should be no reason to stick solely to one of them but not to the other.
Don't mix conda install and pip install within a conda environment. Decide to use either conda or virtualenv+pip once and for all. Here is how to decide which one suits you best:
Conda installs various (not only python) conda-adopted packages within conda environment. It gets your environments right if you are into environments.
Pip installs python packages within Python environment (virtualenv is one of them). It gets your python packages installed right.
Safe way to use conda: don't rush for the latest stuff and stick to the available packages and you'll be fine.
Safe way to use pip+virtualenv: if you see a dependency issue or wish to remove a package and clean up after it - don't. Just burn the house down, abandon your old environment and create a new one. One command line and 2-5 minutes later things are gonna be nice and tidy again.
Of the two, pip is the best tool for installing Python packages, since packages normally come out on PyPI first and are only later adopted for conda (by conda staff or contributors). Chances are that after updating or installing the latest version of Python, some packages will only be available through pip, and the freshest versions of packages will only be available through pip. Mixing pip and conda packages together can be a nightmare (at least if you want to utilize conda's advantages).
Conda is the best when it comes to managing dependencies and replicating environments. When uninstalling a package, conda can properly clean up after itself, and it has better control over conflicting dependency versions. Conda can also export an environment config and, if the planets are aligned at the moment and the new machine is not too different, replicate that environment somewhere else. Conda also has more control over the environment and can, for example, install a different version of Python inside it (virtualenv can only use a Python already available on the system). You can always create a conda package when you have no freedom of choosing what to use.
Some relevant facts:
Conda takes more space and time to set up
Conda might be better if you don't have admin rights on the system
Conda will help when you have no system Python
virtualenv+pip will free you from needing to know lots of details like that
Some outdated notions:
Conda used to be better for novice developers back in the day (2012ish). There is no usability gap anymore
Conda was linked to Continuum Analytics too much. Now Conda itself is open source, the packages - not so much.
Depends on the complexity of your environment really.
Using pip for a few simple packages should not generate any issues.
Using more pip installs raises the question "Why not use a pip venv then?".
If you're not doing anything major, you might be able to have a mix of pip and conda installs.
There is an extensive explanation why mixing them can be a bad idea here: Using Pip in a Conda Environment.
I'm a beginner trying to play around with machine learning. I downloaded python, and used pip to download libraries like TensorFlow, Pandas, Numpy, etc.
Now, I find that Anaconda is a better package manager to use for machine learning. I'm not sure what I'm supposed to do. Do I have to download all the libraries with Anaconda (which I tried to do with Pandas, and it said the library is already downloaded)?
Could you guys explain to me how I can move from using pip to using anaconda? I really don't understand environments, and this package manager stuff, so please help me!
In principle there is no need to change your package manager. Simply switch to conda install the next time you would use pip install. Think of it like this: do you have to re-download everything when switching from Internet Explorer to Firefox? Some things probably work a little differently between conda and pip, but for a basic beginner these differences should be negligible.
You could freeze your pip packages and re-install them inside a conda environment to have everything (e.g. package dependencies) neatly managed by Anaconda, which is imho good practice. Pip packages will be available in every subsequently created conda environment, so if you want to use different packages in different environments, better to re-install those using conda.
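As a rough sketch of the freeze-and-reinstall idea, the helper below turns the lines of a pip freeze listing into conda-style name=version specs. The function name `pip_freeze_to_conda_specs` is made up for illustration. It assumes each package has the same name on PyPI and on the conda channel in use, which is not always true (for example, torch on PyPI is pytorch on conda channels), so the result needs a manual review.

```python
from typing import Iterable, List, Tuple


def pip_freeze_to_conda_specs(freeze_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split pip freeze output into conda-style specs and lines needing manual care.

    Only plain 'name==version' pins are converted; editable installs,
    direct URLs and anything else are returned untouched in the second list.
    """
    specs: List[str] = []
    skipped: List[str] = []
    for raw in freeze_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blanks and comments
        is_plain_pin = (
            "==" in line
            and " @ " not in line
            and not line.startswith("-e")
        )
        if is_plain_pin:
            name, version = line.split("==", 1)
            # Assumption: the conda package name is the lowercased PyPI name.
            specs.append(f"{name.lower()}={version}")
        else:
            skipped.append(line)
    return specs, skipped


# Illustrative input; in practice the lines come from `pip freeze`.
example = [
    "numpy==1.21.0",
    "Pandas==1.3.5",
    "-e git+https://example.invalid/repo.git#egg=demo",
    "",
]
specs, skipped = pip_freeze_to_conda_specs(example)
```

The converted specs can be passed to conda install inside a fresh environment; the skipped entries have to be installed some other way, typically with pip as the last step.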
There are some non-trivial differences between conda and pip, mentioned here and here.
Best practice is to use different environments for different purposes. In a conda environment, install or re-install all the packages required for that environment. Also, install conda packages first and run pip install only once you are done with conda install. When using both in one environment, be sure not to use "--user" with pip, as conda has user-privilege issues with packages installed by pip that way.
You can check this link for more information