Can conda perform an install whilst minimally updating dependencies?

The conda install man page says
Conda attempts to install the newest versions of the requested
packages. To accomplish this, it may update some packages that
are already installed, or install additional packages.
So first, does this also apply to dependencies that it determines it needs to install or update? Assuming that the answer is "yes"; can that behaviour be changed? For example when working with legacy code it can be useful to update dependencies as little as possible, or install the oldest version of a dependency that will still work. Is there some way to get the conda dependency resolver to figure this out automatically, or does one have to resort to manually figuring out the dependency updates in this case?
Or maybe I am wrong entirely and this is the default behaviour? The dependency resolution rules are not clear to me from the documentation.

Conda's Two-Stage Solving
Conda first tries to find a version of the requested package that can be installed without changing any installed packages (a frozen solve). If that fails, it simply re-solves the entire environment from scratch with the new constraint added (a full solve). There is no intermediate behavior (e.g., minimizing the number of packages updated). Perhaps this will change in the future, but this has been the state of things for versions 4.6[?]-4.12.
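If you want to nudge the solver toward one behavior or the other yourself, conda install exposes flags that roughly correspond to the two stages (somepkg is a placeholder):
conda install --freeze-installed somepkg   # ask the solver not to change already-installed packages
conda install --update-deps somepkg        # allow dependencies to be updated as part of the solve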
Mamba
If one needs to manually work things out, I'd strongly suggest looking into Mamba. In addition to being a compiled (fast!) drop-in replacement for conda, the mamba repoquery tool could be helpful for identifying the constraints that are problematic. It has a depends subcommand for identifying dependencies and a whoneeds subcommand for reverse dependencies.
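For example (package names are arbitrary placeholders):
mamba repoquery depends somepkg    # what somepkg depends on
mamba repoquery whoneeds somepkg   # what depends on somepkg (reverse dependencies)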
Suggested Workflow
Were I working with legacy code, I might try defining a YAML for the environment (env.yaml) and placing upper bounds on crucial packages. If I needed a new package, I would dry run adding it (e.g., mamba install -d somepkg) to see how it affects the environment, figure out what constraint (again, an upper bound), if any, it needs, add it to the YAML, then actually install it with mamba env update -f env.yaml.
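A minimal sketch of what that might look like (package names, versions, and bounds are placeholders; the real bounds depend on what the legacy code tolerates):
mamba install -d somepkg              # dry run: preview how adding somepkg would change the env

# env.yaml, after deciding somepkg needs an upper bound
name: legacy
channels:
  - defaults
dependencies:
  - python=3.6
  - numpy<1.17
  - somepkg<2.0

mamba env update -f env.yaml          # actually apply the updated spec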

Related

How to specify the source (git branch) of a conda build package

I created a conda package that builds successfully, and that I can install with conda. I am using versioneer to automatically generate the version number of my builds. My project is in a git repository with multiple branches.
My problem is that when I want to install the package, conda will install the last built version (no matter the branch), whereas I would like it to install, by default, the latest version from the master branch.
My workaround is to manually specify the version number of the version I want.
Is there a way to generate a version number with versioneer that will make conda prefer the latest build from the master branch? Alternatively, is there a way to tell conda which branch to get the latest build from?
Thanks
Rather than varying the version, I'd suggest looking into encoding the branch info into either the build string or the label/subdirectory. To me, these seem more semantically consistent with the situation.
Build Variants
For the former, this could either be done explicitly by defining a build string that includes some jinja-templated variable coordinated with the Git branch, or automatically through variants defined in the conda_build_config.yaml. If you get this working, then installing a build from branch foo would go something like:
conda install my_package=*=*foo
I don't know of a simple example of this, but the Conda Forge blas-feedstock uses a conda_build_config.yaml to define the set of blas_impl options, which is then used to define build strings on the various outputs in meta.yaml.
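A rough sketch of the variant route (the variable name git_branch is my own invention here; you would have to coordinate its value with the branch being built, e.g., from CI or via conda build --variants):
# conda_build_config.yaml
git_branch:
  - master

# meta.yaml (build section), picking up the variant to form the build string
build:
  string: "{{ git_branch }}_{{ PKG_BUILDNUM }}"

# building from another branch could then override the variant, e.g.
conda build recipe/ --variants "{git_branch: [foo]}"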
Repository Labels
For the latter, I only know about Anaconda Cloud hosting (which you may not be using). In that case, one adds a label (subdir) with:
anaconda upload -l foo my_package.tar.gz
If you went this route, then installing a build from a branch foo would go something like:
conda install channel/foo::my_package
where "channel" is the channel to which you upload.

Get the list of packages used in Anaconda

Is there a way to get a list of packages that are being used rather than just installed in the environment?
Example: I can install matplotlib with conda install matplotlib, but if I never used it in any of the files I don't want it to be in the list.
Interesting idea, checking the 'frequently used' packages in your environment.
It appears to me that there is no direct way to check this.
I am also trying to work this out now. My layman's idea is that it can be done in two consecutive stages: (a) find the most-used packages, either those that were often updated (checked using conda list --revisions) or those easily recognized by the user; (b) trace the dependencies of those packages (whether one package is related to another or not) with the pipdeptree command. This Anaconda link might also be useful: Managing Anaconda packages
The first step is to identify the most-used packages in your applications from time to time, and only then trace their dependencies on other packages so that related packages are not unfavorably removed. That said, I still think it is better to stick with the default packages provided by Conda and only add more packages when required.
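The tracing part could look something like this (a sketch; it assumes pipdeptree has been pip-installed into the environment, and matplotlib is just the example package from the question):
conda list --revisions          # history of changes made to the environment
pipdeptree -p matplotlib        # what matplotlib depends on
pipdeptree -r -p matplotlib     # reverse view: what depends on matplotlib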

What is exactly aggressive_update_packages in Anaconda?

I've recently started using the Anaconda environment, and in the config list I came across an option called aggressive_update_packages. It is not very clear to me what happens when I add a new package to this. I couldn't find any satisfying description of this option (only a little bit here), so I can only assume what it does: I think it will keep the given package automatically updated. However, I'm not sure how it works, which is what I'm asking. I'm actively developing a package especially for the Anaconda environment, and it would be a nice feature for others to have it kept automatically updated.
Why it exists
The default settings for the aggressive_update_packages set are provided mostly for security purposes. Because Conda brings many native libraries with it, some of which provide core functionality for securely communicating on the internet, there is an implicit responsibility to ensure that it's making some effort to patch software that is a frequent surface for generic cyberattacks.
Try searching any of the default software (e.g., openssl) in NIST's National Vulnerability Database and you'll quickly get a sense of why it might be crucial to keep those packages patched. Running an old SSL protocol or having an outdated list of certificate authorities leaves one generically vulnerable.
How it works
Essentially, whenever one indicates a willingness to mutate an environment (e.g., conda (install|update|remove)), Conda will check for and request to install the latest versions of the packages in the set. Not much more to it than that. It does not autoupdate packages. If the user never tries to mutate the environment, the package will never be updated.
Repurposing functionality
OP suggests using this as a way to keep a particular package automatically updated. It's possible that, if your users already frequently mutate their envs, the package will get updated frequently via this setting. However, the setting is not something the package can manipulate on its own (manipulating anything other than install files is expressly forbidden). Users would have to manually edit their settings to add the package to the list.
For users who are reproducibility-minded, I would actively discourage them from changing their global settings to add non-security-essential packages to their aggressive_update_packages list.
According to conda release notes
aggressive updates: Conda now supports an aggressive_update_packages configuration parameter that holds a sequence of MatchSpec strings, in addition to the pinned_packages configuration parameter. Currently, the default value contains the packages ca-certificates, certifi, and openssl. When manipulating configuration with the conda config command, use of the --system and --env flags will be especially helpful here. For example:
conda config --add aggressive_update_packages defaults::pyopenssl --system
would ensure that, system-wide, solves on all environments enforce using the latest version of pyopenssl from the defaults channel.
conda config --add pinned_packages Python=2.7 --env
would lock all solves for the current active environment to Python versions matching 2.7.*.
According to this issue - https://github.com/conda/conda/issues/7419
This might mean that any new env created by default adds/updates the packages in the aggressive_update_packages configuration.
To see the current value, run conda config --show.

What is the advantage of Pip over Anaconda?

So, I have seen What is the difference between pip and conda?. However, all of the answers there appear to be from Anaconda supporters. So that made me wonder: why is pip still the standard? Why hasn't everyone just moved to Anaconda?
I understand that Anaconda only works with its own Python, but is that the only disadvantage?
Based on my limited experience, I would guess that the main advantage of pip over conda is the ability to still install packages that are not available from conda or Anaconda.org.
https://conda.io/docs/using/pkgs.html#install-non-conda-packages - says basically the same.
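For instance, in a Conda environment one can still reach for pip, either directly after activating the env or declaratively in an environment.yml (a sketch; the package name is a placeholder for something that only exists on PyPI):
name: myenv
dependencies:
  - python=3.9
  - pip
  - pip:
    - some-pypi-only-package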
I have been using conda for a while now, mostly studying Machine Learning and related subjects. I am a happy user 99.99% of the time. But when one faces challenges like building and installing tensorflow with GPU support on a Mac, for a rather specific/outdated GPU, one can't really rely on conda.
One huge advantage of pip is the built-in ability to install packages system-wide, for example
sudo -H pip install ipython
It actually is smart enough to do this by default if run as the root user, installing to some directory in the global execution path. (/usr/local/bin?)
What can actually be considered an advantage for some use cases is that pip compiles packages (by default). So some packages, for example theano, which are actually optimized upon installation, should not be installed via conda, or you are possibly missing out on this.
Finally, as mentioned, pip is directly linked to Python's package archive (PyPI), whereas conda presumably needs to be told when a new package was uploaded via a new configuration.

Migrating to pip+virtualenv from setuptools

So pip and virtualenv sound wonderful compared to setuptools. Being able to uninstall would be great. But my project is already using setuptools, so how do I migrate? The web sites I've been able to find so far are very vague and general. So here's an anthology of questions after reading the main web sites and trying stuff out:
First of all, are virtualenv and pip supposed to be in a usable state by now? If not, please disregard the rest as the ravings of a madman.
How should virtualenv be installed? I'm not quite ready to believe it's as convoluted as explained elsewhere.
Is there a set of tested instructions for how to install matplotlib in a virtual environment? For some reason it always wants to compile it here instead of just installing a package, and it always ends in failure (even after build-dep which took up 250 MB of disk space). After a whole bunch of warnings it prints src/mplutils.cpp:17: error: ‘vsprintf’ was not declared in this scope.
How does either tool interact with setup.py? pip is supposed to replace easy_install, but it's not clear whether it's a drop-in or more complicated relationship.
Is virtualenv only for development mode, or should the users also install it?
Will the resulting package be installed with the minimum requirements (like the current egg), or will it be installed with sources & binaries for all dependencies plus all the build tools, creating a gigabyte monster in the virtual environment?
Will the users have to modify their $PATH and $PYTHONPATH to run the resulting package if it's installed in a virtual environment?
Do I need to create a script from a text string for virtualenv like in the bad old days?
What is with the #egg=Package URL syntax? That's not part of the standard URL, so why isn't it a separate parameter?
Where is #rev included in the URL? At the end I suppose, but the documentation is not clear about this ("You can also include #rev in the URL").
What is supposed to be understood by using an existing requirements file "as a sort of template for the new file"? This could mean any number of things.
Wow, that's quite a set of questions. Many of them would really deserve their own SO question with more details. I'll do my best:
First of all, are virtualenv and pip supposed to be in a usable state by now?
Yes, although they don't serve everyone's needs. Pip and virtualenv (along with everything else in Python package management) are far from perfect, but they are widely used and depended upon nonetheless.
How should virtualenv be installed? I'm not quite ready to believe it's as convoluted as explained elsewhere.
The answer you link is complex because it is trying to avoid making any changes at all to your global Python installation and install everything in ~/.local instead. This has some advantages, but is more complex to set up. It's also installing virtualenvwrapper, which is a set of convenience bash scripts for working with virtualenv, but is not necessary for using virtualenv.
If you are on Ubuntu, aptitude install python-setuptools followed by easy_install virtualenv should get you a working virtualenv installation without doing any damage to your global python environment (unless you also had the Ubuntu virtualenv package installed, which I don't recommend as it will likely be an old version).
Is there a set of tested instructions for how to install matplotlib in a virtual environment? For some reason it always wants to compile it here instead of just installing a package, and it always ends in failure (even after build-dep which took up 250 MB of disk space). After a whole bunch of warnings it prints src/mplutils.cpp:17: error: ‘vsprintf’ was not declared in this scope.
It "always wants to compile" because pip, by design, installs only from source, it doesn't install pre-compiled binaries. This is a controversial choice, and is probably the primary reason why pip has seen widest adoption among Python web developers, who use more pure-Python packages and commonly develop and deploy in POSIX environments where a working compilation chain is standard.
The reason for the design choice is that providing precompiled binaries has a combinatorial explosion problem with different platforms and build architectures (including python version, UCS-2 vs UCS-4 python builds, 32 vs 64-bit...). The way easy_install finds the right binary package on PyPI sort of works, most of the time, but doesn't account for all these factors and can break. So pip just avoids that issue altogether (replacing it with a requirement that you have a working compilation environment).
In many cases, packages that require C compilation also have a slower-moving release schedule and it's acceptable to simply install OS packages for them instead. This doesn't allow working with different versions of them in different virtualenvs, though.
I don't know what's causing your compilation error, it works for me (on Ubuntu 10.10) with this series of commands:
virtualenv --no-site-packages tmp
. tmp/bin/activate
pip install numpy
pip install -f http://downloads.sourceforge.net/project/matplotlib/matplotlib/matplotlib-1.0.1/matplotlib-1.0.1.tar.gz matplotlib
The "-f" link is necessary to get the most recent version, due to matplotlib's unusual download URLs on PyPI.
How does either tool interact with setup.py? pip is supposed to replace easy_install, but it's not clear whether it's a drop-in or more complicated relationship.
The setup.py file is a convention of distutils, the Python standard library's package management "solution." distutils alone is missing some key features, and setuptools is a widely-used third-party package that "embraces and extends" distutils to provide some additional features. setuptools also uses setup.py. easy_install is the installer bundled with setuptools. Setuptools development stalled for several years, and distribute was a fork of setuptools to fix some longstanding bugs. Eventually the fork was resolved with a merge of distribute back into setuptools, and setuptools development is now active again (with a new maintainer).
distutils2 was a mostly-rewritten new version of distutils that attempted to incorporate the best ideas from setuptools/distribute, and was supposed to become part of the Python standard library. Unfortunately this effort failed, so for the time being setuptools remains the de facto standard for Python packaging.
Pip replaces easy_install, but it does not replace setuptools; it requires setuptools and builds on top of it. Thus it also uses setup.py.
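In practice the correspondence looks roughly like this (SomePackage is a placeholder), with setup.py still doing the work underneath in both cases:
# instead of: easy_install SomePackage
pip install SomePackage
# instead of: python setup.py develop  (editable/development install)
pip install -e .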
Is virtualenv only for development mode, or should the users also install it?
There's no single right answer to that; it can be used either way. In the end it's really your users' choice, and your software ideally should be installable inside or outside of a virtualenv; though you might choose to document and emphasize one approach or the other. It depends very much on who your users are and what environments they are likely to need to install your software into.
Will the resulting package be installed with the minimum requirements (like the current egg), or will it be installed with sources & binaries for all dependencies plus all the build tools, creating a gigabyte monster in the virtual environment?
If a package that requires compilation is installed via pip, it will need to be compiled from source. That also applies to any dependencies that require compilation.
This is unrelated to the question of whether you use a virtualenv. easy_install is available by default in a virtualenv and works just fine there. It can install pre-compiled binary eggs, just like it does outside of a virtualenv.
Will the users have to modify their $PATH and $PYTHONPATH to run the resulting package if it's installed in a virtual environment?
In order to use anything installed in a virtualenv, you need to use the python binary in the virtualenv's bin/ directory (or another script installed into the virtualenv that references this binary). The most common way to do this is to use the virtualenv's activate or activate.bat script to temporarily modify the shell PATH so the virtualenv's bin/ directory is first. Modifying PYTHONPATH is not generally useful or necessary with virtualenv.
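As a quick illustration (the path is a placeholder):
source /path/to/venv/bin/activate
which python          # now resolves to /path/to/venv/bin/python
deactivate            # restores the previous PATH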
Do I need to create a script from a text string for virtualenv like in the bad old days?
No.
What is with the #egg=Package URL syntax? That's not part of the standard URL, so why isn't it a separate parameter?
The "#egg=projectname-version" URL fragment hack was first introduced by setuptools and easy_install. Since easy_install scrapes links from the web to find candidate distributions to install for a given package name and version, this hack allowed package authors to add links on PyPI that easy_install could understand, even if they didn't use easy_install's standard naming conventions for their files.
Where is #rev included in the URL? At the end I suppose, but the documentation is not clear about this ("You can also include #rev in the URL").
A couple sentences after that quoted fragment there is a link to "read the requirements file format to learn about other features." The #rev feature is fully documented and demonstrated there.
What is supposed to be understood by using an existing requirements file "as a sort of template for the new file"? This could mean any number of things.
The very next sentence says "it will keep the packages listed in devel-req.txt in order and preserve comments." I'm not sure what would be a better concise description.
I can't answer all your questions, but hopefully the following helps.
Both virtualenv and pip are very usable. Many Python devs use these everyday.
Since you have a working easy_install, the easiest way to install both is the following:
easy_install pip
easy_install virtualenv
Once you have virtualenv, just type virtualenv yourEnvName and you'll get your new python virtual environment in a directory named yourEnvName.
From there, it's as easy as source yourEnvName/bin/activate and the virtual python interpreter will be the active one. I know nothing about matplotlib, but following the installation instructions should work out ok unless there are weird hard-coded path issues.
If you can install something via easy_install you can usually install it via pip. I haven't found anything that easy_install could do that pip couldn't.
I wouldn't count on users being able to install virtualenv (it depends on who your users are). Technically, a virtual python interpreter can be treated as a real one for most cases. Its main use is to avoid cluttering up the real interpreter's site-packages, and to handle the case where you have two libraries/apps that require different and incompatible versions of the same library.
If you or a user install something in a virtualenv, it won't be available in other virtualenvs or in the system Python interpreter. You'll need to use the source /path/to/yourvirtualenv/bin/activate command to switch to the virtual environment you installed the library in.
What they mean by "as a sort of template for the new file" is that the pip freeze -r devel-req.txt > stable-req.txt command will create a new file stable-req.txt based on the existing file devel-req.txt. The only difference will be anything installed not already specified in the existing file will be in the new file.
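In command form, that is:
pip freeze -r devel-req.txt > stable-req.txt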
