What exactly is aggressive_update_packages in Anaconda?

I've recently started using the Anaconda environment, and in the config list I came across an option called aggressive_update_packages. It is not clear to me what happens when I add a new package to it. I couldn't find any satisfying description of this option (only a little bit here), so I can only assume what it does: I think it keeps the listed packages automatically updated. However, I'm not sure how it works, which is what I'm asking. I'm actively developing a package specifically for the Anaconda environment, and it would be a nice feature for others to have it kept automatically updated.

Why it exists
The default settings for the aggressive_update_packages set are provided mostly for security purposes. Because Conda ships many native libraries with it, some of which provide core functionality for securely communicating on the internet, there is an implicit responsibility to make some effort to patch software that is a frequent surface for generic cyberattacks.
Try searching for any of the default packages (e.g., openssl) in NIST's National Vulnerability Database and you'll quickly get a sense of why it might be crucial to keep those packages patched. Running an old SSL protocol or having an outdated list of certificate authorities leaves one generically vulnerable.
How it works
Essentially, whenever one indicates a willingness to mutate an environment (e.g., conda (install|update|remove)), Conda will check for and request to install the latest versions of the packages in the set. Not much more to it than that. It does not autoupdate packages. If the user never tries to mutate the environment, the package will never be updated.
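As a minimal illustration (assuming the default configuration; numpy is just an arbitrary stand-in for whatever you actually install):

# any environment-mutating command triggers the check
conda install numpy
# besides numpy, the solve will also request the latest available
# ca-certificates, certifi, and openssl, even though you didn't ask for them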
Repurposing functionality
OP suggests using this as a way to keep a certain package automatically updated. It's possible that, if your users already frequently mutate their envs, the package will get updated frequently via this setting. However, the setting is not something the package can manipulate on its own (manipulating anything other than install files is expressly forbidden). Users would have to manually edit their settings to add the package to the list.
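Concretely, each user would have to opt in on their own machine, along the lines of (mypkg being a hypothetical package name):

# run by the user, not by the package itself
conda config --add aggressive_update_packages mypkg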
For users who are reproducibility-minded, I would actively discourage changing global settings to add non-security-essential packages to the aggressive_update_packages list.

According to conda release notes
aggressive updates: Conda now supports an aggressive_update_packages configuration parameter that holds a sequence of MatchSpec strings, in addition to the pinned_packages configuration parameter. Currently, the default value contains the packages ca-certificates, certifi, and openssl. When manipulating configuration with the conda config command, use of the --system and --env flags will be especially helpful here. For example:
conda config --add aggressive_update_packages defaults::pyopenssl --system
would ensure that, system-wide, solves on all environments enforce using the latest version of pyopenssl from the defaults channel.
conda config --add pinned_packages Python=2.7 --env
would lock all solves for the current active environment to Python versions matching 2.7.*.
According to this issue - https://github.com/conda/conda/issues/7419 - this might mean that any newly created env will, by default, add/update the packages listed in the aggressive_update_packages configuration.
How to get the variable's current value? Run: conda config --show aggressive_update_packages
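On a stock installation, that should print something like the following (the default set; your output may differ if the setting has been customized):

aggressive_update_packages:
  - ca-certificates
  - certifi
  - openssl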

Related

How to download a full package dependency chain using conda?

I have a requirements file that describes the conda packages I need. These are required dependencies in a script I want to run. The script needs to run on a machine that is not connected to a network or internet. Therefore, I have decided to use download-only as explained here.
This is not working for me. When I choose just one dependency to download, I get an error. This is my command-line statement:
conda install --download-only grpcio=1.35.0
This is the error I get:
CondaExitZero: Package caches prepared. UnlinkLinkTransaction cancelled with --download-only option
Apparently, in this case download-only helps me cancel the download... That is not what I want. I want to download the full dependency chain in order to use it in an offline environment.
How do I do that?
Please file a bug report on the Conda repository, since this looks like something that shouldn't be happening. I suspect the issue is that, for download-only transactions, Conda still determines that a package might need to be removed in order to eventually install the requested package. That shouldn't be necessary, so I'd call it a bug.
In the meantime, try creating a new environment and then using the --download-only flag with that environment activated. Alternatively, this answer shows how to download raw packages for a full solve, ignoring any existing package cache.
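That workaround might look like this (the environment name is arbitrary; a fresh environment has nothing to unlink, so the transaction shouldn't get cancelled):

# create and activate a fresh, empty environment
conda create -n dl_only --yes
conda activate dl_only
# now run the download-only install from inside it
conda install --download-only grpcio=1.35.0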

How can the channel order from a yaml conda environment be made persistent?

I am creating a conda environment from myenv.yml. The contents include
channels:
- mychan
- defaults
- conda-forge
dependencies:
- matplotlib
- pandas
- myweirdpy
- exoticcondaforgelib
The channel order is important, but I can't tell from the scant docs I've found whether conda env create with a channels section implies preferred channel order, and strict channel order isn't an option on that command. It seems a lot of people use conda create to create an empty environment, then use conda config to enforce the strict order, then use conda env update to apply the YAML part. There is an example in one of the answers at Set the channel_priority in Conda environment.yaml. But the behavior is still not clear to me. For instance, this answer has channel priorities in the .condarc and in the YAML. How is this resolved? Would it do the right thing if I set channel_priority strict using conda config?
Ideally, I also want the order of the dependencies respected if I update one library or install something new with dependencies, so I guess I don't mind setting up .condarc for posterity. The ideal way to make this simple for users would be a way to set channels in the YAML and have them be strictly respected and used to generate an environment-specific .condarc.
There's not a simple, transparent mechanism to accomplish this. An environment-local .condarc file is a good strategy and likely the simplest for users to follow. Otherwise, you would have them run conda config --env commands with the environment activated, which effectively just creates the same .condarc.
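For the channels in the question, that would look something like this (note that --add prepends, so add channels from lowest to highest priority):

# with the environment activated; these write to the env's own .condarc
conda config --env --add channels conda-forge
conda config --env --add channels defaults
conda config --env --add channels mychan
conda config --env --set channel_priority strict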
An alternative hacky approach might be to create a custom package that sets such configuration options, but that has shortcomings and some might regard it as taboo. That is, a post-link script could potentially manipulate the environment's local .condarc to conform to your desired settings. However, that wouldn't prevent users from changing the settings manually, so even having the custom package installed couldn't guarantee that the environment is in the state the package was intended to signify.
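A minimal sketch of such a post-link script, assuming a hypothetical configuration-only package built with a post-link.sh (conda sets $PREFIX to the environment being installed into):

#!/bin/bash
# post-link.sh: runs automatically when the package is installed
# overwrite the environment-local .condarc with the desired settings
cat > "$PREFIX/.condarc" <<'EOF'
channel_priority: strict
channels:
  - mychan
  - defaults
  - conda-forge
EOF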

Can conda perform an install whilst minimally updating dependencies?

The conda install man page says
Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages.
So first, does this also apply to dependencies that it determines it needs to install or update? Assuming that the answer is "yes"; can that behaviour be changed? For example when working with legacy code it can be useful to update dependencies as little as possible, or install the oldest version of a dependency that will still work. Is there some way to get the conda dependency resolver to figure this out automatically, or does one have to resort to manually figuring out the dependency updates in this case?
Or maybe I am wrong entirely and this is the default behaviour? The dependency resolution rules are not clear to me from the documentation.
Conda's Two-Stage Solving
Conda first tries to find a version of the requested package that could be installed without changing any installed packages (a frozen solve). If that fails, it simply re-solves the entire environment from scratch with the new constraint added (a full solve). There is no in-between (e.g., minimizing the number of packages updated). Perhaps this will change in the future, but this has been the state of things for versions 4.6[?]-4.12.
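That said, there are flags that roughly correspond to the two stages, if you want to nudge the solver one way or the other (somepkg is a placeholder; check conda install --help for availability in your version):

# keep already-installed dependencies untouched if at all possible
conda install --freeze-installed somepkg
# or the opposite: allow dependencies to be updated freely
conda install --update-deps somepkg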
Mamba
If one needs to manually work things out, I'd strongly suggest looking into Mamba. In addition to being a compiled (fast!) drop-in replacement for conda, the mamba repoquery tool could be helpful for identifying the constraints that are problematic. It has a depends subcommand for identifying dependencies and a whoneeds subcommand for reverse dependencies.
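For instance (numpy here is just an arbitrary example package):

# what does numpy depend on?
mamba repoquery depends numpy
# what in the environment depends on numpy (reverse dependencies)?
mamba repoquery whoneeds numpy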
Suggested Workflow
Were I working with legacy code, I might try defining a YAML for the environment (env.yaml) and placing upper bounds on crucial packages. If I needed a new package, I would dry-run adding it (e.g., mamba install -d somepkg) to see how it affects the environment, figure out what constraint (again, an upper bound), if any, it needs, add it to the YAML, and then actually install it with mamba env update -f env.yaml.
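A sketch of that workflow, with hypothetical package names and bounds:

# env.yaml with upper bounds on the crucial legacy packages
cat > env.yaml <<'EOF'
name: legacy
channels:
  - defaults
dependencies:
  - python=3.8
  - numpy<1.20
EOF
mamba env create -f env.yaml
# later: dry-run a new package to preview the changes it would force
mamba install -n legacy -d somepkg
# once an acceptable constraint is found, add it to env.yaml and apply
mamba env update -n legacy -f env.yaml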

How to manage conda shared environment and local environment?

Say the team has a shared conda environment called 'env1' located at this directory:
/home/share/conda/envs/env1/
While my working directory is at:
/home/my_name/...
I do not have write permission for any files under /home/share/, just read permission.
Now I want to use the 'env1' environment with one additional library installed (a library that is not already present in /home/share/conda/envs/env1/).
How can I achieve that without re-installing everything from env1 into my own directory? Also, I have to use 'conda install' for that additional package.
I feel that this has something to do with 'conda install --use-local' for handling such a shared-local-combined environment situation, but I'm not sure about the exact procedure.
Thanks for any help and explanation!
It looks like the --use-local flag only controls whether conda should install a locally built package, i.e., one that is not distributed through the usual channels (or that you want installed instead of the channel version). So I don't think this directly relates to your case.
Perhaps one solution is to clone this shared environment into a new one under your own account, where you have write permissions, then conda install the new package you need in that environment. If you are concerned that this takes up space or duplicates the packages, I recommend reading this answer, which explains that conda tries not to waste space by using hardlinks, so most likely the packages will not actually be re-installed but rather reused in the new environment.
Finally, I'd personally create a new environment even if just for clarity's sake. If I later came back to this project, I'd like to know that it requires your "base/shared" env plus an additional package. If it were named identically to the shared one, that difference wouldn't be so obvious.
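In practice, the clone-then-install approach is just (environment and package names are illustrative):

# clone the read-only shared environment into your own env directory
conda create --name env1_local --clone /home/share/conda/envs/env1
conda activate env1_local
# you have write permission in the clone, so this now works
conda install thelibrary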

Virtualenv for multiple users or groups

I'm setting up a new system for a group of Python rookies to do a specific kind of scientific work using Python. It's got 2 different pythons on it (32 and 64 bit), and I want to install a set of common modules that users on the system will use.
(a) Some modules work out of the box for both pythons,
(b) some compile code and install differently depending on the python, and
(c) some don't work at all on certain pythons.
I've been told that virtualenv (+ wrapper) is good for this type of situation, but it's not clear to me how.
Can I use virtualenv to set up sandboxed modules across multiple user accounts without having to install each module for each user?
Can I use virtualenv to save me some time for case (a), i.e. install a module, but have all pythons see it?
I like the idea of isolating environments, and then having them just type "workon science32", "workon science64", depending on the issues with case (c).
Any advice is appreciated.
With virtualenv, you can allow each environment to use globally installed system packages simply by omitting the --no-site-packages option. This is the default behavior.
If you want each environment to install all of its own packages, then use --no-site-packages and you will get a bare Python installation in which to install your own modules. This is useful when you do not want packages to conflict with system packages. I normally do this just to keep system upgrades from interfering with working code.
I would be careful about thinking of these as sandboxes, because they are only partially isolated. The paths to the Python binaries and libraries are modified to use the environment, but really that is all that is going on. Virtualenv does nothing to prevent running code from doing destructive things to the system. The best way to sandbox is to set Linux/Unix permissions properly and give users their own accounts.
EDIT For Version 1.7+
The default for 1.7 is to not include system packages, so if you want the behavior of using system packages, use the --system-site-packages option. Check the docs for more info.
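A setup matching the question might look like this (interpreter paths are hypothetical; mkvirtualenv and workon come from virtualenvwrapper):

# one environment per interpreter; share system packages for case (a)
mkvirtualenv --system-site-packages --python=/opt/python32/bin/python science32
mkvirtualenv --system-site-packages --python=/opt/python64/bin/python science64
# users then switch between them with:
workon science32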
