How do I get snakemake to activate a conda environment that already exists in my environment list?
I know you can use --use-conda with a .yaml environment file, but that seems to generate a new environment, which is just annoying when the environment already exists. Any help with this would be much appreciated.
I have tried using:
conda:
    path/to/some/yamlFile
but it just returns "command not found" errors for packages in the environment.
It is possible. It is essentially an environment configuration issue. You need to call bash in the Snakemake rules and load the conda-init'd bash profile there. The example below works for me:
rule test_conda:
    shell:
        """
        bash -c '
            . $HOME/.bashrc  # if not loaded automatically
            conda activate base
            conda deactivate'
        """
In addition, --use-conda is not necessary in this case at all.
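As a rough, hypothetical sketch of wrapping that same trick around a real command (the environment name my-existing-env, the script path, and the file names are placeholders, not from the original answer):

rule process:
    input:
        "data/input.txt"
    output:
        "results/output.txt"
    shell:
        """
        bash -c '
            . $HOME/.bashrc                  # load the conda shell hooks if not loaded automatically
            conda activate my-existing-env   # an environment that already exists
            python scripts/process.py {input} {output}
            conda deactivate'
        """

The only Snakemake-specific part is the shell directive; everything inside the bash -c quotes is ordinary shell.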
Following up on the answer by liagy: since Snakemake runs shell commands in strict bash mode (with the set -u flag), conda activate or conda deactivate may throw an "unbound variable" error related to the conda environment. I ended up editing the parent conda.sh file, which contains the activate function. Doing so temporarily disables the -u flag while activating or deactivating conda environments but preserves bash strict mode for the rest of the Snakemake workflow.
Here is what I did:
Edit (after backing up the original file) ~/anaconda3/etc/profile.d/conda.sh and add the following, starting at the first line inside the __conda_activate() block:
__conda_activate() {
    if [[ "$-" =~ .*u.* ]]; then
        local bash_set_u
        bash_set_u="on"
        ## temporarily disable u flag
        ## allow unbound variables from conda env
        ## during activate/deactivate commands in
        ## subshell else script will fail with set -u flag
        ## https://github.com/conda/conda/issues/8186#issuecomment-532874667
        set +u
    else
        local bash_set_u
        bash_set_u="off"
    fi
    # ... rest of code from the original script
Also add the following code at the end of the __conda_activate() block to re-enable bash strict mode, but only if it was set prior to running the conda activate/deactivate functions.
    ## re-enable set -u if it was enabled prior to
    ## conda activate/deactivate operation
    if [[ "${bash_set_u}" == "on" ]]; then
        set -u
    fi
}
Then, in the Snakefile, you can use shell commands like the following to manage existing conda environments.
shell:"""
## check current set flags
echo "$-"
## switch conda env
source ~/anaconda3/etc/profile.d/conda.sh && conda activate r-reticulate
## Confirm that set flags are same as prior to conda activate command
echo "$-"
## switch conda env again
conda activate dev
echo "$-"
which R
samtools --version
## revert to previous: r-reticulate
conda deactivate
"""
You do not need to add the above patch to the __conda_deactivate function, as it sources the activate script.
PS: Editing ~/anaconda3/etc/profile.d/conda.sh is not ideal. Always back up the original and edited files. Updating conda will most likely overwrite these changes.
Prefer Snakemake-managed environments
This is an old answer, from before Snakemake added a feature to allow user-managed environments. Other answers cover the newer functionality. Nevertheless, I am retaining this answer here because I believe it adds perspective on the problem and on why this feature is still discouraged. Specifically, from the documentation:
"Importantly, one should be aware that this can hamper reproducibility, because the workflow then relies on this environment to be present in exactly the same way on any new system where the workflow is executed. Essentially, you will have to take care of this manually in such a case. Therefore, the approach using environment definition files described above is highly recommended and preferred." [emphasis in the original]
(Mostly) Original Answer
This wasn't previously possible, and I'd still argue that was mostly a good thing. Snakemake having sole ownership of the environment helps improve reproducibility by requiring one to update the YAML instead of directly manipulating the environment with conda (install|update|remove). Note that the practice of updating a YAML and recreating the environment is a Conda best practice when mixing in Pip, and it definitely doesn't hurt to adopt it generally.
Conda does a lot of hardlinking, so I wouldn't sweat the duplication too much - it's mostly superficial. Moreover, if you create a YAML from the existing environment you wish to use (conda env export > env.yaml) and give that to Snakemake, then all the identical packages that you already have downloaded will be used in the environment that Snakemake creates.
If space really is such a tight resource, you can simply not use Snakemake's --use-conda flag and instead activate your named envs as part of the shell command or script you provide. I would be very careful not to manipulate those envs, or at least be very diligent about tracking changes made to them. Perhaps consider tracking the output of conda env export > env.yaml under version control and putting that YAML as an input file in the Snakemake rules that activate the environment, as sketched below. This way Snakemake can detect that the environment has mutated and that the downstream files are potentially outdated.
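A minimal sketch of that idea, assuming a named environment my-env and hypothetical file/script names (none of these come from the original question):

rule analyze:
    input:
        data="table.txt",
        env="env.yaml"   # produced by `conda env export > env.yaml` and kept under version control
    output:
        "results/summary.txt"
    shell:
        """
        source ~/anaconda3/etc/profile.d/conda.sh
        conda activate my-env
        python scripts/analyze.py {input.data} > {output}
        """

Because env.yaml is declared as an input, re-exporting it after the environment changes makes Snakemake treat the rule's outputs as outdated.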
This question is still trending on Google, so an update:
Since snakemake=6.14.0 (2022-01-26), using an existing, named conda environment is a supported feature.
You simply put the name of the environment, some-env-name, into the rule's conda directive (instead of the .yaml file) and use snakemake --use-conda:
rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    conda:
        "some-env-name"
    script:
        "scripts/plot-stuff.R"
Documentation: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-already-existing-named-conda-environments
Note: It is recommended to use this feature sparingly and to prefer specifying an environment.yaml file instead, to increase reproducibility.
Related
I have to connect to a server where my user has access to a small partition at /home/users/user_name, where I have a limited disk quota, and a bigger partition at /big_partition/users/user.
After logging into that server, I start out in /home/users/user_name. After that, I do the following steps.
cd /big_partition/users/user
conda create --prefix=envs python=3.6
On the 4th line it says "Package plan for installation in environment /big_partition/users/user/envs:", which is ok.
I press y, and now I am getting the following message:
OSError: [Errno 122] Disk quota exceeded: '/home/users/user_name/.conda/envs/.pkgs/python-3.6.2-0/lib/python3.6/unittest/result.py'
Can anyone help me understand how I can move the .conda folder from /home/users/user_name to /big_partition/users/user when I create this environment?
Configure Environment and Package Default Locations
I'd guess that, despite your efforts to put your environments on the large partition, there is still a default user-level package cache and that is filling up the home partition. At minimum, set up a new package cache and a default environments directory on the large partition:
# create a new pkgs_dirs (wherever, doesn't have to be hidden)
mkdir -p /big_partition/users/user/.conda/pkgs
# add it to Conda as your default
conda config --add pkgs_dirs /big_partition/users/user/.conda/pkgs
# create a new envs_dirs (again wherever)
mkdir -p /big_partition/users/user/.conda/envs
# add it to Conda as your default
conda config --add envs_dirs /big_partition/users/user/.conda/envs
Now you don't have to fuss around with using the --prefix flag any more - your named environments (conda create -n foo) will by default be created inside this directory and you can activate by name instead of directory (conda activate foo).
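As a quick sanity check (the environment name foo is just a placeholder), the new defaults should now be picked up and named environments should land on the big partition:

# confirm the new defaults
conda config --show pkgs_dirs
conda config --show envs_dirs

# create and activate a named environment; it should be created under
# /big_partition/users/user/.conda/envs/foo
conda create -n foo python=3.6
conda activate foo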
Transferring Previous Environments and Package Cache
Unfortunately, there's not a great way to move Conda environments across filesystems without destroying the hardlinks. Instead, you'll need to recreate your environments. Since you may or may not want to bother with this, I'm only going to outline it. I can elaborate if needed.
Archive environments. Use conda env export -n foo > foo.yaml (One per environment.)
Move package cache. Copy contents of old package cache (/home/users/user_name/.conda/envs/.pkgs/) to new package cache.
Recreate environments. Use conda env create -n foo -f foo.yaml.
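A minimal bash sketch of those three steps, using the paths from the question and a placeholder environment name foo:

# 1. archive each environment (repeat per environment)
conda env export -n foo > foo.yaml

# 2. copy the old package cache into the new one on the big partition
cp -a /home/users/user_name/.conda/envs/.pkgs/. /big_partition/users/user/.conda/pkgs/

# 3. recreate the environment from its YAML; with the envs_dirs setting above,
#    it will be created on the big partition
conda env create -n foo -f foo.yaml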
Again, you could just skip this altogether. This is mainly if you want to be very thorough about transferring and not having to redownload stuff for environments you already created.
After this you can delete some of the stuff under the old ~/.conda/envs/.pkgs folder.
I found the solution. All I needed to do was export CONDA_ENVS_PATH with the path where I want the .conda environments to be:
export CONDA_ENVS_PATH=.
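To make this persist across logins, one option (my addition, not part of the original answer; the path comes from the question) is to set it in ~/.bashrc, optionally redirecting the package cache to the big partition as well:

# keep environments (and, optionally, the package cache) off the small home partition
export CONDA_ENVS_PATH=/big_partition/users/user/envs
export CONDA_PKGS_DIRS=/big_partition/users/user/pkgs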
I have a virtual environment in a folder which I have not yet activated. Running conda env list will not list this environment.
However, after I have activated the environment with Conda for the first time, now every time I run conda env list this environment will be included.
Is there a way to ignore the environment in conda env list, without removing it (since this removes the whole folder)?
I want to keep the folder without removing it and without Conda listing it every time.
AFAIK, there is no configuration or other out-of-the-box option to designate specific environments to be ignored by the conda env list command. However, if we look under the hood at how Conda generates this list, we can at least implement a workaround.
Conda User Environment Tracking
Conda tracks environments in two ways:
Environments located in any of the envs_dirs directories are automatically discovered. You can check conda config --show envs_dirs to see which directories that includes. Such environments will not be ignorable in conda env list without altering the internals of how conda-env works (i.e., you'd have to fork the conda code and add new functionality).
Any time a user activates an environment, it gets added to a user-specific tracking file, namely,
~/.conda/environments.txt
where ~ is the user home directory. Purging this file of environments you wish to hide should remove the environment from the conda env list output, at least until it is activated again.
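For example, to hide a single environment (the path below is a placeholder), you can delete its line from the tracking file by hand, or with GNU sed using an alternate delimiter so the slashes in the path need no escaping:

# drop one specific environment from Conda's user tracking file
sed -i '\#/path/to/hidden-env#d' ~/.conda/environments.txt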
Automated Workaround Example
If you'd like a more automated purging, you could include a line in your shell initialization file (e.g., .bash_profile) to remove entries from this file using something like sed and a regex to match the environments you wish to omit.
As a concrete example of this, I frequently encounter this problem as a Snakemake user. Snakemake automatically generates Conda environments and uses them to run code in reproducible(-ish) contexts. These environments all get generated under a .snakemake/ directory and eventually start to dominate my conda env list output. This is an absolute bother, since I never intend to manually activate any of these environments; plus, they are all named by hashes, so it is practically impossible to recognize their contents by prefix.
To automatically purge these, I can add the following to my .bashrc or .bash_profile:
sed -i '/\.snakemake/d' ~/.conda/environments.txt
This will still show these auto-generated environments transiently, but they'll get purged every time a new shell launches. Hopefully such transient cases aren't a major bother; otherwise, I imagine more creative solutions are also workable, e.g., triggering the purge operation whenever the file is altered.
When I run the following command:
conda env create -f virtual_platform_mac.yml
I get this error:
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- pytables==3.4.2=np113py35_0
- h5py==2.7.0=np113py35_0
- anaconda==custom=py35_0
How can I solve this?
I am working on Mac OS X.
Conda v4.7 dropped a branch of the Anaconda Cloud repository called the free channel for the sake of improving solving performance. Unfortunately, this includes many older packages that never got ported to the repository branches that were retained. The requirements failing here are affected by this.
Restore free Channel Searching
Conda provides a means to restore access to this part of the repository through the restore_free_channel configuration option. You can verify that this is the issue by seeing that
conda search pytables=3.4.2[build=np113py35_0]
fails, whereas
CONDA_RESTORE_FREE_CHANNEL=1 conda search pytables=3.4.2[build=np113py35_0]
successfully finds the package, and similarly for the others.
Option 1: Permanent Setting
If you expect to frequently need older packages, then you can globally set the option and then proceed with installing:
conda config --set restore_free_channel true
conda env create -f virtual_platform_mac.yml
Option 2: Temporary Setting
As with all Conda configuration options, you can also use the corresponding environment variable to temporarily restore access just for the command:
Unix/Linux
CONDA_RESTORE_FREE_CHANNEL=1 conda env create -f virtual_platform_mac.yml
Windows
SET CONDA_RESTORE_FREE_CHANNEL=1
conda env create -f virtual_platform_mac.yaml
(Yes, I realize the cognitive dissonance of a ..._mac.yaml, but Windows users need help too.)
Including Channel Manually
One can also manually include the channel as one to be searched:
conda search -c free pytables=3.4.2[build=np113py35_0]
Note that this approach only uses the free channel for this particular search; future searches or changes to the env will not search the channel.
Pro-Tip: Env-specific Settings
If you have a particular env that you always want to have access to the free channel but you don't want to set this option globally, you can instead set the configuration option only for the environment.
conda activate my_env
conda config --env --set restore_free_channel true
A similar effect can be accomplished by setting and unsetting the CONDA_RESTORE_FREE_CHANNEL variable in scripts placed in the etc/conda/activate.d and etc/conda/deactivate.d folders, respectively. See the documentation for an example.
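A minimal sketch of that variant (the script name restore_free.sh is arbitrary; run this while my_env is active so that $CONDA_PREFIX points at it):

mkdir -p "$CONDA_PREFIX/etc/conda/activate.d" "$CONDA_PREFIX/etc/conda/deactivate.d"

# set the variable whenever the env is activated
cat > "$CONDA_PREFIX/etc/conda/activate.d/restore_free.sh" <<'EOF'
export CONDA_RESTORE_FREE_CHANNEL=1
EOF

# unset it again on deactivation
cat > "$CONDA_PREFIX/etc/conda/deactivate.d/restore_free.sh" <<'EOF'
unset CONDA_RESTORE_FREE_CHANNEL
EOF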
Another solution might be explained here. Basically, if you import an environment.yml file to a different OS (e.g., from macOS to Windows) you will get build errors.
The solution is to use the --no-builds flag, but it does not guarantee that the environment.yml will actually be compatible. Some libraries, e.g. libgfortran, are not found in the Windows channels for Anaconda (see here).
I would use
CONDA_RESTORE_FREE_CHANNEL=1 conda env create -f virtual_platform_mac.yml
to keep using outdated/older packages.
I recently installed the Anaconda version of Python. Now when I type python into the terminal it opens the Anaconda distribution rather than the default distribution. How do I get it to use the default version for the command python on Linux (Ubuntu 12.04 (Precise Pangolin))?
Anaconda adds the path to your .bashrc, so it is found first. You can add the path to your default Python instance to .bashrc or remove the path to Anaconda if you don't want to use it.
You can also use the full path /usr/bin/python in Bash to use the default Python interpreter.
If you leave your .bashrc file as is, any command you run using python will use the Anaconda interpreter. If you want, you could also use an alias for each interpreter.
You will see something like export PATH=$HOME/anaconda/bin:$PATH in your .bashrc file.
So basically, if you want to use Anaconda as your main everyday interpreter, use the full path to your default Python or create an alias. If you want it the other way around, remove the export PATH=... line from .bashrc and use the full path to the Anaconda Python interpreter.
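For example (the alias names are purely illustrative; adjust the Anaconda path to your install), you could keep both interpreters handy with aliases in ~/.bashrc:

# the system interpreter, regardless of what `python` currently resolves to
alias syspython='/usr/bin/python'
# the Anaconda interpreter, explicitly
alias anapython="$HOME/anaconda/bin/python"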
Having tried all the suggestions so far, I think modifying the export statement in file ~/.bashrc, as Piotr Dobrogost seems to suggest, is the best option considering the following:
If you remove the whole statement, you have to use full paths for Conda binaries.
With Conda 4.4.10, links in the directory anaconda/bin/ point to binaries in the same directory, not to the system ones in /usr/bin.
Using this approach, you get the system programs for everything previously included in $PATH and also the ones specific to Anaconda, without using full paths.
So in file ~/.bashrc instead of
# Added by the Anaconda3 4.3.0 installer
export PATH="/home/user/anaconda3/bin:$PATH"
one would use
export PATH="$PATH:/home/user/anaconda3/bin"
I faced the same issue and you can do the following.
Go into your .bashrc file and you will find a similar sort of line:
export PATH=~/anaconda3/bin:$PATH
You comment it out and instead type out:
alias pyconda='~/anaconda3/bin/python3'
Or whatever your path is. This worked out for me.
As of 2020, Conda adds a more complicated block of code at the bottom of your .bash_profile file that looks something like this:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/Users/spacetyper/opt/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/Users/spacetyper/opt/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/Users/spacetyper/opt/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/Users/spacetyper/opt/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
To use the default Python install by default: Simply move this section of code to the very top of your .bash_profile file.
To give yourself the option of using the Conda installed Python: Add this line below the Conda code block above.
alias pyconda="/Users/spacetyper/opt/miniconda3/bin/python3"
Now you should be able to call the system Python install with python and the Conda install with pyconda.
In 2020, as @spacetyper mentioned, it acted differently. I found a handy solution for that in this question: How do I prevent Conda from activating the base environment by default?
To disable automatic base activation:
conda config --set auto_activate_base false
It'll create a .condarc file in the home directory after running the first time.
I found that, though I removed export PATH=.../anaconda3/bin:$PATH, there was still .../anaconda3/envs/py36/bin (my virtual environment in Anaconda) in PATH, and the shell still used Anaconda Python.
So I ran export PATH=/usr/bin:$PATH (/usr/bin is where the system Python resides). Though /usr/bin is already in PATH, this makes it searched before the Anaconda path, so the shell will then use the system Python when you type python, python3.6, pip, pip3, and so on.
You can get back to Anaconda by using an alias as mentioned above, or default to Anaconda again by commenting out export PATH=/usr/bin:$PATH.
There are python, python2, and python2.7 shortcuts in both the /home/username/anaconda/bin/ and /usr/bin/ directories, so you can delete one of them from one folder and let the other folder's version be used.
For example, if you delete the python2 shortcut from the Anaconda directory, python will give you the Anaconda version and python2 the default version in the terminal.
I use Anaconda sparingly to build cross-platform packages, but I don't want to use it as my daily driver for Python. For Anaconda, Ruby, and Node.js projects I've adopted environment sandboxing, which essentially hides functionality behind a function, away from your path, until you specifically need it. I first learned about it from these two GitHub repositories:
https://github.com/benvan/sandboxd
https://github.com/maximbaz/dotfiles
I have a file of sandboxing functions that looks like this:
.zsh/sandboxd.zsh:
#!/bin/zsh
# Based on
# https://github.com/maximbaz/dotfiles/.zsh/sandboxd.zsh
# which was originally adapted from:
# https://github.com/benvan/sandboxd

# Start with an empty list of all sandbox cmd:hook pairs
sandbox_hooks=()

# Deletes all hooks associated with cmd
function sandbox_delete_hooks() {
    local cmd=$1
    for i in "${sandbox_hooks[@]}"; do
        if [[ $i == "${cmd}:"* ]]; then
            local hook=$(echo $i | sed "s/.*://")
            unset -f "$hook"
        fi
    done
}

# Prepares the environment and removes hooks
function sandbox() {
    local cmd=$1
    # NOTE: Use original grep, because aliased grep is using color
    if [[ "$(type $cmd | \grep -o function)" = "function" ]]; then
        (>&2 echo "Lazy-loading '$cmd' for the first time...")
        sandbox_delete_hooks $cmd
        sandbox_init_$cmd
    else
        (>&2 echo "sandbox '$cmd' not found.\nIs 'sandbox_init_$cmd() { ... }' defined and 'sandbox_hook $cmd $cmd' called?")
        return 1
    fi
}

function sandbox_hook() {
    local cmd=$1
    local hook=$2
    #echo "Creating hook ($2) for cmd ($1)"
    sandbox_hooks+=("${cmd}:${hook}")
    eval "$hook(){ sandbox $cmd; $hook \"\$@\" }"
}
.zshrc
In my .zshrc I create my sandbox'd function(s):
sandbox_hook conda conda
This command turns the normal conda executable into:
conda () {
    sandbox conda
    conda "$@"
}
An added bonus of using this technique is that it speeds up shell loading times because sourcing a number of wrapper scripts (e.g. nvm, rvm, etc.) can slow your shell startup time.
It also bugged me that Anaconda installed its Python 3 executable as python by default, which breaks a lot of legacy Python scripts, but that's a separate issue. Using sandboxing like this makes me explicitly aware that I'm using Anaconda's Python instead of the system default.
Anaconda 3 adds more than a simple line in my .bashrc file.
However, it also backs up the original .bashrc file into a .bashrc-anaconda3.bak file.
So my solution was to swap the two.
In my case, when I had
alias python='/usr/bin/python3.6'
in ~/.bashrc, it always called python3.6, both inside and outside of an Anaconda virtual environment.
In this setting, you could still select the Python version via python3 in each virtual environment.
I'm trying to test out creating virtual envs through conda create on OS X. It's my first real foray into virtual envs so I'm still wrapping my mind around how to tool them. My first test was
$ conda create -p /users/me/anaconda/envs/envtest
$ source activate /users/me/anaconda/envs/envtest
But when I go to take it down via source deactivate, I get:
Error: too many arguments.
Some googling seems to indicate that there is some configuration in my .profile file that's affecting this but that file is empty. It will probably help to show my .bash_profile:
[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function
# Added by Canopy installer on 2013-09-12
# VIRTUAL_ENV_DISABLE_PROMPT can be set to '' to make bashprompt show that Canopy is active, otherwise 1
VIRTUAL_ENV_DISABLE_PROMPT=1 source /Users/ibebian/Library/Enthought/Canopy_64bit/User/bin/activate
PYTHONPATH="/Library/Python/2.7/site-packages/:$PYTHONPATH"
export PYTHONPATH
set PATH = "$PATH:/Users/ibebian/Desktop/Postgres.app/Contents/MacOS/bin"
# added by Anaconda 1.8.0 installer
export PATH="/Users/ibebian/anaconda/bin:$PATH"
Any insight here? Much appreciated!
Yes, the problem is the set PATH = "$PATH:/Users/ibebian/Desktop/Postgres.app/Contents/MacOS/bin" line. In bash, set sets the shell's positional parameters ($1, $2, and so on). So when you source deactivate, the script sees those values and thinks it is being called as deactivate PATH = "$PATH:/Users/ibebian/Desktop/Postgres.app/Contents/MacOS/bin", rather than just deactivate.
To assign to a variable, just use
PATH="$PATH:/Users/ibebian/Desktop/Postgres.app/Contents/MacOS/bin"
(note there are no spaces here)
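A quick throwaway bash illustration of why set behaves that way (not part of the fix itself):

# `set` with arguments replaces the shell's positional parameters
set PATH = "/some/dir"
echo "$#"   # prints 3: the words PATH, =, and /some/dir became $1, $2, $3
echo "$1"   # prints PATH

# a plain assignment (no spaces around =) modifies the variable instead
PATH2="/some/dir"
echo "$#"   # still 3, from the earlier `set`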