Snakemake doesn't activate conda environment correctly

Snakemake doesn't activate conda environment correctly - python

I have a Python module modulename installed in a conda environment called myenvname.
My snakemake file consists of one simple rule:
rule checker2:
output:
"tata.txt"
conda:
"myenvname"
script:
"scripts/test2.py"
The contents of the test2.py are the following:
import modulename
with open("tata.txt","w") as _f:
_f.write(modulename.__version__)
When I run the above snakemake file with the command snakemake -j 1 --use-conda --conda-frontend conda I get ModuleNotFoundError, which would imply that there is no modulename in my specified environment. However, when I do the following :
conda activate myenvname
python workflow/scripts/test2.py
... everything works perfectly. I have no idea what's going on.
The full error is pasted below, with some info omitted for privacy.
Traceback (most recent call last):
File "/OMITTED/.snakemake/scripts/tmpheaxuqjn.test2.py", line 13, in <module>
import cnvpytor as cnv
ModuleNotFoundError: No module named 'MODULENAME'
[Thu Nov 17 18:27:22 2022]
Error in rule checker2:
jobid: 0
output: tata.txt
conda-env: MYENVNAME
RuleException:
CalledProcessError in line 12 of /OMITTED/workflow/snakefile:
Command 'source /apps/qiime2/miniconda3/bin/activate 'MYENVNAME'; set -euo pipefail; /OMITTED/.conda/envs/snakemake/bin/python3.1 /OMITTED/.snakemake/scripts/tmpheaxuqjn.test2.py' returned non-zero exit status 1.
File "/OMITTED/workflow/snakefile", line 12, in __rule_checker2
File "/OMITTED/.conda/envs/snakemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /OMITTED/.snakemake/log/2022-11-17T182715.495739.snakemake.log
EDIT:
Typo in script fixed, the typo isn't in the script I'm running so it's not the issue here.
EDIT2:
I've tried two different attempts from comments. All three attempts
are run with the same CLI command snakemake -j 1 --use-conda --conda-frontend conda
Attempt 1
Rule in the snakemake:
rule checker3:
output:
"tata.txt"
conda:
"myenvname"
shell:
"""
conda env list >> {output}
conda list >> {output}
"""
In the output file I had the following (I have lots of environs and packages I've cut out):
# conda environments:
#
...
myenvname * /OMITTED/.conda/envs/myenvname
...
# packages in environment at /OMITTED/.conda/envs/myenvname:
#
# Name Version Build Channel
...
modulename 1.2 dev_0 <develop>
...
This attempt proves that the conda environment is activated and that this environment has modulename.
Attempt 2
Same as running the script, but I've modified the script to include
import time; time.sleep(30); import modulename
So I can snag a runnable script before it's deleted. The script has the following inserted at the start:
######## snakemake preamble start (automatically inserted, do not edit) ########
import sys; sys.path.extend(['/OMITTED/.conda/envs/snakemake/lib/python3.10/site-packages', '/OMITTED/MYWORKINGDIRECTORY/workflow/scripts']); import pickle; snakemake = pickle.loads(####a lot of stuff here###); from snakemake.logging import logger; logger.printshellcmds = False; __real_file__ = __file__; __file__ = '/OMITTED/MYWORKINGDIRECTORY/workflow/scripts/test3.py';
######## snakemake preamble end #########
I have no idea what to do with this information.
Attempt 3
Instead of running script, I've ran a shell command that runs a python script.
rule checker4:
output:
"tata.txt"
conda:
"myenvname"
shell:
"python workflow/scripts/test3.py"
It worked (showed no errors), and when I open "tata.txt" I find "1.2" which is the version of of my module.
Conclusions
The snakemake actually activates proper environment, but the problem is in script part. I have no idea why this is.
There is a similar question here, so this is a duplicate question.

Question is answered. Snakemake actually activates correct environment, but running a python script with the script conflicts with this directive. I don't know if this is a bug in snakemake (version is 6.14.0) or an intentional thing. I've solved the problem by running the python script via shell command with python workflow/scripts/MyScript.py - it's a bit of a problem because I had to include a CLI wrapper that would normally be solved by a snakemake object.

Related

Shell command from Python "env: node: No such file or directory"

I got a Node.js CLI program called meyda installed (Mac OS 10.14) using:
sudo npm install --global meyda
From the Terminal I can call the program and it works as expected; like:
meyda --bs=256 --o=apagodis2.csv DczN6842.wav rms
Now, I want to call it from inside a python script (using Spyder) at the same location and tried this – but getting error:
import os
os.system ('/usr/local/bin/meyda --bs=256 --o=apagodis4.csv samples_training/DczN6842.wav rms')
>>> env: node: No such file or directory
I can issue more "traditional" shell commands like this from the same Python script and it works:
os.system ('cp samples_training/DczN6842.wav copy.wav')
Also tried subprocess call with same result. I confirmed the executable is at /usr/local/bin/
To make sure I also removed all file arguments calling the program using only the help flag but same, error.
os.system ('/usr/local/bin/meyda -h')
>>> env: node: No such file or directory
Why is the command not found from inside Python but sucessfully in the macOS Terminal?

snakemake cluster script ImportError snakemake.utils

I have a strange issue that comes and goes randomly and I really can't figure out when and why.
I am running a snakemake pipeline like this:
conda activate $myEnv
snakemake -s $snakefile --configfile test.conf.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j 10 --jobscript "$job_script"
I installed snakemake 5.9.1 (also tried downgrading to 5.5.4) within a conda environment.
This works fine if I just run this command, but when I qsub this command to the PBS cluster I'm using, I get an error. My qsub script looks like this:
#PBS stuff...
source ~/.bashrc
hostname
conda activate PGC_de_novo
cd $workDir
snakefile="..."
qsub_script="pbs_qsub_snakemake_wrapper.py"
job_script="..."
snakemake -s $snakefile --configfile test.conf.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j 10 --jobscript "$job_script" >out 2>err
And the error message I get is:
...
Traceback (most recent call last):
File "/path/to/pbs_qsub_snakemake_wrapper.py", line 6, in <module>
from snakemake.utils import read_job_properties
ImportError: No module named snakemake.utils
Error submitting jobscript (exit code 1):
...
So it looks like for some reason my cluster script doesn't find snakemake, although snakemake is clearly installed. As I said, this problem keeps coming and going. It'd stay for a few hours, then go away for now aparent reason. I guess this indicates an environment problem, but I really can't figure out what, and ran out of ideas. I've tried:
different conda versions
different snakemake versions
different nodes on the cluster
ssh to the node it just failed on and try to reproduce the error
but nothing. Any ideas where to look? Thanks!

Following #Manavalan Gajapathy's advice, I added print(sys.version) commands both to the snakefile and the cluster script, and in both cases got a python version (2.7.5) different than the one indicated in the activated environment (3.7.5).
To cut a long story short - for some reason when I activate the environment within a PBS job, the environment path is added to the $PATH only after /usr/bin, which results in /usr/bin/python being used (which does not have the snakemake package). When the env is activated locally, the env path is added to the beginning of the $PATH, so the right python is used.
I still don't understand this behavior, but at least I could work around it by changing the #PATH. I guess this is not a very elegant solution, but it works for me.

A possibility could be that some cluster nodes don't find the path to the snakemake package so when a job is submitted to those nodes you get the error.
I don't know if/how that could happen but if that is the case you could find the incriminated nodes with something like:
for node in pbsnodes
do
echo $node
ssh $node 'python -c "from snakemake.utils import read_job_properties"'
done
(for nodes in pbsnodes iterates through the available nodes - I don't have the exact syntax right now but hopefully you get the idea). This at least would narrow down the problem a bit...

Makefile on Windows 10 - file not found

After some time I finally managed to successfully install python and pip and run it on my machine using Visual Studio Code.
I am working in virtual environment in python and we have a Makefile with following statement:
test:
source .env && PYTHONPATH=. PY_ENV=testing py.test ${ARGS} --duration=20
File .env lives in the main directory next to Makefile. It contains some environmental variables needed for testing certain APIs.
When I take the line out of the file and run it in my terminal, everything works fine and all tests are running etc.
However if I call the following: make test I am getting this error:
$ make test
source .env && PYTHONPATH=. PY_ENV=testing py.test --duration=20
/usr/bin/sh: line 0: source: .env: file not found
make: *** [test] Error 1
(venv)
To me it looks like when running this command from within Makefile it can't see the .env file but have no idea how to solve it.

The source command isn't looking up the file in the current working directory. As mentioned in man source:
Read and execute commands from filename in the current shell
environment and return the exit status of the last command executed
from filename. If filename does not contain a slash, filenames in
PATH are used to find the directory containing filename.
Change the file path like so:
test:
source ./.env && PYTHONPATH=. PY_ENV=testing py.test ${ARGS} --duration=20
Note that this error does not occur in bash version < 4. This is due to an implementation bug when run under POSIX mode (what make uses, since its default shell is sh, which is usually bash --posix). The correct behaviour was first mentioned in the documentation of bash-2.05 (revision 28ef6c31, file doc/bashref.info):
When Bash is not in POSIX mode, the current directory is searched if
FILENAME is not found in `$PATH'.
These older versions searched the current directory regardless of POSIX mode. It was only in bash-4.0-rc1 (revision 3185942a, file general.c) that this was corrected. Running git diff 3185942a~ 3185942a general.c outputs this section:
## -69,6 +69,7 ## posix_initialize (on)
if (on != 0)
{
interactive_comments = source_uses_path = expand_aliases = 1;
+ source_searches_cwd = 0;
}

Mac gcloud install ImportError: No module named future

When installing gcloud for mac I get this error when I run the install.sh command according to docs here:
Traceback (most recent call last):
File "/path_to_unzipped_file/google-cloud-sdk/bin/bootstrapping/install.py", line 8, in <module>
from __future__ import absolute_import
I poked through and echoed out some stuff in the install shell script. It is setting the environment variables correctly (pointing to my default python installation, pointing to the correct location of the gcloud SDK).
If I just enter the python interpreter (using the same default python that the install script points to when running install.py) I can import the module just fine:
>>> from __future__ import absolute_import
>>>
Only other information worth noting is my default python setup is a virtual environment that I create from python 2.7.15 installed through brew. The virtual environment python bin is first in my PATH so python and python2 and python2.7 all invoke the correct binary. I've had no other issues installing packages on this setup so far.
If I echo the final line of the install.sh script that calls the install.py script it shows /path_to_virtualenv/bin/python -S /path_to_unzipped_file/google-cloud-sdk/bin/bootstrapping/install.py which is the correct python. Or am I missing something?

The script uses the -S command-line switch, which disables loading the site module on start-up.
However, it is a custom dedicated site module installed in a virtualenv that makes a virtualenv work. As such, the -S switch and virtualenvs are incompatible, with -S set fundamental imports such as from __future__ break down entirely.
You can either remove the -S switch from the install.bat command or use a wrapper script to strip it from the command line as you call your real virtualenv Python.

I had the error below when trying to run gcloud commands.
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/gcloud.py", line 20, in <module>
from __future__ import absolute_import
ImportError: No module named __future__
If you have your virtualenv sourced automatically you can specify the environment variable CLOUDSDK_PYTHON i.e. set -x CLOUDSDK_PYTHON /usr/bin/python to not use the virtualenv python.

In google-cloud-sdk/install.sh go to last line, remove variable $CLOUDSDK_PYTHON_ARGS as below.
"$CLOUDSDK_PYTHON" $CLOUDSDK_PYTHON_ARGS "${CLOUDSDK_ROOT_DIR}/bin/bootstrapping/install.py" "$#"
"$CLOUDSDK_PYTHON" "${CLOUDSDK_ROOT_DIR}/bin/bootstrapping/install.py" "$#"

Why is $PATH in remote deployment path different from $PATH in remote system?

I'm currently working on Pycharm with remote python Interpreter(miniconda3/bin/python).
So when I type echo $PATH in remote server, it prints
/home/woosung/bin:/home/woosung/.local/bin:/home/woosung/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
I created project in Pycharm and set remote python Interpreter as miniconda3 python, it works well when I just run some *.py files.
But when I typed some os.system() lines, weird things happened.
For instance, in test.py from Pycharm project
import os
os.system('echo $PATH')
os.system('python --version')
Output is
ssh://woosung#xxx.xxx.xxx.xxx:xx/home/woosung/miniconda3/bin/python -u /tmp/pycharm_project_203/test.py
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
Python 2.7.12
Process finished with exit code 0
I tried same command in remote server,
woosung#test-pc:~$ echo $PATH
/home/woosung/bin:/home/woosung/.local/bin:/home/woosung/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
woosung#test-pc:~$ python --version
Python 3.6.6 :: Anaconda, Inc.
PATH and the version of python are totally different! How can I fix this?
I've already tried add os.system('export PATH="$PATH:$HOME/miniconda3/bin"') to test.py. But it still gives same $PATH.(/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games)
EDIT
Thanks to the comment of #Dietrich Epp, I successfully add interpreter path to the shell $PATH.
(os.environ["PATH"] += ":/home/woosung/miniconda3/bin")
But I stuck the more basic problem. When I add the path and execute command the some *.py file including import library which is only in miniconda3, the shell gives ImportError.
For instance, in test.py
import matplotlib
os.environ["PATH"] += ":/home/woosung/miniconda3/bin"
os.system("python import_test.py")
and import_test.py
import matplotlib
And when I run test.py,
Traceback (most recent call last):
File "import_test.py", line 1, in <module>
import matplotlib
ImportError: No module named matplotlib
Looks like the shell doesn't understand how to utilize modified $PATH.

I find the solution.
It is not direct but quite simple.
I changed os.system("python import_test.py") to os.system(sys.executable + ' import_test.py').
This makes the shell uses the Pycharm remote interpreter(miniconda3), not original.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.