Consider the following example, consisting of three files:
BUILD
load("#rules_python//python:pip.bzl", "pip_install")
pip_install(
requirements = ":requirements.txt",
)
py_binary(
name = "bin",
srcs = ["bin.py"],
)
WORKSPACE
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "rules_python",
sha256 = "954aa89b491be4a083304a2cb838019c8b8c3720a7abb9c4cb81ac7a24230cea",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
"https://github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
],
)
bin.py
print('hello, world')
I'm getting the following error:
> bazel run bin
ERROR: Traceback (most recent call last):
File "/data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD", line 3, column 12, in <toplevel>
pip_install(
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/rules_python/python/pip.bzl", line 82, column 29, in pip_install
pip_install_dependencies()
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/rules_python/python/pip_install/repositories.bzl", line 67, column 14, in pip_install_dependencies
maybe(
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/bazel_tools/tools/build_defs/repo/utils.bzl", line 201, column 18, in maybe
repo_rule(name = name, **kwargs)
Error in repository_rule: 'repository rule http_archive' can only be called during workspace loading
ERROR: Skipping 'bin': no such target '//:bin': target 'bin' not declared in package '' defined by /data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD
WARNING: Target pattern parsing failed.
ERROR: no such target '//:bin': target 'bin' not declared in package '' defined by /data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD
INFO: Elapsed time: 0.091s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)
Creating empty requirements.txt doesn't help. Also, in case it's any relevant:
> bazel version
Bazelisk version: v1.10.1
Build label: 4.2.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 2 18:15:58 2021 (1638468958)
Build timestamp: 1638468958
Build timestamp as int: 1638468958
What am I doing wrong and how do I fix it?
Matt Mackay on Bazel Slack immediately found the solution:
pip_install is a repository rule, so it should be in the WORKSPACE file. https://github.com/bazelbuild/rules_python/blob/main/docs/pip.md#pip_install
Also noted there by James "jsharpe" Sharpe:
A quick look and you're calling pip_install in the BUILD file when it should be in the WORKSPACE file.
Here's the corrected code:
BUILD
py_binary(
name = "bin",
srcs = ["bin.py"],
)
WORKSPACE
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "rules_python",
sha256 = "954aa89b491be4a083304a2cb838019c8b8c3720a7abb9c4cb81ac7a24230cea",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
"https://github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
],
)
load("#rules_python//python:pip.bzl", "pip_install")
pip_install(
requirements = ":requirements.txt",
)
bin.py
print('hello, world')
requirements.txt
# intentionally left empty
TLDR I'm getting the following error:
The 'conda' command is not available inside your singularity container image. Snakemake mounts your conda installation into singularity. Sometimes, this can fail because of shell restrictions. It has been tested to work with docker://ubuntu, but it e.g. fails with docker://bash
I had created a Snakemake workflow and converted the shell: commands to rule-based package management via Snakemake wrappers: .
However, I ran into issues running this on HPC and one of the HPC support staff strongly recommended against using conda on any HPC system as:
"if the builder [of wrapper] is not super careful, dynamic libraries present in the conda environment that relies on the host libs (there are always a couple present because builder are most of the time carefree) will break. I think that relying on Singularity for your pipeline would make for a more robust system." - Anon
I did some reading over the weekend and according to this document, it's possible to combine containers with conda-based package management; by defining a global conda docker container and per-rule yaml files.
Note: In contrast to the example in the link above (Figure 5.4), which uses a predefined yaml and shell: command, here I've use
conda wrappers which download these yaml files into the
Singularity container (if I'm thinking correctly) so I thought should function the same - see the Note: at the end though...
Snakefile, config.yaml and samples.txt
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache"
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
samples.txt
sample location
MTG324 /home/moldach/wrappers/SUBSET/MTG324_SUBSET
Submission
snakemake --profile slurm --use-singularity --use-conda --jobs 2
Logs
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 get_vep_cache
1
[Mon Sep 21 15:35:50 2020]
rule get_vep_cache:
output: resources/vep/cache
log: logs/vep/cache.log
jobid: 0
resources: mem=1000, time=30
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/VEP/.snakemake/conda/774ea575
[Mon Sep 21 15:36:38 2020]
Finished job 0.
1 of 1 steps (100%) done
Note: Leaving --use-conda out of the submission of the workflow will cause an error for get_vep_cache: - /bin/bash: vep_install: command not found
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 download_vep_plugins
1
[Mon Sep 21 15:35:50 2020]
rule download_vep_plugins:
output: resources/vep/plugins
jobid: 0
resources: mem=1000, time=30
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/VEP/.snakemake/conda/9f602d9a
[Mon Sep 21 15:35:56 2020]
Finished job 0.
1 of 1 steps (100%) done
The problem occurs when adding the third rule, fastq:
Updated Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache",
expand('{QC_DIR}/{QC_TOOL}/before_trim/{sample}_{pair}_fastqc.{ext}', QC_DIR=dirs_dict["QC_DIR"], QC_TOOL=config["QC_TOOL"], sample=sample_names, pair=['R1', 'R2'], ext=['html', 'zip'])
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule qc_before_trim_r1:
input:
r1=lambda wildcards: getHome(wildcards.sample)[0]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R1.log")
resources:
mem=1000,
time=30
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
rule qc_before_trim_r2:
input:
r1=lambda wildcards: getHome(wildcards.sample)[1]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R2.log")
resources:
mem=1000,
time=30
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
Error reported in nohup.out
Building DAG of jobs...
Pulling singularity image https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0.
CreateCondaEnvironmentException:
The 'conda' command is not available inside your singularity container image. Snakemake mounts your conda installation into singularity. Sometimes, this can fail because of shell restrictions. It has been tested to work with docker://ubuntu, but it e.g. fails with docker://bash
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 247, in create
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 381, in __new__
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 394, in __init__
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 417, in _check
using shell: instead of wrapper:
I changed the wrapper back into the shell command:
and this is the error I get when submitting with ``:
orkflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_trim_r2
1
[Mon Sep 21 16:32:54 2020]
Job 0: --- Quality check of raw data with FastQC before trimming.
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/6740cb07e67eae01644839c9767bdca5.simg
^[[33mWARNING:^[[0m Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_CA.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Skipping '/home/moldach/wrappers/SUBSET/MTG324_SUBSET/MTG324_R2.fastq.gz' which didn't exist, or couldn't be read
Waiting at most 60 seconds for missing files.
MissingOutputException in line 84 of /home/moldach/wrappers/SUBSET/VEP/Snakefile:
Job completed successfully, but some output files are missing. Missing files after 60 seconds:
qc/fastQC/before_trim/MTG324_R2_fastqc.html
qc/fastQC/before_trim/MTG324_R2_fastqc.zip
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 231, in handle_job_success
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
The error Skipping '/home/moldach/wrappers/SUBSET/MTG324_SUBSET/MTG324_R2.fastq.gz' which didn't exist, or couldn't be read is misleading because the file is does exist...
Update 2
Following the advice Manavalan Gajapathy I've eliminated defining singularity at two different levels (global + per-rule).
Now I'm using a singularity container at only the global level and using wrappers via --use-conda which creates the conda environment inside of the container:
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache",
expand('{QC_DIR}/{QC_TOOL}/before_trim/{sample}_{pair}_fastqc.{ext}', QC_DIR=dirs_dict["QC_DIR"], QC_TOOL=config["QC_TOOL"], sample=sample_names, pair=['R1', 'R2'], ext=['html', 'zip'])
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule qc_before_trim_r1:
input:
r1=lambda wildcards: getHome(wildcards.sample)[0]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R1.log")
resources:
mem=1000,
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
rule qc_before_trim_r2:
input:
r1=lambda wildcards: getHome(wildcards.sample)[1]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R2.log")
resources:
mem=1000,
time=30
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
and submit via:
However, I'm still getting an error:
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_trim_r2
1
[Tue Sep 22 12:44:03 2020]
Job 0: --- Quality check of raw data with FastQC before trimming.
Activating singularity image /home/moldach/wrappers/SUBSET/OMG/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/OMG/.snakemake/conda/c591f288
Skipping '/work/mtgraovac_lab/MATTS_SCRATCH/rep1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping ' 2> logs/fastQC/rep1_R2.log' which didn't exist, or couldn't be read
Failed to process qc/fastQC/before_trim
java.io.FileNotFoundException: qc/fastQC/before_trim (Is a directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Traceback (most recent call last):
File "/home/moldach/wrappers/SUBSET/OMG/.snakemake/scripts/tmpiwwprg5m.wrapper.py", line 35, in <module>
shell(
File "/mnt/snakemake/snakemake/shell.py", line 205, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; fastqc qc/fastQC/before_trim --quiet -t 1 --outdir /tmp/tmps93snag8 /work/mtgraovac_lab/MATTS_SCRATCH/rep1_R2.fastq.gz ' 2> logs/fastQC/rep1_R$
[Tue Sep 22 12:44:16 2020]
Error in rule qc_before_trim_r2:
jobid: 0
output: qc/fastQC/before_trim/rep1_R2_fastqc.html, qc/fastQC/before_trim/rep1_R2_fastqc.zip
log: logs/fastQC/rep1_R2.log (check log file(s) for error message)
conda-env: /home/moldach/wrappers/SUBSET/OMG/.snakemake/conda/c591f288
RuleException:
CalledProcessError in line 97 of /home/moldach/wrappers/SUBSET/OMG/Snakefile:
Command ' singularity exec --home /home/moldach/wrappers/SUBSET/OMG --bind /home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages:/mnt/snakemake /home/moldach/wrappers/SUBSET/OMG/.snakemake$
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/home/moldach/wrappers/SUBSET/OMG/Snakefile", line 97, in __rule_qc_before_trim_r2
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Reproducibility
To replicate this you can download this small dataset:
git clone https://github.com/CRG-CNAG/CalliNGS-NF.git
cp CalliNGS-NF/data/reads/rep1_*.fq.gz .
mv rep1_1.fq.gz rep1_R1.fastq.gz
mv rep1_2.fq.gz rep1_R2.fastq.gz
UPDATE 3: Bind Mounts
According to the link shared on mounting:
"By default Singularity bind mounts /home/$USER, /tmp, and $PWD into your container at runtime."
Thus, for simplicity (and also because I got errors using --singularity-args), I've moved the required files into /home/$USER and tried to run from there.
(snakemake) [~]$ pwd
/home/moldach
(snakemake) [~]$ ll
total 3656
drwx------ 26 moldach moldach 4096 Aug 27 17:36 anaconda3
drwx------ 2 moldach moldach 4096 Sep 22 10:11 bin
-rw------- 1 moldach moldach 265 Sep 22 14:29 config.yaml
-rw------- 1 moldach moldach 1817903 Sep 22 14:29 rep1_R1.fastq.gz
-rw------- 1 moldach moldach 1870497 Sep 22 14:29 rep1_R2.fastq.gz
-rw------- 1 moldach moldach 55 Sep 22 14:29 samples.txt
-rw------- 1 moldach moldach 3420 Sep 22 14:29 Snakefile
and ran with bash -c "nohup snakemake --profile slurm --use-singularity --use-conda --jobs 4 &"
However, I still get this odd error:
Activating conda environment: /home/moldach/.snakemake/conda/fdae4f0d
Skipping ' 2> logs/fastQC/rep1_R2.log' which didn't exist, or couldn't be read
Failed to process qc/fastQC/before_trim
java.io.FileNotFoundException: qc/fastQC/before_trim (Is a directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Traceback (most recent call last):
Why does it think it's being given a directory?
Note: If you submit only with --use-conda, e.g. bash -c "nohup snakemake --profile slurm --use-conda --jobs 4 &" there is no error from the fastqc rules. However, the --use-conda param alone is not %100 reproducible, case-in-point doesn't work on another HPC I tested it on
The full log in nohup.out when using --printshellcmds can be found at this gist
TLDR:
fastqc singularity container used in qc rule likely doesn't have conda available in it, and this doesn't satisfy what snakemake's--use-conda expects.
Explanation:
You have singularity containers defined at two different levels - 1. global level that will be used for all rules, unless they are overridden at rule level; 2. per-rule level that will be used at the rule level.
# global singularity container to use
singularity: "docker://continuumio/miniconda3:4.5.11"
# singularity container defined at rule level
rule qc_before_trim_r1:
....
....
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
When you use --use-singularity and --use-conda together, jobs will be run in conda environment inside the singularity container. So conda command needs to be available inside the singularity container for this to be possible. While this requirement is clearly satisfied for your global-level container, I am quite certain (haven't tested though) this is not the case for your fastqc container.
The way snakemake works if --use-conda flag is supplied, it will create conda environment locally or inside the container depending on the supply of --use-singularity flag. Since you are using snakemake-wrapper for qc rule and it comes with conda env recipe pre-defined, the easiest solution here is to just use the globally defined miniconda container for all rules. That is, there is no need to use fastqc specific container for qc rule.
If you really want to use the fastqc container, then you shouldn't be using --use-conda flag, but of course this will mean that all necessary tools are available from the container(s) defined globally or per rule.
In our build we're creating an executable file with unit tests like this:
tests = env.Program(os.path.join(env['testDir'], name + '_test'),
src + createManifest(env),
LIBS = libs,
LIBPATH = buildLibPath(env),
LINKFLAGS = env['LINKFLAGS'],
CPPPATH = cppPath)
This correctly creates an executable, which later is being run by the following builder:
action = tests[0].abspath + '&& echo %DATE% %TIME% > ${TARGET}'
runTests = env.Command(source = tests,
target = 'test_'+name+'.tmp',
action = action)
Up to this point everything works fine: the tests are being run during the build.
I've recently found Visual Leak Detector tool and wanted to include this in the build. So, I've changed the environment for the builders like this:
vldInclude = os.path.join(os.path.normpath(env['vldIncDir']), 'vld.h')
env.Append(CPPFLAGS='/FI' + vldInclude)
env.Append(LIBPATH = env['vldLibDir'])
vldLib = os.path.join(env['vldLibDir'], 'vld.lib')
libs.append(vldLib) # used in the Program call for the LIBS parameter, see above
scons: *** [build\debug\libname\test_libname.dummy] Error 309
This error message isn't very helpful. What does it mean and how to fix it?
It turns out that the magic number 309 is more googleable when written as: 0xC0000135 (no idea why C, but 135HEX == 309DEC), and it is an identifier of the STATUS_DLL_NOT_FOUND error.
So, it's not a SCons error, but Windows error, that leaks through SCons.
This means that some DLLs are missing, needed by VLD. Lurking into the VLD installation directory (usually: C:\Program Files (x86)\Visual Leak Detector) two DLL files and one manifest file can be found in the bin\Win32 subdirectory.
Not to have the build being dependent on the machine's environment, you can either add the directory to env['ENV']['PATH'] or copy the files to the directory where the tests are being run.
To do the latter:
You need another VLD configuration option, beside the library directory, namely the binaries directory. Let's call it vldBinDir. At the build's startup you can copy these files to the build directory:
def setupVld(env):
sourcePath = env['vldBinDir']
targetPath = env['testDir']
toCopy = ['dbghelp.dll',
'vld_x86.dll',
'Microsoft.DTfW.DHL.manifest']
nodes = []
for c in toCopy:
n = env.Command(os.path.join(targetPath, c),
os.path.join(sourcePath, c),
SCons.Defaults.Copy("${TARGET}", "${SOURCE}"))
nodes.append(n)
env['vldDeps'] = nodes
And then, when creating particular tests, make sure to add the dependency:
for n in env['vldDeps']:
env.Depends(tests, n)