Docker images builds with Bazel on Apple M1

Docker images builds with Bazel on Apple M1 - python

I setup a new project that uses Bazel to build/package my Python applications.
The project uses rules_python to setup a py_binary rule with my source files and rules_docker to setup a py_image rule to build my image.
I am successfully able to bazel run the py_binary by itself.
But when trying to run the py_image rule, it succeeds with building the image, but fails to run the binary entry-point and throws the following error:
INFO: Analyzed target //demo:demo_img (110 packages loaded, 12496 targets configured).
INFO: Found 1 target...
Target //demo:demo_img up-to-date:
bazel-out/darwin_arm64-fastbuild-ST-9e3a93240a9e/bin/demo/demo_img-layer.tar
INFO: Elapsed time: 8.722s, Critical Path: 3.94s
INFO: 31 processes: 13 internal, 18 darwin-sandbox.
INFO: Build completed successfully, 31 total actions
INFO: Build completed successfully, 31 total actions
f83e52040704: Loading layer [==================================================>] 147.1MB/147.1MB
Loaded image ID: sha256:078152695f1056177bd21cd96171245f42f7415f5a1ff802b20fbd973eecddfd
Tagging 078152695f1056177bd21cd96171245f42f7415f5a1ff802b20fbd973eecddfd as bazel/demo:demo_img
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Traceback (most recent call last):
File "/app/demo/demo_img.binary", line 392, in <module>
Main()
File "/app/demo/demo_img.binary", line 382, in Main
os.execv(args[0], args)
OSError: [Errno 8] Exec format error: '/app/demo/demo_img.binary.runfiles/python3_8_aarch64-apple-darwin/bin/python3'
Taking a look at the generated image
docker run -it --entrypoint sh bazel/demo:demo_img
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
/app/demo/demo_img.binary.runfiles/__main__ # uname -a
Linux c94f44a24832 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 Linux
My current setup also uses a hermetic Python interpreter following this blog post: https://thethoughtfulkoala.com/posts/2020/05/16/bazel-hermetic-python.html
I am assuming that this problem exists due to the mismatch in OS type? The Python binary is built with an interpreter using apple/darwin where as the image is based on linux?
How do I configure py_image to build a binary for linux when developing on an M1 Macbook?
Appendix:
The following files are part of my sample project:
__main__.py
from flask import Flask
app = Flask(__name__)
#app.route("/", methods=["GET"])
def root():
return "OK"
if __name__ == "__main__":
app.run()
BUILD.bazel
load("#rules_python//python:defs.bzl", "py_binary")
load("#io_bazel_rules_docker//python3:image.bzl", py_image = "py3_image")
py_binary(
name = "demo_bin",
srcs = ["__main__.py"],
imports = [".."],
main = "__main__.py",
visibility = ["//:__subpackages__"],
deps = [
"#python_deps_flask//:pkg",
],
)
container_image(
name = "py_alpine_base",
base = "#python-alpine//image",
symlinks = {
"/usr/bin/python": "/usr/local/bin/python", # To work as base for py3_image
"/usr/bin/python3": "/usr/local/bin/python3", # To work as base for py3_image
},
)
py_image(
name = "demo_img",
srcs = ["__main__.py"],
base = "//:py_alpine_base",
main = "__main__.py",
deps = [
"#python_deps_flask//:pkg",
],
)
Where python-alpine is defined in WORKSPACE. It references an arm64 image from dockerhub.
load("#io_bazel_rules_docker//container:container.bzl", "container_pull")
container_pull(
name = "python-alpine",
registry = "index.docker.io",
repository = "arm64v8/python",
tag = "3.8-alpine",
)

The error usually means that instruction set of container host doesn’t match the instructions set of the container image that is attempting to initiate.
Use —platform in build command and specify linux/amd64

Related

Not Able to Run GEM5 with RISC-V: "!seWorkload occurred: Couldn't find appropriate workload object"

I am trying to run gem5 with RISC-V. I have the Linux 64-bits cross compiler ready and I have also installed and compiled gem5. I then tried to use the following tutorial to run gem5: https://canvas.kth.se/courses/24933/pages/tutorial-simulating-a-cpu-with-gem5
I wrote a simple Hello World C program and compiled it using the following command:
riscv64-unknown-linux-gnu-gcc -c hello.c -static -Wall -O0 -o hello
But when I try to run gem5, I get the following error:
build/RISCV/sim/process.cc:137: fatal: fatal condition !seWorkload occurred: Couldn't find appropriate workload object.
I tried to come over this problem but I could not. I added print statements to the configuration file and realized that the error occurs in the line m5.instantiate() in the configuration file attached below. Does anyone know how to solve this issue? What is an seWorkload and why gem5 considers the object as not appropriate?
I am using Ubuntu 22.04. For reference, this is the configuration python file I use for gem5:
import m5
from m5.objects import *
import sys
system = System()
system.clk_domain = SrcClockDomain()
system.clk_domain.clock = '1GHz'
system.clk_domain.voltage_domain = VoltageDomain()
system.mem_mode = 'timing'
system.mem_ranges = [AddrRange('512MB')]
system.cpu = TimingSimpleCPU()
system.membus = SystemXBar()
system.cpu.icache_port = system.membus.cpu_side_ports
system.cpu.dcache_port = system.membus.cpu_side_ports
system.mem_ctrl = MemCtrl()
system.mem_ctrl.dram = DDR3_1600_8x8()
system.mem_ctrl.dram.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.mem_side_ports
# start a process
process = Process()
# read command line arguments for the path to the executable
process.cmd = [str(sys.argv[1])]
system.cpu.workload = process
system.cpu.createThreads()
root = Root(full_system = False, system = system)
m5.instantiate() # the error occurs from this line
print("Beginning simulation!")
exit_event = m5.simulate()
print('Exiting # tick %i because %s' %(m5.curTick(), exit_event.getCause()))

m5.util.addToPath('../../') is missing. This is used to add the common scripts to the path to your directory from where you are instantiating the simulation.

Attempt to use pip_install results in: "Error in repository_rule: 'repository rule http_archive' can only be called during workspace loading"

Consider the following example, consisting of three files:
BUILD
load("#rules_python//python:pip.bzl", "pip_install")
pip_install(
requirements = ":requirements.txt",
)
py_binary(
name = "bin",
srcs = ["bin.py"],
)
WORKSPACE
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "rules_python",
sha256 = "954aa89b491be4a083304a2cb838019c8b8c3720a7abb9c4cb81ac7a24230cea",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
"https://github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
],
)
bin.py
print('hello, world')
I'm getting the following error:
> bazel run bin
ERROR: Traceback (most recent call last):
File "/data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD", line 3, column 12, in <toplevel>
pip_install(
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/rules_python/python/pip.bzl", line 82, column 29, in pip_install
pip_install_dependencies()
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/rules_python/python/pip_install/repositories.bzl", line 67, column 14, in pip_install_dependencies
maybe(
File "/home/d33tah/.cache/bazel/_bazel_d33tah/055ed32c3fa80a842a34f0252f6032c8/external/bazel_tools/tools/build_defs/repo/utils.bzl", line 201, column 18, in maybe
repo_rule(name = name, **kwargs)
Error in repository_rule: 'repository rule http_archive' can only be called during workspace loading
ERROR: Skipping 'bin': no such target '//:bin': target 'bin' not declared in package '' defined by /data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD
WARNING: Target pattern parsing failed.
ERROR: no such target '//:bin': target 'bin' not declared in package '' defined by /data/d33tah/workspace/tmp/experiments/bazel/3-dependencies-without-docker/repro/BUILD
INFO: Elapsed time: 0.091s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)
Creating empty requirements.txt doesn't help. Also, in case it's any relevant:
> bazel version
Bazelisk version: v1.10.1
Build label: 4.2.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 2 18:15:58 2021 (1638468958)
Build timestamp: 1638468958
Build timestamp as int: 1638468958
What am I doing wrong and how do I fix it?

Matt Mackay on Bazel Slack immediately found the solution:
pip_install is a repository rule, so it should be in the WORKSPACE file. https://github.com/bazelbuild/rules_python/blob/main/docs/pip.md#pip_install
Also noted there by James "jsharpe" Sharpe:
A quick look and you're calling pip_install in the BUILD file when it should be in the WORKSPACE file.
Here's the corrected code:
BUILD
py_binary(
name = "bin",
srcs = ["bin.py"],
)
WORKSPACE
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "rules_python",
sha256 = "954aa89b491be4a083304a2cb838019c8b8c3720a7abb9c4cb81ac7a24230cea",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
"https://github.com/bazelbuild/rules_python/releases/download/0.4.0/rules_python-0.4.0.tar.gz",
],
)
load("#rules_python//python:pip.bzl", "pip_install")
pip_install(
requirements = ":requirements.txt",
)
bin.py
print('hello, world')
requirements.txt
# intentionally left empty

How to use singularity and conda wrappers in Snakemake

TLDR I'm getting the following error:
The 'conda' command is not available inside your singularity container image. Snakemake mounts your conda installation into singularity. Sometimes, this can fail because of shell restrictions. It has been tested to work with docker://ubuntu, but it e.g. fails with docker://bash
I had created a Snakemake workflow and converted the shell: commands to rule-based package management via Snakemake wrappers: .
However, I ran into issues running this on HPC and one of the HPC support staff strongly recommended against using conda on any HPC system as:
"if the builder [of wrapper] is not super careful, dynamic libraries present in the conda environment that relies on the host libs (there are always a couple present because builder are most of the time carefree) will break. I think that relying on Singularity for your pipeline would make for a more robust system." - Anon
I did some reading over the weekend and according to this document, it's possible to combine containers with conda-based package management; by defining a global conda docker container and per-rule yaml files.
Note: In contrast to the example in the link above (Figure 5.4), which uses a predefined yaml and shell: command, here I've use
conda wrappers which download these yaml files into the
Singularity container (if I'm thinking correctly) so I thought should function the same - see the Note: at the end though...
Snakefile, config.yaml and samples.txt
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache"
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
samples.txt
sample location
MTG324 /home/moldach/wrappers/SUBSET/MTG324_SUBSET
Submission
snakemake --profile slurm --use-singularity --use-conda --jobs 2
Logs
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 get_vep_cache
1
[Mon Sep 21 15:35:50 2020]
rule get_vep_cache:
output: resources/vep/cache
log: logs/vep/cache.log
jobid: 0
resources: mem=1000, time=30
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/VEP/.snakemake/conda/774ea575
[Mon Sep 21 15:36:38 2020]
Finished job 0.
1 of 1 steps (100%) done
Note: Leaving --use-conda out of the submission of the workflow will cause an error for get_vep_cache: - /bin/bash: vep_install: command not found
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 download_vep_plugins
1
[Mon Sep 21 15:35:50 2020]
rule download_vep_plugins:
output: resources/vep/plugins
jobid: 0
resources: mem=1000, time=30
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/VEP/.snakemake/conda/9f602d9a
[Mon Sep 21 15:35:56 2020]
Finished job 0.
1 of 1 steps (100%) done
The problem occurs when adding the third rule, fastq:
Updated Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache",
expand('{QC_DIR}/{QC_TOOL}/before_trim/{sample}_{pair}_fastqc.{ext}', QC_DIR=dirs_dict["QC_DIR"], QC_TOOL=config["QC_TOOL"], sample=sample_names, pair=['R1', 'R2'], ext=['html', 'zip'])
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule qc_before_trim_r1:
input:
r1=lambda wildcards: getHome(wildcards.sample)[0]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R1.log")
resources:
mem=1000,
time=30
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
rule qc_before_trim_r2:
input:
r1=lambda wildcards: getHome(wildcards.sample)[1]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R2.log")
resources:
mem=1000,
time=30
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
Error reported in nohup.out
Building DAG of jobs...
Pulling singularity image https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0.
CreateCondaEnvironmentException:
The 'conda' command is not available inside your singularity container image. Snakemake mounts your conda installation into singularity. Sometimes, this can fail because of shell restrictions. It has been tested to work with docker://ubuntu, but it e.g. fails with docker://bash
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 247, in create
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 381, in __new__
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 394, in __init__
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 417, in _check
using shell: instead of wrapper:
I changed the wrapper back into the shell command:
and this is the error I get when submitting with ``:
orkflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_trim_r2
1
[Mon Sep 21 16:32:54 2020]
Job 0: --- Quality check of raw data with FastQC before trimming.
Activating singularity image /home/moldach/wrappers/SUBSET/VEP/.snakemake/singularity/6740cb07e67eae01644839c9767bdca5.simg
^[[33mWARNING:^[[0m Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_CA.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Skipping '/home/moldach/wrappers/SUBSET/MTG324_SUBSET/MTG324_R2.fastq.gz' which didn't exist, or couldn't be read
Waiting at most 60 seconds for missing files.
MissingOutputException in line 84 of /home/moldach/wrappers/SUBSET/VEP/Snakefile:
Job completed successfully, but some output files are missing. Missing files after 60 seconds:
qc/fastQC/before_trim/MTG324_R2_fastqc.html
qc/fastQC/before_trim/MTG324_R2_fastqc.zip
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 231, in handle_job_success
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
The error Skipping '/home/moldach/wrappers/SUBSET/MTG324_SUBSET/MTG324_R2.fastq.gz' which didn't exist, or couldn't be read is misleading because the file is does exist...
Update 2
Following the advice Manavalan Gajapathy I've eliminated defining singularity at two different levels (global + per-rule).
Now I'm using a singularity container at only the global level and using wrappers via --use-conda which creates the conda environment inside of the container:
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
"resources/vep/plugins",
"resources/vep/cache",
expand('{QC_DIR}/{QC_TOOL}/before_trim/{sample}_{pair}_fastqc.{ext}', QC_DIR=dirs_dict["QC_DIR"], QC_TOOL=config["QC_TOOL"], sample=sample_names, pair=['R1', 'R2'], ext=['html', 'zip'])
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
resources:
mem=1000,
time=30
wrapper:
"0.66.0/bio/vep/plugins"
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
resources:
mem=1000,
time=30
log:
"logs/vep/cache.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.66.0/bio/vep/cache"
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule qc_before_trim_r1:
input:
r1=lambda wildcards: getHome(wildcards.sample)[0]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R1_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R1.log")
resources:
mem=1000,
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
rule qc_before_trim_r2:
input:
r1=lambda wildcards: getHome(wildcards.sample)[1]
output:
html=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.html"),
zip=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim","{sample}_R2_fastqc.zip"),
params:
dir=os.path.join(dirs_dict["QC_DIR"],config["QC_TOOL"],"before_trim")
log:
os.path.join(dirs_dict["LOG_DIR"],config["QC_TOOL"],"{sample}_R2.log")
resources:
mem=1000,
time=30
threads: 1
message: """--- Quality check of raw data with FastQC before trimming."""
wrapper:
"0.66.0/bio/fastqc"
and submit via:
However, I'm still getting an error:
Workflow defines that rule get_vep_cache is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 qc_before_trim_r2
1
[Tue Sep 22 12:44:03 2020]
Job 0: --- Quality check of raw data with FastQC before trimming.
Activating singularity image /home/moldach/wrappers/SUBSET/OMG/.snakemake/singularity/d7617773b315c3abcb29e0484085ed06.simg
Activating conda environment: /home/moldach/wrappers/SUBSET/OMG/.snakemake/conda/c591f288
Skipping '/work/mtgraovac_lab/MATTS_SCRATCH/rep1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping ' 2> logs/fastQC/rep1_R2.log' which didn't exist, or couldn't be read
Failed to process qc/fastQC/before_trim
java.io.FileNotFoundException: qc/fastQC/before_trim (Is a directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Traceback (most recent call last):
File "/home/moldach/wrappers/SUBSET/OMG/.snakemake/scripts/tmpiwwprg5m.wrapper.py", line 35, in <module>
shell(
File "/mnt/snakemake/snakemake/shell.py", line 205, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; fastqc qc/fastQC/before_trim --quiet -t 1 --outdir /tmp/tmps93snag8 /work/mtgraovac_lab/MATTS_SCRATCH/rep1_R2.fastq.gz ' 2> logs/fastQC/rep1_R$
[Tue Sep 22 12:44:16 2020]
Error in rule qc_before_trim_r2:
jobid: 0
output: qc/fastQC/before_trim/rep1_R2_fastqc.html, qc/fastQC/before_trim/rep1_R2_fastqc.zip
log: logs/fastQC/rep1_R2.log (check log file(s) for error message)
conda-env: /home/moldach/wrappers/SUBSET/OMG/.snakemake/conda/c591f288
RuleException:
CalledProcessError in line 97 of /home/moldach/wrappers/SUBSET/OMG/Snakefile:
Command ' singularity exec --home /home/moldach/wrappers/SUBSET/OMG --bind /home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages:/mnt/snakemake /home/moldach/wrappers/SUBSET/OMG/.snakemake$
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/home/moldach/wrappers/SUBSET/OMG/Snakefile", line 97, in __rule_qc_before_trim_r2
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Reproducibility
To replicate this you can download this small dataset:
git clone https://github.com/CRG-CNAG/CalliNGS-NF.git
cp CalliNGS-NF/data/reads/rep1_*.fq.gz .
mv rep1_1.fq.gz rep1_R1.fastq.gz
mv rep1_2.fq.gz rep1_R2.fastq.gz
UPDATE 3: Bind Mounts
According to the link shared on mounting:
"By default Singularity bind mounts /home/$USER, /tmp, and $PWD into your container at runtime."
Thus, for simplicity (and also because I got errors using --singularity-args), I've moved the required files into /home/$USER and tried to run from there.
(snakemake) [~]$ pwd
/home/moldach
(snakemake) [~]$ ll
total 3656
drwx------ 26 moldach moldach 4096 Aug 27 17:36 anaconda3
drwx------ 2 moldach moldach 4096 Sep 22 10:11 bin
-rw------- 1 moldach moldach 265 Sep 22 14:29 config.yaml
-rw------- 1 moldach moldach 1817903 Sep 22 14:29 rep1_R1.fastq.gz
-rw------- 1 moldach moldach 1870497 Sep 22 14:29 rep1_R2.fastq.gz
-rw------- 1 moldach moldach 55 Sep 22 14:29 samples.txt
-rw------- 1 moldach moldach 3420 Sep 22 14:29 Snakefile
and ran with bash -c "nohup snakemake --profile slurm --use-singularity --use-conda --jobs 4 &"
However, I still get this odd error:
Activating conda environment: /home/moldach/.snakemake/conda/fdae4f0d
Skipping ' 2> logs/fastQC/rep1_R2.log' which didn't exist, or couldn't be read
Failed to process qc/fastQC/before_trim
java.io.FileNotFoundException: qc/fastQC/before_trim (Is a directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:73)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Traceback (most recent call last):
Why does it think it's being given a directory?
Note: If you submit only with --use-conda, e.g. bash -c "nohup snakemake --profile slurm --use-conda --jobs 4 &" there is no error from the fastqc rules. However, the --use-conda param alone is not %100 reproducible, case-in-point doesn't work on another HPC I tested it on
The full log in nohup.out when using --printshellcmds can be found at this gist

TLDR:
fastqc singularity container used in qc rule likely doesn't have conda available in it, and this doesn't satisfy what snakemake's--use-conda expects.
Explanation:
You have singularity containers defined at two different levels - 1. global level that will be used for all rules, unless they are overridden at rule level; 2. per-rule level that will be used at the rule level.
# global singularity container to use
singularity: "docker://continuumio/miniconda3:4.5.11"
# singularity container defined at rule level
rule qc_before_trim_r1:
....
....
singularity:
"https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
When you use --use-singularity and --use-conda together, jobs will be run in conda environment inside the singularity container. So conda command needs to be available inside the singularity container for this to be possible. While this requirement is clearly satisfied for your global-level container, I am quite certain (haven't tested though) this is not the case for your fastqc container.
The way snakemake works if --use-conda flag is supplied, it will create conda environment locally or inside the container depending on the supply of --use-singularity flag. Since you are using snakemake-wrapper for qc rule and it comes with conda env recipe pre-defined, the easiest solution here is to just use the globally defined miniconda container for all rules. That is, there is no need to use fastqc specific container for qc rule.
If you really want to use the fastqc container, then you shouldn't be using --use-conda flag, but of course this will mean that all necessary tools are available from the container(s) defined globally or per rule.

What does "Error 309" mean?

In our build we're creating an executable file with unit tests like this:
tests = env.Program(os.path.join(env['testDir'], name + '_test'),
src + createManifest(env),
LIBS = libs,
LIBPATH = buildLibPath(env),
LINKFLAGS = env['LINKFLAGS'],
CPPPATH = cppPath)
This correctly creates an executable, which later is being run by the following builder:
action = tests[0].abspath + '&& echo %DATE% %TIME% > ${TARGET}'
runTests = env.Command(source = tests,
target = 'test_'+name+'.tmp',
action = action)
Up to this point everything works fine: the tests are being run during the build.
I've recently found Visual Leak Detector tool and wanted to include this in the build. So, I've changed the environment for the builders like this:
vldInclude = os.path.join(os.path.normpath(env['vldIncDir']), 'vld.h')
env.Append(CPPFLAGS='/FI' + vldInclude)
env.Append(LIBPATH = env['vldLibDir'])
vldLib = os.path.join(env['vldLibDir'], 'vld.lib')
libs.append(vldLib) # used in the Program call for the LIBS parameter, see above
scons: *** [build\debug\libname\test_libname.dummy] Error 309
This error message isn't very helpful. What does it mean and how to fix it?

It turns out that the magic number 309 is more googleable when written as: 0xC0000135 (no idea why C, but 135HEX == 309DEC), and it is an identifier of the STATUS_DLL_NOT_FOUND error.
So, it's not a SCons error, but Windows error, that leaks through SCons.
This means that some DLLs are missing, needed by VLD. Lurking into the VLD installation directory (usually: C:\Program Files (x86)\Visual Leak Detector) two DLL files and one manifest file can be found in the bin\Win32 subdirectory.
Not to have the build being dependent on the machine's environment, you can either add the directory to env['ENV']['PATH'] or copy the files to the directory where the tests are being run.
To do the latter:
You need another VLD configuration option, beside the library directory, namely the binaries directory. Let's call it vldBinDir. At the build's startup you can copy these files to the build directory:
def setupVld(env):
sourcePath = env['vldBinDir']
targetPath = env['testDir']
toCopy = ['dbghelp.dll',
'vld_x86.dll',
'Microsoft.DTfW.DHL.manifest']
nodes = []
for c in toCopy:
n = env.Command(os.path.join(targetPath, c),
os.path.join(sourcePath, c),
SCons.Defaults.Copy("${TARGET}", "${SOURCE}"))
nodes.append(n)
env['vldDeps'] = nodes
And then, when creating particular tests, make sure to add the dependency:
for n in env['vldDeps']:
env.Depends(tests, n)

How to install copperhead python module?

i want to install this module but get me some error :
my error in pc :
C:\Users\Ali\Desktop\copperhead-master>python setup.py
C:\Users\Ali\Desktop\copperhead-master\setuptools-0.6c9-py2.7.egg-info already e
xists
scons: Reading SConscript files ...
Checking C:\Anaconda\MinGW\bin\g++.exe version... (cached) C:\Anaconda\MinGW\bin
\g++.exe was not found
g++ 4.5 or better required, please add path to siteconf.py
Traceback (most recent call last):
File "setup.py", line 40, in <module>
raise CompileError("Error while building Python Extensions")
distutils.errors.CompileError: Error while building Python Extensions
C:\Users\Ali\Desktop\copperhead-master>
and i fix inside of siteconf.py but still get me error ,
source of siteconf.py :
#! /usr/bin/env python
#
# Configuration file.
# Use Python syntax, e.g.:
# VARIABLE = "value"
#
# The following information can be recorded:
#
# CXX : path and name of the host c++ compiler, eg: /usr/bin/g++-4.5
#
# CC : path and name of the host c compiler, eg: /usr/bin/gcc
#
# BOOST_INC_DIR : Directory where the Boost include files are found.
#
# BOOST_LIB_DIR : Directory where Boost shared libraries are found.
#
# BOOST_PYTHON_LIBNAME : Name of Boost::Python shared library.
# NOTE: Boost::Python must be compiled using the same compiler
# that was used to build your Python. Strange errors will
# ensue if this is not true.
# CUDA_INC_DIR : Directory where CUDA include files are found
#
# CUDA_LIB_DIR : Directory where CUDA libraries are found
#
# NP_INC_DIR : Directory where Numpy include files are found.
#
# TBB_INC_DIR : Directory where TBB include files are found
#
# TBB_LIB_DIR : Directory where TBB libraries are found
#
# THRUST_DIR : Directory where Thrust include files are found.
#
BOOST_INC_DIR = "C:\\Boost\\include\\boost-1_55\\boost"
BOOST_LIB_DIR = "C:\\Boost\\lib"
BOOST_PYTHON_LIBNAME = None
CC = "C:\\Anaconda\\MinGW\\bin\\gcc.exe"
CUDA_INC_DIR = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v5.5\\include"
CUDA_LIB_DIR = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v5.5\\lib\\x64"
CXX = "C:\\Anaconda\\MinGW\\bin\\g++.exe"
NP_INC_DIR = "C:\\Anaconda\\lib\\site-packages\\numpy\\core\\include"
TBB_INC_DIR = None
TBB_LIB_DIR = None
THRUST_DIR = None
G++ directory and file name is true , so what need i do to fix error ?
i use \ inside of directories is it true ? i also test with \ and //

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.