snakemake: rule's input with different patterns

I am new to snakemake and would like to use the following rule:
import pandas as pd

input_path = config["PATH"]
samples = pd.read_csv(config["METAFILE"], sep='\t', header=0)['sample']

rule getPaired:
    output:
        fwd = temp(tmp_path + "/reads/{sample}_fwd.fastq.gz"),
        rev = temp(tmp_path + "/reads/{sample}_rev.fastq.gz")
    params:
        input_path = input_path
    run:
        shell("scp -i {params.input_path}/{wildcards.sample}_*1*.f*q.gz {output.fwd}")
        shell("scp -i {params.input_path}/{wildcards.sample}_*2*.f*q.gz {output.rev}")
Input files have different patterns:
{sampleID}_R[1-2]_001.fq.gz (for example: 2160_J15_S480_R1_001.fastq.gz)
{sampleID}_[1-2].fq.gz (for example: SRX000001_1.fq.gz)
The getPaired rule works for input like {sample}_[1-2].fq.gz but not for the other pattern.
What am I doing wrong?

You should make use of input functions. I made an example that isn't exactly your setup, but I think it clearly shows what you want to achieve:
paths = {'sample1': '/home/jankees/data',
         'sample2': '/mnt/data',
         'sample3': '/home/christina/fastq'}

extensions = {'sample1': '.fq.gz',
              'sample2': '.fq.gz',
              'sample3': '.fastq.gz'}

def get_input(wildcards):
    input_file = paths[wildcards.sample] + "/read/" + wildcards.sample + extensions[wildcards.sample]
    return input_file

rule all:
    input:
        ["sample1_trimmed.fastq.gz",
         "sample2_trimmed.fastq.gz",
         "sample3_trimmed.fastq.gz"]

rule trim:
    input:
        get_input
    output:
        "{sample}_trimmed.fastq.gz"
    shell:
        "touch {output}"

Related

Optional Output from Bazel Action? (SWIG rule for Bazel)

I'm working on a bazel rule (using version 5.2.0) that uses SWIG (version 4.0.1) to make a python library from C++ code, adapted from a rule in the tensorflow library. The problem I've run into is that, depending on the contents of ctx.file.source.path, the swig invocation might produce a necessary .h file. If it does, the rule below works great. If it doesn't, I get:
ERROR: BUILD:31:11: output 'foo_swig_h.h' was not created
ERROR: BUILD:31:11: SWIGing foo.i. failed: not all outputs were created or valid
If the h_out stuff is removed from _py_swig_gen_impl, the rule below works great when swig doesn't produce the .h file. But, if swig does produce one, bazel seems to ignore it and it isn't available for native.cc_binary to compile, resulting in gcc failing with a 'no such file or directory' error on an #include <foo_swig_cc.h> line in foo_swig_cc.cc.
(The presence or absence of the .h file in the output is determined by whether the .i file at ctx.file.source.path uses SWIG's "directors" feature.)
def _include_dirs(deps):
    return depset(transitive = [dep[CcInfo].compilation_context.includes for dep in deps]).to_list()

def _headers(deps):
    return depset(transitive = [dep[CcInfo].compilation_context.headers for dep in deps]).to_list()

# Bazel rules for building swig files.
def _py_swig_gen_impl(ctx):
    module_name = ctx.attr.module_name
    cc_out = ctx.actions.declare_file(module_name + "_swig_cc.cc")
    h_out = ctx.actions.declare_file(module_name + "_swig_h.h")
    py_out = ctx.actions.declare_file(module_name + ".py")
    args = ["-c++", "-python", "-py3"]
    args += ["-module", module_name]
    args += ["-I" + x for x in _include_dirs(ctx.attr.deps)]
    args += ["-I" + x.dirname for x in ctx.files.swig_includes]
    args += ["-o", cc_out.path]
    args += ["-outdir", py_out.dirname]
    args += ["-oh", h_out.path]
    args.append(ctx.file.source.path)
    outputs = [cc_out, h_out, py_out]
    ctx.actions.run(
        executable = "swig",
        arguments = args,
        mnemonic = "Swig",
        inputs = [ctx.file.source] + _headers(ctx.attr.deps) + ctx.files.swig_includes,
        outputs = outputs,
        progress_message = "SWIGing %{input}.",
    )
    return [DefaultInfo(files = depset(direct = [cc_out, py_out]))]

_py_swig_gen = rule(
    attrs = {
        "source": attr.label(
            mandatory = True,
            allow_single_file = True,
        ),
        "swig_includes": attr.label_list(
            allow_files = [".i"],
        ),
        "deps": attr.label_list(
            allow_files = True,
            providers = [CcInfo],
        ),
        "module_name": attr.string(mandatory = True),
    },
    implementation = _py_swig_gen_impl,
)

def py_wrap_cc(name, source, module_name = None, deps = [], copts = [], **kwargs):
    if module_name == None:
        module_name = name
    python_deps = [
        "@local_config_python//:python_headers",
        "@local_config_python//:python_lib",
    ]
    # First, invoke the _py_swig_gen rule, which runs swig. This outputs:
    # `module_name.cc`, `module_name.py`, and, sometimes, `module_name.h` files.
    swig_rule_name = "swig_gen_" + name
    _py_swig_gen(
        name = swig_rule_name,
        source = source,
        swig_includes = ["//third_party/swig_rules:swig_includes"],
        deps = deps + python_deps,
        module_name = module_name,
    )
    # Next, we need to compile the `module_name.cc` and `module_name.h` files
    # from the previous rule. The `module_name.py` file already generated
    # expects there to be a `_module_name.so` file, so we name the cc_binary
    # rule this way to make sure that's the resulting file name.
    cc_lib_name = "_" + module_name + ".so"
    native.cc_binary(
        name = cc_lib_name,
        srcs = [":" + swig_rule_name],
        linkopts = ["-dynamic", "-L/usr/local/lib/"],
        linkshared = True,
        deps = deps + python_deps,
    )
    # Finally, package everything up as a python library that can be depended
    # on. Note that this rule uses the user-given `name`.
    native.py_library(
        name = name,
        srcs = [":" + swig_rule_name],
        srcs_version = "PY3",
        data = [":" + cc_lib_name],
        imports = ["./"],
    )
My question, broadly, is how I might best handle this with a single rule. I've tried adding a ctx.actions.write before the ctx.actions.run, thinking that I could generate a dummy '.h' file that would be overwritten if needed. That gives me:
ERROR: BUILD:41:11: for foo_swig_h.h, previous action: action 'Writing file foo_swig_h.h', attempted action: action 'SWIGing foo.i.'
My next idea is to remove the h_out stuff and then try to capture the h file for the cc_binary rule with some kind of glob invocation.
I've seen two approaches: add an attribute to the rule indicating whether the .h output is expected, or write a wrapper script that creates the .h file unconditionally.
Adding an attribute means something like "has_h": attr.bool(), and then use that in _py_swig_gen_impl to make the ctx.actions.declare_file(module_name + "_swig_h.h") conditional.
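A rough sketch of that attribute-based variant (untested; has_h is an assumed attribute name, and the run action and the rest of the attrs are elided):

# Sketch: declare the header and pass -oh only when the rule says a header is expected.
def _py_swig_gen_impl(ctx):
    module_name = ctx.attr.module_name
    cc_out = ctx.actions.declare_file(module_name + "_swig_cc.cc")
    py_out = ctx.actions.declare_file(module_name + ".py")
    args = ["-c++", "-python", "-py3", "-module", module_name,
            "-o", cc_out.path, "-outdir", py_out.dirname]
    outputs = [cc_out, py_out]
    if ctx.attr.has_h:  # assumes "has_h": attr.bool(default = False) in the rule's attrs
        h_out = ctx.actions.declare_file(module_name + "_swig_h.h")
        args += ["-oh", h_out.path]
        outputs.append(h_out)
    args.append(ctx.file.source.path)
    # ... ctx.actions.run(...) with outputs = outputs, as in the original rule ...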
The wrapper script option means using something like this for the executable:
#!/bin/bash
set -e
touch the_path_of_the_header
exec swig "$@"
That will unconditionally create the output, and then swig will overwrite it if applicable. If it's not applicable, then passing around an empty header file in the Bazel rules should be harmless.
For posterity, this is what my _py_swig_gen_impl looks like after implementing @Brian's suggestion above:
def _py_swig_gen_impl(ctx):
    module_name = ctx.attr.module_name
    cc_out = ctx.actions.declare_file(module_name + "_swig_cc.cc")
    h_out = ctx.actions.declare_file(module_name + "_swig_h.h")
    py_out = ctx.actions.declare_file(module_name + ".py")
    include_dirs = _include_dirs(ctx.attr.deps)
    headers = _headers(ctx.attr.deps)
    args = ["-c++", "-python", "-py3"]
    args += ["-module", module_name]
    args += ["-I" + x for x in include_dirs]
    args += ["-I" + x.dirname for x in ctx.files.swig_includes]
    args += ["-o", cc_out.path]
    args += ["-outdir", py_out.dirname]
    args += ["-oh", h_out.path]
    args.append(ctx.file.source.path)
    outputs = [cc_out, h_out, py_out]

    # Depending on the contents of `ctx.file.source`, swig may or may not
    # output a .h file needed by subsequent rules. Bazel doesn't like optional
    # outputs, so instead of invoking swig directly we're going to make a
    # lightweight executable script that first `touch`es the .h file that may
    # get generated, and then execute that. This means we may be propagating
    # an empty .h file around as a "dependency" sometimes, but that's okay.
    swig_script_file = ctx.actions.declare_file("swig_exec.sh")
    ctx.actions.write(
        output = swig_script_file,
        is_executable = True,
        content = "#!/bin/bash\n\nset -e\ntouch " + h_out.path + "\nexec swig \"$@\"",
    )
    ctx.actions.run(
        executable = swig_script_file,
        arguments = args,
        mnemonic = "Swig",
        inputs = [ctx.file.source] + headers + ctx.files.swig_includes,
        outputs = outputs,
        progress_message = "SWIGing %{input}.",
    )
    return [
        DefaultInfo(files = depset(direct = outputs)),
    ]
The ctx.actions.write generates the suggested bash script:
#!/bin/bash
set -e
touch %{h_out.path}
exec swig "$@"
This guarantees that the expected h_out will always be produced by ctx.actions.run, whether or not swig generates it.

Using argparse or docopt for pdftk replacement

I am writing a program that accepts arguments in the following form.
SYNOPSIS
pdf_form.py <input PDF file | - | PROMPT> [ <operation> <operation arguments> ]
[ output <output filename | - | PROMPT> ] [ flatten ]
Where:
<operation> may be empty, or one of: [ generate_fdf | fill_form | dump_data | dump_data_fields ]
OPTIONS
--help, -h
show summary of options.
<input PDF files | - | PROMPT>
An input PDF file. Use - to pass a single PDF into pdftk via stdin.
[<operation> <operation arguments>]
Available operations are:
generate_fdf, fill_form, dump_data, dump_data_fields
Some operations take additional arguments, described below.
generate_fdf
Reads a single input PDF file and generates an FDF file suitable for fill_form out of it,
writing to the given output filename or (if no output is given) to stdout. Does not create a new PDF.
fill_form <FDF data filename | - | PROMPT>
Fills the input PDF's form fields with the data from an FDF file, or stdin.
Enter the data filename after fill_form, or use - to pass the data via stdin, like so:
./pdf_form.py form.pdf fill_form data.fdf output form.filled.pdf
After filling a form, the form fields remain interactive unless flatten is used.
flatten merges the form fields with the PDF pages. You can also use flatten alone, as shown:
./pdf_form.py form.pdf fill_form data.fdf output out.pdf flatten
or:
./pdf_form.py form.filled.pdf output out.pdf flatten
dump_data
Reads a single input PDF file and reports various statistics to the given output filename or (if no output is given) to stdout.
dump_data_fields
Reads a single input PDF file and reports form field statistics to the given output filename.
[flatten]
Use this option to merge an input PDF's interactive form fields and their data with the PDF's pages.
I am currently parsing these options using the following (messy) code:
def parse(arg):
    """ Parses commandline arguments. """
    # checking that request is valid
    if len(arg) == 1:
        raise ValueError(info())
    if arg[1] in ('-h', '--help'):
        print(info(verbose=True))
        return
    if len(arg) == 2:
        raise ValueError(info())
    path = arg[1]
    if path == 'PROMPT':
        path = input('Please enter a filename for an input document:')
    func = arg[2]
    if func == 'fill_form':
        fdf = arg[3]
        if len(arg) > 5 and arg[4] == 'output':
            out = arg[5]
        else:
            out = '-'
    else:
        fdf = None
        if len(arg) > 4 and arg[3] == 'output':
            out = arg[4]
        else:
            out = '-'
    if out == 'PROMPT':
        out = input('Output file: ')
    flatten = 'flatten' in arg
    return path, func, fdf, out, flatten
Is there a better way to do this with either argparse or docopt?
One argument could be
parser.add_argument('-o', '--output', default='-')
and later
if args.output in ['PROMPT']:
    ... input ...
Others:
parser.add_argument('--flatten', action='store_true')
parser.add_argument('--fill_form', dest='ftd')
parser.add_argument('--path')
if args.path in ['PROMPT']:
    args.path = input...
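Putting those pieces together, a minimal sketch might look like the following (argument names such as path and fdf are assumptions, and this is not a drop-in replacement for the pdftk-style positional syntax):

import argparse

parser = argparse.ArgumentParser(description='pdftk-style PDF form tool')
parser.add_argument('path', help="input PDF file, '-' for stdin, or PROMPT")
parser.add_argument('--fill_form', dest='fdf',
                    help="FDF data file, '-' for stdin, or PROMPT")
parser.add_argument('-o', '--output', default='-')
parser.add_argument('--flatten', action='store_true')
args = parser.parse_args()

# Resolve the PROMPT placeholders interactively.
if args.path == 'PROMPT':
    args.path = input('Please enter a filename for an input document: ')
if args.output == 'PROMPT':
    args.output = input('Output file: ')

argparse gives you -h/--help for free, but matching pdftk's exact positional "output <file>" grammar would still need custom handling; docopt's usage-string style may map onto that grammar more directly.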

Passing command line arguments from powershell script to a python script

I call the Python code from a PowerShell script in order to loop over some arguments. Calling the Python script from PowerShell directly is straightforward and works without a hitch:
PS C:\Windows\system32> C:\Users\Administrator\AppData\Local\Programs\Python\Python35-32\python.exe C:\Users\Administrator\AppData\Local\Programs\youtube-upload-master\bin\youtube-upload C:\Users\Administrator\Documents\timelapse\videos\timelapse_10.0.0.51-2016-06-21.mp4 --client-secrets=C:\Users\Administrator\Documents\timelapse\credentials\.yt-ul-ioa-secr.json --credentials-file=C:\Users\Administrator\Documents\timelapse\credentials\.yt-ul-ioa-cred.json --title="Timelapse 21.06.2016" --playlist "Timelapses June 2016"
Then, within a script, I change the parameters by inserting variables into the argument strings and finally call the whole thing with Invoke-Command:
# yt-ul.ps1
param(
    #[switch]$all_cams = $false,
    [int]$days = -1,
    [string]$cam = "ioa"
)

$cam_ip_hash = @{
    "ioa" = "10.0.0.51";
    "pam" = "10.0.0.52";
    "biz" = "10.0.0.56";
    "prz" = "10.160.58.25";
    "igu" = "10.160.38.35"}

$cam_ip = $cam_ip_hash[$cam]
$date = (Get-Date).AddDays($days).ToString("yyyy-MM-dd")
$py = "C:\Users\Administrator\AppData\Local\Programs\Python\Python35-32\python.exe"
$yt_ul = "C:\Users\Administrator\AppData\Local\Programs\youtube-upload-master\bin\youtube-upload"
$title_date = (Get-Date).AddDays($days).ToString("dd.MM.yyyy")
$us = New-Object System.Globalization.CultureInfo("en-US")
$playlist_date = (Get-Date).AddDays($days).ToString("Y", $us)
$vid = "C:\Users\Administrator\Documents\timelapse\videos\timelapse_$cam_ip-$date.mp4"
$secr = "--client-secrets=C:\Users\Administrator\Documents\timelapse\credentials\.yt-ul-igu-secr.json"
$cred = "--credentials-file=C:\Users\Administrator\Documents\timelapse\credentials\.yt-ul-igu-cred.json"
$title = "--title=`"Timelapse $title_date`""
$playlist_date = "--playlist `"Timelapses $playlist_date`""
$arg_list = "$yt_ul $vid $secr $cred $title $playlist_date"
Invoke-Command "$py $arg_list"
But actually calling the script fails as follows:
PS C:\Users\Administrator\Documents\scripts> .\yt-ul.ps1
Invoke-Command : Parameter set cannot be resolved using the specified named parameters.
At C:\Users\Administrator\Documents\scripts\yt-ul.ps1:34 char:1
+ Invoke-Command "$py $arg_list"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Invoke-Command], ParameterBindingException
+ FullyQualifiedErrorId : AmbiguousParameterSet,Microsoft.PowerShell.Commands.InvokeCommandCommand
I assume I am doing something really stupid with the single and double quotes, but I am not sure.
Thanks to JosefZ, this works (Invoke-Command expects a script block rather than a single command string, while the call operator & invokes the program with its arguments passed as separate tokens):
& $py $yt_ul $vid $secr $cred --title "Timelapse $title_date" --playlist "Timelapses $playlist_date"

Variable repeating depending on loop

I'm searching for a string in 2 websites.
Each website has the same string; now the main problem is:
s = "name:'(.*)',"
WebType = re.findall(s, html1)
for name in WebType:
if 'RoundCube' or 'SquirrelMail' in name:
print host + "--> " + name
So basically, I think that the results repeat depending on loop results.
The results are:
https://website1.com--> Roundcube
https://website1.com--> SquirrelMail
https://website2.com--> Roundcube
https://website2.com--> SquirrelMail
How can I make the results to be:
https://website1.com--> RoundCube, SquirrelMail
https://website2.com--> Roundcube, SquirrelMail
dursk's solution above should have worked. However, a longer solution is:
s = "name:'(.*)',"
WebType = re.findall(s, html1)
results = []
for name in WebType:
if 'RoundCube' or 'SquirrelMail' in name:
results.append(name)
print host + "--> " + ', '.join(results)
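If both sites are scanned in the same script, a compact variant collects the matches per host. This is only a sketch: it assumes each host's HTML has already been fetched into variables like html1 and html2 from the question.

import re

pattern = r"name:'(.*?)',"
wanted = ('RoundCube', 'SquirrelMail')

# Assumed mapping from host URL to its fetched HTML text.
pages = {'https://website1.com': html1, 'https://website2.com': html2}

for host, html in pages.items():
    found = [n for n in re.findall(pattern, html) if n in wanted]
    if found:
        print host + "--> " + ', '.join(found)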
Try this code:
import random

hosts = ['https://website1.com', 'https://website2.com']
WebList = ['RoundCube', 'SquirrelMail'] + ['webtype_{}'.format(x) for x in range(2)]
WebType = WebList[random.randint(0, len(WebList)):]
name_list = ['RoundCube', 'SquirrelMail']

for host in hosts:
    name = filter(set(WebType).__contains__, name_list)
    if len(name):
        print host + "--> " + str(name).lstrip('[').rstrip(']').replace('\'', '')
I created WebList for testing purposes.
In your code, you will not need to import random unless you want to test what's here.
It outputs:
https://website1.com--> RoundCube, SquirrelMail
https://website2.com--> RoundCube, SquirrelMail
Depending on the random slice of WebList, it may instead output:
https://website1.com--> SquirrelMail
https://website2.com--> SquirrelMail
You might need to adjust it to your needs.

ignore provided input in python

I have Python code which calls a shell script (get_list.sh), and this shell script reads a .txt file that has entries like:
aaf:hfhfh:notusable:type: - city_name
hhf:hgyt:usable:type: - city_name
llf:hdgt:used:type: - city_name
After running the Python code, I provide the input. Code for reading the input:
List = str(raw_input('Enter pipe separated list : ')).upper().strip()
For example:
hhf|aaf|llf
Code for getting the output:
if List:
    try:
        cmd = "/home/dponnura/get_list.sh -s " + "\"" + List + "\""
        selfP = commands.getoutput(cmd).strip()
    except OSError:
        print bcolors.FAIL + "Could not invoke Pod Details Script. " + bcolors.ENDC
It shows me the output as:
hhf detils : hfhfh:notusable:type: - city_name
aaf details : hgyt:usable:type: - city_name
llf details : hdgt:used:type: - city_name
My requirement is: if the input I pass after running the Python code contains entries that are not present in the .txt file, it should show the following output.
If I provide the input as:
hhf|aaf|llf|ggg
then for 'ggg' it should show:
'ggg' is wrong input
hhf detils : hfhfh:notusable:type: - city_name
aaf details : hgyt:usable:type: - city_name
llf details : hdgt:used:type: - city_name
Could you please let me know how I can do this in Python or shell?
Here it is done in Python, without calling get_list.sh:
import sys, re

List = str(raw_input('Enter pipe separated list : ')).strip().split('|')
for linE in open(sys.argv[1]):
    for l1 in List:
        m1 = re.match(l1 + ':(.+)', linE)
        if m1:
            print l1, 'details :', m1.group(1)
            List.remove(l1)
            break
for l1 in List: print l1, 'is wrong input'
Usage :
python script.py textFile.txt
Your task can (and, I think, has to) be implemented in pure Python. Below is one possible variant of a solution using just Python, without external scripts or libraries:
pipelst = str(raw_input('Enter pipe separated list : ')).split('|')
filepath = 'test.txt'  # specify path to your file here

for lns in open(filepath):
    split_pipe = lns.split(':', 1)
    if split_pipe[0] in pipelst:
        print split_pipe[0], ' details : ', split_pipe[1]
        pipelst.remove(split_pipe[0])
for lns in pipelst: print lns, ' is wrong input'
As you can see, it is short, simple, and clear.
