Is it possible to enable bash alias expansion in Snakemake?
I'm writing a workflow that takes a config file for execution parameters.
Let's assume that in this config file the location of a program executable must be defined, which is then passed to the shell of a rule.
config.yaml:
myProgram: /very/long/path/to/executable/build/myprogram
Snakefile:
rule runMyProgram:
    input: "inputfile.txt"
    output: "outputfile.txt"
    shell: "{config[myProgram]} -i {input} -o {output}"
But I would also like to give the user the option of calling the program directly, with the config.yaml containing just:
myProgram: myprogram
In this case, if the user has set an alias instead of adding myprogram to $PATH, the shell does not recognize the alias and the rule fails with an error. When testing shopt expand_aliases within the Snakemake shell, I see it is turned off, but adding shopt -s expand_aliases to the shell directive of the rule doesn't do the trick either.
I also tried adding the shopt command to shell.prefix(), again without success, which is perhaps not surprising, as it merely prepends the string to each rule's shell command.
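For reference, the attempt looked roughly like this (a sketch, not a confirmed fix; ~/.bash_aliases stands in for wherever the user defines the alias, since the alias definitions would also have to be sourced in the non-interactive shell):

# at the top of the Snakefile -- a sketch:
# bash expands aliases at parse time, line by line, so expand_aliases
# must be set and the definitions sourced on lines *before* the aliased
# command; hence the embedded newlines rather than semicolons.
shell.prefix("shopt -s expand_aliases\nsource ~/.bash_aliases\n")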
While we would all agree that in this minimal example the user should simply add the executable's location to $PATH, there are circumstances where a user would, for example, run different program versions under different aliases.
Or phrased differently: I would like the workflow not to crash if a user supplies an alias instead of adding the program to $PATH.
Hence I was wondering: is there another way to turn on expand_aliases globally for Snakemake?
Let's say I have the following Python script:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
parser.add_argument("--enabled", default=False, action="store_true")
args = parser.parse_args()
print("host: " + args.host)
print("enabled: " + str(args.enabled))
$ python3 test.py --host test.com
host: test.com
enabled: False
$ python3 test.py --host test.com --enabled
host: test.com
enabled: True
Now the script is used in a Docker image, and I want to pass the variables via docker run. For the host parameter this is quite easy:
FROM python:3.10-alpine
ENV MY_HOST=default.com
#ENV MY_ENABLED=
ENV TZ=Europe/Berlin
WORKDIR /usr/src/app
COPY test.py .
CMD ["sh", "-c", "python test.py --host ${MY_HOST}"]
But how can I make the --enabled flag work? When the corresponding ENV is unset, or set to 0 or off or something similar, --enabled should be suppressed; otherwise it should be included in the CMD.
Is this possible without modifying the Python script?
For exactly the reasons you're showing here, I'd suggest modifying your script to accept its command-line options from environment variables. If you add the following (along with import os at the top of the script):
parser.set_defaults(
host=os.environ.get('MY_HOST'),
enabled=(os.environ.get('MY_ENABLED') == 'true')
)
then you can use docker run -e options to provide these values, without the complexity of trying to reconstruct the command line based on which options are and aren't present. (Also see Setting options from environment variables when using argparse.)
CMD ["./test.py"] # a fixed string, environment variables specified separately
docker run -e MY_HOST=example.com -e MY_ENABLED=true my-image
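Putting the pieces together, the modified script might look like this (a sketch; note that --host can no longer be declared required=True, because argparse requires required options to appear on the command line even when a default is set):

import argparse
import os

parser = argparse.ArgumentParser()
# defaults come from the environment, with a hard-coded fallback
parser.add_argument("--host", default=os.environ.get("MY_HOST", "default.com"))
# only the exact string "true" turns the flag on
parser.add_argument("--enabled", action="store_true",
                    default=(os.environ.get("MY_ENABLED") == "true"))
args = parser.parse_args()
print("host: " + args.host)
print("enabled: " + str(args.enabled))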
Conversely, you can provide the entire command line and its options when you run the container. (But depending on the context you might just be pushing the "how to construct the command" question up a layer.)
docker run my-image \
./test.py --host=example.com --enabled
In principle you can construct this using a separate shell script without modifying your Python script, but it will be somewhat harder and significantly less safe. That script could look something like
#!/bin/sh
TEST_ARGS="--host $MY_HOST"
if [ -n "$MY_ENABLED" ]; then
    TEST_ARGS="$TEST_ARGS --enabled"
fi
exec ./test.py $TEST_ARGS
#              ^^^^^^^^^^ without double quotes (usually a bug)
Expanding $TEST_ARGS without putting it in double quotes causes the shell to split the string's value on whitespace. This is usually a bug since it would cause directory names like /home/user/My Files to get split into multiple words. You're still at some risk if the environment variable values happen to contain whitespace or other punctuation, intentionally or otherwise.
There are safer but more obscure ways to approach this in shells with extensions like GNU bash, but not all Docker images contain these. Rather than double-checking that your image has bash, figuring out bash array syntax, and writing a separate script to do the argument handling, I suggest that handling it exclusively at the Python layer is the better approach.
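That said, if you really want to leave test.py untouched, the wrapper itself can be written in Python, which sidesteps the word-splitting problem because the argument vector is built as a list (a sketch; entrypoint.py is a hypothetical file you would COPY into the image and use as the CMD):

#!/usr/bin/env python3
# entrypoint.py -- hypothetical wrapper around the unmodified test.py
import os
import sys

args = [sys.executable, "test.py", "--host", os.environ.get("MY_HOST", "default.com")]
if os.environ.get("MY_ENABLED"):  # any non-empty value enables the flag
    args.append("--enabled")

# replace this process with test.py, like the shell script's exec
os.execv(sys.executable, args)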
I am trying to use fabric (v2.6) to run some commands that make use of bash's extglob and dotglob.
When I run:
c.run(f"shopt -s extglob dotglob && rm -Rf {project_path}* !(.|..|.venv) && shopt -u extglob dotglob")
I get this error:
bash: -c: line 0: syntax error near unexpected token `('
I am using the && because I found that doing shopt -s extglob dotglob in a separate run call doesn't persist for the subsequent run calls. I'm pretty sure using && is enabling extglob and dotglob, because when I do this:
c.run("shopt -s extglob dotglob && shopt")
It prints out the list of options, and extglob and dotglob are both enabled.
Where am I going wrong here?
From the bash wiki:
extglob changes the way certain characters are parsed. It is necessary to have a newline (not just a semicolon) between shopt -s extglob and any subsequent commands to use it.
So you have to change your Python code so that a newline is used instead of &&.
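For instance, a sketch of the same call with newlines separating the commands:

c.run(
    "shopt -s extglob dotglob\n"
    f"rm -Rf {project_path}* !(.|..|.venv)\n"
    "shopt -u extglob dotglob"
)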
Or just do what the bash invocation does directly in Python.
It seems extglob can't be used with Python Fabric unfortunately.
From the bash docs
extglob changes the way certain characters are parsed. It is necessary
to have a newline (not just a semicolon) between shopt -s extglob and
any subsequent commands to use it.
But from the Fabric docs
While Fabric can be used for many shell-script-like tasks, there’s a
slightly unintuitive catch: each run [...] has its own distinct shell session. This is required
in order for Fabric to reliably figure out, after your command has
run, what its standard out/error and return codes were.
Fortunately, a similar thing can be achieved using Bash's GLOBIGNORE shell variable instead
The GLOBIGNORE shell variable may be used to restrict the set of file
names matching a pattern. If GLOBIGNORE is set, each matching file
name that also matches one of the patterns in GLOBIGNORE is removed
from the list of matches. If the nocaseglob option is set, the
matching against the patterns in GLOBIGNORE is performed without
regard to case. The filenames . and .. are always ignored when
GLOBIGNORE is set and not null. However, setting GLOBIGNORE to a
non-null value has the effect of enabling the dotglob shell option, so
all other filenames beginning with a ‘.’ will match. To get the old
behavior of ignoring filenames beginning with a ‘.’, make ‘.*’ one of
the patterns in GLOBIGNORE. The dotglob option is disabled when
GLOBIGNORE is unset.
This also handily ignores . and .. when expanding wildcards, so to remove all files in a directory except .venv, we can do the following (note that the assignment and the rm must be in the same run call, since each run gets its own shell session):
c.run(f"GLOBIGNORE='.venv'; rm -Rf {project_path}*")
I placed a script in /etc/profile.d/
# default_dba.sh
if groups | grep -qw "dba"; then
    if [ "$USER" != "oracle" ]; then
        . /u00/scripts/oracle_alias
    fi
fi
The script sets aliases if the LDAP user is a member of the dba group.
This works.
The LDAP user starts a Python script.
As a last step, the Python script starts a new bash shell:
subprocess.call(['/bin/bash', '-i'])  # not shell=True: with shell=True, extra list items become arguments to /bin/sh
In that shell session, the special aliases (created by the /u00/scripts/oracle_alias script) are missing; only the default OS aliases are there.
Can I fix this without creating home directories for LDAP users?
The startup files (/etc/profile and friends) are read only when the shell is invoked as a login shell, e.g. bash -l.
See the INVOCATION section in man bash for more details.
Snippet (from the man page):
When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option,
it first reads and executes commands from the file /etc/profile, if that file exists.
After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order,
and reads and executes commands from the first one that exists and is readable.
The --noprofile option may be used when the shell is started to inhibit this behavior.
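Applied to the Python snippet above, that means starting bash as an interactive login shell (a sketch; on most distributions /etc/profile in turn sources /etc/profile.d/*.sh, and with it default_dba.sh):

import subprocess

# -l: login shell, so /etc/profile (and thus /etc/profile.d/) is read
# -i: interactive shell
subprocess.call(['/bin/bash', '-l', '-i'])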
My .profile defines a function
myps () {
ps -aef|egrep "a|b"|egrep -v "c\-"
}
I'd like to execute it from my Python script.
import subprocess
subprocess.call("ssh user#box \"$(typeset -f); myps\"", shell=True)
Getting an error back
bash: -c: line 0: syntax error near unexpected token `;'
bash: -c: line 0: `; myps'
Escaping ; results in
bash: ;: command not found
script='''
. ~/.profile  # load local function definitions so typeset -f can emit them
ssh user@box ksh -s <<EOF
$(typeset -f)
myps
EOF
'''

import subprocess
subprocess.call(['ksh', '-c', script])  # no shell=True
There are a few pertinent items here:
The dotfile defining this function needs to be locally invoked before you run typeset -f to dump the function's definition over the wire. By default, a noninteractive shell does not run the majority of dotfiles (a file specified by the ENV environment variable is an exception).
In the given example, this is served by the . ~/.profile command within the script.
The shell needs to be one supporting typeset, so it has to be bash or ksh, not sh (as used with shell=True by default), which may be provided by ash or dash, which lack this feature.
In the given example, this is served by passing ['ksh', '-c'] as the first two arguments of the argv array.
typeset needs to be run locally, so it can't be in an argv position other than the first with shell=True. (To provide an example: subprocess.Popen(['''printf '%s\n' "$@"''', 'This is just literal data!', '$(touch /tmp/this-is-not-executed)'], shell=True) evaluates only printf '%s\n' "$@" as a shell script; This is just literal data! and $(touch /tmp/this-is-not-executed) are passed as literal data, so no file named /tmp/this-is-not-executed is created.)
In the given example, this is mooted by not using shell=True.
Explicitly invoking ksh -s (or bash -s, as appropriate) ensures that the shell evaluating your function definitions matches the shell you wrote those functions against, rather than passing them to sh -c, as would happen otherwise.
In the given example, this is served by ssh user@box ksh -s inside the script.
I ended up using this.
import subprocess
import sys
import re
HOST = "user#" + box
COMMAND = 'my long command with many many flags in single quotes'
ssh = subprocess.Popen(["ssh", "%s" % HOST, COMMAND],
shell=False,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
result = ssh.stdout.readlines()
The original command was not interpreting the ; before myps properly. Using sh -c fixes that, but... (please see Charles Duffy's comments below).
Using a combination of single and double quotes sometimes makes the syntax easier to read and less prone to mistakes. With that in mind, a safe way to run the command (provided the functions in .profile are actually accessible in the shell started by the subprocess.Popen object):
subprocess.call('ssh user#box "$(typeset -f); myps"', shell=True),
An alternative (less safe) method would be to use sh -c for the subshell command:
subprocess.call('ssh user#box "sh -c $(echo typeset -f); myps"', shell=True)
# myps is treated as a command
This seemingly returned the same result:
subprocess.call('ssh user#box "sh -c typeset -f; myps"', shell=True)
There are definitely alternative methods for accomplishing these types of tasks; however, this might give you an idea of what the issue was with the original command.
I've been trying to pass an environment variable to a Docker container via the -e option. The variable is meant to be used in a supervisor script within the container. Unfortunately, the variable does not get resolved (i.e. it stays as, for instance, $INSTANCENAME). I tried ${var} and "${var}", but this didn't help either. Is there anything I can do, or is this just not possible?
The docker run command:
sudo docker run -d -e "INSTANCENAME=instance-1" -e "FOO=2" -v /var/app/tmp:/var/app/tmp -t myrepos/app:tag
and the supervisor file:
[program:app]
command=python test.py --param1=$FOO
stderr_logfile=/var/app/log/$INSTANCENAME.log
directory=/var/app
autostart=true
The variable is being passed to your container, but supervisor doesn't let you use environment variables like this inside its configuration files.
You should review the supervisor documentation, and specifically the parts about string expressions. For example, for the command option:
Note that the value of command may include Python string expressions, e.g. /path/to/programname --port=80%(process_num)02d might expand to /path/to/programname --port=8000 at runtime.
String expressions are evaluated against a dictionary containing the keys group_name, host_node_name, process_num, program_name, here (the directory of the supervisord config file), and all supervisord’s environment variables prefixed with ENV_.
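Based on that, a sketch of the same configuration rewritten with ENV_-prefixed string expressions (assuming the variables are passed with docker run -e exactly as above):

[program:app]
command=python test.py --param1=%(ENV_FOO)s
stderr_logfile=/var/app/log/%(ENV_INSTANCENAME)s.log
directory=/var/app
autostart=true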