Using Python to parse complex arguments to shell script - python

When I'm writing shell scripts, I often find myself spending most of my time (especially when debugging) dealing with argument processing. Many scripts I write or maintain are easily more than 80% input parsing and sanitization. I compare that to my Python scripts, where argparse handles most of the grunt work for me, and lets me easily construct complex option structures and sanitization / string parsing behavior.
I'd love, therefore, to be able to have Python do this heavy lifting, and then get these simplified and sanitized values in my shell script, without needing to worry any further about the arguments the user specified.
To give a specific example, many of the shell scripts where I work have been defined to accept their arguments in a specific order. You can call start_server.sh --server myserver --port 80 but start_server.sh --port 80 --server myserver fails with You must specify a server to start. - it makes the parsing code a lot simpler, but it's hardly intuitive.
So a first pass solution could be something as simple as having Python take in the arguments, sort them (keeping their parameters next to them) and returning the sorted arguments. So the shell script still does some parsing and sanitization, but the user can input much more arbitrary content than the shell script natively accepts, something like:
# script.sh -o -aR --dir /tmp/test --verbose
#!/bin/bash
args=$(order.py "$@")
# args is set to "-a --dir /tmp/test -o -R --verbose"
# simpler processing now that we can guarantee the order of parameters
There are some obvious limitations here, notably that order.py can't distinguish between a final option that takes an argument and the start of the positional arguments, but that doesn't seem that terrible.
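For concreteness, here is a rough sketch of what such an order.py might look like. It has to be told which options expect a value (here just --dir, purely for illustration), and it splits combined short flags like -aR before sorting:

#!/usr/bin/env python3
import shlex
import sys

TAKES_VALUE = {'--dir'}   # illustrative: options that consume the next token

def main(argv):
    groups, i = [], 0
    while i < len(argv):
        arg = argv[i]
        if arg in TAKES_VALUE and i + 1 < len(argv):
            groups.append([arg, argv[i + 1]])          # keep the value with its option
            i += 2
        elif len(arg) > 2 and arg.startswith('-') and not arg.startswith('--'):
            groups.extend(['-' + c] for c in arg[1:])  # split -aR into -a -R
            i += 1
        else:
            groups.append([arg])
            i += 1
    groups.sort(key=lambda g: g[0].lstrip('-').lower())
    print(' '.join(shlex.quote(tok) for g in groups for tok in g))

if __name__ == '__main__':
    main(sys.argv[1:])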
So here's my question: 1) Is there any existing (Python preferably) utility to enable CLI parsing by something more powerful than bash, which can then be accessed by the rest of my bash script after sanitization, or 2) Has anyone done this before? Are there issues or pitfalls or better solutions I'm not aware of? Care to share your implementation?
One (very half-baked) idea:
#!/bin/bash
# Some sort of simple syntax to describe to Python what arguments to accept
opts='
"a", "append", boolean, help="Append to existing file"
"dir", str, help="Directory to run from"
"o", "overwrite", boolean, help="Overwrite duplicates"
"R", "recurse", boolean, help="Recurse into subdirectories"
"v", "verbose", boolean, help="Print additional information"
'
# Takes in CLI arguments and outputs a sanitized structure (JSON?) or fails
p=$(parse.py "Runs complex_function with nice argument parsing" "$opts" "$@")
if [ $? -ne 0 ]; then exit 1; fi # while parse.py prints usage to stderr
# Takes the sanitized structure and an argument to get
append=$(arg.py "$p" append)
overwrite=$(arg.py "$p" overwrite)
recurse=$(arg.py "$p" recurse)
verbose=$(arg.py "$p" verbose)
cd $(python arg.py "$p" dir)
complex_function $append $overwrite $recurse $verbose
Two lines of code, along with concise descriptions of the arguments to expect, and we're on to the actual script behavior. Maybe I'm crazy, but that seems way nicer than what I feel like I have to do now.
I've seen Parsing shell script arguments and things like this wiki page on easy CLI argument parsing, but many of these patterns feel clunky and error prone, and I dislike having to re-implement them every time I write a shell script, especially when Python, Java, etc. have such nice argument processing libraries.

You could potentially take advantage of associative arrays in bash to help achieve your goal.
declare -A opts=($(getopts.py "$@"))
cd ${opts[dir]}
complex_function ${opts[append]} ${opts[overwrite]} ${opts[recurse]} \
${opts[verbose]} ${opts[args]}
To make this work, getopts.py should be a python script that parses and sanitizes your arguments. It should print a string like the following:
[dir]=/tmp
[append]=foo
[overwrite]=bar
[recurse]=baz
[verbose]=fizzbuzz
[args]="a b c d"
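A rough sketch of what getopts.py might look like, using argparse and shlex.quote (the option names are only illustrative, and quoted values containing spaces rely on the eval variant shown further down):

#!/usr/bin/env python3
import argparse
import shlex
import sys

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('-a', '--append', action='store_true')
parser.add_argument('-o', '--overwrite', action='store_true')
parser.add_argument('-R', '--recurse', action='store_true')
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('--dir', default='.')

try:
    opts, rest = parser.parse_known_args()
except SystemExit:
    # argparse has already written its complaint to stderr; tell bash it failed.
    print('[__error__]=true')
    sys.exit(0)

print('[__error__]=false')
for key, val in vars(opts).items():
    print('[{}]={}'.format(key, shlex.quote(str(val))))
print('[args]={}'.format(shlex.quote(' '.join(rest))))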
You could also set aside keys for signalling whether the options were parsed and sanitized successfully.
Returned from getopts.py:
[__error__]=true
Added to bash script:
if ${opts[__error__]}; then
exit 1
fi
If you would rather work with the exit code from getopts.py, you could play with eval:
getopts=$(getopts.py "$@") || exit 1
eval declare -A opts=($getopts)
Alternatively:
getopts=$(getopts.py "$@")
if [[ $? -ne 0 ]]; then
exit 1;
fi
eval declare -A opts=($getopts)

Having the very same needs, I ended up writing an optparse-inspired parser for bash (which actually uses python internally); you can find it here:
https://github.com/carlobaldassi/bash_optparse
See the README at the bottom for a quick explanation. You may want to check out a simple example at:
https://github.com/carlobaldassi/bash_optparse/blob/master/doc/example_script_simple
From my experience, it's quite robust (I'm super-paranoid), feature-rich, etc., and I'm using it heavily in my scripts. I hope it may be useful to others. Feedback/contributions welcome.

Edit: I haven't used it (yet), but if I were posting this answer today I would probably recommend https://github.com/docopt/docopts instead of a custom approach like the one described below.
I've put together a short Python script that does most of what I want. I'm not convinced it's production quality yet (notably error handling is lacking), but it's better than nothing. I'd welcome any feedback.
It takes advantage of the set builtin to re-assign the positional arguments, allowing the remainder of the script to still handle them as desired.
bashparse.py
#!/usr/bin/env python
import optparse, sys
from pipes import quote

'''
Uses Python's optparse library to simplify command argument parsing.

Takes a usage string as argv[1], a newline-separated set of optparse
option specifications as argv[2], and the command-line arguments to
parse as argv[3:], and outputs a series of bash commands to populate
the associated variables.
'''

class _ThrowParser(optparse.OptionParser):
    def error(self, msg):
        """Overrides optparse's default error handling
        and instead raises an exception which will be caught upstream
        """
        raise optparse.OptParseError(msg)

def gen_parser(usage, opts_ls):
    '''Takes a list of strings which can be used as the parameters to
    optparse's add_option function.
    Returns a parser object able to parse those options
    '''
    parser = _ThrowParser(usage=usage)
    for opts in opts_ls:
        if opts:
            # yes, I know it's evil, but it's easy
            eval('parser.add_option(%s)' % opts)
    return parser

def print_bash(opts, args):
    '''Takes the result of optparse and outputs commands to update a shell'''
    for opt, val in opts.items():
        if val:
            print('%s=%s' % (opt, quote(val)))
    print("set -- %s" % " ".join(quote(a) for a in args))

if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.stderr.write("Needs at least a usage string and a set of options to parse\n")
        sys.exit(2)
    parser = gen_parser(sys.argv[1], sys.argv[2].split('\n'))
    (opts, args) = parser.parse_args(sys.argv[3:])
    print_bash(opts.__dict__, args)
Example usage:
#!/bin/bash
usage="[-f FILENAME] [-t|--truncate] [ARGS...]"
opts='
"-f"
"-t", "--truncate",action="store_true"
'
echo "$(./bashparse.py "$usage" "$opts" "$#")"
eval "$(./bashparse.py "$usage" "$opts" "$#")"
echo
echo OUTPUT
echo $f
echo $*
echo $0 $2
Which, if run as: ./run.sh one -f 'a_filename.txt' "two' still two" three outputs the following (notice that the internal positional variables are still correct):
f=a_filename.txt
set -- one 'two'"'"' still two' three
OUTPUT
a_filename.txt
one two' still two three
./run.sh two' still two
Disregarding the debugging output, you're looking at approximately four lines to construct a powerful argument parser. Thoughts?

The original premise of my question assumes that delegating to Python is the right approach to simplify argument parsing. If we drop the language requirement we can actually do a decent job* in Bash, using getopts and a little eval magic:
main() {
    local _usage='foo [-a] [-b] [-f val] [-v val] [args ...]'
    eval "$(parse_opts 'f:v:ab')"
    echo "f=$f v=$v a=$a b=$b -- $#: $*"
}

main "$@"
The implementation of parse_opts is in this gist, but the basic approach is to convert options into local variables which can then be handled like normal. All the standard getopts boilerplate is hidden away, and error handling works as expected.
Because it uses local variables within a function, parse_opts is not just useful for command line arguments, it can be used with any function in your script.
* I say "decent job" because Bash's getopts is a fairly limited parser and only supports single-letter options. Elegant, expressive CLIs are still better implemented in other languages like Python. But for reasonably small functions or scripts this provides a nice middle ground without adding too much complexity or bloat.


How can we execute the following bash commands in python linux [duplicate]

On my local machine, I run a python script which contains this line
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
os.system(bashCommand)
This works fine.
Then I run the same code on a server and I get the following error message
'import site' failed; use -v for traceback
Traceback (most recent call last):
File "/usr/bin/cwm", line 48, in <module>
from swap import diag
ImportError: No module named swap
So what I did then was insert a print bashCommand statement, which prints the command in the terminal before running it with os.system().
Of course I get the error again (caused by os.system(bashCommand)), but before that error it prints the command in the terminal. Then I just copied that output, pasted it into the terminal, hit Enter, and it works...
Does anyone have a clue what's going on?
Don't use os.system. It has been deprecated in favor of subprocess. From the docs: "This module intends to replace several older modules and functions: os.system, os.spawn".
Like in your case:
import subprocess
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
# Note: without shell=True the "> test.nt" redirection is not performed by a
# shell; it is passed to cwm as two extra arguments (see the answers below).
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.
Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
Understand and probably use text=True, aka universal_newlines=True.
Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
Understand differences between sh and Bash
Understand how a subprocess is separate from its parent, and generally cannot change the parent.
Avoid running the Python interpreter as a subprocess of Python.
These topics are covered in some more detail below.
Prefer subprocess.run() or subprocess.check_call()
The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.
Here's a paragraph from the documentation:
The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.
Unfortunately, the availability of these wrapper functions differs between Python versions.
subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode
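As a rough illustration of those equivalences (ls is just a stand-in command):

import subprocess

# check_output() style: capture stdout, raise on failure
listing = subprocess.run(['ls', '-l'], check=True,
                         stdout=subprocess.PIPE).stdout

# check_call() style: raise on failure, output goes to the terminal
subprocess.run(['ls', '-l'], check=True)

# call() style: just collect the exit status
status = subprocess.run(['ls', '-l']).returncode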
High-level API vs subprocess.Popen()
The refactored and extended subprocess.run() is more logical and more versatile than the older legacy functions it replaces. It returns a CompletedProcess object which has various methods which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.
subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.
It should perhaps be emphasized that just subprocess.Popen() merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
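A minimal sketch of that background-process pattern (sleep stands in for real work):

import subprocess

# Popen returns immediately; the child keeps running on its own.
proc = subprocess.Popen(['sleep', '5'])

# ... Python is free to do other work here, in parallel with the child ...

# Eventually reap the child and collect its exit status.
returncode = proc.wait()
print('child finished with status', returncode)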
Avoid os.system() and os.popen()
Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
The problems with system() are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).
PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.
os.popen() used to be even more strongly discouraged:
Deprecated since version 2.6: This function is obsolete. Use the subprocess module.
However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.
Understand and usually use check=True
You'll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.
In practice, with check=True or subprocess.check_*, Python will throw a CalledProcessError exception if the subprocess returns a nonzero exit status.
A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.
On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)
All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
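For example, a sketch of making that decision explicit; grep exits with a nonzero status when it finds no match, which you may or may not consider an error:

import subprocess

try:
    subprocess.run(['grep', 'needle', 'haystack.txt'], check=True)
except subprocess.CalledProcessError as exc:
    # Decide here whether a nonzero status is a real failure or an
    # expected outcome (for grep, 1 just means "no match").
    print('grep returned', exc.returncode)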
Understand and probably use text=True aka universal_newlines=True
Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.
(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)
Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.
With text=True, you tell Python that you, in fact, expect back textual data in the system's default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python's ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)
If that's not what you request back, Python will just give you bytes strings in the stdout and stderr strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.
normal = subprocess.run([external, arg],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
check=True,
text=True)
print(normal.stdout)
convoluted = subprocess.run([external, arg],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode('utf-8'))
Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.
Understand shell=True vs shell=False
With shell=True you pass a single string to your shell, and the shell takes it from there.
With shell=False you pass a list of arguments to the OS, bypassing the shell.
When you don't have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.
On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.
A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.
# XXX AVOID THIS BUG
buggy = subprocess.run('dig +short stackoverflow.com')
# XXX AVOID THIS BUG TOO
broken = subprocess.run(['dig', '+short', 'stackoverflow.com'],
shell=True)
# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(['dig +short stackoverflow.com'],
shell=True)
correct = subprocess.run(['dig', '+short', 'stackoverflow.com'],
# Probably don't forget these, too
check=True, text=True)
# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run('dig +short stackoverflow.com',
shell=True,
# Probably don't forget these, too
check=True, text=True)
The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
To briefly recap, correct usage looks like
subprocess.run("string for 'the shell' to parse", shell=True)
# or
subprocess.run(["list", "of", "tokenized strings"]) # shell=False
If you want to avoid the shell but are too lazy or unsure of how to parse a string into a list of tokens, notice that shlex.split() can do this for you.
subprocess.run(shlex.split("no string for 'the shell' to parse")) # shell=False
# equivalent to
# subprocess.run(["no", "string", "for", "the shell", "to", "parse"])
The regular split() will not work here, because it doesn't preserve quoting. In the example above, notice how "the shell" is a single string.
Refactoring Example
Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably just be translated to Python instead.
To partially illustrate this, here is a typical but slightly silly example which involves many shell features.
cmd = '''while read -r x;
do ping -c 3 "$x" | grep 'min/avg/max'
done <hosts.txt'''
# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)
print(results.stdout)
# Reimplement with shell=False
with open('hosts.txt') as hosts:
    for host in hosts:
        host = host.rstrip('\n')  # drop newline
        ping = subprocess.run(
            ['ping', '-c', '3', host],
            text=True,
            stdout=subprocess.PIPE,
            check=True)
        for line in ping.stdout.split('\n'):
            if 'min/avg/max' in line:
                print('{}: {}'.format(host, line))
Some things to note here:
With shell=False you don't need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.
The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit but the Python code is rather verbose and arguably looks more complex than this really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)
Common Shell Constructs
For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.
Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir('.'): if not file.endswith('.png'): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100} as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user)
Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ['SHELL'] (the meaning of export is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that's one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ['HOME']) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep 'foo' <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code.
Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library or a number of more modern and versatile third-party competitors).
Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to 'ls' '-l' '/' but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
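If you need to split or build shell command strings from Python while respecting these quoting rules, the shlex module is the usual tool; a small sketch:

import shlex

# Split a command line the way the shell would, preserving quoted words.
print(shlex.split("grep 'foo bar' /some/file"))
# -> ['grep', 'foo bar', '/some/file']

# Quote a single value so the shell will treat it as one word.
name = "file with spaces.txt"
print("ls -l " + shlex.quote(name))
# -> ls -l 'file with spaces.txt'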
Understand differences between sh and Bash
subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.
If you need to use Bash-only syntax, you can
pass in the path to the shell as executable='/bin/bash' (where of course if your Bash is installed somewhere else, you need to adjust the path).
subprocess.run('''
# This for loop syntax is Bash only
for((i=1;i<=$#;i++)); do
# Arrays are Bash-only
array[i]+=123
done''',
shell=True, check=True,
executable='/bin/bash')
A subprocess is separate from its parent, and cannot change it
A somewhat common mistake is doing something like
subprocess.run('cd /tmp', shell=True)
subprocess.run('pwd', shell=True) # Oops, doesn't print /tmp
The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.
A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.
The immediate fix in this particular case is to run both commands in a single subprocess;
subprocess.run('cd /tmp; pwd', shell=True)
though obviously this particular use case isn't very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via
os.environ['foo'] = 'bar'
or pass an environment setting to a child process with
subprocess.run('echo "$foo"', shell=True, env={'foo': 'bar'})
(not to mention the obvious refactoring subprocess.run(['echo', 'bar']); but echo is a poor example of something to run in a subprocess in the first place, of course).
Don't run Python from Python
This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.
If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)
If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)
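A tiny multiprocessing sketch of that advice, rather than spawning a second Python interpreter as a subprocess:

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == '__main__':
    # The work runs in worker processes; no subprocess or re-import dance.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))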
Call it with subprocess
import subprocess
subprocess.Popen("cwm --rdf test.rdf --ntriples > test.nt")
The error you are getting seems to be because there is no swap module on the server, you should install swap on the server then run the script again
You can also invoke the bash program explicitly, using the -c parameter to execute the command:
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
output = subprocess.check_output(['bash','-c', bashCommand])
You can use subprocess, but I always felt that it was not a 'Pythonic' way of doing it. So I created Sultan (shameless plug) that makes it easy to run command line functions.
https://github.com/aeroxis/sultan
You can also use os.popen.
Example:
import os
command = os.popen('ls -al')
print(command.read())
print(command.close())
Output:
total 16
drwxr-xr-x 2 root root 4096 ago 13 21:53 .
drwxr-xr-x 4 root root 4096 ago 13 01:50 ..
-rw-r--r-- 1 root root 1278 ago 13 21:12 bot.py
-rw-r--r-- 1 root root 77 ago 13 21:53 test.py
None
According to the error you are missing a package named swap on the server. This /usr/bin/cwm requires it. If you're on Ubuntu/Debian, install python-swap using aptitude.
To run the command without a shell, pass the command as a list and implement the redirection in Python using the subprocess module:
#!/usr/bin/env python
import subprocess
with open('test.nt', 'wb', 0) as file:
    subprocess.check_call("cwm --rdf test.rdf --ntriples".split(),
                          stdout=file)
Note: no > test.nt at the end. stdout=file implements the redirection.
To run the command using the shell in Python, pass the command as a string and enable shell=True:
#!/usr/bin/env python
import subprocess
subprocess.check_call("cwm --rdf test.rdf --ntriples > test.nt",
shell=True)
Here the shell is responsible for the output redirection (> test.nt is part of the command).
To run a bash command that uses bashisms, specify the bash executable explicitly e.g., to emulate bash process substitution:
#!/usr/bin/env python
import subprocess
subprocess.check_call('program <(command) <(another-command)',
shell=True, executable='/bin/bash')
Copy and paste this:
from typing import Any
import subprocess

def run_bash_command(cmd: str) -> Any:
    # Capture stderr as well, so the error check below actually sees something.
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, error = process.communicate()
    if error:
        raise Exception(error)
    return output
subprocess.Popen() is preferred over os.system() as it offers more control and visibility. However, if you find subprocess.Popen() too verbose or complex, peasyshell is a small wrapper I wrote on top of it, which makes it easy to interact with bash from Python.
https://github.com/davidohana/peasyshell
The pythonic way of doing this is to use subprocess.Popen.
subprocess.Popen takes a list where the first element is the command to be run, followed by any command-line arguments.
As an example:
import subprocess

args = ['echo', 'Hello!']
subprocess.Popen(args)  # same as running `echo Hello!` on the command line

args2 = ['echo', '-v', 'Hello Again']
subprocess.Popen(args2)  # same as running `echo -v "Hello Again"` on the command line

Using argparse, how can I process a "chdir" argument before fromfile expansion?

I want to support a sub-command CLI model, like that used by git. The particular bit I'm having trouble with is the "change directory" option. Like git, I want a -C DIR option which will have the program change to the specified directory before doing the sub-command. Not really a problem, using sub-parsers, BUT I also want to use the argparse.ArgumentParser(fromfile_prefix_chars='#') mechanism after the -C DIR argument is applied during parsing.
Here's the rub: fromfile argument expansion is performed by argparse before all other argument processing. Thus, any such fromfile arguments must either use absolute paths, or paths relative to the CWD at time the parser is invoked. I don't want absolute paths; I "need" to use fromfile paths that are relative to the -C DIR option. I wrote my own class ChdirAction(argparse.Action) to do the obvious. It worked fine, but since fromfile arguments were already expanded, it didn't give me what I want. (After discovering this not-what-I-want behavior, I looked at python3.5/argparse.py and found the same frustration embedded in cold, hard, unforgiving code.)
Here's a directory diagram that might help explain what I want:
/ foo / aaa / iii / arg.txt
      |     |
      |     + jjj / arg.txt
      |     |
      |     + arg.txt
      |
      + bbb / iii / arg.txt
            |
            + jjj / arg.txt
Consider when the CWD is either aaa or bbb at the time command line arguments are parsed. If I run with something like prog -C ./iii #arg.txt
I want the parser to expand #arg.txt with arguments from /foo/aaa/iii/arg.txt. What actually happens is that fromfile expands from the contents of /foo/aaa/arg.txt. When CWD is /foo/aaa this is the "wrong" file; when /foo/bbb it raises "error: [Errno 2] No such file or directory: 'arg.txt'"
More generally, prog -C ./DIR #arg.txt should expand from /foo/aaa/DIR/arg.txt, and this should work even when the fromfile path has "up-directory" parts, e.g. prog -C ./iii #../arg.txt should expand from /foo/aaa/arg.txt.
If this behavior can be made to happen, then I could -C DIR to any of {aaa,bbb}/{iii,jjj} and obtain consistent behaviour from a common command line construction.
As described, my problem isn't much of a problem. If I can provide the -C DIR, to be realized by an os.chdir(DIR) after argument parsing, then I can also construct appropriate fromfile arguments. They could be either absolute or relative to the CWD at parsing (prior to any -C DIR taking effect). This might look like:
cd /foo/aaa; prog -C ./DIR #arg.txt #./DIR/arg.txt
I don't like it, but it would be okay. The REAL problem is that the actual change-directory argument I'm using is more like -C PATTERN. In my real problem case, PATTERN could be a simple path (absolute or relative). Or, it might be a glob pattern, or a partial name that has "non-trivial" resolution logic to find the actual directory for os.chdir(DIR). In this case (which I am struggling with), I can't have the invoker of the program resolve the actual location of the fromfile path.
Actually, I could, but that would put an inappropriate burden on the invoker. AND, when that invoker is an Eclipse launcher, I don't really have the control-flow power necessary to do it. So, it's back to having the program take care of its own needs; a nicer abstraction, but how do I implement it?
Even as I was fleshing out the question, I came up with an idea. So I tried it out and it's kinda, sorta, okay(ish). I can get a constrained version of what I really want, but it's good enough for me (for now), so I thought I might as well share. It might be good enough for you, too. Even better, it might elicit a true solution from somewhere, maybe S.Bethard?
My hack is to do parsing in two phases: the first, is just enough to get the -C PATTERN argument by way of ArgumentParser.parse_known_args(...) without enabling the fromfile mechanism. If the result of that first (minimal) parsing yields a directory change argument, then I process it. The program aborts if more than a single -C PATTERN was specified, or the PATTERN can't be unambiguously resolved.
Then, I use a completely separate ArgumentParser object, configured with the full set of argument specifications that I actually want and parse it with the fromfile mechanism enabled.
There is some monkey business to get the --help argument to work (setting the proper conflict resolution policy, then merely accepting the arg in the first parser just to pass along to the second, which actually has all the "real" argument specs). Also, the first parser should support the same verbose/quiet options that the second one does, honoring their setting and also passing along from first to second parser.
Here's a simplified version of my application-level arg parser method. It doesn't support verbose/quiet options at the first parser stage. I've elided the complexity of how a -C PATTERN is resolved to an actual directory. Also, I cut out the majority of the second parser's argument specification, leaving just the second parser's -C PATTERN argument (needed for --help output).
NOTE: Both parsers have a -C PATTERN argument. In the chdirParser it is meaningful; in the argParser it's present only so it will show up in the help output. Something similar should be done for the verbose/quiet options - probably not that tricky, but it's not (yet) important to me, so I don't mind always reporting a change of directory, even in quiet mode.
def cli_args_from_argv():
    import argparse
    import glob
    import os
    import sys

    chdirParser = argparse.ArgumentParser(conflict_handler='resolve')
    chdirParser.add_argument("-C", dest="chdir_pattern", action="append", default=None)
    chdirParser.add_argument("--help", "-h", dest="help", action="store_true", default=False)
    (partial, remainder) = chdirParser.parse_known_args()

    if partial.help:
        remainder = ['--help']
    elif partial.chdir_pattern:
        if len(partial.chdir_pattern) > 1:
            print(r'Too many -C options - at most one may be given, but received: {!r}'.format(partial.chdir_pattern), file=sys.stderr)
            sys.exit(1)
        pattern = partial.chdir_pattern[0]
        resolved_dir = pattern
        if os.path.exists(resolved_dir):
            resolved_dir = pattern
        else:
            ### ELIDED: resolution of pattern into an unambiguous and existing directory
            pass
        if not resolved_dir:
            print("Failed to resolve -C {!r}".format(pattern), file=sys.stderr)
            sys.exit(1)
        print("Changing to directory: {!r}".format(resolved_dir))
        print("")
        os.chdir(resolved_dir)

    argParser = argparse.ArgumentParser(usage="usage: PROG [common-args] SUBCMD [subcmd-args]", fromfile_prefix_chars=':')
    ### ELIDED: a bunch of add_argument(...) calls
    argParser.add_argument("-C", dest="chdir_spec", action="store", default=None, help="Before anything else, chdir to SPEC", metavar="SPEC")
    return argParser.parse_args(args=remainder)
I have a feeling that there's probably a better way... Do you know?
I think the resolve bit can be replaced with
chdirParser = argparse.ArgumentParser(add_help=False)
and omit the -h definition. Let the second parser handle sys.argv unchanged (since you are including, but ignoring, the -C argument there).
That append and test for len(partial.chdir_pattern) > 1 only matters if you expect the user to give several -C dir1 ... -C dir2 options. The alternative is to use the default store action, which ends up saving the last of those repetitions. Why might the user repeat the -C, and why should you care? Usually we just ignore repetitions.
You might replace
print("Failed to resolve -C {!r}".format(pattern), file=sys.stderr)
sys.exit(1)
with
parser.error("Failed to resolve -C {!r}".format(pattern))
It prints the usage (with only -C) and does a sys.exit(2). Not quite the same, but may be close enough.
For the second parser, the -C might be simplified (using defaults):
argParser.add_argument("-C", "--chdir-spec", help="Before anything else, chdir to SPEC", metavar="SPEC")
And use the full sys.argv.
return argParser.parse_args()
Otherwise, using 2 parsers makes sense, since the fromfile is present in the changed directory (and you want to ignore any such file in the initial directory).
I thought maybe a :arg.txt string on the command line would give problems in the first parser. But with parse_known_args it will just be treated as an unknown positional. But the proof's in the testing.
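Pulling those suggestions together, a condensed sketch of the two-parser approach might look like this (pattern resolution is still elided, as in the question, and the names mirror the question's code):

import argparse
import os

def cli_args_from_argv():
    # First pass: only -C, no -h/--help, tolerate everything else.
    chdirParser = argparse.ArgumentParser(add_help=False)
    chdirParser.add_argument("-C", dest="chdir_pattern", default=None)
    partial, _ = chdirParser.parse_known_args()

    if partial.chdir_pattern:
        resolved_dir = partial.chdir_pattern   # ELIDED: pattern resolution
        if not os.path.isdir(resolved_dir):
            chdirParser.error("Failed to resolve -C {!r}".format(partial.chdir_pattern))
        os.chdir(resolved_dir)

    # Second pass: the real parser; fromfile expansion now happens relative
    # to the new working directory, and it re-parses the full command line.
    argParser = argparse.ArgumentParser(fromfile_prefix_chars=':')
    argParser.add_argument("-C", dest="chdir_spec", metavar="SPEC",
                           help="Before anything else, chdir to SPEC")
    # ELIDED: the rest of the add_argument(...) calls
    return argParser.parse_args()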

Python subprocess module with pre-populated environment

Question can be related to Use python subprocess module like a command line simulator
I have written some infrastructure code called my_shell, to which you can pass my application's shell commands. It looks like this:
class ApplicationTestShell(object):
    def __init__(self):
        '''
        Constructor
        '''
        self.play_ground_dir = "/var/tmp/MyAppDir"
        ensure_dir_exists_and_empty(self.play_ground_dir)

    def execute_command(self, command, on_success = None, on_failure = None):
        p = create_shell_process(self, self.play_ground_dir)
        sout, serr = p.communicate(input = command)
        if p.returncode == 0:
            on_success(sout)
        else:
            on_failure(serr)

    def create_shell_process(self, cwd):
        return Popen("/bin/bash", env= {WHAT DO I DO HERE?}, cwd = test_dir, stdout=PIPE, stderr=PIPE, stdin=PIPE)
The interesting bit to me here is the env parameter. Python expects a 'map'-like data structure of all the environment variables. My application requires several variables to be set and exported. The script for setting and exporting them is generated by running, say, '/bin/appload myapp' (assume appload is always available on the path). What I do currently is the following when I call p.communicate:
p.communicate(input = "eval `/bin/appload myapp`;" + command)
So basically before running the command I call the infrastructure setup.
Is there any way to do this in a better fashion in Python. I somehow want to push the eval /bin/appload part to the env parameter on the Popen class OR as part of the shell creation process.
What are the problems with my current implementation? (I feel it is hacky but I may be wrong)
It depends on how /bin/appload myapp works. If it only guarantees that it will output bash syntax, then parsing that output in Python in order to construct the environment object there is almost certainly more trouble than it's worth (you might need to support parameter and variable expansion, subshells, process substitution, etc, etc). On the other hand, if you are sure that /bin/appload myapp will only ever output lines of the form "VARIABLENAME=someword", then that's pretty trivial to parse in Python and you could move it into your Python code if you like.
There are an awful lot of different directions you could go with these requirements; you could capture the output of appload myapp into a tempfile and set the subprocess's $BASH_ENV to that filename; that would cause the shell to source your environment setup before running your command in a way that some might consider cleaner. You could give your command (with the eval-ing prefix) as the first argument to Popen and pass shell=True, and let Popen do the bash invocation on its own (setting $SHELL explicitly to bash if necessary). You could use bash's -c option to specify the code to run on the command line rather than via stdin. You could have a multi-tiered approach by invoking a shell from Python which eval's the appload myapp environment and then exec's another shell underneath it, so that the first doesn't show up in ps listings and the command given to create_shell_process has the shell all to itself (although that shouldn't really matter). You could do a lot of things, depending on what your concerns are with respect to how the shell is invoked, how it looks in ps listings, whether you want your command to still be run if the appload myapp output produces an error when eval'd, etc. But for a general solution, I think what you have is perfectly fine.
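If appload really does emit only simple NAME=value lines (an assumption you would have to verify), folding it into the env parameter could look roughly like this:

import os
from subprocess import check_output

def appload_environment():
    # Run '/bin/appload myapp' and merge its NAME=value output into a copy
    # of the current environment, suitable for Popen's env parameter.
    env = dict(os.environ)
    output = check_output(['/bin/appload', 'myapp'], universal_newlines=True)
    for line in output.splitlines():
        line = line.strip()
        if line.startswith('export '):
            line = line[len('export '):]
        if '=' in line and not line.startswith('#'):
            name, _, value = line.partition('=')
            # NOTE: values that are shell-quoted would need extra handling here.
            env[name] = value
    return env

# e.g. Popen("/bin/bash", env=appload_environment(), cwd=self.play_ground_dir, ...)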
I don't see any real problems with the implementation, besides cosmetic things or minor things that probably only came from copying and pasting the code: create_shell_process doesn't use its cwd parameter, and the on_success and on_failure parameters look like they're optional but the defaults will break things (you can't call None).

How does argparse (and the deprecated optparse) respond to 'tab' keypress after python program name, in bash?

I have tested optcomplete working with the optparse module. Its example is a simple file so I could get that working. I also tested it using the argparse module as the prior one is deprecated. But I really do not understand how and by whom the python program gets called on tab presses. I suspect bash together with the shebang line and the argparse (or optparse) module are involved in some way. I have been trying to figure this out (now gonna read the source code).
I have a somewhat more complex program structure, which includes a wrapper around the piece of code that handles the arguments: the argparse.ArgumentParser() instantiation and the calls to add_argument() (which are subclassed into another intermediate module to avoid duplicating code, with a wrapper around that being what actually gets called) are inside a function.
I want to understand the way this tab completion works between bash and python (or for that matter any other interpretor like perl).
NOTE: I have a fair understanding of bash completion (which I learned just now), and I think I understand the bash(only) custom completion.
NOTE: I have read other similar SO questions, and none really answer this Q.
Edit: Here is the bash function.
I already understood how the python module gets to know about the words typed on the command line: by reading the os.environ values of the variables
$COMP_WORDS
$COMP_CWORD
$COMP_LINE
$COMP_POINT
$COMPREPLY
These variables have values only on tab press.
My question is: how does the python module get triggered?
To understand what's happening here, let's check what that bash function actually does:
COMPREPLY=( $( \
COMP_LINE=$COMP_LINE COMP_POINT=$COMP_POINT \
COMP_WORDS="${COMP_WORDS[*]}" COMP_CWORD=$COMP_CWORD \
OPTPARSE_AUTO_COMPLETE=1 $1 ) )
See the $1 at the end? That means that it actually calls the Python file we want to execute with special environment variables set! To trace what's happening, let's prepare a little script to intercept what optcomplete.autocomplete does:
#!/usr/bin/env python2
import os, sys
import optparse, optcomplete
from cStringIO import StringIO

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-s', '--simple', action='store_true',
                      help="Simple really simple option without argument.")
    parser.add_option('-o', '--output', action='store',
                      help="Option that requires an argument.")
    opt = parser.add_option('-p', '--script', action='store',
                            help="Option that takes python scripts args only.")
    opt.completer = optcomplete.RegexCompleter('.*\.py')

    # debug env variables
    sys.stderr.write("\ncalled with args: %s\n" % repr(sys.argv))
    for k, v in sorted(os.environ.iteritems()):
        sys.stderr.write("  %s: %s\n" % (k, v))

    # setup capturing the actions of `optcomplete.autocomplete`
    def fake_exit(i):
        sys.stderr.write("autocomplete tried to exit with status %d\n" % i)

    sys.stdout = StringIO()
    sys.exit = fake_exit

    # Support completion for the command-line of this script.
    optcomplete.autocomplete(parser, ['.*\.tar.*'])

    sys.stderr.write("autocomplete tried to write to STDOUT:\n")
    sys.stderr.write(sys.stdout.getvalue())
    sys.stderr.write("\n")

    opts, args = parser.parse_args()
This gives us the following when we try to autocomplete it:
$ ./test.py [tab]
called with args: ['./test.py']
...
COMP_CWORD: 1
COMP_LINE: ./test.py
COMP_POINT: 10
COMP_WORDS: ./test.py
...
OPTPARSE_AUTO_COMPLETE: 1
...
autocomplete tried to exit with status 1
autocomplete tried to write to STDOUT:
-o -h -s -p --script --simple --help --output
So optcomplete.autocomplete just reads the environment, prepares the matches, writes them to STDOUT and exits. The result -o -h -s -p --script --simple --help --output is then put into a bash array (COMPREPLY=( ... )) and returned to bash to present the choices to the user. No magic involved :)

command line arg parsing through introspection

I'm developing a management script that does a fairly large amount of work via a plethora of command-line options. The first few iterations of the script have used optparse to collect user input and then just run down the page, testing the value of each option in the appropriate order, and doing the action if necessary. This has resulted in a jungle of code that's really hard to read and maintain.
I'm looking for something better.
My hope is to have a system where I can write functions in more or less normal python fashion, and then when the script is run, have options (and help text) generated from my functions, parsed, and executed in the appropriate order. Additionally, I'd REALLY like to be able to build django-style sub-command interfaces, where myscript.py install works completely separately from myscript.py remove (separate options, help, etc.)
I've found simon willison's optfunc and it does a lot of this, but seems to just miss the mark — I want to write each OPTION as a function, rather than try to compress the whole option set into a huge string of options.
I imagine an architecture involving a set of classes for major functions, and each defined method of the class corresponding to a particular option in the command line. This structure provides the advantage of having each option reside near the functional code it modifies, easing maintenance. The thing I don't know quite how to deal with is the ordering of the commands, since the ordering of class methods is not deterministic.
Before I go reinventing the wheel: Are there any other existing bits of code that behave similarly? Other things that would be easy to modify? Asking the question has clarified my own thinking on what would be nice, but feedback on why this is a terrible idea, or how it should work would be welcome.
Don't waste time on "introspection".
Each "Command" or "Option" is an object with two sets of method functions or attributes.
Provide setup information to optparse.
Actually do the work.
Here's the superclass for all commands
class Command( object ):
    name= "name"

    def setup_opts( self, parser ):
        """Add any options to the parser that this command needs."""
        pass

    def execute( self, context, options, args ):
        """Execute the command in some application context with some options and args."""
        raise NotImplementedError
You create subclasses for Install and Remove and every other command you need.
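For instance, a hypothetical Install subclass (the --prefix option is purely illustrative) could look like this.

class Install( Command ):
    name= "install"

    def setup_opts( self, parser ):
        parser.add_option( "--prefix", dest="prefix", default="/usr/local",
            help="Installation prefix" )

    def execute( self, context, options, args ):
        # ... actually install things under options.prefix ...
        return 0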
Your overall application looks something like this.
commands = [
    Install(),
    Remove(),
]

def main():
    parser= optparse.OptionParser()
    for c in commands:
        c.setup_opts( parser )
    options, args = parser.parse_args()
    command= None
    for c in commands:
        if c.name.startswith(args[0].lower()):
            command= c
            break
    if command:
        status= command.execute( context, options, args[1:] )
    else:
        logger.error( "Command %r is unknown", args[0] )
        status= 2
    sys.exit( status )
The WSGI library werkzeug provides Management Script Utilities which may do what you want, or at least give you a hint how to do the introspection yourself.
from werkzeug import script

# actions go here
def action_test():
    "sample with no args"
    pass

def action_foo(name=2, value="test"):
    "do some foo"
    pass

if __name__ == '__main__':
    script.run()
Which will generate the following help message:
$ python /tmp/test.py --help
usage: test.py <action> [<options>]
       test.py --help

actions:
  foo:
    do some foo

    --name        integer    2
    --value       string     test

  test:
    sample with no args
An action is a function in the same module starting with "action_" which takes a number of arguments where every argument has a default. The type of the default value specifies the type of the argument.
Arguments can then be passed by position or using --name=value from the shell.
