I want to support a sub-command CLI model, like the one used by git. The particular bit I'm having trouble with is the "change directory" option. Like git, I want a -C DIR option which will have the program change to the specified directory before running the sub-command. Not really a problem using sub-parsers, BUT I also want the argparse.ArgumentParser(fromfile_prefix_chars='#') mechanism to take effect after the -C DIR argument is applied during parsing.
Here's the rub: fromfile argument expansion is performed by argparse before all other argument processing. Thus, any such fromfile arguments must either use absolute paths, or paths relative to the CWD at the time the parser is invoked. I don't want absolute paths; I "need" fromfile paths that are relative to the -C DIR option. I wrote my own class ChdirAction(argparse.Action) to do the obvious. It worked fine, but since fromfile arguments were already expanded, it didn't give me what I want. (After discovering this not-what-I-want behavior, I looked at python3.5/argparse.py and found the same frustration embedded in cold, hard, unforgiving code.)
Here's a directory diagram that might help explain what I want:
/ foo / aaa / iii / arg.txt
      |     |
      |     + jjj / arg.txt
      |     |
      |     + arg.txt
      |
      + bbb / iii / arg.txt
            |
            + jjj / arg.txt
Consider when the CWD is either aaa or bbb at the time command line arguments are parsed. If I run with something like prog -C ./iii #arg.txt
I want the parser to expand #arg.txt with arguments from /foo/aaa/iii/arg.txt. What actually happens is that fromfile expands from the contents of /foo/aaa/arg.txt. When CWD is /foo/aaa this is the "wrong" file; when /foo/bbb it raises "error: [Errno 2] No such file or directory: 'arg.txt'"
More generally, prog -C ./DIR #arg.txt should expand from /foo/aaa/DIR/arg.txt, and it should work even when the fromfile path has "up-directory" parts, e.g. prog -C ./iii #../arg.txt should expand from /foo/aaa/arg.txt.
If this behavior can be made to happen, then I could -C DIR to any of {aaa,bbb}/{iii,jjj} and obtain consistent behaviour from a common command line construction.
As described, my problem isn't much of a problem. If I can provide the -C DIR, to be realized by an os.chdir(DIR) after argument parsing, then I can also construct appropriate fromfile arguments. They could be either absolute or relative to the CWD at parsing time (prior to any -C DIR taking effect). This might look like:
cd /foo/aaa; prog -C ./DIR #arg.txt #./DIR/arg.txt
I don't like it, but it would be okay. The REAL problem is that the actual change-directory argument I'm using is more like -C PATTERN. In my real problem case, PATTERN could be a simple path (absolute or relative). Or, it might be a glob pattern, or a partial name that has "non-trivial" resolution logic to find the actual directory for os.chdir(DIR). In this case (which I am struggling with), I can't have the invoker of the program resolve the actual location of the fromfile path.
Actually, I could, but that would put an inappropriate burden on the invoker. AND, when that invoker is an Eclipse launcher, I don't really have the control-flow power necessary to do it. So, it's back to having the program take care of its own needs; a nicer abstraction, but how do I implement it?
Even as I was fleshing out the question, I came up with an idea. So I tried it out and it's kinda, sorta, okay(ish). I can get a constrained version of what I really want, but it's good enough for me (for now), so I thought I might as well share. It might be good enough for you, too. Even better, it might elicit a true solution from somewhere, maybe S.Bethard?
My hack is to do parsing in two phases: the first, is just enough to get the -C PATTERN argument by way of ArgumentParser.parse_known_args(...) without enabling the fromfile mechanism. If the result of that first (minimal) parsing yields a directory change argument, then I process it. The program aborts if more than a single -C PATTERN was specified, or the PATTERN can't be unambiguously resolved.
Then, I use a completely separate ArgumentParser object, configured with the full set of argument specifications that I actually want and parse it with the fromfile mechanism enabled.
There is some monkey business to get the --help argument to work (setting the proper conflict resolution policy, then merely accepting the arg in the first parser just to pass along to the second, which actually has all the "real" argument specs). Also, the first parser should support the same verbose/quiet options that the second one does, honoring their setting and also passing along from first to second parser.
Here's a simplified version of my application-level arg parser method. It doesn't support verbose/quiet options at the first parser stage. I've elided the complexity of how a -C PATTERN is resolved to an actual directory. Also, I cut out the majority of the second parser's argument specification, leaving just the second parser's -C PATTERN argument (needed for --help output).
NOTE: Both parsers have a -C PATTERN argument. In the chdirParser it is meaningful; in the argParser it's present only so it will show up in the help output. Something similar should be done for verbose/quiet options - probably not that tricky, but it's not (yet) important to me, so I don't mind always reporting a change of directory, even in quiet mode.
def cli_args_from_argv():
    import argparse
    import glob  # used by the elided pattern resolution
    import os
    import sys

    chdirParser = argparse.ArgumentParser(conflict_handler='resolve')
    chdirParser.add_argument("-C", dest="chdir_pattern", action="append", default=None)
    chdirParser.add_argument("--help", "-h", dest="help", action="store_true", default=False)
    (partial, remainder) = chdirParser.parse_known_args()
    if partial.help:
        remainder = ['--help']
    elif partial.chdir_pattern:
        if len(partial.chdir_pattern) > 1:
            print('Too many -C options - at most one may be given, but received: {!r}'.format(partial.chdir_pattern), file=sys.stderr)
            sys.exit(1)
        pattern = partial.chdir_pattern[0]
        if os.path.exists(pattern):
            resolved_dir = pattern
        else:
            resolved_dir = None
            ### ELIDED: resolution of pattern into an unambiguous and existing directory
        if not resolved_dir:
            print("Failed to resolve -C {!r}".format(pattern), file=sys.stderr)
            sys.exit(1)
        print("Changing to directory: {!r}".format(resolved_dir))
        print("")
        os.chdir(resolved_dir)

    argParser = argparse.ArgumentParser(usage="usage: PROG [common-args] SUBCMD [subcmd-args]", fromfile_prefix_chars=':')
    ### ELIDED: a bunch of add_argument(...) calls
    argParser.add_argument("-C", dest="chdir_spec", action="store", default=None, help="Before anything else, chdir to SPEC", metavar="SPEC")
    return argParser.parse_args(args=remainder)
I have a feeling that there's probably a better way... Do you know?
I think the resolve bit can be replaced with
chdirParser = argparse.ArgumentParser(add_help=False)
and omit the -h definition and save. Let the second parser handle sys.argv unchanged (since you are including (but ignoring) the -C argument).
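For instance, a minimal sketch of that first-pass parser (the argument names follow the question; the sample argv is made up):

```python
import argparse

# First pass: only knows about -C; with add_help=False, -h/--help are
# not consumed here and simply fall through to the remainder.
chdirParser = argparse.ArgumentParser(add_help=False)
chdirParser.add_argument("-C", dest="chdir_pattern", action="append")

partial, remainder = chdirParser.parse_known_args(["-C", "iii", "--help", "subcmd"])
print(partial.chdir_pattern)  # ['iii']
print(remainder)              # ['--help', 'subcmd']
```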
That append and test for len(partial.chdir_pattern) > 1 make sense if you expect the user to give several -C dir1 ... -C dir2 ... options. The alternative is to use the default store action, which ends up saving the last of those repetitions. Why might the user repeat the -C, and why should you care? Usually we just ignore repetitions.
You might replace
print("Failed to resolve -C {!r}".format(pattern), file=sys.stderr)
sys.exit(1)
with
parser.error("Failed to resolve -C {!r}".format(pattern))
It prints the usage (with only -C) and does a sys.exit(2). Not quite the same, but may be close enough.
For the second parser, the -C might be simplified (using defaults):
argParser.add_argument("-C", "--chdir-spec", help="Before anything else, chdir to SPEC", metavar="SPEC")
And use the full sys.argv.
return argParser.parse_args()
Otherwise, using 2 parsers makes sense, since the fromfile is present in the changed directory (and you want to ignore any such file in the initial directory).
I thought maybe a :arg.txt string on the command line would give problems in the first parser. But with parse_known_args it will just be treated as an unknown positional. The proof's in the testing, though.
Is there a way to parse only a limited number of switches in a function using argparse? Say, my command is:
python sample.py -t abc -r dfg -h klm -n -p qui
And I want argparse to parse from -t to -h and leave the remaining, also show help for these only.
Next I want to parse any switch after -h into another function and see corresponding help there.
Is this behavior possible in argparse? Also, is there a way I can modify the sys.argv it uses internally?
Thanks.
python sample.py -t abc -r dfg -h klm -n -p qui
And I want argparse to parse from -t to -h and leave the remaining, also show help for these only. Next I want to parse any switch after -h into another function and see corresponding help there.
There are some issues with your specification:
Is -h the regular help? If so it has priority, producing the help without parsing the other arguments. The string after -h suggests you are treating it like a normal user-defined argument, which would require initializing the parser with help turned off. But then how would you ask for help?
What sets the break between the two parsings/helps? The number of arguments, the -h flag (regardless of order), or the identity of the flags? Remember argparse accepts flagged arguments in any order.
You could define one parser that knows about -t and -r, and another that handles -n and -p. Calling each with parse_known_args lets it operate without raising an unknown-argument error.
You can also modify sys.argv. parse_args (and the known variant) takes an optional args argument. If that is None, it uses sys.argv[1:]. So you could either modify sys.argv itself (deleting items), or pass a subset of sys.argv to the parser.
parser1.parse_known_args(sys.argv[1:5])
parser2.parse_known_args(['-n','one','-o','two'])
parser3.parse_args(sys.argv[3:])
Play with those ideas, and come back to us if there are further questions.
You can always modify sys.argv and put anything you wish there.
As for your main question, you can have two parsers. One of them will have arguments -t to -h, the second -n and -p. Then you can use argparse's parse_known_args() method on each parser, which will parse only the arguments defined for each of them.
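A sketch of that two-parser split, assuming -h really is meant as a user-defined option (so both parsers are created with add_help=False; the option spellings are taken from the question):

```python
import argparse

argv = ["-t", "abc", "-r", "dfg", "-h", "klm", "-n", "-p", "qui"]

# Parser 1 knows -t, -r, -h; anything it doesn't recognize lands in the remainder.
p1 = argparse.ArgumentParser(add_help=False)
p1.add_argument("-t")
p1.add_argument("-r")
p1.add_argument("-h")
ns1, rest = p1.parse_known_args(argv)

# Parser 2 handles the remainder (-n, -p).
p2 = argparse.ArgumentParser(add_help=False)
p2.add_argument("-n", action="store_true")
p2.add_argument("-p")
ns2, leftover = p2.parse_known_args(rest)

print(ns1.t, ns1.r, ns1.h)  # abc dfg klm
print(ns2.n, ns2.p)         # True qui
```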
When I'm writing shell scripts, I often find myself spending most of my time (especially when debugging) dealing with argument processing. Many scripts I write or maintain are easily more than 80% input parsing and sanitization. I compare that to my Python scripts, where argparse handles most of the grunt work for me, and lets me easily construct complex option structures and sanitization / string parsing behavior.
I'd love, therefore, to be able to have Python do this heavy lifting, and then get these simplified and sanitized values in my shell script, without needing to worry any further about the arguments the user specified.
To give a specific example, many of the shell scripts where I work have been defined to accept their arguments in a specific order. You can call start_server.sh --server myserver --port 80 but start_server.sh --port 80 --server myserver fails with You must specify a server to start. - it makes the parsing code a lot simpler, but it's hardly intuitive.
So a first pass solution could be something as simple as having Python take in the arguments, sort them (keeping their parameters next to them) and returning the sorted arguments. So the shell script still does some parsing and sanitization, but the user can input much more arbitrary content than the shell script natively accepts, something like:
# script.sh -o -aR --dir /tmp/test --verbose
#!/bin/bash
args=$(order.py "$@")
# args is set to "-a --dir /tmp/test -o -R --verbose"
# simpler processing now that we can guarantee the order of parameters
There's some obvious limitations here, notably that order.py can't distinguish between a final option with an argument and the start of positional arguments, but that doesn't seem that terrible.
So here's my question: 1) Is there any existing (Python preferably) utility to enable CLI parsing by something more powerful than bash, which can then be accessed by the rest of my bash script after sanitization, or 2) Has anyone done this before? Are there issues or pitfalls or better solutions I'm not aware of? Care to share your implementation?
One (very half-baked) idea:
#!/bin/bash
# Some sort of simple syntax to describe to Python what arguments to accept
opts='
"a", "append", boolean, help="Append to existing file"
"dir", str, help="Directory to run from"
"o", "overwrite", boolean, help="Overwrite duplicates"
"R", "recurse", boolean, help="Recurse into subdirectories"
"v", "verbose", boolean, help="Print additional information"
'
# Takes in CLI arguments and outputs a sanitized structure (JSON?) or fails
p=$(parse.py "Runs complex_function with nice argument parsing" "$opts" "$@")
if [ $? -ne 0 ]; then exit 1; fi # while parse.py outputs usage to stderr
# Takes the sanitized structure and an argument to get
append=$(arg.py "$p" append)
overwrite=$(arg.py "$p" overwrite)
recurse=$(arg.py "$p" recurse)
verbose=$(arg.py "$p" verbose)
cd $(python arg.py "$p" dir)
complex_function $append $overwrite $recurse $verbose
Two lines of code, along with concise descriptions of the arguments to expect, and we're on to the actual script behavior. Maybe I'm crazy, but that seems way nicer than what I feel like I have to do now.
I've seen Parsing shell script arguments and things like this wiki page on easy CLI argument parsing, but many of these patterns feel clunky and error prone, and I dislike having to re-implement them every time I write a shell script, especially when Python, Java, etc. have such nice argument processing libraries.
You could potentially take advantage of associative arrays in bash to help obtain your goal.
declare -A opts=($(getopts.py "$@"))
cd ${opts[dir]}
complex_function ${opts[append]} ${opts[overwrite]} ${opts[recurse]} \
${opts[verbose]} ${opts[args]}
To make this work, getopts.py should be a python script that parses and sanitizes your arguments. It should print a string like the following:
[dir]=/tmp
[append]=foo
[overwrite]=bar
[recurse]=baz
[verbose]=fizzbuzz
[args]="a b c d"
You could set aside values for checking that the options were able to be properly parsed and sanitized as well.
Returned from getopts.py:
[__error__]=true
Added to bash script:
if ${opts[__error__]}; then
exit 1
fi
If you would rather work with the exit code from getopts.py, you could play with eval:
getopts=$(getopts.py "$@") || exit 1
eval declare -A opts=($getopts)
Alternatively:
getopts=$(getopts.py "$@")
if [[ $? -ne 0 ]]; then
exit 1;
fi
eval declare -A opts=($getopts)
Having the very same needs, I ended up writing an optparse-inspired parser for bash (which actually uses python internally); you can find it here:
https://github.com/carlobaldassi/bash_optparse
See the README at the bottom for a quick explanation. You may want to check out a simple example at:
https://github.com/carlobaldassi/bash_optparse/blob/master/doc/example_script_simple
From my experience, it's quite robust (I'm super-paranoid), feature-rich, etc., and I'm using it heavily in my scripts. I hope it may be useful to others. Feedback/contributions welcome.
Edit: I haven't used it (yet), but if I were posting this answer today I would probably recommend https://github.com/docopt/docopts instead of a custom approach like the one described below.
I've put together a short Python script that does most of what I want. I'm not convinced it's production quality yet (notably error handling is lacking), but it's better than nothing. I'd welcome any feedback.
It takes advantage of the set builtin to re-assign the positional arguments, allowing the remainder of the script to still handle them as desired.
bashparse.py
#!/usr/bin/env python
import optparse, sys
from pipes import quote

'''
Uses Python's optparse library to simplify command argument parsing.

Takes in a usage string as argv[1], a set of optparse option specs
(separated by newlines) as argv[2], and the command line arguments to
parse as argv[3:], and outputs a series of bash commands to populate
associated variables.
'''

class _ThrowParser(optparse.OptionParser):
    def error(self, msg):
        """Overrides optparse's default error handling
        and instead raises an exception which will be caught upstream
        """
        raise optparse.OptParseError(msg)

def gen_parser(usage, opts_ls):
    '''Takes a list of strings which can be used as the parameters to
    optparse's add_option function.
    Returns a parser object able to parse those options
    '''
    parser = _ThrowParser(usage=usage)
    for opts in opts_ls:
        if opts:
            # yes, I know it's evil, but it's easy
            eval('parser.add_option(%s)' % opts)
    return parser

def print_bash(opts, args):
    '''Takes the result of optparse and outputs commands to update a shell'''
    for opt, val in opts.items():
        if val:
            print('%s=%s' % (opt, quote(val)))
    print("set -- %s" % " ".join(quote(a) for a in args))

if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.stderr.write("Needs at least a usage string and a set of options to parse\n")
        sys.exit(2)
    parser = gen_parser(sys.argv[1], sys.argv[2].split('\n'))
    (opts, args) = parser.parse_args(sys.argv[3:])
    print_bash(opts.__dict__, args)
Example usage:
#!/bin/bash
usage="[-f FILENAME] [-t|--truncate] [ARGS...]"
opts='
"-f"
"-t", "--truncate",action="store_true"
'
echo "$(./bashparse.py "$usage" "$opts" "$@")"
eval "$(./bashparse.py "$usage" "$opts" "$@")"
echo
echo OUTPUT
echo $f
echo $@
echo $0 $2
Which, if run as: ./run.sh one -f 'a_filename.txt' "two' still two" three outputs the following (notice that the internal positional variables are still correct):
f=a_filename.txt
set -- one 'two'"'"' still two' three
OUTPUT
a_filename.txt
one two' still two three
./run.sh two' still two
Disregarding the debugging output, you're looking at approximately four lines to construct a powerful argument parser. Thoughts?
The original premise of my question assumes that delegating to Python is the right approach to simplify argument parsing. If we drop the language requirement we can actually do a decent job* in Bash, using getopts and a little eval magic:
main() {
  local _usage='foo [-a] [-b] [-f val] [-v val] [args ...]'
  eval "$(parse_opts 'f:v:ab')"
  echo "f=$f v=$v a=$a b=$b -- $#: $*"
}
main "$@"
The implementation of parse_opts is in this gist, but the basic approach is to convert options into local variables which can then be handled like normal. All the standard getopts boilerplate is hidden away, and error handling works as expected.
Because it uses local variables within a function, parse_opts is not just useful for command line arguments, it can be used with any function in your script.
* I say "decent job" because Bash's getopts is a fairly limited parser and only supports single-letter options. Elegant, expressive CLIs are still better implemented in other languages like Python. But for reasonably small functions or scripts this provides a nice middle ground without adding too much complexity or bloat.
How can I reverse the results of a shlex.split? That is, how can I obtain a quoted string that would "resemble that of a Unix shell", given a list of strings I wish quoted?
Update0
I've located a Python bug, and made corresponding feature requests here.
We now (3.3) have a shlex.quote function. It's none other than pipes.quote, moved and documented (code using pipes.quote will still work). See http://bugs.python.org/issue9723 for the whole discussion.
subprocess.list2cmdline is a private function that should not be used. It could however be moved to shlex and made officially public. See also http://bugs.python.org/issue1724822.
How about using pipes.quote?
import pipes
strings = ["ls", "/etc/services", "file with spaces"]
" ".join(pipes.quote(s) for s in strings)
# "ls /etc/services 'file with spaces'"
There is a feature request for adding shlex.join(), which would do exactly what you ask. As of now there does not seem to be any progress on it, though, mostly because it would just forward to shlex.quote(). In the bug report, a suggested implementation is mentioned:
' '.join(shlex.quote(x) for x in split_command)
See https://bugs.python.org/issue22454
It's shlex.join() in python 3.8
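For example, on 3.8+ (the sample argument list is made up):

```python
import shlex

args = ["ls", "/etc/services", "file with spaces"]
cmd = shlex.join(args)
print(cmd)  # ls /etc/services 'file with spaces'

# For plain argument lists, split() recovers the original list.
assert shlex.split(cmd) == args
```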
subprocess uses subprocess.list2cmdline(). It's not an official public API, but it's mentioned in the subprocess documentation and I think it's pretty safe to use. It's more sophisticated than pipes.quote() (for better or worse).
While shlex.quote is available in Python 3.3 and shlex.join is available in Python 3.8, they will not always serve as a true "reversal" of shlex.split. Observe the following snippet:
import shlex
command = "cd /home && bash -c 'echo $HOME'"
print(shlex.split(command))
# ['cd', '/home', '&&', 'bash', '-c', 'echo $HOME']
print(shlex.join(shlex.split(command)))
# cd /home '&&' bash -c 'echo $HOME'
Notice that after splitting and then joining, the && token now has single quotes around it. If you tried running the command now, you'd get an error: cd: too many arguments
If you use subprocess.list2cmdline() as others have suggested, it works nicer with bash operators like &&:
import subprocess
print(subprocess.list2cmdline(shlex.split(command)))
# cd /home && bash -c "echo $HOME"
However you may notice now that the quotes are now double instead of single. This results in $HOME being expanded by the shell rather than being printed verbatim as if you had used single quotes.
In conclusion, there is no 100% fool-proof way of undoing shlex.split, and you will have to choose the option that best suits your purpose and watch out for edge cases.
I am using the OptionParser from optparse module to parse my command that I get using the raw_input().
I have these questions.
1.) I use OptionParser to parse this input, say for example (getting multiple args):
my prompt> -a foo -b bar -c spam eggs
I did this by setting action='store_true' in add_option() for '-c'. Now if there is another option with multiple arguments, say -d x y z, then how do I know which arguments come from which option? Also, what if one of the arguments has to be parsed again, like:
my prompt> -a foo -b bar -c spam '-f anotheroption'
2.) If I wanted to do something like this:
my prompt> -a foo -b bar
my prompt> -c spam eggs
my prompt> -d x y z
Now each entry must not affect the options set by the previous command. How do I accomplish this?
For part 2: you want a new OptionParser instance for each line you process. And look at the cmd module for writing a command loop like this.
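A rough sketch of combining the two (the class name and the options are illustrative only, not from the question):

```python
import cmd
from optparse import OptionParser

class Prompt(cmd.Cmd):
    prompt = "my prompt> "

    def do_run(self, line):
        # A brand-new OptionParser per line, so nothing leaks between commands.
        parser = OptionParser()
        parser.add_option("-a")
        parser.add_option("-b")
        opts, args = parser.parse_args(line.split())
        print(opts.a, opts.b, args)

    def do_EOF(self, line):
        return True

# Prompt().cmdloop() would run the interactive loop; onecmd() drives it directly:
Prompt().onecmd("run -a foo -b bar")  # prints: foo bar []
Prompt().onecmd("run -a spam")        # prints: spam None []  (no leftover -b)
```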
You can also solve #1 using the nargs option attribute as follows:
parser = OptionParser()
parser.add_option("-c", nargs=2)
parser.add_option("-d", nargs=3)
optparse solves #1 by requiring that an option always take the same number of arguments (even if that number is 0); variable-argument options are not allowed:
Typically, a given option either takes an argument or it doesn't. Lots of people want an "optional option arguments" feature, meaning that some options will take an argument if they see it, and won't if they don't. This is somewhat controversial, because it makes parsing ambiguous: if "-a" takes an optional argument and "-b" is another option entirely, how do we interpret "-ab"? Because of this ambiguity, optparse does not support this feature.
You would solve #2 by not reusing the previous values object in parse_args, so it creates a new values object rather than updating the old one.
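A small sketch of the difference, using optparse's optional values argument to parse_args (the option names are made up):

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-a")
parser.add_option("-b")

# Default behavior: each call builds a fresh values object.
first, _ = parser.parse_args(["-a", "foo"])
second, _ = parser.parse_args(["-b", "bar"])
print(first.a, first.b)    # foo None
print(second.a, second.b)  # None bar

# Reusing one values object: settings accumulate across calls.
shared = parser.get_default_values()
parser.parse_args(["-a", "foo"], values=shared)
parser.parse_args(["-b", "bar"], values=shared)
print(shared.a, shared.b)  # foo bar
```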