I'd like to support a command line interface where users can declare an arbitrary number of samples, with one or more input files corresponding to each sample. Something like this:
$ myprogram.py \
--foo bar \
--sample1 input1.tsv \
--sample2 input2a.tsv input2b.tsv input2c.tsv \
--sample3 input3-filtered.tsv \
--out output.tsv
The idea is that the option keys will match the pattern --sample(\d+), and each key will consume all subsequent arguments as option values until the next - or -- prefixed flag is encountered. For explicitly declared arguments, this is a common use case that the argparse module supports with the nargs='+' option. But since I need to support an arbitrary number of arguments I can't declare them explicitly.
The parse_known_args method will give me access to all user-supplied arguments, but those not explicitly declared will not be grouped into an indexed data structure. For these I would need to carefully examine the argument list, look ahead to see how many of the subsequent values correspond to the current flag, and so on.
Is there any way I can parse these options without having to essentially re-implement large parts of an argument parser (almost) from scratch?
If you can live with a slightly different syntax, namely:
$ myprogram.py \
--foo bar \
--sample input1.tsv \
--sample input2a.tsv input2b.tsv input2c.tsv \
--sample input3-filtered.tsv \
--out output.tsv
where the parameter name doesn't contain a number but the grouping is still performed, try this:
parser.add_argument('--sample', action='append', nargs='+')
It produces a list of lists, i.e. --sample x y --sample 1 2 will produce Namespace(sample=[['x', 'y'], ['1', '2']])
As I mentioned in my comment:
import argparse
argv = "myprogram.py \
--foo bar \
--sample1 input1.tsv \
--sample2 input2a.tsv input2b.tsv input2c.tsv \
--sample3 input3-filtered.tsv \
--out output.tsv"
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')
for x in range(1, argv.count('--sample') + 1):
    parser.add_argument('--sample' + str(x), nargs='+')
args = parser.parse_args(argv.split()[1:])
Gives:
print(args)
Namespace(foo='bar', out='output.tsv', sample1=['input1.tsv'], sample2=['input2a.tsv', 'input2b.tsv', 'input2c.tsv'], sample3=['input3-filtered.tsv'])
With the real sys.argv you'll probably have to replace the argv.count with the slightly longer ' '.join(sys.argv).count('--sample')
The major downside to this approach is the auto help generation will not cover these fields.
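Counting the substring '--sample' assumes the flags are numbered consecutively from 1 and that the text appears nowhere else on the command line. A slightly more defensive variant, sketched here under those caveats, registers exactly the --sampleN flags that are present. The function name parse_samples and the returned dict are illustrative choices, not part of argparse.

```python
import argparse
import re

SAMPLE_FLAG = re.compile(r'--sample(\d+)')

def sample_number(flag):
    """Return the integer suffix of a --sampleN flag."""
    return int(flag[len('--sample'):])

def parse_samples(argv):
    """Register each --sampleN flag found in argv, then parse normally."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo')
    parser.add_argument('--out')
    # Collect the distinct --sampleN flags actually used on the command line.
    flags = sorted({a for a in argv if SAMPLE_FLAG.fullmatch(a)}, key=sample_number)
    for flag in flags:
        parser.add_argument(flag, nargs='+')
    args = parser.parse_args(argv)
    # Gather the per-sample lists into one dict keyed by sample number.
    samples = {sample_number(flag): getattr(args, flag[2:]) for flag in flags}
    return args, samples

args, samples = parse_samples(
    '--foo bar --sample1 input1.tsv --sample3 a.tsv b.tsv --out output.tsv'.split())
print(samples)
```

The help output still lists only the flags that were present on that particular command line, so the downside mentioned above remains.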
It would be simpler to make that number or key a separate argument value, and collect the related arguments in a nested list.
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')
parser.add_argument('--sample', nargs='+', action='append', metavar=('KEY','TSV'))
parser.print_help()
argv = "myprogram.py \
--foo bar \
--sample 1 input1.tsv \
--sample 2 input2a.tsv input2b.tsv input2c.tsv \
--sample 3 input3-filtered.tsv \
--out output.tsv"
argv = argv.split()
args = parser.parse_args(argv[1:])
print(args)
produces:
1031:~/mypy$ python3 stack44267794.py -h
usage: stack44267794.py [-h] [--foo FOO] [--out OUT] [--sample KEY [TSV ...]]
optional arguments:
-h, --help show this help message and exit
--foo FOO
--out OUT
--sample KEY [TSV ...]
Namespace(foo='bar', out='output.tsv',
sample=[['1', 'input1.tsv'],
['2', 'input2a.tsv', 'input2b.tsv', 'input2c.tsv'],
['3', 'input3-filtered.tsv']])
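If a mapping from key to files is more convenient than the nested list, the first element of each group can be split off after parsing. This is only a sketch: the helper name parse_keyed_samples and the choice to merge repeated keys are assumptions made for illustration.

```python
import argparse

def parse_keyed_samples(argv):
    """Parse '--sample KEY TSV [TSV ...]' groups into a dict of lists."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo')
    parser.add_argument('--out')
    parser.add_argument('--sample', nargs='+', action='append',
                        metavar=('KEY', 'TSV'))
    args = parser.parse_args(argv)
    samples = {}
    # args.sample is None when no --sample was given at all.
    for key, *files in (args.sample or []):
        if not files:
            parser.error('--sample %s: expected at least one TSV file' % key)
        # Repeating a key extends its list (an illustrative policy).
        samples.setdefault(key, []).extend(files)
    return args, samples

args, samples = parse_keyed_samples(
    '--sample 1 input1.tsv --sample 2 a.tsv b.tsv --out output.tsv'.split())
print(samples)
```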
There have been questions about collecting general key:value pairs. There's nothing in argparse to directly support that. Various things have been suggested, but all boil down to parsing the pairs yourself.
Is it possible to use argparse to capture an arbitrary set of optional arguments?
You have added the complication that the number of arguments per key is variable. That rules out handling '--sample1=input1' as simple strings.
argparse has extended a well known POSIX commandline standard. But if you want to move beyond that, then be prepared to process the arguments either before (sys.argv) or after argparse (the parse_known_args extras).
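As a rough illustration of the "after argparse" route, the extras returned by parse_known_args can be grouped with a small loop. The helper group_sample_extras is hypothetical, and the sketch assumes the extras keep their command-line order and that every value follows a --sampleN flag.

```python
import argparse
import re

def group_sample_extras(extras):
    """Group leftover tokens under the --sampleN flag that precedes them."""
    groups = {}
    current = None
    for token in extras:
        match = re.fullmatch(r'--sample(\d+)', token)
        if match:
            # Start (or continue) the list for this sample number.
            current = groups.setdefault(int(match.group(1)), [])
        elif token.startswith('-') or current is None:
            raise ValueError('unexpected argument: %r' % token)
        else:
            current.append(token)
    return groups

parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')
argv = '--foo bar --sample1 input1.tsv --sample2 a.tsv b.tsv --out output.tsv'.split()
# Declared options are parsed as usual; everything else lands in extras.
args, extras = parser.parse_known_args(argv)
samples = group_sample_extras(extras)
print(samples)
```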
It may well be possible to do the sort of thing that you are looking for with click rather than argparse.
To quote:
$ click_
Click is a Python package for creating beautiful command line
interfaces in a composable way with as little code as necessary.
It's the "Command Line Interface Creation Kit". It's highly
configurable but comes with sensible defaults out of the box.
It aims to make the process of writing command line tools quick and
fun while also preventing any frustration caused by the inability to
implement an intended CLI API.
Click in three points:
arbitrary nesting of commands
automatic help page generation
supports lazy loading of subcommands at runtime
Read the docs at http://click.pocoo.org/
One of the important features of click is the ability to construct sub-commands (a bit like using git or ImageMagick's convert), which should allow you to structure your command line as:
myprogram.py \
--foo bar \
--sampleset input1.tsv \
--sampleset input2a.tsv input2b.tsv input2c.tsv \
--sampleset input3-filtered.tsv \
combinesets --out output.tsv
Or even:
myprogram.py \
--foo bar \
process input1.tsv \
process input2a.tsv input2b.tsv input2c.tsv \
process input3-filtered.tsv \
combine --out output.tsv
This might be cleaner. In this case your code would have parameters called --foo and --out, and functions called process and combine: process would be called with the input file(s) specified, and combine with no parameters.
I am writing a program that should take multiple files as input, where for each file a list of arguments is given. The call could look like:
python myprog.py \
--file picture1.svg --scale 3 --color false \
--file picture2.svg --scale 1 --color false \
--file pictureX.svg --scale 11 --color true \
-o output.svg
In my case the order of the files matters, as well as the correct grouping of the arguments, of course. So in the end I expect to receive a list of dictionaries:
[{"file": "picture1.svg", "scale": 3, "color": "false"},
 {"file": "picture2.svg", "scale": 1, "color": "false"},
 {"file": "pictureX.svg", "scale": 11, "color": "true"}]
---------------------------------------//------------------------------
What I found so far is to:
use action="append" using argparse.
import argparse
parser = argparse.ArgumentParser(description='Call file list with parameters')
parser.add_argument('-f', '--file', type=str, nargs='+', action='append', help='file list')
parser.add_argument('-o', '--output', type=str, help='output file')
args = parser.parse_args()
print(args)
The call would look like:
python myprog.py \
--file picture1.svg scale:3 color:false \
--file picture2.svg scale:1 color:false \
--file pictureX.svg scale:11 color:true \
-o output.svg
This will give me a list of three lists that I can theoretically parse
[["picture1.svg", "scale:3", "color:false"],
 ["picture2.svg", "scale:1", "color:false"],
 ["pictureX.svg", "scale:11", "color:true"]]
This does not work well with argparse's automatic help generation, does not allow default values to be declared, and so on. Otherwise it seems to me the best solution so far.
The other way would be to generate lists of parameters like
parser.add_argument('-c', '--color', type=str, nargs='+', action='append', help='color list')
parser.add_argument('-s', '--scale', type=int, nargs='+', action='append', help='scale list')
which would be called like:
python myprog.py \
--file picture1.svg --scale 3 --color false \
--file picture2.svg --scale 1 --color false \
--file pictureX.svg --scale 11 --color true \
-o output.svg
resulting in one list per option (each entry is itself a one-element list because of nargs='+'):
file:  [["picture1.svg"], ["picture2.svg"], ["pictureX.svg"]]
scale: [[3], [1], [11]]
color: [["false"], ["false"], ["true"]]
The advantage is that everything is handled by argparse. However, if some parameters (for example the second scale) are missing, the correspondence between the parameters cannot be ensured.
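One way to keep the correspondence while still letting argparse do the parsing is a pair of custom actions: --file starts a new record, and --scale / --color write into the most recent record. This is a sketch only. The class names NewFile and SetOnLastFile, the dest name files, and the default values are assumptions chosen for illustration.

```python
import argparse

class NewFile(argparse.Action):
    """Start a new record each time --file is seen."""
    def __call__(self, parser, namespace, values, option_string=None):
        records = list(getattr(namespace, self.dest) or [])
        # Assumed per-file defaults, for illustration only.
        records.append({'file': values, 'scale': 1, 'color': 'false'})
        setattr(namespace, self.dest, records)

class SetOnLastFile(argparse.Action):
    """Store the value on the most recent --file record."""
    def __call__(self, parser, namespace, values, option_string=None):
        records = getattr(namespace, 'files', None)
        if not records:
            parser.error('%s must come after a --file' % option_string)
        records[-1][self.dest] = values

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', dest='files', action=NewFile)
    # SUPPRESS keeps unused top-level 'scale'/'color' attributes off the namespace.
    parser.add_argument('-s', '--scale', type=int, action=SetOnLastFile,
                        default=argparse.SUPPRESS)
    parser.add_argument('-c', '--color', choices=['true', 'false'],
                        action=SetOnLastFile, default=argparse.SUPPRESS)
    parser.add_argument('-o', '--output')
    return parser

args = build_parser().parse_args(
    '--file picture1.svg --scale 3 --color false '
    '--file picture2.svg '
    '--file pictureX.svg --scale 11 --color true -o output.svg'.split())
print(args.files)
```

The order of the files is preserved, and a file with missing options simply keeps its defaults.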
The final possibility I see is to use json as input. It can be easily parsed into the object we want. However, all the advantages of the command line parser will disappear.
What do you think would be the optimal solution, one of the above, or did I overlook something and there is another elegant way to do this?
Thank you!
Well, I think I found the way argparse intends this to be done.
We can use our own data types as command line arguments. So, referring to the problem above, we can create a class, something like:
import argparse
import re

class Filetype():
    # Each token after --file must match one of these patterns.
    patterns = {
        'filename': re.compile(r".*\.svg"),
        'scale': re.compile(r"scale:(\d+\.*\d*)"),
        'color': re.compile(r"color:(true|false)", re.IGNORECASE),
    }
    def __call__(self, value):
        a = self.patterns["filename"].match(value)
        if a:
            return {"file": value}
        a = self.patterns["scale"].match(value)
        if a:
            return {"scale": a.group(1)}
        a = self.patterns["color"].match(value)
        if a:
            return {"color": a.group(1)}
        # No pattern matched: argparse turns this into a usage error.
        raise argparse.ArgumentTypeError(
            "'{}' should be either: (1) a file name (*.svg), (2) 'scale:FLOAT' parameter, or (3) 'color:[True|False]'".format(
                value))
Now, let us add an argument to the parser. The "type" parameter does the work:
parser.add_argument('-f', '--file', type=Filetype(), nargs='+', action='append',
help='list of file names with parameters')
we can call our program now with:
python myprog.py \
--file picture1.svg scale:3 color:false \
--file picture2.svg scale:1 color:false \
--file pictureX.svg scale:11 color:true \
-o output.svg
This seems to be a better compromise. Though we have to perform even simple checks ourselves, we can still produce a meaningful error message, and argparse still does the heavy lifting for us.
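With this approach each --file group arrives as a list of one-key dictionaries, so a small post-processing step is still needed to get one record per file. A minimal sketch, assuming that shape; the helper name merge_file_groups and the defaults are hypothetical:

```python
def merge_file_groups(groups, defaults=None):
    """Merge each group of one-key dicts into a single record per file."""
    records = []
    for tokens in groups:
        # Start from the (assumed) defaults, then apply what was given.
        record = dict(defaults or {})
        for token in tokens:
            record.update(token)
        if 'file' not in record:
            raise ValueError('group without a file name: %r' % (tokens,))
        records.append(record)
    return records

# Shape of args.file as produced by type=Filetype(), nargs='+', action='append'.
parsed = [
    [{'file': 'picture1.svg'}, {'scale': '3'}, {'color': 'false'}],
    [{'file': 'picture2.svg'}],
]
records = merge_file_groups(parsed, defaults={'scale': '1', 'color': 'false'})
print(records)
```

This also recovers default values, which the plain list-of-strings variant could not declare.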
I'm trying to create a command like
prog [-h] [-i ID [ID ...]] | -x [SOMETHING]
{cmd1,cmd2,cmd3}...
So basically at the top level I have a parser that has a mutually exclusive group for the -i and -x options, and then following those (and possibly other) options, I have a command that I want to run. Each command has its own set of options. I can get the commands working fine with add_subparsers(), but the problem I'm running into is when I try to add an argument to the root parser that has nargs='+'. When I do that, it slurps up all of the arguments for -i, thinking that the command is an ID rather than a command.
Is there a way around this? It seems like it would have to look through the arguments to -i looking for a command word and then tell argparse that it should resume parsing at that point.
I had to read your description several times, but I think this is the problem:
prog -i id1 id2 cmd1 -foo 3 ....
and it gives some sort of warning about not finding {cmd1,cmd2,cmd3}. The exact error may differ because in some versions subparsers aren't actually required.
In any case, the arguments to -i are ['id1','id2','cmd1'], everything up to the next - flag. To the main parser, the subparsers argument is just another positional one (with choices). When allocating strings to -i it does not check whether the string matches one of the cmds. It just looks at whether it starts with - or not.
The only way you can use an nargs='+' (or '*') in the context is to include some other flagged argument, e.g.
prog -i id1 id2 -x 3 cmd1 --foo ...
I realize that goes against your mutually_exclusive group.
The basic point is non flag strings are allocated based on position, not value. For a variable nargs you have to have some sort of explicit list terminator.
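Since argparse will not look at values when allocating strings, one workaround is to do the look-ahead yourself: split the argument list at the first command word and parse the two halves separately. This is a sketch under assumptions: the command names come from the question, --foo is a placeholder sub-option, and an ID that happens to equal a command name would be misread as the command.

```python
import argparse

COMMANDS = ('cmd1', 'cmd2', 'cmd3')

def split_at_command(argv, commands=COMMANDS):
    """Split argv just before the first token that names a command."""
    for i, token in enumerate(argv):
        if token in commands:
            return argv[:i], argv[i:]
    return argv, []

def parse(argv):
    # Parser for the top-level options only.
    main = argparse.ArgumentParser(prog='prog')
    group = main.add_mutually_exclusive_group()
    group.add_argument('-i', nargs='+', metavar='ID')
    group.add_argument('-x', nargs='?', metavar='SOMETHING')
    # Separate parser that only knows about the commands.
    commands = argparse.ArgumentParser(prog='prog')
    sub = commands.add_subparsers(dest='cmd')
    for name in COMMANDS:
        sub.add_parser(name).add_argument('--foo')  # placeholder option
    head, tail = split_at_command(argv)
    return main.parse_args(head), commands.parse_args(tail)

main_args, cmd_args = parse('-i id1 id2 cmd1 --foo 3'.split())
print(main_args, cmd_args)
```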
From the sidebar
Argparse nargs="+" is eating positional argument
It's similar except that your next positional is the subparsers cmd.
==============
A positional with '+' will work right before a subparsers cmd
usage: prog [-h] foo [foo ...] {cmd1,cmd2} ...
In [160]: p1.parse_args('1 22 3 cmd1'.split())
Out[160]: Namespace(cmd='cmd1', foo=['1', '22', '3'])
But that's because strings for foo and cmd are allocated with one regex pattern test.
In
usage: prog [-h] [--bar BAR [BAR ...]] {cmd1,cmd2} ...
strings are allocated to bar without reference to the needs of the following positional, cmd. As shown in the suggested patches for http://bugs.python.org/issue9338, changing this behavior is not a trivial change. It requires an added look-ahead trial-and-error loop.
My Program should include the following options, properly parsed by argparse:
1. purely optional: [-h, --help] and [-v, --version]
2. mutually exclusive: [-f FILE, --file FILE] and [-u URL, --url URL]
3. optional if --url was chosen: [-V, --verbose]
4. required if either --file or --url was chosen: [-F, --format FORMAT]
The desired usage pattern would be:
prog.py [-h] [-v] [-f FILE (-F FORMAT) | -u URL [-V] (-F FORMAT) ]
with the -F requirement applying to both members of the mutually exclusive group.
I'm not sure whether it should rather be a positional.
So it should be possible to run:
prog.py -u "http://foo.bar" -V -F csv
and the parser should complain if I forget the -F (as it's supposed to).
What i've done so far:
parser = ArgumentParser(description='foo')
group = parser.add_mutually_exclusive_group()
group.add_argument('-f','--file', nargs=1, type=str, help='')
group.add_argument('-u','--url', nargs=1, type=str, help='')
parser.add_argument('-V','--verbose', action='store_true', default=False, help='')
parser.add_argument('-F','--format', nargs=1, type=str, help='')
Since it has a 'vanilla mode' to run without command line arguments, all arguments must be optional.
How can i implement points 3. and 4. into my code?
EDIT:
I tried -f and -u as subparsers, as described here, but subcommands seem to be treated like positionals and the parser gives me an error: too few arguments if i run it without arguments.
Use of nargs=2 and tuple metavar approximates your goal
parser = argparse.ArgumentParser(prog='PROG')
group = parser.add_mutually_exclusive_group()
group.add_argument('-f','--file', nargs=2, metavar=('FILE','FORMAT'))
group.add_argument('-u','--url', nargs=2, metavar=('URL','FORMAT'))
parser.add_argument('-V','--verbose', action='store_true',help='optional with url')
which produces:
usage: PROG [-h] [-f FILE FORMAT | -u URL FORMAT] [-V]
optional arguments:
-h, --help show this help message and exit
-f FILE FORMAT, --file FILE FORMAT
-u URL FORMAT, --url URL FORMAT
-V, --verbose optional with url
This requires the format along with filename or url, it just doesn't require the -F. As others noted -V can be ignored in the -f case.
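If the separate -F flag should be kept, the dependencies can instead be checked after parsing and reported through parser.error, which prints the usage message and exits. A minimal sketch; the wording of the error messages and the decision to reject -V without --url (rather than ignore it) are assumptions.

```python
import argparse

def parse(argv=None):
    parser = argparse.ArgumentParser(description='foo')
    group = parser.add_mutually_exclusive_group()
    group.add_argument('-f', '--file')
    group.add_argument('-u', '--url')
    parser.add_argument('-V', '--verbose', action='store_true')
    parser.add_argument('-F', '--format')
    args = parser.parse_args(argv)
    # Point 4: -F is required as soon as a source is given.
    if (args.file is not None or args.url is not None) and args.format is None:
        parser.error('-F/--format is required with -f/--file or -u/--url')
    # Point 3: -V only makes sense together with --url.
    if args.verbose and args.url is None:
        parser.error('-V/--verbose is only meaningful with -u/--url')
    return args

print(parse(['-u', 'http://foo.bar', '-V', '-F', 'csv']))
```

Running without any arguments still works, so the "vanilla mode" is preserved.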
I tried -f and -u as subparsers, as described here, but subcommands seem to be treated like positionals and the parser gives me an error: too few arguments if i run it without arguments.
In the latest version(s) subcommands are no longer treated as required positionals. This was, as best I can tell, a side effect of changing the error message to be more informative. Instead of _parse_known_args doing a:
if positionals:
self.error(_('too few arguments'))
it scans _actions to see which are required, and then lists them by name in the error message. This is discussed in http://bugs.python.org/issue9253 . I know this change is in development (3.4), and may also be in 3.3.
These points can be enforced in optparse using a callback method when a certain option is present.
However, in argparse these are not available.
You can add a subparser for the url and the file sub-option, and parse these separately.
from the help:
Note that the object returned by parse_args() will only contain attributes for
the main parser and the subparser that was selected by the command line
(and not any other subparsers). So in the example above, when the a command
is specified, only the foo and bar attributes are present, and when the b command
is specified, only the foo and baz attributes are present.
But I would just properly document the usage, and just ignore the arguments that are not applicable.
e.g. let these two command lines behave exactly the same:
prog.py -f FILE -V
prog.py -f FILE
As documentation suggests:
argparse.REMAINDER. All the remaining command-line arguments are gathered into a list. This is commonly useful for command line utilities that dispatch to other command line utilities:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo')
>>> parser.add_argument('command')
>>> parser.add_argument('args', nargs=argparse.REMAINDER)
>>> print parser.parse_args('--foo B cmd --arg1 XX ZZ'.split())
Namespace(args=['--arg1', 'XX', 'ZZ'], command='cmd', foo='B')
I tried to use this for exactly the same purpose, but in some circumstances it seems buggy to me (or perhaps I have the concept wrong):
import argparse
a = argparse.ArgumentParser()
a.add_argument('-qa', nargs='?')
a.add_argument('-qb', nargs='?')
a.add_argument('rest', nargs=argparse.REMAINDER)
a.parse_args('-qa test ./otherutil bar -q atr'.split())
Result:
test.py: error: ambiguous option: -q could match -qa, -qb
So apparently, if the otherutil has such arguments which somehow "collide" with the arguments given to argparse, it doesn't seem to work correctly.
I would expect when argparse reaches the REMAINDER kind of argument, it just uses up all the strings in the end of the list without any further parsing. Can I reach this effect somehow?
I ran into this while trying to dispatch options to an underlying utility. The solution I wound up using was nargs='*' instead of nargs=argparse.REMAINDER, and then just use the "pseudo-argument" -- to separate the options for my command and the underlying tool:
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--myflag', action='store_true')
>>> parser.add_argument('toolopts', nargs='*')
>>> parser.parse_args('--myflag -- -a --help'.split())
Namespace(myflag=True, toolopts=['-a', '--help'])
This is reasonably easy to document in the help output.
This has more to do with handling of abbreviations than with the REMAINDER nargs.
In [111]: import argparse
In [112]: a = argparse.ArgumentParser()
...:
...: a.add_argument('-qa', nargs='?')
...: a.add_argument('-qb', nargs='?')
In [113]: a.parse_args('-qa test ./otherutil bar -q atr'.split())
usage: ipython3 [-h] [-qa [QA]] [-qb [QB]]
ipython3: error: ambiguous option: -q could match -qa, -qb
argparse does a 2 pass parsing. First it tries to categorize the strings as options (flags) or arguments. Second it alternates between parsing positionals and optionals, allocating arguments according to the nargs.
Here the ambiguity occurs in the first pass. It's trying to match '-q' with the two available optionals. REMAINDER's special action (absorbing '-q' as though it were a plain string) doesn't occur until the second pass.
Newer argparse versions allow us to turn off the abbreviation handling:
In [114]: a.allow_abbrev
Out[114]: True
In [115]: a.allow_abbrev=False
In [116]: a.parse_args('-qa test ./otherutil bar -q atr'.split())
usage: ipython3 [-h] [-qa [QA]] [-qb [QB]]
ipython3: error: unrecognized arguments: ./otherutil bar -q atr
And if I add the REMAINDER action:
In [117]: a.add_argument('rest', nargs=argparse.REMAINDER)
In [118]: a.parse_args('-qa test ./otherutil bar -q atr'.split())
Out[118]: Namespace(qa='test', qb=None, rest=['./otherutil', 'bar', '-q', 'atr'])
The use of '--' as @Colin suggests works because that string is recognized in the first pass:
In [119]: a.allow_abbrev=True
In [120]: Out[117].nargs='*'
In [121]: a.parse_args('-qa test -- ./otherutil bar -q atr'.split())
Out[121]: Namespace(qa='test', qb=None, rest=['./otherutil', 'bar', '-q', 'atr'])
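Because the ambiguity is raised while argparse is still classifying strings, another way to sidestep it is to keep the other tool's arguments away from argparse altogether by splitting on '--' first. A small sketch; the helper name split_on_separator is made up for illustration.

```python
import argparse

def split_on_separator(argv, separator='--'):
    """Return (own_args, passthrough_args), split at the first separator."""
    if separator in argv:
        i = argv.index(separator)
        return argv[:i], argv[i + 1:]
    return argv, []

parser = argparse.ArgumentParser()
parser.add_argument('-qa', nargs='?')
parser.add_argument('-qb', nargs='?')

own, rest = split_on_separator('-qa test -- ./otherutil bar -q atr'.split())
# Only the part before '--' is ever shown to argparse.
args = parser.parse_args(own)
print(args, rest)
```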
You need to use two --.
a.add_argument('--qa', nargs='?')
a.add_argument('--qb', nargs='?')
So the options that you define collide with a -q option, which accepts at least one argument, defined somewhere else.
From argparse doc
ArgumentParser.add_argument(name or flags...)
name or flags - Either a name or a list of option strings, e.g. foo or -f, --foo.
EDIT to reply to @PDani's first comment:
This post is interesting.
From what I have understood, argparse follows the POSIX and GNU style.
An important point is that short (one-letter) options can be grouped together, and if an option requires an argument, the argument can be attached to the option letter. For example, if you have something like this
a.add_argument('-a', action='store_true')
a.add_argument('-b', action='store_true')
a.add_argument('-c', action='store_true')
a.add_argument('-d', nargs=1)
a.add_argument('-e', nargs=1)
you can call them as -abcd3 -e5 or -a -b -c -d3 -e5 or -cba -e5 -d3, ...
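The grouping behaviour is easy to check directly. The following sketch just defines the five options above and compares the different spellings:

```python
import argparse

a = argparse.ArgumentParser()
for flag in ('-a', '-b', '-c'):
    a.add_argument(flag, action='store_true')
a.add_argument('-d', nargs=1)
a.add_argument('-e', nargs=1)

# Three spellings of the same command line.
grouped = a.parse_args(['-abcd3', '-e5'])
separate = a.parse_args(['-a', '-b', '-c', '-d3', '-e5'])
reordered = a.parse_args(['-cba', '-e5', '-d3'])
print(grouped)
```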
Now, if you have
a.add_argument('-abc', action='store_true')
and you have
it would be very hard for argparse to decide whether -abc is three short options grouped together or one long option. So you are forced to define the argument as --abc.
So I guess that you can't use long argument names with a single -.
I know of an alternative way to do command line parsing called docopt: you can take a look, but I doubt that it can solve your problem.
Perhaps some combination of ArgumentParser.parse_known_args() and some other bits of special handling?
This is not perfect, but might lead in the right direction:
import argparse
import sys
a = argparse.ArgumentParser()
# treat the common-prefixed arguments as options to the prefix
a.add_argument("-q")
# allow a delimiter to set off your arguments from those which should go to the
# other utility, and use parse_known_args() if the delimiter is not present
argv = sys.argv[1:]
if "--" in argv:
    i = argv.index("--")
    args, extra = a.parse_args(argv[:i]), argv[i + 1:]
else:
    a.add_argument("extra", nargs=argparse.REMAINDER)
    args, _ = a.parse_known_args(argv)
    extra = args.extra
# complain if the `-q` option was not specified correctly
if args.q not in ("something", "otherthing"):
    a.error("Must specify '-qsomething' or '-qotherthing'")
print("q:", "-q%s" % (args.q,))
print("extra:", '"%s"' % (" ".join(extra),))
Result:
$ ./testcmd -qsomething test ./otherutil bar -q atr
q: -qsomething
extra: "test ./otherutil bar -q atr"
Caveats:
This will allow a space between -q and the rest of the -q-prefixed option.
This will consume one -q option, but I don't recall whether it will raise an exception (or do any other helpful thing) if more are specified.
I need some help with argparse. I need to pass in exactly one argument, which can be one of the following: --k, --r, --b, --p (ignore the rest). If the argument count is not 1, the program should print "usage" information and quit. The program also needs to know which flag was passed in, in order to create the corresponding object. I tried several times but it doesn't work. Can anyone give me a hint on this? Thanks.
What you need to use to accomplish that is a mutually exclusive group:
import argparse
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument('-k', action='store_true')
group.add_argument('-r', action='store_true')
group.add_argument('-b', action='store_true')
group.add_argument('-p', action='store_true')
parser.parse_args()
As it can be seen in the example below, only one option in a mutually exclusive group is allowed at the same time:
$ python test.py -k -r -b -p
usage: test.py [-h] [-k | -r | -b | -p]
test.py: error: argument -r: not allowed with argument -k
To check which flag was passed, you just need to look at the argparse.Namespace object returned by parse_args method (the flag passed will be set to True).
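A variation that avoids checking four booleans is to let every flag store a constant into one shared attribute, and to mark the group as required so that passing no flag is also an error. This is a sketch; the attribute name kind and the helper selected_kind are illustrative.

```python
import argparse

def selected_kind(argv):
    """Return the single letter chosen on the command line."""
    parser = argparse.ArgumentParser()
    # required=True makes argparse reject an empty command line as well.
    group = parser.add_mutually_exclusive_group(required=True)
    for letter in 'krbp':
        # All four flags write their own letter into the same attribute.
        group.add_argument('-' + letter, dest='kind',
                           action='store_const', const=letter)
    return parser.parse_args(argv).kind

print(selected_kind(['-b']))
```

The returned letter can then be used as a key into a dictionary of classes to create the corresponding object.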
How about not using argparse at all? It doesn't seem really necessary.
import sys
# print_usage() is assumed to print the usage text and exit
if len(sys.argv) != 2:
    print_usage()
arg = sys.argv[1]
if arg not in ["--k", "--r", "--b", "--p"]:
    print_usage()
# Do whatever you want with arg