argparse - Combining parent parser, subparsers and default values


I wanted to define different subparsers in a script, with both inheriting options from a common parent, but with different defaults. It doesn't work as expected, though.
Here's what I did:
import argparse
# this is the top level parser
parser = argparse.ArgumentParser(description='bla bla')
# this serves as a parent parser
base_parser = argparse.ArgumentParser(add_help=False)
base_parser.add_argument('-n', help='number', type=int)
# subparsers
subparsers = parser.add_subparsers()
subparser1= subparsers.add_parser('a', help='subparser 1',
parents=[base_parser])
subparser1.set_defaults(n=50)
subparser2 = subparsers.add_parser('b', help='subparser 2',
parents=[base_parser])
subparser2.set_defaults(n=20)
args = parser.parse_args()
print args
When I run the script from the command line, this is what I get:
$ python subparse.py b
Namespace(n=20)
$ python subparse.py a
Namespace(n=20)
Apparently, the second set_defaults overwrites the first one in the parent. Since there wasn't anything about it in the argparse documentation (which is pretty detailed), I thought this might be a bug.
Is there some simple solution for this? I could check the args variable afterwards and replace None values with the intended defaults for each subparser, but that's what I expected argparse to do for me.
This is Python 2.7, by the way.

set_defaults loops through the actions of the parser, and sets each default attribute:
def set_defaults(self, **kwargs):
    ...
    for action in self._actions:
        if action.dest in kwargs:
            action.default = kwargs[action.dest]
Your -n argument (an action object) was created when you defined the base_parser. When each subparser is created using parents, that action is added to the ._actions list of each subparser. It doesn't define new actions; it just copies pointers.
So when you use set_defaults on subparser2, you modify the default for this shared action.
This Action is probably the 2nd item in the subparser1._actions list (-h is the first).
subparser1._actions[1].dest # 'n'
subparser1._actions[1] is subparser2._actions[1] # True
If that 2nd statement is True, that means the same action is in both lists.
If you had defined -n individually for each subparser, you would not see this. They would have different action objects.
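You can check both claims with a short sketch like the one below (Python 3). It peeks at the private _actions list, so treat it as a debugging aid rather than supported API; the helper n_action is a made-up name for illustration.

```python
import argparse

# Parent parser: its '-n' Action is shared by reference with every child.
base_parser = argparse.ArgumentParser(add_help=False)
base_parser.add_argument('-n', type=int)

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='cmd')
sub_a = subparsers.add_parser('a', parents=[base_parser])
sub_b = subparsers.add_parser('b', parents=[base_parser])
# A third subparser defines -n directly, so it gets its own Action object.
sub_c = subparsers.add_parser('c')
sub_c.add_argument('-n', type=int)


def n_action(p):
    """Hypothetical helper: find the Action whose dest is 'n' (private API)."""
    return next(act for act in p._actions if act.dest == 'n')


shared = n_action(sub_a) is n_action(sub_b)    # same object
separate = n_action(sub_a) is n_action(sub_c)  # different objects

sub_a.set_defaults(n=50)
sub_b.set_defaults(n=20)  # overwrites the default on the shared Action
sub_c.set_defaults(n=30)  # only affects sub_c's own Action

print(shared, separate)            # True False
print(parser.parse_args(['a']).n)  # 20
print(parser.parse_args(['c']).n)  # 30
```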
I'm working from my knowledge of the code, not anything in the documentation. It was pointed out recently in Cause Python's argparse to execute action for default that the documentation says nothing about add_argument returning an Action object. Those objects are an important part of the code organization, but they don't get much attention in the documentation.
Copying parent actions by reference also creates problems if the 'resolve' conflict handler is used, and the parent needs to be reused. This issue was raised in
argparse conflict resolver for options in subcommands turns keyword argument into positional argument
and Python bug issue:
http://bugs.python.org/issue22401
A possible solution, for both this issue and that, is to (optionally) make a copy of the action, rather than share the reference. That way the option_strings and defaults can be modified in the children without affecting the parent.
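The copy-instead-of-share idea can be approximated without patching argparse. This is a workaround sketch, not an argparse feature: add_parser_copying_parents is a hypothetical helper, it relies on the private _actions list and _add_action method, and it ignores any argument groups or mutually exclusive groups defined on the parent.

```python
import argparse
import copy


def add_parser_copying_parents(subparsers, name, parents, **kwargs):
    """Hypothetical helper: like add_parser(name, parents=...), but each
    parent Action is shallow-copied instead of shared by reference.
    Uses the private ArgumentParser._add_action method."""
    sub = subparsers.add_parser(name, **kwargs)
    for parent in parents:
        for action in parent._actions:
            sub._add_action(copy.copy(action))
    return sub


base_parser = argparse.ArgumentParser(add_help=False)
base_parser.add_argument('-n', help='number', type=int)

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='cmd')
sub_a = add_parser_copying_parents(subparsers, 'a', [base_parser])
sub_b = add_parser_copying_parents(subparsers, 'b', [base_parser])

# Each subparser now owns its copy, so these no longer clash.
sub_a.set_defaults(n=50)
sub_b.set_defaults(n=20)

print(parser.parse_args(['a']).n)  # 50
print(parser.parse_args(['b']).n)  # 20
```

The parent's own Action keeps its original default (None), since only the copies were modified.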

What's happening
The problem here is that parser arguments are objects, and when a parser inherits from its parents, it adds a reference to each parent's action to its own list. When you call set_defaults, it sets the default on this object, which is shared across the subparsers.
You can examine the subparsers to see this:
>>> a1 = [ action for action in subparser1._actions if action.dest=='n' ].pop()
>>> a2 = [ action for action in subparser2._actions if action.dest=='n' ].pop()
>>> a1 is a2 # same object in memory
True
>>> a1.default
20
>>> type(a1)
<class 'argparse._StoreAction'>
First solution: Explicitly add this argument to each subparser
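You can fix this by adding the argument to each subparser separately rather than adding it to the base parser. Other common arguments can stay in base_parser, but -n must be removed from it; otherwise adding -n a second time raises a conflicting option string error.
# base_parser keeps the other common arguments, but no longer defines -n
subparser1 = subparsers.add_parser('a', help='subparser 1',
                                   parents=[base_parser])
subparser1.add_argument('-n', help='number', type=int, default=50)
subparser2 = subparsers.add_parser('b', help='subparser 2',
                                   parents=[base_parser])
subparser2.add_argument('-n', help='number', type=int, default=20)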
...
Second solution: multiple base classes
If there are many subparsers which share the same default value, and you want to avoid repeating the argument definition for each of them, you can create different base classes for each default. Since parents is a list of base classes, you can still group the common parts into another base class, and pass the subparser multiple base classes to inherit from. This is probably unnecessarily complicated.
import argparse
# this is the top level parser
parser = argparse.ArgumentParser(description='bla bla')
# this serves as a parent parser
base_parser = argparse.ArgumentParser(add_help=False)
# add common args
# for group with 50 default
base_parser_50 = argparse.ArgumentParser(add_help=False)
base_parser_50.add_argument('-n', help='number', type=int, default=50)
# for group with 20 default
base_parser_20 = argparse.ArgumentParser(add_help=False)
base_parser_20.add_argument('-n', help='number', type=int, default=20)
# subparsers
subparsers = parser.add_subparsers()
subparser1= subparsers.add_parser('a', help='subparser 1',
parents=[base_parser, base_parser_50])
subparser2 = subparsers.add_parser('b', help='subparser 2',
parents=[base_parser, base_parser_20])
args = parser.parse_args()
print(args)
First solution with shared args
You can also share a dictionary for the arguments and use unpacking to avoid repeating all the arguments:
import argparse
# this is the top level parser
parser = argparse.ArgumentParser(description='bla bla')
n_args = ('-n',)  # one-element tuple of option strings
n_kwargs = {'help': 'number', 'type': int}
# subparsers
subparsers = parser.add_subparsers()
subparser1= subparsers.add_parser('a', help='subparser 1')
subparser1.add_argument(*n_args, default=50, **n_kwargs)
subparser2 = subparsers.add_parser('b', help='subparser 2')
subparser2.add_argument(*n_args, default=20, **n_kwargs)
args = parser.parse_args()
print(args)

I wanted multiple subparsers to inherit common arguments as well, but the parents functionality from argparse gave me issues too as the others have explained. Fortunately, there's a very simple solution: create a function to add the arguments instead of creating a parent.
I pass both subparser1 and subparser2 to a function, parent_parser, which adds the common argument, -n.
import argparse
# this is the top level parser
parser = argparse.ArgumentParser(description='bla bla')
# this function plays the role of a parent parser
def parent_parser(parser_to_update):
    parser_to_update.add_argument('-n', help='number', type=int)
    return parser_to_update
# subparsers
subparsers = parser.add_subparsers()
subparser1 = subparsers.add_parser('a', help='subparser 1')
subparser1 = parent_parser(subparser1)
subparser1.set_defaults(n=50)
subparser2 = subparsers.add_parser('b', help='subparser 2')
subparser2 = parent_parser(subparser2)
subparser2.set_defaults(n=20)
args = parser.parse_args()
print(args)
When I run the script:
$ python subparse.py b
Namespace(n=20)
$ python subparse.py a
Namespace(n=50)
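The same function-based approach extends to many subparsers if the default is passed in and driven from a table. A minimal sketch; add_common_args and the N_DEFAULTS table are illustrative names, not part of argparse.

```python
import argparse


def add_common_args(p, n_default):
    """Illustrative helper: each call creates a fresh '-n' Action on p."""
    p.add_argument('-n', help='number', type=int, default=n_default)
    return p


parser = argparse.ArgumentParser(description='bla bla')
subparsers = parser.add_subparsers(dest='cmd')

# Per-subcommand defaults kept in one table instead of separate set_defaults calls.
N_DEFAULTS = {'a': 50, 'b': 20}
for name, n_default in N_DEFAULTS.items():
    add_common_args(subparsers.add_parser(name, help='subparser ' + name), n_default)

print(parser.parse_args(['a']).n)             # 50
print(parser.parse_args(['b']).n)             # 20
print(parser.parse_args(['a', '-n', '7']).n)  # 7
```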

Related

How to deal with different arguments that may have similarly named dest?

Let's say you use subcommands, and the subcommands store the same data points (attributes with the same name) in the Namespace, but you would like them grouped by subcommand. How can one extend argparse to achieve this without losing any of its standard behavior?
For example:
import argparse
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers()
fooparser = subparser.add_parser('foo')
fooparser.add_argument('rawr', dest='rawr')
barparser = subparser.add_parser('bar')
barparser.add_argument('rawr', dest='rawr')
# It would be nice that in the Namespace object this shows up as the following:
# args: foo 0
# Namespace(foo.rawr=0)
# args: bar 1
# Namespace(bar.rawr=1)
The above example just illustrates my point. The main issue is that, when the above code executes, parse_args() returns a Namespace that just has rawr=N. But my code distinguishes behavior based on the subcommand, so it's important that there be an object with an attribute rawr within the Namespace object. For example:
if args.foo.rawr:
    # do action 1
    pass
if args.bar.rawr:
    # do action 2
    pass
If args only has args.rawr, you cannot tell action 1 from action 2; without the additional nested layer, both are legal.

To save the subcommand name, use .add_subparsers(dest=), like this:
import argparse
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(dest='command')
fooparser = subparser.add_parser('foo')
fooparser.add_argument('rawr')
barparser = subparser.add_parser('bar')
barparser.add_argument('rawr')
for a in ['foo', '0'], ['bar', '1']:
    args = parser.parse_args(a)
    print(args)
    if args.command == 'foo':
        print('doing foo!')
    elif args.command == 'bar':
        print('doing bar!')
Output:
Namespace(command='foo', rawr='0')
doing foo!
Namespace(command='bar', rawr='1')
doing bar!
Thanks to George Shuklin for pointing this out on Medium

Supplying a dest for add_subparsers is desirable, though not required, and the subcommand name it records may be enough to further identify the arguments.
Positionals can take any name you want to supply; you can't supply an extra dest. That name will be used in the args Namespace. Use metavar to control the string used in the help.
For flagged arguments (optionals), use the dest.
import argparse
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(dest='cmd')
fooparser = subparser.add_parser('foo')
fooparser.add_argument('-b','--baz', dest='foo_baz')
fooparser.add_argument('foo_rawr', metavar='rawr')
barparser = subparser.add_parser('bar')
barparser.add_argument('-b','--baz', dest='bar_baz')
barparser.add_argument('bar_rawr', metavar='rawr')
Include a print(args) during debugging to get a clear idea of what the parser does.
In previous SO questions we have discussed using a custom Namespace class and custom Action subclasses to create some sort of nesting or dict-like behavior, but I think that's more work than most people need.
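If you do want args.foo.rawr-style access, a lighter alternative to a custom Namespace class is to post-process the result, relying on the <cmd>_ prefix convention used in the dest values above. A minimal sketch; nest_by_command is a hypothetical helper, and it assumes every subcommand-specific dest starts with the command name plus an underscore.

```python
import argparse

parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(dest='cmd')
fooparser = subparser.add_parser('foo')
fooparser.add_argument('-b', '--baz', dest='foo_baz')
fooparser.add_argument('foo_rawr', metavar='rawr')
barparser = subparser.add_parser('bar')
barparser.add_argument('-b', '--baz', dest='bar_baz')
barparser.add_argument('bar_rawr', metavar='rawr')


def nest_by_command(args):
    """Hypothetical helper: move attributes named '<cmd>_xxx' into a nested
    Namespace stored under the command name, e.g. args.foo.rawr."""
    prefix = args.cmd + '_'
    inner = argparse.Namespace()
    for key, value in list(vars(args).items()):
        if key.startswith(prefix):
            setattr(inner, key[len(prefix):], value)
            delattr(args, key)
    setattr(args, args.cmd, inner)
    return args


args = nest_by_command(parser.parse_args(['foo', '0', '-b', 'x']))
print(args.cmd, args.foo.rawr, args.foo.baz)  # foo 0 x
```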
Docs also illustrate the use of
parser_foo.set_defaults(func=foo)
to set an extra argument based on the subparser. In this example the value may be an actual function object. The use of the dest is also mentioned in the docs, though perhaps as too much of an afterthought.
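A minimal sketch of that set_defaults(func=...) pattern applied to the foo/bar example; do_foo and do_bar are placeholder handlers made up for illustration.

```python
import argparse


def do_foo(args):
    # Placeholder handler for the 'foo' subcommand.
    return 'foo got ' + args.rawr


def do_bar(args):
    # Placeholder handler for the 'bar' subcommand.
    return 'bar got ' + args.rawr


parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='cmd')
fooparser = subparsers.add_parser('foo')
fooparser.add_argument('rawr')
fooparser.set_defaults(func=do_foo)   # only set when 'foo' is chosen
barparser = subparsers.add_parser('bar')
barparser.add_argument('rawr')
barparser.set_defaults(func=do_bar)   # only set when 'bar' is chosen

args = parser.parse_args(['bar', '1'])
print(args.func(args))  # bar got 1
```

Dispatching through args.func avoids the if/elif chain on the command name entirely.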

Python argparse subparser dest parameter doesn't work with a parent

I'm writing a utility that will have multiple modules, and which module gets run is determined by an argument. Each module has its own arguments, but all modules will share 4 standard arguments. To get this to work I just set the 'parents' param when creating the subparsers, but the problem is I also need to be able to determine which module was called on the command line. It looks like the 'dest' param is the way to do this, but for some reason having both 'parents' and 'dest' set at the same time does not work.
import argparse
parser = argparse.ArgumentParser() # main parser
parser.add_argument("--foo", action='store_true')
subparsers = parser.add_subparsers(dest='cmd')
# without 'parents=[parser]' it properly stores 'bar' in cmd
# however '--foo' MUST be before 'bar'
bar = subparsers.add_parser("bar", parents=[parser], add_help=False)
bar.add_argument("--test", action='store_true')
# should be able to have '--foo' before OR after 'bar'
parser.parse_args(['--foo', 'bar', '--test'])
In this code, the add_subparsers call sets the dest to 'cmd'. Then I could parse the arguments and read args.cmd to get the name of the module called (in this case, bar). However, when parents is set, the value of cmd is always None. Currently my workaround is to just have an empty main parser and simply copy-paste the 4 standard args to every subparser, which works but is not exactly desirable.
My question: Is there another way to determine which module was called? Why does this even happen?

Thanks to the information provided by @hpaulj in the comment to the OP, I managed to get this working.
Basically, you need your main parser and a parent parser. You will then set the parents attribute of your subparsers to be the parent parser. Based on the example you gave, the following should be a working example:
import argparse
# Create parsers
parser = argparse.ArgumentParser()
parent_parser = argparse.ArgumentParser()
# Add arguments to parent parser
parent_parser.add_argument("--foo", action='store_true')
# Create subparser
subparsers = parser.add_subparsers(dest='cmd')
# Add to the subparser
bar = subparsers.add_parser("bar", parents=[parent_parser], add_help=False)
bar.add_argument("--test", action='store_true')
baz = subparsers.add_parser("baz", parents=[parent_parser], add_help=False)
baz.add_argument("--baz-test", action="store_true")
# '--foo' now comes from parent_parser, so it goes after the subcommand name
print(parser.parse_args(['bar', '--test']))
print(parser.parse_args(["baz", "--baz-test"]))
This outputs the following:
Namespace(cmd='bar', foo=False, test=True)
Namespace(baz_test=True, cmd='baz', foo=False)
You should then be able to do things like this:
args = parser.parse_args()
if args.cmd == "bar":
    print("bar was specified")
elif args.cmd == "baz":
    print("baz was specified")
It might not be the perfect solution, but it should work.
(Tested using Python 3.5.2)

Set default for all subparsers on top level parser

I have an argparse parser with several subcommands some of which share an option (via a parent parser). Now I want to set a default value for such an option regardless of which subparser will be executed in the end. My non working code looks like this:
from argparse import ArgumentParser
base = ArgumentParser(add_help=False)
base.add_argument('--foo', action='store_true')
parser = ArgumentParser()
subparsers = parser.add_subparsers(dest='action')
s1 = subparsers.add_parser('a', parents=[base])
s2 = subparsers.add_parser('b', parents=[base])
parser.set_defaults(foo=42)
print(parser.parse_args(['a']))
s1.set_defaults(foo=43)
print(parser.parse_args(['a']))
This prints
Namespace(action='a', foo=False)
Namespace(action='a', foo=43)
I have many subparsers and many options, so I want to avoid saving every subparser by name and calling set_defaults on it. Can that be done?
I will know the value I want to set there only after creating all the parsers so I can not specify the defaults in the call to add_argument.
Background: what I am actually working on
The defaults I want to set come from a config file. I actually have two parsers, one to find the config file first and one to parse the subcommands. But I need to define both parsers up front in order to overwrite the help method of the first parser with the help method of the second parser in order to display the full --help text before parsing the config (because that might fail and I could not display the help text). A reduced version of my code looks like this:
import argparse
base = argparse.ArgumentParser(add_help=False)
base.add_argument("--config", help="config file to use")
p1 = argparse.ArgumentParser(parents=[base])
p1.add_argument('remainder', nargs=argparse.REMAINDER)
p2 = argparse.ArgumentParser(parents=[base])
s = p2.add_subparsers(dest='action')
s1 = s.add_parser('a') # add some options
s2 = s.add_parser('b') # add some options
# and so on
p1.print_help = p2.print_help
a1 = p1.parse_args()
config = load_my_config(a1.config)
p2.set_defaults(**config.get_my_defaults())
a2 = p2.parse_args(a1.remainder)

I found a solution to listing all the subparsers. The solution is not to remember all the variables for the different sub-parsers but only the _SubParsersAction object that was used to create them:
import argparse
p = argparse.ArgumentParser()
s = p.add_subparsers()
a = s.add_parser('a')
a.add_argument(...)
b = s.add_parser('b')
c = s.add_parser('c')
...
# now I don't need to remember all the variables a, b, c, ...
# in order to set the defaults on all of theses sub-parsers
config = load_my_config_file()
defaults = config.get_defaults()
for name, sparser in s.choices.items():
    print("Setting defaults on sub parser for '{}'".format(name))
    sparser.set_defaults(**defaults)
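Here is a self-contained version of that loop using the option names from the question. The defaults dictionary is a stand-in for the hypothetical load_my_config_file()/get_defaults() calls, and choices is the name-to-parser mapping held by the object add_subparsers() returns.

```python
import argparse

base = argparse.ArgumentParser(add_help=False)
base.add_argument('--foo', action='store_true')

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='action')
subparsers.add_parser('a', parents=[base])
subparsers.add_parser('b', parents=[base])

# Stand-in for values read from a config file after all parsers exist.
defaults = {'foo': 42}

# No need to keep the subparser variables: iterate over name -> parser.
for name, sparser in subparsers.choices.items():
    sparser.set_defaults(**defaults)

print(parser.parse_args(['a']).foo, parser.parse_args(['b']).foo)  # 42 42
```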

Argparse: mixing parent parser with subparsers

I want to write a simple tool that takes an arbitrary number of input files and performs one operation on each of them. The syntax is stupidly simple:
mytool operation input1 input2 ... inputN
Some of these operations may require an extra argument
mytool operation op_argument input1 input2 ... inputN
In addition to this I'd like the users to be able to specify whether the operations should be performed in place, and to specify the target directory of the output.
mytool -t $TARGET --in-place operation op_argument input1 input2 input3
And as a very last requirement, I'd like users to be able to get help on each operation individually, as well as on the usage of the tool as a whole.
Here's my attempt at designing an Argument Parser for said tool, together with a Minimal, Complete, Verifiable Example:
#!/usr/bin/env python
import argparse
from collections import namedtuple
Operations = namedtuple('Ops', 'name, argument, description')
IMPLEMENTED_OPERATIONS = {'echo': Operations('echo',
                                             None,
                                             'Echo inputs'),
                          'fancy': Operations('fancy',
                                              'fancyarg',
                                              'Do fancy stuff')}
if __name__ == "__main__":
    # Parent parser with common stuff.
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument('-t', '--target-directory', type=str, default='.',
                        help="An output directory to store output files.")
    parent.add_argument('-i', '--in-place', action='store_true',
                        help="After successful execution, delete the inputs.")
    # The inputfiles should be the very last positional argument.
    parent.add_argument('inputfiles', nargs='*', type=argparse.FileType('r'),
                        help="A list of input files to operate on.")
    # Top level parser.
    top_description = "This is mytool. It does stuff"
    parser = argparse.ArgumentParser(prog="mytool",
                                     description=top_description,
                                     parents=[parent])
    # Operation parsers.
    subparsers = parser.add_subparsers(help='Sub-command help', dest='chosen_op')
    op_parsers = {}
    for op_name, op in IMPLEMENTED_OPERATIONS.items():
        op_parsers[op_name] = subparsers.add_parser(op_name,
                                                    description=op.description,
                                                    parents=[parent])
        if op.argument is not None:
            op_parsers[op_name].add_argument(op.argument)
    args = parser.parse_args()
    op_args = {}
    for key, subparser in op_parsers.items():
        op_args[key] = subparser.parse_args()
    print(args.chosen_op)
The problem I have is that the order of the positional arguments is wrong. Somehow, the way I implemented this makes Argparse think that the operation (and its op_argument) should come after the input files, which is obviously not the case.
How can I have the parent positional argument, in my case the inputfiles, as the last positional argument?

To the main parser, subparsers is just another positional argument, but with a unique nargs (argparse.PARSER, i.e. 'A...'). So it will look for the inputfiles arguments first, and then allocate any leftovers to subparsers.
Mixing positionals with subparsers is tricky. It is best to define inputfiles as an argument for each subparser.
parents can make it easy to add the same set of arguments to several subparsers; however, those arguments will be added first.
So I think you want:
for op_name, op in IMPLEMENTED_OPERATIONS.items():
    op_parsers[op_name] = subparsers.add_parser(op_name,
                                                description=op.description,
                                                parents=[parent])
    if op.argument is not None:
        op_parsers[op_name].add_argument(op.argument)
    op_parsers[op_name].add_argument('inputfiles', nargs='*', type=argparse.FileType('r'),
                                     help="A list of input files to operate on.")
As for the help, the normal behavior is to get help for the main parser, or for each subparser. Combining those into one display has been the topic of several SO questions. It's possible but not easy.
The main parser handles input strings in order: flags, positionals, etc. When it handles the subparsers positional, it hands the task off to the named subparser, along with all remaining command-line strings. The subparser then acts like a new independent parser, and returns a namespace to the main parser to be incorporated into the main namespace. The main parser does not resume parsing the command line, so the subparser action is always last.
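A minimal runnable sketch of the suggested layout, reduced to the question's two operations. To keep it self-contained, inputfiles uses plain strings instead of argparse.FileType, so no files are opened. With this layout the -t/-i options are written after the operation name, because they belong to each subparser.

```python
import argparse

# Options shared by every operation; via parents they are added first.
parent = argparse.ArgumentParser(add_help=False)
parent.add_argument('-t', '--target-directory', default='.')
parent.add_argument('-i', '--in-place', action='store_true')

parser = argparse.ArgumentParser(prog='mytool')
subparsers = parser.add_subparsers(dest='chosen_op')
for op_name, op_argument in [('echo', None), ('fancy', 'fancyarg')]:
    sp = subparsers.add_parser(op_name, parents=[parent])
    if op_argument is not None:
        sp.add_argument(op_argument)
    # Added last on each subparser, so it takes the trailing strings.
    sp.add_argument('inputfiles', nargs='*')

args = parser.parse_args(['fancy', '-t', 'out', 'ARG', 'in1', 'in2'])
print(args.chosen_op, args.fancyarg, args.inputfiles, args.target_directory)
# fancy ARG ['in1', 'in2'] out
```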

How can I get Python ArgParse to stop overwriting positional arguments in child parser

I am attempting to get my script working, but argparse keeps overwriting my positional arguments from the parent parser. How can I get argparse to honor the parent's value for these? It does keep values from optional args.
Here is a very simplified version of what I need. If you run this, you will see that the args are overwritten.
testargs.py
#! /usr/bin/env python3
import argparse
import sys
def main():
    preparser = argparse.ArgumentParser(add_help=False)
    preparser.add_argument('first',
                           nargs='?')
    preparser.add_argument('outfile',
                           nargs='?',
                           type=argparse.FileType('w', encoding='utf-8'),
                           default=sys.stdout,
                           help='Output file')
    preparser.add_argument(
        '--do-something', '-d',
        action='store_true')
    # Parse args with preparser, and find config file
    args, remaining_argv = preparser.parse_known_args()
    print(args)
    parser = argparse.ArgumentParser(
        parents=[preparser],
        description=__doc__)
    parser.add_argument(
        '--clear-screen', '-c',
        action='store_true')
    args = parser.parse_args(args=remaining_argv, namespace=args)
    print(args)
if __name__ == '__main__':
    main()
And call it with testargs.py something /tmp/test.txt -d -c
You will see it keeps the -d but drops both the positional args and reverts them to defaults.
EDIT: see additional comments in the accepted answer for some caveats.

When you specify parents=[preparser], it means that parser is an extension of preparser, and it will try to parse all the arguments relevant to preparser, which it is never given.
Let's say preparser has only one positional argument, first, and parser has only one positional argument, second. When you make parser a child of preparser, it expects both arguments:
import argparse
parser1 = argparse.ArgumentParser(add_help=False)
parser1.add_argument("first")
parser2 = argparse.ArgumentParser(parents=[parser1])
parser2.add_argument("second")
args2 = parser2.parse_args(["arg1","arg2"])
assert args2.first == "arg1" and args2.second == "arg2"
However, the arguments left over from parser1 would just be ['arg2'], which is not a valid argument list for parser2:
parser1 = argparse.ArgumentParser(add_help=False)
parser1.add_argument("first")
args1, remaining_args = parser1.parse_known_args(["arg1","arg2"])
parser2 = argparse.ArgumentParser(parents=[parser1])
parser2.add_argument("second")
>>> args1
Namespace(first='arg1')
>>> remaining_args
['arg2']
>>> parser2.parse_args(remaining_args)
usage: test.py [-h] first second
test.py: error: the following arguments are required: second
To only process the arguments that were not handled by the first pass, do not specify it as the parent to the second parser:
parser1 = argparse.ArgumentParser(add_help=False)
parser1.add_argument("first")
args1, remaining_args = parser1.parse_known_args(["arg1","arg2"])
parser2 = argparse.ArgumentParser()  # NO PARENT: no parents=[parser1] here
parser2.add_argument("second")
args2 = parser2.parse_args(remaining_args, namespace=args1)
assert args2.first == "arg1" and args2.second == "arg2"

The 2 positionals are nargs='?'. A positional like that is always 'seen', since an empty list matches that nargs.
First time through, '/tmp/test.txt' matches with first and is put in the Namespace. Second time through there isn't any string to match, so the default is used, the same as if you had not given that string the first time.
If I change first to have the default nargs, I get
error: the following arguments are required: first
from the 2nd parser. Even though there's a value in the Namespace it still tries to get a value from the argv. (it's like a default, but not quite).
Defaults for positionals with nargs='?' (or '*') are tricky. They are optional, but not in quite the same way as optionals. The positional Actions are still called, but with an empty list of values.
I don't think the parents feature does anything for you. preparser already handles that set of arguments; there's no need to handle them again in parser, especially since all the relevant argument strings have been stripped out.
Another option is to leave the parents in, but use the default sys.argv[1:] in the 2nd parser. (but beware of side effects like opening files)
args = parser.parse_args(namespace=args )
A third option is to parse the arguments independently and merge them with a dictionary update.
adict = vars(preparse_args)
adict.update(vars(parser_args))
# taking some care in who overrides who
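The third option can be sketched as follows, cut down from the question's script (outfile is left out so no file gets opened). The argv list stands in for sys.argv[1:], and the update order decides which parser wins if both define the same dest.

```python
import argparse

preparser = argparse.ArgumentParser(add_help=False)
preparser.add_argument('first', nargs='?')
preparser.add_argument('--do-something', '-d', action='store_true')

# Independent second parser: no parents, so it only sees what is left over.
parser = argparse.ArgumentParser()
parser.add_argument('--clear-screen', '-c', action='store_true')

argv = ['something', '-d', '-c']  # stand-in for sys.argv[1:]
pre_args, remaining = preparser.parse_known_args(argv)
main_args = parser.parse_args(remaining)

merged = vars(pre_args).copy()
merged.update(vars(main_args))  # the second parser wins on any shared key
args = argparse.Namespace(**merged)
print(args.first, args.do_something, args.clear_screen)  # something True True
```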
For more details look in argparse.py file at ArgumentParser._get_values, specifically the not arg_strings cases.
A note about the FileType. That type works nicely for small scripts where you will use the files right away and exit. It isn't so good on large programs where you might want to close the file after use (close stdout???), or use files in a with context.
edit - note on parents
add_argument creates an Action object, and adds it to the parser's list of actions. parse_args basically matches input strings with these actions.
parents just copies those Action objects (by reference) from parent to child. To the child parser it is just as though the actions were created with add_argument directly.
parents is most useful when you are importing a parser and don't have direct access to its definition. If you are defining both parent and child, then parents just saves you some typing/cut-n-paste.
This and other SO questions (mostly triggered by the by-reference copy) show that the developers did not intend you to use both the parent and child to do parsing. It can be done, but there are glitches that they did not consider.
===================
I can imagine defining a custom Action class that would 'behave' in a situation like this. It might, for example, check the namespace for some not default value before adding its own (possibly default) value.
Consider, for example if I changed the action of first to 'append':
preparser.add_argument('first', action='append', nargs='?')
The result is:
1840:~/mypy$ python3 stack37147683.py /tmp/test.txt -d -c
Namespace(do_something=True, first=['/tmp/test.txt'], outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)
Namespace(clear_screen=True, do_something=True, first=['/tmp/test.txt', None], outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)
From the first parser, first=['/tmp/test.txt']; from the second, first=['/tmp/test.txt', None].
Because of the append, the item from the first is preserved, and a new default has been added by the second parser.
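The custom Action idea mentioned above could look roughly like this. KeepExisting is a hypothetical class and only a sketch: it treats "the value equals the default" as "nothing was supplied on this pass", so an explicitly typed value equal to the default would also be ignored.

```python
import argparse


class KeepExisting(argparse.Action):
    """Hypothetical action for an optional positional (nargs='?'): keep a
    value already in the namespace instead of overwriting it with the default."""

    def __call__(self, parser, namespace, values, option_string=None):
        current = getattr(namespace, self.dest, None)
        if values == self.default and current not in (None, self.default):
            return  # nothing new on this pass; keep the earlier value
        setattr(namespace, self.dest, values)


preparser = argparse.ArgumentParser(add_help=False)
preparser.add_argument('first', nargs='?', action=KeepExisting)
args, remaining = preparser.parse_known_args(['something', '-c'])

parser = argparse.ArgumentParser(parents=[preparser])
parser.add_argument('-c', action='store_true')
args = parser.parse_args(remaining, namespace=args)
print(args.first, args.c)  # something True
```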
