(A simplified form of the problem.) I'm writing an API involving some Python components. These might be functions, but for concreteness let's say they're objects. I want to be able to parse options for the various components from the command line.
from argparse import ArgumentParser

class Foo(object):
    def __init__(self, foo_options):
        """do stuff with options"""
        """..."""

class Bar(object):
    def __init__(self, bar_options):
        """..."""

def foo_parser():
    """(could also be a Foo method)"""
    p = ArgumentParser()
    p.add_argument('--option1')
    # ...
    return p

def bar_parser(): "..."
But now I want to be able to build larger components:
def larger_component(options):
    f1 = Foo(options.foo1)
    f2 = Foo(options.foo2)
    b = Bar(options.bar)
    # ... do stuff with these pieces
Fine. But how to write the appropriate parser? We might wish for something like this:
def larger_parser():  # probably needs to take some prefix/ns arguments
    # general options to be overridden by p1, p2
    # (this could be done automagically or by hand in `larger_component`):
    p = foo_parser(prefix=None, namespace='foo')
    p1 = foo_parser(prefix='first-foo-', namespace='foo1')
    p2 = foo_parser(prefix='second-foo-', namespace='foo2')
    b = bar_parser()
    # (you wouldn't actually specify the prefix/namespace twice:)
    return combine_parsers([(p1, namespace='foo1', prefix='first-foo-'),
                            (p2, ...), p, b])
larger_component(larger_parser().parse_args())
# CLI should accept --foo1-option1, --foo2-option1, --option1 (*)
which looks a bit like argparse's parents feature, if you forget that we want prefixing (so as to be able to add multiple parsers of the same type) and probably namespacing (so that we can build tree-structured namespaces that reflect the structure of the components).
Of course, we want larger_component and larger_parser to be composable in the same way, and the namespace object passed to a certain component should always have the same internal shape/naming structure.
The trouble seems to be that the argparse API is basically about mutating your parsers, while querying them is more difficult - if you turned a datatype into a parser directly, you could just walk those objects. I managed to hack together something that somewhat works if the user writes a bunch of functions that add arguments to parsers by hand, but each add_argument call must then take a prefix, and the whole thing becomes quite inscrutable and probably non-composable. (You could abstract over this at the cost of duplicating some parts of the internal data structures ...) I also tried subclassing the parser and group objects ...
You could imagine this might be possible using a more algebraic CLI-parsing API, but I don't think rewriting argparse is a good solution here.
Is there a known/straightforward way to do this?
Some thoughts that may help you construct the larger parser:
parser = argparse.ArgumentParser(...)
arg1 = parser.add_argument('--foo',...)
Now arg1 is a reference to the Action object created by add_argument. I'd suggest doing this in an interactive shell and looking at its attributes. Or at least print its repr. You can also experiment with modifying attributes.
Most of what a parser 'knows' about the arguments is contained in these actions. In a sense a parser is an object that 'contains' a bunch of 'actions'.
Look also at:
parser._actions
This is the parser's master list of actions, which will include the default help as well as the ones you add.
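To make that concrete, here is a minimal sketch (an assumption on my part, not an official API: it walks the private _actions list, which may change between versions) that re-adds every long option of an existing parser under a prefix. Only default and help are copied; a real version would also need type, nargs, choices, and so on:

import argparse

def prefixed_clone(src, prefix):
    """Re-add each long option of `src` to a new parser under `prefix`."""
    dst = argparse.ArgumentParser(add_help=False)
    for action in src._actions:
        if action.dest == 'help' or not action.option_strings:
            continue  # skip -h/--help and positionals
        opts = [s.replace('--', '--' + prefix, 1)
                for s in action.option_strings if s.startswith('--')]
        if not opts:
            continue  # short-only options not handled in this sketch
        dst.add_argument(*opts,
                         dest=(prefix + action.dest).replace('-', '_'),
                         default=action.default, help=action.help)
    return dst

# with the foo_parser from the question:
p1 = prefixed_clone(foo_parser(), 'first-foo-')
print(p1.parse_args(['--first-foo-option1', 'x']))
# Namespace(first_foo_option1='x')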
The parents mechanism copies Action references from the parent to the child. Note, it does not make copies of the Action objects. It also recreates argument groups - but these groups only serve to group help lines. They have nothing to do with parsing.
args1, extras = parser.parse_known_args(argv, namespace)
is very useful when dealing with multiple parsers. With it, each parser can handle the arguments it knows about, and pass the rest on to others. Try to understand the inputs and outputs to that method.
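For instance, with the --foo1-/--foo2- style options from the question, each parser consumes what it recognizes and passes the rest along:

import argparse

p1 = argparse.ArgumentParser(add_help=False)
p1.add_argument('--foo1-option1')

p2 = argparse.ArgumentParser(add_help=False)
p2.add_argument('--foo2-option1')

argv = ['--foo1-option1', 'a', '--foo2-option1', 'b']
args1, rest = p1.parse_known_args(argv)
args2, rest = p2.parse_known_args(rest)
print(args1, args2, rest)
# Namespace(foo1_option1='a') Namespace(foo2_option1='b') []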
We have talked about composite Namespace objects in earlier SO questions. The default argparse.Namespace class is a simple object class with a repr method. The parser just uses hasattr, getattr and setattr, trying to be as non-specific as it can. You could construct a more elaborate namespace class.
argparse subcommands with nested namespaces
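As a hedged sketch of that idea (not a stock argparse feature): a Namespace subclass can split dotted dest names into nested namespaces, because the parser itself only ever uses getattr, setattr and hasattr:

import argparse

class NestedNamespace(argparse.Namespace):
    def __setattr__(self, name, value):
        if '.' in name:
            head, rest = name.split('.', 1)
            ns = getattr(self, head, None)
            if ns is None:
                ns = NestedNamespace()
                object.__setattr__(self, head, ns)
            setattr(ns, rest, value)  # recurse for deeper nesting
        else:
            object.__setattr__(self, name, value)

p = argparse.ArgumentParser()
p.add_argument('--first-foo-option1', dest='foo1.option1')
ns = p.parse_args(['--first-foo-option1', 'x'], namespace=NestedNamespace())
print(ns.foo1.option1)  # x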
You can also customize the Action classes. That's where most values are inserted into the Namespace (though defaults are set elsewhere).
IPython uses argparse, both for the main call, and internally for magic commands. It constructs many arguments from config files. Thus it is possible to set many values either with default configs, custom configs, or at the last moment via the commandline arguments.
You might be able to use the concept of composing actions to achieve the functionality that you need. You can build actions that modify the namespace, dest, etc as you need and then compose them with:
import argparse

def compose_actions(*actions):
    """Compose many argparse actions into one callable action.

    Args:
        *actions: The Action classes to compose.

    Returns:
        argparse.Action: Composed action class.
    """
    class ComposableAction(argparse.Action):
        def __call__(self, parser, namespace, values, option_string=None):
            for action in actions:
                # instantiate each composed Action with this option string
                # and dest, then invoke it in turn
                action(option_string, self.dest)(parser, namespace,
                                                 values, option_string)
    return ComposableAction
See example: https://gist.github.com/mnm364/edee068a5cebbfac43547b57b7c842f1
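A possible usage, under the assumption (as in the gist) that each composed Action class only relies on self.dest, since the composition re-instantiates it with just (option_string, dest):

import argparse

class Log(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        print('seen', option_string, values)

class Upper(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values.upper())

p = argparse.ArgumentParser()
p.add_argument('--name', action=compose_actions(Log, Upper))
print(p.parse_args(['--name', 'alice']))
# seen --name alice
# Namespace(name='ALICE')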
Related
For a project involving multiple scripts (data processing, model tuning, model training, testing, etc.) I keep all my <class 'argparse.ArgumentParser'> objects as the return values of functions, one per script task, in a module I name cli.py. However, in the calling script itself (__main__), after calling args = parser.parse_args(), typing args. in VS Code gives me no suggested attributes specific to that object. How can I get suggested attributes of argparse.Namespace objects in VS Code?
E.g.,
"""Module for command line interface for different scripts."""
import argparse
def some_task_parser(description: str) -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description=description)
parser.add_argument('some_arg')
return parser
then
"""Script for executing some task."""
from cli import some_task_parser
if __name__ == '__main__':
parser = some_task_parser('arbitrary task')
args = parser.parse_args()
print(args.) # I WANT SUGGESTED ATTRIBUTES (SUCH AS args.some_arg) TO POP UP!!
I don't know about VSCode, but in an ipython session the tab complete does show the available attributes of a Namespace. argparse.Namespace is a relatively simple object class. What you want, I think, are just the attributes of this object.
In [238]: args = argparse.Namespace(test='one', foo='three', bar=3)
In [239]: args
Out[239]: Namespace(bar=3, foo='three', test='one')
In [240]: args.<TAB>
args.bar   args.foo   args.test
But those attributes will not be available during development. They are put there by the parsing action, at run time, and cannot be reliably deduced from the parser setup. Certainly argparse does not provide such help.
I often recommend that developers include a
print(args)
during the debugging phase, so they get a clear idea of what the parser does.
Use args.__dict__ to get the names and values. If you need just the argument names, list(args.__dict__.keys()) will give them.
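For example (vars() is the documented way to get the Namespace contents as a dict):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('some_arg')
args = parser.parse_args(['hello'])
print(args)              # Namespace(some_arg='hello')
print(vars(args))        # {'some_arg': 'hello'}
print(list(vars(args)))  # ['some_arg']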
Many languages support ad-hoc polymorphism (a.k.a. function overloading) out of the box. However, it seems that Python opted out of it. Still, I can imagine there might be a trick or a library that is able to pull it off in Python. Does anyone know of such a tool?
For example, in Haskell one might use this to generate test data for different types:
-- In some testing library:
class Randomizable a where
  genRandom :: a

-- Overload for different types
instance Randomizable String where genRandom = ...
instance Randomizable Int where genRandom = ...
instance Randomizable Bool where genRandom = ...

-- In some client project, we might have a custom type:
instance Randomizable VeryCustomType where genRandom = ...
The beauty of this is that I can extend genRandom for my own custom types without touching the testing library.
How would you achieve something like this in Python?
Python is dynamically typed, so it really doesn't matter whether you have an instance of Randomizable or an instance of some other class that has the same methods.
One way to get the appearance of what you want could be this:
types_ = {}

def registerType(dtype, cls):
    types_[dtype] = cls

def RandomizableT(dtype):
    return types_[dtype]
Firstly, yes, I did define a function with a capital letter, but it's meant to act more like a class. For example:
registerType(int, TheLibrary.Randomizable)
registerType(str, MyLibrary.MyStringRandomizable)
Then, later:
type = ...  # get whatever type you want to randomize
randomizer = RandomizableT(type)()
print(randomizer.getRandom())
A Python function cannot be automatically specialised based on static compile-time typing. Therefore its result can only depend on its arguments received at run-time and on the global (or local) environment, unless the function itself is modifiable in-place and can carry some state.
Your generic function genRandom takes no arguments besides the typing information. Thus in Python it should at least receive the type as an argument. Since built-in classes cannot be modified, the generic function (instance) implementation for such classes should be somehow supplied through the global environment or included into the function itself.
I've found out that since Python 3.4 there is the @functools.singledispatch decorator. However, it works only for functions which receive a type instance (an object) as the first argument, so it is not clear how it could be applied in your example. I am also a bit confused by its rationale:
In addition, it is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects.
I understand that anti-pattern is a jargon term for a pattern which is considered undesirable (and does not at all mean the absence of a pattern). The rationale thus claims that inspecting types of arguments is undesirable, and this claim is used to justify introducing a tool that will simplify ... dispatching on the type of an argument. (Incidentally, note that according to PEP 20, "Explicit is better than implicit.")
The "Alternative approaches" section of PEP 443 "Single-dispatch generic functions" however seems worth reading. There are several references to possible solutions, including one to "Five-minute Multimethods in Python" article by Guido van Rossum from 2005.
Does this count as ad hoc polymorphism?
class A:
    def __init__(self):
        pass

    def aFunc(self):
        print("In A")

class B:
    def __init__(self):
        pass

    def aFunc(self):
        print("In B")

f = A()
f.aFunc()
f = B()
f.aFunc()
output
In A
In B
Another version of polymorphism
from module import aName
If two modules use the same interface, you could import either one and use it in your code.
One example of this is from xml.etree.ElementTree import XMLParser
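A common real-world instance of the same pattern is choosing between drop-in compatible JSON modules (assuming simplejson is installed):

try:
    import simplejson as json  # same interface, sometimes faster
except ImportError:
    import json

print(json.loads('{"a": 1}'))  # works identically with either module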
I am converting a Bash shell installer utility to Python 2.7 and need to implement a complex CLI so that I can parse tens of parameters (potentially up to ~150). These are names of Puppet class variables, in addition to the dozen generic deployment options that were available in the shell version.
However, after I started to add more variables I faced several challenges:
1. I need to group parameters into separate dictionaries so that deployment options are separated from Puppet variables. If they are thrown into the same bucket, I will have to write some logic to sort them, potentially renaming parameters, and then dictionary merges will not be trivial.
2. There might be variables with the same name but belonging to different Puppet classes, so I thought subcommands would let me filter what goes where and avoid name collisions.
At the moment I have implemented parameter parsing by simply adding multiple parsers:
import argparse

# ANSWERS and StoreBool are defined elsewhere in the script
parser = argparse.ArgumentParser(description='deployment parameters.')
env_select = parser.add_argument_group(None, 'Environment selection')
env_select.add_argument('-c', '--client_id', help='Client name to use.')
env_select.add_argument('-e', '--environment', help='Environment name to use.')
setup_type = parser.add_argument_group(None, 'What kind of setup should be done:')
setup_type.add_argument('-i', '--install', choices=ANSWERS, metavar='',
                        action=StoreBool, help='Yy/Nn Do normal install and configuration')
# MORE setup options
...
args, unk = parser.parse_known_args()
config['deploy_cfg'].update(args.__dict__)

pup_class1_parser = argparse.ArgumentParser(description=None)
pup_class1 = pup_class1_parser.add_argument_group(None, 'Puppet variables')
pup_class1.add_argument('--ad_domain', help='AD/LDAP domain name.')
pup_class1.add_argument('--ad_host', help='AD/LDAP server name.')
# Rest of the parameters
args, unk = pup_class1_parser.parse_known_args()
config['pup_class1'] = {}
config['pup_class1'].update(args.__dict__)
# Same for class2, class3 and so on.
The problem with this approach is that it does not solve issue 2. Also, the first parser consumes the -h option, so the rest of the parameters are not shown in the help.
I also tried the example selected as an answer, but I was not able to use both commands at once.
import argparse
import pprint

def parse_extra(parser, namespace):
    """Take the 'extra' attribute from the global namespace and re-parse it
    to create separate namespaces for all other chained commands."""
    namespaces = []
    extra = namespace.extra
    while extra:
        n = parser.parse_args(extra)
        extra = n.extra
        namespaces.append(n)
    return namespaces

pp = pprint.PrettyPrinter(indent=4)

argparser = argparse.ArgumentParser()
subparsers = argparser.add_subparsers(help='sub-command help', dest='subparser_name')

parser_a = subparsers.add_parser('command_a', help="command_a help")
## Setup options for parser_a
parser_a.add_argument('--opt_a1', help='option a1')
parser_a.add_argument('--opt_a2', help='option a2')

parser_b = subparsers.add_parser('command_b', help="command_b help")
## Setup options for parser_b
parser_b.add_argument('--opt_b1', help='option b1')
parser_b.add_argument('--opt_b2', help='option b2')

## Add nargs="*" for zero or more other commands
argparser.add_argument('extra', nargs="*", help='Other commands')

namespace = argparser.parse_args()
pp.pprint(namespace)
extra_namespaces = parse_extra(argparser, namespace)
pp.pprint(extra_namespaces)
Results me in:
$ python argtest.py command_b --opt_b1 b1 --opt_b2 b2 command_a --opt_a1 a1
usage: argtest.py [-h] {command_a,command_b} ... [extra [extra ...]]
argtest.py: error: unrecognized arguments: command_a --opt_a1 a1
I got the same result when I tried to define a parent with two child parsers.
QUESTIONS
Can I somehow use parser.add_argument_group for argument parsing, or is it just for grouping in the help printout? That would solve issue 1 without the help side effect. Passing it as parse_known_args(namespace=argument_group) (if I recall my experiments correctly) gets all the variables (that's OK) but also gets all the Python object internals into the resulting dict (that's bad for hieradata YAML).
What am I missing in the second example that would allow multiple subcommands? Or is that impossible with argparse?
Any other suggestions for grouping command line variables? I have looked at Click, but did not find any advantages over standard argparse for my task.
Note: I am a sysadmin, not a programmer, so be gentle with me about the non-object-style coding. :)
Thank you
RESOLVED
Argument grouping solved via the answer suggested by hpaulj.
import argparse
import pprint

parser = argparse.ArgumentParser()
group_list = ['group1', 'group2']

group1 = parser.add_argument_group('group1')
group1.add_argument('--test11', help="test11")
group1.add_argument('--test12', help="test12")

group2 = parser.add_argument_group('group2')
group2.add_argument('--test21', help="test21")
group2.add_argument('--test22', help="test22")

args = parser.parse_args()

pp = pprint.PrettyPrinter(indent=4)
d = {}
for group in parser._action_groups:
    if group.title in group_list:
        d[group.title] = {a.dest: getattr(args, a.dest, None)
                          for a in group._group_actions}

print "Parsed arguments"
pp.pprint(d)
This gets me the desired result for issue No. 1, until I have multiple parameters with the same name. The solution may look ugly, but at least it works as expected.
python argtest4.py --test22 aa --test11 yy11 --test21 aaa21
Parsed arguments
{ 'group1': { 'test11': 'yy11', 'test12': None},
'group2': { 'test21': 'aaa21', 'test22': 'aa'}}
Your question is too complicated to understand and respond to in one try. But I'll throw out some preliminary ideas.
Yes, argument_groups are just a way of grouping arguments in the help. They have no effect on parsing.
Another recent SO asked about parsing groups of arguments:
Is it possible to only parse one argument group's parameters with argparse?
That poster initially wanted to use a group as a parser, but the argparse class structure does not allow that. argparse is written in object style: parser = ArgumentParser(...) creates one class of object, parser.add_argument(...) creates another, add_argument_group(...) yet another. You customize it by subclassing ArgumentParser, HelpFormatter, or Action classes, etc.
I mentioned a parents mechanism. You define one or more parent parsers and use those to populate your 'main' parser. They can be run independently (with parse_known_args), while the 'main' one is used to handle help.
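A minimal sketch of that arrangement (the option names here are just illustrative):

import argparse

deploy = argparse.ArgumentParser(add_help=False)
deploy.add_argument('-e', '--environment')

puppet = argparse.ArgumentParser(add_help=False)
puppet.add_argument('--ad_domain')

# the 'main' parser aggregates both, so -h shows everything
main = argparse.ArgumentParser(parents=[deploy, puppet])
print(main.parse_args(['-e', 'prod', '--ad_domain', 'x.org']))
# Namespace(environment='prod', ad_domain='x.org')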
We also discussed grouping the arguments after parsing. A namespace is a simple object, in which each argument is an attribute. It can also be converted to a dictionary. It is easy to pull groups of items from a dictionary.
There have been SO questions about using multiple subparsers. That's an awkward proposition - possible, but not easy. Subparsers are like issuing a command to a system program: you generally issue one command per call. You don't nest them or issue sequences; you let shell piping and scripts handle multiple actions.
IPython uses argparse to parse its inputs. It traps help first, and issues its own message. Most arguments come from config files, so it is possible to set values with default configs, custom configs and in the commandline. It's an example of naming a very large set of arguments.
Subparsers let you reuse the same argument name, but without being able to invoke multiple subparsers in one call that doesn't help much. And even if you could invoke several subparsers, they would still put their arguments into the same namespace. Also, argparse tries to handle flagged arguments in an order-independent manner, so a --foo at the end of the command line is parsed the same as if it were at the start.
There was an SO question where we discussed using argument names ('dest') like 'group1.argument1', and I've even discussed using nested namespaces. I could look those up if it would help.
Another thought - load sys.argv and partition it before passing it to one or more parsers. You could split it on some key word, or on prefixes etc.
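For example, a hypothetical splitter keyed on group keywords, whose chunks could then each be fed to their own parser:

def split_argv(argv, keywords):
    """Partition argv into {keyword: [tokens]}; tokens before any keyword go to ''."""
    groups, current = {'': []}, ''
    for tok in argv:
        if tok in keywords:
            current = tok
            groups[current] = []
        else:
            groups[current].append(tok)
    return groups

print(split_argv(['--install', 'y', 'puppet', '--ad_domain', 'x.org'], {'puppet'}))
# {'': ['--install', 'y'], 'puppet': ['--ad_domain', 'x.org']}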
If you have so many arguments, this seems like a design issue; it sounds very unmanageable. Could you not implement this using a configuration file with a reasonable set of defaults? Or defaults in the code, a reasonably SMALL number of arguments on the command line, and everything else overridable through a 'key: value' configuration file? I can't imagine having to use a CLI with the number of variables you are proposing.
I'm using argparse to take in command line input and also to produce help text. I want to use ArgumentDefaultsHelpFormatter as the formatter_class, however this prevents me from also using RawDescriptionHelpFormatter which would allow me to add custom formatting to my description or epilog.
Is there a sensible method of achieving this aside from writing code to produce text for default values myself? According to the argparse docs, all internals of ArgumentParser are considered implementation details, not public API, so sub-classing isn't an attractive option.
I just tried a multiple inheritance approach, and it works:
class CustomFormatter(argparse.ArgumentDefaultsHelpFormatter,
                      argparse.RawDescriptionHelpFormatter):
    pass

parser = argparse.ArgumentParser(description='test\ntest\ntest.',
                                 epilog='test\ntest\ntest.',
                                 formatter_class=CustomFormatter)
This may break if the internals of these classes change though.
I don't see why subclassing a HelpFormatter should be a problem. That isn't messing with the internals of ArgumentParser. The documentation has examples of custom Action and Type classes (or functions). I take the 'there are four such classes' line to be an invitation to write my own HelpFormatter if needed.
The provided HelpFormatter subclasses make quite simple changes, changing just one function. So they can be easily copied or altered.
RawDescription just changes:
def _fill_text(self, text, width, indent):
    return ''.join(indent + line for line in text.splitlines(keepends=True))
In theory it could be changed without altering the API, but it's unlikely.
The defaults formatter just changes:
def _get_help_string(self, action):
    help = action.help
    if '%(default)' not in action.help:
        if action.default is not SUPPRESS:
            defaulting_nargs = [OPTIONAL, ZERO_OR_MORE]
            if action.option_strings or action.nargs in defaulting_nargs:
                help += ' (default: %(default)s)'
    return help
You could get the same effect by just including %(default)s in all of your argument help lines. In contrast to the Raw subclasses, this is just a convenience class. It doesn't give you more control over the formatting.
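For example, argparse performs the %(default)s interpolation in help strings itself, so this prints the default without any special formatter:

import argparse

p = argparse.ArgumentParser()
p.add_argument('--retries', type=int, default=3,
               help='number of retries (default: %(default)s)')
p.parse_args(['-h'])
# ... --retries RETRIES  number of retries (default: 3)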
I manage a fairly large python-based quantum chemistry suite, PyQuante. I'm currently struggling with how to set various defaults so that users can choose among different options at runtime.
For example, I have three different methods for computing electron repulsion integrals. Let's call them a,b,c. I used to simply pick the one I liked best (say, c), and have that hard-wired into the module that computes these integrals.
I have now modified this to use a module, Defaults.py, that contains all such hard-wires. But this is set at compile/install time. I would now like users to be able to override these options at runtime, say, using a .pyquanterc.py file.
In my integral routines, I currently have something like
from Defaults import integral_method
I know about dictionaries, and the .update() method. But I don't know how I would use this in real life. My defaults module looks like
integral_method = c
Should I modify the end of Defaults.py to look for a .pyquanterc.py file and override these values? E.g.
if os.path.exists(os.path.expanduser('~/.pyquanterc.py')): do_something
If so, what should do_something look like?
With your current setup, the user can change the default functions in his scripts quite easily:
import Defaults
Defaults.integral_method = somefunc
If the user adds this to his script, all your modules that use integral_method from Defaults will use somefunc to calculate integrals.
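As for what do_something could look like: one hedged sketch (the file name and the underscore filtering are assumptions) is to exec the rc file at the end of Defaults.py and copy its plain names into the module:

# at the bottom of Defaults.py
import os

rc = os.path.expanduser('~/.pyquanterc.py')
if os.path.exists(rc):
    overrides = {}
    with open(rc) as f:
        exec(f.read(), overrides)
    for name, value in overrides.items():
        if not name.startswith('_'):  # skip __builtins__ and other machinery
            globals()[name] = value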
I might do this via a factory class.
class IntegralSolver:
    """
    Factory class containing methods for solving integrals.

    >>> solver = IntegralSolver("method1")
    >>> solver(x)
    # solution via method1

    Can also be used directly:

    >>> IntegralSolver.method2(x)
    # solution via method2
    """
    def __init__(self, method):
        # note: assigning __call__ on the instance works for old-style
        # (Python 2) classes; new-style classes look special methods up
        # on the type, so there you would call solver.__call__(x) explicitly
        self.__call__ = getattr(self, method)

    @staticmethod
    def method1(x):
        return method1_solution  # placeholder

    @staticmethod
    def method2(x):
        return method2_solution  # placeholder
It really depends on how your users run the toolset. If they tweak the Python code each time, just a block at the top labeled OPTIONS should be good. If they run it from the command line, use the argparse library to let them switch options on the command line. Perhaps also have it read options from files with ConfigParser: a default file with your options and, if the user supplies one, an additional file with their overrides.