handle errors in Python ArgumentParser - python

I want to manually handle the situation where parse_args() throws an error in case of an unknown value for an argument. For example:
If I have the following python file called script.py:
import argparse

argp = argparse.ArgumentParser(description='example')
argp.add_argument('--compiler', choices=['default', 'clang3.4', 'clang3.5'])
args = argp.parse_args()
and I run the script with the arguments python script.py --compiler=foo, it throws the following error:
error: argument --compiler: invalid choice: 'foo' (choose from 'default', 'clang3.4', 'clang3.5')
SystemExit: 2
What do I need to do in order to handle this behaviour myself instead of the script quitting itself? One idea is to subclass argparse.ArgumentParser and override parse_args() or just monkey patch the method but I was wondering if there's a better way that does not require overriding the standard library behaviour?

The whole point to defining choices is to make the parser complain about values that are not in the list. But there are some alternatives:
omit choices (include them in the help text if you want), and do your own testing after parsing (a short sketch follows this list). argparse doesn't have to do everything for you. Its main purpose is to figure out what your user wants.
redefine the parser.error method (subclassing is best) to redirect the error away from sys.exit. But you'll have to parse the error message to distinguish this error from other ones that the parser might raise (a sketch appears after the parsing walkthrough below).
define a type function that checks for choices, and makes the default substitution.
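For the first alternative, a minimal sketch (falling back to 'default' on an unrecognised value is just an assumption about what you want to happen):

import argparse

argp = argparse.ArgumentParser(description='example')
argp.add_argument('--compiler', help='one of: default, clang3.4, clang3.5')
args = argp.parse_args()

# do the choices check ourselves, after parsing
if args.compiler is not None and args.compiler not in ('default', 'clang3.4', 'clang3.5'):
    print("unrecognised compiler %r, falling back to 'default'" % args.compiler)
    args.compiler = 'default'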
The parsing of the '--compiler' option goes something like this:
grab the string argument after the --compiler flag
pass it through the type function. The default is lambda x: x; int converts it to an integer, etc. The function raises ValueError if the value is bad.
check the returned value against the choices list (if any)
use the action to add the value to the Namespace (default simply stores it).
An error in any of these steps produces an ArgumentError, which is trapped by the parser.error method and passed on to the parser.exit method.
Since the store_action occurs after type and choices checking, a custom action won't bypass their errors.
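If you go with the second alternative (redirecting parser.error away from sys.exit), a minimal sketch, assuming you simply want to fall back to a default Namespace, could look like this:

import argparse

class RaisingParser(argparse.ArgumentParser):
    """Raise instead of printing usage and exiting via sys.exit()."""
    def error(self, message):
        raise ValueError(message)

argp = RaisingParser(description='example')
argp.add_argument('--compiler', choices=['default', 'clang3.4', 'clang3.5'])

try:
    args = argp.parse_args(['--compiler=foo'])   # simulate the bad command line
except ValueError as err:
    print('could not parse arguments:', err)
    args = argparse.Namespace(compiler='default')

On Python 3.9 and later you can get a similar effect without subclassing by passing exit_on_error=False to ArgumentParser, which makes parse_args() raise argparse.ArgumentError instead of exiting.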
Here's a possible type solution (not tested)
def compile_choices(astr):
    if astr in ['default', 'clang3.4', 'clang3.5']:
        return astr
    else:
        # could raise ValueError('bad value') if there are some strings you don't like
        return 'default'

argp.add_argument('--compiler', type=compile_choices)
=================
If compile_choices takes other arguments, such as the list of choices or the default, you'll need to wrap it in some way that defines those values before parsing.
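For instance, a hedged sketch of that wrapping with functools.partial (the keyword parameters and the narrowed choices are purely illustrative):

import argparse
import functools

def compile_choices(astr, choices=('default', 'clang3.4', 'clang3.5'), default='default'):
    # return the string unchanged if it is an allowed choice, otherwise the default
    return astr if astr in choices else default

argp = argparse.ArgumentParser(description='example')
argp.add_argument('--compiler',
                  type=functools.partial(compile_choices,
                                         choices=('default', 'clang3.5'),
                                         default='clang3.5'))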
An example accepting a binary string representation:
parser.add_argument('--binary', type=lambda x: int(x, base=2),
                    help='integer in binary format', default='1010')
or, after an import functools, equivalently:
parser.add_argument('--binary', type=functools.partial(int, base=2), default='1010')

Related

Is it possible to validate `argparse` default argument values?

Is it possible to tell argparse to give the same errors on default argument values as it would on user-specified argument values?
For example, the following will not result in any error:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--choice', choices=['a', 'b', 'c'], default='invalid')
args = vars(parser.parse_args())  # args = {'choice': 'invalid'}
whereas omitting the default, and having the user specify --choice=invalid on the command-line will result in an error (as expected).
Reason for asking is that I would like the user to be able to specify default command-line options in a JSON file, which are then set using ArgumentParser.set_defaults(), but unfortunately the behaviour demonstrated above prevents these user-specified defaults from being validated.
Update: argparse is inconsistent and I now consider the behavior above to be a bug. The following does trigger an error:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--num', type=int, default='foo')
args = parser.parse_args()  # triggers an exception even if --num is not
                            # specified on the command line
I have opened a bug report for this: https://github.com/python/cpython/issues/100949
I took the time to dig into the source code, and what happens is that the check only runs for arguments you actually gave on the command line. The only way to enforce a check, in my opinion, is to subclass ArgumentParser and have it do the check when you add the argument:
import argparse

class ValidatingArgumentParser(argparse.ArgumentParser):
    def add_argument(self, *args, **kwargs):
        super().add_argument(*args, **kwargs)
        if 'default' in kwargs:
            self._check_value(self._actions[-1], kwargs['default'])
No. Explicit arguments need to be validated because they originate from outside the source code. Default values originate in the source code, so it's the job of the programmer, not the argument parser, to ensure they are valid.
(This is the difference between validation and debugging.)
(Using set_defaults on unvalidated user input still falls under the purview of debugging, as it's not the argument parser itself adding the default values, but the programmer.)
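If you do load user-supplied defaults from a JSON file and apply them with set_defaults(), one way to do that validation yourself is sketched below (hedged: it leans on the parser's internal _check_value() helper and a hypothetical defaults.json, so treat it as illustrative rather than a supported API):

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('--choice', choices=['a', 'b', 'c'], default='a')

# hypothetical defaults file, e.g. containing {"choice": "b"}
with open('defaults.json') as fh:
    user_defaults = json.load(fh)

# validate the user-supplied defaults before installing them;
# _check_value raises argparse.ArgumentError for an invalid choice
for action in parser._actions:
    if action.dest in user_defaults:
        parser._check_value(action, user_defaults[action.dest])

parser.set_defaults(**user_defaults)
args = parser.parse_args()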

python argparse.FileType('w') check extension

The argparse package does a great job when dealing with command-line arguments. I'm wondering, however, if there is any way to ask argparse to check the file extension (e.g. ".txt"). The idea would be to derive a class from argparse.FileType. I would be interested in any suggestion.
Keep in mind that I have more than 50 subcommands in my program, each with its own CLI. Thus, I would rather derive a class that could be imported in each of them than add some ugly tests to all my commands.
Thanks a lot.
# As an example one would be interested in turning this...
parser_grp.add_argument('-o', '--outputfile',
                        help="Output file.",
                        default=sys.stdout,
                        metavar="TXT",
                        type=argparse.FileType('w'))

# Into that...
from somewhere import FileTypeWithExtensionCheck

parser_grp.add_argument('-o', '--outputfile',
                        help="Output file.",
                        default=sys.stdout,
                        metavar="TXT",
                        type=FileTypeWithExtensionCheck('w', '.[Tt][Xx][Tt]$'))
You could subclass the argparse.FileType() class, and override the __call__ method to do filename validation:
class FileTypeWithExtensionCheck(argparse.FileType):
    def __init__(self, mode='r', valid_extensions=None, **kwargs):
        super().__init__(mode, **kwargs)
        self.valid_extensions = valid_extensions

    def __call__(self, string):
        if self.valid_extensions:
            if not string.endswith(self.valid_extensions):
                raise argparse.ArgumentTypeError(
                    'Not a valid filename extension')
        return super().__call__(string)
You could also support a regex if you really want to, but using str.endswith() is a more common and simpler test.
This takes either a single string, or a tuple of strings specifying valid extensions:
parser_grp.add_argument(
    '-o', '--outputfile', help="Output file.",
    default=sys.stdout, metavar="TXT",
    type=FileTypeWithExtensionCheck('w', valid_extensions=('.txt', '.TXT', '.text'))
)
You need to handle this in the __call__ method because the FileType() instance is essentially treated like any other type= argument: as a callable. You can indicate that the specific argument isn't suitable by raising the ArgumentTypeError exception.
My solution is to create a closure that does the extension checking:
import argparse

def ext_check(expected_extension, openner):
    def extension(filename):
        if not filename.lower().endswith(expected_extension):
            raise ValueError()
        return openner(filename)
    return extension

parser = argparse.ArgumentParser()
parser.add_argument('outfile', type=ext_check('.txt', argparse.FileType('w')))

# test out
args = parser.parse_args()
args.outfile.write('Hello, world\n')
Notes
ext_check is basically a wrapper for argparse.FileType
It takes an expected extension to check and an openner
For simplicity, the expected extension is given in lower case; the filename is converted to lower case prior to validation
openner in this case is an argparse.FileType('w') callable (most likely a function, but I don't care, as long as it is a callable).
ext_check returns a callable, a function named extension. I name it this way so that the error comes out as follows (note the word extension below, which is the name of the function):
error: argument outfile: invalid extension value: 'foo.txt2'
Within the extension function, we check the file extension; if the check passes, we pass the filename to the openner.
What I like about this solution
Concise
Requires almost no knowledge of how argparse.FileType works, since it just acts as a wrapper around it
What I don't like about it
The caller has to understand closures to see how it works
I have no control over the error message. That is why I have to name my inner function extension to get a somewhat meaningful error message as seen above.
Other possible solutions
Create a custom action, see the documentation for argparse (a rough sketch follows this list)
Subclass argparse.FileType as Martijn Pieters has done
Each of these solutions has its own strong points and weaknesses
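For completeness, a rough sketch of the custom-action route (the class name, the hard-coded '.txt' check, and the error message are all just placeholders):

import argparse

class TxtFileAction(argparse.Action):
    """Open the file for writing only if it ends in .txt."""
    def __call__(self, parser, namespace, values, option_string=None):
        if not values.lower().endswith('.txt'):
            parser.error('argument %s: %r does not end in .txt' % (option_string, values))
        setattr(namespace, self.dest, open(values, 'w'))

parser = argparse.ArgumentParser()
parser.add_argument('-o', '--outputfile', action=TxtFileAction, metavar='TXT')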

argparse doesn't cast default numeric arguments?

Some lines of code are worth a thousand words. Create a python file test.py with the following:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--G', type=float, default=1, nargs='?')
args = parser.parse_args()
print (args.G / 3)
Then, in a terminal:
python test.py
gives 0, while:
python test.py --G 1
gives 0.333333333. I think this is because argparse doesn't seem to cast default arguments to their proper type (so 1 remains an int in default=1, in spite of the type=float), but it casts the argument if it is given explicitly.
I think this is inconsistent and prone to errors. Is there a reason why argparse behaves this way?
I think the 1 in default=1 is evaluated when you call parser.add_argument, whereas the non-default value you pass as an argument is evaluated at runtime, and can therefore be converted to float by argparse. It's not how argparse behaves; it's how Python behaves in general. Consider this:
def foo(x=1):
    # there is no way to tell the function body that you want 1.0
    # except by explicit conversion,
    # because 1 is passed by value, which has already been evaluated
    # and determined to be an int
    print(x / 3)
HUGE EDIT: Ha, I understand your point now and I think it's reasonable. So I dug into the source code, and look what I found:
https://hg.python.org/cpython/file/3.5/Lib/argparse.py#l1984
So argparse DOES appear to make sure your default value is type compliant, so long as it's a string. Try:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--G', type=float, default='1', nargs='?')
args = parser.parse_args()
print (args.G / 3)
Not sure if that's a bug or a compromise...
The other answers are right - only string defaults are passed through the type function. But there seems to be some reluctance to accept that logic.
Maybe this example will help:
import argparse
def mytype(astring):
    return '['+astring+']'
parser = argparse.ArgumentParser()
parser.add_argument('--foo', type=mytype, default=1)
parser.add_argument('--bar', type=mytype, default='bar')
print parser.parse_args([])
print mytype(1)
produces:
0923:~/mypy$ python stack35429336.py
Namespace(bar='[bar]', foo=1)
Traceback (most recent call last):
  File "stack35429336.py", line 8, in <module>
    print mytype(1)
  File "stack35429336.py", line 3, in mytype
    return '['+astring+']'
TypeError: cannot concatenate 'str' and 'int' objects
I define a type function - it takes a string input, and returns something - anything that I want. And raises an error if it can't return that. Here I just append some characters to the string.
When the default is a string, it gets modified. But when it is a number (not a string) it is inserted without change. In fact as written mytype raises an error if given a number.
The argparse type is often confused with the function type(a). The latter returns values like int, str, bool. Plus, the most common examples are int and float. But in argparse, float is used as a function:
float(x) -> floating point number
Convert a string or number to a floating point number, if possible.
type=bool is a common error (see Parsing boolean values with argparse); bool() does not convert the string 'False' to the boolean False.
In [50]: bool('False')
Out[50]: True
If argparse passed every default through the type function, it would be difficult to place values like None or False in the namespace. No builtin function converts a string to None.
The key point is that the type parameter is a function (callable), not a casting operation or target.
For further clarification - or confusion - explore the default and type with nargs=2 or action='append'.
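For instance, a quick hedged experiment along those lines (the values noted in comments reflect CPython's behaviour at the time of writing):

import argparse

parser = argparse.ArgumentParser()
# a string default is passed through type() exactly once, even with nargs=2,
# so '12' is converted to the int 12 rather than being split into two values
parser.add_argument('--pair', nargs=2, type=int, default='12')
# a non-string default (here a list) is left untouched; command-line values
# are type-converted and appended to it
parser.add_argument('--item', action='append', type=int, default=[1])

args = parser.parse_args(['--item', '2'])
print(args.pair)   # 12  (not a pair at all)
print(args.item)   # [1, 2]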
The purpose of type is to sanitise and validate the arbitrary input you receive from the command line. It's a first line of defence against bogus input. There is no need to "defend" yourself against the default value, because that's the value you have decided on yourself and you are in power and have full control over your application. type does not mean that the resulting value after parse_args() must be that type; it means that any input should be that type/should be the result of that function.
The default value is completely independent of that and entirely up to you. It's completely conceivable to have a default value of False, and if a user inputs a number for this optional argument, then it'll be a number instead.
Upgrading a comment below to be included here: Python's philosophy includes "everyone is a responsible adult and Python won't overly question your decisions" (paraphrased). This very much applies here I think.
The fact that argparse does cast default strings could be explained by the fact that CLI input is always a string, and it's possible that you're chaining several argparsers, and the default value is set by a previous input. Anything that's not a string type however must have been explicitly chosen by you, so it won't be further mangled.
If the default value is a string, the parser parses the value as if it were a command-line argument. In particular, the parser applies any type conversion argument, if provided, before setting the attribute on the Namespace return value. Otherwise, the parser uses the value as is:
parser = argparse.ArgumentParser()
parser.add_argument('--length', default='10', type=int)
parser.add_argument('--width', default=10.5, type=int)
args = parser.parse_args()
Here, args.length will be int 10 and args.width will be float 10.5, just like the default value.
The example is from https://docs.python.org/3.8/library/argparse.html#default.
At first, I also had the same question. Then I noticed this example; it explains the behaviour pretty clearly.

How to Check for and raise an error for no command line argument

I have an argparsing program:
import argparse

def main():
    p = argparse.ArgumentParser()
    help_str = {'rule': 'Blah blah blah and stuff.'}
    p.add_argument('-r', '--rule', action="store", type=str, dest='rule', help=help_str['rule'])
    p.set_defaults(rule='string')
    args = p.parse_args()
Therefore, my command line input:
username#compname:~$python progname.py -r/--rule 'something'
I am trying to figure out how I could just input:
username#compname:~$python progname.py -r/--rule
and have my own error message come up:
print '\n Error: no input value for rule:'
print ' resetting rule to default value.'
args.rule = 'string'
after this, the rule value should print out as 'string'
I am only slightly fluent in Python, sorry. I have tried using try/except blocks and even sys.argv[2] (though I may have done something wrong.) This is the error that keeps popping up with all of my solutions (not very familiar with errors outside Type and Value):
progname.py: error: argument -r/--rule: expected one argument
Any help appreciated.
p = argparse.ArgumentParser()
p.add_argument('-r', '--rule', nargs='?', default='string')
args = p.parse_args()
print args
with nargs='?' produces Namespace(rule='string') if called without argument. With -r, args is Namespace(rule=None). And Namespace(rule='something') with -r something.
Let's add a few lines of code:
if args.rule is None:
    print '\n Error: no input value for rule:'
    print ' resetting rule to default value.'
    args.rule = 'string'
print args
Now the output with '-r' (or --rule) is:
Namespace(rule=None)
Error: no input value for rule:
resetting rule to default value.
Namespace(rule='string')
the other cases are the same.
If I drop the default, p.add_argument('-r', '--rule', nargs='?'), the no-argument case also produces this custom 'error message', since the default (in argparse) is None.
It's possible to add custom error checking with a custom type or action, but I think testing for None after using argparse is simpler, and easier to understand.
I'd suggest changing this error message to a warning. An Error usually terminates the program; a warning prints the message and continues.
Here's a solution when nargs=2 (or something else that's fixed). It's not a trivial change, since it involves redefining the error method (documented near the end of the argparse docs), and then capturing the error that it produces. The error method cannot access the Namespace (args), nor can it continue parse_args. So it cannot handle any other arguments if there is a problem with the rule argument.
import argparse

class MyParser(argparse.ArgumentParser):
    def error(self, message):
        if 'rule' in message:
            message = 'wrong number of input values for rule'
            raise argparse.ArgumentError(None, message)
            # cannot access namespace or continue parsing
        else:
            super(MyParser, self).error(message)

p = MyParser()
p.add_argument('-r', '--rule', nargs=2)
try:
    args = p.parse_args()
except argparse.ArgumentError as e:
    print e
    args = argparse.Namespace(rule='default')
    # create a namespace with this default value
print args
Keep in mind that nargs has 2 purposes:
- raise an error if a wrong number of argument strings is given
- allocate multiple argument strings among multiple Actions (the thing created by add_argument).
The 2nd is especially evident if you have, for example, several positionals taking 1, 2 and * arguments respectively. It is a key novel feature of argparse, at least relative to earlier Python parsers. While it is possible to tweak it, it is hard to completely redefine it.
If the arguments allow it, you could use nargs='*' instead of the '?', and issue the warning if the number of strings is not '2' (or whatever).
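A minimal sketch of that last variant, assuming two values are expected and a made-up default:

import argparse

p = argparse.ArgumentParser()
p.add_argument('-r', '--rule', nargs='*', default=['a', 'b'])
args = p.parse_args()

# nargs='*' accepts any number of strings, so check the count ourselves
if len(args.rule) != 2:
    print('Warning: expected 2 values for --rule; resetting to the default.')
    args.rule = ['a', 'b']
print(args)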

argparse optional argument before positional argument

I was wondering if it is possible to have a positional argument follow an argument with an optional parameter. Ideally the last argument entered into the command line would always apply toward 'testname'.
import argparse
parser = argparse.ArgumentParser(description='TAF')
parser.add_argument('-r','--release',nargs='?',dest='release',default='trunk')
parser.add_argument('testname',nargs='+')
args = parser.parse_args()
I would like both of these calls to have smoketest apply to testname, but the second one results in an error.
>> python TAF.py -r 1.0 smoketest
>> python TAF.py -r smoketest
TAF.py: error: too few arguments
I realize that moving the positional argument to the front would result in the correct behavior of the optional parameter; however, this is not quite the format I am looking for. The choices option looks like an attractive alternative; however, it throws an error instead of ignoring the unmatched item.
EDIT:
I've found a hacky way around this. If anyone has a nicer solution I would appreciate it.
import argparse

parser = argparse.ArgumentParser(description='TAF')
parser.add_argument('-r', '--release', nargs='?', dest='release', default='trunk')
parser.add_argument('testname', nargs=argparse.REMAINDER)
args = parser.parse_args()

if not args.testname:
    args.testname = args.release
    args.release = ''
As stated in the documentation:
'?'. One argument will be consumed from the command line if possible,
and produced as a single item. If no command-line argument is present,
the value from default will be produced. Note that for optional
arguments, there is an additional case - the option string is present
but not followed by a command-line argument. In this case the value
from const will be produced.
So, the behaviour you want is not obtainable using '?'. Probably you could write some hack using argparse.Action and meddling with the previous results.(1)
I think the better solution is to split the functionality of that option. Make it an option that requires an argument (but the option itself is optional), and add an option without an argument that sets the release to 'trunk'. In this way you can obtain the same results without any hack, and I think the interface is simpler; a sketch follows.
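A minimal sketch of that split (the --trunk flag name is just an assumption):

import argparse

parser = argparse.ArgumentParser(description='TAF')
# -r/--release now always requires a value...
parser.add_argument('-r', '--release', default='trunk')
# ...and a separate flag selects the trunk release without a value
parser.add_argument('--trunk', action='store_const', const='trunk', dest='release')
parser.add_argument('testname', nargs='+')

print(parser.parse_args(['-r', '1.0', 'smoketest']))   # release='1.0'
print(parser.parse_args(['--trunk', 'smoketest']))     # release='trunk'
print(parser.parse_args(['smoketest']))                # release='trunk' (default)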
In your example:
python TAF.py -r smoketest
It's quite clear that smoketest will be interpreted as an argument to -r. At least following unix conventions. If you want to keep nargs='?' then the user must use --:
$ python TAF.py -r -- sometest
Namespace(release=None, testname=['sometest']) #parsed result
(1) An idea on how to do this: check if the option has an argument. If it has one, check whether it is a valid test name. If so, put it by hand into testname and set release to the default value. You'll also have to set a "flag" that tells you that this thing happened.
Now, before parsing sys.argv you must redirect sys.stderr. When doing the parsing you must catch SystemExit, check stderr to see if the error was "too few arguments", and check whether the flag was set; if so, ignore the error and continue running, otherwise reprint the error message to the original stderr and exit.
This approach does not look robust, and it's probably buggy.
