python argparse.FileType('w') check extension - python

The argparse package does a great job of handling command-line arguments. I'm wondering, however, whether there is a way to ask argparse to check a file's extension (e.g. ".txt"). The idea would be to derive a class from argparse.FileType. I would be interested in any suggestions.
Keep in mind that my program has more than 50 subcommands, each with its own CLI. I would therefore rather derive a class that can be imported into each of them than add ugly tests to every command.
Thanks a lot.
# As an example one would be interested in turning this...
parser_grp.add_argument('-o', '--outputfile',
                        help="Output file.",
                        default=sys.stdout,
                        metavar="TXT",
                        type=argparse.FileType('w'))
# Into that...
from somewhere import FileTypeWithExtensionCheck
parser_grp.add_argument('-o', '--outputfile',
                        help="Output file.",
                        default=sys.stdout,
                        metavar="TXT",
                        type=FileTypeWithExtensionCheck('w', '.[Tt][Xx][Tt]$'))

You could subclass the argparse.FileType() class, and override the __call__ method to do filename validation:
class FileTypeWithExtensionCheck(argparse.FileType):
    def __init__(self, mode='r', valid_extensions=None, **kwargs):
        super().__init__(mode, **kwargs)
        self.valid_extensions = valid_extensions
    def __call__(self, string):
        if self.valid_extensions:
            if not string.endswith(self.valid_extensions):
                raise argparse.ArgumentTypeError(
                    'Not a valid filename extension')
        return super().__call__(string)
You could also support a regex if you really want to, but using str.endswith() is a more common and simpler test.
This takes either a single string, or a tuple of strings specifying valid extensions:
parser_grp.add_argument(
    '-o', '--outputfile', help="Output file.",
    default=sys.stdout, metavar="TXT",
    type=FileTypeWithExtensionCheck('w', valid_extensions=('.txt', '.TXT', '.text'))
)
You need to handle this in the __call__ method because the FileType() instance is essentially treated like any other type= argument; as a callable, and you can indicate that the specific argument isn't suitable by raising the ArgumentTypeError exception.
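If you do want the regex form that the question asks for, a minimal sketch could look like the following. The class name FileTypeWithRegexCheck and its pattern parameter are made up for this illustration and are not part of argparse; the only argparse pieces used are FileType and ArgumentTypeError.

```python
import argparse
import re


class FileTypeWithRegexCheck(argparse.FileType):
    """FileType variant that validates the filename against a regex first.

    Hypothetical helper: the class name and the `pattern` parameter are
    illustrative, not part of argparse.
    """

    def __init__(self, mode='r', pattern=None, **kwargs):
        super().__init__(mode, **kwargs)
        # Compile once; None disables the check.
        self.pattern = re.compile(pattern) if pattern else None

    def __call__(self, string):
        # '-' means stdin/stdout for FileType, so let it through unchecked.
        if self.pattern and string != '-' and not self.pattern.search(string):
            raise argparse.ArgumentTypeError(
                '%r does not match %s' % (string, self.pattern.pattern))
        return super().__call__(string)
```

It would be used as type=FileTypeWithRegexCheck('w', r'\.[Tt][Xx][Tt]$'). Note the escaped dot: an unescaped '.' in the pattern would match any character, not just a literal period.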

My solution is to create a closure that does the extension checking:
import argparse
def ext_check(expected_extension, openner):
    def extension(filename):
        if not filename.lower().endswith(expected_extension):
            raise ValueError()
        return openner(filename)
    return extension
parser = argparse.ArgumentParser()
parser.add_argument('outfile', type=ext_check('.txt', argparse.FileType('w')))
# test out
args = parser.parse_args()
args.outfile.write('Hello, world\n')
Notes
ext_check basically is a wrapper for argparse.FileType
It takes an expected extension to check and an openner
For simplicity, the expected extension must be given in lower case; the filename is converted to lower case before validation
openner in this case is an argparse.FileType('w') callable (most likely a function, but I don't care, as long as it is a callable).
ext_check returns a callable, the inner function named extension. I named it that way so that the error message reads as follows (note the word extension below, which is the name of the function):
error: argument outfile: invalid extension value: 'foo.txt2'
Within the extension function, we check the file extension; if the check passes, we hand the file name to the openner.
What I like about this solution
Concise
Requires almost no knowledge of how argparse.FileType works, since it just acts as a wrapper around it
What I don't like about it
Caller has to know about closure to understand how it works
I have no control over the error message. That is why I have to name my inner function extension to get a somewhat meaningful error message as seen above.
Other possible solutions
Create a custom action, see the documentation for argparse
Subclass argparse.FileType as Martijn Pieters has done
Each of these solutions has its own strong points and weaknesses
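On the error-message complaint above: one possible variation, sketched here under the assumption that a custom message is all you need, is to raise argparse.ArgumentTypeError instead of ValueError inside the closure. argparse shows that exception's message to the user as written, so the inner function's name no longer matters. The message wording below is only an example.

```python
import argparse


def ext_check(expected_extension, opener):
    """Return a type= callable that checks the extension, then opens the file."""
    def extension(filename):
        if not filename.lower().endswith(expected_extension):
            # ArgumentTypeError's message is shown to the user verbatim.
            raise argparse.ArgumentTypeError(
                'expected a %s file, got %r' % (expected_extension, filename))
        return opener(filename)
    return extension


parser = argparse.ArgumentParser()
parser.add_argument('outfile', type=ext_check('.txt', argparse.FileType('w')))
```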

Related

Is it possible to validate `argparse` default argument values?

Is it possible to tell argparse to give the same errors on default argument values as it would on user-specified argument values?
For example, the following will not result in any error:
parser = argparse.ArgumentParser()
parser.add_argument('--choice', choices=['a', 'b', 'c'], default='invalid')
args = vars(parser.parse_args()) # args = {'choice': 'invalid'}
whereas omitting the default, and having the user specify --choice=invalid on the command-line will result in an error (as expected).
Reason for asking is that I would like to have the user to be able to specify default command-line options in a JSON file which are then set using ArgumentParser.set_defaults(), but unfortunately the behaviour demonstrated above prevents these user-specified defaults from being validated.
Update: argparse is inconsistent and I now consider the behavior above to be a bug. The following does trigger an error:
parser = argparse.ArgumentParser()
parser.add_argument('--num', type=int, default='foo')
args = parser.parse_args() # triggers exception in case --num is not
# specified on the command-line
I have opened a bug report for this: https://github.com/python/cpython/issues/100949
I took the time to dig into the source code: the choices check only runs for values given on the command line. The only way to enforce a check on defaults, in my opinion, is to subclass ArgumentParser and have it validate the default when you add the argument:
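class ValidatingArgumentParser(argparse.ArgumentParser):
    def add_argument(self, *args, **kwargs):
        action = super().add_argument(*args, **kwargs)
        # _check_value is a private argparse method; skip it when no default is given
        if 'default' in kwargs:
            self._check_value(action, kwargs['default'])
        return action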
No. Explicit arguments need to be validated because they originate from outside the source code. Default values originate in the source code, so it's the job of the programmer, not the argument parser, to ensure they are valid.
(This is the difference between validation and debugging.)
(Using set_defaults on unvalidated user input still falls under the purview of debugging, as it's not the argument parser itself adding the default values, but the programmer.)
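For the JSON-defaults use case in the question, one way to get the same validation without touching argparse internals might be to turn the JSON entries into command-line tokens and parse them ahead of the real arguments. This is only a sketch: the helper name parse_with_validated_defaults is invented here, and it assumes every JSON key corresponds to a plain '--key value' option.

```python
import argparse
import json


def parse_with_validated_defaults(parser, json_text, argv):
    """Parse argv, treating JSON-supplied defaults like user-typed options.

    Assumes every key in the JSON object names a plain '--key value' option.
    """
    defaults = json.loads(json_text)
    default_argv = []
    for name, value in defaults.items():
        default_argv += ['--' + name, str(value)]
    # Defaults come first, so a later occurrence on the real command line
    # overrides them (the default 'store' action keeps the last value).
    return parser.parse_args(default_argv + list(argv))


parser = argparse.ArgumentParser()
parser.add_argument('--choice', choices=['a', 'b', 'c'])

args = parse_with_validated_defaults(parser, '{"choice": "b"}', ['--choice', 'c'])
print(args.choice)  # c
```

With '{"choice": "invalid"}' as the JSON text, the parser reports the usual "invalid choice" error, because the value now travels through the same path as user input.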

argparse: how to parse a single string argument OR a file listing many arguments?

I have a use case where I'd like the user to be able to provide, as an argument to argparse, EITHER a single string OR a filename where each line has a string.
Assume the user launches ./myscript.py -i foobar
The logical flow I'm looking for is something like this:
The script determines whether the string foobar is a readable file.
IF it is indeed a readable file, we call some function from the script, passing each line in foobar as an argument to that function. If foobar is not a readable file, we call the same function but just use the string foobar as the argument and return.
I have no ability to guarantee that a filename argument will have a specific extension (or even an extension at all).
Is there a more pythonic way to do this OTHER than just coding up the logic exactly as I've described above? I looked through the argparse tutorial and didn't see anything, but it also seems reasonable to think that there would be some specific hooks for filenames as arguments, so I figured I'd ask.
A way would be:
Let's say that you have created a parser like this:
parser.add_argument('-i',
                    help='...',
                    type=function)
Here type points to a function that evaluates the user's input and decides whether it is a plain string or a filename.
You can find more information about type in the documentation.
Here is a minimal example that demonstrates this use of type:
parser.add_argument('-d', '--directory',
                    type=Val_dir,
                    help='...')
# ....
def Val_dir(dir):
    if not os.path.isdir(dir):
        raise argparse.ArgumentTypeError('The directory you specified does not seem to exist!')
    else:
        return dir
The above example shows that with type we can control the input at parsing time. Of course, in your case the function would implement different logic: evaluating whether the input is a string or a filename.
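Applied to the string-or-file case, such a type function might look like the sketch below. The name string_or_lines is hypothetical, and the choice to skip blank lines is an assumption of this example rather than something the question specifies.

```python
import argparse
import os


def string_or_lines(value):
    """Return the lines of `value` if it names a readable file, else [value]."""
    if os.path.isfile(value) and os.access(value, os.R_OK):
        with open(value) as handle:
            # Skip blank lines and strip surrounding whitespace.
            return [line.strip() for line in handle if line.strip()]
    return [value]


parser = argparse.ArgumentParser()
parser.add_argument('-i', type=string_or_lines,
                    help='a string, or a file with one string per line')

# Explicit argv for demonstration; drop the list to read sys.argv.
args = parser.parse_args(['-i', 'foobar'])
print(args.i)
```

Either way args.i ends up as a list, so the calling code can loop over it without caring where the values came from.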
This doesn't look like an argparse problem, since all you want from it is a string. That string can be a filename or a function argument. To a parser these will look the same. Also argparse isn't normally used to run functions. It is used to parse the commandline. Your code determines what to do with that information.
So here's a script (untested) that I think does your task:
import argparse

def somefunction(*args):
    print(args)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input')
    args = parser.parse_args()
    try:
        with open(args.input) as f:
            lines = f.read().splitlines()
    except OSError:
        # not a readable file: use the argument itself
        somefunction(args.input)
    else:
        somefunction(*lines)
        # or
        # for line in lines:
        #     somefunction(line)
argparse just provides the args.input string. It's the try/except block that determines how it is used.
================
Here's a prefix char approach:
parser = argparse.ArgumentParser(
    fromfile_prefix_chars='#',
    description="use <prog -i #filename> to load values from file")
# nargs='+' so that -i accepts every value read from the file
parser.add_argument('-i', '--inputs', nargs='+')
args = parser.parse_args()
for arg in args.inputs:
    somefunction(arg)
This is supposed to work with a file like:
one
two
three
https://docs.python.org/3/library/argparse.html#fromfile-prefix-chars
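To see the prefix-char mechanism end to end, here is a small self-contained sketch. It swaps in '@' as the prefix character (the one used in the argparse documentation) and writes a temporary file so it can run anywhere; both are choices made for this example only.

```python
import argparse
import os
import tempfile

# Write a small argument file: one value per line.
with tempfile.NamedTemporaryFile('w', suffix='.args', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\n')

parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
# nargs='+' lets -i absorb every line that the file expands to.
parser.add_argument('-i', '--inputs', nargs='+')

args = parser.parse_args(['-i', '@' + tmp.name])
print(args.inputs)  # ['one', 'two', 'three']
os.remove(tmp.name)
```

Each line of the file is treated as one argument, exactly as if it had been typed on the command line in place of the '@file' token.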

handle errors in Python ArgumentParser

I want to manually handle the situation where parse_args() throws an error in case of a unknown value for an argument. For example:
If I have the following python file called script.py:
argp = argparse.ArgumentParser(description='example')
argp.add_argument('--compiler', choices=['default', 'clang3.4', 'clang3.5'])
args = argp.parse_args()
and I run the script with the following args python script.py --compiler=foo it throws the following error:
error: argument --compiler: invalid choice: 'foo' (choose from 'default', 'clang3.4', 'clang3.5')
SystemExit: 2
What do I need to do in order to handle this behaviour myself instead of the script quitting itself? One idea is to subclass argparse.ArgumentParser and override parse_args() or just monkey patch the method but I was wondering if there's a better way that does not require overriding the standard library behaviour?
The whole point to defining choices is to make the parser complain about values that are not in the list. But there are some alternatives:
omit choices (include them in the help text if you want), and do your own testing after parsing. argparse doesn't have to do everything for you. Its main purpose is to figure out what your user wants.
redefine the parser.error method (via subclassing is best) to redirect the error from sys.exit. But you'll have to parse the error message to distinguish between this error and other ones that the parser might raise.
define a type function that checks for choices, and makes the default substitution.
The parsing of the '--compiler' option goes something like this:
grab the string argument after the --compiler flag
pass it through the type function. The default is lambda x: x; int converts it to an integer, etc. The function raises ValueError if the value is bad.
check the returned value against the choices list (if any)
use the action to add the value to the Namespace (default simply stores it).
An error in any of these steps produces an ArgumentError, which the parser traps and hands to the parser.error method, which in turn calls parser.exit.
Since the store_action occurs after type and choices checking, a custom action won't bypass their errors.
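The "redefine parser.error via subclassing" alternative mentioned above can be sketched as follows. ParserError and RaisingArgumentParser are names invented for this illustration; the only thing relied on is that ArgumentParser routes its failures through the error() method.

```python
import argparse


class ParserError(Exception):
    """Raised instead of exiting; hypothetical name used for this sketch."""


class RaisingArgumentParser(argparse.ArgumentParser):
    def error(self, message):
        # The stock implementation prints usage and calls self.exit(2).
        raise ParserError(message)


argp = RaisingArgumentParser(description='example')
argp.add_argument('--compiler', choices=['default', 'clang3.4', 'clang3.5'])

try:
    args = argp.parse_args(['--compiler=foo'])
except ParserError as exc:
    print('falling back to default:', exc)
    args = argp.parse_args(['--compiler=default'])
```

As noted above, the caller only gets the message text, so telling an invalid choice apart from other parsing errors still means inspecting that string. On Python 3.9 and later, passing exit_on_error=False to the ArgumentParser constructor is another option worth checking, since it lets ArgumentError propagate for this kind of failure.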
Here's a possible type solution (not tested)
def compile_choices(astr):
    if astr in ['default', 'clang3.4', 'clang3.5']:
        return astr
    else:
        return 'default'
        # could raise ValueError('bad value') if there are some strings you don't like
argp.add_argument('--compiler', type=compile_choices)
=================
If compile_choices takes other arguments, such as the list of choices or the default, you'll need to wrap it in some way that defines those values before parsing.
An example accepting a binary string representation:
parser.add_argument('--binary', type=lambda x: int(x, base=2),
                    help='integer in binary format', default='1010')
or
parser.add_argument('--binary', type=functools.partial(int, base=2), default='1010')
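The same functools.partial trick can bind the list of choices and the fallback for the compiler option. This is a rough sketch; choice_or_default is a hypothetical generalisation of compile_choices, not something argparse provides.

```python
import argparse
import functools


def choice_or_default(astr, choices, default):
    """Return astr if it is an allowed choice, otherwise the default."""
    return astr if astr in choices else default


# Bind the extra arguments up front; argparse then calls it with one string.
compiler_type = functools.partial(
    choice_or_default,
    choices=('default', 'clang3.4', 'clang3.5'),
    default='default',
)

argp = argparse.ArgumentParser()
argp.add_argument('--compiler', type=compiler_type, default='default')

print(argp.parse_args(['--compiler', 'foo']).compiler)       # default
print(argp.parse_args(['--compiler', 'clang3.5']).compiler)  # clang3.5
```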

How to use nosetests in python while also passing/accepting arguments for argparse?

I want to use nose and coverage in my project. When I run nose with the --with-coverage argument, my program's argument-parsing module goes nuts because "--with-coverage" isn't a real argument according to it.
How do I turn the argparse off, but during testing only? Nose says all my tests fail because of the bad argument.
I actually just ran into this issue myself the other day. You don't need to "disable" your parsing module or anything. What you can do is change the module that uses argparse to ignore those arguments it receives that it doesn't recognize. That way they can still be used by other scripts (for example if your command-line call passes secondary arguments to another program execution).
Without your code, I'll assume you're using the standard parse_args() method on your argparse.ArgumentParser instance. Replace it with parse_known_args() instead.
Then, whenever you subsequently reference the parsed-arguments Namespace object, you'll need to specify an element, specifically index 0. While parse_args() returns the args object alone, parse_known_args() returns a tuple: the first element is the Namespace of parsed known arguments, and the second element is a list of the unrecognized arguments (which you can later use or pass on in your Python code, if necessary).
Here's the example change from my own project:
class RunArgs(object):
    '''
    A placeholder for processing arguments passed to program execution.
    '''
    def __init__(self):
        self.getArgs()
        #self.pause = self.args.pause # old assignment
        self.pause = self.args[0].pause # new assignment
        #...
    def __repr__(self):
        # %-formatting needs a tuple, not a generator expression
        return "<RunArgs(t=%s, #=%s, v=%s)>" % tuple(
            str(x) for x in (self.pause, self.numreads, self.verbose))
    def getArgs(self):
        global PAUSE_TIME
        global NUM_READS
        parser = argparse.ArgumentParser()
        parser.add_argument('-p', '--pause', required=False,
                            type=self.checkPauseArg, action='store', default=PAUSE_TIME)
        parser.add_argument('-n', '--numreads', required=False,
                            type=self.checkNumArg, action='store', default=NUM_READS)
        parser.add_argument('-v', '--verbose', required=False,
                            action='store_true')
        #self.args = parser.parse_args() # old parse call
        self.args = parser.parse_known_args() # new parse call
        #...
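Stripped of the project-specific parts, the behaviour of parse_known_args() can be seen in a few lines. The option names below mirror the ones above, with type=int standing in for the custom check functions, and the argument list is passed explicitly to simulate a nose invocation.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-p', '--pause', type=int, default=1)
parser.add_argument('-v', '--verbose', action='store_true')

# Simulates a command line that also carries a flag meant for nose.
known, unknown = parser.parse_known_args(['--pause', '3', '--with-coverage'])
print(known.pause)  # 3
print(unknown)      # ['--with-coverage']
```

Unpacking the tuple into two names avoids the [0] indexing used in the class above.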
I've read that you can use nose-testconfig, or otherwise use mock to replace the call (not test it). Though I'd agree with @Ned Batchelder: it does raise questions about how the code is structured.
As a workaround, instead of running nose with command-line arguments, you can have a .noserc or nose.cfg in the current working directory:
[nosetests]
verbosity=3
with-coverage=1
Though, I agree that parse_known_args() is a better solution.
It sounds like you have tests that run your code, and then your code uses argparse which implicitly pulls arguments from sys.argv. This is a bad way to structure your code. Your code under test should be getting arguments passed to it some other way so that you can control what arguments it sees.
This is an example of why global variables are bad. sys.argv is a global, shared by the entire process. You've limited the modularity, and therefore the testability, of your code by relying on that global.
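One common way to act on this advice, shown here as a sketch with made-up option names, is to give the entry point an argv parameter and hand it to parse_args(). When argv is None, argparse falls back to sys.argv[1:], so command-line use is unchanged, while tests pass their own list.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--numreads', type=int, default=1)
    parser.add_argument('-v', '--verbose', action='store_true')
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], so the script still
    # works when started from the command line.
    args = build_parser().parse_args(argv)
    return args


# In a test, pass the arguments explicitly; sys.argv is never consulted,
# so flags such as --with-coverage given to nose cannot interfere.
result = main(['-n', '5', '-v'])
print(result.numreads, result.verbose)  # 5 True
```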

How do you handle options that can't be used together (using OptionParser)?

My Python script (for todo lists) is started from the command line like this:
todo [options] <command> [command-options]
Some options can not be used together, for example
todo add --pos=3 --end "Ask Stackoverflow"
would specify both the third position and the end of the list. Likewise
todo list --brief --informative
would confuse my program about being brief or informative. Since I want quite powerful option control, there will be many cases like these, and new ones will surely arise in the future. If a user passes a bad combination of options, I want to give an informative message, preferably along with the usage help provided by optparse. Currently I handle this with an if-else statement that I find really ugly and poor. My dream is to have something like this in my code:
parser.set_not_allowed(combination=["--pos", "--end"],
                       message="--pos and --end can not be used together")
and the OptionParser would use this when parsing the options.
Since this doesn't exist as far as I know, I ask the SO community:
How do you handle this?
Possibly by extending optparse.OptionParser:
class Conflict(object):
    __slots__ = ("combination", "message", "parser")
    def __init__(self, combination, message, parser):
        self.combination = combination
        self.message = str(message)
        self.parser = parser
    def accepts(self, options):
        # Map option strings such as "--pos" to the attribute names optparse stores
        dests = (self.parser.get_option(opt).dest for opt in self.combination)
        count = sum(1 for dest in dests if hasattr(options, dest))
        return count <= 1
class ConflictError(Exception):
    def __init__(self, conflict):
        self.conflict = conflict
    def __str__(self):
        return self.conflict.message
class MyOptionParser(optparse.OptionParser):
    def __init__(self, *args, **kwds):
        optparse.OptionParser.__init__(self, *args, **kwds)
        self.conflicts = []
    def set_not_allowed(self, combination, message):
        self.conflicts.append(Conflict(combination, message, self))
    def parse_args(self, *args, **kwds):
        # Force-ignore the default values and parse the arguments first
        kwds2 = dict(kwds)
        kwds2["values"] = optparse.Values()
        options, _ = optparse.OptionParser.parse_args(self, *args, **kwds2)
        # Check for conflicts
        for conflict in self.conflicts:
            if not conflict.accepts(options):
                raise ConflictError(conflict)
        # Parse the arguments once again, now with defaults
        return optparse.OptionParser.parse_args(self, *args, **kwds)
You can then handle ConflictError where you call parse_args:
try:
    options, args = parser.parse_args()
except ConflictError as err:
    # str(err) goes through ConflictError.__str__; exceptions have no .message in Python 3
    parser.error(str(err))
Tamás's answer is a good start, but I couldn't get it to work, as it has (or had) a number of bugs, including a broken call to super, "parser" missing in Conflict.__slots__, always raising an error when a conflict is specified because of the use of parser.has_option() in Conflict.accepts(), etc.
Since I really needed this feature, I rolled my own solution and have made it available from the Python Package Index as ConflictsOptionParser. It works pretty much as a drop in replacement for optparse.OptionParser. (I do know argparse is the new command line parsing hotness, but it is not available in Python 2.6 and below and has less adoption currently than optparse. Send me an email if you'd like to hack up or have hacked up an additional argparse-based solution.) The key is two new methods, register_conflict(), and, to a lesser extent, unregister_conflict():
#!/usr/bin/env python
import conflictsparse
parser = conflictsparse.ConflictsOptionParser("python %prog [OPTIONS] ARG")
# You can retain the Option instances for flexibility, in case you change
# option strings later
verbose_opt = parser.add_option('-v', '--verbose', action='store_true')
quiet_opt = parser.add_option('-q', '--quiet', action='store_true')
# Alternatively, you don't need to keep references to the instances;
# we can re-use the option strings later
parser.add_option('--no-output', action='store_true')
# Register the conflict. Specifying an error message is optional; the
# generic one that is generated will usually do.
parser.register_conflict((verbose_opt, quiet_opt, '--no-output'))
# Now we parse the arguments as we would with
# optparse.OptionParser.parse_args()
opts, args = parser.parse_args()
It has a few advantages over the solution begun by Tamás:
It works out of the box and is installable through pip (or easy_install, if you must).
Options in a conflict may be specified either by their option strings or by their optparse.Option instances, which helps with the DRY principle; if you use the instances, you can change the actual strings without worrying about breaking conflict code.
It follows normal optparse.OptionParser.parse_args() behavior and automatically calls optparse.OptionParser.error() when it detects conflicting options in the command line arguments, rather than throwing the error directly. (This is both a feature and a bug; kind of a bug in optparse's general design, but a feature for this package in that it is at least consistent with optparse behavior.)
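For anyone who can use argparse rather than optparse, its built-in add_mutually_exclusive_group() covers the simple "these options cannot be combined" case from the question. The sketch below models only the --pos/--end pair; the prog name and help texts are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(prog='todo-add')
group = parser.add_mutually_exclusive_group()
group.add_argument('--pos', type=int, help='insert at this position')
group.add_argument('--end', action='store_true', help='append to the end')
parser.add_argument('text')

args = parser.parse_args(['--pos', '3', 'Ask Stackoverflow'])
print(args.pos, args.end)  # 3 False

# Passing both --pos and --end makes the parser print the usage text plus an
# error saying the two options are not allowed together, then exit.
```

Unlike register_conflict(), the error message here is generated by argparse and cannot be customised per group.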
