Getting NameError in unittests for testing a parameter from argparse

I normally use Python for scripts, but now I am trying to write a unit test and am having a lot of issues. I would like to test a method that counts the users (rows where users > 0) and writes that count to the path given by the --metrics argument. It is called like this:
count_users(df, args.metrics)
df is a Spark DataFrame, and metrics comes from argparse, set up like so:
if __name__ == "__main__":
    parser = argparse.ArgumentParser("Processing args")
    parser.add_argument("--metrics", required=True)
    main(parser.parse_args())
The method looks like this:
def count_users(df, metrics):
    users = df.where(df.users > 0).count()
    temp_df = df.withColumn("user_count_values", F.lit(users))
    temp_df.write.json(metrics)
Now I am trying to write my test, and this is the part I am not sure about:
def test_count_users(self):
    df = (
        SparkSession.builder.appName("test")
        .getOrCreate()
        .createDataFrame(
            data=[
                (Decimal(0),),
                (Decimal(22),),
            ],
            schema=StructType(
                [
                    StructField("users", DecimalType(38, 4), True),
                ]
            ),
        )
    )
    ap = argparse.ArgumentParser("Test args")
    ap.add_argument("metrics")
    args = {_.dest: _ for _ in ap._actions if isinstance(_, _StoreAction)}
    assert args.keys() == {"metrics"}
    count_users(df, args.metrics)
    self.assertTrue(args["metrics"], 1)
Right now I get an error that reads
count_users(df, args.metrics)
AttributeError: 'dict' object has no attribute 'metrics'

It's unclear what you are trying to achieve with the args = { ... } line or the two asserts. Remove them and use something standard, like:
import argparse
parser = argparse.ArgumentParser("Test args")
parser.add_argument("--metrics", required=True)
args = parser.parse_args(["--metrics", "output.json"])
count_users(df, args.metrics)
Your args variable won't have the appropriate attribute(s) until you parse the arguments. Of course, normally you'd call
args = parser.parse_args()
and let the user provide the --metrics outputfilename.json arguments on the command line. Passing an explicit list, as above, is mainly useful for examples and tests.
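A related sketch (not part of the answer above): move the parser out of the __main__ block into a function, so the test exercises the real parser definition instead of rebuilding one. build_parser is a hypothetical name; the Spark part of the test is left out here.

```python
import argparse
import unittest


def build_parser():
    # Hypothetical helper: the same parser the __main__ block builds,
    # factored out so tests can reuse it.
    parser = argparse.ArgumentParser("Processing args")
    parser.add_argument("--metrics", required=True)
    return parser


class ParserTest(unittest.TestCase):
    def test_metrics_is_parsed(self):
        args = build_parser().parse_args(["--metrics", "output.json"])
        self.assertEqual(args.metrics, "output.json")

    def test_metrics_is_required(self):
        # On a missing required option argparse prints the usage and calls
        # sys.exit(2), which surfaces as SystemExit.
        with self.assertRaises(SystemExit):
            build_parser().parse_args([])
```

Run it with python -m unittest as usual. If count_users only reads args.metrics, argparse.Namespace(metrics="output.json") is also enough to build the args object by hand.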

Related

Python: using command-line option to access a tuple

I want to pass a short product code when running my script:
./myscript.py --productcode r|u|c
Then use the short product code to look up data stored in a tuple in the python code:
# create tuples for each product
r=("Redhat","7.2")
u=("Ubuntu","7.5")
c=("Centos","8.1")
# parse the command line
parser = argparse.ArgumentParser()
parser.add_argument("--productcode", help="Short code for product")
options=parser.parse_args()
# get the product code
product_code=options.productcode
# Access elements in the relevant tuple
product_name=product_code[0]
product_version=product_code[1]

As was already mentioned in the comments, you can store the tuples in a dictionary keyed by the product code.
import argparse
mapping = {
'r': ("Redhat", "7.2"),
'u': ("Ubuntu", "7.5"),
'c': ("Centos", "8.1"),
}
parser = argparse.ArgumentParser()
parser.add_argument("--productcode", help="Short code for product")
options = parser.parse_args()
product = mapping[options.productcode]
print(product[0])
print(product[1])
In this case:
$ python script.py --productcode c
Centos
8.1
Alternatively, you can create the mapping dynamically (here I used namedtuple instead of a regular tuple).
import argparse
import sys
from collections import namedtuple
Product = namedtuple('Product', ['name', 'version', 'code'])
redhat = Product("Redhat", "7.2", 'r')
ubuntu = Product("Ubuntu", "7.5", 'u')
centos = Product("Centos", "8.1", 'c')
mapping = {
item.code: item
for item in locals().values()
if isinstance(item, Product)
}
parser = argparse.ArgumentParser()
parser.add_argument("--productcode", help="Short code for product")
options = parser.parse_args()
product = mapping[options.productcode]
print(product.name)
print(product.version)
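With either version, an unknown code makes mapping[options.productcode] raise a KeyError, and omitting --productcode makes the lookup key None. A minimal sketch, assuming you want argparse itself to reject bad input: the choices parameter restricts the accepted codes. required=True and the parse() wrapper are my additions, not part of the answers above.

```python
import argparse

# Same data as the question, keyed by the short product code
PRODUCTS = {
    "r": ("Redhat", "7.2"),
    "u": ("Ubuntu", "7.5"),
    "c": ("Centos", "8.1"),
}


def parse(argv=None):
    """Return the (name, version) tuple selected by --productcode."""
    parser = argparse.ArgumentParser()
    # choices= makes argparse reject unknown codes with a usage error
    # (exit status 2) instead of a KeyError at the dictionary lookup
    parser.add_argument("--productcode", choices=sorted(PRODUCTS), required=True,
                        help="Short code for product")
    options = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    return PRODUCTS[options.productcode]
```

In the script itself, name, version = parse() reads the real command line, and --help lists the accepted codes.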

Cannot work around the error TypeError: unhashable type: 'list' in pandas

My dataset is fairly large. I am trying to split it into separate DataFrames, one for each position category (Fullbacks, Center Backs, etc.), and write each one out.
So far I have this:
def get_dataset(f):
    return pd.read_csv(f)
def split(dataset):
    names = dataset.Categories.str.extract(r'([^>]*>[^>]*)').drop_duplicates().values.tolist()
    splitframes = [dataset[dataset['Categories'].str.contains(name)] for name in names]
    for splitframe in splitframes:
        splitframe.to_csv(splitframe.name + '.csv')
def main(file):
    dataset = get_dataset(file)
    split(dataset)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('infile', nargs=1, type=argparse.FileType('r'))
    args = parser.parse_args()
    main(*args.infile)
I cannot get past this error: TypeError: unhashable type: 'list'
I would appreciate help figuring out what is wrong with what I am trying to do and how to get around it.
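No answer was included here, but a likely cause can be read from the code. With a single capture group, str.extract(...) returns a DataFrame by default, so .values.tolist() produces a list of one-element lists such as [['Defender>Fullback'], ...]. Each name handed to str.contains() is therefore a list, and compiling a list as a regex pattern fails with TypeError: unhashable type: 'list'. The next line would fail too, because a DataFrame has no .name attribute. A sketch of one way around both, assuming Categories holds strings like 'Defender>Fullback>Left' (the real data isn't shown); returning a dict and naming files after the prefix are my own choices.

```python
import re

import pandas as pd


def split(dataset):
    """Return {prefix: rows whose Categories contain that 'A>B' prefix}."""
    # expand=False: with one capture group, str.extract returns a Series,
    # so unique() yields plain strings instead of one-element lists.
    prefixes = (
        dataset["Categories"]
        .str.extract(r"([^>]*>[^>]*)", expand=False)
        .dropna()
        .unique()
    )
    frames = {}
    for prefix in prefixes:
        # regex=False: match the prefix as a literal substring;
        # na=False: rows with a missing category are simply not selected.
        mask = dataset["Categories"].str.contains(prefix, regex=False, na=False)
        frames[prefix] = dataset[mask]
    return frames


def csv_name(prefix):
    # DataFrames have no .name attribute, so derive the file name from the key,
    # e.g. "Defender>Centre Back" -> "Defender_Centre_Back.csv"
    return re.sub(r"\W+", "_", prefix).strip("_") + ".csv"


def write_frames(frames):
    for prefix, frame in frames.items():
        frame.to_csv(csv_name(prefix), index=False)
```

main() would then call write_frames(split(dataset)).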

class instance method takes exactly two args, one given

I am having a bit of an issue. First off, I know that this code could stand alone and not be in a class, but I would prefer that it is in a class. Second, when I run the code, I get this error: TypeError: set_options() takes exactly 2 arguments (1 given).
Here is my code. If anyone could point me in the right direction, I would appreciate it. I'm assuming that the set_options method isn't getting my jobj instance. Am I correct in assuming that, and how would one go about fixing this? P.S. I do have the correct imports, and here is the command I run in the terminal: python test.py radar 127.0.0.1 hashNumber testplan:speed
class TransferStuff(object):
    tool = sys.argv[1]
    target = sys.argv[2]
    hash = sys.argv[3]
    options = sys.argv[4]
    def set_options(self, test_options):
        option_arr = test_options.split(',')
        new_arr = [i + ':{}'.format(i) for i in option_arr if ':' not in i]
        for i in option_arr:
            if ':' in i:
                new_arr.append(i)
        d = {}
        for i in new_arr:
            temp = i.split(':')
            d[temp[0]] = temp[1]
        return d
    data = {'target': target, 'test': tool, 'HASH': hash,
            'options': set_options(options)}
    def write_to_json(self):
        """Serialize cli args and tool options in json format.
        Write stream to json file.
        """
        with open('envs.json', 'w') as fi:
            json.dump(TransferStuff.data, fi)
if __name__ == "__main__":
    try:
        jobj = TransferStuff()
        jobj.write_to_json()

Your method is inside a class, so you need to create an instance of the class:
transfer_stuff_instance = TransferStuff()
And call the method with this instance:
transfer_stuff_instance.set_options(options)
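The call that fails is set_options(options) inside the class body: that line runs while the class is being defined, when set_options is still a plain function and no instance (no self) exists yet, so it receives one argument instead of two. A sketch of one restructuring, moving the argument handling into __init__ so the method is called as self.set_options(...). Passing argv in explicitly, rather than reading sys.argv in the class, is my own change to make it testable.

```python
import json


class TransferStuff(object):
    def __init__(self, argv):
        # argv is expected to look like sys.argv: [script, tool, target, hash, options]
        self.tool, self.target, self.hash, options = argv[1:5]
        self.data = {
            "target": self.target,
            "test": self.tool,
            "HASH": self.hash,
            # Called through the instance, so self is passed automatically
            "options": self.set_options(options),
        }

    def set_options(self, test_options):
        # Same logic as the question: "a,b:c" -> {"a": "a", "b": "c"}
        option_arr = test_options.split(",")
        new_arr = [i + ":{}".format(i) for i in option_arr if ":" not in i]
        new_arr.extend(i for i in option_arr if ":" in i)
        d = {}
        for item in new_arr:
            key, value = item.split(":")[:2]
            d[key] = value
        return d

    def write_to_json(self, path="envs.json"):
        """Serialize the CLI args and tool options as JSON and write them to path."""
        with open(path, "w") as fi:
            json.dump(self.data, fi)
```

In the script, TransferStuff(sys.argv).write_to_json() then replaces the two lines under __main__.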

How do I pass data from one operator to another

I made a custom Airflow operator; it takes an input, and its output is pushed to XCom.
What I want to achieve is to call the operator with some defined input, parse the output in a Python callable inside a BranchPythonOperator, and then pass the parsed output to another task, further down the tree, that uses the same operator:
CustomOperator_Task1 = CustomOperator(
    data={
        'type': 'custom',
        'date': '2017-11-12'
    },
    task_id='CustomOperator_Task1',
    dag=dag)
data = {}
def checkOutput(**kwargs):
    result = kwargs['ti'].xcom_pull(task_ids='CustomOperator_Task1')
    if result.success:
        data = result.data
        return "CustomOperator_Task2"
    return "Failure"
BranchOperator_Task = BranchPythonOperator(
    task_id='BranchOperator_Task',
    dag=dag,
    python_callable=checkOutput,
    provide_context=True,
    trigger_rule="all_done")
CustomOperator_Task2 = CustomOperator(
    data=data,
    task_id='CustomOperator_Task2',
    dag=dag)
CustomOperator_Task1 >> BranchOperator_Task >> CustomOperator_Task2
I want CustomOperator_Task2 to receive the parsed data from BranchOperator_Task, but right now it always gets an empty dict ({}).
What is the best way to do that?

I see your issue now. Setting the data variable like you are won't work because of how Airflow works. An entirely different process will be running the next task, so it won't have the context of what data was set to.
Instead, BranchOperator_Task has to push the parsed output into another XCom so CustomOperator_Task2 can explicitly fetch it.
def checkOutput(**kwargs):
    ti = kwargs['ti']
    result = ti.xcom_pull(task_ids='CustomOperator_Task1')
    if result.success:
        # push the parsed data so the downstream task can pull it explicitly
        ti.xcom_push(key='data', value=result.data)
        return "CustomOperator_Task2"
    return "Failure"
BranchOperator_Task = BranchPythonOperator(
    ...)
CustomOperator_Task2 = CustomOperator(
    data_xcom_task_id=BranchOperator_Task.task_id,
    data_xcom_key='data',
    task_id='CustomOperator_Task2',
    dag=dag)
Then your operator might look something like this.
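from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults  # Airflow 1.x; Airflow 2 applies defaults automatically
class CustomOperator(BaseOperator):
    @apply_defaults
    def __init__(self, data_xcom_task_id, data_xcom_key, *args, **kwargs):
        super(CustomOperator, self).__init__(*args, **kwargs)  # task_id, dag, etc.
        self.data_xcom_task_id = data_xcom_task_id
        self.data_xcom_key = data_xcom_key
    def execute(self, context):
        data = context['ti'].xcom_pull(task_ids=self.data_xcom_task_id, key=self.data_xcom_key)
        ...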
Parameters may not be required if you just want to hardcode them. It depends on your use case.

As your comment suggests, the return value of your custom operator is None, so xcom_pull will give you back None rather than any data.
Please use xcom_push explicitly, as the default behavior of airflow could change over time.

argparse subcommands with nested namespaces

Does argparse provide built-in facilities for having it parse groups or parsers into their own namespaces? I feel like I must be missing an option somewhere.
Edit: This example is probably not exactly how I should structure the parser to meet my goal, but it is what I have worked out so far. My specific goal is to be able to give subparsers groups of options that are parsed into namespace fields. The idea I had with parents was simply to use common options for this same purpose.
Example:
import argparse
# Main parser
main_parser = argparse.ArgumentParser()
main_parser.add_argument("-common")
# filter parser
filter_parser = argparse.ArgumentParser(add_help=False)
filter_parser.add_argument("-filter1")
filter_parser.add_argument("-filter2")
# sub commands
subparsers = main_parser.add_subparsers(help='sub-command help')
parser_a = subparsers.add_parser('command_a', help="command_a help", parents=[filter_parser])
parser_a.add_argument("-foo")
parser_a.add_argument("-bar")
parser_b = subparsers.add_parser('command_b', help="command_b help", parents=[filter_parser])
parser_b.add_argument("-biz")
parser_b.add_argument("-baz")
# parse
namespace = main_parser.parse_args()
print(namespace)
This is what I get, obviously:
$ python test.py command_a -foo bar -filter1 val
Namespace(bar=None, common=None, filter1='val', filter2=None, foo='bar')
But this is what I am really after:
Namespace(bar=None, common=None, foo='bar',
filter=Namespace(filter1='val', filter2=None))
And then even more groups of options already parsed into namespaces:
Namespace(common=None,
foo='bar', bar=None,
filter=Namespace(filter1='val', filter2=None),
anotherGroup=Namespace(bazers='val'),
anotherGroup2=Namespace(fooers='val'),
)
I've found a related question, but it involves some custom parsing and seems to cover only a really specific circumstance.
Is there an option somewhere to tell argparse to parse certain groups into namespaced fields?

If the focus is on just putting selected arguments in their own namespace, and the use of subparsers (and parents) is incidental to the issue, this custom action might do the trick.
class GroupedAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # dest looks like 'group.name'; split only on the first dot
        group, dest = self.dest.split('.', 1)
        groupspace = getattr(namespace, group, argparse.Namespace())
        setattr(groupspace, dest, values)
        setattr(namespace, group, groupspace)
There are various ways of specifying the group name. It could be passed as an argument when defining the Action, or added as a parameter. Here I chose to parse it from the dest, so namespace.filter.filter1 gets the value stored under the dest filter.filter1.
# Main parser
main_parser = argparse.ArgumentParser()
main_parser.add_argument("-common")
filter_parser = argparse.ArgumentParser(add_help=False)
filter_parser.add_argument("--filter1", action=GroupedAction, dest='filter.filter1', default=argparse.SUPPRESS)
filter_parser.add_argument("--filter2", action=GroupedAction, dest='filter.filter2', default=argparse.SUPPRESS)
subparsers = main_parser.add_subparsers(help='sub-command help')
parser_a = subparsers.add_parser('command_a', help="command_a help", parents=[filter_parser])
parser_a.add_argument("--foo")
parser_a.add_argument("--bar")
parser_a.add_argument("--bazers", action=GroupedAction, dest='anotherGroup.bazers', default=argparse.SUPPRESS)
...
namespace = main_parser.parse_args()
print(namespace)
I had to add default=argparse.SUPPRESS so a bazers=None entry does not appear in the main namespace.
Result:
$ python PROG command_a --foo bar --filter1 val --bazers val
Namespace(anotherGroup=Namespace(bazers='val'),
bar=None, common=None,
filter=Namespace(filter1='val'),
foo='bar')
If you need default entries in the nested namespaces, you could define the namespace beforehand:
filter_namespace = argparse.Namespace(filter1=None, filter2=None)
namespace = argparse.Namespace(filter=filter_namespace)
namespace = main_parser.parse_args(namespace=namespace)
The result is as before, except for:
filter=Namespace(filter1='val', filter2=None)

I'm not entirely sure what you're asking, but I think what you want is for an argument group or sub-command to put its arguments into a sub-namespace.
As far as I know, argparse does not do this out of the box. But it really isn't hard to do by postprocessing the result, as long as you're willing to dig under the covers a bit. (I'm guessing it's even easier to do it by subclassing ArgumentParser, but you explicitly said you don't want to do that, so I didn't try that.)
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
breakfast = parser.add_argument_group('breakfast')
breakfast.add_argument('--spam')
breakfast.add_argument('--eggs')
args = parser.parse_args()
Now, the list of all destinations for breakfast options is:
[action.dest for action in breakfast._group_actions]
And the key-value pairs in args is:
args._get_kwargs()
So, all we have to do is move the ones that match. It'll be a little easier if we construct dictionaries to create the namespaces from:
breakfast_options = [action.dest for action in breakfast._group_actions]
top_names = {name: value for (name, value) in args._get_kwargs()
if name not in breakfast_options}
breakfast_names = {name: value for (name, value) in args._get_kwargs()
if name in breakfast_options}
top_names['breakfast'] = argparse.Namespace(**breakfast_names)
top_namespace = argparse.Namespace(**top_names)
And that's it; top_namespace looks like:
Namespace(breakfast=Namespace(eggs=None, spam='7'), foo='bar')
Of course in this case, we've got one static group. What if you wanted a more general solution? Easy. parser._action_groups is a list of all groups, but the first two are the global positional and keyword groups. So, just iterate over parser._action_groups[2:], and do the same thing for each that you did for breakfast above.
What about sub-commands instead of groups? Similar, but the details are different. If you've kept around each subparser object, it's just a whole other ArgumentParser. If not, but you did keep the subparsers object, it's a special type of Action, whose choices is a dict whose keys are the subparser names and whose values are the subparsers themselves. If you kept neither… start at parser._subparsers and figure it out from there.
At any rate, once you know how to find the names you want to move and where you want to move them, it's the same as with groups.
If you've got, in addition to global args and/or groups and subparser-specific args and/or groups, some groups that are shared by multiple subparsers… then conceptually it gets tricky, because each subparser ends up with references to the same group, and you can't move it to all of them. But fortunately, you're only dealing with exactly one subparser (or none), so you can just ignore the other subparsers and move any shared group under the selected subparser (and any group that doesn't exist in the selected subparser, either leave at the top, or throw away, or pick one subparser arbitrarily).
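For the sub-command case above, here is a minimal sketch, assuming you kept the action object returned by add_subparsers() and gave it a dest so the chosen name is recorded. nest_subcommand is a hypothetical helper; it uses the action's choices mapping described above plus each subparser's private _actions list, so it may need adjusting across Python versions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--common")
    # keep the action object; dest records which sub-command was chosen
    subparsers = parser.add_subparsers(dest="command")
    parser_a = subparsers.add_parser("command_a")
    parser_a.add_argument("--foo")
    parser_a.add_argument("--bar")
    parser_b = subparsers.add_parser("command_b")
    parser_b.add_argument("--biz")
    return parser, subparsers


def nest_subcommand(args, subparsers):
    """Move the selected sub-command's values into args.<command name>."""
    flat = vars(args).copy()
    selected = flat.get(subparsers.dest)
    if selected is None:  # no sub-command given
        return args
    # subparsers.choices maps each sub-command name to its ArgumentParser
    subparser = subparsers.choices[selected]
    sub_dests = {action.dest for action in subparser._actions}
    nested = {name: flat.pop(name) for name in list(flat) if name in sub_dests}
    flat[selected] = argparse.Namespace(**nested)
    return argparse.Namespace(**flat)
```

A sub-command option whose dest clashes with a top-level one would be moved as well; this sketch makes no attempt to handle that.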

Nesting with Action subclasses is fine for one type of Action, but is a nuisance if you need to subclass several types (store, store_true, append, etc.). Here's another idea: subclass Namespace. Do the same sort of name split and setattr, but do it in the Namespace rather than the Action. Then just create an instance of the new class and pass it to parse_args.
class Nestedspace(argparse.Namespace):
    def __setattr__(self, name, value):
        if '.' in name:
            group,name = name.split('.',1)
            ns = getattr(self, group, Nestedspace())
            setattr(ns, name, value)
            self.__dict__[group] = ns
        else:
            self.__dict__[name] = value
p = argparse.ArgumentParser()
p.add_argument('--foo')
p.add_argument('--bar', dest='test.bar')
print(p.parse_args('--foo test --bar baz'.split()))
ns = Nestedspace()
print(p.parse_args('--foo test --bar baz'.split(), ns))
p.add_argument('--deep', dest='test.doo.deep')
args = p.parse_args('--foo test --bar baz --deep doodod'.split(), Nestedspace())
print(args)
print(args.test.doo)
print(args.test.doo.deep)
producing:
Namespace(foo='test', test.bar='baz')
Nestedspace(foo='test', test=Nestedspace(bar='baz'))
Nestedspace(foo='test', test=Nestedspace(bar='baz', doo=Nestedspace(deep='doodod')))
Nestedspace(deep='doodod')
doodod
The __getattr__ for this namespace (needed for actions like count and append) could be:
def __getattr__(self, name):
    if '.' in name:
        group,name = name.split('.',1)
        try:
            ns = self.__dict__[group]
        except KeyError:
            raise AttributeError
        return getattr(ns, name)
    else:
        raise AttributeError
I've proposed several other options, but like this the best. It puts the storage details where they belong, in the Namespace, not the parser.
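To check that the __getattr__ above is what count and append need (both read the current value through getattr before updating it), here is a self-contained Python 3 sketch combining the two methods. The -v/--tag options and the log group are made up for the example.

```python
import argparse


class Nestedspace(argparse.Namespace):
    """Namespace that stores a dotted dest such as 'log.tags' as args.log.tags."""

    def __setattr__(self, name, value):
        if "." in name:
            group, rest = name.split(".", 1)
            ns = getattr(self, group, Nestedspace())
            setattr(ns, rest, value)  # recurses for deeper paths like 'a.b.c'
            self.__dict__[group] = ns
        else:
            self.__dict__[name] = value

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if "." in name:
            group, rest = name.split(".", 1)
            try:
                ns = self.__dict__[group]
            except KeyError:
                raise AttributeError(name)
            return getattr(ns, rest)
        raise AttributeError(name)


parser = argparse.ArgumentParser()
parser.add_argument("--foo")
parser.add_argument("-v", action="count", dest="log.verbosity")
parser.add_argument("--tag", action="append", dest="log.tags")
```

With parser.parse_args(["--foo", "x", "-vv", "--tag", "a", "--tag", "b"], namespace=Nestedspace()), args.log.verbosity ends up as 2 and args.log.tags as ["a", "b"].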

In this script I have modified the __call__ method of the argparse._SubParsersAction. Instead of passing the namespace on to the subparser, it passes a new one. It then adds that to the main namespace. I only change 3 lines of __call__.
import argparse
from gettext import gettext as _  # argparse builds its messages with gettext's _()
def mycall(self, parser, namespace, values, option_string=None):
    parser_name = values[0]
    arg_strings = values[1:]
    # set the parser name if requested
    if self.dest is not argparse.SUPPRESS:
        setattr(namespace, self.dest, parser_name)
    # select the parser
    try:
        parser = self._name_parser_map[parser_name]
    except KeyError:
        args = {'parser_name': parser_name,
                'choices': ', '.join(self._name_parser_map)}
        msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
        raise argparse.ArgumentError(self, msg)
    # CHANGES
    # parse all the remaining options into a new namespace
    # store any unrecognized options on the main namespace, so that the top
    # level parser can decide what to do with them
    newspace = argparse.Namespace()
    newspace, arg_strings = parser.parse_known_args(arg_strings, newspace)
    setattr(namespace, 'subspace', newspace) # is there a better 'dest'?
    if arg_strings:
        vars(namespace).setdefault(argparse._UNRECOGNIZED_ARGS_ATTR, [])
        getattr(namespace, argparse._UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)
argparse._SubParsersAction.__call__ = mycall
# Main parser
main_parser = argparse.ArgumentParser()
main_parser.add_argument("--common")
# sub commands
subparsers = main_parser.add_subparsers(dest='command')
parser_a = subparsers.add_parser('command_a')
parser_a.add_argument("--foo")
parser_a.add_argument("--bar")
parser_b = subparsers.add_parser('command_b')
parser_b.add_argument("--biz")
parser_b.add_argument("--baz")
# parse
input = 'command_a --foo bar --bar val --filter extra'.split()
namespace = main_parser.parse_known_args(input)
print(namespace)
input = '--common test command_b --biz bar --baz val'.split()
namespace = main_parser.parse_args(input)
print(namespace)
This produces:
(Namespace(command='command_a', common=None,
subspace=Namespace(bar='val', foo='bar')),
['--filter', 'extra'])
Namespace(command='command_b', common='test',
subspace=Namespace(baz='val', biz='bar'))
I used parse_known_args to test how extra strings are passed back to the main parser.
I dropped the parents stuff because it does not add anything to this namespace change. It is just a convenient way of defining a set of arguments that several subparsers use. argparse does not keep a record of which arguments were added via parents and which were added directly. It is not a grouping tool.
argument_groups don't help much either. They are used by the Help formatter, but not by parse_args.
I could subclass _SubParsersAction (instead of reassigning __call__), but then I'd have to register the subclass with main_parser.register.
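A sketch of that subclassing route, keeping the same 'subspace' attribute. The (undocumented) ArgumentParser.register('action', 'parsers', cls) call swaps the class that add_subparsers() instantiates, so it has to come before add_subparsers(). _SubParsersAction and _UNRECOGNIZED_ARGS_ATTR are private argparse names and may need adjusting between Python versions; NestedSubParsersAction is a hypothetical name.

```python
import argparse


class NestedSubParsersAction(argparse._SubParsersAction):
    """Parse the chosen sub-command's options into namespace.subspace."""

    def __call__(self, parser, namespace, values, option_string=None):
        parser_name, arg_strings = values[0], values[1:]
        if self.dest is not argparse.SUPPRESS:
            setattr(namespace, self.dest, parser_name)
        # argparse has already checked parser_name against the known choices
        subparser = self.choices[parser_name]
        subspace, extras = subparser.parse_known_args(arg_strings, argparse.Namespace())
        setattr(namespace, "subspace", subspace)
        if extras:
            # hand unknown strings back to the top-level parser, like the stock action
            vars(namespace).setdefault(argparse._UNRECOGNIZED_ARGS_ATTR, [])
            getattr(namespace, argparse._UNRECOGNIZED_ARGS_ATTR).extend(extras)


main_parser = argparse.ArgumentParser()
# must happen before add_subparsers(), which looks up the 'parsers' action class
main_parser.register("action", "parsers", NestedSubParsersAction)
main_parser.add_argument("--common")
subparsers = main_parser.add_subparsers(dest="command")
parser_a = subparsers.add_parser("command_a")
parser_a.add_argument("--foo")
parser_a.add_argument("--bar")
parser_b = subparsers.add_parser("command_b")
parser_b.add_argument("--biz")
parser_b.add_argument("--baz")
```

Unlike the monkeypatch, this leaves every other parser in the program using the stock _SubParsersAction.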

Starting from abarnert's answer, I put together the following MWE++ ;-) that handles multiple configuration groups with similar option names.
#!/usr/bin/env python2
import argparse, re
cmdl_skel = {
'description' : 'An example of multi-level argparse usage.',
'opts' : {
'--foo' : {
'type' : int,
'default' : 0,
'help' : 'foo help main',
},
'--bar' : {
'type' : str,
'default' : 'quux',
'help' : 'bar help main',
},
},
# Assume your program uses sub-programs with their options. Argparse will
# first digest *all* defs, so opts with the same name across groups are
# forbidden. The trick is to use the module name (=> group.title) as
# pseudo namespace which is stripped off at group parsing
'groups' : [
{ 'module' : 'mod1',
'description' : 'mod1 description',
'opts' : {
'--mod1-foo, --mod1.foo' : {
'type' : int,
'default' : 0,
'help' : 'foo help for mod1'
},
},
},
{ 'module' : 'mod2',
'description' : 'mod2 description',
'opts' : {
'--mod2-foo, --mod2.foo' : {
'type' : int,
'default' : 1,
'help' : 'foo help for mod2'
},
},
},
],
'args' : {
'arg1' : {
'type' : str,
'help' : 'arg1 help',
},
'arg2' : {
'type' : str,
'help' : 'arg2 help',
},
}
}
def parse_args():
    def _parse_group(parser, opt, **optd):
        # digest variants
        optv = re.split(r'\s*,\s*', opt)
        # this may raise exceptions...
        parser.add_argument(*optv, **optd)
    errors = {}
    parser = argparse.ArgumentParser(description=cmdl_skel['description'],
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # it'd be nice to loop in a single run over zipped lists, but they have
    # different lengths...
    for opt in cmdl_skel['opts'].keys():
        _parse_group(parser, opt, **cmdl_skel['opts'][opt])
    for arg in cmdl_skel['args'].keys():
        _parse_group(parser, arg, **cmdl_skel['args'][arg])
    for grp in cmdl_skel['groups']:
        group = parser.add_argument_group(grp['module'], grp['description'])
        for mopt in grp['opts'].keys():
            _parse_group(group, mopt, **grp['opts'][mopt])
    args = parser.parse_args()
    all_group_opts = []
    all_group_names = {}
    for group in parser._action_groups[2:]:
        gtitle = group.title
        group_opts = [action.dest for action in group._group_actions]
        all_group_opts += group_opts
        group_names = {
            # remove the leading pseudo-namespace
            re.sub("^%s_" % gtitle, '', name) : value
            for (name, value) in args._get_kwargs()
            if name in group_opts
        }
        # build group namespace
        all_group_names[gtitle] = argparse.Namespace(**group_names)
    # rebuild top namespace
    top_names = {
        name: value for (name, value) in args._get_kwargs()
        if name not in all_group_opts
    }
    top_names.update(**all_group_names)
    top_namespace = argparse.Namespace(**top_names)
    return top_namespace
def main():
    args = parse_args()
    print(str(args))
    print(args.bar)
    print(args.mod1.foo)
if __name__ == '__main__':
    main()
Then you can call it like this (mnemonic: --mod1-... are options for "mod1", etc.):
$ ./argparse_example.py one two --bar=three --mod1-foo=11231 --mod2.foo=46546
Namespace(arg1='one', arg2='two', bar='three', foo=0, mod1=Namespace(foo=11231), mod2=Namespace(foo=46546))
three
11231

Based on the answer by @abarnert, I wrote a simple function that does what the OP wants:
from argparse import Namespace, ArgumentParser
def parse_args(parser):
    assert isinstance(parser, ArgumentParser)
    args = parser.parse_args()
    # the first two argument groups are the default positional and optional groups
    pos_group, optional_group = parser._action_groups[0], parser._action_groups[1]
    args_dict = args._get_kwargs()
    pos_optional_arg_names = [arg.dest for arg in pos_group._group_actions] + [arg.dest for arg in optional_group._group_actions]
    pos_optional_args = {name: value for name, value in args_dict if name in pos_optional_arg_names}
    other_group_args = dict()
    # If there are additional argument groups, add them as nested namespaces
    if len(parser._action_groups) > 2:
        for group in parser._action_groups[2:]:
            group_arg_names = [arg.dest for arg in group._group_actions]
            other_group_args[group.title] = Namespace(**{name: value for name, value in args_dict if name in group_arg_names})
    # combine the positional/optional args and the group args
    combined_args = pos_optional_args
    combined_args.update(other_group_args)
    return Namespace(**combined_args)
You just give it the ArgumentParser instance, and it returns a nested Namespace that follows the group structure of the arguments.

Please check out the argpext module on PyPI; it may help you.
