Unit testing a method that iterates through loops - python

I am trying to come up with a good way to test a method that goes through some loops and then gets an object and calls one of its methods. That class has its own tests so I'm not sure exactly what to test here. I have written it to have different behavior in the test case, which is obviously not ideal.
I'm looking for suggestions on how to improve this to test without the conditionals.
def direct(self, test=False):
"""
routes data in self.data_groups to
consumers in self.consumers_list
"""
if test:
output_list = []
for data_type, group in self.data_groups.items():
if test:
output_list.append(data_type)
output_list.append(group)
for consumer_name in self.consumers_list[data_type]:
for record in group:
if test:
output_list.append(record.values()[0])
else:
consumer = self.get_consumer(consumer_name,
record)
consumer_output = consumer.do_something()
if test:
return output_list
return True

Unfortunately I'm not sure that what you are talking about is possible. I'd say you could use a decorator but that'd be useless without overriding direct completely. I'm not sure if this is any better for you but you could just override the method for your tests in its own class? This would make it look a lot cleaner and you could group your code in a more intuitive way.
do something like:
class DirectClass:
def __init__(self):
self.data_groups = dict
self.consumers_list = list
def direct(self):
"""
routes data in self.data_groups to
consumers in self.consumers_list
"""
for data_type, group in self.data_groups.items():
for consumer_name in self.consumers_list[data_type]:
for record in group:
consumer = self.get_consumer(consumer_name,
record)
consumer_output = consumer.do_something()
return True
class TestDirect(DirectClass):
def __init__(self):
DirectClass.__init__(self)
def direct(self):
output_list = []
for data_type, group in self.data_groups.items():
output_list.append(data_type)
output_list.append(group)
for consumer_name in self.consumers_list[data_type]:
for record in group:
output_list.append(record.values()[0])
return output_list

I settled on sub-classing the object returned by consumer so that I could call the same code and have the over-ridden run (formerly do_something) method return test data. There are still more ifs than I want but it achieves most of my objective. Credit to #user2916286 for getting me thinking about subclassing in this case.
def direct(self, test=False):
"""
routes data in self.data_groups to
consumers in self.consumers_list
"""
if test:
output_list = []
for data_type, group in self.data_groups.items():
if test:
output_list.append(data_type)
output_list.append(group)
for consumer_name in self.consumers_list[data_type]:
for record in group:
consumer = self.get_consumer(consumer_name,
record, test=test)
consumer_output = consumer.run()
if not consumer_output:
raise Exception('consumer failed')
output_list.append(consumer_output)
if test:
return output_list
return True

Related

How to create several chains in a for loop with Celery and group them all?

I'm trying to create several chains by iterating through a list, and group them, but Celery complains about task2 saying that too many arguments are given. How could I achieve that?
This is how my code looks like:
#shared_task(bind=True)
def update(self):
# list of chains
ch = [chain(task1.s(exid),
task2.s(exid),
task3.s(exid),
task4.s(exid)
)()
for exid in list_exid]
gp = group(*ch)
gp.delay()
#shared_task(bind=True)
def task1(self, exid):
do_stuff()
#shared_task(bind=True)
def task2(self, exid):
do_stuff()
#shared_task(bind=True)
def task3(self, exid):
do_stuff()
#shared_task(bind=True)
def task4(self, exid):
do_stuff()
And the error:
TypeError: task2() takes 1 positional argument but 3 were given
I answer my own question because I have found the solution to my problem. It was caused by using the wrong s() signature.
It is better like this:
#shared_task(bind=True)
def update(self):
# list of chains
ch = [chain(task1.si(exid),
task2.si(exid),
task3.si(exid),
task4.si(exid)
)()
for exid in list_exid]
gp = group(*ch)
gp.delay()
EDIT
As stated in the documentation the use of .si creates immutable signatures meaning returned value of the previous task is ignored. I'm not sure what is the implication when a task from the chain fails

wrapping generators to have a single `next` call instead of two steps ( __iter__ + __next__ )

I'm receiving an unknown number of records for background processing from generators. If there is a more important job, I have to stop to release the process.
The main process is best described as:
def main():
generator_source = generator_for_test_data() # 1. contact server to get data.
uw = UploadWrapper(generator_source) # 2. wrap the data.
while not interrupt(): # 3. check for interrupts.
row = next(uw)
if row is None:
return
print(long_running_job(row)) # 4. do the work.
Is there a way to get to __next__ without having to plug __iter__?
Having two steps - (1) make an iterator, then (2) iterate over it, just seems clumsy.
There are many cases where I'd prefer to submit a function to a function manager (mapreduce style), but in this case I need an instantiated class with some settings. Registering a single function can therefor only work if that function alone is __next__
class UploadWrapper(object):
def __init__(self, generator):
self.generator = generator
self._iterator = None
def __iter__(self):
for page in self.generator:
yield from page.data
def __next__(self):
if self._iterator is None: # ugly bit.
self._iterator = self.__iter__() #
try:
return next(self._iterator)
except StopIteration:
return None
Q: Is there a simpler way?
Working sample added for completeness:
import time
import random
class Page(object):
def __init__(self, data):
self.data = data
def generator_for_test_data():
for t in range(10):
page = Page(data=[(t, i) for i in range(100, 110)])
yield page
def long_running_job(row):
time.sleep(random.randint(1,10)/100)
assert len(row) == 2
assert row[0] in range(10)
assert row[1] in range(100, 110)
return row
def interrupt(): # interrupt check
if random.randint(1,50) == 1:
print("INTERRUPT SIGNAL!")
return True
return False
class UploadWrapper(object):
def __init__(self, generator):
self.generator = generator
self._iterator = None
def __iter__(self):
for ft in self.generator:
yield from ft.data
def __next__(self):
if self._iterator is None:
self._iterator = self.__iter__()
try:
return next(self._iterator)
except StopIteration:
return None
def main():
gen = generator_for_test_data()
uw = UploadWrapper(gen)
while not interrupt(): # check for job interrupt.
row = next(uw)
if row is None:
return
print(long_running_job(row))
if __name__ == "__main__":
main()
Your UploadWrapper seems overtly complex, there is more than a single simpler solution.
My first thought is to ditch the class altogether and just use a function instead:
def uploadwrapper(page_gen):
for page in page_gen:
yield from page.data
Just replace uw = UploadWrapper(gen) with uw = uploadwrapper(gen), and that'll work.
If you insist on the class, you can just get rid of the __next__() and replace uw = UploadWrapper(gen) with uw = iter(UploadWrapper(gen)), and it'll work.
In either case, you must also catch the StopIteration in the caller. __next__() is supposed to raise StopIteration when it's done, and not return None, like yours does. Otherwise, it won't work with things expecting a well-behaving iterator, eg. for loops.
I think you might have some misconceptions about how it all is supposed to fit together, so I'll try my best to explain how it's supposed to work, to the best of my knowledge:
The point of __iter__() is that if you have eg. a list, you can get multiple independent iterators by calling iter(). When you have a for loop, you're essentially first getting an iterator with iter() and then calling next() on it on every loop iteration. If you have two nested loops that use the same list, the iterators and their positions are still separate so there's no conflict. __iter__() is supposed to return an iterator for the container it's on, or if it's called on an iterator, it's supposed to just return self. In that sense, it's kind of wrong for UploadWrapper not to return self in __iter__(), since it wraps a generator and so can't really give independent iterators. As for why leaving out __next__() works, it's because when you define a generator (ie. use yield in a function), the generator has an __iter__() (that returns self, as it should) and __next__() that does what you'd expect. In your original code, you're not really using __iter__() at all for what it's supposed to be used: the code works even if you rename it to something else! This is because you never call iter() on the instance, and just directly call next().
If you wanted to do it "properly" as a class, I think something like this might suffice:
class UploadWrapper(object):
def __init__(self, generator):
self.generator = generator
self.subgen = iter(next(generator).data)
def __iter__(self):
return self
def __next__(self):
while True:
try:
return next(self.subgen)
except StopIteration:
self.subgen = iter(next(self.generator).data)

decompose "with" statements to various functions

I wrote a generic framework that help me to bench-mark code critical sections.
Here is an explanation of the framework and in the end is the problem I am facing and few ideas I have for solutions.
Basically, I am looking for more elegant solutions
Suppose I have a function that does this (in pseudo code):
#Pseudo Code - Don't expect it to run
def foo():
do_begin()
do_critical()
some_value = do_end()
return some_value
I want to run "do_critical" section many times in loop and measure the time but still get the return value.
so, I wrote BenchMarker class that its api is something like that:
#Pseudo Code - Don't expect it to run
bm = BenchMarker(first=do_begin, critical=do_critical, end=do_end)
bm.start_benchmarking()
returned_value = bm.returned_value
benchmark_result = bm.time
This Benckmarker internally performing the following:
#Pseudo Code - Don't expect it to run
class BenchMarker:
def __init__(self):
.....
def start_benchmarking(self):
first()
t0 = take_time
for i in range(n_loops):
critical()
t1 = take_time
self.time = (t1-t0)/n_loops
value = end()
self.returned_value = value
Important to mention that I also able to pass context between first, critical and end functions, but I omitted it for simplicity as this is not the gist of my question.
This framework is working like a charm until the following use case:
I have the following code
#Pseudo Code - Don't expect it to run
def bar():
do_begin()
with some_context_manager() as ctx:
do_critical()
some_value = do_end()
return some_value
Now, after this long introduction (sorry ...), I am getting to the real question.
I don't want to run the "with statement" in the time measuring loop, but the critical code needs the context manger.
so what I basically want is equivalent to the following decomposing of bar:
first -> do_begin() + "what happens in the with before the with body"
critical -> do_critical()
end -> "what happens after the with body" + do_end()
Two Solutions I thought of (but I don't like):
Solution 1
Mimic what with does under the hood
In end of first()m create the context manager object + run it's enter() function
In the start of end(), call the context manager exit() function
Solution 2
Framework Enhancement to handle CM
Add to the framework a "context work mode" (flag, whatever ...) on which the "start_benchmarking" flow will look like this:
#Pseudo Code - Don't expect it to run
def start_benchmarking(self):
first() #including instantiating the context manager
ctx = get_the_context_manager_created_in_first()
with ctx ...:
t0 = take_time
for i in range(n_loops):
critical()
t1 = take_time
self.time = (t1-t0)/n_loops
value = end()
self.returned_value = value
Any other, more elegant, solutions?
this is way over-complicated. and i cannot quite figure out why you'd actually want to do this, but assuming that you have reasons, just create a function that does your timing for you:
def run_func_n_times(n_times, func, *args, **kwargs):
start = time.time()
for _ in range(n_times):
res = func(*args, **kwargs)
return res, (time.time() - start) / n_times
no need for a class, just a simple func:
def example():
do_begin()
print('look, i am here')
with ctx() as blah:
res, timed = run_func_n_times(27, f, foo, bar)
do_end()

Conditional execution without having to check repeatedly for a condition

I have a class with code that fits into the following template:
class aClass:
def __init__(self, switch = False):
self.switch = switch
def f(self):
done = False
while not done:
# a dozen lines of code
if self.switch:
# a single line of code
# another dozen lines of code
So the single line of code in the if statement will either never be executed, or it will be executed in all iterations. And this is actually known as soon as the object is initialized.
When self.switch is True, I would like the single line of code to be executed without having to check for self.switch at every single iteration. And when self.switch is False, I would like the single line of code to be ignored, again without having to repeatedly check for self.switch.
I have of course considered writing two versions of f and selecting the appropriate one in __init__ according to the value of the switch, but duplicating all this code except for a single line doesn't feel right.
Can anyone suggest an elegant way to solve this problem? Perhaps a way to generate the appropriate version of the f method at initialization?
That's a completely valid ask. If not for performance then for readability.
Extract the three pieces of logic (before, inside, and after your condition) in three separate methods and in f() just write two implementations of the big loop:
def first(self):
pass
def second(self):
pass
def third(self):
pass
def f(self):
if self.switch:
while ...:
self.first()
self.third()
else:
while ...:
self.first()
self.second()
self.third()
If you want it more elegant (although it depends on taste), you express the two branches of my f() into two methods first_loop and second_loop and then in __init__ assign self.f = self.first_loop or self.f = self.second_loop depending on the switch:
class SuperUnderperformingAccordingToManyYetReadable(object):
def __init__(self, switch):
if self.switch:
self.f = self._first_loop
else:
self.f = self._second_loop
def _first(self):
pass
def _second(self):
pass
def _third(self):
pass
def _first_loop(self):
while ...:
self.first()
self.third()
def _second_loop(self):
while ...:
self.first()
self.second()
self.third()
You may need to do some extra work to manage breaking out of the while loop.
If the .switch attribute is not supposed to change, try to select the loop body dynamicly in the __init__() method:
def __init__(self, switch=False):
self.switch = switch
self.__fBody = self.__fSwitchTrue if switch else self.__fSwitchFalse
def f(self):
self.__done = False
while not self.__done:
self.__fBody()
def __fSwitchTrue(self):
self.__fBodyStart()
... # a single line of code
self.__fBodyEnd()
def __fSwitchFalse(self):
self.__fBodyStart()
self.__fBodyEnd()
def __fBodyStart(self):
... # a dozen lines of code
def __fBodyEnd(self):
... # another dozen lines of code
Remember to change values used by more than one of the defined methods to attributes (like done is changed to .__done).
In a comment to my original question, JohnColeman suggested using exec and provided a link to another relevant question.
That was an excellent suggestion and the solution I was lead to is:
_template_pre = """\
def f(self):
for i in range(5):
print("Executing code before the optional segment.")
"""
_template_opt = """\
print("Executing the optional segment")
"""
_template_post = """\
print("Executing code after the optional segment.")
"""
class aClass:
def __init__(self, switch = False):
if switch:
fdef = _template_pre + _template_opt + _template_post
else:
fdef = _template_pre + _template_post
exec(fdef, globals(), self.__dict__)
# bind the function
self.f = self.f.__get__(self)
You can verify this actually works:
aClass(switch = False).f()
aClass(switch = True).f()
Before jumping to conclusions as to how "pythonic" this is, let me point out that such an approach is employed in a couple of metaclass recipes I have encountered and even in the Python Standard Library (check the implementation of namedtuple, to name one example).

conditional python with

I have a five or six resources that have nice 'with' handlers, and normally I'd do this:
with res1, res2, res3, res4, res5, res6:
do1
do2
However, sometimes one or more of these resources should not be activated. Which leads to very ugly repetitive code:
with res1, res3, res4, res6: # these always acquired
if res2_enabled:
with res2:
if res5_enabled:
with res5:
do1
do2
else:
do1
do2
else if res5_enabled:
with res5:
...
There must be clean easy ways to do this surely?
You could create a wrapper object that supports the with statement, and do the checking in there. Something like:
with wrapper(res1), wrapper(res2), wrapper(res3):
...
or a wrapper than handles all of them:
with wrapper(res1, res2, res3):
...
The definition for you wrapper would be:
class wrapper(object):
def __init__(self, *objs):
...
def __enter__(self):
initialize objs here
def __exit__(self):
release objects here
If I understand you correctly you can do this:
from contextlib import contextmanager, nested
def enabled_resources(*resources):
return nested(*(res for res,enabled in resources if enabled))
# just for testing
#contextmanager
def test(n):
print n, "entered"
yield
resources = [(test(n), n%2) for n in range(10)]
# you want
# resources = [(res1, res1_enabled), ... ]
with enabled_resources(*resources):
# do1, do2
pass
Original Poster here; here is my approach refined so far:
I can add (or monkey-patch) the bool operator __nonzero__ onto the with objects, returning whether they are enabled. Then, when objects are mutually exclusive, I can have:
with res1 or res2 or res3 or res4:
...
When an resource is togglable, I can create an empty withable that is a nop; wither seems a nice name for it:
class sither:
#classmethod
def __enter__(cls): pass
#classmethod
def __exit__(cls,*args): pass
...
with res1 or wither, res2 or wither:
...
I can also use this keeping the toggling out of the withable objects:
with res1 if res1enabled else wither, res2 if res2enabled else wither:
..
Finally, those I have most control over, I can integrate the enabled checking into the class itself such that when used and not enabled, they are nop:
with res1, res2, res3:
...
The with statement is absolutely adorable, it just seems a bit unentrenched yet. It will be interesting to see what finesse others come up with in this regard...

Categories