Pytest fixtures not persisting despite session scope - python

I have the following pytest tests set up. I would like my_factory to be called only once per name in names, and file_factory only once per filename in filenames:
@pytest.fixture(scope="session", params=names)
def my_factory(request):
    name = request.param
    # create a, b, c which are expensive objects to construct
    return a, b, c

@pytest.fixture(scope="session", params=filenames)
def file_factory(request):
    fname = request.param
    # create f which is a big file to open
    return f

def test_single(my_factory, file_factory):
    a, b, c = my_factory
    f = file_factory
    # process file f with a, b, c
    # assert successful?

def test_double(my_factory, file_factory):
    a, b, c = my_factory
    f = file_factory
    # process file f backward with a, b, c
    # assert successful?
I thought scope='session' would do this for me; however, when I print some debugging statements from within each factory function, I see the print statements multiple times for each name in my_factory and each filename in file_factory. Am I doing something wrong, or is it possible to do what I am looking for?
The only other way I could think of was some sort of function that is executed first which creates a map of all of the objects I am looking for, but that seems clunky.
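For what it's worth, the "map built once" fallback need not be clunky: a memoising helper can serve as that map. Below is a minimal sketch of the idea outside of pytest; build_factory, names, construction_log and the returned tuple are all illustrative stand-ins, not pytest API.

```python
import functools

# Illustrative stand-ins for the real parameter list
names = ["x", "y"]
construction_log = []

@functools.lru_cache(maxsize=None)
def build_factory(name):
    construction_log.append(name)   # record each expensive construction
    return ("a", "b", "c")          # stand-ins for the expensive objects

# Simulate many tests repeatedly requesting the same parameters
for _ in range(3):
    for name in names:
        build_factory(name)

assert construction_log == ["x", "y"]   # each name was built exactly once
```

A session-scoped fixture could then simply look objects up in such a cache instead of constructing them.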

Refactoring nested loop with shared variables

I have a function that processes some quite nested data using nested loops. Its simplified structure is something like this:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        for b in a.elements:
            if b.some_condition:
                continue
            for c in b.elements:
                if c.some_condition:
                    continue
                for d in c.elements:
                    if d.some_condition:
                        do_something_using_all(a, b, c, d)
This does not look very Pythonic to me, so I want to refactor it. My idea was to break it into multiple functions, like:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        process_a_elements(a)

def process_a_elements(a):
    for b in a.elements:
        if b.some_condition:
            continue
        process_b_elements(b)

def process_b_elements(b):
    for c in b.elements:
        if c.some_condition:
            continue
        process_c_elements(c)

def process_c_elements(c):
    for d in c.elements:
        if d.some_condition:
            do_something_using_all(a, b, c, d)  # Problem: I do not have a nor b!
As you can see, at the most nested level I need to do something using all of its "parent" elements. The functions have their own scopes, so I can't access those elements. Passing all the previous elements to each function (like process_c_elements(c, a, b)) looks ugly and not very Pythonic to me either...
Any ideas?
I don't know your exact data structures or the complexity of your code, but you may try using a list to pass the object references to the next daisy-chained function, something like the following:
def process_elements(root):
    for a in root.elements:
        if a.some_condition:
            continue
        process_a_elements(a, [a])

def process_a_elements(a, listobjects):
    for b in a.elements:
        if b.some_condition:
            continue
        # pass a fresh copy so sibling branches don't accumulate in one list
        process_b_elements(b, listobjects + [b])

def process_b_elements(b, listobjects):
    for c in b.elements:
        if c.some_condition:
            continue
        process_c_elements(c, listobjects + [c])

def process_c_elements(c, listobjects):
    for d in c.elements:
        if d.some_condition:
            do_something_using_all(listobjects + [d])

def do_something_using_all(listobjects):
    print(listobjects)
FWIW, I've found a solution, which is to encapsulate all the processing inside a class, with attributes tracking the currently processed elements:
class ElementsProcessor:
    def __init__(self, root):
        self.root = root
        self.currently_processed_a = None
        self.currently_processed_b = None

    def process_elements(self):
        for a in self.root.elements:
            if a.some_condition:
                continue
            self.process_a_elements(a)

    def process_a_elements(self, a):
        self.currently_processed_a = a
        for b in a.elements:
            if b.some_condition:
                continue
            self.process_b_elements(b)

    def process_b_elements(self, b):
        self.currently_processed_b = b
        for c in b.elements:
            if c.some_condition:
                continue
            self.process_c_elements(c)

    def process_c_elements(self, c):
        for d in c.elements:
            if d.some_condition:
                do_something_using_all(
                    self.currently_processed_a,
                    self.currently_processed_b,
                    c,
                    d,
                )

Pytest - combining multiple fixture parameters into a single fixture to optimise fixture instantiation

I have an existing pytest test that makes use of some predefined lists to test the cross-product of them all:
A_ITEMS = [1, 2, 3]
B_ITEMS = [4, 5, 6]
C_ITEMS = [7, 8, 9]
I also have an expensive fixture that has internal conditions dependent on A and B items (but not C), called F:
class Expensive:
    def __init__(self):
        # expensive set up
        time.sleep(10)

    def check(self, a, b, c):
        return True  # keep it simple, but in reality this depends on a, b and c

@pytest.fixture
def F():
    return Expensive()
Currently I have a naive approach that simply parametrizes a test function:
@pytest.mark.parametrize("A", A_ITEMS)
@pytest.mark.parametrize("B", B_ITEMS)
@pytest.mark.parametrize("C", C_ITEMS)
def test_each(F, A, B, C):
    assert F.check(A, B, C)
This tests all combinations of F with A, B and C items; however, it constructs a new Expensive instance via the F fixture for every single combination of A, B and C.
This is very inefficient, because I should only need to construct a new Expensive when the values of A and B change, which they don't between all tests of C.
What I would like to do is somehow combine the F fixture with the A_ITEMS and B_ITEMS lists, so that the F fixture only instantiates a new instance once for each run through the values of C.
My first approach involves separating the A and B lists into their own fixtures and combining them with the F fixture:
class Expensive:
    def __init__(self, A, B):
        # expensive set up
        self.A = A
        self.B = B
        time.sleep(10)

    def check(self, c):
        return True  # keep it simple

@pytest.fixture(params=[1, 2, 3])
def A(request):
    return request.param

@pytest.fixture(params=[4, 5, 6])
def B(request):
    return request.param

@pytest.fixture
def F(A, B):
    return Expensive(A, B)

@pytest.mark.parametrize("C", C_ITEMS)
def test_each2(F, C):
    assert F.check(C)
Although this tests all combinations, unfortunately this creates a new instance of Expensive for each test, rather than combining each A and B item into a single instance that can be reused for each value of C.
I've looked into indirect fixtures, but I can't see a way to send multiple lists (i.e. both the A and B items) to a single fixture.
Is there a better approach I can take with pytest? Essentially what I'm looking to do is minimise the number of times Expensive is instantiated, given that it's dependent on values of item A and B.
Note: I've tried to simplify this, however the real-life situation is that F represents creation of a new process, A and B are command-line parameters for this process, and C is simply a value passed to the process via a socket. Therefore I want to be able to send each value of C to this process without recreating it every time C changes, but obviously if A or B change, I need to restart it (as they are command-line parameters to the process).
I've had some success using a more broadly scoped fixture (module or session) as a "cache" for the per-test fixture, for situations like this where the lifetimes of the fixtures proper don't align cleanly with the costs you want to amortise.
If using pytest scoping (as proposed in the other answer) is not an option, you may try to cache the expensive object, so that it is constructed only when needed.
Basically, this expands the proposal given in the question with additional static caching of the last used parameters, to avoid creating a new Expensive if it is not needed:
@pytest.fixture(params=A_ITEMS)
def A(request):
    return request.param

@pytest.fixture(params=B_ITEMS)
def B(request):
    return request.param

class FFactory:
    lastAB = None
    lastF = None

    @classmethod
    def F(cls, A, B):
        if (A, B) != cls.lastAB:
            cls.lastAB = (A, B)
            cls.lastF = Expensive(A, B)
        return cls.lastF

@pytest.fixture
def F(A, B):
    return FFactory.F(A, B)

@pytest.mark.parametrize("C", C_ITEMS)
def test_each(F, C):
    assert F.check(C)
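The hand-rolled cache in FFactory can also be had from the standard library: functools.lru_cache with maxsize=1 reproduces the "remember only the last (A, B) pair" behaviour. Here is a self-contained sketch outside of pytest; the Expensive stub, the loops and the parameter values are illustrative only.

```python
import functools

class Expensive:
    instances = 0  # count constructions for the demonstration

    def __init__(self, a, b):
        Expensive.instances += 1
        self.a, self.b = a, b

    def check(self, c):
        return True  # stands in for the real check

@functools.lru_cache(maxsize=1)     # keep only the most recent (a, b) pair
def make_expensive(a, b):
    return Expensive(a, b)

# With c iterated fastest, (a, b) only changes between c-batches,
# so each pair is constructed exactly once.
for a in [1, 2]:
    for b in [4, 5]:
        for c in [7, 8, 9]:
            assert make_expensive(a, b).check(c)

assert Expensive.instances == 4     # one construction per (a, b) pair
```

The trade-off is the same as FFactory's: this only helps when tests run in an order that keeps equal (a, b) pairs adjacent.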

How to extend logging without aggressively modifying code? [Write clean code]

Let's say I have a calculate() method which performs a complicated calculation with many variables, and I want to log the values of those variables at different phases (EDIT: not only for verification but for data-study purposes). For example:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b):
    c = 2*a + b
    d = a - b
    if c > d + 10:
        g = another_calc(a, c)
    else:
        g = another_calc(a, d)
    return c, d, g

def another_calc(a, c_d):
    e = a + c_d
    f = a*c_d
    g = e + f
    return g
You may assume the method will be modified a lot for experimental exploration.
There is not much logging here, and I want to log what happens. For example, I could write invasive code like this:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b):
    info = {"a": a, "b": b}
    c = 2*a + b
    d = a - b
    info["c"], info["d"] = c, d
    if c > d + 10:
        info["switch"] = "entered c"
        g, info = another_calc(a, c, info)
    else:
        info["switch"] = "entered d"
        g, info = another_calc(a, d, info)
    return c, d, g, info

def another_calc(a, c_d, info):
    e = a + c_d
    f = a*c_d
    g = e + f
    info["e"], info["f"], info["g"] = e, f, g
    return g, info
This serves my purpose (I get the info object, which is then exported as CSV for my further study), but it is pretty ugly to add more (non-functional) lines to the original clean calculate() method, changing its signature and return value.
Can I write cleaner code?
I am thinking about whether it is possible to use a decorator to wrap this method. Hope you guys have some great answers. Thanks.
One way to write cleaner code (in my opinion) is to wrap the info dictionary inside a class.
Here is my simple code example:
# These assignments are arbitrary,
# but my calculate() method is more complex
def calculate(a, b, logger):
    logger.log("a", a)
    logger.log("b", b)
    c = 2*a + b
    d = a - b
    logger.log("c", c)
    logger.log("d", d)
    if c > d + 10:
        logger.log("switch", "entered c")
        g = another_calc(a, c, logger)
    else:
        logger.log("switch", "entered d")
        g = another_calc(a, d, logger)
    return c, d, g

def another_calc(a, c_d, logger):
    e = a + c_d
    f = a*c_d
    g = e + f
    logger.log("e", e)
    logger.log("f", f)
    logger.log("g", g)
    return g

class Logger(object):
    def __init__(self):
        self.data = []  # instance attribute, so each Logger has its own log

    def log(self, key, value):
        self.data.append({key: value})

    def getLog(self):
        return self.data

logger = Logger()
print(calculate(4, 7, logger))
print(logger.getLog())
Pros and cons
I use a separate logger class here because then I don't need to know how the logger is implemented. In the example it is just a simple collection of key-value pairs, but if needed you can change the implementation behind creating a new logger.
You also get to choose how to print the data or where to send the output. Maybe you could even define an interface for Logger.
I used key-value pairs because it looked like that was all you needed.
Now, using the logger, we do need to change the method signature. Of course, you could give the logger a default value of None, but then the None value would have to be checked all the time, which is why I didn't define a default. If you own the code and can change every reference to calculate(), it should not be a problem.
There is also one interesting thing that could be important later. When you have finished debugging and no longer need to log anything, you can just implement a Null object logger. Using a Null object, you can remove all logging without changing the code again.
I was trying to think of how to use a decorator but did not find any good way. If only the output should be logged, then a decorator could work.
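As a sketch of that decorator idea (my own illustration, not from the thread): a decorator can record each call's arguments and return value without touching the function body, but it cannot see the intermediate locals, which is why it only fits the "log the output" case.

```python
import functools

def log_calls(log):
    """Append each call's function name, arguments and result to `log`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            log.append({"func": func.__name__,
                        "args": args,
                        "result": result})
            return result
        return wrapper
    return decorator

log = []

@log_calls(log)
def calculate(a, b):
    c = 2*a + b
    d = a - b
    return c, d

calculate(4, 7)
print(log)  # one entry per call, without modifying calculate()'s body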

Best way to handle functions and sub functions

What is the 'Pythonic' way to handle functions and subfunctions in a scenario where they are used in a particular order?
As one of the ideas seems to be that a function should do one thing, I run into situations where I split up functions even though they have a fixed order of execution.
When the functions are really a kind of 'do step 1', 'then, with the outcome of step 1, do step 2', I currently end up wrapping the step functions in another function while defining them at the same level. However, I'm wondering if this is indeed the way I should be doing this.
Example code:
def step_1(data):
    # do stuff on data
    return a

def step_2(data, a):
    # do stuff on data with a
    return b

def part_1(data):
    a = step_1(data)
    b = step_2(data, a)
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c
I'd be calling this from another file/script (or Jupyter notebook) as part_1 and then part_2
Seems to be working just fine for my purposes right now, but as I said I'm wondering at this (early) stage if I should be using a different approach for this.
I guess you can use a class here; otherwise, your code can be made shorter as follows:
def step_1(data):
    # do stuff on data
    return step_2(data, a)

def step_2(data, a):
    # do stuff on data with a
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c
As a rule of thumb, if several functions use the same arguments, it is a good idea to group them together into a class. But you can also define a main() or run() function that uses your functions in a sequential fashion. Since the example you have given is not too complex, I would avoid classes and go for something like:
def step_1(data):
    # do stuff on data
    return step_2(data, a)

def step_2(data, a):
    # do stuff on data with a
    return a, b

def part_2(data_set_2, a, b):
    # do stuff on data_set_2 with a and b as input
    return c

def run(data, data_set_2):
    a, b = step_1(data)
    return part_2(data_set_2, a, b)

run(data, data_set_2)
If the code grows in complexity, using classes is advised. In the end, it's your choice.
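To make the question's wrapper pattern concrete, here is a runnable toy version (the function bodies are invented placeholders for the "do stuff" comments):

```python
def step_1(data):
    # toy stand-in: derive a from data
    return sum(data)

def step_2(data, a):
    # toy stand-in: derive b from data and a
    return max(data) * a

def part_1(data):
    a = step_1(data)
    b = step_2(data, a)
    return a, b

def part_2(data_set_2, a, b):
    # combine the second data set with the earlier results
    return [x * a + b for x in data_set_2]

a, b = part_1([1, 2, 3])
result = part_2([10, 20], a, b)
print(result)  # [78, 138]
```

Each step stays single-purpose and testable on its own, while part_1 documents the fixed order of execution in one place.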

Returning multiple objects from a pytest fixture

I am learning how to use pytest by testing a simple event emitter implementation.
Basically, it looks like this
class EventEmitter():
    def __init__(self):
        ...

    def subscribe(self, event_map):
        # adds the listeners provided in event_map

    def emit(self, event, *args):
        # emits event with given args
For convenience, I created a Listener class that is used in tests
class Listener():
    def __init__(self):
        ...

    def operation(self):
        # actual listener
Currently, test looks the following way
@pytest.fixture
def event():
    ee = EventEmitter()
    lstr = Listener()
    ee.subscribe({"event" : [lstr.operation]})
    return lstr, ee

def test_emitter(event):
    lstr = event[0]
    ee = event[1]
    ee.emit("event")
    assert lstr.result == 7  # for example
In order to test the event emitter, I need to check whether the inner state of the listener has changed after event propagation. Thus, I need both objects, and I wonder if there is a better way to do this (maybe use two fixtures instead of one somehow), because this looks kinda ugly to me.
Usually, in order to avoid tuples and beautify your code, you can join them back together into one unit as a class, which has already been done for you in the standard library: collections.namedtuple.
import collections

EventListener = collections.namedtuple('EventListener', 'event listener')

Now modify your fixture:

@pytest.fixture
def event_listener():
    e = EventListener(EventEmitter(), Listener())
    e.event.subscribe({'event' : [e.listener.operation]})
    return e
Now modify your test:
def test_emitter(event_listener):
    event_listener.event.emit('event')
    assert event_listener.listener.result == 7
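Outside of pytest, the namedtuple pattern can be sketched as a self-contained example; the EventEmitter and Listener below are minimal invented stand-ins for the asker's classes, included only so the snippet runs.

```python
import collections

EventListener = collections.namedtuple('EventListener', 'event listener')

class EventEmitter:
    def __init__(self):
        self.handlers = {}

    def subscribe(self, event_map):
        for event, callbacks in event_map.items():
            self.handlers.setdefault(event, []).extend(callbacks)

    def emit(self, event, *args):
        for callback in self.handlers.get(event, []):
            callback(*args)

class Listener:
    def __init__(self):
        self.result = None

    def operation(self):
        self.result = 7  # some observable state change

e = EventListener(EventEmitter(), Listener())
e.event.subscribe({'event': [e.listener.operation]})
e.event.emit('event')
print(e.listener.result)  # 7
```

Accessing the parts as e.event and e.listener reads better than event[0] and event[1], while the fixture still returns a single object.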
You should use a Python feature called iterable unpacking into variables.
def test_emitter(event):
    lstr, ee = event  # unpacking
    ee.emit("event")
    assert lstr.result == 7
Basically, you are assigning event[0] to lstr, and event[1] to ee. Using this feature is a very elegant way to avoid using indexes.
Discarding
In case you are going to use your fixture in mutiple tests, and you don't need all values in every test, you can also discard some elements of the iterable if you are not interested in using them as follows:
l = ['a', 'b', 'c', 'd']
a, b, c, d = l  # unpacking all elements
a, _, c, d = l  # discarding b
a, _, _, d = l  # Python 2: discard b and c
a, *_, d = l    # Python 3: discard b and c
a, _, _, _ = l  # Python 2: discard b, c and d
a, *_ = l       # Python 3: discard b, c and d
In theory you are not literally discarding the values; in Python, the so-called "I don't care" name _ is conventionally used for ignoring specific values.
You will probably need two fixtures in this case.
You can try @pytest.yield_fixture, like:

@pytest.yield_fixture
def event():
    ...
    yield <event_properties>

@pytest.yield_fixture
def listener(event):
    ...
    yield <listener_properties>

Note: this is now deprecated: https://docs.pytest.org/en/latest/yieldfixture.html
If you cannot afford to easily split your tuple fixture into two independent fixtures, you can now "unpack" a tuple or list fixture into other fixtures using my pytest-cases plugin, as explained in this answer.
For your example that would look like:
from pytest_cases import pytest_fixture_plus

@pytest_fixture_plus(unpack_into="lstr,ee")
def event():
    ee = EventEmitter()
    lstr = Listener()
    ee.subscribe({"event" : [lstr.operation]})
    return lstr, ee

def test_emitter(lstr, ee):
    ee.emit("event")
    assert lstr.result == 7  # for example
I landed here when searching for a similar topic.
Due to lack of reputation I cannot comment on the answer from @lmiguelvargasf (https://stackoverflow.com/a/56268344/2067635), so I need to create a separate answer.
I'd also prefer returning multiple values and unpacking them to individual variables. This results in concise, Pythonic code.
There is one caveat, though:
For tests that rely on fixtures with autouse=True, this results in a TypeError
Example:
@pytest.fixture(autouse=True)
def foo():
    return 1, 2

# Gets fixture automagically through autouse
def test_breaks():
    arg1, arg2 = foo
    assert arg1 <= arg2

# Explicit request for fixture foo
def test_works(foo):
    arg1, arg2 = foo
    assert arg1 <= arg2
test_breaks FAILED [100%]

    def test_breaks():
>       arg1, arg2 = foo
E       TypeError: cannot unpack non-iterable function object

test_works PASSED [100%]
=============== 1 passed in 0.43s ===============
Process finished with exit code 0
The fix is easy: request the fixture explicitly, as in test_works. But it took me some time to figure out what the problem was, so I thought I would share my findings.
