Lazy parametrization with pytest - python

When parametrizing tests and fixtures in pytest, pytest seems to eagerly evaluate all parameters and to construct some test-list data structure before it starts executing the tests.
This is a problem in two situations:
1. When you have many parameter values (e.g. from a generator): the generator and the test itself may run fast, but all those parameter values eat up all the memory.
2. When parametrizing a fixture with different kinds of expensive resources, where you can only afford to run one resource at a time (e.g. because they listen on the same port).
Thus my question: is it possible to tell pytest to evaluate the parameters on the fly (i.e. lazily)?

EDIT: my first reaction would be "that is exactly what parametrized fixtures are for": a function-scoped fixture is a lazy value that is called just before the test node is executed, and by parametrizing the fixture you can predefine as many variants (for example from a database key listing) as you like.
```python
import pytest
from pytest_cases import fixture_plus

@fixture_plus
def db():
    return <todo>

@fixture_plus
@pytest.mark.parametrize("key", [<list_of_keys>])
def sample(db, key):
    return db.get(key)

def test_foo(sample):
    print(sample)
```
That being said, in some (rare) situations you still need lazy values in a parametrize mark, and you do not want them to be the variants of a parametrized fixture. For those situations, pytest-cases now also offers a solution: lazy_value. With it, you can use functions as parameter values, and these functions are called only when the test at hand is executed.
Here is an example showing two coding styles (switch the use_partial boolean argument to True to enable the other alternative):
```python
from functools import partial
from random import random

import pytest
from pytest_cases import lazy_value

database = [random() for i in range(10)]

def get_param(i):
    return database[i]

def make_param_getter(i, use_partial=False):
    if use_partial:
        return partial(get_param, i)
    else:
        def _get_param():
            return database[i]
        return _get_param

many_lazy_parameters = (make_param_getter(i) for i in range(10))

@pytest.mark.parametrize('a', [lazy_value(f) for f in many_lazy_parameters])
def test_foo(a):
    print(a)
```
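The factory function `make_param_getter` matters when lazy parameters are built in a loop: a plain `lambda` created in a comprehension looks up the loop variable when it is called, not when it is created, so every thunk sees the last value. A stdlib-only illustration (the names `database`, `late`, `bound_default` and `bound_partial` exist only for this sketch):

```python
from functools import partial

database = [10, 20, 30]

# Late binding: every lambda reads `i` when it is called, after the loop has finished.
late = [lambda: database[i] for i in range(3)]

# Binding the index at creation time (default argument or functools.partial)
# gives each thunk its own value, which is what make_param_getter achieves.
bound_default = [lambda i=i: database[i] for i in range(3)]
bound_partial = [partial(database.__getitem__, i) for i in range(3)]

print([f() for f in late])           # [30, 30, 30]
print([f() for f in bound_default])  # [10, 20, 30]
print([f() for f in bound_partial])  # [10, 20, 30]
```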
Note that lazy_value also has an id argument if you wish to customize the test ids. The default is to use the function's __name__, and support for partial functions is on the way.
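Partial functions need special handling for ids because, unlike plain functions, `functools.partial` objects have no `__name__`. A hypothetical id helper (`lazy_id` is not part of pytest-cases; this is only a sketch) could fall back to the wrapped function's name plus its bound arguments:

```python
from functools import partial

def get_param(i):
    return i

def lazy_id(f):
    """Hypothetical helper: derive a readable test id from a callable."""
    if isinstance(f, partial):
        # partial objects have no __name__, so use the wrapped function and its args
        args = "-".join(map(str, f.args))
        return f"{f.func.__name__}[{args}]" if args else f.func.__name__
    return f.__name__

print(hasattr(partial(get_param, 3), "__name__"))  # False
print(lazy_id(get_param))                          # get_param
print(lazy_id(partial(get_param, 3)))              # get_param[3]
```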
You can parametrize fixtures the same way, but remember that you have to use @fixture_plus instead of @pytest.fixture. See the pytest-cases documentation for details.
I'm the author of pytest-cases by the way ;)

As for your second question: the link to the manual proposed in the comments seems to be exactly what one should do. It allows one "to setup expensive resources like DB connections or subprocess only when the actual test is run".
As for your first question, it seems such a feature is not implemented. You can pass a generator directly to parametrize:
```python
@pytest.mark.parametrize('data', data_gen)
def test_gen(data):
    ...
```
But pytest will call list() on your generator, so the RAM problem persists here as well.
I've also found some GitHub issues that shed more light on why pytest does not handle generators lazily, and it seems to be a design decision: "it's not possible to correctly manage parametrization having a generator as value" because
"pytest would have to collect all those tests with all the metadata...
collection happens always before test running".
There are also some references to Hypothesis or nose's yield-based tests for such cases. But if you still want to stick with pytest, there are some workarounds:
If you know the number of generated parameters in advance, you can do the following:
```python
import pytest

def get_data(N):
    for i in range(N):
        yield list(range(N))

N = 3000
data_gen = get_data(N)

@pytest.mark.parametrize('ind', range(N))
def test_yield(ind):
    data = next(data_gen)
    assert data
```
Here you parametrize over an index (which is not useful in itself; it just tells pytest how many executions it must make) and generate the data inside each run.
You can also wrap it with memory_profiler:
```
Results (46.53s):
    3000 passed

Filename: run_test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     40.6 MiB     40.6 MiB   @profile
     6                             def to_profile():
     7     76.6 MiB     36.1 MiB       pytest.main(['test.py'])
```
Compare this with the straightforward version:
```python
@pytest.mark.parametrize('data', data_gen)
def test_yield(data):
    assert data
```
Which 'eats' much more memory:
```
Results (48.11s):
    3000 passed

Filename: run_test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     40.7 MiB     40.7 MiB   @profile
     6                             def to_profile():
     7    409.3 MiB    368.6 MiB       pytest.main(['test.py'])
```
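The same effect can be reproduced without pytest. This stdlib sketch uses `tracemalloc` to compare the peak memory of materializing a generator up front (what parametrize does at collection) with consuming it one item at a time (what `next(data_gen)` inside each test does). `get_data` mirrors the generator above, with a smaller `N` so the sketch runs quickly; `peak_bytes`, `eager` and `lazy` are names made up for this illustration:

```python
import tracemalloc

def get_data(n):
    for _ in range(n):
        yield list(range(n))

def peak_bytes(consume, n):
    """Return the peak traced memory while `consume` runs on a fresh generator."""
    tracemalloc.start()
    try:
        consume(get_data(n))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def eager(gen):
    # Like parametrize: the whole generator is turned into a list up front.
    items = list(gen)
    return len(items)

def lazy(gen):
    # Like next(data_gen) inside each test: only about one item is alive at a time.
    for item in gen:
        assert item

N = 300
eager_peak = peak_bytes(eager, N)
lazy_peak = peak_bytes(lazy, N)
print(f"eager: {eager_peak / 1024:.0f} KiB, lazy: {lazy_peak / 1024:.0f} KiB")
```

The exact numbers depend on the Python build, but the eager peak grows with N squared while the lazy peak stays roughly at the size of a single item.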
If you want to parametrize your test over other parameters at the same time, you can generalize the previous approach a bit:
```python
data_gen = get_data(N)

# params must be an iterable, so use a range over the known length
@pytest.fixture(scope='module', params=range(len_of_gen_if_known))
def fix():
    huge_data_chunk = next(data_gen)
    return huge_data_chunk

@pytest.mark.parametrize('other_param', ['aaa', 'bbb'])
def test_one(fix, other_param):
    data = fix
    ...
```
So here we use a module-scoped fixture to "preset" the data for the parametrized test. Note that you can add another test right here and it will receive the generated data as well. Simply add it after test_one:
```python
@pytest.mark.parametrize('param2', [15, 'asdb', 1j])
def test_two(fix, param2):
    data = fix
    ...
```
NOTE: if you do not know how many items will be generated, you can use this trick: set an approximate value (preferably a bit higher than the number of generated tests) and 'mark' the remaining tests as passed when next() raises StopIteration, which happens once all the data has been generated.
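Instead of catching `StopIteration`, you can also ask `next()` for a default value. A minimal sketch; `next_or_skip` is a hypothetical helper, and in a real test you would call `pytest.skip()` when it reports that the generator is exhausted:

```python
_EXHAUSTED = object()  # sentinel that cannot collide with a real data item

def next_or_skip(gen):
    """Return (True, item), or (False, None) once `gen` is exhausted."""
    item = next(gen, _EXHAUSTED)
    if item is _EXHAUSTED:
        return False, None
    return True, item

data_gen = iter([[1, 2], [3]])
# Ask for more items than exist, as an over-estimated test count would.
results = [next_or_skip(data_gen) for _ in range(4)]
print(results)  # [(True, [1, 2]), (True, [3]), (False, None), (False, None)]
```

Inside a test this would read `ok, data = next_or_skip(data_gen)` followed by `if not ok: pytest.skip("generator exhausted")`, which reports the surplus runs as skipped rather than passed.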
Another possibility is to use factories as fixtures: embed your generator in a fixture and iterate over it in your test until it is exhausted. But this has another disadvantage: pytest will treat it as a single test (possibly with a bunch of checks inside), and the whole test fails if one of the generated items fails. In other words, compared to the parametrize approach, not all of pytest's statistics and features are available.
And yet another option is to call pytest.main() in a loop, something like this:
```python
import pytest

for data in data_gen:
    # set up the test with `data`
    pytest.main(['test'])
```
This is not about iterators themselves, but about saving time and RAM when you have parametrized tests:
simply move some of the parametrization inside the tests. For example:
```python
@pytest.mark.parametrize("one", list_1)
@pytest.mark.parametrize("two", list_2)
def test_maybe_convert_objects(self, one, two):
    ...
```
Change to:
```python
@pytest.mark.parametrize("one", list_1)
def test_maybe_convert_objects(self, one):
    for two in list_2:
        ...
```
It's similar to factories but even easier to implement. It reduces not only RAM usage, often several times over, but also the time spent collecting metadata. The drawback is that pytest sees a single test for all values of two. It also works smoothly only with "simple" tests: if some values carry special marks (such as xfail), there might be problems.
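The saving comes from the number of collected items: stacked parametrize decorators produce the Cartesian product of the value lists, while the loop version collects one item per value of the outer list only. A quick count with placeholder lists (`list_1` and `list_2` here are made up):

```python
from itertools import product

list_1 = range(50)
list_2 = range(40)

# Two stacked @pytest.mark.parametrize decorators: one test item per combination.
stacked = len(list(product(list_1, list_2)))
# Inner loop over list_2: one test item per value of list_1.
looped = len(list_1)

print(stacked, looped)  # 2000 50
```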
I've also opened a corresponding issue; additional info or tweaks about this problem may appear there.

You may find this workaround useful:
```python
from datetime import datetime, timedelta
from time import sleep

import pytest

@pytest.mark.parametrize(
    'lazy_params',
    [
        lambda: (datetime.now() - timedelta(days=1), datetime.now()),
        lambda: (datetime.now(), datetime.now() + timedelta(days=1)),
    ],
)
def test_it(lazy_params):
    yesterday, today = lazy_params()
    print(f'\n{yesterday}\n{today}')
    sleep(1)
    assert yesterday < today
```
Sample output:
========================================================================= test session starts ==========================================================================
platform darwin -- Python 3.7.7, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /usr/local/opt/python/bin/python3.7
cachedir: .pytest_cache
rootdir: /Users/apizarro/tmp
collected 2 items
test_that.py::test_it[<lambda>0]
2020-04-14 18:34:08.700531
2020-04-15 18:34:08.700550
PASSED
test_that.py::test_it[<lambda>1]
2020-04-15 18:34:09.702914
2020-04-16 18:34:09.702919
PASSED
========================================================================== 2 passed in 2.02s ===========================================================================

Related

How can I fail tests in a "teardown" fixture properly in pytest?

I have a test framework that performs multiple asserts and catches them (I inherited someone else's code). The proprietary results report is correct, but, as you may have guessed, pytest marks these tests as passed:
test_something.py::TestSomething::test_that_should_fail PASSED
I've added an autouse fixture as such:
```python
@pytest.fixture(autouse=True)
def run_after_tests(self):
    yield
    if self.actually_failed():
        pytest.fail("Yay, failing when failures occur is cool!")
```
This solution works okay, except that the cleanup seems to happen after the test has already been marked as PASSED, and a duplicate entry for the test is shown with an error.
Now pytest results look like this:
test_something.py::TestSomething::test_that_should_fail PASSED
test_something.py::TestSomething::test_that_should_fail ERROR
Is there a way to delay the evaluation of the test so it doesn't say it has passed?
I know this is a really stupid way of doing things and performing test evaluation at cleanup is not recommended, but too many tests have been written this way, and spending weeks refactoring them is not feasible.
An alternative solution I've thought of is to write a decorator and then sed all the test cases and add it to the functions; but this is going to be my plan B if fixtures can't solve this.
Thanks!
As you've discovered, a fixture is a separate object from the test itself.
You'll need to modify the appropriate pytest hook. I haven't tested it personally, but I believe placing the following code in your project's conftest.py will give you the desired result.
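```python
import pytest

def check_for_failure(output) -> bool:
    # Define me: return True if the test should be reported as failed.
    ...

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    output = yield  # outcome object of the wrapped test call
    if check_for_failure(output):
        pytest.fail()
```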

Pytest: parametrizing tests that require a slow initialization

I want to test a class that has a very slow init method, using randomized parameters. The tests themselves are very quick, but require a time-consuming initialization step.
Currently, I do something like this:
```python
@pytest.mark.parametrize("params", LIST_OF_RANDOMIZED_PARAMS)
def test_one(params):
    state = very_slow_initialization(params)
    assert state.fast_test()

@pytest.mark.parametrize("params", LIST_OF_RANDOMIZED_PARAMS)
def test_two(params):
    state = very_slow_initialization(params)
    assert state.another_fast_test()
```
From my unsuccessful attempts so far I've learnt:
- initializing a test class with a parametrized set_class(params) method is not supported
- using a fixture that initializes the class still calls the slow initialization every time
- I could create a list with all initialized states in advance, but they demand a lot of memory. Furthermore, I sometimes like to run a lot of randomized tests overnight and just stop them the next morning. For this, I would need to know precisely how many tests to run so that all initializations are finished before then.
If possible, I would prefer a solution that runs both tests for the first parameter, then both tests for the second parameter, and so on.
There is probably a really simple solution for this.
pytest fixtures are the solution for you. The lifetime of a fixture can be a single test, a class, a module, or the whole test session.
"fixture management scales from simple unit to complex functional testing, allowing to parametrize fixtures and tests according to configuration and component options, or to re-use fixtures across function, class, module or whole test session scopes."
Per the "Fixture availability" section of the docs, you need to define the fixture in a class or at module level.
Consider using a module-scoped one (note that the initialization runs only once):
```python
import pytest

@pytest.fixture(scope="module")
def heavy_context():
    # Use your LIST_OF_RANDOMIZED_PARAMS randomized parameters here
    # to initialize whatever you want.
    print("Slow fixture initialized")
    return ["I'm heavy"]

def test_1(heavy_context):
    print(f"\nUse of heavy context: {heavy_context[0]}")

def test_2(heavy_context):
    print(f"\nUse of heavy context: {heavy_context[0]}")
```
Tests output:
...
collecting ... collected 2 items
test_basic.py::test_1 Slow fixture initialized
PASSED [ 50%]
Use of heavy context: I'm heavy
test_basic.py::test_2 PASSED [100%]
Use of heavy context: I'm heavy
Now, if you need it to be assertion-safe (release resources even when a test fails), consider writing heavy_context in a context-manager style (much more detail here: Fixture, Running multiple assert statements safely):
```python
import pytest

@pytest.fixture(scope="module")
def heavy_context():
    print("Slow context initialized")
    obj = ["I'm heavy"]
    # The code after `yield` is the teardown: pytest runs it once the module's
    # tests are done, even if one of them failed. The try/finally additionally
    # guarantees the release if the generator is closed instead of resumed.
    try:
        yield obj[0]
    finally:
        print("Slow context released")

def test_1(heavy_context):
    # Note that heavy_context is now the value yielded by the
    # heavy_context fixture (obj[0]), not the list `obj` itself.
    print(f"\nUse of heavy context: {heavy_context}")

def test_2(heavy_context):
    print(f"\nUse of heavy context: {heavy_context}")
```
Output:
collecting ... collected 2 items
test_basic.py::test_1 Slow context initialized
PASSED [ 50%]
Use of heavy context: I'm heavy
test_basic.py::test_2 PASSED [100%]
Use of heavy context: I'm heavy
Slow context released
============================== 2 passed in 0.01s ===============================
Process finished with exit code 0
Could you perhaps run the tests one after another without initializing the object again, e.g.:
```python
@pytest.mark.parametrize("params", LIST_OF_RANDOMIZED_PARAMS)
def test_one(params):
    state = very_slow_initialization(params)
    assert state.fast_test()
    assert state.another_fast_test()
```
or using separate functions for organization:
```python
@pytest.mark.parametrize("params", LIST_OF_RANDOMIZED_PARAMS)
def test_main(params):
    state = very_slow_initialization(params)
    step_one(state)
    step_two(state)

def step_one(state):
    assert state.fast_test()

def step_two(state):
    assert state.another_fast_test()
```
Although it's a test script, you can still use functions to organize your code. In the version with separate functions you may even declare a fixture, in case the state may be needed in other tests, too:
```python
@pytest.fixture(scope="module", params=LIST_OF_RANDOMIZED_PARAMS)
def state(request):
    return very_slow_initialization(request.param)

def test_main(state):
    step_one(state)
    step_two(state)

def step_one(state):
    assert state.fast_test()

def step_two(state):
    assert state.another_fast_test()
```
I hope I didn't make a mistake here, but it should work like this.
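The module-scoped, parametrized fixture helps because pytest groups the tests that use it by fixture parameter, so each expensive state is built once and only one is alive at a time, which is the order the question asks for. The stdlib sketch below models that with `functools.lru_cache(maxsize=1)` around a stand-in for the hypothetical `very_slow_initialization`, and counts how many initializations each execution order needs:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=1)  # keep only the most recent state in memory
def very_slow_initialization(params):
    calls.append(params)  # stand-in for the expensive work
    return {"params": params}

PARAMS = (1, 2, 3)
TESTS = ("test_one", "test_two")

def run(order):
    """Initialize the state for every (test, params) pair; return the init count."""
    very_slow_initialization.cache_clear()
    calls.clear()
    for _test, params in order:
        very_slow_initialization(params)
    return len(calls)

# Order for two separately parametrized tests: all of test_one, then all of test_two.
by_test = [(t, p) for t in TESTS for p in PARAMS]
# Order produced by a module-scoped parametrized fixture: grouped by parameter.
by_param = [(t, p) for p in PARAMS for t in TESTS]

print(run(by_test), run(by_param))  # 6 3
```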

Pytest: Parameterize unit test using a fixture that uses another fixture as input

I am new to parameterization and fixtures and still learning. I found a few posts that use indirect parameterization, but it is difficult for me to implement it based on what I have in my code. I would appreciate any ideas on how I could achieve this.
I have a couple of fixtures in my conftest.py that supply input files to a function "get_fus_output()" in my test file. That function processes the input and generates two data frames to compare in my tests. Further, I am subsetting those two DataFrames on a common value ('Fus_id') to test them individually. So the output of this function would be [(Truth_df1, test_df1), (Truth_df2, test_df2)...], and I want to parameterize the test over each of these truth/test pairs. Unfortunately, I am not able to use this in my test function "test_annotation_match", because this function needs a fixture.
I am not able to feed a fixture as input to another fixture to parameterize it. I know this is not supported in pytest, but I can't figure out a workaround with indirect parameterization.
```python
# fixtures from conftest.py
import os

import pandas as pd
import pytest

@pytest.fixture(scope="session")
def test_input_df(fixture_path):
    fus_bkpt_file = os.path.join(fixture_path, 'test_bkpt.tsv')
    test_input_df = pd.read_csv(fus_bkpt_file, sep='\t')
    return test_input_df

@pytest.fixture
def test_truth_df(fixture_path):
    test_fus_out_file = os.path.join(fixture_path, 'test_expected_output.tsv')
    test_truth_df = pd.read_csv(test_fus_out_file, sep='\t')
    return test_truth_df

@pytest.fixture
def res_path():
    return utils.get_res_path()

# test script
import pytest
from pandas.testing import assert_frame_equal

@pytest.fixture
def get_fus_output(test_input_df, test_truth_df, res_path):
    param_list = []
    # get output from script
    script_out = ex_annot.run(test_input_df, res_path)
    for index, row in test_input_df.iterrows():
        fus_id = row['Fus_id']
        param_list.append((get_frame(test_truth_df, fus_id), get_frame(script_out, fus_id)))
    # param_list eg : [(Truth_df1, test_df1),(Truth_df2, test_df2)...]
    print(param_list)
    return param_list

@pytest.mark.parametrize("get_fus_output", [test_input_df, test_truth_df, res_path], indirect=True)
def test_annotation_match(get_fus_output):
    test, expected = get_fus_output
    assert_frame_equal(test, expected, check_dtype=False, check_like=True)
```
Output:
================================================================================ ERRORS ================================================================================
_______________________________________________________ ERROR collecting test_annotations.py _______________________________________________________
test_annotations.py:51: in <module>
    @pytest.mark.parametrize("get_fus_output", [test_input_df, test_truth_df, res_path], indirect=True)
E   NameError: name 'test_input_df' is not defined
======================================================================= short test summary info ========================================================================
ERROR test_annotations.py - NameError: name 'test_input_df' is not defined
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=========================================================================== 1 error in 1.46s ===========================================================================
I'm not 100% sure I understand what you are trying to do here, but I think your understanding of parameterization and the role of fixtures is incorrect. It seems like you are trying to use the fixtures to create the parameter lists for your tests, which isn't really the right way to go about it (and the way you are doing it certainly won't work, as you are seeing).
To fully explain how to fix this, first, let me give a little background about how parameterization and fixtures are meant to be used.
Parameterization
I don't think anything here should be new, but just to make sure we are on the same page:
Normally, in Pytest, one test_* function is one test case:
```python
def test_square():
    assert square(3) == 9
```
If you want to do the same test but with different data, you can write separate tests:
```python
def test_square_pos():
    assert square(3) == 9

def test_square_frac():
    assert square(0.5) == 0.25

def test_square_zero():
    assert square(0) == 0

def test_square_neg():
    assert square(-3) == 9
```
This isn't great, because it violates the DRY principle. Parameterization is the solution to this. You turn one test case into several by providing a list of test parameters:
```python
@pytest.mark.parametrize('test_input,expected',
                         [(3, 9), (0.5, 0.25), (0, 0), (-3, 9)])
def test_square(test_input, expected):
    assert square(test_input) == expected
```
Fixtures
Fixtures are also about DRY code, but in a different way.
Suppose you are writing a web app. You might have several tests that need a connection to the database. You can add the same code to each test to open and set up a test database, but that's definitely repeating yourself. If you, say, switch databases, that's a lot of test code to update.
Fixtures are functions that allow you to do some setup (and potentially teardown) that can be used for multiple tests:
```python
import sqlite3

import pytest

@pytest.fixture
def db_connection():
    # Open a temporary database in memory
    db = sqlite3.connect(':memory:')
    # Create a table of test orders to use
    db.execute('CREATE TABLE orders (id, customer, item)')
    db.executemany('INSERT INTO orders (id, customer, item) VALUES (?, ?, ?)',
                   [(1, 'Max', 'Pens'),
                    (2, 'Rachel', 'Binders'),
                    (3, 'Max', 'White out'),
                    (4, 'Alice', 'Highlighters')])
    return db

def test_get_orders_by_name(db_connection):
    orders = get_orders_by_name(db_connection, 'Max')
    assert orders == [(1, 'Max', 'Pens'),
                      (3, 'Max', 'White out')]

def test_get_orders_by_name_nonexistent(db_connection):
    orders = get_orders_by_name(db_connection, 'John')
    assert orders == []
```
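The tests above call a `get_orders_by_name` function that the answer does not show. A possible sqlite3 implementation (hypothetical, only there to make the example self-contained), exercised against the same in-memory table the fixture builds:

```python
import sqlite3

def get_orders_by_name(db, customer):
    # Parameterized query; ORDER BY makes the result order deterministic.
    cursor = db.execute(
        "SELECT id, customer, item FROM orders WHERE customer = ? ORDER BY id",
        (customer,),
    )
    return cursor.fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id, customer, item)")
db.executemany(
    "INSERT INTO orders (id, customer, item) VALUES (?, ?, ?)",
    [(1, "Max", "Pens"), (2, "Rachel", "Binders"),
     (3, "Max", "White out"), (4, "Alice", "Highlighters")],
)

print(get_orders_by_name(db, "Max"))   # [(1, 'Max', 'Pens'), (3, 'Max', 'White out')]
print(get_orders_by_name(db, "John"))  # []
```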
Fixing Your Code
Ok, so with that background out of the way, let's dig into your code.
The first problem is with your @pytest.mark.parametrize decorator:
```python
@pytest.mark.parametrize("get_fus_output", [test_input_df, test_truth_df, res_path], indirect=True)
```
This isn't the right situation to use indirect. Just like tests can be parameterized, fixtures can be parameterized, too. It's not very clear from the docs (in my opinion), but indirect is just an alternative way to parameterize fixtures. That's totally different from using a fixture in another fixture, which is what you want.
In fact, for get_fus_output to use the test_input_df, test_truth_df, and res_path fixtures, you don't need the @pytest.mark.parametrize line at all. In general, any argument to a test function or fixture is automatically assumed to be a fixture if it's not otherwise used (e.g. by the @pytest.mark.parametrize decorator).
So, your existing @pytest.mark.parametrize isn't doing what you expect. How do you parameterize your test then? This is getting into the bigger problem: you are trying to use the get_fus_output fixture to create the parameters for test_annotation_match. That isn't the sort of thing you can do with a fixture.
When Pytest runs, first it collects all the test cases, then it runs them one by one. Test parameters have to be ready during the collection stage, but fixtures don't run until the testing stage. There is no way for code inside a fixture to help with parameterization. You can still generate your parameters programmatically, but fixtures aren't the way to do it.
You'll need to do a few things:
First, convert get_fus_output from a fixture to a regular function. That means removing the @pytest.fixture decorator, but you also have to update it so that it doesn't use the test_input_df, test_truth_df, and res_path fixtures. (If nothing else needs them as fixtures, you can convert them all to regular functions, in which case you probably want to put them in their own module outside of conftest.py, or just move them into the same test script.)
Then, @pytest.mark.parametrize needs to use that function to get a list of parameters:
```python
@pytest.mark.parametrize("expected,test", get_fus_output())
def test_annotation_match(expected, test):
    assert_frame_equal(test, expected, check_dtype=False, check_like=True)
```
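As a pandas-free sketch of that "regular function" step, the code below builds `(expected, test)` pairs at import time by grouping rows on `Fus_id`. The row dictionaries are made-up stand-ins for the TSV files, and `group_by_id`/`build_params` are hypothetical names; in the real code you would read the files with `pd.read_csv` and use `get_frame` as before:

```python
from collections import defaultdict

def group_by_id(rows, key="Fus_id"):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups

def build_params(truth_rows, script_rows, key="Fus_id"):
    """Return [(expected, test), ...]: one pair per id, in first-seen order."""
    truth = group_by_id(truth_rows, key)
    script = group_by_id(script_rows, key)
    return [(truth[fus_id], script.get(fus_id, [])) for fus_id in truth]

truth_rows = [{"Fus_id": "A", "gene": "X"}, {"Fus_id": "B", "gene": "Y"}]
script_rows = [{"Fus_id": "B", "gene": "Y"}, {"Fus_id": "A", "gene": "Z"}]

# Computed at import (collection) time, so parametrize can use it directly:
# @pytest.mark.parametrize("expected,test", PARAMS) would create two test items.
PARAMS = build_params(truth_rows, script_rows)
print(len(PARAMS))  # 2
```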

Pytest class scope parametrization

I have a couple of fixtures that do some initialization that is rather expensive. Some of those fixtures can take parameters, altering their behaviour slightly.
Because these are so expensive, I wanted to initialize them once per test class. However, the fixtures are not destroyed and re-initialized for the next permutation of parameters.
See this example: https://gist.github.com/vhdirk/3d7bd632c8433eaaa481555a149168c2
I would expect that StuffStub would be a different instance when DBStub is recreated for parameters 'foo' and 'bar'.
Did I misunderstand something? Is this a bug?
I recently encountered the same problem and wanted to share another solution. In my case, the graph of fixtures that had to be regenerated for each parameter set was very deep and not easy to control. An alternative is to bypass the pytest parametrization system and generate the test classes programmatically:
```python
import random

import pytest

def make_test_class(name):
    class TestFoo:
        @pytest.fixture(scope="class")
        def random_int(self):
            return random.randint(1, 100)

        def test_something(self, random_int):
            assert random_int and name == "foo"

    return TestFoo

TestFooGood = make_test_class("foo")
TestFooBad = make_test_class("bar")
TestFooBad2 = make_test_class("wibble")
```
This runs three tests: one passes (where "foo" == "foo") and the other two fail, but you can see that the class-scoped fixtures have been recreated.
This is not a bug. There is no relation between the fixtures, so one of them is not going to be called again just because the other one was (due to having multiple params).
In your case, db is called twice because db_factory, which it uses, has 2 params. The stuff fixture, on the other hand, is called only once because stuff_factory has only one item in params.
You should get what you expect if stuff also requests db_factory without actually using its output (db_factory would still not be called more than twice):
```python
@pytest.fixture(scope="class")
def stuff(stuff_factory, db_factory):
    return stuff_factory()
```
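A rough mental model for this explanation (a simplification, not pytest's actual code): a scoped fixture is re-created only when the parameters it depends on, directly or through the fixtures it requests, change. The stdlib sketch below caches each "fixture" by the tuple of parameters it depends on; the names `db_factory`, `stuff_factory` and the parameter values are assumptions mirroring the question's gist:

```python
from collections import Counter

created = Counter()

def make_fixture(name, depends_on):
    """Model a fixture cached per combination of its dependencies' params."""
    cache = {}

    def get(params):
        key = tuple(params[dep] for dep in depends_on)
        if key not in cache:
            created[name] += 1
            cache[key] = (name, key)
        return cache[key]

    return get

db = make_fixture("db", ["db_factory"])
stuff = make_fixture("stuff", ["stuff_factory"])
# Requesting db_factory as well, as suggested above, changes the cache key.
stuff_fixed = make_fixture("stuff_fixed", ["stuff_factory", "db_factory"])

# db_factory has two params, stuff_factory has only one.
for db_param in ("foo", "bar"):
    params = {"db_factory": db_param, "stuff_factory": "only"}
    db(params)
    stuff(params)
    stuff_fixed(params)

print(dict(created))  # {'db': 2, 'stuff': 1, 'stuff_fixed': 2}
```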

Python: Modify SetUp based on TestCase in unittest.TestCase

I want setUp to assign a different value of "x" depending on which test case is being run. Is there a way to achieve this in the setUp function? Sample code is below. How can I implement the PSEUDO CODE in the setUp function?
```python
import random
import unittest

class TestSequenceFunctions(unittest.TestCase):
    def setUp(self):
        # ***PSEUDO CODE***
        x = 10  # if test_shuffle uses setUp()
        x = 20  # if test_choice uses setUp()
        x = 30  # if test_sample uses setUp()
        # ***PSEUDO CODE***

    def test_shuffle(self):
        ...  # test_shuffle

    def test_choice(self):
        ...  # test_choice

    def test_sample(self):
        ...  # test_sample

if __name__ == '__main__':
    unittest.main()
```
I could achieve this by writing each test case in a different file, but that would drastically increase the number of files.
One unittest file thematically captures tests that all cover similar features. The setup is used to get that feature into a testable state.
Move the assignment of x into the actual test method (keep x = 0 in setUp if you want every test to have an x). That makes it clearer, when reading the test, exactly what is happening and how it is being tested. You shouldn't put conditional logic that affects how tests work inside your setup function, because you are introducing complexity into the tests' preconditions, which gives you a much larger surface area for errors.
Perhaps I am missing the point, but the assignment in your pseudo code could simply be moved to the start of the corresponding test. If the "assignment" is more complex, or spans multiple tests, create functions outside the test case but inside the file, and have the corresponding tests call whichever functions are supposed to be part of their "setUp".
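A minimal sketch of what both answers suggest: keep only unconditional, shared state in `setUp`, and let each test set its own `x`, with a module-level helper (the hypothetical `make_sequence`) for setup logic that spans several tests:

```python
import random
import unittest

def make_sequence(x):
    """Hypothetical shared helper: per-test setup that depends on x."""
    return list(range(x))

class TestSequenceFunctions(unittest.TestCase):
    def setUp(self):
        self.rng = random.Random(0)  # shared, unconditional setup

    def test_shuffle(self):
        seq = make_sequence(10)
        shuffled = seq[:]
        self.rng.shuffle(shuffled)
        self.assertEqual(sorted(shuffled), seq)

    def test_choice(self):
        seq = make_sequence(20)
        self.assertIn(self.rng.choice(seq), seq)

    def test_sample(self):
        seq = make_sequence(30)
        sample = self.rng.sample(seq, 5)
        self.assertTrue(set(sample) <= set(seq))
```

Run it as usual, for example with `python -m unittest`. Each test now states its own x right where it is used, and setUp stays free of conditional logic.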
