Load existing data catalog programmatically - python

I want to write pytest unit tests in Kedro 0.17.5. They need to perform integrity checks on dataframes created by the pipeline.
These dataframes are specified in catalog.yml and have already been persisted successfully using kedro run. The catalog.yml lives in conf/base.
I have a test module test_my_dataframe.py in src/tests/pipelines/my_pipeline/.
How can I load the data catalog based on my catalog.yml programmatically from within test_my_dataframe.py in order to properly access my specified dataframes?
Or, for that matter, how can I programmatically load the whole project context (including the data catalog) in order to also execute nodes etc.?

For unit testing, we test just the function under test; everything external to it should be mocked or patched. Check whether you really need the Kedro project context while writing the unit test.
If you really need the project context in a test, you can do something like the following:
from pathlib import Path

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession

with KedroSession.create(package_name="demo", project_path=Path.cwd()) as session:
    context = session.load_context()
    catalog = context.catalog
Or you can create a pytest fixture so you can reuse the context again and again, with a scope of your choice:
from pathlib import Path

import pytest
from kedro.framework.session import KedroSession
from kedro.framework.session.session import _activate_session  # private helper in 0.17.x


@pytest.fixture
def get_project_context():
    session = KedroSession.create(
        package_name="demo",
        project_path=Path.cwd()
    )
    _activate_session(session, force=True)
    context = session.load_context()
    return context
The different args supported by KedroSession.create are documented here: https://kedro.readthedocs.io/en/0.17.5/kedro.framework.session.session.KedroSession.html#kedro.framework.session.session.KedroSession.create
To read more about pytest fixtures, see https://docs.pytest.org/en/6.2.x/fixture.html#scope-sharing-fixtures-across-classes-modules-packages-or-session
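With the fixture in place, a test in test_my_dataframe.py can load any dataset declared in catalog.yml through the catalog. A minimal sketch, where "my_dataframe" is a placeholder for whatever dataset name you declared:

# src/tests/pipelines/my_pipeline/test_my_dataframe.py
def test_my_dataframe_integrity(get_project_context):
    catalog = get_project_context.catalog
    df = catalog.load("my_dataframe")  # placeholder dataset name

    assert not df.empty
    # replace with your own integrity checks, e.g. expected columns or dtypes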

Related

Correct use of pytest fixtures of objects with Django

I am relatively new to pytest, so I understand the simple use of fixtures that looks like this:
@pytest.fixture
def example_data():
    return "abc"
and then using it in a way like this:
def test_data(self, example_data):
    assert example_data == "abc"
I am working on a Django app, and where it gets confusing is when I try to use fixtures to create Django objects that will be used in the tests.
The closest solution that I've found online looks like this:
@pytest.fixture
def test_data(self):
    users = get_user_model()
    client = users.objects.get_or_create(username="test_user", password="password")
and then I am expecting to be able to access this user object in a test function:
@pytest.mark.django_db
@pytest.mark.usefixtures("test_data")
async def test_get_users(self):
    # the user object should be included in this queryset
    all_users = await sync_to_async(User.objects.all)()
    # ... (doing assertions) ...
The issue is that when I try to list all the users I can't find the one that was created as part of the test_data fixture and therefore can't use it for testing.
I noticed that if I create the objects inside the test function there is no problem, but this approach won't work for me because I need to parametrize the function and, depending on the input, add different groups to each user.
I also tried some kind of init or setup function for my test class, creating the User test objects from there, but this doesn't seem to be pytest's recommended way of doing things, and that approach didn't work when it came to listing them later either.
Is there any way to create test objects which will be accessible when doing a queryset?
Is the right way to manually create separate functions and objects for each test case, or is there a pytest way of achieving this?

How to explicitly instruct PyTest to drop a database after some tests?

I am writing unit tests with pytest-django for a Django app. I want to make my tests more performant, and doing so requires keeping data saved in the database for a certain time instead of dropping it after every single test. For example:
@pytest.mark.django_db
def test_save():
    p1 = MyModel.objects.create(description="some description")  # this object has the id 1
    p1.save()

@pytest.mark.django_db
def test_modify():
    p1 = MyModel.objects.get(id=1)
    p1.description = "new description"
What I want to know is how to keep both tests separate while having them use the same test database for some time, dropping it afterwards.
I think what you need are pytest fixtures. They allow you to create objects (stored in the database if needed) that will be used during tests. Have a look at fixture scopes: you can set the scope so that the fixture is not deleted from the database and recreated for each test that requires it, but is instead created once for a group of tests and deleted afterwards.
You should read the documentation of pytest fixtures (https://docs.pytest.org/en/6.2.x/fixture.html) and the section dedicated to fixtures' scope (https://docs.pytest.org/en/6.2.x/fixture.html#scope-sharing-fixtures-across-classes-modules-packages-or-session).
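As a minimal sketch of that idea (MyModel and its import path are assumptions, and the pattern of unblocking the database for a broader-scoped fixture is taken from pytest-django's docs):

import pytest
from myapp.models import MyModel  # hypothetical app and model

@pytest.fixture(scope="module")
def saved_object(django_db_setup, django_db_blocker):
    # A module-scoped fixture cannot use the regular function-scoped db fixture,
    # so database access is unblocked explicitly (pytest-django pattern).
    with django_db_blocker.unblock():
        obj = MyModel.objects.create(description="some description")
    yield obj
    with django_db_blocker.unblock():
        obj.delete()

@pytest.mark.django_db
def test_save(saved_object):
    assert MyModel.objects.filter(pk=saved_object.pk).exists()

@pytest.mark.django_db
def test_modify(saved_object):
    saved_object.description = "new description"
    saved_object.save()
    assert MyModel.objects.get(pk=saved_object.pk).description == "new description"

Both tests see the same row because it was committed once at module setup; changes made inside a django_db-marked test are still rolled back at the end of that test.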

pytest mark django db: avoid to save fixture in tests

I need to:
Avoid using save() in tests
Use @pytest.mark.django_db on all tests inside this class
Create a number of trx fixtures (10/20) to act as fake data.
import pytest
from ngg.processing import (
    elab_data
)

class TestProcessing:
    @pytest.mark.django_db
    def test_elab_data(self, plan,
                       obp,
                       customer,
                       bac,
                       col,
                       trx_1,
                       trx_2,
                       ...):
        plan.save()
        obp.save()
        customer.save()
        bac.save()
        col.save()
        trx.save()
        elab_data(bac, col)
Where the fixtures are simply models, like this:
@pytest.fixture
def plan():
    plan = Plan(
        name='test_plan',
        status='1'
    )
    return plan
I don't find this approach very clean. How would you do it?
TL;DR
test.py
import pytest
from ngg.processing import elab_data

@pytest.mark.django_db
class TestProcessing:
    def test_elab_data(self, plan, obp, customer, bac, col, trx_1, trx_2):
        elab_data(bac, col)
conftest.py
import pytest  # plus the Plan model import from your app

@pytest.fixture(params=[
    ('test_plan', 1),
    ('test_plan2', 2),
])
def plan(request, db):
    name, status = request.param
    return Plan.objects.create(name=name, status=status)
I'm not quite sure if I got it correctly
Avoid using save() in tests
You may create objects using instance = Model.objects.create() or just put instance.save() in fixtures.
As described in the note section here:
To access the database in a fixture, it is recommended that the fixture explicitly request one of the db, transactional_db or django_db_reset_sequences fixtures.
and in the fixture section here:
This fixture will ensure the Django database is set up. Only required for fixtures that want to use the database themselves. A test function should normally use the pytest.mark.django_db() mark to signal it needs the database.
you may want to use the db fixture in your record fixtures and keep the django_db mark on your test cases.
Use @pytest.mark.django_db on all tests inside this class
To mark whole classes, you may use a decorator on the class or the pytestmark variable, as described here:
You may use pytest.mark decorators with classes to apply markers to all of its test methods
To apply marks at the module level, use the pytestmark global variable
Create a number of trx fixtures (10/20) to act as fake data.
I didn't quite get what you were trying to do, but I would assume it is one of the following:
Create multiple objects and pass them as fixtures. In that case you may want to create a single fixture that returns a list (or generator) of objects and use the whole list instead of multiple fixtures; a sketch of this follows below.
Use different variants of a fixture, one or a few at a time. In that case you may want to parametrize your fixture so that it returns different objects and the test case runs multiple times, once per variant.
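A minimal sketch of the list-returning option, assuming a hypothetical Trx model and fields (adjust the import path and fields to your actual app):

import pytest
from ngg.models import Trx  # assumed import path

@pytest.fixture
def trx_batch(db, customer):
    # One fixture returning a list replaces trx_1 ... trx_20.
    return [
        Trx.objects.create(customer=customer, amount=i * 10)
        for i in range(10)
    ]

The test then takes trx_batch as a single argument instead of twenty separate fixtures.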

Python Behave - how to pass value from a scenario to use in a fixture on a feature level?

I have the following test scenario:
Check if project with a specific name was created
Edit this project
Verify that it was edited
Remove this project as part of a teardown procedure
Here is some example code to achieve that:
Scenario:
@fixture.remove_edited_project
@web
Scenario: Edit a project data
  Given Project was created with the following parameters
    | project_name       |
    | my_project_to_edit |
  When I edit the "my_project_to_edit" project
  Then Project is edited
Step to save the data in some variable to be used in a teardown function (fixture):
@step('I edit the "{project_name}" project')
def step_impl(context, project_name):
    # steps related to editing the project
    # storing value in context variable to be used in fixture
    context.edited_project_name = project_name
and an example fixture function to remove a project after scenario:
@fixture
def remove_edited_project(context):
    yield
    logging.info(f'Removing project: "{context.edited_project_name}"')
    # Part deleting a project with name stored in context.edited_project_name
In such a configuration everything works fine, and the project is deleted by the fixture in any case (test failed or passed), which is alright.
But when I want to apply this at the feature level, i.e. placing the @fixture.remove_edited_project decorator before the Feature keyword:
@fixture.remove_edited_project
Feature: My project Edit feature
then it does not work.
I already know the reason: the context.edited_project_name variable is cleaned up after every scenario and is no longer available to the fixture function later.
Is there any good way of passing a parameter to a fixture at the feature level? Somehow globally?
I tried using global variables as an option, but this started to get a bit dirty and problematic in this framework.
Ideally it would be something like @fixture.edited_project_name('my_project_to_edit')
Because the context gets cleaned of variables created during execution of the scenario, you need a mechanism that persists through the feature. One way to do this is to create a dictionary or other container in the context during setup of the fixture, so that it persists through the feature. The scenarios can set attributes on or add to the container, and because the dictionary was added during the feature, it will still exist during teardown of the fixture. E.g.:
@fixture
def remove_edited_project(context):
    context.my_fixture_properties = {}
    yield
    logging.info(f'Removing project: "{context.my_fixture_properties["edited_project_name"]}"')
@step('I edit the "{project_name}" project')
def step_impl(context, project_name):
    # steps related to editing the project
    # storing value in the context dictionary to be used in the fixture
    context.my_fixture_properties['edited_project_name'] = project_name
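For completeness, the feature-level tag only takes effect if the fixture is wired up in environment.py; a minimal sketch, assuming the fixture lives in a module named fixtures.py:

# environment.py
from behave import use_fixture
from fixtures import remove_edited_project  # assumed module name

def before_tag(context, tag):
    if tag == "fixture.remove_edited_project":
        use_fixture(remove_edited_project, context)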

Django: Split tests across multiple files, but share same database?

I am using Django 1.8. I have been writing tests for my Django API in one long file called test_api.py. The structure of the file has been as follows:
def setUpModule():
    management.call_command('loaddata', 'frontend/fixtures/data.json',
                            verbosity=0)
    management.call_command('my_custom_command')

def tearDownModule():
    management.call_command('flush', verbosity=0, interactive=False)

class TestAPIBNFViews(TestCase):
    def test_api_view_list_all(self):
        url = '/api/1.0/bnf_code'
        # do testing

    def test_api_view_query(self):
        # more testing
The fixtures and management command are loaded once before all the tests run, and so far this has worked great.
Now, however, the file is getting long and unwieldy, and I want to split it into multiple files. I've created multiple files called test_list and test_query and given each a setUpModule section as above.
However, firstly this isn't DRY, and secondly when I run python manage.py test, lots of the tests fail with duplicate foreign key errors like:
ProgrammingError: relation "frontend_sha_id" already exists
I guess this isn't surprising, since the tests are trying to create the test database multiple times.
However, if I remove setUpModule from all but the first test (listed alphabetically by filename), the other tests fail because they can't see any data.
How can I run setUpModule once, before all the tests run, and still keep the tests in separate files for convenience?
Instead of using a global setUpModule for both test classes, you can use setUpTestData once in each TestCase class. From the Django documentation on testing tools:
The class-level atomic block ... allows the creation of initial data at the class level, once for the whole TestCase.
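A minimal sketch of that approach, reusing the commands from the question (whether my_custom_command is safe to run once per class is an assumption):

from django.core import management
from django.test import TestCase

class TestAPIBNFViews(TestCase):
    @classmethod
    def setUpTestData(cls):
        # Runs once per TestCase class, inside a class-level atomic block,
        # so each test file can set up its own data without setUpModule.
        management.call_command('loaddata', 'frontend/fixtures/data.json',
                                verbosity=0)
        management.call_command('my_custom_command')

    def test_api_view_list_all(self):
        url = '/api/1.0/bnf_code'
        # do testing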
