I'm trying to write unit tests for my Python script which uses SQLAlchemy to connect to MySQL.
My function looks like this:
def check_if_table_exists(db):
    with db.connect() as cursor:
        table_exists = cursor.execute(f"SHOW TABLES LIKE '{PRIMARY_TABLE_NAME}';")
        if not table_exists.rowcount:
            cursor.execute(f"CREATE TABLE...
I can't find any resources on how to first mock out db.connect(), which in turn also needs to have its execute mocked so that I can test different table_exists scenarios. It's also possible that my code is simply not conducive to proper unit testing and I need to call the function with a cursor object to begin with.
For reference, db is the output of sqlalchemy.create_engine.
TL;DR: I need help getting started on unit testing for the cases where I get rows back from the SHOW statement and when I don't.
To patch a context manager, you have to patch the return value of __enter__, which is called on entering a context manager. Here is an example for your code:
from unittest import mock
from sqlalchemy import create_engine
from my_project.db_connect import check_if_table_exists

@mock.patch('sqlalchemy.engine.Engine.connect')
def test_dbconnect(engine_mock):
    db = create_engine('sqlite:///:memory:')
    cursor_mock = engine_mock.return_value.__enter__.return_value
    cursor_mock.execute.return_value.rowcount = 0

    check_if_table_exists(db)

    cursor_mock.execute.assert_called_with("CREATE TABLE")
In this code engine_mock.return_value is a mocked instance of Engine, and to get the mock for cursor, you need to add __enter__.return_value as described.
Having this, you can now mock the return value of execute - in this case you are only interested in the rowcount attribute which is checked in the code. Note that this will change the return value for all calls of execute - if you need different values for subsequent calls, you can use side_effect instead.
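For instance, a hedged sketch of using side_effect with the cursor_mock from the test above (the rowcount values are only illustrative) could look like this:

# Hypothetical: successive execute() calls see different results.
existing = mock.Mock(rowcount=1)   # first call: table already exists
missing = mock.Mock(rowcount=0)    # second call: table does not exist
cursor_mock.execute.side_effect = [existing, missing]

Each call to execute then consumes the next item of the list instead of always returning the same mock.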
I have the following function:
def create_tables(tables: list) -> None:
    template = jinja2.Template(
        open("/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql", "r").read()
    )
    pg_hook = PostgresHook(POSTGRES_CONN_ID)  # this is airflow module for Postgres
    conn = pg_hook.get_conn()

    for table in tables:
        sql = template.render(TABLE_NAME=table)
        with conn.cursor() as cur:
            cur.execute(sql)
            conn.commit()
Is there a solution to check the content of the "execute" or "sql" internal variable?
My test looks like this, but because the function returns nothing, I have nothing to check:
def test_create_tables(mocker):
    pg_mock = MagicMock(name="pg")
    mocker.patch.object(
        PostgresHook,
        "get_conn",
        return_value=pg_mock
    )
    mock_file_content = "CREATE TABLE IF NOT EXISTS {{table}}"
    mocker.patch("builtins.open", mock_open(read_data=mock_file_content))

    create_tables(tables=["mock_table_1", "mock_table_2"])
Thanks,
You cannot access internal variables of a function from the outside.
I strongly suggest refactoring your create_tables function then, if you already see that you want to test a specific part of its algorithm.
You could create a function like get_sql_from_table for example. Then test that separately and mock it out in your test_create_tables.
Although, since all it does is call the template rendering function from an external library, I am not sure if that is even something you should be testing. You should assume/know that they test their functions themselves.
Same goes for the Postgres functions.
But the general advice stands: If you want to verify that a specific part of your code does what you expect it to do, factor it out into its own unit/function and test that separately.
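For example, a minimal sketch of that refactoring (the name get_sql_from_table and the test below are hypothetical, not part of the original code) might look like:

import jinja2

# Hypothetical helper: rendering the SQL is now its own unit.
def get_sql_from_table(template: jinja2.Template, table: str) -> str:
    return template.render(TABLE_NAME=table)

# It can be tested in isolation, without Airflow or Postgres.
def test_get_sql_from_table():
    template = jinja2.Template("CREATE TABLE IF NOT EXISTS {{ TABLE_NAME }}")
    assert get_sql_from_table(template, "mock_table_1") == "CREATE TABLE IF NOT EXISTS mock_table_1"

In test_create_tables you could then patch the helper and simply assert it was called once per table.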
Hey, I am new to mocking and testing in general. What I want to implement is mocking a Mongo database that is returned by a get_database() function; I use this function in other functions to get collections and solve other problems.
The issue I am facing is that I cannot find a way to mock my database in this scenario, and I also cannot find how to mock my collection data, since I want to test an aggregation as well. I am lost at this point; if anyone has done this, please guide me in the right direction.
def get_database():
    client = MongoClient(MongoDbConfiguration().get_server(), MongoDbConfiguration().get_port())
    database = client[MongoDbConfiguration().get_db()]
    return database

def function(database):
    database = get_database()
    A_collection = database['collection']
    return A_collection
I have a fixture I created that pulls data from a database and puts it into a DataFrame so all subsequent tests can run on the same query output. The same code is staged in a function that kicks off my data processing pipeline (which I am building all these tests around). I am not sure I want to mock the db query result because I want to test the returned column names from the query and record uniqueness, among other things.
Where I am confused is how to properly test the function I have in my code that is doing the same thing as the output_data fixture. Is it proper to duplicate the code in my app for the fixture?
# df_service.py
import pandas as pd

from project.settings import MSSQL_DB_CON_STR as con

def df_from_sql(start_date, months):
    sql = f"SELECT * FROM dbo.awesome_table_value_function('{start_date}', {months})"
    return pd.read_sql(sql=sql, con=con)
Here is the fixture I started with before I recognized I am creating a fixture for functionality I actually want to test and use as a fixture.
# test_df_service.py
import pytest
import pandas as pd

@pytest.fixture(scope="module")
def output_data():
    from project.settings import MSSQL_DB_CON_STR as con
    start_date = "11/1/2019"
    months = 4
    sql = f"SELECT * FROM dbo.awesome_table_value_function('{start_date}', {months})"
    return pd.read_sql(sql=sql, con=con)

def test_columns(output_data):
    expected_columns = ['entity', 'attribute', 'value', 'effective_date']
    df = output_data
    for col in df.columns:
        assert col in expected_columns
So, one way you could go about this is the following:
Since the df_from_sql() function is within the domain of your application, it should also be a function you actually want to test. So, I would write a pytest test for the function, asserting things like "does it return X number of columns" or "is the number of rows > 0", etc. (if you cannot guarantee to return the same data every time).
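A minimal sketch of such a test (the module path project.df_service is an assumption, and the column names are taken from the test_columns example above) might be:

from project.df_service import df_from_sql  # assumed module path

def test_df_from_sql():
    df = df_from_sql(start_date="11/1/2019", months=4)
    # Column names taken from the test_columns example above.
    assert set(df.columns) == {'entity', 'attribute', 'value', 'effective_date'}
    assert len(df) > 0  # at least one row comes back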
I see why you would want to use the function to create a fixture; however, that introduces the following risks:
The function to get the data doesn't work for some reason. Then all your tests might fail because your output data is not what they expected.
The data in the database might change, so you would get different results from what you expected.
In theory, fixtures are supposed to be as fixed in space and time as possible, so what I would recommend is to actually save the data that you need as the output_data fixture in some file, or maybe in a mock table in your database, ensuring it will never change.
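A hedged sketch of that approach, assuming the query output has been exported once to a CSV file (the path tests/fixtures/output_data.csv is hypothetical), could be:

import pandas as pd
import pytest

@pytest.fixture(scope="module")
def output_data():
    # Load a frozen snapshot of the query result instead of hitting the database.
    return pd.read_csv("tests/fixtures/output_data.csv")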
In my application, there is a class for each model that holds commonly used queries (I guess it's somewhat of a "Repository" in DDD language). Each of these classes is passed the SQLAlchemy session object to create queries with upon construction. I'm having a little difficulty in figuring the best way to assert certain queries are being run in my unit tests. Using the ubiquitous blog example, let's say I have a "Post" model with columns and attributes "date" and "content". I also have a "PostRepository" with the method "find_latest" that is supposed to query for all posts in descending order by "date". It looks something like:
from myapp.models import Post

class PostRepository(object):
    def __init__(self, session):
        self._s = session

    def find_latest(self):
        return self._s.query(Post).order_by(Post.date.desc())
I'm having trouble mocking the Post.date.desc() call. Right now I'm monkey patching a mock in for Post.date.desc in my unit test, but I feel that there is likely a better approach.
Edit: I'm using mox for mock objects, my current unit test looks something like:
import unittest
import mox

class TestPostRepository(unittest.TestCase):
    def setUp(self):
        self._mox = mox.Mox()

    def _create_session_mock(self):
        from sqlalchemy.orm.session import Session
        return self._mox.CreateMock(Session)

    def _create_query_mock(self):
        from sqlalchemy.orm.query import Query
        return self._mox.CreateMock(Query)

    def _create_desc_mock(self):
        from myapp.models import Post
        return self._mox.CreateMock(Post.date.desc)

    def test_find_latest(self):
        from myapp.models.repositories import PostRepository
        from myapp.models import Post

        expected_result = 'test'
        session_mock = self._create_session_mock()
        query_mock = self._create_query_mock()
        desc_mock = self._create_desc_mock()

        # Monkey patch
        tmp = Post.date.desc
        Post.date.desc = desc_mock

        session_mock.query(Post).AndReturn(query_mock)
        query_mock.order_by(Post.date.desc().AndReturn('test')).AndReturn(query_mock)
        query_mock.offset(0).AndReturn(query_mock)
        query_mock.limit(10).AndReturn(expected_result)

        self._mox.ReplayAll()

        r = PostRepository(session_mock)
        result = r.find_latest()

        self._mox.VerifyAll()
        self.assertEquals(expected_result, result)

        Post.date.desc = tmp
This does work, though it feels ugly, and I'm not sure why it fails without the "AndReturn('test')" piece of "Post.date.desc().AndReturn('test')".
I don't think you're really gaining much benefit by using mocks for testing your queries. Testing should be testing the logic of the code, not the implementation. A better solution would be to create a fresh database, add some objects to it, run the query on that database, and determine if you're getting the correct results back. For example:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create the engine. This starts a fresh database
engine = create_engine('sqlite://')

# Fill the database with the tables needed.
# If you use declarative, then the metadata for your tables can be found using Base.metadata
metadata.create_all(engine)

# Create a session to this database
session = sessionmaker(bind=engine)()

# Create some posts using the session and commit them
...

# Test your repository object...
repo = PostRepository(session)
results = repo.find_latest()

# Run your assertions on results
...
Now, you're actually testing the logic of the code. This means that you can change the implementation of your method, but so long as the query works correctly, the tests should still pass. If you want, you could write this method as a query that gets all the objects, then slices the resulting list. The test would pass, as it should. Later on, you could change the implementation to run the query using the SA expression APIs, and the test would pass.
One thing to keep in mind is that you might have problems with sqlite behaving differently than another database type. Using sqlite in-memory gives you fast tests, but if you want to be serious about these tests, you'll probably want to run them against the same type of database you'll be using in production as well.
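To make the '...' placeholders above concrete, a hedged sketch (assuming a declarative Post model with date and content columns, plus the session and repository from the snippet above) might be:

import datetime

# Add two posts with different dates and commit them.
session.add_all([
    Post(date=datetime.date(2020, 1, 1), content='older'),
    Post(date=datetime.date(2020, 2, 1), content='newer'),
])
session.commit()

# The newest post should come back first.
repo = PostRepository(session)
results = repo.find_latest().all()
assert [p.content for p in results] == ['newer', 'older']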
If you still want to create a unit test with mocked input, you can create instances of your model with fake data.
In case the result proxy returns rows with data from more than one model (for instance, when you join two tables), you can use the collections data structure called namedtuple.
We use it to mock the results of join queries.
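A minimal sketch of the namedtuple idea (the column names are made up for illustration) could be:

from collections import namedtuple

# One row of a joined query, faked without touching the database.
JoinedRow = namedtuple('JoinedRow', ['post_id', 'post_content', 'author_name'])
fake_rows = [
    JoinedRow(post_id=1, post_content='first post', author_name='alice'),
    JoinedRow(post_id=2, post_content='second post', author_name='bob'),
]
# A mocked session/query can now return fake_rows in place of real join results.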
I've got a module pagetypes.py that extracts a couple of constants (I shouldn't really use the word constant here) from the db for later reuse:
def _get_page_type_(type):
    return PageType.objects.get(type=type)

PAGE_TYPE_MAIN = _get_page_type_('Main')
PAGE_TYPE_OTHER = _get_page_type_('Other')
then somewhere in views I do:
import pagetypes
...
print pagetypes.PAGE_TYPE_MAIN #simplified
It all works fine when the db has those records and I make sure it does... unless this code is under test. In that case I want to load those records into the db via fixtures. The problem with that is that fixtures are not loaded (even syncdb is not run) by the time the pagetypes module is imported, resulting in the _get_page_type_ call failing with:
psycopg2.ProgrammingError: relation "pagetype" does not exist
The test runner always tries to import the pagetypes module, because it is imported by the view that is under test.
How do I get around this problem?
I was thinking of lazily loading pagetype constants PAGE_TYPE_MAIN, PAGE_TYPE_OTHER, but then I want it to fail early if those records are not in the db (or fixtures if under test), so I don't really know how to implement this.
I was also thinking of object-level caching and just calling PageType.objects.get(type=type) whenever the constant is used/called, but wouldn't that be overkill? Calling the ORM without a cache would result in too many db calls, which I want to prevent.
It must be something very simple, but I can't work it out. ;-)
I would use the functions instead of the constants, but memoize them:
_cache = {}

def get_page_type(type_name):
    if type_name not in _cache:
        _cache[type_name] = PageType.objects.get(type=type_name)
    return _cache[type_name]
So now you'd call get_page_type('Main') directly when necessary.
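On Python 3, the same memoization can be written with functools.lru_cache (the question's snippet looks like Python 2, so treat this only as a sketch of the idea):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_page_type(type_name):
    return PageType.objects.get(type=type_name)

# In tests, get_page_type.cache_clear() resets the cache after fixtures are loaded.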