I'm using SQLAlchemy to query my databases for a project. At the same time, I'm new to unit testing and trying to learn how to write unit tests for my database code. I tried using the mock library, but so far it has been very difficult.
I wrote a piece of code which creates a Session object. This object is used to connect to the database.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.exc import OperationalError, ArgumentError
test_db_string = 'postgresql+psycopg2://testdb:hello@localhost/' \
                 'test_databasetable'

def get_session(database_connection_string):
    try:
        Base = declarative_base()
        engine = create_engine(database_connection_string)
        Base.metadata.bind = engine
        DBSession = sessionmaker(bind=engine)
        session = DBSession()
        connection = session.connection()
        return session
    except OperationalError:
        return None
    except ArgumentError:
        return None
So I made a unit test case for this function:
import mock
import unittest
from mock import sentinel

from app.Utilities.util import get_session

class TestUtilMock(unittest.TestCase):
    @mock.patch('app.Utilities.util.create_engine')  # mention the whole path
    @mock.patch('app.Utilities.util.sessionmaker')
    @mock.patch('app.Utilities.util.declarative_base')
    def test_get_session1(self, mock_declarative_base, mock_sessionmaker,
                          mock_create_engine):
        mock_create_engine.return_value = sentinel.engine
        get_session('any_path')
        mock_declarative_base.called
        mock_create_engine.assert_called_once_with('any_path')
        mock_sessionmaker.assert_called_once_with(bind=sentinel.engine)
As you can see, my unit test cannot reach the code in get_session() from the line session = DBSession() onwards. After googling, I cannot find out whether the value returned by a mock can itself be used to mock further calls - something like mocking the session object and verifying that I called session.connection().
Is the above way of writing the unit test right? Is there a better method?
First of all, you can test the session = DBSession() line too, by
self.assertEqual(get_session('any_path'), mock_sessionmaker.return_value.return_value)
Moreover, mock_declarative_base.called is not an assertion and can never fail. Replace it with
self.assertTrue(mock_declarative_base.called)
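To answer the question about verifying calls on the session itself: since sessionmaker is patched, the session object created inside get_session() is itself a mock, so you can assert directly on it. A sketch under the same patch decorators and module path (app.Utilities.util) as above; the test name is my own:

    @mock.patch('app.Utilities.util.create_engine')
    @mock.patch('app.Utilities.util.sessionmaker')
    @mock.patch('app.Utilities.util.declarative_base')
    def test_get_session_connects(self, mock_declarative_base,
                                  mock_sessionmaker, mock_create_engine):
        session = get_session('any_path')
        # sessionmaker(bind=...) returned a mock factory; calling it made the session
        mock_session = mock_sessionmaker.return_value.return_value
        self.assertIs(session, mock_session)
        # get_session() should have requested a connection from the session
        mock_session.connection.assert_called_once_with()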
General considerations
Writing tests like these can be very time consuming and couples your production code tightly to the test code. It is better to write your own wrapper that presents a comfortable interface to your business code, and to test that wrapper with some mocks and real (simple) tests: you should trust the SQLAlchemy implementation and just test how your wrapper calls it. A minimal wrapper sketch follows below.
Afterwards, you can either replace your wrapper with a fake object that you control, or use mocks and check how your code calls it; since the wrapper presents just a small business interface, mocking it should be very simple.
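For illustration, a minimal sketch of such a wrapper (the class and method names here are my own, hypothetical choices; imports as in the first snippet above):

    class Database:
        """Thin wrapper so business code never touches SQLAlchemy directly."""

        def __init__(self, connection_string):
            engine = create_engine(connection_string)
            self._session_factory = sessionmaker(bind=engine)

        def open_session(self):
            return self._session_factory()

Business code then depends only on Database.open_session(), so tests can hand it a hand-written fake with the same interface, with no patching at all.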
Related
I am trying to mock a SQLAlchemy create_engine call that runs during import of a Python script, but I am not able to do so properly. Here is my file structure:
File main.py
from database_wrapper import update_database

def do_something():
    # do something and
    update_database()
File database_wrapper.py
from sqlalchemy import create_engine

engine = create_engine("connection_uri")

def update_database() -> bool:
    con = engine.connect()
    # do something with the connection, and so on...
In my unit test I do something like:
from unittest import TestCase
from unittest.mock import patch

class TestMicroservice(TestCase):
    @patch('database_wrapper.create_engine')
    @patch('main.update_database')
    def test_1(self, update_database_mock, create_engine_mock):
        create_engine_mock.return_value = "Fake engine, babe"
        update_database_mock.return_value = True
        from main import do_something
        do_something()
I have also tried @patch('main.database_wrapper.create_engine'), @patch('main.database_wrapper.sqlalchemy.create_engine'), @patch('sqlalchemy.create_engine') and many other variations, but with no luck. I have read a few similar posts but couldn't find this exact case.
I would rather avoid changing database_wrapper.py, so I cannot move engine = create_engine("connection_uri"). Besides, I am happy to create the engine when the program starts, without having to pass it around.
I found a solution. Thanks to MrBean Bremen for setting me on the right route. In fact, engine = create_engine("connection_uri") is already lazily initialized, as explained here. The only thing I needed to change was to pass a reasonable string as the connection URI, something like
engine = create_engine("postgresql://user:pass@localhost:5432/db?sslmode=disable")
This way the engine object is created, but because I never actually use it in the mocked-out update_database(), it never causes a problem.
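As an aside, another route that should work (assuming the module-level URI is at least parseable, as in the fix above, so the import itself succeeds) is to patch the engine object itself rather than create_engine. A sketch, not from the original answer:

    from unittest import TestCase
    from unittest.mock import patch

    class TestMicroservice(TestCase):
        @patch('database_wrapper.engine')  # replace the module-level engine instance
        def test_update_database(self, engine_mock):
            from database_wrapper import update_database
            update_database()
            # update_database() asked the (now mocked) engine for a connection
            engine_mock.connect.assert_called_once_with()

Because update_database() looks the engine up as a module attribute at call time, patching database_wrapper.engine takes effect even though create_engine already ran at import time.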
The Flask tutorial (and many other tutorials out there) suggests that the engine, the db_session and the Base (produced by declarative_base()) are all created at import time.
This creates some problems, one being that the URI of the DB is hardcoded in the code and evaluated only once.
One solution is to wrap these calls in functions that accept the app as a parameter, which is what I've done. Mind you, each call caches the result in app.config:
def get_engine(app):
    """Return the engine connected to the database URI in the config file.
    Store it in the config for later use.
    """
    engine = app.config.setdefault(
        'DB_ENGINE', create_engine(app.config['DATABASE_URI'](), echo=True))
    return engine

def get_session(app):
    """Return the DB session for the database in use.
    Store it in the config for later use.
    """
    engine = get_engine(app)
    db_session = app.config.setdefault(
        'DB_SESSION', scoped_session(sessionmaker(
            autocommit=False, autoflush=False, bind=engine)))
    return db_session

def get_base(app):
    """Return the declarative base to use in DB models.
    Store it in the config for later use.
    """
    Base = app.config.setdefault('DB_BASE', declarative_base())
    Base.query = get_session(app).query_property()
    return Base
In init_db, I call all those functions, but there's still a code smell:
import importlib
import sys

def init_db(app):
    """Initialise the database."""
    create_db(app)
    engine = get_engine(app)
    db_session = get_session(app)
    base = get_base(app)
    if not app.config['TESTING']:
        import flaskr.models
    else:
        if 'flaskr.models' not in sys.modules:
            import flaskr.models
        else:
            import flaskr.models
            importlib.reload(flaskr.models)
    base.metadata.create_all(bind=engine)
The smell is, of course, the hoops I have to jump through to import and create all the models.
The reason for the code above is that, when unit testing, init_db is called once per test (in setup(), as suggested in the same tutorial), but the import is only performed the first time, so create_all would otherwise only work that once.
Not only that: with a session now shared for the duration of the app, I have problems in parametrized negative unit tests (that is, parametrized unit tests that expect some sort of failure). The first instance of the test triggers the failure (e.g. a login failure, see test_login_validate_input in the tutorial) and exits correctly, while all subsequent instances bail out early because the db_session should have been rolled back first. Clearly there is something wrong with the DB initialization.
What is the Right Way(TM) to initialize the database?
I have eventually decided to refactor the app so that it uses Flask-SQLAlchemy.
In short, the app now does something like this:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app():
    app = Flask(__name__)
    db.init_app(app)
    # ...
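A common variation of the factory (hypothetical here, not my exact code) accepts a config override, which is what makes per-test databases easy:

    def create_app(test_config=None):
        app = Flask(__name__)
        app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://...'
        if test_config:
            app.config.update(test_config)  # e.g. point tests at in-memory SQLite
        db.init_app(app)
        return app

    # in a test:
    app = create_app({'TESTING': True,
                      'SQLALCHEMY_DATABASE_URI': 'sqlite:///:memory:'})
    with app.app_context():
        db.create_all()  # fresh schema per test, no importlib.reload() hoops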
With the benefit of hindsight, it's definitely a cleaner approach.
What put me off at the start was this entry from the tutorial (bold is mine):
Because SQLAlchemy is a common database abstraction layer and object relational mapper that requires a little bit of configuration effort, there is a Flask extension that handles that for you. This is recommended if you want to get started quickly.
Which I somehow read as "Using the Flask-SQLAlchemy extension will allow you to cut some corners, which you'll probably end up paying for later".
It's still early days, but so far there has been no price to pay in terms of flexibility for using said extension.
I'm trying to write unit tests for my Python script, which uses SQLAlchemy to connect to MySQL.
My function looks like this:
def check_if_table_exists(db):
    with db.connect() as cursor:
        table_exists = cursor.execute(f"SHOW TABLES LIKE '{PRIMARY_TABLE_NAME}';")
        if not table_exists.rowcount:
            cursor.execute(f"CREATE TABLE...
I can't find any resources on how to mock out db.connect(), which in turn also needs its execute mocked so that I can test the different table_exists scenarios. It's also possible that my code simply isn't amenable to proper unit testing and I should be passing a cursor object into the function to begin with.
For reference, db is the output of sqlalchemy.create_engine.
TL;DR: I need help getting started on unit tests for the case where I get rows back from the SHOW statement and the case where I don't.
To patch a context manager, you have to patch the return value of __enter__, which is called on entering a context manager. Here is an example for your code:
from unittest import mock
from sqlalchemy import create_engine
from my_project.db_connect import check_if_table_exists
@mock.patch('sqlalchemy.engine.Engine.connect')
def test_dbconnect(engine_mock):
    db = create_engine('sqlite:///:memory:')
    cursor_mock = engine_mock.return_value.__enter__.return_value
    cursor_mock.execute.return_value.rowcount = 0
    check_if_table_exists(db)
    cursor_mock.execute.assert_called_with("CREATE TABLE")
In this code, engine_mock replaces Engine.connect, so engine_mock.return_value is the mocked object returned by db.connect(); to get the mock for the cursor, you need to add __enter__.return_value as described.
Having this, you can now mock the return value of execute - in this case you are only interested in the rowcount attribute which is checked in the code. Note that this sets the return value for all calls of execute - if you need different values for subsequent calls, you can use side_effect instead, as in the sketch below.
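For instance, a sketch of the side_effect variant, making the SHOW TABLES call report an existing table so that CREATE TABLE is skipped (SimpleNamespace is just my shorthand for a result object carrying a rowcount; other imports as in the test above):

    from types import SimpleNamespace

    @mock.patch('sqlalchemy.engine.Engine.connect')
    def test_table_already_exists(engine_mock):
        db = create_engine('sqlite:///:memory:')
        cursor_mock = engine_mock.return_value.__enter__.return_value
        # one queued result: SHOW TABLES finds a row, so no CREATE TABLE follows
        cursor_mock.execute.side_effect = [SimpleNamespace(rowcount=1)]
        check_if_table_exists(db)
        assert cursor_mock.execute.call_count == 1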
I'm designing a small GUI application to wrap an SQLite DB (simple CRUD operations). I have created three SQLAlchemy models (m_person.py, m_card.py, m_loan.py, all in a /models folder) and previously had the following code at the top of each one:
from sqlalchemy import Table, Column, create_engine
from sqlalchemy import Integer, ForeignKey, String, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relation
engine = create_engine("sqlite:///devdata.db", echo=True)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
This felt a bit wrong (DRY) so it was suggested that I move all this stuff out to the module level (into models/__init__.py).
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import Table, Column, Boolean, Unicode
from settings import setup
engine = create_engine('sqlite:///' + setup.get_db_path(), echo=False)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
session = sessionmaker()
session = session()
...and import declarative_base like so:
from sqlalchemy import Table, Column, Unicode
from models import declarative_base
class Person(declarative_base):
    """
    Person model
    """
    __tablename__ = "people"

    id = Column(Unicode(50), primary_key=True)
    fname = Column(Unicode(50))
    sname = Column(Unicode(50))
However, I've had a lot of feedback that executing code like this while the module is being imported is bad. I'm looking for a definitive answer on the right way to do it, as it seems that by trying to remove code repetition I've introduced some other bad practices. Any feedback would be really useful.
(Below is the get_db_path() function from settings/setup.py for completeness, as it is called in the models/__init__.py code above.)
def get_db_path():
    import sys
    from os import makedirs
    from os.path import join, dirname, exists
    from constants import DB_FILE, DB_FOLDER

    if len(sys.argv) > 1:
        db_path = sys.argv[1]
    else:
        # Check if application is running from exe or .py and adjust db path accordingly
        if getattr(sys, 'frozen', False):
            application_path = dirname(sys.executable)
            db_path = join(application_path, DB_FOLDER, DB_FILE)
        elif __file__:
            application_path = dirname(__file__)
            db_path = join(application_path, '..', DB_FOLDER, DB_FILE)

    # Check db path exists and create it if not
    def ensure_directory(db_path):
        dir = dirname(db_path)
        if not exists(dir):
            makedirs(dir)

    ensure_directory(db_path)
    return db_path
Some popular frameworks (Twisted is one example) perform a good deal of initialization logic at import-time. The benefits of being able to build module contents dynamically do come at a price, one of them being that IDEs cannot always decide what is "in" the module or not.
In your specific case, you may want to refactor so that the particular engine is not supplied at import time but later. You can create the metadata and your declarative_base class at import time. Then, during start-up, after all the classes are defined, you call create_engine and bind the result to your sqlalchemy.orm.sessionmaker. But if your needs are simple, you may not even need to go that far.
In general, I would say this is not Java or C. There is no reason to fear doing things at the module level other than define functions, classes and constants. Your classes are created when the application starts anyway, one after the other. A little monkey-patching of classes (in the same module!), or creating one or two global lookup tables, is OK in my opinion if it simplifies your implementation.
What I would definitely avoid is any code in your module that makes the order of imports matter for your users (other than the normal way of simply providing logic to be used), or that modifies the behavior of code outside your module. Then your module becomes black magic, which is accepted in the Perl world (à la use strict;), but I find it is not "pythonic".
For example, if your module modifies properties of sys.stdout when imported, I would argue that behavior should instead be moved into a function that the user can either call or not, e.g.:
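A contrived sketch of the difference (the function name is made up):

    import sys

    # Bad: reconfigures everyone's stdout merely by being imported
    # sys.stdout.reconfigure(line_buffering=True)

    # Better: provide the behavior, let the user opt in explicitly
    def enable_line_buffering():
        sys.stdout.reconfigure(line_buffering=True)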
In principle there is nothing wrong with executing Python code when a module is imported; in fact, every single Python module works that way. Defining module members is executing code, after all.
However, in your particular use case I would strongly advise against a singleton database session object in your code base. You would lose the ability to do many things, for example unit test your models against an in-memory SQLite or some other kind of database engine.
Take a look at the documentation for declarative_base and note how the examples are able to define the model on a declarative_base-supplied class that is not yet bound to a database engine. Ideally you want to do that, and then have some kind of connection function or class that manages creating a session and binding it to the base, as sketched below.
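A minimal sketch of that shape (the names are illustrative):

    from sqlalchemy import create_engine, Column, Integer, Unicode
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()          # no engine yet: safe at import time

    class Person(Base):
        __tablename__ = 'people'
        id = Column(Integer, primary_key=True)
        fname = Column(Unicode(50))

    def connect(db_url):
        """Bind to a real database only when the application (or test) asks."""
        engine = create_engine(db_url)
        Base.metadata.create_all(engine)
        return sessionmaker(bind=engine)()

    # production: session = connect('sqlite:///devdata.db')
    # unit tests:  session = connect('sqlite:///:memory:')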
There is nothing wrong with executing code at import time.
The guideline is to execute enough that your module is usable, but not so much that importing it is unnecessarily slow, and not so much that you unnecessarily limit what can be done with it.
Typically this means defining classes, functions, and global names (actually module-level names -- "global" is a bit of a misnomer there), as well as importing anything your classes, functions, etc. need to operate.
This does not usually involve making connections to databases, websites, or other external, dynamic resources, but rather supplying a function to establish those connections when the module user is ready to do so.
I ran into this as well, and created a database.py file with a database manager class, after which I create a single global object. That way the class can read settings from my settings.py file to configure the database, and the first class that needs a base object (or session / engine) initializes the global object, after which everyone just re-uses it. I feel much more comfortable having from myproject.database import DM at the top of each class using the SQLAlchemy ORM, where DM is my global database object, and then calling DM.getBase() to get the base object.
Here is my database.py class:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(autoflush=True))

class DatabaseManager():
    """
    The top level database manager used by all the SQLAlchemy classes to fetch their session / declarative base.
    """
    engine = None
    base = None

    def ready(self):
        """Determines if the SQLAlchemy engine and base have been set, and if not, initializes them."""
        host = '<database connection details>'
        if self.engine and self.base:
            return True
        else:
            try:
                self.engine = create_engine(host, pool_recycle=3600)
                self.base = declarative_base(bind=self.engine)
                return True
            except Exception:
                return False

    def getSession(self):
        """Returns the active SQLAlchemy session."""
        if self.ready():
            # configure the scoped_session factory before instantiating:
            # individual Session instances have no configure() method
            Session.configure(bind=self.engine)
            return Session()
        else:
            return None

    def getBase(self):
        """Returns the active SQLAlchemy base."""
        if self.ready():
            return self.base
        else:
            return None

    def getEngine(self):
        """Returns the active SQLAlchemy engine."""
        if self.ready():
            return self.engine
        else:
            return None

DM = DatabaseManager()
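Usage in a model module then looks something like this (a sketch; Book is a made-up model, and it assumes ready() succeeded so getBase() is not None):

    from sqlalchemy import Column, Integer, Unicode
    from myproject.database import DM

    class Book(DM.getBase()):
        __tablename__ = 'books'

        id = Column(Integer, primary_key=True)
        title = Column(Unicode(200))

    # elsewhere in the application:
    session = DM.getSession()
    books = session.query(Book).all()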
In my application, there is a class for each model that holds commonly used queries (I guess it's somewhat of a "Repository" in DDD language). Each of these classes is passed the SQLAlchemy session object upon construction to create queries with. I'm having a little difficulty figuring out the best way to assert that certain queries are run in my unit tests. Using the ubiquitous blog example, let's say I have a "Post" model with the columns/attributes "date" and "content", and a "PostRepository" with a method "find_latest" that is supposed to query for all posts in descending order by "date". It looks something like:
from myapp.models import Post

class PostRepository(object):
    def __init__(self, session):
        self._s = session

    def find_latest(self):
        return self._s.query(Post).order_by(Post.date.desc())
I'm having trouble mocking the Post.date.desc() call. Right now I'm monkey-patching a mock in for Post.date.desc in my unit test, but I feel there is likely a better approach.
Edit: I'm using mox for mock objects; my current unit test looks something like this:
import unittest
import mox

class TestPostRepository(unittest.TestCase):
    def setUp(self):
        self._mox = mox.Mox()

    def _create_session_mock(self):
        from sqlalchemy.orm.session import Session
        return self._mox.CreateMock(Session)

    def _create_query_mock(self):
        from sqlalchemy.orm.query import Query
        return self._mox.CreateMock(Query)

    def _create_desc_mock(self):
        from myapp.models import Post
        return self._mox.CreateMock(Post.date.desc)

    def test_find_latest(self):
        from myapp.models.repositories import PostRepository
        from myapp.models import Post

        expected_result = 'test'
        session_mock = self._create_session_mock()
        query_mock = self._create_query_mock()
        desc_mock = self._create_desc_mock()

        # Monkey patch
        tmp = Post.date.desc
        Post.date.desc = desc_mock

        session_mock.query(Post).AndReturn(query_mock)
        query_mock.order_by(Post.date.desc().AndReturn('test')).AndReturn(query_mock)
        query_mock.offset(0).AndReturn(query_mock)
        query_mock.limit(10).AndReturn(expected_result)

        self._mox.ReplayAll()
        r = PostRepository(session_mock)
        result = r.find_latest()
        self._mox.VerifyAll()

        self.assertEquals(expected_result, result)
        Post.date.desc = tmp
This does work, though it feels ugly, and I'm not sure why it fails without the AndReturn('test') piece of Post.date.desc().AndReturn('test').
I don't think you're gaining much benefit by using mocks to test your queries. Testing should test the logic of the code, not the implementation. A better solution is to create a fresh database, add some objects to it, run the query against that database, and check whether you get the correct results back. For example:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create the engine. This starts a fresh database
engine = create_engine('sqlite://')

# Fill the database with the tables needed.
# If you use declarative, the metadata for your tables can be found using Base.metadata
metadata.create_all(engine)

# Create a session to this database
session = sessionmaker(bind=engine)()

# Create some posts using the session and commit them
...

# Test your repository object...
repo = PostRepository(session)
results = repo.find_latest()

# Run your assertions on results
...
Now, you're actually testing the logic of the code. This means that you can change the implementation of your method, but so long as the query works correctly, the tests should still pass. If you want, you could write this method as a query that gets all the objects, then slices the resulting list. The test would pass, as it should. Later on, you could change the implementation to run the query using the SA expression APIs, and the test would pass.
One thing to keep in mind is that you might have problems with sqlite behaving differently than another database type. Using sqlite in-memory gives you fast tests, but if you want to be serious about these tests, you'll probably want to run them against the same type of database you'll be using in production as well.
If you still want to write a unit test with mocked input, you can create instances of your model with fake data.
In case the result proxy returns rows with data from more than one model (for instance when you join two tables), you can use the collections data structure called namedtuple.
We use it to mock the results of join queries, as in the sketch below.
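For example (Post/Author and the field names are made up for illustration):

    from collections import namedtuple

    # Shape of a row coming back from a Post-join-Author query
    JoinedRow = namedtuple('JoinedRow', ['post_content', 'author_name'])

    fake_rows = [
        JoinedRow(post_content='first post', author_name='alice'),
        JoinedRow(post_content='second post', author_name='bob'),
    ]

    # A stubbed query result can now be iterated and accessed by attribute,
    # just like SQLAlchemy's keyed tuples:
    for row in fake_rows:
        print(row.post_content, row.author_name)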