Python SQLAlchemy - Mocking a model attribute's "desc" method

In my application, there is a class for each model that holds commonly used queries (I guess it's somewhat of a "Repository" in DDD language). Each of these classes is passed the SQLAlchemy session object upon construction, which it uses to create queries. I'm having a little difficulty figuring out the best way to assert that certain queries are being run in my unit tests. Using the ubiquitous blog example, let's say I have a "Post" model with columns "date" and "content". I also have a "PostRepository" with a method "find_latest" that is supposed to query for all posts in descending order by "date". It looks something like:
from myapp.models import Post

class PostRepository(object):
    def __init__(self, session):
        self._s = session

    def find_latest(self):
        return self._s.query(Post).order_by(Post.date.desc())
I'm having trouble mocking the Post.date.desc() call. Right now I'm monkey patching a mock in for Post.date.desc in my unit test, but I feel that there is likely a better approach.
Edit: I'm using mox for mock objects; my current unit test looks something like:
import unittest
import mox

class TestPostRepository(unittest.TestCase):
    def setUp(self):
        self._mox = mox.Mox()

    def _create_session_mock(self):
        from sqlalchemy.orm.session import Session
        return self._mox.CreateMock(Session)

    def _create_query_mock(self):
        from sqlalchemy.orm.query import Query
        return self._mox.CreateMock(Query)

    def _create_desc_mock(self):
        from myapp.models import Post
        return self._mox.CreateMock(Post.date.desc)

    def test_find_latest(self):
        from myapp.models.repositories import PostRepository
        from myapp.models import Post
        expected_result = 'test'
        session_mock = self._create_session_mock()
        query_mock = self._create_query_mock()
        desc_mock = self._create_desc_mock()
        # Monkey patch
        tmp = Post.date.desc
        Post.date.desc = desc_mock
        session_mock.query(Post).AndReturn(query_mock)
        query_mock.order_by(Post.date.desc().AndReturn('test')).AndReturn(query_mock)
        query_mock.offset(0).AndReturn(query_mock)
        query_mock.limit(10).AndReturn(expected_result)
        self._mox.ReplayAll()
        r = PostRepository(session_mock)
        result = r.find_latest()
        self._mox.VerifyAll()
        self.assertEquals(expected_result, result)
        Post.date.desc = tmp
This does work, though it feels ugly, and I'm not sure why it fails without the "AndReturn('test')" part of "Post.date.desc().AndReturn('test')".

I don't think you're really gaining much benefit by using mocks for testing your queries. Tests should exercise the logic of the code, not its implementation. A better solution would be to create a fresh database, add some objects to it, run the query against that database, and check that you get the correct results back. For example:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create the engine. This starts a fresh in-memory database.
engine = create_engine('sqlite://')

# Fill the database with the tables needed.
# If you use declarative, the metadata for your tables can be found on Base.metadata.
metadata.create_all(engine)

# Create a session bound to this database.
session = sessionmaker(bind=engine)()

# Create some posts using the session and commit them
...

# Test your repository object...
repo = PostRepository(session)
results = repo.find_latest()

# Run your assertions on the results
...
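To make that concrete, here is a sketch of what the elided pieces might look like, assuming a declarative Post model with date and content columns as the question implies (the specific posts and assertion are illustrative):

import datetime

# Hypothetical posts; field names follow the question's model.
session.add_all([
    Post(date=datetime.date(2012, 1, 1), content='older post'),
    Post(date=datetime.date(2012, 2, 1), content='newer post'),
])
session.commit()

repo = PostRepository(session)
latest = repo.find_latest().all()

# The newest post should come back first.
assert latest[0].content == 'newer post'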
Now, you're actually testing the logic of the code. This means that you can change the implementation of your method, but so long as the query works correctly, the tests should still pass. If you want, you could write this method as a query that gets all the objects, then slices the resulting list. The test would pass, as it should. Later on, you could change the implementation to run the query using the SA expression APIs, and the test would pass.
One thing to keep in mind is that you might have problems with sqlite behaving differently than another database type. Using sqlite in-memory gives you fast tests, but if you want to be serious about these tests, you'll probably want to run them against the same type of database you'll be using in production as well.

If you still want a unit test with mocked input, you can create instances of your model with fake data.
If the result proxy returns rows with data from more than one model (for instance, when you join two tables), you can use the namedtuple data structure from the collections module.
We use it to mock the results of join queries.
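A minimal sketch of that idea (the row shape and field names are hypothetical, chosen to match the blog example):

from collections import namedtuple

# Hypothetical shape of a row produced by joining Post with an Author table.
JoinRow = namedtuple('JoinRow', ['post_id', 'content', 'author_name'])

fake_rows = [
    JoinRow(post_id=1, content='first post', author_name='alice'),
    JoinRow(post_id=2, content='second post', author_name='bob'),
]

# A mocked query can now return fake_rows wherever the real join query
# would return its result proxy, and attribute access (row.content,
# row.author_name) behaves like the real thing.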

Related

Correct use of pytest fixtures of objects with Django

I am relatively new to pytest, so I understand the simple use of fixtures, which looks like this:
import pytest

@pytest.fixture
def example_data():
    return "abc"
and then using it in a way like this:
def test_data(self, example_data):
    assert example_data == "abc"
I am working on a Django app, and where it gets confusing is when I try to use fixtures to create Django objects that will be used in the tests.
The closest solution that I've found online looks like this:
@pytest.fixture
def test_data(self):
    users = get_user_model()
    client = users.objects.get_or_create(username="test_user", password="password")
and then I am expecting to be able to access this user object in a test function:
@pytest.mark.django_db
@pytest.mark.usefixtures("test_data")
async def test_get_users(self):
    # the user object should be included in this queryset
    all_users = await sync_to_async(User.objects.all)()
    ... (doing assertions) ...
The issue is that when I try to list all the users, I can't find the one that was created by the test_data fixture, and therefore I can't use it for testing.
I noticed that if I create the objects inside the test function there is no problem, but this approach won't work for me because I need to parametrize the function and, depending on the input, add different groups to each user.
I also tried some kind of init or setup function for my test class, creating the User test objects there, but this doesn't seem to be pytest's recommended way of doing things, and that approach didn't work either when it came to listing the users later.
Is there any way to create test objects which will be accessible when doing a queryset?
Is the right way to manually create separate functions and objects for each test case or is there a pytest-way of achieving that?
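This related question appears here without an answer, but for the synchronous case the usual pytest-django pattern is for the fixture itself to request the db fixture and return the created object. A sketch under that assumption (whether it also resolves the async sync_to_async case is a separate matter):

import pytest
from django.contrib.auth import get_user_model

@pytest.fixture
def test_user(db):  # 'db' is pytest-django's database-access fixture
    user, _ = get_user_model().objects.get_or_create(
        username="test_user", password="password")
    return user

def test_get_users(test_user):
    # The fixture and the test share one test-database transaction,
    # so the created user is visible in the queryset.
    assert test_user in get_user_model().objects.all()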

How to mock sqlalchemy execution output inside `with` context manager

I'm trying to write unit tests for my python script which uses sqlalchemy to connect to MySQL.
My function looks like this:
def check_if_table_exists(db):
    with db.connect() as cursor:
        table_exists = cursor.execute(f"SHOW TABLES LIKE '{PRIMARY_TABLE_NAME}';")
        if not table_exists.rowcount:
            cursor.execute(f"CREATE TABLE...
I can't find any resources on how to mock out db.connect(), which in turn also needs its execute mocked so that I can test different table_exists scenarios. It's also possible that my code simply isn't conducive to proper unit testing and I need to call the function with a cursor object to begin with.
For reference, db is the output of sqlalchemy.create_engine.
TLDR I need help getting started on unit testing for cases where I get rows back for the SHOW statement and when I don't.
To patch a context manager, you have to patch the return value of __enter__, which is called on entering a context manager. Here is an example for your code:
from unittest import mock
from sqlalchemy import create_engine
from my_project.db_connect import check_if_table_exists

@mock.patch('sqlalchemy.engine.Engine.connect')
def test_dbconnect(engine_mock):
    db = create_engine('sqlite:///:memory:')
    cursor_mock = engine_mock.return_value.__enter__.return_value
    cursor_mock.execute.return_value.rowcount = 0
    check_if_table_exists(db)
    cursor_mock.execute.assert_called_with("CREATE TABLE")
In this code, engine_mock replaces Engine.connect, so engine_mock.return_value is the object returned by connect(); to get the mock for cursor, you add __enter__.return_value to it, as described.
Having this, you can now mock the return value of execute - in this case you are only interested in the rowcount attribute, which is checked in the code. Note that this changes the return value for all calls of execute - if you need different values for subsequent calls, you can use side_effect instead.
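For instance, a small sketch of the side_effect variant (the scenario - first call sees the table, a later call doesn't - is made up for illustration):

# Each call to execute() consumes the next item in the list.
cursor_mock.execute.side_effect = [
    mock.Mock(rowcount=1),  # first call: table exists
    mock.Mock(rowcount=0),  # second call: table is missing
]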

How to mock elastic search python?

I need to mock elasticsearch calls, but I am not sure how to mock them in my python unit tests. I saw this framework called ElasticMock. I tried using it the way indicated in the documentation and it gave me plenty of errors.
It is here: https://github.com/vrcmarcos/elasticmock
My question is, is there any other way to mock elastic search calls?
This doesn't seem to have an answer either: Mock elastic search data.
And this just indicates to actually do integration tests rather than unit tests, which is not what I want:
Unit testing elastic search inside Django app.
Can anyone point me in the right direction? I have never mocked things with ElasticSearch.
You have to mock the attr or method you need, for example:
import mock

with mock.patch("elasticsearch.Elasticsearch.search") as mocked_search, \
     mock.patch("elasticsearch.client.IndicesClient.create") as mocked_index_create:
    mocked_search.return_value = "pipopapu"
    mocked_index_create.return_value = {"acknowledged": True}
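Inside that with block, any code that ends up calling Elasticsearch.search gets the canned value back; a quick sanity check (the index name is made up for illustration) might look like:

    # Still inside the with block above.
    from elasticsearch import Elasticsearch
    es = Elasticsearch()
    assert es.search(index="posts", body={}) == "pipopapu"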
In order to know the path you need to mock, just explore the lib with your IDE. When you already know one you can easily find the others.
After looking at the decorator source code, the trick for me was to reference Elasticsearch with the module:
import elasticsearch
...
elasticsearch.Elasticsearch(...
instead of
from elasticsearch import Elasticsearch
...
Elasticsearch(...
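With the module-level reference in place, patching the class itself takes effect everywhere it is looked up through the module; a minimal sketch of that (the canned return value is invented for illustration):

import elasticsearch
from unittest import mock

with mock.patch("elasticsearch.Elasticsearch") as es_cls:
    es_cls.return_value.search.return_value = {"hits": {"hits": []}}
    client = elasticsearch.Elasticsearch()  # returns the mock instance
    assert client.search(index="x") == {"hits": {"hits": []}}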
I'm going to give a very abstract answer because this applies to more than ES.
class ProductionCodeIWantToTest:
    def __init__(self):
        pass

    def do_something(self, data):
        es = ES()  # or some database or whatever
        es.post(data)  # or the right syntax
Now I can't test this.
With one small change, injecting a dependency:
class ProductionCodeIWantToTest:
    def __init__(self, database):
        self.database = database

    def do_something(self, data):
        self.database.save(data)  # or the right syntax
Now you can use the real db:
es = ES()  # or some database or whatever
thing = ProductionCodeIWantToTest(es)
or test it:
mock = ...  # up to you - just needs a save method so far
thing = ProductionCodeIWantToTest(mock)
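For example, a sketch of such a test using unittest.mock for the fake (the test name and sample data are illustrative):

from unittest.mock import Mock

def test_do_something_saves_data():
    db_mock = Mock()  # anything with a save method will do
    thing = ProductionCodeIWantToTest(db_mock)
    thing.do_something({'id': 1})
    db_mock.save.assert_called_once_with({'id': 1})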

Celery pickling not playing nice with Cassandra driver, can't figure out the root cause

I'm experiencing some behavior that I can't quite figure out. I'm using Cassandra to store message objects, and I'm using Celery for async pulls and pushes to the database. Everything is working fine, except for a single Celery task; the other tasks that use the same code/classes work. Here's a rough breakdown of the code logic:
db_manager = DBManager()

class User(object):
    def __init__(self, user_id):
        ... normal init stuff ...
        self.loader()

    @run_async
    def loader(self):
        ... loads from database if found, otherwise pulls from API ...

    # THIS WORKS
    @celery.task(name='user-to-db', filter=task_method)
    def to_db(self):
        # db_manager is a custom backend that handles relevant db reads, writes, etc.
        db_manager.add('users', self.user_payload)

    # THIS WORKS
    @celery.task(name='load-friends', filter=task_method)
    def load_friends(self):
        # Checks secondary redis index for friends of user
        friends = redis.srandmember('users:the-users-id:friends', self.id, 20)
        if not friends:
            profiles = load_friends_from_api(user_id=self.id)
        else:
            query = "SELECT * FROM keyspace.users WHERE id IN ({friends})".format(friends=friends)
        # Init a User object for every friend
        loaded_friends = [User(friend) for friend in profiles]
        # Returns a class container with all the instances of User(friend), accessible through a class property
        return FriendContainer(self.id, loaded_friends)

    # THIS DOES NOT WORK
    @celery.task(name='get-user-messages', filter=task_method)
    def get_user_messages(self):
        # THIS IS WHERE IT FAILS #
        messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
        # THAT LINE ABOVE #
        # Init a message class object for every message payload in database
        msgs = [Message(m, user=self) for m in messages]
        # Returns a message container class holding all the message objects, accessible through a class property
        return MessageContainer(msgs)
This last class method throws an error:
File "/usr/local/lib/python2.7/dist-packages/kombu/serialization.py", line 356, in pickle_dumps
return dumper(obj, protocol=pickle_protocol)
EncodeError: Can't pickle <class 'cassandra.io.eventletreactor.message'>: attribute lookup cassandra.io.eventletreactor.message failed
cassandra.io.eventletreactor.message points to a user-defined type in Cassandra that I use as a container for message objects per user. The line that throws this error is:
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
This is the method from DBManager():
class DBManager(object):
    ... stuff ...

    def get(self, query):
        # I do some stuff to prepare the query, namely substituting `WHERE this = that`
        # with `WHERE this = ?` to create a Cassandra prepared statement.
        statement = cassandra.prepare(query_prepared)
        # I want these messages as a dict, not the default namedtuple
        cassandra.row_factory = dict_factory
        # User id is parsed out of the query
        results = cassandra.execute(statement, (user_id,))
        rows = results.current_rows
        # rows is a list of dicts, no weird class references or anything in there
        return rows
I've read that Celery tasks out of class methods is/was kind of experimental, but I can't figure out why all the other methods qua tasks that use the same instance of DBManager are working.
The problem seems to be localized to some issue with the user-defined type message that's not playing nice within the Cassandra driver; however, if I run the get method from DBManager within the Celery task itself, it works. That is, if I copy/paste the code that is throwing the error from DBManager.get into User.get_user_messages, it works fine. If I try to call DBManager.get from within User.get_user_messages, it breaks.
I just can't figure out where the problem is. I can do all of the following just fine:
- Run the get_user_messages method without Celery, and it works.
- Run the get_user_messages method WITH Celery if I run the get method code right in the Celery task method itself.
- Run other methods registered as Celery tasks that point to other methods in DBManager that use the Cassandra driver, even ones that insert the same message user-defined type into the database.
I've tried pickling ALL THE THINGS all the way down myself, and in various combinations, and can't reproduce the error.
What I have not tried:
- Changing the serializer to json or yaml. There are a few convenience items in the db payload that won't serialize with either of those two.
- Using dill instead of pickle. It seems like this should work without having to switch serializers, given that I can get various parts working separately.
I could just say screw it and run the query directly through the Cassandra driver instead of my DBManager class, but I feel like this should be solvable and I'm just missing something really, really obvious, so obvious that I'm not seeing it. Any suggestions on where to look would be greatly appreciated.
In case of relevance: Cassandra 3.3, CQL 3.4, DataStax python driver 3.1
Meh, I found the problem, and it WAS really obvious. I guess I didn't actually try pickling all the things, just most of the things, and I didn't catch this in my 4am debugging stupor.
At any rate, cassandra.row_factory = dict_factory, when called on a user defined type, doesn't actually return everything as a dict. It gives a dict of {'label': message(x='this', y='that')}, where message is a namedtuple. The Cassandra driver dynamically creates the namedtuple inside of a class instance, and so pickle couldn't find it.
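Given that diagnosis, one way out (a sketch, not the poster's confirmed fix) is to flatten any namedtuple UDT values into plain dicts before the rows leave DBManager.get, using the namedtuple's standard _asdict() method:

def _plain_rows(rows):
    # Flatten dynamically created UDT namedtuples into plain dicts
    # so the task result stays picklable. (Nested UDTs would need
    # a recursive version of this.)
    return [
        {k: (v._asdict() if hasattr(v, '_asdict') else v)
         for k, v in row.items()}
        for row in rows
    ]

# ...then, at the end of DBManager.get:
# return _plain_rows(results.current_rows)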

How to lazily evaluate ORM call after fixtures are loaded into db in Django?

I've got a module pagetypes.py that extracts a couple of constants (I shouldn't really use the word constant here) from the db for later reuse:
def _get_page_type_(type):
    return PageType.objects.get(type=type)

PAGE_TYPE_MAIN = _get_page_type_('Main')
PAGE_TYPE_OTHER = _get_page_type_('Other')
then somewhere in views I do:
import pagetypes
...
print pagetypes.PAGE_TYPE_MAIN  # simplified
It all works fine when the db has those records, and I make sure it does... unless this code is under test. In that case I want to load those records into the db via fixtures. The problem with that is that fixtures are not loaded (even syncdb is not run) by the time the pagetypes module is imported, so the _get_page_type_ call fails with:
psycopg2.ProgrammingError: relation "pagetype" does not exist
The test runner always tries to import the pagetypes module, because it is imported by the view under test.
How do I get around this problem?
I was thinking of lazily loading the pagetype constants PAGE_TYPE_MAIN and PAGE_TYPE_OTHER, but I also want this to fail early if those records are not in the db (or in the fixtures, when under test), so I don't really know how to implement that.
I was also thinking of object-level caching, just calling PageType.objects.get(type=type) whenever a constant is used, but wouldn't that be overkill? Calling the ORM without a cache would result in too many db calls, which I want to prevent.
It must be something very simple, but I can't work it out. ;-)
I would use the functions instead of the constants, but memoize them:
_cache = {}

def get_page_type(type_name):
    if type_name not in _cache:
        _cache[type_name] = PageType.objects.get(type=type_name)
    return _cache[type_name]
So now you'd call get_page_type('Main') directly when necessary.
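On Python 3, the standard library's functools.lru_cache gives the same memoization without the hand-rolled dict (though the question's print statement suggests Python 2, where this isn't available):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_page_type(type_name):
    return PageType.objects.get(type=type_name)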
