PyTest: Django transaction commit failure

I am using pytest to write unit tests for my Django project, which uses MySQL as its backend.
Alongside these I use SQLAlchemy for data generation.
I have a Python function, call_my_flow(), which executes one of two flows depending on conditions. The first flow uses a SQLAlchemy connection and the second flow uses the Django connection for database inserts.
I have written two pytest unit tests, one for each flow.
First flow (SQLAlchemy connection): the flow's transaction is committed to the database and the test passes as expected.
Second flow (Django database connection): the transaction commit fails, so the test fails.
Demo code:
import pytest
from myflow import call_my_flow


@pytest.fixture(scope="class")
@pytest.mark.django_db(transaction=False)
def setup_my_flow():
    call_my_flow()


@pytest.mark.usefixtures("setup_my_flow")
class TestGenerateOrder(object):
    @pytest.fixture(autouse=True)
    def setuporder(self):
        self.first_count = 2
        self.second_count = 5

    @pytest.mark.order1
    @pytest.mark.django_db
    def test_first_flow_count(self):
        db_count = get_first_count()
        assert db_count == self.first_count

    @pytest.mark.order2
    @pytest.mark.django_db
    def test_second_flow_count(self):
        db_count = get_second_count()
        assert db_count == self.second_count
Please suggest a solution on the same.

Related

How to run Spark unit testing in parallel via pytest (and fixture)?

I am writing unit tests for a Spark application. I am using pytest and have created a fixture to load the Spark session once.
When I run one test at a time, each passes, but when I run all the tests together I get unexpected behavior. Then I realized that Spark is not multi-threadable. Is there any way to fix this? Is running pytest in non-parallel mode the only solution?
Sample code structure,
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark() -> SparkSession:
    builder = SparkSession.builder.appName("pandas-on-spark")
    builder = builder.config("spark.sql.execution.arrow.pyspark.enabled", "true")
    return builder.getOrCreate()


def test1(spark):
    df = spark.createDataFrame(dummy_rows)
    # do some transformation
    # assert


def test2(spark):
    df = spark.createDataFrame(dummy_rows)
    # do some transformation
    # assert


def testN(spark):
    df = spark.createDataFrame(dummy_rows)
    # do some transformation
    # assert

pytest -s .
With scope="session", a single Spark session is shared by all the tests, which means all variables, caches, and transformations are shared as well. If you really need each test's transformations to be completely isolated, consider creating a new Spark session for each test by lowering the fixture scope to class or function. The whole test suite will run slower, but the tests will no longer interfere with each other.

Datastore delay on creating entities with put()

I am developing an application using the Cloud Datastore Emulator (2.1.0) and the google-cloud-ndb Python library (1.6).
I find that there is an intermittent delay before entities become retrievable via a query.
For example, if I create an entity like this:
my_entity = MyEntity(foo='bar')
my_entity.put()
get_my_entity = MyEntity.query().filter(MyEntity.foo == 'bar').get()
print(get_my_entity.foo)
it will fail intermittently because the get() method returns None.
This only happens on about 1 in 10 calls.
To demonstrate, I've created this script (also available with ready to run docker-compose setup on GitHub):
import random
from google.cloud import ndb
from google.auth.credentials import AnonymousCredentials

client = ndb.Client(
    credentials=AnonymousCredentials(),
    project='local-dev',
)


class SampleModel(ndb.Model):
    """Sample model."""
    some_val = ndb.StringProperty()


for x in range(1, 1000):
    print(f'Attempt {x}')
    with client.context():
        random_text = str(random.randint(0, 9999999999))
        new_model = SampleModel(some_val=random_text)
        new_model.put()
        retrieved_model = SampleModel.query().filter(
            SampleModel.some_val == random_text
        ).get()
        print(f'Model Text: {retrieved_model.some_val}')
What would be the correct way to avoid this intermittent failure? Is there a way to ensure the entity is always available after the put() call?
Update
I can confirm that this is only an issue with the Datastore emulator. When testing on App Engine with Firestore in Datastore mode, entities are available immediately after calling put().
The issue turned out to be related to the emulator trying to replicate eventual consistency.
Unlike relational databases, Datastore has historically not guaranteed that data is available to queries immediately after it is written, because of replication and indexing delays. The emulator imitates that behavior.
For things like unit tests, this can be resolved by passing --consistency=1.0 to the datastore start command, as described in the emulator documentation.
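Where the emulator flag cannot be changed, one alternative is to retry the query for a short time. The following is only a minimal sketch of that idea in plain Python: wait_for and flaky_query are hypothetical names introduced here, not part of google-cloud-ndb, and flaky_query merely simulates a query that returns None twice before the entity becomes visible.

```python
import time


def wait_for(fetch, attempts=5, delay=0.05):
    """Call fetch() until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    return None


# Stand-in for an eventually consistent query (hypothetical):
# it returns None twice and then the stored value.
_responses = iter([None, None, "bar"])


def flaky_query():
    return next(_responses)


found = wait_for(flaky_query, delay=0.001)
print(found)
```

With ndb, fetch would presumably be a small function wrapping the SampleModel.query().filter(...).get() call. Retrying only hides the delay, so the --consistency=1.0 setting remains the simpler option for tests.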

Py.Test parametrizing based on parametrized fixture

I have a class-scoped parametrized fixture that takes 3 databases as its params and returns a connection to each one.
Tests in a class use this fixture to test each DB connection's attributes.
Now I have a new class of database table tests that should use the above fixture but also be parametrized over each connection's tables.
Any suggestions on the pytest way to implement this? I can't find a way to parametrize based on an already parametrized element.
Thanks
Test classes are used to:
provide setup and teardown functions for test cases
share some common values during testing
With pytest this is not necessary, as setup and teardown can be done at the fixture level.
For this reason my solution does not use classes (but it could probably be used with them).
To show that the (fake) connections are created and then closed, watch the output on stdout. The trick is
to use @pytest.yield_fixture, which uses yield instead of return to provide the value
injected into the test case. Whatever follows the first yield statement is executed
as teardown code. (Since pytest 3.0, a plain @pytest.fixture supports yield as well.)
"rectangle" style: M x N test runs by two parametrized fixtures
The first case is natural to py.test, where all fixture variants are combined.
As it has M x N test case runs, I call it "rectangle".
My tests are in tests/test_it.py:
import pytest


@pytest.yield_fixture(scope="class", params=["mysql", "pgsql", "firebird"])
def db_connect(request):
    print("\nopening db")
    yield request.param
    print("closing db")


@pytest.fixture(scope="class", params=["user", "groups"])
def table_name(request):
    return request.param


def test_it(db_connect, table_name):
    print("Testing: {} + {}".format(db_connect, table_name))
If you need more test cases like test_it, just create them with another name.
Running my test case:
$ py.test -sv tests
========================================= test session starts =========================================
platform linux2 -- Python 2.7.9 -- py-1.4.30 -- pytest-2.7.2 -- /home/javl/.virtualenvs/stack/bin/python2
rootdir: /home/javl/sandbox/stack/tests, inifile:
collected 6 items
tests/test_it.py::test_it[mysql-user]
opening db
Testing: mysql + user
PASSEDclosing db
tests/test_it.py::test_it[pgsql-user]
opening db
Testing: pgsql + user
PASSEDclosing db
tests/test_it.py::test_it[pgsql-groups]
opening db
Testing: pgsql + groups
PASSEDclosing db
tests/test_it.py::test_it[mysql-groups]
opening db
Testing: mysql + groups
PASSEDclosing db
tests/test_it.py::test_it[firebird-groups]
opening db
Testing: firebird + groups
PASSEDclosing db
tests/test_it.py::test_it[firebird-user]
opening db
Testing: firebird + user
PASSEDclosing db
====================================== 6 passed in 0.01 seconds =======================================
"Exploding triangles" from one fixture to N dependent fixtures
The idea is as follows:
generate a couple of db_connect fixtures using a parametrized fixture
for each db_connect, generate N variants of table_name fixtures
have test_it(db_connect, table_name) called only for proper combinations of db_connect and
table_name.
This simply does not work.
The only solution is to use some sort of scenarios that explicitly define which combinations are
correct.
"Scenarios": indirect parametrization at test function level
Instead of parametrizing fixtures, we have to parametrize test function.
Usually, the parameter value is passed directly to the test function as is. If we want a fixture (with the same
name as the parameter) to take care of creating the value to use, we have to specify the parameter
as indirect. If we say indirect=True, all parameters are treated this way. If we provide a
list of parameter names, only the listed parameters are passed into fixtures and the rest go
into the test function as they are. Here I use an explicit list of indirect arguments.
import pytest

DBCFG = {"pgsql": "postgresql://scott:tiger@localhost:5432/mydatabaser",
         "mysql": "mysql://scott:tiger@localhost/foo",
         "oracle": "oracle://scott:tiger@127.0.0.1:1521/sidname"
         }


@pytest.yield_fixture(scope="session")
def db_connect(request):
    connect_name = request.param
    print("\nopening db {connect_name}".format(connect_name=connect_name))
    assert connect_name in DBCFG
    yield DBCFG[connect_name]
    print("\nclosing db {connect_name}".format(connect_name=connect_name))


@pytest.fixture(scope="session")
def table_name(request):
    return "tabname-by-fixture {request.param}".format(request=request)


scenarios = [
    ("mysql", "myslq-user"),
    ("mysql", "myslq-groups"),
    ("pgsql", "pgsql-user"),
    ("pgsql", "pgsql-groups"),
    ("oracle", "oracle-user"),
    ("oracle", "oracle-groups"),
]


@pytest.mark.parametrize("db_connect,table_name",
                         scenarios,
                         indirect=["db_connect", "table_name"])
def test_it(db_connect, table_name):
    print("Testing: {} + {}".format(db_connect, table_name))
Running the test suite:
$ py.test -sv tests/test_indirect.py
========================================= test session starts =========================================
platform linux2 -- Python 2.7.9, pytest-2.8.7, py-1.4.31, pluggy-0.3.1 -- /home/javl/.virtualenvs/stack/bin/python2
cachedir: tests/.cache
rootdir: /home/javl/sandbox/stack/tests, inifile:
collected 6 items
tests/test_indirect.py::test_it[mysql-myslq-user]
opening db mysql
Testing: mysql://scott:tiger@localhost/foo + tabname-by-fixture myslq-user
PASSED
closing db mysql
tests/test_indirect.py::test_it[mysql-myslq-groups]
opening db mysql
Testing: mysql://scott:tiger@localhost/foo + tabname-by-fixture myslq-groups
PASSED
closing db mysql
tests/test_indirect.py::test_it[pgsql-pgsql-user]
opening db pgsql
Testing: postgresql://scott:tiger@localhost:5432/mydatabaser + tabname-by-fixture pgsql-user
PASSED
closing db pgsql
tests/test_indirect.py::test_it[pgsql-pgsql-groups]
opening db pgsql
Testing: postgresql://scott:tiger@localhost:5432/mydatabaser + tabname-by-fixture pgsql-groups
PASSED
closing db pgsql
tests/test_indirect.py::test_it[oracle-oracle-user]
opening db oracle
Testing: oracle://scott:tiger@127.0.0.1:1521/sidname + tabname-by-fixture oracle-user
PASSED
closing db oracle
tests/test_indirect.py::test_it[oracle-oracle-groups]
opening db oracle
Testing: oracle://scott:tiger@127.0.0.1:1521/sidname + tabname-by-fixture oracle-groups
PASSED
closing db oracle
====================================== 6 passed in 0.01 seconds =======================================
We see that it works.
There is one small issue, though: the "session" scope of db_connect is not honored, and the fixture is
instantiated and destroyed at function level. This is a known issue.
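A hand-written scenario list gets tedious when each connection has a different set of tables. As a rough sketch, the list could be generated from a mapping instead. TABLES_BY_DB and build_scenarios are hypothetical names, and the table names are made up for illustration.

```python
# Hypothetical mapping of each connection name to the tables that exist on it.
TABLES_BY_DB = {
    "mysql": ["user", "groups"],
    "pgsql": ["user", "groups", "audit"],
    "oracle": ["user"],
}


def build_scenarios(tables_by_db):
    """Return (db_name, table_name) pairs for valid combinations only."""
    return [
        (db_name, table)
        for db_name, tables in tables_by_db.items()
        for table in tables
    ]


scenarios = build_scenarios(TABLES_BY_DB)
for scenario in scenarios:
    print(scenario)
```

The resulting list has the same shape as a hand-written scenario list, so it can be passed as the second argument of pytest.mark.parametrize with the same indirect setting.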

Django TestCase not saving my models

I'm currently writing some tests for a Django app. I've got the following standalone function in my app's signals.py file:
def updateLeaveCounts():
    # Setting some variables here
    todaysPeriods = Period.objects.filter(end__lte=today_end, end__gte=today_start).filter(request__leavetype="AN")
    for period in todaysPeriods:
        print period
        counter = LeaveCounter.objects.get(pk=period.request.submitter)
        # some Logic here
        period.batch_processed = True
        period.save()
and in my TestCase, I'm calling it as follows:
def test_johnsPostLeaveCounters(self):
    # Some setup here
    p = Period.objects.create(
        request=request,
        start=datetime(today.year, today.month, today.day, 9),
        end=datetime(today.year, today.month, today.day, 16, 30),
        length='whole',
    )
    updateLeaveCounts()
    self.assertEqual(p.batch_processed, True)
updateLeaveCounts() is catching my newly created Period object in the for loop (I can see its details printed to the console by print period), but my assertEqual() test is failing - telling me that the batch_processed attribute is still False.
It's as if the period.save() transaction isn't being called.
I'm aware that in Django versions prior to 1.8, you'd have to use the TransactionTestCase class, but I'm running 1.8.3 for this project at the moment, so I don't believe that's the issue.
Is there something I need to do to have the TestCases correctly reflect the model.save() action I'm performing in this function and hence have this function covered by tests?
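Try refresh_from_db. The p in your test is a different Python object from the period instance that updateLeaveCounts() loads and saves, so its attributes stay stale until you reload them from the database: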
# ...
updateLeaveCounts()
p.refresh_from_db()
self.assertEqual(p.batch_processed, True)
# ...

Can I make test dummies for every kind of service for an integrated app?

I have a fairly complex app that uses celery, mongo, redis and pyramid. I use nose for testing. I'm not doing TDD (not test-first, at least), but I am trying very hard to get a decent amount of coverage. I'm stuck in the parts that are integrated with some of the above services. For example, I'm using redis for shared memory between celery tasks, but I'd like to be able to switch to memcache without too much trouble, so I've abstracted out the following functions:
from pickle import dumps, loads  # assumed source of dumps/loads

from redis import StrictRedis

import settings

db = StrictRedis(host=settings.db_uri, db=settings.db_name)


def has_task(correlation_id):
    """Return True if a task exists in db."""
    return db.exists(str(correlation_id))


def pop_task(correlation_id):
    """Get a task from db by correlation_id."""
    correlation_id = str(correlation_id)  # no unicode allowed
    task_pickle = db.get(correlation_id)
    # db.get() returns None for a missing key; loads(None) would raise
    task = loads(task_pickle) if task_pickle is not None else None
    if task:
        db.delete(correlation_id)
    return task


def add_task(correlation_id, task_id, model):
    """Add a task to db"""
    return db.set(str(correlation_id), dumps((task_id, model)))
I'm also doing similar things to abstract Mongo, which I'm using for persistent storage.
I've seen test suites for integrated web apps that run dummy http servers, create dummy requests and even dummy databases. I'm OK for celery and pyramid, but I haven't been able to find dummies for mongo and redis, so I'm only able to run tests for the above when those services are actually running. Is there any way to provide dummy services for the above so I don't have:
to have external services installed and running, and
to manually create and destroy entire databases (in-memory dummies can be counted on to cleanup after themselves)
I would suggest you use the mock library for such tasks. This allows you to replace your production objects (for example the database connection) with some pseudo objects which could be provided with some functionality needed for testing.
Example:
>>> from mock import Mock
>>> db = Mock()
>>> db.exists.return_value = True
>>> db.exists()
True
You can make assertions about how your code interacts with the mock, for example:
>>> db.delete(1)
<Mock name='mock.delete()' id='37588880'>
>>> db.delete.assert_called_with(1)
>>> db.delete.assert_called_with(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\mock.py", line 863, in assert_called_with
    raise AssertionError(msg)
AssertionError: Expected call: delete(2)
Actual call: delete(1)
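Applied to the task helpers from the question, a test might look roughly like the sketch below. It assumes a variant of pop_task that receives the client as an argument, which is a change from the module-level db in the question, and it uses unittest.mock from the Python 3 standard library instead of the separate mock package.

```python
import pickle
from unittest.mock import Mock


def pop_task(db, correlation_id):
    """Variant of pop_task that takes the client as an argument (an assumption)."""
    key = str(correlation_id)
    task_pickle = db.get(key)
    if task_pickle is None:
        return None
    db.delete(key)
    return pickle.loads(task_pickle)


# Fake client: get() returns a pickled (task_id, model) tuple.
fake_db = Mock()
fake_db.get.return_value = pickle.dumps(("task-1", "model-a"))

task = pop_task(fake_db, 42)
print(task)

# Verify how the code under test used the client.
fake_db.get.assert_called_once_with("42")
fake_db.delete.assert_called_once_with("42")
```

Passing the client in keeps the test free of patching. With the module-level db from the question, unittest.mock.patch could be used to swap the object for the duration of a test instead.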
