I am using Pony ORM for managing an sqlite database in a python package I am developing.
I would like to use pytest for testing.
My package provides an "agent" object, which is used to connect to a server API and retrieve "events". On initialisation of the agent, the pony orm is setup and bound to a sqlite db, either in memory (for testing) or as a file.
def setup_db(filepath=None):
if filepath:
db.bind(provider="sqlite", filename=filepath, create_db=True)
else:
db.bind(provider="sqlite", filename=":memory:", create_db=True)
db.provider.converter_classes.append((Enum, EnumConverter))
db.generate_mapping(create_tables=True)
The state of the events are stored in an sqlite db using pony orm.
I wish to create a new agent object, with a clean database for each test, so I am using a pytest fixture in the conftest.py file.
#pytest.fixture
def agent():
agent=Agent(parm1="param1",...)
return agent
I am unable to correctly "unbind" from the database and get this error on my second test:
pony.orm.core.BindingError: Database object was already bound to SQLite
provider
I would like some advice on the best way to proceed.
Thanks.
I think for your case you should make some factory for entities and create new db objects for each setup.
def define_entities(db):
class Student(db.Entity):
...
class Group(db.Entity):
...
So then you can do something like
def setup_db(filepath=None):
db = Database()
if filepath:
db.bind(provider="sqlite", filename=filepath, create_db=True)
else:
db.bind(provider="sqlite", filename=":memory:", create_db=True)
define_entities(db)
db.provider.converter_classes.append((Enum, EnumConverter))
db.generate_mapping(create_tables=True)
Looking at Pony code it seems it should be enough to clear provider attribute of Database instance to make it fresh so it can be bind again.
If you yield Agent instead of returning it from your fixture, everything you put after yield statement will be run as fixture tear down code.
The previous answer from zgoda almost works. Besides db.provider, it is also necessary to clear db.schema. My suggestion is that you create another function:
def unbind_db():
db.provider = db.schema = None
And your fixture should be something like:
#fixture
def database() -> None:
setup_db()
...
try:
yield
finally:
unbind_db()
Related
I implemented scenario support for my pytest-based tests, and it works well.
However, one of my fixtures initialises the database with clean tables, and the second scenario runs the test with a dirty database. How can I get the database fixture re-initialised or refreshed in subsequent scenarios?
To be clear, I want to see this:
scenario 1
test_demo1 gets a fresh DB, and the test writes to the DB
test_demo2 does not re-init the DB, but sees the changes made by test_demo1
scenario 2
test_demo1 gets a fresh DB again, and the test writes to the DB
test_demo2 does not re-init the DB, but sees the changes made by test_demo1 only in scenario 2.
def pytest_runtest_setup(item):
if hasattr(item.cls, "scenarios") and "first" in item.keywords:
# what to do here?
#pytest.mark.usefixtures("db")
class TestSampleWithScenarios:
scenarios = [scenario1, scenario2]
#pytest.mark.first
def test_demo1(self, db):
# db is dirty here in scenario2
pass
def test_demo2(self, db):
pass
I'm currently digging through the pytest sources to find an answer and I will post here once I have something.
I've found a workaround. Have a regular test which uses the DB fixture, parametrize with the scenarios, and call the tests in the class directly:
#pytest.mark.parametrize(["scenario"], scenarios)
def test_sample_with_scenarios(db, scenario):
TestSampleWithScenarios().test_demo1(db, scenario)
TestSampleWithScenarios().test_demo2(db, scenario)
I'm experiencing some behavior that I can't quite figure out. I'm using Cassandra to store message objects, and I'm using Celery for async pulls and pushes to the database. Everything is working fine, except for a single Celery task; the other tasks that use the same code/classes work. Here's a rough breakdown of the code logic:
db_manager = DBManager()
class User(object):
def __init__(self, user_id):
... normal init stuff ...
self.loader()
#run_async
def loader(self):
... loads from database if found, otherwise pulls from API ...
# THIS WORKS
#celery.task(name='user-to-db', filter=task_method)
def to_db(self):
# db_manager is a custom backend that handles relevant db reads, writes, etc.
db_manager.add('users', self.user_payload)
# THIS WORKS
#celery.task(name='load-friends', filter=task_method)
def load_friends(self):
# Checks secondary redis index for friends of user
friends = redis.srandmember('users:the-users-id:friends', self.id, 20)
if not friends:
profiles = load_friends_from_api(user_id=self.id)
else:
query = "SELECT * FROM keyspace.users WHERE id IN ({friends})".format(friends=friends)
# Init a User object for every friend
loaded_friends = [User(friend) for friend in profiles]
# Returns a class container with all the instances of User(friend), accessible through a class property
return FriendContainer(self.id, loaded_friends)
# THIS DOES NOT WORK
#celery.task(name='get-user-messages', filter=task_method)
def get_user_messages(self):
# THIS IS WHERE IT FAILS #
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
# THAT LINE ABOVE #
# Init a message class object for every message payload in database
msgs = [Message(m, user=self) for m in messages]
# Returns a message container class holding all the message objects, accessible through a class property
return MessageContainer(msgs)
This last class method throws an error:
File "/usr/local/lib/python2.7/dist-packages/kombu/serialization.py", line 356, in pickle_dumps
return dumper(obj, protocol=pickle_protocol)
EncodeError: Can't pickle <class 'cassandra.io.eventletreactor.message'>: attribute lookup cassandra.io.eventletreactor.message failed
cassandra.io.eventletreactor.message points to a user-defined type in Cassandra that I use as a container for message objects per user. The line that throws this error is:
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
This is the method from DBManager():
class DBManager(object):
... stuff ...
def get(self, query):
# I do some stuff to prepare the query, namely substituting `WHERE this = that` for `WHERE this = ?` to create a Cassandra prepared statement.
statement = cassandra.prepare(query_prepared)
# I want these messages as a dict, not the default namedtuple
cassandra.row_factory = dict_factory
# User id is parsed out of query
results = cassandra.execute(statement, (user_id,))
rows = results.current_rows
# rows is a list of dicts, no weird class references or anything in there
return rows
I've read that Celery tasks out of class methods is/was kind of experimental, but I can't figure out why all the other methods qua tasks that use the same instance of DBManager are working.
The problem seems to be localized to some issue with the user-defined type message that's not playing nice within the Cassandra driver; however, if I run the get method from DBManager within the Celery task itself, it works. That is, if I copy/paste the code that is throwing the error from DBManager.get into User.get_user_messages, it works fine. If I try to call DBManager.get from within User.get_user_messages, it breaks.
I just can't figure out where the problem is. I can do all the following just fine:
Run the get_user_messages method without Celery, and it works.
Run the get_user_messages method WITH Celery if I run the get method code right in the Celery task method itself.
I can run other methods registered as Celery tasks that point to other methods in DBManager that use the Cassandra driver, even ones that insert the same message user-defined type into the database.
I've tried pickling ALL THE THINGS all the way down myself, and in various combinations, and can't reproduce the error.
What I have not tried:
Change serializer to json or yaml. There are a few convenience items in the db payload that won't serialize with either of those two.
Use dill instead of pickle. It seems like this should work without having to switch serializers given that I can get various parts working separately.
I could just say screw it and run the query directly through the Cassandra driver instead of my DBManager class, but I feel like this should be solvable and I'm just missing something really, really obvious, so obvious that I'm not seeing it. Any suggestions on where to look would be greatly appreciated.
In case of relevance: Cassandra 3.3, CQL 3.4, DataStax python driver 3.1
Meh, I found the problem, and it WAS really obvious. I guess I didn't actually try pickling all the things, just most of the things, and I didn't catch this in my 4am debugging stupor.
At any rate, cassandra.row_factory = dict_factory, when called on a user defined type, doesn't actually return everything as a dict. It gives a dict of {'label': message(x='this', y='that')}, where message is a namedtuple. The Cassandra driver dynamically creates the namedtuple inside of a class instance, and so pickle couldn't find it.
I am trying to implement a simple database program in python. I get to the point where I have added elements to the db, changed the values, etc.
class db:
def __init__(self):
self.database ={}
def dbset(self, name, value):
self.database[name]=value
def dbunset(self, name):
self.dbset(name, 'NULL')
def dbnumequalto(self, value):
mylist = [v for k,v in self.database.items() if v==value]
return mylist
def main():
mydb=db()
cmd=raw_input().rstrip().split(" ")
while cmd[0]!='end':
if cmd[0]=='set':
mydb.dbset(cmd[1], cmd[2])
elif cmd[0]=='unset':
mydb.dbunset(cmd[1])
elif cmd[0]=='numequalto':
print len(mydb.dbnumequalto(cmd[1]))
elif cmd[0]=='list':
print mydb.database
cmd=raw_input().rstrip().split(" ")
if __name__=='__main__':
main()
Now, as a next step I want to be able to do nested transactions within this python code.I begin a set of commands with BEGIN command and then commit them with COMMIT statement. A commit should commit all the transactions that began. However, a rollback should revert the changes back to the recent BEGIN. I am not able to come up with a suitable solution for this.
A simple approach is to keep a "transaction" list containing all the information you need to be able to roll-back pending changes:
def dbset(self, name, value):
self.transaction.append((name, self.database.get(name)))
self.database[name]=value
def rollback(self):
# undo all changes
while self.transaction:
name, old_value = self.transaction.pop()
self.database[name] = old_value
def commit(self):
# everything went fine, drop undo information
self.transaction = []
If you are doing this as an academic exercise, you might want to check out the Rudimentary Database Engine recipe on the Python Cookbook. It includes quite a few classes to facilitate what you might expect from a SQL engine.
Database is used to create database instances without transaction support.
Database2 inherits from Database and provides for table transactions.
Table implements database tables along with various possible interactions.
Several other classes act as utilities to support some database actions that would normally be supported.
Like and NotLike implement the LIKE operator found in other engines.
date and datetime are special data types usable for database columns.
DatePart, MID, and FORMAT allow information selection in some cases.
In addition to the classes, there are functions for JOIN operations along with tests / demonstrations.
This is all available for free in the built in sqllite module. The commits and rollbacks for sqllite are discussed in more detail than I can understand here
there's something I'm struggling to understand with SQLAlchamy from it's documentation and tutorials.
I see how to autoload classes from a DB table, and I see how to design a class and create from it (declaratively or using the mapper()) a table that is added to the DB.
My question is how does one write code that both creates the table (e.g. on first run) and then reuses it?
I don't want to have to create the database with one tool or one piece of code and have separate code to use the database.
Thanks in advance,
Peter
create_all() does not do anything if a table exists already, so just call it as soon as you set up your engine or connection.
(Note that if you change your table schema, create_all() will not update it! So you still need "another program" to do that.)
This is the usual pattern:
def createEngine(metadata, dsn, **args):
engine = create_engine(dsn, **args)
metadata.create_all(engine)
return engine
def doStuff(engine):
res = engine.execute('select * from mytable')
# etc etc
def main():
engine = createEngine(metadata, 'sqlite:///:memory:')
doStuff(engine)
if __name__=='__main__':
main()
I think you're perhaps over-thinking the situation. If you want to create the database afresh, you normally just call Base.metadata.create_all() or equivalent, and if you don't want to do that, you don't call it.
You could try calling it every time and handling the exception if it goes wrong, assuming that the database is already set up.
Or you could try querying for a certain table and if that fails, call create_all() to put everything in place.
Every other part of your app should work in the same way whether you perform the db creation or not.
In my application, there is a class for each model that holds commonly used queries (I guess it's somewhat of a "Repository" in DDD language). Each of these classes is passed the SQLAlchemy session object to create queries with upon construction. I'm having a little difficulty in figuring the best way to assert certain queries are being run in my unit tests. Using the ubiquitous blog example, let's say I have a "Post" model with columns and attributes "date" and "content". I also have a "PostRepository" with the method "find_latest" that is supposed to query for all posts in descending order by "date". It looks something like:
from myapp.models import Post
class PostRepository(object):
def __init__(self, session):
self._s = session
def find_latest(self):
return self._s.query(Post).order_by(Post.date.desc())
I'm having trouble mocking the Post.date.desc() call. Right now I'm monkey patching a mock in for Post.date.desc in my unit test, but I feel that there is likely a better approach.
Edit: I'm using mox for mock objects, my current unit test looks something like:
import unittest
import mox
class TestPostRepository(unittest.TestCase):
def setUp(self):
self._mox = mox.Mox()
def _create_session_mock(self):
from sqlalchemy.orm.session import Session
return self._mox.CreateMock(Session)
def _create_query_mock(self):
from sqlalchemy.orm.query import Query
return self._mox.CreateMock(Query)
def _create_desc_mock(self):
from myapp.models import Post
return self._mox.CreateMock(Post.date.desc)
def test_find_latest(self):
from myapp.models.repositories import PostRepository
from myapp.models import Post
expected_result = 'test'
session_mock = self._create_session_mock()
query_mock = self._create_query_mock()
desc_mock = self._create_desc_mock()
# Monkey patch
tmp = Post.date.desc
Post.date.desc = desc_mock
session_mock.query(Post).AndReturn(query_mock)
query_mock.order_by(Post.date.desc().AndReturn('test')).AndReturn(query_mock)
query_mock.offset(0).AndReturn(query_mock)
query_mock.limit(10).AndReturn(expected_result)
self._mox.ReplayAll()
r = PostRepository(session_mock)
result = r.find_latest()
self._mox.VerifyAll()
self.assertEquals(expected_result, result)
Post.date.desc = tmp
This does work, though feels ugly and I'm not sure why it fails without the "AndReturn('test')" piece of "Post.date.desc().AndReturn('test')"
I don't think you're really gaining much benefit by using mocks for testing your queries. Testing should be testing the logic of the code, not the implementation. A better solution would be to create a fresh database, add some objects to it, run the query on that database, and determine if you're getting the correct results back. For example:
# Create the engine. This starts a fresh database
engine = create_engine('sqlite://')
# Fills the database with the tables needed.
# If you use declarative, then the metadata for your tables can be found using Base.metadata
metadata.create_all(engine)
# Create a session to this database
session = sessionmaker(bind=engine)()
# Create some posts using the session and commit them
...
# Test your repository object...
repo = PostRepository(session)
results = repo.find_latest()
# Run your assertions of results
...
Now, you're actually testing the logic of the code. This means that you can change the implementation of your method, but so long as the query works correctly, the tests should still pass. If you want, you could write this method as a query that gets all the objects, then slices the resulting list. The test would pass, as it should. Later on, you could change the implementation to run the query using the SA expression APIs, and the test would pass.
One thing to keep in mind is that you might have problems with sqlite behaving differently than another database type. Using sqlite in-memory gives you fast tests, but if you want to be serious about these tests, you'll probably want to run them against the same type of database you'll be using in production as well.
If yet you want to create a unit test with mock input, you can create instances of your model with fake data
In case that the result proxy return result with data from more than one of the models (for instance when you join two tables), you can use collections data struct called namedtuple
We are using it to mock results of join queries