In my application I'm using SQLAlchemy for storing most persistent data across app restarts. For this I have a db package containing my mapper classes (like Tag, Group etc.) and a support class creating a single engine instance using create_engine and a single, global, Session factory using sessionmaker.
Now my understanding of how to use SQLAlchemy's sessions is that I don't pass them around in my app, but rather create instances using the global factory whenever I need database access.
This leads to situations where a record is queried in one session and then passed on to another part of the app, which uses a different session instance. This gives me exceptions like this one:
Traceback (most recent call last):
  File "…", line 29, in delete
    session.delete(self.record)
  File "/usr/lib/python3.3/site-packages/sqlalchemy/orm/session.py", line 1444, in delete
    self._attach(state, include_before=True)
  File "/usr/lib/python3.3/site-packages/sqlalchemy/orm/session.py", line 1748, in _attach
    state.session_id, self.hash_key))
sqlalchemy.exc.InvalidRequestError: Object '<Group at 0x7fb64c7b3f90>' is already attached to session '1' (this is '3')
Now my question is: did I get the usage of Session completely wrong (so I should use only one session at a time and pass it around to other components together with records from the database), or could this result from an actual code issue?
Some example code demonstrating my exact problem:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = "record"

    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<%s('%s')>" % (type(self).__name__, self.name)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

s1 = Session()
record = Record("foobar")
s1.add(record)
s1.commit()

# This would be a completely different part of the app
s2 = Session()
record = s2.query(Record).filter(Record.name == "foobar").first()

def delete_record(record):
    session = Session()
    session.delete(record)
    session.commit()

delete_record(record)
For now I have switched over to using a single, global session instance. That's neither nice nor clean in my opinion, but adding lots and lots of boilerplate code to expunge objects from one session just to add them back to their original session after handing them over to some other part of the application was no realistic option, either.
I suppose this will completely blow up if I start using multiple threads to access the database via the very same session…
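One alternative to a global session that sidesteps the exception is Session.merge(), which copies an object's state into an instance owned by the current session, so the new session can operate on it safely. A minimal sketch, rebuilt from the example above (the model is trimmed down for brevity):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = "record"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

s1 = Session()
record = Record(name="foobar")
s1.add(record)
s1.commit()

def delete_record(record):
    session = Session()
    # merge() returns a copy of the object attached to *this* session,
    # so the delete no longer conflicts with the object's home session.
    session.delete(session.merge(record))
    session.commit()

delete_record(record)  # no InvalidRequestError this time
```

merge() does an extra lookup (and possibly a SELECT) to reconcile the object with the target session, so it isn't free, but it keeps the "one short-lived session per unit of work" style intact.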
Related
I am using Flask-SQLAlchemy, with autocommit set to False and autoflush set to True. It's connecting to a MySQL database.
I have 3 methods like this:
def insert_something():
    insert_statement = <something>
    db.session.execute(insert_statement)
    db.session.commit()

def delete_something():
    delete_statement = <something>
    db.session.execute(delete_statement)
    db.session.commit()

def delete_something_else():
    delete_statement = <something>
    db.session.execute(delete_statement)
    db.session.commit()
Sometimes I want to run these methods individually; no problems there — but sometimes I want to run them together in a nested transaction. I want insert_something to run first, and delete_something to run afterwards, and delete_something_else to run last. If any of those methods fail then I want everything to be rolled back.
I've tried the following:
db.session.begin_nested()
insert_something()
delete_something()
delete_something_else()
db.session.commit()
This doesn't work, though, because insert_something exits the nested transaction (and releases the savepoint). Then, when delete_something runs db.session.commit(), it actually commits the deletion to the database, because that commit now happens in the outermost transaction.
That final db.session.commit() in the code block above doesn't do anything; everything is already committed by that point.
Maybe I can do something like this, but it's ugly as hell:
db.session.begin_nested()
db.session.begin_nested()
db.session.begin_nested()
db.session.begin_nested()
insert_something()
delete_something()
delete_something_else()
db.session.commit()
There's got to be a better way to do it without touching the three methods.
Edit:
Now I'm doing it like this:
with db.session.begin_nested():
    insert_something()
    with db.session.begin_nested():
        delete_something()
        with db.session.begin_nested():
            delete_something_else()
db.session.commit()
Which is better, but still not great.
I'd love to be able to do something like this:
with db.session.begin_nested() as nested:
    insert_something()
    delete_something()
    delete_something_else()
    nested.commit()  # though I feel like you shouldn't need this in a with block
The docs discuss avoiding this pattern in arbitrary-transaction-nesting-as-an-antipattern and session-faq-whentocreate.
There is, however, an example in the docs that is similar to this, although it is intended for test suites:
https://docs.sqlalchemy.org/en/14/orm/session_transaction.html?highlight=after_transaction_end#joining-a-session-into-an-external-transaction-such-as-for-test-suites
Regardless, here is a gross transaction manager based on that example which "seems" to work, but don't do this. I think there are a lot of gotchas in here.
import contextlib

from sqlalchemy import (
    create_engine,
    Integer,
    String,
)
from sqlalchemy.schema import (
    Column,
    MetaData,
)
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy import event
from sqlalchemy.sql import delete, select

db_uri = 'postgresql+psycopg2://username:password@/database'
engine = create_engine(db_uri, echo=True)

metadata = MetaData()
Base = declarative_base(metadata=metadata)

class Device(Base):
    __tablename__ = "devices"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(50))

def get_devices(session):
    return [d.name for (d,) in session.execute(select(Device)).all()]

def create_device(session, name):
    session.add(Device(name=name))
    session.commit()

def delete_device(session, name):
    session.execute(delete(Device).filter(Device.name == name))
    session.commit()

def almost_create_device(session, name):
    session.add(Device(name=name))
    session.flush()
    session.rollback()

@contextlib.contextmanager
def force_nested_transaction_forever(session, commit_on_complete=True):
    """
    Keep re-entering a nested transaction every time a transaction ends.
    """
    d = {
        'nested': session.begin_nested()
    }

    @event.listens_for(session, "after_transaction_end")
    def end_savepoint(session, transaction):
        # Start another nested transaction if the prior one is no longer active.
        if not d['nested'].is_active:
            d['nested'] = session.begin_nested()

    try:
        yield
    finally:
        # Stop trapping us in perpetual nested transactions.
        # Is this the right place for this?
        event.remove(session, "after_transaction_end", end_savepoint)
        # This seems like it would be error prone.
        if commit_on_complete and d['nested'].is_active:
            d.pop('nested').commit()

if __name__ == '__main__':
    metadata.create_all(engine)

    with Session(engine) as session:
        with session.begin():
            # THIS IS NOT RECOMMENDED
            with force_nested_transaction_forever(session):
                create_device(session, "0")
                create_device(session, "a")
                delete_device(session, "a")
                almost_create_device(session, "a")
                create_device(session, "b")
            assert len(get_devices(session)) == 2
        assert len(get_devices(session)) == 2
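For contrast, the pattern those docs sections do recommend is to move transaction control out of the inner functions entirely: the functions only flush, and a single caller commits or rolls back for the whole group. That does mean touching the three methods, which the question wanted to avoid, but it removes the need for savepoints. A minimal sketch with plain SQLAlchemy (model and names are simplified stand-ins):

```python
from sqlalchemy import create_engine, Column, Integer, String, delete
from sqlalchemy.orm import sessionmaker, declarative_base

Base = declarative_base()

class Device(Base):
    __tablename__ = "devices"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def insert_something(session):
    session.add(Device(name="a"))
    session.flush()  # emit the SQL now, but leave transaction control to the caller

def delete_something(session):
    session.execute(delete(Device).where(Device.name == "a"))

session = Session()
try:
    insert_something(session)
    delete_something(session)
    session.commit()  # one commit for the whole group: all or nothing
except Exception:
    session.rollback()
    raise
```

If any step raises, the single rollback undoes everything, which is exactly the all-or-nothing behavior the savepoint gymnastics were trying to simulate.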
According to SQLAlchemy documentation, engine and sessionmaker instances should be created in the application's global scope:
When do I make a sessionmaker? Just one time, somewhere in your application’s global scope. It should be looked upon as part of your application’s configuration. If your application has three .py files in a package, you could, for example, place the sessionmaker line in your __init__.py file; from that point on your other modules say “from mypackage import Session”. That way, everyone else just uses Session(), and the configuration of that session is controlled by that central point.
Questions:
What is the best practice for cleaning up SQLAlchemy engine and sessionmaker instances? Please refer to my example below: while I could call engine.dispose() in main.py, it does not seem good practice to clean up a global object from a different module (database.py) in __main__ (main.py). Is there a better way to do it?
Do we need to clean up sessionmaker instances? It seems there is no method for closing a sessionmaker instance (sessionmaker.close_all() is deprecated, and close_all_sessions() is a module-level function in sqlalchemy.orm, not a sessionmaker method).
Example:
I created the engine and sessionmaker object in a module called database.py:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from contextlib import contextmanager

DB_ENGINE = create_engine(DB_CONNECTION_STRING, pool_size=5, max_overflow=10)
DB_SESSION = sessionmaker(bind=DB_ENGINE, autocommit=False, autoflush=True, expire_on_commit=False)

@contextmanager
def db_session(db_session_factory):
    """Provide a transactional scope around a series of operations."""
    session = db_session_factory()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()
In my main application main.py, I import the module and use the engine and sessionmaker instances as follows. I cleaned up the engine instance at the end of __main__.
from multiprocessing.pool import ThreadPool

from database import DB_ENGINE, DB_SESSION, db_session

def worker_func(data):
    with db_session(DB_SESSION) as session:
        [...database operations using session object...]

if __name__ == '__main__':
    try:
        data = [1, 2, 3, 4, 5]
        with ThreadPool(processes=5) as thread_pool:
            results = thread_pool.map(worker_func, data)
    except:
        raise
    finally:
        # Cleanup
        DB_ENGINE.dispose()
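One way to keep the cleanup inside database.py itself is to register the engine disposal with atexit, so __main__ never has to reach into another module's globals. A sketch, with a SQLite URL standing in for the real DB_CONNECTION_STRING and the pool arguments omitted (SQLite's default pool does not take max_overflow); note that sessionmaker itself holds no resources, so only sessions (closed by the context manager) and the engine's connection pool need cleanup:

```python
import atexit

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# SQLite URL is a stand-in for the real DB_CONNECTION_STRING.
DB_ENGINE = create_engine("sqlite://")
DB_SESSION = sessionmaker(bind=DB_ENGINE, expire_on_commit=False)

# Dispose of the connection pool when the interpreter shuts down,
# keeping the cleanup responsibility inside this module.
atexit.register(DB_ENGINE.dispose)
```

With this in place, the try/finally around the thread pool in main.py can drop the explicit DB_ENGINE.dispose() call.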
I would like to create a number of functions which start by calling one particular function, and end by calling another.
Each function would take a different number of arguments, but they would share the first and last line. Is this possible?
To give you an example, I am trying to use this to create a set of functions which can connect to my database via sqlalchemy, add an entry to it, and exit nicely:
from sqlalchemy import create_engine
from os import path
from common_classes import *
from sqlalchemy.orm import sessionmaker
def loadSession():
    db_path = "sqlite:///" + path.expanduser("~/animal_data.db")
    engine = create_engine(db_path, echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()
    Base.metadata.create_all(engine)
    return session, engine

def add_animal(id_eth, cage_eth, sex, ear_punches, id_uzh="", cage_uzh=""):
    session, engine = loadSession()
    new_animal = Animal(id_eth=id_eth, cage_eth=cage_eth, sex=sex, ear_punches=ear_punches, id_uzh=id_uzh, cage_uzh=cage_uzh)
    session.add(new_animal)
    commit_and_close(session, engine)

def add_genotype(name, zygosity):
    session, engine = loadSession()
    new_genotype = Genotype(name=name, zygosity=zygosity)
    session.add(new_genotype)
    commit_and_close(session, engine)

def commit_and_close(session, engine):
    session.commit()
    session.close()
    engine.dispose()
Again, what I am trying to do is collapse add_animal() and add_genotype() (and prospectively many more functions) into a single constructor.
I have thought maybe I can use a class for this, and while I believe loadSession() could be called from __init__, I have no idea how to call commit_and_close() at the end, nor how to manage the variable number of arguments of every subclass...
Instead of having add_X functions for every type X, just create a single add function that adds an object which you create on the “outside” of the function:
So add_animal(params…) becomes add(Animal(params…)), and add_genotype(params…) becomes add(Genotype(params…)).
That way, your add function would just look like this:
def add(obj):
    session, engine = loadSession()
    session.add(obj)
    commit_and_close(session, engine)
Then it’s up to the caller of that function to create the object, which opens up the interface and allows you to get objects from elsewhere too. E.g. something like this would be possible too then:
for animal in zoo.getAnimals():
    add(animal)
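Another option that keeps the per-type argument lists is a decorator: it runs the shared first line (opening the session), hands the session to the wrapped function, and finishes with the shared last lines (commit and close). A sketch with a simplified, hypothetical Animal model standing in for the real ones from common_classes:

```python
import functools

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Animal(Base):  # simplified stand-in for the real model
    __tablename__ = "animal"
    id = Column(Integer, primary_key=True)
    id_eth = Column(Integer)
    sex = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def with_session(func):
    """Shared first and last lines: open a session, then commit and close."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        session = Session()
        try:
            result = func(session, *args, **kwargs)
            session.commit()
            return result
        finally:
            session.close()
    return wrapper

@with_session
def add_animal(session, id_eth, sex):
    session.add(Animal(id_eth=id_eth, sex=sex))

add_animal(4001, "f")
```

Each decorated function is free to take whatever arguments it needs; only the session handling is shared, which is exactly the "same first and last line" structure the question describes.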
Just as a preface, I understand that there are easier ways to accomplish much of what I'm trying to do; the following question is for purposes of learning how to build classes and instantiate a database connection within such a class.
I'm building a class that right now just takes in two variables: the name of a MongoDB database, and the name of a collection from that database. I am trying to instantiate the connection to this database and collection in the __init__ method of the class. The problem I am having is that __init__ connects to a database named after the literal variable name instead of the variable's value. More specifically, if I instantiate,
>>>salesChar = MongoDumps("sales","char")
and then I call,
>>>salesChar.db.name
it will instead connect to the "dBase" database (the literal name of the parameter) instead of the "sales" database (the value passed in for dBase). Please view the code below,
import pymongo
from pymongo import MongoClient

class MongoDumps():
    """Data Dumping into MongoDB"""
    def __init__(self, dBase, dumpCollection):
        self.dBase = dBase
        self.dumpCollection = dumpCollection
        client = MongoClient()
        self.db = client.dBase
        self.collection = self.db.dumpCollection
I've tried a combination of strategies and none seem to work with a similar result in each one. Are there certain limitations to using assignments in a class? Thanks for your help!
Use getattr to get a property by string. As the documentation says, getattr(x, 'foobar') is equivalent to x.foobar. Your code should look like:
class MongoDumps():
    def __init__(self, dBase, dumpCollection):
        self.dBase = dBase
        self.dumpCollection = dumpCollection
        client = MongoClient()
        self.db = getattr(client, dBase)
        self.collection = getattr(self.db, dumpCollection)
Then you can use this class to get a collection by name:
salesChar = MongoDumps("sales", "char")
first = salesChar.collection.find_one()
I am trying to add an event listener to the before_commit event of an SQLAlchemy Session inside of a Flask application. When doing the following
def before_commit(session):
    for item in session:
        if hasattr(item, 'on_save'):
            item.on_save(session)

event.listen(db.session, 'before_commit', before_commit)
I get
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "app.py", line 60, in <module>
    event.listen(db.session, 'before_commit', before_commit)
  File "C:\Python27\lib\site-packages\sqlalchemy\event\api.py", line 49, in listen
    _event_key(target, identifier, fn).listen(*args, **kw)
  File "C:\Python27\lib\site-packages\sqlalchemy\event\api.py", line 22, in _event_key
    tgt = evt_cls._accept_with(target)
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\events.py", line 1142, in _accept_with
    "Session event listen on a scoped_session "
sqlalchemy.exc.ArgumentError: Session event listen on a scoped_session requires that its creation callable is associated with the Session class.
I can't find the correct way to register the event listener. The documentation actually states that event.listen() also accepts a scoped_session, but it seems like it does not?!
http://docs.sqlalchemy.org/en/latest/orm/events.html#sqlalchemy.orm.events.SessionEvents
The listen() function will accept Session objects as well as the return result of sessionmaker() and scoped_session(). Additionally, it accepts the Session class which will apply listeners to all Session instances globally.
It means that the factory you've passed to scoped_session() must be a sessionmaker():
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy import event

# good
ss1 = scoped_session(sessionmaker())

@event.listens_for(ss1, "before_flush")
def evt(*arg, **kw):
    pass

# bad
ss2 = scoped_session(lambda: Session)

@event.listens_for(ss2, "before_flush")
def evt(*arg, **kw):
    pass
To give another example, this codebase won't work:
https://sourceforge.net/p/turbogears1/code/HEAD/tree/branches/1.5/turbogears/database.py
# bad
def create_session():
    """Create a session that uses the engine from thread-local metadata.

    The session by default does not begin a transaction, and requires that
    flush() be called explicitly in order to persist results to the database.
    """
    if not metadata.is_bound():
        bind_metadata()
    return sqlalchemy.orm.create_session()

session = sqlalchemy.orm.scoped_session(create_session)
Instead it needs to be something like the following:
# good
class SessionMakerAndBind(sqlalchemy.orm.sessionmaker):
    def __call__(self, **kw):
        if not metadata.is_bound():
            bind_metadata()
        return super(SessionMakerAndBind, self).__call__(**kw)

sessionmaker = SessionMakerAndBind(autoflush=False,
                                   autocommit=True, expire_on_commit=False)

session = sqlalchemy.orm.scoped_session(sessionmaker)
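If rebuilding the scoped_session isn't an option, the docs passage quoted above offers another way out: listen on the Session class itself, which applies to every session regardless of how its factory was written. A minimal sketch with plain SQLAlchemy (the event name and payload match the question's before_commit hook):

```python
from sqlalchemy import event
from sqlalchemy.orm import Session, scoped_session, sessionmaker

fired = []

# Class-level listener: applies to all Session instances globally,
# so it fires even for sessions made by an oddly-built factory.
@event.listens_for(Session, "before_commit")
def before_commit(session):
    fired.append(session)

ss = scoped_session(sessionmaker())
ss.commit()
```

The trade-off is scope: a class-level listener fires for every session in the process, so the callback may need to filter out sessions it doesn't care about.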