Related
The Flask tutorial (and many other tutorials out there) suggests that the engine, the db_session and the Base (the class returned by declarative_base) are all created at import time.
This creates some problems, one being that the DB URI is hardcoded in the code and evaluated only once.
One solution is to wrap these calls in functions that accept the app as a parameter, which is what I've done. Mind you, each call caches its result in app.config:
def get_engine(app):
    """Return the engine connected to the database URI in the config file.
    Store it in the config for later use.
    """
    engine = app.config.setdefault(
        'DB_ENGINE', create_engine(app.config['DATABASE_URI'], echo=True))
    return engine
def get_session(app):
    """Return the DB session for the database in use.
    Store it in the config for later use.
    """
    engine = get_engine(app)
    db_session = app.config.setdefault(
        'DB_SESSION', scoped_session(sessionmaker(
            autocommit=False, autoflush=False, bind=engine)))
    return db_session
def get_base(app):
    """Return the declarative base to use in DB models.
    Store it in the config for later use.
    """
    Base = app.config.setdefault('DB_BASE', declarative_base())
    Base.query = get_session(app).query_property()
    return Base
In init_db, I call all those functions, but there's still code smell:
def init_db(app):
    """Initialise the database"""
    create_db(app)
    engine = get_engine(app)
    db_session = get_session(app)
    base = get_base(app)
    if not app.config['TESTING']:
        import flaskr.models
    else:
        if 'flaskr.models' not in sys.modules:
            import flaskr.models
        else:
            import flaskr.models
            importlib.reload(flaskr.models)
    base.metadata.create_all(bind=engine)
The smell is of course the hoops I have to go through to import and create all models.
The reason for the code above is that, when unit testing, init_db is called once for each test (in setup(), as suggested in the same tutorial), but the import will only be performed the first time, and create_all will therefore work only that time.
Not only that: now, with a session shared for the duration of the app, I have problems in parametrized negative unit tests (that is, parametrized unit tests that expect some sort of failure): the first instance of the test triggers the failure (e.g. a login failure, see test_login_validate_input in the tutorial) and exits correctly, while all subsequent instances bail out early because the db_session should have been rolled back first. Clearly there's something wrong with the DB initialization.
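For reference, this is roughly the per-test cleanup this setup forces on me (a sketch, assuming pytest and a create_app factory like the tutorial's, plus the get_session helper above):
import pytest

@pytest.fixture
def app_with_db():
    app = create_app({'TESTING': True, 'DATABASE_URI': 'sqlite://'})
    init_db(app)
    yield app
    # undo whatever the (failed) test left behind, and drop the
    # thread-local session so the next parametrized case starts clean
    get_session(app).rollback()
    get_session(app).remove()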
What is the Right Way(TM) to initialize the database?
I have eventually decided to refactor the app so that it uses Flask-SQLAlchemy.
In short, the app now does something like this:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app():
    app = Flask(__name__)
    db.init_app(app)
    # ...
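For the record, a minimal sketch of what a model looks like under Flask-SQLAlchemy (User here is a made-up example, not one of the app's actual models):
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.Unicode(50), unique=True)

# table creation needs an application context, e.g. inside create_app():
#     with app.app_context():
#         db.create_all()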
With the benefit of hindsight, it's definitely a cleaner approach.
What put me off at the start was this entry from the tutorial (bold is mine):
Because SQLAlchemy is a common database abstraction layer and object relational mapper that requires a little bit of configuration effort, there is a Flask extension that handles that for you. This is recommended if you want to get started quickly.
Which I somehow read as "Using the Flask-SQLAlchemy extension will allow you to cut some corners, which you'll probably end up paying for later".
It's early days, but so far there has been no price to pay in terms of flexibility for using said extension.
So I posted this question without much context and got downvoted; let's try again...
For one, I don't follow the logic behind SQLAlchemy's session.add. I understand that it queues the object for insertion, and I understand that session.query looks in the connected database rather than in the session, but is it at all possible, within SQLAlchemy, to query the session without first doing session.flush? My expectation of something that reads session.query is that it queries the session...
I am now manually looking in session.new after a None comes out of session.query().first().
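Concretely, this is the kind of fallback I'm doing now (a sketch; find_pending is my own helper, not SQLAlchemy API):
def find_pending(session, code):
    """Look in the database first, then among pending (unflushed) objects."""
    obj = session.query(Location).filter_by(code=code).first()
    if obj is None:
        # session.new holds objects that have been add()ed but not flushed yet
        obj = next((o for o in session.new
                    if isinstance(o, Location) and o.code == code), None)
    return obj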
There are two reasons why I don't want to do session.flush before my session.query:
one is based on efficiency fears (why should I write to the database, and query the database, if I am still within a session which the user may want to roll back?);
two is that I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.
So really the core of this question is: who's going to help me find an error in a GPL program on GitHub?
This is a code snippet with a surprising behaviour in bauble/ghini:
# setting up things in ghini
# <replace-later>
import bauble
import bauble.db as db
db.open('sqlite:///:memory:', verify=False)
from bauble.prefs import prefs
import bauble.pluginmgr as pluginmgr
prefs.init()
prefs.testing = True
pluginmgr.load()
db.create(True)
Session = bauble.db.Session
from bauble.plugins.garden import Location
# </replace-later>
# now just plain straightforward usage
session = Session()
session.query(Location).delete()
session.commit()
u0 = session.query(Location).filter_by(code=u'mario').first()
print(u0)
u1 = Location(code=u'mario')
session.add(u1)
session.flush()
u2 = session.query(Location).filter_by(code=u'mario').one()
print(u1, u2, u1 == u2)
session.rollback()
u3 = session.query(Location).filter_by(code=u'mario').first()
print(u3)
the output here is:
None
mario mario True
mario
Here you have what I think is just standard, simple code to set up a database:
from sqlalchemy import Column, Unicode
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), index=True, primary_key=True)

    def __init__(self, code=None):
        self.code = code

    def __repr__(self):
        return self.code
from sqlalchemy import create_engine
engine = create_engine('sqlite:///joindemo.db')
Base.metadata.create_all(engine)
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine, autoflush=False)
With this, the output of the same code snippet above is less surprising:
None
mario mario True
None
The reason why flushes in bauble end up emitting a COMMIT is line 133 in db.py, where they handle their history table:
table.insert(dict(table_name=mapper.local_table.name,
table_id=instance.id, values=str(row),
operation=operation, user=user,
timestamp=datetime.datetime.today())).execute()
Instead of issuing the additional SQL in the event handler using the passed-in transactional connection, as they should, they execute the statement as is, which means it ends up using the engine as the bind (found through the table's metadata). Executing via the engine has autocommit behaviour. Since bauble always uses a SingletonThreadPool, there's just one connection per thread, so that statement ends up committing the flushed changes as well. I wonder whether this bug is the reason why bauble disables autoflush...
The fix is to change the event handling to use the transactional connection:
class HistoryExtension(orm.MapperExtension):
    """
    HistoryExtension is a
    :class:`~sqlalchemy.orm.interfaces.MapperExtension` that is added
    to all classes that inherit from bauble.db.Base so that all
    inserts, updates, and deletes made to the mapped objects are
    recorded in the `history` table.
    """
    def _add(self, operation, mapper, connection, instance):
        """
        Add a new entry to the history table.
        """
        ...  # a ton of code here
        table = History.__table__
        stmt = table.insert(dict(table_name=mapper.local_table.name,
                                 table_id=instance.id, values=str(row),
                                 operation=operation, user=user,
                                 timestamp=datetime.datetime.today()))
        connection.execute(stmt)

    def after_update(self, mapper, connection, instance):
        self._add('update', mapper, connection, instance)

    def after_insert(self, mapper, connection, instance):
        self._add('insert', mapper, connection, instance)

    def after_delete(self, mapper, connection, instance):
        self._add('delete', mapper, connection, instance)
It's worth noting that MapperExtension has been deprecated since version 0.7.
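The same bookkeeping could be expressed with mapper events instead; a rough sketch, not the actual ghini code (History, Base, row and user are assumed from the surrounding db.py):
import datetime
from sqlalchemy import event

def _history_listener(operation):
    def listener(mapper, connection, instance):
        # same INSERT as the fix above, on the transactional connection;
        # row and user come from the bookkeeping code elided earlier
        connection.execute(History.__table__.insert().values(
            table_name=mapper.local_table.name,
            table_id=instance.id, values=str(row),
            operation=operation, user=user,
            timestamp=datetime.datetime.today()))
    return listener

# propagate=True makes the listeners fire for every class derived from Base
event.listen(Base, 'after_insert', _history_listener('insert'), propagate=True)
event.listen(Base, 'after_update', _history_listener('update'), propagate=True)
event.listen(Base, 'after_delete', _history_listener('delete'), propagate=True)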
Regarding your views about the session, I quote "Session Basics", which you really should read through:
In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, ...
and "Is the Session a cache?":
Yeee…no. It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.
So:
My expectation of something that reads session.query is that it queries the session...
Your expectations are wrong. The Session handles talking to the DB – among other things.
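If you want a lookup that can be satisfied without SQL, that is what Query.get() is for, but only for objects the identity map already knows about (a quick sketch using the Location model from the question; with echo=True you can see which call emits a SELECT):
loc = session.query(Location).filter_by(code=u'mario').one()       # emits a SELECT
same = session.query(Location).get(u'mario')                       # found in the identity map, no SQL
other = session.query(Location).filter_by(code=u'mario').first()   # emits a SELECT again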
There are two reasons why I don't want to do session.flush before my session.query:
one is based on efficiency fears (why should I write to the database, and query the database, if I am still within a session which the user may want to roll back?);
Because your DB may do validation, have triggers, and generate values for some columns – primary keys, timestamps, and the like. The data you thought you were inserting may end up as something else in the DB, and the Session has absolutely no way to know about that.
Also, why should SQLAlchemy implement a sort of in-memory DB of its own, with its own query engine, and all the problems that come with synchronizing two databases? How would SQLAlchemy support all the different operations and functions of the different DBs you query against? Your simple equality predicate example just scratches the surface.
When you rollback, you roll back the DB's transaction (along with the session's unflushed changes).
two is that I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.
Caused by the event handling bug.
Why do I always need to do this in two steps in SQLAlchemy?
import sqlalchemy as sa
import sqlalchemy.orm as orm
engine = sa.create_engine(<dbPath>, echo=True)
Session = orm.sessionmaker(bind=engine)
my_session = Session()
Why can't I do it in one shot, like this (it could be simpler, no?):
import sqlalchemy as sa
import sqlalchemy.orm as orm
engine = sa.create_engine(<dbPath>, echo=True)
Session = orm.Session(bind=engine)
The reason sessionmaker() exists is so that the various "configurational" arguments it requires only need to be set up in one place, instead of repeating "bind=engine, autoflush=False, expire_on_commit=False", etc. over and over again. Additionally, sessionmaker() provides an "updateable" interface, such that you can set it up somewhere in your application:
session = sessionmaker(expire_on_commit=False)
but then later, when you know what database you're talking to, you can add configuration to it:
session.configure(bind=create_engine("some engine"))
It also serves as a "callable" to pass to the very common scoped_session() construct:
session = scoped_session(sessionmaker(bind=engine))
With all of that said, these are just conventions that the documentation refers to so that a consistent "how to use" story is presented. There's no reason you can't use the constructor directly if that is more convenient, and I use the Session() constructor all the time. It's just that in a non-trivial application, you will probably end up sticking that constructor call to Session() inside some kind of callable function anyway, sessionmaker() serves as a default for that callable.
In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, populating result rows into objects that are then stored in the Session, inside a structure called the Identity Map - a data structure that maintains unique copies of each object, where “unique” means “only one object with a particular primary key”.
Try to pprint it and see what's inside:
import pprint
pprint.pprint(my_session)
Here's the rest of the story: http://docs.sqlalchemy.org/ru/latest/orm/session.html
I'm designing a small GUI application to wrap an SQLite DB (simple CRUD operations). I have created three SQLAlchemy models (m_person.py, m_card.py, m_loan.py, all in a models/ folder) and had previously had the following code at the top of each one:
from sqlalchemy import Table, Column, create_engine
from sqlalchemy import Integer, ForeignKey, String, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relation
engine = create_engine("sqlite:///devdata.db", echo=True)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
This felt a bit wrong (DRY), so it was suggested that I move all this stuff out to the module level (into models/__init__.py):
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import Table, Column, Boolean, Unicode
from settings import setup
engine = create_engine('sqlite:///' + setup.get_db_path(), echo=False)
declarative_base = declarative_base(engine)
metadata = declarative_base.metadata
session = sessionmaker()
session = session()
...and import declarative_base like so:
from sqlalchemy import Table, Column, Unicode
from models import declarative_base
class Person(declarative_base):
    """
    Person model
    """
    __tablename__ = "people"

    id = Column(Unicode(50), primary_key=True)
    fname = Column(Unicode(50))
    sname = Column(Unicode(50))
However, I've had a lot of feedback that executing code at module import time like this is bad. I'm looking for a definitive answer on the right way to do it, as it seems that by trying to remove code repetition I've introduced some other bad practices. Any feedback would be really useful.
(Below is the get_db_path() function from settings/setup.py for completeness, as it is called in the above models/__init__.py code.)
def get_db_path():
    import sys
    from os import makedirs
    from os.path import join, dirname, exists
    from constants import DB_FILE, DB_FOLDER

    if len(sys.argv) > 1:
        db_path = sys.argv[1]
    else:
        # Check if application is running from exe or .py and adjust db path accordingly
        if getattr(sys, 'frozen', False):
            application_path = dirname(sys.executable)
            db_path = join(application_path, DB_FOLDER, DB_FILE)
        elif __file__:
            application_path = dirname(__file__)
            db_path = join(application_path, '..', DB_FOLDER, DB_FILE)

    # Check db path exists and create it if not
    def ensure_directory(db_path):
        dir = dirname(db_path)
        if not exists(dir):
            makedirs(dir)

    ensure_directory(db_path)
    return db_path
Some popular frameworks (Twisted is one example) perform a good deal of initialization logic at import-time. The benefits of being able to build module contents dynamically do come at a price, one of them being that IDEs cannot always decide what is "in" the module or not.
In your specific case, you may want to refactor so that the particular engine is not supplied at import-time but later. You can create the metadata and your declarative_base class at import-time. Then during start-time, after all the classes are defined, you call create_engine and bind the result to your sqlalchemy.orm.sessionmaker. But if your needs are simple, you may not even need to go that far.
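A sketch of that split, with illustrative names (init_engine is not an existing API, just a function you would write):
# models.py -- import time: no engine involved yet
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()
Session = sessionmaker()

# app startup -- after all model classes have been defined
from sqlalchemy import create_engine

def init_engine(db_path):
    engine = create_engine('sqlite:///' + db_path)
    Session.configure(bind=engine)
    Base.metadata.create_all(engine)
    return engine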
In general, I would say this is not Java or C. There is no reason to fear doing things at the module level other than defining functions, classes and constants. Your classes are created when the application starts anyway, one after the other. A little monkey-patching of classes (in the same module!), or creating one or two global lookup tables, is OK in my opinion if it simplifies your implementation.
What I would definitely avoid is any code in your module that causes the order of imports to matter for your users (other than the normal way of simply providing logic to be used), or that modifies the behavior of code outside your module. Then your module becomes black magic, which is accepted in the Perl world (à la use strict;), but I find it is not "pythonic".
For example, if your module modifies properties of sys.stdout when imported, I would argue that behavior should instead be moved into a function that the user can either call or not.
In principle there is nothing wrong with executing Python code when a module is imported, in fact every single Python module works that way. Defining module members is executing code, after all.
However, in your particular use case I would strongly advise against making a singleton database session object in your code base. You'll be losing out on the ability to do many things, for example unit test your model against an in-memory SQLite or other kind of database engine.
Take a look at the documentation for declarative_base and note how the examples are able to create the model with a declarative_base supplied class that's not yet bound to a database engine. Ideally you want to do that and then have some kind of connection function or class that will manage creating a session and then bind it to base.
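For example, with an unbound Base a unit test can spin up a throwaway in-memory database (a sketch; Person is the model from the question, assumed here to be defined against an unbound Base):
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def make_test_session():
    engine = create_engine('sqlite:///:memory:')
    Base.metadata.create_all(engine)   # tables created here, not at import time
    return sessionmaker(bind=engine)()

session = make_test_session()
session.add(Person(id=u'1', fname=u'Ada', sname=u'Lovelace'))
session.commit()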
There is nothing wrong with executing code at import time.
The guideline is executing enough so that your module is usable, but not so much that importing it is unnecessarily slow, and not so much that you are unnecessarily limiting what can be done with it.
Typically this means defining classes, functions, and global names (actually module level -- bit of a misnomer there), as well as importing anything your classes, functions, etc., need to operate.
This does not usually involve making connections to databases, websites, or other external, dynamic resources, but rather supplying a function to establish those connections when the module user is ready to do so.
I ran into this as well. I created a database.py file with a database manager class, and then created a single global instance of it. That way the class can read settings from my settings.py file to configure the database, and the first class that needs a base object (or session / engine) initializes the global object, after which everyone just re-uses it. I feel much more comfortable having "from myproject.database import DM" at the top of each class using the SQLAlchemy ORM, where DM is my global database object, and then DM.getBase() to get the base object.
Here is my database.py:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(autoflush=True))

class DatabaseManager:
    """
    The top level database manager used by all the SQLAlchemy classes to fetch their session / declarative base.
    """
    engine = None
    base = None

    def ready(self):
        """Determines if the SQLAlchemy engine and base have been set, and if not, initializes them."""
        host = '<database connection details>'
        if self.engine and self.base:
            return True
        else:
            try:
                self.engine = create_engine(host, pool_recycle=3600)
                self.base = declarative_base(bind=self.engine)
                return True
            except Exception:
                return False

    def getSession(self):
        """Returns the active SQLAlchemy session."""
        if self.ready():
            Session.configure(bind=self.engine)
            return Session()
        else:
            return None

    def getBase(self):
        """Returns the active SQLAlchemy base."""
        if self.ready():
            return self.base
        else:
            return None

    def getEngine(self):
        """Returns the active SQLAlchemy engine."""
        if self.ready():
            return self.engine
        else:
            return None

DM = DatabaseManager()
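A model module then uses it roughly like this (a sketch; Widget is a made-up model, and the create_all() call is only there so the example is complete):
from sqlalchemy import Column, Integer, Unicode
from myproject.database import DM

Base = DM.getBase()

class Widget(Base):
    __tablename__ = 'widgets'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(50))

Base.metadata.create_all(DM.getEngine())   # make sure the table exists

session = DM.getSession()
session.add(Widget(name=u'example'))
session.commit()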
Because of legacy data which is not available in the database but only in some external files, I want to create an SQLAlchemy object which contains data read from those external files, but isn't written to the database if I execute session.flush().
My code looks like this:
try:
    return session.query(Phone).populate_existing().filter(Phone.mac == ident).one()
except:
    return self.createMockPhoneFromLicenseFile(ident)

def createMockPhoneFromLicenseFile(self, ident):
    # Some code to read necessary data from file deleted....
    phone = Phone()
    phone.mac = foo
    phone.data = bar
    phone.state = "Read from legacy file"
    phone.purchaseOrderPosition = self.getLegacyOrder(ident)
    # SQLAlchemy magic doesn't seem to work here, probably because we don't insert the created
    # phone object into the database. So we set the id fields manually.
    phone.order_id = phone.purchaseOrderPosition.order_id
    phone.order_position_id = phone.purchaseOrderPosition.order_position_id
    return phone
Everything works fine, except that on a session.flush() executed later in the application, SQLAlchemy tries to write the created Phone object to the database (which fortunately doesn't succeed, because phone.state is longer than the column allows), and that breaks the function which issues the flush.
Is there any way to prevent SQLAlchemy from trying to write such an object?
Update
While I didn't find anything on
using_mapper_options(save_on_init=False)
in the Elixir documentation (maybe you can provide a link?), it seemed to me worth a try (I would have preferred a way to prevent a single instance from being written instead of the whole entity).
At first it seemed that the statement had no effect, and I suspected my SQLAlchemy/Elixir versions of being too old, but then I found out that the connection to the PurchaseOrderPosition entity (which I didn't modify) made with
phone.purchaseOrderPosition = self.getLegacyOrder(ident)
causes the phone object to be written again. If I remove the statement, everything seems to be fine.
You need to do
import elixir
elixir.options_defaults['mapper_options'] = { 'save_on_init': False }
to prevent Entity instances which you instantiate from being auto-added to the session. Ideally, this should be done as early as possible in your code. You can also do this on a per-entity basis, via using_mapper_options(save_on_init=False) - see the Elixir documentation for more details.
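The per-entity variant looks roughly like this (an untested sketch; the Phone fields here are made up):
from elixir import Entity, Field, Unicode, using_mapper_options

class Phone(Entity):
    # only this entity skips the auto-add-to-session behaviour
    using_mapper_options(save_on_init=False)
    mac = Field(Unicode(17))
    state = Field(Unicode(64))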
Update:
See this post on the Elixir mailing list indicating that this is the solution.
Also, as Ants Aasma points out, you can use cascade options on the Elixir relationship to set up cascade options in SQLAlchemy. See this page for more details.
Well, SQLAlchemy doesn't, by default.
Consider the following self-contained example code.
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base
e = create_engine('sqlite://')
Base = declarative_base(bind=e)
class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(50))
# create the empty table and a session
Base.metadata.create_all()
s = create_session(bind=e, autoflush=False, autocommit=False)
# assert the table is empty
assert s.query(User).all() == []
# create a new User instance but don't save it to database:
u = User()
u.name = 'siebert'
# I could run s.add(u) here but I won't
s.flush()
s.commit()
# assert the table is still empty
assert s.query(User).all() == []
So I'm not sure what's implicitly adding your instances to the session. Normally you have to manually call s.add(u) to make it go to the session. I'm not familiar with Elixir, so perhaps this is some Elixir trickery... Maybe you could remove it from the session using session.expunge().
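Expunging would look like this, continuing the example above (a sketch):
u = User()
u.name = 'siebert'
# if something added it to the session behind your back, detach it again
if u in s:
    s.expunge(u)
s.flush()
s.commit()
assert s.query(User).all() == []   # still nothing written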
Old post, but I came across a similar issue; in my case, in SQLAlchemy, it was caused by cascading on backrefs:
http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#backref-cascade
Turn it off on your backrefs so that you have to explicitly add things to the session.
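With plain SQLAlchemy that corresponds to the cascade_backrefs flag on the relationship (a sketch with made-up Order/Item models; the flag exists as of SQLAlchemy 0.7, though it was deprecated in much later versions):
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    # item.order = some_order still links the objects, but no longer
    # sweeps the item into some_order's session automatically
    items = relationship('Item', backref='order', cascade_backrefs=False)

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'))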