Is there a way to get the engine from the SQLAlchemy object? - python

I have the following code:
db = SQLAlchemy(
    engine_options={'connect_args': {'connect_timeout': 60}}
)
basis_engine = create_engine(database_stages["dev"]["dev_basis"])
meta_data = MetaData()
meta_data.reflect(bind=basis_engine)
I've created this extra engine just to be able to access a single table directly, and it adds a great amount of overhead when the app starts up (startup takes much longer). So, is there a way to get the engine from the SQLAlchemy object and avoid the separate create_engine call?

The db object (an instance of SQLAlchemy) has a get_engine method that will return an engine.
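For example, a minimal sketch (assuming the db object above has already been initialized with the Flask app, e.g. via db.init_app(app), and that the app's SQLALCHEMY_DATABASE_URI points at the same database as database_stages["dev"]["dev_basis"]) that reuses the existing engine instead of a second create_engine call:
from sqlalchemy import MetaData

# Reuse the engine Flask-SQLAlchemy already created
# (db.engine also works inside an application context).
basis_engine = db.get_engine()
meta_data = MetaData()
meta_data.reflect(bind=basis_engine)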

Related

SQLAlchemy: give some execution_options parameter to all session's queries at once

Original question
I have set up soft deletion on SQLAlchemy 1.4 based on this example in the official docs here. The
_add_filtering_criteria listener filters out soft-deleted objects whenever the execution option "include_deleted" is False.
Sometimes, though, I would like certain queries to be able to search the soft-deleted objects. I can do it per query, i.e. by specifying query.execution_options(include_deleted=True), but I would like all queries of a particular session to include soft-deleted objects without having to specify it for each query.
I have tried to declare execution_options(include_deleted=True) at the engine's creation, but it does not work.
import sqlalchemy as sa
from sqlalchemy import Column, DateTime, event, orm

class SoftDeleted:
    deletion_date = Column(DateTime, nullable=True)

@event.listens_for(orm.Session, "do_orm_execute")
def _add_filtering_criteria(execute_state: orm.ORMExecuteState) -> None:
    """Intercepts all ORM queries. Adds a with_loader_criteria option to all
    of them.

    This option applies to SELECT queries and adds a global WHERE criteria
    (or, as appropriate, ON clause criteria for join targets)
    to all objects of a certain class or superclass.
    """
    if (not execute_state.is_column_load
            and not execute_state.is_relationship_load
            and not execute_state.execution_options.get("include_deleted", False)):
        execute_state.statement = execute_state.statement.options(
            orm.with_loader_criteria(
                SoftDeleted,
                lambda cls: cls.deletion_date.is_(None),
                include_aliases=True,
            )
        )

engine = sa.create_engine(url, echo=False).execution_options(include_deleted=True)
session_factory = orm.sessionmaker(bind=engine)
session = session_factory()

# let SomeClass be a class that inherits from the SoftDeleted mixin
myquery = session.query(SomeClass).get(1)
# will not retrieve SomeClass with uid=1 if it is soft-deleted
myquery2 = session.query(SomeClass).execution_options(include_deleted=True).get(1)
# will retrieve SomeClass with uid=1 even if it is soft-deleted
As I said, I would like all queries of the session to be able to include soft-deleted objects. Does anyone know how I can do this?
Solution, thanks to snakecharmerb's answer
After snakecharmerb's answer, I modified the following and got the behaviour I wanted:
@event.listens_for(orm.Session, "do_orm_execute")
def _add_filtering_criteria(execute_state):
    if (not execute_state.is_column_load
            and not execute_state.is_relationship_load
            and not execute_state.execution_options.get("include_deleted", False)
            and not execute_state.session.info.get("include_deleted", False)):
        [...]

engine = sa.create_engine(url, echo=False) \
    .execution_options(include_deleted=include_deleted)
session_factory = orm.sessionmaker(
    bind=engine,
    info={'include_deleted':
          engine.get_execution_options().get('include_deleted', False)})
session = session_factory()
[...]
You have set your flag in the engine's execution options, so you must retrieve it from there. The engine can be accessed through the session's bind attribute:
(Pdb) execute_state.session.bind.get_execution_options()
immutabledict({'include_deleted': True})
Your code does not error because execute_state has an execution_options attribute, but it contains other things:
(Pdb) execute_state.execution_options
immutabledict({'_sa_orm_load_options': <class 'sqlalchemy.orm.context.QueryContext.default_load_options'>, '_result_disable_adapt_to_context': True, 'future_result': True})
Session objects (and sessionmakers) have an info attribute, a dictionary that you can populate as you wish. This can be used to pass the flag if you want to set it per session rather than per engine.
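For example, a minimal sketch of the per-session variant (it assumes the do_orm_execute handler also checks execute_state.session.info, as in the solution above):
session = session_factory()
# every query issued through this session will now see soft-deleted rows
session.info["include_deleted"] = True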

SQLAlchemy: automap_base in a forking code

I am developing an API server that interacts with a MySQL DB, reflecting its schema, and it also runs as multiple forked processes. My code for the DB work looks like this:
from sqlalchemy import MetaData
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm.session import Session

my_engine = create_engine_by_info(my_config)

metadata = MetaData(bind=my_engine)
Base: type = automap_base(metadata=metadata)

class User(Base):
    __tablename__ = 'auth_user'
    # Relation descriptions...

# Other classes...

Base.prepare(my_engine, reflect=True)

def find_user(field):
    with Session(my_engine) as session:
        query = session.query(User)
        query = query.filter(User.field == field)
        records = query.all()
        for u in records:
            return u
    return None
And it works fine until the process gets forked: after the child process finishes its work, the original process loses its connection: Lost connection to MySQL server during query.
I guess I should keep my_engine separate for each process (e.g. some function with a dict of engines keyed by PID), but how can I do that if my class definitions require an engine up front? I could probably move the classes into a function too, but that would be a mess... So, what is a good solution here?
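A hedged sketch of one commonly documented approach for SQLAlchemy connection pools and os.fork(): keep the single module-level engine (so the class definitions stay as they are) and discard the pool that each child inherits, so parent and child never share the same MySQL connection. The close=False argument needs SQLAlchemy 1.4.33+; on older versions the child can build a fresh engine instead:
import os

def _use_fresh_pool_in_child():
    # Drop the connections inherited from the parent without closing them,
    # so the parent keeps using its connections and the child opens new ones.
    my_engine.dispose(close=False)

# run automatically in every child created via os.fork() (or multiprocessing's fork start method)
os.register_at_fork(after_in_child=_use_fresh_pool_in_child)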

Return value for sqlalchemy db.execute() in Flask?

My Flask app is hooked up to my postgres database in Heroku like so:
from flask import Flask, render_template, session, request, url_for, redirect, flash
from flask_session import Session
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
import os
app = Flask(__name__)
# Check for environment variable
if not os.getenv("DATABASE_URL"):
    raise RuntimeError("DATABASE_URL is not set")
# Configure session to use filesystem
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"
Session(app)
# Set up database
engine = create_engine(os.getenv("DATABASE_URL"))
db = scoped_session(sessionmaker(bind=engine))
My Flask methods run various SQL statements. The syntax I'm using is typically like bk = db.execute("SELECT * FROM books WHERE isbn=:isbn", {"isbn": isbn}).
I thought the return value of such a statement would be a list of dictionaries; however, when I wrote a method that checked len(bk), it said the object had no length, even when the query should have returned rows.
So, what's the return value that I'm receiving, and why doesn't it seem to have a discernible length to Python? Couldn't find a straight answer anywhere.
It's a ResultProxy object, as explained in the documentation of Session.execute(). It is iterable, but does not have a length, because in general it is not known how many rows a query produces before fetching them all. You could pass it to list(), like in the other answer, or use its fetch methods, namely fetchall().
The individual rows are not represented by dict instances but by RowProxy instances. They do act as a mapping, though, so you can use them as if they were dictionaries for most purposes (except serialization to JSON, for example).
sqlalchemy will return a lazy generator of records. You can't use len, but you can iterate over it and it will yield the records one by one, to save memory. If you have the memory, or the table is small and you want to load all records at once, you can call list() on it:
bk = list(bk)
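For illustration, a short sketch of both approaches (SQLAlchemy 1.x-style ResultProxy/RowProxy; the title column is only an assumed example):
bk = db.execute("SELECT * FROM books WHERE isbn=:isbn", {"isbn": isbn})
rows = bk.fetchall()        # materialize all rows at once; len(rows) now works
for row in rows:
    print(row["title"])     # RowProxy supports mapping-style access
    print(row.title)        # ...and attribute-style access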

Flask-SQLAlchemy db.session.query(Model) vs Model.query

This is a weird bug I've stumbled upon, and I am not sure why it is happening, whether it's a bug in SQLAlchemy, in Flask-SQLAlchemy, or some feature of Python I'm not yet aware of.
We are using Flask 0.11.1, with Flask-SQLAlchemy 2.1 using a PostgreSQL as DBMS.
Examples use the following code to update data from the database:
entry = Entry.query.get(1)
entry.name = 'New name'
db.session.commit()
This works totally fine when executing from the Flask shell, so the database is correctly configured. Now, our controller for updating entries, slightly simplified (without validation and other boilerplate), looks like this:
def details(id):
    entry = Entry.query.get(id)
    if entry:
        if request.method == 'POST':
            form = request.form
            entry.name = form['name']
            db.session.commit()
            flash('Updated successfully.')
        return render_template('/entry/details.html', entry=entry)
    else:
        flash('Entry not found.')
        return redirect(url_for('entry_list'))

# In the application the URLs are built dynamically, hence this instead of @app.route
app.add_url_rule('/entry/details/<int:id>', 'entry_details', details, methods=['GET', 'POST'])
When I submit the form in details.html, I can see perfectly fine the changes, meaning the form has been submitted properly, is valid and that the model object has been updated. However, when I reload the page, the changes are gone, as if it had been rolled back by the DBMS.
I have enabled app.config['SQLALCHEMY_ECHO'] = True and I can see a "ROLLBACK" before my own manual commit.
If I change the line:
entry = Entry.query.get(id)
To:
entry = db.session.query(Entry).get(id)
As explained in https://stackoverflow.com/a/21806294/4454028, it does work as expected, so my guess was that there was some kind of error in Flask-SQLAlchemy's Model.query implementation.
However, as I prefer the first construction, I made a quick modification to Flask-SQLAlchemy and redefined the query @property from the original:
class _QueryProperty(object):
    def __init__(self, sa):
        self.sa = sa

    def __get__(self, obj, type):
        try:
            mapper = orm.class_mapper(type)
            if mapper:
                return type.query_class(mapper, session=self.sa.session())
        except UnmappedClassError:
            return None
To:
class _QueryProperty(object):
    def __init__(self, sa):
        self.sa = sa

    def __get__(self, obj, type):
        return self.sa.session.query(type)
Where sa is the Flask-SQLAlchemy object (ie db in the controller).
Now, this is where things got weird: it still doesn't save the changes. Code is exactly the same, yet the DBMS is still rolling back my changes.
I read that Flask-SQLAlchemy can execute a commit on teardown, and tried adding this:
app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN'] = True
Suddenly, everything works. Question is: why?
Isn't teardown supposed to happen only when the view has finished rendering? Why is the modified Entry.query behaving different to db.session.query(Entry), even if the code is the same?
Below is the correct way to make changes to a model instance and commit them to the database:
# get an instance of the 'Entry' model
entry = Entry.query.get(1)
# change the attribute of the instance; here the 'name' attribute is changed
entry.name = 'New name'
# now, commit your changes to the database; this will flush all changes
# in the current session to the database
db.session.commit()
Note: Don't use SQLALCHEMY_COMMIT_ON_TEARDOWN, as it's considered harmful and also removed from docs. See the changelog for version 2.0.
Edit: If you have two normal session objects (created using sessionmaker()) instead of a scoped session, then calling db.session.add(entry) with the above code will raise the error sqlalchemy.exc.InvalidRequestError: Object '' is already attached to session '2' (this is '3'). For a better understanding of SQLAlchemy sessions, read the section below.
Major difference between a scoped session and a normal session
The session object we usually construct from a sessionmaker() call and use to communicate with our database is a normal session. If you call sessionmaker() a second time, you will get a new session object whose state is independent of the previous session. For example, suppose we have two session objects constructed in the following way:
from sqlalchemy import Column, String, Integer, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)

from sqlalchemy import create_engine
engine = create_engine('sqlite:///')

from sqlalchemy.orm import sessionmaker
session = sessionmaker()
session.configure(bind=engine)
Base.metadata.create_all(engine)

# Construct the first session object
s1 = session()
# Construct the second session object
s2 = session()
Then, we won't be able to add the same User object to both s1 and s2 at the same time. In other words, an object can be attached to at most one session object at a time.
>>> jessica = User(name='Jessica')
>>> s1.add(jessica)
>>> s2.add(jessica)
Traceback (most recent call last):
......
sqlalchemy.exc.InvalidRequestError: Object '' is already attached to session '2' (this is '3')
If the session objects are retrieved from a scoped_session object, however, then we don't have such an issue since the scoped_session object maintains a registry for the same session object.
>>> session_factory = sessionmaker(bind=engine)
>>> session = scoped_session(session_factory)
>>> s1 = session()
>>> s2 = session()
>>> jessica = User(name='Jessica')
>>> s1.add(jessica)
>>> s2.add(jessica)
>>> s1 is s2
True
>>> s1.commit()
>>> s2.query(User).filter(User.name == 'Jessica').one()
Notice that s1 and s2 are the same session object, since they are both retrieved from a scoped_session object which maintains a reference to the same underlying session.
Tips
So, try to avoid creating more than one normal session object. Create a single session and use it everywhere, from declaring models to querying.
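As a hedged sketch of that tip (module layout and names are illustrative, not from the original answer), keep a single scoped_session in one module and import it wherever a session is needed:
# database.py (hypothetical module)
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite:///')
Session = scoped_session(sessionmaker(bind=engine))

# elsewhere:
#   from database import Session
#   Session.add(obj); Session.commit()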
Our project is split across several files to ease maintenance. One is routes.py with the controllers, and another is models.py, which contains the SQLAlchemy instance and the models.
So, while I was removing boilerplate to get a minimal working Flask project to upload to a git repository and link here, I found the cause.
Apparently, the reason is that my workmate, while attempting to insert data using raw queries instead of the model objects (no, I have no idea why on earth he wanted to do that, but he spent a whole day coding it), had defined another SQLAlchemy instance in routes.py.
So, when I was trying to insert data from the Flask shell using:
from .models import *
entry = Entry.query.get(1)
entry.name = 'modified'
db.session.commit()
I was using the correct db object, as defined in models.py, and it was working completely fine.
However, as routes.py defined another db after the model import, this one overwrote the reference to the correct SQLAlchemy instance, so I was committing with a different session.
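A minimal sketch of the failure mode described above (the file and model names follow the post; the second SQLAlchemy() instance is the bug):
# models.py
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()                   # the instance the models are bound to

class Entry(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

# routes.py
from .models import *               # brings in Entry and the correct db ...
db = SQLAlchemy()                   # ... which is then shadowed by a second instance (the bug)

# Entry.query keeps using the first instance's session, while db.session.commit()
# commits the second instance's session, so the update is never persisted.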

peewee vs sqlalchemy performance

I have 2 simple scripts:
from sqlalchemy import create_engine, ForeignKey, Table
from sqlalchemy import Column, Date, Integer, String, DateTime, BigInteger, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import Engine
from sqlalchemy.orm import relationship, backref, sessionmaker, scoped_session, Session
class Test(declarative_base()):
    __tablename__ = "Test"

    def __init__(self, *args, **kwargs):
        args = args[0]
        for key in args:
            setattr(self, key, args[key])

    key = Column(String, primary_key=True)

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

engine = create_engine("sqlite:///testn", echo=False)
with engine.connect() as connection:
    Test.metadata.create_all(engine)
    session = Session(engine)
    list(map(lambda x: session.merge(Test(x)), data))
    session.commit()
result:
real 0m15.300s
user 0m14.920s
sys 0m0.351s
second script:
from peewee import *
class Test(Model):
    key = TextField(primary_key=True, null=False)

dbname = "test"
db = SqliteDatabase(dbname)
Test._meta.database = db

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

if not Test.table_exists():
    db.create_tables([Test])

with db.atomic() as tr:
    Test.insert_many(data).upsert().execute()
result:
real 0m3.253s
user 0m2.620s
sys 0m0.571s
Why?
This comparison is not entirely valid, as issuing an upsert style query is very different from what SQLAlchemy's Session.merge does:
Session.merge() examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on primary key, and if none can be located, creates a new instance.
In this test case this will result in 10,000 load attempts against the database, which is expensive.
On the other hand when using peewee with sqlite the combination of insert_many(data) and upsert() can result in a single query:
INSERT OR REPLACE INTO Test (key) VALUES ('key0'), ('key1'), ...
There's no session state to reconcile, since peewee is a very different kind of ORM from SQLAlchemy and, at a quick glance, looks closer to SQLAlchemy's Core and Table constructs.
In SQLAlchemy instead of list(map(lambda x: session.merge(Test(x)), data)) you could revert to using Core:
session.execute(Test.__table__.insert(prefixes=['OR REPLACE']).values(data))
A major downside of this is that you have to hand-write a database-vendor-specific prefix for the INSERT. It also subverts the Session, which will have no information or knowledge about the newly added rows.
Bulk insertions using model objects are a little more involved with SQLAlchemy. Very simply put using an ORM is a trade-off between ease of use and speed:
ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
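As a hedged alternative (not part of the original answer): if you want to stay with the ORM but avoid the per-object cost of Session.merge(), Session.bulk_insert_mappings() inserts the dictionaries in bulk. Note that it issues plain INSERTs, so unlike the merge/OR REPLACE variants above it will fail on primary keys that already exist:
# assumes the Test model and the `data` list from the SQLAlchemy script above
session.bulk_insert_mappings(Test, data)
session.commit()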
