Can an ORM column trigger a session flush in SQLAlchemy? - python

Question
Can a property access trigger a session flush in SQLAlchemy? My expectation would be for, e.g., queries attached to an object via column_property() or @hybrid_property to cause a session autoflush, in the same way that queries made through session.query() do. That does not seem to be the case.
In the simple example below, an Account contains an Entry collection. It also provides a "balance" property, constructed with column_property(), that exposes a select-sum query. New entries only appear in an account's balance if session.flush() is called explicitly.
This behavior seems suboptimal: users of the Account class need to sprinkle flush() calls throughout their code based on knowing the internals of the balance implementation. If the implementation changes (e.g., if "balance" was previously a Python @property), bugs can be introduced even though the Account interface is essentially identical. Is there an alternative?
Complete Example
import sys
import sqlalchemy as sa
import sqlalchemy.sql
import sqlalchemy.orm
import sqlalchemy.ext.declarative

Base = sa.ext.declarative.declarative_base()

class Entry(Base):
    __tablename__ = "entries"
    id = sa.Column(sa.Integer, primary_key=True)
    value = sa.Column(sa.Numeric, primary_key=True)
    account_id = sa.Column(sa.Integer, sa.ForeignKey("accounts.id"))
    account = sa.orm.relationship("Account", backref="entries")

class Account(Base):
    __tablename__ = "accounts"
    id = sa.Column(sa.Integer, primary_key=True)
    balance = sa.orm.column_property(
        sa.sql.select([sa.sql.func.sum(Entry.value)])
        .where(Entry.account_id == id)
    )

def example(database_url):
    # connect to the database and prepare the schema
    engine = sa.create_engine(database_url)
    session = sa.orm.sessionmaker(bind=engine)()
    Base.metadata.create_all(bind=engine)
    # add an entry to an account
    account = Account()
    account.entries.append(Entry(value=42))
    session.add(account)
    # and look for that entry in the balance
    print "account.balance:", account.balance
    assert account.balance == 42

if __name__ == "__main__":
    example(sys.argv[1])
Observed Output
$ python sa_column_property_example.py postgres:///za_test
account.balance: None
Traceback (most recent call last):
File "sa_column_property_example.py", line 46, in <module>
example(sys.argv[1])
File "sa_column_property_example.py", line 43, in example
assert account.balance == 42
AssertionError
Preferred Output
I'd like to see "account.balance: 42", without adding an explicit call to session.flush().

A column_property is only evaluated at query time, that is, when you say query(Account), as well as when the attribute is expired, that is, if you said session.expire(account, ['balance']).
To have an attribute invoke a query every time, we use a @property (some small mods here for the script to work with sqlite):
import sys
import sqlalchemy as sa
import sqlalchemy.sql
import sqlalchemy.orm
import sqlalchemy.ext.declarative

Base = sa.ext.declarative.declarative_base()

class Entry(Base):
    __tablename__ = "entries"
    id = sa.Column(sa.Integer, primary_key=True)
    value = sa.Column(sa.Numeric)
    account_id = sa.Column(sa.Integer, sa.ForeignKey("accounts.id"))
    account = sa.orm.relationship("Account", backref="entries")

class Account(Base):
    __tablename__ = "accounts"
    id = sa.Column(sa.Integer, primary_key=True)

    @property
    def balance(self):
        # runs a query on every access, so autoflush applies
        return sqlalchemy.orm.object_session(self).query(
            sa.sql.func.sum(Entry.value)
        ).filter(Entry.account_id == self.id).scalar()

def example(database_url):
    # connect to the database and prepare the schema
    engine = sa.create_engine(database_url, echo=True)
    session = sa.orm.sessionmaker(bind=engine)()
    Base.metadata.create_all(bind=engine)
    # add an entry to an account
    account = Account()
    account.entries.append(Entry(value=42))
    session.add(account)
    # and look for that entry in the balance
    print "account.balance:", account.balance
    assert account.balance == 42

if __name__ == "__main__":
    example("sqlite://")
Note that "flushing" itself is generally not something we have to worry about; the autoflush feature will ensure flush is called each time query() goes to the database to get results, so it's really ensuring that a query occurs which is what we're going for.
Another approach to this issue is to use hybrids. I'd recommend reading the overview of all three methods at SQL Expressions as Mapped Attributes which lists out the tradeoffs to each approach.
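The expire route mentioned at the start of this answer can be sketched as follows. This is a minimal illustration rather than the answerer's code: it assumes SQLAlchemy 1.4 or later (select() without a list, scalar_subquery()), an in-memory SQLite database, and an Integer value column instead of Numeric to keep the arithmetic simple. The explicit flush() and expire() calls are there so the sketch does not rely on any implicit expiry behavior.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.orm import Session, column_property, declarative_base, relationship

Base = declarative_base()

class Entry(Base):
    __tablename__ = "entries"
    id = Column(Integer, primary_key=True)
    value = Column(Integer)  # assumption: Integer instead of Numeric
    account_id = Column(Integer, ForeignKey("accounts.id"))
    account = relationship("Account", backref="entries")

class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    # correlated scalar subquery, loaded together with the Account row
    balance = column_property(
        select(func.sum(Entry.value))
        .where(Entry.account_id == id)
        .correlate_except(Entry)
        .scalar_subquery()
    )

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
account = Account()
account.entries.append(Entry(value=42))
session.add(account)
session.commit()              # rows are written, attributes are expired

first = account.balance       # loaded from the database on access

account.entries.append(Entry(value=8))
session.flush()               # write the new entry
session.expire(account, ["balance"])  # force balance to be reloaded
second = account.balance

print(first, second)
session.close()
```

Compared with the @property version, the caller has to know when to expire the attribute, which is the coupling the question wanted to avoid.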

Related

sqlalchemy mixin: after_create not firing in child

I am working on an ORM style version of the pq library (PostgreSQL powered python queue system) where users can have their own queue model. It also has added features such as bulk insert/get, asynchronous support and more (if all goes well I'll be able to publish it).
I am having difficulties creating a trigger (I use a PostgreSQL notification system) automatically after table creation (I want to make the usage as simple as possible so that would be much better than adding an additional classmethod for creating the trigger).
This is similar to the answer in this post however I cannot use this solution because I need to pass a connection (for escaping SQL identifiers by checking the dialect of the connection and for checking if objects exist beforehand).
Here is my attempt at it based on the post I mentioned earlier. I apologize for the long code but I figured I had to include a bit of context.
Base model
from sqlalchemy import (BIGINT, Column, func, Index, nullslast,
                        nullsfirst, SMALLINT, TIMESTAMP)
from sqlalchemy.orm import declared_attr, declarative_mixin
from sqlalchemy.event import listens_for

# this is the function that returns the base model
def postgres_queue_base(schema:str='public', tz_aware:bool=True, use_trigger:bool=True) -> 'PostgresQueueBase':

    @declarative_mixin # this is only for MyPy, it does not modify anything
    class PostgresQueueBase:
        __tablename__ = 'queue'

        @declared_attr
        def __table_args__(cls):
            return (Index(nullsfirst(cls.schedule_at), nullslast(cls.dequeued_at), postgresql_where=(cls.dequeued_at == None)),
                    {'schema':schema})

        id = Column('id', BIGINT, primary_key=True)
        internal_mapping = Column('internal_mapping', BIGINT, nullable=False)
        enqueued_at = Column('enqueued_at', TIMESTAMP(timezone=tz_aware), nullable=False, server_default=func.now())
        dequeued_at = Column('dequeued_at', TIMESTAMP(timezone=tz_aware))
        expected_at = Column(TIMESTAMP(timezone=tz_aware))
        schedule_at = Column(TIMESTAMP(timezone=tz_aware))
        status = Column(SMALLINT, index=True)

    @listens_for(PostgresQueueBase, "instrument_class", propagate=True)
    def instrument_class(mapper, class_):
        print('EVENT INSTRUMENT CLASS')
        if use_trigger and mapper.local_table is not None:
            trigger_for_table(table=mapper.local_table)

    def trigger_for_table(table):
        print('Registering after_create event')

        @listens_for(table, "after_create")
        def create_trigger(table, connection):
            print('AFTER CREATE EVENT')
            # code that creates triggers and logs that (here I'll just print something and put pseudo code in a comment)
            # trig = PostgresQueueTrigger(schema=get_schema_from_model(table), table_name=table.name, connection=connection)
            # trig.add_trigger()
            print('Creating notify function public.notify_job')
            # unique trigger name using hash of schema.table_name (avoids problems with long names and special chars)
            print('Creating trigger trigger_job_5d69fc3870b446d0a1f56a793b799ae3')

    return PostgresQueueBase
When I try the base model
from sqlalchemy import Column, create_engine, INTEGER, TEXT
from sqlalchemy.orm import declarative_base

# IMPORTANT: inherit both a declarative base AND the postgres queue base
Base = declarative_base()
PostgresQueueBase = postgres_queue_base(schema='public')

# create custom queue model
class MyQueue(Base, PostgresQueueBase):
    # optional custom table name (by default it is "queue")
    __tablename__ = 'demo_queue'
    # custom columns
    operation = Column(TEXT)
    project_id = Column(INTEGER)

# create table in database
# change connection string accordingly!
engine = create_engine('postgresql://username:password@localhost:5432/postgres')
Base.metadata.create_all(bind=engine)
EVENT INSTRUMENT CLASS
Registering after_create event
I cannot see "AFTER CREATE EVENT" printed out 😟. How do I get the "after_create" event to be fired?
Thanks in advance for your help 👍!
Sorry, I finally figured it out... The table already existed, so the events were never firing. Also, the code above has some errors in the events (I could not test them since they were not being executed), and the composite index in __table_args__ somehow ends up with a name like " NULLS FIRST". I used a hash to get a better name and avoid problems with character limits or escaping.
import hashlib
from sqlalchemy import (BIGINT, Column, func, Index, nullslast,
                        nullsfirst, SMALLINT, TIMESTAMP)
from sqlalchemy.orm import declared_attr, declarative_mixin
from sqlalchemy.event import listens_for

# this is the function that returns the base model
def postgres_queue_base(schema:str='public', tz_aware:bool=True, use_trigger:bool=True) -> 'PostgresQueueBase':

    @declarative_mixin # this is only for MyPy, it does not modify anything
    class PostgresQueueBase:
        __tablename__ = 'queue'

        @declared_attr
        def __table_args__(cls):
            # to prevent any problems such as escaping, SQL injection or limit of characters I'll just md5 the table name for the index
            md5 = hashlib.md5(cls.__tablename__.encode('utf-8')).hexdigest()
            return (Index(f'queue_prio_ix_{md5}', nullsfirst(cls.schedule_at), nullslast(cls.dequeued_at),
                          postgresql_where=(cls.dequeued_at == None)),
                    {'schema':schema})

        id = Column('id', BIGINT, primary_key=True)
        internal_mapping = Column('internal_mapping', BIGINT, nullable=False)
        enqueued_at = Column('enqueued_at', TIMESTAMP(timezone=tz_aware), nullable=False, server_default=func.now())
        dequeued_at = Column('dequeued_at', TIMESTAMP(timezone=tz_aware))
        expected_at = Column(TIMESTAMP(timezone=tz_aware))
        schedule_at = Column(TIMESTAMP(timezone=tz_aware))
        status = Column(SMALLINT, index=True)

    if use_trigger:
        @listens_for(PostgresQueueBase, "instrument_class", propagate=True)
        def class_instrument(mapper, class_):
            if mapper.local_table is not None:
                create_trigger_event(table=mapper.local_table)

        def create_trigger_event(table):
            # after_create listeners receive extra keyword arguments, hence **kw
            @listens_for(table, "after_create")
            def create_trigger(target, connection, **kw):
                print('Create trigger')

    return PostgresQueueBase
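The root cause described above (after_create is skipped when the table already exists) can be reproduced in isolation. The following is a small sketch, assuming an in-memory SQLite database and a hypothetical one-column table named demo_queue; it is not the queue model from the question.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, event

metadata = MetaData()
# hypothetical stand-in for the queue table
queue = Table("demo_queue", metadata, Column("id", Integer, primary_key=True))

fired = []

@event.listens_for(queue, "after_create")
def on_after_create(target, connection, **kw):
    # runs only when CREATE TABLE is actually emitted for this table
    fired.append(target.name)

engine = create_engine("sqlite://")
metadata.create_all(engine)  # table is missing: CREATE TABLE runs, listener fires
metadata.create_all(engine)  # table exists: creation is skipped, listener does not fire

print(fired)
```

When testing such a listener against a real database, dropping the table first (or using a fresh schema) makes sure the event is exercised.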

Is it possible to use session.insert for one-to-many relationships in SQLAlchemy?

I have read in the following link:
Sqlalchemy adding multiple records and potential constraint violation
that using the SQLAlchemy Core library to perform the inserts is a much faster option than the ORM's session.add() method:
i.e:
session.add()
should be replaced with:
session.execute(Entry.__table__.insert(), params=inserts)
In the following code I have tried to replace .add with .insert:
from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Department(Base):
    __tablename__ = 'department'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Employee(Base):
    __tablename__ = 'employee'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Use default=func.now() to set the default hiring time
    # of an Employee to be the current time when an
    # Employee record was created
    hired_on = Column(DateTime, default=func.now())
    department_id = Column(Integer, ForeignKey('department.id'))
    # Use cascade='delete,all' to propagate the deletion of a Department onto its Employees
    department = relationship(
        Department,
        backref=backref('employees',
                        uselist=True,
                        cascade='delete,all'))

from sqlalchemy import create_engine
engine = create_engine('postgres://blah:blah@blah:blah/blah')

from sqlalchemy.orm import sessionmaker
session = sessionmaker()
session.configure(bind=engine)
Base.metadata.create_all(engine)

d = Department(name="IT")
emp1 = Employee(name="John", department=d)
s = session()
s.add(d)
s.add(emp1)
s.commit()
s.delete(d) # Deleting the department also deletes all of its employees.
s.commit()
s.query(Employee).all()

# Insert Option Attempt
from sqlalchemy.dialects.postgresql import insert
d = insert(Department).values(name="IT")
d1 = d.on_conflict_do_nothing()
s.execute(d1)
emp1 = insert(Employee).values(name="John", department=d1)
emp1 = emp1.on_conflict_do_nothing()
s.execute(emp1)
The error I receive:
sqlalchemy.exc.CompileError: Unconsumed column names: department
I can't quite understand the syntax and how to do it in the right way, I'm new to the SQLAlchemy.
It looks my question is similar to How to get primary key columns in pd.DataFrame.to_sql insertion method for PostgreSQL "upsert"
, so potentially by answering either of our questions, you could help two people at the same time ;-)
I am new to SQLAlchemy as well, but this is what I found :
Using your exact code, adding department only didn't work using "s.execute(d1)", so I changed it to the below and it does work :
with engine.connect() as conn:
    d = insert(Department).values(name="IT")
    d1 = d.on_conflict_do_nothing()
    conn.execute(d1)
I found on SQLAlchemy documentation that in the past it was just a warning when you try to use a virtual column that doesn't really exist. But from version 0.8, it has been changed to an exception.
As a result, I am not sure if you can do that using the insert. I think that SQLAlchemy does it behind the scene in some other way when using session.add(). Maybe some experts can elaborate here.
I hope that will help.
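For what it's worth, the "Unconsumed column names: department" error is consistent with the fact that a Core insert() only accepts real table columns, and department is a relationship, not a column. A possible workaround, sketched here under assumptions (SQLAlchemy 1.4 or later, in-memory SQLite instead of PostgreSQL, no ON CONFLICT clause, and a second made-up employee "Jane"), is to insert the parent first and pass its primary key through the department_id foreign key column.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Department(Base):
    __tablename__ = "department"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Employee(Base):
    __tablename__ = "employee"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    department_id = Column(Integer, ForeignKey("department.id"))
    # ORM-only attribute: not a column, so Core insert() cannot consume it
    department = relationship(Department, backref="employees")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

dept_table = Department.__table__
emp_table = Employee.__table__

with engine.begin() as conn:
    # insert the parent row and read back its generated primary key
    result = conn.execute(dept_table.insert().values(name="IT"))
    dept_id = result.inserted_primary_key[0]

    # bulk insert the children, linking them through the foreign key column
    conn.execute(
        emp_table.insert(),
        [
            {"name": "John", "department_id": dept_id},
            {"name": "Jane", "department_id": dept_id},
        ],
    )

    rows = [
        tuple(row)
        for row in conn.execute(
            select(emp_table.c.name, emp_table.c.department_id).order_by(emp_table.c.id)
        )
    ]

print(rows)
```

Note that Core inserts bypass ORM features such as relationship cascades, so any linking has to be done by hand through the key columns.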

Python SQLalchemy access huge DB data without creating models

I am using Flask, Python and SQLAlchemy to connect to a huge DB where a lot of stats are saved. I need to create some useful insights from these stats, so I only need to read the data and never modify it.
The issue I have now is the following:
Before I can access a table I need to replicate the table in my models file. For example I see the table Login_Data in the DB. So I go into my models and recreate the exact same table.
class Login_Data(Base):
    __tablename__ = 'login_data'
    id = Column(Integer, primary_key=True)
    date = Column(Date, nullable=False)
    new_users = Column(Integer, nullable=True)

    def __init__(self, date=None, new_users=None):
        self.date = date
        self.new_users = new_users

    def get(self, id):
        if self.id == id:
            return self
        else:
            return None

    def __repr__(self):
        return '<%s(%r, %r, %r)>' % (self.__class__.__name__, self.id, self.date, self.new_users)
I do this because otherwise I can't query it using:
some_data = Login_Data.query.limit(10)
But this feels unnecessary; there must be a better way. What's the point in recreating the models if the tables are already defined? What shall I use here:
some_data = [SOMETHING HERE SO I DONT NEED TO RECREATE THE TABLE].query.limit(10)
Simple question but I have not found a solution yet.
Thanks to Tryph for the right sources.
To access the data of an existing DB with SQLAlchemy you need to use automap. In the configuration file where you load/declare your DB type, use automap_base(). After that you can create your models and use the correct table names of the DB without specifying everything yourself:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
import stats_config
Base = automap_base()
engine = create_engine(stats_config.DB_URI, convert_unicode=True)
# reflect the tables
Base.prepare(engine, reflect=True)
# mapped classes are now created with names by default
# matching that of the table name.
LoginData = Base.classes.login_data
db_session = Session(engine)
After this is done you can now use all your known sqlalchemy functions on:
some_data = db_session.query(LoginData).limit(10)
You may be interested in reflection and automap.
Unfortunately, since I have never used any of those features, I am not able to tell you more about them. I just know that they allow you to use the database schema without explicitly declaring it in Python.
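As a self-contained illustration of the automap approach, here is a sketch that assumes SQLAlchemy 1.4 or later, where Base.prepare(autoload_with=engine) is the current spelling of the reflect=True call shown above. The in-memory SQLite database and the two sample rows are made up for the example; in practice the engine would point at the existing stats database.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")

# stand-in for the pre-existing database: a table created outside the ORM
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE login_data ("
        "id INTEGER PRIMARY KEY, date DATE NOT NULL, new_users INTEGER)"
    ))
    conn.execute(text(
        "INSERT INTO login_data (date, new_users) "
        "VALUES ('2020-01-01', 5), ('2020-01-02', 7)"
    ))

# reflect the schema and generate mapped classes from it
Base = automap_base()
Base.prepare(autoload_with=engine)
LoginData = Base.classes.login_data  # class name follows the table name

with Session(engine) as session:
    rows = session.query(LoginData).order_by(LoginData.id).limit(10).all()
    new_users = [row.new_users for row in rows]

print(new_users)
```

Automap only maps tables that have a primary key, so tables without one would need plain reflection into Table objects instead.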

Instantiating object automatically adds to SQLAlchemy Session. Why?

From my understanding of SQLAlchemy, in order to add a model to a session, I need to call session.add(obj). However, for some reason, in my code, SQLAlchemy seems to do this automatically.
Why is it doing this, and how can I stop it? Am I approaching session in the correct way?
example
>>> from database import Session as db
>>> import clients
>>> from instances import Instance
>>> from uuid import uuid4
>>> len(db.query(Instance).all())
0  # Note, no instances in database/session
>>> i = Instance(str(uuid4()), clients.get_by_code('AAA001'), [str(uuid4())])
>>> len(db.query(Instance).all())
1  # Why?? I never called db.add(i)!
database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
import config

Base = declarative_base()

class Database():
    def __init__(self):
        db_url = 'postgresql://{:s}:{:s}@{:s}:{}/{:s}'.format(
            config.database['user'],
            config.database['password'],
            config.database['host'],
            config.database['port'],
            config.database['dbname']
        )
        self.engine = create_engine(db_url)
        session_factory = sessionmaker(bind=self.engine)
        self.session = scoped_session(session_factory)

Database = Database()
Session = Database.session
instance.py
from sqlalchemy import Column, Text, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.dialects.postgresql import UUID, ARRAY
import database

Base = database.Base

class Instance(Base):
    __tablename__ = 'instances'
    uuid = Column(UUID, primary_key=True)
    client_code = Column(
        Text, ForeignKey('clients.code', ondelete='CASCADE'), nullable=False)
    mac_addresses = Column(ARRAY(Text, as_tuple=True),
                           primary_key=True)
    client = relationship("Client", back_populates="instances")

    def __init__(self, uuid, client, mac_addresses):
        self.uuid = uuid
        self.client = client
        self.mac_addresses = tuple(mac_addresses)
client.py
from sqlalchemy import Column, Text
from sqlalchemy.orm import relationship
import database
from database import Session as db

Base = database.Base

class Client(Base):
    __tablename__ = 'clients'
    code = Column(Text, primary_key=True)
    name = Column(Text)
    instances = relationship("Instance", back_populates='client')

    def __init__(self, code, name=None):
        self.code = code
        self.name = name

def get_by_code(code):
    client = db.query(Client).filter(Client.code == code).first()
    return client
When you create a SQLAlchemy object and link it directly to another SQLAlchemy object, both objects end up in the session.
The reason is that SQLAlchemy needs to make sure you can query these objects.
Take, for example, a user with addresses.
If you create a user in code, with an address, both the user and the address end up in the session, because the address is linked to the user and SQLAlchemy needs to make sure you can query all addresses of a user using user.addresses.all().
In that case all (possibly) existing addresses need to be fetched, as well as the new address you just added. For that purpose the newly added address needs to be saved in the database.
To prevent this from happening (for example if you only need objects to just calculate with), you can link the objects with their IDs/Foreign Keys:
address.user_id = user.user_id
However, if you do this, you won't be able to access the SQLAlchemy properties anymore. So user.addresses or address.user will no longer yield results.
The reverse is also true; I asked a question myself a while back why linking two objects by ID will not result in SQLAlchemy linking these objects in the ORM:
relevant stackoverflow question
another description of this behavior
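The behavior described in this answer comes from the relationship's save-update cascade, and it can be observed without touching the database. The sketch below is an illustration under assumptions: simplified Client and Instance models (an integer id instead of the UUID/ARRAY columns), in-memory SQLite, and an explicit append on the collection side. Linking through the many-to-one side, as the question's constructor does, is believed to cascade the same way in SQLAlchemy 1.x, while 2.0 reportedly no longer cascades in that direction, so the collection side is used here to keep the sketch independent of the version.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Client(Base):
    __tablename__ = "clients"
    code = Column(Text, primary_key=True)
    instances = relationship("Instance", back_populates="client")

class Instance(Base):
    __tablename__ = "instances"
    id = Column(Integer, primary_key=True)  # simplified key for the sketch
    client_code = Column(Text, ForeignKey("clients.code"), nullable=False)
    client = relationship("Client", back_populates="instances")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
client = Client(code="AAA001")
session.add(client)
session.commit()

# linking through the relationship pulls the new object into the session
linked = Instance()
client.instances.append(linked)
linked_in_session = linked in session

# linking through the foreign key value only: the object stays detached
by_fk = Instance(client_code=client.code)
by_fk_in_session = by_fk in session

print(linked_in_session, by_fk_in_session)
session.close()
```

Once an object is pending in the session, the next query triggers an autoflush, which is why the count in the question changes without an explicit add().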

How to set up global connection to database?

I have a problem with setting up a database connection. I want to set up a connection that is visible in all my controllers.
Now I use something like this in my controller:
db = create_engine('mysql://root:password@localhost/python')
metadata = MetaData(db)
email_list = Table('email', metadata, autoload=True)
In development.ini I have:
sqlalchemy.url = mysql://root:password@localhost/python
sqlalchemy.pool_recycle = 3600
How do I set up __init__.py?
I hope you got Pylons working; for anyone else who may later read this question, I'll present some pointers in the right direction.
First of all, you are only creating an engine and a metadata object. While you can use the engine to create connections directly, you would almost always use a Session to manage querying and updating your database.
Pylons automatically sets this up for you by creating an engine from your configuration file, then passing it to yourproject.model.__init__.py:init_model(), which binds it to a scoped_session object.
This scoped_session object is available from yourproject.model.meta and is the object you would use to query your database. For example:
record = meta.Session.query(model.MyTable).filter_by(id=42).first()
Because it is a scoped_session, it automatically creates a Session object and associates it with the current thread if one doesn't already exist. The scoped_session passes all actions (.query(), .add(), .delete()) down to the real Session object and thus gives you a simple way to interact with the database without having to manage the non-thread-safe Session object explicitly.
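The thread-local behavior of scoped_session can be checked outside of Pylons. This is a small sketch assuming an in-memory SQLite engine and a module-level registry named Session, mirroring what yourproject.model.meta is described as providing; it is not Pylons' own setup code.

```python
import threading

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))  # thread-local registry

# within one thread, the registry always hands back the same Session
same_in_thread = Session() is Session()

seen_in_worker = []

def worker():
    # a different thread gets its own Session object
    seen_in_worker.append(Session())
    Session.remove()  # discard this thread's Session when done

thread = threading.Thread(target=worker)
thread.start()
thread.join()
different_across_threads = seen_in_worker[0] is not Session()

# calls on the registry are proxied to the current thread's Session
value = Session.execute(text("select 3 + 4")).scalar()
Session.remove()

print(same_in_thread, different_across_threads, value)
```

In a web application the remove() call typically happens at the end of each request, so that the next request handled by the same thread starts with a clean Session.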
The scoped_session object, Session, from yourproject.model.meta is automatically associated with a metadata object created as either yourproject.model.meta:metadata (in Pylons 0.9.7 and below) or yourproject.model.meta:Base.metadata (in Pylons 1.0). Use this metadata object to define your tables. As you can see, in newer versions of Pylons the metadata is associated with a declarative_base() object named Base, which allows you to use SQLAlchemy's declarative style.
Using this from the controller
from yourproject import model
from yourproject.model import Session

class MyController(..):
    def resource(self):
        # comparisons in filter() use ==, not =
        result = Session.query(model.email_list).\
            filter(model.email_list.c.id == 42).one()
        return str(result)
Use real connections
If you really want to get a connection object simply use
from yourproject.model import Session
connection = Session.connection()
result = connection.execute("select 3+4;")
# more connection executions
Session.commit()
However this is all good, but what you should be doing is...
This leaves out that you are not really using SqlAlchemy much. The power of SqlAlchemy really shines when you start mapping your database tables to python classes. So anyone looking into using pylons with a database should take a serious look at what you can do with SqlAlchemy. If SqlAlchemy starts out intimidating simply start out with using its declarative approach, which should be enough for almost all pylons apps.
In your model instead of defining Table constructs, do this:
from sqlalchemy import Column, Integer, Unicode, ForeignKey
from sqlalchemy.orm import relation
from yourproject.model.meta import Base, Session

class User(Base):
    __tablename__ = 'users'
    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    # nullable defaults to True
    name = Column(Unicode, nullable=False)
    notes = relation("UserNote", backref="user")
    query = Session.query_property()

class UserNote(Base):
    __tablename__ = 'usernotes'
    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    # ForeignKey refers to table.column and must precede keyword arguments
    userid = Column(Integer, ForeignKey("users.id"), index=True)
    # nullable defaults to True
    text = Column(Unicode, nullable=False)
    query = Session.query_property()
Note the query objects. These are smart objects that live on the class and associate your classes with the scoped_session(), Session. This allows you to extract data from your database even more easily.
from sqlalchemy.orm import eagerload

def resource(self):
    user = User.query.filter(User.id==42).options(eagerload("notes")).one()
    return "\n".join([ x.text for x in user.notes ])
Version 1.0 of Pylons uses the declarative syntax.
In model/__init__.py you can write something like this:
from your_programm.model.meta import Session, Base
from sqlalchemy import *
from sqlalchemy.types import *

def init_model(engine):
    Session.configure(bind=engine)

class Foo(Base) :
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    ...
What you want to do is modify the Globals class in your app_globals.py file to include a .engine (or whatever) attribute. Then, in your controllers, you use from pylons import app_globals and app_globals.engine to access the engine (or metadata, session, scoped_session, etc...).
