I'm learning SQLAlchemy and I want to make sure that I've understood the backref parameter in relationship correctly.
For example
from app import db
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), unique=True)
    posts = db.relationship('Post', backref='author', lazy=True)

class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    body = db.Column(db.String(140))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
Say I have a User object j = models.User.query.get(1). My question is, is there any difference between the following things?
j.posts
Post.query.filter_by(author=j).all()
Post.query.with_parent(j).all()
Post.query.with_parent(j, property='posts').all()
Post.query.with_parent(j, property=User.posts).all()
The results returned are same, but I don't know whether the SQL statements executed are identical.
What I've tried
The SQLAlchemy docs say:
with_parent(instance, property=None, from_entity=None)
...the given property can be None, in which case a search is performed against this Query object’s target mapper.
So the last three statements seem to be the same, but I don't really understand what "this Query object’s target mapper" refers to. Is it Post in this case, because the query is performed on Post?
Even if the generated SQL statements are identical, the commands you listed may affect your application differently: j.posts caches the results it has loaded (memoization, not to be confused with Werkzeug caching), while the others fetch them from the database every single time.
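A rough way to observe this caching is to count the SELECT statements that reach the database. The sketch below approximates the question's models with plain SQLAlchemy (1.4 or newer is assumed) instead of Flask-SQLAlchemy, on an in-memory SQLite database. The `statements` list and the `selects_on_post` helper are names made up for this illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    username = Column(String(64), unique=True)
    posts = relationship("Post", backref="author", lazy=True)


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    body = Column(String(140))
    user_id = Column(Integer, ForeignKey("user.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(User(username="test", posts=[Post(body="some body")]))
session.commit()

statements = []  # every SQL string sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


def selects_on_post():
    # Hypothetical helper: count the statements that read from the post table.
    return sum(1 for s in statements if "FROM post" in s)


j = session.get(User, 1)

len(j.posts)                 # first access: lazy load emits one SELECT
first = selects_on_post()
len(j.posts)                 # second access: served from the loaded collection
second = selects_on_post()

session.query(Post).filter_by(author=j).all()  # each call goes to the database
session.query(Post).filter_by(author=j).all()
third = selects_on_post()

print(first, second, third)
```

The counter should stay the same between the two `j.posts` accesses and grow by one for each explicit query.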
If you remove .all() from your queries you can simply print them:
query = Post.query.filter_by(author=j)
print(query)
Which would result in:
SELECT post.id AS post_id, post.body AS post_body, post.user_id AS post_user_id
FROM post
WHERE ? = post.user_id
Using .all() is essentially like getting [m for m in query].
The trick with query-printing will not work for j.posts which will return something like:
> print(j.posts)
> [Post(...), Post(..)]
Still, you can see all the silently emitted queries using built-in sqlalchemy loggers. See the following code:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.engine import Engine
from sqlalchemy import event
import logging
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/tests.db'
db = SQLAlchemy(app)
logging.basicConfig()
logger = logging.getLogger('sqlalchemy.engine')
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), unique=True)
    posts = db.relationship('Post', backref='author', lazy=True)

class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    body = db.Column(db.String(140))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
db.drop_all()
db.create_all()
user = User(username='test', posts=[Post(body='some body')])
db.session.add(user)
db.session.commit()
# start logging
logger.setLevel(logging.DEBUG)
j = User.query.get(1)
# a list (rather than a set) keeps the output order stable between runs
queries = [
    "j.posts",
    "Post.query.filter_by(author=j)",
    "Post.query.with_parent(j)",
    "Post.query.with_parent(j, property='posts')",
    "Post.query.with_parent(j, property=User.posts)",
]

def test_queries():
    for name in queries:
        print('\n=======')
        print('Executing %s:' % name)
        query = eval(name)
        print(query)
test_queries() # you should see j.posts query here
print('Second test')
test_queries() # but not here
Getting back to your question: yes, the emitted SQL queries are identical.
In "this Query object’s target mapper", the query's target is Post in your example. To unpack this: when you declare the Post class, inheriting from db.Model, SQLAlchemy maps the attributes of that class to the columns of a table created for it.
Underneath there is an instance of Mapper class, which is responsible for the mapping for every single model that you create (learn more about mapping here: Types of Mappings). You can simply get this mapper calling class_mapper on your model or object_mapper on an instance of your model:
from sqlalchemy.orm import object_mapper, class_mapper
from sqlalchemy.orm.mapper import Mapper
assert object_mapper(j) is class_mapper(User)
assert type(class_mapper(User)) is Mapper
The Mapper has all the necessary information about the columns and relations you have in your model. When calling Post.query.with_parent(j) this information is used to find a property (i.e. relationship) relating Post and User objects, so in your case to populate 'property' with User.posts.
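To make the mapper's role a little more concrete, here is a small sketch that lists the relationships each mapper knows about. It uses plain SQLAlchemy (1.4 or newer assumed) rather than Flask-SQLAlchemy, and the `user_rels` / `post_rels` names are only for illustration. No database connection is needed, since only mapping metadata is inspected.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, inspect
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    username = Column(String(64), unique=True)
    posts = relationship("Post", backref="author", lazy=True)


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    body = Column(String(140))
    user_id = Column(Integer, ForeignKey("user.id"))


# inspect() on a mapped class returns its Mapper; .relationships maps each
# relationship name to a property that knows the class on the other side.
user_rels = {
    key: rel.mapper.class_.__name__
    for key, rel in inspect(User).relationships.items()
}
post_rels = {
    key: rel.mapper.class_.__name__
    for key, rel in inspect(Post).relationships.items()
}

print(user_rels)  # the relationship declared on User
print(post_rels)  # the 'author' backref that SQLAlchemy added to Post
```

This is the kind of information with_parent can consult when no property is passed explicitly.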
To see the queries, create the engine with echo=True, run your Python script with -i, and then run each query individually; SQLAlchemy will print the SQL it executes.
Example:
main.py:
import sqlalchemy
from sqlalchemy import create_engine, Column, Integer, String, Sequence
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
import os
engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, Sequence('user_id_seq'), primary_key=True)
    name = Column(String(50))
    fullname = Column(String(50))
    password = Column(String(12))

    def __repr__(self):
        return "< User(name={}, fullname={}, password={} )>".format(self.name, self.fullname, self.password)
Base.metadata.create_all(engine)
ed_user= User(name='ed', fullname='Ed Jones', password='edpassword')
Session = sessionmaker(bind=engine, autoflush=False)
session = Session()
session.add(ed_user)
session.add_all([
    User(name='wendy', fullname='Wendy Williams', password='foobar'),
    User(name='mary', fullname='Mary Contraty', password='xxg527'),
    User(name='fred', fullname='Fred Flinstone', password='blah')
])
session.commit()
os.system('clear')
Now you run it with python -i main.py, type: session.query(User).filter_by(name='ed').first() and you will see the SQL generated. After running all of your tests I concluded that they are all identical. With this method you can test any query and see if there is any difference.
p.s. I added the os.system('clear') to remove all the unnecessary output from creating the database and some other stuff.
Related
Let's say I have the following SQLAlchemy model.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column
from sqlalchemy.types import String, Integer
Base = declarative_base()
class User(Base):
    __tablename__ = "users"  # declarative expects __tablename__ for a table name
    id = Column(Integer, primary_key=True, autoincrement=True)
    first_name = Column(String)
    last_name = Column(String)
    age = Column(Integer)

    def as_dict(self):
        return {key: getattr(self, key) for key in self.__mapper__.c.keys()}
And I make a query to my database like so:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from user import User
engine = create_engine("sqlite:///:memory:")
Session = sessionmaker(bind=engine)
session = Session()
# Add something to database
user = User(first_name="John", last_name="Doe", age=38)
session.add(user)
session.commit()
# query for the result now
result = session.query(User).filter_by(id=1).first()
session.expunge_all()
Now my question is, is it a bad practice to modify result after it has been expunged from the session if I just want to serialize and send it back to the client?
e.g.
# ..continuing from above
import json

result.age = result.age + 4
print(json.dumps(result.as_dict()))
It's not bad practice if you have a legitimate reason for doing so, although it may be confusing if the data returned from your application differs from the data in your database.
Consider your example of a user with a first_name and last_name: it's perfectly legitimate to combine those and return the result as name. Your business case may allow users to customise how they wish to be identified and greeted, some preferring first name only and others first and last. In this case it's good practice to perform this logic on the server; otherwise it becomes the responsibility of the client application, where it may have to be repeated in several places.
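One way to keep such derived values out of the mapped object is to build them on the serialized dict instead. The sketch below assumes SQLAlchemy 1.4 or newer; `to_payload` is a hypothetical helper name, and the model is a trimmed copy of the one in the question. The instance is never added to a session, which is enough to show the idea.

```python
import json

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True, autoincrement=True)
    first_name = Column(String)
    last_name = Column(String)
    age = Column(Integer)

    def as_dict(self):
        return {key: getattr(self, key) for key in self.__mapper__.c.keys()}


def to_payload(user):
    """Hypothetical helper: build the response dict without touching the entity."""
    payload = user.as_dict()
    # Derived, presentation-only field; the mapped attributes stay unchanged.
    payload["name"] = "{} {}".format(user.first_name, user.last_name)
    return payload


user = User(first_name="John", last_name="Doe", age=38)
payload = to_payload(user)
print(json.dumps(payload))
```

Because only the dict is modified, the object still mirrors what is stored in the database, whether or not it is attached to a session.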
When I have two objects linked with a "relation" in SQLAlchemy, I realised that simply assigning to that relation is not enough to propagate the values to the other object. For example (see below), suppose I have a "user" table and a "contact" table (both are highly contrived, but demonstrate the issue well), where a "user" can have multiple "contacts" and a foreign key links contacts to users. If I create an instance of both User and Contact and later assign the user to the contact, I would expect the FK attributes to be updated (even without a DB flush), but they are not. Why? And how can I tell SA to do this automatically?
This would be something I would expect to work, but as you can see in the full example below, it does not:
user = User(name='a', lname='b')
contact = Contact(type='email', value='foo#bar.com')
contact.user = user
assert contact.username == 'a' # <-- Fails because the attribute is still `None`
Full runnable example:
"""
This code example shows two tables related to each other by a composite key,
using an SQLAlchemy "relation" to allow easy access to related items.
However, as the last few lines show, simply assigning an object A to the
relation of object B does not update the attributes of object B until at least
a "flush" is called.
"""
from sqlalchemy import Column, ForeignKeyConstraint, Unicode, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relation, sessionmaker
Base = declarative_base()
class User(Base):
    __tablename__ = "user"
    name = Column(Unicode, primary_key=True)
    lname = Column(Unicode, primary_key=True)

class Contact(Base):
    __tablename__ = "contact"
    __table_args__ = (
        ForeignKeyConstraint(
            ['username', 'userlname'],
            ['user.name', 'user.lname']
        ),
    )
    username = Column(Unicode, primary_key=True)
    userlname = Column(Unicode, primary_key=True)
    type = Column(Unicode)
    value = Column(Unicode)
    user = relation(User)
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
user = User(name="John", lname="Doe")
contact = Contact(type='email', value='john.doe#example.com')
contact.user = user # <-- How can I tell SA to set the FKs on *contact* here?
session.add(contact)
print('Before flush: contact.username user=%r' % contact.username)
session.flush()
print('After flush : contact.username user=%r' % contact.username)
According to this answer - https://stackoverflow.com/a/52911047/4981223 it is not possible:
The FK of the child object isn't updated until you issue a flush() either explicitly or through a commit(). I think the reason for this is that if the parent object of a relationship is also a new instance with an auto-increment PK, SQLAlchemy needs to get the PK from the database before it can update the FK on the child object (but I stand to be corrected!).
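The behaviour described above can be checked with a condensed version of the question's example. This is only a sketch, assuming SQLAlchemy 1.4 or newer (so `relationship` is used in place of the older `relation` name) and an in-memory SQLite database. The variable names `fk_before`, `via_relationship` and `fk_after` are made up for illustration. It also shows that the related object itself is reachable through the relationship before any flush, which may be enough in many cases.

```python
from sqlalchemy import Column, ForeignKeyConstraint, Unicode, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    name = Column(Unicode, primary_key=True)
    lname = Column(Unicode, primary_key=True)


class Contact(Base):
    __tablename__ = "contact"
    __table_args__ = (
        ForeignKeyConstraint(["username", "userlname"], ["user.name", "user.lname"]),
    )
    username = Column(Unicode, primary_key=True)
    userlname = Column(Unicode, primary_key=True)
    type = Column(Unicode)
    value = Column(Unicode)
    user = relationship(User)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

user = User(name="John", lname="Doe")
contact = Contact(type="email", value="john.doe@example.com")
contact.user = user
session.add(contact)

fk_before = contact.username          # FK column attribute, not yet synchronised
via_relationship = contact.user.name  # the related object is already reachable
session.flush()                       # the flush copies the parent's key into the FK
fk_after = contact.username

print(fk_before, via_relationship, fk_after)
```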
I have two separate SQLAlchemy interfaces to a Postgres database. The first interface, in the context of a Flask App, contains this model:
app = create_app() # sets the SQLAlchemy Database URI, etc.
db = SQLAlchemy(app)
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    created_at = db.Column(db.DateTime, default=datetime.datetime.utcnow)
    updated_at = db.Column(db.DateTime, onupdate=datetime.datetime.utcnow)
    name = db.Column(db.String, nullable=False)
The second interface is not through Flask -- rather, it's a script that listens for a particular event, in which case it is meant to perform some computations and update a row in the database. To accomplish this, I have SQLAlchemy reflect the existent database:
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
from sqlalchemy.ext.automap import automap_base
from os import environ
dbPath = "postgresql://" + ...
engine = create_engine(dbPath)
Base = automap_base()
Base.prepare(engine, reflect=True)
metadata = MetaData(engine)
class User(object):
    pass
users = Table('user', metadata, autoload=True, autoload_with=engine)
mapper(User, users)
Session = sessionmaker(bind=engine)
session = Session()
The issue I'm now running into is this: when I'm using the first interface to create a new entry or update one, things work as expected, and the created_at and updated_at fields are updated appropriately.
However, when I'm using the second interface -- importing the code and using session.query(User) to get an entry and to update it, the updated_at field doesn't change. Moreover, when I'm using this interface to create a new User, while it creates the new row as expected, it populates neither the created_at nor updated_at fields.
My questions:
Why is this happening? Why does the reflection seemingly break the default/onupdate methods?
How can I fix this?
default and onupdate are handled entirely client side in Python and so cannot be reflected from the DB. See "Limitations of Reflection". In case of default you could use server_default:
from sqlalchemy import text

class User(db.Model):
    ...
    created_at = db.Column(db.DateTime,
                           server_default=text("now() at time zone 'UTC'"))
and for onupdate you'd have to write a DB trigger and use server_onupdate=FetchedValue().
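The reflection limitation can be observed directly. The sketch below is an illustration under assumptions rather than the asker's setup: it uses an in-memory SQLite database instead of Postgres, SQLAlchemy 1.4 or newer, and a made-up `account` table with one client-side default and one server-side default. After reflecting the table, only the server-side default is still present.

```python
import datetime

from sqlalchemy import (
    Column, DateTime, Integer, MetaData, String, Table,
    create_engine, select, text,
)

engine = create_engine("sqlite://")

# Table as declared in application code (hypothetical example table).
declared = MetaData()
Table(
    "account",
    declared,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    # Client-side default: lives only in this Python definition.
    Column("created_client", DateTime, default=datetime.datetime.now),
    # Server-side default: becomes part of the CREATE TABLE statement.
    Column("created_server", DateTime, server_default=text("CURRENT_TIMESTAMP")),
)
declared.create_all(engine)

# A second, independent MetaData that learns the table from the database.
reflected = Table("account", MetaData(), autoload_with=engine)

client_default_survived = reflected.c.created_client.default is not None
server_default_survived = reflected.c.created_server.server_default is not None

with engine.begin() as conn:
    conn.execute(reflected.insert().values(name="example"))
    row = conn.execute(select(reflected)).one()

print(client_default_survived, server_default_survived)
print(row.created_client, row.created_server)
```

Inserting through the reflected table leaves `created_client` empty, while the database itself fills in `created_server`.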
On the other hand you could avoid all that and just separate your models from your application code to a module, used by both your Flask application and your script. This would of course be a bit more involved as you'd have to use vanilla SQLAlchemy declarative instead of the customized db.Model base of Flask-SQLAlchemy. Or, you could use custom commands with Flask to implement your scripts, which would allow using the Flask-SQLAlchemy extensions.
I have a fairly simple flask app connected to a postgresql database. I am mainly using the flask app with flask-admin so that I can add records to the database and perhaps build it out into a dashboard later. It's an internal use catalog, basically.
What I am trying to do is also write a script that connects to a third party API to add/update records in the database, so it does not go through the flask app. I am using SQLAlchemy to do this because it's consistent with the app and I just need something to work without fussing over SQL statements.
The flask app's data model is defined as such:
app.py
from flask import Flask, render_template, request
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.dialects import postgresql
from flask_admin import Admin
# ... APPLICATION CONFIGURATION ...
# db Models
## Table for many to many
keywords = db.Table('keywords',
    db.Column('keyword_id', db.Integer, db.ForeignKey('keyword.id')),
    db.Column('dataset_id', db.String(24), db.ForeignKey('dataset.dataset_id')),
)

## Model classes
class Dataset(db.Model):
    title = db.Column(db.String(120))
    description = db.Column(db.Text())
    dataset_id = db.Column(db.String(24), primary_key=True, unique=True)
    #relationships
    dataset_documentation = db.relationship('DataDocument', backref='dataset', lazy='dynamic')
    keywords = db.relationship('Keyword', secondary=keywords, backref='dataset', lazy='dynamic')

    def __str__(self):
        return self.title

class Keyword(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(80))

    def __str__(self):
        return self.keyword

class DataDocument(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    document = db.Column(db.String(120))
    dataset_id = db.Column(db.String(24), db.ForeignKey('dataset.dataset_id'))

    def __str__(self):
        return self.document
# ... APPLICATION VIEWS ...
So we have datasets with some basic metadata and they have a one to many relationship with a filepath to a document and a many to many relationship to any number of keywords.
The separate script is connecting directly to the database, and mapping existing tables to objects that I can use to create a session and modify the database.
script.py
import config #local config only
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
# Connecting to postgres database and creating a session with database objects, instantiate empty classes to populate
class Dataset(object):
    pass

class DataDocument(object):
    pass

class Keyword(object):
    pass
## How to instantiate the MTM association table?
db_uri = config.SQLALCHEMY_DATABASE_URI
engine = create_engine(db_uri)
meta = MetaData(engine)
dataset_table = Table('dataset', meta, autoload=True) #correct
datadocument_table = Table('dataset', meta, autoload=True) #incorrect?
keyword_table = Table('keyword', meta, autoload=True) #incorrect?
mapper(Dataset, dataset_table) #correct
mapper(DataDocument, datadocument_table, meta, autoload=True) #??
mapper(Keyword, keyword_table, meta, autoload=True) #??
Session = sessionmaker(bind=engine)
session = Session()
# sample update
data_upsert = Dataset()
data_upsert.title = "Some title"
data_upsert.dataset_id = "Uniq_ID-123"
data_upsert.description = "lorem ipsum foo bar foo"
session.merge(data_upsert)
#attempt to add related properties
key1 = Keyword('test1')
key2 = Keyword('test2')
datadoc = DataDocument('path/to/document.txt')
# FAIL.
data_upsert.append(key1)
data_upsert.append(key2)
data_upsert.append(datadoc)
session.flush()
I am a newbie with sqlalchemy and I can just barely wrap my head around creating the Dataset object in the script from the database engine. But I was thinking in loading the Keyword and Datadocument tables as well that it would already understand the relationships based on what it is loading from the database, but this is where my understanding is running thin.
Is there a straightforward way to complete the picture here? I am assuming it doesn't make sense to define my models again explicitly in script.py, but in reviewing documentation and some tutorials, I am not seeing the missing pieces of loading these relationships into the session so that I can ingest all of the data into the database.
Update your model definitions to add constructor functions. Doing so lets you pass the parameters to the object upon instantiation.
models.py
## Model classes
class Dataset(db.Model):
    title = db.Column(db.String(120))
    description = db.Column(db.Text())
    dataset_id = db.Column(db.String(24), primary_key=True, unique=True)
    #relationships
    dataset_documentation = db.relationship('DataDocument', backref='dataset', lazy='dynamic')
    keywords = db.relationship('Keyword', secondary=keywords, backref='dataset', lazy='dynamic')

    def __init__(self, title=None, desc=None, dataset_id=None):
        self.title = title
        self.description = desc
        self.dataset_id = dataset_id

    def __str__(self):
        return self.title

class Keyword(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(80))

    def __init__(self, keyword=None):
        self.keyword = keyword

    def __str__(self):
        return self.keyword

class DataDocument(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    document = db.Column(db.String(120))
    dataset_id = db.Column(db.String(24), db.ForeignKey('dataset.dataset_id'))

    def __init__(self, document, dataset_id):
        self.document = document
        self.dataset_id = dataset_id

    def __str__(self):
        return self.document
No need to define the model classes again in the script.py. You can simply import the classes you want to use from the models.py. Then you can insert the data object with its related objects altogether into the database in this way:
script.py
import config #local config only
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from models import Dataset, DataDocument, Keyword
def loadSession(engine):
    """Create and return a new session bound to the given engine."""
    Session = sessionmaker(bind=engine)
    session = Session()
    return session
engine = create_engine(config.SQLALCHEMY_DATABASE_URI, echo=False)
Base = declarative_base(engine)
# load session
session = loadSession(engine)
data_upsert = Dataset(title="Some title", dataset_id="Uniq_ID-125", desc="lorem ipsum foo bar foo")
# add related properties here
key1 = Keyword('test1')
key2 = Keyword('test2')
datadoc = DataDocument('path/to/document.txt', dataset_id="Uniq_ID-125")
# append the properties to the object
data_upsert.dataset_documentation.append(datadoc)
data_upsert.keywords.append(key1)
data_upsert.keywords.append(key2)
session.add(data_upsert)
session.commit()
I've tested the code locally and hope it works for you.
I'm using a bidirectional association_proxy to associate properties Group.members and User.groups. I'm having issues with removing a member from Group.members. In particular, Group.members.remove will successfully remove an entry from Group.members, but will leave a None in place of the corresponding entry in User.groups.
More concretely, the following (minimal-ish) representative code snippet fails its last assertion:
import sqlalchemy as sa
from sqlalchemy.orm import Session
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Group(Base):
    __tablename__ = 'group'
    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    name = sa.Column(sa.UnicodeText())
    members = association_proxy('group_memberships', 'user',
                                creator=lambda user: GroupMembership(user=user))

class User(Base):
    __tablename__ = 'user'
    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    username = sa.Column(sa.UnicodeText())
    groups = association_proxy('group_memberships', 'group',
                               creator=lambda group: GroupMembership(group=group))

class GroupMembership(Base):
    __tablename__ = 'user_group'
    user_id = sa.Column(sa.Integer, sa.ForeignKey('user.id'), primary_key=True)
    group_id = sa.Column(sa.Integer, sa.ForeignKey('group.id'), primary_key=True)
    user = sa.orm.relationship(
        'User',
        backref=sa.orm.backref('group_memberships', cascade="all, delete-orphan"))
    group = sa.orm.relationship(
        'Group',
        backref=sa.orm.backref('group_memberships', cascade="all, delete-orphan"),
        order_by='Group.name')

if __name__ == '__main__':
    engine = sa.create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = Session(engine)
    group = Group(name='group name')
    user = User(username='user name')
    group.members.append(user)
    session.add(group)
    session.add(user)
    session.flush()
    assert group.members == [user]
    assert user.groups == [group]
    group.members.remove(user)
    session.flush()
    assert group.members == []
    assert user.groups == []  # This assertion fails, user.groups is [None]
I've tried to follow the answers to SQLAlchemy relationship with association_proxy problems and How can SQLAlchemy association_proxy be used bi-directionally? but they do not seem to help.
I discovered your problem almost entirely by accident, as I was trying to figure out what's going on.
Because there wasn't any data in the db, I added a session.commit(). It turns out that (from the linked answer):
The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).
Because you are just .flush()ing the changes, sqlalchemy never re-queries the database. You can verify this by adding:
import logging
logging.getLogger('sqlalchemy').setLevel(logging.INFO)
logging.getLogger('sqlalchemy').addHandler(logging.StreamHandler())
And then simply running your code. It will display all of the queries that are run as they happen. Then you can change session.flush() to session.commit() and then re-run, and you'll see that several SELECT statements are run after your commit.
It looks like either session.expire(user) or session.refresh(user) will force a refresh of the user, as well. I'm not sure if there's a way to force the update to propagate to the other object without being explicit about it (or if that's even desirable).
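As a rough sketch of the session.expire(user) route, the snippet below repeats the question's setup in condensed form and expires the user after the second flush, so that the next access reloads the collection from the database. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database. The order_by argument from the question is left out for brevity, and `groups_after_expire` is a name introduced here for illustration.

```python
import sqlalchemy as sa
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, backref, declarative_base, relationship

Base = declarative_base()


class Group(Base):
    __tablename__ = "group"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.UnicodeText())
    members = association_proxy(
        "group_memberships", "user",
        creator=lambda user: GroupMembership(user=user))


class User(Base):
    __tablename__ = "user"
    id = sa.Column(sa.Integer, primary_key=True)
    username = sa.Column(sa.UnicodeText())
    groups = association_proxy(
        "group_memberships", "group",
        creator=lambda group: GroupMembership(group=group))


class GroupMembership(Base):
    __tablename__ = "user_group"
    user_id = sa.Column(sa.Integer, sa.ForeignKey("user.id"), primary_key=True)
    group_id = sa.Column(sa.Integer, sa.ForeignKey("group.id"), primary_key=True)
    user = relationship(
        "User",
        backref=backref("group_memberships", cascade="all, delete-orphan"))
    group = relationship(
        "Group",
        backref=backref("group_memberships", cascade="all, delete-orphan"))


engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

group = Group(name="group name")
user = User(username="user name")
group.members.append(user)
session.add_all([group, user])
session.flush()

group.members.remove(user)
session.flush()  # the orphaned membership row is deleted here

# Discard the in-memory state of `user`; the next attribute access reloads it.
session.expire(user)
groups_after_expire = list(user.groups)
print(groups_after_expire)
```

Whether to expire explicitly or simply commit depends on where the transaction boundary belongs in the application.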