Enforcing uniqueness using SQLAlchemy association proxies - python

I'm trying to use association proxies to make dealing with tag-style records a little simpler, but I'm running into a problem enforcing uniqueness and getting objects to reuse existing tags rather than always create new ones.
Here is a setup similar to what I have. The examples in the documentation have a few recipes for enforcing uniqueness, but they all rely on having access to a session and usually require a single global session, which I cannot do in my case.
from sqlalchemy import Column, Integer, String, create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
def _tag_find_or_create(name):
    # can't use global objects here, may be multiple sessions and engines
    # ?? No access to session here, how to do a query
    tag = session.query(Tag).filter_by(name=name).first()
    tag = Tag.query.filter_by(name=name).first()
    if not tag:
        tag = Tag(name=name)
    return tag

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    tags = relationship('Tag', secondary='itemtag')
    tagnames = association_proxy('tags', 'name', creator=_tag_find_or_create)

class ItemTag(Base):
    __tablename__ = 'itemtag'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'))
    tag_id = Column(Integer, ForeignKey('tag.id'))

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
# Scenario 1
session = Session()
item = Item()
session.add(item)
item.tagnames.append('red')
# Scenario 2
item2 = Item()
item2.tagnames.append('blue')
item2.tagnames.append('red')
session.add(item2)
Without the creator function, I just get tons of duplicate Tag items. The creator function seems like the most obvious place to put this type of check, but I'm unsure how to do a query from inside the creator function.
Consider the two scenarios provided at the bottom of the example. In the first example, it seems like there should be a way to get access to the session in the creator function, since the object the tags are being added to is already associated with a session.
In the second example, the Item object isn't yet associated with a session, so the validation check can't happen in the creator function. It would have to happen later when the object is actually added to a session.
For the first scenario, how would I go about getting access to the session object in the creator function?
For the second scenario, is there a way to "listen" for when the parent object is added to a session and validate the association proxies at that point?

For the first scenario, you can use object_session.
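A minimal sketch of how that could look. object_session is the real SQLAlchemy helper, but the find_or_create_tag signature here is illustrative: the stock creator= hook only receives the tag name, not the parent Item, so you would need some extra plumbing to pass the parent through.

from sqlalchemy.orm import object_session

def find_or_create_tag(item, name):
    # object_session returns the Session an instance is attached to,
    # or None if the instance is transient/detached (scenario 2)
    session = object_session(item)
    tag = session.query(Tag).filter_by(name=name).first() if session else None
    return tag or Tag(name=name)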
As for the question overall: true, you need access to the current session. If using scoped_session in your application is appropriate, then the second part of the recipe you link to should work fine. See Contextual/Thread-local Sessions for more info.
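If a scoped_session does fit, the creator function can reach the registry directly. A minimal sketch, assuming a single thread-local registry is acceptable in your application:

from sqlalchemy.orm import scoped_session, sessionmaker

# Session() always returns the session for the current thread/scope
Session = scoped_session(sessionmaker(bind=engine))

def _tag_find_or_create(name):
    session = Session()
    tag = session.query(Tag).filter_by(name=name).first()
    if tag is None:
        tag = Tag(name=name)
        session.add(tag)  # add immediately, per the advice below
    return tag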
Working with events to catch objects as they change from transient to persistent state will not make your code pretty or very robust. So I would add new Tag objects to the session immediately; if the transaction is rolled back, they simply won't end up in the database.
Note that in a multi-user environment you are likely to hit a race condition: the same new tag is created simultaneously by two users, and the user who commits last will fail (if you have a unique constraint on the database).
In that case you might consider going without the unique constraint and running a (daily) procedure to clean up the duplicates (and reassign relations). Over time there would be fewer and fewer new tags, and fewer opportunities for such clashes.

Related

SQLAlchemy relationship not updated after flush

I have some models with a relationship defined between them like so:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True, nullable=False)
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True, nullable=False)
    father_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
If I add a child within the session (using session.add(Child(...))), I would expect its father's children relationship to update to include this child after flushing the session. However, I'm not seeing that.
parent = session.query(Parent).get(parent_id)
num_children = len(parent.children)
# num_children == 3, for example
session.add(Child(father_id=parent_id))
session.flush()
new_num_children = len(parent.children)
# new_num_children == 3, but it should be 4!
Any help would be much appreciated!
I can add the new child to the parent.children list directly and flush the session, but due to other existing code, I want to add it using session.add.
I can also commit after adding the child, which does correctly update the parent.children relationship, but I don't want to commit the transaction at that point.
I've tried adding a backref to the children relationship, but that doesn't seem to make any difference.
I've just run into this problem myself. SQLAlchemy does some internal memoisation to prevent it emitting a new SQL query every time you access a relationship. The problem is that it doesn't seem to realise that updating the foreign key directly could have an effect on the relationship. While SQLAlchemy probably could be patched to deal with this for simple joins, it would be very difficult for complex joins and I presume this is why it behaves the way it does.
When you do session.flush(), you're sending the changes back to the database, but SQLAlchemy doesn't realise it needs to query the database to update the relationship.
If you call session.expire_all() after the flush, then you force SQLAlchemy to reload every model instance and relationship when they're next accessed - this solves the problem.
You can also use session.expire(obj) to do this more selectively or session.refresh(obj) to do it selectively and immediately re-query the database.
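Applied to the question's example, any of these should work; a short sketch using the names from the question:

session.add(Child(father_id=parent_id))
session.flush()

session.expire_all()                  # expire everything; reloads lazily on next access
# or, more selectively:
session.expire(parent, ['children'])  # expire just this one collection
# or, selectively and immediately:
session.refresh(parent)               # re-queries the parent right away

assert len(parent.children) == 4      # the new child is now visible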
For more information about these methods and how they differ, I found a helpful blog post: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
Official docs: https://docs.sqlalchemy.org/en/13/orm/session_api.html

Why does SQLAlchemy send extra SELECTs when accessing a persisted model property

Given a simple declarative-based class:
class Entity(db.Model):
    __tablename__ = 'brand'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255), nullable=False)
And the following script:
entity = Entity()
entity.name = 'random name'
db.session.add(entity)
db.session.commit()
# Just by accessing the property name of the created object a
# SELECT statement is sent to the database.
print(entity.name)
When I enable echo mode in SQLAlchemy, I can see in the terminal the INSERT statement and an extra SELECT just when I access a property (column) of the model (table row).
If I don't access any property, the query is not issued.
What is the reason for this behavior? In this basic example, we already have the value of the name property assigned to the object, so why is an extra query needed? Is it to ensure an up-to-date value, or something like that?
By default, SQLAlchemy expires objects in the session when you commit. This is controlled via the expire_on_commit parameter.
The reasoning behind this is that the row behind the instance could have been modified outside of the transaction, so if you are not careful you could run into data races, but if you know what you are doing you can safely turn it off.
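For example, you can disable it when configuring the session factory:

from sqlalchemy.orm import sessionmaker

# instances keep their loaded values after commit, so
# entity.name no longer triggers a refresh SELECT
Session = sessionmaker(bind=engine, expire_on_commit=False)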

SQLAlchemy Automatically Create Entry If Doesn't Exist As Foreign Key

I have a large number of .create() calls that rely on a ForeignKey in another table (Users). However, there is no point in the code where I actually create users.
Is there a way to have a User entry created for each foreign key value specified on another table in SQLAlchemy?
For example:
class Rr(db.Model):
    __tablename__ = 'rr'
    id = db.Column(db.Integer, primary_key=True)
    submitter = db.Column(db.String(50), db.ForeignKey('user.username'))

class User(db.Model):
    __tablename__ = 'user'
    username = db.Column(db.String, primary_key=True)
So if I call Rr(id, submitter='John'), is there a way for a 'John' entry to be created in the user table if it does not already exist?
I understand that I can create a wrapper around the .create() method that checks the submitter and creates one if it doesn't exist, but this seems excessive since there are a large number of models that want Users to be automatically created.
I can't think of any ORM or SQL implementation that does what you ask, but there is something that effectively accomplishes what you seek, described in this SO answer: Does SQLAlchemy have an equivalent of Django's get_or_create?
Basically: get the User from the database if it exists; if it doesn't, create it.
The only downside to this method is that you need two queries instead of one, but I don't think there is a way to do what you seek in a single query.
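A minimal sketch of that pattern, adapted to the models above. The helper name follows the linked answer; it is not a built-in:

def get_or_create(session, model, **kwargs):
    # query 1: look for an existing row
    instance = session.query(model).filter_by(**kwargs).first()
    if instance is None:
        # query 2 (issued at flush time): insert it
        instance = model(**kwargs)
        session.add(instance)
    return instance

submitter = get_or_create(db.session, User, username='John')
rr = Rr(submitter=submitter.username)
db.session.add(rr)
db.session.commit()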

Sqlalchemy: secondary relationship update

I have two tables, say A and B. Both have a primary key id. They have a many-to-many relationship, SEC.
SEC = Table('sec', Base.metadata,
    Column('a_id', Integer, ForeignKey('A.id'), primary_key=True, nullable=False),
    Column('b_id', Integer, ForeignKey('B.id'), primary_key=True, nullable=False)
)
class A(Base):
    ...
    id = Column(Integer, primary_key=True)
    ...
    rels = relationship('B', secondary=SEC)

class B(Base):
    ...
    id = Column(Integer, primary_key=True)
    ...
Let's consider this piece of code.
a = A()
b1 = B()
b2 = B()
a.rels = [b1, b2]
...
#some place later
b3 = B()
a.rels = [b1, b3] # errors sometimes
Sometimes, I get an error at the last line saying
duplicate key value violates unique constraint a_b_pkey
In my understanding, it tries to add (a.id, b1.id) into the 'sec' table again, resulting in a unique constraint error. Is that what's happening? If so, how can I avoid this? If not, why do I have this error?
The problem is you want to make sure the instances you create are unique. We can create an alternate constructor that checks a cache of existing uncommitted instances, or queries the database for an existing committed instance, before returning a new instance.
Here is a demonstration of such a method:
from sqlalchemy import Column, Integer, String, ForeignKey, Table
from sqlalchemy.engine import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(engine)
Base = declarative_base(engine)
session = Session()

class Role(Base):
    __tablename__ = 'role'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)

    @classmethod
    def get_unique(cls, name):
        # get the session cache, creating it if necessary
        cache = session._unique_cache = getattr(session, '_unique_cache', {})
        # create a key for memoizing
        key = (cls, name)
        # check the cache first
        o = cache.get(key)
        if o is None:
            # check the database if it's not in the cache
            o = session.query(cls).filter_by(name=name).first()
            if o is None:
                # create a new one if it's not in the database
                o = cls(name=name)
                session.add(o)
            # update the cache
            cache[key] = o
        return o

Base.metadata.create_all()

# demonstrate cache check
r1 = Role.get_unique('admin')  # this is new
r2 = Role.get_unique('admin')  # from cache
session.commit()  # doesn't fail

# demonstrate database check
r1 = Role.get_unique('mod')  # this is new
session.commit()
session._unique_cache.clear()  # empty cache
r2 = Role.get_unique('mod')  # from database
session.commit()  # nop

# show final state
print(session.query(Role).all())  # two unique instances from four create calls
The get_unique method was inspired by the example from the SQLAlchemy wiki. This version is much less convoluted, favoring simplicity over flexibility. I have used it in production systems with no problems.
There are obviously improvements that can be added; this is just a simple example. The get_unique method could be inherited from a UniqueMixin, to be used for any number of models. More flexible memoizing of arguments could be implemented. This also puts aside the problem of multiple threads inserting conflicting data mentioned by Ants Aasma; handling that is more complex but should be an obvious extension. I leave that to you.
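A hedged sketch of what such a mixin might look like; the UniqueMixin name and the keyword-based memoizing key are illustrative, not part of the answer's code:

class UniqueMixin(object):
    @classmethod
    def get_unique(cls, session, **kwargs):
        # per-session cache of not-yet-committed unique instances
        cache = session._unique_cache = getattr(session, '_unique_cache', {})
        # memoize on the class plus the lookup keyword arguments
        key = (cls, tuple(sorted(kwargs.items())))
        obj = cache.get(key)
        if obj is None:
            # fall back to the database, then to creating a new instance
            obj = session.query(cls).filter_by(**kwargs).first()
            if obj is None:
                obj = cls(**kwargs)
                session.add(obj)
            cache[key] = obj
        return obj

# usage, assuming `class Role(UniqueMixin, Base)`:
#     role = Role.get_unique(session, name='admin')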
The error you mention is indeed from inserting a conflicting value into the sec table. To be sure that it comes from the operation you think it does, and not some previous change, turn on SQL logging and check what values it is trying to insert before erroring out.
When overwriting a many-to-many collection value, SQLAlchemy compares the new contents of the collection with the state in the database and correspondingly issues delete and insert statements. Unless you are poking around in SQLAlchemy internals, there should be two ways to encounter this error.
First is concurrent modification: Process 1 fetches the value a.rels and notices that it is empty; meanwhile Process 2 also fetches a.rels, sets it to [b1, b2] and commits, flushing the (a,b1),(a,b2) tuples. Process 1 then sets a.rels to [b1, b3], believing the previous contents were empty, and when it tries to flush the sec tuple (a,b1) it gets a duplicate key error. The correct action in such cases is usually to retry the transaction from the top. You can use serializable transaction isolation to instead get a serialization error in this case, which is distinct from a business-logic error causing a duplicate key error.
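For reference, serializable isolation can be requested at the engine level; the connection string here is hypothetical:

from sqlalchemy import create_engine

engine = create_engine(
    'postgresql://user:pass@localhost/mydb',  # hypothetical DSN
    isolation_level='SERIALIZABLE',
)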
The second case happens when you have managed to convince SQLAlchemy that you don't need to know the database state, by setting the loading strategy of the rels attribute to noload. This can be done when defining the relationship, by adding the lazy='noload' parameter, or when querying, by calling .options(noload(A.rels)) on the query. SQLAlchemy will assume that the sec table has no matching rows for objects loaded with this strategy in effect.
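Both forms, as a short sketch against the question's models:

from sqlalchemy.orm import noload

# when defining the relationship:
#     rels = relationship('B', secondary=SEC, lazy='noload')

# or per query: A instances loaded here will treat rels as empty
a = session.query(A).options(noload(A.rels)).first()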

SQLAlchemy: a better way for update with declarative?

Let's say I have a user table in declarative mode:
class User(Base):
    __tablename__ = 'user'
    id = Column(u'id', Integer(), primary_key=True)
    name = Column(u'name', String(50))
When I know the user's id without the object loaded into the session, I update such a user like this:
ex = update(User.__table__).where(User.id==123).values(name=u"Bob Marley")
Session.execute(ex)
I dislike using User.__table__; should I stop worrying about that?
Is there a better way to do this?
There's also some update capability at the ORM level. It doesn't handle any tricky cases yet but for the trivial case of single row update (or bulk update) it works fine. It even goes over any already loaded objects and applies the update on them also. You can use it like this:
session.query(User).filter_by(id=123).update({"name": u"Bob Marley"})
You're working at the clause level here, not at the model/entity/object level. The clause level is lower than mapped objects, and yes, something has to be done to convert one set of terms into the other.
You could also stay on object level and do:
session = Session()
u = session.query(User).get(123)
u.name = u"Bob Marley"
session.commit()
but it will be significantly slower, since it requires constructing the mapped object. And I'm not sure it is more readable.
In the example you provided, I see the most natural and “right” solution. I would not worry about the little __table__ magic.
Similar functionality is available via the update() method on Table object.
class User(Base):
    __tablename__ = 'user'
    id = Column('id', Integer(), primary_key=True)
    name = Column('name', String(50))

stmt = User.__table__.update().where(User.id==5).values(name='user #5')
Using User.__table__ is how it's done in SQLAlchemy.
