I have some models with a relationship defined between them like so:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True, nullable=False)
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True, nullable=False)
    father_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
If I add a child within the session (using session.add(Child(...))), I would expect its father's children relationship to update to include this child after flushing the session. However, I'm not seeing that.
parent = session.query(Parent).get(parent_id)
num_children = len(parent.children)
# num_children == 3, for example
session.add(Child(father_id=parent_id))
session.flush()
new_num_children = len(parent.children)
# new_num_children == 3, but it should be 4!
Any help would be much appreciated!
I can add the new child to the parent.children list directly and flush the session, but due to other existing code, I want to add it using session.add.
I can also commit after adding the child, which does correctly update the parent.children relationship, but I don't want to commit the transaction at that point.
I've tried adding a backref to the children relationship, but that doesn't seem to make any difference.
I've just run into this problem myself. SQLAlchemy does some internal memoisation to prevent it emitting a new SQL query every time you access a relationship. The problem is that it doesn't seem to realise that updating the foreign key directly could have an effect on the relationship. While SQLAlchemy probably could be patched to deal with this for simple joins, it would be very difficult for complex joins and I presume this is why it behaves the way it does.
When you do session.flush(), you're sending the changes back to the database, but SQLAlchemy doesn't realise it needs to query the database to update the relationship.
If you call session.expire_all() after the flush, then you force SQLAlchemy to reload every model instance and relationship when they're next accessed - this solves the problem.
You can also use session.expire(obj) to do this more selectively or session.refresh(obj) to do it selectively and immediately re-query the database.
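For example, a minimal sketch using the Parent/Child models from the question:

parent = session.query(Parent).get(parent_id)
assert len(parent.children) == 3

session.add(Child(father_id=parent_id))
session.flush()

# Any one of these makes the new child visible on next access:
session.expire_all()                    # expire every instance in the session
# session.expire(parent, ['children'])  # or expire just this attribute
# session.refresh(parent)               # or expire and re-query immediately

assert len(parent.children) == 4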
For more information about these methods and how they differ, I found a helpful blog post: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
Official docs: https://docs.sqlalchemy.org/en/13/orm/session_api.html
How does one define a ForeignKey and relationship such that one can disable SQLAlchemy's FK-nullifying behavior?
The documentation here seems to describe the use of passive_deletes=True to allow the database to cascade-delete, but only in the context of defining the cascade relationship property documented here. That property, it seems to me, defines how SQLAlchemy will perform the cascade deletion itself, which is explicitly described as slower than the database engine's cascade deletion in this section (see the green box titled ORM-level "delete" cascade vs. FOREIGN KEY level "ON DELETE" cascade).
To use the database's cascade delete, are we supposed to do the following?
1. define ondelete="CASCADE" on the ForeignKey column,
2. define passive_deletes=True on the same relationships,
3. AND define a cascade="delete, delete-orphan" parameter on all relationships between the objects?
It is step 3 that I am confused about: it seems to define the cascade for SQLAlchemy rather than allowing the database to perform its own deletion. But SQLAlchemy seems to want to null out all dependent foreign keys before the database gets a chance to cascade-delete. I need to disable this behavior, but passive_deletes=True seems not to do it on its own.
The (late) answer here explicitly addresses my issue, but it is not working for me. He states:
There's an important caveat here. Notice how I have a relationship specified with passive_deletes=True? If you don't have that, the entire thing will not work.
This is because by default when you delete a parent record SqlAlchemy does something really weird.
It sets the foreign keys of all child rows to NULL. So if you delete a row from parent_table where id = 5, then it will basically execute
UPDATE child_table SET parent_id = NULL WHERE parent_id = 5
In my code:
class Annotation(SearchableMixin, db.Model):
    id = db.Column(db.Integer, primary_key=True)
    locked = db.Column(db.Boolean, index=True, default=False)
    active = db.Column(db.Boolean, default=True)

    HEAD = db.relationship("Edit",
        primaryjoin="and_(Edit.current==True,"
        "Edit.annotation_id==Annotation.id)", uselist=False,
        lazy="joined", passive_deletes=True)
    edits = db.relationship("Edit",
        primaryjoin="and_(Edit.annotation_id==Annotation.id,"
        "Edit.approved==True)", lazy="joined", passive_deletes=True)
    history = db.relationship("Edit",
        primaryjoin="and_(Edit.annotation_id==Annotation.id,"
        "Edit.approved==True)", lazy="dynamic", passive_deletes=True)
    all_edits = db.relationship("Edit",
        primaryjoin="Edit.annotation_id==Annotation.id", lazy="dynamic",
        passive_deletes=True)

class Edit(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    edit_num = db.Column(db.Integer, default=0)
    approved = db.Column(db.Boolean, default=False, index=True)
    rejected = db.Column(db.Boolean, default=False, index=True)
    annotation_id = db.Column(db.Integer,
        db.ForeignKey("annotation.id", ondelete="CASCADE"), index=True)
    hash_id = db.Column(db.String(40), index=True)
    current = db.Column(db.Boolean, default=False, index=True)

    annotation = db.relationship("Annotation", foreign_keys=[annotation_id])
    previous = db.relationship("Edit",
        primaryjoin="and_(remote(Edit.annotation_id)==foreign(Edit.annotation_id),"
        "remote(Edit.edit_num)==foreign(Edit.edit_num-1))")
    priors = db.relationship("Edit",
        primaryjoin="and_(remote(Edit.annotation_id)==foreign(Edit.annotation_id),"
        "remote(Edit.edit_num)<=foreign(Edit.edit_num-1))",
        uselist=True, passive_deletes=True)
simply setting passive_deletes=True on the parent relationship is not working. I also thought it might be caused by the relationships from the child to its siblings (Edit.previous and Edit.priors), but setting passive_deletes=True on those two relationships does not solve the problem, and it causes the following warnings when I simply run Edit.query.get(n):
/home/malan/projects/icc/icc/venv/lib/python3.7/site-packages/sqlalchemy/orm/relationships.py:1790: SAWarning: On Edit.previous, 'passive_deletes' is normally configured on one-to-many, one-to-one, many-to-many relationships only.
% self)
/home/malan/projects/icc/icc/venv/lib/python3.7/site-packages/sqlalchemy/orm/relationships.py:1790: SAWarning: On Edit.priors, 'passive_deletes' is normally configured on one-to-many, one-to-one, many-to-many relationships only.
% self)
I have actually found this interesting question from 2015 that has never had an answer. It details a failed attempt to execute documentation code.
After a thorough attempt to analyze my relationships, I seem to have discovered the problem.
First, I will note, passive_deletes=True is the only necessary parameter. You do not need to define cascade at all to take advantage of the database's cascade system.
More importantly, my problem stemmed from my tree of foreign-key dependencies. I had a cascade that looked like this:
           Annotation
          /    |     \
      Vote   Edit   annotation_followers
            /    \
      EditVote   tags
Where ondelete="CASCADE" was defined for each parent-id column on each child class. Until I set passive_deletes=True on the relationships to all of the children in the graph, the nullification behavior persisted.
For anyone running into a similar problem, my advice is: thoroughly analyze all of your intersecting relationships, and define passive_deletes=True on every relationship where it makes sense.
That said, there are still complications I'm working out; for instance, on a many-to-many table the IDs aren't even being nullified. Possibly a next question.
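To condense the working pattern into a minimal sketch (a stripped-down pair of the models above; note that no cascade= parameter is needed):

class Annotation(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # passive_deletes=True: don't load children and NULL their FKs on delete;
    # leave the rows for the database to handle.
    edits = db.relationship("Edit", passive_deletes=True)

class Edit(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # ON DELETE CASCADE lives on the ForeignKey column, not the relationship.
    annotation_id = db.Column(
        db.Integer,
        db.ForeignKey("annotation.id", ondelete="CASCADE"),
        index=True)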
I am working on a REST API with Python, Flask, and SQLAlchemy. I have two classes, Parent and Child:
# The association table must be defined before the models that reference it.
parent_has_children = db.Table('parent_has_children', db.metadata,
    db.Column('parent_id', db.Integer, db.ForeignKey('parent.id')),
    db.Column('child_id', db.Integer, db.ForeignKey('child.id'))
)

class Parent(db.Model):
    id = db.Column(db.Integer, nullable=False, autoincrement=True, primary_key=True)
    name = db.Column(db.String, nullable=False)
    children = db.relationship('Child',
        secondary=parent_has_children,
        back_populates='parents'
    )

class Child(db.Model):
    id = db.Column(db.Integer, nullable=False, autoincrement=True, primary_key=True)
    name = db.Column(db.String, nullable=False)
    parents = db.relationship('Parent',
        secondary=parent_has_children,
        back_populates='children'
    )
I have a many-to-many relationship, and for that reason I am using a secondary table. Let's say I have a route that receives a child_id and a parent_id and builds their relationship:
@app.route('/buildrelationship', methods=['POST'])
def buildrelationship():
    child_id = request.json['student_id']
    parent_id = request.json['parent_id']
    child = Child.query.get(child_id)
    parent = Parent.query.get(parent_id)
    parent.children.append(child)
    db.session.commit()
This way I added the relationship between parent and child, but I had to fetch both from the database first.
The request.json may have a list of children to append to a parent, or a list of parents to append to a particular child. In that case I have to query as many times as the list is long to fetch each parent or child before appending the relationship. Is there any better way to append relationships without loading the parent and child objects every time?
It's possible to reduce the querying, but you want to lean on the ORM as much as possible. If you didn't do the query to resolve the POSTed data to Child or Parent objects and instead, say, directly inserted the presented IDs into the M2M table, you could cause a bit of a headache with your database's integrity.
You only have to fire one commit: if you first resolve your list of child IDs into a list like children = [Child<1>, Child<2>, Child<3>], you can attach them all at once with parent.children.extend(children).
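A sketch of that, assuming the models and route from the question:

child_ids = request.json['student_id']               # e.g. [1, 2, 3]
parent = Parent.query.get(request.json['parent_id'])

# One IN query resolves all children instead of one query per id.
children = Child.query.filter(Child.id.in_(child_ids)).all()

parent.children.extend(children)
db.session.commit()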
If you were really dealing with tens or hundreds of thousands of child objects per POST, and time, memory, etc. were actually an issue, you could flip to bulk-loading the data, pretty much skipping the ORM layer and all the safety features it gives you.
First you would want to get the set of existing child IDs so you can make sure you're not going to cause an integrity error (again, the safety gloves are off, and you're pretty odd if you're doing this without good reason).
existing = {x.id for x in Child.query.all()}
# lets say: existing = {1,3,4,5,6}
Next you'd get your list of POSTed child ids:
target = request.json['student_id']
# lets say target = [1,2,3,3,3]
So, we could now filter down on what actually needs to get inserted, cleaning up anything that might cause us trouble:
children_to_insert = {x for x in target if x in existing}
# children_to_insert = {1, 3}
Build a list of dictionaries to represent our M2M table data:
parent_id = request.json['parent_id']
bulk = [{'parent_id': parent_id, 'child_id': x} for x in children_to_insert]
# Then we'd bulk-insert it into the database, bypassing the ORM:
db.engine.execute(
    parent_has_children.insert(), bulk
)
I've skipped all the other integrity checking you would want (does the parent exist? does it already have children?), but you hopefully get the point, which is: just use the ORM, and don't try to go around its back without a very good reason.
My SQLAlchemy application (running on top of MariaDB) includes two models, MyModelA and MyModelB, where the latter is a child record of the former:
class MyModelA(db.Model):
    a_id = db.Column(db.Integer, nullable=False, primary_key=True)
    my_field1 = db.Column(db.String(1024), nullable=True)

class MyModelB(db.Model):
    b_id = db.Column(db.Integer, nullable=False, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey(MyModelA.a_id), nullable=False)
    my_field2 = db.Column(db.String(1024), nullable=True)
These are the instances of MyModelA and MyModelB that I create:
>>> my_a = MyModelA(my_field1="A1")
>>> my_a.a_id
1
>>> MyModelB(a_id=my_a.a_id, my_field2="B1")
I have the following code that deletes the instance of MyModelA where a_id==1:
from sqlalchemy.exc import IntegrityError

db.session.commit()
try:
    my_a = MyModelA.query.get(1)
    assert my_a is not None
    print "#1) Number of MyModelAs: %s\n" % MyModelA.query.count()
    db.session.delete(my_a)
    db.session.commit()
except IntegrityError:
    print "#2) Cannot delete instance of MyModelA because it has child record(s)!"
    db.session.rollback()
print "#3) Number of MyModelAs: %s\n" % MyModelA.query.count()
When I run this code, I get these unexpected results:
#1) Number of MyModelAs: 1
#2) Cannot delete instance of MyModelA because it has child record(s)!
#3) Number of MyModelAs: 0
The delete supposedly fails and the DB throws an exception, which causes a rollback. However, even after the rollback, the number of rows in the table indicates that the row -- which supposedly wasn't deleted -- is actually gone!
Why is this happening? How can I fix this? It seems like a bug in SQLAlchemy.
TL;DR
Your problem might be related to the lack of an explicit relationship declaration.
For example, here is a sample of an object relationship. In addition to using a ForeignKey field, the class explicitly uses the relationship() directive to define that connection. In the session API documentation, the following text appears:
object references should be constructed at the object level, not at the foreign key level
which hints at the way SQLAlchemy manages relations. I am not deeply familiar with the underlying mechanisms, but it is possible that this is what happens: your session only includes the MyModelA object. Since you did not use the relationship() directive in the definition of MyModelB, objects of type MyModelA are not aware that some other object might refer to them through a ForeignKey. Hence, when the session is about to commit, it does not know that deleting the object affects some other MyModelB object, and its transaction mechanism does not take that into account.
I suggest that adding the relationship explicitly might prevent that behavior.
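As a hedged sketch of that suggestion (the relationship attribute and backref names here are illustrative, not part of the original models):

class MyModelB(db.Model):
    b_id = db.Column(db.Integer, nullable=False, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey(MyModelA.a_id), nullable=False)
    my_field2 = db.Column(db.String(1024), nullable=True)

    # The explicit relationship makes the dependency visible to the session,
    # so deleting a MyModelA knows about its dependent MyModelB rows.
    parent = db.relationship(MyModelA, backref='children')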
I have a large number of .create() calls that rely on a ForeignKey into another table (Users). However, there is no point in the code where I actually create users.
Is there a way for a Users entry to be created automatically whenever a foreign key to it is specified on another table in SQLAlchemy?
For example:
class Rr(db.Model):
    __tablename__ = 'rr'
    id = db.Column(db.Integer, primary_key=True)
    submitter = db.Column(db.String(50), db.ForeignKey('user.username'))

class User(db.Model):
    __tablename__ = 'user'
    username = db.Column(db.String, primary_key=True)
So if I call Rr(id, submitter='John'), is there a way for a 'John' entry to be created in the user table if it does not already exist?
I understand that I could wrap the .create() method so that it checks for the submitter and creates one if it doesn't exist, but this seems excessive, as a large number of models want Users to be created automatically.
I can't think of any ORM or SQL implementation that does what you ask, but there is something that effectively accomplishes what you seek, described in this SO answer: Does SQLAlchemy have an equivalent of Django's get_or_create?
Basically: get the User from the db if it exists; if it doesn't, create it.
The only downside to this method is that you need two queries instead of one, but I don't think there is a way to do what you seek in a single query.
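A minimal sketch of that pattern applied to the models above (the helper follows the linked answer; handling of concurrent inserts is omitted):

def get_or_create(session, model, **kwargs):
    # Query 1: look for an existing row matching the filter.
    instance = session.query(model).filter_by(**kwargs).first()
    if instance is None:
        # Not found: stage a new row; it is written on flush/commit.
        instance = model(**kwargs)
        session.add(instance)
    return instance

submitter = get_or_create(db.session, User, username='John')
db.session.add(Rr(submitter=submitter.username))
db.session.commit()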
I'm trying to use association proxies to make dealing with tag-style records a little simpler, but I'm running into a problem enforcing uniqueness and getting objects to reuse existing tags rather than always create new ones.
Here is a setup similar to what I have. The examples in the documentation have a few recipes for enforcing uniqueness, but they all rely on having access to a session and usually require a single global session, which I cannot do in my case.
from sqlalchemy import Column, Integer, String, create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
def _tag_find_or_create(name):
    # can't use global objects here, may be multiple sessions and engines
    # ?? No access to a session here -- how to do a query?
    # tag = session.query(Tag).filter_by(name=name).first()
    # tag = Tag.query.filter_by(name=name).first()
    tag = None
    if not tag:
        tag = Tag(name=name)
    return tag

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    tags = relationship('Tag', secondary='itemtag')
    tagnames = association_proxy('tags', 'name', creator=_tag_find_or_create)

class ItemTag(Base):
    __tablename__ = 'itemtag'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'))
    tag_id = Column(Integer, ForeignKey('tag.id'))

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
# Scenario 1
session = Session()
item = Item()
session.add(item)
item.tagnames.append('red')
# Scenario 2
item2 = Item()
item2.tagnames.append('blue')
item2.tagnames.append('red')
session.add(item2)
Without the creator function, I just get tons of duplicate Tag items. The creator function seems like the most obvious place to put this type of check, but I'm unsure how to do a query from inside the creator function.
Consider the two scenarios provided at the bottom of the example. In the first example, it seems like there should be a way to get access to the session in the creator function, since the object the tags are being added to is already associated with a session.
In the second example, the Item object isn't yet associated with a session, so the validation check can't happen in the creator function. It would have to happen later when the object is actually added to a session.
For the first scenario, how would I go about getting access to the session object in the creator function?
For the second scenario, is there a way to "listen" for when the parent object is added to a session and validate the association proxies at that point?
For the first scenario, you can use object_session.
As for the question overall: true, you need access to the current session. If using scoped_session in your application is appropriate, then the second part of the recipe you link to should work fine. See Contextual/Thread-local Sessions for more info.
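For the first scenario, a sketch of what object_session gives you. Since the creator function only receives the tag name, this does the find-or-create at the call site rather than inside the creator (models are those from the question):

from sqlalchemy.orm import object_session

item = Item()
session.add(item)

# The session can be recovered from any instance already attached to it.
s = object_session(item)
tag = s.query(Tag).filter_by(name='red').first()
if tag is None:
    tag = Tag(name='red')
item.tags.append(tag)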
Working with events to catch objects as they change from transient to persistent state will not make your code pretty or very robust. So I would immediately add new Tag objects to the session; if the transaction is rolled back, they would not be in the database.
Note that in a multi-user environment you are likely to have a race condition: the same tag is new and gets created simultaneously by two users. The user who commits last will fail (if you have a unique constraint on the database).
In that case you might consider going without the unique constraint and having a (daily) procedure to clean those duplicates up (and reassign relations), as sketched below. Over time there would be fewer and fewer new tags, and fewer opportunities for such clashes.
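A hedged sketch of such a cleanup pass, using the Tag/ItemTag models from the question (it may leave duplicate item-tag pairs behind, which a similar pass could prune):

from sqlalchemy import func

def dedupe_tags(session):
    # Names that appear on more than one Tag row.
    dupes = (session.query(Tag.name)
             .group_by(Tag.name)
             .having(func.count(Tag.id) > 1))
    for (name,) in dupes:
        tags = session.query(Tag).filter_by(name=name).order_by(Tag.id).all()
        keeper, extras = tags[0], tags[1:]
        for extra in extras:
            # Repoint associations to the surviving tag, then drop the dupe.
            (session.query(ItemTag)
             .filter_by(tag_id=extra.id)
             .update({'tag_id': keeper.id}))
            session.delete(extra)
    session.commit()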