Flask and SQLAlchemy: append relationship without loading objects every time - python

I am working on a REST API with Python, Flask and SQLAlchemy. I have two classes, Parent and Child:
# The association table is defined first so both classes can reference it.
parent_has_children = db.Table('parent_has_children', db.metadata,
    db.Column('parent_id', db.Integer, db.ForeignKey('parent.id')),
    db.Column('child_id', db.Integer, db.ForeignKey('child.id'))
)
class Parent(db.Model):
    id = db.Column(db.Integer, nullable=False, autoincrement=True, primary_key=True)
    name = db.Column(db.String, nullable=False)
    children = db.relationship('Child',
        secondary=parent_has_children,
        back_populates='parents'
    )
class Child(db.Model):
    id = db.Column(db.Integer, nullable=False, autoincrement=True, primary_key=True)
    name = db.Column(db.String, nullable=False)
    parents = db.relationship('Parent',
        secondary=parent_has_children,
        back_populates='children'
    )
I have a many-to-many relationship, which is why I am using a secondary table. Let's say I have a route that receives a child_id and a parent_id and builds their relationship:
@app.route('/buildrelationship', methods=['POST'])
def buildrelationship():
    child_id = request.json['student_id']
    parent_id = request.json['parent_id']
    child = Child.query.get(child_id)
    parent = Parent.query.get(parent_id)
    parent.children.append(child)
    db.session.commit()
    return '', 204
This way I added a relationship between a parent and a child, but I had to get the parent and the child from the database first and then add the relationship.
The request.json may contain a list of children to append to a parent, or a list of parents to append to a particular child. In that case I have to query as many times as the list is long to fetch each parent or child before appending the relationship. Is there a better way to append relationships than loading the parent and child objects every time?

It's possible to reduce the querying, but you want to lean on the ORM as much as possible. If you skipped the query that resolves the POSTed data to Child or Parent objects and instead inserted the presented ids directly into the M2M table, you could cause a bit of a headache for your database's integrity.
You only have to fire one update query: if you first iterate once through your list of children, you end up with a list such as children = [Child<1>, Child<2>, Child<3>], and you can then just call parent.children.extend(children).
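The idea above can be sketched with plain SQLAlchemy (1.4 or later is assumed) and an in-memory SQLite database. The helper name attach_children and the use of a single IN query to resolve the ids are assumptions made for illustration, not something the answer prescribes; the point is only that the number of queries stays fixed however long the posted list is.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

parent_has_children = Table(
    "parent_has_children", Base.metadata,
    Column("parent_id", ForeignKey("parent.id"), primary_key=True),
    Column("child_id", ForeignKey("child.id"), primary_key=True),
)


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    children = relationship(
        "Child", secondary=parent_has_children, back_populates="parents")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    parents = relationship(
        "Parent", secondary=parent_has_children, back_populates="children")


def attach_children(session, parent_id, child_ids):
    """Hypothetical helper: link many children to one parent.

    The number of SELECTs does not grow with len(child_ids).
    """
    parent = session.query(Parent).filter_by(id=parent_id).one()
    # One IN query resolves every posted id that really exists.
    wanted = sorted(set(child_ids))
    children = session.query(Child).filter(Child.id.in_(wanted)).all()
    # Skip children that are already linked to avoid duplicate rows.
    already_linked = {c.id for c in parent.children}
    parent.children.extend(c for c in children if c.id not in already_linked)
    session.commit()
    return sorted(c.id for c in parent.children)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(id=1, name="p"))
    session.add_all([Child(id=i, name=f"c{i}") for i in (1, 3, 4)])
    session.commit()
    # Id 2 does not exist and id 3 is repeated; both are handled quietly.
    linked = attach_children(session, 1, [1, 2, 3, 3])
```

Unknown ids are silently dropped here; a real endpoint would probably want to report them back to the caller instead.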
If you really were dealing with tens or hundreds of thousands of child objects per POST, and time, memory, etc. were actually an issue, you could switch to bulk loading the data, pretty much skipping the ORM layer entirely along with all the safety features it gives you.
First you would want to get the set of your existing child ids so you can make sure you're not going to cause an integrity error (again, the safety gloves are off, and you'd be pretty odd to do this without good reason).
existing = {x.id for x in Child.query.all()}
# lets say: existing = {1,3,4,5,6}
Next you'd get your list of POSTed child ids:
target = request.json['student_id']
# lets say target = [1,2,3,3,3]
So, we could now filter down on what actually needs to get inserted, cleaning up anything that might cause us trouble:
children_to_insert = {x for x in target if x in existing}
# children_to_insert = {1,3}
Build a list of dictionaries to represent our M2M table data:
parent_id = request.json['parent_id']
bulk = [{'parent_id': parent_id, 'child_id': x} for x in children_to_insert]
# Then we'd bulk insert it into the database in a single statement:
db.session.execute(parent_has_children.insert(), bulk)
db.session.commit()
I've skipped all the other integrity checking you would want (does the parent exist? does it already have these children?), but you hopefully get the point, which is: just use the ORM, and don't try to go behind its back without a very good reason.
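The filtering steps above can be collected into one small pure-Python helper. This is only a sketch: the function name build_association_rows and the optional linked_ids argument (child ids the parent already has) are assumptions added for illustration, covering one of the integrity checks the answer says it skipped.

```python
def build_association_rows(parent_id, target_ids, existing_ids, linked_ids=()):
    """Return de-duplicated rows for the parent_has_children table.

    target_ids   -- ids received in the request (may repeat or be invalid)
    existing_ids -- ids of children that really exist in the database
    linked_ids   -- ids already linked to this parent (hypothetical extra check)
    """
    # Keep only ids that exist, drop repeats, and skip existing links.
    wanted = (set(target_ids) & set(existing_ids)) - set(linked_ids)
    # Sort so the generated rows are deterministic.
    return [{"parent_id": parent_id, "child_id": cid} for cid in sorted(wanted)]


rows = build_association_rows(7, [1, 2, 3, 3, 3], {1, 3, 4, 5, 6})
rows_without_dupes = build_association_rows(
    7, [1, 2, 3, 3, 3], {1, 3, 4, 5, 6}, linked_ids={1})
```

The resulting list has the shape expected by a multi-row insert on the association table.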

Related

SQLAlchemy relationship not updated after flush

I have some models with a relationship defined between them like so:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True, nullable=False)
    children = relationship('Child', lazy='joined')
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True, nullable=False)
    father_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
If I add a child within the session (using session.add(Child(...))), I would expect its father's children relationship to update to include this child after flushing the session. However, I'm not seeing that.
parent = session.query(Parent).get(parent_id)
num_children = len(parent.children)
# num_children == 3, for example
session.add(Child(father_id=parent_id))
session.flush()
new_num_children = len(parent.children)
# new_num_children == 3, but it should be 4!
Any help would be much appreciated!
I can add the new child to the parent.children list directly and flush the session, but due to other existing code I want to add it using session.add.
I can also commit after adding the child, which does correctly update the parent.children relationship, but I don't want to commit the transaction at that point.
I've tried adding a backref to the children relationship, but that doesn't seem to make any difference.
I've just run into this problem myself. SQLAlchemy does some internal memoisation to prevent it emitting a new SQL query every time you access a relationship. The problem is that it doesn't seem to realise that updating the foreign key directly could have an effect on the relationship. While SQLAlchemy probably could be patched to deal with this for simple joins, it would be very difficult for complex joins and I presume this is why it behaves the way it does.
When you do session.flush(), you're sending the changes back to the database, but SQLAlchemy doesn't realise it needs to query the database to update the relationship.
If you call session.expire_all() after the flush, then you force SQLAlchemy to reload every model instance and relationship when they're next accessed - this solves the problem.
You can also use session.expire(obj) to do this more selectively or session.refresh(obj) to do it selectively and immediately re-query the database.
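A minimal sketch of the selective variant, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The models mirror the question's, except that the default lazy loading is used to keep the example short; expiring only the children attribute is one option among those listed above.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    father_id = Column(Integer, ForeignKey("parent.id"), nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent(id=1)
    session.add(parent)
    session.add(Child(father_id=1))
    session.commit()

    before = len(parent.children)         # loads and caches the collection

    session.add(Child(father_id=1))       # linked only through the foreign key
    session.flush()                       # row is written, transaction stays open
    stale = len(parent.children)          # still the cached collection

    session.expire(parent, ["children"])  # discard just this one attribute
    fresh = len(parent.children)          # reloaded inside the same transaction
```

Nothing is committed between the flush and the reload, so the surrounding transaction can still be rolled back.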
For more information about these methods and how they differ, I found a helpful blog post: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
Official docs: https://docs.sqlalchemy.org/en/13/orm/session_api.html

Saving complex objects in SQLAlchemy

I'm new to SQLAlchemy and I'm trying to create a new item that includes a list of several sub-items (simple One-to-Many relation) using Flask-RESTful and Flask-SQLAlchemy. I'm trying to create both the item and sub-items simultaneously and I'm not clear on how SQLAlchemy is supposed to work.
class ItemModel(db.Model):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    name = Column(String(80))
    sub_items = relationship('SubItemModel', back_populates='item')
class SubItemModel(db.Model):
    __tablename__ = 'sub_items'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('items.id'))
    name = Column(String(80))
    item = relationship('ItemModel', back_populates='sub_items')
I want to add an item along with several sub_items (through a POST route), but I'm having trouble wrapping my head around which objects to create first and how much SQLAlchemy will do automatically. I got something working by creating an item with no sub_items, creating the sub_items, and then re-saving the item with the sub_items. But this seems pretty clunky, particularly in cases where either the item or some of the sub_items might already exist.
My intuition is just to do something like this:
item = ItemModel(
    name="item1",
    sub_items=[{name: "subitem1"},{name: "subitem2"}])
session.add(self)
session.commit()
But it's not working (I'm getting errors about unhashable types), and it seems ...too simple, somehow. Like I should define the sub_item objects separately. But since they depend on the item_id I'm not sure how to do this.
I'm sure this has been answered before or explained in a simple tutorial somewhere but I haven't been able to find anything simple enough for me to understand. I'm hoping someone can walk me through the basics. (which parts are supposed to be magical and which parts do I still have to code manually...)
Thanks.
The "many" side of any SQLAlchemy relationship behaves like a standard Python list. You should be creating the SubItemModel objects directly and appending them to ItemModel:
item = ItemModel(name='item1')
subitem1 = SubItemModel(name='subitem1')
subitem2 = SubItemModel(name='subitem2')
item.sub_items.append(subitem1)
item.sub_items.append(subitem2)
or, to append multiple items at once, you can use the standard list extend method:
item = ItemModel(name='item1')
subitem1 = SubItemModel(name='subitem1')
subitem2 = SubItemModel(name='subitem2')
item.sub_items.extend([subitem1, subitem2])
You can, if you want, create the subitems directly when you're adding them:
item = ItemModel(name='item1')
item.sub_items.extend([SubItemModel(name='subitem1'), SubItemModel(name='subitem2')])
Whichever option you choose, you should be adding your created item object to the session, which will automatically include the new child records you've created:
session.add(item)
session.commit()
Voila, your item and subitems should all be inserted into the DB at once.
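Put together as something runnable, the pattern might look like the sketch below. It assumes SQLAlchemy 1.4 or later with an in-memory SQLite database; the shortened class names, the create_item helper and the payload shape are illustrative assumptions standing in for whatever the POST route receives.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String(80))
    sub_items = relationship("SubItem", back_populates="item")


class SubItem(Base):
    __tablename__ = "sub_items"
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey("items.id"))
    name = Column(String(80))
    item = relationship("Item", back_populates="sub_items")


def create_item(session, payload):
    """Hypothetical helper: build an item and its sub-items from a dict."""
    item = Item(name=payload["name"])
    # Plain dicts must be turned into mapped objects before being appended.
    item.sub_items.extend(
        SubItem(name=sub["name"]) for sub in payload.get("sub_items", []))
    # Adding the parent is enough; the sub-items cascade into the session.
    session.add(item)
    session.commit()
    return item


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    item = create_item(session, {
        "name": "item1",
        "sub_items": [{"name": "subitem1"}, {"name": "subitem2"}],
    })
    item_id = item.id
    # SQLAlchemy filled in the foreign keys during the flush.
    fk_values = [sub.item_id for sub in item.sub_items]
    saved = session.query(SubItem).count()
```

The foreign key column is never set by hand; the relationship takes care of it once the parent row has its primary key.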

Python sqlalchemy dynamic relationship

I'm trying to understand if it's possible to do something with Sqlalchemy, or if I'm thinking about it the wrong way. As an example, say I have two (these are just examples) classes:
class Customer(db.Model):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship('Address')
class Address(db.Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    address = Column(String)
    home = Column(Boolean)
    customer_id = Column(Integer, ForeignKey('customer.id'))
And later I want to perform a query that gets the customer and just their home address. Is it possible to do that with something like this:
db.session.query(Customer).join(Address, Address.home == True)
Would the above further refine/restrict the join so the results would only get the home address?
When in doubt if a query construct is what you want, try printing it:
In [29]: db.session.query(Customer).join(Address, Address.home == True)
Out[29]: <sqlalchemy.orm.query.Query at 0x7f14fa651e80>
In [30]: print(_)
SELECT customer.id AS customer_id, customer.name AS customer_name
FROM customer JOIN address ON address.home = true
It is clear that this is not what you want. Every customer is joined with every address that is a home address. Due to how entities are handled this might not be obvious at first. The duplicate rows per customer are ignored and you get a result of distinct Customer entities, even though the underlying query was wrong. The query also effectively just ignores the joined Addresses when forming results.
The easiest solution would be to just query for customer and address tuples with required criteria:
db.session.query(Customer, Address).\
    join(Address).\
    filter(Address.home)
You could also do something like this
db.session.query(Customer).\
    join(Address, (Customer.id == Address.customer_id) & Address.home).\
    options(contains_eager(Customer.addresses))
but I'd highly recommend against it. You'd be lying to yourself about what the relationship collection contains and that might backfire at some point. Instead you should add a new one to one relationship to Customer with the custom join condition:
class Customer(db.Model):
    ...
    home_address = relationship(
        'Address', uselist=False,
        primaryjoin='and_(Customer.id == Address.customer_id, Address.home)')
and then you could use a joined eager load
db.session.query(Customer).options(joinedload(Customer.home_address))
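A self-contained sketch of the extra relationship, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. Marking the relationship viewonly=True is an assumption added here so that it is never used for writes alongside addresses; the sample customers and addresses are made up.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship("Address")
    # Read-only view of the one address flagged as home (assumption: at most one).
    home_address = relationship(
        "Address", uselist=False, viewonly=True,
        primaryjoin="and_(Customer.id == Address.customer_id, "
                    "Address.home == True)")


class Address(Base):
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    address = Column(String)
    home = Column(Boolean)
    customer_id = Column(Integer, ForeignKey("customer.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Customer(id=1, name="alice"),
        Customer(id=2, name="bob"),
        Address(address="1 Home St", home=True, customer_id=1),
        Address(address="2 Work Rd", home=False, customer_id=1),
        Address(address="3 Office Park", home=False, customer_id=2),
    ])
    session.commit()

    customers = (
        session.query(Customer)
        .options(joinedload(Customer.home_address))
        .order_by(Customer.id)
        .all()
    )
    # Customers without a home address are still returned, with None.
    result = [
        (c.name, c.home_address.address if c.home_address else None)
        for c in customers
    ]
```

The full addresses collection stays untouched, so nothing else in the application is misled about what it contains.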
Yeah, that's entirely possible, though you would probably want code like:
# if you know the customer's database id...
# get the first address in the database for the given id that says it's for home
home_address = db.session.query(Address).filter_by(customer_id=customer_id_here, home=True).first()
Instead of having a boolean for home, you might try a 'type' column instead, using an enum. This would let you easily pick an address for places like work, rather than just a binary choice of "either this address is for home or not".
Update: You might also consider using the back_populates keyword argument with the relationship call, so if you have an address instance (called a), you can get the customer it's for with something like a.customer (which is the instance of the Customer class this address is associated with).

Enforcing uniqueness using SQLAlchemy association proxies

I'm trying to use association proxies to make dealing with tag-style records a little simpler, but I'm running into a problem enforcing uniqueness and getting objects to reuse existing tags rather than always create new ones.
Here is a setup similar to what I have. The examples in the documentation have a few recipes for enforcing uniqueness, but they all rely on having access to a session and usually require a single global session, which I cannot do in my case.
from sqlalchemy import Column, Integer, String, create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
def _tag_find_or_create(name):
    # can't use global objects here, may be multiple sessions and engines
    # ?? No access to session here, how to do a query
    tag = session.query(Tag).filter_by(name=name).first()
    tag = Tag.query.filter_by(name=name).first()
    if not tag:
        tag = Tag(name=name)
    return tag
class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    tags = relationship('Tag', secondary='itemtag')
    tagnames = association_proxy('tags', 'name', creator=_tag_find_or_create)
class ItemTag(Base):
    __tablename__ = 'itemtag'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'))
    tag_id = Column(Integer, ForeignKey('tag.id'))
class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
# Scenario 1
session = Session()
item = Item()
session.add(item)
item.tagnames.append('red')
# Scenario 2
item2 = Item()
item2.tagnames.append('blue')
item2.tagnames.append('red')
session.add(item2)
Without the creator function, I just get tons of duplicate Tag items. The creator function seems like the most obvious place to put this type of check, but I'm unsure how to do a query from inside the creator function.
Consider the two scenarios provided at the bottom of the example. In the first example, it seems like there should be a way to get access to the session in the creator function, since the object the tags are being added to is already associated with a session.
In the second example, the Item object isn't yet associated with a session, so the validation check can't happen in the creator function. It would have to happen later when the object is actually added to a session.
For the first scenario, how would I go about getting access to the session object in the creator function?
For the second scenario, is there a way to "listen" for when the parent object is added to a session and validate the association proxies at that point?
For the first scenario, you can use object_session.
As for the question overall: true, you need access to the current session; if using scoped_session in your application is appropriate, then the second part of the Recipe you link to should work fine to use. See Contextual/Thread-local Sessions for more info.
Working with events to change objects when they move from the transient to the persistent state will not make your code pretty or very robust. So I would add new Tag objects to the session immediately; if the transaction is rolled back, they will not end up in the database.
Note that in a multi-user environment you are likely to have a race condition: the same tag is new and is created simultaneously by two users. The user who commits last will fail (if you have a unique constraint on the database).
In this case you might consider going without the unique constraint and running a (daily) procedure to clean those duplicates up (and reassign relations). With time there would be fewer and fewer new items, and fewer possibilities for such clashes.
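For the first scenario, one way to read the object_session suggestion is a small helper that is called instead of appending through the proxy. This is a sketch under assumptions: SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical add_tag function. An association proxy creator only receives the value, so the helper takes the item explicitly. It does not address the multi-user race described above.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, object_session, relationship

Base = declarative_base()

item_tag = Table(
    "itemtag", Base.metadata,
    Column("item_id", ForeignKey("item.id"), primary_key=True),
    Column("tag_id", ForeignKey("tag.id"), primary_key=True),
)


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False, unique=True)


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary=item_tag)


def add_tag(item, name):
    """Hypothetical helper: reuse an existing Tag or create a new one."""
    # The session is looked up from the item, so no global is needed.
    session = object_session(item)
    if session is None:
        raise ValueError("add the item to a session before tagging it")
    tag = session.query(Tag).filter_by(name=name).first()
    if tag is None:
        tag = Tag(name=name)
        session.add(tag)  # added right away so later lookups can find it
    if tag not in item.tags:
        item.tags.append(tag)
    return tag


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    first, second = Item(), Item()
    session.add_all([first, second])
    add_tag(first, "red")
    add_tag(second, "blue")
    add_tag(second, "red")
    session.commit()
    tag_names = sorted(tag.name for tag in session.query(Tag))
    # Both items point at the very same Tag object for "red".
    shared = first.tags[0] is [t for t in second.tags if t.name == "red"][0]
```

The second scenario, where the item is not yet in a session, raises here on purpose rather than guessing which session to query.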

How to do two-level many-to-many relationships?

I'm working in Flask with Flask-SQLAlchemy, and trying to set up a many-to-many-to-many relationship: clients can have multiple orders and orders can also have multiple clients (many to many); each order in turn contains a list of unique items (one to many).
I followed the SQLAlchemy documentation to set up an association table for the many-to-many relationship, and used the normal relationship/foreign key for the one-to-many relationship; all references are set to lazy='dynamic'.
association_table = Table('association', Base.metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id'))
)
What is an efficient way to retrieve all items associated with a client? I'm assuming [item for order in client.orders for item in order.items.all()] will work (less the problem above), but is there a more efficient way to do it? What if the results need to be ordered?
Update
I now know two ways to retrieve the items for the orders of a client (the first one is from Audrius' answer):
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
and
Item.query.filter(Item.order_id.in_(client.order_ids)).order_by(Item.timestamp)
I believe they both provide the same result, but which one should be preferred for efficiency?
When writing a query directly in SQL, you would use joins to retrieve the data you want efficiently (as Rachcha demonstrated in his answer). The same applies to SQLAlchemy. Refer to the SA docs on join() for more examples.
If your model is defined like the following (using Flask-SQLAlchemy, since you tagged your question with its tag):
clients_orders = db.Table('clients_orders',
    db.Column('client_id', db.Integer, db.ForeignKey('client.id'),
              primary_key=True),
    db.Column('order_id', db.Integer, db.ForeignKey('order.id'),
              primary_key=True)
)
class Client(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    orders = db.relationship('Order', secondary=clients_orders,
                             backref='clients')
    # Define other columns...
class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Define other columns...
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'), nullable=False)
    order = db.relationship('Order', backref='items')
    timestamp = db.Column(db.DateTime)
    # Define other columns...
Then your query could look like this:
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
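To try the join-based query end to end, here is a sketch using plain SQLAlchemy (1.4 or later assumed) with an in-memory SQLite database rather than Flask-SQLAlchemy. The table name orders, the use of back_populates instead of backref, the items_for_client helper and the sample rows are all assumptions made for the example.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

clients_orders = Table(
    "clients_orders", Base.metadata,
    Column("client_id", ForeignKey("client.id"), primary_key=True),
    Column("order_id", ForeignKey("orders.id"), primary_key=True),
)


class Client(Base):
    __tablename__ = "client"
    id = Column(Integer, primary_key=True)
    orders = relationship(
        "Order", secondary=clients_orders, back_populates="clients")


class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    clients = relationship(
        "Client", secondary=clients_orders, back_populates="orders")
    items = relationship("Item", back_populates="order")


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"), nullable=False)
    timestamp = Column(DateTime)
    order = relationship("Order", back_populates="items")


def items_for_client(session, client_id):
    """One SELECT with two joins; sorting happens in the database."""
    return (
        session.query(Item)
        .join(Item.order)
        .join(Order.clients)
        .filter(Client.id == client_id)
        .order_by(Item.timestamp)
        .all()
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a, b, c = Order(id=1), Order(id=2), Order(id=3)
    session.add_all([
        Client(id=42, orders=[a, b]),
        Client(id=7, orders=[b, c]),
        Item(id=1, order=a, timestamp=datetime(2024, 1, 3)),
        Item(id=2, order=b, timestamp=datetime(2024, 1, 1)),
        Item(id=3, order=c, timestamp=datetime(2024, 1, 2)),
    ])
    session.commit()
    ids_for_42 = [item.id for item in items_for_client(session, 42)]
    ids_for_7 = [item.id for item in items_for_client(session, 7)]
```

Order 2 is shared between both clients, so its item appears in both results.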
