How to do two-level many-to-many relationships? - python

I'm working in Flask with Flask-SQLAlchemy and trying to set up a many-to-many-to-many relationship: clients can have multiple orders and orders can have multiple clients (many-to-many); each order in turn contains a list of unique items (one-to-many).
I followed the SQLAlchemy documentation to set up an association table for the many-to-many relationship, and used a normal relationship/foreign key for the one-to-many relationship; all relationships are set to lazy='dynamic'.
association_table = Table('association', Base.metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id'))
)
What is an efficient way to retrieve all items associated with a client? I'm assuming [item for order in client.orders for item in order.items.all()] will work (less the problem above), but is there a more efficient way to do it? What if the results need to be ordered?
Update
I now know two ways to retrieve the items for a client's orders (the first one is from Audrius' answer):
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
and
Item.query.filter(Item.order_id.in_(client.order_ids)).order_by(Item.timestamp)
I believe they both provide the same result, but which one should be preferred for efficiency?

When writing a query directly in SQL, you would use joins to retrieve the data you want efficiently (as Rachcha demonstrated in his answer). The same applies to SQLAlchemy. Refer to the SQLAlchemy docs on join() for more examples.
If your model is defined as follows (using Flask-SQLAlchemy, since you tagged your question with it):
clients_orders = db.Table('clients_orders',
    db.Column('client_id', db.Integer, db.ForeignKey('client.id'),
              primary_key=True),
    db.Column('order_id', db.Integer, db.ForeignKey('order.id'),
              primary_key=True)
)

class Client(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    orders = db.relationship('Order', secondary=clients_orders,
                             backref='clients')
    # Define other columns...

class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Define other columns...

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'), nullable=False)
    order = db.relationship('Order', backref='items')
    timestamp = db.Column(db.DateTime)
    # Define other columns...
Then your query could look like this:
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
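For comparison, the SQL that such a query roughly corresponds to can be run directly with the standard-library sqlite3 module. This is only a sketch: the table and column names follow the model above (with "order" quoted because it is a reserved word), the sample rows are invented for illustration, and the exact SQL that SQLAlchemy emits may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY);
    CREATE TABLE clients_orders (
        client_id INTEGER REFERENCES client(id),
        order_id INTEGER REFERENCES "order"(id),
        PRIMARY KEY (client_id, order_id)
    );
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES "order"(id),
        timestamp TEXT
    );
    -- Invented sample data: client 42 shares order 1 with client 7
    -- and has order 2 alone; order 3 belongs only to client 7.
    INSERT INTO client (id) VALUES (42), (7);
    INSERT INTO "order" (id) VALUES (1), (2), (3);
    INSERT INTO clients_orders VALUES (42, 1), (7, 1), (42, 2), (7, 3);
    INSERT INTO item (id, order_id, timestamp) VALUES
        (10, 1, '2013-01-03'),
        (11, 2, '2013-01-01'),
        (12, 3, '2013-01-04'),
        (13, 1, '2013-01-02');
""")


def item_ids_for_client(client_id):
    """Return the ids of all items on the client's orders, oldest first."""
    rows = conn.execute("""
        SELECT item.id
        FROM item
        JOIN "order" ON "order".id = item.order_id
        JOIN clients_orders ON clients_orders.order_id = "order".id
        JOIN client ON client.id = clients_orders.client_id
        WHERE client.id = ?
        ORDER BY item.timestamp
    """, (client_id,)).fetchall()
    return [row[0] for row in rows]


client_42_items = item_ids_for_client(42)
```

The filtering and ordering happen in a single statement inside the database, which is the point of preferring the join over looping through client.orders in Python.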

Related

SQLAlchemy Unique Groups

I have a setup something like this (I am only including relevant fields, both Member and Group have other information such as Name, Description, etc.):
group_members = db.Table('group_members',
    db.Column('member_id', db.Integer, db.ForeignKey('member.id'), primary_key=True),
    db.Column('group_id', db.Integer, db.ForeignKey('group.id'), primary_key=True))

class Member(db.Model):
    id = db.Column(db.Integer, primary_key=True)

class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    members = db.relationship('Member', secondary=group_members,
        lazy='subquery', backref=db.backref('group_members', lazy=True))
Members can be a part of many groups, and groups can have many members.
I want each Group's set of members to be unique. For example, given a group with three members, that would be the only group with those three members. Order should not matter, only membership. It should not be possible to add a group to the database with the same set of members as another group in the database.
I have struggled to find anything that would give the results I require here. The solution I was working on before was to add a hybrid_property which is a string representation of a comma-separated, ordered list of the ids of the Members, but I believe there is some issue with accessing relationships within a hybrid_property, and of course that seems a bit convoluted. I feel there must be some easy way to do this that I am missing.
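The comma-separated key idea described above can be sketched outside the ORM with the standard-library sqlite3 module. Everything here is an assumption made for illustration: the member_key column, the membership_key and add_group helpers, and the choice to let a UNIQUE constraint reject duplicates. It is not taken from the original models and is not the only way to enforce this.

```python
import sqlite3


def membership_key(member_ids):
    """Return an order-independent key such as '1,2,3' for a set of ids."""
    return ",".join(str(i) for i in sorted(set(member_ids)))


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "group" ('
    "id INTEGER PRIMARY KEY, "
    "member_key TEXT NOT NULL UNIQUE)"  # hypothetical column
)


def add_group(member_ids):
    """Insert a group; return False if the same member set already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'INSERT INTO "group" (member_key) VALUES (?)',
                (membership_key(member_ids),),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_group([3, 1, 2])
duplicate = add_group([2, 3, 1])   # same members, different order
different = add_group([1, 2])
```

With this approach the key has to be recomputed whenever a group's membership changes, so it must be kept in sync with the association table.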

Purpose of joining tables in SQLAlchemy

I'm currently switching from raw SQL queries to the SQLAlchemy package and I'm wondering when to join the tables.
I have 3 tables. Actor and Movie are in an M:N relationship; ActorMovie is the junction table:
class Actor(Base):
    __tablename__ = 'actor'
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False, index=True)
    first_name = Column(String(150), nullable=False, index=True)
    movies = relationship('Movie', secondary='actor_movie')

    def __init__(self, last_name, first_name):
        self.last_name = last_name
        self.first_name = first_name

class Movie(Base):
    __tablename__ = 'movie'
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship('Actor', secondary='actor_movie')

    def __init__(self, title):
        self.title = title

class ActorMovie(Base):
    __tablename__ = 'actor_movie'
    fk_actor_id = Column(Integer, ForeignKey('actor.act_id'), primary_key=True)
    fk_movie_id = Column(Integer, ForeignKey('movie.movie_id'), primary_key=True)

    def __init__(self, fk_actor_id, fk_movie_id):
        self.fk_actor_id = fk_actor_id
        self.fk_movie_id = fk_movie_id
When I write a simple query like:
result = session.query(Movie).filter(Movie.title == 'Terminator').first()
I get the Movie object back with an actors field. This field contains an InstrumentedList with all actors related to the film. This seems like a lot of overhead when the relationships are always joined.
Why is the relationship automatically populated and when do I need a manual join?
Based on the result I'm not even sure if the junction table is correct. This seems to be the most "raw SQL" way. I also saw alternative approaches, e.g.:
Official SQLAlchemy documentation
"This seems like a lot of overhead when the relationships are always joined."
They are not. By default, relationships perform a select the first time they are accessed, which is known as lazy loading.
"Why is the relationship automatically populated"
It was accessed on an instance, and the relationship is using the default configuration.
"...and when do I need a manual join?"
When you need to, for example, filter the query based on the related table, or when you are fetching many movies and know beforehand that you will need all or some of their actors. For a many-to-many relationship, though, selectin eager loading may perform better than a join.
"Based on the result I'm not even sure if the junction table is correct."
That is the correct approach. SQLAlchemy is an ORM, and relationship attributes are the object side of the mapping and association/junction tables the relational side.
All in all the purposes are much the same as in raw SQL, but the ORM handles joins under the hood in some cases, such as eager loading, if configured or instructed to do so. As they say on the home page:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
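The lazy-loading behaviour described in this answer can be observed by counting SELECT statements. The following is a minimal sketch that assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the trimmed-down models, the count_select listener and the selects_before_and_after_access helper are illustrative names, not part of the original question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

# Junction table for the M:N relationship (relational side of the mapping).
actor_movie = Table(
    "actor_movie",
    Base.metadata,
    Column("fk_actor_id", Integer, ForeignKey("actor.act_id"), primary_key=True),
    Column("fk_movie_id", Integer, ForeignKey("movie.movie_id"), primary_key=True),
)


class Actor(Base):
    __tablename__ = "actor"
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False)


class Movie(Base):
    __tablename__ = "movie"
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    # Default configuration: lazy loading on first access.
    actors = relationship("Actor", secondary=actor_movie)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

select_count = 0


@event.listens_for(engine, "before_cursor_execute")
def count_select(conn, cursor, statement, parameters, context, executemany):
    """Count every SELECT statement sent to the database."""
    global select_count
    if statement.lstrip().upper().startswith("SELECT"):
        select_count += 1


with Session(engine) as session:
    session.add(Movie(title="Terminator", actors=[Actor(last_name="Hamilton")]))
    session.commit()


def selects_before_and_after_access(eager):
    """Return SELECT counts before and after touching movie.actors."""
    global select_count
    select_count = 0
    with Session(engine) as session:
        query = session.query(Movie).filter(Movie.title == "Terminator")
        if eager:
            query = query.options(selectinload(Movie.actors))
        movie = query.first()
        before = select_count
        names = [actor.last_name for actor in movie.actors]
        after = select_count
    return before, after, names


lazy_before, lazy_after, lazy_names = selects_before_and_after_access(eager=False)
eager_before, eager_after, eager_names = selects_before_and_after_access(eager=True)
```

In the lazy case the extra SELECT is only issued when movie.actors is touched; with selectinload the actors are fetched up front, so accessing the attribute issues no further query.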

SQLAlchemy - How to profit from dynamic and eager loading at the same time

I have two tables, Factorys and Products. Each Factory can have a large collection of Products, so lazy='dynamic' has been applied.
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products = relationship("Product", lazy='dynamic')

class Product(Base):
    __tablename__ = 'products'
    ID = Column(Integer, primary_key=True)
    factory_id = Column(Integer, ForeignKey('factorys.ID'))
    Name = Column(Text)
In case all products of a factory are needed:
factory.products.all()
should be applied. But since the factory is already loaded at this point, it would be more performant to have eager joined loading between Factory and Product.
However, a joined relationship between both tables makes the overall performance worse due to the large collection of products, and it is not required, for example, when appending products to a factory.
Is it possible to define different relationships between two tables and use them only in specific cases? For example, in a method of the Factory class such as:
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products = relationship("Product", lazy='dynamic')

    def _getProducts(self):
        return relationship("Product", lazy='joined')
How can I get all the products of a factory in a performant way, without losing performance when adding products to a factory?
Any tips would be appreciated.
I have run into the same question and had a very difficult time finding the answer.
What you are proposing, returning a relationship from a method, will not work, because SQLAlchemy must know about the relationship as part of the mapped class. However, doing:
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products_dyn = relationship("Product", lazy='dynamic', viewonly=True)
    products = relationship("Product", lazy='joined')
should work. Note the viewonly attribute: it is very important, because without it SQLAlchemy may try to use both relationships when you add a product to the factory and may produce duplicate entries in specific cases (such as when using a secondary table for the relationship).
This way you can use the eagerly loaded products where you need the whole collection and the dynamic relationship where you want to build an optimized query, with both declared on the mapped class.
Hope that helps!

Can SQLAlchemy secondary tables be db.Model instances?

I'm working on a Flask-SQLAlchemy project and I've implemented a nice JSON serialization method that applies to my SQLAlchemy models. After querying the DB, I can then easily present that data via a REST API. When I'm using secondary tables for many-to-many relationships, those tables are instances of db.Table, like so:
elections_voters = db.Table(
    'elections_voters',
    db.metadata,
    db.Column('election_id', db.Integer, db.ForeignKey('elections.id'), primary_key=True),
    db.Column('user_id', db.Integer, db.ForeignKey('users.id'), primary_key=True),
)

class Election(MyModel):
    __tablename__ = 'elections'
    id = db.Column(db.Integer, db.Sequence('election_id_seq'), autoincrement=True, primary_key=True)
    name = db.Column(db.Unicode(255))
    voters = db.relationship('User', secondary=elections_voters, backref='electionsVoting')
Let's say I wanted an API that presented just a list of voters for a particular election. I'd do something like Election.query.get_or_404(election_id), then return election.voters.mycustomserialize(), as voters would be populated by SQLAlchemy. However, it's not an instance of db.Model like its parent, so I can't use my serializing method on that child.
Is there a way to set up my models such that the elections_voters secondary table is a full instance of db.Model rather than just db.Table, and is that wise? My serialization method needs access to the column names, which is why I haven't just split it out into a standalone method.
I was over-thinking it:
In this case, election.voters is just a list, not a model instance. I can use a list comprehension to handle the serializing.
Also, there are Association Objects that do what I described, but I don't think I'll need them.
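The list-comprehension approach can be sketched with plain Python stand-ins. The User and Election classes below are simplified placeholders rather than real db.Model subclasses, and serialize is a hypothetical counterpart of the custom serialization method mentioned in the question.

```python
class User:
    """Stand-in for a mapped model; only the serializer matters here."""

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def serialize(self):
        # Hypothetical per-instance serializer.
        return {"id": self.id, "name": self.name}


class Election:
    """Stand-in whose voters attribute is a plain list of User objects."""

    def __init__(self, name, voters):
        self.name = name
        self.voters = voters


election = Election("Board election", [User(1, "alice"), User(2, "bob")])

# The collection itself has no serializer, but each element does.
serialized_voters = [voter.serialize() for voter in election.voters]
```

The same comprehension should work on a relationship collection, since it only relies on iterating over the related instances.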

min/max with orm relationships

I'm trying to find the min/max of a collection off a foreign key. I know that you can do session.query with func.min and func.max, but is there a way that lets me use the standard ORM relationship stuff?
For example with a blog, if I wanted to find the biggest "number comment" for a given post given the schema below, is it possible to do something like Post.query.get(0).number_comments.max()?
class Post(base):
    id = Column(Integer, primary_key=True)
    number_comments = relationship("NumberComment")

class NumberComment(base):
    id = Column(Integer, primary_key=True)
    num = Column(Integer, nullable=False)
As with raw SQL, you need to join those tables in your query:
# This class lacks a foreign key in your example.
class NumberComment(base):
    # ...
    post_id = Column(Integer, ForeignKey(Post.id), nullable=False)
    # ...

session.query(func.max(NumberComment.num)).join(Post).\
    filter(Post.id == 1).scalar()
There's no other way to do this, at least not the way you wanted. There's a reason SQLAlchemy is named that and not ORMSorcery ;-)
My advice is to think in terms of SQL when trying to come up with a query; this will help you a lot.
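To illustrate thinking in SQL first, here is a sketch of the same aggregate written as raw SQL with the standard-library sqlite3 module. The table names post and number_comment are assumptions (the original classes declare no __tablename__), and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE number_comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post(id),
        num INTEGER NOT NULL
    );
    INSERT INTO post (id) VALUES (1), (2);
    INSERT INTO number_comment (post_id, num) VALUES
        (1, 5), (1, 17), (1, 3), (2, 99);
""")


def max_num_for_post(post_id):
    """Return the largest num among a post's comments, or None if it has none."""
    row = conn.execute("""
        SELECT MAX(number_comment.num)
        FROM number_comment
        JOIN post ON post.id = number_comment.post_id
        WHERE post.id = ?
    """, (post_id,)).fetchone()
    return row[0]


max_for_post_1 = max_num_for_post(1)
```

Once the SQL is clear, the func.max(...).join(...).filter(...) form maps onto it almost clause by clause.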
