Purpose of joining tables in SQLAlchemy - python

I'm currently switching from raw SQL queries to the SQLAlchemy package, and I'm wondering when I need to join tables there.
I have 3 tables. Actor and Movie are in an M:N relationship. Actor_Movie is the junction table:
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship
Base = declarative_base()
class Actor(Base):
    __tablename__ = 'actor'
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False, index=True)
    first_name = Column(String(150), nullable=False, index=True)
    # back_populates links this side to Movie.actors so both stay in sync
    movies = relationship('Movie', secondary='actor_movie', back_populates='actors')
    def __init__(self, last_name, first_name):
        self.last_name = last_name
        self.first_name = first_name
class Movie(Base):
    __tablename__ = 'movie'
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship('Actor', secondary='actor_movie', back_populates='movies')
    def __init__(self, title):
        self.title = title
class ActorMovie(Base):
    __tablename__ = 'actor_movie'
    fk_actor_id = Column(Integer, ForeignKey('actor.act_id'), primary_key=True)
    fk_movie_id = Column(Integer, ForeignKey('movie.movie_id'), primary_key=True)
    def __init__(self, fk_actor_id, fk_movie_id):
        self.fk_actor_id = fk_actor_id
        self.fk_movie_id = fk_movie_id
When I write a simple query like:
result = session.query(Movie).filter(Movie.title == 'Terminator').first()
I get the Movie object back with an actors field. This actors field contains an InstrumentedList with all actors related to the film. This seems like a lot of overhead if the relationships are always joined.
Why is the relationship automatically populated and when do I need a manual join?
Based on the result, I'm not even sure if the junction table is correct. It seems to be the most "raw SQL" way. I also saw alternative approaches, e.g. in the official SQLAlchemy documentation.

"This seems like a lot overhead when the relationships are always joined."
They are not. By default, a relationship emits a SELECT the first time it is accessed; this is called lazy loading.
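A minimal sketch that makes the lazy load visible, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The models are trimmed-down versions of the question's, with the junction table declared as a plain `Table` for brevity. A `before_cursor_execute` engine event records each SQL statement, so you can see that the actors are fetched only when `movie.actors` is first touched.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Junction table as a plain Table (instead of a mapped class), for brevity
actor_movie = Table(
    "actor_movie",
    Base.metadata,
    Column("fk_actor_id", Integer, ForeignKey("actor.act_id"), primary_key=True),
    Column("fk_movie_id", Integer, ForeignKey("movie.movie_id"), primary_key=True),
)


class Actor(Base):
    __tablename__ = "actor"
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False)
    movies = relationship("Movie", secondary=actor_movie, back_populates="actors")


class Movie(Base):
    __tablename__ = "movie"
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship("Actor", secondary=actor_movie, back_populates="movies")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Movie(title="Terminator", actors=[Actor(last_name="Schwarzenegger")]))
    session.commit()

# From here on, record every statement sent to the database
statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    movie = session.query(Movie).filter(Movie.title == "Terminator").first()
    actors_loaded_early = "actors" in movie.__dict__  # collection not loaded yet
    queries_before_access = len(statements)  # only the SELECT for the movie
    names = [actor.last_name for actor in movie.actors]  # first access: lazy load
    queries_after_access = len(statements)
```

The second statement is the lazy load through `actor_movie`. Nothing is joined until you actually read the collection.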
"Why is the relationship automatically populated"
It was accessed on an instance, and the relationship uses the default (lazy) loading configuration.
"...and when do I need a manual join?"
For example, if you need to filter the query based on the related table, or if you are fetching many movies and know beforehand that you will need all or some of their actors. For a many-to-many relationship, though, selectin eager loading may perform better than a join.
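A sketch of both cases, assuming SQLAlchemy 1.4+, in-memory SQLite, trimmed-down versions of the question's models and made-up sample data. The first query filters movies on an actor column, which needs an explicit join. The second uses the `selectinload()` loader option to fetch all movies plus their actors in two statements, instead of one lazy load per movie.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

actor_movie = Table(
    "actor_movie",
    Base.metadata,
    Column("fk_actor_id", Integer, ForeignKey("actor.act_id"), primary_key=True),
    Column("fk_movie_id", Integer, ForeignKey("movie.movie_id"), primary_key=True),
)


class Actor(Base):
    __tablename__ = "actor"
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False)
    movies = relationship("Movie", secondary=actor_movie, back_populates="actors")


class Movie(Base):
    __tablename__ = "movie"
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship("Actor", secondary=actor_movie, back_populates="movies")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    arnold = Actor(last_name="Schwarzenegger")
    session.add_all([
        Movie(title="Terminator", actors=[arnold, Actor(last_name="Hamilton")]),
        Movie(title="Predator", actors=[arnold]),
        Movie(title="Titanic", actors=[Actor(last_name="DiCaprio")]),
    ])
    session.commit()

# Case 1: filtering on a column of the related table needs an explicit join
with Session(engine) as session:
    query = (
        session.query(Movie)
        .join(Movie.actors)
        .filter(Actor.last_name == "Schwarzenegger")
        .order_by(Movie.title)
    )
    arnold_titles = [movie.title for movie in query]

# Record the statements issued by case 2
statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


# Case 2: many movies plus all of their actors, using selectin eager loading
with Session(engine) as session:
    movies = session.query(Movie).options(selectinload(Movie.actors)).order_by(Movie.title).all()
    cast = {movie.title: sorted(a.last_name for a in movie.actors) for movie in movies}
```

Case 2 costs one SELECT for the movies and one `IN (...)` SELECT for all their actors. Iterating the collections afterwards issues no further queries.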
"Based on the result I'm not even sure if the junction table is correct."
That is the correct approach. SQLAlchemy is an ORM, and relationship attributes are the object side of the mapping and association/junction tables the relational side.
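To illustrate the two sides, the sketch below appends an actor to `Movie.actors` and then reads `actor_movie` directly with a Core `select()`. It assumes SQLAlchemy 1.4+, in-memory SQLite and trimmed-down models, with the junction table declared as a plain `Table`. The list append on the object side becomes one row on the relational side.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

actor_movie = Table(
    "actor_movie",
    Base.metadata,
    Column("fk_actor_id", Integer, ForeignKey("actor.act_id"), primary_key=True),
    Column("fk_movie_id", Integer, ForeignKey("movie.movie_id"), primary_key=True),
)


class Actor(Base):
    __tablename__ = "actor"
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False)
    movies = relationship("Movie", secondary=actor_movie, back_populates="actors")


class Movie(Base):
    __tablename__ = "movie"
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship("Actor", secondary=actor_movie, back_populates="movies")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    terminator = Movie(title="Terminator")
    arnold = Actor(last_name="Schwarzenegger")
    # Object side: a plain list append, no junction object involved
    terminator.actors.append(arnold)
    session.add(terminator)  # arnold is cascaded into the session as well
    session.commit()

    # Relational side: the append was persisted as one row of actor_movie
    rows = session.execute(select(actor_movie.c.fk_actor_id, actor_movie.c.fk_movie_id))
    junction_rows = [tuple(row) for row in rows]
    expected_row = (arnold.act_id, terminator.movie_id)
    # back_populates kept the reverse collection in sync
    arnold_titles = [movie.title for movie in arnold.movies]
```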
All in all, the purposes are much the same as in raw SQL, but the ORM handles joins under the hood in some cases, such as eager loading, if configured or instructed to do so. As they say on the home page:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

Related

Flask SQLAlchemy: many to many relationship error

I am trying to set up a many-to-many relationship in SQLAlchemy, but I am getting this error:
from shopapp import db
db.create_all()
sqlalchemy.exc.NoReferencedTableError: Foreign key associated with column 'shoppinglists_products.shoppinglist_id_v2' could not find table 'shoppinglist' with which to generate a foreign key to target column 'id'
My code:
from sqlalchemy import ForeignKey
from shopapp import db
shoppinglists_products = db.Table("shoppinglists_products",
    db.Column("shoppinglist_id", db.Integer, ForeignKey("shoppinglist.id")),
    db.Column("product_id", db.Integer, ForeignKey("product.id")))
class ShoppingList(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
    products = db.relationship('Product', back_populates="shoppinglists", secondary="shoppinglists_products")
class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
Where is the problem?
It seems like Flask-SQLAlchemy has a problem finding the table for the foreign key reference. Based on your code, here are two ways you can fix this:
1) Fix shoppinglists_products table:
Unless you set __tablename__, Flask-SQLAlchemy derives the table name from the CamelCased model name by converting it to camel_case. In your case, ShoppingList becomes the table shopping_list. Therefore, changing ForeignKey("shoppinglist.id") to ForeignKey("shopping_list.id") will do the trick.
shoppinglists_products = db.Table("shoppinglists_products",
    db.Column("shoppinglist_id", db.Integer, ForeignKey("shopping_list.id")),  # <-- fixed
    db.Column("product_id", db.Integer, ForeignKey("product.id")))
2) Change the model names:
If you'd like, you could change the model name from ShoppingList to Shopping and then refer to the table as shopping. This avoids further confusion. Some developers avoid class names made of two words in ORM code for this reason: different frameworks derive table names from class names in different ways.
Expanding on @P0intMaN's answer: explicitly providing the table name with __tablename__ = "ShoppingList" (for example) lets you use your preferred case style and prevents Flask-SQLAlchemy from 'helping' you by renaming something rather important without telling you.
class ShoppingList(db.Model):
    __tablename__ = "ShoppingList"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
    products = db.relationship('Product', back_populates="shoppinglists", secondary="shoppinglists_products")
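A self-contained sketch of the explicit-`__tablename__` approach. It uses plain SQLAlchemy 1.4+ (`declarative_base`) rather than Flask-SQLAlchemy so it runs without an app; with `db.Model` the pattern is the same. It assumes `Product.__tablename__ = "Product"` and adds the `Product.shoppinglists` side that `back_populates="shoppinglists"` expects, which the question's `Product` model does not define. The point is that every `ForeignKey` string must name the table exactly as `__tablename__` spells it.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# ForeignKey targets spell the table names exactly as __tablename__ does below
shoppinglists_products = Table(
    "shoppinglists_products",
    Base.metadata,
    Column("shoppinglist_id", Integer, ForeignKey("ShoppingList.id"), primary_key=True),
    Column("product_id", Integer, ForeignKey("Product.id"), primary_key=True),
)


class ShoppingList(Base):
    __tablename__ = "ShoppingList"
    id = Column(Integer, primary_key=True)
    name = Column(String(20), unique=True, nullable=False)
    products = relationship(
        "Product", secondary=shoppinglists_products, back_populates="shoppinglists"
    )


class Product(Base):
    __tablename__ = "Product"
    id = Column(Integer, primary_key=True)
    name = Column(String(20), unique=True, nullable=False)
    # The counterpart that back_populates="shoppinglists" refers to
    shoppinglists = relationship(
        "ShoppingList", secondary=shoppinglists_products, back_populates="products"
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)  # the FK targets resolve because the names match

with Session(engine) as session:
    weekly = ShoppingList(name="weekly", products=[Product(name="milk"), Product(name="eggs")])
    session.add(weekly)
    session.commit()
    milk = session.query(Product).filter_by(name="milk").one()
    milk_lists = [shopping_list.name for shopping_list in milk.shoppinglists]

table_names = sorted(inspect(engine).get_table_names())
```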
In many/most Flask tutorials and books, simplistic table names (e.g. posts, comments, users) are used, which elide this issue. Thus a trap awaits for those of us who insist on meaningful CamelCased class names. This is mentioned somewhat casually in the documentation here: https://flask-sqlalchemy.palletsprojects.com/en/2.x/models/
Some parts that are required in SQLAlchemy are optional in Flask-SQLAlchemy. For instance the table name is automatically set for you unless overridden. It’s derived from the class name converted to lowercase and with “CamelCase” converted to “camel_case”. To override the table name, set the __tablename__ class attribute.
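The naming rule quoted above can be approximated with a regular expression. `camel_to_snake` is a hypothetical helper, not part of Flask-SQLAlchemy's API, and the library's real implementation may differ for edge cases such as runs of capitals. It only shows why `ShoppingList` ends up as `shopping_list` and not `shoppinglist`.

```python
import re


def camel_to_snake(name: str) -> str:
    """Approximate the "CamelCase" -> "camel_case" rule from the docs.

    Hypothetical helper for illustration; not Flask-SQLAlchemy's own code.
    """
    # Insert "_" before every capital letter that is not the first character,
    # then lowercase the whole string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for class_name in ["ShoppingList", "Product", "Shopping"]:
    print(class_name, "->", camel_to_snake(class_name))
```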

SQLAlchemy - How to profit from dynamic and eager loading at the same time

I have two tables, Factorys and Products. Each Factory can have a large collection of Products, so lazy='dynamic' has been applied.
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products = relationship("Product", lazy='dynamic')
class Product(Base):
    __tablename__ = 'products'
    ID = Column(Integer, primary_key=True)
    factory_id = Column(Integer, ForeignKey('factorys.ID'))
    Name = Column(Text)
In case all products of a factory are needed:
factory.products.all()
should be applied. But since the factory is already loaded at this point, it would be more efficient to load the products eagerly with a join.
But a joined relationship between both tables makes the overall performance worse because of the large collection of products, and it is not required, for example, when appending products to a factory.
Is it possible to define different relationships between the two tables, but use them only in specific cases? For example, in a method of the factory class such as:
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products = relationship("Product", lazy='dynamic')
    def _getProducts():
        return relationship("Product", lazy='joined')
How can I get all the products of a factory efficiently, without losing performance when adding products to a factory?
Any tips would be appreciated.
I have run into the same question and had a very difficult time finding the answer.
What you are proposing (returning a relationship from a method) will not work, because SQLAlchemy has to know about every relationship of a class when the class is mapped. But doing:
class Factory(Base):
    __tablename__ = 'factorys'
    ID = Column(Integer, primary_key=True)
    products_dyn = relationship("Product", lazy='dynamic', viewonly=True)
    products = relationship("Product", lazy='joined')
should work. Note the viewonly flag: it is very important, because without it SQLAlchemy may try to use both relationships when you add a product to the factory and may produce duplicate entries in specific cases (such as using a secondary table for the relationship).
This way you can use both: the eagerly loaded products (an optimized query with a join, hidden behind the relationship declaration) and the dynamic relationship for queries of your own.
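A runnable sketch of the two-relationship idea, assuming SQLAlchemy 1.4+, in-memory SQLite and made-up product names. An engine event counts statements to show that `products` arrives in the same SELECT as the factory. `products_dyn`, by contrast, issues its own query (here a COUNT) without loading the collection.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Factory(Base):
    __tablename__ = "factorys"
    ID = Column(Integer, primary_key=True)
    # Read-only, query-like view: filter or count without loading every product
    products_dyn = relationship("Product", lazy="dynamic", viewonly=True)
    # Writable collection, loaded together with the factory via a LEFT OUTER JOIN
    products = relationship("Product", lazy="joined")


class Product(Base):
    __tablename__ = "products"
    ID = Column(Integer, primary_key=True)
    factory_id = Column(Integer, ForeignKey("factorys.ID"))
    Name = Column(Text)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Factory(ID=1, products=[Product(Name=n) for n in ("Axle", "Bolt", "Bracket")]))
    session.commit()

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    factory = session.query(Factory).filter(Factory.ID == 1).one()
    all_names = sorted(p.Name for p in factory.products)  # already loaded by the join
    queries_for_all = len(statements)
    # The dynamic relationship runs its own SQL instead of loading rows
    b_count = factory.products_dyn.filter(Product.Name.like("B%")).count()
    queries_total = len(statements)
```

If always joining the products turns out too expensive, a common alternative (a different technique from the answer above) is to leave `products` on the default lazy loading and request eager loading per query, e.g. `session.query(Factory).options(selectinload(Factory.products))`. Such loader options cannot be applied to a `lazy='dynamic'` relationship, so they have to target the non-dynamic one.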
Hope that helps!

How to do two-level many-to-many relationships?

I'm working in Flask with Flask-SQLAlchemy and trying to set up a many-to-many-to-many relationship: clients can have multiple orders and orders can also have multiple clients (many-to-many); each order in turn contains a list of unique items (one-to-many).
I followed the SQLAlchemy documentation to set up an association table for the many-to-many relationship, and used a normal relationship/foreign key for the one-to-many relationship; all references are set to lazy='dynamic'.
association_table = Table('association', Base.metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id'))
)
What is an efficient way to retrieve all items associated with a client? I'm assuming [item for order in client.orders for item in order.items.all()] will work (less the problem above), but is there a more efficient way to do it? What if the results need to be ordered?
Update
I now know two ways to retrieve the items for a client's orders (the first one is from Audrius' answer):
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
and
Item.query.filter(Item.order_id.in_(client.order_ids)).order_by(Item.timestamp)
I believe they both provide the same result, but which one should be preferred for efficiency?
When writing a query directly in SQL, you would use joins to retrieve the data you want efficiently (as Rachcha demonstrated in his answer). The same applies to SQLAlchemy. Refer to the SQLAlchemy docs on join() for more examples.
If your model is defined like the following (using Flask-SQLAlchemy, since you tagged your question with its tag):
clients_orders = db.Table('clients_orders',
    db.Column('client_id', db.Integer, db.ForeignKey('client.id'),
              primary_key=True),
    db.Column('order_id', db.Integer, db.ForeignKey('order.id'),
              primary_key=True)
)
class Client(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    orders = db.relationship('Order', secondary=clients_orders,
                             backref='clients')
    # Define other columns...
class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Define other columns...
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'), nullable=False)
    order = db.relationship('Order', backref='items')
    timestamp = db.Column(db.DateTime)
    # Define other columns...
Then your query could look like this:
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
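A runnable comparison of the two queries from the question. It assumes plain SQLAlchemy 1.4+ instead of Flask-SQLAlchemy, explicit table names (`clients`, `orders`, `items`), `back_populates` instead of `backref`, an extra `name` column on `Item`, and made-up sample data. Both return the same items in timestamp order. The join version does it in one statement; the `in_()` version first has to load the client's orders.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

clients_orders = Table(
    "clients_orders",
    Base.metadata,
    Column("client_id", Integer, ForeignKey("clients.id"), primary_key=True),
    Column("order_id", Integer, ForeignKey("orders.id"), primary_key=True),
)


class Client(Base):
    __tablename__ = "clients"
    id = Column(Integer, primary_key=True)
    orders = relationship("Order", secondary=clients_orders, back_populates="clients")


class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    clients = relationship("Client", secondary=clients_orders, back_populates="orders")
    items = relationship("Item", back_populates="order")


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String(20))
    order_id = Column(Integer, ForeignKey("orders.id"), nullable=False)
    order = relationship("Order", back_populates="items")
    timestamp = Column(DateTime)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    o1 = Order(items=[Item(name="a", timestamp=datetime(2024, 1, 3)),
                      Item(name="b", timestamp=datetime(2024, 1, 1))])
    o2 = Order(items=[Item(name="c", timestamp=datetime(2024, 1, 2))])
    o3 = Order(items=[Item(name="d", timestamp=datetime(2024, 1, 4))])
    # Order o2 is shared by both clients
    session.add_all([Client(id=42, orders=[o1, o2]), Client(id=7, orders=[o2, o3])])
    session.commit()

    # One statement: items -> orders -> clients_orders -> clients
    join_query = (
        session.query(Item)
        .join(Order)
        .join(Order.clients)
        .filter(Client.id == 42)
        .order_by(Item.timestamp)
    )
    joined = [item.name for item in join_query]

    # Two round trips: load the client's orders, then filter items with IN
    client = session.get(Client, 42)
    order_ids = [order.id for order in client.orders]
    in_query = session.query(Item).filter(Item.order_id.in_(order_ids)).order_by(Item.timestamp)
    via_in = [item.name for item in in_query]
```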

min/max with orm relationships

I'm trying to find the min/max of a collection linked through a foreign key. I know that you can do session.query with func.min and func.max, but is there a way that lets me use the standard ORM relationship features?
For example with a blog, if I wanted to find the biggest "number comment" for a given post given the schema below, is it possible to do something like Post.query.get(0).number_comments.max()?
class Post(base):
    id = Column(Integer, primary_key=True)
    number_comments = relationship("NumberComment")
class NumberComment(base):
    id = Column(Integer, primary_key=True)
    num = Column(Integer, nullable=False)
As with raw SQL, you need to join those tables in your query:
# This class lacks a foreign key in your example.
class NumberComment(base):
    # ...
    post_id = Column(Integer, ForeignKey(Post.id), nullable=False)
    # ...
session.query(func.max(NumberComment.num)).join(Post).\
    filter(Post.id == 1).scalar()
There's no other way to do this, at least not the way you wanted. There's a reason SQLAlchemy is called that and not ORMSorcery ;-)
My advice would be to think in terms of SQL when trying to come up with a query; this will help you a lot.
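A self-contained version of the query from this answer, assuming SQLAlchemy 1.4+, in-memory SQLite, explicit table names and made-up comment numbers. It also shows that, because `post_id` lives on `number_comment`, filtering on it directly returns the same maximum without a join. The join becomes necessary once you filter on columns of `post` itself.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    number_comments = relationship("NumberComment")


class NumberComment(Base):
    __tablename__ = "number_comment"
    id = Column(Integer, primary_key=True)
    num = Column(Integer, nullable=False)
    post_id = Column(Integer, ForeignKey("post.id"), nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Post(id=1, number_comments=[NumberComment(num=n) for n in (3, 9, 4)]),
        Post(id=2, number_comments=[NumberComment(num=20)]),
    ])
    session.commit()

    # The query from the answer: MAX over the comments joined to one post
    max_via_join = (
        session.query(func.max(NumberComment.num))
        .join(Post)
        .filter(Post.id == 1)
        .scalar()
    )
    # Same result here without the join, by filtering on the foreign key column
    max_direct = (
        session.query(func.max(NumberComment.num))
        .filter(NumberComment.post_id == 1)
        .scalar()
    )
```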

SqlAlchemy: How to create connections between users, i.e. make "friends"

I'm trying to use SQLAlchemy to make SQLite database tables inside Pylons. I'm using the declarative base to create the table, class and mapper all at once with the following code:
class Friends(Base):
    __tablename__ = 'friends'
    left_id = Column(Integer, ForeignKey('facebooks.id'), primary_key=True)
    right_id = Column(Integer, ForeignKey('facebooks.id'), primary_key=True)
    def __repr__(self):
        return "<Friend(id:'%s' id: '%s')>" % (self.left_id, self.right_id)
class Facebook(Base):
    __tablename__ = 'facebooks'
    id = Column(Integer, primary_key=True)
    friends = relationship("Facebook",
                           secondary=Friends.__tablename__,
                           primaryjoin=id == Friends.right_id,
                           secondaryjoin=Friends.left_id == id)
    def __init__(self, id):
        self.id = id
    def __repr__(self):
        return "<User(id:'%s')>" % (self.id)
I'm just learning about all the different relationships (many-to-one, one-to-many, one-to-one and many-to-many) and how to implement each with tables and/or declaratively. I'm wondering: how do I associate an object with itself? For example, I want to associate facebooks with other facebooks, i.e. build connections between them and establish them as "friends". How would I structure the database to make this possible?
Edit: I changed the code (updated above) and added an association object called "Friends", but when I add a friend to a facebook object, it only works in one direction. If I add Bob as a friend to John, I can see Bob in John.friends, but I cannot see John in Bob.friends. What am I doing wrong? I tried adding the following relationship in the Friends class:
friend = relationship("Facebook", backref="friends")
but I get an error:
sqlalchemy.exc.ArgumentError: Could not determine join condition between parent/child tables on relationship Friends.friend. Specify a 'primaryjoin' expression. If 'secondary' is present, 'secondaryjoin' is needed as well.
How is this much different from a 1:N or N:M relationship? Storing the friend relationships in a table isFriend(user1_id, user2_id) is straightforward. If you think of friendship as a graph, check this: http://www.sqlalchemy.org/docs/orm/examples.html#directed-graphs
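One common way to make friendship symmetric with the question's `primaryjoin`/`secondaryjoin` setup is to store each friendship twice, once per direction. A minimal sketch, assuming SQLAlchemy 1.4+ and in-memory SQLite. The junction is a plain `Table` here, and `befriend()` is a hypothetical helper, not an SQLAlchemy API.

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

friendship = Table(
    "friends",
    Base.metadata,
    Column("left_id", Integer, ForeignKey("facebooks.id"), primary_key=True),
    Column("right_id", Integer, ForeignKey("facebooks.id"), primary_key=True),
)


class Facebook(Base):
    __tablename__ = "facebooks"
    id = Column(Integer, primary_key=True)
    # Same join conditions as the question: "my" rows are the ones with right_id == my id
    friends = relationship(
        "Facebook",
        secondary=friendship,
        primaryjoin=id == friendship.c.right_id,
        secondaryjoin=friendship.c.left_id == id,
    )

    def befriend(self, other):
        """Hypothetical helper: store the friendship in both directions."""
        if other not in self.friends:
            self.friends.append(other)
        if self not in other.friends:
            other.friends.append(self)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    john, bob = Facebook(id=1), Facebook(id=2)
    session.add_all([john, bob])
    john.befriend(bob)
    session.commit()
    john_friends = [f.id for f in john.friends]
    bob_friends = [f.id for f in bob.friends]
    row_count = len(session.execute(friendship.select()).all())
```

The cost of this design is two rows per friendship. Each direction is an ordinary many-to-many row, so the relationship needs no special query to see friends from either side.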
