Can SQLAlchemy secondary tables be db.Model instances? - python

I'm working on a Flask-SQLAlchemy project and I've implemented a nice JSON serialization method that applies to my SQLAlchemy models. After querying the DB, I can then easily present that data via a REST API. When I'm using secondary tables for many-to-many relationships, those tables are instances of db.Table, like so:
elections_voters = db.Table(
    'elections_voters',
    db.metadata,
    db.Column('election_id', db.Integer, db.ForeignKey('elections.id'), primary_key=True),
    db.Column('user_id', db.Integer, db.ForeignKey('users.id'), primary_key=True),
)

class Election(MyModel):
    __tablename__ = 'elections'
    id = db.Column(db.Integer, db.Sequence('election_id_seq'), autoincrement=True, primary_key=True)
    name = db.Column(db.Unicode(255))
    voters = db.relationship('User', secondary=elections_voters, backref='electionsVoting')
Let's say I wanted an API that presented just a list of voters for a particular election. I'd do something like Election.query.get_or_404(election_id), then return election.voters.mycustomserialize(), as voters would be populated by SQLAlchemy. However, it's not an instance of db.Model like its parent, so I can't use my serializing method on that child.
Is there a way to set up my models such that the elections_voters secondary table is a full instance of db.Model rather than just db.Table, and is that wise? My serialization method needs access to the column names, which is why I haven't just split it out into a standalone method.

I was over-thinking it:
In this case, election.voters is just a list of model instances, not a model instance itself. I can just use a list comprehension to handle the serializing.
Also, there are Association Objects that do what I described, but I don't think I'll need them.
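The list-comprehension approach can be sketched without Flask at all. Here FakeUser and mycustomserialize are hypothetical stand-ins for the question's User model and custom serializer:

```python
# Minimal sketch: FakeUser and mycustomserialize are stand-ins for the
# question's User model and its custom per-model serializer.
class FakeUser:
    def __init__(self, name):
        self.name = name

    def mycustomserialize(self):
        # the real method would introspect the model's columns
        return {'name': self.name}

# election.voters would be a list of User instances, so the per-model
# serializer still applies, element by element:
voters = [FakeUser('alice'), FakeUser('bob')]
voters_json = [voter.mycustomserialize() for voter in voters]
```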


Flask SQLAlchemy: many to many relationship error

I am trying to set up a many-to-many relationship in SQLAlchemy, but when I run:
from shopapp import db
db.create_all()
I get the error:
sqlalchemy.exc.NoReferencedTableError: Foreign key associated with column 'shoppinglists_products.shoppinglist_id_v2' could not find table 'shoppinglist' with which to generate a foreign key to target column 'id'
My code:
from sqlalchemy import ForeignKey
from shopapp import db
shoppinglists_products = db.Table(
    "shoppinglists_products",
    db.Column("shoppinglist_id", db.Integer, ForeignKey("shoppinglist.id")),
    db.Column("product_id", db.Integer, ForeignKey("product.id")))

class ShoppingList(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
    products = db.relationship('Product', back_populates="shoppinglists", secondary="shoppinglists_products")

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
Where is the problem?
It seems like Flask-SQLAlchemy is having trouble finding the table for the foreign key reference. Based on your code, here are two ways you can fix this:
1) Fix the shoppinglists_products table:
Flask-SQLAlchemy derives a model's table name from its class name by converting CamelCase to snake_case. In your case, ShoppingList will be referred to as shopping_list. Therefore, changing ForeignKey("shoppinglist.id") to ForeignKey("shopping_list.id") will do the trick:
shoppinglists_products = db.Table(
    "shoppinglists_products",
    db.Column("shoppinglist_id", db.Integer, ForeignKey("shopping_list.id")),  # <-- fixed
    db.Column("product_id", db.Integer, ForeignKey("product.id")))
2) Change the model names:
2) Change the model names:
If you'd like, you could rename the model from ShoppingList to Shopping and refer to it as shopping from then on, which prevents further confusion. Developers often avoid class names made of two words for ORM models, because different frameworks have different ways of deriving table names from class names.
Expanding on #P0intMaN's answer: explicitly providing the SQLAlchemy table name with __tablename__ = "ShoppingList" (for example) lets you use your preferred case style and prevents SQLAlchemy from 'helping' you by changing the name of something rather important without telling you.
class ShoppingList(db.Model):
    __tablename__ = "ShoppingList"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True, nullable=False)
    products = db.relationship('Product', back_populates="shoppinglists", secondary="shoppinglists_products")
In many/most Flask tutorials and books, simplistic table names (e.g. posts, comments, users) are used, which elide this issue. Thus a trap awaits those of us who insist on meaningful CamelCased class names. This is mentioned somewhat casually in the documentation here: https://flask-sqlalchemy.palletsprojects.com/en/2.x/models/
Some parts that are required in SQLAlchemy are optional in Flask-SQLAlchemy. For instance the table name is automatically set for you unless overridden. It's derived from the class name converted to lowercase and with "CamelCase" converted to "camel_case". To override the table name, set the __tablename__ class attribute.
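The conversion can be approximated with a one-line regex. This is only a rough sketch of the rule Flask-SQLAlchemy applies, not its actual implementation (which also handles runs of capitals like acronyms differently):

```python
import re

def camel_to_snake(name):
    # Rough approximation of Flask-SQLAlchemy's table-name derivation:
    # insert '_' before each upper-case letter that follows a lower-case
    # letter or digit, then lower-case the whole string.
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

camel_to_snake('ShoppingList')  # 'shopping_list'
camel_to_snake('Product')       # 'product'
```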

Purpose of joining tables in SQLAlchemy

I'm currently switching from raw SQL queries to the SQLAlchemy package, and I'm wondering when to join these tables.
I have 3 tables. Actor and Movie are in an M:N relationship; ActorMovie is the junction table:
class Actor(Base):
    __tablename__ = 'actor'
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150), nullable=False, index=True)
    first_name = Column(String(150), nullable=False, index=True)
    movies = relationship('Movie', secondary='actor_movie')

    def __init__(self, last_name, first_name):
        self.last_name = last_name
        self.first_name = first_name

class Movie(Base):
    __tablename__ = 'movie'
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship('Actor', secondary='actor_movie')

    def __init__(self, title):
        self.title = title

class ActorMovie(Base):
    __tablename__ = 'actor_movie'
    fk_actor_id = Column(Integer, ForeignKey('actor.act_id'), primary_key=True)
    fk_movie_id = Column(Integer, ForeignKey('movie.movie_id'), primary_key=True)

    def __init__(self, fk_actor_id, fk_movie_id):
        self.fk_actor_id = fk_actor_id
        self.fk_movie_id = fk_movie_id
When I write a simple query like:
result = session.query(Movie).filter(Movie.title == 'Terminator').first()
I get the Movie object back with an actors field. This field contains an InstrumentedList with all actors that are related to the film. This seems like a lot of overhead when the relationships are always joined.
Why is the relationship automatically populated, and when do I need a manual join?
Based on the result I'm not even sure if the junction table is correct. This seems to be the most "raw SQL" way; I also saw alternative approaches, e.g.:
Official SQLAlchemy documentation
"This seems like a lot overhead when the relationships are always joined."
They are not. By default relationships perform a select the first time they are accessed — so called lazy loading.
"Why is the relationship automatically populated"
It was accessed on an instance and the relationship is using default configuration.
"...and when do I need a manual join?"
When you need to filter the query based on the related table, for example, or when you are fetching many movies and know beforehand that you will need all or some of their actors. Note that for a many-to-many relationship, selectin eager loading may perform better than a join.
"Based on the result I'm not even sure if the junction table is correct."
That is the correct approach. SQLAlchemy is an ORM, and relationship attributes are the object side of the mapping and association/junction tables the relational side.
All in all the purposes are much the same as in raw SQL, but the ORM handles joins under the hood in some cases, such as eager loading, if configured or instructed to do so. As they say on the home page:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
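The three situations above (lazy loading, eager loading, and a manual join for filtering) can be seen side by side in a self-contained sketch. This uses an in-memory SQLite database and simplified versions of the question's models; names like 'Schwarzenegger' are purely illustrative:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker, selectinload

Base = declarative_base()

class Actor(Base):
    __tablename__ = 'actor'
    act_id = Column(Integer, primary_key=True)
    last_name = Column(String(150))
    movies = relationship('Movie', secondary='actor_movie')

class Movie(Base):
    __tablename__ = 'movie'
    movie_id = Column(Integer, primary_key=True)
    title = Column(String(150))
    actors = relationship('Actor', secondary='actor_movie')

class ActorMovie(Base):
    __tablename__ = 'actor_movie'
    fk_actor_id = Column(Integer, ForeignKey('actor.act_id'), primary_key=True)
    fk_movie_id = Column(Integer, ForeignKey('movie.movie_id'), primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Movie(title='Terminator', actors=[Actor(last_name='Schwarzenegger')]))
session.commit()

# 1) Lazy loading (the default): the actors are fetched by a separate
#    SELECT the first time movie.actors is accessed.
m = session.query(Movie).filter(Movie.title == 'Terminator').first()

# 2) Eager loading: fetch movies and their actors up front.
m2 = session.query(Movie).options(selectinload(Movie.actors)).first()

# 3) Manual join: needed to filter movies by a property of the
#    related table.
hits = (session.query(Movie)
        .join(Movie.actors)
        .filter(Actor.last_name == 'Schwarzenegger')
        .all())
```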

SQLAlchemy - when to make extra models and relationships vs. just storing JSON in column?

I'm writing an app framework for a project, where each app is a set of functions. To describe these functions (parameter schemas, return schemas, plugin info, etc.) I'm using an OpenAPI 3.0-like syntax: https://swagger.io/specification/
These app API descriptions are stored in a PostgreSQL database using SQLAlchemy and serialized/deserialized using Marshmallow.
My question mainly concerns nested objects like the Info object: https://swagger.io/specification/#infoObject
In my mind, I could go about this in one of two ways:
A: Just storing the JSON representation of the object in a column, and validating the schema of that object myself:
class AppApi(Base):
    __tablename__ = 'app_api'
    id_ = Column(UUIDType(binary=False), primary_key=True, nullable=False, default=uuid4)
    info = Column(sqlalchemy_utils.JSONType, nullable=False)
B: Creating a new table for each nested object, and relying on Marshmallow to validate it against the schema during serialization:
class AppApi(Base):
    __tablename__ = 'app_api'
    id_ = Column(UUIDType(binary=False), primary_key=True, nullable=False, default=uuid4)
    info = relationship("ApiInfo", cascade="all, delete-orphan", passive_deletes=True)

class ApiInfo(Base):
    __tablename__ = 'api_info'
    id_ = Column(UUIDType(binary=False), primary_key=True, nullable=False, default=uuid4)
    app_api_id = Column(sqlalchemy_utils.UUIDType(binary=False), ForeignKey('app_api.id_', ondelete='CASCADE'))
    name = Column(String(), nullable=False)
    description = Column(String(), nullable=False)
    # ...etc.
I'm inclined to go for option A since it seems much less involved, but option B feels more "correct." Option A gives me more flexibility and doesn't require me to make models for every single object, but Option B makes it clearer what is being stored in the database.
The app's info object won't be accessed independently of the rest of the app's API, so I'm not sure that there's much value in creating a separate table for it.
What are some other considerations I should be making to choose one or the other?
I think B is better.
With separate tables, you can query and filter on ApiInfo's columns directly in SQL, which is faster and easier than digging into a JSON blob.
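To illustrate the point about querying, here is a self-contained sketch of option B. Integer keys stand in for the question's UUIDType columns (to avoid the sqlalchemy_utils dependency), and 'my_app' is an invented value:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

# Simplified stand-ins for the question's models: integer primary keys
# instead of UUIDs, and only the columns needed for the demonstration.
class AppApi(Base):
    __tablename__ = 'app_api'
    id_ = Column(Integer, primary_key=True)
    info = relationship('ApiInfo', cascade='all, delete-orphan')

class ApiInfo(Base):
    __tablename__ = 'api_info'
    id_ = Column(Integer, primary_key=True)
    app_api_id = Column(Integer, ForeignKey('app_api.id_', ondelete='CASCADE'))
    name = Column(String, nullable=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(AppApi(info=[ApiInfo(name='my_app')]))
session.commit()

# Filtering on a nested field is a plain SQL join; with option A the
# same filter would have to reach into a JSON blob instead.
match = session.query(AppApi).join(ApiInfo).filter(ApiInfo.name == 'my_app').one()
```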

SQLAlchemy relationship not updated after flush

I have some models with a relationship defined between them like so:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True, nullable=False)
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True, nullable=False)
    father_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
If I add a child within the session (using session.add(Child(...))), I would expect its father's children relationship to update to include this child after flushing the session. However, I'm not seeing that.
parent = session.query(Parent).get(parent_id)
num_children = len(parent.children)
# num_children == 3, for example

session.add(Child(father_id=parent_id))
session.flush()

new_num_children = len(parent.children)
# new_num_children == 3, but it should be 4!
Any help would be much appreciated!
I can add the new child to the parent.children list directly and flush the session, but due to other existing code, I want to add it using session.add.
I can also commit after adding the child, which does correctly update the parent.children relationship, but I don't want to commit the transaction at the point.
I've tried adding a backref to the children relationship, but that doesn't seem to make any difference.
I've just run into this problem myself. SQLAlchemy does some internal memoisation to prevent it emitting a new SQL query every time you access a relationship. The problem is that it doesn't seem to realise that updating the foreign key directly could have an effect on the relationship. While SQLAlchemy probably could be patched to deal with this for simple joins, it would be very difficult for complex joins and I presume this is why it behaves the way it does.
When you do session.flush(), you're sending the changes back to the database, but SQLAlchemy doesn't realise it needs to query the database to update the relationship.
If you call session.expire_all() after the flush, then you force SQLAlchemy to reload every model instance and relationship when they're next accessed - this solves the problem.
You can also use session.expire(obj) to do this more selectively or session.refresh(obj) to do it selectively and immediately re-query the database.
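A runnable reproduction of the problem and the fix, using an in-memory SQLite database and simplified versions of the models above:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    father_id = Column(Integer, ForeignKey('parent.id'), nullable=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(children=[Child(), Child(), Child()])
session.add(parent)
session.commit()

# Add a fourth child via session.add + flush, as in the question.
session.add(Child(father_id=parent.id))
session.flush()
before = len(parent.children)  # still 3: the collection is memoised

# Expire the parent so the relationship reloads on next access;
# session.expire_all() or session.refresh(parent) would also work.
session.expire(parent)
after = len(parent.children)   # 4: reloaded from the database
```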
For more information about these methods and how they differ, I found a helpful blog post: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
Official docs: https://docs.sqlalchemy.org/en/13/orm/session_api.html

How to do two-level many-to-many relationships?

I'm working in Flask with Flask-SQLAlchemy, trying to set up a two-level many-to-many relationship: clients can have multiple orders and orders can also have multiple clients (many-to-many); each order in turn contains a list of unique items (one-to-many).
I followed the SQLAlchemy documentation to set up an association table for the many-to-many relationship, and used a normal relationship/foreign key for the one-to-many relationship; all references are set to lazy='dynamic'.
association_table = Table('association', Base.metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id'))
)
What is an efficient way to retrieve all items associated with a client? I'm assuming [item for order in client.orders for item in order.items.all()] will work (less the problem above), but is there a more efficient way to do it? What if the results need to be ordered?
Update
I now know two ways to retrieve the items for the orders of a client by the following two methods (the first one from Audrius' answer):
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
and
Item.query.filter(Item.order_id.in_(client.order_ids)).order_by(Item.timestamp)
I believe they both provide the same result, but which one should be preferred for efficiency?
When writing queries directly in SQL, you would use joins to retrieve the data you want efficiently (as Rachcha demonstrated in his answer). The same applies to SQLAlchemy. Refer to the SA docs on join() for more examples.
If your model is defined like the following (using Flask-SQLAlchemy, since you tagged your question with its tag):
clients_orders = db.Table('clients_orders',
    db.Column('client_id', db.Integer, db.ForeignKey('client.id'),
              primary_key=True),
    db.Column('order_id', db.Integer, db.ForeignKey('order.id'),
              primary_key=True)
)

class Client(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    orders = db.relationship('Order', secondary=clients_orders,
                             backref='clients')
    # Define other columns...

class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Define other columns...

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'), nullable=False)
    order = db.relationship('Order', backref='items')
    timestamp = db.Column(db.DateTime)
    # Define other columns...
Then your query could look like this:
db.session.query(Item).\
    join(Order).join(Order.clients).\
    filter(Client.id == 42).\
    order_by(Item.timestamp)
