Querying two tables using SQLAlchemy and PostgreSQL - python

I need help improving my SQLAlchemy query. I'm using Python 3.7, SQLAlchemy 1.3.15 and PostgreSQL 9.4.3 as the database. I'm trying to return the count of appointments for a specific date and time slot. However, my appointments and appointment slot tables are separate, and I have to query both models/tables to get the desired results. Here's what I have:
Appointments Model
The appointment table has a few columns, including a foreign key to the appointment slots table.
class Appointment(ResourceMixin, db.Model):
    __tablename__ = 'appointments'

    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('users.id', onupdate='CASCADE', ondelete='CASCADE'), index=True, nullable=True)
    slot_id = db.Column(db.Integer, db.ForeignKey('appointment_slots.id', onupdate='CASCADE', ondelete='CASCADE'), index=True, nullable=False)
    appointment_date = db.Column(db.DateTime(), nullable=False)
    appointment_type = db.Column(db.String(128), nullable=False, default='general')
Appointment Slots Table
The appointment slots table contains the time slots for the appointments. The model consists of a relationship back to the appointments table.
class AppointmentSlot(ResourceMixin, db.Model):
    __tablename__ = 'appointment_slots'

    id = db.Column(db.Integer, primary_key=True)

    # Relationships.
    appointments = db.relationship('Appointment', uselist=False,
                                   backref='appointments', lazy='joined',
                                   passive_deletes=True)

    start_time = db.Column(db.String(5), nullable=False, server_default='08:00')
    end_time = db.Column(db.String(5), nullable=False, server_default='17:00')
SQLAlchemy Query
Currently I'm running the following SQLAlchemy query to get the appointment count for a specific date and time slot:
appointment_count = db.session.query(func.count(Appointment.id)) \
    .join(AppointmentSlot) \
    .filter(and_(Appointment.appointment_date == date,
                 AppointmentSlot.id == Appointment.slot_id,
                 AppointmentSlot.start_time == time)) \
    .scalar()
The query above returns the correct result, a single-digit value, but I'm worried that the query is not optimized. Currently it returns in 380 ms, but there are only 8 records in the appointments and appointment_slots tables. These tables will eventually hold hundreds of thousands of records, so I'm worried that even though the query works now, it will struggle as the record count grows.
How can I improve or optimize this query for better performance? I was looking at SQLAlchemy subqueries using the appointments relationship on the appointment_slots table, but I was unable to get it working and confirm the performance. I'm thinking there must be a better way to run this query, especially using the appointments relationship on the appointment_slots table, but I'm currently stumped. Any suggestions?

I was incorrect about the query load time; I was actually looking at the page load, which was 380 ms. I also changed some fields on the models, removing slot_id from the appointments model and adding an appointment_id foreign key to the appointment_slots model. The page load for the following query:
appointment_count = db.session.query(func.count(Appointment.id)) \
    .join(AppointmentSlot) \
    .filter(and_(Appointment.appointment_date == date,
                 AppointmentSlot.appointment_id == Appointment.id,
                 AppointmentSlot.start_time == time)) \
    .scalar()
ended up being 0.4637 ms.
However, I was still able to improve the query by using a SQLAlchemy subquery. The following:
subquery = db.session.query(Appointment.id) \
    .filter(Appointment.appointment_date == date).subquery()
query = db.session.query(func.count(AppointmentSlot.id)) \
    .filter(and_(AppointmentSlot.appointment_id.in_(subquery),
                 AppointmentSlot.start_time == time)).scalar()
returned a load time of 0.3700 ms, which shows much better performance than the join query.
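Besides IN plus a subquery, an EXISTS predicate is also worth benchmarking: the database can stop probing appointments as soon as one match is found per slot. Below is a minimal runnable sketch, assuming stand-in models (plain SQLAlchemy 1.4+ with an in-memory SQLite database instead of Flask-SQLAlchemy and PostgreSQL, columns trimmed to the ones the query touches):

```python
from datetime import datetime

from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String,
                        create_engine, func)
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Simplified stand-ins for the question's models (illustrative only).
class Appointment(Base):
    __tablename__ = 'appointments'
    id = Column(Integer, primary_key=True)
    appointment_date = Column(DateTime, nullable=False)

class AppointmentSlot(Base):
    __tablename__ = 'appointment_slots'
    id = Column(Integer, primary_key=True)
    appointment_id = Column(Integer, ForeignKey('appointments.id'), index=True)
    start_time = Column(String(5), nullable=False)

engine = create_engine('sqlite://')  # in-memory stand-in database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def count_slots(session, date, time):
    # Correlated EXISTS: true for a slot as soon as one appointment
    # on `date` points at it.
    has_appointment = (
        session.query(Appointment.id)
        .filter(Appointment.id == AppointmentSlot.appointment_id,
                Appointment.appointment_date == date)
        .exists()
    )
    return (session.query(func.count(AppointmentSlot.id))
            .filter(has_appointment, AppointmentSlot.start_time == time)
            .scalar())
```

Whether EXISTS actually beats IN here depends on the planner and the indexes; on PostgreSQL the two often produce the same plan, so measuring with EXPLAIN ANALYZE on realistic data volumes is the only reliable comparison.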

Related

Flask-SQLAlchemy - get the last quote from a user's followed jobs

My app model is structured like so:
user_jobs = db.Table('user_jobs',
    db.Column('user_id', db.Integer, db.ForeignKey('user.id')),
    db.Column('job_id', db.Integer, db.ForeignKey('market.id'))
)

class User(UserMixin, db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), index=True, unique=True)
    # Other user model fields....
    jobs = db.relationship('Job', secondary=user_jobs, backref='users')

class Job(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Other related fields and relationships
    quotes = db.relationship('Quote', backref='job', lazy='dynamic')

class Quote(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(db.DateTime, index=True, default=datetime.utcnow)
    price = db.Column(db.Integer())
    # Other related fields
    job_id = db.Column(db.Integer, db.ForeignKey('job.id'))
This model allows users to follow multiple jobs while jobs can have multiple followed users (Many to Many). A job can have multiple Quotes (One to Many).
In my Flask app, I am creating a dashboard that displays the user's followed jobs. For each followed job on the dashboard, I want to display the most recent Quote price and timestamp.
My current thinking is to create a function on the User model that returns a joined table of User - Job - Quote, ordering by timestamp descending with limit(1). However, I am stuck on how to do this.
class User(UserMixin, db.Model):
    .....
    def get_followed_jobs(self):
        return ...
Any help would be greatly appreciated.
EDIT:
Given there is a list of users and I'm trying to find the latest quotes for the jobs user 1 is following, the raw SQL appears to be:
SELECT *
FROM (
    SELECT
        job.id, job.job_name, latest_quote.timestamp,
        latest_quote.price, user_job.user_id
    FROM (
        SELECT job_id, MAX(timestamp) AS timestamp, price
        FROM quote
        GROUP BY job_id
    ) AS latest_quote
    JOIN job ON job.id = latest_quote.job_id
    JOIN user_job ON user_job.job_id = latest_quote.job_id
) AS aquery
WHERE user_id = 1;
Can this be made more efficient in SQL?
The answer below might be helpful for getting the required data in a many-to-many relationship.
SqlAlchemy and Flask, how to query many-to-many relationship
If you require the data in a serialisable format for a many-to-many relationship, which is your use case, I would suggest using nested schemas in marshmallow:
Flask Marshmallow/SqlAlchemy: Serializing many-to-many relationships
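One way to write get_followed_jobs in SQLAlchemy is the classic greatest-per-group pattern: a subquery picks the newest timestamp per job, then the Quote table is joined back on (job_id, timestamp) so the price comes from that same row. (Note that the raw SQL above selects price alongside max(timestamp) without aggregating it, which PostgreSQL rejects and MySQL resolves arbitrarily; joining back on the max timestamp avoids that.) A self-contained sketch, assuming trimmed-down stand-ins for the question's models and an in-memory SQLite database:

```python
from datetime import datetime

from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String, Table,
                        create_engine, func)
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

user_jobs = Table(
    'user_jobs', Base.metadata,
    Column('user_id', Integer, ForeignKey('user.id')),
    Column('job_id', Integer, ForeignKey('job.id')))

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    jobs = relationship('Job', secondary=user_jobs, backref='users')

class Job(Base):
    __tablename__ = 'job'
    id = Column(Integer, primary_key=True)
    job_name = Column(String(64))

class Quote(Base):
    __tablename__ = 'quote'
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime, default=datetime.utcnow)
    price = Column(Integer)
    job_id = Column(Integer, ForeignKey('job.id'))

def followed_jobs_with_latest_quote(session, user_id):
    # Newest quote timestamp per job.
    latest = (session.query(Quote.job_id,
                            func.max(Quote.timestamp).label('ts'))
              .group_by(Quote.job_id)
              .subquery())
    # Join the full Quote row back on (job_id, timestamp) so the price
    # belongs to the same row as the max timestamp.
    return (session.query(Job, Quote)
            .join(latest, latest.c.job_id == Job.id)
            .join(Quote, (Quote.job_id == latest.c.job_id) &
                         (Quote.timestamp == latest.c.ts))
            .join(user_jobs, user_jobs.c.job_id == Job.id)
            .filter(user_jobs.c.user_id == user_id)
            .all())

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
```

Each result row is a (Job, Quote) pair, which maps directly onto the dashboard's "job plus its most recent quote" display.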

SQLAlchemy Core, retrieve the id(s) of the updated row(s)

I am trying to learn how to use SQLAlchemy Core properly, and currently I have this query:
up = Airport.__table__.update().where(Airport.__table__.c.iata_code == iata_code).values(city=city)
I am using it to update values in a table that has this structure:
class Airport(Base):
    __tablename__ = 'airports'

    id = Column(Integer, primary_key=True)
    iata_code = Column(String(64), index=True, nullable=False)
    city = Column(String(256), nullable=False)
The problem is that after the execution of the update procedure I need the ids of the updated rows.
Is it possible to update the values and obtain the ids in only one query? I would like to avoid having to perform two queries for this operation.
The DBMS I am using is MySQL.
Disclaimer: This is for SQLAlchemy ORM, not Core
Get the object, and update it. SQLAlchemy will update the instance's ID in the same DB round trip.
airport = session.query(Airport).filter_by(iata_code=iata_code).first()
airport.city = city
session.commit()
print(airport.id)
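For completeness: MySQL has no UPDATE ... RETURNING, so on MySQL the ORM round trip above is about as good as it gets. On a backend that does support RETURNING, such as PostgreSQL, Core can fetch the updated ids in the same statement. A sketch against the table from the question ('LHR' and 'London' are placeholder values; here the statement is only rendered, not executed):

```python
from sqlalchemy import Column, Integer, String, update
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Airport(Base):
    __tablename__ = 'airports'
    id = Column(Integer, primary_key=True)
    iata_code = Column(String(64), index=True, nullable=False)
    city = Column(String(256), nullable=False)

# UPDATE ... RETURNING id: one round trip on PostgreSQL.
stmt = (update(Airport.__table__)
        .where(Airport.__table__.c.iata_code == 'LHR')
        .values(city='London')
        .returning(Airport.__table__.c.id))

# Executing against a PostgreSQL engine would yield the updated ids,
# e.g. ids = [row.id for row in conn.execute(stmt)].
sql = str(stmt.compile(dialect=postgresql.dialect()))
```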

SQLAlchemy bulk create if not exists

I am trying to optimize my code by reducing the calls to the Database. I have the following models:
class PageCategory(Base):
    __tablename__ = 'page_category'

    category_id = Column(Text, ForeignKey('category.category_id'), primary_key=True)
    page_id = Column(Text, ForeignKey('page.page_id'), primary_key=True)

class Category(Base):
    __tablename__ = 'category'

    category_id = Column(Text, primary_key=True)
    name = Column(Text, nullable=False)
    pages = relationship('Page', secondary='page_category')

class Page(Base):
    __tablename__ = 'page'

    page_id = Column(Text, primary_key=True)
    name = Column(Text, nullable=False)
    categories = relationship('Category', secondary='page_category')
The code receives a stream of Facebook likes, and each one comes with a Page, a Category, and the obvious relation between them, a PageCategory. I need to find a way to bulk create, if they don't already exist, the different Pages, Categories and the relations between them. Given that the code needs to be fast, I can't afford a round trip to the database when creating every object.
page = Page(page_id='1', name='1')
category = Category(category_id='2', name='2')
session.add(page)
session.add(category)
session.commit()
# ...same for PageCategory
Now, given that page_id and category_id are primary keys, the database will raise an IntegrityError if we try to insert duplicates, but that is still a round-trip dance. I would need a utility that receives a list of objects, say session.bulk_save_objects([page1, page2, category1, category2, page_category1, page_category2]), but only creates the objects that do not raise an IntegrityError and ignores the ones that do.
This way I would avoid database IO for every triple of objects. I don't know if this is possible or if it exceeds SQLAlchemy's capabilities.
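If the backend is PostgreSQL (an assumption; the question doesn't name one), INSERT ... ON CONFLICT DO NOTHING does exactly this: one bulk statement that silently skips rows whose key already exists, with no IntegrityError and no per-row round trip. A sketch for the page table, using SQLAlchemy's PostgreSQL dialect insert (the same pattern applies to category and page_category; here the statement is only rendered, not executed):

```python
from sqlalchemy import Column, MetaData, Table, Text
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
page = Table(
    'page', metadata,
    Column('page_id', Text, primary_key=True),
    Column('name', Text, nullable=False))

rows = [{'page_id': '1', 'name': 'a'},
        {'page_id': '2', 'name': 'b'}]

# Duplicate page_ids are dropped by the database, not by Python.
stmt = (insert(page)
        .values(rows)
        .on_conflict_do_nothing(index_elements=['page_id']))

# session.execute(stmt) against a PostgreSQL engine would run it.
sql = str(stmt.compile(dialect=postgresql.dialect()))
```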

SQLAlchemy Join to retrieve data from multiple tables

I'm trying to retrieve data from multiple tables with SQLAlchemy using the .join() method.
When I run the query I was expecting to get a single object back with all the data from the different tables joined, so that I could use a.area_name and so on, where area_name is on one of the joined tables. Below are the query I am running and the table layout; if anyone could offer insight into how to achieve the behavior I'm aiming for, I would greatly appreciate it! I've been able to use the .join() method with this same syntax to match results and return them, and I figured it would return the extra data from the rows as well since it joins the tables (perhaps I'm misunderstanding how the method works or how to retrieve the information via the query object?).
If it helps with the troubleshooting, I'm using MySQL as the database.
query:
a = User.query.filter(User.user_id == 1) \
    .join(UserGroups, User.usergroup == UserGroups.group_id) \
    .join(Areas, User.area == Areas.area_id).first()
and the tables:
class User(db.Model):
    user_id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(20), unique=True)
    usergroup = db.Column(db.Integer, db.ForeignKey('user_group.group_id'), nullable=False)
    area = db.Column(db.Integer, db.ForeignKey('areas.area_id'), nullable=False)

class UserGroups(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    group_id = db.Column(db.Integer, nullable=False, unique=True)
    group_name = db.Column(db.String(64), nullable=False, unique=True)

class Areas(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    area_id = db.Column(db.Integer, nullable=False, unique=True)
    area_name = db.Column(db.String(64), nullable=False, unique=True)
So it seems that I needed a different approach to the query: it returns a tuple of objects, which I then need to parse.
What worked is:
a = db.session.query(User, UserGroups, Areas) \
    .filter(User.user_id == 1) \
    .join(UserGroups, User.usergroup == UserGroups.group_id) \
    .join(Areas, User.area == Areas.area_id) \
    .first()
The rest remains the same. This returns a tuple that I can parse, where the data from User is a[0], from UserGroups is a[1], and from Areas is a[2]. I can then access the group_name column with a[1].group_name, etc.
Hopefully this helps someone else who's trying to work with this!
Take a look at SQLAlchemy's relationship function:
http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#one-to-many
You may want to add a new attribute to your User class like so:
group = db.relationship('UserGroups', back_populates='users')
This will automagically resolve the many-to-one relationship between User and UserGroups (assuming that a User can only be a member of one UserGroup at a time). You can then simply access the attributes of the UserGroup once you have queried a User (or set of Users) from your database:
a = User.query.filter(...).first()
print(a.group.group_name)
SQLAlchemy resolves the joins for you, you do not need to explicitly join the foreign tables when querying.
The reverse access is also possible; if you just query for a UserGroup, you can access the corresponding members directly (via the back_populates keyword argument):
g = UserGroups.query.filter(...).first()
for u in g.users:
    print(u.name)

How to create a field with a list of foreign keys in SQLAlchemy?

I am trying to store a list of models within the field of another model. Here is a trivial example below, where I have an existing model, Actor, and I want to create a new model, Movie, with the field Movie.list_of_actors:
import uuid
from sqlalchemy import Boolean, Column, Integer, String, DateTime
from sqlalchemy.schema import ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.dialects.postgresql import UUID

Base = declarative_base()

class Actor(Base):
    __tablename__ = 'actors'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    nickname = Column(String)
    academy_awards = Column(Integer)
academy_awards = Column(Integer)
# This is my new model:
class Movie(Base):
    __tablename__ = 'movies'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)
    # How do I make this a list of foreign keys???
    list_of_actors = Column(UUID(as_uuid=True), ForeignKey('actors.id'))
I understand that this can be done with a many-to-many relationship, but is there a simpler solution? Note that I don't need to look up which Movies an Actor is in - I just want to create a new Movie model and access the list of its Actors. And ideally, I would prefer not to add any new fields to my Actor model.
I've gone through the tutorials using the relationships API, which outline the various one-to-many/many-to-many combinations using back_populates and backref here: http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html But I can't seem to implement my list of foreign keys without creating a full-blown many-to-many implementation.
But if a many-to-many implementation is the only way to proceed, is there a way to implement it without having to create an "association table"? The association table is described here: http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#many-to-many Either way, an example would be very helpful!
Also, if it matters, I am using Postgres 9.5. I see from this post there might be support for arrays in Postgres, so any thoughts on that could be helpful.
Update
It looks like the only reasonable approach here is to create an association table, as shown in the selected answer below. I tried using ARRAY from SQLAlchemy's PostgreSQL dialect, but it doesn't seem to support foreign keys. In my example above, I used the following column:
list_of_actors = Column('actors', postgresql.ARRAY(ForeignKey('actors.id')))
but it gives me an error. It seems support for the PostgreSQL ARRAY type with foreign keys is in progress, but still isn't quite there. Here is the most up-to-date source of information that I found: http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/
If you want many actors to be associated to a movie, and many movies be associated to an actor, you want a many-to-many. This means you need an association table. Otherwise, you could chuck away normalisation and use a NoSQL database.
An association table solution might resemble:
class Actor(Base):
    __tablename__ = 'actors'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    nickname = Column(String)
    academy_awards = Column(Integer)

class Movie(Base):
    __tablename__ = 'movies'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)
    actors = relationship('ActorMovie', uselist=True, backref='movies')

class ActorMovie(Base):
    __tablename__ = 'actor_movies'

    actor_id = Column(UUID(as_uuid=True), ForeignKey('actors.id'), primary_key=True)
    movie_id = Column(UUID(as_uuid=True), ForeignKey('movies.id'), primary_key=True)
If you don't want ActorMovie to be an object inheriting from Base, you could use sqlalchemy.schema.Table.
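That Table variant might look like the sketch below. It wires the many-to-many up through a bare association table, so movie.actors is a plain list of Actor objects rather than a list of ActorMovie rows (the UUID columns assume the PostgreSQL dialect, as in the question; the models are trimmed for brevity):

```python
import uuid

from sqlalchemy import Column, ForeignKey, String, Table
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Bare association table: no mapped class, just the two foreign keys.
actor_movies = Table(
    'actor_movies', Base.metadata,
    Column('actor_id', UUID(as_uuid=True), ForeignKey('actors.id'),
           primary_key=True),
    Column('movie_id', UUID(as_uuid=True), ForeignKey('movies.id'),
           primary_key=True))

class Actor(Base):
    __tablename__ = 'actors'
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)

class Movie(Base):
    __tablename__ = 'movies'
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String)
    # movie.actors is a list of Actor objects; actor.movies comes
    # for free via the backref.
    actors = relationship('Actor', secondary=actor_movies, backref='movies')
```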
