Order rows in a many-to-many relationship by count, but fast

I have a many-to-many relationship between the Image and Tag tables in my project:
```python
tags2images = db.Table(
    "tags2images",
    db.Column("tag_id", db.Integer, db.ForeignKey("tags.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True),
    db.Column("image_id", db.Integer, db.ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True)
)


class Image(db.Model):
    __tablename__ = "images"
    id = db.Column(db.Integer, primary_key=True, autoincrement=False)
    title = db.Column(db.String(1000), nullable=True)
    tags = db.relationship("Tag", secondary=tags2images, back_populates="images", passive_deletes=True)


class Tag(db.Model):
    __tablename__ = "tags"
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String(250), nullable=False, unique=True)
    images = db.relationship(
        "Image",
        secondary=tags2images,
        back_populates="tags",
        passive_deletes=True
    )
```
and I'd like to get a list of tags, ordered by how many images each one is used in. My images and tags tables contain ~200,000 and ~1,000,000 rows respectively, so there's a decent amount of data.
After a bit of messing around, I arrived at this monstrosity:
```python
db.session.query(Tag, func.count(tags2images.c.tag_id).label("total"))\
    .join(tags2images)\
    .group_by(Tag)\
    .order_by(text("total DESC"))\
    .limit(20).all()
```
and while it does return a list of (Tag, count) tuples the way I want it to, it takes several seconds, which is not optimal.
I found this very helpful post (Counting relationships in SQLAlchemy) that helped me simplify the above to just
```python
db.session.query(Tag.name, func.count(Tag.id))\
    .join(Tag.images)\
    .group_by(Tag.id)\
    .limit(20).all()
```
and while this is wicked fast compared to my first attempt, the output obviously isn't sorted anymore. How can I get SQLAlchemy to produce the desired result while keeping the query fast?

This is probably something to investigate with EXPLAIN (or EXPLAIN ANALYZE) in psql. I added a composite index on tag_id and image_id via Index('idx_tags2images', 'tag_id', 'image_id'), though I'm not sure whether individual indexes or a combined one is better. Beyond that, check whether grouping and limiting just the association table in a subquery, before joining to tags, is faster:
```python
from sqlalchemy import Column, ForeignKey, Index, Integer, String, Table, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

tags2images = Table(
    "tags2images",
    Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id", ondelete="CASCADE", onupdate="CASCADE"), index=True),
    Column("image_id", Integer, ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"), index=True),
    Index("idx_tags2images", "tag_id", "image_id"),
)


class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    title = Column(String(1000), nullable=True)
    tags = relationship("Tag", secondary=tags2images, back_populates="images", passive_deletes=True)


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(250), nullable=False, unique=True)
    images = relationship(
        "Image",
        secondary=tags2images,
        back_populates="tags",
        passive_deletes=True
    )


# `engine` is assumed to be created elsewhere, e.g. with create_engine(...)
with Session(engine) as session:
    total = func.count(tags2images.c.image_id).label("total")
    # Count, group and order just the association table itself.
    sub = select(
        tags2images.c.tag_id,
        total
    ).group_by(
        tags2images.c.tag_id
    ).order_by(
        total.desc()
    ).limit(20).subquery("sub")
    # Now bring in the Tag names with a join;
    # we order again, but this time only across 20 entries.
    # NOTE: the subquery will not return tags with image_count == 0,
    # since we use an INNER join.
    q = session.query(
        Tag,
        sub.c.total
    ).join(
        sub,
        Tag.id == sub.c.tag_id
    ).order_by(sub.c.total.desc())
    for tag, image_count in q.all():
        print(tag.name, image_count)
```
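To try the subquery approach without a PostgreSQL server, here is a minimal self-contained sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later, trims the models down to the columns the query needs, and uses a hypothetical helper `top_tags()` with made-up sample data; the secondary `ORDER BY` on `Tag.name` is an added tie-breaker, not part of the answer. It also prints the PostgreSQL rendering of the statement, so it can be pasted into psql behind `EXPLAIN ANALYZE`.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

tags2images = Table(
    "tags2images",
    Base.metadata,
    Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
    Column("image_id", Integer, ForeignKey("images.id"), primary_key=True),
)


class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary=tags2images, back_populates="images")


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False, unique=True)
    images = relationship("Image", secondary=tags2images, back_populates="tags")


def top_tags(session, n):
    """Hypothetical helper: return (statement, [(tag_name, image_count), ...])."""
    total = func.count(tags2images.c.image_id).label("total")
    # Aggregate, order and limit the narrow association table first...
    sub = (
        select(tags2images.c.tag_id, total)
        .group_by(tags2images.c.tag_id)
        .order_by(total.desc())
        .limit(n)
        .subquery("sub")
    )
    # ...then join only the n surviving rows to tags to fetch the names.
    stmt = (
        select(Tag.name, sub.c.total)
        .join(sub, Tag.id == sub.c.tag_id)
        .order_by(sub.c.total.desc(), Tag.name)
    )
    return stmt, session.execute(stmt).all()


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up data: "cat" is on 3 images, "dog" on 2, "sky" on 1.
    cat, dog, sky = Tag(name="cat"), Tag(name="dog"), Tag(name="sky")
    session.add_all([
        Image(id=1, tags=[cat, dog]),
        Image(id=2, tags=[cat]),
        Image(id=3, tags=[cat, sky]),
        Image(id=4, tags=[dog]),
    ])
    session.commit()

    stmt, rows = top_tags(session, 2)
    for name, count in rows:
        print(name, count)

# SQL for PostgreSQL, e.g. to run under EXPLAIN ANALYZE in psql.
print(stmt.compile(dialect=postgresql.dialect()))
```

The intent of this shape is that the `GROUP BY`/`ORDER BY`/`LIMIT` work happens on the association table alone, where the composite index may help, and only the few surviving rows are joined to `tags`. Whether that actually beats the plain join on the real data is what `EXPLAIN ANALYZE` has to confirm.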

Related

SQLAlchemy query returning count of ALL join rows, not grouping by joined row

I'm building a CRUD application and trying to display a list of post "tags", with a number next to each of how many posts have used that tag, and ordered by the number of posts. I have one table for posts, one for tags, and one join table called posts_tags. When I execute the query I think should do the trick, it displays the count of all rows of the posts_tags table instead of just the count of rows associated with each tag. In the image below, the "test" tag has been used on 3 posts and "test 2" on 1 (which are the numbers that should show up next to them), but as you can see I get 4 instead:
[Image: display of incorrect post counts for tags]
My tags table has a relationship with the posts_tags table, allowing me to use "Tag.tagged_post_ids" in the query:
```python
class Tag(db.Model):
    """ Model for tags table """
    __tablename__ = "tags"
    id = db.Column(
        db.Integer,
        primary_key=True,
        autoincrement=True
    )
    tag = db.Column(
        db.String(30),
        nullable=False,
        unique=True
    )
    description = db.Column(
        db.Text,
        nullable=False
    )
    tagged_post_ids = db.relationship(
        "PostTag"
    )
```
Here's the SQLA query I wrote:
```python
tags = (
    db.session.query(Tag.tag, func.count(Tag.tagged_post_ids).label("count"))
    .group_by(Tag.tag)
    .order_by(func.count(Tag.tagged_post_ids))
    .all()
)
```
I have successfully built the query in SQL:
```sql
SELECT tags.tag, COUNT(posts_tags.post_id)
FROM tags
JOIN posts_tags ON posts_tags.tag_id = tags.id
GROUP BY tags.tag
ORDER BY COUNT(posts_tags.post_id) DESC;
```
My main issue is translating this into SQLAlchemy. My query looks like a 1-to-1 translation of the SQL, but it's not working! Any help would be greatly appreciated.
EDIT: Adding my Post model and PostTag (join) model:
```python
class Post(db.Model):
    """ Model for posts table """
    __tablename__ = "posts"
    id = db.Column(
        db.Integer,
        primary_key=True,
        autoincrement=True
    )
    user_id = db.Column(
        db.Integer,
        db.ForeignKey("users.id")
    )
    title = db.Column(
        db.Text,
        nullable=False
    )
    content = db.Column(
        db.Text
    )
    url = db.Column(
        db.Text
    )
    img_url = db.Column(
        db.Text
    )
    created_at = db.Column(
        db.DateTime,
        nullable=False,
        default=db.func.now()
    )
    score = db.Column(
        db.Integer,
        nullable=False,
        default=0
    )
    tags = db.relationship(
        "Tag",
        secondary="posts_tags",
        backref="posts"
    )
    comments = db.relationship(
        "Comment",
        backref="post"
    )

    @property
    def tag_list(self):
        """ Builds comma separated list of tags for the post. """
        tag_list = []
        for tag in self.tags:
            tag_list.append(tag.tag)
        return tag_list


class PostTag(db.Model):
    """ Model for join table between posts and tags """
    __tablename__ = "posts_tags"
    post_id = db.Column(
        db.Integer,
        db.ForeignKey("posts.id"),
        primary_key=True
    )
    tag_id = db.Column(
        db.Integer,
        db.ForeignKey("tags.id"),
        primary_key=True
    )
```

If you are using backref, you only need to define one side of the relationship. I'm not sure what func.count does when given a relationship; I only ever use it on a column. Here are a couple of options. An outer join is needed to catch tags that have 0 posts; with an inner join, such a tag is simply missing from the result. In the first example I also use func.coalesce to convert NULL to 0.
```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Tag(Base):
    """ Model for tags table """
    __tablename__ = "tags"
    id = Column(
        Integer,
        primary_key=True,
        autoincrement=True
    )
    tag = Column(
        String(30),
        nullable=False,
        unique=True
    )
    # Redundant
    # tagged_post_ids = relationship(
    #     "PostTag"
    # )


class Post(Base):
    """ Model for posts table """
    __tablename__ = "posts"
    id = Column(
        Integer,
        primary_key=True,
        autoincrement=True
    )
    title = Column(
        Text,
        nullable=False
    )
    tags = relationship(
        "Tag",
        secondary="posts_tags",
        backref="posts"
    )

    @property
    def tag_list(self):
        """ Builds comma separated list of tags for the post. """
        tag_list = []
        for tag in self.tags:
            tag_list.append(tag.tag)
        return tag_list


class PostTag(Base):
    """ Model for join table between posts and tags """
    __tablename__ = "posts_tags"
    post_id = Column(
        Integer,
        ForeignKey("posts.id"),
        primary_key=True
    )
    tag_id = Column(
        Integer,
        ForeignKey("tags.id"),
        primary_key=True
    )


# `engine` is assumed to be created elsewhere, e.g. with create_engine(...)
Base.metadata.create_all(engine)
with Session(engine) as session, session.begin():
    # With subquery
    tag_subq = select(
        PostTag.tag_id,
        func.count(PostTag.post_id).label("post_count")
    ).group_by(
        PostTag.tag_id
    ).order_by(
        func.count(PostTag.post_id)
    ).subquery()
    q = session.query(
        Tag.tag,
        func.coalesce(tag_subq.c.post_count, 0)
    ).outerjoin(
        tag_subq,
        Tag.id == tag_subq.c.tag_id
    ).order_by(
        func.coalesce(tag_subq.c.post_count, 0))
    for (tag_name, post_count) in q.all():
        print(tag_name, post_count)

    # With join
    q = session.query(
        Tag.tag,
        func.count(PostTag.post_id).label('post_count')
    ).outerjoin(
        PostTag,
        Tag.id == PostTag.tag_id
    ).group_by(
        Tag.id
    ).order_by(
        func.count(PostTag.post_id))
    for (tag_name, post_count) in q.all():
        print(tag_name, post_count)
```
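A self-contained check of the outer-join version, assuming SQLAlchemy 1.4+ and an in-memory SQLite database with made-up data (the "unused" tag is hypothetical). It orders with `.desc()` to match the `ORDER BY ... DESC` in the question's SQL, and shows that a tag with no posts comes back with a count of 0 instead of disappearing. `COUNT(posts_tags.post_id)` ignores the NULLs produced by the outer join, so no `coalesce` is needed in this variant.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    tag = Column(String(30), nullable=False, unique=True)


class Post(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary="posts_tags", backref="posts")


class PostTag(Base):
    __tablename__ = "posts_tags"
    post_id = Column(Integer, ForeignKey("posts.id"), primary_key=True)
    tag_id = Column(Integer, ForeignKey("tags.id"), primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    test, test2, unused = Tag(tag="test"), Tag(tag="test 2"), Tag(tag="unused")
    session.add_all([
        Post(tags=[test, test2]),
        Post(tags=[test]),
        Post(tags=[test]),
        unused,  # a tag attached to no post at all
    ])
    session.commit()

    post_count = func.count(PostTag.post_id).label("post_count")
    stmt = (
        select(Tag.tag, post_count)
        # LEFT OUTER JOIN keeps tags that have no posts_tags rows.
        .outerjoin(PostTag, Tag.id == PostTag.tag_id)
        .group_by(Tag.id, Tag.tag)
        .order_by(post_count.desc(), Tag.tag)
    )
    rows = [tuple(r) for r in session.execute(stmt)]
    for tag_name, count in rows:
        print(tag_name, count)
```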

SQLAlchemy filter with many-to-many relations

A store can have many interests (tags). Users request products, and each product request is tagged. I need a query that returns the product requests sharing at least one tag with the current store.
```python
# in Store -> relationship('Tag', secondary=store_interest_tags, lazy='dynamic', backref=backref('store', lazy=True))
store_tags = store.interests
matched_requests_to_store = []
for tag in store_tags:
    r = session.query(ProductRequest).filter(ProductRequest.product_tags.contains(tag)).all()
    matched_requests_to_store.extend(r)
```
I am sure there is a more efficient way to write this query. I have tried the following:
```python
session.query(ProductRequest).filter(ProductRequest.product_tags.any(store_tags)).all()
```
But got:
```
psycopg2.errors.SyntaxError: subquery must return only one column
LINE 5: ..._id AND tag.id = product_requests_tags.tag_id AND (SELECT ta...
```
Any idea how to write such a query?

A query like this might work. I think it could be done with fewer joins, but this is less rigid than dropping down to the secondary tables and specifying each join yourself:
```python
q = session.query(
    ProductRequest
).join(
    ProductRequest.tags
).join(
    Tag.stores
).filter(
    Store.id == store.id)
product_requests_for_store = q.all()
```
With a schema like this:
```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import backref, declarative_base, relationship

Base = declarative_base()

stores_tags_t = Table(
    "stores_tags",
    Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("store_id", Integer, ForeignKey("stores.id")),
    Column("tag_id", Integer, ForeignKey("tags.id")),
)
product_requests_tags_t = Table(
    "product_request_tags",
    Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("product_request_id", Integer, ForeignKey("product_requests.id")),
    Column("tag_id", Integer, ForeignKey("tags.id")),
)


class Store(Base):
    __tablename__ = "stores"
    id = Column(Integer, primary_key=True)
    name = Column(String(), unique=True, index=True)
    tags = relationship('Tag', secondary=stores_tags_t, backref=backref('stores'))


class ProductRequest(Base):
    __tablename__ = "product_requests"
    id = Column(Integer, primary_key=True)
    name = Column(String(), unique=True, index=True)
    tags = relationship('Tag', secondary=product_requests_tags_t, backref=backref('product_requests'))


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String())
```

This worked:
```python
session.query(ProductRequest).filter(
    ProductRequest.product_tags.any(Tag.id.in_([store_tag.id for store_tag in store_tags]))
).all()
```
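Here is a runnable sketch of that fix against the schema from the answer above (so the relationship is named `tags` rather than the question's `product_tags`), assuming SQLAlchemy 1.4+, in-memory SQLite, and made-up stores, tags and requests. It also shows a nested `any()` as an alternative, which moves the store's tag lookup into the same SQL statement as a correlated EXISTS instead of loading the tags in Python first. Because `any()` compiles to EXISTS, neither query returns a request twice when it shares several tags with the store.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, select
from sqlalchemy.orm import Session, backref, declarative_base, relationship

Base = declarative_base()

stores_tags_t = Table(
    "stores_tags", Base.metadata,
    Column("store_id", Integer, ForeignKey("stores.id"), primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)
product_requests_tags_t = Table(
    "product_request_tags", Base.metadata,
    Column("product_request_id", Integer, ForeignKey("product_requests.id"), primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)


class Store(Base):
    __tablename__ = "stores"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    tags = relationship("Tag", secondary=stores_tags_t, backref=backref("stores"))


class ProductRequest(Base):
    __tablename__ = "product_requests"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    tags = relationship("Tag", secondary=product_requests_tags_t, backref=backref("product_requests"))


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    shoes, hats, food = Tag(name="shoes"), Tag(name="hats"), Tag(name="food")
    store = Store(name="outfitters", tags=[shoes, hats])
    session.add_all([
        store,
        ProductRequest(name="boots", tags=[shoes]),
        ProductRequest(name="cap", tags=[hats, shoes]),  # shares two tags
        ProductRequest(name="pizza", tags=[food]),       # shares none
    ])
    session.commit()

    # Approach from the fix above: collect the store's tag ids, then EXISTS + IN.
    tag_ids = [t.id for t in store.tags]
    via_in = session.execute(
        select(ProductRequest.name)
        .where(ProductRequest.tags.any(Tag.id.in_(tag_ids)))
        .order_by(ProductRequest.name)
    ).scalars().all()

    # Alternative: nested EXISTS, so the store's tags never have to be loaded in Python.
    via_nested_any = session.execute(
        select(ProductRequest.name)
        .where(ProductRequest.tags.any(Tag.stores.any(Store.id == store.id)))
        .order_by(ProductRequest.name)
    ).scalars().all()

    print(via_in, via_nested_any)
```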

How to give unique together in flask?

I am defining a table in Flask like this:
```python
groups = db.Table(
    "types",
    db.Column("one_id", db.Integer, db.ForeignKey("one.id")),
    db.Column("two_id", db.Integer, db.ForeignKey("two.id")),
    UniqueConstraint('one_id', 'two_id', name='uix_1')  # Unique constraint given for unique-together.
)
```
But this is not working.

I think you can refer to an older answer: https://stackoverflow.com/a/10061143/18269348
Here is the code:
```python
from sqlalchemy import Column, ForeignKey, Index, Integer, MetaData, Table, Unicode, UniqueConstraint
from sqlalchemy.orm import declarative_base

meta = MetaData()
Base = declarative_base()

# version1: table definition
mytable = Table('mytable', meta,
    # ...
    Column('customer_id', Integer, ForeignKey('customers.customer_id')),
    Column('location_code', Unicode(10)),
    UniqueConstraint('customer_id', 'location_code', name='uix_1')
)
# or the index, which will ensure uniqueness as well
Index('myindex', mytable.c.customer_id, mytable.c.location_code, unique=True)


# version2: declarative
class Location(Base):
    __tablename__ = 'locations'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.customer_id'),
                         nullable=False)
    location_code = Column(Unicode(10), nullable=False)
    __table_args__ = (UniqueConstraint('customer_id', 'location_code',
                                       name='_customer_location_uc'),
                      )
```
The linked post has a short explanation and a link to the official SQLAlchemy documentation.
Thanks to Van who posted that.
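A minimal runnable check that a table-level `UniqueConstraint` inside a plain `Table(...)` does reject duplicate pairs, using SQLAlchemy Core (1.4+) and in-memory SQLite. The `one` and `two` tables are stand-ins for the tables the question's foreign keys point to; with Flask-SQLAlchemy, the same class is also available as `db.UniqueConstraint`.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table, UniqueConstraint, create_engine, insert
from sqlalchemy.exc import IntegrityError

metadata = MetaData()
one = Table("one", metadata, Column("id", Integer, primary_key=True))
two = Table("two", metadata, Column("id", Integer, primary_key=True))
groups = Table(
    "types",
    metadata,
    Column("one_id", Integer, ForeignKey("one.id")),
    Column("two_id", Integer, ForeignKey("two.id")),
    # Table-level constraint: each (one_id, two_id) pair may appear only once.
    UniqueConstraint("one_id", "two_id", name="uix_1"),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(one), [{"id": 1}])
    conn.execute(insert(two), [{"id": 1}, {"id": 2}])
    # Distinct pairs are accepted.
    conn.execute(insert(groups), [{"one_id": 1, "two_id": 1}, {"one_id": 1, "two_id": 2}])

duplicate_rejected = False
try:
    with engine.begin() as conn:
        conn.execute(insert(groups), {"one_id": 1, "two_id": 1})  # repeats (1, 1)
except IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```

Note that `create_all()` only creates tables that don't exist yet, so a table created before the constraint was added to the model won't have it until the table is recreated or migrated.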

Can't update SQLAlchemy association table extra columns

My models look like this:
```python
class Company(DB_BASE):
    __tablename__ = 'Company'
    id = Column(Integer, primary_key=True, index=True)
    ...
    products = relationship('Product', secondary=Company_Products, backref='Company')


class Product(DB_BASE):
    __tablename__ = 'Product'
    id = Column(Integer, primary_key=True, index=True)
    ...
    companies = relationship('Company', secondary=Company_Products, backref='Product')
```
This is my association table:
```python
Company_Products = Table(
    'Company_Products',
    DB_BASE.metadata,
    Column('id', Integer, primary_key=True),
    Column('company_id', Integer, ForeignKey('Company.id')),
    Column('product_id', Integer, ForeignKey('Product.id')),
    Column('quantity', Integer, default=0),
    Column('price_per_unit', Integer, default=0),
)
```
And this is how I'm querying the association table:
```python
company_product = db.query(Company_Products).filter_by(product_id=id, company_id=user.company_id).first()
company_product.quantity = data.data['quantity']
company_product.price = data.data['price']
```
After creating the many-to-many relationship between a Company and a Product, I would like to modify the relationship extra data, quantity and price_per_unit in this instance. After querying the association object, modifying any attribute yields:
AttributeError: can't set attribute 'quantity'

As a follow-up to my question: the solution that ended up working for me was to make a new model and use it to stand in for the association table.
```python
class Company_Products(DB_BASE):
    __tablename__ = 'Company_Products'
    id = Column(Integer, primary_key=True, index=True)
    ...
    quantity = Column(String)  # 1 - client, 2 - supplier
    price_per_unit = Column(String)
    company_id = Column(Integer, ForeignKey('Company.id'))
    company = relationship('Company', back_populates='products', lazy='select')
    product_id = Column(Integer, ForeignKey('Product.id'))
    product = relationship('Product', back_populates='companies', lazy='select')
```
This is definitely not the best solution; if I come up with something else, or come across something that might work, I will edit this.
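The error in the question most likely comes from querying the bare `Table`: that returns read-only `Row` objects, so assigning to `quantity` raises `AttributeError`. Mapping the association table to a class, as in the follow-up above, is what the SQLAlchemy documentation calls the association object pattern. Below is a minimal runnable sketch of it, assuming SQLAlchemy 1.4+, in-memory SQLite, a plain `declarative_base()` in place of `DB_BASE`, and `Integer` columns for `quantity` and `price_per_unit` as in the original association table; the ids and values are made up.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Company(Base):
    __tablename__ = "Company"
    id = Column(Integer, primary_key=True)
    # One-to-many to the association object instead of secondary=...
    products = relationship("Company_Products", back_populates="company")


class Product(Base):
    __tablename__ = "Product"
    id = Column(Integer, primary_key=True)
    companies = relationship("Company_Products", back_populates="product")


class Company_Products(Base):
    """Association object: a mapped class, so its extra columns are writable."""
    __tablename__ = "Company_Products"
    id = Column(Integer, primary_key=True)
    company_id = Column(Integer, ForeignKey("Company.id"))
    product_id = Column(Integer, ForeignKey("Product.id"))
    quantity = Column(Integer, default=0)
    price_per_unit = Column(Integer, default=0)
    company = relationship("Company", back_populates="products")
    product = relationship("Product", back_populates="companies")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Company_Products(company=Company(id=1), product=Product(id=7)))
    session.commit()

    # Selecting from the bare Table yields read-only Row objects (the original error).
    row = session.execute(select(Company_Products.__table__)).first()
    try:
        row.quantity = 5
        row_is_read_only = False
    except AttributeError:
        row_is_read_only = True

    # Selecting the mapped class yields an ORM object whose attributes can be set.
    link = session.execute(
        select(Company_Products).filter_by(product_id=7, company_id=1)
    ).scalar_one()
    link.quantity = 5
    link.price_per_unit = 120
    session.commit()

    saved_quantity = session.execute(
        select(Company_Products.__table__.c.quantity)
    ).scalar_one()
    print(row_is_read_only, saved_quantity)
```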

Many to many query in sqlalchemy

Here are the tables for my question:
```python
class TemplateExtra(ExtraBase, InsertMixin, TimestampMixin):
    __tablename__ = 'template_extra'
    id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(Text, nullable=False)
    roles = relationship(
        'RecipientRoleExtra',
        secondary='template_to_role',
    )


class RecipientRoleExtra(
    ExtraBase, InsertMixin, TimestampMixin,
    SelectMixin, UpdateMixin,
):
    __tablename__ = 'recipient_role'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(Text, nullable=False)
    description = Column(Text, nullable=False)


class TemplateToRecipientRoleExtra(ExtraBase, InsertMixin, TimestampMixin):
    __tablename__ = 'template_to_role'
    id = Column(Integer, primary_key=True, autoincrement=True)
    template_id = Column(Integer, ForeignKey('template_extra.id'))
    role_id = Column(Integer, ForeignKey('recipient_role.id'))
```
I want to select all templates with prefetched roles in two SQL queries, like the Django ORM does with prefetch_related. Can I do it?
This is my current attempt:
```python
def test_custom():
    # creating engine with echo=True
    s = DBSession()
    for t in s.query(TemplateExtra).join(RecipientRoleExtra, TemplateExtra.roles).all():
        print(f'id = {t.id}')
        for r in t.roles:
            print(f'-- {r.name}')
```
But it generates a separate SELECT for every template to load its roles. Can I make SQLAlchemy do only one query?
The generated queries for roles have no JOIN, just FROM recipient_role, template_to_role WHERE %(param_1)s = template_to_role.template_id AND recipient_role.id = template_to_role.role_id. Is that correct?
Can you help me?

Based on this answer:
flask many to many join as done by prefetch_related from django
Maybe something like this:
```python
templates = TemplateExtra.query.options(db.joinedload(TemplateExtra.roles)).all()
```
Let me know if it works.
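To compare the eager-loading options, here is a runnable sketch with simplified copies of the models (the mixins and `ExtraBase` are replaced by a plain `declarative_base()`, and the link table is a plain `Table`), assuming SQLAlchemy 1.4+, in-memory SQLite, and made-up rows. It counts the SELECT statements sent to the database with a `before_cursor_execute` event listener; `load()` is a hypothetical helper. `selectinload` issues one query for the templates plus one `IN` query for all of their roles, which is the closest match to Django's `prefetch_related`, while `joinedload` fetches everything in a single query with a LEFT OUTER JOIN.

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, Text, create_engine, event, select
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship, selectinload

Base = declarative_base()

template_to_role = Table(
    "template_to_role", Base.metadata,
    Column("id", Integer, primary_key=True),
    Column("template_id", Integer, ForeignKey("template_extra.id")),
    Column("role_id", Integer, ForeignKey("recipient_role.id")),
)


class TemplateExtra(Base):
    __tablename__ = "template_extra"
    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)
    roles = relationship("RecipientRoleExtra", secondary=template_to_role)


class RecipientRoleExtra(Base):
    __tablename__ = "recipient_role"
    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

selects = []  # SELECT statements sent to the database


@event.listens_for(engine, "before_cursor_execute")
def record_selects(conn, cursor, statement, parameters, context, executemany):
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


with Session(engine) as session:
    admin, viewer = RecipientRoleExtra(name="admin"), RecipientRoleExtra(name="viewer")
    session.add_all([
        TemplateExtra(name="a", roles=[admin, viewer]),
        TemplateExtra(name="b", roles=[viewer]),
        TemplateExtra(name="c", roles=[]),
    ])
    session.commit()


def load(option):
    """Load all templates with roles eagerly; return (role names per template, SELECT count)."""
    selects.clear()
    with Session(engine) as session:
        templates = session.execute(
            select(TemplateExtra).options(option(TemplateExtra.roles)).order_by(TemplateExtra.id)
        ).scalars().unique().all()
        # Touching t.roles here emits no further queries: the collections are already loaded.
        names = [sorted(r.name for r in t.roles) for t in templates]
    return names, len(selects)


print(load(selectinload))
print(load(joinedload))
```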
