Sqlalchemy using func with outerjoin on multiple tables - python

I have the following tables in SQLAlchemy:
class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    compare_url = Column(String(200))
    url = Column(String(200))
    postedby = Column(Integer)
    category = Column(String(50))
    title = Column(String(500), nullable=False)
    author = Column(String(500), default="Unspecified")
    content = Column(Text(), default="could not fetch this content you will have to read it externally")
    summary = Column(Text())
    time = Column(TIMESTAMP(), default=now())
    post_type = Column(Text())
    Reads = relationship("Read", backref="Post")
    Reposts = relationship("RePost", backref="Post")
    Votes = relationship("Vote", backref="Post")
class Read(Base):
    __tablename__ = 'reads'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    #post = relationship("Post", backref=backref('Reads', order_by=id))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))
class Vote(Base):
    __tablename__ = 'votes'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))
    user_vote = Column(Boolean(), nullable=False)
I have this query:
posts = session.query(Post, func.count(Read.id).label('total'),func.sum(Vote.user_vote).label('votes'),User.username).outerjoin(Post.Reads).outerjoin(Post.Votes)
I am trying to get the number of votes and the number of times a post has been read. A vote value can be -1 or 1.
The problem is that I am getting the same value for the number of reads and votes on each post.
For example, when my reads table has
id  post_read  time                 user_id
1   7          2012-09-19 09:32:06  1
and the votes table has
id  post_read  time                 user_id  user_vote
1   7          2012-09-19 09:42:27  1        1
2   7          2012-09-19 09:42:27  2        1
I still get two as the value for both votes and reads.

It might look as if you can solve this particular problem by simply replacing func.count(Read.id).label('total') with func.count(func.distinct(Read.id)).label('total'). And in fact this will solve the issue with number of reads.
But if you suddenly get another reader for your post (and end up with 2 readers and 2 voters), then all your votes will also be counted twice.
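The double counting can be reproduced without the ORM. The following is a minimal sketch using Python's built-in sqlite3 module with simplified, assumed versions of the posts, reads, and votes tables (only the columns needed for the joins). It illustrates the SQL behaviour and is not the exact statement SQLAlchemy emits.

```python
import sqlite3

# Simplified, assumed schema: only the columns that matter for the joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER, user_vote INTEGER);
    INSERT INTO posts VALUES (7);
    INSERT INTO reads VALUES (1, 7);
    INSERT INTO votes VALUES (1, 7, 1), (2, 7, 1);
""")

FANOUT_SQL = """
    SELECT COUNT(reads.id), COUNT(DISTINCT reads.id), SUM(votes.user_vote)
    FROM posts
    LEFT OUTER JOIN reads ON reads.post_read = posts.id
    LEFT OUTER JOIN votes ON votes.post_read = posts.id
    GROUP BY posts.id
"""

# One read and two votes: the join yields two rows for post 7.
one_read = conn.execute(FANOUT_SQL).fetchone()

# A second read doubles the joined rows again, so the vote sum is inflated too.
conn.execute("INSERT INTO reads VALUES (2, 7)")
two_reads = conn.execute(FANOUT_SQL).fetchone()

print(one_read)   # (2, 1, 2)
print(two_reads)  # (4, 2, 4)
```

COUNT(DISTINCT ...) repairs the read count in both cases, but SUM has no equivalent fix once the rows are multiplied.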
The best solution to this is simply not to aggregate different items in the same query. You can use subqueries to solve this:
subq_read = (
    session.query(
        Post.id,
        func.count(Read.id).label("total_read"),
    )
    .outerjoin(Post.Reads)
    .group_by(Post.id)  # group by the selected column, which is never NULL
    .subquery()
)
subq_vote = (
    session.query(
        Post.id,
        func.sum(Vote.user_vote).label("total_votes"),
    )
    .outerjoin(Post.Votes)
    .group_by(Post.id)
    .subquery()
)
# Each subquery returns one row per post, so no outer GROUP BY is needed.
posts = (
    session.query(
        Post,
        subq_read.c.total_read,
        subq_vote.c.total_votes,
    )
    .outerjoin(subq_read, subq_read.c.id == Post.id)
    .outerjoin(subq_vote, subq_vote.c.id == Post.id)
)
Note: you have a User.username in your query, but I did not see any join clause in the query. You might want to check this as well.
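To see why aggregating each table in its own subquery avoids the problem, here is a small sketch of the equivalent SQL run through the standard-library sqlite3 module. The schema is a cut-down, assumed version of the models above, and the COALESCE calls are my addition so that posts without reads or votes show 0 instead of NULL.

```python
import sqlite3

# Assumed minimal schema; post 7 has two reads and three votes, post 8 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER, user_vote INTEGER);
    INSERT INTO posts VALUES (7), (8);
    INSERT INTO reads VALUES (1, 7), (2, 7);
    INSERT INTO votes VALUES (1, 7, 1), (2, 7, 1), (3, 7, -1);
""")

# Each derived table is aggregated on its own, so neither join multiplies
# the rows seen by the other aggregate.
rows = conn.execute("""
    SELECT posts.id,
           COALESCE(r.total_read, 0),
           COALESCE(v.total_votes, 0)
    FROM posts
    LEFT OUTER JOIN (
        SELECT post_read, COUNT(id) AS total_read
        FROM reads GROUP BY post_read
    ) AS r ON r.post_read = posts.id
    LEFT OUTER JOIN (
        SELECT post_read, SUM(user_vote) AS total_votes
        FROM votes GROUP BY post_read
    ) AS v ON v.post_read = posts.id
    ORDER BY posts.id
""").fetchall()

print(rows)  # [(7, 2, 1), (8, 0, 0)]
```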

When joining multiple tables, tables that join earlier get their rows repeated for tables that join later in one-to-many relationships (to put it simply). This is why your count is off. In joins like this, you always need to find something distinct to count in the result set... such as the primary keys. I find this preferable to subqueries as it is much faster. In fact, much of the performance tuning I do comes from eliminating subqueries.
Thus, if you filter on the user_vote column to eliminate the records you don't want to count, you can fix your query like this:
from sqlalchemy import distinct, func

posts = (
    session.query(
        Post,
        func.count(distinct(Read.id)).label('total'),
        func.count(distinct(Vote.id)).label('votes'),
        User.username,
    )
    .outerjoin(Post.Reads)
    .outerjoin(Post.Votes)
    # The model class is Vote; Votes is only the relationship name on Post.
    .filter(Vote.user_vote == True)
)
But, you'll probably also want to add a group_by, or another filter, to that as well to get counts per Post, your likely goal.
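One caveat with filtering on a column of an outer-joined table: the WHERE clause is applied after the join, so posts without any matching vote drop out of the result. The sketch below shows this with plain SQL through sqlite3 on an assumed, simplified schema. Moving the condition into the join's ON clause is one possible way to keep such posts; that variation is my suggestion, not part of the answer above.

```python
import sqlite3

# Assumed minimal schema: post 7 has one up-vote and one down-vote (0),
# post 8 has a read but no votes at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER, user_vote INTEGER);
    INSERT INTO posts VALUES (7), (8);
    INSERT INTO reads VALUES (1, 7), (2, 8);
    INSERT INTO votes VALUES (1, 7, 1), (2, 7, 0);
""")

# Filter in WHERE: rows with a NULL vote fail the test, so post 8 vanishes.
where_rows = conn.execute("""
    SELECT posts.id, COUNT(DISTINCT reads.id), COUNT(DISTINCT votes.id)
    FROM posts
    LEFT OUTER JOIN reads ON reads.post_read = posts.id
    LEFT OUTER JOIN votes ON votes.post_read = posts.id
    WHERE votes.user_vote = 1
    GROUP BY posts.id
    ORDER BY posts.id
""").fetchall()

# Filter in ON: unmatched posts are kept and simply count zero votes.
on_rows = conn.execute("""
    SELECT posts.id, COUNT(DISTINCT reads.id), COUNT(DISTINCT votes.id)
    FROM posts
    LEFT OUTER JOIN reads ON reads.post_read = posts.id
    LEFT OUTER JOIN votes ON votes.post_read = posts.id AND votes.user_vote = 1
    GROUP BY posts.id
    ORDER BY posts.id
""").fetchall()

print(where_rows)  # [(7, 1, 1)]
print(on_rows)     # [(7, 1, 1), (8, 1, 0)]
```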

Related

SQLAlchemy query common attributes?

I have an association table of movies and the categories that belong to those movies, and I want to get all the categories that two movies have in common (I just need the id of each category).
So if both movies have the category 'Thriller', which has the category_id 5, I want to get 5. If they have no common categories, it should just return None.
Table looks like:
class MovieCategoryScores(db.Model):
movie_id = db.Column(db.Integer, db.ForeignKey('movie.id'), primary_key=True)
category_id = db.Column(db.Integer, db.ForeignKey('category.id'), primary_key=True)
score = db.Column(db.Integer)
votes = db.Column(db.Integer)
category = relationship("Category", back_populates="movies")
movie = relationship("Movie", back_populates="categories")
I know I can query
categories = MovieCategoryScores.query.filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).all()
to get ALL categories, and I tried putting (MovieCategoryScores.category_id) after the query to only get the id's but that didn't work and I just got a TypeError: 'BaseQuery' object is not callable error.
If I figured out how to just get the ID's I could use something like:
categories.sort()
for index, category_id in enumerate(categories.copy()):
    if categories[index+1] != category_id:
        categories[index].remove()
return categories
to get a list of only the IDs that appear twice, but it feels like there should be a better way to get the IDs of the categories that both movies share, just through a query command.
Either solution will be much appreciated!
You can use having and func.count() > 1 to get the opposite of distinct (group_by is required for having).
from sqlalchemy import func
categories = (
    MovieCategoryScores.query
    .with_entities(MovieCategoryScores.category_id)
    .filter(MovieCategoryScores.movie_id.in_([movie1, movie2]))
    .group_by(MovieCategoryScores.category_id)
    .having(func.count(MovieCategoryScores.category_id) > 1)
    .all()
)
Or if you want to retrieve Category.name you can do the following:
from sqlalchemy import func
# The name column lives on Category, so join to it from the association table.
categories = (
    db.session.query(Category.name)
    .join(MovieCategoryScores, MovieCategoryScores.category_id == Category.id)
    .filter(MovieCategoryScores.movie_id.in_([movie1, movie2]))
    .group_by(Category.id, Category.name)
    .having(func.count(MovieCategoryScores.movie_id) > 1)
    .all()
)
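The GROUP BY / HAVING idea can be tried out in isolation. This is a minimal sketch with the standard-library sqlite3 module; the table name movie_category_scores, the sample rows, and the helper common_category_ids are assumptions made for illustration. It relies on (movie_id, category_id) being unique, as the composite primary key in the question implies, so a count above 1 means both movies have the category.

```python
import sqlite3

# Assumed table name and sample data; only the two key columns are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_category_scores (
        movie_id INTEGER,
        category_id INTEGER,
        PRIMARY KEY (movie_id, category_id)
    );
    INSERT INTO movie_category_scores VALUES
        (1, 5), (1, 6),
        (2, 5), (2, 7),
        (3, 6);
""")


def common_category_ids(movie1, movie2):
    """Return the category ids that both movies share (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT category_id
        FROM movie_category_scores
        WHERE movie_id IN (?, ?)
        GROUP BY category_id
        HAVING COUNT(category_id) > 1
        ORDER BY category_id
        """,
        (movie1, movie2),
    ).fetchall()
    return [category_id for (category_id,) in rows]


print(common_category_ids(1, 2))  # [5]
print(common_category_ids(2, 3))  # []
```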
All right so after a lot of headaches I came up with this:
def get_common_categories(movie1, movie2):
    categories = MovieCategoryScores.query.with_entities(MovieCategoryScores.category_id).filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).all()
    categories.sort()
    common = []
    for index, category in enumerate(categories):
        if index+1 < len(categories) and categories[index+1] == category:
            common.append(category[0])
    return common
Feels stupid and like there should be some way to do it only with a query and filter, but couldn't figure it out so this will have to do for now.

How can I get matching records that are multiple relationships apart with SQLAlchemy?

I have four tables/classes: Group, Bank, Question, and Survey. Surveys have many questions, and each question belongs to a bank, which in turn belongs to a group. I have the group and the survey (I'm in an instance method of Survey and am looping through all the Group instances), and want to know which questions belong to both.
class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    banks = db.relationship('Bank', backref='group')
class Bank(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    group_id = db.Column(db.Integer, db.ForeignKey('group.id'))
    questions = db.relationship('Question', backref='bank', lazy='dynamic')
class Question(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    survey_id = db.Column(db.Integer, db.ForeignKey('survey.id'))
    bank_id = db.Column(db.Integer, db.ForeignKey('bank.id'))
class Survey(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    questions = db.relationship('Question', backref='survey', lazy='dynamic')
I thought of trying something like self.questions.filter(Question.bank.in_(group.banks)) (self being a Survey instance), but got a NotImplementedError. Right now I'm using ugly nested for loops with if conditions, and am trying to clean it up, especially since I anticipate there being a speed issue as the number of surveys and questions increase.
for group in groups:
    for bank in group.banks:
        for question in bank.questions:
            if question in self.questions:
                # do stuff
You can use joins to traverse the relationships and get the information that you want.
This query joins the survey table to the question table on question.survey_id, then joins question to bank and bank to group through their foreign keys.
q = (session.query(Survey.id, Question.id, Group.id)
     .join(Survey.questions)
     .join(Bank, Bank.id == Question.bank_id)
     .join(Group, Group.id == Bank.group_id)
     .filter(Survey.id == self.id)
     .all())
To get information for all surveys in one query, remove the filter clause.
The SQL generated by the query is
SELECT survey.id AS survey_id, question.id AS question_id, "group".id AS group_id
FROM survey
JOIN question ON survey.id = question.survey_id
JOIN bank ON bank.id = question.bank_id
JOIN "group" ON "group".id = bank.group_id
WHERE survey.id = ?
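The generated SQL can be checked against a tiny data set. Below is a sketch using the standard-library sqlite3 module with assumed sample rows; the extra filter on "group".id and the helper question_ids are my additions, to show how the same join chain could answer "which questions of this survey belong to this group".

```python
import sqlite3

# Assumed sample data: survey 100 has question 1 (bank 10, group 1) and
# question 2 (bank 20, group 2); survey 200 has question 3 (bank 10, group 1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "group" (id INTEGER PRIMARY KEY);
    CREATE TABLE bank (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE survey (id INTEGER PRIMARY KEY);
    CREATE TABLE question (id INTEGER PRIMARY KEY, survey_id INTEGER, bank_id INTEGER);
    INSERT INTO "group" VALUES (1), (2);
    INSERT INTO bank VALUES (10, 1), (20, 2);
    INSERT INTO survey VALUES (100), (200);
    INSERT INTO question VALUES (1, 100, 10), (2, 100, 20), (3, 200, 10);
""")


def question_ids(survey_id, group_id):
    """Questions that belong to both the survey and the group (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT question.id
        FROM survey
        JOIN question ON survey.id = question.survey_id
        JOIN bank ON bank.id = question.bank_id
        JOIN "group" ON "group".id = bank.group_id
        WHERE survey.id = ? AND "group".id = ?
        ORDER BY question.id
        """,
        (survey_id, group_id),
    ).fetchall()
    return [qid for (qid,) in rows]


print(question_ids(100, 1))  # [1]
print(question_ids(100, 2))  # [2]
```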

Count related items in a sqlalchemy model using ChoiceType

This is a follow up to a previous question here. I'd like to count the number of offers, in each category, and output them in a format, which I can iterate in Jinja.
new, 3
used, 7
broken, 5
Here's what I've got right now:
class Offer(Base):
    CATEGORIES = [
        (u'new', u'New'),
        (u'used', u'Used'),
        (u'broken', u'Broken')
    ]
    __tablename__ = 'offers'
    id = sa.Column(sa.Integer, primary_key=True)
    summary = sa.Column(sa.Unicode(255))
    category = sa.Column(ChoiceType(CATEGORIES))
Following the previous answer, I tried something like this:
count_categories = db.session.query(
CATEGORIES.value, func.count(Offer.id)).outerjoin(
Offer).group_by(CATEGORIES.key).all()
This obviously doesn't work because CATEGORIES.value is not defined; How can I pass CATEGORIES to this query, to yield the desired result? The "setup" seems fairly common, and is taken straight from the SQLAlchemy-Utils Data types page
Your help is much appreciated (growing white hair already)!
A horrible but working, temporary work-around:
result = []
for category in Offer.CATEGORIES:
    count = db.session.query(func.count(Offer.id)).filter_by(category=category[0]).all()
    result.append((category[0], category[1], count[0][0]))
To count the number of offers for each category you need to do an outer join between categories and offers. Because the categories are not stored in a database table, you currently have to do this in application code, which is not ideal. One option is to create a new categories table and replace the category column in the offers table with a foreign key to it. The following code demonstrates this.
class Offer(Base):
    __tablename__ = 'offers'
    id = sa.Column(sa.Integer, primary_key=True)
    summary = sa.Column(sa.Unicode(255))
    category_id = sa.Column(sa.Integer, sa.ForeignKey("categories.id"))
    category = sa.orm.relationship("Category", back_populates="offers")
class Category(Base):
    __tablename__ = 'categories'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.Unicode(6), unique=True)
    offers = sa.orm.relationship("Offer", back_populates="category")
# populate categories with the same values as your original enumeration
session.add(Category(name="New"))
session.add(Category(name="Used"))
session.add(Category(name="Broken"))
count_categories = session.query(Category.name, func.count(Offer.id)). \
    select_from(Category).outerjoin(Offer).group_by(Category.name).all()
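The reason for starting the outer join from the categories side is that categories without any offers still appear, with a count of zero. A minimal sketch of the underlying SQL, run with the standard-library sqlite3 module on assumed sample data (two new offers, one used, none broken):

```python
import sqlite3

# Assumed sample data for the two-table layout described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE offers (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO categories VALUES (1, 'New'), (2, 'Used'), (3, 'Broken');
    INSERT INTO offers VALUES (1, 1), (2, 1), (3, 2);
""")

# COUNT(offers.id) ignores the NULL produced for categories with no offers,
# so 'Broken' is reported with 0 rather than being left out.
counts = conn.execute("""
    SELECT categories.name, COUNT(offers.id)
    FROM categories
    LEFT OUTER JOIN offers ON offers.category_id = categories.id
    GROUP BY categories.name
    ORDER BY categories.name
""").fetchall()

print(counts)  # [('Broken', 0), ('New', 2), ('Used', 1)]
```

The resulting list of (name, count) tuples can be iterated directly in a template.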

How to update many to many relationship by using id with sqlalchemy?

I know I can simply update many to many relationship like this:
tags = db.Table('tags',
    db.Column('tag_id', db.Integer, db.ForeignKey('tag.id'), primary_key=True),
    db.Column('page_id', db.Integer, db.ForeignKey('page.id'), primary_key=True)
)
class Page(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tags = db.relationship('Tag', secondary=tags, lazy='subquery',
        backref=db.backref('pages', lazy=True))
class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
tag1 = Tag()
tag2 = Tag()
page = Page(tags=[tag1])
and later for updating:
page.tags.append(tag2)
But I want to update them using only the tag ids. Assume I have to write a general function that accepts only the object (here a page) and the ids of its related objects (here tags) and updates it.
What I want is something like this:
page = Page(tags=[1,2]) # 1 and 2 are primary keys of (tag)s
or in a function
def update_with_foreignkey(page, tags=[1,2]):
    # do something to update page without using Tag objects
    return page  # the updated page
It was a little tricky and relies on the evil eval, but I finally found a general way to update many-to-many relationships using foreign keys. My goal was to update my object with data from a PUT request, and in that request I only get the foreign keys for the relationship along with the other data.
Step-by-step solution:
1- Find the relationships in the object. I find them using __mapper__.relationships.
2- Find the key that represents the relationship.
for rel in Object.__mapper__.relationships:
    key = str(rel).rsplit('.',1)[-1]
In the question's case it returns 'tags' as the result.
3- Find the model for the other side of the relationship (in this example, Tag).
3-1 Find the name of the table.
3-2 Convert the table name to CamelCase, because here the table names use underscores and the model names use CamelCase.
3-3 Use eval to get the model.
if key in data:
    table = eval(convert_to_CamelCase(rel.table.name))
    temp = table.query.filter(table.id.in_(data[key])).all() # this line converts ids to SQLAlchemy objects
All together
def convert_to_CamelCase(word):
    return ''.join(x.capitalize() or '_' for x in word.split('_'))
def update_relationship_withForeingkey(Object, data):
    for rel in Object.__mapper__.relationships:
        key = str(rel).rsplit('.',1)[-1]
        if key in data:
            table = eval(convert_to_CamelCase(rel.table.name))
            temp = table.query.filter(table.id.in_(data[key])).all() # this line converts ids to SQLAlchemy objects
            data[key] = temp
    return data
data is what I get from the request, and Object is the SQLAlchemy model that I want to update.
Running these few lines performs the update:
# .first() fetches the instance; without it, item would be a Query object
item = Object.query.filter_by(id=data['id']).first()
data = update_relationship_withForeingkey(Object, data)
for i, j in data.items():
    setattr(item, i, j)
db.session.commit()
I'm not sure about the caveats of this approach, but it works for me. Any improvements and suggestions are welcome.
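If eval is a concern, one alternative is to look the class up in an explicit dictionary instead of evaluating its name. The sketch below is plain Python: Tag and Page are empty stand-in classes rather than the mapped models, and MODEL_REGISTRY and resolve_model are hypothetical names introduced only for this illustration.

```python
def convert_to_camel_case(word):
    """Turn an underscore-separated table name into a CamelCase class name."""
    return "".join(part.capitalize() or "_" for part in word.split("_"))


# Stand-in classes; in a real project these would be the mapped models.
class Tag:
    pass


class Page:
    pass


# Hypothetical registry: class name -> class. Only names listed here can be
# resolved, which is the point of avoiding eval on request-derived strings.
MODEL_REGISTRY = {cls.__name__: cls for cls in (Tag, Page)}


def resolve_model(table_name):
    """Look up the model class for a table name; raises KeyError if unknown."""
    return MODEL_REGISTRY[convert_to_camel_case(table_name)]


print(convert_to_camel_case("movie_category_scores"))  # MovieCategoryScores
print(resolve_model("tag") is Tag)                      # True
```

An unknown table name then fails with a KeyError instead of executing arbitrary code.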

Tracking member tagging of artists

I'm writing a web-based jukebox for my own amusement and as a learning experience.
I'm finally getting around to finishing my models except for one big component: member genre tags. I know I'll have to relate three models and it'll involve using association_proxy and collection classes.
Here's the relevant models (I had an abstract model to handle declaring the id and name fields, but that caused issues that I'll look at later):
class Member(db.Model):
    __tablename__ = 'members'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
    ##tagged_artists is a backref from MemberTaggedArtist
    tags = association_proxy('tagged_artists', 'artist',
        creator=lambda k,v: MemberTaggedArtist(tag=k, artist=v)
    )
class Artist(db.Model):
    __tablename__ = 'artists'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
class Tag(db.Model):
    __tablename__ = 'tags'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
class MemberTaggedArtist(db.Model):
    __tablename__ = 'membertaggedartists'
    id = db.Column('id', db.Integer, primary_key=True)
    member_id = db.Column('member_id', db.Integer, db.ForeignKey('members.id'))
    artist_id = db.Column('artist_id', db.Integer, db.ForeignKey('artists.id'))
    tag_id = db.Column('tag_id', db.Integer, db.ForeignKey('tags.id'))
    member = db.relationship(Member, backref=db.backref(
        'tagged_artists',
        collection_class=attribute_mapped_collection('tag')
        )
    )
    artist = db.relationship(Artist, backref='applied_tags')
    tag = db.relationship(Tag, backref='applied_artists')
What I'd like to happen is this:
>>> member = Member(name='justanr')
>>> artist = Artist(name='Gorguts')
>>> tag = Tag(name='Death Metal')
>>> member.tags['Death Metal'].append('Gorguts')
>>> member.tags
... {'Death Metal':['Gorguts']}
What currently happens is this (note, I built a mixin to handle repr calls):
>>> member.tags
... {Tag (ID:1 Name:Death Metal): MemberTaggedArtist (ID: 1 Member:justanr Artist:Gorguts Tag:Death Metal)}
I haven't been working with association_proxy long enough to understand what I'm doing wrong and even the brief tutorial in the documentation is giving me issues (I'm not sure why and I don't think it's because I'm using Flask-SQLAlchemy).
In short, I'm attempting to build an association proxy to create a dict of lists and I'm completely lost. I'm unsure what values I should proxy along, if using one middle table is over complicating this, and how to construct a secondary (and possibly tertiary) middle table
I finally came up with a solution. I forgot what I knew about SQL...lol I guess? But I recently watched several talks on SQLA that made me realize that I had an 'O' problem: I was focusing specifically on the Object portion of ORM instead of thinking "SQLA just makes writing queries easier, it doesn't do it for me." It also didn't help that FSQLA's query object lulled me into complacency. Don't get me wrong, it's fantastic for basic queries and joins, but for something this complex...not really.
I also didn't realize that my question was actually several questions. One part was "How do I write this query with SQLA's tools?" The other was, "Help me puzzle the correct association_proxy to get this desired outcome."
I'm still not sure about the association proxy portion of it yet, or if I even want to do that anymore. But I finally puzzled the solution to my first question out:
# Query database for tag names and count for a particular artist
artist = models.Artist.find_by_some_means(magic, vars)
q = db.session.query(
    models.Tag.id.label('id'),
    models.Tag.name.label('tag'),
    db.func.count(models.MemberTaggedArtist.member_id).label('count')
)
q = q.join(models.MemberTaggedArtist, models.Tag.id == models.MemberTaggedArtist.tag_id)
q = q.filter(models.MemberTaggedArtist.artist_id == artist.id)
q = q.group_by(models.MemberTaggedArtist.tag_id)
# order by tag count first, then tag name if two tags happen to have the same count
# there's probably a much better way to get there without recalculating the result
q = q.order_by(db.desc(db.func.count(models.MemberTaggedArtist.member_id)))
q = q.order_by(models.Tag.name)
My original planned query was something along the lines of:
SELECT tags.name AS tag, tags.id AS id, count(mta.member_id) AS count
FROM tags, membertaggedartists AS mta
WHERE tags.id = mta.tag_id AND mta.artist_id = :artist_id
GROUP BY id
ORDER BY count DESC, name ASC
And SQLA interprets my request as:
SELECT tags.id AS id, tags.name AS tag, count(membertaggedartists.member_id) AS count
FROM tags
JOIN membertaggedartists ON tags.id = membertaggedartists.tag_id
WHERE membertaggedartists.artist_id = :artist_id_1
GROUP BY membertaggedartists.tag_id
ORDER BY count(membertaggedartists.member_id) DESC, tags.name
Which is remarkably similar (and uses an explicit join unlike mine, which negates the need for one of my original WHERE clauses).
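For anyone who wants to check the ordering behaviour, here is a minimal sketch that runs an equivalent statement through the standard-library sqlite3 module. The sample rows are assumptions, and I group by both tags.id and tags.name (a small change from the generated SQL) so the statement is also valid on databases that reject non-grouped columns in the select list.

```python
import sqlite3

# Assumed sample data: for artist 1, two members applied 'Death Metal' and
# one member each applied 'Technical' and 'Avant-garde'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE membertaggedartists (
        id INTEGER PRIMARY KEY,
        member_id INTEGER,
        artist_id INTEGER,
        tag_id INTEGER
    );
    INSERT INTO tags VALUES (1, 'Death Metal'), (2, 'Technical'), (3, 'Avant-garde');
    INSERT INTO membertaggedartists (member_id, artist_id, tag_id) VALUES
        (1, 1, 1), (2, 1, 1), (1, 1, 2), (3, 1, 3), (1, 2, 2);
""")

# Highest count first; ties are broken alphabetically by tag name.
tag_counts = conn.execute(
    """
    SELECT tags.id, tags.name, COUNT(mta.member_id) AS tag_count
    FROM tags
    JOIN membertaggedartists AS mta ON tags.id = mta.tag_id
    WHERE mta.artist_id = :artist_id
    GROUP BY tags.id, tags.name
    ORDER BY tag_count DESC, tags.name ASC
    """,
    {"artist_id": 1},
).fetchall()

print(tag_counts)
# [(1, 'Death Metal', 2), (3, 'Avant-garde', 1), (2, 'Technical', 1)]
```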
