I have an association table of movies and the categories that belong to those movies, and I want to get all the categories that two movies have in common (I just need the id of each category).
So if both movies have the category 'Thriller', which has a category_id of 5, I want to get 5. If they have no common categories, it should just return None.
Table looks like:
class MovieCategoryScores(db.Model):
    movie_id = db.Column(db.Integer, db.ForeignKey('movie.id'), primary_key=True)
    category_id = db.Column(db.Integer, db.ForeignKey('category.id'), primary_key=True)
    score = db.Column(db.Integer)
    votes = db.Column(db.Integer)
    category = relationship("Category", back_populates="movies")
    movie = relationship("Movie", back_populates="categories")
I know I can query
categories = MovieCategoryScores.query.filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).all()
to get ALL categories, and I tried putting (MovieCategoryScores.category_id) after the query to get only the ids, but that didn't work and I just got a TypeError: 'BaseQuery' object is not callable error.
If I figured out how to just get the ID's I could use something like:
categories.sort()
for index, category_id in enumerate(categories.copy()):
    if categories[index+1] != category_id:
        categories[index].remove()
return categories
to get a list of only the ids that appear twice, but it feels like there should be a better way to get the ids of the rows where both movies have the same category_id, using just a query.
Either solution will be much appreciated!
You can use having and func.count() > 1 to get the opposite of distinct (group_by is required for having).
from sqlalchemy import func
categories = MovieCategoryScores.query.with_entities(MovieCategoryScores.category_id).filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).group_by(MovieCategoryScores.category_id).having(func.count(MovieCategoryScores.category_id) > 1).all()
Or, if you want to retrieve Category.name instead, join to Category and group by its name:
from sqlalchemy import func
categories = MovieCategoryScores.query.join(MovieCategoryScores.category).with_entities(Category.name).filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).group_by(Category.name).having(func.count(Category.name) > 1).all()
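To see what the group_by/having approach does at the SQL level, here is a minimal sketch using only the standard-library sqlite3 module. The table name movie_category_scores, the helper common_category_ids and the sample rows are assumptions made for illustration; the SQL is roughly what the first query above is expected to produce, not its exact output.

```python
import sqlite3


def common_category_ids(conn, movie1, movie2):
    """Return the category ids shared by both movies, or None if there are none."""
    rows = conn.execute(
        """
        SELECT category_id
        FROM movie_category_scores
        WHERE movie_id IN (?, ?)
        GROUP BY category_id
        HAVING COUNT(category_id) > 1
        ORDER BY category_id
        """,
        (movie1, movie2),
    ).fetchall()
    ids = [row[0] for row in rows]
    return ids or None


# Hypothetical sample data: movies 1 and 2 share category 5 only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_category_scores ("
    "movie_id INTEGER, category_id INTEGER, "
    "PRIMARY KEY (movie_id, category_id))"
)
conn.executemany(
    "INSERT INTO movie_category_scores VALUES (?, ?)",
    [(1, 5), (1, 6), (2, 5), (2, 7), (3, 8)],
)

print(common_category_ids(conn, 1, 2))  # [5]
print(common_category_ids(conn, 1, 3))  # None
```

This relies on (movie_id, category_id) being unique, as the composite primary key in the question guarantees. Note that passing the same movie twice yields a count of 1 per category, so nothing is returned.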
All right so after a lot of headaches I came up with this:
def get_common_categories(movie1, movie2):
    categories = MovieCategoryScores.query.with_entities(MovieCategoryScores.category_id).filter(MovieCategoryScores.movie_id.in_([movie1, movie2])).all()
    categories.sort()
    common = []
    for index, category in enumerate(categories):
        if index+1 < len(categories) and categories[index+1] == category:
            common.append(category[0])
    return common
Feels stupid and like there should be some way to do it only with a query and filter, but couldn't figure it out so this will have to do for now.
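If the filtering stays in Python, a set intersection might be a little easier to read than sorting and comparing neighbours. This is only a sketch: it assumes the query is changed to return (movie_id, category_id) pairs, for example via with_entities(MovieCategoryScores.movie_id, MovieCategoryScores.category_id), and the rows below are made up.

```python
def get_common_categories(rows, movie1, movie2):
    """Return the sorted category ids that both movies have.

    rows is an iterable of (movie_id, category_id) pairs.
    """
    rows = list(rows)  # allow generators, since we iterate twice
    first = {category_id for movie_id, category_id in rows if movie_id == movie1}
    second = {category_id for movie_id, category_id in rows if movie_id == movie2}
    return sorted(first & second)


# Hypothetical rows as they might come back from the query.
rows = [(1, 5), (1, 6), (2, 5), (2, 7)]
print(get_common_categories(rows, 1, 2))  # [5]
```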
I'm having an issue with an SQLAlchemy query. When I query a specific table it returns 7 items, but when I iterate over it, it only finds one item.
Here is the query:
seasons = db.session.query(Seasons).filter(Seasons.tmdb_id == tmdb_id)
I know it returns 7 items because I immediately run the following and it prints the number "7":
print(seasons.count())
However when I try to iterate over this seasons object like this expecting to get 7 names, I only get one row with one name.
for item in seasons:
    print(item.name)
Here is my Seasons class in my models.py
# Main TV show table
class Seasons(db.Model):
    __tablename__ = 'tv_seasons'
    tmdb_id = db.Column(db.Integer, primary_key=True)
    season_id = db.Column(db.Integer)
    air_date = db.Column(db.String)
    name = db.Column(db.String)
    episode_count = db.Column(db.Integer)
    overview = db.Column(db.String)
    poster_path = db.Column(db.String)
    season_number = db.Column(db.Integer)
Any idea why this is happening?
Because the count() function returns an int value.
Take a look: https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.count
SELECT count(1) AS count_1 FROM (
SELECT <rest of query follows...>
) AS anon_1
The above SQL returns a single row, which is the aggregate value of the count function; the Query.count() method then returns that single integer value.
So you should use Query(...).all() to get the items to iterate over, and then use an if to check whether there is any data:
seasons = query(...).all()
if seasons:
    for item in seasons:
        print(item.name)
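One more thing that may be worth checking, which the answer above does not cover: the model declares tmdb_id as its only primary key, and the ORM keeps one object per primary-key value when it loads full entities, while count() counts raw rows. If all 7 rows share the same tmdb_id, that alone could explain getting a single item. The sketch below is a plain-Python simulation of that effect with sqlite3, not SQLAlchemy's actual implementation; the id 1399 and the season rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without a primary-key constraint in the database itself,
# so seven rows can share the same tmdb_id.
conn.execute("CREATE TABLE tv_seasons (tmdb_id INTEGER, season_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tv_seasons VALUES (?, ?, ?)",
    [(1399, 100 + n, f"Season {n}") for n in range(1, 8)],
)

# What count() sees: raw rows.
row_count = conn.execute(
    "SELECT COUNT(*) FROM tv_seasons WHERE tmdb_id = ?", (1399,)
).fetchone()[0]

rows = conn.execute(
    "SELECT tmdb_id, season_id, name FROM tv_seasons WHERE tmdb_id = ? ORDER BY season_id",
    (1399,),
).fetchall()

# Simulated identity map keyed by the declared primary key (tmdb_id).
identity_map = {}
for tmdb_id, season_id, name in rows:
    identity_map.setdefault(tmdb_id, name)

# The same rows keyed by a column that is actually unique per row.
by_season = {season_id: name for _tmdb_id, season_id, name in rows}

print(row_count, len(identity_map), len(by_season))  # 7 1 7
```

If that is what is happening here, marking a column (or combination of columns) that is really unique per row, such as season_id, as the primary key in the model would be the thing to try.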
This is a follow up to a previous question here. I'd like to count the number of offers, in each category, and output them in a format, which I can iterate in Jinja.
new, 3
used, 7
broken, 5
Here's what I've got right now:
class Offer(Base):
    CATEGORIES = [
        (u'new', u'New'),
        (u'used', u'Used'),
        (u'broken', u'Broken')
    ]
    __tablename__ = 'offers'
    id = sa.Column(sa.Integer, primary_key=True)
    summary = sa.Column(sa.Unicode(255))
    category = sa.Column(ChoiceType(CATEGORIES))
Following the previous answer, I tried something like this:
count_categories = db.session.query(
CATEGORIES.value, func.count(Offer.id)).outerjoin(
Offer).group_by(CATEGORIES.key).all()
This obviously doesn't work because CATEGORIES.value is not defined. How can I pass CATEGORIES to this query to yield the desired result? The "setup" seems fairly common, and is taken straight from the SQLAlchemy-Utils Data types page.
Your help is much appreciated (growing white hair already)!
A horrible but working, temporary work-around:
result = []
for category in Offer.CATEGORIES:
    count = db.session.query(func.count(Offer.id)).filter_by(category=category[0]).all()
    result.append((category[0], category[1], count[0][0]))
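The work-around above issues one query per category. A possible variation is to run a single GROUP BY query and fill in the zero counts from CATEGORIES in Python. The sketch below shows the idea with the standard-library sqlite3 module; the offers rows and the helper count_offers_by_category are assumptions for illustration, and the category column is assumed to store the plain key ('new', 'used', 'broken').

```python
import sqlite3

CATEGORIES = [("new", "New"), ("used", "Used"), ("broken", "Broken")]


def count_offers_by_category(conn):
    """Return (key, label, count) for every category, including empty ones."""
    counts = dict(
        conn.execute("SELECT category, COUNT(id) FROM offers GROUP BY category")
    )
    return [(key, label, counts.get(key, 0)) for key, label in CATEGORIES]


# Hypothetical data: three new offers, two used, none broken.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offers (id INTEGER PRIMARY KEY, summary TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO offers (summary, category) VALUES (?, ?)",
    [("a", "new"), ("b", "new"), ("c", "new"), ("d", "used"), ("e", "used")],
)

print(count_offers_by_category(conn))
# [('new', 'New', 3), ('used', 'Used', 2), ('broken', 'Broken', 0)]
```

The result is a list of tuples in the same order as CATEGORIES, which should be straightforward to loop over in a Jinja template.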
To count the number of offers for each category you need to do an outer join between categories and offers. Because you are not storing categories as a database table, you have to do this in application code, which is not ideal. Instead, you can create a new categories table and replace the category field in the offers table with a foreign key to the new categories table. The following code demonstrates this.
class Offer(Base):
    __tablename__ = 'offers'
    id = sa.Column(sa.Integer, primary_key=True)
    summary = sa.Column(sa.Unicode(255))
    category_id = sa.Column(sa.Integer, sa.ForeignKey("categories.id"))
    category = sa.orm.relationship("Category", back_populates="offers")
class Category(Base):
    __tablename__ = 'categories'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.Unicode(6), unique=True)
    offers = sa.orm.relationship("Offer", back_populates="category")
# populate categories with the same values as your original enumeration
session.add(Category(name="New"))
session.add(Category(name="Used"))
session.add(Category(name="Broken"))
count_categories = session.query(Category.name, func.count(Offer.id)). \
    select_from(Category).outerjoin(Offer).group_by(Category.name).all()
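For reference, this is roughly the SQL that the outer join boils down to, run here against an in-memory SQLite database with the standard-library sqlite3 module. The sample offers are made up; the point is that a category with no offers still shows up with a count of 0, because COUNT(offers.id) ignores the NULLs produced by the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE offers (
        id INTEGER PRIMARY KEY,
        summary TEXT,
        category_id INTEGER REFERENCES categories (id)
    );
    INSERT INTO categories (id, name) VALUES (1, 'New'), (2, 'Used'), (3, 'Broken');
    -- Hypothetical offers: two new, one used, none broken.
    INSERT INTO offers (summary, category_id) VALUES ('a', 1), ('b', 1), ('c', 2);
    """
)

count_categories = conn.execute(
    """
    SELECT categories.name, COUNT(offers.id)
    FROM categories
    LEFT OUTER JOIN offers ON offers.category_id = categories.id
    GROUP BY categories.id, categories.name
    ORDER BY categories.id
    """
).fetchall()

print(count_categories)  # [('New', 2), ('Used', 1), ('Broken', 0)]
```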
I know I can simply update a many-to-many relationship like this:
tags = db.Table('tags',
    db.Column('tag_id', db.Integer, db.ForeignKey('tag.id'), primary_key=True),
    db.Column('page_id', db.Integer, db.ForeignKey('page.id'), primary_key=True)
)
class Page(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tags = db.relationship('Tag', secondary=tags, lazy='subquery',
        backref=db.backref('pages', lazy=True))
class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
tag1 = Tag()
tag2 = Tag()
page = Page(tags=[tag1])
and later for updating:
page.tags.append(tag2)
but I want to update them using only the tag ids. Assume I have to create a general function that only accepts a page and the ids of its tags, and updates the page.
What I want is something like this:
page = Page(tags=[1,2]) # 1 and 2 are primary keys of (tag)s
or in a function
def update_with_foreignkey(page, tags=[1, 2]):
    # do something to update page without using Tag objects
    return page
It was a little tricky, and it uses the evil eval, but I finally found a general way to update many-to-many relations using foreign keys. My goal was to update my object with data from a PUT request, and in that request I only get foreign keys for the relationships, along with the other data.
Step by step solution:
1- Find the relationships in the object; I find them using __mapper__.relationships.
2- Find the key that represents the relationship.
for rel in Object.__mapper__.relationships:
    key = str(rel).rsplit('.',1)[-1]
In the question's case it returns 'tags' as the result.
3- Find the model for the other side of the relation (in this example Tag).
3-1 Find the name of the table.
3-2 Convert the table name to CamelCase, because the table name uses underscores and the model name uses CamelCase.
3-3 Use eval to get the model.
if key in data:
    table = eval(convert_to_CamelCase(rel.table.name))
    temp = table.query.filter(table.id.in_(data[key])).all() # this line converts ids to SQLAlchemy objects
All together
def convert_to_CamelCase(word):
    return ''.join(x.capitalize() or '_' for x in word.split('_'))
def update_relationship_withForeingkey(Object, data):
    for rel in Object.__mapper__.relationships:
        key = str(rel).rsplit('.',1)[-1]
        if key in data:
            table = eval(convert_to_CamelCase(rel.table.name))
            temp = table.query.filter(table.id.in_(data[key])).all() # this line converts ids to SQLAlchemy objects
            data[key] = temp
    return data
data is what I get from the request, and Object is the SQLAlchemy model that I want to update.
Running these few lines gives me the updated result:
item = Object.query.filter_by(id=data['id']).first()
data = update_relationship_withForeingkey(Object, data)
for i, j in data.items():
    setattr(item, i, j)
db.session.commit()
I'm not sure about the caveats of this approach, but it works for me. Any improvements and suggestions are welcome.
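One possible improvement is to avoid eval by looking the model up in an explicit dictionary, so that a table name coming from anywhere unexpected can never be executed as code. The sketch below uses plain stand-in classes; MODEL_REGISTRY, model_for_table and PageTag are hypothetical names, and in a real app the registry would hold the actual models. It may also be worth checking whether rel.key and rel.mapper.class_ on the relationship objects give the key and the target model directly, which would make the name conversion unnecessary.

```python
def convert_to_CamelCase(word):
    """Turn a snake_case table name into a CamelCase class name."""
    return "".join(part.capitalize() for part in word.split("_") if part)


# Stand-ins for real models, purely for illustration.
class Tag:
    """Placeholder for the Tag model."""


class PageTag:
    """Placeholder for a model with a two-word name."""


# Explicit registry: class name -> class. Hypothetical helper, not a SQLAlchemy API.
MODEL_REGISTRY = {cls.__name__: cls for cls in (Tag, PageTag)}


def model_for_table(table_name):
    """Look up the model for a table name without using eval."""
    return MODEL_REGISTRY[convert_to_CamelCase(table_name)]


print(model_for_table("tag").__name__)       # Tag
print(model_for_table("page_tag").__name__)  # PageTag
```

An unknown table name raises KeyError here instead of evaluating arbitrary text.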
I have some tables like this:
class Genre(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), index=True)
    artist_id = db.Column(db.Integer, db.ForeignKey('artist.id'))
class Song(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), index=True)
    artist = db.relationship('Artist', uselist=False)
    artist_id = db.Column(db.Integer, db.ForeignKey('artist.id'))
class Artist(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), index=True)
    genres = db.relationship('Genre')
    songs = db.relationship('Song')
So basically, Songs have one Artist. And each Artist can have multiple Genres.
I am trying to get all Songs by any Artist whose list of Genre's contains a Genre by a specific name. I did some researching, and I found something very close:
Song.query.filter(Artist.genres.any(Genre.name.in_([genre_name_im_looking_for])))
This will sort of work, but not for all cases. For example, the above statement will also return all Songs with Artists who have the Genre 'indie rock'. How can I specify that I don't want the Genre name to be in a list of values, but to be a specific value?
Song.query.filter(Artist.genres.any(Genre.name='rock'))
is also not possible because it is a keyword expression.
Any ideas?
With this test data:
# Test Data
artists = [
    Artist(
        name='should match rock',
        genres=[Genre(name='rock'), Genre(name='pop')],
        songs=[Song(name='love'), Song(name='hate')]
    ),
    Artist(
        name='should NOT match',
        genres=[Genre(name='indie rock')],
        songs=[Song(name='elsewhere')]
    ),
]
db.session.add_all(artists)
db.session.commit()
The query below should do what you want:
q = Song.query.filter(Song.artist.has(Artist.genres.any(Genre.name == 'rock')))
assert len(q.all()) == 2
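In SQL terms, has() and any() each become an EXISTS subquery correlated to the outer row, which is why the exact match on 'rock' does not pick up 'indie rock'. Here is a rough equivalent using the standard-library sqlite3 module and the same test data; the table layout and the helper songs_for_genre are assumptions for illustration, not the exact SQL SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT, artist_id INTEGER);
    CREATE TABLE song (id INTEGER PRIMARY KEY, name TEXT, artist_id INTEGER);
    INSERT INTO artist (id, name) VALUES (1, 'should match rock'), (2, 'should NOT match');
    INSERT INTO genre (name, artist_id) VALUES ('rock', 1), ('pop', 1), ('indie rock', 2);
    INSERT INTO song (name, artist_id) VALUES ('love', 1), ('hate', 1), ('elsewhere', 2);
    """
)


def songs_for_genre(conn, genre_name):
    """Names of songs whose artist has a genre with exactly this name."""
    rows = conn.execute(
        """
        SELECT song.name
        FROM song
        WHERE EXISTS (
            SELECT 1 FROM artist
            WHERE artist.id = song.artist_id
              AND EXISTS (
                  SELECT 1 FROM genre
                  WHERE genre.artist_id = artist.id
                    AND genre.name = ?
              )
        )
        ORDER BY song.name
        """,
        (genre_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(songs_for_genre(conn, "rock"))        # ['hate', 'love']
print(songs_for_genre(conn, "indie rock"))  # ['elsewhere']
```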
After some more research, I found out one way to approach this problem, although a bit differently than what I wanted.
First, to get all the Artists that contain a specific Genre, I executed:
artists = Artist.query.filter(
    Artist.genres.any(Genre.name.like('genre_name'))).all()
Then, I iterated through each Artist, and queried the Song model by using the artist as a keyword expression in the filter, like so:
for a in artists:
    most_common_genre_songs = Song.query.filter_by(artist=a).all()
I am not sure if there is a more efficient way to make this call, or do it in a one-liner (I'm betting there is), but for now, this will do.
I have the following tables in SQLAlchemy:
class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    compare_url = Column(String(200))
    url = Column(String(200))
    postedby = Column(Integer)
    category = Column(String(50))
    title = Column(String(500), nullable=False)
    author = Column(String(500), default="Unspecified")
    content = Column(Text(), default="could not fetch this content you will have to read it externally")
    summary = Column(Text())
    time = Column(TIMESTAMP(), default=now())
    post_type = Column(Text())
    Reads = relationship("Read", backref="Post")
    Reposts = relationship("RePost", backref="Post")
    Votes = relationship("Vote", backref="Post")
class Read(Base):
    __tablename__ = 'reads'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    #post = relationship("Post", backref=backref('Reads', order_by=id))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))
class Vote(Base):
    __tablename__ = 'votes'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))
    user_vote = Column(Boolean(), nullable=False)
I have this query
posts = session.query(Post, func.count(Read.id).label('total'),func.sum(Vote.user_vote).label('votes'),User.username).outerjoin(Post.Reads).outerjoin(Post.Votes)
I am trying to get the number of votes and the number of times a post has been read. A vote value can be -1 or 1.
The problem is that I am getting the same value for the number of reads and votes on each Post.
For example, when my reads table has
id  post_read  time                 user_id
1   7          2012-09-19 09:32:06  1
and the votes table has
id  post_read  time                 user_id  user_vote
1   7          2012-09-19 09:42:27  1        1
2   7          2012-09-19 09:42:27  2        1
I still get two as the value for both votes and reads.
It might look as if you can solve this particular problem by simply replacing func.count(Read.id).label('total') with func.count(func.distinct(Read.id)).label('total'). And in fact this will solve the issue with number of reads.
But if you suddenly get another reader for your post (and end up with 2 readers and 2 voters), then all your votes will also be counted twice.
The best solution to this is simply not to aggregate different items in the same query. You can use subqueries to solve this:
subq_read = (session.query(
        Post.id,
        func.count(Read.id).label("total_read")
    ).
    outerjoin(Post.Reads).
    group_by(Post.id)  # group by the post so posts without reads keep their own row
).subquery()
subq_vote = (session.query(
        Post.id,
        func.sum(Vote.user_vote).label("total_votes")
    ).
    outerjoin(Post.Votes).
    group_by(Post.id)  # same here for posts without votes
).subquery()
# each subquery already yields one row per post, so no outer group_by is needed
posts = (session.query(
        Post,
        subq_read.c.total_read,
        subq_vote.c.total_votes,
    ).
    outerjoin(subq_read, subq_read.c.id == Post.id).
    outerjoin(subq_vote, subq_vote.c.id == Post.id)
)
Note: you have a User.username in your query, but I did not see any join clause in the query. You might want to check this as well.
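To make the double counting concrete, here is a small sketch with the standard-library sqlite3 module, using the rows from the question (one read and two votes for post 7). The table layout is trimmed down and the SQL is hand-written for illustration; it is not the exact SQL SQLAlchemy would emit for the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER, user_id TEXT);
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY, post_read INTEGER, user_id TEXT, user_vote INTEGER
    );
    INSERT INTO posts (id, title) VALUES (7, 'example');
    INSERT INTO reads (id, post_read, user_id) VALUES (1, 7, '1');
    INSERT INTO votes (id, post_read, user_id, user_vote)
        VALUES (1, 7, '1', 1), (2, 7, '2', 1);
    """
)

# Joining both child tables at once repeats the single read for each vote.
naive = conn.execute(
    """
    SELECT posts.id, COUNT(reads.id), SUM(votes.user_vote)
    FROM posts
    LEFT JOIN reads ON reads.post_read = posts.id
    LEFT JOIN votes ON votes.post_read = posts.id
    GROUP BY posts.id
    """
).fetchall()

# Aggregating each child table in its own subquery avoids the repetition.
with_subqueries = conn.execute(
    """
    SELECT posts.id, r.total_read, v.total_votes
    FROM posts
    LEFT JOIN (
        SELECT post_read, COUNT(id) AS total_read FROM reads GROUP BY post_read
    ) AS r ON r.post_read = posts.id
    LEFT JOIN (
        SELECT post_read, SUM(user_vote) AS total_votes FROM votes GROUP BY post_read
    ) AS v ON v.post_read = posts.id
    """
).fetchall()

print(naive)            # [(7, 2, 2)]
print(with_subqueries)  # [(7, 1, 2)]
```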
When joining multiple tables, tables that join earlier get their rows repeated for tables that join later in one-to-many relationships (to put it simply). This is why your count is off. In joins like this, you always need to find something distinct to count in the result set... such as the primary keys. I find this preferable to subqueries as it is much faster. In fact, much of the performance tuning I do comes from eliminating subqueries.
Thus, if you filter on the user_vote column to eliminate the records you don't want to count, you can fix your query like this:
from sqlalchemy import distinct, func
posts = session.query(Post
    , func.count(distinct(Read.id)).label('total')
    , func.count(distinct(Vote.id)).label('votes')
    , User.username
    ) \
    .outerjoin(Post.Reads) \
    .outerjoin(Post.Votes) \
    .filter(Vote.user_vote == True)
You'll probably also want to add a group_by, or another filter, to get counts per Post, which is likely your goal.
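Here is a sketch of the distinct-count approach in plain SQL via the standard-library sqlite3 module, grouped per post. The sample rows are made up (post 7 has two reads, two upvotes and one downvote stored as 0; post 8 has one read and no votes). It also shows one thing to watch for: filtering on the votes table in WHERE drops posts that have no matching vote at all, whereas putting the condition in the join's ON clause keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE reads (id INTEGER PRIMARY KEY, post_read INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_read INTEGER, user_vote INTEGER);
    INSERT INTO posts (id) VALUES (7), (8);
    INSERT INTO reads (id, post_read) VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO votes (id, post_read, user_vote) VALUES (1, 7, 1), (2, 7, 1), (3, 7, 0);
    """
)

# Filter in WHERE: post 8 has no upvote row, so it disappears from the result.
filtered_in_where = conn.execute(
    """
    SELECT posts.id, COUNT(DISTINCT reads.id), COUNT(DISTINCT votes.id)
    FROM posts
    LEFT JOIN reads ON reads.post_read = posts.id
    LEFT JOIN votes ON votes.post_read = posts.id
    WHERE votes.user_vote = 1
    GROUP BY posts.id
    ORDER BY posts.id
    """
).fetchall()

# Filter in ON: post 8 is kept, with zero upvotes.
filtered_in_join = conn.execute(
    """
    SELECT posts.id, COUNT(DISTINCT reads.id), COUNT(DISTINCT votes.id)
    FROM posts
    LEFT JOIN reads ON reads.post_read = posts.id
    LEFT JOIN votes ON votes.post_read = posts.id AND votes.user_vote = 1
    GROUP BY posts.id
    ORDER BY posts.id
    """
).fetchall()

print(filtered_in_where)  # [(7, 2, 2)]
print(filtered_in_join)   # [(7, 2, 2), (8, 1, 0)]
```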