So imagine you have the following two tables:
CREATE TABLE movies (
    id int,
    name varchar(255),
    ...
    PRIMARY KEY (id)
);

CREATE TABLE movieRentals (
    id int,
    movie_id int,
    customer varchar(255),
    dateRented datetime,
    ...
    PRIMARY KEY (id),
    FOREIGN KEY (movie_id) REFERENCES movies(id)
);
With SQL directly, I'd approach this query as:
(
    SELECT movie_id, count(movie_id) AS rent_count
    FROM movieRentals
    WHERE dateRented > [TIME_ARG_HERE]
    GROUP BY movie_id
)
UNION
(
    SELECT id AS movie_id, 0 AS rent_count
    FROM movies
    WHERE id NOT IN
    (
        SELECT movie_id
        FROM movieRentals
        WHERE dateRented > [TIME_ARG_HERE]
        GROUP BY movie_id
    )
)
(Get a count of all movie rentals, by id, since a given date)
Obviously the Django versions of these tables are simple models:
class Movies(models.Model):
    name = models.CharField(max_length=255, unique=True)

class MovieRentals(models.Model):
    customer = models.CharField(max_length=255)
    dateRented = models.DateTimeField()
    movie = models.ForeignKey(Movies, on_delete=models.CASCADE)
However, translating this to an equivalent query appears to be difficult:
timeArg = datetime.datetime.now() - datetime.timedelta(7,0)
queryset = models.MovieRentals.objects.all()
queryset = queryset.filter(dateRented__gte=timeArg)
queryset = queryset.annotate(rent_count=Count('movies'))
querysetTwo = models.Movies.objects.all()
querysetTwo = querysetTwo.filter(~Q(id__in=[val["movie_id"] for val in queryset.values("movie_id")]))
# Somehow need to set the 0 count. For now force it with extra():
querysetTwo = querysetTwo.extra(select={"rent_count": "SELECT 0 AS rent_count FROM app_movies LIMIT 1"})
# Now union these - for some reason this doesn't work:
# return queryset | querysetTwo
# so instead
set1List = [_getMinimalDict(model) for model in queryset]
# Where _getMinimalDict just extracts the values I am interested in.
set2List = [_getMinimalDict(model) for model in querysetTwo]
return sorted(set1List + set2List, key=lambda x: x['rent_count'])
However, while this method seems to work, it is incredibly slow. Is there a better way I am missing?
With straight SQL, this would be much more easily expressed like this:
SELECT movies.id, count(movieRentals.id) AS rent_count
FROM movies
LEFT JOIN movieRentals
    ON (movieRentals.movie_id = movies.id AND dateRented > [TIME_ARG_HERE])
GROUP BY movies.id
The left join will produce a single row for each movie unrented since [TIME_ARG_HERE], but in those rows, the movieRentals.id column will be NULL.
Then, COUNT(movieRentals.id) will count all of the rentals where they exist, and return 0 if there was only the NULL value.
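If you are on Django 2.0 or newer, a roughly equivalent ORM query is possible with conditional aggregation. This is a sketch that assumes the models above and the default reverse accessor movierentals for the MovieRentals foreign key:
import datetime

from django.db.models import Count, Q

timeArg = datetime.datetime.now() - datetime.timedelta(7, 0)  # as in the question

# One row per movie; movies with no matching rentals get rent_count = 0,
# mirroring the LEFT JOIN + COUNT above.
movies = Movies.objects.annotate(
    rent_count=Count(
        'movierentals',
        filter=Q(movierentals__dateRented__gt=timeArg),
    )
)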
I must be missing something obvious. Why wouldn't the following work:
queryset = models.MovieRentals.objects.filter(dateRented__gte=timeArg).values('movies').annotate(Count('movies')).aggregate(Min('movies__count'))
Also, clauses can be chained (as shown in the code above), so there is no need to keep assigning intermediate querysets to a variable; for example:
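Here is a sketch of the per-movie count from the first SELECT in the question's SQL, written as one chain (assuming the ForeignKey field is named movie as in the models above; this only covers movies that have rentals in the period):
from django.db.models import Count

rent_counts = (
    models.MovieRentals.objects
    .filter(dateRented__gte=timeArg)
    .values('movie')                      # GROUP BY movie_id
    .annotate(rent_count=Count('movie'))  # count of rentals per movie
)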
I have this Invoice model with a property method that refers to another model to get the cancellation date of the invoice, like so:
class Invoice(models.Model):
    # (...)

    @property
    def cancel_date(self):
        if self.canceled:
            return self.records.filter(change_type='cancel').first().date
        else:
            return None
And in one of my views, I need to query every invoice that has been canceled after max_date or hasn't been canceled at all.
Like so:
def ExampleView(request):
    # (...)
    qs = Invoice.objects
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        ids = list(map(lambda i: i.pk, filter(lambda i: (i.cancel_date == None) or (i.cancel_date > max_date), qs)))
        qs = qs.filter(pk__in=ids)  # Error -> django.db.utils.OperationalError: too many SQL variables
However, ids might be a huge list of ids, which causes the too many SQL variables error.
What's the smartest approach here?
EDIT:
I'm looking for a solution that does not involve adding cancel_date as a model field, since invoice.records refers to another model where we store every date attribute of the invoice.
Like so:
class InvoiceRecord(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='records', on_delete=models.CASCADE)
    date = models.DateTimeField(default=timezone.now)
    change_type = models.CharField(max_length=32)  # Multiple choices field
And every invoice might have more than one of the same date attribute. For example, one invoice might have two cancellation dates.
You can annotate a Subquery() expression [Django docs] that gives you the date, and filter on it:
from django.db.models import OuterRef, Q, Subquery
def ExampleView(request):
    # (...)
    qs = Invoice.objects.annotate(
        cancel_date=Subquery(
            # Match the cancel_date property: only 'cancel' records
            InvoiceRecord.objects.filter(
                invoice=OuterRef("pk"), change_type='cancel'
            ).values('date')[:1]
        )
    )
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        qs = qs.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
I would set cancel_date as a database field when you set the cancel flag. Then you can use a single query:
qs = Invoice.objects.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
This says: cancel_date is NULL or greater than max_date.
I'm not sure about your cancel_date property: it returns the first record with change_type='cancel', which (I don't know your code flow) may be a different record than the one you expect.
I have two models:
class FirstModel(models.Model):
    some_fields...

class SecondModel(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()
    first_model = models.ForeignKey(to="FirstModel", on_delete=models.CASCADE)
and I need to do the following query:
select sum(value) from second_model
inner join (
    select max(date) as max_date, id from second_model
    where date < NOW()
    group by id
) as subquery
on date = max_date and id = subquery.id
I think I can do it using Subquery
subquery = Subquery(
    SecondModel.objects.values("first_model")
    .annotate(max_date=Max("date"))
    .filter(date__lt=Func(function="NOW"))
)
and F() expressions, but F() can only resolve model fields, not a subquery.
Question
Is it possible to implement using Django ORM only?
Also, can I compute the sum of values from the second model for every object in the first model by annotating this value? Like
FirstModel.objects.annotate(sum_values=sum_with_inner_join_query).all()
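One possible ORM-only sketch, assuming the intent is to take the value of the most recent SecondModel row (with date < now) per FirstModel and sum those values, and assuming a reasonably recent Django; latest_value is an illustrative name:
from django.db.models import OuterRef, Subquery, Sum
from django.utils import timezone

# For each FirstModel row, pick the value of its latest SecondModel row
# dated before now, then sum those per-object values.
latest_value = Subquery(
    SecondModel.objects
    .filter(first_model=OuterRef("pk"), date__lt=timezone.now())
    .order_by("-date")
    .values("value")[:1]
)

qs = FirstModel.objects.annotate(latest_value=latest_value)
total = qs.aggregate(sum_values=Sum("latest_value"))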
I have 3 tables: users, projects, and a projects_users join table with an is_admin column, and I am trying to write an ORM query to get data from them.
My Models are here: https://pastebin.com/ZrmhKyNL
In plain SQL, we could join and select particular columns and get the desired output. When I write a raw query like this:
sql = """ select * from
projects p, users u, projects_users pu
where
p.name = '%s' and
p.id = pu.project_id and
pu.user_id = u.id and
p.is_active = true and
u.is_active = true
""" % project_name
and it works well and returns response in this format:
[
    {
        All columns of the above 3 tables.
    }
]
But when I try to convert this to the SQLAlchemy ORM, it doesn't work well:
return (
    db.query(User)
    .filter(Project.id == ProjectsUser.project_id)
    .filter(ProjectsUser.user_id == User.id)
    .filter(Project.is_active == True)
    .filter(User.is_active == True)
    .filter(Project.name == project_name)
    .all()
)
I want the is_admin value to be returned along with the user object. This seems like a very common use case, but I couldn't find any solution for the SQLAlchemy ORM.
If this is a one-to-one relationship, meaning that for every user you can have one and only one entry in the admin table, you could add a relationship to serve as a back-reference:
class User(Base):
    id = ...
    name = ...
    last_login = ...
    admin = relationship("Admin", uselist=False, back_populates="user")

class Admin(Base):
    id = ...
    is_admin = ...
    user_id = Column(Integer, ForeignKey('user.id'))
    user = relationship("User", back_populates="admin")
Then you can query only the table user, and access the values from the relationship via user.admin.is_admin.
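For example, a minimal usage sketch assuming the models above and an existing session (the name "alice" is purely illustrative):
# Query only User; the related Admin row is reachable via the relationship.
user = session.query(User).filter(User.name == "alice").one()

# uselist=False makes .admin a single object (or None if no Admin row exists).
is_admin = user.admin.is_admin if user.admin is not None else False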
Read more:
One To One - SQLAlchemy docs
relationship.back_populates - SQLAlchemy docs
I have a situation similar to the following one:
class Player(models.Model):
pass
class Item(models.Model):
player = models.ForeignKey(Player,
on_delete=models.CASCADE,
related_name='item_set')
power = models.IntegerField()
I would like to annotate Player.objects.all() with Sum(item_set__power), taking into account only the top N Item when sorted by descending power. Ideally, I would like to do this with a subquery, but I don't know how to write it. How can this be done?
This is a solution using a raw queryset (it was easier for me to implement than the ORM, but it might be possible with the ORM using Subquery):
N = 2
query = """
    SELECT id,
    (
        SELECT SUM(power)
        FROM (SELECT power FROM myapp_item
              WHERE myapp_item.player_id = players.id
              ORDER BY power DESC LIMIT %s)
    ) AS power__sum
    FROM myapp_player AS players GROUP BY players.id
"""
players = Player.objects.raw(query, [N])
Update
Adding annotations is not possible with a RawQuerySet, but you can use a RawSQL expression:
from django.db.models.expressions import RawSQL

N = 2
queryset = Player.objects.all()
query2 = """
    SELECT SUM(power)
    FROM (SELECT power FROM myapp_item
          WHERE myapp_item.player_id = myapp_player.id
          ORDER BY power DESC LIMIT %s)
"""
queryset = queryset.annotate(power__sum=RawSQL(query2, (N,)), my_annotation1=..., my_annotation2=...)
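A hypothetical usage of the annotated queryset (with the placeholder annotations filled in or dropped):
# Each Player instance now carries the raw-SQL sum of its top-N powers.
for player in queryset:
    print(player.pk, player.power__sum)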
I have the following tables in SQLAlchemy:
class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    compare_url = Column(String(200))
    url = Column(String(200))
    postedby = Column(Integer)
    category = Column(String(50))
    title = Column(String(500), nullable=False)
    author = Column(String(500), default="Unspecified")
    content = Column(Text(), default="could not fetch this content you will have to read it externally")
    summary = Column(Text())
    time = Column(TIMESTAMP(), default=now())
    post_type = Column(Text())
    Reads = relationship("Read", backref="Post")
    Reposts = relationship("RePost", backref="Post")
    Votes = relationship("Vote", backref="Post")

class Read(Base):
    __tablename__ = 'reads'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    # post = relationship("Post", backref=backref('Reads', order_by=id))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))

class Vote(Base):
    __tablename__ = 'votes'
    id = Column(Integer, primary_key=True)
    post_read = Column(Integer, ForeignKey('posts.id'))
    time = Column(TIMESTAMP(), default=now())
    user_id = Column(String(50))
    user_vote = Column(Boolean(), nullable=False)
I have this query:
posts = (
    session.query(
        Post,
        func.count(Read.id).label('total'),
        func.sum(Vote.user_vote).label('votes'),
        User.username,
    )
    .outerjoin(Post.Reads)
    .outerjoin(Post.Votes)
)
I am trying to get the number of votes and the number of times a post has been read. A vote value can be -1 or 1.
The problem is I am getting the same value for the number of reads and votes on each post.
For example, when my reads table has:

id  post_read  time                 user_id
1   7          2012-09-19 09:32:06  1

and my votes table has:

id  post_read  time                 user_id  user_vote
1   7          2012-09-19 09:42:27  1        1
2   7          2012-09-19 09:42:27  2        1
But I am still getting a value of two for both votes and reads.
It might look as if you can solve this particular problem by simply replacing func.count(Read.id).label('total') with func.count(func.distinct(Read.id)).label('total'). And in fact this will solve the issue with number of reads.
But if you suddenly get another reader for your post (and end up with 2 readers and 2 voters), then all your votes will also be counted twice.
The best solution to this is simply not to aggregate different items in the same query. You can use subqueries to solve this:
subq_read = (
    session.query(
        Post.id,
        func.count(Read.id).label("total_read"),
    )
    .outerjoin(Post.Reads)
    .group_by(Post.id)  # group by the post so each post keeps its own count
).subquery()

subq_vote = (
    session.query(
        Post.id,
        func.sum(Vote.user_vote).label("total_votes"),
    )
    .outerjoin(Post.Votes)
    .group_by(Post.id)
).subquery()

posts = (
    session.query(
        Post,
        subq_read.c.total_read,
        subq_vote.c.total_votes,
    )
    .outerjoin(subq_read, subq_read.c.id == Post.id)
    .outerjoin(subq_vote, subq_vote.c.id == Post.id)
)
Note: you have User.username in your query, but I did not see any join clause for it. You might want to check this as well.
When joining multiple tables, tables that join earlier get their rows repeated for tables that join later in one-to-many relationships (to put it simply). This is why your count is off. In joins like this, you always need to find something distinct to count in the result set... such as the primary keys. I find this preferable to subqueries as it is much faster. In fact, much of the performance tuning I do comes from eliminating subqueries.
Thus, if you filter on the user_vote column to eliminate the records you don't want to count, you can fix your query like this:
posts = (
    session.query(
        Post,
        func.count(distinct(Read.id)).label('total'),
        func.count(distinct(Vote.id)).label('votes'),
        User.username,
    )
    .outerjoin(Post.Reads)
    .outerjoin(Post.Votes)
    .filter(Vote.user_vote == True)
)
But you'll probably also want to add a group_by, or another filter, to get counts per Post, which is your likely goal.
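A sketch of that grouped version under the same models (User.username is left out here since no join for it is shown, and note that the user_vote filter effectively turns the Votes outer join into an inner join, so posts with no matching votes drop out):
posts = (
    session.query(
        Post,
        func.count(distinct(Read.id)).label('total'),
        func.count(distinct(Vote.id)).label('votes'),
    )
    .outerjoin(Post.Reads)
    .outerjoin(Post.Votes)
    .filter(Vote.user_vote == True)
    .group_by(Post.id)  # one row of counts per post
    .all()
)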