How can I change this SQL to Django ORM code?

select *
from sample
join process
on sample.processid = process.id
where (processid) in (
select max(processid) as processid
from main_sample
group by serialnumber
)
ORDER BY sample.create_at desc;
models.py
from django.db import models

class Sample(models.Model):
    processid = models.IntegerField(default=0)
    serialnumber = models.CharField(max_length=256)
    create_at = models.DateTimeField(null=True)

class Process(models.Model):
    sample = models.ForeignKey(Sample, blank=False, null=True, on_delete=models.SET_NULL)
Hi, I have two models and I need to convert this SQL query to Django ORM (Python) code.
I need to retrieve the latest Sample (by processid) for each unique serial number.
For example: [example data] => [result after running the query]
How can I convert the SQL query, and in particular the subquery, to ORM code?
Thanks for reading.

EDIT: To also order by a column that is not one of the distinct or retrieved columns, you can fall back on subqueries. To filter by a single row from a subquery, you can use the syntax described in the docs here:
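from django.db.models import OuterRef, Subquery

# For each row, select the highest processid among samples with the same serialnumber
subquery = Subquery(
    Sample.objects.filter(serialnumber=OuterRef('serialnumber'))
    .order_by('-processid')
    .values('processid')[:1]
)
# Keep only the latest sample per serialnumber, newest first (ORDER BY create_at DESC)
results = Sample.objects.filter(
    processid=subquery
).order_by(
    '-create_at'
)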
When using PostgreSQL, you can pass fields to distinct() to get a single row per value of a given column. It returns the first row of each group, so combined with the right ordering (the distinct field first, then -processid) it does what you need:
Sample.objects.order_by('serialnumber', '-processid').distinct('serialnumber')
If you don't use PostgreSQL, use a values() query on the column that should be unique, then annotate the queryset with the aggregate to compute per group, Max in this case:
from django.db.models import Max

# One dict per serialnumber: {'serialnumber': ..., 'max_processid': ...}
Sample.objects.order_by(
    'serialnumber'
).values(
    'serialnumber'
).annotate(
    max_processid=Max('processid')
)

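I think this is what you need:
If you want the related objects for multiple samples:
samples = Sample.objects.prefetch_related('process_set')
If you want the related objects for only one sample:
samples = Sample.objects.filter(id=1).prefetch_related('process_set')
Process has a ForeignKey to Sample without a related_name, so the reverse accessor is process_set; select_related() cannot follow that reverse relation, and querysets have no group_by() method (grouping by serialnumber is done with values('serialnumber').annotate(...)).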

Related

Django models - how to properly calculate multiple values with various conditions

I want to calculate multiple values in a single query on my model. Each metric should have a different filter (or no filters at all). I'm using Django==2.2.3 and a Djongo model.
MyModel columns:
user_id = models.IntegerField(blank=True, null=True)
group_id = models.IntegerField(blank=True, null=True)
What I tried, as suggested in another topic, does not work for me.
Query:
from django.db.models import Count, Q

MyModel.objects.aggregate(
    total=Count('user_id'),
    test=Count('user_id', filter=Q(user_id='just_a_fake_id')),
    group_1_value=Count('user_id', filter=Q(group_id=1)),
    group_2_value=Count('user_id', filter=Q(group_id=2)),
)
The result, {'total': 0, 'test': 47479, 'group_1_value': 47479, 'group_2_value': 47479}, does not make sense: every value except total is the same number, which is the count of all the records.
What I want to run is a query similar to:
SELECT COUNT(user_id) as total,
COUNT(CASE WHEN group_id=1 THEN user_id END) as group_1_value,
COUNT(CASE WHEN group_id=2 THEN user_id END) as group_2_value
FROM MyModel
How do I modify the query in order to get the correct values?

How do I include columns properly in a GROUP BY clause of my Django aggregate query?

I'm using Django and Python 3.7. I have the following model ...
class ArticleStat(models.Model):
    objects = ArticleStatManager()
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='articlestats')
    elapsed_time_in_seconds = models.IntegerField(default=0, null=False)
    score = models.FloatField(default=0, null=False)
I want to write a MAX/GROUP BY query subject to certain conditions. Specifically, I want each row to contain:
MAX(ArticleStat.elapsed_time_in_seconds)
ArticleStat.Article.id
ArticleStat.Article.title
ArticleStat.score
in which the columns "ArticleStat.Article.id," "ArticleStat.Article.title," and "ArticleStat.score" are unique per result set row. So I tried this ...
def get_current_articles(self, article):
    qset = ArticleStat.objects.values('article__id', 'article__title', 'score').filter(
        article__article=article).values('elapsed_time_in_seconds').annotate(
        max_date=Max('elapsed_time_in_seconds'))
    print(qset.query)
    return qset
However, the resulting SQL does not include the values I want to use in my GROUP BY clause (notice that neither "article" nor "score" is in the GROUP BY) ...
SELECT "myproject_articlestat"."elapsed_time_in_seconds",
MAX("myproject_articlestat"."elapsed_time_in_seconds") AS "max_date"
FROM "myproject_articlestat"
INNER JOIN "myproject_article" ON ("myproject_articlestat"."article_id" = "myproject_article"."id")
WHERE ("myproject_article"."article_id" = 2) GROUP BY "myproject_articlestat"."elapsed_time_in_seconds"
How do I modify my Django query to generate SQL consistent with what I want?

I don't think the answer by @Oleg will work, but it's close.
A Subquery expression can be used to select a single value from another queryset. To accomplish this, you sort by the value you wish to target and then select the first value.
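from django.db.models import OuterRef, Subquery

# All stats of the outer article, largest elapsed time first
sq = (
    ArticleStat.objects
    .filter(article=OuterRef('pk'))
    .order_by('-elapsed_time_in_seconds')
)
articles = (
    Article.objects
    .annotate(
        # take the elapsed time and the score from the same (first) row
        max_date=Subquery(sq.values('elapsed_time_in_seconds')[:1]),
        score=Subquery(sq.values('score')[:1]),
    )
    # .values('id', 'path', 'title', 'score', 'max_date')
)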

You should not include 'elapsed_time_in_seconds' in your .values(..) clause, and you should add the GROUP BY fields to the .order_by(..) clause:
from django.db.models import Max

qset = ArticleStat.objects.filter(
    article__article=article
).values(
    'article__path', 'article__title', 'score'
).annotate(
    max_date=Max('elapsed_time_in_seconds')
).order_by('article__path', 'article__title', 'score')
This produces a QuerySet of dictionaries, each containing four keys: 'article__path', 'article__title', 'score', and 'max_date'.

As far as I can see, there is no one-step way of doing this in Django.
One way is to use two queries: use .annotate() to add the max id of the related ObjectRevision to each Object, then fetch all those ObjectRevisions.
Example:
from django.db.models import Max

# Query 1: latest revision id per object
objects = Object.objects.all().annotate(revision_id=Max('objectrevision__id'))
# Query 2: fetch exactly those revisions
objectrevisions = ObjectRevision.objects.filter(id__in=[o.revision_id for o in objects])
This is untested and also a bit slow, so maybe you can also try writing custom SQL, as mentioned by Wolfram Kriesing in the blog here.

If I understand correctly from all the comments:
The goal is to get Articles (id, path, title) filtered by the article argument of the get_current_articles method, with additional data: the maximum elapsed_time_in_seconds across all ArticleStats of each filtered article, and the score of the ArticleStat with that maximum elapsed_time.
If so, the base query can be on the Article model: Article.objects.filter(article=article).
We can annotate it with the Max() of the corresponding ArticleStats. This can be done directly on the main query, .annotate(max_date=Max('articlestats__elapsed_time_in_seconds')), or with a Subquery on ArticleStat filtered the same way as the base query on article (we want the subquery to run on the same set of article objects as the main query), i.e.
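from django.db.models import Max, OuterRef

# values() before annotate() makes Max() aggregate over the group instead of per row
max_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).values('article__article').annotate(max_date=Max('elapsed_time_in_seconds'))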
Now, to add the score column to the result: Max() is an aggregate function, so it carries no information about which row the maximum came from. To get the score for the maximum elapsed_time, we can make another subquery and filter by the max elapsed_time from the previous column.
Note: this filter can return multiple ArticleStats objects for the same maximum elapsed_time and article, but we will use only the first one. It is up to your data structure to ensure that the filter returns only one row, or you can add filtering or ordering so that the first result is the one required.
score_sq = ArticleStat.objects.filter(
article__article=OuterRef('article'),
elapsed_time_in_seconds=OuterRef('max_date')
)
Then use the subqueries in the main query:
from django.db.models import Max, Subquery

qset = Article.objects.filter(
    article=article
).annotate(
    max_date=Max('articlestats__elapsed_time_in_seconds'),
    # or, with the subquery instead of a direct aggregate:
    # max_date=Subquery(max_sq.values('max_date')[:1]),
    score=Subquery(
        score_sq.values('score')[:1]
    )
).values(
    'id', 'path', 'title', 'score', 'max_date'
)
A somewhat trickier option avoids Max() and emulates it with ORDER BY, fetching the first row:
from django.db.models import OuterRef, Subquery

# empty order_by() to clear any base ordering before applying the explicit one
artstat_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).order_by().order_by('-elapsed_time_in_seconds', '-score')
qset = Article.objects.filter(
article=article
).annotate(
max_date=Subquery(
artstat_sq.values('elapsed_time_in_seconds')[:1]
),
score=Subquery(
artstat_sq.values('score')[:1]
)
).values(
'id', 'path', 'title', 'score', 'max_date'
)

How to convert SQL scalar subquery to SQLAlchemy expression

I need a little help expressing code like this in SQLAlchemy:
SELECT
s.agent_id,
s.property_id,
p.address_zip,
(
SELECT v.valuation
FROM property_valuations v WHERE v.zip_code = p.address_zip
ORDER BY ABS(DATEDIFF(v.as_of, s.date_sold))
LIMIT 1
) AS back_valuation
FROM sales s
JOIN properties p ON s.property_id = p.id
The inner subquery is meant to get the valuation from table property_valuations (columns zip_code INT, valuation DECIMAL, as_of DATE) whose date is closest to the date of sale from table sales. I know how to rewrite most of it, but I'm completely stuck on the order_by expression: I cannot prepare the subquery so that the ordering member is passed later.
Currently I have the following queries:
subquery = (
session.query(PropertyValuation)
.filter(PropertyValuation.zip_code == Property.address_zip)
.order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
.limit(1)
)
query = session.query(Sale).join(Sale.property_)
How to combine these queries together?

How to combine these queries together?
Use as_scalar() (deprecated since SQLAlchemy 1.4 in favor of scalar_subquery()), or label():
subquery = (
session.query(PropertyValuation.valuation)
.filter(PropertyValuation.zip_code == Property.address_zip)
.order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
.limit(1)
)
query = session.query(Sale.agent_id,
Sale.property_id,
Property.address_zip,
# `subquery.as_scalar()` or
subquery.label('back_valuation'))\
.join(Property)
Using as_scalar() limits returned columns and rows to 1, so you cannot get the whole model object using it (as query(PropertyValuation) is a select of all the attributes of PropertyValuation), but getting just the valuation attribute works.
but I completely stuck on order_by expression - I cannot prepare subquery to pass ordering member later.
There's no need to pass it later. Your current way of declaring the subquery is fine as it is, since SQLAlchemy can automatically correlate FROM objects to those of an enclosing query. I tried creating models that somewhat represent what you have, and here's how the query above works out (with added line-breaks and indentation for readability):
In [10]: print(query)
SELECT sale.agent_id AS sale_agent_id,
sale.property_id AS sale_property_id,
property.address_zip AS property_address_zip,
(SELECT property_valuations.valuation
FROM property_valuations
WHERE property_valuations.zip_code = property.address_zip
ORDER BY abs(datediff(property_valuations.as_of, sale.date_sold))
LIMIT ? OFFSET ?) AS back_valuation
FROM sale
JOIN property ON property.id = sale.property_id

GROUP BY in Django Queries

Dear StackOverflow community:
I need your help in executing the following SQL query.
select DATE(creation_date), COUNT(creation_date) from blog_article WHERE creation_date BETWEEN SYSDATE() - INTERVAL 30 DAY AND SYSDATE() AND author="scott_tiger" GROUP BY DATE(creation_date);
Here is my Django Model
class Article(models.Model):
    title = models.CharField(...)
    author = models.CharField(...)
    creation_date = models.DateField(...)
How can I build the query above using the aggregate() and annotate() functions? I created something like this:
now = datetime.datetime.now()
date_diff = datetime.datetime.now() + datetime.timedelta(-30)
records = Article.objects.values('creation_date', Count('creation_date')).aggregate(Count('creation_date')).filter(author='scott_tiger', created_at__gt=date_diff, created_at__lte=now)
When I run this query, it gives me the following error:
'Count' object has no attribute 'split'
Any idea how to use it?

Remove Count('creation_date') from values() and add .annotate(Count('creation_date')) after filter().

Try
records = Article.objects.filter(author='scott_tiger', creation_date__gt=date_diff,
                                 creation_date__lte=now).values('creation_date').annotate(
    ccd=Count('creation_date')).values('creation_date', 'ccd')
You need to use creation_date__count (the default name) or a custom name (ccd here) to refer to the count column after annotate(). aggregate() would not work here, because it returns a single dict for the whole queryset instead of one row per group.
Also, values() before annotate() limits the GROUP BY columns, and the last values() declares the columns to be selected. There is no need to group by the COUNT, which is already computed per group of rows.

Can Django ORM do an ORDER BY on a specific value of a column?

I have a table 'tickets' with the following columns
id - primary key - auto increment
title - varchar(256)
status - smallint(6) - Can have any value between 1 and 5, handled by Django
When I do a SELECT *, I want the rows with status = 4 at the top, followed by the other records. It can be achieved with the following query:
select * from tickets order by status=4 DESC
Can this query be executed through Django ORM? What parameters should be passed to the QuerySet.order_by() method?

q = Ticket.objects.extra(select={'is_top': "status = 4"})
q = q.extra(order_by = ['-is_top'])

I did this while using PostgreSQL with Django:
from django.db.models import Case, Count, When
Ticket.objects.annotate(
relevancy=Count(Case(When(status=4, then=1)))
).order_by('-relevancy')
It will return all objects from Ticket, but tickets with status = 4 will be at the beginning.
Hope someone will find it useful.

For anyone who, like me, stumbles on this now and is using a newer version of Django:
from django.db.models import Case, IntegerField, When
Ticket.objects.annotate(
    relevancy=Case(
        When(status=4, then=1),
        When(status=3, then=2),
        When(status=2, then=3),
        default=0,  # unmatched statuses get 0 instead of NULL, so they sort last on every backend
        output_field=IntegerField()
    )
).order_by('-relevancy')
Using Count() only returns 1 or 0 depending on whether the case matched, which is not ideal when ordering by several statuses.

We Keep Coding
