How to select only the date part in Postgres using the Django ORM - python

My backend setup is as follows:
Postgres 9.5
Python 2.7
Django 1.9
I have a table with a datetime column named createdAt. I want to use the Django ORM to select this field with only the date part and group by that date.
For example, the createdAt column may store 2016-12-10 00:00:00+0, 2016-12-11 10:10:05+0, 2016-12-11 17:10:05+0, etc.
Using the Django ORM, the output should be 2016-12-10, 2016-12-11. The corresponding SQL would be similar to: SELECT date(createdAt) FROM TABLE GROUP BY date(createdAt).
Thanks.

You can try this:
use the __date lookup to filter the DateTimeField by its date part (createAt__date), similar to __lt or __gt.
use annotate with Func and F to create an extra field based on createAt that holds only the date part.
use values plus annotate with Count/Sum to build the group-by query.
Example code
from django.db import models
from django.db.models import Func, F, Count

class Post(models.Model):
    name = models.CharField('name', max_length=255)
    createAt = models.DateTimeField('create at', auto_now=True)

Post.objects.filter(createAt__date='2016-12-26')  # filter the datetime field on its date part
    .annotate(createAtDate=Func(F('createAt'), function='date'))  # extra field createAtDate holding the date part of createAt, like date(createAt) in SQL
    .values('createAtDate')  # group by createAtDate
    .annotate(count=Count('id'))  # aggregate function, like count(id) in SQL
output
[{'count': 2, 'createAtDate': datetime.date(2016, 12, 26)}]
Notice:
To show what each method does, I have broken the long statement into several pieces, so you need to join the lines (or wrap the whole expression in parentheses) when you paste it into your code.
When you have time zone support turned on in your application (USE_TZ = True), you need to be more careful when comparing dates in Django; the time zone matters when making queries like the one above.
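For instance, a minimal sketch of a timezone-aware comparison (the zone name is only an example):
from django.utils import timezone

# with USE_TZ = True, the __date lookup converts createAt to the *current*
# time zone before extracting the date, so activate the zone you intend
timezone.activate('Asia/Taipei')  # example zone; string names need pytz
today = timezone.localtime(timezone.now()).date()
Post.objects.filter(createAt__date=today)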
Hope it helps. :)

Django provides Manager.raw() to perform raw queries on a model, but since a raw query must include the primary key, that is not useful in your case. When Manager.raw() is not enough and you need to perform queries that don't map cleanly to models, you can always access the database directly, routing around the model layer entirely, by using connection. The connection object represents the default database connection, so you can perform your query like:
from django.db import connection
# acquire a cursor object by calling connection.cursor()
cursor = connection.cursor()
cursor.execute('SELECT date(created_at) FROM TABLE_NAME GROUP BY date(created_at)')
After executing the query, you can iterate over the cursor object to get the results.
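For example (a sketch; with psycopg2 the date() column comes back as datetime.date objects in one-element tuples):
rows = cursor.fetchall()
# e.g. [(datetime.date(2016, 12, 10),), (datetime.date(2016, 12, 11),)]
for (created_date,) in rows:
    print(created_date)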


Nesting Django QuerySets

Is there a way to create a queryset that operates on a nested queryset?
The simplest way I can think of to explain what I'm trying to accomplish is by demonstration.
I would like to write code something like
SensorReading.objects.filter(reading=1).objects.filter(sensor=1)
resulting in SQL looking like
SELECT * FROM (
    SELECT * FROM SensorReading WHERE reading=1
) AS nested WHERE sensor=1;
More specifically I have a model representing readings from sensors
class SensorReading(models.Model):
    sensor = models.PositiveIntegerField()
    timestamp = models.DateTimeField()
    reading = models.IntegerField()
With this I am creating a queryset that annotates every reading with the elapsed time since the previous reading from the same sensor, in seconds
readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor")],
            order_by=["timestamp"],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
)
I now want to aggregate those per sensor to see the minimum and maximum elapsed time between readings from every sensor. I initially tried
readings.values("sensor").annotate(max=Max('delta'),min=Min('delta'))[0]
however, this fails because window values cannot be used inside the aggregate.
Are there any methods or libraries to solve this without needing to resort to raw SQL? Or have I just overlooked a simpler solution to the problem?
The short answer is yes, you can, using the id__in lookup together with the Subquery expression from the django.db.models module in the filter method.
The long answer is how:
You can create a subquery that retrieves the filtered SensorReading objects, and then use that subquery in the main queryset. For example:
from django.db.models import Subquery
subquery = SensorReading.objects.filter(reading=1).values('id')
readings = SensorReading.objects.filter(id__in=Subquery(subquery), sensor=1)
The above code will generate SQL that is similar to what you described in your example:
SELECT * FROM SensorReading
WHERE id IN (SELECT id FROM SensorReading WHERE reading=1)
AND sensor=1;
Another way is to chain filter() onto the queryset you have created, adding the second filter on top of it:
readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor")],
            order_by=["timestamp"],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    .filter(sensor=1)
)
UPDATE:
As you commented below, you can fall back to Manager.raw() to apply the window function in raw SQL, so the subquery is not run multiple times. This lets you include raw SQL in the queryset and work with its results as model instances.
For example, you can write a raw SQL query that retrieves the filtered SensorReading objects together with the previous_read and delta fields computed by the window function:
raw_sql = '''
SELECT id, sensor, timestamp, reading,
       LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp) AS previous_read,
       ABS(EXTRACT(EPOCH FROM timestamp - LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp))) AS delta
FROM myapp_sensorreading
WHERE reading = 1
'''
readings = SensorReading.objects.raw(raw_sql)
You can then iterate over readings to use the computed previous_read and delta values. Note, however, that raw() returns a RawQuerySet, which does not support values() or annotate(), so the per-sensor min/max aggregation itself has to be done in SQL (or in Python).
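If you want the min/max per sensor computed in the database, a minimal sketch is to wrap the window query in an outer aggregate and run it with a cursor (the table name myapp_sensorreading is carried over from the example above; adjust for your app label):
from django.db import connection

outer_sql = '''
SELECT sensor, MIN(delta) AS min_delta, MAX(delta) AS max_delta
FROM (
    SELECT sensor,
           ABS(EXTRACT(EPOCH FROM timestamp
               - LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp))) AS delta
    FROM myapp_sensorreading
    WHERE reading = 1
) AS windowed
GROUP BY sensor
'''
with connection.cursor() as cursor:
    cursor.execute(outer_sql)
    rows = cursor.fetchall()  # [(sensor, min_delta, max_delta), ...]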
Just be aware of the security implications of raw SQL: if you interpolate user input directly into the query string, you expose yourself to SQL injection attacks. Pass user input via the params argument instead, and validate it as appropriate.
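For example, a sketch of a parametrized raw query (reading_value stands in for user-supplied input):
# the database driver escapes params, so user input never becomes SQL text
readings = SensorReading.objects.raw(
    "SELECT * FROM myapp_sensorreading WHERE reading = %s", [reading_value]
)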
Ended up rolling my own solution: basically, it introspects the queryset to create a fake table for use in the construction of a new queryset, and sets the alias to a node that knows how to render the SQL for the inner query.
This allows me to do something like
readings = (
    NestedQuery(
        SensorReading.objects.filter(**filters)
        .annotate(
            previous_read=Window(
                expression=window.Lead("timestamp"),
                partition_by=[F("sensor")],
                order_by=["timestamp"],
                frame=RowRange(start=-1, end=0),
            )
        )
        .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    )
    .values("sensor")
    .annotate(min=Min("delta"), max=Max("delta"))
)
The code is available on GitHub, and I've published it on PyPI:
https://github.com/Simage/django-nestedquery
I have no doubt that I'm still leaking the tables or some such nonsense; this should be considered a proof of concept, not any sort of production code.

Django join-like query with no where condition

Say I have 3 models in Django
class Instrument(models.Model):
    ticker = models.CharField(max_length=30, unique=True, db_index=True)

class Instrument_df(models.Model):
    instrument = models.OneToOneField(
        Instrument,
        on_delete=models.CASCADE,
        primary_key=True,
    )

class Quote(models.Model):
    instrument = models.ForeignKey(Instrument, on_delete=models.CASCADE)
I just want to query all Quotes that correspond to an instrument of 'DF' type. In SQL I would join Quote and Instrument_df on the id field.
Using Django's ORM I came up with
Quote.objects.filter(instrument__instrument_df__instrument_id__gte=-1)
I think this does the job, but I see two drawbacks:
1) I am joining 3 tables, when in fact table Instrument would not need to be involved.
2) I had to insert the trivial id > -1 condition, which always holds. This looks awfully artificial.
How should this query be written?
Thanks!
Assuming Instrument_df has other fields not shown in the snippet (otherwise the table is useless and could be replaced by a flag on Instrument), a possible solution is to use either a subquery or two queries:
# with a subquery
dfids = Instrument_df.objects.values_list("instrument", flat=True)
Quote.objects.filter(instrument__in=dfids)
# with two queries (can be faster on MySQL)
dfids = list(Instrument_df.objects.values_list("instrument", flat=True))
Quote.objects.filter(instrument__in=dfids)
Whether this will perform better than your current solution depends on your database vendor and version (MySQL used to be notoriously bad at handling subqueries; I don't know if that's still the case) and on the actual data.
But I think the best solution here would be a plain raw query. This is a bit less portable and may require more care in case of a schema update (hint: use a custom manager and write this query as a manager method so you have a single point of truth; you don't want to scatter raw SQL queries across your views).
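For instance, a minimal sketch of that manager approach (the method name df_quotes and the myapp_* table names are assumptions based on Django's default naming, not from the original answer):
from django.db import models

class QuoteManager(models.Manager):
    def df_quotes(self):
        # join Quote straight to Instrument_df, skipping the Instrument table;
        # SELECT q.* keeps the primary key, which Manager.raw() requires
        return self.raw("""
            SELECT q.*
            FROM myapp_quote AS q
            INNER JOIN myapp_instrument_df AS d
                ON d.instrument_id = q.instrument_id
        """)

class Quote(models.Model):
    instrument = models.ForeignKey(Instrument, on_delete=models.CASCADE)
    objects = QuoteManager()
Quote.objects.df_quotes() then yields Quote instances without involving the Instrument table at all.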

select in (select ..) using ORM django

How can I make a query like
select name where id in (select id from ...)
using the Django ORM? I could do it with one for loop to obtain a first result and another loop to use that result, but that is not practical; it is simpler as a single SQL query, and it should be just as simple in Python.
I have these models:
class Invoice(models.Model):
    factura_id = models.IntegerField(unique=True)
    created_date = models.DateTimeField()
    store_id = models.ForeignKey(Store, blank=False)

class invoicePayments(models.Model):
    invoice = models.ForeignKey(Factura)
    date = models.DateTimeField()  # auto_now = True
    money = models.DecimalField(max_digits=9, decimal_places=0)
I need to get the payments of an invoice filtered by store_id and payment date.
In MySQL I would make this query using a select in (select ...). It is a simple query, but the only way I can think to do something similar with the Django ORM is with a for loop, and I don't like that idea:
invoiceXstore = invoice.objects.filter(local=3)
for a in invoiceXstore:
    payments = invoicePayments.objects.filter(invoice=a.id,
        date__range=["2016-05-01", "2016-05-06"])
You can traverse ForeignKey relations using double underscores (__) in the Django ORM. For example, your query could be implemented as:
payments = invoicePayments.objects.filter(invoice__store_id=3,
date__range=["2016-05-01", "2016-05-06"])
I guess you renamed your classes to English before posting here; in that case, you may need to change the first part to factura__local=3.
As a side note, it is recommended to rename your model class to InvoicePayments (with a capital I) to be more compliant with PEP 8.
Your MySQL raw query is a subquery:
select name where id in (select id from ...)
In MySQL this will usually be slower than an INNER JOIN (refer to http://dev.mysql.com/doc/refman/5.7/en/rewriting-subqueries.html), so you can rewrite your raw query as an INNER JOIN, which will look like:
SELECT ip.* FROM invoicepayments ip
    INNER JOIN invoice i ON ip.invoice_id = i.id
You can then use a WHERE clause to apply the filtering.
The looping query approach you have tried does work, but it is not recommended because it results in a large number of queries being executed. Instead you can do:
InvoicePayments.objects.filter(invoice__local=3,
date__range=("2016-05-01", "2016-05-06"))
I am not quite sure what 'local' stands for because your model does not show any field like that. Please update your model with the correct field or edit the query as appropriate.
To learn about __range, see https://docs.djangoproject.com/en/1.9/ref/models/querysets/#range

Django: Selecting distinct values on maximum foreign key value

I have the following models which I'm testing with SQLite3 and MySQL:
# (various model fields extraneous to discussion removed...)
class Run(models.Model):
    runNumber = models.IntegerField()

class Snapshot(models.Model):
    t = models.DateTimeField()

class SnapshotRun(models.Model):
    snapshot = models.ForeignKey(Snapshot)
    run = models.ForeignKey(Run)
    # other fields which make it possible to have multiple distinct Run objects per Snapshot
I want a query which will give me a set of runNumbers & snapshot IDs for which the Snapshot.id is below some specified value. Naively I would expect this to work:
print SnapshotRun.objects.filter(snapshot__id__lte=ss_id)\
.order_by("run__runNumber", "-snapshot__id")\
.distinct("run__runNumber", "snapshot__id")\
.values("run__runNumber", "snapshot__id")
But this blows up with
NotImplementedError: DISTINCT ON fields is not supported by this database backend
for both database backends. Postgres is unfortunately not an option.
Time to fall back to raw SQL?
Update:
Since Django's ORM won't help me out of this one (thanks #jknupp) I did manage to get the following raw SQL to work:
cursor.execute("""
SELECT r.runNumber, ssr1.snapshot_id
FROM livedata_run AS r
JOIN livedata_snapshotrun AS ssr1
ON ssr1.id =
(
SELECT id
FROM livedata_snapshotrun AS ssr2
WHERE ssr2.run_id = r.id
AND ssr2.snapshot_id <= %s
ORDER BY snapshot_id DESC
LIMIT 1
);
""", max_ss_id)
Here livedata is the Django app these tables live in.
The note in the Django documentation is pretty clear:
Note:
Any fields used in an order_by() call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with distinct(). If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don't appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.
Similarly, if you use a values() query to restrict the columns selected, the columns used in any order_by() (or default model ordering) will still be involved and may affect uniqueness of the results.
The moral here is that if you are using distinct() be careful about ordering by related models. Similarly, when using distinct() and values() together, be careful when ordering by fields not in the values() call.
Also, below that:
This ability to specify field names (with distinct) is only available in PostgreSQL.
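For this particular "largest snapshot per run" requirement, a portable alternative to DISTINCT ON (a sketch, not from the quoted docs) is to aggregate instead:
from django.db.models import Max

# one row per runNumber with the largest snapshot id <= ss_id;
# works on SQLite and MySQL because it avoids DISTINCT ON entirely
SnapshotRun.objects.filter(snapshot__id__lte=ss_id) \
    .values("run__runNumber") \
    .annotate(max_snapshot_id=Max("snapshot__id"))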

Group by date in a particular format in SQLAlchemy

I have a table called logs which has a datetime field.
I want to select the date and count of rows based on a particular date format.
How do I do this using SQLAlchemy?
I don't know of a generic SQLAlchemy answer. Most databases support some form of date formatting, typically via functions, and SQLAlchemy supports calling functions via sqlalchemy.sql.func. For example, using SQLAlchemy over a Postgres back end with a table my_table(foo varchar(30), when timestamp), I might do something like
from sqlalchemy.sql import func, select

my_table = metadata.tables['my_table']
foo = my_table.c['foo']
the_date = func.date_trunc('month', my_table.c['when'])
stmt = select([foo, the_date]).group_by(the_date)
engine.execute(stmt)
to group by date truncated to month. But keep in mind that in that example, date_trunc() is a Postgres datetime function; other databases will differ, and you didn't mention the underlying database. If there's a database-independent way to do it, I've never found one. In my case I run production and test against Postgres and unit tests against SQLite, and have resorted to using SQLite user-defined functions in my unit tests to emulate Postgres datetime functions.
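For what it's worth, a rough sketch of that SQLite emulation trick (the function body only handles 'month' truncation of ISO-formatted timestamp strings, and is an assumption for illustration):
import sqlalchemy
from sqlalchemy import event

def date_trunc(precision, value):
    # emulate Postgres date_trunc('month', ts) for ISO "YYYY-MM-DD HH:MM:SS" strings
    if precision == 'month' and value is not None:
        return value[:7] + '-01 00:00:00'
    return value

engine = sqlalchemy.create_engine('sqlite://')

@event.listens_for(engine, 'connect')
def register_functions(dbapi_conn, conn_record):
    # register the Python fallback on every new SQLite connection
    dbapi_conn.create_function('date_trunc', 2, date_trunc)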
Does counting yield the same result when you just group by the unformatted datetime column? If so, you could run the query as-is and apply Python's date strftime() method afterwards, i.e.
query = select([logs.c.datetime, func.count(logs.c.datetime)]).group_by(logs.c.datetime)
results = session.execute(query).fetchall()
results = [(t[0].strftime("..."), t[1]) for t in results]
I don't know SQLAlchemy, so I could be off-target. However, I think that all you need is:
SELECT date_formatter(datetime_field, "format-specification") AS dt_field, COUNT(*)
FROM logs
GROUP BY date_formatter(datetime_field, "format-specification")
ORDER BY 1;
OK, maybe you don't need the ORDER BY, and maybe it would be better to re-specify the date expression. There are likely to be alternatives, such as:
SELECT dt_field, COUNT(*)
FROM (SELECT date_formatter(datetime_field, "format-specification") AS dt_field
FROM logs) AS necessary
GROUP BY dt_field
ORDER BY dt_field;
And so on and so forth. Basically, you format the datetime field and then proceed to do the grouping etc on the formatted value.
