Group by date in a particular format in SQLAlchemy - python

I have a table called logs which has a datetime field.
I want to select the date and count of rows based on a particular date format.
How do I do this using SQLAlchemy?

I don't know of a generic SQLAlchemy answer. Most databases support some form of date formatting, typically via functions. SQLAlchemy supports calling functions via sqlalchemy.sql.func. So, for example, using SQLAlchemy over a Postgres back end with a table my_table(foo varchar(30), when timestamp), I might do something like:
from sqlalchemy import func, select

my_table = metadata.tables['my_table']
# date_trunc() truncates the timestamp to the given precision, here month
the_date = func.date_trunc('month', my_table.c['when'])
stmt = select([the_date, func.count()]).group_by(the_date)
engine.execute(stmt)
To group by date truncated to month. But keep in mind that in that example, date_trunc() is a Postgres datetime function. Other databases will be different. You didn't mention the underlying database. If there's a database-independent way to do it, I've never found one. In my case I run production and test against Postgres and unit tests against SQLite, and have resorted to using SQLite user-defined functions in my unit tests to emulate Postgres datetime functions, roughly as sketched below.
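A minimal sketch of that trick, assuming timestamps are stored as ISO-formatted strings (SQLAlchemy's default for SQLite) and that only date_trunc('month', ...) needs emulating:
from datetime import datetime

from sqlalchemy import create_engine, event

def sqlite_date_trunc(precision, value):
    # Emulate Postgres date_trunc() for ISO-formatted timestamp strings;
    # only the 'month' precision is handled in this sketch
    dt = datetime.fromisoformat(value)
    if precision == 'month':
        dt = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return dt.isoformat(sep=' ')

engine = create_engine('sqlite://')

@event.listens_for(engine, 'connect')
def register_udfs(dbapi_conn, conn_record):
    # Register the substitute under the Postgres function's name so the
    # same query text runs against both back ends
    dbapi_conn.create_function('date_trunc', 2, sqlite_date_trunc)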

Does counting yield the same result when you just group by the unformatted datetime column? If so, you could just run the query and use Python date's strftime() method afterwards. i.e.
query = select([logs.c.datetime, func.count(logs.c.datetime)]).group_by(logs.c.datetime)
results = session.execute(query).fetchall()
results = [(t[0].strftime("..."), t[1]) for t in results]
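If several distinct datetimes format to the same string, though, the per-datetime counts need to be merged rather than just relabelled; a small sketch of doing that with a Counter:
from collections import Counter

rows = session.execute(query).fetchall()
merged = Counter()
for dt, n in rows:
    merged[dt.strftime("%Y-%m-%d")] += n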

I don't know SQLAlchemy, so I could be off-target. However, I think that all you need is:
SELECT date_formatter(datetime_field, "format-specification") AS dt_field, COUNT(*)
FROM logs
GROUP BY date_formatter(datetime_field, "format-specification")
ORDER BY 1;
OK, maybe you don't need the ORDER BY, and maybe it would be better to re-specify the date expression. There are likely to be alternatives, such as:
SELECT dt_field, COUNT(*)
FROM (SELECT date_formatter(datetime_field, "format-specification") AS dt_field
      FROM logs) AS necessary
GROUP BY dt_field
ORDER BY dt_field;
And so on and so forth. Basically, you format the datetime field and then proceed to do the grouping etc on the formatted value.
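In SQLAlchemy terms this might look like the following sketch, assuming a Postgres back end where to_char() plays the role of date_formatter and logs is the table from the question:
from sqlalchemy import func, select

# Format first, then group and order by the formatted value
dt_field = func.to_char(logs.c.datetime, 'YYYY-MM-DD').label('dt_field')
stmt = select([dt_field, func.count()]).group_by(dt_field).order_by(dt_field)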

Related

Nesting Django QuerySets

Is there a way to create a queryset that operates on a nested queryset?
The simplest example I can think of to explain what I'm trying to accomplish is by demonstration.
I would like to write code something like
SensorReading.objects.filter(reading=1).objects.filter(sensor=1)
resulting in SQL looking like
SELECT * FROM (
    SELECT * FROM SensorReading WHERE reading=1
) WHERE sensor=1;
More specifically I have a model representing readings from sensors
class SensorReading(models.Model):
    sensor = models.PositiveIntegerField()
    timestamp = models.DateTimeField()
    reading = models.IntegerField()
With this I am creating a queryset that annotates every reading with the elapsed time since the previous reading, in seconds:
readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor")],
            order_by=["timestamp"],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
)
I now want to aggregate those per sensor to see the minimum and maximum elapsed time between readings from every sensor. I initially tried
readings.values("sensor").annotate(max=Max('delta'),min=Min('delta'))[0]
however, this fails because window values cannot be used inside the aggregate.
Are there any methods or libraries to solve this without needing to resort to raw SQL? Or have I just overlooked a simpler solution to the problem?
The short answer is yes, you can, using the id__in lookup together with a Subquery from the django.db.models module.
The long answer is how:
You can create a subquery that retrieves the filtered SensorReading objects, and then use that subquery in the main queryset. For example:
from django.db.models import Subquery
subquery = SensorReading.objects.filter(reading=1).values('id')
readings = SensorReading.objects.filter(id__in=Subquery(subquery), sensor=1)
The above code will generate SQL that is similar to what you described in your example:
SELECT * FROM SensorReading
WHERE id IN (SELECT id FROM SensorReading WHERE reading=1)
AND sensor=1;
Another way is to chain filter() onto the queryset you have already created, adding the second filter on top of it:
readings = (
    SensorReading.objects.filter(**filters)
    .annotate(
        previous_read=Window(
            expression=window.Lead("timestamp"),
            partition_by=[F("sensor")],
            order_by=["timestamp"],
            frame=RowRange(start=-1, end=0),
        )
    )
    .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    .filter(sensor=1)
)
UPDATE:
As you commented below, you can use raw SQL to aggregate the window function values without running the subquery multiple times. Manager.raw() lets you include raw SQL in a queryset and iterate over the resulting objects. (There is also the RawSQL expression in django.db.models.expressions for embedding raw fragments inside an ORM query.)
For example, you can write a raw SQL query that retrieves the filtered SensorReading objects along with the previous_read and delta fields computed by the window function, and run it through Manager.raw():
raw_sql = '''
SELECT id, sensor, timestamp, reading,
       LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp) AS previous_read,
       ABS(EXTRACT(EPOCH FROM timestamp
           - LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp))) AS delta
FROM myapp_sensorreading
WHERE reading = 1
'''
readings = SensorReading.objects.raw(raw_sql)
Note, however, that Manager.raw() returns a RawQuerySet, which does not support values() or annotate(), so you cannot aggregate it with readings.values("sensor").annotate(max=Max('delta'), min=Min('delta')); either push the aggregation into the SQL itself or post-process the fetched rows in Python.
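A sketch of the first option, wrapping the window query in an outer aggregate and executing it over a plain cursor (table and column names follow the raw SQL above):
from django.db import connection

agg_sql = '''
SELECT sensor, MIN(delta) AS min, MAX(delta) AS max
FROM (
    SELECT sensor,
           ABS(EXTRACT(EPOCH FROM timestamp
               - LAG(timestamp) OVER (PARTITION BY sensor ORDER BY timestamp))) AS delta
    FROM myapp_sensorreading
    WHERE reading = 1
) AS inner_q
GROUP BY sensor
'''
with connection.cursor() as cursor:
    cursor.execute(agg_sql)
    per_sensor = cursor.fetchall()  # [(sensor, min, max), ...]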
Just be aware of the security implications of using raw SQL, as it allows you to include user input directly in the query, which could lead to SQL injection attacks. Be sure to properly validate and sanitize any user input that you use in a raw SQL query.
Ended up rolling my own solution: basically, it introspects the queryset to create a fake table for use in a new queryset, setting the alias to a node that knows how to render the SQL for the inner query.
This allows me to do something like
readings = (
    NestedQuery(
        SensorReading.objects.filter(**filters)
        .annotate(
            previous_read=Window(
                expression=window.Lead("timestamp"),
                partition_by=[F("sensor")],
                order_by=["timestamp"],
                frame=RowRange(start=-1, end=0),
            )
        )
        .annotate(delta=Abs(Extract(F("timestamp") - F("previous_read"), "epoch")))
    )
    .values("sensor")
    .annotate(min=Min("delta"), max=Max("delta"))
)
The code is available on GitHub, and I've published it on PyPI:
https://github.com/Simage/django-nestedquery
I have no doubt that I'm still leaking table aliases or some such nonsense; this should be considered a proof of concept, not any sort of production code.

Using jsonb_array_elements in sqlalchemy query

I'm using SQLAlchemy ORM and trying to figure out how to produce a PostgreSQL query something along the lines of:
SELECT
    payments.*
FROM
    payments,
    jsonb_array_elements(payments.data #> '{refunds}') refunds
WHERE
    (refunds ->> 'created_at')
    BETWEEN '2018-12-01T19:30:38Z' AND '2018-12-02T19:30:38Z';
Though with the start date inclusive and stop date exclusive.
I've been able to get close with:
refundsInDB = db_session.query(Payment).\
    filter(Payment.data['refunds', 0, 'created_at'].astext >= startTime,
           Payment.data['refunds', 0, 'created_at'].astext < stopTime).\
    all()
However, this only works if the refund (which is a nested array in the JSONB data) is the first element in the list of {'refunds':[]} whereas the SQL query above will work regardless of the position in the refund list.
After a good bit of searching it looks like there are some temporary recipes in an old SQLAlchemy github issue, one of which talks about using jsonb_array_elements in a query, but I haven't been able to quite make it work in the way I'd like.
If it helps my Payment.data JSONB is exactly like the Payment object from the Square Connect v1 API and I am trying to search the data using the created_at date found in the nested refunds list of refund objects.
Use Query.select_from() and a function expression alias to perform the query:
from sqlalchemy import column, func
from sqlalchemy.dialects.postgresql import JSONB

# Create a column object for referencing the `value` attribute produced by
# the set of rows returned by jsonb_array_elements
val = column('value', type_=JSONB)

refundsInDB = db_session.query(Payment).\
    select_from(
        Payment,
        func.jsonb_array_elements(Payment.data['refunds']).alias()).\
    filter(val['created_at'].astext >= startTime,
           val['created_at'].astext < stopTime).\
    all()
Note that unnesting the jsonb array and filtering on it may produce multiple rows per Payment, but when querying for a single entity SQLAlchemy will only return distinct entities.
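If you would rather make the deduplication explicit at the SQL level instead of relying on that behavior, one option is DISTINCT ON the primary key; a Postgres-specific sketch:
refundsInDB = db_session.query(Payment).\
    select_from(
        Payment,
        func.jsonb_array_elements(Payment.data['refunds']).alias()).\
    filter(val['created_at'].astext >= startTime,
           val['created_at'].astext < stopTime).\
    distinct(Payment.id).\
    all()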

How to select only date part in postgres by using django ORM

My backend setup is as follows:
Postgres 9.5
Python 2.7
Django 1.9
I have a table with a datetime column named createdAt. I want to use the Django ORM to select this field with only the date part and group by that date.
For example, this createdAt field may store 2016-12-10 00:00:00+0, 2016-12-11 10:10:05+0, 2016-12-11 17:10:05+0, etc.
Using the Django ORM, the output should be 2016-12-10, 2016-12-11. The corresponding SQL would be similar to: SELECT date(createdAt) FROM TABLE GROUP BY date(createdAt).
Thanks.
You can try this:
use the __date lookup to filter a DateTimeField by its date part: createAt__date, similar to __lt or __gt;
use annotate with Func/F to create an extra field based on createAt holding only the date part;
use values and annotate with Count/Sum to build the group-by query.
Example code
from django.db import models
from django.db.models import Func, F, Count

class Post(models.Model):
    name = models.CharField('name', max_length=255)
    createAt = models.DateTimeField('create at', auto_now=True)

Post.objects.filter(createAt__date='2016-12-26')  # filter the datetime field on a specific date
    .annotate(createAtDate=Func(F('createAt'), function='date'))  # extra field holding the date part, i.e. date(createAt) in SQL
    .values('createAtDate')  # group by createAtDate
    .annotate(count=Count('id'))  # aggregate, like count(id) in SQL
output
[{'count': 2, 'createAtDate': datetime.date(2016, 12, 26)}]
Notice:
To show what each method does, I have broken the long statement into several pieces, so you need to rejoin the lines (or wrap the whole expression in parentheses) when you paste it into your code.
When you have timezone support turned on in your application (USE_TZ = True), you need to be more careful when comparing dates in Django; the timezone matters for queries like the ones above.
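For instance, a sketch of comparing against an aware datetime range rather than a bare date (assuming USE_TZ = True and the Post model above):
from datetime import datetime

from django.utils import timezone

# Build aware bounds in the project's current time zone, then filter on
# the half-open range [day_start, day_end)
day_start = timezone.make_aware(datetime(2016, 12, 26))
day_end = timezone.make_aware(datetime(2016, 12, 27))
Post.objects.filter(createAt__gte=day_start, createAt__lt=day_end)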
Hope it would help. :)
Django provides Manager.raw() to perform raw queries on a model, but since a raw query must contain the primary key, it is not useful in your case. When Manager.raw() is not quite enough and you need to perform queries that don't map cleanly to models, you can access the database directly, routing around the model layer entirely, by using connection. The connection object represents the default database connection. So you can perform your query like:
from django.db import connection

# acquire a cursor object by calling connection.cursor()
cursor = connection.cursor()
cursor.execute('SELECT date(created_at) FROM TABLE_NAME GROUP BY date(created_at)')
After executing the query you can iterate over the cursor object (or call cursor.fetchall()) to get the results.

SQLAlchemy Filter based on a function of an a field of a table

I am trying to filter a query based on a function of one of the columns of a table. For example, assume I have a table Days which has a column day of type DateTime. Now I just want to select the rows where the day happens to be a Monday, something like:
db.query(Days).filter(Days.day.strftime('%w') == '1')
but this does not work! SQLAlchemy complains because strftime is not an attribute of the column. What is the correct way of doing it?
When writing queries, you have to realize that the columns you are accessing are SQLAlchemy columns, not Python values. They will only be Python values once you are looking at an actual row from the result of the query. When writing a query, you need to phrase it in terms of SQL expressions.
The date expressions differ between databases. For PostgreSQL, use the extract() function, which returns a number 0 (Sunday) through 6 (Saturday). For MySQL, use the dayofweek function, which returns a number 1 (Sunday) through 7 (Saturday).
# PostgreSQL
from sqlalchemy.sql import extract
session.query(MyTable).filter(extract('dow', MyTable.my_date) == 1)
# MySQL
from sqlalchemy import func
session.query(MyTable).filter(func.dayofweek(MyTable.my_date) == 2)
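For SQLite (not covered above, but closest to the strftime() the question reached for), a sketch using the generic func: strftime('%w') returns '0' (Sunday) through '6' (Saturday) as a string:
# SQLite
from sqlalchemy import func
session.query(MyTable).filter(func.strftime('%w', MyTable.my_date) == '1')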

How to do a group by/count(*) per year, month and day in Django based on a datetime database field?

I have a table describing files with a datetime field. I want to create a report that gives the number of files grouped by each year, by each year and month, and by each year, month and day. I just want records where count(*) > 0. Preferably using the ORM in Django, or if that's not possible, using some SQL that runs on both PostgreSQL and SQLite.
The number of records in this database can be huge, so my attempts to do this in code rather than in SQL (or indirectly in SQL through the ORM) don't work, and even if I got it working I don't think it would scale at all.
Grateful for any hints or solutions.
Normally I work on Oracle, but a quick Google search showed that this should also work for Postgres. Down to the minute you could do it like this:
select to_char(yourtimestamp,'yyyymmdd hh24:mi'), count(*)
from yourtable
group by to_char(yourtimestamp,'yyyymmdd hh24:mi')
order by to_char(yourtimestamp,'yyyymmdd hh24:mi') DESC;
The same pattern works at any granularity, down to just the year:
select to_char(yourtimestamp,'yyyy'), count(*)
from yourtable
group by to_char(yourtimestamp,'yyyy')
order by to_char(yourtimestamp,'yyyy') DESC;
You only get the years where you have rows, which I think is what you wanted.
Edit: you need to build an index on yourtimestamp (ideally an expression index on the to_char() result), otherwise performance will be ugly if you have a lot of rows.
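Since the question prefers the Django ORM where possible: on Django 1.10+ the Trunc functions express the same grouping. A sketch, with the model and field names (MyFile, created_at) assumed:
from django.db.models import Count
from django.db.models.functions import TruncMonth

per_month = (
    MyFile.objects
    .annotate(period=TruncMonth('created_at'))  # or TruncYear / TruncDay
    .values('period')
    .annotate(count=Count('id'))
    .order_by('-period')
)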
My mistake - the date() function only works for MySQL.
Maybe try this (SQLite):
tbl = MyTable.objects.filter()
tbl = tbl.extra(select={'count': "count(strftime('%Y-%m-%d', timestamp))",
                        'my_date': "strftime('%Y-%m-%d', timestamp)"})
tbl = tbl.values('count', 'my_date')
tbl.query.group_by = ["strftime('%Y-%m-%d', timestamp)"]
For day and month, you could replace '%Y-%m-%d' with variations of the date format strings.
This was for MySQL (just in case someone needs it)
tbl = MyTable.objects.filter()
tbl = tbl.extra(select={'count': 'count(date(timestamp))', 'my_date': 'date(timestamp)'})
tbl = tbl.values('count', 'my_date')
tbl.query.group_by = ['date(timestamp)']
The same approach works for month or year by changing the date expression.
