extra column based on aggregated results in Django - python

I have a table storing KLine information including security id, max/min price and date and want to calculate the gains for each security in a certain period. Here's my function
def get_rising_rate(start, end):
    return models.KLine.objects.\
        filter(start_time__gt=start).\
        filter(start_time__lt=end).\
        values("security__id").\
        annotate(min_price=django_models.Min("min_price_in_cent"),
                 max_price=django_models.Max("max_price_in_cent")).\
        extra(select={"gain": "max_price_in_cent/min_price_in_cent"}).\
        order_by("gain")
But I got the following error:
django.db.utils.OperationalError: (1247, "Reference 'max_price' not supported (reference to group function)")
I can do the query with raw SQL like
SELECT
    `security_id`,
    `min_price`,
    `max_price`,
    `max_price` / `min_price` AS gain
FROM (
    SELECT
        `security_id`,
        MIN(`min_price_in_cent`) AS `min_price`,
        MAX(`max_price_in_cent`) AS `max_price`
    FROM `stocks_kline`
    WHERE `start_time` > '2014-12-31 16:00:00' AND `start_time` < '2015-12-31 16:00:00'
    GROUP BY `security_id`
) AS A
ORDER BY gain DESC
But I wonder if there's a more "django" way to get it done. I've searched "django join queryset" and "django queryset as derived tables" but can't find a solution.
Thanks in advance.

Can you try this:
def get_rising_rate(start, end):
    return models.KLine.objects.\
        filter(start_time__gt=start).\
        filter(start_time__lt=end).\
        values("security_id").\
        annotate(min_price=django_models.Min("min_price_in_cent"),
                 max_price=django_models.Max("max_price_in_cent")).\
        extra(select={"gain": "max(max_price_in_cent)/min(min_price_in_cent)"}).\
        order_by("gain")


Better Alternate instead of using chained union queries in Django ORM

I needed to achieve something like this in Django ORM :
(SELECT * FROM `stats` WHERE MODE = 1 ORDER BY DATE DESC LIMIT 2)
UNION
(SELECT * FROM `stats` WHERE MODE = 2 ORDER BY DATE DESC LIMIT 2)
UNION
(SELECT * FROM `stats` WHERE MODE = 3 ORDER BY DATE DESC LIMIT 2)
UNION
(SELECT * FROM `stats` WHERE MODE = 6 ORDER BY DATE DESC LIMIT 2)
UNION
(SELECT * FROM `stats` WHERE MODE = 5 AND is_completed != 3 ORDER BY DATE DESC)
# mode 5 can return more than 100 records so NO LIMIT here
for which I wrote this:
query_run_now_job_ids = Stats.objects.filter(mode=5).exclude(is_completed=3).order_by('-date')
list_of_active_job_ids = Stats.objects.filter(mode=1).order_by('-date')[:2].union(
    Stats.objects.filter(mode=2).order_by('-date')[:2],
    Stats.objects.filter(mode=3).order_by('-date')[:2],
    Stats.objects.filter(mode=6).order_by('-date')[:2],
    query_run_now_job_ids)
but somehow the list_of_active_job_ids returned is unordered, i.e. list_of_active_job_ids.ordered returns False, due to which when this queryset is passed to the Paginator class it gives:
UnorderedObjectListWarning:
Pagination may yield inconsistent results with an unordered object_list
I have already set ordering in class Meta in models.py
class Meta:
    ordering = ['-date']
Without the paginator the query works fine and the page loads, but with the paginator the view never loads; it just keeps loading.
Is there a better alternative for achieving this without a chain of union calls?
So I tried another alternative for the above MySQL query, but I'm stuck on another problem: how to write the condition for mode = 5 in this query:
SELECT
    MODE,
    SUBSTRING_INDEX(GROUP_CONCAT(`job_id` SEPARATOR ','), ',', 2) AS job_id_list,
    SUBSTRING_INDEX(GROUP_CONCAT(`total_calculations` SEPARATOR ','), ',', 2) AS total_calculations
FROM `stats`
ORDER BY DATE DESC
Even if I were able to write this query, it would lead me to another challenging situation: converting it to the Django ORM.
So why is my query not ordered even though I have set ordering in class Meta?
Also, if not this query, is there a better alternative for achieving this?
Help would be appreciated!
I'm using Python 2.7 and Django 1.11.
While subqueries may be ordered, the resulting union data is not. You need to explicitly define the ordering.
from django.db import models

def make_query(mode, index):
    return (
        Stats.objects.filter(mode=mode).
        annotate(_sort=models.Value(index, models.IntegerField())).
        order_by('-date')
    )

list_of_active_job_ids = make_query(1, 1)[:2].union(
    make_query(2, 2)[:2],
    make_query(3, 3)[:2],
    make_query(6, 4)[:2],
    make_query(5, 5).exclude(is_completed=3)
).order_by('_sort', '-date')
All I did was add a new, literal value field _sort that has a different value for each subquery, and then order by it in the final query. The rest of the code is just there to reduce duplication. It would have been even cleaner if it weren't for that mode=6 subquery.

Pass column values into selection for SQL query using pandas.io.sql

I have multiple sql queries I need to run (via pandas.io.sql / .read_sql) that have a very similar structure so I am attempting to parameterize them.
I am wondering if there is a way to pass in column values using .format (which works for strings).
My query (truncated to simplify this post):
sql = '''
SELECT DISTINCT
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors
FROM
    FOO.db
WHERE
    date_time BETWEEN '{0}' AND '{1}'
    AND report_suite = '{2}'
GROUP BY
    report_suite, post_pagename
ORDER BY
    unique_visitors DESC
'''.format(*parameters)
What I would like to do is also parameterize the COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors part,
like this somehow:
COUNT(DISTINCT({3})) AS {4}
The problem I can't seem to get around is that doing this would require storing the column names as something other than a string, to avoid the quotes. Are there any good ways around this?
Consider the following approach:
sql_dynamic_parms = dict(
    func1='CONCAT(post_visid_high,post_visid_low)',
    name1='unique_visitors'
)

sql = '''
SELECT DISTINCT
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT({func1})) AS {name1}
FROM
    FOO.db
WHERE
    date_time BETWEEN %(date_from)s AND %(date_to)s
    AND report_suite = %(report_suite)s
GROUP BY
    report_suite, post_pagename
ORDER BY
    unique_visitors DESC
'''.format(**sql_dynamic_parms)

params = dict(
    date_from=pd.to_datetime('2017-01-01'),
    date_to=pd.to_datetime('2017-12-01'),
    report_suite=111
)

df = pd.read_sql(sql, conn, params=params)
P.S. You may want to read PEP 249 to see what kinds of parameter placeholders are accepted.
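The same split can be exercised end to end with the standard library's sqlite3 in place of the real warehouse: identifiers and expressions go through str.format() before the query runs, while values go through the driver's placeholders (named style here, since sqlite3 is not pyformat like most MySQL drivers). The table and column names below are invented for the demo:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (report_suite INTEGER, page TEXT, visitor_id TEXT);
    INSERT INTO visits VALUES (111, 'home', 'a'), (111, 'home', 'b'),
                              (111, 'about', 'a'), (222, 'home', 'c');
""")

# Identifiers/expressions: spliced in as text *before* the query executes.
identifiers = dict(func1="visitor_id", name1="unique_visitors")

sql = """
    SELECT page, COUNT(DISTINCT {func1}) AS {name1}
    FROM visits
    WHERE report_suite = :report_suite
    GROUP BY page
    ORDER BY {name1} DESC
""".format(**identifiers)

# Values: passed separately so the driver quotes and escapes them.
df = pd.read_sql(sql, conn, params={"report_suite": 111})
print(df)
```

Only the value parameters are safe against injection; anything spliced in with format() should come from a trusted whitelist, never from user input.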

Coalesce results in a QuerySet

I have the following models:
class Property(models.Model):
    name = models.CharField(max_length=100)

    def is_available(self, avail_date_from, avail_date_to):
        # Check against the owner's specified availability
        available_periods = self.propertyavailability_set \
            .filter(date_from__lte=avail_date_from,
                    date_to__gte=avail_date_to) \
            .count()
        if available_periods == 0:
            return False
        return True

class PropertyAvailability(models.Model):
    de_property = models.ForeignKey(Property, verbose_name='Property')
    date_from = models.DateField(verbose_name='From')
    date_to = models.DateField(verbose_name='To')
    rate_sun_to_thurs = models.IntegerField(verbose_name='Nightly rate: Sun to Thurs')
    rate_fri_to_sat = models.IntegerField(verbose_name='Nightly rate: Fri to Sat')
    rate_7_night_stay = models.IntegerField(blank=True, null=True, verbose_name='Weekly rate')
    minimum_stay_length = models.IntegerField(default=1, verbose_name='Min. length of stay')

    class Meta:
        unique_together = ('date_from', 'date_to')
Essentially, each Property has its availability specified with instances of PropertyAvailability. From this, the Property.is_available() method checks to see if the Property is available during a given period by querying against PropertyAvailability.
This code works fine except for the following scenario:
Example data
Using the current Property.is_available() method, if I were to search for availability between the 2nd of Jan, 2017 and the 5th of Jan, 2017 it'd work because it matches #1.
But if I were to search between the 4th of Jan, 2017 and the 8th of Jan, 2017, it wouldn't return anything, because the requested range overlaps multiple records: it matches neither #1 nor #2 on its own.
I read this earlier (which introduced a similar problem and solution through coalescing results) but had trouble writing that using Django's ORM or getting it to work with raw SQL.
So, how can I write a query (preferably using the ORM) that will do this? Or perhaps there's a better solution that I'm unaware of?
Other notes
Both avail_date_from and avail_date_to must match up with PropertyAvailability's date_from and date_to fields:
avail_date_from must be >= PropertyAvailability.date_from
avail_date_to must be <= PropertyAvailability.date_to
This is because I need to query that a Property is available within a given period.
Software specs
Django 1.11
PostgreSQL 9.3.16
My solution would be to check whether the date_from or the date_to fields of PropertyAvailability are contained in the period we're interested in. I do this using Q objects. As mentioned in the comments above, we also need to include the PropertyAvailability objects that encompass the entire period we're interested in. If we find more than one instance, we must check if the availability objects are continuous.
from datetime import timedelta
from django.db.models import Q

class Property(models.Model):
    name = models.CharField(max_length=100)

    def is_available(self, avail_date_from, avail_date_to):
        date_range = (avail_date_from, avail_date_to)
        # Check against the owner's specified availability
        query_filter = (
            # One of the records' date fields falls within date_range
            Q(date_from__range=date_range) |
            Q(date_to__range=date_range) |
            # OR date_range falls between one record's date_from and date_to
            Q(date_from__lte=avail_date_from, date_to__gte=avail_date_to)
        )
        available_periods = self.propertyavailability_set \
            .filter(query_filter) \
            .order_by('date_from')
        # BEWARE! This might suck up a lot of memory if the number of returned rows is large!
        # I do this because negative indexing of a `QuerySet` is not supported.
        available_periods = list(available_periods)

        if len(available_periods) == 1:
            # Must check if the availability matches the range
            return (
                available_periods[0].date_from <= avail_date_from and
                available_periods[0].date_to >= avail_date_to
            )
        elif len(available_periods) > 1:
            # Must check if the periods are continuous and match the range
            if (
                available_periods[0].date_from > avail_date_from or
                available_periods[-1].date_to < avail_date_to
            ):
                return False
            period_end = available_periods[0].date_to
            for available_period in available_periods[1:]:
                if available_period.date_from - period_end > timedelta(days=1):
                    return False
                else:
                    period_end = available_period.date_to
            return True
        else:
            return False
I feel the need to mention, though, that the database model does not guarantee that there are no overlapping PropertyAvailability objects in your database. In addition, the unique constraint should most likely also include the de_property field.
What you should be able to do is aggregate the data you wish to query against, and combine any overlapping (or adjacent) ranges.
Postgres doesn't have a built-in aggregate for this: it has operators for unioning and combining adjacent ranges, but nothing that will aggregate collections of overlapping/adjacent ranges.
However, you can write a query that will combine them, although how to do it with the ORM is not obvious (yet).
Here is one solution (left as a comment on http://schinckel.net/2014/11/18/aggregating-ranges-in-postgres/#comment-2834554302, and tweaked to combine adjacent ranges, which appears to be what you want):
SELECT int4range(MIN(LOWER(value)), MAX(UPPER(value))) AS value
FROM (SELECT value,
MAX(new_start) OVER (ORDER BY value) AS left_edge
FROM (SELECT value,
CASE WHEN LOWER(value) <= MAX(le) OVER (ORDER BY value)
THEN NULL
ELSE LOWER(value) END AS new_start
FROM (SELECT value,
lag(UPPER(value)) OVER (ORDER BY value) AS le
FROM range_test
) s1
) s2
) s3
GROUP BY left_edge;
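For intuition, the combining step this SQL performs can be sketched in plain Python (merge_ranges is a hypothetical helper, not part of the answer's schema): fold ranges that overlap or are adjacent, i.e. separated by at most one day, into maximal spans.

```python
from datetime import date, timedelta

def merge_ranges(ranges):
    """Collapse (from, to) date pairs that overlap or touch into maximal spans."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or is adjacent to the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_ranges([
    (date(2017, 1, 1), date(2017, 1, 5)),
    (date(2017, 1, 4), date(2017, 1, 8)),   # overlaps the first
    (date(2017, 1, 9), date(2017, 1, 12)),  # adjacent to the merged span
    (date(2017, 2, 1), date(2017, 2, 3)),   # disjoint
]))
```

A search range is then available iff it is contained in one of the merged spans.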
One way to make this queryable from within the ORM is to put it in a Postgres VIEW, and have a model that references this.
However, it is worth noting that this queries the whole source table, so you may want to have filtering applied; probably by de_property.
Something like:
CREATE OR REPLACE VIEW property_aggregatedavailability AS (
    SELECT de_property,
           MIN(date_from) AS date_from,
           MAX(date_to) AS date_to
    FROM (SELECT de_property,
                 date_from,
                 date_to,
                 MAX(new_from) OVER (PARTITION BY de_property
                                     ORDER BY date_from) AS left_edge
          FROM (SELECT de_property,
                       date_from,
                       date_to,
                       CASE WHEN date_from <= MAX(le) OVER (PARTITION BY de_property
                                                            ORDER BY date_from)
                            THEN NULL
                            ELSE date_from
                       END AS new_from
                FROM (SELECT de_property,
                             date_from,
                             date_to,
                             LAG(date_to) OVER (PARTITION BY de_property
                                                ORDER BY date_from) AS le
                      FROM property_propertyavailability
                     ) s1
               ) s2
         ) s3
    GROUP BY de_property, left_edge
)
As an aside, you might want to consider using Postgres's date range objects, because then you can prevent start > finish (automatically), but also prevent overlapping periods for a given property, using exclusion constraints.
Finally, an alternative solution might be to have a derived table that stores unavailability, built by taking the available periods and reversing them. This makes writing the query simpler, as you can test for a direct overlap, but negated (i.e., a property is available for a given period iff there are no overlapping unavailable periods). I do that in a production system for staff availability/unavailability, where many checks need to be made. Note that this is a denormalised solution, and it relies on trigger functions (or other updates) to keep it in sync.

How to convert SQL scalar subquery to SQLAlchemy expression

I need a little help with expressing code like this in SQLAlchemy's language:
SELECT
    s.agent_id,
    s.property_id,
    p.address_zip,
    (
        SELECT v.valuation
        FROM property_valuations v
        WHERE v.zip_code = p.address_zip
        ORDER BY ABS(DATEDIFF(v.as_of, s.date_sold))
        LIMIT 1
    ) AS back_valuation
FROM sales s
JOIN properties p ON s.property_id = p.id
The inner subquery is meant to get the property value from the table property_valuations, with columns (zip_code INT, valuation DECIMAL, as_of DATE), closest to the date of sale from the table sales. I know how to rewrite most of it, but I'm completely stuck on the order_by expression: I cannot prepare the subquery so as to pass the ordering member later.
Currently I have following queries:
subquery = (
    session.query(PropertyValuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)
query = session.query(Sale).join(Sale.property_)
How to combine these queries together?
How to combine these queries together?
Use as_scalar(), or label():
subquery = (
    session.query(PropertyValuation.valuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)

query = session.query(Sale.agent_id,
                      Sale.property_id,
                      Property.address_zip,
                      # `subquery.as_scalar()` or
                      subquery.label('back_valuation'))\
    .join(Property)
Using as_scalar() limits the returned columns and rows to 1, so you cannot get a whole model object out of it (as query(PropertyValuation) is a select of all the attributes of PropertyValuation), but getting just the valuation attribute works.
but I completely stuck on order_by expression - I cannot prepare subquery to pass ordering member later.
There's no need to pass it later. Your current way of declaring the subquery is fine as it is, since SQLAlchemy can automatically correlate FROM objects to those of an enclosing query. I tried creating models that somewhat represent what you have, and here's how the query above works out (with added line-breaks and indentation for readability):
In [10]: print(query)
SELECT sale.agent_id AS sale_agent_id,
sale.property_id AS sale_property_id,
property.address_zip AS property_address_zip,
(SELECT property_valuations.valuation
FROM property_valuations
WHERE property_valuations.zip_code = property.address_zip
ORDER BY abs(datediff(property_valuations.as_of, sale.date_sold))
LIMIT ? OFFSET ?) AS back_valuation
FROM sale
JOIN property ON property.id = sale.property_id

How can django produce this SQL?

I have the following SQL query that returns what I need:
SELECT sensors_sensorreading.*, MAX(sensors_sensorreading.timestamp) AS "last"
FROM sensors_sensorreading
GROUP BY sensors_sensorreading.chipid
In words: get the last sensor reading entry for each unique chipid.
But I cannot seem to figure out the correct Django ORM statement to produce this query. The best I could come up with is:
SensorReading.objects.values('chipid').annotate(last=Max('timestamp'))
But if i inspect the raw sql it generates:
>>> print connection.queries[-1:]
[{u'time': u'0.475', u'sql': u'SELECT
"sensors_sensorreading"."chipid",
MAX("sensors_sensorreading"."timestamp") AS "last" FROM
"sensors_sensorreading" GROUP BY "sensors_sensorreading"."chipid"'}]
As you can see, it almost generates the correct SQL, except Django selects only the chipid field and the aggregate "last" (but I need all the table fields returned instead).
Any idea how to return all fields?
Assuming you also have other fields in the table besides chipid and timestamp, then I would guess this is the SQL you actually need:
select * from (
    SELECT *, row_number() over (partition by chipid order by timestamp desc) as RN
    FROM sensors_sensorreading
) X where RN = 1
This will return the latest rows for each chipid with all the data that is in the row.
