Django queryset : Calculate monthly average - python

I have a sales model and I want to calculate (number of transactions) / (number of days) when the sales are grouped by month, week, or year.
class SaleItem(models.Model):
    id = models.UUIDField(default=uuid4, primary_key=True)
    bill = models.ForeignKey()
    item = models.ForeignKey('item')
    quantity = models.PositiveSmallIntegerField()
    price = models.DecimalField(max_digits=13, decimal_places=3, default=0)
So if sales are grouped by month, this becomes (# transactions / # days in that month) for each month. If the sales are grouped by year, this becomes (# transactions / # days in that year).
Currently I can get the number of transactions:
aggregate = 'month' # parameter
# get number of transactions
SaleItem.objects.annotate(date=Trunc('bill__date', aggregate)).values('date').annotate(sales=Count('bill', distinct=True))
But how can I divide each count by the number of days in that group?

Doing it in SQL is possible (and not even that difficult). Getting the number of days in a month is RDBMS-specific though, and there is no generic Django database function to shield you from the various SQL implementations.
Django makes it very easy to wrap your own functions around SQL functions. For instance, for SQLite, you can define
from django.db.models import Func, IntegerField

class DaysInMonth(Func):
    output_field = IntegerField()

    def as_sqlite(self, compiler, connection):
        # SQLite: day number of the last day of the month containing the date
        return super().as_sql(
            compiler,
            connection,
            function='strftime',
            template='''
                %(function)s("%%%%d",
                    %(expressions)s,
                    "start of month",
                    "+1 month",
                    "-1 day")
            ''',
        )
Then you can use DaysInMonth() to divide your count by the number of days:
qs = (
    SaleItem.objects
    .annotate(date=Trunc('bill__date', aggregate))
    .values('date')
    .annotate(
        sales=Count('bill', distinct=True),
        sales_per_day=F('sales') / DaysInMonth('date')
    )
)
If a rounded-down integer is not sufficient and you need a decimal result, this is another hoop to jump through:
sales_per_day=ExpressionWrapper(
    Cast('sales', FloatField()) / DaysInMonth(F('date')),
    DecimalField()
)
If, heaven forbid, you want to round in the database rather than in your template, you need another custom function:
class Round(Func):
    function = 'ROUND'
    output_field = FloatField()
    arity = 2

sales_per_day=Round(
    Cast('sales', FloatField()) / DaysInMonth(F('date')),
    2  # decimal precision
)
So Django is really flexible but, as Willem said, doing it in Python would save you some pain without losing significant performance (if any at all).
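For the Python-side approach, a minimal sketch might look like the following. It assumes each row is a dict with a `date` key (the truncated period start) and a `sales` key, as the `.values('date').annotate(sales=...)` queryset would produce; the helper names `days_in_period` and `add_sales_per_day` are made up for illustration.

```python
import calendar
from datetime import date


def days_in_period(start, aggregate):
    """Number of days in the week, month, or year that begins at `start`."""
    if aggregate == "week":
        return 7
    if aggregate == "month":
        # monthrange() returns (weekday of the 1st, number of days in the month)
        return calendar.monthrange(start.year, start.month)[1]
    if aggregate == "year":
        return 366 if calendar.isleap(start.year) else 365
    raise ValueError(f"unsupported aggregate: {aggregate!r}")


def add_sales_per_day(rows, aggregate):
    """Return copies of the rows with an extra 'sales_per_day' key."""
    return [
        {**row, "sales_per_day": row["sales"] / days_in_period(row["date"], aggregate)}
        for row in rows
    ]


# Hypothetical rows standing in for the queryset result
rows = [
    {"date": date(2020, 2, 1), "sales": 58},
    {"date": date(2020, 3, 1), "sales": 62},
]
result = add_sales_per_day(rows, "month")
```

Because the division happens in Python, the same code works for every database backend and for all three groupings.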

Related

Django Query: Return average monthly spend per user

Given a Django model which stores user transactions, how can I create a query that returns the average monthly spend for each user?
My current solution queries all the transactions, iterates over the result, and calculates the averages for each user using a dictionary. I am aware that there is a more efficient way to query this using aggregate/annotate, but I am unsure how to write it.
I am more concerned about readability than speed since the number of transactions in the db is relatively small and will never change.
models.py
class Transactions(models.Model):
    id = models.AutoField(primary_key=True)
    user = models.ForeignKey('User', on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=100, decimal_places=2)
    date = models.DateField()
Taking a late night stab at this (untested). The code below extracts the year and month from the date, then clears the ordering with order_by() (may not be necessary in all cases), then groups by 'user', 'year', and 'month' and calculates the average with the Avg function, storing it in a column named 'average'.
Documentation has some other good examples.
from django.db.models.functions import ExtractMonth, ExtractYear
from django.db.models import Avg
...
# Parentheses let the chained calls span several lines
avg_by_user_by_month = (
    Transactions.objects
    .annotate(month=ExtractMonth('date'), year=ExtractYear('date'))
    .order_by()
    .values('user', 'year', 'month')
    .annotate(average=Avg('amount'))
    .values('user', 'year', 'month', 'average')
)
EDIT
Alternatively, this may also work:
from django.db.models import Avg, F
...
avg_by_user_by_month = (
    Transactions.objects
    .annotate(month=F('date__month'), year=F('date__year'))
    .order_by()
    .values('user', 'year', 'month')
    .annotate(average=Avg('amount'))
    .values('user', 'year', 'month', 'average')
)
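Note that Avg('amount') grouped this way averages the individual transaction amounts within each month. If what is wanted is instead the total spent per month, averaged over months, a plain-Python sketch close to the dictionary approach described in the question could look like this. The function name and the (user, amount, date) tuple layout are assumptions for illustration (for example, rows from values_list('user', 'amount', 'date')), and months without any transaction are not counted.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def average_monthly_spend(transactions):
    """Map each user to the mean of their monthly totals."""
    # First pass: total spend per (user, year, month)
    monthly_totals = defaultdict(Decimal)
    for user, amount, day in transactions:
        monthly_totals[(user, day.year, day.month)] += amount

    # Second pass: collect each user's monthly totals and average them
    totals_by_user = defaultdict(list)
    for (user, _year, _month), total in monthly_totals.items():
        totals_by_user[user].append(total)
    return {
        user: sum(totals) / len(totals)
        for user, totals in totals_by_user.items()
    }


# Hypothetical sample data
sample = [
    ("alice", Decimal("10.00"), date(2020, 1, 5)),
    ("alice", Decimal("30.00"), date(2020, 1, 20)),
    ("alice", Decimal("20.00"), date(2020, 2, 3)),
    ("bob", Decimal("5.00"), date(2020, 1, 1)),
]
averages = average_monthly_spend(sample)
```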

SQLAlchemy filter by upcoming birthday / annual anniversary

Env: python 3.8, flask-sqlalchemy, postgres
class User(db.Model):
    name = db.Column(db.Text)
    birthday = db.Column(db.DateTime)

    @classmethod
    def upcoming_birthdays(cls):
        return (cls.query
                .filter("??")
                .all()
                )
I'd like to create a sqlalchemy query that filters users with an upcoming birthday within X number of days. I thought about using the extract function, to extract the month and day from the birthday, but that doesn't work for days at the end of the month or year. I also thought about trying to convert the birthday to a julian date for comparison, but I don't know how to go about that.
For example if today was August 30, 2020 it would return users with birthdays
September 1 1995
August 31 2010 .... etc
Thanks for your help
You can achieve your goal with a simple query such as the one below:
q = (
db.session.query(User)
.filter(has_birthday_next_days(User.birthday, 7))
)
This is not a @classmethod on User, but you can turn the solution into one if you so desire.
What is left is to actually implement has_birthday_next_days(...), which is listed below; it is mostly documentation of the principle:
def has_birthday_next_days(sa_col, next_days: int = 0):
    """
    sqlalchemy expression to indicate that an sa_col (such as `User.birthday`)
    has anniversary within next `next_days` days.
    It is implemented by simply checking if the 'age' of the person (in years)
    has changed between today and the `next_days` date.
    """
    return age_years_at(sa_col, next_days) > age_years_at(sa_col)
There can be multiple implementations of age_years_at; below is just one possibility, specific to PostgreSQL (including the required imports):
import datetime
import sqlalchemy as sa

def age_years_at(sa_col, next_days: int = 0):
    """
    Generates a postgresql specific statement to return 'age' (in years)
    from a provided field either today (next_days == 0) or with the `next_days` offset.
    """
    # `func` is reached through the `sa` alias, since only `sqlalchemy as sa` is imported
    stmt = sa.func.age(
        (sa_col - sa.func.cast(datetime.timedelta(next_days), sa.Interval))
        if next_days != 0
        else sa_col
    )
    stmt = sa.func.date_part("year", stmt)
    return stmt
Finally, the desired query q = db.session.query(User).filter(has_birthday_next_days(User.birthday, 30)) generates:
SELECT "user".id,
"user".name,
"user".birthday
FROM "user"
WHERE date_part(%(date_part_1)s, age("user".birthday - CAST(%(param_1)s AS INTERVAL)))
> date_part(%(date_part_2)s, age("user".birthday))
{'date_part_1': 'year', 'param_1': datetime.timedelta(days=30), 'date_part_2': 'year'}
Bonus: Having implemented this using generic functions, it can be used not only on the User.birthday column but any other type compatible value. Also the functions can be used separately in both the select and where parts of the statement. For example:
q = (
    db.session.query(
        User,
        age_years_at(User.birthday).label("age_today"),
        age_years_at(User.birthday, 7).label("age_in_a_week"),
        has_birthday_next_days(User.birthday, 7).label("has_bday_7-days"),
        has_birthday_next_days(User.birthday, 30).label("has_bday_30-days"),
    )
    .filter(has_birthday_next_days(User.birthday, 30))
)
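The principle behind has_birthday_next_days (the age in whole years changes between today and the offset date) can be checked in plain Python. This is only an illustrative sketch: `age_years_on` and `has_birthday_within` are hypothetical names, and `today` is passed in explicitly so the example is reproducible.

```python
from datetime import date, timedelta


def age_years_on(birthday, on_date):
    """Age in completed years on `on_date`."""
    years = on_date.year - birthday.year
    # Not yet had this year's birthday: one year less
    if (on_date.month, on_date.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def has_birthday_within(birthday, next_days, today):
    """True if the age increases between `today` and `today + next_days`."""
    later = today + timedelta(days=next_days)
    return age_years_on(birthday, later) > age_years_on(birthday, today)


today = date(2020, 8, 30)
upcoming = [
    b for b in (date(1995, 9, 1), date(2010, 8, 31), date(1990, 12, 25))
    if has_birthday_within(b, 7, today)
]
```

With this definition a birthday that falls on `today` itself does not count as upcoming, because the age has already changed.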
Your original idea about extracting day and month is good.
If you start with e.g. a birthday September 1 1995, you can move that day and month to the current year (i.e. September 1 2020) and afterwards check if that date is within your specified range (between today and today+X days).
As for the days at the end of the year - you can solve that by moving the birthday day and month not only to the current year, but also to the next one.
For example, let's say that today is December 30 2020 and you have a birthday January 2 1995. Then you would check whether either one of the dates January 2 2020 or January 2 2021 are within your specified range.
from datetime import date, timedelta
from sqlalchemy import func, or_

@classmethod
def upcoming_birthdays(cls):
    dateFrom = date.today()
    dateTo = date.today() + timedelta(days=5)
    thisYear = dateFrom.year
    nextYear = dateFrom.year + 1
    return (cls.query
            .filter(
                or_(
                    # birthday moved to the current year
                    func.to_date(func.concat(func.to_char(cls.birthday, "DDMM"), thisYear), "DDMMYYYY").between(dateFrom, dateTo),
                    # birthday moved to the next year (covers ranges that cross New Year)
                    func.to_date(func.concat(func.to_char(cls.birthday, "DDMM"), nextYear), "DDMMYYYY").between(dateFrom, dateTo)
                )
            )
            .all()
            )
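The same idea (move the birthday into the current and the next year, then test the range) can be sketched in plain Python before translating it to SQL. The name `is_upcoming` is hypothetical, and skipping a February 29 birthday in a non-leap year is an assumption made here for simplicity, not something the answer specifies.

```python
from datetime import date, timedelta


def is_upcoming(birthday, today, days):
    """True if the birthday's anniversary falls within [today, today + days]."""
    date_to = today + timedelta(days=days)
    for year in (today.year, today.year + 1):
        try:
            anniversary = birthday.replace(year=year)
        except ValueError:
            # February 29 does not exist in this year; skipped (assumption)
            continue
        if today <= anniversary <= date_to:
            return True
    return False


# The year-end example from the text: today is December 30 2020
crosses_new_year = is_upcoming(date(1995, 1, 2), date(2020, 12, 30), 5)
```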

Compare objects from two models in Django

I have two models PriceActual and PriceBenchmark having fields date and price.
I want to compare the actual prices with the benchmark prices.
I'm not interested in benchmark prices whose dates are not present in the actual prices. So if PriceActual only has objects from the last week, I only want to query objects from PriceBenchmark that are also within the last week.
I guess it's something like
actual = PriceActual.objects.all()
benchmark = PriceActual.objects.filter(date__in=actual)
Edit
The models are really simple
class PriceActual(models.Model):
    date = DateField()
    price = DecimalField()

class PriceBenchmark(models.Model):
    date = DateField()
    price = DecimalField()
Maybe not the most efficient, but you could always create a list of dates:
actual = PriceActual.objects.all()
actual_dates = [x.date for x in actual]
# Filter the benchmark model (not PriceActual) by the dates found in the actual prices
benchmark = PriceBenchmark.objects.filter(date__in=actual_dates)
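Once both sets of rows are loaded, the comparison itself can be done in plain Python. The sketch below assumes each row is a (date, price) tuple, for example from values_list('date', 'price'), and that each date appears at most once per model; `pair_by_date` is a hypothetical helper name.

```python
from datetime import date
from decimal import Decimal


def pair_by_date(actual_rows, benchmark_rows):
    """Return (date, actual_price, benchmark_price) for dates present in both."""
    benchmark_by_date = dict(benchmark_rows)
    return [
        (day, price, benchmark_by_date[day])
        for day, price in actual_rows
        if day in benchmark_by_date
    ]


# Hypothetical sample rows
actual_rows = [
    (date(2020, 1, 1), Decimal("10.5")),
    (date(2020, 1, 2), Decimal("11.0")),
]
benchmark_rows = [
    (date(2019, 12, 31), Decimal("9.0")),
    (date(2020, 1, 2), Decimal("10.0")),
]
pairs = pair_by_date(actual_rows, benchmark_rows)
```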

django number of consecutive years

I am working on a page with stocks and dividends. So to simplify, my model is something like this:
class Stock(models.Model):
    name = models.CharField("Stock's name", max_length=200)

class Dividend(models.Model):
    date = models.DateField('pay date')
    amount = models.DecimalField(max_digits=20, decimal_places=10)
    stock = models.ForeignKey(Stock)
I want to calculate number of consecutive years where dividend was paid for each stock. Pure existence of dividend that year is enough. If company paid dividends through 2000-2005 and 2008-2014, I want to get number 7. What is the best way to calculate it? I came up with:
1. making query for each year if there is any dividend (too many requests)
2. using values() or values_list() to get only list of distinct ordered years and then iterating over that list
I would go with number 2. Is there any better way how to use queryset to calculate this value?
Edit:
3. I noticed dates just now.
I have thought about your comments and this is what I think.
remram: SQL seems unnecessarily difficult.
dylrei: At first I thought it was a great idea, but now I am not so sure. There might be one dividend 70 years ago and then 69 consecutive years with no dividends. There are a lot of stocks that do not pay dividends, and I should be able to get 0 years quickly. Also, I do not know how to get the years where no dividends were paid in Django.
This is what I came up with:
from datetime import date

def get_years_paying(symbol):
    """
    get number of consecutive years paying dividends for Stock by symbol
    :param symbol:
    :return: number of years
    """
    num_years = 0
    iteration_year = date.today().year
    dividend_years = Dividend.objects.filter(
        stock__symbol=symbol, spinoff=False, special=False, split=False
    ).dates(
        "date", "year", order='DESC'
    )
    dividend_years_list = [i.year for i in dividend_years]
    while True:
        if iteration_year in dividend_years_list:
            num_years += 1
        elif iteration_year == date.today().year:
            pass  # current year is optional, there will be no dividends on 1. January
        else:
            return num_years if num_years != 0 else None
        iteration_year -= 1
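The counting logic can be separated from the query so that it is easy to test on its own. The following is a sketch under that assumption: `consecutive_years` is a hypothetical helper that takes the distinct dividend years (for instance the `dividend_years_list` built above) and returns 0 rather than None when there is no streak.

```python
def consecutive_years(paid_years, current_year):
    """Length of the unbroken run of dividend years ending now.

    The current year is optional: if it has no dividend yet,
    the run is counted from the previous year instead.
    """
    years = set(paid_years)  # set membership tests are O(1)
    year = current_year if current_year in years else current_year - 1
    count = 0
    while year in years:
        count += 1
        year -= 1
    return count


# Example from the question: dividends in 2000-2005 and 2008-2014
paid = list(range(2000, 2006)) + list(range(2008, 2015))
streak = consecutive_years(paid, 2014)
```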

Django Group By Weekday?

I'm using Django 1.5.1, Python 3.3.x, and can't use raw queries for this.
Is there a way to get a QuerySet grouped by weekday, for a QuerySet that uses a date __range filter? I'm trying to group results by weekday, for a query that ranges between any two dates (could be as much as a year apart). I know how to get rows that match a weekday, but that would require pounding the DB with 7 queries just to find out the data for each weekday.
I've been trying to figure this out for a couple of hours by trying different tweaks with the __week_day filter, but nothing's working. Even Googling doesn't help, which makes me wonder if this is even possible. Do any Django gurus here know whether it is possible, and if so, how?
Since extra is deprecated, here is a newer way of grouping on the day of the week, using ExtractWeekDay.
from django.db.models import Count
from django.db.models.functions import ExtractWeekDay

(
    YourObjects.objects
    .annotate(weekday=ExtractWeekDay('timestamp'))
    .values('weekday')
    .annotate(count=Count('id'))
    .values('weekday', 'count')
)
This will return a result like:
[{'weekday': 1, 'count': 534}, {'weekday': 2, 'count': 574}, ...]
It is also important to note that 1 = Sunday and 7 = Saturday.
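Since this numbering (1 = Sunday through 7 = Saturday) differs from Python's date.weekday() (0 = Monday through 6 = Sunday), a small conversion helps when post-processing the result. This is a sketch with hypothetical helper names, using rows shaped like the output above.

```python
from datetime import date

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def to_python_weekday(week_day):
    """Map 1=Sunday..7=Saturday to Python's 0=Monday..6=Sunday."""
    return (week_day + 5) % 7


def label_rows(rows):
    """Replace the numeric weekday with its name."""
    return {DAY_NAMES[row["weekday"] - 1]: row["count"] for row in rows}


rows = [{"weekday": 1, "count": 534}, {"weekday": 2, "count": 574}]
labelled = label_rows(rows)
```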
Well, I wrote an algorithm that brings you all the records from the beginning of the week (Monday) until today.
For example, if you have a model like this in your app:
from django.db import models

class x(models.Model):
    date = models.DateField()

from datetime import datetime, timedelta
from myapp.models import x

start_date = datetime.date(datetime.now())
# isoweekday() returns 1 for Monday, so this is the number of days since Monday
days_quited = start_date.isoweekday() - 1
# Subtracting a timedelta stays valid when the week started in the previous month
week_begin = start_date - timedelta(days=days_quited)
records = x.objects.filter(date__range=(week_begin, datetime.date(datetime.now())))
If you add some records in the admin with dates between June 17 (Monday) and June 22 (today), you will see all those records. If you add more records dated tomorrow, for example, or next Monday, you will not see those records.
If you want the records from another week until now, you only have to put this:
start_date = datetime.date(datetime(year, month, day))
records = x.objects.filter(date__range=(week_begin, datetime.date(datetime.now())))
Hope this helps! :D
You need to add an extra weekday field to the selection, then group by that in the sum or average aggregation. Note that this becomes a database-specific query, because the 'extra' notation is passed through to the DB select statement.
Given the model:
class x(models.Model):
    date = models.DateField()
    value = models.FloatField()
Then, for MySQL, with a mapping of the ODBC weekday to the Python datetime weekday:
x.objects.extra(select={'weekday':"MOD(dayofweek(date)+5,7)"}).values('weekday').annotate(weekday_value=Avg('value'), weekday_value_std=StdDev('value'))
Note that if you do not need to convert the MySQL ODBC weekday (1 = Sunday, 2 = Monday, ...) to the Python weekday (Monday is 0 and Sunday is 6), then you do not need the modulo.
For model like this:
class A(models.Model):
    date = models.DateField()
    value = models.FloatField()
You can use query:
weekday = {"w": """strftime('%%w', date)"""}
qs = A.objects.extra(select=weekday).values('w').annotate(stat = Sum("value")).order_by()
