Django averaging over a date range - python

I have a simple uptime monitoring app built in Django which checks if a site doesn't respond to a ping request. It stores a timestamp of when it was pinged in the column "dt" and the millisecond response time in the column "ms". The site is pinged every minute and an entry is put into the database.
The Django model looks like this:
class Uptime(models.Model):
    user = models.ForeignKey(User)
    dt = models.DateTimeField()
    ms = models.IntegerField()
I'd like to grab one day at a time from the dt column and take an average of the ms response time for that day. Even though there are 1440 entries per day, I'd like to just grab the day (e.g. 4-19-2013) and get the average ms response time. The code I have been using below is undeniably wrong but I feel like I'm on the right track. How could I get this to work?
output_date = Uptime.objects.filter(user = request.user).values_list('dt', flat = True).filter(dt__range=(startTime, endTime)).order_by('dt').extra(select={'day': 'extract( day from dt )'}).values('day')
output_ms = Uptime.objects.filter(user = request.user).filter(dt__range=(startTime, endTime)).extra(select={'day': 'date( dt )'}).values('day').aggregate(Avg('ms'))
Thanks!

You need to annotate to do the group by. Django doesn't have anything in the ORM for extracting only dates, so you need to add an "extra" parameter to the query that reduces each dt value to its date. You then select only those values and annotate.
Try the following:
Uptime.objects.filter(user=request.user).extra({'_date': 'Date(dt)'}).values('_date').annotate(avgMs=Avg('ms'))
This should give you a list like the following:
[{'_date': datetime.date(2012, 7, 5), 'avgMs': 300},{'_date': datetime.date(2012, 7, 6), 'avgMs': 350}]

Related

Group data by month, if the month is in a date range of two fields in Django

I have a contract model containing a start and end datetime field. I want to show in a graph how many contracts are active per month (the month is between start and end time).
How can I get this information without multiple database requests per month?
I can annotate it for each field like this
start_month_contracts = contracts.annotate(
    start_month=TruncMonth("start")
) \
    .values("start_month") \
    .annotate(count=Count("start_month"))
end_month_contracts = contracts.annotate(
    end_month=TruncMonth("end")
) \
    .values("end_month") \
    .annotate(count=Count("end_month"))
but how do I combine both to get the active contracts per month?
Suppose you have the following model with start and end dates:
class Contract(models.Model):
    ...
    start = models.DateTimeField()
    end = models.DateTimeField()
Basic query for "active" contracts in a month
The basic formula is as you stated:
the month is between start and end time
A query can get us this for any given month...
import datetime

# Get active contracts for December 2020
month = datetime.datetime(2020, 12, 1)
# All Contract records active in December
qs = Contract.objects.filter(start__lte=month, end__gte=month)
# Or, since we just care about the count, we can use `.count()` instead:
december_active_count = Contract.objects.filter(start__lte=month, end__gte=month).count()
If you find you need to tweak the basic query, that's fine. What matters in this explanation is less the exact query than the methods built around it, which work the same way whatever the query happens to be.
Multiple counts in a single query
There are a few ways you can do a single query and chart out your contracts...
Counting records in the Django application
A simple naïve approach is to pull all the relevant contracts first in a single query, then count them for each month in Python...
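The snippet for this approach is not shown here. A minimal sketch of the idea, assuming the contracts have already been fetched in one query as (start, end) datetime pairs and that the caller supplies the list of month starts; the function name `count_active_per_month` and the sample data are hypothetical:

```python
import datetime

def count_active_per_month(contracts, month_starts):
    """For each month start, count the contracts with start <= month <= end."""
    counts = {}
    for month in month_starts:
        counts[month] = sum(1 for start, end in contracts if start <= month <= end)
    return counts

# Hypothetical sample data standing in for rows fetched from the database
contracts = [
    (datetime.datetime(2020, 10, 15), datetime.datetime(2020, 12, 20)),
    (datetime.datetime(2020, 11, 20), datetime.datetime(2021, 3, 1)),
]
month_starts = [datetime.datetime(2020, 11, 1), datetime.datetime(2020, 12, 1)]
active = count_active_per_month(contracts, month_starts)
```

The comparison is the same one used in the basic query above, just evaluated in Python for every record and every month.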
This works fine, but there are a few potential problems:
The DB will send data for each record. If you have many records, the number of bytes the DB has to send could get excessive.
While the calculations here are fairly lightweight, Python still has to crunch these numbers for every record, which could take a while if there are many records.
Really, we probably want to have the DB do the counting for us.
Counting on the database
If you wanted to handle this on the database, rather than in Python, you can develop a query to do aggregations DB-side using .aggregate. The benefit here is that the DB only has to transmit the counts, rather than all the records, which is a significantly smaller number of bytes. It also offloads some number crunching from your app to the DB.
Extending on the first example, let's try to get the counts for more than 1 month in a single query. We do this by using aggregate along with the Count aggregation function.
import datetime
from django.db.models import Count, Q

november = datetime.datetime(2020, 11, 1)
december = datetime.datetime(2020, 12, 1)
contract_counts = Contract.objects.aggregate(
    november_counts=Count('pk', filter=Q(start__lte=november, end__gte=november)),
    december_counts=Count('pk', filter=Q(start__lte=december, end__gte=december)),
)
print(contract_counts)
{'november_counts': 376, 'december_counts': 393} # <-- output
We can apply this same principle to get the counts for all months over a specified time range. To do this, we pre-determine each month between start and end that will be counted and use Count with a Q filter for each of those months.
Really, this is now just a matter of generating the keyword arguments like above, but dynamically.
I'll also create a custom manager for this model, to make the interface a little nicer.
import calendar
from django.db import models
from django.db.models import Count, Q

class ContractManager(models.Manager):
    def month_counts(self, start, end):
        qs = self.get_queryset()
        # generate keyword arguments for .aggregate
        aggregations = {}
        for month in months(start, end):  # the start of each month in the range
            month_name = calendar.month_name[month.month]
            aggregation_name = f'{month_name}_{month.year}'
            aggregations[aggregation_name] = Count(
                'pk', filter=Q(start__lte=month, end__gte=month)
            )
        return qs.aggregate(**aggregations)

class Contract(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField()
    objects = ContractManager()
You can then produce the counts like so:
from datetime import datetime

start = datetime(2020, 1, 1)
end = datetime(2021, 1, 1)
print(Contract.objects.month_counts(start, end))
The output of this might look something like this:
{'January_2020': 2,
'February_2020': 90,
'March_2020': 163,
'April_2020': 234,
'May_2020': 272,
'June_2020': 284,
'July_2020': 284,
'August_2020': 275,
'September_2020': 247,
'October_2020': 205,
'November_2020': 128,
'December_2020': 68,
'January_2021': 3}
You can also see that only 1 query is used (connection.queries is only populated when DEBUG is True):
from django.db import connection
print(len(connection.queries))
# 1
Final thoughts and notes
I should mention that this is not the most efficient way to do this and there's a lot of room for optimization. You could probably also generate the month intervals on the DB side, instead of in Python, if you wanted. Specific backends may have more performant options available, too, like the daterange functions of Postgres. Still, what we have here should provide enough context for using aggregate to get the counts you want.
You wrote: "I can annotate it for each field like this"
I don't think your code here gets you the counts you really want. You're counting the number of contracts that either started or ended in a particular month... but this won't be able to tell you how many contracts were active in any single given month.
P.S.
I omitted the code for the months() function above for brevity. The code can be found here if you're interested. Something like pandas might be more performant, though it shouldn't be a concern, unless your time intervals go over thousands of years :-)
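Since the helper itself is not reproduced here, the following is only a guess at what a `months()` function could look like, not the author's code. It assumes naive datetimes and an inclusive range, which matches the sample output above (January 2020 through January 2021):

```python
import datetime

def months(start, end):
    """Yield the first day of each month, beginning with start's month,
    for as long as that day is not after end."""
    current = datetime.datetime(start.year, start.month, 1)
    while current <= end:
        yield current
        # roll over to January of the next year after December
        if current.month == 12:
            current = datetime.datetime(current.year + 1, 1, 1)
        else:
            current = datetime.datetime(current.year, current.month + 1, 1)

month_list = list(months(datetime.datetime(2020, 1, 1), datetime.datetime(2021, 1, 1)))
```

With timezone-aware fields the comparison would need aware datetimes on both sides, so treat this as a starting point only.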

Variables in a Postgres view?

I have a view in Postgres which queries a master table (150 million rows) and retrieves data from the prior day (a function which returns SELECT yesterday; it was the only way to get the view to respect my partition constraints) and then joins it with two dimension tables. This works fine, but how would I loop through this query in Python? Is there a way to make the date dynamic?
for date in date_range('2016-06-01', '2017-07-31'):
    (query from the view, replacing the date with the date in the loop)
My workaround was to literally copy and paste the entire view as a huge select statement format string, and then pass in the date in a loop. This worked, but it seems like there must be a better solution to utilize an existing view or to pass in a variable which might be useful in the future.
To loop day by day over the interval in a for loop, you could do something like:
import datetime

initialDate = datetime.datetime(2016, 6, 1)
finalDate = datetime.datetime(2017, 7, 31)
for day in range((finalDate - initialDate).days + 1):
    current = (initialDate + datetime.timedelta(days=day)).date()
    print("query from the view, replacing the date with " + current.strftime('%m/%d/%Y'))
Replace the print with the call to your query. If the dates are strings, you can parse them first with something like:
initialDate = datetime.datetime.strptime("06/01/2016", '%m/%d/%Y')
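The `date_range` call in the question is not a built-in, so here is a small sketch of a generator with that name. It assumes both arguments are 'YYYY-MM-DD' strings, as in the question, and that the end date is inclusive:

```python
import datetime

def date_range(first, last):
    """Yield every date from first to last inclusive; both are 'YYYY-MM-DD' strings."""
    start = datetime.datetime.strptime(first, "%Y-%m-%d").date()
    stop = datetime.datetime.strptime(last, "%Y-%m-%d").date()
    for offset in range((stop - start).days + 1):
        yield start + datetime.timedelta(days=offset)

days = list(date_range("2016-06-01", "2017-07-31"))
```

Each yielded date can then be handed to the query as a parameter instead of being pasted into the SQL text.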

Django filter using two conditions

I am developing a web app using Django. Along with other tables, I have a table called GeneralContract, which has an issue date (issueDate) and an expiration date as date fields.
I want to find out the profit an insurance agent would get from these contracts within a given period. For example, if the date range is 15 January 2015 - 25 February 2015, I would like to filter all GeneralContract objects that were issued in ANY year within this period.
(i.e. issueDate__month = Date1.month AND IssueDate.day_gte= Date1.day) AND (IssueDate__month = Date2.month AND IssueDate.day_lte = Date2.day) ??
I tried the following but it is not giving me the results I wanted and I am not sure if I am writing this in the correct syntax or if my logic is wrong.
criterion1 = Q(issuedate__month=date1.month)
criterion2 = Q(issuedate__day__gte=date1.day)
criterion3 = Q(issuedate__month=date2.month)
criterion4 = Q(issuedate__day__lte=date2.day)
criterionA = criterion1 & criterion2
criterionB = criterion3 & criterion4
criterionC = criterionA & criterionB
currentGenProfits = GeneralContract.objects.filter(criterionC, cancelled=False)
Is this the right way of doing this filtering logic?
You can't do that: all four conditions are ANDed together, so the month would have to equal both date1.month and date2.month, and if, for example, date1.day = 5 and date2.day = 4, no day can satisfy both day conditions, so you get no results. You must check the month and the day together. The syntax is right but the logic is wrong.
I suggest starting with the biggest set and then filtering it down.
Start by filtering between the two months, then remove from the queryset the objects that fall in the first month but before the minimum day, and the objects that fall in the last month but after the maximum day. I think that can do the job.
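To illustrate checking month and day together, here is a plain-Python sketch that compares (month, day) pairs as tuples. It assumes the window does not wrap around the end of the year, and the function name `in_yearly_window` is made up for this example:

```python
import datetime

def in_yearly_window(issue_date, date1, date2):
    """True if issue_date's (month, day) lies between those of date1 and date2, in any year."""
    lower = (date1.month, date1.day)
    upper = (date2.month, date2.day)
    return lower <= (issue_date.month, issue_date.day) <= upper

date1 = datetime.date(2015, 1, 15)
date2 = datetime.date(2015, 2, 25)
checks = [
    in_yearly_window(datetime.date(2012, 1, 20), date1, date2),  # inside the window
    in_yearly_window(datetime.date(2013, 2, 26), date1, date2),  # one day after 25 February
    in_yearly_window(datetime.date(2014, 1, 5), date1, date2),   # before 15 January
]
```

Tuple comparison handles the "first month but before the min day" and "last month but after the max day" cases in a single expression, which is the same logic the queryset filtering has to reproduce.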

Django query count unique NEW results per day

I'm keeping statistics of my app in a database in a model that looks like this
class MyStats():
    event_code = django.db.models.PositiveSmallIntegerField()
    timestamp = django.db.models.DateTimeField()
    timestamp_date = django.db.models.DateField()
    device_id = django.db.models.CharField(max_length=32)
I would like to use this data to determine, for each day, how many NEW app installations I have.
I got as far as this:
MyStats.objects.order_by('-timestamp_date').values('timestamp_date').annotate(count_total=Count('device_id', distinct=True))
But what it seems to give me is the amount of unique users per DAY, which is not desired. Any hints?
"New" really means "Occurring after x_timedelta ago":
import datetime
x_timedelta = datetime.timedelta(days=1)
x_timedelta_ago = datetime.datetime.now() - x_timedelta
your_query = MyStats.objects.filter(timestamp_date__gt=x_timedelta_ago)
your_query_distinct = your_query.order_by('-timestamp').annotate(count_total=Count('device_id', distinct=True)).values()
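Another reading of "new" is that a device counts as new only on the first day it ever appears. That is a different technique from the query above. A plain-Python sketch of that interpretation, assuming the rows have been reduced to (device_id, date) pairs; the function name and sample data are hypothetical:

```python
import datetime
from collections import Counter

def new_installs_per_day(events):
    """events: iterable of (device_id, date). Count devices by the day they first appear."""
    first_seen = {}
    for device_id, day in events:
        if device_id not in first_seen or day < first_seen[device_id]:
            first_seen[device_id] = day
    return Counter(first_seen.values())

events = [
    ("a", datetime.date(2013, 6, 1)),
    ("b", datetime.date(2013, 6, 1)),
    ("a", datetime.date(2013, 6, 2)),  # device "a" was already seen on 1 June
    ("c", datetime.date(2013, 6, 2)),
]
installs = new_installs_per_day(events)
```

Here device "a" is counted once, on 1 June, even though it also shows up on 2 June.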

Django Group By Weekday?

I'm using Django 1.5.1, Python 3.3.x, and can't use raw queries for this.
Is there a way to get a QuerySet grouped by weekday, for a QuerySet that uses a date __range filter? I'm trying to group results by weekday, for a query that ranges between any two dates (could be as much as a year apart). I know how to get rows that match a weekday, but that would require pounding the DB with 7 queries just to find out the data for each weekday.
I've been trying to figure this out for a couple of hours by trying different tweaks with the __week_day filter, but nothing's working. Even Googling doesn't help, which makes me wonder if this is even possible. Do any Django gurus here know how to do this, if it is possible?
Since extra is deprecated, here is a newer way of grouping on the day of the week, using ExtractWeekDay.
from django.db.models import Count
from django.db.models.functions import ExtractWeekDay

results = (
    YourObjects.objects
    .annotate(weekday=ExtractWeekDay('timestamp'))
    .values('weekday')
    .annotate(count=Count('id'))
    .values('weekday', 'count')
)
This will return a result like:
[{'weekday': 1, 'count': 534}, {'weekday': 2, 'count': 574}, ...]
It is also important to note that 1 = Sunday and 7 = Saturday.
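If the results need to line up with Python's own `weekday()` numbering (0 = Monday, 6 = Sunday), the two scales can be converted with a little modular arithmetic. The helper names below are made up for this sketch:

```python
import datetime

def django_week_day_to_python(week_day):
    """Map 1=Sunday..7=Saturday onto Python's weekday() scale, 0=Monday..6=Sunday."""
    return (week_day + 5) % 7

def python_weekday_to_django(weekday):
    """Inverse mapping: 0=Monday..6=Sunday onto 1=Sunday..7=Saturday."""
    return (weekday + 1) % 7 + 1

sunday = datetime.date(2013, 6, 23)  # a Sunday
sunday_as_week_day = python_weekday_to_django(sunday.weekday())
```

The first function is the same shift as the MOD(dayofweek(date)+5,7) expression used in the MySQL answer further down.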
I wrote a snippet that returns all the records from the beginning of the current week (Monday) until today.
For example, if you have a model like this in your app:
from django.db import models

class x(models.Model):
    date = models.DateField()
Then:
from datetime import datetime, timedelta
from myapp.models import x

start_date = datetime.date(datetime.now())
# isoweekday() is 1 for Monday, so this many days have passed since Monday
days_quited = start_date.isoweekday() - 1
# subtract with timedelta so the result stays valid across month boundaries
week_begin = start_date - timedelta(days=days_quited)
records = x.objects.filter(date__range=(week_begin, datetime.date(datetime.now())))
If you add some records in the admin with dates between June 17 (Monday) and June 22 (today), you will see all those records; if you add records dated tomorrow or next Monday, for example, you will not see them.
If you want the records from another week until now, you only have to set the start date like this and recompute week_begin from it:
start_date = datetime.date(datetime(year, month, day))
records = x.objects.filter(date__range=(week_begin, datetime.date(datetime.now())))
Hope this helps! :D
You need to add an extra weekday field to the selection, then group by that in the sum or average aggregation. Note that this becomes a database-specific query, because the 'extra' notation is passed through to the DB select statement.
Given the model:
class x(models.Model):
    date = models.DateField()
    value = models.FloatField()
Then, for MySQL, with a mapping of the ODBC weekday to the Python datetime weekday:
x.objects.extra(select={'weekday':"MOD(dayofweek(date)+5,7)"}).values('weekday').annotate(weekday_value=Avg('value'), weekday_value_std=StdDev('value'))
Note that if you do not need to convert the MySQL ODBC weekday (1 = Sunday, 2 = Monday...) to the Python weekday (Monday is 0 and Sunday is 6), then you do not need the modulo.
For a model like this:
class A(models.Model):
    date = models.DateField()
    value = models.FloatField()
You can use this query (strftime is SQLite's date function, so this is database-specific):
weekday = {"w": """strftime('%%w', date)"""}
qs = A.objects.extra(select=weekday).values('w').annotate(stat=Sum("value")).order_by()
