I am creating an application in FastAPI and using Tortoise-ORM as the ORM. I have the following model:
from tortoise import fields

from app.models.base_class import Base


class Announcement(Base):
    name = fields.CharField(max_length=64, null=False)
    description = fields.TextField()
    date = fields.DatetimeField(auto_now=True)
    # ORM relationship between Announcement and User entity
    user = fields.ForeignKeyField(
        "models.User",
        related_name="announcements",
        on_delete=fields.CASCADE
    )
I need to get all the announcements for the current day. The problem is that my date field is a datetime and I want to filter by day, ignoring the time. How can I do this with Tortoise-ORM? Something like this:
async def get_today_announcement(self):
    today = datetime.datetime.now().date()
    return await self.model.filter(date=today).all()
(The above does not work since it returns an empty list when the hours do not match).
This is what worked for me:
async def get_today_announcement(self):
    today = datetime.datetime.now()
    return await self.model.filter(
        date__year=today.year,
        date__month=today.month,
        date__day=today.day
    ).all()
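An alternative to matching year, month and day separately is to filter on a half-open time range that covers the whole day. The sketch below only computes the bounds with the standard library. `day_bounds` is a hypothetical helper name, and the code assumes naive datetimes (with timezone-aware datetimes the bounds would need the matching tzinfo). The trailing comment shows how the bounds might be passed to Tortoise-ORM's `__gte` and `__lt` filters.

```python
import datetime


def day_bounds(day):
    """Return (start, end) datetimes such that start <= t < end covers `day`."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return start, end


start, end = day_bounds(datetime.date(2024, 3, 15))
print(start, end)

# Assumed usage inside the method from the question:
#     start, end = day_bounds(datetime.date.today())
#     return await self.model.filter(date__gte=start, date__lt=end).all()
```

A range comparison like this can usually take advantage of an index on the date column, which per-component lookups may not.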
In a Django project, I have these simplified models defined:
class People(models.Model):
    name = models.CharField(max_length=96)


class Event(models.Model):
    name = models.CharField(verbose_name='Nom', max_length=96)
    date_start = models.DateField()
    date_end = models.DateField()
    participants = models.ManyToManyField(to='People', through='Participation')


class Participation(models.Model):
    """Represent the participation of 1 people to 1 event, with information about arrival date and departure date"""
    people = models.ForeignKey(to=People, on_delete=models.CASCADE)
    event = models.ForeignKey(to=Event, on_delete=models.CASCADE)
    arrival_d = models.DateField(blank=True, null=True)
    departure_d = models.DateField(blank=True, null=True)
Now I need to generate a participation graph: for each day of the event, I want the corresponding total number of participations.
Currently, I use this awful code:
def daterange(start, end, include_last_day=False):
    """Return a generator for each date between start and end"""
    days = int((end - start).days)
    if include_last_day:
        days += 1
    for n in range(days):
        yield start + timedelta(n)


class ParticipationGraph(DetailView):
    template_name = 'events/participation_graph.html'
    model = Event

    def get_context_data(self, **kwargs):
        labels = []
        data = []
        for d in daterange(self.object.date_start, self.object.date_end):
            labels.append(formats.date_format(d, 'd/m/Y'))
            total_participation = self.object.participation_set \
                .filter(arrival_d__lte=d, departure_d__gte=d).count()
            data.append(total_participation)
        kwargs.update({
            'labels': labels,
            'data': data,
        })
        return super(ParticipationGraph, self).get_context_data(**kwargs)
Obviously, I run a new SQL query for each day between Event.date_start and Event.date_end. Is there a way to get the same result with fewer SQL queries (ideally, only one)?
I tried many aggregation tools from the Django ORM (values(), distinct(), etc.) but I always run into the same issue: I don't have a field with a single date value. I only have start and end dates (in Event) and arrival and departure dates (in Participation), so I can't find a way to group my results by date.
I agree that the current approach is expensive because, for each day, you are re-querying the DB for participants that you already retrieved earlier. I would instead approach this by doing a one-time query to the DB to get the participants and then use that data to populate your result data structure.
One structural change I would make to your solution: instead of tracking two lists where each index corresponds to a day and its participation count, aggregate the data in a dictionary mapping each day to the number of participants. If we aggregate results this way, we can always convert them to the two lists at the end if needed.
Here is my general (pseudocode-ish) approach:
def format_date(d):
    return formats.date_format(d, 'd/m/Y')


def get_context_data(self, **kwargs):
    # initialize the results with the dates in question
    result = {}
    for d in daterange(self.object.date_start, self.object.date_end):
        result[format_date(d)] = 0
    # for each participation, add 1 to each date that the person is there;
    # .all() fetches every participation in a single query
    # (assumes arrival_d and departure_d are both set)
    for participation in self.object.participation_set.all():
        # include the departure day, matching departure_d__gte=d in the question
        for d in daterange(participation.arrival_d, participation.departure_d,
                           include_last_day=True):
            key = format_date(d)
            if key in result:  # skip days outside the event's range
                result[key] += 1
    # if needed, convert result to the two-list format here
    kwargs.update({
        'participation_amounts': result
    })
    return super(ParticipationGraph, self).get_context_data(**kwargs)
In terms of performance, both approaches do the same number of operations. In your approach, for every day d, you filter over every participant p, so the number of operations is O(dp). In my approach, for each participant I go through every day they attended (in the worst case, every day d), so it is also O(dp).
The reason to prefer my approach is the one you pointed out: it hits the database only once to retrieve the participation list, so it is less dependent on network latency. It does sacrifice some of the performance benefits that SQL queries have over Python code. However, the Python code is not too complex and should be fairly quick to run, even for events with hundreds of thousands of people.
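To try the counting idea without a Django project, here is a minimal, self-contained sketch. The `(arrival, departure)` tuples are stand-ins for the rows a single `participation_set.all()` query would return, `count_per_day` is a hypothetical name, and the event's last day is included here, which is an assumption rather than something taken from the question's code.

```python
from datetime import date, timedelta


def daterange(start, end, include_last_day=False):
    """Yield each date from start up to end (end excluded unless requested)."""
    days = (end - start).days + (1 if include_last_day else 0)
    for n in range(days):
        yield start + timedelta(n)


def count_per_day(event_start, event_end, stays):
    """Map each event day to the number of stays that cover it."""
    counts = {d: 0 for d in daterange(event_start, event_end, include_last_day=True)}
    for arrival, departure in stays:
        for d in daterange(arrival, departure, include_last_day=True):
            if d in counts:  # ignore days outside the event
                counts[d] += 1
    return counts


stays = [
    (date(2018, 5, 23), date(2018, 5, 25)),
    (date(2018, 5, 24), date(2018, 5, 24)),
]
counts = count_per_day(date(2018, 5, 23), date(2018, 5, 26), stays)
# Convert the dictionary to the two lists the chart expects.
labels = [d.strftime('%d/%m/%Y') for d in counts]
data = list(counts.values())
print(labels, data)
```

Because dictionaries keep insertion order, the labels and data lists come out in chronological order.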
I saw this question a few days ago and upvoted it, since it is really well written and the problem is very interesting. I finally found some time to work on a solution.
Django uses a variation of Model-View-Controller called Model-Template-View. My approach therefore follows the paradigm "fat models and thin controllers" (or, in Django terms, "fat models and thin views").
Here is how I would rewrite the models:
import pandas
from django.db import models
from django.utils.functional import cached_property


class Person(models.Model):
    name = models.CharField(max_length=96)


class Event(models.Model):
    name = models.CharField(verbose_name='Nom', max_length=96)
    date_start = models.DateField()
    date_end = models.DateField()
    participants = models.ManyToManyField(to='Person', through='Participation')

    @cached_property
    def days(self):
        days = pandas.date_range(self.date_start, self.date_end).tolist()
        return [day.date() for day in days]

    @cached_property
    def number_of_participants_per_day(self):
        number_of_participants = []
        participations = self.participation_set.all()
        for day in self.days:
            count = len([par for par in participations if day in par.days])
            number_of_participants.append((day, count))
        return number_of_participants


class Participation(models.Model):
    people = models.ForeignKey(to=Person, on_delete=models.CASCADE)
    event = models.ForeignKey(to=Event, on_delete=models.CASCADE)
    arrival_d = models.DateField(blank=True, null=True)
    departure_d = models.DateField(blank=True, null=True)

    @cached_property
    def days(self):
        days = pandas.date_range(self.arrival_d, self.departure_d).tolist()
        return [day.date() for day in days]
All calculations are placed in the models. Information that depends on the data stored in the database is made available as cached_property.
Let's see an example for Event:
djangocon = Event.objects.create(
    name='DjangoCon Europe 2018',
    date_start=date(2018,5,23),
    date_end=date(2018,5,28)
)
djangocon.days
>>> [datetime.date(2018, 5, 23),
     datetime.date(2018, 5, 24),
     datetime.date(2018, 5, 25),
     datetime.date(2018, 5, 26),
     datetime.date(2018, 5, 27),
     datetime.date(2018, 5, 28)]
I used pandas to generate the date range, which is probably overkill for your application, but it has a nice syntax and works well for demonstration purposes. You can generate the date range in your own way.
Getting this result took only one query. The days property is available like any other field.
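As one way to generate the range without pandas, the sketch below uses only the standard library. `inclusive_days` is a hypothetical helper name, and it assumes both dates are set and that the end date is not before the start date.

```python
from datetime import date, timedelta


def inclusive_days(start, end):
    """Return every date from start to end, both ends included."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


days = inclusive_days(date(2018, 5, 23), date(2018, 5, 28))
print(days[0], days[-1], len(days))
```

The body of the days property could return such a list directly instead of converting pandas timestamps.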
I did the same thing in Participation. Here are some examples:
antwane = Person.objects.create(name='Antwane')
rohan = Person.objects.create(name='Rohan Varma')
cezar = Person.objects.create(name='cezar')
They all want to visit DjangoCon Europe in 2018, but not all of them are attending all days:
p1 = Participation.objects.create(
    people=antwane,
    event=djangocon,
    arrival_d=date(2018,5,23),
    departure_d=date(2018,5,28)
)
p2 = Participation.objects.create(
    people=rohan,
    event=djangocon,
    arrival_d=date(2018,5,23),
    departure_d=date(2018,5,26)
)
p3 = Participation.objects.create(
    people=cezar,
    event=djangocon,
    arrival_d=date(2018,5,25),
    departure_d=date(2018,5,28)
)
Now we want to see how many participants there are for every day the event is going on. We track the number of SQL queries too.
from django.db import connection
djangocon = Event.objects.get(pk=1)
djangocon.number_of_participants_per_day
>>> [(datetime.date(2018, 5, 23), 2),
(datetime.date(2018, 5, 24), 2),
(datetime.date(2018, 5, 25), 3),
(datetime.date(2018, 5, 26), 3),
(datetime.date(2018, 5, 27), 2),
(datetime.date(2018, 5, 28), 2)]
connection.queries
>>>[{'time': '0.000', 'sql': 'SELECT "participants_event"."id", "participants_event"."name", "participants_event"."date_start", "participants_event"."date_end" FROM "participants_event" WHERE "participants_event"."id" = 1'},
{'time': '0.000', 'sql': 'SELECT "participants_participation"."id", "participants_participation"."people_id", "participants_participation"."event_id", "participants_participation"."arrival_d", "participants_participation"."departure_d" FROM "participants_participation" WHERE "participants_participation"."event_id" = 1'}]
There are two queries. The first one fetches the Event object and the second fetches the participations for that event; the per-day counts are then computed in Python.
Now it's up to you to use it in your views as you please. And thanks to the cached properties you won't need to repeat the database query to get the result.
You can follow the same principle and, for example, add a property that lists all participants for each day of an event. It could look like this:
class Event(models.Model):
    # ... snip ...

    @cached_property
    def participants_per_day(self):
        participants = []
        participations = self.participation_set.all().select_related('people')
        for day in self.days:
            people = [par.people for par in participations if day in par.days]
            participants.append((day, people))
        return participants

    # refactor the number of participants per day
    @cached_property
    def number_of_participants_per_day(self):
        return [(day, len(people)) for day, people in self.participants_per_day]
I hope you like this solution.
I have a simple method. Entries are entries in a time sheet application where employees enter their hours.
class Entry(m.Model):
    """ Represents an entry in a time_sheet. An entry is either for work, sick leave or holiday. """
    # type choices
    WORK = 'w'
    SICK = 's'
    VACATION = 'v'
    type_choices = (
        (WORK, 'work'),
        (SICK, 'sick leave'),
        (VACATION, 'vacation'),
    )
    # meta
    cr_date = m.DateTimeField(auto_now_add=True, editable=False, verbose_name='Date of Creation')  # date of creation
    owner = m.ForeignKey(User, editable=False, on_delete=m.PROTECT)
    # content
    type = m.CharField(max_length=1, choices=type_choices, default='w')
    day = m.DateField(default=now)
    start = m.TimeField(blank=True)  # starting time
    end = m.TimeField(blank=True)  # ending time
    recess = m.IntegerField()  # recess time in minutes
    project = m.ForeignKey(Project, on_delete=m.PROTECT)

    @classmethod
    def get_entries_for_day(cls, user, day):
        """ Retrieves any entries for the supplied day. """
        return Entry.objects.filter(day__date=day, owner=user).order_by('start')
However, when I try to run my project like this, it fails with the following error:
"Unsupported lookup 'date' for DateField or join on the field not permitted."
I don't quite understand the message. The specified field is a date field which has no further restrictions. Any hints would be appreciated.
There's no such thing as a __date lookup on a DateField; the field is already a date.
It's not clear what you are trying to compare this field with. Is the day you are passing into that method an integer, or a date? If it's also a date then you should just compare them directly.
I faced a related issue with django-filter: the filter did not work as expected when the start and end of the date range were the same day. So I used date__gte and date__lte as the lookup_expr, something like this:
from_date = django_filters.DateFilter(field_name="created_at", lookup_expr='date__gte')
to_date = django_filters.DateFilter(field_name="created_at", lookup_expr='date__lte')
I have the following model for a Measurement belonging to a user:
class Measurement(models.Model):
    user = models.ForeignKey(User, related_name='measurements')
    timestamp = models.DateTimeField(default=timezone.now, db_index=True)

    def get_date(self):
        return self.timestamp.date()
And for one of my views I need the measurements grouped by day. I have done the following:
from operator import itemgetter


def group_measurements_by_days(user_id):
    measurements = Measurement.objects.filter(user=user_id)
    daysd = {}
    for measurement in measurements:
        day = measurement.get_date()
        if day not in daysd:
            daysd[day] = []
        mymeas = daysd[day]
        mymeas.append(measurement)
    days = [{'day': day, 'measurements': measurements} for day, measurements in daysd.items()]
    days = sorted(days, key=itemgetter('day'), reverse=True)
    return days
So that I get a list like:
[
{ 'day': date1, 'measurements': [ Measurement, Measurement, ...]},
{ 'day': date2, 'measurements': [ Measurement, Measurement, ...]},
...
]
But this is completely wrong, because I have a mixture of plain python types and django ORM types. This works when I use this data in templates, but is giving me trouble in other areas.
For example, when trying to reuse this code for a django rest api, I suddenly am unable to serialize those objects.
How can I process the data coming from the django ORM (Measurement.objects.filter(user=user_id)), but still keep a consistent format for my API?
Hopefully you are able to use Django 1.8. The problem is solved in this post; credit goes there.
You can group the queryset by date:
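from django.db.models import Count
from django.db.models.expressions import Date

# Date is tied to older Django releases such as 1.8; on newer releases,
# django.db.models.functions.TruncDate serves this purpose instead.
queryset = (
    Measurement.objects.filter(user=user_id)
    .annotate(day=Date('timestamp'), measurements=Count(0))
    .values('day', 'measurements')
)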
How about using this idea ...
mymeas.append(measurement.__dict__)
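On the serialization side of the question, one option is to reduce each measurement to plain values before grouping, so the result contains only built-in types. The sketch below is a minimal illustration, not a Django REST framework serializer: `group_by_day` is a hypothetical name, and the `rows` list stands in for what `Measurement.objects.filter(user=user_id).values('id', 'timestamp')` would yield.

```python
import json
from collections import defaultdict
from datetime import datetime


def group_by_day(rows):
    """Group rows (dicts with 'id' and 'timestamp') into plain, JSON-ready dicts."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row['timestamp'].date()].append(
            {'id': row['id'], 'timestamp': row['timestamp'].isoformat()}
        )
    # Newest day first, mirroring the ordering in the question.
    return [
        {'day': day.isoformat(), 'measurements': by_day[day]}
        for day in sorted(by_day, reverse=True)
    ]


# Stand-in data; in the real view these rows would come from the ORM.
rows = [
    {'id': 1, 'timestamp': datetime(2015, 6, 1, 9, 30)},
    {'id': 2, 'timestamp': datetime(2015, 6, 2, 8, 0)},
    {'id': 3, 'timestamp': datetime(2015, 6, 1, 17, 45)},
]
print(json.dumps(group_by_day(rows), indent=2))
```

Since every value is a string, number, list or dict, the same structure can be rendered in a template or returned from an API without further conversion.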
I have a model for image uploads, that looks something like this:
from django.db import models
from django.contrib.auth.models import User
import datetime


class ImageItem(models.Model):
    user = models.ForeignKey(User)
    upload_date = models.DateTimeField(auto_now_add = True)
    last_modified = models.DateTimeField(auto_now = True)
    original_img = models.ImageField(upload_to = img_get_file_path)
I want to query all instances of ImageItem that belong to a particular user, and group them according to date uploaded. For example, for some user, I want a group for April 9 2013, another for April 12 2013, etc. (assuming that they uploaded one or more images on those dates).
I'm thinking I run a simple query, like,
joes_images = ImageItem.objects.filter(user__username='joe')
But then how could I group them by day published? (assuming he did not publish every day, only on some days)
The function would have to return all the groups of images.
Why don't you do the following?
joes_images = ImageItem.objects.filter(user__username='joe')  # your code
# set of (year, month, day) tuples taken from upload_date
upload_dates = set([(i.upload_date.year, i.upload_date.month, i.upload_date.day) for i in joes_images])
joes_images_separated = dict([(d, []) for d in upload_dates])
for d in upload_dates:
    for i in joes_images:
        if (i.upload_date.year, i.upload_date.month, i.upload_date.day) == d:
            joes_images_separated[d].append(i)
Here, upload_dates is the set of upload dates found in joes_images, and joes_images_separated is a dict whose keys are dates and whose values are the lists of images uploaded on each date.
Sorry for the slightly messy code, but I think it works.
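The same grouping can also be done in a single pass over the images. This is a minimal sketch rather than the answer's exact method: `FakeImage` is a hypothetical stand-in for ImageItem so the example runs without Django, and `group_by_upload_day` is an assumed name.

```python
from collections import defaultdict, namedtuple
from datetime import datetime

# Hypothetical stand-in for ImageItem; only upload_date matters here.
FakeImage = namedtuple('FakeImage', ['name', 'upload_date'])


def group_by_upload_day(images):
    """Return {date: [images uploaded that day]} in a single pass."""
    groups = defaultdict(list)
    for image in images:
        groups[image.upload_date.date()].append(image)
    return dict(groups)


images = [
    FakeImage('a.png', datetime(2013, 4, 9, 10, 0)),
    FakeImage('b.png', datetime(2013, 4, 12, 8, 30)),
    FakeImage('c.png', datetime(2013, 4, 9, 18, 15)),
]
for day, items in sorted(group_by_upload_day(images).items()):
    print(day, [item.name for item in items])
```

With the real model, the queryset joes_images could be passed in place of the images list, since only the upload_date attribute is read.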