I am building an application in FastAPI and using Tortoise-ORM as the ORM. I have the following model:
from tortoise import fields
from app.models.base_class import Base
class Announcement(Base):
    name = fields.CharField(max_length=64, null=False)
    description = fields.TextField()
    date = fields.DatetimeField(auto_now=True)
    # ORM relationship between Announcement and User entity
    user = fields.ForeignKeyField(
        "models.User",
        related_name="announcements",
        on_delete=fields.CASCADE
    )
I need to get all the "announcements" of the current day. The problem is that my date field is a datetime and I want to filter by day only, ignoring the time. How can I do this with Tortoise-ORM? Something like this:
async def get_today_announcement(self):
    today = datetime.datetime.now().date()
    return await self.model.filter(date=today).all()
(The above does not work: it returns an empty list because the stored values include a time of day, so they never equal a bare date.)
This is what worked for me:
async def get_today_announcement(self):
    today = datetime.datetime.now()
    return await self.model.filter(
        date__year=today.year,
        date__month=today.month,
        date__day=today.day
    ).all()
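Another option is to filter on a half-open datetime range that covers the whole day. The following is a minimal sketch, not a definitive implementation: day_bounds is a hypothetical helper, it assumes naive datetimes (adjust if your Tortoise setup stores timezone-aware values), and the Tortoise call is shown only as a comment using the __gte and __lt lookups.

```python
import datetime


def day_bounds(day):
    """Return (start, end) datetimes covering `day` as the half-open interval [start, end)."""
    start = datetime.datetime.combine(day, datetime.time.min)  # 00:00:00 of that day
    end = start + datetime.timedelta(days=1)                   # 00:00:00 of the next day
    return start, end


# Hypothetical use inside the method from the question:
#     start, end = day_bounds(datetime.date.today())
#     return await self.model.filter(date__gte=start, date__lt=end).all()
```

A range comparison on the raw column may also let the database use an index on date, which the year/month/day lookups might not.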
In a Django project, I have these simplified models defined:
class People(models.Model):
    name = models.CharField(max_length=96)
class Event(models.Model):
    name = models.CharField(verbose_name='Nom', max_length=96)
    date_start = models.DateField()
    date_end = models.DateField()
    participants = models.ManyToManyField(to='People', through='Participation')
class Participation(models.Model):
    """Represent the participation of 1 people to 1 event, with information about arrival date and departure date"""
    people = models.ForeignKey(to=People, on_delete=models.CASCADE)
    event = models.ForeignKey(to=Event, on_delete=models.CASCADE)
    arrival_d = models.DateField(blank=True, null=True)
    departure_d = models.DateField(blank=True, null=True)
Now I need to generate a participation graph: for each day of the event, I want the corresponding total number of participations.
Currently, I use this awful code:
from datetime import timedelta
from django.utils import formats
from django.views.generic import DetailView
def daterange(start, end, include_last_day=False):
    """Return a generator for each date between start and end"""
    days = int((end - start).days)
    if include_last_day:
        days += 1
    for n in range(days):
        yield start + timedelta(n)
class ParticipationGraph(DetailView):
    template_name = 'events/participation_graph.html'
    model = Event
    def get_context_data(self, **kwargs):
        labels = []
        data = []
        for d in daterange(self.object.date_start, self.object.date_end):
            labels.append(formats.date_format(d, 'd/m/Y'))
            # one COUNT query per day of the event
            total_participation = (
                self.object.participation_set
                .filter(arrival_d__lte=d, departure_d__gte=d)
                .count()
            )
            data.append(total_participation)
        kwargs.update({
            'labels': labels,
            'data': data,
        })
        return super(ParticipationGraph, self).get_context_data(**kwargs)
Obviously, this runs a new SQL query for each day between Event.date_start and Event.date_end. Is there a way to get the same result with fewer SQL queries (ideally, only one)?
I tried many aggregation tools from the Django ORM (values(), distinct(), etc.) but I always run into the same issue: I don't have a field with a single date value. I only have start and end dates (in Event) and arrival and departure dates (in Participation), so I can't find a way to group my results by date.
I agree that the current approach is expensive because, for each day, you are re-querying the DB for participants that you already retrieved earlier. I would instead approach this by doing a one-time query to the DB to get the participants and then use that data to populate your result data structure.
One structural change I would make to your solution: instead of tracking two parallel lists, where each index corresponds to a day and its participation count, aggregate the data in a dictionary that maps each day to the number of participants. Aggregated this way, the result can always be converted to the two lists at the end if needed.
Here is my general approach (closer to pseudocode than finished code):
def formatDate(d):
    return formats.date_format(d, 'd/m/Y')
def get_context_data(self, **kwargs):
    # initialize the results with dates in question
    result = {}
    for d in daterange(self.object.date_start, self.object.date_end):
        result[formatDate(d)] = 0
    # one query: fetch every participation for this event
    for participation in self.object.participation_set.all():
        # both date fields are nullable, so skip incomplete rows
        if participation.arrival_d is None or participation.departure_d is None:
            continue
        # add 1 to each date that they are there, departure day included
        for d in daterange(participation.arrival_d, participation.departure_d, include_last_day=True):
            key = formatDate(d)
            if key in result:  # ignore days outside the event range
                result[key] += 1
    # if needed, convert result to appropriate two-list format here
    kwargs.update({
        'participation_amounts': result
    })
    return super(ParticipationGraph, self).get_context_data(**kwargs)
In terms of performance, both approaches do the same number of operations. In your approach, for every day d, you filter over every participant p, so the number of operations is O(dp). In my approach, for each participant I go through every day they attended (in the worst case, every day d), so it is also O(dp).
The reason to prefer my approach is the one you pointed out: it hits the database only once to retrieve the participant list, so it is less dependent on network latency. It does sacrifice some of the performance benefits that SQL queries have over Python code. However, the Python code is not complex and should be fairly easy to process, even for events with hundreds of thousands of people.
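If the O(dp) loop ever becomes a concern, the per-day totals can also be computed in O(d + p) with a difference array once the participations are loaded. This is a minimal sketch in plain Python and not part of the approach above: count_per_day is a hypothetical helper, stays stands in for the (arrival_d, departure_d) pairs returned by the single query, and it assumes both the event range and each stay are inclusive at both ends.

```python
from datetime import date, timedelta


def count_per_day(event_start, event_end, stays):
    """Return [(day, participants_present)] for every day of the event."""
    n_days = (event_end - event_start).days + 1
    delta = [0] * (n_days + 1)  # delta[i] = change in head count at day i
    for arrival, departure in stays:
        if arrival is None or departure is None:
            continue  # nullable fields: skip incomplete rows
        # clamp the stay to the event range
        first = max((arrival - event_start).days, 0)
        last = min((departure - event_start).days, n_days - 1)
        if first > last:
            continue  # stay lies entirely outside the event
        delta[first] += 1
        delta[last + 1] -= 1
    result, present = [], 0
    for i in range(n_days):
        present += delta[i]  # running sum gives the head count for day i
        result.append((event_start + timedelta(days=i), present))
    return result
```

In a Django view the pairs could come from something like participation_set.values_list('arrival_d', 'departure_d'), which still costs one query.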
I saw this question a few days ago and upvoted it, since it is really well written and the problem is very interesting. I finally found some time to work on a solution.
Django follows a variation of Model-View-Controller called Model-Template-View. My approach therefore follows the "fat models, thin controllers" paradigm (or, in Django terms, "fat models, thin views").
Here is how I would rewrite the models:
import pandas
from django.db import models
from django.utils.functional import cached_property
class Person(models.Model):
    name = models.CharField(max_length=96)
class Event(models.Model):
    name = models.CharField(verbose_name='Nom', max_length=96)
    date_start = models.DateField()
    date_end = models.DateField()
    participants = models.ManyToManyField(to='Person', through='Participation')
    @cached_property
    def days(self):
        days = pandas.date_range(self.date_start, self.date_end).tolist()
        return [day.date() for day in days]
    @cached_property
    def number_of_participants_per_day(self):
        number_of_participants = []
        participations = self.participation_set.all()
        for day in self.days:
            count = len([par for par in participations if day in par.days])
            number_of_participants.append((day, count))
        return number_of_participants
class Participation(models.Model):
    people = models.ForeignKey(to=Person, on_delete=models.CASCADE)
    event = models.ForeignKey(to=Event, on_delete=models.CASCADE)
    arrival_d = models.DateField(blank=True, null=True)
    departure_d = models.DateField(blank=True, null=True)
    @cached_property
    def days(self):
        days = pandas.date_range(self.arrival_d, self.departure_d).tolist()
        return [day.date() for day in days]
All calculations are placed in the models. Information that depends on the data stored in the database is made available as cached_property.
Let's see an example for Event:
djangocon = Event.objects.create(
    name='DjangoCon Europe 2018',
    date_start=date(2018,5,23),
    date_end=date(2018,5,28)
)
djangocon.days
>>> [datetime.date(2018, 5, 23),
     datetime.date(2018, 5, 24),
     datetime.date(2018, 5, 25),
     datetime.date(2018, 5, 26),
     datetime.date(2018, 5, 27),
     datetime.date(2018, 5, 28)]
I used pandas to generate the date range, which is probably overkill for your application, but it has a nice syntax and works well for demonstration purposes. You can generate the date range in your own way.
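For instance, the same inclusive range can be built with the standard library alone. This is a small sketch; inclusive_days is a hypothetical helper name, not something from the models above.

```python
from datetime import date, timedelta


def inclusive_days(start, end):
    """Return every date from start to end, both ends included."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]
```

The body of either days property could then return inclusive_days(...) for its own start and end fields instead of calling pandas.date_range.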
Only one query was needed to get this result, and days can be accessed like any other field.
I did the same in Participation; here are some examples:
antwane = Person.objects.create(name='Antwane')
rohan = Person.objects.create(name='Rohan Varma')
cezar = Person.objects.create(name='cezar')
They all want to visit DjangoCon Europe in 2018, but not all of them are attending all days:
p1 = Participation.objects.create(
    people=antwane,
    event=djangocon,
    arrival_d=date(2018,5,23),
    departure_d=date(2018,5,28)
)
p2 = Participation.objects.create(
    people=rohan,
    event=djangocon,
    arrival_d=date(2018,5,23),
    departure_d=date(2018,5,26)
)
p3 = Participation.objects.create(
    people=cezar,
    event=djangocon,
    arrival_d=date(2018,5,25),
    departure_d=date(2018,5,28)
)
Now we want to see how many participants there are for every day the event is going on. We track the number of SQL queries too.
from django.db import connection
djangocon = Event.objects.get(pk=1)
djangocon.number_of_participants_per_day
>>> [(datetime.date(2018, 5, 23), 2),
(datetime.date(2018, 5, 24), 2),
(datetime.date(2018, 5, 25), 3),
(datetime.date(2018, 5, 26), 3),
(datetime.date(2018, 5, 27), 2),
(datetime.date(2018, 5, 28), 2)]
connection.queries
>>>[{'time': '0.000', 'sql': 'SELECT "participants_event"."id", "participants_event"."name", "participants_event"."date_start", "participants_event"."date_end" FROM "participants_event" WHERE "participants_event"."id" = 1'},
{'time': '0.000', 'sql': 'SELECT "participants_participation"."id", "participants_participation"."people_id", "participants_participation"."event_id", "participants_participation"."arrival_d", "participants_participation"."departure_d" FROM "participants_participation" WHERE "participants_participation"."event_id" = 1'}]
There are two queries. The first one fetches the Event object and the second fetches its participations; the per-day counts are then computed in Python.
Now it's up to you to use it in your views as you please. Thanks to the cached properties, you won't need to repeat the database query to get the result.
You can follow the same principle and, for example, add a property that lists all participants for each day of an event. It could look like this:
class Event(models.Model):
    # ... snip ...
    @cached_property
    def participants_per_day(self):
        participants = []
        participations = self.participation_set.all().select_related('people')
        for day in self.days:
            people = [par.people for par in participations if day in par.days]
            participants.append((day, people))
        return participants
    # refactor the number of participants per day
    @cached_property
    def number_of_participants_per_day(self):
        return [(day, len(people)) for day, people in self.participants_per_day]
I hope you like this solution.
I am creating an app in which only one entity can be created per day. Here is the model:
class MyModel(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)
Since only one entity is allowed to be created per day, we will need to compare the MyModel.created property to today's date:
import datetime
class CreateEntity(webapp2.RequestHandler):
    def get(self):
        today = datetime.datetime.today()
        my_model = MyModel.query(MyModel.created == today).get()
        if my_model:
            pass  # print("Today's entity already exists")
        else:
            pass  # create today's new entity
The problem is that I cannot compare the two dates like this. How can I check if an entity was already created 'today'?
I ended up changing the property from DateTimeProperty to DateProperty. Now I am able to do this:
today_date = datetime.datetime.today().date()
today_entity = MyModel.query(MyModel.created == today_date).get()
You are comparing a DateTime object with a Date object.
Instead of
my_model = MyModel.query(MyModel.created == today).get()
use
my_model = MyModel.query(MyModel.created.date() == today).get()
It seems the only solution is to use a "range" query; here is a relevant answer: https://stackoverflow.com/a/14963648/762270
You can't query the created property using == since you don't actually know the exact creation datetime (which is what you'll find in created due to the auto_now_add=True option).
But you could query for the most recently created entity and check if its creation datetime is today. Something along these lines:
class CreateEntity(webapp2.RequestHandler):
    def get(self):
        now = datetime.datetime.utcnow()
        # get most recently created one:
        entity_list = MyModel.query().order(-MyModel.created).fetch(limit=1)
        entity = entity_list[0] if entity_list else None
        if entity and entity.created.year == now.year and \
                entity.created.month == now.month and \
                entity.created.day == now.day:
            pass  # print("Today's entity already exists")
        else:
            pass  # create today's new entity
Or you could compute the datetime for today at 00:00:00 and query for entities whose created is greater than or equal to it.
Or you could drop the auto_now_add=True option and explicitly set created to a specific time of the day (say midnight exactly) and then you can query for the datetime matching that time of day today.
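The midnight-based option could be sketched as follows. start_of_day is a hypothetical helper, and the ndb query appears only as a comment; it assumes created holds naive UTC datetimes, so the comparison value is built from utcnow() as in the handler above.

```python
import datetime


def start_of_day(now):
    """Return midnight (00:00:00) of the day that contains `now`."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical use inside the request handler:
#     midnight = start_of_day(datetime.datetime.utcnow())
#     exists = MyModel.query(MyModel.created >= midnight).get() is not None
```

Note that "today" here means the UTC day; a local-time day would need the bound converted to UTC first.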
Using a range query to look up a single, specific, known value is overkill and expensive. I would use one of these two solutions:
1 - Extra Property
Sacrifice a little space with an extra property, though since it's one per day, it shouldn't be a big deal.
import logging
from datetime import datetime
import webapp2
from google.appengine.ext import ndb
class MyModel(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)
    date = ndb.StringProperty()
    def _pre_put_hook(self):
        # store the day as a YYYYMMDD string so it can be matched with ==
        self.date = datetime.today().strftime("%Y%m%d")
class CreateEntity(webapp2.RequestHandler):
    def get(self):
        today = datetime.today().strftime("%Y%m%d")
        my_model = MyModel.query(MyModel.date == today).get()
        if my_model:
            logging.info("Today's entity already exists")
        else:
            # MyModel.date gets set automatically by _pre_put_hook
            my_model = MyModel()
            my_model.put()
            logging.info("create today's new entity")
2 - Use [today] as Entity ID (preferred)
I would rather use today's date as the ID of the entity; that is the fastest and cheapest way to retrieve the entity later. It could also be combined with something else, e.g. ID=<userid+today>, in case the entity is per user, or you could add the user as a parent (ancestor). So it would be something like this:
import logging
from datetime import datetime
import webapp2
from google.appengine.ext import ndb
class MyModel(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)
class CreateEntity(webapp2.RequestHandler):
    def get(self):
        # the YYYYMMDD string is the entity ID, so lookup is a direct get
        today = datetime.today().strftime("%Y%m%d")
        my_model = MyModel.get_by_id(today)
        if my_model:
            logging.info("Today's entity already exists")
        else:
            my_model = MyModel(id=today)
            my_model.put()
            logging.info("create today's new entity")
I have a simple model method. An Entry is a record in a time sheet application where employees enter their hours.
class Entry(m.Model):
    """ Represents an entry in a time_sheet. An entry is either for work, sick leave or holiday. """
    # type choices
    WORK = 'w'
    SICK = 's'
    VACATION = 'v'
    type_choices = (
        (WORK, 'work'),
        (SICK, 'sick leave'),
        (VACATION, 'vacation'),
    )
    # meta
    cr_date = m.DateTimeField(auto_now_add=True, editable=False, verbose_name='Date of Creation') # date of creation
    owner = m.ForeignKey(User, editable=False, on_delete=m.PROTECT)
    # content
    type = m.CharField(max_length=1, choices=type_choices, default='w')
    day = m.DateField(default=now)
    start = m.TimeField(blank=True) # starting time
    end = m.TimeField(blank=True) # ending time
    recess = m.IntegerField() # recess time in minutes
    project = m.ForeignKey(Project, on_delete=m.PROTECT)
    @classmethod
    def get_entries_for_day(cls, user, day):
        """ Retrieves any entries for the supplied day. """
        return Entry.objects.filter(day__date=day, owner=user).order_by('start')
However, when I try to run my project like this, it fails with the following error:
"Unsupported lookup 'date' for DateField or join on the field not permitted."
I don't quite understand the message. The specified field is a date field which has no further restrictions. Any hints would be appreciated.
There's no such thing as a __date lookup on a DateField; the field is already a date.
It's not clear what you are trying to compare this field with. Is the day you are passing into that method an integer, or a date? If it's also a date then you should just compare them directly.
I was facing an issue with django-filter: the filter did not work when the start and end of the range were the same date. So I added date__gte/date__lte to lookup_expr, something like this:
from_date = django_filters.DateFilter(field_name="created_at", lookup_expr='date__gte')
to_date = django_filters.DateFilter(field_name="created_at", lookup_expr='date__lte')
I have models.py with:
class Game(models.Model):
date = models.DateTimeField()
I want a function that returns all games grouped by day. The result should be a list of dicts with the keys 'day' and 'games' (the list of games for that day). I do it this way:
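from itertools import groupby
def get_games_grouped_by_days():
    def get_day(obj):
        # group on the full calendar date, not just the day of the month
        return obj.date.date()
    # groupby only merges adjacent items, so the queryset must be ordered
    games = Game.objects.order_by('date')
    grouped_games = []
    for day, games_in_day in groupby(games, key=get_day):
        grouped_games.append({
            'day': day,
            'games': list(games_in_day)
        })
    return grouped_games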
This function works fine, but I would like to know if there is a more query-oriented way to do this kind of grouping.
What about this:
from django.db.models import QuerySet
query = Game.objects.all().query
query.group_by = ['date']
results = QuerySet(query=query, model=Game)
Then you can iterate on results, and get the day and game from each item.
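Keep in mind that query.group_by is an internal attribute rather than a documented API, and that grouping on the raw datetime column groups by exact timestamp, not by day. As a hedged alternative, here is a sketch that groups by calendar date in Python after a single query. group_by_day is a hypothetical helper, and it only assumes each object has a date attribute holding a datetime, so the demo uses stand-in objects instead of Game instances.

```python
from datetime import date, datetime
from types import SimpleNamespace


def group_by_day(games):
    """Group objects by the calendar date of their `date` attribute."""
    grouped = {}  # dicts keep insertion order, so days stay chronological
    for game in sorted(games, key=lambda g: g.date):
        grouped.setdefault(game.date.date(), []).append(game)
    return [{'day': day, 'games': items} for day, items in grouped.items()]


# Stand-in objects for the demo; in the project these would be Game rows.
evening = SimpleNamespace(date=datetime(2024, 1, 5, 20, 0))
next_month = SimpleNamespace(date=datetime(2024, 2, 5, 18, 0))
morning = SimpleNamespace(date=datetime(2024, 1, 5, 9, 0))
demo = group_by_day([evening, next_month, morning])
```

If the truncation should happen in the database instead, Django provides TruncDate in django.db.models.functions for annotating a queryset with the date part of a DateTimeField.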