Django aggregate Avg vs for loop with prefetch_related()


I have a Review model that has a one-to-one relationship with a Rating model. A user can give a rating on six different criteria (cleanliness, communication, check_in, accuracy, location, and value), which are defined as fields on the Rating model.
class Rating(models.Model):
    cleanliness = models.PositiveIntegerField()
    communication = models.PositiveIntegerField()
    check_in = models.PositiveIntegerField()
    accuracy = models.PositiveIntegerField()
    location = models.PositiveIntegerField()
    value = models.PositiveIntegerField()
class Review(models.Model):
    room = models.ForeignKey('room.Room', on_delete=models.SET_NULL, null=True)
    host = models.ForeignKey('user.User', on_delete=models.CASCADE, related_name='host_reviews')
    guest = models.ForeignKey('user.User', on_delete=models.CASCADE, related_name='guest_reviews')
    rating = models.OneToOneField('Rating', on_delete=models.SET_NULL, null=True)
    content = models.CharField(max_length=2000)
I am looking for a way to calculate the overall rating, which would be the average of the per-column averages in the Rating model. One option is Django's aggregate() function; another is to prefetch all reviews and loop through them to calculate the overall rating manually.
For example,
for room in Room.objects.all():
    ratings_dict = Review.objects.filter(room=room).aggregate(
        *[Avg(field) for field in [
            'rating__cleanliness', 'rating__communication', 'rating__check_in',
            'rating__accuracy', 'rating__location', 'rating__value']])
    ratings_sum = 0
    for key in ratings_dict.keys():
        ratings_sum += ratings_dict[key] if ratings_dict[key] else 0
Or, simply looping through,
rooms = Room.objects.prefetch_related('review_set')
for room in rooms:
    reviews = room.review_set.all()
    ratings = 0
    for review in reviews:
        ratings += (review.rating.cleanliness + review.rating.communication + review.rating.check_in +
                    review.rating.accuracy + review.rating.location + review.rating.value) / 6
Which way would be more efficient in terms of time complexity and result in fewer DB calls?
Does aggregate(Avg('field_name')) produce one Avg query at the database level per function call?
Will calling prefetch_related() on the rooms first reduce the number of queries made later by room.review_set.all()?
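With the aggregate() option, the six per-column averages still have to be combined into a single number. A minimal sketch of that last step, assuming the default keys that aggregate() produces for unnamed Avg expressions (for example rating__cleanliness__avg). The helper name overall_rating is hypothetical and not part of Django, and the sample dict is hand-written rather than fetched from a database:

```python
from statistics import mean

RATING_FIELDS = [
    "cleanliness", "communication", "check_in",
    "accuracy", "location", "value",
]


def overall_rating(averages):
    """Return the mean of the per-column averages, or None if there are none.

    `averages` is assumed to look like the dict returned by
    Review.objects.filter(room=room).aggregate(Avg("rating__cleanliness"), ...),
    where a column with no rows is reported as None.
    """
    values = [averages.get(f"rating__{field}__avg") for field in RATING_FIELDS]
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Hand-written stand-in for an aggregate() result; no database is involved.
sample = {f"rating__{field}__avg": 4.0 for field in RATING_FIELDS}
sample["rating__value__avg"] = 1.0
print(overall_rating(sample))
```

Note that this sketch skips missing columns instead of counting them as 0, which differs from the loop in the question; which behavior is right depends on how unrated rooms should be displayed.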

Related

How to model many-to-many database with 3 tables

I'm working on a Django backend and want to model the database following best practice.
I need a "User" table, a "Portfolios" table and a "Stocks" table. A user can have multiple portfolios which consist of multiple stocks.
This is my code so far:
class User(models.Model):
    user_id = models.AutoField(primary_key=True)
    username = models.CharField(max_length=25)
    in_cash = models.DecimalField(max_digits=15, decimal_places=2)
class Portfolios(models.Model):
    portfolio_id = models.AutoField(primary_key=True)
    user_id = models.ForeignKey("User", on_delete=models.CASCADE)
    stock_id = models.ForeignKey("Stocks", on_delete=models.CASCADE)
    buy_datetime = models.DateTimeField(default=datetime.now, blank=True)
    number_of_shares = models.IntegerField()
class Stocks(models.Model):
    stock_id = models.AutoField(primary_key=True)
    stock_symbol = models.CharField(max_length=12)
In my head I would have an entry in the "Portfolios" table for each stock of a portfolio.
So "Portfolios" would look like
portfolioid 1, userid: 1, stockid: 1, buydate: 12.01.2019, shares: 20
portfolioid 1, userid: 1, stockid: 2, buydate: 19.02.2020, shares: 41
So there is one portfolio, which contains two stocks. But overall that doesn't seem right. If used like in my example above I can't have the portfolioid as a primary key, how can I improve this?
Thanks for your time

What confused me is the name portfolio, which I would call a position. Your initial code was correct, although I changed it a bit: I removed AutoField, which is probably not needed; used a OneToOneField to connect a Customer to a User; removed the s at the end of the class names, which are templates and should therefore be singular, not plural; and added a price to Stock. Finally, I changed the meaning of Portfolio, which should be the sum of all the Positions.
from datetime import datetime
from django.conf import settings
from django.db import models
class Customer(models.Model):
    customer = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    in_cash = models.DecimalField(max_digits=15, decimal_places=2)
    def __str__(self):
        return self.customer.username
class Position(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    stock = models.ForeignKey('Stock', on_delete=models.CASCADE)
    number_of_shares = models.IntegerField()
    buy_datetime = models.DateTimeField(default=datetime.now, blank=True)
    def __str__(self):
        return self.customer.customer.username + ": " + str(self.number_of_shares) + " shares of " + self.stock.stock_symbol
class Stock(models.Model):
    stock_symbol = models.CharField(max_length=12)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    def __str__(self):
        return self.stock_symbol
In my head I would have an entry in the "Portfolios" table for each
stock of a portfolio. So "Portfolios" would look like
portfolioid 1, userid: 1, stockid: 1, buydate: 12.01.2019, shares: 20
portfolioid 1, userid: 1, stockid: 2, buydate: 19.02.2020, shares: 41
So there is one portfolio, which contains two stocks. But overall that
doesn't seem right. If used like in my example above I can't have the
portfolioid as a primary key, how can I improve this?
You are correct, except that should be applied to a Position, each of which is unique, not the Portfolio, which is all the Positions the Customer has.
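To make "the Portfolio is the sum of all the Positions" concrete, here is a minimal sketch in plain Python, without the ORM. The dataclass only loosely mirrors the Position and Stock models above, and the function name portfolio_value, the symbols, and the prices are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Position:
    stock_symbol: str
    number_of_shares: int
    price: Decimal  # current price per share, mirroring Stock.price


def portfolio_value(positions):
    """Total market value of all positions held by one customer."""
    return sum(
        (p.number_of_shares * p.price for p in positions),
        Decimal("0"),
    )


# Two positions, matching the two example rows in the question.
positions = [
    Position("AAA", 20, Decimal("10.00")),
    Position("BBB", 41, Decimal("2.50")),
]
print(portfolio_value(positions))
```

Each row is a Position with its own primary key; the portfolio itself needs no table unless a customer can hold several named portfolios.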

As in the usual many-to-many case, you will need to create an intermediary table, also called a junction table or association table (an associative entity).
Your users are going to have several portfolios:
class UserPortfolio(models.Model):
    user_id = models.ForeignKey("User", on_delete=models.CASCADE)
    portfolio_id = models.ForeignKey("Portfolio", on_delete=models.CASCADE)
The portfolios will have multiple stocks in them:
class PortfolioStock(models.Model):
    portfolio_id = models.ForeignKey("Portfolio", on_delete=models.CASCADE)
    stock_id = models.ForeignKey("Stock", on_delete=models.CASCADE)
Now a user can have several portfolios, and those portfolios will include several stocks. In order to get access to the corresponding stocks for a user, you will need to join the tables.

Django Fast Access for Time Series Data

I'm working on a web application with Django & PostgreSQL as Backend tech stack.
My models.py has 2 crucial Models defined. One is Product, and the other one Timestamp.
There are thousands of products and every product has multiple timestamps (60+) inside the DB.
The timestamps hold information about the product's performance for a certain date.
class Product:
    owner = models.ForeignKey(AmazonProfile, on_delete=models.CASCADE, null=True)
    state = models.CharField(max_length=8, choices=POSSIBLE_STATES, default="St.Less")
    budget = models.FloatField(null=True)
    product_type = models.CharField(max_length=17, choices=PRODUCT_TYPES, null=True)
    name = models.CharField(max_length=325, null=True)
    parent = TreeForeignKey('self', on_delete=models.CASCADE, null=True, blank=True, related_name="children")
class Timestamp:
    product = models.ForeignKey(Product, null=True, on_delete=models.CASCADE)
    product_type = models.CharField(max_length=35, choices=ADTYPES, blank=True, null=True)
    owner = models.ForeignKey(AmazonProfile, null=True, blank=True, on_delete=models.CASCADE)
    clicks = models.IntegerField(default=0)
    spend = models.IntegerField(default=0)
    sales = models.IntegerField(default=0)
    acos = models.FloatField(default=0)
    cost = models.FloatField(default=0)
    cpc = models.FloatField(default=0)
    orders = models.IntegerField(default=0)
    ctr = models.FloatField(default=0)
    impressions = models.IntegerField(default=0)
    conversion_rate = models.FloatField(default=0)
    date = models.DateField(null=True)
I'm using the data for a dashboard, where users should be able to view their products and the products' performance for a certain date range in a table.
For example, a user might have 100 products in the table and want to view all data from the past 2 weeks. For this scenario, the procedure is:
1. Make call to the backend / server
2. Server has to filter & aggregate all Timestamps for each Product
3. Server sends data back to client
4. Client updates table values
The problem is that step 2 takes a huge amount of time, and I do not know how to improve the performance.
products = Product.objects.filter(name="example")
for product in products:
    product.report_set.filter(date__gte="2011-01-01", date__lte="2011-01-14").aggregate(
        Sum("clicks"),
        Sum("cost"),
        Sum("sales"))
That is how the server is currently retrieving the timestamp values for the displayed products.
Any ideas how to retrieve & structure the data in a more efficient way?

It's slow because of the multiple queries you make to the database (one per product, in the loop).
See if grouping and annotating is better (one query, then perhaps queries for fetching each product):
(Timestamp.objects
    .filter(date__range=["2011-01-01", "2011-01-15"])
    .values('product')
    .annotate(sum_clicks=Sum("clicks"), sum_cost=Sum("cost"), sum_sales=Sum("sales")))
I don't know if this is possible, but if it is, it would be even better:
(Timestamp.objects
    .filter(date__range=["2011-01-01", "2011-01-15"])
    .values('product')
    .annotate(sum_clicks=Sum("clicks"), sum_cost=Sum("cost"), sum_sales=Sum("sales"))
    .select_related('product'))
Edit:
After looking back, perhaps this might be better:
products = (Product.objects
    .filter(name="example", report_set__date__range=["2011-01-01", "2011-01-15"])
    .annotate(sum_clicks=Sum("report_set__clicks"), sum_cost=Sum("report_set__cost"), sum_sales=Sum("report_set__sales")))
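The point of the grouped query is that the database returns one row per product in a single round trip, instead of one query per product. A minimal sketch of that idea using Python's built-in sqlite3 module rather than the Django ORM; the table and column names are hypothetical stand-ins for the Timestamp model, and the SQL is an illustration, not the exact statement Django would generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_timestamp ("
    "product_id INTEGER, day TEXT, clicks INTEGER, cost REAL, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO product_timestamp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2011-01-02", 10, 1.5, 2),
        (1, "2011-01-10", 5, 0.5, 1),
        (1, "2011-02-01", 99, 9.0, 9),  # outside the date range
        (2, "2011-01-03", 7, 2.0, 0),
    ],
)

# One query: filter by date range, then sum per product.
rows = conn.execute(
    "SELECT product_id, SUM(clicks), SUM(cost), SUM(sales) "
    "FROM product_timestamp "
    "WHERE day BETWEEN ? AND ? "
    "GROUP BY product_id "
    "ORDER BY product_id",
    ("2011-01-01", "2011-01-15"),
).fetchall()
print(rows)
```

With 100 products on the dashboard, the loop in the question issues 100 aggregate queries, while a grouped query of this shape issues one.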

Without more detail, all I can recommend is to optimize. For database optimization I would follow the instructions listed here, but be aware that as you speed up the query there will likely be an increase in memory usage.

Filter using Q in Django, when multiple inputs can be null

I have a model product_details with the following fields: brand_name, model_name, bodystyle, transmission, min_price, and max_price.
Examples of budget values in the dropdown are
1 - 5
5 - 10
10 - 15
I have 5 select dropdown fields on the basis of which I want to filter my results.
I am using the method below to filter based on first three fields
def select(request):
    q1 = request.GET.get('brand')
    q2 = request.GET.get('model')
    q3 = request.GET.get('bodyStyle')
    #q4 = request.GET.get('budget')
    cars = product_details.objects.filter(Q(bodystyle__icontains=q3)|Q(brand_name__icontains=q1)|Q(model_name__icontains=q2))
    #cars = product_details.objects.filter(bodystyle=q3)
    return render(request, 'search/search_results.html', {'cars': cars})
I have two questions:
1. How do I filter if only 1 or 2 values are selected from the dropdowns? What should my if condition be?
2. How do I filter on a range for the budget field, given that the budget needs to be compared with min_price and max_price?
Any help or guidance would be appreciated.
Model Definition:
class product_details(models.Model):
    product_id = models.ForeignKey(products, on_delete=models.CASCADE)
    model_id = models.ForeignKey(product_models, on_delete=models.CASCADE)
    variant_id = models.CharField(primary_key=True, max_length=10)
    brand_name = models.CharField(max_length=255, blank=True, null=True)
    model_name = models.CharField(max_length=255, blank=True, null=True)
    variant_descr = models.CharField(max_length=255, null=True, blank=True)
    transmission = models.CharField(max_length=255, blank=True, null=True)
    bodystyle = models.CharField(max_length=255, blank=True, null=True)
    min_price = models.FloatField(blank=True, null=True)
    max_price = models.FloatField(blank=True, null=True)

Please respect Python's CamelCase (CapWords) naming convention for class names:
class ProductDetails(models.Model):
    # ...
For question 2:
selected_min_price = 50
selected_max_price = 200
ProductDetails.objects.filter(additional_filters=...).filter(min_price__gt=selected_min_price, max_price__lt=selected_max_price)
See Object.filter(property__lt=...)

ProductDetails.objects.filter(max_price__lte=200, min_price__gte=100)  # should return rows whose price range lies between 100 and 200

You can filter your queryset multiple times to apply multiple conditions based on the selected filters, e.g.:
cars = product_details.objects.all()
if q1:
    cars = cars.filter(brand_name__icontains=q1)
if q2:
    cars = cars.filter(model_name__icontains=q2)
if q3:
    cars = cars.filter(bodystyle__icontains=q3)
For the budget you can use the same technique. If a budget is selected, first split the range into its two values (min, max), then filter your queryset again:
if q4:
    budget_min, budget_max = q4.split(' - ')
    cars = cars.filter(
        min_price__gte=int(budget_min),
        max_price__lte=int(budget_max),
    )
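The same idea can be written as a small helper that collects lookups only for the inputs that were actually selected. This is a sketch under assumptions: the function names parse_budget and build_filters are hypothetical, the lookups mirror the ones used above, and the budget string is assumed to have the "5 - 10" shape from the dropdown:

```python
def parse_budget(budget):
    """Split a dropdown value such as '5 - 10' into (5.0, 10.0)."""
    low, high = (part.strip() for part in budget.split("-"))
    return float(low), float(high)


def build_filters(brand=None, model=None, body_style=None, budget=None):
    """Collect ORM lookups only for the inputs that were actually selected."""
    filters = {}
    if brand:
        filters["brand_name__icontains"] = brand
    if model:
        filters["model_name__icontains"] = model
    if body_style:
        filters["bodystyle__icontains"] = body_style
    if budget:
        low, high = parse_budget(budget)
        filters["min_price__gte"] = low
        filters["max_price__lte"] = high
    return filters


print(build_filters(brand="Audi", budget="5 - 10"))
```

In the view, the resulting dict would be passed as product_details.objects.filter(**build_filters(q1, q2, q3, q4)); an empty dict leaves the queryset unfiltered.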

Queryset difference based on one field

I am trying to compare two querysets based on a single field, but I can't figure out the most efficient way to do it.
This is my model. I want to check whether the old and new room_scans (ForeignKey) have PriceDatum objects with the same checkin date. If not, I want to create a PriceDatum with that checkin date related to the new room_scan.
class PriceDatum(models.Model):
    """
    Stores a price for a date for a given currency for a given
    listingscan
    Multiple such PriceData objects for each day for next X months are created in each Frequent listing scan
    """
    room_scan = models.ForeignKey(RoomScan, default=1, on_delete=models.CASCADE)
    room = models.ForeignKey(Room, on_delete=models.CASCADE)
    checkin = models.DateField(db_index=True, help_text="Check in date", null=True)
    checkout = models.DateField(db_index=True, help_text="checkout date", null=True)
    price = models.PositiveSmallIntegerField(help_text="Price in the currency stated")
    refund_status = models.CharField(max_length=100, default="N/A")
    # scanned = models.DateTimeField(db_index=True, help_text="Check in date", null=True)
    availability_count = models.PositiveSmallIntegerField(help_text="How many rooms are available for this price")
    max_people = models.PositiveSmallIntegerField(help_text="How many people can stay in the room for this price")
    meal = models.CharField(max_length=100, default="N/A", help_text="Tells if breakfast is included in room price")
Below is the code for what I am trying to do:
previous_prices_final = previous_prices.filter(refund_status='refund',
current_prices_final = current_prices.filter(
    refund_status='refund', max_people=max_people_count, meal=meal).order_by().order_by('checkin')
if len(previous_prices_final) > len(current_prices_final):
    difference = previous_prices_final.difference(current_prices_final)
    for x in difference:
        PriceDatum.objects.create(room_scan=x.room_scan,
                                  room=x.room,
                                  checkin=x.checkin,
                                  checkout=x.checkout,
                                  price=0,
                                  refund_status='refund',
                                  availability_count=0,
                                  max_people=max_people_count,
                                  meal='not_included',
                                  )
The problem is that every row is reported as different, because the room_scan foreign key points to scans with different creation times.
My question is: how do I use difference() based only on the checkin field?

Don't select the field that contains the creation time. Limit your queryset to the fields you want to compare with values().
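Once both querysets are reduced to the checkin column, for instance with values_list("checkin", flat=True), the comparison becomes a plain set difference on dates. A minimal sketch in plain Python; the function name missing_checkins is hypothetical, and the two lists are hand-written stand-ins for the reduced querysets:

```python
from datetime import date


def missing_checkins(previous_checkins, current_checkins):
    """Return check-in dates present in the old scan but not in the new one."""
    return sorted(set(previous_checkins) - set(current_checkins))


# Stand-ins for the two querysets after restricting them to the checkin field.
previous = [date(2021, 5, 1), date(2021, 5, 2), date(2021, 5, 3)]
current = [date(2021, 5, 1), date(2021, 5, 3)]
print(missing_checkins(previous, current))
```

Each returned date would then get one new PriceDatum attached to the new room_scan.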

SQLite Django Model for Inventory of Seeds

I'm trying to build an Inventory Model for a Django App that handles the sale of seeds. Seeds are stored and sold in packs of 3, 5, or 10 seeds of a single variety (for example: 3 pack of mustard seeds).
I want to add x amount of products to inventory with a price for each entry, and sell that product at that price for as long as that entry has items left (quantity field > 0), even if later entries have been made for the same product and presentation at a different price. So I have the following model:
class Product(models.Model):
    name = models.CharField(max_length=100)
class Presentation(models.Model):
    seed_qty = models.IntegerField()
class Stock(models.Model):
    date = models.DateField(auto_now=True)
    quantity = models.IntegerField()
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    presentation = models.ForeignKey(Presentation, on_delete=models.CASCADE)
    cost = models.FloatField(null=True, blank=True)
    sell_price = models.FloatField(null=True, blank=True)
I'm wondering if I should actually relate Product and Stock with a ManyToMany field through a GeneralEntry intermediate model in which I'd store date_added, presentation and cost/price.
My issue is that when I add multiple Stock entries for the same product and presentation, I can't seem to query the earliest prices for each available (quantity>0) stock entry for each product.
What I've tried so far has been:
stock = Stock.objects.filter(quantity__gt=0).order_by('-date')
stock = stock.annotate(min_date=Min('date')).filter(date=min_date)
But that fails with an error saying that min_date isn't defined.
Any ideas on how to query or rearrange this model?
Thanks!
UPDATE: I wasn't using the F() function from django.db.models.
Doing it like this works:
stock = Stock.objects.filter(quantity__gt=0).order_by('-date')
stock = stock.annotate(min_date=Min('date')).filter(date=F('min_date'))

Turns out I wasn't using the F() function from django.db.models.
Doing it like this works:
stock = Stock.objects.filter(quantity__gt=0).order_by('-date')
stock = stock.annotate(min_date=Min('date')).filter(date=F('min_date'))
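To check that a query returns what the question asks for, it can help to state the intended result in plain Python first: for each product and presentation, the oldest entry that still has quantity > 0. A minimal sketch with hypothetical dict-based rows that only loosely mirror the Stock model; the function name earliest_in_stock and the sample data are assumptions:

```python
from datetime import date


def earliest_in_stock(entries):
    """Pick the oldest entry with quantity > 0 for each (product, presentation).

    Each entry is a dict with the keys product, presentation, date, quantity
    and sell_price, loosely mirroring the Stock model.
    """
    earliest = {}
    for entry in entries:
        if entry["quantity"] <= 0:
            continue  # sold out, so its price no longer applies
        key = (entry["product"], entry["presentation"])
        if key not in earliest or entry["date"] < earliest[key]["date"]:
            earliest[key] = entry
    return earliest


entries = [
    {"product": "mustard", "presentation": 3, "date": date(2021, 1, 1),
     "quantity": 0, "sell_price": 1.0},
    {"product": "mustard", "presentation": 3, "date": date(2021, 2, 1),
     "quantity": 4, "sell_price": 1.2},
    {"product": "mustard", "presentation": 3, "date": date(2021, 3, 1),
     "quantity": 9, "sell_price": 1.5},
]
print(earliest_in_stock(entries)[("mustard", 3)]["sell_price"])
```

Comparing this output with the rows returned by the ORM query on the same data is a quick way to confirm that the query groups by product and presentation as intended.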
