Listing database objects efficiently

I'm working on a page that lists companies and their employees. Employees have sales. These are saved in a database. Now I need to list all of them. My problem is that the current solution is not fast. One page load takes over 15 seconds.
Currently I have done the following:
companies = {}
employees = {}
for company in Company.objects.all():
    sales_count = 0
    sales_sum = 0
    companies[company.id] = {}
    companies[company.id]["name"] = company.name
    for employee in company.employees.all():
        employee_sales_count = 0
        employee_sales_sum = 0
        employees[employee.id] = {}
        employees[employee.id]["name"] = employee.first_name + " " + employee.last_name
        for sale in employee.sales.all():
            employee_sales_count += 1
            employee_sales_sum += sale.total
        employees[employee.id]["sales_count"] = employee_sales_count
        employees[employee.id]["sales_sum"] = employee_sales_sum
        sales_count += employee_sales_count
        sales_sum += employee_sales_sum
    companies[company.id]["sales_count"] = sales_count
    companies[company.id]["sales_sum"] = sales_sum
I'm new to Python, not sure if this is a "pythonic" way to do things.
This makes 1500 queries to the database with 100 companies and some employees and sales for each. How should I improve my program to make it efficient?

Avoid nesting database queries in loops; it's a sure path to performance hell! :-)
Since you're counting all sales for all employees, I suggest building your employee and sales dicts separately. Don't forget to import defaultdict, and you may want to look up how grouping, summing and counting work in Django :-)
Let's see... this should give you an indication of where to go from here:
from collections import defaultdict

# build employee dict, keyed by company id, then employee id
employee_qset = Employee.objects.all()
employees = defaultdict(dict)
for emp in employee_qset.iterator():
    employees[emp.company_id][emp.id] = emp
# build sales dict, keyed by employee id, then sale id
sales_qset = Sales.objects.all()
sales = defaultdict(dict)
for sale in sales_qset.iterator():
    # you could do some calculations here, like a running sum, or better yet
    # let the database compute the sums via values() and annotate()
    sales[sale.employee_id][sale.id] = sale
# get companies
companies_qset = Company.objects.all()
companies = {company.id: company for company in companies_qset.iterator()}
for company in companies.values():  # use itervalues() on Python 2
    # assign employees, assign sales, etc.
    pass
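The suggestion to let the database do the counting and summing can be tried out without Django. Below is a minimal sketch using the standard sqlite3 module; the table and column names (company, employee, sale, total) and the sample rows are made up for illustration and only mirror the models in the question. In Django, the analogous approach would presumably go through annotate() with Count and Sum.

```python
import sqlite3

# Hypothetical schema mirroring the question: company -> employee -> sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, company_id INTEGER, name TEXT);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, employee_id INTEGER, total REAL);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO employee VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Cy');
    INSERT INTO sale VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0);
""")

# One query: per-employee count and sum, computed by the database.
# LEFT JOIN keeps employees without sales; COALESCE turns their NULL sum into 0.
rows = conn.execute("""
    SELECT e.id, e.company_id, e.name, COUNT(s.id), COALESCE(SUM(s.total), 0)
    FROM employee AS e
    LEFT JOIN sale AS s ON s.employee_id = e.id
    GROUP BY e.id, e.company_id, e.name
""").fetchall()

# A second query for the companies; totals are rolled up in Python.
companies = {
    cid: {"name": name, "sales_count": 0, "sales_sum": 0}
    for cid, name in conn.execute("SELECT id, name FROM company")
}
employees = {}
for emp_id, company_id, name, sales_count, sales_sum in rows:
    employees[emp_id] = {"name": name, "sales_count": sales_count, "sales_sum": sales_sum}
    companies[company_id]["sales_count"] += sales_count
    companies[company_id]["sales_sum"] += sales_sum
```

This builds the same two dictionaries as the original loop with two queries in total, however many companies, employees and sales there are.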

Related

How to query Django objects based on a field value in the latest ForeignKey?

I have a Django application that stores hourly price and volume (OHLCV candles) for several markets. What I'm trying to achieve is to compare the latest volume of all markets and set top10 = True on the 10 markets with the highest volume in their latest candle. What is the most efficient way to do that?
EDIT: The queryset should select the most recent candle in every market and sort these candles by volume, then return the 10 markets that the top 10 candles belong to.
models.py
class Market(models.Model):
    top10 = JSONField(null=True)

class Candle(models.Model):
    market = models.ForeignKey(Market, on_delete=models.CASCADE, related_name='candle', null=True)
    price = models.FloatField(null=True)
    volume = models.FloatField(null=True)
    dt = models.DateTimeField()

Finally, I've figured out the solution, I guess.
latest_distinct = Candle.objects.order_by('market__pk', '-dt').distinct('market__pk')
candle_top = Candle.objects.filter(id__in=latest_distinct).order_by('-volume')[:10]
for item in candle_top:
    item.market.top10 = True
    item.market.save()
latest_distinct = Candle.objects.order_by('market__pk', '-dt').distinct('market__pk') selects the latest candle record for every Market. Note that distinct() with field names is supported only on PostgreSQL.
candle_top = Candle.objects.filter(id__in=latest_distinct).order_by('-volume')[:10] sorts the candles from the previous query by volume in descending order and slices off the 10 greatest.
Then you iterate over the result, setting each market.top10 to True.
Notice that I'm assuming that Market's top10 field is a boolean. You can substitute your own logic for item.market.top10 = True.

I found a solution to my own question: select the last candle of every market and build a list of lists whose elements are [volume, pk]. Then I sort the nested lists by element 0 (the volume) and take the top 10. This returns a list of the desired markets:
import operator
# candle.first() is the latest candle only if the default ordering is newest first
v = [[m.candle.first().volume, m.candle.first().id] for m in Market.objects.all()]
top = sorted(v, key=operator.itemgetter(0))[-10:]
[Candle.objects.get(id=i).market for i in [t[1] for t in top]]
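Both answers follow the same two steps: keep only the latest candle per market, then rank those candles by volume. The logic can be checked in plain Python, independent of the ORM. This is only a sketch; the tuples, market names and the top_markets helper are hypothetical stand-ins for Candle rows.

```python
import heapq
from datetime import datetime

# Hypothetical candle rows: (market_id, dt, volume).
candles = [
    ("BTC", datetime(2020, 1, 1, 10), 50.0),
    ("BTC", datetime(2020, 1, 1, 11), 70.0),
    ("ETH", datetime(2020, 1, 1, 11), 90.0),
    ("ETH", datetime(2020, 1, 1, 10), 10.0),
    ("LTC", datetime(2020, 1, 1, 11), 30.0),
]

def top_markets(candles, n):
    """Return the n market ids whose most recent candle has the highest volume."""
    latest = {}
    for market_id, dt, volume in candles:
        # Keep only the newest candle seen so far for each market.
        if market_id not in latest or dt > latest[market_id][0]:
            latest[market_id] = (dt, volume)
    # Rank markets by the volume of their latest candle, highest first.
    ranked = heapq.nlargest(n, latest.items(), key=lambda item: item[1][1])
    return [market_id for market_id, _ in ranked]

print(top_markets(candles, 2))
```

Older candles with a high volume are ignored on purpose: only the most recent candle of each market takes part in the ranking.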

postgresql update functionality takes too long

I have a table called products with the columns id and data.
data is a JSONB column and id is a unique ID.
I tried bulk adding 10k products and it took nearly 4 minutes.
With fewer products the update works just fine, but with a huge number of products it takes a lot of time. How can I optimize this?
I am trying to bulk update 200k+ products, and it's taking more than 5 minutes right now.
updated_product_ids = []
for product in products:
    new_product = model.Product(id, data=product['data'])
    new_product['data'] = 'updated data'
    new_product['id'] = product.get('id')
    updated_product_ids.append(new_product)

def bulk_update(product_ids_arr):
    def update_query(count):
        return f"""
        UPDATE pricing.products
        SET data = :{count}
        WHERE id = :{count + 1}
        """
    queries = []
    params = {}
    count = 1
    for sku in product_ids_arr:
        queries.append(update_query(count))
        params[str(count)] = json.dumps(sku.data)
        params[str(count + 1)] = sku.id
        count += 2
    session.execute(';'.join(queries), params)  # This is what takes so long..

bulk_update(updated_product_ids)
I thought using raw SQL to execute this would be faster, but it's taking a lot of time.
I am trying to update only about 8k products and it takes nearly 3 minutes or more.
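One commonly used alternative to concatenating thousands of UPDATE statements into one huge string is a single parameterized statement executed with many parameter sets, in batches, inside one transaction. The sketch below illustrates the shape of that approach. It uses the standard sqlite3 module as a stand-in for PostgreSQL so that it can run anywhere; the batch size and the sample data are assumptions, and whether it is fast enough on PostgreSQL depends on the driver and would need to be measured.

```python
import json
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch can run anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO products (id, data) VALUES (?, ?)",
    [(i, json.dumps({"v": 0})) for i in range(1, 1001)],
)

def bulk_update(conn, products, batch_size=500):
    """Update rows in batches using one parameterized statement."""
    params = [(json.dumps(p["data"]), p["id"]) for p in products]
    for start in range(0, len(params), batch_size):
        batch = params[start:start + batch_size]
        # The statement text is constant; only the bound values change.
        conn.executemany("UPDATE products SET data = ? WHERE id = ?", batch)
    conn.commit()

# Hypothetical updated products, shaped like the dicts in the question.
updated = [{"id": i, "data": {"v": i}} for i in range(1, 1001)]
bulk_update(conn, updated)
```

Because the SQL text never changes, the database does not have to parse a new statement for every product, and the values are bound rather than spliced into the query string.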

Is there a way I can make a for loop, loop through different values to search for in an sqlite3 query

I have a function that gets all of the data for a director for one year, but I have to create a new function for every year to change year_granted to the next or previous year. Is there a way to make a loop that uses just one function and changes year_granted to the next year?
def getDirectorsInfo2019(self):
    c.execute('''SELECT first_name, last_name, year_granted, app_units_granted,
                 full_value_units_granted
                 FROM Directors INNER JOIN DirectorsUnits ON DirectorsUnits.id_number_unit =
                 Directors.id_number_dir
                 WHERE id_number_dir BETWEEN 1 AND 50 AND year_granted=2019''')
    datas = c.fetchall()
    for people in datas:
        people = [datas[0]]
        for people2 in [datas[0]]:
            people2 = list(people2)
            self.firstNAme = people2[0]
            self.year2019 = people2[2]
            self.lastNAme = people2[1]
            self.aUnits2019 = people2[3]
            self.fUnits2019 = people2[4]

Yes, this is fairly straightforward. The basic idea is to loop through a range and fill in the SQL query using the DB-API's "parameter substitution" method. It looks like this:
query = """
SELECT first_name, last_name, year_granted, app_units_granted, full_value_units_granted FROM Directors
INNER JOIN DirectorsUnits ON DirectorsUnits.id_number_unit = Directors.id_number_dir
WHERE id_number_dir BETWEEN 1 AND 50 AND year_granted=?
"""
# I used a triple-quoted string here so the query can span several lines; SQL ignores the line breaks
# Note that this loop fetches data for 1998, 1999, 2000, and 2001, but not 2002
for year in range(1998, 2002):
    # Parameter substitution takes a list or tuple of values, so the value must be wrapped in one even though there's only one
    rows = c.execute(query, (year,)).fetchall()
    for row in rows:
        # strictly speaking, you don't need these variable assignments, but they help to show what's going on
        first_name = row[0]
        last_name = row[1]
        year_granted = row[2]
        a_units = row[3]
        f_units = row[4]
        # do stuff with row data here, append to a list, etc.
I hope this helps!
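To show the idea end to end, here is a small self-contained sketch with an in-memory SQLite database. The table layout follows the column names used in the question; the sample rows and the get_directors_info helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Directors (id_number_dir INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE DirectorsUnits (id_number_unit INTEGER, year_granted INTEGER,
                                 app_units_granted INTEGER, full_value_units_granted INTEGER);
    -- made-up sample data
    INSERT INTO Directors VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO DirectorsUnits VALUES (1, 2018, 100, 10), (1, 2019, 200, 20);
""")

QUERY = """
    SELECT first_name, last_name, year_granted, app_units_granted, full_value_units_granted
    FROM Directors
    INNER JOIN DirectorsUnits ON DirectorsUnits.id_number_unit = Directors.id_number_dir
    WHERE id_number_dir BETWEEN 1 AND 50 AND year_granted = ?
"""

def get_directors_info(conn, year):
    """Return the rows for one year; the year is bound to the ? placeholder."""
    return conn.execute(QUERY, (year,)).fetchall()

# Collect results per year instead of writing one function per year.
info_by_year = {year: get_directors_info(conn, year) for year in range(2018, 2020)}
```

Keeping the results in a dictionary keyed by year also avoids attribute names such as year2019 and aUnits2019 that bake the year into the code.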

SQL/Python (Django) - Join each row to entire table

I'm currently creating an application which maps people's skills against various technologies.
I have 3 tables:
Employees
    Name
    Department
Skill
    Skill name
Results
    Name (FK)
    Skill (FK)
    Skill level
I wish to be able to see every single employee with each skill listed in a table. I believe the correct procedure to retrieve this information would be to perform some sort of for loop and select the info from the 3 tables? The alternative is adding rows to the results table each time an employee or skill is added (although this doesn't seem like correct logic to me).

I think this is correct logic, since you have to keep the skill level for each employee.
Let's say you have created three models:
Employee
Skill
Result
To get the skills of the employee with id = 37:
emp = Employee.objects.get(pk=37)
# a list of (skill, level) tuples for this employee
skill_level_array = [(x.skill, x.level) for x in Result.objects.filter(employee=emp).select_related('skill')]
To get the skills of all employees:
all_emp = Employee.objects.all()
grand_array = {}
for emp in all_emp:
    skill_level_array = [(x.skill, x.level) for x in Result.objects.filter(employee=emp).select_related('skill')]
    grand_array[emp] = skill_level_array
Now grand_array is a dictionary with each employee as key and a list of (skill, level) tuples as value.
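If the goal is a full grid of every employee against every skill, including skills nobody has rated yet, the pairing can also be done in SQL without pre-populating the results table. The sketch below uses the standard sqlite3 module; the table and column names and the sample rows are hypothetical and only follow the three tables described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    CREATE TABLE skill (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result (employee_id INTEGER, skill_id INTEGER, level INTEGER);
    -- made-up sample data
    INSERT INTO employee VALUES (1, 'Ann', 'IT'), (2, 'Bob', 'HR');
    INSERT INTO skill VALUES (1, 'Python'), (2, 'SQL');
    INSERT INTO result VALUES (1, 1, 3), (2, 2, 1);
""")

# CROSS JOIN pairs every employee with every skill; LEFT JOIN attaches the
# level where a result row exists and leaves NULL (None) where it does not.
rows = conn.execute("""
    SELECT e.name, s.name, r.level
    FROM employee AS e
    CROSS JOIN skill AS s
    LEFT JOIN result AS r ON r.employee_id = e.id AND r.skill_id = s.id
    ORDER BY e.name, s.name
""").fetchall()

# Reshape into {employee: {skill: level}} for rendering as a table.
matrix = {}
for employee, skill, level in rows:
    matrix.setdefault(employee, {})[skill] = level
```

With this approach a row in the results table is only needed once an employee actually has a level for a skill.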

How to add users to Model from group in Django?

I have a Company that has juniors and seniors. I would like to add users by adding groups instead of individually. Imagine I have Group 1, made of 3 seniors, instead of adding those 3 individually, I'd like to be able to just add Group 1, and have the 3 seniors automatically added to the list of seniors. I'm a little stuck in my current implementation:
class Company(django.model):
    juniors = m2m(User)
    seniors = m2m(User)
    junior_groups = m2m(Group)
    senior_groups = m2m(Group)

# currently, I use this signal to add users from a group when a group is added to company
def group_changed(sender, **kwargs):
    if kwargs['action'] != 'post_add': return None
    co = kwargs['instance']
    group_id = kwargs['pk_set'].pop()
    juniors = MyGroup.objects.get(pk=group_id).user_set.all()
    co.juniors = co.juniors.all() | juniors
    co.save()

m2m_changed.connect(...)
The main problem is this looks messy and I have to repeat it for seniors, and potentially other types of users as well.
Is there a more straightforward way to do what I'm trying to do?
Thanks in advance!

Are you trying to optimize and avoid having the group object used in your queries?
If you are OK with a small join query, you could use this syntax to get the juniors and seniors in the company with id = COMP_ID.
This way you don't need to handle the users directly and copy them all the time:
juniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Junior)
seniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Senior)
This assumes that:
you add related_name "groups" to your m2m relation between groups and users
your groups have a type which you manage
you called your foreign-key field 'company' on your Group model
This query can be added as a property on the Company model, so it gives the same programmatic peace of mind.
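The design idea in this answer, deriving the juniors and seniors from the groups on demand instead of copying users into the company, can be illustrated without Django. The following is a plain-Python sketch only: the Group and Company dataclasses and the _users_of_type helper are hypothetical stand-ins for the models, not Django code.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    type: str  # "junior" or "senior"
    users: list = field(default_factory=list)

@dataclass
class Company:
    name: str
    groups: list = field(default_factory=list)

    def _users_of_type(self, group_type):
        # Walk the groups each time, so membership changes are picked up
        # without copying users into the company.
        seen = []
        for group in self.groups:
            if group.type == group_type:
                for user in group.users:
                    if user not in seen:
                        seen.append(user)
        return seen

    @property
    def juniors(self):
        return self._users_of_type("junior")

    @property
    def seniors(self):
        return self._users_of_type("senior")

co = Company("Acme", [Group("Group 1", "senior", ["ann", "bob", "cy"]),
                      Group("Group 2", "junior", ["dee"])])
co.groups[1].users.append("eli")  # shows up in co.juniors with no signal handler
```

Adding another kind of user then only needs one more property rather than another signal handler.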
