How can I cut down the number of queries? - python

This code is currently executing about 50 SQL queries:
c = Category.objects.all()
categories_w_rand_books = []
for category in c:
    r = Book.objects.filter(author__category=category).order_by('?')[:5]
    categories_w_rand_books.append((category, r))
I need to cut the number of queries down to the minimum to speed things up and avoid loading the server.
Basically, I have three models: Category, Author, Book. Each Author belongs to a Category (books do not belong to a Category directly), and I need to get a list of all categories with 5 random books under each one.

If you prefer a single query and are using MySQL, check the excellent link provided by @Crazyshezy in his comment.
For PostgreSQL backends, a possible query is (assuming there are non-nullable FK relationships from Book to Author and from Author to Category):
SELECT * FROM (
    SELECT book_table.*, author_table.category_id,
           row_number() OVER (PARTITION BY author_table.category_id ORDER BY RANDOM()) AS rn
    FROM book_table INNER JOIN author_table ON book_table.author_id = author_table.id
) AS sq
WHERE rn <= 5
You could then wrap it inside a RawQuerySet to get Book instances:
from collections import defaultdict
qs = Book.objects.raw("""The above sql suited for your tables...""")
collection = defaultdict(list)
for obj in qs:
    collection[obj.category_id].append(obj)
categories_w_rand_books = []
for category in c:
    categories_w_rand_books.append((category, collection[category.id]))
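For readers less familiar with window functions, here is a pure-Python sketch of what the query selects: group rows by category, shuffle each group, keep the first k. The helper name and the sample (book_id, category_id) rows are hypothetical; on real data the SQL does this in one round trip, so this is only useful for understanding or testing.

```python
import random
from collections import defaultdict


def random_rows_per_group(rows, key, k, rng=random):
    """Keep at most k randomly chosen rows from each group.

    Mirrors: row_number() OVER (PARTITION BY key ORDER BY RANDOM()) ... WHERE rn <= k
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)        # PARTITION BY key
    result = {}
    for group_key, members in groups.items():
        rng.shuffle(members)                # ORDER BY RANDOM()
        result[group_key] = members[:k]     # WHERE rn <= k
    return result


# Hypothetical (book_id, category_id) pairs standing in for the joined rows.
rows = [(book_id, book_id % 3) for book_id in range(20)]
picked = random_rows_per_group(rows, key=lambda r: r[1], k=5)
```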
You probably don't want to run this query directly on every request without some caching.
Furthermore, your code produces up to 50*5=250 random books, which seems like a lot for a single page. Are the items displayed as tabs or something similar? Perhaps you could reduce the number of SQL queries by loading each category via Ajax, or simplify the requirement?
Update
To access book.author without triggering an extra query per book, try prefetch_related_objects:
from django.db.models import prefetch_related_objects  # Django 1.10+; older versions: django.db.models.query
qs = list(qs)  # the queryset has to be evaluated first
prefetch_related_objects(qs, 'author')  # Django 1.10+ takes lookups as *args; older versions take a list
# now the instances inside qs already contain cached author instances, and
qs[0].author  # will not trigger an extra query
This fetches all the authors in one batch and caches them on the book instances, so it adds just one more query.
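Conceptually, prefetching replaces one author lookup per book with a single "WHERE id IN (...)" lookup. A plain-Python sketch of that batching, with a hypothetical FakeAuthorStore standing in for the database (this is not Django's implementation, only the idea):

```python
from collections import namedtuple

Book = namedtuple("Book", "id author_id")


class FakeAuthorStore:
    """Hypothetical stand-in for the authors table; counts round trips."""

    def __init__(self, authors):
        self.authors = authors  # author_id -> name
        self.queries = 0

    def fetch_many(self, ids):
        self.queries += 1  # one "SELECT ... WHERE id IN (...)"
        return {i: self.authors[i] for i in ids}


def attach_authors(books, store):
    """Resolve every book's author with a single batched lookup."""
    author_ids = {b.author_id for b in books}
    authors = store.fetch_many(author_ids)
    return [(b, authors[b.author_id]) for b in books]


store = FakeAuthorStore({1: "Austen", 2: "Borges"})
books = [Book(10, 1), Book(11, 2), Book(12, 1)]
pairs = attach_authors(books, store)
```

However many books there are, the store is hit once.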

I'm not sure if this will help you, because I don't know the details and context of your problem, but order_by('?') is very inefficient, especially on some DB backends: the database has to generate a random value for every matching row before it can pick the first few.
For displaying entities with a bit of randomness I use this approach, using a custom filter:
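# in your app's templatetags module
import random
from django import template
register = template.Library()
@register.filter
def random_iterator(items, k):
    """Iterate over at most k randomly chosen elements of items."""
    class MyIterator:
        def __init__(self, obj, order):
            self.obj = obj
            self.cnt = 0
            self.order = order  # indices to visit, in random order
        def __iter__(self):
            return self
        def __next__(self):
            try:
                result = self.obj[self.order[self.cnt]]
            except IndexError:
                raise StopIteration
            self.cnt += 1
            return result
        next = __next__  # Python 2 spelling of the iterator method
    if items is None:
        items = []
    n = len(items)
    k = min(n, k)
    return MyIterator(items, random.sample(range(n), k))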
The code in my Django view is something like this:
RAND_BOUND = 50
categories = Category.objects.filter(......)[:RAND_BOUND]
And, I use it in my template in this way:
{% for cat in categories|random_iterator:5 %}
<li>{{ cat }}</li>
{% endfor %}
This picks 5 random categories out of a reduced set of at most RAND_BOUND categories.
This is not THE perfect solution, but I hope it helps.
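The hand-written iterator class in random_iterator can also be expressed as a generator function. A sketch, where random_sample_iter is a hypothetical name; it could be registered with @register.filter in the same way, since Django's for tag turns iterables without a length into a list first.

```python
import random


def random_sample_iter(items, k, rng=random):
    """Yield up to k distinct elements of items, in random order."""
    if items is None:
        items = []
    n = len(items)
    # random.sample picks k distinct indices without replacement
    for index in rng.sample(range(n), min(n, k)):
        yield items[index]


categories = ["art", "history", "science", "travel"]
picked = list(random_sample_iter(categories, 2))
```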

Related

How to group rows containing similar element from a table using MYSQL/ Python / Django?

I have a table rendered in my HTML page. The data comes from a MySQL query and is rendered using a loop. Let’s say I have this table:
I get this data through models -> views -> HTML.
views:
def context(request):
    context = {
        "contents": Something.objects.get_contents()
    }
    return render(request, 'contents.html', context)
Notice that the only repeated priority belongs to the type ‘news’. I expect 4 news items to show up, and they can share the same priority. Every other genre should have its own priority. How can I write a query and render it to an HTML page to get output like this:
I appreciate all the help! Thank you.
I don't think there's an easy way to do that with a SQL query; try doing the grouping in Python instead. Make a defaultdict, which lets you collect your contents field into a list per key:
from collections import defaultdict
contents_dict = defaultdict(list)
for priority, type, content in Something.objects.get_contents():
    contents_dict[(priority, type)].append(content)
Then rebuild it into a list of tuples:
table = [key + (','.join(s),) for key, s in contents_dict.items()]
Then your view looks like this:
def context(request):
    context = {
        "contents": table
    }
    return render(request, 'contents.html', context)
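To check the grouping step on its own, here is a self-contained run with hypothetical rows shaped like the (priority, type, content) tuples that get_contents() is assumed to return. The loop variable is named kind to avoid shadowing the built-in type, and the contents are joined with ", " for readability.

```python
from collections import defaultdict

# Hypothetical rows standing in for Something.objects.get_contents().
rows = [
    (1, "news", "Election results"),
    (1, "news", "Weather alert"),
    (2, "sports", "Match report"),
    (3, "news", "Market update"),
]

contents_dict = defaultdict(list)
for priority, kind, content in rows:
    contents_dict[(priority, kind)].append(content)

# One tuple per (priority, kind) key, with all contents joined into one string.
table = [key + (", ".join(items),) for key, items in contents_dict.items()]
```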
You can use GROUP BY and GROUP_CONCAT, something like this:
SELECT priority, type, GROUP_CONCAT(content SEPARATOR ', ') FROM table GROUP BY priority, type;
Reference: https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
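SQLite also has group_concat, so the GROUP BY approach can be tried locally with Python's sqlite3 module. Note that this swaps MySQL for SQLite, where the separator is passed as a second argument instead of SEPARATOR, and that the contents table and its rows are hypothetical. SQLite does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contents (priority INTEGER, type TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO contents VALUES (?, ?, ?)",
    [(1, "news", "Election results"), (1, "news", "Weather alert"),
     (2, "sports", "Match report")],
)
# One output row per (priority, type) group, contents joined with ', '.
rows = conn.execute(
    "SELECT priority, type, group_concat(content, ', ') "
    "FROM contents GROUP BY priority, type ORDER BY priority"
).fetchall()
```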

Django ORM, how to use values() and still work with choicefield?

I am using django v1.10.2
I am trying to create dynamic reports, for which I store the fields, the conditions and the main ORM model information in the database.
My code for generating the dynamic report is:
class_object = class_for_name("app.models", main_model_name)
results = (class_object.objects.filter(**conditions_dict)
           .values(*display_columns)
           .order_by(*sort_columns)
           [:50])
So main_model_name can be anything.
This works great, except that sometimes the models associated with the main model have fields with choices.
So for one of the reports, the main model is Pallet.
A Pallet has many PalletMovements.
My display columns are: serial_number, created_at, pallet_movement__location
The first two columns are fields that belong to the Pallet model.
The last one is from PalletMovement.
The PalletMovement model looks like this:
class PalletMovement(models.Model):
    pallet = models.ForeignKey(Pallet, related_name='pallet_movements',
                               verbose_name=_('Pallet'))
    WAREHOUSE_CHOICES = (
        ('AB', 'AB-Delaware'),
        ('CD', 'CD-Delaware'),
    )
    location = models.CharField(choices=WAREHOUSE_CHOICES,
                                max_length=2,
                                default='AB',
                                verbose_name=_('Warehouse Location'))
Since the queryset returns the raw values, how can I make use of the choices on the PalletMovement model so that pallet_movement__location gives me the display value AB-Delaware or CD-Delaware?
Bear in mind that the main_model can be anything depending on what I store in the database.
Presumably, I can store more information in the database to help me do the filtering and presentation of data even better.
The values() method returns a QuerySet of dictionaries, each mapping your field names to the corresponding values.
For example:
Model:
class MyModel(models.Model):
    name = models.CharField(max_length=100)
    surname = models.CharField(max_length=100)
    age = models.IntegerField()
    ...
Query:
result = MyModel.objects.filter(surname='moutafis').values('name', 'surname')
Result:
<QuerySet [{'name': 'john', 'surname': 'moutafis'}]>
You can now manipulate this result as you would a normal dictionary:
if main_model_name == 'PalletMovement':
    # Make life easier
    choices = dict(PalletMovement.WAREHOUSE_CHOICES)
    for item in result:
        value = item.get('pallet_movement__location')
        item.update({
            'pallet_movement__location': choices.get(value, value)
        })
You can even make this into a function for better reusability:
def verbalize_choices(choices_dict, queryset, search_key):
    result = queryset
    for item in result:
        value = item.get(search_key)
        # fall back to the stored value when it has no label
        item.update({search_key: choices_dict.get(value, value)})
    return result
verbal_result = verbalize_choices(
    dict(PalletMovement.WAREHOUSE_CHOICES),
    result,
    'pallet_movement__location'
)
I suggest using the update() and get() methods because they protect you from potential errors:
If the stored value is not a key of choices_dict, get() returns that value unchanged instead of raising a KeyError.
update() overwrites the key if it already exists, and adds it to the dictionary otherwise.
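If you need this only in the template representation of your data, you can create a custom template filter instead:
from django import template
from app.models import PalletMovement  # adjust to where your model lives
register = template.Library()
@register.filter(name='verbalize_choice')
def choice_to_verbal(choice):
    # Fall back to the raw value for unknown choices instead of raising KeyError
    return dict(PalletMovement.WAREHOUSE_CHOICES).get(choice, choice)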
Have an extra look here: Django: How to access the display value of a ChoiceField in template given the actual value and the choices?
You would use get_FOO_display(), where FOO is the field name (location here).
In your template:
{{ obj.get_location_display }}
or
{{ obj.pallet_movement.get_location_display }}
[Edit:] As pointed out in the comments, this will not work when calling values(), because values() returns plain dictionaries rather than model instances.
An alternative to creating a template tag is:
{{form.choicefield.1}}
This shows the value of the initial data of the foreign key field instead of the id.
The universal solution for any main_model_name is introspection through the Django Model _meta API: class_object._meta.get_field(field_name).choices
That is:
choice_dicts = {}
for field_name in display_columns:
    # .choices is empty (or None in newer Django) for fields without choices
    choices = class_object._meta.get_field(field_name).choices or []
    choice_dicts[field_name] = {k: v for k, v in choices}
out = []
for row in results:
    out.append({name: choice_dicts[name].get(value, value)
                for name, value in row.items()})
The rest is a trivial example, mostly code copied from the question:
>>> pallet = app.models.Pallet.objects.create()
>>> palletm = app.models.PalletMovement.objects.create(pallet=pallet, location='AB')
>>>
>>> main_model_name = 'PalletMovement'
>>> conditions_dict = {}
>>> display_columns = ['pallet_id', 'location']
>>> sort_columns = []
>>>
>>> class_object = class_for_name("app.models", main_model_name)
>>> results = (class_object.objects.filter(**conditions_dict)
... .values(*display_columns)
... .order_by(*sort_columns)
... )[:50]
>>>
>>> # *** INSERT HERE ALL CODE THAT WAS ABOVE ***
>>>
>>> print(out)
[{'location': 'AB-Delaware', 'pallet_id': 1}]
It works equally with 'pallet_id' or with 'pallet' in display_columns. Even though "_meta" starts with an underscore, it is a documented API.

Using yield with multiple ndb.get_multi_async

I am trying to improve the efficiency of my current query against the App Engine datastore. Currently, I am using a synchronous method:
class Hospital(ndb.Model):
    name = ndb.StringProperty()
    buildings = ndb.KeyProperty(kind='Building', repeated=True)
class Building(ndb.Model):
    name = ndb.StringProperty()
    rooms = ndb.KeyProperty(kind='Room', repeated=True)
class Room(ndb.Model):
    name = ndb.StringProperty()
    beds = ndb.KeyProperty(kind='Bed', repeated=True)
class Bed(ndb.Model):
    name = ndb.StringProperty()
    .....
Currently I go through them naively:
currhosp = ndb.Key(urlsafe=valid_hosp_key).get()
nbuilds = ndb.get_multi(currhosp.buildings)
for b in nbuilds:
    rms = ndb.get_multi(b.rooms)
    for r in rms:
        bds = ndb.get_multi(r.beds)
        for bed in bds:
            pass  # do something with the bed object
I would like to turn this into a much faster query using get_multi_async,
but I'm having difficulty working out how to do it.
Any ideas?
Best
Jon
Using the structures given above, it is possible (and it was confirmed) to solve this with a set of tasklets. It is a SIGNIFICANT speed-up over the iterative method.
@ndb.tasklet
def get_bed_info(bed_key):
    bed_info = {}
    bed = yield bed_key.get_async()
    # format and store bed information into bed_info
    raise ndb.Return(bed_info)
@ndb.tasklet
def get_room_info(room_key):
    room_info = {}
    room = yield room_key.get_async()
    # yielding a list of futures waits for all of them in parallel
    beds = yield [get_bed_info(k) for k in room.beds]
    # store room info in room_info
    room_info["beds"] = beds
    raise ndb.Return(room_info)
@ndb.tasklet
def get_building_info(build_key):
    build_info = {}
    building = yield build_key.get_async()
    rooms = yield [get_room_info(k) for k in building.rooms]
    # store building info in build_info
    build_info["rooms"] = rooms
    raise ndb.Return(build_info)
@ndb.toplevel
def get_hospital_buildings(hospital_object):
    buildings = yield [get_building_info(k) for k in hospital_object.buildings]
    raise ndb.Return(buildings)
Now comes the main call from the hospital function, where you have the hospital object (hospital_obj):
hosp_info = {}
buildings = get_hospital_buildings(hospital_obj)
# store hospital info in hosp_info
hosp_info["buildings"] = buildings
return hosp_info
There you go! It is very efficient, because the ndb event loop can run all the independent fetches at each level concurrently instead of one after another.
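The fan-out pattern of these tasklets is not specific to ndb. As an illustration only, here is the same structure with Python's asyncio.gather standing in for ndb tasklets and an in-memory dict standing in for the datastore (both hypothetical substitutions). Each level starts all of its child fetches at once, so total latency grows with the depth of the tree rather than with the number of entities.

```python
import asyncio

# Hypothetical in-memory "datastore": key -> entity dict.
DATASTORE = {
    "h1": {"name": "General", "buildings": ["b1", "b2"]},
    "b1": {"name": "North", "rooms": ["r1"]},
    "b2": {"name": "South", "rooms": ["r2", "r3"]},
    "r1": {"name": "101", "beds": ["bed1", "bed2"]},
    "r2": {"name": "201", "beds": ["bed3"]},
    "r3": {"name": "202", "beds": []},
    "bed1": {"name": "A"}, "bed2": {"name": "B"}, "bed3": {"name": "C"},
}


async def get_async(key):
    await asyncio.sleep(0.01)  # simulate one RPC round trip
    return DATASTORE[key]


async def get_room_info(room_key):
    room = await get_async(room_key)
    beds = await asyncio.gather(*(get_async(k) for k in room["beds"]))
    return {"name": room["name"], "beds": [b["name"] for b in beds]}


async def get_building_info(building_key):
    building = await get_async(building_key)
    rooms = await asyncio.gather(*(get_room_info(k) for k in building["rooms"]))
    return {"name": building["name"], "rooms": list(rooms)}


async def get_hospital_info(hospital_key):
    hospital = await get_async(hospital_key)
    buildings = await asyncio.gather(
        *(get_building_info(k) for k in hospital["buildings"]))
    return {"name": hospital["name"], "buildings": list(buildings)}


info = asyncio.run(get_hospital_info("h1"))
```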
You can do something similar with query.map(). See https://developers.google.com/appengine/docs/python/ndb/async#tasklets and https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map
It's impossible.
Your 2nd query (ndb.get_multi(b.rooms)) depends on the result of your first query.
So pulling it asynchronously doesn't work, as at this point the (first) result of the first query has to be available anyway.
NDB does something like that in the background (it already buffers the next items of ndb.get_multi(currhosp.buildings) while you process the first result).
However, you could use denormalization, i.e. keeping a big table with one entry per Building-Room-Bed pair, and pull your results from that table.
If you have more reads than writes to this table, this will get you a massive speed improvement (1 DB read, instead of 3).
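A minimal sketch of the denormalized lookup, assuming one row per (building, room, bed) tagged with its hospital. The field names and rows are hypothetical, and a Python list stands in for the datastore table; in the datastore this would be a single query filtered on the hospital.

```python
from collections import defaultdict

# Hypothetical denormalized rows: one per (building, room, bed).
BED_ROWS = [
    {"hospital": "h1", "building": "North", "room": "101", "bed": "A"},
    {"hospital": "h1", "building": "North", "room": "101", "bed": "B"},
    {"hospital": "h1", "building": "South", "room": "201", "bed": "C"},
    {"hospital": "h2", "building": "East", "room": "1", "bed": "D"},
]


def beds_for_hospital(rows, hospital):
    """One filtered read replaces the three nested get_multi() levels."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row["hospital"] == hospital:
            tree[row["building"]][row["room"]].append(row["bed"])
    return {building: dict(rooms) for building, rooms in tree.items()}


layout = beds_for_hospital(BED_ROWS, "h1")
```

The trade-off is on the write side: every change to a building, room or bed must also update these rows.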

What is an efficient way of inserting thousands of records into an SQLite table using Django?

I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:
for item in items:
entry = Entry(a1=item.a1, a2=item.a2)
entry.save()
What is an efficient way of doing this?
Edit: A little comparison between the two insertion methods.
Without commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 1m50.288s
user 0m6.710s
sys 0m23.445s
Using commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 0m18.464s
user 0m5.433s
sys 0m10.163s
Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.
You want to check out django.db.transaction.commit_manually.
http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually
So it would be something like:
from django.db import transaction
@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()
This commits only once, instead of at each save().
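The same commit-once idea, shown outside Django with Python's built-in sqlite3 module as an illustration. The entry table and the generated records are hypothetical, mirroring the question's Entry(a1, a2): all 8000 inserts run inside one transaction and are committed once.

```python
import sqlite3

items = [(f"a{i}", f"b{i}") for i in range(8000)]  # hypothetical records

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (a1 TEXT, a2 TEXT)")

# "with conn" commits once when the block ends (or rolls back if an
# exception escapes), instead of committing after every row.
with conn:
    conn.executemany("INSERT INTO entry (a1, a2) VALUES (?, ?)", items)

count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```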
In Django 1.3, context managers were introduced.
So now you can use transaction.commit_on_success() in a similar way:
from django.db import transaction
def viewfunc(request):
    ...
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()
In Django 1.4, bulk_create was added, allowing you to build a list of model objects and then insert them all at once.
Note that the model's save() method is not called, and the pre_save and post_save signals are not sent, when using bulk_create.
>>> Entry.objects.bulk_create([
... Entry(headline="Django 1.0 Released"),
... Entry(headline="Django 1.1 Announced"),
... Entry(headline="Breaking: Django is awesome")
... ])
In Django 1.6, transaction.atomic was introduced, intended to replace the now-legacy functions commit_on_success and commit_manually.
From the Django documentation on atomic:
atomic is usable both as a decorator:
from django.db import transaction
@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()
and as a context manager:
from django.db import transaction
def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()
    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
Bulk creation is available in Django 1.4:
https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create
Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.
You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.
To answer the question with particular regard to SQLite, as asked: I have just now confirmed that bulk_create does provide a tremendous speedup, but there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."
The quoted text is from the docs; A-IV provided a link.
What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.
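The batching idea behind such a wrapper can be sketched in plain Python. This is not the snippet's code: chunked and batch_size_for are hypothetical helpers, and 999 is the limit quoted from the docs above.

```python
from itertools import islice

SQLITE_MAX_VARIABLES = 999  # the per-query limit quoted from the Django docs


def chunked(iterable, size):
    """Yield successive lists of at most size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def batch_size_for(fields_per_row, max_variables=SQLITE_MAX_VARIABLES):
    """Largest number of rows whose placeholders fit into one query."""
    return max(1, max_variables // fields_per_row)


rows = list(range(8000))
size = batch_size_for(fields_per_row=2)  # e.g. Entry(a1, a2)
batches = list(chunked(rows, size))
```

In Django each batch could then be passed to Entry.objects.bulk_create(batch); newer Django versions also accept a batch_size argument to bulk_create that does this splitting for you.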
You should check out DSE. I wrote DSE to solve these kinds of problems (massive inserts or updates). Using the Django ORM is a dead end; you have to do it in plain SQL, and DSE takes care of much of that for you.
Thomas
I recommend using plain SQL (not the ORM); you can insert multiple rows with a single INSERT:
INSERT INTO A SELECT * FROM B;
The SELECT portion of your SQL can be as complicated as you want, as long as its results match the columns of table A and there are no constraint conflicts.
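A minimal runnable illustration with Python's sqlite3 module; the staging and product tables and their rows are hypothetical. A single INSERT ... SELECT copies every row the SELECT produces, and the SELECT may filter or compute as long as its columns line up with the target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (name TEXT, qty INTEGER)")
conn.execute("CREATE TABLE product (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("pen", 3), ("ink", 0), ("pad", 5)])

with conn:
    # One statement copies all matching rows; no per-row round trips.
    conn.execute(
        "INSERT INTO product (name, qty) "
        "SELECT name, qty FROM staging WHERE qty > 0"
    )

copied = conn.execute("SELECT name, qty FROM product ORDER BY name").fetchall()
```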
from django.http import HttpResponse
from django.shortcuts import render
from .models import Customer, Orders, Products  # assumed location of the app's models
def order(request):
    if request.method == "GET":
        # get the values from the HTML page
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')
        try:
            # Insert data into the customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()
            # Use the id of the row just saved (assumes customer_id is the primary key);
            # re-reading the "last" customer could pick up another request's row
            custo_id = cust_tab.pk
            # Insert data into the Orders table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()
            # Insert data into the Products table, skipping rows with an empty field
            for p_n, p_q, p_p in zip(prod_name, prod_qty, prod_price):
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
            return HttpResponse('Your Record Has been Saved')
        except Exception as e:
            return HttpResponse(e)
    return render(request, 'invoice_system/order.html')

Union and Intersect in Django

class Tag(models.Model):
    name = models.CharField(max_length=100)
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tags = models.ManyToManyField(Tag)
Simple models, just to ask my question.
I wonder how I can query blogs using tags in two different ways.
Blog entries that are tagged with "tag1" or "tag2":
Blog.objects.filter(tags__in=[1,2]).distinct()
Blog objects that are tagged with "tag1" and "tag2": ?
Blog objects that are tagged with exactly "tag1" and "tag2" and nothing else: ??
Tag and Blog are just used as an example.
You could use Q objects for #1:
# Blogs who have either hockey or django tags.
from django.db.models import Q
Blog.objects.filter(
    Q(tags__name__iexact='hockey') | Q(tags__name__iexact='django')
)
Unions and intersections, I believe, are a bit outside the scope of the Django ORM, but it's possible to do them. The following examples are from a Django application called django-tagging that provides this functionality.
For part #1 (tagged with any of the tags), you're looking for a union of the queries, basically; see line 346 of models.py:
def get_union_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *any* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have any of
    # the given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
For part #2 (tagged with all of the tags), I believe you're looking for an intersection. See line 307 of models.py:
def get_intersection_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *all* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have all the
    # given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s
    HAVING COUNT(%(model_pk)s) = %(tag_count)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
        'tag_count': tag_count,
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
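To see what the HAVING COUNT(...) = tag_count condition in get_intersection_by_model does, here is a cut-down version run against SQLite with a plain blog/tag/join-table schema. The schema and rows are hypothetical (django-tagging itself uses a generic tagged-item table with content types), and like the original it assumes each (blog, tag) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_tags (blog_id INTEGER, tag_id INTEGER);
    INSERT INTO blog VALUES (1, 'only tag1'), (2, 'both'), (3, 'both plus tag3');
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    INSERT INTO blog_tags VALUES (1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3);
""")

wanted = ["tag1", "tag2"]
placeholders = ", ".join("?" for _ in wanted)
# Keep only blogs whose number of matching tag rows equals the number of
# requested tags, i.e. blogs that carry every requested tag.
query = f"""
    SELECT bt.blog_id
    FROM blog_tags AS bt JOIN tag AS t ON t.id = bt.tag_id
    WHERE t.name IN ({placeholders})
    GROUP BY bt.blog_id
    HAVING COUNT(bt.blog_id) = ?
    ORDER BY bt.blog_id
"""
ids = [row[0] for row in conn.execute(query, [*wanted, len(wanted)])]
```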
I've tested these out with Django 1.0:
The "or" queries:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).distinct()
or you could use the Q class:
Blog.objects.filter(Q(tags__name='tag1') | Q(tags__name='tag2')).distinct()
The "and" query:
Blog.objects.filter(tags__name='tag1').filter(tags__name='tag2')
I'm not sure about the third one, you'll probably need to drop to SQL to do it.
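The three cases differ only in how a blog's tag set relates to the requested set, which a plain-Python sketch over a hypothetical mapping makes concrete. In SQL terms, the "exactly" case amounts to the "and" query plus a check that the blog's total number of tags equals the number requested.

```python
# Hypothetical blog -> tag-name mapping standing in for the M2M table.
blog_tags = {
    "only tag1": {"tag1"},
    "both": {"tag1", "tag2"},
    "both plus tag3": {"tag1", "tag2", "tag3"},
}
wanted = {"tag1", "tag2"}

any_of = sorted(b for b, tags in blog_tags.items() if tags & wanted)    # part 1: or
all_of = sorted(b for b, tags in blog_tags.items() if wanted <= tags)   # part 2: and
exactly = sorted(b for b, tags in blog_tags.items() if tags == wanted)  # part 3: exactly
```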
Please don't reinvent the wheel; use the django-tagging application, which was made exactly for your use case. It can do all the queries you describe, and much more.
If you need to add custom fields to your Tag model, you can also take a look at my branch of django-tagging.
This will do the trick for you for the "tag1 and tag2" case:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).annotate(tag_matches=models.Count('tags')).filter(tag_matches=2)
