Django `assertNumQueries` showing duplicate queries on deferred field - python

I am seeing strange behaviour and cannot work out why it is happening.
I have a simple queryset with a deferred field, for example Person.objects.filter(id=4).defer('phone'), and a test that asserts this:
with self.assertNumQueries(2):
    p = Person.objects.filter(id=4).defer('phone').first()  # 1 query
    p.phone  # 1 query
It fails, because it seems to run three queries on that block: the first one when filtering, and two more duplicate queries that come from the p.phone statement (SELECT phone FROM ...).
Does anyone have any idea why this is happening?
Note: I'm using Django 2.0. It also happens with only(), the counterpart of defer().

I can't reproduce this, so it is probably something specific to your case. I wrote this test case with the default Django User model, and it passes. Provide more info if you need a better answer.
class TestDefer(APITestCase):
    def test_defer(self):
        u = User.objects.create(email='aaa#bbb.com', is_staff=True)
        with self.assertNumQueries(1):
            p = User.objects.defer('is_staff').get(id=u.id)
        with self.assertNumQueries(1):
            print(p.is_staff)
        with self.assertNumQueries(1):
            p = User.objects.defer('email').get(id=u.id)
        with self.assertNumQueries(1):
            print(p.email)

Related

How to avoid ordering by in django queryset, order_by() not working

I have a "big" db with over 60M records, and I'm trying to paginate by 50.
I have another db with ~8M records and it works perfectly, but with 60M records the page just never loads and the db is overwhelmed.
I found that the problem was the order_by(id) made by Django, so I tried using a MySQL view already ordered by id, but then Django tries to order it again. To avoid this, I used order_by(), which is supposed to clear any ordering, but the ordering is still applied.
def get_queryset(self, request):
    qs = super(CropAdmin, self).get_queryset(request)
    qs1 = qs.only('id', 'grain__id', 'scan__id', 'scan__acquisition__id',
                  'validated', 'area', 'crop_date', 'matched_label', 'grain__grain_number', 'filename').order_by()
    if request.user.is_superuser:
        return qs1
The query made is still using order_by:
SELECT `crops_ordered`.`crop_id`,
`crops_ordered`.`crop_date`,
`crops_ordered`.`area`,
`crops_ordered`.`matched_label`,
`crops_ordered`.`validated`,
`crops_ordered`.`scan_id`,
`crops_ordered`.`grain_id`,
`crops_ordered`.`filename`,
`scans`.`scan_id`,
`scans`.`acquisition_id`,
`acquisitions`.`acquisition_id`,
`grains`.`grain_id`,
`grains`.`grain_number`
FROM `crops_ordered`
INNER JOIN `scans`
ON (`crops_ordered`.`scan_id` = `scans`.`scan_id`)
INNER JOIN `acquisitions`
ON (`scans`.`acquisition_id` = `acquisitions`.`acquisition_id`)
INNER JOIN `grains`
ON (`crops_ordered`.`grain_id` = `grains`.`grain_id`)
ORDER BY `crops_ordered`.`crop_id` DESC
LIMIT 50
Any idea on how to fix this? Or a better way to work with a db of this size?
I don't believe order_by() will work here, as Django most likely applies a default ordering of its own afterwards (the admin change list orders by primary key so that pagination stays deterministic). Having said that, I believe this thread has the answer that you want.
Edit
The link in that thread might provide too much information at once, although there aren't many details on this either. If you don't like GitHub, there's also an official documentation page on this method, but you'll have to look for clear_ordering manually using CTRL + F or an equivalent.

Querying DB for users when not all given parameters are filled

Sorry about the confusing title; it was hard to figure out how to word the question.
Currently I have a SQLite db with some users in it; each has a first name, last name, dob, high school, and high school class. The db is connected to Flask using SQLAlchemy. For my search function I have 4 inputs, and I want any input that is left empty to be left out of the search query. Say the person searches for the last name and high school: I want it to search using just those parameters. I've tried doing this with a bunch of if statements, but it seems messy and there must be a better way. Below is the query that I use, but it only works if all 4 are filled. Is there a better way than a bunch of if statements with different queries? I've looked around and haven't found anything.
userq=User.query.filter_by(first_name=fname_strip,last_name=lname_strip,hs_class=hs_class_strip).all()
You can try if statements like the following:
q = User.query
if fname_strip:
    q = q.filter_by(first_name=fname_strip)
if lname_strip:
    q = q.filter_by(last_name=lname_strip)
if hs_class_strip:
    q = q.filter_by(hs_class=hs_class_strip)
# Execute the query
users = q.all()
Update: each filter_by() call returns a new query, so its result has to be assigned back to q.
Okay, so what I did was make an if statement for each input like you said, but store each result in a different variable. Each one checks whether the input is empty or invalid: if the input is good, it queries on that value; if not, it queries for every row where that column is not null. I then converted the results to sets and took the set intersection to find the users common to all of them. Thank you to ionheart for helping me through this and providing the information; this is the complete answer using his partial solution.
userf = set()
userl = set()
userc = set()
userh = set()
if fname_strip != '':
    userf = User.query.filter_by(first_name=fname_strip).all()
    print(userf)
else:
    userf = User.query.filter(User.first_name.isnot(None))
if lname_strip != '':
    userl = User.query.filter_by(last_name=lname_strip).all()
    print(userl)
else:
    userl = User.query.filter(User.last_name.isnot(None))
try:
    int(hs_class_strip)
    userc = User.query.filter_by(hs_class=hs_class_strip).all()
    print(userc)
except (TypeError, ValueError):
    # hs_class was empty or not a number, so do not restrict on it
    userc = User.query.filter(User.hs_class.isnot(None))
if hs_strip != '':
    userh = User.query.filter_by(hs=hs_strip).all()
    print(userh)
else:
    userh = User.query.filter(User.hs.isnot(None))
userq = []
# Users that satisfy every criterion
common = set(userf) & set(userl) & set(userc) & set(userh)
print(common)
If you pass the arguments to your search function as keyword arguments, you can change the signature to accept kwargs and pass those on to the filter query:
def search(**kwargs):
    userq = User.query.filter_by(**kwargs).all()
    return userq
This way any arguments you don't specify when calling search will not be passed on to the query. For example, calling search(first_name='bob', last_name='fossil') will only add the first name and surname arguments to the query.
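If the form always submits all four fields, some of them blank, one way to combine this with the kwargs idea is to drop the empty values first. This is only a minimal sketch: the helper name build_filters and the sample values are made up for illustration, and the column names are the ones from the question.

```python
def build_filters(**params):
    """Return only the search parameters that were actually filled in."""
    filters = {}
    for name, value in params.items():
        # Treat None and blank/whitespace-only strings as "not provided".
        if value is None:
            continue
        text = str(value).strip()
        if text:
            filters[name] = text
    return filters


filters = build_filters(first_name="", last_name="fossil", hs="Central", hs_class=None)
print(filters)
# With Flask-SQLAlchemy the dict could then be passed on, e.g.:
# users = User.query.filter_by(**filters).all()
```

Only last_name and hs survive here, so only those two columns would end up in the WHERE clause.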

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone will suggest something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, because you might need to traverse a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, and there is no way to guess that path. The User table could be linked to table2 too, even though you don't need that link in this case, so I think you can't avoid setting the path manually.
Also, you can unpack a dictionary into Q() and a list or a dictionary into filter(), which is much easier to work with than writing keyword parameters by hand and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
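The same prefixing trick can be applied to a whole dictionary of lookups at once. The sketch below is plain Python with no Django import; with_context is a hypothetical helper name, and it only covers simple AND-ed keyword lookups, not arbitrary Q() trees built with | or ~.

```python
def with_context(lookups, context=""):
    """Prefix every lookup key with a relation path such as 'account'."""
    if not context:
        return dict(lookups)
    return {context + "__" + key: value for key, value in lookups.items()}


# Criteria written once, from the User model's point of view.
user_lookups = {"username__startswith": "a", "is_active": True}

# The same criteria as seen from UserProfile, via its 'account' relation.
profile_lookups = with_context(user_lookups, "account")
print(profile_lookups)
# In Django the result could be unpacked as Q(**profile_lookups).
```

Keeping the criteria as a dictionary until the last moment means the relation path is decided by the caller, which is the "pass the context manually" approach described above.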

Django Query only one field of a model using .extra() and without using .defer() or .only()

I'm using the Django ORM's extra() method to query only selected fields from a set of models to save RAM. I can't use defer() or only() due to some constraints on the ORM manager I am using (it's not the default one).
The following code works without an error:
q1 = Model.custom_manager.all().extra(select={'field1':'field1'})
# I only want one field from this model
However, when I jsonify the q1 queryset, I get every single field of the model, so extra() must not have worked. Or am I doing something wrong?
print SafeString(serializers.serialize('json', q1))
>>> '{ everything!!!!!}'
To be more specific, the custom manager I am using is django-sphinx. Model.search.query(...) for example.
Thanks.
So, I'm not sure you can do exactly what you want to do. However, if you only want the values for a particular field or a few fields, you can do it with values().
It likely does the full query, but the result will only have the values you want. Using your example:
q1 = Model.custom_manager.values('field1', 'field2').all()
This should return a ValuesQuerySet, which you will not be able to use with serializers.serialize, so you will have to do something like this:
import json  # the stdlib json module replaces the long-removed django.utils.simplejson
data = [value for value in q1]
json_dump = json.dumps(data)
Another, probably better, solution is to run your query as originally intended, forgetting extra() and values(), and just use the fields kwarg of the serialize method like this:
print SafeString(serializers.serialize('json', q1, fields=('field1', 'field2')))
The downside is that none of these things actually does the same thing as defer() or only() (all the fields are still returned from the database), but you get the output you desire.

Iterating over a large Django queryset while the data is changing elsewhere

Iterating over a queryset, like so:
class Book(models.Model):
    # <snip some other stuff>
    activity = models.PositiveIntegerField(default=0)
    views = models.PositiveIntegerField(default=0)
    def calculate_statistics(self):
        self.activity = self.views * 4
        self.save()
def cron_job_calculate_all_book_statistics():
    for book in Book.objects.all():
        book.calculate_statistics()
...works just fine. However, this is a cron task. book.views is being incremented while this is happening. If book.views is modified while this cronjob is running, it gets reverted.
Now, book.views is not being modified by the cronjob, but it is being cached during the .all() queryset call. When book.save() is called, I have a feeling it is using the old book.views value.
Is there a way to make sure that only the activity field is updated? Alternatively, let's say there are 100,000 books. This will take quite a while to run. But the book.views will be from when the queryset originally starts running. Is the solution to just use an .iterator()?
UPDATE: Here's effectively what I am doing. If you have ideas about how to make this work well inline, then I'm all for it.
def calculate_statistics(self):
    self.activity = self.views + self.hearts.count() * 2
    # Can't do self.comments.count with a comments GenericRelation, because Comment uses
    # a TextField for object_pk, and that breaks the whole system. Lame.
    self.activity += Comment.objects.for_model(self).count() * 4
    self.save()
The following will do the job for you in Django 1.1, no loop necessary:
from django.db.models import F
Book.objects.all().update(activity=F('views')*4)
You can have a more complicated calculation too:
for book in Book.objects.all().iterator():
    Book.objects.filter(pk=book.pk).update(activity=book.calculate_activity())
Both these options have the potential to leave the activity field out of sync with the rest, but I assume you're ok with that, given that you're calculating it in a cron job.
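To see why writing only the activity column avoids the reverted counter, here is a small sketch that uses the stdlib sqlite3 module as a stand-in for the real database. The table layout and numbers are made up; the second UPDATE is roughly the kind of statement that update(activity=F('views') * 4) asks the database to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, views INTEGER, activity INTEGER)")
conn.execute("INSERT INTO book (id, views, activity) VALUES (1, 10, 0)")

# The cron job reads the row (views is 10 at this point).
stale_views = conn.execute("SELECT views FROM book WHERE id = 1").fetchone()[0]

# Meanwhile a page view increments the counter elsewhere.
conn.execute("UPDATE book SET views = views + 1 WHERE id = 1")

# Writing only the activity column, computed in SQL, leaves views untouched
# and uses the value stored at the moment the statement runs.
conn.execute("UPDATE book SET activity = views * 4 WHERE id = 1")

views, activity = conn.execute("SELECT views, activity FROM book WHERE id = 1").fetchone()
print(stale_views, views, activity)
```

A full-row save with the stale object would have written views back as 10. On Django 1.5 and later, self.save(update_fields=['activity']) is another way to restrict the UPDATE to the named column when the calculation has to happen in Python.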
In addition to what others have said if you are iterating over a large queryset you should use iterator():
Book.objects.filter(stuff).order_by(stuff).iterator()
This will cause Django not to cache the items as it iterates (caching could use a ton of memory for a large result set).
No matter how you solve this, beware of transaction-related issues. For example, the default transaction isolation level is REPEATABLE READ, at least for the MySQL backend. This, plus the fact that both Django and the db backend work in a specific autocommit mode with an ongoing transaction, means that even if you use whrde's (very nice) suggestion, the value of `views` could no longer be valid. I could be wrong here, but consider yourself warned.
