I have a Comment model in Django that contains a foreign key to the User model, and I'm trying to look up the comments for a certain post ID and then join/get the user data of each comment's author. This is what I'm doing:
result = Comment.objects.filter(post=post).select_related('user').order_by('-created_at')
When I return the result, I get the exact same object I got before I added the select_related() call. Am I missing something here?
The .select_related(…) [Django-doc] call makes a JOIN in the query, and thus uses that single query to also load the data of the related .user object. If you do not use .select_related(…), then accessing .user on a Comment will result in an extra query. If you need to load the users of N Comments, that will take N+1 queries (this is the famous N+1 problem).
.select_related(…) thus makes no functional difference (or hardly any), but it results in a significant performance boost if you plan to access the .user of all the Comments.
You can thus, for example, print the .username of each Comment's .user with:
for comment in Comment.objects.select_related('user'):
    print(comment.user.username)
If you do this without a .select_related(…) clause, it will result in a large number of queries.
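If you want to verify the difference yourself, here is a minimal sketch using Django's connection.queries, which records the executed queries when DEBUG=True:

from django.db import connection, reset_queries

reset_queries()
comments = list(Comment.objects.select_related('user'))
for comment in comments:
    _ = comment.user.username  # no extra query per comment
print(len(connection.queries))  # 1 with select_related, N+1 without it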
Related
I need to make a function that will be launched in Celery and will take records from the model one at a time, check something, and write data to another model with a one-to-one relationship. There are a lot of entries, so loading them all with model_name.objects.all() is not appropriate (it will take a lot of memory and time). How do I do this correctly?
You can use an iterator over the queryset (https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator) so your records are fetched one by one:
model_iterator = your_model.objects.all().iterator()
for record in model_iterator:
    do_something(record)
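Inside a Celery task, that could look like the following minimal sketch; the task, model, and field names here are hypothetical, and chunk_size (available since Django 2.0) controls how many rows are fetched per database round trip:

from celery import shared_task

from .models import SourceModel, ResultModel  # hypothetical models

@shared_task
def check_records():
    # iterator() streams rows instead of caching the whole queryset in memory
    for record in SourceModel.objects.all().iterator(chunk_size=2000):
        if record.needs_check:  # hypothetical "check something" condition
            ResultModel.objects.update_or_create(
                source=record,  # assumed OneToOneField back to SourceModel
                defaults={'checked': True},
            )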
I need to cache a mid-sized queryset (about 500 rows). I had a look at some solutions, django-cache-machine being the most promising.
Since the queryset is pretty much static (it's a table of cities that's been populated in advance and gets updated only by me and anyway, almost never), I just need to serve the same queryset at every request for filtering.
In my search, one detail was really not clear to me: is the cache a sort of singleton object, which is available to every request? By which I mean, if two different users access the same page, and the queryset is evaluated for the first user, does the second one get the cached queryset?
I could not figure out exactly what problem you are facing. What you are describing is the classic use case for caching; Memcached and Redis are the two most popular options. You just need to write some method or function that first tries to load the result from the cache and, if it is not there, queries the database. E.g.:
from django.core.cache import cache
from django.contrib.auth.models import User

def cache_user(userid):
    key = "user_{0}".format(userid)
    value = cache.get(key)
    if value is None:
        # fetch the value from the db, then store it under the key
        value = User.objects.get(pk=userid)
        cache.set(key, value)
    return value
Although for simplicity I have written this as a function, ideally this should be a manager method on the model concerned.
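Applied to the question's use case (a nearly static table of cities), a minimal sketch of such a manager could look like this; the model, field, and key names are illustrative, and timeout=None tells the cache backend to keep the entry until it is explicitly invalidated:

from django.core.cache import cache
from django.db import models

class CityManager(models.Manager):
    CACHE_KEY = 'all_cities'

    def cached_all(self):
        cities = cache.get(self.CACHE_KEY)
        if cities is None:
            cities = list(self.get_queryset())  # evaluate the queryset once
            cache.set(self.CACHE_KEY, cities, timeout=None)
        return cities

class City(models.Model):
    name = models.CharField(max_length=100)

    objects = CityManager()

And to answer the question directly: yes, backends like Memcached or Redis live outside the request cycle, so every request (from any user) that calls City.objects.cached_all() after the first one gets the same cached list.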
I am relatively new to Django and Python, but I have not been able to quite figure this one out.
I essentially want to query the database using filter for a large group of users. Then I want to make a bunch of queries on just this subset of users. So I thought it would be most efficient to first query with my larger filter parameters, and then make my separate filter queries on that set. In code, it looks like this:
# Get the big group of users, like all people with brown hair.
group_of_users = Data.objects.filter(......)

# Now get all the people with brown hair and blue eyes, and then all with green eyes, etc.
for each haircolor:
    subset_of_group = group_of_users.filter(....)
That is just pseudo-code by the way; I am not that inept. I thought this would be more efficient, but it seems that if I eliminate the first query and simply build the querysets in the for loop, it is much faster (I actually timed it).
I fear this is because when I filter first and then filter again in the for loop, it is actually running both sets of filter queries on each loop iteration, so really doing twice the amount of work I want. I thought that with caching this would not matter, as the first filter's results would be cached and it would still be faster, but again, I timed it with multiple tests and the single filter is faster. Any ideas?
EDIT:
So it seems that querying for a set of data, and then trying to further query only against that set of data, is not possible. Rather, I should query for a set of data and then further parse that data using regular Python.
As garnertb and lanzz said, it doesn't matter where you use the filter function; the only thing that matters is when you evaluate the query (see when querysets are evaluated). My guess is that in your tests you evaluate the queryset somewhere in your code, and that you do more evaluations in the test with separate filter calls.
Whenever a queryset is evaluated, its results are cached. However, this cache does not carry over if you use another method, such as filter or order_by, on the queryset. So you can't evaluate the bigger set and then filter that queryset to retrieve the smaller sets without doing another query.
If you only have a small set of haircolours, you can get away with doing a query for each haircolour. However, if you have many of them, the number of queries will have a severe impact on performance. In that case it might be better to do one query for the full set of users you want to use, and then do the subsequent processing in Python:
qs = Data.objects.filter(hair='brown')
objects = dict()
for obj in qs:
    objects.setdefault(obj.haircolour, []).append(obj)
for (k, v) in objects.items():
    print("Objects for colour '%s':" % k)
    for obj in v:
        print("- %s" % obj)
Filtering Django querysets does not perform any database operation until you actually try to access the result. Filtering only adds conditions to the queryset, which are then used to build the final query when you access its result.
When you assign group_of_users = Data.objects.filter(...), no data is retrieved from the database; you just get a queryset that knows you want records satisfying a specific condition (the filtering parameters you supplied to Data.objects.filter), but it does not pre-fetch those actual users. After that, when you assign subset_of_group = group_of_users.filter(....), you don't filter just that previous group of users; you only add more conditions to the queryset, and still no data has been retrieved from the database at this point.
Only when you actually try to access the results of the queryset (e.g. by iterating over it, slicing it, or accessing a single index in it) will the queryset build a single query (usually) that retrieves only the user records satisfying all the filtering conditions you have accumulated up to that point. It will still need to filter your entire users table to find those matching users; it cannot take advantage of the "previously retrieved" users from the group_of_users = Data.objects.filter(...) queryset, because nothing was actually retrieved at that point.
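You can see this laziness for yourself with a small sketch (the field names here are made up; a queryset's .query attribute shows the SQL that would run, without executing it):

group_of_users = Data.objects.filter(hair='brown')   # no query yet
blue_eyed = group_of_users.filter(eyes='blue')       # still no query
print(blue_eyed.query)   # prints the combined SQL, still without hitting the database
users = list(blue_eyed)  # the single query runs here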
Your approach is exactly right and it is efficient. The Querysets don't touch the database until they are evaluated, so you can add as many filters as you like and the database won't be touched. Django's excellent documentation provides all the information you need to figure out what operations cause the Queryset to be evaluated.
I noticed that there's no guarantee that the database is updated synchronously after calling save() on a model.
I have done a simple test by making an AJAX call to the following view:
def save(request, id):
    product = ProductModel.objects.get(id=id)
    product.name = 'New Product Name'
    product.save()
    return HttpResponse('success')
On the client side I wait for a response from the above method and then execute a findAll method that retrieves the list of products. The returned list contains the old value for the name of the updated product.
However, if I delay the request for the list of products, then it contains the new value, just like it should.
This means that return HttpResponse('success') is fired before the new values are written into the database.
If the above is true, is there a way to return the HTTP response only after the database is updated?
You should have mentioned App Engine more prominently. I've added it to the tags.
This is very definitely because of your lack of understanding of GAE, rather than anything to do with Django. You should read the GAE documentation on eventual consistency in the datastore, and structure your models and queries appropriately.
Normal Django, running with a standard relational database, would not have this issue.
The view does not return anything before the .save() call has finished its flow.
As for the flow itself, Django's docs describe it quite explicitly:
When you save an object, Django performs the following steps:
1) Emit a pre-save signal. The signal django.db.models.signals.pre_save is sent, allowing any functions listening for that signal to take some customized action.
2) Pre-process the data. Each field on the object is asked to perform any automated data modification that the field may need to perform.
Most fields do no pre-processing — the field data is kept as-is. Pre-processing is only used on fields that have special behavior. For example, if your model has a DateField with auto_now=True, the pre-save phase will alter the data in the object to ensure that the date field contains the current date stamp. (Our documentation doesn’t yet include a list of all the fields with this “special behavior.”)
3) Prepare the data for the database. Each field is asked to provide its current value in a data type that can be written to the database.
Most fields require no data preparation. Simple data types, such as integers and strings, are ‘ready to write’ as a Python object. However, more complex data types often require some modification.
For example, DateField fields use a Python datetime object to store data. Databases don’t store datetime objects, so the field value must be converted into an ISO-compliant date string for insertion into the database.
4) Insert the data into the database. The pre-processed, prepared data is then composed into an SQL statement for insertion into the database.
5) Emit a post-save signal. The signal django.db.models.signals.post_save is sent, allowing any functions listening for that signal to take some customized action.
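To illustrate steps 1 and 5, here is a minimal sketch of a pre_save/post_save receiver pair (using the ProductModel from the question; the function names are illustrative):

from django.db.models.signals import pre_save, post_save
from django.dispatch import receiver

@receiver(pre_save, sender=ProductModel)
def before_product_save(sender, instance, **kwargs):
    # runs at step 1, before the row is written
    print("about to save:", instance.name)

@receiver(post_save, sender=ProductModel)
def after_product_save(sender, instance, created, **kwargs):
    # runs at step 5, after the INSERT/UPDATE has executed
    print("saved:", instance.name, "created:", created)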
Let me note that the behaviour you're seeing would be possible if you had applied the @transaction.commit_on_success decorator to your view, though I don't see it in your code.
More on transactions: https://docs.djangoproject.com/en/1.5/topics/db/transactions/
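For completeness, a sketch of what that would look like (commit_on_success is the Django 1.5-era API described in the link above; it delays the commit until the decorated view returns without raising):

from django.db import transaction
from django.http import HttpResponse

@transaction.commit_on_success
def save(request, id):
    product = ProductModel.objects.get(id=id)
    product.name = 'New Product Name'
    product.save()
    # with the decorator, the UPDATE is committed right after this return,
    # but still before the HTTP response is sent to the client
    return HttpResponse('success')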
I am trying to design a tagging system with a model like this:
class Tag(models.Model):
    content = models.CharField(max_length=100)
    creator = models.ForeignKey(User, on_delete=models.CASCADE)
    used = models.IntegerField(default=0)
It is a many-to-many relationship between tags and what's been tagged.
Every time I insert a record into the association table,
Tag.used is incremented by one, and it is decremented by one on deletion.
Tag.used is maintained because I want to speed up answering the question 'How many times is this tag used?'.
However, this obviously slows insertion down.
Please tell me how to improve this design.
Thanks in advance.
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
If your database supports materialized indexed views, you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.
Your view would be based on a query like:
SELECT TagID, COUNT(*)
FROM YourTable
GROUP BY TagID
The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
I don't think it's a good idea to denormalize your data like that.
I think a more elegant solution is to use Django aggregation to track how many times the tag has been used: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
You could attach the used count to your tag object by calling something like this:
from django.db.models import Count

my_tag = Tag.objects.annotate(used=Count('post'))[0]
and then accessing it like this:
my_tag.used
assuming that you have a Post model class that has a ManyToManyField to your Tag class (a sketch of that assumed setup follows below)
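A minimal sketch of that assumed relationship (the title field is illustrative):

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200)
    # the reverse query name for this field defaults to 'post',
    # which is what Count('post') refers to above
    tags = models.ManyToManyField(Tag)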
You can order the Tags by the named annotated field if needed:
Tag.objects.annotate(used=Count('post')).order_by('-used')
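If you do decide to keep a denormalized used counter anyway, updating it with an F() expression keeps each change a single atomic UPDATE instead of a racy read-modify-write (a sketch):

from django.db.models import F

# when an association row is inserted
Tag.objects.filter(pk=tag.pk).update(used=F('used') + 1)

# when an association row is deleted
Tag.objects.filter(pk=tag.pk).update(used=F('used') - 1)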