django - convert a list back to a queryset [duplicate]

This question already has answers here:
A QuerySet by aggregate field value
(3 answers)
Closed 8 years ago.
I have a handful of records that I would like to sort based on a computed value. Got the answer over here... like so:
sorted(Profile.objects.all(), key=lambda p: p.reputation)
on a Profile class like this:
class Profile(models.Model):
    ...
    @property
    def reputation(self):
        ...
Unfortunately the generic view is expecting a queryset object and throws an error if I give it a list.
Is there a way to do this that returns a queryset
or...
Can I convert a list to a queryset somehow? Couldn't find anything like that in the django docs.
I am hoping not to denormalize the data, but I guess I will if I have to.
Update / Answer:
It seems that the only way to get a queryset back is to express all of your logic in the SQL queries.
When that is not possible, (I think) you need to denormalize the data.

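OK, this post is now old, but what you could do is collect the ids of the objects in your list and then run model.objects.filter(pk__in=list_of_ids). Note that the resulting queryset does not keep the order of your list; rows come back in the model's default ordering (or in whatever order the database returns them) unless you add an order_by().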

There is no point in converting a list of data back into a queryset. A queryset represents a query to the database; it is not a container you can load arbitrary data into. If you turned your list into a queryset, it would have to fetch everything again, which would be redundant and very bad performance-wise.
What you can do:
Describe how the reputation field is calculated; it's probably possible to order the data in the database somehow.
Modify the view so that it does not require a queryset. If it needs to do additional filtering etc., this should be done before any ordering, since sorting takes less time with fewer entries (and less data is fetched from the database). You can then pass the filtered queryset to the sort function just before you send the result to the template (which shouldn't care whether it's a queryset or a list).

Related

Django ORM: Filter results by values from list, limit answers per value?

I'm using Django 2.0 and have a Content model with a ForeignKey(User, ...). I also have a list of user IDs for which I'd like to fetch that Content, ordered by "newest first", but only up to 25 elements per user. I know I can do this:
Content.objects.filter(user_id__in=[1, 2, 3, ...]).order_by('-id')
...to fetch all the Content objects created by each of these users, plus I'll get it all sorted with newest elements first. But I'd like to fetch up to 25 elements for each of these users (some users might create hundreds of these objects, some might create zero). There's of course the dumb way:
for user in [1, 2, 3, ...]:
    Content.objects.filter(user_id=user).order_by('-id')[:25]
This, however, hits the database once for every entry in the user ID list, and that number gets quite high (around 100 or so per page view). Is there any way to optimize this case? (I've tried looking around select_related, but that seems to fetch as many related models as possible.)

There are plenty of ways to form a greatest-n-per-group query, but in this case you could form a union of top-n queries of all users:
contents = Content.objects.none().union(
    *[Content.objects.filter(user_id=uid).order_by('-id')[:25]
      for uid in user_ids],
    all=True)
Using prefetch_related() you could then produce a queryset that fetches the users and injects an attribute of latest content:
users = User.objects.filter(id__in=user_ids).prefetch_related(
    models.Prefetch(
        'content_set',
        queryset=contents,
        to_attr='latest_content'))
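To make the goal of a greatest-n-per-group query concrete, here is a plain-Python sketch of the result shape such a query is meant to produce. It works on in-memory (user_id, content_id) tuples rather than on the database, so it is an illustration and not a replacement for the ORM query; latest_per_user and the sample rows are hypothetical.

```python
from collections import defaultdict
from heapq import nlargest


def latest_per_user(rows, n):
    """rows: iterable of (user_id, content_id); keep the n highest ids per user."""
    by_user = defaultdict(list)
    for user_id, content_id in rows:
        by_user[user_id].append(content_id)
    # nlargest returns the ids in descending order, i.e. newest first.
    return {uid: nlargest(n, ids) for uid, ids in by_user.items()}


# Made-up sample data: user 1 has three items, user 2 has two.
rows = [(1, 10), (2, 11), (1, 12), (1, 13), (2, 14)]
top = latest_per_user(rows, n=2)
```

Doing this in Python only makes sense when the full row set is small; with users who own hundreds of rows, the limit should be applied in the database.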

Does it actually hit the database that many times? I have not looked at the raw SQL but according to the documentation it is equivalent to the LIMIT clause and it also states "Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query".
https://docs.djangoproject.com/en/2.0/topics/db/queries/#limiting-querysets
I would be curious to see the raw SQL if you are looking at it and it does NOT do this as I use this paradigm.

Does a MongoDB query need explicit sort() call if data is retrieved from an index?

I am building a mongo database to store data that will be time stamped. Each document in my database has a time field:
{"time":<datetime-object>}
I have created an index for the time field as so:
self.db.test.create_index([("time", pymongo.ASCENDING)])
And have a query that requests only the time stamp information from the database:
self.db.test.find({'time':{'$gte':start, '$lte':end}}, {"time":1, "_id":0}).sort([("time", 1)])
I have read other questions/documentation that say using an index to get documents should return documents in sorted order since the index itself is already sorted, but all of the examples that I saw still had a direct call to sort() as part of the query. My question is, if I am specifically requesting only one field that I have an index for from the database, do I need to include the sort() method as part of my query, or will the documents be returned in sorted order?

if I am specifically requesting only one field that I have an index for from the database, do I need to include the sort() method as part of my query, or will the documents be returned in sorted order?
In your example case of a single field with an index where it is a covered query, then the order returned would be the order from the index itself.
However, this does not hold for a multikey index on an array field, because multikey indexes cannot cover queries over array fields.
It is recommended to specify sort() regardless because:
The query planner will discard the sort stage automatically if it's able to use an index. See also Query Optimisation and Explain Results for more information.
Explicitly specifying sort() not only protects your code against the unexpected (e.g. inconsistent values) but also makes the code more readable.
You may also be interested in Use Indexes to Sort Query Results

Building simple django firehose

I have an app that I want to build a "recent activity"/firehose feed of 2-3 combined types of activity such as posts, comments, and likes of posts, and something else + maybe more later. I assume this is done with a query of taking the last of the appropriate object added to the DB and combining it with the last of the other type of object and ordering the new combined list of objects by their timestamps. What is the best way to do something like this? For now, I have something like this for every time someone refreshes the page:
NewPost.objects.all().order_by('-postdate')[0:10] #takes the 10 most recently added posts
Comment.objects.all().order_by('-commentdate')[0:10] #takes equal number of comments site wide ordered by timestamp
So what is the best way to take both of these querysets and render the different Models in 1 list ordered by their timestamp? I assume the type of logic will be the same for adding additional types of objects, so I just want to know how to do it with just 2. Thanks!

I don't really like your approach since when you want to put another object on the firehose you'd need to add a third line (AnotherObject.objects.all ... etc ) to all places you need to display that firehose !
For me, the best way to do this is to create a Firehose Model with fields like: date, action (add/delete/update etc) and object (a generic Foreign Key to the object that was changed). Now, whenever you make a change to an object that you want to add to the firehose, you'd add a new instance of the FirehoseClass with the correct field values. Finally, whenever you want to display the firehose you'll just display all firehose objects.

To combine the lists, you can build a single list with chain() from itertools, and then sort it with sorted():
from itertools import chain
combined_list = list(chain(new_post_list, comment_list))
sorted_combined_list = sorted(combined_list, key=lambda instance: instance.postdate)
However, as you can see, the sorting uses only one key, so every object in the list needs a postdate attribute. You could fix this by simply adding a property named postdate to the Comment class that returns commentdate. Or, even better, use the same name for the creation time on all your models, e.g. created_at.
This has been answered earlier and more detailed here: How to combine 2 or more querysets in a Django view?
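Another option is a single key function that knows about both attribute names, so neither model has to change. The sketch below is a minimal illustration with plain stand-in classes instead of Django models; timestamp_of and build_firehose are hypothetical helpers, and only the attribute names postdate and commentdate come from the question.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


# Stand-ins for the two models in the question (plain classes, not Django models).
@dataclass
class NewPost:
    title: str
    postdate: datetime


@dataclass
class Comment:
    text: str
    commentdate: datetime


def timestamp_of(obj):
    """Hypothetical helper: return whichever timestamp attribute the object has."""
    for name in ("postdate", "commentdate"):
        if hasattr(obj, name):
            return getattr(obj, name)
    raise AttributeError(f"{type(obj).__name__} has no known timestamp field")


def build_firehose(*iterables, limit=10):
    """Merge any number of iterables and return the newest `limit` items."""
    return sorted(chain(*iterables), key=timestamp_of, reverse=True)[:limit]


posts = [NewPost("first", datetime(2024, 1, 1)), NewPost("second", datetime(2024, 1, 3))]
comments = [Comment("nice", datetime(2024, 1, 2))]
feed = build_firehose(posts, comments)
```

Because build_firehose accepts any number of iterables, a third activity type only needs its timestamp attribute name added to timestamp_of.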

Efficient way to use filter() twice in Django

I am relatively new to Django and Python, but I have not been able to quite figure this one out.
I essentially want to query the database using filter for a large number of users. Then I want to make a bunch of queries on just this section of users. So I thought it would be most efficient to first run a query with my broader filter parameters, and then run my separate filter queries on that set. In code, it looks like this:
#Get the big groups of users, like all people with brown hair.
group_of_users = Data.objects.filter(......)
#Now get all the people with brown hair and blue eyes, and then all with green eyes, etc.
for each haircolor :
subset_of_group = group_of_users.filter(....)
That is just pseudo-code by the way, I am not that inept. I thought this would be more efficient, but it seems that if I eliminate the first query and simply build the querysets in the for loop, it is much faster (actually timed).
I fear this is because when I filter first, and then filter each time in the for loop, it is actually doing both sets of filter queries on each for loop execution. So really, doing twice the amount of work I want. I thought with caching this would not matter, as the first filter results would be cached and it would still be faster, but again, I timed it with multiple tests and the single filter is faster. Any ideas?
EDIT:
So it seems that querying for a set of data, and then trying to further query only against that set of data, is not possible. Rather, I should query for a set of data and then further parse that data using regular Python.

As garnertb and lanzz said, it doesn't matter where you use the filter function; the only thing that matters is when you evaluate the query (see when querysets are evaluated). My guess is that in your tests you evaluate the queryset somewhere in your code, and that the version with separate filter calls ends up doing more evaluations.
Whenever a queryset is evaluated, its results are cached. However, this cache does not carry over if you call another method, such as filter or order_by, on the queryset. So you can't evaluate the bigger set and then use filtering on the queryset to retrieve the smaller sets without doing another query.
If you only have a small set of haircolours, you can get away with doing a query for each haircolour. However, if you have many of them, the number of queries will have a severe impact on performance. In that case it might be better to do one query for the full set of users you want to use, and then do the subsequent processing in Python:
qs = Data.objects.filter(hair='brown')
objects = dict()
# One query; the grouping happens in Python.
for obj in qs:
    objects.setdefault(obj.haircolour, []).append(obj)
for (k, v) in objects.items():
    print("Objects for colour '%s':" % k)
    for obj in v:
        print("- %s" % obj)

Filtering Django querysets does not perform any database operation, until you actually try to access the result. Filtering only adds conditions to the queryset, which are then used to build the final query when you access the result of the query.
When you assign group_of_users = Data.objects.filter(...), no data is retrieved from the database; you just get a queryset that knows that you want records that satisfy a specific condition (the filtering parameters you supplied to Data.objects.filter), but it does not pre-fetch those actual users. After that, when you assign subset_of_group = group_of_users.filter(....), you don't filter just that previous group of users, but only add more conditions to the queryset; still no data has been retrieved from the database at this point.
Only when you actually try to access the results of the queryset (e.g. by iterating over it, by calling len() or list() on it, or by accessing a single index in it) will the queryset build a (usually) single query that retrieves only the user records that satisfy all the filtering conditions you have accumulated up to that point. It will still need to filter your entire users table to find those matching users; it cannot take advantage of the "previously retrieved" users from the group_of_users = Data.objects.filter(...) queryset, because nothing was actually retrieved at that point.
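The behaviour described above can be imitated with a toy class. This is a sketch only: LazyQuery is a hypothetical stand-in written for this illustration and is not how Django implements QuerySet; it just shows conditions accumulating without any "table scan" until the object is iterated.

```python
class LazyQuery:
    """Toy stand-in for a queryset: stores conditions, fetches only when iterated."""

    def __init__(self, table, conditions=(), log=None):
        self._table = table                # list of dicts playing the role of a DB table
        self._conditions = tuple(conditions)
        # Shared list that records every simulated table scan.
        self.log = log if log is not None else []

    def filter(self, **kwargs):
        # Return a NEW object with the extra conditions; nothing is fetched here.
        extra = tuple(kwargs.items())
        return LazyQuery(self._table, self._conditions + extra, self.log)

    def __iter__(self):
        # Evaluation: one pass over the whole table, applying ALL conditions.
        self.log.append(dict(self._conditions))
        for row in self._table:
            if all(row.get(k) == v for k, v in self._conditions):
                yield row


table = [
    {"name": "ann", "hair": "brown", "eyes": "blue"},
    {"name": "bob", "hair": "brown", "eyes": "green"},
    {"name": "cat", "hair": "black", "eyes": "blue"},
]

group = LazyQuery(table).filter(hair="brown")   # no scan yet
subset = group.filter(eyes="blue")              # still no scan
scans_before = len(group.log)
names = [row["name"] for row in subset]         # the single scan happens here
```

Iterating subset scans the full table once with both conditions combined, which mirrors why filtering in two steps saves no work compared with a single filter call.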

Your approach is exactly right and it is efficient. The Querysets don't touch the database until they are evaluated, so you can add as many filters as you like and the database won't be touched. Django's excellent documentation provides all the information you need to figure out what operations cause the Queryset to be evaluated.

How do I speed up iteration of large datasets in Django

I have a query set of approximately 1500 records from a Django ORM query. I have used the select_related() and only() methods to make sure the query is tight. I have also used connection.queries to make sure there is only this one query. That is, I have made sure no extra queries are getting called on each iteration.
When I run the query cut and paste from connection.queries it runs in 0.02 seconds. However, it takes seven seconds to iterate over those records and do nothing with them (pass).
What can I do to speed this up? What causes this slowness?

A QuerySet can get pretty heavy when it's full of model objects. In similar situations, I've used the .values method on the queryset to specify the properties I need as a list of dictionaries, which can be much faster to iterate over.
Django documentation: values_list

1500 records is far from being a large dataset, and seven seconds is really too much. There is probably some problem in your models. You can easily check this by running the values() query (as Brandon says) and then explicitly creating the 1500 objects by iterating over the dictionaries. Just convert the ValuesQuerySet into a list before constructing the objects, to factor out the db connection.

How are you iterating over each item?
items = SomeModel.objects.all()
With a regular for loop over the queryset:
for item in items:
    print(item)
Or using the QuerySet iterator:
for item in items.iterator():
    print(item)
According to the docs, iterator() can improve performance and reduce memory use on large result sets, because it does not cache the results on the queryset. The same idea applied to very large dictionaries in Python 2, where iteritems() was preferred over items(); in Python 3, items() already returns a lazy view.

Does your model's Meta declaration tell it to "order by" a field that is stored off in some other related table? If so, your attempt to iterate might be triggering 1,500 queries as Django runs off and grabs that field for each item, and then sorts them. Showing us your code would help us unravel the problem!
