Django: Retrieving IDs of manyToMany fields quickly

I have the following model schema in Django (with Postgres).
class A(models.Model):
    related = models.ManyToManyField("self", null=True)
Given a QuerySet of A, I would like to return a dictionary mapping each instance of A in the QuerySet to a list of ids of its related instances as quickly as possible.
I can surely iterate through each A and query the related field, but is there a more optimal way?

Suppose you have three instances. You can use values() or values_list() across the relation to retrieve only the IDs of the related instances, without loading the instances themselves.
I filter on the pk field here because I don't know your schema, but you can start from anything, as long as it is a QuerySet.
>>> result = A.objects.filter(pk=1)
>>> result.values('related__id')
[{'related__id': 2}, {'related__id': 3}]
>>> result.values_list('related__id')
[(2,), (3,)]
>>> result.values_list('related__id', flat=True)
[2, 3]
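To build the dictionary the question asks for, one option is to request (pk, related__id) pairs in a single query and group them in Python. The following is a minimal sketch, not a tested Django recipe: the hard-coded rows stand in for the result of qs.values_list('pk', 'related__id'), the sample values are made up, and it assumes that an instance without related rows shows up once with None as its related id.

```python
from collections import defaultdict

# Stand-in for: rows = qs.values_list('pk', 'related__id')
# Each row is (pk of an A, pk of one related A).
rows = [(1, 2), (1, 3), (2, 1), (4, None)]


def group_related_ids(rows):
    """Group (pk, related_pk) pairs into {pk: [related_pk, ...]}."""
    mapping = defaultdict(list)
    for pk, related_pk in rows:
        related = mapping[pk]        # make sure every pk gets a key
        if related_pk is not None:   # assumed marker for "nothing related"
            related.append(related_pk)
    return dict(mapping)


mapping = group_related_ids(rows)
```

The grouping itself is one pass over the rows, so the cost is dominated by the single database query.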

You can get pretty close like this:
from django.db.models import Prefetch

qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.only('pk'),
    to_attr='related_insts')).in_bulk(my_list_of_pks)
This gives a mapping from the pk of each object to the instance itself, so you can iterate through it as follows:
for pk, inst in qs.items():
    related_ids = (related.pk for related in inst.related_insts)
Or, given an instance, you can do a fast lookup like so:
related_ids = (related.pk for related in qs[instance.pk].related_insts)
This method maps the instance ids to the related ids (indirectly) since you specifically requested a dictionary. If you aren't doing lookups, you may want the following instead:
qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.only('pk'),
    to_attr='related_insts')).filter(pk__in=my_list_of_pks)
for inst in qs:
    related_ids = (related.pk for related in inst.related_insts)
You may take note of the use of only to only pull the pks from the db. There is an open ticket to allow the use of values and (I presume) values_list in Prefetch queries. This would allow you to do the following.
qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.values_list('pk', flat=True),
    to_attr='related_ids')).filter(pk__in=my_list_of_pks)
for inst in qs:
    related_ids = inst.related_ids
You could of course optimize further, for example by using qs.only('related_insts') on the primary queryset, but make sure you aren't doing anything with these instances-- they're essentially just expensive containers to hold your related_ids.
I believe this is the best that's available for now (without custom queries). To get to exactly what you want, two things are needed:
The feature above is implemented
values_list is made to work with Prefetch to_attr like it does for annotations.
With these two things in place (and continuing the above example) you could do the following to get exactly what you requested:
d = qs.values_list('related_ids', flat=True).in_bulk()
for pk, related_pks in d.items():
    print('Containing Objects %s' % pk)
    print('Related objects %s' % related_pks)
# And lookups
print('Object %d has related objects %s' % (20, d[20]))
I've left off some details explaining things, but it should be pretty clear from the documentation. If you need any clarification, don't hesitate!

If you're using Postgres:
from django.contrib.postgres.aggregates import ArrayAgg
qs = A.objects.filter(pk__in=[1,2,6]).annotate(related_ids=ArrayAgg('related')).only('id')
mapping = {a.id: a.related_ids for a in qs}
You can also use filter/ordering in the ArrayAgg.

Related

Django: Filter a QuerySet and select results foreign key

In Django, I have two models:
class A(models.Model):
    # lots of fields
class B(models.Model):
    a = models.ForeignKey(A)
    member = models.BooleanField()
I need to construct a query that filters B and selects all A, something like this:
result = B.objects.filter(member=True).a
Above example code will of course return an error QuerySet has no attribute 'a'
Expected result:
a QuerySet containing only A objects
What's the best and fastest way to achieve the desired functionality?

I assume you are looking for something like
result = A.objects.filter(b__member=True)

An alternative to Andrey Zarubin's answer would be to iterate over the queryset you had and create a list of a objects.
b_objects = B.objects.filter(member=True)
a_objects = [result.a for result in b_objects]

The code below still returns B objects, but it narrows them further by a field on the related A, which might be what you are looking for:
B.objects.filter(member=True).filter(a__somefield='some value')

Django one to one relation queryset

I have following two models
class A(models.Model):
    name = models.CharField()
    age = models.SmallIntegerField()
class B(models.Model):
    a = models.OneToOneField(A)
    salary = models.IntegerField()
Now I have records in both of them. I want to query model A with a known id and get both the A and B records.
The SQL query is:
SELECT A.id, A.name, A.age, B.salary
FROM A INNER JOIN B ON A.id = B.a_id
WHERE A.id=1
Please provide me django query (by using orm). I want to achieve this with one queryset.

q = B.objects.filter(a__id=id).values('salary', 'a__id', 'a__name', 'a__age')
This will return a ValuesQuerySet.
values
values(*fields) Returns a ValuesQuerySet — a QuerySet subclass that
returns dictionaries when used as an iterable, rather than
model-instance objects.
Each of those dictionaries represents an object, with the keys
corresponding to the attribute names of model objects.
You can actually print q.query to get the sql query behind the QuerySet, which in this case is exactly as you requested.

Please try this:
result = B.objects.filter(a__id=1).values('a__id', 'a__name', 'a__age', 'salary')
The result is a <class 'django.db.models.query.ValuesQuerySet'>, which is essentially a list of dictionaries with key as the field name and value as the actual value. If you want only the values, do this:
result = B.objects.filter(a__id=1).values_list('a__id', 'a__name', 'a__age', 'salary')
The result is a <class 'django.db.models.query.ValuesListQuerySet'>, and it's essentially a list of tuples.

How can I filter by key, or keys, a query in Python for Google App Engine?

I have a query and I can apply filters on them without any problem. This works fine:
query.filter('foo =', 'bar')
But what if I want to filter my query by key or a list of keys?
I have them as Key() property or as a string and by trying something like this, it didn't work:
query.filter('key =', 'some_key') #no success
query.filter('key IN', ['key1', 'key2']) #no success

Whilst it's possible to filter on key (see @dplouffe's answer), it's not a good idea. 'IN' clauses execute one query for each item in the clause, so you end up doing as many queries as there are keys, which is a particularly inefficient way to achieve your goal.
Instead, use a batch fetch operation, as @Luke documents, then filter any elements you don't want out of the list in your code.

You can filter queries by doing a GQL Query like this:
result = db.GqlQuery('select * from Model where __key__ IN :1', [db.Key.from_path('Model', 'Key1'), db.Key.from_path('Model', 'Key2')]).fetch(2)
or
result = Model.get([db.Key.from_path('Model', 'Key1'), db.Key.from_path('Model', 'Key2')])

You cannot filter on a Key. Oops, I was wrong about that. You can filter on a key and other properties at the same time if you have an index set up to handle it. It would look like this:
key = db.Key.from_path('MyModel', 'keyname')
MyModel.all().filter("__key__ =", key).filter('foo = ', 'bar')
You can also look up a number of models by their keys, key IDs, or key names with the get family of methods.
# if you have the key already, or can construct it from its path
models = MyModel.get([Key.from_path(...), ...])
# if you have keys with names
models = MyModel.get_by_key_name(['asdf', 'xyz', ...])
# if you have keys with IDs
models = MyModel.get_by_id([123, 456, ...])
You can fetch many entities this way. I don't know the exact limit. If any of the keys doesn't exist, you'll get a None in the list for that entity.
If you need to filter on some property as well as the key, you'll have to do that in two steps. Either fetch by the keys and check for the property, or query on the property and validate the keys.
Here's an example of filtering after fetching. Note that you don't use the Query class's filter method. Instead just filter the list.
models = MyModel.get_by_key_name(['asdf', ...])
# skip the None placeholders returned for keys that don't exist
filtered = itertools.ifilter(lambda x: x is not None and x.foo == 'bar', models)
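The fetch-then-filter step can be tried without the datastore. This is only a sketch: Entity is a hypothetical stand-in for a model instance, and the fetched list imitates a batch fetch in which one key did not exist and came back as None.

```python
from collections import namedtuple

# Hypothetical stand-in for a datastore entity.
Entity = namedtuple("Entity", ["key_name", "foo"])

# Stand-in for a batch fetch result; the missing key is represented by None.
fetched = [Entity("asdf", "bar"), None, Entity("xyz", "baz")]


def keep_matching(entities, **wanted):
    """Drop None placeholders, then keep entities whose attributes match."""
    return [
        entity for entity in entities
        if entity is not None
        and all(getattr(entity, name) == value for name, value in wanted.items())
    ]


filtered = keep_matching(fetched, foo="bar")
```

Checking for None first matters, because reading an attribute on a missing entity would raise an AttributeError.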

Have a look at: https://developers.google.com/appengine/docs/python/ndb/entities?hl=de#multiple
list_of_entities = ndb.get_multi(list_of_keys)

How to combine multiple querysets in Django?

I'm trying to build the search for a Django site I am building, and in that search, I am searching in three different models. And to get pagination on the search result list, I would like to use a generic object_list view to display the results. But to do that, I have to merge three querysets into one.
How can I do that? I've tried this:
result_list = []
page_list = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term))
article_list = Article.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term) |
    Q(tags__icontains=cleaned_search_term))
post_list = Post.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term) |
    Q(tags__icontains=cleaned_search_term))
for x in page_list:
    result_list.append(x)
for x in article_list:
    result_list.append(x)
for x in post_list:
    result_list.append(x)
return object_list(
    request,
    queryset=result_list,
    template_object_name='result',
    paginate_by=10,
    extra_context={
        'search_term': search_term},
    template_name="search/result_list.html")
But this doesn't work. I get an error when I try to use that list in the generic view. The list is missing the clone attribute.
How can I merge the three lists, page_list, article_list and post_list?

Concatenating the querysets into a list is the simplest approach. If the database will be hit for all querysets anyway (e.g. because the result needs to be sorted), this won't add further cost.
from itertools import chain
result_list = list(chain(page_list, article_list, post_list))
Using itertools.chain is faster than looping each list and appending elements one by one, since itertools is implemented in C. It also consumes less memory than converting each queryset into a list before concatenating.
Now it's possible to sort the resulting list e.g. by date (as requested in hasen j's comment to another answer). The sorted() function conveniently accepts a generator and returns a list:
result_list = sorted(
    chain(page_list, article_list, post_list),
    key=lambda instance: instance.date_created)
If you're using Python 2.4 or later, you can use attrgetter instead of a lambda. I remember reading about it being faster, but I didn't see a noticeable speed difference for a million item list.
from operator import attrgetter
result_list = sorted(
    chain(page_list, article_list, post_list),
    key=attrgetter('date_created'))

Try this:
matches = pages | articles | posts
It retains all the functions of the querysets which is nice if you want to order_by or similar.
Please note: this doesn't work on querysets from two different models.

Related, for mixing querysets from the same model, or for similar fields from a few models, starting with Django 1.11 a QuerySet.union() method is also available:
union()
union(*other_qs, all=False)
New in Django 1.11. Uses SQL’s UNION operator to combine the results of two or more QuerySets. For example:
>>> qs1.union(qs2, qs3)
The UNION operator selects only distinct values by default. To allow duplicate values, use the all=True
argument.
union(), intersection(), and difference() return model instances of
the type of the first QuerySet even if the arguments are QuerySets of
other models. Passing different models works as long as the SELECT
list is the same in all QuerySets (at least the types, the names don’t
matter as long as the types in the same order).
In addition, only LIMIT, OFFSET, and ORDER BY (i.e. slicing and
order_by()) are allowed on the resulting QuerySet. Further, databases
place restrictions on what operations are allowed in the combined
queries. For example, most databases don’t allow LIMIT or OFFSET in
the combined queries.
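The difference between the default behaviour and all=True mirrors SQL's UNION versus UNION ALL. The sketch below shows that difference directly with the standard-library sqlite3 module; the page and article tables and their contents are invented for the illustration, and no Django code is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (title TEXT);
    CREATE TABLE article (title TEXT);
    INSERT INTO page VALUES ('intro'), ('faq');
    INSERT INTO article VALUES ('faq'), ('news');
""")

# UNION drops duplicate rows (the default of union()).
distinct_titles = sorted(
    row[0] for row in conn.execute(
        "SELECT title FROM page UNION SELECT title FROM article"))

# UNION ALL keeps duplicates (what all=True asks for).
all_titles = sorted(
    row[0] for row in conn.execute(
        "SELECT title FROM page UNION ALL SELECT title FROM article"))
```

Both SELECT lists have the same column types, which is the condition the quoted documentation gives for combining different models.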

You can use the QuerySetChain class below. When using it with Django's paginator, it should only hit the database with COUNT(*) queries for all querysets and SELECT() queries only for those querysets whose records are displayed on the current page.
Note that you need to specify template_name= if using a QuerySetChain with generic views, even if the chained querysets all use the same model.
from itertools import islice, chain
class QuerySetChain(object):
    """
    Chains multiple subquerysets (possibly of different models) and behaves as
    one queryset.  Supports minimal methods needed for use with
    django.core.paginator.
    """
    def __init__(self, *subquerysets):
        self.querysets = subquerysets
    def count(self):
        """
        Performs a .count() for all subquerysets and returns the number of
        records as an integer.
        """
        return sum(qs.count() for qs in self.querysets)
    def _clone(self):
        "Returns a clone of this queryset chain"
        return self.__class__(*self.querysets)
    def _all(self):
        "Iterates records in all subquerysets"
        return chain(*self.querysets)
    def __getitem__(self, ndx):
        """
        Retrieves an item or slice from the chained set of results from all
        subquerysets.
        """
        if type(ndx) is slice:
            return list(islice(self._all(), ndx.start, ndx.stop, ndx.step or 1))
        else:
            # next() works on Python 2.6+ and Python 3
            return next(islice(self._all(), ndx, ndx+1))
In your example, the usage would be:
pages = Page.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term))
articles = Article.objects.filter(Q(title__icontains=cleaned_search_term) |
                                  Q(body__icontains=cleaned_search_term) |
                                  Q(tags__icontains=cleaned_search_term))
posts = Post.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term) |
                            Q(tags__icontains=cleaned_search_term))
matches = QuerySetChain(pages, articles, posts)
Then use matches with the paginator like you used result_list in your example.
The itertools module was introduced in Python 2.3, so it should be available in all Python versions Django runs on.
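The slicing logic can be tried without a database. The following sketch uses plain lists as stand-ins for querysets, so count() is computed with len() instead of a COUNT(*) query; ChainedSequence is a name made up for this illustration, and negative indexes are not supported.

```python
from itertools import chain, islice


class ChainedSequence:
    """Chain several sized iterables and slice across their boundaries."""

    def __init__(self, *parts):
        self.parts = parts

    def count(self):
        # With real querysets this would be sum(qs.count() for qs in ...).
        return sum(len(part) for part in self.parts)

    def __len__(self):
        return self.count()

    def __getitem__(self, index):
        items = chain(*self.parts)
        if isinstance(index, slice):
            return list(islice(items, index.start, index.stop, index.step or 1))
        try:
            return next(islice(items, index, index + 1))
        except StopIteration:
            raise IndexError(index) from None


matches = ChainedSequence(["p1", "p2"], ["a1"], ["b1", "b2"])
page_two = matches[2:4]  # a page that spans the second and third part
```

A paginator only needs the total count and a slice per page, which is why these few methods are enough.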

In case you want to chain a lot of querysets, try this:
from itertools import chain
result = list(chain(*docs))
where: docs is a list of querysets

The big downside of your current approach is its inefficiency with large search result sets, as you have to pull down the entire result set from the database each time, even though you only intend to display one page of results.
In order to only pull down the objects you actually need from the database, you have to use pagination on a QuerySet, not a list. If you do this, Django actually slices the QuerySet before the query is executed, so the SQL query will use OFFSET and LIMIT to only get the records you will actually display. But you can't do this unless you can cram your search into a single query somehow.
Given that all three of your models have title and body fields, why not use model inheritance? Just have all three models inherit from a common ancestor that has title and body, and perform the search as a single query on the ancestor model.

This can be achieved in two ways.
1st way to do this
Use the | operator on querysets to take the union of two querysets. This works only if both querysets belong to the same model.
For instance:
pagelist1 = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term))
pagelist2 = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term))
combined_list = pagelist1 | pagelist2 # this takes the union of the two querysets
2nd way to do this
Another way to combine two querysets is to use the chain function from itertools.
from itertools import chain
combined_results = list(chain(pagelist1, pagelist2))

You can use Union:
qs = qs1.union(qs2, qs3)
But if you want to apply order_by on the foreign models of the combined queryset, you need to select them beforehand with select_related, otherwise it won't work.
Example
qs = qs1.union(qs2.select_related("foreignModel"), qs3.select_related("foreignModel"))
qs.order_by("foreignModel__prop1")
where prop1 is a property in the foreign model.

DATE_FIELD_MAPPING = {
    Model1: 'date',
    Model2: 'pubdate',
}
def my_key_func(obj):
    return getattr(obj, DATE_FIELD_MAPPING[type(obj)])
And then sorted(chain(Model1.objects.all(), Model2.objects.all()), key=my_key_func)
Quoted from https://groups.google.com/forum/#!topic/django-users/6wUNuJa4jVw. See Alex Gaynor
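Here is a runnable sketch of that per-type key function. The two dataclasses are hypothetical stand-ins for Model1 and Model2, and the lists stand in for Model1.objects.all() and Model2.objects.all(); the dates are invented.

```python
import datetime
from dataclasses import dataclass
from itertools import chain


@dataclass
class Model1:  # stand-in for a model with a `date` field
    date: datetime.date


@dataclass
class Model2:  # stand-in for a model with a `pubdate` field
    pubdate: datetime.date


DATE_FIELD_MAPPING = {Model1: "date", Model2: "pubdate"}


def my_key_func(obj):
    # Look up which attribute holds the date for this object's type.
    return getattr(obj, DATE_FIELD_MAPPING[type(obj)])


first = [Model1(datetime.date(2020, 1, 5)), Model1(datetime.date(2020, 3, 1))]
second = [Model2(datetime.date(2020, 2, 1))]
merged = sorted(chain(first, second), key=my_key_func)
```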

Requirements:
Django==2.0.2, django-querysetsequence==0.8
In case you want to combine querysets and still come out with a QuerySet, you might want to check out django-queryset-sequence.
One note about it: it only takes two querysets as its arguments. But with Python's reduce you can always apply it to multiple querysets.
from functools import reduce
from queryset_sequence import QuerySetSequence
combined_queryset = reduce(QuerySetSequence, list_of_queryset)
And that's it. Below is a situation I ran into and how I employed list comprehension, reduce and django-queryset-sequence
from functools import reduce
from django.contrib.auth.models import User
from django.db import models
from django.shortcuts import render
from queryset_sequence import QuerySetSequence
class People(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    mentor = models.ForeignKey('self', null=True, on_delete=models.SET_NULL, related_name='my_mentees')
class Book(models.Model):
    name = models.CharField(max_length=20)
    # owner points at People so that each mentee has a book_set
    owner = models.ForeignKey(People, on_delete=models.CASCADE)
# as a mentor, I want to see all the books owned by all my mentees in one view.
def mentee_books(request):
    template = "my_mentee_books.html"
    mentor = People.objects.get(user=request.user)
    my_mentees = mentor.my_mentees.all() # returns QuerySet of all my mentees
    mentee_books = reduce(QuerySetSequence, [each.book_set.all() for each in my_mentees])
    return render(request, template, {'mentee_books' : mentee_books})

Here's an idea... just pull down one full page of results from each of the three and then throw out the 20 least useful ones... this eliminates the large querysets and that way you only sacrifice a little performance instead of a lot.
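A rough sketch of that idea, assuming each source can be ordered by a comparable relevance score. The lists stand in for three querysets already ordered and sliced to one page each; the scores, labels and the first_page helper are invented for the illustration. Note that this only yields the first page: page n would need the first n pages from every source.

```python
import heapq
from itertools import chain

PAGE_SIZE = 3

# Each list stands in for one ordered queryset sliced to PAGE_SIZE rows.
# Rows are (score, label) pairs.
pages = [(9, "p1"), (4, "p2"), (1, "p3")]
articles = [(8, "a1"), (7, "a2"), (2, "a3")]
posts = [(6, "b1"), (5, "b2"), (3, "b3")]


def first_page(*result_pages, page_size=PAGE_SIZE):
    """Keep the best `page_size` rows out of one page from each source."""
    return heapq.nlargest(page_size, chain(*result_pages), key=lambda row: row[0])


top = first_page(pages, articles, posts)
```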

The best option is to use the Django built-in methods:
# Union method
result_list = page_list.union(article_list, post_list)
That will return the union of all the objects in those querysets.
If you want to get just the objects that are in the three querysets, you will love the built-in method of querysets, intersection.
# intersection method
result_list = page_list.intersection(article_list, post_list)

This will do the work without using any other libraries:
result_list = page_list | article_list | post_list

You can use "|"(bitwise or) to combine the querysets of the same model as shown below:
# "store/views.py"
from .models import Food
from django.http import HttpResponse
def test(request):
    # ↓ Bitwise or
    result = Food.objects.filter(name='Apple') | Food.objects.filter(name='Orange')
    print(result)
    return HttpResponse("Test")
Output on console:
<QuerySet [<Food: Apple>, <Food: Orange>]>
[22/Jan/2023 12:51:44] "GET /store/test/ HTTP/1.1" 200 9
And, you can use |= to add the queryset of the same model as shown below:
# "store/views.py"
from .models import Food
from django.http import HttpResponse
def test(request):
    result = Food.objects.filter(name='Apple')
    # ↓↓ Here
    result |= Food.objects.filter(name='Orange')
    print(result)
    return HttpResponse("Test")
Output on console:
<QuerySet [<Food: Apple>, <Food: Orange>]>
[22/Jan/2023 12:51:44] "GET /store/test/ HTTP/1.1" 200 9
Be careful, if adding the queryset of a different model as shown below:
# "store/views.py"
from .models import Food, Drink
from django.http import HttpResponse
def test(request):
    # "Food" model on the left, "Drink" model on the right
    result = Food.objects.filter(name='Apple') | Drink.objects.filter(name='Milk')
    print(result)
    return HttpResponse("Test")
There is an error below:
AssertionError: Cannot combine queries on two different base models.
[22/Jan/2023 13:40:54] "GET /store/test/ HTTP/1.1" 500 96025
But, if adding the empty queryset of a different model as shown below:
# "store/views.py"
from .models import Food, Drink
from django.http import HttpResponse
def test(request):
    # "Food" model on the left, empty queryset of "Drink" model on the right
    result = Food.objects.filter(name='Apple') | Drink.objects.none()
    print(result)
    return HttpResponse("Test")
There is no error below:
<QuerySet [<Food: Apple>]>
[22/Jan/2023 13:51:09] "GET /store/test/ HTTP/1.1" 200 9
Again be careful, if adding the object by get() as shown below:
# "store/views.py"
from .models import Food
from django.http import HttpResponse
def test(request):
    result = Food.objects.filter(name='Apple')
    # ↓↓ Object
    result |= Food.objects.get(name='Orange')
    print(result)
    return HttpResponse("Test")
There is an error below:
AttributeError: 'Food' object has no attribute '_known_related_objects'
[22/Jan/2023 13:55:57] "GET /store/test/ HTTP/1.1" 500 95748

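This function merges a list of querysets into one queryset by repeatedly combining the first two (it is a loop, not a recursion, and it modifies the list it is given).
def merge_query(ar):
    if len(ar) == 0:
        return None  # nothing to merge
    while len(ar) > 1:
        ar[0] = ar[0] | ar[1]
        ar.pop(1)
    return ar[0]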
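The same fold can be written with functools.reduce, which leaves the input list untouched. In this sketch, sets stand in for querysets of one model, since both support the | operator; merge_all is a hypothetical helper name, and with real querysets the empty value would be something like Model.objects.none().

```python
from functools import reduce
from operator import or_


def merge_all(items, empty):
    """Fold `a | b | c ...`; return `empty` when there is nothing to merge."""
    return reduce(or_, items, empty)


# Sets stand in for same-model querysets here.
parts = [{1, 2}, {2, 3}, {5}]
merged = merge_all(parts, set())
```

Passing an explicit empty value means an empty input returns the same kind of object as a non-empty one.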

Filter by property

Is it possible to filter a Django queryset by model property?
I have a property in my model:
@property
def myproperty(self):
    [..]
and now I want to filter by this property like:
MyModel.objects.filter(myproperty=[..])
Is this somehow possible?

Nope. Django filters operate at the database level, generating SQL. To filter based on Python properties, you have to load the object into Python to evaluate the property--and at that point, you've already done all the work to load it.

I might be misunderstanding your original question, but there is a filter builtin in Python.
filtered = filter(lambda x: x.myproperty, MyModel.objects.all())
But it's better to use a list comprehension:
filtered = [x for x in MyModel.objects.all() if x.myproperty]
or even better, a generator expression:
filtered = (x for x in MyModel.objects.all() if x.myproperty)
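A small sketch of filtering in Python by a computed property. Order is a hypothetical plain class used in place of a model, and the orders list stands in for iterating over MyModel.objects.all(); every object has to be loaded before the property can be evaluated.

```python
class Order:
    """Hypothetical plain-Python stand-in for a model instance."""

    def __init__(self, name, quantity, unit_price):
        self.name = name
        self.quantity = quantity
        self.unit_price = unit_price

    @property
    def total(self):
        # Computed in Python, so the database cannot filter on it.
        return self.quantity * self.unit_price


# Stand-in for iterating over a queryset.
orders = [Order("a", 2, 10), Order("b", 1, 5), Order("c", 3, 10)]

# A property is read like an attribute, without parentheses.
large_orders = [order.name for order in orders if order.total >= 20]
```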

Riffing off @TheGrimmScientist's suggested workaround, you can make these "sql properties" by defining them on the Manager or the QuerySet, and reuse/chain/compose them:
With a Manager:
class CompanyManager(models.Manager):
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))
class Company(models.Model):
    # ...
    objects = CompanyManager()
Company.objects.with_chairs_needed().filter(chairs_needed__lt=4)
With a QuerySet:
class CompanyQuerySet(models.QuerySet):
    def many_employees(self, n=50):
        return self.filter(num_employees__gte=n)
    def needs_fewer_chairs_than(self, n=5):
        return self.with_chairs_needed().filter(chairs_needed__lt=n)
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))
class Company(models.Model):
    # ...
    objects = CompanyQuerySet.as_manager()
Company.objects.needs_fewer_chairs_than(4).many_employees()
See https://docs.djangoproject.com/en/1.9/topics/db/managers/ for more.
Note that I am going off the documentation and have not tested the above.

Looks like using F() with annotations will be my solution to this.
It's not going to filter by @property, since F talks to the database before objects are brought into Python. But I'm still putting it here as an answer, since my reason for wanting to filter by property was really wanting to filter objects by the result of simple arithmetic on two different fields.
So, something along the lines of:
companies = Company.objects\
    .annotate(chairs_needed=F('num_employees') - F('num_chairs'))\
    .filter(chairs_needed__lt=4)
rather than defining the property to be:
@property
def chairs_needed(self):
    return self.num_employees - self.num_chairs
then doing a list comprehension across all objects.
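The annotation approach pushes the arithmetic into SQL. As an illustration of what the database ends up doing, here is a sketch with the standard-library sqlite3 module; the company table and its rows are made up, and the SQL is a hand-written approximation, not the query Django actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (name TEXT, num_employees INTEGER, num_chairs INTEGER)")
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [("Acme", 10, 8), ("Globex", 50, 20)])

# The difference is computed and filtered inside the database,
# so only matching rows are transferred to Python.
rows = conn.execute(
    "SELECT name, num_employees - num_chairs AS chairs_needed "
    "FROM company WHERE num_employees - num_chairs < ?",
    (4,)).fetchall()
```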

I had the same problem, and I developed this simple solution:
objects = [
    my_object
    for my_object in MyModel.objects.all()
    if my_object.myProperty == [...]
]
This is not a performant solution, and it shouldn't be used on tables that contain a large amount of data. It is fine as a simple solution or for a small personal project.

PLEASE someone correct me, but I guess I have found a solution, at least for my own case.
I want to work on all those elements whose properties are exactly equal to ... whatever.
But I have several models, and this routine should work for all models. And it does:
def selectByProperties(modelType, specify):
    # NOTE: field names and values are interpolated directly into the SQL,
    # so this is only safe with trusted input.
    clause = "SELECT * from %s" % modelType._meta.db_table
    if len(specify) > 0:
        clause += " WHERE "
        for field, eqvalue in specify.items():
            clause += "%s = '%s' AND " % (field, eqvalue)
        clause = clause [:-5] # remove last AND
    print(clause)
    return modelType.objects.raw(clause)
With this universal subroutine, I can select all those elements which exactly equal my dictionary of 'specify' (propertyname,propertyvalue) combinations.
The first parameter takes a (models.Model),
the second a dictionary like:
{"property1" : "77" , "property2" : "12"}
And it creates an SQL statement like
SELECT * from appname_modelname WHERE property1 = '77' AND property2 = '12'
and returns a RawQuerySet of those elements.
This is a test function:
from myApp.models import myModel
def testSelectByProperties():
    specify = {"property1" : "77" , "property2" : "12"}
    subset = selectByProperties(myModel, specify)
    nameField = "property0"
    ## checking if that is what I expected:
    for i in subset:
        print(i.__dict__[nameField], end=" ")
        for j in specify.keys():
            print(i.__dict__[j], end=" ")
        print()
And? What do you think?
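One thing worth considering is passing the values as query parameters instead of formatting them into the SQL string, which avoids quoting problems and SQL injection. Below is a sketch of that variant using the standard-library sqlite3 module, whose placeholder is ?; the function name select_by_columns and the item table are invented for the example. Django's raw() also accepts a params argument, with %s as the placeholder.

```python
import sqlite3


def select_by_columns(conn, table, specify):
    """Return rows of `table` whose columns equal the values in `specify`."""
    # Table and column names cannot be bound as parameters, so they must
    # come from trusted code; only the values are passed as parameters.
    sql = "SELECT * FROM {}".format(table)
    if specify:
        sql += " WHERE " + " AND ".join(
            "{} = ?".format(column) for column in specify)
    return conn.execute(sql, list(specify.values())).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, property1 TEXT, property2 TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [("a", "77", "12"), ("b", "77", "99")])

rows = select_by_columns(conn, "item", {"property1": "77", "property2": "12"})
```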

I know it is an old question, but for the sake of those landing here, I think it is useful to read the question below and its answer:
How to customize admin filter in Django 1.4

It may also be possible to use queryset annotations that duplicate the property get/set-logic, as suggested e.g. by @rattray and @thegrimmscientist, in conjunction with the property. This could yield something that works both on the Python level and on the database level.
Not sure about the drawbacks, however: see this SO question for an example.
