MongoEngine: Limiting number of responses from DBRef - python

I have a document with around 7k DBRefs in one field to other objects. I want to limit the number of the objects coming back when I query the DBRef field but I cannot find an obvious way of doing it.
project = Project.objects.find({'id': 1})
users = project.users[:10]
On line 2 MongoEngine performs a query to retrieve ALL the users not just the first 10. What can I do to limit the query to only retrieve the first 10?

users = project.users[:10],
This slice is a client-side operation: it is applied to the users array only after MongoDB has already returned all 7k DBRef values.
What can I do to limit the query to only retrieve the first 10?
You need to include a projection operation to just select the first 10 elements in the users array.
Project.objects.find({"id": 1},{"users":{"$slice":10}})
The syntax in MongoEngine:
Project.objects(id=1).fields(slice__users=[0, 10])
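To make the shape of that projection concrete, here is a minimal sketch that only builds the `$slice` projection document. The helper name `slice_projection` is hypothetical (it is not part of pymongo or MongoEngine) and no database is contacted.

```python
def slice_projection(field, limit, skip=None):
    """Build a MongoDB projection that returns only part of an array field.

    Hypothetical helper for illustration only.
    {"$slice": n} keeps the first n elements of the array;
    {"$slice": [skip, n]} skips `skip` elements and then keeps n.
    """
    if skip is None:
        return {field: {"$slice": limit}}
    return {field: {"$slice": [skip, limit]}}


first_ten = slice_projection("users", 10)
second_ten = slice_projection("users", 10, skip=10)
```

With pymongo, a dict like `first_ten` would be passed as the projection argument of `find` or `find_one`, so the server trims the array before sending it.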

If I understand you correctly, there is no way to return a portion of one field. You can pick and choose what fields you are returning, but there is no way to specify a portion of one field.

Related

distance based ordering in django

I am using Django and get each user's location at registration time.
I then show these users on the front page of the app, sorted by distance, i.e. the ones closest to the logged-in user are at the top, and so on.
At the moment I order them by distance on the backend, using the annotate (etc.) functions provided by the Django ORM.
sortedQueryset = self.get_queryset().annotate(
    distance=Distance('coords', user.coords, spheroid=True)
).order_by('distance')
Here 'coords' is the database column that stores the point (location), and user.coords is the point (coordinates) of the logged-in user.
To get only the first 100 users (say) from the database, I can do something like this:
sortedQueryset = self.get_queryset().annotate(
    distance=Distance('coords', user.coords, spheroid=True)
).order_by('distance')[:100]
But I think it still grabs all the rows, orders them by distance and only then takes 100 of them. If we have a million users in the database, it always has to fetch all of them, sort them, and then keep only 100.
That seems like a lot of unnecessary work (maybe I am wrong, or maybe this is the only way, since the ordering depends on the logged-in user and on who is closest and who is farthest).
Any suggestions are appreciated. Thanks!
What you have done is already correct. The slice is not applied in Python; it becomes a limit in the database query itself. So Django does not fetch all the results and then slice them; instead, it runs a LIMIT query against the database. See the documentation.
https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
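As a rough illustration of what "ORDER BY plus LIMIT inside the database" means, here is a sketch using the standard-library sqlite3 module instead of Django. The table layout, the logged-in user's position and the planar squared distance are all assumptions made for the example; GeoDjango's Distance would compute a real geographic distance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO users (x, y) VALUES (?, ?)",
    [(float(i), float(i)) for i in range(1000)],
)

# Hypothetical position of the logged-in user.
me_x, me_y = 500.0, 500.0

# The ordering and the LIMIT are evaluated by the database engine,
# so only 100 rows are handed back to Python.
rows = conn.execute(
    """
    SELECT id, (x - ?) * (x - ?) + (y - ?) * (y - ?) AS dist2
    FROM users
    ORDER BY dist2
    LIMIT 100
    """,
    (me_x, me_x, me_y, me_y),
).fetchall()
```

The database may still have to compute and sort the distances for every row, but the application only ever receives the 100 rows it asked for.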

Django: Store Q query objects for repeatable search?

In my Django based web app users can perform a search; the query consists of several dynamically constructed complex Q objects.
Depending on the user search parameters, the search will query a variable number of columns and also can stretch over multiple models.
The user should be able to save her search to repeat it at some later point.
For that I'd like to store the Q objects (I guess) in a database table.
Is this good practice? How would you approach this?
Thanks in advance.
If you have just one or a fixed number of Q objects as part of the filter, you can save the argument passed to Q as a dict.
For example, this:
Q(buy_book__entity__type=ENTITY.INTERNAL)
Is equivalent to this:
q_filter = {"buy_book__entity__type": ENTITY.INTERNAL}
Q(**q_filter)
You can save q_filter in your datastore.
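One possible way to store such a dict is as JSON text. The sketch below assumes the filter values are JSON-serializable, so the string "internal" is a placeholder standing in for ENTITY.INTERNAL; Django itself is not imported here.

```python
import json

# Hypothetical filter: the lookup name mirrors the answer above, the value
# is a plain string so that it can be serialized.
q_filter = {"buy_book__entity__type": "internal"}

stored = json.dumps(q_filter)   # text that fits in a database text column
restored = json.loads(stored)   # later, rebuild the filter with Q(**restored)
```

Values that are not plain strings, numbers, booleans, lists or dicts (enum members, dates, model instances) would need to be converted before saving and back again after loading.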

Firestore query takes a too long time to get the value of only one field

Hi, community.
I have a question about Firestore queries in Firebase.
I have a collection of around 18000 documents. I would like to get the value of the same single field from some of these documents. I use the Python firestore_v1 library from the google-cloud-python client. So, for example, with 250 items in list_edges:
[db_firestore.document(f"edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
this takes 30+ seconds to evaluate, whereas with the same collection in MongoDB the following takes no more than 3 seconds while loading the whole object, not just one field:
list(db_mongo["edges"].find({"city_id": {"$eq": city_id}, "id": {"$in": [edge["id"] for edge in list_edges]}}))
...having said that, I thought the solution could be to split the large collection by city_id, so I created a new collection and copied the corresponding documents into it. Now the query looks like:
[db_firestore.document(f"edges/7/edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
where 7 is a city_id.
However, it takes the same time. So, maybe the issue is around the .get() method, but I could not find any optimized solution for my case.
Could you help me with this? Thanks!
EDITED
I've got the answer from firestore support. The problem is that I make 250 requests doing .get() for each document separately. The idea is to get all the data I want in only one request, so I need to modify the query.
Let's assume I have the next DB:
An edges collection with multiple edge_id documents. For each new request, I use a newly generated list of the edges I need to fetch.
In MongoDB, I can do it with the $in operator (having edge_id inside the document), but in Firestore, the 'in' operator only accepts up to 10 values.
So, I need to find another way to do this.
Any ideas? Thanks!
Firebase recently added support for a limited in operation. See:
The blog post announcing the feature.
The documentation on in and array-contains-any queries.
From the latter:
cities_ref = db.collection(u'cities')
query = cities_ref.where(u'country', u'in', [u'USA', u'Japan'])
A few caveats though:
You can have at most 10 values in the in clause, and you can have only one in (or array-contains-any) clause in a query.
I am not sure if you can use this operator to select by ID.
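Given the limit of 10 values per in clause, one possible workaround is to split the list of IDs into batches of 10 and issue one query per batch. The sketch below shows only the batching step; `chunked` is a hypothetical helper, the IDs are placeholders, and it assumes the ID is stored in a field that the in operator can actually query.

```python
def chunked(values, size=10):
    """Yield consecutive slices of `values`, each with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


edge_ids = list(range(250))         # placeholder IDs
batches = list(chunked(edge_ids))   # each batch would feed one `in` query
```

For 250 edges this gives 25 batches, so 25 requests instead of 250 separate .get() calls.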

Django ORM: Filter results by values from list, limit answers per value?

I'm using Django 2.0 and have a Content model with a ForeignKey(User, ...). I also have a list of user IDs for which I'd like to fetch that Content, ordered by "newest first", but only up to 25 elements per user. I know I can do this:
Content.objects.filter(user_id__in=[1, 2, 3, ...]).order_by('-id')
...to fetch all the Content objects created by each of these users, plus I'll get it all sorted with newest elements first. But I'd like to fetch up to 25 elements for each of these users (some users might create hundreds of these objects, some might create zero). There's of course the dumb way:
for user in [1, 2, 3, ...]:
    Content.objects.filter(user_id=user).order_by('-id')[:25]
This, however, hits the database once for every ID in the user ID list, and that number gets quite high (around 100 or so per page view). Is there any way to optimize this case? (I've tried looking at select_related, but that seems to fetch as many related models as possible.)
There are plenty of ways to form a greatest-n-per-group query, but in this case you could form a union of top-n queries of all users:
contents = Content.objects.none().union(
    *[
        Content.objects.filter(user_id=uid).order_by('-id')[:25]
        for uid in user_ids
    ],
    all=True,
)
Using prefetch_related() you could then produce a queryset that fetches the users and injects an attribute of latest content:
users = User.objects.filter(id__in=user_ids).prefetch_related(
    models.Prefetch(
        'content_set',
        queryset=contents,
        to_attr='latest_content',
    )
)
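To show the shape of the "union of top-n queries" idea without Django, here is a sketch in plain SQL run through the standard-library sqlite3 module. The table, the sample rows and the per-user limit of 2 are assumptions for the example; this is not necessarily the SQL that Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, user_id INTEGER)")
# User 1 gets five rows, user 2 gets one row, user 3 gets none.
conn.executemany(
    "INSERT INTO content (user_id) VALUES (?)",
    [(1,), (1,), (1,), (1,), (1,), (2,)],
)

user_ids = [1, 2, 3]
per_user = 2  # the question uses 25; 2 keeps the example small

# One "newest first, limited" subquery per user, glued with UNION ALL.
arm = (
    "SELECT * FROM ("
    "SELECT id, user_id FROM content "
    "WHERE user_id = ? ORDER BY id DESC LIMIT ?)"
)
sql = " UNION ALL ".join([arm] * len(user_ids))
params = [p for uid in user_ids for p in (uid, per_user)]

# A single round trip returns at most `per_user` rows for each user.
rows = conn.execute(sql, params).fetchall()
```

Users with fewer rows than the limit simply contribute fewer rows, and users with no content contribute none.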
Does it actually hit the database that many times? I have not looked at the raw SQL but according to the documentation it is equivalent to the LIMIT clause and it also states "Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query".
https://docs.djangoproject.com/en/2.0/topics/db/queries/#limiting-querysets
Since I use this pattern myself, I would be curious to see the raw SQL if you have looked at it and it does NOT do this.

Getting a field from document via mongoengine

I have a places collection from which I am trying to extract place names to suggest to the user, but it takes a long time, and I would like to know if there is any way to optimize it. I use the MongoEngine ORM and the database is MongoDB.
query:
results = Place.objects(name__istartswith=query).only('name')
The query itself takes very little time, a matter of microseconds,
but when I then try to access the names from results:
names = [result.name for result in results]
this line takes a very long time, between 3 and 5 seconds, for a list of around 2500 items.
I have tried using scalar, but then the time increases when I do a union with another list.
Is there a better way to access the list of names?
A queryset isn't executed until it's iterated, so results = Place.objects(name=query).only('name') returns a queryset that hasn't been run yet. When you iterate over it, the query takes place and the data is sent over the wire.
Is the query slow when running via pymongo? As you don't need them as MongoEngine objects try using as_pymongo - which returns raw dictionaries back.
Other hints are to make sure the query is performant - using an index - see the profiler docs.
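The lazy behaviour described above can be mimicked with a toy class. This is only an illustration of why the first line looks fast and the list comprehension looks slow; `LazyQuery` and `fetch_names` are hypothetical and are not MongoEngine's implementation.

```python
class LazyQuery:
    """Toy stand-in for a lazy queryset; the work is deferred to iteration."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that performs the real work
        self.executed = False

    def __iter__(self):
        # The expensive call happens here, when iteration starts.
        self.executed = True
        return iter(self._fetch())


def fetch_names():
    # Placeholder for a database round trip.
    return ["Paris", "Prague", "Porto"]


results = LazyQuery(fetch_names)     # cheap: nothing has been fetched yet
ran_before = results.executed
names = [name for name in results]   # the fetch happens on this line
```

Timing only the line that builds the queryset therefore measures almost nothing; the real cost shows up on the line that iterates.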
