Is there a way to retrieve all objectIDs from an Algolia Index?
I know there is [*Index Name*].browse_all(), which according to the docs can retrieve 1000 objects at a time, but it retrieves the entire object rather than just the objectIDs.
I can work with pagination, but I'd rather not, and I don't want to pull entire objects because our indexes are not small.
Browse is the right way to go.
The good thing is that you can pass query parameters while performing a browse_all, and one of them can be attributesToRetrieve: [] so that no attributes are retrieved. You'll therefore only get the objectID.
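In Python, that could look roughly like the sketch below. This assumes the v1 algoliasearch client, where browse_all accepts a dict of query parameters and returns an iterator over hits (newer client versions expose browse_objects instead); the index name is made up.

from algoliasearch import algoliasearch

client = algoliasearch.Client('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY')
index = client.init_index('products')  # hypothetical index name

# With no attributes retrieved, each hit only carries its objectID.
object_ids = [hit['objectID'] for hit in index.browse_all({'attributesToRetrieve': []})]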
Below is an example of the data that I have within my Firestore Application.
I am looking to update the "prices" object using Python; is this a possibility? I am finding it increasingly hard to get all the way down to that data.
I don't want you to code this for me; I'm merely asking if it is possible, and for some quick guidance on how to achieve it.
My API will fetch prices, then find the "ID" of the specific condition, which is stored in the "id" field within the condition array; it will then update the "prices" object of that found id.
Like I said, is this possible? And a few pointers on how to achieve this would be great!
Kind Regards,
Josh
Updating just one field of an array item is not possible.
The most you could do is replace an entire array item. You can do this with arrayRemove() and arrayUnion(). More info in the docs.
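In Python, that read-and-swap flow could look roughly like this sketch. The collection path, field names, and the target_id / fetched_prices placeholders are assumptions based on the screenshot, not a definitive implementation.

from google.cloud import firestore

db = firestore.Client()
doc_ref = db.collection('cards').document('some-card-id')  # assumed path

target_id = 123          # placeholder: the condition id your API matched
fetched_prices = {}      # placeholder: the new prices your API fetched

data = doc_ref.get().to_dict()
conditions = data['condition']  # assumed name of the array field

# Find the matching array item and build an updated copy of it.
old_item = next(c for c in conditions if c['id'] == target_id)
new_item = {**old_item, 'prices': fetched_prices}

# Swap the whole item: remove the old version, add the updated one.
doc_ref.update({'condition': firestore.ArrayRemove([old_item])})
doc_ref.update({'condition': firestore.ArrayUnion([new_item])})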
Hi, community.
I have a question/issue about firestore query from Firebase.
I have a collection of around 18000 documents. I would like to get the value of the same single field from some of these documents. I use the Python firestore_v1 library from the google-cloud-python client. So, for example, with list_edges.length = 250:
[db_firestore.document(f"edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
it takes 30+ seconds to evaluate, whereas with the equivalent collection in MongoDB it takes no more than 3 seconds, even though it loads the whole object and not just one field:
list(db_mongo["edges"].find({"city_id":{"$eq":city_id},"id": {"$in": [edge["id"] for edge in list_edges]}}))
Having said that, I thought the solution could be to separate the large collection by city_id, so I created a new collection and copied the corresponding documents into it; now the query looks like:
[db_firestore.document(f"edges/7/edges/{edge['id']}").get({"distance"}).to_dict()["distance"] for edge in list_edges]
where 7 is a city_id.
However, it takes the same time. So, maybe the issue is around the .get() method, but I could not find any optimized solution for my case.
Could you help me with this? Thanks!
EDITED
I've got the answer from firestore support. The problem is that I make 250 requests doing .get() for each document separately. The idea is to get all the data I want in only one request, so I need to modify the query.
Let's assume I have the following DB:
an edges collection with multiple edge_id documents. For each new request, I use a newly generated list of edges I need to fetch.
In MongoDB, I can do this with the $in operator (having edge_id inside the document), but in Firestore the 'in' operator only accepts up to 10 equality clauses.
So, I need to find out another way to do this.
Any ideas? Thanks!
Firebase recently added support for a limited in operator. See:
The blog post announcing the feature.
The documentation on in and array-contains-any queries.
From the latter:
cities_ref = db.collection(u'cities')
query = cities_ref.where(u'country', u'in', [u'USA', u'Japan'])
A few caveats though:
You can have at most 10 values in the in clause, and you can have only one in (or array-contains-any) clause per query.
I am not sure if you can use this operator to select by ID.
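That said, one way to work around the 10-value limit is to batch the lookups into chunks of 10 and run one query per chunk. A rough sketch, assuming each document stores its edge id in an 'id' field as in the question (the chunking helper is my own):

from google.cloud import firestore

db = firestore.Client()

def chunks(items, size=10):
    # Yield successive slices of at most `size` items.
    for i in range(0, len(items), size):
        yield items[i:i + size]

edge_ids = [edge['id'] for edge in list_edges]
distances = []
for batch in chunks(edge_ids):
    query = db.collection('edges').where('id', 'in', batch)
    distances.extend(doc.to_dict()['distance'] for doc in query.stream())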
I am building a mongo database to store data that will be time stamped. Each document in my database has a time field:
{"time":<datetime-object>}
I have created an index for the time field as so:
self.db.test.create_index([("time", pymongo.ASCENDING)])
And have a query that requests only the time stamp information from the database:
self.db.test.find({'time':{'$gte':start, '$lte':end}}, {"time":1, "_id":0}).sort([("time", 1)])
I have read other questions/documentation that say using an index to get documents should return documents in sorted order since the index itself is already sorted, but all of the examples that I saw still had a direct call to sort() as part of the query. My question is, if I am specifically requesting only one field that I have an index for from the database, do I need to include the sort() method as part of my query, or will the documents be returned in sorted order?
if I am specifically requesting only one field that I have an index for from the database, do I need to include the sort() method as part of my query, or will the documents be returned in sorted order?
In your example, a query on a single indexed field that is a covered query, the order returned would be the order from the index itself.
However, that is not the case for a multikey index on an array field, because multikey indexes cannot cover queries over array fields.
It is recommended to specify sort() regardless because:
The query planner will discard the sort stage automatically if it's able to use an index. See also Query Optimisation and Explain Results for more information.
Explicitly specifying sort() not only protects your code against the unexpected (e.g. inconsistent values) but also makes the code more readable.
You may also be interested in Use Indexes to Sort Query Results.
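For reference, here is a minimal sketch of the covered query with the explicit sort, plus an explain() call to confirm the winning plan uses the index and has no separate SORT stage (the database name is an assumption):

import pymongo
from datetime import datetime, timedelta

coll = pymongo.MongoClient().mydb.test  # assumed database name

coll.create_index([("time", pymongo.ASCENDING)])

end = datetime.utcnow()
start = end - timedelta(hours=1)

cursor = (coll.find({"time": {"$gte": start, "$lte": end}}, {"time": 1, "_id": 0})
              .sort([("time", pymongo.ASCENDING)]))

# The winning plan should show an IXSCAN with no in-memory SORT stage.
print(cursor.explain()["queryPlanner"]["winningPlan"])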
I have an app in which I want to build a "recent activity"/firehose feed of 2-3 combined types of activity, such as posts, comments, and likes of posts, and maybe more later. I assume this is done by taking the most recently added objects of each type from the DB, combining them, and ordering the combined list by timestamp. What is the best way to do something like this? For now, I run something like this every time someone refreshes the page:
NewPost.objects.all().order_by('-postdate')[0:10] #takes the 10 most recently added posts
Comment.objects.all().order_by('-commentdate')[0:10] #takes an equal number of comments site-wide, ordered by timestamp
So what is the best way to take both of these querysets and render the different models in one list ordered by their timestamps? I assume the logic will be the same for adding more types of objects, so I just want to know how to do it with just two. Thanks!
I don't really like your approach, since whenever you want to put another object type on the firehose you'd need to add a third line (AnotherObject.objects.all() ... etc.) to every place where you display that firehose!
For me, the best way to do this is to create a Firehose model with fields like: date, action (add/delete/update, etc.) and object (a generic foreign key to the object that was changed). Now, whenever you make a change to an object that you want to add to the firehose, you'd add a new instance of the Firehose model with the correct field values. Finally, whenever you want to display the firehose, you'll just display all firehose objects.
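A minimal sketch of such a model with Django's contenttypes framework might look like this (model and field names are my own choices):

from django.db import models
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class FirehoseEntry(models.Model):
    ACTIONS = [('add', 'add'), ('update', 'update'), ('delete', 'delete')]

    date = models.DateTimeField(auto_now_add=True)
    action = models.CharField(max_length=10, choices=ACTIONS)

    # Generic foreign key to whatever object changed (post, comment, like, ...).
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')

    class Meta:
        ordering = ['-date']

# When a post/comment/like is created:
#     FirehoseEntry.objects.create(action='add', content_object=new_post)
# Displaying the firehose is then just:
#     FirehoseEntry.objects.all()[:10]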
To combine the lists, you can create a single list using chain() from itertools and then sort it using sorted():
from itertools import chain
combined_list = list(chain(new_post_list, comment_list))
sorted_combined_list = sorted(combined_list, key=lambda instance: instance.postdate)
However, as you can see, the sorting is done using only one key. I don't know of any method to sort by two different attribute names at once. You could fix this by adding a property to the Comment class named postdate that simply returns commentdate. Or, even better, use the same name for the creation time in all your models, e.g. created_at.
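For example, the property approach could look like this sketch (field and variable names follow the answer above; the getattr fallback key is just one alternative):

from itertools import chain
from django.db import models

class Comment(models.Model):
    commentdate = models.DateTimeField(auto_now_add=True)

    @property
    def postdate(self):
        # Alias so comments sort with the same key as posts.
        return self.commentdate

# Alternatively, use a sort key that tolerates either attribute name:
combined = sorted(
    chain(new_post_list, comment_list),
    key=lambda obj: getattr(obj, 'postdate', None) or obj.commentdate,
    reverse=True,  # newest first
)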
This has been answered earlier and in more detail here: How to combine 2 or more querysets in a Django view?
I have a places collection from which I am trying to extract place names to suggest to the user, but it's taking a long time, and I would like to know if there are ways to optimize this. I use the MongoEngine ORM and the database is MongoDB.
query:
results = Place.objects(name__istartswith=query).only('name')
The query itself takes very little time, on the order of microseconds.
But when I try to access the names from results:
names = [result.name for result in results]
this line takes a very long time, varying from 3-5 seconds, for a list of around 2500 names.
I have tried using scalar, but then the time increases when I do a union with another list.
Is there a better way to access the names list?
A queryset isn't executed until it's iterated, so results = Place.objects(name__istartswith=query).only('name') returns a queryset that hasn't been run yet. When you iterate it, the query takes place and the data is sent over the wire.
Is the query slow when run via pymongo? As you don't need them as MongoEngine objects, try using as_pymongo, which returns raw dictionaries instead.
Another hint is to make sure the query is performant and uses an index - see the profiler docs.
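Putting that together, a sketch might look like this (field and class names follow the question; the index declaration is an assumption about your schema):

from mongoengine import Document, StringField

class Place(Document):
    name = StringField()
    # Declare an index on "name" so the prefix query can use it.
    meta = {'indexes': ['name']}

# as_pymongo() skips MongoEngine object construction and yields raw dicts.
results = Place.objects(name__istartswith=query).only('name').as_pymongo()
names = [doc['name'] for doc in results]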