Language: Python.
I'm using the Datastore Python library and everything works fine. When sorting a plain query's results I can set query.order = ['name'] and the results come back sorted. But when I query a kind with an ancestor, like:
ancestor = client.key('user', name, namespace=NAMESPACE)
query = client.query(kind='project', ancestor=ancestor, namespace=NAMESPACE)
and then set query.order = ['name'], it doesn't work. I want to sort on the kind project, whose ancestor is the kind user.
The error message is:

400 no matching index found. recommended index is:
- kind: project
  ancestor: yes
  properties:
  - name: name

which is a YAML sample, but I'm not using YAML here. I think there must be a way to sort the result even though there's an ancestor.
All datastore queries are index-based. From Indexes:
Every Google Cloud Datastore query computes its results using one
or more indexes which contain entity keys in a sequence specified by
the index's properties and, optionally, the entity's ancestors. The
indexes are updated to reflect any changes the application makes to
its entities, so that the correct results of all queries are available
with no further computation needed.
The error you're getting comes from the Datastore side, indicating that the index required for your specific query is missing in the Datastore. It is unrelated to doing an ancestor query and/or to using yaml in your application code.
The YAML reference simply comes from the way indexes are deployed to the Datastore (which is something you need to do for your query to work); see Deploying or deleting indexes.
So you need to create an index.yaml file containing at least the index specification indicated in the error message (plus any other indexes you may need for other queries), deploy it to the Datastore, and wait for the index to reach the Serving state on the Indexes page (which may take a while). After that your query should work.
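For reference, a minimal index.yaml matching the recommendation in the error message would look like this:

indexes:
- kind: project
  ancestor: yes
  properties:
  - name: name

and (assuming you deploy with the gcloud CLI rather than the App Engine SDK tools) it can be deployed with:

gcloud datastore indexes create index.yaml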
Related
Google Cloud Datastore mandates that composite indexes be built in order to query on multiple fields of one kind. Take the following queries for example:
class Greeting(ndb.Model):
    user = ndb.StringProperty()
    place = ndb.StringProperty()

# Query 1
Greeting.query(Greeting.user == 'yash@gmail.com', Greeting.place == 'London').fetch()

# Query 2
Greeting.query(Greeting.user == 'yash@gmail.com', Greeting.place == 'London').count()
I am using python with ndb to access cloud datastore. In the above example, Query 1 raises NeedIndexError if there is no composite index defined on user and place. But Query 2 works fine even if there is no index on user and place.
I would like to understand how cloud datastore fetches the count (Query 2) without the index when it mandates the index for fetching the list of entities (Query 1). I understand it stores Stats per kind per index which would result in quicker response for counts on existing indexes (Refer docs). But I'm unable to explain the above behaviour.
Note: There is no issue when querying on one property of a given kind, as Cloud Datastore has built-in indexes on single properties by default.
There is no clear and direct explanation of why this happens, but most likely it's because of how the improved query planner works with zigzag merge joins.
You can read more about this here: https://cloud.google.com/appengine/articles/indexselection#Improved_Query_Planner
The likely reason count() works while fetch() does not is that with count() you don't need to keep a lot of results in memory.
So in the case of count() the work can easily be split into multiple chunks processed in parallel, and the per-chunk counts then summed into one number. You can't do this cheaply with cursors/recordsets.
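For completeness, the composite index that Query 1 asks for would look something like this in index.yaml (property names taken from the Greeting model above; no order directions are needed for pure equality filters):

indexes:
- kind: Greeting
  properties:
  - name: user
  - name: place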
In this document it is mentioned that the default read_policy setting is ndb.EVENTUAL_CONSISTENCY.
After I did a bulk delete of entities from the Datastore, the versions of the app I pulled up continued to read the old data, so I've tried to figure out how to change this to STRONG_CONSISTENCY, with no success. Attempts include:
entity.query().fetch(read_policy=ndb.STRONG_CONSISTENCY) and
...fetch(options=ndb.ContextOptions(read_policy=ndb.STRONG_CONSISTENCY))
The error I get is
BadArgumentError: read_policy argument invalid ('STRONG_CONSISTENCY')
How does one change this default? More to the point, how can I ensure that NDB will go to the Datastore to load a result rather than relying on an old cached value? (Note that after the bulk delete the datastore browser tells me the entity is gone.)
You cannot change that default; it is also the only option available. From the very doc you referenced (no other options are mentioned):
Description
Set this to ndb.EVENTUAL_CONSISTENCY if, instead of waiting for the
Datastore to finish applying changes to all returned results, you wish
to get possibly-not-current results faster.
The same is confirmed by inspecting the google.appengine.ext.ndb.context.py file (no STRONG_CONSISTENCY definition in it):
# Constant for read_policy.
EVENTUAL_CONSISTENCY = datastore_rpc.Configuration.EVENTUAL_CONSISTENCY
The EVENTUAL_CONSISTENCY ends up in ndb via the google.appengine.ext.ndb.__init__.py:
from context import *
__all__ += context.__all__
You might be able to avoid the error using a hack like this:
from google.appengine.datastore.datastore_rpc import Configuration
...fetch(options=ndb.ContextOptions(read_policy=Configuration.STRONG_CONSISTENCY))
However, I think that only applies to reading the entities for the keys obtained through the query, not to obtaining the list of keys themselves. That list comes from the index the query uses, which is always eventually consistent - the root cause of your deleted entities still appearing in the results (for a while, until the index is updated). From Keys-only Global Query Followed by Lookup by Key:
But it should be noted that a keys-only global query can not exclude
the possibility of an index not yet being consistent at the time of
the query, which may result in an entity not being retrieved at all.
The result of the query could potentially be generated based on
filtering out old index values. In summary, a developer may use a
keys-only global query followed by lookup by key only when an
application requirement allows the index value not yet being
consistent at the time of a query.
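A rough sketch of that pattern (assuming an ndb model, here called Client; the lookup by key is strongly consistent, but the key list itself still comes from an eventually consistent index):

from google.appengine.ext import ndb

# keys-only global query: the key list comes from the index,
# so it can still be slightly stale
keys = Client.query().fetch(keys_only=True)

# lookup by key: strongly consistent reads of the entities;
# entities deleted after the index was written come back as None
clients = [c for c in ndb.get_multi(keys) if c is not None]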
Potentially of interest: Bulk delete datastore entity older than 2 days
From what I have been reading in the Google Docs and other SO questions, keys_only queries should return strongly consistent results (here and here, for example).
My code looks something like this:
class ClientsPage(SomeHandler):
    def get(self):
        query = Client.query()
        clients = query.fetch(keys_only=True)
        self.write(len(clients))
Even though I am fetching the results with the keys_only=True parameter I am getting stale results right after the creation of a new Client object (which is a root entity). If there were 2 client objects before the insertion, it keeps showing 2 after inserting and redirecting. I have to manually refresh the page in order to see the number change to 3.
I understand I could use ancestor queries, but I am testing some things first and I was surprised to see that a keys_only query returned stale results. Can anyone please explain to me what's going on?
EDIT 1:
This happened on the development server; I have not tested it in production.
Eventual consistency exists because the Datastore needs time to update all indexes. A keys-only query is the same as any other query, except it tells the Datastore: I don't need the entire entity, just return me the key. The query still looks at the indexes to get the list of results.
In contrast, getting an entity by key does not need to look at the indexes, so it is always strongly consistent.
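Two illustrative sketches (model and key names are placeholders): a direct get by key, and the ancestor query mentioned in the question, both of which return strongly consistent results:

from google.appengine.ext import ndb

# 1. get by key: no index lookup involved, always strongly consistent
client = ndb.Key(Client, some_client_id).get()

# 2. ancestor query: strongly consistent, but requires all Client
#    entities to be created under the same parent key
parent = ndb.Key('ClientGroup', 'default')
clients = Client.query(ancestor=parent).fetch(keys_only=True)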
I'm getting acquainted with musicbrainzngs and have run into a snag. All of the track-lists which are returned from the following are empty. Are there additional parameters I need to provide or is this a bug?
releases = musicbrainzngs.search_releases(
query='arid:' + musicbrainz_arid
)
This is expected. You have three ways of retrieving data from the MusicBrainz web service (using musicbrainzngs or directly):
lookup/get info for one entity by id: lots of info for that id
browse a list of entities: possibility to get long list, medium amount of information
search for entities: powerful to find things, but not much data given
When you know an entity by id you can look it up directly. You can even add includes to get very detailed information.
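For example, a lookup with includes could look like this (the release MBID is a placeholder; set_useragent is required by musicbrainzngs before any request):

import musicbrainzngs

# the library refuses to send requests without a user agent set
musicbrainzngs.set_useragent("my-app", "0.1", "me@example.com")

# lookup by id: detailed data, including the tracks when
# "recordings" is in the includes
release = musicbrainzngs.get_release_by_id("some-release-mbid",
                                           includes=["recordings"])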
When you not only want one entity, but a list (like a list of releases for one artist) you can browse. Even for these you can add includes.
And only when you don't know the id of the entity (or an attached entity), or when you want to cut down the list of entities, do you search.
In your case you know the artist id and want to get the list of releases. In that case you should use browse_releases and set an include for recordings:
releases = musicbrainzngs.browse_releases(artist=musicbrainz_arid,
                                          includes=["recordings"])
I am using Google App Engine's Search API to index entities from the Datastore. After I create or modify an object, I have to add it to the Search index. I do this by creating a add_to_search_index method for each model whose entities are indexed, for example:
class Location(ndb.Model):
    ...

    def add_to_search_index(self):
        fields = [
            search.TextField(name="name", value=self.name),
            search.GeoField(name="location",
                            value=search.GeoPoint(self.location.lat, self.location.lon)),
        ]
        document = search.Document(doc_id=str(self.key.id()), fields=fields)
        index = search.Index(name='Location_index')
        index.put(document)
Does the search API automatically maintain any correspondence between indexed documents and datastore entities?
I suspect it does not, meaning that the Search API will keep documents for deleted, obsolete entities in its index. If that's the case, then I suppose the best approach would be to use the NDB hook methods to create a remove_from_search_index method that is called before put (for edits/updates) and before delete. Please advise if there is a better solution for maintaining correspondence between the Datastore and Search indexes.
Since the Datastore (NDB) and the Search API are separate backends, they have to be maintained separately. I see you're using key.id() as the document id; you can use this document id to get a document or to delete it. Maintaining the search document can be done in the model's _post_put_hook and _post_delete_hook (a sketch using these hooks follows the snippet below). You may also use the repository pattern to do this. How you do this is up to you.
index = search.Index(name='Location_index')
index.delete([doc_id])
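A minimal sketch of the hook approach, assuming the Location model and Location_index from the question (these are the post-hook signatures NDB defines; error handling omitted):

from google.appengine.api import search
from google.appengine.ext import ndb

class Location(ndb.Model):
    ...

    def _post_put_hook(self, future):
        # index (or re-index) the entity after a put
        self.add_to_search_index()

    @classmethod
    def _post_delete_hook(cls, key, future):
        # drop the matching search document when the entity is deleted
        index = search.Index(name='Location_index')
        index.delete([str(key.id())])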