Google Appengine NDB ancestor vs key query


I am storing a key of an entity as a property of another in order to relate them. We are in a refactor stage at this point in the project so I was thinking about introducing ancestors.
Is there a performance difference between the two approaches? Any given advantages that I might gain if we introduce ancestors?
class Book(ndb.Model):
    ...

class Article(ndb.Model):
    book_key = ndb.KeyProperty(kind=Book, required=True)

book_key = ndb.Key("Book", 12345)

# 1st approach: ancestor query (requires Articles created with parent=book_key)
qry = Article.query(ancestor=book_key)

# 2nd approach: simple key property query
qry = Article.query(Article.book_key == book_key)

The ancestor query will always be fully consistent. Querying by book_key, on the other hand, will not necessarily be consistent: you may find that recent changes will not be shown in that query.
On the other hand, introducing an ancestor imposes a limit on the rate of updates: you can sustain only about one write per second to any entity group (i.e. the ancestor and all of its descendants).
It's a trade-off for you as to which one is more important in your app.
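To make the structural difference concrete, here is a toy sketch that does not use the datastore at all. It is not NDB code: keys are modelled as plain tuples of (kind, id) pairs, and `ancestor_query` and `property_query` are hypothetical helper names. It only illustrates that an ancestor is part of the child's key (a path prefix), whereas a key property is just a stored value on an otherwise independent entity.

```python
# Toy model (not NDB): a key is a tuple of (kind, id) pairs, root first.
book = (("Book", 12345),)

# With an ancestor, the parent's path is baked into the child's key.
child_articles = {
    book + (("Article", 1),): {"title": "intro"},
    book + (("Article", 2),): {"title": "outro"},
    (("Book", 999), ("Article", 3)): {"title": "other book"},
}

# With a key property, each article has a root key of its own and
# merely stores the book's key as a value.
root_articles = {
    (("Article", 1),): {"book_key": book},
    (("Article", 2),): {"book_key": (("Book", 999),)},
}


def ancestor_query(entities, ancestor):
    """Return keys of descendants, i.e. keys whose path starts with `ancestor`."""
    n = len(ancestor)
    return sorted(k for k in entities if len(k) > n and k[:n] == ancestor)


def property_query(entities, name, value):
    """Return keys whose stored property `name` equals `value`."""
    return sorted(k for k, props in entities.items() if props.get(name) == value)


in_group = ancestor_query(child_articles, book)
by_property = property_query(root_articles, "book_key", book)
print(in_group)
print(by_property)
```

Because the parent is part of the key, moving an Article to a different Book under the ancestor design means creating a new entity with a new key, which is worth weighing during a refactor.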

Related

Can you help me understand the ndb Key Class Documentation or rather ancestor relationship?

I am trying to wrap my head 'round gae datastore, but I do not fully understand the documentation for the Key Class / or maybe it is ancestor relationships in general I do not grasp.
I think what I want is multiple ancestors.
Example:
Say I wanted to model our school's annual sponsored run for charity; school kids run rounds around the track and their relatives (=sponsors) donate to charity for each round completed.
In my mind, I would create the following kinds:
Profile (can be both runner and sponsor)
Run (defines who (cf. profile) runs for what charity, rounds actually completed)
Sponsorship (defines who (cf. profile) donates how much for what run, whether the donation has been made)
I've learned that datastore is a nosql, non-relational database, but haven't fully grasped it. So my questions are:
a. Is creating an entity for "Sponsorship" even the best way in datastore? I could also model it as a has-a relationship (every run has sponsors), but since I also want to track the amount sponsored, whether the sponsor paid up, and maybe more, this seems inappropriate.
b. I'd like to easily query all sponsorships made by a single person and also all sponsorships belonging to a certain run.
So, I feel, this would be appropriate:
Profile --is ancestor of--> Run
Profile --is ancestor of--> Sponsorship
Run --is ancestor of--> Sponsorship
Is that sensible?
I can see a constructor for a Key that takes several kinds in ancestor order as arguments. Was that designed for this case? "Run" and "profile" would be at the same "level" (i.e. mum&dad ancestors not father&grandfather) - what would that constructor look like in python?

The primary way of establishing relationships between entities is via the key properties in the entity model. Normally no ancestry is needed.
For example:
class Profile(ndb.Model):
    name = ndb.StringProperty()

class Run(ndb.Model):
    runner = ndb.KeyProperty(kind='Profile')
    rounds = ndb.IntegerProperty()
    sponsorship = ndb.KeyProperty(kind='Sponsorship')

class Sponsorship(ndb.Model):
    run = ndb.KeyProperty(kind='Run')
    donor = ndb.KeyProperty(kind='Profile')
    done = ndb.BooleanProperty()
The ancestry just places entities inside the same entity group (which can be quite limiting!) while enforcing additional relationships on top of the ones already established by the model. See Transactions and entity groups and maybe Contention problems in Google App Engine.
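As a rough illustration of why key properties are enough for both lookups the question asks for, here is a small in-memory sketch. It is plain Python rather than NDB: the `Sponsorship` dataclass, the integer ids standing in for keys, and the helpers `by_donor` and `by_run` are all hypothetical. A key also has only one parent chain, so the "mum and dad" ancestry from the question cannot be expressed with keys, whereas two key properties can point at two unrelated entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sponsorship:
    run_id: int         # stands in for the `run` KeyProperty
    donor_id: int       # stands in for the `donor` KeyProperty
    done: bool = False  # whether the donation has been made


sponsorships = [
    Sponsorship(run_id=1, donor_id=10),
    Sponsorship(run_id=1, donor_id=11, done=True),
    Sponsorship(run_id=2, donor_id=10),
]


def by_donor(items, donor_id):
    """All sponsorships made by one person."""
    return [s for s in items if s.donor_id == donor_id]


def by_run(items, run_id):
    """All sponsorships belonging to one run."""
    return [s for s in items if s.run_id == run_id]


donor_10 = by_donor(sponsorships, 10)
run_1 = by_run(sponsorships, 1)
print(len(donor_10), len(run_1))
```

With the models above, the equivalent NDB filters would be written along the lines of Sponsorship.query(Sponsorship.donor == profile_key) and Sponsorship.query(Sponsorship.run == run_key), where profile_key and run_key are assumed to be existing keys.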

How to implement composition/aggregation with NDB on GAE

How do we implement aggregation or composition with NDB on Google App Engine? What is the best way to proceed depending on the use case?
Thanks!
I've tried to use a repeated property. In this very simple example, a Project has a list of Tag keys (I chose to code it this way instead of using StructuredProperty because many Project objects can share Tag objects).
class Project(ndb.Model):
    name = ndb.StringProperty()
    tags = ndb.KeyProperty(kind=Tag, repeated=True)
    budget = ndb.FloatProperty()
    date_begin = ndb.DateProperty(auto_now_add=True)
    date_end = ndb.DateProperty(auto_now_add=True)

    @classmethod
    def all(cls):
        return cls.query()

    # Instance method: the tags belong to one Project, not to the class.
    def addTags(self, from_str):
        tagname_list = from_str.split(',')
        tag_list = []
        for tag in tagname_list:
            # Tag.addTag is assumed to return the key of the Tag.
            tag_list.append(Tag.addTag(tag))
        self.tags = tag_list
--
Edited (2) :
Thanks. In the end I chose to create a new Model class, 'Relation', representing a relation between two entities. It's more of an association; I confess that my first design was ill-suited.

An alternative would be to use BigQuery. At first we used NDB, with a RawModel which stores individual, non-aggregated records, and an AggregateModel, which stores the aggregate values.
The AggregateModel was updated every time a RawModel was created, which caused some inconsistency issues. In hindsight, properly using parent/ancestor keys as Tim suggested would've worked, but in the end we found BigQuery much more pleasant and intuitive to work with.
We just have cron jobs that run every day: one to push RawModel records to BigQuery and another to create the AggregateModel records with data fetched from BigQuery.
(Of course, this is only effective if you have lots of data to aggregate)

It really does depend on the use case. For small numbers of items StructuredProperty and repeated properties may well be the best fit.
For large numbers of entities you will then look at setting the parent/ancestor in the Key for composition, and have a KeyProperty pointing to the primary entity in a many to one aggregation.
However the choice will also depend heavily on the actual use pattern as well. Then considerations of efficiency kick in.
The best I can suggest is to consider carefully how you plan to use these relationships: how active they are (i.e. are they constantly changing, with members being added and deleted?), and whether you need to see all members of the relation most of the time or just subsets. These considerations may well require adjustments to the approach.

How can I test that Django QuerySets are ordered by PK ascending

class Foo(models.Model):
    name = models.CharField(max_length=10)

    class Meta(object):
        ordering = ('pk', )

I want to test that this ordering is working as I expect.

def test_respect_ordering(self):
    Foo.objects.create(name="bar", pk=2)
    Foo.objects.create(name="baz", pk=1)
    results = Foo.objects.all()
    self.assertEqual("baz", results[0].name)
    self.assertEqual("bar", results[1].name)
Although this works as I expect, my test passes regardless of the Meta class or the ordering property defined in it. Is there some way I can test that this code matters?
Why do I want to test this? My tests run in SQLite, but production is in MySQL. Hopefully someday we'll use a better RDBMS, and results may not be returned in PK order across all of these RDBMSs.
The Django docs indicate that sorting doesn't happen automatically.

If you omit the ordering attribute in your Meta class, the resulting generated SQL query will not have an ORDER BY clause (well, in 1.4 anyway). This means you can't rely on the order of the rows.
Unordered SQL queries will generally have an order that looks like it makes some sense. That's because the query plan will most likely use indexes to decrease query time, and indexes can play a big part in the row order for unordered queries. A table generated by a Django model will only have an index on the primary key unless otherwise specified, so in general the 'unordered' order will be quite similar to the order when sorted on primary key.
However, there is absolutely no guarantee here, and this order cannot be relied on. The query plan depends largely on the database engine, and it can even change drastically for very similar queries on the same engine.
If you want a particular order, you should explicitly specify the order you want. That's the only reliable way to guarantee a particular order.
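As a small sketch of the point about explicit ordering, here is the same scenario using Python's built-in sqlite3 module directly instead of Django. The table name `foo` and its columns are stand-ins for the model in the question. Only the queries with an ORDER BY clause have a guaranteed row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
# Insert out of primary-key order, as in the test from the question.
conn.executemany(
    "INSERT INTO foo (id, name) VALUES (?, ?)",
    [(2, "bar"), (1, "baz")],
)

# Explicit ordering: the row order is defined by the query itself.
ordered = conn.execute("SELECT name FROM foo ORDER BY id ASC").fetchall()
descending = conn.execute("SELECT name FROM foo ORDER BY id DESC").fetchall()
print(ordered, descending)
conn.close()
```

In the Django test itself, one option that really does depend on the Meta class is to assert on the queryset's `ordered` attribute or on the generated SQL (for example, checking that `str(Foo.objects.all().query)` contains an ORDER BY clause), rather than on the row order a particular database happens to return.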

What does the write limitation on ancestor queries mean?

According to the documentation, when using ancestor queries the following limitation will be enforced:
Ancestor queries allow you to make strongly consistent queries to the
datastore, however entities with the same ancestor are limited to 1
write per second.
class Customer(ndb.Model):
    name = ndb.StringProperty()

class Purchase(ndb.Model):
    price = ndb.IntegerProperty()

# The constructor argument is `parent`; it places each purchase in the customer's entity group.
purchase1 = Purchase(parent=customer_entity.key)
purchase2 = Purchase(parent=customer_entity.key)
purchase3 = Purchase(parent=customer_entity.key)
purchase1.put()
purchase2.put()
purchase3.put()
Taking the same example, if I were to write three purchases at the same time, would I get an exception, as they are less than a second apart?

Here you can find two excellent videos about the datastore, strong consistency and entity groups. Datastore Introduction and Datastore Query, Index and Transaction.
As for your example: you can use put_multi(), which "counts" as a single entity group write.

google app engine cross group transactions needing parent ancestor

From my understanding, @db.transactional(xg=True) allows for transactions across groups, however the following code returns "queries inside transactions must have ancestors".
@db.transactional(xg=True)
def insertUserID(self, userName):
    user = User.gql("WHERE userName = :1", userName).get()
    highestUser = User.all().order('-userID').get()
    nextUserID = highestUser.userID + 1
    user.userID = nextUserID
    user.put()
Do you need to pass in the key for each entity despite being a cross group transaction? Can you please help modify this example accordingly?

An XG transaction can span at most 25 entity groups. An ancestor query limits the query to a single entity group, so you can run queries within those 25 entity groups in a single XG transaction.
A transactional query without a parent could potentially include every entity group in the application and lock everything up, so you get an error message instead.
In App Engine one usually tries to avoid monotonically increasing ids. The auto-assigned ones might go like 101, 10001, 10002 and so on. If you know that you need monotonically increasing ids and that this will work for you performance-wise, how about:
Have some kind of model representation of userId to enable key_name usage and direct lookup
Query for userId outside transaction, get highest candidate id
In transaction do get_or_insert; lookup UserId.get_by_key_name(candidateid+1). If already present and pointing to a different user, try again with +2 and so on until you find a free one and create it, updating the userid attribute of user at the same time.
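The candidate-id steps above can be sketched without the datastore. This is only a toy illustration under stated assumptions: the dict `claimed` stands in for UserId entities looked up directly by key name, `dict.setdefault` plays the role of get_or_insert, and `claim_next_id` is a hypothetical helper, not an App Engine API.

```python
def claim_next_id(claimed, user_name, candidate):
    """Claim the first free id at or above `candidate` for `user_name`.

    `claimed` maps id -> user_name and stands in for UserId entities
    that are looked up directly by key name.
    """
    while True:
        # Get-or-insert: keeps the existing owner if the id is taken.
        owner = claimed.setdefault(candidate, user_name)
        if owner == user_name:
            return candidate
        candidate += 1  # taken by a different user: try the next id


claimed = {1: "alice", 2: "bob"}
highest = max(claimed)  # found by a query *outside* the transaction

carol_id = claim_next_id(claimed, "carol", highest + 1)
# A stale `highest` (someone else claimed an id first) still ends on a free id.
dave_id = claim_next_id(claimed, "dave", highest + 1)
print(carol_id, dave_id)
```

The design point is that the query result is only a starting hint; correctness comes from the direct, key-based lookup that is retried until a free id is found.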
If the XG-transaction of updating UserId+User is too slow, perhaps create UserId+task in transaction (not XG), and let the executing task associate UserId and User afterwards. Or a single backend that can serialize UserId creation and perhaps allow put_async if you retry to avoid holes in the sequence and do something like 50 creations per second.
If it's possible to use userName as key_name you can do direct lookup instead of query and make things faster and cheaper.

Cross group transactions allow you to perform a transaction across multiple groups, but they don't remove the prohibition on queries inside transactions. You need to perform the query outside the transaction, and pass the ID of the entity in (and then check any invariants specified in the query still hold) - or, as Shay suggests, use IDs so you don't have to do a query in the first place.

Every datastore entity has a key, and a key (among other things) has either a numeric id that App Engine assigns to it or a key_name that you give it.
In your case it looks like you can use the numeric id: after you call put() on the user entity you will have user.key().id() (or user.key.id() if you're using NDB), which will be unique for each user (as long as all the users have the same parent, which is None in your code).
This id is not sequential, but it is guaranteed to be unique.
