How to change ancestor of an NDB record? - python

In the High-Replication Datastore (I'm using NDB), the consistency is eventual. In order to get a guaranteed complete set, ancestor queries can be used. Ancestor queries also provide a great way to get all the "children" of a particular ancestor with kindless queries. In short, being able to leverage the ancestor model is hugely useful in GAE.
The problem I seem to have is rather simplistic. Let's say I have a contact record and a message record. A given contact record is being treated as the ancestor for each message. However, it is possible that two contacts are created for the same person (user error, different data points, whatever). This situation produces two contact records, which have messages related to them.
I need to be able to "merge" the two records and put all the messages into one big pile. Ideally, I'd be able to modify the ancestor of one record's children.
The only way I can think of doing this is to create a mapping and make my app check whether a record has been merged. If it has, look at the mappings to find one or more related records, and perform queries against those. This seems hugely inefficient. Is there a more "by the book" way of handling this use case?

The only way to change the ancestor of an entity is to delete the old entity and create a new one with a new key. This must be done for all child (and grandchild, etc.) entities in the ancestor path. If this isn't possible, then your listed solution works.
This is required because the ancestor path of an entity is part of its unique key. Parents of entities (i.e., entities in the ancestor path) need not exist, so changing a parent's key will leave the children in the datastore with no parent.
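To make the copy-and-delete idea concrete, here is a minimal sketch in plain Python rather than the NDB API. It assumes a key can be modelled as a tuple of (kind, id) pairs with ancestors first, and that the datastore is a plain dict; `reparent` and `store` are hypothetical names used only for illustration. With real NDB the equivalent steps would be a put of a copy under the new key followed by a delete of the old key.

```python
def reparent(store, old_root, new_root):
    """Move every descendant of old_root so that it lives under new_root.

    `store` maps key tuples to entity data. Because the ancestor path is
    part of the key, "moving" means writing the data under a new key and
    removing the old key.
    """
    depth = len(old_root)
    moved = {}
    for key in list(store):
        # A descendant's key starts with the ancestor's full path.
        if len(key) > depth and key[:depth] == old_root:
            new_key = new_root + key[depth:]
            if new_key in store:
                raise ValueError("key collision: %r" % (new_key,))
            store[new_key] = store.pop(key)
            moved[key] = new_key
    return moved


# Two contact records for the same person, each with one message.
store = {
    (("Contact", 1),): {"name": "Ann"},
    (("Contact", 2),): {"name": "Ann (duplicate)"},
    (("Contact", 1), ("Message", 10)): {"text": "hello"},
    (("Contact", 2), ("Message", 20)): {"text": "hi again"},
}
moved = reparent(store, (("Contact", 2),), (("Contact", 1),))
```

After the call, the message that lived under Contact 2 is stored under Contact 1, and the duplicate contact itself can be deleted. The collision check is there because two children of different parents may carry the same ID.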

Related

How to model a unique constraint in GAE ndb

I want to have several "bundles" (Mjbundle), which essentially are bundles of questions (Mjquestion). The Mjquestion has an integer "index" property which needs to be unique, but only within the bundle containing it. I'm not sure how to model something like this properly. I tried to do it using a structured (repeated) property below, but nothing there actually constrains the uniqueness of the Mjquestion indexes yet. What is a better/normal/correct way of doing this?
from google.appengine.ext import ndb
class Mjquestion(ndb.Model):
    """This is a Mjquestion."""
    index = ndb.IntegerProperty(indexed=True, required=True)
    genre1 = ndb.IntegerProperty(indexed=False, required=True, choices=[1, 2, 3, 4, 5, 6, 7])
    genre2 = ndb.IntegerProperty(indexed=False, required=True, choices=[1, 2, 3])
    # (will add a bunch of more data properties later)
class Mjbundle(ndb.Model):
    """This is a Mjbundle."""
    mjquestions = ndb.StructuredProperty(Mjquestion, repeated=True)
    time = ndb.DateTimeProperty(auto_now_add=True)
(With the above model and having fetched a certain Mjbundle entity, I am not sure how to quickly fetch a Mjquestion from mjquestions based on the index. The explanation on filtering on structured properties looks like it works on the Mjbundle type level, whereas I already have a Mjbundle entity and was not sure how to quickly query only on the questions contained by that entity, without looping through them all "manually" in code.)
So I'm open to any suggestion on how to do this better.
I read this informational answer: https://stackoverflow.com/a/3855751/129202 It gives some thoughts about scalability and on a related note I will be expecting just a couple of bundles but each bundle will have questions in the thousands.
Maybe I should not use the mjquestions property of Mjbundle at all, but rather focus on parenting: each Mjquestion created should have a certain Mjbundle entity as parent. And then "manually" enforce uniqueness at "insert time" by doing an ancestor query.
When you use a StructuredProperty, all of the entities of that type are stored as part of the containing entity, so when you fetch your bundle, you have already fetched all of the questions. If you stick with this way of storing things, iterating to check in code is the solution.
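As an illustration of checking in code, here is a small sketch that uses plain dicts in place of the Mjquestion values inside an already fetched bundle. `find_question` and `add_question` are hypothetical helpers, not NDB methods.

```python
def find_question(questions, index):
    """Return the question with the given index, or None if absent."""
    for question in questions:
        if question["index"] == index:
            return question
    return None


def add_question(questions, new_question):
    """Append new_question unless its index is already used in this bundle."""
    if find_question(questions, new_question["index"]) is not None:
        raise ValueError("index %d already exists" % new_question["index"])
    questions.append(new_question)


# Stand-in for bundle.mjquestions on an already fetched Mjbundle.
mjquestions = []
add_question(mjquestions, {"index": 1, "genre1": 3, "genre2": 1})
add_question(mjquestions, {"index": 2, "genre1": 5, "genre2": 2})

try:
    add_question(mjquestions, {"index": 2, "genre1": 7, "genre2": 3})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

With thousands of questions per bundle, building a dict keyed by index once after the fetch would avoid rescanning the list on every lookup.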

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There are a few examples in the GAE docs that use the same approach, and keys use only characters that are safe to include in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as its identifier, just because it looks less ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
This returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
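For the second option, a key name could be generated roughly like this. `make_key_name` is a hypothetical helper, and the 16 bytes of randomness is an assumed size rather than a recommendation from the docs.

```python
import binascii
import os


def make_key_name(num_bytes=16):
    """Return a hex string built from num_bytes of OS-level randomness."""
    return binascii.hexlify(os.urandom(num_bytes)).decode("ascii")


key_name = make_key_name()
# Assumed use with the db API: Project(key_name=key_name, ...).put()
```

Such a name makes URLs hard to guess, but it does not replace checking whether the current user is allowed to see the entity.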
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned webpage or visit a URL that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
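A per-request ownership check of that kind might look like the sketch below. It uses a dict in place of the datastore, and `Forbidden`, `get_owned_project` and the `owner_id` field are all hypothetical names, since the original models are not shown.

```python
class Forbidden(Exception):
    """Raised when a user asks for an object they do not own."""


def get_owned_project(projects, project_id, user_id):
    """Return the project only if it exists and belongs to user_id."""
    project = projects.get(project_id)
    if project is None or project["owner_id"] != user_id:
        # Same error for "missing" and "not yours", so IDs cannot be probed.
        raise Forbidden("project %r is not accessible" % (project_id,))
    return project


# Stand-in for Project entities keyed by numeric ID.
projects = {
    7001: {"name": "My Project Name", "owner_id": "alice"},
    7002: {"name": "Other Project", "owner_id": "bob"},
}
own_project = get_owned_project(projects, 7001, "alice")
```

Routing every handler through one helper like this keeps the check from being forgotten on individual pages.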
I know this is an old post, but I want to clarify one thing. Sometimes you NEED to work with keys.
When you have an entity with a #Parent relationship, you can't get it by its ID alone; you need the whole key to get it back from the Datastore. In these cases you need to work with the key all the time if you want to retrieve your entity.
Numeric IDs aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.

Proper way of avoiding to store the same attachment twice

I'm using the project.task model where delegation creates a parent/child link between both.
When delegating, I would like the person who gets the delegated task to also have access to the attachments on the original task. How can I avoid actually copying them?
I've thought about using an <act_window> or a wizard which checks if there is a parent task and if so (also) show the parent task attachments.
The problem with act_window is that you would need to specify two different act_window records, and that would still only cover one parent and one child relation (the task could be delegated further).
The wizard approach seems like overkill for something that could maybe be solved more easily (hence the question).
I think building a wizard is the only way that will work, because there isn't a real link between attachment and project.task. If I were you, I would build a wizard that walks the parent relation to build a list of all ancestor task ids, plus the current task id. Then have the wizard open the attachment window using that list of ids as one of the domain search criteria.
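The id-collecting part of such a wizard could be sketched as follows in plain Python. `parent_of` is a hypothetical dict standing in for the parent link on project.task, and the domain assumes that attachments point at their owner through `res_model` and `res_id` fields.

```python
def ancestor_task_ids(task_id, parent_of):
    """Return task_id followed by the ids of all its ancestor tasks."""
    ids = []
    seen = set()
    current = task_id
    # Stop at the root (no parent) or if a cycle is detected.
    while current is not None and current not in seen:
        seen.add(current)
        ids.append(current)
        current = parent_of.get(current)
    return ids


# Task 3 was delegated from task 2, which was delegated from task 1.
parent_of = {3: 2, 2: 1}
task_ids = ancestor_task_ids(3, parent_of)

# Domain for the attachment window (field names are an assumption).
domain = [("res_model", "=", "project.task"), ("res_id", "in", task_ids)]
```

Because the loop follows the parent link until it runs out, it covers any depth of delegation rather than a single parent and child.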

Modeling Hierarchical Data - GAE

I'm new to Google App Engine and the Google datastore (Bigtable), and I have some doubts about the best approach to designing the required data model.
I need to create a hierarchical model, something like a product catalog, where each domain has several levels of subdomains. For the moment, the product structure changes far less often than it is read. Wine example:
Origin (Toscana, Priorat, Alsacian)
Winery (Belongs only to one Origin)
Wine (Belongs only to one Winery)
All the relations are disjoint and incomplete. Additionally, depending on the requirements, we will probably need to store usage counters for every wine (which could require transactions).
According to the documentation, there seem to be several potential solutions:
Ancestors management. Using parent relations and transactions
Pseudo-ancestor management. Simulating ancestors with a db.ListProperty(db.Key)
ReferenceProperty. Specifying explicitly the relation between the classes
But given the expected requests to get wines (sometimes by variety, sometimes by origin, sometimes by winery), I'm worried about how queries will behave with these structures. It is like multiple joins in a relational model: if you ask for the products of a family, you need to join from the deepest qualifier in the product tree back up to the family.
Maybe it is better to duplicate some information (following the Google team's recommendation: operations are expensive but storage is not, so duplicated content should not be seen as the main problem).
Some responses to other similar questions suggest:
Store all the parent ids as a hierarchy in a string... like a path property
Duplicate the relations between the Drink entity and all the parents in the tree ...
Any suggestions?
Hi Will,
Our case is more of a strict hierarchical approach, as you describe in the second example. Our queries mostly retrieve lists of products; retrieving a single one is unusual.
We need to retrieve all the wines from an Origin, from a Winery, or from a Variety (supposing that the variety is another node of the strict hierarchical tree; it is only an example).
One way could be to include a path property, as you mentioned:
/origin/{id}/winery/{id}/variety/{id}
This would allow me to retrieve a list of wines of a variety with a query like this:
wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/winery/latoscana/variety/merlot/')
wines_query.filter('key_name <','/origin/toscana/winery/latoscana/variety/merlot/zzzzzzzz')
Or like this from an Origin:
wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/')
wines_query.filter('key_name <','/origin/toscana/zzzzzz')
Thank you!
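The range trick in those queries can be tried out in plain Python, where a sorted list stands in for the datastore's ordered key names. This is only a sketch: `prefix_bounds` and `names_with_prefix` are hypothetical helpers, and the upper bound uses u"\ufffd" instead of the "zzzzzz" suffix from the queries above.

```python
import bisect


def prefix_bounds(prefix):
    """Return (low, high) so that low <= name < high matches the prefix."""
    # u"\ufffd" sorts after any character expected in these paths, so it
    # also covers names that a literal "zzzzzz" bound would miss ("~x").
    return prefix, prefix + u"\ufffd"


def names_with_prefix(sorted_names, prefix):
    """Return the slice of sorted_names that starts with prefix."""
    low, high = prefix_bounds(prefix)
    start = bisect.bisect_left(sorted_names, low)
    end = bisect.bisect_left(sorted_names, high)
    return sorted_names[start:end]


key_names = sorted([
    u"/origin/priorat/winery/x/variety/garnacha/w1",
    u"/origin/toscana/winery/latoscana/variety/merlot/w2",
    u"/origin/toscana/winery/latoscana/variety/sangiovese/w3",
    u"/origin/toscana/winery/other/variety/merlot/w4",
])
toscana = names_with_prefix(key_names, u"/origin/toscana/")
merlot = names_with_prefix(
    key_names, u"/origin/toscana/winery/latoscana/variety/merlot/")
```

Note that a path ordered origin, winery, variety only supports prefixes in that order, so "all merlot wines from any winery" cannot be expressed as a single range.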
I'm not sure what kinds of queries you'll need to do in addition to those mentioned in the question, but storing the data in an explicit ancestor hierarchy would make the ones you asked about fall out pretty easily.
For example, to get all wines from a particular origin:
origin_key = db.Key.from_path('Origin', 123)
wines_query = db.Query(Wine).ancestor(origin_key)
or to get all wines from a particular winery:
origin_key = db.Key.from_path('Origin', 123)
winery_key = db.Key.from_path('Winery', 456, parent=origin_key)
wines_query = db.Query(Wine).ancestor(winery_key)
and, assuming you're storing the variety as a property on the Wine model, all wines of a particular variety is as simple as
wines_query = Wine.all().filter('variety =', 'merlot')
One possible downside of this strict hierarchical approach is the kind of URL scheme it can impose on you. With a hierarchy that looks like
Origin -> Winery -> Wine
you must know the key name or ID of a wine's origin and winery in order to build a key to retrieve that wine, unless you've already got the string representation of the wine's key. This basically forces you to have URLs for wines in one of the following forms:
/origin/{id}/winery/{id}/wine/{id}
/wine/{opaque and unfriendly datastore key as a string}
(The first URL could of course be replaced with querystring parameters; the important part is that you need three different pieces of information to identify a given wine.)
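To show what the first URL form implies for a handler, here is a rough sketch that turns such a path into the flat kind/id sequence a key is built from. `key_path_from_url` and `EXPECTED_KINDS` are hypothetical names, and treating numeric segments as IDs and everything else as key names is an assumption.

```python
import re

EXPECTED_KINDS = ("Origin", "Winery", "Wine")
_DIGITS = re.compile(r"[0-9]+\Z")


def key_path_from_url(path):
    """Turn '/origin/1/winery/2/wine/3' into a flat [kind, id, ...] list."""
    parts = [part for part in path.split("/") if part]
    if len(parts) % 2 != 0:
        raise ValueError("expected kind/id pairs, got %r" % (path,))
    flat = []
    for kind, ident in zip(parts[0::2], parts[1::2]):
        flat.append(kind.capitalize())
        # Numeric segments are treated as IDs, anything else as a key name.
        flat.append(int(ident) if _DIGITS.match(ident) else ident)
    # Refuse paths whose kinds are not exactly Origin, Winery, Wine.
    if tuple(flat[0::2]) != EXPECTED_KINDS:
        raise ValueError("unexpected kinds in %r" % (path,))
    return flat


flat_path = key_path_from_url("/origin/123/winery/456/wine/789")
# Assumed use with the db API: db.Key.from_path(*flat_path)
```

All three pieces of information have to be present in the URL for this to work, which is the constraint described above.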
Maybe there are other alternatives to these URL schemes that have not occurred to me, though.

datastore transaction restrictions

In my Google App Engine application, whenever a user purchases a number of contracts, these events are executed (simplified for clarity):
user.cash is decreased
user.contracts is increased by the number
contracts.current_price is updated.
market.no_of_transactions is increased by 1.
In an RDBMS, these would be placed within the same transaction. As I understand it, the Google datastore does not allow entities of more than one model to be in the same transaction.
What is the correct approach to this issue? How can I ensure that if a write fails, all preceding writes are rolled back?
Edit: I have obviously missed entity groups. Now I'd appreciate some further information regarding how they are used. Another point to clarify is that Google says "Only use entity groups when they are needed for transactions. For other relationships between entities, use ReferenceProperty properties and Key values, which can be used in queries". Does it mean I have to define both a reference property (since I need to query them) and a parent-child relationship (for transactions)?
Edit 2: And finally, how do I define two parents for an entity if the entity is being created to establish an n-to-n relationship between two parents?
After thorough research, I have found that a distributed transaction layer that provides a solution to the single entity group restriction has been developed in userland with the help of some Google people. But so far, it is not released and is only available in Java.
Let me add a quote from the Datastore documentation:
A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller.
You could create a pseudo root entity and put everything below this. Then, you execute everything in a transaction.
shanyu, you mentioned the distributed transaction layer that lets you operate across arbitrarily many entity groups in a single transaction. it actually has been released, it just hasn't been advertised very loudly. it was designed and written by daniel wilkerson and erick armbrust, with some consulting on my part. dan describes it in this talk.
nick johnson has also described how to do "transfer" type operations across entity groups, similar to what you describe. it's not as general purpose as tapioca-orm, but it's simpler and lighter weight.
there's a related built in feature, transactional tasks, that lets you add a task to a queue within a datastore transaction, such that it will only be added if the transaction commits successfully. that task can then do more datastore operations, including a transaction on a different entity group. it's not as strong as dan and erick's solution, but it does give you guaranteed eventual consistency across entity groups, which is good enough for many use cases, without the extra overhead.
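the behaviour of transactional tasks can be illustrated with a toy model in plain Python. this is not the taskqueue API: `ToyTransaction`, `store` and `queue` are hypothetical stand-ins, and the only point is that buffered writes and queued tasks are released together on commit and dropped together on failure.

```python
class ToyTransaction(object):
    """Toy stand-in for a datastore transaction with transactional tasks."""

    def __init__(self, store, queue):
        self._store = store
        self._queue = queue
        self._writes = {}
        self._tasks = []

    def put(self, key, value):
        # Buffer the write until commit.
        self._writes[key] = value

    def enqueue(self, task):
        # Buffer the task until commit.
        self._tasks.append(task)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: apply buffered writes and release the tasks together.
            self._store.update(self._writes)
            self._queue.extend(self._tasks)
        # On an exception nothing is applied; the exception propagates.
        return False


store = {"user:1": {"cash": 100}}
queue = []

# Successful purchase: the write lands and the follow-up task is queued.
with ToyTransaction(store, queue) as txn:
    txn.put("user:1", {"cash": 90})
    txn.enqueue(("bump_market_counter", {"market": "m1"}))

# Failed purchase: neither the write nor the task survives.
try:
    with ToyTransaction(store, queue) as txn:
        txn.put("user:1", {"cash": 0})
        txn.enqueue(("bump_market_counter", {"market": "m1"}))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

in the purchase example from the question, the queued task would be the place to update the market's transaction counter in its own entity group.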
in response to your questions: 1) you're not required to use both reference properties and parent/child relationships (ie entity groups). that guideline just means that entity groups limit datastore write throughput, since writes are serialized per entity group. you should be aware of that if you're considering structuring your data into entity groups just for ancestor queries.
2) an entity can't have more than one parent. if you want to model a many-to-many relationship, you should generally use a ListProperty of reference properties (ie keys). see this article and this talk for details.
