I'm still trying to get my head around how the datastore works. I don't have previous experience with databases, so it's not a conflicting paradigm situation; I think I'm just confused about NDB's ancestor structure.
Let's say I have this model class:
class Spam(ndb.Model):
    eggs = ndb.StringProperty()
So I create an instance and store it like this:
foo = Spam(eggs="some string")
foo.put()
I understand that put() returns a key that I could easily call get() on if I'm trying to access it from the same script, but is there a way to specify my own key, so I can easily access the foo entity from another script in my app?
I realize I can specify a parent for foo like this:
foo = Spam(parent=ndb.Key("Bar","Baz"),eggs="some string")
But from there, how would I use "Bar" and/or "Baz" to access foo in a different script?
Parents are used if you have a hierarchy. So if you have recipe books you would put the book as the parent and each recipe as a child. I don't think that's what you want.
If you want to set the key do this:
SuperEggs = Spam(id='SuperEggs', eggs="2 egg whites")
SuperEggs.put()
You can also let App Engine assign the keys itself, which helps avoid contention and accidental overwrites. When you need the entity again, query on one of its properties: for example, add a name property and filter on that.
FYI, the key returned by put() lets you access the entity from any part of your app (or any authorized app). The datastore is global, not specific to a script.
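To make the lookup side concrete, here is a minimal stand-alone sketch in plain Python. It is not the NDB API: `FakeDatastore` and its methods are hypothetical names invented for illustration. The only point is that a key boils down to a kind plus an id, so any script that knows both can rebuild the key without having seen the original put(). In NDB itself the matching calls would be along the lines of `Spam.get_by_id('SuperEggs')` or `ndb.Key(Spam, 'SuperEggs').get()`.

```python
import itertools


class FakeDatastore(object):
    """Toy stand-in for the datastore: a dict keyed by (kind, id)."""

    def __init__(self):
        self._entities = {}
        self._auto_ids = itertools.count(1)

    def put(self, kind, properties, entity_id=None):
        # With no id given, allocate one, as the datastore does by default.
        if entity_id is None:
            entity_id = next(self._auto_ids)
        key = (kind, entity_id)
        self._entities[key] = dict(properties)
        return key

    def get(self, key):
        return self._entities.get(key)


store = FakeDatastore()

# "Script one" stores the entity under an id it chose itself.
store.put("Spam", {"eggs": "2 egg whites"}, entity_id="SuperEggs")

# "Script two" rebuilds the same key from the kind and id alone.
found = store.get(("Spam", "SuperEggs"))

# Without an explicit id, the caller has to keep the returned key around.
auto_key = store.put("Spam", {"eggs": "some string"})
```

With a self-chosen id the second script needs no shared state; with an auto-assigned id it would have to be handed the key, or fall back to a query.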
I'm trying to access a reference property of an ndb PolyModel subclass from a db Expando subclass. My two classes look like this:
class Foo(polymodel.PolyModel):
...
class Bar(db.Expando):
...
foo_reference = db.ReferecnceProperty(None, collection_name='foos')
...
The two definitions are in different files.
I assign the reference the following way:
...
foo = Foo.query.get()
bar.foo_reference = ndb.Key.to_old_key(foo.key)
...
I have no problems doing this. I can see the entry stored in the database in the app engine dashboard, but when I try to access the foo_reference I get a 'No implementation for kind Foo' exception. The problem line looks like this:
foo = bar.foo_reference.get()
I have double-checked all my imports, and I can actually create a Foo entity at the point where I try to access the reference.
Is there some restriction in the db reference properties for referencing ndb?
How do I fix this issue?
Your Bar and Foo classes need to be imported. Until you have imported them, the underlying mechanism for retrieving entities and recreating model instances can't find the class.
Importing them registers each model class under its kind name.
Maybe the code path to the handler that runs the query isn't importing the models.
Looking further at your code, you are also mixing db and ndb, and you have a lot of typos. Why are you using ndb.Key.to_old_key if you are using db for the model definition rather than ndb, or is that another typo?
After reading about the GAE Datastore API, I am still unsure if I need to duplicate key names and parents as properties for an entity.
Let's say there are two kinds of entities: Employee and Division. Each employee has a division as its parent, and is identified by an account name. I use the account name as the key name for employees. But when modeling Employee, I would still keep these two as properties:
division = db.ReferenceProperty(Division)
account_name = db.StringProperty()
Obviously I have to manually keep division consistent with its parent, and account_name with its key name. The reasons I am doing this extra work are:
I am afraid GQL/Datastore API may not support parent and key name as well as normal property. Is there anything I can do about a property but not parent or key name (or are they essentially reference properties)? How do I use key names in GQL queries?
The meaning of key name and parent is not particularly clear. As the names are not self-descriptive, I have to inform other contributors that we use account name as key name...
But this is really unnecessary work, wasting time and storage space. I cannot get rid of the SQL-thinking that - why doesn't Google just let us define a property to be the key? and another to be the parent? Then we could name them and use as normal properties...
What's the best practice here?
Keep in mind that in the GAE Datastore you can never change the parent or key_name of an entity once it has been created. These values are permanent for the life of the entity.
If there is even a small chance that the account_name of an Employee could change then you can not use it as a key_name. If it never changes then it could be a very good key_name and will allow you to do cheap gets for Employees using Employee.get_by_key_name() instead of expensive queries.
Parent is not meant to be equivalent to a foreign key. A better equivalent to a foreign key is a reference property.
The main reason to use parent is that it puts the parent and child entities in the same entity group, which allows you to operate on both in a single transaction. If you just need a reference to the Division from the Employee, use a reference property. I suggest getting familiar with how entity groups work, as this is very important in GAE data modeling:
https://developers.google.com/appengine/docs/python/datastore/entities#Transactions_and_Entity_Groups
Using parent can also cause write performance issues as there is a limit to how quickly you can write to a single entity group (approximately one write per second). When deciding whether to use parent or a reference property you need to think about which entities need to be modified in the same transaction. In many cases you can use Cross Group (XG) transactions instead. It is all about which trade-offs you want to make.
So my suggestions are:
If your account_name for an employee will absolutely never change then use it as a key_name. Otherwise just make it a basic property.
If you need to modify the Employee and the Division in the same transaction (and you can't get this to work with XG transactions) and you will never change the Division of an Employee then make the Division the parent of the Employee. Otherwise just model this relationship with a reference property.
When you create a new Employee object with a Division as its parent, it would go something like:
div = Division()
... #Complete the division properties
div.put()
emp = Employee(key_name=<account_name>, parent=div)
... #Complete the employee properties
emp.put()
Then, when you want to get a reference to the Division an Employee is part of:
div = emp.parent()
#Get the Employee account_name (which is the employee's key name):
account_name = emp.key().name()
You don't have to store a ReferenceProperty to the Division an Employee is part of, since that information is already held in the parent. Additionally, you can get the account_name from the Employee entity's key as needed.
To query on the key:
emp = Employee.get_by_key_name(<account_name>, parent=<division>)
#OR
div = Division.get_by_key_name(<keyname>)
#Get all employees in a division
emps = Employee.all().ancestor(div)
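To see why the parent and key name need not be duplicated as properties, it may help to picture a key as a path of (kind, name) pairs. The sketch below is plain Python, not the db API, and `make_key`, `parent_of` and `descendants` are hypothetical helpers. Under that assumption, an ancestor query is just a prefix match on the path.

```python
def make_key(kind, name, parent=None):
    """Build a key as a tuple of (kind, name) pairs, root first."""
    return (parent or ()) + ((kind, name),)


def parent_of(key):
    """Drop the last path element; a root key has no parent."""
    return key[:-1] or None


def descendants(keys, ancestor, kind=None):
    """Keys whose path starts with the ancestor's path, optionally by kind."""
    n = len(ancestor)
    return [k for k in keys
            if k[:n] == ancestor and (kind is None or k[-1][0] == kind)]


sales = make_key("Division", "sales")
ops = make_key("Division", "ops")
alice = make_key("Employee", "alice", parent=sales)
bob = make_key("Employee", "bob", parent=sales)
carol = make_key("Employee", "carol", parent=ops)
all_keys = [sales, ops, alice, bob, carol]

# Roughly what Employee.all().ancestor(div) selects.
sales_staff = descendants(all_keys, sales, kind="Employee")
```

Because the division is part of the employee's path, the same account name under a different division is a different key. That is also why neither value can change after the entity is created.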
I would like to have two kinds of entities that refer to each other, but Python doesn't yet know the name of the second entity class while the body of the first is being defined.
How should I code this?
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty(reference_class=Business_Info)
class Business_Info(db.Model):
    my_business_ = db.ReferenceProperty(reference_class=Business)
If you advise using a reference in only one of them and the implicitly created property (which is a query object) in the other, then I am concerned about the CPU quota penalty of using a query versus calling get() directly on a key.
Please advise how to write this code in Python.
Queries are a little slower, and so they do use a bit more resources. ReferenceProperty does not require reference_class. So you could always define Business like:
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty()
There may also be better options for your data structure. Check out the modelling relationships article for some ideas.
Is this a one-to-one mapping? If this is a one-to-one mapping, you may be better off denormalizing your data.
Does it ever change? If not (and it is one-to-one), perhaps you could use entity groups and structure your data so that you could just directly use the keys / key names. You might be able to do this by making BusinessInfo a child of Business, then always use 'i' as the key_name. For example:
# put() returns the entity's key, so both variables below hold keys.
business = Business().put()
business_info = BusinessInfo(key_name='i', parent=business).put()
# Get business_info from business:
business_info = db.get(db.Key.from_path('BusinessInfo', 'i', parent=business))
# Get business from business_info:
business = db.get(business_info.parent())
I'm using google app engine with django 1.0.2 (and the django-helper) and wonder how people go about doing recursive delete.
Suppose you have a model that's something like this:
class Top(BaseModel):
    pass
class Bottom(BaseModel):
    daddy = db.ReferenceProperty(Top)
Now, when I delete an object of type 'Top', I want all the associated 'Bottom' objects to be deleted as well.
As things are now, when I delete a 'Top' object, the 'Bottom' objects stay and then I get data that doesn't belong anywhere. When accessing the datastore in a view, I end up with:
Caught an exception while rendering: ReferenceProperty failed to be resolved.
I could of course find all objects and delete them, but since my real model is at least 5 levels deep, I'm hoping there's a way to make sure this can be done automatically.
I've found this article about how it works with Java and that seems to be pretty much what I want as well.
Anyone know how I could get that behavior in django as well?
You need to implement this manually, by looking up affected records and deleting them at the same time as you delete the parent record. You can simplify this, if you wish, by overriding the .delete() method on your parent class to automatically delete all related records.
For performance reasons, you almost certainly want to use key-only queries (allowing you to get the keys of entities to be deleted without having to fetch and decode the actual entities), and batch deletes. For example:
db.delete(Bottom.all(keys_only=True).filter("daddy =", top).fetch(1000))
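For a model several levels deep, the same idea can be applied level by level: collect the keys first, then delete them together. Here is a rough sketch in plain Python with a hypothetical in-memory `ToyStore` rather than the datastore. `children_of` plays the role of the keys-only query and the final loop stands in for one batch delete.

```python
class ToyStore(object):
    """Hypothetical in-memory store: entity id -> id of the entity it references."""

    def __init__(self):
        self.daddy_of = {}

    def add(self, entity_id, daddy=None):
        self.daddy_of[entity_id] = daddy

    def children_of(self, entity_id):
        # Plays the role of a keys-only query filtered on the reference.
        return [e for e, d in self.daddy_of.items() if d == entity_id]

    def delete_cascade(self, entity_id):
        """Collect the entity and all its descendants, then delete them together."""
        doomed, pending = [], [entity_id]
        while pending:
            current = pending.pop()
            doomed.append(current)
            pending.extend(self.children_of(current))
        for e in doomed:  # stands in for a single batch delete call
            del self.daddy_of[e]
        return doomed


store = ToyStore()
store.add("top1")
store.add("top2")
store.add("mid1", daddy="top1")
store.add("bottom1", daddy="mid1")
store.add("bottom2", daddy="top2")

deleted = store.delete_cascade("top1")
```

Collecting before deleting keeps the number of delete calls low and means nothing is removed if the lookup phase fails partway through.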
Actually that behavior is GAE-specific. Django's ORM simulates "ON DELETE CASCADE" on .delete().
I know that this is not an answer to your question, but maybe it can help you from looking in the wrong places.
Reconsider the data structure. If the relationship will never change over the record's lifetime, you could use GAE's "ancestors" feature:
class Top(db.Model): pass
class Middle(db.Model): pass
class Bottom(db.Model): pass
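top = Top()
top.put()  # a parent needs a complete key before children can reference it
middles = [Middle(parent=top) for i in range(0,10)]
db.put(middles)
bottoms = [Bottom(parent=middle) for i in range(0,10) for middle in middles]
db.put(bottoms)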
Then querying for ancestor=top will find all the records from all levels. So it will be easy to delete them.
descendants = list(db.Query().ancestor(top))
# should return [top] + middles + bottoms
If your hierarchy is only a small number of levels deep, then you might be able to do something with a field that looks like a file path:
daddy.ancestry = "greatgranddaddy/granddaddy/daddy/"
me.ancestry = daddy.ancestry + me.uniquename + "/"
sort of thing. You do need unique names, at least unique among siblings.
The path in object IDs sort of does this already, but IIRC that's bound up with entity groups, which you're advised not to use to express relationships in the data domain.
Then you can construct a query to return all of granddaddy's descendants using the initial substring trick, like this:
query = Person.all()
query.filter("ancestry >", gdaddy.ancestry + u"\u0001")
query.filter("ancestry <", gdaddy.ancestry + u"\uffff")
Obviously this is no use if you can't fit the ancestry into a 500 byte StringProperty.
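The initial substring trick relies only on string ordering, so it can be tried out without the datastore. The sketch below is plain Python over a sorted list, with `bisect` standing in for the index scan. `descendants_by_prefix` and the sample names are made up for illustration.

```python
import bisect


def descendants_by_prefix(sorted_paths, prefix):
    """Return the paths strictly between prefix and prefix + U+FFFF."""
    # bisect_right skips an exact match, so the node itself is excluded.
    low = bisect.bisect_right(sorted_paths, prefix)
    high = bisect.bisect_left(sorted_paths, prefix + u"\uffff")
    return sorted_paths[low:high]


paths = sorted([
    "adam/",
    "adam/beth/",
    "adam/beth/carl/",
    "adam/beth/dina/",
    "adam/bethany/",
    "adam/eve/",
    "zed/",
])

beth_line = descendants_by_prefix(paths, "adam/beth/")
adam_line = descendants_by_prefix(paths, "adam/")
```

The trailing "/" on every path matters: without it, "adam/bethany" would fall inside the range for "adam/beth" and be returned as a descendant.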
Is there a way to get the key (or id) value of a db.ReferenceProperty without dereferencing the actual entity it points to? I have been digging around, and it looks like the key is stored under the property name preceded by an _, but I have been unable to get any code working. Examples would be much appreciated. Thanks.
EDIT: Here is what I have unsuccessfully tried:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)
    def series_id(self):
        return self._series
And in my template:
more
The result:
more
Actually, the way you are accessing the key for a ReferenceProperty might well not exist in the future. Attributes that begin with '_' in Python are generally treated as "protected": code that is closely bound to the implementation can use them, but any such code has to change whenever the implementation changes.
However, there is a way through the public interface that you can access the key for your reference-property so that it will be safe in the future. I'll revise the above example:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)
    def series_id(self):
        return Comment.series.get_value_for_datastore(self)
When you access a property via the class it is associated with, you get the property object itself, which has a public method that returns the underlying value.
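The class-versus-instance behaviour comes from Python's descriptor protocol, and a toy version shows it without the SDK. `ToyReferenceProperty` and `fetch_series` below are hypothetical stand-ins, not db.ReferenceProperty itself. Only the method name `get_value_for_datastore` is borrowed from the real API.

```python
class ToyReferenceProperty(object):
    """Hypothetical descriptor that stores a key and resolves it on access."""

    def __init__(self, resolver):
        self.resolver = resolver  # callable: key -> entity

    def __set_name__(self, owner, name):
        self.attr = "_" + name  # private slot on the instance

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access yields the property object itself
        return self.resolver(getattr(instance, self.attr))

    def __set__(self, instance, key):
        setattr(instance, self.attr, key)

    def get_value_for_datastore(self, instance):
        """Return the stored key without resolving it."""
        return getattr(instance, self.attr)


fetches = []


def fetch_series(key):
    fetches.append(key)  # record each simulated datastore read
    return {"key": key, "title": "Series %s" % key}


class Comment(object):
    series = ToyReferenceProperty(fetch_series)

    def series_id(self):
        return Comment.series.get_value_for_datastore(self)


comment = Comment()
comment.series = 42
key_only = comment.series_id()        # no fetch happens here
fetches_after_key = list(fetches)
entity = comment.series               # instance access triggers the fetch
```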
You're correct - the key is stored as the property name prefixed with '_'. You should just be able to access it directly on the model object. Can you demonstrate what you're trying? I've used this technique in the past with no problems.
Edit: Have you tried calling series_id() directly, or referencing _series in your template directly? I'm not sure whether Django automatically calls methods with no arguments if you specify them in this context. You could also try putting the @property decorator on the method.