Recursive delete in google app engine - python

I'm using Google App Engine with Django 1.0.2 (and the django-helper) and wonder how people go about doing recursive deletes.
Suppose you have a model that's something like this:
class Top(BaseModel):
    pass
class Bottom(BaseModel):
    daddy = db.ReferenceProperty(Top)
Now, when I delete an object of type 'Top', I want all the associated 'Bottom' objects to be deleted as well.
As things are now, when I delete a 'Top' object, the 'Bottom' objects stay and then I get data that doesn't belong anywhere. When accessing the datastore in a view, I end up with:
Caught an exception while rendering: ReferenceProperty failed to be resolved.
I could of course find all objects and delete them, but since my real model is at least 5 levels deep, I'm hoping there's a way to make sure this can be done automatically.
I've found this article about how it works with Java and that seems to be pretty much what I want as well.
Anyone know how I could get that behavior in django as well?

You need to implement this manually, by looking up affected records and deleting them at the same time as you delete the parent record. You can simplify this, if you wish, by overriding the .delete() method on your parent class to automatically delete all related records.
For performance reasons, you almost certainly want to use key-only queries (allowing you to get the keys of entities to be deleted without having to fetch and decode the actual entities), and batch deletes. For example:
db.delete(Bottom.all(keys_only=True).filter("daddy =", top).fetch(1000))
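For a hierarchy several levels deep, the same lookup can be repeated level by level before deleting everything in one batch. The following is only a minimal sketch of that idea: a plain dict stands in for the datastore so that it runs anywhere, and `STORE`, `children_of` and `cascade_delete` are hypothetical names, not App Engine API. In real code the lookup would be a keys-only query like the one above and the final loop a single batch delete.

```python
# In-memory stand-in for the datastore: key -> key of the entity it references.
# All names here are illustrative; nothing below is App Engine API.
STORE = {
    "top1": None,
    "mid1": "top1",
    "mid2": "top1",
    "bot1": "mid1",
    "bot2": "mid2",
    "top2": None,
    "mid3": "top2",
}


def children_of(key):
    """Return the keys of entities that reference `key` (the keys-only query)."""
    return [k for k, parent in STORE.items() if parent == key]


def cascade_delete(root):
    """Collect `root` and all of its descendants, then delete them together."""
    doomed = []
    pending = [root]
    while pending:
        key = pending.pop()
        doomed.append(key)
        pending.extend(children_of(key))
    for key in doomed:  # stands in for the batch delete
        del STORE[key]
    return sorted(doomed)


deleted = cascade_delete("top1")
```

Collecting all keys first and deleting once keeps the number of delete calls independent of the depth of the hierarchy.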

Actually that behavior is GAE-specific. Django's ORM simulates "ON DELETE CASCADE" on .delete().
I know that this is not an answer to your question, but maybe it can keep you from looking in the wrong places.

Reconsider the data structure. If the relationship will never change during the record's lifetime, you could use the "ancestors" feature of GAE:
class Top(db.Model): pass
class Middle(db.Model): pass
class Bottom(db.Model): pass
top = Top()
middles = [Middle(parent=top) for i in range(0,10)]
bottoms = [Bottom(parent=middle) for i in range(0,10) for middle in middles]
Then querying for ancestor=top will find all the records from all levels. So it will be easy to delete them.
descendants = list(db.Query().ancestor(top))
# should return [top] + middles + bottoms

If your hierarchy is only a small number of levels deep, then you might be able to do something with a field that looks like a file path:
daddy.ancestry = "greatgranddaddy/granddaddy/daddy/"
me.ancestry = daddy.ancestry + me.uniquename + "/"
or something along those lines. You do need unique names, at least among siblings.
The path in object IDs sort of does this already, but IIRC that's bound up with entity groups, which you're advised not to use to express relationships in the data domain.
Then you can construct a query to return all of granddaddy's descendants using the initial substring trick, like this:
query = Person.all()
query.filter("ancestry >", gdaddy.ancestry + u"\u0001")
query.filter("ancestry <", gdaddy.ancestry + u"\uffff")
Obviously this is no use if you can't fit the ancestry into a 500 byte StringProperty.
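To see why the two inequality filters select exactly the descendants, here is a small sketch that applies the same range comparison to plain Python strings instead of a datastore query. The function name `descendants_of` and the sample paths are made up for illustration.

```python
def descendants_of(prefix, paths):
    """Return the paths strictly below `prefix`, using only range comparisons."""
    low = prefix + "\u0001"   # sorts just after the prefix itself
    high = prefix + "\uffff"  # sorts after anything that starts with the prefix
    return [p for p in paths if low < p < high]


paths = [
    "greatgranddaddy/",
    "greatgranddaddy/granddaddy/",
    "greatgranddaddy/granddaddy/daddy/",
    "greatgranddaddy/granddaddy/daddy/me/",
    "greatgranddaddy/granduncle/",
]
found = descendants_of("greatgranddaddy/granddaddy/", paths)
```

Note that granddaddy's own record is excluded, because its ancestry sorts before the lower bound.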


Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generated invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone suggests something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to call a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, how would you guess that? User table could be linked to table2 too, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also, you can unpack a dictionary into Q() with ** and unpack a list (with *) or a dictionary (with **) into filter(), which is much easier to work with than writing out keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
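The same prefixing idea can be applied to a whole dictionary of lookups at once. This is only a sketch in plain Python, with no Django import, so it runs on its own; `with_context` is a hypothetical helper name, and the resulting dictionary would still have to be unpacked into Q() or filter() by the caller.

```python
def with_context(context, lookups):
    """Prefix every lookup key with `context` so it can be used from a related model."""
    if not context:
        return dict(lookups)
    return {context + "__" + key: value for key, value in lookups.items()}


username_lookups = {"username__startswith": "a"}
profile_lookups = with_context("account", username_lookups)
# With Django available, this could then be passed on as Q(**profile_lookups).
```

The relation path ("account" here) still has to be supplied by hand, as described above; the helper only saves repeating the string concatenation for each field.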

What is the difference between a mongoengine.DynamicEmbeddedDocument vs mongoengine.DictField?

A mongoengine.DynamicEmbeddedDocument can be used to leverage MongoDB's flexible schema-less design. It's expandable and doesn't apply type constraints to the fields, afaik.
A mongoengine.DictField similarly allows for use of MongoDB's schema-less nature. In the documentation they simply say (w.r.t. the DictField)
This is similar to an embedded document, but the structure is not defined.
Does that mean, then, the mongoengine.fields.DictField and the mongoengine.DynamicEmbeddedDocument are completely interchangeable?
EDIT (for more information):
mongoengine.DynamicEmbeddedDocument inherits from mongoengine.EmbeddedDocument which, from the code is:
A mongoengine.Document that isn't stored in its own collection. mongoengine.EmbeddedDocuments should be used as fields on mongoengine.Documents through the mongoengine.EmbeddedDocumentField field type.
A mongoengine.fields.EmbeddedDocumentField is
An embedded document field - with a declared document_type. Only valid values are subclasses of EmbeddedDocument.
Does this mean the only thing that makes the DictField and DynamicEmbeddedDocument not totally interchangeable is that the DynamicEmbeddedDocument has to be defined through the EmbeddedDocumentField field type?
From what I’ve seen, the two are similar, but not entirely interchangeable. Each approach may have a slight advantage based on your needs. First of all, as you point out, the two approaches require differing definitions in the document, as shown below.
class ExampleDynamicEmbeddedDoc(DynamicEmbeddedDocument):
    pass
class ExampleDoc(Document):
    dict_approach = DictField()
    # Pass the class as a callable default so each document gets its own instance.
    dynamic_doc_approach = EmbeddedDocumentField(ExampleDynamicEmbeddedDoc, default=ExampleDynamicEmbeddedDoc)
Note: The default is not required, but the dynamic_doc_approach field will need to be set to an ExampleDynamicEmbeddedDoc object in order to save (i.e. trying to save after setting example_doc_instance.dynamic_doc_approach = {} would throw an exception). Also, you could use the GenericEmbeddedDocumentField if you don’t want to tie the field to a specific type of EmbeddedDocument, but the field would still need to point to an object subclassed from EmbeddedDocument in order to save.
Once set up, the two are functionally similar in that you can save data to them as needed and without restrictions:
e = ExampleDoc()
e.dict_approach["test"] = 10
e.dynamic_doc_approach.test = 10
However, the one main difference that I’ve seen is that you can query against any values added to a DictField, whereas you cannot with a DynamicEmbeddedDoc.
ExampleDoc.objects(dict_approach__test = 10) # Returns a QuerySet containing our entry.
ExampleDoc.objects(dynamic_doc_approach__test = 10) # Throws an exception.
That being said, using an EmbeddedDocument has the advantage of validating fields which you know will be present in the document. (We simply would need to add them to the ExampleDynamicEmbeddedDoc definition). Because of this, I think it is best to use a DynamicEmbeddedDocument when you have a good idea of a schema for the field and only anticipate adding fields minimally (which you will not need to query against). However, if you are not concerned about validation or anticipate adding a lot of fields which you’ll query against, go with a DictField.

Model relationship to html template

I've been struggling with this issue for a few hours - I know there's probably a simple solution that I'm overlooking.
I have a one-to-many relationship between my models.
I need to return all rows of one object together with the rows of the related object.
In a sense I have this:
object
object
    object_relationship.property
    object_relationship.property
object
    object_relationship.property
object
Now - I can run through all of these fine, but I run into an issue when I want to send these back to the html template.
I can send the object back - but how do I send the object_relationship back in the order that I have it above?
Does this make sense?
You might not need to worry too much about this, actually... look at these models:
class Venue(base.NamedEntity, HasPerformances, HasUrl, HasLocation):
    city = db.ReferenceProperty(City, collection_name='venues')
    url = db.StringProperty(required=True, validator=validators.validate_url)
    location = db.GeoPtProperty()
class Performance(base.Entity):
    show = db.ReferenceProperty(Show, collection_name='performances', required=True)
    utc_date_time = db.DateTimeProperty(required=True)
    venue = db.ReferenceProperty(Venue, collection_name='performances', required=True)
In a case like this, nothing stops you from using venue.performances from either code or templates and treating it as a list. The API will automatically fire queries as needed to fetch the actual objects. The same thing goes for performance.venue.
The only problem here is performance - you've got a variant of the n+1 problem to deal with. There are workarounds, though, like this article by Nick Johnson. I'd suggest reading the API code too... it makes for interesting reading how the property get is captured and dereferenced.
My first suggestion is to denormalize the data if you are going to do many reports like that. For example, maybe you could include object.name on the object_relationship entity.
That said, you could send a list of dicts to your template, so maybe something like:
data = []
for entity in your_query:
    children = [{'name': child.name} for child in entity.object_relation]
    data.append({'name': entity.name,
                 'children': children,
                 # ...
                 })
Then pass the data list to your template, and process it.
Please note, this will perform very badly. It will execute another query for every one of the items in your first query. Use Appstats to profile your app.
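One common way to avoid the per-item query, assuming the children can all be fetched in a single query of their own, is to group them by parent in memory and then build the same list of dicts. The sketch below uses plain tuples as stand-ins for fetched entities; `build_template_data` and the sample data are made up for illustration.

```python
from collections import defaultdict

# Stand-ins for two query results: (key, name) for parents,
# (parent_key, name) for children.
parents = [(1, "first"), (2, "second"), (3, "third")]
children = [(2, "a"), (2, "b"), (3, "c")]


def build_template_data(parents, children):
    """Group children under their parent without one lookup per parent."""
    by_parent = defaultdict(list)
    for parent_key, name in children:
        by_parent[parent_key].append({"name": name})
    return [
        {"name": name, "children": by_parent.get(key, [])}
        for key, name in parents
    ]


data = build_template_data(parents, children)
```

The template can then loop over `data` and, inside that loop, over each item's `children`, which gives the nested order shown in the question.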

Concurrent access to datastore in App Engine

I want to know whether db.run_in_transaction() acts as a lock for datastore operations
and helps in the case of concurrent access to the same entity.
Is it guaranteed that, in the following code, concurrent access will not cause a race, so that an existing entity is never overwritten instead of a new one being created?
Is db.run_in_transaction() the correct/best way to do this?
In the following code I am trying to create a new unique entity:
def txn(charmer=None):
    new = None
    key = my_magic() + random_part()
    sk = Snake.get_by_key_name(key)
    if not sk:
        new = Snake(key_name=key, charmer=charmer)
        new.put()
    return new
db.run_in_transaction(txn, charmer)
That is a safe method. Should the same name get generated twice, only one entity would be created.
It sounds like you have already looked at the transactions documentation. There is also a more detailed description.
Check out the docs (specifically the equivalent code) on Model.get_or_insert, it answers exactly the question you are asking:
The get and subsequent (possible) put
are wrapped in a transaction to ensure
atomicity. This means that
get_or_insert() will never overwrite
an existing entity, and will insert a
new entity if and only if no entity
with the given kind and name exists.
What you've done is right and sort of duplicates the Model.get_or_insert, like Robert already explained.
I don't know if this can be called a 'lock'... the way this works is optimistic concurrency - the operation will execute assuming that no one else is trying to do the same thing at the same time, and if someone is, it will give you an exception. You'll need to figure out what you want to do in that case. Maybe ask the user to choose a new name?

App Engine, Cross reference between two entities

I would like to have two types of entities referring to each other,
but Python doesn't know the name of the second entity class yet in the body of the first.
So how should I code this?
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty(reference_class=Business_Info)
class Business_Info (db.Model):
    my_business_ = db.ReferenceProperty(reference_class=Business)
If you advise using a reference in only one of them and using the implicitly created property
(which is a query object) in the other,
then I question the CPU quota penalty of using a query versus directly using get() on a key.
Please advise how to write this code in Python.
Queries are a little slower, and so they do use a bit more resources. ReferenceProperty does not require reference_class. So you could always define Business like:
class Business(db.Model):
bus_contact_info_ = db.ReferenceProperty()
There may also be better options for your data structure. Check out the modelling relationships article for some ideas.
Is this a one-to-one mapping? If this is a one-to-one mapping, you may be better off denormalizing your data.
Does it ever change? If not (and it is one-to-one), perhaps you could use entity groups and structure your data so that you could just directly use the keys / key names. You might be able to do this by making BusinessInfo a child of Business, then always use 'i' as the key_name. For example:
business = Business().put()
business_info = BusinessInfo(key_name='i', parent=business).put()
# Get business_info from business:
business_info = db.get(db.Key.from_path('BusinessInfo', 'i', parent=business))
# Get business from business_info:
business = db.get(business_info.parent())
