Get existing or create new App Engine Syntax - python

I came across this syntax while browsing through code for examples. From its surrounding code, it looked like it would a) get the entity with the given key name or b) if the entity did not exist, create a new entity that could be saved. Assume my model class is called MyModel.
my_model = MyModel(key_name='mymodelkeyname',
                   kwarg1='first arg', kwarg2='second arg')
I'm now running into issues, but only in certain situations. Is my assumption about what this snippet does correct? Or should I always do the following?
my_model = MyModel.get_by_key_name('mymodelkeyname')
if not my_model:
    my_model = MyModel(key_name='mymodelkeyname',
                       kwarg1='first arg', kwarg2='second arg')
else:
    # do something with my_model
    pass

The constructor, which is what you're using, always constructs a new entity. When you store it, it overwrites any other entity with the same key.
The alternate code you propose also has an issue: it's susceptible to race conditions. Two instances of that code running simultaneously could both determine that the entity does not exist, and each create it, resulting in one overwriting the work of the other.
What you want is the Model.get_or_insert method, which is syntactic sugar for this:
def get_or_insert(cls, key_name, **kwargs):
    def _tx():
        model = cls.get_by_key_name(key_name)
        if not model:
            model = cls(key_name=key_name, **kwargs)
            model.put()
        return model
    return db.run_in_transaction(_tx)
Because the get operation and the conditional insert take place in a transaction, the race condition is not possible.
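The effect of doing the lookup and the conditional insert as one atomic unit can be sketched without App Engine. This is only an illustration under stated assumptions: `InMemoryStore` and its lock are hypothetical stand-ins for the datastore and for `db.run_in_transaction`, not part of any App Engine API.

```python
import threading

class InMemoryStore:
    """Hypothetical stand-in for the datastore; not an App Engine API."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # plays the role of the transaction
        self.inserts = 0

    def get_or_insert(self, key_name, **kwargs):
        # The lookup and the conditional insert happen under one lock,
        # so no other thread can insert between the check and the write.
        with self._lock:
            entity = self._entities.get(key_name)
            if entity is None:
                entity = dict(kwargs, key_name=key_name)
                self._entities[key_name] = entity
                self.inserts += 1
            return entity

store = InMemoryStore()
results = []

def worker(n):
    results.append(
        store.get_or_insert('mymodelkeyname', kwarg1='from worker %d' % n))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads interleave, only one of them creates the entity and the rest get back that same entity.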

Is this what you are looking for -> http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert

Related

"Updater" design pattern, as opposed to "Builder"

This is actually language agnostic, but I always prefer Python.
The builder design pattern is used to validate that a configuration is valid prior to creating an object, via delegation of the creation process.
Some code to clarify:
class A():
    def __init__(self, m1, m2):  # obviously more complex in life
        self._m1 = m1
        self._m2 = m2
class ABuilder():
    def __init__(self):
        self._m1 = None
        self._m2 = None
    def set_m1(self, m1):
        self._m1 = m1
        return self
    def set_m2(self, m2):
        self._m2 = m2
        return self
    def _validate(self):
        # complicated validations
        assert self._m1 < 1000
        assert self._m1 < self._m2
    def build(self):
        self._validate()
        return A(self._m1, self._m2)
My problem is similar, with the extra constraint that I can't re-create the object each time due to performance limitations.
Instead, I want to only update an existing object.
Bad solutions I came up with:
I could do as suggested here and just use setters like so
class A():
    ...
    def set_m1(self, m1):
        self._m1 = m1
    # and so on
But this is bad because using setters:
Defeats the purpose of encapsulation
Defeats the purpose of the builder (now updater), which is supposed to validate that some complex configuration is preserved after the creation, or update in this case.
As I mentioned earlier, I can't recreate the object every time, as this is expensive and I only want to update some fields, or sub-fields, and still validate or sub-validate.
I could add update and validation methods to A and call those, but this beats the purpose of delegating the responsibility of updates, and is intractable in the number of fields.
class A():
    ...
    def update1(self, m1):
        pass  # complex_logic1
    def update2(self, m2):
        pass  # complex_logic2
    def update12(self, m1, m2):
        pass  # complex_logic12
I could just force to update every single field in A in a method with optional parameters
class A():
    ...
    def update(self, m1=None, m2=None):  # one optional parameter per field of A
        pass
Which again is not tractable, as this method will soon become a god method due to the many combinations possible.
Forcing the method to always accept changes in A, and validating in the Updater, also can't work, as the Updater will need to look at A's internal state to make a decision, causing a circular dependency.
How can I delegate updating fields in my object A in a way that:
Doesn't break encapsulation of A
Actually delegates the responsibility of updating to another object
Is tractable as A becomes more complicated
I feel like I am missing something trivial to extend building to updating.
I am not sure I understand all of your concerns, but I want to try and answer your post. From what you have written I assume:
Validation is complex and multiple properties of an object must be checked to decide if any change to the object is valid.
The object must always be in a valid state. Changes that make the object invalid are not permitted.
It is too expensive to copy the object, make the change, validate the object, and then reject the change if the validation fails.
Move the validation logic out of the builder and into a separate class like ModelValidator with an isValid(model) method.
The first option is to use a command pattern.
Create an abstract class or interface named Update (Python has no interface construct, but the abc module provides abstract base classes, and plain duck typing is fine too). The Update interface declares two methods, execute() and undo().
A concrete class has a name like UpdateAddress, UpdatePortfolio, or UpdatePaymentInfo.
Each concrete Update object also holds a reference to your model object.
The concrete classes hold the state needed for a particular kind of update. Imagine these methods exist on the UpdateAddress class:
UpdateAddress
    setStreetNumber(...)
    setCity(...)
    setPostcode(...)
    setCountry(...)
The update object needs to hold both the current and new values of a property. Like:
def setStreetNumber(self, aString):
    self.oldStreetNumber = self.model.getStreetNumber()
    self.newStreetNumber = aString
When the execute method is called, the model is updated:
def execute(self):
    self.model.setStreetNumber(self.newStreetNumber)
    self.model.setCity(self.newCity)
    # Set postcode and country
    if not ModelValidator.isValid(self.model):
        self.undo()
        raise ValidationError
and the undo method looks like:
def undo(self):
    self.model.setStreetNumber(self.oldStreetNumber)
    self.model.setCity(self.oldCity)
    # Set postcode and country
That is a lot of typing, but it would work. Mutating your model object is nicely encapsulated by different kinds of updates. You can execute or undo the changes by calling those methods on the update object. You can even store a list of update objects for multi-level undos and re-tries.
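Put together, the command-style update might look like the following. Treat it as a rough sketch rather than a definitive implementation: the `Address` model, its two fields, and the `is_valid` rule are all made up for illustration, and the real validation would be far more involved.

```python
class ValidationError(Exception):
    pass

class Address:
    """Hypothetical model object; field names are illustrative."""
    def __init__(self, street_number, city):
        self.street_number = street_number
        self.city = city

def is_valid(address):
    # Placeholder rule standing in for the complex validation.
    return bool(address.street_number) and bool(address.city)

class UpdateAddress:
    def __init__(self, model):
        self.model = model
        # Remember the current values so the update can be rolled back.
        self.old = {'street_number': model.street_number, 'city': model.city}
        self.new = dict(self.old)

    def set_street_number(self, value):
        self.new['street_number'] = value
        return self

    def set_city(self, value):
        self.new['city'] = value
        return self

    def _apply(self, values):
        for name, value in values.items():
            setattr(self.model, name, value)

    def execute(self):
        # Mutate the model in place, then roll back if it became invalid.
        self._apply(self.new)
        if not is_valid(self.model):
            self.undo()
            raise ValidationError('update rejected')

    def undo(self):
        self._apply(self.old)

home = Address('12', 'Leeds')
UpdateAddress(home).set_city('York').execute()   # valid, so it sticks

try:
    UpdateAddress(home).set_city('').execute()   # invalid, so it is undone
    rejected = False
except ValidationError:
    rejected = True
```

After the second update is rejected, `home` still holds the values from the first one, and the model is never copied.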
To cut down on the typing, consider using persistent data structures. They can be used to copy objects very quickly, in approximately constant time. Here is a Python library of persistent data structures.
Let's assume your data was in a persistent data structure version of a dict. The library I referenced calls it a PMap.
The implementation of the update classes can be simpler. Starting with the constructor:
def __init__(self, pmap):
    self.oldPmap = pmap
    self.newPmap = pmap
The setters are easier:
def setStreetNumber(self, aString):
    self.newPmap = self.newPmap.set('streetNumber', aString)
Execute passes back a new instance of the model, with all the updates.
def execute(self):
    if ModelValidator.isValid(self.newPmap):
        return self.newPmap
    else:
        raise ValidationError
The original object has not changed at all, thanks to the magic of persistent data structures.
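The same idea can be tried with only the standard library. Note the swap: this sketch uses a read-only `types.MappingProxyType` over a freshly copied dict instead of a real persistent map, so each update copies the whole dict rather than sharing structure. The field names and the `is_valid` rule are assumptions for illustration.

```python
from types import MappingProxyType

class ValidationError(Exception):
    pass

def is_valid(address):
    # Placeholder rule standing in for the complex validation.
    return bool(address['city'])

class UpdateAddress:
    def __init__(self, mapping):
        self.old = mapping
        self.new = mapping

    def set(self, key, value):
        # Build a new read-only mapping; the original is never mutated.
        # Unlike a true persistent map, this copies the whole dict.
        self.new = MappingProxyType({**self.new, key: value})
        return self

    def execute(self):
        if not is_valid(self.new):
            raise ValidationError('update rejected')
        return self.new

original = MappingProxyType({'streetNumber': '12', 'city': 'Leeds'})
updated = UpdateAddress(original).set('city', 'York').execute()

try:
    UpdateAddress(original).set('city', '').execute()
    rejected = False
except ValidationError:
    rejected = True
```

No undo method is needed here: a rejected update simply never replaces `original`.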
The best thing is to not do any of this. Instead, use an ORM or object database. That is the "enterprise grade" solution. These libraries give you sophisticated tools like transactions and object version history.

How to query with the ID of the parent instance using **kwargs in Django?

I am currently implementing soft deletion for all models in my database. The idea is that when an instance gets deleted, it actually gets archived with all of its children. If the user tries to create an instance that is identical to the archived one, the archived one gets undeleted along with all of its children instead of creating a new instance.
To do this, I am using django-safedelete, where I am making a BaseModel with an overridden save() method that looks something like this:
def save(self, *args, **kwargs):
    # get the foreign key id
    foreign_key_id = self.foreign_field.id
    # execute a query by that id and some other params
    '''I don't know how to do this'''
As to how to do it, I thought I could construct a kwargs dictionary that consists of pairs of <field>:value where <field> = self._meta.get_field(some_field.name) and value = getattr(self, some_field.name).
So how do I add the foreign_key_id to kwargs? I know there is this syntax: Model.objects.filter(foreign_field__id=value)
...but I don't know how to replicate that to put into kwargs the way I'm doing it.
Likewise, is there a better way to do this in general? I don't want to hard-code too many things, which is why I didn't just do this individually for each of the models that I have.
Thank you so much in advance.
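One way to think about the kwargs part: a lookup such as foreign_field__id is just a string key, so it can sit in a dict and be unpacked with `**`. The sketch below shows only that mechanism in plain Python. `fake_filter`, `build_lookups`, `Parent` and `Child` are hypothetical stand-ins; with a real model the same dict would be passed as `Model.objects.filter(**lookups)`.

```python
def fake_filter(**lookups):
    """Hypothetical stand-in for Model.objects.filter; echoes what it got."""
    return lookups

def build_lookups(instance, field_names, foreign_field='foreign_field'):
    # Plain fields map name -> current value on the instance.
    lookups = {name: getattr(instance, name) for name in field_names}
    # The double-underscore lookup is just a string key in the dict.
    lookups[foreign_field + '__id'] = getattr(instance, foreign_field).id
    return lookups

class Parent:
    id = 7

class Child:
    name = 'example'
    foreign_field = Parent()

lookups = build_lookups(Child(), ['name'])
received = fake_filter(**lookups)
```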

Idiomatic python - property or method?

I have a django model class that maintains state as a simple property. I have added a couple of helper properties to the class to access aggregate states - e.g. is_live returns false if the state is any one of ['closed', 'expired', 'deleted'] etc.
As a result of this my model has a collection of is_ properties, that do very simple lookups on internal properties of the object.
I now want to add a new property, is_complete - which is semantically the same as all the other properties - a boolean check on the state of the object - however, this check involves loading up dependent (one-to-many) child objects, checking their state and reporting back based on the results - i.e. this property actually does some (more than one) database query, and processes the results.
So, is it still valid to model as a property (using the @property decorator), or should I instead forgo the decorator and leave it as a method?
Pro of using a property is that it's semantically consistent with all the other is_ properties.
Pro of using a method is that it indicates to other developers that this is something that has a more complex implementation, and so should be used sparingly (i.e. not inside a for.. loop).
from django.db import models
class MyModel(models.Model):
    state = models.CharField(default='new')
    @property
    def is_open(self):
        # this is a simple lookup, so makes sense as a property
        return self.state in ['new', 'open', 'sent']
    def is_complete(self):
        # this is a complex database activity, but semantically correct
        related_objects = self.do_complicated_database_lookup()
        return len(related_objects)==0
EDIT: I come from a .NET background originally, where the split is admirably defined by Jeff Atwood as
"if there's any chance at all that code could spawn an hourglass, it definitely should be a method."
EDIT 2: slight update to the question - would it be a problem to have it as a method, called is_complete, so that there are mixed properties and methods with similar names - or is that just confusing?
So - it would look something like this:
>>> m = MyModel()
>>> m.is_live
True
>>> m.is_complete()
False
It is okay to do that, especially if you use the following pattern:
class SomeClass(models.Model):
    @property
    def is_complete(self):
        if not hasattr(self, '_is_complete'):
            related_objects = self.do_complicated_database_lookup()
            self._is_complete = len(related_objects) == 0
        return self._is_complete
Just remember that it "caches" the result: the first access runs the calculation, and subsequent accesses reuse the stored value, even if the underlying data has changed in the meantime.
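On Python 3.8+ the standard library's `functools.cached_property` gives the same compute-once behaviour with less code (Django ships a similar `django.utils.functional.cached_property`). A small sketch, where `Report` and `do_complicated_lookup` are hypothetical stand-ins for the model and its database query:

```python
from functools import cached_property

class Report:
    def __init__(self):
        self.lookups = 0

    def do_complicated_lookup(self):
        # Hypothetical stand-in for the expensive database query.
        self.lookups += 1
        return []

    @cached_property
    def is_complete(self):
        # Runs once per instance; the result is then stored on the instance.
        return len(self.do_complicated_lookup()) == 0

report = Report()
first = report.is_complete
second = report.is_complete
```

The counter shows that the lookup ran only once even though the property was read twice.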

How can I create a new model entity, and then read it immediately after?

My question is, what is the best way to create a new model entity, and then read it immediately after. For example,
class LeftModel(ndb.Model):
    name = ndb.StringProperty(default = "John")
    date = ndb.DateTimeProperty(auto_now_add=True)
class RightModel(ndb.Model):
    left_model = ndb.KeyProperty(kind=LeftModel)
    interesting_fact = ndb.StringProperty(default = "Nothing")
def do_this(self):
    # Create a new model entity
    new_left = LeftModel()
    new_left.name = "George"
    new_left.put()
    # Retrieve the entity just created
    current_left = LeftModel.query().filter(LeftModel.name == "George").get()
    # Create a new entity which references the entity just created and retrieved
    new_right = RightModel()
    new_right.left_model = current_left.key
    new_right.interesting_fact = "Something"
    new_right.put()
This quite often throws an exception like:
AttributeError: 'NoneType' object has no attribute 'key'
I.e. the retrieval of the new LeftModel entity was unsuccessful. I've faced this problem a few times with appengine and my solution has always been a little hacky. Usually I just put everything in a try except or a while loop until the entity is successfully retrieved. How can I ensure that the model entity is always retrieved without running the risks of infinite loops (in the case of the while loop) or messing up my code (in the case of the try except statements)?
Why are you trying to fetch the object via a query immediately after you have performed the put()?
You should use the new_left you just created and assign its key to new_right directly, as in new_right.left_model = new_left.key
The reason you cannot query immediately is that the High Replication Datastore (HRD) uses an eventual consistency model, which means the result of the put will become visible to queries eventually, not at once. If you want a consistent result, you must perform ancestor queries, and this implies an ancestor in the key on creation. Given that you are creating a tree, this is probably not practical. Have a read about Structuring Data for Strong Consistency: https://developers.google.com/appengine/docs/python/datastore/structuring_for_strong_consistency
I don't see any reason why you just don't use the entity you just created without the additional query.
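To make the difference concrete, here is a toy simulation of a store whose query index lags behind its writes. It is not the ndb API: `FakeDatastore` and all of its methods are invented for illustration. The one real detail it leans on is that put() returns the key of the entity it stored, so the key is available without any query.

```python
class FakeDatastore:
    """Hypothetical stand-in that mimics a lagging query index."""

    def __init__(self):
        self._by_key = {}
        self._index = []      # keys that queries can currently see
        self._pending = []    # writes not yet visible to queries
        self._next_id = 1

    def put(self, entity):
        key = self._next_id
        self._next_id += 1
        self._by_key[key] = entity
        self._pending.append(key)   # the index update is deferred
        return key                  # the key is available immediately

    def get(self, key):
        return self._by_key[key]    # lookup by key sees the write at once

    def query_first(self, name):
        for key in self._index:
            if self._by_key[key].get('name') == name:
                return key
        return None

    def catch_up(self):
        self._index.extend(self._pending)
        self._pending = []

store = FakeDatastore()
left_key = store.put({'name': 'George'})
stale = store.query_first('George')   # the index has not caught up yet
right_key = store.put({'left_model': left_key,
                       'interesting_fact': 'Something'})
store.catch_up()
fresh = store.query_first('George')
```

The query straight after the put finds nothing, which is where the 'NoneType' error comes from, while the key returned by put() can be used right away.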

Does a django query save its result after it's been called?

I'm trying to determine whether or not a simple caching trick will actually be useful. I know Django querysets are lazy to improve efficiency, but I'm wondering if they save the result of their query after the data has been called.
For instance, if I have two models:
class Klass1(models.Model):
    k2 = models.ForeignKey('Klass2')
class Klass2(models.Model):
    # Model Code ...
    @property
    def klasses(self):
        self.klasses = Klass1.objects.filter(k2=self)
        return self.klasses
And I call klass_2_instance.klasses[:] somewhere, then the database is accessed and returns a query. I'm wondering if I call klass_2_instance.klasses again, will the database be accessed a second time, or will the django query save the result from the first call?
Django will not cache it for you.
Instead of Klass1.objects.filter(k2=self), you could just do self.klass1_set.all(), because by default Django creates a reverse manager for the "many" side of a one-to-many relation.
I guess this kind of cache is complicated because it should remember all filters, excludes and order_by used. Although it could be done using any well designed hash, you should at least have a parameter to disable cache.
If you would like any cache, you could do:
class Klass2(models.Model):
    def __init__(self, *args, **kwargs):
        self._klass1_cache = None
        super(Klass2, self).__init__(*args, **kwargs)
    def klasses(self):
        if self._klass1_cache is None:
            # Here you can't remove list(..) because it is forcing query execution exactly once.
            self._klass1_cache = list(self.klass1_set.all())
        return self._klass1_cache
This is very useful when you loop over all the related objects many times. For me it often happens in templates, when I need to loop more than once.
This query isn't cached by Django.
The forwards FK relationship - i.e. given a Klass1 object klass1, doing klass1.k2 - is cached after the first lookup. But the reverse, which you're doing here - and which is actually usually spelled klass2.klass1_set.all() - is not cached.
You can easily memoize it:
@property
def klasses(self):
    if not hasattr(self, '_klasses'):
        self._klasses = self.klass1_set.all()
    return self._klasses
(Note that your existing code won't work, as you're overriding the method klasses with an attribute.)
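The problem with assigning to the property's own name is easy to reproduce in plain Python, no Django needed. In this sketch `Broken` mirrors the question's code and `Fixed` mirrors the memoized version, with a plain list standing in for the queryset (an assumption for illustration):

```python
class Broken:
    @property
    def klasses(self):
        # A property without a setter is read-only, so this line raises.
        self.klasses = ['a', 'b']
        return self.klasses

class Fixed:
    @property
    def klasses(self):
        if not hasattr(self, '_klasses'):
            self._klasses = ['a', 'b']  # stand-in for the queryset
        return self._klasses

try:
    Broken().klasses
    error = None
except AttributeError as exc:
    error = exc

fixed = Fixed()
same_object = fixed.klasses is fixed.klasses
```

Storing the value under a different name such as `_klasses` avoids the clash, and repeated reads then return the very same object.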
Try using johnny-cache if you want transparent caching of querysets.
