Using GQL to get all new results since the previous result - python

I'm pretty useless when it comes to queries, so I'm wondering what the correct structure for this problem is.
Clients are sent data including the key of the object, they use the key to tell the server what was the most recent object they downloaded.
I want to get all objects since that point, the objects have an automatic date attribute.
Additionally, I want to be able to give the 15 (or so) most recent objects to new users who may request using a specific 'new user' key or something similar.
I'm using the Python 2.7 runtime and have never used GQL before.
Any help is greatly appreciated.
The Model Class is this:
class Message(db.Model):
    user = db.StringProperty()
    content = db.TextProperty()
    colour = db.StringProperty()
    room = db.StringProperty()
    date = db.DateTimeProperty(auto_now_add=True)

If lastkey is a db.Key object or a string representation of a key, using the db (as opposed to the ndb) API:
last_message = Message.get(lastkey)
If you have the key in another representation, such as the key name:
last_message = Message.get_by_key_name(lastkey)
If you have the key as the numeric ID of the object:
last_message = Message.get_by_id(int(lastkey))
Then, you can get the messages since that last message as follows:
messages_since_last_message = Message.all().filter('date >', last_message.date).order('date')
#OR GQL:
messages_since_last_message = Message.gql("WHERE date > :1 ORDER BY date ASC", last_message.date)
You might want to use the >= comparator instead, because multiple messages could arrive at the same exact time; then filter out, client-side, all messages up to and including the last key you are looking for (whether this matters depends on your use case and how close together messages can be written). Additionally, with the High Replication Datastore there is eventual consistency, so your query is not guaranteed to accurately reflect the datastore unless you use ancestor queries. Ancestor queries limit you to a single entity group (a root entity together with all of its descendants) and cap writes to that group at roughly one per second, which again, depending on your use case, could be a non-issue.
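The >=-then-filter approach can be sketched as follows. This is a hypothetical sketch: the drop_seen helper, the fetch limit of 100, and the variable names are assumptions, not part of the original code.

```python
def drop_seen(keyed_entities, last_key):
    """Return entities newer than the one whose key equals last_key.

    keyed_entities is a list of (key, entity) pairs in date order, so the
    helper stays datastore-agnostic; if last_key is absent, everything is
    treated as new.
    """
    items = list(keyed_entities)
    for i, (key, _) in enumerate(items):
        if key == last_key:
            return [e for _, e in items[i + 1:]]
    return [e for _, e in items]

# On App Engine (names assumed), query with >= so messages sharing the
# last message's timestamp aren't missed, then drop the already-seen ones:
#   candidates = Message.gql("WHERE date >= :1 ORDER BY date ASC",
#                            last_message.date).fetch(100)
#   fresh = drop_seen([(m.key(), m) for m in candidates],
#                     last_message.key())
# And for brand-new users, the 15 most recent messages:
#   latest = Message.all().order('-date').fetch(15)
```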

How to check the existence of a single Entity? Google App Engine, Python

Sorry for a noobish question again.
I'm trying to do some very easy stuff here, and I don't know how. The documentation gives me hints which don't work or don't apply.
I receive a POST request and grab a variable out of it, called "name".
I have to search all my Object entities (for example) and find out if one has the same name. If there's none, I must create a new entity with this name. Easy as it may look, I keep failing.
Would really appreciate any help.
My code currently is this one:
objects_qry = Object.query(Object.name == data["name"])
if (not objects_qry):
    obj = Object()
    obj.name = data["name"]
    obj.put()

class Object(ndb.Model):
    name = ndb.StringProperty()
Using a query to perform this operation is really inefficient.
In addition, your code is possibly unreliable: if the name doesn't exist and you get two simultaneous requests for it, you could end up with two records. And you can't tell, because your query only returns the first entity whose name property equals that value.
Because you expect only one entity per name, a query is expensive and inefficient.
So you have two choices: use get_or_insert, or just do a get and, if there is no value, create a new entity.
Anyway, here are a couple of code samples using the name as part of the key.
name = data['name']
entity = Object.get_or_insert(name)
or
entity = Object.get_by_id(name)
if not entity:
    entity = Object(id=name)
    entity.put()
Calling .query just creates a query object; it doesn't execute it, so trying to evaluate it as a boolean is wrong. Query objects have methods, fetch and get, that return, respectively, a list of matching entities or just one entity.
So your code could be re-written:
objects_qry = Object.query(Object.name == data["name"])
existing_object = objects_qry.get()
if not existing_object:
    obj = Object()
    obj.name = data["name"]
    obj.put()
That said, Tim's point in the comments about using the ID instead of a property makes sense if you really care about names being unique - the code above wouldn't stop two simultaneous requests from creating entities with the same name.
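To make that uniqueness argument concrete, here is a toy, in-memory sketch of get_or_insert's contract; the dict stands in for the datastore, and on App Engine the atomicity comes from a transaction keyed on the id rather than from Python.

```python
def get_or_insert(store, key, factory):
    """Toy model of Model.get_or_insert: look up by id, create only if
    absent, and always return the single record for that id. Because the
    lookup and the create happen as one atomic step, two concurrent
    callers with the same name can never produce two records."""
    if key not in store:
        store[key] = factory()
    return store[key]

store = {}
first = get_or_insert(store, 'widget', lambda: {'name': 'widget'})
second = get_or_insert(store, 'widget', lambda: {'name': 'widget'})
# first and second are the same record; only one entry exists.
```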

appengine many to many field update value and lookup efficiently

I am using appengine with python 2.7 and webapp2 framework. I am not using ndb.model.
I have the following model:
class Story(db.Model):
    name = db.StringProperty()

class UserProfile(db.Model):
    name = db.StringProperty()
    user = db.UserProperty()

class Tracking(db.Model):
    user_profile = db.ReferenceProperty(UserProfile)
    story = db.ReferenceProperty(Story)
    upvoted = db.BooleanProperty()
    flagged = db.BooleanProperty()
A user can upvote and/or flag a story but only once. Hence I came up with the above model.
Now, when a user clicks the upvote link, I check the database to see whether the user has already voted. To do that I:
get the user instance from its id: up = db.get(db.Key.from_path('UserProfile', uid))
then get the story instance: s_ins = db.get(db.Key.from_path('Story', uid))
Next, I check whether a Tracking for these two exists; if it does, I don't allow voting, otherwise I allow the vote and update the Tracking instance.
What is the most convenient way to fetch a Tracking instance given an id (db.key().id()) of a user_profile and a story?
What is the most convenient way to save a Tracking model given a user profile id and a story id?
Is there a better way to implement tracking?
You can try tracking using lists of keys versus having a separate entry for track/user/story:
class Story(db.Model):
    name = db.StringProperty()

class UserProfile(db.Model):
    name = db.StringProperty()
    user = db.UserProperty()

class Tracking(db.Model):
    story = db.ReferenceProperty(Story)
    upvoted = db.ListProperty(db.Key)
    flagged = db.ListProperty(db.Key)
So when you want to see if a user upvoted for a given story:
Tracking.all(keys_only=True).filter('story =', db.Key.from_path('Story', uid)).filter('upvoted =', db.Key.from_path('UserProfile', uid)).get()
Now the only problem here is the size of the upvoted/flagged lists can't grow too large (I think the limit is 5000), so you'd have to make a class to manage this (that is, when adding to the upvoted/flagged lists, detect if X entries exists, and if so, start a new tracking object to hold additional values). You will also have to make this transactional and with HR you have a 1 write per second threshold. This may or may not be an issue depending on your expected use case. A way around the write threshold would be to implement upvotes/flags using pull-queues and to have a cron job that pulls and batch updates tracking objects as needed.
This method has its pros/cons. The most obvious cons are the ones I just listed. The pros, however, may be worth it. You can get a full list of users who upvoted/flagged a story from a single list (or multiple depending on how popular the story is). You can get a full list of users with a lot fewer queries to the datastore. This method should also take less storage, index, and metadata space. Additionally, adding a user to a tracking object will be cheaper, instead of writing a new object + 2 writes for each property, you would just be charged 1 write for the object + 2 writes for the entry to the list (9 vs 3 writes for adding users to a pre-existing tracked story, or 9 vs 7 for untracked stories)
What you propose sounds reasonable.
Don't use the app engine generated key for Tracking. Because the combination of story/user should be unique, create your own key as a combination of the story/user. Something like
tracking = Tracking.get_or_insert(str(story.key().id()) + "-" + str(user.key().id()), **params)
If you know the story/user, then you can always fetch the tracking by key name.
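A small sketch of that key-name scheme; the helper name is an assumption, and the separator assumes numeric ids so that, for example, '12-3' and '1-23' cannot collide.

```python
def tracking_key_name(story_id, user_id):
    """Deterministic key name for a (story, user) pair."""
    return "%d-%d" % (story_id, user_id)

# Hypothetical App Engine usage, given story and user_profile entities:
#   key_name = tracking_key_name(story.key().id(), user_profile.key().id())
#   tracking = Tracking.get_by_key_name(key_name)   # cheap lookup, no query
#   tracking = Tracking.get_or_insert(key_name, story=story,
#                                     user_profile=user_profile)
```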

Duplicating key names and parent as properties in Google App Engine (GAE) Datastore?

After reading about the GAE Datastore API, I am still unsure if I need to duplicate key names and parents as properties for an entity.
Let's say there are two kinds of entities: Employee and Division. Each employee has a division as its parent, and is identified by an account name. I use the account name as the key name for employees. But when modeling Employee, I would still keep these two as properties:
division = db.ReferenceProperty(Division)
account_name = db.StringProperty()
Obviously I have to manually keep division consistent with its parent, and account_name with its key name. The reasons I am doing this extra work are:
I am afraid the GQL/Datastore API may not support parent and key name as well as it supports normal properties. Is there anything I can do with a property but not with a parent or key name (or are they essentially reference properties)? How do I use key names in GQL queries?
The meaning of key name and parent is not particularly clear. As the names are not self-descriptive, I have to inform other contributors that we use account name as key name...
But this is really unnecessary work, wasting time and storage space. I cannot get rid of the SQL thinking: why doesn't Google just let us define one property to be the key and another to be the parent? Then we could name them and use them as normal properties...
What's the best practice here?
Keep in mind that in the GAE Datastore you can never change the parent or key_name of an entity once it has been created. These values are permanent for the life of the entity.
If there is even a small chance that the account_name of an Employee could change then you can not use it as a key_name. If it never changes then it could be a very good key_name and will allow you to do cheap gets for Employees using Employee.get_by_key_name() instead of expensive queries.
Parent is not meant to be equivalent to a foreign key. A better equivalent to a foreign key is a reference property.
The main reason you use parent is so that the parent and child entities are in the same entity group which allows you to operate on them both in a single transaction. If you just need a reference to the division from the Employee then just use a reference property. I suggest getting familiar with how entity groups work as this is very important on GAE data modeling:
https://developers.google.com/appengine/docs/python/datastore/entities#Transactions_and_Entity_Groups
Using parent can also cause write performance issues as there is a limit to how quickly you can write to a single entity group (approximately one write per second). When deciding whether to use parent or a reference property you need to think about which entities need to be modified in the same transaction. In many cases you can use Cross Group (XG) transactions instead. It is all about which trade-offs you want to make.
So my suggestions are:
If your account_name for an employee will absolutely never change then use it as a key_name. Otherwise just make it a basic property.
If you need to modify the Employee and the Division in the same transaction (and you can't get this to work with XG transactions) and you will never change the Division of an Employee then make the Division the parent of the Employee. Otherwise just model this relationship with a reference property.
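One way to reason about the parent-versus-XG decision: two entities are in the same entity group exactly when their keys share the same root ancestor. A sketch, with key paths modeled as plain tuples so the rule is testable; the transaction calls shown in the comments are the db API, but the helper itself is an illustration, not part of the SDK.

```python
def same_entity_group(key_path_a, key_path_b):
    """Key paths are tuples of (kind, id_or_name) pairs from root to leaf,
    e.g. (('Division', 1), ('Employee', 'alice')). Two entities share an
    entity group iff their root pairs match."""
    return key_path_a[0] == key_path_b[0]

# If the Employee is a child of its Division, a plain transaction works:
#   db.run_in_transaction(update_both)               # same entity group
# If they are separate root entities, use a cross-group (XG) transaction:
#   xg = db.create_transaction_options(xg=True)
#   db.run_in_transaction_options(xg, update_both)   # up to 5 groups
```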
When you create a new Employee object with a Divison as a parent, it would go something like:
div = Division()
...  # Complete the division properties
div.put()

emp = Employee(key_name=<account_name>, parent=div)
...  # Complete the employee properties
emp.put()
Then, when you want to get a reference to the Division an Employee is part of:
div = emp.parent()

# Get the Employee account_name (which is the employee's key name):
account_name = emp.key().name()
You don't have to store a ReferenceProperty to the Division an Employee is part of, since that is already captured by the parent. Additionally, you can get the account_name from the Employee entity's key as needed.
To query on the key:
emp = Employee.get_by_key_name(<account_name>, parent=<division>)
# OR
div = Division.get_by_key_name(<keyname>)

# Get all employees in a division
emps = Employee.all().ancestor(div)

How can I create two unique, queriable fields for a GAE Datastore Data Model?

First a little setup. Last week I was having trouble implementing a specific methodology that I had constructed which would allow me to manage two unique fields associated with one db.Model object. Since this isn't possible, I created a parent entity class and a child entity class, each having its key_name assigned one of the unique values. You can find my previous question located here, which includes my sample code and a general explanation of my insertion process.
On my original question, someone commented that my solution would not solve my problem of needing two unique fields associated with one db.Model object.
My implementation tried to solve this problem by implementing a static method that creates a ParentEntity whose key_name property is assigned one of my unique values. In step two of my process I create a child entity and assign the parent entity to the parent parameter. Both of these steps are executed within a db transaction, so I assumed that this would force the uniqueness constraint to work, since both of my values were stored within two separate key_name fields across two separate models.
The commenter pointed out that this solution would not work because when you set a parent to a child entity, the key_name is no longer unique across the entire model but, instead, is unique across the parent-child entries. Bummer...
I believe that I could solve this new problem by changing how these two models are associated with one another.
First, I create a parent object as mentioned above. Next, I create a child entity and assign my second, unique value to its key_name. The difference is that the second entity has a reference property to the parent model. My first entity is assigned to the reference property but not to the parent parameter. This does not force a one-to-one reference, but it does keep both of my values unique, and I can manage the one-to-one nature of these objects as long as I can control the insertion process from within a transaction.
This new solution is still problematic. According to the GAE Datastore documentation you can not execute multiple db updates in one transaction if the various entities within the update are not of the same entity group. Since I no longer make my first entity a parent of the second, they are no longer part of the same entity group and can not be inserted within the same transaction.
I'm back to square one. What can I do to solve this problem? Specifically, what can I do to enforce two, unique values associated with one Model entity. As you can see, I am willing to get a bit creative. Can this be done? I know this will involve an out-of-the-box solution but there has to be a way.
Below is my original code from my question I posted last week. I've added a few comments and code changes to implement my second attempt at solving this problem.
class ParentEntity(db.Model):
    str1_key = db.StringProperty()
    str2 = db.StringProperty()

    @staticmethod
    def InsertData(string1, string2, string3):
        try:
            def txn():
                # create first entity
                prt = ParentEntity(
                    key_name=string1,
                    str1_key=string1,
                    str2=string2)
                prt.put()
                # create User Account entity
                child = ChildEntity(
                    key_name=string2,
                    # parent=prt,  # prt was previously the parent of child
                    parentEnt=prt,
                    str1=string1,
                    str2_key=string2,
                    str3=string3)
                child.put()
                return child
            # This should give me an error, b/c these two entities are no
            # longer in the same entity group. :(
            db.run_in_transaction(txn)
        except Exception, e:
            raise e

class ChildEntity(db.Model):
    # foreign and primary key values
    str1 = db.StringProperty()
    str2_key = db.StringProperty()

    # This is no longer a "parent" but a reference
    parentEnt = db.ReferenceProperty(reference_class=ParentEntity)

    # pertinent data below
    str3 = db.StringProperty()
The system you describe will work, at the cost of transactionality. Note that the second entity is no longer a child entity - it's just another entity with a ReferenceProperty.
This solution may be sufficient to your needs - for instance, if you need to enforce that every user has a unique email address, but this is not your primary identifier for a user, you can insert a record into an 'emails' table first, then if that succeeds, insert your primary record. If a failure occurs after the first operation but before the second, you have an email address associated with no record. You can simply ignore this, or timestamp the record and allow it to be reclaimed after some period of time (for example, 30 seconds, the maximum length of a frontend request).
If your requirements on transactionality and uniqueness are stronger than that, there are other options with increasing levels of complexity, such as implementing some form of distributed transactions, but it's unlikely you'll actually need that. If you can tell us more about the nature of the records and the unique keys, we may be able to provide more detailed suggestions.
After scratching my head a bit, last night I decided to go with the following solution. I would assume that this still provides a bit of undesirable overhead for many scenarios, however, I think the overhead may be acceptable for my needs.
The code posted below is a further modification of the code in my question. Most notably, I've created another Model class, called EGEnforcer (which stands for Entity Group Enforcer).
The idea is simple. If a transaction can only update multiple records if they are associated with one entity group, I must find a way to associate each of my records that contains my unique values with the same entity group.
To do this, I create an EGEnforcer entry when the application initially starts. Then, when I need to make a new entry in my models, I query EGEnforcer for the record associated with my paired models. After I get my EGEnforcer record, I make it the parent of both records. Voila! My data is now all associated with the same entity group.
Since the key_name parameter is unique only within a parent/key_name group, this should enforce my uniqueness constraints, because all of my FirstEntity (previously ParentEntity) entries have the same parent. Likewise, my SecondEntity (previously ChildEntity) entries have unique key_name values because their parent is also always the same.
Since both entities also have the same parent, I can execute these entries within the same transaction. If one fails, they all fail.
# My new class containing a unique entry for each pair of models
# associated with one another.
class EGEnforcer(db.Model):
    KEY_NAME_EXAMPLE = 'arbitrary unique value'

    @staticmethod
    def setup():
        '''This only needs to be called once for the lifetime of the
        application. setup() inserts a record into EGEnforcer that will be
        used as a parent for FirstEntity and SecondEntity entries.'''
        ege = EGEnforcer.get_or_insert(EGEnforcer.KEY_NAME_EXAMPLE)
        return ege

class FirstEntity(db.Model):
    str1_key = db.StringProperty()
    str2 = db.StringProperty()

    @staticmethod
    def InsertData(string1, string2, string3):
        try:
            def txn():
                ege = EGEnforcer.get_by_key_name(EGEnforcer.KEY_NAME_EXAMPLE)
                prt = FirstEntity(
                    key_name=string1,
                    parent=ege)  # Our EGEnforcer record.
                prt.put()
                child = SecondEntity(
                    key_name=string2,
                    parent=ege,  # Our EGEnforcer record.
                    parentEnt=prt,
                    str1=string1,
                    str2_key=string2,
                    str3=string3)
                child.put()
                return child
            # This works because our entities are now part of the same
            # entity group.
            db.run_in_transaction(txn)
        except Exception, e:
            raise e

class SecondEntity(db.Model):
    # foreign and primary key values
    str1 = db.StringProperty()
    str2_key = db.StringProperty()

    # This is no longer a "parent" but a reference
    parentEnt = db.ReferenceProperty(reference_class=FirstEntity)

    # Other data...
    str3 = db.StringProperty()
One quick note: Nick Johnson pinpointed my need for this solution:
This solution may be sufficient to your needs - for instance, if you need to enforce that every user has a unique email address, but this is not your primary identifier for a user, you can insert a record into an 'emails' table first, then if that succeeds, insert your primary record.
This is exactly what I need, but my solution is, obviously, a bit different from your suggestion. My method allows the transaction to completely occur or completely fail. Specifically, when a user creates an account, they first log in to their Google account. Next, they are forced to the account creation page if there is no entry associated with their Google account in SecondEntity (which is actually UserAccount in my real scenario). If the insertion process fails, they are redirected to the creation page with the reason for the failure.
This could be because their ID is not unique or, potentially, a transactional timeout. If there is a timeout on the insertion of their new user account, I will want to know about it but I will implement some form of checks-and-balance in the near future. For now I simply want to go live, but this uniqueness constraint is an absolute necessity.
Being that my approach is strictly for account creation, and my user account data will not change once created, I believe that this should work and scale well for quite a while. I'm open for comments if this is incorrect.

How do I ensure data integrity for objects in google app engine without using key names?

I'm having a bit of trouble in Google App Engine ensuring that my data is correct when using an ancestor relationship without key names.
Let me explain a little more: I've got a parent entity category, and I want to create a child entity item. I'd like to create a function that takes a category name and item name, and creates both entities if they don't exist. Initially I created one transaction and created both in the transaction if needed using a key name, and this worked fine. However, I realized I didn't want to use the name as the key as it may need to change, and I tried within my transaction to do this:
def add_item_txn(category_name, item_name):
    category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name=category_name)
    category = category_query.get()
    if not category:
        category = Category(name=category_name, count=0)
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

db.run_in_transaction(add_item_txn, "foo", "bar")
What I found when I tried to run this is that App Engine rejects this as it won't let you run a query in a transaction: Only ancestor queries are allowed inside transactions.
Looking at the example Google gives about how to address this:
def decrement(key, amount=1):
    counter = db.get(key)
    counter.count -= amount
    if counter.count < 0:  # don't let the counter go negative
        raise db.Rollback()
    db.put(counter)

q = db.GqlQuery("SELECT * FROM Counter WHERE name = :1", "foo")
counter = q.get()
db.run_in_transaction(decrement, counter.key(), amount=5)
I attempted to move my fetch of the category to before the transaction:
def add_item_txn(category_key, item_name):
    category = category_key.get()
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name="foo")
category = category_query.get()
if not category:
    category = Category(name=category_name, count=0)

db.run_in_transaction(add_item_txn, category.key(), "bar")
This seemingly worked, but I found when I ran this with a number of requests that I had duplicate categories created, which makes sense, as the category is queried outside the transaction and multiple requests could create multiple categories.
Does anyone have any idea how I can create these categories properly? I tried to put the category creation into a transaction, but received the error about ancestor queries only again.
Thanks!
Simon
Here is an approach to solving your problem. It is not an ideal approach in many ways, and I sincerely hope that some other AppEngineer will come up with a neater solution than I have. If not, give this a try.
My approach utilizes the following strategy: it creates entities that act as aliases for the Category entities. The name of the Category can change, but the alias entity will retain its key, and we can use elements of the alias's key to create a keyname for your Category entities, so we will be able to look up a Category by its name, but its storage is decoupled from its name.
The aliases are all stored in a single entity group, and that allows us to use a transaction-friendly ancestor query, so we can lookup or create a CategoryAlias without risking that multiple copies will be created.
When I want to look up or create a Category and Item combo, I can use the category's keyname to programmatically generate a key inside the transaction, and we are allowed to get an entity via its key inside a transaction.
class CategoryAliasRoot(db.Model):
    count = db.IntegerProperty()
    # Not actually used in current code; just here to avoid having an empty
    # model definition.

    __singleton_keyname = "categoryaliasroot"

    @classmethod
    def get_instance(cls):
        # get_or_insert is inherently transactional; no chance of
        # getting two of these objects.
        return cls.get_or_insert(cls.__singleton_keyname, count=0)

class CategoryAlias(db.Model):
    alias = db.StringProperty()

    @classmethod
    def get_or_create(cls, category_alias):
        alias_root = CategoryAliasRoot.get_instance()
        def txn():
            existing_alias = cls.all().ancestor(alias_root).filter('alias = ', category_alias).get()
            if existing_alias is None:
                existing_alias = CategoryAlias(parent=alias_root, alias=category_alias)
                existing_alias.put()
            return existing_alias
        return db.run_in_transaction(txn)

    def keyname_for_category(self):
        return "category_" + str(self.key().id())

    def rename(self, new_name):
        self.alias = new_name
        self.put()

class Category(db.Model):
    pass

class Item(db.Model):
    name = db.StringProperty()

def get_or_create_item(category_name, item_name):
    def txn(category_keyname):
        category_key = db.Key.from_path('Category', category_keyname)
        existing_category = db.get(category_key)
        if existing_category is None:
            existing_category = Category(key_name=category_keyname)
            existing_category.put()
        existing_item = Item.all().ancestor(existing_category).filter('name = ', item_name).get()
        if existing_item is None:
            existing_item = Item(parent=existing_category, name=item_name)
            existing_item.put()
        return existing_item
    cat_alias = CategoryAlias.get_or_create(category_name)
    return db.run_in_transaction(txn, cat_alias.keyname_for_category())
Caveat emptor: I have not tested this code. Obviously, you will need to change it to match your actual models, but I think that the principles that it uses are sound.
UPDATE:
Simon, in your comment you mostly have the right idea, although there is an important subtlety you shouldn't miss. You'll notice that the Category entities are not children of the dummy root. They do not share a parent; they are themselves the root entities of their own entity groups. If the Category entities all had the same parent, that would make one giant entity group, and you'd have a performance nightmare, because each entity group can only have one transaction running on it at a time.
Rather, the CategoryAlias entities are the children of the bogus root entity. That allows me to query inside a transaction, but the entity group doesn't get too big because the Items that belong to each Category aren't attached to the CategoryAlias.
Also, the data in the CategoryAlias entity can change without changing the entity's key, and I am using the alias's key as a data point for generating a keyname that can be used in creating the actual Category entities themselves. So, I can change the name that is stored in the CategoryAlias without losing my ability to match that entity with the same Category.
A couple of things to note (I think they're probably just typos) -
The first line of your transactional method calls get() on a key - this is not a documented function. You don't need to have the actual category object in the function anyway - the key is sufficient in both of the places where you are using the category entity.
You don't appear to be calling put() on either of the category or the item (but since you say you are getting data in the datastore, I assume you have left this out for brevity?)
As far as a solution goes - you could attempt to add a value in memcache with a reasonable expiry -
if memcache.add("category.%s" % category_name, True, 60):
    create_category(...)
This at least stops you creating multiples. It is still a bit tricky to know what to do if the query does not return the category but you cannot grab the lock from memcache; that means the category is in the process of being created.
If the originating request comes from the task queue, then just throw an exception so the task gets re-run.
Otherwise you could wait a bit and query again, although this is a little dodgy.
If the request comes from the user, then you could tell them there has been a conflict and to try again.
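The lock-plus-retry idea above can be sketched with the locking call injected as a parameter, so the retry logic is visible on its own. The run_once helper and its parameters are assumptions for illustration; on App Engine, acquire would be the memcache.add call shown in the docstring.

```python
import time

def run_once(acquire, action, retries=3, delay=0.1):
    """Run action() only if acquire() grants the lock; otherwise wait and
    retry the acquire, returning None if the lock is never obtained.

    On App Engine, acquire would be something like:
        lambda: memcache.add("category.%s" % category_name, True, 60)
    and action would create the Category (or re-query for it).
    """
    for _ in range(retries):
        if acquire():
            return action()
        time.sleep(delay)
    return None
```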
