GAE Datastore ndb models accessed in 5 different ways - python

I run an online marketplace, and I don't know the best way to access NDB models. I'm afraid it's a real mess and I really don't know which way to turn. If you don't have time for a full response, I'm happy to read an article on NDB best practices.
I have these classes, which are interlinked in different ways:
User(webapp2_extras.appengine.auth.models.User) controls seller logins
Partner(ndb.Model) contains information about sellers
menuitem(ndb.Model) contains information about items on menu
order(ndb.Model) contains buyer information & information about an order (all purchases are "guest" purchases)
Preapproval(ndb.Model) contains payment information saved from PayPal
How they're linked.
User - Partner
A 1-to-1 relationship. Both have "email address" fields. If these match, then we can retrieve the user from the partner or vice versa. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
Where in the Partner model we have:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
Partner - menuitem
menuitems are children of Partner. Created like so:
myItem = model.menuitem(parent=model.partner_key(partner_name))
menuitems are referenced like this:
menuitems = model.menuitem.get_by_partner_name(partner.name)
where get_by_partner_name is this:
@classmethod
def get_by_partner_name(cls, partner_name):
    query = cls.query(
        ancestor=partner_key(partner_name)).order(ndb.GenericProperty("itemid"))
    return query.fetch(300)
and where partner_key() is a function just floating at the top of the model.py file:
def partner_key(partner_name=DEFAULT_PARTNER_NAME):
    return ndb.Key('Partner', partner_name)
Partner - order
Each Partner can have many orders. order has a parent that is Partner. How an order is created:
partner_name = self.request.get('partner_name')
partner_k = model.partner_key(partner_name)
myOrder = model.order(parent=partner_k)
How an order is referenced:
myOrder_k = ndb.Key('Partner', partnername, 'order', ordernumber)
myOrder = myOrder_k.get()
and sometimes like so:
order = model.order.get_by_name_id(partner.name, ordernumber)
(where in model.order we have:
@classmethod
def get_by_name_id(cls, partner_name, id):
    return ndb.Key('Partner', partner_name, 'order', int(id)).get()
)
This doesn't feel particularly efficient, particularly as I often have to look up the partner in the datastore just to pull up an order. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
order = model.order.get_by_name_id(partner.name, ordernumber)
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
Preapproval - order.
a 1-to-1 relationship. Each order can have a 'Preapproval'. Linkage: a field in the Preapproval class: order = ndb.KeyProperty(kind=order).
creating a Preapproval:
item = model.Preapproval( order=myOrder.key, ...)
accessing a Preapproval:
preapproval = model.Preapproval.query(model.Preapproval.order == order.key).get()
This seems like the easiest method to me.
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic.

User - Partner
You could replace:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
with:
@classmethod
def get_by_email(cls, partner_email):
    return cls.query(Partner.email == partner_email).get()
But because of transaction issues it is better to use entity groups: User should be the parent of Partner.
In this case instead of using get_by_email you can get user without queries:
user = partner.key.parent().get()
Or do an ancestor query for getting the partner object:
partner = Partner.query(ancestor=user_key).get()
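A minimal sketch of that arrangement (property names are assumptions, not from the question):

```python
class Partner(ndb.Model):
    name = ndb.StringProperty()
    email = ndb.StringProperty()

# Create the Partner inside the User's entity group:
partner = Partner(parent=user.key, name=partner_name, email=user.email_address)
partner.put()
```

Both directions then resolve through keys alone: partner.key.parent().get() for the user, and the ancestor query above for the partner, with no email-matching query (and no email index) needed.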
Query
Don't use fetch() if you don't need it. Use queries as iterators.
Instead of:
return query.fetch(300)
just:
return query
And then use query as:
for something in query:
    blah
Relationships: Partner-Menu Item and Partner - Order
Why are you using entity groups? Ancestors are not (necessarily) used for modeling 1-to-N relationships. Ancestors define entity groups and are used for transactions. They are useful in composition relationships (e.g.: Partner - User).
You can use a KeyProperty for the relationship (multivalued, i.e. repeated=True, or not, depending on the orientation of the relationship).
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
No problem if you stop using ancestors in this relationship.
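To make that concrete, here is a sketch of the Order model with a KeyProperty instead of an ancestor (the property name is an assumption, not from the question):

```python
class Order(ndb.Model):
    partner = ndb.KeyProperty(kind=Partner)
    # ... other order fields ...

# Creating an order no longer needs a parent key:
myOrder = Order(partner=partner.key)
myOrder.put()

# A plain ID lookup now works:
myOrder = Order.get_by_id(ordernumber)

# And all orders for a partner become an ordinary (non-ancestor) query:
orders = Order.query(Order.partner == partner.key)
```

Incidentally, even with the current ancestor scheme, get_by_id() accepts a parent argument, so model.order.get_by_id(ordernumber, parent=model.partner_key(partner_name)) should work: the kind plus the parent key identifies the entity.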
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic
There is not a single systematic way of linking models. It depends on many factors: cardinality, the number of possible items on each side, the need for transactions, composition relationships, indexes, the complexity of future queries, denormalization for optimization, etc.

Ok, I think the first step in cleaning this up is as follows:
At the top of your .py file, import all your models, so you don't have to keep using model.ModelName. That cleans up a bit of the code: model.ModelName becomes ModelName.
The first best practice in cleaning this up is to always start a class name with a capital letter. A model name is a class. Above, you have mixed model names, like Partner, order and menuitem, which makes the code hard to follow. Worse, when you use order as a model name you may end up with conflicts; above, you redefined order as a variable twice. Use Order as the model name, this_order for the looked-up entity, and order_key for the key, to clear up those conflicts.
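A small, hypothetical before/after of those two points:

```python
# Before:
import model
order = model.order.get_by_name_id(partner.name, ordernumber)  # 'order' shadows the model

# After:
from model import Order, Partner

order_key = ndb.Key('Partner', partner_name, 'Order', ordernumber)
this_order = order_key.get()
```

(Note that renaming a model class also renames its datastore kind, so entities already written under the kind order would need migrating, or an overridden _get_kind() returning the old name.)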
Ok, let's start there

Related

Can an association class be implemented in Python?

I have just started learning software development and I am modelling my system in a UML Class diagram. I am unsure how I would implement this in code.
To keep things simple let's assume the following example:
There is a Room and a Guest class with an association Room(0..*) - Guest(0..*), and an association class RoomBooking, which contains the booking details. How would I model this in Python if my system wants to see all room bookings made by a particular guest?
Most Python applications developed from a UML design are backed by a relational database, usually via an ORM. In which case your design is pretty trivial: your RoomBooking is a table in the database, and the way you look up all RoomBooking objects for a given Guest is just an ORM query. Keeping it vague rather than using a particular ORM syntax, something like this:
bookings = RoomBooking.select(Guest=guest)
With an RDBMS but no ORM, it's not much different. Something like this:
sql = 'SELECT Room, Guest, Charge, Paid FROM RoomBooking WHERE Guest = ?'
cur = db.execute(sql, (guest.id,))
bookings = [RoomBooking(*row) for row in cur]
And this points to what you'd do if you're not using a RDBMS: any relation that would be stored as a table with a foreign key is instead stored as some kind of dict in memory.
For example, you might have a dict mapping guests to sets of room bookings:
bookings = guest_booking[guest]
Or, alternatively, if you don't have a huge number of hotels, you might have this mapping implicit, with each hotel having a 1-to-1 mapping of guests to bookings:
bookings = [hotel.bookings[guest] for hotel in hotels]
Since you're starting off with UML, you're probably thinking in strict OO terms, so you'll want to encapsulate this dict in some class, behind some mutator and accessor methods, so you can ensure that you don't accidentally break any invariants.
There are a few obvious places to put it—a BookingManager object makes sense for the guest-to-set-of-bookings mapping, and the Hotel itself is such an obvious place for the per-hotel-guest-to-booking that I used it without thinking above.
But another place to put it, which is closer to the ORM design, is in a class attribute on the RoomBooking type, accessed by classmethods. This also allows you to extend things if you later need to, e.g., look things up by hotel—you'd then put two dicts as class attributes, and ensure that a single method always updates both of them, so you know they're always consistent.
So, let's look at that:
class RoomBooking:
    guest_mapping = collections.defaultdict(set)
    hotel_mapping = collections.defaultdict(set)

    def __init__(self, guest, room):
        self.guest, self.room = guest, room

    @classmethod
    def find_by_guest(cls, guest):
        return cls.guest_mapping[guest]

    @classmethod
    def find_by_hotel(cls, hotel):
        return cls.hotel_mapping[hotel]

    @classmethod
    def add_booking(cls, guest, room):
        booking = cls(guest, room)
        cls.guest_mapping[guest].add(booking)
        cls.hotel_mapping[room.hotel].add(booking)
Of course your Hotel instance probably needs to add the booking as well, so it can raise an exception if two different bookings cover the same room on overlapping dates, whether that happens in RoomBooking.add_booking, or in some higher-level function that calls both Hotel.add_booking and RoomBooking.add_booking.
And if this is multi-threaded (which seems like a good possibility, given that you're heading this far down the Java-inspired design path), you'll need a big lock, or a series of fine-grained locks, around the whole transaction.
For persistence, you probably want to store these mappings along with the public objects. But for a small enough data set, or for a server that rarely restarts, it might be simpler to just persist the public objects, and rebuild the mappings at load time by doing a bunch of add_booking calls as part of the load process.
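That load-time rebuild is just a loop over the persisted bookings. A minimal, self-contained sketch (Hotel and Room are stand-in shapes, not from the original design, so the mapping logic is runnable on its own):

```python
import collections

# Hypothetical minimal stand-ins so the sketch runs on its own:
Hotel = collections.namedtuple('Hotel', ['name'])
Room = collections.namedtuple('Room', ['hotel', 'number'])

class RoomBooking:
    guest_mapping = collections.defaultdict(set)
    hotel_mapping = collections.defaultdict(set)

    def __init__(self, guest, room):
        self.guest, self.room = guest, room

    @classmethod
    def add_booking(cls, guest, room):
        booking = cls(guest, room)
        cls.guest_mapping[guest].add(booking)
        cls.hotel_mapping[room.hotel].add(booking)
        return booking

    @classmethod
    def rebuild_mappings(cls, persisted):
        # Discard the derived indexes and rebuild them from the
        # persisted (guest, room) pairs loaded at startup.
        cls.guest_mapping.clear()
        cls.hotel_mapping.clear()
        return [cls.add_booking(guest, room) for guest, room in persisted]
```

Because both dicts are rebuilt through the same add_booking path, they cannot drift out of sync with each other during the load.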
If you want to make it even more ORM-style, you can have a single find method that takes keyword arguments and manually executes a "query plan" in a trivial way:
@classmethod
def find(cls, guest=None, hotel=None):
    if guest is None and hotel is None:
        return {booking for bookings in cls.guest_mapping.values()
                for booking in bookings}
    elif hotel is None:
        return cls.guest_mapping[guest]
    elif guest is None:
        return cls.hotel_mapping[hotel]
    else:
        return {booking for booking in cls.guest_mapping[guest]
                if booking.room.hotel == hotel}
But this is already pushing things to the point where you might want to go back and ask whether you were right not to use an ORM in the first place. If that sounds ridiculously heavy duty for your simple toy app, take a look at sqlite3 for the database (which comes with Python, and takes less work to use than coming up with a way to pickle or json all your data for persistence) and SQLAlchemy for the ORM. There's not much of a learning curve, and not much runtime overhead or coding-time boilerplate.
Sure you can implement it in Python, but there is not a single way. Quite often you have a database layer where the association class is used with two foreign keys (in your case to the primary keys of Room and Guest), so in order to search you would just send the corresponding SQL. In case you want to cache this table, you would code it like this (or similarly) with an associative array:
from collections import defaultdict

class Room():
    def __init__(self, num):
        self.room_number = num
    def key(self):
        return str(self.room_number)

class Guest():
    def __init__(self, name):
        self.name = name
    def key(self):
        return self.name

def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n-1, type))

room_booking = nested_dict(2, str)

class Room_Booking():
    def __init__(self, date):
        self.date = date

room1 = Room(1)
guest1 = Guest("Joe")
room_booking[room1.key()][guest1.key()] = Room_Booking("some date")
print(room_booking[room1.key()][guest1.key()])

How to model a 'Like' mechanism via ndb?

We are about to introduce a social aspect into our app, where users can like each others events.
Getting this wrong would mean a lot of headache later on, hence I would love to get input from some experienced developers on GAE, how they would suggest to model it.
It seems there is a similar question here however the OP didn't provide any code to begin with.
Here are two models:
class Event(ndb.Model):
    user = ndb.KeyProperty(kind=User, required=True)
    time_of_day = ndb.DateTimeProperty(required=True)
    notes = ndb.TextProperty()
    timestamp = ndb.FloatProperty(required=True)

class User(UserMixin, ndb.Model):
    firstname = ndb.StringProperty()
    lastname = ndb.StringProperty()
We need to know who has liked an event, in case that the user may want to unlike it again. Hence we need to keep a reference. But how?
One way would be introducing a RepeatedProperty to the Event class.
class Event(ndb.Model):
    ....
    likes = ndb.KeyProperty(kind=User, repeated=True)
That way any user that would like this Event, would be stored in here. The number of users in this list would determine the number of likes for this event.
Theoretically that should work. However this post from the creator of Python worries me:
Do not use repeated properties if you have more than 100-1000 values.
(1000 is probably already pushing it.) They weren't designed for such
use.
And back to square one. How am I supposed to design this?
A repeated property has a limit on the number of values (< 1000).
One recommended way to break that limit is sharding:
class Event(ndb.Model):
    # use an integer to store the total number of likes.
    likes = ndb.IntegerProperty()

class EventLikeShard(ndb.Model):
    # each shard only stores 500 users.
    event = ndb.KeyProperty(kind=Event)
    users = ndb.KeyProperty(kind=User, repeated=True)
If the number of likes is more than 1,000 but less than about 100k, there is a simpler way:
class Event(ndb.Model):
    likers = ndb.PickleProperty(compressed=True)
Use another model "Like" where you keep the reference to user and event.
Old way of representing many to many in a relational manner. This way you keep all entities separated and can easily add/remove/count.
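A sketch of that separate model (the class and property names are assumptions):

```python
class Like(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)

# Like an event:
Like(event=event.key, user=user.key).put()

# Unlike it again:
like_key = Like.query(Like.event == event.key, Like.user == user.key).get(keys_only=True)
if like_key:
    like_key.delete()

# Count likes for an event:
num_likes = Like.query(Like.event == event.key).count()
```

Each like is its own entity, so there is no repeated-property limit, at the cost of one extra entity write per like.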
I would recommend the usual many-to-many relationship using an EventUser model, given that the design seems to require an unlimited number of users liking an event. The only tricky part is that you must ensure that each event/user combination is unique, which can be done using _pre_put_hook. Keeping a likes counter, as proposed by @lucemia, is indeed a good idea.
You would then capture the liked action using a boolean, or you can make it a bit more flexible by including an actions string array. This way you could also capture actions such as signed-up or attended.
Here is a sample code:
class EventUser(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)
    actions = ndb.StringProperty(repeated=True)

    # make sure event/user is unique
    def _pre_put_hook(self):
        cur_key = self.key
        for entry in self.query(EventUser.user == self.user, EventUser.event == self.event):
            # If cur_key exists, it means the user is performing an update
            if cur_key.id():
                if cur_key == entry.key:
                    continue
                else:
                    raise ValueError("User '%s' is a duplicated entry." % (self.user))
            # If adding
            raise ValueError("User Add '%s' is a duplicated entry." % (self.user))

NDB, Querying across multiple models. AppEngine

I am struggling with querying across multiple models.
This is what my class structure looks like:
class User(ndb.Model):
    ...

class LogVisit(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    ...

class LogOnline(ndb.Model):
    logVisit = ndb.KeyProperty(kind=LogVisit)
    ...
and I want to get a list of the user's LogOnline's
what I want to do is this:
qry = LogOnline.query(LogOnline.logVisit.get().user.get() == user)
However app engine wont allow me to use the get method within a query.
Any thoughts on the best way to go about this?
Many thanks.
The most efficient way will be to store the user's key in the LogOnline entity. We can't see the rest of your model, so it is difficult to tell what LogVisit as an intermediate entity brings to the design.
Then just
LogOnline.query().filter(LogOnline.user == user)
You will have to stop thinking in terms of SQL if you want to have scalable applications on appengine. Think in terms of pure entity relationships and don't try to normalize the data model. Intermediate entities like LogVisit tend to only be used if you need many to many relationships but are still inefficient if you have more than a few instances of them for a particular relationship.
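Denormalized, the model from the question would gain one property (a sketch; the property name is an assumption):

```python
class LogOnline(ndb.Model):
    logVisit = ndb.KeyProperty(kind=LogVisit)
    user = ndb.KeyProperty(kind=User)  # copied from the LogVisit at creation time
    ...
```

The duplicated key costs a little storage but turns the common lookup into a single indexed query instead of a query per LogVisit.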
You are doing it wrong.
# user variable is assumed to be a key
logonlines = []  # You can also use a set
logvisits = LogVisit.query().filter(LogVisit.user == user).fetch()
for logvisit in logvisits:
    logOnlinesA = LogOnline.query().filter(LogOnline.logVisit == logvisit.key).fetch()
    logonlines.extend(logOnlinesA)
Give it a try:
logvisits = LogVisit.query().filter(LogVisit.user == user).fetch(keys_only=True)
logOnlinesA = LogOnline.query().filter(LogOnline.logVisit.IN(logvisits)).fetch()

modelling the google datastore/python

Hi, I am trying to build an application with models resembling the ones below. (While it would be easy to merge the two models into one and use that, it is not feasible in the actual app.)
class User(db.Model):
    username = db.StringProperty()
    email = db.StringProperty()

class UserLikes(db.Model):
    username = db.StringProperty()
    food = db.StringProperty()
The objective: after logging in, the user enters the food that he likes, and the app in turn returns all the other users who like that food.
Now suppose a user Alice enters that she likes "Pizzas"; it gets stored in the datastore. She logs out and logs in again. At this point we query the datastore for the food that she likes, and then query again for all users who like that food. As you see, that is two datastore queries, which is not the best way. I am sure there is definitely a better way to do this. Can someone please help?
[Update:- Or can the second model be changed so that usernames become a multivalued property in which all the users that like that food can be stored? However, I am a little unclear here]
[Edit:- Hi, thanks for replying, but I found both the solutions below a bit of an overkill here. I tried doing it like below. Request you to have a look at this and kindly advise. I maintained the same two tables, however I changed them like below:-
class User(db.Model):
    username = db.StringProperty()
    email = db.StringProperty()

class UserLikes(db.Model):
    username = db.ListProperty(basestring)
    food = db.StringProperty()
Now when 2 users update same food they like, it gets stored like
'pizza' ----> 'Alice','Bob'
And my db query to retrieve data becomes quite easy here
query = db.Query(UserLikes).filter('username =', 'Alice').get()
which I can then iterate over as something like
for elem in query.username:
    print elem
Now if there are two foods like below:-
'pizza' ----> 'Alice','Bob'
'bacon'----->'Alice','Fred'
I use the same query as above , and iterate over the queries and then the usernames.
I am quite new to this , to realize that this just might be wrong. Please Suggest!
Besides the relation model you have, you could handle this in two other ways, depending on your exact use case. You have a good idea in your update: use a ListProperty. Check out Brett Slatkin's talk on relation indexes for some background.
You could use a child entity (Relation Index) on user that contains a list of foods:
class UserLikes(db.Model):
    food = db.StringListProperty()
Then when you are creating a UserLikes instance, you will define the user it relates to as the parent:
likes = UserLikes(parent=user)
That lets you query for other users who like a particular food nicely:
like_apples_keys = UserLikes.all(keys_only=True).filter('food =', 'apples')
user_keys = [key.parent() for key in like_apples_keys]
users_who_like_apples = db.get(user_keys)
However, what may suit your application better, would be to make the Relation a child of a food:
class WhoLikes(db.Model):
    users = db.StringListProperty()
Set the key_name to the name of the food when creating the like:
food_i_like = WhoLikes(key_name='apples')
Now, to get all users who like apples:
apple_lover_key_names = WhoLikes.get_by_key_name('apples')
apple_lovers = UserModel.get_by_key_name(apple_lover_key_names.users)
To get all users who like the same stuff as a user:
same_likes = WhoLikes.all().filter('users =', current_user_key_name)
like_the_same_keys = set()
for keys in same_likes:
    like_the_same_keys.update(keys.users)
same_like_users = UserModel.get_by_key_name(like_the_same_keys)
If you will have lots of likes, or lots users with the same likes, you will need to make some adjustments to the process. You won't be able to fetch 1,000s of users.
The Food and User relation is a so-called many-to-many relationship, typically handled with a join table; in this case a db.Model that links User and Food.
Something like this:
class User(db.Model):
    name = db.StringProperty()
    def get_food_I_like(self):
        return (entity.food.name for entity in self.foods)

class Food(db.Model):
    name = db.StringProperty()
    def get_users_who_like_me(self):
        return (entity.user.name for entity in self.users)

class UserFood(db.Model):
    user = db.ReferenceProperty(User, collection_name='foods')
    food = db.ReferenceProperty(Food, collection_name='users')
For a given User's entity you could retrieve preferred food with:
userXXX.get_food_I_like()
For a given Food's entity, you could retrieve users that like that food with:
foodYYY.get_users_who_like_me()
There's also another approach to handling a many-to-many relationship: storing a list of keys inside a db.ListProperty().
class Food(db.Model):
    name = db.StringProperty()

class User(db.Model):
    name = db.StringProperty()
    food = db.ListProperty(db.Key)
Remember that a ListProperty is limited to 5,000 keys, and again, you can't add useful properties that would fit perfectly in the join table (e.g. a number of stars representing how much a User likes a Food).

How do I ensure data integrity for objects in google app engine without using key names?

I'm having a bit of trouble in Google App Engine ensuring that my data is correct when using an ancestor relationship without key names.
Let me explain a little more: I've got a parent entity category, and I want to create a child entity item. I'd like to create a function that takes a category name and item name, and creates both entities if they don't exist. Initially I created one transaction and created both in the transaction if needed using a key name, and this worked fine. However, I realized I didn't want to use the name as the key as it may need to change, and I tried within my transaction to do this:
def add_item_txn(category_name, item_name):
    category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name=category_name)
    category = category_query.get()
    if not category:
        category = Category(name=category_name, count=0)
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

db.run_in_transaction(add_item_txn, "foo", "bar")
What I found when I tried to run this is that App Engine rejects this as it won't let you run a query in a transaction: Only ancestor queries are allowed inside transactions.
Looking at the example Google gives about how to address this:
def decrement(key, amount=1):
    counter = db.get(key)
    counter.count -= amount
    if counter.count < 0:  # don't let the counter go negative
        raise db.Rollback()
    db.put(counter)

q = db.GqlQuery("SELECT * FROM Counter WHERE name = :1", "foo")
counter = q.get()
db.run_in_transaction(decrement, counter.key(), amount=5)
I attempted to move my fetch of the category to before the transaction:
def add_item_txn(category_key, item_name):
    category = category_key.get()
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name="foo")
category = category_query.get()
if not category:
    category = Category(name=category_name, count=0)
db.run_in_transaction(add_item_txn, category.key(), "bar")
This seemingly worked, but I found when I ran this with a number of requests that I had duplicate categories created, which makes sense, as the category is queried outside the transaction and multiple requests could create multiple categories.
Does anyone have any idea how I can create these categories properly? I tried to put the category creation into a transaction, but received the error about ancestor queries only again.
Thanks!
Simon
Here is an approach to solving your problem. It is not an ideal approach in many ways, and I sincerely hope that some other AppEngineer will come up with a neater solution than I have. If not, give this a try.
My approach utilizes the following strategy: it creates entities that act as aliases for the Category entities. The name of the Category can change, but the alias entity will retain its key, and we can use elements of the alias's key to create a keyname for your Category entities, so we will be able to look up a Category by its name, but its storage is decoupled from its name.
The aliases are all stored in a single entity group, and that allows us to use a transaction-friendly ancestor query, so we can lookup or create a CategoryAlias without risking that multiple copies will be created.
When I want to look up or create a Category and Item combo, I can use the category's keyname to programmatically generate a key inside the transaction, and we are allowed to get an entity via its key inside a transaction.
class CategoryAliasRoot(db.Model):
    count = db.IntegerProperty()
    # Not actually used in current code; just here to avoid having an empty
    # model definition.

    __singleton_keyname = "categoryaliasroot"

    @classmethod
    def get_instance(cls):
        # get_or_insert is inherently transactional; no chance of
        # getting two of these objects.
        return cls.get_or_insert(cls.__singleton_keyname, count=0)

class CategoryAlias(db.Model):
    alias = db.StringProperty()

    @classmethod
    def get_or_create(cls, category_alias):
        alias_root = CategoryAliasRoot.get_instance()
        def txn():
            existing_alias = cls.all().ancestor(alias_root).filter('alias = ', category_alias).get()
            if existing_alias is None:
                existing_alias = CategoryAlias(parent=alias_root, alias=category_alias)
                existing_alias.put()
            return existing_alias
        return db.run_in_transaction(txn)

    def keyname_for_category(self):
        return "category_" + str(self.key().id())

    def rename(self, new_name):
        self.alias = new_name
        self.put()

class Category(db.Model):
    pass

class Item(db.Model):
    name = db.StringProperty()

def get_or_create_item(category_name, item_name):
    def txn(category_keyname):
        category_key = db.Key.from_path('Category', category_keyname)
        existing_category = db.get(category_key)
        if existing_category is None:
            existing_category = Category(key_name=category_keyname)
            existing_category.put()
        existing_item = Item.all().ancestor(existing_category).filter('name = ', item_name).get()
        if existing_item is None:
            existing_item = Item(parent=existing_category, name=item_name)
            existing_item.put()
        return existing_item
    cat_alias = CategoryAlias.get_or_create(category_name)
    return db.run_in_transaction(txn, cat_alias.keyname_for_category())
Caveat emptor: I have not tested this code. Obviously, you will need to change it to match your actual models, but I think that the principles that it uses are sound.
UPDATE:
Simon, in your comment you mostly have the right idea; there is, however, an important subtlety that you shouldn't miss. You'll notice that the Category entities are not children of the dummy root. They do not share a parent; they are themselves the root entities of their own entity groups. If the Category entities all had the same parent, that would make one giant entity group, and you'd have a performance nightmare, because each entity group can only have one transaction running on it at a time.
Rather, the CategoryAlias entities are the children of the bogus root entity. That allows me to query inside a transaction, but the entity group doesn't get too big because the Items that belong to each Category aren't attached to the CategoryAlias.
Also, the data in the CategoryAlias entity can change without changing the entity's key, and I am using the alias's key as a data point for generating a keyname that can be used in creating the actual Category entities themselves. So I can change the name that is stored in the CategoryAlias without losing my ability to match that entity with the same Category.
A couple of things to note (I think they're probably just typos) -
The first line of your transactional method calls get() on a key - this is not a documented function. You don't need to have the actual category object in the function anyway - the key is sufficient in both of the places where you are using the category entity.
You don't appear to be calling put() on either of the category or the item (but since you say you are getting data in the datastore, I assume you have left this out for brevity?)
As far as a solution goes - you could attempt to add a value in memcache with a reasonable expiry -
if memcache.add("category.%s" % category_name, True, 60):
    create_category(...)
This at least stops you creating multiples. It is still a bit tricky to know what to do if the query does not return the category but you cannot grab the lock from memcache: that means the category is in the process of being created.
If the originating request comes from the task queue, then just throw an exception so the task gets re-run.
Otherwise you could wait a bit and query again, although this is a little dodgy.
If the request comes from the user, then you could tell them there has been a conflict and to try again.
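Putting the lock and the re-check together, the flow might look like this (a sketch; create_category is a hypothetical helper, and memcache.add returns False when the key already exists):

```python
def get_or_create_category(category_name):
    category = Category.gql("WHERE name = :1", category_name).get()
    if category is not None:
        return category
    if memcache.add("category.%s" % category_name, True, 60):
        return create_category(category_name)  # hypothetical helper
    # Lock held by another request: the category is mid-creation.
    # From a task queue, raise so the task is retried; from a user
    # request, wait briefly and re-query, or report a conflict.
    return None
```

The 60-second expiry bounds how long a crashed creator can block others, at the cost of a small window in which a duplicate could still slip through.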
