modelling the google datastore/python - python

Hi, I am trying to build an application whose models resemble the ones below. (It would be easy to merge the two models into one, but that is not feasible in the actual app.)
class User(db.Model):
    username = db.StringProperty()
    email = db.StringProperty()
class UserLikes(db.Model):
    username = db.StringProperty()
    food = db.StringProperty()
The objective: after logging in, the user enters a food that he likes, and the app returns all the other users who like that food.
Now suppose a user Alice enters that she likes "Pizzas"; it gets stored in the datastore. She logs out and logs in again. At this point we query the datastore for the food that she likes, and then query again for all users who like that food. As you can see, these are two datastore queries, which is not the best way. I am sure there is a better way to do this. Can someone please help?
[Update: Or could I change the second model so that username becomes a multivalued property in which all the users who like that food are stored? I am a little unclear here, however.]
[Edit: Hi, thanks for replying, but I found both the solutions below a bit of an overkill here. I tried doing it as shown below; I request you to have a look and kindly advise. I kept the same two tables but changed them like this:
class User(db.Model):
    username = db.StringProperty()
    email = db.StringProperty()
class UserLikes(db.Model):
    username = db.ListProperty(basestring)
    food = db.StringProperty()
Now when two users record the same food that they like, it gets stored like
'pizza' ----> 'Alice','Bob'
And my db query to retrieve the data becomes quite easy here
query=db.Query(UserLikes).filter('username =','Alice').get()
which I can then iterate over with something like
for elem in query.username:
    print elem
Now if there are two foods like below:-
'pizza' ----> 'Alice','Bob'
'bacon'----->'Alice','Fred'
I use the same query as above, iterate over the query results and then over the usernames.
I am quite new to this, so I realize this just might be wrong. Please suggest!

Besides the relational model you have, you could handle this in two other ways, depending on your exact use case. The idea in your update is a good one: use a ListProperty. Check out Brett Slatkin's talk on Relation Indexes for some background.
You could use a child entity (Relation Index) on user that contains a list of foods:
class UserLikes(db.Model):
    food = db.StringListProperty()
Then when you are creating a UserLikes instance, you will define the user it relates to as the parent:
likes = UserLikes(parent=user)
That lets you query for other users who like a particular food nicely:
like_apples_keys = UserLikes.all(keys_only=True).filter('food =', 'apples')
user_keys = [key.parent() for key in like_apples_keys]
users_who_like_apples = db.get(user_keys)
However, what may suit your application better, would be to make the Relation a child of a food:
class WhoLikes(db.Model):
    users = db.StringListProperty()
Set the key_name to the name of the food when creating the like:
food_i_like = WhoLikes(key_name='apples')
Now, to get all users who like apples:
apple_lovers_index = WhoLikes.get_by_key_name('apples')
apple_lovers = UserModel.get_by_key_name(apple_lovers_index.users)
To get all users who like the same stuff as a user:
same_likes = WhoLikes.all().filter('users =', current_user_key_name)
like_the_same_keys = set()
for who_likes in same_likes:
    # update() adds to the set in place; union() would return a new set and discard it
    like_the_same_keys.update(who_likes.users)
same_like_users = UserModel.get_by_key_name(list(like_the_same_keys))
If you will have lots of likes, or lots of users with the same likes, you will need to make some adjustments to the process. You won't be able to fetch thousands of users in one go.
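The food-keyed index above can be illustrated without the datastore. The following is only a minimal in-memory sketch: the dictionary `who_likes` stands in for the WhoLikes kind, and the helper names `add_like`, `users_who_like` and `users_with_shared_likes` are made up for illustration.

```python
from collections import defaultdict

# In-memory stand-in for the WhoLikes kind: food name -> set of usernames.
who_likes = defaultdict(set)


def add_like(username, food):
    """Record that a user likes a food (one 'entity' per food)."""
    who_likes[food].add(username)


def users_who_like(food):
    """Single lookup by food name, comparable to a get by key name."""
    return set(who_likes.get(food, set()))


def users_with_shared_likes(username):
    """Collect everyone who shares at least one liked food with the user."""
    shared = set()
    for users in who_likes.values():
        if username in users:
            shared.update(users)  # in-place; union() would return a new set
    shared.discard(username)
    return shared


add_like('Alice', 'pizza')
add_like('Bob', 'pizza')
add_like('Alice', 'bacon')
add_like('Fred', 'bacon')
add_like('Carol', 'sushi')
```

With this data, `users_who_like('pizza')` gives Alice and Bob, and `users_with_shared_likes('Alice')` gives Bob and Fred.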

The relation between Food and User is a so-called many-to-many relationship, typically handled with a join table; in this case, a db.Model that links User and Food.
Something like this:
class User(db.Model):
    name = db.StringProperty()
    def get_food_I_like(self):
        # self.foods yields UserFood entities, so follow the reference to Food
        return (entity.food.name for entity in self.foods)
class Food(db.Model):
    name = db.StringProperty()
    def get_users_who_like_me(self):
        # self.users yields UserFood entities, so follow the reference to User
        return (entity.user.name for entity in self.users)
class UserFood(db.Model):
    user = db.ReferenceProperty(User, collection_name='foods')
    food = db.ReferenceProperty(Food, collection_name='users')
For a given User's entity you could retrieve preferred food with:
userXXX.get_food_I_like()
For a given Food's entity, you could retrieve users that like that food with:
foodYYY.get_users_who_like_me()
There's also another approach to handling a many-to-many relationship: storing a list of keys inside a db.ListProperty().
class Food(db.Model):
    name = db.StringProperty()
class User(db.Model):
    name = db.StringProperty()
    food = db.ListProperty(db.Key)
Remember that a ListProperty is limited to 5,000 keys, and that with this approach you can't add useful properties that would fit perfectly in the join table (e.g. a number of stars representing how much a User likes a Food).
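To make the trade-off concrete, here is a small in-memory sketch of the join-table idea, where each link can carry its own data. It is not datastore code: the `UserFood` tuple, the `stars` field and the two helper functions are assumptions made for illustration.

```python
from collections import namedtuple

# One record per (user, food) pair, like a row in a join table.
# The extra 'stars' field is what a plain list of keys cannot hold.
UserFood = namedtuple('UserFood', ['user', 'food', 'stars'])

links = [
    UserFood('Alice', 'pizza', 5),
    UserFood('Bob', 'pizza', 3),
    UserFood('Alice', 'bacon', 4),
]


def foods_liked_by(user):
    """Follow the links from a user to the foods."""
    return [link.food for link in links if link.user == user]


def users_who_like(food, min_stars=1):
    """Follow the links from a food to the users, filtered on the link's own data."""
    return [link.user for link in links
            if link.food == food and link.stars >= min_stars]
```

A list of keys on User could answer `foods_liked_by`, but a filter such as `min_stars` needs somewhere to store the rating, which is what the join entity provides.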

App Engine Query Users

I have the User model in my datastore which contains some attributes:
I need to query all users filtering by the company attribute.
So, as I would normally do, I do this:
from webapp2_extras.appengine.auth.models import User
employees = User.query().filter(User.company == self.company_name).fetch()
This gives me:
AttributeError: type object 'User' has no attribute 'company'
And when I do:
employees = User.query().filter().fetch()
It gives me no error and shows the list with all the Users.
How do I query by field? Thanks
Your question is a bit misdirected. You ask how to query by field, which you are already doing with correct syntax. The problem, as Jeff O'Neill noted, is your User model does not have that company field, so your query-by-field attempt results in an error. (Here is some ndb documentation that you should definitely peruse and bookmark if you haven't already.) There are three ways to remedy your missing-field problem:
Subclass the User model, as Jeff shows in his answer. This is quick and simple, and may be the best solution for what you want.
Create your own User model, completely separate from the webapp2 one. This is probably overkill for what you want, just judging from your question, because you would have to write most of your own authentication code that the webapp2 user already handles for you.
Create a new model that contains extra user information, and has a key property containing the corresponding user's key. That would look like this:
class UserProfile(ndb.Expando):
    user_key = ndb.KeyProperty(kind='User', required=True)
    company = ndb.StringProperty()
    # other possibilities: profile pic? address? etc.
with queries like this:
from models.user_profile import UserProfile
from webapp2_extras.appengine.auth.models import User
from google.appengine.ext import ndb
# get the profiles of everyone at the company
profiles = UserProfile.query(UserProfile.company == company_name).fetch()
# get the actual user objects through the user keys stored on the profiles
employees = ndb.get_multi([profile.user_key for profile in profiles])
What this solution does is it separates your User model that you use for authentication (webapp2's User) from the model that holds extra user information (UserProfile). If you want to store profile pictures or other relatively large amounts of data for each user, you may find this solution works best for you.
Note: you can put your filter criteria in the .query() parentheses to simplify things (I find I rarely use the .filter() method):
# long way
employees = User.query().filter(User.company == self.company_name).fetch()
# shorter way
employees = User.query(User.company == self.company_name).fetch()
You've imported a User class defined by webapp2. This User class does not have an attribute called company so that is why you are getting the error from User.company.
You probably want to create your own User model by subclassing the one provided by webapp2:
from google.appengine.ext import ndb
from webapp2_extras.appengine.auth.models import User as Webapp2_User
class User(Webapp2_User):
    company = ndb.StringProperty()
Then your query should work.
One caveat, I've never used webapp2_extras.appengine.auth.models so I don't know what that is exactly.
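The cause of the AttributeError can be reproduced with plain Python classes. This is only a sketch: `BaseUser` is a hypothetical stand-in for a library-provided model that lacks `company`, and the placeholder strings stand in for property declarations.

```python
class BaseUser(object):
    """Hypothetical stand-in for a library model that has no 'company' field."""
    email = 'property placeholder'


class User(BaseUser):
    """Subclass that declares the missing field, as suggested above."""
    company = 'property placeholder'


def has_field(model_class, name):
    """True if the class itself exposes the attribute used in a filter expression."""
    return hasattr(model_class, name)
```

An expression such as `User.company == value` is evaluated on the class, so the attribute has to exist there before any query runs; `has_field(BaseUser, 'company')` is False while `has_field(User, 'company')` is True.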

Appending filters to django models

Context
Hey guys,
So let's say I have two models: Person and Attribute connected by a ManyToMany relationship (one person can have many attributes, one attribute can be shared by many people)
class Attribute(models.Model):
    attribute_name = models.CharField(max_length=100)
    attribute_type = models.CharField(max_length=1)
class Person(models.Model):
    article_name = models.CharField(max_length=100)
    attributes = models.ManyToManyField(Attribute)
Attributes can be things like hair colour, location, university degree.
So for example, an attribute may have an 'attribute_name' of 'Computer Science' and an 'attribute_type' of 'D' (for degree).
Another example would be 'London', 'L'.
The Issue
On this web page, users can select people by attributes. For example, they may want to see all people who live in London and who have degrees in both History and Biology (all AND relationships).
I understand that this could be represented in the following (breaks for legibility):
Person.objects
.filter(attributes__attribute_name='London', attributes__attribute_type='L')
.filter(attributes__attribute_name='History', attributes__attribute_type='D')
.filter(attributes__attribute_name='Biology', attributes__attribute_type='D')
However, the user could equally ask for users who have four different degrees. The point being, we don't know how many attributes the user will ask for in the search function.
Questions
As such, which would be the best way to append these filters if we don't know how many, and what types of attributes the user will request?
Is appending filters like this the best way?
Thanks!
Nick
You could obtain all attributes selected by the user and then iterate over:
# sel_att holds the user selected attributes.
result = Person.objects.all()
for att in sel_att:
    result = result.filter(
        attributes__attribute_name=att.attribute_name,
        attributes__attribute_type=att.attribute_type
    )
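The effect of applying one filter per selected attribute can be sketched in plain Python, without Django. Everything here is an assumption for illustration: the `people` dictionary, the `(name, type)` tuples and the `filter_people` helper are an in-memory stand-in for the Person and Attribute models.

```python
# Each person maps to the set of (attribute_name, attribute_type) pairs they have.
people = {
    'Ann': {('London', 'L'), ('History', 'D'), ('Biology', 'D')},
    'Ben': {('London', 'L'), ('History', 'D')},
    'Cy': {('Paris', 'L'), ('History', 'D'), ('Biology', 'D')},
}


def filter_people(candidates, selected):
    """Narrow the candidates once per selected attribute (AND semantics)."""
    result = dict(candidates)
    for attribute in selected:
        # Keep only people who have this particular attribute.
        result = {name: attrs for name, attrs in result.items()
                  if attribute in attrs}
    return result


wanted = [('London', 'L'), ('History', 'D'), ('Biology', 'D')]
matches = sorted(filter_people(people, wanted))
```

The loop works for any number of selected attributes, including none, in which case every person is returned.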
Use Q objects for complex lookups.
For example:
from django.db.models import Q
Person.objects.filter(Q(attributes__attribute_name='London') | Q(attributes__attribute_name='History'))
When combining Q objects, | acts as an OR, while & (or passing several Q objects separated by commas) acts as an AND, pretty much as expected.
The problem with chaining filters is that you can only express AND logic between them; for complex AND, OR, NOT logic, Q objects are the better way to go.

GAE Datastore ndb models accessed in 5 different ways

I run an online marketplace. I don't know the best way to access NDB models. I'm afraid it's a real mess and I really don't know which way to turn. If you don't have time for a full response, I'm happy to read an article on NDB best practices
I have these classes, which are interlinked in different ways:
User(webapp2_extras.appengine.auth.models.User) controls seller logins
Partner(ndb.Model) contains information about sellers
menuitem(ndb.Model) contains information about items on menu
order(ndb.Model) contains buyer information & information about an order (all purchases are "guest" purchases)
Preapproval(ndb.Model) contains payment information saved from PayPal
How they're linked.
User - Partner
A 1-to-1 relationship. Both have "email address" fields. If these match, then can retrieve user from partner or vice versa. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
Where in the Partner model we have:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
Partner - menuitem
menuitems are children of Partner. Created like so:
myItem = model.menuitem(parent=model.partner_key(partner_name))
menuitems are referenced like this:
menuitems = model.menuitem.get_by_partner_name(partner.name)
where get_by_partner_name is this:
@classmethod
def get_by_partner_name(cls, partner_name):
    query = cls.query(
        ancestor=partner_key(partner_name)).order(ndb.GenericProperty("itemid"))
    return query.fetch(300)
and where partner_key() is a function just floating at the top of the model.py file:
def partner_key(partner_name=DEFAULT_PARTNER_NAME):
    return ndb.Key('Partner', partner_name)
Partner - order
Each Partner can have many orders. order has a parent that is Partner. How an order is created:
partner_name = self.request.get('partner_name')
partner_k = model.partner_key(partner_name)
myOrder = model.order(parent=partner_k)
How an order is referenced:
myOrder_k = ndb.Key('Partner', partnername, 'order', ordernumber)
myOrder = myOrder_k.get()
and sometimes like so:
order = model.order.get_by_name_id(partner.name, ordernumber)
(where in model.order we have:
@classmethod
def get_by_name_id(cls, partner_name, id):
    return ndb.Key('Partner', partner_name, 'order', int(id)).get()
)
This doesn't feel particularly efficient, particularly as I often have to look up the partner in the datastore just to pull up an order. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
order = model.order.get_by_name_id(partner.name, ordernumber)
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
Preapproval - order.
a 1-to-1 relationship. Each order can have a 'Preapproval'. Linkage: a field in the Preapproval class: order = ndb.KeyProperty(kind=order).
creating a Preapproval:
item = model.Preapproval( order=myOrder.key, ...)
accessing a Preapproval:
preapproval = model.Preapproval.query(model.Preapproval.order == order.key).get()
This seems like the easiest method to me.
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic.
User - Partner
You could replace:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
with:
@classmethod
def get_by_email(cls, partner_email):
    return cls.query(cls.email == partner_email).get()
However, because of transaction issues, it is better to use entity groups: User should be the parent of Partner.
In this case, instead of using get_by_email, you can get the user without a query:
user = partner.key.parent().get()
Or do an ancestor query for getting the partner object:
partner = Partner.query(ancestor=user_key).get()
Query
Don't use fetch() if you don't need it. Use queries as iterators.
Instead of:
return query.fetch(300)
just:
return query
And then use query as:
for something in query:
    blah
Relationships: Partner-Menu Item and Partner - Order
Why are you using entity groups? Ancestors are not necessarily meant for modeling 1-to-N relationships. They define entity groups, which are used for transactions, and they are useful in composition relationships (e.g. partner - user).
You can use a KeyProperty for the relationship (multivalued, i.e. repeated=True, or not, depending on the direction of the relationship).
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
No problem if you stop using ancestors in this relationship.
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic
There is no single systematic way of linking models. It depends on many factors: cardinality, the number of possible items on each side, the need for transactions, composition relationships, indexes, the complexity of future queries, denormalization for optimization, etc.
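Why a lookup by ID alone stops working once an entity has a parent can be shown with a toy key-value store. This is only a sketch of the idea that the full key path identifies an entity; the `datastore` dictionary and the `put` and `get` helpers are made up and are not the ndb API.

```python
# Toy store: the full key path (including ancestors) is the dictionary key.
datastore = {}


def put(key_path, entity):
    """Store an entity under its complete key path."""
    datastore[tuple(key_path)] = entity


def get(key_path):
    """Fetch by complete key path; returns None when nothing matches."""
    return datastore.get(tuple(key_path))


# An order stored as a child of a Partner, and one stored without a parent.
put(('Partner', 'acme', 'order', 42), {'total': 10})
put(('order', 7), {'total': 99})

child_by_full_path = get(('Partner', 'acme', 'order', 42))
child_by_id_only = get(('order', 42))  # misses: the parent is part of the key
root_by_id_only = get(('order', 7))
```

The child order is only found when the parent is supplied, which mirrors why the order has to be addressed through its Partner as long as it has one.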
Ok, I think the first step in cleaning this up is as follows:
At the top of your .py file, import all your models, so you don't have to keep using model.ModelName. That cleans up the code a bit: model.ModelName becomes ModelName.
First best practice in cleaning this up is to always use a capital letter as the first letter to name a class. A model name is a class. Above, you have mixed model names, like Partner, order, menuitem. It makes it hard to follow. Plus, when you use order as a model name, you may end up with conflicts. Above you redefined order as a variable twice. Use Order as the model name, and this_order as the lookup, and order_key as the key, to clear up some conflicts.
Ok, let's start there

Perform a SQL JOIN on Django models that are not related?

I have 2 Models, User (django.contrib.auth.models.User) and a model named Log. Both contain an "email" field. Log does not have a ForeignKey pointing to the User model. I'm trying to figure out how I can perform a JOIN on these two tables using the email field as the commonality.
There are basically 2 queries I want to be able to perform. A basic join for filtering
#Get all the User objects that have related Log objects with the level parameter set to 3.
User.objects.filter(log__level=3)
I'd also like to do some aggregates.
User.objects.all().annotate(Count('log'))
Of course, it would be nice to be able to do the reverse as well.
log = Log.objects.get(pk=3)
log.user...
Is there a way to do this with the ORM? Maybe something I can add to the model's Meta class to "activate" the relation?
Thanks!
You can add an extra method onto the User class, using MonkeyPatching/DuckPunching:
def logs(user):
    return Log.objects.filter(email=user.email)
from django.contrib.auth.models import User
User.logs = property(logs)
Now, you can query a User, and ask for the logs attached (for instance, in a view):
user = request.user
logs = user.logs
This type of process is common in the Ruby world, but seems to be frowned upon in Python.
(I came across the DuckPunching term the other day. It is based on Duck Typing, where we don't care what class something is: if it quacks like a duck, it is a duck as far as we are concerned. If it doesn't quack when you punch it, keep punching until it quacks).
why not use extra()?
example (untested):
User.objects.extra(
select={
'log_count': 'SELECT COUNT(*) FROM myapp_log WHERE myapp_log.email = auth_user.email'
},
)
for the User.objects.filter(log__level=3) portion here is the equivalent with extra (untested):
User.objects.extra(
select={
'log_level_3_count': 'SELECT COUNT(*) FROM myapp_log WHERE (myapp_log.email = auth_user.email) AND (myapp_log.level=3)'
},
).filter(log_level_3_count__gt=0)
Do the Log.email values always correspond to a User? If so, how about just adding a ForeignKey(User) to the Log object?
class Log(models.Model):
    # ...
    user = models.ForeignKey(User)
With the FK to User, it becomes fairly straightforward to find what you want:
User.objects.filter(log__level=3)
User.objects.all().annotate(Count('log'))
user.log_set.all()
user.log_set.count()
log.user
If the Log.email value does not have to belong to a user you can try adding a method to a model manager.
class LogManager(models.Manager):
    def for_user(self, user):
        return super(LogManager, self).get_query_set().filter(email=user.email)
class Log(models.Model):
    # ...
    objects = LogManager()
And then use it like this:
user = User.objects.get(pk=1)
logs_for_user = Log.objects.for_user(user)
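What the email-based join and the per-user count amount to can be sketched with plain Python data. This is not ORM code: the `users` and `logs` lists and the helpers `log_counts` and `users_with_level` are assumptions standing in for the two tables.

```python
from collections import Counter

users = [
    {'username': 'alice', 'email': 'alice@example.com'},
    {'username': 'bob', 'email': 'bob@example.com'},
]
logs = [
    {'email': 'alice@example.com', 'level': 3},
    {'email': 'alice@example.com', 'level': 1},
    {'email': 'unknown@example.com', 'level': 3},  # no matching user
]


def log_counts(users, logs):
    """Count logs per user by matching on the shared email value."""
    per_email = Counter(log['email'] for log in logs)
    return {user['username']: per_email[user['email']] for user in users}


def users_with_level(users, logs, level):
    """Users that have at least one log with the given level."""
    emails = {log['email'] for log in logs if log['level'] == level}
    return [user['username'] for user in users if user['email'] in emails]
```

Note that logs whose email matches no user are simply ignored here, which is the situation where a ForeignKey would not fit.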

A good data model for finding a user's favorite stories

Original Design
Here's how I originally had my Models set up:
class UserData(db.Model):
    user = db.UserProperty()
    favorites = db.ListProperty(db.Key) # list of story keys
    # ...
class Story(db.Model):
    title = db.StringProperty()
    # ...
On every page that displayed a story I would query UserData for the current user:
user_data = UserData.all().filter('user =', users.get_current_user()).get()
story_is_favorited = (story in user_data.favorites)
New Design
After watching this talk: Google I/O 2009 - Scalable, Complex Apps on App Engine, I wondered if I could set things up more efficiently.
class FavoriteIndex(db.Model):
    favorited_by = db.StringListProperty()
The Story Model is the same, but I got rid of the UserData Model. Each instance of the new FavoriteIndex Model has a Story instance as a parent, and each FavoriteIndex stores a list of user IDs in its favorited_by property.
If I want to find all of the stories that have been favorited by a certain user:
index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
story_keys = [k.parent() for k in index_keys]
stories = db.get(story_keys)
This approach avoids the serialization/deserialization that's otherwise associated with the ListProperty.
Efficiency vs Simplicity
I'm not sure how efficient the new design is, especially after a user decides to favorite 300 stories, but here's why I like it:
1. A favorited story is associated with a user, not with her user data.
2. On a page where I display a story, it's pretty easy to ask the story if it's been favorited (without calling up a separate entity filled with user data):
fav_index = FavoriteIndex.all().ancestor(story).get()
fav_of_current_user = users.get_current_user().user_id() in fav_index.favorited_by
3. It's also easy to get a list of all the users who have favorited a story (using the method in #2).
Is there an easier way?
Please help. How is this kind of thing normally done?
What you've described is a good solution. You can optimise it further, however: For each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry (or equivalently, as a child entity of a UserInfo entry), with the key name set to the user's unique ID. This way, you can determine if a user has favorited a story with a simple get:
UserFavorite.get_by_key_name(user_id, parent=a_story)
get operations are 3 to 5 times faster than queries, so this is a substantial improvement.
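The difference between a get by key and a query can be sketched with a dictionary. This is an in-memory illustration only: the `favorites` dictionary stands in for UserFavorite entities identified by their parent story and key name, and all three helper functions are made up.

```python
# (story_id, user_id) plays the role of "parent key + key name".
favorites = {}


def add_favorite(story_id, user_id):
    """Create the marker entity for this story/user pair."""
    favorites[(story_id, user_id)] = True


def is_favorited(story_id, user_id):
    """Direct lookup by the full key, comparable to a get."""
    return (story_id, user_id) in favorites


def is_favorited_by_scan(story_id, user_id):
    """Same answer found by scanning entries, comparable to running a query."""
    return any(key == (story_id, user_id) for key in favorites)


add_favorite('story-1', 'user-a')
add_favorite('story-2', 'user-b')
```

Both functions return the same result; the point of the key-name design is that the first form needs no search at all because the key can be built from values already at hand.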
I don't want to tackle your actual question, but here's a very small tip: you can replace this code:
if story in user_data.favorites:
    story_is_favorited = True
else:
    story_is_favorited = False
with this single line:
story_is_favorited = (story in user_data.favorites)
You don't even need to put the parentheses around the story in user_data.favorites if you don't want to; I just think that's more readable.
You can make the favorite index like a join on the two models
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.ReferenceProperty()
or
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.StringListProperty()
Then your query by user returns one FavoriteIndex object for each story the user has favorited.
You can also query by story to see how many users have favorited it.
You don't want to be scanning through anything unless you know it is limited to a small size.
With your new design you can look up whether a user has favorited a certain story with a query.
You don't need the UserFavorite class entities.
It is a keys_only query, so not as fast as a get(key) but faster than a normal query.
The FavoriteIndex entities all have the same key_name='favs'.
You can filter based on __key__.
a_story = ......
a_user_id = users.get_current_user().user_id()
favIndexKey = db.Key.from_path('Story', a_story.key().id_or_name(), 'FavoriteIndex', 'favs')
doesFavStory = FavoriteIndex.all(keys_only=True).filter('__key__ =', favIndexKey).filter('favorited_by =', a_user_id).get()
If you use multiple FavoriteIndex entities as children of a Story, you can use the ancestor filter:
doesFavStory = FavoriteIndex.all(keys_only=True).ancestor(a_story).filter('favorited_by =', a_user_id).get()
