How to model a 'Like' mechanism via ndb? - python

We are about to introduce a social aspect into our app, where users can like each other's events.
Getting this wrong would mean a lot of headaches later on, so I would love input from experienced GAE developers on how they would model it.
There seems to be a similar question here; however, the OP didn't provide any code to begin with.
Here are two models:
class Event(ndb.Model):
    # kind given as a string because User is defined below Event
    user = ndb.KeyProperty(kind='User', required=True)
    time_of_day = ndb.DateTimeProperty(required=True)
    notes = ndb.TextProperty()
    timestamp = ndb.FloatProperty(required=True)

class User(UserMixin, ndb.Model):
    firstname = ndb.StringProperty()
    lastname = ndb.StringProperty()
We need to know who has liked an event, in case the user wants to unlike it later. Hence we need to keep a reference. But how?
One way would be to add a repeated KeyProperty to the Event class.
class Event(ndb.Model):
    ...
    likers = ndb.KeyProperty(kind='User', repeated=True)  # property name is illustrative
That way, every user who likes this Event is stored in this list, and the length of the list is the number of likes for the event.
Theoretically that should work. However, this post from the creator of Python worries me:
Do not use repeated properties if you have more than 100-1000 values.
(1000 is probably already pushing it.) They weren't designed for such
use.
And back to square one. How am I supposed to design this?

A repeated property has a practical limit on the number of values (fewer than about 1000).
One recommended way around the limit is sharding:
class Event(ndb.Model):
    # use an integer to store the total number of likes
    likes = ndb.IntegerProperty()

class EventLikeShard(ndb.Model):
    # each shard stores at most 500 users
    event = ndb.KeyProperty(kind=Event)
    users = ndb.KeyProperty(kind=User, repeated=True)
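To make the sharding idea concrete, here is a minimal in-memory sketch in plain Python, with no datastore involved. The class and method names and the list-of-lists storage are my own assumptions: each inner list plays the role of one EventLikeShard.users, and likes plays the role of Event.likes. In a real app the shard lookup would be a query on EventLikeShard, and the counter and shard updates would need to happen in a transaction.

```python
SHARD_SIZE = 500  # mirrors "each shard stores at most 500 users" above


class ShardedLikes:
    """Keeps a like counter plus liker IDs split across fixed-size shards."""

    def __init__(self, shard_size=SHARD_SIZE):
        self.shard_size = shard_size
        self.likes = 0    # plays the role of Event.likes
        self.shards = []  # each inner list plays the role of EventLikeShard.users

    def _find(self, user_id):
        # Return the shard holding user_id, or None if the user has not liked.
        for shard in self.shards:
            if user_id in shard:
                return shard
        return None

    def like(self, user_id):
        if self._find(user_id) is not None:
            return False  # already liked: keep likes idempotent
        # Reuse the first shard with free space, otherwise open a new shard.
        for shard in self.shards:
            if len(shard) < self.shard_size:
                shard.append(user_id)
                break
        else:
            self.shards.append([user_id])
        self.likes += 1
        return True

    def unlike(self, user_id):
        shard = self._find(user_id)
        if shard is None:
            return False
        shard.remove(user_id)
        self.likes -= 1
        return True
```

Keeping the counter separate means displaying "N likes" never requires loading the shards; only like/unlike touches them.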
If the number of likers is more than 1000 but less than 100k, a simpler way is:
class Event(ndb.Model):
    likers = ndb.PickleProperty(compressed=True)
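The sketch below approximates what a compressed pickle property stores, using the standard library's pickle and zlib modules; zlib-style compression is my assumption here, and the helper names are hypothetical. It shows that the whole set of likers becomes one opaque blob.

```python
import pickle
import zlib


def pack_likers(likers):
    """Pickle the liker IDs, then zlib-compress the bytes into one blob."""
    return zlib.compress(pickle.dumps(sorted(likers), protocol=2))


def unpack_likers(blob):
    """Reverse pack_likers and return the liker IDs as a set."""
    return set(pickle.loads(zlib.decompress(blob)))


likers = {"user-%d" % i for i in range(50000)}
blob = pack_likers(likers)
```

Because the blob is opaque, the datastore cannot index it, so you could no longer query for "all events liked by user X". Every like or unlike also rewrites the whole blob, which must still fit within the datastore's 1 MB entity size limit.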

Use another model, "Like", that holds references to the user and the event.
This is the classic relational way of representing a many-to-many relationship: every entity stays separate, and you can easily add, remove, and count likes.

I would recommend the usual many-to-many relationship using an EventUser model, given that the design seems to require an unlimited number of users linking to an event. The only tricky part is that you must ensure that the event/user combination is unique, which can be done using _pre_put_hook. Keeping a likes counter, as proposed by @lucemia, is indeed a good idea.
You would then capture the like action using a boolean, or you can make it a bit more flexible by including a repeated string property of actions. This way, you could also capture actions such as signed-up or attended.
Here is some sample code:
class EventUser(ndb.Model):
    event = ndb.KeyProperty(kind=Event, required=True)
    user = ndb.KeyProperty(kind=User, required=True)
    actions = ndb.StringProperty(repeated=True)

    # make sure the event/user combination is unique
    def _pre_put_hook(self):
        cur_key = self.key  # None for an entity that has never been put
        for entry in EventUser.query(EventUser.user == self.user,
                                     EventUser.event == self.event):
            # If cur_key exists, the user is performing an update
            if cur_key is not None and cur_key.id():
                if cur_key == entry.key:
                    continue
                raise ValueError("User '%s' is a duplicated entry." % (self.user,))
            # If adding
            raise ValueError("User Add '%s' is a duplicated entry." % (self.user,))
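A note on the uniqueness check above: it relies on a query, and datastore queries that are not ancestor queries are eventually consistent, so two near-simultaneous puts for the same pair could both pass it. An alternative (my suggestion, not part of the answer) is to derive the EventUser key name from the event and user, for example with EventUser.get_or_insert(...), so the key itself enforces uniqueness. The dict-backed sketch below illustrates that idea together with the actions list and a likes counter; the key format and class are hypothetical.

```python
def event_user_key(event_id, user_id):
    """Build one stable key per event/user pair (hypothetical format)."""
    return "%s:%s" % (event_id, user_id)


class EventUserStore:
    """Dict-backed stand-in for EventUser entities plus a likes counter."""

    def __init__(self):
        self.actions = {}      # pair key -> set of action strings
        self.like_counts = {}  # event_id -> number of "liked" actions

    def add_action(self, event_id, user_id, action):
        key = event_user_key(event_id, user_id)
        current = self.actions.setdefault(key, set())
        if action in current:
            return False  # repeating an action is a no-op, not a duplicate row
        current.add(action)
        if action == "liked":
            self.like_counts[event_id] = self.like_counts.get(event_id, 0) + 1
        return True

    def remove_action(self, event_id, user_id, action):
        key = event_user_key(event_id, user_id)
        current = self.actions.get(key, set())
        if action not in current:
            return False
        current.discard(action)
        if action == "liked":
            self.like_counts[event_id] -= 1
        if not current:
            del self.actions[key]  # drop the pair once no actions remain
        return True

    def has_action(self, event_id, user_id, action):
        return action in self.actions.get(event_user_key(event_id, user_id), ())
```

Because there is exactly one record per pair, "unlike" is just removing "liked" from that record, and no duplicate check is needed at write time.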


Is there a better way to design the Message model?

I have a Message model:
class Message(models.Model):
    """
    message
    """
    title = models.CharField(max_length=64, help_text="title")
    content = models.CharField(max_length=1024, help_text="content")
    is_read = models.BooleanField(default=False, help_text="whether message is read")
    create_user = models.ForeignKey(User, related_name="messages", help_text="creator")
    receive_user = models.CharField(max_length=1024, help_text="receive users' id")

    def __str__(self):
        return self.title

    def __unicode__(self):
        return self.title
You see, I use models.CharField to store the users' ids, so I know which users should receive this message.
I don't know whether this design is good, or whether there is a better way to do it.
I have considered using a ManyToManyField, but if there are many users, creating one message in the admin would create as many messages as there are users, so I don't think this is a good idea.
I would definitely use ManyToManyField for your receive_user. You're going to find that keeping a CharField updated and sanitised with user_ids is going to be a nightmare that will involve re-implementing vast swathes of existing Django functionality.
I'm not sure I understand your concern about using ManyToManyField: admin users will be able to select which users are recipients of the message; it doesn't automatically create a message for each user.
Edit: Also, depending on which version of Python you're using (2 or 3), you only need one of __str__ or __unicode__:
__unicode__ is the method to use for Python 2, __str__ for Python 3. See this answer for more details.
The direction in which I would change your Message model really depends on your needs.
General Changes
Assuming you never need an index on the content field:
I would change content to a TextField (also because a length of 1024 is already too large for a proper index on MySQL, for example); https://docs.djangoproject.com/en/1.11/ref/databases/#textfield-limitations has more information on this topic.
I would probably increase the size of the title field, just because that seems convenient to me.
1. Simple -> One User to One User
A single is_read field indicates a one-to-one message:
I would change the receiver to also be a ForeignKey and adapt the related names of the sender and receiver fields to represent these connections, e.g. sent_messages and received_messages (related names must be valid Python identifiers, so use underscores rather than hyphens).
Like @sebastian-fleck already suggested, I'd also change the read field to a datetime field. It only changes your querysets from filter(read=True) to filter(read__isnull=False) to get the same results, and you could create a property representing the read state as a boolean for convenience, e.g.:
@property
def read(self):
    return bool(self.read_datetime)  # assuming the datetime field is named read_datetime
2. More Complex: One User to Multiple Users
This can get a lot more complex; here is the least complex solution I could think of.
Conditions:
- there are only messages and no conversation-like structure
- a message should have a read status for every receiver
(I removed the help_text descriptions for an easier overview and changed the models according to my suggestions above; this is based on my experience and the business needs I assumed from your example and answers.)
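from django.contrib.auth.models import User
from django.db import models
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Message(models.Model):
    title = models.CharField(max_length=160)
    content = models.TextField()
    create_user = models.ForeignKey(User, related_name="sent_messages")
    # 'MessageReceiver' is given as a string because that class is defined below
    receive_users = models.ManyToManyField(User, through='MessageReceiver')

    def __str__(self):
        return 'Message: %s' % self.title


class MessageReceiver(models.Model):
    # None until the receiver reads the message, then the time it was read
    is_read = models.DateTimeField(null=True, blank=True)
    receiver = models.ForeignKey(User)
    message = models.ForeignKey(Message)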
This structure uses the power of ManyToMany with a custom through model; check this out, it's very powerful: https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.ManyToManyField.through.
tldr: we want every receiver to have a read status, so we modeled this in a separate object
Longer version: we use a custom ManyToMany through model to have a separate read status for every receiver. This means we need to change some parts of our code to work with the many-to-many structure, e.g. if we want to know whether a message was read by all receivers:
def did_all_receiver_read_the_message(message):
    # the read status lives on the through model, reached via messagereceiver_set
    unread_count = message.messagereceiver_set.filter(is_read__isnull=True).count()
    return unread_count == 0
If we want to know whether a specific user read a specific message:
def did_user_read_this_message(user, message):
    receiver = message.messagereceiver_set.get(receiver=user)
    return bool(receiver.is_read)
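Stripped of the ORM, the two checks above express simple rules over the receiver rows. This framework-free sketch mirrors them with a hypothetical Receipt dataclass standing in for MessageReceiver, where is_read holds the read time or None.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Receipt:
    """Plain-Python stand-in for one MessageReceiver row (hypothetical)."""
    receiver: str
    is_read: Optional[datetime] = None  # None means "not read yet"


def all_receivers_read(receipts: List[Receipt]) -> bool:
    # True only if no receiver row is still unread.
    return all(r.is_read is not None for r in receipts)


def user_read(receipts: List[Receipt], user: str) -> bool:
    for r in receipts:
        if r.receiver == user:
            return r.is_read is not None
    raise LookupError("%s is not a receiver of this message" % user)
```

Note that all() returns True for an empty list, so a message without receivers counts as fully read; decide whether that is what you want.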
3. Conversations + Messages + Participants
This would exceed my time limit, but here are some short hints:
- Conversation holds everything together
- Message is written by a Participant and holds a created timestamp
- Participant allows access to a conversation and links a User to the Conversation object
- the Participant holds a last_read timestamp, which can be used to calculate whether a message was read by comparing it with the messages' created timestamps (-> annoyingly complex part & milliseconds are important)
Everything else would probably need to be adapted to your specific business needs. This scenario is probably the most flexible, but it's a lot of work (based on personal experience) and adds quite a bit of complexity to your architecture; I only recommend it if it's really, really needed ^^.
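Here is a rough sketch of the last_read idea from option 3, in plain Python: a message is unread for a participant if it was created after that participant's last_read timestamp. The function name and the list-of-datetimes input are assumptions for illustration.

```python
from datetime import datetime, timedelta


def unread_messages(created_times, last_read):
    """Return the creation times of messages a participant has not read yet.

    last_read is None when the participant has never opened the conversation.
    """
    if last_read is None:
        return list(created_times)
    # Strictly newer than last_read means "arrived after the last visit".
    return [t for t in created_times if t > last_read]


start = datetime(2017, 5, 1, 12, 0, 0)
created = [start, start + timedelta(milliseconds=1), start + timedelta(seconds=5)]
```

The strict > comparison is where precision matters: if timestamps were truncated to whole seconds, the message created 1 ms after last_read would compare equal to it and be wrongly treated as read.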
Disclaimer:
This could be an overall structure. Most design decisions I made for the examples are based on assumptions; I could only mention some of them or the text would be too long, but feel free to ask.
Please excuse any typos and errors; I didn't have the chance to run the code.

GAE Datastore ndb models accessed in 5 different ways

I run an online marketplace. I don't know the best way to access NDB models. I'm afraid it's a real mess and I really don't know which way to turn. If you don't have time for a full response, I'm happy to read an article on NDB best practices.
I have these classes, which are interlinked in different ways:
User(webapp2_extras.appengine.auth.models.User) controls seller logins
Partner(ndb.Model) contains information about sellers
menuitem(ndb.Model) contains information about items on menu
order(ndb.Model) contains buyer information & information about an order (all purchases are "guest" purchases)
Preapproval(ndb.Model) contains payment information saved from PayPal
How they're linked.
User - Partner
A 1-to-1 relationship. Both have "email address" fields. If these match, I can retrieve the user from the partner or vice versa. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
Where in the Partner model we have:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
Partner - menuitem
menuitems are children of Partner. Created like so:
myItem = model.menuitem(parent=model.partner_key(partner_name))
menuitems are referenced like this:
menuitems = model.menuitem.get_by_partner_name(partner.name)
where get_by_partner_name is this:
@classmethod
def get_by_partner_name(cls, partner_name):
    query = cls.query(
        ancestor=partner_key(partner_name)).order(ndb.GenericProperty("itemid"))
    return query.fetch(300)
and where partner_key() is a function just floating at the top of the model.py file:
def partner_key(partner_name=DEFAULT_PARTNER_NAME):
    return ndb.Key('Partner', partner_name)
Partner - order
Each Partner can have many orders. order has a parent that is Partner. How an order is created:
partner_name = self.request.get('partner_name')
partner_k = model.partner_key(partner_name)
myOrder = model.order(parent=partner_k)
How an order is referenced:
myOrder_k = ndb.Key('Partner', partnername, 'order', ordernumber)
myOrder = myOrder_k.get()
and sometimes like so:
order = model.order.get_by_name_id(partner.name, ordernumber)
(where in model.order we have:
@classmethod
def get_by_name_id(cls, partner_name, id):
    return ndb.Key('Partner', partner_name, 'order', int(id)).get()
)
This doesn't feel particularly efficient, particularly as I often have to look up the partner in the datastore just to pull up an order. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
order = model.order.get_by_name_id(partner.name, ordernumber)
I have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a Partner parent stops that from working.
Preapproval - order.
a 1-to-1 relationship. Each order can have a 'Preapproval'. Linkage: a field in the Preapproval class: order = ndb.KeyProperty(kind=order).
creating a Preapproval:
item = model.Preapproval( order=myOrder.key, ...)
accessing a Preapproval:
preapproval = model.Preapproval.query(model.Preapproval.order == order.key).get()
This seems like the easiest method to me.
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic.
User - Partner
You could replace:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
with:
@classmethod
def get_by_email(cls, partner_email):
    # get() returns the first match, or None if there is no match
    return cls.query(Partner.email == partner_email).get()
But because of transaction issues, it is better to use entity groups: User should be the parent of Partner.
In this case, instead of using get_by_email, you can get the user without a query:
user = partner.key.parent().get()
Or do an ancestor query for getting the partner object:
partner = Partner.query(ancestor=user_key).get()
Query
Don't use fetch() if you don't need it. Use queries as iterators.
Instead of:
return query.fetch(300)
just:
return query
And then use query as:
for something in query:
    pass  # process each entity here
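The difference between fetch() and iteration can be illustrated without the datastore. In this sketch, fake_query is a hypothetical generator standing in for a query, and fetch mimics query.fetch(limit) by materializing everything up front, while the for loop only pulls results until it stops. (Real ndb iterators fetch results in batches rather than one at a time, but the principle is the same.)

```python
import itertools


def fake_query(total, produced):
    """Stand-in for a datastore query (hypothetical): yields results lazily
    and appends each one to `produced` so we can see how much was pulled."""
    for i in range(total):
        produced.append(i)
        yield i


def fetch(query, limit):
    # Mirrors query.fetch(limit): up to `limit` results are materialized at once.
    return list(itertools.islice(query, limit))


eager_log, lazy_log = [], []
items = fetch(fake_query(1000, eager_log), 300)

for item in fake_query(1000, lazy_log):
    if item == 9:
        break  # stop early: only the first ten results were ever produced
```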
Relationships: Partner-Menu Item and Partner - Order
Why are you using entity groups? Ancestors are not (necessarily) used for modeling 1-to-N relationships. Ancestors are used for transactions, defining entity groups. They are useful in composition relationships (e.g. partner - user).
You can use a KeyProperty for the relationship (multi-valued, i.e. repeated=True, or not, depending on the direction of the relationship).
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
No problem if you stop using ancestors in this relationship.
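Why get_by_id seems to fail: a datastore key is the whole path from the root entity, so an order stored under a Partner is not found by its id alone. The tuple-based sketch below models that; make_key, get_by_id, and the dict "datastore" are stand-ins, not the ndb API. In ndb itself, Model.get_by_id also accepts a parent argument, so order.get_by_id(ordernumber, parent=partner_k) works when you already have the parent key.

```python
def make_key(*path):
    """Flatten (kind, id) pairs, root first, into one tuple, e.g.
    make_key(("Partner", "acme"), ("order", 42)) -> ("Partner", "acme", "order", 42)."""
    flat = ()
    for kind, ident in path:
        flat += (kind, ident)
    return flat


datastore = {}
# An order stored as a child of a Partner: the parent path is part of its key.
datastore[make_key(("Partner", "acme"), ("order", 42))] = {"buyer": "bob"}
# An order stored without a parent: its key is just ("order", 43).
datastore[make_key(("order", 43))] = {"buyer": "eve"}


def get_by_id(kind, ident, parent=None):
    """Look up an entity by id; the parent key must be supplied if there is one."""
    key = (parent or ()) + (kind, ident)
    return datastore.get(key)
```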
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic
There is no systematic way of linking models. It depends on many factors: cardinality, the number of possible items on each side, the need for transactions, composition relationships, indexes, the complexity of future queries, denormalization for optimization, etc.
Ok, I think the first step in cleaning this up is as follows:
At the top of your .py file, import all your models, so you don't have to keep using model.ModelName. That cleans up the code a bit: model.ModelName becomes ModelName.
The first best practice in cleaning this up is to always start a class name with a capital letter. A model name is a class. Above, you have mixed model names, like Partner, order, menuitem, which makes the code hard to follow. Plus, when you use order as a model name, you may end up with conflicts: above, you redefined order as a variable twice. Use Order as the model name, this_order for the looked-up entity, and order_key for the key, to clear up these conflicts.
Ok, let's start there.

appengine many to many field update value and lookup efficiently

I am using App Engine with Python 2.7 and the webapp2 framework. I am not using ndb.Model.
I have the following model:
class Story(db.Model):
    name = db.StringProperty()

class UserProfile(db.Model):
    name = db.StringProperty()
    user = db.UserProperty()

class Tracking(db.Model):
    user_profile = db.ReferenceProperty(UserProfile)
    story = db.ReferenceProperty(Story)
    upvoted = db.BooleanProperty()
    flagged = db.BooleanProperty()
A user can upvote and/or flag a story but only once. Hence I came up with the above model.
Now, when a user clicks the upvote link, I check the datastore to see whether the user has already voted on the story, so I do the following:
Get the user instance by its id: up = db.get(db.Key.from_path('UserProfile', uid))
Then get the story instance: s_ins = db.get(db.Key.from_path('Story', uid))
Next, check whether a Tracking entity for these two exists: if it does, don't allow voting; otherwise, allow the user to vote and update the Tracking instance.
What is the most convenient way to fetch a Tracking instance given an id(db.key().id()) of user_profile and story?
What is the most convenient way to save a Tracking model having given a user profile id and an story id?
Is there a better way to implement tracking?
You can try tracking using lists of keys versus having a separate entry for track/user/story:
class Story(db.Model):
    name = db.StringProperty()

class UserProfile(db.Model):
    name = db.StringProperty()
    user = db.UserProperty()

class Tracking(db.Model):
    story = db.ReferenceProperty(Story)
    upvoted = db.ListProperty(db.Key)
    flagged = db.ListProperty(db.Key)
So when you want to see if a user upvoted for a given story:
Tracking.all().filter('story =', db.Key.from_path('Story', uid)).filter('upvoted =', db.Key.from_path('UserProfile', uid)).get(keys_only=True)
Now, the only problem here is that the upvoted/flagged lists can't grow too large (I think the limit is 5000), so you'd have to make a class to manage this (that is, when adding to the upvoted/flagged lists, detect whether X entries exist and, if so, start a new Tracking object to hold additional values). You will also have to make this transactional, and with the High Replication Datastore you have a threshold of about 1 write per second per entity group. This may or may not be an issue depending on your expected use case. A way around the write threshold would be to implement upvotes/flags using pull queues and have a cron job that pulls and batch-updates Tracking objects as needed.
This method has its pros and cons. The most obvious cons are the ones I just listed. The pros, however, may be worth it. You can get a full list of users who upvoted/flagged a story from a single list (or multiple lists, depending on how popular the story is). You can get a full list of users with far fewer queries to the datastore. This method should also take less storage, index, and metadata space. Additionally, adding a user to a tracking object will be cheaper: instead of writing a new object plus 2 writes for each property, you would just be charged 1 write for the object plus 2 writes for the entry to the list (9 vs 3 writes for adding users to an already-tracked story, or 9 vs 7 for untracked stories).
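The pull-queue workaround mentioned above decouples the user's request from the datastore write: requests only enqueue votes, and a periodic job applies them in batches. The in-memory sketch below shows the batching and the once-per-user rule; VoteBatcher, the deque standing in for a pull queue, and the write counter are hypothetical, not the Task Queue API.

```python
from collections import defaultdict, deque


class VoteBatcher:
    """Votes are queued by requests and applied later by run_batch()."""

    def __init__(self):
        self.queue = deque()             # stands in for a pull queue
        self.upvoted = defaultdict(set)  # story_id -> ids of users who upvoted
        self.writes = 0                  # total story writes performed so far

    def enqueue(self, story_id, user_id):
        # Called from the request handler: cheap, no datastore write.
        self.queue.append((story_id, user_id))

    def run_batch(self):
        """Drain the queue and write each affected story at most once."""
        pending = defaultdict(set)
        while self.queue:
            story_id, user_id = self.queue.popleft()
            pending[story_id].add(user_id)
        written = 0
        for story_id, users in pending.items():
            new_votes = users - self.upvoted[story_id]  # each user votes only once
            if new_votes:
                self.upvoted[story_id] |= new_votes
                written += 1
        self.writes += written
        return written
```

However many votes arrive between runs, each story is written at most once per batch, which is what keeps you under a per-entity-group write rate.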
What you propose sounds reasonable.
Don't use the App Engine-generated key for Tracking. Because the story/user combination should be unique, create your own key name as a combination of the story and user IDs, something like:
tracking = Tracking.get_or_insert(str(story.key().id()) + "-" + str(user.key().id()), **params)
If you know the story/user, then you can always fetch the tracking by key name.
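A dict-backed sketch of the key-name approach, assuming integer IDs: since numeric IDs contain no "-", each story/user pair maps to a distinct name (story 1 with user 23 gives "1-23", story 12 with user 3 gives "12-3"). In the real db API, get_or_insert wraps its get and put in a transaction; the hypothetical FakeKind below only mimics the get-or-create behaviour, not the transaction.

```python
def tracking_key_name(story_id, user_id):
    """One key name per story/user pair, e.g. tracking_key_name(7, 3) == "7-3"."""
    return "%s-%s" % (story_id, user_id)


class FakeKind:
    """Dict-backed stand-in for a model kind (hypothetical, not the db API)."""

    def __init__(self):
        self.entities = {}

    def get_or_insert(self, key_name, **params):
        # Create the entity only if the key name is new; otherwise return the
        # existing one unchanged, ignoring the new params.
        if key_name not in self.entities:
            self.entities[key_name] = dict(params)
        return self.entities[key_name]

    def get_by_key_name(self, key_name):
        return self.entities.get(key_name)
```

Checking "has this user already voted?" then becomes a direct lookup by key name rather than a query.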

Django - AutoField with regards to a foreign key

I have a model with a unique integer that needs to increment with regards to a foreign key, and the following code is how I currently handle it:
from django.db import models


class MyModel(models.Model):
    business = models.ForeignKey(Business)
    number = models.PositiveIntegerField()
    spam = models.CharField(max_length=255)

    class Meta:
        unique_together = (('number', 'business'),)

    def save(self, *args, **kwargs):
        if self.pk is None:  # new instances only
            try:
                highest_number = MyModel.objects.filter(business=self.business).order_by('-number')[0].number
                self.number = highest_number + 1
            except IndexError:  # first MyModel instance for this business
                self.number = 1
        super(MyModel, self).save(*args, **kwargs)
I have the following questions regarding this:
Multiple people can create MyModel instances for the same business, all over the internet. Is it possible for 2 people creating MyModel instances at the same time, and .count() returns 500 at the same time for both, and then both try to essentially set self.number = 501 at the same time (raising an IntegrityError)? The answer seems like an obvious "yes, it could happen", but I had to ask.
Is there a shortcut, or "Best way" to do this, which I can use (or perhaps a SuperAutoField that handles this)?
I can't just slap a while model_not_saved: try: ... except IntegrityError: loop in, because other constraints in the model could lead to an endless loop, and a disaster worse than Chernobyl (maybe not quite that bad).
You want that constraint at the database level. Otherwise you're going to eventually run into the concurrency problem you discussed. The solution is to wrap the entire operation (read, increment, write) in a transaction.
Why can't you use an AutoField instead of a PositiveIntegerField?
number = models.AutoField()
However, in this case number is almost certainly going to equal yourmodel.id, so why not just use that?
Edit:
Oh, I see what you want: a number field that counts up separately for each business, rather than across all MyModel instances.
I would still recommend just using the id field if you can, since it's certain to be unique. If you absolutely don't want to do that (maybe you're showing this number to users), then you will need to wrap your save method in a transaction.
You can read more about transactions in the docs:
http://docs.djangoproject.com/en/dev/topics/db/transactions/
If you're just using this to count how many instances of MyModel have a FK to Business, you should do that as a query rather than trying to store a count.
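The answer points to transactions; the sketch below shows the read, increment, and write happening under a single write lock. It uses the standard library's sqlite3 module instead of Django's ORM (a substitution for illustration), and the table and function names are assumptions; the UNIQUE constraint plays the role of unique_together. In Django, a common approach is transaction.atomic() combined with select_for_update() on the related Business row, on databases that support row locking.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE mymodel ("
        " id INTEGER PRIMARY KEY,"
        " business_id INTEGER NOT NULL,"
        " number INTEGER NOT NULL,"
        " spam TEXT,"
        " UNIQUE (number, business_id))"  # same role as unique_together
    )


def insert_numbered(conn, business_id, spam):
    """Read the highest number, increment, and insert inside one write
    transaction, so no other writer can slip in between the read and the write."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (highest,) = conn.execute(
            "SELECT COALESCE(MAX(number), 0) FROM mymodel WHERE business_id = ?",
            (business_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO mymodel (business_id, number, spam) VALUES (?, ?, ?)",
            (business_id, highest + 1, spam),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return highest + 1


conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
create_schema(conn)
```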

A good data model for finding a user's favorite stories

Original Design
Here's how I originally had my Models set up:
class UserData(db.Model):
    user = db.UserProperty()
    favorites = db.ListProperty(db.Key)  # list of story keys
    # ...

class Story(db.Model):
    title = db.StringProperty()
    # ...
On every page that displayed a story I would query UserData for the current user:
user_data = UserData.all().filter('user =', users.get_current_user()).get()
story_is_favorited = (story in user_data.favorites)
New Design
After watching this talk: Google I/O 2009 - Scalable, Complex Apps on App Engine, I wondered if I could set things up more efficiently.
class FavoriteIndex(db.Model):
    favorited_by = db.StringListProperty()
The Story model is the same, but I got rid of the UserData model. Each instance of the new FavoriteIndex model has a Story instance as its parent, and each FavoriteIndex stores a list of user IDs in its favorited_by property.
If I want to find all of the stories that have been favorited by a certain user:
index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
story_keys = [k.parent() for k in index_keys]
stories = db.get(story_keys)
This approach avoids the serialization/deserialization that's otherwise associated with the ListProperty.
Efficiency vs Simplicity
I'm not sure how efficient the new design is, especially after a user decides to favorite 300 stories, but here's why I like it:
1. A favorited story is associated with a user, not with her user data.
2. On a page where I display a story, it's pretty easy to ask the story whether it's been favorited (without calling up a separate entity filled with user data):
fav_index = FavoriteIndex.all().ancestor(story).get()
fav_of_current_user = users.get_current_user().user_id() in fav_index.favorited_by
3. It's also easy to get a list of all the users who have favorited a story (using the method in #2).
Is there an easier way?
Please help. How is this kind of thing normally done?
What you've described is a good solution. You can optimise it further, however: For each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry (or equivalently, as a child entity of a UserInfo entry), with the key name set to the user's unique ID. This way, you can determine if a user has favorited a story with a simple get:
UserFavorite.get_by_key_name(user_id, parent=a_story)
get operations are 3 to 5 times faster than queries, so this is a substantial improvement.
I don't want to tackle your actual question, but here's a very small tip: you can replace this code:
if story in user_data.favorites:
    story_is_favorited = True
else:
    story_is_favorited = False
with this single line:
story_is_favorited = (story in user_data.favorites)
You don't even need to put the parentheses around the story in user_data.favorites if you don't want to; I just think that's more readable.
You can make the favorite index act like a join between the two models:
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.ReferenceProperty()
or
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.StringListProperty()
Then a query by user returns one FavoriteIndex object for each story the user has favorited.
You can also query by story to see how many users have favorited it.
You don't want to be scanning through anything unless you know it is limited to a small size.
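To illustrate what the join-style FavoriteIndex gives you, here is a plain-Python sketch with one record per favorite. FavoriteRow and the two helpers are hypothetical; in the datastore the two lookups would be indexed equality filters on user and on story, and the list scans here stand in for those queries only for illustration.

```python
from collections import namedtuple

# One record per favorite, like a FavoriteIndex entity with user and story fields.
FavoriteRow = namedtuple("FavoriteRow", ["user", "story"])

favorites = [
    FavoriteRow("alice", "story-1"),
    FavoriteRow("alice", "story-2"),
    FavoriteRow("bob", "story-1"),
]


def stories_favorited_by(rows, user):
    # Datastore equivalent: a query filtered on user == ...
    return [row.story for row in rows if row.user == user]


def favorite_count(rows, story):
    # Datastore equivalent: a query filtered on story == ..., then counted
    return sum(1 for row in rows if row.story == story)
```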
With your new design, you can look up whether a user has favorited a certain story with a query.
You don't need the UserFavorite entities.
It is a keys_only query, so it is not as fast as a get(key), but it is faster than a normal query.
The FavoriteIndex entities all have the same key_name='favs'.
You can filter based on __key__.
a_story = ......
a_user_id = users.get_current_user().user_id()
favIndexKey = db.Key.from_path('Story', a_story.key().id_or_name(), 'FavoriteIndex', 'favs')
doesFavStory = FavoriteIndex.all(keys_only=True).filter('__key__ =', favIndexKey).filter('favorited_by =', a_user_id).get()
If you use multiple FavoriteIndex entities as children of a Story, you can use the ancestor filter:
doesFavStory = FavoriteIndex.all(keys_only=True).ancestor(a_story).filter('favorited_by =', a_user_id).get()
