GAE: from RDBMS to NDB problems - python

I'm learning to work with GAE. I've read a lot of papers, all of Google's NDB docs and some questions here. I'm so used to SQL that changing the way I've thought for the last 20 years to NoSQL is a little hard for me, and all the different solutions given here drive me crazy.
I have the following simple structure:
BOOKS that can have CHAPTERS
CHAPTERS that can have VOTES
For example, the book "Sentinel" can have 3 chapters, with 0, 8 and 12 votes respectively.
In traditional SQL I would just create foreign keys from VOTES to CHAPTERS and BOOKS, and from CHAPTERS to BOOKS.
These are my models:
class Book(ndb.Model):
    title = ndb.StringProperty(required=True)
    author = ndb.StringProperty(required=True)
    created = ndb.DateTimeProperty(auto_now_add=True)
    # Define a default ancestor for all the books
    @staticmethod
    def bookKey(group='books'):
        return ndb.Key(Book, group)
    # Search all
    @classmethod
    def getAll(cls):
        q = Book.query(ancestor=cls.bookKey())
        q = q.order(Book.title)
        books = q.fetch(100)
        return books
    @classmethod
    def byId(cls, id):
        book = Book.get_by_id(long(id), cls.bookKey())
        return book
    # Get all the Chapters for a book (ancestor must be a Key, hence self.key)
    def getChapters(self):
        chapters = Chapter.query(ancestor=self.key).order(Chapter.number).fetch(100)
        return chapters
class Chapter(ndb.Model):
    """ All chapters that a book has """
    title = ndb.StringProperty(required=True)
    number = ndb.IntegerProperty(default=1)
    created = ndb.DateTimeProperty(auto_now_add=True)
    book = ndb.KeyProperty(kind=Book)
    # Search by Book (parent)
    @classmethod
    def byBook(cls, book, limit=100):
        chapter = book.getChapters()
        return chapter
    # Search by id (book is the parent Book's key)
    @classmethod
    def byId(cls, id, book):
        return Chapter.get_by_id(long(id), parent=book)
class Vote(ndb.Model):
    """ All votes that a book-chapter has """
    value = ndb.IntegerProperty(default=1)
    book = ndb.KeyProperty(kind=Book)
    chapter = ndb.KeyProperty(kind=Chapter)
Well, my questions are:
Is this approach correct?
Is the bookKey() function I've created a good way to have a "dummy ancestor", so that all entities use ancestors?
Must I define in the Vote class a reference to a book and to a chapter, as if they were foreign keys (as I think I've done)?
Is the way I retrieve the chapters of a book well designed? I mean, in the Chapter class the byBook function uses a function from the Book class. Or should I avoid using functions from other entities to keep the code cleaner?
How can I retrieve all the votes for a chapter?
What are the right ways to get the sum of all the votes for a specific chapter and for a specific book?
Finally, I will display a single table with all my books. In the table I want to have the sum of all the votes for each book. For example:
Name | Votes
Sentinel | 30 votes
The witch | 4 votes
How can I get this info, specifically the vote counts?
Then, when clicking on the book name, I want to show all its chapters (I suppose that is when I should use the byBook function on the Chapter model, right?).
What GQL do I need to obtain this kind of data?
Thanks in advance.

Good start. GAE's datastore is kinda confusing. Because it's schemaless, I've found that dealing with entities is much more akin to dealing with objects/data structures in memory than dealing with database tables.
Here are a few things I'd do differently:
It appears you are creating all your books under a single ancestor. That's a bad idea for performance: all the books end up in one entity group, and an entity group only sustains roughly one write per second. Unless there is some transactional operation you need to do on a group of books that's not in your current code, this is not the right approach.
From the Book.getChapters() function it appears that you want to make a book the ancestor of a bunch of chapters. This is probably a good use of an ancestor. I don't see the code where you create chapters, but make sure the appropriate book is specified as the ancestor.
I'd simply store the votes as an attribute (for example, an integer count) inside a book or chapter. There's no need to make them a separate kind that you need to issue extra queries on.
If the number of chapters per book is limited, I'd consider using a StructuredProperty for the chapters. StructuredProperties are essentially structured data within a parent entity (Book). You'd be limited by the maximum size of the Book entity (1MB), but if it fits, it'll save you the cost of doing extra queries, since you wouldn't be querying on chapters without the appropriate book anyway.
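The two suggestions above (votes kept as a counter, chapters embedded in the book) can be sketched without the datastore. This is plain Python, not NDB: the dataclasses stand in for a Book entity holding a repeated StructuredProperty of chapters, and `cast_vote` and `vote_table` are hypothetical helpers. The point is that the "Name | Votes" table needs no extra query or aggregation at read time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chapter:
    # Stands in for a StructuredProperty stored inside the Book entity.
    title: str
    number: int
    votes: int = 0  # vote count kept as a plain property, not a separate kind


@dataclass
class Book:
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    @property
    def total_votes(self) -> int:
        # Chapters live inside the book, so summing needs no extra query.
        return sum(c.votes for c in self.chapters)


def cast_vote(book: Book, chapter_number: int, value: int = 1) -> None:
    """Hypothetical helper: increment one chapter's vote count.

    In the datastore this read-modify-write would belong in a transaction.
    """
    for chapter in book.chapters:
        if chapter.number == chapter_number:
            chapter.votes += value
            return
    raise KeyError(chapter_number)


def vote_table(books: List[Book]) -> List[str]:
    # One row per book, in the "Name | Votes" layout from the question.
    return [f"{b.title} | {b.total_votes} votes"
            for b in sorted(books, key=lambda b: b.title)]


books = [
    Book("Sentinel", [Chapter("One", 1, 0), Chapter("Two", 2, 8), Chapter("Three", 3, 12)]),
    Book("The witch", [Chapter("One", 1, 4)]),
]
cast_vote(books[0], 1)
for row in vote_table(books):
    print(row)  # "Sentinel | 21 votes", then "The witch | 4 votes"
```

The trade-off is the usual denormalization one: writes do a little more work so that the listing page only reads.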

Related

(Django) How to add a queryset count for dynamic objects in a list to a template?

I am building a website with django as a place to store my university notes.
I have a notes model which is linked by foreignkey to a categories model.
On my homepage I use a for loop to post a link to each category page, but in brackets I also wanted to display the number of published notes in that category.
For example:
Biology (6)
Chemistry (4)
Physics (12)
etc etc
I use a template tag to get the length, e.g. {{category.notes_set.all|length}} within the for loop, to display the number of notes in a category, but this ignores whether the notes are published or merely created. It would give a value of 7 if I had 6 published notes and 1 unpublished note; I want it to display 6. I really want to filter(published_date__lte=timezone.now()), but I don't think this can be done in a template.
Do I have to create a context dictionary for every single category and annotate it with the count within the view? I feel this would be unmanageable when the number of categories and sub-categories becomes very large. Can I do this as a for loop within views.py?
Sorry if this has an obvious answer, I am a true beginner.
Most recent EDIT: It seems that the most elegant solution is to add a function to my model class in models.py, as per Ben's answer. So for the case-studies landing page I add:
def published_cases(self):
    return self.case_set.filter(published_date__lte=timezone.now())
to my Specialty model. Then I add {{specialty.published_cases.all|length}} to my template.
Thanks for the help everyone.
EDIT: I am attempting to incorporate the annotate function into my class based views. Here is some example code for my case-studies landing page. It displays a list of new case-studies, top rated case-studies and a list of specialties which I want to annotate with a count of published case-studies. I have tried the following code:
class CaseListView(TemplateView):
    template_name = "case_list.html"
    def get_context_data(self, **kwargs):
        context = super(CaseListView, self).get_context_data(**kwargs)
        context["cases"] = Case.objects.filter(published_date__lte=timezone.now()).order_by("-published_date")[:5]
        context["topcases"] = Case.objects.filter(published_date__lte=timezone.now()).annotate(num_upvotes=Count("upvotes")).order_by("-num_upvotes")[:5]
        context["specialtys"] = Specialty.objects.all().order_by("name").annotate(num_cases=Count("case", filter=Q(case__published_date__lte=timezone.now())))
        return context
This is giving me a NameError (Exception Value: name 'Q' is not defined).
I tried another method of building the context:
context["specialtys"] = Specialty.objects.all().order_by("name").annotate(num_cases=Count("case")).filter(case__published_date__lte=timezone.now())
This did not raise an exception, but it did not give the desired result either: {{specialty.num_cases}} shows a wildly incorrect value, and with my level of knowledge I can't even begin to imagine how it was calculated.
You could add a method to the Category model that only returns published notes:
class Category(models.Model):
    # ... model definition ...
    def published_notes(self):
        return self.notes_set.filter(published_date__lte=timezone.now())
And use that in the template instead:
{{ category.published_notes|length }}
The method could well be useful elsewhere in the project too. You're likely to want to do the same logic of only-published-notes in multiple places.
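You need to use aggregation. In your view you can annotate each category with the count of its related notes, restricting the count to published notes with the filter argument of Count (available since Django 2.0). Q has to be imported from django.db.models, which is what your NameError is about.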
from django.db.models import Count, Q
categories = Category.objects.annotate(note_count=Count("note", filter=Q(note__published_date__lte=timezone.now())))
Now each category has an attribute note_count which has the number of published notes.
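For readers coming from SQL, a filtered Count corresponds roughly to a grouped LEFT JOIN with a conditional count. The sqlite3 sketch below uses hypothetical hand-written `category` and `note` tables (this is not the exact SQL Django emits, which varies by backend) to show that only published notes are counted and that a category with no published notes still gets 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE note (
    id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES category(id),
    published_date TEXT  -- NULL means "not published yet"
);
INSERT INTO category (id, name) VALUES (1, 'Biology'), (2, 'Chemistry'), (3, 'Physics');
INSERT INTO note (category_id, published_date) VALUES
    (1, '2020-01-01'), (1, '2020-02-01'), (1, NULL),  -- Biology: 2 published, 1 draft
    (2, '2020-03-01'),                                -- Chemistry: 1 published
    (3, NULL);                                        -- Physics: only a draft
""")

NOW = "2021-01-01"  # stands in for timezone.now()

# Count only the notes that satisfy the condition; the LEFT JOIN keeps
# categories whose count is zero instead of dropping them.
rows = conn.execute("""
    SELECT c.name,
           COUNT(CASE WHEN n.published_date <= ? THEN n.id END) AS note_count
    FROM category AS c
    LEFT JOIN note AS n ON n.category_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""", (NOW,)).fetchall()

for name, count in rows:
    print(f"{name} ({count})")
```

Putting the condition inside the count, rather than in a WHERE clause, is what keeps the zero-count categories in the result.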

Queryset Django Show Object one time

I have a question about making queries with Django.
I am filtering my query,
but some of the results have more than one related value.
For example:
class Teacher(models.Model):
    teacher_id = ....
    language = models.ForeignKey(Language)
class Language(models.Model):
    language_id = ....
    language = ....
Now I have a few Teacher and Language objects in my database.
Teacher 1 has a lot of information and can speak 3 languages.
Is this ForeignKey OK, or should I rather use a ManyToManyField?
(That's not the final question.)
I can now search for teachers, and I have a form where I can enter the information that is important to me.
For example, I am searching for a teacher who can speak English and French.
I have this in my view:
query = Teacher.objects.all()
q = Q(is_active=True)
language = request.POST.getlist('language', [x.language_id for x in Language.objects.all()])
if language:
    q &= Q(language__in=language)
Now I'm using the Paginator:
paginator = Paginator(query.filter(q), items_per_page)
And here is the problem:
my filter now shows all teachers who speak French and English, but each of them twice.
I want to show each object only once.
What do I have to do?
I hope you understand what I mean.
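Assuming the relation is many-to-many (a teacher speaks several languages and each language is spoken by many teachers), the duplicates come from the join: a teacher who matches both English and French produces one joined row per matching language. Django's `QuerySet.distinct()` adds `SELECT DISTINCT` to collapse such rows. The sqlite3 sketch below uses hypothetical hand-written tables to show the effect at the SQL level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER);
CREATE TABLE language (id INTEGER PRIMARY KEY, language TEXT);
-- Join table of the many-to-many relation (what a ManyToManyField creates).
CREATE TABLE teacher_languages (teacher_id INTEGER, language_id INTEGER);
INSERT INTO teacher VALUES (1, 'Anna', 1), (2, 'Ben', 1);
INSERT INTO language VALUES (1, 'English'), (2, 'French'), (3, 'German');
INSERT INTO teacher_languages VALUES (1, 1), (1, 2), (1, 3), (2, 2);
""")

wanted = (1, 2)  # English and French, as submitted by the search form

base = """
    SELECT {distinct} t.id, t.name
    FROM teacher AS t
    JOIN teacher_languages AS tl ON tl.teacher_id = t.id
    WHERE t.is_active = 1 AND tl.language_id IN (?, ?)
    ORDER BY t.id
"""

# Anna matches twice (English and French), so she appears twice.
with_duplicates = conn.execute(base.format(distinct=""), wanted).fetchall()
# DISTINCT collapses identical rows, which is what .distinct() asks for.
unique = conn.execute(base.format(distinct="DISTINCT"), wanted).fetchall()

print(with_duplicates)  # [(1, 'Anna'), (1, 'Anna'), (2, 'Ben')]
print(unique)           # [(1, 'Anna'), (2, 'Ben')]
```

In the view this would mean passing `query.filter(q).distinct()` to the Paginator. Note that `language__in` matches teachers who speak any of the selected languages, not necessarily all of them.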

GAE Datastore ndb models accessed in 5 different ways

I run an online marketplace. I don't know the best way to access NDB models. I'm afraid it's a real mess and I really don't know which way to turn. If you don't have time for a full response, I'm happy to read an article on NDB best practices
I have these classes, which are interlinked in different ways:
User(webapp2_extras.appengine.auth.models.User) controls seller logins
Partner(ndb.Model) contains information about sellers
menuitem(ndb.Model) contains information about items on menu
order(ndb.Model) contains buyer information & information about an order (all purchases are "guest" purchases)
Preapproval(ndb.Model) contains payment information saved from PayPal
How they're linked.
User - Partner
A 1-to-1 relationship. Both have "email address" fields. If these match, I can retrieve the user from the partner or vice versa. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
Where in the Partner model we have:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
Partner - menuitem
menuitems are children of Partner. Created like so:
myItem = model.menuitem(parent=model.partner_key(partner_name))
menuitems are referenced like this:
menuitems = model.menuitem.get_by_partner_name(partner.name)
where get_by_partner_name is this:
@classmethod
def get_by_partner_name(cls, partner_name):
    query = cls.query(
        ancestor=partner_key(partner_name)).order(ndb.GenericProperty("itemid"))
    return query.fetch(300)
and where partner_key() is a function just floating at the top of the model.py file:
def partner_key(partner_name=DEFAULT_PARTNER_NAME):
    return ndb.Key('Partner', partner_name)
Partner - order
Each Partner can have many orders. order has a parent that is Partner. How an order is created:
partner_name = self.request.get('partner_name')
partner_k = model.partner_key(partner_name)
myOrder = model.order(parent=partner_k)
How an order is referenced:
myOrder_k = ndb.Key('Partner', partnername, 'order', ordernumber)
myOrder = myOrder_k.get()
and sometimes like so:
order = model.order.get_by_name_id(partner.name, ordernumber)
(where in model.order we have:
@classmethod
def get_by_name_id(cls, partner_name, id):
    return ndb.Key('Partner', partner_name, 'order', int(id)).get()
)
This doesn't feel particularly efficient, particularly as I often have to look up the partner in the datastore just to pull up an order. For example:
user = self.user
partner = model.Partner.get_by_email(user.email_address)
order = model.order.get_by_name_id(partner.name, ordernumber)
I have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a Partner parent stops that from working.
Preapproval - order.
a 1-to-1 relationship. Each order can have a 'Preapproval'. Linkage: a field in the Preapproval class: order = ndb.KeyProperty(kind=order).
creating a Preapproval:
item = model.Preapproval( order=myOrder.key, ...)
accessing a Preapproval:
preapproval = model.Preapproval.query(model.Preapproval.order == order.key).get()
This seems like the easiest method to me.
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic.
User - Partner
You could replace:
@classmethod
def get_by_email(cls, partner_email):
    query = cls.query(Partner.email == partner_email)
    return query.fetch(1)[0]
with the following, which returns None instead of raising IndexError when no partner matches:
@classmethod
def get_by_email(cls, partner_email):
    return cls.query(Partner.email == partner_email).get()
However, because of transaction issues it is better to use entity groups: User should be the parent of Partner.
In this case, instead of using get_by_email, you can get the user without a query:
user = partner.key.parent().get()
Or do an ancestor query for getting the partner object:
partner = Partner.query(ancestor=user_key).get()
Query
Don't use fetch() if you don't need it. Use queries as iterators.
Instead of:
return query.fetch(300)
just:
return query
And then use query as:
for something in query:
    blah
Relationships: Partner-Menu Item and Partner - Order
Why are you using entity groups? Ancestors are not (necessarily) for modeling 1-to-N relationships. Ancestors are used for transactions: they define entity groups. They are useful in composition relationships (e.g. partner - user).
You can use a KeyProperty for the relationship (repeated, i.e. repeated=True, or not, depending on the direction of the relationship).
Have tried desperately to get something simple like myOrder = order.get_by_id(ordernumber) to work, but it seems that having a partner parent stops that working.
No problem if you stop using ancestors in this relationship.
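Why does `get_by_id(ordernumber)` fail when orders have a Partner parent? A datastore key is the whole ancestor path, not just the id. The toy store below is plain Python, not NDB (`Store` and its methods are hypothetical), and only mimics that rule: an entity is stored under its full path, so looking up the id without its parent builds a different key.

```python
from typing import Dict, Optional, Tuple

# A key path is a flat tuple of kind/id pairs from the root ancestor down,
# e.g. ('Partner', 'acme', 'Order', 42), much like ndb.Key('Partner', 'acme', 'Order', 42).
KeyPath = Tuple


class Store:
    """Hypothetical toy key-value store keyed by full key paths (not NDB)."""

    def __init__(self) -> None:
        self._entities: Dict[KeyPath, dict] = {}

    def put(self, path: KeyPath, entity: dict) -> None:
        self._entities[path] = entity

    def get_by_id(self, kind: str, id_, parent: KeyPath = ()) -> Optional[dict]:
        # The lookup key is parent path + (kind, id); the id alone is not enough.
        return self._entities.get(parent + (kind, id_))


store = Store()
# Order 42 created with a Partner parent, as in the question.
store.put(("Partner", "acme", "Order", 42), {"buyer": "guest"})
# Order 43 created as a root entity (no parent), as the answer suggests.
store.put(("Order", 43), {"buyer": "guest"})

print(store.get_by_id("Order", 42))                              # None: parent missing
print(store.get_by_id("Order", 42, parent=("Partner", "acme")))  # found
print(store.get_by_id("Order", 43))                              # found without a parent
```

With a KeyProperty on the order pointing at its partner instead (a hypothetical `partner` property), the order can be fetched by id alone and a partner's orders listed with a property query. One trade-off to keep in mind: such non-ancestor queries are only eventually consistent.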
TL;DR: I'm linking & accessing models in many ways, and it's not very systematic
There is no systematic way of linking models. It depends on many factors: cardinality, number of possible items on each side, need for transactions, composition relationships, indexes, complexity of future queries, denormalization for optimization, etc.
OK, I think the first step in cleaning this up is as follows:
At the top of your .py file, import all your models, so you don't have to keep using model.ModelName. That cleans up the code a bit: model.ModelName becomes ModelName.
The first best practice in cleaning this up is to always start a class name with a capital letter. A model name is a class. Above, your model names are inconsistent: Partner, order, menuitem. That makes the code hard to follow. Plus, when you use order as a model name, you may end up with conflicts: above, you redefine order as a variable twice. Use Order as the model name, this_order for the fetched entity and order_key for the key, to clear up these conflicts.
OK, let's start there.

Fetching data from parent table in Django in database hierarchy

Following this answer, I tried to split my SQL Story table into parent/children - with the children holding the specific user data, the parent more generic data. Now I've run into a problem that betrays my lack of experience in Django. My user page attempts to show a list of all the stories that a user has written. Before, when my user page was only pulling data from the story table, it worked fine. Now I need to pull data from two tables with linked info and I just can't work out how to do it.
Here's my user_page view before attempts to pull data from the parent story table too:
def user_page(request, username):
    user = get_object_or_404(User, username=username)
    userstories = user.userstory_set.order_by('-id')
    variables = RequestContext(request, {
        'username': username,
        'userstories': userstories,
        'show_tags': True
    })
    return render_to_response('user_page.html', variables)
Here is my models.py:
class story(models.Model):
    title = models.CharField(max_length=400)
    thetext = models.TextField()
class userstory(models.Model):
    main = models.ForeignKey(story)
    date = models.DateTimeField()
    user = models.ForeignKey(User)
I don't really know where to start in terms of also looking up the appropriate information in the parent table and assigning it to a variable. What I need to do is follow the 'main' key of the userstory table into the story table and assign the story to a variable. But I just can't see how to implement that in the view.
EDIT: I've tried story = userstory.objects.get(user=user) but I get 'userstory matching query does not exist.'
Reading through your previous question that you linked to, I've discovered where the confusion lies. I was under the impression that a Story may have many UserStorys associated with it. (Note that I'm capitalizing class names, which is common Python practice.) I made this assumption because your model structure allows it, through the ForeignKey in your UserStory model. Your model structure should look like this instead:
class Story(models.Model):
    title = models.CharField(max_length=400)
    thetext = models.TextField()
class UserStory(models.Model):
    story = models.OneToOneField(Story) # renamed field to story as convention suggests
    date = models.DateTimeField()
    user = models.ForeignKey(User)
class ClassicStory(models.Model):
    story = models.OneToOneField(Story)
    date = models.DateTimeField()
    author = models.CharField(max_length=200)
See the use of OneToOne relationships here. A OneToOne field denotes a 1-to-1 relationship, meaning that a Story has one, and only one, UserStory. This also means that a UserStory is related to exactly one Story. This is the "parent-child" relationship, with the extra constraint that a parent has only a single child. Your use of a ForeignKey before means that a Story has multiple UserStories associated with it, which is wrong for your use case.
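At the database level, a OneToOneField is essentially a ForeignKey with a UNIQUE constraint on the column. The sqlite3 sketch below uses hand-written tables whose names only loosely follow what Django would generate; it shows the constraint rejecting a second UserStory for the same Story, which a plain ForeignKey would have allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT, thetext TEXT);
-- UNIQUE on story_id is what turns a foreign key into a one-to-one link.
CREATE TABLE userstory (
    id INTEGER PRIMARY KEY,
    story_id INTEGER NOT NULL UNIQUE REFERENCES story(id),
    user_id INTEGER NOT NULL
);
INSERT INTO story (id, title, thetext) VALUES (1, 'Sentinel', '...');
INSERT INTO userstory (story_id, user_id) VALUES (1, 7);
""")

try:
    # A second child row for the same parent story violates the constraint.
    conn.execute("INSERT INTO userstory (story_id, user_id) VALUES (1, 8)")
    second_insert_ok = True
except sqlite3.IntegrityError:
    second_insert_ok = False

print(second_insert_ok)  # False
```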
Now your queries (and attribute accessors) will behave like you expected.
# get all of the user's UserStories:
user = request.user
stories = UserStory.objects.filter(user=user).select_related('story')
# print all of the stories:
for s in stories:
    print s.story.title
    print s.story.thetext
Note that select_related will create a SQL join, so you're not executing another query each time you print out the story text. Read up on this, it is very very very important!
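To see what the join saves, here is a sqlite3 sketch with hypothetical hand-written tables (not the SQL Django generates verbatim) that counts round trips. Fetching each story separately costs one extra query per UserStory (the "N+1" pattern), while a join, which is what select_related() requests, fetches everything at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT, thetext TEXT);
CREATE TABLE userstory (
    id INTEGER PRIMARY KEY,
    story_id INTEGER NOT NULL UNIQUE REFERENCES story(id),
    user_id INTEGER NOT NULL
);
INSERT INTO story VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
INSERT INTO userstory (story_id, user_id) VALUES (1, 7), (2, 7), (3, 7);
""")

queries = []  # every SQL statement we send, so we can count round trips


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Without a join: one query for the user's stories, then one per story (N+1).
titles_lazy = []
for (story_id,) in run("SELECT story_id FROM userstory WHERE user_id = ? ORDER BY id", (7,)):
    titles_lazy.append(run("SELECT title FROM story WHERE id = ?", (story_id,))[0][0])
lazy_count = len(queries)

# With a join (what select_related('story') asks Django to do): a single query.
queries.clear()
titles_joined = [row[0] for row in run(
    "SELECT s.title FROM userstory AS us JOIN story AS s ON s.id = us.story_id "
    "WHERE us.user_id = ? ORDER BY us.id", (7,))]
joined_count = len(queries)

print(lazy_count, joined_count)  # 4 1
```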
Your previous question mentions that you have another table, ClassicStories. It should also have a OneToOneField, just like the UserStories. Using OneToOne fields in this way makes it very difficult to iterate over the Story model, as it may be a "ClassicStory" but it might be a "UserStory" instead:
# iterate over ALL stories
allstories = Story.objects.all()
for s in allstories:
    print s.title
    print s.thetext
    print s.userstory # this might error!
    print s.classicstory # this might error!
See the issue? You don't know what kind of story it is. You need to check the type of story before accessing the fields in the sub-table. There are projects that help manage this kind of inheritance, for example django-model-utils' InheritanceManager, but that's a little advanced. If you never need to iterate over the Story model and access its sub-tables, you don't need to worry, though. As long as you only access Story from ClassicStories or UserStories, you will be fine.

A good data model for finding a user's favorite stories

Original Design
Here's how I originally had my Models set up:
class UserData(db.Model):
    user = db.UserProperty()
    favorites = db.ListProperty(db.Key) # list of story keys
    # ...
class Story(db.Model):
    title = db.StringProperty()
    # ...
On every page that displayed a story I would query UserData for the current user:
user_data = UserData.all().filter('user =', users.get_current_user()).get()
story_is_favorited = (story in user_data.favorites)
New Design
After watching this talk: Google I/O 2009 - Scalable, Complex Apps on App Engine, I wondered if I could set things up more efficiently.
class FavoriteIndex(db.Model):
    favorited_by = db.StringListProperty()
The Story Model is the same, but I got rid of the UserData Model. Each instance of the new FavoriteIndex Model has a Story instance as a parent, and each FavoriteIndex stores a list of user IDs in its favorited_by property.
If I want to find all of the stories that have been favorited by a certain user:
index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
story_keys = [k.parent() for k in index_keys]
stories = db.get(story_keys)
This approach avoids the serialization/deserialization that's otherwise associated with the ListProperty.
Efficiency vs Simplicity
I'm not sure how efficient the new design is, especially after a user decides to favorite 300 stories, but here's why I like it:
A favorited story is associated with a user, not with her user data
On a page where I display a story, it's pretty easy to ask the story if it's been favorited (without calling up a separate entity filled with user data).
fav_index = FavoriteIndex.all().ancestor(story).get()
fav_of_current_user = users.get_current_user().user_id() in fav_index.favorited_by
It's also easy to get a list of all the users who have favorited a story (using the method in #2)
Is there an easier way?
Please help. How is this kind of thing normally done?
What you've described is a good solution. You can optimise it further, however: For each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry (or equivalently, as a child entity of a UserInfo entry), with the key name set to the user's unique ID. This way, you can determine if a user has favorited a story with a simple get:
UserFavorite.get_by_key_name(user_id, parent=a_story)
get operations are 3 to 5 times faster than queries, so this is a substantial improvement.
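The benefit of using the user's ID as the key name is that "has this user favorited this story?" becomes a lookup by a key you can build yourself instead of a query. A plain-Python sketch of the idea, with a dict standing in for the datastore (`favorite`, `has_favorited` and `favorites_of` are hypothetical helpers, not App Engine APIs):

```python
from typing import Dict, List, Tuple

# Key of a UserFavorite child entity: (parent story key, user id as key name).
FavKey = Tuple[str, str]
favorites: Dict[FavKey, dict] = {}


def favorite(story_key: str, user_id: str) -> None:
    # Writing the same key twice is idempotent: a user can favorite a story once.
    favorites[(story_key, user_id)] = {}


def has_favorited(story_key: str, user_id: str) -> bool:
    # Equivalent of a get by key: build the key and look it up, no query needed.
    return (story_key, user_id) in favorites


def favorites_of(user_id: str) -> List[str]:
    # Listing a user's favorites is the query case; here it scans the toy dict.
    return sorted(story for story, uid in favorites if uid == user_id)


favorite("story-1", "u42")
favorite("story-2", "u42")
favorite("story-1", "u42")  # no duplicate entity is created

print(has_favorited("story-1", "u42"))  # True
print(has_favorited("story-3", "u42"))  # False
print(favorites_of("u42"))              # ['story-1', 'story-2']
```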
I don't want to tackle your actual question, but here's a very small tip: you can replace this code:
if story in user_data.favorites:
    story_is_favorited = True
else:
    story_is_favorited = False
with this single line:
story_is_favorited = (story in user_data.favorites)
You don't even need to put the parentheses around the story in user_data.favorites if you don't want to; I just think that's more readable.
You can make the favorite index act like a join table between the two models:
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.ReferenceProperty()
or
class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.StringListProperty()
Then querying by user returns one FavoriteIndex object for each story the user has favorited.
You can also query by story to see how many users have favorited it.
You don't want to be scanning through anything unless you know it is limited to a small size.
With your new Design you can lookup if a user has favorited a certain story with a query.
You don't need the UserFavorite class entities.
It is a keys_only query, so it is not as fast as a get(key) but faster than a normal query.
The FavoriteIndex classes all have the same key_name='favs'.
You can filter based on __key__.
a_story = ......
a_user_id = users.get_current_user().user_id()
favIndexKey = db.Key.from_path('Story', a_story.key.id_or_name(), 'FavoriteIndex', 'favs')
doesFavStory = FavoriteIndex.all(keys_only=True).filter('__key__ =', favIndexKey).filter('favorited_by =', a_user_id).get()
If you use multiple FavoriteIndex entities as children of a Story, you can use the ancestor filter:
doesFavStory = FavoriteIndex.all(keys_only=True).ancestor(a_story).filter('favorited_by =', a_user_id).get()
