Query Set for Models Retrieved from External API - python

I am developing a web app that uses the Amazon Product API to get information on books. I use a Book model that stores only the ASIN (the Amazon identification code) and looks a bit like this:
class Book(models.Model):
    asin = models.CharField(max_length=10, unique=True)
    def retrieve(self, **kwargs):
        kwargs['ItemId'] = self.asin
        self.xml = amazon_lookup(**kwargs)  # returns BeautifulSoup
    @property
    def title(self):
        try:
            return self.xml.ItemAttributes.Title.text
        except AttributeError:  # happens when xml has not been populated
            return None
    ...
class BookManager(models.Manager):
    def retrieve(self, **kwargs):
        kwargs['SearchIndex'] = 'Books'
        book_search = amazon_search(**kwargs)
        books = []
        for item in book_search:
            book = self.get_or_create(asin=item.ASIN.text)[0]
            book.xml = item
            books.append(book)
        return books
And then I can call it with:
books = Book.objects.retrieve(Keywords="foo bar")
b = books[0]  # the manager returns a list of Book instances
b.retrieve(ResponseGroup="Images,ItemAttributes,...")
t = b.title
This works well enough for tests, but I want a more robust system for future use.
What I'd really like is to perform searches through query sets so that frequently accessed results can be cached. As it stands, every request for a book detail view triggers a new Amazon API call. My job would be a lot easier if all the API calls were handled inside query sets alongside database calls. Unfortunately, I've found the Django QuerySet documentation pretty cryptic and lacking when it comes to customization. This surely isn't a rare use case.
Can anyone provide the idiomatic way of handling a problem like this, or a good resource on the subject?
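To make the caching requirement concrete, here is a minimal sketch in plain Python rather than against Django's QuerySet API. `fetch_book_xml` is a hypothetical stand-in for `amazon_lookup`, and the `TTLCache` class and its 60-second lifetime are assumptions made for illustration. It only shows that repeated lookups for the same ASIN can be answered without a second API call while the cached entry is still fresh.

```python
import time


class TTLCache:
    """Cache results of an expensive lookup for a limited time."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch        # callable doing the real (slow) lookup
        self._ttl = ttl_seconds    # how long a cached result stays valid
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no remote call
        value = self._fetch(key)   # missing or expired: call through
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def fetch_book_xml(asin):
    """Hypothetical stand-in for amazon_lookup(ItemId=asin)."""
    calls.append(asin)
    return {"asin": asin, "title": "Example title"}


cache = TTLCache(fetch_book_xml, ttl_seconds=60)
first = cache.get("0123456789")
second = cache.get("0123456789")  # served from the cache
```

In a Django project the same get-or-fetch pattern could sit on top of Django's cache framework (`django.core.cache`), so that the cache is shared between processes instead of living in one object.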

Related

NDB: how to get child entities that depend on values stored on a parent structured property

I have the following models:
class Roles(ndb.Model):
    email = ndb.StringProperty(required=True)
    type = ndb.StringProperty(choices=['writer', 'editor', 'admin'])
class Book(ndb.Model):
    uid = ndb.StringProperty(required=True)
    user = ndb.UserProperty(auto_current_user_add=True)
    name = ndb.StringProperty(required=True)
    shared_with = ndb.StructuredProperty(Roles, repeated=True, indexed=True)
class Page(ndb.Model):
    uid = ndb.StringProperty(required=True)
    user = ndb.UserProperty(auto_current_user_add=True)
    title = ndb.StringProperty(required=True)
    parent_uid = ndb.ComputedProperty(lambda self: self.key.parent().get().uid)
    shared_with = ndb.ComputedProperty(lambda self: self.key.parent().get().shared_with)
The structure I am using is:
Book1   Book2   - (parent)
  |       |
  ^       ^
pages   pages   - (child)
When a Book is created, the shared_with is filled with a list of emails/roles.
For example:
Book.uid = user.user_id()
Book.user = user
Book.name = "learning appengine NDB"
Book.shared_with = [Roles(email="user_1@domain.tld", type="admin"), Roles(email="user_2@domain.tld", type="editor")]
When a user creates a Page, the user.user_id() is stored as uid.
Example when user_2@domain.tld (role type: editor) creates a page:
Page.title = "understanding ComputedProperty"
Page.uid = user.user_id()
Page.user = user
With this schema, if I want to show user_2@domain.tld only the pages he has created, I can run a simple query that filters by uid, something like:
# supposing user_2@domain.tld is logged in
user2_pages = Page.query(Page.uid == user.user_id())
But for the other users listed in the Book's shared_with property, how can I keep showing them their own pages (the ones they created) and show all the remaining pages only if they have a role of admin or editor?
For example, if I want to allow other users (admins, editors) to see a list of the latest pages created across all the books, how could I write that query?
What I have tried so far, without success, is a ComputedProperty; I can't make it work as expected.
To verify that I get the correct values, I do a query like:
query = Page.query().get()
print query.parent_uid
I do get the parent uid, and likewise the shared_with values, but for an unknown reason I can't filter on them when using something like:
query = Page.query(
    Page.parent_uid == user.user_id()
)
# query returns None
A probably better and simpler approach is to show pages per book but I would like to know if it is possible to do it for all the books, so that admins and editors can just see a list of last pages created in general, instead of going into each book.
Any ideas?
Your computed property cannot work because it is only updated when the Page entity is put. See https://stackoverflow.com/a/12630991/1756187. Changes to Book entities have no effect on Page computed properties.
You can try to use model hooks to maintain Page.shared_with. See https://developers.google.com/appengine/docs/python/ndb/entities#hooks.
I'm wondering, though, whether this is the best approach. If you keep the sharing info at the Book level, you can use its index to retrieve the list of book keys with a keys-only query, and then retrieve all pages for those parent keys. That way you don't have to add a shared_with attribute to the Page model at all. The query will cost slightly more, but the Page entities will be smaller and cheaper to maintain.
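The two-step lookup suggested above can be illustrated with a small in-memory sketch. It makes no NDB calls: the namedtuples, the `BOOKS` and `PAGES` lists and both function names are hypothetical stand-ins. In real code, step 1 would be a keys-only query on Book and step 2 would fetch the pages under those parent keys.

```python
from collections import namedtuple

# In-memory stand-ins for the entities; real code would use NDB models.
Role = namedtuple("Role", "email type")
Book = namedtuple("Book", "key name shared_with")
Page = namedtuple("Page", "parent_key title created")

BOOKS = [
    Book("book1", "learning appengine NDB",
         [Role("user_1@domain.tld", "admin"),
          Role("user_2@domain.tld", "editor")]),
    Book("book2", "another book",
         [Role("user_3@domain.tld", "writer")]),
]
PAGES = [
    Page("book1", "understanding ComputedProperty", 3),
    Page("book2", "not shared with user_1", 2),
    Page("book1", "intro", 1),
]


def shared_book_keys(email, allowed_types=("admin", "editor")):
    """Step 1: keys of books shared with this user in an allowed role."""
    return {
        book.key
        for book in BOOKS
        if any(r.email == email and r.type in allowed_types
               for r in book.shared_with)
    }


def latest_pages_for(email, limit=10):
    """Step 2: pages whose parent is one of those books, newest first."""
    keys = shared_book_keys(email)
    pages = [p for p in PAGES if p.parent_key in keys]
    return sorted(pages, key=lambda p: p.created, reverse=True)[:limit]
```

Because the sharing information is read from the Book at query time, a change to a Book's shared_with list takes effect without rewriting any Page.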

POST List of Objects w/ endpoints-proto-datastore

tl;dr: Is it possible, with endpoints-proto-datastore, to receive a list of objects from a POST and insert it into the db?
Following the samples, when building my API I couldn't figure out how to let users POST a list of objects, so that I could put a batch of data into the db more efficiently, for example with ndb.put_multi.
From this comment in endpoints_proto_datastore.ndb.model, I imagine it is not possible with the current design. Am I right, or am I missing something?
Extending the sample provided by endpoints, I achieved the desired result with:
class Greeting(messages.Message):
    message = messages.StringField(1)
class GreetingCollection(messages.Message):
    items = messages.MessageField(Greeting, 1, repeated=True)
# then inside the endpoints.api class
    @endpoints.method(GreetingCollection, GreetingCollection,
                      path='hellogretting', http_method='POST',
                      name='greetings.postGreeting')
    def greetings_post(self, request):
        result = [item for item in request.items]
        return GreetingCollection(items=result)
-- edit --
See the docs about POSTing into the datastore. Your only issue is that your models aren't EndpointsModels. Instead, define a datastore model for both your Greeting and GreetingCollection:
from google.appengine.ext import ndb
from endpoints_proto_datastore.ndb import EndpointsModel
class Greeting(EndpointsModel):
    message = ndb.StringProperty()
class GreetingCollection(EndpointsModel):
    items = ndb.StructuredProperty(Greeting, repeated=True)
Once you've done this, you can use:
class MyApi(remote.Service):
    # ...
    @GreetingCollection.method(path='hellogretting', http_method='POST',
                               name='greetings.postGreeting')
    def greetings_post(self, my_collection):
        ndb.put_multi(my_collection.items)
        return my_collection

Working with ancestors in GAE

I just want someone to confirm that I'm doing things the right way.
I have this structure: Books that have Chapters (ancestor=Book) that have Pages (ancestor=Chapter).
It is clear to me that, to look up a Chapter by ID, I need the Book in order to search by ancestor.
My question is: do I need the whole book-chapter chain to look up a Page?
For example (I'm using NDB):
class Book(ndb.Model):
    # Search by id
    @classmethod
    def by_id(cls, id):
        return Book.get_by_id(long(id))
class Chapter(ndb.Model):
    # Search by id
    @classmethod
    def by_id(cls, id, book):
        return Chapter.get_by_id(long(id), parent=book.key)
class Page(ndb.Model):
    # Search by id
    @classmethod
    def by_id(cls, id, chapter):
        return Page.get_by_id(long(id), parent=chapter.key)
Currently, when I need to look up a Page to display its contents, I pass the complete chain in the URL like this:
getPage?bookId=5901353784180736&chapterId=5655612935372800&pageId=1132165198169
So, in the controller, I do this:
def get(self):
    # Get the id parameters
    bookId = self.request.get('bookId')
    chapterId = self.request.get('chapterId')
    pageId = self.request.get('pageId')
    if bookId and chapterId and pageId:
        # Must be a digit
        if bookId.isdigit() and chapterId.isdigit() and pageId.isdigit():
            # Get the book
            book = Book.by_id(bookId)
            if book:
                # Get the chapter
                chapter = Chapter.by_id(chapterId, book)
                if chapter:
                    # Get the page
                    page = Page.by_id(pageId, chapter)
Is this the right way? Must I always have the complete chain in the URL to get the final element of the chain?
If this is right, I suppose that working this way with NDB has no impact on the datastore, because repeated calls to this page always hit the NDB cache for the same book, chapter and page (I'm getting by ID, not running a fetch). Is my assumption correct?
No, there's no need to do that. The point is that keys are paths: you can build them up dynamically and only hit the datastore when you have a complete one. In your case, it's something like this:
# The request values are strings, so convert them to integer IDs first.
page_key = ndb.Key(Book, long(bookId), Chapter, long(chapterId), Page, long(pageId))
page = page_key.get()
See the NDB docs for more examples.
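As a rough sketch of the "keys are paths" idea, the helper below turns the three request parameters into a flat kind/ID path without touching the datastore. `build_key_path` is a hypothetical name, and the parameter names are taken from the URL in the question.

```python
import re


def build_key_path(params):
    """Turn request parameters into a flat [kind, id, ...] key path.

    Returns None when an ID is missing or is not made of ASCII digits.
    """
    path = []
    # Order matters: the path runs from the root ancestor to the leaf.
    for kind, name in (("Book", "bookId"),
                       ("Chapter", "chapterId"),
                       ("Page", "pageId")):
        raw = params.get(name, "")
        if not re.fullmatch(r"[0-9]+", raw):
            return None
        # Request values are strings; the IDs in the key are integers.
        path.extend([kind, int(raw)])
    return path
```

The resulting list can be unpacked into the key constructor, for example `ndb.Key(*path).get()`, since `ndb.Key` accepts kind names as strings. Only that final `get()` reaches the datastore (or the NDB cache).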

Dereference Models from many to many relationship

In my schema, as described in the test data generation example below, I want to know a good way to:
Dereference all instances of Favourite that hold reference keys to instances of Picture that have been deleted. In other words, just delete any Favourite that links to a deleted picture.
The Person class is a user
The Picture class is something that can be a Favourite
The Favourite class is an example of the Link-Model way of having many-to-many relationships.
Why this question?
First, I hope it doesn't fall outside the scope here; second, because this can happen; and third, because it's interesting.
How?
Let's say that a person can have up to thousands of favourites, something like Likes on social networks or, to make it worse, orders, accounts or invalid data in a scientific application.
In our example, for some reason (and these reasons happen), a person is experiencing a lot of dead favourite links, or I simply know that there are dead favourites.
What would be a good way to do this while reducing ndb.get() operations and not iterating through every Favourite?
Let's not complicate things. Let's assume that only one user is suffering from dead favourites. He is a Person with a stubbed user_id property of '123'.
The example below provides the handlers and their corresponding functions.
import time
import sys
import logging
import random
import cgi
import webapp2
from google.appengine.ext import ndb
class Person(ndb.Expando):
    pass
class Picture(ndb.Expando):
    pass
class Favourite(ndb.Expando):
    user_id = ndb.StringProperty(required=True)
    #picture = ndb.KeyProperty(kind=Picture, required=True)
    pass
class GenerateDataHandler(webapp2.RequestHandler):
    def get(self):
        try:
            number_of_models = abs(int(cgi.escape(self.request.get('n'))))
        except:
            number_of_models = 10
            logging.info("GET ?n=parameter not defined. Using default.")
            pass
        user_id = '123' #stub
        person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
        if not person:
            person = Person()
            person.user_id = user_id #Stub
            person.put()
            logging.info("Created Person instance")
        if not self._gen_data(person, number_of_models):
            return
        self.response.write("Data generated successfully")
    def _gen_data(self, person, number_of_models):
        first, last = Picture.allocate_ids(number_of_models)
        picture_keys = [ndb.Key(Picture, id) for id in range(first, last+1)]
        pictures = []
        favourites = []
        for picture_key in picture_keys:
            picture = Picture(key=picture_key)
            pictures.append(picture)
            favourite = Favourite(parent=person.key,
                                  user_id=person.user_id,
                                  picture=picture_key
                                  )
            favourites.append(favourite)
        entities = favourites
        entities[1:1] = pictures
        ndb.put_multi(entities)
        return True
class CorruptDataHandler(webapp2.RequestHandler):
    def get(self):
        if not self._corrupt_data(0.5):#50% corruption
            return
        self.response.write("Data corruption completed successfully")
    def _corrupt_data(self, n):
        picture_keys = Picture.query().fetch(99999, keys_only=True)
        random_picture_keys = random.sample(picture_keys, int(float(len(picture_keys))*n))
        ndb.delete_multi(random_picture_keys)
        return True
class FixDataHandler(webapp2.RequestHandler):
    def get(self):
        user_id = '123' #stub
        person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
        self._dereference(person)
    def _dereference(self, person):
        # Here is where you implement your answer
        pass
The handlers are separate because of eventual consistency in the NDB Datastore. More info: GAE put_multi() entities using backend NDB
Of course I am posting an answer as well to show that I tried something before posting this.
A ReferenceProperty is just a key, so if you have the key of the deleted Picture, you can use that to query the Favourite.
Otherwise, there's no easy way. You'll have to filter through all Favourites and find ones that have an invalid Picture. It's very simple in a mapreduce job, but could be an expensive query if you have a lot of Favourites.
You could use a pre-delete hook (look here for a way to implement it).
Of course, this is easier if you use the NDB API instead of the Datastore API (hooks on NDB), but then you'll have to change the way you make the references.
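The "filter through the Favourites and find the ones with an invalid Picture" step can be sketched as follows, assuming a batch getter with the contract that `ndb.get_multi` documents: a result list of the same length with `None` for every key that was not found. The `Favourite` namedtuple, `find_dead_favourites` and `fake_get_multi` are hypothetical names used only so that the example runs without the datastore.

```python
from collections import namedtuple

Favourite = namedtuple("Favourite", "key picture_key")


def find_dead_favourites(favourites, get_multi):
    """Return keys of favourites whose picture no longer exists.

    `get_multi` takes a list of keys and returns a list of the same
    length, with None in place of every entity that was not found.
    """
    # Look each distinct picture up once, in a single batched call.
    picture_keys = list({f.picture_key for f in favourites})
    found = get_multi(picture_keys)
    missing = {k for k, entity in zip(picture_keys, found) if entity is None}
    return [f.key for f in favourites if f.picture_key in missing]


# Tiny in-memory "datastore" standing in for the real one.
PICTURES = {1: "pic-1", 3: "pic-3"}  # picture 2 was deleted


def fake_get_multi(keys):
    return [PICTURES.get(k) for k in keys]


favs = [Favourite("fav-a", 1), Favourite("fav-b", 2), Favourite("fav-c", 3)]
dead = find_dead_favourites(favs, fake_get_multi)
```

This still walks the user's favourites once, as the answer above says it must, but it replaces one get per favourite with a single batched lookup. The returned keys could then be handed to a batch delete.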

Does Django Have a Way to Auto-Sort Model Fields?

So basically, I've got a rather large Django project going. It's a private web portal that allows users to manage various phone-related tasks.
Several pages of the portal provide a listing of Model objects to users, and list all of their attributes in an HTML table (so that users can visually look through a list of these items).
The problem I'm having is: I cannot find a Django-ish or pythonic way to handle the sorting of these Model objects by field name. As an example of what I'm talking about, here is one of my views which lists all Partyline Model objects:
def list_partylines(request):
    """
    List all `Partyline`s that we own.
    """
    # Figure out which sort term to use.
    sort_field = request.REQUEST.get('sortby', 'did').strip()
    if sort_field.startswith('-'):
        search = sort_field[1:]
        sort_toggle = ''
    else:
        search = sort_field
        sort_toggle = '-'
    # Check to see if the sort term is valid.
    if not (search in Partyline._meta.get_all_field_names()):
        sort_field = 'did'
    if is_user_type(request.user, ['admin']):
        partylines = Partyline.objects.all().order_by(sort_field)
    else:
        partylines = get_my_partylines(request.user, sort_field)
    variables = RequestContext(request, {
        'partylines': partylines,
        'sort_toggle': sort_toggle
    })
    return render_to_response('portal/partylines/list.html', variables)
The sorting code basically allows users to specify a /url/?sortby=model_field_name parameter which will then return a sorted listing of objects whenever users click on the HTML table name displayed on the page.
Since I have various views in various apps which all show a listing of Model objects and require sorting, I'm wondering if there is a generic way to do this sorting so that I don't have to repeat it in every view?
I'm sorry if this question is a bit unclear, I'm struggling to find the right way to phrase this question.
Thanks.
The way that I'd look at doing this is through a custom QuerySet. In your model, you can define the class QuerySet and add your sorting there. In order to maintain all the logic in the model object, I'd also move the contents of get_my_partylines into the QuerySet, too.
from django.db import models
from django.db.models.query import QuerySet
## This class is used to replicate QuerySet methods into a manager.
## This way: Partyline.objects.for_user(foo) works the same as
## Partyline.objects.filter(date=today).for_user(foo)
class CustomQuerySetManager(models.Manager):
    def get_query_set(self):
        return self.model.QuerySet(self.model)
    def __getattr__(self, attr, *args):
        try:
            return getattr(self.__class__, attr, *args)
        except AttributeError:
            return getattr(self.get_query_set(), attr, *args)
class Partyline(models.Model):
    ## Define fields, blah blah.
    objects = CustomQuerySetManager()
    class QuerySet(QuerySet):
        def sort_for_request(self, request):
            sort_field = request.REQUEST.get('sortby', 'did').strip()
            if sort_field.startswith('-'):
                search = sort_field[1:]
            else:
                search = sort_field
            # Check to see if the sort term is valid.
            if not (search in Partyline._meta.get_all_field_names()):
                sort_field = 'did'
            # order_by() treats a leading '-' as descending, so no
            # separate reverse() call is needed.
            return self.all().order_by(sort_field)
        def for_user(self, user):
            if is_user_type(user, ['admin']):
                return self.all()
            else:
                ## Code from get_my_partylines goes here.
                return self.all() ## Temporary.
views.py:
def list_partylines(request):
    """
    List all `Partyline`s that we own.
    """
    partylines = Partyline.objects.for_user(request.user).sort_for_request(request)
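The validation of the sort term does not depend on Django, so it can be pulled into a small helper shared by every listing view. This is a sketch under stated assumptions: `resolve_sort` is a hypothetical name, `valid_fields` would come from the model's field names, and `default` is assumed to be a plain field name without a leading `-`.

```python
def resolve_sort(raw, valid_fields, default):
    """Return (order_by_term, toggle_prefix) for a ?sortby= value.

    order_by_term is safe to pass to order_by(); toggle_prefix is the
    prefix the next click on the same column header should use.
    """
    term = (raw or default).strip()
    descending = term.startswith('-')
    field = term[1:] if descending else term
    if field not in valid_fields:
        # Unknown field name: fall back to the default, ascending.
        return default, '-'
    toggle = '' if descending else '-'
    return term, toggle
```

A view would then call something like `resolve_sort(request.GET.get('sortby'), field_names, 'did')` and pass the first element to `order_by()` and the second to the template as `sort_toggle`.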
There's a great example of how this is done in a generic way in django.contrib.admin.views.main.ChangeList. It does much more than sorting, but you can browse its code for hints and ideas. You may also want to look at django.contrib.admin.options.ModelAdmin, the changelist method in particular, to get more context.
