Dereference Models from many to many relationship - python

In my schema, described in the test data generation example below, I want to know a good way to:
Dereference all instances of Favourite that hold reference keys to instances of Picture that have been deleted. In other words, just delete any Favourite that links to a deleted Picture.
The Person class is a user.
The Picture class is something that can be a Favourite.
The Favourite class is an example of the link-model way of implementing many-to-many relationships.
Why this question?
First, I hope it doesn't fall outside the scope here; second, because this can happen; and third, because it's interesting.
How?
Let's say that a person can have up to thousands of favourites, something like Likes on social networks or, to make it worse, orders, accounts, or invalid data in a scientific application.
In our example, for some reason (and these reasons happen), a person is experiencing a lot of dead favourite links, or I simply know that there are dead favourites.
What would be a good way to do this while reducing ndb.get() operations and without iterating through every Favourite?
Let's not complicate things. Let's assume that only one user is suffering from dead favourites. He is an instance of Person with a stubbed user_id property of '123'.
In the following example you can use the following handlers and their corresponding functions.
import time
import sys
import logging
import random
import cgi
import webapp2
from google.appengine.ext import ndb
class Person(ndb.Expando):
    pass
class Picture(ndb.Expando):
    pass
class Favourite(ndb.Expando):
    user_id = ndb.StringProperty(required=True)
    #picture = ndb.KeyProperty(kind=Picture, required=True)
    pass
class GenerateDataHandler(webapp2.RequestHandler):
    def get(self):
        try:
            number_of_models = abs(int(cgi.escape(self.request.get('n'))))
        except:
            number_of_models = 10
            logging.info("GET ?n=parameter not defined. Using default.")
            pass
        user_id = '123' #stub
        person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
        if not person:
            person = Person()
            person.user_id = user_id #Stub
            person.put()
            logging.info("Created Person instance")
        if not self._gen_data(person, number_of_models):
            return
        self.response.write("Data generated successfully")
    def _gen_data(self, person, number_of_models):
        first, last = Picture.allocate_ids(number_of_models)
        picture_keys = [ndb.Key(Picture, id) for id in range(first, last+1)]
        pictures = []
        favourites = []
        for picture_key in picture_keys:
            picture = Picture(key=picture_key)
            pictures.append(picture)
            favourite = Favourite(parent=person.key,
                                  user_id=person.user_id,
                                  picture=picture_key
                                  )
            favourites.append(favourite)
        entities = favourites
        entities[1:1] = pictures
        ndb.put_multi(entities)
        return True
class CorruptDataHandler(webapp2.RequestHandler):
    def get(self):
        if not self._corrupt_data(0.5):#50% corruption
            return
        self.response.write("Data corruption completed successfully")
    def _corrupt_data(self, n):
        picture_keys = Picture.query().fetch(99999, keys_only=True)
        random_picture_keys = random.sample(picture_keys, int(float(len(picture_keys))*n))
        ndb.delete_multi(random_picture_keys)
        return True
class FixDataHandler(webapp2.RequestHandler):
    def get(self):
        user_id = '123' #stub
        person = Person.query().filter(ndb.GenericProperty('user_id') == user_id).get()
        self._dereference(person)
    def _dereference(self, person):
        #Here is where you implement your answer
        pass
The handlers are separate because of eventual consistency in the NDB Datastore. More info: GAE put_multi() entities using backend NDB
Of course I am posting an answer as well to show that I tried something before posting this.

A ReferenceProperty is just a key, so if you have the key of the deleted Picture, you can use that to query the Favourite.
Otherwise, there's no easy way. You'll have to filter through all Favourites and find ones that have an invalid Picture. It's very simple in a mapreduce job, but could be an expensive query if you have a lot of Favourites.
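One way to keep the number of datastore round trips down is to batch the lookups rather than fetching Pictures one by one. The sketch below is only an illustration under assumptions: find_dead_favourites is a hypothetical helper, the favourites are represented as (favourite_key, picture_key) pairs, and get_multi stands for any batch lookup that returns None for a missing entity, which is how ndb.get_multi behaves. It does not avoid reading the user's Favourites; it only replaces per-entity gets with batched ones.

```python
def find_dead_favourites(favourite_links, get_multi, batch_size=500):
    """Return the favourite keys whose picture no longer exists.

    favourite_links: list of (favourite_key, picture_key) pairs.
    get_multi: callable taking a list of keys and returning a list of
        entities in the same order, with None for missing ones
        (assumed to behave like ndb.get_multi).
    """
    dead = []
    for start in range(0, len(favourite_links), batch_size):
        batch = favourite_links[start:start + batch_size]
        # One batched lookup per chunk instead of one get per Favourite.
        pictures = get_multi([picture_key for _, picture_key in batch])
        for (favourite_key, _), picture in zip(batch, pictures):
            if picture is None:
                dead.append(favourite_key)
    return dead


# Stand-in data so the sketch runs without the App Engine SDK.
existing_pictures = {"pic1": object(), "pic3": object()}


def fake_get_multi(keys):
    return [existing_pictures.get(key) for key in keys]


links = [("fav1", "pic1"), ("fav2", "pic2"), ("fav3", "pic3"), ("fav4", "pic4")]
dead_favourites = find_dead_favourites(links, fake_get_multi, batch_size=2)
```

Inside _dereference, the pairs could come from an ancestor query on the person, and the resulting keys could then be passed to ndb.delete_multi.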

You could use a pre-delete hook (look here for a way to implement it).
Of course, this would be easier with the NDB API instead of the Datastore API (hooks on NDB), but then you'll have to change the way you make the references.

Related

POST List of Objects w/ endpoints-proto-datastore

tl;dr: Is it possible, with endpoints-proto-datastore, to receive a list of objects from a POST and insert it into the db?
Following the samples, when building my API I couldn't work out how to let users POST a list of objects so that I could be more efficient about putting a bunch of data into the db using ndb.put_multi, for example.
From this comment here at endpoints_proto_datastore.ndb.model I imagine that it is not possible with how it is designed. Am I right, or am I missing something?
Extending the sample provided by endpoints, I achieved the desired result with:
class Greeting(messages.Message):
    message = messages.StringField(1)
class GreetingCollection(messages.Message):
    items = messages.MessageField(Greeting, 1, repeated=True)
# then inside the endpoints.api class
    @endpoints.method(GreetingCollection, GreetingCollection,
                      path='hellogretting', http_method='POST',
                      name='greetings.postGreeting')
    def greetings_post(self, request):
        result = [item for item in request.items]
        return GreetingCollection(items=result)
-- edit --
See the docs about POSTing into the datastore; your only issue is that your models aren't EndpointsModels. Instead, define a datastore model for both your Greeting and GreetingCollection:
from google.appengine.ext import ndb
from endpoints_proto_datastore.ndb import EndpointsModel
class Greeting(EndpointsModel):
    message = ndb.StringProperty()
class GreetingCollection(EndpointsModel):
    items = ndb.StructuredProperty(Greeting, repeated=True)
Once you've done this, you can use
class MyApi(remote.Service):
    # ...
    @GreetingCollection.method(path='hellogretting', http_method='POST',
                               name='greetings.postGreeting')
    def greetings_post(self, my_collection):
        ndb.put_multi(my_collection.items)
        return my_collection

How to get a collection_name without having an instance of the referencing object?

I'm doing a simple program about customers, products and drafts.
Since they are referenced to each other in some way, when I delete one entity of a kind, another entity of another kind might give an error.
Here's what I have:
-customer.py
class Customer(db.Model):
    """Defines the Customer entity or model."""
    c_name = db.StringProperty(required=True)
    c_address = db.StringProperty()
    c_email = db.StringProperty() ...
-draft.py
class Draft(db.Model):
    """Defines the draft entity or model."""
    d_customer = db.ReferenceProperty( customer.Customer,
                                       collection_name='draft_set')
    d_address = db.StringProperty()
    d_country = db.StringProperty() ...
OK, now what I want to do is check whether a customer has any Draft referencing him before deleting him.
This is the code I'm using:
def deleteCustomer(self, customer_key):
    '''Deletes an existing Customer'''
    # Get the customer by its key
    customer = Customer.get(customer_key)
    if customer.draft_set: # (or customer.draft_set.count > 0...)
        customer.delete()
    else:
        do_something_else()
And now comes the problem.
If I have previously created a draft with the selected customer on it, there's no problem at all, and it does what it has to do. But if I haven't created any draft that references that customer, trying to delete him produces this error:
AttributeError: 'Customer' object has no attribute 'draft_set'
What am I doing wrong? Do I always need to create a Draft that includes a Customer for the collection_name property to be "available"?
EDIT: I found out what the error was.
Since I have both classes in different .py files, it seems that GAE loads the entities into the datastore at the same moment as it "goes through" the file that contains that model.
Therefore, if I'm executing the program, and never use or import that file, the datastore is not updated until then.
Now what I'm doing is:
from draft import Draft
inside the "deleteCustomer()" function and it's finally working fine, but I get a horrible "warning not used" because of it.
Is there any other way I can fix this?
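The behaviour described in the edit is consistent with the back-reference being attached to Customer at the moment the Draft class body is executed, that is, when draft.py is first imported. The plain-Python sketch below mimics that mechanism only; reference_property is a hypothetical stand-in, not the real db.ReferenceProperty, and both classes are ordinary objects rather than db.Model subclasses.

```python
class Customer(object):
    """Stand-in for the model defined in customer.py."""


def reference_property(target_cls, collection_name):
    # Runs while the referencing class body is being executed, which is
    # the moment the back-reference gets attached to the target class.
    setattr(target_cls, collection_name, property(lambda self: []))
    return None


# Before the referencing class exists, Customer has no draft_set.
has_attr_before = hasattr(Customer, "draft_set")


class Draft(object):
    """Stand-in for the model defined in draft.py."""
    d_customer = reference_property(Customer, "draft_set")


# Defining Draft (i.e. importing its module) added the attribute.
has_attr_after = hasattr(Customer, "draft_set")
```

This is why the attribute appears only after the module that defines Draft has been loaded at least once.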
The collection_name property is a query, so it should always be available.
What you may be missing is the reference_class parameter (check the ReferenceProperty docs)
class Draft(db.Model):
    """Defines the draft entity or model."""
    d_customer = db.ReferenceProperty(reference_class=customer.Customer, collection_name='draft_set')
The following should work:
if customer.draft_set.count():
    customer.delete()
Note that customer.draft_set is always truthy, because it is the generated Query object, so you MUST use count().
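The truthiness point can be seen with any Python object that defines neither a length nor a boolean conversion. A minimal sketch, where FakeQuery is a hypothetical stand-in for the generated query object and not the real db.Query class:

```python
class FakeQuery(object):
    """Hypothetical stand-in for a query object: iterable, with count(),
    but without __len__ or __bool__/__nonzero__."""

    def __init__(self, results):
        self._results = list(results)

    def __iter__(self):
        return iter(self._results)

    def count(self):
        return len(self._results)


empty_query = FakeQuery([])

# The object itself is truthy even though it matches nothing...
always_true = bool(empty_query)
# ...whereas count() reflects the actual number of results.
has_results = empty_query.count() > 0
```

Testing the object directly therefore says nothing about whether any Draft exists; only the count (or fetching a result) does.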
There were two possible solutions:
Ugly, bad one: as described in my edited question.
Best practice: put all the models together inside one file (e.g. models.py) that looks like this:
class Customer(db.Model):
    """Defines the Customer entity or model."""
    c_name = db.StringProperty(required=True)
    c_address = db.StringProperty()
    c_email = db.StringProperty() ...
class Draft(db.Model):
    """Defines the draft entity or model."""
    d_customer = db.ReferenceProperty( Customer,
                                       collection_name='draft_set')
    d_address = db.StringProperty()
    d_country = db.StringProperty() ...
Easy!

Generating fixture data with Python's fixture module

I'm working with the fixture module for the first time, trying to get a better set of fixture data so I can make our functional tests more complete.
I'm finding the fixture module a bit clunky, and I'm hoping there's a better way to do what I'm doing. This is a Flask/SQLAlchemy app in Python 2.7, and we're using nose as a test runner.
So I have a set of employees. Employees have roles. There are a few pages with rather complex permissions, and I'd like to make sure those are tested.
I created a DataSet that has each type of role (there are about 15 roles in our app):
class EmployeeData(DataSet):
    class Meta:
        storable = Employee
    class engineer:
        username = "engineer"
        role = ROLE_ENGINEER
    class manager:
        username = "manager"
        role = ROLE_MANAGER
    class admin:
        username = "admin"
        role = ROLE_ADMIN
and what I'd like to do is write a functional test that checks only the right people can access a page. (The actual permissions are way more complicated, I just wanted a toy example to show you.)
Something like this:
def test_only_admin_can_see_this_page():
    for employee in Employee.query.all():
        login(employee)
        with self.app.test_request_context('/'):
            response = self.test_client.get(ADMIN_PAGE)
            if employee.role == ROLE_ADMIN:
                eq_(200, response.status_code)
            else:
                eq_(401, response.status_code)
        logout(employee)
Is there a way to generate the fixture data so my devs don't have to remember to add a line to the fixtures every time we add a role? We have the canonical list of all roles as configuration elsewhere in the app, so I have that.
I'm not wedded to any of this or the fixture module, so I'm happy to hear suggestions!
An option would be to use factory_boy to create your test data.
Assuming that you keep and update accordingly a list of roles (that will be used later on) like this one:
roles = [ROLE_ENGINEER, ROLE_ADMIN, ROLE_MANAGER, ...]
Let's create a factory for the Employee table:
import factory
import factory.fuzzy
from somewhere.in.the.app import roles
class EmployeeFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = Employee
        sqlalchemy_session = session
    username = factory.Sequence(lambda n: u'User %d' % n)
    # Other attributes
    ...
    # Now the role choice
    role = factory.fuzzy.FuzzyChoice(roles)
The FuzzyChoice method takes a list of choices and makes a random choice from this list.
This factory can now create any number of Employee objects on demand.
Using the factory:
from factory.location import EmployeeFactory
def test_only_admin_can_see_this_page():
    EmployeeFactory.create_batch(size=100)
    for employee in session.query(Employee).all():
        login(employee)
        with self.app.test_request_context('/'):
            response = self.test_client.get(ADMIN_PAGE)
            if employee.role == ROLE_ADMIN:
                eq_(200, response.status_code)
            else:
                eq_(401, response.status_code)
        logout(employee)
Breakdown:
EmployeeFactory.create_batch(size=100) Creates 100 Employee objects in the test session.
We can access those objects from the factory session.
More information about using factory_boy with SQLAlchemy: https://factoryboy.readthedocs.io/en/latest/orms.html?highlight=sqlalchemy#sqlalchemy.
Be careful with session management especially: https://factoryboy.readthedocs.io/en/latest/orms.html?highlight=sqlalchemy#managing-sessions
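Because FuzzyChoice picks a role at random, a batch is not guaranteed to contain every role. If the goal is exactly one test employee per entry in the canonical role list, the data can be derived from that list directly. A minimal sketch, assuming the roles are plain strings and using dictionaries in place of the real Employee model (ROLES and build_employee_rows are hypothetical names):

```python
# Assumed stand-in for the canonical role list kept elsewhere in the app.
ROLES = ["engineer", "manager", "admin"]


def build_employee_rows(roles):
    """Build one employee record per role so no role is left untested."""
    return [{"username": role, "role": role} for role in roles]


employee_rows = build_employee_rows(ROLES)
covered_roles = {row["role"] for row in employee_rows}
```

Each row could then be passed to the factory as keyword overrides, for example EmployeeFactory(**row), so that adding a role to the list adds a fixture without anyone editing the test data.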

Query Set for Models Retrieved from External API

I am currently developing a web app that uses the Amazon Product API to get information on books. I use a Book model that contains only the ASIN amazon identification code that looks a bit like this:
class Book(models.Model):
    asin = models.CharField(max_length=10, unique=True)
    def retrieve(self, **kwargs):
        kwargs['ItemId'] = self.asin
        self.xml = amazon_lookup(**kwargs) # returns BeautifulSoup
    @property
    def title(self):
        try:
            return self.xml.ItemAttributes.Title.text
        except AttributeError: # happens when xml has not been populated
            return None
    ...
class BookManager(models.Manager):
    def retrieve(self, **kwargs):
        kwargs['SearchIndex'] = 'Books'
        book_search = amazon_search(**kwargs)
        books = []
        for item in book_search:
            book = self.get_or_create(asin=item.ASIN.text)[0]
            book.xml = item
            books.append(book)
        return books
And then I can call it with
books = Book.objects.retrieve(Keywords="foo bar")
b = books[0]
b.retrieve(ResponseGroup="Images,ItemAttributes,...")
t = b.title
This works well enough for tests, but I want a more robust system for future use.
What I'd really like to do is be able to perform searches with query sets so that frequently accessed results can be cached. As it stands, every request for a book detail view creates a new amazon API call. My job would be a lot easier if all the API calls were handled inside Query Sets alongside database calls. Unfortunately, I've found the Django Query Set documentation pretty cryptic and lacking when it comes to customization. This surely isn't a rare use case.
Can anyone provide the idiomatic way of handling a problem like this, or a good resource on the subject?

Google App Engine Python Datastore

Basically, what I'm trying to make is a data structure that holds the user's name, id, and datejoined. Then I want a "sub-structure" that holds the user's "text" and the date it was modified. Each user will have multiple instances of this text.
class User(db.Model):
    ID = db.IntegerProperty()
    name = db.StringProperty()
    datejoined = db.DateTimeProperty(auto_now_add=True)
class Content(db.Model):
    text = db.StringProperty()
    datemod= db.DateTimeProperty(auto_now_add = True)
Is the code set up correctly?
One problem you will have is that making User.ID unique will be non-trivial. The problem is that two writes to the database could occur on different shards, both check at about the same time for existing entries that match the uniqueness constraint and find none, then both create identical entries (with regard to the unique property) and then you have an invalid database state. To solve this, appengine provides a means of ensuring that certain datastore entities are always placed on the same physical machine.
To do this, you make use of the entity keys to tell Google how to organize the entities. Let's assume you want the username to be unique. Change User to look like this:
class User(db.Model):
    datejoined = db.DateTimeProperty(auto_now_add=True)
Yes, that's really it. There's no username since that's going to be used in the key, so it doesn't need to appear separately. If you like, you can do this...
class User(db.Model):
    datejoined = db.DateTimeProperty(auto_now_add=True)
    @property
    def name(self):
        return self.key().name()
To create an instance of a User, you now need to do something a little different, you need to specify a key_name in the init method.
someuser = User(key_name='john_doe')
...
someuser.save()
Well, really you want to make sure that users don't overwrite each other, so you need to wrap the user creation in a transaction. First define a function that does the necessary check:
def create_user(username):
    checkeduser = User.get_by_key_name(username)
    if checkeduser is not None:
        raise db.Rollback('User already exists!')
    newuser = User(key_name=username)
    # more code
    newuser.put()
Then, invoke it in this way
db.run_in_transaction(create_user, 'john_doe')
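The reason the check and the write must happen as one unit can be illustrated without the datastore. The sketch below is only an in-memory analogy: UserStore is a hypothetical class, and a threading.Lock plays the role that the transaction plays above.

```python
import threading


class UserStore(object):
    """In-memory analogy of check-then-create under a transaction."""

    def __init__(self):
        self._users = {}
        self._lock = threading.Lock()

    def create_user(self, username):
        # Holding the lock makes "check" and "create" a single step,
        # so two callers cannot both pass the existence check.
        with self._lock:
            if username in self._users:
                raise ValueError('User already exists!')
            self._users[username] = {'name': username}
            return self._users[username]


store = UserStore()
first = store.create_user('john_doe')

try:
    store.create_user('john_doe')
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Without the lock, two threads could both see that the name is free and both write, which is the same invalid state described for the datastore.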
To find a user, you just do this:
someuser = User.get_by_key_name('john_doe')
Next, you need some way to associate the content with its user, and vice versa. One solution is to put the content into the same entity group as the user by declaring the user as a parent of the content. To do this, you don't need to change the content at all, but you create it a little differently (much like you did with User):
somecontent = Content(parent=User.get_by_key_name('john_doe'))
So, given a content item, you can look up the user by examining its key:
someuser = User.get(somecontent.key().parent())
Going in reverse, looking up all of the content for a particular user is only a little trickier.
allcontent = Content.gql('where ancestor is :user', user=someuser).fetch(10)
Yes, and if you need more documentation, you can check here for database types and here for more info about your model classes.
An alternative solution you may see is using ReferenceProperty.
class User(db.Model):
    name = db.StringProperty()
    datejoined = db.DateTimeProperty(auto_now_add=True)
class Content(db.Model):
    user = db.ReferenceProperty(User,collection_name='matched_content')
    text = db.StringProperty()
    datemod= db.DateTimeProperty(auto_now_add = True)
content = db.get(content_key)
user_name = content.user.name
#looking up all of the content for a particular user
user_content = content.user.matched_content
#create new content for a user
new_content = Content(user=content.user)
