How to model my app in the Google App Engine Datastore - python

I'm having a hard time modeling my application's data to get reasonable performance.
It is an application that tracks costs within a group of people, and today I have the following entities:
class Event(db.Model):
    # Values
    name = db.StringProperty(required=True)
    password = db.StringProperty(required=True)
class Person(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=True)
    # Values
    name = db.StringProperty(required=True)
class Transaction(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=True)
    paidby = db.ReferenceProperty(Person, required=True)
    # Values
    description = db.StringProperty(required=True)
    amount = db.FloatProperty(required=True)
    datetime = db.DateTimeProperty(auto_now_add=True)
# This is used because a transaction might not distribute costs
# evenly across all persons belonging to the event
class TransactionPerson(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=False)
    transaction = db.ReferenceProperty(Transaction, required=True)
    person = db.ReferenceProperty(Person, required=True)
    # Values
    amount = db.FloatProperty(required=True)
The problem is that when I, for example, want to calculate the balance for each person, I have to get all the data associated with an event and loop through all TransactionPerson entities for each Transaction/Person combination (in the example below that is ~65,000 operations).
I have an example event with:
4 Persons
76 Transactions
213 TransactionPersons
And a request to the start page that shows this balance summary per person and all transactions takes:
real: 1863ms
cpu: 6900ms (1769ms real)
api: 2723ms (94ms real)
At the moment I only make 3 RPC requests to get all persons, transactions and transaction persons for an event, and then do all the "relational" work in the application; that's why the CPU ms is pretty high.
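Even before changing the data model, the nested loop can probably go away. Assuming a person's balance is simply what they paid (`Transaction.paidby`, `Transaction.amount`) minus the sum of their `TransactionPerson.amount` shares, the balance needs no Transaction/Person join at all: one pass over each list is enough, roughly 76 + 213 steps instead of ~65,000. The sketch below works on plain `(person, amount)` tuples; the function name `balances` is hypothetical. With the old db API, one way to get the person identifiers without dereferencing each ReferenceProperty (which fetches the referenced entity) is `Transaction.paidby.get_value_for_datastore(t)`, which returns the stored key.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def balances(
    payments: Iterable[Tuple[Hashable, float]],
    shares: Iterable[Tuple[Hashable, float]],
) -> Dict[Hashable, float]:
    """Balance per person = total paid - total owed.

    payments: one (paid_by, amount) pair per Transaction.
    shares:   one (person, amount) pair per TransactionPerson.
    Each list is read once, so the work grows with len(payments) + len(shares)
    instead of persons x transactions x shares.
    """
    balance: Dict[Hashable, float] = defaultdict(float)
    for paid_by, amount in payments:
        balance[paid_by] += amount  # credit the payer
    for person, amount in shares:
        balance[person] -= amount   # debit each person's share
    return dict(balance)
```

Persons who neither paid nor owe anything simply do not appear in the result, so the caller may want to fill them in with 0.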
Questions:
It takes 2723 ms of API CPU just to get the 293 objects (4 + 76 + 213) from 3 datastore requests. Isn't that pretty high? The real time is OK (94 ms), but doesn't it use up a lot of my API CPU quota?
How can I design this to get much better performance? The real time today is 1863 ms for the example above, but with, for example, 12 persons the time would triple. These are not acceptable response times.
Thanks!

Generally you want to optimize for reads.
Instead of calculating a person's balance at read time, calculate changes at write time and denormalize, storing the calculated balance in the Person entity.
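A minimal in-memory sketch of that idea: keep a running `balance` on each person and update it whenever a Transaction and its TransactionPerson shares are written, so the start page only has to read the Person entities. `Person` here is a plain dataclass standing in for the datastore model, and `apply_transaction` and its `sign` parameter are hypothetical. In App Engine you would presumably want the Transaction, its shares and the affected Person updates written in one datastore transaction; with the db API that generally means keeping them in one entity group (for instance with the Event as parent), which is an assumption about how the models would be restructured.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass
class Person:
    name: str
    balance: float = 0.0  # denormalized: total paid minus total owed


def apply_transaction(
    people: Dict[str, Person],
    paid_by: str,
    amount: float,
    shares: Mapping[str, float],
    sign: float = 1.0,
) -> None:
    """Fold one Transaction (and its TransactionPerson shares) into stored balances.

    Call with sign=-1.0 to take a transaction back out when it is edited or deleted.
    """
    people[paid_by].balance += sign * amount
    for name, share in shares.items():
        people[name].balance -= sign * share
```

The trade-off is that every write (and every edit or delete) must keep the stored balances in sync, in exchange for reads that no longer touch transactions at all.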


How to build a Django QuerySet to check conditions on two manyToMany fields

I have the following models (simplified):
from django.db import models
class Resource(models.Model):
    name = models.CharField(max_length=64, unique=True)
class ResourceFlow(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="flow")
    amount = models.IntegerField()
class Workflow(models.Model):
    inputs = models.ManyToManyField(ResourceFlow, related_name="workflow")
class Stock(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="stock")
    amount = models.IntegerField()
class Producer(models.Model):
    workflow = models.ForeignKey(Workflow, on_delete=models.CASCADE, related_name="location")
    stocks = models.ManyToManyField(Stock, related_name="location")
I would like to test, with the computation done by the DB engine, whether I can start a production.
A production can start if I have enough stock: for my Producer's workflow, the amount of every input ResourceFlow has to be present in the Producer's stocks.
So the queryset might return one of these results:
for a given producer, return all stocked resources that do not fulfill the Workflow input amount conditions
for a given producer, return the input resources needed for the workflow that are not in sufficient quantity in its stocks
Is it possible to do that in Django? And if yes, how?
Not sure if you've found the answer yet, but anyway, I hope I understand your question correctly.
Let's assume we have the following resources:
head = Resource.objects.create(name="head")
neck = Resource.objects.create(name="neck")
body = Resource.objects.create(name="body")
arm = Resource.objects.create(name="arm")
leg = Resource.objects.create(name="leg")
And we have a build_a_robot workflow:
build_a_robot = Workflow.objects.create()
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=head, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=neck, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=body, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=arm, amount=2))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=leg, amount=2))
And finally, we have a producer:
producer = Producer.objects.create(workflow=build_a_robot)
producer.stocks.add(Stock.objects.create(resource=head, amount=0))
producer.stocks.add(Stock.objects.create(resource=neck, amount=3))
producer.stocks.add(Stock.objects.create(resource=body, amount=1))
producer.stocks.add(Stock.objects.create(resource=arm, amount=10))
producer.stocks.add(Stock.objects.create(resource=leg, amount=1))
We want to find the list of resources that the given producer does not have enough of to build a robot.
I think here's one way to do it:
from django.db.models import OuterRef, Subquery
# resources required by the producer's workflow
required_resources = producer.workflow.inputs.values("resource")
# the required amount for the resource of each outer Stock row
required_amount = producer.workflow.inputs.filter(resource=OuterRef("resource")).values("amount")[:1]
# only this producer's stocks, and only those below the required amount
missing_stocks = producer.stocks.filter(resource_id__in=required_resources).filter(amount__lt=Subquery(required_amount))
In this example, missing_stocks will be equal to:
<QuerySet [<Stock: Stock [Resource [head], 0]>, <Stock: Stock [Resource [leg], 1]>]>
So, we need more head and leg to build a robot.
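The query above starts from Stock, so a required resource for which the producer has no Stock row at all (as opposed to a row with amount 0) would not be reported. The second variant asked for in the question, inputs that are not sufficiently stocked, can cover that case by starting from the workflow inputs and left-joining the stocks. To show the SQL the database would have to run, here is a stdlib `sqlite3` sketch; the table names are a hypothetical simplification rather than the tables Django generates, it assumes at most one stock row per resource per producer, and the extra "antenna" input with no stock row is made-up data added only to exercise that case. In the ORM, annotating `producer.workflow.inputs` with `Coalesce(Subquery(...), 0)` over the producer's stocks and filtering against `F("amount")` is a plausible counterpart.

```python
import sqlite3

# Hypothetical, simplified schema: one workflow's inputs and one producer's stocks.
SCHEMA = """
CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE workflow_input (resource_id INTEGER NOT NULL REFERENCES resource(id),
                             amount INTEGER NOT NULL);
CREATE TABLE producer_stock (resource_id INTEGER NOT NULL REFERENCES resource(id),
                             amount INTEGER NOT NULL);
"""

# Start from the required inputs and LEFT JOIN the stocks, so an input with no
# stock row counts as 0 available instead of silently disappearing.
MISSING_INPUTS = """
SELECT r.name, i.amount AS needed, COALESCE(s.amount, 0) AS available
FROM workflow_input AS i
JOIN resource AS r ON r.id = i.resource_id
LEFT JOIN producer_stock AS s ON s.resource_id = i.resource_id
WHERE COALESCE(s.amount, 0) < i.amount
ORDER BY r.name
"""


def build_example() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    needed = {"head": 1, "neck": 1, "body": 1, "arm": 2, "leg": 2, "antenna": 1}
    stocked = {"head": 0, "neck": 3, "body": 1, "arm": 10, "leg": 1}  # no antenna row
    for name in needed:
        conn.execute("INSERT INTO resource (name) VALUES (?)", (name,))
    ids = dict(conn.execute("SELECT name, id FROM resource"))
    conn.executemany("INSERT INTO workflow_input VALUES (?, ?)",
                     [(ids[n], a) for n, a in needed.items()])
    conn.executemany("INSERT INTO producer_stock VALUES (?, ?)",
                     [(ids[n], a) for n, a in stocked.items()])
    return conn


def missing_inputs(conn: sqlite3.Connection):
    """Return (name, needed, available) for every under-stocked input."""
    return conn.execute(MISSING_INPUTS).fetchall()
```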

Share data between serializer fields

I have a ModelSerializer class which implements some fields with SerializerMethodField. A few of these fields calculate currency values using rates retrieved from the database if they are stored there, and otherwise from a bank API.
With the current implementation there are a lot of short queries to the database just to retrieve or check a rate. I'm thinking of reducing the number of queries by sharing already calculated data between fields.
The code will explain the idea better:
class CalcualteSomethingSerializer(ModelSerializer):
    currency_value_1 = serializers.SerializerMethodField()
    currency_value_2 = serializers.SerializerMethodField()
    def get_currency_value_1(self, obj):
        rate = 1  # assumption: a USD value needs no conversion
        if obj.currency_code != 'USD':
            rate = get_rates('USD', obj.currency_code, obj.date)
        return calculation_logic1(obj.value, rate)
    def get_currency_value_2(self, obj):
        rate = 1  # assumption: a USD value needs no conversion
        if obj.currency_code != 'USD':
            rate = get_rates('USD', obj.currency_code, obj.date)
        return calculation_logic2(obj.value, rate)
I've tried saving the rates into self._kwargs, but that only reduced the number of queries by 5.
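One way to share rates between fields (and between objects) is to memoize the lookup per (base currency, quote currency, date), so `get_rates` runs once per distinct key no matter how many fields or rows need it. `RateCache` below is a hypothetical, framework-free sketch; the question's `get_rates` would be passed in as `fetch`.

```python
from typing import Callable, Dict, Hashable, Tuple

RateKey = Tuple[str, str, Hashable]


class RateCache:
    """Memoize a rate lookup so each (base, quote, date) is fetched only once."""

    def __init__(self, fetch: Callable[[str, str, Hashable], float]) -> None:
        self._fetch = fetch  # e.g. the question's get_rates
        self._rates: Dict[RateKey, float] = {}

    def get(self, base: str, quote: str, date: Hashable) -> float:
        key = (base, quote, date)
        if key not in self._rates:
            # Only a cache miss reaches the database / bank API.
            self._rates[key] = self._fetch(base, quote, date)
        return self._rates[key]
```

In the serializer one could create `self._rate_cache = RateCache(get_rates)` in an overridden `__init__` (after calling `super().__init__(*args, **kwargs)`) and call `self._rate_cache.get('USD', obj.currency_code, obj.date)` in each `get_...` method. With `many=True`, DRF serializes every item through a single child serializer instance, so the cache should then be shared across the whole list.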

Should I use an ndb StructuredProperty or a separate model to limit my GAE queries? Basic data modeling questions.

I'm trying to reformat a previous question I posed in a more meaningful way here. If I'm not asking the correct question, or not providing enough information please let me know.
I have data coming in from a Pi in the format below, and I would like a meaningful way of representing it in the datastore (ndb) that limits my writes when it arrives and limits the number of queries I need to make to find the correct place to put the data (aka the Zone):
Data comes like this:
{'rssi': '*', 'source_addr': '87:ea:66:be:19:d9', 'id': 'rx_io_data',
 'samples': [{'adc-0': 253, 'adc-1': 2}, {'adc-0': 252, 'adc-1': 2},
             {'adc-0': 252, 'adc-1': 2}, {'adc-0': 253, 'adc-1': 1},
             {'adc-0': 252, 'adc-1': 2}],
 'options': '\x00'}
I've highlighted the information that is important. (I don't need code for putting the data in; I'm after an appropriate structure for my models.)
So I will use the MAC address to find the "Zone" the readings should be associated with, then use the Zone to look up the associated sensors (which seems convoluted), then map each sensor pin (adc-0, adc-1) to its human-readable name (temperature, or heart monitor). I want to keep the total readings and the individual sensor readings, so that later I can query for all heart monitor sensors in all zones, or per zone...
So far I have this which seems convoluted and requires a lot of querying and puts:
class Sensors(ndb.Model): # parent Zone
    sensorname = ndb.StringProperty(required=True) # Heart, Temp
    sensorpin = ndb.StringProperty(required=True) # adc-0, or adc-1 ...
class Zone(ndb.Model):
    zname = ndb.StringProperty(required=True) # Name of zone like "Room# or Patient#"
    zonemac = ndb.StringProperty(required=True) # MAC of network adapter
    homekey = ndb.KeyProperty(kind=Home, required=True)
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
class Readings(ndb.Model): # parent Zone
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
    alldata = ndb.JsonProperty(required=True) # store all the readings in json for later debug
class Reading(ndb.Model): # parent is Sensor or zone ? individual sensor readings ?
    readingskey = ndb.KeyProperty(kind=Readings, required=True) # full reading
    value = ndb.IntegerProperty(required=True) # or 0
    name = ndb.StringProperty(required=True) # Heart sensor, temp sensor,...
Potential Option:
class Sensors(ndb.Model):
    sensorname = ndb.StringProperty(required=True) # Heart, Temp
    sensorpin = ndb.StringProperty(required=True) # adc-0, or adc-1 ...
class Zone(ndb.Model):
    zname = ndb.StringProperty(required=True) # Name of zone like "Room# or Patient#"
    zonemac = ndb.StringProperty(required=True) # MAC of network adapter
    homekey = ndb.KeyProperty(kind=Home, required=True)
    sensors = ndb.StructuredProperty(Sensors, repeated=True) # you can get the sensor name, but how do you map it to adc-0 or whatever?
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
class Reading(ndb.Model): # defined before Readings, which embeds it; individual sensor readings
    readingskey = ndb.KeyProperty(kind='Readings', required=True) # full reading (kind given as a string because Readings is defined below)
    value = ndb.IntegerProperty(required=True) # or 0
    name = ndb.StringProperty(required=True) # Heart sensor, temp sensor,...
class Readings(ndb.Model): # parent Zone
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
    alldata = ndb.JsonProperty(required=True) # store all the readings in json for later debug
    individualreading = ndb.StructuredProperty(Reading, repeated=True)
The catch is that all I get from the devices is the MAC and the sensor pins (adc-0, adc-1) with their values.
So I need to look up the zones they belong to, and the sensors they map to, so I can search for them later.
I have done very little database modeling, so I have no clue how to model this. I could create new models and reference them with key properties, or create StructuredProperties and query on those (eliminating the Sensors model), but then I would still have to write Readings and Reading after the lookups.
Any help is much appreciated.
And yes, I've read the ndb properties documentation, the docs on filtering queries by structured properties, and a few similar SO posts like this one.
I guess using a StructuredProperty would save a small number of write operations. What I would propose, though, is figuring out which properties don't have to be indexed and declaring them with the option indexed=False. As the documentation states, "Unindexed properties cost fewer write ops than indexed properties.", and from experience I can tell you that you can really save a lot of write ops this way.
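To make the embedded-structure option concrete: if each Zone carries its pin-to-name mapping (like the repeated `sensors` StructuredProperty in the "Potential Option") and each packet becomes one Readings record whose individual readings are embedded, handling a packet is one Zone lookup by MAC plus one write. The sketch uses plain dataclasses instead of ndb, and `build_readings` and `zones_by_mac` are hypothetical names; in ndb, `zones_by_mac` would correspond to a query on `Zone.zonemac`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sensor:
    name: str  # human-readable, e.g. "Heart", "Temp"
    pin: str   # as sent by the device, e.g. "adc-0"


@dataclass
class Zone:
    name: str
    mac: str
    sensors: List[Sensor] = field(default_factory=list)  # embedded, like a repeated StructuredProperty


@dataclass
class Reading:
    name: str
    pin: str
    value: int


@dataclass
class Readings:
    zone_mac: str
    alldata: dict            # the raw packet, kept for debugging
    readings: List[Reading]  # embedded individual readings: one record to write


def build_readings(packet: dict, zones_by_mac: Dict[str, Zone]) -> Readings:
    """Map one incoming packet to a single Readings record (KeyError for an unknown MAC)."""
    zone = zones_by_mac[packet["source_addr"]]  # the only lookup needed
    pin_to_name = {s.pin: s.name for s in zone.sensors}
    readings = [
        Reading(name=pin_to_name[pin], pin=pin, value=value)
        for sample in packet["samples"]
        for pin, value in sample.items()
        if pin in pin_to_name  # ignore pins with no configured sensor
    ]
    return Readings(zone_mac=zone.mac, alldata=packet, readings=readings)
```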

Do I need to use transactions in google appengine

update 0
My def post() code has changed dramatically, because it was originally based on a digital form that included both checkboxes and text entry fields; the current design uses only text entry fields, to be more paper-like. However, as a result I have other problems which may be solved by one of the proposed solutions, but I cannot quite follow that solution, so let me try to explain the new design and the problems.
The smaller problem is the inefficiency of my implementation: in def post() I create a distinct name for each input timeslot, a long string <courtname><timeslotstarthour><timeslotstartminute>. In my code this name is read in a nested for loop with the following snippet [very inefficient, I imagine].
tempreservation=courtname+str(time[0])+str(time[1])
name = self.request.get(tempreservation, None)  # the generated field name, not the literal string 'tempreservation'
The more serious immediate problem is that my def post() code is never executed, and I cannot figure out why (maybe it wasn't being executed before either, but I had not tested that far). I wonder if the problem is that, for now, I want both post() and get() to "finish" the same way. The first line below is for post() and the second is for get().
return webapp2.redirect("/read/%s" % location_id)
self.render_template('read.html', {'courts': courts,'location': location, ... etc ...})
My new post() is as follows. Notice that I have left the logging.info call in the code to see whether I ever get there.
class MainPageCourt(BaseHandler):
    def post(self, location_id):
        logging.info("in MainPageCourt post ")
        startTime = self.request.get('startTime')
        endTime = self.request.get('endTime')
        day = self.request.get('day')
        weekday = self.request.get('weekday')
        nowweekday = self.request.get('nowweekday')
        year = self.request.get('year')
        month = self.request.get('month')
        nowmonth = self.request.get('nowmonth')
        courtnames = self.request.get_all('court')
        for c in courtnames:
            logging.info("courtname: %s " % c)
        times=intervals(startTime,endTime)
        for courtname in courtnames:
            for time in times:
                tempreservation=courtname+str(time[0])+str(time[1])
                name = self.request.get(tempreservation, None)  # the generated field name, not the literal string 'tempreservation'
                if name:
                    iden = courtname
                    court = db.Key.from_path('Locations',location_id,'Courts', iden)
                    reservation = Reservations(parent=court)
                    reservation.name = name
                    reservation.starttime = time
                    reservation.year = year
                    reservation.nowmonth = int(nowmonth)
                    reservation.day = int(day)
                    reservation.nowweekday = int(nowweekday)
                    reservation.put()
        return webapp2.redirect("/read/%s" % location_id)
Eventually I want to add checking/validating to the above get() code by comparing the existing Reservations data in the datastore with the implied new reservations, and kick out to an alert which tells the user of any potential problems which she can address.
I would also appreciate any comments on these two problems.
end of update 0
My app is for a community tennis court. I want to replace the paper sign-up sheet with an online digital sheet that mimics a paper sheet. As unlikely as it seems, there may be "transactional" conflicts where two tennis appointments collide. So how do I give the second appointment maker a heads-up about the conflict, but also give the successful party the opportunity to alter her appointment like she would on paper (with an eraser)?
Each half hour is a time slot on the form. People normally sign up for multiple half hours at one time before "submitting".
So in my code, within a loop, I do a get_all. If any get succeeds, I want to give the user control over whether to accept the put() or not. I am still thinking the put() would be all-or-nothing, not selective.
So my question is, do I need to make part of the code use an explicit "transaction"?
class MainPageCourt(BaseHandler):
    def post(self, location_id):
        reservations = self.request.get_all('reservations')
        day = self.request.get('day')
        weekday = self.request.get('weekday')
        nowweekday = self.request.get('nowweekday')
        year = self.request.get('year')
        month = self.request.get('month')
        nowmonth = self.request.get('nowmonth')
        if not reservations:
            for r in reservations:
                r=r.split()
                iden = r[0]
                temp = iden+' '+r[1]+' '+r[2]
                court = db.Key.from_path('Locations',location_id,'Courts', iden)
                reservation = Reservations(parent=court)
                reservation.starttime = [int(r[1]),int(r[2])]
                reservation.year = int(r[3])
                reservation.nowmonth = int(r[4])
                reservation.day = int(r[5])
                reservation.nowweekday = int(nowweekday)
                reservation.name = self.request.get(temp)
                reservation.put()
            return webapp2.redirect("/read/%s" % location_id)
        else:
            # ... this important code is not written, pending ...
            return webapp2.redirect("/adjust/%s" % location_id)
Have a look at optimistic concurrency control:
http://en.wikipedia.org/wiki/Optimistic_concurrency_control
You can check for the availability of the time slots in a given Court, and write the corresponding Reservation child entities only if their start_time doesn't conflict.
Here is how you would do it for a single reservation using an ancestor query:
from google.appengine.ext import ndb

@ndb.transactional
def make_reservation(court_id, start_time):
    court = Court(id=court_id)
    # ancestor query, so it is allowed inside the transaction
    existing = Reservation.query(Reservation.start_time == start_time,
                                 ancestor=court.key).fetch(2, keys_only=True)
    if len(existing):
        return False, existing[0]
    return True, Reservation(start_time=start_time, parent=court.key).put()
Alternatively, if you make the slot part of the Reservation id, you can remove the query and construct the Reservation entity keys to check whether they already exist:
@ndb.transactional
def make_reservations(court_id, slots):
    court = Court(id=court_id)
    rs = [Reservation(id=s, parent=court.key) for s in slots]
    existing = ndb.get_multi([r.key for r in rs])  # None for each free slot
    if any(existing):
        return False, existing
    return True, ndb.put_multi(rs)
I think you should always use transactions, but I don't think your concerns are best addressed by transactions.
I think you should implement a two-stage reservation system, which is what you see on most shopping-cart and ticketing sites.
Posting the form creates a "reservation request", which blocks out the time(s) as "in someone else's shopping bag" for 5-15 minutes.
Users must submit again on an approval screen to confirm the times. You can also give them the ability to update the conflicts on that screen, and reset the 'reservation lock' on the timeslots for as long as possible.
A cron job (or a faked one that is triggered by a request coming in at a certain window) clears out expired reservation locks and returns the times to the pool of available slots.
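The hold-then-confirm flow above can be sketched in memory like this. `SlotBook`, the `(court, hour, minute)` slot tuples and the 10-minute default are hypothetical; in App Engine the holds and confirmed reservations would be datastore entities updated inside a transaction, whereas this single-process version has no concurrency control of its own. Holding is all-or-nothing, a user re-holding her own slots refreshes the expiry, and `expire_holds` plays the role of the cron job.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Slot = Tuple[str, int, int]  # (court, hour, minute): hypothetical slot identifier


class SlotBook:
    """In-memory sketch of a two-stage reservation: hold, then confirm."""

    def __init__(self, hold_seconds: float = 10 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.confirmed: Dict[Slot, str] = {}
        self.holds: Dict[Slot, Tuple[str, float]] = {}  # slot -> (user, expiry time)

    def expire_holds(self) -> None:
        """The cron job's work: release holds whose time is up."""
        now = self.clock()
        for slot in [s for s, (_, exp) in self.holds.items() if exp <= now]:
            del self.holds[slot]

    def conflicts(self, user: str, slots: Iterable[Slot]) -> List[Slot]:
        """Slots already confirmed, or held by somebody else."""
        self.expire_holds()
        return [s for s in slots
                if s in self.confirmed or (s in self.holds and self.holds[s][0] != user)]

    def hold(self, user: str, slots: Iterable[Slot]) -> List[Slot]:
        """Stage 1: hold all slots or none; return the conflicting slots (empty on success)."""
        slots = list(slots)
        clashes = self.conflicts(user, slots)
        if clashes:
            return clashes
        expiry = self.clock() + self.hold_seconds
        for s in slots:
            self.holds[s] = (user, expiry)  # also refreshes the user's own holds
        return []

    def confirm(self, user: str, slots: Iterable[Slot]) -> bool:
        """Stage 2: turn the user's still-valid holds into reservations."""
        self.expire_holds()
        slots = list(slots)
        if any(self.holds.get(s, (None, 0.0))[0] != user for s in slots):
            return False
        for s in slots:
            del self.holds[s]
            self.confirmed[s] = user
        return True
```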

Google App Engine relationtable and paging?

I have the following datastore model:
class One(db.Model):
    OneDateAdded = db.DateTimeProperty(auto_now_add=True)
    OneTitle= db.StringProperty()
    OneLink= db.LinkProperty()
class Two(db.Model):
    TwoDateAdded = db.DateTimeProperty(auto_now_add=True)
    TwoTitle= db.StringProperty()
    TwoLink= db.LinkProperty()
class Three(db.Model):
    ThreeDateAdded = db.DateTimeProperty(auto_now_add=True)
    ThreeTitle= db.StringProperty()
    ThreeisSomething = db.BooleanProperty(default=False)
    ThreeLink= db.LinkProperty()
and a relation table:
class Relation(db.Model):
    RelationDateAdded = db.DateTimeProperty(auto_now_add=True)
    RelationOne = db.ReferenceProperty(One)
    RelationTwo = db.ReferenceProperty(Two)
    RelationThree = db.ReferenceProperty(Three)
When I tried to implement the PagedQuery library, I ran into the problem that I can't use any kind of join because of GAE restrictions.
What I want to accomplish is a query on my relation table where RelationThree.ThreeisSomething = True.
Looping over a set of results does not seem to be a solution, because the paging would not work (it gets 10 results, 2 are True and 8 are False, resulting in a page with only 2 results).
Is there a way to do something simple like this (which does not work)?
myPagedQuery = paging.PagedQuery(Relation.all().filter('Three.ThreeisSomething =', True), 10)
You can use limit and offset, documented here. Be wary, though, that using them might be expensive: setting an offset of 100 and a limit of 10 actually loads 110 records and gives you the last 10.
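When the filter can only be applied in application code (as with `RelationThree.ThreeisSomething`), one way to still get full pages is to keep reading batches until the page is full and remember the raw position where the scan stopped, so the next page resumes there instead of starting over. `filtered_page` and `fetch` are hypothetical: `fetch(offset, limit)` stands in for a datastore query with offset and limit. Per the caveat above, each offset fetch still reads the skipped records, so the old db API's query cursors (`Query.cursor()` / `with_cursor()`) would be the cheaper way to resume.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def filtered_page(
    fetch: Callable[[int, int], Sequence[T]],
    keep: Callable[[T], bool],
    start: int,
    page_size: int,
    batch_size: int = 20,
) -> Tuple[List[T], int]:
    """Collect up to page_size items passing keep(), scanning from raw position start.

    Returns the page and the raw position to resume from for the next page.
    """
    page: List[T] = []
    pos = start
    while len(page) < page_size:
        batch = fetch(pos, batch_size)
        if not batch:
            break  # no more records
        for item in batch:
            pos += 1  # pos always points just past the last record examined
            if keep(item):
                page.append(item)
                if len(page) == page_size:
                    break
    return page, pos
```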
