I have a ModelSerializer class which implements some fields with SerializerMethodField. A few of these fields calculate currency data according to rates retrieved from a database when available, otherwise from a bank API.
With the current implementation there are a lot of short queries to the database just to retrieve or check a rate. I'm thinking of reducing the number of queries by sharing already calculated data between fields.
The code will explain the idea better:
class CalculateSomethingSerializer(ModelSerializer):
    currency_value_1 = serializers.SerializerMethodField()
    currency_value_2 = serializers.SerializerMethodField()

    def get_currency_value_1(self, obj):  # method fields receive (self, obj)
        if obj.currency_code != 'USD':
            rate = get_rates('USD', obj.currency_code, obj.date)
            return calculation_logic1(obj.value, rate)

    def get_currency_value_2(self, obj):
        if obj.currency_code != 'USD':
            rate = get_rates('USD', obj.currency_code, obj.date)
            return calculation_logic2(obj.value, rate)
I've tried saving the rates into self._kwargs, but that reduced the number of queries by only 5.
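What I have in mind is something like this rough sketch (untested): cache the rate on the serializer instance, keyed by (currency, date), so every method field, and every item in a list serializer, reuses one lookup. _rate_for is a new helper name; get_rates and calculation_logic1 are my existing helpers:

    class CalculateSomethingSerializer(ModelSerializer):
        currency_value_1 = serializers.SerializerMethodField()
        currency_value_2 = serializers.SerializerMethodField()

        def _rate_for(self, obj):
            # One get_rates call per (currency, date) pair per serializer instance.
            cache = getattr(self, '_rate_cache', None)
            if cache is None:
                cache = self._rate_cache = {}
            key = (obj.currency_code, obj.date)
            if key not in cache:
                cache[key] = get_rates('USD', obj.currency_code, obj.date)
            return cache[key]

        def get_currency_value_1(self, obj):
            if obj.currency_code != 'USD':
                return calculation_logic1(obj.value, self._rate_for(obj))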
I have the following models (simplified):
class Resource(models.Model):
    name = models.CharField(max_length=64, unique=True)

class ResourceFlow(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="flow")
    amount = models.IntegerField()

class Workflow(models.Model):
    inputs = models.ManyToManyField(ResourceFlow, related_name="workflow")

class Stock(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="stock")
    amount = models.IntegerField()

class Producer(models.Model):
    workflow = models.ForeignKey(Workflow, on_delete=models.CASCADE, related_name="location")
    stocks = models.ManyToManyField(Stock, related_name="location")
I would like to test, with the computation done by the DB engine, whether I can start a production.
A production can start if I have enough stock: for my Producer's workflow, all input ResourceFlow amounts have to be present in the Producer's stocks.
So the queryset might return one of these results:
for a given producer, all stocked resources that do not fulfill the Workflow inputs' amount conditions
for a given producer, the input resources needed for the workflow that are not in sufficient quantity in its stocks
Is it possible to do that in Django? And if yes, how?
Not sure if you've found the answer, but anyway, I hope I understand your question correctly.
Let's assume we have the following resources:
head = Resource.objects.create(name="head")
neck = Resource.objects.create(name="neck")
body = Resource.objects.create(name="body")
arm = Resource.objects.create(name="arm")
leg = Resource.objects.create(name="leg")
And we have a build_a_robot workflow:
build_a_robot = Workflow.objects.create()
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=head, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=neck, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=body, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=arm, amount=2))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=leg, amount=2))
And finally, we have a producer:
producer = Producer.objects.create(workflow=build_a_robot)
producer.stocks.add(Stock.objects.create(resource=head, amount=0))
producer.stocks.add(Stock.objects.create(resource=neck, amount=3))
producer.stocks.add(Stock.objects.create(resource=body, amount=1))
producer.stocks.add(Stock.objects.create(resource=arm, amount=10))
producer.stocks.add(Stock.objects.create(resource=leg, amount=1))
We want to find the list of resources that we have run out of to build a robot, given the producer.
I think here's one way to do it:
from django.db.models import OuterRef, Subquery

# Resources required by the producer's workflow.
required_resources = ResourceFlow.objects.filter(
    pk__in=producer.workflow.inputs.values("pk")
).values("resource")

# For each stock row, the amount the workflow requires of that resource.
required_amount = producer.workflow.inputs.filter(
    resource=OuterRef("resource")
).values("amount")[:1]

# Stocks of required resources whose amount falls below the required amount.
missing_stocks = Stock.objects.filter(resource_id__in=required_resources).filter(
    amount__lt=Subquery(required_amount)
)
In this example, missing_stocks will be equal to:
<QuerySet [<Stock: Stock [Resource [head], 0]>, <Stock: Stock [Resource [leg], 1]>]>
So, we need more head and leg to build a robot.
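Note that this only flags resources that have a Stock row at all. If you also want the second variant from your question (input ResourceFlows that aren't sufficiently stocked, including resources with no stock row whatsoever), a sketch along these lines should work (untested):

    from django.db.models import F, OuterRef, Q, Subquery

    # Amount of this resource in the producer's stocks, if any.
    stocked_amount = producer.stocks.filter(
        resource=OuterRef("resource")
    ).values("amount")[:1]

    # Inputs with no matching stock, or with less stock than required.
    insufficient_inputs = producer.workflow.inputs.annotate(
        stocked=Subquery(stocked_amount)
    ).filter(Q(stocked__isnull=True) | Q(stocked__lt=F("amount")))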
I have a simple model of an Observation made by a Sensor:
from django.utils.translation import gettext_lazy as _

class Observation(models.Model):
    class ObservationType(models.TextChoices):
        PM25 = 'pm25_kal', 'PM2,5'
        PM10 = 'pm10_kal', 'PM10'
        RH = 'rh', _('Relative humidity')
        TEMP = 'temp', _('Temperature')

    date_time = models.DateTimeField()
    sensor = models.ForeignKey(Sensor, on_delete=models.CASCADE)
    obs_type = models.CharField(max_length=8, choices=ObservationType.choices)
    value = models.DecimalField(max_digits=6, decimal_places=3)
What I want to do is get a list or QuerySet with, for each sensor, the latest Observation of a certain type that was created within the last 24 hours. I solved the problem using a model method on my Sensor model and a custom QuerySet for my Observation model to filter recent observations.
class ObservationQuerySet(models.query.QuerySet):
    def recent(self):
        return self.filter(date_time__gte=timezone.now() - timedelta(days=1))

class Sensor(models.Model):
    ...  # existing fields

    # requires: objects = ObservationQuerySet.as_manager() on Observation
    def latest_recent_observation(self, obs_type):
        try:
            return self.observation_set.filter(obs_type=obs_type).recent().latest('date_time')
        except Observation.DoesNotExist:
            return None
I can loop over all sensors and get the latest_recent_observation() for each of them, but for larger datasets it is pretty slow. Is there any way to make this more efficient?
Edit: At this moment I'm using SQLite, but I might switch to MariaDB. Would that make this faster as well?
I eventually figured it out myself. I used annotation to get the value of the latest recent observation.
from django.db.models import OuterRef, Subquery

# .recent() works here because Observation uses ObservationQuerySet as its manager.
latest_observation = Subquery(
    Observation.objects.filter(sensor_id=OuterRef('id'), obs_type=obs_type)
    .recent()
    .order_by('-date_time')
    .values('value')[:1]
)
With this, I can use annotate() on a QuerySet of my Sensor model, which returns a new QuerySet with the value of the latest Observation for each given Sensor.
sensors = Sensor.objects.all().annotate(latest_observation=latest_observation)
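As a side benefit, sensors that lack a recent observation of the given type come back with latest_observation set to None, so you can, for example, pick out stale sensors (sketch):

    stale_sensors = Sensor.objects.annotate(
        latest_observation=latest_observation,
    ).filter(latest_observation__isnull=True)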
I'm trying to reformat a previous question I posed in a more meaningful way here. If I'm not asking the correct question, or not providing enough information, please let me know.
I have data coming in from a Pi in the format below, and I would like a meaningful way of representing the data in the datastore (ndb) that limits my writes when it arrives, and limits the number of queries I need to make to find the correct place to put the data (aka the Zone):
Data comes like this:
{'rssi': '*', 'source_addr': '87:ea:66:be:19:d9', 'id': 'rx_io_data',
 'samples': [{'adc-0': 253, 'adc-1': 2}, {'adc-0': 252, 'adc-1': 2},
             {'adc-0': 252, 'adc-1': 2}, {'adc-0': 253, 'adc-1': 1},
             {'adc-0': 252, 'adc-1': 2}],
 'options': '\x00'}
The important information here is the source_addr (the MAC) and the samples (I don't need code for putting the data in, more an appropriate structure for my models).
So I will use the MAC address to find the "Zone" the readings should be associated with, then I will need to use the Zone to look up the associated sensors (seems convoluted), then map each sensor (adc-0, adc-1) to its human-readable meaning (temperature, or heart monitor). I want to keep the total readings as well as the individual sensor readings, so that later I can query for all heart monitor sensors in all zones, or per zone...
So far I have this, which seems convoluted and requires a lot of querying and puts:
class Sensors(ndb.Model):  # parent Zone
    sensorname = ndb.StringProperty(required=True)  # Heart, Temp
    sensorpin = ndb.StringProperty(required=True)   # adc-0, or adc-1 ...

class Zone(ndb.Model):
    zname = ndb.StringProperty(required=True)    # name of zone, like "Room#" or "Patient#"
    zonemac = ndb.StringProperty(required=True)  # MAC of network adapter
    homekey = ndb.KeyProperty(kind=Home, required=True)
    datecreated = ndb.DateTimeProperty(auto_now_add=True)

class Readings(ndb.Model):  # parent Zone
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
    alldata = ndb.JsonProperty(required=True)  # store all the readings in JSON for later debug

class Reading(ndb.Model):  # parent is Sensor or Zone? individual sensor readings?
    readingskey = ndb.KeyProperty(kind=Readings, required=True)  # full reading
    value = ndb.IntegerProperty(required=True)  # or 0
    name = ndb.StringProperty(required=True)    # heart sensor, temp sensor, ...
Potential Option:
class Sensors(ndb.Model):
    sensorname = ndb.StringProperty(required=True)  # Heart, Temp
    sensorpin = ndb.StringProperty(required=True)   # adc-0, or adc-1 ...

class Zone(ndb.Model):
    zname = ndb.StringProperty(required=True)    # name of zone, like "Room#" or "Patient#"
    zonemac = ndb.StringProperty(required=True)  # MAC of network adapter
    homekey = ndb.KeyProperty(kind=Home, required=True)
    # with sensors embedded you can get the sensor name, but how do you map to adc-0 or whatever?
    sensors = ndb.StructuredProperty(Sensors, repeated=True)
    datecreated = ndb.DateTimeProperty(auto_now_add=True)

class Reading(ndb.Model):  # parent is Sensor or Zone? individual sensor readings?
    # string kind avoids referencing Readings before it is defined
    readingskey = ndb.KeyProperty(kind='Readings', required=True)  # full reading
    value = ndb.IntegerProperty(required=True)  # or 0
    name = ndb.StringProperty(required=True)    # heart sensor, temp sensor, ...

class Readings(ndb.Model):  # parent Zone
    datecreated = ndb.DateTimeProperty(auto_now_add=True)
    alldata = ndb.JsonProperty(required=True)  # store all the readings in JSON for later debug
    individualreading = ndb.StructuredProperty(Reading, repeated=True)
The trick is that all I get from the devices are the MAC and the sensor mappings (adc-0, adc-1 and their values).
So I need to look up the zones they belong to, and the sensors they map to, so that I can search for them later...
I have done very little database modeling, so I have no clue how to model this. I could create new models and reference them with KeyProperties, or create StructuredProperties and query for those (eliminating the Sensors model), but then I'd still have to do the Readings and Reading puts after the lookups.
Any help is much appreciated.
And yes, I've read the ndb properties documentation, Filtering queries by structured properties, and a few similar SO posts like this one.
I guess using a StructuredProperty would save a small amount of write operations. What I would propose, though, is figuring out which properties don't have to be indexed and declaring them with the option indexed=False. As stated in the documentation, "Unindexed properties cost fewer write ops than indexed properties.", and from experience I can tell you that you can really save a lot of write ops this way.
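For example, assuming a Reading's value is never filtered or sorted on, while its name is (for queries like "all heart monitor sensors"), a sketch:

    class Reading(ndb.Model):
        readingskey = ndb.KeyProperty(kind='Readings', required=True)
        # Assumption: value is never queried on directly, so skip its index:
        value = ndb.IntegerProperty(required=True, indexed=False)
        # name stays indexed because it is used in queries:
        name = ndb.StringProperty(required=True)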
I am trying to improve the efficiency of my current queries against the App Engine datastore. Currently, I am using a synchronous method:
class Hospital(ndb.Model):
    name = ndb.StringProperty()
    # string kinds avoid referencing model classes before they are defined
    buildings = ndb.KeyProperty(kind='Building', repeated=True)

class Building(ndb.Model):
    name = ndb.StringProperty()
    rooms = ndb.KeyProperty(kind='Room', repeated=True)

class Room(ndb.Model):
    name = ndb.StringProperty()
    beds = ndb.KeyProperty(kind='Bed', repeated=True)

class Bed(ndb.Model):
    name = ndb.StringProperty()
.....
Currently I go through stupidly:
currhosp = ndb.Key(urlsafe=valid_hosp_key).get()
nbuilds = ndb.get_multi(currhosp.buildings)
for b in nbuilds:
    rms = ndb.get_multi(b.rooms)
    for r in rms:
        bds = ndb.get_multi(r.beds)
        for bed in bds:  # renamed from b so it doesn't shadow the building
            pass  # do something with the bed object
I would like to transform this into a much faster version using get_multi_async.
My difficulty is in how to do this.
Any ideas?
Best
Jon
Using the given structures above, it is possible, and it was confirmed that you can solve this with a set of tasklets. It is a SIGNIFICANT speed-up over the iterative method.
@ndb.tasklet
def get_bed_info(bed_key):
    bed_info = {}
    bed = yield bed_key.get_async()
    # format and store bed information into bed_info
    raise ndb.Return(bed_info)

@ndb.tasklet
def get_room_info(room_key):
    room_info = {}
    room = yield room_key.get_async()
    # calling a tasklet returns a Future; yielding a list of Futures waits on all in parallel
    beds = yield map(get_bed_info, room.beds)
    # store room info in room_info
    room_info["beds"] = beds
    raise ndb.Return(room_info)

@ndb.tasklet
def get_building_info(build_key):
    build_info = {}
    building = yield build_key.get_async()
    rooms = yield map(get_room_info, building.rooms)
    # store building info in build_info
    build_info["rooms"] = rooms
    raise ndb.Return(build_info)

@ndb.toplevel
def get_hospital_buildings(hospital_object):
    buildings = yield map(get_building_info, hospital_object.buildings)
    raise ndb.Return(buildings)
Now comes the main call from the hospital handler, where you have the hospital object (hospital_obj):

hosp_info = {}
buildings = get_hospital_buildings(hospital_obj)
# store hospital info in hosp_info
hosp_info["buildings"] = buildings
return hosp_info
There you go! It is incredibly efficient and lets the scheduler gather all the information in the fastest possible manner within the GAE backbone.
You can do something with query.map(). See https://developers.google.com/appengine/docs/python/ndb/async#tasklets and https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map
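For illustration, a rough sketch of Query.map() with a tasklet callback. This is hypothetical: it assumes Room entities can be queried, e.g. by ancestor, which the key-list models above don't provide:

    @ndb.tasklet
    def room_with_beds(room):
        # get_multi_async returns Futures; yielding them waits in parallel.
        beds = yield ndb.get_multi_async(room.beds)
        raise ndb.Return((room, beds))

    rooms_and_beds = Room.query(ancestor=building_key).map(room_with_beds)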
It's impossible.
Your 2nd query (ndb.get_multi(b.rooms)) depends on the result of your first query.
So pulling it async doesn't work, as at that point the (first) result of the first query has to be available anyway.
NDB does something like that in the background already (it buffers the next items of ndb.get_multi(currhosp.buildings) while you process the first result).
However, you could use denormalization, i.e. keeping a big table with one entry per Building-Room-Bed pair, and pull your results from that table.
If you have more reads than writes to this table, this will get you a massive speed improvement (1 DB read, instead of 3).
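A sketch of that denormalized kind (hypothetical names, assuming reads dominate writes):

    class HospitalBed(ndb.Model):
        # One entity per Building-Room-Bed combination under a hospital.
        hospital = ndb.KeyProperty(kind='Hospital')
        building_name = ndb.StringProperty()
        room_name = ndb.StringProperty()
        bed_name = ndb.StringProperty()

    # One query instead of three levels of get_multi:
    beds = HospitalBed.query(HospitalBed.hospital == hosp_key).fetch()

The write path then has to keep these entities in sync whenever a bed, room, or building changes.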
I'm having a hard time modeling my application's data to get reasonable performance.
It is an application that tracks costs within a group of people, and today I have the following entities:
class Event(db.Model):
    # Values
    name = db.StringProperty(required=True)
    password = db.StringProperty(required=True)

class Person(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=True)
    # Values
    name = db.StringProperty(required=True)

class Transaction(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=True)
    paidby = db.ReferenceProperty(Person, required=True)
    # Values
    description = db.StringProperty(required=True)
    amount = db.FloatProperty(required=True)
    datetime = db.DateTimeProperty(auto_now_add=True)

# This is used because a transaction might not distribute costs
# evenly across all persons belonging to the event
class TransactionPerson(db.Model):
    # References
    event = db.ReferenceProperty(Event, required=False)
    transaction = db.ReferenceProperty(Transaction, required=True)
    person = db.ReferenceProperty(Person, required=True)
    # Values
    amount = db.FloatProperty(required=True)
The problem is when I, for example, want to calculate the balance for each person: I have to get all the data associated with an event and loop through all TransactionPersons for each Transaction/Person combination (in the example below that is ~65,000 operations).
I have an event example with:
4 Persons
76 Transactions
213 TransactionPersons
And a request to the start page that shows this balance summary per person and all transactions takes:
real: 1863ms
cpu: 6900ms (1769ms real)
api: 2723ms (94ms real)
At the moment I only do 3 RPC requests to get all Persons, Transactions and TransactionPersons for an event, and then do all the "relational" work in the application; that's why the cpu ms is pretty high.
Questions:
It takes 2723ms api cpu just to get the 293 objects from 3 datastore requests; isn't that pretty high? The real time is OK (94ms), but it takes a lot from my api cpu quota.
How can I design this to get much better performance? Real ms today is 1863 for the example above, but with, for example, 12 persons the time would triple. These are not acceptable response times.
Thanks!
Generally you want to optimize for reads.
Instead of calculating a person's balance at read time, calculate changes at write time and denormalize, storing the calculated balance in the Person entity.
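A minimal sketch of that write-time update (balance is a new, assumed property on Person; adapt the sign and sharing logic to your app):

    class Person(db.Model):
        event = db.ReferenceProperty(Event, required=True)
        name = db.StringProperty(required=True)
        balance = db.FloatProperty(default=0.0)  # denormalized running balance

    def apply_share(person_key, amount):
        # Transactional, so concurrent updates don't lose increments.
        def txn():
            person = db.get(person_key)
            person.balance += amount
            person.put()
        db.run_in_transaction(txn)

Then the start page just reads the Persons for the event; no looping over TransactionPersons at read time.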