Google App Engine relation table and paging? - python

I have the following datastore models:
from google.appengine.ext import db
class One(db.Model):
    OneDateAdded = db.DateTimeProperty(auto_now_add=True)
    OneTitle = db.StringProperty()
    OneLink = db.LinkProperty()
class Two(db.Model):
    TwoDateAdded = db.DateTimeProperty(auto_now_add=True)
    TwoTitle = db.StringProperty()
    TwoLink = db.LinkProperty()
class Three(db.Model):
    ThreeDateAdded = db.DateTimeProperty(auto_now_add=True)
    ThreeTitle = db.StringProperty()
    ThreeisSomething = db.BooleanProperty(default=False)
    ThreeLink = db.LinkProperty()
And a relation table:
class Relation(db.Model):
    RelationDateAdded = db.DateTimeProperty(auto_now_add=True)
    RelationOne = db.ReferenceProperty(One)
    RelationTwo = db.ReferenceProperty(Two)
    RelationThree = db.ReferenceProperty(Three)
When I tried to implement the PagedQuery library, I ran into the problem that I can't use any kind of join because of GAE restrictions.
What I want to accomplish is a query on my relation table where RelationThree.ThreeisSomething = True.
Looping over a set of results does not seem to be a solution, because the paging would not work (it gets 10 results, 2 are true and 8 are false, resulting in a page with only 2 results).
Is there a way to do something simple like this (which does not work)?
myPagedQuery = paging.PagedQuery(Relation.all().filter('Three.ThreeisSomething =', True), 10)
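To make the paging problem concrete, here is a plain-Python sketch in which a list stands in for the datastore. No App Engine API is used; `Row` and its flattened `three_is_something` field are hypothetical, and page numbers are 0-based. Filtering after fetching a page produces short pages, while filtering before paginating keeps every page full:

```python
from dataclasses import dataclass

@dataclass
class Row:
    """Hypothetical stand-in for a Relation entity (not a datastore model).

    `three_is_something` mirrors RelationThree.ThreeisSomething, flattened
    onto the row for the sake of the example.
    """
    id: int
    three_is_something: bool

def page_then_filter(rows, page, page_size):
    # Fetch one page first, then drop non-matching rows in Python:
    # the page shrinks to however many rows happen to match.
    start = page * page_size
    window = rows[start:start + page_size]
    return [r for r in window if r.three_is_something]

def filter_then_page(rows, page, page_size):
    # Filter first, then slice the page: every page is full except possibly the last.
    matching = [r for r in rows if r.three_is_something]
    start = page * page_size
    return matching[start:start + page_size]

# 100 rows where only every 5th row has the flag set
rows = [Row(i, i % 5 == 0) for i in range(100)]
print(len(page_then_filter(rows, 0, 10)))  # 2
print(len(filter_then_page(rows, 0, 10)))  # 10
```

This is why the filter has to be applied by the query itself (or on data stored so that it can be), rather than in a loop over already-paged results.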

You can use limit and offset, as documented here. Be wary, though, that using them can be expensive: setting an offset of 100 and a limit of 10 actually loads 110 records and gives you the last 10.
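A rough sketch of that cost arithmetic, assuming 1-based page numbers. `page_window` is a hypothetical helper, not part of PagedQuery or the datastore API; its offset and limit could then feed something like `query.fetch(limit, offset=offset)` in the old `db` API:

```python
def page_window(page_number, page_size):
    """Map a 1-based page number to (offset, limit, entities_read).

    entities_read = offset + limit, because the skipped entities are still
    read by the datastore before the requested ones are returned.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    offset = (page_number - 1) * page_size
    limit = page_size
    return offset, limit, offset + limit

print(page_window(11, 10))  # (100, 10, 110)
```

The cost of page n therefore grows linearly with n, which is why deep pages get slower with this approach.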

Related

How to build a Django QuerySet to check conditions on two manyToMany fields

I have the following models (simplified):
from django.db import models
class Resource(models.Model):
    name = models.CharField(max_length=64, unique=True)
class ResourceFlow(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="flow")
    amount = models.IntegerField()
class Workflow(models.Model):
    inputs = models.ManyToManyField(ResourceFlow, related_name="workflow")
class Stock(models.Model):
    resource = models.ForeignKey(Resource, on_delete=models.CASCADE, related_name="stock")
    amount = models.IntegerField()
class Producer(models.Model):
    workflow = models.ForeignKey(Workflow, on_delete=models.CASCADE, related_name="location")
    stocks = models.ManyToManyField(Stock, related_name="location")
I would like to test, with the computation done by the DB engine, whether I can start a production.
A production can start if I have enough stock: for my Producer's workflow, the amount of every input ResourceFlow has to be present in the Producer's stocks.
So the queryset could return one of these results:
- for a given producer, return all stocked resources that do not fulfill the Workflow input amount conditions
- for a given producer, return the input resources needed for the workflow that are not in sufficient quantity in its stocks
Is it possible to do that in Django? And if so, how?
Not sure if you've found the answer already, but anyway, I hope I understand your question correctly.
Let's assume we have the following resources:
head = Resource.objects.create(name="head")
neck = Resource.objects.create(name="neck")
body = Resource.objects.create(name="body")
arm = Resource.objects.create(name="arm")
leg = Resource.objects.create(name="leg")
And we have a build_a_robot workflow:
build_a_robot = Workflow.objects.create()
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=head, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=neck, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=body, amount=1))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=arm, amount=2))
build_a_robot.inputs.add(ResourceFlow.objects.create(resource=leg, amount=2))
And finally, we have a producer:
producer = Producer.objects.create(workflow=build_a_robot)
producer.stocks.add(Stock.objects.create(resource=head, amount=0))
producer.stocks.add(Stock.objects.create(resource=neck, amount=3))
producer.stocks.add(Stock.objects.create(resource=body, amount=1))
producer.stocks.add(Stock.objects.create(resource=arm, amount=10))
producer.stocks.add(Stock.objects.create(resource=leg, amount=1))
We want to find the list of resources that the given producer does not have enough of to build a robot.
I think here's one way to do it:
from django.db.models import OuterRef, Subquery
required_resources = ResourceFlow.objects.filter(pk__in=producer.workflow.inputs.values("pk")).values("resource")
required_amount = producer.workflow.inputs.filter(resource=OuterRef("resource")).values("amount")[:1]
# Only look at this producer's stocks, comparing each one against the amount its workflow requires
missing_stocks = producer.stocks.filter(resource_id__in=required_resources).filter(amount__lt=Subquery(required_amount))
In this example, missing_stocks will be equal to:
<QuerySet [<Stock: Stock [Resource [head], 0]>, <Stock: Stock [Resource [leg], 1]>]>
So we need more heads and legs to build a robot.
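When checking the queryset's result, it can help to have the same rule expressed in plain Python. This sketch uses dicts as stand-ins for the workflow inputs and the producer's stocks, filled with the example amounts above; `missing_resources` is a hypothetical helper, not part of Django:

```python
def missing_resources(required, stocked):
    """Return resource names whose stocked amount is below the required amount.

    `required` maps resource name -> amount needed by the workflow inputs;
    `stocked` maps resource name -> amount the producer holds. A resource
    with no stock entry at all is treated as 0 in stock.
    """
    return sorted(
        name for name, amount in required.items()
        if stocked.get(name, 0) < amount
    )

required = {"head": 1, "neck": 1, "body": 1, "arm": 2, "leg": 2}
stocked = {"head": 0, "neck": 3, "body": 1, "arm": 10, "leg": 1}
print(missing_resources(required, stocked))  # ['head', 'leg']
```

One difference worth noting: this sketch counts a resource with no Stock row as missing, whereas the queryset above only returns Stock rows that exist.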

How to query additional databases using cursor in Django Pytests

I am developing a Django app (Django v3.2.10, pytest v7.0.1, pytest-django v4.5.2) which uses a cursor to perform raw queries on my secondary DB, my_db2, but when running tests, all the queries return empty results, as if they were running in separate transactions.
My test file:
import pytest
@pytest.mark.django_db(transaction=True, databases=['default', 'my_db2'])
class TestItems:
    def test_people(self):
        person1 = PeopleFactory()  # Adds 1 person to my_db2
        assert fetch_all_persons() == 1  # Fails: returns 0
My Factory:
class PeopleFactory(factory.django.DjangoModelFactory):
    id = factory.Sequence(lambda x: x + 1)
    name = factory.Faker('first_name')
    class Meta:
        model = People
My function:
from django.db import connections
def fetch_all_persons():
    with connections['my_db2'].cursor() as cursor:
        cursor.execute("SELECT * FROM Persons")
        return len(cursor.fetchall())
According to the documentation, transaction=True should prevent this issue, but it doesn't. Does anybody know how to fix it?
Note: using the ORM is not an option; this is just a simplified example to represent the issue. The real queries are far more complex.
@hoefling and @Arkadiusz Łukasiewicz were right: I just needed to set the corresponding database in the factory:
class PeopleFactory(factory.django.DjangoModelFactory):
    id = factory.Sequence(lambda x: x + 1)
    name = factory.Faker('first_name')
    class Meta:
        model = People
        database = 'my_db2'
Thank you both.
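Why the count was 0 can be shown with an analogy using the standard-library `sqlite3` module (not Django): two independent in-memory databases stand in for the `default` and `my_db2` aliases. It assumes no database router sends `People` to `my_db2`, so a factory without `database` writes to `default`:

```python
import sqlite3

# Two independent in-memory databases standing in for Django's
# 'default' and 'my_db2' aliases (an analogy, not Django itself).
default_db = sqlite3.connect(":memory:")
my_db2 = sqlite3.connect(":memory:")
for conn in (default_db, my_db2):
    conn.execute("CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT)")

def fetch_all_persons(conn):
    # Same idea as the raw-cursor helper: count rows on one specific connection
    cur = conn.execute("SELECT * FROM Persons")
    return len(cur.fetchall())

# Factory without Meta.database: the row lands in 'default'
default_db.execute("INSERT INTO Persons (name) VALUES ('Ada')")
print(fetch_all_persons(my_db2))  # 0

# Factory with database = 'my_db2': the row lands where the query looks
my_db2.execute("INSERT INTO Persons (name) VALUES ('Ada')")
print(fetch_all_persons(my_db2))  # 1
```

So the test's transactions were not the problem: the write and the raw query simply targeted different databases.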

peewee check automatically created id not in result of subquery

I have the following data structure:
from enum import IntEnum, unique
from pathlib import Path
from datetime import datetime
from peewee import *
@unique
class Status(IntEnum):
    CREATED = 0
    FAIL = -1
    SUCCESS = 1
db_path = Path(__file__).parent / "test.sqlite"
database = SqliteDatabase(db_path)
class BaseModel(Model):
    class Meta:
        database = database
class Unit(BaseModel):
    name = TextField(unique=True)
    some_field = TextField(null=True)
    created_at = DateTimeField(default=datetime.now)
class Campaign(BaseModel):
    id_ = AutoField()
    created_at = DateTimeField(default=datetime.now)
class Task(BaseModel):
    id_ = AutoField()
    status = IntegerField(default=Status.CREATED)
    unit = ForeignKeyField(Unit, backref="tasks")
    campaign = ForeignKeyField(Campaign, backref="tasks")
The following code creates units, a campaign and its tasks:
def fill_units(count):
    units = []
    with database.atomic():
        for i in range(count):
            units.append(Unit.create(name=f"unit{i}"))
    return units
def init_campaign(count):
    units = Unit.select().limit(count)
    with database.atomic():
        campaign = Campaign.create()
        for unit in units:
            Task.create(unit=unit, campaign=campaign)
    return campaign
The problem appears when I try to add more units to an existing campaign. I need to select the units that haven't been used in this campaign yet. In SQL I can do this with the following query:
SELECT * FROM unit WHERE id NOT IN (SELECT unit_id FROM task WHERE campaign_id = 1) LIMIT 10
But how to do this using peewee?
The only way I've found so far is:
def get_new_units_for_campaign(campaign, count):
    unit_names = [task.unit.name for task in campaign.tasks]
    units = Unit.select().where(Unit.name.not_in(unit_names)).limit(count)
    return units
It somehow works, but I'm 100% sure that it's the dumbest way to implement this. Could you show me the proper way to do it?
Finally, I found this:
Unit.select().where(Unit.id.not_in(campaign.tasks.select(Task.unit))).limit(10)
This produces:
SELECT "t1"."id", "t1"."name", "t1"."some_field", "t1"."created_at" FROM "unit" AS "t1" WHERE ("t1"."id" NOT IN (SELECT "t2"."unit_id" FROM "task" AS "t2" WHERE ("t2"."campaign_id" = 1))) LIMIT 10
That matches the SQL query I provided in my question.
P.S. I've done some research and it seems to be a proper implementation, but I'd appreciate it if somebody could correct me and show a better way (if one exists).
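To sanity-check that the `NOT IN` subquery returns only units unused by a given campaign, here is a small sketch using the standard-library `sqlite3` module directly (not peewee), with tables shaped like the ones in the generated SQL. The sample rows are made up for illustration, and `ORDER BY id` is added only to make the output order deterministic:

```python
import sqlite3

# In-memory schema mirroring the unit/task columns in the generated SQL
# (campaign table omitted; only campaign_id values matter here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE task (id_ INTEGER PRIMARY KEY, unit_id INTEGER NOT NULL, campaign_id INTEGER NOT NULL);
""")
# Units get ids 1..5 (unit0..unit4)
conn.executemany("INSERT INTO unit (name) VALUES (?)", [(f"unit{i}",) for i in range(5)])
# Campaign 1 already uses unit ids 1 and 2; campaign 2 uses unit id 3
conn.executemany(
    "INSERT INTO task (unit_id, campaign_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2)],
)

rows = conn.execute(
    "SELECT id, name FROM unit "
    "WHERE id NOT IN (SELECT unit_id FROM task WHERE campaign_id = ?) "
    "ORDER BY id LIMIT 10",
    (1,),
).fetchall()
print(rows)  # [(3, 'unit2'), (4, 'unit3'), (5, 'unit4')]
```

Unit 3 is still offered for campaign 1 even though campaign 2 uses it, because the subquery is restricted to `campaign_id = 1`. One general caveat of `NOT IN`: if the subquery could return a NULL, no rows would match; that does not arise here, since peewee's `ForeignKeyField` is non-nullable by default.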

How to create a custom AutoField primary_key entry within Django

I am trying to create a custom primary_key within my helpdesk/models.py that I will use to track our help desk tickets. I am in the process of writing a small ticketing system for our office.
Maybe there is a better way? Right now I have:
id = models.AutoField(primary_key=True)
This increments in the database as 1, 2, 3, 4, ..., 50, ...
I want to take this id assignment and use it in a function that combines it with some additional information, like the date and the name 'HELPDESK'.
The code I was using is as follows:
id = models.AutoField(primary_key=True)
def build_id(self, id):
    join_dates = str(datetime.now().strftime('%Y%m%d'))
    return (('HELPDESK-' + join_dates) + '-' + str(id))
ticket_id = models.CharField(max_length=15, default=(build_id(None, id)))
The idea is that the entries in the database would be:
HELPDESK-20170813-1
HELPDESK-20170813-2
HELPDESK-20170814-3
...
HELPDESK-20170901-4
...
HELPDESK-20180101-50
...
I want to then use this as the ForeignKey to link the help desk ticket to some other models in the database.
Right now what's coming back is:
HELPDESK-20170813-<django.db.models.fields.AutoField>
This post works: Custom Auto Increment Field Django. I'm curious if there is a better way; if not, this will suffice.
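The `HELPDESK-20170813-<django.db.models.fields.AutoField>` output comes from `default=(build_id(None, id))` being evaluated once, while the class body runs; at that moment `id` is the field object, not a row's integer. A plain-Python sketch reproduces this without Django; `FakeAutoField` is a hypothetical stand-in, not Django's class:

```python
from datetime import datetime

class FakeAutoField:
    """Hypothetical stand-in for a Django field object; not the real AutoField."""
    def __repr__(self):
        return "<FakeAutoField>"

def build_id(self, id):
    join_dates = datetime.now().strftime('%Y%m%d')
    return 'HELPDESK-' + join_dates + '-' + str(id)

class Ticket:
    id = FakeAutoField()
    # Runs once, at class-definition time: `id` here is the field object itself,
    # so str(id) falls back to its repr instead of a database-assigned number.
    ticket_id_default = build_id(None, id)

print(Ticket.ticket_id_default)  # e.g. HELPDESK-<today>-<FakeAutoField>
```

Passing a callable as `default` (as the answer below does) defers evaluation to each new instance, but Django calls it with no arguments, so it still cannot see the row's own id.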
This works for me. It's a slightly modified version from Custom Auto Increment Field Django from above.
models.py
from datetime import datetime
# `helpdesk` is the ticket model class defined in this models.py
def increment_helpdesk_number():
    last_helpdesk = helpdesk.objects.all().order_by('id').last()
    if not last_helpdesk:
        return 'HEL-' + str(datetime.now().strftime('%Y%m%d-')) + '0000'
    help_id = last_helpdesk.help_num
    help_int = help_id[13:17]  # the 4-digit counter after 'HEL-YYYYMMDD-'
    new_help_int = int(help_int) + 1
    new_help_id = 'HEL-' + str(datetime.now().strftime('%Y%m%d-')) + str(new_help_int).zfill(4)
    return new_help_id
It's called like this:
help_num = models.CharField(max_length=17, unique=True, default=increment_helpdesk_number, editable=False)
It gives you the following:
HEL-20170815-0000
HEL-20170815-0001
HEL-20170815-0002
...
The numbering doesn't start over each day, which is something I may look into. The more I think about it, however, the less sure I am that I even need the date there, as I already have a creation date field in the model. So I may just change it to:
HEL-000000000
HEL-000000001
HEL-000000002
...
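The string logic of `increment_helpdesk_number` can be pulled out into a pure function and tested without Django. In this sketch, `next_help_num` is a hypothetical name; it takes the previous number and the date as arguments instead of querying the model and calling `datetime.now()`:

```python
from datetime import date

def next_help_num(last_help_num, today):
    """Compute the next ticket number in the HEL-YYYYMMDD-NNNN format.

    Mirrors increment_helpdesk_number above, but receives the previous
    number (or None for the first ticket) and the current date.
    """
    prefix = "HEL-" + today.strftime("%Y%m%d-")
    if last_help_num is None:
        return prefix + "0000"
    # Characters 13..16 hold the 4-digit counter, e.g. "HEL-20170815-0001"
    counter = int(last_help_num[13:17]) + 1
    return prefix + str(counter).zfill(4)

print(next_help_num(None, date(2017, 8, 15)))                 # HEL-20170815-0000
print(next_help_num("HEL-20170815-0000", date(2017, 8, 15)))  # HEL-20170815-0001
print(next_help_num("HEL-20170815-0041", date(2017, 8, 16)))  # HEL-20170816-0042
```

As noted above, the counter carries over across days. Because the value is derived from the last saved row, two concurrent requests can compute the same number; `unique=True` turns that into an integrity error on the second save rather than a silent duplicate. Past 9999 the counter also needs more than the 17 characters allowed by `max_length`.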

Using yield with multiple ndb.get_multi_async

I am trying to improve the efficiency of my current queries against the App Engine datastore. Currently, I am using a synchronous method:
class Hospital(ndb.Model):
    name = ndb.StringProperty()
    buildings = ndb.KeyProperty(kind='Building', repeated=True)
class Building(ndb.Model):
    name = ndb.StringProperty()
    rooms = ndb.KeyProperty(kind='Room', repeated=True)
class Room(ndb.Model):
    name = ndb.StringProperty()
    beds = ndb.KeyProperty(kind='Bed', repeated=True)
class Bed(ndb.Model):
    name = ndb.StringProperty()
    .....
Currently I go through them naively:
currhosp = ndb.Key(urlsafe=valid_hosp_key).get()
nbuilds = ndb.get_multi(currhosp.buildings)
for b in nbuilds:
    rms = ndb.get_multi(b.rooms)
    for r in rms:
        bds = ndb.get_multi(r.beds)
        for bed in bds:
            pass  # do something with the bed object
I would like to turn this into a much faster query using get_multi_async.
My difficulty is figuring out how to do this.
Any ideas?
Best
Jon
Using the structures given above, it is possible, and it was confirmed that you can solve this with a set of tasklets. It is a significant speedup over the iterative method.
from google.appengine.ext import ndb
@ndb.tasklet
def get_bed_info(bed_key):
    bed_info = {}
    bed = yield bed_key.get_async()
    # format and store bed information into bed_info
    raise ndb.Return(bed_info)
@ndb.tasklet
def get_room_info(room_key):
    room_info = {}
    room = yield room_key.get_async()
    # yielding a list of futures waits for all of them concurrently
    beds = yield [get_bed_info(k) for k in room.beds]
    # store room info in room_info
    room_info["beds"] = beds
    raise ndb.Return(room_info)
@ndb.tasklet
def get_building_info(build_key):
    build_info = {}
    building = yield build_key.get_async()
    rooms = yield [get_room_info(k) for k in building.rooms]
    # store building info in build_info
    build_info["rooms"] = rooms
    raise ndb.Return(build_info)
# toplevel runs the event loop until all tasklets finish and returns the result
@ndb.toplevel
def get_hospital_buildings(hospital_object):
    buildings = yield [get_building_info(k) for k in hospital_object.buildings]
    raise ndb.Return(buildings)
Now comes the main call from the hospital function, where you have the hospital object (hospital_obj):
hosp_info = {}
buildings = get_hospital_buildings(hospital_obj)
# store hospital info in hosp_info
hosp_info["buildings"] = buildings
return hosp_info
There you go! It is very efficient: the tasklets let the ndb event loop run sibling fetches concurrently instead of waiting on each get_multi in turn.
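The same fan-out pattern can be tried outside App Engine. This sketch swaps in `asyncio` as an analogy for ndb tasklets (it is not ndb), with a hypothetical in-memory `STORE` and a simulated round trip. Each level still waits for its parent entity, but siblings at the same level are fetched concurrently, so total latency grows with the depth of the tree rather than with the number of entities:

```python
import asyncio

# Hypothetical in-memory "datastore": key -> entity dict (stand-in for ndb).
STORE = {
    "b1": {"name": "Building 1", "rooms": ["r1", "r2"]},
    "r1": {"name": "Room 1", "beds": ["bed1", "bed2"]},
    "r2": {"name": "Room 2", "beds": ["bed3"]},
    "bed1": {"name": "Bed 1"},
    "bed2": {"name": "Bed 2"},
    "bed3": {"name": "Bed 3"},
}

async def get_async(key):
    # Simulated network round trip; concurrent calls overlap while waiting
    await asyncio.sleep(0.01)
    return STORE[key]

async def get_bed_info(bed_key):
    bed = await get_async(bed_key)
    return {"name": bed["name"]}

async def get_room_info(room_key):
    room = await get_async(room_key)
    # Sibling beds fetched together, like `yield [get_bed_info(k) for k in keys]`
    beds = await asyncio.gather(*(get_bed_info(k) for k in room["beds"]))
    return {"name": room["name"], "beds": list(beds)}

async def get_building_info(build_key):
    building = await get_async(build_key)
    rooms = await asyncio.gather(*(get_room_info(k) for k in building["rooms"]))
    return {"name": building["name"], "rooms": list(rooms)}

async def get_hospital_buildings(building_keys):
    return list(await asyncio.gather(*(get_building_info(k) for k in building_keys)))

result = asyncio.run(get_hospital_buildings(["b1"]))
print(result)
```

`asyncio.gather` returns results in the order of its arguments, so the nested structure keeps the same ordering as the key lists.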
You can do something with query.map(). See https://developers.google.com/appengine/docs/python/ndb/async#tasklets and https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map
It's impossible.
Your second query (ndb.get_multi(b.rooms)) depends on the result of your first query.
So fetching it asynchronously doesn't help, as at that point the (first) result of the first query has to be available anyway.
NDB does something like that in the background (it already buffers the next items of ndb.get_multi(currhosp.buildings) while you process the first result).
However, you could use denormalization, i.e. keep a big table with one entry per Building-Room-Bed combination, and pull your results from that table.
If you have more reads than writes to this table, this will get you a massive speed improvement (1 DB read, instead of 3).
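A rough sketch of the denormalized layout, using plain Python tuples as stand-in rows. The `(hospital, building, room, bed)` column layout and the `beds_for_hospital` helper are assumptions for illustration, not an ndb model: one read filtered by hospital returns everything, and the hierarchy is rebuilt in memory.

```python
from collections import defaultdict

# Hypothetical denormalized table: one row per Building-Room-Bed combination,
# rewritten whenever the hierarchy changes.
rows = [
    ("hosp1", "Building 1", "Room 1", "Bed 1"),
    ("hosp1", "Building 1", "Room 1", "Bed 2"),
    ("hosp1", "Building 1", "Room 2", "Bed 3"),
    ("hosp2", "Building A", "Room X", "Bed 9"),
]

def beds_for_hospital(table, hospital):
    """Single pass over the flat table instead of three dependent fetches."""
    tree = defaultdict(lambda: defaultdict(list))
    for hosp, building, room, bed in table:
        if hosp == hospital:
            tree[building][room].append(bed)
    # Convert nested defaultdicts to plain dicts for readability
    return {b: dict(rooms) for b, rooms in tree.items()}

print(beds_for_hospital(rows, "hosp1"))
```

The trade-off is on the write side: every change to a building, room or bed also has to update the flat table, which is why this pays off mainly when reads far outnumber writes.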
