Note: This is a simplified example of what I'm actually trying to do here.
I have the following Parent-Child relationship both driven off a declarative_base.
class Parent(declarative_base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    _children = relationship("Child", lazy='dynamic')

    def total_for_date(self, date):
        return sum([child.num for child in self._children.filter(Child.date == date)])

    @classmethod
    def total_for_date_query(cls, date):
        #TODO Return a query that represents this...
        pass

class Child(declarative_base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    num = Column(Integer)
    date = Column(Date)
    parent_id = Column(Integer, ForeignKey('parents.id'))
    _parent = relationship("Parent")
I'd like to calculate a total of a certain number associated with a child, given a parent query. This can be done in Python like so:
q = session.query(Parent).filter(Parent.id.in_([4,5,10,...]))
total = sum([parent.total_for_date(datetime.date(2018, 1, 2)) for parent in q.all()])
However, the computation here is done in Python and, given a large amount of data, won't perform as well as it would in SQL.
I'm trying to figure out a way, using hybrid expressions, selects, SQLAlchemy queries etc., to have an equivalent method on the parent that returns a query/selectable/expression that lets me perform the computation on the SQL side while keeping an interface similar to the other method.
In this example, I'd like to do the following instead.
q = session.query(Parent).filter(Parent.id.in_([4,5,10]))
total = q.select_entity_from(Parent.total_for_date_query(datetime.date(2018, 1, 2))).scalar()
#Note idk if "select_entity_from" is what I want here
But I don't know how to fill out the SQL-side method equivalent total_for_date_query. I just can't seem to wrap my head around when to use a Query vs. Selectable, hybrid property expressions vs. hybrid method expressions etc.
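One possible way to fill in the SQL-side method, sketched under assumptions rather than offered as the definitive answer: the classmethod below (named `sum_for_date`, a hypothetical name) takes an existing Parent query, reduces it to a subquery of parent ids, and lets the database sum `Child.num` for those parents. The in-memory SQLite engine, the sample rows, and the `coalesce` to 0 are illustrative choices, and SQLAlchemy 1.4 or later is assumed.

```python
import datetime

from sqlalchemy import Column, Date, ForeignKey, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)

    @classmethod
    def sum_for_date(cls, query, date):
        """Return a query that sums Child.num on `date` for the parents in `query`."""
        # Reuse the caller's Parent query as a SELECT of parent ids.
        parent_ids = query.with_entities(cls.id).statement
        return query.session.query(
            func.coalesce(func.sum(Child.num), 0)
        ).filter(Child.parent_id.in_(parent_ids), Child.date == date)


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    num = Column(Integer)
    date = Column(Date)
    parent_id = Column(Integer, ForeignKey("parents.id"))


# Illustrative data only.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
day = datetime.date(2018, 1, 2)
session.add_all([
    Parent(id=4), Parent(id=5), Parent(id=6),
    Child(num=3, date=day, parent_id=4),
    Child(num=100, date=datetime.date(2018, 1, 3), parent_id=4),
    Child(num=7, date=day, parent_id=5),
    Child(num=50, date=day, parent_id=6),
])
session.commit()

q = session.query(Parent).filter(Parent.id.in_([4, 5, 10]))
total = Parent.sum_for_date(q, day).scalar()
```

The caller still starts from a Parent query, as in the Python-side version, but only a single aggregate row travels back from the database.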
Related
This seems like a real beginner question, but I'm having trouble finding a simple answer. I have simplified this down to just the bare bones with a simple data model representing a one-to-many relationship:
class Room(db.Model):
    __tablename__ = 'rooms'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), unique=True)
    capacity = db.Column(db.Integer)
    events = db.relationship('Event', backref='room')

class Event(db.Model):
    __tablename__ = 'counts'
    id = db.Column(db.Integer, primary_key=True)
    unusedCapacity = db.Column(db.Integer)
    attendance = db.Column(db.Integer)
    room_id = db.Column(db.Integer, db.ForeignKey('rooms.id'))
Event.unusedCapacity is calculated as Room.capacity - Event.attendance, but I need to store the value in the column — Room.capacity may change over time, but the Event.unusedCapacity needs to reflect the actual unused capacity at the time of the Event.
I am currently querying the Room and then creating the event:
room = Room.query.get(room_id) # using Flask-SQLAlchemy
event = Event(unusedCapacity = room.capacity - attendance, ...etc)
My question is: is there a more efficient way to do this in one step?
As noted in the comments by @SuperShoot, a query on insert can calculate the unused capacity in the database without having to fetch first. An explicit constructor, such as shown by @tooTired, could pass a scalar subquery as unusedCapacity:
class Event(db.Model):
    ...
    def __init__(self, **kwgs):
        if 'unusedCapacity' not in kwgs:
            kwgs['unusedCapacity'] = \
                db.select([Room.capacity - kwgs['attendance']]).\
                where(Room.id == kwgs['room_id']).\
                as_scalar()
        super().__init__(**kwgs)
Though it is possible to use client-invoked SQL expressions as defaults, I'm not sure how one could refer to the values being inserted from within the expression without using a context-sensitive default function. I tried that, but it did not quite work out: the scalar subquery was not inlined, and SQLAlchemy tried to pass it using placeholders instead.
A downside of the __init__ approach is that you cannot perform bulk inserts that would handle unused capacity using the table created for the model as is, but will have to perform a manual query that does the same.
Another thing to look out for is that until a flush takes place the unusedCapacity attribute of a new Event object holds the SQL expression object, not the actual value. The solution by @tooTired is more transparent in this regard, since a new Event object will hold the numeric value of unused capacity from the get go.
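The flush behaviour described above can be observed with a small sketch. It uses plain SQLAlchemy (1.4 or later assumed) instead of Flask-SQLAlchemy, an in-memory SQLite database, and simplified stand-ins for the Room and Event models, so the table and column names here are illustrative.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Room(Base):
    __tablename__ = "rooms"
    id = Column(Integer, primary_key=True)
    capacity = Column(Integer)


class Event(Base):
    __tablename__ = "events"
    id = Column(Integer, primary_key=True)
    unused_capacity = Column(Integer)
    attendance = Column(Integer)
    room_id = Column(Integer, ForeignKey("rooms.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(Room(id=1, capacity=100))
session.commit()

# Scalar subquery evaluated by the database as part of the INSERT.
unused = select(Room.capacity - 30).where(Room.id == 1).scalar_subquery()
event = Event(attendance=30, room_id=1, unused_capacity=unused)
session.add(event)

# Before the flush the attribute still holds the SQL expression object.
holds_expression = not isinstance(event.unused_capacity, int)

session.flush()
# After the flush the value computed by the database is loaded on access.
value_after_flush = event.unused_capacity
```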
SQLAlchemy adds an implicit constructor to all model classes which accepts keyword arguments for all its columns and relationships. You can override this and pass the kwargs without unusedCapacity and get the room capacity in the constructor:
class Event(db.Model):
    # ...
    # kwargs without unusedCapacity
    def __init__(self, **kwargs):
        room = Room.query.get(kwargs.get('room_id'))
        super(Event, self).__init__(unusedCapacity = room.capacity - kwargs.get('attendance'), **kwargs)

# Create new event normally
event = Event(id = 1, attendance = 1, room_id = 1)
I made this statement using flask-sqlalchemy and I've chosen to keep it in its original form. Post.query is equivalent to session.query(Post)
I attempted to make a subquery that would filter out all posts in a database which are in the draft state and not made or modified by the current user. I made this query,
Post.query\
    .filter(sqlalchemy.and_(
        Post.post_status != Consts.PostStatuses["Draft"],
        sqlalchemy.or_(
            Post.modified_by_id == current_user.get_id(),
            Post.created_by_id == current_user.get_id())))
which created:
Where true AND ("Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Expected outcome:
Where "Post".post_status != "Draft" AND (
"Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Why is this happening? How can I increase the error level in SQLAlchemy? I think my project is silently failing and I would like to confirm my guess.
Update:
I used the wrong constants dictionary. One dictionary contains ints, the other contains strings (one for database queries, one for printing).
_post_status = db.Column(
    db.SmallInteger,
    default=Consts.post_status["Draft"])
post_status contains integers, Consts.PostStatuses contains strings. In hindsight, a really bad idea. I'm going to make a single dictionary that returns a tuple instead of two dictionaries.
@property
def post_status(self):
    return Consts.post_status.get(getattr(self, "_post_status", None))
the problem is that your post_status property isn't acceptable for usage in an ORM level query, as this is a python descriptor which at the class level by default returns itself:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @property
    def post_status(self):
        return self._post_status

print (A.post_status)
print (A.post_status != 5678)
output:
$ python test.py
<property object at 0x10165bd08>
True
the type of usage you're looking for seems like that of a hybrid attribute, which is a SQLAlchemy-included extension to a "regular" python descriptor which produces class-level behavior that's compatible with core SQL expressions:
from sqlalchemy.ext.hybrid import hybrid_property

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @hybrid_property
    def post_status(self):
        return self._post_status

print (A.post_status)
print (A.post_status != 5678)
output:
$ python test.py
A._post_status
a._post_status != :_post_status_1
be sure to read the hybrid docs carefully, including how to establish the correct SQL expression behavior; descriptors that work at both the instance and class level are a somewhat advanced Python technique.
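As a rough illustration of what "establishing the correct SQL expression behavior" can look like for the integer-backed status column from the question, the sketch below gives the hybrid a separate class-level expression. `STATUS_NAMES` is a hypothetical stand-in for the question's constants dictionary, the `Post` model is reduced to two columns, and SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Column, Integer, case, create_engine
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical mapping from stored integers to display names.
STATUS_NAMES = {0: "Draft", 1: "Published"}


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    _post_status = Column("post_status", Integer, default=0)

    @hybrid_property
    def post_status(self):
        # Instance level: ordinary Python dictionary lookup.
        return STATUS_NAMES.get(self._post_status)

    @post_status.expression
    def post_status(cls):
        # Class level: a SQL CASE that maps the stored integer to its name.
        return case(STATUS_NAMES, value=cls._post_status)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Post(id=1, _post_status=0), Post(id=2, _post_status=1)])
session.commit()

# The comparison is now rendered as SQL instead of collapsing to True.
rows = (
    session.query(Post.id)
    .filter(Post.post_status != "Draft")
    .order_by(Post.id)
    .all()
)
non_draft_ids = [row[0] for row in rows]
instance_value = session.get(Post, 1).post_status
```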
Here's an adjacency list example:
class TreeNode(Base):
    __tablename__ = 'tree'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey(id))
    name = Column(String(50), nullable=False)
    children = relationship("TreeNode",
        cascade="all",
        backref=backref("parent", remote_side=id)
    )
Supposing I've got a simple linear structure:
(0)---->(1)---->(2)---->(3)
How do I get all ancestor nodes of a certain node? Something like node2.parents.all() that returns a list of nodes 0 and 1.
I tried to do this:
parents = relationship("TreeNode", cascade="all", primaryjoin="TreeNode.parent_id==TreeNode.id")
with no luck - it returns children instead of parents.
Thanks.
You cannot do this with a simple relationship.
If you use MSSQL or PostgreSQL, try instead to create a (hybrid) attribute that makes use of Query.cte.
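A minimal sketch of the recursive CTE idea, written as a plain helper function rather than a hybrid attribute. The function name `ancestors`, the reduced `TreeNode` model, and the use of in-memory SQLite (which also supports recursive CTEs) are assumptions made for illustration; SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class TreeNode(Base):
    __tablename__ = "tree"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("tree.id"))
    name = Column(String(50), nullable=False)


def ancestors(session, node_id):
    """Return every ancestor of the node with `node_id`, ordered by id."""
    # Anchor member: the parent id of the starting node.
    cte = (
        session.query(TreeNode.parent_id.label("id"))
        .filter(TreeNode.id == node_id)
        .cte(name="ancestors", recursive=True)
    )
    # Recursive member: the parent id of each id collected so far.
    parent = aliased(TreeNode)
    cte = cte.union_all(
        session.query(parent.parent_id.label("id")).filter(parent.id == cte.c.id)
    )
    return (
        session.query(TreeNode)
        .join(cte, TreeNode.id == cte.c.id)
        .order_by(TreeNode.id)
        .all()
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
# Linear chain n0 -> n1 -> n2 -> n3, stored with ids 1 to 4.
session.add_all([
    TreeNode(id=1, parent_id=None, name="n0"),
    TreeNode(id=2, parent_id=1, name="n1"),
    TreeNode(id=3, parent_id=2, name="n2"),
    TreeNode(id=4, parent_id=3, name="n3"),
])
session.commit()

ancestor_names = [node.name for node in ancestors(session, 3)]
```

The whole walk up the tree happens in one SQL statement, whereas following `parent` in a Python loop issues one lazy load per level.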
Thank you, I'll look it up; for now it seems a little bit obscure to me. If someone else stumbles on this, it's possible to use a rather more expensive approach which still does what I want:
@property
def parents(self):
    allparents = []
    p = self.parent
    while p:
        allparents.append(p)
        p = p.parent
    return allparents
Looking at the bottom of the post you can see I have three classes. The code here is pseudocode written on the fly and untested; however, it adequately shows my problem. If we need the actual classes I can update this question tomorrow when at work. So ignore syntax issues and code that only represents a thought rather than the actual "code" that would do what I describe there.
Question 1
If you look at the Item search class method you can see that when the user does a search I call search on the base class and then, based on that result, return the correct class/object. This works but seems kludgy. Is there a better way to do this?
Question 2
If you look at the KitItem class you can see that I am overriding the list price. If the flag calc_list is set to true then I sum the list price of the components and return that as the list price for the kit. If it's not marked as true I want to return the "base" list price. However, as far as I know there is no way to access a parent attribute, since in a normal setup it would be meaningless, but with SQLAlchemy and shared table inheritance it could be useful.
TIA
class Item(DeclarativeBase):
    __tablename__ = 'items'
    item_id = Column(Integer,primary_key=True,autoincrement=True)
    sku = Column(Unicode(50),nullable=False,unique=True)
    list_price = Column(Float)
    cost_price = Column(Float)
    item_type = Column(Unicode(1))
    __mapper_args__ = {'polymorphic_on': item_type}

    def __init__(self,sku,list_price,cost_price):
        self.sku = sku
        self.list_price = list_price
        self.cost_price = cost_price

    @classmethod
    def search(cls):
        """
        " search based on sku, description, long description
        " return item as proper class
        """
        item = DBSession.query(cls).filter(...) #do search stuff here
        if item.item_type == 'K': #Better way to do this???
            return DBSession.query(KitItem).get(item.item_id)

class KitItem(Item):
    __mapper_args__ = {'polymorphic_identity': 'K'}
    calc_list = Column(Boolean,nullable=False,default=False)

    @property
    def list_price(self):
        if self.calc_list:
            list_price = 0.0
            for comp in self.components:
                list_price += comp.component.list_price * comp.qty
            return list_price
        else:
            #need help here
            item = DBSession.query(Item).get(self.item_id)
            return item.list_price

class KitComponent(DeclarativeBase):
    __tablename__ = "kit_components"
    kit_id = Column(Integer,ForeignKey('items.item_id'),primary_key=True)
    component_id = Column(Integer,ForeignKey('items.item_id'),primary_key=True)
    qty = Column(Integer,nullable=False, default=1)
    kit = relation(KitItem,backref=backref("components"))
    component = relation(Item)
Answer-1: In fact, you do not need to do anything special here: given that you have configured your inheritance hierarchy properly, your query will already return the proper class for every row (Item or KitItem). This is the advantage of the ORM part. What you could do, though, is configure the query to immediately load the additional columns that belong to the children of Item as well (in your code this is only the calc_list column), which you can do by specifying with_polymorphic('*'):
@classmethod
def search(cls):
    item = DBSession.query(cls).with_polymorphic('*').filter(...) #do search stuff here
    return item
Read more on this in Basic Control of Which Tables are Queried.
To see the difference, enable SQL logging and compare your test scripts with and without with_polymorphic(...): you will most probably see fewer SQL statements being executed.
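The claim in Answer-1 that a query on the base class already yields the proper class per row can be checked with a small single-table inheritance sketch. The models are trimmed versions of the ones in the question; the base identity `'I'`, the nullable `calc_list` column, and the in-memory SQLite engine are assumptions added so the example runs on its own (SQLAlchemy 1.4 or later assumed).

```python
from sqlalchemy import Boolean, Column, Float, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):
    __tablename__ = "items"
    item_id = Column(Integer, primary_key=True)
    sku = Column(Unicode(50), nullable=False, unique=True)
    list_price = Column(Float)
    item_type = Column(Unicode(1))
    # 'I' as the base identity is an assumption; the question only defines 'K'.
    __mapper_args__ = {"polymorphic_on": item_type, "polymorphic_identity": "I"}


class KitItem(Item):
    __mapper_args__ = {"polymorphic_identity": "K"}
    # Left nullable here because plain Item rows share the same table.
    calc_list = Column(Boolean, default=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Item(item_id=1, sku="PLAIN-1", list_price=2.5),
    KitItem(item_id=2, sku="KIT-1", list_price=10.0),
])
session.commit()
session.expunge_all()  # force fresh objects to be built from the rows

# Querying the base class: the discriminator column picks the class per row.
found = session.query(Item).order_by(Item.item_id).all()
class_names = [type(obj).__name__ for obj in found]
```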
Answer-2: I would not override a mapped attribute with one that is purely computed. Instead I would just create another computed attribute (let's call it total_price), which would look like the following for each of the two classes:
class Item(Base):
    ...
    @property
    def total_price(self):
        return self.list_price

class KitItem(Item):
    ...
    @property
    def total_price(self):
        if self.calc_list:
            _price = 0.0
            for comp in self.components:
                _price += comp.component.list_price * comp.qty
            return _price
        else:
            # @note: again, you do not need to perform any query here at all, as *self* is that you need
            return self.list_price
Also in this case, you might think of configuring the relationship KitItem.components to be eagerly loaded, so that the calculation of the total_price will not trigger additional SQL. But you have to decide yourself if this is beneficial for your use cases (again, analyse the SQLs generated in your scenario).
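To make the eager-loading trade-off measurable, the sketch below counts the SQL statements issued while computing `total_price` with and without an eager-loading option. It uses a deliberately simplified `Kit`/`Component` pair instead of the question's polymorphic models, `selectinload` as one of several possible eager strategies, and an engine event listener as the counter; all of these are illustrative assumptions (SQLAlchemy 1.4 or later).

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Kit(Base):
    __tablename__ = "kits"
    id = Column(Integer, primary_key=True)
    components = relationship("Component")

    @property
    def total_price(self):
        return sum(c.price * c.qty for c in self.components)


class Component(Base):
    __tablename__ = "components"
    id = Column(Integer, primary_key=True)
    kit_id = Column(Integer, ForeignKey("kits.id"))
    price = Column(Float)
    qty = Column(Integer, default=1)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
statements = []


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    # Remember every statement sent to the database.
    statements.append(statement)


with Session(engine) as session:
    session.add_all([
        Kit(id=1, components=[Component(price=2.0, qty=3)]),
        Kit(id=2, components=[Component(price=1.0, qty=1),
                              Component(price=4.0, qty=2)]),
    ])
    session.commit()


def totals_and_statement_count(*options):
    """Load all kits, compute their totals, and count the SQL statements used."""
    with Session(engine) as session:
        del statements[:]
        query = session.query(Kit)
        if options:
            query = query.options(*options)
        totals = sorted(kit.total_price for kit in query.all())
        return totals, len(statements)


lazy_totals, lazy_count = totals_and_statement_count()
eager_totals, eager_count = totals_and_statement_count(selectinload(Kit.components))
```

Both runs produce the same totals; only the number of round trips differs, which is the figure worth comparing for your own workload.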
I have two legacy tables whose data I would like to access using SQLAlchemy declarative.
Order:
    order_id
    is_processed
FooData:
    foo_id
    order_id
An order may or may not have FooData, and I would like to distinguish between the two order types using SQLAlchemy declarative models.
The problems I am having trouble wrapping my head around are:
How do I set up such a relationship? Ideally I'd have two classes Order and FooOrder where Order has no FooData and FooOrder has FooData.
I have to query both types (Order and FooOrder) together based on is_processed and process them differently based on whether it is Order or FooOrder. How do I go about querying in this case?
If you can change the DB, then simply add one discriminator column, set
the value of this column to proper value (order|foodata) depending on whether
the foodata exists for it, make it NOT NULL and configure simple Joined Table Inheritance.
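A sketch of what the discriminator-column variant might look like with joined table inheritance. The column name `order_type`, the explicit column definitions (instead of autoload), and the in-memory SQLite database are assumptions for illustration; the legacy schema would need the extra column added first. SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Order(Base):
    __tablename__ = "order"
    order_id = Column(Integer, primary_key=True)
    is_processed = Column(Boolean, nullable=False, default=False)
    # Hypothetical discriminator column: 'order' or 'foodata'.
    order_type = Column(String(10), nullable=False)
    __mapper_args__ = {
        "polymorphic_on": order_type,
        "polymorphic_identity": "order",
    }

    def process_order(self):
        return "processing order"


class FooOrder(Order):
    __tablename__ = "foo_data"
    foo_id = Column(Integer, primary_key=True)
    # The foreign key to the parent table drives the inheritance join.
    order_id = Column(Integer, ForeignKey("order.order_id"))
    __mapper_args__ = {"polymorphic_identity": "foodata"}

    def process_order(self):
        return "processing foo_data"


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Order(), FooOrder()])
session.commit()
session.expunge_all()

# One query on the base class returns both kinds of order.
pending = session.query(Order).filter(Order.is_processed.is_(False)).all()
results = sorted(
    (type(order).__name__, order.process_order()) for order in pending
)
```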
If you cannot change the DB (add a discriminator column) and you only have the simple
2-table model as you show, then I would not use inheritance, but rather 1-1 relationship.
Model Definition:
class Order(Base):
    __tablename__ = 'order'
    __table_args__ = {'autoload': True}

class FooData(Base):
    __tablename__ = 'foo_data'
    __table_args__ = {'autoload': True}
    # @note: you need next line only if your DB does not have FK defined
    #__table_args__ = (ForeignKeyConstraint(['order_id'], ['order.order_id']), {'autoload': True})

# define 1-[0..1] relationship from Order to FooData with eager loading
Order.foodata = relationship(FooData, uselist=False, lazy="joined", backref="order")
Adding new objects:
ord = Order(); ord.is_processed = False
session.add(ord)
ord = Order(); ord.is_processed = False
foo = FooData(); foo.someinfo = "test foo created from SA"
ord.foodata = foo
session.add(ord)
session.commit()
Query: all based on is_processed:
qry = session.query(Order).filter(Order.is_processed == False)
for ord in qry:
    print(ord, ord.foodata)
A la Polymorphism:
You can even implement methods on your Order and FooData in a way that it
would seem they are in fact using inheritance:
class Order(Base):
    # ...
    def process_order(self):
        if self.foodata:
            self.foodata.process_order()
        else:
            print("processing order: ", self)

class FooData(Base):
    # ...
    def process_order(self):
        print("processing foo_data: ", self)