I have this class:
class Monitor(db.Model):
    '''
    Base Monitor class.
    '''
    __tablename__ = 'monitor'
    id = db.Column(db.Integer(), primary_key=True)
    last_checked = db.Column(db.DateTime(timezone=False))
    poll_interval = db.Column(db.Interval(),
                              default=datetime.timedelta(seconds=300))
And I have this query where I attempt to return only objects that haven't been checked since (now - interval):
monitors = db.session.query(Monitor).\
filter(or_(Monitor.last_checked < (datetime.utcnow() - Monitor.poll_interval)),
Monitor.last_checked == None).\
all()
But the query returns nothing. I'm having a hard time figuring out the proper way to do this. Am I on the right track or am I missing something? I'm using MySQL as the database.
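Your parentheses are misplaced: or_() is closed after its first argument, so the IS NULL test is passed to filter() as a separate criterion. I believe what you want is:
monitors = db.session.query(Monitor).\
    filter(or_(Monitor.last_checked < (datetime.utcnow() - Monitor.poll_interval),
               Monitor.last_checked == None)).\
    all()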
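To see why the original call returned nothing: filter() joins its positional criteria with AND, so it asked for rows whose last_checked is both older than the cutoff and NULL, which no row can satisfy. Below is a minimal sketch of the two conditions in plain SQL, using the standard-library sqlite3 module. The sample rows and the fixed cutoff string are made up for illustration, and this is not the MySQL statement SQLAlchemy would emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monitor (id INTEGER PRIMARY KEY, last_checked TEXT)")
conn.executemany(
    "INSERT INTO monitor (id, last_checked) VALUES (?, ?)",
    [
        (1, "2024-01-01 00:00:00"),  # stale: checked before the cutoff
        (2, "2024-01-01 00:09:00"),  # fresh: checked after the cutoff
        (3, None),                   # never checked
    ],
)
cutoff = "2024-01-01 00:05:00"  # stands in for "now - poll_interval"

# Misplaced parenthesis: the two conditions end up joined by AND.
and_rows = conn.execute(
    "SELECT id FROM monitor WHERE last_checked < ? AND last_checked IS NULL",
    (cutoff,),
).fetchall()

# Corrected call: both conditions sit inside or_().
or_rows = conn.execute(
    "SELECT id FROM monitor "
    "WHERE last_checked < ? OR last_checked IS NULL ORDER BY id",
    (cutoff,),
).fetchall()

print(and_rows)
print(or_rows)
```

The AND form selects no rows, while the OR form selects the stale monitor and the never-checked one.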
This seems like a real beginner question, but I'm having trouble finding a simple answer. I have simplified this down to just the bare bones with a simple data model representing a one-to-many relationship:
class Room(db.Model):
    __tablename__ = 'rooms'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), unique=True)
    capacity = db.Column(db.Integer)
    events = db.relationship('Event', backref='room')

class Event(db.Model):
    __tablename__ = 'counts'
    id = db.Column(db.Integer, primary_key=True)
    unusedCapacity = db.Column(db.Integer)
    attendance = db.Column(db.Integer)
    room_id = db.Column(db.Integer, db.ForeignKey('rooms.id'))
Event.unusedCapacity is calculated as Room.capacity - Event.attendance, but I need to store the value in the column — Room.capacity may change over time, but the Event.unusedCapacity needs to reflect the actual unused capacity at the time of the Event.
I am currently querying the Room and then creating the event:
room = Room.query.get(room_id)  # using Flask-SQLAlchemy
event = Event(unusedCapacity=room.capacity - attendance, ...etc)
My question is: is there a more efficient way to do this in one step?
As noted in the comments by @SuperShoot, a query on insert can calculate the unused capacity in the database without having to fetch first. An explicit constructor, such as the one shown by @tooTired, could pass a scalar subquery as unusedCapacity:
class Event(db.Model):
    ...
    def __init__(self, **kwgs):
        if 'unusedCapacity' not in kwgs:
            kwgs['unusedCapacity'] = \
                db.select([Room.capacity - kwgs['attendance']]).\
                where(Room.id == kwgs['room_id']).\
                as_scalar()
        super().__init__(**kwgs)
Though it is possible to use client-invoked SQL expressions as defaults, I'm not sure how one could refer to the values to be inserted in the expression without using a context-sensitive default function, but that did not quite work out: the scalar subquery was not inlined and SQLAlchemy tried to pass it using placeholders instead.
A downside of the __init__ approach is that you cannot perform bulk inserts that would handle unused capacity using the table created for the model as is, but will have to perform a manual query that does the same.
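One form such a manual query can take is an INSERT ... SELECT, which lets the database compute the unused capacity for each inserted row. The following is a rough sketch in plain SQL through the standard-library sqlite3 module, not SQLAlchemy code. Table and column names follow the models in the question; the sample rooms and the new_events list are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, name TEXT UNIQUE, capacity INTEGER);
    CREATE TABLE counts (
        id INTEGER PRIMARY KEY,
        unusedCapacity INTEGER,
        attendance INTEGER,
        room_id INTEGER REFERENCES rooms (id)
    );
    INSERT INTO rooms (id, name, capacity) VALUES (1, 'A', 100), (2, 'B', 40);
""")

# Hypothetical batch of (attendance, room_id) pairs to insert.
new_events = [(30, 1), (25, 2), (100, 1)]

# Each parameter tuple is (attendance, attendance, room_id): the attendance
# is stored as-is and also subtracted from the room's current capacity.
conn.executemany(
    """
    INSERT INTO counts (attendance, room_id, unusedCapacity)
    SELECT ?, rooms.id, rooms.capacity - ?
    FROM rooms
    WHERE rooms.id = ?
    """,
    [(att, att, room_id) for att, room_id in new_events],
)

rows = conn.execute(
    "SELECT attendance, room_id, unusedCapacity FROM counts ORDER BY id"
).fetchall()
print(rows)
```

No room is fetched into Python first; the subtraction happens inside each INSERT statement.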
Another thing to look out for is that until a flush takes place, the unusedCapacity attribute of a new Event object holds the SQL expression object, not the actual value. The solution by @tooTired is more transparent in this regard, since a new Event object will hold the numeric value of unused capacity from the get-go.
SQLAlchemy adds an implicit constructor to all model classes which accepts keyword arguments for all its columns and relationships. You can override this and pass the kwargs without unusedCapacity and get the room capacity in the constructor:
class Event(db.Model):
    # ...
    # kwargs arrive without unusedCapacity
    def __init__(self, **kwargs):
        room = Room.query.get(kwargs.get('room_id'))
        super(Event, self).__init__(
            unusedCapacity=room.capacity - kwargs.get('attendance'), **kwargs)

# Create a new event normally
event = Event(id=1, attendance=1, room_id=1)
Note: This is a simplified example of what I'm actually trying to do here.
I have the following Parent-Child relationship both driven off a declarative_base.
class Parent(declarative_base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    _children = relationship("Child", lazy='dynamic')

    def total_for_date(self, date):
        return sum([child.num for child in self._children.filter(Child.date == date)])

    @classmethod
    def total_for_date_query(cls, date):
        # TODO: return a query that represents this...
        pass

class Child(declarative_base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    num = Column(Integer)
    date = Column(Date)
    parent_id = Column(Integer, ForeignKey('parents.id'))
    _parent = relationship("Parent")
I'd like to calculate a total of a certain number associated with a child given a parent query. This can be performed via python as such
q = session.query(Parent).filter(Parent.id.in_([4,5,10,...]))
total = sum([parent.total_for_date(datetime.date(2018, 1, 2)) for parent in q.all()])
However, the computation here is done in python and given a large amount of data, won't perform as well compared to SQL.
I'm trying to figure out a way using hybrid expressions, selects, sqlalchemy queries etc. to have an equivalent method on the parent that returns a query/selectable/expression that will allow me to perform the computation on the SQL side, but maintain a similar interface compared to the other method.
In this example, I would like to do the following instead.
q = session.query(Parent).filter(Parent.id.in_([4,5,10]))
total = q.select_entity_from(Parent.total_for_date_query(datetime.date(2018, 1, 2))).scalar()
#Note idk if "select_entity_from" is what I want here
But I don't know how to fill out the SQL-side method equivalent total_for_date_query. I just can't seem to wrap my head around when to use a Query vs. Selectable, hybrid property expressions vs. hybrid method expressions etc.
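For orientation, the aggregate that an SQL-side total_for_date_query would ultimately have to express is a single SUM over the matching children. The sketch below shows that statement in plain SQL through the standard-library sqlite3 module; it is not the SQLAlchemy method the question asks for. Table and column names follow the models above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (id INTEGER PRIMARY KEY);
    CREATE TABLE children (
        id INTEGER PRIMARY KEY,
        num INTEGER,
        date TEXT,
        parent_id INTEGER REFERENCES parents (id)
    );
    INSERT INTO parents (id) VALUES (4), (5), (10), (11);
    INSERT INTO children (num, date, parent_id) VALUES
        (3, '2018-01-02', 4),
        (7, '2018-01-02', 5),
        (9, '2018-01-03', 5),   -- other date, excluded
        (2, '2018-01-02', 11);  -- parent not selected, excluded
""")

parent_ids = [4, 5, 10]
placeholders = ", ".join("?" for _ in parent_ids)

# COALESCE turns the NULL that SUM returns for "no rows" into 0.
total = conn.execute(
    f"""
    SELECT COALESCE(SUM(children.num), 0)
    FROM children
    WHERE children.parent_id IN ({placeholders})
      AND children.date = ?
    """,
    (*parent_ids, "2018-01-02"),
).fetchone()[0]

print(total)
```

The database returns one number, so no Parent or Child objects are loaded into Python.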
I made this statement using flask-sqlalchemy and I've chosen to keep it in its original form. Post.query is equivalent to session.query(Post)
I attempted to make a subquery that would filter out all posts in a database which are in the draft state and not made or modified by the current user. I made this query,
Post.query\
    .filter(sqlalchemy.and_(
        Post.post_status != Consts.PostStatuses["Draft"],
        sqlalchemy.or_(
            Post.modified_by_id == current_user.get_id(),
            Post.created_by_id == current_user.get_id())))
which created:
Where true AND ("Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Expected outcome:
Where "Post".post_status != "Draft" AND (
"Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Why is this happening? How can I increase the error level in SQLAlchemy? I think my project is silently failing and I would like to confirm my guess.
Update:
I used the wrong constants dictionary. One dictionary contains ints, the other contains strings (one for database queries, one for printing).
_post_status = db.Column(
db.SmallInteger,
default=Consts.post_status["Draft"])
post_status contains integers, while Consts.PostStatuses contains strings. In hindsight, that was a really bad idea. I'm going to make a single dictionary that returns a tuple instead of two dictionaries.
@property
def post_status(self):
    return Consts.post_status.get(getattr(self, "_post_status", None))
The problem is that your post_status property isn't acceptable for use in an ORM-level query, as it is a Python descriptor which at the class level by default returns itself:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @property
    def post_status(self):
        return self._post_status

print (A.post_status)
print (A.post_status != 5678)
output:
$ python test.py
<property object at 0x10165bd08>
True
The type of usage you're looking for seems like that of a hybrid attribute, which is a SQLAlchemy-included extension to a "regular" Python descriptor that produces class-level behavior compatible with core SQL expressions:
from sqlalchemy.ext.hybrid import hybrid_property

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @hybrid_property
    def post_status(self):
        return self._post_status

print (A.post_status)
print (A.post_status != 5678)
output:
$ python test.py
A._post_status
a._post_status != :_post_status_1
Be sure to read the hybrid documentation carefully, including how to establish the correct SQL expression behavior; writing descriptors that work at both the instance and class level is a somewhat advanced Python technique.
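The instance-versus-class behavior can be illustrated without SQLAlchemy at all. The sketch below is a toy, not SQLAlchemy's implementation: Expr and toy_hybrid are hypothetical names invented here, and Expr merely records text where a real column would build a SQL expression.

```python
class Expr:
    """Stand-in for a SQL expression; it only records the text it represents."""

    def __init__(self, text):
        self.text = text

    def __ne__(self, other):
        # Build a new expression instead of returning a bool.
        return Expr(f"{self.text} != {other!r}")

    def __repr__(self):
        return self.text


class toy_hybrid:
    """Descriptor whose result depends on instance vs. class access."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner):
        if instance is None:
            # Class-level access: call the getter with the class itself.
            return self.fget(owner)
        # Instance-level access: behave like a normal property.
        return self.fget(instance)


class A:
    # Class attribute plays the role of a mapped column.
    _post_status = Expr("a._post_status")

    def __init__(self, status):
        self._post_status = status  # instance attribute shadows the class one

    @toy_hybrid
    def post_status(self):
        return self._post_status


print(A("draft").post_status)
print(A.post_status != 5678)
```

A plain property returns the property object itself when accessed on the class, which is why the comparison in the first example collapsed to True; here the class-level access reaches the column stand-in instead, so the comparison yields an expression.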
Or how do I make this thing work?
I have an Interval object:
class Interval(Base):
    __tablename__ = 'intervals'
    id = Column(Integer, primary_key=True)
    start = Column(DateTime)
    end = Column(DateTime, nullable=True)
    task_id = Column(Integer, ForeignKey('tasks.id'))

    @hybrid_property  # used to just be @property
    def hours_spent(self):
        end = self.end or datetime.datetime.now()
        return (end - self.start).total_seconds() / 60 / 60
And a Task:
class Task(Base):
    __tablename__ = 'tasks'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    intervals = relationship("Interval", backref="task")

    @hybrid_property  # also used to be just @property
    def hours_spent(self):
        return sum(i.hours_spent for i in self.intervals)
Add all the typical setup code, of course.
Now when I try to do session.query(Task).filter(Task.hours_spent > 3).all()
I get NotImplementedError: <built-in function getitem> from the sum(i.hours_spent... line.
So I was looking at this part of the documentation and theorized that there might be some way that I can write something that will do what I want. This part also looks like it may be of use, and I'll be looking at it while waiting for an answer here ;)
For a simple example of SQLAlchemy's coalesce function, this may help: Handling null values in a SQLAlchemy query - equivalent of isnull, nullif or coalesce.
Here are a couple of key lines of code from that post:
from sqlalchemy.sql.functions import coalesce
my_config = session.query(Config).order_by(coalesce(Config.last_processed_at, datetime.date.min)).first()
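What COALESCE does in that ORDER BY can be seen in plain SQL. This is a small sketch using the standard-library sqlite3 module; the config table and its rows are invented for illustration, and the fallback value is the ISO form of datetime.date.min.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (id INTEGER PRIMARY KEY, last_processed_at TEXT)")
conn.executemany(
    "INSERT INTO config (id, last_processed_at) VALUES (?, ?)",
    [(1, "2020-05-01"), (2, None), (3, "2019-01-15")],
)

fallback = datetime.date.min.isoformat()  # '0001-01-01'

# COALESCE returns its first non-NULL argument, so the never-processed row
# is treated as if it had been processed at the earliest possible date.
rows = conn.execute(
    "SELECT id, COALESCE(last_processed_at, ?) AS effective "
    "FROM config ORDER BY effective",
    (fallback,),
).fetchall()

print(rows)
```

The row with a NULL timestamp sorts first because the fallback date precedes every real one.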
SQLAlchemy cannot build a SQL expression tree from these operands on its own; you have to provide one explicitly with the @propname.expression decorator. That raises another problem: there is no portable way to convert an interval to hours in the database. You'd use TIMEDIFF in MySQL, EXTRACT(EPOCH FROM ... ) / 3600 in PostgreSQL, and so on. I suggest changing the properties to return a timedelta instead, and comparing apples to apples.
from datetime import datetime, timedelta
from sqlalchemy import select, func
from sqlalchemy.ext.hybrid import hybrid_property

class Interval(Base):
    ...
    @hybrid_property
    def time_spent(self):
        # Python side: an open interval counts up to the current time.
        return (self.end or datetime.now()) - self.start

    @time_spent.expression
    def time_spent(cls):
        # SQL side: the same logic expressed with COALESCE.
        return func.coalesce(cls.end, func.current_timestamp()) - cls.start

class Task(Base):
    ...
    @hybrid_property
    def time_spent(self):
        return sum((i.time_spent for i in self.intervals), timedelta(0))

    @time_spent.expression
    def time_spent(cls):
        # Must reuse the name time_spent so the expression attaches to it.
        return (select([func.sum(Interval.time_spent)])
                .where(cls.id == Interval.task_id)
                .label('time_spent'))
The final query is:
session.query(Task).filter(Task.time_spent > timedelta(hours=3)).all()
which translates to (on PostgreSQL backend):
SELECT task.id AS task_id, task.title AS task_title
FROM task
WHERE (SELECT sum(coalesce(interval."end", CURRENT_TIMESTAMP) - interval.start) AS sum_1
FROM interval
WHERE task.id = interval.task_id) > %(param_1)s
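The instance-level half of the hybrid relies on sum() being given a timedelta start value. Here is a minimal standard-library sketch of that Python-side computation; the interval data and the fixed "now" are hypothetical and stand in for mapped Interval objects.

```python
from datetime import datetime, timedelta

# Hypothetical (start, end) pairs for one task; None marks an open interval.
now = datetime(2024, 1, 1, 12, 0)
intervals = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 10, 0)),  # 2 h
    (datetime(2024, 1, 1, 10, 30), None),                       # 1.5 h so far
]


def time_spent(start, end):
    # Mirrors "(self.end or datetime.now()) - self.start" with a fixed "now".
    return (end or now) - start


# sum() starts from the int 0 by default, and 0 + timedelta raises TypeError,
# so a timedelta start value is required.
total = sum((time_spent(s, e) for s, e in intervals), timedelta(0))

print(total)
print(total > timedelta(hours=3))
```

Comparing a timedelta against timedelta(hours=3) keeps both sides in the same unit, which is the "apples to apples" comparison the answer recommends.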
I needed to use the text function and could not use 0 as an integer.
import sqlalchemy as sa
session.query(sa.func.coalesce(table1.col1, sa.text("0"))).all()
There is a complete example of making a func construct similar to coalesce or nvl.
Note how it takes in arguments and renders an expression, in this case NVL(a, b) when used with Oracle.
http://docs.sqlalchemy.org/en/latest/core/compiler.html#subclassing-guidelines
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement

class coalesce(FunctionElement):
    name = 'coalesce'

@compiles(coalesce)
def compile(element, compiler, **kw):
    return "coalesce(%s)" % compiler.process(element.clauses)

@compiles(coalesce, 'oracle')
def compile(element, compiler, **kw):
    if len(element.clauses) > 2:
        raise TypeError("coalesce only supports two arguments on Oracle")
    return "nvl(%s)" % compiler.process(element.clauses)
Then when you want to use it...
from my_oracle_functions_sqla import coalesce
select([coalesce(A.value, '---')]) # etc
Hope that helps.
This is the first time I've used ORM, so I'm not sure the best way to handle this. I have a one-to-many relationship where each Parent can have many Children:
class Parent(Base):
    __tablename__ = 'Parent'
    name = Column(String(50))
    gid = Column(String(16), primary_key = True)
    lastUpdate = Column(DateTime)

    def __init__(self,name, gid):
        self.name = name
        self.gid = gid
        self.lastUpdate = datetime.datetime.now()

class Child(Base):
    __tablename__ = 'Child'
    id = Column(Integer, primary_key = True)
    loc = Column(String(50))
    status = Column(String(50))
    parent_gid = Column(String(16), ForeignKey('Parent.gid'))
    parent = relationship("Parent", backref=backref('children'))
Now, updates are coming in over the network. When an update comes in, I want to UPDATE the appropriate Parent row (updating lastUpdate column) and INSERT new children rows into the database. I don't know how to do that with ORM. Here is my failed attempt:
engine = create_engine('sqlite+pysqlite:///file.db',
                       module=dbapi2)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def addChildren(parent):
    p = session.query(Parent).filter(Parent.gid == p1.gid).all()
    if len(p) == 0:
        session.add(p1)
        session.commit()
    else:
        updateChildren = parent.children[:]
        parent.chlidren = []
        for c in updateChildren:
            c.parent_gid = parent.gid
        session.add_all(updateChildren)
        session.commit()

if __name__ == '__main__':
    #first update from the 'network'
    p1 = Parent(name='team1', gid='t1')
    p1.children = [Child(loc='x', status='a'), Child(loc='y', status='b')]
    addChildren(p1)
    import time
    time.sleep(1)
    #here comes another network update
    p1 = Parent(name='team1', gid='t1')
    p1.children = [Child(loc='z', status='a'), Child(loc='k', status='b')]
    #this fails
    addChildren(p1)
I initially tried to do a merge, but that caused the old children to be disassociated with the parent (the foreign IDs were set to null). What is the best way to approach this with ORM? Thanks
EDIT
I guess it doesn't really make sense to create entirely new objects when updates come in over the network. I should just query the session for the appropriate parent, then create new children if necessary and merge? E.g.
def addChildren(pname, pid, cloc, cstat):
    p = session.query(Parent).filter(Parent.gid == pid).all()
    if len(p) == 0:
        p = Parent(pname, pid)
        p.children = [Child(loc=cloc, status=cstat)]
        session.add(p)
        session.commit()
    else:
        p = p[0]
        p.children.append(Child(loc=cloc, status=cstat))
        session.merge(p)
        session.commit()
You are right - you should not create the same parent twice. In terms of adding children, ... well, you really need only to add them and you do not care about the existing ones... So your edited code should do the job just fine. You can make it shorter and more readable though:
def addChildren(pname, pid, cloc, cstat):
    p = session.query(Parent).get(pid)  # will give you either Parent or None
    if not p:
        p = Parent(pname, pid)
        session.add(p)
    p.children.append(Child(loc=cloc, status=cstat))
    session.commit()
The disadvantage of this approach is that, for an existing Parent, the whole collection of Children is loaded into memory before the new Child is added and saved to the database. If that is a concern (many children per parent, and growing), lazy='noload' might be useful:
parent = relationship("Parent", backref=backref('children', lazy='noload'))
This might dramatically improve the speed of inserts, but in this case accessing p.children will never load the existing objects from the database. In such scenarios it is enough to define another relationship. In these situations I prefer to use Building Query-Enabled Properties, so you end up with one property used only for adding objects and another used only for querying persisted results, which are often used by different parts of the system.