SQLAlchemy - mask values in objects on the fly - python

I have the following SQLAlchemy class defined:
import sqlalchemy
import sqlalchemy.ext.declarative
Base = sqlalchemy.ext.declarative.declarative_base()
class NSASecrets(Base):
    __tablename__ = 'nsasecrets'
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    text = sqlalchemy.Column(sqlalchemy.String)
    author = sqlalchemy.Column(sqlalchemy.String)
Now I want to be able to mask the "author" field depending on some logic, something like:
# pseudocode: "mask" is the argument I wish session.query() accepted
if allowed:
    nsasecrets = session.query(NSASecrets, mask=False)
else:
    nsasecrets = session.query(NSASecrets, mask=True)
for nsasecret in nsasecrets:
    print('{0} {1}'.format(nsasecret.author, nsasecret.text))
Depending on this "mask" parameter, I would like the output to be "John Smith" when mask is False (not masked) or "J*** **h" when mask is True. Obviously I could do it in the print itself, but prints are scattered around the code, and the only way I see to do this in a controlled, centralized manner is to create SQLAlchemy objects with already masked values. Is there a well-known solution to this? Should I create my own session manager that overloads the "query" interface, or am I missing some other possible solution?
Thanks

This is typically the kind of thing we do in Python with something called descriptors. A simple way to combine descriptors with SQLAlchemy mapped columns is to use the synonym, though synonym is a bit dated at this point, in favor of a less "magic" system called hybrids. Either can be used here; below is an example of a hybrid:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
Base = declarative_base()
class NSASecrets(Base):
    __tablename__ = 'nsasecrets'
    id = Column(Integer, primary_key=True)
    # the real columns live under private names; the hybrids below front them
    _text = Column("text", String)
    _author = Column("author", String)
    def _obfuscate(self, value):
        # keep the first character and pad with asterisks
        return "%s%s" % (value[0], ("*" * (len(value) - 2)))
    @hybrid_property
    def text(self):
        return self._obfuscate(self._text)
    @text.setter
    def text(self, value):
        self._text = value
    @text.expression
    def text(cls):
        # at the class level (i.e. in queries) use the plain column
        return cls._text
    @hybrid_property
    def author(self):
        return self._obfuscate(self._author)
    @author.setter
    def author(self, value):
        self._author = value
    @author.expression
    def author(cls):
        return cls._author
n1 = NSASecrets(text='some text', author="some author")
print(n1.text)
print(n1.author)
Note that this doesn't have much to do with querying. Formatting the data as it arrives in a rowset is a different way to go, and there are some ways to make that happen too, though if you're only concerned about print statements that refer to "text" and "author", it's likely more convenient to keep that as a Python access pattern.
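For the masking format the question asks for (first and last character kept, spaces preserved), a minimal sketch in plain Python might look like the following. The names mask_name, MaskingPolicy and Secret are hypothetical and not part of SQLAlchemy; the property stands in for the hybrid getter, and the class-level flag is just one assumed way to centralize the allowed/not-allowed decision.

```python
def mask_name(value, masked=True):
    """Keep the first and last character, star out everything else except spaces."""
    if not masked or value is None or len(value) <= 2:
        return value
    middle = "".join(ch if ch.isspace() else "*" for ch in value[1:-1])
    return value[0] + middle + value[-1]


class MaskingPolicy(object):
    # Hypothetical central switch: flip it once instead of touching every print.
    enabled = True


class Secret(object):
    # Plain-Python stand-in for a mapped class with a private _author column.
    def __init__(self, author):
        self._author = author

    @property
    def author(self):
        return mask_name(self._author, MaskingPolicy.enabled)


secret = Secret("John Smith")
masked_author = secret.author   # masked while the policy is enabled
MaskingPolicy.enabled = False
plain_author = secret.author    # raw value once masking is switched off
```

In a real mapping the body of the author getter would be the only place that needs to consult the policy, so the scattered print statements stay untouched.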

Related

How do I use Mixins with SQLAlchemy to simplify querying and filtering operation?

Assume the following setup:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class MyClass(Base):
    __tablename__ = 'MyClass'  # table name as it appears in the generated SQL further down
    id = Column(Integer, primary_key=True)
    name = Column(String)
The normal paradigm to query the DB with SQLAlchemy is to do the following:
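from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
engine = create_engine('some_db_location_string')  # placeholder for a real database URL
Session = sessionmaker()
session = Session(bind=engine)
session.query(MyClass).filter(MyClass.id == 1).first()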
Suppose, I want to simplify the query to the following:
MyClass(s).filter(MyClass.id == 1).first()
OR
MyClass(s).filter(id == 1).first()
How would I do that? My first attempt at that to use a model Mixin class failed. This is what I tried:
class ModelMixins(object):
    def __init__(self, session):
        self.session = session
    def filter(self, *args):
        self.session.query(self).filter(*args)
# Redefine MyClass to use the above class
class MyClass(ModelMixins, Base):
    __tablename__ = 'MyClass'
    id = Column(Integer, primary_key=True)
    name = Column(String)
The main failure seems to be that I can't quite transfer the expression 'MyClass.id == 1' to the actual filter function that is part of the session object.
Folks may ask why would I want to do:
MyClass(s).filter(id == 1).first()
I have seen something similar used before and thought the syntax would be so much cleaner if I could achieve this. I wanted to replicate it but have not been able to. Being able to do something like this:
def get_stuff(some_id):
    with session_scope() as s:
        rec = MyClass(s).filter(MyClass.id == some_id).first()
        if rec:
            return rec.name
        else:
            return None
...seems to be the cleanest way of doing things. For one, session management is kept separate. Secondly, the query itself is simplified. Having a Mixin class like this would allow me to add the filter functionality to any number of classes. Can someone help in this regard?
session.query takes a class; you're giving it self, which is an instance. Replace your filter method with:
def filter(self, *args):
    # query the mapped class, not the instance, using the session stored in __init__
    return self.session.query(self.__class__).filter(*args)
and at least this much works:
In [45]: MyClass(session).filter(MyClass.id==1)
Out[45]: <sqlalchemy.orm.query.Query at 0x10e0bbe80>
The generated SQL looks right, too (newlines added for clarity):
In [57]: str(MyClass(session).filter(MyClass.id==1))
Out[57]: 'SELECT "MyClass".id AS "MyClass_id", "MyClass".name AS "MyClass_name"
FROM "MyClass"
WHERE "MyClass".id = ?'
No guarantees there won't be oddities; I've never tried anything like this before.
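Since session.query wants the class anyway, one possible variation is to make the helper a classmethod so that no throwaway instance is created at all. This is only a sketch under assumptions: QueryMixin, the "my_class" table name and the in-memory SQLite URL are illustrative choices, not something from the question.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class QueryMixin(object):
    # Hypothetical mixin: a classmethod receives the mapped class itself,
    # so no custom __init__ and no dummy instance are needed.
    @classmethod
    def filter(cls, session, *criteria):
        return session.query(cls).filter(*criteria)


class MyClass(QueryMixin, Base):
    __tablename__ = "my_class"  # assumed table name
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(MyClass(id=1, name="first"))
session.commit()

rec = MyClass.filter(session, MyClass.id == 1).first()
missing = MyClass.filter(session, MyClass.id == 2).first()
```

The call site becomes MyClass.filter(s, MyClass.id == some_id).first(), which keeps session management separate while leaving the default declarative constructor alone.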
I've been using this mixin with good success. It is most likely not the most efficient thing in the world, and I am no expert. I define a date_created column for every table.
from sqlalchemy.orm import Session
class QueryBuilder:
    """
    This class describes a query builder.
    """
    q_debug = False
    def query_from_dict(self, db_session: Session, **q_params):
        """
        Creates a query.
        :param db_session: The database session
        :type db_session: Session
        :param q_params: The query parameters
        :type q_params: dictionary
        """
        q_base = db_session.query(type(self))
        for param, value in q_params.items():
            if param == 'start_date':
                q_base = q_base.filter(
                    type(self).__dict__.get('date_created') >= value
                )
            elif param == 'end_date':
                q_base = q_base.filter(
                    type(self).__dict__.get('date_created') <= value
                )
            elif 'like' in param:
                # "name_like" becomes a case-insensitive substring match on "name"
                param = param.replace('_like', '')
                member = type(self).__dict__.get(param)
                if member is not None:
                    q_base = q_base.filter(member.ilike(f'%{value}%'))
            else:
                q_base = q_base.filter(
                    type(self).__dict__.get(param) == value
                )
        if self.q_debug:
            print(q_base)
        return q_base

Print an sqlalchemy row

All I'd like to do is print a single row of an sqlalchemy table row.
Say I have:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class ATable(Base):
    __tablename__ = 'atable'
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
Then I'd like to output anything that looks like this:
id: 1
name: theRowName
Preferable without having to hard code in the table columns, i.e. more generally.
I've tried:
atable = ATable()
...  # add some values etc.
print(atable)
print(str(atable))
print(repr(atable))
print(atable.__table__.c)
I have also thought about implementing __str__ and __repr__, but they again lack the generality I am asking for.
There are many questions on converting a table row into JSON, but that's not really what I want. I care more about the visual output; it doesn't need to be machine readable afterwards.
To be clear - you want a general method to print "col: value" without hardcoding the column names? I do not use SQLAlchemy much, but a __str__ method like this should work:
def __str__(self):
    output = ''
    for c in self.__table__.columns:
        output += '{}: {}\n'.format(c.name, getattr(self, c.name))
    return output
You can then put that method in a mixin class to use elsewhere in your models.
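As a rough sketch of that mixin idea (ReprMixin is a hypothetical name, and the code assumes every attribute is named after its column, which getattr(self, col.name) silently relies on):

```python
from sqlalchemy import Column, Integer, String

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class ReprMixin(object):
    # Hypothetical mixin; it only needs the __table__ that declarative adds.
    def __str__(self):
        return "\n".join(
            "{}: {}".format(col.name, getattr(self, col.name))
            for col in self.__table__.columns
        )


class ATable(ReprMixin, Base):
    __tablename__ = "atable"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)


row = ATable(id=1, name="theRowName")
text = str(row)
print(text)
```

No database connection is needed for this; the column list comes from the table metadata, in the order the columns were declared.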

Sqlalchemy Filter condition always returns True in subquery

I made this statement using flask-sqlalchemy and I've chosen to keep it in its original form. Post.query is equivalent to session.query(Post)
I attempted to make a subquery that would filter out all posts in a database which are in the draft state and not made or modified by the current user. I made this query,
Post.query\
    .filter(sqlalchemy.and_(
        Post.post_status != Consts.PostStatuses["Draft"],
        sqlalchemy.or_(
            Post.modified_by_id == current_user.get_id(),
            Post.created_by_id == current_user.get_id())))
which created:
Where true AND ("Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Expected outcome:
Where "Post".post_status != "Draft" AND (
"Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
I'm wondering why this is happening. How can I increase the error level in SQLAlchemy? I think my project is silently failing and I would like to confirm my guess.
Update:
I used the wrong constants dictionary. One dictionary contains ints, the other contains strings (one for database queries, one for printing).
_post_status = db.Column(
    db.SmallInteger,
    default=Consts.post_status["Draft"])
post_status contains integers, Consts.PostStatuses contains strings. In hindsight, a really bad idea. I'm going to make a single dictionary that returns a tuple instead of two dictionaries.
@property
def post_status(self):
    return Consts.post_status.get(getattr(self, "_post_status", None))
The problem is that your post_status property isn't acceptable for usage in an ORM-level query, as this is a Python descriptor which at the class level by default returns itself:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)
    @property
    def post_status(self):
        return self._post_status
print(A.post_status)
print(A.post_status != 5678)
output:
$ python test.py
<property object at 0x10165bd08>
True
The type of usage you're looking for seems like that of a hybrid attribute, which is a SQLAlchemy-included extension to a "regular" Python descriptor that produces class-level behavior compatible with core SQL expressions:
from sqlalchemy.ext.hybrid import hybrid_property
class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    _post_status = Column(String)
    @hybrid_property
    def post_status(self):
        return self._post_status
print(A.post_status)
print(A.post_status != 5678)
output:
$ python test.py
A._post_status
a._post_status != :_post_status_1
Be sure to read the hybrid doc carefully, including how to establish the correct SQL expression behavior; descriptors that work at both the instance and class level are a somewhat advanced Python technique.
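To make the instance-level versus class-level distinction concrete without any SQLAlchemy involved, here is a toy sketch in plain Python. ClassAware is a hypothetical descriptor written only for illustration, and the placeholder string stands in for the SQL expression a real hybrid would return; it is not how hybrid_property is implemented.

```python
class ClassAware(object):
    """Toy descriptor that behaves differently on the class and on instances."""

    def __init__(self, fget, class_value):
        self.fget = fget
        self.class_value = class_value

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: hand back the class-level object.
            return self.class_value
        # Accessed on an instance: call the getter like a normal property.
        return self.fget(instance)


class A(object):
    def __init__(self, status):
        self._post_status = status

    @property
    def plain(self):
        return self._post_status

    # Same getter, but class-level access yields a placeholder instead of the descriptor.
    smart = ClassAware(lambda obj: obj._post_status, "column expression placeholder")


class_level_plain = A.plain          # the property object itself
comparison = (A.plain != 5678)       # ordinary Python inequality, so simply True
class_level_smart = A.smart          # what the descriptor chose for class access
instance_level_smart = A("Draft").smart
```

The comparison on the plain property is evaluated by Python immediately, which matches the literal "true" that ended up in the WHERE clause of the question.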

How do I implement a null coalescing operator in SQLAlchemy?

Or how do I make this thing work?
I have an Interval object:
class Interval(Base):
    __tablename__ = 'intervals'
    id = Column(Integer, primary_key=True)
    start = Column(DateTime)
    end = Column(DateTime, nullable=True)
    task_id = Column(Integer, ForeignKey('tasks.id'))
    @hybrid_property  # used to just be @property
    def hours_spent(self):
        end = self.end or datetime.datetime.now()
        return (end - self.start).total_seconds() / 60 / 60
And a Task:
class Task(Base):
    __tablename__ = 'tasks'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    intervals = relationship("Interval", backref="task")
    @hybrid_property  # also used to be just @property
    def hours_spent(self):
        return sum(i.hours_spent for i in self.intervals)
Add all the typical setup code, of course.
Now when I try to do session.query(Task).filter(Task.hours_spent > 3).all()
I get NotImplementedError: <built-in function getitem> from the sum(i.hours_spent... line.
So I was looking at this part of the documentation and theorized that there might be some way that I can write something that will do what I want. This part also looks like it may be of use, and I'll be looking at it while waiting for an answer here ;)
For a simple example of SQLAlchemy's coalesce function, this may help: Handling null values in a SQLAlchemy query - equivalent of isnull, nullif or coalesce.
Here are a couple of key lines of code from that post:
from sqlalchemy.sql.functions import coalesce
my_config = session.query(Config).order_by(coalesce(Config.last_processed_at, datetime.date.min)).first()
SQLAlchemy is not smart enough to build a SQL expression tree from these operands; you have to use the explicit propname.expression decorator to provide it. But then comes another problem: there is no portable way to convert an interval to hours in-database. You'd use TIMEDIFF in MySQL, EXTRACT(EPOCH FROM ... ) / 3600 in PostgreSQL, etc. I suggest changing the properties to return timedelta instead, and comparing apples to apples.
from datetime import datetime, timedelta
from sqlalchemy import select, func
class Interval(Base):
    ...
    @hybrid_property
    def time_spent(self):
        return (self.end or datetime.now()) - self.start
    @time_spent.expression
    def time_spent(cls):
        return func.coalesce(cls.end, func.current_timestamp()) - cls.start
class Task(Base):
    ...
    @hybrid_property
    def time_spent(self):
        return sum((i.time_spent for i in self.intervals), timedelta(0))
    @time_spent.expression
    def time_spent(cls):
        # list-style select([...]) as written for SQLAlchemy 1.x
        return (select([func.sum(Interval.time_spent)])
                .where(cls.id == Interval.task_id)
                .label('time_spent'))
The final query is:
session.query(Task).filter(Task.time_spent > timedelta(hours=3)).all()
which translates to (on PostgreSQL backend):
SELECT task.id AS task_id, task.title AS task_title
FROM task
WHERE (SELECT sum(coalesce(interval."end", CURRENT_TIMESTAMP) - interval.start) AS sum_1
FROM interval
WHERE task.id = interval.task_id) > %(param_1)s
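The instance-level half of those hybrids rests on two plain-Python details: "end or now" acts as the in-Python coalesce, and sum() needs an explicit timedelta(0) start value. A small standalone sketch, with a hypothetical free function time_spent and made-up timestamps standing in for the mapped Interval rows:

```python
from datetime import datetime, timedelta


def time_spent(start, end, now):
    # Python-side "coalesce": fall back to `now` while the interval is still open.
    return (end or now) - start


now = datetime(2020, 1, 1, 12, 0)  # fixed clock so the result is reproducible
intervals = [
    (datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 9, 30)),  # closed interval
    (datetime(2020, 1, 1, 10, 0), None),                        # still open
]

# sum() starts from the integer 0 by default, and 0 + timedelta raises TypeError,
# hence the explicit timedelta(0) start value.
total = sum((time_spent(s, e, now) for s, e in intervals), timedelta(0))
over_three_hours = total > timedelta(hours=3)
```

Comparing against timedelta(hours=3) keeps both sides of the filter in the same unit, which is the "apples to apples" point made above.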
I needed to use the text function and could not use 0 as an integer.
import sqlalchemy as sa
session.query(sa.func.coalesce(table1.col1, sa.text("0"))).all()
There is a complete example of making a func action similar to coalesce or nvl.
Note how it takes in arguments and renders an expression, in this case NVL(a, b) when used with Oracle.
http://docs.sqlalchemy.org/en/latest/core/compiler.html#subclassing-guidelines
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement
class coalesce(FunctionElement):
    name = 'coalesce'
@compiles(coalesce)
def compile(element, compiler, **kw):
    return "coalesce(%s)" % compiler.process(element.clauses)
@compiles(coalesce, 'oracle')
def compile(element, compiler, **kw):
    if len(element.clauses) > 2:
        raise TypeError("coalesce only supports two arguments on Oracle")
    return "nvl(%s)" % compiler.process(element.clauses)
Then when you want to use it...
from my_oracle_functions_sqla import coalesce
select([coalesce(A.value, '---')]) # etc
Hope that helps.

sqlalchemy access parent class attribute

Looking at the bottom of the post you can see I have three classes. The code here is pseudocode written on the fly and untested; however, it adequately shows my problem. If the actual classes are needed, I can update this question tomorrow when at work. So please ignore syntax issues and code that only represents a thought rather than the actual "code" that would do what I describe.
Question 1
If you look at the Item.search class method you can see that when the user does a search I call search on the base class and then, based on that result, return the correct class/object. This works but seems kludgy. Is there a better way to do this?
Question 2
If you look at the KitItem class you can see that I am overriding the list price. If the flag calc_list is set to true, then I sum the list prices of the components and return that as the list price for the kit. If it's not set to true, I want to return the "base" list price. However, as far as I know there is no way to access a parent attribute, since in a normal setup it would be meaningless, but with SQLAlchemy and shared table inheritance it could be useful.
TIA
class Item(DeclarativeBase):
    __tablename__ = 'items'
    item_id = Column(Integer, primary_key=True, autoincrement=True)
    sku = Column(Unicode(50), nullable=False, unique=True)
    list_price = Column(Float)
    cost_price = Column(Float)
    item_type = Column(Unicode(1))
    __mapper_args__ = {'polymorphic_on': item_type}
    def __init__(self, sku, list_price, cost_price):
        self.sku = sku
        self.list_price = list_price
        self.cost_price = cost_price
    @classmethod
    def search(cls):
        """
        search based on sku, description, long description
        return item as proper class
        """
        item = DBSession.query(cls).filter(...)  # do search stuff here
        if item.item_type == 'K':  # Better way to do this???
            return DBSession.query(KitItem).get(item.item_id)
class KitItem(Item):
    __mapper_args__ = {'polymorphic_identity': 'K'}
    calc_list = Column(Boolean, nullable=False, default=False)
    @property
    def list_price(self):
        if self.calc_list:
            list_price = 0.0
            for comp in self.components:
                list_price += comp.component.list_price * comp.qty
            return list_price
        else:
            # need help here
            item = DBSession.query(Item).get(self.item_id)
            return item.list_price
class KitComponent(DeclarativeBase):
    __tablename__ = "kit_components"
    kit_id = Column(Integer, ForeignKey('items.item_id'), primary_key=True)
    component_id = Column(Integer, ForeignKey('items.item_id'), primary_key=True)
    qty = Column(Integer, nullable=False, default=1)
    kit = relation(KitItem, backref=backref("components"))
    component = relation(Item)
Answer-1: In fact you do not need to do anything special here: given that you configured your inheritance hierarchy properly, your query will already return the proper class for every row (Item or KitItem). This is the advantage of the ORM part. What you could do, though, is configure the query to immediately load the additional columns which belong to children of Item (from your code this is only the calc_list column), which you can do by specifying with_polymorphic('*'):
@classmethod
def search(cls):
    item = DBSession.query(cls).with_polymorphic('*').filter(...)  # do search stuff here
    return item
Read more on this in Basic Control of Which Tables are Queried.
To see the difference, enable SQL logging and compare your test scripts with and without with_polymorphic(...); you will most probably need fewer SQL statements to be executed.
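The claim that a query against the base class already hands back the right subclass can be checked with a small single-table setup. This is a sketch under assumptions: the 'I' identity for plain items, the trimmed column list, the nullable calc_list column and the in-memory SQLite engine are all chosen for the demo rather than taken from the question. Passing echo=True to create_engine is one way to get the SQL logging mentioned above.

```python
from sqlalchemy import Boolean, Column, Float, Integer, Unicode, create_engine
from sqlalchemy.orm import sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class Item(Base):
    __tablename__ = "items"
    item_id = Column(Integer, primary_key=True)
    sku = Column(Unicode(50), nullable=False, unique=True)
    list_price = Column(Float)
    item_type = Column(Unicode(1))
    # 'I' is an assumed discriminator value for plain items.
    __mapper_args__ = {"polymorphic_on": item_type, "polymorphic_identity": "I"}


class KitItem(Item):
    # Single-table inheritance: no __tablename__, the extra column lives in "items".
    calc_list = Column(Boolean, default=False)
    __mapper_args__ = {"polymorphic_identity": "K"}


engine = create_engine("sqlite://")  # add echo=True to watch the emitted SQL
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Item(sku=u"plain", list_price=1.0),
                 KitItem(sku=u"kit", list_price=5.0)])
session.commit()
session.expunge_all()  # drop cached objects so the rows are really reloaded

# Query only the base class; the discriminator column picks the Python class.
loaded = {item.sku: type(item).__name__ for item in session.query(Item)}
```

The second lookup by primary key in the question's search method is therefore unnecessary: the object returned for a 'K' row is already a KitItem.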
Answer-2: I would not override a mapped attribute with one which is purely computed. Instead I would just create another computed attribute (let's call it total_price), which would look like the following for each of the two classes:
class Item(Base):
    ...
    @property
    def total_price(self):
        return self.list_price
class KitItem(Item):
    ...
    @property
    def total_price(self):
        if self.calc_list:
            _price = 0.0
            for comp in self.components:
                _price += comp.component.list_price * comp.qty
            return _price
        else:
            # note: again, you do not need to perform any query here at all, as *self* is all that you need
            return self.list_price
Also in this case, you might think of configuring the relationship KitItem.components to be eagerly loaded, so that the calculation of total_price will not trigger additional SQL. But you have to decide for yourself whether this is beneficial for your use cases (again, analyse the SQL generated in your scenario).
