I'm using Flask-SQLAlchemy with a Postgres DB and I'm trying to filter to find all the instances of a model where one string value of a JSON data column is equal to another (UUID4) column.
class MyModel(db.Model):
    id = db.Column(UUID(as_uuid=True), primary_key=True,
                   index=True, unique=True, nullable=False,
                   server_default=sa_text("uuid_generate_v4()"))
    site = db.Column(UUID(as_uuid=True), db.ForeignKey('site.id'),
                     index=True, nullable=False)
    data = db.Column(JSON, default={}, nullable=False)
and the data column of these models looks like this:
{
    "cluster": "198519a5-b04a-4371-b188-2b992c25d0ae",
    "status": "Pending"
}
This is what I'm trying:
filteredModels = MyModel.query.filter(MyModel.site == MyModel.data['cluster'].astext)
I get:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text
LINE 4: ...sset.type = 'testplan' AND site_static_asset.site = (site_st...
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The error message is telling you that PostgreSQL doesn't have a way to directly compare UUID values with text values. In other words, it cannot process
MyModel.site == MyModel.data['cluster'].astext
To get around this, you need to cast one side of the comparison to the same type as the other. Either of these should work:

from sqlalchemy import cast, String
from sqlalchemy.dialects.postgresql import UUID

MyModel.query.filter(cast(MyModel.site, String) == MyModel.data['cluster'].astext)

MyModel.query.filter(MyModel.site == cast(MyModel.data['cluster'].astext, UUID))
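Note that in the second version, UUID is the PostgreSQL type from sqlalchemy.dialects.postgresql (imported above). If you want to confirm which side gets cast, printing the query renders the SQL without executing it; a minimal sketch using the model from the question:

query = MyModel.query.filter(
    MyModel.site == cast(MyModel.data['cluster'].astext, UUID))
# str(query) renders something like:
#   SELECT ... FROM my_model
#   WHERE my_model.site = CAST((my_model.data ->> %(data_1)s) AS UUID)
print(query)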
I am getting my data from my Postgres database but it is truncated. For VARCHAR, I know it's possible to set the max size, but is it possible to do that with JSON too, or is there another way?
Here is my request:
robot_id_cast = cast(RobotData.data.op("->>")("id"), String)
robot_camera_cast = cast(RobotData.data.op("->>")(self.camera_name), JSON)
# Get the last upload time for this robot and this camera
subquery_last_upload = (
    select([func.max(RobotData.time).label("last_upload")])
    .where(robot_id_cast == self.robot_id)
    .where(robot_camera_cast != None)
).alias("subquery_last_upload")

main_query = (
    select([
        subquery_last_upload.c.last_upload,
        RobotData.data.op("->")(self.camera_name).label(self.camera_name),
    ])
    .where(RobotData.time == subquery_last_upload.c.last_upload)
    .where(robot_id_cast == self.robot_id)
    .where(robot_camera_cast != None)
)
The problem is with this part of the select: RobotData.data.op("->")(self.camera_name).label(self.camera_name)
Here is my table
class RobotData(PGBase):
    __tablename__ = "wr_table"

    time = Column(DateTime, nullable=False, primary_key=True)
    data = Column(JSON, nullable=False)
Edit: My JSON is 429 characters
The limit of the JSON datatype in PostgreSQL is 1 GB.
Refs:
https://dba.stackexchange.com/a/286357
https://stackoverflow.com/a/12633183
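Given that limit, a 429-character document is nowhere near any server-side cap, so it is worth measuring what is actually stored. A minimal sketch, assuming the RobotData model above and a plain 1.x-style session named session (that name is an assumption):

from sqlalchemy import String, cast, func, select

# Character length of each stored JSON document. If these match what you
# inserted, the truncation is happening on the client or in the display,
# not in PostgreSQL.
stmt = select([func.length(cast(RobotData.data, String))])
for (length,) in session.execute(stmt):
    print(length)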
I have an API endpoint that passes a variable which is used to make a call to the database. For some reason it cannot run the query, yet the syntax is correct. My code is below.
@app.route('/api/update/<lastqnid>')
def check_new_entries(lastqnid):
    result = Trades.query.filter_by(id=lastqnid).first()
    new_entries = Trades.query.filter(
        Trades.time_recorded > result.time_recorded).all()
The id field is:
id = db.Column(db.String,default=lambda: str(uuid4().hex), primary_key=True)
I have tried filter instead of filter_by and it does not work. When I remove the filter_by(id=lastqnid) it works. What could be the reason it is not running the query?
The trades table I am querying from is
class Trades(db.Model):
    id = db.Column(db.String, default=lambda: str(uuid4().hex), primary_key=True)
    amount = db.Column(db.Integer, unique=False)
    time_recorded = db.Column(db.DateTime, unique=False)
The issue you seem to be having is that you are not checking whether you found anything before using your result:
@app.route('/api/update/<lastqnid>')
def check_new_entries(lastqnid):
    result = Trades.query.filter_by(id=lastqnid).first()
    # Here result may very well be None, so we can make an escape here
    if result is None:
        # You may not want to do exactly this, but this is an example
        print("No Trades found with id=%s" % lastqnid)
        return redirect(request.referrer)
    new_entries = Trades.query.filter(
        Trades.time_recorded > result.time_recorded).all()
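Since id is the primary key here, a slightly tighter variant of the same guard uses Query.get(), which looks up by primary key and returns None when there is no match. A sketch (the 404 response is just one option):

@app.route('/api/update/<lastqnid>')
def check_new_entries(lastqnid):
    result = Trades.query.get(lastqnid)  # primary-key lookup, None if absent
    if result is None:
        return "No Trades found with id=%s" % lastqnid, 404
    new_entries = Trades.query.filter(
        Trades.time_recorded > result.time_recorded).all()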
I am trying to load an SQLAlchemy query result into a pandas DataFrame.
When I do:
df = pd.DataFrame(LPRRank.query.all())
I get
>>> df
0 <M. Misty || 1 || 18>
1 <P. Patch || 2 || 18>
...
...
But what I want is for each column in the database to be a column in the dataframe:
0 M. Misty 1 18
1 P. Patch 2 18
...
...
and when I try:
dff = pd.read_sql_query(LPRRank.query.all(), db.session())
I get an Attribute Error:
AttributeError: 'SignallingSession' object has no attribute 'cursor'
and
dff = pd.read_sql_query(LPRRank.query.all(), db.session)
also gives an error:
AttributeError: 'scoped_session' object has no attribute 'cursor'
What I'm using to generate the list of objects is:
app = Flask(__name__)
db = SQLAlchemy(app)

class LPRRank(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    candid = db.Column(db.String(40), index=True, unique=False)
    rank = db.Column(db.Integer, index=True, unique=False)
    user_id = db.Column(db.Integer, db.ForeignKey('lprvote.id'))

    def __repr__(self):
        return '<{} || {} || {}>'.format(self.candid,
                                         self.rank, self.user_id)
This question:
How to convert SQL Query result to PANDAS Data Structure?
is error free, but gives each row as an object, which is not what I want. I can access the individual columns in the returned object, but it seems like there is a better way to do it.
The documentation at pandas.pydata.org is great if you already understand what is going on and just need to review syntax. The documentation from April 20, 2016 (the 1319-page PDF) identifies a pandas connection as still experimental, on p. 872.
Now, SQLALCHEMY/PANDAS - SQLAlchemy reading column as CLOB for Pandas to_sql is about specifying the SQL type. Mine is SQLAlchemy, which is the default.
And sqlalchemy pandas to_sql OperationalError, Writing to MySQL database with pandas using SQLAlchemy, to_sql, and SQLAlchemy/pandas to_sql for SQLServer -- CREATE TABLE in master db are about writing to the SQL database, which produces an operational error, a database error, and a 'create table' error, none of which is my problem.
This one, SQLAlchemy Pandas read_sql from jsonb, wants a jsonb attribute to columns: not my cup o' tea.
This previous question, SQLAlchemy ORM conversion to pandas DataFrame, addresses my issue, but the solution, using query.session.bind, is not my solution. I'm opening/closing sessions with db.session.add() and db.session.commit(), but when I use db.session.bind as specified in the second answer there, I get an AttributeError:
AttributeError: 'list' object has no attribute '_execute_on_connection'
Simply add an __init__ method in your model and call the class object before building the DataFrame. Specifically, the code below creates an iterable of tuples that pandas.DataFrame() binds into columns.
class LPRRank(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    candid = db.Column(db.String(40), index=True, unique=False)
    rank = db.Column(db.Integer, index=True, unique=False)
    user_id = db.Column(db.Integer, db.ForeignKey('lprvote.id'))

    def __init__(self, candid=None, rank=None, user_id=None):
        self.candid = candid
        self.rank = rank
        self.user_id = user_id

    def __repr__(self):
        # __repr__ must return a string, not a tuple
        return '({}, {}, {})'.format(self.candid, self.rank, self.user_id)
data = db.session.query(LPRRank).all()
df = pd.DataFrame([(d.candid, d.rank, d.user_id) for d in data],
columns=['candid', 'rank', 'user_id'])
Alternatively, use the SQLAlchemy ORM based on your defined Model class, LPRRank, to run read_sql:
df = pd.read_sql(sql=db.session.query(LPRRank)
                        .with_entities(LPRRank.candid,
                                       LPRRank.rank,
                                       LPRRank.user_id).statement,
                 con=db.session.bind)
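If db.session.bind is awkward in your setup, Flask-SQLAlchemy also exposes the engine directly, and read_sql accepts it as the connection; same query, only the con argument changes:

df = pd.read_sql(sql=db.session.query(LPRRank)
                        .with_entities(LPRRank.candid,
                                       LPRRank.rank,
                                       LPRRank.user_id).statement,
                 con=db.engine)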
Parfait's answer is good, but it has two potential problems:
efficiency: each object creation duplicates data into the DataFrame, so building it from a list of objects can take time
it does not mirror a DataFrame as a collection of rows
The example below therefore provides a parent class that represents a DataFrame and a child class that represents a row of that DataFrame.
The code below provides two ways to get a DataFrame; the DataFrame object is created only on demand, so as not to waste CPU and memory.
If the DataFrame is needed at creation time, you only have to add a constructor (def __init__(self, rows: List[MyDataFrameRow] = None) ...), create a new attribute, and assign it the result of self.data_frame.
from typing import Any, Tuple

from pandas import DataFrame, read_sql
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, Session

Base = declarative_base()

class MyDataFrame(Base):
    __tablename__ = 'my_data_frame'
    id = Column(Integer, primary_key=True)
    rows = relationship('MyDataFrameRow', cascade='all,delete')

    @property
    def data_frame(self) -> DataFrame:
        columns = MyDataFrameRow.data_frame_columns()
        return DataFrame([[getattr(row, column) for column in columns] for row in self.rows],
                         columns=columns)

    @staticmethod
    def to_data_frame(identifier: int, session: Session) -> DataFrame:
        query = session.query(MyDataFrameRow).join(MyDataFrame).filter(MyDataFrame.id == identifier)
        return read_sql(query.statement, session.get_bind())

class MyDataFrameRow(Base):
    __tablename__ = 'my_data_row'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    number_of_children = Column(Integer)
    height = Column(Integer)
    parent_id = Column(Integer, ForeignKey('my_data_frame.id'))

    @staticmethod
    def data_frame_columns() -> Tuple[Any, ...]:
        # keep only the data columns: skip the primary key and foreign keys
        return tuple(column.name for column in MyDataFrameRow.__table__.columns
                     if len(column.foreign_keys) == 0 and column.primary_key is False)

...

session = Session(...)
df1 = MyDataFrame.to_data_frame(1, session)
my_table_obj = session.query(MyDataFrame).filter(MyDataFrame.id == 1).one()
df2 = my_table_obj.data_frame
I'm using Flask-SQLAlchemy with reflection to build my models, but this worked for me:
import pandas as pd
from app.models import Runs
from app import db
def get_db_data_df():
    df_runs = pd.read_sql(Runs.__table__.name, con=db.get_engine(), index_col=None)
    return df_runs
When creating new records, I'd expect that foreign key fields and their relationship objects would stay in sync (if I change one, the other would change to reflect it), but this doesn't seem to be the case. Is this possible to do?
Given the following:
Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
    equipment = relationship('Equipment', backref='user')

class Equipment(Base):
    __tablename__ = 'equipment'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=False)
    name = Column(String)

engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)

session = sessionmaker(bind=engine)
conn = session()

conn.add_all([
    User(name='bill', fullname='Bill W.', password='rlrrlrll'),      # id=1
    User(name='tony', fullname='Tony I.', password='EADGBe'),        # id=2
    User(name='ozzy', fullname='Ozzy O.', password='durrrr'),        # id=3
    User(name='geezer', fullname='Terence B.', password='password'), # id=4
])
I can create related records in either of the two ways:
guitar = Equipment(
    user=conn.query(User).filter(User.name == 'tony').one(),
    name='Gibson SG')
drums = Equipment(
    user_id=1,
    name='Ludwigs')
Following these lines I'd expect guitar.user_id to be 2, and drums.user to be the 'bill' object, but in both cases they're None. After I conn.add()/conn.commit() then it starts working a little more like I'd expect (both complementary fields return non-None values).
Is there any way for this to work pre-commit? I'd like to be able to construct new records either way (by ID or by object), and in library functions be able to reliably access the ID or object.
You can do this by flushing:
conn.add(guitar)
conn.add(drums)
conn.flush()
Flushing emits the INSERT queries but does not COMMIT, meaning you can ROLLBACK later if you need to.
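A minimal sketch of the flush-then-read pattern with the objects above (same names as in the question):

conn.add_all([guitar, drums])
conn.flush()  # emits the INSERTs and assigns PKs/FKs, but does not COMMIT

print(guitar.user_id)   # now 2, taken from the 'tony' object
print(drums.user.name)  # now 'bill', lazy-loaded via user_id=1

conn.rollback()  # still possible: nothing has been committed yet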
I have the following SQLAlchemy models:
PENDING_STATE = 'pending'
COMPLETE_STATE = 'success'
ERROR_STATE = 'error'

class Assessment(db.Model):
    __tablename__ = 'assessments'

    id = db.Column(db.Integer, primary_key=True)
    state = db.Column(
        db.Enum(PENDING_STATE, COMPLETE_STATE, ERROR_STATE,
                name='assessment_state'),
        default=PENDING_STATE,
        nullable=False,
        index=True)
    test_results = db.relationship("TestResult")

class TestResult(db.Model):
    __tablename__ = 'test_results'

    name = db.Column(db.String, primary_key=True)
    state = db.Column(
        db.Enum(PENDING_STATE, COMPLETE_STATE, ERROR_STATE,
                name='test_result_state_state'),
        default=PENDING_STATE,
        nullable=False,
        index=True)
    assessment_id = db.Column(
        db.Integer,
        db.ForeignKey(
            'assessments.id', onupdate='CASCADE', ondelete='CASCADE'),
        primary_key=True)
And I am trying to implement logic to update an assessment to the error state if any of its test results are in the error state, and to the success state if all of its test results are in the success state.
I can write raw SQL like this:
SELECT 'error'
FROM assessments
WHERE assessments.state = 'error' OR 'error' IN (
SELECT test_results.state
FROM test_results
WHERE test_results.assessment_id = 1);
But I don't know how to translate that into SQLAlchemy. I'd think that subquery would be something like:
(select([test_results.state]).where(test_results.assessment_id == 1)).in_('error')
but I can't find any way to compare query results against literals like I'm doing in the raw SQL. I swear I must be missing something, but I'm just not seeing a way to write queries which return boolean expressions, which I think is fundamentally what I'm butting up against. Just something as simple as:
SELECT 'a' = 'b'
seems to be absent from the documentation.
Any ideas on how to express this state change in SQLAlchemy? I'd also be perfectly open to rethinking my schemas if it looks like I'm going about this in a silly way.
Thanks!
The query below should do it for the error check. Keep in mind that no rows will be returned when it is not an error.
from sqlalchemy import literal_column, or_

q = (db.session.query(literal_column("'error'"))
     .select_from(Assessment)
     .filter(Assessment.id == sid)
     .filter(or_(
         Assessment.state == ERROR_STATE,
         Assessment.test_results.any(TestResult.state == ERROR_STATE),
     )))
If you wish to do a similar check for success, you could find out whether there is any TestResult which is not a success and negate the boolean result.
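For instance, a sketch of that success check under the same models (my guess at the shape, not tested):

q = (db.session.query(literal_column("'success'"))
     .select_from(Assessment)
     .filter(Assessment.id == sid)
     # no row comes back unless every related TestResult is a success
     .filter(~Assessment.test_results.any(
         TestResult.state != COMPLETE_STATE)))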
I actually ended up doing this with postgres triggers, which is probably the better way to handle state updates. So for the error case, I've got:
sqlalchemy.event.listen(TestResult.__table__, 'after_create', sqlalchemy.DDL("""
CREATE OR REPLACE FUNCTION set_assessment_failure() RETURNS trigger AS $$
BEGIN
    UPDATE assessments
    SET state='error'
    WHERE id=NEW.assessment_id;
    RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER assessment_failure
AFTER INSERT OR UPDATE OF state ON test_results
FOR EACH ROW
WHEN (NEW.state = 'error')
EXECUTE PROCEDURE set_assessment_failure();"""))
And something similar for the 'success' case where I count the number of test results vs the number of successful test results.
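That success trigger isn't shown here, but a sketch along the lines described (comparing the total count of test results with the count of successful ones) might look like this; treat it as illustrative rather than the exact DDL used:

sqlalchemy.event.listen(TestResult.__table__, 'after_create', sqlalchemy.DDL("""
CREATE OR REPLACE FUNCTION set_assessment_success() RETURNS trigger AS $$
BEGIN
    UPDATE assessments
    SET state='success'
    WHERE id=NEW.assessment_id
    AND (SELECT count(*) FROM test_results
         WHERE assessment_id=NEW.assessment_id)
      = (SELECT count(*) FROM test_results
         WHERE assessment_id=NEW.assessment_id AND state='success');
    RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER assessment_success
AFTER INSERT OR UPDATE OF state ON test_results
FOR EACH ROW
WHEN (NEW.state = 'success')
EXECUTE PROCEDURE set_assessment_success();"""))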
Credit to van for answering my question as I asked it, though! Thanks, I hadn't bumped into relationship.any before.