JSON field truncated in SQLAlchemy - Python

I am getting my data from my Postgres database but it is truncated. For VARCHAR I know it's possible to set the max size, but is it possible to do that with JSON too, or is there another way?
Here is my query:
robot_id_cast = cast(RobotData.data.op("->>")("id"), String)
robot_camera_cast = cast(RobotData.data.op("->>")(self.camera_name), JSON)

# Get the last upload time for this robot and this camera
subquery_last_upload = (
    select([func.max(RobotData.time).label("last_upload")])
    .where(robot_id_cast == self.robot_id)
    .where(robot_camera_cast != None)
).alias("subquery_last_upload")

main_query = (
    select(
        [
            subquery_last_upload.c.last_upload,
            RobotData.data.op("->")(self.camera_name).label(self.camera_name),
        ]
    )
    .where(RobotData.time == subquery_last_upload.c.last_upload)
    .where(robot_id_cast == self.robot_id)
    .where(robot_camera_cast != None)
)
The problem is with this part of the select: RobotData.data.op("->")(self.camera_name).label(self.camera_name)
Here is my table:

class RobotData(PGBase):
    __tablename__ = "wr_table"

    time = Column(DateTime, nullable=False, primary_key=True)
    data = Column(JSON, nullable=False)
Edit: My JSON is 429 characters

The limit of the JSON datatype in PostgreSQL is 1 GB, so a 429-character document is nowhere near any database-side limit; the truncation is most likely happening wherever the value is displayed, not in PostgreSQL.
Refs:
https://dba.stackexchange.com/a/286357
https://stackoverflow.com/a/12633183
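
If you want to rule out truncation on the database side, here is a minimal sketch (assuming the RobotData model and the robot_id_cast expression from the question, plus an open SQLAlchemy session) that compares the length PostgreSQL reports with the length of the string that arrives in Python:

from sqlalchemy import func, select

# Text of the camera entry as PostgreSQL sees it.
camera_text = RobotData.data.op("->>")(self.camera_name)

stmt = (
    select([func.length(camera_text).label("stored_len"), camera_text.label("camera")])
    .where(robot_id_cast == self.robot_id)
    .where(camera_text != None)
)

for stored_len, camera in session.execute(stmt):
    # If stored_len equals len(camera), nothing was truncated between
    # PostgreSQL and Python; the cut-off happens wherever the value is
    # printed or displayed.
    print(stored_len, len(camera))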

Related

psycopg2.errors.UndefinedFunction operator does not exist: uuid = text

I'm using Flask-SQLAlchemy with a Postgres database and I'm trying to filter to find all the instances of a model where one string value of a JSON data column is equal to another (UUID4) column.
class MyModel(db.Model):
    id = db.Column(UUID(as_uuid=True), primary_key=True,
                   index=True, unique=True, nullable=False,
                   server_default=sa_text("uuid_generate_v4()"))
    site = db.Column(UUID(as_uuid=True), db.ForeignKey('site.id'),
                     index=True, nullable=False)
    data = db.Column(JSON, default={}, nullable=False)
and the model's data column looks like
{
    "cluster": "198519a5-b04a-4371-b188-2b992c25d0ae",
    "status": "Pending"
}
This is what I'm trying:
filteredModels = MyModel.query.filter(MyModel.site == MyModel.data['cluster'].astext)
I get:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text
LINE 4: ...sset.type = 'testplan' AND site_static_asset.site = (site_st...
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The error message is telling you that PostgreSQL doesn't have a way to directly compare UUIDs with text values. In other words, it cannot process
MyModel.site == MyModel.data['cluster'].astext
To get around this, you need to cast one side of the comparison to be the same type as the other. Either of these should work:
from sqlalchemy import cast, String
from sqlalchemy.dialects.postgresql import UUID

MyModel.query.filter(cast(MyModel.site, String) == MyModel.data['cluster'].astext)

MyModel.query.filter(MyModel.site == cast(MyModel.data['cluster'].astext, UUID))
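
A shorter spelling of the second option (my addition, not from the original answer) uses the .cast() method available on column expressions:

from sqlalchemy.dialects.postgresql import UUID

# Same comparison as cast(..., UUID), written with the expression-level
# .cast() shorthand.
MyModel.query.filter(MyModel.site == MyModel.data['cluster'].astext.cast(UUID))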

Get count of inserted and updated records in sqlalchemy's upsert

I have working code that upserts several records with SQLAlchemy:
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import BigInteger
from flask_sqlalchemy import SQLAlchemy as db

class PetModel(db.Model):
    __tablename__ = "pets"

    id = db.Column(BigInteger, primary_key=True)
    data = db.Column(db.String(64), nullable=False, unique=True)

    def as_dict(self):
        return {
            "id": getattr(self, "id"),
            "data": getattr(self, "data"),
        }

pets = [PetModel(id=1, data="Dog"), PetModel(id=2, data="Cat")]

insert_statement = insert(PetModel).values([_.as_dict() for _ in pets])
upsert_statement = insert_statement.on_conflict_do_update(
    constraint="pet_pkey",
    set_={"data": insert_statement.excluded.data},
)
ans = db.session.execute(upsert_statement)
I have tried to return all rows by adding returning(PetModel.__table__) to the insert_statement, but I can't tell from the result of [_ for _ in ans] which rows were updated and which were inserted. I don't want to add a special field to the database.
I know that ans.rowcount returns the combined count of updated and inserted records.
How can I get the number of updated and inserted records separately using SQLAlchemy?
As Ilja Everilä said, one option is the xmax hack: xmax is 0 for freshly inserted rows and non-zero for rows that were updated by ON CONFLICT DO UPDATE, so a column based on it can be added to the returned rows:
import sqlalchemy
from sqlalchemy.dialects.postgresql import insert
...
insert_statement = insert(PetModel).returning(
    sqlalchemy.column("xmax") == 0
).values([_.as_dict() for _ in pets])

upsert_statement = insert_statement.on_conflict_do_update(
    constraint="pet_pkey",
    set_={"data": insert_statement.excluded.data},
)

ans = db.session.execute(upsert_statement)
created = sum(row[0] for row in ans)
updated = len(pets) - created
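
A small variation (my sketch, not part of the original answer) labels the xmax expression so the flag can be read by name instead of by position:

import sqlalchemy

# Same upsert as above, but with the xmax test labelled "inserted".
insert_statement = insert(PetModel).returning(
    (sqlalchemy.column("xmax") == 0).label("inserted")
).values([pet.as_dict() for pet in pets])

# ... on_conflict_do_update(...) exactly as before ...

ans = db.session.execute(upsert_statement)
created = sum(1 for row in ans if row.inserted)
updated = len(pets) - created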

Translating a query with a lag function to SQLAlchemy with Flask?

I'm trying to translate the following query to SQLAlchemy and can't seem to figure it out (I'm not even far yet):
"SELECT time, version_id FROM ( \
SELECT \
time, \
version_id, \
LAG(software_version_id) OVER (ORDER BY time) as previous_version_id \
FROM device_checkins WHERE device_id = 001 AND time BETWEEN '2020-01-01' AND '2021-04-01') tt \
WHERE previous_version_id IS NULL OR version_id != previous_version_id;"
As far as I could figure out, I need the select function that SQLAlchemy provides, but I'm running into trouble.
Of course, in the Python representation, we have a DeviceCheckin model with all the fields that are used here. I'd love all the help you might be able to provide.
class DeviceCheckin(ModelBase):
    __tablename__ = "device_checkins"

    time = db.Column(DateTimeUtc(), nullable=False)
    device_id = db.Column(sa.BigInteger, nullable=False)
    device = db.relationship(...)
    software_version_id = db.Column(...)
    software_version = db.relationship(...)
You could draw up the subquery expression as follows:
import sqlalchemy as sa
# ...

entities = [
    DeviceCheckin.time,
    DeviceCheckin.version_id,
    sa.func.lag(
        DeviceCheckin.software_version_id
    ).over(
        order_by=DeviceCheckin.time
    ).label('previous_version_id'),
]

condition = (
    (DeviceCheckin.device_id == '001')
    & (DeviceCheckin.time.between('2020-01-01', '2021-04-01'))
)

subq = DeviceCheckin.query.with_entities(*entities).filter(condition).subquery()
Then select from it in the following manner:
condition = (
    subq.c.previous_version_id.is_(None)
    | (subq.c.version_id != subq.c.previous_version_id)
)
entities = [subq.c.time, subq.c.version_id]

query = sa.select(entities).where(condition)
results = db.session.execute(query)
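
Iterating the result then yields the two selected columns per row; a short usage sketch under the same assumptions as above:

for time, version_id in results:
    # Each row is a check-in where the version changed, or the first
    # check-in in the window (previous_version_id IS NULL).
    print(time, version_id)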

Python sqlalchemy large file issue

I'm having a problem using the following code to load a large (23,000 records, 10 fields) airport code CSV file into a database with SQLAlchemy:
from numpy import genfromtxt
from time import time
from datetime import datetime
from sqlalchemy import Column, Integer, Float, Date, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def Load_Data(file_name):
    f = lambda s: str(s)
    data = genfromtxt(file_name, delimiter=',', skiprows=1,
                      converters={0: f, 1: f, 2: f, 6: f, 7: f, 8: f, 9: f, 10: f})
    return data.tolist()

Base = declarative_base()

class AirportCode(Base):
    # Tell SQLAlchemy what the table name is and if there's any
    # table-specific arguments it should know about
    __tablename__ = 'AirportCode'
    __table_args__ = {'sqlite_autoincrement': True}

    # Tell SQLAlchemy the name of each column and its attributes:
    id = Column(Integer, primary_key=True, nullable=False)
    ident = Column(String)
    type = Column(String)
    name = Column(String)
    latitude_deg = Column(String)
    longitude_deg = Column(String)
    elevation_ft = Column(String)
    continent = Column(String)
    iso_country = Column(String)
    iso_region = Column(String)
    municipality = Column(String)
    gps_code = Column(String)

    def __repr__(self):
        #return "<AirportCode(name='%s', municipality='%s')>\n" % (self.name, self.municipality)
        return "name:{} municipality:{}\n".format(self.name, self.municipality)

if __name__ == "__main__":
    t = time()

    # Create the database
    engine = create_engine('sqlite:///airport-codes.db')
    Base.metadata.create_all(engine)

    # Create the session
    session = sessionmaker()
    session.configure(bind=engine)
    s = session()

    records_to_commit = 0
    file_name = "airport-codes.csv"  # 23,000 records; fails at the next line
    #file_name = "airport-codes.alaska"  # 250 records; works fine
    print file_name  # for debugging

    data = Load_Data(file_name)  # fails here on large files and triggers the except: below
    print 'file loaded'  # for debugging

    for i in data:
        records_to_commit += 1
        record = AirportCode(**{
            'ident': i[0].lower(),
            'type': i[1].lower(),
            'name': i[2].lower(),
            'latitude_deg': i[3],
            'longitude_deg': i[4],
            'elevation_ft': i[5],
            'continent': i[6],
            'iso_country': i[7],
            'iso_region': i[8],
            'municipality': i[9].lower(),
            'gps_code': i[10].lower()
        })
        s.add(record)  # Add all the records

        #if records_to_commit == 1000:
        #    s.flush()  # Attempt to commit batch of 1000 records
        #    records_to_commit = 0

    s.commit()  # flushes everything remaining + commits
    s.close()  # Close the connection
    print "Time elapsed: " + str(time() - t) + " s."
I adapted this code from another post on this forum and it works fine if I use a subset of the main CSV file (Alaska airports) that is only 250 records.
When I try the entire database of 23,000 records, the program fails to load at this line in the code:
data = Load_Data(file_name)
I am working on a Raspberry Pi 3.
Thanks for the helpful comments. Removing the try/except revealed the issues: there were many international characters, extra commas within fields, special characters, etc. that caused problems when loading the file. The Alaska airport entries were error-free, so that subset loaded fine.
The database now loads 22,000 records in 32 seconds. I deleted about 1,000 entries since they were foreign airports and I want this to be a US airport directory.
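
Since the culprits were quoted fields containing commas plus non-ASCII characters, one forgiving alternative to genfromtxt is Python's csv module with an explicit encoding. This is only a sketch under those assumptions (Python 3 syntax), not the code used in the answer above:

import csv

def load_rows(file_name):
    # csv.reader copes with quoted fields that contain commas, and the
    # explicit encoding/errors arguments keep international characters
    # from aborting the load.
    rows = []
    with open(file_name, newline='', encoding='utf-8', errors='replace') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if len(row) >= 11:  # skip malformed lines instead of crashing
                rows.append(row)
    return rows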

Dynamic Datasets and SQLAlchemy

I am refactoring some old SQLite3 SQL statements in Python into SQLAlchemy. In our framework, we have the following SQL statements that take in a dict with certain known keys and potentially any number of unexpected keys and values (depending on what information was provided).
import sqlite3
import sys

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

def Create_DB(db):
    # Delete the database
    from os import remove
    remove(db)

    # Recreate it and format it as needed
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE [Listings] ([ID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE, [timestamp] REAL NOT NULL DEFAULT(( datetime ( 'now' , 'localtime' ) )), [make] VARCHAR, [model] VARCHAR, [year] INTEGER);")

def Add_Record(db, data):
    with sqlite3.connect(db) as conn:
        conn.row_factory = dict_factory
        conn.text_factory = str
        cursor = conn.cursor()

        # Get the column names already in the table
        cursor.execute("SELECT * FROM 'Listings'")
        col_names = list(map(lambda x: x[0], cursor.description))

        # If a column doesn't exist in the table yet, add it
        for i in data.keys():
            if i not in col_names:
                cursor.execute("ALTER TABLE 'Listings' ADD COLUMN '{col}' {type}".format(
                    col=i, type='INT' if type(data[i]) is int else 'VARCHAR'))

        # Insert the record into the table
        cursor.execute("INSERT INTO Listings({cols}) VALUES({vals});".format(
            cols=str(data.keys()).strip('[]'),
            vals=str([data[i] for i in data]).strip('[]')
        ))

# Database filename
db = 'test.db'

Create_DB(db)

data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964,
        'price': 50000,
        'color': 'blue',
        'doors': 2}
Add_Record(db, data)

data = {'make': 'Chevy',
        'model': 'Camaro',
        'year': 1967,
        'price': 62500,
        'condition': 'excellent'}
Add_Record(db, data)
This level of dynamism is necessary because there's no way we can know what additional information will be provided, but, regardless, it's important that we store all the information provided to us. This has never been a problem in our framework, as we've never expected an unwieldy number of columns in our tables.
While the above code works, it's obviously not a clean implementation, which is why I'm trying to refactor it into SQLAlchemy's cleaner, more robust ORM paradigm. I started going through SQLAlchemy's official tutorials and various examples and have arrived at the following code:
from sqlalchemy import Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

engine = create_engine('sqlite:///')
session = sessionmaker()
session.configure(bind=engine)
Base.metadata.create_all(engine)

data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964}

record = Listing(**data)

s = session()
s.add(record)
s.commit()
s.close()
and it works beautifully with that data dict. Now, when I add a new keyword, such as
data = {'make': 'Chevy',
        'model': 'Corvette',
        'year': 1964,
        'price': 50000}
I get TypeError: 'price' is an invalid keyword argument for Listing. To try to solve the issue, I modified the class to be dynamic, too:
class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

    def __checker__(self, data):
        for i in data.keys():
            if i not in [a for a in dir(self) if not a.startswith('__')]:
                if type(i) is int:
                    setattr(self, i, Column(Integer))
                else:
                    setattr(self, i, Column(String))
            else:
                self[i] = data[i]
But I quickly realized this would not work at all, for several reasons (e.g. the class was already initialized, the data dict cannot be fed into the class without reinitializing it, and it's a hack more than anything). The more I think about it, the less obvious the solution using SQLAlchemy seems to me. So, my main question is: how do I implement this level of dynamism using SQLAlchemy?
I've researched a bit to see if anyone has had a similar issue. The closest I've found was Dynamic Class Creation in SQLAlchemy, but it only talks about the constant attributes ("tablename" et al.). I believe the unanswered https://stackoverflow.com/questions/29105206/sqlalchemy-dynamic-attribute-change may be asking the same question. While Python is not my forte, I consider myself a highly skilled programmer (C++ and JavaScript are my strongest languages) in the context of scientific/engineering applications, so I may not be hitting the correct Python-specific keywords in my searches.
I welcome any and all help.
class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    make = Column(String)
    model = Column(String)
    year = Column(Integer)

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            if hasattr(self, k):
                setattr(self, k, v)
            else:
                engine.execute("ALTER TABLE %s ADD COLUMN %s" % (self.__tablename__, k))
                setattr(self.__class__, k, Column(k, String))
                setattr(self, k, v)

This might work ... maybe ... I am not entirely sure; I did not test it.
A better solution would be to use a relational table:
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship

class Attribs(Base):
    __tablename__ = 'Attribs'

    id = Column(Integer, primary_key=True)
    listing_id = Column(Integer, ForeignKey("Listings.id"))
    name = Column(String)
    val = Column(String)

class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    attributes = relationship("Attribs", backref="listing")

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            self.attributes.append(Attribs(name=k, val=v))

    def __str__(self):
        return "\n".join(["A LISTING"] + ["%s:%s" % (a.name, a.val) for a in self.attributes])
Another solution would be to store JSON:

import json

class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    data = Column(String)

    def __init__(self, **kwargs):
        self.data = json.dumps(kwargs)
        self.data_dict = kwargs
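
A variant of the JSON idea (my addition, not part of the original answer) is SQLAlchemy's built-in JSON column type, available in recent SQLAlchemy versions, which serializes and deserializes the dict for you:

from sqlalchemy import Column, Integer, JSON

class Listing(Base):
    __tablename__ = 'Listings'

    id = Column(Integer, primary_key=True)
    # SQLAlchemy handles the json.dumps/json.loads round trip;
    # listing.data comes back as a plain dict.
    data = Column(JSON, nullable=False, default=dict)

    def __init__(self, **kwargs):
        self.data = kwargs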
The best solution would be to use a NoSQL key/value store (maybe even just a simple JSON file, or perhaps shelve, or even pickle, I guess).
