Automatically truncate strings in sqlalchemy's ORM (postgresql database)

How can I automatically truncate string values in a data model across many attributes, without explicitly defining a @validates method for each one?
My current code:
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import validates

class MyModel:
    __tablename__ = 'my_model'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(40), nullable=False, unique=True)

    # I can "force" truncation to my model using "validates"
    # I'd prefer not to use this solution though...
    @validates('name')
    def validate_code(self, key, value):
        max_len = getattr(self.__class__, key).prop.columns[0].type.length
        if value and len(value) > max_len:
            value = value[:max_len]
        return value
My concern is that my ORM will span many tables and fields and there's a high risk of oversight in including attributes in string length validation. In simpler words, I need a solution that'll scale. Ideally, something in my session configuration which'll automatically truncate strings that are too long...

You could create a customised String type that automatically truncates its value whenever it is sent to the database (inserts and updates alike).
import sqlalchemy.types as types
from sqlalchemy import Column, Integer

class LimitedLengthString(types.TypeDecorator):
    impl = types.String

    def process_bind_param(self, value, dialect):
        # None is stored as NULL and cannot be sliced, so pass it through
        if value is None:
            return None
        return value[:self.impl.length]

    def copy(self, **kwargs):
        return LimitedLengthString(self.impl.length)

class MyModel:
    __tablename__ = 'my_model'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(LimitedLengthString(40), nullable=False, unique=True)
The extended type still creates VARCHAR(40) in the database, so it should be possible to replace String(40) with LimitedLengthString(40)* in your code without a database migration.
* You might want to choose a shorter name.
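One way to check the behaviour is to round-trip an over-long value through an in-memory SQLite database. The sketch below is a minimal example under a few assumptions that are not part of the answer above: SQLAlchemy 1.4 or later, a hypothetical Demo model with a 5-character column, and cache_ok = True to opt in to statement caching.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import TypeDecorator


class LimitedLengthString(TypeDecorator):
    """String type that truncates bound values to the declared length."""

    impl = String
    cache_ok = True  # assumption: opts in to SQLAlchemy 1.4+ statement caching

    def process_bind_param(self, value, dialect):
        # Pass None (NULL) through untouched; slice everything else.
        if value is None:
            return None
        return value[: self.impl.length]


Base = declarative_base()


class Demo(Base):  # hypothetical model, for illustration only
    __tablename__ = "demo"
    id = Column(Integer, primary_key=True)
    name = Column(LimitedLengthString(5))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Demo(name="abcdefghij"))  # 10 characters, column allows 5
    session.commit()
    # Read the value back from the database rather than from the object.
    stored = session.query(Demo.name).scalar()
```

Note that the Python attribute keeps the full string until the object is refreshed from the database; only the bound parameter is truncated.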

Related

SQLAlchemy unique constrain by field

I have a UniqueConstraint on a field, but it won't let me add multiple entries (two is the max!)
from sqlalchemy import Column, Integer, String, Boolean, UniqueConstraint

class Cart(SqlAlchemyBase):
    __tablename__ = 'cart'
    __table_args__ = (UniqueConstraint('is_latest'), {})
    sid = Column(Integer, primary_key=True)
    is_latest = Column(Boolean, index=True, nullable=False)
    name = Column(String)
I would like to support more entries, so that one name can have two variants:
name=foo, is_latest=True
name=foo, is_latest=False
name=bar, is_latest=True
name=bar, is_latest=False
but then reject any subsequent attempt to write name=foo (or bar) and is_latest=True

What you are trying to achieve here is a type 2 slowly changing dimension (SCD). This topic has been discussed extensively, and I encourage you to look it up.
When I look at your table, you seem to use sid as a surrogate key, but I fail to see what the natural key is and what will be updated as time goes on.
Anyway, there are several ways to achieve an SCD type 2 result without needing to worry about your check. The simplest, in my mind, is to keep adding records with your natural key and, when querying, select only the one with the highest surrogate key (an autoincrementing integer). There is no need to enforce uniqueness on the current row, as only the latest value is fetched.
There are examples for versioning rows in the SQLAlchemy docs, but since websites come and go, I'll put a simplified draft of the above approach here.
from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import Session, attributes, make_transient

class VersionedItem(Versioned, Base):
    id = Column(Integer, primary_key=True)  # surrogate key
    sku = Column(String, index=True)  # natural key
    price = Column(Integer)  # the value that changes with time

@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for instance in session.dirty:
        if not (
            isinstance(instance, VersionedItem)
            and session.is_modified(instance)
            and attributes.instance_state(instance).has_identity
        ):
            continue
        make_transient(instance)  # remove db identity from instance
        instance.id = None  # remove surrogate key
        session.add(instance)  # insert instance as new record
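The query side of this approach (fetch only the row with the highest surrogate key for each natural key) could look roughly like the following. This is a sketch, not the answer's own code: ItemVersion is a hypothetical stand-in for VersionedItem without the Versioned mixin, and it assumes SQLAlchemy 1.4 or later with an in-memory SQLite database.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ItemVersion(Base):  # hypothetical model, for illustration only
    __tablename__ = "item_version"
    id = Column(Integer, primary_key=True)  # surrogate key
    sku = Column(String, index=True)        # natural key
    price = Column(Integer)                 # value that changes over time


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Initial versions of two items.
    session.add_all([ItemVersion(sku="A", price=10), ItemVersion(sku="B", price=20)])
    session.commit()
    # A later version of item "A" is simply another row.
    session.add(ItemVersion(sku="A", price=12))
    session.commit()

    # Highest surrogate key per natural key identifies the current version.
    latest_ids = select(func.max(ItemVersion.id)).group_by(ItemVersion.sku)
    current = session.execute(
        select(ItemVersion).where(ItemVersion.id.in_(latest_ids))
    ).scalars().all()
    latest_prices = {item.sku: item.price for item in current}
```

Older versions stay in the table as history, and no uniqueness constraint on a "latest" flag is needed.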

Looks like a Partial Unique Index can be used:
from sqlalchemy import Boolean, Column, Index, Integer, String

class Cart(SqlAlchemyBase):
    __tablename__ = 'cart'
    id = Column(Integer, primary_key=True)
    cart_id = Column(Integer)
    is_latest = Column(Boolean, default=False)
    name = Column(String)
    __table_args__ = (
        Index('only_one_latest_cart', name, is_latest,
              unique=True,
              postgresql_where=(is_latest)),
    )
name=foo, is_latest = True
name=foo, is_latest = False
name=bar, is_latest = False
name=bar, is_latest = False
And when adding another name=foo, is_latest = True
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "only_one_latest_cart"
DETAIL: Key (name, is_latest)=(foo, t) already exists.
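To try the partial unique index without a PostgreSQL server, the same idea can be sketched against in-memory SQLite, which also supports partial indexes. The assumptions here are SQLAlchemy 1.4 or later, a plain declarative Base in place of SqlAlchemyBase, and the sqlite_where argument mirroring postgresql_where. The index covers name only, which should be enough because the WHERE clause already restricts it to rows where is_latest is true.

```python
from sqlalchemy import Boolean, Column, Index, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Cart(Base):
    __tablename__ = "cart"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    is_latest = Column(Boolean, default=False, nullable=False)
    __table_args__ = (
        Index(
            "only_one_latest_cart",
            name,
            unique=True,
            postgresql_where=is_latest,  # used when targeting PostgreSQL
            sqlite_where=is_latest,      # used by this SQLite demo
        ),
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # One latest row and any number of non-latest rows are accepted.
    session.add_all([
        Cart(name="foo", is_latest=True),
        Cart(name="foo", is_latest=False),
        Cart(name="foo", is_latest=False),
    ])
    session.commit()

    # A second latest row for the same name violates the partial index.
    session.add(Cart(name="foo", is_latest=True))
    try:
        session.commit()
        rejected = False
    except IntegrityError:
        session.rollback()
        rejected = True

    row_count = session.query(Cart).count()
```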

Flask-SQLAlchemy serializable objects with integer, float and boolean types in JSON

I am using Flask 1.1.2 and Flask-SQLAlchemy 2.4.4 to create a RESTful service.
I want all of my model classes to be serializable, so I have written a base class that my models derive from. In my Serializable class, the method as_dict() returns the string value of each column. However, I want to modify the method so that:
Integers and floats are returned as numbers (not their string versions)
Boolean values are returned as valid JSON (i.e. true, false)
This is my code so far:
class Serializable():
    def as_dict(self):
        return {c.name: str(getattr(self, c.name)) for c in self.__table__.columns}

class DataFrequency(db.Model, Serializable):
    __tablename__ = 'data_frequency'
    __table_args__ = (
        db.CheckConstraint('period > 0'),
    )
    id = db.Column(db.Integer, primary_key=True, server_default=text("nextval('data_frequency_id_seq'::regclass)"))
    period = db.Column(db.Integer, nullable=False)
    name = db.Column(db.Text, nullable=False, unique=True)
How may I modify the as_dict() method to correctly handle Integer (and its variants), Float (and its variants) and Boolean column types?

You can use column.type.python_type to cast the column value. For example, c.type.python_type("1") returns 1 rather than "1" when the column type is Integer.
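Applied to as_dict(), that suggestion might look like the sketch below. It uses plain SQLAlchemy (1.4 or later assumed) instead of Flask-SQLAlchemy so that it runs standalone, and the ratio and active columns are hypothetical additions to show float and boolean handling. Types whose python_type cannot be called with a single value (dates, for instance) would need separate handling.

```python
import json

from sqlalchemy import Boolean, Column, Float, Integer, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Serializable:
    def as_dict(self):
        result = {}
        for c in self.__table__.columns:
            value = getattr(self, c.name)
            if value is not None:
                # Cast to the column's Python type (int, float, bool, str, ...).
                value = c.type.python_type(value)
            result[c.name] = value
        return result


class DataFrequency(Base, Serializable):
    __tablename__ = "data_frequency"
    id = Column(Integer, primary_key=True)
    period = Column(Integer, nullable=False)
    ratio = Column(Float)     # hypothetical column, for illustration
    active = Column(Boolean)  # hypothetical column, for illustration
    name = Column(Text, nullable=False)


# period is given as a string on purpose to show the cast.
row = DataFrequency(id=1, period="7", ratio=0.5, active=True, name="weekly")
as_dict = row.as_dict()
payload = json.dumps(as_dict)
```

Because json.dumps receives real int, float and bool values, numbers are emitted unquoted and booleans as true/false.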

Change SQLAlchemy Primary Key after it has been defined

Problem: Simply put, I am trying to redefine a SQLAlchemy ORM table's primary key after it has already been defined.
Example:
class Base:
    @declared_attr
    def __tablename__(cls):
        return f"{cls.__name__}"

    @declared_attr
    def id(cls):
        return Column(Integer, cls.seq, unique=True,
                      autoincrement=True, primary_key=True)

Base = declarative_base(cls=Base)

class A_Table(Base):
    newPrimaryKeyColumnsDerivedFromAnotherFunction = []
    # Please Note: as the variable name tries to say,
    # these columns are auto-generated and not known until after all
    # ORM classes (models) are defined

# OTHER CLASSES

def changePriKeyFunc(model):
    pass  # DO STUFF

# Then do
Base.metadata.create_all(bind=arbitraryEngine)
# After everything has been altered and tied into a little bow
*Please note, this is a simplification of the true problem I am trying to solve.
Possible Solution: Your first thought might have been to do something like this:
def possibleSolution(model):
    for pricol in model.__table__.primary_key:
        pricol.primary_key = False
    model.__table__.primary_key = PrimaryKeyConstraint(
        *model.newPrimaryKeyColumnsDerivedFromAnotherFunction,
        # TODO: ADD all the columns that are in the model that are also a primary key
        # *[col for col in model.__table__.c if col.primary_key]
    )
But, this doesn't work, because when trying to add, flush, and commit, an error gets thrown:
InvalidRequestError: Instance <B_Table at 0x104aa1d68> cannot be refreshed -
it's not persistent and does not contain a full primary key.
Even though this:
In [2]: B_Table.__table__.primary_key
Out[2]: PrimaryKeyConstraint(Column('a_TableId', Integer(),
ForeignKey('A_Table.id'), table=<B_Table>,
primary_key=True, nullable=False))
as well as this:
In [3]: B_Table.__table__
Out[3]: Table('B_Table', MetaData(bind=None),
Column('id', Integer(), table=<B_Table>, nullable=False,
default=Sequence('test_1', start=1, increment=1,
metadata=MetaData(bind=None))),
Column('a_TableId', Integer(),
ForeignKey('A_Table.id'), table=<B_Table>,
primary_key=True, nullable=False),
schema=None)
and finally:
In [5]: b.a_TableId
Out[5]: 1
Also note that the database actually reflects the changed (and true) primary key, so I know that there's something going on with the ORM/SQLAlchemy.
Question: In summary, how can I change the model's primary key after the model has already been defined?
edit: See below for full code (same type of error, just in SQLite)
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declared_attr, declarative_base
from sqlalchemy.schema import PrimaryKeyConstraint
from sqlalchemy import Sequence, create_engine

class Base:
    @declared_attr
    def __tablename__(cls):
        return f"{cls.__name__}"

    @declared_attr
    def seq(cls):
        return Sequence("test_1", start=1, increment=1)

    @declared_attr
    def id(cls):
        return Column(Integer, cls.seq, unique=True, autoincrement=True, primary_key=True)

Base = declarative_base(cls=Base)

def relate(model, x):
    """Model is the original class, x is what class needs to be as
    an attribute for model"""
    attributeName = x.__tablename__
    idAttributeName = "{}Id".format(attributeName)
    setattr(model, idAttributeName,
            Column(ForeignKey(x.id)))
    setattr(model, attributeName,
            relationship(x,
                         foreign_keys=getattr(model, idAttributeName),
                         primaryjoin=getattr(
                             model, idAttributeName) == x.id,
                         remote_side=x.id
                         )
            )
    return model.__table__.c[idAttributeName]

def possibleSolution(model):
    if len(model.defined):
        newPriCols = []
        for x in model.defined:
            newPriCols.append(relate(model, x))
        for priCol in model.__table__.primary_key:
            priCol.primary_key = False
            priCol.nullable = True
        model.__table__.primary_key = PrimaryKeyConstraint(
            *newPriCols
            # TODO: ADD all the columns that are in the model that are also a primary key
            # *[col for col in model.__table__.c if col.primary_key]
        )

class A_Table(Base):
    pass

class B_Table(Base):
    defined = [A_Table]

possibleSolution(B_Table)
engine = create_engine('sqlite://')
Base.metadata.create_all(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
a = A_Table()
b = B_Table(A_TableId=a.id)
print(B_Table.__table__.primary_key)
session.add(a)
session.commit()
session.add(b)
session.commit()

Originally, the error you say the PK reassignment is causing is:
InvalidRequestError: Instance <B_Table at 0x104aa1d68> cannot be refreshed -
it's not persistent and does not contain a full primary key.
I don't get that when running your MCVE; instead, I get a pretty helpful warning first:
SAWarning: Column 'B_Table.A_TableId' is marked as a member of the
primary key for table 'B_Table', but has no Python-side or server-side
default generator indicated, nor does it indicate 'autoincrement=True'
or 'nullable=True', and no explicit value is passed. Primary key
columns typically may not store NULL.
And a very detailed exception message when the script fails:
sqlalchemy.orm.exc.FlushError: Instance has
a NULL identity key. If this is an auto-generated value, check that
the database table allows generation of new primary key values, and
that the mapped Column object is configured to expect these generated
values. Ensure also that this flush() is not occurring at an
inappropriate time, such as within a load() event.
So assuming that the example accurately describes your problem, the answer is straightforward. A primary key cannot be null.
A_Table inherits from Base:
class A_Table(Base):
    pass
Base gives A_Table an autoincrement PK through the declared_attr id():
@declared_attr
def id(cls):
    return Column(Integer, cls.seq, unique=True, autoincrement=True, primary_key=True)
Similarly, B_Table is defined off Base but the PK is overwritten in possibleSolution() such that it becomes a ForeignKey to A_Table:
PrimaryKeyConstraint(Column('A_TableId', Integer(), ForeignKey('A_Table.id'), table=<B_Table>, primary_key=True, nullable=False))
Then, we instantiate an instance of A_Table without any kwargs and immediately allocate the id attribute of instance a to field A_TableId when constructing b:
a = A_Table()
b = B_Table(A_TableId=a.id)
At this point we can stop and inspect the attribute values of each:
print(a.id, b.A_TableId)
# None None
a.id is None because it's an autoincrement value that is populated by the database, not the ORM, so SQLAlchemy doesn't know its value until the instance has been flushed to the database.
So what happens if we include a flush() operation after adding instance a to the session:
a = A_Table()
session.add(a)
session.flush()
b = B_Table(A_TableId=a.id)
print(a.id, b.A_TableId)
# 1 1
So by issuing the flush first, we've got a value for a.id, meaning that we also have a value for b.A_TableId.
session.add(b)
session.commit()
# no error
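An alternative to flushing by hand is to link the two objects through a relationship() and let the unit of work copy the generated key into the foreign key column during the flush. The following is a minimal sketch of that idea with hypothetical Parent and Child models (not the A_Table/B_Table setup from the question), assuming SQLAlchemy 1.4 or later and in-memory SQLite.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):  # hypothetical model, for illustration only
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)


class Child(Base):  # hypothetical model whose primary key is a foreign key
    __tablename__ = "child"
    parent_id = Column(Integer, ForeignKey("parent.id"), primary_key=True)
    parent = relationship("Parent")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent()
    child = Child(parent=parent)  # no id is known yet at this point
    id_before_flush = parent.id

    session.add_all([parent, child])
    # On flush, parent is inserted first and its new id is copied
    # into child.parent_id before child is inserted.
    session.commit()
    ids = (parent.id, child.parent_id)
```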

SQLAlchemy - select data associated with foreign key, NOT the foreign key itself

I have two tables, Name and Person.
Name:
id (int, primary key)
name (varchar)
Person:
id (int, primary key)
name_id (int, foreign key->Name.id)
Assuming my models are set up with the foreign keys, if I run Person.query.first().name_id, this will return an integer. I want it to return the name varchar. Is this possible? Or is there something I can do to get the same result?

class Name(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text)  # or varchar

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name_id = db.Column(db.Integer, db.ForeignKey('name.id'))
    _name = db.relationship('Name')

    @property
    def name(self):
        return self._name.name
Getting the actual name back could be done like this or with a select in the name function. I prefer this way, with a property. You'll need to fill in the details of how you are using joins in the relationship.
You could change @property to @hybrid_property to get some neat functionality from SQLAlchemy, such as using the attribute in queries. Of course, you need to use it effectively.
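As a rough illustration of the hybrid_property variant, the sketch below lets Person.name work both on instances and in query filters. It is written against plain SQLAlchemy (1.4 or later assumed) rather than Flask-SQLAlchemy so that it runs standalone, and the correlated scalar subquery in the expression is one possible choice, not something the answer prescribes.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine, select
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Name(Base):
    __tablename__ = "name"
    id = Column(Integer, primary_key=True)
    name = Column(Text)


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name_id = Column(Integer, ForeignKey("name.id"))
    _name = relationship("Name")

    @hybrid_property
    def name(self):
        # Instance level: follow the relationship to the Name row.
        return self._name.name if self._name is not None else None

    @name.expression
    def name(cls):
        # Class level: correlated scalar subquery, usable in filters.
        return select(Name.name).where(Name.id == cls.name_id).scalar_subquery()


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Person(_name=Name(name="Ada")),
        Person(_name=Name(name="Grace")),
    ])
    session.commit()

    names = sorted(person.name for person in session.query(Person))
    matches = session.query(Person).filter(Person.name == "Grace").count()
```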

SQLAlchemy column synonym with different type

I'm using the SQLAlchemy recipe here to magically JSON encode/decode a column from the DB in my model like:
class Thing(Base):
    __tablename__ = 'things'
    id = Column(Integer(), primary_key=True)
    data = Column(JSONEncodedDict)
I hit a snag when I wanted to create an extra "raw_data" field in my model to access the same underlying JSON data, but without encoding/decoding it:
raw_data = Column("data", VARCHAR)
SQLAlchemy seems to get confused by the name collision and leave one column un-mapped. Is there any way I can convince SQLAlchemy to actually map both attributes to the same column?

I would just define the raw_data column through SQLAlchemy and then use Python's property/setter to make transparent use of data. For example:
import json

class Thing(Base):
    __tablename__ = 'things'
    id = Column(Integer(), primary_key=True)
    raw_data = Column(String())

    @property
    def data(self):
        # add some checking here too
        return json.loads(self.raw_data)

    @data.setter
    def data(self, value):
        # ditto
        self.raw_data = json.dumps(value)
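A short usage sketch of this pattern follows. It additionally assumes that the database column should keep the name "data" from the question, which can be done by passing the column name explicitly while the mapped attribute is called raw_data. SQLAlchemy 1.4 or later is assumed, and no database connection is needed to see the two views of the same value.

```python
import json

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Thing(Base):
    __tablename__ = "things"
    id = Column(Integer, primary_key=True)
    # Attribute is raw_data; the underlying column is still named "data".
    raw_data = Column("data", String)

    @property
    def data(self):
        # Decoded view of the stored JSON text.
        return json.loads(self.raw_data)

    @data.setter
    def data(self, value):
        # Encode on assignment so raw_data always holds JSON text.
        self.raw_data = json.dumps(value)


thing = Thing(data={"a": 1})
raw_text = thing.raw_data     # the JSON string as stored
decoded = thing.data          # the decoded Python object
column_names = list(Thing.__table__.c.keys())
```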
