Aim: to have full cascade save/load/delete with SQLAlchemy from the root entity (scenario) through Segment and then Timeseries.
Question: given a parent class, BaseSegment, with multiple properties that link to a child class, Timeseries, whose table links back with segment_id, how do I configure a backref?
class Timeseries(Base):
    __tablename__ = "timeseries"
    __table_args__ = {"schema": "public"}
    id = Column(UUID(as_uuid=True), primary_key=True)
    # values etc.
    segment_id = Column(UUID(as_uuid=True), ForeignKey("public.segments.id"))

class BaseSegment(Base):
    __tablename__ = "segments"
    __table_args__ = {"schema": "public"}
    id = Column(UUID(as_uuid=True), primary_key=True)
    actual_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))
    actual_effect = relationship(
        Timeseries,
        backref="segment",
        foreign_keys=[actual_effect_id],
        remote_side=Timeseries.segment_id,
    )
    predicted_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))
    predicted_effect = relationship(
        Timeseries,
        backref="segment",
        foreign_keys=[predicted_effect_id],
        remote_side=Timeseries.segment_id,
    )
Crashes like this:
sqlalchemy.exc.InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'mapped class BaseSegment->segments'. Original exception was: Relationship BaseSegment.actual_effect could not determine any unambiguous local/remote column pairs based on join condition and remote_side arguments. Consider using the remote() annotation to accurately mark those elements of the join condition that are on the remote side of the relationship.
In particular, when I have remote_side=Timeseries.segment_id I get this crash.
If I remove part of the config above, then instead of the Session cascade-writing from segment -> timeseries, I get this error*:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "segment_id" of relation "timeseries" violates not-null constraint
Furthermore, if I generate the UUID id from the calling Python code, the DB complains that the segment's id doesn't exist (yet!)**.
I can't just do a back_populates because I don't know whether the Timeseries is for a predicted_effect or actual_effect. How do I model this?
*Details, full error:
E sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "segment_id" of relation "timeseries" violates not-null constraint
E DETAIL: Failing row contains (b2dff54f-ff1c-4457-8cf1-0783ea2acdbb, effect_var, null, revenue, EUR, Resulting amount revenue per day, {45753.00000000003,63875.00000000002,12731.999999999998,56585,37..., datetime64[ns], D, null, null, null, revenue, null).
E
E [SQL: INSERT INTO public.timeseries (id, type, variable_name, variable_unit, variable_description, values, dtype, freq, segment_id, funnel_level, treatment, behavioural_segment_id, effect) VALUES (%(id)s, %(type)s, %(variable_name)s, %(variable_unit)s, %(variable_description)s, %(values)s::FLOAT[], %(dtype)s, %(freq)s, %(segment_id)s, %(funnel_level)s, %(treatment)s, %(behavioural_segment_id)s, %(effect)s)]
E [parameters: {'id': UUID('b2dff54f-ff1c-4457-8cf1-0783ea2acdbb'), 'type': 'effect_var', 'variable_name': 'revenue', 'variable_unit': 'EUR', 'variable_description': 'Resulting amount revenue per day', 'values': [45753.00000000003, 63875.00000000002, 12731.999999999998, 56585.0, 37048.99999999997, 19938.999999999985, 28346.999999999985], 'dtype': 'datetime64[ns]', 'freq': 'D', 'segment_id': None, 'funnel_level': None, 'treatment': None, 'behavioural_segment_id': None, 'effect': 'revenue'}]
**For the client-generated UUID:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "timeseries" violates foreign key constraint "timeseries_segment_id_fkey"
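The ambiguity comes from declaring foreign keys on both sides of the join (actual_effect_id / predicted_effect_id on segments plus segment_id on timeseries) and then pointing remote_side at a column that belongs to neither relationship's join. Below is a minimal sketch, assuming SQLAlchemy 1.4+ (backref names beyond those in the question are illustrative), that keeps the FKs on the parent side only, gives each relationship its own foreign_keys, and uses a distinct backref name per relationship, since reusing backref="segment" twice is itself a mapper configuration error:

from uuid import uuid4

from sqlalchemy import Column, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Timeseries(Base):
    __tablename__ = "timeseries"
    __table_args__ = {"schema": "public"}
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)

class BaseSegment(Base):
    __tablename__ = "segments"
    __table_args__ = {"schema": "public"}
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    actual_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))
    predicted_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))

    # One many-to-one relationship per FK column; distinct backref names
    # keep the two joins unambiguous.
    actual_effect = relationship(
        Timeseries, foreign_keys=[actual_effect_id], backref="segment_as_actual"
    )
    predicted_effect = relationship(
        Timeseries, foreign_keys=[predicted_effect_id], backref="segment_as_predicted"
    )

With the child-side segment_id column removed, the default save-update cascade flushes each Timeseries before the segment row that references it, avoiding both the NOT NULL and the FK-ordering errors above.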
Related
I am trying to update one item at a time using the Django ORM with TimescaleDB as my database.
I have a TimescaleDB hypertable defined by the following model:
class RecordTimeSeries(models.Model):
    # NOTE: We have removed the primary key (unique constraint) manually, since we don't want an id column
    timestamp = models.DateTimeField(primary_key=True)
    location = PointField(srid=settings.SRID, null=True)
    station = models.ForeignKey(Station, on_delete=models.CASCADE)
    # This is a ForeignKey and not a OneToOneField because of [this](https://stackoverflow.com/questions/61205063/error-cannot-create-a-unique-index-without-the-column-date-time-used-in-part)
    record = models.ForeignKey(Record, null=True, on_delete=models.CASCADE)
    temperature_celsius = models.FloatField(null=True)

    class Meta:
        unique_together = (
            "timestamp",
            "station",
            "record",
        )
When I update the item using save():
record_time_series = models.RecordTimeSeries.objects.get(
    record=record,
    timestamp=record.timestamp,
    station=record.station,
)
record_time_series.location = record.location
record_time_series.temperature_celsius = temperature_celsius
record_time_series.save()
I get the following error:
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "5_69_db_recordtimeseries_timestamp_station_id_rec_0c66b9ab_uniq"
DETAIL: Key ("timestamp", station_id, record_id)=(2022-05-25 09:15:00+00, 2, 2) already exists.
and I see that the query Django used is the following:
{'sql': 'UPDATE "db_recordtimeseries" SET "location" = NULL, "station_id" = 2, "record_id" = 2, "temperature_celsius" = 26.0 WHERE "db_recordtimeseries"."timestamp" = \'2022-05-25T09:15:00\'::timestamp', 'time': '0.007'}
On the other hand the update is successful with update():
record_time_series = models.RecordTimeSeries.objects.filter(
    record=record,
    timestamp=record.timestamp,
    station=record.station,
)
record_time_series.update(
    location=record.location,
    temperature_celsius=temperature_celsius,
)
and the SQL used by Django is:
{'sql': 'UPDATE "db_recordtimeseries" SET "location" = NULL, "temperature_celsius" = 25.0 WHERE ("db_recordtimeseries"."record_id" = 2 AND "db_recordtimeseries"."station_id" = 2 AND "db_recordtimeseries"."timestamp" = \'2022-05-25T09:15:00\'::timestamp)', 'time': '0.012'}
Obviously, the first query is wrong because it does not have the correct parameters in the WHERE clause. But why doesn't Django include those parameters, since timestamp is not a unique key on its own, and how can this be fixed?
I think the error was caused by the foreign key:
Firstly:
Be aware that the update() method is converted directly to an SQL statement. It is a bulk operation for direct updates. It doesn't run any save() methods on your models, or emit the pre_save or post_save signals (which are a consequence of calling save()), or honor the auto_now field option.
source
Secondly:
Analogous to ON DELETE there is also ON UPDATE which is invoked when a referenced column is changed (updated). The possible actions are the same. In this case, CASCADE means that the updated values of the referenced column(s) should be copied into the referencing row(s).
source
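Worth adding, since the logged SQL above makes it visible: Model.save() always builds its UPDATE's WHERE clause from the field declared as the primary key, which here is timestamp alone, so the statement can match and rewrite every row sharing that timestamp. A hedged sketch, reusing record_time_series from the question, of why narrowing the SET clause is not enough:

# update_fields narrows SET to these columns, but Django still targets
# rows by the declared primary key (timestamp alone), so other stations'
# rows with the same timestamp could still be rewritten. For this
# composite-key hypertable, the filter(...).update(...) form above
# remains the safer choice.
record_time_series.save(update_fields=["location", "temperature_celsius"])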
Using Python and SQLAlchemy, is it possible to insert a None / NIL_UUID / NULL value into a PostgreSQL foreign key column that links to a primary key, both stored as UUID?
Passing None returns column "none" does not exist:
statement = "INSERT INTO tb_person (pk_person, first_name, last_name, fk_person_parent) VALUES ('9ce131...985
fea06', 'John', 'Doe', None)"
parameters = {}, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7fbff5ea2730>
def do_execute(self, cursor, statement, parameters, context=None):
> cursor.execute(statement, parameters)
E psycopg2.errors.UndefinedColumn: column "none" does not exist
E LINE 1: '9ce131...985','John', 'Doe', None)
E ^
E HINT: Perhaps you meant to reference the column "tb_person.last_name".
../../.local/share/virtualenvs/project/lib/python3.8/site-packages/sqlalchemy/engine/default.py:593: UndefinedColumn
A NIL_UUID (i.e. a valid UUID formed of all zeros) returns psycopg2.errors.ForeignKeyViolation:
E psycopg2.errors.ForeignKeyViolation: insert or update on table "tb_person" violates foreign key constraint "tb_person_fk_person_parent_fkey"
E DETAIL: Key (fk_person_parent)=(00000000-0000-0000-0000-000000000000) is not present in table "tb_person".
MORE DETAILS
I use SQLAlchemy classical mapping (SQLAlchemy Core); my table is defined like this:
tb_person = Table(
    "tb_person",
    metadata,
    Column(
        "pk_person",
        UUID(as_uuid=True),
        default=uuid.uuid4,
        unique=True,
        nullable=False
    ),
    Column("first_name", String(255)),
    Column("last_name", String(255)),
    Column(
        "fk_person_parent", UUID(as_uuid=True),
        ForeignKey("tb_person.pk_person"),
        nullable=True
    )
)
The mapper is defined like this:
client_mapper = mapper(
    domain.model.Person,
    tb_person,
    properties={
        "child": relationship(domain.model.Person),
    },
)
The unit test works well when inserting a UUID that already exists in the database in the pk_person field.
I was using raw SQL, and as suggested by @AdrianKlaver it is much preferable to use params; that fixes the problem.
# In the context of this unit test,
# we have to handle UUID generation here
uuid_1 = uuid.uuid4()
uuid_2 = uuid.uuid4()
uuid_3 = uuid.uuid4()
session.execute(
    """
    INSERT INTO tb_person
        (pk_person, first_name, last_name, fk_person_parent)
    VALUES
        (:uuid_1, 'John', 'Doe', :uuid_none),
        (:uuid_2, 'Jean', 'Dupont', :uuid_none),
        (:uuid_3, 'Baby', 'Doe', :uuid_1)
    """,
    {"uuid_1": uuid_1, "uuid_2": uuid_2, "uuid_3": uuid_3, "uuid_none": None},
)
It effectively translates to NULL in the query.
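For comparison, a minimal sketch (not from the original post, reusing tb_person and uuid from the snippets above): with the Core insert() construct no hand-written placeholders are needed, and a Python None is bound as SQL NULL directly.

from sqlalchemy import insert

stmt = insert(tb_person).values(
    pk_person=uuid.uuid4(),
    first_name="John",
    last_name="Doe",
    fk_person_parent=None,  # bound as NULL, not as the string 'None'
)
session.execute(stmt)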
Using SQLAlchemy on PostgreSQL, I am trying to improve insertion performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *

metadata = MetaData()
node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)
edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer(), ForeignKey(node.c.id)),
    Column('target_id', Integer(), ForeignKey(node.c.id)),
)

engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)

e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],  # bindparam error
    # ['source_id', 'target_id', 'b_name'],  # key error
    # [edge.c.source_id, edge.c.target_id, edge.c.name],  # bindparam error
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1)
EDIT: Below is the SQL query I expected to be produced. I remain open to any suggestions to achieve my purpose, though.
CREATE TABLE node (
id SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE edge (
id SERIAL PRIMARY KEY,
source_id INTEGER REFERENCES node (id),
target_id INTEGER REFERENCES node (id),
name VARCHAR
);
WITH source AS (
INSERT INTO node (name)
VALUES ('e1_source')
RETURNING id
), target as (
INSERT INTO node (name)
VALUES ('e1_target')
RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
I have finally figured out where bindparam was implicitly used by SQLAlchemy: in the node insert queries, not the edge query as I first thought. Each insert(node).values(name='...') CTE generates an implicit bind parameter named name, which collides with the name column of the outer edge insert; naming the binds explicitly avoids the collision.
But I am still not sure whether this is the proper way to perform nested insert queries with SQLAlchemy, or whether it will improve execution time.
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')
e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1, {
    'source_name': 'e1_source',
    'target_name': 'e1_target',
})
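On the execution-time question: each call above still compiles and round-trips once per edge. An untested idea, assuming the driver accepts it for this construct, is to reuse the same statement with an executemany-style list of parameter dicts so compilation happens once (the names below are illustrative):

# Each dict produces one edge plus its two nodes; psycopg2 runs this as
# an executemany-style loop over the already-compiled statement.
engine.execute(e1, [
    {'source_name': 'e2_source', 'target_name': 'e2_target'},
    {'source_name': 'e3_source', 'target_name': 'e3_target'},
])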
With the models below, why does the following interactive session succeed in adding duplicate associations to a relationship during the same transaction? I expected (and need) it to fail, given the UniqueConstraint placed on the association table.
Models:
from app import db  # this is the access to SQLAlchemy

LinkUserSizeShirtDressSleeve = db.Table(
    'link_user_size_shirt_dress_sleeve',
    db.Column(
        'size_id',
        db.Integer,
        db.ForeignKey('size_key_shirt_dress_sleeve.id'), primary_key=True),
    db.Column(
        'user_id',
        db.Integer,
        db.ForeignKey('user.id'), primary_key=True),
    db.UniqueConstraint('size_id', 'user_id', name='uq_association')
)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sz_shirt_dress_sleeve = db.relationship(
        'SizeKeyShirtDressSleeve',
        secondary=LinkUserSizeShirtDressSleeve,
        backref=db.backref('users', lazy='dynamic'),
        order_by="asc(SizeKeyShirtDressSleeve.id)")

class SizeKeyShirtDressSleeve(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    size = db.Column(db.Integer)

    def __repr__(self):
        return 'Dress shirt sleeve size: %r' % self.size
Because of the UniqueConstraint on the association table, I expected this interactive session to cause an IntegrityError. It doesn't and allows me to associate the same size twice:
>>> from app.models import User, SizeKeyShirtDressSleeve
>>> db.session.add(User(id=8))
>>> db.session.commit()
>>> u = User.query.filter(User.id==8).one()
>>> u
<User id: 8, email: None, password_hash: None>
>>> u.sz_shirt_dress_sleeve
[]
>>> should_cause_error = SizeKeyShirtDressSleeve.query.first()
>>> should_cause_error
Dress shirt sleeve size: 3000
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> u.sz_shirt_dress_sleeve
[Dress shirt sleeve size: 3000, Dress shirt sleeve size: 3000]
>>> db.session.commit()
>>>
Wait, what? Isn't that relationship representative of what is in my association table? I guess I should verify that:
(immediately after, same session)
>>> from app.models import LinkUserSizeShirtDressSleeve as Sleeve
>>> db.session.query(Sleeve).filter(Sleeve.c.user_id==8).all()
[(1, 8)]
>>>
So u.sz_shirt_dress_sleeve wasn't accurately representing the state of the association table. ...Okay. But I need it to. In fact, I know it will fail if I try to add another should_cause_error object to the relationship:
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> db.session.commit()
# huge stack trace
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: link_user_size_shirt_dress_sleeve.size_id, link_user_size_shirt_dress_sleeve.user_id [SQL: 'INSERT INTO link_user_size_shirt_dress_sleeve (size_id, user_id) VALUES (?, ?)'] [parameters: (1, 8)] (Background on this error at: http://sqlalche.me/e/gkpj)
>>>
Great! Okay, so things I'm inferring:
1) It's possible to have duplicate items in the relationship list.
2) It's possible for the relationship list to not accurately reflect the state of the association table it is responsible for.
3) The UniqueConstraint works ...as long as I continue interacting with the relationship in separate transactions (punctuated by session.commit()).
Questions: Are 1), 2), or 3) incorrect? And how can I prevent duplicate items being present in my relationship list inside the same transaction?
Those three things are all correct. 3) should be qualified: the UniqueConstraint always works in the sense that your database will never be inconsistent; it just doesn't give you an error unless the relationship you're adding is already flushed.
The fundamental reason this happens is an impedance mismatch between an association table in SQL and its representation in SQLAlchemy. A table in SQL is a multiset of tuples, so with that UNIQUE constraint, your LinkUserSizeShirtDressSleeve table is a set of (size_id, user_id) tuples. On the other hand, the default representation of a relationship in SQLAlchemy is an ordered list of objects, but it imposes some limitations on the way it maintains this list and the way it expects you to interact with it, so it behaves more like a set in some ways. In particular, it silently ignores duplicate entries in your association table (if you happen to not have a UNIQUE constraint), and it assumes that you never add duplicate objects to the list in the first place.
If this is a problem for you, just make the behavior more in line with SQL by using collection_class=set on your relationship. If you want an error to be raised when you add duplicate entries into the relationship, create a custom collection class based on set that fails on duplicate adds. In some of my projects, I've resorted to monkey-patching the relationship constructor to set collection_class=set on all of my relationships to make this less verbose.
Here's how I would write such a custom collection class:
class UniqueSet(set):
    def add(self, el):
        if el in self:
            raise ValueError("Value already exists")
        super().add(el)
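Hypothetical usage against the question's model (order_by is dropped because a set has no order): passing the collection class to the relationship makes a duplicate append fail immediately, in the session, before any flush.

sz_shirt_dress_sleeve = db.relationship(
    'SizeKeyShirtDressSleeve',
    secondary=LinkUserSizeShirtDressSleeve,
    collection_class=UniqueSet,
    backref=db.backref('users', lazy='dynamic'))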
I have a legacy database that creates default values for several columns using a variety of stored procedures. It would be more or less prohibitive to try to track down the names and add queries to my code, not to mention a maintenance nightmare.
What I would like is to be able to tell SQLAlchemy to ignore the columns that I don't really care about. Unfortunately, it doesn't; instead it provides null values that violate the DB constraints.
Here's an example of what I mean:
import sqlalchemy as sa
import logging
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

l = logging.getLogger('sqlalchemy.engine')
l.setLevel(logging.INFO)
l.addHandler(logging.StreamHandler())

engine = sa.create_engine('postgresql+psycopg2://user@host:port/dbname')
Session = sessionmaker(bind=engine)
session = Session()

temp_metadata = sa.MetaData(schema='pg_temp')
TempBase = declarative_base(metadata=temp_metadata)

with session.begin(subtransactions=True):
    session.execute('''
        CREATE TABLE pg_temp.whatevs (
            id serial
            , fnord text not null default 'fnord'
            , value text not null
        );
        INSERT INTO pg_temp.whatevs (value) VALUES ('something cool');
    ''')

class Whatever(TempBase):
    __tablename__ = 'whatevs'
    id = sa.Column('id', sa.Integer, primary_key=True, autoincrement=True)
    fnord = sa.Column('fnord', sa.String)
    value = sa.Column('value', sa.String)

w = Whatever(value='something cool')
session.add(w)
This barfs, because:
INSERT INTO pg_temp.whatevs (fnord, value) VALUES (%(fnord)s, %(value)s) RETURNING pg_temp.whatevs.id
{'fnord': None, 'value': 'something cool'}
ROLLBACK
Traceback (most recent call last):
File "/home/wayne/.virtualenvs/myenv/lib64/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/home/wayne/.virtualenvs/myenv/lib64/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
psycopg2.IntegrityError: null value in column "fnord" violates not-null constraint
DETAIL: Failing row contains (2, null, something cool).
What I expected is that it would just skip out on the fnord column, since it didn't get set.
Even if I do:
w = Whatever()
w.value = 'this breaks too'
or add:
def __init__(self, value):
    self.value = value
to the Whatever class... still no dice.
How can I tell sqlalchemy that "look, these other columns are fine, I know I'm not providing a value - the database is going to take care of that for me. It's okay, just don't worry about these columns"?
The only way I'm aware of is to futz with the class definition and lie, saying those columns don't exist... but I do actually want them to come in on queries.
Add a server side default with server_default for fnord:
class Whatever(TempBase):
    __tablename__ = 'whatevs'
    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    fnord = sa.Column(sa.String, nullable=False, server_default='fnord')
    value = sa.Column(sa.String, nullable=False)
SQLAlchemy quite happily lets the default do its thing server side, if just told about it. If you have columns that do not have a default set in the DDL, but through triggers, stored procedures, or the like, have a look at FetchedValue.
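A short sketch of the FetchedValue variant (hypothetical table name, reusing sa and TempBase from above), for columns populated by triggers or stored procedures rather than a DDL default expression:

from sqlalchemy.schema import FetchedValue

class WhateverTriggered(TempBase):
    __tablename__ = 'whatevs_triggered'
    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    # FetchedValue tells SQLAlchemy the value is generated server-side,
    # so the column is omitted from the INSERT.
    fnord = sa.Column(sa.String, nullable=False, server_default=FetchedValue())
    value = sa.Column(sa.String, nullable=False)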
A test with SQLite:
In [8]: engine.execute("""CREATE TABLE whatevs (
...: id INTEGER NOT NULL,
...: fnord VARCHAR DEFAULT 'fnord' NOT NULL,
...: value VARCHAR NOT NULL,
...: PRIMARY KEY (id)
...: )""")
In [12]: class Whatever(Base):
...: __tablename__ = 'whatevs'
...: id = Column(Integer, primary_key=True, autoincrement=True)
...: fnord = Column(String, nullable=False, server_default="fnord")
...: value = Column(String, nullable=False)
...:
In [13]: session.add(Whatever(value='asdf'))
In [14]: session.commit()
2016-08-31 23:46:09,826 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
2016-08-31 23:46:09,827 INFO sqlalchemy.engine.base.Engine INSERT INTO whatevs (value) VALUES (?)
INFO:sqlalchemy.engine.base.Engine:INSERT INTO whatevs (value) VALUES (?)
2016-08-31 23:46:09,827 INFO sqlalchemy.engine.base.Engine ('asdf',)
INFO:sqlalchemy.engine.base.Engine:('asdf',)
2016-08-31 23:46:09,828 INFO sqlalchemy.engine.base.Engine COMMIT
INFO:sqlalchemy.engine.base.Engine:COMMIT