Using Python and SQLAlchemy, is it possible to insert a None / NIL_UUID / NULL value into a PostgreSQL foreign key column that links to a primary key, both stored as UUID?
Passing None results in column "none" does not exist:
statement = "INSERT INTO tb_person (pk_person, first_name, last_name, fk_person_parent) VALUES ('9ce131...985
fea06', 'John', 'Doe', None)"
parameters = {}, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7fbff5ea2730>
def do_execute(self, cursor, statement, parameters, context=None):
> cursor.execute(statement, parameters)
E psycopg2.errors.UndefinedColumn: column "none" does not exist
E LINE 1: '9ce131...985','John', 'Doe', None)
E ^
E HINT: Perhaps you meant to reference the column "tb_person.last_name".
../../.local/share/virtualenvs/project/lib/python3.8/site-packages/sqlalchemy/engine/default.py:593: UndefinedColumn
A NIL_UUID (i.e. a valid UUID made of all zeros) raises psycopg2.errors.ForeignKeyViolation:
E psycopg2.errors.ForeignKeyViolation: insert or update on table "tb_person" violates foreign key constraint "tb_person_fk_person_parent_fkey"
E DETAIL: Key (fk_person_parent)=(00000000-0000-0000-0000-000000000000) is not present in table "tb_person".
MORE DETAILS
I use SQLAlchemy classical mapping (SQLAlchemy Core); my table is defined like this:
tb_person = Table(
    "tb_person",
    metadata,
    Column(
        "pk_person",
        UUID(as_uuid=True),
        default=uuid.uuid4,
        unique=True,
        nullable=False
    ),
    Column("first_name", String(255)),
    Column("last_name", String(255)),
    Column(
        "fk_person_parent", UUID(as_uuid=True),
        ForeignKey("tb_person.pk_person"),
        nullable=True
    )
)
The mapper is defined like this:
client_mapper = mapper(
    domain.model.Person,
    tb_person,
    properties={
        "child": relationship(domain.model.Person),
    },
)
The unit test works well when inserting, into fk_person_parent, a UUID that already exists in the pk_person column.
I was using raw SQL and, as suggested by @AdrianKlaver, it is much preferable to use parameters; that fixes the problem.
# In the context of this unit test,
# we have to handle UUID generation here
uuid_1 = uuid.uuid4()
uuid_2 = uuid.uuid4()
uuid_3 = uuid.uuid4()
session.execute(
    """
    INSERT INTO tb_person
        (pk_person, first_name, last_name, fk_person_parent)
    VALUES
        (:uuid_1, 'John', 'Doe', :uuid_none),
        (:uuid_2, 'Jean', 'Dupont', :uuid_none),
        (:uuid_3, 'Baby', 'Doe', :uuid_1)
    """,
    {"uuid_1": uuid_1, "uuid_2": uuid_2, "uuid_3": uuid_3, "uuid_none": None}
)
It effectively translates to NULL in the query.
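For completeness, a minimal sketch (not in the original post) doing the same thing with the Core insert() construct against the tb_person table defined above; a Python None is bound as a parameter and rendered as SQL NULL (the illustrative names are made up):
from sqlalchemy import insert

stmt = insert(tb_person).values(
    pk_person=uuid.uuid4(),
    first_name="Jane",
    last_name="Doe",
    fk_person_parent=None,  # bound parameter, rendered as NULL
)
session.execute(stmt)
Since fk_person_parent is nullable and has no default, simply omitting it from values() would also leave it NULL.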
Aim: to have full cascade save/load/delete with SQLAlchemy from the root entity (scenario) through Segment and then Timeseries.
Question: from a parent class, BaseSegment, with multiple properties that each link to a child class, Timeseries, whose table links back with segment_id, how do I configure a backref?
class Timeseries(Base):
    __tablename__ = "timeseries"
    __table_args__ = {"schema": "public"}
    # values etc
    segment_id = Column(UUID(as_uuid=True), ForeignKey("public.segments.id"))

class BaseSegment(Base):
    __tablename__ = "segments"
    __table_args__ = {"schema": "public"}
    actual_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))
    actual_effect = relationship(
        EffectVarTimeseries,
        backref="segment",
        foreign_keys=[actual_effect_id],
        remote_side=Timeseries.segment_id,
    )
    predicted_effect_id = Column(UUID(as_uuid=True), ForeignKey(Timeseries.id))
    predicted_effect = relationship(
        EffectVarTimeseries,
        backref="segment",
        foreign_keys=[predicted_effect_id],
        remote_side=Timeseries.segment_id,
    )
Crashes like this:
sqlalchemy.exc.InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'mapped class BaseSegment->segments'. Original exception was: Relationship BaseSegment.actual_effect could not determine any unambiguous local/remote column pairs based on join condition and remote_side arguments. Consider using the remote() annotation to accurately mark those elements of the join condition that are on the remote side of the relationship.
In particular, when I have remote_side=Timeseries.segment_id I get this crash.
If I remove a bit of the config above, then instead of the Session cascade-writing from segment -> timeseries, I get this error*:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "segment_id" of relation "timeseries" violates not-null constraint
Furthermore, if I generate the UUID in the calling Python code, the DB complains that the segment's id doesn't exist (yet!)**.
I can't just do a back_populates because I don't know whether the Timeseries is for a predicted_effect or actual_effect. How do I model this?
*Details, full error:
E sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "segment_id" of relation "timeseries" violates not-null constraint
E DETAIL: Failing row contains (b2dff54f-ff1c-4457-8cf1-0783ea2acdbb, effect_var, null, revenue, EUR, Resulting amount revenue per day, {45753.00000000003,63875.00000000002,12731.999999999998,56585,37..., datetime64[ns], D, null, null, null, revenue, null).
E
E [SQL: INSERT INTO public.timeseries (id, type, variable_name, variable_unit, variable_description, values, dtype, freq, segment_id, funnel_level, treatment, behavioural_segment_id, effect) VALUES (%(id)s, %(type)s, %(variable_name)s, %(variable_unit)s, %(variable_description)s, %(values)s::FLOAT[], %(dtype)s, %(freq)s, %(segment_id)s, %(funnel_level)s, %(treatment)s, %(behavioural_segment_id)s, %(effect)s)]
E [parameters: {'id': UUID('b2dff54f-ff1c-4457-8cf1-0783ea2acdbb'), 'type': 'effect_var', 'variable_name': 'revenue', 'variable_unit': 'EUR', 'variable_description': 'Resulting amount revenue per day', 'values': [45753.00000000003, 63875.00000000002, 12731.999999999998, 56585.0, 37048.99999999997, 19938.999999999985, 28346.999999999985], 'dtype': 'datetime64[ns]', 'freq': 'D', 'segment_id': None, 'funnel_level': None, 'treatment': None, 'behavioural_segment_id': None, 'effect': 'revenue'}]
**For the client-generated UUID:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation) insert or update on table "timeseries" violates foreign key constraint "timeseries_segment_id_fkey"
References
backref
working with related objects
annotations recommended
mailing list similar error
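For what it's worth, one way to get rid of the "could not determine any unambiguous local/remote column pairs" error is to drop remote_side (it is primarily meant for self-referential relationships) and let each relationship name its own foreign key column and a distinct backref. Below is a minimal, self-contained sketch with hypothetical class names; it only illustrates the disambiguation and is not a confirmed fix for the full cascade setup above:
import uuid
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Timeseries(Base):
    __tablename__ = "timeseries"
    id = sa.Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)

class Segment(Base):
    __tablename__ = "segments"
    id = sa.Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    actual_effect_id = sa.Column(UUID(as_uuid=True), sa.ForeignKey(Timeseries.id))
    predicted_effect_id = sa.Column(UUID(as_uuid=True), sa.ForeignKey(Timeseries.id))
    # Each relationship names the FK column it uses and gets its own backref,
    # so SQLAlchemy can resolve both joins unambiguously.
    actual_effect = relationship(
        Timeseries,
        foreign_keys=[actual_effect_id],
        backref="actual_effect_segments",
    )
    predicted_effect = relationship(
        Timeseries,
        foreign_keys=[predicted_effect_id],
        backref="predicted_effect_segments",
    )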
I'm trying to understand what the set_ means in SQLAlchemy's on_conflict_do_update method. I have the following Table:
Table(
    "test",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("firstname", String(100)),
    Column("lastname", String(100)),
)
and I want to insert something like this (if I wrote it in psql):
INSERT INTO test (id, firstname, lastname) VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO UPDATE SET firstname = EXCLUDED.firstname, lastname = EXCLUDED.lastname
I did some due diligence and saw people write the set_ like this:
from sqlalchemy import inspect
from sqlalchemy.dialects import postgresql

insert_stmt = postgresql.insert(target).values([{'id': 1, 'firstname': 'John', 'lastname': 'Doe'}])
primary_keys = [key.name for key in inspect(target).primary_key]
update_dict = {c.name: c for c in insert_stmt.excluded if not c.primary_key}
stmt = insert_stmt.on_conflict_do_update(index_elements=primary_keys, set_=update_dict)
engine.execute(stmt)
Is the update_dict just looking at the EXCLUDED values (the ones I want to update with) that I set in my insert_stmt? If I str(update_dict) I get a dictionary of column information: {'firstname': Column('firstname', VARCHAR(length=100), table=<excluded>), 'lastname': Column('lastname', VARCHAR(length=100), table=<excluded>)}. Is the method above the only way to retrieve the data, or can you write it out manually?
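To answer the last part: yes, set_ can be written out manually. Each value just has to reference the row proposed for insertion, which PostgreSQL exposes as EXCLUDED and SQLAlchemy exposes as insert_stmt.excluded. A sketch against the test table above:
stmt = insert_stmt.on_conflict_do_update(
    index_elements=['id'],
    set_={
        'firstname': insert_stmt.excluded.firstname,  # EXCLUDED.firstname
        'lastname': insert_stmt.excluded.lastname,    # EXCLUDED.lastname
    },
)
engine.execute(stmt)
The update_dict comprehension shown earlier builds exactly this kind of mapping, just programmatically for every non-primary-key column.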
Given a table with the following schema:
create table json_data (
id integer PRIMARY KEY NOT NULL,
default_object VARCHAR(10) NOT NULL,
data jsonb NOT NULL
);
For each entity in the table I want to retrieve the value of data['first']['name'], or, if it is null, the value of data[json_data.default_object]['name'], or, if the latter is also null, some default value. In "pure" SQL I can write the following code to satisfy my needs:
insert into
    json_data(
        id,
        default_object,
        data
    )
values(
    0,
    'default',
    '{"first": {"name": "first_name_1"}, "default": {"name": "default_name_1"}}'
),
(
    1,
    'default',
    '{"first": {}, "default": {"name": "default_name_2"}}'
);

select
    id,
    coalesce(
        json_data.data -> 'first' ->> 'name',
        json_data.data -> json_data.default_object ->> 'name',
        'default_value'
    ) as value
from
    json_data;
I tried to "translate" the "model" above into an SQLAlchemy entity:
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property

Base = declarative_base()

class JsonObject(Base):
    __tablename__ = 'json_data'

    id = sa.Column(sa.Integer, primary_key=True)
    default_object = sa.Column(sa.String(10), nullable=False)
    data = sa.Column(postgresql.JSONB, nullable=False)

    @hybrid_property
    def name(self) -> str:
        obj = self.data.get('first')
        default_obj = self.data.get(self.default_object)
        return (obj.get('name') if obj else default_obj.get('name')) or default_obj.get('name')

    @name.setter
    def name(self, value: str):
        obj = self.data.setdefault('first', dict())
        obj['name'] = value

    @name.expression
    def name(self):
        return sa.func.coalesce(
            self.data[('first', 'name')].astext,
            self.data[(self.default_object, 'name')].astext,
            'default_value',
        )
But it seems that the expression for the name hybrid property doesn't work as I expect. If I query entities by the name property, like:
query = session.query(JsonObject).filter(JsonObject.name == 'name')
the query is expanded by SQLAlchemy into something like this:
SELECT json_data.id AS json_data_id, json_data.default_object AS json_data_default_object, json_data.data AS json_data_data
FROM json_data
WHERE coalesce((json_data.data #> %(data_1)s), (json_data.data #> %(data_2)s), %(coalesce_1)s) = %(coalesce_2)s
It uses the path operator (#>) instead of the index operator (->). What should I do to make SQLAlchemy produce the kind of expression I wrote at the beginning of the question?
OK, the solution I found is quite straightforward. As the SQLAlchemy documentation says:
Index operations return an expression object whose type defaults to JSON by default, so that further JSON-oriented instructions may be called upon the result type.
Therefore we can use "chained" Python indexing operators, and the following code looks legit to me:
class JsonObject(Base):
    # Almost the same stuff, except for the following:

    @name.expression
    def name(self):
        return sa.func.coalesce(
            self.data['first']['name'].astext,
            self.data[self.default_object]['name'].astext,
            'default_value',
        )
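With the chained index operators, the same filter now compiles to the -> / ->> operators instead of the #> path operator (a sketch; the actual output uses bind parameters rather than inline literals):
query = session.query(JsonObject).filter(JsonObject.name == 'name')
# Roughly renders as:
# SELECT ... FROM json_data
# WHERE coalesce(
#     (json_data.data -> 'first') ->> 'name',
#     (json_data.data -> json_data.default_object) ->> 'name',
#     'default_value'
# ) = 'name'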
Using SQLAlchemy on PostgreSQL, I am trying to improve insertion performance (about 100k edges to insert) by executing "nested inserts" in a single query for one edge and its nodes.
Using Insert.from_select, I get the following error and I don't really understand why.
CompileError: bindparam() name 'name' is reserved for automatic usage in the VALUES or SET clause of this insert/update statement. Please use a name other than column name when using bindparam() with insert() or update() (for example, 'b_name').
from sqlalchemy import *

metadata = MetaData()

node = Table('node', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
)

edge = Table('edge', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('source_id', Integer(), ForeignKey(node.c.id)),
    Column('target_id', Integer(), ForeignKey(node.c.id)),
)

engine = create_engine('postgres://postgres:postgres@db:5432')
metadata.create_all(engine)

e1_source = insert(node).values(name='e1_source').returning(node.c.id).cte('source')
e1_target = insert(node).values(name='e1_target').returning(node.c.id).cte('target')

e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],  # bindparam error
    # ['source_id', 'target_id', 'b_name'],  # key error
    # [edge.c.source_id, edge.c.target_id, edge.c.name],  # bindparam error
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)
engine.execute(e1)
EDIT: below is the SQL query I expected to produce. I remain open to any suggestions to achieve my purpose, though.
CREATE TABLE node (
    id SERIAL PRIMARY KEY,
    name VARCHAR
);

CREATE TABLE edge (
    id SERIAL PRIMARY KEY,
    source_id INTEGER REFERENCES node (id),
    target_id INTEGER REFERENCES node (id),
    name VARCHAR
);

WITH source AS (
    INSERT INTO node (name)
    VALUES ('e1_source')
    RETURNING id
), target AS (
    INSERT INTO node (name)
    VALUES ('e1_target')
    RETURNING id
)
INSERT INTO edge (source_id, target_id, name)
SELECT source.id, target.id, 'e1'
FROM source, target;
I have finally figured out where bindparam was implicitly used by SQLAlchemy, which solves my issue: in the node queries, not the edge query as I first thought.
But I am still not sure whether this is the proper way to perform nested insert queries with SQLAlchemy, or whether it will improve execution time.
e1_source = insert(node).values(name=bindparam('source_name')).returning(node.c.id).cte('source')
e1_target = insert(node).values(name=bindparam('target_name')).returning(node.c.id).cte('target')

e1 = insert(edge).from_select(
    ['source_id', 'target_id', 'name'],
    select([
        e1_source.c.id,
        e1_target.c.id,
        literal('e1'),
    ])
)

engine.execute(e1, {
    'source_name': 'e1_source',
    'target_name': 'e1_target',
})
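To double-check what such a construct emits before running it, the statement can be compiled against the PostgreSQL dialect (a quick inspection sketch, not from the original post):
from sqlalchemy.dialects import postgresql

# Prints the compiled INSERT together with its WITH source/target clauses
print(e1.compile(dialect=postgresql.dialect()))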
I have a legacy database that creates default values for several columns using a variety of stored procedures. It would be more or less prohibitive to try and track down the names and add queries to my code, not to mention a maintenance nightmare.
What I would like is to be able to tell sqlalchemy to ignore the columns that I don't really care about. Unfortunately, it doesn't. Instead it provides null values that violate the DB constraints.
Here's an example of what I mean:
import logging

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

l = logging.getLogger('sqlalchemy.engine')
l.setLevel(logging.INFO)
l.addHandler(logging.StreamHandler())

engine = sa.create_engine('postgresql+psycopg2://user@host:port/dbname')
Session = sessionmaker(bind=engine)
session = Session()

temp_metadata = sa.MetaData(schema='pg_temp')
TempBase = declarative_base(metadata=temp_metadata)

with session.begin(subtransactions=True):
    session.execute('''
        CREATE TABLE pg_temp.whatevs (
            id serial
            , fnord text not null default 'fnord'
            , value text not null
        );
        INSERT INTO pg_temp.whatevs (value) VALUES ('something cool');
    ''')

class Whatever(TempBase):
    __tablename__ = 'whatevs'

    id = sa.Column('id', sa.Integer, primary_key=True, autoincrement=True)
    fnord = sa.Column('fnord', sa.String)
    value = sa.Column('value', sa.String)

w = Whatever(value='something cool')
session.add(w)
This barfs, because:
INSERT INTO pg_temp.whatevs (fnord, value) VALUES (%(fnord)s, %(value)s) RETURNING pg_temp.whatevs.id
{'fnord': None, 'value': 'something cool'}
ROLLBACK
Traceback (most recent call last):
File "/home/wayne/.virtualenvs/myenv/lib64/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/home/wayne/.virtualenvs/myenv/lib64/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
psycopg2.IntegrityError: null value in column "fnord" violates not-null constraint
DETAIL: Failing row contains (2, null, something cool).
What I expected is that it would just skip out on the fnord column, since it didn't get set.
Even if I do:
w = Whatever()
w.value = 'this breaks too'
or add:
def __init__(self, value):
    self.value = value
to the Whatever class... still no dice.
How can I tell sqlalchemy that "look, these other columns are fine, I know I'm not providing a value - the database is going to take care of that for me. It's okay, just don't worry about these columns"?
The only way I'm aware of is to futz with the class definition and lie, saying those columns don't exist... but I do actually want them to come in on queries.
Add a server side default with server_default for fnord:
class Whatever(TempBase):
    __tablename__ = 'whatevs'

    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    fnord = sa.Column(sa.String, nullable=False, server_default='fnord')
    value = sa.Column(sa.String, nullable=False)
SQLAlchemy quite happily lets the default do its thing server side, if it is just told about it. If you have columns whose defaults are not set in the DDL but come from triggers, stored procedures, or the like, have a look at FetchedValue.
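For instance, if fnord were filled by a trigger or stored procedure rather than a plain DDL default, a sketch (hypothetical table name, not from the original answer) could look like this:
from sqlalchemy.schema import FetchedValue

class WhateverTriggered(TempBase):
    __tablename__ = 'whatevs_triggered'

    id = sa.Column(sa.Integer, primary_key=True, autoincrement=True)
    # FetchedValue marks the column as generated server side, so it is
    # omitted from the INSERT and can be refetched after the flush.
    fnord = sa.Column(sa.String, nullable=False, server_default=FetchedValue())
    value = sa.Column(sa.String, nullable=False)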
A test with SQLite:
In [8]: engine.execute("""CREATE TABLE whatevs (
...: id INTEGER NOT NULL,
...: fnord VARCHAR DEFAULT 'fnord' NOT NULL,
...: value VARCHAR NOT NULL,
...: PRIMARY KEY (id)
...: )""")
In [12]: class Whatever(Base):
...: __tablename__ = 'whatevs'
...: id = Column(Integer, primary_key=True, autoincrement=True)
...: fnord = Column(String, nullable=False, server_default="fnord")
...: value = Column(String, nullable=False)
...:
In [13]: session.add(Whatever(value='asdf'))
In [14]: session.commit()
2016-08-31 23:46:09,826 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
2016-08-31 23:46:09,827 INFO sqlalchemy.engine.base.Engine INSERT INTO whatevs (value) VALUES (?)
INFO:sqlalchemy.engine.base.Engine:INSERT INTO whatevs (value) VALUES (?)
2016-08-31 23:46:09,827 INFO sqlalchemy.engine.base.Engine ('asdf',)
INFO:sqlalchemy.engine.base.Engine:('asdf',)
2016-08-31 23:46:09,828 INFO sqlalchemy.engine.base.Engine COMMIT
INFO:sqlalchemy.engine.base.Engine:COMMIT