SQLAlchemy does not respect cascade when deleting [duplicate] - python

I am developing an extension to an existing app which uses sqlalchemy 0.6.
The app creates its SQLAlchemy tables the non-declarative way. In my extension I am trying to create, declaratively, a new table with a foreign key column pointing at the primary key of the main table in the application database.
This all works fine, with the table created once the extension is loaded, and with no complaints at all. My table prints out and demonstrates that new rows have been added ok.
What I want, and think is possible (though I don't know, as I have never used SQL or any other database before), is for the corresponding row in my table to be deleted when the row in the app's main table with the matching foreign key is deleted.
So far, after many permutations, nothing has worked. I thought that with a backref set and a relation defined with delete cascaded, there shouldn't be a problem. Because the new table is defined in an extension that should just plug in, my goal is not to edit the code in the main app at all. One of the problems I have, though, is that the main app table I want to reference has no member variables defined in its class, does not declare its primary key in its mapper, and only declares the primary key in the table. This makes it difficult to create a relation(ship) clause, whose first argument must be a class or mapper (in this case neither of which has the primary key declared).
Is there any way of achieving this?
ps - here is some of the code that I am using. LocalFile is the declarative class. All the connection details are taken care of by the main application.
if not self.LocalFile.__table__.exists(bind=Engine):
    self.LocalFile.__table__.create(bind=Engine)
Here is the LocalFile class - Base is a declarative base class with bind=Engine passed in the constructor:
class LocalFile(Base):
    __tablename__ = 'local_file'

    _id = Column(Integer, Sequence('local_file_sequence', start=1, increment=1), primary_key=True)
    _filename = Column(String(50), nullable=False)
    _filepath = Column(String(128), nullable=False)
    _movieid = Column(Integer, ForeignKey(db.tables.movies.c.movie_id, onupdate='CASCADE', ondelete='CASCADE'))

    #movies = relation(db.Movie, backref="local_file", cascade="all")

    @property
    def filename(self):
        return self._filename

    @filename.setter
    def filename(self, filename):
        self._filename = filename

    @property
    def filepath(self):
        return self._filepath

    @filepath.setter
    def filepath(self, filepath):
        self._filepath = filepath

    @property
    def movieid(self):
        return self._movieid

    @movieid.setter
    def movieid(self, movieid):
        self._movieid = movieid

    @property
    def id(self):
        return self._id

    @id.setter
    def id(self, id):
        self._id = id

    filename = synonym('_filename', descriptor=filename)
    movieid = synonym('_movieid', descriptor=movieid)
    filepath = synonym('_filepath', descriptor=filepath)
    id = synonym('_id', descriptor=id)

    def __init__(self, filename, filepath, movieid):
        self._filename = filename
        self._filepath = filepath
        self._movieid = movieid

    def __repr__(self):
        return "<User('%s','%s', '%s')>" % (self.filename, self.filepath, self.movieid)
Edit:
The backend is sqlite3. Below is the SQL produced for the table creation, captured by enabling echo on the engine (thanks for pointing that out, it's very useful - already I suspect that the existing application is generating far more SQL than necessary).
Following the reported table creation is the SQL generated when a row is removed. I personally can't see any statement that references deleting a row in the local_file table, but I know very little SQL currently. Thanks.
2011-12-29 16:29:18,530 INFO sqlalchemy.engine.base.Engine.0x...0650
CREATE TABLE local_file (
_id INTEGER NOT NULL,
_filename VARCHAR(50) NOT NULL,
_filepath VARCHAR(128) NOT NULL,
_movieid INTEGER,
PRIMARY KEY (_id),
FOREIGN KEY(_movieid) REFERENCES movies (movie_id) ON DELETE CASCADE ON UPDATE CASCADE
)
2011-12-29 16:29:18,534 INFO sqlalchemy.engine.base.Engine.0x...0650 ()
2011-12-29 16:29:18,643 INFO sqlalchemy.engine.base.Engine.0x...0650 COMMIT
for row in table produces the following for the two tables:
the local file table:
(, u' 310 To Yuma')
(, u' Ravenous')
the movie table in the existing app:
(, u'IMDb - 3:10 to Yuma')
(, u'Ravenous')
The log output when deleting a row is too long to include here (200 lines or so - isn't that a lot for deleting one row?), but it makes no reference to deleting a row in the local_file table. There are statements like:
2011-12-29 17:09:17,141 INFO sqlalchemy.engine.base.Engine.0x...0650 UPDATE movies SET poster_md5=?, updated=? WHERE movies.movie_id = ?
2011-12-29 17:09:17,142 INFO sqlalchemy.engine.base.Engine.0x...0650 (None, '2011-12-29 17:09:17.141019', 2)
2011-12-29 17:09:17,150 INFO sqlalchemy.engine.base.Engine.0x...0650 DELETE FROM posters WHERE posters.md5sum = ?
2011-12-29 17:09:17,157 INFO sqlalchemy.engine.base.Engine.0x...0650 (u'083841e14b8bb9ea166ea4b2b976f03d',)

In SQLite you must turn on support for foreign keys explicitly or it just ignores any SQL related to foreign keys.
from sqlalchemy import event

engine = create_engine(database_url)

def on_connect(conn, record):
    conn.execute('pragma foreign_keys=ON')

event.listen(engine, 'connect', on_connect)

Related

python sqlalchemy bulk_save_objects doesn't use bulk

In continuation of my previous post,
I'm trying to use bulk_save_objects for a list of objects (the objects don't have a PK value, so it should be created for each object). When I use bulk_save_objects I see an insert per object instead of one insert for all objects.
The code:
class Product(Base):
    __tablename__ = 'products'

    id = Column('id', BIGINT, primary_key=True)
    barcode = Column('barcode', BIGINT)
    productName = Column('name', TEXT, nullable=False)
    objectHash = Column('objectHash', TEXT, unique=True, nullable=False)

    def __init__(self, productData, picture=None):
        self.barcode = productData[ProductTagsEnum.barcode.value]
        self.productName = productData[ProductTagsEnum.productName.value]
        self.objectHash = md5((str(self.barcode) + self.productName).encode('utf-8')).hexdigest()
Another class contains the following save method:
def saveNewProducts(self, products):
    Session = sessionmaker()
    session = Session()
    productsHashes = [product.objectHash for product in products]
    query = session.query(Product.objectHash).filter(Product.objectHash.in_(productsHashes))
    existedHashes = {h for (h,) in query.all()}
    newProducts = [product for product in products if product.objectHash not in existedHashes]
    # also tried: session.bulk_save_objects(newProducts, preserve_order=False)
    session.bulk_save_objects(newProducts)
UPDATE 1
Following what @Ilja Everilä recommended in the comments, I added a few parameters to the connection string:
engine = create_engine('postgresql://postgres:123@localhost:5432/mydb', pool_size=25, max_overflow=0,
                       executemany_mode='values',
                       executemany_values_page_size=10000, executemany_batch_page_size=500,
                       echo=True)
In the console I saw multiple inserts with the following format :
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine INSERT INTO products (barcode, productName, objectHash) VALUES (%(barcode)s, %(productName)s, %(objectHash)s, ) RETURNING products.id
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine {'barcode': '5008251', 'productName': 'ice ream','object_hash': 'b2752233ec523f2e874dc95b70020ae5'}
In my case, the solution I used: I deleted the id column and set objectHash as the PK, and afterwards bulk_save_objects and add_all worked and actually did a bulk insert. It seems those functions only do a single batched insert when the objects already carry their PK values.
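A minimal sketch of that change (column names follow the question; treat the exact layout as an assumption, not the poster's actual code):
from sqlalchemy import Column, BIGINT, TEXT
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'

    # objectHash is now the primary key, so SQLAlchemy does not have to
    # fetch a server-generated id back for every row and can batch the INSERTs.
    objectHash = Column('objectHash', TEXT, primary_key=True)
    barcode = Column('barcode', BIGINT)
    productName = Column('name', TEXT, nullable=False)

# session.bulk_save_objects(new_products)  # now issues batched INSERTs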

SQLite triggers & datetime defaults in SQL DDL using Peewee in Python

I have a SQLite table defined like so:
create table if not exists KeyValuePair (
  key CHAR(255) primary key not null,
  val text not null,
  fup timestamp default current_timestamp not null, -- time of first upload
  lup timestamp default current_timestamp not null  -- time of last upload
);

create trigger if not exists entry_first_insert after insert
  on KeyValuePair
  begin
    update KeyValuePair set lup = current_timestamp where key = new.key;
  end;

create trigger if not exists entry_last_updated after update of val
  on KeyValuePair
  begin
    update KeyValuePair set lup = current_timestamp where key = old.key;
  end;
I'm trying to write a peewee.Model for this table in Python. This is what I have so far:
import datetime

import peewee as pw

db = pw.SqliteDatabase('dhm.db')

class BaseModel(pw.Model):
    class Meta:
        database = db

class KeyValuePair(BaseModel):
    key = pw.FixedCharField(primary_key=True, max_length=255)
    val = pw.TextField(null=False)
    fup = pw.DateTimeField(
        verbose_name='first_updated', null=False, default=datetime.datetime.now)
    lup = pw.DateTimeField(
        verbose_name='last_updated', null=False, default=datetime.datetime.now)

db.connect()
db.create_tables([KeyValuePair])
When I inspect the SQL produced by the last line I get:
CREATE TABLE "keyvaluepair" (
"key" CHAR(255) NOT NULL PRIMARY KEY,
"val" TEXT NOT NULL,
"fup" DATETIME NOT NULL,
"lup" DATETIME NOT NULL
);
So I have two questions at this point:
I've been unable to find a way to achieve the behavior of the entry_first_insert and entry_last_updated triggers. Does peewee support triggers? If not, is there a way to just create a table from a .sql file rather than the Model class definition?
Is there a way to make the defaults for fup and lup propagate to the SQL definitions?
I've figured out a proper answer to both questions. This solution actually enforces the desired triggers and default timestamps in the SQL DDL.
First we define a convenience class to wrap up the SQL for a trigger. There is a more proper way to do this with the peewee.Node objects, but I didn't have time to delve into all of that for this project. This Trigger class simply provides string formatting to output proper sql for trigger creation.
class Trigger(object):
    """Trigger template wrapper for use with peewee ORM."""

    _template = """
    {create} {name} {when} {trigger_op}
    on {tablename}
    begin
        {op} {tablename} {sql} where {pk} = {old_new}.{pk};
    end;
    """

    def __init__(self, table, name, when, trigger_op, op, sql, safe=True):
        self.create = 'create trigger' + (' if not exists' if safe else '')
        self.tablename = table._meta.name
        self.pk = table._meta.primary_key.name
        self.name = name
        self.when = when
        self.trigger_op = trigger_op
        self.op = op
        self.sql = sql
        self.old_new = 'new' if trigger_op.lower() == 'insert' else 'old'

    def __str__(self):
        return self._template.format(**self.__dict__)
Next we define a class TriggerTable that inherits from the BaseModel. This class overrides the default create_table to follow table creation with trigger creation. If any triggers fail to create, the whole create is rolled back.
class TriggerTable(BaseModel):
    """Table with triggers."""

    @classmethod
    def triggers(cls):
        """Return an iterable of `Trigger` objects to create upon table creation."""
        return tuple()

    @classmethod
    def new_trigger(cls, name, when, trigger_op, op, sql):
        """Create a new trigger for this class's table."""
        return Trigger(cls, name, when, trigger_op, op, sql)

    @classmethod
    def create_table(cls, fail_silently=False):
        """Create this table in the underlying database."""
        super(TriggerTable, cls).create_table(fail_silently)
        for trigger in cls.triggers():
            try:
                cls._meta.database.execute_sql(str(trigger))
            except:
                cls._meta.database.drop_table(cls, fail_silently)
                raise
The next step is to create a class BetterDateTimeField. This Field object overrides the default __ddl__ to append a "DEFAULT current_timestamp" string if the default instance variable is set to the datetime.datetime.now function. There are certainly better ways to do this, but this one captures the basic use case.
class BetterDateTimeField(pw.DateTimeField):
    """Propagate defaults to the database layer."""

    def __ddl__(self, column_type):
        """Return a list of Node instances that defines the column."""
        ddl = super(BetterDateTimeField, self).__ddl__(column_type)
        if self.default == datetime.datetime.now:
            ddl.append(pw.SQL('DEFAULT current_timestamp'))
        return ddl
Finally, we define the new and improved KeyValuePair Model, incorporating our trigger and datetime field improvements. We conclude the Python code by creating the table.
class KeyValuePair(TriggerTable):
    """DurableHashMap entries are key-value pairs."""

    key = pw.FixedCharField(primary_key=True, max_length=255)
    val = pw.TextField(null=False)
    fup = BetterDateTimeField(
        verbose_name='first_updated', null=False, default=datetime.datetime.now)
    lup = BetterDateTimeField(
        verbose_name='last_updated', null=False, default=datetime.datetime.now)

    @classmethod
    def triggers(cls):
        return (
            cls.new_trigger(
                'kvp_first_insert', 'after', 'insert', 'update',
                'set lup = current_timestamp'),
            cls.new_trigger(
                'kvp_last_udpated', 'after', 'update', 'update',
                'set lup = current_timestamp')
        )

KeyValuePair.create_table()
Now the schema is created properly:
sqlite> .schema keyvaluepair
CREATE TABLE "keyvaluepair" ("key" CHAR(255) NOT NULL PRIMARY KEY, "val" TEXT NOT NULL, "fup" DATETIME NOT NULL DEFAULT current_timestamp, "lup" DATETIME NOT NULL DEFAULT current_timestamp);
CREATE TRIGGER kvp_first_insert after insert
on keyvaluepair
begin
update keyvaluepair set lup = current_timestamp where key = new.key;
end;
CREATE TRIGGER kvp_last_udpated after update
on keyvaluepair
begin
update keyvaluepair set lup = current_timestamp where key = old.key;
end;
sqlite> insert into keyvaluepair (key, val) values ('test', 'test-value');
sqlite> select * from keyvaluepair;
test|test-value|2015-12-07 21:58:05|2015-12-07 21:58:05
sqlite> update keyvaluepair set val = 'test-value-two' where key = 'test';
sqlite> select * from keyvaluepair;
test|test-value-two|2015-12-07 21:58:05|2015-12-07 21:58:22
You can override the save function of the model where you insert the timestamps. See TimeStampModel for an example.
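For reference, a minimal sketch of that idea (an assumed approximation, not the TimeStampModel code itself): keep the timestamps in Python by overriding save() so lup is refreshed on every write.
import datetime

import peewee as pw

db = pw.SqliteDatabase('dhm.db')

class KeyValuePair(pw.Model):
    key = pw.FixedCharField(primary_key=True, max_length=255)
    val = pw.TextField(null=False)
    fup = pw.DateTimeField(default=datetime.datetime.now)  # time of first upload
    lup = pw.DateTimeField(default=datetime.datetime.now)  # time of last upload

    class Meta:
        database = db

    def save(self, *args, **kwargs):
        # Refresh the "last updated" timestamp on every save instead of
        # relying on a database trigger.
        self.lup = datetime.datetime.now()
        return super(KeyValuePair, self).save(*args, **kwargs)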
I stumbled across exactly this issue a while ago, and spent some time coming up with an optimal design to support Triggers in PeeWee (inspired by the above answer). I am quite happy with how we ended up implementing it, and wanted to share this. At some point I will do a PR into Peewee for this.
Creating Triggers & TriggerListeners in PeeWee
Objective
This document describes how to do this in two parts:
How to add a Trigger to a model in the database.
How to create a ListenThread that will have a callback function that is notified each time the table is updated.
How-To Implementation
The beauty of this design is you only need one item: the TriggerModelMixin Model. Then it is easy to create listeners to subscribe/have callback methods.
The TriggerModelMixin can be copy-pasted as:
import logging
import select
import threading

from peewee import Model

logger = logging.getLogger(__name__)

class TriggerModelMixin(Model):
    """ PeeWee Model with support for triggers.

    This will create a trigger that on all table updates will send
    a NOTIFY to {tablename}_updates.

    Note that it will also take care of updating the triggers as
    appropriate/necessary.
    """
    _template = """
        CREATE OR REPLACE FUNCTION {function_name}()
        RETURNS trigger AS
        $BODY$
        BEGIN
            PERFORM pg_notify(
                CAST('{notify_channel_name}' AS text),
                row_to_json(NEW)::text);
            RETURN NEW;
        END;
        $BODY$
        LANGUAGE plpgsql VOLATILE
        COST 100;
        ALTER FUNCTION {function_name}() OWNER TO postgres;

        DROP TRIGGER IF EXISTS {trigger_name} ON "{tablename}";
        CREATE TRIGGER {trigger_name}
        AFTER INSERT OR UPDATE OR DELETE
        ON "{tablename}"
        {frequency}
        EXECUTE PROCEDURE {function_name}();
    """
    function_name_template = "{table_name}updatesfunction"
    trigger_name_template = "{table_name}updatestrigger"
    notify_channel_name_template = "{table_name}updates"
    frequency = "FOR EACH ROW"

    @classmethod
    def get_notify_channel(cls):
        table_name = cls._meta.table_name
        return cls.notify_channel_name_template.format(**{"table_name": table_name})

    @classmethod
    def create_table(cls, fail_silently=False):
        """ Create table and triggers """
        super(TriggerModelMixin, cls).create_table()
        table_name = cls._meta.table_name
        notify_channel = cls.get_notify_channel()
        function_name = cls.function_name_template.format(**{"table_name": table_name})
        trigger_name = cls.trigger_name_template.format(**{"table_name": table_name})
        trigger = cls._template.format(**{
            "function_name": function_name,
            "trigger_name": trigger_name,
            "notify_channel_name": notify_channel,
            "tablename": table_name,
            "frequency": cls.frequency
        })
        logger.info(f"Creating Triggers for {cls}")
        cls._meta.database.execute_sql(str(trigger))

    @classmethod
    def create_db_listener(cls):
        ''' Returns an object that will listen to the database notify channel
        and call a specified callback function if triggered.
        '''
        class Trigger_Listener:
            def __init__(self, db_model):
                self.db_model = db_model
                self.running = True
                self.test_mode = False
                self.channel_name = ""

            def stop(self):
                self.running = False

            def listen_and_call(self, f, *args, timeout: int = 5, sync=False):
                ''' Start listening and call the callback method `f` if a
                trigger notify is received.

                This has two styles: sync (blocking) and async (non-blocking)

                Note that `f` must have `record` as a keyword parameter - this
                will be the record that sent the notification.
                '''
                if sync:
                    return self.listen_and_call_sync(f, *args, timeout=timeout)
                else:
                    t = threading.Thread(
                        target=self.listen_and_call_sync,
                        args=(f, *args),
                        kwargs={'timeout': timeout}
                    )
                    t.start()

            def listen_and_call_sync(self, f, *args, timeout: int = 5):
                ''' Call callback function `f` when the channel is notified. '''
                self.channel_name = self.db_model.get_notify_channel()
                db = self.db_model._meta.database
                db.execute_sql(f"LISTEN {self.channel_name};")
                conn = db.connection()
                while self.running:
                    # Check whether the connection has data ready to read
                    if not select.select([conn], [], [], timeout) == ([], [], []):
                        # Wait for the bytes to become fully available in the buffer
                        conn.poll()
                        while conn.notifies:
                            record = conn.notifies.pop(0)
                            logger.info(f"Trigger received with record {record}")
                            f(*args, record=record)
                    if self.test_mode:
                        break

        return Trigger_Listener(cls)
Example Implementation:
db_listener = FPGExchangeOrder.create_db_listener()

def callback_method(record=None):
    # Callback method to handle the record.
    logger.info(f"DB update on record: {record}")
    # Handle the update here

db_listener.listen_and_call(callback_method)
How to use this
1. Add a Trigger to a model in the database
This is very easy. Just add the mixin TriggerModelMixin to the model that you want to add support to. This Mixin will handle the creation of the triggers, and the Listening method to notify when the triggers are called.
2. Create a ListenThread to have a Callback
We have two modes for the listener: async (non-blocking) and sync (blocking). By default it is non-blocking; you can make it blocking by passing sync=True.
To use it (in either case), create a callback method. Note that this callback method blocks while updates are received (records are processed serially), so do not do heavy work or I/O in this method. The only requirement is a keyword parameter named record, which receives the record from the database as a dictionary.
From this, just create the listener, then call listen_and_call.
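For example, a blocking variant of the earlier example (FPGExchangeOrder and logger are the same assumed names as above):
db_listener = FPGExchangeOrder.create_db_listener()

def callback_method(record=None):
    # record is the notified row, delivered as a dictionary
    logger.info(f"DB update on record: {record}")

# sync=True blocks the current thread instead of starting a listener thread
db_listener.listen_and_call(callback_method, sync=True, timeout=10)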

SQLAlchemy temporary table with Declarative Base

I need a temporary table in my programme. I have seen that this can be achieved with the "mapper" syntax in this way:
t = Table(
    't', metadata,
    Column('id', Integer, primary_key=True),
    # ...
    prefixes=['TEMPORARY'],
)
Seen here
But my whole code uses the declarative base, as far as I understand it, and I would like to stick to it. There is the possibility of using a hybrid approach, but if possible I'd avoid it.
This is a simplified version of how my declarative class looks like:
import sqlalchemy as alc

class Tempo(Base):
    """
    Class for temporary table used to process data coming from xlsx

    @param Base Declarative Base
    """
    # TODO: make it completely temporary
    __tablename__ = 'tempo'

    drw = alc.Column(alc.String)
    date = alc.Column(alc.Date)
    check_number = alc.Column(alc.Integer)
Thanks in advance!
EDITED WITH THE NEW PROBLEMS:
Now the class looks like this:
import sqlalchemy as alc

class Tempo(Base):
    """
    Class for temporary table used to process data coming from xlsx

    @param Base Declarative Base
    """
    # TODO: make it completely temporary
    __tablename__ = 'tempo'
    __table_args__ = {'prefixes': ['TEMPORARY']}

    drw = alc.Column(alc.String)
    date = alc.Column(alc.Date)
    check_number = alc.Column(alc.Integer)
And when I try to insert data in this table, I get the following error message:
sqlalchemy.exc.OperationalError: (OperationalError) no such table:
tempo u'INSERT INTO tempo (...) VALUES (?, ?, ?, ?, ?, ?, ?, ?)' (....)
It seems the table doesn't exist just by declaring it. I have seen something like create_all() that might be the solution for this (it's funny to see how new ideas come while explaining thoroughly)
Then again, thank you very much!
Is it possible to use __table_args__? See https://docs.sqlalchemy.org/en/14/orm/declarative_tables.html#orm-declarative-table-configuration
class Tempo(Base):
    """
    Class for temporary table used to process data coming from xlsx

    @param Base Declarative Base
    """
    # TODO: make it completely temporary
    __tablename__ = 'tempo'
    __table_args__ = {'prefixes': ['TEMPORARY']}

    drw = alc.Column(alc.String)
    date = alc.Column(alc.Date)
    check_number = alc.Column(alc.Integer)
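Regarding the follow-up "no such table" error: declaring the model only registers the metadata; the table still has to be created before inserting. A brief sketch, assuming an engine object as in the rest of the code:
# Create just this table. Note that a TEMPORARY table only exists for the
# connection that creates it, so bind to the same engine/connection the
# session will use for the inserts.
Tempo.__table__.create(bind=engine)

# or create every table known to the declarative Base:
Base.metadata.create_all(bind=engine)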
Old question, but if anyone out there wants to create a temp table from an existing declarative table model on the fly rather than having it always be a part of your model/code, you can try the following approach. Copying __table_args__ is a little tricky since it can have multiple formats and any Index objects need to be recreated so they aren't associated with the old table.
import time

import sqlalchemy as sa
from sqlalchemy.schema import CreateTable

def copy_table_args(model, **kwargs):
    """Try to copy existing __table_args__, override params with kwargs"""
    table_args = model.__table_args__

    if isinstance(table_args, tuple):
        new_args = []
        for arg in table_args:
            if isinstance(arg, dict):
                table_args_dict = arg.copy()
                table_args_dict.update(**kwargs)
                new_args.append(table_args_dict)
            elif isinstance(arg, sa.Index):
                index = sa.Index(
                    arg.name,
                    *[col for col in arg.columns.keys()],
                    unique=arg.unique,
                    **arg.kwargs,
                )
                new_args.append(index)
            else:
                # TODO: need to handle Constraints
                raise Exception(f"Unhandled table arg: {arg}")
        table_args = tuple(new_args)
    elif isinstance(table_args, dict):
        table_args = {
            k: (v.copy() if hasattr(v, "copy") else v) for k, v in table_args.items()
        }
        table_args.update(**kwargs)
    else:
        raise Exception(f"Unexpected __table_args__ type: {table_args}")

    return table_args

def copy_table_from_model(conn, model, **kwargs):
    model_name = model.__name__ + "Tmp"
    table_name = model.__table__.name + "_" + str(time.time()).replace(".", "_")
    table_args = copy_table_args(model, extend_existing=True)

    args = {c.name: c.copy() for c in model.__table__.c}
    args["__tablename__"] = table_name
    args["__table_args__"] = table_args

    copy_model = type(model_name, model.__bases__, args)
    print(str(CreateTable(copy_model.__table__)))
    copy_model.__table__.create(conn)
    return copy_model

def temp_table_from_model(conn, model, **kwargs):
    return copy_table_from_model(conn, model, prefixes=["TEMPORARY"])
Note: I haven't added logic to handle copying Constraints, and this is lightly tested against MySQL. Also note that if you do this with non-temporary tables and auto-named indexes (i.e. Column(..., index=True)) then this may not play nice with alembic.
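A hypothetical usage sketch (ExistingModel and engine are assumptions, not part of the answer):
with engine.connect() as conn:
    TmpModel = temp_table_from_model(conn, ExistingModel)
    # the temporary table lives only as long as this connection
    conn.execute(TmpModel.__table__.insert(), [{"name": "example"}])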

sqlalchemy: union query few columns from multiple tables with condition

I'm trying to adapt some part of a MySQLdb application to sqlalchemy in declarative base. I'm only beginning with sqlalchemy.
The legacy tables are defined something like:
student: id_number*, semester*, stateid, condition, ...
choice: id_number*, semester*, choice_id, school, program, ...
We have 3 tables for each of them (student_tmp, student_year, student_summer, choice_tmp, choice_year, choice_summer), so each pair (_tmp, _year, _summer) contains information for a specific moment.
select *
from `student_tmp`
inner join `choice_tmp` using (`id_number`, `semester`)
My problem is that the information important to me is the equivalent of the following select:
SELECT t.*
FROM (
(
SELECT st.*, ct.*
FROM `student_tmp` AS st
INNER JOIN `choice_tmp` as ct USING (`id_number`, `semester`)
WHERE (ct.`choice_id` = IF(right(ct.`semester`, 1)='1', '3', '4'))
AND (st.`condition` = 'A')
) UNION (
SELECT sy.*, cy.*
FROM `student_year` AS sy
INNER JOIN `choice_year` as cy USING (`id_number`, `semester`)
WHERE (cy.`choice_id` = 4)
AND (sy.`condition` = 'A')
) UNION (
SELECT ss.*, cs.*
FROM `student_summer` AS ss
INNER JOIN `choice_summer` as cs USING (`id_number`, `semester`)
WHERE (cs.`choice_id` = 3)
AND (ss.`condition` = 'A')
)
) as t
* is used to shorten the select; I'm actually only querying about 7 of the 50 available columns.
This information is used in many flavors... "Do I have new students? Do I still have all students from a given date? Which students are subscribed after the given date? etc..." The result of this select statement is to be saved in another database.
Would it be possible for me to achieve this with a single view-like class? The information is read-only, so I don't need to be able to modify/create/delete. Or do I have to declare a class for each table (ending up with 6 classes) and remember to filter every time I query?
Thanks for pointers.
EDIT: I don't have modification access to the database (I cannot create a view). Both databases may not be on the same server (so I cannot create a view on my second DB).
My concern is to avoid the full table scan before filtering on condition and choice_id.
EDIT 2: I've set up declarative classes like this:
class BaseStudent(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    unique_id_number = sqlalchemy.Column(sqlalchemy.String(7))
    stateid = sqlalchemy.Column(sqlalchemy.String(12))
    condition = sqlalchemy.Column(sqlalchemy.String(3))

class Student(BaseStudent, Base):
    __tablename__ = 'student'

    choices = orm.relationship('Choice', backref='student')

#class StudentYear(BaseStudent, Base):...
#class StudentSummer(BaseStudent, Base):...

class BaseChoice(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    choice_id = sqlalchemy.Column(sqlalchemy.String(1))
    school = sqlalchemy.Column(sqlalchemy.String(2))
    program = sqlalchemy.Column(sqlalchemy.String(5))

class Choice(BaseChoice, Base):
    __tablename__ = 'choice'
    __table_args__ = (
        sqlalchemy.ForeignKeyConstraint(['id_number', 'semester',],
                                        [Student.id_number, Student.semester,]),
    )

#class ChoiceYear(BaseChoice, Base): ...
#class ChoiceSummer(BaseChoice, Base): ...
Now, the query that gives me correct SQL for one set of tables is:
q = session.query(StudentYear, ChoiceYear) \
    .select_from(StudentYear) \
    .join(ChoiceYear) \
    .filter(StudentYear.condition == 'A') \
    .filter(ChoiceYear.choice_id == '4')
but it throws an exception...
"Could not locate column in row for column '%s'" % key)
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column '*'"
How do I use that query to create myself a class I can use?
If you can create this view on the database, then you simply map the view as if it was a table. See Reflecting Views.
# DB VIEW
CREATE VIEW my_view AS -- @todo: your select statements here

# SA
my_view = Table('my_view', metadata, autoload=True)

# define view object
class ViewObject(object):
    def __repr__(self):
        return "ViewObject %s" % str((self.id_number, self.semester,))

# map the view to the object
view_mapper = mapper(ViewObject, my_view)

# query the view
q = session.query(ViewObject)
for _ in q:
    print _
If you cannot create a VIEW on the database level, you could create a selectable and map the ViewObject to it. The code below should give you the idea:
student_tmp = Table('student_tmp', metadata, autoload=True)
choice_tmp = Table('choice_tmp', metadata, autoload=True)

# your SELECT part with the columns you need
qry = select([student_tmp.c.id_number, student_tmp.c.semester, student_tmp.c.stateid, choice_tmp.c.school])

# your INNER JOIN condition
qry = qry.where(student_tmp.c.id_number == choice_tmp.c.id_number).where(student_tmp.c.semester == choice_tmp.c.semester)

# other WHERE clauses
qry = qry.where(student_tmp.c.condition == 'A')
You can create 3 queries like this, then combine them with union_all and use the resulting query in the mapper:
view_mapper = mapper(ViewObject, my_combined_qry)
In both cases, though, you have to ensure that a primary key is properly defined on the view, and you might need to override the autoloaded view and specify the primary key explicitly (see the link above). Otherwise you will either receive an error or might not get proper results from the query.
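A sketch of that combination step, reusing the reflected tables and ViewObject from above and assuming the *_year and *_summer tables are reflected the same way; the pair_query helper is mine, and the tmp branch's IF() condition is simplified to a fixed choice_id:
from sqlalchemy import select, union_all
from sqlalchemy.orm import mapper

def pair_query(student, choice, choice_id):
    # one SELECT per (student_*, choice_*) table pair, filtered as in the raw SQL
    return (
        select([student.c.id_number, student.c.semester, student.c.stateid,
                choice.c.school, choice.c.program])
        .where(student.c.id_number == choice.c.id_number)
        .where(student.c.semester == choice.c.semester)
        .where(student.c.condition == 'A')
        .where(choice.c.choice_id == choice_id)
    )

my_combined_qry = union_all(
    pair_query(student_tmp, choice_tmp, '3'),   # the IF() on semester is omitted here
    pair_query(student_year, choice_year, '4'),
    pair_query(student_summer, choice_summer, '3'),
).alias('combined')

view_mapper = mapper(
    ViewObject, my_combined_qry,
    # a SELECT/UNION has no primary key of its own, so tell the mapper
    # which columns uniquely identify a row
    primary_key=[my_combined_qry.c.id_number, my_combined_qry.c.semester],
)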
Answer to EDIT-2:
qry = (session.query(StudentYear, ChoiceYear).
select_from(StudentYear).
join(ChoiceYear).
filter(StudentYear.condition == 'A').
filter(ChoiceYear.choice_id == '4')
)
The result will be tuple pairs: (Student, Choice).
But if you want to create a new mapped class for the query, then you can create a selectable as the sample above:
student_tmp = StudentTmp.__table__
choice_tmp = ChoiceTmp.__table__
.... (see sample code above)
This is to show what I ended up doing; any comments welcome.
class JoinedYear(Base):
    __table__ = sqlalchemy.select(
        [
            StudentYear.id_number,
            StudentYear.semester,
            StudentYear.stateid,
            ChoiceYear.school,
            ChoiceYear.program,
        ],
        from_obj=StudentYear.__table__.join(ChoiceYear.__table__),
    ) \
        .where(StudentYear.condition == 'A') \
        .where(ChoiceYear.choice_id == '4') \
        .alias('YearView')
and I will elaborate from there...
Thanks @van

How to do an upsert with SqlAlchemy?

I have a record that I want to exist in the database if it is not there, and if it is there already (primary key exists) I want the fields to be updated to the current state. This is often called an upsert.
The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?
Base = declarative_base()

class Template(Base):
    __tablename__ = 'templates'

    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))

    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name=desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        # default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        # default already exists. Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()
Is there a better or less verbose way of doing this? Something like this would be great:
sess.upsert_this(desired_default, unique_key = "name")
although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it just because SQLAlchemy tends to only work with the primary key. E.g.: I've been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.
A sample use case for this is simply when starting up a server application that may have upgraded its default expected data. ie: no concurrency concerns for this upsert.
SQLAlchemy supports ON CONFLICT with two methods on_conflict_do_update() and on_conflict_do_nothing().
Copying from the documentation:
from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.save_or_update call. This is not an "upsert", but it may be good enough for your needs.
It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing with the given ID exists, or if ID is None, create a new record; else update all other fields in the existing record with that primary key.
However, when there are additional unique constraints, there are logical issues with that simple approach. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.
That would be the reason there is no built in "upsert" operation. The application must define what this means in each particular case.
Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. Those functions are useful but require you to switch from the ORM interface to the lower-level one - SQLAlchemy Core.
Although those two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from providing a complete out-of-the-box solution to upserting.
My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting:
For example, higher-level ORM functionalities we've gotten used to are missing: you cannot use ORM objects, but instead have to provide ForeignKeys at the time of insertion. The other issue, handled below, is rows within the same chunk that collide on a unique constraint.
I'm using the following function I wrote to handle both of those issues:
from sqlalchemy import inspect
from sqlalchemy.schema import UniqueConstraint
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]

    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const,] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
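A hypothetical call, assuming a Product model with an integer primary key and rows supplied as plain dicts keyed by column name:
rows = [
    {"id": 1, "name": "widget"},
    {"id": 2, "name": "gadget"},
]
upsert(session, Product, rows)   # new ids are inserted, existing ones are updated
session.commit()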
I use a "look before you leap" approach:
# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
filter(Switch_Command.switch_id == switch.id).\
filter(Switch_Command.command_id == command.id).first()
# If we didn't get anything, make one
if not switch_command:
switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)
# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()
session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()
The advantage is that this is db-neutral and I think it's clear to read. The disadvantage is that there's a potential race condition in a scenario like the following:
we query the db for a switch_command and don't find one
we create a switch_command
another process or thread creates a switch_command with the same primary key as ours
we try to commit our switch_command
There are multiple answers and here comes yet another answer (YAA). Other answers are not that readable due to the metaprogramming involved. Here is an example that
Uses SQLAlchemy ORM
Shows how to create a row if there are zero rows using on_conflict_do_nothing
Shows how to update the existing row (if any) without creating a new row using on_conflict_do_update
Uses the table primary key as the constraint
A longer example can be found in the original question this code relates to.
import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):
    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                            backref=orm.backref("pair_state",
                                                lazy="dynamic",
                                                cascade="all, delete-orphan",
                                                single_parent=True, ), )

    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if it does not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
The below works fine for me with a Redshift database and will also work for a combined primary key constraint.
SOURCE: this
Just a few modifications are required for creating the SQLAlchemy engine in the function start_engine().
import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    return engine

class DigitalSpend(Base):
    __tablename__ = 'digital_spend'

    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])

def compile_query(query):
    compiler = query.compile if not hasattr(query, 'statement') else query.statement.compile
    return compiler(dialect=postgresql.dialect())

def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__
    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
    )
    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)

session = start_engine()
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
This allows access to the underlying models based on string names
def get_class_by_tablename(tablename):
    """Return class reference mapped to table.

    https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to

    :param tablename: String with name of table.
    :return: Class reference or None.
    """
    for c in Base._decl_class_registry.values():
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
            return c
sqla_tbl = get_class_by_tablename(table_name)
def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts
    """
    try:
        self.active_session().add(table(**record_dict))
    except:
        # Here we'll assume the error is caused by an integrity error.
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine
        # <update>add explicit error handling for each db engine</update>
        self.active_session().rollback()

        # Query for the conflicting row, then use setattr to change values based on the dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # list of primary key col names
        c_tbl_cols = dict(sqla_tbl.__table__.columns)  # string:Column object crosswalk
        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # sub-dict of primary key:value pairs
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # Column object:query value for primary key cols

        c_target_record = session.query(sqla_tbl).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
This works for me with sqlite3 and postgres, although it might fail with combined primary key constraints and will most likely fail with additional unique constraints.
try:
    t = self._meta.tables[data['table']]
except KeyError:
    self._log.error('table "%s" unknown', data['table'])
    return

try:
    q = insert(t, values=data['values'])
    self._log.debug(q)
    self._db.execute(q)
except IntegrityError:
    self._log.warning('integrity error')
    where_clause = [c.__eq__(data['values'][c.name]) for c in t.c if c.primary_key]
    update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
    q = update(t, values=update_dict).where(*where_clause)
    self._log.debug(q)
    self._db.execute(q)
except Exception as e:
    self._log.error('%s: %s', t.name, e)
As we had problems with generated default ids and references, which led to ForeignKeyViolation errors like
update or delete on table "..." violates foreign key constraint
Key (id)=(...) is still referenced from table "...".
we had to exclude the id from the update dict, as otherwise it would always be regenerated as a new default value.
In addition, the method returns the created/updated entity.
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert  # Important to use the postgresql insert

def upsert(session, data, key_columns, model):
    stmt = insert(model).values(data)

    # Important to exclude the ID for update!
    exclude_for_update = [model.id.name, *key_columns]
    update_dict = {c.name: c for c in stmt.excluded if c.name not in exclude_for_update}

    stmt = stmt.on_conflict_do_update(
        index_elements=key_columns,
        set_=update_dict
    ).returning(model)

    orm_stmt = (
        select(model)
        .from_statement(stmt)
        .execution_options(populate_existing=True)
    )
    return session.execute(orm_stmt).scalar()
Example:
class UpsertUser(Base):
    __tablename__ = 'upsert_user'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    name: str = Column(sa.String, nullable=False)
    user_sid: str = Column(sa.String, nullable=False, unique=True)
    house_admin = relationship('UpsertHouse', back_populates='admin', uselist=False)

class UpsertHouse(Base):
    __tablename__ = 'upsert_house'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    admin_id: Id = Column(Id, ForeignKey('upsert_user.id'), nullable=False)
    admin: UpsertUser = relationship('UpsertUser', back_populates='house_admin', uselist=False)

# Usage
upserted_user = upsert(session, updated_user, [UpsertUser.user_sid.name], UpsertUser)
Note: Only tested on postgresql but could work also for other DBs which support ON DUPLICATE KEY UPDATE e.g. MySQL
In case of sqlite, the sqlite_on_conflict='REPLACE' option can be used when defining a UniqueConstraint, and sqlite_on_conflict_unique for unique constraint on a single column. Then session.add will work in a way just like upsert. See the official documentation.
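A minimal sketch of those SQLite-specific options (my own example, not from the answer); on a unique-key collision SQLite then resolves the INSERT with REPLACE, so adding a conflicting row behaves like an upsert:
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Word(Base):
    __tablename__ = 'words'

    id = Column(Integer, primary_key=True)
    # single-column unique constraint with ON CONFLICT REPLACE
    word = Column(String, unique=True, sqlite_on_conflict_unique='REPLACE')

class Pair(Base):
    __tablename__ = 'pairs'

    id = Column(Integer, primary_key=True)
    left = Column(String)
    right = Column(String)
    # multi-column unique constraint with ON CONFLICT REPLACE
    __table_args__ = (UniqueConstraint('left', 'right', sqlite_on_conflict='REPLACE'),)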
I use this code for upsert
Before using this code, you should add primary keys to the table in the database.
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.inspection import inspect
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.dialects.postgresql import insert

def upsert(df, engine, table_name, schema=None, chunk_size=1000):
    metadata = MetaData(schema=schema)
    metadata.bind = engine

    table = Table(table_name, metadata, schema=schema, autoload=True)

    # only use common columns between df and table
    table_columns = {column.name for column in table.columns}
    df_columns = set(df.columns)
    intersection_columns = table_columns.intersection(df_columns)
    df1 = df[list(intersection_columns)]
    records = df1.to_dict('records')

    # get list of fields making up primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    with engine.connect() as conn:
        chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
        for chunk in chunks:
            stmt = insert(table).values(chunk)
            update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
            s = stmt.on_conflict_do_update(
                index_elements=primary_keys,
                set_=update_dict)
            conn.execute(s)
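A hypothetical call, assuming a pandas DataFrame whose column names match the target table and an existing table named my_table:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://localhost:5432/mydb')  # assumed URL
df = pd.DataFrame([{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}])

# rows whose primary key already exists are updated, the rest are inserted
upsert(df, engine, 'my_table')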
