How to deal with SQLAlchemy's DB table blocking policy? - python

Can someone explain how I can avoid the application freezing when, for example, I have a list of entities and the ability to move to detail pages?
So I open the list and one SQLAlchemy session starts; then I open a detail page and another session starts, then another one, and the application freezes, because one session blocks another.
I cannot use one session for the whole application, because then I can't tell whether something was edited on a form just by checking the session's dirty, new and deleted attributes, and application state handling becomes a hell of fragile, unreadable code.
Do I need to implement some other kind of session handling policy?
Do I need to tune the SQLAlchemy mapping or SQL Server?
Here is the minimal working example:
from sqlalchemy import MetaData, Table, Column, FetchedValue, ForeignKey, create_engine
from sqlalchemy.types import BigInteger, String
from sqlalchemy.orm import mapper, relationship, sessionmaker, Session

class Ref(object):
    id = None
    name = None
    id_parent = None

class TableMapper(object):
    def __init__(self, metadata, mapped_type):
        self._table = None
        self._mapped_type = mapped_type

    def get_table(self):
        return self._table

    def set_table(self, table):
        assert isinstance(table, Table)
        self._table = table

class RefTableMapper(TableMapper):
    def __init__(self, metadata):
        TableMapper.__init__(self, metadata, Ref)
        self.set_table(Table('Ref', metadata,
            Column('id', BigInteger,
                   primary_key = True, nullable = False),
            Column('name', String),
            Column('id_parent', BigInteger,
                   ForeignKey('Ref.id'))
        ))

    def map_table(self):
        r_parent = relationship(Ref,
            uselist = False,
            remote_side = [self._table.c.id],
            primaryjoin = (
                self._table.c.id_parent == self._table.c.id))
        mapper(Ref, self._table,
               properties = {'parent': r_parent})
        return self._table

class Mapper(object):
    def __init__(self, url, echo = False):
        self._engine = create_engine(url, echo = echo)
        self._metadata = MetaData(self._engine)
        self._Session = sessionmaker(bind = self._engine, autoflush = False)
        ref_t = RefTableMapper(self._metadata).map_table()

    def create_session(self):
        return self._Session()

if __name__ == '__main__':
    mapp = Mapper(r'mssql://username:pwd@Server\SQLEXPRESS/DBName', True)
    s = mapp.create_session()
    rr = s.query(Ref).all()
    s1 = mapp.create_session()
    merged = s1.merge(rr)
    merged.flush()
    s2 = mapp.create_session()
    rr1 = s2.query(Ref).all() #application freezes!

SQL Server's default isolation mode locks entire tables very aggressively. (The above example seems like perhaps you're emitting an UPDATE and then emitting a SELECT in a different transaction while the previous transaction is pending, though session.merge() does not accept a list and the contents of the table aren't specified above, so it's difficult to say.)
Anyway, it's typical practice to enable multi-version concurrency control (SQL Server calls it "row versioning") so that it has a reasonable ability to lock individual rows against each other instead of full tables:
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON
Detail on this is available at http://msdn.microsoft.com/en-us/library/ms175095.aspx .
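If you also want to opt in from the client side, newer SQLAlchemy versions can set the transaction isolation level on the engine. A minimal sketch, not from the original answer; the URL and ODBC driver name are placeholders, and ALLOW_SNAPSHOT_ISOLATION must already be ON for SNAPSHOT to be accepted:
from sqlalchemy import create_engine

# placeholder URL and driver name; SNAPSHOT requires ALLOW_SNAPSHOT_ISOLATION ON
engine = create_engine(
    'mssql+pyodbc://username:pwd@Server/DBName?driver=ODBC+Driver+17+for+SQL+Server',
    isolation_level='SNAPSHOT',
)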

Related

Create a Calculated Column in SQLAlchemy Core Table

I have a SQLAlchemy table defined as following:
user = Table('user', MetaData(),
    Column('id', Integer),
    Column('first_name', String),
    Column('last_name', String))
I often need to refer to the full_name of a user in queries, for example:
sql = (select([
        user.c.id,
        (user.c.first_name + user.c.last_name).label('full_name')
    ]).where(user.c.id == 123))
Since full_name is used in many places, this kind of code gets copied around a lot.
I wonder whether there is a way in SQLAlchemy to create a calculated Column that I can use just like any other Column, with SQLAlchemy automatically expanding it to (user.c.first_name + user.c.last_name).label('full_name') whenever I refer to user.c.full_name:
sql = (select([
        user.c.id,
        user.c.full_name
    ]).where(user.c.id == 123))
I searched and found some solutions in the SQLAlchemy ORM using column_property or hybrid_property. The difference in my case is that I can only use SQLAlchemy Core.
It is not possible to create calculated columns in SQLAlchemy Core. You don't need to, however; all that is required is to save your expression in a variable and then use that in your select statements. If you have many of these to store, you can namespace them by keeping them all in a collection. In the example below I've used an SQLAlchemy Properties object for this purpose, so it behaves in a similar manner to the columns collection.
class Table(sqlalchemy.Table):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.d = sqlalchemy.util._collections.Properties({})

user = Table('user', MetaData(),
    Column('id', Integer),
    Column('first_name', String),
    Column('last_name', String))

user.d.full_name = (user.c.first_name + user.c.last_name).label('full_name')
user.d.backwards_name = (user.c.last_name + user.c.first_name).label('backwards_name')

sql = (select([
        user.c.id,
        user.d.full_name
    ]).where(user.c.id == 123))
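If subclassing Table feels heavyweight, the same idea works with a plain module-level dict (a minimal sketch; the user_exprs name is mine, not from the answer):
from sqlalchemy import select

# a plain dict of reusable labelled expressions for the user table above
user_exprs = {
    'full_name': (user.c.first_name + user.c.last_name).label('full_name'),
}

sql = (select([
        user.c.id,
        user_exprs['full_name']
    ]).where(user.c.id == 123))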

Why does a query invoke an auto-flush in SQLAlchemy?

The code below is just a sample, but it works to reproduce this error:
sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush;
consider using a session.no_autoflush block if this flush is occurring prematurely)
(sqlite3.IntegrityError) NOT NULL constraint failed: X.nn
[SQL: 'INSERT INTO "X" (nn, val) VALUES (?, ?)'] [parameters: (None, 1)]
A mapped instance is added to a session. The instance wants to know (which means a query on the database) whether other instances of its own type exist with the same values. There is a second attribute/column (_nn). It is specified as NOT NULL, but by default it is NULL.
While the instance (as in the sample) is still pending in the session, a call to query.one() invokes an auto-flush. This flush creates an INSERT which tries to store the instance. That fails because _nn is still NULL and violates the NOT NULL constraint.
That is what I understand currently.
But the question is: why does it invoke an auto-flush? Can I block that?
#!/usr/bin/env python3
import os.path
import os
import sqlalchemy as sa
import sqlalchemy.orm as sao
import sqlalchemy.ext.declarative as sad
from sqlalchemy_utils import create_database

_Base = sad.declarative_base()
session = None

class X(_Base):
    __tablename__ = 'X'
    _oid = sa.Column('oid', sa.Integer, primary_key=True)
    _nn = sa.Column('nn', sa.Integer, nullable=False)  # NOT NULL!
    _val = sa.Column('val', sa.Integer)

    def __init__(self, val):
        self._val = val

    def test(self, session):
        q = session.query(X).filter(X._val == self._val)
        x = q.one()
        print('x={}'.format(x))

dbfile = 'x.db'

def _create_database():
    if os.path.exists(dbfile):
        os.remove(dbfile)
    engine = sa.create_engine('sqlite:///{}'.format(dbfile), echo=True)
    create_database(engine.url)
    _Base.metadata.create_all(engine)
    return sao.sessionmaker(bind=engine)()

if __name__ == '__main__':
    session = _create_database()
    for val in range(3):
        x = X(val)
        x._nn = 0
        session.add(x)
    session.commit()

    x = X(1)
    session.add(x)
    x.test(session)
Of course a solution would be to not add the instance to the session before query.one() is called. That works, but in my real use case (too complex for this question) it isn't a nice solution.
How to turn off the autoflush feature:
Temporarily: you can use the no_autoflush context manager around the snippet where you query the database, i.e. in the X.test method:
def test(self, session):
    with session.no_autoflush:
        q = session.query(X).filter(X._val == self._val)
        x = q.one()
        print('x={}'.format(x))
Session-wide: just pass autoflush=False to your sessionmaker:
return sao.sessionmaker(bind=engine, autoflush=False)()
I know this is old, but it might be helpful for others who are getting this error while using Flask-SQLAlchemy. The code below fixed my issue with autoflush:
db = SQLAlchemy(session_options={"autoflush": False})

Updating row in SqlAlchemy ORM

I am trying to obtain a row from the DB, modify that row and save it again.
Everything using SQLAlchemy.
My code:
from sqlalchemy import Column, DateTime, Integer, String, Table, MetaData
from sqlalchemy.orm import mapper
from sqlalchemy import create_engine, orm

metadata = MetaData()
product = Table('product', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(1024), nullable=False, unique=True),
)

class Product(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

mapper(Product, product)
db = create_engine('sqlite:////' + db_path)
sm = orm.sessionmaker(bind=db, autoflush=True, autocommit=True, expire_on_commit=True)
session = orm.scoped_session(sm)

result = session.execute("select * from product where id = :id", {'id': 1}, mapper=Product)
prod = result.fetchone()  # there are many products in db so query is ok
prod.name = 'test'  # <- here I got AttributeError: 'RowProxy' object has no attribute 'name'
session.add(prod)
session.flush()
Unfortunately it does not work, because I am trying to modify a RowProxy object. How can I do what I want (load, change and save/update a row) the SQLAlchemy ORM way?
I assume that your intention is to use the Object-Relational API.
So to update a row in the db, you'll need to load the mapped object from the table record and update the object's property.
Please see the code example below.
Please note I've added example code for creating a new mapped object and creating the first record in the table; there is also commented-out code at the end for deleting the record.
from sqlalchemy import Column, DateTime, Integer, String, Table, MetaData
from sqlalchemy.orm import mapper
from sqlalchemy import create_engine, orm

metadata = MetaData()
product = Table('product', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(1024), nullable=False, unique=True),
)

class Product(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return "%s(%r,%r)" % (self.__class__.__name__, self.id, self.name)

mapper(Product, product)
db = create_engine('sqlite:////temp/test123.db')
metadata.create_all(db)
sm = orm.sessionmaker(bind=db, autoflush=True, autocommit=True, expire_on_commit=True)
session = orm.scoped_session(sm)

# create new Product record:
if session.query(Product).filter(Product.id == 1).count() == 0:
    new_prod = Product("1", "Product1")
    print "Creating new product: %r" % new_prod
    session.add(new_prod)
    session.flush()
else:
    print "product with id 1 already exists: %r" % session.query(Product).filter(Product.id == 1).one()

print "loading Product with id=1"
prod = session.query(Product).filter(Product.id == 1).one()
print "current name: %s" % prod.name
prod.name = "new name"
print prod
prod.name = 'test'
session.add(prod)
session.flush()
print prod

#session.delete(prod)
#session.flush()
PS: SQLAlchemy also provides a SQL Expression API that allows you to work with table records directly, without creating mapped objects. In my practice we use the Object-Relational API in most applications; sometimes we use the SQL Expression API when we need to perform low-level db operations efficiently, such as inserting or updating thousands of records with one query.
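For illustration, a minimal Expression-API sketch of the same rename, reusing the product table from the example above (written in the same pre-2.0 style as the answer; a sketch, not part of the original answer):
from sqlalchemy import update

# Core-level UPDATE: works on the table directly, no mapped objects involved
stmt = update(product).where(product.c.id == 1).values(name='new name')
db.execute(stmt)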
Direct links to SQLAlchemy documentation:
Object Relational Tutorial
SQL Expression Language Tutorial

Cannot move object from one database to another

I am trying to move an object from one database into another. The mappings are the same but the tables are different. This is a merging tool where data from an old database need to be imported into a new one. Still, I think I am missing something fundamental about SQLAlchemy here. What is it?
from sqlalchemy import Column, Float, String, Enum
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import orm
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DeclarativeBase = declarative_base()

class Datum (DeclarativeBase):
    __tablename__ = "xUnitTestData"
    Key = Column(String, primary_key=True)
    Value = Column(Float)

    def __init__ (self, k, v):
        self.Key = k
        self.Value = v

src_engine = create_engine('sqlite:///:memory:', echo=False)
dst_engine = create_engine('sqlite:///:memory:', echo=False)
DeclarativeBase.metadata.create_all(src_engine)
DeclarativeBase.metadata.create_all(dst_engine)

SessionSRC = sessionmaker(bind=src_engine)
SessionDST = sessionmaker(bind=dst_engine)

item = Datum('eek', 666)
session1 = SessionSRC()
session1.add(item)
session1.commit()
session1.close()

session2 = SessionDST()
session2.add(item)
session2.commit()
print item in session2            # >>> True
print session2.query(Datum).all() # >>> []
session2.close()
I'm not really aware of what happens under the hood, but in the ORM pattern an object maps to a particular row in a particular table. Trying to add the same object to two different tables in two different databases doesn't sound like good practice, even if the table definitions are exactly the same.
What I'd do to work around this problem is just create a new object that is a copy of the original and add it to the database:
session1 = SessionSRC()
session1.add(item)
session1.commit()
new_item = Datum(item.Key, item.Value)
session2 = SessionDST()
session2.add(new_item)
session2.commit()
print new_item in session2 # >>> True
print session2.query(Datum).all() # >>> [<__main__.Datum object at 0x.......>]
session2.close()
session1.close()
Note that session1 isn't closed immediately, so the original object's attributes can still be read while creating the new object.
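If you really do want to reuse the same instance instead of copying it, one alternative (a sketch of my own, not from the original answer) is to detach it and strip its identity with sqlalchemy.orm.make_transient so it can be inserted again:
from sqlalchemy.orm import make_transient

session1 = SessionSRC()
session1.add(item)
session1.commit()
# with the default expire_on_commit=True the attributes are expired after
# commit(), so reload them while the source session is still usable
session1.refresh(item)
session1.expunge(item)   # detach the instance from the source session
make_transient(item)     # erase its identity so it can be INSERTed anew
session1.close()

session2 = SessionDST()
session2.add(item)
session2.commit()
session2.close()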

How to do an upsert with SqlAlchemy?

I have a record that I want to exist in the database if it is not there, and if it is there already (primary key exists) I want the fields to be updated to the current state. This is often called an upsert.
The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?
Base = declarative_base()

class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key = True)
    name = Column(String(80), unique = True, index = True)
    template = Column(String(80), unique = True)
    description = Column(String(200))

    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name = desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        # default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        # default already exists. Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()
Is there a better or less verbose way of doing this? Something like this would be great:
sess.upsert_this(desired_default, unique_key = "name")
although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it just because SQLAlchemy tends to only work with the primary key. E.g.: I've been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.
A sample use case for this is simply when starting up a server application that may have upgraded its default expected data. ie: no concurrency concerns for this upsert.
SQLAlchemy supports ON CONFLICT with two methods on_conflict_do_update() and on_conflict_do_nothing().
Copying from the documentation:
from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.save_or_update call. This is not an "upsert", but it may be good enough for your needs.
It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing with the given ID exists, or if ID is None, create a new record; else update all other fields in the existing record with that primary key.
However, when there are additional unique constraints, there are logical issues with that simple approach. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.
That would be the reason there is no built-in "upsert" operation. The application must define what this means in each particular case.
Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. Those functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.
Although those two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from providing a complete out-of-the-box solution to upserting.
My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting:
For example, the higher-level ORM functionality we've gotten used to is missing. You cannot use ORM objects, but instead have to provide ForeignKeys at the time of insertion.
I'm using the following function I wrote to handle both of those issues:
from sqlalchemy import inspect, UniqueConstraint
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column
                    for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]

    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
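A hypothetical call, for illustration (the User model and its columns are invented, not part of the answer):
# assumes User is a declarative model with primary key "id" and a "name" column
rows = [
    {'id': 1, 'name': 'alice'},
    {'id': 2, 'name': 'bob'},
]
upsert(session, User, rows)
session.commit()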
I use a "look before you leap" approach:
# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
    filter(Switch_Command.switch_id == switch.id).\
    filter(Switch_Command.command_id == command.id).first()

# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)

# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()
session.add(switch_command)

# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()
The advantage is that this is db-neutral and I think it's clear to read. The disadvantage is that there's a potential race condition in a scenario like the following:
1. we query the db for a switch_command and don't find one
2. we create a switch_command
3. another process or thread creates a switch_command with the same primary key as ours
4. we try to commit our switch_command
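A common mitigation (a sketch of my own, not part of the original answer) is to catch the IntegrityError raised by the losing commit and fall back to updating the row the other writer created:
from sqlalchemy.exc import IntegrityError

try:
    session.commit()
except IntegrityError:
    session.rollback()
    # someone else inserted the row first; load theirs and update it
    switch_command = session.query(Switch_Command).\
        filter(Switch_Command.switch_id == switch.id).\
        filter(Switch_Command.command_id == command.id).one()
    switch_command.output = 'Hooray!'
    switch_command.lastseen = datetime.datetime.utcnow()
    session.commit()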
There are multiple answers, and here comes yet another answer (YAA). The other answers are not that readable due to the metaprogramming involved. Here is an example that:
- uses the SQLAlchemy ORM
- shows how to create a row if there are zero rows, using on_conflict_do_nothing
- shows how to update the existing row (if any) without creating a new row, using on_conflict_do_update
- uses the table primary key as the constraint
A longer example is in the original question this code is related to.
import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):
    __tablename__ = "pair_state"

    # This table has a 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                            backref=orm.backref("pair_state",
                                                lazy="dynamic",
                                                cascade="all, delete-orphan",
                                                single_parent=True, ), )

    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if it does not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
The below works fine for me with a Redshift database and will also work for a combined primary key constraint.
SOURCE: this
Just a few modifications are required for creating the SQLAlchemy engine in the function start_engine().
import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    return engine

class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])

def compile_query(query):
    compiler = query.compile if not hasattr(query, 'statement') else query.statement.compile
    return compiler(dialect=postgresql.dialect())

def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
    )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)

session = start_engine()
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
This allows access to the underlying models based on string names
def get_class_by_tablename(tablename):
    """Return class reference mapped to table.

    https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to

    :param tablename: String with name of table.
    :return: Class reference or None.
    """
    for c in Base._decl_class_registry.values():
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
            return c

sqla_tbl = get_class_by_tablename(table_name)

def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts
    """
    try:
        self.active_session().add(table(**record_dict))
    except:
        # Here we'll assume the error is caused by an integrity error
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine
        # <update>add explicit error handling for each db engine</update>
        active_session.rollback()

        # Query for the conflicting row, use the update method to change values based on the dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # list of primary key col names
        c_tbl_cols = dict(sqla_tbl.__table__.columns)  # String:Col Object crosswalk

        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # sub-dict from data of primary key:values
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # col-object:query value for primary key cols

        c_target_record = session.query(sqla_tbl).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
This works for me with sqlite3 and postgres, though it might fail with combined primary key constraints and will most likely fail with additional unique constraints.
try:
    t = self._meta.tables[data['table']]
except KeyError:
    self._log.error('table "%s" unknown', data['table'])
    return

try:
    q = insert(t, values=data['values'])
    self._log.debug(q)
    self._db.execute(q)
except IntegrityError:
    self._log.warning('integrity error')
    where_clause = [c.__eq__(data['values'][c.name]) for c in t.c if c.primary_key]
    update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
    q = update(t, values=update_dict).where(*where_clause)
    self._log.debug(q)
    self._db.execute(q)
except Exception as e:
    self._log.error('%s: %s', t.name, e)
As we had problems with generated default ids and references, which lead to ForeignKeyViolation errors like
update or delete on table "..." violates foreign key constraint
Key (id)=(...) is still referenced from table "...".
we had to exclude the id from the update dict, as otherwise it would always be regenerated as a new default value.
In addition, the method returns the created/updated entity.
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert  # Important to use the postgresql insert

def upsert(session, data, key_columns, model):
    stmt = insert(model).values(data)

    # Important to exclude the ID for the update!
    exclude_for_update = [model.id.name, *key_columns]
    update_dict = {c.name: c for c in stmt.excluded if c.name not in exclude_for_update}

    stmt = stmt.on_conflict_do_update(
        index_elements=key_columns,
        set_=update_dict
    ).returning(model)

    orm_stmt = (
        select(model)
        .from_statement(stmt)
        .execution_options(populate_existing=True)
    )
    return session.execute(orm_stmt).scalar()
Example:
class UpsertUser(Base):
    __tablename__ = 'upsert_user'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    name: str = Column(sa.String, nullable=False)
    user_sid: str = Column(sa.String, nullable=False, unique=True)
    house_admin = relationship('UpsertHouse', back_populates='admin', uselist=False)

class UpsertHouse(Base):
    __tablename__ = 'upsert_house'

    id = Column(Id, primary_key=True, default=uuid.uuid4)
    admin_id: Id = Column(Id, ForeignKey('upsert_user.id'), nullable=False)
    admin: UpsertUser = relationship('UpsertUser', back_populates='house_admin', uselist=False)

# Usage
upserted_user = upsert(session, updated_user, [UpsertUser.user_sid.name], UpsertUser)
Note: only tested on PostgreSQL, but a similar approach could also work for other DBs which support ON CONFLICT / ON DUPLICATE KEY UPDATE, e.g. MySQL.
In the case of SQLite, the sqlite_on_conflict='REPLACE' option can be used when defining a UniqueConstraint, and sqlite_on_conflict_unique for a unique constraint on a single column. Then session.add will work in a way just like an upsert. See the official documentation.
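A minimal sketch of that option (the table and column names are illustrative, not from the answer):
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Word(Base):
    __tablename__ = 'words'
    id = Column(Integer, primary_key=True)
    word = Column(String, nullable=False)
    # emitted as UNIQUE (word) ON CONFLICT REPLACE in the CREATE TABLE DDL
    __table_args__ = (UniqueConstraint('word', sqlite_on_conflict='REPLACE'),)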
I use this code for upserts.
Before using this code, you should add primary keys to the table in the database.
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.inspection import inspect
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.dialects.postgresql import insert

def upsert(df, engine, table_name, schema=None, chunk_size=1000):
    metadata = MetaData(schema=schema)
    metadata.bind = engine

    table = Table(table_name, metadata, schema=schema, autoload=True)

    # only use common columns between df and table
    table_columns = {column.name for column in table.columns}
    df_columns = set(df.columns)
    intersection_columns = table_columns.intersection(df_columns)
    df1 = df[list(intersection_columns)]
    records = df1.to_dict('records')

    # get the list of fields making up the primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    with engine.connect() as conn:
        chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
        for chunk in chunks:
            stmt = insert(table).values(chunk)
            update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
            s = stmt.on_conflict_do_update(
                index_elements=primary_keys,
                set_=update_dict)
            conn.execute(s)
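A hypothetical call, for illustration (the URL, table name and dataframe are invented, not from the answer):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://localhost:5432/mydb')  # placeholder URL
df = pd.DataFrame([{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}])
upsert(df, engine, 'users')  # the "users" table must already exist with a primary key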
