Why don't duplicates in a relationship violate a UniqueConstraint? - python

With the below models, why does the following interactive session succeed in adding duplicate associations to a relationship during the same transaction? I expected (and need) it to fail with the UniqueConstraint placed on the association table.
Models:
from app import db  # this is the access to SQLAlchemy

# association table defined first so the relationship below can reference it
LinkUserSizeShirtDressSleeve = db.Table(
    'link_user_size_shirt_dress_sleeve',
    db.Column(
        'size_id',
        db.Integer,
        db.ForeignKey('size_key_shirt_dress_sleeve.id'), primary_key=True),
    db.Column(
        'user_id',
        db.Integer,
        db.ForeignKey('user.id'), primary_key=True),
    db.UniqueConstraint('size_id', 'user_id', name='uq_association')
)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sz_shirt_dress_sleeve = db.relationship(
        'SizeKeyShirtDressSleeve',
        secondary=LinkUserSizeShirtDressSleeve,
        backref=db.backref('users', lazy='dynamic'),
        order_by="asc(SizeKeyShirtDressSleeve.id)")

class SizeKeyShirtDressSleeve(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    size = db.Column(db.Integer)

    def __repr__(self):
        return 'Dress shirt sleeve size: %r' % self.size
Because of the UniqueConstraint on the association table, I expected this interactive session to cause an IntegrityError. It doesn't and allows me to associate the same size twice:
>>> from app.models import User, SizeKeyShirtDressSleeve
>>> db.session.add(User(id=8))
>>> db.session.commit()
>>> u = User.query.filter(User.id==8).one()
>>> u
<User id: 8, email: None, password_hash: None>
>>> u.sz_shirt_dress_sleeve
[]
>>> should_cause_error = SizeKeyShirtDressSleeve.query.first()
>>> should_cause_error
Dress shirt sleeve size: 3000
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> u.sz_shirt_dress_sleeve
[Dress shirt sleeve size: 3000, Dress shirt sleeve size: 3000]
>>> db.session.commit()
>>>
Wait, what? Isn't that relationship representative of what is in my association table? I guess I should verify that:
(immediately after, same session)
>>> from app.models import LinkUserSizeShirtDressSleeve as Sleeve
>>> db.session.query(Sleeve).filter(Sleeve.c.user_id==8).all()
[(1, 8)]
>>>
So u.sz_shirt_dress_sleeve wasn't accurately representing the state of the association table. ...Okay. But I need it to. In fact, I do know it will fail if I do try to add another should_cause_error object to the relationship:
>>> u.sz_shirt_dress_sleeve.append(should_cause_error)
>>> db.session.commit()
# huge stack trace
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: link_user_size_shirt_dress_sleeve.size_id, link_user_size_shirt_dress_sleeve.user_id [SQL: 'INSERT INTO link_user_size_shirt_dress_sleeve (size_id, user_id) VALUES (?, ?)'] [parameters: (1, 8)] (Background on this error at: http://sqlalche.me/e/gkpj)
>>>
Great! Okay, so things I'm inferring:
1) It's possible to have duplicate items in the relationship list.
2) It's possible for the relationship list to not accurately reflect the state of the association table it is responsible for.
3) The UniqueConstraint works ...as long as I continue interacting with the relationship in separate transactions (punctuated by session.commit()).
Questions: Are 1), 2), or 3) incorrect? And how can I prevent duplicate items being present in my relationship list inside the same transaction?

Those three things are all correct. 3) should be qualified: the UniqueConstraint always works in the sense that your database will never be inconsistent; it just doesn't give you an error unless the relationship you're adding is already flushed.
The fundamental reason this happens is an impedance mismatch between an association table in SQL and its representation in SQLAlchemy. A table in SQL is a multiset of tuples, so with that UNIQUE constraint, your LinkUserSizeShirtDressSleeve table is a set of (size_id, user_id) tuples. On the other hand, the default representation of a relationship in SQLAlchemy is an ordered list of objects, but it imposes some limitations on the way it maintains this list and the way it expects you to interact with this list, so it behaves more like a set in some ways. In particular, it silently ignores duplicate entries in your association table (if you happen to not have a UNIQUE constraint), and it assumes that you never add duplicate objects to this list in the first place.
If this is a problem for you, just make the behavior more in line with SQL by using collection_class=set on your relationship. If you want an error to be raised when you add duplicate entries into the relationship, create a custom collection class based on set that fails on duplicate adds. In some of my projects, I've resorted to monkey-patching the relationship constructor to set collection_class=set on all of my relationships to make this less verbose.
Here's how I would write such a custom collection class:
class UniqueSet(set):
    def add(self, el):
        if el in self:
            raise ValueError("Value already exists")
        super().add(el)
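And here is how it might be wired in (a sketch against the models above; order_by is dropped because a set is unordered, and with a set collection you call add() rather than append()):
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sz_shirt_dress_sleeve = db.relationship(
        'SizeKeyShirtDressSleeve',
        secondary=LinkUserSizeShirtDressSleeve,
        collection_class=UniqueSet,  # duplicate adds now raise ValueError
        backref=db.backref('users', lazy='dynamic'))
With this in place, the second add of the same size in the interactive session above fails immediately, inside the transaction, instead of silently producing a duplicate list entry.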

Related

Flask sqlalchemy filter objects in relationship for each object

How to filter objects in relationship in one command?
Example filter: I have a list of children and every child has toys. Show me how to filter each child's toys so that the list child.toys contains only red toys.
class Child(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(40))
    toys = db.relationship('Toy', lazy='dynamic')

class Toy(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    color = db.Column(db.String(40))
    child_id = db.Column(db.Integer, db.ForeignKey('child.id'))
Child table:

Child id | Child name
---------+-----------
       1 | First
       2 | Second

Toy table:

Toy id | Toy color | Toy child_id
-------+-----------+-------------
     1 | Blue      |            1
     2 | Red       |            1
     3 | Orange    |            2
     4 | Red       |            2
Desired output in python list:
Child id | Child name | Toy id | Toy color
---------+------------+--------+----------
       1 | First      |      2 | Red
       2 | Second     |      4 | Red
Edit:
this table will be printed by:
for child in filtered_children:
    for toy in child.toys:
        print(f'{child.id} {child.name} {toy.id} {toy.color}')
Dynamic Relationship Loaders provide the correct result, but you have to iterate like this in a for loop:
children = Child.query.all()
for child in children:
    child.toys = child.toys.filter_by(color='Red')
len(children[0].toys)  # should be one
len(children[1].toys)  # should be one
Is there a way to filter objects from a relationship without a for loop?
Edit:
Reformulated question: Is there a way to apply the filter at the outer query so that no additional filtering needs to be done inside the for loop, i.e. so that the child.toys attribute for each child in the loop contains only Red toys?
Since it appears you already configured some rudimentary relationship that allowed the usage of join conditions, the next step is simply to actually use the Query.join() call.
My example below uses SQLAlchemy directly, so you may need to adapt the references to those specific to Flask-SQLAlchemy (e.g. instead of creating a session you may instead use db.session, which I referenced from the Quickstart; also, the Child and Toy classes I used inherit from a common sqlalchemy.ext.declarative.declarative_base() for the same reason, but otherwise the same principles introduced below should be commonly applicable).
First, import and set up the classes - to keep this generalized, I will be using sqlalchemy directly as noted:
from sqlalchemy import Column, Integer, String
from sqlalchemy import create_engine, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, contains_eager

Base = declarative_base()

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    name = Column(String(40))
    toys = relationship('Toy')

class Toy(Base):
    __tablename__ = 'toy'
    id = Column(Integer, primary_key=True)
    color = Column(String(40))
    child_id = Column(Integer, ForeignKey('child.id'))
Note that we have removed the extra keyword arguments for the Child.toys relationship construct, as the specified 'dynamic' style of loading is incompatible with the query desired.
Then set up the engine and add your data provided from your question:
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
session.add(Child(name='First'))
session.add(Child(name='Second'))
session.add(Toy(color='Blue', child_id=1))
session.add(Toy(color='Red', child_id=1))
session.add(Toy(color='Orange', child_id=2))
session.add(Toy(color='Red', child_id=2))
session.commit()
With the data added, we can see in the documentation for loading relationships from SQLAlchemy that there is a rather comprehensive relationship loading API, and contains_eager is the suitable one for your use case, as it indicates "that the given attribute should be eagerly loaded from columns stated manually in the query." So let's see it in action:
filtered_children = session.query(Child).join(Child.toys).filter(
    Toy.color == 'Red'
).options(
    contains_eager(Child.toys)
).all()
This query ensures that the toys relationship declared through the Child is joined with the query, and that we also filter by "Red" toys, with the option to indicate that Child.toys should be "eager loaded from columns stated manually in the query" such that it will become available through each of the returned child object.
Now, see that your most recent desired for loop over filtered_children and their toys produces your desired output:
for child in filtered_children:
    for toy in child.toys:
        print(f'{child.id} {child.name} {toy.id} {toy.color}')
The following output should be produced:
1 First 2 Red
2 Second 4 Red
If we had logging enabled, we would see the following output, which shows the SELECT statement issued by SQLAlchemy's engine:
INFO:sqlalchemy.engine.base.Engine:SELECT toy.id AS toy_id, toy.color AS toy_color, toy.child_id AS toy_child_id, child.id AS child_id, child.name AS child_name
FROM child JOIN toy ON child.id = toy.child_id
WHERE toy.color = ?
INFO:sqlalchemy.engine.base.Engine:('Red',)
DEBUG:sqlalchemy.engine.base.Engine:Col ('toy_id', 'toy_color', 'toy_child_id', 'child_id', 'child_name')
DEBUG:sqlalchemy.engine.base.Engine:Row (2, 'Red', 1, 1, 'First')
DEBUG:sqlalchemy.engine.base.Engine:Row (4, 'Red', 2, 2, 'Second')
Note that only a single SELECT ... JOIN query was done and no additional queries were made for each of the Child, which would result in significant performance impact.
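One caveat worth noting: after this query, each Child.toys collection in the session holds only the filtered rows. If the Child objects might already be in the identity map from an earlier query, it may be safer to ask the query to overwrite them; to the best of my knowledge, Query has a populate_existing() method for exactly this:
filtered_children = session.query(Child).join(Child.toys).filter(
    Toy.color == 'Red'
).options(
    contains_eager(Child.toys)
).populate_existing().all()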

python sqlalchemy bulk_save_objects doesn't use bulk

Continuing from my previous post,
I'm trying to use bulk_save_objects for a list of objects (the objects don't have a PK value, so it should be generated for each object). When I use bulk_save_objects I see an insert per object instead of one insert for all objects.
The code:
class Product(Base):
    __tablename__ = 'products'
    id = Column('id', BIGINT, primary_key=True)
    barcode = Column('barcode', BIGINT)
    productName = Column('name', TEXT, nullable=False)
    objectHash = Column('objectHash', TEXT, unique=True, nullable=False)

    def __init__(self, productData, picture=None):
        self.barcode = productData[ProductTagsEnum.barcode.value]
        self.productName = productData[ProductTagsEnum.productName.value]
        self.objectHash = md5((str(self.barcode) + self.productName).encode('utf-8')).hexdigest()
Another class contains the following save method:
def saveNewProducts(self, products):
    Session = sessionmaker()
    session = Session()
    productsHashes = [product.objectHash for product in products]
    query = session.query(Product.objectHash).filter(Product.objectHash.in_(productsHashes))
    existedHashes = {h for (h,) in query.all()}
    newProducts = [product for product in products if product.objectHash not in existedHashes]
    # also tried: session.bulk_save_objects(newProducts, preserve_order=False)
    session.bulk_save_objects(newProducts)
UPDATE 1
Following what @Ilja Everilä recommended in the comments, I added a few parameters to the connection string:
engine = create_engine('postgresql://postgres:123@localhost:5432/mydb', pool_size=25, max_overflow=0,
                       executemany_mode='values',
                       executemany_values_page_size=10000, executemany_batch_page_size=500,
                       echo=True)
In the console I saw multiple inserts with the following format:
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine INSERT INTO products (barcode, productName, objectHash) VALUES (%(barcode)s, %(productName)s, %(objectHash)s) RETURNING products.id
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine {'barcode': '5008251', 'productName': 'ice ream', 'objectHash': 'b2752233ec523f2e874dc95b70020ae5'}
In my case, the solution I used: I deleted the id column and set the objectHash as the PK, and afterwards the bulk_save_objects and add_all functions worked and actually did a bulk insert. It seems those functions only batch the inserts if you already have the PK set on the object.
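The underlying trade-off, as far as I understand it: fetching server-generated primary keys back requires RETURNING, which cannot be combined with a single batched executemany(), so SQLAlchemy falls back to one INSERT per object. A sketch of the two modes (return_defaults is an existing flag of bulk_save_objects):
# With objectHash as the primary key (already computed in __init__),
# nothing needs to come back from the database, so the rows can be
# batched into a single executemany():
session.bulk_save_objects(newProducts)

# By contrast, asking for the server-generated defaults forces one
# INSERT ... RETURNING per object, defeating the batching:
session.bulk_save_objects(newProducts, return_defaults=True)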

SQLAlchemy: How to keep the record ID and its relationship record in sync for new records (pre-commit)?

When creating new records, I'd expect that foreign key fields, and their relationship object would stay in sync (if I change one the other would change to reflect), but this doesn't seem to be the case. Is this possible to do?
Given the following:
Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
    equipment = relationship('Equipment', backref='user')

class Equipment(Base):
    __tablename__ = 'equipment'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=False)
    name = Column(String)

engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)
conn = session()

conn.add_all([
    User(name='bill', fullname='Bill W.', password='rlrrlrll'),      # id=1
    User(name='tony', fullname='Tony I.', password='EADGBe'),        # id=2
    User(name='ozzy', fullname='Ozzy O.', password='durrrr'),        # id=3
    User(name='geezer', fullname='Terence B.', password='password'), # id=4
])
I can create related records in either of the two ways:
guitar = Equipment(
    user=conn.query(User).filter(User.name == 'tony').one(),
    name='Gibson SG')
drums = Equipment(
    user_id=1,
    name='Ludwigs')
Following these lines I'd expect guitar.user_id to be 2, and drums.user to be the 'bill' object, but in both cases they're None. After I conn.add()/conn.commit(), it starts working a little more like I'd expect (both complementary fields return non-None values).
Is there any way for this to work pre-commit? I'd like to be able to construct new records either way (by ID or by object), and in library functions be able to reliably access the ID or object.
You can do this by flushing:
conn.add(guitar)
conn.add(drums)
conn.flush()
Flushing emits the INSERT queries but does not COMMIT, meaning you can ROLLBACK later if you need to.
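Following the question's example, here is a sketch of what I'd expect to become available right after the flush (still inside the open transaction):
conn.add(guitar)
conn.add(drums)
conn.flush()  # emits the INSERTs, assigns PKs/FKs, but does not COMMIT

print(guitar.user_id)   # 2 -- populated by the flush
print(drums.user.name)  # 'bill' -- lazy-loaded within the same transaction

conn.rollback()  # still possible, since nothing was committed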

how to build a many-to-many relationship using the SQL Expression language

On advice from the creator of SQLAlchemy, I'm attempting to write my initial bulk insert of data using the SQL Expression Language. See for details.
The tutorial covers what I assume to be an example of a one-to-many relationship between User and Address. They create the relationship with the use of the executemany() method:
>>> conn.execute(addresses.insert(), [
...     {'user_id': 1, 'email_address': 'jack@yahoo.com'},
...     {'user_id': 1, 'email_address': 'jack@msn.com'},
...     {'user_id': 2, 'email_address': 'www@www.org'},
...     {'user_id': 2, 'email_address': 'wendy@aol.com'},
... ])
INSERT INTO addresses (user_id, email_address) VALUES (?, ?)
((1, 'jack@yahoo.com'), (1, 'jack@msn.com'), (2, 'www@www.org'), (2, 'wendy@aol.com'))
COMMIT
<sqlalchemy.engine.ResultProxy object at 0x...>
However, in my design I have several many-to-many relationships, one being between User and Organization. Note I'm using Flask-SQLAlchemy.
# assuming a relationship like:
# organizations = db.Table('organizations',
#     db.Column('organization_id', db.Integer, db.ForeignKey('organization.id')),
#     db.Column('user_id', db.Integer, db.ForeignKey('user.id')))

# class User(db.Model):
#     ...
#     organizations = db.relationship('Organization', secondary=organizations,
#         backref=db.backref('users', lazy='dynamic'))

# class Organization(db.Model):
#     ...

def db_reset():
    # lets reset it
    db.drop_all()
    db.create_all()
    # establish our connection
    engine = create_engine(SQLALCHEMY_DATABASE_URI)
    conn = engine.connect()
    # load in our users
    user_insert = User.__table__.insert()
    for user in users:  # users is a list of named tuples
        conn.execute(user_insert, name=user.name, role=user.role,
                     organizations=[orgA, orgB, ...etc])  # how do i do this?
So my question is contained in my code as well as my title: how can I build a many-to-many relationship using the SQL Expression Language?
You should certainly still be defining everything, including your many-to-many relationship, using the ORM. The ORM rides on top of the SQL expression language. So for that one use case where you need to do a mass insert, you make use of the underlying Table object in conjunction with insert(), but when you're doing anything else, you stick with regular ORM patterns.
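To make that concrete for the association itself: once the user and organization rows exist (so their primary keys are known), the many-to-many "relationship" is built by bulk-inserting rows into the association table directly, exactly like the addresses example. A sketch (the ids are assumed to be known; there is no organizations= keyword on an insert):
# load the users first, without any relationship keyword
user_insert = User.__table__.insert()
conn.execute(user_insert,
             [{'name': user.name, 'role': user.role} for user in users])

# a many-to-many link is then just rows in the association table
conn.execute(organizations.insert(), [
    {'user_id': 1, 'organization_id': 1},
    {'user_id': 1, 'organization_id': 2},
    {'user_id': 2, 'organization_id': 1},
])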

How to do an upsert with SqlAlchemy?

I have a record that I want to exist in the database if it is not there, and if it is there already (primary key exists) I want the fields to be updated to the current state. This is often called an upsert.
The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?
Base = declarative_base()

class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))

    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name=desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        # default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        # default already exists. Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()
Is there a better or less verbose way of doing this? Something like this would be great:
sess.upsert_this(desired_default, unique_key = "name")
although the unique_key kwarg is obviously unnecessary (the ORM should be able to easily figure this out), I added it just because SQLAlchemy tends to only work with the primary key. E.g.: I've been looking at whether Session.merge would be applicable, but that works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.
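To illustrate that last point about merge (a sketch using the Template model above, not part of the original question):
# merge() reconciles objects by primary key only. With the
# autoincrementing id left unset there is nothing to match on, so this
# is always treated as a brand-new row -- merge() never matches on the
# unique `name` column.
obj = Template("default", "AABBCC", "This is the default template")
merged = sess.merge(obj)  # no id -> pending INSERT
sess.flush()              # INSERT, or IntegrityError if "default" already exists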
A sample use case for this is simply when starting up a server application that may have upgraded its default expected data. ie: no concurrency concerns for this upsert.
SQLAlchemy supports ON CONFLICT with two methods, on_conflict_do_update() and on_conflict_do_nothing().
Copying from the documentation:
from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.save_or_update call. This is not an "upsert" but it may be good enough for your needs.
It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing with the given ID exists, or if ID is None, create a new record; else update all other fields in the existing record with that primary key.
However, when there are additional unique constraints, there are logical issues with that simple approach. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.
That would be the reason there is no built in "upsert" operation. The application must define what this means in each particular case.
Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. Those functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.
Although those two functions make upserting using SQLAlchemy's syntax not that difficult, these functions are far from providing a complete out-of-the-box solution to upserting.
My common use case is to upsert a big chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting:
1) Higher-level ORM functionalities we've gotten used to are missing. You cannot use ORM objects, but instead have to provide ForeignKeys at the time of insertion.
2) Duplicate rows within the same batch collide on the table's unique constraints, which makes the single statement fail.
I'm using the following function I wrote to handle both of those issues:
# imports assumed by the snippet
from sqlalchemy import inspect, UniqueConstraint
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]

    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const,] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
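A hypothetical call site (Product and Category are illustrative names, not from the answer, and Product is assumed to declare a UniqueConstraint on name): rows are plain dicts, a related ORM object may be passed under its table's name so the helper can resolve it to the foreign key value, and rows that repeat a unique-constraint value are silently dropped via seen:
rows = [
    {'id': 1, 'name': 'widget', 'category': category},  # category: a Category object
    {'id': 1, 'name': 'widget', 'category': category},  # repeats the unique values -> skipped
]
upsert(session, Product, rows)
session.commit()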
I use a "look before you leap" approach:
# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
filter(Switch_Command.switch_id == switch.id).\
filter(Switch_Command.command_id == command.id).first()
# If we didn't get anything, make one
if not switch_command:
switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)
# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()
session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()
The advantage is that this is db-neutral and I think it's clear to read. The disadvantage is that there's a potential race condition in a scenario like the following:
1) we query the db for a switch_command and don't find one
2) we create a switch_command
3) another process or thread creates a switch_command with the same primary key as ours
4) we try to commit our switch_command
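One common mitigation, if the race matters in practice, is to treat the IntegrityError from step 4) as a signal to retry as an update (a sketch, not part of the original answer):
from sqlalchemy.exc import IntegrityError

try:
    session.commit()
except IntegrityError:
    session.rollback()
    # Someone else won the race, so the row exists now: re-query it
    # and apply our changes to the existing record instead.
    switch_command = session.query(Switch_Command).\
        filter(Switch_Command.switch_id == switch.id).\
        filter(Switch_Command.command_id == command.id).one()
    switch_command.output = 'Hooray!'
    switch_command.lastseen = datetime.datetime.utcnow()
    session.commit()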
There are multiple answers and here comes yet another answer (YAA). Other answers are not that readable due to the metaprogramming involved. Here is an example that
Uses SQLAlchemy ORM
Shows how to create a row if there are zero rows using on_conflict_do_nothing
Shows how to update the existing row (if any) without creating a new row using on_conflict_do_update
Uses the table primary key as the constraint
A longer example is in the original question this code is related to.
import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):
    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)

    pair = orm.relationship(Pair,
                            backref=orm.backref("pair_state",
                                                lazy="dynamic",
                                                cascade="all, delete-orphan",
                                                single_parent=True, ), )

    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if it does not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
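A hypothetical call site, given an open dbsession and an existing pair row:
import datetime

now = datetime.datetime.now(datetime.timezone.utc)

# Idempotent: repeating the first call hits ON CONFLICT DO NOTHING.
PairState.create_first_event_if_not_exist(dbsession, pair_id=1, ts=now)
PairState.update_last_event(dbsession, pair_id=1, ts=now)
dbsession.commit()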
The below works fine for me with a Redshift database and will also work for a combined primary key constraint.
SOURCE: this
Just a few modifications are required for creating the SQLAlchemy engine in the function start_engine():
import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    return engine

class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])

def compile_query(query):
    compiler = (query.compile if not hasattr(query, 'statement')
                else query.statement.compile)
    return compiler(dialect=postgresql.dialect())

def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
    )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)

session = start_engine()

upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
This allows access to the underlying models based on string names
def get_class_by_tablename(tablename):
    """Return class reference mapped to table.
    https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to

    :param tablename: String with name of table.
    :return: Class reference or None.
    """
    for c in Base._decl_class_registry.values():
        if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
            return c

sqla_tbl = get_class_by_tablename(table_name)

def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts
    """
    try:
        self.active_session().add(table(**record_dict))
    except:
        # Here we'll assume the error is caused by an integrity error.
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine
        # <update>add explicit error handling for each db engine</update>
        active_session.rollback()
        # Query for the conflicting record, use update method to change values based on dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # List of primary key col names
        c_tbl_cols = dict(sqla_tbl.__table__.columns)  # String:Col Object crosswalk
        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # sub-dict from data of primary key:values
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # col-object:query value for primary key cols
        c_target_record = session.query(sqla_tbl).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()
        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
This works for me with sqlite3 and postgres, although it might fail with a combined primary key constraint, and it will most likely fail with additional unique constraints.
try:
    t = self._meta.tables[data['table']]
except KeyError:
    self._log.error('table "%s" unknown', data['table'])
    return

try:
    q = insert(t, values=data['values'])
    self._log.debug(q)
    self._db.execute(q)
except IntegrityError:
    self._log.warning('integrity error')
    where_clause = [c.__eq__(data['values'][c.name]) for c in t.c if c.primary_key]
    update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
    q = update(t, values=update_dict).where(*where_clause)
    self._log.debug(q)
    self._db.execute(q)
except Exception as e:
    self._log.error('%s: %s', t.name, e)
As we had problems with generated default ids and references that led to ForeignKeyViolation errors like
update or delete on table "..." violates foreign key constraint
Key (id)=(...) is still referenced from table "...".
we had to exclude the id from the update dict, as otherwise it would always be regenerated as a new default value.
In addition, the method returns the created/updated entity.
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert  # Important to use the postgresql insert

def upsert(session, data, key_columns, model):
    stmt = insert(model).values(data)

    # Important to exclude the ID for update!
    exclude_for_update = [model.id.name, *key_columns]
    update_dict = {c.name: c for c in stmt.excluded if c.name not in exclude_for_update}

    stmt = stmt.on_conflict_do_update(
        index_elements=key_columns,
        set_=update_dict
    ).returning(model)

    orm_stmt = (
        select(model)
        .from_statement(stmt)
        .execution_options(populate_existing=True)
    )
    return session.execute(orm_stmt).scalar()
Example:
class UpsertUser(Base):
    __tablename__ = 'upsert_user'
    id = Column(Id, primary_key=True, default=uuid.uuid4)
    name: str = Column(sa.String, nullable=False)
    user_sid: str = Column(sa.String, nullable=False, unique=True)
    house_admin = relationship('UpsertHouse', back_populates='admin', uselist=False)

class UpsertHouse(Base):
    __tablename__ = 'upsert_house'
    id = Column(Id, primary_key=True, default=uuid.uuid4)
    admin_id: Id = Column(Id, ForeignKey('upsert_user.id'), nullable=False)
    admin: UpsertUser = relationship('UpsertUser', back_populates='house_admin', uselist=False)

# Usage
upserted_user = upsert(session, updated_user, [UpsertUser.user_sid.name], UpsertUser)
Note: Only tested on PostgreSQL, but it could also work for other DBs with similar conflict handling, e.g. MySQL's ON DUPLICATE KEY UPDATE.
In case of sqlite, the sqlite_on_conflict='REPLACE' option can be used when defining a UniqueConstraint, and sqlite_on_conflict_unique for unique constraint on a single column. Then session.add will work in a way just like upsert. See the official documentation.
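A sketch of what that looks like (SQLite only; Setting is a hypothetical model, and the option bakes ON CONFLICT REPLACE into the generated DDL):
from sqlalchemy import Column, Integer, String

class Setting(Base):
    __tablename__ = 'setting'
    id = Column(Integer, primary_key=True)
    # roughly emits: key VARCHAR UNIQUE ON CONFLICT REPLACE
    key = Column(String, unique=True, sqlite_on_conflict_unique='REPLACE')
    value = Column(String)

After this, session.add(Setting(key='theme', value='dark')) behaves like an upsert on key: a conflicting row is replaced instead of raising an IntegrityError.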
I use this code for upserts.
Before using it, you should add primary keys to the table in the database.
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table
from sqlalchemy.inspection import inspect
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.dialects.postgresql import insert

def upsert(df, engine, table_name, schema=None, chunk_size=1000):
    metadata = MetaData(schema=schema)
    metadata.bind = engine

    table = Table(table_name, metadata, schema=schema, autoload=True)

    # only use common columns between df and table.
    table_columns = {column.name for column in table.columns}
    df_columns = set(df.columns)
    intersection_columns = table_columns.intersection(df_columns)

    df1 = df[list(intersection_columns)]
    records = df1.to_dict('records')

    # get list of fields making up primary key
    primary_keys = [key.name for key in inspect(table).primary_key]

    with engine.connect() as conn:
        chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
        for chunk in chunks:
            stmt = insert(table).values(chunk)
            update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
            s = stmt.on_conflict_do_update(
                index_elements=primary_keys,
                set_=update_dict)
            conn.execute(s)
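A hypothetical call site, reusing the digital_spend table from the earlier answer for illustration (rows whose primary key already exists are updated, the rest are inserted):
import pandas as pd

df = pd.DataFrame([
    {'day': '2021-01-01', 'report_date': '2021-01-02', 'impressions': 10, 'conversions': 1},
    {'day': '2021-01-02', 'report_date': '2021-01-02', 'impressions': 12, 'conversions': 2},
])
upsert(df, engine, 'digital_spend')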
