Sqlalchemy add_all() ignore duplicate key IntegrityError


I'm adding a list of objects, entries, to a database. Sometimes one of these objects is already in the database (I do not have any control over that).
With just one IntegrityError the whole transaction fails, i.e. none of the objects in entries are inserted into the database.
try:
    session.add_all(entries)
    session.commit()
except Exception:
    logger.error("Error! Rolling back")
    session.rollback()
    raise
finally:
    session.close()
My desired behavior: if one of the entries raises an IntegrityError, catch it and skip that object, otherwise continue normally (do not fail).
Edit: I'm using MySQL as the backend.

It depends on what backend you're using.
PostgreSQL has a wonderful INSERT ... ON CONFLICT DO NOTHING clause which you can use with SQLAlchemy:
from sqlalchemy.dialects.postgresql import insert

session.execute(insert(MyTable)
                .values(my_entries)
                .on_conflict_do_nothing())
MySQL has the similar INSERT IGNORE clause, but SQLAlchemy has less support for it. Luckily, according to this answer, there is a workaround, using prefix_with:
session.execute(MyTable.__table__
                .insert()
                .prefix_with('IGNORE')
                .values(my_entries))
The only catch is that my_entries needs to be a list of column-to-value mappings, e.g. [{ 'id': 1, 'name': 'Ringo' }, { 'id': 2, 'name': 'Paul' }, ...].
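MySQL's INSERT IGNORE needs a running server to try out. SQLite spells the same idea INSERT OR IGNORE, so here is a small sketch using Python's built-in sqlite3 module as a stand-in (the table name and rows are illustrative). It feeds a list of column-to-value dicts, shaped like my_entries above, through named placeholders and shows that the duplicate row is skipped while the rest are kept.

```python
import sqlite3

# Column-to-value mappings shaped like my_entries (example data only).
my_entries = [
    {"id": 1, "name": "Ringo"},
    {"id": 2, "name": "Paul"},
    {"id": 1, "name": "Ringo again"},  # duplicate primary key: silently skipped
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
with conn:  # commits on success, rolls back if an exception escapes
    # SQLite's spelling of MySQL's INSERT IGNORE; :id and :name are filled
    # from each dict in the list.
    conn.executemany(
        "INSERT OR IGNORE INTO my_table (id, name) VALUES (:id, :name)",
        my_entries,
    )

rows = conn.execute("SELECT id, name FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 'Ringo'), (2, 'Paul')]
```

The first row with a given key wins. With SQLAlchemy Core on SQLite the analogous prefix would be prefix_with('OR IGNORE'). Keep in mind that MySQL's IGNORE also downgrades some other errors (such as data conversion problems) to warnings, so it can hide more than duplicate keys.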

A solution I have found is to query the database before adding each entry:
# Hypothetical helper, called once per entry
def add_entry(session, entry):
    try:
        instance = session.query(InstancesTable).filter_by(id=entry.id).first()
        if instance:
            return
        session.add(entry)
        session.commit()
    except Exception:
        logger.error("Error! Rolling back")
        session.rollback()
        raise
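The check above costs one query per entry. A rough sketch of batching the same idea is below, using Python's built-in sqlite3 module and a hypothetical instances(id, name) table: one SELECT fetches every id that already exists, and only the remaining entries are inserted. Like the per-entry version, it is still racy if another process inserts the same id between the SELECT and the INSERT.

```python
import sqlite3


def add_new_entries(conn, entries):
    """Insert only the (id, name) tuples whose id is not in `instances` yet.

    Hypothetical schema. Very large batches may need chunking to stay under
    SQLite's limit on the number of bound parameters.
    """
    if not entries:
        return 0
    ids = [entry_id for entry_id, _ in entries]
    placeholders = ",".join("?" * len(ids))  # "?,?,?" for three ids
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT id FROM instances WHERE id IN ({})".format(placeholders), ids
        )
    }
    fresh = []
    for entry_id, name in entries:
        if entry_id not in existing:
            fresh.append((entry_id, name))
            existing.add(entry_id)  # also skip repeats inside the batch itself
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO instances (id, name) VALUES (?, ?)", fresh)
    return len(fresh)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
with conn:
    conn.execute("INSERT INTO instances VALUES (1, 'Ringo')")

print(add_new_entries(conn, [(1, "Ringo"), (2, "Paul"), (2, "Paul"), (3, "John")]))  # 2
```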

Related

No exception raised in SqlAlchemy if error in Vertica database

I have a table defined in Vertica in which one of the columns has a UNIQUE constraint enforced. When I insert a new row whose value already exists in that column, error 6745 is raised if the query is executed in the database shell. I am trying to achieve the same behavior using SQLAlchemy.
I have an SQLAlchemy engine defined and use it to connect to the DB. I then call execute() on that connection to run a raw SQL query, with a try-except block around it to catch any exceptions. When I insert a new row through SQLAlchemy, no exception is raised, yet the constraint is enforced on the database side (no duplicate entries are written). Because SQLAlchemy doesn't surface the error raised by the database, I can't tell whether the operation succeeded or conflicted with the existing data.
How can I configure Sqlalchemy to raise an exception in case an error was raised on the Database?
I am using the vertica_python dialect.
Temporary Solution:
For now, I use the number of entries in the table before and after performing the operation to classify the status of the operation. This is a dirty hack and not efficient.

You can configure SqlAlchemy to raise an exception by setting the raise_on_unique_violation flag to True on your Vertica connection object. This flag tells SqlAlchemy to raise an exception if a unique constraint violation occurs, rather than silently ignoring it.
For example:
from sqlalchemy import create_engine

# create_engine() rejects unknown keyword arguments, so no dialect class is passed;
# the "vertica+vertica_python" URL scheme selects the dialect.
engine = create_engine("vertica+vertica_python://username:password@hostname:port/dbname",
                       connect_args={'raise_on_unique_violation': True},
                       echo=True)
connection = engine.connect()
When you use the connection.execute() method to insert a new row, if a unique constraint violation occurs, SqlAlchemy will raise a UniqueViolation exception, which you can catch and handle in your code.
You can also use session.flush() and session.commit() to handle the exception.
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()
try:
    session.add(new_row)
    session.flush()
    session.commit()
except IntegrityError:
    session.rollback()
    raise
You can check if the error code is 6745, if yes then it is a unique constraint violation error.

SQLAlchemy not rolling back after FlushError

I'm writing some tests for a REST API backed by a MySQL DB with python+werkzeug+SQLAlchemy. One of the tests tries to add an "object" whose primary key is missing from the JSON, and verifies that it fails and doesn't insert anything into the DB. It used to work fine with SQLite, but since I switched to MySQLdb I get a FlushError (instead of the IntegrityError I used to catch), and when I try to roll back after the error, no error is thrown, yet the entry is in the database with its primary key set to ''. The code looks like this:
session = Session()
try:
    res = func(*args, session=session, **kwargs)
    session.commit()
except sqlalchemy.exc.SQLAlchemyError as e:
    session.rollback()
    return abort(422)
else:
    return res
finally:
    session.close()
And here's the error that I catch during the try/except:
<class 'sqlalchemy.orm.exc.FlushError'>: Instance has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
I just read the documentation about the SQLalchemy session and rollback feature but don't understand why the rollback doesn't work for me as this is almost textbook example from the documentation.
I use Python 2.7.13, werkzeug '0.12.2', sqlalchemy '1.1.13' and MySQLdb '1.2.3' and mysql Ver 14.14 Distrib 5.1.73 !
Thanks for your help

It looks like the problem was MySQL-only:
By default in older MySQL versions such as the 5.1 used here, strict mode isn't enabled, which allows invalid inserts/updates to change the database (wtf?). MySQL 5.7 and later enable strict mode by default. The solution is to change the sql_mode, either globally:
MySQL: Setting sql_mode permanently
Or in SQLAlchemy, as explained in this blog post:
https://www.enricozini.org/blog/2012/tips/sa-sqlmode-traditional/

Catch python DatabaseErrors generically

I have a database schema that might be implemented in a variety of database engines (say, as a simple example, an MS Access database that I'll connect to with pyodbc, or a SQLite database that I'll connect to via the built-in sqlite3 module).
I'd like to create a factory function/method that returns a database connection of the appropriate type based on some parameter, similar to the following:
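import sqlite3
import pyodbc

def createConnection(connType, params):
    if connType == 'sqlite':
        return sqlite3.connect(params['filename'])
    elif connType == 'msaccess':
        # Literal braces are doubled so str.format() doesn't treat them as fields
        return pyodbc.connect('DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};'.format(params['filename']))
    else:
        # do something else
        raise ValueError('Unknown connection type: {}'.format(connType))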
Now I've got some query code that should work with any connection type (since the schema is identical no matter the underlying DB engine) but may throw an exception that I'll need to catch:
db = createConnection(params['dbType'], params)
cursor = db.cursor()
try:
    cursor.execute('SELECT A, B, C FROM TABLE')
    for row in cursor:
        # Index access works for both sqlite3 tuples and pyodbc rows
        print('{},{},{}'.format(row[0], row[1], row[2]))
except DatabaseError as err:
    # Do something...
    pass
The problem I'm having is that the DatabaseError classes from each DB API 2.0 implementation don't share a common base class (other than the way-too-generic Exception), so I don't know how to catch these exceptions generically. Obviously I could do something like the following:
try:
    ...  # as before
except sqlite3.DatabaseError as err:
    pass  # do something
except pyodbc.DatabaseError as err:
    pass  # do something again
...where I included an explicit catch block for each possible database engine. But this seems distinctly non-pythonic to me.
How can I generically catch DatabaseErrors from different underlying DB API 2.0 database implementations?

There are a number of approaches:
Use a catch-all exception handler and then work out which exception it is. If it is not in your list, re-raise it (or raise your own). See: Python When I catch an exception, how do I get the type, file, and line number?
Alternatively, take the problem from a different angle: your factory code could also provide the exception class to test for.
A simpler approach in my view (and the one I use in practice), is to have a class for all database connections, and to subclass it for each specific database type/syntax. Inheritance allows you to take care of all specificities. For some reason, I never had to worry about this issue.
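The second approach (the factory also hands back the exception class) can be sketched as below. Every DB API 2.0 (PEP 249) module defines its own module-level DatabaseError, so the function that picks the driver is a natural place to pick the exception too. The create_connection name and the (connection, error class) return shape are assumptions for illustration; pyodbc is imported lazily so the SQLite path runs without it.

```python
import sqlite3


def create_connection(conn_type, params):
    """Return (connection, that driver's DatabaseError class)."""
    if conn_type == "sqlite":
        return sqlite3.connect(params["filename"]), sqlite3.DatabaseError
    if conn_type == "msaccess":
        import pyodbc  # imported lazily: only needed for Access databases

        # Literal braces are doubled so str.format leaves them alone.
        conn_str = "DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};".format(
            params["filename"]
        )
        return pyodbc.connect(conn_str), pyodbc.DatabaseError
    raise ValueError("unknown connection type: {!r}".format(conn_type))


db, DatabaseError = create_connection("sqlite", {"filename": ":memory:"})
cursor = db.cursor()
try:
    cursor.execute("SELECT A, B, C FROM missing_table")
except DatabaseError as err:
    # sqlite3.OperationalError is a subclass of sqlite3.DatabaseError.
    print(type(err).__name__, err)  # OperationalError no such table: missing_table
```

If several drivers are importable at once, `except (sqlite3.DatabaseError, pyodbc.DatabaseError)` is another option, since `except` accepts a tuple of exception classes.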

conditional add statement in SQLAlchemy

Suppose I want to upload several SQL records to a table that may not be populated yet. If a record with a given primary key ("ID") already exists, either in the table or among the records to be committed, I want to replace the existing record with the new one.
I'm using MSSQL (SQL Server 2008).
My first guess would be
try:
    session.add(record)
    session.commit()
except:
    session.query(Class).\
        filter(Class.ID == record.ID).\
        update(some expression)
    session.commit()
What should the expression be? And is there a cleaner (and safer!) way of doing this?

In general unless using statements that guarantee atomicity, you'll always have to account for race conditions that might arise from multiple actors trying to either insert or update (don't forget delete). Even the MERGE statement, though a single statement, can have race conditions if not used correctly.
Traditionally this kind of "upsert" is performed using stored procedures or other SQL or implementation specific features available, like the MERGE statement.
An SQLAlchemy solution has to either attempt the insert and perform an update if an integrity error is raised, or perform the update and attempt an insert if no rows were affected. It should be prepared to retry in case both operations fail (a row might get deleted or inserted in between):
from sqlalchemy.exc import IntegrityError

while True:  # Infinite loop, use a retry counter if necessary
    try:
        # begin a save point, prevents the whole transaction failing
        # in case of an integrity error
        with session.begin_nested():
            session.add(record)
            # Flush instead of commit, we need the transaction intact
            session.flush()
        # If the flush is successful, break out of the loop as the insert
        # was performed
        break
    except IntegrityError:
        # Attempt the update. If the session has to reflect the changes
        # performed by the update, change the `synchronize_session` argument.
        if session.query(Class).\
                filter_by(ID=record.ID).\
                update({...},
                       synchronize_session=False):
            # 1 or more rows were affected (hopefully 1)
            break
        # Nothing was updated, perhaps a DELETE in between
    # Both operations have failed, retry

session.commit()
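To watch this insert-then-update loop run without SQL Server or an SQLAlchemy session, here is a stdlib sqlite3 sketch of the same control flow with explicit SAVEPOINT / ROLLBACK TO / RELEASE statements (begin_nested() emits SAVEPOINT statements of this kind for you). The records table, the upsert helper and its max_attempts bound are illustrative assumptions; the bound replaces the answer's unbounded loop.

```python
import sqlite3


def upsert(conn, row_id, name, max_attempts=3):
    """Insert (row_id, name), or update the row if that id is taken.

    Assumes `conn` was opened with isolation_level=None (explicit
    transaction statements) and that an outer transaction is already open.
    """
    for _ in range(max_attempts):
        conn.execute("SAVEPOINT upsert_sp")
        try:
            conn.execute(
                "INSERT INTO records (id, name) VALUES (?, ?)", (row_id, name)
            )
        except sqlite3.IntegrityError:
            # Undo only the failed INSERT; the outer transaction survives.
            conn.execute("ROLLBACK TO SAVEPOINT upsert_sp")
            conn.execute("RELEASE SAVEPOINT upsert_sp")
            cur = conn.execute(
                "UPDATE records SET name = ? WHERE id = ?", (name, row_id)
            )
            if cur.rowcount > 0:
                return "updated"
            # The row vanished in between (e.g. a concurrent DELETE): retry.
        else:
            conn.execute("RELEASE SAVEPOINT upsert_sp")
            return "inserted"
    raise RuntimeError(
        "upsert of id={} failed after {} attempts".format(row_id, max_attempts)
    )


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("BEGIN")
print(upsert(conn, 1, "Ringo"))  # inserted
print(upsert(conn, 1, "Paul"))  # updated
conn.execute("COMMIT")
print(conn.execute("SELECT id, name FROM records").fetchall())  # [(1, 'Paul')]
```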
Regarding
If there is a record with a primary key("ID") that already exists, either in the table or in the records to be committed to a table, I want to replace the existing record with the new record.
If you can be sure that no concurrent updates to the table in question will happen, you can use Session.merge for this kind of task:
# Records have primary key set, on which merge can either load existing
# state and merge, or create a new record in session if none was found.
for record in records:
    merged_record = session.merge(record)
    # Note that merged_record is not record

session.commit()
The SQLAlchemy merge will first check whether an instance with the given primary key exists in the identity map. If it doesn't, and load is True (the default), it'll check the database for the primary key. If the given instance has no primary key or no instance can be found, a new instance will be created.
The merge will then copy the state of the given instance onto the located/created instance. The new instance is returned.
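The lookup order just described can be modelled in a few lines of plain Python. This is a toy stand-in written for illustration (ToySession and Record are hypothetical names, and a dict plays the database table); it is not how SQLAlchemy implements merge, only the sequence identity map, then database, then new instance, followed by copying state onto the returned object.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    id: Optional[int]
    name: str


class ToySession:
    """Hypothetical stand-in, not SQLAlchemy: models only merge's lookup order."""

    def __init__(self, database: Dict[int, Record]):
        self.database = database  # primary key -> stored row (plays the table)
        self.identity_map: Dict[int, Record] = {}  # objects already in session

    def merge(self, given: Record, load: bool = True) -> Record:
        target = None
        if given.id is not None:
            # 1. Identity map first.
            target = self.identity_map.get(given.id)
            # 2. Then, if allowed, the database.
            if target is None and load and given.id in self.database:
                stored = self.database[given.id]
                target = Record(stored.id, stored.name)
        if target is None:
            # 3. No primary key, or nothing found: a new instance.
            target = Record(given.id, given.name)
        # Copy the given instance's state onto the located/created instance.
        target.name = given.name
        if target.id is not None:
            self.identity_map[target.id] = target
        return target  # a different object from `given`


session = ToySession({1: Record(1, "Ringo")})
original = Record(1, "Paul")
merged = session.merge(original)
print(merged is original, merged)  # False Record(id=1, name='Paul')
print(session.merge(Record(1, "John")) is merged)  # True: found in identity map
```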

No. There is a much better pattern for doing this. Do a query first to see if the record already exists, and then proceed accordingly.
Using your syntax, it would be something like the following:
result = session.query(Class).filter(Class.ID == record.ID).first()
# If record does not exist in Db, then add record
if result is None:
    try:
        session.add(record)
        session.commit()
    except Exception:
        session.rollback()
        log.error('Rolling back transaction in query-none block')
# If record does exist, then update value of record in Db
else:
    try:
        session.query(Class).\
            filter(Class.ID == record.ID).\
            update(some expression)
        session.commit()
    except Exception:
        session.rollback()
        log.error('Rolling back transaction')
It's usually a good idea to wrap your database operations in a try/except block, so you're on the right track with the try portion of what you wrote. Depending on what you're doing, the except block should typically report the error and/or roll back the session.

SQLAlchemy and explicit locking

I have multiple processes that can potentially insert duplicate rows into the database. These inserts do not happen very frequently (a few times every hour) so it is not performance critical.
I've tried an exists check before doing the insert, like so:
# Assume we're inserting a camera object, that's a valid SQLAlchemy ORM object that inherits from declarative_base...
try:
    stmt = exists().where(Camera.id == camera_id)
    exists_result = session.query(Camera).with_lockmode("update").filter(stmt).first()
    if exists_result is None:
        session.add(Camera(...))  # Lots of parameters, just assume it works
        session.commit()
except IntegrityError as e:
    session.rollback()
The problem I'm running into is that the exists() check doesn't lock the table, so multiple processes could attempt to insert the same object at the same time. In that scenario one process succeeds with the insert and the others fail with an IntegrityError. While this works, it doesn't feel "clean" to me.
I would really like some way of locking the Camera table before doing the exists() check.

Perhaps this might be of interest to you:
https://groups.google.com/forum/?fromgroups=#!topic/sqlalchemy/8WLhbsp2nls
You can lock the tables by executing the SQL directly. I'm not sure what that looks like in Elixir, but in plain SA it'd be something like:
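from sqlalchemy import text

conn = engine.connect()
# Recent SQLAlchemy versions require raw SQL strings to be wrapped in text()
conn.execute(text("LOCK TABLES Pointer WRITE"))
# do stuff with conn
conn.execute(text("UNLOCK TABLES"))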
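SQLite has no LOCK TABLES statement; its closest analogue is BEGIN IMMEDIATE, which takes the database-wide write lock at the start of the transaction. The stdlib sketch below uses that substitute to run the exists check and the insert while holding the lock, so no other writer can slip in between. The camera table and the insert_camera_if_missing helper are hypothetical, and two connections to one file stand in for two processes.

```python
import os
import sqlite3
import tempfile


def insert_camera_if_missing(conn, camera_id):
    """Check-then-insert while holding SQLite's write lock.

    Assumes `conn` was opened with isolation_level=None so that BEGIN and
    COMMIT are issued explicitly.
    """
    # Acquire the write lock *before* the check, not at the INSERT.
    conn.execute("BEGIN IMMEDIATE")
    try:
        found = conn.execute(
            "SELECT 1 FROM camera WHERE id = ?", (camera_id,)
        ).fetchone()
        if found is None:
            conn.execute("INSERT INTO camera (id) VALUES (?)", (camera_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return found is None


# Temporary file; two connections to it stand in for two processes.
path = os.path.join(tempfile.mkdtemp(), "cameras.db")
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("CREATE TABLE camera (id INTEGER PRIMARY KEY)")

print(insert_camera_if_missing(a, 7))  # True: inserted
print(insert_camera_if_missing(b, 7))  # False: already there

# While `a` holds the write lock, `b` cannot start its own write transaction
# (timeout=0 makes it fail immediately instead of waiting).
a.execute("BEGIN IMMEDIATE")
try:
    b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as exc:
    print(exc)  # database is locked
finally:
    a.execute("ROLLBACK")
```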
