conditional add statement in SQLAlchemy - python

Suppose I want to upload several SQL records to a table that may not be populated yet. If there is a record with a primary key ("ID") that already exists, either in the table or among the records to be committed to the table, I want to replace the existing record with the new record.
I'm using MSSQL, SQL Server 2008.
My first guess would be:
try:
    session.add(record)
    session.commit()
except:
    session.query(Class).\
        filter(Class.ID == record.ID).\
        update(some expression)
    session.commit()
What should the expression be? And is there a cleaner (and safer!) way of doing this?

In general, unless you're using statements that guarantee atomicity, you'll always have to account for race conditions that might arise from multiple actors trying to either insert or update (don't forget delete). Even the MERGE statement, though a single statement, can have race conditions if not used correctly.
Traditionally this kind of "upsert" is performed using stored procedures or other SQL or implementation-specific features, like the MERGE statement.
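For illustration, here is a minimal sketch of such a MERGE on SQL Server 2008, issued as raw SQL through the session. The table and column names are hypothetical, and WITH (HOLDLOCK) is the usual guard against the MERGE race conditions mentioned above:
from sqlalchemy import text

# Hypothetical table "records" with columns ID and value.
merge_stmt = text("""
    MERGE INTO records WITH (HOLDLOCK) AS target
    USING (SELECT :id AS ID, :value AS value) AS source
        ON target.ID = source.ID
    WHEN MATCHED THEN
        UPDATE SET value = source.value
    WHEN NOT MATCHED THEN
        INSERT (ID, value) VALUES (source.ID, source.value);
""")
session.execute(merge_stmt, {"id": record.ID, "value": record.value})
session.commit()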
An SQLAlchemy solution has to either attempt the insert and perform an update if an integrity error is raised, or perform the update and attempt an insert if no rows were affected. It should be prepared to retry in case both operations fail (a row might get deleted or inserted in between):
from sqlalchemy.exc import IntegrityError

while True:  # Infinite loop, use a retry counter if necessary
    try:
        # Begin a savepoint; prevents the whole transaction failing
        # in case of an integrity error
        with session.begin_nested():
            session.add(record)
            # Flush instead of commit, we need the transaction intact
            session.flush()
        # If the flush is successful, break out of the loop as the insert
        # was performed
        break
    except IntegrityError:
        # Attempt the update. If the session has to reflect the changes
        # performed by the update, change the `synchronize_session` argument.
        if session.query(Class).\
                filter_by(ID=record.ID).\
                update({...},
                       synchronize_session=False):
            # 1 or more rows were affected (hopefully 1)
            break
        # Nothing was updated, perhaps a DELETE in between.
        # Both operations have failed, retry.

session.commit()
Regarding
If there is a record with a primary key ("ID") that already exists, either in the table or among the records to be committed to the table, I want to replace the existing record with the new record.
If you can be sure that no concurrent updates to the table in question will happen, you can use Session.merge for this kind of task:
# Records have their primary key set, on which merge can either load
# existing state and merge, or create a new record in the session if
# none was found.
for record in records:
    merged_record = session.merge(record)
    # Note that merged_record is not record

session.commit()
The SQLAlchemy merge will first check if an instance with the given primary key exists in the identity map. If it doesn't, and load is passed as True, it'll check the database for the primary key. If a given instance has no primary key, or an instance cannot be found, a new instance will be created.
The merge will then copy the state of the given instance onto the located/created instance. The new instance is returned.
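A hedged illustration of that lookup order, assuming a mapped class Class with primary key column ID:
# A detached instance with its primary key set, not yet in the session.
detached = Class(ID=42)

# merge() checks the identity map first, then (load=True is the default)
# queries the database for ID=42; if neither finds anything, a new
# pending instance is created.
merged = session.merge(detached)

assert merged is not detached  # state was copied onto another instance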

No. There is a much better pattern for doing this. Do a query first to see if the record already exists, and then proceed accordingly.
Using your syntax, it would be something like the following:
result = session.query(Class).filter(Class.ID == record.ID).first()

# If the record does not exist in the DB, then add it
if result is None:
    try:
        session.add(record)
        session.commit()
    except:
        session.rollback()
        log.error('Rolling back transaction in query-none block')
# If the record does exist, then update its value in the DB
else:
    try:
        session.query(Class).\
            filter(Class.ID == record.ID).\
            update(some expression)
        session.commit()
    except:
        session.rollback()
        log.error('Rolling back transaction')
It's usually a good idea to wrap your database operations in a try/except block, so you're on the right track with the try portion of what you wrote. Depending on what you're doing, the except block should typically show an error message or do a rollback.

Related

Explicitly checking if an SQL INSERT operation succeeded unnecessary?

I'm using Python to talk to a Postgres DBMS using psycopg2.
Is it safe to assume that if an INSERT returns without raising an exception, then that INSERT actually did store something new in the database?
Right now, I've been checking the 'rowcount' attribute of the database cursor, and if it's 0 then that means the INSERT failed. However, I'm starting to think that this isn't necessary.
Is it safe to assume that if an INSERT returns without raising an exception, then that INSERT actually did store something new in the database?
No.
The affected record count will be zero if:
You ran an INSERT INTO ... SELECT ..., and the query returned no rows
You ran an INSERT INTO ... ON CONFLICT DO NOTHING, and it encountered a conflict
You have a BEFORE INSERT trigger on your table, and the trigger function returned NULL
You have a rule defined which results in no records being affected (e.g. ... DO INSTEAD NOTHING)
(... and possibly more, though nothing comes to mind.)
The common thread is that it will only affect zero records if you told it to, one way or another. Whether you want to treat any of these as a "failure" is highly dependent on your application logic.
Anything which is unequivocally a "failure" (constraint violation, serialisation failure, out of disk space...) should throw an error, so checking the record count is generally unnecessary.
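As a small illustration of the second case in the list above, here is a sketch assuming psycopg2, PostgreSQL 9.5+ (for ON CONFLICT), and a hypothetical table with a unique constraint on email:
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
cur = conn.cursor()
cur.execute(
    "INSERT INTO users (email) VALUES (%s) ON CONFLICT DO NOTHING",
    ("alice@example.com",),
)
# No exception was raised, but if the email already existed,
# nothing was inserted:
print(cur.rowcount)  # 0 on conflict, 1 on a successful insert
conn.commit()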
By default, psycopg2's cursor.execute() returns None for a successful insert:
cursor.execute - The method returns None. If a query was executed, the returned values can be retrieved using fetch*() methods.
http://initd.org/psycopg/docs/cursor.html
If you want to know something about the insert, an easy/efficient option is to use RETURNING (which takes the same options as a SELECT):
INSERT INTO ... RETURNING id
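A minimal psycopg2 sketch of the RETURNING approach; table and column names are hypothetical:
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
cur = conn.cursor()
cur.execute(
    "INSERT INTO items (name) VALUES (%s) RETURNING id",
    ("widget",),
)
new_id = cur.fetchone()[0]  # the generated primary key of the new row
conn.commit()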
A similar question was asked here: How to check if value is inserted successfully or not? They seem to use the row count method to check whether the data was inserted correctly.

PonyORM (Python) "Value was updated outside of current transaction" but it wasn't

I'm using Pony ORM version 0.7 with a Sqlite3 database on disk, and running into this issue: I am performing a select, then an update, then a select, then another update, and getting an error message of
pony.orm.core.UnrepeatableReadError: Value of Task.order_id for Task[23654] was updated outside of current transaction (was: 1, now: 2)
I've reduced the problem to the minimum set of commands that causes the problem (i.e. removing anything causes the problem not to occur):
@db_session
def test_method():
    tasks = list(map(Task.to_dict, Task.select()))
    db.execute("UPDATE Task SET order_id=order_id*2")
    task_to_move = select(task for task in Task if task.order_id == 2).first()
    task_to_move.order_id = 1

test_method()
For completeness's sake, here is the definition of Task:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int)
Also, if I remove the constraint that task.order_id == 2 from my select, the problem no longer occurs, so I assume the problem has something to do with querying based on a field that has been changed since the transaction started. But I don't know why the error message is telling me that it was changed by a different transaction (unless db.execute is executing in a separate transaction because it is raw SQL?).
I've already looked at this similar question, but the problem was different (Pony ORM reports record "was updated outside of current transaction" while there is not other transaction) and at this documentation (https://docs.ponyorm.com/transactions.html) but neither solved my problem.
Any ideas what might be going on here?
Pony uses optimistic concurrency control by default. For each attribute, Pony remembers its current value (potentially modified by application code) as well as the original value which was read from the database. During UPDATE, Pony checks that the value of the column in the database is still the same. If the value has changed, Pony assumes that some concurrent transaction did it, and throws an exception in order to avoid the "lost update" situation.
If you execute some raw SQL query, Pony does not know what exactly was modified in the database. So when Pony sees that the value has changed, it mistakenly thinks that the value was changed by another transaction.
In order to avoid the problem you can mark order_id attribute as volatile. Then Pony will assume, that the value of attribute can change at any time (by trigger or raw SQL update), and will exclude that attribute from optimistic checks:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, volatile=True)
Note that Pony will cache the value of a volatile attribute and will not re-read it from the database until the object is saved, so in some situations you can get an obsolete value in Python.
Update:
Starting from release 0.7.4 you can also specify optimistic=False option to db_session to turn off optimistic checks for specific transaction that uses raw SQL queries:
with db_session(optimistic=False):
    ...
or
@db_session(optimistic=False)
def some_function():
    ...
It is also possible now to specify the optimistic=False option for an attribute instead of specifying volatile=True. Then Pony will not make optimistic checks for that attribute, but will still treat it as non-volatile.
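A minimal sketch of that attribute-level option (assuming Pony 0.7.4+; the other attributes are omitted for brevity):
from pony.orm import Required

class Task(db.Entity):
    # ... other attributes as before ...
    # Excluded from optimistic checks, but still treated as non-volatile.
    order_id = Required(int, optimistic=False)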

SQLAlchemy and explicit locking

I have multiple processes that can potentially insert duplicate rows into the database. These inserts do not happen very frequently (a few times every hour) so it is not performance critical.
I've tried an exist check before doing the insert, like so:
# Assume we're inserting a camera object, a valid SQLAlchemy ORM object
# that inherits from declarative_base...
try:
    stmt = exists().where(Camera.id == camera_id)
    exists_result = session.query(Camera).with_lockmode("update").filter(stmt).first()
    if exists_result is None:
        session.add(Camera(...))  # Lots of parameters, just assume it works
        session.commit()
except IntegrityError as e:
    session.rollback()
The problem I'm running into is that the exist() check doesn't lock the table, and so there is a chance that multiple processes could attempt to insert the same object at the same time. In such a scenario, one process succeeds with the insert and the others fail with an IntegrityError exception. While this works, it doesn't feel "clean" to me.
I would really like some way of locking the Camera table before doing the exists() check.
Perhaps this might be of interest to you:
https://groups.google.com/forum/?fromgroups=#!topic/sqlalchemy/8WLhbsp2nls
You can lock the tables by executing the SQL directly. I'm not sure what that looks like in Elixir, but in plain SA it'd be something like:
conn = engine.connect()
conn.execute("LOCK TABLES Pointer WRITE")
# do stuff with conn
conn.execute("UNLOCK TABLES")

When should I be calling flush() on SQLAlchemy?

I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is littered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I was just following the patterns I saw in this code, but as I read the docs, it seems this is unnecessary, since autoflushing should be in place. Additionally, I've run into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being setup with:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()
The ZopeTransactionExtension on the DBSession in conjunction with the pyramid_tm being active on your project will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
    foo = Foo()
    foo.id = 5
    DBSession.add(foo)
    DBSession.flush()
except IntegrityError:
    log.error('something already has id 5!!')
    sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession) otherwise there is no way for the ZTE to know the session has changed.
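For example, a sketch of the raw-SQL case; the UPDATE statement and table are illustrative:
from zope.sqlalchemy import mark_changed

# Raw SQL bypasses the ORM, so the session looks clean to the ZTE.
DBSession.execute("UPDATE users SET active = FALSE")
# Calling the scoped_session returns the underlying session object.
# Without mark_changed, pyramid_tm / the ZTE may skip the commit.
mark_changed(DBSession())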
Also you should leave expire_on_commit at the default of True unless you have a really good reason.

How to create a rollback button in django with MySQLdb?

I want to create a rollback button in my Django project using MySQLdb. I have tried to use commit() and rollback() with InnoDB as the database engine; rollback() seems not to work, because the database was updated even though rollback() was called after commit(). Here are the related lines in the Python code:
def update(request):
    if 'weight' in request.GET and request.GET['weight']:
        weight = request.GET['weight']
    else:
        return HttpResponseRedirect('/clamplift/')

    if 'realtag' in request.GET and request.GET['realtag']:
        realtag = request.GET['realtag']
    else:
        return HttpResponseRedirect('/clamplift/')

    conn = MySQLdb.Connect(host="localhost", user="root", passwd="", db="scale")
    cur = conn.cursor()
    cur.execute("UPDATE `scale`.`scale_stock` SET `current_weight` = %s WHERE `scale_stock`.`paper_roll_id` = %s", (weight, realtag))
    conn.commit()
    conn.rollback()  # I test the rollback() function here.
    cur.close()
    conn.close()
    return HttpResponseRedirect('/clamplift/')
Actually I want one button for update data to database and another button to rollback like providing undo function.
Please give me any idea. Thank you very much.
I have no experience with MySQLdb directly, but most of the time, the rollback method is only good during a transaction. That is, if a transaction is still open, then rollback will undo everything that has happened since the start of the transaction. So when you call commit, you are ending the transaction and can no longer meaningfully call rollback.
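In other words, rollback() only undoes work if it is called before commit() ends the transaction. A sketch against the question's own table, reusing weight and realtag from the view above:
conn = MySQLdb.Connect(host="localhost", user="root", passwd="", db="scale")
cur = conn.cursor()
cur.execute(
    "UPDATE `scale`.`scale_stock` SET `current_weight` = %s "
    "WHERE `scale_stock`.`paper_roll_id` = %s",
    (weight, realtag),
)
conn.rollback()  # transaction still open: the UPDATE is undone
# conn.commit() here would instead make the UPDATE permanent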
In response to question in comments:
Add a datetime field to the model to track revisions. Create another model with the same fields as the original model and a foreign key into the original model's table. When an update occurs to the original model, copy the old values into the new table, set the datetime to NOW, and set the foreign key in the revisions table to point at the row being edited before performing the update. You can then provide a list of previous states to revert to by selecting all of the revisions for the row in question and letting the user select one based on the datetime. Of course you can also record the user that made the edit, or any other specific information that you think will be useful to the users of your app.
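A hedged Django sketch of that pattern; the model and field names are illustrative, not from the question:
from django.db import models

class ScaleStock(models.Model):
    paper_roll_id = models.CharField(max_length=64)
    current_weight = models.FloatField()

class ScaleStockRevision(models.Model):
    original = models.ForeignKey(ScaleStock, on_delete=models.CASCADE)
    current_weight = models.FloatField()  # the old value being preserved
    revised_at = models.DateTimeField(auto_now_add=True)

def update_weight(stock, new_weight):
    # Copy the old state into the revisions table before updating.
    ScaleStockRevision.objects.create(
        original=stock, current_weight=stock.current_weight
    )
    stock.current_weight = new_weight
    stock.save()

def revert(stock, revision):
    # The "rollback button": restore a chosen revision's values.
    stock.current_weight = revision.current_weight
    stock.save()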
