Is explicitly checking whether an SQL INSERT operation succeeded unnecessary? - python

I'm using Python to talk to a Postgres DBMS using psycopg2.
Is it safe to assume that if an INSERT returns without raising an exception, then that INSERT actually did store something new in the database?
Right now I check the 'rowcount' attribute of the database cursor, and treat the INSERT as failed if it is 0. However, I'm starting to think that this isn't necessary.
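Roughly, the check I have in mind looks like this (a minimal sketch; the items table and connection string are made up):

import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection
cur = conn.cursor()
cur.execute("INSERT INTO items (name) VALUES (%s)", ("widget",))
if cur.rowcount == 0:
    print("INSERT stored nothing")  # is this check ever needed?
conn.commit()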

Is it safe to assume that if an INSERT returns without raising an
exception, then that INSERT actually did store something new in the
database?
No.
The affected record count will be zero if:
You ran an INSERT INTO ... SELECT ..., and the query returned no rows
You ran an INSERT INTO ... ON CONFLICT DO NOTHING, and it encountered a conflict
You have a BEFORE INSERT trigger on your table, and the trigger function returned NULL
You have a rule defined which results in no records being affected (e.g. ... DO INSTEAD NOTHING)
(... and possibly more, though nothing comes to mind.)
The common thread is that it will only affect zero records if you told it to, one way or another. Whether you want to treat any of these as a "failure" is highly dependent on your application logic.
Anything which is unequivocally a "failure" (constraint violation, serialisation failure, out of disk space...) should throw an error, so checking the record count is generally unnecessary.
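As a hedged illustration of the ON CONFLICT case above (assumes PostgreSQL 9.5+ and a hypothetical items table with a unique name column), the INSERT completes without an exception yet reports zero affected rows:

cur.execute(
    "INSERT INTO items (name) VALUES (%s) ON CONFLICT (name) DO NOTHING",
    ("widget",))
print(cur.rowcount)  # 0 if 'widget' already existed; no exception is raised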

By default psycopg2 returns None for a successful insert; from the docs for cursor.execute:
cursor.execute - The method returns None. If a query was executed, the returned values can be retrieved using fetch*() methods.
http://initd.org/psycopg/docs/cursor.html
If you want to know something about the insert, an easy/efficient option is to use RETURNING (which takes the same options as a SELECT):
INSERT INTO ... RETURNING id
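For instance, a minimal psycopg2 sketch (table and column names are hypothetical):

cur.execute(
    "INSERT INTO items (name) VALUES (%s) RETURNING id",
    ("widget",))
new_id = cur.fetchone()[0]  # primary key generated for the inserted row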

I found a similar question here: How to check if value is inserted successfully or not?
They seem to use the row count method to check whether the data was inserted correctly.

Related

Python MySQLdb not returning the result of last query?

So I have a function using MySQLdb:
def getUserPoints(uid):
    qServer.execute("SELECT points FROM TS3_STAMM_1 WHERE ts3_uid=%s", (uid,))
    qConn.commit()
    r = int(qServer.fetchall()[0][0])
    return r
which returns a single unsigned int.
Now two things happen:
If I leave out the qConn.commit() it always returns the same value, even though the value in the MySQL database has changed. (But isn't the commit call just for changing things?)
Also, for some reason the query returns the same value as the last query if there hasn't been a query for that exact entry for over 10 minutes. But after querying a second time, a second later, it returns the new value.
Why is that? Is it an issue with my code or the query? Maybe there is a cache which isn't cleared and is returned the second time?
I also tried just running the query twice, but there is still the same problem.
Also getting rid of the commit call doesn't change anything, and fetching twice doesn't change it either.
The default with mysql-python is autocommit=False. That means that your queries implicitly start transactions and you need to explicitly call commit to commit changes to the database.
If you are running with the REPEATABLE READ isolation level, you won't see changes from other transactions. When you call getUserPoints, the first select is in the old transaction so you get the old value. Then the transaction is committed, so you get the updated value when you call getUserPoints again.
The autocommit=False behaviour can be unintuitive. Django, for example, defaults to autocommit=True, and they recommend the READ COMMITTED isolation level instead of REPEATABLE READ (the default for MySQL).
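As a minimal sketch of turning autocommit on with MySQLdb (connection details are made up; the alternative is to call qConn.commit() or rollback() before each fresh read to end the old snapshot):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="u", passwd="p", db="ts3")  # hypothetical credentials
conn.autocommit(True)  # each statement commits, so later reads see fresh data

cur = conn.cursor()
cur.execute("SELECT points FROM TS3_STAMM_1 WHERE ts3_uid=%s", ("some-uid",))
print(cur.fetchone())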

PonyORM (Python) "Value was updated outside of current transaction" but it wasn't

I'm using Pony ORM version 0.7 with a Sqlite3 database on disk, and running into this issue: I am performing a select, then an update, then a select, then another update, and getting an error message of
pony.orm.core.UnrepeatableReadError: Value of Task.order_id for
Task[23654] was updated outside of current transaction (was: 1, now: 2)
I've reduced the problem to the minimal set of commands that reproduces it (removing any of them makes the problem go away):
@db_session
def test_method():
    tasks = list(map(Task.to_dict, Task.select()))
    db.execute("UPDATE Task SET order_id=order_id*2")
    task_to_move = select(task for task in Task if task.order_id == 2).first()
    task_to_move.order_id = 1

test_method()
For completeness's sake, here is the definition of Task:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int)
Also, if I remove the constraint task.order_id == 2 from my select, the problem no longer occurs, so I assume the problem has something to do with querying on a field that has changed since the transaction started. But I don't know why the error message says it was changed by a different transaction (unless db.execute runs in a separate transaction because it is raw SQL?)
I've already looked at this similar question, but the problem was different (Pony ORM reports record "was updated outside of current transaction" while there is not other transaction) and at this documentation (https://docs.ponyorm.com/transactions.html) but neither solved my problem.
Any ideas what might be going on here?
Pony uses optimistic concurrency control by default. For each attribute Pony remembers its current value (potentially modified by application code) as well as the original value which was read from the database. During UPDATE Pony checks that the value of the column in the database is still the same. If the value has changed, Pony assumes that some concurrent transaction changed it, and throws an exception in order to avoid a "lost update" situation.
If you execute a raw SQL query, Pony does not know what exactly was modified in the database. So when Pony sees that the column value has changed, it mistakenly thinks that the value was changed by another transaction.
In order to avoid the problem you can mark the order_id attribute as volatile. Then Pony will assume that the value of the attribute can change at any time (by a trigger or raw SQL update), and will exclude that attribute from optimistic checks:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, volatile=True)
Note that Pony will cache the value of a volatile attribute and will not re-read the value from the database until the object is saved, so in some situations you can get an obsolete value in Python.
Update:
Starting from release 0.7.4 you can also specify optimistic=False option to db_session to turn off optimistic checks for specific transaction that uses raw SQL queries:
with db_session(optimistic=False):
    ...
or
@db_session(optimistic=False)
def some_function():
    ...
Also it is now possible to specify the optimistic=False option for an attribute instead of specifying volatile=True. Then Pony will not make optimistic checks for that attribute, but will still treat it as non-volatile.
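For example, the entity above would become (a sketch; requires Pony 0.7.4+):

class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, optimistic=False)  # no optimistic check, still non-volatile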

conditional add statement in SQLAlchemy

Suppose I want to upload several SQL records, to a table that may not be populated yet. If there is a record with a primary key("ID") that already exists, either in the table or in the records to be committed to a table, I want to replace the existing record with the new record.
I'm using MSSQL (SQL Server 2008).
My first guess would be
try:
    session.add(record)
    session.commit()
except:
    session.query(Class).\
        filter(Class.ID == record.ID).\
        update(some expression)
    session.commit()
what should the expression be? and is there a cleaner(and safer!) way of doing this?
In general, unless you use statements that guarantee atomicity, you'll always have to account for race conditions that might arise from multiple actors trying to insert or update (don't forget delete). Even the MERGE statement, though a single statement, can have race conditions if not used correctly.
Traditionally this kind of "upsert" is performed using stored procedures or other SQL or implementation specific features available, like the MERGE statement.
An SQLAlchemy solution has to either attempt the insert and perform an update if an integrity error is raised, or perform the update and attempt an insert if no rows were affected. It should be prepared to retry in case both operations fail (a row might get deleted or inserted in between):
from sqlalchemy.exc import IntegrityError

while True:  # Infinite loop, use a retry counter if necessary
    try:
        # Begin a savepoint; prevents the whole transaction failing
        # in case of an integrity error
        with session.begin_nested():
            session.add(record)
            # Flush instead of commit, we need the transaction intact
            session.flush()
        # If the flush is successful, break out of the loop as the insert
        # was performed
        break
    except IntegrityError:
        # Attempt the update. If the session has to reflect the changes
        # performed by the update, change the `synchronize_session` argument.
        if session.query(Class).\
                filter_by(ID=record.ID).\
                update({...},
                       synchronize_session=False):
            # 1 or more rows were affected (hopefully 1)
            break
        # Nothing was updated, perhaps a DELETE in between.
        # Both operations have failed, retry.

session.commit()
Regarding
If there is a record with a primary key("ID") that already exists, either in the table or in the records to be committed to a table, I want to replace the existing record with the new record.
If you can be sure that no concurrent updates to the table in question will happen, you can use Session.merge for this kind of task:
# Records have primary key set, on which merge can either load existing
# state and merge, or create a new record in the session if none was found.
for record in records:
    merged_record = session.merge(record)
    # Note that merged_record is not record

session.commit()
The SQLAlchemy merge will first check if an instance with given primary key exists in the identity map. If it doesn't and load is passed as True it'll check the database for the primary key. If a given instance has no primary key or an instance cannot be found, a new instance will be created.
The merge will then copy the state of the given instance onto the located/created instance. The new instance is returned.
No. There is a much better pattern for doing this. Do a query first to see if the record already exists, and then proceed accordingly.
Using your syntax, it would be something like the following:
result = session.query(Class).filter(Class.ID == record.ID).first()

# If record does not exist in Db, then add record
if result is None:
    try:
        session.add(record)
        session.commit()
    except:
        db.rollback()
        log.error('Rolling back transaction in query-none block')

# If record does exist, then update value of record in Db
else:
    try:
        session.query(Class).\
            filter(Class.ID == record.ID).\
            update(some expression)
        session.commit()
    except:
        db.rollback()
        log.error('Rolling back transaction')
It's usually a good idea to wrap your database operations in a try/except block, so you're on the right track with the try portion of what you wrote. Depending on what you're doing, the except block should typically show an error message or do a db rollback.

Is there a better way to deal with DoesNotExist query sets

There is probably a better way of dealing with nonexistent query sets!
The problem I have with this code is that it raises an exception in the normal case, i.e. when no workspace with the same name exists in the db.
Instead of handling an exception I would like a query that returns True or False rather than raising DoesNotExist.
My inelegant code:
try:
    is_workspace_name = Workspace.objects.get(workspace_name=workspace_name, user=self.user.id)
except:
    return workspace_name
if is_workspace_name:
    raise forms.ValidationError(u'%s already exists as a workspace name! Please choose a different one!' % workspace_name)
Thanks a lot!
You can use exists() method. Quoting docs:
Returns True if the QuerySet contains any results, and False if not.
This tries to perform the query in the simplest and fastest way
possible, but it does execute nearly the same query as a normal
QuerySet query.
Remarks: the simplest and fastest way. It is cheaper to use exists() than count(), because with exists() the database can stop at the first matching row.
if Workspace.objects.filter(workspace_name=workspace_name,
                            user=self.user.id).exists():
    raise forms.ValidationError(u'%s already exists ...!' % workspace_name)
else:
    return workspace_name
Checking for the existence of a record.
If you want to test for the existence of a record in your database, you could use Workspace.objects.filter(workspace_name=workspace_name, user=self.user.id).count().
This will return the number of records matching your conditions. This number will be 0 if there are none, which is readily usable in an if clause. I believe this to be the most standard and easy way to do what you need here.
## EDIT ## Actually that's false; you might want to check danihp's answer for a better solution using QuerySet.exists!
A word of warning: the case of checking for existence before insertion
Be cautious when using such a construct however, especially if you plan on checking whether you have a duplicate before trying to insert a record. In such a case, the best solution is to try to create the record and see if it raises an exception.
Indeed, you could be in the following situation:
Request 1 reaches the server
Request 2 reaches the server
Check is done for request 1, no object exist.
Check is done for request 2, no object exist.
Proceed with creation in request 1.
Proceed with creation in request 2.
And... you have a duplicate - this is called a race condition, and is a common issue when dealing with parallel code.
Long story short, you should use try/except and unique constraints when dealing with insertion.
Using get_or_create, as suggested by init3, also helps. Indeed, get_or_create is aware of this, and you'll be safe as long as unwanted duplicates raise an IntegrityError:
obj, created = Workspace.objects.get_or_create(workspace_name=workspace_name, user=self.user.id)
if created:
    # everything ok
    # do something
    pass
else:
    # not ok
    # respond that the user should choose a different name
    pass
read more at the docs

SQLAlchemy - INSERT OR REPLACE equivalent

Does anybody know what the equivalent of the SQL "INSERT OR REPLACE" clause is in SQLAlchemy and its SQL expression language?
Many thanks -- honzas
What about Session.merge?
Session.merge(instance, load=True, **kw)
Copy the state of an instance onto the persistent instance with the same identifier.
If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of it and return it as a newly persistent instance. The given instance does not become associated with the session. This operation cascades to associated instances if the association is mapped with cascade="merge".
from http://www.sqlalchemy.org/docs/reference/orm/sessions.html
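As a hedged sketch of merge used as an INSERT OR REPLACE, assuming a mapped Widget class with primary key id:

widget = Widget(id=1, name='replacement')  # transient instance with the key set
merged = session.merge(widget)  # loads row 1 if it exists, else creates it
session.commit()                # emits an UPDATE or an INSERT accordingly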
Session.save_or_update(model)
I don't think (correct me if I'm wrong) INSERT OR REPLACE is in any of the SQL standards; it's an SQLite-specific thing. There is MERGE, but that isn't supported by all dialects either. So it's not available in SQLAlchemy's general dialect.
The cleanest solution is to use Session, as suggested by M. Utku. You could also use SAVEPOINTs: try an insert, and on IntegrityError roll back to the savepoint and do an update instead. A third solution is to write your INSERT with an OUTER JOIN and a WHERE clause that filters on the rows with nulls.
You can use OR REPLACE as a so-called prefix in your SQLAlchemy Insert -- the documentation for how to place OR REPLACE between INSERT and INTO in your SQL statement is here
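A minimal sketch of that approach (the widgets table is hypothetical; prefix_with injects the text right after the INSERT keyword):

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine('sqlite://')  # in-memory SQLite
metadata = MetaData()
widgets = Table('widgets', metadata,
                Column('id', Integer, primary_key=True),
                Column('name', String))
metadata.create_all(engine)

stmt = widgets.insert().prefix_with('OR REPLACE')  # INSERT OR REPLACE INTO widgets ...
with engine.begin() as conn:
    conn.execute(stmt, {'id': 1, 'name': 'first'})
    conn.execute(stmt, {'id': 1, 'name': 'second'})  # replaces the first row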
