SQLAlchemy (PostgreSQL) - Race Conditions - Python

We are writing an inventory system and I have some questions about SQLAlchemy (PostgreSQL) and transactions/sessions. This is a web app using TG2; not sure this matters, but too much info is never a bad thing.
How can I make sure that when changing inventory quantities I don't run into race conditions? If I understand it correctly: if user one is going to decrement inventory on an item to, say, 0, and user two is also trying to decrement the inventory to 0, then if user one's session hasn't been committed yet, user two's starting inventory number will be the same as user one's, resulting in a race condition when both commit, one overwriting the other instead of having a compound effect.
If I wanted to use a PostgreSQL sequence for things like order/invoice numbers, how can I get/set next values from SQLAlchemy without running into race conditions?
EDIT: I think I found the solution: I need to use with_lockmode, with FOR UPDATE or FOR SHARE. I am going to leave this open for more answers, or for others to correct me if I am mistaken.
TIA

If two transactions try to set the same value at the same time, one of them will fail, and the one that loses will need error handling. For your particular example you will want to query for the number of parts and update the number of parts in the same transaction.
There is no race condition on sequence numbers: save a record that uses a sequence number and the DB will automatically assign it.
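For illustration, a minimal sketch of both options in SQLAlchemy (the sequence name invoice_number_seq, the Invoice table and the connection string are assumptions, not from the question):

from sqlalchemy import Column, Integer, Sequence, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()
invoice_seq = Sequence('invoice_number_seq')  # assumed sequence name

class Invoice(Base):
    __tablename__ = 'invoice'  # assumed table
    id = Column(Integer, primary_key=True)
    # The DB assigns the number atomically at INSERT time; two concurrent
    # sessions can never be handed the same value by nextval().
    number = Column(Integer, invoice_seq,
                    server_default=invoice_seq.next_value())

engine = create_engine('postgresql://localhost/inventory')  # assumed DSN

# If you need the number before the INSERT, fetch it explicitly.
# Still race-free, but values consumed by rolled-back transactions
# leave gaps in the numbering.
with Session(engine) as session:
    next_number = session.scalar(select(invoice_seq.next_value()))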
Edit:
Note that, as Limscoder points out, you need to set the isolation level to Repeatable Read.

Set up the scenario you are talking about and see how your configuration handles it; just open two separate connections to test it.
Also read up on FOR UPDATE and on transaction isolation levels.
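As a sketch of what the row lock looks like from SQLAlchemy (with_lockmode('update') is the old spelling; newer versions call it with_for_update(). The Item model and column names here are assumptions):

from sqlalchemy.orm import Session

def decrement_stock(session: Session, item_id: int, amount: int) -> None:
    # SELECT ... FOR UPDATE: the second transaction blocks here until
    # the first commits, then re-reads the already-decremented quantity
    # instead of a stale value.
    item = (
        session.query(Item)            # Item is an assumed mapped class
        .filter(Item.id == item_id)
        .with_for_update()
        .one()
    )
    if item.quantity < amount:
        raise ValueError('insufficient stock')
    item.quantity -= amount
    session.commit()

A single atomic statement (UPDATE item SET quantity = quantity - :n WHERE id = :id) achieves the same thing without holding a lock across a round-trip, if you don't need to inspect the value first.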

Related

How to avoid integrity errors occurring because of concurrent creations/updates?

So let's say there's a model A which looks like this:
class A(model):
    name = char(unique=True)
When a user tries to create a new A, a view will check whether the name is already taken, like this:
name_taken = A.objects.get(name=name_passed_by_user)
if name_taken:
    return "Name exists!"
# Creating A here
It used to work well, but as the system grew, concurrent attempts at creating A's with the same name started to appear: sometimes multiple requests pass the "name exists" check within the same few milliseconds, resulting in integrity errors, since the name field has to be UNIQUE and several requests to create a given name get past the check.
The current solution is a lot of try/except IntegrityError wraps around the creation parts, despite the prior check. Is there a way to avoid that? There are a lot of models with UNIQUE constraints like this, and thus a lot of ugly try/except IntegrityError wraps. Is it possible to lock in a way that doesn't block plain SELECTs, but does block SELECT FOR UPDATE? Or maybe there's a more proper solution? I'm certain this is a common problem with usernames and similar fields/columns, and there must be a proper approach rather than exception catching.
The DB is Postgres 10 and the ORM is SQLAlchemy for Python, but tweaks made directly to the DB are applicable too.
The only thing you can do is set the appropriate transaction isolation level directly in Postgres; neither Python nor the ORM can do anything else about it. The SERIALIZABLE level will most likely solve your problem, but it might slow down performance, so you should try REPEATABLE READ too.
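If you want to try that from SQLAlchemy, the isolation level can be set on the engine (the DSN is an assumption):

from sqlalchemy import create_engine

# Applies to every connection handed out by this engine; it can also be
# set per-connection via engine.connect().execution_options(...).
engine = create_engine(
    'postgresql://localhost/mydb',   # assumed DSN
    isolation_level='SERIALIZABLE',  # or 'REPEATABLE READ'
)

Note that under SERIALIZABLE, conflicting transactions fail with a serialization error rather than an integrity error, so you still have to catch something and retry.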
If you are using Python, you should have heard of the “ask forgiveness, not permission” design principle.
To avoid the race condition you describe, simply try to add the new row to the table.
If you get a unique_violation (SQLSTATE 23505), roll back the transaction and return that the name exists.
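With SQLAlchemy that pattern might look like the sketch below; begin_nested() opens a SAVEPOINT so that only the failed INSERT is rolled back, not the whole transaction (A is the model from the question):

from sqlalchemy.exc import IntegrityError

def create_a(session, name):
    try:
        with session.begin_nested():   # SAVEPOINT
            session.add(A(name=name))
            session.flush()            # forces the INSERT here
    except IntegrityError as exc:
        # Only the SAVEPOINT was rolled back; the enclosing
        # transaction is still usable.
        if getattr(exc.orig, 'pgcode', '') != '23505':
            raise                      # some other integrity problem
        return "Name exists!"
    session.commit()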

django: Why doesn't transaction.savepoint_rollback(savepoint_id) roll back the PK (id) sequence in the database?

I'm using django 1.9 with psql. I'm taking advantage of the transaction.savepoint_rollback functionality to create a large number of instances and then rollback the changes.
Everything works as expected. However, I find it interesting that this functionality doesn't roll back the id sequence for the created model: e.g. if 1000 objects are created and rolled back, new objects will start with ids greater than 1000.
I'm wondering if somebody knows how to roll back the IDs, or whether that isn't possible at all?
Well, I found this is not really a Django-related situation. I came across this discussion:
The general idea with sequences is that they produce numbers that can be meaningfully compared for equality and for greater/less-than, but not for distance from each other. Because they're exempt from transactional rollback, you shouldn't use them when you need a gap-less sequence of numbers.
It's usually a sign of an application design problem when you need a gapless sequence. Try to work out a way to do what you need when there can be gaps. Sometimes it's genuinely necessary to have gapless sequences, though - for example, when generating cheque or invoice numbers.
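You can watch this happen without Django at all; a sketch with psycopg2 (connection parameters and sequence name are assumptions):

import psycopg2

conn = psycopg2.connect(dbname='test')   # assumed database
cur = conn.cursor()
cur.execute("CREATE SEQUENCE IF NOT EXISTS demo_seq")
conn.commit()

cur.execute("SELECT nextval('demo_seq')")
print(cur.fetchone()[0])   # e.g. 1
conn.rollback()            # roll the transaction back...

cur.execute("SELECT nextval('demo_seq')")
print(cur.fetchone()[0])   # ...yet the sequence still advanced: 2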

Multiple queries vs. manually sorting one large query (AppEngine NDB)

For a model like:
class Thing(ndb.Model):
    visible = ndb.BooleanProperty()
    made_by = ndb.KeyProperty(kind=User)
    belongs_to = ndb.KeyProperty(kind=AnotherThing)
Essentially I'm performing an 'or' query, but comparing different properties, so I can't use a built-in OR. I want to get all Things (belonging to a particular AnotherThing) which either have visible set to True, or have visible set to False and were made_by the current user.
Which would be less demanding on the datastore (i.e. financially cost less):
Query to get everything, i.e. Thing.query(Thing.belongs_to == some_thing.key), and iterate through the results, storing the visible ones and the ones that aren't visible but were made_by the current user?
Query to get the visible ones, i.e. Thing.query(Thing.belongs_to == some_thing.key, Thing.visible == True), and query separately to get the non-visible ones by the current user, i.e. Thing.query(Thing.belongs_to == some_thing.key, Thing.visible == False, Thing.made_by == current_user)?
Number 1 would return many unneeded results, like non-visible Things by other users - which I think means many reads of the datastore? Number 2 is two whole queries, though, which is also possibly unnecessarily heavy, right? I'm still trying to work out what kinds of interaction with the database cause what kinds of costs.
I'm using ndb, tasklets and memcache where necessary, in case that's relevant.
Number two is going to cost less financially, for two reasons. First, you pay for each read of the datastore and for each entity returned by a query, so you will pay more for the first option, where you have to read all the data and query all the data; with the second you only pay for what you need.
Secondly, you also pay for backend or frontend time, and you will be spending time iterating through all the results in the first method, whereas you spend no time on that in the second method.
I can't see a way where the first option is better (maybe if you only have a few entities?).
To understand what reads and queries cost you, scroll down a little on:
https://developers.google.com/appengine/docs/billing
You will see how Reads, Writes and Smalls are added up for reads, writes and queries.
I would also just query for the ones that are owned by the current user instead of visible=False and owner=current; this way you don't need a composite index, which will save some time. You can also make visible a partial index, thus saving some space as well (only index it when True, assuming you never need to query for False ones). You will need to do a little work to remove duplicates, but that is probably not too bad; see the sketch below.
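A sketch of that suggestion (assuming the Thing model from the question, and that current_user is a User entity):

from google.appengine.ext import ndb

def visible_or_mine(some_thing, current_user):
    visible = Thing.query(Thing.belongs_to == some_thing.key,
                          Thing.visible == True).fetch()
    mine = Thing.query(Thing.belongs_to == some_thing.key,
                       Thing.made_by == current_user.key).fetch()
    # A visible Thing made by the current user comes back from both
    # queries, so de-duplicate by key.
    unique = {t.key: t for t in visible + mine}
    return list(unique.values())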
You are probably best off benchmarking both cases using real-world data. It's hard to determine things like this in the abstract, as there are many subtleties that may affect overall performance.
I would expect option 2 to be better though. Loading tons of objects that you don't care about is simply going to put a heavy burden on the data store that I don't think an extra query would be comparable to. Of course, it depends on how many extra things, etc.

Python MySQLdb: Update if exists, else insert

I am looking for a simple way to do an update or an insert based on whether the row exists in the first place. I am trying to use Python's MySQLdb right now.
This is how I execute my query:
# %s placeholders let MySQLdb escape the values (safer than str.format)
self.cursor.execute(
    """UPDATE `inventory`
       SET `quantity` = `quantity` + %s
       WHERE `item_number` = %s""",
    (quantity, item_number))
I have seen four ways to accomplish this:
DUPLICATE KEY. Unfortunately the primary key is already taken up as a unique ID so I can't use this.
REPLACE. Same as above, I believe it relies on a primary key to work properly.
mysql_affected_rows(). Usually you can use this after updating the row to see if anything was affected. I don't believe MySQLdb in Python supports this feature.
Of course the last ditch effort: Make a SELECT query, fetchall, then update or insert based on the result. Basically I am just trying to keep the queries to a minimum, so 2 queries instead of 1 is less than ideal right now.
Basically I am wondering if I missed any other way to accomplish this before going with option 4. Thanks for your time.
MySQL DOES allow you to have unique indexes, and INSERT ... ON DUPLICATE KEY UPDATE will do the update if any unique index has a duplicate, not just the PK.
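For example (a sketch; it assumes you have added a unique index such as ALTER TABLE inventory ADD UNIQUE (item_number)):

cursor.execute(
    """INSERT INTO inventory (item_number, quantity)
       VALUES (%s, %s)
       ON DUPLICATE KEY UPDATE quantity = quantity + VALUES(quantity)""",
    (item_number, quantity))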
However, I'd probably still go for the "two queries" approach. You are doing this in a transaction, right?
Do the update
Check the rows affected, if it's 0 then do the insert
OR
Attempt the insert
If it failed because of a unique index violation, do the update (NB: You'll want to check the error code to make sure it didn't fail for some OTHER reason)
The former is good if the row will usually exist already, but can cause a race (or deadlock) condition if you do it outside a transaction or if your isolation level is not high enough.
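Contrary to point 3 in the question, MySQLdb does expose the affected-row count as the standard DB-API cursor.rowcount attribute, so the first approach might be sketched as:

import MySQLdb

def upsert_quantity(conn, item_number, quantity):
    cur = conn.cursor()
    cur.execute(
        "UPDATE inventory SET quantity = quantity + %s WHERE item_number = %s",
        (quantity, item_number))
    if cur.rowcount == 0:
        # No existing row.  Note this check is racy without a unique
        # index: two concurrent callers can both see rowcount == 0.
        cur.execute(
            "INSERT INTO inventory (item_number, quantity) VALUES (%s, %s)",
            (item_number, quantity))
    conn.commit()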
Creating a unique index on item_number in your inventory table sounds like a good idea to me, because I imagine (without knowing the details of your schema) that one item should only have a single stock level (assuming your system doesn't allow multiple stock locations etc).

Insert/Delete performance

DB table:
    id      int(6)
    message char(5)
I have to add a record (message) to the DB table. In the case of a duplicate message (the message already exists with a different id), I want to delete (or somehow deactivate) both of the messages and get their IDs in reply.
Is it possible to perform this with only one query? Any performance tips?
P.S.
I use PostgreSQL.
The main problem I'm worried about is the need to use locks when performing this with two or more queries...
Many thanks!
If you really want to worry about locking, do this.
UPDATE the_table SET status = 'INACTIVE' WHERE message = 'xxxxx';
If this succeeds, there was a duplicate.
INSERT the additional inactive record. Do whatever else you want with your duplicates.
If this fails, there was no duplicate.
INSERT the new active record.
Commit.
This seizes an exclusive lock right away. The alternatives aren't quite as nice.
Starting with an INSERT and checking for duplicates doesn't seize a lock until you start updating; it's not clear whether this is a problem or not.
Starting with a SELECT would need an added LOCK TABLE to assure that the SELECT held the row it found so that it could be updated. If no row is found, the insert will work fine.
If you have multiple concurrent writers and two writers could attempt access at the same time, you may not be able to tolerate row-level locking.
Consider this.
Process A does a LOCK ROW and a SELECT but finds no row.
Process B does a LOCK ROW and a SELECT but finds no row.
Process A does an INSERT and a COMMIT.
Process B does an INSERT and a COMMIT. You now have duplicate active records.
Multiple concurrent insert/update transactions will only work reliably with table-level locking. Yes, it's a potential slow-down. Three rules: (1) keep your transactions as short as possible, (2) release the locks as quickly as possible, (3) handle deadlocks by retrying. A sketch of the whole procedure follows.
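Here is one way that procedure could look with psycopg2 (the table name messages and the status column are assumptions; RETURNING lets the UPDATE hand back the duplicate's id in the same statement):

import psycopg2

def add_message(conn, new_id, message):
    cur = conn.cursor()
    # Step 1: deactivate any existing duplicate.  UPDATE takes an
    # exclusive row lock immediately, serialising concurrent writers.
    cur.execute(
        "UPDATE messages SET status = 'INACTIVE' "
        "WHERE message = %s AND status = 'ACTIVE' RETURNING id",
        (message,))
    duplicate = cur.fetchone()
    # Step 2: insert the new record, inactive if it is a duplicate.
    status = 'INACTIVE' if duplicate else 'ACTIVE'
    cur.execute(
        "INSERT INTO messages (id, message, status) VALUES (%s, %s, %s)",
        (new_id, message, status))
    conn.commit()
    # Both ids come back when a duplicate was found.
    return (duplicate[0], new_id) if duplicate else (new_id,)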
You could write a procedure with both of those commands in it, but it may make more sense to use an insert trigger to check for duplicates (or a nightly job, if it's not time-sensitive).
It is a little difficult to understand your exact requirement, so let me rephrase it two ways:
You want both entries with the same message in the table (with different IDs), and you want to know the IDs for some further processing (marking them as inactive, etc.). For this, you could write a procedure with the separate queries; I don't think you can achieve this with one query.
You do not want either of the entries in the table (I got this from "I want to delete"). For this, you only have to check whether the message already exists, and then delete the row if it does, else insert it. I don't think this can be achieved with one query either.
If performance is a constraint during insert, you could insert without any checks and then periodically, sanitize the database.