How to avoid integrity errors occurring because of concurrent creations/updates? - python

So let's say there's a model A which looks like this:
class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
When a user tries to create a new A, a view checks whether the name is already taken, like this:
name_taken = session.query(A).filter_by(name=name_passed_by_user).first()
if name_taken:
    return "Name exists!"
# Creating A here
It used to work well, but as the system grew, concurrent attempts to create A's with the same name started to appear. Sometimes multiple requests pass the "name exists" check within the same few milliseconds and all proceed to the insert; since the name column is UNIQUE, all but the first fail with integrity errors.
The current solution is a lot of "try: except IntegrityError:" wraps around the creation parts, despite the prior check. Is there a way to avoid that? There are many models with UNIQUE constraints like this, and thus a lot of ugly "try: except IntegrityError:" wraps. Is it possible to take a lock that doesn't block plain SELECTs, but does block SELECT ... FOR UPDATE? Or maybe there's a more proper solution? I'm certain this is a common problem with usernames and similar fields/columns, and there must be a proper approach rather than exception catching.
The DB is Postgres 10 and the ORM is SQLAlchemy (Python), but tweaks applied directly to the DB are acceptable too.

The only thing you can do is set the appropriate transaction isolation level in Postgres itself; nothing in Python or the ORM can otherwise prevent the race. The SERIALIZABLE level will most likely solve your problem, but it can slow down performance, so you should try REPEATABLE READ too.
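SQLAlchemy can at least forward the chosen level to Postgres when the engine is created; a minimal sketch (the connection URL is illustrative):
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@localhost/mydb",
    isolation_level="SERIALIZABLE",  # or "REPEATABLE READ"
)
Note that under these stricter levels, concurrent transactions that do conflict fail with a serialization error and must be retried by the caller.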

If you are using Python, you should have heard of the “ask forgiveness, not permission” design principle.
To avoid the race condition you describe, simply try to add the new row to the table.
If you get a unique_violation (SQLSTATE 23505), rollback the transaction and return that the name exists.
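A minimal sketch of that pattern, reusing the question's model and session (the pgcode check assumes the psycopg2 driver):
from sqlalchemy.exc import IntegrityError

try:
    session.add(A(name=name_passed_by_user))
    session.commit()
except IntegrityError as exc:
    session.rollback()
    if getattr(exc.orig, "pgcode", None) == "23505":  # unique_violation
        return "Name exists!"
    raise
On Postgres you can also push the whole check into the statement itself with INSERT ... ON CONFLICT DO NOTHING.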

Related

SQLAlchemy add vs add_all

I'm trying to figure out when to use session.add and when to use session.add_all with SQLAlchemy.
Specifically, I don't understand the downsides of using add_all. It can do everything that add can do, so why not just always use it? There is no mention of this in the SQLAlchemy documentation.
If you only have one new record to add, then use sqlalchemy.orm.session.Session.add() but if you have multiple records then use sqlalchemy.orm.session.Session.add_all(). There's not really a significant difference, except the API of the first method is for a single instance whereas the second is for multiple instances. Is that a big difference? No. It's just convenience.
I was wondering about the same and, as mentioned by others, there is no real difference. However, I would like to add that using add in a loop instead of add_all allows more fine-grained exception handling. Passing a list of mapped class instances to add_all will cause a rollback for all instances if, for example, one of these objects violates a constraint (e.g., unique). I prefer to decouple my data logic from my service logic and decide in the service layer what to do with instances that weren't stored, by returning them from my data layer. However, I think it depends on how you are handling exceptions.
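A sketch of the loop variant (SQLAlchemy 1.4+; session and instances are assumed to exist), where a SAVEPOINT confines each constraint violation to a single instance instead of rolling back the whole batch:
from sqlalchemy.exc import IntegrityError

rejected = []
for instance in instances:
    try:
        with session.begin_nested():  # SAVEPOINT, flushed on exit
            session.add(instance)
    except IntegrityError:
        rejected.append(instance)     # e.g. unique-constraint violation
session.commit()                      # commits the instances that survived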

Conflict resolution in ZODB

I run parallel write requests against my ZODB, which contains multiple BTree instances. When the server accesses the same objects inside such a BTree, I get a ConflictError for the IOBucket class. For all my Django-based classes I have _p_resolveConflict set up, but I can't implement it for IOBucket since it's a C-based class.
I did a deeper analysis, but I still don't understand why it complains about the IOBucket class and what it writes into it. Additionally, what would be the right strategy to resolve it?
Thousand thanks for any help!
IOBucket is part of the persistence structure of a BTree; it exists to try and reduce conflict errors, and it does try and resolve conflicts where possible.
That said, conflicts are not always avoidable, and you should restart your transaction. In Zope, for example, the whole request is re-run up to 5 times if a ConflictError is raised. Conflicts are ZODB's way of handling the (hopefully rare) occasion where two different requests tried to change the exact same data structure.
Restarting your transaction means calling transaction.begin() and applying the same changes again. The .begin() will fetch any changes made by the other process and your commit will be based on the fresh data.
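A sketch of such a retry loop using the transaction package (apply_changes is a hypothetical stand-in for whatever write logic raised the conflict):
import transaction
from ZODB.POSException import ConflictError

for attempt in range(5):        # Zope retries a request up to 5 times
    try:
        transaction.begin()     # picks up the other process's committed state
        apply_changes(root)     # hypothetical: re-apply your modifications
        transaction.commit()
        break
    except ConflictError:
        transaction.abort()
else:
    raise RuntimeError("still conflicting after 5 attempts")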

SQLAlchemy(Postgresql) - Race Conditions

We are writing an inventory system and I have some questions about SQLAlchemy (PostgreSQL) and transactions/sessions. This is a web app using TG2; not sure that matters, but too much info is never bad.
How can I make sure that when changing inventory quantities I don't run into race conditions? If I understand it correctly: if user one is going to decrement an item's inventory to, say, 0, and user two is also trying to decrement it to 0, then if user one's session hasn't been committed yet, user two's starting inventory number is the same as user one's. That's a race condition when both commit: one overwrites the other instead of the two having a compound effect.
If I wanted to use a PostgreSQL sequence for things like order/invoice numbers, how can I get/set next values from SQLAlchemy without running into race conditions?
EDIT: I think I found the solution: I need to use with_lockmode, using FOR UPDATE or FOR SHARE. I am going to leave this open for more answers, or for others to correct me if I am mistaken.
TIA
If two transactions try to set the same value at the same time, one of them will fail, and the one that loses will need error handling. For your particular example, you will want to query for the number of parts and update it in the same transaction.
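For instance, a single UPDATE can do the check and the decrement in one statement, so nothing can slip in between the read and the write (Item, sku, qty, and amount are illustrative names):
updated = (
    session.query(Item)
    .filter(Item.sku == sku, Item.qty >= amount)
    .update({Item.qty: Item.qty - amount}, synchronize_session=False)
)
session.commit()
if not updated:
    raise ValueError("unknown item or insufficient stock")  # zero rows matched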
There is no race condition on sequence numbers: save a record that uses a sequence number and the DB will assign it automatically.
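A sketch of a sequence-backed column (model names are illustrative): each INSERT draws nextval() from the sequence, so concurrent inserts get distinct numbers without any application-side locking.
from sqlalchemy import Column, Integer, Sequence
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Invoice(Base):
    __tablename__ = "invoice"
    id = Column(Integer, primary_key=True)
    number = Column(Integer, Sequence("invoice_number_seq"), unique=True)
Keep in mind that rolled-back transactions still consume sequence values, so the numbering can have gaps.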
Edit:
Note: as Limscoder points out, you need to set the isolation level to Repeatable Read.
Set up the scenario you are talking about and see how your configuration handles it. Just open two separate connections to test it.
Also read up on SELECT ... FOR UPDATE and on transaction isolation levels.
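The with_lockmode call mentioned in the question's edit has since been superseded; here is a sketch of the same row-lock idea with the current with_for_update() (Item, sku, qty, and amount are illustrative):
item = (
    session.query(Item)
    .filter_by(sku=sku)
    .with_for_update()  # emits SELECT ... FOR UPDATE
    .one()
)
item.qty -= amount      # other FOR UPDATE readers block until we commit
session.commit()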

Will two programs using SqlAlchemy conflict when trying to access the same table (SQLite)?

I'm going to have two independent programs (using SQLAlchemy / ORM / Declarative) that will inevitably try to access the same database file/table (SQLite) at the same time.
They could both want to read or write to that table.
Will there be a conflict when this happens?
If the answer is yes, how could this be handled?
SQLite is resistant to issues like the ones you describe. http://www.sqlite.org/howtocorrupt.html gives details on what could cause problems, and those causes are generally isolated from anything the code might accidentally do.
If you're concerned due to the nature of your application's data access, use BEGIN TRANSACTION and COMMIT/ROLLBACK as appropriate. If your transactions are single-query accesses (that is, you're not reading a value in one query and then changing it in another relative to what you already read), this should not be necessary.
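A sketch of the explicit-transaction pattern with the standard sqlite3 module (table and column names are illustrative); isolation_level=None disables the driver's implicit transactions so BEGIN/COMMIT are fully under our control:
import sqlite3

conn = sqlite3.connect("shared.db", timeout=30, isolation_level=None)
try:
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    (qty,) = conn.execute("SELECT qty FROM stock WHERE id = ?", (1,)).fetchone()
    conn.execute("UPDATE stock SET qty = ? WHERE id = ?", (qty - 1, 1))
    conn.execute("COMMIT")
except sqlite3.OperationalError:     # e.g. "database is locked" after the timeout
    conn.execute("ROLLBACK")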

What's a good general way to log SQLAlchemy transactions, complete with authenticated user, etc?

I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but with additional columns specifying when the change was made, who made it, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so full answers to all of them won't fit the StackOverflow answer format. I'll describe the hints in short; ask a separate question about any of them if that's not enough.
Assigning user and description to transaction
The most popular way to do so is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad way that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, that's the best design for an application with authentication anyway), since there is only one user using the session. But passing a description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible, and it suits passing a description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
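A minimal sketch of that extended commit; stashing the values in Session.info (a scratch dict SQLAlchemy provides for user data) is my assumption, with the logging hooks expected to read them from there:
from sqlalchemy.orm import Session

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        self.info["user"] = user                  # read later by the log hooks
        self.info["description"] = description
        super().commit()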
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful for you is probably the state.committed_state dictionary, which contains the original state for changed fields (including many-to-many relations!). There is also the state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case, use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
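A sketch of walking those histories for a mapped instance obj before a flush; inspect(obj) returns the same InstanceState that instance_state(obj) does:
from sqlalchemy import inspect

state = inspect(obj)
for attr in state.attrs:
    hist = attr.history
    if hist.has_changes():
        print(attr.key, "old:", hist.deleted, "new:", hist.added)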
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert, and delete. You need to provide your own extension with all the before hooks (you can't use after hooks, since the state of the objects is changed on flush). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you have to flush changes twice if you use mapped objects for the log, since the unit of work doesn't account for objects created in hooks.
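On current SQLAlchemy the same hooks are available through the event API rather than a MapperExtension; a sketch (Order is a stand-in mapped class and order_log a hypothetical plain Table, written through the connection so the log row doesn't re-enter the unit of work):
from datetime import datetime
from sqlalchemy import event

@event.listens_for(Order, "before_update")
def write_log_row(mapper, connection, target):
    connection.execute(
        order_log.insert().values(order_id=target.id, changed_at=datetime.utcnow())
    )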
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects
