I am doing synchronization between two databases in Odoo. If everything goes without issues on the remote side, then both databases end up synchronized. But if something goes wrong on the remote, the local database changes are committed while the remote ones are not.
In other words, the databases go out of sync.
Is there a way to make changes in the local database and, if something goes wrong while synchronizing the remote database, roll back the local database to its previous state?
There is this method:
@api.one
def order_process_now(self):
    servers = self._synchro_before_sale()
    # Process local order
    inv_id = self.action_invoice_create()
    if inv_id:
        inv = self.env['account.invoice'].search([('id', '=', inv_id)])
        inv.signal_workflow('invoice_open')
    for picking in self.picking_ids:
        picking.force_assign()
        picking.action_done()
    # Process remote orders
    self._remote_order_action('order_process_now', servers)
As you can see, it is divided into two parts. First it makes changes to the local database, then it makes changes on the remote (using xmlrpclib with the erppeek wrapper).
How can I make this method behave as one transaction, so that if anything goes wrong while executing it, any changes to either database are rolled back?
What you need for this is two-phase commit.
The general idea is:
Begin your local and remote changes
Do the required work on each
On the remote side PREPARE TRANSACTION and take note of the returned ID in persistent storage
On the local side COMMIT the changes
On the remote side COMMIT PREPARED with the returned ID, or if the local commit failed for some reason, ROLLBACK PREPARED instead.
If your app restarts it must look at its record of prepared-but-not-committed remote transactions and:
if the local transaction was committed, issue a COMMIT PREPARED; or
if the local transaction was NOT committed, issue a ROLLBACK PREPARED.
This is not simple to get right. The naïve approach that fails to record the local commit ID doesn't really fix anything, it just replaces inconsistent database state with leaked prepared transactions. You must actually keep a record of the prepared transactions and resolve them after a crash or restart. Remember that the ROLLBACK PREPARED or COMMIT PREPARED can fail due to connectivity issues, DB restarts, etc.
For this reason many people use a separate transaction manager that takes care of this part for them. MSDTC is an option on Windows systems. For Java you can supposedly use JTA. On C/UNIX systems you could use XA. Unfortunately, distributed transaction managers appear to attract horrible, baroque and ill-defined API design (can you say javax.transaction.HeuristicMixedException?).
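To make the steps concrete, here is a minimal sketch using psycopg2's two-phase-commit API against two PostgreSQL databases. It only illustrates the database-level flow: the connection strings, the sale_order UPDATE and the pending_2pc bookkeeping table are all hypothetical, it assumes you can reach both databases directly (unlike the XML-RPC/erppeek setup in the question), and the remote server needs max_prepared_transactions > 0.

import uuid
import psycopg2

# Hypothetical connection strings; both sides are assumed to be PostgreSQL.
local = psycopg2.connect("dbname=local_db")
remote = psycopg2.connect("dbname=remote_db")

# Global transaction id for the remote side. It has to survive a crash,
# so it is also recorded in the local database before the local commit.
xid = remote.xid(0, str(uuid.uuid4()), "sync")

try:
    remote.tpc_begin(xid)
    with local.cursor() as lcur, remote.cursor() as rcur:
        # Do the required work on each side (hypothetical statements).
        lcur.execute("UPDATE sale_order SET state = 'done' WHERE id = %s", (42,))
        rcur.execute("UPDATE sale_order SET state = 'done' WHERE id = %s", (42,))
        # Record the prepared-transaction id durably on the local side.
        lcur.execute("INSERT INTO pending_2pc (gid) VALUES (%s)", (xid.gtrid,))
    remote.tpc_prepare()   # PREPARE TRANSACTION on the remote
    local.commit()         # commit the local work plus the gid record
    remote.tpc_commit()    # COMMIT PREPARED on the remote
    # ...then delete the pending_2pc row in a follow-up local transaction.
except Exception:
    local.rollback()
    # A real implementation must choose between COMMIT PREPARED and
    # ROLLBACK PREPARED based on whether the local commit succeeded,
    # exactly as described above; here we simply roll the remote back.
    try:
        remote.tpc_rollback()
    except psycopg2.Error:
        pass  # connectivity lost: resolve later from pending_2pc / tpc_recover()
    raise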
You'll need to look at two-phase commits. Basically this lets you do a trial commit on each separate system and then, only if both succeed, do a final "real" commit.
You still need to deal with the case where e.g. the client crashes. Then you'll have prepared commits hanging about and you'll want to roll them back and start again.
Related
I am using python and sqlalchemy to manage a sqlite database (in the future I plan to replace sqlite with postgres).
The operations I do are INSERT, SELECT and DELETE and all these operations are part of a python script that runs every hour.
Each one of these operations can take a considerable amount of time due to the large amount of data.
Now, in certain circumstances the Python script may be killed by an external process. How can I make sure that my database is not corrupted if the script is killed while reading from or writing to the DB?
Well, you use a database.
Databases implement ACID properties (see here). To the extent possible, these guarantee the integrity of the data, even when transactions are not complete.
The issue that you are focusing on is dropped connections. I think dropped connections usually result in a transaction being rolled back (I'm not sure if there are exceptions). That is, the database ignores everything since the last commit.
So, the database protects you against internal corruption. Your data model might still become invalid if the sequence of operations is stopped at an arbitrary place. The solution to this is to wrap such operations in a transaction, so that an interrupted sequence is rolled back as a whole.
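As a minimal sketch of that (assuming SQLAlchemy on top of the SQLite file, with a hypothetical items table), a single engine.begin() block gives you exactly this behaviour: everything inside it is committed together, and nothing is committed if the script dies part-way through.

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///data.db")  # hypothetical database file

# One transaction: commits when the block completes, rolls back on error,
# and is simply never committed if the process is killed mid-way.
with engine.begin() as conn:
    conn.execute(text("DELETE FROM items WHERE expired = 1"))
    conn.execute(
        text("INSERT INTO items (name, expired) VALUES (:name, 0)"),
        [{"name": "a"}, {"name": "b"}],
    )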
There is a (small) danger of databases getting corrupted when the hardware or software they are running on suddenly "disappears". This is rare and there are safeguards. And, this is not the problem that you are concerned with (unless your SQLite instance is part of your python process).
I'd like to be able to tell someone (django, pgBouncer, or whoever can provide me with this service) to always hand me the same connection to the database (PostgreSQL in this case) on a per client/session basis, instead of getting a random one each time (or creating a new one for that matter).
To my knowledge:
Django's CONN_MAX_AGE can control the lifetime of connections, so far so good. This will also have a positive impact on performance (no connection setup penalties).
Some pooling package (pgBouncer for example) can hold the connections and hand them to me as I need them. We're almost there.
The only bit I'm missing is the possibility to ask pgBouncer (or any other similar tool for that matter) to give me a specific db connection, instead of "one from the pool". This is important because I want to have control over the lifetime of the transaction. I want to be able to open a transaction, then send a series of commands, and then manually commit all the work, or roll everything back should something fail.
Many years ago, I implemented something very similar to what I'm looking for now. It was a simple connection pool written in C which, on one hand, would hold as many connections to Oracle as clients needed, while on the other it would give these clients the chance to recover those exact connections based on some ID, which could have been, for example, a PHP session ID. That way users could acquire a lock on some database object/row, and the lock would persist even after the Apache process died. From that point on, the session owner was in total control of that row until they decided it was time to commit it, or until the backend decided it was time to let the transaction go due to idleness.
I have a question about SQLAlchemy and object refreshing.
I am in a situation where I have two sessions, and the same object has been queried in both sessions. For particular reasons, I cannot close one of the sessions.
I have modified the object and committed the changes in session A, but in session B the attributes are still the initial ones, without the modifications.
Should I implement a notification system to communicate changes, or is there a built-in way to do this in SQLAlchemy?
Sessions are designed to work like this. The attributes of the object in Session B WILL keep the values they had when the object was first queried in Session B. Additionally, SQLAlchemy will not attempt to automatically refresh objects in other sessions when they change, nor do I think it would be wise to try to create something like this.
You should be actively thinking of the lifespan of each session as a single transaction in the database. How and when sessions need to deal with the fact that their objects might be stale is not a technical problem that can be solved by an algorithm built into SQLAlchemy (or any extension for SQLAlchemy): it is a "business" problem whose solution you must determine and code yourself. The "correct" response might be to say that this isn't a problem: the logic that occurs with Session B could be valid if it used the data at the time that Session B started. Your "problem" might not actually be a problem. The docs actually have an entire section on when to use sessions, but they give a pretty grim response if you are hoping for a one-size-fits-all solution...
A Session is typically constructed at the beginning of a logical
operation where database access is potentially anticipated.
The Session, whenever it is used to talk to the database, begins a
database transaction as soon as it starts communicating. Assuming the
autocommit flag is left at its recommended default of False, this
transaction remains in progress until the Session is rolled back,
committed, or closed. The Session will begin a new transaction if it
is used again, subsequent to the previous transaction ending; from
this it follows that the Session is capable of having a lifespan
across many transactions, though only one at a time. We refer to these
two concepts as transaction scope and session scope.
The implication here is that the SQLAlchemy ORM is encouraging the
developer to establish these two scopes in his or her application,
including not only when the scopes begin and end, but also the expanse
of those scopes, for example should a single Session instance be local
to the execution flow within a function or method, should it be a
global object used by the entire application, or somewhere in between
these two.
The burden placed on the developer to determine this scope is one area
where the SQLAlchemy ORM necessarily has a strong opinion about how
the database should be used. The unit of work pattern is specifically
one of accumulating changes over time and flushing them periodically,
keeping in-memory state in sync with what’s known to be present in a
local transaction. This pattern is only effective when meaningful
transaction scopes are in place.
That said, there are a few things you can do to change how the situation works:
First, you can reduce how long your session stays open. Session B queries the object, and later you do something with that object (in the same session) for which you want the attributes to be up to date. One solution is to perform this second operation in a separate session.
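A rough sketch of that first option, assuming SQLAlchemy 1.4+ (the engine URL and the Thing model are made up for illustration): each short-lived session reloads the row from the database when it starts working with it.

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Thing(Base):  # hypothetical mapped class
    __tablename__ = "things"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///app.db")  # hypothetical database
Session = sessionmaker(bind=engine)

# First operation: query in one short-lived session, then let it end.
with Session() as session:
    first_seen = session.get(Thing, 1).name

# ...another session or process commits changes to the same row here...

# Second operation: a fresh session reads the current state from the database.
with Session() as session:
    current = session.get(Thing, 1).name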
Another is to use the expire/refresh methods, as the docs show...
# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)
# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)
You can use session.refresh() to immediately get an up-to-date version of the object, even if the session already queried the object earlier.
Run this to force the session to reload the latest values from your database of choice:
session.expire_all()
Excellent DOC about default behavior and lifespan of session
I just had this issue and the existing solutions didn't work for me for some reason. What did work was to call session.commit(). After calling that, the object had the updated values from the database.
TL;DR Rather than working on Session synchronization, see if your task can be reasonably easily coded with SQLAlchemy Core syntax, directly on the Engine, without the use of (multiple) Sessions.
For someone coming from a SQL and JDBC background, one critical thing to learn about SQLAlchemy, which, unfortunately, didn't become clear to me despite months of reading through the various documents, is that SQLAlchemy consists of two fundamentally different parts: the Core and the ORM. As the ORM documentation is listed first on the website and most examples use ORM-like syntax, one gets thrown into working with it and sets oneself up for errors and confusion when thinking about the ORM in terms of SQL/JDBC. The ORM uses its own abstraction layer that takes complete control over how and when actual SQL statements are executed. The rule of thumb is that a Session is cheap to create and kill, and it should never be re-used for anything in the program's flow and logic that may cause re-querying, synchronization or multi-threading. On the other hand, the Core is direct, no-frills SQL, very much like a JDBC driver. There is one place in the docs I found that "suggests" using Core over ORM:
it is encouraged that simple SQL operations take place here, directly on the Connection, such as incrementing counters or inserting extra rows within log
tables. When dealing with the Connection, it is expected that Core-level SQL
operations will be used; e.g. those described in SQL Expression Language Tutorial.
It appears, though, that using a Connection causes the same side effect as using a Session: re-querying a specific record returns the same result as the first query, even if the record's content in the DB has changed. So apparently Connections are as "unreliable" as Sessions for reading DB content in "real time", but a direct Engine execution seems to work fine, as it picks a Connection object from the pool (assuming the retrieved Connection is never in the same "reuse" state relative to the query as the specific open Connection). The Result object should be closed explicitly, as per the SA docs.
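For illustration, a sketch of that kind of direct, Core-level execution (the DSN and the jobs table are hypothetical). This is written in the 1.4+/2.0 style, where a connection is checked out of the pool just for the statement and returned right after, so repeated calls see the current database content:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/mydb")  # hypothetical DSN

# Core-level SQL, no Session involved: check out a connection, run the
# statement, return the connection to the pool when the block ends.
with engine.connect() as conn:
    status = conn.execute(
        text("SELECT status FROM jobs WHERE id = :id"), {"id": 7}
    ).scalar_one()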
What is your isolation level set to?
SHOW GLOBAL VARIABLES LIKE 'transaction_isolation';
By default, MySQL InnoDB's transaction_isolation is set to REPEATABLE-READ.
+-----------------------+-----------------+
| Variable_name | Value |
+-----------------------+-----------------+
| transaction_isolation | REPEATABLE-READ |
+-----------------------+-----------------+
Consider setting it to READ-COMMITTED.
You can set this for your sqlalchemy engine only via:
create_engine("mysql://<connection_string>", isolation_level="READ COMMITTED")
I think another option is:
engine = create_engine("mysql://<connection_string>")
engine = engine.execution_options(isolation_level="READ COMMITTED")
Note that Engine.execution_options() returns a copy of the engine with the options applied, so you have to use the engine it returns.
Or set it globally in the DB via:
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
and
https://docs.sqlalchemy.org/en/14/orm/session_transaction.html#setting-transaction-isolation-levels-dbapi-autocommit
If you have added the incorrect model to the session, you can do:
db.session.rollback()
Yesterday I was working with some sqlalchemy stuff that needed a "select ... for update" concept to avoid a race condition. Adding .with_lockmode('update') to the query works a treat on InnoDB and Postgres, but for sqlite I end up having to sneak in a
if session.bind.name == 'sqlite':
    session.execute('begin immediate transaction')
before doing the select.
This seems to work for now, but it feels like cheating. Is there a better way to do this?
SELECT ... FOR UPDATE OF ... is not supported. This is understandable
considering the mechanics of SQLite in that row locking is redundant
as the entire database is locked when updating any bit of it. However,
it would be good if a future version of SQLite supports it for SQL
interchangeability reasons if nothing else. The only functionality
required is to ensure a "RESERVED" lock is placed on the database if
not already there.
excerpt from
https://www2.sqlite.org/cvstrac/wiki?p=UnsupportedSql
[EDIT] Also see https://sqlite.org/isolation.html, thanks @michauwilliam.
I think you have to synchronize access to the whole database. Normal synchronization mechanisms should also apply here: file locks, process synchronization, etc.
I think a SELECT FOR UPDATE is relevant for SQLite. There is no way to lock the database BEFORE I start to write. By then it's too late. Here is the scenario:
I have two servers and one database queue table. Each server is looking for work, and when it picks up a job it updates the queue table with an "I got it" marker so the other server doesn't also pick up the same work. I need to leave the record in the queue in case of recovery.
Server 1 reads the first unclaimed item and has it in memory. Server 2 reads the same record and now has it in memory too. Server 1 then locks the database, updates the record, and unlocks. Server 2 then locks the database, updates, and unlocks. The result is that both servers now work on the same job. The table shows Server 2 has it, and the Server 1 update is lost.
I solved this by creating a lock table in the database. Server 1 begins a transaction and writes to the lock table, which locks the database for writing. Server 2 now tries to begin a transaction and write to the lock table, but is prevented. Server 1 now reads the first queue record and updates it with the "I got it" code, then deletes the record it just wrote to the lock table, commits, and releases the lock. Now Server 2 is able to begin its transaction, write to the lock table, read the 2nd queue record, update it with its "I got it" code, delete its lock record, and commit, and the database is available for the next server looking for work.
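A sketch of that claim-a-job idea with Python's sqlite3 module (the queue table and its columns are made up). Instead of a separate lock table it uses BEGIN IMMEDIATE, which takes the same RESERVED write lock up front, so only one server at a time can run the read-then-claim sequence; a dedicated lock table written to inside an ordinary transaction achieves the same effect, as described above.

import sqlite3

def claim_next_job(db_path, server_id):
    """Claim the next unclaimed queue row; return its id, or None if none is left."""
    conn = sqlite3.connect(db_path, isolation_level=None)  # manage transactions manually
    try:
        # Take the write lock before reading, so another server cannot read
        # the same unclaimed row in between.
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT id FROM queue WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        conn.execute("UPDATE queue SET claimed_by = ? WHERE id = ?", (server_id, row[0]))
        conn.execute("COMMIT")
        return row[0]
    finally:
        conn.close()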
I'm writing a script in python which basically queries WMI and updates the information in a mysql database. One of those "write something you need" to learn to program exercises.
In case something breaks in the middle of the script (for example, the remote computer turns off), it's separated out into functions:
Query Some WMI data
Update that to the database
Query Other WMI data
Update that to the database
Is it better to open one mysql connection at the beginning and leave it open or close the connection after each update?
It seems as though one connection would use fewer resources. (Although I'm just learning, so this is a complete guess.) However, opening and closing the connection with each update seems more 'neat'. Functions would be more standalone, rather than depending on code outside the function.
"However, opening and closing the connection with each update seems more 'neat'. "
It's also a huge amount of overhead -- and there's no actual benefit.
Creating and disposing of connections is relatively expensive. More importantly, what's the actual reason? How does it improve, simplify, clarify?
Generally, most applications have one connection that they use from when they start to when they stop.
I don't think there is a "better" solution. It's too early to think about resources. And since WMI is quite slow (in comparison to a SQL connection), the DB is not an issue.
Just make it work. And then make it better.
The good thing about working with an open connection here is that the "natural" solution is to use objects and not just functions. So it will be a learning experience (in case you are learning Python and not MySQL).
Think for a moment about the following scenario:
for dataItem in dataSet:
    update(dataItem)
If you open and close your connection inside of the update function and your dataSet contains a thousand items then you will destroy the performance of your application and ruin any transactional capabilities.
A better way would be to open a connection and pass it to the update function. You could even have your update function call a connection manager of sorts. If you only intend to perform single updates periodically, then opening and closing your connection around each update function call is fine, but that approach is not great for performing bulk inserts or updates.
By passing the connection in, you will be able to use functions to encapsulate your data operations and still share a connection between them, as sketched below.
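A minimal sketch of that shape, assuming MySQLdb since the question targets MySQL (the table and connection details are made up): the connection is opened once in main() and handed to each update function.

import MySQLdb

def update_wmi_data(conn, rows):
    # The function receives an open connection instead of creating its own.
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO wmi_data (host, metric, value) VALUES (%s, %s, %s)", rows
    )
    cur.close()

def main():
    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="inventory")
    try:
        update_wmi_data(conn, [("pc1", "cpu", "10"), ("pc1", "mem", "2048")])
        conn.commit()  # one commit for the whole batch
    finally:
        conn.close()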
Useful clues in S.Lott's and Igal Serban's answers. I think you should first find out your actual requirements and code accordingly.
Just to mention a different strategy: some applications keep a pool of database (or whatever) connections and, whenever a transaction is needed, just pull one from that pool. It seems rather obvious that you need just one connection for this kind of application. But you can still keep a pool of one connection and apply the following:
Whenever a database transaction is needed, the connection is pulled from the pool and returned at the end.
(optional) The connection is expired (and replaced by a new one) after a certain amount of time.
(optional) The connection is expired after a certain amount of usage.
(optional) The pool can check (by sending an inexpensive query) whether the connection is alive before handing it over to the program.
This is somewhat in between single connection and connection per transaction strategies.
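A toy sketch of such a pool of one connection (the connect factory, expiry policy and liveness check are assumptions, not any particular library's API):

import time

class SingleConnectionPool:
    """Hands out one reusable connection, replacing it when it expires or dies."""

    def __init__(self, connect, max_age_seconds=3600):
        self._connect = connect            # factory, e.g. lambda: MySQLdb.connect(...)
        self._max_age = max_age_seconds
        self._conn = None
        self._created_at = 0.0

    def acquire(self):
        too_old = self._conn is not None and time.time() - self._created_at > self._max_age
        if self._conn is None or too_old or not self._is_alive(self._conn):
            if self._conn is not None:
                self._conn.close()
            self._conn = self._connect()   # expired or dead: replace with a new one
            self._created_at = time.time()
        return self._conn

    def _is_alive(self, conn):
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")        # inexpensive liveness check
            cur.close()
            return True
        except Exception:
            return False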