I am using a Postgres database and SQLAlchemy 1.4 to manage persistence in Python applications. In one of my use cases, I need to write an audit log of status. At a high level, the code looks like the pseudocode below. With this approach, I get an exception: "psycopg2.OperationalError: query_wait_timeout". I checked the pg_stat_activity view, and the query is in the state 'idle in transaction'. I am not sure why this happens, and I want to understand a better way to insert into the audit table in a transaction that is not part of the parent transaction.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create session
engine = create_engine(database_url)
session = sessionmaker(bind=engine)()
try:
    # Other logic
    # Insert into DB
    session.add(object)
    session.commit()
    # Other logic
except:
    # Add to audit log
    engine1 = create_engine(database_url)
    session1 = sessionmaker(bind=engine1)()
    session1.add(audit)
    session1.commit()
    session1.close()
finally:
    session.close()
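One possible way to keep the audit insert out of the parent transaction is sketched below. It assumes a single shared engine, rolls back the failed session before doing anything else, and writes the audit row through a second short-lived session. The Item and AuditLog models, the save_with_audit helper, and the temporary SQLite file are hypothetical stand-ins for the real models and the Postgres URL. The 'idle in transaction' state would be consistent with the first session still holding a transaction open while a second engine asks for another connection, but that is an assumption about the setup, not a confirmed diagnosis.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Item(Base):  # hypothetical business table
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False, unique=True)


class AuditLog(Base):  # hypothetical audit table
    __tablename__ = "audit_log"
    id = Column(Integer, primary_key=True)
    status = Column(String(200), nullable=False)


# One engine (and therefore one connection pool) for the whole process.
db_path = os.path.join(tempfile.mkdtemp(), "audit_demo.db")
engine = create_engine(f"sqlite:///{db_path}")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)


def save_with_audit(name):
    """Insert an Item; on failure, record an audit row in its own transaction."""
    session = Session()
    try:
        session.add(Item(name=name))
        session.commit()
        return True
    except SQLAlchemyError as exc:
        # End the failed parent transaction before touching the audit table.
        session.rollback()
        audit_session = Session()
        try:
            status = f"failed to save {name!r}: {type(exc).__name__}"
            audit_session.add(AuditLog(status=status))
            audit_session.commit()
        finally:
            audit_session.close()
        return False
    finally:
        session.close()
```

Calling save_with_audit twice with the same name makes the second insert violate the unique constraint, so the second call returns False and leaves one row in audit_log.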
We have data in a Snowflake cloud database that we would like to move into an Oracle database. As we would like to work toward refreshing the Oracle database regularly, I am trying to use SQLAlchemy to automate this.
I would like to do this using Core because my team is experienced with SQL, but I am the only one with Python experience. I think it would be easier to tweak the data pulls if we just pass SQL strings. Plus, the Snowflake database has some JSON columns that seem easier to parse using direct SQL, since I do not see a JSON type in the SnowflakeDialect.
I have established connections to both databases and am able to do select queries from both. I have also manually created the tables in our Oracle db so that the keys and datatypes match what I am pulling from Snowflake. When I try to insert, though, my Jupyter notebook just continuously says "Executing Cell" and hangs. Any thoughts on how to proceed or how to get the notebook to tell me where the hangup is?
from sqlalchemy import create_engine,pool,MetaData,text
from snowflake.sqlalchemy import URL
import pandas as pd
eng_sf = create_engine(URL(  # engine for Snowflake
    account='account',
    user='user',
    password='password',
    database='database',
    schema='schema',
    warehouse='warehouse',
    role='role',
    timezone='timezone',
))
eng_o = create_engine("oracle+cx_oracle://{}[{}]:{}@{}".format('user', 'proxy', 'password', 'database'), poolclass=pool.NullPool)  # engine for Oracle
meta_o = MetaData()
meta_o.reflect(bind=eng_o)
person_o = meta_o.tables['bb_lms_person']  # other oracle tables follow this example
meta_sf = MetaData()
meta_sf.reflect(bind=eng_sf,only=['person']) # other snowflake tables as well, but for simplicity, let's look at one
person_sf = meta_sf.tables['person']
person_query = """
SELECT ID
,EMAIL
,STAGE:student_id::STRING as STUDENT_ID
,ROW_INSERTED_TIME
,ROW_UPDATED_TIME
,ROW_DELETED_TIME
FROM cdm_lms.PERSON
"""
with eng_sf.begin() as connection:
    result = connection.execute(text(person_query)).fetchall()  # this snippet runs and returns result as expected
with eng_o.begin() as connection:
    connection.execute(person_o.insert(), result)  # this is a coinflip, sometimes it runs, sometimes it just hangs 5ever
eng_sf.dispose()
eng_o.dispose()
I've checked the typical offenders. The keys for both person_o and the result are all lowercase and match. Any guidance would be appreciated.
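When a bulk insert like this hangs, it can help to pass plain dictionaries and send them in smaller batches so that progress is visible. The helpers below are a minimal sketch of that idea; rows_to_param_dicts and chunked are hypothetical names, the sample rows are made up, and the batch size is an arbitrary assumption.

```python
from itertools import islice


def rows_to_param_dicts(column_names, rows):
    """Turn row tuples into dicts keyed by lowercase column name."""
    keys = [name.lower() for name in column_names]
    return [dict(zip(keys, row)) for row in rows]


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Made-up data shaped like the Snowflake result.
columns = ["ID", "EMAIL", "STUDENT_ID"]
rows = [
    (1, "a@example.com", "s1"),
    (2, "b@example.com", "s2"),
    (3, "c@example.com", "s3"),
]
params = rows_to_param_dicts(columns, rows)
batches = list(chunked(params, 2))  # two batches: 2 rows, then 1 row
```

Each batch could then be passed to connection.execute(person_o.insert(), batch) inside the eng_o.begin() block, with a print after each batch to show how far the load got before stalling.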
Use the reflected metadata for the table. The fTable_Stage update() and insert() constructs are fluent (chainable) functions, and you assign values through keyword arguments to values(). This is safe because only column names that exist in the table metadata can be used as keywords. I am updating three fields: LateProbabilityDNN, Sentiment_Polarity, and Sentiment_Subjectivity.
from sqlalchemy import MetaData, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
connection = engine.connect()
metadata = MetaData()
Session = sessionmaker(bind=engine)
session = Session()
fTable_Stage = Table('fTable_Stage', metadata, autoload=True, autoload_with=engine)
stmt = fTable_Stage.update().where(fTable_Stage.c.KeyID == keyID).values(
    LateProbabilityDNN=round(float(late_proba), 2),
    Sentiment_Polarity=round(my_valance.sentiment.polarity, 2),
    Sentiment_Subjectivity=round(my_valance.sentiment.subjectivity, 2),
)
connection.execute(stmt)
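The same pattern can be tried end to end against a throwaway database. The sketch below uses an in-memory SQLite engine and made-up values in place of the SQL Server connection and the model outputs, and it runs the UPDATE inside engine.begin() so that the change is committed when the block exits. It assumes SQLAlchemy 1.4 or later, where Table(..., autoload_with=engine) is enough to trigger reflection.

```python
from sqlalchemy import Column, Float, Integer, MetaData, Table, create_engine, select

engine = create_engine("sqlite://")

# Create and seed a small stand-in for fTable_Stage.
setup_meta = MetaData()
stage_setup = Table(
    "fTable_Stage",
    setup_meta,
    Column("KeyID", Integer, primary_key=True),
    Column("LateProbabilityDNN", Float),
    Column("Sentiment_Polarity", Float),
    Column("Sentiment_Subjectivity", Float),
)
setup_meta.create_all(engine)
with engine.begin() as connection:
    connection.execute(stage_setup.insert(), [{"KeyID": 1}, {"KeyID": 2}])

# Reflect the table definition from the database, as in the answer.
metadata = MetaData()
fTable_Stage = Table("fTable_Stage", metadata, autoload_with=engine)

key_id = 1  # made-up key
stmt = (
    fTable_Stage.update()
    .where(fTable_Stage.c.KeyID == key_id)
    .values(
        LateProbabilityDNN=0.42,
        Sentiment_Polarity=0.1,
        Sentiment_Subjectivity=0.75,
    )
)

# engine.begin() commits on success and rolls back on error.
with engine.begin() as connection:
    updated_rows = connection.execute(stmt).rowcount

with engine.connect() as connection:
    row = connection.execute(
        select(fTable_Stage.c.LateProbabilityDNN).where(fTable_Stage.c.KeyID == key_id)
    ).one()
```

Passing a keyword to values() that is not a column of the reflected table raises an error when the statement is compiled, which is the safety property described above.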
My application does not update the database; all queries are SELECT statements. I'm struggling with how best to handle direct changes to the database (i.e. opening MySQL Workbench and changing data there). Without session.commit(), my Flask application returns stale data.
My solution right now is to call session.commit() as the first line of each Flask endpoint, but I feel this is the wrong way to handle it.
Session creation at start of app:
engine = db.create_engine('mysql+pymysql://...')
connection = engine.connect()
metadata = db.MetaData()
Base = declarative_base()
Session = sessionmaker(autoflush=True)
Session.configure(bind=engine)
session = Session()
Use session.expire_all() to mark all session data as expired. The next time you access an attribute, it is fetched from the database again.
session.expire(object) does the same, but for a single object.
session.refresh(some_object) expires and immediately reloads all of that object's data.
A nice article about this can be found here: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
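The difference between these calls can be seen in a small experiment. The sketch below uses a temporary SQLite file and a hypothetical Person model; a row is changed through a separate connection, and the session only sees the change after the object is refreshed or expired. Note that under MySQL's default REPEATABLE READ isolation, expiring objects is not enough on its own: the session's open transaction also has to end (commit or rollback) before a new snapshot is read.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Person(Base):  # hypothetical model
    __tablename__ = "persons"
    id = Column(Integer, primary_key=True)
    name = Column(String(30))


db_path = os.path.join(tempfile.mkdtemp(), "expire_demo.db")
engine = create_engine(f"sqlite:///{db_path}")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

# Seed one row and commit it.
seed = Session()
seed.add(Person(id=1, name="Marie"))
seed.commit()
seed.close()

session = Session()
person = session.get(Person, 1)
name_loaded = person.name

# Change the row outside the session, as an external tool would.
with engine.begin() as connection:
    connection.execute(text("UPDATE persons SET name = 'Paul' WHERE id = 1"))

name_before_refresh = person.name  # still the value cached on the object
session.refresh(person)            # re-SELECT this one object now
name_after_refresh = person.name

session.expire_all()               # loaded objects reload on next attribute access
name_after_expire = person.name
session.close()
```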
I tried to totally separate Flask and SQLAlchemy using this method, but Flask still seems to be able to detect my database and start a new transaction at the beginning of each request.
The db.py file creates a new session and defines a simple model of a table:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, String
engine = create_engine("mysql://web:kingtezdu@localhost/web_unique")
print("creating new session")
db_session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()
# define model of 'persons' table
class Person(Base):
    __tablename__ = "persons"
    name = Column(String(30), primary_key=True)

    def __repr__(self):
        return "Person(\"{0.name}\")".format(self)
# create table
Base.metadata.create_all(bind=engine)
And app.py, a simple Flask application using the SQLAlchemy session and model:
from flask import Flask
app = Flask(__name__)
# importing new session
from db import db_session, Person

# registering for app teardown to remove session
@app.teardown_appcontext
def shutdown_session(exception=None):
    db_session.remove()

@app.route("/query")
def query():
    # query all persons in the database
    all_persons = Person.query.all()
    print(all_persons)
    return ""  # we use the console output

if __name__ == "__main__":
    app.run(debug=True)
Let's run this:
$ python app.py
creating new session
* Running on http://127.0.0.1:5000/
* Restarting with reloader
creating new session
Weirdly enough, it runs db.py twice (the debug reloader restarts the process), but we can ignore this. Let's access the webpage /query:
[]
127.0.0.1 - - [23/Dec/2015 18:20:14] "GET /query HTTP/1.1" 200 -
We can see that our request was answered, though we only use the console output. There is no Person in the database yet, let's add one:
mysql> INSERT INTO persons (name) VALUES ("Marie");
Query OK, 1 row affected (0.11 sec)
Marie is part of the database now so we reload the webpage:
[Person("Marie")]
127.0.0.1 - - [23/Dec/2015 18:24:48] "GET /query HTTP/1.1" 200 -
As you can see, the session already knows about Marie. Flask didn't create a new session, so a new transaction must have been started. Contrast this with the plain Python example below to see the difference.
My question is how Flask is able to start a new transaction at the beginning of each request. Flask shouldn't know about the database, but it seems to be able to change something about its behaviour.
In case you don't know what a SQLAlchemy transaction is, read this paragraph extracted from Managing Transactions:
When the transactional state is completed after a rollback or commit,
the Session releases all Transaction and Connection resources, and
goes back to the “begin” state, which will again invoke new Connection
and Transaction objects as new requests to emit SQL statements are
received.
So a transaction is ended by a commit, which causes a new connection to be set up, which in turn makes the session read the database again. In practice, this means that you have to commit when you want to see changes made to the database.
First, in interactive Python mode:
>>> from db import db_session, Person
creating new session
>>> Person.query.all()
[]
Switch over to MySQL and insert a new Person:
mysql> INSERT INTO persons (name) VALUES ("Paul");
Query OK, 1 row affected (0.03 sec)
Finally try to load Paul into our session:
>>> Person.query.all()
[]
>>> db_session.commit()
>>> Person.query.all()
[Person("Paul")]
I think the issue here is that scoped_session somewhat hides what happens to the actual sessions in use. When your teardown handler
# registering for app teardown to remove session
@app.teardown_appcontext
def shutdown_session(exception=None):
    db_session.remove()
runs at the end of each request, you call db_session.remove() which disposes of the session used in that particular request along with any transaction context. See http://docs.sqlalchemy.org/en/latest/orm/contextual.html for the details, particularly
The scoped_session.remove() method first calls Session.close() on the
current Session, which has the effect of releasing any
connection/transactional resources owned by the Session first, then
discarding the Session itself. “Releasing” here means that connections
are returned to their connection pool and any transactional state is
rolled back, ultimately using the rollback() method of the underlying
DBAPI connection.
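The effect of remove() on the registry can be checked directly. The following sketch, which assumes an in-memory SQLite engine in place of the MySQL one, shows that repeated calls to the scoped_session return the same Session until remove() is called, after which a fresh Session (and therefore a fresh transaction) is created on next use.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")
db_session = scoped_session(sessionmaker(bind=engine))

first = db_session()       # the registry creates a Session for this thread
same_again = db_session()  # same thread, so the same Session comes back

db_session.remove()        # closes and discards the current Session

second = db_session()      # a new Session is created on demand

reused_before_remove = first is same_again
replaced_after_remove = second is not first

db_session.remove()        # clean up the second Session as well
```

In the Flask app, the teardown handler plays the role of the remove() call here, so every request starts with a Session that has no prior transactional state.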
Something peculiar I've noticed is that changes committed to the DB outside of the session (such as ones made in MySQL Workbench) are not recognised in the SQLAlchemy session. I have to close the session and open a new one for SQLAlchemy to recognise them.
For example, a row I deleted manually is still fetched by SQLAlchemy.
This is how I initialise the session:
engine = create_engine('mysql://{}:{}@{}/{}'.format(username, password, host, schema), pool_recycle=3600)
Session = sessionmaker(bind=engine)
session = Session()
metadata = MetaData()
How can I get SQLAlchemy to recognise them?
My SQLAlchemy version is 0.9.4 and my MySQL version is 5.5.34. We use only SQLAlchemy's Core (no ORM).
To be able to read data committed by other transactions, you'll need to set the transaction isolation level to READ COMMITTED. For SQLAlchemy and MySQL:
To set isolation level using create_engine():
engine = create_engine(
    "mysql://scott:tiger@localhost/test",
    isolation_level="READ COMMITTED")
To set using per-connection execution options:
connection = engine.connect()
connection = connection.execution_options(
    isolation_level="READ COMMITTED")
DBSession = sessionmaker(bind=self.engine)
def add_person(name):
    s = DBSession()
    s.add(Person(name=name))
    s.commit()
Every time I run add_person(), another connection is created to my PostgreSQL DB.
Looking at:
SELECT count(*) FROM pg_stat_activity;
I see the count going up, until I get a "remaining connection slots are reserved for non-replication superuser connections" error.
How do I kill those connections? Am I wrong to open a new session every time I want to add a Person record?
In general, you should keep your session object (created from the DBSession factory) separate from any functions that make changes to the database. So in your case you might try something like this instead:
DBSession = sessionmaker(bind=self.engine)
session = DBSession()  # create your session outside of functions that will modify database
def add_person(name):
    session.add(Person(name=name))
    session.commit()
Now you will not get new connections every time you add a person to the database.
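An alternative that keeps one session per call is to make sure each session is closed, which returns its connection to the pool. The sketch below assumes SQLAlchemy 1.4 or later (for using the session as a context manager), a temporary SQLite file instead of PostgreSQL, an explicit QueuePool so that checked-out connections can be counted, and a hypothetical Person model.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import QueuePool

Base = declarative_base()


class Person(Base):  # hypothetical model
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


# Create the engine once; every engine owns its own connection pool.
db_path = os.path.join(tempfile.mkdtemp(), "pool_demo.db")
engine = create_engine(f"sqlite:///{db_path}", poolclass=QueuePool)
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)


def add_person(name):
    # The with block closes the session on exit, even if commit() raises.
    with DBSession() as s:
        s.add(Person(name=name))
        s.commit()


for person_name in ["Ann", "Bob", "Cy"]:
    add_person(person_name)

# Connections currently checked out of the pool.
checked_out = engine.pool.checkedout()

with DBSession() as s:
    person_count = s.query(Person).count()
```

Closing the session explicitly means the connection is released without relying on garbage collection. If the engine itself is created repeatedly, each engine brings its own pool, so creating it once per process matters as well.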