SQLAlchemy: keep alive or reconnect sessions after timeout - python

I am trying to understand how best to keep alive or reconnect SQLAlchemy DB Sessions after they go stale due to MySQL wait_timeout expiring.
I am setting a low wait timeout to illustrate the issue below:
mysql> SET @@GLOBAL.wait_timeout = 2;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout | 2 |
+---------------+-------+
1 row in set (0.01 sec)
Here's the error that I am seeing
In [1]: sess.query(SomeTable).first()
Out[1]: <SomeTable(id=1)>
In [2]: import time
In [3]: time.sleep(4)
In [4]: sess.query(SomeTable).first()
OperationalError: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')
Here's one way to prevent the error
In [1]: sess.query(SomeTable).first()
Out[1]: <SomeTable(id=1)>
In [2]: sess.close()
In [3]: import time
In [4]: time.sleep(4)
In [5]: sess.query(SomeTable).first()
Out[5]: <SomeTable(id=1)>
Here's how I am creating the db session
Session = sessionmaker(binds={
    DeclarativeBase: create_engine(
        db_config.url,
        pool_recycle=3600,
        pool_pre_ping=True,
    ),
})
sess = Session()
A few questions:
The pool_pre_ping argument seems to make no difference to the Session object. This seems to be because SQLAlchemy runs the connection test only when a connection is checked out from the pool. I am guessing that the session checks out a connection soon after it is created, but does not verify that the connection is still alive before every query. Is my understanding correct?
The only way to prevent the session timeout seems to be to call sess.close() soon after the query is fired. Calling sess.close() seems to release the connection, so the next time I call sess.query() a new connection is checked out from the pool. Is there a better way to do this? (A sketch of this pattern follows these questions.)
How could I keep the session that is bound to the ORM instances returned from my query alive for longer? For example: obj = db.query(SomeTable).first(); time.sleep(10); obj.access_some_db_relationship
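One pattern that follows from the pre-ping behaviour, sketched below (a minimal sketch, not the code above; it assumes the sessionmaker is assigned to a Session factory and that SomeTable is a mapped class): scope each unit of work to a short-lived session, so the connection goes back to the pool on close and pool_pre_ping can validate it on the next checkout.
# Minimal sketch: session-per-unit-of-work so the pooled connection is
# released on exit and re-validated (pre-pinged) on the next checkout.
from contextlib import contextmanager

@contextmanager
def session_scope(session_factory):
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()  # returns the connection to the pool

# Usage, assuming Session = sessionmaker(binds={...}) as configured above:
# with session_scope(Session) as sess:
#     obj = sess.query(SomeTable).first()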

Related

Blob storage trigger timing out anywhere from ~10 seconds to a couple minutes

I'm getting quite a few timeouts as my blob storage trigger is running. It seems to time out whenever I'm inserting values into an Azure SQL DB. I have the functionTimeout parameter set in host.json to "functionTimeout": "00:40:00", yet I'm seeing timeouts happen within a couple of minutes. Why would this be the case? My function app is on the ElasticPremium pricing tier.
System.TimeoutException message:
Exception while executing function: Functions.BlobTrigger2 The operation has timed out.
My connection to the db (I close it at the end of the script):
# urllib.parse.quote_plus for python 3
params = urllib.parse.quote_plus(fr'Driver={DRIVER};Server=tcp:{SERVER_NAME},1433;Database=newTestdb;Uid={USER_NAME};Pwd={PASSWORD};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=0;')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine_azure = create_engine(conn_str,echo=True)
conn = engine_azure.connect()
This is the line of code that is run before the timeout happens (Inserting to db):
processed_df.to_sql(blob_name_file.lower(), conn, if_exists = 'append', index=False, chunksize=500)
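One thing worth trying here, sketched below (an assumption, not a confirmed fix; it reuses conn_str, processed_df and blob_name_file from the snippets above): enable pyodbc's fast_executemany on the engine and run the insert inside an explicit transaction, so each 500-row chunk goes out as one fast batch instead of row-by-row inserts that can run long enough to hit the timeout.
# Sketch under the assumptions above: batched, transaction-scoped insert.
from sqlalchemy import create_engine

engine_azure = create_engine(conn_str, echo=True, fast_executemany=True)

with engine_azure.begin() as conn:  # commits on success, rolls back on error
    processed_df.to_sql(
        blob_name_file.lower(),
        conn,
        if_exists='append',
        index=False,
        chunksize=500,
    )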

How to avoid the QueuePool limit error using Flask-SQLAlchemy?

I'm developing a webapp using Flask-SQLAlchemy and a Postgres DB. I have a dropdown list in my webpage which is populated from a select to the DB, and after selecting different values a couple of times I get "sqlalchemy.exc.TimeoutError:".
My package's versions are:
Flask-SQLAlchemy==2.5.1
psycopg2-binary==2.8.6
SQLAlchemy==1.4.15
My parameters for the DB connection are set as:
app.config['SQLALCHEMY_POOL_SIZE'] = 20
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 20
app.config['SQLALCHEMY_POOL_TIMEOUT'] = 5
app.config['SQLALCHEMY_POOL_RECYCLE'] = 10
The error I'm getting is:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
After changing the value of the 'SQLALCHEMY_MAX_OVERFLOW' from 20 to 100 I get the following error after some value changes on the dropdown list.
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
Every time a new value is selected from the dropdown list, four queries are triggered to the database and they are used to populate four corresponding tables in my HTML with the results from that query.
I have a 'db.session.commit()' statement after every single query to the DB, but even though I have it, I get this error after a few value changes to my dropdown list.
I know that I should be looking to correctly manage my connection sessions, but I'm struggling with this. I thought about setting the pool timeout to 5s, instead of the default 30s, in hopes that the session would be closed and returned to the pool faster, but it seems it didn't help.
As a suggestion from @snakecharmerb, I checked the output of:
select * from pg_stat_activity;
I ran the webapp for 10 different values before it showed me an error, which means all the 20+20 sessions were used and are left in an 'idle in transaction' state.
Does anybody have any suggestion on what I should change or look for?
I found a solution to the issue I was facing, in another post from StackOverFlow.
When you assign your flask app to your db variable, on top of indicating which Flask app it should use, you can also pass on session options, as below:
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app, session_options={'autocommit': True})
The usage of 'autocommit' solved my issue.
Now, as suggested, I'm using:
app.config['SQLALCHEMY_POOL_SIZE'] = 1
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0
Now everything is working as it should.
The original post which helped me is: Autocommit in Flask-SQLAlchemy
@snakecharmerb, @jorzel, @J_H -> Thanks for the help!
You are leaking connections. A little counterintuitively, you may find you obtain better results with a lower pool limit. A given Python thread only needs a single pooled connection for the simple single-database queries you're doing. Setting the limit to 1, with 0 overflow, will cause you to notice a leaked connection earlier. This makes it easier to pin the blame on the source code that leaked it. As it stands, you have lots of code, and the error is deferred until after many queries have been issued, making it harder to reason about system behavior.
I will assume you're using SQLAlchemy 1.4.29.
To avoid leaking, try using this:
from contextlib import closing
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(some_url, future=True, pool_size=1, max_overflow=0)
get_session = scoped_session(sessionmaker(bind=engine))
...

with closing(get_session()) as session:
    try:
        sql = """yada yada"""
        rows = session.execute(text(sql)).fetchall()
        session.commit()
        ...
        # Do stuff with result rows.
        ...
    except Exception:
        session.rollback()
I am using flask-restful.
So when I got this error -> QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
I found out in the logs that my checked-out connections were not closing. I found this using logger.info(db_session.get_bind().pool.status())
def custom_decorator(error_message, db_session):
    def api_decorator(func):
        def api_request(self, *args, **kwargs):
            try:
                response = func(self)
                db_session.commit()
                return response
            except Exception as err:
                db_session.rollback()
                logger.error(error_message.format(err))
                return error_response(
                    message="Internal Server Error",
                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
                )
            finally:
                db_session.close()
        return api_request
    return api_decorator
So I had to create this decorator, which handles closing the db_session automatically. Using it, I am not getting any active checked-out connections.
You can use the decorator in your functions as follows:
@custom_decorator("blah", db_session)
def example():
    "some code"

SQLAlchemy engine.execute() leaves a connection to the database in sleeping status

I am using SQL server database. I've noticed that when executing the code below, I get a connection to the database left over in 'sleeping' state with an 'AWAITING COMMAND' status.
engine = create_engine(url, connect_args={'autocommit': True})
res = engine.execute(f"CREATE DATABASE my_database")
res.close()
engine.dispose()
With a breakpoint after the engine.dispose() call, I can see an entry for the connection in the output of EXEC sp_who2 on the server. This entry only disappears after I kill the process.
Probably Connection Pooling
Connection Pooling
A connection pool is a standard technique used to
maintain long running connections in memory for efficient re-use, as
well as to provide management for the total number of connections an
application might use simultaneously.
Particularly for server-side web applications, a connection pool is
the standard way to maintain a “pool” of active database connections
in memory which are reused across requests.
SQLAlchemy includes several connection pool implementations which
integrate with the Engine. They can also be used directly for
applications that want to add pooling to an otherwise plain DBAPI
approach.
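If the intent is for the connection to go away as soon as the work is done (so nothing is left sleeping in sp_who2), one option is to turn pooling off for that engine. A sketch, assuming the same url and autocommit setup as in the question:
# Sketch: NullPool closes the DBAPI connection as soon as it is released,
# instead of keeping it checked in and sleeping on the server.
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(url, connect_args={'autocommit': True}, poolclass=NullPool)
with engine.connect() as conn:
    conn.execute("CREATE DATABASE my_database")
engine.dispose()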
I'm not sure if this is what gets in the way of my teardown method, which drops the database.
To drop a database that's possibly in use try:
USE master;
ALTER DATABASE mydb SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE mydb;
You basically want to kill all the connections. You could use something like this:
For MS SQL Server 2012 and above
USE [master];
DECLARE @kill varchar(8000) = '';
SELECT @kill = @kill + 'kill ' + CONVERT(varchar(5), session_id) + ';'
FROM sys.dm_exec_sessions
WHERE database_id = db_id('MyDB')
EXEC(@kill);
For MS SQL Server 2000, 2005, 2008
USE master;
DECLARE @kill varchar(8000); SET @kill = '';
SELECT @kill = @kill + 'kill ' + CONVERT(varchar(5), spid) + ';'
FROM master..sysprocesses
WHERE dbid = db_id('MyDB')
EXEC(@kill);
Or something more script-like:
DECLARE @pid SMALLINT, @sql NVARCHAR(100)
DECLARE curs CURSOR LOCAL FORWARD_ONLY FOR
    SELECT DISTINCT spid FROM master..sysprocesses WHERE dbid = DB_ID(@dbname)
OPEN curs
FETCH NEXT FROM curs INTO @pid
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'KILL ' + CONVERT(VARCHAR, @pid)
    EXEC(@sql)
    FETCH NEXT FROM curs INTO @pid
END
CLOSE curs
DEALLOCATE curs
More can be found here:
Script to kill all connections to a database (More than RESTRICTED_USER ROLLBACK)

SQLAlchemy + MariaDB: MySQL server has gone away

I know this question has been asked before but I cannot get it to work. I'm writing an app to scrape some stock information on the web. The scraping part takes about 70 minutes to complete, where I pass my SQLAlchemy objects into a function.
After the function completes it is supposed to insert the data into the database, and this is when I get the error. I guess MariaDB has closed the connection by then?
Code:
with get_session() as session:
    stocks = session.query(Stock).filter(or_(Stock.market.like('%Large%'), Stock.market.like("%First North%"))).all()
    for stock, path in chrome.download_stock(stocks=stocks):  # This function takes about 70 minutes, not using any session in here, only Stock objects
        # Starting to insert and get the error on the first insert
Error:
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
get_session() function:
@contextmanager
def get_session(debug=False):
    engine = create_engine('mysql://root:pw@IP/DB', echo=debug, encoding='utf8', pool_recycle=300, pool_pre_ping=True)
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()
I have tried to decrease pool_recycle to 300 seconds and add the new pool_pre_ping that came with SQLAlchemy 1.2, but nothing works. Any ideas? Do you think it is in the code or on the server side?
MariaDB: 10.2.14
SQLAlchemy: 1.2.7
EDIT:
Started to investigate the MariaDB wait_timeout because of FrankerZ's comment, with some interesting results. First from the mysql command line:
SHOW SESSION VARIABLES LIKE 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout | 28800 |
+---------------+-------+
1 row in set (0.00 sec)
Then through Python / SQLAlchemy:
print(session.execute("SHOW SESSION VARIABLES LIKE 'wait_timeout';").first())
('wait_timeout', '600')
Any explanation for this? Should be the problem right?
There are at least 2 "wait_timeouts"; it is quite confusing.
Right after connecting, do
SET SESSION wait_timeout=12000
That will give you 200 minutes.
Also make sure SQLAlchemy does not have a timeout. (PHP, for example, does.)
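A minimal sketch of applying that from the SQLAlchemy side (it assumes the engine created inside get_session() is accessible as engine; the 12000-second value is the one suggested above): register a connect event so every new DBAPI connection gets the longer session timeout.
# Sketch: raise the per-session wait_timeout on every new DBAPI connection
# the pool creates, so long gaps between queries do not drop it.
from sqlalchemy import event

@event.listens_for(engine, "connect")
def set_wait_timeout(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("SET SESSION wait_timeout = 12000")
    cursor.close()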

Operational error 2055 while exporting pandas dataframe to MySQL using SQLAlchemy

I am using SQLAlchemy for the first time to export around 6 million records to MySQL. Following is the error I receive:
OperationalError: (mysql.connector.errors.OperationalError) 2055: Lost connection to MySQL server at '127.0.0.1:3306', system error: 10053 An established connection was aborted by the software in your host machine
Code:
import pandas as pd
import sqlalchemy
df=pd.read_excel(r"C:\Users\mazin\1-601.xlsx")
database_username = 'root'
database_password = 'aUtO1115'
database_ip = '127.0.0.1'
database_name = 'patenting in psis'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password,
        database_ip, database_name),
    pool_recycle=1, pool_timeout=30).connect()
df.to_sql(con=database_connection, name='sample', if_exists='replace')
database_connection.close()
Note: I do not get the error if I export around 100 records. After referring to similar posts, I have added the pool_recycle and pool_timeout parameters but the error still persists.
The problem is that you're trying to insert 6 million rows as one chunk, and that takes time. With your current config, pool_recycle is set to 1 second, meaning the connection is recycled after 1 second, which for sure is not enough time to insert 6 million rows. My suggestion is the following:
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username,
        database_password,
        database_ip, database_name
    ), pool_recycle=3600, pool_size=5).connect()
df.to_sql(
    con=database_connection,
    name='sample',
    if_exists='replace',
    chunksize=1000
)
This will set up a pool of 5 connections with a recycle time of 1 hour, and the second statement will insert 1000 rows at a time (instead of all rows at once). You can experiment with the values to achieve the best performance.
