SQLAlchemy + MariaDB: MySQL server has gone away - python

I know this question has been asked before, but I cannot get it to work. I'm writing an app to scrape some stock information from the web. The scraping part takes about 70 minutes to complete, and I pass my SQLAlchemy objects into the function that does it.
After the function completes, it is supposed to insert the data into the database, and this is when I get the error. I guess MariaDB has closed the connection by then?
Code:
with get_session() as session:
    stocks = session.query(Stock).filter(or_(Stock.market.like('%Large%'), Stock.market.like("%First North%"))).all()
    for stock, path in chrome.download_stock(stocks=stocks):  # This function takes about 70 minutes, not using any session in here, only Stock objects
        # Starting to insert and get the error on the first insert
        ...
Error:
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
get_session() function:
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

@contextmanager
def get_session(debug=False):
    engine = create_engine('mysql://root:pw@IP/DB', echo=debug, encoding='utf8', pool_recycle=300, pool_pre_ping=True)
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()
I have tried decreasing pool_recycle to 300 seconds and adding the new pool_pre_ping option that came with SQLAlchemy 1.2, but nothing works. Any ideas? Do you think the problem is in the code or on the server side?
MariaDB: 10.2.14
SQLAlchemy: 1.2.7
EDIT:
I started to investigate MariaDB's wait_timeout because of FrankerZ's comment, with some interesting results. First, from the mysql command line:
SHOW SESSION VARIABLES LIKE 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout | 28800 |
+---------------+-------+
1 row in set (0.00 sec)
Then through Python / SQLAlchemy:
print(session.execute("SHOW SESSION VARIABLES LIKE 'wait_timeout';").first())
('wait_timeout', '600')
Is there any explanation for this? This should be the problem, right?

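There are at least two "wait_timeouts": a GLOBAL value and a per-connection SESSION value. It is quite confusing. An interactive client such as the mysql command line initializes its session value from interactive_timeout, while other clients start from the global wait_timeout, which is why the two outputs above can differ.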
Right after connecting, do
SET SESSION wait_timeout=12000
That will give you 200 minutes.
Also make sure SQLAlchemy does not have a timeout. (PHP, for example, does.)
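A minimal sketch of running that statement on every new connection, assuming SQLAlchemy's pool "connect" event is used as the hook. The helper names make_on_connect and attach_wait_timeout are hypothetical, and 12000 is simply the value suggested above.

```python
def make_on_connect(seconds):
    """Build a handler that sets the session wait_timeout on a new DB-API connection."""
    statement = "SET SESSION wait_timeout=%d" % int(seconds)

    def on_connect(dbapi_connection, connection_record):
        # Called once per physical connection, right after it is opened.
        cursor = dbapi_connection.cursor()
        try:
            cursor.execute(statement)
        finally:
            cursor.close()

    return on_connect


def attach_wait_timeout(engine, seconds=12000):
    """Register the handler on an existing SQLAlchemy engine (hypothetical helper)."""
    from sqlalchemy import event  # imported lazily; the handler itself needs no SQLAlchemy

    event.listen(engine, "connect", make_on_connect(seconds))
```

Calling attach_wait_timeout(engine) once, right after create_engine(), should be enough, since the event fires for every connection the pool opens afterwards. Raising the timeout only hides the symptom, though; not holding a session open across the 70-minute scrape avoids the idle connection altogether.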

Related

Sqlalchemy keep alive or reconnect Sessions after timeout

I am trying to understand how best to keep alive or reconnect SQLAlchemy DB Sessions after they go stale due to MySQL wait_timeout expiring.
I am setting a low wait timeout to illustrate the issue below:
mysql> SET @@GLOBAL.wait_timeout = 2;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout | 2 |
+---------------+-------+
1 row in set (0.01 sec)
Here's the error that I am seeing
In [1]: sess.query(SomeTable).first()
Out[1]: <SomeTable(id=1)>
In [2]: import time
In [3]: time.sleep(4)
In [4]: sess.query(SomeTable).first()
OperationalError: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')
Here's one way to prevent the error
In [1]: sess.query(SomeTable).first()
Out[1]: <SomeTable(id=1)>
In [2]: sess.close()
In [3]: import time
In [4]: time.sleep(4)
In [5]: sess.query(SomeTable).first()
Out[5]: <SomeTable(id=1)>
Here's how I am creating the db session
sess = sessionmaker(binds={
    DeclarativeBase: create_engine(
        db_config.url,
        pool_recycle=3600,
        pool_pre_ping=True,
    )},
)()  # call the factory to get the Session instance used above
A few questions:
The pool_pre_ping argument seems to make no difference to the Session object. This seems to be because SQLAlchemy runs the connection test only when a connection is checked out from the pool. I am guessing that the session object checks out a connection soon after it is created, but does not check whether the connection is alive for every query. Is my understanding correct?
The only way to prevent the timeout seems to be to call sess.close() soon after the query is fired. Calling sess.close() seems to reset the connection, so the next time I call sess.query a new connection is checked out from the pool. Is there a better way to do this?
How could I keep the session that is bound to the ORM instances returned from my query alive for longer? For example: obj = db.query(SomeTable).first(); time.sleep(10); obj.access_some_db_relationship
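Since the pre-ping test happens at checkout and a Session holds on to its connection for the length of its transaction, one common approach is to keep each session short and scoped to a single unit of work. A minimal sketch, assuming session_factory is something like sessionmaker(bind=engine); the name session_scope is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Open a session for one unit of work, then release its connection."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        # close() returns the connection to the pool; the next scope checks
        # one out again, which is the point where pool_pre_ping tests it.
        session.close()
```

Usage would look like "with session_scope(Session) as s: row = s.query(SomeTable).first()". Relationships needed after the pause are probably best loaded inside the scope, or re-queried in a new scope, rather than lazily loaded from a session that has been idle.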

How to avoid the QueuePool limit error using Flask-SQLAlchemy?

I'm developing a web app using Flask-SQLAlchemy and a PostgreSQL database. A dropdown list on my web page is populated by a SELECT against the database, and after selecting different values a couple of times I get "sqlalchemy.exc.TimeoutError".
My package's versions are:
Flask-SQLAlchemy==2.5.1
psycopg2-binary==2.8.6
SQLAlchemy==1.4.15
My parameters for the DB connection are set as:
app.config['SQLALCHEMY_POOL_SIZE'] = 20
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 20
app.config['SQLALCHEMY_POOL_TIMEOUT'] = 5
app.config['SQLALCHEMY_POOL_RECYCLE'] = 10
The error I'm getting is:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
After changing the value of the 'SQLALCHEMY_MAX_OVERFLOW' from 20 to 100 I get the following error after some value changes on the dropdown list.
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
Every time a new value is selected from the dropdown list, four queries are triggered to the database and they are used to populate four corresponding tables in my HTML with the results from that query.
I have a 'db.session.commit()' statement after every single query to the DB, but even though I have it, I get this error after a few value changes to my dropdown list.
I know that I should be managing my sessions and connections correctly, but I'm struggling with this. I tried setting the pool timeout to 5s instead of the default 30s, hoping that the session would be closed and returned to the pool faster, but it didn't help.
Following a suggestion from @snakecharmerb, I checked the output of:
select * from pg_stat_activity;
I ran the web app for 10 different values before it showed me an error, which means all of the 20+20 connections were used and are left in an 'idle in transaction' state.
Does anybody have any suggestion on what I should change or look for?

I found a solution to the issue I was facing in another post on Stack Overflow.
When you assign your flask app to your db variable, on top of indicating which Flask app it should use, you can also pass on session options, as below:
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app, session_options={'autocommit': True})
The usage of 'autocommit' solved my issue.
Now, as suggested, I'm using:
app.config['SQLALCHEMY_POOL_SIZE'] = 1
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0
Now everything is working as it should.
The original post which helped me is: Autocommit in Flask-SQLAlchemy
@snakecharmerb, @jorzel, @J_H -> Thanks for the help!

You are leaking connections.
A little counterintuitively,
you may find you obtain better results with a lower pool limit.
A given python thread only needs a single pooled connection,
for the simple single-database queries you're doing.
Setting the limit to 1, with 0 overflow,
will cause you to notice a leaked connection earlier.
This makes it easier to pin the blame on the source code that leaked it.
As it stands, you have lots of code, and the error is deferred
until after many queries have been issued,
making it harder to reason about system behavior.
I will assume you're using sqlalchemy 1.4.29.
To avoid leaking, try using this:
from contextlib import closing
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(some_url, future=True, pool_size=1, max_overflow=0)
get_session = scoped_session(sessionmaker(bind=engine))
...
with closing(get_session()) as session:
    try:
        sql = """yada yada"""
        rows = session.execute(text(sql)).fetchall()
        session.commit()
        ...
        # Do stuff with result rows.
        ...
    except Exception:
        session.rollback()
I am using flask-restful.
I got this same error: QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
I found out from the logs that my checked-out connections were not being closed. I found this using logger.info(db_session.get_bind().pool.status())
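from functools import wraps
from http import HTTPStatus

# logger and error_response come from the surrounding application
def custom_decorator(error_message, db_session):
    def api_decorator(func):
        @wraps(func)
        def api_request(self, *args, **kwargs):
            try:
                # Forward all arguments so routes with URL parameters still work
                response = func(self, *args, **kwargs)
                db_session.commit()
                return response
            except Exception as err:
                db_session.rollback()
                logger.error(error_message.format(err))
                return error_response(
                    message="Internal Server Error",
                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
                )
            finally:
                # Always return the connection to the pool
                db_session.close()
        return api_request
    return api_decorator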
So I had to create this decorator, which closes the db_session automatically. With it, I no longer see any lingering checked-out connections.
You can use the decorator on your resource methods as follows:
@custom_decorator("blah", db_session)
def example(self):
    "some code"

SQLAlchemy fetch via pandas does not complete when run in Airflow, but does when started manually

I have a function that connects to a MySQL database and executes a query that takes quite long (approx. 10 min):
import pandas as pd
import sqlalchemy
from sqlalchemy.pool import QueuePool

def foo(connections_string):  # connections_string is something like "mysql://user:key@host/db"
    statement = "SELECT * FROM largtable"
    conn = None
    df = None
    try:
        engine = sqlalchemy.create_engine(
            connections_string,
            connect_args={
                "connect_timeout": 1500,
            },
            poolclass=QueuePool,
            pool_pre_ping=True,
            pool_size=10,
            pool_recycle=3600,
            pool_timeout=900,
        )
        conn = engine.connect()
        df = pd.read_sql_query(statement, conn)
    except Exception as err:
        # Chain the original error so the real cause stays in the traceback
        raise Exception("could not load data") from err
    finally:
        if conn:
            conn.close()
    return df
When I run this in my local environment, it works and takes about 600 seconds. When I run it via Airflow, it fails after about 5 to 6 minutes with the error (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')
I have tried the suggestions on Stack Overflow for adjusting SQLAlchemy's timeouts (e.g., this and this) and those from the SQLAlchemy docs, which led to the additional arguments (pool_* and connect_args) in the create_engine() call. However, these didn't seem to have any effect at all.
I've also tried replacing SQLAlchemy with pymysql, which led to the same error on Airflow. I therefore haven't tried flask-sqlalchemy yet, since I expect the same result.
Since it works in basically the same environment (Python 3.7.x, SQLAlchemy 1.3.3 and pandas 1.3.x) when not run by Airflow, but fails when run by Airflow, I think there is some global setting that overrules my timeout settings. But I have no idea where to start the search.
Some additional info, in case it helps somebody: I got it running with Airflow twice in off-hours (5 am and on a Sunday), but not again since.
PS: Unfortunately, pagination as suggested here is not an option, since the query runtime results from transformations and calculations.
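One way to test the "global setting" hypothesis is to print the timeout variables the server applies to the session in each environment and compare them. A minimal sketch, assuming any DB-API connection (for example the one returned by engine.raw_connection()); the helper names read_timeouts and diff_timeouts are hypothetical. Whether one of these variables actually causes the failure is not established here.

```python
def read_timeouts(dbapi_connection):
    """Return the session's timeout-related server variables as a dict."""
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SHOW SESSION VARIABLES LIKE '%timeout%'")
        return {name: value for name, value in cursor.fetchall()}
    finally:
        cursor.close()


def diff_timeouts(local, remote):
    """List variables whose values differ between two environments."""
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }
```

Logging read_timeouts(...) once from the local run and once from inside the Airflow task, then passing both dicts to diff_timeouts, would show whether the two environments really connect with different settings (or even to different servers or proxies).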

Where to session.commit() for a SELECT query - SQLAlchemy, Flask

My application does not update the database - all queries are SELECT statements. I'm struggling with how best to handle direct changes to the database (i.e. opening MySQLWorkbench and changing data there). Without session.commit(), my Flask application is returning stale data.
My solution right now is to have a session.commit() as the first line of each Flask endpoint, but I feel this is the incorrect way of handling this.
Session creation at start of app:
engine = db.create_engine('mysql+pymysql://...')
connection = engine.connect()
metadata = db.MetaData()
Base = declarative_base()
Session = sessionmaker(autoflush=True)
Session.configure(bind=engine)
session = Session()

Use session.expire_all() to mark all session data as expired. Then, when you try to access something, it will be fetched from the database.
session.expire(object) does the same, but for a single object only.
db.session.refresh(some_object) expires and reloads all of that object's data.
A nice article about this can be found here: https://www.michaelcho.me/article/sqlalchemy-commit-flush-expire-refresh-merge-whats-the-difference
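A likely reason the commit() helps, assuming InnoDB tables at MySQL's default REPEATABLE READ isolation level: a long-lived session stays inside one transaction after its first query, and later SELECTs in that transaction keep reading the same snapshot. Ending the transaction, with commit() or rollback(), lets the next query see current data, and both calls also expire loaded objects by default. A minimal sketch that does this after every endpoint call; the decorator name fresh_reads is hypothetical.

```python
from functools import wraps


def fresh_reads(session):
    """Decorator: end the session's transaction after each call (hypothetical helper)."""

    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            try:
                return view(*args, **kwargs)
            finally:
                # Nothing was written, so rollback() only closes the read
                # transaction; the next query starts a new one.
                session.rollback()

        return wrapper

    return decorator
```

In a Flask app this would sit directly under the route decorator on each endpoint, replacing the session.commit() at the top of the function.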

sqlalchemy queries not fetching right data

I have a basic query that I run over and over, every 2 minutes, to extract all records that have a flag set to 1/true.
If I run the script from the command line and there is a record with the flag set, the script extracts it. If I then go to MySQL directly and set that flag back to true/1, the record is not found the next time (2 minutes later) the query is executed.
I have enabled printing of the executed queries to my console, and if I execute the same query directly in MySQL, the record shows up. Why isn't SQLAlchemy finding it?
Here's my config:
engine = create_engine(config.DATABASE_URI, pool_recycle=1800)
metadata = MetaData()
db_session = scoped_session(sessionmaker(bind=engine,
                                         autoflush=True,
                                         autocommit=False))

The problem may be related to the scoped session. With this type of session, a session is bound to each thread. What framework are you using?
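If the script is a plain polling loop with no framework, one thing to try is discarding the thread's session after each run, so that every poll begins a new transaction instead of reusing the previous one. A minimal sketch, assuming registry is the scoped_session from the config above; the function poll and its parameters are hypothetical.

```python
import time


def poll(registry, fetch, handle, interval=120, max_runs=None, sleep=time.sleep):
    """Call fetch(session) repeatedly, discarding the session after each run."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            for record in fetch(registry()):
                handle(record)
        finally:
            # scoped_session.remove() closes the current session and ends its
            # transaction, so the next run does not reuse the old one.
            registry.remove()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval)
```

Here fetch would wrap the flag query, for example a function that takes the session and returns the matching rows, and handle would process each row.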
