SQLAlchemy NullPool seems not to close the underlying DBAPI connection - python

If I understand the documentation correctly, a session creates a new connection or checks out an existing one from the pool.
After I close the session, the connection goes back to the connection pool, but the underlying DBAPI connection is not closed. To close the DBAPI connection, I need to use NullPool.
I tried to verify that my understanding is correct, but there seems to be some disconnect:
I'm able to open a session even after creating connections with NullPool.
Below is sample code with two engines, one without NullPool and one with NullPool.
from time import sleep
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
from sqlalchemy import create_engine

con_str = 'postgresql+psycopg2://pgusr:pgusr@docker-db.appnet/iamdb'

all_connections = [
    create_engine(con_str),
    create_engine(con_str, poolclass=NullPool),
]

def get_qry(ssn_rows):
    return f'SELECT * FROM users LIMIT {ssn_rows}'

def get_ssn(cnx):
    SessionClass = sessionmaker(bind=cnx)
    return SessionClass()

def test_ssn(ssn, records):
    print('')
    print(list(ssn.execute(get_qry(records))))

for cnx in all_connections:
    ssn = get_ssn(cnx)
    test_ssn(ssn, 3)
    ssn.close()
    sleep(1.5)
    ssn = get_ssn(cnx)
    test_ssn(ssn, 5)
The above code executed successfully. I expected that in the second iteration (where the NullPool engine is used) it would throw an error when creating a session for the second time, i.e. after the first session is closed, because with NullPool the underlying DBAPI connection should also have been closed. Instead, both engines were able to open multiple sessions.
Am I missing something here?
How do I close the DBAPI connection after the session is closed?
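One way to check what is actually happening at the pool level is to attach pool event listeners and watch when DBAPI connections are opened and closed. A minimal sketch (the connection URL below is a placeholder):
from sqlalchemy import create_engine, event, text
from sqlalchemy.pool import NullPool

engine = create_engine('postgresql+psycopg2://user:pass@host/db', poolclass=NullPool)

@event.listens_for(engine, 'connect')
def on_connect(dbapi_connection, connection_record):
    print('new DBAPI connection opened')

@event.listens_for(engine, 'close')
def on_close(dbapi_connection, connection_record):
    print('DBAPI connection closed')

with engine.connect() as conn:
    conn.execute(text('SELECT 1'))
# With NullPool the 'close' listener fires as soon as the connection is released;
# the next session or connect() simply opens a brand-new DBAPI connection,
# which is why no error is raised.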

Related

How to avoid the QueuePool limit error using Flask-SQLAlchemy?

I'm developing a webapp using Flask-SQLAlchemy and a Postgres DB. I have a dropdown list in my webpage which is populated from a select against the DB, and after selecting different values a couple of times I get a "sqlalchemy.exc.TimeoutError".
My package versions are:
Flask-SQLAlchemy==2.5.1
psycopg2-binary==2.8.6
SQLAlchemy==1.4.15
My parameters for the DB connection are set as:
app.config['SQLALCHEMY_POOL_SIZE'] = 20
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 20
app.config['SQLALCHEMY_POOL_TIMEOUT'] = 5
app.config['SQLALCHEMY_POOL_RECYCLE'] = 10
The error I'm getting is:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
After changing the value of the 'SQLALCHEMY_MAX_OVERFLOW' from 20 to 100 I get the following error after some value changes on the dropdown list.
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
Every time a new value is selected from the dropdown list, four queries are triggered against the database, and they are used to populate four corresponding tables in my HTML with the results.
I have a 'db.session.commit()' statement after every single query to the DB, but even so, I get this error after a few value changes in my dropdown list.
I know that I should be looking at correctly managing my connection sessions, but I'm struggling with this. I thought about setting the pool timeout to 5s, instead of the default 30s, in hopes that the session would be closed and returned to the pool faster, but it seems it didn't help.
As a suggestion from @snakecharmerb, I checked the output of:
select * from pg_stat_activity;
I ran the webapp for 10 different values before it showed me an error, which means all of the 20+20 connections were used and are left in an 'idle in transaction' state.
Does anybody have any idea or suggestion on what I should change or look for?
I found a solution to the issue I was facing in another post on Stack Overflow.
When you assign your Flask app to your db variable, on top of indicating which Flask app it should use, you can also pass session options, as below:
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app, session_options={'autocommit': True})
The usage of 'autocommit' solved my issue.
Now, as suggested, I'm using:
app.config['SQLALCHEMY_POOL_SIZE'] = 1
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0
Now everything is working as it should.
The original post which helped me is: Autocommit in Flask-SQLAlchemy
@snakecharmerb, @jorzel, @J_H -> Thanks for the help!
You are leaking connections.
A little counterintuitively, you may find you obtain better results with a lower pool limit. A given Python thread only needs a single pooled connection for the simple single-database queries you're doing. Setting the limit to 1, with 0 overflow, will cause you to notice a leaked connection earlier, which makes it easier to pin the blame on the source code that leaked it. As it stands, you have lots of code, and the error is deferred until after many queries have been issued, making it harder to reason about system behavior.
I will assume you're using SQLAlchemy 1.4.29.
To avoid leaking, try using this:
from contextlib import closing
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(some_url, future=True, pool_size=1, max_overflow=0)
get_session = scoped_session(sessionmaker(bind=engine))
...

with closing(get_session()) as session:
    try:
        sql = """yada yada"""
        rows = session.execute(text(sql)).fetchall()
        session.commit()
        ...
        # Do stuff with result rows.
        ...
    except Exception:
        session.rollback()
I am using flask-restful.
When I got this error -> QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
I found out from the logs that my checked-out connections were not closing. I found this out using logger.info(db_session.get_bind().pool.status()).
def custom_decorator(error_message, db_session):
    def api_decorator(func):
        def api_request(self, *args, **kwargs):
            try:
                response = func(self, *args, **kwargs)
                db_session.commit()
                return response
            except Exception as err:
                db_session.rollback()
                logger.error(error_message.format(err))
                return error_response(
                    message="Internal Server Error",
                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
                )
            finally:
                db_session.close()
        return api_request
    return api_decorator
So I created this decorator, which handles closing the db_session automatically. With it, I no longer have any lingering checked-out connections.
You can use the decorator on your functions as follows:
@custom_decorator("blah", db_session)
def example():
    "some code"

SQLAlchemy verify SSL connection

I would like to verify the SSL connection that SQLAlchemy sets up when using create_engine to connect to a PostgreSQL database. For example, if I have the following Python 3 code:
from sqlalchemy import create_engine

conn_string = "postgresql+psycopg2://myuser:******@someserver:5432/somedb"
conn_args = {
    "sslmode": "verify-full",
    "sslrootcert": "/etc/ssl/certs/ca-certificates.crt",
}
engine = create_engine(conn_string, connect_args=conn_args)
I know that I can print the contents of engine.__dict__, but it doesn't contain any information about the SSL settings (TLS version, cipher suite, etc.) that it's using to connect:
{
    '_echo': False,
    'dialect': <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7f988a217978>,
    'dispatch': <sqlalchemy.event.base.ConnectionEventsDispatch object at 0x7f988938e788>,
    'engine': Engine(postgresql+psycopg2://myuser:******@someserver:5432/somedb),
    'logger': <Logger sqlalchemy.engine.base.Engine (DEBUG)>,
    'pool': <sqlalchemy.pool.impl.QueuePool object at 0x7f988a238c50>,
    'url': postgresql+psycopg2://myuser:******@someserver:5432/somedb
}
I know I can do something like SELECT * FROM pg_stat_ssl;, but does the SQLAlchemy engine store this kind of information as a class attribute / method?
Thank you!
I don't use postgres so hopefully this holds true for you.
SQLAlchemy takes the info that you provide in the url and passes it down to the underlying dbapi library that is also specified in the url, in your case it's psycopg2.
Your engine instance only connects to the database when needed, and sqlalchemy just passes the connection info along to the driver specified in the url which returns a connection that sqlalchemy uses.
Forgive that this is MySQL, but it should be fundamentally the same for you:
>>> engine
Engine(mysql+mysqlconnector://test:***@localhost/test)
>>> conn = engine.connect()
>>> conn
<sqlalchemy.engine.base.Connection object at 0x000001614ACBE2B0>
>>> conn.connection
<sqlalchemy.pool._ConnectionFairy object at 0x000001614BF08630>
>>> conn.connection.connection
<mysql.connector.connection_cext.CMySQLConnection object at 0x000001614AB7E1D0>
Calling engine.connect() returns a sqlalchemy.engine.base.Connection instance that has a connection property for which the docstring says:
The underlying DB-API connection managed by this Connection.
However, you can see from above that it actually returns a sqlalchemy.pool._ConnectionFairy object, which from its docstring:
Proxies a DBAPI connection...
Here is the __init__() method of the connection fairy, and as you can see it has a connection attribute that is the actual underlying dbapi connection.
def __init__(self, dbapi_connection, connection_record, echo):
    self.connection = dbapi_connection
    self._connection_record = connection_record
    self._echo = echo
As to what info is available on the dbapi connection object, it depends on the implementation of that particular driver. E.g. psycopg2 connection objects have an info attribute:
A ConnectionInfo object exposing information about the native libpq
connection.
That info object has attributes such as ssl_in_use:
True if the connection uses SSL, False if not.
And ssl_attribute:
Returns SSL-related information about the connection.
So you don't have to dig too deep to get at the actual db connection to see what is really going on.
Also, if you want to ensure that all client connections use SSL, you can always force them to.
HereĀ“s a quick and dirty of what SuperShoot spelled out in detail:
>>> from sqlalchemy import create_engine
>>> db_string = "postgresql+psycopg2://myuser:******@someserver:5432/somedb"
>>> db = create_engine(db_string)
>>> conn = db.connect()
>>> conn.connection.connection.info.ssl_in_use
Should return True if using SSL.
In case someone is looking for PostgreSQL and pg8000, see the pg8000 docs.
For SSL defaults, it is:
import sqlalchemy
sqlalchemy.create_engine(url, connect_args={'ssl_context':True})
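pg8000 also accepts a custom ssl.SSLContext if you want to verify the server certificate rather than just enabling SSL. A small sketch (the URL and CA bundle path are placeholders, not from the original answer):
import ssl
import sqlalchemy

ssl_context = ssl.create_default_context(cafile='/etc/ssl/certs/ca-certificates.crt')
engine = sqlalchemy.create_engine(
    'postgresql+pg8000://user:pass@host/db',   # placeholder URL
    connect_args={'ssl_context': ssl_context},
)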

How to keep session active for long time in sqlalchemy?

I have code that runs queries from a query list. These queries are long and take quite a long time to execute. Since I am executing them in a loop, the session seems to expire and I get an error telling me that the connection to the server was lost.
I then created the session as well as the engine inside the loop (closing the session and disposing of the engine at the end of the loop), but I understand that creating a new connection is an expensive operation.
How can I re-use the connection in this case so that I do not have to create the session and engine each time?
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# an Engine, which the Session will use for connection resources
some_engine = create_engine('mysql://user:password@localhost/')

# create a configured "Session" class
Session = sessionmaker(bind=some_engine)

# create a Session
session = Session()

for long_query in long_query_list:
    # work with sess
    session.execute(long_query)
    session.commit()
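A minimal sketch, assuming SQLAlchemy 1.4+ (the connection URL and query list below are placeholders), of engine options that are commonly used so a single long-lived engine and session keep working across slow loops, by testing and recycling pooled connections:
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    'mysql://user:password@localhost/',  # placeholder URL
    pool_pre_ping=True,   # test each connection with a lightweight ping on checkout
    pool_recycle=3600,    # replace connections older than an hour (below MySQL's wait_timeout)
)
Session = sessionmaker(bind=engine)

long_query_list = ['SELECT 1']  # stand-in for the real query list

with Session() as session:
    for long_query in long_query_list:
        session.execute(text(long_query))
        session.commit()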

Issue with Stale data Flask/SqlAlchemy

I have the following setup, for which SQLAlchemy returns stale data on session.query():
A web application running on Flask with Gunicorn + supervisor.
One of the services is composed in this way:
app.py:
@app.route('/api/generatepoinvoice', methods=["POST"])
@auth.login_required
def generate_po_invoice():
    try:
        po_id = request.json['po_id']
        email = request.json['email']
        return jsonify(response=POInvoiceGenerator.get_invoice(po_id, email))
    except Exception as ex:
        app.logger.error("generate_po_invoice(): " + ex.message)
In another folder I have the database-related stuff:
DatabaseModels (folder)
|-->Model.py
|-->Connection.py
This is what is contained in the Connection.py file:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine(DB_BASE_URI, isolation_level="READ COMMITTED")
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
Base = declarative_base()
And this is an extract of the Model.py file:
from DatabaseModels.Connection import Base
from sqlalchemy import Column, String, etc...
class Po(Base):
    __tablename__ = 'PLC_PO'
    id = Column("POId", Integer, primary_key=True)
    code = Column("POCode", String(50))
    etc...
Then I have another file, POInvoiceGenerator.py, that contains the call to the database for fetching some data:
import DatabaseModels.Connection as connection
import DatabaseModels.Model as model

def get_invoice(po_code, email):
    try:
        po_code = po_code.strip()
        connection.session.expire_all()
        po = connection.session.query(model.Po).filter(model.Po.code == po_code).first()
    except Exception as ex:
        logger.error("get_invoice(): " + ex.message)
In subsequent user calls to this service I sometimes start to get errors like "could not find data in the db for that specific code", as if the data were stale.
My first approach was to add isolation_level="READ COMMITTED" to the engine declaration and then to create a scoped session, but the stale data reads keep happening.
Does anyone have any idea whether my setup is wrong (the session and the model are reused among multiple methods and files)?
Thanks in advance.
Even if the solution pointed out by @TonyMountax seems valid and made me discover something that I didn't know about SQLAlchemy, in the end I opted for something different.
I figured out that the connection established by SQLAlchemy was being reused, since it was taken from a pool of connections every time, and this was somehow causing the data to be stale.
I added a NullPool to my code:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool
engine = create_engine(DB_URI, isolation_level="READ COMMITTED", poolclass=NullPool)
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
And then I'm closing the session after every query that I make:
session.query("some query..")
session.close()
This will cause SQLAlchemy to create a new connection every time and fetch fresh data from the DB.
I hope this is the correct way to use it and that it might be useful to someone else.
The way you instantiate your database connections means that they are reused for the next request, and they have some state left from the previous request. SQLAlchemy uses a concept of sessions to interact with the database, so that your data does not abruptly change in a single request even if you happen to perform the same query twice. This makes sense when you are using the ORM query features. For instance, if you were to query len(User.friendlist) twice during the same session, but a friend request was accepted during the request, then it will still show the same number in both locations.
To fix this, you must set up the session on first request, then you must tear it down when the request is finished. To do so is not trivial, but there is a well-established project that does it already: Flask-SQLAlchemy. It's from Pallets, the people behind Flask itself and Jinja2.
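For reference, a minimal sketch of the per-request pattern that Flask-SQLAlchemy implements for you (the module layout and URL here are assumptions): a scoped_session that is removed when the app context tears down, so each request starts with a fresh session:
from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

app = Flask(__name__)
engine = create_engine('postgresql+psycopg2://user:pass@localhost/somedb')  # placeholder URL
Session = scoped_session(sessionmaker(bind=engine))

@app.teardown_appcontext
def remove_session(exception=None):
    # Closes the session (rolling back any open transaction) and returns the
    # connection to the pool, so the next request cannot see a stale,
    # long-lived transaction.
    Session.remove()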

How to reduce number of connections using SQLAlchemy + postgreSQL?

I'm developing on Heroku using their Postgres add-on with the Dev plan, which has a connection limit of 20. I'm new to Python and this may be trivial, but I find it difficult to abstract the database connection without causing OperationalError: (OperationalError) FATAL: too many connections for role.
Currently I have databeam.py:
import os
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from settings import databaseSettings

class Db(object):
    def __init__(self):
        self.app = Flask(__name__)
        self.app.config.from_object(__name__)
        self.app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', databaseSettings())
        self.db = SQLAlchemy(self.app)

db = Db()
And when I'm creating a controller for a page, I do this:
import databeam
db = databeam.db
locations = databeam.locations
templateVars = db.db.session.query(locations).filter(locations.parent == 0).order_by(locations.order.asc()).all()
This does produce what I want, but slowly, and at times it causes the error mentioned above. Since I come from a PHP background I have a certain mindset of how to deal with DB connections (i.e. like the example above), but I fear it doesn't fit well with Python.
What is the proper way of abstracting the db connection in one place and then just using the same connection in all imports?
Within SQLAlchemy you should be able to create a connection pool. This pool is the pool size each dyno gets. Since on the Dev and Basic plans you can have up to 20 connections, you could set this to 20 if you run 1 dyno, 10 if you run 2, etc. To configure your pool you can set up the engine:
engine = create_engine('postgresql://me@localhost/mydb',
                       pool_size=20, max_overflow=0)
This sets up your db engine with a pool which you then pull from automatically. You can also configure the pool manually; more details can be found in the pooling guide of SQLAlchemy - http://docs.sqlalchemy.org/en/latest/core/pooling.html
