I discovered that SQLAlchemy does not release its database connections (in my case), so they pile up to the point that they might crash the server. The connections are made from different threads.
Here is the simplified code:
"""
Test to see DB connection allocation size while making call from multiple threads
"""
from time import sleep
from threading import Thread, current_thread
import uuid
from sqlalchemy import func, or_, desc
from sqlalchemy import event
from sqlalchemy import ForeignKey, Column, Integer, String, DateTime, UniqueConstraint
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import relationship
from sqlalchemy.orm import scoped_session, Session
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.types import Integer, DateTime, String, Boolean, Text, Float
from sqlalchemy.engine import Engine
from sqlalchemy.pool import NullPool
# MySQL
SQLALCHEMY_DATABASE = 'mysql'
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://amalgam:amalgam#localhost/amalgam?charset=utf8mb4' # https://stackoverflow.com/questions/47419943/pymysql-warning-1366-incorrect-string-value-xf0-x9f-x98-x8d-t
SQLALCHEMY_ECHO = False
SQLALCHEMY_ENGINE_OPTIONS = {'pool_size': 40, 'max_overflow': 0}
SQLALCHEMY_ISOLATION_LEVEL = "AUTOCOMMIT"
# DB Engine
# engine = create_engine(SQLALCHEMY_DATABASE_URI, echo=SQLALCHEMY_ECHO, pool_recycle=3600,
# isolation_level= SQLALCHEMY_ISOLATION_LEVEL,
# **SQLALCHEMY_ENGINE_OPTIONS
# ) # Connect to server
engine = create_engine(SQLALCHEMY_DATABASE_URI,
echo=SQLALCHEMY_ECHO,
# poolclass=NullPool,
pool_recycle=3600,
isolation_level= SQLALCHEMY_ISOLATION_LEVEL,
**SQLALCHEMY_ENGINE_OPTIONS
) # Connect to server
session_factory = sessionmaker(bind=engine)
Base = declarative_base()
# ORM Entity
class User(Base):
LEVEL_NORMAL = 'normal'
LEVEL_ADMIN = 'admin'
__tablename__ = "users"
id = Column(Integer, primary_key=True)
name = Column(String(100), nullable=True)
email = Column(String(100), nullable=True, unique=True)
password = Column(String(100), nullable=True)
level = Column(String(100), default=LEVEL_NORMAL)
# Workers
NO = 10
workers = []
_scoped_session_factory = scoped_session(session_factory)
def job(job_id):
session = _scoped_session_factory()
print("Job is {}".format(job_id))
user = User(name='User {} {}'.format(job_id, uuid.uuid4()), email='who cares {} {}'.format(job_id, uuid.uuid4()))
session.add(user)
session.commit()
session.close()
print("Job {} done".format(job_id))
sleep(10)
# Create worker threads
for i in range(NO):
workers.append(Thread(target=job, kwargs={'job_id':i}))
# Start them
for worker in workers:
worker.start()
# Join them
for worker in workers:
worker.join()
# Allow some time to see MySQL's "show processlist;" command
sleep(10)
The moment the program reaches the final
sleep(10)
and I run
show processlist;
it gives the following result - meaning that all connections to the DB are still alive.
How can I force those connections to close?
Note: I could make use of
poolclass=NullPool
but I feel that solution is too restrictive - I would still like to have a database pool, but with some way to close connections when I want to.
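One option (not mentioned in the question) that keeps the pool for normal operation but lets you close its idle connections on demand is Engine.dispose(): it closes every checked-in connection and replaces the pool with a fresh, empty one. A minimal sketch, using an in-memory SQLite URL in place of the MySQL URI:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# in-memory SQLite stands in for the MySQL URI; pool mechanics are the same
engine = create_engine('sqlite://', poolclass=QueuePool, pool_size=5)

conn = engine.connect()
conn.close()                          # check-in: returns the connection to the pool
assert engine.pool.checkedin() == 1   # still open at the DBAPI level, just idle

engine.dispose()                      # closes all checked-in connections,
                                      # then replaces the pool with an empty one
assert engine.pool.checkedin() == 0
```

Note that connections checked out at the moment dispose() is called are not closed; they are discarded when they are eventually returned.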
The following is from the documentation for the QueuePool constructor:
pool_size – The size of the pool to be maintained, defaults to 5. This
is the largest number of connections that will be kept persistently in
the pool. Note that the pool begins with no connections; once this
number of connections is requested, that number of connections will
remain. pool_size can be set to 0 to indicate no size limit; to
disable pooling, use a NullPool instead.
max_overflow – The maximum overflow size of the pool. When the number
of checked-out connections reaches the size set in pool_size,
additional connections will be returned up to this limit. When those
additional connections are returned to the pool, they are disconnected
and discarded. It follows then that the total number of simultaneous
connections the pool will allow is pool_size + max_overflow, and the
total number of “sleeping” connections the pool will allow is
pool_size. max_overflow can be set to -1 to indicate no overflow
limit; no limit will be placed on the total number of concurrent
connections. Defaults to 10.
SQLALCHEMY_ENGINE_OPTIONS = {'pool_size': 40, 'max_overflow': 0}
Given the above, this configuration is asking SQLAlchemy to keep up to 40 connections open.
If you don't like that, but want to keep some connections available you might try a configuration like this:
SQLALCHEMY_ENGINE_OPTIONS = {'pool_size': 10, 'max_overflow': 30}
This will keep 10 persistent connections in the pool and will burst up to 40 connections if they are requested concurrently. Any connections in surplus of the configured pool size are closed immediately when they are checked back into the pool.
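That check-in behaviour can be observed directly; a sketch with in-memory SQLite standing in for MySQL and the sizes shrunk to pool_size=2, max_overflow=2:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# in-memory SQLite stands in for the MySQL URI; pool behaviour is the same
engine = create_engine('sqlite://', poolclass=QueuePool,
                       pool_size=2, max_overflow=2)

conns = [engine.connect() for _ in range(4)]  # 2 pooled + 2 overflow
assert engine.pool.overflow() == 2            # currently in overflow

for c in conns:
    c.close()

# overflow connections are discarded on check-in; only pool_size remain pooled
assert engine.pool.checkedin() == 2
assert engine.pool.overflow() == 0
```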
Related
I have an app built using FastAPI, with SQLAlchemy handling all the DB-related stuff.
When the APIs are triggered via the frontend, I see that connections are opened and remain in the IDLE state for a while. Is it possible to reduce the idle time via SQLAlchemy?
I do the following to connect to the Postgres DB:
import sqlalchemy as db

eng = db.create_engine(<SQLALCHEMY_DATABASE_URI>)
conn = eng.connect()
metadata = db.MetaData()
table = db.Table(
    <table_name>,
    metadata,
    autoload=True,
    autoload_with=eng)

user_id = 1
try:
    if ids_by_user is None:
        query = db.select([table.columns.created_at]).where(
            table.columns.user_id == user_id,
        ).order_by(
            table.columns.created_at.desc()
        )
        result = conn.execute(query).fetchmany(1)
        time = result[0][0]
        time_filtering_query = db.select([table]).where(
            table.columns.created_at == time
        )
        time_result = conn.execute(time_filtering_query).fetchall()
        conn.close()
        return time_result
    else:
        output_by_id = []
        for i in ids_by_user:
            query = db.select([table]).where(
                db.and_(
                    table.columns.id == i,
                    table.columns.user_id == user_id
                )
            )
            result = conn.execute(query).fetchall()
            output_by_id.append(result)
        output_by_id = [output_by_id[j][0]
                        for j in range(len(output_by_id))
                        if output_by_id[j]]
        conn.close()
        return output_by_id
finally:
    eng.dispose()
Even after logging out of the app, the connections are still active & in idle state for a while and don't close immediately.
Edit 1
I tried using NullPool and the connections are still idle and in ROLLBACK state, the same as when I didn't use NullPool.
You can reduce connection idle time by setting a maximum lifetime per connection by using pool_recycle. Note that connections already checked out will not be terminated until they are no longer in use.
If you are interested in reducing both the idle time and keeping the overall number of unused connections low, you can set a lower pool_size and then set max_overflow to allow for more connections to be allocated when the application is under heavier load.
from sqlalchemy import create_engine

e = create_engine(<SQLALCHEMY_DATABASE_URI>,
                  pool_recycle=3600,  # idle connections will be terminated after 1 hour
                  pool_size=5,        # pool size under normal conditions
                  max_overflow=5      # additional connections when pool size is exceeded
                  )
Google Cloud has a helpful guide for optimizing Postgres connection pooling that you might find useful.
If I understand the documentation correctly, a session creates a new connection or checks out an existing one from the pool.
After I close the session, the connection goes back to the pool, but the underlying DBAPI connection is not closed. To close the DBAPI connection, I need to use NullPool.
I tried to check whether my understanding is correct, but there seems to be some disconnect:
I'm able to open a session even after creating an engine with NullPool.
Below is the sample code, which has two engines: one without NullPool and one with NullPool.
from time import sleep
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
from sqlalchemy import create_engine

con_str = 'postgresql+psycopg2://pgusr:pgusr@docker-db.appnet/iamdb'

all_connections = [
    create_engine(con_str),
    create_engine(con_str, poolclass=NullPool),
]

def get_qry(ssn_rows):
    return f'SELECT * FROM users LIMIT {ssn_rows}'

def get_ssn(cnx):
    SessionClass = sessionmaker(bind=cnx)
    return SessionClass()

def test_ssn(ssn, records):
    print('')
    print(list(ssn.execute(get_qry(records))))

for cnx in all_connections:
    ssn = get_ssn(cnx)
    test_ssn(ssn, 3)
    ssn.close()
    sleep(1.5)
    ssn = get_ssn(cnx)
    test_ssn(ssn, 5)
The above code executes successfully. I expected that in the second iteration (where the NullPool engine is used) it would throw an error when creating a session for the second time, i.e. after the first session is closed, because with NullPool the DBAPI connection should also have been closed. In contrast, both engines were able to open multiple sessions.
Am I missing something here?
How do I close the DBAPI connection after the session is closed?
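For what it's worth, a session is only a lightweight wrapper: closing it releases its connection (which NullPool then closes at the DBAPI level), and opening another session simply establishes a brand-new connection, so no error is expected. One way to observe the DBAPI-level closes is the pool's "close" event; a sketch using in-memory SQLite in place of the Postgres URI:

```python
from sqlalchemy import create_engine, event, text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

# in-memory SQLite stands in for the Postgres URI from the question
engine = create_engine('sqlite://', poolclass=NullPool)
closed = []

@event.listens_for(engine, 'close')
def on_close(dbapi_conn, connection_record):
    # fires each time a DBAPI-level connection is closed
    closed.append(dbapi_conn)

Session = sessionmaker(bind=engine)

ssn = Session()
ssn.execute(text('SELECT 1'))
ssn.close()                  # NullPool closes the DBAPI connection outright
assert len(closed) == 1

ssn = Session()              # a new session just opens a fresh connection
ssn.execute(text('SELECT 1'))
ssn.close()
assert len(closed) == 2
```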
I'm utilizing Flask and SQLAlchemy. The database I've created for SQLAlchemy seems to mess up when I try to run my website, popping up with an error stating that there's a thread error. I'm wondering if it's because I haven't dropped my table from my previous schema. I'm using a Linux server to run "python3" and the file that sets up my database.
I've tried to physically delete the table from my local drive and then re-run it, but I still get this error.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm import scoped_session
from database_setup import Base, Category, Item
engine = create_engine('sqlite:///database_tables.db')
Base.metadata.bind = engine
Session = sessionmaker()
Session.bind = engine
session = Session()
brushes = Category(id = 1, category_name = 'Brushes')
session.add(brushes)
session.commit()
pencils = Category(id = 2, category_name = 'Pencils')
session.add(pencils)
session.commit()
When I am in debug mode using Flask, I click the links I've made using these rows, but after three clicks I get the error
"(sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140244909291264 and this is thread id 140244900898560 [SQL: SELECT category.id AS category_id, category.category_name AS category_category_name FROM category] [parameters: [{}]] (Background on this error at: http://sqlalche.me/e/f405)"
You can use a separate session for each thread by indexing them with the thread id from _thread.get_ident():
import _thread

engine = create_engine('sqlite:///history.db', connect_args={'check_same_thread': False})
...
Base.metadata.create_all(engine)

sessions = {}

def get_session():
    thread_id = _thread.get_ident()  # get thread id
    if thread_id in sessions:
        return sessions[thread_id]
    session_factory = sessionmaker(bind=engine)
    Session = scoped_session(session_factory)
    sessions[thread_id] = Session()
    return sessions[thread_id]
Then use get_session() wherever it is needed; in your case:
get_session().add(brushes)
get_session().commit()
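As a side note, scoped_session is itself a thread-local registry, so a manual thread-id dictionary duplicates what it already does. A minimal sketch of the built-in behaviour (hypothetical in-memory SQLite database):

```python
import threading
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite://', connect_args={'check_same_thread': False})
Session = scoped_session(sessionmaker(bind=engine))

sessions = []

def worker():
    # scoped_session hands each thread its own session automatically
    sessions.append(Session())

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sessions[0] is not sessions[1]  # one distinct session per thread
Session.remove()  # discard the current thread's session when finished
```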
I have the following setup, for which session.query() in SQLAlchemy returns stale data:
A web application running on Flask with Gunicorn + supervisor.
One of the services is composed in this way:
app.py:
@app.route('/api/generatepoinvoice', methods=["POST"])
@auth.login_required
def generate_po_invoice():
    try:
        po_id = request.json['po_id']
        email = request.json['email']
        return jsonify(response=POInvoiceGenerator.get_invoice(po_id, email))
    except Exception as ex:
        app.logger.error("generate_po_invoice(): " + ex.message)
In another folder I have the database-related stuff:
DatabaseModels (folder)
|-->Model.py
|-->Connection.py
This is what is contained in the Connection.py file:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine(DB_BASE_URI, isolation_level="READ COMMITTED")
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
Base = declarative_base()
and here is an extract of the Model.py file:
from DatabaseModels.Connection import Base
from sqlalchemy import Column, String, etc...
class Po(Base):
    __tablename__ = 'PLC_PO'
    id = Column("POId", Integer, primary_key=True)
    code = Column("POCode", String(50))
    etc...
Then I have another file, POInvoiceGenerator.py, that contains the call to the database for fetching some data:
import DatabaseModels.Connection as connection
import DatabaseModels.model as model
def get_invoice(po_code, email):
try:
po_code = po_code.strip()
PLCConnection.session.expire_all()
po = connection.session.query(model.Po).filter(model.Po.code == po_code).first()
except Exception as ex:
logger.error("get_invoice(): " + ex.message)
In subsequent user calls to this service I sometimes start to get errors like "could not find data in the db for that specific code", as if the data were stale.
My first approach was to add isolation_level="READ COMMITTED" to the engine declaration and then to create a scoped session, but the stale reads keep happening.
Does anyone have any idea whether my setup is wrong (the session and the model are reused among multiple methods and files)?
Thanks in advance.
Even though the solution pointed out by @TonyMountax seems valid and made me discover something I didn't know about SQLAlchemy, in the end I opted for something different.
I figured out that the connection established by SQLAlchemy was long-lived, since it was drawn from a connection pool every time; this somehow was causing the data to be stale.
i added a NullPool to my code:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool
engine = create_engine(DB_URI, isolation_level="READ COMMITTED", poolclass=NullPool)
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
and then I'm closing the session after every query that I make:
session.query("some query..")
session.close()
This will cause SQLAlchemy to create a new connection every time and fetch fresh data from the DB.
I hope that this is the correct way to use it and that it might be useful to someone else.
The way you instantiate your database connections means that they are reused for the next request, and they have some state left from the previous request. SQLAlchemy uses a concept of sessions to interact with the database, so that your data does not abruptly change in a single request even if you happen to perform the same query twice. This makes sense when you are using the ORM query features. For instance, if you were to query len(User.friendlist) twice during the same session, but a friend request was accepted during the request, then it will still show the same number in both locations.
To fix this, you must set up the session on first request, then you must tear it down when the request is finished. To do so is not trivial, but there is a well-established project that does it already: Flask-SQLAlchemy. It's from Pallets, the people behind Flask itself and Jinja2.
I wrote a script with this sort of logic in order to insert many records into a PostgreSQL table as they are generated.
#!/usr/bin/env python3
import asyncio
from concurrent.futures import ProcessPoolExecutor as pool
from functools import partial

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

metadata = sa.MetaData(schema='stackoverflow')
Base = declarative_base(metadata=metadata)

class Example(Base):
    __tablename__ = 'example'
    pk = sa.Column(sa.Integer, primary_key=True)
    text = sa.Column(sa.Text)

sa.event.listen(Base.metadata, 'before_create',
                sa.DDL('CREATE SCHEMA IF NOT EXISTS stackoverflow'))

engine = sa.create_engine(
    'postgresql+psycopg2://postgres:password@localhost:5432/stackoverflow'
)

Base.metadata.create_all(engine)

session = sa.orm.sessionmaker(bind=engine, autocommit=True)()

def task(value):
    engine.dispose()
    with session.begin():
        session.add(Example(text=value))

async def infinite_task(loop):
    spawn_task = partial(loop.run_in_executor, None, task)
    while True:
        await asyncio.wait([spawn_task(value) for value in range(10000)])

def main():
    loop = asyncio.get_event_loop()
    with pool() as executor:
        loop.set_default_executor(executor)
        asyncio.ensure_future(infinite_task(loop))
        loop.run_forever()
    loop.close()

if __name__ == '__main__':
    main()
This code works just fine, creating a pool of as many processes as I have CPU cores, and happily chugging along forever. I wanted to see how threads would compare to processes, but I could not get a working example. Here are the changes I made:
from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)

def task(value):
    engine.dispose()

    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))

    # remove session once the work is done
    Session.remove()
This version runs for a while before a flood of "too many clients" exceptions:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: sorry, too many clients already
What am I missing?
It turns out that the problem is engine.dispose(), which, in the words of Mike Bayer (zzzeek) "is leaving PG connections lying open to be garbage collected."
Source: https://groups.google.com/forum/#!topic/sqlalchemy/zhjCBNebnDY
So the updated task function looks like this:
def task(value):
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))

    # remove the thread-local session once the work is done
    Session.remove()
It looks like you're opening a lot of new connections without closing them. Try adding engine.dispose() at the end of the task:
from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)

def task(value):
    engine.dispose()

    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))

    # remove session once the work is done
    Session.remove()

    engine.dispose()
Keep in mind the cost of a new connection; ideally you should have one connection per process/thread. But I'm not sure how ThreadPoolExecutor works, and probably connections are not being closed when a thread finishes executing.