I am trying to convert my single-threaded application, which accesses a database through SQLAlchemy, into a multi-threaded one. I found that a SQLAlchemy session is not thread safe, so we need to use the scoped_session factory for thread-safe database access.
Below is my input dataset:
input_list = [data1, data2, data3, data4, data5]
Single-threaded application:

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine_url)

def myfunction(data):
    db_session = Session()
    print(db_session)
    # use db_session to query/store the data

for data in input_list:
    myfunction(data)
When I try to convert it to a multithreaded application:

from multiprocessing.pool import ThreadPool

from sqlalchemy.orm import sessionmaker, scoped_session

Session = scoped_session(sessionmaker(bind=engine_url))

def myfunction(data):
    db_session = Session()
    print(db_session)
    # use db_session to query/store the data

def myfunction_parallel():
    with ThreadPool(4) as pool:
        output = pool.map(myfunction, input_list)
In the multithreaded variant I am getting the same db_session object, but shouldn't a new session object be created for each thread, so that the sessions are different?
The scoped session registry registers a session for each thread that requests one. This enables code to call db_session = Session() and get the expected session for the current thread.
However, it is the application's responsibility to inform the session registry when a session is no longer required. The application does this by calling Session.remove(), as documented here:
The scoped_session.remove() method first calls Session.close() on the current Session, which has the effect of releasing any connection/transactional resources owned by the Session first, then discarding the Session itself. “Releasing” here means that connections are returned to their connection pool and any transactional state is rolled back, ultimately using the rollback() method of the underlying DBAPI connection.
At this point, the scoped_session object is “empty”, and will create a new Session when called again.
This code should work as expected:
def myfunction(data):
    db_session = Session()
    print(db_session)
    # use db_session to query/store the data
    Session.remove()
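To make the thread-local behavior concrete, here is a minimal sketch of my own (an in-memory SQLite engine stands in for the real engine_url): within one thread every Session() call returns the same registered session, while another thread gets its own.

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# in-memory SQLite stands in for the real engine_url
engine = create_engine('sqlite://')
Session = scoped_session(sessionmaker(bind=engine))

main_session = Session()
same_in_main = Session() is main_session  # same thread -> same session

sessions = {}

def worker():
    sessions['worker'] = Session()  # a different, thread-local session
    Session.remove()                # discard it when the thread is done

t = threading.Thread(target=worker)
t.start()
t.join()

Session.remove()  # clean up the main thread's session too
```

Here same_in_main is True and sessions['worker'] is a different object from main_session, which is exactly the per-thread behavior the question expected.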
Related
Instead of:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

# an Engine, which the Session will use for connection
# resources
engine = create_engine('sqlite:///...')

# create session and add objects
with Session(engine) as session:
    session.add(some_object)
    session.add(some_other_object)
    session.commit()
I create a sessionmaker (according to the example in the documentation, see below):
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# an Engine, which the Session will use for connection
# resources, typically in module scope
engine = create_engine('postgresql://scott:tiger@localhost/')

# a sessionmaker(), also in the same scope as the engine
Session = sessionmaker(engine)

# we can now construct a Session() without needing to pass the
# engine each time
with Session() as session:
    session.add(some_object)
    session.add(some_other_object)
    session.commit()
Can I use the sessions from the sessionmaker in different threads (spawning multiple sessions at the same time)? In other words, is the sessionmaker a thread-safe object? If yes, can multiple sessions exist and read/write the same tables at the same time?
Furthermore, what is the advantage of using scoped_session? Is it related to the problem of multiple sessions (one per thread)?
# set up a scoped_session
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# now all calls to Session() will create a thread-local session
some_session = Session()
# you can now use some_session to run multiple queries, etc.
# remember to close it when you're finished!
Session.remove()
Session objects are not thread-safe, but they are thread-local. I recommend using a sessionmaker instead of a single shared Session. It will yield a new Session object every time you need one, so the database connection is not kept idle. I'd use the approach below.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DB_ENGINE = create_engine('sqlite:///...')
DB_SES_MAKER = sessionmaker(bind=DB_ENGINE)

def get_db():
    db = DB_SES_MAKER()
    try:
        yield db
    finally:
        db.close()
Then call get_db whenever needed:
db = next(get_db())
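As a variation (my own sketch, not part of the answer above), the same generator can be wrapped with contextlib.contextmanager so a with statement guarantees close() even on exceptions, instead of calling next() by hand:

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

DB_ENGINE = create_engine('sqlite://')  # placeholder URL for illustration
DB_SES_MAKER = sessionmaker(bind=DB_ENGINE)

@contextmanager
def get_db():
    db = DB_SES_MAKER()
    try:
        yield db
    finally:
        db.close()  # always return the connection to the pool

with get_db() as db:
    value = db.execute(text('SELECT 1')).scalar()
```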
self.engine = create_engine(self.url, convert_unicode=True, pool_size=5, pool_recycle=1800, max_overflow=10)
connection = self.engine.connect()
Session = scoped_session(sessionmaker(bind=self.engine, autocommit=False, autoflush=True))
I initialize my session like this.
def first():
    with Session() as session:
        second(session)

def second(session):
    session.add(obj)
    third(session)

def third(session):
    session.execute(query)
I use my session like this.
I think one pooled connection is assigned per session, so I'd expect the code above to work even with pool_size=1, max_overflow=0. But when I configure it like that, it gets stuck and raises an exception like:

descriptor '__init__' requires a 'super' object but received a 'tuple'

Why is that? Is more than one pooled connection assigned per session, rather than one each? Also, when using the session with a with block, can I skip explicit commit and rollback when an exception occurs?
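Regarding the last question: in SQLAlchemy 1.4+, a plain with Session() as session: only guarantees close(); if you also want automatic commit on success and rollback on exception, use with Session.begin() as session:. A sketch of my own, with in-memory SQLite standing in for the real engine:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')  # stand-in for the real database
Session = sessionmaker(bind=engine)

with Session() as setup:
    setup.execute(text('CREATE TABLE t (x INTEGER)'))
    setup.commit()

# begin() commits automatically when the block succeeds...
with Session.begin() as session:
    session.execute(text('INSERT INTO t (x) VALUES (1)'))

# ...and rolls back automatically when the block raises
try:
    with Session.begin() as session:
        session.execute(text('INSERT INTO t (x) VALUES (2)'))
        raise RuntimeError('boom')
except RuntimeError:
    pass

with Session() as check:
    rows = check.execute(text('SELECT x FROM t')).scalars().all()
# rows is [1]: the first insert was committed, the second rolled back
```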
I have code that runs queries from a list. These queries are long and take quite a while to execute. Since I am executing them in a loop, the session seems to expire and I get an error telling me that the connection to the server was lost.
I then created the session as well as the engine inside the loop (closing the session and disposing of the engine at the end of each iteration), but I understand that creating a new connection is an expensive operation.
How can I reuse the connection in this case, so that I do not have to create the session and engine each time?
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# an Engine, which the Session will use for connection
# resources
some_engine = create_engine('mysql://user:password@localhost/')

# create a configured "Session" class
Session = sessionmaker(bind=some_engine)

# create a Session
session = Session()

for long_query in long_query_list:
    # work with the session
    session.execute(long_query)
    session.commit()
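One way to keep a single engine and session for the whole loop despite dropped connections (a sketch of my own, not from the original code) is to enable pessimistic connection testing on the pool, so stale connections are replaced transparently:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# pool_pre_ping issues a lightweight ping before each checkout and
# replaces dead connections; pool_recycle retires connections older
# than the given number of seconds. SQLite here is a placeholder for
# the real mysql:// URL.
some_engine = create_engine('sqlite://', pool_pre_ping=True, pool_recycle=3600)
Session = sessionmaker(bind=some_engine)

session = Session()
result = session.execute(text('SELECT 1')).scalar()
session.close()
```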
I have the following setup, for which session.query() in SQLAlchemy returns stale data:
A web application running on Flask with Gunicorn + supervisor.
One of the services is composed in this way:
app.py:
@app.route('/api/generatepoinvoice', methods=["POST"])
@auth.login_required
def generate_po_invoice():
    try:
        po_id = request.json['po_id']
        email = request.json['email']
        return jsonify(response=POInvoiceGenerator.get_invoice(po_id, email))
    except Exception as ex:
        app.logger.error("generate_po_invoice(): " + ex.message)
In another folder I have the database-related stuff:
DatabaseModels (folder)
|-->Model.py
|-->Connection.py
This is what is contained in the Connection.py file:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine(DB_BASE_URI, isolation_level="READ COMMITTED")
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
Base = declarative_base()
and this is an extract of the Model.py file:
from DatabaseModels.Connection import Base
from sqlalchemy import Column, String, etc...
class Po(Base):
    __tablename__ = 'PLC_PO'
    id = Column("POId", Integer, primary_key=True)
    code = Column("POCode", String(50))
    # etc...
Then I have another file, POInvoiceGenerator.py, that contains the call to the database for fetching some data:
import DatabaseModels.Connection as connection
import DatabaseModels.model as model
def get_invoice(po_code, email):
    try:
        po_code = po_code.strip()
        connection.session.expire_all()
        po = connection.session.query(model.Po).filter(model.Po.code == po_code).first()
    except Exception as ex:
        logger.error("get_invoice(): " + ex.message)
In subsequent user calls to this service I sometimes start to get errors such as "could not find data in the db for that specific code", as if the data were stale.
My first approach was to add isolation_level="READ COMMITTED" to the engine declaration and then to create a scoped session, but the stale reads keep happening.
Does anyone have an idea whether my setup is wrong (the session and the model are reused among multiple methods and files)?
Thanks in advance.
Even if the solution pointed out by @TonyMountax seems valid and made me discover something I didn't know about SQLAlchemy, in the end I opted for something different.
I figured out that the connection established by SQLAlchemy was long-lived, since it was drawn from a connection pool every time, and this was somehow causing the data to be stale.
I added a NullPool to my code:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool
engine = create_engine(DB_URI, isolation_level="READ COMMITTED", poolclass=NullPool)
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
and then I call session.close() after every query that I make:

session.query("some query..")
session.close()

This causes SQLAlchemy to create a new connection every time and fetch fresh data from the database.
I hope this is the correct way to use it and that it might be useful to someone else.
The way you instantiate your database connections means that they are reused for the next request, and they have some state left from the previous request. SQLAlchemy uses a concept of sessions to interact with the database, so that your data does not abruptly change in a single request even if you happen to perform the same query twice. This makes sense when you are using the ORM query features. For instance, if you were to query len(User.friendlist) twice during the same session, but a friend request was accepted during the request, then it will still show the same number in both locations.
To fix this, you must set up the session on first request, then you must tear it down when the request is finished. To do so is not trivial, but there is a well-established project that does it already: Flask-SQLAlchemy. It's from Pallets, the people behind Flask itself and Jinja2.
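The per-request pattern that Flask-SQLAlchemy implements can be sketched in framework-agnostic terms (my own illustration, with in-memory SQLite standing in for the real database): create one scoped_session for the whole application, and call Session.remove() at the end of every request so the next request starts with a fresh session.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite://')  # stand-in for the real DB_BASE_URI
Session = scoped_session(sessionmaker(bind=engine))

def handle_request():
    # during the request, every Session() call in this thread
    # returns the same registered session
    session = Session()
    value = session.execute(text('SELECT 1')).scalar()
    # teardown: discard the session so nothing stale leaks
    # into the next request handled by this thread
    Session.remove()
    return value
```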
I wrote a script with this sort of logic in order to insert many records into a PostgreSQL table as they are generated.
#!/usr/bin/env python3
import asyncio
from concurrent.futures import ProcessPoolExecutor as pool
from functools import partial

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

metadata = sa.MetaData(schema='stackoverflow')
Base = declarative_base(metadata=metadata)

class Example(Base):
    __tablename__ = 'example'
    pk = sa.Column(sa.Integer, primary_key=True)
    text = sa.Column(sa.Text)

sa.event.listen(Base.metadata, 'before_create',
                sa.DDL('CREATE SCHEMA IF NOT EXISTS stackoverflow'))

engine = sa.create_engine(
    'postgresql+psycopg2://postgres:password@localhost:5432/stackoverflow'
)

Base.metadata.create_all(engine)

session = sa.orm.sessionmaker(bind=engine, autocommit=True)()

def task(value):
    engine.dispose()
    with session.begin():
        session.add(Example(text=value))

async def infinite_task(loop):
    spawn_task = partial(loop.run_in_executor, None, task)
    while True:
        await asyncio.wait([spawn_task(value) for value in range(10000)])

def main():
    loop = asyncio.get_event_loop()
    with pool() as executor:
        loop.set_default_executor(executor)
        asyncio.ensure_future(infinite_task(loop))
        loop.run_forever()
    loop.close()

if __name__ == '__main__':
    main()
This code works just fine, creating a pool of as many processes as I have CPU cores, and happily chugging along forever. I wanted to see how threads would compare to processes, but I could not get a working example. Here are the changes I made:
from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)

def task(value):
    engine.dispose()
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove session once the work is done
    Session.remove()
This version runs for a while before producing a flood of "too many clients" exceptions:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: sorry, too many clients already
What am I missing?
It turns out that the problem is engine.dispose(), which, in the words of Mike Bayer (zzzeek) "is leaving PG connections lying open to be garbage collected."
Source: https://groups.google.com/forum/#!topic/sqlalchemy/zhjCBNebnDY
So the updated task function looks like this:
def task(value):
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove the session from the registry once the work is done
    Session.remove()
It looks like you're opening a lot of new connections without closing them. Try adding engine.dispose() after the work is done:
from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)

def task(value):
    engine.dispose()
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove session once the work is done
    Session.remove()
    engine.dispose()
Keep in mind the cost of a new connection; ideally you should have one connection per process/thread. I'm not sure how ThreadPoolExecutor manages this, but connections are probably not being closed when a thread finishes its work.