I'm working on an application that uses Flask, SQLAlchemy, and PostgreSQL. I have to write a transaction that executes multiple queries against the database.
    def exec_query_1():
        with db.engine.connect() as connection:
            connection.execute(...)  # some query

    def exec_query_2():
        with db.engine.connect() as connection:
            connection.execute(...)  # some query

    def exec_query_3():
        with db.engine.connect() as connection:
            connection.execute(...)  # some query

    def execute_transaction():
        with db.engine.connect() as connection:
            with connection.begin() as transaction:
                exec_query_1()
                exec_query_2()
                exec_query_3()
Given that the application is multithreaded, will this code work as expected?
If yes, how? If no, what would be the right approach to make it work?
The code will not work as expected, even in a single thread. The connections opened in the functions are separate¹ from the connection used in execute_transaction() and have their own transactions. You should arrange your code so that the functions receive the connection with the ongoing transaction as an argument:
    def exec_query_1(connection):
        connection.execute(...)  # some query

    def exec_query_2(connection):
        connection.execute(...)  # some query

    def exec_query_3(connection):
        connection.execute(...)  # some query

    def execute_transaction():
        with db.engine.connect() as connection:
            with connection.begin() as transaction:
                exec_query_1(connection)
                exec_query_2(connection)
                exec_query_3(connection)
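If you don't need a handle on the transaction object itself, SQLAlchemy's engine.begin() is an equivalent shorthand that opens a connection and starts a transaction in one step, committing on success and rolling back on an exception:

    def execute_transaction():
        # begin() commits when the block exits cleanly, rolls back on error
        with db.engine.begin() as connection:
            exec_query_1(connection)
            exec_query_2(connection)
            exec_query_3(connection)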
Remember that connections are not thread-safe, so don't share them between threads. "When do I construct a Session, when do I commit it, and when do I close it?" is a good read, although it is about Session.
¹ May depend on pool configuration.
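Because each call to execute_transaction() checks its own connection out of the engine's pool (the Engine itself is thread-safe), the corrected version can be run from several threads as-is — a minimal sketch:

    import threading

    # each thread runs its own execute_transaction(), so no connection
    # object is ever shared across threads
    threads = [threading.Thread(target=execute_transaction) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()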
I have deployed my Flask application on Apache + mod_wsgi.
I'm using WSGI daemon mode and have this config in Apache's httpd.conf:

    WSGIDaemonProcess flask_test user=apache group=apache threads=20

For simplicity, let's say that for each request I need to execute a query to insert data into an Oracle Database.
So in my Flask application, I have done something like this:
    # DB.py
    import cx_Oracle

    class DB:
        def __init__(self, connection_string):
            self.conn = cx_Oracle.connect(connection_string, threaded=True)

        def insert(self, query):
            cur = self.conn.cursor()
            cur.execute(query)
            cur.close()
            self.conn.commit()
    # flask_app.py
    from flask import Flask, request, jsonify
    from DB import DB

    app = Flask(__name__)
    db = DB(connection_string)

    @app.route("/foo", methods=["POST"])
    def foo():
        post_data = request.get_json()
        # parse above data
        # create insert query with parsed data values
        db.insert(insert_processed_data_QUERY)
        # generate response
        return jsonify(response)
When I start the Apache + mod_wsgi server, the DB object is created and the database connection is established.
For all incoming requests, the same DB object is used to execute the insert query.
So far this works fine for me. However, my concern is that if there are no requests for a long period of time, the DB connection might time out, and then my app will not work when a new request comes in.
I've been monitoring my application and have observed that the DB connection persists for hours and hours. But I'm pretty sure it would time out if there were no requests for 2-3 days(?)
What would be the correct way to ensure that the DB connection stays open forever? (i.e. as long as the Apache server is running)
Use a pool instead of a standalone connection. When you acquire a connection from the pool, it will check whether the connection is still valid and automatically dispense a new one if it isn't. So you need something like this:

    pool = cx_Oracle.SessionPool(user=user, password=password, dsn=dsn,
                                 min=1, max=2, increment=1)
Then in your code you need to do the following:

    with pool.acquire() as connection:
        # do what you need to do with the connection
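Putting the two together, here is a minimal sketch of how the DB class from the question could be built on a pool instead of a single long-lived connection (the user/password/dsn constructor arguments are my assumption; the original used a single connection string):

    # DB.py (sketch)
    import cx_Oracle

    class DB:
        def __init__(self, user, password, dsn):
            # the pool replaces the single long-lived connection
            self.pool = cx_Oracle.SessionPool(user=user, password=password,
                                              dsn=dsn, min=1, max=2, increment=1)

        def insert(self, query):
            # acquire() hands out a validated connection, and the context
            # manager releases it back to the pool on exit
            with self.pool.acquire() as conn:
                cur = conn.cursor()
                cur.execute(query)
                cur.close()
                conn.commit()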
I'm a newbie to Python and Flask, and I use Oracle. While working through the Flask tutorial I wrote the code below, but it smells really bad. Please help me with these questions, thanks a lot!
1) Do I need to release the connection to the pool explicitly?
2) How can I implement pool acquire and release gracefully?
    from flask import Flask, g, json

    def get_dbpool():
        if not hasattr(g, 'db_pool'):
            g.db_pool = connect_db()
        return g.db_pool

    @app.teardown_appcontext
    def close_db(error):
        if hasattr(g, 'db_pool'):
            g.db_pool.close()

    @app.route('/')
    def hello_world():
        db = get_dbpool().acquire()
        cursor = db.cursor()
        sql = ''
        cursor.execute(sql)
        rows = cursor.fetchall()
        cursor.close()
        get_dbpool().release(db)
        return json.jsonify(combines=rows)
There is no need to release the connection to the pool explicitly unless you intend to keep processing for some time and don't need the connection any longer. cx_Oracle automatically releases the connection back to the pool when the connection goes out of scope (when the function ends), provided that you haven't created a circular reference to the connection, of course! In that case you would have to wait until garbage collection runs. Hopefully that answers your questions!
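For the "graceful" part of the question: cx_Oracle connections are context managers, so the acquire/release pair can be handled by a with block, which releases the connection back to the pool even if the request raises — a sketch of the route rewritten that way:

    @app.route('/')
    def hello_world():
        # the with block releases the connection back to the pool on exit,
        # even when an exception is raised mid-request
        with get_dbpool().acquire() as db:
            cursor = db.cursor()
            sql = ''
            cursor.execute(sql)
            rows = cursor.fetchall()
            cursor.close()
        return json.jsonify(combines=rows)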
I wonder, is it fine if I keep a reference to a db and a collection as class members?
Like this:
    from pymongo import MongoClient

    class ClientDataStore(object):
        BASE_MONGO_CONNECTION_URL = 'mongodb://localhost:27017/'
        MAIN_DB_NAME = "bank"
        CLIENT_COLLECTION_NAME = "client"

        def __init__(self):
            self.mongo = MongoClient(ClientDataStore.BASE_MONGO_CONNECTION_URL)
            self.db = self.mongo[ClientDataStore.MAIN_DB_NAME]
            self.client_collection = self.db[ClientDataStore.CLIENT_COLLECTION_NAME]

        def get_client_info(self, id):
            client = self.client_collection.find_one({"_id": id})
            return client
Will it keep the connection open, or will it open one as necessary?
Or should I open the db and get the collection only when I need them?
Thanks
This is a good idea. MongoClient has a connection pool that keeps open connections indefinitely. Keeping an open connection will reduce latency and increase throughput in your application. See the Connection Pool FAQ for PyMongo.
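A single shared ClientDataStore instance is enough for the whole process; if you want to bound the pool, PyMongo's maxPoolSize option is the relevant knob (a sketch — the value 50 is arbitrary, not a recommendation):

    from pymongo import MongoClient

    # one client per process; its internal pool is reused by every call,
    # and maxPoolSize caps the number of concurrent connections
    mongo = MongoClient('mongodb://localhost:27017/', maxPoolSize=50)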
There are many questions about whether the Django DB connection is thread-safe, but they all seem to be asking about the default request threads.
What if I am writing a custom script that uses database connections in threads:
    from django.db import connections
    import threading

    class Transform(object):
        def transform_data(self, listing):
            cursor = self.connection.cursor()
            cursor.execute('SELECT ... WHERE id = %s', [listing.id])
            data = cursor.fetchall()
            ...

        def run(self):
            self.connection = connections['legacy']
            for listing in listings:
                threading.Thread(target=self.transform_data, args=[listing]).start()
How safe is the data inside the transform_data thread, in the sense that the result from the cursor does not get mixed up with other threads' results?
Ideally each thread should be using its own connection. If you do that, when you execute the select query inside transform_data you are essentially getting a snapshot of the data at that point in time. You can retrieve the rows without having to worry about them being updated or deleted by other threads, provided that the other threads have their own connections.
If all threads share the same connection, what exactly happens depends heavily on what database you are using and the transaction isolation level.
Each item in the connections object returns a thread-local connection to that database. By default, these connections cannot be shared between threads; attempting to do so will result in a DatabaseError.
Always use connections[alias] within the thread that executes your queries. Never access connections[alias] in the parent thread and pass the object to the child thread. This will ensure that every connection object you use is local to the current thread, avoiding any threading issues.
To fix your code and make it thread-safe, you would change it like this:
    from django.db import connections
    import threading

    class Transform(object):
        def transform_data(self, listing):
            # Access the database connection on the global `connections`
            # object from within the child thread, so each thread gets its
            # own thread-local connection.
            cursor = connections['legacy'].cursor()
            cursor.execute('SELECT ... WHERE id = %s', [listing.id])
            data = cursor.fetchall()
            ...

        def run(self):
            for listing in listings:
                threading.Thread(target=self.transform_data, args=[listing]).start()
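One caveat to add (my addition, not part of the answer above): outside the request/response cycle Django never closes connections for you, so a long-running worker thread should close its thread-local connection when it finishes — a sketch:

    def transform_data(self, listing):
        try:
            cursor = connections['legacy'].cursor()
            cursor.execute('SELECT ... WHERE id = %s', [listing.id])
            data = cursor.fetchall()
            ...
        finally:
            # nothing closes this thread-local connection automatically
            # outside a request, so release it explicitly
            connections['legacy'].close()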
    from sqlalchemy.orm import sessionmaker

    DBSession = sessionmaker(bind=self.engine)

    def add_person(name):
        s = DBSession()
        s.add(Person(name=name))
        s.commit()
Every time I run add_person(), another connection is created to my PostgreSQL DB.
Looking at:

    SELECT count(*) FROM pg_stat_activity;

I see the count going up, until I get a "Remaining connection slots are reserved for non-replication superuser connections" error.
How do I kill those connections? Am I wrong in opening a new session every time I want to add a Person record?
In general, you should keep your Session object (here DBSession) separate from any functions that make changes to the database. So in your case you might try something like this instead:
    DBSession = sessionmaker(bind=self.engine)
    # create your session outside of functions that will modify the database
    session = DBSession()

    def add_person(name):
        session.add(Person(name=name))
        session.commit()
Now you will not get new connections every time you add a person to the database.
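If add_person() can ever be called from multiple threads (as in the first question above), note that a single module-level Session is not thread-safe; SQLAlchemy's scoped_session keeps the same calling style while giving each thread its own session. A sketch (engine stands in for the question's self.engine):

    from sqlalchemy.orm import scoped_session, sessionmaker

    # scoped_session is a thread-local registry: each thread that calls
    # DBSession() gets, and then reuses, its own Session
    DBSession = scoped_session(sessionmaker(bind=engine))

    def add_person(name):
        session = DBSession()
        session.add(Person(name=name))
        session.commit()
        # discard the thread's session, returning its connection to the pool
        DBSession.remove()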