How to reduce number of connections using SQLAlchemy + PostgreSQL? - python

I'm developing on heroku using their Postgres add-on with the Dev plan, which has a connection limit of 20. I'm new to python and this may be trivial, but I find it difficult to abstract the database connection without causing OperationalError: (OperationalError) FATAL: too many connections for role.
Currently I have databeam.py:
import os
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from settings import databaseSettings

class Db(object):
    def __init__(self):
        self.app = Flask(__name__)
        self.app.config.from_object(__name__)
        self.app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', databaseSettings())
        self.db = SQLAlchemy(self.app)

db = Db()
And when I'm creating a controller for a page, I do this:
import databeam
db = databeam.db
locations = databeam.locations
templateVars = db.db.session.query(locations).filter(locations.parent == 0).order_by(locations.order.asc()).all()
This does produce what I want, but slowly, and at times causes the error mentioned above. Since I come from a PHP background I have a certain mindset of how to deal with DB connections (i.e. like the example above), but I fear it doesn't fit well with python.
What is the proper way of abstracting the db connection in one place and then just using the same connection in all imports?

Within SQLAlchemy you should be able to create a connection pool. The pool size applies to each dyno. On the Dev and Basic plans, since you can have up to 20 connections, you could set this to 20 if you run 1 dyno, 10 if you run 2, etc. To configure your pool you can set up the engine:
engine = create_engine('postgresql://me@localhost/mydb',
                       pool_size=20, max_overflow=0)
This sets up your db engine with a pool that connections are pulled from automatically. You can also configure the pool manually; more details on that can be found in the SQLAlchemy pooling guide - http://docs.sqlalchemy.org/en/latest/core/pooling.html
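If you stay on Flask-SQLAlchemy as in the question, the same limits can be expressed through its config keys instead of calling create_engine() yourself. A minimal sketch, assuming a classic Flask-SQLAlchemy version where the SQLALCHEMY_POOL_SIZE / SQLALCHEMY_MAX_OVERFLOW keys are supported (newer releases move these into SQLALCHEMY_ENGINE_OPTIONS) and using the modern import path:
import os
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
app.config['SQLALCHEMY_POOL_SIZE'] = 20      # per dyno: at most 20 pooled connections
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0    # never open connections beyond the pool size

db = SQLAlchemy(app)
With one dyno this keeps the total at or under Heroku's 20-connection limit; with two dynos you would drop the pool size to 10, and so on.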

Related

How to avoid the QueuePool limit error using Flask-SQLAlchemy?

I'm developing a webapp using Flask-SQLAlchemy and a Postgres DB, and I have a dropdown list in my webpage which is populated from a select to the DB; after selecting different values a couple of times I get the "sqlalchemy.exc.TimeoutError:".
My package's versions are:
Flask-SQLAlchemy==2.5.1
psycopg2-binary==2.8.6
SQLAlchemy==1.4.15
My parameters for the DB connection are set as:
app.config['SQLALCHEMY_POOL_SIZE'] = 20
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 20
app.config['SQLALCHEMY_POOL_TIMEOUT'] = 5
app.config['SQLALCHEMY_POOL_RECYCLE'] = 10
The error I'm getting is:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
After changing the value of the 'SQLALCHEMY_MAX_OVERFLOW' from 20 to 100 I get the following error after some value changes on the dropdown list.
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: FATAL: sorry, too many clients already
Every time a new value is selected from the dropdown list, four queries are triggered to the database and they are used to populate four corresponding tables in my HTML with the results from that query.
I have a 'db.session.commit()' statement after every single query to the DB, but even though I have it, I get this error after a few value changes to my dropdown list.
I know that I should be looking to correctly manage my connection sessions, but I'm struggling with this. I thought about setting the pool timeout to 5s, instead of the default 30s, in hopes that the session would be closed and returned to the pool faster, but it seems it didn't help.
As a suggestion from @snakecharmerb, I checked the output of:
select * from pg_stat_activity;
I ran the webapp for 10 different values before it showed me an error, which means all the 20+20 sessions were used and were left in an 'idle in transaction' state.
Does anybody have any suggestion on what I should change or look for?
I found a solution to the issue I was facing in another post from Stack Overflow.
When you assign your flask app to your db variable, on top of indicating which Flask app it should use, you can also pass on session options, as below:
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy(app, session_options={'autocommit': True})
The usage of 'autocommit' solved my issue.
Now, as suggested, I'm using:
app.config['SQLALCHEMY_POOL_SIZE'] = 1
app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0
Now everything is working as it should.
The original post which helped me is: Autocommit in Flask-SQLAlchemy
@snakecharmerb, @jorzel, @J_H -> Thanks for the help!
You are leaking connections. A little counterintuitively, you may find you obtain better results with a lower pool limit. A given python thread only needs a single pooled connection for the simple single-database queries you're doing.
Setting the limit to 1, with 0 overflow, will cause you to notice a leaked connection earlier. This makes it easier to pin the blame on the source code that leaked it. As it stands, you have lots of code, and the error is deferred until after many queries have been issued, making it harder to reason about system behavior.
I will assume you're using sqlalchemy 1.4.29.
To avoid leaking, try using this:
from contextlib import closing
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(some_url, future=True, pool_size=1, max_overflow=0)
get_session = scoped_session(sessionmaker(bind=engine))
...

with closing(get_session()) as session:
    try:
        sql = """yada yada"""
        rows = session.execute(text(sql)).fetchall()
        session.commit()
        ...
        # Do stuff with result rows.
        ...
    except Exception:
        session.rollback()
I am using flask-restful. When I got this error -> QueuePool limit of size 20 overflow 20 reached, connection timed out, timeout 5.00 (Background on this error at: https://sqlalche.me/e/14/3o7r), I found out in the logs that my checked-out connections were not closing. I found this out using logger.info(db_session.get_bind().pool.status())
def custom_decorator(error_message, db_session):
    def api_decorator(func):
        def api_request(self, *args, **kwargs):
            try:
                response = func(self)
                db_session.commit()
                return response
            except Exception as err:
                db_session.rollback()
                logger.error(error_message.format(err))
                return error_response(
                    message="Internal Server Error",
                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
                )
            finally:
                db_session.close()
        return api_request
    return api_decorator
So I created this decorator, which handles the db_session closing automatically. Using it I no longer have any checked-out connections left active.
You can use the decorator on your functions as follows:
@custom_decorator("blah", db_session)
def example():
    "some code"

cx_oracle persistent connection on flask+apache+mod_wsgi

I have deployed my flask application on apache+mod_wsgi
I'm using WSGI Daemon mode and have this config in apache httpd.conf:
WSGIDaemonProcess flask_test user=apache group=apache threads=20
For simplicity lets say for each request, I need to execute a query to insert data into Oracle DataBase.
So in my flask application, I have done something like this:
# DB.py
import cx_Oracle

class DB:
    def __init__(self, connection_string):
        self.conn = cx_Oracle.connect(connection_string, threaded=True)

    def insert(self, query):
        cur = self.conn.cursor()
        cur.execute(query)
        cur.close()
        self.conn.commit()
# flask_app.py
from flask import Flask, request, jsonify
from DB import DB

app = Flask(__name__)
db = DB(connection_string)

@app.route("/foo", methods=["POST"])
def foo():
    post_data = request.get_json()
    # parse above data
    # create insert query with parsed data values
    db.insert(insert_processed_data_QUERY)
    # generate response
    return jsonify(response)
When I start the apache+mod_wsgi server, the DB object is created and the DB connection is established.
For all incoming requests, the same DB object is used to execute insert query.
So far this works fine for me. However my concern is that if there are no requests for a long period of time, the DB connection might time out, and then my app will not work for a new request when it comes.
I've been monitoring my application and have observed that the DB connection persists for hours and hours. But I'm pretty sure it might time out if there is no request for 2-3 days(?)
What would be the correct way to ensure that the DB connection will stay open forever? (i.e. as long as the apache server is running)
Use a pool instead of a standalone connection. When you acquire a connection from the pool it will check to see if the connection is no longer valid and automatically dispense a new one. So you need something like this:
pool = cx_Oracle.SessionPool(user=user, password=password, dsn=dsn,
                             min=1, max=2, increment=1)
Then in your code you need to do the following:
with pool.acquire() as connection:
    # do what you need to do with the connection
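Applied to the DB class from the question, that could look roughly like the sketch below. The insert() method mirrors the original snippet; the user/password/dsn parameters and the pool sizes are illustrative assumptions, not the asker's actual values.
import cx_Oracle

class DB:
    def __init__(self, user, password, dsn):
        # One pool per process; connections are validated when acquired,
        # so a stale connection is replaced automatically.
        self.pool = cx_Oracle.SessionPool(user=user, password=password, dsn=dsn,
                                          min=1, max=2, increment=1)

    def insert(self, query):
        # Acquire a connection for the duration of the statement and
        # release it back to the pool when the block exits.
        with self.pool.acquire() as conn:
            cur = conn.cursor()
            cur.execute(query)
            cur.close()
            conn.commit()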

Issue with Stale data Flask/SqlAlchemy

I have the following setup, for which session.query() in SQLAlchemy returns stale data:
Web application running on Flask with Gunicorn + supervisor.
one of the services is composed in this way:
app.py:
@app.route('/api/generatepoinvoice', methods=["POST"])
@auth.login_required
def generate_po_invoice():
    try:
        po_id = request.json['po_id']
        email = request.json['email']
        return jsonify(response=POInvoiceGenerator.get_invoice(po_id, email))
    except Exception as ex:
        app.logger.error("generate_po_invoice(): " + ex.message)
In another folder I have the database-related stuff:
DatabaseModels (folder)
|-->Model.py
|-->Connection.py
This is what is contained in the Connection.py file:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine(DB_BASE_URI, isolation_level="READ COMMITTED")
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
Base = declarative_base()
And this is an extract of the Model.py file:
from DatabaseModels.Connection import Base
from sqlalchemy import Column, String, etc...

class Po(Base):
    __tablename__ = 'PLC_PO'
    id = Column("POId", Integer, primary_key=True)
    code = Column("POCode", String(50))
    etc...
Then I have another file, POInvoiceGenerator.py, that contains the call to the database for fetching some data:
import DatabaseModels.Connection as connection
import DatabaseModels.model as model

def get_invoice(po_code, email):
    try:
        po_code = po_code.strip()
        PLCConnection.session.expire_all()
        po = connection.session.query(model.Po).filter(model.Po.code == po_code).first()
    except Exception as ex:
        logger.error("get_invoice(): " + ex.message)
In subsequent user calls to this service I sometimes start to get errors like "could not find data in the db for that specific code", as if the data were stale.
My first approach was to add isolation_level="READ COMMITTED" to the engine declaration and then to create a scoped session, but the stale data reading keeps happening.
Is there anyone who has any idea whether my setup is wrong (the session and the model are reused among multiple methods and files)?
Thanks in advance.
Even if the solution pointed out by @TonyMountax seems valid and made me discover something that I didn't know about SQLAlchemy, in the end I opted for something different.
I figured out that the connection established by SQLAlchemy was durable, since it was created from a pool of connections every time; this somehow was causing the data to be stale.
I added a NullPool to my code:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.pool import NullPool
engine = create_engine(DB_URI, isolation_level="READ COMMITTED", poolclass=NullPool)
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
And then I'm calling a session close for every query that I make:
session.query("some query..")
session.close()
This will cause SQLAlchemy to create a new connection every time and get fresh data from the db.
I hope that this is the correct way to use it and that it might be useful to someone else.
The way you instantiate your database connections means that they are reused for the next request, and they have some state left from the previous request. SQLAlchemy uses a concept of sessions to interact with the database, so that your data does not abruptly change in a single request even if you happen to perform the same query twice. This makes sense when you are using the ORM query features. For instance, if you were to query len(User.friendlist) twice during the same session, but a friend request was accepted during the request, then it will still show the same number in both locations.
To fix this, you must set up the session on first request, then you must tear it down when the request is finished. To do so is not trivial, but there is a well-established project that does it already: Flask-SQLAlchemy. It's from Pallets, the people behind Flask itself and Jinja2.
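For reference, here is a minimal sketch of what that per-request lifecycle looks like with plain SQLAlchemy, reusing the scoped_session pattern from the question's Connection.py; Flask-SQLAlchemy does essentially this for you. DB_BASE_URI is a placeholder for the question's connection URI.
from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

DB_BASE_URI = "postgresql://user:pass@host/dbname"  # placeholder for the question's URI

app = Flask(__name__)
engine = create_engine(DB_BASE_URI, isolation_level="READ COMMITTED")
Session = scoped_session(sessionmaker(bind=engine))

@app.teardown_appcontext
def remove_session(exception=None):
    # Close the request's session and return its connection to the pool,
    # so the next request starts with a fresh session and fresh data.
    Session.remove()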

Connecting psycopg2 with Python in Heroku

I've been trying for some days to connect my Python 3 script to a PostgreSQL database (psycopg2) in Heroku, without Django.
I found some articles and related questions, but I had to invest a lot of time to get something that I thought should be very straightforward, even for a newbie like me.
I eventually made it work somehow, but hopefully posting the question (and answer) will help other people achieve it faster.
Of course, if anybody has a better way, please share it.
As I said, I had a python script that I wanted to make it run from the cloud using Heroku. No Django involved (just a script/scraper).
Articles that I found helpful at the beginning, even if they were not enough:
Running Python Background Jobs with Heroku
Simple twitter-bot with Python, Tweepy and Heroku
Main steps:
1. Procfile
Procfile has to be:
worker: python3 folder/subfolder/myscript.py
2. Heroku add-on
The Heroku Postgres :: Database add-on has to be added to the appropriate personal app in the Heroku account.
To make sure this was properly set, this was quite helpful.
3. Python script with db connection
Finally, to create the connection in my python script myscript.py, I took this article as a reference and adapted it to Python 3:
import psycopg2
import urllib.parse as urlparse
import os
url = urlparse.urlparse(os.environ['DATABASE_URL'])
dbname = url.path[1:]
user = url.username
password = url.password
host = url.hostname
port = url.port
con = psycopg2.connect(
    dbname=dbname,
    user=user,
    password=password,
    host=host,
    port=port
)
To create a new database, this SO question explains it. Key line is:
con.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
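As a rough sketch of that step, assuming the con object created above (the constant lives in psycopg2.extensions, and the database name below is only a placeholder):
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

# CREATE DATABASE cannot run inside a transaction block, hence autocommit.
con.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = con.cursor()
cur.execute('CREATE DATABASE mynewdb')  # placeholder database name
cur.close()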
You can do it using the SQLAlchemy library.
First, you need to install SQLAlchemy using pip; if you don't have pip installed on your computer, a simple Google search will show you how:
pip install sqlalchemy
Here is a code snippet that does what you want:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
import os

# Put your URL in an environment variable and connect.
engine = create_engine(os.getenv("DATABASE_URL"))
db = scoped_session(sessionmaker(bind=engine))

# Some variables you need.
var1 = 12
var2 = "Einstein"

# Execute statements
db.execute("SELECT id, username FROM users WHERE id=:id AND username=:username",
           {"id": var1, "username": var2}).fetchall()

# Don't forget to commit if you did an insertion, etc...
db.commit()
I wasn't able to parse the DATABASE_URL provided by Heroku with urllib.parse as suggested above, but the following worked for me.
The URL I retrieved from Heroku was in the format:
postgres://username:password@host:port/database
for example:
postgres://jticiuimwernbk:ff78903549d4c6ec13a53a8ffefcd201b937d54c35d976@ec2-52-123-182-987.compute-1.amazonaws.com:5432/dbsd4fdf6c1awq
So I manually dissected it as follows:
user = 'jticiuimwernbk'
password = 'ff78903549d4c6ec13a53a8ffefcd201b937d54c35d976'
host = 'ec2-52-123-182-987.compute-1.amazonaws.com'
port = '5432'
database = 'dbsd4fdf6c1awq'
# Then I created the connection using the above:
con = psycopg2.connect(database=database,
                       user=user,
                       password=password,
                       host=host,
                       port=port)

# and now I was able to perform queries:
cur = con.cursor()
cur.execute("<some SQL query>;")
results = cur.fetchall()
cur.close()
con.close()

Google App Engine and Cloud SQL: Lost connection to MySQL server at 'reading initial communication packet' SQL 2nd Gen

I'm getting an error similar to other posts in this subject.
I tried switching from 1st gen to 2nd gen SQL server (both on us-central1), but it still doesn't work.
I copied my CLOUDSQL_PROJECT from the URL at the top of my project.
I copied my CLOUDSQL_INSTANCE from the properties section on the SQL page.
In my main.py, I'm trying to run Google sample code, and it doesn't work (locally it does, of course):
if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine/'):
    db = MySQLdb.connect(
        unix_socket='/cloudsql/{}:{}'.format(
            CLOUDSQL_PROJECT,
            CLOUDSQL_INSTANCE),
        user=user, passwd=password)
# When running locally, you can either connect to a local running
# MySQL instance, or connect to your Cloud SQL instance over TCP.
else:
    db = MySQLdb.connect(host=host, user=user, passwd=password)

cursor = db.cursor()
cursor.execute('SHOW VARIABLES')

for r in cursor.fetchall():
    self.response.write('{}\n'.format(r))
The documentation is slightly outdated. You should be able to always use the "Instance connection name" property from the SQL properties page to construct the unix socket path; just append that value after the "/cloudsql/" prefix.
For second generation, the connection format is project:region:name. In your example, it maps to "hello-world-123:us-central1:sqlsomething3", and the unix socket path is "/cloudsql/hello-world-123:us-central1:sqlsomething3".
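For illustration, a minimal sketch of the connect call using the full instance connection name (the value below is the one from the example above; substitute your own, and the user/password variables are assumed to be the same ones as in the question's snippet):
import MySQLdb

INSTANCE_CONNECTION_NAME = 'hello-world-123:us-central1:sqlsomething3'

db = MySQLdb.connect(
    unix_socket='/cloudsql/' + INSTANCE_CONNECTION_NAME,
    user=user, passwd=password)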
