Is having a global database connection allowed in WSGI applications? - python

I need to create a simple project in Flask and I don't want to use SQLAlchemy. In the code snippet below, everyone who connects to the server shares the same connection object, but a new cursor object is created for each request. I am asking because I have never used the Python DB API this way before. Is it correct? Should I create a new connection object for each request, reuse the same connection and cursor for every request, or use the approach below? Which one is correct?
import mysql.connector
from flask import Flask, request

app = Flask(__name__)

try:
    con = mysql.connector.connect(user='root', password='', host='localhost', database='pywork')
except mysql.connector.Error as err:
    print("Something went wrong")

@app.route('/')
def home():
    cursor = con.cursor()
    cursor.execute("INSERT INTO table_name VALUES(NULL,'test record')")
    con.commit()
    cursor.close()
    return ""

WSGI applications may be served by several worker processes and threads, so you might end up with multiple threads using the same connection. You therefore need to find out whether your library's connection implementation is thread safe: check the documentation and see if it claims to provide Level 2 thread safety.
Then you should think about whether you need transactions during your requests. If you do (e.g., requests issue multiple database commands with an inconsistent state in between, or there are possible race conditions), you should use separate connections, because transactions are always connection-wide. Note that some database systems or configurations don't support transactions or don't isolate separate connections from each other.
So if you share a connection, you should assume that you work with autocommit turned on (or better: actually do that).
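If you decide you do want a separate connection per request, a minimal sketch of that pattern, reusing the mysql.connector setup from the question together with Flask's g object and a teardown hook, might look like this:

import mysql.connector
from flask import Flask, g

app = Flask(__name__)

def get_db():
    # Open one connection per request and cache it on flask.g
    if not hasattr(g, 'db'):
        g.db = mysql.connector.connect(user='root', password='',
                                       host='localhost', database='pywork')
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # Runs after every request, whether it succeeded or raised
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()

@app.route('/')
def home():
    cursor = get_db().cursor()
    cursor.execute("INSERT INTO table_name VALUES(NULL, 'test record')")
    get_db().commit()
    cursor.close()
    return ""

This way each request gets its own connection and transaction, so the sharing and thread-safety concerns disappear at the cost of one connect per request.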

Related

Taking mongoengine.connect out of the setting.py in django

Most blog posts and examples on the web about connecting to MongoDB with MongoEngine in Python/Django suggest adding these lines to the app's settings.py file:
from mongoengine import connect
connect('project1', host='localhost')
It works fine in most cases, except for one I have run into recently:
When the database is down!
Say the DB goes down: the process supervising the web server (in my case, Supervisord) will stop running the app because of the exception that connect throws. It may retry a few more times, but once its timeout is reached it will stop trying.
So even the parts of your app that are not tied to the DB will break as well.
A quick solution is to wrap the connect call in a try/except block:
try:
    connect('project1', host='localhost')
except Exception as e:
    print(e)
but I am looking for a better and cleaner way to handle this.
Unfortunately this is not really possible with mongoengine unless you go with the try-except solution like you did.
You could try connecting with pure pymongo version 3.0+ using MongoClient and registering the connection manually in the mongoengine.connection._connection_settings dictionary (quite hacky, but it should work). From the pymongo documentation:
Changed in version 3.0: MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.
The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads.
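For illustration, a minimal sketch of that behaviour (the ping command and the timeout value are just examples): constructing the client succeeds even when the server is down, and the failure only surfaces on the first real operation.

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# In pymongo 3.0+ the constructor never blocks or raises, even if MongoDB is down
client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=2000)

try:
    client.admin.command('ping')  # first real round-trip to the server
except ServerSelectionTimeoutError:
    # The DB is unreachable, but the rest of the app can keep serving requests
    print("MongoDB unreachable")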

How to use Peewee with Tornado perfectly

I'm using peewee with my Tornado web app. When I read peewee's documentation, I found:
Adding Request Hooks
When building web-applications, it is very important that you manage your database connections correctly. In this section I will describe how to add hooks to your web app to ensure the database connection is handled properly.
These steps will ensure that regardless of whether you’re using a simple SQLite database, or a pool of multiple Postgres connections, peewee will handle the connections correctly.
http://docs.peewee-orm.com/en/latest/peewee/database.html
Inside, it shows how to do this for Flask, Django, Bottle and others, but there is no solution for Tornado.
Is there an easy way to solve this in Tornado, or does it not matter at all?
The idea there is that you want to open a connection when a request begins, and close it when the request is finished (the response is returned).
To do this it looks like you can subclass RequestHandler:
from peewee import SqliteDatabase
from tornado.web import RequestHandler

db = SqliteDatabase('my_db.db')

class PeeweeRequestHandler(RequestHandler):
    def prepare(self):
        db.connect()
        return super(PeeweeRequestHandler, self).prepare()

    def on_finish(self):
        if not db.is_closed():
            db.close()
        return super(PeeweeRequestHandler, self).on_finish()
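Application handlers then inherit from PeeweeRequestHandler instead of RequestHandler. A minimal sketch with a hypothetical User model (the model and route are made up for illustration):

from peewee import Model, CharField

class User(Model):
    name = CharField()

    class Meta:
        database = db

class UserHandler(PeeweeRequestHandler):
    def get(self, user_id):
        # db.connect() already ran in prepare(); db.close() will run in on_finish()
        user = User.get(User.id == int(user_id))
        self.write({'name': user.name})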

Flask-SQLAlchemy "MySQL server has gone away" when using HAproxy

I've built a small python REST service using Flask, with Flask-SQLAlchemy used for talking to the MySQL DB.
If I connect directly to the MySQL server everything is good, no problems at all. If I use HAproxy (handles HA/failover, though in this dev environment there is only one DB server) then I constantly get MySQL server has gone away errors if the application doesn't talk to the DB frequently enough.
My HAproxy client timeout is set to 50 seconds, so what I think is happening is it cuts the stream, but the application isn't aware and tries to make use of an invalid connection.
Is there a setting I should be using when using services like HAproxy?
Also, it doesn't seem to reconnect automatically; if I then issue a request manually I get Can't reconnect until invalid transaction is rolled back, which is odd since it is just a select() call I'm making, so I don't think it is a commit() I'm missing - or should I be calling commit() after every ORM-based query?
Just to tidy up this question with an answer I'll post what I (think I) did to solve the issues.
Problem 1: HAproxy
Either increase the HAproxy client timeout value (globally, or in the frontend definition) to a value longer than what MySQL is set to reset on (see this interesting and related SF question)
Or set SQLALCHEMY_POOL_RECYCLE = 30 (30 in my case was less than HAproxy client timeout) in Flask's app.config so that when the DB is initialised it will pull in those settings and recycle connections before HAproxy cuts them itself. Similar to this issue on SO.
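For reference, a minimal sketch of that second option (the database URI is a placeholder; 30 is simply a value below the HAproxy client timeout):

app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://user:password@haproxy-host/mydb'
# Recycle pooled connections before HAproxy's 50 second client timeout cuts them
app.config['SQLALCHEMY_POOL_RECYCLE'] = 30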
Problem 2: Can't reconnect until invalid transaction is rolled back
I believe I fixed this by tweaking the way the DB is initialised and imported across various modules. I basically now have a module that simply has:
from flask.ext.sqlalchemy import SQLAlchemy
db = SQLAlchemy()
Then in my main application factory I simply:
from common.database import db
db.init_app(app)
Also, since I wanted to load table structures automatically, I initialised the metadata binds within the app context. I think it was this that cleanly handled the commit() issue/error I was getting, as I believe the database sessions are now being correctly terminated after each request.
with app.app_context():
    # Setup DB binding
    db.metadata.bind = db.engine

What are the connection limits for Google Cloud SQL from App Engine, and how to best reuse DB connections?

I have a Google App Engine app that uses a Google Cloud SQL instance for storing data. I need my instance to be able to serve hundreds of clients at a time, via restful calls, which each result in one or a handful of DB queries. I've wrapped the methods that need DB access and store the handle to the DB connection in os.environ. See this SO question/answer for basically how I'm doing it.
However, as soon as a couple hundred clients connect to my app and trigger database calls, I start getting these errors in the Google App Engine error logs (and my app returns 500, of course):
could not connect: ApplicationError: 1033 Instance has too many concurrent requests: 100 Traceback (most recent call last): File "/base/python27_run
Any tips from experienced users of Google App Engine and Google Cloud SQL? Thanks in advance.
Here's the code for the decorator I use around methods that require DB connection:
def with_db_cursor(do_commit=False):
    """ Decorator for managing DB connection by wrapping around web calls.
    Stores connections and open connection count in the os.environ dictionary
    between calls. Sets a cursor variable in the wrapped function. Optionally
    does a commit. Closes the cursor when wrapped method returns, and closes
    the DB connection if there are no outstanding cursors.

    If the wrapped method has a keyword argument 'existing_cursor', whose value
    is non-False, this wrapper is bypassed, as it is assumed another cursor is
    already in force because of an alternate call stack.

    Based mostly on post by: Shay Erlichmen
    At: https://stackoverflow.com/a/10162674/379037
    """
    def method_wrap(method):
        def wrap(*args, **kwargs):
            if kwargs.get('existing_cursor', False):
                # Bypass everything if method called with existing open cursor
                vdbg('Shortcircuiting db wrapper due to existing_cursor')
                return method(None, *args, **kwargs)

            conn = os.environ.get("__data_conn")

            # Recycling connection for the current request
            # For some reason threading.local() didn't work
            # and yes os.environ is supposed to be thread safe
            if not conn:
                conn = _db_connect()
                os.environ["__data_conn"] = conn
                os.environ["__data_conn_ref"] = 1
                dbg('Opening first DB connection via wrapper.')
            else:
                os.environ["__data_conn_ref"] = (os.environ["__data_conn_ref"] + 1)
                vdbg('Reusing existing DB connection. Count using is now: {0}',
                     os.environ["__data_conn_ref"])
            try:
                cursor = conn.cursor()
                try:
                    result = method(cursor, *args, **kwargs)
                    if do_commit or os.environ.get("__data_conn_commit"):
                        os.environ["__data_conn_commit"] = False
                        dbg('Wrapper executing DB commit.')
                        conn.commit()
                    return result
                finally:
                    cursor.close()
            finally:
                os.environ["__data_conn_ref"] = (os.environ["__data_conn_ref"] - 1)
                vdbg('One less user of DB connection. Count using is now: {0}',
                     os.environ["__data_conn_ref"])
                if os.environ["__data_conn_ref"] == 0:
                    dbg("No more users of this DB connection. Closing.")
                    os.environ["__data_conn"] = None
                    db_close(conn)
        return wrap
    return method_wrap

def db_close(db_conn):
    if db_conn:
        try:
            db_conn.close()
        except:
            err('Unable to close the DB connection.')
            raise
    else:
        err('Tried to close a non-connected DB handle.')
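To show how the decorator is meant to be used, here is a hypothetical handler (the table and column names are made up): the wrapper injects the cursor as the first argument, and commits because do_commit is True.

@with_db_cursor(do_commit=True)
def add_greeting(cursor, message):
    # 'cursor' is supplied by the wrapper; the table is just an example
    cursor.execute("INSERT INTO greetings (message) VALUES (%s)", (message,))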
Short answer:
Your queries are probably too slow and the mysql server doesn't have enough threads to process all of the requests you are trying to send it.
Long Answer:
As background, Cloud SQL has two limits that are relevant here:
Connections: These correspond to the 'conn' object in your code. There is a corresponding data structure on the server. Once you have too many of these objects (currently configured to 1000), the least recently used will automatically be closed. When a connection gets closed underneath you, you'll get an unknown connection error (ApplicationError: 1007) the next time you try to use that connection.
Concurrent Requests: These are queries that are executing on the server. Each executing query ties up a thread in the server, so there is a limit of 100. When there are too many concurrent requests, subsequent requests will be rejected with the error you are getting (ApplicationError: 1033)
It doesn't sound like the connection limit is affecting you, but I wanted to mention it just in case.
When it comes to Concurrent Requests, increasing the limit might help, but it usually makes the problem worse. There are two cases we've seen in the past:
Deadlock: A long running query is locking a critical row of the database. All subsequent queries block on that lock. The app times out on those queries, but they keep running on the server, tying up those threads until the deadlock timeout triggers.
Slow Queries: Each query is really, really slow. This usually happens when the query requires a temporary file sort. The application times out and retries the query while the first try of the query is still running and counting against the concurrent request limit. If you can find your average query time, you can get an estimate of how many QPS your mysql instance can support (e.g. 5 ms per query means 200 QPS for each thread. Since there are 100 threads, you could do 20,000 QPS. 50 ms per query means 2000 QPS.)
You should use EXPLAIN and SHOW ENGINE INNODB STATUS to see which of the two problems is going on.
Of course, it is also possible that you are just driving a ton of traffic at your instance and there just aren't enough threads. In that case, you'll probably be maxing out the cpu for the instance anyway, so adding more threads won't help.
I read the documentation and noticed there is a limit of 12 connections per instance:
Look for "Each App Engine instance cannot have more than 12 concurrent connections to a Google Cloud SQL instance." in https://developers.google.com/appengine/docs/python/cloud-sql/

Database connection management when using only Django ORM

I am using the Django ORM layer outside of Django. The project is a web application built on a custom in-house framework.
Now, I had no problems setting up the Django ORM to run standalone, but I am a little bit worried about connection management. I have read Using only DB part of Django here at SO, and it is true that Django does some special connection handling at the beginning and end of each request. From django/db/__init__.py:
# Register an event that closes the database connection
# when a Django request is finished.
def close_connection(**kwargs):
    for conn in connections.all():
        conn.close()
signals.request_finished.connect(close_connection)

# Register an event that resets connection.queries
# when a Django request is started.
def reset_queries(**kwargs):
    for conn in connections.all():
        conn.queries = []
signals.request_started.connect(reset_queries)

# Register an event that rolls back the connections
# when a Django request has an exception.
def _rollback_on_exception(**kwargs):
    from django.db import transaction
    for conn in connections:
        try:
            transaction.rollback_unless_managed(using=conn)
        except DatabaseError:
            pass
signals.got_request_exception.connect(_rollback_on_exception)
What problems can I run into if I skip this connection management? (I have no way to plug in those signals into my framework easily)
It depends on your use case. Each of these functions do something specific, which may or may not affect you.
If this is a long-running process and you have DEBUG on, you'll need to reset the queries or it'll keep all the queries you've run in memory.
If you spawn lots of threads, connect to the DB once early on in each thread, and then leave the threads running, you'll also want to close the connections when you're done using the DB, or you could hit your DB's connection limit.
You almost certainly don't need _rollback_on_exception - I'm assuming you have your intended transaction behavior set up within the relevant code itself.
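If you cannot plug into those signals, one option (a sketch; where you call it depends on your framework's own request hooks) is to do the equivalent cleanup by hand at the end of each request:

from django.db import connections, reset_queries

def end_of_request_cleanup():
    # Equivalent of Django's request_started/request_finished handlers:
    # drop the query log kept when DEBUG is on, and close open connections.
    reset_queries()
    for conn in connections.all():
        conn.close()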
