I wonder, is it fine to keep a reference to a db and a collection as class members?
Like this:
from pymongo import MongoClient

class ClientDataStore(object):
    BASE_MONGO_CONNECTION_URL = 'mongodb://localhost:27017/'
    MAIN_DB_NAME = "bank"
    CLIENT_COLLECTION_NAME = "client"

    def __init__(self):
        self.mongo = MongoClient(ClientDataStore.BASE_MONGO_CONNECTION_URL)
        self.db = self.mongo[ClientDataStore.MAIN_DB_NAME]
        self.client_collection = self.db[ClientDataStore.CLIENT_COLLECTION_NAME]

    def get_client_info(self, id):
        client = self.client_collection.find_one({"_id": id})
        return client
Will it keep the connection open, or will it open one as necessary?
Or should I open the db and get the collection only when I need them?
Thanks
This is a good idea. MongoClient has a connection pool that keeps open connections indefinitely. Keeping an open connection will reduce latency and increase throughput in your application. See the Connection Pool FAQ for PyMongo.
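For instance, a minimal sketch of that pattern, reusing the names from the question (the module-level client here is an illustrative variation, not required):

from pymongo import MongoClient

# One client per process: created at import time, its connection pool
# is then shared by every ClientDataStore instance.
_client = MongoClient('mongodb://localhost:27017/')

class ClientDataStore(object):
    def __init__(self):
        self.client_collection = _client['bank']['client']

    def get_client_info(self, client_id):
        return self.client_collection.find_one({"_id": client_id})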
I have an issue in my Flask app concerning SQLAlchemy and MySQL.
In one of my files, connection.py, I have a function that creates a DB connection and sets it as a global variable:
import pymysql

db = None

def db_connect(force=False):
    global db
    db = pymysql.connect(.....)

def makecursor():
    cursor = db.cursor(pymysql.cursors.DictCursor)
    return db, cursor
And then I have a User model created with SQLAlchemy in models.py:
class User(Model):
    id = column........
It inherits from Model, which is a class that I create in another file, orm.py:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

import connection

Engine = create_engine(url, creator=lambda x: connection.makecursor()[0], pool_pre_ping=True)
session_factory = sessionmaker(bind=Engine, autoflush=autoflush)
Session = scoped_session(session_factory)

class _Model:
    query = Session.query_property()

Model = declarative_base(cls=_Model, constructor=model_constructor)
In my application I can have long-running scripts, so the DB connection times out. So I have a function that "reconnects" my DB (it actually only creates a new connection and replaces the global db variable).
My goal is to be able to catch the closing of my DB connection and reconnect instantly. I tried with SQLAlchemy events but it never worked.
Here are some lines that reproduce the error:
res = User.query.filter_by(username="myuser@gmail.com").first()
connection.db.close()
# connection.reconnect()  # --> SOLUTION
res = User.query.filter_by(username="myuser@gmail.com").first()
If you guys have any ideas of how to achieve that, let me know 🙏🏻
Oh, and I forgot: this application is still running on Python 2.7.
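A hedged sketch of one way that catch-and-retry could look, assuming connection.py and orm.py as above; run_with_reconnect is a hypothetical helper and untested against this exact setup:

from sqlalchemy import exc
import connection
from orm import Engine, Session

def run_with_reconnect(query_fn):
    # Hypothetical helper: retry once after rebuilding the connection.
    try:
        return query_fn()
    except exc.OperationalError:
        Session.rollback()  # reset the failed session state
        connection.db_connect(force=True)  # replace the global db
        Engine.dispose()  # drop any stale pooled connections
        return query_fn()

res = run_with_reconnect(
    lambda: User.query.filter_by(username="myuser@gmail.com").first())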
I'm a newbie to Python and Flask, and I use Oracle. While learning from the Flask tutorial I wrote the code below, but it smells really bad; please help me with these questions, thanks a lot!
1) Do I need to release the connection to the pool explicitly?
2) How can I implement pool acquire and release gracefully?
from flask import g, json

def get_dbpool():
    # connect_db() and `app` are defined elsewhere
    if not hasattr(g, 'db_pool'):
        g.db_pool = connect_db()
    return g.db_pool

@app.teardown_appcontext
def close_db(error):
    if hasattr(g, 'db_pool'):
        g.db_pool.close()

@app.route('/')
def hello_world():
    db = get_dbpool().acquire()
    cursor = db.cursor()
    sql = ''
    cursor.execute(sql)
    rows = cursor.fetchall()
    cursor.close()
    get_dbpool().release(db)
    return json.jsonify(combines=rows)
There is no need to release the connection to the pool explicitly unless you intend to keep processing for some time and don't need the connection any longer. cx_Oracle automatically releases the connection back to the pool when the connection goes out of scope (when the function ends), provided that you haven't created a circular reference to the connection, of course! In that case you would have to wait until garbage collection runs. Hopefully that answers your questions!
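If you do want the acquire/release pairing to be explicit and graceful, one option is a small context manager. This is only a sketch, assuming get_dbpool() returns a cx_Oracle session pool as above; pooled_connection and the /items route are hypothetical:

from contextlib import contextmanager

@contextmanager
def pooled_connection():
    # Hypothetical helper: pairs every acquire with a release,
    # even if the body raises.
    pool = get_dbpool()
    conn = pool.acquire()
    try:
        yield conn
    finally:
        pool.release(conn)

@app.route('/items')
def items():
    with pooled_connection() as db:
        cursor = db.cursor()
        cursor.execute('SELECT sysdate FROM dual')
        rows = cursor.fetchall()
        cursor.close()
    return json.jsonify(combines=rows)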
There are many questions about whether the Django DB connection is thread safe, but they all seem to be asking about the default request threads.
What if I am writing a custom script that uses database connections in threads:
from django.db import connections
import threading

class Transform(object):
    def transform_data(self, listing):
        cursor = self.connection.cursor()
        cursor.execute('SELECT ... WHERE id = %s', listing.id)
        data = cursor.fetchall()
        ...

    def run(self):
        connection = self.connections['legacy']
        for listing in listings:
            threading.Thread(target=self.transform_data, args=[listing])
How safe is the data inside the transform_data thread, in terms of the cursor's results not getting mixed up with other threads' results?
Ideally each thread should be using its own connection. If you do that, then when you execute the select query inside transform_data you are essentially getting a snapshot of the data at that point in time. You can retrieve the rows without having to worry about them being updated or deleted by other threads, provided that the other threads have their own connections.
If all threads share the same connection, what exactly happens depends heavily on which database you are using and the transaction isolation level.
Each item in the connections object returns a thread-local connection to that database. By default, these connections cannot be shared between threads; attempting to do so will result in a DatabaseError.
Always use connections[alias] within the thread that executes your queries. Never access connections[alias] in the parent thread and pass the object to the child thread. This will ensure that every connection object you use is local to the current thread, avoiding any threading issues.
To fix your code and make it thread-safe, you would change it like this:
from django.db import connections
import threading

class Transform(object):
    def transform_data(self, listing):
        # Access the database connection on the global `connections`
        # object from within the child thread.
        cursor = connections['legacy'].cursor()
        # Pass query parameters as a sequence, as Django's cursor expects.
        cursor.execute('SELECT ... WHERE id = %s', [listing.id])
        data = cursor.fetchall()
        ...

    def run(self):
        for listing in listings:
            threading.Thread(target=self.transform_data, args=[listing]).start()
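One caveat worth sketching as a follow-up: Django only closes connections automatically at the end of a request, so threads you spawn by hand should close their own connection when they finish. A possible shape for that cleanup, reusing the names above:

from django.db import connections

class Transform(object):
    def transform_data(self, listing):
        try:
            cursor = connections['legacy'].cursor()
            cursor.execute('SELECT ... WHERE id = %s', [listing.id])
            data = cursor.fetchall()
        finally:
            # This thread's connection is not tied to a request, so Django
            # will not close it; release it before the thread exits.
            connections['legacy'].close()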
So I am building a Mongo database class that will provide access for inserting documents via the insertion service and for viewing documents via a querying service. Right now I have the following for my database.py class:
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017/')
db_connection = client['my_database']

class DB_Object(object):
    """ A class providing structure and access to the Database """

    def add_document(self, json_obj):
        coll = db_connection["some collection"]
        document = {
            "name" : "imma name",
            "raw value" : 777,
            "converted value" : 333
        }
        coll.insert(document)

    def query_response(self, query):
        """query logic here"""
If I want concurrent queries and inserts, with this class being called by multiple services, is this the correct location for the lines:
client = pymongo.MongoClient('mongodb://localhost:27017/')
db_connection = client['my_database']
And is this a standard way to provide access?
Your code is correct. You should continue to use the same MongoClient instance for all operations in your application; this will ensure that all operations share the same connection pool and use as few connections as possible, which maximizes your efficiency. MongoClient is thread-safe, so this will work even if you have concurrent operations on multiple threads.
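As a minimal sketch of that thread-safety claim, reusing DB_Object from the question (the worker function and thread count are illustrative assumptions):

import threading

store = DB_Object()

def worker(n):
    # All threads reuse the single module-level client and its pool.
    store.add_document({"value": n})

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()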
I'm developing on Heroku using their Postgres add-on with the Dev plan, which has a connection limit of 20. I'm new to Python and this may be trivial, but I find it difficult to abstract the database connection without causing OperationalError: (OperationalError) FATAL: too many connections for role.
Currently I have databeam.py:
import os
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from settings import databaseSettings

class Db(object):
    def __init__(self):
        self.app = Flask(__name__)
        self.app.config.from_object(__name__)
        self.app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', databaseSettings())
        self.db = SQLAlchemy(self.app)

db = Db()
And when I'm creating a controller for a page, I do this:
import databeam
db = databeam.db
locations = databeam.locations
templateVars = db.db.session.query(locations).filter(locations.parent == 0).order_by(locations.order.asc()).all()
This does produce what I want, but slowly, and at times it causes the error mentioned above. Since I come from a PHP background I have a certain mindset of how to deal with DB connections (i.e. like the example above), but I fear it doesn't fit well with Python.
What is the proper way of abstracting the db connection in one place and then just using the same connection in all imports?
Within SQLAlchemy you should be able to create a connection pool. The pool size applies per dyno, so on the Dev and Basic plans, where you can have up to 20 connections, you could set it to 20 if you run 1 dyno, 10 if you run 2, etc. To configure your pool you can set up the engine:
engine = create_engine('postgresql://me@localhost/mydb',
                       pool_size=20, max_overflow=0)
This sets up your db engine with a pool that you then draw from automatically. You can also configure the pool manually; more details on that can be found in the SQLAlchemy pooling guide - http://docs.sqlalchemy.org/en/latest/core/pooling.html
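For instance, a sketch of deriving the pool size from the dyno count (DYNO_COUNT is a hypothetical, self-managed environment variable, not something Heroku sets for you):

import os
from sqlalchemy import create_engine

# Split the plan's 20-connection budget across dynos so the total
# never exceeds the limit; DYNO_COUNT is an assumed, self-managed var.
dynos = int(os.environ.get('DYNO_COUNT', '1'))
engine = create_engine(
    os.environ['DATABASE_URL'],
    pool_size=20 // dynos,
    max_overflow=0,
)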