So I am building a MongoDB database class that will provide access for inserting documents via an insertion service and for viewing documents via a querying service. Right now I have the following in my database.py module:
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017/')
db_connection = client['my_database']

class DB_Object(object):
    """ A class providing structure and access to the Database """

    def add_document(self, json_obj):
        coll = db_connection["some collection"]
        # Placeholder document; json_obj is not used yet.
        document = {
            "name": "imma name",
            "raw value": 777,
            "converted value": 333
        }
        # insert() is deprecated in PyMongo 3+; insert_one() is the
        # current single-document insert.
        coll.insert_one(document)

    def query_response(self, query):
        """query logic here"""
If I want concurrent queries and inserts, with this class being called by multiple services, is this the correct location for these lines:
client = pymongo.MongoClient('mongodb://localhost:27017/')
db_connection = client['my_database']
And is this a standard way to provide access?
Your code is correct. You should continue to use the same MongoClient instance for all operations in your application. This ensures that all operations share the same connection pool and use as few connections as possible, which maximizes efficiency. MongoClient is thread-safe, so this works even with concurrent operations on multiple threads.
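For illustration, a minimal sketch of that pattern (the module, database, and collection names here are assumptions, not from the original post):

# db.py - create the client once at import time; every importer shares it.
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['my_database']

def add_document(doc):
    # insert_one is safe to call from multiple threads; the client's
    # connection pool hands each operation its own pooled socket.
    return db['some_collection'].insert_one(doc).inserted_id

def find_documents(query):
    # Materialize the cursor so callers get a plain list back.
    return list(db['some_collection'].find(query))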
I wonder, is it fine if I keep a reference to a db and a collection as class members?
Like this:
from pymongo import MongoClient

class ClientDataStore(object):
    BASE_MONGO_CONNECTION_URL = 'mongodb://localhost:27017/'
    MAIN_DB_NAME = "bank"
    CLIENT_COLLECTION_NAME = "client"

    def __init__(self):
        self.mongo = MongoClient(ClientDataStore.BASE_MONGO_CONNECTION_URL)
        self.db = self.mongo[ClientDataStore.MAIN_DB_NAME]
        self.client_collection = self.db[ClientDataStore.CLIENT_COLLECTION_NAME]

    def get_client_info(self, id):
        client = self.client_collection.find_one({"_id": id})
        return client
Will it keep the connection open, or will it open one as necessary?
Or should I open the db and get the collection only when I need them?
Thanks
This is a good idea. MongoClient has a connection pool that keeps open connections indefinitely. Keeping an open connection will reduce latency and increase throughput in your application. See the Connection Pool FAQ for PyMongo.
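As a usage sketch (assuming the ClientDataStore class above), create the store once at startup and reuse it, rather than constructing a new one per request:

store = ClientDataStore()  # one instance, one connection pool

def handle_request(client_id):
    # Reuses the pooled connections instead of reconnecting each time.
    return store.get_client_info(client_id)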
I need to access two different collections, each in its own database, on the same server. For example, I need the collection "dummy" in the database "dummy" and the collection "foo" in the database "bar". To connect to a single database I have been using this code:
client = MongoClient()
db = client.dummy()
collection = db['dummy']
But if I also add
db1 = client.bar
collection = db1['foo']
This is not working.
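For what it's worth, the failure is likely the parentheses in client.dummy(): in PyMongo, databases are accessed as attributes or by key, never called like functions. A minimal sketch of the two-database pattern under that assumption:

from pymongo import MongoClient

client = MongoClient()  # one client can address every database on the server

dummy_collection = client['dummy']['dummy']  # collection "dummy" in database "dummy"
foo_collection = client['bar']['foo']        # collection "foo" in database "bar"

# Use distinct variable names so the second lookup does not clobber the first.
print(dummy_collection.find_one())
print(foo_collection.find_one())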
There are many questions about whether the Django db connection is thread-safe, but they all seem to be asking about the default request threads.
What if I am writing a custom script that uses a database connection in threads:
from django.db import connections
import threading

class Transform(object):
    def transform_data(self, listing):
        cursor = self.connection.cursor()
        cursor.execute('SELECT ... WHERE id = %s', listing.id)
        data = cursor.fetchall()
        ...

    def run(self):
        connection = self.connections['legacy']
        for listing in listings:
            threading.Thread(target=self.transform_data, args=[listing])
How safe is the data inside the transform_data thread, in the sense that the result from one thread's cursor is not mixed up with the results from other threads?
Ideally each thread should be using its own connection. If you do that, then when you execute the select query inside transform_data you are essentially getting a snapshot of the data at that point in time. You can retrieve the rows without having to worry about their being updated or deleted by other threads, provided that the other threads have their own connections.
If all threads share the same connection, what exactly happens depends heavily on which database you are using and on its transaction isolation level.
Each item in the connections object returns a thread-local connection to that database. By default, these connections cannot be shared between threads; attempting to do so will result in a DatabaseError.
Always use connections[alias] within the thread that executes your queries. Never access connections[alias] in the parent thread and pass the object to the child thread. This will ensure that every connection object you use is local to the current thread, avoiding any threading issues.
To fix your code and make it thread-safe, you would change it like this:
from django.db import connections
import threading

class Transform(object):
    def transform_data(self, listing):
        # Access the database connection on the global `connections` object
        # from within the child thread, so each thread gets its own
        # thread-local connection.
        cursor = connections['legacy'].cursor()
        # Query parameters are passed as a sequence.
        cursor.execute('SELECT ... WHERE id = %s', [listing.id])
        data = cursor.fetchall()
        ...

    def run(self):
        for listing in listings:
            threading.Thread(target=self.transform_data, args=[listing]).start()
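One caveat worth adding (my note, not part of the original answer): Django does not clean up connections opened in custom threads the way it does for request threads, so each worker should close its own connection when it finishes. A sketch of transform_data with that cleanup, assuming the class above:

def transform_data(self, listing):
    try:
        cursor = connections['legacy'].cursor()
        cursor.execute('SELECT ... WHERE id = %s', [listing.id])
        data = cursor.fetchall()
        ...
    finally:
        # Request handling closes connections for you; a custom thread
        # must do it itself or its connection lingers until the thread dies.
        connections['legacy'].close()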
I'm creating an iOS client for App.net and I'm attempting to set up a push notification server. Currently my app can add a user's App.net account id (a string of numbers) and an APNS device token to a MySQL database on my server. It can also remove this data. I've adapted code from these two tutorials:
How To Write A Simple PHP/MySQL Web Service for an iOS App - raywenderlich.com
Apple Push Notification Services in iOS 6 Tutorial: Part 1/2 - raywenderlich.com
In addition, I've adapted this awesome Python script to listen to App.net's App Stream API.
My Python is horrendous, as is my MySQL knowledge. What I'm trying to do is access the APNS device token for the accounts I need to notify. My database table has two fields/columns for each entry, one for user_id and one for device_token. I'm not sure of the terminology, please let me know if I can clarify this.
I've been trying to use peewee to read from the database but I'm in way over my head. This is a test script with a placeholder user_id:
import logging
from pprint import pprint

import peewee
from peewee import *

db = peewee.MySQLDatabase("...", host="localhost", user="...", passwd="...")

class MySQLModel(peewee.Model):
    class Meta:
        database = db

class Active_Users(MySQLModel):
    user_id = peewee.CharField(primary_key=True)
    device_token = peewee.CharField()

db.connect()

# This is the placeholder user_id
userID = '1234'

token = Active_Users.select().where(Active_Users.user_id == userID)
pprint(token)
This then prints out:
<class '__main__.User'> SELECT t1.`id`, t1.`user_id`, t1.`device_token` FROM `user` AS t1 WHERE (t1.`user_id` = %s) [u'1234']
If the code didn't make it clear, I'm trying to query the database for the row with the user_id of '1234' and I want to store the device_token of the same row (again, probably the wrong terminology) into a variable that I can use when I send the push notification later on in the script.
How do I correctly return the device_token? Also, would it be easier to forgo peewee and simply query the database using python-mysqldb? If that is the case, how would I go about doing that?
The call Active_Users.select().where(Active_Users.user_id == userID) returns a SelectQuery over the matching rows, but you are assigning it to a variable called token as if it were just the device_token.
Your assignment should be this:
# select() returns a query over all matching rows, even if there is just one;
# first() evaluates it and returns the first row, or None if nothing matched
matching_user = Active_Users.select().where(Active_Users.user_id == userID).first()
if matching_user is not None:
    token = matching_user.device_token
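On the second question, a raw MySQLdb equivalent for comparison (the table name active_users is an assumption based on the model above; the "..." credentials mirror the placeholders in the question):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="...", passwd="...", db="...")
cursor = conn.cursor()
# Parameterized query; never interpolate userID into the SQL string yourself.
cursor.execute("SELECT device_token FROM active_users WHERE user_id = %s", (userID,))
row = cursor.fetchone()  # None when there is no match
token = row[0] if row else None
conn.close()

Peewee is arguably still the easier option here, since it handles the cursor and row unpacking for you.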
I'm developing on Heroku using their Postgres add-on with the Dev plan, which has a connection limit of 20. I'm new to Python and this may be trivial, but I find it difficult to abstract the database connection without causing OperationalError: (OperationalError) FATAL: too many connections for role.
Currently I have databeam.py:
import os
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from settings import databaseSettings

class Db(object):
    def __init__(self):
        self.app = Flask(__name__)
        self.app.config.from_object(__name__)
        self.app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', databaseSettings())
        self.db = SQLAlchemy(self.app)

db = Db()
And when I'm creating a controller for a page, I do this:
import databeam
db = databeam.db
locations = databeam.locations
templateVars = db.db.session.query(locations).filter(locations.parent == 0).order_by(locations.order.asc()).all()
This does produce what I want, but slowly, and at times it causes the error mentioned above. Since I come from a PHP background I have a certain mindset of how to deal with DB connections (i.e. like the example above), but I fear it doesn't fit well with Python.
What is the proper way of abstracting the db connection in one place and then just using the same connection in all imports?
With SQLAlchemy you should be able to create a connection pool. The pool size applies per dyno: since the Dev and Basic plans allow up to 20 connections, you could set it to 20 if you run 1 dyno, 10 if you run 2, and so on. To configure your pool you can set up the engine:
engine = create_engine('postgresql://me@localhost/mydb',
                       pool_size=20, max_overflow=0)
This sets up your db engine with a pool that connections are drawn from automatically. You can also configure the pool manually; more details on that can be found in the pooling guide of SQLAlchemy - http://docs.sqlalchemy.org/en/latest/core/pooling.html
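Since the question uses Flask-SQLAlchemy rather than a raw engine, the same limits can be set through the app config. A sketch of how the Db class above might be adapted (SQLALCHEMY_POOL_SIZE and SQLALCHEMY_MAX_OVERFLOW were the Flask-SQLAlchemy config keys of that era; treat the exact values as assumptions to tune per dyno count):

        self.app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', databaseSettings())
        # Keep pool_size * dyno_count at or below the plan's 20-connection cap.
        self.app.config['SQLALCHEMY_POOL_SIZE'] = 20
        self.app.config['SQLALCHEMY_MAX_OVERFLOW'] = 0
        self.db = SQLAlchemy(self.app)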