Redis-py: when to use a connection pool?

pool = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0)
r = redis.Redis(connection_pool=pool)
vs.
r = redis.Redis(host='10.0.0.1', port=6379, db=0)
Both of those work fine.
What's the idea behind using a connection pool? When would you use it?

From the redis-py docs:
Behind the scenes, redis-py uses a connection pool to manage connections to a Redis server. By default, each Redis instance you create will in turn create its own connection pool. You can override this behavior and use an existing connection pool by passing an already created connection pool instance to the connection_pool argument of the Redis class. You may choose to do this in order to implement client side sharding or have finer grain control of how connections are managed.
So, normally this is not something you need to handle yourself, and if you do, then you know!
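For illustration, sharing one explicitly created pool between several client objects might look like the sketch below; the max_connections cap is an illustrative addition, not a required setting:

import redis

# One explicitly created pool, shared by several client objects.
pool = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0, max_connections=20)

r1 = redis.Redis(connection_pool=pool)
r2 = redis.Redis(connection_pool=pool)  # both clients draw sockets from the same pool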

Related

How to initialise connection pool with "n" connections in Redis python library using redis.ConnectionPool

In the Redis Python library, the redis.ConnectionPool class creates and manages a pool of connections to a Redis server. A connection pool maintains a set of idle connections that can be used to send commands to the server. When a connection is needed, the pool hands out an idle one if available, creating a new connection only if none is free. When a connection is no longer needed, it is returned to the pool to be reused later.
I want to perform a load test using Locust and exclude the time spent creating new connections.
What is a clean way to initialize the pool with (say) 100 connections?
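One possible approach (a sketch, not an official recipe from the redis-py docs) is to set max_connections and warm the pool by checking out the desired number of connections up front and then releasing them all; get_connection() establishes the underlying socket, so later commands skip the connect cost. The "PING" argument is just a command-name hint required by older get_connection() signatures:

import redis

POOL = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0, max_connections=100)

def warm_pool(pool, n=100):
    # Check out n connections so the pool actually opens the sockets,
    # then hand them all back for reuse.
    connections = [pool.get_connection("PING") for _ in range(n)]
    for connection in connections:
        pool.release(connection)

warm_pool(POOL, 100)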

Concurrency-safe way to initialize global data connections in Flask

Global variables are not thread-safe or "process-safe" in Flask.
However, I need to open connections to services that each worker will use, such as a PubSub client or a Cloud Storage client. It seems like these still need to be global so that any function in the application can access them. To lazily initialize them, I check if the variable is None, and this needs to be thread-safe. What is the recommended approach for opening connections that each request will use? Should I use a thread lock to synchronize?
The question you linked is talking about data, not connections. Having multiple workers mutating global data is not good because you can't reason about where those workers are in a web application to keep them in sync.
The solution to that question is to use an external data source, like a database, which must be connected to somehow. Your idea to have one global connection is not safe though, since multiple worker threads would interact with it concurrently and either mess with each other's state or wait one at a time to acquire the resource. The simplest way to handle this is to establish a connection in each view when you need it.
This example shows how to have a unique connection per request, without globals, reusing the connection once it's established for the request. The g object, while it looks like a global, is implemented as a thread-local behind the scenes, so each worker gets its own g instance, and the connection is stored on it during one request only.
from flask import Flask, g, jsonify

app = Flask(__name__)

def get_conn():
    """Use this function to establish or get the already established
    connection during a request. The connection is closed at the end
    of the request. This avoids having a global connection by storing
    the connection on the g object per request.
    """
    if "conn" not in g:
        g.conn = make_connection(...)
    return g.conn

@app.teardown_request
def close_conn(e):
    """Automatically close the connection after the request if
    it was opened.
    """
    conn = g.pop("conn", None)
    if conn is not None:
        conn.close()

@app.route("/get_data")
def get_data():
    # If something else has already used get_conn during the
    # request, this will return the same connection. Anything
    # that uses it after this will also use the same connection.
    conn = get_conn()
    data = conn.query(...)
    return jsonify(data)
You might eventually find that establishing a new connection each request is too expensive once you have many thousands of concurrent requests. One solution is to build a connection pool to store a list of connections globally, with a thread-safe way to acquire and replace a connection in the list as needed. SQLAlchemy (and Flask-SQLAlchemy) uses this technique. Many libraries already provide connection pool implementations, so either use them or use them as a reference for your own.
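For illustration only, a minimal thread-safe pool can be built on queue.Queue, which handles the locking internally; make_connection below is a hypothetical factory, not a library function:

import queue

class SimplePool:
    """A bare-bones sketch of a thread-safe connection pool."""

    def __init__(self, make_connection, size=10):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(make_connection())

    def acquire(self, timeout=5):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Hand the connection back for another worker to reuse.
        self._pool.put(conn)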

Client connection in Pymongo

How does MongoClient work, and does it create a connection pool or threads?
What are the major resources used if I create multiple connections?
My main reason for asking is this:
I have created multiple classes in Python, each of which represents the functionality of a single collection in MongoDB. In each class I am creating a client:
self.client = MongoClient(hostname, port)
What resources do I need to worry about, and what could the performance issues be?
Is there a way I can share a single client among all classes?
Make one MongoClient. Make it a global variable in a module:
client = MongoClient(host, port)
A MongoClient has a built-in connection pool, and it starts a thread to monitor its connection to your server. For best efficiency, make one MongoClient and share it throughout your program.
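A minimal sketch of that advice; the module and collection names here are made up for the example:

# db.py -- create the client once, at import time
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# people.py -- every collection class imports and reuses the same client
from db import client

class People:
    def __init__(self):
        self.collection = client.my_database.people

    def find_by_first_name(self, first_name):
        return self.collection.find_one({"first-name": first_name})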

Managing connection to redis from Python

I'm using redis-py in my Python application to store simple variables or lists of variables in a Redis database. Since this is not done very often, I thought it would be better to create a connection to the Redis server each time I need to save or retrieve a variable, rather than keep a permanent connection that might time out.
After reading through some basic tutorials, I created the connections using the Redis class, but have not found a way to close the connection, as this is the first time I'm using Redis. I'm not sure if I'm using the best approach for managing the connections, so I would like some advice.
This is how I'm setting or getting a variable now:
import redis

def getVariable(variable_name):
    my_server = redis.Redis("10.0.0.1")
    response = my_server.get(variable_name)
    return response

def setVariable(variable_name, variable_value):
    my_server = redis.Redis("10.0.0.1")
    my_server.set(variable_name, variable_value)
I basically use this code to store the last connection time or to get an average of the requests per second made to my app, and things like that.
Thanks for your advice.
Python uses a reference counter mechanism to deal with objects, so at the end of the blocks, the my_server object will be automatically destroyed and the connection closed. You do not need to close it explicitly.
Now this is not how you are supposed to manage Redis connections. Connecting/disconnecting for each operation is too expensive, so it is much better to maintain the connection opened. With redis-py it can be done by declaring a pool of connections:
import redis

POOL = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0)

def getVariable(variable_name):
    my_server = redis.Redis(connection_pool=POOL)
    response = my_server.get(variable_name)
    return response

def setVariable(variable_name, variable_value):
    my_server = redis.Redis(connection_pool=POOL)
    my_server.set(variable_name, variable_value)
Please note connection pool management is mostly automatic and done within redis-py.
@sg1990: what if you have 10,000 users requiring Redis at the same time? They cannot share a single connection, and you've just created yourself a bottleneck.
With a pool of connections you can create an arbitrary number of connections and simply use get_connection() and release(), as described in the redis-py docs.
A connection per user is huge overkill, since every connection has to maintain an open socket. That way you'd halve the number of, say, concurrent websocket users your machine can handle.
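A hedged sketch of that explicit checkout/check-in, reusing the POOL from the answer above; ordinarily redis.Redis(connection_pool=POOL) does this for you on every command:

connection = POOL.get_connection("GET")
try:
    # Talk to the server over this one checked-out socket.
    connection.send_command("GET", "last_connection_time")
    value = connection.read_response()
finally:
    # Always hand the connection back so other workers can reuse it.
    POOL.release(connection)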
You can also create two clients to talk to two separate logical databases on the same Redis server:
r1 = redis.StrictRedis(host="localhost", port=6379, db=0, decode_responses=True)
r2 = redis.StrictRedis(host="localhost", port=6379, db=1, decode_responses=True)

MongoDB Connection Management in Python

Is it better to do
from pymongo import Connection

conn = Connection()
db = conn.db_name

def contrivedExample(firstName):
    global db
    return db.people.find_one({'first-name': firstName})
or
from pymongo import Connection

def contrivedExample(firstName):
    with Connection() as conn:
        return conn.db_name.people.find_one({'first-name': firstName})
Various basic MongoDB tutorials (Python-oriented or not) imply that an app should connect once at startup; is that actually the case? Does the answer change for non-trivial, long-running applications? Does the answer change for web applications specifically? What are the pros/cons of going single-connection vs. connection-per-request?
Assuming "once at startup" is the right answer, would it be appropriate to start that connection in __init__.py?
The pymongo Connection class supports connection pooling and, as of version 2.2, an auto_start_request option that ensures the same socket is used for a thread's operations for the lifetime of that thread (the default behavior). Additionally, there is built-in support for reconnecting when necessary, although your application code should handle the immediate exception.
To your question, I believe it'd be preferable to rely on pymongo's own connection pooling and request a new connection per thread. This Stack Overflow thread also discusses some best practices and explains some of the options at play, which you may find helpful. If necessary, you have the option of sharing the same socket between threads.
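Putting that together, the connect-once pattern looks like the sketch below. Note this uses MongoClient, which replaced the old Connection class in modern pymongo, and relies on its built-in pool; the host and database names are placeholders:

from pymongo import MongoClient

# Created once at startup (e.g. at module import); pymongo's pool
# hands each thread a socket as needed.
client = MongoClient("localhost", 27017)
db = client.db_name

def contrived_example(first_name):
    return db.people.find_one({"first-name": first_name})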
