Concurrency-safe way to initialize global data connections in Flask - python

Global variables are not thread-safe or "process-safe" in Flask.
However, I need to open connections to services that each worker will use, such as a PubSub client or a Cloud Storage client. It seems like these still need to be global so that any function in the application can access them. To lazily initialize them, I check if the variable is None, and this needs to be thread-safe. What is the recommended approach for opening connections that each request will use? Should I use a thread lock to synchronize?

The question you linked is talking about data, not connections. Having multiple workers mutate global data is unsafe because you can't reason about where each worker is in its request handling, so you can't keep them in sync.
The solution to that question is to use an external data source, like a database, which must be connected to somehow. Your idea of one global connection is not safe either, since multiple worker threads would interact with it concurrently and either corrupt each other's state or wait one at a time to acquire the resource. The simplest way to handle this is to establish a connection in each view when you need it.
This example shows how to have a unique connection per request, without globals, reusing the connection once it's established for the request. The g object, while it looks like a global, is implemented as a thread-local behind the scenes, so each worker gets its own g instance and connection stored on it during one request only.
from flask import Flask, g, jsonify

app = Flask(__name__)

def get_conn():
    """Use this function to establish or get the already established
    connection during a request. The connection is closed at the end
    of the request. This avoids having a global connection by storing
    the connection on the g object per request.
    """
    if "conn" not in g:
        g.conn = make_connection(...)  # placeholder for your real connect call
    return g.conn

@app.teardown_request
def close_conn(e):
    """Automatically close the connection after the request if
    it was opened.
    """
    conn = g.pop("conn", None)
    if conn is not None:
        conn.close()

@app.route("/get_data")
def get_data():
    # If something else has already used get_conn during the
    # request, this will return the same connection. Anything
    # that uses it after this will also use the same connection.
    conn = get_conn()
    data = conn.query(...)
    return jsonify(data)
You might eventually find that establishing a new connection each request is too expensive once you have many thousands of concurrent requests. One solution is to build a connection pool to store a list of connections globally, with a thread-safe way to acquire and replace a connection in the list as needed. SQLAlchemy (and Flask-SQLAlchemy) uses this technique. Many libraries already provide connection pool implementations, so either use them or use them as a reference for your own.
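To make the pooling idea concrete, here is a minimal sketch of a thread-safe pool built on queue.Queue, reusing the hypothetical make_connection() from above. A real pool (like SQLAlchemy's) also handles connection validation, timeouts, and reconnects, so prefer an existing implementation in production.

import queue

class SimplePool:
    """A minimal fixed-size connection pool. queue.Queue is
    thread-safe, so acquire/release need no extra locking.
    Sketch only: no health checks, timeouts, or reconnects.
    """
    def __init__(self, factory, size=5):
        self._conns = queue.Queue(maxsize=size)
        for _ in range(size):
            self._conns.put(factory())

    def acquire(self):
        # Blocks until a connection is free, so workers wait
        # rather than opening unbounded connections.
        return self._conns.get()

    def release(self, conn):
        self._conns.put(conn)

# Usage: acquire before the query, always release afterward.
# pool = SimplePool(lambda: make_connection(...), size=10)
# conn = pool.acquire()
# try:
#     data = conn.query(...)
# finally:
#     pool.release(conn)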

Related

Should a connection to Redis cluster be made on each Flask request?

I have a Flask API that connects to a Redis cluster for caching purposes. Should I be creating and tearing down a Redis connection on each Flask API call? Or should I try to maintain a connection across requests?
My argument against the second option is that I should really try to keep the API as stateless as possible, and I also don't know whether keeping something persistent across requests might cause race conditions between threads or other side effects.
However, if I do want to persist a connection, should it be saved on the session or on the application context?
This is about performance and scale. To get those two buzzwords buzzing, you'll in fact need persistent connections.
Any eventual race conditions will be no different than with a reconnect on every request, so that shouldn't be a problem. They will depend on how you're using Redis, but if it's just caching there's not much room for error.
I understand the desired statelessness of an API from the client side's point of view, but I'm not so sure what you mean about the server side.
I'd suggest you put the connections in the application context, not the sessions (those could become too numerous), whereas the app context gives you the optimal one connection per process (created immediately at startup). Scaling this way becomes easy: you'll never have to worry about hitting the max connection count on the Redis box (and the less multiplexing the better).
It's a good idea from a performance standpoint to keep connections to a database open between requests. The reason is that opening and closing connections is not free; it takes time, which may become a problem when you have too many requests. Another issue is that a database can only handle up to a certain number of connections, and if you open more, database performance will degrade, so you need to control how many connections are open at the same time.
To solve both of these issues you may use a connection pool. A connection pool contains a number of open database connections and provides access to them. When a database operation should be performed, a connection is taken from the pool; when the operation is completed, the connection is returned to the pool. If a connection is requested while all connections are taken, the caller has to wait until one is returned. Since no new connections are opened in this process (they are all opened in advance), this ensures the database will not be overloaded with too many parallel connections.
If the connection pool is used correctly, a single connection will be used by only one thread at any moment.
Despite the fact that a connection pool has state (it tracks which connections are currently in use), your API remains stateless. This is because, from the API perspective, "stateless" means: no state or side effects visible to an API user. Your server can perform operations that change its internal state, like writing to log files or writing to a cache, but since this does not influence the data returned in replies to API calls, it does not make the API "stateful".
You can see some examples of using a Redis connection pool here.
Regarding where it should be stored, I would use the application context, since that fits its purpose better.
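A minimal sketch of that suggestion, assuming redis-py: create one ConnectionPool per process at startup and store it on the app, so every request shares the pooled sockets instead of opening its own connection. The REDIS_POOL config key and the /cached route are illustrative names, not an established pattern.

import redis
from flask import Flask

app = Flask(__name__)

# One pool per process, created at startup. Clients built from it
# are cheap to create and share the pooled sockets underneath.
app.config["REDIS_POOL"] = redis.ConnectionPool(host="10.0.0.1", port=6379, db=0)

def get_redis():
    # A client bound to the shared pool; safe to create per request.
    return redis.Redis(connection_pool=app.config["REDIS_POOL"])

@app.route("/cached")
def cached():
    value = get_redis().get("some_key")
    return value or b"cache miss"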

Client connection in Pymongo

How does MongoClient work, and how does it handle connection pooling and thread creation?
What are the major resources used if I create multiple connections?
My main reason for asking is this:
I have created multiple classes in Python, each representing the functionality of a single collection in MongoDB. In each class I am creating a client:
self.client = MongoClient(hostname, port)
What resources do I need to worry about, and what could the performance issues be?
Is there a way I can share a single client across all classes?
Make one MongoClient. Make it a global variable in a module:
client = MongoClient(host, port)
A MongoClient has a built-in connection pool, and it starts a thread to monitor its connection to your server. For best efficiency, make one MongoClient and share it throughout your program.
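A minimal sketch of that advice, with hypothetical module and collection names: put the client in a shared module and import it everywhere, so every collection class reuses the same pool and monitor thread.

# db.py (hypothetical shared module)
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# users.py
from db import client

class Users:
    """Wraps a single collection; reuses the shared client's
    connection pool rather than creating its own MongoClient.
    """
    def __init__(self):
        self.collection = client["mydb"]["users"]

    def find_by_name(self, name):
        return self.collection.find_one({"name": name})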

How to call a function with a class object as a parameter in string form

class host_struct(object):
    host_id = dict()
    host_license_id = dict()

def m(a):
    eval(a)

host = host_struct()
m('host.host_id={1:1}')
print host
The above code doesn't work; it's a sample of what I am trying to accomplish. I need to pass an operation on an object into a function as a string, yet have the function manipulate the actual object.
Here is my problem: I have a connection pooler/broker module that maintains a persistent connection to the server. The server sets an inactivity TTL of 30 minutes on all connections, so every 29 minutes the broker needs to touch the server to keep the connection alive. At the same time, the connection broker needs to process client requests, send them to the server, and, when the server responds, forward the server's reply to the client.
The communications to the server are via a connection class that has many complex objects. So allowing the client modules to directly manipulate the class would bypass the connection broker entirely which will result in the server terminating the connection due to the inactivity TTL.
Is this possible? Is there a better way to address this problem?
Here is some additional background. I am opening a connection to VMware vCenter. To initiate the connection, I instantiate the connection class, then call a connection method. Currently my client programs do all of this themselves. However, I am running into a problem with vCenter and need to connect once when I start the program and use the same connection for the entire run. Currently I open a connection to vCenter, do my work, close the connection, sleep for a period of time, then repeat the process. This continual connect/disconnect is causing issues. So I wrote a test to see if I could address them by maintaining a persistent connection, and I was successful.
vcenter = VIServer()
vcenter.connect(*config_values)
At this point, the vcenter object is connected to the server. There are several method calls I need to make to query certain objects. Here are 2 examples of the many I use:
vms = vcenter._retrieve_properties_traversal(property_names=vm_objects,obj_type='VirtualMachine')
or
api_version = vcenter.get_api_version()
The first line retrieves specific VM objects from the server, and the second gets the API version. I would like to call these methods from the connection broker, because it is the one keeping the connection to vCenter open.
So in my connection broker I would like to pass 'vcenter.get_api_version()' as a string argument and have the connection broker execute api = vcenter.get_api_version().
Does this help to clarify?
Use exec instead of eval: eval only evaluates expressions, while an assignment (or any other statement) requires exec. Example:
class host_struct:  # no need for parentheses if not inheriting from something besides object
    host_id = {}  # use of {} is more idiomatic than dict()
    host_license_id = {}

def m(a):
    exec a

host = host_struct()
m('host.host_id.update({1:1})')  # updating adds to the existing dict instead of replacing it
print host.host_id[1]

Running this script produces the expected output of 1.
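As a hedged alternative for the broker scenario above: instead of passing code strings into exec (which will execute arbitrary code), the broker can receive a method name and arguments and dispatch with getattr against its live connection object. This sketch assumes the broker already holds the connected vcenter object from the question.

def call_on_connection(obj, method_name, *args, **kwargs):
    # Look up the method by name on the live object and invoke it,
    # so all access still flows through the broker's connection.
    method = getattr(obj, method_name)
    return method(*args, **kwargs)

# The broker receives ("get_api_version",) instead of the string
# "vcenter.get_api_version()":
api_version = call_on_connection(vcenter, "get_api_version")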

Managing connection to redis from Python

I'm using redis-py in my Python application to store simple variables or lists of variables in a Redis database, so I thought it would be better to create a connection to the Redis server every time I need to save or retrieve a variable, since this is not done very often and I don't want a permanent connection that may time out.
After reading through some basic tutorials, I created the connections using the Redis class, but I have not found a way to close the connection, as this is the first time I'm using Redis. I'm not sure I'm using the best approach for managing the connections, so I would like some advice on this.
This is how I'm setting or getting a variable now:
import redis
def getVariable(variable_name):
    my_server = redis.Redis("10.0.0.1")
    response = my_server.get(variable_name)
    return response

def setVariable(variable_name, variable_value):
    my_server = redis.Redis("10.0.0.1")
    my_server.set(variable_name, variable_value)
I basically use this code to store the last connection time or to get an average of requests per second done to my app and stuff like that.
Thanks for your advice.
Python uses a reference-counting mechanism to deal with objects, so at the end of those functions, the my_server object will be automatically destroyed and the connection closed. You do not need to close it explicitly.
Now this is not how you are supposed to manage Redis connections. Connecting/disconnecting for each operation is too expensive, so it is much better to maintain the connection opened. With redis-py it can be done by declaring a pool of connections:
import redis
POOL = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0)
def getVariable(variable_name):
    my_server = redis.Redis(connection_pool=POOL)
    response = my_server.get(variable_name)
    return response

def setVariable(variable_name, variable_value):
    my_server = redis.Redis(connection_pool=POOL)
    my_server.set(variable_name, variable_value)
Please note connection pool management is mostly automatic and done within redis-py.
@sg1990 what if you have 10,000 users requiring Redis at the same time? They cannot share a single connection, and you've just created yourself a bottleneck.
With a pool of connections you can create an arbitrary number of connections and simply use get_connection() and release(), as described in the redis-py docs.
A connection per user is huge overkill, since every connection needs to maintain an open socket. That way you'd automatically halve the number of, for example, concurrent websocket users that your machine can handle.
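A hedged sketch of that explicit checkout pattern with redis-py (the command-name argument to get_connection() reflects older redis-py signatures; newer versions may not require it):

import redis

pool = redis.ConnectionPool(host="10.0.0.1", port=6379, db=0)

# Explicitly check a connection out of the pool, use it, and
# always return it so other threads can reuse the socket.
conn = pool.get_connection("GET")
try:
    conn.send_command("GET", "last_connection_time")
    value = conn.read_response()
finally:
    pool.release(conn)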
You can use this to connect to two separate logical databases on the same Redis server:
r1 = redis.StrictRedis(host="localhost", port=6379, db=0, decode_responses=True)
r2 = redis.StrictRedis(host="localhost", port=6379, db=1, decode_responses=True)

pymongo connection pooling and client requests

I know pymongo is thread-safe and has a built-in connection pool.
In a web app that I am working on, I am creating a new connection instance on every request.
My understanding is that since pymongo manages the connection pool, it isn't the wrong approach to create a new connection on each request, as at the end of the request the connection instance will be reclaimed and will be available for subsequent requests.
Am I correct here, or should I just create a single instance to use across multiple requests?
The "wrong approach" depends upon the architecture of your application. With pymongo being thread-safe and automatic connection pooling, the actual use of a single shared connection, or multiple connections, is going to "work". But the results will depend on what you expect the behavior to be. The documentation comments on both cases.
If your application is threaded, from the docs, each thread accessing a connection will get its own socket. So whether you create a single shared connection, or request a new one, it comes down to whether your requests are threaded or not.
When using gevent, you can have a socket per greenlet. This means you don't have to have a true thread per request. The requests can be async, and still get their own socket.
In a nutshell:
If your webapp requests are threaded, then it doesn't matter which way you access a new connection. The result will be the same (socket per thread)
If your webapp is async via gevent, then it doesn't matter which way you access a new conection. The result will be the same. (socket per greenlet)
If your webapp is async, but NOT via gevent, then you have to take into consideration the notes on the best suggested workflow.
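To make the socket-per-thread point concrete, here is a minimal sketch, assuming a local mongod and a hypothetical mydb/items collection: one MongoClient is shared by several threads, and the driver's internal pool hands each thread its own socket.

import threading
from pymongo import MongoClient

# One client for the whole process; its internal pool gives each
# thread its own socket, so no extra locking is needed here.
client = MongoClient("localhost", 27017, maxPoolSize=50)

def handle_request(n):
    # Each thread checks a socket out of the shared pool.
    doc = client["mydb"]["items"].find_one({"n": n})
    print(doc)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()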
