How to handle Redis connection across multiple modules? - python

I'm building an application that uses Redis as a datastore. Accordingly, I have many functions that interact with Redis, usually as wrappers for a group of Redis commands.
As the application grows past my initial .py file, I'm at a loss for how to handle the Redis connection across multiple modules. Currently, the Redis connection object is declared at the top of the file, and every function assumes it's present rather than receiving it as an argument. If I spread these functions across multiple files, then each module creates its own Redis connection, and each instance of the application ends up opening multiple connections to Redis.
I would like one instance to just make use of the same connection.
I don't want to do this:
import redis

class MyApp(object):
    def __init__(self):
        self.r = redis.Redis()
    # (all my app functions that touch redis go here)
I also don't want to pass the Redis pointer as an argument into every function.
Is there some other way I can get functions from different modules to share a single Redis() instance?

Something like this:
Module redis_manager:
import redis

class RedisManager(object):
    def __init__(self):
        # Connect to redis etc.
        self.redis = redis.Redis()

redis_manager = RedisManager()
Then in your other modules, you can do:
from redis_manager import redis_manager
redis_manager.redis.set('some_key', 'some_value')
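A related note, hedged on redis-py's documented behavior: a redis.Redis() instance manages an internal connection pool and can be shared safely, so a plain module-level client works just as well. Because Python caches imported modules, every importer gets the same object (the module name below is just illustrative):

# redis_client.py  (illustrative module name)
import redis

# one shared client (and its connection pool) per process
client = redis.Redis(host='localhost', port=6379, db=0)

Other modules then do from redis_client import client and call client.set(...), client.get(...) and so on directly.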

Related

What happens when two functions try to use the same List to update at the same time in Python [duplicate]

In my application, the state of a common object is changed by making requests, and the response depends on the state.
from flask import Flask, flash, render_template

app = Flask(__name__)

class SomeObj():
    def __init__(self, param):
        self.param = param

    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
The thread switches back to client 1, and the client is returned the number 2, say.
Now the thread moves to client 2 and returns him/her the number 3.
Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
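For the counter in the question specifically, a minimal sketch with redis-py (assuming a Redis server on localhost): INCR is atomic on the Redis server, so concurrent requests each see a distinct value no matter how many threads or processes serve them.

import redis
from flask import Flask

app = Flask(__name__)
r = redis.Redis()  # assumes Redis on localhost:6379

@app.route('/')
def home():
    # INCR is atomic in Redis, so each request gets a unique increasing number
    count = r.incr('counter')
    return str(count)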
The development server may run in a single thread and process. You won't see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (In Flask 1.0 the development server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
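As a rough illustration of that last case (the get_db/close_db names and the SQLite database are assumptions, not part of Flask): the resource is created lazily on g during a request and torn down afterwards.

import sqlite3
from flask import Flask, g

app = Flask(__name__)

def get_db():
    # create the connection once per request and cache it on g
    if 'db' not in g:
        g.db = sqlite3.connect('app.db')
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # runs after every request, whether or not an exception occurred
    db = g.pop('db', None)
    if db is not None:
        db.close()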
This is not really an answer to thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every client should have access to its own pool of data, in a thread-safe way.
This is possible with server-side sessions, and they are available in a very neat flask plugin: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"] = 0
    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] += 1
    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] -= 1
    return "counter is {}".format(session["counter"])

if __name__ == '__main__':
    app.run()
After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers, you'll see that the counter is not shared between them.
Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
from flask_caching import Cache
# Instantiate the cache
cache = Cache()
In the file where your flask app is created, register your cache with the following code:
# Import cache
from common import cache
from pathlib import Path
from flask import Flask
# ...
app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": Path("/tmp")})
Now use the cache throughout your application by importing it and calling it as follows:
# Import cache
from common import cache
# store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
While fully accepting the upvoted answers above, and discouraging the use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers running under the Flask development server...
...
The Python built-in data types (I personally used and tested the global dict) are, per the Python documentation, thread-safe. They are not process-safe.
Insertions, lookups, and reads in such a (server-global) dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session-specific data that would otherwise not fit into the session cookie (max size 4 kB).
Of course, such a server-global dict should be carefully guarded against growing too large, since it lives in memory. Some sort of expiry of 'old' key/value pairs can be coded into request processing.
Again, this is not recommended for production or scalable deployments, but it is possibly OK for local, task-oriented servers where a separate database is too much for the given task.
...
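A rough sketch of what that could look like, assuming the threaded development server and Flask's built-in session for the per-client key (all names here are illustrative, and this remains unsuitable for multi-process deployments):

import time
import uuid
from flask import Flask, session

app = Flask(__name__)
app.secret_key = 'dev-only-secret'   # needed for the session cookie

server_store = {}                     # per-process dict, shared by the dev server's threads
MAX_AGE = 3600                        # seconds before an entry is considered stale

def client_data():
    # hand out a per-client slice of the global dict, keyed by a session value
    key = session.setdefault('store_key', uuid.uuid4().hex)
    now = time.time()
    # crude expiry so the in-memory dict cannot grow without bound
    for old_key in [k for k, v in server_store.items() if now - v['touched'] > MAX_AGE]:
        del server_store[old_key]
    entry = server_store.setdefault(key, {'touched': now, 'data': {}})
    entry['touched'] = now
    return entry['data']

@app.route('/remember/<value>')
def remember(value):
    client_data()['value'] = value
    return 'stored'

@app.route('/recall')
def recall():
    return client_data().get('value', 'nothing stored yet')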

Python module establishing database connection without __main__

I am moving the code that interacts with my Neo4J database into a separate python module so I can stop repeating myself in other modules.
The problem I am having is that each function in the new module has to make a separate call to...
db = Graph('http://localhost/db/data')
...to establish a connection to the database.
This seems really silly and is not solving my goal of reducing the amount of unnecessary code.
Normally, I would establish the connection in the main function but because this module is being called from elsewhere I can't do this.
I am looking for a way of establishing a local variable that will persist between function calls, so I can forget about connecting to the db.
Apart from the code bloat, you are making it extremely inefficient by having every function connect to the database. If you want to stick with procedural programming, make it a module-level global variable. With classes, you can either connect in the __init__ method or pass a database connection to __init__ as an argument. Most of my classes support both:
class A:
    def __init__(self, ..., db_connection=None):
        self.db_connection = db_connection or DbConnection()
        ...

    def f(self):
        self.db_connection("SELECT A from B")
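For the module-level global mentioned above, one common arrangement is to keep the connection in its own module; Python caches imported modules, so every importer shares the same object. A minimal sketch, assuming py2neo's Graph as used in the question (the module and function names are illustrative):

# db.py  (illustrative module name)
from py2neo import Graph

_graph = None

def get_graph():
    # connect lazily on first use, then reuse the same connection everywhere
    global _graph
    if _graph is None:
        _graph = Graph('http://localhost/db/data')
    return _graph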

Configure different email backends in Pyramid .ini file

I'm developing a reasonably basic app in Pyramid. The app includes functionality to send email. Currently I'm intending to use Sendgrid for this purpose but do not want to couple it too tightly. Additionally, I do not want any emails sent out during development or testing. My solution is to create lightweight middleware classes for each provider, all providing a send() method.
I imagine loose coupling can be achieved by using the Configurator object but I'm not quite there yet.
Given the following code (note there is no request, as I want to be able to call this via Celery):
def send_email(sender, recipient, subject, contents):
    emailer = get_emailer()
    # 'from' is a Python keyword, so a different kwarg name (from_addr) is assumed here
    emailer.send(from_addr=sender, to=recipient, subject=subject, body=contents)
What would the get_emailer() function look like, assuming my development.ini contained something like pyramid.includes = my_app.DumpToConsoleEmailer?
Your mention of Celery changes everything... Celery doesn't make it very obvious, but a Celery worker is a completely separate process which knows absolutely nothing about your Pyramid application and potentially runs on a different machine, executing tasks hours after your web application created them. A worker just takes tasks one by one from the queue and executes them. There's no request, no Configurator, no WSGI stack, no PasteDeploy assembling your application from an .ini file.
Point is - a Celery worker does not know if your Pyramid process was started in development or production configuration, unless you tell it explicitly. It is even possible to have a worker executing tasks from two applications, one running in development mode, and another in production :)
One option is to pass the configuration to your celery worker explicitly on startup (say, by declaring some variable in celeryconfig.py). Then a worker would always use the same mailer for all tasks.
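A sketch of that first option. celeryconfig.py is Celery's conventional configuration module; the MAILER_TYPE setting and the emailer classes are assumptions made up for this question, not Celery or Pyramid API:

# celeryconfig.py, read by the Celery worker at startup
MAILER_TYPE = 'console'               # e.g. switch to 'sendgrid' for production workers

# emailers.py (hypothetical helper module shared by the tasks)
import celeryconfig

class DumpToConsoleEmailer(object):
    def send(self, **kwargs):
        print(kwargs)                 # development: just dump the email

class SendgridEmailer(object):
    def send(self, **kwargs):
        raise NotImplementedError     # placeholder for the real Sendgrid call

def get_emailer():
    # every task executed by this worker gets the mailer chosen at worker startup
    if celeryconfig.MAILER_TYPE == 'sendgrid':
        return SendgridEmailer()
    return DumpToConsoleEmailer()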
Another option is to pass a "mailer_type" parameter explicitly from your Pyramid app to the worker for each task:
@task
def send_email(sender, recipient, subject, contents, mailer_type='dummy'):
    emailer = get_emailer(mailer_type)
    # as above, from_addr stands in for the reserved word 'from'
    emailer.send(from_addr=sender, to=recipient, subject=subject, body=contents)
In your Pyramid app, you can put any key/value pairs in your .ini file and access them via request.registry.settings:
send_email.delay(..., request.registry.settings['mailer_type'])
Since asking the question a month ago I have done a bit of reading. It has led me to two possible solutions:
A) Idiomatic Pyramid
We want to address two problems:
How to set up a class, specified in a PasteDeploy Configuration File (.ini file) globally for the Pyramid app
How to access our class at runtime without "going through" request
To set up the specified class we should define an includeme() function in our module and then specify the module in our .ini file as part of pyramid.includes. In our includeme() function we then use config.registry.registerUtility(), a part of the Zope Component Architecture, to register our class and an interface it implements.
To access our class at runtime we then need to call registry.queryUtility(), having gotten the registry from pyramid.threadlocal.get_current_registry().
This solution is a bit of a hack since it uses threadlocal to get the config.
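A hedged sketch of that approach (the IEmailer interface name is mine; DumpToConsoleEmailer comes from the question):

from zope.interface import Interface, implementer
from pyramid.threadlocal import get_current_registry

class IEmailer(Interface):
    """Marker interface for emailer utilities."""

@implementer(IEmailer)
class DumpToConsoleEmailer(object):
    def send(self, **kwargs):
        print(kwargs)   # development: dump the email to the console

def includeme(config):
    # runs when this module is listed under pyramid.includes in the .ini file
    config.registry.registerUtility(DumpToConsoleEmailer(), IEmailer)

def get_emailer():
    registry = get_current_registry()   # threadlocal lookup, hence "a bit of a hack"
    return registry.queryUtility(IEmailer)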
B) Lazy Module Globals
My personal solution to the problem was simpler (and likely not thread-safe):
# In module MailerHolder:
class Holder(object):
    mailer = None

holder = Holder()

def get_emailer():
    return holder.mailer
# In module ConsoleMailer:
import MailerHolder

class ConsoleMailer(object):
    def send(self, **kwargs):
        # Code to print email to console
        pass

def includeme(config):
    MailerHolder.holder.mailer = ConsoleMailer()

Cleaning up objects with the __del__ method

I'm developing a python program to monitor and control a game-server. The game-server has many game-cores, and those cores handle the clients.
I have a python class called Server that holds instances of the class Core, and those instances are used to manage the actual game-cores. The Core class needs to connect to the game-core via TCP-Socket, in order to send commands to that specific game-core. To close those sockets properly, the Core class has a __del__ method which closes the socket.
An example:
class Server(object):
    Cores = []  # list which will be filled with the Core objects
    def __init__(self):
        # detect the game-cores, create the core objects and append them to self.Cores
        pass

class Core(object):
    CoreSocket = None  # when the socket gets created, the socket object will be bound to this variable
    def __init__(self, coreID):
        # initiate the socket connection between the running game-core and this python object
        pass
    def __del__(self):
        # properly close the socket connection
        pass
Now, when I use the Core class itself, the destructor always gets called properly. But when I use the Server class, the Core objects inside Server.Cores never get destructed. I have read that the gc has a problem with circular references and classes with destructors, but the Core objects never reference the Server object (only the socket-object, in Core.CoreSocket), so no circular references are created.
I usually prefer using the with-statement for resource cleanup, but in this case I need to send commands over many different methods in the Server class, so using with won't help ... I also tried to create and close the socket on each command, but that really kills the performance when I need to send many commands. Weak references created with the weakref module won't help either, because the destructors then get called immediately after I create the Server object.
Why don't the Core objects get destructed properly when the Server object gets cleaned up by the gc? I guess I'm just forgetting something simple, but I just can't find out what it is.
Or maybe there is a better approach for closing those sockets when the object gets cleaned up?
You've mixed up class and instance members. Unlike in some other languages, defining a variable at class scope creates a class variable, not an instance variable. When a Server instance dies, the Server class is still around and still holds references to the cores. Define self.cores in the __init__ method instead:
class Server(object):
    def __init__(self):
        self.cores = []
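Beyond that fix, keep in mind that __del__ is only a best-effort fallback in CPython; it is not guaranteed to run, for example at interpreter shutdown. A rough sketch that stores everything on the instance and closes the sockets explicitly (the addresses and ports are placeholders):

import socket

class Core(object):
    def __init__(self, core_id, port):
        self.core_id = core_id
        # placeholder address; the real game-core connection details go here
        self.core_socket = socket.create_connection(('localhost', port))

    def close(self):
        if self.core_socket is not None:
            self.core_socket.close()
            self.core_socket = None

    def __del__(self):
        # best-effort fallback; prefer calling close() or shutdown() explicitly
        self.close()

class Server(object):
    def __init__(self, ports):
        self.cores = [Core(i, port) for i, port in enumerate(ports)]

    def shutdown(self):
        for core in self.cores:
            core.close()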
