I am looking for a way to correctly manage module level global variables that use some operating system resource (like a file or a thread).
The problem is that when the module is reloaded, my resource must be properly disposed (e.g. the file closed or the thread terminated) before creating the new one.
So I need a better pattern to manage those singleton objects.
I've been reading the docs on module reload and this is quite interesting:
When a module is reloaded, its dictionary (containing the module’s
global variables) is retained. Redefinitions of names will override
the old definitions, so this is generally not a problem. If the new
version of a module does not define a name that was defined by the old
version, the old definition remains. This feature can be used to the
module’s advantage if it maintains a global table or cache of objects
— with a try statement it can test for the table’s presence and skip
its initialization if desired:
try:
    cache
except NameError:
    cache = {}
So I could just check if the objects already exist, and dispose them before creating the new ones.
You would need to monkey-patch or fork Django to hook into the dev server's autoreload mechanism and do the proper thing there (close the files, stop the threads, etc.).
But since you are developing a Django application, if you intend to serve it with a proper server in the future, you should think now about how to manage your global variables, including semaphores and all that jazz.
Before going down that route and writing all that difficult code, prone to errors and hair loss, consider other solutions: NoSQL databases (Redis, MongoDB, Neo4j, Hadoop...) and background process managers such as Celery or Gearman. If none of these fit your use case(s) and you really cannot avoid creating and managing files and global variables yourself, then consider a client/server pattern where the clients are web-server threads, unless you want to mess with NFS.
Related
Currently we have several web applications (written with Django) which work well under gunicorn's default sync worker, and we want to use its gevent worker to get better performance.
It is known that several operations which have side-effects may cause problems when gevent.monkey.patch_all() is used:
Reading and writing global variables (including static class variables and so on). Reads and writes of global variables from different greenlets may produce unexpected results.
3rd-party libs that use C extensions (mysqlclient etc.). Generally there should be no problems, though I/O and timeouts may block inside the C extension. But if the C extension stores some global state, gevent may cause unexpected behaviour. I also wonder whether there is any problem when non-blocking I/O or multithreading is used inside the C extension.
Now the problem comes for:
How to check if any global variables may be read/write by two or more greenlets? Or generally, is there any global variable writing operations in our web applications?
How to check if a C extension module is compatible with gevent?
For the first problem we have an idea:
Add hook code to CPython: for every PyObject, keep two lists storing the IDs (or something similar) of the greenlets that read or write the object, rebuild CPython, and run our application under a real workload to check. But this seems too complex to implement.
I'm happy to accept that this might not be possible, let alone sensible, but is it possible to keep a persistent reference to an object I have created?
For example, in a few of my views I have code that looks a bit like this (simplified for clarity):
api = Webclient()
api.login(GPLAY_USER,GPLAY_PASS)
url = api.get_stream_urls(track.stream_id)[0]
client = mpd.MPDClient()
client.connect("localhost", 6600)
client.clear()
client.add(url)
client.play()
client.disconnect()
It would be really neat if I could just keep one reference to api and client throughout my project, especially to avoid repeated api logins with gmusicapi. Can I declare them in settings.py? (I'm guessing this is a terrible idea), or by some other means keep a connection to them that's persistent?
Ideally I would then have functions like get_api() which would check the existing object was still ok and return it or create a new one as required.
You can't have anything that's instantiated once per application, because you'll almost certainly have more than one server process, and objects aren't easily shared across processes. However, one per process is definitely possible, and worthwhile. To do that, you only need to instantiate it at module level in the relevant file (e.g. views.py). That means it will be instantiated automatically when Django first imports that file (in that process), and you can refer to it as a global variable in that file. It will persist as long as the process does, and when a new process is created, a new global variable will be instantiated.
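A minimal sketch of that per-process pattern, combined with the `get_api()` idea from the question (`ApiClient` and `is_alive()` are hypothetical stand-ins for the real `Webclient` and whatever health check it supports):

```python
# views.py -- a hedged sketch of a per-process, lazily recreated client.
# ApiClient stands in for the question's Webclient; is_alive() is a
# hypothetical health check, not a real gmusicapi method.
class ApiClient:
    def __init__(self):
        self.logged_in = True  # stands in for an expensive login

    def is_alive(self):
        return self.logged_in

_api = None  # one instance per server process, created on first use

def get_api():
    """Return the per-process client, recreating it if it has gone stale."""
    global _api
    if _api is None or not _api.is_alive():
        _api = ApiClient()
    return _api
```

Every view in the process then calls `get_api()` and shares the same instance until the health check fails, at which point a fresh one is created.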
You could make them properties of your application object, or of some other application object that is declared at the top level of your project, before anything else needs it.
If you put them into a class that gets instantiated on the first import and is then simply reused, it can be imported and accessed by several modules.
Either way, they would live for the length of the execution.
You can't persist the object reference, but you can store the data either in Django's local-memory cache or in its memcached backend.
Django Cache
https://docs.djangoproject.com/en/dev/topics/cache/
See also
Creating a Persistent Data Object In Django
INTRO
I've recently switched to Python, after about 10 years of PHP development and habits.
E.g. in Symfony2, every request to the server (Apache, for instance) has to load e.g. the container class and instantiate it, to construct the "rest" of the objects.
As far as I understand (I hope) Python's WSGI env, an app is created once, and until that app closes, every request just calls methods/functions.
This means that I can have e.g. one instance of some class that can be accessed every time a request is dispatched, without having to instantiate it on every request. Am I right?
QUESTION
I want to have one instance of a class, since the call to __init__ is very expensive (in both computation and resource lockup). In PHP, instantiating this on every request degrades performance. Am I right that with Python's WSGI I can instantiate it once, at app startup, and use it across requests? If so, how do I achieve this?
WSGI is merely a standardized interface that makes it possible to build the various components of a web-server architecture so that they can talk to each other.
Pyramid is a framework whose components are glued with each other through WSGI.
Pyramid, like other WSGI frameworks, makes it possible to choose the actual server part of the stack, like gunicorn, Apache, or others. That choice is for you to make, and there lies the ultimate answer to your question.
What you need to know is whether your server is multi-threaded or multi-process. In the latter case, it's not enough to check whether a global variable has been instantiated in order to initialize costly resources, because subsequent requests might end up in separate processes, that don't share state.
If your model is multi-threaded, then you might indeed rely on global state, but be aware of the fact that you are introducing a strong dependency in your code. Maybe a singleton pattern coupled with dependency-injection can help to keep your code cleaner and more open to change.
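In the multi-threaded case, that global-state-plus-singleton idea can be sketched with double-checked locking around a per-process instance (`ExpensiveService` and `get_service` are hypothetical names, not part of any framework):

```python
import threading

# A sketch of a thread-safe, per-process lazy singleton. ExpensiveService
# is a hypothetical stand-in for the class with the costly __init__.
class ExpensiveService:
    def __init__(self):
        self.ready = True  # stands in for slow setup work

_instance = None
_lock = threading.Lock()

def get_service():
    """Create the service once per process, safely under concurrent requests."""
    global _instance
    if _instance is None:          # fast path, no lock once initialized
        with _lock:
            if _instance is None:  # re-check inside the lock
                _instance = ExpensiveService()
    return _instance
```

Note that this only gives you one instance *per process*; in a multi-process deployment each worker still builds its own copy, as described above.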
The best method I found was mentioned (and I missed it earlier) in Pyramid docs:
From Pyramid Docs#Startup
Note that an augmented version of the values passed as **settings to the Configurator constructor will be available in Pyramid view callable code as request.registry.settings. You can create objects you wish to access later from view code, and put them into the dictionary you pass to the configurator as settings. They will then be present in the request.registry.settings dictionary at application runtime.
There are a number of ways to do this in pyramid, depending on what you want to accomplish in the end. It might be useful to look closely at the Pyramid/SQLAlchemy tutorial as an example of how to handle an expensive initialization (database connection and metadata setup) and then pass that into the request-handling engine.
Note that in the referenced link, the important part for your question is the __init__.py file's handling of initialize_sql and the subsequent creation of DBSession.
I was wondering about implementing a singleton class following http://code.activestate.com/recipes/52558-the-singleton-pattern-implemented-with-python/ but was wondering about any (b)locking issues. My code is supposed to cache SQL statements and execute all cached statements using cursor.executemany(SQL, list_of_params) when a certain number of cached elements is reached, or when a specific execute call is made by the user. Implementing a singleton was supposed to make it possible to cache statements application-wide, but I'm afraid I'll run into (b)locking issues.
Any thoughts?
By avoiding lazy initialization, the blocking problem goes away. In the module where you initialize your connection to the database, import the module that contains the singleton and then immediately create an instance of it (which need not be stored in a variable):
# Do database initialization
from MySingleton import MySingleton
MySingleton()  # eagerly create the shared instance
# Now threads can be created safely
Why don't you use the module directly? (As pointed out before, modules are singletons.) If you create a module like:
# mymodule.py
from mydb import Connection
connection = Connection('host', 'port')
you can use the import mechanism and the connection instance will be the same everywhere.
from mymodule import connection
Of course, you can define a much more complex initialization of connection (possibly via writing your own class), but the point is that Python will only initialize the module once, and provide the same objects for every subsequent call.
I believe the Singleton (or Borg) patterns have very specific applications in Python, and for the most part you should rely on direct imports until proven otherwise.
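The module cache that makes direct imports behave like singletons can be demonstrated with just the standard library; `mymodule_demo` below is a synthetic stand-in for `mymodule` above, built at runtime instead of from a file:

```python
import sys
import types

# Python caches every imported module in sys.modules, so repeated imports
# return the same object -- this is what makes module-level state a singleton.
mod = types.ModuleType("mymodule_demo")
mod.connection = object()  # stands in for Connection('host', 'port')
sys.modules["mymodule_demo"] = mod

import mymodule_demo as first
import mymodule_demo as second

assert first is second
assert first.connection is second.connection
```

The second `import` never re-executes the module; it simply returns the cached entry from `sys.modules`, so every importer sees the same `connection`.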
There should be no problems unless you plan to use that Singleton instance with several threads.
Recently I faced an issue caused by a wrongly implemented cache-reloading mechanism: the cache was first cleared and then refilled. This works fine in a single thread, but produces bugs under multithreading.
As long as you use CPython, the Global Interpreter Lock should prevent blocking problems. You could also use the Borg pattern.
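For illustration, a minimal sketch of the Borg pattern (`BorgCache` and the sample statement are hypothetical names, not taken from the linked recipe):

```python
# The Borg pattern: instances are distinct objects, but they all share a
# single state dictionary, so they behave like one shared cache.
class BorgCache:
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state

a = BorgCache()
a.statements = ["INSERT INTO t VALUES (%s)"]
b = BorgCache()
# a is not b, yet b.statements is the very same list a created
```

Unlike a classic singleton, identity checks (`a is b`) fail, but every attribute write through one instance is visible through all the others.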
There is a certain page on my website where I want to prevent the same user from visiting it twice in a row. To prevent this, I plan to create a Lock object (from Python's threading library). However, I would need to store that across sessions. Is there anything I should watch out for when trying to store a Lock object in a session (specifically a Beaker session)?
Storing a threading.Lock instance in a session (or anywhere else that needs serialization) is a terrible idea, and presumably you'll get an exception if you try, since such an object cannot be serialized (e.g., it cannot be pickled). A traditional approach to cooperative serialization of processes relies on file locking (on "artificial" files, e.g. in a directory such as /tmp/locks/<username> if you want the locking to be per-user, as you indicate). I believe the Wikipedia entry does a good job of describing the general area; if you tell us what OS you're running under, we might suggest something more specific (unfortunately I don't believe there is a cross-platform solution for this).
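The file-locking approach can be sketched with the standard library's `fcntl` module on POSIX systems (it is not cross-platform, as noted above); `lock_for_user` and the `/tmp/locks` layout are illustrative, not a standard API:

```python
import fcntl
import os

# POSIX-only sketch of per-user advisory file locking. Each user gets a
# lock file under lock_dir; holding an exclusive flock on it serializes
# access across processes. Closing the descriptor releases the lock.
def lock_for_user(username, lock_dir="/tmp/locks"):
    """Open (creating if needed) and exclusively lock a per-user file.

    Returns the file descriptor; raises OSError if the lock is already held.
    """
    os.makedirs(lock_dir, exist_ok=True)
    fd = os.open(os.path.join(lock_dir, username), os.O_CREAT | os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking: fail fast
    return fd
```

Because the lock lives in the kernel rather than in the session, it works across worker processes, which a pickled `threading.Lock` never could.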
I just realized that this was a terrible question since locking a lock and saving it to the session takes two steps thus defeating the purpose of the lock's atomic actions.