Is middleware in Django thread-safe? - python

Recently I've read this article:
http://blog.roseman.org.uk/2010/02/01/middleware-post-processing-django-gotcha/
I don't understand, why does the solution described there work?
Why does instantiating separate objects make data chunk thread-safe?
I have two guesses:
Django explicitly holds middleware objects in shared memory, and do not do this for other objects, so other objects are thread-safe.
In second example, in article, lifetime of thread-safety-critical data is much less that in first example, so probably, thread-unsafe operations just have no time to occur.
There is also issues with thread-safety in Django templates.
My question is - how to guess when Django thread-safe and where its not? is there any logic in it or conventions? Another question - I know that request object is thread safe - it is clear, that it wouldn't be safe, web-sites built with Django would be not able to operate, but what exactly makes it thread-safe?

The point, as I note in that article, is that the middleware is instantiated once per process. In most methods of deploying Django, a process lasts for multiple requests. Note that you never instantiate the middleware object yourself: Django takes care of that. That's a clue that it's being done outside the request/response cycle.
The extra object I used there is being instantiated within the process_response method. So, as soon as that method returns, the new object goes out of scope and is destroyed, and there are no thread-safety issues.
Generally speaking, the only objects you have to worry about thread-safety on are those you instantiate at module or class level rather than inside a function/method, and those you don't instantiate yourself, like the middleware here. And even there, requests are explicitly an exception: you can count on those being per-request (naturally).

Related

Is there a way I can persist context locals for sub-threads?

Currently I create a library that records backend calls like ones made to boto3 and requests libraries, and then populates a global "data" object based on some data like the status code of responses, etc.
I originally had the data object as global, but then I realized this was a bad idea because when the application is run in parallel, the data object is simultaneously modified (which would possibly corrupt it), however I want to keep this object separate for each invocation of my application.
So I looked into Flask context locals, similar to how it does for its global "request" object. I manage to implement a way using LocalProxy how they did it, so it works fine now with parallel requests to my application - the issue now though, is that whenever the application spawns a new sub-thread it creates an entirely new context and thus I can't retrieve the data object from its parent thread, e.g. for that request session - basically I need to copy and modify the same data object that is local to the main thread for that particular application request.
To clarify, I was able to do this when I previously had data as a true "global" object - multiple sub-threads could properly modify the same object. However, it did not handle the case for simultaneous requests made to application, as I mentioned; so I manage to fix that, but now the sub-threads are not able to modify the same data object any more *sad face*
I looked at some solutions like below, but this did not help me because the decorator approach only works for "local" functions. Since the functions that I need to decorate are "global" functions like requests.request that threads across various application requests will use, I think I need to use another approach where I can temporarily copy same thread context to use in sub-threads (and my understanding is it should not overwrite or decorate the function, as this is a "global" one that will be use by simultaneous requests to application). Would appreciate any help or possible ideas how I can make this work for my use-case.
Thanks.
Flask throwing 'working outside of request context' when starting sub thread

Django Threading Structure

First of all to begin with 'Yes' i checked and googled this topic but can't find anything that gives me a clear answer to my question? I am a beginner in Djagno and studying its documentation where i read about the Thread Safety Considerations for render method of nodes for Templates Tags. Here is the link to the documentation Link. My question lies where it states that Once the node is parsed the render method for that node might be called multiple times i am confused whether it is talking about the use of the template tag in the same document at different places for the same user at the single instance level of the user on the server or the use of the template tag for multiple request coming from users all around the world sharing the same django instance in memory? If its the latter one does't django create a new instance at the server level for every new user request and have separate resources for every user in the memory or am i wrong about this?
It's the latter.
A WSGI server usually runs a number of persistent processes, and in each process it runs a number of threads. While some automatic scaling can be applied, the number of processes and threads is more or less constant, and determines how many concurrent requests Django can handle. The days where each request would create a new CGI process are long gone, and in most cases persistent processes are much more efficient.
Each process has its own memory, and the communication between processes is usually handled by the database, the cache etc. They can't communicate directly through memory.
Each thread within a process shares the same memory. That means that any object that is not locally scoped (e.g. only defined inside a function), is accessible from the other threads. The cached template loader parses each template once per process, and each thread uses the same parsed nodes. That also means that if you set e.g. self.foo = 'bar' in one thread, each thread will then read 'bar' when accessing self.foo. Since multiple threads run at the same time, this can quickly become a huge mess that's impossible to debug, which is why thread safety is so important.
As the documentation says, as long as you don't store data on self, but put it into context.render_context, you should be fine.

AttributeError when using memcache.gets()

I am trying to use memcached with Google App Engine. I import the library using
from google.appengine.api import memcache
and then call it using
posts = memcache.gets("posts")
Then I get the following error:
AttributeError: 'module' object has no attribute 'gets'
I have looked through the Google App Engine documentation regarding memcache, but I can't find any examples using memcache.gets(). Memcache.get() seems to be used the way I call gets above.
gets is a method of the memcache client object, not a module-level function of memcache. The module-level functions are quite simple, stateless, and synchronous; using the client object, you can do more advanced stuff, if you have to, as documented at https://cloud.google.com/appengine/docs/python/memcache/clientclass .
Specifically, per the docs at https://cloud.google.com/appengine/docs/python/memcache/clientclass#Client_gets , "You use" gets "rather than get if you want to avoid conditions in which two or more callers are trying to modify the same key value at the same time, leading to undesired overwrites." since gets also gets (and stashes in the client object) the cas_id which lets you use the cas (compare-and-set) call (you don't have to explicitly handle the cas_id yourself).
Since it doesn't seem you're attempting a compare-and-set operation, I would recommend using the simpler module-level function get, rather than instantiating a client object and using its instance method gets.
If you actually do need to compare and set, a very good explanation can be found here:
The Client object is required because the gets() operation actually
squirrels away some hidden information that is used by the subsequent
cas() operation. Because the memcache functions are stateless (meaning
they don't alter any global values), these operations are only
available as methods on the Client object, not as functions in the
memcache module. (Apart from these two, the methods on the Client
object are exactly the same as the functions in the module, as you can
tell by comparing the documentation.)
The solution would be to use the class:
client = memcache.Client()
posts = client.gets("posts")
...
client.cas("posts", "new_value")
Although, of course, you would need more than that for cas to be useful.

Django: Keep a persistent reference to an object?

I'm happy to accept that this might not be possible, let alone sensible, but is it possible to keep a persistent reference to an object I have created?
For example, in a few of my views I have code that looks a bit like this (simplified for clarity):
api = Webclient()
api.login(GPLAY_USER,GPLAY_PASS)
url = api.get_stream_urls(track.stream_id)[0]
client = mpd.MPDClient()
client.connect("localhost", 6600)
client.clear()
client.add(url)
client.play()
client.disconnect()
It would be really neat if I could just keep one reference to api and client throughout my project, especially to avoid repeated api logins with gmusicapi. Can I declare them in settings.py? (I'm guessing this is a terrible idea), or by some other means keep a connection to them that's persistent?
Ideally I would then have functions like get_api() which would check the existing object was still ok and return it or create a new one as required.
You can't have anything that's instantiated once per application, because you'll almost certainly have more than one server process, and objects aren't easily shared across processes. However, one per process is definitely possible, and worthwhile. To do that, you only need to instantiate it at module level in the relevant file (eg views.py). That means it will be automatically instantiated when Django first imports that file (in that process), and you can refer to it as a global variable in that file. It will persist as long as the process does, and when as new process is created, a new global var will be instantiated.
You could make them properties of your application object or of some
other application object that is declared at the top level of your
project - before anything else needs it.
If you put them into a class that gets instantiated on the first
import and is then just used on the rest it can be imported by
several modules and accessed.
Either way they would have a life of the length of the execution.
You can't persist the object reference, but you can store something either in memory django cache or in memcached django cache.
Django Cache
https://docs.djangoproject.com/en/dev/topics/cache/
See also
Creating a Persistent Data Object In Django

How to handle local long-living objects in WSGI env

INTRO
I've recently switched to Python, after about 10 years of PHP development and habits.
Eg. in Symfony2, every request to server (Apache for instance) has to load eg. container class and instantiate it, to construct the "rest" of the objects.
As far as I understand (I hope) Python's WSGI env, an app is created once, and until that app closes, every request just calls methods/functions.
This means that I can have eg. one instance of some class, that can be accessed every time, request is dispatched, without having to instantiate it in every request. Am I right?
QUESTION
I want to have one instance of class since the call to __init__ is very expensive (in both computing and resources lockup). In PHP instantiating this in every request degrades performance, am I right that with Python's WSGI I can instantiate this once, on app startup, and use through requests? If so, how do I achieve this?
WSGI is merely a standardized interface that makes it possible to build the various components of a web-server architecture so that they can talk to each other.
Pyramid is a framework whose components are glued with each other through WSGI.
Pyramid, like other WSGI frameworks, makes it possible to choose the actual server part of the stack, like gunicorn, Apache, or others. That choice is for you to make, and there lies the ultimate answer to your question.
What you need to know is whether your server is multi-threaded or multi-process. In the latter case, it's not enough to check whether a global variable has been instantiated in order to initialize costly resources, because subsequent requests might end up in separate processes, that don't share state.
If your model is multi-threaded, then you might indeed rely on global state, but be aware of the fact that you are introducing a strong dependency in your code. Maybe a singleton pattern coupled with dependency-injection can help to keep your code cleaner and more open to change.
The best method I found was mentioned (and I missed it earlier) in Pyramid docs:
From Pyramid Docs#Startup
Note that an augmented version of the values passed as **settings to the Configurator constructor will be available in Pyramid view callable code as request.registry.settings. You can create objects you wish to access later from view code, and put them into the dictionary you pass to the configurator as settings. They will then be present in the request.registry.settings dictionary at application runtime.
There are a number of ways to do this in pyramid, depending on what you want to accomplish in the end. It might be useful to look closely at the Pyramid/SQLAlchemy tutorial as an example of how to handle an expensive initialization (database connection and metadata setup) and then pass that into the request-handling engine.
Note that in the referenced link, the important part for your question is the __init__.py file's handling of initialize_sql and the subsequent creation of DBSession.

Categories