Python 2.7: Thread-local storage instantiated on first access?

Complete code here: https://gist.github.com/mnjul/82151862f7c9585dcea616a7e2e82033
Environment is Python 2.7.6 on an up-to-date Ubuntu 14.04 x64.
Prologue:
Well, I got this strange piece of code at my work project, and it's a classic "somebody wrote it and quit the job, and it works but I don't know why" piece, so I decided to write a stripped-down version of it, hoping to get my questions answered. Please kindly check the gist linked above.
Situation:
So, I have a custom class Storage that inherits from Python's thread-local storage class and is intended to book-keep some thread-local data. There is only one instance of that class, created in the global scope before any threads have been constructed. So I would expect that, since there is only one Storage instance and its __init__() runs only once, the Runner threads would not actually get thread-local storage and their data accesses would clash.
However, this turned out to be wrong: the code output (see my comment at that gist) indicates that each thread does get its own local storage. Strangely, at each thread's first access to the storage object (i.e. a set()), Storage.__init__() is mysteriously run again, properly creating the thread-local storage and producing the desired effect.
Questions: Why on earth does Storage.__init__ get invoked when the threads call a member function of a seemingly already-instantiated object? Is this a CPython (or pthread, if that matters) implementation detail? It feels like a lot happens between my stack trace's "py_thr_local.py", line 36, in run => storage.set('keykey', value) and "py_thr_local.py", line 14, in __init__, but I can't find any relevant piece of information in (C)Python's source code, or on Stack Overflow.
Any feedback is welcome. Let me know if I need to clarify things or provide more information.

The first piece of information to consider is: what is a thread-local? Thread-locals are independently initialized instances of a particular type, each tied to a particular thread. With that in mind, you should expect some initialization code to be called multiple times, once per thread. While in some languages like Java the per-thread initialization is more explicit, it does not necessarily need to be.
Let's look at the source for the supertype of the storage container you're using: https://github.com/python/cpython/blob/2.7/Lib/_threading_local.py
Line 186 contains the local type that is being used. Taking a look at that class, you can see that __setattr__ and __getattribute__ are among the overridden methods. Remember that in Python these methods are called every time you attempt to assign or access an attribute on an instance. The implementations of these methods acquire a thread-local lock and then call the _patch method. _patch creates a new dictionary and assigns it to the current instance's __dict__ (using object's implementation to avoid infinite recursion: How is the __getattribute__ method used?).
So when you call storage.set(...) you are actually looking up a per-thread proxy dictionary. If one doesn't exist yet, the __init__ method is called on your type (see line 182). The dictionary found (or created) by that lookup is swapped into the current instance's __dict__, and then the appropriate method on object is called to retrieve or set the value (l. 193, 206, 219), which uses the newly installed __dict__.
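To see this in action, here is a minimal sketch (hypothetical names, not the code from the gist) that makes the per-thread __init__ call visible:

import threading

class Storage(threading.local):
    def __init__(self):
        # Called again in every thread that touches the instance for the first time
        print('__init__ in %s' % threading.current_thread().name)
        self.data = {}
    def set(self, key, value):
        self.data[key] = value

storage = Storage()  # created once, in the main thread

def run():
    storage.set('keykey', 42)  # first attribute lookup in this thread re-runs __init__

threads = [threading.Thread(target=run) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Running this prints the __init__ line once for the main thread and once per worker thread, confirming that each thread ends up with its own independent self.data.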

That's part of the contract (from http://svn.python.org/projects/python/tags/r27a1/Lib/_threading_local.py):
Note that if you define an __init__ method, it will be called each time the local object is used in a separate thread.
It's not too well documented in the official docs, but basically, each time you interact with a thread-local from a different thread, a new instance unique to that thread gets allocated.

Related

Is there a way I can persist context locals for sub-threads?

I'm currently writing a library that records backend calls (such as those made via the boto3 and requests libraries) and then populates a global "data" object based on information such as the status codes of the responses.
I originally had the data object as a global, but then I realized this was a bad idea: when the application runs requests in parallel, the data object is modified concurrently (which could corrupt it), whereas I want to keep this object separate for each invocation of my application.
So I looked into Flask context locals, similar to what Flask does for its global "request" object. I managed to implement this with LocalProxy the way Flask does, so it now works fine with parallel requests to my application. The issue now, though, is that whenever the application spawns a new sub-thread, that thread gets an entirely new context, so I can't retrieve the data object from its parent thread, e.g. for that request session. Basically, I need the sub-threads to share and modify the same data object that is local to the main thread for that particular application request.
To clarify, I was able to do this when I previously had data as a true "global" object: multiple sub-threads could properly modify the same object. However, that did not handle simultaneous requests to the application, as I mentioned; I managed to fix that, but now the sub-threads are not able to modify the same data object any more *sad face*.
I looked at solutions like the one below, but it did not help me, because the decorator approach only works for "local" functions. The functions I need to decorate are "global" ones like requests.request that threads across various application requests will use, so I think I need a different approach where I can temporarily copy the same thread context into sub-threads (and, as I understand it, this should not overwrite or decorate the function, since it is a "global" one used by simultaneous requests to the application). Would appreciate any help or ideas on how to make this work for my use case; one possible direction is sketched after the linked question below.
Thanks.
Flask throwing 'working outside of request context' when starting sub thread
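For what it's worth, one possible direction (a hedged sketch with hypothetical names, assuming the "data" object is a werkzeug LocalProxy as described above) is to unwrap the proxy in the parent thread, where the context is still bound, and hand the concrete object to each sub-thread explicitly:

import threading
from werkzeug.local import Local, LocalProxy

_ctx = Local()
data = LocalProxy(lambda: _ctx.data)  # stands in for the per-request "data" object

def spawn_worker(target):
    # Resolve the proxy here, in the parent thread, where the context exists...
    real_data = data._get_current_object()
    # ...and pass the concrete object to the sub-thread as a plain argument.
    t = threading.Thread(target=target, args=(real_data,))
    t.start()
    return t

def worker(shared_data):
    # The sub-thread mutates the very same object the parent request sees.
    shared_data['backend_calls'] = shared_data.get('backend_calls', 0) + 1

This sidesteps the context machinery entirely for sub-threads: they receive the object directly instead of trying to look it up through a context they don't have.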

Interact with stored python objects on server

I want to keep a Python class permanently alive so I can continually interact with it. The reason is that this class is highly memory-intensive, which means that (1) I cannot fit it into memory multiple times, and (2) loading the class is prohibitively slow.
I have tried implementing this using both Pyro and RPyC, but it appears that these packages delete the object and create a new one every time a new request is made (which is exactly what I don't want). However, I did find the following option for Pyro:
@Pyro4.behavior(instance_mode="single")
which ensures that only a single instance is created. However, since multiple requests may be made simultaneously, I am not 100% sure that this is safe to do. Is there a better way to accomplish what I am trying to do?
Thanks in advance for any help, it is greatly appreciated! (I've been struggling with this for quite a while now).
If you don't want to make your class thread-safe, you can set SERVERTYPE to "multiplex"; this makes Pyro process all remote method calls sequentially.
https://pythonhosted.org/Pyro4/servercode.html#server-types-and-concurrency-model:
multiplexed server (servertype "multiplex")
This server uses a connection multiplexer to process all remote method calls sequentially. No threads are used in this server. It uses the best supported selector available on your platform (kqueue, poll, select). It means only one method call is running at a time, so if it takes a while to complete, all other calls are waiting for their turn (even when they are from different proxies). The instance mode used for registering your class, won’t change the way the concurrent access to the instance is done: in all cases, there is only one call active at all times. Your objects will never be called concurrently from different threads, because there are no threads. It does still affect when and how often Pyro creates an instance of your class.
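A minimal sketch of what this looks like on the server side (class and method names are hypothetical; @Pyro4.expose is needed on recent Pyro4 versions):

import Pyro4

Pyro4.config.SERVERTYPE = "multiplex"  # one call at a time, no threads

@Pyro4.expose
@Pyro4.behavior(instance_mode="single")  # one shared instance for all proxies
class HeavyModel(object):
    def query(self, x):
        return x * 2  # stand-in for the memory-intensive work

daemon = Pyro4.Daemon()
uri = daemon.register(HeavyModel)
print(uri)  # hand this URI to clients
daemon.requestLoop()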

Python Static Thread Variable

I have a number of threads in my software that all do the same thing, but each thread operates from a different "perspective." I have a "StateModel" object that is used throughout the thread and the objects within the thread, but StateModel needs to be calculated differently for each thread.
I don't like the idea of passing the StateModel object around to all of the functions that need it. Normally, I would create a module variable, and all of the objects throughout the program could reference the same data through it. But is there a way to have this concept of a module variable that is different and independent for each thread? A kind of static, per-thread variable?
Thanks.
This is implemented in threading.local.
I tend to dislike mostly-quoting-the-docs answers, but... well, time and place for everything.
A class that represents thread-local data. Thread-local data are data whose values are thread specific. To manage thread-local data, just create an instance of local (or a subclass) and store attributes on it:
mydata = threading.local()
mydata.x = 1
The instance's values will be different for separate threads.
For more details and extensive examples, see the documentation string of the _threading_local module.
Notably, you can just have your class extend threading.local, and suddenly your class has thread-local behavior.
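For example, a rough sketch (hypothetical names) of a module-level, per-thread StateModel holder:

import threading

class _ThreadState(threading.local):
    def __init__(self):
        # Runs once per thread, the first time that thread touches `state`
        self.model = {'perspective': threading.current_thread().name}

state = _ThreadState()  # module-level, but every thread sees its own attributes

Any module can import state and read state.model as if it were a plain module variable; each thread silently gets, and keeps, its own independent value.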

Pyro4, object changes class (the State patern) but Proxy does not see it

In an application I use a low level library and run it in a separate process.
The connection to the process is done with Pyro4.
The library needs initialization before the work and release of resources after.
Thus, I implemented the State pattern to keep the two states nicely separated.
The library is embedded in an object (which is served by a Pyro4.Daemon). That object has an initialization method, which performs the necessary setup and changes the object's __class__ to the "ready" one, which has all the methods to call the library.
However, the Pyro4.Proxy does not see the object change its methods after the initialization method is called. Yet if you look at the actual object (in a separate interpreter, running the Pyro4.Daemon in a thread), the __class__ does change and everything is fine there. The problem is on the Pyro4.Proxy side.
So, is it possible to change the __class__ when working with Pyro4? Am I just doing something wrong?
PS
It seems Pyro4.Proxy grabs the methods of the remote object the first time you call it, and then the methods are frozen. Could I "refresh" this procedure somehow and ask the Proxy to check out the methods again?
Aha! So the answer is to run:
pyro_proxy._pyroGetMetadata()
It grabs the metadata about the remote object and fills in the methods of the Pyro4.Proxy object.
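In other words, on the client side (hypothetical names; assuming the remote object is registered under the name shown):

import Pyro4

proxy = Pyro4.Proxy("PYRONAME:example.statefulthing")
proxy.initialize()          # server side: the object swaps its __class__ to "ready"
proxy._pyroGetMetadata()    # re-fetch the remote method list from the daemon
proxy.do_work()             # a method that only exists on the "ready" class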
Links:
PYRO Implementation
The code (it's a Python-only package!), in your installation or in the repository

Django: Keep a persistent reference to an object?

I'm happy to accept that this might not be possible, let alone sensible, but is it possible to keep a persistent reference to an object I have created?
For example, in a few of my views I have code that looks a bit like this (simplified for clarity):
api = Webclient()
api.login(GPLAY_USER,GPLAY_PASS)
url = api.get_stream_urls(track.stream_id)[0]
client = mpd.MPDClient()
client.connect("localhost", 6600)
client.clear()
client.add(url)
client.play()
client.disconnect()
It would be really neat if I could just keep one reference to api and client throughout my project, especially to avoid repeated API logins with gmusicapi. Can I declare them in settings.py (I'm guessing this is a terrible idea), or keep a persistent connection to them by some other means?
Ideally I would then have functions like get_api() which would check the existing object was still ok and return it or create a new one as required.
You can't have anything that's instantiated once per application, because you'll almost certainly have more than one server process, and objects aren't easily shared across processes. However, one per process is definitely possible, and worthwhile. To do that, you only need to instantiate the object at module level in the relevant file (e.g. views.py). It will then be instantiated automatically when Django first imports that file (in that process), and you can refer to it as a global variable in that file. It will persist as long as the process does, and when a new process is created, a new global var will be instantiated.
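A sketch of the get_api() idea from the question, following this per-process pattern (Webclient and the credentials come from the question; the lock is an assumption, to guard against multi-threaded server processes):

# views.py
import threading
from gmusicapi import Webclient

_api = None
_api_lock = threading.Lock()

def get_api():
    global _api
    with _api_lock:
        if _api is None:  # first call in this process: log in once
            _api = Webclient()
            _api.login(GPLAY_USER, GPLAY_PASS)  # credentials as in the question
    return _api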
You could make them properties of your application object, or of some other object that is declared at the top level of your project, before anything else needs them.
If you put them into a class that gets instantiated on the first import and is then just reused, it can be imported and accessed by several modules.
Either way, they would live for the length of the execution.
You can't persist the object reference itself, but you can store data either in Django's local-memory cache or in a memcached-backed Django cache.
Django Cache
https://docs.djangoproject.com/en/dev/topics/cache/
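For instance, a derived, picklable value like the stream URL can be cached instead of the live client objects (a sketch reusing the get_api() helper from above):

from django.core.cache import cache

def get_stream_url(track):
    key = 'stream-url-%s' % track.stream_id
    url = cache.get(key)
    if url is None:
        url = get_api().get_stream_urls(track.stream_id)[0]
        cache.set(key, url, 600)  # cache the URL for ten minutes
    return url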
See also
Creating a Persistent Data Object In Django
