Passing an instance method into a Python multiprocessing Process

If I pass a reference to an instance method, instead of a module-level function, into multiprocessing.Process, then when I call its start method, is an exact copy of the parent process's instance passed in, or is the constructor called again? What happens with 'deep' instance member objects? Exact copy or default values?

Instances are not passed between processes, because instances are per Python VM process.
Values passed between processes are pickled. Unpickling normally restores instances by means other than calling __init__; as far as I can tell it directly sets attribute values to resurrect an instance, and resolves references to/from other instances.
Provided that you run identical code in either process (and with multiprocessing, you do), it restores a reference to a correct instance method inside a restored chain of instances.
This means that if an __init__ in the chain of objects does something side-effectful, it will not have been done on the receiving side, that is, in a subprocess. Do such initialization explicitly.
All in all, it's easiest to share (effectively) immutable objects between parallel processes, and reconcile results after join()ing all of them.
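A minimal sketch of the point above (the class and names here are illustrative, not from the question): a bound method pickles as the instance plus the method name, the instance is rebuilt without __init__ running again, so side effects of __init__ are not replayed on the receiving side.

```python
import pickle

side_effects = []  # records every time __init__ actually runs

class Worker:
    def __init__(self, name):
        side_effects.append(name)  # side effect: runs only where the object is constructed
        self.name = name

    def greet(self):
        return "hello from " + self.name

w = Worker("parent")
blob = pickle.dumps(w.greet)    # Python 3 pickles a bound method as (instance, method name)
restored = pickle.loads(blob)   # rebuilds the instance WITHOUT calling __init__

print(restored())               # the restored bound method works on the restored instance
print(side_effects)             # still only one entry: __init__ ran once, at construction
```

This mirrors what multiprocessing does when it ships a bound-method target to a child process: the chain of instances is resurrected from pickled state, not re-initialized.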

Related

Multithreading a method from an object instantiated in the main thread

If I instantiate an object in the main thread and then send one of its member methods to a ThreadPoolExecutor, does Python somehow create a copy-by-value of the object and send it to the subthread, so that the object's member method will have access to its own copy of self?
Or is it indeed accessing self from the object in the main thread, thus meaning that every member in a subthread is modifying / overwriting the same properties (living in the main thread)?
Threads share a memory space. There is no magic going on behind the scenes, so code in different threads accesses the same objects. Thread switches can occur at any time, although most simple Python operations are atomic. It is up to you to avoid race conditions. Normal Python scoping rules apply.
You might want to read about ThreadLocal variables if you want to find out about workarounds to the default behavior.
Processes are quite different. Each process has its own memory space and its own copy of all the objects it references.
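A short demonstration of the shared-memory point, using an illustrative class rather than anything from the question: several threads call a bound method of the same instance, and all of their writes land in the one shared object.

```python
import threading

class Recorder:
    def __init__(self):
        self.values = []

    def record(self, tag):
        # mutates the single shared instance; there is no per-thread copy of self
        self.values.append(tag)

r = Recorder()
threads = [threading.Thread(target=r.record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(r.values))  # every thread wrote into the same list: [0, 1, 2, 3]
```

Since all four threads mutate the same self, unsynchronized read-modify-write sequences on it would be a race condition; list.append happens to be atomic in CPython, which is why no lock is needed here.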

How does one expose a memory managed item that must be closed/deleted as an importable resource in python3?

Suppose I have an item that interfaces with a C library. The interface allocates memory.
The __del__ method takes care of everything, but there is no guarantee the __del__ method will be called in an imperative python3 runtime.
So, I have overloaded the context manager functions and can declare my item 'with':
with Foo(**kwargs) as foo:
    foo.doSomething()
    # no memory leaks
However, I am now exposing my foo in an __init__.py, and am curious how I could possibly expose the context manager object in a way that allows a user to use it without using it inside of a 'with' block.
Is there a way I can open/construct my Foo inside my module such that there is a guarantee __del__ will be called (or the context manager exit function) so that it is exposed for use, but doesn't expose a daemon or other long term process to risk of memory loss?
Or is deletion implied when an object is constructed implicitly via import, even though it ~may or may not~ occur when the object is constructed in the runtime scope?
Although this should probably not be the case, or at least might not always be the case from version to version...
...Python3.9 does indeed call __del__ on objects initialized prior to importation.
I can't fully accept this answer because I have no way of directly proving that Python 3 will always call __del__, but it has not failed to call it so far, whereas I get a memory leak every time I do not dispose of the object properly after creating it during runtime flow.
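One way to avoid relying on shutdown-time __del__ at all is to register the cleanup explicitly with atexit, which runs on any normal interpreter exit, including when the object was created at import time. A hedged sketch, where Foo, doSomething, and close are stand-ins for the real C-backed class (the boolean flag stands in for the actual allocation):

```python
# sketch for the package's __init__.py; Foo and its methods are assumed names
import atexit

class Foo:
    def __init__(self):
        self.closed = False  # stands in for the real C-side allocation

    def doSomething(self):
        return "work"

    def close(self):
        # release the C-side memory here; safe to call more than once
        self.closed = True

foo = Foo()                  # module-level instance exposed to importers
atexit.register(foo.close)   # cleanup guaranteed at normal interpreter exit
```

Unlike __del__, which CPython explicitly does not guarantee to invoke for objects alive at interpreter shutdown, atexit handlers are run on every normal exit (though not on os._exit() or a hard crash, which would leak the C memory either way).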

multiprocessing initargs - how it works under the hood?

I've assumed that multiprocessing.Pool uses pickle to pass initargs to child processes.
However I find the following strange:
value = multiprocessing.Value('i', 1)
multiprocessing.Pool(initializer=worker, initargs=(value, )) # Works
But this does not work:
pickle.dumps(value)
throwing:
RuntimeError: Synchronized objects should only be shared between processes through inheritance
Why is that, and how can multiprocessing initargs bypass it, given that it uses pickle as well?
As I understand, multiprocessing.Value is using shared memory behind the scenes, what is the difference between inheritance or passing it via initargs? Specifically speaking on Windows, where the code does not fork, so a new instance of multiprocessing.Value is created.
And if you had instead passed an instance of multiprocessing.Lock(), the error message would have been RuntimeError: Lock objects should only be shared between processes through inheritance. These things can be passed as arguments if you are creating an instance of multiprocessing.Process, which is in fact what is being used when you say initializer=worker, initargs=(value,). The test being made is whether a process is currently being spawned, which is not the case when you already have an existing process pool and are now just submitting some work for it. But why this restriction?
Would it make sense for you to be able to pickle this shared memory to a file and then a week later trying to unpickle it and use it? Of course not! Python cannot know that you would not be doing anything so foolish as that and so it places great restrictions on how shared memory and locks can be pickled/unpickled, which is only for passing to other processes.
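The restriction is easy to observe directly. In this minimal sketch, pickling the Value by hand fails because no process is being spawned at that moment; the same pickling succeeds inside multiprocessing because it happens while a spawn is in progress, which is exactly the check described above.

```python
import multiprocessing
import pickle

value = multiprocessing.Value('i', 1)

try:
    # no child process is being spawned here, so multiprocessing refuses
    pickle.dumps(value)
except RuntimeError as e:
    print(e)  # Synchronized objects should only be shared between processes through inheritance
```

When Pool(initializer=..., initargs=(value,)) pickles the same object, it does so while the worker process is being created, so the internal spawning check passes and the shared-memory handle is transferred to the child.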

Python 2.7: Thread local storage's instantiation when first accessed?

Complete code here: https://gist.github.com/mnjul/82151862f7c9585dcea616a7e2e82033
Environment is Python 2.7.6 on an up-to-date Ubuntu 14.04 x64.
Prologue:
Well, I got this strange piece of code at my work project, and it's a classic "somebody wrote it and quit the job, and it works but I don't know why" piece, so I decided to write a stripped-down version of it, hoping to get my questions clarified/answered. Please kindly check the referred gist.
Situation:
So, I have a custom class Storage inheriting from Python's thread local storage, intended to book-keep some thread-local data. There is only one instance of that class, instantiated in the global scope when no threads have been constructed. So I would expect that as there is only one Storage instance, its __init__() running only once, those Runner threads would actually not have thread-local storage and data accesses will clash.
However this turned out to be wrong and the code output (see my comment at that gist) indicates that each thread actually perfectly has its own local storage --- strangely, at each thread's first access to the storage object (i.e. a set()), Storage.__init__() is mysteriously run, thus properly creating the thread-local storage, producing the desired effect.
Questions: Why on earth did Storage.__init__ get invoked when the threads attempted to call a member function of a seemingly already-instantiated object? Is this a CPython (or pthread, if that matters) implementation detail? I feel like there are a lot of things happening between my stack trace's "py_thr_local.py", line 36, in run => storage.set('keykey', value) and "py_thr_local.py", line 14, in __init__, but I can't find any relevant piece of information in (C)Python's source code, or on Stack Overflow.
Any feedback is welcome. Let me know if I need to clarify things or provide more information.
The first piece of information to consider is what is a thread-local? They are independently initialized instances of a particular type that are tied to a particular thread. With that in mind I would expect that some initialization code would be called multiple times. While in some languages like Java the initialization is more explicit, it does not necessarily need to be.
Let's look at the source for the supertype of the storage container you're using: https://github.com/python/cpython/blob/2.7/Lib/_threading_local.py
Line 186 contains the local type that is being used. Taking a look at that class, you can see that __setattr__ and __getattribute__ are among the overridden methods. Remember that in Python these methods are called every time you attempt to assign or access an attribute on an instance. The implementations of these methods acquire a thread lock and then call the _patch method. This patch method creates a new dictionary and assigns it to the current instance's __dict__ (going through the base object implementation to avoid infinite recursion: How is the __getattribute__ method used?)
So when you call storage.set(...) you are actually looking up a per-thread proxy dictionary in the local object. If one doesn't exist yet, your type's __init__ method is called (see line 182). The result of that lookup is substituted in as the current instance's __dict__, and then the appropriate method is called on object to retrieve or set the value (ll. 193, 206, 219), which uses the newly installed dictionary.
That's part of the contract (from http://svn.python.org/projects/python/tags/r27a1/Lib/_threading_local.py):
Note that if you define an init method, it will be
called each time the local object is used in a separate thread.
It's not too well documented in the official docs but basically each time you interact with a thread local in a different thread a new instance unique to that thread gets allocated.
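A minimal, self-contained version of the gist's pattern makes this visible: __init__ runs once at construction in the main thread, then again on the worker thread's first attribute access, and the two threads end up with independent data.

```python
import threading

init_count = []  # records which thread each __init__ call ran in

class Storage(threading.local):
    def __init__(self):
        init_count.append(threading.current_thread().name)
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

storage = Storage()  # __init__ runs here, in the main thread

def run():
    # first attribute access from this thread triggers __init__ again,
    # creating a fresh per-thread __dict__ before set() executes
    storage.set('keykey', 42)

t = threading.Thread(target=run, name="worker")
t.start()
t.join()

print(init_count)    # ['MainThread', 'worker']
print(storage.data)  # {} -- the worker's write never touched the main thread's dict
```

This is exactly the contract quoted above: one Storage object, but one __init__ call (and one backing dictionary) per thread that touches it.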

Python Static Thread Variable

I have a number of threads in my software that all do the same thing, but each thread operates from a different "perspective." I have a "StateModel" object that is used throughout the thread and the objects within the thread, but StateModel needs to be calculated differently for each thread.
I don't like the idea of passing the StateModel object around to all of the functions that need it. Normally, I would create a module variable and all of the objects throughout the program could reference the same data from the module variable. But, is there a way to have this concept of a static module variable that is different and independent for each thread? A kind of static Thread variable?
Thanks.
This is implemented in threading.local.
I tend to dislike mostly-quoting-the-docs answers, but... well, time and place for everything.
A class that represents thread-local data. Thread-local data are data whose values are thread specific. To manage thread-local data, just create an instance of local (or a subclass) and store attributes on it:
mydata = threading.local()
mydata.x = 1
The instance's values will be different for separate threads.
For more details and extensive examples, see the documentation string of the _threading_local module.
Notably you can just have your class extend threading.local, and suddenly your class has thread-local behavior.
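A small sketch of that idea applied to the question, using a hypothetical StateModel: the module exposes a single name, but because the class extends threading.local, each thread sees and mutates its own independent attributes.

```python
import threading

class StateModel(threading.local):
    def __init__(self):
        # runs once per thread on first access, giving each thread fresh state
        self.perspective = None

state = StateModel()  # one module-level name, per-thread contents
results = {}

def worker(name):
    state.perspective = name            # writes only this thread's copy
    results[name] = state.perspective   # reads back this thread's copy

threads = [threading.Thread(target=worker, args=("view-%d" % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # each thread saw only its own value, despite sharing the name 'state'
```

Code throughout the program can keep referencing the module-level state without any explicit passing, which is the "static thread variable" behavior the question asks for.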