Mutithreading a method from an object instantiated in the main thread - python

If I instantiate an object in the main thread, and then send one of it's member methods to a ThreadPoolExecutor, does Python somehow create a copy-by-value of the object and sends it to the subthread, so that the objects member method will have access to its own copy of self?
Or is it indeed accessing self from the object in the main thread, thus meaning that every member in a subthread is modifying / overwriting the same properties (living in the main thread)?

Threads share a memory space. There is no magic going on behind the scenes, so code in different threads accesses the same objects. Thread switches can occur at any time, although most simple Python operations are atomic. It is up to you to avoid race conditions. Normal Python scoping rules apply.
You might want to read about ThreadLocal variables if you want to find out about workarounds to the default behavior.
Processes as quite different. Each Process has its own memory space and its own copy of all the objects it references.

Related

multiprocessing initargs - how it works under the hood?

I've assumed that multiprocessing.Pool uses pickle to pass initargs to child processes.
However I find the following stange:
value = multiprocessing.Value('i', 1)
multiprocess.Pool(initializer=worker, initargs=(value, )) # Works
But this does not work:
pickle.dumps(value)
throwing:
RuntimeError: Synchronized objects should only be shared between processes through inheritance
Why is that, and how multiprocessing initargs can bypass that, as it's using pickle as well?
As I understand, multiprocessing.Value is using shared memory behind the scenes, what is the difference between inheritance or passing it via initargs? Specifically speaking on Windows, where the code does not fork, so a new instance of multiprocessing.Value is created.
And if you had instead passed an instance of multiprocessing.Lock(), the error message would have been RuntimeError: Lock objects should only be shared between processes through inheritance. These things can be passed as arguments if you are creating an instance of multiprocessing.Process, which is in fact what is being used when you say initializer=worker, initargs=(value,). The test being made is whether a process is currently being spawned, which is not the case when you already have an existing process pool and are now just submitting some work for it. But why this restriction?
Would it make sense for you to be able to pickle this shared memory to a file and then a week later trying to unpickle it and use it? Of course not! Python cannot know that you would not be doing anything so foolish as that and so it places great restrictions on how shared memory and locks can be pickled/unpickled, which is only for passing to other processes.

Does threading.Condition maintain a collection of Thread objects?

Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock: # this acquires some_lock, blocking until it's available
do_stuff() # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.

passing in an instance method into a python multiprocessing Process

If I pass in a reference to an instance method instead of a module level method into multiprocessing.Process, when I call it's start method is an exact copy of the parent process instance passed in or is the constructor called again? What happens with 'deep' instance member objects? Exact copy or default values?
Instances are not passed between processes, because instances are per Python VM process.
Values passed between processes are pickled. Unpickling normally restores instances by measures other than calling __init__; As far as I can tell it directly sets attribute values to resurrect an instance, and resolves references to/from other instances.
Provided that you run identical code in either process (and with multiprocessing, you do), it restores a reference to a correct instance method inside a restored chain of instances.
This means that if an __init__ in the chain of objects does something side-effectful, it will not have been done on the receiving side, that is, in a subprocess. Do such initialization explicitly.
All in all, it's easiest to share (effectively) immutable objects between parallel processes, and reconcile results after join()ing all of them.

Python Static Thread Variable

I have a number of threads in my software that all do the same thing, but each thread operates from a different "perspective." I have a "StateModel" object that is used throughout the thread and the objects within the thread, but StateModel needs to be calculated differently for each thread.
I don't like the idea of passing the StateModel object around to all of the functions that need it. Normally, I would create a module variable and all of the objects throughout the program could reference the same data from the module variable. But, is there a way to have this concept of a static module variable that is different and independent for each thread? A kind of static Thread variable?
Thanks.
This is implemented in threading.local.
I tend to dislike mostly-quoting-the-docs answers, but... well, time and place for everything.
A class that represents thread-local data. Thread-local data are data
whose values are thread specific. To manage thread-local data, just
create an instance of local (or a subclass) and store attributes on
it:
mydata = threading.local()
mydata.x = 1
The instance’s values will be
different for separate threads.
For more details and extensive examples, see the documentation string
of the _threading_local module.
Notably you can just have your class extend threading.local, and suddenly your class has thread-local behavior.

Does thread-local mean thread safe?

Specifically I'm talking about Python. I'm trying to hack something (just a little) by seeing an object's value without ever passing it in, and I'm wondering if it is thread safe to use thread local to do that. Also, how do you even go about doing such a thing?
No -- thread local means that each thread gets its own copy of that variable. Using it is (at least normally) thread-safe, simply because each thread uses its own variable, separate from variables by the same name that's accessible to other threads. OTOH, they're not (normally) useful for communication between threads.

Categories