GLOBAL_VAR = 1
class Worker:
class_var = 2
Worker's instances are created by multiple processes. Will they have their own copies of above variables? If not, can I use them to safely lock access to a resource accessed by multiple class instances (creating and accessing them in a thread-safe manner, of course)? I want to do it transparently for a class user. What is the right way to do this?
Are you talking about multithreading or multiprocessing? These are very different in Python.
Threads can access variables like you are use to in Python, without restriction.
Process on the other hand can't access variables from another process (aside some exception of shared variables). Process will copy the current state of the local variables on creation, but it will only be a copy.
Related
If I instantiate an object in the main thread, and then send one of it's member methods to a ThreadPoolExecutor, does Python somehow create a copy-by-value of the object and sends it to the subthread, so that the objects member method will have access to its own copy of self?
Or is it indeed accessing self from the object in the main thread, thus meaning that every member in a subthread is modifying / overwriting the same properties (living in the main thread)?
Threads share a memory space. There is no magic going on behind the scenes, so code in different threads accesses the same objects. Thread switches can occur at any time, although most simple Python operations are atomic. It is up to you to avoid race conditions. Normal Python scoping rules apply.
You might want to read about ThreadLocal variables if you want to find out about workarounds to the default behavior.
Processes as quite different. Each Process has its own memory space and its own copy of all the objects it references.
I've assumed that multiprocessing.Pool uses pickle to pass initargs to child processes.
However I find the following stange:
value = multiprocessing.Value('i', 1)
multiprocess.Pool(initializer=worker, initargs=(value, )) # Works
But this does not work:
pickle.dumps(value)
throwing:
RuntimeError: Synchronized objects should only be shared between processes through inheritance
Why is that, and how multiprocessing initargs can bypass that, as it's using pickle as well?
As I understand, multiprocessing.Value is using shared memory behind the scenes, what is the difference between inheritance or passing it via initargs? Specifically speaking on Windows, where the code does not fork, so a new instance of multiprocessing.Value is created.
And if you had instead passed an instance of multiprocessing.Lock(), the error message would have been RuntimeError: Lock objects should only be shared between processes through inheritance. These things can be passed as arguments if you are creating an instance of multiprocessing.Process, which is in fact what is being used when you say initializer=worker, initargs=(value,). The test being made is whether a process is currently being spawned, which is not the case when you already have an existing process pool and are now just submitting some work for it. But why this restriction?
Would it make sense for you to be able to pickle this shared memory to a file and then a week later trying to unpickle it and use it? Of course not! Python cannot know that you would not be doing anything so foolish as that and so it places great restrictions on how shared memory and locks can be pickled/unpickled, which is only for passing to other processes.
In the script's main thread, I setup a variable called queue and fill it with URLs. Then I create 8 processes using multiprocessing.Process and then those processes spawn 10 threads each using the threading library.
Within the thread worker (which was spawned by another process as per above), I have global queue.
Then will queue.get() act as expected? I've tried it and it seems to be okay during some tests and others not.
Question is, can global variables be accessed from another process and thread?
It is hard to understand what you are exactly asking. But there are two main questions here:
Can global variables be accessed from another process?
No, not without some form of inter-process communication, and even then you would be passing a copy of that variable to the other process. Each process has its own global state.
Can global variables be accessed from another thread?
Yes, a thread who lives in the same process can access a global variable, but you must ensure the safety of any memory that is accessed by multiple threads. Meaning, threads should not access writable memory at the same time as other threads or you run the risk of one thread writing to the memory while another thread attempts to read it.
Answering Question Above
If I understand the setup correctly, each of your child processes has their own global variable queue. Each of those queues should be accessible only to the threads that are spawned within that process.
I have a number of threads in my software that all do the same thing, but each thread operates from a different "perspective." I have a "StateModel" object that is used throughout the thread and the objects within the thread, but StateModel needs to be calculated differently for each thread.
I don't like the idea of passing the StateModel object around to all of the functions that need it. Normally, I would create a module variable and all of the objects throughout the program could reference the same data from the module variable. But, is there a way to have this concept of a static module variable that is different and independent for each thread? A kind of static Thread variable?
Thanks.
This is implemented in threading.local.
I tend to dislike mostly-quoting-the-docs answers, but... well, time and place for everything.
A class that represents thread-local data. Thread-local data are data
whose values are thread specific. To manage thread-local data, just
create an instance of local (or a subclass) and store attributes on
it:
mydata = threading.local()
mydata.x = 1
The instance’s values will be
different for separate threads.
For more details and extensive examples, see the documentation string
of the _threading_local module.
Notably you can just have your class extend threading.local, and suddenly your class has thread-local behavior.
I have a problem with understanding how data is exchanged between processes (multiprocessing implementation). It behaves as parameters are passed as reference (or copy - depends whether it is mutable or imutable variable). If so, how it is achieved between processes?
Below examplary code is understandable for me if it is executed within one process (ConsumerProcess is a thread not a process for instance) but how does it work if it is exercised within 2 separate processes?
tasks = Queue()
consumerProcess = ConsumerProcess(tasks) # it is subprocess for main
tasks.put(aTask) # why does it behave as reference?
The multiprocessing library either lets you use shared memory, or in the case your Queue class, using a manager service that coordinates communication between processes.
See the Sharing state between processes and Managers sections in the documentation.
Managers use Proxy objects to represent state in a process. The Queue class is such a proxy. State is then shared via pickled states:
An important feature of proxy objects is that they are picklable so they can be passed between processes.