Please consider this code:
import time
from multiprocessing import Process

class Host(object):
    def __init__(self):
        self.id = None

    def callback(self):
        print("self.id = %s" % self.id)

    def bind(self, event_source):
        event_source.callback = self.callback

class Event(object):
    def __init__(self):
        self.callback = None

    def trigger(self):
        self.callback()

h = Host()
h.id = "A"
e = Event()
h.bind(e)
e.trigger()

def delayed_trigger(f, delay):
    time.sleep(delay)
    f()

p = Process(target=delayed_trigger, args=(e.trigger, 3))
p.start()

h.id = "B"
e.trigger()
This gives the following output:
self.id = A
self.id = B
self.id = A
However, I expected it to give
self.id = A
self.id = B
self.id = B
...because h.id had already been changed to "B" by the time the trigger method was called.
It seems that a copy of the host instance is created at the moment the separate Process is started, so changes in the original host do not influence that copy.
In my project (more elaborate, of course), the host instance's fields are altered from time to time, and it is important that events triggered by code running in a separate process have access to those changes.
multiprocessing runs code in separate processes. Objects are unavoidably copied as they are sent to a child process, because sharing state across process boundaries requires shared memory or explicit communication.
In fact, if you peruse the module, you can see the amount of effort it takes to actually share anything between the processes after they diverge, either through explicit communication or through explicitly shared objects (which are drawn from a very limited subset of the language, and have to be managed by a Manager).
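For illustration, here is a minimal sketch (my own, not from the original posts) of how a Manager-backed object keeps later changes visible to a child process; the names delayed_print and shared are invented for the example:

import time
from multiprocessing import Process, Manager

def delayed_print(shared, delay):
    time.sleep(delay)
    # The proxy forwards this read to the manager process, so it sees
    # the value current at call time, not at process-creation time.
    print("shared id = %s" % shared["id"])

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        shared["id"] = "A"
        p = Process(target=delayed_print, args=(shared, 1))
        p.start()
        shared["id"] = "B"  # visible to the child, unlike a plain attribute
        p.join()            # prints: shared id = B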
I have two custom Python classes. The first has a method that performs some calculations (using a Pool) and creates a new instance attribute; the second aggregates two objects of the first class and has a method that should run those calculations on both first-class objects in parallel while correctly saving their new instance attributes.
Dummy code:
# dummy code: function, args, data1 and data2 are placeholders
from multiprocessing import Pool, Process

class State:
    def __init__(self, data):
        self.data = data

    def calculate(self):
        with Pool() as p:
            p.map(function, args)
        new_attribute = ...  # some code that reads the files generated with the Pool
        self.new_attribute = new_attribute
        return

class Pair:
    def __init__(self, state1: State, state2: State):
        self.state1 = state1
        self.state2 = state2

    def calculate_states(self):
        for state in [self.state1, self.state2]:
            p = Process(target=state.calculate)
            p.start()
        return

state1 = State(data1)
state2 = State(data2)
pair = Pair(state1, state2)
pair.calculate_states()
The problem is that, as I have found out during my extensive research on the problem, multiprocessing.Process runs the target in a copy of the parent's namespace, and values are not returned to the main namespace. Setting process.daemon to True produces an error, because "daemonic processes aren't allowed to have children", which is the same thing that happens if I replace the Processes with an additional Pool. Using multiprocess (instead of multiprocessing) or concurrent.futures doesn't seem to work either. Additionally, I don't understand how multiprocessing.Queue works, and I'm not sure whether it could be applied here (I have read somewhere that it could be).
I would like to do what I am trying to do without passing a shared-memory object to the Processes (to write the new_attribute into it and then apply it to the States in the main namespace). I hope someone can point me towards a solution even though I have not provided a working, reproducible example.
Your problem arises from invoking the calculate method in a new subprocess. You can still compute the new attributes in parallel without doing that by using map_async with a callback argument; the callback runs in a thread of the parent process, so the attributes it assigns persist on the original State objects.
I have taken your code and provided the missing function implementations to demonstrate:
from multiprocessing import Pool, cpu_count

def some_code(data):
    if data == 1:
        return 1032
    if data == 2:
        return 9874
    raise ValueError('Invalid data value:', data)

def function(val):
    ...
    # return value is not of interest

class State:
    def __init__(self, data):
        self.data = data

    def calculate(self, pool, args):
        pool.map_async(function, args, callback=self.callback)

    def callback(self, result):
        """
        Called when map_async completes
        """
        new_attribute = some_code(self.data)
        self.new_attribute = new_attribute

class Pair:
    def __init__(self, state1: State, state2: State):
        self.state1 = state1
        self.state2 = state2

    def calculate_states(self):
        args = (6, 9, 18)
        # Assumption is the computation is VERY CPU-intensive.
        # If there is quite a bit of I/O involved then: pool_size = 2 * len(args)
        # If it's mostly I/O you should have been using multithreading to begin with.
        pool_size = min(2 * len(args), cpu_count())
        with Pool(pool_size) as pool:
            for state in [self.state1, self.state2]:
                state.calculate(pool, args)
            # wait for tasks to complete
            pool.close()
            pool.join()

# Required for Windows:
if __name__ == '__main__':
    data1 = 1
    data2 = 2
    state1 = State(data1)
    state2 = State(data2)
    pair = Pair(state1, state2)
    pair.calculate_states()
    print(state1.new_attribute, state2.new_attribute)
Prints:
1032 9874
I am trying to implement a system where:
Actors generate data.
Replay is a class that manages the data generated by the Actors (in reality it does much more than in the code below, but I kept it simple for posting here).
Learner uses the data held by the Replay class (and sometimes updates some of Replay's data).
To implement this, the Actors append their generated data to a multiprocessing.Queue, I use a dedicated process to push that data into the Replay, and I used a multiprocessing.BaseManager to share the Replay.
This is my implementation (the code is working):
import time
import random
from collections import deque

import torch.multiprocessing as mp
from multiprocessing.managers import BaseManager

T = 20
B = 5
REPLAY_MINIMUM_SIZE = 10
REPLAY_MAXIMUM_SIZE = 100

class Actor:
    def __init__(self, global_buffer, rank):
        self.rank = rank
        self.local_buffer = []
        self.global_buffer = global_buffer

    def run(self, num_steps):
        for step in range(num_steps):
            data = f'{self.rank}_{step}'
            self.local_buffer.append(data)
            if len(self.local_buffer) >= B:
                self.global_buffer.put(self.local_buffer)
                self.local_buffer = []

class Learner:
    def __init__(self, replay):
        self.replay = replay

    def run(self, num_steps):
        while self.replay.size() <= REPLAY_MINIMUM_SIZE:
            time.sleep(0.1)
        for step in range(num_steps):
            batch = self.replay.sample(B)
            print(batch)

class Replay:
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, experiences):
        self.memory.extend(experiences)

    def sample(self, n):
        return random.sample(self.memory, n)

    def size(self):
        return len(self.memory)

def send_data_to_replay(global_buffer, replay):
    while True:
        if not global_buffer.empty():
            batch = global_buffer.get()
            replay.push(batch)

if __name__ == '__main__':
    num_actors = 2

    global_buffer = mp.Queue()

    BaseManager.register("ReplayMemory", Replay)
    Manager = BaseManager()
    Manager.start()
    replay = Manager.ReplayMemory(REPLAY_MAXIMUM_SIZE)

    learner = Learner(replay)
    learner_process = mp.Process(target=learner.run, args=(T,))
    learner_process.start()

    actor_processes = []
    for rank in range(num_actors):
        p = mp.Process(target=Actor(global_buffer, rank).run, args=(T,))
        p.start()
        actor_processes.append(p)

    replay_process = mp.Process(target=send_data_to_replay, args=(global_buffer, replay,))
    replay_process.start()

    learner_process.join()
    [actor_process.join() for actor_process in actor_processes]
    replay_process.join()
I followed several tutorials and read websites related to multiprocessing, but I am very new to distributed computing, and I am not sure that what I am doing is right.
I wanted to know whether there is any malpractice in my code or anything that does not follow good practice. Moreover, when I launch the program, the different processes never terminate, and I am not sure why or how to handle that.
Any feedback would be appreciated!
I find that when working with multiprocessing it is best to have a Queue for each running process. When you are ready to close the application, you can send an exit message (or poison pill) to each queue and close each process cleanly.
When you launch a child process, pass the parent queue and the child queue to the new process through inheritance.
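A minimal sketch of the pattern (names like SENTINEL and worker are my own, not from the question's code):

import multiprocessing as mp

SENTINEL = None  # the "poison pill"; any unique marker object works

def worker(in_queue):
    while True:
        item = in_queue.get()   # blocking get instead of busy-waiting
        if item is SENTINEL:    # pill received: exit the loop cleanly
            break
        print("processing", item)

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    for item in range(3):
        q.put(item)
    q.put(SENTINEL)  # tell the worker to stop
    p.join()

Applied to the question's code, pushing such a sentinel through global_buffer and checking for it in send_data_to_replay would let replay_process exit, so the final join() calls could return.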
I have a class (MyClass) which contains a queue (self.msg_queue) of actions that need to be run and I have multiple sources of input that can add tasks to the queue.
Right now I have three functions that I want to run concurrently:
MyClass.get_input_from_user()
Creates a window in tkinter that has the user fill out information and when the user presses submit it pushes that message onto the queue.
MyClass.get_input_from_server()
Checks the server for a message, reads the message, and then puts it onto the queue. This method uses functions from MyClass's parent class.
MyClass.execute_next_item_on_the_queue()
Pops a message off of the queue and then acts upon it. It is dependent on what the message is, but each message corresponds to some method in MyClass or its parent which gets run according to a big decision tree.
Process description:
After the class has joined the network, I have it spawn three threads (one for each of the functions above). Each threaded function adds items to the queue with the syntax "self.msg_queue.put(message)" and removes items from the queue with "self.msg_queue.get_nowait()".
Problem description:
The issue I am having is that each thread seems to be modifying its own queue object; they are not sharing the msg_queue of the class of which the three functions are all members.
I am not familiar enough with multiprocessing to know which error messages are important; however, it states that it cannot pickle a weakref object (with no indication of which object is the weakref), and that within the queue.put() call the line "self._sem.acquire(block, timeout)" yields a "[WinError 5] Access is denied" error. Would it be safe to assume that this failure is caused by the queue's reference not being copied over properly?
[I am using Python 3.7.2 and the multiprocessing package's Process and Queue.]
[I have seen multiple Q/As about having threads shuttle information between classes--create a master harness that generates a queue and then pass that queue as an argument to each thread. If the functions didn't have to use other functions from MyClass I could see adapting this strategy by having those functions take in a queue and use a local variable rather than class variables.]
[I am fairly confident that this error is not the result of passing my queue to the tkinter object as my unit tests on how my GUI modifies its caller's queue work fine]
Below is a minimal reproducible example for the queue's error:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self):
        while True:
            self.my_q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self):
        while True:
            self.counter = 0
            self.my_q.put(self.counter)
            time.sleep(1)

    def output_function(self):
        while True:
            try:
                var = self.my_q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        process_A = Process(target=self.input_function_A)
        process_B = Process(target=self.input_function_B)
        process_C = Process(target=self.output_function)
        process_A.start()
        process_B.start()
        process_C.start()
        # without this it generates the WinError:
        # with this it still behaves as if the two input functions do not modify the queue
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
Indeed - these are not "threads" - they are "processes". If you were using multithreading rather than multiprocessing, the self.my_q instance would be the same object, placed at the same memory address in the same process;
multiprocessing instead forks (or, on Windows, spawns) a new process, and any data from the original process (the one executing the "run" call) is duplicated when it is used - so each subprocess sees its own "Queue" instance, unrelated to the others.
The correct way to have various processes share a multiprocessing.Queue object is to pass it as a parameter to the target methods. The simpler way to reorganize your code so that it works is thus:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self, q):
        while True:
            q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self, q):
        while True:
            self.counter = 0
            q.put(self.counter)
            time.sleep(1)

    def output_function(self, q):
        while True:
            try:
                var = q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        # the one queue instance is passed explicitly to each child process
        process_A = Process(target=self.input_function_A, args=(self.my_q,))
        process_B = Process(target=self.input_function_B, args=(self.my_q,))
        process_C = Process(target=self.output_function, args=(self.my_q,))
        process_A.start()
        process_B.start()
        process_C.start()
        # keep the main process alive while the output process runs
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
As you can see, since your class is not actually sharing any data through the instance's attributes, this "class" design does not make much sense for your application, beyond grouping the different workers in the same code block.
It would be possible to write a magic multiprocess class with an internal method that starts the worker methods and shares the Queue instance, so that if you have a lot of these in a project there is much less boilerplate.
Something along these lines:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MPWorkerBase:
    def __init__(self, *args, **kw):
        self.queue = None
        self.is_parent_process = False
        self.is_child_process = False
        self.processes = []
        # ensure this can be used as a collaborative mixin
        super().__init__(*args, **kw)

    def run(self):
        if self.is_parent_process or self.is_child_process:
            # workers already initialized
            return
        self.queue = Queue()
        cls = self.__class__
        for name in dir(cls):
            method = getattr(cls, name)
            if callable(method) and getattr(method, "_MP_worker", False):
                process = Process(target=self._start_worker, args=(self.queue, name))
                self.processes.append(process)
                process.start()
        # Setting this attribute only now ensures the child processes
        # were created while it still had its initial value.
        self.is_parent_process = True

    def _start_worker(self, queue, method_name):
        # this method is called in a newly spawned process - attribute
        # changes here no longer reflect attributes on the
        # object in the initial process

        # overwrite queue in this process with the queue object sent over the wire:
        self.queue = queue
        self.is_child_process = True
        # call the worker method
        getattr(self, method_name)()

    def __del__(self):
        for process in self.processes:
            process.join()

def worker(func):
    """decorator to mark a method as a worker that should
    run in its own subprocess
    """
    func._MP_worker = True
    return func

class MyTest(MPWorkerBase):
    def __init__(self):
        super().__init__()
        self.counter = 0

    @worker
    def input_function_A(self):
        while True:
            self.queue.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    @worker
    def input_function_B(self):
        while True:
            self.counter = 0
            self.queue.put(self.counter)
            time.sleep(1)

    @worker
    def output_function(self):
        while True:
            try:
                var = self.queue.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

if __name__ == '__main__':
    test = MyTest()
    test.run()
I'm a newbie to Python and am learning about threads. I have created a sample producer-consumer program in which I add a movie to a list in the Producer thread and pop the front element of the same list in the Consumer thread. The problem is that, while printing the items of the movie list along with the thread name, I get an incorrect thread name in the Producer thread. This is my code:
Producer.py
from threading import Thread
from threading import RLock
import time

class Producer(Thread):
    def __init__(self):
        Thread.__init__(self)
        Thread.name = 'Producer'
        self.movieList = list()
        self.movieListLock = RLock()

    def printMovieList(self):
        self.movieListLock.acquire()
        if len(self.movieList) > 0:
            for movie in self.movieList:
                print(Thread.name, movie)
            print('\n')
        self.movieListLock.release()

    def pushMovieToList(self, movie):
        self.movieListLock.acquire()
        self.movieList.append(movie)
        self.printMovieList()
        self.movieListLock.release()

    def run(self):
        for i in range(6):
            self.pushMovieToList('Avengers' + str(i + 1))
            time.sleep(1)
Consumer.py
from threading import Thread
import time

class Consumer(Thread):
    def __init__(self):
        Thread.__init__(self)
        Thread.name = 'Consumer'
        self.objProducer = None

    def popMovieFromList(self):
        self.objProducer.movieListLock.acquire()
        if len(self.objProducer.movieList) > 0:
            movie = self.objProducer.movieList.pop(0)
            print(Thread.name, ':', movie)
            print('\n')
        self.objProducer.movieListLock.release()

    def run(self):
        while True:
            time.sleep(1)
            self.popMovieFromList()
Main.py
from Producer import *
from Consumer import *

def main():
    objProducer = Producer()
    objConsumer = Consumer()
    objConsumer.objProducer = objProducer
    objProducer.start()
    objConsumer.start()
    objProducer.join()
    objConsumer.join()

main()
I am not sure whether you have solved this problem yet.
Hope my answer will be helpful.
You can check the documentation of threading.
It says that multiple threads may be given the same name:
name
A string used for identification purposes only. It has no semantics. Multiple threads may be given the same name. The initial name is set by the constructor.
Assigning Thread.name = '...' sets an attribute on the Thread class itself (effectively a static variable), so the value is shared across all threads rather than naming any single one.
If you want to set the name of a thread, set it on the thread instance like this:
class Producer(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.name = 'Producer'
and get it with threading.current_thread().name (this requires import threading):

if len(self.movieList) > 0:
    for movie in self.movieList:
        print(threading.current_thread().name, movie)
Hope you enjoy it!
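For completeness, a minimal self-contained sketch (mine, not from the original posts) showing that per-instance names stay distinct, whereas a class-level assignment would not:

import threading
from threading import Thread

class Named(Thread):
    def __init__(self, name):
        Thread.__init__(self)
        self.name = name  # instance attribute: names this thread only

    def run(self):
        # each thread reports its own name
        print(threading.current_thread().name)

if __name__ == '__main__':
    for n in ('Producer', 'Consumer'):
        Named(n).start()
# prints "Producer" and "Consumer", one line per thread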
I was wondering whether it is possible to create some sort of static set in a Python Process subclass to keep track of the types of processes that are currently running asynchronously.
# winsound is in the standard library; win32api comes from pywin32
import winsound
import win32api
from multiprocessing import Process

class showError(Process):
    # Define some form of shared set that is shared by all Processes
    displayed_errors = set()

    def __init__(self, file_name, error_type):
        super(showError, self).__init__()
        self.error_type = error_type

    def run(self):
        if self.error_type not in self.displayed_errors:
            self.displayed_errors.add(self.error_type)
            message = 'Please try again. ' + str(self.error_type)
            winsound.MessageBeep(-1)
            result = win32api.MessageBox(0, message, 'Error', 0x00001000)
            if result == 0:
                self.displayed_errors.discard(self.error_type)
That way, when I create and start multiple showError processes with the same error_type, subsequent error windows will not be created. So how can I define this shared set?
You can use a multiprocessing.Manager.dict (there's no set object available, but you can use a dict in the same way) and share that between all your subprocesses.
import multiprocessing as mp

if __name__ == "__main__":
    m = mp.Manager()
    displayed_errors = m.dict()
    subp = showError("some filename", "some error type", displayed_errors)
Then change showError.__init__ to accept the shared dict:
def __init__(self, file_name, error_type, displayed_errors):
    super(showError, self).__init__()
    self.error_type = error_type
    self.displayed_errors = displayed_errors
Then this:
self.displayed_errors.add(self.error_type)
Becomes:
self.displayed_errors[self.error_type] = 1
And this:
self.displayed_errors.discard(self.error_type)
Becomes:
try:
    del self.displayed_errors[self.error_type]
except KeyError:
    pass
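If you really do want set semantics rather than a dict, one alternative (my own sketch, not part of the original answer) is to register set with a custom manager, so every process talks to one shared set through a proxy:

from multiprocessing.managers import BaseManager

class SetManager(BaseManager):
    pass

# Expose the set methods the proxy needs; '__contains__' makes `in` work.
SetManager.register('shared_set', set,
                    exposed=['add', 'discard', '__contains__', '__len__'])

if __name__ == '__main__':
    manager = SetManager()
    manager.start()
    displayed_errors = manager.shared_set()
    displayed_errors.add('SomeErrorType')
    print('SomeErrorType' in displayed_errors)   # True
    displayed_errors.discard('SomeErrorType')
    print('SomeErrorType' in displayed_errors)   # False
    manager.shutdown()

Pass the proxy to each showError exactly as with the dict above.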