I am using the code posted below to enable pause/restart functionality for a multiprocessing Pool.
I would appreciate it if you could explain why the event variable has to be passed as an argument to the setup() function, and why a global variable unpaused is then declared inside the scope of the setup() function and set to the event variable:
def setup(event):
    global unpaused
    unpaused = event
I would also like to know the logic behind the following declaration:
pool = mp.Pool(2, setup, (event,))
The first argument is the number of worker processes the Pool should use.
The second argument is the setup() function mentioned above.
Why couldn't it all be accomplished like this:
global event
event = mp.Event()
pool = mp.Pool(processes=2)
And every time we needed to pause or restart a job, we would just use:
To pause:
event.clear()
To restart:
event.set()
Why would we need a global variable unpaused? I don't get it! Please advise.
import time
import multiprocessing as mp

def myFunct(arg):
    proc = mp.current_process()
    print 'starting:', proc.name, proc.pid, '...\n'
    for i in range(110):
        for n in range(500000):
            pass
    print '\t ...', proc.name, proc.pid, 'completed\n'

def setup(event):
    global unpaused
    unpaused = event

def pauseJob():
    event.clear()

def continueJob():
    event.set()

event = mp.Event()
pool = mp.Pool(2, setup, (event,))
pool.map_async(myFunct, [1, 2, 3])
event.set()
pool.close()
pool.join()
You're misunderstanding how Event works, but first I'll cover what setup is doing.
The setup function is executed in each child process in the pool as soon as it starts. So you're setting a global variable (unpaused in your code, event in my example below) inside each process to the multiprocessing.Event object you created in your main process. You end up with each sub-process having a global variable that refers to the same multiprocessing.Event object. This allows you to signal your child processes from the main process, just like you want. See this example:
import multiprocessing

event = None

def my_setup(event_):
    global event
    event = event_
    print "event is %s in child" % event

if __name__ == "__main__":
    event = multiprocessing.Event()
    p = multiprocessing.Pool(2, my_setup, (event,))
    print "event is %s in parent" % event
    p.close()
    p.join()
Output:
dan@dantop2:~$ ./mult.py
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in child
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in child
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in parent
As you can see, it's the same event in the two child processes as well as the parent. Just like you want.
However, passing event to setup actually isn't necessary here. You can just let the child processes inherit the event instance from the parent process (note that this relies on the fork start method; with the spawn start method used on Windows, globals are not inherited, which is why the initializer approach is more portable):
import multiprocessing

event = None

def my_worker(num):
    print "event is %s in child" % event

if __name__ == "__main__":
    event = multiprocessing.Event()
    pool = multiprocessing.Pool(2)
    # call my_worker once for every process in the pool
    pool.map_async(my_worker, [i for i in range(pool._processes)])
    pool.close()
    pool.join()
    print "event is %s in parent" % event
Output:
dan@dantop2:~$ ./mult.py
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in child
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in child
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in parent
This is a lot simpler, and it's the preferred way to share a semaphore between parent and child. In fact, if you were to try to pass the event directly to a worker function, you'd get an error:
RuntimeError: Semaphore objects should only be shared between processes through inheritance
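To see that error for yourself, here's a minimal sketch; exactly where it surfaces varies by Python version (under Python 3 it's raised from the result's get(), while under Python 2 it may show up as a traceback from the pool's internal task-handler thread):

import multiprocessing

def my_worker(evt):
    evt.wait()

if __name__ == "__main__":
    event = multiprocessing.Event()
    pool = multiprocessing.Pool(2)
    # The Event has to be pickled to be sent to a worker as a task
    # argument, which fails with:
    # RuntimeError: Semaphore objects should only be shared between
    # processes through inheritance
    pool.apply_async(my_worker, (event,)).get()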
Now, back to how you're misunderstanding the way Event works. Event is meant to be used like this:
import time
import multiprocessing

def event_func(num):
    print '\t%r is waiting' % multiprocessing.current_process()
    event.wait()
    print '\t%r has woken up' % multiprocessing.current_process()

if __name__ == "__main__":
    event = multiprocessing.Event()
    pool = multiprocessing.Pool()
    a = pool.map_async(event_func, [i for i in range(pool._processes)])
    print 'main is sleeping'
    time.sleep(2)
    print 'main is setting event'
    event.set()
    pool.close()
    pool.join()
Output:
main is sleeping
<Process(PoolWorker-1, started daemon)> is waiting
<Process(PoolWorker-2, started daemon)> is waiting
<Process(PoolWorker-4, started daemon)> is waiting
<Process(PoolWorker-3, started daemon)> is waiting
main is setting event
<Process(PoolWorker-2, started daemon)> has woken up
<Process(PoolWorker-1, started daemon)> has woken up
<Process(PoolWorker-4, started daemon)> has woken up
<Process(PoolWorker-3, started daemon)> has woken up
As you can see, the child processes need to explicitly call event.wait() to be paused. They get unpaused when event.set() is called in the main process. Right now none of your workers is calling event.wait(), so none of them can ever be paused. I suggest you take a look at the docs for threading.Event, which multiprocessing.Event replicates.
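For completeness, here's a sketch of how your original example could be made pausable under that model. The unpaused.wait() call at the top of the worker loop is the key addition; the sleep()/clear()/set() sequence in the main section is just there to demonstrate one pause and resume:

import time
import multiprocessing as mp

def setup(event):
    global unpaused
    unpaused = event

def myFunct(arg):
    proc = mp.current_process()
    print 'starting:', proc.name, proc.pid, '...\n'
    for i in range(110):
        unpaused.wait()  # blocks here whenever the event is cleared
        for n in range(500000):
            pass
    print '\t ...', proc.name, proc.pid, 'completed\n'

if __name__ == '__main__':
    event = mp.Event()
    event.set()  # start out unpaused
    pool = mp.Pool(2, setup, (event,))
    pool.map_async(myFunct, [1, 2, 3])
    time.sleep(2)
    event.clear()  # pause every worker at the top of its next iteration
    time.sleep(2)
    event.set()  # resume
    pool.close()
    pool.join()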
Related
I'm testing multiprocessing using apply_async.
However, it looks like each apply_async call is executed in MainProcess, and it's not actually asynchronous: each function is called only after the previous one has finished. I'm not sure what I'm missing here.
I'm using Windows with Python 3.8, so it uses the spawn method to create processes.
import os
import time
from multiprocessing import Pool, cpu_count, current_process
from threading import current_thread

def go_to_sleep():
    pid = os.getpid()
    thread_name = current_thread().name
    process_name = current_process().name
    print(f"{pid} Process {process_name} and {thread_name} going to sleep")
    time.sleep(5)

def apply_async():
    pool = Pool(processes=cpu_count())
    print(f"Number of processes {len(pool._pool)}")
    for i in range(20):
        pool.apply_async(go_to_sleep())
    pool.close()
    pool.join()

def main():
    apply_async()

if __name__ == "__main__":
    start_time = time.perf_counter()
    main()
    end_time = time.perf_counter()
    print(f"Elapsed run time: {end_time - start_time} seconds.")
Output:
Number of processes 8
26776 Process MainProcess and MainThread going to sleep
26776 Process MainProcess and MainThread going to sleep
26776 Process MainProcess and MainThread going to sleep
The problem is that your code is not actually calling the specified function in the process pool; it is calling it in the main thread and passing the result of that call to pool.apply_async.
That is, instead of calling pool.apply_async(go_to_sleep()), you should call pool.apply_async(go_to_sleep). You need to pass the function itself to Pool.apply_async; you should not call the function yourself when you call Pool.apply_async.
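A sketch of the corrected loop:

for i in range(20):
    # pass the function itself; apply_async will call it in a worker
    pool.apply_async(go_to_sleep)

If go_to_sleep took arguments, they would be passed separately rather than via a direct call, e.g. pool.apply_async(go_to_sleep, args=(i,)).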
I've read that it's considered bad practice to kill a thread (see Is there any way to kill a Thread?). There are a LOT of answers there, and I'm wondering if even using a thread in the first place is the right answer for me.
I have a bunch of multiprocessing.Processes. Essentially, each Process is doing this:
while some_condition:
    result = self.function_to_execute(i, **kwargs_i)
    # outQ is a multiprocessing.Queue shared between all Processes
    self.outQ.put(Result(i, result))
Problem is... I need a way to interrupt function_to_execute, but I can't modify the function itself. Initially, I was thinking of simply calling process.terminate(), but that appears to be unsafe with a multiprocessing.Queue.
Most likely (but not guaranteed), if I need to kill a thread, the 'main' program is going to be done soon. Is my safest option to do something like this? Or perhaps there is a more elegant solution than using a thread in the first place?
def thread_task():
    while some_condition:
        result = self.function_to_execute(i, **kwargs_i)
        if (this_thread_is_not_daemonized):
            self.outQ.put(Result(i, result))

t = Thread(target=thread_task)
t.start()
if end_early:
    t.daemon = True  # note: daemon must be set before start(); doing it here raises RuntimeError
I believe the end result of this is that the Process that spawned the thread will continue to waste CPU cycles on a task whose output I no longer care about, but if the main program finishes, it'll clean up all my memory nicely.
The main problem with daemonizing a thread is that the main program could potentially continue for 30+ minutes after I stop caring about the output of that thread.
From the threading docs:
If you want your threads to stop gracefully, make them non-daemonic
and use a suitable signalling mechanism such as an Event
Here is a contrived example of what I was thinking - no idea if it mimics what you are doing or can be adapted for your situation. Another caveat: I've never written any real concurrent code.
Create an Event object in the main process and pass it all the way to the thread.
Design the thread so that it loops until the Event object is set. Once you no longer need the processing, set the Event object in the main process. There is no need to modify the function being run in the thread.
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random, os

def f_to_run():
    time.sleep(.2)
    return random.randint(1, 10)

class T(Thread):
    def __init__(self, evt, q, func, parent):
        self.evt = evt
        self.q = q
        self.func = func
        self.parent = parent
        super().__init__()

    def run(self):
        # loop until the main process sets the event
        while not self.evt.is_set():
            n = self.func()
            self.q.put(f'PID {self.parent}-{self.name}: {n}')

def f(T, evt, q, func):
    pid = os.getpid()
    t = T(evt, q, func, pid)
    t.start()
    t.join()
    q.put(f'PID {pid}-{t.name} is alive - {t.is_alive()}')
    q.put(f'PID {pid}:DONE')
    return 'foo done'

if __name__ == '__main__':
    results = []
    q = Queue()
    evt = Event()
    # two processes, each with one thread
    p = Process(target=f, args=(T, evt, q, f_to_run))
    p1 = Process(target=f, args=(T, evt, q, f_to_run))
    p.start()
    p1.start()
    while len(results) < 40:
        results.append(q.get())
        print('.', end='')
    print('')
    evt.set()
    p.join()
    p1.join()
    while not q.empty():
        results.append(q.get_nowait())
    for thing in results:
        print(thing)
I initially tried to use threading.Event, but the multiprocessing module complained that it couldn't be pickled. I was actually surprised that the multiprocessing.Queue and multiprocessing.Event worked AND could be accessed by the thread.
I'm not sure why I started with a Thread subclass; I think I thought it would be easier to control/specify what happens in its run method. But it can be done with a function also:
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random

def f_to_run():
    time.sleep(.2)
    return random.randint(1, 10)

def t1(evt, q, func):
    # loop until the main process sets the event
    while not evt.is_set():
        n = func()
        q.put(n)

def g(t1, evt, q, func):
    t = Thread(target=t1, args=(evt, q, func))
    t.start()
    t.join()
    q.put(f'{t.name} is alive - {t.is_alive()}')
    return 'foo'

if __name__ == '__main__':
    q = Queue()
    evt = Event()
    p = Process(target=g, args=(t1, evt, q, f_to_run))
    p.start()
    time.sleep(5)
    evt.set()
    p.join()
I am writing a Python script which has 2 child processes. The main logic occurs in one process, and another process waits for some time and then kills the main process even if the logic is not done.
I read that calling os._exit(1) stops the interpreter, so the entire script is killed automatically. I've used it as shown below:
import os
import time
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

# Main process
def main_process(shared_variable):
    shared_variable.value = b"mainprc"
    time.sleep(20)
    print("Task finished normally.")
    os._exit(1)

# Timer process
def timer_process(shared_variable):
    threshold_time_secs = 5
    time.sleep(threshold_time_secs)
    print("Timeout reached")
    print("Shared variable", shared_variable.value)
    print("Task is shutdown.")
    os._exit(1)

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', b"initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()
    process_timer.join()
The timer process calls os._exit but the script still waits for the main process to print "Task finished normally." before exiting.
How do I make it such that if timer process exits, the entire program is shutdown (including main process)?
Thanks.
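For what it's worth, a sketch of one way to get that behavior, reusing the functions defined above: os._exit(1) in a child only exits that child, so let the parent join the timer and then terminate the main worker itself:

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', b"initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()
    process_timer.join()  # returns as soon as the timer child exits
    if process_main.is_alive():
        process_main.terminate()  # forcibly stop the main worker
    process_main.join()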
Every time I run this program, I hear my CPU fan spin up. I suspect the busy-waiting while loops in the code are the cause. I wonder how a real programmer would optimize this?
from multiprocessing import Process, Queue
import threading

class PThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # the viewer leaving will set this event
        self.event = threading.Event()

    def run(self):
        while 1:
            if not self.event.is_set():
                print 'run'
            else:
                break

def server_control(queue):
    while True:
        try:
            event = queue.get(False)
        except:
            event = None
        if event == 'DETECTED':
            print 'DETECTED'
            t = PThread()
            t.start()
        elif event == 'LEAVE':
            print 'Viewer_left'
            t.event.set()
            t.join()
        elif event == 'QUIT':
            break

q = Queue()
p = Process(target=server_control, args=(q,))
p.start()
p.join()
If a thread needs to wait for an event, it should sleep until the event occurs, rather than busy-waiting. Your event object has a wait() method that can be used to accomplish that. Call it, and it won't return until some other thread has called set() on the event (or the timeout elapses, if you specify one). In the meantime, the thread uses no CPU.
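For example, a sketch of PThread.run() rewritten around wait(), assuming Python 2.7+ where Event.wait(timeout) returns True once the flag is set:

def run(self):
    # sleep up to 0.1s per iteration instead of spinning flat out;
    # wait() returns True as soon as the event is set, ending the loop
    while not self.event.wait(0.1):
        print 'run'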
The multiprocessing module has a clone of threading's event object
from multiprocessing import Process, Event
Instead of using a Queue, you can declare the events of interest in your main process and pass them to the other process.
In your case:
detected = Event()
leave = Event()
exit = Event()
Process(target=server_control, args=(detected, leave, exit))
and finally, in your loop, check whether each event has fired, or wait on it.
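A sketch of what that loop might look like, reusing PThread from the question and assuming Python 2.7+ wait() semantics:

def server_control(detected, leave, exit):
    t = None
    while not exit.is_set():
        # block for up to 0.5s instead of polling a queue in a tight loop
        if detected.wait(0.5):
            detected.clear()
            print 'DETECTED'
            t = PThread()
            t.start()
        if leave.is_set():
            leave.clear()
            print 'Viewer_left'
            if t is not None:
                t.event.set()
                t.join()
                t = None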
You might make the loop a bit less tight by adding a time.sleep(0) in the loop to pass the remainder of the quantum to another thread.
See also: How does a threading.Thread yield the rest of its quantum in Python?
If I have a threading.Event and the following two lines of code:
event.set()
event.clear()
and I have some threads that are waiting for that event.
My question is related to what happens when calling the set() method:
Can I be ABSOLUTELY sure that all the waiting thread(s) will be notified? (i.e. Event.set() "notifies" the threads)
Or could it happen that those two lines are executed so quickly after each other, that some threads might still be waiting? (i.e. Event.wait() polls the event's state, which might be already "cleared" again)
Thanks for your answers!
In the internals of Python, an Event is implemented with a Condition() object.
When you call event.set(), the condition's notify_all() method is called (after acquiring the lock, so it cannot be interrupted), and every waiting thread receives the notification (the lock is released only once all the threads have been notified). So you can be sure that all the threads will be notified.
Now, clearing the event just after the notification is not a problem, unless the waiting threads check the event value with event.is_set(), and you only need that kind of check if you are waiting with a timeout.
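For reference, a paraphrase (not a verbatim copy) of how Python 3's Lib/threading.py implements this:

from threading import Condition, Lock

class Event(object):
    def __init__(self):
        self._cond = Condition(Lock())
        self._flag = False

    def set(self):
        # while we hold the condition's lock, no waiter can be between
        # "checked the flag" and "went to sleep"
        with self._cond:
            self._flag = True
            self._cond.notify_all()

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            signaled = self._flag
            if not signaled:
                # True if notified, False only on timeout, even if the
                # flag was cleared again immediately after set()
                signaled = self._cond.wait(timeout)
            return signaled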
Examples:
Pseudocode that works:
#in main thread
event = Event()
thread1(event)
thread2(event)
...
event.set()
event.clear()
#in thread code
...
event.wait()
#do the stuff
Pseudocode that may not work:
#in main thread
event = Event()
thread1(event)
thread2(event)
...
event.set()
event.clear()
#in thread code
...
while not event.is_set():
    event.wait(timeout_value)
#do the stuff
Edit: in Python >= 2.7 you can still wait for an event with a timeout and be sure of the state of the event:
event_state = event.wait(timeout)
while not event_state:
    event_state = event.wait(timeout)
It's easy enough to verify that things work as expected (Note: this is Python 2 code, which will need adapting for Python 3):
import threading

e = threading.Event()
threads = []

def runner():
    tname = threading.current_thread().name
    print 'Thread waiting for event: %s' % tname
    e.wait()
    print 'Thread got event: %s' % tname

for t in range(100):
    t = threading.Thread(target=runner)
    threads.append(t)
    t.start()

raw_input('Press enter to set and clear the event:')
e.set()
e.clear()

for t in threads:
    t.join()

print 'All done.'
If you run the above script and it terminates, all should be well :-) Notice that a hundred threads are waiting for the event to be set; it's set and cleared straight away; all threads should see this and should terminate (though not in any definite order, and the "All done" message can be printed anywhere after the "Press enter" prompt, not just at the very end).
Python 3+
It's easier to check that it works
import threading
import time

lock = threading.Lock()  # just to sync printing
e = threading.Event()
threads = []

def runner():
    tname = threading.current_thread().name
    with lock:
        print('Thread waiting for event ', tname)
    e.wait()
    with lock:
        print('Thread got event: ', tname)

for t in range(8):  # create 8 threads; could be 100's
    t = threading.Thread(target=runner)
    threads.append(t)
    t.start()

time.sleep(1)  # force wait until set/clear
e.set()
e.clear()

for t in threads:
    t.join()
print('Done')