I'm testing multiprocessing using apply_async.
However, it looks like each apply_async call is made from MainProcess and it's not actually asynchronous: each function is called only after the previous one has finished. I'm not sure what I'm missing here.
I'm using Windows with Python 3.8, so it's using the spawn method to create processes.
import os
import time
from multiprocessing import Pool, cpu_count, current_process
from threading import current_thread

def go_to_sleep():
    pid = os.getpid()
    thread_name = current_thread().name
    process_name = current_process().name
    print(f"{pid} Process {process_name} and {thread_name} going to sleep")
    time.sleep(5)

def apply_async():
    pool = Pool(processes=cpu_count())
    print(f"Number of procesess {len(pool._pool)}")
    for i in range(20):
        pool.apply_async(go_to_sleep())
    pool.close()
    pool.join()

def main():
    apply_async()

if __name__ == "__main__":
    start_time = time.perf_counter()
    main()
    end_time = time.perf_counter()
    print(f"Elapsed run time: {end_time - start_time} seconds.")
Output:
Number of procesess 8
26776 Process MainProcess and MainThread going to sleep
26776 Process MainProcess and MainThread going to sleep
26776 Process MainProcess and MainThread going to sleep
The problem is that your code is not actually calling the specified function in the process pool, it is calling it in the main thread, and passing the result of calling it to pool.apply_async.
That is, instead of calling pool.apply_async(go_to_sleep()), you should call pool.apply_async(go_to_sleep). You need to pass the function that should be called to Pool.apply_async - you should not call the function when you call Pool.apply_async.
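For illustration, a minimal sketch of the corrected submission loop, reusing go_to_sleep, Pool and cpu_count from the question's code:

def apply_async():
    pool = Pool(processes=cpu_count())
    for _ in range(20):
        # Pass the callable itself; the pool invokes it in a worker process.
        # Arguments, if any, would go in the args tuple: pool.apply_async(f, args=(x,))
        pool.apply_async(go_to_sleep)
    pool.close()
    pool.join()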
I have written a simple piece of code, shown below. It is just a model of another, much more complicated problem. There is a simple function task_submit that adds tasks to a queue; its aim is to continuously pick up tasks delegated by the user, since the user can create new tasks after the code has been launched. I have a worker function that just simulates doing some work. Then I create a ThreadPoolExecutor, submit task_submit with the queue as its argument, and start submitting tasks pulled from the queue. But the code does not terminate even when only the main thread (which is my program itself) remains in the pool of threads. I can't understand why; even shutdown doesn't work.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading

def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)

def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')

with Tpe(max_workers=4) as executor:
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = []
    while True:
        time.sleep(10)
        print('\n\n------------NEW CYCLE----------------\n\n')
        if not q.empty():
            print(threading.enumerate())
            tasks.append(executor.submit(worker, q.get()))
        else:
            print('is queue empty?', q.empty())
            print(f'active threads: {threading.active_count()}')
            print(threading.enumerate())
            executor.shutdown(wait=True)
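One observation, not from the original post: the while True loop has no break, so the main thread keeps looping even after executor.shutdown(wait=True) returns, and the with block can never exit. A minimal sketch of a replacement for that loop, reusing q, q_thread, worker, tasks and executor from the snippet above, under the assumption that it is acceptable to stop once task_submit has finished and the queue has been drained:

    while True:
        time.sleep(10)
        if not q.empty():
            tasks.append(executor.submit(worker, q.get()))
        elif q_thread.done():
            # task_submit has finished and the queue is empty: leave the loop
            # so the `with` block can shut the executor down and the program can exit.
            break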
Is there a way to make the processes in concurrent.futures.ProcessPoolExecutor terminate if the parent process terminates for any reason?
Some details: I'm using ProcessPoolExecutor in a job that processes a lot of data. Sometimes I need to terminate the parent process with a kill command, but when I do that the processes from ProcessPoolExecutor keep running and I have to manually kill them too. My primary work loop looks like this:
with concurrent.futures.ProcessPoolExecutor(n_workers) as executor:
    result_list = [executor.submit(_do_work, data) for data in data_list]
    for id, future in enumerate(
            concurrent.futures.as_completed(result_list)):
        print(f'{id}: {future.result()}')
Is there anything I can add here or do differently to make the child processes in executor terminate if the parent dies?
You can start a daemon thread in each worker process that terminates the worker when the parent process dies:
import os
import signal
import threading
import time

def start_thread_to_terminate_when_parent_process_dies(ppid):
    pid = os.getpid()

    def f():
        while True:
            try:
                # Signal 0 doesn't send anything; it only checks whether
                # the process with PID ppid still exists.
                os.kill(ppid, 0)
            except OSError:
                # The parent is gone: terminate this worker process.
                os.kill(pid, signal.SIGTERM)
            time.sleep(1)

    thread = threading.Thread(target=f, daemon=True)
    thread.start()
Usage: pass initializer and initargs to ProcessPoolExecutor
with concurrent.futures.ProcessPoolExecutor(
    n_workers,
    initializer=start_thread_to_terminate_when_parent_process_dies,  # +
    initargs=(os.getpid(),),  # +
) as executor:
This works even if the parent process is SIGKILL/kill -9'ed.
I would suggest two changes:
Use a kill -15 command, which the Python program can handle as a SIGTERM signal, rather than a kill -9 command.
Use a pool created with the multiprocessing.pool.Pool class, whose terminate method works quite differently from that of concurrent.futures.ProcessPoolExecutor: it kills all processes in the pool immediately, so any submitted tasks that are currently running are terminated as well.
Your equivalent program using the new pool and handling a SIGTERM interrupt would be:
from multiprocessing import Pool
import signal
import sys
import os

...

def handle_sigterm(*args):
    #print('Terminating...', file=sys.stderr, flush=True)
    pool.terminate()
    sys.exit(1)

# The process to be "killed", if necessary:
print(os.getpid(), file=sys.stderr)

pool = Pool(n_workers)
signal.signal(signal.SIGTERM, handle_sigterm)
results = pool.imap_unordered(_do_work, data_list)
for id, result in enumerate(results):
    print(f'{id}: {result}')
You could run the script in its own cgroup. When you need to kill the whole thing, you can do so with the cgroup's kill switch (the cgroup.kill file in cgroup v2). Even a cpu cgroup will do the trick, since you can read the group's PIDs from it.
Check this article on how to use cgexec.
I have two Python functions which I want to run in parallel. I don't want the sub_task function to wait for the main_task function.
from threading import Thread
from multiprocessing import Process
from time import sleep, time

def main_task():
    while True:
        sleep(2)
        print('main task running')

def sub_task():
    while True:
        sleep(5)
        print('sub task running')
When I use threads this way, I can see the output:
q=Thread(target = main_task).start()
s=Thread(target = sub_task).start()
But when I use Process this way, I cannot see any output:
q=Process(target = main_task).start()
s=Process(target = sub_task).start()
So what is wrong with the implementation?
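Not part of the original post, just a sketch of how the Process version is usually structured, under the assumption that the spawn start method is in use (the default on Windows): process-creating code has to live under an if __name__ == '__main__': guard, and Process.start() returns None, so assigning its return value loses the process handle.

if __name__ == '__main__':
    q = Process(target=main_task)
    s = Process(target=sub_task)
    q.start()  # start() returns None, so don't write q = Process(...).start()
    s.start()
    q.join()   # both tasks loop forever; stop the program with Ctrl+C
    s.join()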
I am writing a Python script which has two child processes. The main logic occurs in one process and another process waits for some time and then kills the main process even if the logic is not done.
I read that calling os._exit(1) stops the interpreter, so the entire script is killed automatically. I've used it as shown below:
import os
import time
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

# Main process
def main_process(shared_variable):
    shared_variable.value = "mainprc"
    time.sleep(20)
    print("Task finished normally.")
    os._exit(1)

# Timer process
def timer_process(shared_variable):
    threshold_time_secs = 5
    time.sleep(threshold_time_secs)
    print("Timeout reached")
    print("Shared variable ", shared_variable.value)
    print("Task is shutdown.")
    os._exit(1)

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', "initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()
    process_timer.join()
The timer process calls os._exit but the script still waits for the main process to print "Task finished normally." before exiting.
How do I make it such that if timer process exits, the entire program is shutdown (including main process)?
Thanks.
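Not an answer from the original thread, just a sketch of one common approach under the assumption that the parent is allowed to coordinate the shutdown: os._exit() in the timer child only ends that child, so the parent can wait for the timer to finish and then terminate the main worker itself.

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', "initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()

    process_timer.join()            # wait for the timer child to exit
    if process_main.is_alive():
        process_main.terminate()    # kill the main worker if it is still running
    process_main.join()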
I am using the code posted below to enable pause-restart functionality for a multiprocessing Pool.
I would appreciate it if you could explain why the event variable has to be passed as an argument to the setup() function, and why a global variable unpaused is then declared inside the setup() function and set to be the same as the event variable:
def setup(event):
    global unpaused
    unpaused = event
I would also like to understand the logic behind the following declaration:
pool=mp.Pool(2, setup, (event,))
The first argument is the number of worker processes for the Pool to use.
The second argument is the setup() function mentioned above.
Why couldn't it all be accomplished like this:
global event
event=mp.Event()
pool = mp.Pool(processes=2)
And every time we need to pause or restart a job we would just use:
To pause:
event.clear()
To restart:
event.set()
Why would we need a global variable unpaused? I don't get it! Please advise.
import time
import multiprocessing as mp

def myFunct(arg):
    proc = mp.current_process()
    print 'starting:', proc.name, proc.pid, '...\n'
    for i in range(110):
        for n in range(500000):
            pass
    print '\t ...', proc.name, proc.pid, 'completed\n'

def setup(event):
    global unpaused
    unpaused = event

def pauseJob():
    event.clear()

def continueJob():
    event.set()

event = mp.Event()
pool = mp.Pool(2, setup, (event,))
pool.map_async(myFunct, [1, 2, 3])
event.set()
pool.close()
pool.join()
You're misunderstanding how Event works. But first, I'll cover what setup is doing.
The setup function is executed in each child process inside the pool as soon as it is started. So, you're setting a global variable called event inside each process to be the same multiprocessing.Event object you created in your main process. You end up with each sub-process having a global variable called event that's a reference to the same multiprocessing.Event object. This will allow you to signal your child processes from the main process, just like you want. See this example:
import multiprocessing

event = None

def my_setup(event_):
    global event
    event = event_
    print "event is %s in child" % event

if __name__ == "__main__":
    event = multiprocessing.Event()
    p = multiprocessing.Pool(2, my_setup, (event,))
    print "event is %s in parent" % event
    p.close()
    p.join()
Output:
dan#dantop2:~$ ./mult.py
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in child
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in child
event is <multiprocessing.synchronize.Event object at 0x7f93cd7a48d0> in parent
As you can see, it's the same event in the two child processes as well as the parent. Just like you want.
However, passing event to setup actually isn't necessary. You can just inherit the event instance from the parent process:
import multiprocessing

event = None

def my_worker(num):
    print "event is %s in child" % event

if __name__ == "__main__":
    event = multiprocessing.Event()
    pool = multiprocessing.Pool(2)
    pool.map_async(my_worker, [i for i in range(pool._processes)])  # Just call my_worker for every process in the pool.
    pool.close()
    pool.join()
    print "event is %s in parent" % event
Output:
dan#dantop2:~$ ./mult.py
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in child
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in child
event is <multiprocessing.synchronize.Event object at 0x7fea3b1dc8d0> in parent
This is a lot simpler, and is the preferred way to pass a semaphore between parent and child. In fact, if you were to try to pass the event directly to a worker function, you'd get an error:
RuntimeError: Semaphore objects should only be shared between processes through inheritance
Now, back to how you're misunderstanding the way Event works. Event is meant to be used like this:
import time
import multiprocessing

def event_func(num):
    print '\t%r is waiting' % multiprocessing.current_process()
    event.wait()
    print '\t%r has woken up' % multiprocessing.current_process()

if __name__ == "__main__":
    event = multiprocessing.Event()
    pool = multiprocessing.Pool()
    a = pool.map_async(event_func, [i for i in range(pool._processes)])
    print 'main is sleeping'
    time.sleep(2)
    print 'main is setting event'
    event.set()
    pool.close()
    pool.join()
Output:
main is sleeping
<Process(PoolWorker-1, started daemon)> is waiting
<Process(PoolWorker-2, started daemon)> is waiting
<Process(PoolWorker-4, started daemon)> is waiting
<Process(PoolWorker-3, started daemon)> is waiting
main is setting event
<Process(PoolWorker-2, started daemon)> has woken up
<Process(PoolWorker-1, started daemon)> has woken up
<Process(PoolWorker-4, started daemon)> has woken up
<Process(PoolWorker-3, started daemon)> has woken up
As you can see, the child processes need to explicitly call event.wait() to be paused. They get unpaused when event.set() is called in the main process. Right now none of your workers is calling event.wait(), so none of them can ever be paused. I suggest you take a look at the docs for threading.Event, which multiprocessing.Event replicates.
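Tying this back to the question's code, here is a minimal sketch (not from the original answer) of what myFunct could look like if it used the unpaused event installed by the setup() initializer; unpaused.wait() blocks the worker while the event is cleared and returns immediately while it is set:

def myFunct(arg):
    proc = mp.current_process()
    for i in range(110):
        # Block here whenever the main process calls event.clear();
        # resume as soon as it calls event.set() again.
        unpaused.wait()
        for n in range(500000):
            pass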