Python multiprocessing module: join processes with timeout


I'm optimizing the parameters of a complex simulation, and I'm using the multiprocessing module to speed up the optimization algorithm. I learned the basics of multiprocessing at http://pymotw.com/2/multiprocessing/basics.html.
The simulation takes a different amount of time depending on the parameters chosen by the optimization algorithm, usually around 1 to 5 minutes. If the parameters are chosen very badly, the simulation can last 30 minutes or more and the results are not useful. So I was thinking about building a timeout into the multiprocessing code that terminates all simulations that last longer than a defined time. Here is an abstracted version of the problem:
import numpy as np
import time
import multiprocessing
def worker(num):
    time.sleep(np.random.random()*20)
def main():
    pnum = 10
    procs = []
    for i in range(pnum):
        p = multiprocessing.Process(target=worker, args=(i,), name=('process_' + str(i+1)))
        procs.append(p)
        p.start()
        print('starting', p.name)
    for p in procs:
        p.join(5)
        print('stopping', p.name)
if __name__ == "__main__":
    main()
The line p.join(5) defines the timeout of 5 seconds. Because of the loop for p in procs:, the program waits up to 5 seconds for the first process to finish, then up to another 5 seconds for the second process, and so on. What I want is for the program to terminate all processes that last more than 5 seconds in total. Additionally, if none of the processes lasts longer than 5 seconds, the program must not wait the full 5 seconds.

You can do this with a loop that runs for at most the timeout, frequently checking whether all processes have finished. If they don't all finish in the allotted time, terminate all of the processes:
TIMEOUT = 5
start = time.time()
while time.time() - start <= TIMEOUT:
    if not any(p.is_alive() for p in procs):
        # All the processes are done, break now.
        break
    time.sleep(.1)  # Just to avoid hogging the CPU
else:
    # We only enter this if we didn't 'break' above.
    print("timed out, killing all processes")
    for p in procs:
        p.terminate()
        p.join()
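A variant that avoids polling is sketched below: compute a single deadline up front and give each join() only the time still remaining until that deadline. The helper name join_all and the short sleep durations are illustrative assumptions, not part of the answer above.

```python
import multiprocessing
import time


def join_all(procs, timeout):
    """Wait up to `timeout` seconds in total, then terminate stragglers.

    Returns the list of processes that had to be terminated.
    """
    deadline = time.monotonic() + timeout
    for p in procs:
        # Each join gets only the time left until the shared deadline.
        p.join(max(0.0, deadline - time.monotonic()))
    stragglers = [p for p in procs if p.is_alive()]
    for p in stragglers:
        p.terminate()
    for p in stragglers:
        p.join()  # reap the terminated process
    return stragglers


def worker(seconds):
    time.sleep(seconds)


if __name__ == "__main__":
    durations = [0.2, 0.5, 30]  # the last worker exceeds the timeout
    procs = [multiprocessing.Process(target=worker, args=(s,)) for s in durations]
    for p in procs:
        p.start()
    killed = join_all(procs, timeout=2)
    print("terminated:", [p.name for p in killed])
```

If every process finishes early, each join() returns as soon as its process exits, so the function does not wait out the full timeout.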

If you want to kill all the processes, you could use the Pool from multiprocessing.
You'll need to define one general timeout for the whole execution, as opposed to individual timeouts.
import numpy as np
import time
from multiprocessing import Pool
def worker(num):
    xtime = np.random.random()*20
    time.sleep(xtime)
    return xtime
def main():
    pnum = 10
    pool = Pool()
    args = range(pnum)
    pool_result = pool.map_async(worker, args)
    # wait 5 minutes for every worker to finish
    pool_result.wait(timeout=300)
    # once the timeout has finished we can try to get the results
    if pool_result.ready():
        print(pool_result.get(timeout=1))
if __name__ == "__main__":
    main()
This will get you a list with the return values for all your workers in order.
More information here:
https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
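Note that wait(timeout=...) only stops waiting; the workers themselves keep running. One way to enforce the limit, sketched here under assumptions, is to call get(timeout=...) and terminate the pool when multiprocessing.TimeoutError is raised. The helper name results_or_terminate and the sleep times are hypothetical.

```python
import multiprocessing
import time


def results_or_terminate(pool, async_result, timeout):
    """Return the results, or terminate the pool and return None on timeout."""
    try:
        return async_result.get(timeout=timeout)
    except multiprocessing.TimeoutError:
        pool.terminate()  # stop the worker processes without finishing the work
        return None


def worker(seconds):
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    # Leaving the with-block also calls pool.terminate().
    with multiprocessing.Pool(3) as pool:
        pending = pool.map_async(worker, [0.1, 0.2, 30])
        print(results_or_terminate(pool, pending, timeout=2))
```

With map_async the results arrive all together, so a single slow task means no results are returned at all; this matches the "one general timeout" described above.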

Thanks to the help of dano I found a solution:
import numpy as np
import time
import multiprocessing
def worker(num):
    time.sleep(np.random.random()*20)
def main():
    pnum = 10
    TIMEOUT = 5
    procs = []
    bool_list = [True]*pnum
    for i in range(pnum):
        p = multiprocessing.Process(target=worker, args=(i,), name=('process_' + str(i+1)))
        procs.append(p)
        p.start()
        print('starting', p.name)
    start = time.time()
    while time.time() - start <= TIMEOUT:
        for i in range(pnum):
            bool_list[i] = procs[i].is_alive()
        print(bool_list)
        if np.any(bool_list):
            time.sleep(.1)
        else:
            break
    else:
        print("timed out, killing all processes")
        for p in procs:
            p.terminate()
    for p in procs:
        print('stopping', p.name,'=', p.is_alive())
        p.join()
if __name__ == "__main__":
    main()
It's not the most elegant way; I'm sure there is a better way than using bool_list. Processes that are still alive after the timeout of 5 seconds will be killed. If you set shorter times in the worker function than the timeout, you will see that the program stops before the timeout of 5 seconds is reached. I'm still open to more elegant solutions if there are any :)

Related

Stop a process when error occur in multiprocessing

I have created 3 processes in Python. The code is attached below.
Now I want to stop the running p2 and p3 processes because p1 hit an error. My idea is to call p2.terminate(), but I don't know where to add it in this case. Thanks in advance.
import multiprocessing
def table(a):
    try:
        for i in range(100):
            print(i,'x',a,'=',a*i)
    except:
        print("error")
processes = []
p1= multiprocessing.Process(target = table,args=['s'])
p2= multiprocessing.Process(target = table,args=[5])
p3= multiprocessing.Process(target = table,args=[2])
p1.start()
p2.start()
p3.start()
processes.append(p1)
processes.append(p2)
processes.append(p3)
for process in processes:
    process.join()

To stop the remaining processes once one of them terminates due to an error, first set up your target table() to exit with an appropriate exit code > 0:
import sys
def table(a):
    try:
        for i in range(100):
            print(i,'x', a ,'=', a*i)
    except:
        sys.exit(1)
    sys.exit(0)
Then you can start your processes and poll the processes to see if any one has terminated.
#!/usr/bin/env python3
# coding: utf-8
import multiprocessing
import time
import logging
import sys
logging.basicConfig(level=logging.INFO, format='[%(asctime)-15s] [%(processName)-10s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
def table(args):
    try:
        for i in range(5):
            logging.info('{} x {} = {}'.format(i, args, i*args))
            if isinstance(args, str):
                raise ValueError()
            time.sleep(5)
    except:
        logging.error('Done in Error Path: {}'.format(args))
        sys.exit(1)
    logging.info('Done in Success Path: {}'.format(args))
    sys.exit(0)
if __name__ == '__main__':
    p1 = multiprocessing.Process(target=table, args=('s',))
    p2 = multiprocessing.Process(target=table, args=(5,))
    p3 = multiprocessing.Process(target=table, args=(2,))
    processes = [p1, p2, p3]
    for process in processes:
        process.start()
    while True:
        failed = []
        completed = []
        for process in processes:
            # exitcode stays None while the process is still running
            if process.exitcode is not None:
                if process.exitcode != 0:
                    failed.append(process)
                else:
                    completed.append(process)
        if failed:
            for process in processes:
                if process not in failed:
                    logging.info('Terminating Process: {}'.format(process))
                    process.terminate()
            break
        if len(completed) == len(processes):
            break
        time.sleep(1)
Essentially, you are using terminate() to stop the remaining processes that are still running.

To stop all processes when one of them hits an error, I use this code block:
processes = []
for j in range(0, n_core):
    p = multiprocessing.Process(target=table, args=('some input',))
    processes.append(p)
    time.sleep(0.1)
    p.start()
flag = True
while flag:
    flag = False
    for p in processes:
        if p.exitcode == 1:
            for z in processes:
                z.kill()
            sys.exit(1)
        elif p.is_alive():
            flag = True
for p in processes:
    p.join()
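The polling loops above can also be replaced by blocking on the processes' sentinel handles with multiprocessing.connection.wait(), which returns as soon as any process ends. This is only a sketch: the function name stop_on_first_failure is hypothetical, and it treats any non-zero exit code as a failure.

```python
from multiprocessing.connection import wait


def stop_on_first_failure(procs):
    """Block until every process has exited cleanly or one has failed.

    Returns True if all exit codes were 0; otherwise terminates the
    processes that are still pending and returns False.
    """
    pending = list(procs)
    while pending:
        # wait() returns the sentinels of the processes that have ended.
        ready = wait([p.sentinel for p in pending])
        for p in [p for p in pending if p.sentinel in ready]:
            p.join()  # make sure the exit code has been collected
            pending.remove(p)
            if p.exitcode != 0:
                for other in pending:
                    other.terminate()
                for other in pending:
                    other.join()
                return False
    return True
```

It is meant to be called after all processes have been started, for example ok = stop_on_first_failure(processes), in place of the while loop.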

First, I have modified the function table to raise an uncaught exception when the argument passed to it is 's', and otherwise to delay 0.1 seconds before printing. The delay gives the main process a chance to notice that the sub-process threw an exception and to cancel the other processes before they have started printing; otherwise, the other processes would have completed before you could cancel them. Here I am using a process pool, which supports a terminate method that conveniently terminates all submitted, uncompleted tasks without having to cancel each one individually (although that is also an option).
The code creates a multiprocessing pool of size 3, since that is the number of "tasks" being submitted, and then uses the method apply_async to submit the 3 tasks to run in parallel (assuming you have at least 3 processors). apply_async returns an AsyncResult instance whose get method can be called to wait for the completion of the submitted task. get returns the return value of the worker function table, which is None for the second and third tasks and of no interest, or raises an exception if the worker function had an uncaught exception, which is the case with the first task submitted:
import multiprocessing
import time
def table(a):
    if a == 's':
        raise Exception('I am "s"')
    time.sleep(.1)
    for i in range(100):
        print(i,'x',a,'=',a*i)
# required for Windows:
if __name__ == '__main__':
    pool = multiprocessing.Pool(3) # create a pool of 3 processes
    result1 = pool.apply_async(table, args=('s',))
    result2 = pool.apply_async(table, args=(5,))
    result3 = pool.apply_async(table, args=(2,))
    try:
        result1.get() # wait for completion of first task
    except Exception as e:
        print(e)
        pool.terminate() # kill all processes in the pool
    else:
        # wait for all submitted tasks to complete:
        pool.close()
        pool.join()
        """
        # or alternatively:
        result2.get() # wait for second task to finish
        result3.get() # wait for third task to finish
        """
Prints:
I am "s"

with multiprocessing, why sequential execution

I am following some examples online to learn how to program in parallel, i.e., how to use multiprocessing.
I am running on windows 10, with spyder 3.3.6, python 3.7.
import os
import time
from multiprocessing import Process, Queue
def square(numbers, queue):
    print("started square")
    for i in numbers:
        queue.put(i*i)
        print(i*i)
    print(f"{os.getpid()}")
def cube(numbers, queue):
    print("started cube")
    for i in numbers:
        queue.put(i*i*i)
        print(i*i*i)
    print(f"{os.getpid()}")
if __name__ == '__main__':
    numbers = range(5)
    queue = Queue()
    square_process = Process(target=square, args=(numbers,queue))
    cube_process = Process(target=cube, args=(numbers,queue))
    square_process.start()
    cube_process.start()
    square_process.join()
    cube_process.join()
    print("Already joined")
    while not queue.empty():
        print(queue.get())
I expected the contents of the queue to be interleaved or unpredictable, since the order should depend on how fast each process starts and how quickly the first process finishes its statements.
In theory, we could get something like 0, 1, 4, 8, 9, 27, 16, 64.
But the actual output is sequential, as shown below:
0
1
4
9
16
0
1
8
27
64

There are a few things to understand here:
1) Two processes are executing the square and cube functions independently. Within each function the order is preserved, because it is governed by the for loop.
2) The only part that varies from run to run is which process is executing and adding what to the queue at a given moment. So it may be that the square process is in its 5th iteration (i = 4) while the cube process is in its 2nd iteration (i = 1).
3) You are using a single Queue instance to collect items from the two processes that execute square and cube separately. Queues are first in, first out (FIFO), so when you get from the queue (and print in the main process) the items come out in the order in which the queue received them.
Run the following updated version of your program to understand this better:
import os
import time
from multiprocessing import Process, Queue
def square(numbers, queue):
    print("started square process id is %s"%os.getpid())
    for i in numbers:
        queue.put("Square of %s is %s "%(i, i*i))
        print("square: added %s in queue:"%i)
def cube(numbers, queue):
    print("started cube process id is %s"%os.getpid())
    for i in numbers:
        queue.put("Cube of %s is %s "%(i, i*i*i))
        print("cube: added %s in queue:"%i)
if __name__ == '__main__':
    numbers = range(15)
    queue = Queue()
    square_process = Process(target=square, args=(numbers,queue))
    cube_process = Process(target=cube, args=(numbers,queue))
    square_process.start()
    cube_process.start()
    square_process.join()
    cube_process.join()
    print("Already joined")
    while not queue.empty():
        print(queue.get())

I'm pretty sure this is just because spinning up a process takes some time, so the processes tend to run one after the other.
I rewrote it to give the jobs a better chance of running in parallel:
from multiprocessing import Process, Queue
from time import time, sleep
def fn(queue, offset, start_time):
    # Sleep until the shared start time; never pass a negative value to sleep()
    sleep(max(0, start_time - time()))
    for i in range(10):
        queue.put(offset + i)
if __name__ == '__main__':
    queue = Queue()
    start_time = time() + 0.1
    procs = []
    for i in range(2):
        args = (queue, i * 10, start_time)
        procs.append(Process(target=fn, args=args))
    for p in procs: p.start()
    for p in procs: p.join()
    while not queue.empty():
        print(queue.get())
I should note that I get the nondeterministic ordering of output that you seemed to be expecting. I'm on Linux, so you might get something different on Windows, but I think that's unlikely.

Looks like MisterMiyagi is right: starting an additional Python process is much more expensive than calculating the squares of 0 to 4 :) I've created a version of the code that uses a lock primitive, so now we can be sure that both processes have started before either one begins its work.
import os
from multiprocessing import Process, Queue, Lock
def square(numbers, queue, lock):
    print("started square")
    # Block here, until lock release
    lock.acquire()
    for i in numbers:
        queue.put(i*i)
    print(f"{os.getpid()}")
def cube(numbers, queue, lock):
    # Finally release lock
    lock.release()
    print("started cube")
    for i in numbers:
        queue.put(i*i*i)
    print(f"{os.getpid()}")
if __name__ == '__main__':
    numbers = range(5)
    queue = Queue()
    lock = Lock()
    # Activate lock
    lock.acquire()
    square_process = Process(target=square, args=(numbers,queue,lock))
    cube_process = Process(target=cube, args=(numbers,queue,lock))
    square_process.start()
    cube_process.start()
    cube_process.join()
    square_process.join()
    print("Already joined")
    while not queue.empty():
        print(queue.get())
My output is:
0
0
1
4
1
9
8
16
27
64
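The same "start together" idea can be expressed with multiprocessing.Barrier, which releases all waiting processes once the agreed number of them has arrived. The sketch below is an assumption-laden variant, not the poster's code: the function name produce, the offsets and the item count are made up for illustration.

```python
import multiprocessing


def produce(barrier, queue, offset, count):
    """Wait until every worker is ready, then put `count` numbers on the queue."""
    barrier.wait()  # released only once all parties have arrived
    for i in range(count):
        queue.put(offset + i)


if __name__ == "__main__":
    workers, count = 2, 5
    barrier = multiprocessing.Barrier(workers)
    queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=produce, args=(barrier, queue, w * 100, count))
        for w in range(workers)
    ]
    for p in procs:
        p.start()
    # Read a known number of items instead of relying on queue.empty().
    items = [queue.get() for _ in range(workers * count)]
    for p in procs:
        p.join()
    print(items)
```

The interleaving of the two ranges in the printed list can differ from run to run; within each range the order is preserved.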

The processes themselves are not doing anything CPU heavy or network bound, so they take a negligible amount of time to execute. My guess would be that by the time the second process has started, the first one has already finished. Processes are parallel by nature, but since your tasks are so small it gives the illusion that they are being run sequentially. You can introduce some randomness into your script to see the parallelism in action:
import os
from multiprocessing import Process, Queue
from random import randint
from time import sleep
def square(numbers, queue):
    print("started square")
    for i in numbers:
        if randint(0,1000)%2==0:
            sleep(3)
        queue.put(i*i)
        print(i*i)
    print(f"square PID : {os.getpid()}")
def cube(numbers, queue):
    print("started cube")
    for i in numbers:
        if randint(0,1000)%2==0:
            sleep(3)
        queue.put(i*i*i)
        print(i*i*i)
    print(f"cube PID : {os.getpid()}")
if __name__ == '__main__':
    numbers = range(5)
    queue = Queue()
    square_process = Process(target=square, args=(numbers,queue))
    cube_process = Process(target=cube, args=(numbers,queue))
    square_process.start()
    cube_process.start()
    square_process.join()
    cube_process.join()
    print("Already joined")
    while not queue.empty():
        print(queue.get())
Here the two processes randomly pause their execution, so while one process is paused the other one gets a chance to add a number to the queue (multiprocessing.Queue is thread and process safe). If you run this script a couple of times, you'll see that the order of items in the queue is not always the same.

p.close and p.join in multiprocessing.Pool

I am following an instruction from youtube to learn multiprocessing
from multiprocessing import Pool
import subprocess
import time
def f(n):
    sum = 0
    for x in range(1000):
        sum += x*x
    return sum
if __name__ == "__main__":
    t1 = time.time()
    p = Pool()
    result = p.map(f, range(10000))
    p.close()
    p.join()
    print("Pool took: ", time.time()-t1)
I am puzzled by p.close() and p.join().
Once the processes have been closed they no longer exist, so how can .join() be called on them?

close() doesn't kill any process. It tells the pool that no more tasks will be submitted; once the tasks already submitted have completed, the worker processes exit. join() then waits for those worker processes to exit, which is why close() (or terminate()) has to be called before join().
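A minimal sketch of that order of calls follows. The helper names sum_squares and run_and_shutdown are hypothetical, and the pool is passed in as a parameter only to keep the example small.

```python
from multiprocessing import Pool


def sum_squares(n):
    """Sum of x*x for x in 0 .. n-1."""
    return sum(x * x for x in range(n))


def run_and_shutdown(pool, items):
    results = pool.map(sum_squares, items)  # blocks until every result is in
    pool.close()  # no more tasks will be submitted; workers exit when idle
    pool.join()   # wait for those worker processes to exit
    return results


if __name__ == "__main__":
    print(run_and_shutdown(Pool(2), [1, 2, 3, 4]))  # prints [0, 1, 5, 14]
```

Because map() already blocks until all results are available, close() and join() here only make sure the worker processes are gone before the program continues.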

Python multiprocessing processes terminating?

I've read a number of answers here on Stackoverflow about Python multiprocessing, and I think this one is the most useful for my purposes: python multiprocessing queue implementation.
Here is what I'd like to do: poll the database for new work, put it in the queue and have 4 processes continuously do the work. What I'm unclear on is what happens when an item in the queue is done being processed. In the question above, the process terminates when the queue is empty. However, in my case, I'd just like to keep waiting until there is data in the queue. So do I just sleep and periodically check the queue? So my worker processes will never die? Is that good practice?
def mp_worker(queue):
    while True:
        if (queue.qsize() == 0):
            time.sleep(20)
        else:
            db_record = queue.get()
            process_file(db_record)
def mp_handler():
    num_workers = 4
    processes = [Process(target=mp_worker, args=(queue,)) for _ in range(num_workers)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
if __name__ == '__main__':
    db_conn = db.create_postgre_connection(DB_CONFIG)
    while True:
        db_records = db.retrieve_received_files(DB_CONN)
        if (len(db_records) > 0):
            for db_record in db_records:
                queue.put(db_record)
            mp_handler()
        else:
            time.sleep(20)
    db_conn.close()
Does it make sense?
Thanks.

Figured it out. Workers have to die, since otherwise they never return. But I start a new set of workers when there is data anyway, so that's not a problem. Updated code:
def mp_worker(queue):
    while queue.qsize() > 0 :
        db_record = queue.get()
        process_file(db_record)
def mp_handler():
    num_workers = 4
    if (queue.qsize() < num_workers):
        num_workers = queue.qsize()
    processes = [Process(target=mp_worker, args=(queue,)) for _ in range(num_workers)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
if __name__ == '__main__':
    while True:
        db_records = db.retrieve_received_files(DB_CONN)
        print(db_records)
        if (len(db_records) > 0):
            for db_record in db_records:
                queue.put(db_record)
            mp_handler()
        else:
            time.sleep(20)
    DB_CONN.close()
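For comparison, long-lived workers do not need a sleep-and-check loop: queue.get() blocks until an item arrives, and a sentinel value can tell each worker when to exit. The following is a sketch under assumptions: STOP, the doubling process_file stand-in and the integer "records" are placeholders for the real database work.

```python
import multiprocessing

STOP = None  # sentinel telling a worker to exit (an assumption of this sketch)


def process_file(record):
    """Hypothetical stand-in for the real work on one database record."""
    return record * 2


def mp_worker(tasks, results):
    while True:
        record = tasks.get()  # blocks until an item is available
        if record is STOP:
            break
        results.put(process_file(record))


if __name__ == "__main__":
    tasks, results = multiprocessing.Queue(), multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=mp_worker, args=(tasks, results))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for record in range(10):  # stands in for records polled from the database
        tasks.put(record)
    for _ in workers:  # one sentinel per worker
        tasks.put(STOP)
    output = sorted(results.get() for _ in range(10))
    for w in workers:
        w.join()
    print(output)
```

In a polling service the sentinels would only be sent at shutdown; until then the main loop keeps putting new records on the queue while the same four workers stay alive.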

Python multiprocessing with Queue (split loads dynamically)

I am trying to use multiprocessing to process a very large number of files.
I tried putting the list of files into a queue and having 3 workers split the load through a common Queue. However, this does not seem to work. I am probably misunderstanding the queue in the multiprocessing package.
Below is the example source code:
import multiprocessing
from multiprocessing import Queue
def worker(i, qu):
    """worker function"""
    while ~qu.empty():
        val=qu.get()
        print 'Worker:',i, ' start with file:',val
        j=1
        for k in range(i*10000,(i+1)*10000): # some time consuming process
            for j in range(i*10000,(i+1)*10000):
                j=j+k
        print 'Worker:',i, ' end with file:',val
if __name__ == '__main__':
    jobs = []
    qu=Queue()
    for j in range(100,110): # files numbers are from 100 to 110
        qu.put(j)
    for i in range(3): # 3 multiprocess
        p = multiprocessing.Process(target=worker, args=(i,qu))
        jobs.append(p)
        p.start()
    p.join()
Thanks for the comments.
I have since learned that using Pool is the best solution.
import multiprocessing
import time
def worker(val):
    """worker function"""
    print 'Worker: start with file:',val
    time.sleep(1.1)
    print 'Worker: end with file:',val
if __name__ == '__main__':
    file_list=range(100,110)
    p = multiprocessing.Pool(2)
    p.map(worker, file_list)

Three issues:
1) you are joining only on the 3rd process
2) Why not use multiprocessing.Pool?
3) race condition on qu.get()
1 & 3)
import multiprocessing
from multiprocessing import Queue
from Queue import Empty  # Python 2 module name; on Python 3 use "from queue import Empty"
def worker(i, qu):
    """worker function"""
    while 1:
        try:
            val=qu.get(timeout=1)  # wait at most 1 second for an item
        except Empty:
            break  # Yay no race condition
        print 'Worker:',i, ' start with file:',val
        j=1
        for k in range(i*10000,(i+1)*10000): # some time consuming process
            for j in range(i*10000,(i+1)*10000):
                j=j+k
        print 'Worker:',i, ' end with file:',val
if __name__ == '__main__':
    jobs = []
    qu=Queue()
    for j in range(100,110): # files numbers are from 100 to 110
        qu.put(j)
    for i in range(3): # 3 multiprocess
        p = multiprocessing.Process(target=worker, args=(i,qu))
        jobs.append(p)
        p.start()
    for p in jobs: #<--- join on all processes ...
        p.join()
2)
for how to use the Pool, see:
https://docs.python.org/2/library/multiprocessing.html

You are joining only the last of the processes you created. That means that if the first or the second process is still working when the third has finished, your main process goes down and kills the remaining processes before they are finished.
You should join them all in order to wait until they are finished:
for p in jobs:
    p.join()
Another thing is you should consider using qu.get_nowait() in order to get rid of the race condition between qu.empty() and qu.get().
For example:
try:
    while 1:
        message = self.queue.get_nowait()
        """ do something fancy here """
except Queue.Empty:
    pass
I hope that helps
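Separately from the join and race issues, the loop condition while ~qu.empty() in the question can never become false: ~ is bitwise NOT rather than logical negation, so both possible results are non-zero and therefore truthy. A small sketch (the helper name keeps_looping is hypothetical):

```python
def keeps_looping(empty):
    """Mimic the condition `while ~qu.empty()` for a given empty() result."""
    # bool is a subclass of int, so ~True equals ~1 and ~False equals ~0.
    return bool(~int(empty))


for empty in (True, False):
    print(empty, ~int(empty), keeps_looping(empty), not empty)
# True -2 True False
# False -1 True True
```

Writing while not qu.empty() gives the intended condition, although the race between empty() and get() described above still applies.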
