Stop a process when an error occurs in multiprocessing - Python

I have created 3 processes in Python; the code is attached below.
Now I want to stop the running p2 and p3 processes when p1 raises an error. I have an idea to add p2.terminate(), but I don't know where to add it in this case. Thanks in advance.
import multiprocessing

def table(a):
    try:
        for i in range(100):
            print(i, 'x', a, '=', a * i)
    except:
        print("error")

processes = []
p1 = multiprocessing.Process(target=table, args=['s'])
p2 = multiprocessing.Process(target=table, args=[5])
p3 = multiprocessing.Process(target=table, args=[2])
p1.start()
p2.start()
p3.start()
processes.append(p1)
processes.append(p2)
processes.append(p3)
for process in processes:
    process.join()

To stop the other processes once one of them terminates due to an error, first set up your target table() to exit with an appropriate exit code > 0:
import sys

def table(args):
    try:
        for i in range(100):
            print(i, 'x', args, '=', args * i)
    except:
        sys.exit(1)
    sys.exit(0)
Then you can start your processes and poll them to see if any one has terminated:
#!/usr/bin/env python3
# coding: utf-8
import multiprocessing
import time
import logging
import sys

logging.basicConfig(level=logging.INFO,
                    format='[%(asctime)-15s] [%(processName)-10s] %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')

def table(args):
    try:
        for i in range(5):
            logging.info('{} x {} = {}'.format(i, args, i * args))
            if isinstance(args, str):
                raise ValueError()
            time.sleep(5)
    except Exception:
        logging.error('Done in Error Path: {}'.format(args))
        sys.exit(1)
    logging.info('Done in Success Path: {}'.format(args))
    sys.exit(0)

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=table, args=('s',))
    p2 = multiprocessing.Process(target=table, args=(5,))
    p3 = multiprocessing.Process(target=table, args=(2,))
    processes = [p1, p2, p3]
    for process in processes:
        process.start()

    while True:
        failed = []
        completed = []
        for process in processes:
            if process.exitcode is not None and process.exitcode != 0:
                failed.append(process)
            elif process.exitcode == 0:
                completed.append(process)
        if failed:
            for process in processes:
                if process not in failed:
                    logging.info('Terminating Process: {}'.format(process))
                    process.terminate()
            break
        if len(completed) == len(processes):
            break
        time.sleep(1)
Essentially, you are using terminate() to stop the remaining processes that are still running.
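A cooperative alternative (not part of the answer above, just a sketch) is to share a multiprocessing.Event: the failing worker sets it, and the other workers check it on every iteration and return on their own instead of being terminated from outside. The stop_event parameter and the artificial string check are assumptions added for illustration:
import multiprocessing
import sys
import time

def table(a, stop_event):
    try:
        for i in range(100):
            if stop_event.is_set():       # another worker failed; stop early
                return
            if isinstance(a, str):        # same artificial failure as in the answer above
                raise ValueError('bad argument: {}'.format(a))
            print(i, 'x', a, '=', a * i)
            time.sleep(0.1)
    except Exception:
        stop_event.set()                  # tell the other workers to stop
        sys.exit(1)

if __name__ == '__main__':
    stop_event = multiprocessing.Event()
    processes = [multiprocessing.Process(target=table, args=(arg, stop_event))
                 for arg in ('s', 5, 2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()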

To stop all workers when one of them has hit an error, I use this code block:
import multiprocessing
import sys
import time

processes = []
for j in range(0, n_core):   # n_core and table come from your own code
    p = multiprocessing.Process(target=table, args=('some input',))
    processes.append(p)
    time.sleep(0.1)
    p.start()

flag = True
while flag:
    flag = False
    for p in processes:
        if p.exitcode == 1:
            for z in processes:
                z.kill()   # Process.kill() requires Python 3.7+
            sys.exit(1)
        elif p.is_alive():
            flag = True

for p in processes:
    p.join()

First, I have modified the function table to throw an exception that is not caught when the argument passed to it is 's', and to delay 0.1 seconds otherwise before printing, to give the main process a chance to realize that the sub-process threw an exception and to cancel the other processes before they have started printing. Otherwise, the other processes would have completed before you could cancel them. Here I am using a process pool, which supports a terminate method that conveniently terminates all submitted, uncompleted tasks without having to cancel each one individually (although that is also an option).
The code creates a multiprocessing pool of size 3, since that is the number of "tasks" being submitted, and then uses method apply_async to submit the 3 tasks to run in parallel (assuming you have at least 3 processors). apply_async returns an AsyncResult instance whose get method can be called to wait for the completion of the submitted task and to get the return value from the worker function table, which is None for the second and third tasks and of no interest, or which will re-raise the exception if the worker function had an uncaught exception, as is the case with the first task submitted:
import multiprocessing
import time

def table(a):
    if a == 's':
        raise Exception('I am "s"')
    time.sleep(.1)
    for i in range(100):
        print(i, 'x', a, '=', a * i)

# required for Windows:
if __name__ == '__main__':
    pool = multiprocessing.Pool(3)  # create a pool of 3 processes
    result1 = pool.apply_async(table, args=('s',))
    result2 = pool.apply_async(table, args=(5,))
    result3 = pool.apply_async(table, args=(2,))
    try:
        result1.get()  # wait for completion of first task
    except Exception as e:
        print(e)
        pool.terminate()  # kill all processes in the pool
    else:
        # wait for all submitted tasks to complete:
        pool.close()
        pool.join()
        """
        # or alternatively:
        result2.get()  # wait for second task to finish
        result3.get()  # wait for third task to finish
        """
Prints:
I am "s"

Related

Python multiprocessing processes terminating?

I've read a number of answers here on Stack Overflow about Python multiprocessing, and I think this one is the most useful for my purposes: python multiprocessing queue implementation.
Here is what I'd like to do: poll the database for new work, put it in the queue, and have 4 processes continuously do the work. What I'm unclear on is what happens when an item in the queue is done being processed. In the question above, the process terminates when the queue is empty. However, in my case, I'd just like to keep waiting until there is data in the queue. So do I just sleep and periodically check the queue? Then my worker processes would never die; is that good practice?
from multiprocessing import Process, Queue
import time

queue = Queue()

def mp_worker(queue):
    while True:
        if queue.qsize() == 0:
            time.sleep(20)
        else:
            db_record = queue.get()
            process_file(db_record)

def mp_handler():
    num_workers = 4
    processes = [Process(target=mp_worker, args=(queue,)) for _ in range(num_workers)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

if __name__ == '__main__':
    db_conn = db.create_postgre_connection(DB_CONFIG)
    while True:
        db_records = db.retrieve_received_files(db_conn)
        if len(db_records) > 0:
            for db_record in db_records:
                queue.put(db_record)
            mp_handler()
        else:
            time.sleep(20)
    db_conn.close()
Does it make sense?
Thanks.
Figured it out. Workers have to die, since otherwise they never return. But I start a new set of workers when there is data anyway, so that's not a problem. Updated code:
from multiprocessing import Process, Queue
import time

queue = Queue()

def mp_worker(queue):
    while queue.qsize() > 0:
        db_record = queue.get()
        process_file(db_record)

def mp_handler():
    num_workers = 4
    if queue.qsize() < num_workers:
        num_workers = queue.qsize()
    processes = [Process(target=mp_worker, args=(queue,)) for _ in range(num_workers)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

if __name__ == '__main__':
    while True:
        db_records = db.retrieve_received_files(DB_CONN)
        print(db_records)
        if len(db_records) > 0:
            for db_record in db_records:
                queue.put(db_record)
            mp_handler()
        else:
            time.sleep(20)
    DB_CONN.close()
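As a side note (not part of the original answer): qsize() is not implemented on every platform (it raises NotImplementedError on macOS, for instance), so a common alternative is to block on queue.get() and shut the workers down with a sentinel value. A rough sketch, assuming the same process_file() as above:
import multiprocessing

SENTINEL = None  # value that means "no more work"

def mp_worker(queue):
    while True:
        db_record = queue.get()       # blocks until an item is available
        if db_record is SENTINEL:     # shutdown signal from the parent
            break
        process_file(db_record)       # assumed to be defined elsewhere, as in the question

def mp_handler(db_records, num_workers=4):
    queue = multiprocessing.Queue()
    for db_record in db_records:
        queue.put(db_record)
    for _ in range(num_workers):      # one sentinel per worker
        queue.put(SENTINEL)
    processes = [multiprocessing.Process(target=mp_worker, args=(queue,))
                 for _ in range(num_workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()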

How to timeout a running function and all its child processes on Linux?

How to force a function and all its child processes to time out on Linux?
For example, how could multiprocessed_func be forced to finish after 10s:
import time

def multiprocessed_func(seconds):
    # Assume this is a long-running function which uses
    # multiprocessing internally and returns None.
    time.sleep(seconds)

try:
    multiprocessed_func(600)
except:
    print('took too long')
Borrowing from the psutil docs, we could inspect the current process and terminate or kill all the child processes after a given time.
def terminate_children(grace_period):
    procs = psutil.Process().children()
    for p in procs:
        p.terminate()
    gone, still_alive = psutil.wait_procs(procs, timeout=grace_period)
    for p in still_alive:
        p.kill()
    raise TimeoutError

try:
    multiprocessed_func(long_run=600)
    time.sleep(10)  # then timeout
    terminate_children(grace_period=2)
except TimeoutError:
    print('timed out')
    pass
Full example:
import multiprocessing
import time
import psutil

def slow_worker(long_run):
    print('started')
    time.sleep(long_run)
    print('finished')

def multiprocessed_func(long_run):
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=slow_worker, args=(long_run,))
        jobs.append(p)
        p.start()
        print('starting', p.pid)

def on_terminate(proc):
    print('terminating {}, exit code {}'.format(proc, proc.returncode))

def terminate_children(grace_period):
    procs = psutil.Process().children()
    for p in procs:
        p.terminate()
    gone, still_alive = psutil.wait_procs(procs, timeout=grace_period,
                                          callback=on_terminate)
    for p in still_alive:
        p.kill()
    raise TimeoutError

try:
    multiprocessed_func(long_run=600)
    time.sleep(10)
    terminate_children(grace_period=2)
except TimeoutError:
    print('timed out')
    pass
If terminating all the child processes in the current process is excessive because there are additional multiprocessed methods in the current process that need to be preserved, then we could wrap multiprocessed_func in another process.
def safe_run(timeout, grace_period):
    try:
        multiprocessed_func(long_run=600)
        time.sleep(timeout)
        terminate_children(grace_period)
    except TimeoutError:
        pass

timeout, grace_period = 10, 2
p = multiprocessing.Process(target=safe_run, args=(timeout, grace_period,))
p.start()
p.join()

p.terminate()
time.sleep(2)
if p.is_alive():
    p.kill()

Python multiprocessing - check status of each process

I wonder if it is possible to check how long each process takes.
For example, there are four workers and the job should take no more than 10 seconds, but one of the workers takes more than 10 seconds. Is there a way to raise an alert after 10 seconds, before the process finishes the job?
My initial thought was to use a Manager, but it seems I have to wait until the process has finished.
Many thanks.
You can check whether a process is alive after you have tried to join it. Don't forget to set a timeout, otherwise it will wait until the job is finished.
Here is a simple example for you:
from multiprocessing import Process
import time

def task():
    time.sleep(5)

procs = []
for x in range(2):
    proc = Process(target=task)
    procs.append(proc)
    proc.start()

time.sleep(2)
for proc in procs:
    proc.join(timeout=0)
    if proc.is_alive():
        print("Job is not finished!")
I found this solution some time ago (somewhere here on Stack Overflow) and I am very happy with it.
Basically, it uses signal to raise an exception if a process takes longer than expected.
All you need to do is to add this class to your code:
import signal

class Timeout:
    def __init__(self, seconds=1, error_message='TimeoutError'):
        self.seconds = seconds
        self.error_message = error_message
    def handle_timeout(self, signum, frame):
        raise TimeoutError(self.error_message)
    def __enter__(self):
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)
    def __exit__(self, type, value, traceback):
        signal.alarm(0)
Here is a general example of how it works:
import time

with Timeout(seconds=3, error_message='JobX took too much time'):
    try:
        time.sleep(10)  # your job
    except TimeoutError as e:
        print(e)
In your case, I would add the with statement to the job that your worker needs to perform. Then you catch the exception and do what you think is best, as in the sketch below.
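For example (a sketch, assuming the Timeout class above is defined at module level and available to the workers; note that signal.SIGALRM works only on Unix and only in the main thread of each process), a worker could look like this, where the 10-second limit and the sleep standing in for the real job are placeholders:
from multiprocessing import Process
import time

def worker(job_id):
    try:
        with Timeout(seconds=10, error_message='job {} took too long'.format(job_id)):
            time.sleep(15)   # placeholder for the real job
    except TimeoutError as e:
        print(e)             # raise your alert, log it, or clean up here

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()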
Alternatively, you can periodically check if a process is alive:
timeout = 3  # seconds
start = time.time()
while time.time() - start < timeout:
    if any(proc.is_alive() for proc in processes):
        time.sleep(1)
    else:
        print('All processes done')
        break
else:
    print("Timeout!")
    # do something
Use Pipe and messages
from multiprocessing import Process, Pipe
import numpy as np

caller, worker = Pipe()
val1 = ['der', 'die', 'das']

def worker_function(info):
    print(info.recv())
    for i in range(10):
        print(val1[np.random.choice(3, 1)[0]])
    info.send(['job finished'])
    info.close()

def request(data):
    caller.send(data)
    task = Process(target=worker_function, args=(worker,))
    if not task.is_alive():
        print("task is requested")
        task.start()
    if caller.recv() == ['job finished']:
        task.join()
        print("finished")

if __name__ == '__main__':
    data = {'input': 'here'}
    request(data)
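If you want the alert after 10 seconds instead of blocking forever on caller.recv(), Connection.poll() accepts a timeout; here is a variation of the request() function above (the 10-second value and the terminate() call are just illustrative choices):
def request(data, timeout=10):
    caller.send(data)
    task = Process(target=worker_function, args=(worker,))
    task.start()
    if caller.poll(timeout):                 # wait up to `timeout` seconds for a message
        if caller.recv() == ['job finished']:
            task.join()
            print("finished")
    else:
        print("worker did not report back within {} seconds".format(timeout))
        task.terminate()                     # or just raise an alert and keep waiting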

Python Locking Critical Section

I am trying to use the multiprocessing library in Python to process "tests" concurrently. I have a list of tests stored in the variable test_files. I want the workers to remove a test from test_files and call the process_test function on it. However, when I run this code, both processes run the same test. It seems that I am not accessing test_files in a thread-safe manner. What am I doing wrong?
Code
import multiprocessing

def process_worker(lock, test_files):
    # Keep going until we run out of tests
    while True:
        test_file = None
        # Critical section of code
        lock.acquire()
        try:
            if len(test_files) != 0:
                test_file = test_files.pop()
        finally:
            lock.release()
        # End critical section of code
        # If there is another test in the queue process it
        if test_file is not None:
            print("Running test {0} on worker {1}".format(test_file, multiprocessing.current_process().name))
            process_test(test_file)
        else:
            # No more tests to process
            return

# Mutex for workers
lock = multiprocessing.Lock()
# Declare our workers
p1 = multiprocessing.Process(target=process_worker, name="Process 1", args=(lock, test_files))
p2 = multiprocessing.Process(target=process_worker, name="Process 2", args=(lock, test_files))
# Start processing
p1.start()
p2.start()
# Block until both workers finish
p1.join()
p2.join()
Output
Running test "BIT_Test" on worker Process 1
Running test "BIT_Test" on worker Process 2
Trying to share a list like this is not the right approach here. You should use a process-safe data structure, like multiprocessing.Queue, or better yet, use a multiprocessing.Pool and let it handle the queuing for you. What you're doing is perfectly suited for Pool.map:
import multiprocessing

def process_worker(test_file):
    print("Running test {0} on worker {1}".format(test_file, multiprocessing.current_process().name))
    process_test(test_file)

p = multiprocessing.Pool(2)  # 2 processes in the pool
# map puts each item from test_files in a Queue, lets the
# two processes in our pool pull each item from the Queue,
# and then execute process_worker with that item as an argument.
p.map(process_worker, test_files)
p.close()
p.join()
Much simpler!
You could also use multiprocessing.Manager
import multiprocessing

def process_worker(lock, test_files):
    # Keep going until we run out of tests
    while True:
        test_file = None
        # Critical section of code
        lock.acquire()
        try:
            if len(test_files) != 0:
                test_file = test_files.pop()
        finally:
            lock.release()
        # End critical section of code
        # If there is another test in the queue process it
        if test_file is not None:
            print("Running test %s on worker %s" % (test_file, multiprocessing.current_process().name))
            #process_test(test_file)
        else:
            # No more tests to process
            return

# Mutex for workers
lock = multiprocessing.Lock()
manager = multiprocessing.Manager()
test_files = manager.list(['f1', 'f2', 'f3'])
# Declare our workers
p1 = multiprocessing.Process(target=process_worker, name="Process 1", args=(lock, test_files))
p2 = multiprocessing.Process(target=process_worker, name="Process 2", args=(lock, test_files))
# Start processing
p1.start()
p2.start()
# Block until both workers finish
p1.join()
p2.join()

Python multiprocessing module: join processes with timeout

I'm doing an optimization of parameters of a complex simulation. I'm using the multiprocessing module for enhancing the performance of the optimization algorithm. The basics of multiprocessing I learned at http://pymotw.com/2/multiprocessing/basics.html.
The complex simulation lasts a different amount of time depending on the parameters given by the optimization algorithm, around 1 to 5 minutes. If the parameters are chosen very badly, the simulation can last 30 minutes or more and the results are not useful. So I was thinking about building a timeout into the multiprocessing that terminates all simulations lasting longer than a defined time. Here is an abstracted version of the problem:
import numpy as np
import time
import multiprocessing

def worker(num):
    time.sleep(np.random.random()*20)

def main():
    pnum = 10
    procs = []
    for i in range(pnum):
        p = multiprocessing.Process(target=worker, args=(i,), name=('process_' + str(i+1)))
        procs.append(p)
        p.start()
        print('starting', p.name)
    for p in procs:
        p.join(5)
        print('stopping', p.name)

if __name__ == "__main__":
    main()
The line p.join(5) defines a timeout of 5 seconds. Because of the for-loop for p in procs: the program waits 5 seconds for the first process to finish, then another 5 seconds for the second process, and so on, but I want the program to terminate all processes that last more than 5 seconds. Additionally, if none of the processes lasts longer than 5 seconds, the program must not wait the full 5 seconds.
You can do this by creating a loop that will wait for some timeout amount of seconds, frequently checking to see if all processes are finished. If they don't all finish in the allotted amount of time, then terminate all of the processes:
TIMEOUT = 5
start = time.time()
while time.time() - start <= TIMEOUT:
    if not any(p.is_alive() for p in procs):
        # All the processes are done, break now.
        break
    time.sleep(.1)  # Just to avoid hogging the CPU
else:
    # We only enter this if we didn't 'break' above.
    print("timed out, killing all processes")
    for p in procs:
        p.terminate()
        p.join()
If you want to kill all the processes, you could use the Pool from multiprocessing.
You'll need to define a general timeout for the whole execution as opposed to individual timeouts.
import numpy as np
import time
from multiprocessing import Pool

def worker(num):
    xtime = np.random.random()*20
    time.sleep(xtime)
    return xtime

def main():
    pnum = 10
    pool = Pool()
    args = range(pnum)
    pool_result = pool.map_async(worker, args)
    # wait 5 minutes for every worker to finish
    pool_result.wait(timeout=300)
    # once the timeout has finished we can try to get the results
    if pool_result.ready():
        print(pool_result.get(timeout=1))

if __name__ == "__main__":
    main()
This will get you a list with the return values for all your workers in order.
More information here:
https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
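If the timeout expires while some workers are still running, ready() will be False; here is a sketch of the missing else branch (my addition, not part of the answer above) that terminates the pool in that case:
    # inside main(), after pool_result.wait(timeout=300):
    if pool_result.ready():
        print(pool_result.get(timeout=1))
    else:
        print("timed out, terminating remaining workers")
        pool.terminate()   # stops the worker processes that are still running
        pool.join()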
Thanks to the help of dano I found a solution:
import numpy as np
import time
import multiprocessing

def worker(num):
    time.sleep(np.random.random()*20)

def main():
    pnum = 10
    TIMEOUT = 5
    procs = []
    bool_list = [True]*pnum
    for i in range(pnum):
        p = multiprocessing.Process(target=worker, args=(i,), name=('process_' + str(i+1)))
        procs.append(p)
        p.start()
        print('starting', p.name)

    start = time.time()
    while time.time() - start <= TIMEOUT:
        for i in range(pnum):
            bool_list[i] = procs[i].is_alive()
        print(bool_list)
        if np.any(bool_list):
            time.sleep(.1)
        else:
            break
    else:
        print("timed out, killing all processes")
        for p in procs:
            p.terminate()

    for p in procs:
        print('stopping', p.name, '=', p.is_alive())
        p.join()

if __name__ == "__main__":
    main()
It's not the most elegant way; I'm sure there is a better way than using bool_list. Processes that are still alive after the timeout of 5 seconds will be killed. If you set shorter times in the worker function than the timeout, you will see that the program stops before the 5-second timeout is reached. I'm still open to more elegant solutions if there are any :)
