Timeout for each thread in ThreadPool in Python

I am using Python 2.7.
I am currently using ThreadPoolExecutor like this:
params = [1,2,3,4,5,6,7,8,9,10]
with concurrent.futures.ThreadPoolExecutor(5) as executor:
    result = list(executor.map(f, params))
The problem is that f sometimes runs for too long. Whenever I run f, I want to limit it to 100 seconds and kill it if it runs longer.
Eventually, for each element x in params, I would like an indication of whether or not f had to be killed and, if it wasn't, what its return value was.
Even if f times out for one parameter, I still want to run it with the next parameters.
The executor.map method does have a timeout parameter, but it sets a timeout for the entire run, measured from the call to executor.map, and not for each call separately.
What is the easiest way to get my desired behavior?

This answer is in terms of Python's multiprocessing library, which is usually preferable to the threading library unless your functions mostly wait on I/O such as network calls. It also matters here because Python offers no way to kill a thread, whereas a process can be terminated. Note that the multiprocessing and threading libraries have largely the same interface.
Given that your processes may run for up to 100 seconds each, the overhead of creating a process for each one is small in comparison. You will probably have to create your own processes to get the necessary control.
One option is to wrap f in another function that lets it execute for at most 100 seconds:
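from multiprocessing import Pool, TimeoutError
def timeout_f(arg):
    pool = Pool(processes=1)
    try:
        return pool.apply_async(f, [arg]).get(timeout=100)
    except TimeoutError as e:
        return e  # f timed out; result will hold this TimeoutError
    finally:
        pool.terminate()  # kill the worker process whether or not f finished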
Then your code changes to:
result = list(executor.map(timeout_f, params))
Alternatively, you could write your own thread/process control:
from multiprocessing import Process
from time import time
def chunks(l, n):
    """ Yield successive n-sized chunks from l. """
    for i in xrange(0, len(l), n):
        yield l[i:i+n]
if __name__ == "__main__":
    processes = [Process(target=f, args=(i,)) for i in params]
    exit_codes = []
    for five_processes in chunks(processes, 5):
        for p in five_processes:
            p.start()
        start = time()
        for p in five_processes:
            # Wait only for what is left of this batch's 100-second budget
            p.join(max(0, 100 - (time() - start)))
            if p.is_alive():
                p.terminate()  # still running after 100 seconds: kill it
                p.join()
        for p in five_processes:
            exit_codes.append(p.exitcode)
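You would have to get the return values through a technique such as those in "Can I get a return value from multiprocessing.Process?".
The exit code of each process is 0 if it completed normally and non-zero if it was terminated (on POSIX, terminate() sends SIGTERM, so exitcode becomes -signal.SIGTERM). A process in which f raised an uncaught exception also ends with a non-zero exit code.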
Techniques from: "Join a group of python processes with a timeout" and "How do you split a list into evenly sized chunks?"
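To see the exit-code behaviour described above in isolation, here is a small sketch that joins a child with a timeout and terminates it on overrun. It assumes Python 3 and the POSIX "fork" start method (via multiprocessing.get_context("fork")), so it will not run on Windows; work, exit_code_after and KILLED are hypothetical names for illustration.

```python
import multiprocessing as mp
import signal
import time

# Assumption: POSIX "fork" start method, so the target need not be importable
ctx = mp.get_context("fork")

# Exit code of a process stopped by terminate() on POSIX
KILLED = -signal.SIGTERM

def work(seconds):
    # Hypothetical task: just sleeps for the given time
    time.sleep(seconds)

def exit_code_after(seconds, timeout):
    """Run work(seconds) in a child, kill it after `timeout` seconds, return its exitcode."""
    p = ctx.Process(target=work, args=(seconds,))
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()  # sends SIGTERM on POSIX
        p.join()
    return p.exitcode

if __name__ == "__main__":
    print(exit_code_after(0.1, 5))             # finished normally: 0
    print(exit_code_after(5, 0.5) == KILLED)   # killed by terminate()
```

Checking p.is_alive() after the timed join, rather than calling terminate() unconditionally, keeps the exit code of a process that finished in time at 0.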
As another option, you could use apply_async on multiprocessing.Pool:
from multiprocessing import Pool, TimeoutError
if __name__ == "__main__":
    pool = Pool(processes=5)
    processes = [pool.apply_async(f, [i]) for i in params]
    results = []
    for process in processes:
        try:
            results.append(process.get(timeout=100))
        except TimeoutError as e:
            results.append(e)
    pool.terminate()  # get() does not kill a timed-out task; this does
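Note that this may allow a task to run for more than 100 seconds: each get() starts its 100-second clock only when it is called, so if the first task takes 50 seconds, the second task will have had 50 extra seconds of run time before its own clock starts. A timed-out get() also does not stop the task; it keeps occupying its worker until the pool is terminated. Enforcing stricter timeouts needs more complicated logic, such as the previous example.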
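None of the snippets above directly produces the per-parameter "was it killed, and if not, what did it return" pair the question asks for. One way to get it is to keep the ThreadPoolExecutor and let each pool thread supervise a single child process that it can kill. This is a sketch under several assumptions, not a drop-in solution: it targets Python 3 rather than the question's 2.7, slow_square is a hypothetical stand-in for f, run_with_timeout and _child are illustrative names, and it uses the POSIX-only "fork" start method so the function does not have to be importable by a fresh interpreter. Because each child's deadline starts when that child starts, one slow call cannot eat into another call's budget.

```python
import multiprocessing as mp
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Assumption: POSIX "fork" start method, so func need not be importable by a new interpreter
ctx = mp.get_context("fork")

def slow_square(x):
    # Hypothetical stand-in for f: sleeps x seconds, then returns x * x
    time.sleep(x)
    return x * x

def _child(conn, func, arg):
    # Runs in the child process: send func's return value back to the parent
    conn.send(func(arg))
    conn.close()

def run_with_timeout(func, arg, timeout):
    """Return (timed_out, value); value is None when func had to be killed."""
    reader, writer = ctx.Pipe(duplex=False)
    p = ctx.Process(target=_child, args=(writer, func, arg))
    p.start()
    writer.close()  # the child has its own copy; the parent only reads
    if reader.poll(timeout):  # wait at most `timeout` seconds for something to read
        try:
            value = reader.recv()
        except EOFError:  # the child exited without sending, i.e. func raised
            p.join()
            raise RuntimeError("func raised an exception in the child process")
        p.join()
        return False, value
    p.terminate()  # no result in time: kill the child
    p.join()
    return True, None

if __name__ == "__main__":
    params = [0.5, 1, 3]
    # Each pool thread only supervises one child process, so up to 5 run at once
    call = partial(run_with_timeout, slow_square, timeout=2)
    with ThreadPoolExecutor(5) as executor:
        results = list(executor.map(call, params))
    print(results)
```

The threads spend almost all their time blocked in poll(), so the GIL is not a concern here; the actual work happens in the child processes, which is what makes killing them possible.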

Related

Run N processes but never reuse the same process

I'd like to run a bunch of processes concurrently, but I never want to reuse an already existing process. Basically, once a process has finished I'd like to create a new one, but at no time should the number of processes exceed N.
I don't think I can use multiprocessing.Pool for this since it reuses processes.
How can I achieve this?
One solution would be to run N processes and wait until all of them are done, then repeat until all tasks are done. This is not a very good solution, since the processes can have very different runtimes.
Here is a naive solution that appears to work fine:
from multiprocessing import Process, Queue
import random
import os
from time import sleep
def f(q):
    print(f"{os.getpid()} Starting")
    sleep(random.choice(range(1, 10)))
    q.put("Done")
def create_proc(q):
    p = Process(target=f, args=(q,))
    p.start()
if __name__ == "__main__":
    q = Queue()
    N = 5
    for n in range(N):
        create_proc(q)
    while True:
        q.get()
        create_proc(q)
Pool can limit how many times a process is reused, including to a single task when you pass maxtasksperchild=1. You might also try initializer to see whether you can run the picky once-per-process parts of your library there instead of in your pool jobs.
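A sketch of the maxtasksperchild=1 suggestion, assuming Python 3 and the POSIX "fork" start method; task and pids_used are hypothetical names. Each task reports the PID of the process that ran it, so all-distinct PIDs indicate that no worker was reused, while processes=n_workers keeps at most N workers alive at once. Passing chunksize=1 matters: map() hands items to workers in chunks, and maxtasksperchild counts a whole chunk as one task.

```python
import multiprocessing as mp
import os
import time

# Assumption: POSIX "fork" start method, so task need not be importable by a new interpreter
ctx = mp.get_context("fork")

def task(x):
    # Hypothetical job: a short pause, then report which process ran it
    time.sleep(0.05)
    return os.getpid()

def pids_used(n_tasks, n_workers):
    # maxtasksperchild=1: each worker exits after one task and the pool starts a fresh one
    # chunksize=1: otherwise several items would form one "task" for the same worker
    with ctx.Pool(processes=n_workers, maxtasksperchild=1) as pool:
        return pool.map(task, range(n_tasks), chunksize=1)

if __name__ == "__main__":
    pids = pids_used(10, 3)
    print(len(pids), "tasks ran in", len(set(pids)), "distinct processes")
```

Compared with the hand-rolled Queue loop above, the pool handles starting replacements and collecting results, at the cost of a short delay while it notices an exited worker and spawns a new one.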

Periodically restart Python multiprocessing pool

I have a Python multiprocessing pool doing a very long job that, even after thorough debugging, is not robust enough: it fails every 24 hours or so, because it depends on many third-party, non-Python tools with complex interactions. The underlying machine also has certain problems that I cannot control. By failing I don't mean the whole program crashing, but some or most of the processes becoming idle because of errors, and the app itself either hanging or continuing the job with only the processes that haven't failed.
My solution right now is to periodically kill the job, manually, and then just restart from where it was.
Even if it's not ideal, what I want to do now is the following: restart the multiprocessing pool periodically and programmatically, from the Python code itself. I don't really care if this means killing the pool workers in the middle of their job. What would be the best way to do that?
My code looks like:
with Pool() as p:
    for _ in p.imap_unordered(function, data):
        save_checkpoint()
        log()
What I have in mind would be something like:
start = 0
end = 1000 # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    with Pool() as p:
        for _ in p.imap_unordered(function, current_data):
            save_checkpoint()
            log()
            start += 1
            end += 1
Or:
start = 0
end = 1000 # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    start_timeout(time=TIMEOUT) # what would be the best way to do that without breaking multiprocessing?
    try:
        with Pool() as p:
            for _ in p.imap_unordered(function, current_data):
                save_checkpoint()
                log()
                start += 1
                end += 1
    except Timeout:
        pass
Or any suggestion you think would be better. Any help would be much appreciated, thanks!
The problem with your current code is that it iterates the multiprocessed results directly, and that call will block. Fortunately there's an easy solution: use apply_async exactly as suggested in the docs. But because of how you describe the use-case here and the failure, I've adapted it somewhat. Firstly, a mock task:
from multiprocessing import Pool, TimeoutError, cpu_count
from time import sleep
from random import randint
def log():
    print("logging is a dangerous activity: wear a hard hat.")
def work(d):
    sleep(randint(1, 100) / 100)
    print("finished working")
    if randint(1, 10) == 1:
        print("blocking...")
        while True:
            sleep(0.1)
    return d
This work function blocks indefinitely (simulating a failure) with a probability of 0.1. We create the tasks:
data = list(range(100))
nproc = cpu_count()
And then generate futures for all of them:
while data:
    print(f"== Processing {len(data)} items. ==")
    failed = []  # tasks that did not return within the timeout
    with Pool(nproc) as p:
        tasks = [p.apply_async(work, (d,)) for d in data]
Then we can try to get the tasks out manually:
        for task in tasks:
            try:
                res = task.get(timeout=1)
                data.remove(res)
                log()
            except TimeoutError:
                failed.append(task)
                if len(failed) < nproc:
                    print(
                        f"{len(failed)} processes are blocked,"
                        f" but {nproc - len(failed)} remain."
                    )
                else:
                    break
The controlling timeout here is the timeout passed to .get(). It should be as long as you expect the longest task to take. Note that we detect when the whole pool is tied up and give up on it.
But since, in the scenario you describe, some tasks are going to take longer than others, we can give 'failed' tasks some time to recover. So every time a task fails, we quickly check whether the previously failed ones have in fact succeeded:
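                # inside the except TimeoutError: branch, right after failed.append(task)
                for stalled in list(failed):  # iterate over a copy, since we remove items
                    try:
                        res = stalled.get(timeout=0.01)
                        data.remove(res)
                        failed.remove(stalled)
                        log()
                    except TimeoutError:
                        continue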
Whether this is a good addition in your case depends on whether your tasks really are as flaky as I'm guessing they are.
Exiting the context manager for the pool will terminate the pool, so we don't even need to handle that ourselves. If you have significant variation you might want to increase the pool size (thus increasing the number of tasks which are allowed to stall) or allow tasks a grace period before considering them 'failed'.
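The question's own loop uses imap_unordered, and that iterator's next() method also accepts a timeout, which offers a shorter route to the same restart-on-stall behaviour. A sketch under these assumptions: Python 3, the POSIX "fork" start method, work is a hypothetical task that hangs forever on item 3, and run_with_restarts (an illustrative name) gives up after max_restarts restarts. Each result is returned together with its input so the loop knows which items are still pending. Leaving the with-block terminates the pool, killing any stuck worker, and the next round resubmits only the unfinished items.

```python
import multiprocessing as mp
import time

# Assumption: POSIX "fork" start method, so work need not be importable by a new interpreter
ctx = mp.get_context("fork")

def work(d):
    # Hypothetical task: item 3 hangs forever, every other item returns quickly
    if d == 3:
        while True:
            time.sleep(0.1)
    return d, d * 10  # return the input too, so the caller knows which item finished

def run_with_restarts(func, data, timeout, max_restarts, processes=2):
    """Return (results, pending): finished items, and items that never finished."""
    results = {}
    pending = list(data)
    for _ in range(max_restarts + 1):
        if not pending:
            break
        with ctx.Pool(processes) as pool:  # leaving the block terminates the pool
            it = pool.imap_unordered(func, pending)
            while True:
                try:
                    key, value = it.next(timeout)
                except StopIteration:
                    break  # every item in this round finished
                except mp.TimeoutError:
                    break  # nothing arrived in time: assume a worker is stuck
                results[key] = value  # e.g. save_checkpoint() would go here
        pending = [d for d in pending if d not in results]
    return results, pending

if __name__ == "__main__":
    done, stuck = run_with_restarts(work, range(6), timeout=1, max_restarts=2)
    print("finished:", sorted(done), "still stuck:", stuck)
```

As with the apply_async version, the timeout has to exceed the longest time you expect between two consecutive results, otherwise healthy but slow work triggers a restart.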

Using concurrent.futures to call a fn in parallel every second

I've been trying to get to grips with how I can use concurrent.futures to call a function 3 times every second, without waiting for it to return. I will collect the results after I've made all the calls I need to make.
Here is where I am at the moment, and I'm surprised that sleep() within this example function prevents my code from launching the next chunk of 3 function calls. I'm obviously not understanding the documentation well enough here :)
import time
from concurrent.futures import ProcessPoolExecutor, as_completed
def print_something(thing):
    print(thing)
    time.sleep(10)
# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    for current_chunk in chunks(alphabet, 3): # Restrict to calling the function 3 times per second
        with ProcessPoolExecutor(max_workers=3) as executor:
            futures = { executor.submit(print_something, thing): thing for thing in current_chunk }
            chunk_number += 1
            print('chunk %s' % chunk_number)
            time.sleep(1)
    for result in as_completed(futures):
        print(result.result())
This code results in chunks of 3 being printed with a 10-second pause between chunks. How can I change this so that I'm not waiting for the function to return before launching the next batch?
Thanks
First, for each iteration of for current_chunk in chunks(alphabet, 3):, you are creating a new ProcessPoolExecutor instance and futures dictionary instance clobbering the previous one. So the final loop for result in as_completed(futures): would only be printing the results from the last chunk submitted. Second, and the reason why I believe you are hanging, your block that is governed by with ProcessPoolExecutor(max_workers=3) as executor: will not terminate until the tasks that are submitted by the executor are completed and that will take at least 10 seconds. So, the next iteration of the for current_chunk in chunks(alphabet, 3): block won't be executed more frequently than once every 10 seconds.
Note also that the block for result in as_completed(futures): needs to be moved within the with ThreadPoolExecutor(max_workers=26) as executor: block for the same reason. That is, if it is placed after, it will not be executed until all the tasks have completed and so you will not be able to get results "as they complete."
You need to do a bit of rearranging, as shown below (I have also modified print_something to return something other than None). There should be no delays now if you have enough workers (26) to run the 26 tasks being submitted. I doubt your desktop (if you are running this on your PC) has 26 cores to support 26 concurrently executing processes. But print_something only prints a short string and then sleeps for 10 seconds, which lets it relinquish its processor to another process in the pool. So while little is gained with CPU-intensive tasks by specifying a max_workers value greater than the number of physical cores on your computer, in this case it's OK. However, when tasks spend little time executing actual Python bytecode, it is more efficient to use threads instead of processes, since threads are much cheaper to create than processes. Conversely, threading performs notoriously poorly when your tasks consist largely of Python bytecode, since such code cannot execute concurrently: the Global Interpreter Lock (GIL) serializes it.
Topic for you to research: The Global Interpreter Lock (GIL) and Python byte code execution
Update to use threads:
So we should substitute a ThreadPoolExecutor with 26 or more lightweight threads for the ProcessPoolExecutor. The beauty of the concurrent.futures module is that no other code needs to change. But most important is to change the block structure and use a single executor.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
def print_something(thing):
    # NOT cpu-intensive, so threads should work well here
    print(thing)
    time.sleep(10)
    return thing # so there is a non-None result
# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    futures = {}
    with ThreadPoolExecutor(max_workers=26) as executor:
        for current_chunk in chunks(alphabet, 3): # Restrict to calling the function 3 times per second
            futures.update({executor.submit(print_something, thing): thing for thing in current_chunk })
            chunk_number += 1
            print('chunk %s' % chunk_number)
            time.sleep(1)
        # needs to be within the executor block else it won't run until all futures are complete
        for result in as_completed(futures):
            print(result.result())
if __name__ == '__main__':
    main()
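The rearranged loop still sleeps a full second after each chunk regardless of how long submitting and printing took, so small delays add up over many chunks. A sketch that paces submissions against a fixed schedule with time.monotonic(), while never waiting for results; submit_at_rate and slow_echo are hypothetical names, and per_tick/interval stand in for the "3 calls per second" in the question.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def slow_echo(thing):
    # Hypothetical task that takes longer than one tick
    time.sleep(0.5)
    return thing

def submit_at_rate(executor, fn, items, per_tick, interval):
    """Submit `per_tick` calls every `interval` seconds; never wait for results."""
    futures = {}
    next_tick = time.monotonic()
    for start in range(0, len(items), per_tick):
        for item in items[start:start + per_tick]:
            futures[executor.submit(fn, item)] = item
        next_tick += interval
        if start + per_tick < len(items):
            # Sleep until the next scheduled tick, so delays do not accumulate
            time.sleep(max(0.0, next_tick - time.monotonic()))
    return futures

if __name__ == "__main__":
    letters = list("abcdefghijklmnopqrstuvwxyz")
    with ThreadPoolExecutor(max_workers=len(letters)) as executor:
        futures = submit_at_rate(executor, slow_echo, letters, per_tick=3, interval=1.0)
        for fut in as_completed(futures):
            print(futures[fut], fut.result())
```

Because submit_at_rate returns as soon as the last chunk is submitted, the as_completed loop can still sit inside the with-block and report results as they finish.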

Timing a multiprocessing script

I've stumbled across a weird timing issue while using the multiprocessing module.
Consider the following scenario. I have functions like this:
import multiprocessing as mp
def workerfunc(x):
    # timehook 3
    # something with x
    # timehook 4
def outer():
    # do something
    mygen = ... (some generator expression)
    pool = mp.Pool(processes=8)
    # time hook 1
    result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
    # time hook 2
if __name__ == '__main__':
    outer()
I am using the time module to get a rough feeling for how long my functions run. I successfully create 8 separate processes, which terminate without error. The longest time for a worker to finish is about 130 ms (measured between timehook 3 and 4).
I expected (as they are running in parallel) that the time between hooks 1 and 2 would be approximately the same. Surprisingly, I get 600 ms.
My machine has 32 cores and should be able to handle this easily. Can anybody give me a hint where this difference in time comes from?
Thanks!
You are using pool.apply, which is blocking: each call waits for its result before the next one is submitted, so the tasks run one after another. Use pool.apply_async instead; each call then returns an AsyncResult object immediately, so the function calls can all run in parallel. You can use this object to check when the processes are done and to retrieve the results.
Since you are using multiprocessing and not multithreading, your performance issue is not related to the GIL (Python's Global Interpreter Lock).
I've found an interesting link explaining this with an example; you can find it at the bottom of this answer.
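To illustrate the difference, here is a sketch that swaps in multiprocessing.pool.ThreadPool, which exposes the same apply/apply_async methods as Pool, so that it runs without pickling or start-method concerns. Here workerfunc just sleeps 0.2 s as an assumed stand-in for the roughly 130 ms of real work, and run_blocking, run_async and timed are hypothetical names. With eight tasks, the apply version should take roughly eight times as long as one task, while the apply_async version should take roughly as long as one.

```python
from multiprocessing.pool import ThreadPool
import time

def workerfunc(x):
    time.sleep(0.2)  # stands in for roughly 130 ms of real work
    return x * x

def run_blocking(pool, xs):
    # apply() waits for each call to finish before the next one is submitted
    return [pool.apply(workerfunc, (x,)) for x in xs]

def run_async(pool, xs):
    # apply_async() returns an AsyncResult at once, so all calls can overlap
    handles = [pool.apply_async(workerfunc, (x,)) for x in xs]
    return [h.get() for h in handles]  # collect only after everything is submitted

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    with ThreadPool(processes=8) as pool:
        _, t_block = timed(run_blocking, pool, range(8))
        _, t_async = timed(run_async, pool, range(8))
    print(f"apply: {t_block:.2f}s, apply_async: {t_async:.2f}s")
```

The same change applies unchanged to mp.Pool in the question: keep the list of AsyncResult objects and call get() on them only after all tasks have been submitted.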
The GIL does not prevent a process from running on a different processor of a machine. It simply only allows one thread to run at once within the interpreter.
So multiprocessing, not multithreading, will allow you to achieve true concurrency.
Let's understand this all through some benchmarking, because only that will lead you to believe what is said above. And yes, that should be the way to learn: experience it rather than just read it or understand it. Because if you have experienced something, no amount of argument can convince you of the opposing view.
import random
from threading import Thread
from multiprocessing import Process
size = 10000000   # Number of random numbers to add to list
threads = 2 # Number of threads to create
my_list = []
for i in xrange(0,threads):
    my_list.append([])
def func(count, mylist):
    for i in range(count):
        mylist.append(random.random())
def multithreaded():
    jobs = []
    for i in xrange(0, threads):
        thread = Thread(target=func,args=(size,my_list[i]))
        jobs.append(thread)
    # Start the threads
    for j in jobs:
        j.start()
    # Ensure all of the threads have finished
    for j in jobs:
        j.join()
def simple():
    for i in xrange(0, threads):
        func(size,my_list[i])
def multiprocessed():
    processes = []
    for i in xrange(0, threads):
        p = Process(target=func,args=(size,my_list[i]))
        processes.append(p)
    # Start the processes
    for p in processes:
        p.start()
    # Ensure all processes have finished execution
    for p in processes:
        p.join()
if __name__ == "__main__":
    multithreaded()
    #simple()
    #multiprocessed()
Additional information
Here you can find the source of this information and a more detailed technical explanation (bonus: there are also quotes from Guido van Rossum in it :) )
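The quoted benchmark is Python 2 (xrange) and leaves the timing to you. Here is a Python 3 sketch that times the same three strategies in one run, assuming the POSIX "fork" start method. The names burn, run_jobs and compare are hypothetical, and burn sums integers instead of appending random numbers so that no large lists are built. On a machine with at least two free cores you would expect the threaded run to be no faster than the sequential one and the process run to be noticeably faster, but the exact numbers depend on your hardware.

```python
import multiprocessing as mp
import threading
import time

# Assumption: POSIX "fork" start method, so burn need not be importable by a new interpreter
ctx = mp.get_context("fork")

def burn(n):
    # CPU-bound pure-Python loop: it holds the GIL while it runs
    total = 0
    for i in range(n):
        total += i
    return total

def run_jobs(factory, n, workers):
    """Start `workers` jobs running burn(n) via factory (Thread or Process); time until all join."""
    t0 = time.perf_counter()
    jobs = [factory(target=burn, args=(n,)) for _ in range(workers)]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - t0

def compare(n=2_000_000, workers=2):
    t0 = time.perf_counter()
    for _ in range(workers):
        burn(n)  # sequential baseline
    return {
        "simple": time.perf_counter() - t0,
        "threads": run_jobs(threading.Thread, n, workers),
        "processes": run_jobs(ctx.Process, n, workers),
    }

if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label:>9}: {seconds:.2f}s")
```

Process start-up has a fixed cost, so with a very small n the process run can come out slowest; increase n until each job takes at least a few hundred milliseconds before drawing conclusions.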

Why my parallel code is slower than the sequential

I am trying to implement an online recursive algorithm that is highly parallelizable. My problem is that my Python implementation does not work as I want. I have two 2D matrices, and I want to recursively update every column each time a new observation arrives at time-step t.
My parallel code is like this:
def apply_async(t):
    worker = mp.Pool(processes = 4)
    for i in range(4):
        X[:,i,np.newaxis], b[:,i,np.newaxis] = worker.apply_async(OULtraining, args=(train[t,i], X[:,i,np.newaxis], b[:,i,np.newaxis])).get()
    worker.close()
    worker.join()
for t in range(p,T):
    count = 0
    for l in range(p):
        for k in range(4):
            gn[count]=train[t-l-1,k]
            count+=1
    G = G*v + gn # gn.T
    Gt = (1/(t-p+1))*G
    if __name__ == '__main__':
        apply_async(t)
The two matrices are X and b. I want to update them directly in the master process's memory, since each process recursively updates only one specific column of the matrices.
Why this implementation is slower than the sequential?
Is there any way to resume the process every time-step rather than killing them and create them again? Could this be the reason it is slower?
The reason is, your program is in practice sequential. This is an example code snippet that is from parallelism standpoint identical to yours:
from multiprocessing import Pool
from time import sleep
def gwork( qq):
    print (qq)
    sleep(1)
    return 42
if __name__ == "__main__":
    p = Pool(processes=4)
    for q in range(1, 10):
        p.apply_async(gwork, args=(q,)).get()
    p.close()
    p.join()
Run this and you will notice the numbers 1-9 appearing one per second. Why is this? The reason is your .get(): every call to apply_async in practice blocks in get() until a result is available. It submits one task, waits a second (emulating processing delay), and returns the result, and only then is the next task submitted to your pool. This means there is no parallel execution at all.
Try replacing the pool management part with this:
results = []
for q in range(1, 10):
    res = p.apply_async(gwork, args=(q,))
    results.append(res)
p.close()
p.join()
for r in results:
    print (r.get())
You can now see parallelism at work, as four of your tasks are now processed simultaneously. Your loop does not block in get, as get is moved out of the loop and results are received only when they are ready.
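The question also asks whether the workers can be kept alive between time-steps. In the original code a new Pool is created and shut down inside apply_async(t) on every step, which adds process start-up cost each time. A sketch that creates the pool once and reuses it for every step, submitting all column updates before collecting any. Assumptions: update_column is a hypothetical stand-in for OULtraining, plain lists stand in for the NumPy columns, and the POSIX "fork" start method is used. Results still come back as copies, so the parent stores them itself.

```python
import multiprocessing as mp

# Assumption: POSIX "fork" start method, so update_column need not be importable by a new interpreter
ctx = mp.get_context("fork")

def update_column(column, observation):
    # Hypothetical stand-in for OULtraining: returns an updated copy of one column
    return [value + observation for value in column]

def run_online(columns, observations):
    """columns: one list per column; observations: one row (one value per column) per step."""
    with ctx.Pool(processes=len(columns)) as pool:  # created once, reused every time-step
        for row in observations:
            # Submit every column's update first, then collect, so the updates overlap
            handles = [pool.apply_async(update_column, (col, obs))
                       for col, obs in zip(columns, row)]
            columns = [h.get() for h in handles]  # the next step needs these results
    return columns

if __name__ == "__main__":
    print(run_online([[0.0, 0.0], [1.0, 1.0]], [[1.0, 2.0], [10.0, 20.0]]))
```

Each step still has to wait for all columns before the next one starts, because the recursion depends on the previous step; the parallelism is across columns within a step.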
NB: If the arguments to your worker or the values it returns are large data structures, you will lose some performance. Arguments and return values are pickled and sent between processes through pipes and queues, and transmitting a lot of data that way is slow compared with the in-memory copy of a data structure a subprocess gets when it is forked.
