This is my code:
from multiprocessing import Pool, Lock
from datetime import datetime as dt
console_out = "/STDOUT/Console.out"
chunksize = 50
lock = Lock()
def writer(message):
    lock.acquire()
    with open(console_out, 'a') as out:
        out.write(message)
        out.flush()
    lock.release()
def conf_wrapper(state):
    import ProcessingModule as procs
    import sqlalchemy as sal

    stcd, nrows = state
    engine = sal.create_engine('postgresql://foo:bar@localhost:5432/schema')
    writer("State {s} started at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))
    with engine.connect() as conn, conn.begin():
        procs.processor(conn, stcd, nrows, chunksize)
    writer("\tState {s} finished at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))
def main():
    nprocesses = 12
    maxproc = 1
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]

    with open(console_out, 'w') as out:
        out.write("Starting at {n}\n".format(n=dt.now()))
        out.write("Using {p} processes..."
                  "\n".format(p=nprocesses))

    with Pool(processes=int(nprocesses), maxtasksperchild=maxproc) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)

    with open(console_out, 'a') as out:
        out.write("\nAll done at {n}".format(n=dt.now()))

if __name__ == '__main__':
    main()
The file console_out never has all 7 states in it; it always misses one or more states. Here is the output from the latest run:
Starting at 2016-07-27 21:46:58.638587
Using 12 processes...
State 44 started at: 2016-07-27 21:47:01.482322
State 02 started at: 2016-07-27 21:47:01.497947
State 11 started at: 2016-07-27 21:47:01.529198
State 10 started at: 2016-07-27 21:47:01.497947
State 11 finished at: 2016-07-27 21:47:15.701207
State 15 finished at: 2016-07-27 21:47:24.123164
State 44 finished at: 2016-07-27 21:47:32.029489
State 50 finished at: 2016-07-27 21:47:51.203107
State 10 finished at: 2016-07-27 21:47:53.046876
State 33 finished at: 2016-07-27 21:47:58.156301
State 02 finished at: 2016-07-27 21:48:18.856979
All done at 2016-07-27 21:48:18.992277
Why?
Note, OS is Windows Server 2012 R2.
Since you're running on Windows, nothing is inherited by worker processes. Each process runs the entire main program "from scratch".
In particular, with the code as written every process has its own instance of lock, and these instances have nothing to do with each other. In short, lock isn't supplying any inter-process mutual exclusion at all.
To fix this, the Pool constructor can be changed to call a once-per-process initialization function, to which you pass an instance of Lock(). For example, like so:
def init(L):
    global lock
    lock = L
and then add these arguments to the Pool() constructor:
initializer=init, initargs=(Lock(),),
And you no longer need the:
lock = Lock()
line.
Then the inter-process mutual exclusion will work as intended.
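Putting the pieces together, the relevant parts would look roughly like this (a minimal sketch of the change described above, reusing conf_wrapper and state_list from the question rather than rewriting the whole program):

from multiprocessing import Pool, Lock

def init(L):
    # Runs once in every worker process; publishes the one shared lock
    # so that writer() uses the same lock in all workers.
    global lock
    lock = L

def main():
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]
    with Pool(processes=12, maxtasksperchild=1,
              initializer=init, initargs=(Lock(),)) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)

if __name__ == '__main__':
    main()

The Lock() passed via initargs is created once in the main process, so every worker's global lock now refers to the same underlying lock, and the mutual exclusion actually works across processes.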
WITHOUT A LOCK
If you'd like to delegate all output to a dedicated writer process, you could skip the lock and use a queue instead to feed that process [and see later for a different version].
def writer_process(q):
    with open(console_out, 'w') as out:
        while True:
            message = q.get()
            if message is None:
                break
            out.write(message)
            out.flush() # can't guess whether you really want this
and change writer() to just:
def writer(message):
    q.put(message)
You would again need to use initializer= and initargs= in the Pool constructor so that all processes use the same queue.
Only one process should run writer_process(), and that can be started on its own as an instance of multiprocessing.Process.
Finally, to let writer_process() know it's time to quit (drain the queue and return), just run
q.put(None)
in the main process.
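Wiring that together might look something like this (a sketch, not the OP's exact program; the queue is handed to the pool workers through initializer/initargs just like the lock above, and the dedicated writer runs as its own Process):

from multiprocessing import Pool, Process, Queue

def init(q_):
    # Runs once per worker; gives writer() access to the shared queue.
    global q
    q = q_

def main():
    q = Queue()
    writer_proc = Process(target=writer_process, args=(q,))
    writer_proc.start()
    with Pool(processes=12, maxtasksperchild=1,
              initializer=init, initargs=(q,)) as pool:
        pool.map(conf_wrapper, state_list, chunksize=1)  # state_list as before
    q.put(None)          # sentinel: tell writer_process() to drain and return
    writer_proc.join()

if __name__ == '__main__':
    main()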
LATER
The OP settled on this version instead, because they needed to open the output file in other code simultaneously:
def writer_process(q):
    while True:
        message = q.get()
        if message == 'done':
            break
        else:
            with open(console_out, 'a') as out:
                out.write(message)
I don't know why the terminating sentinel was changed to "done". Any unique value works for this; None is traditional.
Related: note that this question is different from that question, notably in when the jobs are dispatched to the workers and when the results are gathered.
So I have this code:
import asyncio
import time
import multiprocessing as MP
from typing import List

# WORKER_COUNT, process_data_worker and fetch_data are defined elsewhere in the script.

mp_jobqueue = MP.Queue()
mp_mgr = MP.Manager()
mp_state = mp_mgr.dict()
mp_faileds = mp_mgr.list()

# the processing in process_data_worker is very CPU-intensive,
# thus totally not suitable for async.
workers: List[MP.Process] = []
for ident in range(0, WORKER_COUNT):
    print(ident, end=" ", flush=True)
    mp_state[ident] = None
    w = MP.Process(
        target=process_data_worker,
        args=(mp_jobqueue, mp_state, mp_faileds),
    )
    w.start()
    workers.append(w)

# fetch_data asynchronously fetches chunks of data,
# each chunk will be directly fed into the job queue to be processed
# by the workers
asyncio.run(fetch_data(mp_jobqueue))

# when we reach here, all data-fetching should have been finished
# and submitted to the workers' job queue

# wait until mp_jobqueue is empty AND all workers are IDLE
safed_workers = 0
while not mp_jobqueue.empty() or safed_workers < WORKER_COUNT:
    time.sleep(1.0)
    safed_workers = sum(1 for state in mp_state.values() if state == "IDLE")

# gather failed results
faileds = list(mp_faileds)

# close manager first to prevent GetOverlappedResult error
mp_mgr.shutdown()
mp_mgr.join()

# disband the workers
[mp_jobqueue.put("DIE") for _ in workers]
time.sleep(1.0)
mp_jobqueue.close()
[w.join() for w in workers]
So as you can see, I cannot use pool.map() to gather the "faileds".
This got me thinking, though:
Will it be better (performance-wise) to use another Queue for mp_faileds instead of a list like it is now? Because I only need an object that can handle "add into bag" and "take out from bag until bag is empty".
Edit: Just found out about multiprocessing.queues.SimpleQueue. The answers to this question, notably this particular answer, seem to hint that SimpleQueue might be even faster. Can someone confirm?
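For illustration, swapping the managed list for a queue would look roughly like this (a sketch only; whether it is actually faster than Manager().list() is exactly what is being asked, and SimpleQueue could be substituted the same way):

import multiprocessing as MP
from queue import Empty

mp_faileds = MP.Queue()            # workers do mp_faileds.put(failed_item)

# ... start workers and wait for them exactly as before ...

# "take out from bag until bag is empty"
faileds = []
while True:
    try:
        faileds.append(mp_faileds.get_nowait())
    except Empty:
        break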
I am using multiprocessing.pool.ThreadPool with N threads (e.g. 5 threads) and I wanted to check the total number of active threads in my process. To do that I am using the method threading.active_count(). I know it's a different module, but I found no other method to count the number of active threads in the multiprocessing package.
The expected result is N+1 (the number of threads I started plus the main thread), but I always get a higher number.
For ThreadPool(2) I am getting 6 active threads
For ThreadPool(5) I am getting 9 active threads
For ThreadPool(10) I am getting 14 active threads
It's important to say that threading.active_count() works fine when creating threads using the threading module. And I found out that multiprocessing.pool.ThreadPool is not well documented.
Can someone help me?
Reproducible code is shown below.
import threading
from multiprocessing.pool import ThreadPool
import time
import requests
import os
urls_to_download = [
'https://picsum.photos/seed/1/1920/1080',
'https://picsum.photos/seed/2/1920/1080',
'https://picsum.photos/seed/3/1920/1080',
'https://picsum.photos/seed/4/1920/1080',
'https://picsum.photos/seed/5/1920/1080',
'https://picsum.photos/seed/6/1920/1080',
'https://picsum.photos/seed/7/1920/1080',
'https://picsum.photos/seed/8/1920/1080',
'https://picsum.photos/seed/9/1920/1080',
'https://picsum.photos/seed/10/1920/1080',
'https://picsum.photos/seed/11/1920/1080',
'https://picsum.photos/seed/12/1920/1080',
'https://picsum.photos/seed/13/1920/1080',
'https://picsum.photos/seed/14/1920/1080',
'https://picsum.photos/seed/15/1920/1080',
'https://picsum.photos/seed/16/1920/1080',
'https://picsum.photos/seed/17/1920/1080'
]
output_dir = 'downloaded_images'
##
def download(url):
    print(f'downloading {url}')
    img_data = requests.get(url).content
    img_name = url.split('/')[-3]
    img_name = f'{img_name}.jpg'
    print(f'Received data for {img_name}')
    print(f'Active Threads: {threading.active_count()}')
    with open(os.path.join(output_dir, img_name), 'wb') as img_file:
        img_file.write(img_data)
number_of_threads = 2
os.makedirs(output_dir, exist_ok=True)  # ensure the output directory exists
t1 = time.perf_counter()
with ThreadPool(number_of_threads) as pool:
    pool.map(download, urls_to_download)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
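One way to see where the extra threads come from is to list them by name instead of just counting them. The sketch below (standalone, not part of the download script) enumerates the live threads inside a ThreadPool context; besides MainThread and the N worker threads, the pool keeps a few internal helper threads of its own (for managing workers, tasks and results), which would account for the constant offset of roughly 4 in the counts above:

import threading
from multiprocessing.pool import ThreadPool

def noop(_):
    return None

with ThreadPool(2) as pool:
    pool.map(noop, range(4))          # make sure the pool is fully spun up
    for t in threading.enumerate():   # list every live thread with its name
        print(t.name, '(daemon)' if t.daemon else '')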
I use python multiprocessing to compute some sort of scores on DNA sequences from a large file.
For that I wrote and use the script below.
I use a Linux machine with 48 CPUs in a Python 3.8 environment.
The code works fine, terminates correctly, and prints the processing time at the end.
Problem: when I use the htop command, I find that all 48 processes are still alive.
I don't know why, and I don't know what to add to my script to avoid this.
import csv
import sys
import datetime
import concurrent.futures
from itertools import combinations
import psutil
import time

nb_cpu = psutil.cpu_count(logical=False)

def fun_job(seq_1, seq_2):  # seq_i : (id, string)
    start = time.time()
    score_dist = compute_score_dist(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start  # id seq1, id seq2, score, time

def help_fun_job(nested_pair):
    return fun_job(nested_pair[0], nested_pair[1])

def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
        save_results_to_csv(results)
    finish = time.perf_counter()
    processing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {processing_time} hh:mm:ss')

def main():
    print("nb_cpu in this machine : ", nb_cpu)
    file_path = sys.argv[1]
    dict_ids_seqs = get_dict_ids_seqs(file_path)
    list_ids = list(dict_ids_seqs)  # This will convert the dict_keys to a list
    list_combined_ids = list(combinations(list_ids, 2))
    compute_using_multi_processing(list_combined_ids, dict_ids_seqs)

if __name__ == '__main__':
    main()
Thank you for your help.
Edit: adding the complete code for fun_job (after @Booboo's answer)
from Bio import Align

def fun_job(seq_1, seq_2):  # seq_i : (id, string)
    start = time.time()
    aligner = Align.PairwiseAligner()
    aligner.mode = 'global'
    score_dist = aligner.score(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start  # id seq1, id seq2, score, time
When the with ... as executor: block exits, there is an implicit call to executor.shutdown(wait=True). This waits for all pending futures to be done executing "and the resources associated with the executor have been freed", which presumably includes terminating the processes in the pool (if possible?). Why your program finishes (you say all the futures have completed executing) while the processes have not terminated is a bit of a mystery. But you haven't provided the code for fun_job, so who can say why this is so?
One thing you might try is to switch to using the multiprocessing.pool.Pool class from the multiprocessing module. It supports a terminate method, which is implicitly called when its context manager with block exits, that explicitly attempts to terminate all processes in the pool:
#import concurrent.futures
import multiprocessing
... # etc.

def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with multiprocessing.Pool(processes=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
        save_results_to_csv(results)
    finish = time.perf_counter()
    processing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {processing_time} hh:mm:ss')
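If you prefer not to rely on the context manager, the same cleanup can be requested explicitly. A minimal sketch (work_items here is just a stand-in for the list comprehension built above):

pool = multiprocessing.Pool(processes=nb_cpu)
try:
    results = pool.map(help_fun_job, work_items)
    save_results_to_csv(results)
finally:
    pool.terminate()   # what the with-block does on exit
    pool.join()        # wait for the worker processes to actually go away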
Python on AWS Lambda does not support multiprocessing.Pool.map(), as documented in this other question. Please note that the other question was asking why it doesn't work. This question is different: I'm asking how to emulate the functionality given the lack of underlying support.
One of the answers to that other question gave us this code:
# Python 3.6
import multiprocessing
import multiprocessing.connection
from multiprocessing import Pipe, Process

def myWorkFunc(data, connection):
    result = None

    # Do some work and store it in result

    if result:
        connection.send([result])
    else:
        connection.send([None])

def myPipedMultiProcessFunc():

    # Get number of available logical cores
    plimit = multiprocessing.cpu_count()

    # Setup management variables
    results = []
    parent_conns = []
    processes = []
    pcount = 0
    pactive = []
    i = 0

    for data in iterable:
        # Create the pipe for parent-child process communication
        parent_conn, child_conn = Pipe()
        # create the process, pass data to be operated on and connection
        process = Process(target=myWorkFunc, args=(data, child_conn,))
        parent_conns.append(parent_conn)
        process.start()
        pcount += 1

        if pcount == plimit:  # There is not currently room for another process
            # Wait until there are results in the Pipes
            finishedConns = multiprocessing.connection.wait(parent_conns)
            # Collect the results and remove the connection as processing
            # the connection again will lead to errors
            for conn in finishedConns:
                results.append(conn.recv()[0])
                parent_conns.remove(conn)
                # Decrement pcount so we can add a new process
                pcount -= 1

    # Ensure all remaining active processes have their results collected
    for conn in parent_conns:
        results.append(conn.recv()[0])
        conn.close()

    # Process results as needed
Can this sample code be modified to support multiprocessing.Pool.map()?
What have I tried so far
I analysed the above code and I do not see a parameter for the function to be executed or the data, so I'm inferring that it does not perform the same function as multiprocessing.Pool.map(). It is not clear what the code does, other than demonstrating the building blocks that could be assembled into a solution.
Is this a "write my code for me" question?
Yes, to some extent it is. This issue impacts thousands of Python developers, and it would be far more efficient for the world economy, with fewer greenhouse-gas emissions, etc., if all of us shared the same code instead of forcing every SO user who encounters this to develop their own workaround. I hope I've done my part by distilling this into a clear question with the presumed building blocks ready to go.
I was able to get this working for my own tests.
I've based my code on this link : https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
NB1: you MUST increase memory allocation to the lambda function. With the default minimal amount, there's no increase in performance with multiprocessing. With the maximum my account can allocate (3008MB) the figures below were attained.
NB2: I'm completely ignoring max processes in parallel here. My usage doesn't have a whole lot of elements to work on.
With the code below, usage is:
work = funcmap(yourfunction,listofstufftoworkon)
yourresults = work.run()
running from my laptop:
jumper#jumperdebian[3333] ~/scripts/tmp 2019-09-04 11:52:30
└─ $ ∙ python3 -c "import tst; tst.lambda_handler(None,None)"
results : [(35, 9227465), (35, 9227465), (35, 9227465), (35, 9227465)]
SP runtime : 9.574460506439209
results : [(35, 9227465), (35, 9227465), (35, 9227465), (35, 9227465)]
MP runtime : 6.422513484954834
running from aws:
Function Logs:
START RequestId: 075a92c0-7c4f-4f48-9820-f394ee899a97 Version: $LATEST
results : [(35, 9227465), (35, 9227465), (35, 9227465), (35, 9227465)]
SP runtime : 12.135798215866089
results : [(35, 9227465), (35, 9227465), (35, 9227465), (35, 9227465)]
MP runtime : 7.293526887893677
END RequestId: 075a92c0-7c4f-4f48-9820-f394ee899a97
Here's the test code:
import time
from multiprocessing import Process, Pipe
import boto3

class funcmap(object):

    fmfunction = None
    fmlist = None

    def __init__(self, pfunction, plist):
        self.fmfunction = pfunction
        self.fmlist = plist

    def calculation(self, pfunction, pload, conn):
        panswer = pfunction(pload)
        conn.send([pload, panswer])
        conn.close()

    def run(self):
        datalist = self.fmlist
        processes = []
        parent_connections = []
        for datum in datalist:
            parent_conn, child_conn = Pipe()
            parent_connections.append(parent_conn)
            process = Process(target=self.calculation, args=(self.fmfunction, datum, child_conn,))
            processes.append(process)

        pstart = time.time()
        for process in processes:
            process.start()
            #print("starting at t+ {} s".format(time.time()-pstart))
        for process in processes:
            process.join()
            #print("joining at t+ {} s".format(time.time()-pstart))

        results = []
        for parent_connection in parent_connections:
            resp = parent_connection.recv()
            results.append((resp[0], resp[1]))
        return results

def fibo(n):
    if n <= 2: return 1
    return fibo(n-1) + fibo(n-2)

def lambda_handler(event, context):
    #worklist=[22,23,24,25,26,27,28,29,30,31,32,31,30,29,28,27,26,27,28,29]
    #worklist=[22,23,24,25,26,27,28,29,30]
    worklist = [30, 30, 30, 30]
    #worklist=[30]
    _start = time.time()
    results = []
    for a in worklist:
        results.append((a, fibo(a)))
    print("results : {}".format(results))
    _end = time.time()
    print("SP runtime : {}".format(_end - _start))

    _mstart = time.time()
    work = funcmap(fibo, worklist)
    results = work.run()
    print("results : {}".format(results))
    _mend = time.time()
    print("MP runtime : {}".format(_mend - _mstart))
hope it helps.
I had the same issue, and ended up implementing my own simple wrapper around multiprocessing.Pool. Definitely not bullet proof, but enough for simple use cases as drop-in replacement.
https://stackoverflow.com/a/63633248/158049
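The linked answer's code isn't reproduced here, but the idea is roughly a Pool.map lookalike built on Process and Pipe, something like this illustrative sketch (lambda_map is a hypothetical name; it assumes the default fork start method on Linux/Lambda and, like the answer above, puts no cap on concurrent processes):

from multiprocessing import Pipe, Process

def lambda_map(func, iterable):
    # rough stand-in for multiprocessing.Pool.map where Pool is unavailable
    def _runner(item, conn):
        conn.send(func(item))
        conn.close()

    conns, procs = [], []
    for item in iterable:
        parent_conn, child_conn = Pipe()
        p = Process(target=_runner, args=(item, child_conn))
        p.start()
        conns.append(parent_conn)
        procs.append(p)

    results = [c.recv() for c in conns]   # results come back in input order
    for p in procs:
        p.join()
    return results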
It is fairly easy to do parallel work with Python 3's concurrent.futures module as shown below.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    future_to = {executor.submit(do_work, input, 60): input for input in dictionary}
    for future in concurrent.futures.as_completed(future_to):
        data = future.result()
It is also very handy to insert items into and retrieve them from a Queue.
q = queue.Queue()

for task in tasks:
    q.put(task)

while not q.empty():
    q.get()
I have a script running in the background listening for updates. Now, assume that, as those updates arrive, I queue them and work on them concurrently using the ThreadPoolExecutor.
Individually, all of these components work in isolation and make sense, but how do I go about using them together? I don't know whether it is possible to feed the ThreadPoolExecutor work from the queue in real time, rather than from predetermined data.
In a nutshell, all I want to do is receive updates of, say, 4 messages a second, shove them into a queue, and have concurrent.futures work on them. If I don't, then I am stuck with a sequential approach which is slow.
Let's take the canonical example in the Python documentation below:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
The list of URLS is fixed. Is it possible to feed this list in real time and have the workers process the URLs as they come in, perhaps from a queue for management purposes? I am a bit confused about whether my approach is actually possible.
The example from the Python docs, expanded to take its work from a queue. A change to note is that this code uses concurrent.futures.wait instead of concurrent.futures.as_completed in order to allow new work to be started while waiting for other work to complete.
import concurrent.futures
import urllib.request
import time
import queue

q = queue.Queue()

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def feed_the_workers(spacing):
    """ Simulate outside actors sending in work to do, request each url twice """
    for url in URLS + URLS:
        time.sleep(spacing)
        q.put(url)
    return "DONE FEEDING"

def load_url(url, timeout):
    """ Retrieve a single page and report the URL and contents """
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:

    # start a future for a thread which sends work in through the queue
    future_to_url = {
        executor.submit(feed_the_workers, 0.25): 'FEEDER DONE'}

    while future_to_url:
        # check for status of the futures which are currently working
        done, not_done = concurrent.futures.wait(
            future_to_url, timeout=0.25,
            return_when=concurrent.futures.FIRST_COMPLETED)

        # if there is incoming work, start a new future
        while not q.empty():

            # fetch a url from the queue
            url = q.get()

            # Start the load operation and mark the future with its URL
            future_to_url[executor.submit(load_url, url, 60)] = url

        # process any completed futures
        for future in done:
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                if url == 'FEEDER DONE':
                    print(data)
                else:
                    print('%r page is %d bytes' % (url, len(data)))

            # remove the now completed future
            del future_to_url[future]
Output from fetching each url twice:
'http://www.foxnews.com/' page is 67574 bytes
'http://www.cnn.com/' page is 136975 bytes
'http://www.bbc.co.uk/' page is 193780 bytes
'http://some-made-up-domain.com/' page is 896 bytes
'http://www.foxnews.com/' page is 67574 bytes
'http://www.cnn.com/' page is 136975 bytes
DONE FEEDING
'http://www.bbc.co.uk/' page is 193605 bytes
'http://some-made-up-domain.com/' page is 896 bytes
'http://europe.wsj.com/' page is 874649 bytes
'http://europe.wsj.com/' page is 874649 bytes
At work I found a situation where I wanted to do parallel work on an unbounded stream of data. I created a small library inspired by the excellent answer already provided by Stephen Rauch.
I originally approached this problem by thinking about two separate threads, one that submits work to a queue and one that monitors the queue for any completed tasks and makes more room for new work to come in. This is similar to what Stephen Rauch proposed, where he consumes the stream using a feed_the_workers function that runs in a separate thread.
Talking it over with one of my colleagues, I realized that you can get away with doing everything in a single thread if you define a buffered iterator that allows you to control how many elements are let out of the input stream every time you are ready to submit more work to the thread pool.
So we introduce the BufferedIter class
class BufferedIter(object):
    def __init__(self, iterator):
        self.iter = iterator

    def nextN(self, n):
        vals = []
        for _ in range(n):
            vals.append(next(self.iter))
        return vals
which allows us to define the stream processor in the following way
import logging
import queue
import signal
import sys
import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

level = logging.DEBUG
log = logging.getLogger(__name__)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
handler.setLevel(level)
log.addHandler(handler)
log.setLevel(level)

WAIT_SLEEP = 1  # second, adjust this based on the timescale of your tasks

def stream_processor(input_stream, task, num_workers):

    # Use a queue to signal shutdown.
    shutting_down = queue.Queue()

    def shutdown(signum, frame):
        log.warning('Caught signal %d, shutting down gracefully ...' % signum)
        # Put an item in the shutting down queue to signal shutdown.
        shutting_down.put(None)

    # Register the signal handler
    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)

    def is_shutting_down():
        return not shutting_down.empty()

    futures = dict()
    buffer = BufferedIter(input_stream)
    with ThreadPoolExecutor(num_workers) as executor:
        num_success = 0
        num_failure = 0
        while True:
            idle_workers = num_workers - len(futures)

            if not is_shutting_down():
                items = buffer.nextN(idle_workers)
                for data in items:
                    futures[executor.submit(task, data)] = data

            done, _ = wait(futures, timeout=WAIT_SLEEP, return_when=ALL_COMPLETED)
            for f in done:
                data = futures[f]
                try:
                    f.result(timeout=0)
                except Exception as exc:
                    log.error('future encountered an exception: %r, %s' % (data, exc))
                    num_failure += 1
                else:
                    log.info('future finished successfully: %r' % data)
                    num_success += 1
                del futures[f]

            if is_shutting_down() and len(futures) == 0:
                break

    log.info("num_success=%d, num_failure=%d" % (num_success, num_failure))
Below we show an example for how to use the stream processor
import itertools

def integers():
    """Simulate an infinite stream of work."""
    for i in itertools.count():
        yield i

def task(x):
    """The task we would like to perform in parallel.
    With some delay to simulate a time consuming job.
    With a baked in exception to simulate errors.
    """
    time.sleep(3)
    if x == 4:
        raise ValueError('bad luck')
    return x * x

stream_processor(integers(), task, num_workers=3)
The output for this example is shown below
2019-01-15 22:34:40,193 future finished successfully: 1
2019-01-15 22:34:40,193 future finished successfully: 0
2019-01-15 22:34:40,193 future finished successfully: 2
2019-01-15 22:34:43,201 future finished successfully: 5
2019-01-15 22:34:43,201 future encountered an exception: 4, bad luck
2019-01-15 22:34:43,202 future finished successfully: 3
2019-01-15 22:34:46,208 future finished successfully: 6
2019-01-15 22:34:46,209 future finished successfully: 7
2019-01-15 22:34:46,209 future finished successfully: 8
2019-01-15 22:34:49,215 future finished successfully: 11
2019-01-15 22:34:49,215 future finished successfully: 10
2019-01-15 22:34:49,215 future finished successfully: 9
^C <=== THIS IS WHEN I HIT Ctrl-C
2019-01-15 22:34:50,648 Caught signal 2, shutting down gracefully ...
2019-01-15 22:34:52,221 future finished successfully: 13
2019-01-15 22:34:52,222 future finished successfully: 14
2019-01-15 22:34:52,222 future finished successfully: 12
2019-01-15 22:34:52,222 num_success=14, num_failure=1
I really liked the interesting approach by @pedro above. However, when processing thousands of files, I noticed that at the end a StopIteration would be thrown and some files would always be skipped. I had to make a little modification, as follows. A very useful answer again.
class BufferedIter(object):
    def __init__(self, iterator):
        self.iter = iterator

    def nextN(self, n):
        vals = []
        try:
            for _ in range(n):
                vals.append(next(self.iter))
            return vals, False
        except StopIteration as e:
            return vals, True
-- Call as follows
...
if not is_shutting_down():
    items, is_finished = buffer.nextN(idle_workers)
    if is_finished:
        stop()
...
-- where stop is a function that simply signals shutdown:
def stop():
    shutting_down.put(None)
It is possible to gain the benefits of the executor without strictly having to use a Queue. New tasks are submitted from the main thread. Futures that are not yet done are tracked and waited on until all futures are done.
import concurrent.futures
import sys
import time

sys.setrecursionlimit(64)  # This is only for demonstration purposes to trigger a RecursionError. Do not set in practice.

def slow_factorial(n: int) -> int:
    time.sleep(0.01)
    if n == 0:
        return 1
    else:
        return n * slow_factorial(n-1)

initial_inputs = [0, 1, 5, 20, 200, 100, 50, 51, 55, 40, 44, 21, 222, 333, 202, 1000, 10, 9000, 9009, 99, 9999]

for executor_class in (concurrent.futures.ThreadPoolExecutor, concurrent.futures.ProcessPoolExecutor):
    for max_workers in (4, 8, 16, 32):
        start_time = time.monotonic()
        with executor_class(max_workers=max_workers) as executor:
            futures_to_n = {executor.submit(slow_factorial, n): n for n in initial_inputs}
            while futures_to_n:
                futures_done, futures_not_done = concurrent.futures.wait(futures_to_n, return_when=concurrent.futures.FIRST_COMPLETED)
                # Note: Length of futures_done is often > 1.
                for future in futures_done:
                    n = futures_to_n.pop(future)
                    try:
                        factorial_n = future.result()
                    except RecursionError:
                        n_smaller = int(n ** 0.9)
                        future = executor.submit(slow_factorial, n_smaller)
                        futures_to_n[future] = n_smaller
                        # print(f'Failed to compute factorial of {n}. Trying to compute factorial of a smaller number {n_smaller} instead.')
                    else:
                        # print(f'Factorial of {n} is {factorial_n}.')
                        pass
        used_time = time.monotonic() - start_time
        executor_type = executor_class.__name__.removesuffix('PoolExecutor').lower()
        print(f'Workflow took {used_time:.1f}s with {max_workers} {executor_type} workers.')
    print()
Output:
Workflow took 9.4s with 4 thread workers.
Workflow took 6.3s with 8 thread workers.
Workflow took 5.4s with 16 thread workers.
Workflow took 5.2s with 32 thread workers.
Workflow took 9.0s with 4 process workers.
Workflow took 5.9s with 8 process workers.
Workflow took 5.1s with 16 process workers.
Workflow took 4.9s with 32 process workers.
For more clarity, uncomment the two print statements. As per the output above, there is an asymptotic speed benefit with more workers.