Async loop in Python

I'm new to Python and I'm using the latest version.
I have a for loop that takes a long time to execute, and I would like to run it in parallel to improve performance.
After some research I gathered that asyncio and async for are my best option, but I haven't yet understood how to transform my for loop using this technique.
Here is my code:
def filter(my_list):
    res = []
    for _ in my_list:
        if check(_):  # this takes a while to execute
            res.append(_)
        else:
            print(f'{_} removed')
    return res
How can I optimize the execution time of this program?
The rest of the program should remain the same: the way filter is called should not change, and it should still return a filtered list.
Thanks

Async
Unless you modify check() to be an async function that uses async libraries/modules and is primarily I/O bound, you will not gain any performance from async. Here is an example of a valid async function:
import asyncio

async def check(item):
    await asyncio.sleep(1)
    return item > 5
If you did have an async check function, you could do something like this (both snippets use await, so they must run inside an async function or the asyncio REPL).
Serial version that takes 10 s:
my_list = list(range(10))
res = [item for item in my_list if await check(item)]
vs. the parallel version that takes 1 s:
import asyncio
my_list = list(range(10))
check_tasks = [check(_) for _ in my_list]
checked = await asyncio.gather(*check_tasks)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
Note that while creating the list of check_tasks we don't use await; asyncio.gather takes the coroutines (awaitables) and schedules them itself.
Also, if you use time.sleep(1) instead of asyncio.sleep(1), both the serial and the parallel version will take the same 10 s, because time.sleep blocks the event loop.
If you want to limit the maximum number of coroutines executing at any one point in time, you can use an asyncio.Semaphore and modify check(). For example, if we want at most 2 running in parallel at a given time:
sem = asyncio.Semaphore(2)

async def check(item):
    async with sem:
        await asyncio.sleep(1)
        return item > 5
which takes 5s
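Putting the pieces together, here is a minimal runnable sketch that keeps the question's filter-style interface, assuming check() really is async and I/O bound (async_filter is a hypothetical name):

import asyncio

async def check(item):
    await asyncio.sleep(1)  # stand-in for the slow, I/O-bound check
    return item > 5

async def async_filter(my_list):
    # run all checks concurrently, then keep the items whose check passed
    checked = await asyncio.gather(*(check(item) for item in my_list))
    return [item for keep, item in zip(checked, my_list) if keep]

if __name__ == '__main__':
    print(asyncio.run(async_filter(list(range(10)))))  # takes ~1 s, prints [6, 7, 8, 9]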
Multiprocessing version
check is defined as
import time

def check(item):
    time.sleep(1)  # plain synchronous function, no await here
    return item > 5
our initial code that runs in series will be
my_list = list(range(10))
checked = map(check, my_list)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
and the parallel version will be
from multiprocessing import Pool

my_list = list(range(10))
with Pool(5) as p:
    checked = p.map(check, my_list)
res = [item for keep, item in zip(checked, my_list) if keep]
print(res)
Pool(5) will start 5 processes here. Keep in mind that starting a process is expensive.
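To keep the call site unchanged, as the question asks, the pool version can be wrapped back into the original function. A minimal sketch, assuming the plain (non-async) check() above; note the __main__ guard, which multiprocessing needs on platforms that spawn worker processes:

import time
from multiprocessing import Pool

def check(item):
    time.sleep(1)  # stand-in for the slow check
    return item > 5

def filter(my_list):  # keeps the question's name and signature
    # farm the slow check() calls out to 5 worker processes
    with Pool(5) as p:
        checked = p.map(check, my_list)
    res = []
    for keep, item in zip(checked, my_list):
        if keep:
            res.append(item)
        else:
            print(f'{item} removed')
    return res

if __name__ == '__main__':
    print(filter(list(range(10))))  # ~2 s instead of ~10 s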

Related

Given N generators, is it possible to create a generator that runs them in parallel processes and yields the zip of those generators?

Suppose I have N generators gen_1, ..., gen_N, where each of them will yield the same number of values. I would like a generator gen such that it runs gen_1, ..., gen_N in N parallel processes and yields (next(gen_1), next(gen_2), ..., next(gen_N)).
That is I would like to have:
def gen():
    yield (next(gen_1), next(gen_2), ..., next(gen_N))
in such a way that each gen_i runs in its own process. Is it possible to do this? I have tried it in the following dummy example, with no success:
from multiprocessing import Process

A = range(4)

def gen(a):
    B = ['a', 'b', 'c']
    for b in B:
        yield b + str(a)

def target(g):
    return next(g)

processes = [Process(target=target, args=(gen(a),)) for a in A]

for p in processes:
    p.start()

for p in processes:
    p.join()
However I get the error TypeError: cannot pickle 'generator' object.
EDIT:
I have modified @darkonaut's answer a bit to fit my needs. I am posting it here in case some of you find it useful. We first define a couple of utility functions:
from itertools import zip_longest
from typing import List, Generator

def grouper(iterable, n, fillvalue=iter([])):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def split_generators_into_batches(generators: List[Generator], n_splits):
    chunks = grouper(generators, len(generators) // n_splits + 1)
    return [zip_longest(*chunk) for chunk in chunks]
The following class is responsible for splitting any number of generators into n (the number of processes) batches and processing them, yielding the desired result:
import itertools
import multiprocessing as mp

class GeneratorParallelProcessor:
    SENTINEL = 'S'

    def __init__(self, generators, n_processes = 2 * mp.cpu_count()):
        self.n_processes = n_processes
        self.generators = split_generators_into_batches(list(generators), n_processes)
        self.queue = mp.SimpleQueue()
        self.barrier = mp.Barrier(n_processes + 1)
        self.sentinels = [self.SENTINEL] * n_processes
        self.processes = [
            mp.Process(target=self._worker, args=(self.barrier, self.queue, gen)) for gen in self.generators
        ]

    def process(self):
        for p in self.processes:
            p.start()
        while True:
            results = list(itertools.chain(*(self.queue.get() for _ in self.generators)))
            if results != self.sentinels:
                yield results
                self.barrier.wait()
            else:
                break
        for p in self.processes:
            p.join()

    def _worker(self, barrier, queue, generator):
        for x in generator:
            queue.put(x)
            barrier.wait()
        queue.put(self.SENTINEL)
To use it just do the following:
# `generators` is your iterable of generators; output_handler is whatever
# consumes each batch of results.
parallel_processor = GeneratorParallelProcessor(generators)

for grouped_generator in parallel_processor.process():
    output_handler(grouped_generator)
It's possible to get such a "Unified Parallel Generator" (UPG, an attempt to coin a name) with some effort, but as @jasonharper already mentioned, you definitely need to assemble the sub-generators within the child processes, since a running generator can't be pickled.
The pattern below is re-usable, with only the generator function gen() being custom to this example. The design uses multiprocessing.SimpleQueue for returning generator results to the parent and multiprocessing.Barrier for synchronization.
Calling Barrier.wait() will block the caller (a thread in any process) until the number of specified parties has called .wait(), whereupon all threads currently waiting on the Barrier get released simultaneously. The usage of Barrier here ensures that further generator results only start to be computed after the parent has received all results from an iteration, which might be desirable to keep overall memory consumption in check.
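As a standalone illustration of that Barrier behaviour, here is a toy sketch (independent of the pattern below; the worker names are made up):

import multiprocessing as mp

def worker(barrier, name):
    print(name, 'waiting at the barrier')
    barrier.wait()              # blocks until all three parties have arrived
    print(name, 'released')

if __name__ == '__main__':
    barrier = mp.Barrier(3)     # parties = 2 workers + the parent
    procs = [mp.Process(target=worker, args=(barrier, f'w{i}')) for i in range(2)]
    for p in procs:
        p.start()
    barrier.wait()              # the parent is the third party; all release together
    for p in procs:
        p.join()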
The number of parallel workers used equals the number of argument tuples you provide in the gen_args_tuples iterable, so gen_args_tuples=zip(range(4)) will use four workers, for example. See the comments in the code for further details.
import multiprocessing as mp

SENTINEL = 'SENTINEL'


def gen(a):
    """Your individual generator function."""
    lst = ['a', 'b', 'c']
    for ch in lst:
        for _ in range(int(10e6)):  # some dummy computation
            pass
        yield ch + str(a)


def _worker(i, barrier, queue, gen_func, gen_args):
    for x in gen_func(*gen_args):
        print(f"WORKER-{i} sending item.")
        queue.put((i, x))
        barrier.wait()
    queue.put(SENTINEL)


def parallel_gen(gen_func, gen_args_tuples):
    """Construct and yield from parallel generators
    built from `gen_func(gen_args)`.
    """
    gen_args_tuples = list(gen_args_tuples)  # ensure list
    n_gens = len(gen_args_tuples)
    sentinels = [SENTINEL] * n_gens
    queue = mp.SimpleQueue()
    barrier = mp.Barrier(n_gens + 1)  # `parties`: + 1 for parent
    processes = [
        mp.Process(target=_worker, args=(i, barrier, queue, gen_func, args))
        for i, args in enumerate(gen_args_tuples)
    ]
    for p in processes:
        p.start()
    while True:
        results = [queue.get() for _ in range(n_gens)]
        if results != sentinels:
            results.sort()
            yield tuple(r[1] for r in results)  # sort and drop ids
            barrier.wait()  # all workers are waiting already,
            # so this will unblock immediately
        else:
            break
    for p in processes:
        p.join()


if __name__ == '__main__':
    for res in parallel_gen(gen_func=gen, gen_args_tuples=zip(range(4))):
        print(res)
Output:
WORKER-1 sending item.
WORKER-0 sending item.
WORKER-3 sending item.
WORKER-2 sending item.
('a0', 'a1', 'a2', 'a3')
WORKER-1 sending item.
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-0 sending item.
('b0', 'b1', 'b2', 'b3')
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-1 sending item.
WORKER-0 sending item.
('c0', 'c1', 'c2', 'c3')
Process finished with exit code 0
I went for a slightly different approach; you can modify the example below accordingly.
Somewhere in the main script, initialize the pool according to your needs; you only need these two lines:
from multiprocessing import Pool
pool = Pool(processes=4)
then you can define a generator function like this:
(Note that the generators input is assumed to be any iterable containing all the generators)
def parallel_generators(generators, pool):
    results = ['placeholder']
    while len(results) != 0:
        batch = pool.map_async(next, generators)  # defines the next round of values
        results = list(batch.get())  # actual calculation done here
        yield results
    return
We define the while-loop condition on results this way because mapping next over the generators returns an empty list once the generators stop producing values, and at that point we simply terminate the parallel generator.
EDIT
So apparently the multiprocessing Pool and map don't play well with generators, which makes the above code not work as intended, so do not use it until a later update.
As for the pickle error: it seems some bound functions do not support pickle, which the multiprocessing library needs in order to transfer objects and functions between processes. As a workaround, the pathos multiprocessing library uses dill, which removes the need for pickle and is an option you might want to try. Searching Stack Overflow for your error, you can also find some more complicated solutions with custom code for pickling the required functions.

Aggregating an async generator to a tuple

In trying to aggregate the results from an asynchronous generator, like so:
async def result_tuple():
    async def result_generator():
        # some await things happening in here
        yield 1
        yield 2

    return tuple(num async for num in result_generator())
I get a
TypeError: 'async_generator' object is not iterable
when executing the async for line.
But PEP 530 seems to suggest that it should be valid:
Asynchronous Comprehensions
We propose to allow using async for inside list, set and dict comprehensions. Pending PEP 525 approval, we can also allow creation of asynchronous generator expressions.
Examples:
set comprehension: {i async for i in agen()};
list comprehension: [i async for i in agen()];
dict comprehension: {i: i ** 2 async for i in agen()};
generator expression: (i ** 2 async for i in agen()).
What's going on, and how can I aggregate an asynchronous generator into a single tuple?
In the PEP excerpt, the comprehensions are listed side-by-side in the same bullet list, but the generator expression is very different from the others.
There is no such thing as a "tuple comprehension". The argument to tuple() is a generator expression, which creates an asynchronous generator:
tuple(num async for num in result_generator())
The line is equivalent to tuple(result_generator()). The tuple then tries to iterate over the generator synchronously and raises the TypeError.
The other comprehensions will work, though, as the question expected. So it's possible to build a tuple by first aggregating into a list, like so:
async def result_tuple():
    async def result_generator():
        # some await things happening in here
        yield 1
        yield 2

    return tuple([num async for num in result_generator()])
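For completeness, a minimal runnable sketch of calling that coroutine from synchronous code (assuming Python 3.7+ for asyncio.run; the await calls are elided here as in the question):

import asyncio

async def result_tuple():
    async def result_generator():
        # some await things would happen in here
        yield 1
        yield 2

    # aggregate with an async list comprehension, then convert to a tuple
    return tuple([num async for num in result_generator()])

print(asyncio.run(result_tuple()))  # (1, 2)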

Concurrent futures wait for subset of tasks

I'm using Python's concurrent.futures framework. I have used the map() function to launch concurrent tasks as such:
import concurrent.futures

def func(i):
    return i * i

list = [1, 2, 3, 4, 5]
async_executor = concurrent.futures.ThreadPoolExecutor(5)
results = async_executor.map(func, list)
I am interested only in the first n results and want to stop the executor after the first n threads are finished where n is a number less than the size of the input list. Is there any way to do this in Python? Is there another framework I should look into?
You can't use map() for this because it provides no way to stop waiting for the results, nor any way to get the submitted futures and cancel them. However, you can do it using submit():
import concurrent.futures
import time

def func(i):
    time.sleep(i)
    return i * i

list = [1, 2, 3, 6, 6, 6, 90, 100]
async_executor = concurrent.futures.ThreadPoolExecutor(2)
futures = {async_executor.submit(func, i): i for i in list}

for ii, future in enumerate(concurrent.futures.as_completed(futures)):
    print(ii, "result is", future.result())
    if ii == 2:
        async_executor.shutdown(wait=False)
        for victim in futures:
            victim.cancel()
        break
The above code takes about 11 seconds to run--it executes jobs [1,2,3,6,7] but not the rest.
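As a side note, on Python 3.9+ the explicit cancel loop can be replaced by shutdown's cancel_futures flag. A rough sketch of the same idea (same made-up workload as above):

import concurrent.futures
import time

def func(i):
    time.sleep(i)
    return i * i

values = [1, 2, 3, 6, 6, 6, 90, 100]
async_executor = concurrent.futures.ThreadPoolExecutor(2)
futures = [async_executor.submit(func, i) for i in values]

for ii, future in enumerate(concurrent.futures.as_completed(futures)):
    print(ii, "result is", future.result())
    if ii == 2:
        # Python 3.9+: cancels every future that has not started running yet
        async_executor.shutdown(wait=False, cancel_futures=True)
        break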

How to run multiple functions to return respective lists in parallel

In the example code, I would like to run 4 functions in parallel and return list values for each. Is the multiprocessing package appropriate for this task? If so how do I implement it?
Example Code:
from multiprocessing import Pool

def func_a(num):
    return([1+num,2+num,3+num])

def func_b(num):
    return([10+num,11+num,12+num])

def func_c(num):
    return([20+num,21+num,22+num])

def func_d(num):
    return([30+num,31+num,32+num])

if __name__ == '__main__':
    pool = Pool(processes=2)
    list_a = ???
    list_b = ???
    list_c = ???
    list_d = ???
    full_list = []
    for item in list_a:
        full_list.append(item)
    for item in list_b:
        full_list.append(item)
    for item in list_c:
        full_list.append(item)
    for item in list_d:
        full_list.append(item)
Any information much appreciated. Thanks in advance.
As explained in Process Pools, you need to submit all of the jobs to the pool, and then wait for all of the results.
I'm not sure what arguments you want to pass to these functions, since it isn't in your question or your code, but I'll just make up something arbitrary.
import fractions
import math

if __name__ == '__main__':
    pool = Pool(processes=2)
    result_a = pool.apply_async(func_a, (23,))
    result_b = pool.apply_async(func_b, (42,))
    result_c = pool.apply_async(func_c, (fractions.Fraction(1, 2),))
    result_d = pool.apply_async(func_d, (1j * math.pi,))
    full_list = []
    for item in result_a.get():
        full_list.append(item)
    for item in result_b.get():
        full_list.append(item)
    for item in result_c.get():
        full_list.append(item)
    for item in result_d.get():
        full_list.append(item)
You can dramatically simplify this in multiple ways (e.g., each of those for loops can be replaced by a single call to extend, or you can just write full_list = result_a.get() + result_b.get() + result_c.get() + result_d.get()), but this is the smallest change to your existing code that works. (And if you really want to simplify this code, I think you'd be happier with concurrent.futures.ProcessPoolExecutor in the first place.)
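For reference, a minimal sketch of the concurrent.futures.ProcessPoolExecutor route mentioned above (the argument value 23 is as arbitrary as in the snippet above):

import concurrent.futures

def func_a(num): return [1 + num, 2 + num, 3 + num]
def func_b(num): return [10 + num, 11 + num, 12 + num]
def func_c(num): return [20 + num, 21 + num, 22 + num]
def func_d(num): return [30 + num, 31 + num, 32 + num]

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        # submit all four jobs first, then collect the results in submission order
        futures = [executor.submit(f, 23) for f in (func_a, func_b, func_c, func_d)]
        full_list = [item for fut in futures for item in fut.result()]
    print(full_list)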
Assuming (since my question was never answered) that each function receives the same number:
def apply_func(f):
    return f(3)

full_list = sum(pool.map(apply_func, [func_a, func_b, func_c, func_d]), [])
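As a small aside on that last line: sum(lists, []) concatenates the result lists but copies repeatedly; itertools.chain.from_iterable is a common alternative. A hedged sketch of the same call, assuming the pool, apply_func and the four functions from above:

from itertools import chain

# same parallel map as above, flattened without repeated list copies
full_list = list(chain.from_iterable(pool.map(apply_func, [func_a, func_b, func_c, func_d])))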

Parallel recursive function in Python

How do I parallelize a recursive function in Python?
My function looks like this:
def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
When trying to parallelize it with multiprocessing.Pool.map, Windows opens an infinite number of processes and hangs.
What's a good (preferably simple) way to parallelize it (for a single multicore machine)?
Here is the code that hangs:
from multiprocessing import Pool

pool = Pool(processes=4)

def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + pool.map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
OK, sorry for the problems with this.
I'm going to answer a slightly different question where f() returns the sum of the values in the list. That is because it's not clear to me from your example what the return type of f() would be, and using an integer makes the code simple to understand.
This is complex because there are two different things happening in parallel:
the calculation of the expensive function in the pool
the recursive expansion of f()
I am very careful to only use the pool to calculate the expensive function. In that way we don't get an "explosion" of processes, but because this is asynchronous we need to postpone a lot of work for the callback that the worker calls once the expensive function is done.
More than that, we need to use a countdown latch so that we know when all the separate sub-calls to f() are complete.
There may be a simpler way (I am pretty sure there is, but I need to do other things), but perhaps this gives you an idea of what is possible:
from multiprocessing import Pool, Value, RawArray, RLock
from time import sleep

class Latch:

    '''A countdown latch that lets us wait for a job of "n" parts'''

    def __init__(self, n):
        self.__counter = Value('i', n)
        self.__lock = RLock()

    def decrement(self):
        with self.__lock:
            self.__counter.value -= 1
            print('dec', self.read())
        return self.read() == 0

    def read(self):
        with self.__lock:
            return self.__counter.value

    def join(self):
        while self.read():
            sleep(1)


def list_of_values(x):
    '''An expensive function'''
    print(x, ': thinking...')
    sleep(1)
    print(x, ': thought')
    return list(range(x))


pool = Pool()


def async_f(x, on_complete=None):
    '''Return the sum of the values in the expensive list'''
    if x == 0:
        on_complete(0)  # no list, return 0
    else:
        n = x  # need to know size of result beforehand
        latch = Latch(n)  # wait for n entries to be calculated
        result = RawArray('i', n+1)  # where we will assemble the map

        def delayed_map(values):
            '''This is the callback for the pool async process - it runs
            in a separate thread within this process once the
            expensive list has been calculated and orchestrates the
            mapping of f over the result.'''
            result[0] = x  # first value in list is x
            for (i, v) in enumerate(values):

                def callback(fx, i=i):
                    '''This is the callback passed to f() and is called when
                    the function completes. If it is the last of all the
                    calls in the map then it calls on_complete() (ie another
                    instance of this function) for the calling f().'''
                    result[i+1] = fx
                    if latch.decrement():  # have completed list
                        # at this point result contains [x]+map(f, ...)
                        on_complete(sum(result))  # so return sum

                async_f(v, callback)

        # Ask worker to generate list then call delayed_map
        pool.apply_async(list_of_values, [x], callback=delayed_map)


def run():
    '''Tie into the same mechanism as above, for the final value.'''
    result = Value('i')
    latch = Latch(1)

    def final_callback(value):
        result.value = value
        latch.decrement()

    async_f(6, final_callback)
    latch.join()  # wait for everything to complete
    return result.value


print(run())
PS: I am using Python 3.2 and the ugliness above is because we are delaying computation of the final results (going back up the tree) until later. It's possible something like generators or futures could simplify things.
Also, I suspect you need a cache to avoid needlessly recalculating the expensive function when called with the same argument as earlier.
See also yaniv's answer - which seems to be an alternative way to reverse the order of the evaluation by being explicit about depth.
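Regarding the cache suggestion above, a minimal sketch using functools.lru_cache (note that with multiprocessing each worker process keeps its own cache, so hits only happen within a single process):

from functools import lru_cache
from time import sleep

@lru_cache(maxsize=None)
def list_of_values(x):
    '''Expensive pure function; repeated calls with the same x hit the cache.'''
    sleep(1)                 # stand-in for the heavy computation
    return tuple(range(x))   # tuples are immutable, safe to hand out from a cache

print(list_of_values(5))     # slow the first time
print(list_of_values(5))     # instant, served from the cache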
After thinking about this, I found a simple, not complete, but good enough answer:
# A partially parallel solution. Just do the first level of recursion in parallel.
# It might be enough work to fill all cores.
import multiprocessing

def f_helper(data):
    return f(x=data['x'], depth=data['depth'], recursion_depth=data['recursion_depth'])

def f(x, depth, recursion_depth):
    if depth == 0:
        return ...
    else:
        if recursion_depth == 0:
            pool = multiprocessing.Pool(processes=4)
            result = [x] + pool.map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)])
            pool.close()
        else:
            result = [x] + list(map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)]))
        return result

def list_of_values(x):
    # Heavy compute, pure function
I store the main process id initially and transfer it to the subprograms.
When I need to start a multiprocessing job, I check the number of children of the main process. If it is less than or equal to half of my CPU count, I run the job in parallel; if it is greater than half of my CPU count, I run it serially. This way it avoids bottlenecks and uses the CPU cores effectively. You can tune the number of cores for your case; for example, you can set it to the exact number of CPU cores, but you should not exceed it.
import multiprocessing
import psutil

# main_process_id, MyPool, subProgram and input_params are assumed to be
# defined elsewhere in the author's program.

def subProgramhWrapper(func, args):
    func(*args)

parent = psutil.Process(main_process_id)
children = parent.children(recursive=True)
num_cores = int(multiprocessing.cpu_count() / 2)

if num_cores >= len(children):
    # parallel run
    pool = MyPool(num_cores)
    results = pool.starmap(subProgram, input_params)
    pool.close()
    pool.join()
else:
    # serial run
    for input_param in input_params:
        subProgramhWrapper(subProgram, input_param)
