I have a pool_map function that can be used to limit the number of simultaneously executing coroutines.
The idea is to take a coroutine function accepting a single parameter and map it over a list of possible parameters, but to also wrap each call in a semaphore acquisition, so that only a limited number run at once:
from typing import Awaitable, Callable, Iterable, Iterator, TypeVar
from asyncio import Semaphore

A = TypeVar('A')
V = TypeVar('V')

def pool_map(
    func: Callable[[A], Awaitable[V]],
    arg_it: Iterable[A],
    size: int = 10,
) -> Iterator[Awaitable[V]]:
    """
    Maps an async function to an iterable of arguments,
    ensuring that only some are executed at once.
    """
    semaphore = Semaphore(size)

    async def sub(arg: A) -> V:
        async with semaphore:
            return await func(arg)

    return map(sub, arg_it)
I modified the above code for the sake of an example and didn't test it, but my variant works well. For example, you can use it like this:
from asyncio import get_event_loop, as_completed
from contextlib import closing

URLS = [...]

async def run_all(awaitables):
    for a in as_completed(awaitables):
        result = await a
        print('got result', result)

async def download(url): ...

if __name__ == '__main__':
    pool = pool_map(download, URLS)
    with closing(get_event_loop()) as loop:
        loop.run_until_complete(run_all(pool))
But a problem arises if an exception is thrown while awaiting a future. I can't see how to cancel all the scheduled or still-running tasks, nor the ones still waiting for the semaphore to be acquired.
Is there a library or an elegant building block for this that I don't know about, or do I have to build all the parts myself? (i.e. a Semaphore with access to its waiters, an as_finished that provides access to its running task queue, …)
Use ensure_future to get a Task instead of a coroutine:
import asyncio
from contextlib import closing

def pool_map(func, args, size=10):
    """
    Maps an async function to iterables
    ensuring that only some are executed at once.
    """
    semaphore = asyncio.Semaphore(size)

    async def sub(arg):
        async with semaphore:
            return await func(arg)

    tasks = [asyncio.ensure_future(sub(x)) for x in args]
    return tasks

async def f(n):
    print(">>> start", n)
    if n == 7:
        raise Exception("boom!")
    await asyncio.sleep(n / 10)
    print("<<< end", n)
    return n

async def run_all(tasks):
    exc = None
    for a in asyncio.as_completed(tasks):
        try:
            result = await a
            print('=== result', result)
        except asyncio.CancelledError as e:
            print("!!! cancel", e)
        except Exception as e:
            print("Exception in task, cancelling!")
            for t in tasks:
                t.cancel()
            exc = e
    if exc:
        raise exc

pool = pool_map(f, range(1, 20), 3)

with closing(asyncio.get_event_loop()) as loop:
    loop.run_until_complete(run_all(pool))
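On Python 3.11+ the standard library also offers a building block with exactly this cancel-on-failure behaviour: asyncio.TaskGroup cancels all remaining tasks as soon as one of them raises. A minimal sketch under that assumption (run_all_tg is an illustrative name; f and the semaphore pattern are the ones defined above):

import asyncio

async def run_all_tg(func, args, size=10):
    semaphore = asyncio.Semaphore(size)

    async def sub(arg):
        async with semaphore:
            return await func(arg)

    # TaskGroup cancels every still-pending task (including those still
    # waiting on the semaphore) when one task raises, then re-raises the
    # failure as an ExceptionGroup on exit.
    async with asyncio.TaskGroup() as tg:
        for x in args:
            tg.create_task(sub(x))

asyncio.run(run_all_tg(f, range(1, 20), 3))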
Here's a naive solution, based on the fact that cancel is a no-op if the task is already finished:
import asyncio

async def run_all(awaitables):
    futures = [asyncio.ensure_future(a) for a in awaitables]
    try:
        for fut in asyncio.as_completed(futures):
            result = await fut
            print('got result', result)
    except:
        for future in futures:
            future.cancel()
        await asyncio.wait(futures)
Current versions of Python (Dec 2022) still allow using the @coroutine decorator, and a generator can be awaited like this:
import asyncio

asyncify = asyncio.coroutine

data_ready = False  # Status of a pipe, just to test

def gen():
    global data_ready
    while not data_ready:
        print("not ready")
        data_ready = True  # Just to test
        yield
    return "done"

async def main():
    result = await asyncify(gen)()
    print(result)

loop = asyncio.new_event_loop()
loop.create_task(main())
loop.run_forever()
However, newer Python versions (3.8+) deprecate the @coroutine decorator (the asyncify alias above), so how do I wait for (await) a generator to finish, as above?
I tried to use async def as suggested by the deprecation warning, but it does not work:
import asyncio

asyncify = asyncio.coroutine

data_ready = False  # Just to test

async def gen():
    global data_ready
    while not data_ready:
        print("not ready")
        data_ready = True  # Just to test
        yield
    yield "done"
    return

async def main():
    # this has error: TypeError: object async_generator can't be used in 'await' expression
    result = await gen()
    print(result)

loop = asyncio.new_event_loop()
loop.create_task(main())
loop.run_forever()
Asynchronous generators implement the asynchronous iterator protocol and are meant for asynchronous iteration. You cannot await them directly like regular coroutines.
With that in mind, returning to your experimental case and your question "how to wait for (await) a generator to end?": to get the final yielded value, perform asynchronous iteration:
import asyncio

data_ready = False  # Just to test

async def gen():
    global data_ready
    while not data_ready:
        print("not ready")
        data_ready = True  # Just to test
        yield "processing"
    yield "done"
    return

async def main():
    a_gen = gen()
    async for result in a_gen:  # assign to result on each async iteration
        pass
    print('result:', result)

asyncio.run(main())
Prints:
not ready
result: done
Naturally, you can also advance the async generator in steps with anext:
a_gen = gen()
val_1 = await anext(a_gen)
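For a self-contained illustration, here is a minimal sketch of stepping the generator with anext() (a builtin since Python 3.10; on older versions a_gen.__anext__() does the same):

import asyncio

async def gen():
    yield "processing"
    yield "done"

async def main():
    a_gen = gen()
    val_1 = await anext(a_gen)  # "processing"
    val_2 = await anext(a_gen)  # "done"
    print(val_1, val_2)

asyncio.run(main())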
Summing up, follow the guidelines in PEP 525 – Asynchronous Generators and try not to mix old, deprecated constructs with current ones.
In most of the asynchronous coroutines I write, I only need to replace the function definition, def func() -> async def func(), and the sleep call, time.sleep(s) -> await asyncio.sleep(s).
Is it possible to convert a standard Python function into an async function where every time.sleep(s) is converted to await asyncio.sleep(s)?
Example
Performance during task
Measure performance during a task
import asyncio
import random

async def performance_during_task(task):
    stop_event = asyncio.Event()
    target_task = asyncio.create_task(task(stop_event))
    perf_task = asyncio.create_task(measure_performance(stop_event))
    await target_task
    await perf_task

async def measure_performance(event):
    while not event.is_set():
        print('Performance: ', random.random())
        await asyncio.sleep(.2)

if __name__ == "__main__":
    asyncio.run(
        performance_during_task(task)
    )
Task
The task has to be defined with async def and await asyncio.sleep(s)
async def task(event):
    for i in range(10):
        print('Step: ', i)
        await asyncio.sleep(.2)
    event.set()
which I want to turn into:
Easy task definition
To spare others from worrying about async etc., I want them to be able to define a task normally (e.g. with a decorator?):
@as_async
def easy_task(event):
    for i in range(10):
        print('Step: ', i)
        time.sleep(.2)
    event.set()
So that it can be used as an async function, e.g. with performance_during_task().
I think I found a solution similar to the interesting GitHub example mentioned in the comments and a similar post here.
We can write a decorator like this:
import asyncio
from functools import wraps, partial

def to_async(func):
    @wraps(func)  # Makes sure the wrapper exposes the original function's attributes, e.g. func.__name__
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()  # Use the current event loop if none was passed in
        pfunc = partial(func, *args, **kwargs)  # Bind the arguments (e.g. event) to the blocking function
        return await loop.run_in_executor(executor, pfunc)
    return run
Such that the easy task becomes:
@to_async
def easy_task(event):
    for i in range(10):
        print('Step: ', i)
        time.sleep(.2)
    event.set()
Here wraps makes sure we can still access attributes of the original function (explained here),
and partial fills in the arguments, as explained here.
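As a usage sketch (blocking_work and demo are illustrative names, not from the question), the decorated function can then simply be awaited; the blocking time.sleep runs in a thread of the default executor:

import asyncio
import time

@to_async
def blocking_work(n):
    time.sleep(n)  # blocking call, runs in an executor thread
    return n * 2

async def demo():
    result = await blocking_work(1)  # awaitable thanks to run_in_executor
    print(result)  # 2

asyncio.run(demo())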
I have written code for an async pool below. In __aexit__ I'm cancelling the _worker tasks after the tasks get finished. But when I run the code, the worker tasks are not getting cancelled and the code runs forever. This is what a task looks like: <Task pending coro=<AsyncPool._worker() running at \async_pool.py:17> wait_for=<Future cancelled>>. The asyncio.wait_for is getting cancelled, but not the worker tasks.
class AsyncPool:
    def __init__(self, coroutine, no_of_workers, timeout):
        self._loop = asyncio.get_event_loop()
        self._queue = asyncio.Queue()
        self._no_of_workers = no_of_workers
        self._coroutine = coroutine
        self._timeout = timeout
        self._workers = None

    async def _worker(self):
        while True:
            try:
                ret = False
                queue_item = await self._queue.get()
                ret = True
                result = await asyncio.wait_for(self._coroutine(queue_item), timeout=self._timeout, loop=self._loop)
            except Exception as e:
                print(e)
            finally:
                if ret:
                    self._queue.task_done()

    async def push_to_queue(self, item):
        self._queue.put_nowait(item)

    async def __aenter__(self):
        assert self._workers == None
        self._workers = [asyncio.create_task(self._worker()) for _ in range(self._no_of_workers)]
        return self

    async def __aexit__(self, type, value, traceback):
        await self._queue.join()
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, loop=self._loop, return_exceptions=True)
To use the AsyncPool:
async def something(item):
    print("got", item)
    await asyncio.sleep(item)

async def main():
    async with AsyncPool(something, 5, 2) as pool:
        for i in range(10):
            await pool.push_to_queue(i)

asyncio.run(main())
The Output in my terminal:
The problem is that your except Exception clause also catches cancellation and ignores it. To add to the confusion, print(e) just prints an empty line in the case of a CancelledError, which is where the empty lines in the output come from. (Changing it to print(type(e)) shows what's going on.)
To correct the issue, change except Exception to something more specific, like except asyncio.TimeoutError. This change is not needed in Python 3.8 where asyncio.CancelledError no longer derives from Exception, but from BaseException, so except Exception doesn't catch it.
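For illustration, a hedged sketch of what the corrected _worker could look like (only the exception handling changes; the rest is the class from the question):

async def _worker(self):
    while True:
        ret = False
        try:
            queue_item = await self._queue.get()
            ret = True
            result = await asyncio.wait_for(self._coroutine(queue_item),
                                            timeout=self._timeout)
        except asyncio.TimeoutError:
            print("task timed out")  # CancelledError is no longer swallowed, so cancel() stops the worker
        finally:
            if ret:
                self._queue.task_done()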
When you have an asyncio task that was created and then cancelled, the task is still alive and needs to be "reclaimed", so you want to await the worker. However, once you await such a cancelled task, it will never give you back the expected return value; instead, asyncio.CancelledError is raised, and you need to catch it somewhere.
Because of this behavior, I don't think you should gather them; instead, await each of the cancelled tasks, as they are supposed to return right away:
async def __aexit__(self, type, value, traceback):
    await self._queue.join()
    for worker in self._workers:
        worker.cancel()
    for worker in self._workers:
        try:
            await worker
        except asyncio.CancelledError:
            print("worker cancelled:", worker)
This appears to work. The event is a counting timer, and when it expires it cancels the tasks.
import asyncio
from datetime import datetime as dt
from datetime import timedelta as td
import random
import time

class Program:
    def __init__(self):
        self.duration_in_seconds = 20
        self.program_start = dt.now()
        self.event_has_expired = False
        self.canceled_success = False

    async def on_start(self):
        print("On Start Event Start! Applying Overrides!!!")
        await asyncio.sleep(random.randint(3, 9))

    async def on_end(self):
        print("On End Releasing All Overrides!")
        await asyncio.sleep(random.randint(3, 9))

    async def get_sensor_readings(self):
        print("getting sensor readings!!!")
        await asyncio.sleep(random.randint(3, 9))

    async def evaluate_data(self):
        print("checking data!!!")
        await asyncio.sleep(random.randint(3, 9))

    async def check_time(self):
        if (dt.now() - self.program_start > td(seconds=self.duration_in_seconds)):
            self.event_has_expired = True
            print("Event is DONE!!!")
        else:
            print("Event is not done! ", dt.now() - self.program_start)

    async def main(self):
        # script starts, do only once self.on_start()
        await self.on_start()
        print("On Start Done!")

        while not self.canceled_success:
            readings = asyncio.ensure_future(self.get_sensor_readings())
            analysis = asyncio.ensure_future(self.evaluate_data())
            checker = asyncio.ensure_future(self.check_time())

            if not self.event_has_expired:
                await readings
                await analysis
                await checker
            else:
                # close other tasks before final shutdown
                readings.cancel()
                analysis.cancel()
                checker.cancel()
                self.canceled_success = True
                print("cancelled hit!")

        # script ends, do only once self.on_end() when the event is done
        await self.on_end()
        print('Done Deal!')

async def main():
    program = Program()
    await program.main()

asyncio.run(main())
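A hedged alternative sketch of the same time-boxing idea: asyncio.wait_for cancels the wrapped work for you when the timeout expires (sensor_loop is an illustrative stand-in for the periodic work):

import asyncio
import random

async def sensor_loop():
    while True:
        print("getting sensor readings!!!")
        await asyncio.sleep(random.randint(1, 3))

async def main():
    try:
        # wait_for cancels sensor_loop() once the 20-second budget is used up
        await asyncio.wait_for(sensor_loop(), timeout=20)
    except asyncio.TimeoutError:
        print("Event is DONE!!!")

asyncio.run(main())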
In Python, what's the idiomatic way to establish one-way communication between two threading.Threads, call them thread a and thread b?
a is the producer; it continuously generates values for b to consume.
b is the consumer; it reads one value generated by a, processes the value with a coroutine, and then reads the next value, and so on.
Illustration:
q = very_magic_queue.Queue()

def worker_of_a(q):
    while True:
        q.put(1)
        time.sleep(1)

a = threading.Thread(target=worker_of_a, args=(q,))
a.start()

async def loop(q):
    while True:
        # v must be processed in the same order as they are produced
        v = await q.get()
        print(v)

async def foo():
    pass

async def b_main(q):
    loop_fut = asyncio.ensure_future(loop(q))
    foo_fut = asyncio.ensure_future(foo())
    _ = await asyncio.wait([loop_fut, foo_fut], ...)
    # blah blah blah

def worker_of_b(q):
    asyncio.set_event_loop(asyncio.new_event_loop())
    asyncio.get_event_loop().run_until_complete(b_main(q))

b = threading.Thread(target=worker_of_b, args=(q,))
b.start()
Of course the above code doesn't work, because queue.Queue.get cannot be awaited, and asyncio.Queue cannot be used in another thread.
I also need a communication channel from b to a.
It would be great if the solution could also work with gevent.
Thanks :)
I had a similar problem - communicating data between a thread and asyncio. The solution I used is to create a sync Queue and add methods for async get and async put, using asyncio.sleep to make them non-blocking.
Here is my queue class:
import asyncio
import queue

# class to provide a queue (sync or async morph)
class queMorph(queue.Queue):
    def __init__(self, qSize, qNM):
        super().__init__(qSize)
        self.timeout = 0.018
        self.me = f'queMorph-{qNM}'

    # Introduce methods for async awaitable morph of Q
    async def aget(self):
        while True:
            try:
                return self.get_nowait()
            except queue.Empty:
                await asyncio.sleep(self.timeout)
            except Exception as E:
                raise

    async def aput(self, data):
        while True:
            try:
                return self.put_nowait(data)
            except queue.Full:
                print(f'{self.me} Queue full on put..')
                await asyncio.sleep(self.timeout)
            except Exception as E:
                raise
To put/get items from the queue on the thread (synchronous) side, use the normal blocking q.get() and q.put() functions.
In the async loop, use q.aget() and q.aput(), which do not block.
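A small usage sketch under the same assumptions (produce_from_thread and consume_async are illustrative names): the thread side uses the inherited blocking put, the asyncio side uses aget:

import asyncio
import threading
import time

q = queMorph(qSize=10, qNM='demo')

def produce_from_thread():
    for i in range(3):
        q.put(i)  # normal blocking put from the thread
        time.sleep(0.1)

async def consume_async():
    for _ in range(3):
        item = await q.aget()  # polls with asyncio.sleep, never blocks the loop
        print('got', item)

threading.Thread(target=produce_from_thread).start()
asyncio.run(consume_async())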
You can use a synchronized queue from the queue module and defer the wait to a ThreadPoolExecutor:
async def loop(q):
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=1) as executor:
        loop = asyncio.get_event_loop()
        while True:
            # v must be processed in the same order as they are produced
            v = await loop.run_in_executor(executor, q.get)
            print(v)
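For completeness, a hedged end-to-end sketch of this approach with a producer thread and a stop sentinel (producer and consumer are illustrative names, not from the question):

import asyncio
import queue
import threading
import time

def producer(q):
    for i in range(3):
        q.put(i)
        time.sleep(0.5)
    q.put(None)  # sentinel: tell the consumer to stop

async def consumer(q):
    loop = asyncio.get_running_loop()
    while True:
        # the blocking q.get() runs in the default executor, off the event loop
        v = await loop.run_in_executor(None, q.get)
        if v is None:
            break
        print('got', v)

q = queue.Queue()
threading.Thread(target=producer, args=(q,)).start()
asyncio.run(consumer(q))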
I've used Janus to solve this problem - it's a Python library that gives you a thread-safe queue that can be used to communicate between asyncio and a thread.
def threaded(sync_q):
    for i in range(100):
        sync_q.put(i)
    sync_q.join()

async def async_code(async_q):
    for i in range(100):
        val = await async_q.get()
        assert val == i
        async_q.task_done()

queue = janus.Queue()
fut = loop.run_in_executor(None, threaded, queue.sync_q)
await async_code(queue.async_q)
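The snippet above assumes an already-running event loop and top-level await; a hedged sketch of how it could be driven end to end (current janus versions expect the Queue to be created inside a running loop, and provide close()/wait_closed()):

import asyncio
import janus

async def main():
    queue = janus.Queue()  # must be created while the loop is running
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, threaded, queue.sync_q)
    await async_code(queue.async_q)
    await fut  # wait for the producer thread to finish
    queue.close()
    await queue.wait_closed()

asyncio.run(main())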
I have the following method in my Tornado handler:
async def get(self):
    url = 'url here'
    try:
        async for batch in downloader.fetch(url):
            self.write(batch)
            await self.flush()
    except Exception as e:
        logger.warning(e)
This is the code for downloader.fetch():
async def fetch(url, **kwargs):
    timeout = kwargs.get('timeout', aiohttp.ClientTimeout(total=12))
    response_validator = kwargs.get('response_validator', json_response_validator)
    extractor = kwargs.get('extractor', json_extractor)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as resp:
                response_validator(resp)
                async for batch in extractor(resp):
                    yield batch
    except aiohttp.client_exceptions.ClientConnectorError:
        logger.warning("bad request")
        raise
    except asyncio.TimeoutError:
        logger.warning("server timeout")
        raise
I would like to yield the "batch" objects from multiple downloaders in parallel.
I want the first batch that becomes available, from whichever downloader produces it, and so on until all downloaders have finished. Something like this (this is not working code):
async for batch in [downloader.fetch(url1), downloader.fetch(url2)]:
....
Is this possible? How can I modify what I am doing in order to be able to yield from multiple coroutines in parallel?
How can I modify what I am doing in order to be able to yield from multiple coroutines in parallel?
You need a function that merges two async sequences into one, iterating over both in parallel and yielding elements from one or the other, as they become available. While such a function is not included in the current standard library, you can find one in the aiostream package.
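For example, aiostream's stream.merge yields items from all sources as they arrive; a hedged sketch adapted to the handler above (downloader, url1 and url2 are assumed from the question):

from aiostream import stream

async def get(self):
    combined = stream.merge(downloader.fetch(url1), downloader.fetch(url2))
    async with combined.stream() as merged:
        async for batch in merged:
            self.write(batch)
            await self.flush()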
You can also write your own merge function, as shown in this answer:
async def merge(*iterables):
    iter_next = {it.__aiter__(): None for it in iterables}
    while iter_next:
        for it, it_next in iter_next.items():
            if it_next is None:
                fut = asyncio.ensure_future(it.__anext__())
                fut._orig_iter = it
                iter_next[it] = fut
        done, _ = await asyncio.wait(iter_next.values(),
                                     return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            iter_next[fut._orig_iter] = None
            try:
                ret = fut.result()
            except StopAsyncIteration:
                del iter_next[fut._orig_iter]
                continue
            yield ret
Using that function, the loop would look like this:
async for batch in merge(downloader.fetch(url1), downloader.fetch(url2)):
....
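A self-contained demo of the merge() helper above, with two toy async generators (slow and fast are illustrative names):

import asyncio

async def slow():
    for i in range(3):
        await asyncio.sleep(0.3)
        yield f"slow-{i}"

async def fast():
    for i in range(3):
        await asyncio.sleep(0.1)
        yield f"fast-{i}"

async def demo():
    async for item in merge(slow(), fast()):
        print(item)  # items interleave in completion order

asyncio.run(demo())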
Edit:
As mentioned in the comments, the method below does not execute the given coroutines in parallel.
Check out the aitertools library.
import asyncio
import aitertools

async def f1():
    await asyncio.sleep(5)
    yield 1

async def f2():
    await asyncio.sleep(6)
    yield 2

async def iter_funcs():
    async for x in aitertools.chain(f2(), f1()):
        print(x)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(iter_funcs())
It seems that the functions being iterated must be coroutines.