I am working on a sample program that reads from a data source (CSV or RDBMS) in chunks, performs some transformations, and sends the data via socket to a server.
But because the CSV is very large, for testing purposes I want to stop reading after a few chunks.
Unfortunately something goes wrong and I do not know what or how to fix it. Probably I have to do some cancellation, but I am not sure where and how. I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>
The sample code is:
import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)

async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k: v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Many async resources, such as generators, need to be cleaned up with the help of an event loop. When an async for loop stops iterating an async generator via break, the generator is cleaned up by the garbage collector only. This means the cleanup task is pending (it waits for the event loop) but gets destroyed (by the garbage collector).
The most straightforward fix is to aclose the generator explicitly:
async def main():
    i = 0
    aiter = readChunks()  # name the iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()  # ... clean it up when done
These patterns can be simplified using the asyncstdlib library (disclaimer: I maintain this library). asyncstdlib.islice allows taking a fixed number of items before cleanly closing the generator:
import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
If the break condition is dynamic, scoping the iterator guarantees cleanup in any case:
import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break
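If you prefer to stay within the standard library, contextlib.aclosing (available since Python 3.10) gives the same scoped-cleanup guarantee; a minimal sketch applied to readChunks:

from contextlib import aclosing  # Python 3.10+

async def main():
    i = 0
    async with aclosing(readChunks()) as chunks:
        async for chunk in chunks:
            ...
            i += 1
            if i > 5:
                break  # the generator is closed when the async with block exits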
This works...
import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)

async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k:v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':
    asyncio.run(main())
... At least it runs and finishes.
The issue with stopping an async generator by breaking out of an async for loop is described in bugs.python.org/issue38013, and it looks like it was fixed in 3.7.5.
However, using
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()
I get a debug error but no exception in Python 3.8:
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>
Using the higher-level API asyncio.run(main()) with debugging on, I do not get the debug message. If you are going to try and upgrade to Python 3.7.5-9, you probably should still use asyncio.run().
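Part of what asyncio.run does for you is finalize any outstanding async generators before closing the loop. If you stay on the lower-level API, you can request the same cleanup explicitly; a sketch:

loop = asyncio.get_event_loop()
loop.set_debug(True)
try:
    loop.run_until_complete(main())
finally:
    # finalize async generators the way asyncio.run does before closing
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()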
The problem is simple. You exit the loop early, but the async generator is not exhausted yet (it is still pending):
...
if i > 5:
    break
...
Your readChunks is running asynchronously, and your loop breaks out of it before the generator has completed. That's why asyncio reports Task was destroyed but it is pending!
In short, the async task was still doing its work in the background, but you killed it by breaking the loop (stopping the program).
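A minimal sketch of the fix in this framing, same idea as the explicit aclose shown earlier: keep a reference to the generator and finalize it yourself once you are done.

async def main():
    i = 0
    gen = readChunks()
    try:
        async for chunk in gen:
            for k, v in chunk.items():
                await send({k: v})
            i += 1
            if i > 5:
                break  # early exit is fine now ...
    finally:
        await gen.aclose()  # ... because the generator is finalized here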
Related
I am trying to write an iterator which moves on to the next step in the iteration while awaiting an IO-bound task. To roughly demonstrate what I'm trying to do in code:
for i in iterable:
    await io_bound_task()  # move on to next step in iteration
    # do more stuff when task is complete
I initially tried running with a simple for loop, with a sleep simulating an IO-bound task:
import asyncio
import random

async def main() -> None:
    for i in range(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
Here the code runs synchronously and outputs:
starting task 0
finished task 0
starting task 1
finished task 1
starting task 2
finished task 2
which I assume is because the for loop is blocking. So I think an asynchronous for loop is the way to proceed? So I try using an asynchronous iterator:
from __future__ import annotations
import asyncio
import random

class AsyncIterator:
    def __init__(self, max_value: int) -> None:
        self.max_value = max_value
        self.count = 0

    def __aiter__(self) -> AsyncIterator:
        return self

    async def __anext__(self) -> int:
        if self.count == self.max_value:
            raise StopAsyncIteration
        self.count += 1
        return self.count

async def main() -> None:
    async for i in AsyncIterator(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
but this also seems to run synchronously and results in the output
starting task 1
finished task 1
starting task 2
finished task 2
starting task 3
finished task 3
every time. So I think the asynchronous iterator is not doing what I assumed it would do? At this point I'm stuck. Is it an issue with my understanding of the asynchronous iterator? Can someone give me some pointers as to how to achieve what I'm trying to do?
I'm new to working with async, so apologies if I'm doing something stupid. Any help is appreciated. Thanks.
I'm on python 3.8.10 if that is a relevant detail.
The thing that you are looking for is called a task, and can be created using the asyncio.create_task function. All the approaches you tried involved awaiting the coroutine io_bound_task(i), and await means something like "wait for this to complete before continuing". If you wrap your coroutine in a task, then it will run in the background rather than you having to wait for it to complete before continuing.
Here is a version of your code using tasks:
import asyncio
import random

async def main() -> None:
    tasks = []
    for i in range(3):
        print(f"starting task {i}")
        tasks.append(asyncio.create_task(io_bound_task(i)))
    for task in tasks:
        result = await task
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
Output:
starting task 0
starting task 1
starting task 2
finished task 0
finished task 1
finished task 2
You can also use asyncio.gather (if you need all results before continuing) or asyncio.wait for awaiting multiple tasks, rather than a loop. For example, if task 2 completes before task 0 and you don't want to wait for task 0, you could do:
async def main() -> None:
    pending = []
    for i in range(3):
        print(f"starting task {i}")
        pending.append(asyncio.create_task(io_bound_task(i)))
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            result = await task
            print(f"finished task {result}")
I have a complex function Vehicle.set_data, which has many nested functions, API calls, DB calls, etc. For the sake of this example, I will simplify it.
I am trying to use Async IO to run Vehicle.set_data on multiple vehicles at once. Here is my Vehicle model:
import asyncio
import random
import string
import time

class Vehicle:
    def __init__(self, token):
        self.token = token

    # Works async
    async def set_data(self):
        await asyncio.sleep(random.random() * 10)

    # Does not work async
    # def set_data(self):
    #     time.sleep(random.random() * 10)
And here is my Async IO routine:
async def set_vehicle_data(vehicle):
    # sleep for T seconds on average
    await vehicle.set_data()

def get_random_string():
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(5))

async def producer(queue):
    count = 0
    while True:
        count += 1
        # produce a token and send it to a consumer
        token = get_random_string()
        vehicle = Vehicle(token)
        print(f'produced {vehicle.token}')
        await queue.put(vehicle)
        if count > 3:
            break

async def consumer(queue):
    while True:
        vehicle = await queue.get()
        # process the token received from a producer
        print(f'Starting consumption for vehicle {vehicle.token}')
        await set_vehicle_data(vehicle)
        queue.task_done()
        print(f'Ending consumption for vehicle {vehicle.token}')

async def main():
    queue = asyncio.Queue()
    # #todo now, do I need multiple producers
    producers = [asyncio.create_task(producer(queue))
                 for _ in range(3)]
    consumers = [asyncio.create_task(consumer(queue))
                 for _ in range(3)]
    # with both producers and consumers running, wait for
    # the producers to finish
    await asyncio.gather(*producers)
    print('---- done producing')
    # wait for the remaining tasks to be processed
    await queue.join()
    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

asyncio.run(main())
In the example above, this commented-out section of code does not allow multiple vehicles to be processed at once:
# Does not work async
# def set_data(self):
#     time.sleep(random.random() * 10)
Because this is such a complex query in our actual codebase, it would be a tremendous refactor to go flag every single nested function with async and await. Is there any way I can make this function work async without marking up my whole codebase with async?
You can run the function in a separate thread with asyncio.to_thread:

await asyncio.to_thread(self.set_data)

If you're using Python < 3.9, use loop.run_in_executor:

loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self.set_data)
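Applied to the Vehicle model above, a minimal sketch (set_data_async is a hypothetical wrapper name, not part of the original code):

import asyncio
import random
import time

class Vehicle:
    def __init__(self, token):
        self.token = token

    def set_data(self):
        # the original blocking implementation stays untouched
        time.sleep(random.random() * 10)

    async def set_data_async(self):
        # hypothetical wrapper: push the blocking call onto a worker
        # thread so the event loop stays responsive (Python 3.9+)
        await asyncio.to_thread(self.set_data)

The consumers would then call await vehicle.set_data_async() instead of await vehicle.set_data().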
I would like to listen for events from multiple instances of the same object and then merge these event streams into one stream. For example, if I use async generators:
import asyncio

class PeriodicYielder:
    def __init__(self, period: int) -> None:
        self.period = period

    async def updates(self):
        while True:
            await asyncio.sleep(self.period)
            yield self.period
I can successfully listen for events from one instance:
async def get_updates_from_one():
    each_1 = PeriodicYielder(1)
    async for n in each_1.updates():
        print(n)

# 1
# 1
# 1
# ...
But how can I get events from multiple async generators? In other words: how can I iterate through multiple async generators in the order they are ready to produce next value?
async def get_updates_from_multiple():
    each_1 = PeriodicYielder(1)
    each_2 = PeriodicYielder(2)
    async for n in magic_async_join_function(each_1.updates(), each_2.updates()):
        print(n)

# 1
# 1
# 2
# 1
# 1
# 2
# ...
Is there such magic_async_join_function in stdlib or in 3rd party module?
You can use the wonderful aiostream library. It'll look like this:
import asyncio
from aiostream import stream

async def test1():
    for _ in range(5):
        await asyncio.sleep(0.1)
        yield 1

async def test2():
    for _ in range(5):
        await asyncio.sleep(0.2)
        yield 2

async def main():
    combine = stream.merge(test1(), test2())
    async with combine.stream() as streamer:
        async for item in streamer:
            print(item)

asyncio.run(main())
Result:
1
1
2
1
1
2
1
2
2
2
If you wanted to avoid the dependency on an external library (or as a learning exercise), you could merge the async iterators using a queue:
import asyncio

def merge_async_iters(*aiters):
    # merge async iterators, proof of concept
    queue = asyncio.Queue(1)

    async def drain(aiter):
        async for item in aiter:
            await queue.put(item)

    async def merged():
        while not all(task.done() for task in tasks):
            yield await queue.get()

    tasks = [asyncio.create_task(drain(aiter)) for aiter in aiters]
    return merged()
This passes the test from Mikhail's answer, but it's not perfect: it doesn't propagate the exception in case one of the async iterators raises. Also, if the task that exhausts the merged generator returned by merge_async_iters() gets cancelled, or if the same generator is not exhausted to the end, the individual drain tasks are left hanging.
A more complete version could handle the first issue by detecting an exception and transmitting it through the queue. The second issue can be resolved by the merged generator cancelling the drain tasks as soon as the iteration is abandoned. With those changes, the resulting code looks like this:
def merge_async_iters(*aiters):
    queue = asyncio.Queue(1)
    run_count = len(aiters)
    cancelling = False

    async def drain(aiter):
        nonlocal run_count
        try:
            async for item in aiter:
                await queue.put((False, item))
        except Exception as e:
            if not cancelling:
                await queue.put((True, e))
            else:
                raise
        finally:
            run_count -= 1

    async def merged():
        try:
            while run_count:
                raised, next_item = await queue.get()
                if raised:
                    cancel_tasks()
                    raise next_item
                yield next_item
        finally:
            cancel_tasks()

    def cancel_tasks():
        nonlocal cancelling
        cancelling = True
        for t in tasks:
            t.cancel()

    tasks = [asyncio.create_task(drain(aiter)) for aiter in aiters]
    return merged()
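A usage sketch, assuming the PeriodicYielder class from the question: since the sources are infinite, the consumer breaks out and closes the merged generator explicitly, which triggers cancel_tasks() via its finally clause:

async def get_updates_from_multiple():
    merged = merge_async_iters(PeriodicYielder(1).updates(),
                               PeriodicYielder(2).updates())
    try:
        count = 0
        async for n in merged:
            print(n)
            count += 1
            if count >= 6:
                break  # stop somewhere, the sources never end
    finally:
        await merged.aclose()  # cancels the drain tasks

asyncio.run(get_updates_from_multiple())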
Different approaches to merging async iterators can be found in this answer, and also this one, where the latter allows for adding new streams mid-stride. The complexity and subtlety of these implementations shows that, while it is useful to know how to write one, actually doing so is best left to well-tested external libraries such as aiostream that cover all the edge cases.
Is it possible to run an async while loop independently of another one?
Instead of the actual code, I have isolated the issue I am having into the following example code:
import asyncio, time

class Time:
    def __init__(self):
        self.start_time = 0

    async def dates(self):
        while True:
            t = time.time()
            if self.start_time == 0:
                self.start_time = t
            yield t
            await asyncio.sleep(1)

    async def printer(self):
        while True:
            print('looping')  # always called
            await asyncio.sleep(self.interval)

    async def init(self):
        async for i in self.dates():
            if i == self.start_time:
                self.interval = 3
                await self.printer()
            print(i)  # Never Called

loop = asyncio.get_event_loop()
t = Time()
loop.run_until_complete(t.init())
Is there a way to have the print function run independently so print(i) gets called each time?
What it should do is print(i) each second and every 3 seconds call self.printer(i)
Essentially self.printer is a separate task that does not need to be called very often, only every x seconds (in this case 3).
In JavaScript the solution is to do something like so
setInterval(printer, 3000);
EDIT: Ideally self.printer would also be able to be canceled / stopped if a condition or stopping function is called
The asyncio equivalent of JavaScript's setTimeout would be asyncio.ensure_future:
import asyncio

async def looper():
    for i in range(1_000_000_000):
        print(f'Printing {i}')
        await asyncio.sleep(0.5)

async def main():
    print('Starting')
    future = asyncio.ensure_future(looper())
    print('Waiting for a few seconds')
    await asyncio.sleep(4)
    print('Cancelling')
    future.cancel()
    print('Waiting again for a few seconds')
    await asyncio.sleep(2)
    print('Done')

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
You'd want to register your self.printer() coroutine as a separate task; pass it to asyncio.ensure_future() rather than awaiting it directly:
asyncio.ensure_future(self.printer())
By passing the coroutine to asyncio.ensure_future(), you put it on the list of events that the loop switches between as each awaits on further work to be completed.
With that change, your test code outputs:
1516819094.278697
looping
1516819095.283424
1516819096.283742
looping
1516819097.284152
# ... etc.
Tasks are the asyncio equivalent of threads in a multithreading scenario.
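On Python 3.7+ the preferred spelling of the same idea is asyncio.create_task, and keeping the task handle also covers the EDIT's cancellation requirement. A sketch of init rewritten along those lines (the interval is set up front rather than inside the loop):

async def init(self):
    self.interval = 3
    printer_task = asyncio.create_task(self.printer())  # runs concurrently
    try:
        async for i in self.dates():
            print(i)  # printed every second, independently of printer()
    finally:
        printer_task.cancel()  # the printer can be stopped on demand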
How can one add a new coroutine to a running asyncio loop? I.e. one that is already executing a set of coroutines.
I guess as a workaround one could wait for the existing coroutines to complete and then initialize a new loop (with the additional coroutine). But is there a better way?
You can use create_task for scheduling new coroutines:
import asyncio

async def cor1():
    ...

async def cor2():
    ...

async def main(loop):
    await asyncio.sleep(0)
    t1 = loop.create_task(cor1())
    await cor2()
    await t1

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
loop.close()
To add a function to an already running event loop you can use:
asyncio.ensure_future(my_coro())
In my case I was using multithreading (threading) alongside asyncio and wanted to add a task to the event loop that was already running. For anyone else in the same situation, be sure to explicitly state the event loop (as one doesn't exist inside a Thread), i.e.:
In global scope:
event_loop = asyncio.get_event_loop()
Then later, inside your Thread:
asyncio.ensure_future(my_coro(), loop=event_loop)
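Note that ensure_future itself is not documented as thread-safe. If the loop runs in a different thread, asyncio provides run_coroutine_threadsafe for exactly this case; a sketch reusing the names above:

# inside your Thread, schedule onto the loop owned by the other thread:
future = asyncio.run_coroutine_threadsafe(my_coro(), event_loop)
result = future.result()  # optionally block this thread until it completes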
Your question is very close to "How to add a function call to a running program?"
When exactly do you need to add a new coroutine to the event loop?
Let's look at some examples. Here is a program that starts an event loop with two coroutines running in parallel:
import asyncio
from random import randint

async def coro1():
    res = randint(0, 3)
    await asyncio.sleep(res)
    print('coro1 finished with output {}'.format(res))
    return res

async def main():
    await asyncio.gather(
        coro1(),
        coro1()
    )  # here we have two coroutines running in parallel

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
coro1 finished with output 1
coro1 finished with output 2
[Finished in 2.2s]
Maybe you need to add some coroutines that take the result of coro1 and use it as soon as it's ready? In that case, just create a coroutine that awaits coro1 and uses its return value:
import asyncio
from random import randint

async def coro1():
    res = randint(0, 3)
    await asyncio.sleep(res)
    print('coro1 finished with output {}'.format(res))
    return res

async def coro2():
    res = await coro1()
    res = res * res
    await asyncio.sleep(res)
    print('coro2 finished with output {}'.format(res))
    return res

async def main():
    await asyncio.gather(
        coro2(),
        coro2()
    )  # here we have two coroutines running in parallel

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
coro1 finished with output 1
coro2 finished with output 1
coro1 finished with output 3
coro2 finished with output 9
[Finished in 12.2s]
Think about coroutines as regular functions with specific syntax. You can start a set of functions to execute in parallel (with asyncio.gather), you can start the next function after the first one is done, and you can create new functions that call others.
If the task is to add one or more coroutines to a loop that is already executing some coroutines, then you can use this solution of mine:
import asyncio
import time
from threading import Thread
from random import randint

# first, we need a loop running in a parallel Thread
class AsyncLoopThread(Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.loop = asyncio.new_event_loop()

    def run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

# example coroutine
async def coroutine(num, sec):
    await asyncio.sleep(sec)
    print(f'Coro {num} has finished')

if __name__ == '__main__':
    # init a loop in another Thread
    loop_handler = AsyncLoopThread()
    loop_handler.start()

    # adding first 5 coros
    for i in range(5):
        print(f'Add Coro {i} to the loop')
        asyncio.run_coroutine_threadsafe(coroutine(i, randint(3, 5)), loop_handler.loop)

    time.sleep(3)
    print('Adding 5 more coros')

    # adding 5 more coros
    for i in range(5, 10):
        print(f'Add Coro {i} to the loop')
        asyncio.run_coroutine_threadsafe(coroutine(i, randint(3, 5)), loop_handler.loop)

    # let them all finish
    time.sleep(60)
After executing this example we get this output:
Add Coro 0 to the loop
Add Coro 1 to the loop
Add Coro 2 to the loop
Add Coro 3 to the loop
Add Coro 4 to the loop
Coro 0 has finished
Adding 5 more coros
Add Coro 5 to the loop
Add Coro 6 to the loop
Add Coro 7 to the loop
Add Coro 8 to the loop
Add Coro 9 to the loop
Coro 1 has finished
Coro 3 has finished
Coro 2 has finished
Coro 4 has finished
Coro 9 has finished
Coro 5 has finished
Coro 7 has finished
Coro 6 has finished
Coro 8 has finished
Process finished with exit code 0
None of the answers here seem to exactly answer the question. It is possible to add tasks to a running event loop by having a "parent" task do it for you. I'm not sure what the most Pythonic way is to make sure the parent doesn't end until its children have all finished (assuming that's the behavior you want), but this does work.
import asyncio
import random

async def add_event(n):
    print('starting ' + str(n))
    await asyncio.sleep(n)
    print('ending ' + str(n))
    return n

async def main(loop):
    added_tasks = []
    delays = list(range(5))
    # shuffle to simulate unknown run times
    random.shuffle(delays)
    for n in delays:
        print('adding ' + str(n))
        task = loop.create_task(add_event(n))
        added_tasks.append(task)
        await asyncio.sleep(0)
    print('done adding tasks')
    results = await asyncio.gather(*added_tasks)
    print('done running tasks')
    return results

loop = asyncio.get_event_loop()
results = loop.run_until_complete(main(loop))
print(results)
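For reference, a sketch of the same driver using the higher-level API available since Python 3.7, which removes the explicit loop handling (modern_main is just an illustrative name):

async def modern_main():
    # create_task can be used directly inside a running coroutine
    tasks = [asyncio.create_task(add_event(n)) for n in range(5)]
    return await asyncio.gather(*tasks)

print(asyncio.run(modern_main()))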