The program I wrote loops through a range and finds numbers that are both prime and palindromic.
As part of learning asyncio I tried to reconstruct it using async, but the result was not good: the async code takes much longer than the synchronous code.
Synchronous code:
import math
import time

def prime(n):
    limit = int(math.sqrt(n))
    for j in range(2, limit + 1):  # limit + 1 so the square root itself is tested
        if n % j == 0:
            return 0
    return 1

def palindrome(n):
    s = str(n)
    return 1 if s == s[::-1] else 0

a, b, c = 999999999, 9999999, 0
start = time.time()
for i in range(a, b, -1):
    if palindrome(i):
        if prime(i):
            c += 1
            print(i)
            if c == 20:
                break
print("took --> ", time.time() - start)
RESULT:
999727999
999686999
999676999
999565999
999454999
999434999
999272999
999212999
999070999
998979899
998939899
998898899
998757899
998666899
998565899
998333899
998282899
998202899
998171899
998121899
took --> 0.6525201797485352
Asynchronous code:
import math
import time
import asyncio

async def is_prime(n):
    limit = int(math.sqrt(n))
    for j in range(2, limit + 1):
        await asyncio.sleep(0)
        if n % j == 0:
            return 0
    return 1

async def is_palindrome(n):
    await asyncio.sleep(0)
    s = str(n)
    return 1 if s == s[::-1] else 0

async def waiting(start):
    while True:
        print("processing --> time took {:.2f} --> still running".format(time.time() - start))
        await asyncio.sleep(2)

async def main():
    a, b, c = 999999999, 9999999, 0
    start = time.time()
    for i in range(a, b, -1):
        await asyncio.sleep(0)
        if await is_palindrome(i):
            if await is_prime(i):
                c += 1
                print(i)
                if c == 20:
                    break
    print(f"Found {c} results in {time.time()-start}s exiting now")

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.create_task(waiting(time.time()))
    future = asyncio.ensure_future(main())
    loop.run_until_complete(future)
RESULT:
999727999
999686999
999676999
999565999
999454999
999434999
999272999
999212999
999070999
998979899
998939899
998898899
998757899
998666899
998565899
998333899
998282899
998202899
998171899
998121899
Found 20 results in 18.48567509651184s exiting now
Another interesting thing is that with loop.set_debug(True) the code takes 103 seconds to complete.
Can someone explain why this happens?
Your use case seems to be CPU-bound only and does not involve any IO work.
Async in Python is mainly used to keep the CPU busy while IO operations (HTTP requests, file writes) are in flight.
I think you might be confusing it with threading. Python can only use one CPU core at a time, and async jobs are queued and executed on that same core. This means that in your example you gain nothing by using async; instead you add scheduling overhead that slows down your execution.
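Since the bottleneck here is the CPU, the usual way to actually speed this search up is multiple processes rather than async. Below is a minimal sketch using concurrent.futures; the helper names (check, is_prime, is_palindrome) and the chunksize value are mine, not from the code above:

```python
import math
from concurrent.futures import ProcessPoolExecutor

def is_prime(n):
    if n < 2:
        return False
    for j in range(2, int(math.sqrt(n)) + 1):
        if n % j == 0:
            return False
    return True

def is_palindrome(n):
    s = str(n)
    return s == s[::-1]

def check(n):
    # Returns n when it is a palindromic prime, else None.
    return n if is_palindrome(n) and is_prime(n) else None

if __name__ == "__main__":
    found = []
    # chunksize matters: without it every number becomes one tiny task
    # and inter-process overhead dominates the actual computation.
    with ProcessPoolExecutor() as pool:
        for hit in pool.map(check, range(99999, 90000, -1), chunksize=1000):
            if hit is not None:
                found.append(hit)
                if len(found) == 5:
                    break
    print(found)
```

Unlike asyncio.sleep(0), which only interleaves work on one core, this actually runs the prime tests on several cores at once.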
Related
I had the hypothesis that if I wrote mutually recursive coroutines with asyncio, they would not hit the maximum recursion depth exception, since the event loop was calling them (and acting like a trampoline). This, however, is not the case when I write them like this:
import asyncio

@asyncio.coroutine
def a(n):
    print("A: {}".format(n))
    if n > 1000: return n
    else: yield from b(n+1)

@asyncio.coroutine
def b(n):
    print("B: {}".format(n))
    yield from a(n+1)

loop = asyncio.get_event_loop()
loop.run_until_complete(a(0))
When this runs, I get RuntimeError: maximum recursion depth exceeded while calling a Python object.
Is there a way to keep the stack from growing in recursive coroutines with asyncio?
To keep the stack from growing, you have to allow each coroutine to actually exit after it schedules the next recursive call, which means you have to avoid using yield from. Instead, you use asyncio.async (or asyncio.ensure_future if using Python 3.4.4+) to schedule the next coroutine with the event loop, and use Future.add_done_callback to schedule a callback to run once the recursive call returns. Each coroutine then returns an asyncio.Future object, which has its result set inside the callback that's run when the recursive call it scheduled completes.
It's probably easiest to understand if you actually see the code:
import asyncio

@asyncio.coroutine
def a(n):
    fut = asyncio.Future()  # We're going to return this right away to our caller
    def set_result(out):  # This gets called when the next recursive call completes
        fut.set_result(out.result())  # Pull the result from the inner call and return it up the stack.
    print("A: {}".format(n))
    if n > 1000:
        return n
    else:
        in_fut = asyncio.async(b(n+1))  # This returns an asyncio.Task
        in_fut.add_done_callback(set_result)  # Schedule set_result when the Task is done.
        return fut

@asyncio.coroutine
def b(n):
    fut = asyncio.Future()
    def set_result(out):
        fut.set_result(out.result())
    print("B: {}".format(n))
    in_fut = asyncio.async(a(n+1))
    in_fut.add_done_callback(set_result)
    return fut

loop = asyncio.get_event_loop()
print("Out is {}".format(loop.run_until_complete(a(0))))
Output:
A: 0
B: 1
A: 2
B: 3
A: 4
B: 5
...
A: 994
B: 995
A: 996
B: 997
A: 998
B: 999
A: 1000
B: 1001
A: 1002
Out is 1002
Now, your example code doesn't actually return n all the way back up the stack, so you could make something functionally equivalent that's a bit simpler:
import asyncio

@asyncio.coroutine
def a(n):
    print("A: {}".format(n))
    if n > 1000: loop.stop(); return n
    else: asyncio.async(b(n+1))

@asyncio.coroutine
def b(n):
    print("B: {}".format(n))
    asyncio.async(a(n+1))

loop = asyncio.get_event_loop()
asyncio.async(a(0))
loop.run_forever()
But I suspect you really meant to return n all the way back up.
In Python 3.7, you can achieve the "trampoline" effect by using asyncio.create_task() instead of awaiting the coroutine directly.
import asyncio

async def a(n):
    print(f"A: {n}")
    if n > 1000: return n
    return await asyncio.create_task(b(n+1))

async def b(n):
    print(f"B: {n}")
    return await asyncio.create_task(a(n+1))

assert asyncio.run(a(0)) == 1002
However, this has the disadvantage that the event loop still needs to keep track of all the intermediate tasks, since each task is awaiting its successor. We can use a Future object to avoid this problem.
import asyncio

async def _a(n, f):
    print(f"A: {n}")
    if n > 1000:
        f.set_result(n)
        return
    asyncio.create_task(_b(n+1, f))

async def _b(n, f):
    print(f"B: {n}")
    asyncio.create_task(_a(n+1, f))

async def a(n):
    f = asyncio.get_running_loop().create_future()
    asyncio.create_task(_a(n, f))
    return await f

assert asyncio.run(a(0)) == 1002
I changed the code to use async/await and measured the time. I really like how much more readable it is.
Future:
import asyncio

@asyncio.coroutine
def a(n):
    fut = asyncio.Future()
    def set_result(out):
        fut.set_result(out.result())
    if n > 1000:
        return n
    else:
        in_fut = asyncio.async(b(n+1))
        in_fut.add_done_callback(set_result)
        return fut

@asyncio.coroutine
def b(n):
    fut = asyncio.Future()
    def set_result(out):
        fut.set_result(out.result())
    in_fut = asyncio.async(a(n+1))
    in_fut.add_done_callback(set_result)
    return fut

import timeit
print(min(timeit.repeat("""
loop = asyncio.get_event_loop()
loop.run_until_complete(a(0))
""", "from __main__ import a, b, asyncio", number=10)))
Result:
% time python stack_ori.py
0.6602963969999109
python stack_ori.py 2,06s user 0,01s system 99% cpu 2,071 total
Async, await:
import asyncio

async def a(n):
    if n > 1000:
        return n
    else:
        ret = await asyncio.ensure_future(b(n + 1))
        return ret

async def b(n):
    ret = await asyncio.ensure_future(a(n + 1))
    return ret

import timeit
print(min(timeit.repeat("""
loop = asyncio.get_event_loop()
loop.run_until_complete(a(0))
""", "from __main__ import a, b, asyncio", number=10)))
Result:
% time python stack.py
0.45157229300002655
python stack.py 1,42s user 0,02s system 99% cpu 1,451 total
I am trying to write an iterator which moves on to the next step in the iteration while an IO-bound task is awaited. Roughly, in code:

for i in iterable:
    await io_bound_task()  # move on to next step in iteration
    # do more stuff when task is complete
I initially tried running with a simple for loop, with a sleep simulating an IO bound task
import asyncio
import random

async def main() -> None:
    for i in range(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
here the code runs synchronously and outputs
starting task 0
finished task 0
starting task 1
finished task 1
starting task 2
finished task 2
which I assume is because the for loop is blocking. So I think an asynchronous for loop is the way to proceed? So I try using an asynchronous iterator:
from __future__ import annotations
import asyncio
import random

class AsyncIterator:
    def __init__(self, max_value: int) -> None:
        self.max_value = max_value
        self.count = 0

    def __aiter__(self) -> AsyncIterator:
        return self

    async def __anext__(self) -> int:
        if self.count == self.max_value:
            raise StopAsyncIteration
        self.count += 1
        return self.count

async def main() -> None:
    async for i in AsyncIterator(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
but this also seems to run synchronously and results in the output
starting task 1
finished task 1
starting task 2
finished task 2
starting task 3
finished task 3
every time. So I think the asynchronous iterator is not doing what I assumed it would do? At this point I'm stuck. Is it an issue with my understanding of the asynchronous iterator? Can someone give me some pointers as to how to achieve what I'm trying to do?
I'm new to working with async, so apologies if I'm doing something stupid. Any help is appreciated. Thanks.
I'm on python 3.8.10 if that is a relevant detail.
The thing that you are looking for is called a task, and can be created using the asyncio.create_task function. All the approaches you tried involved awaiting the coroutine io_bound_task(i), and await means something like "wait for this to complete before continuing". If you wrap your coroutine in a task, then it will run in the background rather than you having to wait for it to complete before continuing.
Here is a version of your code using tasks:
import asyncio
import random

async def main() -> None:
    tasks = []
    for i in range(3):
        print(f"starting task {i}")
        tasks.append(asyncio.create_task(io_bound_task(i)))
    for task in tasks:
        result = await task
        print(f"finished task {result}")

async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i

asyncio.run(main())
Output:
starting task 0
starting task 1
starting task 2
finished task 0
finished task 1
finished task 2
You can also use asyncio.gather (if you need all results before continuing) or asyncio.wait for awaiting multiple tasks, rather than a loop. For example if task 2 completes before task 0 and you don't want to wait for task 0, you could do:
async def main() -> None:
    pending = []
    for i in range(3):
        print(f"starting task {i}")
        pending.append(asyncio.create_task(io_bound_task(i)))
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            result = await task
            print(f"finished task {result}")
I was using multiprocessing before trying async. To test which is faster I am trying to run the code with async, but it gives me an error saying: 'await' outside async function.
My code:
import asyncio
import time

async def sleep():
    print(f'Time: {time.time() - start:.2f}')
    await asyncio.sleep(1)

async def sum(name, numbers):
    def sum_(numbers):
        total = 0
        print(f'Task {name}: Computing {total}+{number}')
        await sleep()
        total += number
        print(f'Task {name}: Sum = {total}\n')
    for number in numbers:
        sum_(numbers)

start = time.time()
loop = asyncio.get_event_loop()
tasks = [
    loop.create_task(sum("A", [1, 2])),
    loop.create_task(sum("B", [1, 2, 3])),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
end = time.time()
print(f'Time: {end-start:.2f} sec')
Please note: this is just example code; in the original code I cannot do the following:

for number in numbers:
    sum_(numbers)
    await sleep()
I am testing asyncio as advised in this comment.
sum_ is a separate function, as far as Python is concerned. If you want to await inside a function, it needs to be async. And if you want to call an async function, you need to await it.
async def sum(name, numbers):
    async def sum_(numbers):  # <-- This function needs to be async
        total = 0
        print(f'Task {name}: Computing {total}+{number}')
        await sleep()
        total += number
        print(f'Task {name}: Sum = {total}\n')
    for number in numbers:
        await sum_(numbers)  # <-- And we need to await it here
I am working on a sample program that reads from a data source (CSV or RDBMS) in chunks, makes some transformations and sends the data via socket to a server.
But because the CSV is very large, for testing purposes I want to stop reading after a few chunks.
Unfortunately something goes wrong and I do not know what or how to fix it. Probably I have to do some cancellation, but I'm not sure where and how. I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>
The sample code is:
import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)

async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k: v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Many async resources, such as generators, need to be cleaned up with the help of an event loop. When an async for loop stops iterating an async generator via break, the generator is cleaned up by the garbage collector only. This means the task is pending (waits for the event loop) but gets destroyed (by the garbage collector).
The most straightforward fix is to aclose the generator explicitly:
async def main():
    i = 0
    aiter = readChunks()  # name iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()  # ... clean it up when done
These patterns can be simplified using asyncstdlib (disclaimer: I maintain this library). asyncstdlib.islice takes a fixed number of items before cleanly closing the generator:
import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
If the break condition is dynamic, scoping the iterator guarantees cleanup in any case:
import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break
This works...
import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)

async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k: v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':
    asyncio.run(main())
... At least it runs and finishes.
The issue with stopping an async generator by breaking out of an async for loop is described in bugs.python.org/issue38013 and looks like it was fixed in 3.7.5.
However, using
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()
I get a debug error but no Exception in Python 3.8.
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>
Using the higher level API asyncio.run(main()) with debugging ON I do not get the debug message. If you are going to try and upgrade to Python 3.7.5-9 you probably should still use asyncio.run().
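For completeness, a minimal sketch of that higher-level API: asyncio.run() creates a fresh event loop, runs the coroutine to completion, and closes the loop, and its debug flag enables the same diagnostics as loop.set_debug(True):

```python
import asyncio

async def main():
    await asyncio.sleep(0)
    return "done"

# Equivalent to creating a loop, calling set_debug(True),
# run_until_complete(main()) and close(), in a single call.
result = asyncio.run(main(), debug=True)
print(result)  # prints: done
```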
The problem is simple: you exit the loop early, but the async generator is not exhausted yet (it's pending):

...
if i > 5:
    break
...

Your readChunks generator is still running on the event loop, and by breaking out of the loop you end the program without letting it finish.
That's why it reports "Task was destroyed but it is pending!"
In short, the async task was doing its work in the background, but you killed it by breaking the loop (stopping the program).
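To actually silence the warning in this situation, the generator needs to be closed explicitly after the break, as in the aclose() answer above. A self-contained sketch, where the gen coroutine is my stand-in for readChunks:

```python
import asyncio

async def gen():
    # stand-in for readChunks()
    for i in range(10):
        await asyncio.sleep(0)
        yield i

async def main():
    agen = gen()
    results = []
    try:
        async for item in agen:
            results.append(item)
            if len(results) == 3:
                break  # the generator is still pending here ...
    finally:
        await agen.aclose()  # ... so close it on the event loop
    return results

print(asyncio.run(main()))  # prints: [0, 1, 2]
```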