Async function runs in a sync (serial) way, tasks block each other - python

I do not get any acceleration using asyncio. This snippet still runs the same fashion as a sync job. Most of the examples use asyncio.sleep() to impose delay, my question is what if part of the code poses the delay depending on the input parameters.
async def c(n):
#this loop is supposed to impose delay
for i in range(1, n * 40000):
c *= i
return n
async def f():
tasks = [c(i) for i in [2,1,3]]
r=[]
completed, pending = await asyncio.wait(tasks)
for item in completed:
r.append(item.result())
return r
if __name__=="__main__":
loop = asyncio.get_event_loop()
k=loop.run_until_complete(f())
loop.close()
I expect to get [1,2,3] but I do not (there is no time difference when running in serial also)

asyncio is not about getting acceleration, it's about avoiding "callback hell" when programming in an asynchronous environment, such as (but not limited to) non-blocking IO. Since the code in the question is not asynchronous, there is nothing to gain from using asyncio - but you can look into multiprocessing instead.
In the above case, the function is defined as async, but it runs its entire calculation without awaiting anything. It also contains references to unassigned variables, so let's start with a version that runs:
async def long_calc(n):
p = 1
for i in range(1, n * 10000):
p *= i
print(math.log(p))
return p
The print at the end immediately indicates when the calculation is done. Starting several such coroutines "in parallel" is done with asyncio.gather:
async def wait_calcs():
return await asyncio.gather(*[long_calc(i) for i in [2, 1, 3]])
asyncio.gather will let the calculations run and return once all of them are complete, returning a list of their results in the order in which they they appear in the argument list. But the output printed when running loop.run_until_complete(wait_calcs()) shows that calculations are not really running in parallel:
178065.71824964616
82099.71749644238
279264.3442843094
The results correspond to the [2, 1, 3] order. If the coroutines were running in parallel, the smallest number would appear first because its coroutine has by far the least work to do.
We can force the coroutine to give a chance to other coroutines to run by introducing a no-op sleep in the inner loop:
async def long_calc(n):
p = 1
for i in range(1, n * 10000):
p *= i
await asyncio.sleep(0)
print(math.log(p))
return p
The output now shows that the coroutines were running in parallel:
82099.71749644238
178065.71824964616
279264.3442843094
Note that this version also takes more time to run because it involves more switching between the coroutines and the main loop. The slowdown can be avoided by only sleeping once in a hundred cycles or so.

Related

Why asynchronous sleep task and cpu-bound task cannot proceed concurrently?

I need to send HTTP requests and do some CPU intensive task while waiting for the response. I tried to mock the situation with an asyncio.sleep and a CPU task below:
import asyncio
async def main():
loop = asyncio.get_event_loop()
start = loop.time()
task = asyncio.create_task(asyncio.sleep(1))
# ------Useless CPU-Bound Task------ #
for n in range(10 ** 7):
n **= 7
# ---------------------------------- #
print(f"CPU-bound process finished in {loop.time()-start:.2f} seconds.")
await task
print(f"Finished in {loop.time()-start:.2f} seconds.")
asyncio.run(main())
Output:
CPU-bound process finished in 2.12 seconds.
Finished in 3.12 seconds.
I expected the sleeping task to proceed during the CPU process but apparently they ran synchronously. This also makes me worry about the requests that I need to send such that CPU process might begin and completely block the requests so that they don't get sent to the server until the process finishes etc.
So the question is why does this happen and how to prevent it?
I've also read somewhere that asyncio only switches context upon await calls. Does this have disadvantages in a situation like this, if so, how?
Append: Will using threading have any advantages over asyncio in this scenario? I know it's many questions, but I'm really confused.
Asyncio tasks are more co-operative concurrency than true concurrency.
Your sleeper task won't actually start running until you "yield" control to it, which is usually done with an await call. Since that happens after your main (CPU-intensive) code is finished, there will be an extra second after that before everything is actually done.
An await asyncio.sleep(0) between sleeper task creation and CPU-intensive work will allow the sleeper task to commence. It will them immediately yield back to the main task and they'll run "concurrently".
Of course, a CPU-bound async task sort of defeats the purpose of asyncio since it won't yield to allow other tasks to run in a timely manner. That doesn't really matter for this sleeper but, if it was a task that had to do thirty things, one per second, that would be a problem.
If you need to do anything like that, it's a good idea to either choose one of the other forty-eight ways of doing concurrency in Python :-), or yield enough in the main task so that other tasks can run. In other words, something like:
yield_cycle = 0.1 # Cycle time.
then = time.monotonic() # Base time.
for n in range(10 ** 7):
n **= 7
if time.monotonic() - then > yield_cycle: # Check cycle time.
await asyncio.sleep(0) # Yield if exceeded.
then = time.monotonic() # Prep next cycle.
In fact, we have a helper function in our own code base which does exactly this. I can't give you the actual source code but I think it's (hopefully) simple enough to recite from memory:
async def play_nice(secs: float, base: float) -> float:
"""Yield periodically in intensive task.
Initial call can use negative base to yield immediately.
Args:
secs: Minimum run time before yield will happen.
base: Base monotonic time to use for calculations.
Returns:
New base time to use.
"""
if base < 0:
base = time.monotonic() - secs
if time.monotonic() - base >= secs:
await asyncio.sleep(0)
return time.monotonic()
return base
# Your code is then:
then = await play_nice(secs=0.1, base=-1) # Initial yield.
for n in range(10 ** 7):
n **= 7
then = await play_nice(secs=0.1, base=then) # Subsequent ones.
The reason is your CPU intensive task has the control until it yields it. You can force it to yield using sleep:
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
import asyncio
async def test_sleep(n):
await asyncio.sleep(n)
async def main():
loop = asyncio.get_event_loop()
start = loop.time()
task = asyncio.create_task(asyncio.sleep(1))
await asyncio.sleep(0)
# ------Useless CPU-Bound Task------ #
for n in range(10 ** 7):
n **= 7
# ---------------------------------- #
print(f'CPU-bound process finished in {loop.time()-start:.2f} seconds.')
await task
print(f"Finished in {loop.time()-start:.2f} seconds.")
await main()
Will output
CPU-bound process finished in 4.21 seconds.
Finished in 4.21 seconds.

Order of execution in async function. How is it determined?

When I run the following asynchronous code:
from asyncio import create_task, sleep, run
async def print_(delay, x):
print(f'start: {x}')
await sleep(delay)
print(f'end: {x}')
async def main():
slow_task = create_task(print_(2, 'slow task'))
fast_task = create_task(print_(1, 'fast task'))
# The order of execution here is strange:
print(0)
await slow_task
print(1)
await fast_task
run(main())
I get an unexpected order of execution:
0
start: slow task
start: fast task
end: fast task
end: slow task
1
What exactly is happening?
How can I predict the order of execution?
What I find strange is print(1) is ignored until all tasks are finished. To my understanding the code runs as expected until it reaches any await. Then it creates a task-loop with any other awaitable it finds down the line. Which it prioritizes. Right?
That's what I find surprising. I'd expect it to run print(1) before any task is complete. Why doesn't it?
Is that the standard behavior or does it vary? If so, what does it vary upon?
If you could into detail how the event loop works alongside the rest of the code, that'd be great.
Let's walk through this:
You create asynchronous processes for a fast task and a slow task at the same time; even the "fast" one will take a significant amount of time.
Immediately after creating them, you print 0, so this becomes your first output.
You call await slow_task, passing control to the event loop until slow_task finishes.
Because you requested slow_task, it's prioritized, so it starts first, so start: slow_task is printed.
Because slow_task contains an await sleep(2), it passes control back to the event loop, which finds fast_task as ready to operate and starts it, so start: fast_task is printed.
Because fast_task's await sleep(1) finishes first, fast_task completes, and end: fast_task is printed. Because we're awaiting slow_task, not fast_task, we remain in the event loop.
Finally, the slow task finishes, so it prints end: slow task. Because this is what we were awaiting for, control flow is returned to the synchronous process.
After the slow task has finished, you print 1, so this becomes your last output.
Finally, you wait for the fast task to finish; it already did finish earlier, while you were waiting for the fast task, so this returns immediately.
Everything is exactly as one would expect.
That said, for the more general case, you can't expect order-of-operations in an async program to be reliably deterministic in real-world cases where you're waiting on I/O operations that can take a variable amount of time.

Asyncify string joining in Python

I have the following code snippet which I want to transform into asynchronous code (data tends to be a large Iterable):
transformed_data = (do_some_transformation(d) for d in data)
stacked_jsons = "\n\n".join(json.dumps(t, separators=(",", ":")) for t in transformed_data)
I managed to rewrite the do_some_transformation-function to be async so I can do the following:
transformed_data = (await do_some_transformation(d) for d in data)
async_generator = (json.dumps(event, separators=(",", ":")) async for t in transformed_data)
stacked_jsons = ???
What's the best way to incrementally join the jsons produced by the async generator so that the joining process is also asynchronous?
This snippet is part of a larger I/O-bound-application which and has many asynchronous components and thus would profit from asynchifying everything.
The point of str.join is to transform an entire list at once.1 If items arrive incrementally, it can be advantageous to accumulate them one by one.
async def join(by: str, _items: 'AsyncIterable[str]') -> str:
"""Asynchronously joins items with some string"""
result = ""
async for item in _items:
if result and by: # only add the separator between items
result += by
result += item
return result
The async for loop is sufficient to let the async iterable suspend between items so that other tasks may run. The primary advantage of this approach is that even for very many items, this never stalls the event loop for longer than adding the next item.
This utility can directly digest the async generator:
stacked_jsons = join("\n\n", (json.dumps(event, separators=(",", ":")) async for t in transformed_data))
When it is know that the data is small enough that str.join runs in adequate time, one can directly convert the data to a list instead and use str.join:
stacked_jsons = "\n\n".join([json.dumps(event, separators=(",", ":")) async for t in transformed_data])
The [... async for ...] construct is an asynchronous list comprehension. This internally works asynchronously to iterate, but produces a regular list once all items are fetched – only this resulting list is passed to str.join and can be processed synchronously.
1 Even when joining an iterable, str.join will internally turn it into a list first.
More in depth explanation about my comment:
Asyncio is a great tool if your processor has a lot of waiting to do.
For example: when you make request to a db over the network, after the request is sent your cpu just does nothing until it gets an answer.
Using the async await syntax you can have your processor execute other tasks while "waiting" for the current one to finish. this does not mean it runs them in parallel. There is only one task running at a time.
In your case (for what i can see) the cpu never waits for something it is constantly running string operations.
if you want to run these operations in parallel you might want to take a look at ProcesPools.
This is not bound by a single process and core but will spread the processing over several cores to run it in parallel.
from concurrent.futures import ProcessPoolExecutor
def main():
with ProcessPoolExecutor() as executor:
transformed_data = executor.map(do_some_transformation, data) #returns an iterable
stacked_jsons = "\n\n".join(json.dumps(t, separators=(",", ":")) for t in transformed_data)
if __name__ == '__main__':
main()
I hope the provided code can help you.
ps.
The if __name__ part is required
edit: i saw your comment about 10k dicts, assume you have 8 cores (ignore multithreading) then each process will only transform 1250 dicts, instead of the 10k your main thread does now. These processes run simultaniously and although the performance increase is not linear it should process them a lot faster.
TL;DR: Consider using producer/consumer pattern, if do_some_transformation is IO bound, and you really want an incremental aggregation.
Of course, async itself only brings an advantage if you actually have any other proper async tasks to begin with.
As #MisterMiyagi said, if do_some_transformation is IO bound and time consuming, firing all transformation as a horde of async tasks can be a good idea.
Example code:
import asyncio
import json
data = ({"large": "data"},) * 3 # large
stacked_jsons = ""
async def transform(d: dict, q: asyncio.Queue) -> None:
# `do_some_transformation`: long IO bound task
await asyncio.sleep(1)
await q.put(d)
# WARNING: incremental concatination of string would be slow,
# since string is immutable.
async def join(q: asyncio.Queue):
global stacked_jsons
while True:
d = await q.get()
stacked_jsons += json.dumps(d, separators=(",", ":")) + "\n\n"
q.task_done()
async def main():
q = asyncio.Queue()
producers = [asyncio.create_task(transform(d, q)) for d in data]
consumer = asyncio.create_task(join(q))
await asyncio.gather(*producers)
await q.join() # Implicitly awaits consumers, too
consumer.cancel()
print(stacked_jsons)
if __name__ == "__main__":
import time
s = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"{__file__} executed in {elapsed:0.2f} seconds.")
So that do_some_transformation don't block each other. Output:
$ python test.py
{"large":"data"}
{"large":"data"}
{"large":"data"}
test.py executed in 1.00 seconds.
Besides, I don't think incremental concatenation of string is a good idea, since string is immutable and a lot of memory would be wasted ;)
Reference: Async IO in Python: A Complete Walkthrough - Real Python

await for any future asyncio

I'm trying to use asyncio to handle concurrent network I/O. A very large number of functions are to be scheduled at a single point which vary greatly in time it takes for each to complete. Received data is then processed in a separate process for each output.
The order in which the data is processed is not relevant, so given the potentially very long waiting period for output I'd like to await for whatever future finishes first instead of a predefined order.
def fetch(x):
sleep()
async def main():
futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
for f in futures:
await f
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Normally, awaiting in order in which futures were queued is fine:
Blue color represents time each task is in executor's queue, i.e. run_in_executor has been called, but the function was not yet executed, as the executor runs only 5 tasks simultaneously; green is time spent on executing the function itself; and the red is the time spent waiting for all previous futures to await.
In my case where functions vary in time greatly, there is a lot of time lost on waiting for previous futures in queue to await, while I could be locally processing GET output. This makes my system idle for a while only to get overwhelmed when several outputs complete simultaneously, then jumping back to idle waiting for more requests to finish.
Is there a way to await whatever future is first completed in the executor?
Looks like you are looking for asyncio.wait with return_when=asyncio.FIRST_COMPLETED.
def fetch(x):
sleep()
async def main():
futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
while futures:
done, futures = await asyncio.wait(futures,
loop=loop, return_when=asyncio.FIRST_COMPLETED)
for f in done:
await f
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

asyncio tasks getting unexpectedly defered

I've been trying to learn a bit about asyncio, and I'm having some unexpected behavior. I've set up a simple fibonacci server that supports multiple connections using streams. The fib calculation is written recursively, so I can simulate long running calculations by entering in a large number. As expected, long running calculations block I/O until the long running calculation completes.
Here's the problem though. I rewrote the fibonacci function to be a coroutine. I expected that by yielding from each recursion, control would fall back to the event loop, and awaiting I/O tasks would get a chance to execute, and that you'd even be able to run multiple fib calculations concurrently. This however doesn't seem to be the case.
Here's the code:
import asyncio
#asyncio.coroutine
def fib(n):
if n < 1:
return 1
a = yield from fib(n-1)
b = yield from fib(n-2)
return a + b
#asyncio.coroutine
def fib_handler(reader, writer):
print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
while True:
req = yield from reader.readline()
if not req:
break
print(req)
n = int(req)
result = yield from fib(n)
writer.write('{}\n'.format(result).encode('ascii'))
yield from writer.drain()
writer.close()
print("Closed")
def server(address):
loop = asyncio.get_event_loop()
fib_server = asyncio.start_server(fib_handler, *address, loop=loop)
fib_server = loop.run_until_complete(fib_server)
try:
loop.run_forever()
except KeyboardInterrupt:
print('closing...')
fib_server.close()
loop.run_until_complete(fib_server.wait_closed())
loop.close()
server(('', 25000))
This server runs perfectly well if you netcat to port 25000 and start entering in numbers. However if you start a long running calculation (say 35), no other calculations will run until the first completes. In fact, additional connections won't even be processed.
I know that the event loop is feeding back the yields from recursive fib calls, so control has to be falling all the way down. But I thought that the loop would process the other calls in the I/O queues (such as spawning a second fib_handler) before "trampolining" back to the fib function.
I'm sure I must be misunderstanding something or that there is some kind of bug I'm overlooking but I can't for the life of me find it.
Any insight you can provide will be much appreciated.
The first issue is that you're calling yield from fib(n) inside of fib_handler. Including yield from means that fib_handler will block until the call to fib(n) is complete, which means it can't handle any input you provide while fib is running. You would have this problem even if all you did was I/O inside of fib. To fix this, you should use asyncio.async(fib(n)) (or preferably, asyncio.ensure_future(fib(n)), if you have a new enough version of Python) to schedule fib with the event loop, without actually blocking fib_handler. From there, you can use Future.add_done_callback to write the result to the client when it's ready:
import asyncio
from functools import partial
from concurrent.futures import ProcessPoolExecutor
#asyncio.coroutine
def fib(n):
if n < 1:
return 1
a = yield from fib(n-1)
b = yield from fib(n-2)
return a + b
def do_it(writer, result):
writer.write('{}\n'.format(result.result()).encode('ascii'))
asyncio.async(writer.drain())
#asyncio.coroutine
def fib_handler(reader, writer):
print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
executor = ProcessPoolExecutor(4)
loop = asyncio.get_event_loop()
while True:
req = yield from reader.readline()
if not req:
break
print(req)
n = int(req)
result = asyncio.async(fib(n))
# Write the result to the client when fib(n) is done.
result.add_done_callback(partial(do_it, writer))
writer.close()
print("Closed")
That said, this change alone still won't completely fix the problem; while it will allow multiple clients to connect and issue commands concurrently, a single client will still get synchronous behavior. This happens because when you call yield from coro() directly on a coroutine function, control isn't given back to the event loop until coro() (or another coroutine called by coro) actually executes some non-blocking I/O. Otherwise, Python will just execute coro without yielding control. This is a useful performance optimization, since giving control to the event loop when your coroutine isn't actually going to do blocking I/O is a waste of time, especially given Python's high function call overhead.
In your case, fib never does any I/O, so once you call yield from fib(n-1) inside of fib itself, the event loop never gets to run again until its done recursing, which will block fib_handler from reading any subsequent input from the client until the call to fib is done. Wrapping all your calls to fib in asyncio.async guarantees that control is given to the event loop each time you make a yield from asyncio.async(fib(...)) call. When I made this change, in addition to using asyncio.async(fib(n)) in fib_handler, I was able to process multiple inputs from a single client concurrently. Here's the full example code:
import asyncio
from functools import partial
from concurrent.futures import ProcessPoolExecutor
#asyncio.coroutine
def fib(n):
if n < 1:
return 1
a = yield from fib(n-1)
b = yield from fib(n-2)
return a + b
def do_it(writer, result):
writer.write('{}\n'.format(result.result()).encode('ascii'))
asyncio.async(writer.drain())
#asyncio.coroutine
def fib_handler(reader, writer):
print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
executor = ProcessPoolExecutor(4)
loop = asyncio.get_event_loop()
while True:
req = yield from reader.readline()
if not req:
break
print(req)
n = int(req)
result = asyncio.async(fib(n))
result.add_done_callback(partial(do_it, writer))
writer.close()
print("Closed")
Input/Output on client-side:
dan#dandesk:~$ netcat localhost 25000
35 # This was input
4 # This was input
8 # output
24157817 # output
Now, even though this works, I wouldn't use this implementation, since its doing a bunch of CPU-bound work in a single-threaded program that also wants to serve I/O in that same thread. This isn't going to scale very well, and won't have ideal performance. Instead, I'd recommend using loop.run_in_executor to run the calls to fib in a background process, which allows the asyncio thread to run at full capacity, and also allows us to scale the calls to fib across multiple cores:
import asyncio
from functools import partial
from concurrent.futures import ProcessPoolExecutor
def fib(n):
if n < 1:
return 1
a = fib(n-1)
b = fib(n-2)
return a + b
def do_it(writer, result):
writer.write('{}\n'.format(result.result()).encode('ascii'))
asyncio.async(writer.drain())
#asyncio.coroutine
def fib_handler(reader, writer):
print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
executor = ProcessPoolExecutor(8) # 8 Processes in the pool
loop = asyncio.get_event_loop()
while True:
req = yield from reader.readline()
if not req:
break
print(req)
n = int(req)
result = loop.run_in_executor(executor, fib, n)
result.add_done_callback(partial(do_it, writer))
writer.close()
print("Closed")

Categories