I've been trying to learn a bit about asyncio, and I'm having some unexpected behavior. I've set up a simple fibonacci server that supports multiple connections using streams. The fib calculation is written recursively, so I can simulate long-running calculations by entering a large number. As expected, a long-running calculation blocks I/O until it completes.
Here's the problem though. I rewrote the fibonacci function to be a coroutine. I expected that by yielding from each recursion, control would fall back to the event loop, and awaiting I/O tasks would get a chance to execute, and that you'd even be able to run multiple fib calculations concurrently. This however doesn't seem to be the case.
Here's the code:
import asyncio

@asyncio.coroutine
def fib(n):
    if n < 1:
        return 1
    a = yield from fib(n-1)
    b = yield from fib(n-2)
    return a + b

@asyncio.coroutine
def fib_handler(reader, writer):
    print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
    while True:
        req = yield from reader.readline()
        if not req:
            break
        print(req)
        n = int(req)
        result = yield from fib(n)
        writer.write('{}\n'.format(result).encode('ascii'))
        yield from writer.drain()
    writer.close()
    print("Closed")

def server(address):
    loop = asyncio.get_event_loop()
    fib_server = asyncio.start_server(fib_handler, *address, loop=loop)
    fib_server = loop.run_until_complete(fib_server)
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        print('closing...')
    fib_server.close()
    loop.run_until_complete(fib_server.wait_closed())
    loop.close()

server(('', 25000))
This server runs perfectly well if you netcat to port 25000 and start entering in numbers. However if you start a long running calculation (say 35), no other calculations will run until the first completes. In fact, additional connections won't even be processed.
I know that the event loop is feeding back the yields from recursive fib calls, so control has to be falling all the way down. But I thought that the loop would process the other calls in the I/O queues (such as spawning a second fib_handler) before "trampolining" back to the fib function.
I'm sure I must be misunderstanding something or that there is some kind of bug I'm overlooking but I can't for the life of me find it.
Any insight you can provide will be much appreciated.
The first issue is that you're calling yield from fib(n) inside of fib_handler. Including yield from means that fib_handler will block until the call to fib(n) is complete, which means it can't handle any input you provide while fib is running. You would have this problem even if all you did was I/O inside of fib. To fix this, you should use asyncio.async(fib(n)) (or preferably, asyncio.ensure_future(fib(n)), if you have a new enough version of Python) to schedule fib with the event loop, without actually blocking fib_handler. From there, you can use Future.add_done_callback to write the result to the client when it's ready:
import asyncio
from functools import partial

@asyncio.coroutine
def fib(n):
    if n < 1:
        return 1
    a = yield from fib(n-1)
    b = yield from fib(n-2)
    return a + b

def do_it(writer, result):
    writer.write('{}\n'.format(result.result()).encode('ascii'))
    asyncio.async(writer.drain())

@asyncio.coroutine
def fib_handler(reader, writer):
    print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
    while True:
        req = yield from reader.readline()
        if not req:
            break
        print(req)
        n = int(req)
        result = asyncio.async(fib(n))
        # Write the result to the client when fib(n) is done.
        result.add_done_callback(partial(do_it, writer))
    writer.close()
    print("Closed")
That said, this change alone still won't completely fix the problem; while it will allow multiple clients to connect and issue commands concurrently, a single client will still get synchronous behavior. This happens because when you call yield from coro() directly on a coroutine function, control isn't given back to the event loop until coro() (or another coroutine called by coro) actually executes some non-blocking I/O. Otherwise, Python will just execute coro without yielding control. This is a useful performance optimization, since giving control to the event loop when your coroutine isn't actually going to do blocking I/O is a waste of time, especially given Python's high function call overhead.
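A quick way to see this (a toy sketch of mine, not from the original post, written in the same pre-3.5 coroutine style; note that @asyncio.coroutine was removed in Python 3.11, so this needs an older interpreter):

import asyncio

@asyncio.coroutine
def spin(name, count):
    for i in range(count):
        print(name, i)  # no I/O anywhere, so no chance for a task switch

loop = asyncio.get_event_loop()
# Both coroutines are scheduled as tasks, yet each runs to completion
# in turn: a 0, a 1, a 2, b 0, b 1, b 2. No interleaving happens.
loop.run_until_complete(asyncio.gather(spin('a', 3), spin('b', 3)))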
In your case, fib never does any I/O, so once you call yield from fib(n-1) inside of fib itself, the event loop never gets to run again until it's done recursing, which will block fib_handler from reading any subsequent input from the client until the call to fib is done. Wrapping all your calls to fib in asyncio.async guarantees that control is given to the event loop each time you make a yield from asyncio.async(fib(...)) call. When I made this change, in addition to using asyncio.async(fib(n)) in fib_handler, I was able to process multiple inputs from a single client concurrently. Here's the full example code:
import asyncio
from functools import partial

@asyncio.coroutine
def fib(n):
    if n < 1:
        return 1
    # Wrapping the recursive calls in asyncio.async forces a trip
    # through the event loop on every recursion.
    a = yield from asyncio.async(fib(n-1))
    b = yield from asyncio.async(fib(n-2))
    return a + b

def do_it(writer, result):
    writer.write('{}\n'.format(result.result()).encode('ascii'))
    asyncio.async(writer.drain())

@asyncio.coroutine
def fib_handler(reader, writer):
    print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
    while True:
        req = yield from reader.readline()
        if not req:
            break
        print(req)
        n = int(req)
        result = asyncio.async(fib(n))
        result.add_done_callback(partial(do_it, writer))
    writer.close()
    print("Closed")
Input/Output on client-side:
dan@dandesk:~$ netcat localhost 25000
35 # This was input
4 # This was input
8 # output
24157817 # output
Now, even though this works, I wouldn't use this implementation, since it's doing a bunch of CPU-bound work in a single-threaded program that also wants to serve I/O in that same thread. This isn't going to scale very well, and won't have ideal performance. Instead, I'd recommend using loop.run_in_executor to run the calls to fib in a background process, which allows the asyncio thread to run at full capacity, and also allows us to scale the calls to fib across multiple cores:
import asyncio
from functools import partial
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    if n < 1:
        return 1
    a = fib(n-1)
    b = fib(n-2)
    return a + b

def do_it(writer, result):
    writer.write('{}\n'.format(result.result()).encode('ascii'))
    asyncio.async(writer.drain())

@asyncio.coroutine
def fib_handler(reader, writer):
    print('Connection from : {}'.format(writer.transport.get_extra_info('peername')))
    executor = ProcessPoolExecutor(8)  # 8 processes in the pool
    loop = asyncio.get_event_loop()
    while True:
        req = yield from reader.readline()
        if not req:
            break
        print(req)
        n = int(req)
        # Run the CPU-bound fib in a worker process; returns a Future.
        result = loop.run_in_executor(executor, fib, n)
        result.add_done_callback(partial(do_it, writer))
    writer.close()
    print("Closed")
Related
I need to send HTTP requests and do some CPU intensive task while waiting for the response. I tried to mock the situation with an asyncio.sleep and a CPU task below:
import asyncio

async def main():
    loop = asyncio.get_event_loop()
    start = loop.time()
    task = asyncio.create_task(asyncio.sleep(1))
    # ------Useless CPU-Bound Task------ #
    for n in range(10 ** 7):
        n **= 7
    # ---------------------------------- #
    print(f"CPU-bound process finished in {loop.time()-start:.2f} seconds.")
    await task
    print(f"Finished in {loop.time()-start:.2f} seconds.")

asyncio.run(main())
Output:
CPU-bound process finished in 2.12 seconds.
Finished in 3.12 seconds.
I expected the sleeping task to proceed during the CPU-bound work, but apparently they ran synchronously. This also makes me worry that, with the requests I need to send, the CPU-bound task might begin and completely block them, so that they don't reach the server until the task finishes.
So the question is why does this happen and how to prevent it?
I've also read somewhere that asyncio only switches context upon await calls. Does this have disadvantages in a situation like this, if so, how?
Append: Will using threading have any advantages over asyncio in this scenario? I know it's many questions, but I'm really confused.
Asyncio tasks give you co-operative concurrency rather than true parallelism.
Your sleeper task won't actually start running until you "yield" control to it, which is usually done with an await call. Since that happens after your main (CPU-intensive) code is finished, there will be an extra second after that before everything is actually done.
An await asyncio.sleep(0) between sleeper-task creation and the CPU-intensive work will allow the sleeper task to commence. It will then immediately yield back to the main task and they'll run "concurrently".
Of course, a CPU-bound async task sort of defeats the purpose of asyncio since it won't yield to allow other tasks to run in a timely manner. That doesn't really matter for this sleeper but, if it was a task that had to do thirty things, one per second, that would be a problem.
If you need to do anything like that, it's a good idea to either choose one of the other forty-eight ways of doing concurrency in Python :-), or yield enough in the main task so that other tasks can run. In other words, something like:
yield_cycle = 0.1                              # Cycle time.
then = time.monotonic()                        # Base time.
for n in range(10 ** 7):
    n **= 7
    if time.monotonic() - then > yield_cycle:  # Check cycle time.
        await asyncio.sleep(0)                 # Yield if exceeded.
        then = time.monotonic()                # Prep next cycle.
In fact, we have a helper function in our own code base which does exactly this. I can't give you the actual source code but I think it's (hopefully) simple enough to recite from memory:
async def play_nice(secs: float, base: float) -> float:
    """Yield periodically in intensive task.

    Initial call can use negative base to yield immediately.

    Args:
        secs: Minimum run time before yield will happen.
        base: Base monotonic time to use for calculations.

    Returns:
        New base time to use.
    """
    if base < 0:
        base = time.monotonic() - secs
    if time.monotonic() - base >= secs:
        await asyncio.sleep(0)
        return time.monotonic()
    return base

# Your code is then:

then = await play_nice(secs=0.1, base=-1)          # Initial yield.
for n in range(10 ** 7):
    n **= 7
    then = await play_nice(secs=0.1, base=then)    # Subsequent ones.
The reason is that your CPU-intensive task holds control until it yields it. You can force it to yield using sleep:
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
import asyncio

async def main():
    loop = asyncio.get_event_loop()
    start = loop.time()
    task = asyncio.create_task(asyncio.sleep(1))
    await asyncio.sleep(0)  # yield once so the sleeper task can start
    # ------Useless CPU-Bound Task------ #
    for n in range(10 ** 7):
        n **= 7
    # ---------------------------------- #
    print(f'CPU-bound process finished in {loop.time()-start:.2f} seconds.')
    await task
    print(f"Finished in {loop.time()-start:.2f} seconds.")

asyncio.run(main())
This will output:
CPU-bound process finished in 4.21 seconds.
Finished in 4.21 seconds.
Note that the two times are now equal: the one-second sleep ran concurrently with the CPU-bound loop instead of after it.
I have the following code snippet which I want to transform into asynchronous code (data tends to be a large Iterable):
transformed_data = (do_some_transformation(d) for d in data)
stacked_jsons = "\n\n".join(json.dumps(t, separators=(",", ":")) for t in transformed_data)
I managed to rewrite the do_some_transformation-function to be async so I can do the following:
transformed_data = (await do_some_transformation(d) for d in data)
async_generator = (json.dumps(t, separators=(",", ":")) async for t in transformed_data)
stacked_jsons = ???
What's the best way to incrementally join the jsons produced by the async generator so that the joining process is also asynchronous?
This snippet is part of a larger I/O-bound application which has many asynchronous components and would thus profit from making everything asynchronous.
The point of str.join is to transform an entire list at once.¹ If items arrive incrementally, it can be advantageous to accumulate them one by one.
async def join(by: str, _items: 'AsyncIterable[str]') -> str:
    """Asynchronously joins items with some string"""
    result = ""
    async for item in _items:
        if result and by:  # only add the separator between items
            result += by
        result += item
    return result
The async for loop is sufficient to let the async iterable suspend between items so that other tasks may run. The primary advantage of this approach is that even for very many items, this never stalls the event loop for longer than adding the next item.
This utility can directly digest the async generator:
stacked_jsons = await join("\n\n", (json.dumps(t, separators=(",", ":")) async for t in transformed_data))
When it is known that the data is small enough that str.join runs in adequate time, one can instead convert the data directly to a list and use str.join:
stacked_jsons = "\n\n".join([json.dumps(event, separators=(",", ":")) async for t in transformed_data])
The [... async for ...] construct is an asynchronous list comprehension. This internally works asynchronously to iterate, but produces a regular list once all items are fetched – only this resulting list is passed to str.join and can be processed synchronously.
¹ Even when joining an iterable, str.join will internally turn it into a list first.
More in depth explanation about my comment:
Asyncio is a great tool if your processor has a lot of waiting to do.
For example: when you make request to a db over the network, after the request is sent your cpu just does nothing until it gets an answer.
Using the async/await syntax you can have your processor execute other tasks while "waiting" for the current one to finish. This does not mean they run in parallel: there is only one task running at a time.
In your case (from what I can see) the CPU never waits for anything; it is constantly running string operations.
If you want to run these operations in parallel you might want to take a look at process pools.
A process pool is not bound to a single process and core, but spreads the processing over several cores to run it in parallel.
from concurrent.futures import ProcessPoolExecutor

def main():
    with ProcessPoolExecutor() as executor:
        transformed_data = executor.map(do_some_transformation, data)  # returns an iterable
        stacked_jsons = "\n\n".join(json.dumps(t, separators=(",", ":")) for t in transformed_data)

if __name__ == '__main__':
    main()
I hope the provided code can help you.
P.S. The if __name__ == '__main__' guard is required: process-pool workers may re-import the main module, and the guard stops them from recursively spawning pools.
Edit: I saw your comment about 10k dicts. Assuming you have 8 cores (ignoring multithreading), each process will only transform 1250 dicts, instead of the 10k your main thread does now. These processes run simultaneously, and although the performance increase is not linear, it should process them a lot faster.
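If the individual transformations are cheap, the inter-process traffic can dominate. One option (a sketch of mine, not part of the answer above; do_some_transformation is a stand-in) is map's chunksize parameter, which ships items to the workers in batches:

from concurrent.futures import ProcessPoolExecutor
import json

def do_some_transformation(d):
    return d  # stand-in for the real transformation

if __name__ == '__main__':
    data = [{"n": i} for i in range(10000)]
    with ProcessPoolExecutor() as executor:
        # chunksize=250 sends 250 items per round-trip to each worker,
        # reducing the per-item inter-process communication overhead
        transformed_data = executor.map(do_some_transformation, data, chunksize=250)
        stacked_jsons = "\n\n".join(json.dumps(t, separators=(",", ":")) for t in transformed_data)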
TL;DR: Consider using producer/consumer pattern, if do_some_transformation is IO bound, and you really want an incremental aggregation.
Of course, async itself only brings an advantage if you actually have any other proper async tasks to begin with.
As @MisterMiyagi said, if do_some_transformation is IO-bound and time consuming, firing off all the transformations as a horde of async tasks can be a good idea.
Example code:
import asyncio
import json

data = ({"large": "data"},) * 3  # large
stacked_jsons = ""

async def transform(d: dict, q: asyncio.Queue) -> None:
    # `do_some_transformation`: long IO-bound task
    await asyncio.sleep(1)
    await q.put(d)

# WARNING: incremental concatenation of strings would be slow,
# since strings are immutable.
async def join(q: asyncio.Queue):
    global stacked_jsons
    while True:
        d = await q.get()
        stacked_jsons += json.dumps(d, separators=(",", ":")) + "\n\n"
        q.task_done()

async def main():
    q = asyncio.Queue()
    producers = [asyncio.create_task(transform(d, q)) for d in data]
    consumer = asyncio.create_task(join(q))
    await asyncio.gather(*producers)
    await q.join()  # Implicitly awaits consumers, too
    consumer.cancel()
    print(stacked_jsons)

if __name__ == "__main__":
    import time
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")
This way, the do_some_transformation calls don't block each other. Output:
$ python test.py
{"large":"data"}
{"large":"data"}
{"large":"data"}
test.py executed in 1.00 seconds.
Besides, I don't think incremental concatenation of strings is a good idea, since strings are immutable and a lot of memory would be wasted ;)
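If you do want the incremental consumer without the repeated string copying, a variation (my sketch, not from the original answer) is to append the pieces to a list and join once at the end:

parts = []

async def collect(q: asyncio.Queue) -> None:
    # drop-in replacement for join() above: list.append is O(1) amortized,
    # so nothing is copied until the final "\n\n".join
    while True:
        d = await q.get()
        parts.append(json.dumps(d, separators=(",", ":")))
        q.task_done()

# then in main(), after `await q.join()` and `consumer.cancel()`:
# stacked_jsons = "\n\n".join(parts)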
Reference: Async IO in Python: A Complete Walkthrough - Real Python
I've been trying to get to grips with how I can use concurrent.futures to call a function 3 times every second, without waiting for it to return. I will collect the results after I've made all the calls I need to make.
Here is where I am at the moment, and I'm surprised that sleep() within this example function prevents my code from launching the next chunk of 3 function calls. I'm obviously not understanding the documentation well enough here :)
def print_something(thing):
    print(thing)
    time.sleep(10)

# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    for current_chunk in chunks(alphabet, 3):  # Restrict to calling the function 3 times per second
        with ProcessPoolExecutor(max_workers=3) as executor:
            futures = {executor.submit(print_something, thing): thing for thing in current_chunk}
            chunk_number += 1
            print('chunk %s' % chunk_number)
            time.sleep(1)
    for result in as_completed(futures):
        print(result.result())
This code results in chunks of 3 being printed, with a sleep time of 10 s between each chunk. How can I change this to ensure I'm not waiting for the function to return before calling for the next batch?
Thanks
First, for each iteration of for current_chunk in chunks(alphabet, 3): you are creating a new ProcessPoolExecutor instance and a new futures dictionary, clobbering the previous ones. So the final loop for result in as_completed(futures): would only print the results from the last chunk submitted. Second, and the reason I believe you are waiting so long: the block governed by with ProcessPoolExecutor(max_workers=3) as executor: does not exit until all the tasks submitted to the executor are complete, and that takes at least 10 seconds. So the next iteration of the for current_chunk in chunks(alphabet, 3): loop runs no more frequently than once every 10 seconds.
Note also that the block for result in as_completed(futures): needs to be moved within the with ThreadPoolExecutor(max_workers=26) as executor: block for the same reason. That is, if it is placed after, it will not be executed until all the tasks have completed and so you will not be able to get results "as they complete."
You need to do a bit of rearranging as shown below (I have also modified print_something to return something other than None). There should be no hangs now, as long as you have enough workers (26) to run the 26 tasks being submitted. I doubt your desktop (if you are running this on your PC) has 26 cores to support 26 concurrently executing processes. But note that print_something only prints a short string and then sleeps for 10 seconds, which allows it to relinquish its processor to another process in the pool. So, while with CPU-intensive tasks little is gained by specifying a max_workers value greater than the number of physical processors/cores on your computer, in this case it's OK. When your tasks spend little time executing actual Python byte code, it is more efficient to use threading instead of processes, since the cost of creating threads is much less than the cost of creating processes. However, threading is notoriously poor when the tasks largely consist of Python byte code, since such code cannot be executed concurrently due to serialization by the Global Interpreter Lock (GIL).
Topic for you to research: The Global Interpreter Lock (GIL) and Python byte code execution
Update to use threads:
So we should substitute the ThreadPoolExecutor with 26 or more light-weight threads for the ProcessPoolExecutor. The beauty of the concurrent.futures module is that no other code needs to be changed. But most important is to change the block structure and have a single executor.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def print_something(thing):
    # NOT cpu-intensive, so threads should work well here
    print(thing)
    time.sleep(10)
    return thing  # so there is a non-None result

# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    futures = {}
    with ThreadPoolExecutor(max_workers=26) as executor:
        for current_chunk in chunks(alphabet, 3):  # Restrict to calling the function 3 times per second
            futures.update({executor.submit(print_something, thing): thing for thing in current_chunk})
            chunk_number += 1
            print('chunk %s' % chunk_number)
            time.sleep(1)
        # needs to be within the executor block else it won't run until all futures are complete
        for result in as_completed(futures):
            print(result.result())

if __name__ == '__main__':
    main()
Right now I have some code that does roughly the following
def generator():
    while True:
        value = do_some_lengthy_IO()
        yield value

def model():
    for datapoint in generator():
        do_some_lengthy_computation(datapoint)
Right now, the I/O and the computation happen in serial. Ideally they should run concurrently (with the generator having the next value ready), since they share nothing but the value being passed. I started looking into this and got very confused with the multiprocessing, threading, and async stuff, and could not get a minimal working example going. Also, since some of this seems to be recent features, I am using Python 3.6.
I ended up figuring it out. The simplest way is to use the multiprocessing package and use a pipe to communicate with the child process. I wrote a wrapper that can take any generator
import time
import multiprocessing

def bg(gen):
    def _bg_gen(gen, conn):
        while conn.recv():
            try:
                conn.send(next(gen))
            except StopIteration:
                conn.send(StopIteration)
                return

    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=_bg_gen, args=(gen, child_conn))
    p.start()
    parent_conn.send(True)  # prefetch: the child computes one value ahead
    while True:
        parent_conn.send(True)
        x = parent_conn.recv()
        if x is StopIteration:
            return
        else:
            yield x

def generator(n):
    for i in range(n):
        time.sleep(1)
        yield i

# This takes 2s/iteration
for i in generator(100):
    time.sleep(1)

# This takes 1s/iteration
for i in bg(generator(100)):
    time.sleep(1)
The only thing missing right now is that for infinite generators the child process is never killed, but that can easily be added by doing a parent_conn.send(False), as sketched below.
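Here is a sketch of that cleanup (my addition, not part of the original answer): wrapping the parent loop in try/finally tells the child to stop and reaps it, even if the consumer abandons the generator early.

import multiprocessing

def bg(gen):
    def _bg_gen(gen, conn):
        while conn.recv():  # receiving False ends the loop
            try:
                conn.send(next(gen))
            except StopIteration:
                conn.send(StopIteration)
                return

    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=_bg_gen, args=(gen, child_conn))
    p.start()
    parent_conn.send(True)  # prefetch the first value
    try:
        while True:
            parent_conn.send(True)
            x = parent_conn.recv()
            if x is StopIteration:
                return
            yield x
    finally:
        parent_conn.send(False)  # tell the child to exit its recv loop
        p.join()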
I do not get any acceleration using asyncio. This snippet still runs the same way as a synchronous job. Most of the examples use asyncio.sleep() to impose a delay; my question is, what if part of the code imposes the delay depending on the input parameters?
async def c(n):
    # this loop is supposed to impose delay
    for i in range(1, n * 40000):
        c *= i
    return n

async def f():
    tasks = [c(i) for i in [2, 1, 3]]
    r = []
    completed, pending = await asyncio.wait(tasks)
    for item in completed:
        r.append(item.result())
    return r

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    k = loop.run_until_complete(f())
    loop.close()
I expect to get [1, 2, 3] but I do not (and there is no time difference compared with running the calculations serially).
asyncio is not about getting acceleration, it's about avoiding "callback hell" when programming in an asynchronous environment, such as (but not limited to) non-blocking IO. Since the code in the question is not asynchronous, there is nothing to gain from using asyncio - but you can look into multiprocessing instead.
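For instance (a rough sketch, not part of the original answer), the same kind of CPU-bound calculation can be spread across cores with a process pool:

from concurrent.futures import ProcessPoolExecutor

def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
    return p

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # the three calculations run in separate processes, truly in parallel
        results = list(executor.map(long_calc, [2, 1, 3]))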
In the above case, the function is defined as async, but it runs its entire calculation without awaiting anything. It also contains references to unassigned variables, so let's start with a version that runs:
import math

async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
    print(math.log(p))
    return p
The print at the end immediately indicates when the calculation is done. Starting several such coroutines "in parallel" is done with asyncio.gather:
async def wait_calcs():
    return await asyncio.gather(*[long_calc(i) for i in [2, 1, 3]])
asyncio.gather will let the calculations run and return once all of them are complete, returning a list of their results in the order in which they appear in the argument list. But the output printed when running loop.run_until_complete(wait_calcs()) shows that the calculations are not really running in parallel:
178065.71824964616
82099.71749644238
279264.3442843094
The results correspond to the [2, 1, 3] order. If the coroutines were running in parallel, the smallest number would appear first because its coroutine has by far the least work to do.
We can force the coroutine to give a chance to other coroutines to run by introducing a no-op sleep in the inner loop:
async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
        await asyncio.sleep(0)
    print(math.log(p))
    return p
The output now shows that the coroutines were running in parallel:
82099.71749644238
178065.71824964616
279264.3442843094
Note that this version also takes more time to run because it involves more switching between the coroutines and the main loop. The slowdown can be avoided by only sleeping once in a hundred cycles or so.
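A minimal sketch of that batching idea (my variation on the code above, not from the original answer):

async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
        if i % 100 == 0:
            # yield to the event loop only every 100 iterations, keeping
            # most of the switching overhead out of the hot loop
            await asyncio.sleep(0)
    print(math.log(p))
    return p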