When all coroutines are waiting, asyncio listens for events that will wake them up again. A common example is asyncio.sleep(), which registers a timer event. In practice, an event is usually an IO socket becoming ready to receive or send data.
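As a minimal illustration of this scheduling (a sketch, independent of the question's setup): two coroutines that each sleep for one second finish in about one second total when run concurrently, because both timer events tick while the loop waits.

import asyncio
from time import perf_counter

async def nap():
    # each call registers a one-second timer event and yields to the loop
    await asyncio.sleep(1)

async def demo():
    start = perf_counter()
    # both timers run while the event loop waits, so this takes ~1s, not 2s
    await asyncio.gather(nap(), nap())
    print(f"took {perf_counter() - start:.2f} seconds")

asyncio.run(demo())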
To get a better understanding of this behaviour, I set up a simple test: it sends an HTTP request to localhost and waits for the response. On localhost, I've set up a Flask server which waits for 1 second before responding. After sending the request, the client sleeps for 1 second, then it awaits the response. I would expect this to return in roughly one second, since my program and the server should sleep in parallel. But it takes 2 seconds:
import aiohttp
import asyncio
from time import perf_counter

async def main():
    async with aiohttp.ClientSession() as session:
        # this http request will take 1 second to respond
        async with session.get("http://127.0.0.1:5000/") as response:
            # yield control for 1 second
            await asyncio.sleep(1)
            # wait for the http request to return
            text = await response.text()
            return text
loop = asyncio.get_event_loop()
start = perf_counter()
results = loop.run_until_complete(main())
stop = perf_counter()
print(f"took {stop-start} seconds") # 2.01909
What is asyncio doing here? Why can't I overlap the waiting times?
I'm not interested in the specific scenario of HTTP requests; aiohttp is only used to construct an example. Which is probably a bit dangerous: this could be related to aiohttp and not to asyncio at all.
Actually, I expect that to be the case (hence the question title mentioning both asyncio and aiohttp).
My first intuition was that maybe the request isn't actually sent before asyncio.sleep() is called. So I reordered things a bit:
# start coroutine
text = response.text()
# yield control for 1 second
await asyncio.sleep(1)
# wait for the http request to return
text = await text
But this still takes two seconds.
Ok, now to be really sure that the request was sent off before going to sleep, I added print("incoming") to the route on the server, before it goes to sleep. I also changed the length of sleeping time to 10 seconds on the client. The server prints incoming immediately after the client is run. The client takes 11 seconds in total.
import time
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    print("incoming")
    time.sleep(1)
    return 'done'
Since the HTTP request is made immediately, the server has definitely sent off an answer before the client wakes up from asyncio.sleep(). It seems to me that the socket carrying the HTTP response should be ready as soon as the client wakes up. But still, the total runtime is always the sum of the client and server waiting times.
Am I misusing asyncio somehow, or is this related to aiohttp after all?
The problem is that the server's one second of waiting is spent inside async with session.get("http://127.0.0.1:5000/") as response:.
The HTTP request has already finished by the time you get this response object.
You can test this yourself:
...
async def main():
    async with aiohttp.ClientSession() as session:
        start = perf_counter()
        # this http request will take 1 second to respond
        async with session.get("http://127.0.0.1:5000/") as response:
            end = perf_counter()
            print(f"took {end-start} seconds to get response")
            # yield control for 1 second
            await asyncio.sleep(1)
            # wait for the http request to return
            text = await response.text()
            return text
...
And by the way, you can certainly overlap this waiting time, as long as you have another coroutine running.
Your testing code has three awaits (two explicit and one hidden in async with) in series, so you don't get any parallel waiting. The code that tests the scenario you describe is something along the lines of:
async def download():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://127.0.0.1:5000/") as response:
            text = await response.text()
            return text

async def main():
    loop = asyncio.get_event_loop()
    # have download start "in the background"
    dltask = loop.create_task(download())
    # now sleep
    await asyncio.sleep(1)
    # and now await the end of the download
    text = await dltask
Running this coroutine should take roughly one second, as expected.
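For example, driving it the same way as the original test (a sketch reusing the question's timing code) should confirm that the sleep and the download overlap:

loop = asyncio.get_event_loop()
start = perf_counter()
loop.run_until_complete(main())
stop = perf_counter()
print(f"took {stop-start} seconds")  # ~1s: the sleep and the download overlap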
Related
We're getting started with Django Channels and are struggling with the following use case:
Our app receives multiple requests from a single client (another server) in a short time. Creating each response takes a long time. The order in which responses are sent to the client doesn't matter.
We want to keep an open WebSocket connection to reduce connection overhead for sending many requests and responses from and to the same client.
Django Channels seems to process messages on the same WebSocket connection strictly in order, and won't start processing the next frame before the previous one has been responded to.
Consider the following example:
Example
Server-side
import asyncio
from channels.generic.websocket import AsyncWebsocketConsumer

class QuestionConsumer(AsyncWebsocketConsumer):
    async def websocket_connect(self, event):
        await self.accept()

    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        return {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")

    async def receive(self, text_data=None, bytes_data=None):
        # while awaiting below, we should start processing the next WS frame
        answer = await self.complicated_answer(text_data)
        await self.send(answer)
asgi.py:
from django.urls import re_path
from channels.routing import ProtocolTypeRouter, URLRouter

application = ProtocolTypeRouter(
    {"websocket": URLRouter([
        re_path(r"^questions", QuestionConsumer.as_asgi(), name="questions"),
    ])}
)
Client-side
import asyncio
import websockets
from time import time

async def main():
    async with websockets.connect("ws://0.0.0.0:8000/questions") as ws:
        tasks = []
        for m in [
            "What is the Answer to Life, The Universe and Everything?",
            "Why?",
        ]:
            tasks.append(ws.send(m))
        # send all requests (without waiting for responses)
        time_before = time()
        await asyncio.gather(*tasks)
        # wait for responses
        for t in tasks:
            print(await ws.recv())
            print("{:.1f} seconds since first request".format(time() - time_before))

asyncio.get_event_loop().run_until_complete(main())
Result
Actual
42
3.0 seconds since first request
Because.
6.0 seconds since first request
Desired
42
3.0 seconds since first request
Because.
3.0 seconds since first request
In other words, we would like the event loop to switch between async tasks not only for multiple consumers, but also for all tasks handled by the same consumer. Is this possible or is there a workaround we are overlooking? Have you used Django Channels for similar challenges and how did you solve them?
The consumer's receive method is called sequentially for each incoming WebSocket message, and when the await inside the first receive is reached, receive has not yet been called for the second message, so there is no second coroutine to switch to. I couldn't find a source for this, but I'm guessing it is part of the ASGI protocol itself. For many use cases, handling WebSocket messages strictly in the order they are received is probably the desired behaviour.
The solution for handling messages asynchronously is not to send the response from the receive method, but to send it from a coroutine scheduled through loop.create_task.
Scheduling the long-running coroutine that generates the response allows receive to complete, and the next receive to begin. Once the second message's response generation has been scheduled, two coroutines have been scheduled, and the event loop can switch between them.
For the example in the question, this is the solution I found:
class QuestionConsumer(AsyncWebsocketConsumer):
    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        answer = {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")
        # instead of returning the answer, send it directly to client as a response
        await self.send(answer)

    async def receive(self, text_data=None, bytes_data=None):
        # instead of awaiting, schedule the coroutine
        loop = asyncio.get_running_loop()
        loop.create_task(
            self.complicated_answer(text_data)
        )
The output of this altered consumer matches the desired output given in the question. Note that responses may be returned out of order, and clients are responsible for matching requests to responses.
Note that for Python versions <3.7, get_event_loop should be used instead of get_running_loop.
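One way to let clients do that matching (a sketch, not part of the original answer: the envelope format and the answer_with_id helper are made up, and it assumes a complicated_answer that returns the answer, as in the question's original consumer) is to tag each request with an id that the consumer echoes back:

import json

class QuestionConsumer(AsyncWebsocketConsumer):
    # complicated_answer as in the question's original consumer, returning the answer

    async def answer_with_id(self, request_id, question):
        answer = await self.complicated_answer(question)
        # echo the id so the client can pair this response with its request
        await self.send(json.dumps({"id": request_id, "answer": answer}))

    async def receive(self, text_data=None, bytes_data=None):
        # hypothetical envelope: {"id": ..., "question": ...}
        message = json.loads(text_data)
        loop = asyncio.get_running_loop()
        loop.create_task(self.answer_with_id(message["id"], message["question"]))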
I am running two loops asynchronously and want both to have access to the same websocket connection. One function, periodic_fetch(), fetches some data periodically (every 60 seconds) and sends a message to the websocket if a condition is met. The other, retrieve_websocket(), receives messages from the websocket and performs some action if a condition is met. As of now, I connect to the websocket in both functions, but that means retrieve_websocket() will not receive the response to the websocket message sent by periodic_fetch(). How do I create one websocket connection and use the same one in both loops as they run asynchronously? My code:
# Imports
import asyncio
import websockets
from datetime import datetime

websocket_url = "wss://localhost:5000/"

# Simulate fetching some data
async def fetch_data():
    print("Fetching started")
    await asyncio.sleep(2)
    return {"data": 2}

# Receive and analyze websocket data
async def retrieve_websocket():
    async with websockets.connect(websocket_url) as ws:
        while True:
            msg = await ws.recv()
            print(msg)
            # Perform some task if condition is met

# Periodically fetch data and send messages to websocket
async def periodic_fetch():
    async with websockets.connect(websocket_url) as ws:
        while True:
            print(datetime.now())
            fetch_task = asyncio.create_task(fetch_data())
            wait_task = asyncio.create_task(asyncio.sleep(60))
            res = await fetch_task
            # Send message to websocket
            await ws.send("Websocket message")
            # Wait the remaining wait duration
            await wait_task

loop = asyncio.get_event_loop()
cors = asyncio.wait([periodic_fetch(), retrieve_websocket()])
loop.run_until_complete(cors)
The solution was to open the connection in a separate function and use asyncio.gather(), passing the websocket into both functions as a parameter.
async def run_script():
    async with websockets.connect(websocket_url) as ws:
        await asyncio.gather(periodic_fetch(ws), retrieve_websocket(ws))

asyncio.run(run_script())
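For completeness, the two coroutines then drop their own connect calls and accept the shared connection instead; a minimal sketch of the refactored versions:

# The shared `ws` connection is passed in, so neither coroutine opens its own.
async def retrieve_websocket(ws):
    while True:
        msg = await ws.recv()
        print(msg)
        # Perform some task if condition is met

async def periodic_fetch(ws):
    while True:
        print(datetime.now())
        fetch_task = asyncio.create_task(fetch_data())
        wait_task = asyncio.create_task(asyncio.sleep(60))
        res = await fetch_task
        # Send message to the shared websocket
        await ws.send("Websocket message")
        # Wait out the remainder of the 60-second period
        await wait_task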
Please consider the following. There's a system that asks for data using HTTP POST requests. Right after sending such a request, the system waits for an HTTP response with a status code and for the data as a separate message. The existing system is built in a way that it won't accept a response with the status code and data combined, which, to be honest, doesn't make sense to me. On my side, I need to implement a system which will receive such requests and provide data to clients. I decided to use the AIOHTTP library to solve this problem. As I'm very new to AIOHTTP, I can't find a way to send data back to the client right after returning a response. The existing system which sends the requests also has an endpoint on its side. So what I'm thinking of doing is to return a response with a status code to the client and then, acting as a client myself, send a POST request to the provided endpoint. So my system will work both as a client and as a server. Now, what I do not understand is how to implement this using AIOHTTP.
Let's say I have the following endpoint on my side with a handler. Please consider this to be pseudocode only.
async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Let's say I have a session in this block and I'm sending data back to the client.
    await session.post(SERVER_ENDPOINT, data=result)
    return web.Response(status=200)  # Returning a status without data.
Now, I need web.Response(status=200) to be sent as soon as possible, and only then to process the received data and send it back to the client. What I thought of doing is to wrap the data processing and request sending in a task and add it to a queue. But I always need the response to be sent first, and I'm afraid that with tasks this might not always hold - or might it? Could the task complete before the response is returned? Is AIOHTTP a good fit for this task? Should I consider something else?
Update #1
I've found a method called finish_response. Might it be used to implement something like this?
async def handler(self, request):
    self.finish_response(web.Response(status=200))  # Just an example.
    self.session.post(SERVER_ENDPOINT, data=my_data)
    return True  # or something
aiohttp has a sibling project called aiojobs, which is used to handle background tasks. There is an example of how aiojobs integrates with aiohttp in their documentation.
So, modifying your example to work with aiojobs:
import aiojobs.aiohttp

async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    # We must set up aiojobs on the aiohttp app
    aiojobs.aiohttp.setup(app)
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Here we create the background task
    await aiojobs.aiohttp.spawn(request, session.post(SERVER_ENDPOINT, data=result))
    # The response is returned as soon as the task is created -
    # it does not wait for the task to finish.
    return web.Response(status=200)  # Returning a status without data.
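A note on this design: aiojobs.aiohttp.setup attaches a job scheduler to the application, so spawned jobs are tracked and can be closed when the app shuts down, rather than being orphaned fire-and-forget tasks.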
If you want the await process(data) to also be scheduled as a task, then you can move both calls into a separate function and schedule them together:
async def push_to_server(data):
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    await aiojobs.aiohttp.spawn(request, push_to_server(data))
    return web.Response(status=200)
If you want to make sure the response is sent before the push_to_server coroutine is called, then you can make use of asyncio events:
import asyncio

async def push_to_server(data, start):
    await start.wait()
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    start = asyncio.Event()
    await aiojobs.aiohttp.spawn(request, push_to_server(data, start))
    response = web.Response(status=200)
    await response.prepare(request)
    await response.write_eof()
    start.set()
    return response
Here, await response.prepare(request) and await response.write_eof() are just a long-winded way of sending the response, but they allow us to call start.set() afterwards, which triggers the push_to_server functionality that is waiting on that event (await start.wait()).
I have a FastAPI app that makes two requests, one of which takes longer (if it helps, they're Elasticsearch queries, and I'm using the AsyncElasticsearch module, which already returns coroutines). This is my attempt:
class my_module:
    search_object = AsyncElasticsearch(url, port)

    async def do_things(self):
        resp1 = await search_object.search()  # the longer one
        print(check_resp1)
        resp2 = await search_object.search()  # the shorter one
        print(check_resp2)
        process(resp2)
        process(resp1)
        do_synchronous_things()
        return thing

app = FastAPI()

@app.post("/")
async def service(user_input):
    result = await my_module.do_things()
    return results
What I observed is that, instead of awaiting resp1, by the time it got to check_resp1 it was already a full response, as if I hadn't used async at all.
I'm new to Python async. I knew my code wouldn't work, but I don't know how to fix it. As far as I understand, when the interpreter sees await, it starts the function and then just moves on, which in this case should immediately post the next request. How do I make it do that?
Yes, that's correct: the coroutine won't proceed until the results are ready. You can use asyncio.gather to run tasks concurrently:
import asyncio

async def task(msg):
    print(f"START {msg}")
    await asyncio.sleep(1)
    print(f"END {msg}")
    return msg

async def main():
    await task("1")
    await task("2")
    results = await asyncio.gather(task("3"), task("4"))
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
Test:
$ python test.py
START 1
END 1
START 2
END 2
START 3
START 4
END 3
END 4
['3', '4']
Alternatively, you can use asyncio.as_completed to handle each result as soon as it is ready:
for coro in asyncio.as_completed((task("5"), task("6"))):
    earliest_result = await coro
    print(earliest_result)
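Run inside a coroutine like main() above, this would print something along these lines (the completion order of equally long tasks is not guaranteed):

START 5
START 6
END 5
5
END 6
6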
Update Fri 2 Apr 09:25:33 UTC 2021:
asyncio.run has been available since Python 3.7; in earlier versions you have to create and run the loop manually:
if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Explanation
The reason your code runs synchronously is that in the do_things function, the code is executed as follows:
1. Schedule search_object.search() to execute
2. Wait till search_object.search() is finished and get the result
3. Schedule search_object.search() to execute
4. Wait till search_object.search() is finished and get the result
5. Execute (synchronously) process(resp2)
6. Execute (synchronously) process(resp1)
7. Execute (synchronously) do_synchronous_things()
What you intended is for steps 1 and 3 to be executed before 2 and 4. You can do this easily with the unsync library - here is the documentation.
How you can fix this
from unsync import unsync

class my_module:
    search_object = AsyncElasticsearch(url, port)

    @unsync
    async def search1(self):
        return await self.search_object.search()

    @unsync
    async def search2(self):  # not sure if this is any different to search1
        return await self.search_object.search()

    async def do_things(self):
        task1, task2 = self.search1(), self.search2()  # schedule tasks
        resp1, resp2 = task1.result(), task2.result()  # wait till tasks are executed
        # you might also do a similar trick with the process function
        # to run process(resp2) and process(resp1) concurrently
        process(resp2)
        process(resp1)
        # if this does not rely on resp1 and resp2, it might also be put into a
        # separate task to make the computation quicker; to do that, use the
        # @unsync(cpu_bound=True) decorator
        do_synchronous_things()
        return thing

app = FastAPI()

@app.post("/")
async def service(user_input):
    result = await my_module.do_things()
    return results
More information
If you want to learn more about asyncio and asynchronous programming, I recommend this tutorial. There is also a similar case to the one you presented, with a few possible solutions to make the coroutines run concurrently.
PS. Obviously I could not run this code, so you must debug it on your own.
I am running a web-scraper class whose method name is self.get_with_random_proxy_using_chain.
I am trying to send multithreaded calls to the same URL, and would like the method to return a response and close the other still-active threads as soon as any thread produces a result.
So far my code looks like this (probably naive):
from concurrent.futures import ThreadPoolExecutor, as_completed

# class initiation etc.

max_workers = cpu_count() * 5
urls = [url_to_open] * 50

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_url = []
    for url in urls:  # i had to do a loop to include sleep not to overload the proxy server
        future_to_url.append(executor.submit(self.get_with_random_proxy_using_chain,
                                             url,
                                             timeout,
                                             update_proxy_score,
                                             unwanted_keywords,
                                             unwanted_status_codes,
                                             random_universe_size,
                                             file_path_to_save_streamed_content))
        sleep(0.5)
    for future in as_completed(future_to_url):
        if future.result() is not None:
            return future.result()
But it runs all the threads.
Is there a way to close all the threads once the first future has completed?
I am using Windows and Python 3.7.x.
So far I've found this link, but I don't manage to make it work (the program still runs for a long time).
As far as I know, running futures cannot be cancelled. Quite a lot has been written about this, and there are even some workarounds.
But I would suggest taking a closer look at the asyncio module. It is quite well suited for such tasks.
Below is a simple example in which several concurrent requests are made and, upon receiving the first result, the rest are cancelled.
import asyncio
from typing import Set

from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def wait_for_first_response(tasks):
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for p in pending:
        p.cancel()
    return done.pop().result()

async def request_one_of(*urls):
    tasks = set()
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.create_task(fetch(url, session))
            tasks.add(task)
        return await wait_for_first_response(tasks)

async def main():
    response = await request_one_of("https://wikipedia.org", "https://apple.com")
    print(response)

asyncio.run(main())
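A note on the design: asyncio.wait with return_when=asyncio.FIRST_COMPLETED returns as soon as any task finishes, but it never cancels the remaining tasks itself; the explicit p.cancel() loop in wait_for_first_response is what actually stops the other requests.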