We're getting started with Django Channels and are struggling with the following use case:
Our app receives multiple requests from a single client (another server) in a short time. Creating each response takes a long time. The order in which responses are sent to the client doesn't matter.
We want to keep a single WebSocket connection open to avoid the connection overhead of exchanging many requests and responses with the same client.
Django Channels seems to process messages on the same WebSocket connection strictly in order, and won't start processing the next frame before the previous one has been responded to.
Consider the following example:
Example
Server-side
import asyncio
from channels.generic.websocket import AsyncWebsocketConsumer

class QuestionConsumer(AsyncWebsocketConsumer):
    async def websocket_connect(self, event):
        await self.accept()

    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        return {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")

    async def receive(self, text_data=None, bytes_data=None):
        # while awaiting below, we should start processing the next WS frame
        answer = await self.complicated_answer(text_data)
        await self.send(answer)
asgi.py:
from django.urls import re_path
from channels.routing import ProtocolTypeRouter, URLRouter

from myproject.consumers import QuestionConsumer  # adjust to wherever the consumer is defined

application = ProtocolTypeRouter(
    {
        "websocket": URLRouter([
            re_path(r"^questions", QuestionConsumer.as_asgi(), name="questions"),
        ])
    }
)
Client-side
import asyncio
import websockets
from time import time

async def main():
    async with websockets.connect("ws://0.0.0.0:8000/questions") as ws:
        tasks = []
        for m in [
            "What is the Answer to Life, The Universe and Everything?",
            "Why?",
        ]:
            tasks.append(ws.send(m))
        # send all requests (without waiting for responses)
        time_before = time()
        await asyncio.gather(*tasks)
        # wait for responses
        for _ in tasks:
            print(await ws.recv())
            print("{:.1f} seconds since first request".format(time() - time_before))

asyncio.get_event_loop().run_until_complete(main())
Result
Actual
42
3.0 seconds since first request
Because.
6.0 seconds since first request
Desired
42
3.0 seconds since first request
Because.
3.0 seconds since first request
In other words, we would like the event loop to switch between async tasks not only for multiple consumers, but also for all tasks handled by the same consumer. Is this possible or is there a workaround we are overlooking? Have you used Django Channels for similar challenges and how did you solve them?
The consumer's receive method is called sequentially for each incoming WebSocket message. When the await in the first receive call is reached, receive has not yet been invoked for the second message, so there is no second coroutine to switch to yet. I couldn't find a source for this, but I'm guessing it is part of the ASGI protocol itself. For many use cases, handling WebSocket messages strictly in the order they are received is probably desired.
The way to handle messages asynchronously is to not send the response from the receive method, but instead to send it from a coroutine scheduled through loop.create_task.
Scheduling the long-running coroutine that generates the response allows receive to complete, so the next receive can begin. Once the second message's response generation has been scheduled, two coroutines are pending, and the event loop can switch between them and run them concurrently.
For the example in the question, this is the solution I found:
class QuestionConsumer(AsyncWebsocketConsumer):
    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        answer = {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")
        # instead of returning the answer, send it directly to the client as a response
        await self.send(answer)

    async def receive(self, text_data=None, bytes_data=None):
        # instead of awaiting, schedule the coroutine and return immediately
        loop = asyncio.get_running_loop()
        loop.create_task(
            self.complicated_answer(text_data)
        )
The output of this altered consumer matches the desired output given by the question. Note that responses may be returned out of order, and clients are responsible for matching requests to responses.
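One common way to make that matching possible is to have the client tag each request with an ID that the server echoes back in its response. A minimal sketch of this pattern (the JSON field names id, question, and answer are purely illustrative, not a Channels convention):

import asyncio
import json

from channels.generic.websocket import AsyncWebsocketConsumer

class TaggedQuestionConsumer(AsyncWebsocketConsumer):
    async def complicated_answer(self, request_id, question):
        await asyncio.sleep(3)
        answer = {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")
        # echo the request id so the client can pair this response with
        # the original request, whatever order it arrives in
        await self.send(text_data=json.dumps({"id": request_id, "answer": answer}))

    async def receive(self, text_data=None, bytes_data=None):
        message = json.loads(text_data)  # expects {"id": ..., "question": ...}
        asyncio.get_running_loop().create_task(
            self.complicated_answer(message["id"], message["question"])
        )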
Note that for Python versions <3.7, get_event_loop should be used instead of get_running_loop.
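If the consumer has to support both interpreter versions, a small compatibility helper can hide the difference (a sketch, not a Channels or asyncio API):

import asyncio

def current_loop():
    # asyncio.get_running_loop was added in Python 3.7; older versions
    # only have get_event_loop, so fall back to it there.
    try:
        return asyncio.get_running_loop()
    except AttributeError:
        return asyncio.get_event_loop()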
Related
I found this library for asynchronously consuming kafka messages: https://github.com/aio-libs/aiokafka
It gives this code example:
from aiokafka import AIOKafkaConsumer
import asyncio

async def consume():
    consumer = AIOKafkaConsumer(
        'redacted',
        bootstrap_servers='redacted',
        auto_offset_reset="earliest",
        # group_id="my-group"
    )
    # Get cluster layout and join group `my-group`
    await consumer.start()
    try:
        # Consume messages
        async for msg in consumer:
            print("consumed: ", msg.topic, msg.partition, msg.offset,
                  msg.key, msg.value, msg.timestamp)
    finally:
        # Will leave consumer group; perform autocommit if enabled.
        await consumer.stop()

asyncio.run(consume())
I would like to find the biggest Kafka message using this code. So, inside the async for loop I need to do max_size = max(max_size, len(msg.value)). But I suspect it won't be thread-safe, and that I need to lock access to it?
try:
    max_size = -1
    # Consume messages
    async for msg in consumer:
        max_size = max(max_size, len(msg.value))  # do I need to lock this code?
finally:
    await consumer.stop()
How do I do this in Python? I've checked out this page: https://docs.python.org/3/library/asyncio-sync.html and I'm confused, because those synchronization primitives are not thread-safe. Does that mean I can't use them in a multithreaded context? I'm really confused. I come from a Java background and need to write this script, so pardon me for not having read all the asyncio books out there.
Is my understanding correct that the body of the async for loop is a continuation that may be scheduled on a separate thread when the asynchronous operation is done?
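One way to check that empirically (a self-contained sketch, independent of aiokafka) is to print the thread identity inside an async for body:

import asyncio
import threading

async def numbers():
    # stand-in for a real async iterator such as an AIOKafkaConsumer
    for i in range(3):
        await asyncio.sleep(0)
        yield i

async def main():
    print("event loop thread:", threading.get_ident())
    async for n in numbers():
        # the body resumes on the event loop's single thread every time,
        # so a plain `max_size = max(...)` update needs no lock here
        print("loop body thread: ", threading.get_ident(), "value:", n)

asyncio.run(main())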
Please consider the following. There's a system that asks for data using HTTP POST requests. Right after sending such a request, the system waits for an HTTP response with a status code and for the data as a separate message. The existing system is built in a way that it won't accept a response with the status code and data combined, which, to be honest, doesn't make sense to me. On my side, I need to implement a system which will receive such requests and provide data to clients.
I decided to use the AIOHTTP library to solve this problem. As I'm very new to AIOHTTP, I can't find a way to send data back to the client right after returning a response. The existing system which sends requests also has an endpoint on its side. So what I'm thinking of doing is returning a response with a status code to the client and then, acting as a client myself, sending a POST request to the provided endpoint. My system would thus work both as a client and as a server. What I do not understand is how to implement this using AIOHTTP.
Let's say I have the following endpoint on my side with a handler. Please, consider this to only be pseudocode.
from aiohttp import web

async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Let's say I have a session in this block, and I'm sending data back to the client.
    await session.post(SERVER_ENDPOINT, data=result)
    return web.Response(status=200)  # Returning a status without data.
Now, I need web.Response(status=200) to be sent as soon as possible, and only then should the received data be processed and sent back to the client. What I thought of doing is wrapping the data processing and request sending in a task and adding it to a queue. But I always need the response to be sent first, and I'm afraid that with tasks this might not always hold, or might it? Could the task complete before the response is returned? Is AIOHTTP a good fit for this? Should I consider something else?
Update #1
I've found a method called finish_response. Might it be used to implement something like this?
async def handler(self, request):
    self.finish_response(web.Response(status=200))  # Just an example.
    self.session.post(SERVER_ENDPOINT, data=my_data)
    return True  # or something
aiohttp has a sibling project called aiojobs, which is used to handle background tasks. There is an example of how aiojobs integrates with aiohttp in their documentation.
So, modifying your example to work with aiojobs:
import aiojobs.aiohttp

async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    # We must set up aiojobs on the aiohttp app
    aiojobs.aiohttp.setup(app)
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Here we create the background task
    await aiojobs.aiohttp.spawn(request, session.post(SERVER_ENDPOINT, data=result))
    # The response returns as soon as the task is created - it does not
    # wait for the task to finish.
    return web.Response(status=200)  # Returning a status without data.
If you want the await process(data) call to also be scheduled as part of the task, you can move both calls into a separate function and schedule them together:
async def push_to_server(data):
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    await aiojobs.aiohttp.spawn(request, push_to_server(data))
    return web.Response(status=200)
If you want to make sure the response is sent before the push_to_server coroutine is called, then you can make use of asyncio events:
import asyncio

async def push_to_server(data, start):
    await start.wait()
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    start = asyncio.Event()
    await aiojobs.aiohttp.spawn(request, push_to_server(data, start))
    response = web.Response(status=200)
    await response.prepare(request)
    await response.write_eof()
    start.set()
    return response
Here, await response.prepare(request) and await response.write_eof() are just a long-winded way of sending the response, but they allow us to call start.set() afterwards, which triggers the push_to_server functionality that is waiting on that event (await start.wait()).
Using Python 3.7.4 and the asyncio package I'm trying to write an application that should spawn around 20000 (20k or more) TCP clients which then connect to a single server.
The clients then wait for a command from the server (received_data = await reader.read(4096)) and proceed to executing it (await loop.run_in_executor(...)) then send the response back to the server (writer.write(resp)).
After this cycle is completed, I sleep 100ms (await asyncio.sleep(100e-3)) in order to allow other coroutines to run.
The 20k clients should never disconnect and should process commands from the server indefinitely.
I'm interested in ways I can change the code to optimize it beyond what it is capable of now (barring the use of uvloop or directly implementing a Protocol, since I saw in uvloop's docs that this could improve performance).
Let's assume that I cannot modify handle_request.
For example, the await asyncio.sleep(100e-3) especially bothers me, but I had to add it: otherwise it seemed that no coroutines other than the first one ever ran! Why could that be?
Say I remove the sleep (since in theory the other awaits should allow other coroutines to run), what else could I do?
Below is a minimal example of what my application looks like:
import asyncio
from collections import namedtuple
import functools
import logging
import os
import signal
import sys

logger = logging.getLogger(__name__)

should_exit = asyncio.Event()

def exit(signame, loop):
    should_exit.set()
    logger.warning('Exiting soon...')

def handle_request(received_data, entity):
    logger.info('Backend logic here that consumes a bit of time depending on the entity and the received_data')
    return True, received_data  # placeholder result so the example runs end-to-end

async def run_entity(entity, args):
    logger.info(f'Running entity {entity}')
    loop = asyncio.get_running_loop()
    try:
        reader, writer = await asyncio.open_connection(args.addr[0], int(args.addr[1]))
        logger.debug(f'{entity} connected to {args.addr[0]}:{args.addr[1]}')
        try:
            while not should_exit.is_set():
                received_data = await reader.read(4096)
                if received_data:
                    logger.debug(f'{entity} received data {received_data}')
                    success, resp = await loop.run_in_executor(None, functools.partial(handle_request, received_data, entity))
                    if success:
                        logger.debug(f'{entity} sending response {resp}')
                        writer.write(resp)
                        await writer.drain()
                await asyncio.sleep(100e-3)
        except ConnectionResetError:
            pass
    except ConnectionRefusedError:
        logger.warning(f'Connection refused by {args.addr[0]}:{args.addr[1]}.')
    except Exception:
        logger.exception('Details of unexpected error:')
    logger.info(f'Stopped entity {entity}')

async def main(entities, args):
    if os.name == 'posix':
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(signal.SIGTERM, functools.partial(exit, signal.SIGTERM, loop))
        loop.add_signal_handler(signal.SIGINT, functools.partial(exit, signal.SIGINT, loop))
    tasks = (run_entity(entity, args) for entity in entities)
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    ArgsReplacement = namedtuple('ArgsReplacement', ['addr'])
    asyncio.run(main(range(20000), ArgsReplacement(addr=['127.0.0.1', '4242'])))
When all coroutines are waiting, asyncio listens for events to wake them up again. A common example would be asyncio.sleep(), which registers a timed event. In practice an event is usually an IO socket ready for receiving or sending new data.
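For instance, two sleeps awaited together finish in about one second, not two, because both timers are registered with the loop and elapse in parallel (a self-contained illustration, separate from the test below):

import asyncio
from time import perf_counter

async def main():
    start = perf_counter()
    # both 1-second timers are registered with the event loop up front,
    # so they elapse concurrently rather than back to back
    await asyncio.gather(asyncio.sleep(1), asyncio.sleep(1))
    print(f"took {perf_counter() - start:.2f} seconds")  # ~1.00

asyncio.run(main())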
To get a better understanding of this behaviour, I set up a simple test: it sends an HTTP request to localhost and waits for the response. On localhost, I've set up a Flask server which waits for 1 second before responding. After sending the request, the client sleeps for 1 second, then it awaits the response. I would expect this to return in roughly one second, since my program and the server should sleep in parallel. But it takes 2 seconds:
import aiohttp
import asyncio
from time import perf_counter

async def main():
    async with aiohttp.ClientSession() as session:
        # this http request will take 1 second to respond
        async with session.get("http://127.0.0.1:5000/") as response:
            # yield control for 1 second
            await asyncio.sleep(1)
            # wait for the http request to return
            text = await response.text()
            return text

loop = asyncio.get_event_loop()
start = perf_counter()
results = loop.run_until_complete(main())
stop = perf_counter()
print(f"took {stop-start} seconds")  # 2.01909
What is asyncio doing here? Why can't I overlap the waiting times?
I'm not interested in the specific scenario of HTTP requests, aiohttp is only used to construct an example. Which is probably a bit dangerous: This could be related to aiohttp and not asyncio at all.
Actually, I expect this to be the case (hence the question title about both asyncio and aiohttp).
My first intuition was that the request is maybe not sent before calling asyncio.sleep(). So I reordered things a bit:
# start coroutine
text = response.text()
# yield control for 1 second
await asyncio.sleep(1)
# wait for the http request to return
text = await text
But this still takes two seconds.
OK, now to be really sure that the request was sent off before going to sleep, I added print("incoming") to the route on the server, before it goes to sleep. I also increased the client's sleeping time to 10 seconds. The server prints incoming immediately after the client is run. The client takes 11 seconds in total.
@app.route('/')
def index():
    print("incoming")
    time.sleep(1)
    return 'done'
Since the HTTP request is made immediately, the server has definitely sent off an answer before the client wakes up from asyncio.sleep(). It seems to me that the socket carrying the HTTP response should be ready as soon as the client wakes up. But still, the total runtime is always the sum of the client's and the server's waiting times.
Am I misusing asyncio somehow, or is this related to aiohttp after all?
The problem is that the server's one second of waiting elapses inside async with session.get("http://127.0.0.1:5000/") as response:.
The HTTP request already finishes before you get this response object.
You can test it by:
...
async def main():
    async with aiohttp.ClientSession() as session:
        start = perf_counter()
        # this http request will take 1 second to respond
        async with session.get("http://127.0.0.1:5000/") as response:
            end = perf_counter()
            print(f"took {end-start} seconds to get response")
            # yield control for 1 second
            await asyncio.sleep(1)
            # wait for the http request to return
            text = await response.text()
            return text
...
And by the way, you certainly can overlap this waiting time, as long as another coroutine is running concurrently.
Your testing code has three awaits (two explicit and one hidden in async with) in series, so you don't get any parallel waiting. The code that tests the scenario you describe is something along the lines of:
async def download():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://127.0.0.1:5000/") as response:
            text = await response.text()
            return text

async def main():
    loop = asyncio.get_event_loop()
    # have download start "in the background"
    dltask = loop.create_task(download())
    # now sleep
    await asyncio.sleep(1)
    # and now await the end of the download
    text = await dltask
Running this coroutine should take the expected time.
I am working on a bot that streams posts from the Steem blockchain (using the synchronous beem library) and sends posts that fulfil certain criteria to a Discord channel (using the asynchronous discord.py library). This is my (simplified) code:
import asyncio
import logging

from discord.ext import commands

# blockchain (a beem Blockchain instance) and TOKEN are set up elsewhere
logger = logging.getLogger(__name__)
bot = commands.Bot(command_prefix="!")
loop = asyncio.get_event_loop()

async def send_discord(msg):
    await bot.wait_until_ready()
    await bot.send_message(bot.get_channel("mychannelid"), msg)

async def scan_post(post):
    """Scan queued Comment objects for defined patterns"""
    post.refresh()
    if post["author"] == "myusername":
        await loop.create_task(send_discord("New post found"))

async def start_blockchain():
    stream = blockchain.stream(opNames=["comment"])
    for post in stream:
        await loop.create_task(scan_post(post))

if __name__ == '__main__':
    while True:
        loop.create_task(start_blockchain())
        try:
            loop.run_until_complete(bot.start(TOKEN))
        except Exception as error:
            bot.logout()
            logger.warning("Bot restarting " + repr(error))
Before I implemented discord.py I would just call the synchronous function scan_post(post) and it worked just fine, but now with the asynchronous implementation the posts are not processed fast enough and the stream has a rapidly increasing delay. If I make scan_post(post) a synchronous function, the processing time is fine, but the Discord websocket closes (or does not even open) and the bot goes offline. How can I solve this in a simple way (without rewriting the beem library)?
I solved the problem: I run the beem stream in its own thread and the asynchronous functions in a second thread. With the janus library I can then add objects from the beem thread to a queue that is processed by the asynchronous thread.
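For reference, a minimal sketch of that pattern (assuming the blockchain.stream iterator and the scan_post coroutine from the question; this is an outline, not the actual bot code):

import asyncio
import threading

import janus  # third-party: pip install janus

def beem_worker(sync_q):
    # runs in a plain thread; the blocking beem stream lives here
    for post in blockchain.stream(opNames=["comment"]):
        sync_q.put(post)

async def discord_worker(async_q):
    # runs on the event loop, next to the discord.py tasks
    while True:
        post = await async_q.get()
        await scan_post(post)
        async_q.task_done()

async def main():
    queue = janus.Queue()  # must be created while the event loop is running
    threading.Thread(target=beem_worker, args=(queue.sync_q,), daemon=True).start()
    await discord_worker(queue.async_q)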