Python Async Functions won't Give up the CPU

I have two async functions that both need to run constantly and one of them just hogs all of the CPU.
The first function handles receiving websocket messages from a client.
async def handle_message(self, ws):
    """Handles a message from the websocket."""
    logger.info('awaiting message')
    while True:
        msg = await ws.receive()
        logger.debug('received message: %s', msg)
        jmsg = json.loads(msg['text'])
        logger.info('received message: {}'.format(jmsg))
        param = jmsg['parameter']
        val = jmsg['data']['value']
        logger.info('setting parameter {} to {}'.format(param, val))
        self.camera.change_parameter(param, val)
The second function grabs images from a camera and sends them to the frontend client. This is the one that won't give the other function any time.
async def send_image(self, ws):
    """Sends an image to the websocket."""
    for im in self.camera:
        await asyncio.sleep(1000)
        h, w = im.shape[:2]
        resized = cv2.resize(im, (w // 4, h // 4))
        await ws.send_bytes(image_to_bytes(resized))
I'm executing these coroutines using asyncio.gather(). The decorator is from FastAPI and Backend() is my class that contains the two async coroutines.
@app.websocket('/ws')
async def websocket_endpoint(websocket: WebSocket):
    """Handle a WebSocket connection."""
    backend = Backend()
    logger.info('Started backend.')
    await websocket.accept()
    try:
        aws = [backend.send_image(websocket), backend.handle_message(websocket)]
        done, pending = await asyncio.gather(*aws)
    except WebSocketDisconnect:
        await websocket.close()
Both of these coroutines work when run separately, but if I try to run them together, send_image() never gives any time to handle_message(), so none of the messages are ever received (or at least that's what I think is going on).
I thought this is what asyncio was trying to solve, but I'm probably using it wrong. I thought about using multiprocessing, but I'm pretty sure FastAPI expects awaitables here. I also read about using the return variables from gather(), but I didn't really understand. Something about canceling the pending tasks and adding them back to the event loop.
Can anyone show me the correct (and preferably modern pythonic) way to make these async coroutines run concurrently?
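A minimal sketch of one possible approach, assuming the camera iterator blocks inside a plain `for` loop: each frame is fetched in a worker thread via `run_in_executor`, so the event loop stays free for the message handler. The names `blocking_frames`, `send_frames`, and `handle_messages` are hypothetical stand-ins, not the asker's classes, and no real camera or websocket is involved.

```python
import asyncio
import time


def blocking_frames(n):
    """Hypothetical stand-in for a camera iterator whose next() blocks."""
    for i in range(n):
        time.sleep(0.05)  # simulates a blocking frame grab
        yield i


async def send_frames(frames, sent):
    """Fetch each frame in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    done = object()  # sentinel that next() returns once the iterator is exhausted
    while True:
        frame = await loop.run_in_executor(None, next, frames, done)
        if frame is done:
            break
        sent.append(frame)  # placeholder for ws.send_bytes(...)


async def handle_messages(inbox, received):
    """Consume messages until a None sentinel arrives."""
    while True:
        msg = await inbox.get()  # placeholder for ws.receive()
        if msg is None:
            break
        received.append(msg)


async def main():
    sent, received = [], []
    inbox = asyncio.Queue()
    for msg in ("gain", "exposure", None):
        inbox.put_nowait(msg)
    # gather runs both coroutines concurrently and returns a list of results.
    await asyncio.gather(
        send_frames(blocking_frames(3), sent),
        handle_messages(inbox, received),
    )
    return sent, received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that `asyncio.gather()` returns a list of results in argument order; the `done, pending` pair belongs to `asyncio.wait()`.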

Related

Python AIOHTTP send a request right after returning a response

Please consider the following. There's a system that asks for data using HTTP POST requests. Right after sending such a request, the system waits for an HTTP response with a status code, and for the data as a separate message. The existing system is built in such a way that it won't accept a response with the status code and data combined, which, to be honest, doesn't make sense to me. On my side, I need to implement a system that will receive such requests and provide data to clients. I decided to use the AIOHTTP library to solve this problem. As I'm very new to AIOHTTP, I can't find a way to send data back to the client right after returning a response. The existing system that sends requests also has an endpoint on its side. So, what I'm thinking of doing is returning a response with a status code to the client and then, acting as a client, sending a POST request to the provided endpoint. My system will therefore work both as a client and as a server. What I do not understand is how to implement this using AIOHTTP.
Let's say I have the following endpoint on my side with a handler. Please, consider this to only be pseudocode.
async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Let's say I have session in this block and I'm sending data back to the client.
    await session.post(SERVER_ENDPOINT, data=result)
    return web.Response(status=200)  # Returning a status without data.
Now, I need web.Response(status=200) to happen as soon as possible and only then process received data and send data back to the client. What I thought of doing is to wrap data processing and request sending in a task and adding it to a queue. Now, I always need the response to be sent first and I'm afraid that when using tasks, this might not be always true, or is it? Might the task be completed before returning a response? Is AIOHTTP good for this task? Should I consider something else?
Update #1
I've found a method called finish_response. Might it be used to implement something like this?
async def handler(self, request):
    self.finish_response(web.Response(status=200))  # Just an example.
    self.session.post(SERVER_ENDPOINT, data=my_data)
    return True  # or something
aiohttp has a sibling project called aiojobs, which is used to handle background tasks. There is an example of how aiojobs integrates with aiohttp in their documentation.
So, modifying your example to work with aiojobs:
import aiojobs.aiohttp

async def init():
    app = web.Application()
    app.add_routes([web.post('/endpoint/', handle)])
    # We must set up aiojobs from the aiohttp app
    aiojobs.aiohttp.setup(app)
    app_runner = web.AppRunner(app)
    await app_runner.setup()
    site = web.TCPSite(runner=app_runner, host='127.0.0.1', port=8008)
    await site.start()

async def handle(request):
    data = await request.text()
    result = await process(data)  # Data processing routine. Might be time-consuming.
    # Here we create the background task
    await aiojobs.aiohttp.spawn(request, session.post(SERVER_ENDPOINT, data=result))
    # The response should return as soon as the task is created - it does not wait for the task to finish.
    return web.Response(status=200)  # Returning a status without data.
If you want the await process(data) call to also be scheduled as a task, then you can move both calls into a separate function and schedule them together:
async def push_to_server(data):
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    await aiojobs.aiohttp.spawn(request, push_to_server(data))
    return web.Response(status=200)
If you want to make sure the response is sent before the push_to_server coroutine is called, then you can make use of asyncio events:
import asyncio

async def push_to_server(data, start):
    await start.wait()  # Blocks until the handler signals that the response was sent
    result = await process(data)
    await session.post(SERVER_ENDPOINT, data=result)

async def handle(request):
    data = await request.text()
    start = asyncio.Event()
    await aiojobs.aiohttp.spawn(request, push_to_server(data, start))
    response = web.Response(status=200)
    await response.prepare(request)
    await response.write_eof()
    start.set()
    return response
Here, await response.prepare(request) and await response.write_eof() are just a long-winded way of sending the response, but they allow us to call start.set() afterwards, which triggers the push_to_server work that is waiting on that event (await start.wait()).
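The ordering guarantee can be checked in isolation with plain asyncio. This is only a sketch: `handle` here is a hypothetical stand-in that takes no real request, `asyncio.create_task` replaces `aiojobs.aiohttp.spawn`, and appending to a list stands in for sending the response and for the POST.

```python
import asyncio


async def push_to_server(data, start, log):
    """Background job: waits for the 'response sent' signal before working."""
    await start.wait()
    log.append(f"pushed {data}")  # placeholder for process() + session.post()


async def handle(data, log):
    """Stand-in for a request handler (no real HTTP involved)."""
    start = asyncio.Event()
    # Schedule the job first; it parks on start.wait() as soon as it runs.
    job = asyncio.create_task(push_to_server(data, start, log))
    await asyncio.sleep(0)  # give the job a chance to reach its wait()
    log.append("response sent")  # placeholder for prepare() + write_eof()
    start.set()  # only now may the job proceed
    await job  # awaited here only so the demo can inspect the log
    return log


if __name__ == "__main__":
    print(asyncio.run(handle("payload", [])))
```

Even though the job is scheduled before the response, the event keeps it from doing any work until `start.set()` is called.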

Discord.py avoid blocking on_message method

I'm working on a Discord bot in which I mainly process images. So far it's working but when multiple images are sent at once, I experience a lot of blocking and inconsistency.
It goes like this:
User upload image > Bot places 'eyes' emoji on the message > bot processes the image > bot responds with result.
However, sometimes it can handle multiple images at once (the bot places the eyes emoji on the first few images) but usually it just puts emoji on the first image and then after finishing that one it will process the next 2-3 images etc.
The process which takes most of the time is the OCR reading the image.
Here is some abstract code:
main.py
@client.event
async def on_message(message):
    ...
    if len(message.attachments) > 0: await message_service.handle_image(message)
    ...
message_service.py
async def handle_image(self, message):
    supported_attachments = filter_out_unsupported(message.attachments)
    images = []
    await message.reply(f"{random_greeting()} {message.author.mention}, I'm processing your upload(s) please wait a moment, this could take up to 30 seconds.")
    await message.add_reaction('👀')
    for a in supported_attachments:
        async with aiohttp.ClientSession() as session:
            async with session.get(a) as res:
                if res.status == 200:
                    buffer = io.BytesIO(await res.read())
                    arr = np.asarray(bytearray(buffer.read()), dtype=np.uint8)
                    images.append(cv2.imdecode(arr, -1))
    for image in images:
        result = await self.image_handler.handle_image(image, message.author)
        await message.remove_reaction('👀', message.author)
        if result == None:
            await message.reply(f"{message.author.mention} I can't process your image. It's incorrect, unclear or I'm just not smart enough... :(")
            await message.add_reaction('❌')
        else:
            await message.reply(result)
image_handler
async def handle_image(self, image, author):
    try:
        if image is None: return None
        governor_id = str(self.__get_governor_id_by_discord_id(author.id))
        if governor_id == None:
            return f"{author.mention} there was no account registered under your discord id, please register by using this format: `$register <governor_id> <in game name>`, for example: `$register ... ...`. After that repost the screenshot.\n As for now multiple accounts are not supported."
        # This part is most likely the bottleneck !!
        read_result = self.reader.read_image_task(image)
        if self.__no_values_are_found(...):
            return None
        return self.sheets_client.update_player_row_in_sheets(...)
    except:
        return None

def __no_values_are_found(self, *args):
    return all(v is None for v in [*args])

def __get_governor_id_by_discord_id(self, id):
    return self.sheets_client.get_governor_id_by_discord_id(id)
I'm new to Python and Discord bots in general, but is there a clean way to handle this?
I was thinking about threading but can't seem to find many solutions within this context, which makes me believe I am missing something or doing something inefficiently.
There is actually a clean way, you can create your own to_thread decorator and decorate your blocking functions (though they cannot be coroutines, they must be normal, synchronous functions)
import asyncio
from functools import partial, wraps

def to_thread(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_event_loop()
        callback = partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, callback)  # if using python 3.9+ use `await asyncio.to_thread(callback)`
    return wrapper

# usage
@to_thread
def handle_image(self, image, author):  # notice how it's *not* an async function
    ...

# calling
await handle_image(...)  # not a coroutine, yet I'm awaiting it (cause of the wrapper function)
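A self-contained sketch of the same decorator in action. `slow_ocr` is a hypothetical stand-in for the blocking OCR call, with `time.sleep` simulating the work; three calls are awaited together so they overlap in the default thread pool instead of running back to back.

```python
import asyncio
import time
from functools import partial, wraps


def to_thread(func):
    """Wrap a blocking function so that awaiting it runs it in a worker thread."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, partial(func, *args, **kwargs))
    return wrapper


@to_thread
def slow_ocr(image_id):
    """Hypothetical stand-in for a blocking OCR call."""
    time.sleep(0.2)
    return f"text-{image_id}"


async def main():
    started = time.perf_counter()
    # The three blocking calls run in separate threads, so they overlap.
    results = await asyncio.gather(*(slow_ocr(i) for i in range(3)))
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

Threads help here because the event loop is no longer stuck inside the blocking call; whether the OCR itself runs in parallel depends on whether the underlying library releases the GIL.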

Ways to optimize simple asyncio program where TCP clients are persistent

Using Python 3.7.4 and the asyncio package I'm trying to write an application that should spawn around 20000 (20k or more) TCP clients which then connect to a single server.
The clients then wait for a command from the server (received_data = await reader.read(4096)) and proceed to executing it (await loop.run_in_executor(...)) then send the response back to the server (writer.write(resp)).
After this cycle is completed, I sleep 100ms (await asyncio.sleep(100e-3)) in order to allow other coroutines to run.
The 20k clients should never disconnect and should process commands from the server indefinitely.
I'm interested in ways I can change the code to optimize it (barring the use of uvloop or directly implementing a Protocol since I saw in uvloop's docs this could improve the performance) beyond what it is capable now.
Let's assume that I cannot modify handle_request.
For example the await asyncio.sleep(100e-3) is especially bothering me, but I had to add it there, otherwise the impression was that no other coroutines ran other than the first one! Why could that be?
Say I remove the sleep (since in theory the other awaits should allow other coroutines to run), what else could I do?
Below is a minimal example of what my application looks like:
import asyncio
from collections import namedtuple
import functools  # needed for functools.partial below
import logging
import os
import signal  # needed for the signal handlers in main()
import sys

logger = logging.getLogger(__name__)
should_exit = asyncio.Event()

def exit(signame, loop):
    should_exit.set()
    logger.warning('Exiting soon...')

def handle_request(received_data, entity):
    logger.info('Backend logic here that consumes a bit of time depending on the entity and the received_data')

async def run_entity(entity, args):
    logger.info(f'Running entity {entity}')
    loop = asyncio.get_running_loop()
    try:
        reader, writer = await asyncio.open_connection(args.addr[0], int(args.addr[1]))
        logger.debug(f'{entity} connected to {args.addr[0]}:{args.addr[1]}')
        try:
            while not should_exit.is_set():
                received_data = await reader.read(4096)
                if received_data:
                    logger.debug(f'{entity} received data {received_data}')
                    success, resp = await loop.run_in_executor(None, functools.partial(handle_request, received_data, entity))
                    if success:
                        logger.debug(f'{entity} sending response {resp}')
                        writer.write(resp)
                        await writer.drain()
                await asyncio.sleep(100e-3)
        except ConnectionResetError:
            pass
    except ConnectionRefusedError:
        logger.warning(f'Connection refused by {args.addr[0]}:{args.addr[1]}.')
    except Exception:
        logger.exception('Details of unexpected error:')
    logger.info(f'Stopped entity {entity}')

async def main(entities, args):
    if os.name == 'posix':
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(signal.SIGTERM, functools.partial(exit, signal.SIGTERM, loop))
        loop.add_signal_handler(signal.SIGINT, functools.partial(exit, signal.SIGINT, loop))
    tasks = (run_entity(entity, args) for entity in entities)
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    ArgsReplacement = namedtuple('ArgsReplacement', ['addr'])
    asyncio.run(main(range(20000), ArgsReplacement(addr=['127.0.0.1', '4242'])))
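One thing that may be worth checking (an assumption; nothing in the question confirms it is the cause): once the peer has closed the connection, reader.read() returns b'' instead of waiting, so a loop that keeps reading after end-of-stream can spin without giving other coroutines a turn. The sketch below feeds an in-memory asyncio.StreamReader by hand, with no real socket, and shows a read loop that stops at end-of-stream. `read_until_eof` is a hypothetical helper name.

```python
import asyncio


async def read_until_eof(reader):
    """Collect chunks and stop once read() signals end-of-stream."""
    chunks = []
    while True:
        data = await reader.read(4096)
        if not data:  # b'' means the peer closed the connection
            break
        chunks.append(data)
    return chunks


async def main():
    # An in-memory reader fed by hand; a stand-in for a real connection.
    reader = asyncio.StreamReader()
    reader.feed_data(b'command-1')
    reader.feed_eof()
    return await read_until_eof(reader)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If this turns out to be what happens, breaking out of the loop on empty data would remove the need for the sleep as a workaround.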

How to implement single-producer multi-consumer with aioredis pub/sub

I have a web app. The app has an endpoint that pushes some object data to a Redis channel.
Another endpoint handles websocket connections, where that data is fetched from the channel and sent to the client via ws.
When I connect via ws, only the first connected client gets the messages.
How can I read messages from a Redis channel with multiple clients without creating a new subscription?
Websocket handler.
Here I subscribe to the channel and save it to the app (init_tram_channel). Then I run a job that listens to the channel and sends messages (run_tram_listening).
@routes.get('/tram-state-ws/{tram_id}')
async def tram_ws(request: web.Request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    tram_id = int(request.match_info['tram_id'])
    channel_name = f'tram_{tram_id}'
    await init_tram_channel(channel_name, request.app)
    tram_job = await run_tram_listening(
        request=request,
        ws=ws,
        channel=request.app['tram_producers'][channel_name]
    )
    request.app['websockets'].add(ws)
    try:
        async for msg in ws:
            if msg.type == aiohttp.WSMsgType.TEXT:
                if msg.data == 'close':
                    await ws.close()
                    break
            if msg.type == aiohttp.WSMsgType.ERROR:
                logging.error(f'ws connection was closed with exception {ws.exception()}')
            else:
                await asyncio.sleep(0.005)
    except asyncio.CancelledError:
        pass
    finally:
        await tram_job.close()
        request.app['websockets'].discard(ws)
    return ws
Subscribing and saving the channel.
Every channel is related to a unique object, and in order not to create many channels related to the same object, I save only one to the app.
app['tram_producers'] is a dict.
async def init_tram_channel(
    channel_name: str,
    app: web.Application
):
    if channel_name not in app['tram_producers']:
        channel, = await app['redis'].subscribe(channel_name)
        app['tram_producers'][channel_name] = channel
Running coro for channel listening.
I run it via aiojobs:
async def run_tram_listening(
    request: web.Request,
    ws: web.WebSocketResponse,
    channel: Channel
):
    """
    :return: aiojobs._job.Job object
    """
    listen_redis_job = await spawn(
        request,
        _read_tram_subscription(
            ws,
            channel
        )
    )
    return listen_redis_job
Coroutine where I listen and send messages:
async def _read_tram_subscription(
    ws: web.WebSocketResponse,
    channel: Channel
):
    try:
        async for msg in channel.iter():
            tram_data = msg.decode()
            await ws.send_json(tram_data)
    except asyncio.CancelledError:
        pass
    except Exception as e:
        logging.error(msg=e, exc_info=e)
The following code was found in an aioredis GitHub issue (I've adapted it to my task).
class TramProducer:
    def __init__(self, channel: aioredis.Channel):
        self._future = None
        self._channel = channel

    def __aiter__(self):
        return self

    def __anext__(self):
        return asyncio.shield(self._get_message())

    async def _get_message(self):
        if self._future:
            return await self._future
        self._future = asyncio.get_event_loop().create_future()
        message = await self._channel.get_json()
        future, self._future = self._future, None
        future.set_result(message)
        return message
So, how does it work? TramProducer wraps the way we get messages.
As @Messa said:
a message is received from one Redis subscription only once.
So only one client of TramProducer retrieves messages from Redis, while the other clients wait for the future's result, which is set after a message is received from the channel.
If self._future is initialized, it means that somebody is already waiting for a message from Redis, so we just wait for the self._future result.
TramProducer usage (I've taken the example from my question):
async def _read_tram_subscription(
    ws: web.WebSocketResponse,
    tram_producer: TramProducer
):
    try:
        async for msg in tram_producer:
            await ws.send_json(msg)
    except asyncio.CancelledError:
        pass
    except Exception as e:
        logging.error(msg=e, exc_info=e)
TramProducer initialization:
async def init_tram_channel(
    channel_name: str,
    app: web.Application
):
    if channel_name not in app['tram_producers']:
        channel, = await app['redis'].subscribe(channel_name)
        app['tram_producers'][channel_name] = TramProducer(channel)
I think it may be helpful for somebody.
Full project here https://gitlab.com/tram-emulator/tram-server
I guess a message is received from one Redis subscription only once, and if there is more than one listener in your app, then only one of them will get it.
So you need to create something like mini pub/sub inside the application to distribute the messages to all listeners (websocket connections in this case).
Some time ago I've made an aiohttp websocket chat example - not with Redis, but at least the cross-websocket distribution is there: https://github.com/messa/aiohttp-nextjs-demo-chat/blob/master/chat_web/views/api.py
The key is to have an application-wide message_subcriptions, where every websocket connection registers itself, or perhaps its own asyncio.Queue (I've used Event in my example, but that's suboptimal), and whenever message comes from Redis, it is pushed to all relevant queues.
Of course when websocket connection ends (client unsubscribe, disconnect, failure...) the queue should be removed (and possibly Redis subscription cancelled if it was the last connection listening to it).
Asyncio doesn’t mean we should forget about queues :) Also it’s good to get familiar with combining multiple tasks at once (reading from websocket, reading from message queue, perhaps reading from some notification queue...). Using queues can also help you to handle client reconnects more cleanly (without loss of any messages).
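A minimal sketch of the queue-per-listener idea described above, without Redis or websockets. `Broadcaster` and `listener` are hypothetical names; in a real app a single task reading the Redis channel would call `publish()`, and each websocket handler would register and drain its own queue.

```python
import asyncio


class Broadcaster:
    """Hypothetical in-process pub/sub: one queue per connected listener."""

    def __init__(self):
        self._queues = set()

    def register(self):
        queue = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def unregister(self, queue):
        self._queues.discard(queue)

    def publish(self, message):
        # Copy each incoming message into every listener's queue.
        for queue in self._queues:
            queue.put_nowait(message)


async def listener(broadcaster, count):
    """Stand-in for a websocket handler; collects `count` messages."""
    queue = broadcaster.register()
    try:
        return [await queue.get() for _ in range(count)]
    finally:
        # Always drop the queue when the connection ends.
        broadcaster.unregister(queue)


async def main():
    broadcaster = Broadcaster()
    listeners = [asyncio.create_task(listener(broadcaster, 2)) for _ in range(3)]
    await asyncio.sleep(0)  # let every listener register its queue
    for message in ("tram at stop A", "tram at stop B"):
        broadcaster.publish(message)  # in a real app: called by the Redis reader
    return await asyncio.gather(*listeners)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each listener has its own queue, a slow client only delays itself; messages for it pile up in its queue rather than being lost or blocking the others.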

asyncIO multithreaded server with two coroutines

I'm programming a server in Python3, which takes screenshot and sends it over websockets. I have coroutine for handling connection and I would like to create another coroutine for taking screenshot at some interval. Screenshot coroutine will probably run in different thread and I will need to propagate the result to some shared variable with read-write lock, to be able to send it. My questions: (result should be multiplatform, if possible)
How is it possible to schedule tasks like this? I created server which runs forever, and I can create periodical coroutine, but somehow I can't put them together in one loop.
What is a good way to propagate the result from one thread (or coroutine, if server is single threaded) to another?
I found this piece of code similar to this and I can't get it to work (second coroutine doesn't execute). Can someone correct this with and without multithreading?
async def print_var():
    global number
    await asyncio.sleep(2)
    print(number)

async def inc_var():
    global number
    await asyncio.sleep(5)
    number += 1

number = 0
asyncio.get_event_loop().run_until_complete(print_var())
asyncio.async(inc_var)
asyncio.get_event_loop().run_forever()
Post-answer edit
In the end after more hours of googling, I actually got it to work on a single thread, so there's no danger of race condition. (But I'm still not sure what ensure_future does, and why it isn't called on event loop.)
users = set()

def register(websocket):
    users.add(websocket)

def unregister(websocket):
    users.remove(websocket)

async def get_screenshot():
    global screenshot
    while True:
        screenshot = screenshot()
        await asyncio.sleep(0.2)

async def server(websocket, path):
    global screenshot
    register(websocket)
    try:
        async for message in websocket:
            respond(screenshot)
    finally:
        unregister(websocket)

def main():
    asyncio.get_event_loop().run_until_complete(
        websockets.serve(server, 'localhost', 6789))
    asyncio.ensure_future(get_screenshot())
    asyncio.get_event_loop().run_forever()

main()
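On the open question about ensure_future: it wraps a coroutine in a Task and schedules it on the event loop, and since Python 3.7 asyncio.create_task does the same from inside a running coroutine. Below is a sketch of how the earlier print_var/inc_var snippet could be made to run both coroutines on a single thread. It is an adaptation rather than the original code: the shared value lives in a dict instead of a global, and the delays are shortened and swapped so that the increment happens before the read.

```python
import asyncio


async def print_var(state, seen):
    await asyncio.sleep(0.1)
    seen.append(state["number"])  # records the value instead of printing it


async def inc_var(state):
    await asyncio.sleep(0.05)
    state["number"] += 1


async def main():
    state = {"number": 0}
    seen = []
    # create_task schedules each coroutine on the running loop right away;
    # gather then waits until both tasks have finished.
    await asyncio.gather(
        asyncio.create_task(print_var(state, seen)),
        asyncio.create_task(inc_var(state)),
    )
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both coroutines share one thread, so no lock is needed around the shared value: control only switches at an await.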
In Python 3.7:
import asyncio
import websockets

CAPTURE_INTERVAL = 1
running = True
queues = set()

async def handle(ws, path):
    queue = asyncio.Queue()
    queues.add(queue)
    while running:
        data = await queue.get()
        if not data:
            break
        await ws.send(data)

def capture_screen():
    # Do some work here, preferably in C extension without holding the GIL
    return b'screenshot data'

async def main():
    global running
    loop = asyncio.get_running_loop()
    server = await websockets.serve(handle, 'localhost', 8765)
    try:
        while running:
            data = await loop.run_in_executor(None, capture_screen)
            for queue in queues:
                queue.put_nowait(data)
            await asyncio.sleep(CAPTURE_INTERVAL)
    finally:
        running = False
        for queue in queues:
            queue.put_nowait(None)
        server.close()
        await server.wait_closed()

if __name__ == '__main__':
    asyncio.run(main())
Please note, this is only for demonstrating the producer-consumer fan-out pattern. The queues are not essential - you can simply send data to all server.sockets in main() directly, while in handle() you should worry about incoming websocket messages. For example, client may control image compression rate like this:
import asyncio
import websockets

CAPTURE_INTERVAL = 1
DEFAULT = b'default'
qualities = {}

async def handle(ws, path):
    try:
        async for req in ws:
            qualities[ws] = req
    finally:
        qualities.pop(ws, None)

def capture_screen():
    # Do some work here, preferably in C extension without holding the GIL
    return {
        DEFAULT: b'default screenshot data',
        b'60': b'data at 60% quality',
        b'80': b'data at 80% quality',
    }

async def main():
    loop = asyncio.get_running_loop()
    server = await websockets.serve(handle, 'localhost', 8765)
    try:
        while True:
            data = await loop.run_in_executor(None, capture_screen)
            for ws in server.sockets:
                quality = qualities.get(ws, DEFAULT)
                if quality not in data:
                    quality = DEFAULT
                asyncio.create_task(ws.send(data[quality]))
            await asyncio.sleep(CAPTURE_INTERVAL)
    finally:
        server.close()
        await server.wait_closed()

if __name__ == '__main__':
    asyncio.run(main())
