Problem: an aiokafka consumer is starving the FastAPI endpoints. As a result, our Kubernetes liveness probes fail and any other service calling the exposed endpoints times out.
Details:
A Kafka consumer starts during the FastAPI startup event and keeps listening to a particular topic.
There is also a FastAPI endpoint that serves requests.
When there are a lot of messages in the Kafka topic partition, the consumer starves the event loop and requests served by the FastAPI endpoints time out.
How can we solve this problem?
# all the imports
consumer = None
consumer_task = None
log = None

def get_application():
    # initialize fastapi app with different routes and do some stuff
    return app

app = get_application()

@app.on_event("startup")
async def startup_event():
    # initialize consumer
    await initialize()
    # start consuming
    await consume()

@app.on_event("shutdown")
async def shutdown_event():
    # close consumer
    ...

async def initialize():
    # initialize
    # get cluster layout and join group
    await consumer.start()
    await consumer.seek_to_committed()

async def consume():
    global consumer_task
    loop = asyncio.get_event_loop()
    consumer_task = loop.create_task(send_consumer_message(consumer))

async def send_consumer_message(consumer):
    try:
        # consume messages
        async for msg in consumer:
            # do message processing
            ...
    except Exception as e:
        log.info(f"message consuming failed with error: {repr(e)}")
    finally:
        log.warning("stopping consumer")
        await consumer.stop()
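One possible cause, assuming the per-message processing is synchronous (CPU-bound or blocking I/O), is that each iteration of the async for loop holds the event loop until processing finishes. Below is a minimal sketch of offloading that work to a worker thread with asyncio.to_thread (Python 3.9+). The names process_message, fake_consumer and heartbeat are hypothetical stand-ins for the real handler, the aiokafka consumer and a FastAPI endpoint; none of them are aiokafka APIs.

```python
import asyncio
import time


def process_message(msg: str) -> str:
    """Hypothetical blocking handler (stand-in for real message processing)."""
    time.sleep(0.01)  # simulates blocking work
    return msg.upper()


async def fake_consumer(n: int):
    """Stand-in for `async for msg in consumer`; yields n messages."""
    for i in range(n):
        yield f"msg-{i}"


async def consume(results: list) -> None:
    async for msg in fake_consumer(5):
        # Run the blocking handler in a worker thread so the event loop
        # stays free to serve other coroutines in the meantime.
        results.append(await asyncio.to_thread(process_message, msg))


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Stand-in for an endpoint or liveness probe that must keep running."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.001)


async def main():
    results, ticks, stop = [], [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await consume(results)
    stop.set()
    await hb
    return results, ticks


results, ticks = asyncio.run(main())
print(len(results), "messages processed;", len(ticks), "heartbeat ticks")
```

If process_message were called directly inside the loop instead, the heartbeat task would not get a chance to run until all messages were consumed. Whether this applies to the code above depends on what the elided message processing actually does.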
Related
I am trying to create a websocket server that pushes data from the server to a client. The data comes from another process, so I use a queue to receive it. Reading from that queue blocks my main websocket function, and as a result a client cannot reconnect after its connection breaks. I think the code blocks on the queue.
Here is part of my code:
class RecorderEventHook(object):
    def __init__(self, high_event_mq):
        self.high_event_mq = high_event_mq
        self.msg = None
        self.loop = None

    # @wrap_keep_alive
    async def on_msg_event(self, websocket):
        try:
            # async for message in websocket:
            while True:
                msg = self.high_event_mq.get()
                await websocket.send(json.dumps(msg))
                # msg
        except Exception as error:
            print(error)

    async def event_controller(self):
        await websockets.serve(self.on_msg_event, 'localhost', 8888)

    def start(self):
        loop = asyncio.new_event_loop()
        loop.create_task(self.event_controller())
        loop.run_forever()
I tried saving the connected websocket object and using it from another thread (in the same process), but that failed with a message like
"xxxx" function never waited
I want to be able to receive data from other processes without affecting the normal reconnection of the client.
Any help is much appreciated.
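A likely culprit is the blocking high_event_mq.get() call inside the coroutine: while it waits, the event loop cannot accept new connections. Below is a minimal sketch of one way around this, running the blocking get() in the default thread pool via loop.run_in_executor. It assumes the queue has a blocking get() (queue.Queue is used here as a stand-in for the inter-process queue). FakeWebSocket and the None sentinel are hypothetical, introduced only to keep the example self-contained.

```python
import asyncio
import json
import queue
import threading


class FakeWebSocket:
    """Hypothetical stand-in for a websocket connection; records sent frames."""

    def __init__(self):
        self.sent = []

    async def send(self, data: str) -> None:
        self.sent.append(data)


async def forward_events(mq: queue.Queue, websocket: FakeWebSocket) -> None:
    loop = asyncio.get_running_loop()
    while True:
        # The blocking get() runs in the default thread pool, so the event
        # loop remains free to accept new (re)connections while waiting.
        msg = await loop.run_in_executor(None, mq.get)
        if msg is None:  # sentinel chosen for this sketch to end the loop
            break
        await websocket.send(json.dumps(msg))


def producer(mq: queue.Queue) -> None:
    """Stand-in for the other process that feeds the queue."""
    for i in range(3):
        mq.put({"event": i})
    mq.put(None)


mq = queue.Queue()
ws = FakeWebSocket()
threading.Thread(target=producer, args=(mq,)).start()
asyncio.run(forward_events(mq, ws))
print(ws.sent)
```

The same pattern should carry over to on_msg_event by replacing self.high_event_mq.get() with the run_in_executor call, though that has not been tested against the original code.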
The following is a reduced version of a server that periodically sends telemetry, in the form of JSON strings, to any connected clients. This was my initial attempt, in which the main loop pushes data to all connected clients. However, I cannot simply let the handler terminate after "registering" the client, because the connection would then be closed. So I need to block the handler until the main loop determines that the client has disconnected. Signalling the handler through an Event simply does nothing.
@routes.get('/telemetry/json')
async def handler(request: Request):
    global CLIENT
    CLIENT = await StreamResponse().prepare(request)
    log.debug(f"Wait for {EVENT}")
    await EVENT.wait()  # This never wakes up!
    log.debug(f"Client {request.remote} disconnected")

async def main():
    global EVENT
    EVENT = Event()
    app = web.Application()
    app.add_routes(routes)
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, port=8080).start()
    while True:
        await sleep(1)
        if CLIENT is None:
            continue
        try:
            await CLIENT.write('FLUSH\n'.encode('utf-8'))
            await CLIENT.drain()
        except ConnectionResetError:
            log.debug(f"Notify {EVENT}")
            EVENT.set()

log.addHandler(logging.StreamHandler())
log.setLevel(10)
asyncio.run(main())
To clarify: the use of the globals CLIENT and EVENT is not how the real code works. The handling of multiple clients was removed to keep the example as short as possible.
I am developing a private chat in which two or more users can communicate with each other. I have an endpoint for the websocket connection, where only authenticated users can complete the handshake between client and server.
The problem occurs once the websocket connection is accepted. The consumer handler runs smoothly in its infinite loop, waiting for messages from the client and performing the requested tasks. The producer, on the other hand, spins in its infinite loop, which drives CPU usage up to 100%.
I need a listener on a specific Redis channel, where I receive all the messages from the users in real time. The while loop does that, but because of the CPU spike it is not a good solution.
# api.py
async def consumer_handler(service):
    """Messages received on the websocket connection Consumer - (Publisher)"""
    try:
        while True:
            received_data = await service.websocket.receive_json()
            if received_data['event_type'] == "online.users":
                await service.get_online_user_status(received_data['role_id'])
            elif received_data['event_type'] == "message.user":
                await service.send_message(received_data['user_id'], received_data['content'])
            elif received_data['event_type'] == "info":
                await service.get_info()
    except WebSocketDisconnect:
        logger.debug("WebSocketDisconnect - consumer handler disconnected")

async def producer_handler(service):
    """Messages generated at the backend to send to the websocket Producer - (Subscriber)"""
    try:
        while True:
            if service.pubsub.subscribed:
                message = await service.pubsub.get_message(ignore_subscribe_messages=True)
                if message:
                    await service.websocket.send_json(message['data'].decode())
    except (ConnectionClosedOK, aioredis.exceptions.ConnectionError) as e:
        logger.debug(f"{e.__class__}", "producer handler disconnected")

@chat_app.websocket("/")
async def websocket_endpoint(websocket: WebSocket,
                             current_user: User = Depends(is_authenticated_ws)):
    if not current_user:
        return
    async with ConnectionContextManager(user_id=current_user.id, websocket=websocket) as service:
        producer_task = asyncio.ensure_future(producer_handler(service))
        consumer_task = asyncio.ensure_future(consumer_handler(service))
        done, pending = await asyncio.wait(
            [consumer_task, producer_task],
            return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
This endpoint handles both the producer and the consumer logic, as described in the websockets documentation.
# websocket_utils.py
class WebsocketService:
    """
    This acts like a service for websocket, is returned within the context manager
    this class is used to not interact with consumer directly, instead interact it with the manager
    """
    def __init__(self, *, user_id: UUID4, websocket: WebSocket, pubsub: PubSub):
        self.user_id = user_id
        self.websocket = websocket
        self.pubsub = pubsub

    async def get_online_user_status(self, role_id):
        await consumer.online_user_status_per_role(role_id, self.websocket)

    async def send_message(self, user_id: UUID4, content: str):
        await consumer.send_message_to_user(user_id=user_id,
                                            message=content,
                                            websocket=self.websocket)

    async def get_info(self):
        await consumer.fetch_info(self.websocket)

class ConnectionContextManager:
    """
    This context manager handles the websocket connection
    on enter, it returns a controller for the websocket events
    """
    websocket_service: WebsocketService

    def __init__(self, *, user_id: UUID4, websocket: WebSocket):
        self.websocket_service = WebsocketService(user_id=user_id,
                                                  websocket=websocket,
                                                  pubsub=websocket.app.redis.pubsub())

    async def __aenter__(self):
        logger.debug("Context manager enter")
        await consumer.connect(
            user_id=self.websocket_service.user_id,
            websocket=self.websocket_service.websocket,
            pubsub=self.websocket_service.pubsub
        )
        return self.websocket_service

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        await consumer.disconnect(
            user_id=self.websocket_service.user_id,
            pubsub=self.websocket_service.pubsub,
            websocket=self.websocket_service.websocket,
        )
        logger.debug("Context manager exit")
This context manager ensures that each user has their own pubsub channel and it creates a controller for the actual consumer so that I do not have to pass user_id and other handy parameters all the time when I need a specific resource.
class ConnectionConsumer:
    __redis: aioredis.Redis

    def __init__(self):
        self.__redis = aioredis.from_url(settings.ws_redis_url, encoding='utf-8', decode_responses=True)

    async def __send_json(self, obj: dict, websocket: WebSocket):
        await websocket.send_json(obj)

    async def connect(self, *, user_id: UUID4, websocket: WebSocket, pubsub: PubSub):
        # Accept connection if authorization is successful, set the user online and subscribe to its channel layer
        await websocket.accept()
        await self.__redis.set(f"status:{user_id}", "1")  # status:UUID4 (means online)
        await pubsub.subscribe(f"channel:{user_id}")  # subscribe to itself's channel

    async def disconnect(self, *, user_id: UUID4, websocket: WebSocket, pubsub: PubSub):
        # Gracefully disconnect from the websocket and remove the channel layer from pubsub
        await self.__redis.delete(f"status:{user_id}")
        await pubsub.unsubscribe(f"channel:{user_id}")
        await pubsub.close()
        await self.__redis.close()
        await websocket.close()
And here is the actual consumer which is called from the service that context manager returns.
CONTAINER ID   NAME   CPU %     MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O   PIDS
4ed80g7fb093   s_be   1.77%     76.09MiB / 15.29GiB   0.49%   37.3kB / 21.1kB   0B / 0B     7
This is the docker stats for the container when only consumer is handled
CONTAINER ID   NAME   CPU %     MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O   PIDS
4ed80g7fb093   s_be   100.36%   76.08MiB / 15.29GiB   0.49%   42.9kB / 25.7kB   0B / 0B     7
And this is the docker stats for the container when both producer and consumer handlers are running
I have tried to split the connections as well but I have the same issue.
I know this is a pretty old question, but I ran into a similar problem.
The issue is that pubsub.get_message uses timeout=0.0 by default, so it returns immediately when no message is available. Inside a while True loop this results in "infinite polling" and high CPU usage.
You can pass a timeout argument (a float, in seconds) so that the call waits up to that long before returning. You can also pass timeout=None to make get_message wait indefinitely for the next message.
message = await pubsub.get_message(
    ignore_subscribe_messages=True, timeout=None
)  # This will wait for a new message indefinitely
if message:
    ...
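To illustrate the difference between the two modes without a Redis server, here is a rough sketch that uses asyncio.Queue as a stand-in for the pubsub object. busy_poll imitates a while True loop around a non-blocking read (like timeout=0.0), and blocking_wait imitates a read that suspends until data arrives (like timeout=None). Both function names are made up for this example.

```python
import asyncio


async def busy_poll(q: asyncio.Queue) -> int:
    """Poll without waiting, like a zero-timeout read in a `while True` loop."""
    polls = 0
    while True:
        polls += 1
        try:
            q.get_nowait()
            return polls
        except asyncio.QueueEmpty:
            await asyncio.sleep(0)  # yields to the loop, but never idles


async def blocking_wait(q: asyncio.Queue) -> int:
    """Suspend until a message arrives, like a read with no timeout."""
    await q.get()
    return 1


async def run(reader) -> int:
    q = asyncio.Queue()
    task = asyncio.create_task(reader(q))
    await asyncio.sleep(0.05)  # the message arrives 50 ms later
    q.put_nowait("hello")
    return await task


busy = asyncio.run(run(busy_poll))
idle = asyncio.run(run(blocking_wait))
print(f"busy polling: {busy} attempts; blocking wait: {idle} attempt")
```

The polling version wakes up many times during those 50 ms, which is what shows up as a pegged CPU core, while the blocking version is resumed once.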
SOLUTION
Here is the refactored producer handler. It uses listen() instead of get_message(). listen() is an asynchronous generator, so the async for loop suspends until the next message arrives instead of checking on every iteration whether one is available. This solves the problem, and no timeout or await sleep() call is needed in the code.
@logger.catch
async def producer_handler(service):
    """Messages generated at the backend to send to the websocket Producer - (Subscriber)"""
    try:
        while True:
            if service.pubsub.subscribed:
                async for message in service.pubsub.listen():
                    if message['type'] == "subscribe": continue
                    await service.websocket.send_text(message['data'])
    except (ConnectionClosedOK, aioredis.exceptions.ConnectionError) as e:
        sentry_sdk.capture_exception(e)
        logger.debug(f"{e.__class__}", "producer handler disconnected")
I am currently working on a server platform based on an event-driven architecture. An event should enter the system via a websocket connection, and after some processing the response should leave the system via the same websocket connection.
The implementation idea is this: when a connection is made to the server, I put it in a while loop and await data from it until it disconnects. The incoming data is put into a queue, from which a worker thread pulls it and processes it. On the other side, I have created a task that polls an outgoing event queue; if there is an event in the queue, the task sends it to the corresponding recipient.
Unfortunately my current asyncio logic is flawed: polling the outgoing event queue blocks the receiving task, and I cannot work out how to fix it. Here are some code snippets that illustrate the problem:
Starting the websocket server
def run(self, address: str, port: int, ssl_context: ssl.SSLContext = None):
    start_server = websockets.serve(
        self.websocket_connection_handler, address, port, ssl=ssl_context)
    event_loop = asyncio.get_event_loop()
    event_loop.create_task(self.send_heartbeat())
    event_loop.create_task(self.dispatch_outgoing_events())
    print(f'Running on {"wss" if ssl_context else "ws"}://{address}:{port}')
    event_loop.run_until_complete(start_server)
    event_loop.run_forever()
The dispatcher function which infinitely polls data from the outgoing queue
async def dispatch_outgoing_events(self):
    while not self.exit_state.should_exit:
        if len(self.outgoing_event_queue) == 0:
            await asyncio.sleep(0)
        else:
            event = self.outgoing_event_queue.get_event()
            destination = event.destination
            client_id = re.findall(
                r'[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}', destination)[0]
            client = self.client_store.get(client_id)
            await client.websocket.send(serializer.serialize(event))
The connection handler function for the websocket
async def websocket_connection_handler(self, websocket, path):
    client_id = await self.register(websocket)
    try:
        while not self.exit_state.should_exit:
            correlation_id = str(uuid4())
            message = await websocket.recv()
            # (the `if` branch matching this `else` is missing from the snippet as posted)
            else:
                try:
                    event = serializer.deserialize(
                        message, correlation_id, client_id)
                    event.return_address = f'remote://websocket/{client_id}'
                    self.incoming_event_queue.add_event(event)
                except Exception as e:
                    event = type('evt', (object,), dict(system_entry=str(
                        datetime.datetime.utcnow()), destination=f'remote://websocket/{client_id}'))()
                    self.exception_handler.handle_exception(
                        e, event)
    except Exception as exception:
        print(
            f'client {client_id} suddenly disconnected. Reason: {type(exception).__name__} -> {exception}')
        self.client_store.remove(client_id)
        self.topic_factory.remove_client(client_id)
        self.topic_factory.get_topic('server_notifications').publish(ClientDisconnectedNotification(client_id),
                                                                     str(uuid4()))
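One way to avoid the polling loop, assuming the outgoing queue can be replaced by an asyncio.Queue, is to let the dispatcher await the queue and have the worker thread hand events over with loop.call_soon_threadsafe. The sketch below is a simplified illustration, not a drop-in replacement: the list named sent stands in for the websocket send call, and the None sentinel is only there so the example terminates.

```python
import asyncio
import threading


async def dispatch_outgoing_events(outgoing: asyncio.Queue, sent: list) -> None:
    while True:
        # Suspends until an event is available; no polling, no sleep(0).
        event = await outgoing.get()
        if event is None:  # sentinel used in this sketch to stop the task
            break
        sent.append(event)  # stand-in for `await client.websocket.send(...)`


def worker(loop: asyncio.AbstractEventLoop, outgoing: asyncio.Queue) -> None:
    """Worker thread. asyncio.Queue is not thread-safe, so results are handed
    to the loop with call_soon_threadsafe instead of calling put directly."""
    for i in range(3):
        loop.call_soon_threadsafe(outgoing.put_nowait, f"event-{i}")
    loop.call_soon_threadsafe(outgoing.put_nowait, None)


async def main() -> list:
    outgoing, sent = asyncio.Queue(), []
    thread = threading.Thread(
        target=worker, args=(asyncio.get_running_loop(), outgoing))
    thread.start()
    await dispatch_outgoing_events(outgoing, sent)
    thread.join()  # the worker has already finished by this point
    return sent


sent = asyncio.run(main())
print(sent)
```

While the dispatcher is suspended in outgoing.get(), the connection handlers are free to run, since the dispatcher no longer competes with them on every loop iteration.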
I have a web app. It has one endpoint that pushes object data to a Redis channel.
Another endpoint handles websocket connections: data is fetched from the channel and sent to the client over the websocket.
When several clients connect over websocket, only the first connected client receives messages.
How can I read messages from a Redis channel for multiple clients without creating a new subscription for each?
Websocket handler.
Here I subscribe to the channel and save it on the app (init_tram_channel). Then I run a job that listens to the channel and sends the messages (run_tram_listening).
@routes.get('/tram-state-ws/{tram_id}')
async def tram_ws(request: web.Request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    tram_id = int(request.match_info['tram_id'])
    channel_name = f'tram_{tram_id}'
    await init_tram_channel(channel_name, request.app)
    tram_job = await run_tram_listening(
        request=request,
        ws=ws,
        channel=request.app['tram_producers'][channel_name]
    )
    request.app['websockets'].add(ws)
    try:
        async for msg in ws:
            if msg.type == aiohttp.WSMsgType.TEXT:
                if msg.data == 'close':
                    await ws.close()
                    break
            if msg.type == aiohttp.WSMsgType.ERROR:
                logging.error(f'ws connection was closed with exception {ws.exception()}')
            else:
                await asyncio.sleep(0.005)
    except asyncio.CancelledError:
        pass
    finally:
        await tram_job.close()
        request.app['websockets'].discard(ws)
    return ws
Subscribing and saving the channel.
Every channel is related to a unique object. To avoid creating many channels for the same object, I save only one per object on the app.
app['tram_producers'] is a dict.
async def init_tram_channel(
    channel_name: str,
    app: web.Application
):
    if channel_name not in app['tram_producers']:
        channel, = await app['redis'].subscribe(channel_name)
        app['tram_producers'][channel_name] = channel
Running coro for channel listening.
I run it via aiojobs:
async def run_tram_listening(
    request: web.Request,
    ws: web.WebSocketResponse,
    channel: Channel
):
    """
    :return: aiojobs._job.Job object
    """
    listen_redis_job = await spawn(
        request,
        _read_tram_subscription(
            ws,
            channel
        )
    )
    return listen_redis_job
Coro where i listen and send messages:
async def _read_tram_subscription(
    ws: web.WebSocketResponse,
    channel: Channel
):
    try:
        async for msg in channel.iter():
            tram_data = msg.decode()
            await ws.send_json(tram_data)
    except asyncio.CancelledError:
        pass
    except Exception as e:
        logging.error(msg=e, exc_info=e)
I found the following code in an aioredis GitHub issue (I've adapted it to my task).
class TramProducer:
    def __init__(self, channel: aioredis.Channel):
        self._future = None
        self._channel = channel

    def __aiter__(self):
        return self

    def __anext__(self):
        return asyncio.shield(self._get_message())

    async def _get_message(self):
        if self._future:
            return await self._future
        self._future = asyncio.get_event_loop().create_future()
        message = await self._channel.get_json()
        future, self._future = self._future, None
        future.set_result(message)
        return message
So, how does it work? TramProducer wraps the way we get messages.
As @Messa said,
a message is received from one Redis subscription only once.
So only one client of TramProducer retrieves messages from Redis, while the other clients wait for the future result that is set after a message is received from the channel.
If self._future is set, somebody is already waiting for a message from Redis, so we just wait for the self._future result.
TramProducer usage (I've taken the example from my question):
async def _read_tram_subscription(
    ws: web.WebSocketResponse,
    tram_producer: TramProducer
):
    try:
        async for msg in tram_producer:
            await ws.send_json(msg)
    except asyncio.CancelledError:
        pass
    except Exception as e:
        logging.error(msg=e, exc_info=e)
TramProducer initialization:
async def init_tram_channel(
    channel_name: str,
    app: web.Application
):
    if channel_name not in app['tram_producers']:
        channel, = await app['redis'].subscribe(channel_name)
        app['tram_producers'][channel_name] = TramProducer(channel)
I think it may be helpful for somebody.
Full project here https://gitlab.com/tram-emulator/tram-server
I guess a message is received from one Redis subscription only once, and if there is more than one listener in your app, then only one of them will get it.
So you need to create something like mini pub/sub inside the application to distribute the messages to all listeners (websocket connections in this case).
Some time ago I've made an aiohttp websocket chat example - not with Redis, but at least the cross-websocket distribution is there: https://github.com/messa/aiohttp-nextjs-demo-chat/blob/master/chat_web/views/api.py
The key is to have an application-wide message_subcriptions, where every websocket connection registers itself, or perhaps its own asyncio.Queue (I've used Event in my example, but that's suboptimal), and whenever message comes from Redis, it is pushed to all relevant queues.
Of course when websocket connection ends (client unsubscribe, disconnect, failure...) the queue should be removed (and possibly Redis subscription cancelled if it was the last connection listening to it).
Asyncio doesn’t mean we should forget about queues :) Also it’s good to get familiar with combining multiple tasks at once (reading from websocket, reading from message queue, perhaps reading from some notification queue...). Using queues can also help you to handle client reconnects more cleanly (without loss of any messages).
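To make the mini pub/sub idea more concrete, here is a minimal sketch with one asyncio.Queue per listener. The Broadcaster class, its method names and the sample messages are hypothetical and exist only for illustration. In a real app, publish() would be called by the single task that reads the Redis subscription, and each websocket handler would play the role of websocket_listener.

```python
import asyncio


class Broadcaster:
    """Hypothetical in-process pub/sub: one queue per listener."""

    def __init__(self) -> None:
        self._queues: set = set()

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue()
        self._queues.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._queues.discard(q)

    def publish(self, message) -> None:
        # Push the message to every registered listener queue.
        for q in self._queues:
            q.put_nowait(message)


async def websocket_listener(broadcaster: Broadcaster, received: list, n: int) -> None:
    q = broadcaster.subscribe()
    try:
        for _ in range(n):
            received.append(await q.get())  # stand-in for ws.send_json(...)
    finally:
        # Remove the queue when the connection ends.
        broadcaster.unsubscribe(q)


async def main():
    b = Broadcaster()
    a_msgs, b_msgs = [], []
    listeners = [
        asyncio.create_task(websocket_listener(b, a_msgs, 2)),
        asyncio.create_task(websocket_listener(b, b_msgs, 2)),
    ]
    await asyncio.sleep(0)  # let both listeners subscribe
    for msg in ("tram at stop 1", "tram at stop 2"):
        b.publish(msg)  # stand-in for a message read from Redis
    await asyncio.gather(*listeners)
    return a_msgs, b_msgs


a_msgs, b_msgs = asyncio.run(main())
print(a_msgs == b_msgs, a_msgs)
```

Both listeners receive every message, even though each message is published only once. Unbounded queues are used here for brevity; a slow client would need a size limit or a drop policy.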