I found this library for asynchronously consuming Kafka messages: https://github.com/aio-libs/aiokafka
It gives this code example:
from aiokafka import AIOKafkaConsumer
import asyncio

async def consume():
    consumer = AIOKafkaConsumer(
        'redacted',
        bootstrap_servers='redacted',
        auto_offset_reset="earliest"
        # group_id="my-group"
    )
    # Get cluster layout and join group `my-group`
    await consumer.start()
    try:
        # Consume messages
        async for msg in consumer:
            print("consumed: ", msg.topic, msg.partition, msg.offset,
                  msg.key, msg.value, msg.timestamp)
    finally:
        # Will leave consumer group; perform autocommit if enabled.
        await consumer.stop()

asyncio.run(consume())
I would like to find the biggest Kafka message using this code. So, inside the async for loop I need to do max_size = max(max_size, len(msg.value)). But I suspect this won't be thread-safe, and that I need to lock access to it?
try:
    max_size = -1
    # Consume messages
    async for msg in consumer:
        max_size = max(max_size, len(msg.value))  # do I need to lock this code?
How do I do this in Python? I've checked out this page: https://docs.python.org/3/library/asyncio-sync.html and I'm confused, because those synchronization primitives are not thread-safe, so I can't use them in a multithreaded context? I come from a Java background and need to write this script, so pardon me for not having read all the asyncio books out there.
Is my understanding correct that the body of the async for loop is a continuation that may be scheduled on a separate thread once the asynchronous operation completes?
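For what it's worth, this assumption can be checked directly. asyncio, unlike a Java executor, resumes every coroutine on the event loop's own single thread (unless work is explicitly handed to a thread pool via run_in_executor), so a plain variable such as max_size needs no lock. A minimal, standalone sketch, independent of Kafka, that logs the thread id from two interleaved coroutines:

import asyncio
import threading

async def tick(name):
    for _ in range(3):
        # every step of every coroutine runs on the event loop's thread
        print(name, "running on thread", threading.get_ident())
        await asyncio.sleep(0.1)

async def main():
    # the two coroutines interleave, but the printed thread id never changes
    await asyncio.gather(tick("a"), tick("b"))

asyncio.run(main())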
I've just started using RabbitMQ via aio-pika, and I have multiple queue names to consume.
So far I follow this tutorial in worker.py, but with multiple queue declarations, so it looks like this:
import asyncio
from aio_pika import connect

async def main() -> None:
    # Perform connection
    connection = await connect(Settings.RABBIT_URL)
    async with connection:
        # Creating a channel
        channel = await connection.channel()
        await channel.set_qos(prefetch_count=0)

        # Declaring queues
        queue = await channel.declare_queue(
            "queue_1",
            durable=True,
        )
        queue2 = await channel.declare_queue(
            "queue_2",
            durable=True,
        )
        queue3 = await channel.declare_queue(
            "queue_3",
            durable=True,
        )

        # Start consuming from each declared queue
        await queue.consume(on_message)
        await queue2.consume(on_message)
        await queue3.consume(on_message)

        print(" [*] Waiting for messages. To exit press CTRL+C")
        await asyncio.Future()
The thing is, I need to make this flexible enough to declare as many queues as there are queue names I can fetch from the database. So first, is my method of declaring multiple queues correct, and second, how do I declare queues based on a list of queue names?
Thank you.
Ok, after some trial and error: basically the way I declare queues is right, and the next problem, declaring and consuming queues dynamically, can be done by looping over the query results and using locals() to initialize variables dynamically.
queues = dict()
for q in range(0, len(query)):
    queue_name = query[q]['result']
    locals()['queues_{0}'.format(q)] = await channel.declare_queue(queue_name, durable=True)
    await locals()['queues_{0}'.format(q)].consume(on_message)
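A caveat: in CPython, writes into locals() inside a function are not guaranteed to persist, so the queues dict that is already being created is the safer place to keep the queue objects. A hedged sketch of the same loop using the dict (query, channel, and on_message as in the snippet above):

queues = {}
for row in query:
    queue_name = row['result']
    # store each queue object under its name instead of synthesizing variable names
    queues[queue_name] = await channel.declare_queue(queue_name, durable=True)
    await queues[queue_name].consume(on_message)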
I’m trying to write a multiplayer card game for a friend.
I want to make it simple enough that they (who are new-ish to Python) can easily understand the code, while still being fully featured.
I would like to have multiple clients connecting to the server, and my idea was to send "messages", which are JSON. As messages arrive, they are paired with an ID and put in a queue, which is a list. The game engine code can then pop messages off, process them, and append messages to an outgoing queue.
I was wondering what the easiest way to implement this would be, or if there is a simpler way I should be considering.
I’ve seen some people using socketserver, others using asyncio, and some people using threading.
I figured out a possible solution, using asyncio:
import asyncio
import json

outgoing = asyncio.Queue()
incoming = asyncio.Queue()

async def handle_incoming(reader):
    buffer = ""
    while True:
        buffer += (await reader.read(1)).decode('utf-8')
        if '\r\n' in buffer:
            packet, buffer = buffer.split('\r\n', 1)
            message = json.loads(packet)
            await incoming.put(message)

async def handle_outgoing(writer):
    while True:
        message = await outgoing.get()
        packet = json.dumps(message) + '\r\n'
        writer.write(packet.encode('utf-8'))
        await writer.drain()

async def handle_client(reader, writer):
    asyncio.create_task(handle_incoming(reader))
    asyncio.create_task(handle_outgoing(writer))

async def main():
    # This is where you would do your main program logic
    while True:
        print(await incoming.get())

async def run_server():
    asyncio.create_task(main())
    server = await asyncio.start_server(handle_client, 'localhost', 15555)
    async with server:
        await server.serve_forever()

asyncio.run(run_server())
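To exercise this server, a minimal client along the following lines should work; the host, port, and '\r\n' framing match the server above, while the payload itself is an arbitrary example:

import asyncio
import json

async def client():
    reader, writer = await asyncio.open_connection('localhost', 15555)
    # frame the JSON message with the same '\r\n' delimiter the server splits on
    writer.write((json.dumps({"id": 1, "action": "join"}) + '\r\n').encode('utf-8'))
    await writer.drain()
    writer.close()
    await writer.wait_closed()

asyncio.run(client())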
We're getting started with Django Channels and are struggling with the following use case:
Our app receives multiple requests from a single client (another server) in a short time. Creating each response takes a long time. The order in which responses are sent to the client doesn't matter.
We want to keep an open WebSocket connection to reduce connection overhead for sending many requests and responses from and to the same client.
Django Channels seems to process messages on the same WebSocket connection strictly in order, and won't start processing the next frame before the previous one has been responded to.
Consider the following example:
Example
Server-side
import asyncio
from channels.generic.websocket import AsyncWebsocketConsumer

class QuestionConsumer(AsyncWebsocketConsumer):
    async def websocket_connect(self, event):
        await self.accept()

    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        return {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")

    async def receive(self, text_data=None, bytes_data=None):
        # while awaiting below, we should start processing the next WS frame
        answer = await self.complicated_answer(text_data)
        await self.send(answer)
asgi.py:
from django.urls import re_path
from channels.routing import ProtocolTypeRouter, URLRouter

application = ProtocolTypeRouter(
    {
        "websocket": URLRouter([
            re_path(r"^questions", QuestionConsumer.as_asgi(), name="questions"),
        ])
    }
)
Client-side
import asyncio
import websockets
from time import time

async def main():
    async with websockets.connect("ws://0.0.0.0:8000/questions") as ws:
        tasks = []
        for m in [
            "What is the Answer to Life, The Universe and Everything?",
            "Why?",
        ]:
            tasks.append(ws.send(m))
        # send all requests (without waiting for responses)
        time_before = time()
        await asyncio.gather(*tasks)
        # wait for responses
        for t in tasks:
            print(await ws.recv())
            print("{:.1f} seconds since first request".format(time() - time_before))

asyncio.get_event_loop().run_until_complete(main())
Result
Actual
42
3.0 seconds since first request
Because.
6.0 seconds since first request
Desired
42
3.0 seconds since first request
Because.
3.0 seconds since first request
In other words, we would like the event loop to switch between async tasks not only for multiple consumers, but also for all tasks handled by the same consumer. Is this possible or is there a workaround we are overlooking? Have you used Django Channels for similar challenges and how did you solve them?
The consumer's receive function is called sequentially for each incoming WebSocket message; when the await in the first receive is reached, receive has not yet been called for the second message, so there is no second coroutine to switch to yet. I couldn't find a source for this, but I'm guessing it is part of the ASGI protocol itself. For many use cases, handling WebSocket messages strictly in the order of receiving is probably desired.
The solution to handle messages asynchronously is to not send the response from the receive method, but instead send the response from a coroutine scheduled through loop.create_task.
Scheduling the long-running coroutine which generates the response allows receive to complete, and the next receive to begin. Once the second message's response generation has been scheduled, two coroutines are scheduled, and the event loop can switch between them to execute them concurrently.
For the example in the question, this is the solution I found:
class QuestionConsumer(AsyncWebsocketConsumer):
    async def complicated_answer(self, question):
        await asyncio.sleep(3)
        answer = {
            "What is the Answer to Life, The Universe and Everything?": "42",
            "Why?": "Because.",
        }.get(question, "Don't know")
        # instead of returning the answer, send it directly to client as a response
        await self.send(answer)

    async def receive(self, text_data=None, bytes_data=None):
        # instead of awaiting, schedule the coroutine
        loop = asyncio.get_running_loop()
        loop.create_task(
            self.complicated_answer(text_data)
        )
The output of this altered consumer matches the desired output given by the question. Note that responses may be returned out of order, and clients are responsible for matching requests to responses.
Note that for Python versions <3.7, get_event_loop should be used instead of get_running_loop.
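One caveat worth adding: the event loop keeps only weak references to tasks, so a task created this way can in principle be garbage-collected before it finishes. A hedged variation that holds strong references on the consumer (the _tasks attribute is an addition for illustration, not part of the original answer):

class QuestionConsumer(AsyncWebsocketConsumer):
    async def receive(self, text_data=None, bytes_data=None):
        # _tasks holds in-flight tasks so the loop's weak references suffice
        if not hasattr(self, "_tasks"):
            self._tasks = set()
        task = asyncio.get_running_loop().create_task(
            self.complicated_answer(text_data)
        )
        # keep a strong reference until the task completes
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)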
For two projects I rely on an asyncio producer-consumer model to work through some tasks.
The producers work off messages that come in from either MQTT or ZeroMQ.
This is the code of interest (running Python 3.7):
async def Producer(client, topic_filter, queue):
    async with client.filtered_messages(topic_filter) as messages:
        async for message in messages:
            message = message.payload.decode()
            await queue.put(message)
            OutputText('Added element to queue.')

async def Consumer(client, queue: asyncio.Queue):
    while True:
        item = await queue.get()
        await DoTask(client, item)
        await asyncio.sleep(timedelay)
        queue.task_done()
When I first start this code it works as expected. But after running for some time I find that the consumer stops working. When this happens I can still send messages to the script: the log file shows the printout that the element was added to the queue, but the consumer isn't triggered and remains idle.
I found that this normally happens when the machine it is running on has to use swap memory or hits 100% CPU usage. Therefore I am guessing that the consumer no longer has a proper connection with the queue, but I could be wrong.
Since I don't get any errors when this happens, it is very hard to debug. Any ideas on how to debug this would be great.
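(For reference, one low-effort way to see what the loop is doing when the stall occurs is a small watchdog coroutine like the sketch below; the 30-second interval is arbitrary, and enabling asyncio's debug mode with loop.set_debug(True) or PYTHONASYNCIODEBUG=1 will additionally log callbacks that block the loop for too long.)

import asyncio

async def Watchdog(queue: asyncio.Queue, interval: float = 30.0):
    while True:
        # periodically log queue depth and every task the loop knows about
        tasks = asyncio.all_tasks()
        print(f"queue size={queue.qsize()}, tasks={len(tasks)}")
        for t in tasks:
            print(" ", t)
        await asyncio.sleep(interval)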
Cheers,
Hilbert
I am working on a bot that streams posts from the Steem blockchain (using the synchronous beem library) and sends posts that fulfil certain criteria to a Discord channel (using the asynchronous Discord.py library). This is my (simplified) code:
import asyncio
from discord.ext import commands

bot = commands.Bot(command_prefix="!")
loop = asyncio.get_event_loop()

async def send_discord(msg):
    await bot.wait_until_ready()
    await bot.send_message(bot.get_channel("mychannelid"), msg)

async def scan_post(post):
    """Scan queued Comment objects for defined patterns"""
    post.refresh()
    if post["author"] == "myusername":
        await loop.create_task(send_discord("New post found"))

async def start_blockchain():
    # wrap each raw stream op in a Comment object
    stream = map(Comment, blockchain.stream(opNames=["comment"]))
    for post in stream:
        await loop.create_task(scan_post(post))

if __name__ == '__main__':
    while True:
        loop.create_task(start_blockchain())
        try:
            loop.run_until_complete(bot.start(TOKEN))
        except Exception as error:
            bot.logout()
            logger.warning("Bot restarting " + repr(error))
Before I implemented discord.py I would just call the synchronous function scan_post(post) and it worked just fine, but now with the asynchronous implementation the posts are not processed fast enough and the stream has a rapidly increasing delay. If I make scan_post(post) a synchronous function, the processing time is fine, but the Discord websocket closes (or does not even open) and the bot goes offline. How can I solve this in a simple way (without rewriting the beem library)?
I solved the problem: I run the beem stream in its own thread and the asynchronous functions in a second thread. With the janus library I can then add objects from the beem thread to a queue that is processed by the asynchronous thread.
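A minimal sketch of that arrangement, assuming the blockchain stream and scan_post from the question; janus exposes one queue to both worlds through its sync_q and async_q interfaces:

import asyncio
import threading
import janus

def beem_worker(sync_q):
    # runs in a plain thread: the blocking beem stream can block freely here
    for post in blockchain.stream(opNames=["comment"]):
        sync_q.put(post)

async def consumer(async_q):
    while True:
        post = await async_q.get()
        await scan_post(post)
        async_q.task_done()

async def main():
    queue = janus.Queue()  # one queue, two interfaces: queue.sync_q / queue.async_q
    threading.Thread(target=beem_worker, args=(queue.sync_q,), daemon=True).start()
    await consumer(queue.async_q)

asyncio.run(main())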