How to pass bidirectional messages between two Python programs using kafka-python

In my computer vision project, I want to send images from the webots-controller to the AI-model as inputs, and then send movements from the AI-model back to the webots-controller. (For the bidirectional message passing, I use two topics.)
I wrote this simple code to pass messages, but it doesn't work. What should I do?
Code
# AI-model.py
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
consumer = KafkaConsumer('model-mailbox')

while True:
    img = consumer.__next__()
    print(img.key)
    print('a-received')

    producer.send('webots-mailbox', key=b'movement', value=b'a')
    producer.flush()
    print('a-sent')
# webots-controller.py
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
consumer = KafkaConsumer('webots-mailbox')

while True:
    producer.send('model-mailbox', key=b'image', value=b'b')
    producer.flush()
    print('b-sent')

    movement = consumer.__next__()
    print(movement.key)
    print('b-received')
Output
These are the console outputs. (I run the AI model first.)
matin@matin:~/ python AI-model.py
b'image'
a-received
a-sent
As you can see, the webots-controller doesn't receive any messages.
matin@matin:~/ python webots-controller.py
b-sent
Extra
Also worth mentioning: when I comment out the a-received part, my messages arrive at the b-received part, and the console shows this output.
matin@matin:~/ python webots-controller.py
b-sent
b'movement'
b-received
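One explanation consistent with this behavior: by default, kafka-python consumers start at the latest offset and only get their partition assignment on the first poll, so a reply produced in that window is silently skipped, and a tight loop of 'a' messages (the commented-out variant) masks the race. A minimal sketch of the controller side that forces the assignment first; the group id and the timeout are assumptions:

# webots-controller.py (sketch)
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
consumer = KafkaConsumer(
    'webots-mailbox',
    bootstrap_servers='localhost:9092',
    group_id='webots',               # assumption: any stable group id
    auto_offset_reset='earliest',    # don't skip replies sent before joining
)

# Force the partition assignment before the first image is sent, so the
# consumer is already positioned when the model's reply arrives.
consumer.poll(timeout_ms=1000)

while True:
    producer.send('model-mailbox', key=b'image', value=b'b')
    producer.flush()
    print('b-sent')

    movement = next(consumer)
    print(movement.key)
    print('b-received')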

Related

Python script is running in the IDE, but not in the terminal (Kafka)

This may or may not be Kafka related, but I encountered this while learning Kafka. I've got a Python producer script that looks like this:
from kafka import KafkaProducer
from json import dumps

class Producer:
    def __init__(self):
        self.connection = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda x: dumps(x).encode('utf-8')
        )

    def push_client(self, data):
        self.connection.send('client-pusher', value=data)

data = {
    "first_name": "Davey",
    "email": "davey@dave.com",
    "group_id": 3,
    "date": "2021-12-12"
}

producer = Producer()
producer.push_client(data)
I'm running the Kafka Broker in Docker, and the messages get consumed on the other end by this script:
import json
from datetime import date
from typing import Optional

from kafka import KafkaConsumer
from pydantic import BaseModel

class Client(BaseModel):
    first_name: str
    email: str
    group_id: Optional[int] = None
    date: date

consumer = KafkaConsumer(
    'client-pusher',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group-id',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

while True:
    msg_pack = consumer.poll(timeout_ms=500)
    for tp, messages in msg_pack.items():
        for message in messages:
            client = Client(**message.value)
            print(client)
The consumer script listens for new messages in an infinite loop. I can run the consumer in the terminal or in VS Code and it will always print out the data dict from the producer, but ONLY if I run the producer script in Visual Studio Code.
If I run the producer script in the terminal with
python producer.py
the messages don't come through to the consumer. There are no runtime errors (print statements in the producer come through fine). I cannot for the life of me see what's different about the environment in my IDE.
I have different virtual environments governing the two scripts. I've tried running the producer with the full path to the venv, copied straight from VS Code's terminal, for example
/home/me/whatever/dummy-producer/.venv/bin/python producer.py
I've also printed out everything in sys.path; they're identical between the IDE and the terminal.
What else might I try to find the difference between VS Code's execution and the terminal's? I'm using zsh, if that matters.
Kafka clients don't send messages immediately; if you have less than the default batch size buffered and the app exits, you're effectively dropping events.
If you want to send immediately, you need one more method call in the producer:
def push_client(self, data):
    self.connection.send('client-pusher', value=data)
    self.connection.flush()
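If you'd rather confirm delivery per message than flush a whole batch, send() returns a future you can block on. A sketch of the same method (the 10-second timeout is an assumption):

def push_client(self, data):
    # send() returns a FutureRecordMetadata; get() blocks until the broker
    # acknowledges the record, or raises on failure/timeout.
    future = self.connection.send('client-pusher', value=data)
    metadata = future.get(timeout=10)
    print(metadata.topic, metadata.partition, metadata.offset)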

Python Boto3 not receiving messages but SQS shows in flight

I have a Docker container which fetches messages from a standard SQS queue. Most of the time, the code shows it received zero messages and exits, while the SQS console shows the messages under "Messages in flight", so the messages were received by some consumer.
This is my Docker entry point:
ENV PYTHONPATH="$PYTHONPATH:/app"
ENTRYPOINT [ "python3" ]
CMD ["multi.py"]
This is the multi.py code:
import multiprocessing as mp
import subprocess

def s():
    subprocess.call(['python3', 'script.py'])

n_process = min(mp.cpu_count(), 8)
process = []
for i in range(n_process):
    p = mp.Process(target=s)
    process.append(p)
    p.start()

for p in process:
    p.join()
This is the part of script.py which calls receive_messages:
import boto3

sqs = boto3.resource('sqs', region_name=REGION, aws_access_key_id=ACCESS_KEY,
                     aws_secret_access_key=SECRET_KEY)
queue = sqs.get_queue_by_name(QueueName=QUEUE_NAME)

def main():
    while True:
        m = queue.receive_messages()
        for message in m:
            process_message(message)
            message.delete()
Also, the Docker container works about 60% of the time, but I'm trying to figure out why it fails.
PS: Solved
This is from the boto3 docs:
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. If this happens, repeat the request.
m = queue.receive_messages(WaitTimeSeconds=5)
This resolves the issue because when there are very few messages in the queue, a short poll is likely to return nothing.
You can read about short polling in the boto3 docs here:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sqs.html#SQS.Queue.receive_messages
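Putting it together, the consumer loop with long polling might look like this (a sketch; 10 seconds and a batch of 10 are assumptions, within the SQS limits of 20 seconds and 10 messages):

def main():
    while True:
        # Long poll: wait up to 10 seconds for messages to arrive instead of
        # sampling a random subset of hosts and returning an empty list.
        messages = queue.receive_messages(
            MaxNumberOfMessages=10,
            WaitTimeSeconds=10,
        )
        for message in messages:
            process_message(message)
            message.delete()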

Why is my unit test for Kafka in Python not working?

import json
import unittest
from json import dumps

from kafka import KafkaProducer, errors, admin, KafkaConsumer

SERVERS = ['localhost:9092']
TEST_TOPIC = 'test-topic'
DATA = [{'A': 'A'}, {'A': 'A'}, {'A': 'A'}]

class TestKafkaConsumer(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls._producer = KafkaProducer(bootstrap_servers=SERVERS,
                                      value_serializer=lambda x: dumps(x).encode('utf-8'))

    def _send_data(self):
        for data in DATA:
            print(self._producer.send(TEST_TOPIC, value=data))

    def test_basic_processing(self):
        self._send_data()
        received = []
        consumer = KafkaConsumer(TEST_TOPIC, bootstrap_servers=SERVERS)
        for msg in consumer:
            message = json.loads(msg.value.decode('utf-8'))
            received.append(message)
            if len(received) >= len(DATA):
                self.assertEqual(received, DATA)
This should succeed pretty quickly, as it just sends the data to the Kafka broker in a pretty straightforward manner. However, it times out; the consumer never reads a single message. If I move the consumer portion to a different file and run it in a different terminal window, the messages are consumed almost instantly. Why does the consumer not work inside this unit test?
You're producing records with your producer and only then reading; this might be your problem.
By the time your consumer starts, the records have already been produced, so from the consumer's point of view there are no new messages.
You should run your consumer in a different thread, before your producer starts producing.
Yannick
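A sketch of that ordering, assuming the same local broker: the consumer runs in a background thread, forces its partition assignment with an initial poll(), and signals the main thread before anything is produced. The timeouts are assumptions, there so the test fails instead of hanging:

import json
import threading
import unittest
from json import dumps

from kafka import KafkaConsumer, KafkaProducer

SERVERS = ['localhost:9092']
TEST_TOPIC = 'test-topic'
DATA = [{'A': 'A'}, {'A': 'A'}, {'A': 'A'}]

class TestKafkaConsumer(unittest.TestCase):
    def test_basic_processing(self):
        received = []
        ready = threading.Event()

        def consume():
            consumer = KafkaConsumer(
                TEST_TOPIC,
                bootstrap_servers=SERVERS,
                consumer_timeout_ms=10000,  # stop iterating after 10 s idle
            )
            consumer.poll(timeout_ms=1000)  # force partition assignment
            ready.set()                     # now it is safe to produce
            for msg in consumer:
                received.append(json.loads(msg.value.decode('utf-8')))
                if len(received) >= len(DATA):
                    break

        thread = threading.Thread(target=consume)
        thread.start()
        ready.wait()

        producer = KafkaProducer(bootstrap_servers=SERVERS,
                                 value_serializer=lambda x: dumps(x).encode('utf-8'))
        for data in DATA:
            producer.send(TEST_TOPIC, value=data)
        producer.flush()

        thread.join()
        self.assertEqual(received, DATA)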

Cannot get KafkaConsumer messages without hanging

I've written some data into my one-partition topic with a KafkaProducer, and I'm trying to view this data with a KafkaConsumer, either by looping over the consumer or by calling poll():
import time
from datetime import datetime, timedelta

from kafka import KafkaProducer, KafkaConsumer, TopicPartition

topic_name = 'my-topic'

consumer = KafkaConsumer(bootstrap_servers='localhost:9092', group_id='my-group',
                         enable_auto_commit=False)
tp = TopicPartition(topic_name, 0)
consumer.assign([tp])
consumer.seek_to_end(tp)
last_offset = consumer.position(tp)

producer = KafkaProducer(bootstrap_servers='localhost:9092')
stopWriting = datetime.now() + timedelta(seconds=10)
while datetime.now() < stopWriting:
    producer.send(topic='my-topic', value=str(datetime.now()).encode('utf-8'))
    time.sleep(1)
producer.close()

consumer.seek(tp, last_offset)

# looping through the consumer
for msg in consumer:
    print(msg)

# or looping through the polled messages
for msg in consumer.poll():
    print(msg)
Neither one seems to work properly. The consumer loop does print out the messages, but always ends up hanging in an infinite loop inside kafka/consumer/group.py(886)_message_generator. The poll loop doesn't print anything at all. Is there something I'm missing to read out all of the newly produced messages without hanging the program? I'm using Python 3.6.1 and kafka-python version 1.3.4.
I found a way with poll(). First, you need a timeout with it, because none of the messages are in the buffer. Next, it returns a dict of {TopicPartition: [ConsumerRecord]}, so you need to specify the topic partition you want to read the messages from.
import sys

records = consumer.poll(timeout_ms=sys.maxsize)
for record in records[tp]:
    print(record)
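If blocking for sys.maxsize milliseconds is undesirable, a finite timeout in a loop works too (a sketch; the 1-second timeout is an assumption):

while True:
    records = consumer.poll(timeout_ms=1000)
    if not records:
        break  # nothing arrived within the timeout; stop instead of hanging
    for partition, msgs in records.items():
        for record in msgs:
            print(record)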

kafka consumer on single node multi broker configuration

Has anybody worked with kafka-python in a single-node, multi-broker setup?
I was able to produce and consume data with the single-node, single-broker settings, but when I changed to single-node, multi-broker, data was produced and stored in the topics; when I run the consumer code, though, no data is consumed.
Any suggestions on the above would be appreciated. Thanks in advance!
Note: all the configurations, like the producer, consumer, and server properties, were verified and are fine.
Producer code:
import json

from kafka.producer import KafkaProducer

def producer():
    data = {'desc': 'testing', 'data': 'testing single node multi broker'}
    topic = 'INTERNAL'
    producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode('utf-8'),
                             bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"])
    producer.send(topic, data)
    producer.flush()
Consumer code:
from kafka.consumer import KafkaConsumer

def consumer():
    topic = 'INTERNAL'
    consumer = KafkaConsumer(topic, bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"])
    for data in consumer:
        print(data)
Server 1 config (I have added two more server files like this for the other brokers, with the same parameters except for the broker.id and log.dirs values):
broker.id=1
port=9092
num.network.threads=3
log.dirs=/tmp/kafka-logs-1
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=localhost:2181
delete.topic.enable=true
Producer config:
metadata.broker.list=localhost:9092,localhost:9093,localhost:9094
Consumer config:
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=6000
Do you receive the messages with a simple Kafka console consumer?
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic INTERNAL --from-beginning
Or with this one:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic INTERNAL
If you get the messages with the second command, try deleting the log.dirs directories of your brokers and the log files in /tmp/zookeeper/version-2/. Then restart ZooKeeper and your brokers, and create your topic again.
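A kafka-python equivalent of the first console command, reading from the beginning of the topic, is also worth trying (a sketch under the same broker setup):

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'INTERNAL',
    bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"],
    auto_offset_reset='earliest',  # like --from-beginning on the console consumer
)
for data in consumer:
    print(data)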
