Has anybody worked with kafka-python in a single-node, multi-broker setup?
I was able to produce and consume data with a single-node, single-broker configuration, but after switching to single-node multi-broker, data is produced and stored in the topics, yet nothing is consumed when I run the consumer code.
Any suggestions would be appreciated. Thanks in advance!
Note: all the configurations (producer, consumer, and server properties) were verified and look fine.
Producer code:
import json
from kafka.producer import KafkaProducer

def producer():
    data = {'desc': 'testing', 'data': 'testing single node multi broker'}
    topic = 'INTERNAL'
    producer = KafkaProducer(
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
        bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"]
    )
    producer.send(topic, data)
    producer.flush()
Consumer code:
from kafka.consumer import KafkaConsumer

def consumer():
    topic = 'INTERNAL'
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"]
    )
    for data in consumer:
        print(data)
Server 1 config: I added two more server property files like this, with the same parameters for the other brokers, differing only in the broker.id and log.dirs values.
broker.id=1
port=9092
num.network.threads=3
log.dirs=/tmp/kafka-logs-1
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=localhost:2181
delete.topic.enable=true
Producer config:
metadata.broker.list=localhost:9092,localhost:9093,localhost:9094
Consumer config:
zookeeper.connect=127.0.0.1:2181
zookeeper.connection.timeout.ms=6000
Do you receive the messages with a simple Kafka console consumer?
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic INTERNAL --from-beginning
Or with this one:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic INTERNAL
If you get the messages with the second command, try deleting the log.dirs directories of your brokers and the log files in /tmp/zookeeper/version-2/. Then restart ZooKeeper and your brokers, and create your topic again.
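Before wiping log directories, it can also help to confirm that the data really landed and that every partition has a reachable leader. A minimal kafka-python sketch (topic name and ports taken from the question; `count_messages` and `describe_topic` are hypothetical helpers, not part of the kafka-python API):

```python
def count_messages(beginning, end):
    """Per-partition message counts, given {partition: offset} dicts."""
    return {p: end[p] - beginning[p] for p in end}

def describe_topic(consumer, topic):
    """Print how many messages each partition of `topic` currently holds."""
    from kafka import TopicPartition  # local import; needs kafka-python
    parts = consumer.partitions_for_topic(topic) or set()
    tps = [TopicPartition(topic, p) for p in sorted(parts)]
    # beginning_offsets/end_offsets query the brokers without consuming
    begin = {tp.partition: off for tp, off in consumer.beginning_offsets(tps).items()}
    end = {tp.partition: off for tp, off in consumer.end_offsets(tps).items()}
    print(count_messages(begin, end))

# With the three brokers from the question running:
# from kafka import KafkaConsumer
# describe_topic(KafkaConsumer(bootstrap_servers=["localhost:9092",
#                "localhost:9093", "localhost:9094"]), 'INTERNAL')
```

If every end offset equals its beginning offset, the producer side is the problem; if the counts are non-zero, the issue is on the consumer side.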
Related
In my computer vision project, I want to send images from the webots-controller to the AI-model as inputs, and then send movements from the AI-model back to the webots-controller. (For this bidirectional message passing I use two topics.)
I wrote this simple code to pass messages, but it doesn't work. What should I do?
Code
# AI-model.py
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
consumer = KafkaConsumer('model-mailbox')

while True:
    img = next(consumer)
    print(img.key)
    print('a-received')
    producer.send('webots-mailbox', key=b'movement', value=b'a')
    producer.flush()
    print('a-sent')
# webots-controller.py
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
consumer = KafkaConsumer('webots-mailbox')

while True:
    producer.send('model-mailbox', key=b'image', value=b'b')
    producer.flush()
    print('b-sent')
    movement = next(consumer)
    print(movement.key)
    print('b-received')
Output
These are the console outputs (I run the AI model first):
matin@matin:~$ python AI-model.py
b'image'
a-received
a-sent
As you can see, the webots-controller doesn't receive any message.
matin@matin:~$ python webots-controller.py
b-sent
Extra x)
Also worth mentioning: when I comment out the a-received part, my messages arrive at the b-received part and the console shows this output.
matin@matin:~$ python webots-controller.py
b-sent
b'movement'
b-received
This may or may not be Kafka related, but I encountered this while learning Kafka. I've got a python producer script that looks like this:
from kafka import KafkaProducer
from json import dumps

class Producer:
    def __init__(self):
        self.connection = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda x: dumps(x).encode('utf-8')
        )

    def push_client(self, data):
        self.connection.send('client-pusher', value=data)

data = {
    "first_name": "Davey",
    "email": "davey@dave.com",
    "group_id": 3,
    "date": "2021-12-12"
}

producer = Producer()
producer.push_client(data)
I'm running the Kafka Broker in Docker, and the messages get consumed on the other end by this script:
import json
from datetime import date
from typing import Optional
from kafka import KafkaConsumer
from pydantic import BaseModel

class Client(BaseModel):
    first_name: str
    email: str
    group_id: Optional[int] = None
    date: date

consumer = KafkaConsumer(
    'client-pusher',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group-id',
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

while True:
    msg_pack = consumer.poll(timeout_ms=500)
    for tp, messages in msg_pack.items():
        for message in messages:
            client = Client(**message.value)
            print(client)
The consumer script listens for new messages on an infinite loop. I can run the consumer in terminal or in vscode and it will always print out the data dict from the producer, but ONLY if I run the producer script in Visual Studio code.
If I run the producer script in the terminal with
python producer.py
the messages don't come through to the consumer. There are no runtime errors (print statements in the producer come through fine). I cannot for the life of me see what's different about the environment in my IDE.
I have different virtual environments governing both scripts. I've tried running the producer with the full path to the venv, copied straight from vscode's terminal, for example
/home/me/whatever/dummy-producer/.venv/bin/python producer.py
I've also printed out everything in sys.path; the lists are identical between the IDE and the terminal.
What else might I try to find the difference between vscode's execution and the terminal's? I'm using zsh, if that matters.
Kafka clients don't send messages immediately; the producer buffers them, and if you have less than the default batch size buffered when the app exits, you're effectively dropping events.
If you want to send immediately, you need one more call in the producer:
def push_client(self, data):
    self.connection.send('client-pusher', value=data)
    self.connection.flush()
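For context, the batching behaviour that makes flush() necessary is tunable. A sketch under the question's setup (the parameter values are illustrative, and `make_producer` is a hypothetical helper):

```python
from json import dumps

# Same serializer as the question's Producer class.
def serialize(value):
    return dumps(value).encode('utf-8')

def make_producer():
    # Local import so the sketch can be read without kafka-python installed.
    from kafka import KafkaProducer
    # linger_ms: how long send() may buffer records before a batch is shipped.
    # batch_size: batch capacity in bytes; a full batch is sent immediately.
    return KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=serialize,
        linger_ms=10,
        batch_size=16384,
    )

# With a broker on localhost:9092:
# producer = make_producer()
# producer.send('client-pusher', value={'first_name': 'Davey'})
# producer.flush()  # blocks until every buffered record is acknowledged
```

Even with tuned batching, a short-lived script should still call flush() (or close()) before exiting.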
I have the following Kafka consumer. It works well if the group_id is set to None: it receives all historical messages as well as my newly produced test message.
consumer = KafkaConsumer(
    topic,
    bootstrap_servers=bootstrap_servers,
    auto_offset_reset=auto_offset_reset,
    enable_auto_commit=enable_auto_commit,
    group_id=group_id,
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
for m in consumer:
However, it doesn't receive anything if I set the group_id to some value. I ran the test producer to send new messages, and nothing was received.
The consumer console does show the following message:
2020-11-07 00:56:01 INFO ThreadPoolExecutor-0_0 base.py (Re-)joining group my_group
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 base.py Successfully joined group my_group with generation 497
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 subscription_state.py Updated partition assignment: []
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 consumer.py Setting newly assigned partitions set() for group my_group
One partition of a topic can only be consumed by one consumer within the same ConsumerGroup.
If you do not set the group.id, the KafkaConsumer will generate a new, random group.id for you. As this group.id is unique you will see data is being consumed.
If you have multiple consumers running with the identical group.id, only one consumer will read the data whereas the other one stays idle not consuming anything.
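The idle-consumer effect can be illustrated without a broker. Below is a toy round-robin sketch of how a group's partitions are spread over its members; the real assignment is negotiated by the group coordinator, and `assign_round_robin` is purely illustrative:

```python
def assign_round_robin(partitions, members):
    """Toy round-robin: spread partition ids over the members of one group."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment

# Three partitions, two consumers in one group: one member gets two
# partitions, the other gets one. With more members than partitions,
# the surplus members receive nothing and sit idle.
print(assign_round_robin([0, 1, 2], ['consumer-a', 'consumer-b']))
# → {'consumer-a': [0, 2], 'consumer-b': [1]}
```

An empty list for a member corresponds exactly to the "Updated partition assignment: []" log line in the question.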
I know this is not the solution to the author's problem. Still, if you landed here, you might be having this problem for another reason, same as I did.
At least for kafka-python v2.0.2 with an Aiven Kafka broker setup, the problem was solved by adding a dry call to consumer.poll().
This is especially weird since the call is not required when no group_id is assigned.
This version produces no output:
def get():
    for message in consumer:
        print(message.value)
    consumer.commit()
While the version below works as expected, reading only the new messages since the last commit():
def get():
    consumer.poll()
    for message in consumer:
        print(message.value)
    consumer.commit()
It outputs all messages in the topic since the last commit, as expected.
JFYI, the class constructor looks like this:
consumer = KafkaConsumer(
    topics,
    bootstrap_servers=self._service_uri,
    auto_offset_reset='earliest',
    enable_auto_commit=False,
    client_id='my_consumer_name',
    group_id=self.GROUP_ID,
    security_protocol="SSL",
    ssl_cafile=self._ca_path,
    ssl_certfile=self._cert_path,
    ssl_keyfile=self._key_path,
)
¯\_(ツ)_/¯
I have a Kafka machine running in AWS which hosts several topics.
I have the following Lambda function, which produces a message and pushes it to one of the Kafka topics.
import json
from kafka import KafkaClient, SimpleProducer, KafkaProducer

def lambda_handler(event, context):
    kafka = KafkaClient("XXXX.XXX.XX.XX:XXXX")
    print(kafka)
    producer = SimpleProducer(kafka, async=True)
    print(producer)
    task_op = {
        "message": "Hai, Calling from AWS Lambda"
    }
    print(json.dumps(task_op))
    producer.send_messages("topic_atx_ticket_update", json.dumps(task_op).encode('utf-8'))
    print(producer.send_messages)
    return "Messages Sent to Kafka Topic"
But I see the messages are not pushed as I expected.
Note: there are no issues with roles, policies, or connectivity.
When creating the Kafka producer object in
producer = SimpleProducer(kafka, async=True)
the async flag should be False:
producer = SimpleProducer(kafka, async=False)
Then you can send the Kafka message to a topic from AWS Lambda. (With async=True the message is queued on a background thread, and the Lambda runtime can exit before the message is actually sent.)
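Also note that `async` became a reserved word in Python 3.7, so the call above is a syntax error on modern Python, and SimpleProducer was removed from recent kafka-python releases. A hedged sketch of the same handler using the current KafkaProducer API (the broker placeholder is kept from the question; `encode` is a hypothetical helper):

```python
import json

def encode(value):
    """JSON-encode a payload the same way a value_serializer would."""
    return json.dumps(value).encode('utf-8')

def lambda_handler(event, context):
    # Local import so the sketch can be read without kafka-python installed.
    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers=["XXXX.XXX.XX.XX:XXXX"],  # placeholder kept from the question
        value_serializer=encode,
    )
    producer.send('topic_atx_ticket_update',
                  {'message': 'Hai, Calling from AWS Lambda'})
    # Lambda may freeze or recycle the runtime right after return,
    # so flush to make sure the batched record is actually delivered.
    producer.flush()
    return 'Messages sent to Kafka topic'
```

The flush() here plays the same role as async=False in the old API: it forces delivery before the handler exits.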
import json
import unittest
from json import dumps
from kafka import KafkaProducer, errors, admin, KafkaConsumer

SERVERS = ['localhost:9092']
TEST_TOPIC = 'test-topic'
DATA = [{'A': 'A'}, {'A': 'A'}, {'A': 'A'}]

class TestKafkaConsumer(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls._producer = KafkaProducer(
            bootstrap_servers=SERVERS,
            value_serializer=lambda x: dumps(x).encode('utf-8')
        )

    def _send_data(self):
        for data in DATA:
            print(self._producer.send(TEST_TOPIC, value=data))

    def test_basic_processing(self):
        self._send_data()
        received = []
        consumer = KafkaConsumer(TEST_TOPIC, bootstrap_servers=SERVERS)
        for msg in consumer:
            message = json.loads(msg.value.decode('utf-8'))
            received.append(message)
            if len(received) >= len(DATA):
                self.assertEqual(received, DATA)
This should succeed pretty quickly, as it just sends the data to the Kafka broker in a straightforward manner. However, it times out; the consumer never reads a single message. If I move the consumer portion to a different file and run it in a different terminal window, the messages are consumed almost instantly. Why does the consumer not work inside this unittest?
You're producing records with your producer and only then reading, and this might be your problem.
By the time your consumer starts, the records have already been produced, so from the consumer's point of view there are no new messages.
You should run your consumer in a different thread, started before your producer begins producing.
Yannick