from json import dumps
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='kf-p1l-node3:9092,xxxxx,xxxxx',
    value_serializer=lambda x: dumps(x).encode('utf-8')  # serialize values as UTF-8 JSON
)
consumer = KafkaConsumer(
    bootstrap_servers='rdwh-node1:49092,xxxxx,xxxxx',
    # bootstrap_servers='kf-p1l-node3:9092,xxxxx,xxxxx',
    auto_offset_reset=param["AUTO_OFFSET_RESET"],
    consumer_timeout_ms=param["CONSUMER_TIMEOUT_MS"],
    enable_auto_commit=False,
    auto_commit_interval_ms=60000,
    group_id=param["GROUP_ID"],
    client_id=param["CLIENT_ID"]
)
consumer.subscribe([param["TOPIC_IN"]])
This code works if the KafkaProducer and the KafkaConsumer use the same bootstrap_servers. But if I change the KafkaConsumer to a different server, it doesn't work.
bootstrap_servers must contain the servers used for establishing the initial connection to the Kafka cluster. The client will make use of all servers in the cluster irrespective of which servers are specified here for bootstrapping, so both of your clients have to point at the same cluster. You can check the documentation here: http://kafka.apache.org/090/documentation.html
consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['node1:port1', 'node1:port2', 'node2:port3'])
I load messages from a Kafka topic into a database. The database load can fail, and I do not want to lose unsent messages.
App code:
import faust

app = faust.App('App', broker='kafka://localhost:9092')
source_topic = app.topic('source_topic')
failed_channel = app.channel()  # channel for unsent messages

@app.agent(source_topic)
async def process(stream):
    async for batch in stream.take(100_000, within=60):
        # here we have no info about partitions and keys
        # to reuse them when resending if sending failed
        try:
            pass  # send to database; can fail
        except ConnectionError:
            for record in batch:
                # sending to a channel is faster than sending to a topic
                await failed_channel.send(value=record)

@app.agent(failed_channel)
async def resend_failed(stream):
    async for unsent_msg in stream:
        await source_topic.send(value=unsent_msg)
Maybe there is a more standard way to handle such situations? Adding app.topic('source_topic', acks=False) only takes effect after restarting the app.
I load messages from kafka topic to database
Maybe there is a more standard way to handle such situations
Yes - it's called Kafka Connect :-)
The standard pattern is to do any processing on your data and write it back to a Kafka topic. You then use that topic as the source for a Kafka Connect sink connector, in this case the Kafka Connect JDBC Sink connector.
Kafka Connect is part of Apache Kafka, and it handles restarts, scale-out, failures, and so on.
See also Kafka Connect in Action: JDBC Sink
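To make that concrete, here is a minimal sketch of registering a JDBC Sink connector through the Kafka Connect REST API. It assumes a Connect worker on localhost:8083 with the JDBC connector plugin installed; the connector name, topic, and Postgres connection details are placeholders, not values from the question.

import requests

connector = {
    "name": "jdbc-sink-source_topic",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "source_topic",                                 # topic to load into the database
        "connection.url": "jdbc:postgresql://db-host:5432/mydb",  # placeholder
        "connection.user": "user",                                # placeholder
        "connection.password": "password",                        # placeholder
        "auto.create": "true",  # create the target table if it does not exist
        "insert.mode": "insert"
    }
}

# POST the connector definition to the Connect worker's REST API
resp = requests.post("http://localhost:8083/connectors", json=connector)
print(resp.status_code, resp.json())

Connect then owns the offset management and retries, which is exactly the at-least-once bookkeeping the question was re-implementing with the failed_channel agent.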
I am trying to write producer and consumer code in Python using pika for RabbitMQ. However, for my specific case I need to run the producer on one host and the consumer on another.
I have already written the producer code:
import pika

credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters('ip add of another host', 5672, '/', credentials)
connection = pika.BlockingConnection()
channel = connection.channel()
channel.queue_declare(queue='test')
channel.basic_publish(exchange='', routing_key='test', body='hello all!')
print(" [x] sent 'Hello all!'")
connection.close()
The above producer code runs without any error. I also created a new user and gave it administrator credentials on the rabbitmq-server. However, when I run the consumer code on the other host running rabbitmq-server, I do not see any output:
import pika

credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection()
channel = connection.channel()
channel.queue_declare(queue='test')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(
    queue='test', on_message_callback=callback, auto_ack=True)
print(' [x] waiting for messages. To exit press ctrl+c')
channel.start_consuming()
Here I have two hosts on the same network, both with RabbitMQ installed; however, one has version 3.7.10 and the other 3.7.16.
The producer is able to send the text without error, but the consumer on the other host is not receiving any text.
I do not get any problem when both run on the same machine, as I just replace the connection settings with localhost. Since the user guest is only allowed to connect on localhost by default, I created a new user on the consumer host running rabbitmq-server.
Can anyone help me out here?
I have a couple of questions when I see your problem:
Are you 100% sure that you see two connections in your RabbitMQ management monitoring, one from your local host and another from the other host? This will help to debug.
Second, did you check that the inbound port 5672 is open on the server that hosts RabbitMQ? Maybe your producer does not manage to connect. What is your cloud provider?
If you don't want to manage those kinds of issues, you could use a service like https://zenaton.com. They host everything for you, and you get integrated monitoring, error handling, etc.
Your consumer and producer applications must connect to the same RabbitMQ server. If you have two instances of RabbitMQ running they are independent. Messages do not move from one instance of RabbitMQ to another unless you configure Shovel or Federation.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
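For reference, a dynamic shovel can be declared over the management HTTP API (with the rabbitmq_shovel and rabbitmq_shovel_management plugins enabled). This is only a sketch; every host, credential, and queue name below is a placeholder:

import requests

# Move messages from queue 'test' on one broker to 'test' on another.
shovel = {
    "value": {
        "src-protocol": "amqp091",
        "src-uri": "amqp://username:password@producer-host",   # placeholder
        "src-queue": "test",
        "dest-protocol": "amqp091",
        "dest-uri": "amqp://username:password@consumer-host",  # placeholder
        "dest-queue": "test"
    }
}

# %2F is the URL-encoded default vhost "/"
resp = requests.put(
    "http://producer-host:15672/api/parameters/shovel/%2F/move-test",
    json=shovel,
    auth=("username", "password"))
print(resp.status_code)

That said, the simpler fix here is the one above: point both applications at the same broker.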
You don't seem to be passing the parameters to the BlockingConnection instance.
import pika

rmq_server = "ip_address_of_rmq_server"
credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters(rmq_server, 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
Also, your consumer is connecting to the localhost hostname. Make sure this actually resolves and that your RabbitMQ service is listening on the localhost address (127.0.0.1); it may not be bound to that address. I believe RMQ binds to all interfaces (and thus all addresses) by default, but I'm not sure.
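Putting the two fixes together, a consumer sketch that passes the parameters and targets the same broker as the producer might look like this (server address and credentials are placeholders):

import pika

rmq_server = "ip_address_of_rmq_server"  # the same broker the producer publishes to
credentials = pika.PlainCredentials('username', 'password')
parameters = pika.ConnectionParameters(rmq_server, 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)  # parameters actually passed this time
channel = connection.channel()
channel.queue_declare(queue='test')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(queue='test', on_message_callback=callback, auto_ack=True)
print(' [x] waiting for messages. To exit press ctrl+c')
channel.start_consuming()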
I am fairly new to Kafka. I am attempting to run a simple Kafka consumer and producer. When I run my consumer it prints "hello" right before the for loop, but nothing ever prints inside the loop, which leads me to believe it never enters the loop in the first place and the consumer doesn't consume the messages from the producer. I am running this on a Linux system.
Can anyone give advice on what could be wrong with the producer or consumer? I have included my producer and consumer code below; both are only a few lines.
This is my producer:
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:2181', api_version=(1, 0, 1))
producer.send('MyFirstTopic1', 'Hello, World!')
This is my consumer:
from kafka import KafkaConsumer, KafkaProducer, TopicPartition, OffsetAndMetadata

consumer = KafkaConsumer(
    bootstrap_servers=['localhost:2181'], api_version=(1, 0, 1),
    group_id=None,
    enable_auto_commit=False,
    auto_offset_reset='smallest'
)
consumer.subscribe('MyFirstTopic1', 0)
print("hello")
for message in consumer:
    print(message)
When running my producer, it eventually gives the error below. Does anyone know what it means and whether this is what is wrong?
File "producer.py", line 3, in <module>
producer.send('MyFirstTopic1', 'Hello, World!')
File "/usr/local/lib/python3.5/site-packages/kafka/producer/kafka.py", line 543, in send
self._wait_on_metadata(topic, self.config['max_block_ms'] / 1000.0)
File "/usr/local/lib/python3.5/site-packages/kafka/producer/kafka.py", line 664, in _wait_on_metadata
"Failed to update metadata after %.1f secs." % max_wait)
kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.
It looks like you are using the wrong address in your client configurations: localhost:2181 is usually the Zookeeper server.
For your clients to work, you need to set bootstrap_servers to the Kafka broker hostname and port instead. This is localhost:9092 by default.
See https://kafka-python.readthedocs.io/en/latest/apidoc/KafkaProducer.html
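For illustration, here is a minimal corrected pair against a broker on the default localhost:9092 (kafka-python also expects message values as bytes, so the payload is encoded here):

from kafka import KafkaConsumer, KafkaProducer

# Producer: talk to the Kafka broker, not Zookeeper on 2181
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('MyFirstTopic1', b'Hello, World!')
producer.flush()  # make sure the message is actually sent before exiting

# Consumer: same broker address; kafka-python uses 'earliest', not 'smallest'
consumer = KafkaConsumer(
    'MyFirstTopic1',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    group_id=None,
    enable_auto_commit=False
)
for message in consumer:
    print(message)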
Confluent Kafka 5.0.0 has been installed on an AWS EC2 instance, which has the public IP, say, 54.XX.XX.XX.
Port 9092 was opened on the EC2 machine to 0.0.0.0.
In /etc/kafka/server.properties I have
advertised.listeners=PLAINTEXT://54.XX.XX.XX:9092
listeners=PLAINTEXT://0.0.0.0:9092
In /etc/kafka/producer.properties I have bootstrap.servers=0.0.0.0:9092.
On the local machine, in /etc/kafka/consumer.properties I have bootstrap.servers=54.XX.XX.XX:9092.
On the EC2 instance, I started Kafka with 'confluent start' and created 'mytopic'.
My producer.py code running from the local machine looks like this (relevant portion):
import json

from confluent_kafka import Producer

broker = '54.XX.XX.XX'
topic = 'mytopic'
p = Producer({'bootstrap.servers': broker})

# dictList and delivery_report are defined elsewhere in the script
for data in dictList:
    p.poll(0)
    sendme = json.dumps(data)
    p.produce(topic, sendme.encode('utf-8'), callback=delivery_report)
p.flush()
This seems to write messages to 'mytopic' in the Kafka stream on the EC2 instance; I can see those messages with 'kafkacat -b 54.XX.XX.XX -t mytopic' on the EC2 machine.
But I am not able to access those messages from the local machine with a simple message-printing consumer, with code as below:
from confluent_kafka import Consumer, KafkaError, KafkaException
import json
import sys

broker = '54.XX.XX.XX'
topic = 'mytopic'
group = 'mygroup'

c = Consumer({
    'bootstrap.servers': broker,
    'group.id': group,
    'session.timeout.ms': 6000,
    'default.topic.config': {
        'auto.offset.reset': 'smallest'
    }
})

running = True  # loop flag

def basic_consume_loop(consumer, topics):
    try:
        consumer.subscribe(topics)
        while running:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition event
                    sys.stderr.write('{} [{}] reached end at offset {}\n'.format(
                        msg.topic(), msg.partition(), msg.offset()))
                    data_process()  # defined elsewhere
                elif msg.error():
                    raise KafkaException(msg.error())
            else:
                msg_process(msg)  # defined elsewhere
    finally:
        # Close down consumer to commit final offsets.
        print("Shutting down the consumer")
        consumer.close()

basic_consume_loop(c, [topic])
It just hangs. Did I miss any settings?
The following steps seem to work.
On both the local and the EC2 machine, in /etc/kafka/server.properties set
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://54.XX.XX.XX:9092
On the local machine, in /etc/kafka/producer.properties set
bootstrap.servers=0.0.0.0:9092
On the EC2 machine, in /etc/kafka/producer.properties set
bootstrap.servers=localhost:9092
On both the local and the EC2 machine, in /etc/kafka/consumer.properties set
bootstrap.servers=0.0.0.0:9092
group.id=mygroup
Use 'confluent start' to start all necessary daemons on the remote EC2 machine.
On the local machine, Confluent is NOT running.
On the local machine (optional, to avoid hard-coding the IP):
export KAFKA_PRODUCER_IP=54.XX.XX.XX
With this, the producer on the local machine can put messages on the remote EC2 Kafka as follows:
broker = os.environ['KAFKA_PRODUCER_IP'] + ':9092'
topic = 'mytopic'
p = Producer({'bootstrap.servers': broker})
From the local machine, messages can be fetched from the remote EC2 Kafka as follows:
broker = os.environ['KAFKA_PRODUCER_IP'] + ':9092'
topic = 'mytopic'
group = 'mygroup'
c = Consumer({
    'bootstrap.servers': broker,
    'group.id': group,
    'session.timeout.ms': 6000,
    'default.topic.config': {
        'auto.offset.reset': 'smallest'
    }
})
These steps seem to work. There may be some redundancies; if so, please point them out.
I have an end-to-end pipeline for a web application, shown below, in Python 3.6:
Socket (connection from client to server) -> Flask server -> Kafka producer -> Kafka consumer -> NLPService
Now when I get a result back from the NLPService, I need to send it back to the client. I am thinking of the following steps:
1. The NLP service writes the result to a different topic via the Kafka producer (done)
2. The Kafka consumer retrieves the result from the Kafka broker (done)
3. The Kafka consumer needs to hand the result to the Flask server
4. The Flask server sends the result back to the socket
5. The socket writes to the client
I have already done steps 1 and 2, but I am stuck at steps 3 and 4. How do I get data from the Kafka consumer to the Flask server? If I just call a function in my server.py, then logically it seems like I have to create a socket within a function in server.py which will do the job of sending to the client through the socket. But syntax-wise it looks weird. What am I missing?
In consumer.py:
# receiving reply
import json

from kafka import KafkaConsumer

topicReply = 'Reply'
consumerReply = KafkaConsumer(topicReply,
                              value_deserializer=lambda m: json.loads(m.decode('ascii')))

for message in consumerReply:
    # send reply back to the server (fromConsumer lives in server.py)
    fromConsumer(message.value)
In server.py:
from flask_socketio import SocketIO, send

socketio = SocketIO(app)

def fromConsumer(msg):
    @socketio.on('reply')
    def replyMessage(msg):
        send(msg)
The above construct in server.py doesn't make sense to me. Please suggest.
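One common pattern, sketched below under assumptions rather than as a definitive answer: instead of registering a handler inside a function, run the Kafka consumer as a background task owned by the Flask-SocketIO app and push results to clients with socketio.emit. The topic and event names are placeholders carried over from the snippets above.

import json

from flask import Flask
from flask_socketio import SocketIO
from kafka import KafkaConsumer

app = Flask(__name__)
socketio = SocketIO(app)

def consume_replies():
    # blocks forever, reading NLP results from the 'Reply' topic
    consumer = KafkaConsumer(
        'Reply',
        value_deserializer=lambda m: json.loads(m.decode('ascii')))
    for message in consumer:
        # push each result to all connected clients over the socket
        socketio.emit('reply', message.value)

if __name__ == '__main__':
    socketio.start_background_task(consume_replies)
    socketio.run(app)

This removes the need for consumer.py to call back into server.py at all; the consumer loop and the socket live in the same process.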