I have tried this tutorial and successfully consumed Kafka topics published on a server at my workplace. I am not the producer, just purely a consumer. However, the code in that tutorial streams to a terminal. Now I want to try it with Python and record the messages into a text file (or something of that sort).
This is the code I use, after reading a few more threads and tutorials (such as here):
from kafka import KafkaConsumer

bootstrap_servers = ['xxxxxx:xxxx']
topicName = 'my_topic_name'
consumer = KafkaConsumer(topicName, group_id='group1', bootstrap_servers=bootstrap_servers, consumer_timeout_ms=1000)
for msg in consumer:
    print(msg.value)
Here I first want to print out the messages, but I get this error after the 1000 ms timeout:
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
which sounds logical to me, since a broker is needed and the code above does not seem to do anything with a broker.
If I don't set consumer_timeout_ms=1000, the Python console just gets stuck without displaying anything.
How do I resolve this?
More details:
I am doing the following in parallel:
1 - Run zookeeper in one terminal
2 - Run kafka cluster in another terminal
3 - Streaming the topics (nicely) in another terminal with the kafka-console-consumer command
4 - In another terminal, run the Python code in this question.
All of these terminals are Ubuntu in WSL2 (Windows).
If you're able to use the WSL terminal with kafka-console-consumer, then running Python code there should work the same.
If you're connecting to a remote Kafka server, chances are the WSL2 network settings simply cannot reach that address (several issues elsewhere describe WSL2 lacking external network access). Therefore, you should really consider running the Python code on the Windows host itself. Otherwise, it sounds like you'll need to adjust your network configuration.
The for loop waits for new messages on the topic; it will not read existing data unless you add another parameter to the consumer (such as auto_offset_reset='earliest') telling it to.
FWIW, you can use kafka-console-consumer ... >> file.txt to write out to a file
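Putting both comments together, here is a hedged sketch using kafka-python (the topic name, broker address, and file path are placeholders; auto_offset_reset='earliest' only takes effect when the consumer group has no committed offsets):

```python
def consume_to_file(topic, servers, path):
    # kafka-python, imported locally so the sketch stays self-contained
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset='earliest',   # read existing data, not only new messages
        consumer_timeout_ms=10000,      # stop iterating after 10 s without a message
    )
    with open(path, 'a') as out:
        for msg in consumer:
            # msg.value is raw bytes; decode before writing one message per line
            out.write(msg.value.decode('utf-8') + '\n')
```

Usage would look like `consume_to_file('my_topic_name', ['localhost:9092'], 'messages.txt')`.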
Related
I currently have a Python script sending data over to an IoT Hub and a node in a Node-red flow receiving that information, but for some cases that would not work (ex. when internet is down).
I'm wondering if there is any way I can adapt my Python script to get that json object sent directly to Node-red bypassing any communication over the Internet.
Any hint would be appreciated!
You could add a messaging solution in the middle. You would use something like a Python MQI library, which can provide assured message delivery at the Python end once the network connection re-establishes. On the Node-RED end, there is a varied set of MQ and MQ REST Node-RED nodes you can use to listen for messages.
I have already started to learn Kafka and am trying basic operations on it. I am stuck on a point about the 'brokers'.
My Kafka is running, but when I want to create a partition I get an error:
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:1234')  # <-- error raised here
consumer.assign([TopicPartition('foobar', 2)])
msg = next(consumer)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/kafka/consumer/group.py", line 284, in __init__
    self._client = KafkaClient(metrics=self._metrics, **self.config)
  File "/usr/local/lib/python2.7/dist-packages/kafka/client_async.py", line 202, in __init__
    self.config['api_version'] = self.check_version(timeout=check_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kafka/client_async.py", line 791, in check_version
    raise Errors.NoBrokersAvailable()
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
I had the same error during Kafka streaming. The code below resolved it: we need to define the API version in KafkaProducer.
from json import dumps
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         api_version=(0, 11, 5),
                         value_serializer=lambda x: dumps(x).encode('utf-8'))
You cannot create partitions within a consumer. Partitions are created when you create a topic. For example, using command line tool:
bin/kafka-topics.sh \
--zookeeper localhost:2181 \
--create --topic myNewTopic \
--partitions 10 \
--replication-factor 3
This creates a new topic "myNewTopic" with 10 partitions (numbered from 0 to 9) and replication factor 3. (see http://docs.confluent.io/3.0.0/kafka/post-deployment.html#admin-operations and https://kafka.apache.org/documentation.html#quickstart_createtopic)
Within your consumer, if you call assign(), it means you want to consume the corresponding partition and this partition must exist already.
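As a hedged sketch of that check with kafka-python (topic name, partition number, and address are placeholders), partitions_for_topic() returns the set of partition ids that actually exist, so you can verify before calling assign():

```python
def assign_existing_partition(topic, partition, servers):
    # kafka-python, imported locally so the sketch stays self-contained
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers=servers)
    # set of existing partition ids, or None if the topic is unknown
    existing = consumer.partitions_for_topic(topic)
    if existing is None or partition not in existing:
        raise ValueError("partition %s does not exist for topic %r" % (partition, topic))
    consumer.assign([TopicPartition(topic, partition)])
    return consumer
```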
For me, the problem was a firewall rule, as I am running Kafka on Google Cloud.
It was working yesterday, and today I spent an hour scratching my head over why it no longer works.
Since the public IP address of my local system changes every time I connect to a different LAN or Wi-Fi, I had to allow my local system's current public IP in the firewall rules. I would suggest using a connection with a fixed public IP, or checking for this whenever you switch or change your connection.
These small configuration issues take too much time to debug and fix. It felt like I wasted an hour on this.
Don't know if this answer is still relevant, but I recently resolved this same problem with a VirtualBox VM broker that was not reachable from the host Windows OS.
Since you have mentioned bootstrap_servers in KafkaConsumer, I assume you are using at least Kafka 0.10.0.0.
Please look for the advertised.listeners property in server.properties file and set it to PLAINTEXT://localhost:9092 or PLAINTEXT://<broker_ip>:9092
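For example, the relevant line in server.properties would look like this (the broker IP is a placeholder):

```
advertised.listeners=PLAINTEXT://<broker_ip>:9092
```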
But before you set that make sure your broker is reachable from the environment where your consumer is running (by doing ping <broker_ip> and netcat nc -vz <broker_ip> 9092).
Also, you need to restart the kafka-server and consumer/producer (whatever is running) and try sending/receiving.
For example, if you are running a VM, you may want to use a Host-only adapter to make the broker reachable from the host machine.
NOTE: This configuration works for Kafka Server >= 0.10.X.X but not for 0.8.2.X. Haven't checked for 0.9.0.X
It looks like you want to start consuming messages rather than create partitions. Nevertheless, can you reach Kafka at port 1234? 9092 is Kafka's default port, so maybe try that one. If you have found the right port but your application still produces errors, you can use the console producer to test your setup:
bin/kafka-console-producer.sh --broker-list localhost:<yourportnumber> --topic foobar
The console producer is part of the standard Kafka distribution. Maybe that gets you a little closer to the source of the problem.
NoBrokersAvailable can be the result of a bad hostname configuration in the Kafka configuration.
Use 127.0.0.1 instead of localhost, or any other IP relevant to your use case.
Changing localhost:9092 to 127.0.0.1:9092 worked for me:
from kafka import KafkaConsumer
consumer = KafkaConsumer('topicname', bootstrap_servers=['127.0.0.1:9092'])
print(consumer.config)
print(consumer.bootstrap_connected())
Status Quo:
I have two Python apps (a frontend server and a data collector, with a database 'between' them).
I am currently using Redis as the database, and its publish/subscribe protocol to notify the frontend when new data is available.
But I may want to use a different database (and don't want to keep Redis on the system just for the pub/sub).
Are there any simple alternatives to notify my frontend if the data-collector has transacted new data to the database (without using an external message queue like beanstalkd or redis)?
ZeroMQ is a good option. It has good Python bindings, and it makes communicating between processes on the same machine and processes on different machines look almost identical.
Start by reading the guide: http://zguide.zeromq.org/page:all
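A minimal hedged pub/sub sketch with pyzmq (assumed to be installed; the channel name is mine). The inproc transport keeps everything in one process for the demo, but swapping the address for e.g. "tcp://127.0.0.1:5556" works between machines unchanged:

```python
def pubsub_demo():
    import time
    import zmq  # pyzmq, imported locally so the sketch stays self-contained

    ctx = zmq.Context.instance()

    pub = ctx.socket(zmq.PUB)            # the data-collector side
    pub.bind("inproc://new-data")

    sub = ctx.socket(zmq.SUB)            # the frontend side
    sub.connect("inproc://new-data")
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to every message

    time.sleep(0.1)                      # give the subscription time to register
    pub.send_string("new_data")
    return sub.recv_string()             # blocks until the notification arrives
```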
As I mentioned in my comment, if you want something that goes across a network, then other than setting up a web service (a Flask app?) or writing your own INET socket server, there is nothing built into the operating system for communicating between machines. Beanstalk has a very simple Python API, and I've used it for this kind of thing very successfully.
import beanstalkc

try:
    beanstalk = beanstalkc.Connection(host="my.host.com")
    beanstalk.watch("update_queue")
except Exception:
    print("Error connecting to beanstalk")

while True:
    job = beanstalk.reserve()
    do_something_with_job(job)
If you are only going to be working on the same machine, then read up on Linux IPC. A socket connection between processes is very fast and has practically zero overhead. Sockets can also be part of an asynchronous program when you take advantage of epoll callbacks.
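As a small stdlib-only sketch of that idea (the function names and the message are mine), socketpair() stands in for the AF_UNIX socket that the data collector and the frontend would share as separate processes:

```python
import socket

def make_channel():
    # in separate processes you would bind/connect an AF_UNIX path instead
    return socket.socketpair()

def notify(sock, message):
    # the data-collector side: push a one-line notification
    sock.sendall(message.encode("utf-8") + b"\n")

def wait_for_notification(sock):
    # the frontend side: block until a notification arrives
    return sock.recv(1024).decode("utf-8").strip()
```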
So I recently wanted to set up a config server using ZooKeeper (3.4.3), with a Python client (http://pypi.python.org/pypi/zc-zookeeper-static).
I noticed that if I just set up one watch, the notification that the node changed arrives quickly. But when I try to watch 100 nodes from the same session, it takes about 2 minutes to get notified for some reason. Here's my Python script: http://pastebin.com/BC6nKdRV
zookeeper server config is pretty simple:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
Not sure if there's something I did wrong here. Any advice would be great. Thanks!
Turns out it was a problem with the client. kazoo has no problem with multiple watches per session; I tested with 5000 watches and change notification is still almost instant. https://github.com/python-zk/kazoo
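For illustration, a hedged sketch of that pattern with kazoo (hosts and paths are placeholders; kazoo is assumed to be installed), registering one DataWatch per node over a single session:

```python
def watch_nodes(hosts, paths):
    # kazoo, imported locally so the sketch stays self-contained
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start()

    def on_change(data, stat):
        # called once at registration and again whenever the node changes
        print("changed:", data, stat.version if stat else None)

    for path in paths:
        # kazoo multiplexes all of these watches over the one session
        zk.DataWatch(path)(on_change)
    return zk
```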
I am developing a testbed for a cloud computing environment. I want to establish multiple client connections to one server. What I want is for the server to first send data to all the clients specifying a sending_interval, and then for all the clients to keep sending their data with a time gap of that interval (as specified by the server). Please help me out with how I can do this using a Python socket program (i.e. multiple clients connecting to a single server, with each client sending data at the time gap the server specifies). I will be grateful if anyone can help. Thanks in advance.
This problem is easily solved by the ZeroMQ socket library. It is production stable. It allows you to define publisher-subscriber relationships, where a publishing process will publish data on a port regardless of how many (0 to infinite) listening processes there are. They call this the PUB-SUB model; it's in their docs (link below).
It sounds like you want to set up a bunch of clients that are all publishers. They can subscribe to a controlling channel, which will send updates to their configuration (how often to write). They also act as publishers, pushing out their own data at an interval specified by the default/config channel/socket.
Then, you have one or more listening processes that listen to all the clients' published messages. Perhaps you could even have two listening processes, one for backup or DR, or whatever.
We're using ZeroMQ and loving the simplicity it gives; there's no connection errors because the publisher doesn't care if anyone is listening, and the subscriber can start before the publisher and if there's nothing there to listen to, it can just loop around and wait until there is.
Bindings are available in ALL languages (it's freaky). The Python binding isn't pure Python and requires a C compiler, but it is frighteningly fast, and the pub/sub example is a cut/paste, 'golly, it works!' experience.
Link: http://zeromq.org
There are MANY other methods available with this library, including message queues, etc. They have relatively complete documentation, too.
Multi-client and single-server socket programming can be achieved with multithreading. I have implemented both methods:
Single Client and Single Server
Multiclient and Single Server
Both are in my GitHub repo: https://github.com/shauryauppal/Socket-Programming-Python
What is Multi-threading Socket Programming?
Multithreading is a process of executing multiple threads simultaneously in a single process.
To understand it well, you can visit https://www.geeksforgeeks.org/socket-programming-multi-threading-python/, written by me.
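The pattern above can be sketched with the stdlib alone (the 5-second interval and the message format are arbitrary choices of mine): the server sends every client its sending interval on connect, then one thread per client receives that client's reports:

```python
import socket
import threading

SENDING_INTERVAL = b"5"  # seconds; sent to every client when it connects

def handle_client(conn, addr):
    with conn:
        conn.sendall(SENDING_INTERVAL)   # step 1: tell the client how often to send
        while True:
            data = conn.recv(1024)       # step 2: receive its periodic reports
            if not data:
                break                    # client disconnected

def serve(host="127.0.0.1", port=0):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))            # port 0 lets the OS pick a free port
    server.listen()

    def accept_loop():
        while True:
            conn, addr = server.accept()
            # one thread per client, so many clients are served concurrently
            threading.Thread(target=handle_client, args=(conn, addr),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server.getsockname()          # the actual (host, port) bound
```

Each client would connect, read the interval, and then send its data in a loop with `time.sleep(interval)` between messages.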