Kafka-python script does not execute properly - python

I have run the below code in the Python Shell:
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='localhost:9092')
future = producer.send('hello-topic', b'Hello, World!')
This works perfectly in that the Kafka consumer picks up the messages.
BUT...
Running it via a script does nothing.
Am I missing something obvious?
The only way to get it working as a script is to add this line...
future.get(timeout=10)
Any help would be appreciated.

kafka-python send() details from the docs: send() is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
You can use the flush()/poll() methods to send the message immediately.
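As a minimal sketch, the script version of the snippet above (same broker and topic as in the question) only needs a flush before exiting:
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
future = producer.send('hello-topic', b'Hello, World!')

# Block until all buffered records are delivered; without this the process
# can exit before the background sender thread has shipped the batch.
producer.flush()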

Related

Redis-py: run_in_thread event handler stops getting called after a few hours

I'm trying to implement a basic pubsub using redis-py client.
The idea is, the publisher is actually a callback that gets called periodically and will publish some information on channel1 in the callback function.
The subscriber will listen on that channel for this message and do some processing accordingly.
The subscriber is a basic bare-bones webserver deployed on k8s, and it should simply print the messages it receives via the event_handler function.
subscriber.py
from redis import Redis

class Sub(object):
    def __init__(self):
        redis = Redis(host=...,
                      port=...,
                      password=...,
                      db=0)
        ps = redis.pubsub(ignore_subscribe_messages=True)
        ps.subscribe(**{'channel1': Sub.event_handler})
        ps.run_in_thread(sleep_time=0.01, daemon=True)

    @staticmethod
    def event_handler(msg):
        print("Hello from event handler")
        if msg and msg.get('type') == 'message':  # interested only in messages, not subscribe/unsubscribe/pmessages
            pass  # process the message
publisher.py
from redis import Redis

redis = Redis(host=...,
              port=...,
              password=...,
              db=0)

def call_back(msg):
    global redis
    redis.publish('channel1', msg)
At the beginning, the messages are published and the subscriber's event handler prints and processes them correctly.
The problem is that after a few hours the subscriber stops showing those messages. I've checked the publisher logs and the messages definitely get sent out, but I'm not able to figure out why the event_handler is no longer getting called.
The print statement in it stops appearing, which is why I say the handler is not getting fired after a few hours.
Initially I suspected the thread must have died, but after exec'ing into the pod I can see it in the list of threads.
I've read through a lot of blogs, documentations but haven't found much help.
All I can deduce is the event handler stops getting called after sometime.
Can anyone help me understand what's going on, and what's the best way to reliably consume pubsub messages in a non-blocking way?
Really appreciate any insights you guys have! :(
Could you post the whole publisher.py, please? It could be the case that call_back(msg) isn't being called anymore.
To check whether a client is still subscribed, you can use the PUBSUB CHANNELS command in redis-cli.
Regards, Martin
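The same check can also be done from Python; a rough sketch, assuming the same connection parameters as in the question and keeping a handle on the listener thread returned by run_in_thread():
from redis import Redis

redis = Redis(host=..., port=..., password=..., db=0)

# Equivalent of PUBSUB CHANNELS: lists channels that currently have subscribers.
print(redis.pubsub_channels())

# run_in_thread() returns a PubSubWorkerThread; keeping a reference to it
# lets the subscriber periodically verify that the listener is still alive:
# thread = ps.run_in_thread(sleep_time=0.01, daemon=True)
# print(thread.is_alive())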

RabbitMQ - Python/Pika: How to know if a queue is empty?

In case I express myself poorly, my idea is also described here: (Send images with their names in one message - RabbitMQ (Python 3.X))
I currently have a problem with RabbitMQ:
I made a work queue that several consumers process at the same time; each consumer is a containerized image-processing job that produces a str output with the requested information.
The results must be sent on another queue when the processing is finished,
but how do I know when the queue containing the images is empty and there is no more work to do? Roughly speaking, I would like something like "if the queue is empty, then send the results...".
Thank you for your time, have a good day.
You can do a passive declare of the queue to get the count of messages, but that may not be reliable as the count returned does not include messages in the "unacked" state. You could query the queue's counts via the HTTP API.
Or, whatever application publishes the images could send a "no more images" message to indicate no more work to do. The consumer that receives that message could then query the HTTP API to confirm that no messages are in the Ready or Unacked state, then send the results to the next queue.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
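A rough sketch of the HTTP API approach, assuming the management plugin on its default port 15672, the default guest credentials, the default vhost "/", and a hypothetical queue named 'images':
import requests

# %2F is the URL-encoded default vhost "/".
resp = requests.get('http://localhost:15672/api/queues/%2F/images',
                    auth=('guest', 'guest'))
resp.raise_for_status()
queue = resp.json()

if queue['messages_ready'] == 0 and queue['messages_unacknowledged'] == 0:
    print('No messages Ready or Unacked, safe to send the results')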
Hi, I think you can solve this with queue_declare:
def queue_has_messages(channel, queue_name):
    # A passive declare only inspects the queue; it does not create or modify it.
    status = channel.queue_declare(queue=queue_name, passive=True)
    if status.method.message_count > 5:
        return True
    log.error(f'{queue_name} has no messages or fewer than 5 messages')
    return False
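For context, a minimal sketch of obtaining the channel used above with pika (the localhost connection and the 'images' queue name are assumptions, not from the question):
import pika

# Hypothetical connection setup; adjust host/credentials for your broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

if queue_has_messages(channel, 'images'):
    print('The queue still has work pending')
connection.close()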

Python Producer can send via shell, but not .py

I have a running and tested Kafka cluster, and am trying to use a Python script to send messages to the brokers. This works when I use the Python3 shell and call the producer method; however, when I put these same commands into a Python file and execute it, the script seems to hang.
I am using the kafka-python library for the consumer and producer. When I use the Python3 shell I can see the messages appear in the topic using Kafka GUI tool 2.0.4
I've tried various loops and statements in the python code, but nothing seems to make it 'run' to completion.
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='BOOTSTRAP_SRV:9092')
>>> producer.send('MyTopic', b'Has this worked?')
<kafka.producer.future.FutureRecordMetadata object at 0x7f7af9ece048>
This works, and the bytes appear in the broker's topic data.
When I put the same code as above in a Python .py file and execute it with Python3, it completes, but no data is sent to the Kafka broker.
No error is shown either.
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='BOOTSTRAP_SRV:9092')
producer.send('MyTopic', b'Some Data to Check')
As you can see, it returns a future.
Kafka clients batch records; they don't immediately send one record at a time. To make the record go out, you need to wait on the future or flush the producer buffer so it is sent before the app exits. In other words, the interactive terminal keeps the producer (and its in-memory buffer) running in the background long enough for the batch to be sent, while the script exits and discards that data.
As the docs show:
future = producer.send(...)
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    # Decide what to do if produce request failed...
    log.exception()
    pass
Or just put producer.flush(), if you don't care about the metadata or grabbing the future.
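For example, a sketch of the script from the question with an explicit flush added before exit (same broker and topic as above):
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='BOOTSTRAP_SRV:9092')
producer.send('MyTopic', b'Some Data to Check')

# Block until every buffered record has been sent, then release the
# producer's resources; close() also waits for pending sends to complete.
producer.flush()
producer.close()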

Why is data not received the same way it was sent over UDS?

I am investigating UDS (Unix domain sockets) a bit for logging from an app, with another process then sending the logs to an external server. Overall it seems to work fine, but while testing I discovered that if I send some logs in a for loop, a single read from the socket contains several "messages".
You can find the code for receiving logs here: https://github.com/MattBlack85/alf (after installing you can run it with alf /tmp/alf.sock http://127.0.0.1:8080)
You can find a small example to send logs here: https://gist.github.com/MattBlack85/86d620a306f16416a7f96a1a035984dc
You can find a small webserver to let alf send over the logs here: https://gist.github.com/MattBlack85/0638ef87eb077eb46879d6c90a30cf7a
If the for loop has no sleep, the result is something like this:
[2018-12-18 13:12:39,798] [DEBUG] alf.worker - MSG from queue: b'{"time":"2018-12-18 13:12:39,797","name":"test","levelname":"DEBUG","message":"test 0","pathname":"logalf.py"}{"time":"2018-12-18 13:12:39,798","name":"test","levelname":"DEBUG","message":"test 1","pathname":"logalf.py"}{"time":"2018-12-18 13:12:39,798","name":"test","levelname":"DEBUG","message":"test 2","pathname":"logalf.py"}{"time":"2018-12-18 13:12:39,798","name":"test","levelname":"DEBUG","message":"test 3","pathname":"logalf.py"}{"time":"2018-12-18 13:12:39,798","name":"test","levelname":"DEBUG","message":"test 4","pathname":"logalf.py"}{"time":"2018-12-18 13:12:39,798","name":"test","levelname":"DEBUG","message":"test 5","pathname":"logalf.py"}'
Whereas if I add a small pause of 1 ms, the messages are received one by one. I tried closing all heavy processes on my OS to leave the CPU free, but that didn't change anything. This is not a big issue, since I can add a terminator when formatting the JSON log, split what is read from the socket, and put every item of the resulting list into the queue, but why am I seeing this at all?
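A minimal sketch of the terminator-based framing described above (the newline delimiter and the send_log/read_records helpers are illustrative assumptions, not code from alf): a stream socket preserves byte order but not message boundaries, so consecutive sends can be coalesced into one read.
import json

# Sender side: terminate each JSON log record with a newline.
def send_log(sock, record):
    sock.sendall(json.dumps(record).encode() + b'\n')

# Receiver side: split the byte stream on the terminator and queue each
# complete record; keep any trailing partial record for the next read.
def read_records(sock, queue, buffer=b''):
    buffer += sock.recv(4096)
    *records, buffer = buffer.split(b'\n')
    for raw in records:
        if raw:
            queue.put(json.loads(raw))
    return buffer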

Sending data with kafka-python only working when briefly delaying code

I'm sending some data to a Kafka topic using kafka-python. I struggled for a while with not being able to send data to my Kafka topic, until I found out that it works if I briefly delay the code.
from kafka import KafkaProducer
from time import sleep
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("topic", "foo")
sleep(.1)
This code does not work for me without the sleep(.1). It's as if sending data needs time to settle before it works properly. Is there anything in the kafka-python client that deals with this, or a better solution?
A year later, but to anyone seeing this, a solution is below. The issue here is a race condition between the end of the script and the send call, which is why the sleep() call works.
The kafka module should handle the Python exit better, or at a minimum output something to stdout/stderr, so this behavior isn't silent.
From the kafka-python github:
# Block until a single message is sent (or timeout)
future = producer.send('foobar', b'another_message')
result = future.get(timeout=60)
Now you can guarantee that your script will block until a message has been confirmed published.
