Working with the beta Google Pub/Sub client (v0.28.3)
Has anyone seen a scenario where the same message is constantly redelivered every 10s, even after acking it?
This goes beyond the at-least-once nature of Pub/Sub. It happens sporadically, but when it does occur we see the same message continuously for several hours.
I suspect it's because we process incoming messages in a background thread from the subscriber, but we haven't yet been able to reproduce it consistently. Is that not kosher for some reason?
If it's a bug, I'm happy to file it, but I'm operating on the assumption that we're doing something wrong. Has anyone dealt with similar issues?
With debug logging we see something like:
D 13:51:46.000 Received response: received_messages { ... message_id: "155264162517414" ... }
D 13:51:46.000 New message received from Pub/Sub: %r
I 13:51:46.000 Processing Message: 155264162517414
I 13:51:48.000 Acking Message: 155264162517414
D 13:51:48.000 Sending request: ack_ids: "LDR..."
D 13:51:50.000 Snoozing lease management for 4.009431 seconds.
D 13:51:50.000 Renewing lease for 0 ack IDs.
D 13:51:50.000 The current p99 value is 10 seconds.
...
D 13:51:59.000 Received response: received_messages { ... message_id: "155264162517414" ... }
D 13:51:59.000 New message received from Pub/Sub: %r
I 13:51:59.000 Processing Message: 155264162517414
Here's a toy version of the code that shows how we are threading. Running it locally sometimes triggers the issue:
import Queue
import logging
import threading
import random
import time

from google.cloud import pubsub

SUBSCRIPTION_PATH = ...

class Worker(threading.Thread):
    """Background thread to consume incoming messages."""

    def __init__(self, name):
        threading.Thread.__init__(self, name=name)
        self.queue = Queue.Queue()

    def run(self):
        while True:
            message = self.queue.get()
            self.process(message)
            print '<< Acking :', message.message_id
            message.ack()
            self.queue.task_done()

    def process(self, message):
        """Fake some work by sleeping for 0-15s. """
        s = random.randint(0, 15)
        print '>> Worker sleeping for ', s, message.message_id
        for i in range(s):
            time.sleep(1)
            print i

class Subscriber(threading.Thread):
    """Handles the subscription to pubsub."""

    def __init__(self):
        threading.Thread.__init__(self, name='Subscriber')
        self.subscriber = pubsub.SubscriberClient()
        self.worker = Worker('FakeWorker')
        self.worker.daemon = True

    def run(self):
        self.worker.start()
        flow_control = pubsub.types.FlowControl(max_messages=10)
        policy = self.subscriber.subscribe(SUBSCRIPTION_PATH,
                                           flow_control=flow_control,
                                           callback=self._consume)
        print 'Sub started, thread', threading.current_thread()

    def _consume(self, message):
        self.worker.queue.put(message)

if __name__ == '__main__':
    subscriber = Subscriber()
    subscriber.start()
    while 1:
        pass
Thank you!
In addition to the at-least-once nature of Pub/Sub, acks in Pub/Sub are best effort. This means there are two ways an ack can go wrong:
1. The message is successfully acked by Pub/Sub but is redelivered once anyway (presumably due to a race condition).
2. The ack fails to reach Pub/Sub.
In the second case, the client library will not give you any kind of error (because the client library itself is not given one), and you will start seeing the message again on a regular cadence (10 seconds if your processing times are short).
The solution is simply to ack the message again when you receive it. I assume (it is not clear from the toy code, so I am guessing) that you are simply ignoring repeat messages, but if you ack the repeat, you should stop getting it.
If you are re-acking the message, then please open an issue against the client library.
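A minimal sketch of the re-ack approach described above, assuming the callback receives a message object exposing `message_id` and `ack()` as in the toy code. `DedupingHandler`, `FakeMessage` and the in-memory set are hypothetical names used for illustration only; a real service would probably need a bounded or persistent store of seen IDs.

```python
import threading


class DedupingHandler:
    """Acks every delivery, but runs the real work only once per message ID."""

    def __init__(self, process):
        self._process = process        # callable that does the real work
        self._seen = set()             # IDs already processed (unbounded; illustration only)
        self._lock = threading.Lock()

    def __call__(self, message):
        with self._lock:
            duplicate = message.message_id in self._seen
        if not duplicate:
            self._process(message)
            # Record the ID only after processing succeeded.
            with self._lock:
                self._seen.add(message.message_id)
        # Ack duplicates too: if the earlier ack was lost, this second ack
        # is what should stop the redelivery loop.
        message.ack()
        return not duplicate


class FakeMessage:
    """Hypothetical stand-in for a Pub/Sub message, used only for the demo."""

    def __init__(self, message_id):
        self.message_id = message_id
        self.ack_count = 0

    def ack(self):
        self.ack_count += 1


processed = []
handler = DedupingHandler(lambda m: processed.append(m.message_id))
first = FakeMessage("155264162517414")
repeat = FakeMessage("155264162517414")   # simulated redelivery of the same ID
handler(first)
handler(repeat)
```

In this sketch the work runs once, while both deliveries are acked. Two copies of the same message arriving concurrently could still both be processed, so the work itself should ideally be idempotent.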
Related
I have been having some interesting issues recently with Python and MQTT.
Basically, my code is subscribing to a topic, and every time there is a new message published, it tries to control a device. Now, this is a blocking function and thus is run in a separate thread, so that on_message() would return immediately.
Additionally, the code publishes a status to a topic every 60 seconds. The code runs fine in the beginning, often a day or two. The device is being controlled via subscribed MQTT messages and the status is published just fine.
Then it suddenly stops receiving any MQTT messages and also stops publishing them. However, publish() does not indicate any problem, and is_connected() returns True. Restarting the program allows it to run for another day or two. Below is the full code.
import paho.mqtt.client as mqtt
import json
import threading

class Controller():
    def __init__(self):
        self.mqtt_client = mqtt.Client()
        self.pub_topic = "outgoing"
        self.mqtt_client.on_message = self.on_message
        self.mqtt_client.connect("192.168.1.1", 1883, 600)
        self.mqtt_client.subscribe("incoming")

    # This is a blocking function, execution takes approximately 5 minutes.
    # The function only runs if there is no existing thread running it yet.
    def control_device(self, input_commands):
        print("Do some stuff...")

    def process_mqtt(self, msg):
        mqtt_msg = json.loads(msg.payload.decode('utf-8'))
        self.control_device(mqtt_msg)
        payload = '{"message": "process started"}'
        self.mqtt_client.publish(self.pub_topic, payload)

    def on_message(self, client, userdata, msg):
        thread = threading.Thread(target=self.process_mqtt, args=(msg,))
        thread.start()

    # Status is sent to the same topic every 60 seconds
    def send_status_msg(self):
        if minute_passed:
            payload = '{"status": 0}'
            self.mqtt_client.publish(self.pub_topic, payload)

    def run(self):
        while True:
            self.mqtt_client.loop()
            self.send_status_msg()

if __name__ == "__main__":
    c = Controller()
    c.run()
Is there something I have not understood about how the MQTT library works? I found some discussion about how you should not publish inside on_message(), but in this case it is put into a separate thread.
I'm using python with pika, and have the following two similar use cases:
1. Connect to RabbitMQ server A and server B (at different IP addrs with different credentials), listen on exchange A1 on server A; when a message arrives, process it and send to an exchange on server B
2. Open an HTTP listener and connect to RabbitMQ server B; when a specific HTTP request arrives, process it and send to an exchange on server B
Alas, in both these cases using my usual techniques, by the time I get to sending to server B the connection throws ConnectionClosed or ChannelClosed.
I assume this is the cause: while waiting on the incoming messages, the connection to server B (its "driver") is starved of CPU cycles and never gets a chance to service its connection socket, so it can't respond to heartbeats from server B, and the server shuts down the connection.
But I can't noodle out the fix. My current work around is lame: I catch the ConnectionClosed, reopen a connection to server B, and retry sending my message.
But what is the "right" way to do this? I've considered these, but don't really feel I have all the parts to solve this:
1. Don't just sit forever in server A's basic_consume (my usual pattern), but rather use a timeout, and when I catch the timeout, somehow "service" heartbeats on server B's driver before returning to a "consume with timeout"... but how do I do that? How do I "let server B's connection driver service its heartbeats"?
2. I know the socket library's select() call can wait for messages on several sockets at once, then service whichever socket has packets waiting. So maybe this is what pika's SelectConnection is for? a) I'm not sure, this is just a hunch. b) Even if right, while I can find examples of how to create this connection, I can't find examples of how to use it to solve my multiconnection case.
3. Set up the two server connections in different processes... and use Python interprocess queues to get the processed message from one process to the next. The concept is "two different RabbitMQ connections in two different processes should then be able to independently service their heartbeats". Except... I think this has a fatal flaw: the process with "server B" is, instead, going to be "stuck" waiting on the interprocess queue, and the same "starvation" is going to happen.
I've checked StackOverflow and Googled this for an hour last night: I can't for the life of me find a blog post or sample code for this.
Any input? Thanks a million!
I managed to work it out, basing my solution on the documentation and an answer in the pika-python Google group.
First of all, your assumption is correct: the client process that's connected to server B, responsible for publishing, cannot reply to heartbeats if it's already blocking on something else, like waiting for a message from server A or blocking on an internal communication queue.
The crux of the solution is that the publisher should run as a separate thread and use BlockingConnection.process_data_events to service heartbeats and such. It looks like that method is supposed to be called in a loop that checks if the publisher still needs to run:
def run(self):
    while self.is_running:
        # Block at most 1 second before returning and re-checking
        self.connection.process_data_events(time_limit=1)
Proof of concept
Since proving the full solution requires having two separate RabbitMQ instances running, I have put together a Git repo with an appropriate docker-compose.yml, the application code and comments to test this solution.
https://github.com/karls/rabbitmq-two-connections
Solution outline
Below is a sketch of the solution, minus imports and such. Some notable things:
- Publisher runs as a separate thread
- The only "work" that the publisher does is servicing heartbeats and such, via BlockingConnection.process_data_events
- The publisher registers a callback whenever the consumer wants to publish a message, using BlockingConnection.add_callback_threadsafe
- The consumer takes the publisher as a constructor argument so it can publish the messages it receives, but it can work via any other mechanism as long as you have a reference to an instance of Publisher
The code is taken from the linked Git repo, which is why certain details are hardcoded, e.g. the queue name. The approach should work with any RabbitMQ setup (direct-to-queue, topic exchange, fanout, etc.).
class Publisher(threading.Thread):
    def __init__(
        self,
        connection_params: ConnectionParameters,
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.daemon = True
        self.is_running = True
        self.name = "Publisher"
        self.queue = "downstream_queue"
        self.connection = BlockingConnection(connection_params)
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=self.queue, auto_delete=True)
        self.channel.confirm_delivery()

    def run(self):
        while self.is_running:
            self.connection.process_data_events(time_limit=1)

    def _publish(self, message):
        logger.info("Calling '_publish'")
        self.channel.basic_publish("", self.queue, body=message.encode())

    def publish(self, message):
        logger.info("Calling 'publish'")
        self.connection.add_callback_threadsafe(lambda: self._publish(message))

    def stop(self):
        logger.info("Stopping...")
        self.is_running = False
        # Call .process_data_events one more time to block
        # and allow the while-loop in .run() to break.
        # Otherwise the connection might be closed too early.
        self.connection.process_data_events(time_limit=1)
        if self.connection.is_open:
            self.connection.close()
            logger.info("Connection closed")
        logger.info("Stopped")

class Consumer:
    def __init__(
        self,
        connection_params: ConnectionParameters,
        publisher: Optional["Publisher"] = None,
    ):
        self.publisher = publisher
        self.queue = "upstream_queue"
        self.connection = BlockingConnection(connection_params)
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=self.queue, auto_delete=True)
        self.channel.basic_qos(prefetch_count=1)

    def start(self):
        self.channel.basic_consume(
            queue=self.queue, on_message_callback=self.on_message
        )
        try:
            self.channel.start_consuming()
        except KeyboardInterrupt:
            logger.info("Warm shutdown requested...")
        except Exception:
            traceback.print_exception(*sys.exc_info())
        finally:
            self.stop()

    def on_message(self, _channel: Channel, m, _properties, body):
        try:
            message = body.decode()
            logger.info(f"Got: {message!r}")
            if self.publisher:
                self.publisher.publish(message)
            else:
                logger.info(f"No publisher provided, printing message: {message!r}")
            self.channel.basic_ack(delivery_tag=m.delivery_tag)
        except Exception:
            traceback.print_exception(*sys.exc_info())
            self.channel.basic_nack(delivery_tag=m.delivery_tag, requeue=False)

    def stop(self):
        logger.info("Stopping consuming...")
        if self.connection.is_open:
            logger.info("Closing connection...")
            self.connection.close()
        if self.publisher:
            self.publisher.stop()
        logger.info("Stopped")
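The hand-off between the two threads can be illustrated without RabbitMQ. The sketch below is a standard-library stand-in, not pika: `EventLoopThread`, its `add_callback_threadsafe` and its `process_events` are hypothetical names that only mimic the roles of the pika methods used above. The owning thread wakes up at least once per `time_limit` (the point where a real connection would service heartbeats) and runs any callbacks that other threads have queued.

```python
import queue
import threading


class EventLoopThread(threading.Thread):
    """Owns a resource and runs callbacks queued by other threads."""

    def __init__(self):
        super().__init__(daemon=True)
        self.is_running = True
        self._callbacks = queue.Queue()
        self.sent = []  # stands in for messages published downstream

    def add_callback_threadsafe(self, callback):
        # Safe to call from any thread: it only enqueues the callback.
        self._callbacks.put(callback)

    def process_events(self, time_limit):
        # Wait at most time_limit seconds for one callback, then return
        # so that run() can re-check is_running.
        try:
            callback = self._callbacks.get(timeout=time_limit)
        except queue.Empty:
            return
        try:
            callback()
        finally:
            self._callbacks.task_done()

    def run(self):
        while self.is_running:
            self.process_events(time_limit=0.1)

    def publish(self, message):
        # The append runs later, on the loop thread, not on the caller's thread.
        self.add_callback_threadsafe(lambda: self.sent.append(message))

    def stop(self):
        self._callbacks.join()   # wait until every queued callback has run
        self.is_running = False
        self.join()              # wait for run() to exit


publisher = EventLoopThread()
publisher.start()
for i in range(3):
    publisher.publish(f"message {i}")   # called from the main thread
publisher.stop()
```

The design point is that only the loop thread ever touches the resource; other threads merely enqueue work for it, which is the role `add_callback_threadsafe` plays in the solution above.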
I am trying to fetch messages from a consumer and send them to a queue. For this I am using stomp.py. After going through articles and posts, I wrote the code below:
import ssl
import time
import stomp

stompurl = "xxxxxxxx.mq.us-west-2.amazonaws.com"
stompuser = "stomuser"
stomppass = "password"

class MyListener(stomp.ConnectionListener):
    msg_list = []

    def __init__(self):
        self.msg_list = []

    def on_error(self, frame):
        self.msg_list.append('(ERROR) ' + frame.body)

    def on_message(self, frame):
        self.msg_list.append(frame.body)

conn = stomp.Connection(host_and_ports=[(stompurl, "61614")], auto_decode=True)
conn.set_ssl(for_hosts=[(stompurl, "61614")], ssl_version=ssl.PROTOCOL_TLS)
lst = MyListener()
conn.set_listener('', lst)
conn.connect(stompuser, stomppass, wait=True)
# conn.send(body='Test message', destination='Test_QUEUE')
conn.subscribe('Test_QUEUE', '102')
# Messages collected so far (usually none yet at this point)
print(lst.msg_list)
time.sleep(2)
messages = lst.msg_list
# conn.disconnect()
print(messages)
With this code I am able to send messages to Test_QUEUE, but I can't fetch all messages from the consumer. How can I pull all messages from a consumer and post them to a queue for processing?
I'm not a Python + STOMP expert, but in every other language I've used when you create an asynchronous (i.e. non-blocking) message listener as you have done then you must prevent your application from exiting. You have a time.sleep(2) in there, but is that realistically enough time to fetch all the messages from the queue?
It appears your application will exit after print(messages) which means that if you don't get all the messages during the time.sleep(2) then your application will simply terminate.
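One way to avoid guessing a sleep duration is to keep waiting for as long as messages keep arriving. The sketch below uses only the standard library: `CollectingListener` and `wait_until_idle` are hypothetical names, and the `fake_broker` thread merely simulates a broker invoking `on_message`. With stomp.py, the same logic would presumably live in the `stomp.ConnectionListener` subclass.

```python
import threading
import time


class CollectingListener:
    """Collects message bodies and records when the last one arrived."""

    def __init__(self):
        self.messages = []
        self._lock = threading.Lock()
        self._last_arrival = time.monotonic()

    def on_message(self, body):
        with self._lock:
            self.messages.append(body)
            self._last_arrival = time.monotonic()

    def wait_until_idle(self, idle_seconds, max_wait):
        """Return once no message has arrived for idle_seconds, or after max_wait."""
        deadline = time.monotonic() + max_wait
        while time.monotonic() < deadline:
            with self._lock:
                idle_for = time.monotonic() - self._last_arrival
            if idle_for >= idle_seconds:
                break
            time.sleep(0.01)
        with self._lock:
            return list(self.messages)


def fake_broker(listener, count):
    # Simulates a broker delivering messages on another thread.
    for i in range(count):
        time.sleep(0.02)
        listener.on_message(f"message {i}")


listener = CollectingListener()
threading.Thread(target=fake_broker, args=(listener, 5), daemon=True).start()
# Blocks the main thread until the simulated queue has gone quiet.
received = listener.wait_until_idle(idle_seconds=0.5, max_wait=5.0)
```

The idle threshold is still a heuristic: a quiet queue and a slow broker look the same from the client, so a long-running consumer would more commonly just stay alive and process messages as they arrive.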
I'm experimenting with creating a combination of the topic exchange mentioned in tutorial #5 and RPC mentioned in tutorial #6, and while it works once, it doesn't work again unless I restart the consumer code.
In the client code, which runs on the machine with the RabbitMQ server, I have register_request() which receives a message (from a higher level made with Flask) and adds it to the exchange based on a routing key, and then waits for a response. The callback reply_queue_callback() adds responses to a dictionary where the keys are the correlation ID.
class QueueManager(object):
    def __init__(self):
        """
        Initializes an exchange and a reply queue.
        """
        self.responses = {}
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
        atexit.register(self.close_connection)
        self.channel = self.connection.channel()
        self.channel.exchange_declare(exchange=EXCHANGE_NAME, type="topic")
        result = self.channel.queue_declare(exclusive=True)
        self.reply_queue = result.method.queue
        self.channel.basic_consume(self.reply_queue_callback, no_ack=True, queue=self.reply_queue)

    def close_connection(self):
        """
        Closes the connection to RabbitMQ. Runs upon destruction of the instance.
        """
        print "*** Closing queue connection..."
        self.connection.close()

    def reply_queue_callback(self, ch, method, props, body):
        """
        A callback that is executed when there's a new message in the reply queue.
        """
        self.responses[props.correlation_id] = literal_eval(body)

    def register_request(self, routing_key, message):
        """
        Adds a message to the exchange.
        """
        corr_id = str(uuid.uuid4())
        self.channel.basic_publish(exchange=EXCHANGE_NAME, routing_key=routing_key,
                                   properties=pika.BasicProperties(
                                       reply_to=self.reply_queue,
                                       correlation_id=corr_id),
                                   body=message)
        print "*** Sent request with correlation ID", corr_id
        return corr_id

    def fetch_response(self, corr_id):
        """
        A polling function that waits for a message in the reply queue.
        """
        print "Waiting for a response..."
        while not self.responses.get(corr_id):
            self.connection.process_data_events()
        return self.responses.pop(corr_id)
In the consumer's code, which runs on a separate machine, receive_requests() is the main function and request_callback() is the callback function for a new message.
def request_callback(ch, method, props, message):
    """
    A callback that is executed when a relevant message is found in the exchange.
    """
    print "Pulled a request with correlation ID %s" % props.correlation_id
    response = produce_response(message)
    print "Produced a response, publishing..."
    ch.basic_publish(exchange="",
                     routing_key=props.reply_to,
                     properties=pika.BasicProperties(correlation_id=props.correlation_id),
                     body=response)
    ch.basic_ack(delivery_tag=method.delivery_tag)
    print " [*] Waiting for new messages\n"

def receive_requests():
    """
    The main loop. Opens a connection to the RabbitMQ server and consumes messages from the exchange.
    """
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBITMQ_IP))
    channel = connection.channel()
    channel.exchange_declare(exchange=EXCHANGE_NAME, type="topic")
    result = channel.queue_declare(exclusive=True)
    queue_name = result.method.queue
    for binding_key in BINDING_KEYS:
        channel.queue_bind(exchange=EXCHANGE_NAME, queue=queue_name, routing_key=binding_key)
    channel.basic_consume(request_callback, queue=queue_name, no_ack=True)
    try:
        print(" [*] Waiting for messages. To exit press CTRL+C\n")
        channel.start_consuming()
    except KeyboardInterrupt:
        print "Aborting..."
When I produce a message the first time, the consumer handles it and I get a response back, but with the second message it seems that nothing reaches the consumer (the client prints that it added the new message to the exchange, but the consumer doesn't print anything). I assume something's wrong with the consumer, because if after the first message I restart the consumer's code and keep the client running as is, a second message works fine.
Any idea what the problem is? Perhaps I'm missing something in the consumer's callback?
I wrote a demo for chatting with other clients using pyxmpp2, but when the client is idle for about 5 minutes, the server disconnects it. I cannot configure the timeout in Openfire, so I decided to send a presence message at least once every 5 minutes. What puzzles me is when to send the presence message.
import pyxmpp2

class EchoBot(EventHandler, XMPPFeatureHandler):
    """Echo Bot implementation."""
    def __init__(self, my_jid, settings):
        version_provider = VersionProvider(settings)
        self.client = Client(my_jid, [self, version_provider], settings)

    @event_handler(AuthorizedEvent)
    def handle_authorized(self,event):
        presence = Presence(to_jid ="....",stanza_type = "available")
        self.client.stream.send(presence)

    def run(self):
        """Request client connection and start the main loop."""
        self.client.connect()
        self.client.run()

    def disconnect(self):
        """"""
        self.client.disconnect()

    def keepconnect(self):
        presence = Presence(to_jid ="....",stanza_type = "available")
        self.client.stream.send(presence)
        print "send presence"
    ....

bot = McloudBot(JID(mcloudbotJID), settings)
try:
    bot.run()
    t = threading.Thread(target=bot.run())
    timer=threading.Timer(5,bot.keepconnect())
    t.start()
    timer.start()
except KeyboardInterrupt:
    bot.disconnect()
But it doesn't seem to work...
Check out
http://community.igniterealtime.org/docs/DOC-2053
This details the idle-disconnect property in Openfire, which you can set to a value in milliseconds.
Disconnecting idle clients is important in session-based communications. It has more to do with detecting clients that closed unexpectedly than with clients that are merely idle, though.
You can implement ping or heartbeat packet sending in your client as you mention above. Maybe check out the Pidgin implementation of whitespace IQ requests.
Hope this steers you in the right direction.
James
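A minimal sketch of a client-side keepalive, assuming a `send_keepalive` callable that sends whatever the server counts as activity (a presence stanza, a ping IQ, or a whitespace character). `Keepalive` is a hypothetical helper built on the standard library, not part of pyxmpp2, and whether the send call may safely be made from another thread depends on the library, so that part is an assumption. Note that the callable is passed uncalled: `threading.Timer(5, bot.keepconnect())` runs `keepconnect` immediately and hands its return value to the timer, and a `threading.Timer` fires only once in any case.

```python
import threading
import time


class Keepalive:
    """Calls send_keepalive every `interval` seconds on a background thread."""

    def __init__(self, interval, send_keepalive):
        self._interval = interval
        self._send = send_keepalive
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def _loop(self):
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop sends one keepalive per interval until stopped.
        while not self._stopped.wait(self._interval):
            self._send()

    def stop(self):
        self._stopped.set()
        self._thread.join()


# Demo with a short interval and a stand-in for the real send function.
sent_at = []
keepalive = Keepalive(0.05, lambda: sent_at.append(time.monotonic()))
keepalive.start()
time.sleep(0.3)
keepalive.stop()
```

For a five-minute server timeout, an interval comfortably below that, for example `Keepalive(240, bot.keepconnect)`, would be a reasonable starting point.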