Paho MQTT (Python) - when to call client.loop_stop() during disconnection?

I'm writing a simple Python client using Paho MQTT to communicate with mosquitto. I'm running into some issues with client.disconnect() and more specifically client.loop_stop().
According to the Paho docs, loop_start() is used to start up a threaded network loop. I've found that the most reliable way to call this is immediately after calling client.connect(). Apparently calling it just prior can have some unexpected effects. Anyway, for me this works fine.
The issue is around when I call client.loop_stop() around the time I wish to disconnect. Some online tutorials show that the best place to call this is in the on_disconnect handler, to be sure that the disconnection has fully completed, and so that any pending subscription or unsubscription attempts have been fully handled by the broker. This appears to work, but if I then attempt to reconnect (by a call to client.connect()) then the connection attempt does not work correctly - the client state gets stuck midway through, and the mosquitto broker only reports the following:
1642374165: New connection from 127.0.0.1:39719 on port 1883.
And nothing else - the connection has not completed. I'm not sure whether the broker is waiting for something from the client (most likely?), or the client has sent bad data to the broker and triggered some kind of issue, but either way the connection is not valid.
If I move the loop_stop() call to just prior to the call to client.disconnect(), I get much more reliable behaviour, and the broker shows a proper subsequent connection attempt:
1642375893: New connection from 127.0.0.1:38735 on port 1883.
1642375893: New client connected from 127.0.0.1:38735 as client0 (p2, c0, k60).
1642375893: No will message specified.
1642375893: Sending CONNACK to client0 (1, 0)
However I understand that this may cause other issues - in particular, the disconnection may occur before any pending subscription or unsubscription requests have been fully processed, since the network loop is terminated before the disconnection is performed.
What I'd like to know is what's the official word on how to do a clean disconnection and terminate the network loop thread properly, without resorting to arbitrary time.sleep() delays to give things enough time to work themselves out.
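For reference, one pattern I've considered (a sketch only, assuming the paho-mqtt 1.x callback signature and a local broker) is to let on_disconnect merely signal an event, and to call loop_stop() back on the main thread once that event fires - the network thread gets to finish the DISCONNECT exchange, but isn't asked to stop itself from inside its own callback:

import threading
import paho.mqtt.client as mqtt

disconnected = threading.Event()

def on_disconnect(client, userdata, rc):
    # Runs on the network thread; just record that the broker closed the session.
    disconnected.set()

client = mqtt.Client(client_id="client0")
client.on_disconnect = on_disconnect
client.connect("127.0.0.1", 1883, keepalive=60)
client.loop_start()

# ... subscribe/publish work here ...

client.disconnect()            # queues the DISCONNECT packet for the network thread
disconnected.wait(timeout=5)   # wait until on_disconnect confirms the clean shutdown
client.loop_stop()             # now stop the network thread from the main thread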

Related

Notification for FIN/ACK using python socket

I have a basic implementation of a TCP client using Python sockets. All the client does is connect to a server and send heartbeats every X seconds. The problem is that I don't want to send the server a heartbeat if the connection is closed, but I'm not sure how to detect this situation without actually sending a heartbeat and catching an exception. When I turn off the server, I see the FIN/ACK arrive in the traffic capture and the client sends an ACK back - this is when I want my code to do something (or at least change some internal state of the connection). Currently, what happens is that after the server goes down and X seconds pass since the last heartbeat, the client tries to send another heartbeat; only then do I see an RST packet in the capture and get a broken pipe exception (errno 32). Clearly the Python socket handles the transport layer and the heartbeats are part of the application layer. The problem I want to solve is not sending the redundant heartbeat after the FIN/ACK has arrived from the server - is there any simple way to know the connection state with a Python socket?
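One way to notice the FIN without sending anything is to poll the socket for readability and peek at it: if recv() returns an empty byte string, the peer has closed its side. A minimal sketch (the helper name is mine), which the heartbeat loop could call before each send:

import select
import socket

def peer_closed(sock):
    """Return True if the peer has closed (FIN) or the socket has errored."""
    readable, _, _ = select.select([sock], [], [], 0)   # non-blocking readiness check
    if not readable:
        return False                                    # nothing pending, still connected
    try:
        data = sock.recv(1, socket.MSG_PEEK)            # peek so real data isn't consumed
    except OSError:
        return True                                     # RST or similar error: treat as closed
    return data == b""                                  # b"" from recv() means FIN received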

MQTT How to know the Broker Status

In a web application with MQTT in Python (using the paho-mqtt lib), I would like to know if there is a way to get the broker status in real time, because the only way I've found is to store the "rc" variable from the on_connect method, but that's more of a client/connection state.
EDIT 1: after reading the mosquitto broker documentation, I found that you can subscribe to '$SYS/broker/connection/#', which is supposed to give back 1 if the connection is up and 0 if it goes down. However, when I do:
subscribe.callback(self.message_callback, '$SYS/broker/connection/#', port = port, hostname=broker, auth=authentication, protocol=client.MQTTv31, tls=TLS)
it is impossible to get the payload and topic of this message, although I'm using exactly the same call to get messages from my sensors, except that the topic there is '#', and that works perfectly.
Does anyone know why?
There is no way to poll the state of the connection to the broker from the client.
The on_disconnect callback should be called when the connection to the broker is dropped.
This should be kicked off when the keep alive times out, but also as the result of a failure to publish (if you try to publish data before the timeout expires).
Also the rc from a call to the publish command will indicate if the connection has dropped.
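A minimal sketch of that approach (paho-mqtt 1.x callback signatures; the broker address and topic are made up): keep a flag updated from on_connect/on_disconnect and also check the rc of each publish:

import paho.mqtt.client as mqtt

connected = False   # updated only by the callbacks below

def on_connect(client, userdata, flags, rc):
    global connected
    connected = (rc == 0)       # rc 0 means the broker accepted the connection

def on_disconnect(client, userdata, rc):
    global connected
    connected = False           # fires on keep-alive timeout or any dropped link

client = mqtt.Client()
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.connect("broker.example.org", 1883, 60)   # hypothetical broker
client.loop_start()

# elsewhere in the application:
if connected:
    info = client.publish("sensors/test", "ping")    # hypothetical topic
    if info.rc != mqtt.MQTT_ERR_SUCCESS:             # publish rc also reveals a dead link
        connected = False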

Pika connection closed after 3 heartbeats

I'm writing a script which receives HTTP requests (using Tornado), parses them, and sends them to a RabbitMQ broker using pika.
The code looks like this:
import pika
import tornado.ioloop
import tornado.web

def main():
    conn_params = pika.ConnectionParameters(
        host=BROKER_NAME,
        port=BROKER_PORT,
        ssl=True,
        virtual_host=VIRTUAL_HOST,
        credentials=pika.PlainCredentials(BROKER_USER, BROKER_PASS),
        heartbeat_interval=HEARTBEAT_INTERVAL
    )
    conn = pika.BlockingConnection(conn_params)
    channel = conn.channel()
    # Create the web server which handles application requests.
    application = tornado.web.Application([
        (URL_BILLING, SomeHandler, dict(channel=channel))
    ])
    # Start the server
    application.listen(LISTENING_PORT)
    tornado.ioloop.IOLoop.instance().start()
As you can see, I open a single connection and channel, and pass the channel to any instance of the handler which is created, the idea being to save traffic and avoid opening a new connection/channel for every request.
The issue I'm experiencing is that the connection is closed after 3 heartbeats. I used Wireshark in order to figure out what the problem is, but all I can see is that the server sends a PSH (I'm assuming this is the heartbeat) and my scripts replies with an ACK. This happens 3 times with HEARTBEAT_INTERVAL in between them, and then the server just sends a FIN and the connection dies.
Any idea why that happens? Also, should I keep the connection open or is it better to create a new one for every message I need to send?
Thanks for the help.
UPDATE: I looked in the RabbitMQ log, and it says:
Missed heartbeats from client, timeout: 10s
I thought the server was meant to send heartbeats to the client to make sure it answers, and that agrees with what I observed with Wireshark, but from this log it seems it is the client that is meant to report to the server, not the other way around - and the client evidently doesn't. Am I getting this right?
UPDATE: Figured it out, sort of. A blocking connection (which is what I used) is unable to send heartbeats because it's, well, blocking. As mentioned in this issue, the heartbeat_interval parameter is only used to negotiate the connection with the server; the client doesn't actually send heartbeats. Since this is the case, what is the best way to keep a long-running connection with pika? Even if I don't specify heartbeat_interval, the server defaults to a heartbeat every 10 minutes, so the connection will die after 30 minutes...
For future visitors:
Pika has an async example which uses heartbeat:
http://pika.readthedocs.org/en/0.10.0/examples/asynchronous_publisher_example.html
For the Tornado-specific part, this example shows how to use Tornado's IOLoop with pika's async model:
http://pika.readthedocs.org/en/0.10.0/examples/tornado_consumer.html
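If you'd rather keep the BlockingConnection, one workaround (not from the linked examples - a sketch assuming pika >= 0.10, where process_data_events() accepts a time_limit) is to give pika regular CPU time from Tornado so heartbeat frames actually get sent and read:

import pika
import tornado.ioloop

def start_with_heartbeat_pump(conn):
    """conn is the pika.BlockingConnection created in main()."""
    # Pump pika's I/O periodically so heartbeat frames are serviced even though
    # Tornado's IOLoop owns the process; keep the period well under the
    # negotiated heartbeat interval.
    pump = tornado.ioloop.PeriodicCallback(
        lambda: conn.process_data_events(time_limit=0), 5000)  # period in milliseconds
    pump.start()
    tornado.ioloop.IOLoop.instance().start()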

Timeout SSL handshake

I'm trying to incorporate TimeoutMixin in a protocol over SSL. However, when the timeout occurs and it makes a call to transport.loseConnection(), nothing happens. I think this is related to this code in TLSMemoryBIOProtocol:
def _shutdownTLS(self):
    """
    Initiate, or reply to, the shutdown handshake of the TLS layer.
    """
    try:
        shutdownSuccess = self._tlsConnection.shutdown()
    except Error:
        # Mid-handshake, a call to shutdown() can result in a
        # WantWantReadError, or rather an SSL_ERR_WANT_READ; but pyOpenSSL
        # doesn't allow us to get at the error. See:
        # https://github.com/pyca/pyopenssl/issues/91
        shutdownSuccess = False
    self._flushSendBIO()
    if shutdownSuccess:
        # Both sides have shutdown, so we can start closing lower-level
        # transport. This will also happen if we haven't started
        # negotiation at all yet, in which case shutdown succeeds
        # immediately.
        self.transport.loseConnection()
The issue is that the time-out is happening before the handshaking can occur. On the server side it has a port open listening for connections but the server is frozen and can't do the proper handshaking. That code snippet looks like it fails to do the TLS shutdown and then does nothing.
My question is:
How do I set a timeout on the SSL handshaking? If the handshaking doesn't happen in a reasonable amount of time, how do I drop the connection properly? Also, is there anything wrong with changing the above snippet to drop the underlying lower-level connection regardless of whether the TLS connection was severed? (Just doing nothing and hanging indefinitely doesn't seem like the right approach.)
EDIT:
The failure of the call to loseConnection seems to happen only if any data has been sent beforehand; if nothing has been sent, it seems to work properly.
loseConnection is the API for an orderly connection shutdown. If you want to terminate the connection abruptly, abortConnection is the API for you.
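A minimal sketch of that (assuming a Twisted version recent enough to have ITCPTransport.abortConnection(); the protocol class here is made up): let TimeoutMixin's timeout abort the transport outright instead of attempting the orderly TLS shutdown:

from twisted.internet.protocol import Protocol
from twisted.protocols.policies import TimeoutMixin

class HandshakeTimeoutProtocol(Protocol, TimeoutMixin):
    def connectionMade(self):
        self.setTimeout(30)          # seconds to allow for the TLS handshake / first data

    def dataReceived(self, data):
        self.resetTimeout()          # any traffic resets the clock
        # ... normal protocol handling ...

    def timeoutConnection(self):
        # abortConnection() tears down the underlying transport immediately,
        # without waiting for the TLS shutdown handshake that loseConnection() wants.
        self.transport.abortConnection()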

Sleep after ZMQ connect?

In a ROUTER-ROUTER setup, after I connect one ROUTER socket to another, if I don't sleep (for say 0.1s or so) after I connect() to the other ROUTER socket, the send() usually doesn't go through (although it sometimes does, by chance).
Is there a way to make sure I am connected before I send?
Why aren't the send()s queued and properly executed until the connection is made?
Also, this is not about whether the server on the other end is alive but rather that I send() too soon after I connect() and somehow it fails. I am not sure why.
Is there a way to make sure I am connected before I send?
Not directly. The recommended approach is to use something like the Freelance Protocol and keep pinging until you receive a response. If you stop receiving responses to your pings, you should consider yourself disconnected.
Why aren't the send()s queued and properly executed until the connection is made?
A ROUTER cannot send a message to a peer until both sides have completed an internal ZeroMQ handshake. That's just the way it works, since the ROUTER requires the ID of its peer in order to "route". Apparently sleeping for 0.1 s is the right amount of time on your dev system. If you need the ability to connect and then send without sleeping or retrying, then you need to use a different pattern.
For example, with DEALER-ROUTER, a DEALER client can connect and immediately send, and ZeroMQ will queue the message until it can be delivered. The reason this works is that the DEALER does not require the ID of the peer, since it does not "route". When the ROUTER server receives the message, the handshake is already complete, so it can respond right away without sleeping.
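For what it's worth, a minimal pyzmq sketch of the DEALER-ROUTER version (endpoint address made up): the DEALER can send straight after connect(), and ZeroMQ queues the message until the handshake finishes:

import zmq

ctx = zmq.Context.instance()

router = ctx.socket(zmq.ROUTER)          # "server" side
router.bind("tcp://*:5555")

dealer = ctx.socket(zmq.DEALER)          # "client" side
dealer.connect("tcp://localhost:5555")
dealer.send(b"hello")                    # no sleep needed; queued until delivery is possible

identity, payload = router.recv_multipart()    # ROUTER sees [peer identity, message]
router.send_multipart([identity, b"world"])    # reply is routed back by that identity
print(dealer.recv())                           # b"world"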
