twisted - detection of lost connection takes more than 30 minutes


I've written a TCP client using Python and Twisted. It connects to a server and communicates using a simple string-based protocol (defined by the server manufacturer). The TCP/IP connection should persist, and reconnect in case of failure.
When some sort of network error occurs (I assume on the server side or on some node along the way), it takes a very long time for the client to notice and initiate a new connection, much more than a few minutes.
Is there a way to speed that up? Some sort of built-in TCP/IP keepalive functionality that can detect the disconnect sooner?
I can implement a keepalive mechanism myself and look for timeouts, but I'm not sure that's the best practice in this case. What do you think? Also, when using reactor.connectTCP() and reactor.run() with a ClientFactory, what's the best way to force a reconnection?

Application level keep-alives for TCP-based protocols are a good idea. You should probably implement this. This gives you complete and precise control over the timeout semantics you want from your application.
TCP itself has a keepalive mechanism. You can enable this with an ITCPTransport method call from your protocol. For example:
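    from twisted.internet.protocol import Protocol

    class YourProtocol(Protocol):
        def connectionMade(self):
            # Ask the OS to send TCP keepalive probes on this connection.
            self.transport.setTcpKeepAlive(True)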
The exact semantics of this keepalive are platform and configuration dependent. It's entirely possible this is already enabled and is what's detecting your connection loss. Thirty minutes is a pretty plausible amount of time for this mechanism to notice a lost connection.

As stated by Jean-Paul Calderone, you can either implement an application-level keepalive or use the TCP keepalive mechanism. The application-level keepalive is the preferred method, as it gives you more fine-grained control.
The TCP keepalive mechanism lives at the OS level, and the defaults are OS dependent but configurable. For example, the default Linux TCP keepalive works in the following way:
After 2 hours send a keepalive probe.
If this fails, send another probe every 75 seconds.
After 9 consecutive failures, mark the connection as closed. This will be picked up by the server and it will trigger whatever cleanup mechanisms it has in place.
See: https://en.wikipedia.org/wiki/Keepalive#TCP_keepalive and http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
So while the TCP keepalive will eventually reap your dead connections, it will take quite a long time to kick in.
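With the Linux defaults listed above, an idle dead connection is declared closed only after roughly 7200 + 9 × 75 = 7875 seconds (about 2 hours 11 minutes). As a rough sketch, assuming a platform that exposes the Linux-style per-socket options, the timers can be tightened on one socket instead of system-wide. The function names and the 60/10/3 values are illustrative and not part of the answer above. In Twisted, the underlying socket should be reachable through self.transport.getHandle() on a TCP transport.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time for TCP keepalive to give up on a silent peer."""
    return idle + interval * probes


def enable_keepalive(sock, idle=60, interval=10, probes=3):
    """Turn on TCP keepalive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket tunables use the Linux option names; other platforms
    # may lack some of them, so each one is guarded individually.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    # Linux defaults from the answer: 2 hours idle, 75 s interval, 9 probes.
    print(keepalive_detection_seconds(7200, 75, 9))
    # The illustrative values used by enable_keepalive().
    print(keepalive_detection_seconds(60, 10, 3))
```

Keepalive only helps on an otherwise idle connection, so an application-level keepalive is still the more predictable option.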

Related

Efficient way to send results every 1-30 seconds from one machine to another

Key points:
I need to send roughly ~100 float numbers every 1-30 seconds from one machine to another.
The first machine is catching those values through sensors connected to it.
The second machine is listening for them, passing them to an http server (nginx), a telegram bot and another program sending emails with alerts.
How would you do this and why?
Please be accurate. It's the first time I've worked with sockets and with Python, but I'm confident I can do this. Just give me the crucial details and enlighten me!
Some small portion (a few rows) of the core would be appreciated if you think it's a delicate part, but the main goal of my question is to see the big picture.

The main thing here is to decide on a connection design and choose a protocol: will you keep a persistent connection to your server, or connect each time new data is ready?
Then decide whether you will use HTTP POST, WebSockets, or ordinary sockets, and whether you will rely exclusively on nginx or run your data catcher as a separate service.
The most secure way, if other people will be connecting to nginx to view sites etc., is this:
Write or use another server running on another port, for example another nginx process just for that. Then use SSL (i.e. HTTPS) with basic authentication to prevent anyone else from abusing the connection.
Then on the client side, build a packet of all the data every x seconds (pickle.dumps() or json or something), connect to your port with your credentials, and pass the packet.
A Python script can wait for it there.
Or you can write a socket server from scratch in Python (not especially hard) to wait for your packets.
The caveat here is that you have to implement your own protocol and security. But you gain some other benefits: it is much easier to maintain a persistent connection if you want or need one. I don't think it is necessary, though, and coding recovery from broken connections can become bulky.
Without a persistent connection, just wait on some port for a connection. The client must clearly identify itself (else you instantly drop the connection), prove that it speaks your protocol, and then send the data.
Use SSL sockets so that you don't have to implement encryption yourself to protect the authentication data. You may even rely solely on keys set up in advance for security and then pass only data.
Do not worry about speed. Sockets are handled by the OS, and on a Unix-like system you can connect as many times as you want at as short an interval as you need. Nothing short of a DoS attack will impact it much.
On Windows, it is better to use a ready-made server, because Windows sometimes does not release a socket in time, so you will be forced to wait or resort to some hackery to avoid this unfortunate behaviour (non-blocking sockets, reuse addr, and then some flow control will be needed).
As long as your data is small, you don't have to worry much about the server protocol. I would use HTTPS myself, but I would write my own lightweight server in Python or modify and run one of the examples from the internet. That's me though.
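If you go the plain-socket route, the "packet" needs some framing so the receiver knows where one message ends. Here is a minimal sketch, assuming a 4-byte length prefix followed by a JSON payload; the helper names and the framing itself are illustrative choices, not something the answer prescribes. JSON is used rather than pickle because unpickling data from an untrusted peer can execute arbitrary code.

```python
import json
import socket
import struct

# 4-byte unsigned big-endian length header (an assumed framing convention).
HEADER = struct.Struct("!I")


def send_packet(sock, readings):
    """Serialize the readings as JSON and send them with a length prefix."""
    payload = json.dumps(readings).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-packet")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_packet(sock):
    """Receive one length-prefixed JSON packet and decode it."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return json.loads(_recv_exactly(sock, length).decode("utf-8"))


if __name__ == "__main__":
    # Demonstrate on a connected socket pair instead of a real network link.
    sender, receiver = socket.socketpair()
    with sender, receiver:
        send_packet(sender, [20.5, 21.0, 19.75])
        print(recv_packet(receiver))
```

The same two functions work whether you reconnect for every packet or keep one connection open, and the socket can be wrapped with the ssl module first if you need encryption.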

The simplest thing that could possibly work would be to take your N floats, convert them to a binary message using struct.pack(), and then send them via a UDP socket to the target machine (if it's on a single LAN you could even use UDP multicast, then multiple receivers could get the data if needed). You can safely send a maximum of 60 to 170 double-precision floats in a single UDP datagram (depending on your network).
This requires no application protocol, is easily debugged at the network level using Wireshark, is efficient, and makes it trivial to implement other publishers or subscribers in any language.
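A minimal sketch of that idea, assuming a made-up wire layout of a 16-bit count followed by that many doubles in network byte order. The function names are illustrative. With this layout, 100 doubles come to 802 bytes.

```python
import socket
import struct


def encode_readings(values):
    """Pack a count followed by the values as big-endian doubles."""
    # "!" selects network byte order with no alignment padding.
    return struct.pack(f"!H{len(values)}d", len(values), *values)


def decode_readings(datagram):
    """Inverse of encode_readings(): return the list of floats."""
    (count,) = struct.unpack_from("!H", datagram, 0)
    return list(struct.unpack_from(f"!{count}d", datagram, 2))


def send_readings(values, host, port):
    """Fire-and-forget: UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_readings(values), (host, port))


if __name__ == "__main__":
    datagram = encode_readings([1.5, -2.25, 3.0])
    print(len(datagram), decode_readings(datagram))
```

Because a lost datagram is simply gone, this fits best when an occasional missing sample is acceptable, as with periodic sensor readings.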

Using sniffing with python elasticsearch client to solve dead TCP connection issues

The Python elasticsearch client in my application is having connectivity issues (refused connections) because idle TCP connections time out due to a firewall (I have no way to prevent this).
The easiest way for me to fix this would be to prevent the connection from going idle by sending some data over it periodically. The sniffing options in the elasticsearch client seem ideal for this, but they're not very well documented:
sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time
sniffer_timeout – number of seconds between automatic sniffs
sniff_on_connection_fail – flag controlling if connection failure triggers a sniff
sniff_timeout – timeout used for the sniff request - it should be a fast api call and we are talking potentially to more nodes so we want to fail quickly. Not used during initial sniffing (if sniff_on_start is on) when the connection still isn’t initialized.
What I would like is for the client to sniff every (say) 5 minutes. Should I be using the sniff_timeout or the sniffer_timeout option? Also, should the sniff_on_start parameter be set to True?

I used the suggestion from @val and found that these settings solved my problem:
sniff_on_start=True
sniffer_timeout=60
sniff_on_connection_fail=True
The sniffing puts enough traffic on the TCP connections that they are never idle long enough for our firewall to kill the connection.

UDP Server in Python

How can I create a UDP server in Python that can tell when a client has disconnected? The server needs to be fast because I will use it in an MMORPG. I've never written a UDP server, so I'm having a little trouble.

There is no such thing as a connection in UDP. Because of this, it becomes your responsibility to detect if the client has disconnected. Generally speaking, your protocol should implement a way for the client to notify the server that it is ending its session. Additionally, you will need to implement some type of timeout functionality such that after a certain period of inactivity, the session is ended.
Note that UDP is more difficult to work with than TCP because packets are not always guaranteed to be delivered. Depending on what you are doing, you might need to implement some type of check to ensure that packets that are not delivered are sent again. TCP does this for you, but it also has the side effect of making the protocol slower.
This answer provides some more considerations: https://stackoverflow.com/a/57489/4250606
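The two mechanisms described above (an explicit "goodbye" message plus an inactivity timeout) can be sketched as a small bookkeeping class. This is only an illustration: the SessionTable name, its methods, and the 30-second default are assumptions, and the clock is injectable so the logic can be exercised without waiting.

```python
import time


class SessionTable:
    """Track when each UDP client address was last heard from."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_seen = {}

    def touch(self, addr):
        """Call for every datagram received from addr."""
        self._last_seen[addr] = self._clock()

    def leave(self, addr):
        """Call when addr sends an explicit end-of-session message."""
        self._last_seen.pop(addr, None)

    def expire(self):
        """Drop and return the addresses silent for longer than timeout."""
        now = self._clock()
        dead = [a for a, t in self._last_seen.items() if now - t > self.timeout]
        for addr in dead:
            del self._last_seen[addr]
        return dead

    def active(self):
        """Return the addresses currently considered connected."""
        return sorted(self._last_seen)
```

A server loop would call touch() with the address returned by recvfrom() for each datagram and run expire() periodically to treat the returned addresses as disconnected.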

UDP is not connection-based. Since no connection exists when using UDP, there is nothing to disconnect. Since there is nothing to disconnect, you can't ever know when something disconnects. It never will because it was never connected in the first place.

Close inactive connections in Twisted

I'm running a Twisted server with the LineReceiver protocol. Sometimes clients will disconnect silently, so Twisted keeps the connection open. And because the server doesn't send anything unless requested of it, there's never a TCP timeout. In other words, some connections are never closed server-side.
How can I have Twisted close a connection that's been inactive for a few hours?

You can schedule timed events using reactor.callLater. Based on this, there's a helper for adding timeouts to protocols, twisted.protocols.policies.TimeoutMixin.
Another approach is to use TCP keep-alives, which you can enable using the transport's setTcpKeepAlive method.
And another approach is to use application-level keep-alives. Essentially, send a "noop" once in a while. It doesn't need a response. If the connection has been lost, the extra data in the send buffer will cause the TCP stack to eventually notice.
See also the FAQ entry.

How to detect non-graceful disconnect of Twisted on Linux?

I wrote a server based on Twisted and ran into a problem: some clients disconnect non-gracefully, for example when the user pulls out the network cable.
After a while, the client on Windows is disconnected (connectionLost is called; the client is also written with Twisted). But on the Linux server side, my connectionLost is never triggered, even when the server tries to write data to the client after the connection is lost. Why can't Twisted detect these non-graceful disconnections (even when writing data to the client) on Linux? How can I make Twisted detect them? Because Twisted can't detect non-graceful disconnections, I have lots of zombie users on my server.
---- Update ----
I thought it might be a feature of sockets on Unix-like OSes, so what is the behavior of sockets on Unix-like systems in a situation like this?
Thanks.
Victor Lin.

You're describing the behavior of TCP connections on an unreliable network. Twisted is merely exposing this behavior: after all, when you set up a TCP connection with Twisted, it is nothing more than a TCP connection.
You're mistaken when you say that the connectionLost callback isn't invoked even if you try to send data over it. After two minutes, the underlying TCP connection will disappear and Twisted will inform you of this by calling connectionLost.
If you need to detect this condition more quickly than that, then you can implement your own timeouts using reactor.callLater.

Seconding what Jean-Paul said: if you need more fine-grained TCP connection management, just use reactor.callLater. We have exactly that implementation on a Twisted/wxPython trading platform, and it works a treat. You might also want to tweak the behaviour of the ReconnectingClientFactory in order to achieve the results I understand you're looking for.
