Monitoring a TCP port - Python

For fun, I've been toying around with writing a load balancer in Python and have been trying to figure out the best (correct?) way to test whether a port is available and the remote host is still there.
I'm finding that, once connected, it becomes difficult to tell when the remote host goes down. I've turned keepalive on, but can't get it to recognize a downed connection sooner than a minute (I realize polling more often than once a minute might be overkill, but let's say I wanted to), even after setting the various TCP_KEEPALIVE options to their lowest values.
When I use nonblocking sockets, I've noticed that recv() returns an error ("resource temporarily unavailable") when it reads from a live socket, but returns "" when reading from a dead one (a send and recv of 0 bytes, which might be the cause?). That seems like an odd way to test whether it's connected, though, and it makes it impossible to tell whether the connection died after some data has already been sent.
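For reference, the kind of check I'm describing looks roughly like this (just a sketch; the backend address is made up):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("backend.example", 8080))  # placeholder backend
sock.setblocking(False)

try:
    data = sock.recv(4096)
except BlockingIOError:
    # "Resource temporarily unavailable": nothing to read, peer still looks alive
    alive = True
else:
    # recv() returning b"" means the peer performed an orderly close (FIN);
    # anything else is real payload that this check would consume
    alive = (data != b"")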
Aside from connecting/disconnecting for every check, is there something I can do? Can I manually send a TCP keepalive, or can I establish a lower-level connection that will let me test connectivity without sending real data that the remote server would potentially process?

I'd recommend not leaving your (single) test socket connected - make a new connection each time you need to poll. Every load balancer / server availability system I've ever seen uses this method instead of a persistent connection.
If the remote server hasn't responded within a reasonable amount of time (e.g. 10s), mark it as "down". Use timers and signals rather than function return codes to handle that timeout.
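As a minimal sketch (using a socket timeout for brevity rather than signals; the backend host name is a placeholder), each poll simply opens and closes a fresh connection:

import socket

def is_up(host, port, timeout=10.0):
    # Open a brand-new TCP connection for this poll; any failure or a
    # handshake slower than `timeout` seconds counts as "down".
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. called from the load balancer's health-check loop:
# if not is_up("backend.example", 8080): mark the backend as down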

"it becomes difficult to tell when the remote host goes down"
Correct. This is a feature of TCP. The whole point of TCP is to have an enduring connection between ports. Theoretically an application can drop and reconnect to the port through TCP (the socket libraries don't provide a lot of support for this, but it's part of the TCP protocol).

Ping was invented for that purpose.
You might also be able to send unsolicited TCP packets to your destination. For example, the TCP header has a flag that signals the end of transmission: the FIN flag. If you send a segment with ACK and FIN set, the remote host should complain with a return packet, and you'll be able to evaluate the round-trip time.
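For example, a rough Scapy sketch of such a probe (the target address is hypothetical; an unsolicited FIN/ACK normally provokes a RST from a live host, which gives you a round-trip measurement):

import time
from scapy.all import IP, TCP, sr1

probe = IP(dst="192.0.2.10") / TCP(dport=80, flags="FA")  # FIN+ACK out of the blue
t0 = time.time()
reply = sr1(probe, timeout=2)  # usually a RST if the host is up
if reply is not None:
    print("host answered in %.1f ms" % ((time.time() - t0) * 1000))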

It is theoretically possible to spam keepalive packets, but to set a very low interval you may need to dig into raw sockets. Also, the remote host may ignore them if they come in too fast.
The best way to check whether a host is alive over a TCP connection is to send data and wait for an ACK packet. If the ACK arrives, the send() function will return non-zero.

You can use Bash's pseudo-device files to open a TCP/UDP connection to a specific port, for example:
printf "" > /dev/tcp/example.com/80 && echo Works
This opens the connection but doesn't send anything. You can test it with:
nc -vl 1234 &
printf "" > /dev/tcp/localhost/1234
For simple monitoring, use cron with the above command, or use watch:
watch -x bash -c 'echo > /dev/tcp/localhost/1234 && echo Works || echo FAIL'
However, it's recommended to use dedicated tools designed for this, such as Monit, Nagios, etc.
Monit
Here is an example rule using Monit:
# Verify host.
check host example with address example.com
    if failed port 80 protocol http then alert

Related

Why is there a discrepancy between python sockets and tcp ping for the same IP:port destination?

My setup:
I am using an IP and port provided by portmap.io to allow me to perform port forwarding.
I have OpenVPN installed (as required by portmap.io), and I run a ready-made config file when I want to operate my project.
My main effort involves sending messages between a client and a server using sockets in Python.
I have installed a tool called tcping, which basically allows me to ping an IP:port over a TCP connection.
Results I'm getting:
When I try to "ping" said IP, the average RTT ends up being around 30ms consistently.
When I use the same IP for socket programming in Python, with a server script running on my machine and a client script on another machine connecting to this IP, and I send a small message like "Hello" over the socket, the message takes a significantly longer, and inconsistent, amount of time to travel across. Sometimes it takes 1 second, sometimes 400ms...
What is the reason for this discrepancy?
"What is the reason for this discrepancy?"
tcping just measures the time needed to establish the TCP connection. Connection establishment is usually handled entirely in the OS kernel, so there isn't even a switch to user space involved.
Even a small data exchange at the application level is significantly more expensive. First, the TCP handshake must complete; usually only then does the client start sending the payload. That payload then has to be delivered to the other side, placed into the socket's read buffer, the user-space application scheduled to run, the data read from the buffer and processed, a response created and handed to the peer's OS kernel, the response delivered back to the local system, and so on, until the local application finally gets the response and stops timing how long all of this took.
Given how far the measured time is from the pure RTT, though, I would assume that the server system either has low performance or is under high load, or that the application is written badly.
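A rough way to see this difference yourself (assuming an HTTP server at a placeholder address) is to time the handshake and a full request/response separately:

import socket
import time

host, port = "example.com", 80  # placeholder target

t0 = time.perf_counter()
sock = socket.create_connection((host, port), timeout=5)   # roughly what tcping measures
t1 = time.perf_counter()

sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")  # full application round trip
sock.recv(4096)
t2 = time.perf_counter()
sock.close()

print("handshake: %.1f ms, request/response: %.1f ms"
      % ((t1 - t0) * 1000, (t2 - t1) * 1000))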

How to know if the remote TCP device is powered off

In my Go code, I am establishing a TCP connection as below:
conn, err1 := net.Dial("tcp", <remote_address>)
if err1 == nil {
    buf := make([]byte, 256)
    // Read returns the number of bytes read; io.EOF means the peer
    // closed the connection in an orderly way.
    _, err := conn.Read(buf)
    if err == io.EOF {
        // handle the remote connection being closed
        fmt.Println("connection got reset by peer")
        panic(err)
    }
}
To simulate the other end, I am running a Python script on a different computer, which opens a socket and sends some random data to the socket the above code is listening on. Now my problem is: when I kill this Python script by pressing Ctrl+C, the remote-connection-closed event is recognised fine by the above code and I get a chance to handle it.
However, if I simply turn off the remote computer (where the Python script is running), my code doesn't get notified at all.
In my case, the connection should always stay open and be able to send data at arbitrary times, and my Go code should be notified only if the remote machine gets powered off.
Can someone help me with this scenario: how would I get notified when the remote machine hosting the socket gets powered off? How would I get that trigger in my Go code?
PS - This seems to be a pretty common problem in real deployments, though not in a testing environment.
There is no way to determine the difference between a host that is powered off and a connection that has been broken, so you treat them the same way.
You can send a heartbeat message on your own, and close the connection when you reach some timeout period between heartbeat packets. The timeout can either be set manually by timing the packets, or you can use SetReadDeadline before each read to terminate the connection immediately when the deadline is reached.
You can also use TCP Keepalive to do this for you, using TCPConn.SetKeepAlive to enable it and TCPConn.SetKeepAlivePeriod to set the interval between keepalive packets. The time it takes to actually close the connection will be system dependent.
You should also set a timeout when dialing, since connecting to a down host isn't guaranteed to return an ICMP Host Unreachable response. You can use DialTimeout, a net.Dialer with the Timeout parameter set, or Dialer.DialContext.
Simply reading through the stdlib documentation should provide you with plenty of information: https://golang.org/pkg/net/
You need to add some kind of heartbeat message. Then, looking at the Go documentation, you can use DialTimeout instead of Dial; each time you receive the heartbeat message (or any other data) you can reset the timeout.
Another alternative is to use TCP keepalive, which you can enable in Python using setsockopt. I can't really help you with Go, but this link seems like a good description of how to enable keepalive there:
http://felixge.de/2014/08/26/tcp-keepalive-with-golang.html
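For the Python side, a minimal sketch of enabling and tuning keepalive with setsockopt (the peer address is a placeholder, and the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT constants are Linux-specific):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("192.0.2.10", 9000))  # placeholder peer

# Turn keepalive on for this connection
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific tuning: first probe after 10s of idle time, then a probe
# every 5s, and drop the connection after 3 unanswered probes.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 5)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)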

Using sniffing with python elasticsearch client to solve dead TCP connection issues

The Python Elasticsearch client in my application is having connectivity issues (refused connections) because idle TCP connections time out due to a firewall (which I have no way to prevent).
The easiest way for me to fix this would be to prevent the connection from going idle by sending some data over it periodically. The sniffing options in the Elasticsearch client seem ideal for this; however, they're not very well documented:
sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time
sniffer_timeout – number of seconds between automatic sniffs
sniff_on_connection_fail – flag controlling if connection failure triggers a sniff
sniff_timeout – timeout used for the sniff request - it should be a fast api call and we are talking potentially to more nodes so we want to fail quickly. Not used during initial sniffing (if sniff_on_start is on) when the connection still isn’t initialized.
What I would like is for the client to sniff every (say) 5 minutes; should I be using the sniff_timeout or sniffer_timeout option? Also, should the sniff_on_start parameter be set to True?
I used the suggestion from #val and found that these settings solved my problem:
sniff_on_start=True
sniffer_timeout=60
sniff_on_connection_fail=True
The sniffing puts enough traffic on the TCP connections that they are never idle long enough for our firewall to kill the connection.
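For context, this is roughly how those settings are passed when constructing the client (the node list is a placeholder; the keyword names are the ones quoted from the docs above, as accepted by the 7.x-era elasticsearch-py client):

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["http://es-node1:9200", "http://es-node2:9200"],  # placeholder nodes
    sniff_on_start=True,            # fetch the node list as soon as the client is created
    sniffer_timeout=60,             # re-sniff (and so exercise the connections) every 60 seconds
    sniff_on_connection_fail=True,  # re-sniff immediately when a connection is refused or reset
)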

Python Twisted client not able to receive response from server

I have a client written using python-twisted (http://pastebin.com/X7UYYLWJ) which sends a UDP packet to a UDP server written in C using libuv. When the client sends a packet to the server, it is successfully received, and the server sends a response back to the Python client. But the client is not receiving any response; what could be the reason?
Unfortunately for you, there are many possibilities.
Your code uses connect to set up a "connected UDP" socket. Connected UDP sockets filter the packets they receive. If packets are received from any address other than the one to which the socket is connected, they are dropped. It may be that the server sends its responses from a different address than you've connected to (perhaps it uses another port or perhaps it is multi-homed and uses a different IP).
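As an illustration (plain sockets rather than Twisted; the server address is a placeholder), this is the filtering behaviour a connected UDP socket gives you:

import socket

# After connect(), the kernel silently drops any datagram whose source
# address and port don't match exactly, so a reply sent from a different
# port on the server never reaches recv().
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("192.0.2.10", 5005))  # placeholder server address/port
sock.settimeout(2)
sock.send(b"hello")
try:
    reply = sock.recv(2048)   # only datagrams from 192.0.2.10:5005 arrive here
except socket.timeout:
    reply = None              # replies from any other source were discarded

# An unconnected socket (bind() + recvfrom(), no connect()) accepts
# datagrams from any peer, which is the permissive setup suggested below.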
Another possibility is that a NAT device is blocking the return packets. UDP NAT hole punching has come a long way but it's still not perfect. It could be that the server's response arrives at the NAT device and gets discarded or misrouted.
Related to this is the possibility that an intentionally configured firewall is blocking the return packets.
Another possibility is that the packets are simply lost. UDP is not a reliable protocol. A congested router, faulty networking gear, or various other esoteric (often transient) concerns might be resulting in the packet getting dropped at some point, instead of forwarded to the next hop.
Your first step in debugging this should be to make your application as permissive as possible. Get rid of the use of connected UDP so that all packets that make it to your process get delivered to your application code.
If that doesn't help, use tcpdump or wireshark or a similar tool to determine if the packets make it to your computer at all. If they do but your application isn't seeing them, look for a local firewall configuration that might reject them.
If they're not making it to your computer, see if they make it to your router. Use whatever diagnostic tools are available (along the lines of tcpdump) on your router to see whether packets make it that far or not. Or if there are no such tools, remove the router from the equation. If you see packets making it to your router but no further, look for firewall or NAT configuration issues there.
If packets don't make it as far as your router, move to the next hop you have access to. This is where things might get difficult since you may not have access to the next hop or the next hop might be the server (with many intervening hops - which you have to just hope are all working).
Does the server actually generate a reply? What addressing information is on that reply? Does it match the client's expectations? Does it get dropped at the server's outgoing interface because of congestion or a firewall?
Hopefully you'll discover something interesting at one of these steps and be able to fix the problem.
I had a similar problem. The problem was the Windows firewall. Allowing communication for pythonw/python in the firewall's allowed-programs settings solved the problem. My Python program was:
from socket import *
import time
address = ('192.168.1.104', 42)  # Define who you are talking to (must match Arduino IP and port)
client_socket = socket(AF_INET, SOCK_DGRAM)  # Set up the socket
client_socket.bind(('', 45))  # Arduino sending to port 45
client_socket.settimeout(1)  # Only wait 1 second for a response
data = "xyz"
client_socket.sendto(data, address)
try:
    rec_data, addr = client_socket.recvfrom(2048)  # Read response from Arduino
    print rec_data  # Print the response from the Arduino
except timeout:
    pass  # No response within the 1 second timeout
while True:
    pass

Unwanted RST TCP packet with Scapy

In order to understand how TCP works, I tried to forge my own TCP SYN/SYN-ACK/ACK handshake (based on this tutorial: http://www.thice.nl/creating-ack-get-packets-with-scapy/).
The problem is that whenever my computer receives the SYN-ACK from the server, it generates a RST packet that stops the connection process.
I tried this on OS X Lion and on Ubuntu 10.10 Maverick Meerkat; both reset the connection. I found this: http://lkml.indiana.edu/hypermail/linux/net/0404.2/0021.html, but I don't know if it is the reason.
Could anyone tell me what the reason might be, and how to avoid this problem?
Thank you.
The article you cited makes this pretty clear...
Since you are not completing the full TCP handshake, your operating system might try to take control and start sending RST (reset) packets. To avoid this, we can use iptables:
iptables -A OUTPUT -p tcp --tcp-flags RST RST -s 192.168.1.20 -j DROP
Essentially, the problem is that Scapy runs in user space, and the Linux kernel receives the SYN-ACK first. The kernel will send a RST because it doesn't have a socket open on the port number in question, before you have a chance to do anything with Scapy.
The solution (as the blog mentions) is to firewall your kernel from sending a RST packet.
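As a rough sketch of what the forged handshake looks like in Scapy once the kernel's RST is suppressed (the target address and ports are placeholders):

from scapy.all import IP, TCP, send, sr1

target = "192.0.2.10"
sport, dport = 40000, 80

# SYN
syn = IP(dst=target) / TCP(sport=sport, dport=dport, flags="S", seq=1000)
synack = sr1(syn, timeout=2)

# ACK completing the handshake; this only survives if the kernel's own RST
# has been suppressed, e.g. with the iptables rule above.
if synack is not None and synack.haslayer(TCP):
    ack = IP(dst=target) / TCP(sport=sport, dport=dport, flags="A",
                               seq=synack[TCP].ack, ack=synack[TCP].seq + 1)
    send(ack)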
I don't have a non-iptables answer, but there is another way to fix the reset issue. Instead of filtering the outgoing reset in the filter table, filter all of the incoming packets from the target in the raw table. This prevents the return packets from the target from even being processed by the kernel, though Scapy still sees them. I used the following syntax:
iptables -t raw -A PREROUTING -p tcp --dport <source port I use for scapy traffic> -j DROP
This solution does force me to use the same source port for my traffic; feel free to use your own iptables-fu to identify your target's return packets.
The blog article cited in other answers is not entirely correct. It's not only that you aren't completing the three-way handshake, it's that the kernel's IP stack has no idea that there's a connection happening. When it receives the SYN-ACK, it sends a RST-ACK because it's unexpected. Receiving first or last really doesn't enter into it. The stack receiving the SYN-ACK is the issue.
Using iptables to drop outbound RST packets is a common and valid approach, but sometimes you need to send a RST from Scapy. A more involved but very workable approach is to go lower, generating and responding to ARP with a MAC address that is different from the host's. This gives you the ability to send and receive anything without any interference from the host.
Clearly this is more effort. Personally, I only take this approach (as opposed to the RST dropping approach) when I actually need to send a RST myself.
I found a solution without iptables at https://widu.tumblr.com/post/43624355124/suppressing-tcp-rst-on-raw-sockets.
To bypass this problem, simply create a standard TCP socket as a server socket and bind to the requested port. Don’t do accept().
Just socket(), bind() on the port, and listen(). This relaxes the kernel and lets you do the 3-way handshake.
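A minimal sketch of that trick, as described in the post (the port is a placeholder and must match the source port used in the forged packets):

import socket

SPORT = 40000  # must match the sport used in the Scapy SYN

# Bind and listen on the source port, but never call accept(); per the post
# above, this keeps the kernel from answering the handshake with RSTs while
# Scapy drives it.
blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
blocker.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
blocker.bind(("", SPORT))
blocker.listen(1)
# ... now run the Scapy handshake using SPORT as the source port ...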
