Python; Troubles controlling dead sockets through select

I have some code which connects to a host and does nothing but listen for incoming data until either the client is shut down or the host sends a close statement. For this my code works well.
However, when the host dies without sending a close statement, my client keeps listening for incoming data forever, as expected. To resolve this I made the socket time out every foo seconds and start the process of checking whether the connection is alive or not. From the Python socket howto I found this:
One very nasty problem with select: if somewhere in those input lists of sockets is one which has died a nasty death, the select will fail. You then need to loop through every single damn socket in all those lists and do a select([sock],[],[],0) until you find the bad one. That timeout of 0 means it won’t take long, but it’s ugly.
# Example code written for this question.
from select import select
from socket import socket, AF_INET, SOCK_STREAM

# Named "sock" so the socket class is not shadowed.
sock = socket(AF_INET, SOCK_STREAM)
sock.connect(('localhost', 12345))
socklist = [sock]
attempts = 0

def check_socklist(socks):
    for s in socks:
        (r, w, e) = select([s], [], [], 0)
        ...
        ...
        ...

while True:
    (r, w, e) = select(socklist, [], [], 60)
    for s in r:
        if s is sock:
            msg = s.recv(4096)
            if not msg:
                attempts += 1
                if attempts >= 10:
                    check_socklist(socklist)
                    break
            else:
                attempts = 0
                print(msg)
This text raises three questions.
1. I was taught that to check whether a connection is alive, one has to write to the socket and see if a response comes back. If not, the connection has to be assumed dead. The text says that to check for bad connections, one singles out each socket, passes it as select's first parameter, and sets the timeout to zero. How will this confirm whether the socket is dead or not?
2. Why not test whether the socket is dead or alive by trying to write to it instead?
3. What am I looking for when the connection is alive and when it is dead? Select will time out at once, so having no data there proves nothing.
I realize there are libraries like gevent, asyncore and twisted that can help me with this, but I have chosen to do this myself to get a better understanding of what is happening and more control over the source.

If a connected client crashes or exits, but its host OS and computer are still running, then its OS's TCP stack will send your server a FIN packet to let your computer's TCP stack know that the TCP connection has been closed. Your Python app will see this as select() indicating that the client's socket is ready-for-read, and then when you call recv() on the socket, recv() will return an empty string (zero bytes). When that happens, you should respond by closing the socket.
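A minimal sketch of that check in Python 3, assuming a connected stream socket. The helper name `peer_has_closed` is made up for illustration, and `MSG_PEEK` is used so that pending data is not consumed by the check:

```python
import select
import socket


def peer_has_closed(sock, timeout=0.0):
    """Return True if the peer has closed the connection in an orderly way."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Nothing to read: this proves nothing either way.
        return False
    # Ready-for-read plus an empty result means end-of-stream (FIN received).
    # MSG_PEEK looks at pending data without removing it from the buffer.
    return sock.recv(1, socket.MSG_PEEK) == b""
```

If unread data is still queued ahead of the FIN, the helper reports False until that data has been read, which matches what a normal recv() loop would see.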
If the connected client's computer never gets a chance to send a FIN packet, on the other hand (e.g. because somebody reached over and yanked its Ethernet cord or power cable out of the socket), then your server won't realize that the TCP connection is defunct for quite a while -- possibly forever. The easiest way to avoid having a "zombie socket" is simply to have your server send some dummy data on the socket every so often, e.g. once per minute or something. The client should know to discard the dummy data. The benefit of sending the dummy data is that your server's TCP stack will then notice that it's not getting any ACK packets back for the data packet(s) it sent, and will resend them; and after a few resends your server's TCP stack will give up and decide that the connection is dead, at which point you'll see the same behavior that I described in my first paragraph.
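The dummy-data idea could be sketched as follows in Python 3. The payload value and the helper name `send_heartbeat` are assumptions for illustration; the real payload is whatever the client has agreed to discard:

```python
import socket

# Hypothetical dummy payload; the peer must know to ignore it.
HEARTBEAT = b"\x00"


def send_heartbeat(sock, payload=HEARTBEAT):
    """Send one dummy message; return False once the stack reports an error."""
    try:
        sock.sendall(payload)
        return True
    except OSError:
        # Covers BrokenPipeError, ConnectionResetError, timeouts, etc.
        return False
```

On a real TCP connection the first heartbeat after a silent peer death will usually still return True, because the data is only queued locally. The failure shows up on a later send or recv, once the retransmissions have given up.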

1. If you write something to a socket and then wait for an answer to check the connection, the server must support such "ping" messages, and that is not always the case. A server that doesn't expect the message may crash or disconnect your client.
If select failed in the way you described, the socket framework knows which socket is dead; you just need to find it. But if the socket died a nasty death, such as a crash of the server app, that doesn't necessarily mean the client's socket framework will detect it. For example, when a client is waiting for messages from the server and the server crashes, in some cases the client can wait forever. To avoid this scenario, PuTTY, for example, can use an application-protocol-level ping (the SSH ping option) to check the connection to the server; an SSH server can use TCP keepalive to check the connection and to prevent network equipment from dropping idle connections.
2. (See p. 1.)
3. You are right that select's timeout and having no data proves nothing. As the documentation says, you have to check every socket when select fails.
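The per-socket check from the howto might look like this in Python 3. This is only a sketch: `find_bad_sockets` is a made-up name, and "bad" here means that select itself raises for that socket (for example because its file descriptor is closed or invalid), not that the remote peer has silently gone away:

```python
import select


def find_bad_sockets(socks):
    """Return the sockets for which a zero-timeout select() raises."""
    bad = []
    for sock in socks:
        try:
            select.select([sock], [], [], 0)
        except (OSError, ValueError):
            # ValueError: the socket object was already closed (fileno() is -1).
            # OSError: the OS rejected the descriptor (e.g. EBADF).
            bad.append(sock)
    return bad
```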

Related

robust continuous TCP connection (python socket)

My goal is to establish a continuous and robust TCP connection between one server and exactly one client. If one side fails, the other one should wait until it recovers.
I wrote the following code based on this question (that only asks for continuous, but not robust TCP connections and does not handle keepalive issues), this post and my own experience.
I have two questions:
How can I make the keepalive work? If the server dies, the client only recognizes it after trying to send() - which worked also without the KEEPALIVE option as this results in a connection reset. Is there some way that the socket sends an interrupt for a connection that is dead or some keepalive function that I can check on a regular basis?
Is this a robust way of handling a continuous TCP connection? Having a stable, continuous TCP connection seems to be a standard problem; however, I couldn't find tutorials covering this in detail. There must be some best practice.
Note, I could handle keep alive messages on my own at the application level. However, as TCP already implements this at transport level, it is better to rely on this service provided by the lower level.
The server:
from socket import *

serverPort = 12000
while True:
    # 1. Configure server socket
    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    serverSocket.bind(('127.0.0.1', serverPort))
    serverSocket.listen(1)
    print("waiting for client connecting...")
    connectionSocket, addr = serverSocket.accept()
    connectionSocket.setsockopt(SOL_SOCKET, SO_KEEPALIVE, 1)
    print(connectionSocket.getsockopt(SOL_SOCKET, SO_KEEPALIVE))
    print("...connected.")
    # Destroy the server socket; we don't need it anymore since we are
    # not accepting any connections beyond this point.
    serverSocket.close()

    # 2. communication routine
    while True:
        try:
            sentence = connectionSocket.recv(512).decode()
        except ConnectionResetError as e:
            print("Client connection closed")
            break
        if len(sentence) == 0:  # close if client closed connection
            break
        else:
            print("recv: " + str(sentence))

    # 3. proper closure
    connectionSocket.shutdown(SHUT_RDWR)
    connectionSocket.close()
    print("connection closed.")
The client:
from socket import *
import time

while True:
    # 1. configure socket dest.
    serverName = '127.0.0.1'
    serverPort = 12000
    clientSocket = socket(AF_INET, SOCK_STREAM)
    try:
        clientSocket.setsockopt(SOL_SOCKET, SO_KEEPALIVE, 1)
        clientSocket.connect((serverName, serverPort))
        print(clientSocket.getsockopt(SOL_SOCKET, SO_KEEPALIVE))
    except ConnectionRefusedError as e:
        print("Server refused connection. retrying")
        time.sleep(1)
        continue

    # 2. communication routine
    while True:
        sentence = input('input sentence: ')
        if sentence == "close":
            break
        try:
            clientSocket.send(sentence.encode())
        except ConnectionResetError as e:
            print("Server connection closed")
            break

    # 3. proper closure
    clientSocket.shutdown(SHUT_RDWR)
    clientSocket.close()
I tried to keep this example as minimal as possible, but given the requirement of robustness, it is relatively long.
I also tried some socket options such as TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT.
Thank you!

I will try to answer both questions.
... Is there some way that the socket sends an interrupt for a connection that is dead ...
I know of none. TCP_KEEPALIVE only tries to maintain the connection. It is very useful if any equipment along the network path has a timeout, because it prevents that timeout from aborting the connection. But if the connection drops for any other reason (other than that timeout), TCP_KEEPALIVE cannot do anything. The rationale is that there is no need to restore a dropped inactive connection before something has to be exchanged.
Is this a robust way of handling a continuous TCP connection?
Not really.
The robust way is to be prepared that the connection fails for any reason at any moment. So you should be prepared to face an error when sending a message (your code is) and if that happens try to re-open the connection and send the message again (your current code does not). Something like:
def connect(...):
    # establish and return a connection
    ...
    return clientSocket

clientSocket = connect(...)
while True:
    ...
    while True:
        try:
            clientSocket.send(message)
            break
        except OSError:
            clientSocket = connect()
    ...
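A runnable variant of the outline above, as a sketch only. The helper name `send_with_reconnect`, the `retries` and `delay` parameters, and the idea of passing the reconnect logic in as a `connect` callable are all assumptions added for illustration:

```python
import socket
import time


def send_with_reconnect(sock, message, connect, retries=3, delay=0.0):
    """Send message, reopening the connection via connect() on failure.

    Returns the socket that finally carried the message, which may be a
    new one. Re-raises the last error when all retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            sock.sendall(message)
            return sock
        except OSError:
            sock.close()
            if attempt == retries:
                raise
            time.sleep(delay)
            sock = connect()  # caller-supplied: establish and return a socket
```

A caller would keep using the returned socket, for example `clientSocket = send_with_reconnect(clientSocket, data, connect)`. Note that a successful send only means the data was handed to the local TCP stack, so a message sent just before a failure may still need to be resent at the application level.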
Unrelated: your graceful shutdown is incorrect. The initiator (the party calling shutdown) should not immediately close the socket, but start a read loop and only close once everything has been received and processed.
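One common way to do this is sketched below. It assumes that only the write side is shut down (SHUT_WR rather than the SHUT_RDWR used in the question), so the remaining incoming data can still be read; `graceful_close` is a hypothetical helper name:

```python
import socket


def graceful_close(sock, bufsize=4096):
    """Half-close, drain whatever the peer still sends, then close.

    Returns the bytes received after the shutdown was initiated.
    """
    sock.shutdown(socket.SHUT_WR)  # tell the peer we will send nothing more
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty result: the peer has closed its side too
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks)
```

In practice a timeout on the socket is advisable here, so that a peer which never closes its side cannot block the read loop forever.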

How can I make the keepalive work? If the server dies, the client only recognizes it after trying to send() - which worked also without the KEEPALIVE option as this results in a connection reset.
Keepalive is more useful on the server or reading side. And it is a tricky beast. The socket won't notify you at all unless you read/write. You can query its state (even though I'm not sure this is possible with the standard Python) but this still doesn't solve the problem of notification. You need to check the state periodically anyway.
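For reference, the keepalive options mentioned in the question can be set as in the sketch below. The helper name and the default values are made up, and the three TCP_KEEP* constants are platform-dependent, which is why each is looked up with getattr and skipped when missing:

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive and tune it where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Even with these set, the application still only learns about a failed connection the next time it reads, writes, or polls the socket.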
Is there some way that the socket sends an interrupt for a connection that is dead or some keepalive function that I can check on a regular basis?
Have you ever heard about the Two Generals' Problem? There is no reliable way to detect whether one side is dead or not. We can however be close enough with pings and timeouts.
Note, I could handle keep alive messages on my own at the application level. However, as TCP already implements this at transport level, it is better to rely on this service provided by the lower level.
No, it is not better. If, for any reason, there's a proxy between the server and the client, then no TCP feature will help you, because by design these features only control a single connection, while with a proxy you have at least two connections. You should not think about your connection in terms of the underlying transport (TCP). Instead, create your own protocol with a ping command that the server (or the client, or both) sends periodically, combined with timeouts. This way you can be sure that the peer was alive as of the last ping interval.
Is this a robust way of handling a continuous TCP connection? Having a stable, continuous TCP connection seems to be a standard problem; however, I couldn't find tutorials covering this in detail. There must be some best practice.
You won't find tutorials covering this, because that problem has no solution. Most people simulate "I'm still alive" with the combination of pings and timeouts.
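The timeout half of the "pings and timeouts" approach can be sketched as a small bookkeeping class. The class name `HeartbeatMonitor` and the injectable `clock` parameter are assumptions made for illustration; the wire format of the ping itself is left to the application protocol:

```python
import time


class HeartbeatMonitor:
    """Track when the peer was last heard from and apply a timeout."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def on_message(self):
        """Call for every message (pings included) received from the peer."""
        self.last_seen = self.clock()

    def is_alive(self):
        """True while the peer has been heard from within the timeout."""
        return self.clock() - self.last_seen <= self.timeout
```

A receive loop would call `on_message()` after each successful recv and check `is_alive()` whenever select times out, treating a False result as a dead connection.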

Change a python server to client after a specific timeout

So, I have a server socket defined as server_sock; the current code looks as follows:
# define a variable server_sock
server_sock.bind(("", PORT_ANY))
server_sock.listen(1)
port = server_sock.getsockname()[1]
client_sock, client_info = server_sock.accept()
print("[+] Accepted connection from ", client_info)
server_sock.close()
client_sock.close()
# define a variable sock
sock.connect((host, port)) # This will be the client socket
Now, this code will create a server_sock, listen for incoming connections, and after any client has connected it will close those sockets and act as a client by using another sock.
What I am planning to do is to first let the code run as a server (i.e. server_sock should wait for connections) for a specific timeout (let's assume 10 seconds).
After the 10 seconds, server_sock should be closed automatically and then the next piece of code (i.e. the client part) should start.
So, loosely, it's a change from server mode to client mode after a specific timeout.
I am having a hard time solving this issue. Usually the server_sock.accept() line blocks until a new connection arrives; otherwise it won't proceed.
So, how can I implement something that breaks out of that wait after a specific timeout?
Note that I am running this code cross-platform on Windows and UNIX. I have looked at some signal-based answers, but Windows doesn't support some signals.
EDIT:
Many people are saying to use settimeout() on the socket. But that doesn't give the behavior I need.
Let's say I get a connection from a device during the server_sock mode: I would like to continue communication thereafter rather than abruptly closing the socket.
settimeout() will close the socket no matter what actions are being performed, so it fails to address my case.

You can simply use a timeout function.
# define a socket variable server_sock
server_sock.settimeout(10.0) #setting timeout for 10 sec
# rest of your code
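A slightly fuller sketch of this suggestion, in Python 3. The helper name `accept_or_none` is made up. The timeout set on the listening socket only bounds the accept() call on that socket; the accepted connection is a separate socket object, and here it is explicitly put back into blocking mode so later communication is not cut short:

```python
import socket


def accept_or_none(server_sock, timeout):
    """Wait up to `timeout` seconds for a client.

    Returns (client_sock, client_info), or None if nobody connected in time.
    """
    server_sock.settimeout(timeout)
    try:
        client_sock, client_info = server_sock.accept()
    except socket.timeout:
        return None
    client_sock.settimeout(None)  # no timeout on the accepted connection
    return client_sock, client_info
```

With this, the calling code can continue talking to the client when a tuple comes back, and close server_sock and switch to client mode when the result is None.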

python udp socket.timeout on local machine

So I'm making a basic "ping" application using UDP for an assignment, and everything is working except the implementation of socket.settimeout().
I can't seem to figure out why, but it has to do with the bound socket. It might just be a quirk in Python but I'd like to confirm it before I document it.
I'm not looking for a functional code answer (that'd be cheating), but rather why what I have is broken. (e.g: some undocumented reason that Python doesn't like client/server on same machine etc)
Python Socket Timeout Details: http://docs.python.org/2/library/socket.html#socket.socket.settimeout
In the code below, communication with a server running on the same machine is successful, but only when the client does not bind to the socket. But if it does not bind to the socket, the timeout system fails (tested by turning off the server, in which case all ten timeouts get printed immediately and at once).
Note: Code is not ideal, but this is a networking theory class and not a programming class. It just has to work in the end. I could hand it in right now and get an A, but I want to understand why the timeout function does not work.
EDIT: To clarify, I added bind to the client after seeing it in the server code, before I realized UDP doesn't need it. It happened to make the timeout function work properly, but it breaks normal operation.
Could the socket.settimeout() declaration only work for TCP maybe?
Client Code (which has the timeout):
import socket
import time
import select

data = "Re-verify our range to target... one ping only. \n"
addrDest = ("127.0.0.1",8675)
addrLocal = ("127.0.0.1",12345)
totalTime = 0
averageTime = 0
totalPings = 0
#timeout_seconds = 1.0
UDPSock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
UDPSock.bind(addrLocal)
# adding bind here even though it's UDP makes timeout work,
# but breaks normal functionality
UDPSock.settimeout(1)
while (totalPings < 10):
    totalPings = (totalPings + 1)
    start = time.clock()
    str_list = []
    str_list.append(str(totalPings))
    str_list.append(" ")
    str_list.append(str(start))
    str_list.append(" ")
    str_list.append(data)
    dataOut = ''.join(str_list)
    UDPSock.sendto(dataOut,addrDest)
    try:
        dataIn,addrIn = UDPSock.recvfrom(1024)
        print dataIn.strip(),"\n",addrIn
        elapsed = ((time.clock() - start) * 1000)
        print elapsed, " ms round trip"
    except socket.error:
        print "Connection timed out for Packet #", totalPings
Server Code:
import socket

UDPSock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
# (to all IP addresses on this system)
listen_addr = ("",8675)
UDPSock.bind(listen_addr)
# Report on all data packets received and
# where they came from in each case (as this is
# UDP, each may be from a different source and it's
# up to the server to sort this out!)
while True:
    data,addr = UDPSock.recvfrom(1024)
    print data.strip(),addr
    UDPSock.sendto(data,addr)

Why do you need to bind to a local address in the client? Will the client act as a server too at any point? If not, there is no need to bind the client at all. You need a specific port only if you need your client to act as a server. If you don't call bind, the OS will pick a random port (0 - 1023 are reserved, so it comes from the range 1024 - 65535, if I remember correctly) and that will be the Source Port in the UDP packet; the Source Address is the address where the client runs.
According to Berkeley Sockets:
bind() assigns a socket to an address. When a socket is created using socket(), it is only given a protocol family, but not assigned an address. This association with an address must be performed with the bind() system call before the socket can accept connections to other hosts
If this is a networking class project and you are trying to implement a client-server architecture, then you should never call bind from within your client code, because the client should never act as a server: the client connects to a listening server, not the server to the client.
Update:
Bind may need to be called in a TCP client-server design, but not in a UDP client-server model, because UDP is a send-and-forget design and has no low-level acknowledgement that a packet was delivered. A UDP packet carries its Source Address and Port within itself.
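A small Python 3 sketch of the point about source ports: an unbound UDP socket gets a local port from the OS when it first sends. The helper name `send_from_unbound` is an assumption for illustration:

```python
import socket


def send_from_unbound(payload, dest):
    """Send one datagram from a socket that was never bound explicitly.

    Returns the socket and the local address the OS assigned to it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, dest)  # the OS binds to an ephemeral port here
    return sock, sock.getsockname()
```

The port in the returned address is the one the receiver sees as the source port of the datagram, so the server can reply to it without the client ever calling bind.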

I found the cause of the problem by removing the exception handling. There is a socket error when the server is turned off, specifically "socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host" when it tries to read from the socket.
This apparently ignores the timeout function when the socket is not bound in Python (which is why the timeout worked when I bound it).
If I run the server, but just have it not send any data (by commenting the last line), the program times out correctly when it does not receive its packet back.
I am also going to use a more specific exception handler.
In the end it's just a quirk in Python and the fact that UDP is connection-less.
Also, someone mentioned the use of "select" to solve this problem. I looked into it, and I would end up with an if...else statement block which kinda works, but the native exceptions are preferred.
Thanks all.
-Jimmy
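The more specific exception handling mentioned above might look like the following in Python 3. This is a sketch, not the poster's code: `ping_once` and the result labels are made up, and the socket is assumed to already have a timeout set with settimeout():

```python
import socket


def ping_once(sock, payload, dest, bufsize=1024):
    """Send one datagram and classify what happens on the receive side."""
    sock.sendto(payload, dest)
    try:
        data, addr = sock.recvfrom(bufsize)
    except socket.timeout:
        # No reply arrived within the socket's timeout.
        return ("timeout", None)
    except ConnectionResetError:
        # The OS reported that nobody is listening on the destination port
        # (the error 10054 case described above on Windows).
        return ("reset", None)
    return ("reply", data)
```

Separating the two cases keeps an immediate "reset" error from being reported as if a full timeout had elapsed.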

Modbus TCP Client closes the connection because of unexpected answer from my server implementation

I have implemented Modbus over TCP as server software in Python. The app is multithreaded and relies heavily on the standard library. I have problems managing the connection on the server side.
Meanwhile, my implementation of Modbus over TCP as a client works just fine.
Implementation description
The server is multithreaded; one thread manages the SOCK_STREAM socket for receiving frames
select is used for efficiency reasons
A semaphore is used to prevent concurrent access to the socket resource while sending or receiving
Encapsulation of the Modbus upper layer is done transparently through send and receive methods; it is only a matter of building a frame with the right header and payload anyway...
Another thread runs; inside it, the Modbus send and receive methods are invoked.
TCP Context
TCP is up and running, bound to a port, max client set and listening.
Traces under wireshark show:
Client: SYN
My app Server: SYN, ACK
Client: ACK
On the server side a brand new socket has been created as expected and bound to the client socket.
So far, all is good.
Modbus Context
Client: Sends a Modbus frame, TCP flags = 0x18, which is ACK + PUSH
My app Server: Does not wait and sends a single empty TCP ACK frame.
Client: Waits for a Modbus frame with the TCP ACK flag. Therefore, it takes the empty ACK as an error and asks to close the connection.
Hence, my server software cannot send any actual response afterwards, as the socket on the client side is being closed or is already closed.
My problem
I receive a Modbus frame that the main thread needs to process (server side)
Processing takes a few ms; in the meantime a TCP ACK frame is sent through my server socket, whereas I would like it not to send anything!
Do you have any idea how to manage the ACK behavior? I have read about the Nagle algorithm, but it does not seem to be within the scope of the problem here...
I'm also not sure that any option of the setsockopt method would solve my problem, but I may be mistaken.
If you have any suggestions, I am very interested...
I hope I am clear enough.

It seems like a strange requirement that all TCP packets must contain a payload, as this is very difficult to control unless you are integrated with the TCP stack. If it really is the case that the client crashes because the ACK has no Modbus payload, I think the only thing you can do from Python is try disabling the TCP_QUICKACK socket option so that TCP waits 500ms before sending an ACK. This obviously won't work in all cases (and may not work at all if it takes your code more than 500ms to create a response), but I don't know of any other options using the socket API from Python.
This SO answer tells you how to disable delayed ACKs: Disable TCP Delayed ACKs. It should be easy to figure out from that how to enable them. Note that you need to constantly re-apply the setting after receiving data.
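As a rough sketch of that suggestion in Python 3: TCP_QUICKACK is not available on every platform, so the constant is looked up defensively. The helper name `disable_quickack` is made up, and whether this actually changes the client's behavior is not guaranteed:

```python
import socket


def disable_quickack(sock):
    """Ask the stack to delay ACKs on this socket, where supported.

    Returns False when the platform's socket module has no TCP_QUICKACK.
    """
    option = getattr(socket, "TCP_QUICKACK", None)
    if option is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, option, 0)
    return True
```

Since the setting is not permanent, the call would have to be repeated after each recv() on the connection, as the note above says.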
