I've written a simple multi-threaded game server in Python that creates a new thread for each client connection. I'm finding that every now and then the server will crash because of a broken-pipe/SIGPIPE error. I'm pretty sure it is happening when the program tries to send a response back to a client that is no longer present.
What is a good way to deal with this? My preferred resolution would be to simply close the server-side connection to the client and move on, rather than exit the entire program.
PS: This question/answer deals with the problem in a generic way; how specifically should I solve it?
Assuming that you are using the standard socket module, you should be catching the socket.error: (32, 'Broken pipe') exception (not IOError as others have suggested). This will be raised in the case that you've described, i.e. sending/writing to a socket for which the remote side has disconnected.
import socket, errno, time

# setup socket to listen for incoming connections
s = socket.socket()
s.bind(('localhost', 1234))
s.listen(1)

remote, address = s.accept()
print "Got connection from: ", address

while 1:
    try:
        remote.send("message to peer\n")
        time.sleep(1)
    except socket.error, e:
        if isinstance(e.args, tuple):
            print "errno is %d" % e[0]
            if e[0] == errno.EPIPE:
                # remote peer disconnected
                print "Detected remote disconnect"
            else:
                # determine and handle different error
                pass
        else:
            print "socket error ", e
        remote.close()
        break
    except IOError, e:
        # Hmmm, Can IOError actually be raised by the socket module?
        print "Got IOError: ", e
        break
Note that this exception will not always be raised on the first write to a closed socket - more usually the second write (unless the number of bytes written in the first write is larger than the socket's buffer size). You need to keep this in mind in case your application thinks that the remote end received the data from the first write when it may have already disconnected.
You can reduce the incidence of this (but not entirely eliminate it) by using select.select() (or poll). Check for data ready to read from the peer before attempting a write. If select reports that there is data available to read from the peer socket, read it using socket.recv(). If this returns an empty string, the remote peer has closed the connection. Because there is still a race condition here, you'll still need to catch and handle the exception.
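A minimal sketch of that select-before-write pattern (Python 3 syntax; the helper name and buffer sizes are my own choices, not from the question):

import errno
import select
import socket

def send_or_detect_close(conn, payload):
    # Poll the peer socket for readability without blocking (timeout 0).
    readable, _, _ = select.select([conn], [], [], 0)
    if readable:
        data = conn.recv(4096)
        if not data:                  # empty read: remote peer has closed
            return False
    try:
        conn.sendall(payload)
        return True
    except OSError as e:              # socket.error is an alias of OSError in Python 3
        if e.errno == errno.EPIPE:    # race: peer vanished between select() and send()
            return False
        raise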
Twisted is great for this sort of thing; however, it sounds like you've already written a fair bit of code.
Read up on the try: statement.
try:
    pass  # do something
except socket.error, e:
    pass  # A socket error
except IOError, e:
    if e.errno == errno.EPIPE:
        pass  # EPIPE error
    else:
        pass  # Other error
SIGPIPE (although I think maybe you mean EPIPE?) occurs on sockets when you shut down a socket and then send data to it. The simple solution is not to shut the socket down before trying to send it data. This can also happen on pipes, but it doesn't sound like that's what you're experiencing, since it's a network server.
You can also just apply the band-aid of catching the exception in some top-level handler in each thread.
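A band-aid of that shape might look like the sketch below; serve_client is a hypothetical stand-in for your existing per-client loop:

import socket

def client_worker(conn, addr):
    # Per-connection thread body: one top-level handler for vanished peers.
    try:
        serve_client(conn, addr)      # hypothetical: your existing per-client logic
    except (BrokenPipeError, ConnectionResetError, socket.error):
        pass                          # peer went away: just drop this client
    finally:
        conn.close()                  # close our side and let the thread exit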
Of course, if you used Twisted rather than spawning a new thread for each client connection, you probably wouldn't have this problem. It's really hard (maybe impossible, depending on your application) to get the ordering of close and write operations correct if multiple threads are dealing with the same I/O channel.
I faced the same issue, but when I submitted the same code the next time, it just worked.
The first time it broke:
$ packet_write_wait: Connection to 10.. port 22: Broken pipe
The second time it works:
[1] Done nohup python -u add_asc_dec.py > add2.log 2>&1
I guess the reason may be related to the current server environment.
My answer is very close to S.Lott's, except I'd be even more particular:
try:
    pass  # do something
except IOError, e:
    # ooops, check the attributes of e to see precisely what happened.
    if e.errno != errno.EPIPE:
        # I don't know how to handle this
        raise
where errno.EPIPE is the error number you get for a broken pipe (32 on Linux; use the errno constant rather than a hard-coded number). This way you won't attempt to handle a permissions error or anything else you're not equipped for.
I learned how to add error handling, but I am a little bit confused about when I need to add it to my code.
See this following example.
Do I need error handling for every line of calling the socket?
Server.py
import socket
import sys

sockfd = socket.socket()
try:
    sockfd.bind(("127.0.0.1", 20001))
except socket.error as emsg:
    print("Socket bind error: ", emsg)
    sys.exit(1)

print("I_am", socket.gethostname(), "and_I_am_listening_...")
sockfd.listen(5)
new, who = sockfd.accept()  # Return the TCP connection
print("A_connection_with", who, "has_been_established")

try:
    message = new.recv(50)
except socket.error as err:
    print("Recv error: ", err)

if message:
    print("\'"+message.decode("ascii")+"\'", "is received from", who)
else:
    print("Connection is broken")

new.close()
sockfd.close()
One commonly used approach is to add error handling wherever something unexpected can happen, i.e. something you could not rule out when writing the code. If you have, say, a simple function whose possible error states you fully know, you don't need a try statement; you can handle those cases with ordinary if-else checks, which also keeps the code a little leaner, since setting up a try block adds a small amount of overhead. But for things you do not know about or have no control over, e.g. external program returns or API calls, that is where you should add error handling.
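Applied to the Server.py snippet above, that guideline might look like this (a sketch, reusing the same variable names as the question):

# The network is outside your control, so wrap recv() in try/except;
# an empty result, however, is an expected outcome and needs only if/else.
message = b""
try:
    message = new.recv(50)
except socket.error as err:
    print("Recv error: ", err)

if message:
    print(message.decode("ascii"), "is received from", who)
else:
    print("Connection is broken")   # orderly close by the peer, not an exception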
I have a single client talking to a single server using a pair socket:
context = zmq.Context()
socket = context.socket(zmq.PAIR)
socket.setsockopt(zmq.SNDTIMEO, 1000)
socket.connect("tcp://%s:%i" % (host, port))
...
if msg != None:
    try:
        socket.send(msg)
    except Exception as e:
        print(e, e.errno)
The program sends approximately one 10-byte message every second. We were seeing issues where the program would eventually start to hang infinitely waiting for a message to send, so we added a SNDTIMEO. However, now we are starting to get zmq.error.Again instead. Once we get this error, the resource never becomes available again. I'm looking into which error code exactly is occurring, but I was generally wondering what techniques people use to recover from zmq.error.Again inside their programs. Should I destroy the socket connection and re-establish it?
Fact#0: PAIR/PAIR is different from other ZeroMQ archetypes
RFC 31 explicitly defines:
Overall Goals of this Pattern
PAIR is not a general-purpose socket but is intended for specific use cases where the two peers are architecturally stable. This usually limits PAIR to use within a single process, for inter-thread communication.
Next, if the SNDHWM size is not set correctly, and, when running PAIR over the tcp:// transport class, the related O/S-level L3/L2 attributes are not tuned either, any next .send() can also yield an EAGAIN error.
There are a few additional counter-measures ( CONFLATE, IMMEDIATE, HEARTBEAT_{IVL|TTL|TIMEOUT} ), but the principal PAIR/PAIR limit mentioned above still sets what not to expect from this archetype.
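For completeness, a sketch of setting those counter-measures in pyzmq (the numeric values are placeholders to tune for your link, and the heartbeat options need libzmq 4.2+):

import zmq

host, port = "10.0.0.1", 5555                  # placeholders
ctx = zmq.Context()
sock = ctx.socket(zmq.PAIR)
sock.setsockopt(zmq.SNDHWM, 1000)              # bound the send-side queue
sock.setsockopt(zmq.SNDTIMEO, 1000)            # [ms] give up on a blocked send
sock.setsockopt(zmq.IMMEDIATE, 1)              # queue only onto completed connections
sock.setsockopt(zmq.HEARTBEAT_IVL, 1000)       # [ms] probe the transport path
sock.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)   # [ms] declare the peer dead after this
sock.setsockopt(zmq.HEARTBEAT_TTL, 5000)       # [ms] TTL advertised to the peer
sock.connect("tcp://%s:%i" % (host, port))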
The main suspect:
Given the design-side limits noted above, once the transport path gets damaged, the PAIR access point will not re-negotiate reconstruction of the socket back into a ready-to-operate state.
For this reason, if your code indeed wants to keep using PAIR/PAIR, it may be wise to also assemble an emergency SIG/flag path, so that the distributed system can robustly survive the L3/L2/L1 incidents that PAIR/PAIR is known not to take care of automatically.
Epilogue:
Your code does not use the non-blocking .send() mode, yet the EAGAIN error state is exactly what signals a blocked capability ( the inability of the access point to .send() at this very moment ).
Better use the published API details:
try:
    socket.send( msg, zmq.DONTWAIT )   #_______ non-blocking .send()
except zmq.Again:
    ...                                #_______ ZeroMQ SIG'd EAGAIN: cannot .send() now
except zmq.ZMQError as e:
    ...                                #_______ .HANDLE other error-states
finally:
    ...
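As for the closing question ( destroy and re-establish the connection? ): once a PAIR link gets wedged, that is usually the only way forward. A hedged sketch, reusing the context / msg / host / port names from the question; the retry threshold is an arbitrary choice:

import time
import zmq

def rebuild_pair_socket( context, host, port ):
    # Replace a wedged PAIR access point with a fresh one.
    sock = context.socket( zmq.PAIR )
    sock.setsockopt( zmq.SNDTIMEO, 1000 )
    sock.setsockopt( zmq.LINGER, 0 )            # don't block on close with queued msgs
    sock.connect( "tcp://%s:%i" % ( host, port ) )
    return sock

failures = 0
while True:
    try:
        socket.send( msg, zmq.DONTWAIT )
        failures = 0
    except zmq.Again:
        failures += 1
        if failures > 5:                        # arbitrary threshold, tune for your link
            socket.close( linger = 0 )
            socket = rebuild_pair_socket( context, host, port )
            failures = 0
    time.sleep( 1 )                             # the question sends roughly once per second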
I'm new to socket programming (and somewhat to Python too) and I'm having trouble getting the select timeout to work the way I want to (on the server side). Before clients connect, timeout works just fine. I give it a value of 1 second and the timeout expires in my loop every 1 second.
Once a client connects, however, it doesn't wait 1 second to tell me the timeout expires. It just loops as fast as it can and tells me the timeout expires. Here's a snippet of my code:
while running:
    try:
        self.timeout_expired = False
        inputready, outputready, exceptready = select.select(self.inputs, self.outputs, [], self.timeout)
    except select.error, e:
        break
    except socket.error, e:
        break

    if not (inputready):
        # Timeout expired
        print 'Timeout expired'
        self.timeout_expired = True

    # Additional processing follows here
I'm not sure if this is enough code to see where my problem is, so please let me know if you need to see more. Basically, after a client connects, it at least appears that it ignores the timeout of 1 second and just runs as fast as it can, continuously telling me "Timeout expired". Any idea what I'm missing?
Thanks much!!
Edit: I should clarify... "inputready" represents input from a client connecting or sending data to the server, as well as stdin on the server. The other lists returned from select only contain server-side sockets, and what I'm trying to detect is whether the CLIENT took too long to reply, so I'm only checking whether inputready is empty.
It is only a timeout if inputready, outputready, and exceptready are ALL empty. My guess is you have added the client socket to both self.inputs and self.outputs. Since the output socket is usually writable, it will always show up in outputready. Only add the client socket to self.outputs if you are ready to output something.
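A sketch of that bookkeeping, reusing the self.inputs / self.outputs / self.timeout names from the question; the outgoing dict and client_sock are hypothetical:

# Register the client socket for writability only while something is queued for it.
if outgoing.get(client_sock):
    if client_sock not in self.outputs:
        self.outputs.append(client_sock)
elif client_sock in self.outputs:
    self.outputs.remove(client_sock)      # nothing to send: stop select() waking up early

inputready, outputready, exceptready = select.select(
    self.inputs, self.outputs, [], self.timeout)
if not (inputready or outputready or exceptready):
    self.timeout_expired = True           # a real timeout: all three lists are empty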
"When the timeout expires, select() returns three empty lists.
...To use a timeout requires adding the extra argument to the select() call and handling the empty lists after select() returns."
readable, writable, exceptional = select.select(inputs, outputs, inputs, timeout)
if not (readable or writable or exceptional):
    print(' timed out, do some other work here', file=sys.stderr)
https://pymotw.com/3/select/index.html
I am new to Python and currently have to write a Python socket script that communicates with a device over TCP/IP (a weather station).
The device acts as the Server Side (listening over IP:PORT, accepting connection, receiving request, transferring data).
I only need to send one message, receive the answer and then peacefully and nicely shutdown and close the socket.
import socket
import sys

try:
    comSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error, msg:
    sys.stderr.write("[ERROR] %s\n" % msg[1])
    sys.exit(1)

try:
    comSocket.connect((''))
except socket.error, msg:
    sys.stderr.write("[ERROR] %s\n" % msg[1])
    sys.exit(2)

comSocket.send('\r')
comSocket.recv(128)

comSocket.send('\r')
comSocket.recv(128)

comSocket.send('\r\r')
comSocket.recv(128)

comSocket.send('1I\r\r3I\r\r4I\r\r13I\r\r5I\r\r8I\r\r7I\r\r9I\r\r')
rawData = comSocket.recv(512)

comSocket.shutdown(1)
comSocket.close()
The problem I'm having is:
The communication channel is unreliable and the device is slow, so sometimes the device responds with a message of length 0 (just an ACK), and my code will freeze and wait for a response forever.
This piece of code contains the portion that involves the socket; the whole script will be run under cron, so freezing is not desirable behavior.
My question is:
What would be the best way in Python to handle this behavior, so that the code doesn't freeze and wait forever, but instead attempts to move on to the next send (or the like)?
You can try a timeout approach, as in Russel's code, or you can use a non-blocking socket, as shown in the code below. It will never block at socket.recv, and you can use it inside a loop to retry as many times as you want, so your program will not hang on a timeout. This way you can test whether data is available and, if not, do other things and try again later.
socket.setblocking(0)

while (retry_condition):
    try:
        data = socket.recv(512)
    except socket.error:
        '''no data yet..'''
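Fleshing that out a little for the weather-station case (the retry count and sleep interval are arbitrary choices; comSocket is the socket from the question):

import errno
import socket
import time

comSocket.setblocking(False)
rawData = None
for _ in range(10):                      # bounded retries so the cron job can't hang
    try:
        rawData = comSocket.recv(512)
        break
    except socket.error as e:
        if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise                        # a real error, not just "no data yet"
        time.sleep(0.5)                  # no data yet: back off briefly and retry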
I'd recommend eventlet and green threads for this.
Twisted is a good library, but it has a rather steep learning curve for such a simple use case.
Check out some examples here.
Try, before receiving, putting a timeout on the socket:
comSocket.settimeout(5.0)

try:
    rawData = comSocket.recv(512)
except socket.timeout:
    print "No response from server"
I have a client that connects to an HTTP stream and logs the text data it consumes.
I send the streaming server an HTTP GET request... The server replies and continuously publishes data... It will either publish text or send a ping (text) message regularly... and will never close the connection.
I need to read and log the data it consumes in a non-blocking manner.
I am doing something like this:
import urllib2

req = urllib2.urlopen(url)
for dat in req:
    with open('out.txt', 'a') as f:
        f.write(dat)
My questions are:
will this ever block when the stream is continuous?
how much data is read in each chunk and can it be specified/tuned?
is this the best way to read/log an http stream?
Hey, that's three questions in one! ;-)
It could block sometimes - even if your server is generating data quite quickly, network bottlenecks could in theory cause your reads to block.
Reading the URL data using "for dat in req" will mean reading a line at a time - not really useful if you're reading binary data such as an image. You get better control if you use
chunk = req.read(size)
which can of course block.
Whether it's the best way depends on specifics not available in your question. For example, if you need to run with no blocking calls whatever, you'll need to consider a framework like Twisted. If you don't want blocking to hold you up and don't want to use Twisted (which is a whole new paradigm compared to the blocking way of doing things), then you can spin up a thread to do the reading and writing to file, while your main thread goes on its merry way:
import threading

def func(req):
    # code to read from the URL stream and write to the file goes here
    ...

t = threading.Thread(target=func, args=(req,))  # pass the stream to the worker
t.start()  # will execute func in a separate thread
...
t.join()   # will wait for the spawned thread to die
Obviously, I've omitted error checking/exception handling etc. but hopefully it's enough to give you the picture.
You're using too high-level an interface to have good control over issues such as blocking and buffering block sizes. If you're not willing to go all the way to an async interface (in which case Twisted, already suggested, is hard to beat!), why not httplib, which is after all in the standard library? An HTTPResponse instance's .read(amount) method is more likely to block for no longer than needed to read amount bytes than the similar method on the object returned by urlopen (although admittedly there are no documented specs about that in either module, hmmm...).
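A minimal sketch of that approach, using http.client (the name httplib took in Python 3); the host and path are placeholders:

import http.client   # called httplib in Python 2

conn = http.client.HTTPConnection("stream.example.com")   # placeholder host
conn.request("GET", "/feed")                               # placeholder path
resp = conn.getresponse()

with open("out.txt", "ab") as f:
    while True:
        chunk = resp.read(4096)   # reads at most 4096 bytes; may still block until data arrives
        if not chunk:             # empty read: the server closed the stream
            break
        f.write(chunk)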
Another option is to use the socket module directly. Establish a connection, send the HTTP request, set the socket to non-blocking mode, and then read the data with socket.recv() handling 'Resource temporarily unavailable' exceptions (which means that there is nothing to read). A very rough example is this:
import socket, time

BUFSIZE = 1024

s = socket.socket()
s.connect(('localhost', 1234))
s.send('GET /path HTTP/1.0\n\n')
s.setblocking(False)

running = True

while running:
    try:
        print "Attempting to read from socket..."
        while True:
            data = s.recv(BUFSIZE)
            if len(data) == 0:   # remote end closed
                print "Remote end closed"
                running = False
                break
            print "Received %d bytes: %r" % (len(data), data)
    except socket.error, e:
        if e[0] != 11:   # Resource temporarily unavailable
            print e
            raise

    # perform other program tasks
    print "Sleeping..."
    time.sleep(1)
However, urllib.urlopen() has some benefits if the web server redirects, if you need URL-based basic authentication, etc. You could also make use of the select module, which will tell you when there is data to read.
Yes, when you catch up with the server it will block until the server produces more data.
Each dat will be one line, including the newline on the end.
twisted is a good option.
I would swap the with and for around in your example; do you really want to open and close the file for every line that arrives?
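For that last point, the swapped version would look like this:

with open('out.txt', 'a') as f:   # open the log file once, not once per line
    for dat in req:
        f.write(dat)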