ZeroMQ with Python hangs when connecting to an invalid socket

If I connect to a nonexistent socket with pyzmq, I need to hit Ctrl-C to stop the program. Could someone explain why this happens?
import zmq
INVALID_ADDR = 'ipc:///tmp/idontexist.socket'
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect(INVALID_ADDR)
socket.send('hello')
poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
conn = dict(poller.poll(1000))
if conn:
    if conn.get(socket) == zmq.POLLIN:
        print "got result: ", socket.recv(zmq.NOBLOCK)
else:
    print 'got no result'

This question was also posted as a pyzmq Issue on GitHub. I will paraphrase my explanation here (I hope that is appropriate, I am fairly new to SO):
A general rule: When in doubt, hangs at the end of your zeromq program are due to LINGER.
The hang here is caused by the LINGER socket option, and happens in the context.term() method called during garbage collection at the very end of the script. The LINGER behavior is described in the zeromq docs, but to put it simply, it is a timeout (in milliseconds) to wait for any pending messages in the queue to be handled after closing the socket before dropping the messages. The default behavior is LINGER=-1, which means to wait forever.
In this case, since no peer was ever started, the 'hello' message that you tried to send is still waiting in the send queue when the socket tries to close. With LINGER=-1, ZeroMQ will wait until a peer is ready to receive that message before shutting down. If you bind a REP socket to 'ipc:///tmp/idontexist.socket' while this script is apparently hanging, the message will be delivered and the script will finish exiting cleanly.
If you do not want your script to wait (as indicated by your print statements that you have already given up on getting a reply), set LINGER to any non-negative value (e.g. socket.linger = 0), and context.term() will return after waiting the specified number of milliseconds.
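For example, here is a minimal sketch of the same script with LINGER set so it exits promptly (Python 3 syntax; the address is the same placeholder used in the question):
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.linger = 0                       # drop undelivered messages on close/term
socket.connect('ipc:///tmp/idontexist.socket')
socket.send(b'hello')                   # queued but never delivered (no peer)

poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
if poller.poll(1000):
    print("got result:", socket.recv(zmq.NOBLOCK))
else:
    print("got no result")

socket.close()
context.term()                          # returns immediately because linger == 0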
I should note that the INVALID_ADDR variable name suggests an understanding that connection to an interface that does not yet have a listener is not valid - this is incorrect. zeromq allows bind/connect events to happen in any order, as illustrated by the behavior described above, of binding a REP socket to the interface while the sending script is blocking on term().

In most cases, you can bind and connect ZMQ sockets in either order, so your connect()/send() is simply waiting for the corresponding bind() at the other end, which never comes, so the program appears to hang. Check where the program is hanging by printing out some logging statements...

Related

How can I prevent pyzmq from blocking my python application?

This is my code:
def _poll_for_messages(self, poller: Poller):
    sockets = dict(poller.poll(3000))
    if not sockets:
        self._reconnect_if_necessary(poller)
        return
    if self._command_handler.command_socket in sockets:
        encoded_message = self._command_handler.command_socket.recv_multipart()
This should communicate with my service bus and reconnect if the bus gets restarted. When the bus gets shut down, the last line sometimes still gets reached, but the socket cannot receive a message and waits for one indefinitely.
For normal receives there is zmq.DONTWAIT but this does not work for multipart messages as far as I'm aware. Is there an easy way around this or am I polling for messages the wrong way in general?
If anyone stumbles over this and has the same problem, mine got fixed by adding the zmq.POLLIN flag when registering a socket to my poller:
poller.register(self._command_handler._command_socket, zmq.POLLIN)
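For reference, a minimal sketch of the poll-then-receive pattern (the socket type, endpoint, and names below are placeholders, not the original objects): recv_multipart is only called after the poller reports POLLIN, so it never blocks.
import zmq

context = zmq.Context()
command_socket = context.socket(zmq.PULL)       # placeholder socket type
command_socket.connect('tcp://localhost:5555')  # placeholder endpoint

poller = zmq.Poller()
poller.register(command_socket, zmq.POLLIN)     # the flag that fixed the problem

while True:
    sockets = dict(poller.poll(3000))           # wait at most 3 seconds
    if command_socket in sockets:
        parts = command_socket.recv_multipart() # safe: poll says data is ready
        print(parts)
    else:
        # timed out with no data: reconnect / housekeeping would go here
        pass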

closing the server after all clients are closed

I have a very basic socket script which sends a single message to clients.
Part of the server script:
while True:
    con, address = s.accept()
    con.send("Hello from server".encode())
    con.close()
s.close()
Part of the client script:
message = s.recv(5)
while message:
    print("Message", message.decode())
    sleep(1)
    message = s.recv(5)
s.close()
I start 2 clients. They both print the message (5 bytes at a time), then close.
However, the server remains open, because it is still waiting for clients.
What is the correct way to exit the server's while True loop?
You have to specify on what condition you want your server to exit. Usually a server is programmed like a daemon, i.e., to run indefinitely. In Python, you already have a way to break the infinite while loop: Ctrl-C, which raises KeyboardInterrupt. Otherwise, consider the following:
After N clients have been handled, break inside the loop. You will need a counter to keep track of the clients handled (a minimal sketch follows below).
On some POSIX signal, as in the answers to How do I capture SIGINT in Python?; this is usually how daemons terminate cleanly.
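A sketch of the first option, reusing the question's server code (the address, port, and the limit of two clients are arbitrary placeholders):
import socket

MAX_CLIENTS = 2  # arbitrary limit, purely for illustration

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('localhost', 12345))
s.listen(2)

handled = 0
while handled < MAX_CLIENTS:
    con, address = s.accept()
    con.send("Hello from server".encode())
    con.close()
    handled += 1

s.close()  # reached once MAX_CLIENTS connections have been served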
By the way, your server code may need a rewrite: you currently handle only one client at a time, with no parallel processing. It will very easily run into head-of-line blocking issues once you have many clients.

readable socket times out on recv

I have a 'jobs' server which accepts requests from a client (there are 8 clients sending requests from another machine). The server then submits a 'job' (a 'job' is just an executable which writes a results file to disk), and a 'jobs manager' thread waits until the job is done. When a job is done, the server sends a message to the client that a results file is ready to be copied back to the client.
On the main thread I use select to read incoming connections from clients, as well as jobs requests:
readable, writable, exceptional = select.select(inputs, [], [])
where inputs is a list of accepted connections (sockets), and this list also includes the server socket. All sockets are set to non-blocking. To the best of my understanding, if this call to select returns a non-empty readable, it means some elements of inputs have incoming data waiting to be read.
I am reading data using the following logic (SIZE is a constant):
for s in readable:
    if s is not server:
        try:
            socket_ok = True
            data = s.recv(SIZE)
        except socket.error as e:
            print('ERROR socket error: ' + str(e))
            socket_ok = False
        except Exception as e:
            print('ERROR error reading from socket: ' + str(e))
            socket_ok = False
        if not socket_ok:
            # do something
I have 2 problems:
Sometimes I get a [Errno 110] Connection timed out exception, and I don't understand why - if I have a readable socket, doesn't it mean it has some data to be read?
How to deal with this exception - the #do something part. I can do a 'cleanup' - delete the running jobs which were requested by the timed-out socket, and remove the dead socket from the list. But I have no way of letting the client know that it should stop waiting for these jobs' results. Ideally I would like to reconnect somehow, because the jobs themselves keep running and produce results which I don't want to throw away.
EDIT I realized now that the jobs manager thread also has access to the sockets via a Queue instance - if a job is finished, the thread sends a 'job done' message through the relevant socket - so maybe the send and recv methods of the same socket cause some kind of race condition? But anyway, I don't see how this could cause a 'connection timed out' error.
A solution that was just a guess and seems to work: on the client side, I was using a blocking recv to get the message from the server that the job is done. Since a job can take a long time (e.g. if the cluster running the jobs is low on resources), I guessed that this long blocking wait was the cause of the timeout. So instead of using recv in blocking mode, I use it with a timeout of 5 seconds, and send a dummy message to the server every 5 seconds to keep the connection alive until the real message is received. Now I don't get the exception (on the server side) any more.
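For illustration, a rough sketch of that client-side workaround (the function name, buffer size, keepalive payload, and 5-second interval are all placeholders, and the server has to be prepared to ignore the dummy messages):
import socket

def wait_for_job_done(sock):
    # Block until the server signals completion, sending a periodic
    # keepalive so the connection does not time out during long jobs.
    sock.settimeout(5.0)                  # recv raises socket.timeout after 5 s
    while True:
        try:
            msg = sock.recv(4096)
            if not msg:
                raise ConnectionError('server closed the connection')
            return msg                    # the 'job done' message
        except socket.timeout:
            sock.sendall(b'keepalive')    # dummy message; the server must ignore it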

In this Python 3 client-server example, client can't send more than one message

This is a simple client-server example where the server returns whatever the client sends, but reversed.
Server:
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        print('RECEIVED: ' + str(self.data))
        self.request.sendall(str(self.data)[::-1].encode('utf-8'))

server = socketserver.TCPServer(('localhost', 9999), MyTCPHandler)
server.serve_forever()
Client:
import socket
import threading

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 9999))

def readData():
    while True:
        data = s.recv(1024)
        if data:
            print('Received: ' + data.decode('utf-8'))

t1 = threading.Thread(target=readData)
t1.start()

def sendData():
    while True:
        intxt = input()
        s.send(intxt.encode('utf-8'))

t2 = threading.Thread(target=sendData)
t2.start()
I took the server from an example I found on Google, but the client was written from scratch. The idea was to have a client that can keep sending data to and receiving data from the server indefinitely.
Sending the first message with the client works. But when I try to send a second message, I get this error:
ConnectionAbortedError: [WinError 10053] An established connection was
aborted by the software in your host machine
What am I doing wrong?
For TCPServer, the handle method of the handler gets called once to handle the entire session. This may not be entirely clear from the documentation, but socketserver is, like many libraries in the stdlib, meant to serve as clear sample code as well as to be used directly, which is why the docs link to the source, where you can clearly see that it's only going to call handle once per connection (TCPServer.get_request is defined as just calling accept on the socket).
So, your server receives one buffer, sends back a response, and then quits, closing the connection.
To fix this, you need to use a loop:
def handle(self):
    while True:
        self.data = self.request.recv(1024)
        if not self.data:
            print('DISCONNECTED')
            break
        print('RECEIVED: ' + str(self.data))
        self.request.sendall(str(self.data)[::-1].encode('utf-8'))
A few side notes:
First, using BaseRequestHandler on its own only allows you to handle one client connection at a time. As the introduction in the docs says:
These four classes process requests synchronously; each request must be completed before the next request can be started. This isn’t suitable if each request takes a long time to complete, because it requires a lot of computation, or because it returns a lot of data which the client is slow to process. The solution is to create a separate process or thread to handle each request; the ForkingMixIn and ThreadingMixIn mix-in classes can be used to support asynchronous behaviour.
Those mixin classes are described further in the rest of the introduction, and farther down the page, and at the bottom, with a nice example at the end. The docs don't make it clear, but if you need to do any CPU-intensive work in your handler, you want ForkingMixIn; if you need to share data between handlers, you want ThreadingMixIn; otherwise it doesn't matter much which you choose.
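For instance, a minimal threaded variant of the server above (a sketch only; it echoes the raw bytes back reversed rather than building a decoded string):
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:
                break
            self.request.sendall(data[::-1])  # echo the raw bytes back, reversed

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True

server = ThreadedTCPServer(('localhost', 9999), MyTCPHandler)
server.serve_forever()  # each connection is now handled in its own thread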
Note that if you're trying to handle a large number of simultaneous clients (more than a couple dozen), neither forking nor threading is really appropriate—which means TCPServer isn't really appropriate. For that case, you probably want asyncio, or a third-party library (Twisted, gevent, etc.).
Calling str(self.data) is a bad idea. You're just going to get the source-code-compatible representation of the byte string, like b'spam\n'. What you want is to decode the byte string into the equivalent Unicode string: self.data.decode('utf8').
There's no guarantee that each sendall on one side will match up with a single recv on the other side. TCP is a stream of bytes, not a stream of messages; it's perfectly possible to get half a message in one recv, and two and a half messages in the next one. When testing with a single connection on localhost with the system under light load, it will probably appear to "work", but as soon as you try to deploy any code that assumes that each recv gets exactly one message, your code will break. See Sockets are byte streams, not message streams for more details. Note that if your messages are just lines of text (as they are in your example), using StreamRequestHandler and its rfile attribute, instead of BaseRequestHandler and its request attribute, solves this problem trivially.
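A sketch of that line-oriented approach, assuming each message is a line of text ending with a newline:
import socketserver

class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # rfile is file-like; each iteration yields one complete line,
        # regardless of how the bytes were split across TCP packets.
        for line in self.rfile:
            text = line.decode('utf-8').rstrip('\n')
            self.wfile.write((text[::-1] + '\n').encode('utf-8'))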
You probably want to set server.allow_reuse_address = True. Otherwise, if you quit the server and re-launch it again too quickly, it'll fail with an error like OSError: [Errno 48] Address already in use.

python socket server/client protocol with unstable client connection

I have a threaded Python socket server that opens a new thread for each connection.
The thread is a very simple communication based on question and answer.
Basically, the client sends an initial data transmission, the server takes it, runs an external app that does stuff to the transmission, and returns a reply that the server sends back; then the loop begins again, until the client disconnects.
Now, because the client will be on a mobile phone, and thus on an unstable connection, I am left with open threads that are no longer connected, and because the loop starts with recv it is rather difficult to break out of on lost connectivity.
I was thinking of adding a send before the recv to test whether the connection is still alive, but this might not help at all if the client disconnects after my failsafe send, as the client only sends a data stream every 5 seconds.
I noticed the recv will break sometimes, but not always, and in those cases I am left with zombie threads using resources.
Also, this could be a real vulnerability allowing my system to be DoSed.
I have looked through the Python manual and Googled since Thursday trying to find something for this, but most things I find relate to the client side and non-blocking mode.
Can anyone point me in the right direction towards a good way of fixing this issue?
Code samples:
Listener:
serversocket = socket(AF_INET, SOCK_STREAM)
serversocket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
serversocket.bind(addr)
serversocket.listen(2)
logg("Binded to port: " + str(port))
# Listening Loop
while 1:
    clientsocket, clientaddr = serversocket.accept()
    threading.Thread(target=handler, args=(clientsocket, clientaddr, port)).start()
# This is useless as it will never get here
serversocket.close()
Handler:
# Socket connection handler (threaded)
def handler(clientsocket, clientaddr, port):
    clientsocket.settimeout(15)
    # Loop till client closes connection or connection drops
    while 1:
        stream = ''
        while 1:
            ending = stream[-6:]  # get stream ending
            if ending == '.$$$$.':
                break
            try:
                data = clientsocket.recv(1)
            except:
                sys.exit()
            if not data:
                sys.exit()
            # this is the usual point where the thread is closed when a client closes the connection normally
            stream += data
        # Clear the line ending
        stream = base64.b64encode(stream[:-6])
        # Send data to be processed
        re = getreply(stream)
        # Send response to client
        try:
            clientsocket.send(re + str('.$$$$.'))
        except:
            sys.exit()
As you can see, there are three conditions of which at least one should trigger an exit if the connection fails, but sometimes they do not.
Sorry, but I think the threaded approach is not a good fit here. Since you do not need to do much processing in these threads (workers?), and most of the time they are just waiting on a socket (a blocking operation, isn't it?), I would advise reading about event-driven programming. For sockets this pattern is extremely useful, because you can do everything in one thread. You communicate with one socket at a time, while the remaining connections are simply waiting for data, so almost nothing is lost. When you have sent a few bytes you just check whether another connection needs servicing. You can read about select and epoll.
In Python there are several libraries for working with this nicely:
libev (a C library wrapper) - pyev
tornado
twisted
I have used tornado in some projects and it handled this task very well. Libev is nice too, but it is a C wrapper, so it is a bit low-level (though very nice for some tasks).
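To make the event-driven idea concrete, here is a bare-bones single-threaded echo server using select (the port, buffer size, and plain echo are placeholders; a real version would add the '.$$$$.' framing and the getreply() call from the question):
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('', 8888))
server.listen(5)
server.setblocking(False)

sockets = [server]
while True:
    readable, _, _ = select.select(sockets, [], [])
    for s in readable:
        if s is server:
            client, addr = server.accept()
            client.setblocking(False)
            sockets.append(client)
        else:
            data = s.recv(4096)
            if data:
                s.send(data)       # echo back; real code would parse and reply
            else:
                sockets.remove(s)  # client disconnected: no zombie threads left behind
                s.close()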
So you should use socket.settimeout(float) on the clientsocket, as one of the comments suggested.
The reason you don't see any difference is that when you call socket.recv(bufsize[, flags]) and the timeout runs out, a socket.timeout exception is raised, your bare except catches it, and you exit.
try:
    data = clientsocket.recv(1)
except:
    sys.exit()
should be something like:
try:
    data = clientsocket.recv(1)
except timeout:
    # timeout occurred
    # handle it
    clientsocket.close()
    sys.exit()
