ZMQ pair (for signaling) is blocking because of bad connection - python

I have two threads. One is a Worker Thread, the other a Communication Thread.
The Worker Thread is reading data off a serial port, doing some processing, and then enqueueing the results to be sent to a server.
The Communication Thread is reading the results off the queue and sending them. The challenge is that connectivity is wireless, and although usually present, it can be spotty (dropping in and out of range for a few minutes), and I don't want to block the Worker Thread if I lose connectivity.
The pattern I have chosen for this is as follows:
The Worker Thread has an enqueue method which adds the message to a Queue, then sends a signal to inproc://signal using a zmq.PAIR socket.
The Communication Thread uses a zmq.DEALER to communicate with the server (a zmq.ROUTER), but polls the inproc://signal PAIR in order to register whether there is a new message that needs sending or not.
The following is a simplified example of the pattern:
import Queue
import zmq
import time
import threading
import simplejson

class ZmqPattern():
    def __init__(self):
        self.q_out = Queue.Queue()
        self.q_in = Queue.Queue()
        self.signal = None
        self.API_KEY = 'SOMETHINGCOMPLEX'
        self.zmq_comm_thr = None

    def start_zmq_signal(self):
        self.context = zmq.Context()
        # signal socket for waking the zmq thread to send messages to the relay
        self.signal = self.context.socket(zmq.PAIR)
        self.signal.bind("inproc://signal")

    def enqueue(self, msg):
        print("> pre-enqueue")
        self.q_out.put(msg)
        print("< post-enqueue")
        print(") send sig")
        self.signal.send(b"")
        print("( sig sent")

    def communication_thread(self, q_out):
        poll = zmq.Poller()
        self.endpoint_url = 'tcp://' + '127.0.0.1' + ':' + '9001'

        wake = self.context.socket(zmq.PAIR)
        wake.connect("inproc://signal")
        poll.register(wake, zmq.POLLIN)

        self.socket = self.context.socket(zmq.DEALER)
        self.socket.setsockopt(zmq.IDENTITY, self.API_KEY)
        self.socket.connect(self.endpoint_url)
        poll.register(self.socket, zmq.POLLIN)

        while True:
            sockets = dict(poll.poll())

            if self.socket in sockets:
                message = self.socket.recv()
                message = simplejson.loads(message)
                # Incoming messages which need to be handled on the worker thread
                self.q_in.put(message)

            if wake in sockets:
                wake.recv()
                while not q_out.empty():
                    print(">> Popping off Queue")
                    message = q_out.get()
                    print(">>> Popped off Queue")
                    message = simplejson.dumps(message)
                    print("<<< About to be sent")
                    self.socket.send(message)
                    print("<< Sent")

    def start(self):
        self.start_zmq_signal()
        # ZMQ Thread
        self.zmq_comm_thr = threading.Thread(target=self.communication_thread, args=([self.q_out]))
        self.zmq_comm_thr.daemon = True
        self.zmq_comm_thr.name = "ZMQ Thread"
        self.zmq_comm_thr.start()


if __name__ == '__main__':
    test = ZmqPattern()
    test.start()

    print '###############################################'
    print '############## Starting comms #################'
    print "###############################################"

    last_debug = time.time()
    test_msg = {}
    for c in xrange(1000):
        key = 'something{}'.format(c)
        val = 'important{}'.format(c)
        test_msg[key] = val

    while True:
        test.enqueue(test_msg)
        if time.time() - last_debug > 1:
            last_debug = time.time()
            print "Still alive..."
If you run this, you'll see the DEALER blocks as there is no ROUTER on the other end, and shortly after, the PAIR blocks as the Communication Thread isn't receiving.
How should I best set up the inproc ZMQ sockets so that the Worker Thread is never blocked?
FYI, the most the entire system would need to buffer is on the order of 200k messages, and each message is around 256 bytes.

The dealer socket has a limit on the number of messages it will store, called the high water mark. Right below your dealer socket creation, try:
self.socket = self.context.socket(zmq.DEALER)
self.socket.setsockopt(zmq.SNDHWM, 200000)
And set that number as high as you dare; the limit is your machine's memory.
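For the stated worst case (200,000 messages × 256 bytes ≈ 50 MB) that HWM fits comfortably in memory on most machines. Note that the inproc PAIR used for signaling has its own high water mark too, after which its send() also blocks. Two possible mitigations, sketched below (these are suggestions beyond the original answer, not tested against your setup):

# raise the HWM on the signaling PAIR as well (set before bind/connect)
self.signal.setsockopt(zmq.SNDHWM, 200000)

# or make the wake-up best-effort so enqueue() can never block;
# a dropped wake-up only delays sending until the next signal arrives
try:
    self.signal.send(b"", zmq.DONTWAIT)
except zmq.Again:
    pass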
EDIT:
Some good discussion of high water marks in this question:
Majordomo broker: handling large number of connections

Related

Python socket.recv(8192) is hanging despite epoll

I'm quite sure the problem is mine, but I can't step back far enough to see what I've done wrong.
I'm using epoll to check if there's data in the pipe from clients.
If there is, I retrieve it and just put it in a placeholder, and if the received data has length 0 I disconnect.
But for whatever reason, .recv(8192) becomes a blocking call, holding up the code for 5 solid seconds. Which is coincidentally the same delay I'm using as the sleep for the first thread in the client application.
Server side:
import sys
from select import epoll, EPOLLIN, EPOLLHUP
from socket import *
from time import time

socks = {}
polly = epoll()

sock = socket()
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
sock.bind(('', 1337))
sock.listen(4)

mainFid = sock.fileno()
polly.register(mainFid, EPOLLIN)

while 1:
    for fid, eid in polly.poll(0.25):
        #print('New event:', fid, eid)
        if fid == mainFid:
            ns, na = sock.accept()
            polly.register(ns.fileno())
            print(time(), na, 'connected', ns.fileno())
            socks[ns.fileno()] = {'sock': ns, 'addr': na}
        elif fid in socks:
            print(time(), 'Client is sending data!')
            data = socks[fid]['sock'].recv(8192)
            print(time(), 'Data received successfully.')
            if len(data) <= 0: socks[fid]['sock'].close()
Client side:
from socket import *
from time import sleep, time
from threading import *

class aClient(Thread):
    def __init__(self, delay=5):
        Thread.__init__(self)
        self.s = socket()
        self.delay = delay
        print(time(), 'Connecting.')
        self.s.connect(('127.0.0.1', 1337))
        self.start()

    def run(self):
        print(time(), 'Sending data.')
        self.s.send(b'Get me data!')
        print(time(), 'Data has been sent.')
        sleep(self.delay)
        self.s.close()

aClient()
sleep(2)
aClient(10)
I can't for the life of me understand why recv() becomes a blocking call; there's obviously(?) data in the pipe.
For starters, this makes the code useless in terms of speed, but clients will also receive "Connection refused" because I'm not picking them up quickly enough, since I spend way too much time in recv().
Using python -m trace --trace test_socket.py it also shows that it gets stuck on:
test_socket.py(27): data = socks[fid]['sock'].recv(8192)
Whenever the client disconnects, the recv() releases; that's the correlation with the client's sleep.
And this is normally not how it's supposed to happen.
I've tested on two different machines, including not using 127.0.0.1 as the source and destination. Same thing there. And searching for a solution just returns my own answers in SO threads.
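One thing worth checking (an observation on the posted code, not from the original thread): epoll.register() called without an eventmask defaults to EPOLLIN | EPOLLPRI | EPOLLOUT. A freshly accepted socket is immediately writable, so poll() reports it right away and the code calls recv() before any data has arrived, blocking until the client actually sends or disconnects. Registering the accepted socket for readability only avoids that:

# hypothetical fix: restrict the accepted socket to read events,
# instead of the default mask, which includes EPOLLOUT
polly.register(ns.fileno(), EPOLLIN)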

Tornado websocket client losing response messages?

I need to process frames from a webcam and send a few selected frames to a remote websocket server. The server answers immediately with a confirmation message (much like an echo server).
Frame processing is slow and cpu intensive so I want to do it using a separate thread pool (producer) to use all the available cores. So the client (consumer) just sits idle until the pool has something to send.
My current implementation, see below, works fine only if I add a small sleep inside the producer test loop. If I remove this delay I stop receiving any answer from the server (both the echo server and from my real server). Even the first answer is lost, so I do not think this is a flood protection mechanism.
What am I doing wrong?
import tornado
from tornado.websocket import websocket_connect
from tornado import gen, queues
import time

class TornadoClient(object):
    url = None
    onMessageReceived = None
    onMessageSent = None
    ioloop = tornado.ioloop.IOLoop.current()
    q = queues.Queue()

    def __init__(self, url, onMessageReceived, onMessageSent):
        self.url = url
        self.onMessageReceived = onMessageReceived
        self.onMessageSent = onMessageSent

    def enqueueMessage(self, msgData, binary=False):
        print("TornadoClient.enqueueMessage")
        self.ioloop.add_callback(self.addToQueue, (msgData, binary))
        print("TornadoClient.enqueueMessage done")

    @gen.coroutine
    def addToQueue(self, msgTuple):
        yield self.q.put(msgTuple)

    @gen.coroutine
    def main_loop(self):
        connection = None
        try:
            while True:
                while connection is None:
                    try:
                        print("Connecting...")
                        connection = yield websocket_connect(self.url)
                        print("Connected " + str(connection))
                    except Exception, e:
                        print("Exception on connection " + str(e))
                        connection = None
                        print("Retry in a few seconds...")
                        yield gen.Task(self.ioloop.add_timeout, time.time() + 3)
                try:
                    print("Waiting for data to send...")
                    msgData, binaryVal = yield self.q.get()
                    print("Writing...")
                    sendFuture = connection.write_message(msgData, binary=binaryVal)
                    print("Write scheduled...")
                finally:
                    self.q.task_done()

                yield sendFuture
                self.onMessageSent("Sent ok")
                print("Write done. Reading...")

                msg = yield connection.read_message()
                print("Got msg.")
                self.onMessageReceived(msg)
                if msg is None:
                    print("Connection lost")
                    connection = None

            print("main loop completed")
        except Exception, e:
            print("ExceptionExceptionException")
            print(e)
            connection = None
        print("Exit main_loop function")

    def start(self):
        self.ioloop.run_sync(self.main_loop)
        print("Main loop completed")


######### TEST METHODS #########
def sendMessages(client):
    time.sleep(2)  # TEST only: wait for client startup
    while True:
        client.enqueueMessage("msgData", binary=False)
        time.sleep(1)  # <--- comment this line to break it

def testPrintMessage(msg):
    print("Received: " + str(msg))

def testPrintSentMessage(msg):
    print("Sent: " + msg)

if __name__ == '__main__':
    from threading import Thread
    client = TornadoClient("ws://echo.websocket.org", testPrintMessage, testPrintSentMessage)
    thread = Thread(target=sendMessages, args=(client,))
    thread.start()
    client.start()
My real problem
In my real program I use a "window like" mechanism to protect the consumer (an autobahn.twisted.websocket server): the producer can send up to a maximum number of unacknowledged messages (the webcam frames), then stops and waits for half of the window to free up.
The consumer sends a "PROCESSED" message back, acknowledging one or more messages (just a counter, not by id).
What I see in the consumer log is that the messages are processed and the answer is sent back, but these acks vanish somewhere in the network.
I have little experience with asyncio, so I wanted to be sure that I'm not missing a yield, a decorator, or something else.
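For reference, a minimal sketch of the window mechanism described above (the names, the window size, and the threading primitives are illustrative assumptions, not taken from the real program):

import threading

WINDOW = 8
unacked = 0
cond = threading.Condition()

def send_frame(frame, send):
    # producer side: block while the window is full; the consumer's
    # acknowledgements wake us once half of the window has freed up
    global unacked
    with cond:
        while unacked >= WINDOW:
            cond.wait()
        unacked += 1
    send(frame)

def on_processed(count):
    # consumer acks one or more frames with a single "PROCESSED" message
    global unacked
    with cond:
        unacked -= count
        if unacked <= WINDOW // 2:
            cond.notify_all()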
This is the consumer side log:
2017-05-13 18:59:54+0200 [-] TX Frame to tcp4:192.168.0.5:48964 : fin = True, rsv = 0, opcode = 1, mask = -, length = 21, repeat_length = None, chopsize = None, sync = False, payload = {"type": "PROCESSED"}
2017-05-13 18:59:54+0200 [-] TX Octets to tcp4:192.168.0.5:48964 : sync = False, octets = 81157b2274797065223a202250524f434553534544227d
This is neat code. I believe the reason you need a sleep in your sendMessages thread is because, otherwise, it keeps calling enqueueMessage as fast as possible, millions of times per second. Since enqueueMessage does not wait for the enqueued message to be processed, it keeps calling IOLoop.add_callback as fast as it can, without giving the loop enough opportunity to execute the callbacks.
The loop might make some progress running on the main thread, since you're not actually blocking it. But the sendMessages thread adds callbacks much faster than the loop can handle them. By the time the loop has popped one message from the queue and has begun to process it, millions of new callbacks are added already, which the loop must execute before it can advance to the next stage of message-processing.
Therefore, for your test code, I think it's correct to sleep between calls to enqueueMessage on the thread.
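If the producer must not sleep, one alternative (a sketch, not part of the original answer; enqueueMessageBlocking is a hypothetical name) is to make the producer thread wait until the loop has actually queued each message, which throttles it to the loop's own pace:

import threading

def enqueueMessageBlocking(self, msgData, binary=False):
    # block the calling thread until the IOLoop has run the callback,
    # so callbacks cannot pile up faster than the loop drains them
    done = threading.Event()
    def _put():
        self.q.put_nowait((msgData, binary))
        done.set()
    self.ioloop.add_callback(_put)
    done.wait()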

ZeroMQ: load balance many workers and one master

Suppose I have one master process that divides up data to be processed in parallel. Let's say there are 1000 chunks of data and 100 nodes on which to run the computations.
Is there some way to do REQ/REP to keep all the workers busy? I've tried to use the load balancer pattern in the guide but with a single client, sock.recv() is going to block until it receives its response from the worker.
Here is the code, slightly modified from the zmq guide for a load balancer. It starts up one client, 10 workers, and a load balancer/broker in the middle. How can I get all those workers working at the same time?
from __future__ import print_function
from multiprocessing import Process
import zmq
import time
import uuid
import random

def client_task():
    """Basic request-reply client using REQ socket."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://frontend.ipc")
    # Send request, get reply
    for i in range(100):
        print("SENDING: ", i)
        socket.send('WORK')
        msg = socket.recv()
        print(msg)

def worker_task():
    """Worker task, using a REQ socket to do load-balancing."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://backend.ipc")
    # Tell broker we're ready for work
    socket.send(b"READY")
    while True:
        address, empty, request = socket.recv_multipart()
        time.sleep(random.randint(1, 4))
        socket.send_multipart([address, b"", b"OK : " + str(socket.identity)])

def broker():
    context = zmq.Context()
    frontend = context.socket(zmq.ROUTER)
    frontend.bind("ipc://frontend.ipc")
    backend = context.socket(zmq.ROUTER)
    backend.bind("ipc://backend.ipc")

    # Initialize main loop state
    workers = []
    poller = zmq.Poller()
    # Only poll for requests from backend until workers are available
    poller.register(backend, zmq.POLLIN)

    while True:
        sockets = dict(poller.poll())

        if backend in sockets:
            # Handle worker activity on the backend
            request = backend.recv_multipart()
            worker, empty, client = request[:3]
            if not workers:
                # Poll for clients now that a worker is available
                poller.register(frontend, zmq.POLLIN)
            workers.append(worker)
            if client != b"READY" and len(request) > 3:
                # If client reply, send rest back to frontend
                empty, reply = request[3:]
                frontend.send_multipart([client, b"", reply])

        if frontend in sockets:
            # Get next client request, route to last-used worker
            client, empty, request = frontend.recv_multipart()
            worker = workers.pop(0)
            backend.send_multipart([worker, b"", client, b"", request])
            if not workers:
                # Don't poll clients if no workers are available
                poller.unregister(frontend)

    # Clean up
    backend.close()
    frontend.close()
    context.term()

def main():
    NUM_CLIENTS = 1
    NUM_WORKERS = 10

    # Start background tasks
    def start(task, *args):
        process = Process(target=task, args=args)
        process.start()

    start(broker)
    for i in range(NUM_CLIENTS):
        start(client_task)
    for i in range(NUM_WORKERS):
        start(worker_task)
    # Process(target=broker).start()

if __name__ == "__main__":
    main()
I guess there are different ways to do this:
- you can, for example, use the threading module to launch all your requests from your single client, with something like:
import threading
import zmq

result_list = []  # Add the result to a list for the example
rlock = threading.RLock()

def client_thread(client_url, request, i):
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)
    socket.send(request.encode())
    reply = socket.recv()
    with rlock:
        result_list.append((i, reply))
    return

def client_task():
    # tasks = list with all your tasks
    url_client = "ipc://frontend.ipc"
    threads = []
    for i in range(len(tasks)):
        thread = threading.Thread(target=client_thread,
                                  args=(url_client, tasks[i], i,))
        thread.start()
        threads.append(thread)
- you can take advantage of an evented library like asyncio (there is a submodule zmq.asyncio and another library, aiozmq; the latter offers a higher level of abstraction). In this case you will send your requests to the workers sequentially too, but without blocking on each response (and so without keeping the main loop busy), and you get the results when they come back to the main loop. This could look like this:
import asyncio
import zmq
import zmq.asyncio

async def client_async(request, context, i, client_url):
    """Basic client sending a request (REQ) to a ROUTER (the broker)"""
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)
    await socket.send(request.encode())
    reply = await socket.recv()
    socket.close()
    return reply

async def run(loop):
    # tasks = list full of tasks
    url_client = "ipc://frontend.ipc"
    asyncio_tasks = []
    ctx = zmq.asyncio.Context()
    for i in range(len(tasks)):
        task = asyncio.ensure_future(client_async(tasks[i], ctx, i, url_client))
        asyncio_tasks.append(task)
    responses = await asyncio.gather(*asyncio_tasks)
    return responses

zmq.asyncio.install()
loop = asyncio.get_event_loop()
results = loop.run_until_complete(run(loop))
I haven't tested these two snippets, but both come (with modifications to fit the question) from code I have that uses zmq in a configuration similar to your question's.

ZeroMQ round robin and workers subscription

I have some clients connecting to a frontend broker and some workers doing some jobs.
The ZeroMQ pattern I use:
How can I have round-robin distribution for my workers AND worker selection based on the event name?
I used the PUB/SUB pattern for the subscription filtering, but I don't want my broker to send the same message to every worker.
Here some code (python3, zmq):
client.py
import random
import time
import zmq

context = zmq.Context()
socket = context.socket(zmq.DEALER)
socket.identity = b'frontend'
socket.connect('tcp://127.0.0.1:4444')

while True:
    event = random.choice([b'CreateUser', b'GetIndex', b'GetIndex', b'GetIndex'])
    socket.send(event)
    print('Emit %s event' % event)
    time.sleep(1)
broker.py
import zmq

context = zmq.Context()
frontend = context.socket(zmq.ROUTER)
frontend.identity = b'broker'
frontend.bind("tcp://127.0.0.1:4444")

backend = context.socket(zmq.DEALER)
backend.identity = b'broker'
backend.bind("tcp://127.0.0.1:5555")

poller = zmq.Poller()
poller.register(frontend, zmq.POLLIN)
poller.register(backend, zmq.POLLIN)

id = 0
while True:
    id += 1
    sockets = dict(poller.poll())
    if frontend in sockets:
        event, message = frontend.recv_multipart()
        print('Event %s from %s' % (message.decode('utf-8'), event.decode('utf-8')))
        backend.send_multipart([message, str(id).encode('utf-8')])
create_user_worker.py
import zmq

context = zmq.Context()
worker = context.socket(zmq.DEALER)
worker.identity = b'create-user-worker'
worker.connect("tcp://127.0.0.1:5555")

while True:
    message, id = worker.recv_multipart()
    if message == b'CreateUser':
        print(message, id)
get_index_worker.py
import zmq

context = zmq.Context()
worker = context.socket(zmq.DEALER)
worker.identity = b'get-index-worker'
worker.connect("tcp://127.0.0.1:5555")

while True:
    message, id = worker.recv_multipart()
    if message == b'GetIndex':
        print(message, id)
The output of the code above:
get_index_worker.py
b'GetIndex' b'1'
b'GetIndex' b'2'
b'GetIndex' b'4'
b'GetIndex' b'6'
create_user_worker.py
b'CreateUser' b'3'
The task for the event with id 5 is lost.
github repo: https://github.com/guillaumevincent/tornado-zeromq
Status quo: as-is
A ROUTER/DEALER Device is agnostic to any logic other than what its internal design dictates ( listen on the client side, dispatch every incoming message on a round-robin basis down the line towards the worker side, and keep internal records so as to be able to return answer messages from the workers back to the respective client; nothing more ).
How to get more?
Try to imagine another possible approach.
Each client can have more sockets and may get .connect()-ed to more Device-s.
Each Device will receive just the "specialised" type of messages and will handle these appropriately, with the standard round-robin "primitive-load-balancing" Merry-Go-Round behaviour.
This way both of your design objectives ( I. distributing messages towards a pool of otherwise load-balanced handlers, and II. keeping an event-specific direction principle ) are met while still using the most primitive ZeroMQ entities, as the sketch below illustrates.
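A minimal client-side sketch of that topology (the second port and the device-per-event split are assumptions for illustration; each device would be a separate ROUTER/DEALER broker with its own worker pool):

import random
import time
import zmq

context = zmq.Context()

# one DEALER per event type, each connected to its dedicated device;
# each device round-robins only across the workers of its own pool
create_user = context.socket(zmq.DEALER)
create_user.connect('tcp://127.0.0.1:4444')  # CreateUser device
get_index = context.socket(zmq.DEALER)
get_index.connect('tcp://127.0.0.1:4445')    # GetIndex device (assumed port)

sockets = {b'CreateUser': create_user, b'GetIndex': get_index}

while True:
    event = random.choice([b'CreateUser', b'GetIndex', b'GetIndex', b'GetIndex'])
    # event-specific direction first, round-robin inside the device next
    sockets[event].send(event)
    time.sleep(1)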

Using process instead of thread with zeromq

I'm reading this code http://zguide.zeromq.org/py:mtserver
But when I tried to replace threading.Thread with multiprocessing.Process I got the error
Assertion failed: ok (mailbox.cpp:84)
Code is
import time
import multiprocessing
import zmq

def worker_routine(worker_url, context=None):
    """Worker routine"""
    context = context or zmq.Context.instance()
    # Socket to talk to dispatcher
    socket = context.socket(zmq.REP)
    socket.connect(worker_url)

    while True:
        string = socket.recv()
        print("Received request: [ %s ]" % (string))
        # do some 'work'
        time.sleep(1)
        # send reply back to client
        socket.send(b"World")

def main():
    """Server routine"""
    url_worker = "inproc://workers"
    url_client = "tcp://*:5555"

    # Prepare our context and sockets
    context = zmq.Context.instance()

    # Socket to talk to clients
    clients = context.socket(zmq.ROUTER)
    clients.bind(url_client)

    # Socket to talk to workers
    workers = context.socket(zmq.DEALER)
    workers.bind(url_worker)

    # Launch pool of worker processes
    for i in range(5):
        process = multiprocessing.Process(target=worker_routine, args=(url_worker,))
        process.start()

    zmq.device(zmq.QUEUE, clients, workers)

    # We never get here but clean up anyhow
    clients.close()
    workers.close()
    context.term()

if __name__ == "__main__":
    main()
The limitations of each transport are detailed in the API.
inproc is for intra-process communication (i.e. between threads). You should try ipc, which supports inter-process communication, or even just tcp.
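For example, a minimal change to main() along those lines (the ipc path and the alternative tcp port are assumptions, not from the answer):

# was: url_worker = "inproc://workers"   (thread-only transport)
url_worker = "ipc:///tmp/workers.ipc"    # reachable from child processes
# or: url_worker = "tcp://127.0.0.1:5556"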
