Suppose I have one master process that divides up data to be processed in parallel. Let's say there are 1000 chunks of data and 100 nodes on which to run the computations.
Is there some way to do REQ/REP to keep all the workers busy? I've tried to use the load balancer pattern in the guide, but with a single client, sock.recv() is going to block until it receives its response from the worker.
Here is the code, slightly modified from the zmq guide for a load balancer. It starts up one client, 10 workers, and a load balancer/broker in the middle. How can I get all those workers working at the same time?
from __future__ import print_function
from multiprocessing import Process
import zmq
import time
import uuid
import random

def client_task():
    """Basic request-reply client using REQ socket."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://frontend.ipc")
    # Send request, get reply
    for i in range(100):
        print("SENDING: ", i)
        socket.send('WORK')
        msg = socket.recv()
        print(msg)

def worker_task():
    """Worker task, using a REQ socket to do load-balancing."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://backend.ipc")
    # Tell broker we're ready for work
    socket.send(b"READY")
    while True:
        address, empty, request = socket.recv_multipart()
        time.sleep(random.randint(1, 4))
        socket.send_multipart([address, b"", b"OK : " + str(socket.identity)])
def broker():
    context = zmq.Context()
    frontend = context.socket(zmq.ROUTER)
    frontend.bind("ipc://frontend.ipc")
    backend = context.socket(zmq.ROUTER)
    backend.bind("ipc://backend.ipc")
    # Initialize main loop state
    workers = []
    poller = zmq.Poller()
    # Only poll for requests from backend until workers are available
    poller.register(backend, zmq.POLLIN)
    while True:
        sockets = dict(poller.poll())
        if backend in sockets:
            # Handle worker activity on the backend
            request = backend.recv_multipart()
            worker, empty, client = request[:3]
            if not workers:
                # Poll for clients now that a worker is available
                poller.register(frontend, zmq.POLLIN)
            workers.append(worker)
            if client != b"READY" and len(request) > 3:
                # If client reply, send rest back to frontend
                empty, reply = request[3:]
                frontend.send_multipart([client, b"", reply])
        if frontend in sockets:
            # Get next client request, route to last-used worker
            client, empty, request = frontend.recv_multipart()
            worker = workers.pop(0)
            backend.send_multipart([worker, b"", client, b"", request])
            if not workers:
                # Don't poll clients if no workers are available
                poller.unregister(frontend)
    # Clean up
    backend.close()
    frontend.close()
    context.term()
def main():
    NUM_CLIENTS = 1
    NUM_WORKERS = 10
    # Start background tasks
    def start(task, *args):
        process = Process(target=task, args=args)
        process.start()
    start(broker)
    for i in range(NUM_CLIENTS):
        start(client_task)
    for i in range(NUM_WORKERS):
        start(worker_task)
    # Process(target=broker).start()

if __name__ == "__main__":
    main()
I guess there are different ways to do this:
- you can, for example, use the threading module to launch all your requests from your single client, with something like:
import threading
import zmq

result_list = []  # Add the result to a list for the example
rlock = threading.RLock()

def client_thread(client_url, request, i):
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)
    socket.send(request.encode())
    reply = socket.recv()
    with rlock:
        result_list.append((i, reply))
    return

def client_task():
    # tasks = list with all your tasks
    url_client = "ipc://frontend.ipc"
    threads = []
    for i in range(len(tasks)):
        thread = threading.Thread(target=client_thread,
                                  args=(url_client, tasks[i], i,))
        thread.start()
        threads.append(thread)
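If you need all the replies before going further, the threads can then be joined at the end of client_task; this last bit is my addition rather than part of the original snippet:

    # still inside client_task(): wait for every request/reply round trip
    for thread in threads:
        thread.join()
    return result_list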
- you can take advantage of an evented library like asyncio (there is a submodule zmq.asyncio and another library, aiozmq, the latter offering a higher level of abstraction). In this case you will still send your requests to the workers sequentially, but without blocking on each response (and so without keeping the main loop busy), and collect the results as they come back to the main loop. This could look like this:
import asyncio
import zmq.asyncio

async def client_async(request, context, i, client_url):
    """Basic client sending a request (REQ) to a ROUTER (the broker)"""
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)
    await socket.send(request.encode())
    reply = await socket.recv()
    socket.close()
    return reply

async def run(loop):
    # tasks = list full of tasks
    url_client = "ipc://frontend.ipc"
    asyncio_tasks = []
    ctx = zmq.asyncio.Context()
    for i in range(len(tasks)):
        task = asyncio.ensure_future(client_async(tasks[i], ctx, i, url_client))
        asyncio_tasks.append(task)
    responses = await asyncio.gather(*asyncio_tasks)
    return responses

zmq.asyncio.install()
loop = asyncio.get_event_loop()
results = loop.run_until_complete(run(loop))
I haven't tested these two snippets, but they both come (with modifications to fit the question) from code I have that uses zmq in a similar configuration to your question.
Related
I'm building a web server with django 1.11.5 that uses celery-3.1.23 and rabbitmq as the message queue manager, to send async tasks to a number of different daemon processes (long-running processes with an infinite loop).
How can I dynamically create a queue for each process separately, receive messages from that process's queue inside the daemon process, do something asynchronously, and then forward the result to another "aggregator queue" that collects and validates the results and sends a response to the user? (Please see the attached illustration.)
So far, I have connected the processes via multiprocessing.connection Client and Listener objects, and started the processes with the Process object.
code - consumer:
from multiprocessing.connection import Listener
from multiprocessing import Process

def main_process_loop(path, port_id, auth_key):
    # Initialize the ActionHandler instance to handle the work stream:
    action_handler = ActionHandler(path)
    # Initialize the infinite loop that will run the process:
    pid, auth_key_bytes = int(port_id), bytes(auth_key)
    address = ('localhost', pid)  # family is deduced to be 'AF_INET'
    while True:
        try:
            listener = Listener(address, authkey=auth_key_bytes)
            conn = listener.accept()
            input_values = conn.recv()
            listener.close()
            if input_values == None:
                raise Exception(ERR_MSG_INVALID_ARGV)
            else:
                # do something with input_values and ActionHandler
                # need to return success message to user
                pass
        except Exception as err:
            # need to return fail message to user
            pass

if __name__ == '__main__':
    # worker_processes = []
    for auth_key, port_id in PID_DICT.items():
        path = TEMPLATE_FORMAT.format(auth_key)
        p = Process(target=main_process_loop, args=(path, port_id, auth_key))
        # worker_processes.append(p)
        p.start()
    # for p in worker_processes:
    #     p.join()
    # print "all processes have been initiated"
code - celery task:
import os
from multiprocessing.connection import Client
from celery import Celery

app = Celery('tasks', broker='amqp://localhost:5672//')

@app.task
def run_backend_processes(a_lst, b_lst, in_type, out_path, in_file_name):
    ARGV_FORMAT = r"IN_TYPE={0} IN_PATH={1} B_LEVEL=" + str(b_lst) + " OUT_PATH={2}"
    ##################################################
    for process in a_lst:
        pid = {
            'A': 6001,
            'B': 6002,
            'C': 6003,
            'D': 6004,
        }[process]
        file_path = os.path.join(out_path, process + "_" + in_file_name)
        argv_string = ARGV_FORMAT.format(in_type, file_path, out_path)
        address = ('localhost', int(pid))
        conn = Client(address, authkey=bytes(mxd_process))
        conn.send(str(argv_string))
        conn.close()
    return 'process succeed'
The django view is nothing unusual - it just calls "run_backend_processes.delay".
Thank you,
Yoav.
Q&As tried:
Celery parallel distributed task with multiprocessing
Can a celery worker/server accept tasks from a non celery producer?
The Summary
I'm writing a daemon to run as a Linux service. It communicates with an API (which I'm also writing, but that isn't part of the problem), gets data, does stuff with that data, then feeds the newly munged data back to the API.
When I issue the script.py start command, it works just fine. The daemon process starts the script and the daemon Threads kick off and run.
However, when I issue the script.py stop command, the daemon Threads keep running. Stopping the main thread (the one kicked off by script.py start) doesn't stop the daemon Threads.
I can still see them running with ps ux. And they keep running until manually killed.
The Question
How do I get my script.py stop to kill the daemon Threads as well as the main thread launched by the daemon module?
The Details
More in depth: It's a network device polling engine with a server/agent model. This is the agent side.
There are two daemon threads:
GetterThread
PutterThread
There are up to 15 worker threads of class WorkerThread that can be launched to either ping or SNMP poll the inventory of a given IP address. They merely launch a sub-process that does the actual pinging or polling.
There are three data Queues:
ping_request_queue
poll_request_queue
result_queue
The whole thing is wrapped up in a custom class called App that is controlled by the daemon module
GetterThread
class GetterThread(threading.Thread):
    """ This thread is responsible for fetching the ping and poll lists from the server and dropping them into the
    appropriate queue """

    server = None  # type: Server
    ping_request_queue = None  # type: Queue.Queue
    poll_request_queue = None  # type: Queue.Queue

    def __init__(self, server, ping_request_queue, poll_request_queue):
        """
        Create the Thread
        :param Server server: The server to use
        :param Queue.Queue ping_request_queue:
        :param Queue.Queue poll_request_queue:
        """
        threading.Thread.__init__(self)
        self.ctl = ThreadController()
        self.server = server  # type: Server
        self.ping_request_queue = ping_request_queue  # type: Queue.Queue
        self.poll_request_queue = poll_request_queue  # type: Queue.Queue

    def run(self):
        while self.ctl.run:
            if not self.server.online:
                sleep(30)
                self.server.check_in()
                continue
            sleep(1)
            ping_list, poll_list = self.server.get_lists()
            for r in ping_list:
                req = PingRequest.decode(r)
                self.ping_request_queue.put(req)
            for r in poll_list:
                req = PollRequest.decode(r)
                self.poll_request_queue.put(req)
        self.ctl.remove()
PutterThread
class PutterThread(threading.Thread):
    """ This thread is responsible for picking up results from the results_queue and sending them to the server """

    server = None  # type: Server
    q = None  # type: Queue.Queue

    def __init__(self, server, result_queue):
        """
        Create a thread to put the results on the server
        :param Queue.Queue result_queue:
        """
        threading.Thread.__init__(self)
        self.ctl = ThreadController()
        self.server = server  # type: Server
        self.q = result_queue

    def run(self):
        while self.ctl.run:
            if not self.server.online:
                sleep(30)
                self.server.check_in()
                continue
            sleep(1)
            if self.q.not_empty:
                result = self.q.get()
                if isinstance(result, Request):
                    if result.stage == Request.PINGED:
                        """ Send the ping results """
                        f = self.server.send_ping_results
                        lmsg = 'Sent ping result for request {}'.format(result.uuid)
                    elif result.stage == Request.POLLED:
                        f = self.server.send_poll_results
                        lmsg = 'Sent poll result for request {}'.format(result.uuid)
                    else:
                        continue
                    f(result)
                    logging.debug(lmsg)
                else:
                    logging.info('Bad request in queue: {!r}'.format(result))
        self.ctl.remove()
Both the getter and putter thread instances are set as daemons.
I'm running the whole script as a daemon:
class App:
    def __init__(self):
        self.pidfile_path = "/var/run/project/poller.agent.pid"
        self.logfile_path = "/var/log/project/poller.agent.log"
        self.handler = logging.FileHandler(self.logfile_path)

    def run(self):
        result_queue = Queue.Queue()
        ping_request_queue = Queue.Queue()
        poll_request_queue = Queue.Queue()

        getter_thread = GetterThread(self.server, ping_request_queue, poll_request_queue)
        getter_thread.setName('GetterThread')
        getter_thread.setDaemon(True)

        putter_thread = PutterThread(self.server, result_queue)
        putter_thread.setName('PutterThread')
        putter_thread.setDaemon(True)

        worker_threads = []
        max_threads = {
            'ping': 5,
            'poll': 10,
        }
        thread_defs = [
            ('ping', ping_request_queue, result_queue),
            ('poll', poll_request_queue, result_queue)
        ]
        while True:
            if ping_request_queue.not_empty or poll_request_queue.not_empty:
                for thread_def in thread_defs:
                    thread_type, input_queue, output_queue = thread_def
                    thread_count = min(input_queue.qsize(), max_threads.get(thread_type))
                    for x in range(thread_count):
                        t = WorkerThread(*thread_def)
                        t.setName('WorkerThread-{}-{:02n}'.format(thread_type, x))
                        worker_threads.append(t)
                        t.start()
            sleep(1)

if __name__ == "__main__":
    app = App()
    daemon_runner = runner.DaemonRunner(app)
    daemon_runner.daemon_context.files_preserve = [app.handler.stream]
    daemon_runner.do_action()
I need to process frames from a webcam and send a few selected frames to a remote websocket server. The server answers immediately with a confirmation message (much like an echo server).
Frame processing is slow and CPU-intensive, so I want to do it in a separate thread pool (producer) to use all the available cores. The client (consumer) just sits idle until the pool has something to send.
My current implementation (see below) works fine only if I add a small sleep inside the producer test loop. If I remove this delay I stop receiving any answers from the server (both from the echo server and from my real server). Even the first answer is lost, so I do not think this is a flood protection mechanism.
What am I doing wrong?
import tornado
from tornado.websocket import websocket_connect
from tornado import gen, queues
import time

class TornadoClient(object):
    url = None
    onMessageReceived = None
    onMessageSent = None
    ioloop = tornado.ioloop.IOLoop.current()
    q = queues.Queue()

    def __init__(self, url, onMessageReceived, onMessageSent):
        self.url = url
        self.onMessageReceived = onMessageReceived
        self.onMessageSent = onMessageSent

    def enqueueMessage(self, msgData, binary=False):
        print("TornadoClient.enqueueMessage")
        self.ioloop.add_callback(self.addToQueue, (msgData, binary))
        print("TornadoClient.enqueueMessage done")

    @gen.coroutine
    def addToQueue(self, msgTuple):
        yield self.q.put(msgTuple)

    @gen.coroutine
    def main_loop(self):
        connection = None
        try:
            while True:
                while connection is None:
                    try:
                        print("Connecting...")
                        connection = yield websocket_connect(self.url)
                        print("Connected " + str(connection))
                    except Exception, e:
                        print("Exception on connection " + str(e))
                        connection = None
                        print("Retry in a few seconds...")
                        yield gen.Task(self.ioloop.add_timeout, time.time() + 3)
                try:
                    print("Waiting for data to send...")
                    msgData, binaryVal = yield self.q.get()
                    print("Writing...")
                    sendFuture = connection.write_message(msgData, binary=binaryVal)
                    print("Write scheduled...")
                finally:
                    self.q.task_done()
                yield sendFuture
                self.onMessageSent("Sent ok")
                print("Write done. Reading...")
                msg = yield connection.read_message()
                print("Got msg.")
                self.onMessageReceived(msg)
                if msg is None:
                    print("Connection lost")
                    connection = None
            print("main loop completed")
        except Exception, e:
            print("ExceptionExceptionException")
            print(e)
            connection = None
        print("Exit main_loop function")

    def start(self):
        self.ioloop.run_sync(self.main_loop)
        print("Main loop completed")
######### TEST METHODS #########
def sendMessages(client):
    time.sleep(2)  # TEST only: wait for client startup
    while True:
        client.enqueueMessage("msgData", binary=False)
        time.sleep(1)  # <--- comment this line to break it

def testPrintMessage(msg):
    print("Received: " + str(msg))

def testPrintSentMessage(msg):
    print("Sent: " + msg)

if __name__ == '__main__':
    from threading import Thread
    client = TornadoClient("ws://echo.websocket.org", testPrintMessage, testPrintSentMessage)
    thread = Thread(target=sendMessages, args=(client,))
    thread.start()
    client.start()
My real problem
In my real program I use a "window like" mechanism to protect the consumer (an autobahn.twisted.websocket server): the producer can send up to a maximum number of un-acknowledged messages (the webcam frames), then stops, waiting for half of the window to free up.
The consumer sends a "PROCESSED" message back acknowledging one or more messages (just a counter, not by id).
What I see in the consumer log is that the messages are processed and the answer is sent back, but these acks vanish somewhere in the network.
I have little experience with asyncio, so I wanted to be sure that I'm not missing any yield, annotation or something else.
This is the consumer side log:
2017-05-13 18:59:54+0200 [-] TX Frame to tcp4:192.168.0.5:48964 : fin = True, rsv = 0, opcode = 1, mask = -, length = 21, repeat_length = None, chopsize = None, sync = False, payload = {"type": "PROCESSED"}
2017-05-13 18:59:54+0200 [-] TX Octets to tcp4:192.168.0.5:48964 : sync = False, octets = 81157b2274797065223a202250524f434553534544227d
This is neat code. I believe the reason you need a sleep in your sendMessages thread is because, otherwise, it keeps calling enqueueMessage as fast as possible, millions of times per second. Since enqueueMessage does not wait for the enqueued message to be processed, it keeps calling IOLoop.add_callback as fast as it can, without giving the loop enough opportunity to execute the callbacks.
The loop might make some progress running on the main thread, since you're not actually blocking it. But the sendMessages thread adds callbacks much faster than the loop can handle them. By the time the loop has popped one message from the queue and has begun to process it, millions of new callbacks are added already, which the loop must execute before it can advance to the next stage of message-processing.
Therefore, for your test code, I think it's correct to sleep between calls to enqueueMessage on the thread.
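If you would rather have the producer block than sleep, one option is to bound the number of messages that are enqueued but not yet handled by the loop. This is a rough sketch of my own, not part of the original code; ThrottledClient and MAX_PENDING are made-up names, and it reuses the TornadoClient class from the question:

import threading
from tornado import gen, queues

MAX_PENDING = 100  # made-up bound on queued-but-unsent messages

class ThrottledClient(TornadoClient):
    def __init__(self, *args, **kwargs):
        TornadoClient.__init__(self, *args, **kwargs)
        self.q = queues.Queue(maxsize=MAX_PENDING)      # bounded instance queue
        self._slots = threading.Semaphore(MAX_PENDING)  # limits in-flight callbacks

    def enqueueMessage(self, msgData, binary=False):
        self._slots.acquire()  # blocks the producer thread when the bound is reached
        self.ioloop.add_callback(self.addToQueue, (msgData, binary))

    @gen.coroutine
    def addToQueue(self, msgTuple):
        yield self.q.put(msgTuple)  # waits for room in the bounded queue
        self._slots.release()       # a slot frees up once the loop has queued it

With this in place the sendMessages test loop can call enqueueMessage in a tight loop: the semaphore makes the producer wait instead of piling callbacks onto the IOLoop.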
I'm reading this code http://zguide.zeromq.org/py:mtserver
But when I tried to replace threading.Thread with multiprocessing.Process, I got the error
Assertion failed: ok (mailbox.cpp:84)
The code is:
import time
import multiprocessing
import zmq

def worker_routine(worker_url, context=None):
    """Worker routine"""
    context = context or zmq.Context.instance()
    # Socket to talk to dispatcher
    socket = context.socket(zmq.REP)
    socket.connect(worker_url)
    while True:
        string = socket.recv()
        print("Received request: [ %s ]" % (string))
        # do some 'work'
        time.sleep(1)
        # send reply back to client
        socket.send(b"World")

def main():
    """Server routine"""
    url_worker = "inproc://workers"
    url_client = "tcp://*:5555"
    # Prepare our context and sockets
    context = zmq.Context.instance()
    # Socket to talk to clients
    clients = context.socket(zmq.ROUTER)
    clients.bind(url_client)
    # Socket to talk to workers
    workers = context.socket(zmq.DEALER)
    workers.bind(url_worker)
    # Launch pool of worker processes
    for i in range(5):
        process = multiprocessing.Process(target=worker_routine, args=(url_worker,))
        process.start()
    zmq.device(zmq.QUEUE, clients, workers)
    # We never get here but clean up anyhow
    clients.close()
    workers.close()
    context.term()

if __name__ == "__main__":
    main()
The limitations of each transport are detailed in the API.
inproc is for intra-process communication (i.e. threads). You should try ipc, which supports inter-process communication, or even just tcp.
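For example, a minimal adjustment along those lines might look like this (the endpoint names are my own illustrative choices, not from the answer; on top of the transport change, it is also safer to create a fresh Context inside each child process rather than relying on one inherited across the fork):

import multiprocessing
import time
import zmq

url_worker = "ipc://workers.ipc"   # or e.g. "tcp://127.0.0.1:5556"; inproc cannot cross processes
url_client = "tcp://*:5555"

def worker_routine(worker_url):
    context = zmq.Context()        # each child builds its own context after the fork
    socket = context.socket(zmq.REP)
    socket.connect(worker_url)
    while True:
        socket.recv()
        time.sleep(1)              # pretend to work
        socket.send(b"World")

def main():
    context = zmq.Context.instance()
    clients = context.socket(zmq.ROUTER)
    clients.bind(url_client)
    workers = context.socket(zmq.DEALER)
    workers.bind(url_worker)
    for i in range(5):
        multiprocessing.Process(target=worker_routine, args=(url_worker,)).start()
    zmq.device(zmq.QUEUE, clients, workers)

if __name__ == "__main__":
    main()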
I have two threads. One is a Worker Thread, the other a Communication Thread.
The Worker Thread is reading data off a serial port, doing some processing, and then enqueueing the results to be sent to a server.
The Communication Thread is reading the results off the queue and sending them. The challenge is that connectivity is wireless, and although usually present, it can be spotty (dropping in and out of range for a few minutes), and I don't want to block the Worker Thread if I lose connectivity.
The pattern I have chosen for this is as follows:
The Worker Thread has an enqueue method which adds the message to a Queue, then sends a signal to inproc://signal using a zmq.PAIR socket.
The Communication Thread uses a zmq.DEALER to communicate with the server (a zmq.ROUTER), but polls the inproc://signal pair in order to register whether there is a new message needing to be sent or not.
The following is a simplified example of the pattern:
import Queue
import zmq
import time
import threading
import simplejson

class ZmqPattern():
    def __init__(self):
        self.q_out = Queue.Queue()
        self.q_in = Queue.Queue()
        self.signal = None
        self.API_KEY = 'SOMETHINGCOMPLEX'
        self.zmq_comm_thr = None

    def start_zmq_signal(self):
        self.context = zmq.Context()
        # signal socket for waking the zmq thread to send messages to the relay
        self.signal = self.context.socket(zmq.PAIR)
        self.signal.bind("inproc://signal")

    def enqueue(self, msg):
        print("> pre-enqueue")
        self.q_out.put(msg)
        print("< post-enqueue")
        print(") send sig")
        self.signal.send(b"")
        print("( sig sent")

    def communication_thread(self, q_out):
        poll = zmq.Poller()
        self.endpoint_url = 'tcp://' + '127.0.0.1' + ':' + '9001'
        wake = self.context.socket(zmq.PAIR)
        wake.connect("inproc://signal")
        poll.register(wake, zmq.POLLIN)
        self.socket = self.context.socket(zmq.DEALER)
        self.socket.setsockopt(zmq.IDENTITY, self.API_KEY)
        self.socket.connect(self.endpoint_url)
        poll.register(self.socket, zmq.POLLIN)
        while True:
            sockets = dict(poll.poll())
            if self.socket in sockets:
                message = self.socket.recv()
                message = simplejson.loads(message)
                # Incoming messages which need to be handled on the worker thread
                self.q_in.put(message)
            if wake in sockets:
                wake.recv()
                while not q_out.empty():
                    print(">> Popping off Queue")
                    message = q_out.get()
                    print(">>> Popped off Queue")
                    message = simplejson.dumps(message)
                    print("<<< About to be sent")
                    self.socket.send(message)
                    print("<< Sent")

    def start(self):
        self.start_zmq_signal()
        # ZMQ Thread
        self.zmq_comm_thr = threading.Thread(target=self.communication_thread, args=([self.q_out]))
        self.zmq_comm_thr.daemon = True
        self.zmq_comm_thr.name = "ZMQ Thread"
        self.zmq_comm_thr.start()

if __name__ == '__main__':
    test = ZmqPattern()
    test.start()
    print '###############################################'
    print '############## Starting comms #################'
    print "###############################################"
    last_debug = time.time()
    test_msg = {}
    for c in xrange(1000):
        key = 'something{}'.format(c)
        val = 'important{}'.format(c)
        test_msg[key] = val
    while True:
        test.enqueue(test_msg)
        if time.time() - last_debug > 1:
            last_debug = time.time()
            print "Still alive..."
If you run this, you'll see the dealer blocks, as there is no router on the other end, and shortly after, the pair blocks, as the Communication Thread isn't receiving.
How should I best set up the inproc zmq sockets so that they don't block the Worker Thread?
FYI, the most the entire system would need to buffer is on the order of 200k messages, each around 256 bytes.
The dealer socket has a limit on the number of messages it will store, called the high water mark. Right below your dealer socket creation, try:
self.socket = self.context.socket(zmq.DEALER)
self.socket.setsockopt(zmq.SNDHWM, 200000)
And set that number as high as you dare; the limit is your machine's memory.
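As a sanity check, 200,000 messages at ~256 bytes each is only on the order of 50 MB, so a high water mark that large is realistic. A small usage sketch (the endpoint and values mirror the question, the rest is illustrative); note that socket options such as SNDHWM only apply to connections made after they are set, so set them before connect():

import zmq

ctx = zmq.Context()
dealer = ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.SNDHWM, 200000)   # allow ~200k queued outgoing messages
dealer.connect("tcp://127.0.0.1:9001")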
EDIT:
Some good discussion of high water marks in this question:
Majordomo broker: handling large number of connections