I'm building a web server with Django 1.11.5 that uses Celery 3.1.23 and RabbitMQ as the message broker to send async tasks to a number of different daemon processes (long-running processes with an infinite loop).
How can I dynamically create a separate queue for each process, receive messages from that queue inside the daemon process, do something asynchronously, and then forward the result to another "aggregator queue" where the results are collected and validated and a response is sent back to the user? (Please see the attached illustration.)
So far, I connected the processes via multiprocessing.connection Client and Listener objects, and started the processes with the Process object.
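To make the intent more concrete, the kind of routing I'm after looks roughly like this (only a sketch with a made-up task name and queue names, not code I actually have):

from celery import Celery

app = Celery('tasks', broker='amqp://localhost:5672//')

@app.task
def handle_for_daemon(argv_string):
    pass  # the per-daemon work would go here

# producer side: push each job onto a dedicated, named queue
handle_for_daemon.apply_async(args=['IN_TYPE=... IN_PATH=...'], queue='daemon_A')

# consumer side: one worker per daemon, consuming only its own queue, e.g.
#   celery -A tasks worker -Q daemon_A --concurrency=1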
code - consumer:
from multiprocessing.connection import Listener
from multiprocessing import Process

def main_process_loop(path, port_id, auth_key):
    # Initialize the ActionHandler instance to handle the work stream:
    action_handler = ActionHandler(path)
    # Initialize the infinite loop that will run the process:
    pid, auth_key_bytes = int(port_id), bytes(auth_key)
    address = ('localhost', pid)  # family is deduced to be 'AF_INET'
    while True:
        try:
            listener = Listener(address, authkey=auth_key_bytes)
            conn = listener.accept()
            input_values = conn.recv()
            listener.close()
            if input_values is None:
                raise Exception(ERR_MSG_INVALID_ARGV)
            else:
                # do something with input_values and action_handler
                # need to return success message to user
                pass
        except Exception as err:
            # need to return fail message to user
            pass

if __name__ == '__main__':
    # worker_processes = []
    for auth_key, port_id in PID_DICT.items():
        path = TEMPLATE_FORMAT.format(auth_key)
        p = Process(target=main_process_loop, args=(path, port_id, auth_key))
        # worker_processes.append(p)
        p.start()
    # for p in worker_processes:
    #     p.join()
    # print "all processes have been initiated"
code - celery task:
import os

from multiprocessing.connection import Client
from celery import Celery

app = Celery('tasks', broker='amqp://localhost:5672//')

@app.task
def run_backend_processes(a_lst, b_lst, in_type, out_path, in_file_name):
    ARGV_FORMAT = r"IN_TYPE={0} IN_PATH={1} B_LEVEL=" + str(b_lst) + " OUT_PATH={2}"
    ##################################################
    for process in a_lst:
        pid = {
            'A': 6001,
            'B': 6002,
            'C': 6003,
            'D': 6004,
        }[process]
        file_path = os.path.join(out_path, process + "_" + in_file_name)
        argv_string = ARGV_FORMAT.format(in_type, file_path, out_path)
        address = ('localhost', int(pid))
        # mxd_process is not defined in this snippet; it must match the
        # auth key used by the corresponding Listener.
        conn = Client(address, authkey=bytes(mxd_process))
        conn.send(str(argv_string))
        conn.close()
    return 'process succeed'
The Django view is nothing special; it just calls "run_backend_processes.delay".
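For completeness, the view boils down to something like this (a sketch; the view name and argument values are made up):

from django.http import JsonResponse
from tasks import run_backend_processes

def start_processing(request):
    # fire-and-forget: enqueue the task and return immediately
    run_backend_processes.delay(
        a_lst=['A', 'B'], b_lst=[1, 2], in_type='shp',
        out_path='/tmp/out', in_file_name='data.shp',
    )
    return JsonResponse({'status': 'queued'})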
Thank you,
Yoav.
Q&A Tried:
Celery parallel distributed task with multiprocessing
Can a celery worker/server accept tasks from a non celery producer?
Related
I have a Celery/Django worker connected via RabbitMQ to the server. When the worker finishes a job I want it to terminate if there are no jobs left - how can I check there are no jobs left in the queue?
Kill the pid when the task is finished, through psutil.
For example:
import psutil
import os
import pika

@celery.task
def my_task():
    pid = os.getpid()  # get the worker pid
    # your code
    return pid  # or store it somewhere

def task_caller():
    result = my_task.apply()
    if no_more_jobs('my_queue'):
        kill_worker(result.get())  # apply() ran the task locally; get() returns the worker pid

def kill_worker(pid):
    try:
        proc = psutil.Process(pid=pid)
        for child in proc.children(recursive=True):
            child.kill()
        proc.kill()
        return True
    except Exception:
        # manage exception
        return False

def no_more_jobs(queue):
    # edit the connection params below
    connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
    channel = connection.channel()
    # consider passive=True to inspect the queue without (re)declaring it
    q = channel.queue_declare(queue)
    return q.method.message_count == 0
Note: this is a basic example which needs to be adapted to your producer/consumer logic.
The Summary
I'm writing a daemon to run as a Linux service. It communicates with an API (which I'm also writing, but which isn't part of the problem), gets data, does stuff with that data, then feeds the newly munged data back to the API.
When I issue the script.py start command, it works just fine. The daemon process starts the script and the daemon Threads kick off and run.
What doesn't happen is when I issue the script.py stop command, the daemon Threads keep running. The stopping of the main thread (the one kicked off by script.py start) doesn't stop the daemon Threads.
I can still see them running with ps ux. And they keep running until manually killed.
The Question
How do I get my script.py stop to kill the daemon Threads as well as the main thread launched by the daemon module?
The Details
More in depth: It's a network device polling engine with a server/agent model. This is the agent side.
There are two daemon threads:
GetterThread
PutterThread
There are up to 15 worker threads of class WorkerThread that can be launched to either ping or SNMP poll the inventory of a given IP address. They merely launch a sub-process that does the actual pinging or polling.
There are three data Queues:
ping_request_queue
poll_request_queue
result_queue
The whole thing is wrapped up in a custom class called App that is controlled by the daemon module.
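The WorkerThread class mentioned above isn't shown in the post; purely as an assumption, the description implies something along these lines:

import subprocess
import threading

# Assumed shape only -- the real WorkerThread is not shown in the post.
class WorkerThread(threading.Thread):
    def __init__(self, thread_type, input_queue, output_queue):
        threading.Thread.__init__(self)
        self.thread_type = thread_type    # 'ping' or 'poll'
        self.input_queue = input_queue
        self.output_queue = output_queue

    def run(self):
        request = self.input_queue.get()
        # launch the sub-process that does the actual pinging/polling
        # ('do_ping'/'do_poll' and request.ip are hypothetical names)
        subprocess.call(['do_{}'.format(self.thread_type), request.ip])
        self.output_queue.put(request)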
GetterThread
class GetterThread(threading.Thread):
    """ This thread is responsible for fetching the ping and poll lists from the server and dropping them into the
    appropriate queue """

    server = None  # type: Server
    ping_request_queue = None  # type: Queue.Queue
    poll_request_queue = None  # type: Queue.Queue

    def __init__(self, server, ping_request_queue, poll_request_queue):
        """
        Create the Thread
        :param Server server: The server to use
        :param Queue.Queue ping_request_queue:
        :param Queue.Queue poll_request_queue:
        """
        threading.Thread.__init__(self)
        self.ctl = ThreadController()
        self.server = server  # type: Server
        self.ping_request_queue = ping_request_queue  # type: Queue.Queue
        self.poll_request_queue = poll_request_queue  # type: Queue.Queue

    def run(self):
        while self.ctl.run:
            if not self.server.online:
                sleep(30)
                self.server.check_in()
                continue
            sleep(1)
            ping_list, poll_list = self.server.get_lists()
            for r in ping_list:
                req = PingRequest.decode(r)
                self.ping_request_queue.put(req)
            for r in poll_list:
                req = PollRequest.decode(r)
                self.poll_request_queue.put(req)
        self.ctl.remove()
PutterThread
class PutterThread(threading.Thread):
    """ This thread is responsible for picking up results from the results_queue and sending them to the server """

    server = None  # type: Server
    q = None  # type: Queue.Queue

    def __init__(self, server, result_queue):
        """
        Create a thread to put the results on the server
        :param Queue.Queue result_queue:
        """
        threading.Thread.__init__(self)
        self.ctl = ThreadController()
        self.server = server  # type: Server
        self.q = result_queue

    def run(self):
        while self.ctl.run:
            if not self.server.online:
                sleep(30)
                self.server.check_in()
                continue
            sleep(1)
            if self.q.not_empty:
                result = self.q.get()
                if isinstance(result, Request):
                    if result.stage == Request.PINGED:
                        """ Send the ping results """
                        f = self.server.send_ping_results
                        lmsg = 'Sent ping result for request {}'.format(result.uuid)
                    elif result.stage == Request.POLLED:
                        f = self.server.send_poll_results
                        lmsg = 'Sent poll result for request {}'.format(result.uuid)
                    else:
                        continue
                    f(result)
                    logging.debug(lmsg)
                else:
                    logging.info('Bad request in queue: {!r}'.format(result))
        self.ctl.remove()
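Both thread classes gate their run() loop on self.ctl.run via a ThreadController that isn't shown in the post; a minimal controller matching that usage might look like this (an assumption, not the actual class):

class ThreadController(object):
    instances = []  # hypothetical registry of live controllers

    def __init__(self):
        self.run = True  # the owning thread loops while this stays True
        ThreadController.instances.append(self)

    def stop(self):
        self.run = False  # ask the owning thread to leave its loop

    def remove(self):
        # called by the thread once its run() method is about to return
        ThreadController.instances.remove(self)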
Both the getter and putter thread instances are set as daemons.
I'm running the whole script as a daemon:
class App:
    def __init__(self):
        self.pidfile_path = "/var/run/project/poller.agent.pid"
        self.logfile_path = "/var/log/project/poller.agent.log"
        self.handler = logging.FileHandler(self.logfile_path)

    def run(self):
        result_queue = Queue.Queue()
        ping_request_queue = Queue.Queue()
        poll_request_queue = Queue.Queue()

        getter_thread = GetterThread(self.server, ping_request_queue, poll_request_queue)
        getter_thread.setName('GetterThread')
        getter_thread.setDaemon(True)

        putter_thread = PutterThread(self.server, result_queue)
        putter_thread.setName('PutterThread')
        putter_thread.setDaemon(True)

        worker_threads = []
        max_threads = {
            'ping': 5,
            'poll': 10,
        }
        thread_defs = [
            ('ping', ping_request_queue, result_queue),
            ('poll', poll_request_queue, result_queue)
        ]

        while True:
            if ping_request_queue.not_empty or poll_request_queue.not_empty:
                for thread_def in thread_defs:
                    thread_type, input_queue, output_queue = thread_def
                    thread_count = min(input_queue.qsize(), max_threads.get(thread_type))
                    for x in range(thread_count):
                        t = WorkerThread(*thread_def)
                        t.setName('WorkerThread-{}-{:02n}'.format(thread_type, x))
                        worker_threads.append(t)
                        t.start()
            sleep(1)

if __name__ == "__main__":
    app = App()
    daemon_runner = runner.DaemonRunner(app)
    daemon_runner.daemon_context.files_preserve = [app.handler.stream]
    daemon_runner.do_action()
Suppose I have one master process that divides up data to be processed in parallel. Let's say there are 1000 chunks of data and 100 nodes on which to run the computations.
Is there some way to do REQ/REP to keep all the workers busy? I've tried to use the load balancer pattern in the guide but with a single client, sock.recv() is going to block until it receives its response from the worker.
Here is the code, slightly modified from the zmq guide for a load balancer. It starts up one client, 10 workers, and a load balancer/broker in the middle. How can I get all those workers working at the same time?
from __future__ import print_function
from multiprocessing import Process
import zmq
import time
import uuid
import random
def client_task():
    """Basic request-reply client using REQ socket."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://frontend.ipc")

    # Send request, get reply
    for i in range(100):
        print("SENDING: ", i)
        socket.send('WORK')
        msg = socket.recv()
        print(msg)

def worker_task():
    """Worker task, using a REQ socket to do load-balancing."""
    socket = zmq.Context().socket(zmq.REQ)
    socket.identity = str(uuid.uuid4())
    socket.connect("ipc://backend.ipc")

    # Tell broker we're ready for work
    socket.send(b"READY")

    while True:
        address, empty, request = socket.recv_multipart()
        time.sleep(random.randint(1, 4))
        socket.send_multipart([address, b"", b"OK : " + str(socket.identity)])

def broker():
    context = zmq.Context()
    frontend = context.socket(zmq.ROUTER)
    frontend.bind("ipc://frontend.ipc")
    backend = context.socket(zmq.ROUTER)
    backend.bind("ipc://backend.ipc")

    # Initialize main loop state
    workers = []
    poller = zmq.Poller()

    # Only poll for requests from backend until workers are available
    poller.register(backend, zmq.POLLIN)

    while True:
        sockets = dict(poller.poll())

        if backend in sockets:
            # Handle worker activity on the backend
            request = backend.recv_multipart()
            worker, empty, client = request[:3]
            if not workers:
                # Poll for clients now that a worker is available
                poller.register(frontend, zmq.POLLIN)
            workers.append(worker)
            if client != b"READY" and len(request) > 3:
                # If client reply, send rest back to frontend
                empty, reply = request[3:]
                frontend.send_multipart([client, b"", reply])

        if frontend in sockets:
            # Get next client request, route to last-used worker
            client, empty, request = frontend.recv_multipart()
            worker = workers.pop(0)
            backend.send_multipart([worker, b"", client, b"", request])
            if not workers:
                # Don't poll clients if no workers are available
                poller.unregister(frontend)

    # Clean up
    backend.close()
    frontend.close()
    context.term()

def main():
    NUM_CLIENTS = 1
    NUM_WORKERS = 10

    # Start background tasks
    def start(task, *args):
        process = Process(target=task, args=args)
        process.start()

    start(broker)
    for i in range(NUM_CLIENTS):
        start(client_task)
    for i in range(NUM_WORKERS):
        start(worker_task)

    # Process(target=broker).start()

if __name__ == "__main__":
    main()
I guess there are different ways to do this:
- You can, for example, use the threading module to launch all your requests from your single client, with something like:
import threading

import zmq

result_list = []  # Add the result to a list for the example
rlock = threading.RLock()

def client_thread(client_url, request, i):
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)

    socket.send(request.encode())
    reply = socket.recv()
    with rlock:
        result_list.append((i, reply))
    return

def client_task():
    # tasks = list with all your tasks
    url_client = "ipc://frontend.ipc"
    threads = []
    for i in range(len(tasks)):
        thread = threading.Thread(target=client_thread,
                                  args=(url_client, tasks[i], i,))
        thread.start()
        threads.append(thread)
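A small usage sketch (an assumption, continuing the snippet above): after starting the threads, wait for them and read back the collected replies.

for thread in threads:
    thread.join()
for i, reply in sorted(result_list):
    print(i, reply)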
- You can take advantage of an evented library like asyncio (there is a submodule zmq.asyncio and another library, aiozmq, the latter offering a higher level of abstraction). In this case you will also send your requests to the workers sequentially, but without blocking on each response (and so without keeping the main loop busy), and get the results when they come back to the main loop. This could look like this:
import asyncio
import zmq.asyncio

async def client_async(request, context, i, client_url):
    """Basic client sending a request (REQ) to a ROUTER (the broker)"""
    socket = context.socket(zmq.REQ)
    socket.setsockopt_string(zmq.IDENTITY, '{}'.format(i))
    socket.connect(client_url)
    await socket.send(request.encode())
    reply = await socket.recv()
    socket.close()
    return reply

async def run(loop):
    # tasks = list full of tasks
    url_client = "ipc://frontend.ipc"
    asyncio_tasks = []
    ctx = zmq.asyncio.Context()
    for i in range(len(tasks)):
        task = asyncio.ensure_future(client_async(tasks[i], ctx, i, url_client))
        asyncio_tasks.append(task)
    responses = await asyncio.gather(*asyncio_tasks)
    return responses

zmq.asyncio.install()
loop = asyncio.get_event_loop()
results = loop.run_until_complete(run(loop))
I haven't tested these two snippets, but they both come (with modifications to fit the question) from code I have that uses zmq in a configuration similar to yours.
I'm reading this code http://zguide.zeromq.org/py:mtserver
But when I tried to replace threading.Thread with multiprocessing.Process I got the error
Assertion failed: ok (mailbox.cpp:84)
Code is
import time
import threading
import multiprocessing

import zmq

def worker_routine(worker_url, context=None):
    """Worker routine"""
    context = context or zmq.Context.instance()
    # Socket to talk to dispatcher
    socket = context.socket(zmq.REP)
    socket.connect(worker_url)

    while True:
        string = socket.recv()
        print("Received request: [ %s ]" % (string))
        # do some 'work'
        time.sleep(1)
        # send reply back to client
        socket.send(b"World")

def main():
    """Server routine"""
    url_worker = "inproc://workers"
    url_client = "tcp://*:5555"

    # Prepare our context and sockets
    context = zmq.Context.instance()

    # Socket to talk to clients
    clients = context.socket(zmq.ROUTER)
    clients.bind(url_client)

    # Socket to talk to workers
    workers = context.socket(zmq.DEALER)
    workers.bind(url_worker)

    # Launch pool of worker processes
    for i in range(5):
        process = multiprocessing.Process(target=worker_routine, args=(url_worker,))
        process.start()

    zmq.device(zmq.QUEUE, clients, workers)

    # We never get here but clean up anyhow
    clients.close()
    workers.close()
    context.term()

if __name__ == "__main__":
    main()
The limitations of each transport are detailed in the API.
inproc is for intra-process communication (i.e. threads). You should try ipc, which supports inter-process communication, or even just tcp.
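Concretely, the main change in the code above is the worker endpoint (the ipc path name below is arbitrary):

# in main(), use a transport that works across processes instead of inproc:
url_worker = "ipc://workers.ipc"        # arbitrary file-based endpoint
# or, e.g.: url_worker = "tcp://127.0.0.1:5556"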
According to the celery tutorial regarding real-time monitoring of celery workers, one can also programmatically capture the events produced by the workers and take action accordingly.
My question is how can I integrate a monitor as the one in this example, in a Celery-Django application?
EDIT:
The code example in the tutorial looks like:
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        task_id = event['uuid']
        print('TASK FAILED: %s[%s] %s' % (
            event['name'], task_id, state[task_id].info(), ))

    # announce_dead_workers is defined similarly in the tutorial (omitted here)

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
            'worker-heartbeat': announce_dead_workers,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    celery = Celery(broker='amqp://guest@localhost//')
    my_monitor(celery)
So I want to capture the task_failed event sent by the worker, get its task_id as the tutorial shows, fetch the result for this task from the result backend configured for my application, and process it further. My problem is that it is not obvious to me how to get hold of the application, since in a django-celery project the instantiation of the Celery library is not transparent to me.
I am also open to any other idea as to how to process the results when a worker has finished executing a task.
OK, I found a way of doing this. I am not sure it is the solution, but it works for me. The monitor function basically connects directly to the broker and listens for different types of events. My code looks like this:
import sys

from celery.events import EventReceiver
from kombu import Connection as BrokerConnection

def my_monitor():
    connection = BrokerConnection('amqp://guest:guest@localhost:5672//')

    def on_event(event):
        print "EVENT HAPPENED: ", event

    def on_task_failed(event):
        exception = event['exception']
        print "TASK FAILED!", event, " EXCEPTION: ", exception

    while True:
        try:
            with connection as conn:
                recv = EventReceiver(conn,
                                     handlers={'task-failed': on_task_failed,
                                               'task-succeeded': on_event,
                                               'task-sent': on_event,
                                               'task-received': on_event,
                                               'task-revoked': on_event,
                                               'task-started': on_event,
                                               # OR: '*': on_event
                                               })
                recv.capture(limit=None, timeout=None)
        except (KeyboardInterrupt, SystemExit):
            print "EXCEPTION KEYBOARD INTERRUPT"
            sys.exit()
That's all. I run this in a different process than the normal application, i.e. I spawn a child process of my celery application that only runs this function.
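A minimal sketch of how that child process might be spawned (an illustration, not the exact code used):

from multiprocessing import Process

monitor_process = Process(target=my_monitor)
monitor_process.daemon = True  # stop the monitor when the parent exits
monitor_process.start()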
HTH
Beware of a couple of gotchas:
You need to set the CELERY_SEND_EVENTS flag to True in your celery config (see the sketch below).
You can also run the event monitor in a new thread from your worker.
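A sketch of the events-related settings, assuming Celery 3.1-style setting names:

# celeryconfig.py / settings.py -- sketch only
CELERY_SEND_EVENTS = True            # make workers emit task events
CELERY_SEND_TASK_SENT_EVENT = True   # optional: also emit task-sent events

# alternatively, events can be enabled from the command line:
#   celery -A proj worker -E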
Here is my implementation:
import threading
import time

class MonitorThread(object):
    def __init__(self, celery_app, interval=1):
        self.celery_app = celery_app
        self.interval = interval
        self.state = self.celery_app.events.State()
        self.thread = threading.Thread(target=self.run, args=())
        self.thread.daemon = True
        self.thread.start()

    def catchall(self, event):
        if event['type'] != 'worker-heartbeat':
            self.state.event(event)
        # logic here

    def run(self):
        while True:
            try:
                with self.celery_app.connection() as connection:
                    recv = self.celery_app.events.Receiver(connection, handlers={
                        '*': self.catchall
                    })
                    recv.capture(limit=None, timeout=None, wakeup=True)
            except (KeyboardInterrupt, SystemExit):
                raise
            except Exception:
                # unable to capture
                pass
            time.sleep(self.interval)

if __name__ == '__main__':
    app = get_celery_app()  # returns app
    MonitorThread(app)
    app.start()