Python multiprocessing and handling exceptions in workers

I use the Python multiprocessing library for an algorithm in which I have many workers processing certain data and returning results to the parent process. I use one multiprocessing.Queue for passing jobs to the workers, and a second one to collect the results.
It all works pretty well, until a worker fails to process some chunk of data. In the simplified example below each worker has two phases:
initialization - can fail; in this case the worker should be destroyed
data processing - processing a chunk of data can fail; in this case the worker should skip this chunk and continue with the next one.
When either of these phases fails I get a deadlock after script completion. This code simulates my problem:
import multiprocessing as mp
import random

workers_count = 5
# Probability of failure, change to simulate failures
fail_init_p = 0.2
fail_job_p = 0.3

#========= Worker =========
def do_work(job_state, arg):
    if random.random() < fail_job_p:
        raise Exception("Job failed")
    return "job %d processed %d" % (job_state, arg)

def init(args):
    if random.random() < fail_init_p:
        raise Exception("Worker init failed")
    return args

def worker_function(args, jobs_queue, result_queue):
    # INIT
    # What to do when init() fails?
    try:
        state = init(args)
    except:
        print "!Worker %d init fail" % args
        return

    # DO WORK
    # Process data in the jobs queue
    for job in iter(jobs_queue.get, None):
        try:
            # Can throw an exception!
            result = do_work(state, job)
            result_queue.put(result)
        except:
            print "!Job %d failed, skip..." % job
        finally:
            jobs_queue.task_done()

    # Telling that we are done with processing stop token
    jobs_queue.task_done()

#========= Parent =========
jobs = mp.JoinableQueue()
results = mp.Queue()
for i in range(workers_count):
    mp.Process(target=worker_function, args=(i, jobs, results)).start()

# Populate jobs queue
results_to_expect = 0
for j in range(30):
    jobs.put(j)
    results_to_expect += 1

# Collecting the results
# What if some workers failed to process the job and we have
# less results than expected
for r in range(results_to_expect):
    result = results.get()
    print result

#Signal all workers to finish
for i in range(workers_count):
    jobs.put(None)

#Wait for them to finish
jobs.join()
I have two questions about this code:
When init() fails, how do I detect that the worker is invalid and avoid waiting for it to finish?
When do_work() fails, how do I notify the parent process that fewer results should be expected in the results queue?
Thank you for your help!

I changed your code slightly to make it work (see explanation below).
import multiprocessing as mp
import random

workers_count = 5
# Probability of failure, change to simulate failures
fail_init_p = 0.5
fail_job_p = 0.4

#========= Worker =========
def do_work(job_state, arg):
    if random.random() < fail_job_p:
        raise Exception("Job failed")
    return "job %d processed %d" % (job_state, arg)

def init(args):
    if random.random() < fail_init_p:
        raise Exception("Worker init failed")
    return args

def worker_function(args, jobs_queue, result_queue):
    # INIT
    # What to do when init() fails?
    try:
        state = init(args)
    except:
        print "!Worker %d init fail" % args
        result_queue.put('init failed')
        return

    # DO WORK
    # Process data in the jobs queue
    for job in iter(jobs_queue.get, None):
        try:
            # Can throw an exception!
            result = do_work(state, job)
            result_queue.put(result)
        except:
            print "!Job %d failed, skip..." % job
            result_queue.put('job failed')

#========= Parent =========
jobs = mp.Queue()
results = mp.Queue()
for i in range(workers_count):
    mp.Process(target=worker_function, args=(i, jobs, results)).start()

# Populate jobs queue
results_to_expect = 0
for j in range(30):
    jobs.put(j)
    results_to_expect += 1

init_failures = 0
job_failures = 0
successes = 0
while job_failures + successes < 30 and init_failures < workers_count:
    result = results.get()
    init_failures += int(result == 'init failed')
    job_failures += int(result == 'job failed')
    successes += int(result != 'init failed' and result != 'job failed')
    #print init_failures, job_failures, successes

for ii in range(workers_count):
    jobs.put(None)
My changes:
Changed jobs to be just a normal Queue (instead of JoinableQueue).
Workers now communicate back the special result strings "init failed" and "job failed" (see the sketch after this list for a tuple-based variant).
The master process counts those special results and keeps collecting until either all 30 jobs are accounted for (successes plus job failures) or every worker has reported an init failure.
In the end, put "stop" requests (i.e. None jobs) for however many workers you have, regardless. Note that not all of these may be pulled from the queue (in case a worker failed to initialize).
By the way, your original code was nice and easy to work with. The random probabilities bit is pretty cool.
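If real result strings could ever collide with "init failed" or "job failed", a slightly more defensive variant is to wrap every message in a (status, payload) tuple. This is just my own sketch (Python 3 syntax, reusing init() and do_work() from above), not part of the answer:

def worker_function(args, jobs_queue, result_queue):
    # INIT
    try:
        state = init(args)
    except Exception as exc:
        result_queue.put(('init_failed', args, str(exc)))
        return
    # DO WORK
    for job in iter(jobs_queue.get, None):
        try:
            result_queue.put(('ok', job, do_work(state, job)))
        except Exception as exc:
            result_queue.put(('job_failed', job, str(exc)))

The parent then unpacks status, job, payload = results.get() and branches on status instead of comparing whole result strings.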

Related

100 percent load with multiprocessing queues

This only replicates my problem: the main Python script hits 100% load when it tries to run a control loop over a shared queue.
import multiprocessing
import random
def func1(num, q):
while True:
num = random.randint(1, 101)
if q.empty():
q.put(num)
def func2(num, q):
while True:
num = q.get()
num = num ** 2
if q.empty():
q.put(num)
num = 2
q = multiprocessing.Queue()
p1 = multiprocessing.Process(target=func1, args=(num, q))
p2 = multiprocessing.Process(target=func2, args=(num, q))
p1.daemon = True
p2.daemon = True
p1.start()
p2.start()
running = True
while running:
if not q.empty():
num = q.get(True, 0.1)
print(num)
Would there be a better method to control multiple worker processes from a script - better in the sense of no load!?
I'm not sure I understand your program:
What's with the num parameter of func1() and func2()? It never gets used.
func2 will discard its result if func1 happens to have posted another number after func2 got the last number out of the queue.
Why do you daemonize the workers? Are you quite sure this is what you want?
The if not q.empty(): q.get() construct in the main code will sooner or later raise a queue.Empty exception because it's a race between it and the q.get() in func2.
The uncaught queue.Empty exception will terminate the main process, leaving the two workers orphaned - and running.
General advice:
Use different queues for issuing jobs (request queue) and collecting results (response queue). Include the request in the response if necessary.
Think about how to terminate the workers. Consider a "poison pill", i.e. a special value in the request queue that causes workers to exit/terminate.
Be really really sure you understand the race conditions in your code, like the one I mentioned above (empty vs. get).
Here's some sample code I hacked up:
import multiprocessing
import time
import random
import os
def request_generator(requests):
while True:
requests.put(random.randint(1, 101))
time.sleep(0.01)
def worker(requests, responses):
worker_id = os.getpid()
while True:
request = requests.get()
response = request ** 2
responses.put((request, response, worker_id))
def main():
requests = multiprocessing.Queue()
responses = multiprocessing.Queue()
gen = multiprocessing.Process(target=request_generator, args=(requests,))
w1 = multiprocessing.Process(target=worker, args=(requests, responses))
w2 = multiprocessing.Process(target=worker, args=(requests, responses))
gen.start()
w1.start()
w2.start()
while True:
req, resp, worker_id = responses.get()
print("worker {}: {} => {}".format(worker_id, req, resp))
if __name__ == "__main__":
main()

Queue doesn't process all elements when there are many threads

I have noticed that when I have many threads pulling elements from a queue, fewer elements are processed than the number I put into the queue. This is sporadic, but it seems to happen somewhere around half the time when I run the following code.
#!/bin/env python
from threading import Thread
import httplib, sys
from Queue import Queue
import time
import random
concurrent = 500
num_jobs = 500
results = {}
def doWork():
while True:
result = None
try:
result = curl(q.get())
except Exception as e:
print "Error when trying to get from queue: {0}".format(str(e))
if results.has_key(result):
results[result] += 1
else:
results[result] = 1
try:
q.task_done()
except:
print "Called task_done when all tasks were done"
def curl(ourl):
result = 'all good'
try:
time.sleep(random.random() * 2)
except Exception as e:
result = "error: %s" % str(e)
except:
result = str(sys.exc_info()[0])
finally:
return result or "None"
print "\nRunning {0} jobs on {1} threads...".format(num_jobs, concurrent)
q = Queue()
for i in range(concurrent):
t = Thread(target=doWork)
t.daemon = True
t.start()
for x in range(num_jobs):
q.put("something")
try:
q.join()
except KeyboardInterrupt:
sys.exit(1)
total_responses = 0
for result in results:
num_responses = results[result]
print "{0}: {1} time(s)".format(result, num_responses)
total_responses += num_responses
print "Number of elements processed: {0}".format(total_responses)
Tim Peters hit the nail on the head in the comments. The issue is that the tracking of results is threaded and isn't protected by any sort of mutex. That allows something like this to happen:
thread A gets result: "all good"
thread A checks results[result]
thread A sees no such key
thread A suspends # <-- before counting its result
thread B gets result: "all good"
thread B checks results[result]
thread B sees no such key
thread B sets results['all good'] = 1
thread C ...
thread C sets results['all good'] = 2
thread D ...
thread A resumes # <-- and remembers it needs to count its result still
thread A sets results['all good'] = 1 # resetting previous work!
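One minimal fix for exactly that race (my own sketch, not from this answer) is to guard the shared dict with a lock, so the read-modify-write happens atomically:

from threading import Lock

results = {}
results_lock = Lock()

def count_result(result):
    # Only one thread at a time may read and update the counter
    with results_lock:
        results[result] = results.get(result, 0) + 1

That said, the queue-based approach below avoids sharing the dict between threads in the first place.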
A more typical workflow might have a results queue that the main thread is listening on.
workq = queue.Queue()
resultsq = queue.Queue()

make_work(into=workq)
do_work(src=workq, respond_on=resultsq)
# do_work would do respond_on.put_nowait(result) instead of
# return result

results = {}
while True:
    try:
        result = resultsq.get(timeout=1)
    except queue.Empty:
        break  # maybe? You'd probably want to retry a few times
    results[result] = results.get(result, 0) + 1

multiprocessing - reading big input data - program hangs

I want to run parallel computation on some input data which is loaded from a file. (The file can be really big, so I use a generator for this.)
On a certain number of items, my code runs OK but above this threshold the program hangs (some of the worker processes do not end).
Any suggestions? (I am running this with python2.7, 8 CPUs; 5,000 lines still OK, 7,500 does not work.)
Firstly, you need an input file. Generate it in bash:
for i in {0..10000}; do echo -e "$i"'\r' >> counter.txt; done
Then, run this:
python2.7 main.py 100 counter.txt > run_log.txt
main.py:
#!/usr/bin/python2.7
import os, sys, signal, time
import Queue
import multiprocessing as mp
def eat_queue(job_queue, result_queue):
"""Eats input queue, feeds output queue
"""
proc_name = mp.current_process().name
while True:
try:
job = job_queue.get(block=False)
if job == None:
print(proc_name + " DONE")
return
result_queue.put(execute(job))
except Queue.Empty:
pass
def execute(x):
"""Does the computation on the input data
"""
return x*x
def save_result(result):
"""Saves results in a list
"""
result_list.append(result)
def load(ifilename):
"""Generator reading the input file and
yielding it row by row
"""
ifile = open(ifilename, "r")
for line in ifile:
line = line.strip()
num = int(line)
yield (num)
ifile.close()
print("file closed".upper())
def put_tasks(job_queue, ifilename):
"""Feeds the job queue
"""
for item in load(ifilename):
job_queue.put(item)
for _ in range(get_max_workers()):
job_queue.put(None)
def get_max_workers():
"""Returns optimal number of processes to run
"""
max_workers = mp.cpu_count() - 2
if max_workers < 1:
return 1
return max_workers
def run(workers_num, ifilename):
job_queue = mp.Queue()
result_queue = mp.Queue()
# decide how many processes are to be created
max_workers = get_max_workers()
print "processes available: %d" % max_workers
if workers_num < 1 or workers_num > max_workers:
workers_num = max_workers
workers_list = []
# a process for feeding job queue with the input file
task_gen = mp.Process(target=put_tasks, name="task_gen",
args=(job_queue, ifilename))
workers_list.append(task_gen)
for i in range(workers_num):
tmp = mp.Process(target=eat_queue, name="w%d" % (i+1),
args=(job_queue, result_queue))
workers_list.append(tmp)
for worker in workers_list:
worker.start()
for worker in workers_list:
worker.join()
print "worker %s finished!" % worker.name
if __name__ == '__main__':
result_list = []
args = sys.argv
workers_num = int(args[1])
ifilename = args[2]
run(workers_num, ifilename)
This is because nothing in your code takes anything off result_queue. The behavior then depends on internal queue buffering details: if "not a lot" of data is waiting, everything appears fine, but if "a lot" of data is waiting, everything freezes. Not much more can be said, because it involves layers of internal magic ;-) But the docs do warn about it:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
One easy way to repair that: First add
result_queue.put(None)
before eat_queue() returns. Then add:
count = 0
while count < workers_num:
    if result_queue.get() is None:
        count += 1
before the main program .join()s the workers. That drains the result queue, and everything shuts down cleanly then.
BTW, this code is pretty bizarre:
while True:
    try:
        job = job_queue.get(block=False)
        if job == None:
            print(proc_name + " DONE")
            return
        result_queue.put(execute(job))
    except Queue.Empty:
        pass
Why are you doing non-blocking get()? This turns into a CPU-hog "busy loop" so long as the queue is empty. The primary point of .get() is to supply an efficient way to wait for work to show up. So:
while True:
    job = job_queue.get()
    if job is None:
        print(proc_name + " DONE")
        break
    else:
        result_queue.put(execute(job))
result_queue.put(None)
does the same thing, but far more efficiently.
Queue size caution
You didn't ask about this, but let's cover it before it bites you ;-) By default, there is no bound on a Queue's size. If, e.g., you add a billion items to the Queue, it will demand enough RAM to hold a billion items. So if your producer(s) can generate work items faster than your consumer(s) can process them, memory use can get out of hand quickly.
Fortunately, that's easy to repair: specify a maximum queue size. For example,
job_queue = mp.Queue(maxsize=10*workers_num)
^^^^^^^^^^^^^^^^^^^^^^^
Then job_queue.put(some_work_item) will block until consumers reduce the size of the queue to less than the maximum. This way you can process enormous problems with a queue that requires trivial RAM.
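For illustration only, here is roughly how a bounded queue behaves; the maxsize and timeout values are arbitrary, and in Python 2 the Full exception lives in the Queue module rather than queue:

import multiprocessing as mp
import queue

q = mp.Queue(maxsize=2)
q.put(1)
q.put(2)
try:
    # The queue is full, so put() blocks; with a timeout it raises queue.Full instead
    q.put(3, timeout=0.1)
except queue.Full:
    print("queue full - a real producer would simply block until a consumer catches up")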

How to use multiprocessing queue in Python?

I'm having much trouble trying to understand just how the multiprocessing queue works in Python and how to implement it. Let's say I have two Python modules that access data from a shared file; let's call these two modules a writer and a reader. My plan is to have both the reader and the writer put requests into two separate multiprocessing queues, and then have a third process pop these requests in a loop and execute them.
My main problem is that I really don't know how to implement multiprocessing.Queue correctly. You cannot really instantiate the object in each process, since then they would be separate queues. How do you make sure that all processes work with a shared queue (or, in this case, queues)?
This is a simple example of a reader and writer sharing a single queue... The writer sends a bunch of integers to the reader; when the writer runs out of numbers, it sends 'DONE', which lets the reader know to break out of the read loop.
You can spawn as many reader processes as you like...
from multiprocessing import Process, Queue
import time
import sys
def reader_proc(queue):
"""Read from the queue; this spawns as a separate Process"""
while True:
msg = queue.get() # Read from the queue and do nothing
if msg == "DONE":
break
def writer(count, num_of_reader_procs, queue):
"""Write integers into the queue. A reader_proc() will read them from the queue"""
for ii in range(0, count):
queue.put(ii) # Put 'count' numbers into queue
### Tell all readers to stop...
for ii in range(0, num_of_reader_procs):
queue.put("DONE")
def start_reader_procs(qq, num_of_reader_procs):
"""Start the reader processes and return all in a list to the caller"""
all_reader_procs = list()
for ii in range(0, num_of_reader_procs):
### reader_p() reads from qq as a separate process...
### you can spawn as many reader_p() as you like
### however, there is usually a point of diminishing returns
reader_p = Process(target=reader_proc, args=((qq),))
reader_p.daemon = True
reader_p.start() # Launch reader_p() as another proc
all_reader_procs.append(reader_p)
return all_reader_procs
if __name__ == "__main__":
num_of_reader_procs = 2
qq = Queue() # writer() writes to qq from _this_ process
for count in [10**4, 10**5, 10**6]:
assert 0 < num_of_reader_procs < 4
all_reader_procs = start_reader_procs(qq, num_of_reader_procs)
writer(count, len(all_reader_procs), qq) # Queue stuff to all reader_p()
print("All reader processes are pulling numbers from the queue...")
_start = time.time()
for idx, a_reader_proc in enumerate(all_reader_procs):
print(" Waiting for reader_p.join() index %s" % idx)
a_reader_proc.join() # Wait for a_reader_proc() to finish
print(" reader_p() idx:%s is done" % idx)
print(
"Sending {0} integers through Queue() took {1} seconds".format(
count, (time.time() - _start)
)
)
print("")
Here's a dead simple usage of multiprocessing.Queue and multiprocessing.Process that allows callers to send an "event" plus arguments to a separate process that dispatches the event to a "do_" method on the process. (Python 3.4+)
import multiprocessing as mp
import collections
Msg = collections.namedtuple('Msg', ['event', 'args'])
class BaseProcess(mp.Process):
"""A process backed by an internal queue for simple one-way message passing.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.queue = mp.Queue()
def send(self, event, *args):
"""Puts the event and args as a `Msg` on the queue
"""
msg = Msg(event, args)
self.queue.put(msg)
def dispatch(self, msg):
event, args = msg
handler = getattr(self, "do_%s" % event, None)
if not handler:
raise NotImplementedError("Process has no handler for [%s]" % event)
handler(*args)
def run(self):
while True:
msg = self.queue.get()
self.dispatch(msg)
Usage:
class MyProcess(BaseProcess):
def do_helloworld(self, arg1, arg2):
print(arg1, arg2)
if __name__ == "__main__":
process = MyProcess()
process.start()
process.send('helloworld', 'hello', 'world')
The send happens in the parent process, the do_* happens in the child process.
I left out any exception handling that would obviously interrupt the run loop and exit the child process. You can also customize it by overriding run to control blocking or whatever else.
This is really only useful in situations where you have a single worker process, but I think it's a relevant answer to this question to demonstrate a common scenario with a little more object-orientation.
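For example, a run() override that honors a shutdown message and survives handler errors might look roughly like this; the 'stop' event name is my own invention, not part of the class above:

class RobustProcess(BaseProcess):
    def run(self):
        while True:
            msg = self.queue.get()
            if msg.event == 'stop':
                break  # hypothetical shutdown message sent via process.send('stop')
            try:
                self.dispatch(msg)
            except Exception as exc:
                # Log and keep serving instead of letting the child process die
                print("handler for %r failed: %s" % (msg.event, exc))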
I had a look at multiple answers across Stack Overflow and the web while trying to set up a way of doing multiprocessing using queues for passing around large pandas dataframes. It seemed to me that every answer was reiterating the same kind of solution without any consideration of the multitude of edge cases one will definitely come across when setting up calculations like these. The problem is that there are many things at play at the same time: the number of tasks, the number of workers, the duration of each task and possible exceptions during task execution. All of these make synchronization tricky, and most answers do not address how you can go about it. So this is my take after fiddling around for a few hours; hopefully it is generic enough for most people to find it useful.
Some thoughts before any coding examples. Since queue.empty(), queue.qsize() and similar checks (including catching queue.Empty from get_nowait()) are unreliable for flow control, any code like the following
while True:
    try:
        task = pending_queue.get_nowait()
    except queue.Empty:
        break
is bogus. This will kill the worker even if milliseconds later another task turns up in the queue. The worker will not recover and after a while ALL the workers will disappear as they randomly find the queue momentarily empty. The end result will be that the main multiprocessing function (the one with the join() on the processes) will return without all the tasks having completed. Nice. Good luck debugging through that if you have thousands of tasks and a few are missing.
The other issue is the use of sentinel values. Many people have suggested adding a sentinel value to the queue to flag the end of the queue. But to flag it to whom exactly? If there are N workers, assuming N is the number of cores available give or take, then a single sentinel value will only flag the end of the queue to one worker. All the other workers will sit waiting for more work when there is none left. Typical examples I've seen are
while True:
    task = pending_queue.get()
    if task == SOME_SENTINEL_VALUE:
        break
One worker will get the sentinel value while the rest will wait indefinitely. No post I came across mentioned that you need to submit the sentinel value to the queue AT LEAST as many times as you have workers so that ALL of them get it.
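In other words, the producer side has to seed one sentinel per worker, along these lines (a bare-bones sketch; the full version appears in par_proc() below):

import multiprocessing as mp

NUM_WORKERS = 4      # however many worker processes you start
SENTINEL = None

pending_queue = mp.Queue()
for job in range(100):
    pending_queue.put(job)
# One sentinel per worker, so every single worker eventually sees one and exits
for _ in range(NUM_WORKERS):
    pending_queue.put(SENTINEL)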
The other issue is the handling of exceptions during task execution. Again, these should be caught and managed. Moreover, if you have a completed_tasks queue, you should count in a deterministic way how many items you expect in that queue before you decide that the job is done. Again, relying on queue sizes is bound to fail and return unexpected results.
In the example below, the par_proc() function will receive a list of tasks including the functions with which these tasks should be executed alongside any named arguments and values.
import multiprocessing as mp
import dill as pickle
import queue
import time
import psutil
SENTINEL = None
def do_work(tasks_pending, tasks_completed):
# Get the current worker's name
worker_name = mp.current_process().name
while True:
try:
task = tasks_pending.get_nowait()
except queue.Empty:
print(worker_name + ' found an empty queue. Sleeping for a while before checking again...')
time.sleep(0.01)
else:
try:
if task == SENTINEL:
print(worker_name + ' no more work left to be done. Exiting...')
break
print(worker_name + ' received some work... ')
time_start = time.perf_counter()
work_func = pickle.loads(task['func'])
result = work_func(**task['task'])
tasks_completed.put({work_func.__name__: result})
time_end = time.perf_counter() - time_start
print(worker_name + ' done in {} seconds'.format(round(time_end, 5)))
except Exception as e:
print(worker_name + ' task failed. ' + str(e))
tasks_completed.put({work_func.__name__: None})
def par_proc(job_list, num_cpus=None):
# Get the number of cores
if not num_cpus:
num_cpus = psutil.cpu_count(logical=False)
print('* Parallel processing')
print('* Running on {} cores'.format(num_cpus))
# Set-up the queues for sending and receiving data to/from the workers
tasks_pending = mp.Queue()
tasks_completed = mp.Queue()
# Gather processes and results here
processes = []
results = []
# Count tasks
num_tasks = 0
# Add the tasks to the queue
for job in job_list:
for task in job['tasks']:
expanded_job = {}
num_tasks = num_tasks + 1
expanded_job.update({'func': pickle.dumps(job['func'])})
expanded_job.update({'task': task})
tasks_pending.put(expanded_job)
# Use as many workers as there are cores (usually chokes the system so better use less)
num_workers = num_cpus
# We need as many sentinels as there are worker processes so that ALL processes exit when there is no more
# work left to be done.
for c in range(num_workers):
tasks_pending.put(SENTINEL)
print('* Number of tasks: {}'.format(num_tasks))
# Set-up and start the workers
for c in range(num_workers):
p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed))
p.name = 'worker' + str(c)
processes.append(p)
p.start()
# Gather the results
completed_tasks_counter = 0
while completed_tasks_counter < num_tasks:
results.append(tasks_completed.get())
completed_tasks_counter = completed_tasks_counter + 1
for p in processes:
p.join()
return results
And here is a test to run the above code against
def test_parallel_processing():
def heavy_duty1(arg1, arg2, arg3):
return arg1 + arg2 + arg3
def heavy_duty2(arg1, arg2, arg3):
return arg1 * arg2 * arg3
task_list = [
{'func': heavy_duty1, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
{'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
]
results = par_proc(task_list)
job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])
assert job1 == 15
assert job2 == 21
plus another one with some exceptions
def test_parallel_processing_exceptions():
def heavy_duty1_raises(arg1, arg2, arg3):
raise ValueError('Exception raised')
return arg1 + arg2 + arg3
def heavy_duty2(arg1, arg2, arg3):
return arg1 * arg2 * arg3
task_list = [
{'func': heavy_duty1_raises, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
{'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
]
results = par_proc(task_list)
job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])
assert not job1
assert job2 == 21
Hope that is helpful.
Note: "from queue import Queue" gives you a queue that only works between threads of a single process; it will not pass data between processes. For multiprocessing, it should be "from multiprocessing import Queue".
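A minimal illustration of the difference (my own example): a multiprocessing.Queue can be handed to a child process, whereas a queue.Queue only works between threads of a single process.

from multiprocessing import Process, Queue

def child(q):
    q.put("hello from the child process")

if __name__ == "__main__":
    q = Queue()                     # multiprocessing.Queue: shared with the child
    p = Process(target=child, args=(q,))
    p.start()
    print(q.get())                  # receives the message put by the child
    p.join()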
I made a simple, general example demonstrating how to pass a message over a Queue between two standalone programs. It doesn't directly answer the OP's question but should illustrate the concept clearly enough.
Server:
multiprocessing-queue-manager-server.py
import asyncio
import concurrent.futures
import multiprocessing
import multiprocessing.managers
import queue
import sys
import threading
from typing import Any, AnyStr, Dict, Union
class QueueManager(multiprocessing.managers.BaseManager):
def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
pass
def get_queue(ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
global q
if not ident in q:
q[ident] = multiprocessing.Queue()
return q[ident]
q: Dict[Union[AnyStr, int, type(None)], multiprocessing.Queue] = dict()
delattr(QueueManager, 'get_queue')
def init_queue_manager_server():
if not hasattr(QueueManager, 'get_queue'):
QueueManager.register('get_queue', get_queue)
def serve(no: int, term_ev: threading.Event):
manager: QueueManager
with QueueManager(authkey=QueueManager.__name__.encode()) as manager:
print(f"Server address {no}: {manager.address}")
while not term_ev.is_set():
try:
item: Any = manager.get_queue().get(timeout=0.1)
print(f"Client {no}: {item} from {manager.address}")
except queue.Empty:
continue
async def main(n: int):
init_queue_manager_server()
term_ev: threading.Event = threading.Event()
executor: concurrent.futures.ThreadPoolExecutor = concurrent.futures.ThreadPoolExecutor()
i: int
for i in range(n):
asyncio.ensure_future(asyncio.get_running_loop().run_in_executor(executor, serve, i, term_ev))
# Gracefully shut down
try:
await asyncio.get_running_loop().create_future()
except asyncio.CancelledError:
term_ev.set()
executor.shutdown()
raise
if __name__ == '__main__':
asyncio.run(main(int(sys.argv[1])))
Client:
multiprocessing-queue-manager-client.py
import multiprocessing
import multiprocessing.managers
import os
import sys
from typing import AnyStr, Union
class QueueManager(multiprocessing.managers.BaseManager):
def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
pass
delattr(QueueManager, 'get_queue')
def init_queue_manager_client():
if not hasattr(QueueManager, 'get_queue'):
QueueManager.register('get_queue')
def main():
init_queue_manager_client()
manager: QueueManager = QueueManager(sys.argv[1], authkey=QueueManager.__name__.encode())
manager.connect()
message = f"A message from {os.getpid()}"
print(f"Message to send: {message}")
manager.get_queue().put(message)
if __name__ == '__main__':
main()
Usage
Server:
$ python3 multiprocessing-queue-manager-server.py N
N is an integer indicating how many servers should be created. Copy one of the <server-address-N> values output by the server and make it the first argument of each multiprocessing-queue-manager-client.py.
Client:
python3 multiprocessing-queue-manager-client.py <server-address-1>
Result
Server:
Client 1: <item> from <server-address-1>
Gist: https://gist.github.com/89062d639e40110c61c2f88018a8b0e5
UPD: Created a package here.
Server:
import ipcq
with ipcq.QueueManagerServer(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO) as server:
server.get_queue().get()
Client:
import ipcq
client = ipcq.QueueManagerClient(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO)
client.get_queue().put('a message')
We implemented two versions of this: a simple multi-thread pool that can execute many types of callables, making our lives much easier, and a second version that uses processes, which is less flexible in terms of callables and requires an extra call to dill.
Setting frozen_pool to true will freeze execution until finish_pool_queue is called in either class.
Thread Version:
'''
Created on Nov 4, 2019
#author: Kevin
'''
from threading import Lock, Thread
from Queue import Queue
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os
class ThreadPool(object):
def __init__(self, queue_threads, *args, **kwargs):
self.frozen_pool = kwargs.get('frozen_pool', False)
self.print_queue = kwargs.get('print_queue', True)
self.pool_results = []
self.lock = Lock()
self.queue_threads = queue_threads
self.queue = Queue()
self.threads = []
for i in range(self.queue_threads):
t = Thread(target=self.make_pool_call)
t.daemon = True
t.start()
self.threads.append(t)
def make_pool_call(self):
while True:
if self.frozen_pool:
#print '--> Queue is frozen'
sleep(1)
continue
item = self.queue.get()
if item is None:
break
call = item.get('call', None)
args = item.get('args', [])
kwargs = item.get('kwargs', {})
keep_results = item.get('keep_results', False)
try:
result = call(*args, **kwargs)
if keep_results:
self.lock.acquire()
self.pool_results.append((item, result))
self.lock.release()
except Exception as e:
self.lock.acquire()
print e
traceback.print_exc()
self.lock.release()
os.kill(os.getpid(), signal.SIGUSR1)
self.queue.task_done()
def finish_pool_queue(self):
self.frozen_pool = False
while self.queue.unfinished_tasks > 0:
if self.print_queue:
print_info('--> Thread pool... %s' % self.queue.unfinished_tasks)
sleep(5)
self.queue.join()
for i in range(self.queue_threads):
self.queue.put(None)
for t in self.threads:
t.join()
del self.threads[:]
def get_pool_results(self):
return self.pool_results
def clear_pool_results(self):
del self.pool_results[:]
Process Version:
'''
Created on Nov 4, 2019
#author: Kevin
'''
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os
from multiprocessing import Queue, Process, Value, Array, JoinableQueue, Lock,\
RawArray, Manager
from dill import dill
import ctypes
from helium.misc.utils import ignore_exception
from mem_top import mem_top
import gc
class ProcessPool(object):
def __init__(self, queue_processes, *args, **kwargs):
self.frozen_pool = Value(ctypes.c_bool, kwargs.get('frozen_pool', False))
self.print_queue = kwargs.get('print_queue', True)
self.manager = Manager()
self.pool_results = self.manager.list()
self.queue_processes = queue_processes
self.queue = JoinableQueue()
self.processes = []
for i in range(self.queue_processes):
p = Process(target=self.make_pool_call)
p.start()
self.processes.append(p)
print 'Processes', self.queue_processes
def make_pool_call(self):
while True:
if self.frozen_pool.value:
sleep(1)
continue
item_pickled = self.queue.get()
if item_pickled is None:
#print '--> Ending'
self.queue.task_done()
break
item = dill.loads(item_pickled)
call = item.get('call', None)
args = item.get('args', [])
kwargs = item.get('kwargs', {})
keep_results = item.get('keep_results', False)
try:
result = call(*args, **kwargs)
if keep_results:
self.pool_results.append(dill.dumps((item, result)))
else:
del call, args, kwargs, keep_results, item, result
except Exception as e:
print e
traceback.print_exc()
os.kill(os.getpid(), signal.SIGUSR1)
self.queue.task_done()
def finish_pool_queue(self, callable=None):
self.frozen_pool.value = False
while self.queue._unfinished_tasks.get_value() > 0:
if self.print_queue:
print_info('--> Process pool... %s' % (self.queue._unfinished_tasks.get_value()))
if callable:
callable()
sleep(5)
for i in range(self.queue_processes):
self.queue.put(None)
self.queue.join()
self.queue.close()
for p in self.processes:
with ignore_exception: p.join(10)
with ignore_exception: p.terminate()
with ignore_exception: del self.processes[:]
def get_pool_results(self):
return self.pool_results
def clear_pool_results(self):
del self.pool_results[:]
def test(eg):
print 'EG', eg
Call with either:
tp = ThreadPool(queue_threads=2)
tp.queue.put({'call': test, 'args': [random.randint(0, 100)]})
tp.finish_pool_queue()
or
pp = ProcessPool(queue_processes=2)
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.finish_pool_queue()
A multi-producer, multi-consumer example, verified. It should be easy to modify it to cover other cases: single/multiple producers, single/multiple consumers.
from multiprocessing import Process, JoinableQueue
import time
import os
q = JoinableQueue()
def producer():
for item in range(30):
time.sleep(2)
q.put(item)
pid = os.getpid()
print(f'producer {pid} done')
def worker():
while True:
item = q.get()
pid = os.getpid()
print(f'pid {pid} Working on {item}')
print(f'pid {pid} Finished {item}')
q.task_done()
for i in range(5):
p = Process(target=worker, daemon=True).start()
# send thirty task requests to the worker
producers = []
for i in range(2):
p = Process(target=producer)
producers.append(p)
p.start()
# make sure producers done
for p in producers:
p.join()
# block until all workers are done
q.join()
print('All work completed')
Explanation:
Two producers and five consumers in this example.
JoinableQueue is used to make sure all elements stored in the queue will be processed. task_done() is how a worker notifies that an element is done; q.join() waits until all elements have been marked as done.
Because of #2, there is no need to join() every worker.
But it is important to join() every producer, so that all elements get stored in the queue before q.join() is reached; otherwise the program exits immediately.

dynamically calculate number of processes to be spawned

I have a list of about 15 years in the year_queue, and I need to spawn one process for each year. But depending on which server I am running the code on, the number of processors varies. How do I dynamically vary the variable num_processes depending on the number of processors in the server?
If I set num_processes > number of processors, would it automatically spawn accordingly? When I test this, it creates 15 processes and splits the CPU power between them. I am looking for a way to first create 'n' processes, where n = the number of processors in the server, and then, as each of those processes finishes, the next one is spawned.
for i in range(num_processes):
worker = ForEachPerson(year_queue, result_queue, i, dict_of_files)
print "worker spawned for " + str(i)
worker.start()
results = []
while len(results) < len(years):
result = result_queue.get()
results.append(result)
Has anyone had the same issue?
while year_queue.empty() != True:
for i in range(num_processes):
worker = ForEachPerson(year_queue, result_queue, i, dict_of_files)
print "worker spawned for " + str(i)
worker.start()
# collect results off the queue
print "results being collected"
results = []
while len(results) < len(num_processes):
result = result_queue.get()
results.append(result)
Use a multiprocessing Pool. The class does all the tedious work of selecting the right number of processes and running them for you. It also doesn't spawn a new process for each task, but reuses processes once they're done.
def process_year(year):
    ...
    return result

pool = multiprocessing.Pool()
# year_queue here must be a plain iterable (e.g. a list of years),
# not a multiprocessing.Queue - Pool.map cannot consume a Queue
results = pool.map(process_year, year_queue)
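If you want to be explicit about sizing the pool to the machine, a sketch like the following works; process_year here is just a stand-in for the real per-year computation:

import multiprocessing

def process_year(year):
    return year * 2                  # placeholder for the real work

if __name__ == "__main__":
    years = range(1996, 2011)        # the ~15 years from the question
    # Pool defaults to cpu_count() workers; stating it explicitly for clarity
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(process_year, years)
    print(results)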
from multiprocessing import Process, Queue, cpu_count
from Queue import Empty
class ForEachPerson(Process):
def __init__(self, year_queue, result_queue, i, dict_of_files):
self.year_queue=year_queue
self.result_queue=result_queue
self.i=i
self.dict_of_files=dict_of_files
super(ForEachPerson, self).__init__()
def run(self):
while True:
try:
year=self.year_queue.get()
''' Do something '''
self.result_queue.put(year)
except Empty:
self.result_queue.close()
return
if __name__ == '__main__':
year_queue=Queue()
result_queue=Queue()
dict_of_files={}
start_year=1996
num_years=15
for year in range(start_year, start_year + num_years):
year_queue.put(year)
workers=[]
for i in range(cpu_count()):
worker = ForEachPerson(year_queue, result_queue, i, dict_of_files)
print 'worker spawned for', str(i)
worker.start()
workers.append(worker)
results=[]
while len(results) < num_years:
try:
year=result_queue.get()
results.append(year)
print 'Result:', year
except Empty:
pass
for worker in workers:
worker.terminate()
