Printing contents of a Queue in Python - python

If I am using the python module queue.Queue, I want to be able to print out the contents using a method that does not pop the original queue or create a new queue object.
I have tried looking into doing a get and then putting the contents back but this is too high cost.
# Ideally it would look like the following
from queue import Queue
q = Queue()
q.print()
q.put(1)
q.print()
>> [] # Or something like this
>> [1] # Or something like this

>>> print(list(q.queue))
Does this work for you?

Assuming you are using python 2.
You can use something like this:
from queue import Queue
q = Queue.Queue()
q.put(1)
q.put(2)
q.put(3)
print q.queue
You can also loop on it :
for q_item in q.queue:
print q_item
But unless you are dealing with threads, I would use a normal list as a Queue implementation.

Sorry, I am a bit late to answer this question, but going by this comment, I extended the Queue in the multiprocessing package as per your requirements. Hopefully it will help someone in the future.
import multiprocessing as mp
from multiprocessing import queues
class IterQueue(queues.Queue):
def __init__(self, *args, **kwargs):
ctx = mp.get_context()
kwargs['ctx'] = ctx
super().__init__(*args, **kwargs)
# <---- Iter Protocol ------>
def __iter__(self):
return self
def __next__(self):
try:
if not self.empty():
return self.get() # block=True | default
else:
raise StopIteration
except ValueError: # the Queue is closed
raise StopIteration
Given below is a sample usage of this IterQueue I wrote:
def sample_func(queue_ref):
for i in range(10):
queue_ref.put(i)
IQ = IterQueue()
p = mp.Process(target=sample_func, args=(IQ,))
p.start()
p.join()
print(list(IQ)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I have tested this IterQueue for even a few more complex scenarios, and it seems to be working fine. Let me know if you think this works, or it could fail in some situation.

not sure if this is still a question, but using the name.queue (i.e. q.queue here) works for me. This works also for the other types of queues in the module.
import queue
q = queue.Queue()
print(list(q.queue))
q.put(1)
print(list(q.queue))

The most common way to print the content of the queue if you are not using the queue then use this code snippet:
class Queue:
def __init__(self):
self.items = []
def push(self, e):
self.items.append(e)
def pop(self):
head = self.items[0]
self.items = self.item[1:]
return head
def print(self):
for e in self.items:
print(e)
q = Queue()
q.push(1)
q.push(23)
q.print()
OUTPUT
1
23

Related

Block script from hanging when islice is used on a multiprocessing Queue

I have an iterable Python class that wraps around a multiprocessing generator. There are use cases where only a subset of what is generated is necessary, so it gets wrapped in islice.
However, the call hangs when islice is used, I guess due to the underlying multiprocessing Process not being aware that things are over.
A minimally functioning example is as follows:
from itertools import islice
import multiprocessing as mp
STOP_MSG = 'STOP!'
def generator(queue, max_val):
for i in range(max_val):
queue.put(i)
queue.put(STOP_MSG)
class GeneratorMPProc:
def __init__(self, max_val):
self.max_val = max_val
def __iter__(self):
queue = mp.Queue()
feeder_process = mp.Process(
target=generator,
args=(
queue,
self.max_val,
))
feeder_process.start()
msg = queue.get()
while msg != STOP_MSG:
yield msg
msg = queue.get()
feeder_process.join()
if __name__ == '__main__':
max_val = 0xFFFFFFFFF
end_val = 10
psm = GeneratorMPProc(max_val)
rsm = [i for i in islice(psm, end_val)]
How do I fix this so that it terminates correctly even when islice or any subset selector is used?
Your isllice call is not going to return until GeneratorMPProc.iter returns and that will take quite a while with max_val set to 0xFFFFFFFFF (writing to a queue is not the fastest operation and this will also use up a bit of resources). In other words, "things are not over" until your generator function ends and thus your multiprocess.Process actually ends and can be joined.
Set max_val to a value such as 20 and your program will terminate readily enough.
if __name__ == '__main__':
#max_val = 0xFFFFFFFFF
max_val = 20
end_val = 10
psm = GeneratorMPProc(max_val)
rsm = [i for i in islice(psm, end_val)]
print(rsm)
Prints:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Update
You might want to consider starting the "generator" process with an extra argument daemon=True to make it a daemon process and then remove feeder_process.join() altogether from method __iter__. The code will then work even with your original max_val.
def __iter__(self):
queue = mp.Queue()
feeder_process = mp.Process(
target=generator,
args=(
queue,
self.max_val,
),
daemon=True
)
feeder_process.start()
msg = queue.get()
while msg != STOP_MSG:
yield msg
msg = queue.get()

How can I append to class variables using multiprocessing in python?

I have this program where everything is built in a class object. There is a function that does 50 computations of a another function, each with a different input, so I decided to use multiprocessing to speed it up. However, the list that needs to be returned in the end always returns empty. any ideas? Here is a simplified version of my problem. The output of main_function() should be a list containing the numbers 0-9, however the list returns empty.
class MyClass(object):
def __init__(self):
self.arr = list()
def helper_function(self, n):
self.arr.append(n)
def main_function(self):
jobs = []
for i in range(0,10):
p = multiprocessing.Process(target=self.helper_function, args=(i,))
jobs.append(p)
p.start()
for job in jobs:
jobs.join()
print(self.arr)
arr is a list that's not going to be shared across subprocess instances.
For that you have to use a Manager object to create a managed list that is aware of the fact that it's shared between processes.
The key is:
self.arr = multiprocessing.Manager().list()
full working example:
import multiprocessing
class MyClass(object):
def __init__(self):
self.arr = multiprocessing.Manager().list()
def helper_function(self, n):
self.arr.append(n)
def main_function(self):
jobs = []
for i in range(0,10):
p = multiprocessing.Process(target=self.helper_function, args=(i,))
jobs.append(p)
p.start()
for job in jobs:
job.join()
print(self.arr)
if __name__ == "__main__":
a = MyClass()
a.main_function()
this code now prints: [7, 9, 2, 8, 6, 0, 4, 3, 1, 5]
(well of course the order cannot be relied on between several executions, but all numbers are here which means that all processes contributed to the result)
multiprocessing is touchy.
For simple multiprocessing tasks, I would recomend:
from multiprocessing.dummy import Pool as ThreadPool
class MyClass(object):
def __init__(self):
self.arr = list()
def helper_function(self, n):
self.arr.append(n)
def main_function(self):
pool = ThreadPool(4)
pool.map(self.helper_function, range(10))
print(self.arr)
if __name__ == '__main__':
c = MyClass()
c.main_function()
The idea of using map instead of complicated multithreading calls is from one of my favorite blog posts: https://chriskiehl.com/article/parallelism-in-one-line

Python sharing a deque between multiprocessing processes

I've been looking at the following questions for the pas hour without any luck:
Python sharing a dictionary between parallel processes
multiprocessing: sharing a large read-only object between processes?
multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
I've written a very basic test file to illustrate what I'm trying to do:
from collections import deque
from multiprocessing import Process
import numpy as np
class TestClass:
def __init__(self):
self.mem = deque(maxlen=4)
self.process = Process(target=self.run)
def run(self):
while True:
self.mem.append(np.array([0, 1, 2, 3, 4]))
def print_values(x):
while True:
print(x)
test = TestClass()
process = Process(target=print_values(test.mem))
test.process.start()
process.start()
Currently this outputs the following :
deque([], maxlen=4)
How can I access the mem value's from the main code or the process that runs "print_values"?
Unfortunately multiprocessing.Manager() doesn't support deque but it does work with list, dict, Queue, Value and Array. A list is fairly close so I've used it in the example below..
from multiprocessing import Process, Manager, Lock
import numpy as np
class TestClass:
def __init__(self):
self.maxlen = 4
self.manager = Manager()
self.mem = self.manager.list()
self.lock = self.manager.Lock()
self.process = Process(target=self.run, args=(self.mem, self.lock))
def run(self, mem, lock):
while True:
array = np.random.randint(0, high=10, size=5)
with lock:
if len(mem) >= self.maxlen:
mem.pop(0)
mem.append(array)
def print_values(mem, lock):
while True:
with lock:
print mem
test = TestClass()
print_process = Process(target=print_values, args=(test.mem, test.lock))
test.process.start()
print_process.start()
test.process.join()
print_process.join()
You have to be a little careful using manager objects. You can use them a lot like the objects they reference but you can't do something like... mem = mem[-4:] to truncate the values because you're changing the referenced object.
As for coding style, I might move the Manager objects outside the class or move the print_values function inside it but for an example, this works. If you move things around, just note that you can't use self.mem directly in the run method. You need to pass it in when you start the process or the fork that python does in the background will create a new instance and it won't be shared.
Hopefully this works for your situation, if not, we can try to adapt it a bit.
So by combining the code provided by #bivouac0 and the comment #Marijn Pieters posted, I came up with the following solution:
from multiprocessing import Process, Manager, Queue
class testClass:
def __init__(self, maxlen=4):
self.mem = Queue(maxsize=maxlen)
self.process = Process(target=self.run)
def run(self):
i = 0
while True:
self.mem.empty()
while not self.mem.full():
self.mem.put(i)
i += 1
def print_values(queue):
while True:
values = queue.get()
print(values)
if __name__ == "__main__":
test = testClass()
print_process = Process(target=print_values, args=(test.mem,))
test.process.start()
print_process.start()
test.process.join()
print_process.join()

Apply a method to a list of objects in parallel using multi-processing

I have created a class with a number of methods. One of the methods is very time consuming, my_process, and I'd like to do that method in parallel. I came across Python Multiprocessing - apply class method to a list of objects but I'm not sure how to apply it to my problem, and what effect it will have on the other methods of my class.
class MyClass():
def __init__(self, input):
self.input = input
self.result = int
def my_process(self, multiply_by, add_to):
self.result = self.input * multiply_by
self._my_sub_process(add_to)
return self.result
def _my_sub_process(self, add_to):
self.result += add_to
list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_results = [obj.my_process(100, 1) for obj in list_of_objects] # multi-process this for-loop
print list_of_numbers
print list_of_results
[0, 1, 2, 3, 4]
[1, 101, 201, 301, 401]
I'm going to go against the grain here, and suggest sticking to the simplest thing that could possibly work ;-) That is, Pool.map()-like functions are ideal for this, but are restricted to passing a single argument. Rather than make heroic efforts to worm around that, simply write a helper function that only needs a single argument: a tuple. Then it's all easy and clear.
Here's a complete program taking that approach, which prints what you want under Python 2, and regardless of OS:
class MyClass():
def __init__(self, input):
self.input = input
self.result = int
def my_process(self, multiply_by, add_to):
self.result = self.input * multiply_by
self._my_sub_process(add_to)
return self.result
def _my_sub_process(self, add_to):
self.result += add_to
import multiprocessing as mp
NUM_CORE = 4 # set to the number of cores you want to use
def worker(arg):
obj, m, a = arg
return obj.my_process(m, a)
if __name__ == "__main__":
list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
pool = mp.Pool(NUM_CORE)
list_of_results = pool.map(worker, ((obj, 100, 1) for obj in list_of_objects))
pool.close()
pool.join()
print list_of_numbers
print list_of_results
A big of magic
I should note there are many advantages to taking the very simple approach I suggest. Beyond that it "just works" on Pythons 2 and 3, requires no changes to your classes, and is easy to understand, it also plays nice with all of the Pool methods.
However, if you have multiple methods you want to run in parallel, it can get a bit annoying to write a tiny worker function for each. So here's a tiny bit of "magic" to worm around that. Change worker() like so:
def worker(arg):
obj, methname = arg[:2]
return getattr(obj, methname)(*arg[2:])
Now a single worker function suffices for any number of methods, with any number of arguments. In your specific case, just change one line to match:
list_of_results = pool.map(worker, ((obj, "my_process", 100, 1) for obj in list_of_objects))
More-or-less obvious generalizations can also cater to methods with keyword arguments. But, in real life, I usually stick to the original suggestion. At some point catering to generalizations does more harm than good. Then again, I like obvious things ;-)
If your class is not "huge", I think process oriented is better.
Pool in multiprocessing is suggested.
This is the tutorial -> https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
Then seperate the add_to from my_process since they are quick and you can wait util the end of the last process.
def my_process(input, multiby):
return xxxx
def add_to(result,a_list):
xxx
p = Pool(5)
res = []
for i in range(10):
res.append(p.apply_async(my_process, (i,5)))
p.join() # wait for the end of the last process
for i in range(10):
print res[i].get()
Generally the easiest way to run the same calculation in parallel is the map method of a multiprocessing.Pool (or the as_completed function from concurrent.futures in Python 3).
However, the map method applies a function that only takes one argument to an iterable of data using multiple processes.
So this function cannot be a normal method, because that requires at least two arguments; it must also include self! It could be a staticmethod, however. See also this answer for a more in-depth explanation.
Based on the answer of Python Multiprocessing - apply class method to a list of objects and your code:
add MyClass object into simulation object
class simulation(multiprocessing.Process):
def __init__(self, id, worker, *args, **kwargs):
# must call this before anything else
multiprocessing.Process.__init__(self)
self.id = id
self.worker = worker
self.args = args
self.kwargs = kwargs
sys.stdout.write('[%d] created\n' % (self.id))
run what you want in run function
def run(self):
sys.stdout.write('[%d] running ... process id: %s\n' % (self.id, os.getpid()))
self.worker.my_process(*self.args, **self.kwargs)
sys.stdout.write('[%d] completed\n' % (self.id))
Try this:
list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_sim = [simulation(id=k, worker=obj, multiply_by=100*k, add_to=10*k) \
for k, obj in enumerate(list_of_objects)]
for sim in list_of_sim:
sim.start()
If you don't absolutely need to stick with Multiprocessing module then,
it can easily achieved using concurrents.futures library
here's the example code:
from concurrent.futures.thread import ThreadPoolExecutor, wait
MAX_WORKERS = 20
class MyClass():
def __init__(self, input):
self.input = input
self.result = int
def my_process(self, multiply_by, add_to):
self.result = self.input * multiply_by
self._my_sub_process(add_to)
return self.result
def _my_sub_process(self, add_to):
self.result += add_to
list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
With ThreadPoolExecutor(MAX_WORKERS) as executor:
for obj in list_of_objects:
executor.submit(obj.my_process, 100, 1).add_done_callback(on_finish)
def on_finish(future):
result = future.result() # do stuff with your result
here executor returns future for every task it submits. keep in mind that if you use add_done_callback() finished task from thread returns to the main thread (which would block your main thread) if you really want true parallelism then you should wait for future objects separately. here's the code snippet for that.
futures = []
with ThreadPoolExecutor(MAX_WORKERS) as executor:
for objin list_of_objects:
futures.append(executor.submit(obj.my_process, 100, 1))
wait(futures)
for succeded, failed in futures:
# work with your result here
if succeded:
print (succeeeded.result())
if failed:
print (failed.result())
hope this helps.

How to use multiprocessing queue in Python?

I'm having much trouble trying to understand just how the multiprocessing queue works on python and how to implement it. Lets say I have two python modules that access data from a shared file, let's call these two modules a writer and a reader. My plan is to have both the reader and writer put requests into two separate multiprocessing queues, and then have a third process pop these requests in a loop and execute as such.
My main problem is that I really don't know how to implement multiprocessing.queue correctly, you cannot really instantiate the object for each process since they will be separate queues, how do you make sure that all processes relate to a shared queue (or in this case, queues)
My main problem is that I really don't know how to implement multiprocessing.queue correctly, you cannot really instantiate the object for each process since they will be separate queues, how do you make sure that all processes relate to a shared queue (or in this case, queues)
This is a simple example of a reader and writer sharing a single queue... The writer sends a bunch of integers to the reader; when the writer runs out of numbers, it sends 'DONE', which lets the reader know to break out of the read loop.
You can spawn as many reader processes as you like...
from multiprocessing import Process, Queue
import time
import sys
def reader_proc(queue):
"""Read from the queue; this spawns as a separate Process"""
while True:
msg = queue.get() # Read from the queue and do nothing
if msg == "DONE":
break
def writer(count, num_of_reader_procs, queue):
"""Write integers into the queue. A reader_proc() will read them from the queue"""
for ii in range(0, count):
queue.put(ii) # Put 'count' numbers into queue
### Tell all readers to stop...
for ii in range(0, num_of_reader_procs):
queue.put("DONE")
def start_reader_procs(qq, num_of_reader_procs):
"""Start the reader processes and return all in a list to the caller"""
all_reader_procs = list()
for ii in range(0, num_of_reader_procs):
### reader_p() reads from qq as a separate process...
### you can spawn as many reader_p() as you like
### however, there is usually a point of diminishing returns
reader_p = Process(target=reader_proc, args=((qq),))
reader_p.daemon = True
reader_p.start() # Launch reader_p() as another proc
all_reader_procs.append(reader_p)
return all_reader_procs
if __name__ == "__main__":
num_of_reader_procs = 2
qq = Queue() # writer() writes to qq from _this_ process
for count in [10**4, 10**5, 10**6]:
assert 0 < num_of_reader_procs < 4
all_reader_procs = start_reader_procs(qq, num_of_reader_procs)
writer(count, len(all_reader_procs), qq) # Queue stuff to all reader_p()
print("All reader processes are pulling numbers from the queue...")
_start = time.time()
for idx, a_reader_proc in enumerate(all_reader_procs):
print(" Waiting for reader_p.join() index %s" % idx)
a_reader_proc.join() # Wait for a_reader_proc() to finish
print(" reader_p() idx:%s is done" % idx)
print(
"Sending {0} integers through Queue() took {1} seconds".format(
count, (time.time() - _start)
)
)
print("")
Here's a dead simple usage of multiprocessing.Queue and multiprocessing.Process that allows callers to send an "event" plus arguments to a separate process that dispatches the event to a "do_" method on the process. (Python 3.4+)
import multiprocessing as mp
import collections
Msg = collections.namedtuple('Msg', ['event', 'args'])
class BaseProcess(mp.Process):
"""A process backed by an internal queue for simple one-way message passing.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.queue = mp.Queue()
def send(self, event, *args):
"""Puts the event and args as a `Msg` on the queue
"""
msg = Msg(event, args)
self.queue.put(msg)
def dispatch(self, msg):
event, args = msg
handler = getattr(self, "do_%s" % event, None)
if not handler:
raise NotImplementedError("Process has no handler for [%s]" % event)
handler(*args)
def run(self):
while True:
msg = self.queue.get()
self.dispatch(msg)
Usage:
class MyProcess(BaseProcess):
def do_helloworld(self, arg1, arg2):
print(arg1, arg2)
if __name__ == "__main__":
process = MyProcess()
process.start()
process.send('helloworld', 'hello', 'world')
The send happens in the parent process, the do_* happens in the child process.
I left out any exception handling that would obviously interrupt the run loop and exit the child process. You can also customize it by overriding run to control blocking or whatever else.
This is really only useful in situations where you have a single worker process, but I think it's a relevant answer to this question to demonstrate a common scenario with a little more object-orientation.
I had a look at multiple answers across stack overflow and the web while trying to set-up a way of doing multiprocessing using queues for passing around large pandas dataframes. It seemed to me that every answer was re-iterating the same kind of solutions without any consideration of the multitude of edge cases one will definitely come across when setting up calculations like these. The problem is that there is many things at play at the same time. The number of tasks, the number of workers, the duration of each task and possible exceptions during task execution. All of these make synchronization tricky and most answers do not address how you can go about it. So this is my take after fiddling around for a few hours, hopefully this will be generic enough for most people to find it useful.
Some thoughts before any coding examples. Since queue.Empty or queue.qsize() or any other similar method is unreliable for flow control, any code of the like
while True:
try:
task = pending_queue.get_nowait()
except queue.Empty:
break
is bogus. This will kill the worker even if milliseconds later another task turns up in the queue. The worker will not recover and after a while ALL the workers will disappear as they randomly find the queue momentarily empty. The end result will be that the main multiprocessing function (the one with the join() on the processes) will return without all the tasks having completed. Nice. Good luck debugging through that if you have thousands of tasks and a few are missing.
The other issue is the use of sentinel values. Many people have suggested adding a sentinel value in the queue to flag the end of the queue. But to flag it to whom exactly? If there is N workers, assuming N is the number of cores available give or take, then a single sentinel value will only flag the end of the queue to one worker. All the other workers will sit waiting for more work when there is none left. Typical examples I've seen are
while True:
task = pending_queue.get()
if task == SOME_SENTINEL_VALUE:
break
One worker will get the sentinel value while the rest will wait indefinitely. No post I came across mentioned that you need to submit the sentinel value to the queue AT LEAST as many times as you have workers so that ALL of them get it.
The other issue is the handling of exceptions during task execution. Again these should be caught and managed. Moreover, if you have a completed_tasks queue you should independently count in a deterministic way how many items are in the queue before you decide that the job is done. Again relying on queue sizes is bound to fail and returns unexpected results.
In the example below, the par_proc() function will receive a list of tasks including the functions with which these tasks should be executed alongside any named arguments and values.
import multiprocessing as mp
import dill as pickle
import queue
import time
import psutil
SENTINEL = None
def do_work(tasks_pending, tasks_completed):
# Get the current worker's name
worker_name = mp.current_process().name
while True:
try:
task = tasks_pending.get_nowait()
except queue.Empty:
print(worker_name + ' found an empty queue. Sleeping for a while before checking again...')
time.sleep(0.01)
else:
try:
if task == SENTINEL:
print(worker_name + ' no more work left to be done. Exiting...')
break
print(worker_name + ' received some work... ')
time_start = time.perf_counter()
work_func = pickle.loads(task['func'])
result = work_func(**task['task'])
tasks_completed.put({work_func.__name__: result})
time_end = time.perf_counter() - time_start
print(worker_name + ' done in {} seconds'.format(round(time_end, 5)))
except Exception as e:
print(worker_name + ' task failed. ' + str(e))
tasks_completed.put({work_func.__name__: None})
def par_proc(job_list, num_cpus=None):
# Get the number of cores
if not num_cpus:
num_cpus = psutil.cpu_count(logical=False)
print('* Parallel processing')
print('* Running on {} cores'.format(num_cpus))
# Set-up the queues for sending and receiving data to/from the workers
tasks_pending = mp.Queue()
tasks_completed = mp.Queue()
# Gather processes and results here
processes = []
results = []
# Count tasks
num_tasks = 0
# Add the tasks to the queue
for job in job_list:
for task in job['tasks']:
expanded_job = {}
num_tasks = num_tasks + 1
expanded_job.update({'func': pickle.dumps(job['func'])})
expanded_job.update({'task': task})
tasks_pending.put(expanded_job)
# Use as many workers as there are cores (usually chokes the system so better use less)
num_workers = num_cpus
# We need as many sentinels as there are worker processes so that ALL processes exit when there is no more
# work left to be done.
for c in range(num_workers):
tasks_pending.put(SENTINEL)
print('* Number of tasks: {}'.format(num_tasks))
# Set-up and start the workers
for c in range(num_workers):
p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed))
p.name = 'worker' + str(c)
processes.append(p)
p.start()
# Gather the results
completed_tasks_counter = 0
while completed_tasks_counter < num_tasks:
results.append(tasks_completed.get())
completed_tasks_counter = completed_tasks_counter + 1
for p in processes:
p.join()
return results
And here is a test to run the above code against
def test_parallel_processing():
def heavy_duty1(arg1, arg2, arg3):
return arg1 + arg2 + arg3
def heavy_duty2(arg1, arg2, arg3):
return arg1 * arg2 * arg3
task_list = [
{'func': heavy_duty1, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
{'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
]
results = par_proc(task_list)
job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])
assert job1 == 15
assert job2 == 21
plus another one with some exceptions
def test_parallel_processing_exceptions():
def heavy_duty1_raises(arg1, arg2, arg3):
raise ValueError('Exception raised')
return arg1 + arg2 + arg3
def heavy_duty2(arg1, arg2, arg3):
return arg1 * arg2 * arg3
task_list = [
{'func': heavy_duty1_raises, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
{'func': heavy_duty2, 'tasks': [{'arg1': 1, 'arg2': 2, 'arg3': 3}, {'arg1': 1, 'arg2': 3, 'arg3': 5}]},
]
results = par_proc(task_list)
job1 = sum([y for x in results if 'heavy_duty1' in x.keys() for y in list(x.values())])
job2 = sum([y for x in results if 'heavy_duty2' in x.keys() for y in list(x.values())])
assert not job1
assert job2 == 21
Hope that is helpful.
in "from queue import Queue" there is no module called queue, instead multiprocessing should be used. Therefore, it should look like "from multiprocessing import Queue"
Just made a simple and general example for demonstrating passing a message over a Queue between 2 standalone programs. It doesn't directly answer the OP's question but should be clear enough indicating the concept.
Server:
multiprocessing-queue-manager-server.py
import asyncio
import concurrent.futures
import multiprocessing
import multiprocessing.managers
import queue
import sys
import threading
from typing import Any, AnyStr, Dict, Union
class QueueManager(multiprocessing.managers.BaseManager):
def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
pass
def get_queue(ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
global q
if not ident in q:
q[ident] = multiprocessing.Queue()
return q[ident]
q: Dict[Union[AnyStr, int, type(None)], multiprocessing.Queue] = dict()
delattr(QueueManager, 'get_queue')
def init_queue_manager_server():
if not hasattr(QueueManager, 'get_queue'):
QueueManager.register('get_queue', get_queue)
def serve(no: int, term_ev: threading.Event):
manager: QueueManager
with QueueManager(authkey=QueueManager.__name__.encode()) as manager:
print(f"Server address {no}: {manager.address}")
while not term_ev.is_set():
try:
item: Any = manager.get_queue().get(timeout=0.1)
print(f"Client {no}: {item} from {manager.address}")
except queue.Empty:
continue
async def main(n: int):
init_queue_manager_server()
term_ev: threading.Event = threading.Event()
executor: concurrent.futures.ThreadPoolExecutor = concurrent.futures.ThreadPoolExecutor()
i: int
for i in range(n):
asyncio.ensure_future(asyncio.get_running_loop().run_in_executor(executor, serve, i, term_ev))
# Gracefully shut down
try:
await asyncio.get_running_loop().create_future()
except asyncio.CancelledError:
term_ev.set()
executor.shutdown()
raise
if __name__ == '__main__':
asyncio.run(main(int(sys.argv[1])))
Client:
multiprocessing-queue-manager-client.py
import multiprocessing
import multiprocessing.managers
import os
import sys
from typing import AnyStr, Union
class QueueManager(multiprocessing.managers.BaseManager):
def get_queue(self, ident: Union[AnyStr, int, type(None)] = None) -> multiprocessing.Queue:
pass
delattr(QueueManager, 'get_queue')
def init_queue_manager_client():
if not hasattr(QueueManager, 'get_queue'):
QueueManager.register('get_queue')
def main():
init_queue_manager_client()
manager: QueueManager = QueueManager(sys.argv[1], authkey=QueueManager.__name__.encode())
manager.connect()
message = f"A message from {os.getpid()}"
print(f"Message to send: {message}")
manager.get_queue().put(message)
if __name__ == '__main__':
main()
Usage
Server:
$ python3 multiprocessing-queue-manager-server.py N
N is a integer indicating how many servers should be created. Copy one of the <server-address-N> output by the server and make it the first argument of each multiprocessing-queue-manager-client.py.
Client:
python3 multiprocessing-queue-manager-client.py <server-address-1>
Result
Server:
Client 1: <item> from <server-address-1>
Gist: https://gist.github.com/89062d639e40110c61c2f88018a8b0e5
UPD: Created a package here.
Server:
import ipcq
with ipcq.QueueManagerServer(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO) as server:
server.get_queue().get()
Client:
import ipcq
client = ipcq.QueueManagerClient(address=ipcq.Address.AUTO, authkey=ipcq.AuthKey.AUTO)
client.get_queue().put('a message')
We implemented two versions of this, one a simple multi thread pool that can execute many types of callables, making our lives much easier and the second version that uses processes, which is less flexible in terms of callables and requires and extra call to dill.
Setting frozen_pool to true will freeze execution until finish_pool_queue is called in either class.
Thread Version:
'''
Created on Nov 4, 2019
#author: Kevin
'''
from threading import Lock, Thread
from Queue import Queue
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os
class ThreadPool(object):
def __init__(self, queue_threads, *args, **kwargs):
self.frozen_pool = kwargs.get('frozen_pool', False)
self.print_queue = kwargs.get('print_queue', True)
self.pool_results = []
self.lock = Lock()
self.queue_threads = queue_threads
self.queue = Queue()
self.threads = []
for i in range(self.queue_threads):
t = Thread(target=self.make_pool_call)
t.daemon = True
t.start()
self.threads.append(t)
def make_pool_call(self):
while True:
if self.frozen_pool:
#print '--> Queue is frozen'
sleep(1)
continue
item = self.queue.get()
if item is None:
break
call = item.get('call', None)
args = item.get('args', [])
kwargs = item.get('kwargs', {})
keep_results = item.get('keep_results', False)
try:
result = call(*args, **kwargs)
if keep_results:
self.lock.acquire()
self.pool_results.append((item, result))
self.lock.release()
except Exception as e:
self.lock.acquire()
print e
traceback.print_exc()
self.lock.release()
os.kill(os.getpid(), signal.SIGUSR1)
self.queue.task_done()
def finish_pool_queue(self):
self.frozen_pool = False
while self.queue.unfinished_tasks > 0:
if self.print_queue:
print_info('--> Thread pool... %s' % self.queue.unfinished_tasks)
sleep(5)
self.queue.join()
for i in range(self.queue_threads):
self.queue.put(None)
for t in self.threads:
t.join()
del self.threads[:]
def get_pool_results(self):
return self.pool_results
def clear_pool_results(self):
del self.pool_results[:]
Process Version:
'''
Created on Nov 4, 2019
#author: Kevin
'''
import traceback
from helium.loaders.loader_retailers import print_info
from time import sleep
import signal
import os
from multiprocessing import Queue, Process, Value, Array, JoinableQueue, Lock,\
RawArray, Manager
from dill import dill
import ctypes
from helium.misc.utils import ignore_exception
from mem_top import mem_top
import gc
class ProcessPool(object):
def __init__(self, queue_processes, *args, **kwargs):
self.frozen_pool = Value(ctypes.c_bool, kwargs.get('frozen_pool', False))
self.print_queue = kwargs.get('print_queue', True)
self.manager = Manager()
self.pool_results = self.manager.list()
self.queue_processes = queue_processes
self.queue = JoinableQueue()
self.processes = []
for i in range(self.queue_processes):
p = Process(target=self.make_pool_call)
p.start()
self.processes.append(p)
print 'Processes', self.queue_processes
def make_pool_call(self):
while True:
if self.frozen_pool.value:
sleep(1)
continue
item_pickled = self.queue.get()
if item_pickled is None:
#print '--> Ending'
self.queue.task_done()
break
item = dill.loads(item_pickled)
call = item.get('call', None)
args = item.get('args', [])
kwargs = item.get('kwargs', {})
keep_results = item.get('keep_results', False)
try:
result = call(*args, **kwargs)
if keep_results:
self.pool_results.append(dill.dumps((item, result)))
else:
del call, args, kwargs, keep_results, item, result
except Exception as e:
print e
traceback.print_exc()
os.kill(os.getpid(), signal.SIGUSR1)
self.queue.task_done()
def finish_pool_queue(self, callable=None):
self.frozen_pool.value = False
while self.queue._unfinished_tasks.get_value() > 0:
if self.print_queue:
print_info('--> Process pool... %s' % (self.queue._unfinished_tasks.get_value()))
if callable:
callable()
sleep(5)
for i in range(self.queue_processes):
self.queue.put(None)
self.queue.join()
self.queue.close()
for p in self.processes:
with ignore_exception: p.join(10)
with ignore_exception: p.terminate()
with ignore_exception: del self.processes[:]
def get_pool_results(self):
return self.pool_results
def clear_pool_results(self):
del self.pool_results[:]
def test(eg):
print 'EG', eg
Call with either:
tp = ThreadPool(queue_threads=2)
tp.queue.put({'call': test, 'args': [random.randint(0, 100)]})
tp.finish_pool_queue()
or
pp = ProcessPool(queue_processes=2)
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.queue.put(dill.dumps({'call': test, 'args': [random.randint(0, 100)]}))
pp.finish_pool_queue()
A multi-producers and multi-consumers example, verified. It should be easy to modify it to cover other cases, single/multi producers, single/multi consumers.
from multiprocessing import Process, JoinableQueue
import time
import os
q = JoinableQueue()
def producer():
for item in range(30):
time.sleep(2)
q.put(item)
pid = os.getpid()
print(f'producer {pid} done')
def worker():
while True:
item = q.get()
pid = os.getpid()
print(f'pid {pid} Working on {item}')
print(f'pid {pid} Finished {item}')
q.task_done()
for i in range(5):
p = Process(target=worker, daemon=True).start()
# send thirty task requests to the worker
producers = []
for i in range(2):
p = Process(target=producer)
producers.append(p)
p.start()
# make sure producers done
for p in producers:
p.join()
# block until all workers are done
q.join()
print('All work completed')
Explanation:
Two producers and five consumers in this example.
JoinableQueue is used to make sure all elements stored in queue will be processed. 'task_done' is for worker to notify an element is done. 'q.join()' will wait for all elements marked as done.
With #2, there is no need to join wait for every worker.
But it is important to join wait for every producer to store element into queue. Otherwise, program exit immediately.

Categories