multiprocessing.Pool.imap_unordered hangs in Python 2.6?

The following script hangs when run with Python 2.6.7. It prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] as expected in Python 2.7. Could it be a bug in Python 2.6? Is there any workaround?
def p1(x):
    return x + 1

class Tasks(object):
    @property
    def mapper(self):
        from multiprocessing import Pool
        pool = Pool(processes=2)
        return pool.imap_unordered

    def run(self):
        xs = range(10)
        return self.mapper(p1, xs)

ts = Tasks()
print(list(ts.run()))
In my program, I could work around the hang by rewriting Tasks.run to:
def run(self):
    xs = range(10)
    mapper = self.mapper
    return mapper(p1, xs)
But I couldn't reproduce this with the above script.
Also note that the use of @property is essential here. Assigning mapper in __init__ like the following solves the problem:
def __init__(self):
    from multiprocessing import Pool
    pool = Pool(processes=2)
    self.mapper = pool.imap_unordered

Yes, this is a bug in Python 2.6.
The Pool object that you're using needs to stay referenced somewhere, otherwise Python will hang when trying to use the imap* methods (and possibly others).
Here is a fix for your example. Note that the pool object is kept inside the Tasks object; this may break your existing code.
class Tasks(object):
    _pool = None

    @property
    def mapper(self):
        from multiprocessing import Pool
        if self._pool is None:
            self._pool = Pool(processes=2)
        return self._pool.imap_unordered

    def run(self):
        xs = range(10)
        return self.mapper(p1, xs)
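With the pool kept alive on the instance, the driver code from the question runs to completion. A small usage sketch (imap_unordered makes no ordering guarantee, so the results are sorted here):
ts = Tasks()
print(sorted(ts.run()))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]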

Related

How can I append to class variables using multiprocessing in python?

I have this program where everything is built in a class object. There is a function that does 50 computations of another function, each with a different input, so I decided to use multiprocessing to speed it up. However, the list that needs to be returned in the end always comes back empty. Any ideas? Here is a simplified version of my problem. The output of main_function() should be a list containing the numbers 0-9, but the list comes back empty.
import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0, 10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print(self.arr)
arr is a list that's not going to be shared across subprocess instances.
For that you have to use a Manager object to create a managed list that is aware of the fact that it's shared between processes.
The key is:
self.arr = multiprocessing.Manager().list()
full working example:
import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = multiprocessing.Manager().list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0, 10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print(self.arr)

if __name__ == "__main__":
    a = MyClass()
    a.main_function()
This code now prints: [7, 9, 2, 8, 6, 0, 4, 3, 1, 5]
(Of course the order cannot be relied on between executions, but all the numbers are there, which means that every process contributed to the result.)
multiprocessing is touchy.
For simple multiprocessing tasks, I would recommend:
from multiprocessing.dummy import Pool as ThreadPool

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        pool = ThreadPool(4)
        pool.map(self.helper_function, range(10))
        print(self.arr)

if __name__ == '__main__':
    c = MyClass()
    c.main_function()
This works with a plain list because multiprocessing.dummy is a thread pool: the workers share the parent's memory instead of running in separate processes. The idea of using map instead of complicated multithreading calls is from one of my favorite blog posts: https://chriskiehl.com/article/parallelism-in-one-line

Printing contents of a Queue in Python

Using Python's queue.Queue, I want to be able to print out the contents with a method that does not pop items off the original queue or create a new queue object.
I have looked into doing a get and then putting the contents back, but this is too costly.
# Ideally it would look like the following
from queue import Queue

q = Queue()
q.print()
q.put(1)
q.print()

>> []   # Or something like this
>> [1]  # Or something like this
>>> print(list(q.queue))
Does this work for you?
Assuming you are using Python 2, you can use something like this:
import Queue

q = Queue.Queue()
q.put(1)
q.put(2)
q.put(3)
print q.queue
You can also loop over it:
for q_item in q.queue:
    print q_item
But unless you are dealing with threads, I would use a normal list as a Queue implementation.
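For instance, a plain list or a collections.deque gives you a FIFO whose contents you can print directly (a minimal sketch, not from the original answer):
from collections import deque

q = deque()
q.append(1)         # enqueue
q.append(2)
print(list(q))      # [1, 2] -- inspect without consuming anything
print(q.popleft())  # 1      -- dequeue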
Sorry, I am a bit late to answer this question, but going by this comment, I extended the Queue in the multiprocessing package as per your requirements. Hopefully it will help someone in the future.
import multiprocessing as mp
from multiprocessing import queues

class IterQueue(queues.Queue):
    def __init__(self, *args, **kwargs):
        ctx = mp.get_context()
        kwargs['ctx'] = ctx
        super().__init__(*args, **kwargs)

    # <---- Iterator protocol ---->
    def __iter__(self):
        return self

    def __next__(self):
        try:
            if not self.empty():
                return self.get()  # block=True | default
            else:
                raise StopIteration
        except ValueError:  # the Queue is closed
            raise StopIteration
Given below is a sample usage of this IterQueue I wrote:
def sample_func(queue_ref):
    for i in range(10):
        queue_ref.put(i)

IQ = IterQueue()
p = mp.Process(target=sample_func, args=(IQ,))
p.start()
p.join()

print(list(IQ))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I have tested this IterQueue in a few more complex scenarios, and it seems to work fine. Let me know if you think this works, or whether it could fail in some situation.
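One corner case worth noting: checking empty() and then calling get() can race when several consumers drain the queue at the same time. A timeout-based get avoids that gap (a sketch of an alternative __next__, assuming the same IterQueue class as above):
from queue import Empty  # multiprocessing's get() raises the stdlib Empty on timeout

def __next__(self):
    try:
        return self.get(timeout=0.1)  # short grace period instead of empty() + get()
    except (Empty, ValueError):       # nothing left, or the queue was closed
        raise StopIteration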
Not sure if this is still a question, but accessing the queue attribute (i.e. q.queue here) works for me. This also works for the other queue types in the module.
import queue

q = queue.Queue()
print(list(q.queue))
q.put(1)
print(list(q.queue))
If you are not using queue.Queue but rolling your own queue class, the simplest way to print its contents is to add a print method, as in this snippet:
class Queue:
    def __init__(self):
        self.items = []

    def push(self, e):
        self.items.append(e)

    def pop(self):
        head = self.items[0]
        self.items = self.items[1:]
        return head

    def print(self):
        for e in self.items:
            print(e)

q = Queue()
q.push(1)
q.push(23)
q.print()
OUTPUT
1
23

Python sharing a deque between multiprocessing processes

I've been looking at the following questions for the past hour without any luck:
Python sharing a dictionary between parallel processes
multiprocessing: sharing a large read-only object between processes?
multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
I've written a very basic test file to illustrate what I'm trying to do:
from collections import deque
from multiprocessing import Process
import numpy as np

class TestClass:
    def __init__(self):
        self.mem = deque(maxlen=4)
        self.process = Process(target=self.run)

    def run(self):
        while True:
            self.mem.append(np.array([0, 1, 2, 3, 4]))

def print_values(x):
    while True:
        print(x)

test = TestClass()
process = Process(target=print_values(test.mem))
test.process.start()
process.start()
Currently this outputs the following:
deque([], maxlen=4)
How can I access the mem values from the main code or from the process that runs "print_values"?
Unfortunately multiprocessing.Manager() doesn't support deque, but it does work with list, dict, Queue, Value and Array. A list is fairly close, so I've used it in the example below.
from multiprocessing import Process, Manager, Lock
import numpy as np

class TestClass:
    def __init__(self):
        self.maxlen = 4
        self.manager = Manager()
        self.mem = self.manager.list()
        self.lock = self.manager.Lock()
        self.process = Process(target=self.run, args=(self.mem, self.lock))

    def run(self, mem, lock):
        while True:
            array = np.random.randint(0, high=10, size=5)
            with lock:
                if len(mem) >= self.maxlen:
                    mem.pop(0)
                mem.append(array)

def print_values(mem, lock):
    while True:
        with lock:
            print mem

test = TestClass()
print_process = Process(target=print_values, args=(test.mem, test.lock))
test.process.start()
print_process.start()

test.process.join()
print_process.join()
You have to be a little careful using manager objects. You can use them a lot like the objects they reference, but you can't do something like mem = mem[-4:] to truncate the values, because that rebinds the local name to a new, unshared list instead of modifying the referenced object.
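If you do need to bound the shared list to the last few entries, mutate the proxy in place instead of rebinding the name (a minimal sketch using the mem list and maxlen value from the answer above):
# Trim the managed list in place; every process keeps seeing the same proxy.
while len(mem) > 4:   # 4 == the maxlen used above
    mem.pop(0)        # in-place mutations are forwarded to the Manager process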
As for coding style, I might move the Manager objects outside the class or move the print_values function inside it, but for an example this works. If you move things around, just note that you can't use self.mem directly in the run method. You need to pass it in when you start the process, or the fork that Python does in the background will create a new instance and it won't be shared.
Hopefully this works for your situation; if not, we can try to adapt it a bit.
So by combining the code provided by @bivouac0 and the comment @Marijn Pieters posted, I came up with the following solution:
from multiprocessing import Process, Manager, Queue

class testClass:
    def __init__(self, maxlen=4):
        self.mem = Queue(maxsize=maxlen)
        self.process = Process(target=self.run)

    def run(self):
        i = 0
        while True:
            self.mem.empty()
            while not self.mem.full():
                self.mem.put(i)
                i += 1

def print_values(queue):
    while True:
        values = queue.get()
        print(values)

if __name__ == "__main__":
    test = testClass()
    print_process = Process(target=print_values, args=(test.mem,))
    test.process.start()
    print_process.start()

    test.process.join()
    print_process.join()

Target function does not assign to class attribute when called by multiprocessing.Process from within Jupyter notebook

Consider the following code:
class Test:
    def __init__(self):
        self.out = []

    def fit(self, x):
        for i in x:
            self.out.append(i*i)

test = Test()
X = [1, 2, 3, 4, 5]
I call test.fit() using Process from multiprocessing in a Jupyter notebook like this:
from multiprocessing import Process
p = Process(target=test.fit, args=(X,))
p.start()
My problem is that even though Process calls test.fit, the product i*i is not appended to test.out. If I replace self.out.append(i*i) with print(i*i), I get the desired output, so the calculation is being done. If I call test.fit(X) without Process, test.out is appended to as desired. What am I doing wrong here?
Using Jupyter 4.4.0 on macOS 10.12
A child process does not share memory with the parent: multiprocessing gives the subprocess its own copy of the Test object, so appending in the child never touches the test.out in your notebook's process. To pass data back from the subprocess, I think you need to use multiprocessing.Queue.
from multiprocessing import Process, Queue

class Test:
    def __init__(self):
        self.out = Queue()

    def fit(self, x):
        for i in x:
            self.out.put(i*i)

test = Test()
X = [1, 2, 3, 4, 5]

p = Process(target=test.fit, args=(X,))
p.start()
p.join()
Then, get the values from the queue:
y = []
while not test.out.empty():
    y.append(test.out.get())

print(y)
Added 2017/12/02:
Another way is to create a shared list via multiprocessing.Manager, shown here. I think this is what you originally tried.
from multiprocessing import Manager, Process

manager = Manager()

class Test:
    def __init__(self):
        self.out = manager.list()

    def fit(self, x):
        for i in x:
            self.out.append(i*i)

test = Test()
X = [1, 2, 3, 4, 5]

p = Process(target=test.fit, args=(X,))
p.start()
p.join()

print(test.out)
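Printing the proxy shows its contents; if a plain list is needed afterwards it can simply be copied (a small usage note, not part of the original answer):
y = list(test.out)  # copy the managed list into an ordinary list
print(y)            # [1, 4, 9, 16, 25]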

Python sharing a lock between processes

I am attempting to use a partial function so that pool.map() can target a function that has more than one parameter (in this case a Lock() object).
Here is example code (taken from an answer to a previous question of mine):
import multiprocessing
from functools import partial

def target(lock, iterable_item):
    for item in iterable_item:
        # Do cool stuff
        if (... some condition here ...):
            lock.acquire()
            # Write to stdout or logfile, etc.
            lock.release()

def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    l = multiprocessing.Lock()
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()
However when I run this code, I get the error:
RuntimeError: Lock objects should only be shared between processes through inheritance
What am I missing here? How can I share the lock between my subprocesses?
You can't pass normal multiprocessing.Lock objects to Pool methods, because they can't be pickled. There are two ways to get around this. One is to create a Manager() and pass a Manager.Lock():
def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    m = multiprocessing.Manager()
    l = m.Lock()
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()
This is a little bit heavyweight, though; using a Manager requires spawning another process to host the Manager server. And all calls to acquire/release the lock have to be sent to that server via IPC.
The other option is to pass the regular multiprocessing.Lock() at Pool creation time, using the initializer kwarg. This will make your lock instance global in all the child workers:
def target(iterable_item):
    for item in iterable_item:
        # Do cool stuff
        if (... some condition here ...):
            lock.acquire()
            # Write to stdout or logfile, etc.
            lock.release()

def init(l):
    global lock
    lock = l

def main():
    iterable = [1, 2, 3, 4, 5]
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(initializer=init, initargs=(l,))
    pool.map(target, iterable)
    pool.close()
    pool.join()
The second solution has the side-effect of no longer requiring partial.
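One practical note (an addition, not part of the original answer): with the spawn start method used on Windows, the entry point must be guarded so the worker processes can re-import the module safely:
if __name__ == '__main__':
    main()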
Here's a version (using Barrier instead of Lock, but you get the idea) which also works on Windows (where the lack of fork causes additional trouble):
import multiprocessing as mp

def procs(uid_barrier):
    uid, barrier = uid_barrier
    print(uid, 'waiting')
    barrier.wait()
    print(uid, 'past barrier')

def main():
    N_PROCS = 10
    with mp.Manager() as man:
        barrier = man.Barrier(N_PROCS)
        with mp.Pool(N_PROCS) as p:
            p.map(procs, ((uid, barrier) for uid in range(N_PROCS)))

if __name__ == '__main__':
    mp.freeze_support()
    main()
