How do I use Python's multiprocessing.Value with .map()?

I'm testing some code using multiprocessing to try to understand it, but I'm struggling to get the shared Value to work. What am I doing wrong? It says p doesn't exist.
Here's my code:
from multiprocessing import Pool, Value
from ctypes import c_int

if __name__ == "__main__":
    p = Value(c_int, 1)

def addone(a):
    print(a)
    with p.get_lock():
        print(p.value)
        p.value += 1

if __name__ == "__main__":
    with Pool() as po:
        po.map(addone, range(19))
    print(p.value)
And I get this error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\(I removed this)\Desktop\Python\make\multiTest.py", line 10, in addone
    with p.get_lock():
NameError: name 'p' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File (I removed this), line 15, in <module>
    po.map(addone,range(19))
  File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\(I removed this)\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
NameError: name 'p' is not defined
What should I do?

You only want a single instance of the shared value, p, to be operated on by the pool processes executing your worker function, addone. The way to do this is to use the initializer and initargs arguments of the multiprocessing.pool.Pool class to initialize a global variable, p, in each pool process with the shared value created by the main process. Explicitly passing p to a pool worker function as an additional argument will not work; it fails with the rather cryptic error: RuntimeError: Synchronized objects should only be shared between processes through inheritance.
def init_pool_processes(shared_value):
    # Runs once in each pool process: store the shared value in a
    # global so the worker function can see it.
    global p
    p = shared_value

def addone(a):
    print(a)
    with p.get_lock():
        print(p.value)
        p.value += 1

if __name__ == "__main__":
    from multiprocessing import Pool, Value
    from ctypes import c_int

    p = Value(c_int, 1)
    with Pool(initializer=init_pool_processes, initargs=(p,)) as po:
        po.map(addone, range(19))
    print(p.value)
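Since p starts at 1 and each of the 19 tasks increments it once under the lock, the final print should show 20.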

Related

Python multiprocessing pool apply_async error

I'm trying to evaluate a number of processes in a multiprocessing pool but keep running into errors and I can't work out why... There's a simplified version of the code below:
class Object_1():
    def add_godd_spd_column(self):
        def calculate_correlations(arg1, arg2, arg3):
            return {'a': 1}

processes = {}
pool = Pool(processes=6)
for i in range(1, 10):
    processes[i] = pool.apply_async(calculate_correlations,
                                    args=(arg1, arg2, arg3,))
correlations = {}
for i in range(0, 10):
    correlations[i] = processes[i].get()
This returns the following error:
Traceback (most recent call last):
  File "./02_results.py", line 116, in <module>
    correlations[0] = processes[0].get()
  File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/opt/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/opt/anaconda3/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'SCADA.add_good_spd_column.<locals>.calculate_correlations'
When I call the following:
correlations[0].successful()
I get the following error:
Traceback (most recent call last):
  File "./02_results.py", line 116, in <module>
    print(processes[0].successful())
  File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 595, in successful
    assert self.ready()
AssertionError
Is this because the process isn't actually finished before the .get() is called? The function being evaluated just returns a dictionary which should definitely be pickle-able...
Cheers,
The error is occurring because pickling a function nested in another function is not supported, and multiprocessing.Pool needs to pickle the function you pass as an argument to apply_async in order to execute it in a worker process. You have to move the function to the top level of the module, or make it an instance method of the class. Keep in mind that if you make it an instance method, the instance of the class itself must also be picklable.
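For example, here is a minimal sketch of the first fix, with placeholder argument values since the original arg1/arg2/arg3 aren't shown in the question:

from multiprocessing import Pool

# Defined at module top level, so it can be pickled by reference.
def calculate_correlations(arg1, arg2, arg3):
    return {'a': 1}

if __name__ == '__main__':
    with Pool(processes=6) as pool:
        results = {i: pool.apply_async(calculate_correlations, args=(1, 2, 3))
                   for i in range(1, 10)}
        # get() blocks until each worker result is ready
        correlations = {i: r.get() for i, r in results.items()}
    print(correlations)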
And yes, the assertion error when calling successful() occurs because you're calling it before a result is ready. From the docs:
successful()
Return whether the call completed without raising an exception. Will raise AssertionError if the result is not ready.
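To check an individual result safely, wait for it first. A minimal sketch with a stand-in worker function:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool() as pool:
        res = pool.apply_async(square, args=(3,))
        res.wait()                   # block until the task has finished
        if res.ready():              # successful() is only valid once this is True
            print(res.successful())  # True, since square(3) raised no exception
        print(res.get())             # 9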

Python multiprocessing race condition

I discovered a strange error when using concurrent.futures to read from multiple text files.
Here is a small reproducible example:
import os
import concurrent.futures

def read_file(file):
    with open(os.path.join(data_dir, file), buffering=1000) as f:
        for row in f:
            try:
                print(row)
            except Exception as e:
                print(str(e))

if __name__ == '__main__':
    data_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'data'))
    files = ['file1', 'file2']
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for file, _ in zip(files, executor.map(read_file, files)):
            pass
file1 and file2 are arbitrary text files in the data directory.
I am getting the following error (basically a process tries to read data_dir variable before it is assigned):
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "C:\Users\my_username\Downloads\example.py", line 5, in read_file
    with open(os.path.join(data_dir, file),buffering=1000) as f:
NameError: name 'data_dir' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "example.py", line 16, in <module>
    for file,_ in zip(files,executor.map(read_file,files)):
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 556, in result_iterator
    yield future.result()
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 405, in result
    return self.__get_result()
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 357, in __get_result
    raise self._exception
NameError: name 'data_dir' is not defined
If I place the data_dir assignment before the if __name__ == '__main__': block, I don't get this error and the code executes as expected.
What is causing this error? Clearly, data_dir is assigned before any asynchronous calls should be made in both cases.
ProcessPoolExecutor spawns a new Python process, imports the right module, and calls the function you provide.
Since data_dir is only defined when you run the module, not when you import it, the error is to be expected.
Passing data_dir explicitly as an argument to read_file should also work, since the arguments are sent to the worker processes along with the task.
If you were to use a ThreadPoolExecutor, however, your example should work as-is, because the spawned threads share the parent's memory.
fork() is not available on Windows, so Python uses spawn to start new processes. Spawn starts a fresh interpreter process and no memory is shared, but Python re-imports the module to recreate the worker function's environment in the new process; that's why a module-level variable works. See the docs for more detail.
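Building on that, one portable fix is the same initializer pattern used in the accepted answer above. A minimal sketch, assuming Python 3.7+ (where ProcessPoolExecutor accepts initializer/initargs) and that the data files exist:

import os
import concurrent.futures

def init_worker(dir_path):
    # Runs once in each worker process: recreate the module-level
    # global that read_file expects.
    global data_dir
    data_dir = dir_path

def read_file(file):
    with open(os.path.join(data_dir, file), buffering=1000) as f:
        for row in f:
            print(row)

if __name__ == '__main__':
    data_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'data'))
    files = ['file1', 'file2']
    with concurrent.futures.ProcessPoolExecutor(
            initializer=init_worker, initargs=(data_dir,)) as executor:
        for _ in executor.map(read_file, files):
            pass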

Pathos (python module) behaves different in IDE and shell

I am trying to understand how to use the Pathos package to run a function that calls another function. It was my understanding that the advantage of Pathos over the standard multiprocessing package is that it allows functions defined inside functions. However, I can't seem to make it work. Here is the simplest example I could come up with:
def testf(x):
    print(x)

import dill
import pathos
from pathos.multiprocessing import ProcessingPool

pool = ProcessingPool(nodes=3)
out2 = pool.map(testf, [1, 2, 3, 4, 5, 6, 7, 8, 9])
pool.close()
Output:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/james/.local/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 14, in <lambda>
    func = lambda args: f(*args)
  File "<input>", line 2, in testf
NameError: name 'print' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 5, in <module>
  File "/home/james/.local/lib/python3.6/site-packages/pathos/multiprocessing.py", line 136, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/james/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 608, in get
    raise self._value
NameError: name 'print' is not defined
Edit: It seems that if I paste this code into a shell console, it works fine. No dice if I run it from my IDE of choice, PyCharm. So now my question is why the same code behaves differently in the same version of the Python interpreter (3.6.1) depending on whether it is run from a shell or the in-app console.

Python multiprocessing Pool.imap throws ValueError: list.remove(x): x not in list

I have a very simple test fixture that instantiates and closes instances of a test class, APMSim, in different worker processes. The class is not picklable, so I have to use multiprocessing Pool.imap to avoid the instances being transferred between processes:
class APMSimFixture(TestCase):
    def setUp(self):
        self.pool = multiprocessing.Pool()
        self.sims = self.pool.imap(
            apmSimUp,
            range(numCores)
        )

    def tearDown(self):
        self.pool.map(
            simDown,
            self.sims
        )

    def test_empty(self):
        pass
However, when I run the empty Python unittest I encounter the following error:
Error
Traceback (most recent call last):
  File "/home/peng/git/datapassport/spookystuff/mav/pyspookystuff_test/mav/__init__.py", line 87, in tearDown
    self.sims
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
Why this could happen? Is there a fix to this?
multiprocessing is re-raising an exception from your worker function/child process in the parent process, but it loses the traceback in the transfer from child to parent. Check your worker function; that's the code that's going wrong. It might help to take whatever your worker function is and change:
def apmSimUp(...):
    ... body ...
to:
import traceback

def apmSimUp(...):
    try:
        ... body ...
    except:
        traceback.print_exc()
        raise
This explicitly prints the full, original exception traceback (then lets it propagate normally), so you can see what the real problem is.

Using Python Server Process Manager() with Value()

I'm using a Server Process to handle shared memory in my program.
manager = multiprocessing.Manager()
tasksRemaining = manager.list()
sampleFileList = manager.list()
sortedSamples = manager.Value(c_int)
I get the following error when trying to declare sortedSamples:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 207, in handle_request
    result = func(c, *args, **kwds)
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 386, in create
    obj = callable(*args, **kwds)
TypeError: __init__() takes at least 3 arguments (2 given)
According to the documentation at https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes, Manager() supports list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
Whenever I do this outside of a manager, it works fine, such as:
sortedSamples = multiprocessing.Value(c_int)
What seems to be the problem?
It seems that when you use a manager you need to provide the initial value as well:
Value(typecode, value)
Create an object with a writable value attribute and return a proxy for it.
(See the documentation for SyncManager.)
Try doing e.g.:
value = manager.Value('i', 0)
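For a fuller sketch (names are illustrative, not from the question): note that unlike multiprocessing.Value, the proxy returned by manager.Value() has no get_lock() method, so concurrent updates need a separate manager.Lock():

from multiprocessing import Manager, Process

def add_one(shared, lock):
    # The ValueProxy has no get_lock(), so guard the
    # read-modify-write with an explicit manager lock.
    with lock:
        shared.value += 1

if __name__ == '__main__':
    manager = Manager()
    sorted_samples = manager.Value('i', 0)   # typecode and initial value
    lock = manager.Lock()
    workers = [Process(target=add_one, args=(sorted_samples, lock))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sorted_samples.value)  # 4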
