I'm developing software to benchmark Python scripts using different methods (single-threaded, multi-threaded, multi-process), so I need to execute the same function (with the same arguments, etc.) in different processes.
How do I pass the function to execute as an argument to a process target?
What I currently understand is that a plain reference to a function cannot work, because the referenced function is not visible to the other processes; that's why I tried a custom manager for the shared memory.
Here is a simplified version of the code:
#!/bin/python
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat

class FunctionManager(BaseManager):
    pass

def maFunction(a, b):
    print(a + b)

def threadedFunction(f_i_args):
    (f, i, args) = f_i_args
    f(*args)

FunctionManager.register('Function', maFunction)
myManager = FunctionManager()
myManager.start()

myManager.Function(0, 0)                   # Test 1
threadedFunction((maFunction, 0, (1, 1)))  # Test 2
args = zip(repeat(myManager.Function), range(10), repeat((2, 2)))
p.map(threadedFunction, args) # Does not work
p.join()
myManager.shutdown()
The pickling error raised at p.map() is the following:
2
0
Traceback (most recent call last):
File "./test.py", line 27, in <module>
p.map(threadedFunction, args) # Does not work
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'weakref'>: attribute lookup weakref on builtins failed
I got a slightly different error from running your code. I think your key problem is that you pass a function to FunctionManager.register() instead of a class. I also had to remove your zip and build the argument list manually to make it work, but you can probably fix that; this is just an example.
The following code works and does something, using your exact structure. I would do this a bit differently and not use BaseManager, but I assume you have your reasons.
#!/usr/bin/python3.5
from multiprocessing import Pool
from multiprocessing.managers import BaseManager
from itertools import repeat

class FunctionManager(BaseManager):
    pass

class maClass(object):
    def __init__(self):
        pass

    def maFunction(self, a, b):
        print(a + b)

def threadedFunction(f_i_args):
    (f, i, args) = f_i_args
    f(*args)

FunctionManager.register('Foobar', maClass)
myManager = FunctionManager()
myManager.start()
foobar = myManager.Foobar()

foobar.maFunction(0, 0)                           # Test 1
threadedFunction((foobar.maFunction, 0, (1, 1)))  # Test 2

p = Pool()
#args = list(zip(repeat(foobar.maFunction), range(10), repeat((2, 2))))
args = []
for i in range(10):
    args.append([foobar.maFunction, i, (i, 2)])
p.map(threadedFunction, args) # Does now work
p.close()
p.join()
myManager.shutdown()
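For reference, this is roughly what I meant by doing it differently: a plain module-level function is pickled by its qualified name, so you can hand it to the pool directly, without any manager. A minimal sketch:

from multiprocessing import Pool

def maFunction(a, b):
    print(a + b)

def threadedFunction(f_i_args):
    (f, i, args) = f_i_args
    f(*args)

if __name__ == '__main__':
    # Module-level functions pickle by name, so passing them
    # as arguments to the pool workers just works.
    args = [(maFunction, i, (i, 2)) for i in range(10)]
    with Pool() as p:
        p.map(threadedFunction, args)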
Or did I misunderstand your problem completely?
Hannu
Related
I was trying to call a DLL library from a multiprocessing Pool in Python and ran into the following problem. I've created a simple DLL to illustrate it. Here's its source code, which contains only one function that sums two double numbers:
extern "C" {
double sum(double x, double y);
}
double sum(double x, double y) {
return x + y;
}
I compile it on a Linux system using
g++ -fPIC -c dll_main.cpp
g++ dll_main.o -shared -o sum_dll.so
I use this DLL in the following Python script:
#!/usr/bin/env python3.8
from ctypes import *
import multiprocessing
from multiprocessing import Pool, freeze_support

def run_dll(dll_obj, x, y):
    x_c = c_double(x)
    y_c = c_double(y)
    z = dll_obj.sum(x_c, y_c)
    return z

def main():
    pool = Pool(processes=2)
    dll_obj = cdll.LoadLibrary('./sum_dll.so')
    dll_obj.sum.restype = c_double
    z = pool.map(run_dll, [(dll_obj, 2, 3), (dll_obj, 3, 4)])
    pool.close()
    pool.join()

if __name__ == "__main__":
    freeze_support()
    main()
I get the following error message:
Traceback (most recent call last):
File "./run_dll.py", line 24, in <module>
main()
File "./run_dll.py", line 17, in main
z = pool.map(run_dll, [(dll_obj, 2, 3), (dll_obj, 3, 4)])
File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
raise self._value
File "/usr/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'CDLL.__init__.<locals>._FuncPtr'
What am I doing wrong? How do I properly use a DLL from several processes in Python?
You can't pickle DLL pointers, because accessing a DLL requires a system call to link it to your process.
Since you cannot pass DLL functions as arguments, you need to wrap them in a Python function, and you should define that function the same way you'd define a normal Python function, in the global scope:
from ctypes import *
import multiprocessing
from multiprocessing import Pool, freeze_support

dll_obj = cdll.LoadLibrary('./sum_dll.so')
dll_obj.sum.restype = c_double

def pyrun_dll(x, y):  # looks for the dll in global scope
    x_c = c_double(x)
    y_c = c_double(y)
    z = dll_obj.sum(x_c, y_c)
    return z

def dll_sum(x_c, y_c):  # pickleable wrapper
    return dll_obj.sum(x_c, y_c)

def run_dll(py_sum, x, y):  # arguments must be pickleable
    x_c = c_double(x)
    y_c = c_double(y)
    z = py_sum(x_c, y_c)
    return z
Then you call these Python functions in your main:
def main():
    pool = Pool(processes=2)
    z = pool.starmap(pyrun_dll, [(2, 3), (3, 4)])
    z2 = pool.starmap(run_dll, [(dll_sum, 2, 3), (dll_sum, 3, 4)])
    pool.close()
    pool.join()

if __name__ == "__main__":
    freeze_support()
    main()
The way the above code executes depends on your operating system, but it will work on all platforms (assuming you change the DLL name for each platform), because the workers will either fork the DLL reference from the global scope or create one of their own when they import your file.
Your original function will work if you pass dll_sum to it instead of the raw DLL function handle.
If you need your DLL to be loaded dynamically, you should have each process make the call to cdll to link the DLL itself (usually through the initializer), not get it through a function argument.
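A minimal sketch of that initializer approach (the helper names here are my own, not from your code):

from ctypes import cdll, c_double
from multiprocessing import Pool

dll_obj = None  # populated separately in each worker

def init_worker(dll_path):
    # Each worker links the DLL itself, so no DLL handle ever
    # has to cross a process boundary via pickling.
    global dll_obj
    dll_obj = cdll.LoadLibrary(dll_path)
    dll_obj.sum.restype = c_double

def worker_sum(x, y):
    return dll_obj.sum(c_double(x), c_double(y))

if __name__ == '__main__':
    with Pool(processes=2, initializer=init_worker,
              initargs=('./sum_dll.so',)) as pool:
        print(pool.starmap(worker_sum, [(2, 3), (3, 4)]))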
I'm trying to decrease running time by using multiprocessing, but I got a weird error: TypeError: cannot pickle 'weakref' object.
I'm not quite sure why this error occurs, because I have used the same approach in another program and it ran normally. Can someone explain why it occurs?
I already followed this solution, but it did not work for me.
import multiprocessing
from scipy import stats
import numpy as np
import pandas as pd

class T_TestFeature:
    def __init__(self, data, classes):
        self.data = data
        self.classes = classes
        self.manager = multiprocessing.Manager()
        self.pval = self.manager.list()

    def preform(self):
        process = []
        for i in range(10):
            process.append(multiprocessing.Process(target=self.t_test, args=(i,)))
        for p in process:
            p.start()
        for p in process:
            p.join()

    def t_test(self, k):
        index_samples = np.array(self.data)[:, k]
        rs1 = [index_samples[i] for i in range(len(index_samples)) if self.classes[i] == "Virginia"]
        rs2 = [index_samples[i] for i in range(len(index_samples)) if self.classes[i] != "Virginia"]
        self.pval.append(stats.ttest_ind(rs1, rs2, equal_var=False).pvalue)

def main():
    df = pd.read_excel("/Users/xxx/Documents/Project/src/flattened.xlsx")
    flattened = df.values.T
    y = df.columns
    result = T_TestFeature(flattened, y)
    result.preform()
    print(result.pval)

if __name__ == "__main__":
    main()
Traceback (most recent call last):
File "/Users/xxx/Documents/Project/src/t_test.py", line 41, in <module>
main()
File "/Users/xxx/Documents/Project/src/t_test.py", line 37, in main
result.preform()
File "/Users/xxx/Documents/Project/src/t_test.py", line 21, in preform
p.start()
File "/Users/xxx/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/xxx/opt/anaconda3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/xxx/opt/anaconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/xxx/opt/anaconda3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/x/opt/anaconda3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _xxlaunch
reduction.dump(process_obj, fp)
File "/Users/xxx/opt/anaconda3/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'weakref' object
Here is a simpler way to reproduce your issue:
from multiprocessing import Manager, Process

class A:
    def __init__(self):
        self.manager = Manager()

    def start(self):
        print("started")

if __name__ == "__main__":
    a = A()
    proc = Process(target=a.start)
    proc.start()
    proc.join()
You cannot pickle instances containing manager objects, because they contain a reference to the manager process they started (therefore, in general, you can't pickle instances containing objects of class Process).
A simple fix would be to not store the manager. It will automatically be garbage collected once no references to the managed list remain:
def __init__(self, data, classes):
    self.data = data
    self.classes = classes
    manager = multiprocessing.Manager()
    self.pval = manager.list()
I've run into some behavior of Python multiprocessing that I cannot understand.
For example:
from multiprocessing import Pool
import time
import sys

def f(x):
    time.sleep(10)
    print(x)
    return x * x

def f2(x, f):
    time.sleep(10)
    print(x, file=f)
    return x * x

if __name__ == '__main__':
    p = Pool(5)
    for t in range(10):
        p.apply_async(f, args=(t,))
    p.close()
    p.join()  # Here it blocks and prints the numbers, which is normal.

    p = Pool(5)
    for t in range(10):
        p.apply_async(f2, args=(t, sys.stdout))
    p.close()
    p.join()  # Here it does not block and nothing happens (no output at all)...
The output is:
3
1
0
2
4
5
9
6
7
8
I know that we have to use something like shared variables when passing state to a function used with multiprocessing and apply_async, but what happens if I pass a normal variable to a function used in apply_async?
The multiprocessing.Pool executes your logic in a separate process. If the logic raises an exception, the Pool returns it to the caller.
In your code you are not collecting the results of your functions, so you don't notice the real issue.
Try to modify your code as follows:
p = Pool(5)
for t in range(10):
    task = p.apply_async(f2, args=(t, sys.stdout))
    task.get()
You will then get the actual exception which was raised within f2:
Traceback (most recent call last):
File "asd.py", line 24, in <module>
p.apply_async(f2, args=(t, sys.stdout)).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
It turns out that sys.stdout is not picklable. Which, in this case, is not an issue, as sys.stdout is unique per process. You can avoid passing it through the function arguments and just use it as-is within f2.
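A minimal sketch of that fix, just dropping the file argument:

import sys
import time

def f2(x):
    time.sleep(10)
    # Each worker process already has its own sys.stdout,
    # so there is no need to pass it in as an argument.
    print(x, file=sys.stdout)
    return x * x

The call then becomes p.apply_async(f2, args=(t,)).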
I'm trying to write to the same shared array from a parallel-processing Python script.
When I do it outside a class, in a normal script, everything works fine. But when I try to do it through a class (using the same code), I get:
RuntimeError: SynchronizedArray objects should only be shared between processes through inheritance
My script is the following (without a class):
import numpy
import ctypes
from multiprocessing import Pool, Array, cpu_count

n = 2
total_costs_matrix_base = Array(ctypes.c_double, n*n)
total_costs_matrix = numpy.ctypeslib.as_array(
    total_costs_matrix_base.get_obj())
total_costs_matrix = total_costs_matrix.reshape(n, n)

def set_total_costs_matrix(i, j, def_param=total_costs_matrix_base):
    total_costs_matrix[i, j] = i * j

if __name__ == "__main__":
    pool = Pool(processes=cpu_count())

    iterable = []
    for i in range(n):
        for j in range(i+1, n):
            iterable.append((i, j))

    pool.starmap(set_total_costs_matrix, iterable)

    total_costs_matrix.dump('some/path/to/file')
That script works well. The one that doesn't is the following (which uses a class):
import numpy
import ctypes
from multiprocessing import Pool, Array, cpu_count

class CostComputation(object):
    """Computes the cost matrix."""

    def __init__(self):
        self.n = 2
        self.total_costs_matrix_base = Array(ctypes.c_double, self.n*self.n)
        self.total_costs_matrix = numpy.ctypeslib.as_array(
            self.total_costs_matrix_base.get_obj())
        self.total_costs_matrix = self.total_costs_matrix.reshape(self.n, self.n)

    def set_total_costs_matrix(self, i, j, def_param=None):
        def_param = self.total_costs_matrix_base
        self.total_costs_matrix[i, j] = i * j

    def write_cost_matrix(self):
        pool = Pool(processes=cpu_count())

        iterable = []
        for i in range(self.n):
            for j in range(i+1, self.n):
                iterable.append((i, j))

        pool.starmap(self.set_total_costs_matrix, iterable)

        self.total_costs_matrix.dump('some/path/to/file')
After this, I would call write_cost_matrix from another file, after creating an instance of CostComputation.
I read this answer but still couldn't solve my problem.
I'm using Python 3.4.2 on Mac OS X Yosemite 10.10.4.
EDIT
When using the class CostComputation, the script I'm using is:

from cost_computation import CostComputation

cc = CostComputation()
cc.write_cost_matrix()
The whole error is:
Traceback (most recent call last):
File "app.py", line 65, in <module>
cc.write_cost_matrix()
File "/path/to/cost_computation.py", line 75, in write_cost_matrix
pool.starmap(self.set_total_costs_matrix, iterable)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 268, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 599, in get
raise self._value
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
put(task)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/sharedctypes.py", line 192, in __reduce__
assert_spawning(self)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 347, in assert_spawning
' through inheritance' % type(obj).__name__
RuntimeError: SynchronizedArray objects should only be shared between processes through inheritance
Try creating a second class which contains the shared data only. Then use that class object in your main class.
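A minimal sketch of that idea, assuming the fork start method (the default on your Python 3.4 / OS X setup), so that the module-level shared object is inherited by the worker processes instead of being pickled:

import numpy
import ctypes
from multiprocessing import Pool, Array, cpu_count

class SharedData(object):
    """Second class: holds only the shared data."""
    def __init__(self, n):
        self.n = n
        self.base = Array(ctypes.c_double, n * n)
        self.matrix = numpy.ctypeslib.as_array(self.base.get_obj()).reshape(n, n)

# Created at module level, so forked workers inherit it rather
# than receiving it through pickled arguments.
shared = SharedData(2)

def set_total_costs_matrix(i, j):
    shared.matrix[i, j] = i * j

class CostComputation(object):
    """Main class: uses the shared-data object but is never pickled with it."""
    def write_cost_matrix(self):
        pool = Pool(processes=cpu_count())
        iterable = [(i, j) for i in range(shared.n)
                    for j in range(i + 1, shared.n)]
        pool.starmap(set_total_costs_matrix, iterable)
        pool.close()
        pool.join()
        shared.matrix.dump('some/path/to/file')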
Here is the code I am using:
import os
import numpy as np
import sympy as sy
from multiprocessing import Pool

def initFunction(arg1, arg2):
    def funct(value):
        return arg1 * arg2 * value
    return funct

os.system("taskset -p 0xff %d" % os.getpid())
pool = Pool(processes=4)

t = np.linspace(0, 1, 10e3)
a, b, c, d, e, f, g, h = sy.symbols('a,b,c,d,e,f,g,h', commutative=False)
arg1 = sy.Matrix([[a, b], [c, d]])
arg2 = sy.Matrix([[e, f], [g, h]])

myFunct = initFunction(arg1, arg2)

m3 = map(myFunct, t)       # this works
m4 = pool.map(myFunct, t)  # this does NOT work
The error I'm getting is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/home/justin/Research/mapTest.py", line 46, in <module>
m4 = pool.map(myFunct,t)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
So what does this error mean and how can I multiprocess this map function?
Objects that you pass between processes when using multiprocessing must be importable from the __main__ module, so that they can be unpickled in the child. Nested functions, like funct, are not importable from __main__, so you get that error. You can achieve what you're trying to do with a functools.partial instead:
from multiprocessing import Pool
from functools import partial

def funct(arg1, arg2, value):
    return arg1 * arg2 * value

if __name__ == "__main__":
    t = [1, 2, 3, 4]
    arg1 = 4
    arg2 = 5
    pool = Pool(processes=4)
    func = partial(funct, arg1, arg2)
    m4 = pool.map(func, t)
    print(m4)
Output:
[20, 40, 60, 80]