I am new to multiprocessing and I need your help.
I have four variables, each of which can take up to four values (integers or floats), and I stored them in a list called par = [A, B, C, D] (see below).
I have created a list of possible combinations with par = itertools.product(*par) .
Then, I call a function func1, that takes these arguments and some more and calculates stuff. With the results of the func1, I call another function that calculates stuff and then writes to a file.
I want to run all of this in parallel with multiprocessing.Pool. My idea was to wrap func1 and func2 in another function, func_run, and map it over the list par created above.
To summarize, my code looks like:
#values that I will use for func1
r = np.logspace(np.log10(5),np.log10(300),300)
T = 200*r
#Parameters for the sim
A = [0.1, 0.05, 0.001, 0.005]
B = [0.005, 0.025, 0.05, 0.1]
C = [20, 60, 100, 200]
D = [10, 20, 40, 80]
#Store them in a list
par = [A, B, C, D]
#Create a list with all combinations
par = list(itertools.product(*par))
def func_run(param):
    for i in range(len(param)):
        # Call func1
        values = func1(param[i][0],param[i][1],param[i][2], param[i][3], r, T)
        x = values[0]
        y = values[1]
        # and so on
        # Call func2
        results = func2(x,y,...)
        z = results[0]
        w = results[1]
        # and so on
        data_dict = {'result 1': [param[i][0]], 'result 2' : [param[i][1]]}
        df = pd.DataFrame(data=data_dict)
        with open(filename, 'a') as f:
            df.to_csv(f, header=False)
    return
Then, I call the func_run with multiprocessing.
from multiprocessing import Pool
pool = Pool(processes=4)
results = pool.map(func_run, par)
As a result, I get a TypeError with this traceback:
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/user/anaconda3/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-14-5ce94acfd95e>", line 5, in run
values = calc_val(param[i][0],param[i][1],param[i][2], param[i][3], r, T)
TypeError: 'float' object is not subscriptable
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-15-f45146f68f66> in <module>()
1 pool = Pool(processes=4)
----> 2 test = pool.map(run,par)
~/anaconda3/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
264 in a list that is returned.
265 '''
--> 266 return self._map_async(func, iterable, mapstar, chunksize).get()
267
268 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
TypeError: 'float' object is not subscriptable
Unfortunately, I cannot post the full functions and what they do, because they are hundreds of lines long. I hope you can get the idea even though you cannot reproduce it yourselves.
Is it possible to run something like this with multiprocessing, or do I need a different approach?
It would be great if anyone could help me understand the error and make it run.
The result of
par = list(itertools.product(*par))
is a list of tuples of floats (and ints). Pool.map() takes an iterable as its second argument and maps over its items, passing each one individually to the given func. In other words, inside func_run(param), param is not a list of tuples of numbers but a single tuple of numbers, and so
param[i][0]
tries to take item 0 of the ith number in that tuple, which makes no sense for a float, hence the exception. You should probably remove the for-loop in func_run():
def func_run(param):
    values = func1(param[0], param[1], param[2], param[3], r, T)
    ...
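To make the per-item behaviour of Pool.map() concrete, here is a minimal sketch. The bodies of func1 and func2 are hypothetical stand-ins (the real ones are not shown in the question), the parameter lists are shortened, and func_run returns its result instead of writing to a file, which is an assumption made only to keep the example self-contained.

```python
import itertools
from multiprocessing import Pool

# Hypothetical stand-ins for the real func1/func2, which the question does not show.
def func1(a, b, c, d):
    return a + b, c * d

def func2(x, y):
    return x * y

def func_run(param):
    # param is ONE tuple such as (0.1, 0.005, 20, 10), not the whole list.
    a, b, c, d = param
    x, y = func1(a, b, c, d)
    return param, func2(x, y)

# Shortened parameter lists, for illustration only.
A = [0.1, 0.05]
B = [0.005, 0.025]
C = [20, 60]
D = [10, 20]
par = list(itertools.product(A, B, C, D))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each worker call receives a single element of par.
        results = pool.map(func_run, par)
    print(len(results))
```

Returning the values and writing the file once in the parent is one way to avoid several workers appending to the same file at the same time; whether that fits the real workload is a judgment call.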
I have looked around on SO and surprisingly not found an answer to this question. I assume this is because normally inner/nested functions are used for something in particular (eg. maintaining an environment variable, factories) as opposed to something trivial like I'm trying to use them for. In any case, I can't seem to find any information on how to properly call an inner function from an outer function without having to declare inner() above outer() in the file. The problem is from this problem on HackerRank (https://www.hackerrank.com/challenges/circular-array-rotation/problem).
def circularArrayRotation(a, k, queries):
    def rotateArrayRightCircular(arr: list, iterations: int) -> list:
        """
        Perform a 'right circular rotation' on an array for number of iterations.
        Note: function actually moves last 'iterations' elements of array to front of array.
        >>>rotateArrayRightCircular([0,1,2], 1)
        [2,0,1]
        >>>rotateArrayRightCircular([0,1,2,3,4,5], 3)
        [3,4,5,0,1,2]
        >>>rotateArrayRightCircular([0,1,2,3,4,5], 6)
        [0,1,2,3,4,5]
        """
        return arr[-1 * iterations:] + arr[0:-1 * iterations]
    k = k % len(a)
    a = rotateArrayRightCircular(a, k)
    res = []
    for n in queries:
        res.append(a[n])
    return res
The code above does what I want it to, but it's somehow inelegant to me that I have to put the inner function call after the inner function definition. Various errors with different attempts:
# trying 'self.inner()'
Traceback (most recent call last):
File "solution.py", line 52, in <module>
result = circularArrayRotation(a, k, queries)
File "solution.py", line 13, in circularArrayRotation
a = self.rotateArrayRightCircular(a, k)
NameError: name 'self' is not defined
# Removing 'self' and leaving the definition of inner() after the call to inner()
Traceback (most recent call last):
File "solution.py", line 52, in <module>
result = circularArrayRotation(a, k, queries)
File "solution.py", line 13, in circularArrayRotation
a = rotateArrayRightCircular(a, k)
UnboundLocalError: local variable 'rotateArrayRightCircular' referenced before assignment
Any idea how I could include def inner() after the call to inner() without throwing an error?
A function body is executed from top to bottom, and a def statement only creates the function (and binds its name) when execution reaches it, so what you want is just not possible.
You could move the function in front of the outer one, making it a top-level function itself, possibly adding some parameters (not necessary here). (BTW, it looks so generic that other parts of the code might want to use it as well, so why not make it top-level?)
But otherwise, you are stuck. It is essentially the same situation as in
def f():
    print(a) # a doesn't exist yet, so this is an error
    a = 4
Well, you could do it this way:
def circularArrayRotation(a, k, queries):
    def inner_code():
        # Use a new name here: assigning to a or k inside inner_code would
        # make them local to it and raise UnboundLocalError.
        rotated = rotateArrayRightCircular(a, k % len(a))
        # BTW, instead of the following, you could just do
        # return [rotated[n] for n in queries]
        res = []
        for n in queries:
            res.append(rotated[n])
        return res
    def rotateArrayRightCircular(arr: list, iterations: int) -> list:
        """
        Perform a 'right circular rotation' on an array for number of iterations.
        Note: function actually moves last 'iterations' elements of array to front of array.
        >>>rotateArrayRightCircular([0,1,2], 1)
        [2,0,1]
        >>>rotateArrayRightCircular([0,1,2,3,4,5], 3)
        [3,4,5,0,1,2]
        >>>rotateArrayRightCircular([0,1,2,3,4,5], 6)
        [0,1,2,3,4,5]
        """
        return arr[-1 * iterations:] + arr[0:-1 * iterations]
    return inner_code()
but I don't see that you gain anything from it.
This is not possible in Python, but it is possible in other languages such as JavaScript and PHP, where it is called function hoisting.
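The rule that a name is bound only when its def statement runs can be seen in a small sketch. The names outer, helper, broken and inner are hypothetical and chosen only for this illustration.

```python
def outer():
    # helper is looked up when this line RUNS, not when outer is defined,
    # so defining helper further down the module is fine.
    return helper(3)

def helper(x):
    return x * 2

def broken():
    try:
        # inner is a local name of broken (because of the def below),
        # but it has not been bound yet at this point.
        return inner()
    except UnboundLocalError as exc:
        return type(exc).__name__
    def inner():  # never reached
        return 1

print(outer())
print(broken())
```

At module level the order of definitions does not matter as long as every name exists by the time the call actually happens; inside a single function body the call simply has to come after the def.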
Hi, I'm trying to use multiprocessing to speed up my code. However, apply_async doesn't work for me. I tried a simple example like:
from multiprocessing.pool import Pool
t = [0, 1, 2, 3, 4, 5]
def cube(x):
    t[x] = x**3
pool = Pool(processes=4)
for i in range(6):
    pool.apply_async(cube, args=(i, ))
for x in t:
    print(x)
It does not really change t as I would expect.
My real code is like:
from multiprocessing.pool import Pool
def func(a, b, c, d):
    #some calculations
    #save result to files
    #no return value
lt = #list of possible value of a
#set values to b, c, d
p = Pool()
for i in lt:
    p.apply_async(func, args=(i, b, c, d, ))
Where are the problems here?
Thank you!
Update: Thanks to the comments and answers, now I understand why my simple example won't work. However, I'm still in trouble with my real code. I have checked that my func does not rely on any global variable, so it seems not to be the same problem as my example code.
As suggested, I added a return value to my func, now my code is:
f = Flux("reactor")
d = Detector("Ge")
mv = arange(-6, 1.5, 0.5)
p = Pool()
lt = ["uee", "dee"]
for i in lt:
    re = p.apply_async(res, args=(i, d, f, mv, ))
    print(re.get())
p.close()
p.join()
Now I get the following error:
Traceback (most recent call last):
File "/Users/Shu/Documents/Programming/Python/Research/debug.py", line 35, in <module>
print(re.get())
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Flux.__init__.<locals>.<lambda>'
EDIT: the first example you provided will not work for a simple reason: processes do not share memory. Each worker modifies its own copy of t, so the change t[x] = x**3 never reaches the parent process and the values of the list t there stay unchanged.
You need to actually return the value from the computation and build a new list from the returned values.
from multiprocessing import Pool
def cube(x):
    return x**3
t = [0, 1, 2, 3, 4, 5]
p = Pool()
t = p.map(cube, t)
print(t)
If, as in your second example, the results are not supposed to be returned but written to files by each task, and that does not happen, I'd recommend checking the result object of each call to see whether the function itself is raising an exception.
Call get() on the actual results and see what happens:
p = Pool()
for i in lt:
    res = p.apply_async(func, args=(i, b, c, d, ))
    print(res.get()) # this will raise an exception if it happens within func
p.close() # do not accept any more tasks
p.join() # wait for the completion of all scheduled jobs
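Calling get() right after each apply_async() waits for that task before the next one is submitted, which is fine for debugging but runs the tasks one at a time. A possible variant, sketched below, submits everything first and collects afterwards. The helper name run_all is hypothetical, and cube stands in for the real func.

```python
from multiprocessing import Pool

def cube(x):
    return x ** 3

def run_all(pool, func, arg_tuples):
    # Submit everything first so the tasks can run concurrently ...
    pending = [pool.apply_async(func, args=args) for args in arg_tuples]
    # ... then collect; get() re-raises any exception raised inside func.
    return [job.get() for job in pending]

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(run_all(pool, cube, [(i,) for i in range(6)]))
```

The results come back in submission order because the AsyncResult objects are kept in a list and read in that order.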
Your script quits too soon; try adding this code at the end of it:
import time
time.sleep(3)
Could someone explain what is wrong with the code below?
from multiprocessing import Pool
def sq(x):
    yield x**2
p = Pool(2)
n = p.map(sq, range(10))
I am getting the following error:
MaybeEncodingError Traceback (most recent call
last) in ()
5 p = Pool(2)
6
----> 7 n = p.map(sq, range(10))
/home/devil/anaconda3/lib/python3.4/multiprocessing/pool.py in
map(self, func, iterable, chunksize)
258 in a list that is returned.
259 '''
--> 260 return self._map_async(func, iterable, mapstar, chunksize).get()
261
262 def starmap(self, func, iterable, chunksize=None):
/home/devil/anaconda3/lib/python3.4/multiprocessing/pool.py in
get(self, timeout)
606 return self._value
607 else:
--> 608 raise self._value
609
610 def _set(self, i, obj):
MaybeEncodingError: Error sending result: '[, ]'. Reason:
'TypeError("can't pickle generator objects",)'
Many thanks in advance.
You have to use a regular function here, not a generator function: replace yield with return in sq. Pool has to pickle each result to send it back to the parent process, and generator objects cannot be pickled.
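The pickling limitation named in the traceback can be checked without starting any worker processes. The sketch below uses a hypothetical helper, can_pickle, and a generator variant called sq_gen for comparison.

```python
import pickle

def sq_gen(x):
    # Generator function: calling it returns a generator object.
    yield x ** 2

def sq(x):
    # Regular function: calling it returns a plain int.
    return x ** 2

def can_pickle(obj):
    # True if obj survives pickle.dumps, which Pool needs for results.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

print(can_pickle(sq(3)))
print(can_pickle(sq_gen(3)))
```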
Moreover, when trying to create a working version on Windows, I had a strange repeating error message.
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
Literally quoting the comment I got, since it's self-explanatory:
"the error on windows is because each process spawns a new python process which interprets the python file etc. so everything outside the 'if main block' is executed again"
So to be portable, you have to guard the pool creation with if __name__=="__main__": when running this module:
from multiprocessing import Pool
def sq(x):
    return x**2
if __name__=="__main__":
    p = Pool(2)
    n = p.map(sq, range(10))
    print(n)
Result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Edit: if you don't want to store the values beforehand, you can use imap:
n = p.imap(sq, range(10))
n is now a lazy iterator rather than a list. To consume the values, I force iteration through list() and get the same result as above:
print(list(n))
Note that, according to the documentation, imap can be much slower than map on very long iterables when it is left at its default chunksize of 1.
I have a function returned by theano.function(), and I want to use it within multiprocessing for speedup. The following is a simplified demo script to show where I run into problem:
import numpy as np
from multiprocessing import Pool
from functools import partial
import theano
from theano import tensor
def get_theano_func():
    x = tensor.dscalar()
    y = x + 0.1
    f = theano.function([x], [y])
    return f
def func1(func, x):
    return func(x)
def MPjob(xlist):
    f = get_theano_func()
    fp = partial(func1, func=f)
    pool = Pool(processes=5)
    Results = pool.imap(fp, xlist)
    Y = []
    for y in Results:
        Y.append(y[0])
    pool.close()
    return Y
if __name__ == '__main__':
    xlist = np.arange(0, 5, 1)
    Y = MPjob(xlist)
    print(Y)
In the above code, the theano function 'f' is passed to 'func1()' as an input argument. If MPjob() runs correctly, it should return [0.1, 1.1, 2.1, 3.1, 4.1]. However, the exception "TypeError: func1() got multiple values for argument 'func'" is raised.
The full traceback is as follows:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Python35\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
TypeError: func1() got multiple values for argument 'func'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:/DaweiLeng/Code/workspace/Python/General/theano_multiprocess_debug.py", line 36, in <module>
Y = MPjob(xlist)
File "F:/DaweiLeng/Code/workspace/Python/General/theano_multiprocess_debug.py", line 29, in MPjob
for y in Results:
File "C:\Python35\lib\multiprocessing\pool.py", line 695, in next
raise value
TypeError: func1() got multiple values for argument 'func'
Anyone got a hint?
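It turns out the problem is in how partial() is used: partial(func1, func=f) binds f by keyword, but imap passes each x positionally, so x lands in the first parameter, func, as well. The full explanation is here: https://github.com/Theano/Theano/issues/4720#issuecomment-232029702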
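The clash can be reproduced without Theano or a pool. In the sketch below, add_one is a hypothetical stand-in for the compiled Theano function, and bad and good are illustrative names; binding the first parameter positionally is one possible fix, not necessarily the one discussed in the linked issue.

```python
from functools import partial

def add_one(x):
    # Hypothetical stand-in for the compiled Theano function.
    return [x + 1]

def func1(func, x):
    return func(x)

bad = partial(func1, func=add_one)   # func bound by keyword
good = partial(func1, add_one)       # func bound positionally

try:
    bad(1)   # 1 is passed positionally and collides with func=add_one
except TypeError as exc:
    print("bad:", exc)

print("good:", good(1))
```

Whether the real Theano function can then be pickled and sent to worker processes is a separate question that this sketch does not address.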
I have the need to take a function of n parameters and n lists of values and apply the function to each possible permutation of arguments. I have looked in itertools and none of the functions are quite right. The following is my attempt. Can someone explain what I am doing wrong? Thanks.
def CrossReference(f, *args):
    result = []
    def inner(g, *argsinner):
        rest = argsinner[1:]
        a = argsinner[0]
        if type(a) is not list:
            a = [a]
        for x in a:
            h = partial(g, x)
            if len(rest) > 0:
                inner(h, rest)
            else:
                result.append(h())
    inner(f, args)
    return result
Here is my example test and error:
def sums(x,y,z):
    return x+y+z
CrossReference(sums, [1,2,3], 4, [5,6,7])
Traceback (most recent call last):
  File "", line 1, in
  File "", line 13, in CrossReference
  File "", line 12, in inner
TypeError: sums() takes exactly 3 arguments (1 given)
The problem is in the way you call your inner function. You define your function header as:
def inner(g, *argsinner):
But you call your function like:
inner(f, args)
And:
inner(h, rest)
This means that argsinner ends up as a one-element tuple (monotuple?) whose only item is the whole tuple of your args. You can either change your function definition to:
def inner(g, argsinner):
Or change both of your calls to unpack the tuple:
inner(f, *args)
and
inner(h, *rest)
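For comparison, the same cross-application can be sketched with itertools.product, which enumerates every combination of one value per argument. This is an alternative to the recursive version, not the asker's method; the name cross_reference is hypothetical.

```python
from itertools import product

def cross_reference(f, *args):
    # Wrap bare values so every argument is a list of candidates.
    choices = [a if isinstance(a, list) else [a] for a in args]
    # product(*choices) yields one tuple per combination of candidates.
    return [f(*combo) for combo in product(*choices)]

def sums(x, y, z):
    return x + y + z

print(cross_reference(sums, [1, 2, 3], 4, [5, 6, 7]))
# [10, 11, 12, 11, 12, 13, 12, 13, 14]
```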
def sums(x,y=0,z=0):
    return x+y+z
def apply(fn,*args):
    for a in args:
        try:
            yield fn(*a)
        except TypeError:
            try:
                yield fn(**a)
            except TypeError:
                yield fn(a)
print(list(apply(sums,[1,2,3], 4, [5,6,7])))
is one way you might do it (not the only way though)