Python multiprocessing map function error

I have a simple multiprocessing example that I'm trying to create. The ordinary map() version works, but when I change it to Pool.map, I get a strange error:
from multiprocessing import Pool
from functools import partial
x = [1,2,3]
y = 10
f = lambda x,y: x**2+y
# ordinary map works:
map(partial(f,y=y),x)
# [11, 14, 19]
# multiprocessing map does not
p = Pool(4)
p.map(partial(f, y=y), x)
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Pickling error? What is this exactly?

The arguments to Pool.map must be picklable. Module-level functions are picklable, but partial(f, y=y) is not defined at the module level and so is not picklable.
There is a simple workaround:
def g(x, y=y):
    return f(x, y)
p.map(g, x)
Functions made with functools.partial used to be unpicklable.
However, with Python 2.7 or later, you can also define g (at the module level) using functools.partial:
import multiprocessing as mp
import functools
def f(x, y):
    return x**2 + y
x = [1,2,3]
y = 10
g = functools.partial(f, y=y)
if __name__ == '__main__':
    p = mp.Pool()
    print(p.map(g, x))
This yields [11, 14, 19]. Note, however, that to get this result f had to be defined with def rather than lambda. I think this is because pickle relies on "fully qualified" name references to look up function objects.
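A quick way to check the rule both answers describe, assuming the snippet is run as a script (so its functions live in __main__): pickle stores a function by its importable name, so a def and a partial wrapping a def pickle fine, while a lambda, whose name is <lambda>, cannot be looked up and fails. The helper is_picklable is only for illustration.

```python
import functools
import pickle


def f(x, y):
    # Module-level def: pickle records it by reference as "__main__.f".
    return x**2 + y


f_lambda = lambda x, y: x**2 + y  # its __qualname__ is "<lambda>", not "f_lambda"


def is_picklable(obj):
    """Return True if pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_picklable(f))                                  # True
print(is_picklable(functools.partial(f, y=10)))         # True
print(is_picklable(f_lambda))                           # False
print(is_picklable(functools.partial(f_lambda, y=10)))  # False
```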

Related

Why does lambdify never stop?

from sympy import symbols, lambdify
x = symbols('x')
ch = 'exp(cos(cos(exp((sin(-0.06792841536110628))**(-6.045461643745118)))))'
f = lambdify(x, ch, "numpy")
print(float(f(2)))
It does not work: the program keeps running and never ends (no error is raised).
My goal is to catch this kind of case (among many others) with a try/except, but I can't, because no error is raised.
Why is no error raised?
How can I avoid these cases?
Thanks for your help!
In general, I'm not sure you can. SymPy or NumPy will keep trying to compute the number until precision is exhausted. But you can create a function that will raise an error if numbers are out of bounds for your interest:
>>> from sympy import cos as _cos, I, exp, sin
>>> def cos(x):
...     if abs(x) > 10**20: raise ValueError
...     return _cos(x)
>>> exp(cos(cos(exp(5*(1+I)))))
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 2, in cos
ValueError
>>> f = lambda x: exp(cos(cos(exp(x))))
>>> f(sin(-0.06792841536110628)**-6.045461643745118)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in <lambda>
File "<string>", line 2, in cos
ValueError
But you have to think carefully about when you want to raise such an error. For example, SymPy has no trouble computing f(100) or f(100*I) if the non-error-catching cos is used. So think about when you actually want the error to be raised.
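A sketch that combines the bounds-checking idea with lambdify, assuming NumPy is acceptable as the numeric backend: keep x symbolic so nothing is evaluated while the expression is built, and hand lambdify a bounds-checking cos through its modules argument (a dict listed before "numpy" takes priority) instead of shadowing SymPy's cos. The name checked_cos and the 1e20 limit are illustrative choices, not part of either answer.

```python
import numpy as np
import sympy

LIMIT = 1e20  # illustrative bound, mirroring 10**20 in the answer above


def checked_cos(z):
    """cos that raises instead of evaluating an argument beyond LIMIT."""
    if np.any(np.abs(z) > LIMIT):
        raise ValueError(f"cos argument too large: {z!r}")
    return np.cos(z)


x = sympy.symbols("x")
expr = sympy.exp(sympy.cos(sympy.cos(sympy.exp(x))))

# The dict comes first, so "cos" in the generated code resolves to checked_cos;
# the remaining names (exp) come from NumPy.
f = sympy.lambdify(x, expr, modules=[{"cos": checked_cos}, "numpy"])

print(f(1.0))  # an ordinary finite value
try:
    f(5 + 5j)  # |cos(exp(5+5j))| is far above LIMIT, so the outer cos raises
except ValueError as err:
    print("rejected:", err)
```

Because the check happens at call time, the try/except from the question becomes usable; choosing the threshold is still the judgment call the answer warns about.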
lambdify is a lexical translator, converting a SymPy expression into a Python/NumPy function.
Make a string with a symbol:
In [27]: ch = 'exp(cos(cos(exp((sin(x))**(-6.045461643745118)))))'
sympify(ch) has no problem, because it doesn't need to do any numeric calculation. So lambdify also works:
In [28]: f=lambdify(x,ch)
In [29]: f?
Signature: f(x)
Docstring:
Created with lambdify. Signature:
func(x)
Expression:
exp(cos(cos(exp((sin(x))**(-6.045461643745118)))))
Source code:
def _lambdifygenerated(x):
    return (exp(cos(cos(exp(sin(x)**(-6.045461643745118))))))
The equivalent with mpmath:
def _lambdifygenerated(x):
    return (exp(cos(cos(exp(sin(x)**(mpf((1, 54452677612106279, -53, 56))))))))
And a working numeric evaluation:
In [33]: f(0j)
Out[33]: mpc(real='nan', imag='0.0')

Python Numba @jit "Lowering Error"?

I have been making a program involving complex number calculations and three of the functions I am using are these:
import turtle
import cmath
import numpy as np
from numba import jit
@jit
def quadratics(arange=[0,10],brange=[0,100],crange=[0,100], step=2):
    l = []
    for a in range(arange[0],arange[1]+1,step):
        for b in range(brange[0],brange[1]+1,step):
            for c in range(crange[0],crange[1]+1,step):
                if a != 0:
                    l.append((-b+cmath.sqrt(b**2-4*a*c))/(2*a))
                    l.append((-b-cmath.sqrt(b**2-4*a*c))/(2*a))
    return l
def mindistance(point, roots):
    return min(np.array([(point.real-i.real)**2+(point.imag-i.imag)**2 for i in roots]))
@jit
def drawing_matrix(imsz=500,xrange=[-5,5],yrange=[-5,5],poly=2,acc=0.01):
    l = np.zeros((imsz, imsz))
    roots = quadratics()
    for x in range(0, imsz):
        for y in range(0, imsz):
            c = complex((x/imsz)*(xrange[1]-xrange[0])+xrange[0],(y/imsz)*(yrange[1]-yrange[0])+yrange[0])
            if mindistance(c, roots) <= acc:
                l[x,y] = 1
    return l
Now, I have been using Numba to speed things up with the @jit decorator and it's fine apart from mindistance(). If I put the @jit decorator on that function (which would be really useful, since it is called thousands of times during a program run), it produces the most almighty of error messages, ending with:
numba.errors.LoweringError: Failed at object (object mode backend)
make_function(code=<code object <listcomp> at 0x000001F460FAB540, file "C:\Users\Isky\Documents\IT\Programs\Mathematics\AlgebraicNumbers.py", line 19>, name=$const0.7, defaults=None, closure=$0.5)
File "AlgebraicNumbers.py", line 19
[1] During: lowering "$0.8 = make_function(code=<code object <listcomp> at 0x000001F460FAB540, file "C:\Users\Isky\Documents\IT\Programs\Mathematics\AlgebraicNumbers.py", line 19>, name=$const0.7, defaults=None, closure=$0.5)" at C:\Users\Isky\Documents\IT\Programs\Mathematics\AlgebraicNumbers.py (19)
which is line 19 (as in def mindistance()). Can you tell me why Numba doesn't like this function?
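No answer is recorded here, but the traceback itself is informative: make_function(code=<code object <listcomp> ...>) shows the list comprehension being compiled as a nested function (which is how Python before 3.12 implements comprehensions), and that is the step Numba reports it could not lower. A sketch of two rewrites of mindistance that avoid the comprehension; neither comes from the question, and whether the loop version compiles under your Numba version with @jit is an assumption you would need to test.

```python
import numpy as np


def mindistance_vectorized(point, roots):
    # Same quantity as mindistance: (p.real - r.real)**2 + (p.imag - r.imag)**2,
    # computed for every root at once with NumPy instead of a Python list.
    d = point - np.asarray(roots, dtype=np.complex128)
    return np.min(d.real**2 + d.imag**2)


def mindistance_loop(point, roots):
    # Explicit loop over a complex128 array: no comprehension, so no nested
    # function object (the make_function in the traceback) is created.
    best = np.inf
    for r in roots:
        dist = (point.real - r.real) ** 2 + (point.imag - r.imag) ** 2
        if dist < best:
            best = dist
    return best


roots = np.array([1 + 1j, 3 + 0j, -2j])
print(mindistance_vectorized(0j, roots), mindistance_loop(0j, roots))  # 2.0 2.0
```

Converting the result of quadratics() to a complex128 array once, before the double loop in drawing_matrix, also avoids rebuilding an array on every call.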

Python - apply_async doesn't execute function

Hi, I'm trying to use multiprocessing to speed up my code. However, apply_async doesn't work for me. I tried a simple example:
from multiprocessing.pool import Pool
t = [0, 1, 2, 3, 4, 5]
def cube(x):
    t[x] = x**3
pool = Pool(processes=4)
for i in range(6):
    pool.apply_async(cube, args=(i, ))
for x in t:
    print(x)
It does not really change t as I would expect.
My real code looks like this:
from multiprocessing.pool import Pool
def func(a, b, c, d):
    # some calculations
    # save result to files
    # no return value
    ...
lt = ...  # list of possible values of a
# set values of b, c, d
p = Pool()
for i in lt:
    p.apply_async(func, args=(i, b, c, d, ))
Where are the problems here?
Thank you!
Update: Thanks to the comments and answers, I now understand why my simple example won't work. However, I'm still having trouble with my real code. I have checked that my func does not rely on any global variables, so it doesn't seem to be the same problem as in my example code.
As suggested, I added a return value to my func; now my code is:
f = Flux("reactor")
d = Detector("Ge")
mv = arange(-6, 1.5, 0.5)
p = Pool()
lt = ["uee", "dee"]
for i in lt:
    re = p.apply_async(res, args=(i, d, f, mv, ))
    print(re.get())
p.close()
p.join()
Now I get the following error:
Traceback (most recent call last):
File "/Users/Shu/Documents/Programming/Python/Research/debug.py", line 35, in <module>
print(re.get())
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Flux.__init__.<locals>.<lambda>'
EDIT: the first example you provided will not work for a simple reason: processes do not share memory. Therefore, the change t[x] = x**3 will not be applied in the parent process, leaving the values of the list t unchanged.
You need to actually return the value from the computation and build a new list from that.
from multiprocessing import Pool
def cube(x):
    return x**3
if __name__ == '__main__':
    t = [0, 1, 2, 3, 4, 5]
    p = Pool()
    t = p.map(cube, t)
    print(t)
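If you want to stay with apply_async instead of map, a minimal sketch of the same idea: each call returns an AsyncResult whose get() hands back the worker's return value (or re-raises its exception). Submitting all tasks before calling get() lets them run concurrently; calling get() right after each apply_async waits for that task to finish before the next one is submitted. The helper name cube_all is just for illustration.

```python
from multiprocessing import Pool


def cube(x):
    # Return the result instead of mutating a global: workers don't share memory.
    return x ** 3


def cube_all(values):
    with Pool(processes=4) as pool:
        # Schedule every task first...
        async_results = [pool.apply_async(cube, args=(v,)) for v in values]
        # ...then collect; get() blocks until that particular task is done.
        return [r.get() for r in async_results]


if __name__ == "__main__":
    print(cube_all(range(6)))  # [0, 1, 8, 27, 64, 125]
```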
If, as you say in the second example, the results are not supposed to be returned but stored independently in files, and this does not happen, I'd recommend checking the return value of your function to see whether the function itself is raising an exception.
Get the actual results and see what happens:
p = Pool()
for i in lt:
    res = p.apply_async(func, args=(i, b, c, d, ))
    print(res.get())  # this will raise an exception if it happens within func
p.close()  # do not accept any more tasks
p.join()  # wait for the completion of all scheduled jobs
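The AttributeError in the update follows the same rule as the pickling question above: everything sent to a worker (here the Flux instance f passed in args) is pickled, and a lambda created inside Flux.__init__ has no importable name. Since the real Flux class isn't shown, this sketch uses hypothetical stand-in classes to reproduce the failure and one possible fix, turning the lambda into a regular method.

```python
import pickle


class FluxWithLambda:
    # Hypothetical stand-in for Flux: stores a lambda created in __init__,
    # whose qualified name ends in "__init__.<locals>.<lambda>".
    def __init__(self, scale):
        self.scale = scale
        self.shape = lambda e: self.scale * e


class FluxWithMethod:
    # Same behaviour, but as a method; instances pickle as a reference to the
    # class plus their attribute dict ({"scale": ...}).
    def __init__(self, scale):
        self.scale = scale

    def shape(self, e):
        return self.scale * e


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False


print(can_pickle(FluxWithLambda(2.0)))  # False: "Can't pickle local object ..."
print(can_pickle(FluxWithMethod(2.0)))  # True
```

Like the earlier examples, this assumes the classes are defined at module level in the script you run, so pickle can find them by name.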
The script exits before the pool's workers get to run; try adding this code at the end of your script:
import time
time.sleep(3)

Error Output Using Timer Object in Python

I have several functions whose performance I need to evaluate using the timeit module.
For starters, I'm attempting to use a Timer object to evaluate a sequential search function run on a list of random integers, which should return the time to execute in seconds. The function returns False for -1 (since it will never find -1), but gives me the following error along with it. Here is the complete output:
False
Traceback (most recent call last):
File "D:/.../search-test.py", line 37, in <module>
main()
File "D:/.../search-test.py", line 33, in main
print(t1.timeit(number=100))
File "C:\...\Anaconda2\lib\timeit.py", line 202, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
TypeError: sequential_search() takes exactly 2 arguments (0 given)
This is my program:
from timeit import Timer
import random
def sequential_search(a_list, item):
    pos = 0
    found = False
    while pos < len(a_list) and not found:
        if a_list[pos] == item:
            found = True
        else:
            pos = pos+1
    return found
def num_gen(value):
    myrandom = random.sample(xrange(0, value), value)
    return myrandom
def main():
    new_list = num_gen(100)
    print(sequential_search(new_list, -1))
    t1 = (Timer("sequential_search()", "from __main__ import sequential_search"))
    print(t1.timeit(number=100))
if __name__ == '__main__':
    main()
I'm a noob to programming and can honestly say that I'm struggling. This error doesn't make any sense to me. I don't understand why it's asking for the sequential_search function arguments when they're already passed in main(). Plugging the arguments into the Timer statement doesn't solve the problem.
Please help me understand what I've screwed up. Thank you!
This is how you make the Timer object:
t1 = (Timer("sequential_search(new_list, -1)", setup="from __main__ import sequential_search, num_gen;new_list=num_gen(100);"))
print(t1.timeit(number=100))
Output:
False
0.0014021396637
It didn't work simply because you were not passing the arguments. Initialize the variables in setup (not strictly necessary, though) and you are good to go.
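A Python 3 sketch of two alternatives to building the list inside setup (the question itself uses Python 2, where xrange exists): timeit.Timer also accepts a zero-argument callable, and since Python 3.5 it accepts a globals mapping in which the statement string is evaluated. Either way, the arguments have to be reachable from the timed code, which was the missing piece in the question.

```python
import random
import timeit


def sequential_search(a_list, item):
    pos = 0
    found = False
    while pos < len(a_list) and not found:
        if a_list[pos] == item:
            found = True
        else:
            pos = pos + 1
    return found


def main():
    new_list = random.sample(range(100), 100)  # range replaces Python 2's xrange
    # Option 1: time a zero-argument callable; the lambda supplies the arguments.
    t1 = timeit.Timer(lambda: sequential_search(new_list, -1))
    # Option 2 (Python 3.5+): evaluate the statement string in an explicit namespace.
    t2 = timeit.Timer(
        "sequential_search(new_list, -1)",
        globals={"sequential_search": sequential_search, "new_list": new_list},
    )
    return t1.timeit(number=100), t2.timeit(number=100)


if __name__ == "__main__":
    print(main())  # two timings in seconds; values vary by machine
```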

How to parallel a Theano function using multiprocessing?

I have a function returned by theano.function(), and I want to use it within multiprocessing for speedup. The following is a simplified demo script to show where I run into problem:
import numpy as np
from multiprocessing import Pool
from functools import partial
import theano
from theano import tensor
def get_theano_func():
    x = tensor.dscalar()
    y = x + 0.1
    f = theano.function([x], [y])
    return f
def func1(func, x):
    return func(x)
def MPjob(xlist):
    f = get_theano_func()
    fp = partial(func1, func=f)
    pool = Pool(processes=5)
    Results = pool.imap(fp, xlist)
    Y = []
    for y in Results:
        Y.append(y[0])
    pool.close()
    return Y
if __name__ == '__main__':
    xlist = np.arange(0, 5, 1)
    Y = MPjob(xlist)
    print(Y)
In the above code, the Theano function 'f' is fed to 'func1()' as an input argument. If MPjob() runs correctly, it should return [0.1, 1.1, 2.1, 3.1, 4.1]. However, the exception "TypeError: func1() got multiple values for argument 'func'" is raised.
The full traceback log is as follows:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Python35\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
TypeError: func1() got multiple values for argument 'func'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:/DaweiLeng/Code/workspace/Python/General/theano_multiprocess_debug.py", line 36, in <module>
Y = MPjob(xlist)
File "F:/DaweiLeng/Code/workspace/Python/General/theano_multiprocess_debug.py", line 29, in MPjob
for y in Results:
File "C:\Python35\lib\multiprocessing\pool.py", line 695, in next
raise value
TypeError: func1() got multiple values for argument 'func'
Anyone got a hint?
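It turns out to be related to the partial() function: with fp = partial(func1, func=f), the call fp(x) becomes func1(x, func=f), so x is bound positionally to the first parameter, func, which then also receives func=f as a keyword. The full explanation is here https://github.com/Theano/Theano/issues/4720#issuecomment-232029702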
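The binding problem can be reproduced without Theano or a pool. In this sketch, add_tenth is a hypothetical plain-Python stand-in for the compiled Theano function; binding func positionally (or reordering the parameters to func1(x, func)) lets the mapped value land in x. Whether the real Theano function then pickles cleanly for the pool is a separate question, covered by the linked issue.

```python
from functools import partial


def func1(func, x):
    return func(x)


def add_tenth(x):
    # Hypothetical stand-in for the Theano function: returns a one-element list.
    return [x + 0.1]


fp_bad = partial(func1, func=add_tenth)
# fp_bad(1) -> func1(1, func=add_tenth): 1 fills `func` positionally, then the
# keyword supplies a second value for the same parameter -> TypeError.
try:
    fp_bad(1)
except TypeError as err:
    print("fp_bad:", err)

# Binding `func` positionally leaves `x` free for the value supplied by map/imap.
fp_good = partial(func1, add_tenth)
print([fp_good(x)[0] for x in range(5)])
```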
