Error in use of python multiprocessing module with generator function. - python

Could someone explain what is wrong with the code below?
from multiprocessing import Pool
def sq(x):
    yield x**2
p = Pool(2)
n = p.map(sq, range(10))
I am getting the following error:
MaybeEncodingError Traceback (most recent call
last) in ()
5 p = Pool(2)
6
----> 7 n = p.map(sq, range(10))
/home/devil/anaconda3/lib/python3.4/multiprocessing/pool.py in
map(self, func, iterable, chunksize)
258 in a list that is returned.
259 '''
--> 260 return self._map_async(func, iterable, mapstar, chunksize).get()
261
262 def starmap(self, func, iterable, chunksize=None):
/home/devil/anaconda3/lib/python3.4/multiprocessing/pool.py in
get(self, timeout)
606 return self._value
607 else:
--> 608 raise self._value
609
610 def _set(self, i, obj):
MaybeEncodingError: Error sending result: '[, ]'. Reason:
'TypeError("can't pickle generator objects",)'
Many thanks in advance.

You have to use a regular function here, not a generator function: replace yield with return so that sq returns a value. Pool.map has to pickle each result to send it back to the parent process, and generator objects cannot be pickled.
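The pickling failure can be reproduced without a pool. This is a minimal sketch, assuming Python 3; sq_gen, sq_func and can_pickle are illustrative names, not part of the original question:

```python
import pickle

def sq_gen(x):
    # Generator function: calling it returns a generator object.
    yield x**2

def sq_func(x):
    # Plain function: calling it returns an int.
    return x**2

def can_pickle(obj):
    # Pool.map pickles every result before sending it back to the parent.
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

print(can_pickle(sq_func(3)))  # result of the plain function
print(can_pickle(sq_gen(3)))   # generator object
```

The first call reports True and the second False, which matches the TypeError wrapped in the MaybeEncodingError above.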
Moreover, when I tried to build a working version on Windows, I got this error message over and over:
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
Literally quoting the comment I got, since it's self-explanatory:
"The error on Windows is because each process spawns a new Python process which interprets the Python file etc., so everything outside the 'if main' block is executed again."
So to be portable, you have to guard the pool code with if __name__=="__main__": when running this module:
from multiprocessing import Pool
def sq(x):
    return x**2
if __name__=="__main__":
    p = Pool(2)
    n = p.map(sq, range(10))
    print(n)
Result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Edit: if you don't want to store the values beforehand, you can use imap:
n = p.imap(sq, range(10))
n is now a lazy iterator rather than a list. To consume the values, I force iteration through a list and I get the same result as above:
print(list(n))
Note that, according to the documentation, imap uses a default chunksize of 1, so for very long iterables it can be much slower than a call with a large chunksize.
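imap accepts a chunksize argument, and the documentation notes that for long iterables a large chunksize can be much faster than the default of 1. A sketch of how that might look; the value 100 and the helper name lazy_squares are arbitrary illustrations, not recommendations:

```python
from multiprocessing import Pool

def sq(x):
    return x**2

def lazy_squares(pool, values, chunksize=100):
    # imap returns an iterator that yields results in input order.
    # A larger chunksize sends the work to the workers in batches
    # instead of one item per task.
    return pool.imap(sq, values, chunksize=chunksize)

if __name__ == "__main__":
    with Pool(2) as p:
        # Consume the iterator while the pool is still alive.
        total = sum(lazy_squares(p, range(10000)))
    print(total)
```

The iterator is consumed inside the with block because leaving the block terminates the pool.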

Related

Python Multiprocessing, TypeError

I am new to multiprocessing and I need your help.
I have four variables, each of which can take up to four values (integers or floats), and I stored them in a list called par = [A, B, C, D] (see below).
I have created a list of all possible combinations with par = itertools.product(*par).
Then I call a function func1 that takes these arguments plus some more and calculates stuff. With the results of func1, I call another function, func2, that calculates more stuff and then writes to a file.
I want to run all of this in parallel with multiprocessing.Pool. My idea was to embed func1 and func2 in another function, called func_run, and map it over the list par I created above.
To summarize, my code looks like:
import itertools
import numpy as np
import pandas as pd
#values that I will use for func1
r = np.logspace(np.log10(5),np.log10(300),300)
T = 200*r
#Parameters for the sim
A = [0.1, 0.05, 0.001, 0.005]
B = [0.005, 0.025, 0.05, 0.1]
C = [20, 60, 100, 200]
D = [10, 20, 40, 80]
#Store them in a list
par = [A, B, C, D]
#Create a list with all combinations
par = list(itertools.product(*par))
def func_run(param):
    for i in range(len(param)):
        # Call func1
        values = func1(param[i][0],param[i][1],param[i][2], param[i][3], r, T)
        x = values[0]
        y = values[1]
        # and so on
        # Call func2
        results = func2(x,y,...)
        z = results[0]
        w = results[1]
        # and so on
        data_dict = {'result 1': [param[i][0]], 'result 2' : [param[i][1]]}
        df = pd.DataFrame(data=data_dict)
        with open(filename, 'a') as f:
            df.to_csv(f, header=False)
    return
Then, I call the func_run with multiprocessing.
from multiprocessing import Pool
pool = Pool(processes=4)
results = pool.map(func_run, par)
As a result, I get a TypeError with this traceback:
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/user/anaconda3/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-14-5ce94acfd95e>", line 5, in run
values = calc_val(param[i][0],param[i][1],param[i][2], param[i][3], r, T)
TypeError: 'float' object is not subscriptable
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-15-f45146f68f66> in <module>()
1 pool = Pool(processes=4)
----> 2 test = pool.map(run,par)
~/anaconda3/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
264 in a list that is returned.
265 '''
--> 266 return self._map_async(func, iterable, mapstar, chunksize).get()
267
268 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
TypeError: 'float' object is not subscriptable
Unfortunately, I cannot post the full functions and what they do because they are hundreds of lines long. I hope you can get the idea even though you cannot reproduce it yourselves.
Is it possible to run something like this with multiprocessing, or do I need a different approach?
It would be great if anyone can help me understand the error and make it run.
The result of
par = list(itertools.product(*par))
is a list of tuples of floats (and ints). Pool.map() takes an iterable as its second argument and maps over its items, passing them one at a time to the given func. In other words, inside func_run(param), param is not a list of tuples of numbers but a single tuple of numbers, and so
param[i][0]
is trying to index a float (the ith element of the tuple) with [0], which is not possible, hence the exception. Since each call already receives one parameter tuple, you should probably remove the for loop in func_run():
def func_run(param):
    values = func1(param[0], param[1], param[2], param[3], r, T)
    ...
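To see what each call receives, the built-in map can stand in for Pool.map, since both pass one item of the iterable per call. This is a minimal sketch with shortened parameter lists and a hypothetical stand-in for func1 (the real function is not shown in the question); Pool.starmap assumes Python 3.3 or later:

```python
import itertools
from multiprocessing import Pool

A = [0.1, 0.05]
B = [0.005, 0.025]
C = [20, 60]
D = [10, 20]
par = list(itertools.product(A, B, C, D))

def func1(a, b, c, d):
    # Hypothetical stand-in for the real calculation.
    return a + b + c + d

def func_run(param):
    # param is a single tuple such as (0.1, 0.005, 20, 10).
    a, b, c, d = param
    return func1(a, b, c, d)

if __name__ == "__main__":
    serial = list(map(func_run, par))
    with Pool(processes=4) as pool:
        parallel = pool.map(func_run, par)
        # starmap unpacks each tuple into positional arguments,
        # so func1 can be called without a wrapper.
        starred = pool.starmap(func1, par)
    print(serial == parallel == starred)
```

Unpacking the tuple once at the top of func_run avoids the repeated param[0], param[1] indexing.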

Python - apply_async doesn't execute function

Hi I'm trying to use multiprocessing to speed up my code. However, the apply_async doesn't work for me. I tried to do a simple example like:
from multiprocessing.pool import Pool
t = [0, 1, 2, 3, 4, 5]
def cube(x):
    t[x] = x**3
pool = Pool(processes=4)
for i in range(6):
    pool.apply_async(cube, args=(i, ))
for x in t:
    print(x)
It does not really change t as I would expect.
My real code is like:
from multiprocessing.pool import Pool
def func(a, b, c, d):
    #some calculations
    #save result to files
    #no return value
lt = #list of possible value of a
#set values to b, c, d
p = Pool()
for i in lt:
    p.apply_async(func, args=(i, b, c, d, ))
Where are the problems here?
Thank you!
Update: Thanks to the comments and answers, now I understand why my simple example won't work. However, I'm still in trouble with my real code. I have checked that my func does not rely on any global variable, so it seems not to be the same problem as my example code.
As suggested, I added a return value to my func, now my code is:
f = Flux("reactor")
d = Detector("Ge")
mv = arange(-6, 1.5, 0.5)
p = Pool()
lt = ["uee", "dee"]
for i in lt:
    re = p.apply_async(res, args=(i, d, f, mv, ))
    print(re.get())
p.close()
p.join()
Now I get the following error:
Traceback (most recent call last):
File "/Users/Shu/Documents/Programming/Python/Research/debug.py", line 35, in <module>
print(re.get())
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Flux.__init__.<locals>.<lambda>'
EDIT: the first example you provided will not work for a simple reason: processes do not share memory. Therefore, the change t[x] = x**3 will not be applied to the parent process leaving the values of the list t unchanged.
You need to actually return the value from the computation and build a new list from that.
from multiprocessing import Pool

def cube(x):
    return x**3

if __name__ == '__main__':
    t = [0, 1, 2, 3, 4, 5]
    p = Pool()
    t = p.map(cube, t)
    print(t)
If, as you say in the second example, the results are not supposed to be returned but written to files independently, and this does not happen, I'd recommend checking the result of each call to see whether the function itself is raising an exception.
Retrieve the actual results and see what happens:
p = Pool()
for i in lt:
    res = p.apply_async(func, args=(i, b, c, d, ))
    print(res.get()) # this will raise an exception if it happens within func
p.close() # do not accept any more tasks
p.join() # wait for the completion of all scheduled jobs
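The AttributeError in the question's update is a pickling problem: apply_async pickles its arguments, and a lambda created inside __init__ is a local object that the standard pickle module cannot serialize. The Flux class is not shown, so the following is only a guess at its shape, with hypothetical class, function and attribute names:

```python
import pickle

def constant_spectrum(energy):
    # Module-level function: pickle can refer to it by name.
    return 1.0

class FluxWithLambda:
    def __init__(self):
        # A lambda defined here is local to __init__ and cannot be pickled.
        self.spectrum = lambda energy: 1.0

class FluxWithFunction:
    def __init__(self):
        # Same behaviour, but the attribute is a module-level function.
        self.spectrum = constant_spectrum

def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False

print(can_pickle(FluxWithLambda()))
print(can_pickle(FluxWithFunction()))
```

If the real class looks like this, replacing the lambda with a module-level function (or building the object inside the worker) is one possible way around the error.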
The script exits too soon; try adding this code at the end of your script:
import time
time.sleep(3)

Error Output Using Timer Object in Python

I've got several functions that I need to evaluate the performance of using the timeit module.
For starters, I'm attempting to use a Timer object to evaluate a sequential search function run on a list of random integers which should return the time to execute in seconds. The function returns a value of False for -1 (since it will never find -1) but gives me the following error along with it. Here is the complete output:
False
Traceback (most recent call last):
File "D:/.../search-test.py", line 37, in <module>
main()
File "D:/.../search-test.py", line 33, in main
print(t1.timeit(number=100))
File "C:\...\Anaconda2\lib\timeit.py", line 202, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
TypeError: sequential_search() takes exactly 2 arguments (0 given)
This is my program:
from timeit import Timer
import random
def sequential_search(a_list, item):
    pos = 0
    found = False
    while pos < len(a_list) and not found:
        if a_list[pos] == item:
            found = True
        else:
            pos = pos+1
    return found
def num_gen(value):
    myrandom = random.sample(xrange(0, value), value)
    return myrandom
def main():
    new_list = num_gen(100)
    print(sequential_search(new_list, -1))
    t1 = (Timer("sequential_search()", "from __main__ import sequential_search"))
    print(t1.timeit(number=100))
if __name__ == '__main__':
    main()
I'm a noob to programming and can honestly say that I'm struggling. This error doesn't make any sense to me. I don't understand why it's asking for the sequential_search function arguments when they're already passed in main(). Plugging the arguments in to the Timer statement doesn't solve the problem.
Please help me understand what I've screwed up. Thank you!
This is how you make a timer object -
t1 = (Timer("sequential_search(new_list, -1)", setup="from __main__ import sequential_search, num_gen;new_list=num_gen(100);"))
print(t1.timeit(number=100))
Output -
False
0.0014021396637
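It didn't work simply because the statement string called sequential_search() without any arguments. timeit runs the statement in its own namespace, so every name it uses has to be imported or created in setup (other ways exist, but this is the simplest), and then you are good to go.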
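An alternative sketch, assuming Python 3 (range instead of xrange) and a simplified search loop: Timer also accepts a callable, so the arguments can be bound with a lambda instead of being rebuilt inside a setup string.

```python
import random
from timeit import Timer

def sequential_search(a_list, item):
    # Linear scan; returns True as soon as item is found.
    for value in a_list:
        if value == item:
            return True
    return False

def num_gen(value):
    # A shuffled list of the integers 0 .. value-1.
    return random.sample(range(value), value)

new_list = num_gen(100)
# The lambda closes over new_list, so no setup string is needed.
t1 = Timer(lambda: sequential_search(new_list, -1))
elapsed = t1.timeit(number=100)  # total seconds for 100 calls
print(elapsed)
```

The measured time includes the small overhead of the extra lambda call, which matters only for very fast statements.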

How to call the function that yields (python 2.7)

I have two functions func1 and func2 that are specific implementations of func0 that YIELDS its result:
def func0(parameter, **kwargs):
    #do sth with kwargs and parameter
    yield result # result is html
How should I refer to func0 inside the "specific" functions to make them yield their results? Is return OK?
def func1(**kwargs):
    return func0(parameter=1, **kwargs)
def func2(**kwargs):
    return func0(parameter=2, **kwargs)
In Python 3.3+, the normal way would be to use yield from. From the documentation:
PEP 380 adds the yield from expression, allowing a generator to delegate part of its operations to another generator. This allows a section of code containing yield to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.
For Python 2.7 that's not possible, however. Here's an alternative that works instead:
def base_squared_generator(parameter):
    yield parameter ** 2
def two_squared_generator():
    yield next(base_squared_generator(parameter=2))
def three_squared_generator():
    yield next(base_squared_generator(parameter=3))
print(next(two_squared_generator()))
print(next(three_squared_generator()))
Output
4
9
If you use return, then func1 will return the generator that is func0. Alternatively, if you use yield from, then the wrapping function becomes a generator itself, yielding the individual items from func0. The yielded elements are the same in both cases.
def func1(**kwargs):
    return func0(parameter=1, **kwargs)
def func2(**kwargs):
    yield from func0(parameter=1, **kwargs)
Note how func1 returns a func0-generator, while func2 returns a func2-generator.
>>> func1()
<generator object func0 at 0x7fe038147ea0>
>>> func2()
<generator object func2 at 0x7fe038147ee8>
>>> list(func1()) == list(func2())
True
Note that yield from was introduced in Python 3.3. In Python 2, you can achieve the same by yielding from a loop.
def func2(**kwargs):
    for x in func0(parameter=1, **kwargs):
        yield x
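One difference between the approaches in this thread shows up only when func0 yields more than one item. A sketch with a hypothetical multi-item func0 and illustrative wrapper names (yield from needs Python 3.3 or later):

```python
def func0(parameter):
    # Hypothetical generator that yields several items.
    for i in range(3):
        yield parameter * i

def via_return():
    # Returns the func0 generator itself.
    return func0(parameter=2)

def via_yield_from():
    # Delegates to func0 and forwards every item.
    yield from func0(parameter=2)

def via_loop():
    # Python 2 compatible equivalent of yield from for plain iteration.
    for x in func0(parameter=2):
        yield x

def via_next():
    # Forwards only the first item of func0.
    yield next(func0(parameter=2))

print(list(via_return()))      # [0, 2, 4]
print(list(via_yield_from()))  # [0, 2, 4]
print(list(via_loop()))        # [0, 2, 4]
print(list(via_next()))        # [0]
```

Wrapping next() in a new generator is therefore only equivalent when the base generator yields exactly one value.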
You are returning generators from the functions.
It is worth reading about generators (it's not long); in any case, here is a way to use one:
gen = func1(args...)
res = gen.next() # python 2
or
res = next(gen) # python 2 and 3
This is how I would do it:
def func0(a):
    yield a**2
from functools import partial
func1 = partial(func0, a=1)
func2 = partial(func0, a=10)
print(next(func1())) # prints 1
print(next(func2())) # prints 100
Take a look at functools.partial. As I said in the comments, it essentially clones your function with some of its required parameters already set.
So if func0 yields, so do its partials func1 and func2.

Running multiprocessing on two different functions in Python 2.7

I have 2 different functions that I want to use multiprocessing for: makeFakeTransactions and elasticIndexing. The function makeFakeTransactions returns a list of dictionaries, which is then added to the async_results list. So essentially, async_results is a list of lists. I want to use this list of lists as input for the elasticIndexing function, but I must wait for the first p.apply_async to finish first before I use the list of lists. How do I ensure that the first batch of multiprocessing is finished before I initiate the next one?
Also, when I run the program as is, it skips the second p.apply_async and just terminates. Do I have to declare a separate multiprocessing.Pool variable to do another multiprocessing operation?
store_num = 1
process_number = 6
num_transactions = 10
p = multiprocessing.Pool(process_number)
async_results = [p.apply_async(makeFakeTransactions, args = (store_num, num_transactions,)) for store_num in xrange(1, 10, 5)]
results = [ar.get() for ar in async_results]
async_results = [p.apply_async(elasticIndexing, args = (result_list,)) for result_list in results]
EDIT:
I tried using p.join() after async_results, but it gives this error:
Traceback (most recent call last):
File "C:\Users\workspace\Proj\test.py", line 210, in <module>
p.join()
File "C:\Python27\lib\multiprocessing\pool.py", line 460, in join
assert self._state in (CLOSE, TERMINATE)
AssertionError
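The AssertionError comes from calling join() on a pool that has not been closed: Pool.join() requires close() or terminate() to be called first. Below is a minimal Python 3 sketch of the two-stage flow with a single pool. The two worker functions are hypothetical stand-ins for makeFakeTransactions and elasticIndexing, which are not shown in the question:

```python
import multiprocessing

def make_fake_transactions(store_num, num_transactions):
    # Hypothetical stand-in: returns a list of dictionaries.
    return [{"store": store_num, "id": i} for i in range(num_transactions)]

def elastic_indexing(transactions):
    # Hypothetical stand-in: pretends to index and reports the count.
    return len(transactions)

def run_pipeline(pool, num_transactions=10):
    first = [pool.apply_async(make_fake_transactions, (n, num_transactions))
             for n in range(1, 10, 5)]
    # get() blocks until each task has finished, so the first batch
    # is complete before the second one is submitted.
    results = [ar.get() for ar in first]
    second = [pool.apply_async(elastic_indexing, (r,)) for r in results]
    # Waiting on the second batch as well keeps the script from
    # exiting before the indexing tasks have run.
    return [ar.get() for ar in second]

if __name__ == "__main__":
    p = multiprocessing.Pool(6)
    print(run_pipeline(p))
    p.close()  # no more tasks will be submitted
    p.join()   # wait for the worker processes to exit
```

The same pool serves both stages, so a second Pool is not needed. In the original snippet nothing waits on the second batch, which may be why the script appears to skip it.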
