Hi stackoverflow users,
I tried to look this up but couldn't find an answer: essentially, I would like to run a function in parallel (in independent processes!) where the function has one iterable argument (x) and several constant arguments (k and d). Here is a very simplified example:
from multiprocessing import *

def test_function(args):
    k = args[0]
    d = args[1]
    x = args[2]
    del args
    return k*x + d

if __name__ == '__main__':
    pool = Pool(processes=2)
    k = 3.
    d = 5.
    constants = [k, d]
    xvalues = range(0, 10)
    result = [pool.apply_async(test_function, constants.append(i)) for i in xvalues]
    output = [r.get() for r in result]
    print output
    #I expect [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0, 32.0]
This gives me the following error message:
Traceback (most recent call last):
File "test_function.py", line 23, in <module>
output = [r.get() for r in result]
File "C:\Program Files\Python2.7\lib\multiprocessing\pool.py", line 528, in get
raise self._value
TypeError: test_function() argument after * must be a sequence, not NoneType
So my questions are:
What does this error message actually mean?
How do I fix it to get the expected results (see last line of code example)?
Is there a better/working/more elegant way to write the line that calls apply_async?
FYI: I'm new here and to python, please bear with me and let me know if my post needs more details.
Thanks a lot for any suggestions!
What does this error message actually mean?
The value returned by the append method is always None, hence when doing:
pool.apply_async(test_function, constants.append(i))
you are calling pool.apply_async with None as the args argument, but apply_async expects a sequence there. What apply_async does with args is called tuple unpacking: the target is invoked as test_function(*args).
How do I fix it to get the expected results?
To achieve the expected output, simply concatenate i to the constants:
pool.apply_async(test_function, (constants + [i],))
Note that you have to wrap all the arguments into a one-element tuple, since your test_function accepts a single argument.
You could modify it in this way:
def test_function(k, d, x):
    # etc
And simply use:
pool.apply_async(test_function, constants + [i])
apply_async will then automatically unpack the args into the three arguments of the function using tuple unpacking. (Read the documentation for Pool.apply and friends carefully.)
Is there a better/working/more elegant way to write the line that calls apply_async?
As pointed out by Silas, instead of calling Pool.apply_async on a list of values, you should use the Pool.map or Pool.map_async methods, which do that for you.
For example:
results = pool.map(test_function, [constants + [i] for i in xvalues])
Note, however, that in this case test_function must accept a single argument, so you have to unpack the constants and the x manually, as you were doing in your question.
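Putting it together, a rough, untested sketch of the whole example built around Pool.map might look like this (it should print the expected output from the question):
from multiprocessing import Pool

def test_function(args):
    k, d, x = args
    return k * x + d

if __name__ == '__main__':
    pool = Pool(processes=2)
    k, d = 3., 5.
    xvalues = range(0, 10)
    output = pool.map(test_function, [[k, d, x] for x in xvalues])
    print(output)   # expected: [5.0, 8.0, 11.0, ..., 32.0]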
Also, as general suggestion:
In your test_function there is absolutely no need to do del args. It will only slow down the execution of the function (by a tiny amount). Use del sparingly, only when needed.
Instead of assigning the elements of the tuple by hand, you can use the syntax:
k, d, x = args
Which is equivalent to the (possibly slightly slower):
k = args[0]
d = args[1]
x = args[2]
Expect big slowdowns when using multiprocessing to call such simple functions. The cost of communicating with and synchronizing processes is pretty big, so you should avoid calling trivial functions and, whenever possible, work "in chunks" (e.g. instead of sending each request separately, send a list of 100 requests to a worker in a single argument).
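For example, a rough sketch of the chunked approach, reusing k, d, xvalues and pool from the question (the chunk size of 100 and the helper name test_chunk are just for illustration):
def test_chunk(chunk):
    # each task receives a whole list of (k, d, x) tuples and processes them in one go
    return [k * x + d for k, d, x in chunk]

chunk_size = 100
chunks = [[(k, d, x) for x in xvalues[i:i + chunk_size]]
          for i in range(0, len(xvalues), chunk_size)]
partial_results = pool.map(test_chunk, chunks)
output = [value for part in partial_results for value in part]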
constants.append(i) returns None; you should append the values first, and then use constants as the second parameter.
>>> constants = []
>>> i = 2
>>> bug_value = constants.append(i)
>>> constants
[2]
>>> bug_value is None
True
>>>
Use result = [pool.apply_async(test_function, constants + [i]) for i in xvalues] instead.
list + list concatenates the two lists and returns the resulting list.
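For example:
>>> constants = [3., 5.]
>>> constants + [2]
[3.0, 5.0, 2]
>>> constants        # unchanged, unlike with append
[3.0, 5.0]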
Related
I am quite new to python and probably facing a very simple problem. However, I was not able to find a solution via Google, as the information I found indicated my method should work.
All I want to do is pass an array as an argument to a function.
The function that shall take an array:
def load(components):
    global status
    global numberOfLoadedPremixables
    results = []
    print('components: ', components, file=sys.stderr)
    status = statusConstants.LOADING
    for x in range(0, len(components)):
        blink(3)
        gramm = components[x]['Gramm']
        #outcome = load(gramm)
        time.sleep(10)
        outcome = 3
        results.append(outcome)
        numberOfLoadedPremixables += 1
    status = statusConstants.LOADING_FINISHED
Then I am trying to start this function on a background thread:
background_thread = threading.Thread(target=load, args=[1,2,3]) #[1,2,3] only for testing
background_thread.start()
As a result, I end up with the error:
TypeError: load() takes 1 positional argument but 3 were given
Since you need to pass the whole array as a single unit to the function, wrap that in a tuple:
background_thread = threading.Thread(target=load, args=([1,2,3],))
The trailing comma turns args into a single-element tuple, so the whole list is passed to your function as its one argument.
The issue happens because Python expects args to be a sequence whose elements are unpacked and passed to the function as separate positional arguments, so your function was actually being called as load(1, 2, 3).
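A quick way to see the difference (a minimal sketch using a hypothetical show_args target):
import threading

def show_args(*args):
    print(args)   # the positional arguments the target actually receives

threading.Thread(target=show_args, args=[1, 2, 3]).start()     # prints (1, 2, 3)
threading.Thread(target=show_args, args=([1, 2, 3],)).start()  # prints ([1, 2, 3],)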
(disclaimer: not a Python kid, so please be gentle)
I am trying to compose functions using the following:
import functools

def compose(*functions):
    return functools.reduce(lambda acc, f: lambda x: acc(f(x)), functions, lambda x: x)
which works as expected for scalar functions. I'd like it to work with functions returning tuples and functions taking multiple arguments, e.g.:
def dummy(name):
    return (name, len(name), name.upper())

def transform(name, size, upper):
    return (upper, -size, name)

# What I want to achieve using composition,
# i.e. f = compose(transform, dummy)
transform(*dummy('Australia'))
# => ('AUSTRALIA', -9, 'Australia')
Since dummy returns a tuple and transform takes three arguments, I need to unpack the value.
How can I achieve this using my compose function above? If I try like this, I get:
f = compose(transform, dummy)
f('Australia')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in <lambda>
File "<stdin>", line 2, in <lambda>
TypeError: transform() takes exactly 3 arguments (1 given)
Is there a way to change compose such that it will unpack where needed?
This one works for your example, but it won't handle just any arbitrary function - it only works with positional arguments and (of course) the signature of each function must match the return value of the previous one (with respect to application order).
def compose(*functions):
    return functools.reduce(
        lambda f, g: lambda *args: f(*g(*args)),
        functions,
        lambda *args: args
    )
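With the dummy and transform functions from the question, this version should give the result you expect:
f = compose(transform, dummy)
f('Australia')
# => ('AUSTRALIA', -9, 'Australia')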
Note that using reduce here, while certainly idiomatic in functional programming, is rather unpythonic. The "obvious" pythonic implementation would use iteration instead:
def itercompose(*functions):
    def composed(*args):
        for func in reversed(functions):
            args = func(*args)
        return args
    return composed
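Used the same way, this should behave identically for your example:
f = itercompose(transform, dummy)
f('Australia')
# => ('AUSTRALIA', -9, 'Australia')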
Edit:
You ask "Is there a way to make have a compose function which will work in both cases" - "both cases" here meaning wether the functions returns an iterable or not (what you call "scalar functions", a concept that has no meaning in Python).
Using the iteration-based implementation, you could just test if the return value is iterable and wrap it in a tuple ie:
import collections

def itercompose(*functions):
    def composed(*args):
        for func in reversed(functions):
            if not isinstance(args, collections.Iterable):
                args = (args,)
            args = func(*args)
        return args
    return composed
but this is not guaranteed to work as expected - actually it is even guaranteed NOT to work as expected for most use cases. There are a lot of built-in iterable types in Python (and even more user-defined ones), and merely knowing that an object is iterable doesn't say much about its semantics.
For example, a dict or a str is iterable but in this case should obviously be considered a "scalar". A list is iterable too, and how it should be interpreted here is actually undecidable without knowing exactly what it contains and what the "next" function in composition order expects - in some cases you will want to treat it as a single argument, in other cases as a list of args.
IOW only the caller of the compose() function can really tell how each function result should be considered - actually you might even have cases where you want a tuple to be considered as a "scalar" value by the next function. So to make a long story short: no, there's no one-size-fits-all generic solution in Python. The best I could think of requires a combination of result inspection and manual wrapping of composed functions so the result is properly interpreted by the "composed" function but at this point manually composing the functions will be both way simpler and much more robust.
FWIW remember that Python is first and foremost a dynamically typed object-oriented language, so while it has decent support for functional programming idioms, it's obviously not the best tool for real functional programming.
You might consider inserting a "function" (really, a class constructor) in your compose chain to signal the unpacking of the prior/inner function's results. You would then adjust your composer function to check for that class to determine if the prior result should be unpacked. (You actually end up doing the reverse: tuple-wrap all function results except those signaled to be unpacked -- and then have the composer unpack everything.) It adds overhead, it's not at all Pythonic, it's written in a terse lambda style, but it does accomplish the goal of being able to properly signal in a function chain when the composer should unpack a result. Consider the following generic code, which you can then adapt to your specific composition chain:
from functools import reduce
from operator import add

class upk:   # class constructor signals composer to unpack prior result
    def __init__(s, r): s.r = r   # hold function's return for wrapper function

idt = lambda x: x                                     # identity
wrp = lambda x: x.r if isinstance(x, upk) else (x,)   # wrap all but unpackables
com = lambda *fs: (   # unpackable compose, unpacking whenever upk is encountered
    reduce(lambda a, f: lambda *x: a(*wrp(f(*x))), fs, idt))

foo = com(add, upk, divmod)   # upk signals divmod's results should be unpacked
print(foo(6, 4))
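# prints 3: divmod(6, 4) returns (1, 2), upk marks it for unpacking, so add(1, 2) is called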
This circumvents the problem, as called out by prior answers/comments, of requiring your composer to guess which types of iterables should be unpacked. Of course, the cost is that you must explicitly insert upk into the callable chain whenever unpacking is required. In that sense, it is by no means "automatic", but it is still a fairly simple/terse way of achieving the intended result while avoiding unintended wraps/unwraps in many corner cases.
The compose function in the answer contributed by Bruno did the job for functions with multiple arguments but unfortunately no longer worked for scalar ones.
Using the fact that Python "unpacks" tuples into positional arguments, this is how I solved it:
import functools

def compose(*functions):
    def pack(x): return x if type(x) is tuple else (x,)
    return functools.reduce(
        lambda acc, f: lambda *y: f(*pack(acc(*pack(y)))), reversed(functions), lambda *x: x)
which now works just as expected, eg.
#########################
# scalar-valued functions
#########################
def a(x): return x + 1
def b(x): return -x

# explicit
> a(b(b(a(15))))
# => 17

# compose
> compose(a, b, b, a)(15)
# => 17
########################
# tuple-valued functions
########################
def dummy(x):
    return (x.upper(), len(x), x)

def trans(a, b, c):
    return (b, c, a)

# explicit
> trans(*dummy('Australia'))
# => (9, 'Australia', 'AUSTRALIA')

# compose
> compose(trans, dummy)('Australia')
# => (9, 'Australia', 'AUSTRALIA')
And this also works with multiple arguments:
def add(x, y): return x + y

# explicit
> b(a(add(5, 3)))
# => -9

# compose
> compose(b, a, add)(5, 3)
# => -9
I would like concurrent.futures.ProcessPoolExecutor.map() to call a function that takes 2 or more arguments. In the example below, I have resorted to using a lambda function and defining ref as an array of equal size to numberlist with an identical value.
1st Question: Is there a better way of doing this? In the case where numberlist can be millions to billions of elements in size, ref would have to match numberlist in size, so this approach unnecessarily takes up precious memory, which I would like to avoid. I did this because I read that the map function terminates its mapping when the shortest iterable is exhausted.
import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')
Running the code above, I found that the map function was able to achieve my desired outcome. However, when I transferred the same terms to concurrent.futures.ProcessPoolExecutor.map(), python3.5 failed with this error:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed
Question 2: Why did this error occur and how do I get concurrent.futures.ProcessPoolExecutor.map() to call a function with more than 1 argument?
To answer your second question first, you are getting an exception because a lambda function like the one you're using is not picklable. Since Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, this is a problem. It's not clear why you are using a lambda at all. The lambda you had takes two arguments, just like the original function. You could use _findmatch directly instead of the lambda and it should work.
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...
As for the first issue about passing the second, constant argument without creating a giant list, you could solve this in several ways. One approach might be to use itertools.repeat to create an iterable object that repeats the same value forever when iterated on.
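For example (a minimal sketch of that approach):
import itertools

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        ...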
But a better approach would probably be to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) It should work if the function you use is accessible at the module's top-level namespace:
def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...
(1) No need to make a list. You can use itertools.repeat to create an iterator that just repeats the same value.
(2) You need to pass a named function to map because it will be passed to the subprocess for execution. map uses the pickle protocol to send things; lambdas can't be pickled, and therefore they can't be part of the map. But it's totally unnecessary: all your lambda did was call a 2-parameter function with 2 parameters. Remove it completely.
The working code is
import concurrent.futures as cf
import itertools

nmax = 10
numberlist = range(nmax)
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        print(type(n))
        print(n)
        #if str(ref[0]) in n:
        #    print('match')
Regarding your first question, do I understand it correctly that you want to pass an argument whose value is determined only at the time you call map but constant for all instances of the mapped function? If so, I would do the map with a function derived from a "template function" with the second argument (ref in your example) baked into it using functools.partial:
from functools import partial

refval = 5

def _findmatch(ref, listnumber):  # arguments swapped
    ...

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(partial(_findmatch, refval), numberlist):
        ...
Re. question 2, first part: I haven't found the exact piece of code that tries to pickle (serialize) the function that should then be executed in parallel, but it sounds natural that that has to happen -- not only the arguments but also the function has to be transferred to the workers somehow, and it likely has to be serialized for this transfer. The fact that partial functions can be pickled while lambdas cannot is mentioned elsewhere, for instance here: https://stackoverflow.com/a/19279016/6356764.
Re. question 2, second part: if you wanted to call a function with more than one argument in ProcessPoolExecutor.map, you would pass it the function as the first argument, followed by an iterable of first arguments for the function, followed by an iterable of its second arguments etc. In your case:
for n in executor.map(_findmatch, numberlist, ref):
    ...
So I have a function that can either work quietly or verbosely. In quiet mode it produces an output. In verbose mode it also saves intermediate calculations to a list, though doing so takes extra computation in itself.
Before you ask, yes, this is an identified bottleneck for optimization, and the verbose output is rarely needed so that's fine.
So the question is, what's the most pythonic way to efficiently handle a function which may or may not return a second value? I suspect a pythonic way would be named tuples or dictionary output, e.g.
def f(x, verbose=False):
    result = 0
    verbosity = []
    for _ in x:
        foo = # something quick to calculate
        result += foo
        if verbose:
            verbosity += # something slow to calculate based on foo
    return {"result": result, "verbosity": verbosity}
But that requires constructing a dict when it's not needed.
Some alternatives are:
# "verbose" changes syntax of return value, yuck!
return result if verbose else (result,verbosity)
or using a mutable argument
def f(x, verbosity=None):
    if verbosity:
        assert verbosity == [[]]
    result = 0
    for _ in x:
        foo = # something quick to calculate
        result += foo
        if verbosity:
            # hard coded value, yuck
            verbosity[0] += # something slow to calculate based on foo
    return result

# for verbose results call as
verbosity = [[]]
f(x, verbosity)
Any better ideas?
Don't return verbosity. Make it an optional function argument, passed in by the caller, mutated in the function if not empty.
The non-pythonic part of some answers is the need to test the structure of the return value. Passing mutable arguments for optional processing avoids this ugliness.
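For example, a minimal sketch of that pattern (the names are just for illustration):
def f(x, verbosity=None):
    # verbosity, if given, is a list owned by the caller; we mutate it in place
    result = 0
    for item in x:
        foo = item                  # stand-in for the quick calculation
        result += foo
        if verbosity is not None:
            verbosity.append(foo)   # stand-in for the slow calculation
    return result

details = []
total = f(range(5), verbosity=details)   # details now holds the intermediate values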
I like the first option, but instead of passing a verbose parameter in the function call, return a tuple of a quick result and a lazily-evaluated function:
import time

def getResult(x):
    quickResult = x * 2
    def verboseResult():
        time.sleep(5)
        return quickResult * 2
    return (quickResult, verboseResult)

# Returns immediately
(quickResult, verboseResult) = getResult(2)
print(quickResult)        # Prints immediately
print(verboseResult())    # Prints after running the long-running function
Sample code that works without issue:
from multiprocessing import *
import time
import random

def myfunc(d):
    a = random.randint(0, 1000)
    d[a] = a
    print("Process; %s" % a)

print("Starting mass threads")
man = Manager()
d = man.dict()
p = Pool(processes=8)
for i in range(0, 100):
    p.apply_async(myfunc, [d])
p.close()
p.join()
print(d)
print("Ending multiprocessing")
If you change p.apply_async(myfunc, [d]) to p.apply_async(myfunc, (d)) or p.apply_async(myfunc, d) then the pool will not work at all. If you add another arg to myfunc and then just pass in a None it'll work like this p.apply_async(myfunc, (None, d)) — but why?
The documentation for apply_async says the following:
apply(func[, args[, kwds]])
Call func with arguments args and keyword arguments kwds. It blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool.
Thus, instead of taking star and double-star arguments, it takes the positional arguments and keyword arguments to be passed to the target function as its 2nd and 3rd arguments; the second must be an iterable and the third a mapping, respectively.
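For example, with the question's myfunc both of these are well-formed calls:
p.apply_async(myfunc, (d,))                # args is a one-element tuple
p.apply_async(myfunc, args=(d,), kwds={})  # the same call with explicit keywords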
Notice that since apply_async works asynchronously, you won't see any exceptions unless you .wait for and .get them from the results.
You can try simply:
for i in range(0, 100):
    result = p.apply_async(myfunc, d)
print(result.get())
In the code above, result.get() waits for the completion of the 100th task and returns its return value - or rather tries to, as it will fail because the managed dictionary cannot be used as the positional arguments:
Traceback (most recent call last):
File "test.py", line 21, in <module>
print(result.get())
File "/usr/lib/pythonN.N/multiprocessing/pool.py", line 558, in get
raise self._value
KeyError: 0
Thus, looking at your original question: do note that [d] is a list of length 1; (d) is the same as d; to have a tuple of length 1 you need to type (d,). From the Python 3 tutorial section 5.3:
A special problem is the construction of tuples containing 0 or 1
items: the syntax has some extra quirks to accommodate these. Empty
tuples are constructed by an empty pair of parentheses; a tuple with
one item is constructed by following a value with a comma (it is not
sufficient to enclose a single value in parentheses). Ugly, but
effective. For example:
>>> empty = ()
>>> singleton = 'hello', # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)
(d,), [d], {d}, or even {d: True} would work just as nicely as your positional arguments; all of these result in an iterable whose iterator yields exactly 1 value - d itself. On the other hand, if you had passed almost any other kind of value than that unfortunate managed dictionary, you would have gotten a much more usable error; say, if the value were 42, you'd have got:
TypeError: myfunc() argument after * must be a sequence, not int