I found this answer (https://stackoverflow.com/a/59905309/7462275) for displaying a progress bar, and it is very simple to use. I would like to use this simple solution for functions that take several arguments.
Following the above-mentioned answer, I wrote this code, which works:
from tqdm.contrib.concurrent import process_map
import multiprocessing as mp
import time

def _foo(my_tuple):
    my_number1, my_number2 = my_tuple
    square = my_number1 * my_number2
    time.sleep(1)
    return square

r = process_map(_foo, [(i, j) for i, j in zip(range(0, 30), range(100, 130))], max_workers=mp.cpu_count())
But I wonder whether using a tuple to pass the function arguments is the correct way to do this. Thanks for any answer.
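For what it's worth, here is a minimal sketch of the same pattern that keeps the two-argument function unchanged and unpacks the tuple in a small wrapper (_foo2 and _wrapper are illustrative names of mine, not from the linked answer):

from tqdm.contrib.concurrent import process_map
import multiprocessing as mp
import time

def _foo2(my_number1, my_number2):
    time.sleep(1)
    return my_number1 * my_number2

def _wrapper(my_tuple):
    # unpack the tuple so _foo2 keeps its normal two-argument signature
    return _foo2(*my_tuple)

if __name__ == '__main__':
    args = list(zip(range(0, 30), range(100, 130)))
    r = process_map(_wrapper, args, max_workers=mp.cpu_count())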
I am using a np.vectorize-ed function and would like to see the progress of the function with tqdm. However, I have not been able to figure out how to do this.
All the suggestions I have found relate to converting the calculation into a for-loop, or into a pd.DataFrame.
I finally found a method that works to get the tqdm progress bar to update with a np.vectorize function. I wrap the vectorized call in a with statement:
with tqdm(total=len(my_inputs)) as pbar:
    my_output = np.vectorize(my_function)(my_inputs)
Inside my_function() I then add the following lines:
global pbar
pbar.update(1)
and voila! I now have a progress bar that updates with each iteration, with only a slight performance dip in my code.
Note: when you first run the function it might complain that pbar is not yet defined. Simply put pbar = 0 before the with block, and the function will then pick up the pbar defined by the with statement.
Hope it helps everyone reading here.
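Putting the pieces together, here is a minimal runnable sketch of this approach (my_function, my_inputs, and the squaring body are placeholders of my own, not from the original post):

import numpy as np
from tqdm import tqdm

pbar = 0  # placeholder so the global name exists before the with block (see the note above)

def my_function(x):
    global pbar
    pbar.update(1)
    return x * x

my_inputs = np.arange(100)

# otypes avoids the extra probing call np.vectorize makes to infer the output dtype,
# which would otherwise advance the bar one step too far
with tqdm(total=len(my_inputs)) as pbar:
    my_output = np.vectorize(my_function, otypes=[int])(my_inputs)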
To the best of my knowledge, tqdm does not wrap around numpy.vectorize.
To display the progress bar for numpy arrays, numpy.ndenumerate can be used.
Given the inputs and function:
import numpy as np
from tqdm import tqdm
a = np.array([1, 2, 3, 4])
b = 2
def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b
Replace this vectorised part below
# using numpy.vectorize
vfunc = np.vectorize(myfunc)
vfunc(a, b)
with this
# using numpy.ndenumerate instead
[myfunc(x,b) for index, x in tqdm(np.ndenumerate(a))]
to see the tqdm progress bar.
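One caveat worth noting (my addition, not from the original answer): np.ndenumerate has no len(), so tqdm cannot display a total or ETA by default; passing total=a.size explicitly restores the full bar:

[myfunc(x, b) for index, x in tqdm(np.ndenumerate(a), total=a.size)]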
Based on @Carl Kirstein's answer I came up with the following solution. I added the pbar element to my_function as an argument and updated it inside the function.
with tqdm(total=len(my_inputs)) as pbar:
    my_output = np.vectorize(my_function)(my_inputs, pbar)
Somewhere inside my_function I added pbar.update(1):
def my_function(args, pbar):
    ...
    pbar.update(1)
    ...
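A side note of my own, not part of the original answer: by default np.vectorize tries to broadcast every argument, including the progress bar object; its excluded parameter keeps pbar out of the broadcasting, e.g.:

vec_function = np.vectorize(my_function, excluded={'pbar'})  # or excluded={1} for the positional form
with tqdm(total=len(my_inputs)) as pbar:
    my_output = vec_function(my_inputs, pbar=pbar)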
Say I have a function that gives different results for the same input and needs to be run multiple times for the same input to obtain the mean (I'll sketch a trivial example, but in reality the source of randomness is train_test_split from sklearn.model_selection, if that matters):
import numpy as np

def f(a, b):
    output = []
    for i in range(0, b):
        output.append(np.mean(np.random.rand(a,)))
    return np.mean(output)
The arguments for this function are defined inside another function like so (again, a trivial example, please don't mind if these are not efficient/pythonic):
def g(c, d):
    a = c
    b = c * d
    result = f(a, b)
    return result
Instead of using a for loop, I want to use multiprocessing to speed up the execution time. I found that neither pool.apply nor pool.starmap does the trick (execution time goes up); only pool.map works. However, it can only take one argument (in this case, the number of iterations). I tried redefining f as follows:
def f(number_of_iterations):
    output = np.mean(np.random.rand(a,))
    return output
And then use pool.map as follows:
import multiprocessing as mp

def g(c, d):
    temp = []
    a = c
    b = c * d
    pool = mp.Pool(mp.cpu_count())
    temp = pool.map(f, [number_of_iterations for number_of_iterations in range(b)])
    pool.close()
    result = np.mean(temp)
    return result
Basically, this is a convoluted workaround to make f a one-argument function. The hope was that f would still pick up argument a; however, executing g results in an error about a not being defined.
Is there any way to make pool.map work in this context?
I think functools.partial solves your issue. Here is an implementation: https://stackoverflow.com/a/25553970/9177173 And here is the documentation: https://docs.python.org/3.7/library/functools.html#functools.partial
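A minimal sketch of how functools.partial fits the question's code, assuming f is rewritten to take a as an explicit first parameter (the function names follow the question, the rest is my own illustration):

import multiprocessing as mp
import numpy as np
from functools import partial

def f(a, number_of_iterations):
    # 'a' is now an explicit parameter instead of an undefined global
    return np.mean(np.random.rand(a,))

def g(c, d):
    a = c
    b = c * d
    with mp.Pool(mp.cpu_count()) as pool:
        # partial freezes 'a'; pool.map then supplies the single remaining argument
        temp = pool.map(partial(f, a), range(b))
    return np.mean(temp)

if __name__ == '__main__':
    print(g(5, 4))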
Is there any built-in version of this?
def unpack(f, a):
    return f(**a)  # or: return f(*a)
Why isn't unpack considered to be an operator and located in operator.*?
I'm trying to do something similar to this (but of course want a general solution to the same type of problem):
from functools import partial, reduce
from operator import add
data = [{'tag':'p','inner':'Word'},{'tag':'img','inner':'lower'}]
renderer = partial(unpack, "<{tag}>{inner}</{tag}>".format)
print(reduce(add, map(renderer, data)))
but without using lambdas or comprehensions.
That is not the way to go about this. How about
print(''.join('<{tag}>{inner}</{tag}>'.format(**d) for d in data))
Same behavior in a much more Pythonic style.
Edit: Since you seem opposed to using any of the nice features of Python, how about this:
def tag_format(x):
    return '<{tag}>{inner}</{tag}>'.format(tag=x['tag'], inner=x['inner'])

results = []
for d in data:
    results.append(tag_format(d))
print(''.join(results))
I don't know of an operator that does what you want, but you don't really need it to avoid lambdas or comprehensions:
from functools import reduce
from operator import add
data = [{'tag':'p','inner':'Word'},{'tag':'img','inner':'lower'}]
print(reduce(add, map("<{0[tag]}>{0[inner]}</{0[tag]}>".format, data)))
Seems like it would be possible to generalize something like this if you wanted.
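For instance, here is a hedged sketch of one such generalisation, with star_apply as a made-up name for the question's unpack helper:

from functools import partial, reduce
from operator import add

def star_apply(f, kwargs):
    """Call f with a dict of keyword arguments (the 'unpack' from the question)."""
    return f(**kwargs)

data = [{'tag': 'p', 'inner': 'Word'}, {'tag': 'img', 'inner': 'lower'}]
renderer = partial(star_apply, "<{tag}>{inner}</{tag}>".format)
print(reduce(add, map(renderer, data)))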
I want to use support_code to define functions that interact with n-dimensional numpy arrays. Inside the code argument, the FOO3(i, j, k) notation works, but only there, not in support_code. Something like this:
import scipy
import scipy.weave
code = '''return_val = f(1);'''
support_code = '''int f(int i) {
    return FOO3(i, i, i);
}'''

foo = scipy.arange(3**3).reshape(3, 3, 3)
print(scipy.weave.inline(code, ['foo'], support_code=support_code))
The concept of support code is mainly to do some includes. In your case, I guess the function should look something like this:
import scipy
import scipy.weave
def foofunc(i):
    foo = scipy.arange(3**3).reshape(3, 3, 3)
    code = '''// do something lengthy with foo and maybe i'''
    scipy.weave.inline(code, ['foo', 'i'])
    return foo[i, i, i]
You don't need support code at all for what you're trying to do. You also don't get any speed improvement by returning a value from C instead of doing that in Python; the array access is negligible compared to the cost of the function call. To get a better idea of when and how weave can help you speed up your code, have a look here.
I have a set of functions:
functions=set(...)
All the functions need one parameter x.
What is the most efficient way in Python of doing something similar to:
for function in functions:
function(x)
The code you give,
for function in functions:
function(x)
...does not appear to do anything with the result of calling function(x). If that is indeed so, meaning that these functions are called for their side-effects, then there is no more pythonic alternative. Just leave your code as it is.† The point to take home here, specifically, is
Avoid functions with side-effects in list-comprehensions.
As for efficiency: I expect that using anything else instead of your simple loop will not improve runtime. When in doubt, use timeit. For example, the following tests seem to indicate that a regular for-loop is faster than a list-comprehension. (I would be reluctant to draw any general conclusions from this test, though):
>>> timeit.Timer('[f(20) for f in functions]', 'functions = [lambda n: i * n for i in range(100)]').repeat()
[44.727972984313965, 44.752119779586792, 44.577917814254761]
>>> timeit.Timer('for f in functions: f(20)', 'functions = [lambda n: i * n for i in range(100)]').repeat()
[40.320928812026978, 40.491761207580566, 40.303879022598267]
But again, even if these tests would have indicated that list-comprehensions are faster, the point remains that you should not use them when side-effects are involved, for readability's sake.
†: Well, I'd write for f in functions, so that the difference between function and functions is more pronounced. But that's not what this question is about.
If you need the output, a list comprehension would work.
[func(x) for func in functions]
I'm somewhat doubtful of how much of an impact this will have on the total running time of your program, but I guess you could do something like this:
[func(x) for func in functions]
The downside is that you will create a new list that you immediately toss away, but it should be slightly faster than just the for-loop.
In any case, make sure you profile your code to confirm that this really is a bottleneck that you need to take care of.
Edit: I redid the test using timeit
My new test code:
import timeit

def func(i):
    return i

a = b = c = d = e = f = func
functions = [a, b, c, d, e, f]

timer = timeit.Timer("[f(2) for f in functions]", "from __main__ import functions")
print(timer.repeat())

timer = timeit.Timer("map(lambda f: f(2), functions)", "from __main__ import functions")
print(timer.repeat())

timer = timeit.Timer("for f in functions: f(2)", "from __main__ import functions")
print(timer.repeat())
Here are the results from this timing:
testing list comprehension
[1.7169530391693115, 1.7683839797973633, 1.7840299606323242]
testing map(f, l)
[2.5285000801086426, 2.5957231521606445, 2.6551258563995361]
testing plain loop
[1.1665718555450439, 1.1711149215698242, 1.1652190685272217]
My original time.time()-based timings are pretty much in line with this test: plain for loops seem to be the most efficient.
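One caveat I would add (not part of the original answer): on Python 3, map() is lazy, so the map timing above never actually calls the functions unless the iterator is consumed; a fair comparison has to materialise the result, for example:

timer = timeit.Timer("list(map(lambda f: f(2), functions))", "from __main__ import functions")
print(timer.repeat())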