Efficient way of calling a set of functions in Python

I have a set of functions:

    functions = set(...)

All the functions need one parameter x.

What is the most efficient way in Python of doing something similar to:

    for function in functions:
        function(x)

The code you give,

    for function in functions:
        function(x)

...does not appear to do anything with the result of calling function(x). If that is indeed so, meaning that these functions are called for their side effects, then there is no more Pythonic alternative: just leave your code as it is.† The point to take home here, specifically, is:

    Avoid functions with side effects in list comprehensions.

As for efficiency: I expect that using anything else instead of your simple loop will not improve runtime. When in doubt, use timeit. For example, the following tests seem to indicate that a regular for loop is faster than a list comprehension (I would be reluctant to draw any general conclusions from this test, though):
    >>> timeit.Timer('[f(20) for f in functions]', 'functions = [lambda n: i * n for i in range(100)]').repeat()
    [44.727972984313965, 44.752119779586792, 44.577917814254761]
    >>> timeit.Timer('for f in functions: f(20)', 'functions = [lambda n: i * n for i in range(100)]').repeat()
    [40.320928812026978, 40.491761207580566, 40.303879022598267]
But again, even if these tests had indicated that list comprehensions are faster, the point remains that you should not use them when side effects are involved, for readability's sake.

†: Well, I'd write for f in functions, so that the difference between function and functions is more pronounced. But that's not what this question is about.

If you need the output, a list comprehension would work:

    [func(x) for func in functions]

I'm somewhat doubtful of how much of an impact this will have on the total running time of your program, but I guess you could do something like this:

    [func(x) for func in functions]

The downside is that you will create a new list that you immediately toss away, but it should be slightly faster than just the for loop.
In any case, make sure you profile your code to confirm that this really is a bottleneck that you need to take care of.

Edit: I redid the test using timeit.

My new test code:

    import timeit

    def func(i):
        return i

    a = b = c = d = e = f = func
    functions = [a, b, c, d, e, f]

    timer = timeit.Timer("[f(2) for f in functions]", "from __main__ import functions")
    print(timer.repeat())

    timer = timeit.Timer("map(lambda f: f(2), functions)", "from __main__ import functions")
    print(timer.repeat())

    timer = timeit.Timer("for f in functions: f(2)", "from __main__ import functions")
    print(timer.repeat())
Here are the results from this timing:

    testing list comprehension
    [1.7169530391693115, 1.7683839797973633, 1.7840299606323242]
    testing map(f, l)
    [2.5285000801086426, 2.5957231521606445, 2.6551258563995361]
    testing plain loop
    [1.1665718555450439, 1.1711149215698242, 1.1652190685272217]
My original, time.time()-based timings are pretty much in line with this test: plain for loops seem to be the most efficient.

Related

multiprocessing pool.map on a function inside another function

Say I have a function that provides different results for the same input and needs to be performed multiple times for the same input to obtain a mean (I'll sketch a trivial example, but in reality the source of randomness is train_test_split from sklearn.model_selection, if that matters):

    import numpy as np

    def f(a, b):
        output = []
        for i in range(0, b):
            output.append(np.mean(np.random.rand(a,)))
        return np.mean(output)
The arguments for this function are defined inside another function like so (again, a trivial example; please don't mind if these are not efficient/Pythonic):

    def g(c, d):
        a = c
        b = c * d
        result = f(a, b)
        return result
Instead of using a for loop, I want to use multiprocessing to speed up the execution time. I found that neither pool.apply nor pool.starmap do the trick (execution time goes up); only pool.map works. However, it can only take one argument (in this case, the number of iterations). I tried redefining f as follows:
    def f(number_of_iterations):
        output = np.mean(np.random.rand(a,))
        return output
And then use pool.map as follows:
    import multiprocessing as mp

    def g(c, d):
        temp = []
        a = c
        b = c * d
        pool = mp.Pool(mp.cpu_count())
        temp = pool.map(f, [number_of_iterations for number_of_iterations in range(b)])
        pool.close()
        result = np.mean(temp)
        return result
Basically, a convoluted workaround to make f a one-argument function. The hope was that f would still pick up the argument a; however, executing g results in an error about a not being defined.
Is there any way to make pool.map work in this context?
I think functools.partial solves your issue. Here is an implementation: https://stackoverflow.com/a/25553970/9177173 and here is the documentation: https://docs.python.org/3.7/library/functools.html#functools.partial
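A minimal sketch of how functools.partial could be applied here, assuming f keeps a two-argument signature f(a, n_iterations) and each worker task performs one iteration (the names and the toy computation are illustrative, not the original code):

    import multiprocessing as mp
    from functools import partial

    import numpy as np

    def f(a, n_iterations):
        # one worker task: mean of `n_iterations` means of random vectors of length `a`
        return np.mean([np.mean(np.random.rand(a,)) for _ in range(n_iterations)])

    def g(c, d):
        a = c
        b = c * d  # total number of iterations
        with mp.Pool(mp.cpu_count()) as pool:
            # partial "freezes" a, leaving a one-argument callable that fits pool.map
            temp = pool.map(partial(f, a), [1] * b)
        return np.mean(temp)

    if __name__ == "__main__":
        print(g(1000, 4))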

Pythonic way to re-apply a function to its own output n times?

Assume there are some useful transformation functions, for example random_spelling_error, that we would like to apply n times.
My temporary solution looks like this:

    def reapply(n, fn, arg):
        for i in range(n):
            arg = fn(arg)
        return arg

    reapply(3, random_spelling_error, "This is not a test!")
Is there a built-in or otherwise better way to do this?
It need not handle variable-length args or keyword args, but it could. The function will be called at scale, but the values of n will be low and the size of the argument and return value will be small.
We could call this reduce, but that name was of course taken for a function that can do this and too much more, and was removed from the built-ins in Python 3. Here is Guido's argument:

    So in my mind, the applicability of reduce() is pretty much limited to
    associative operators, and in all other cases it's better to write out
    the accumulation loop explicitly.
reduce is still available in Python 3 via the functools module. I don't really know that it's any more Pythonic, but here's how you could achieve it in one line:

    from functools import reduce

    def reapply(n, fn, arg):
        return reduce(lambda x, _: fn(x), range(n), arg)
Get rid of the custom function completely; you're trying to compress two readable lines into one confusing function call. Which one do you think is easier to read and understand, your way:

    foo = reapply(3, random_spelling_error, foo)

Or a simple for loop that's one more line:

    for _ in range(3):
        foo = random_spelling_error(foo)
Update: According to your comment,

    Let's assume that there are many transformation functions I may want to apply.

why not try something like this:

    modifiers = (random_spelling_error, another_function, apply_this_too)

    for modifier in modifiers:
        for _ in range(3):
            foo = modifier(foo)
Or if you need a different number of repeats for different functions, try creating a list of tuples:

    modifiers = [
        (random_spelling_error, 5),
        (another_function, 3),
        ...
    ]

    for modifier, count in modifiers:
        for _ in range(count):
            foo = modifier(foo)
Some like recursion; it's not always obviously "better":

    def reapply(n, fn, arg):
        if n:
            arg = reapply(n - 1, fn, fn(arg))
        return arg

    reapply(1, lambda x: x**2, 2)
    Out[161]: 4

    reapply(2, lambda x: x**2, 2)
    Out[162]: 16

Code execution time: how to properly DRY several timeit executions?

Let's assume we want to use timeit for some performance testing with different inputs.
The obvious, non-DRY way would be something like this:
    import timeit

    # define functions to test
    def some_list_operation_A(lst):
        ...

    def some_list_operation_B(lst):
        ...

    # create different lists (different input) to test the functions with
    ...
    inputs = [lsta, lstb, lstc, ...]

    # measure performance with the first function
    num = 10
    t_lsta = timeit.timeit("some_list_operation_A(lsta)",
                           setup="from __main__ import some_list_operation_A, lsta",
                           number=num)
    t_lstb = timeit.timeit("some_list_operation_A(lstb)",
                           setup="from __main__ import some_list_operation_A, lstb",
                           number=num)
    t_lstc = timeit.timeit("some_list_operation_A(lstc)",
                           setup="from __main__ import some_list_operation_A, lstc",
                           number=num)
    ...

    # print results & do some comparison stuff
    for res in [t_lsta, t_lstb, t_lstc, ...]:
        print("{:.4f}s".format(res))
    ...

    # do this ALL OVER AGAIN for 'some_list_operation_B'
    ...
    # print new results
    # do this ALL OVER AGAIN for 'some_list_operation_C'
    # ...I guess you get the point
    ...
I think it should be very clear that this would be a really ugly way to measure the performance of different functions for different inputs.
What I currently do is something like this:
    ...
    inputs = dict()
    inputs["lsta"] = lsta
    inputs["lstb"] = lstb
    inputs["lstc"] = lstc

    for f in ["some_list_operation_A", "some_list_operation_B", ...]:
        r = dict()  # results
        for key, val in inputs.items():
            r[key] = timeit.timeit("{}(inputs['{}'])".format(f, key),
                                   setup="from __main__ import {}, inputs".format(f),
                                   number=num)
        # evaluate results 'r' for function 'f' here
        # (includes a comparison of the results -
        # that's why I save them in 'r')
        ...
        # loop moves on to the next function 'f'
Basically, I am using .format here to insert the function name and the right data inputs[key]. After .format fills in all the {} placeholders, the result is one correct stmt string for timeit.
While this is a lot shorter than the obvious non-DRY solution, it is also less readable and more like a hack, isn't it?
What would be an appropriate DRY solution for such problems?
I also thought of simply timing the functions with decorators (that would be neat!), but I did not succeed: the decorator should not only print the result. In my # evaluate results 'r' step I am not only printing the results, but I am also comparing them, e.g. computing relative differences. Thus, I would need the decorator to return something in order to compare the results for each run...
Can someone point me in the right direction for a clean, Pythonic solution? I would like more beautiful/idiomatic code... and especially shorter code!
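A minimal sketch of one way such timings could be DRYed up without string templating, relying on the fact that timeit.timeit also accepts a zero-argument callable as stmt (the functions and inputs below are illustrative placeholders, not the original ones):

    import timeit

    def some_list_operation_A(lst):
        return sorted(lst)

    def some_list_operation_B(lst):
        return list(reversed(lst))

    functions = [some_list_operation_A, some_list_operation_B]
    inputs = {"lsta": list(range(1000)), "lstb": list(range(1000, 0, -1))}
    num = 10

    results = {}
    for func in functions:
        r = {}
        for key, lst in inputs.items():
            # bind func and lst into a zero-argument callable; no string building needed
            r[key] = timeit.timeit(lambda: func(lst), number=num)
        results[func.__name__] = r
        # evaluate/compare 'r' for this function here, e.g.:
        for key, t in r.items():
            print("{} on {}: {:.4f}s".format(func.__name__, key, t))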

Python: Redefining a function from within the function

I have some expensive function f(x) that I want to calculate only once, but which is called rather frequently. In essence, the first time the function is called, it should compute a whole bunch of values for a range of x (since it will be integrated over anyway), interpolate them with splines, and cache the coefficients somehow, possibly in a file for further use.
My idea was to do something like the following, since it would be pretty easy to implement. The first time the function is called, it does something, then redefines itself, then does something else from then on. However, it does not work as expected and might in general be bad practice.
    def f():
        def g():
            print(2)
        print(1)
        f = g

    f()
    f()
Expected output:

    1
    2

Actual output:

    1
    1
Defining g() outside of f() does not help. Why does this not work? Other than that, the only solution I can think of right now is to use some global variable. Or does it make sense to somehow write a class for this?
This is overly complicated. Instead, use memoization:
    def memoized(f):
        res = []
        def resf():
            if len(res) == 0:
                res.append(f())
            return res[0]
        return resf
and then simply

    @memoized
    def f():
        # expensive calculation here ...
        return calculated_value
In Python 3, you can replace memoized with functools.lru_cache.
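For example, a rough sketch with the standard-library decorator (available since Python 3.2; the function body here is just a stand-in for the expensive calculation):

    from functools import lru_cache

    @lru_cache(maxsize=None)  # caches the single result of the zero-argument call
    def f():
        # expensive calculation here ...
        return sum(i * i for i in range(10**6))

    f()  # computed once
    f()  # returned from the cache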
Simply add global f at the beginning of the f function; otherwise Python creates a local f variable.
Changing f inside f's scope doesn't affect anything outside the function; if you want to change f, you could use global:

    >>> def f():
    ...     print(1)
    ...     global f
    ...     f = lambda: print(2)
    ...
    >>> f()
    1
    >>> f()
    2
    >>> f()
    2
You can use memoization and decoration to cache the result. See an example here. A separate question on memoization that might prove useful can be found here.
What you're describing is the kind of problem caching was invented for. Why not just have a buffer to hold the result? Before doing the expensive calculation, check whether the buffer is already filled: if so, return the buffered result; otherwise, execute the calculation, fill the buffer, and then return the result. No need to go all fancy with self-modifying code for this.
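A minimal sketch of that buffering idea, assuming the result does not depend on any arguments (the names and the stand-in calculation are illustrative):

    _buffer = None  # holds the single expensive result

    def expensive_calculation():
        return sum(i * i for i in range(10**6))

    def f():
        global _buffer
        if _buffer is None:  # buffer not filled yet
            _buffer = expensive_calculation()
        return _buffer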

Is it a good idea to have syntax sugar for function composition in Python?

Some time ago I looked over the Haskell docs and found its function composition operator really nice, so I implemented this tiny decorator:
    from functools import partial

    class _compfunc(partial):
        def __lshift__(self, y):
            f = lambda *args, **kwargs: self.func(y(*args, **kwargs))
            return _compfunc(f)

        def __rshift__(self, y):
            f = lambda *args, **kwargs: y(self.func(*args, **kwargs))
            return _compfunc(f)

    def composable(f):
        return _compfunc(f)

    @composable
    def f1(x):
        return x * 2

    @composable
    def f2(x):
        return x + 3

    @composable
    def f3(x):
        return (-1) * x

    @composable
    def f4(a):
        return a + [0]

    print((f1 >> f2 >> f3)(3))   # -9
    print((f4 >> f1)([1, 2]))    # [1, 2, 0, 1, 2, 0]
    print((f4 << f1)([1, 2]))    # [1, 2, 1, 2, 0]
The problem: without language support we can't use this syntax on built-in functions or lambdas, like this:

    ((lambda x: x + 3) >> abs)(2)
The question: is it useful? Is it worth discussing on the Python mailing list?
IMHO: no, it's not. While I like Haskell, this just doesn't seem to fit in Python. Instead of (f1 >> f2 >> f3) you can do compose(f1, f2, f3) and that solves your problem -- you can use it with any callable without any overloading, decorating or changing the core (IIRC somebody already proposed functools.compose at least once; I can't find it right now).
Besides, the language definition is frozen right now, so they will probably reject that kind of change anyway -- see PEP 3003.
Function composition isn't a super-common operation in Python, especially not in a way that makes a composition operator clearly needed. If something were added, I am not certain I like the choice of << and >> for Python, which are not as obvious to me as they seem to be to you.
I suspect a lot of people would be more comfortable with a compose function, whose order is not problematic: compose(f, g)(x) would mean f(g(x)), the same order as ∘ in math and . in Haskell. Python tries to avoid using punctuation when English words will do, especially when the special characters don't have widely known meaning. (Exceptions are made for things that seem too useful to pass up, such as @ for decorators (with much hesitation) and * and ** for function arguments.)
If you do choose to send this to python-ideas, you'll probably win over a lot more people if you can find some instances in the stdlib or popular Python libraries where function composition could have made code clearer, easier to write, more maintainable, or more efficient.
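A possible sketch of such a compose helper under that convention (this is not a stdlib function; the name and implementation are illustrative):

    from functools import reduce

    def compose(*funcs):
        """Compose right to left: compose(f, g, h)(x) == f(g(h(x)))."""
        return reduce(lambda f, g: lambda *args, **kwargs: f(g(*args, **kwargs)), funcs)

    # works with lambdas and built-ins alike, no decoration needed
    print(compose(abs, lambda x: x + 3)(-10))  # abs(-10 + 3) == 7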
You can do it with reduce, although the order of calls is left-to-right only:

    from functools import reduce  # needed in Python 3

    def f1(a):
        return a + 1

    def f2(a):
        return a + 10

    def f3(a):
        return a + 100

    def call(a, f):
        return f(a)

    reduce(call, (f1, f2, f3), 5)
    # 5 -> f1 -> f2 -> f3 -> 116

    reduce(call, ((lambda x: x + 3), abs), 2)
    # 5
I don't have enough experience with Python to have a view on whether a language change would be worthwhile. But I wanted to describe the options available with the current language.
To avoid creating unexpected behavior, function composition should ideally follow the standard math (or Haskell) order of operations, i.e., f ∘ g ∘ h should mean apply h, then g, then f.
If you want to use an existing operator in Python, say <<, as you mention you'd have a problem with lambdas and built-ins. You can make your life easier by defining the reflected version __rlshift__ in addition to __lshift__. With that, lambdas/built-ins adjacent to composable objects would be taken care of. When you do have two adjacent lambdas/built-ins, you'll need to explicitly convert (just one of) them with composable, as @si14 suggested. Note I really mean __rlshift__, not __rshift__; in fact, I would advise against using __rshift__ at all, since the order change is confusing despite the directional hint provided by the shape of the operator.
But there's another approach that you may want to consider. Ferdinand Jamitzky has a great recipe for defining pseudo infix operators in Python that work even on built-ins. With this, you can write f |o| g for function composition, which actually looks very reasonable.
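A rough sketch of how such a pseudo-infix operator can work (adapted from the general idea of that recipe; the operator name o and the composition order are illustrative):

    class Infix:
        """Wrap a two-argument function so it can be used as x |op| y."""
        def __init__(self, func):
            self.func = func

        def __ror__(self, left):    # handles `left | op`
            return Infix(lambda right: self.func(left, right))

        def __or__(self, right):    # handles `op | right`
            return self.func(right)

    # composition in math order: (f |o| g)(x) == f(g(x))
    o = Infix(lambda f, g: lambda x: f(g(x)))

    h = abs |o| (lambda x: x + 3)   # works on built-ins and lambdas alike
    print(h(-10))                   # abs(-10 + 3) == 7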
