Function closure performance - python

I thought I would improve performance by replacing this code:
def f(a, b):
    return math.sqrt(a) * b
result = []
a = 100
for b in range(1000000):
    result.append(f(a, b))
with:
def g(a):
    def f(b):
        return math.sqrt(a) * b
    return f
result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so math.sqrt(a) would be repeated just once instead of 1000000 times.
Is my understanding always correct, or always incorrect, or correct/incorrect depending on the implementation?
I noticed that the code object for func is built (at least in CPython) before runtime, and is immutable. The code object then seems to use global environment to achieve the closure. This seems to suggest that the optimization I hoped for does not happen.

I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so
math.sqrt(a) would be repeated just once instead of 1000000 times.
That assumption is wrong. A closure only captures variable bindings; in your case it captures the variable a, and nothing more happens behind the scenes. The expression math.sqrt(a) is still evaluated every time f is called.
After all, it has to be computed every time because the interpreter doesn't know that sqrt is "pure" (the return value is only dependent on the argument and no side-effects are performed). Optimizations like the ones you expect are practical in functional languages (referential transparency and static typing help a lot here), but would be very hard to implement in Python, which is an imperative and dynamically typed language.
That said, if you want to precompute the value of math.sqrt(a), you need to do that explicitly:
def g(a):
    s = math.sqrt(a)
    def f(b):
        return s * b
    return f
Or using lambda:
def g(a):
    s = math.sqrt(a)
    return lambda b: s * b
Now that g really returns a function with 1 parameter, you have to call the result with only one argument.
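To see the difference for yourself, you can count how often the square root is actually taken. This is only a sketch: counted_sqrt, counter, g_naive and g_hoisted are names I made up for the illustration, and the loop is shortened to 1000 iterations.

```python
import math

# Hypothetical counter, used only to make the repeated evaluation visible.
counter = {"calls": 0}

def counted_sqrt(x):
    """Wrap math.sqrt and record every call."""
    counter["calls"] += 1
    return math.sqrt(x)

def g_naive(a):
    def f(b):
        return counted_sqrt(a) * b   # evaluated on every call of f
    return f

def g_hoisted(a):
    s = counted_sqrt(a)              # evaluated once, when g_hoisted runs
    def f(b):
        return s * b
    return f

naive = g_naive(100)
naive_results = [naive(b) for b in range(1000)]
naive_calls = counter["calls"]

counter["calls"] = 0
hoisted = g_hoisted(100)
hoisted_results = [hoisted(b) for b in range(1000)]
hoisted_calls = counter["calls"]

print(naive_calls, hoisted_calls)
```

The closure version calls the square root once per call of f, while the explicitly precomputed version calls it once in total.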

The code is not evaluated statically; the code inside the function is still calculated each time. The function object contains all the byte code which expresses the code in the function; it doesn't evaluate any of it. You could improve matters by calculating the expensive value once:
def g(a):
    root_a = math.sqrt(a)
    def f(b):
        return root_a * b
    return f
result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
Naturally, in this trivial example, you could improve performance much more:
a = 100
root_a = math.sqrt(a)
result = [root_a * b for b in range(1000000)]
But I presume you're working with a more complex example than that where that doesn't scale?

As usual, the timeit module is your friend. Try some things and see how it goes. If you don't care about writing ugly code, this might help a little as well:
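def g(a):
    def f(b, _local_func=math.sqrt):
        return _local_func(a) * b
    return f
CPython pays a small cost every time it looks up a global name: math is found through a dictionary lookup in the module globals, and sqrt through an attribute lookup on the module. Binding the function to a default argument turns it into a local variable, which is accessed faster, so you can shave off a little time.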
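A rough way to compare the variants from this thread with timeit is sketched below. The names g_global, g_local and g_hoisted, and the loop sizes, are my own choices for the illustration. The absolute numbers depend on your machine and Python version, so measure rather than trust any particular ratio.

```python
import math
import timeit

def g_global(a):
    def f(b):
        return math.sqrt(a) * b        # global lookup + attribute lookup per call
    return f

def g_local(a):
    def f(b, _local_func=math.sqrt):   # sqrt bound once, as a default argument
        return _local_func(a) * b
    return f

def g_hoisted(a):
    root_a = math.sqrt(a)              # square root computed once
    def f(b):
        return root_a * b
    return f

timings = {}
for name, factory in [("global", g_global), ("local", g_local), ("hoisted", g_hoisted)]:
    func = factory(100)
    # Time 20 runs of a 1000-element loop for each variant.
    timings[name] = timeit.timeit(lambda: [func(b) for b in range(1000)], number=20)

for name, seconds in timings.items():
    print(name, seconds)
```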

Related

Intermediate results from recursion

I have a problem where I need to produce something which is naturally computed recursively, but where I also need to be able to interrogate the intermediate steps in the recursion if needed.
I know I can do this by passing and mutating a list or similar structure. However, this looks ugly to me and I'm sure there must be a neater way, e.g. using generators. What I would ideally love to be able to do is something like:
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
in an efficient way. While my solution is not performance critical, I can't justify the massive amount of redundant effort in that first line. It looks to me like a generator would be perfect for this except for the fact that f is fundamentally much more suited to recursion in my case (which at least in my mind is the complete opposite of a generator, but maybe I'm just not thinking far enough outside of the box).
Is there a neat Pythonic way of doing something like this that I just don't know about, or do I just need to just capitulate and pollute my function f by passing it an intermediate_results list which I then mutate as a side-effect?
I have a generic solution for you using a decorator. We create a Memoize class which stores the results of previous calls to the function (including recursive calls). If the arguments given have already been seen, the cached result is looked up instead of being recomputed.
The custom class has the benefit over lru_cache that you can see the stored results.
from functools import wraps
class Memoize:
    def __init__(self):
        self.store = {}
    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            if args not in self.store:
                self.store[args] = fun(*args)
            return self.store[args]
        return wrapper
m = Memoize()
@m.save
def fibo(n):
    if n <= 0: return 0
    elif n == 1: return 1
    else: return fibo(n-1) + fibo(n-2)
Then after running different things you can see what the cache contains. When you run future function calls, m.store will be used as a lookup so calculation doesn't need to be redone.
>>> fibo(8)
21
>>> m.store
{(1,): 1,
(0,): 0,
(2,): 1,
(3,): 2,
(4,): 3,
(5,): 5,
(6,): 8,
(7,): 13,
(8,): 21}
You could modify the save function to use the name of the function and the args as the key, so that multiple function results can be stored in the same Memoize class.
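A minimal sketch of that modification, assuming the function's __name__ is a good enough key (two different functions with the same name would collide). The square function is only a made-up second example.

```python
from functools import wraps

class Memoize:
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            # Key on (function name, arguments) so several functions
            # can share one store without clashing.
            key = (fun.__name__, args)
            if key not in self.store:
                self.store[key] = fun(*args)
            return self.store[key]
        return wrapper

m = Memoize()

@m.save
def fibo(n):
    # Assumes n >= 0.
    return n if n < 2 else fibo(n - 1) + fibo(n - 2)

@m.save
def square(n):
    # Hypothetical second function, only here to show the shared store.
    return n * n

print(fibo(8), square(4))
print(m.store[("fibo", (8,))], m.store[("square", (4,))])
```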
You can keep your existing solution that makes many "redundant" calls to f, but use function caching to save the results of previous calls to f.
In other words, when f(x1) is called, its input arguments and corresponding return value are saved, and the next time it is called with the same arguments, the result is simply pulled from the cache.
See functools.lru_cache for the standard library solution to this, i.e.:
from functools import lru_cache
@lru_cache(maxsize=None)  # unbounded, so no intermediate result is evicted
def f(x):
    ...  # your recursive definition goes here
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
Note, however, that f must be a pure function (no side effects, and the same arguments always produce the same result) and its arguments must be hashable for this to work properly.
Having considered your comments, I'll now try to give another perspective on the problem.
So, let's consider a concrete example:
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0
def g(x):
    b = 1
    return h(x) - b
def h(x):
    c = 1/2
    return f(x-1)*(1+c)
I
First of all, it should be mentioned that (in our particular case) the algorithm has the form f(x) = p(f(x - 1)) for some p. It follows that f(x) = p^x(f(0)) = p^x(0), where p^x means p applied x times. So we just apply p to 0 x times to get the desired result, which is an iterative process that can be written without recursion. I believe your real case is much harder, though, and it would be too boring and uninformative to stop here.
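For the concrete f, g, h above, unfolding the three functions gives p(y) = y*(1 + c) - b + a, so the whole list of intermediate results can be built in one pass. This is a sketch for this particular example only; the helper name intermediate_values is mine.

```python
def intermediate_values(T, a=2, b=1, c=0.5):
    """Return [f(0), f(1), ..., f(T-1)] for the example recurrence.

    Each step applies p(y) = y*(1 + c) - b + a to the previous value,
    which is what f -> g -> h -> f does when unfolded.
    """
    if T <= 0:
        return []
    values = [0]                      # f(0) = 0
    for _ in range(1, T):
        values.append(values[-1] * (1 + c) - b + a)
    return values

intermediate_results = intermediate_values(5)
final_result = intermediate_results[-1]
print(intermediate_results)  # [0, 1.0, 2.5, 4.75, 8.125]
```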
II
Generally speaking, we can divide all possible solutions into two groups: the ones that require refactoring (i.e. rewriting functions f, g, h) and the ones that do not. I have little to offer from the latter one (and I don't think anyone can). Consider the following, however:
def fk(x, k):
    a = 2
    return k(gk(x, k) + a if x != 0 else 0)
def gk(x, k):
    b = 1
    return k(hk(x, k) - b)
def hk(x, k):
    c = 1/2
    return k(fk(x-1, k)*(1+c))
def printret(x):
    print(x)
    return x
fk(4, printret) # see what happens
Inspired by continuation-passing style, but that's totally not it.
What's the point? It's something between your idea of passing a list to write down all the computations and memoizing. This k carries additional behavior with it, such as printing or writing to a list (you can make a function that writes to some list, why not?). But if you look carefully you'll see that it leaves the inner code of these functions practically untouched (only the input and output of each function are affected), so one can produce a decorator built from a function like printret that does essentially the same thing for f, g, h.
Pros: no need to modify code, much more flexible than passing a list, no additional work (like in memoizing).
Cons: impure (printing or modifying something), not as flexible as we would like.
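The decorator idea could look roughly like this. It is a sketch, not the only way to do it: record_into and log are names I invented, and I chose to record (function name, argument, result) tuples instead of printing.

```python
from functools import wraps

def record_into(log):
    """Build a decorator that appends (name, argument, result) to log."""
    def decorator(func):
        @wraps(func)
        def wrapper(x):
            result = func(x)
            log.append((func.__name__, x, result))
            return result
        return wrapper
    return decorator

log = []

# The bodies of f, g, h are exactly the original ones; only the
# decorator line is new.
@record_into(log)
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

@record_into(log)
def g(x):
    b = 1
    return h(x) - b

@record_into(log)
def h(x):
    c = 1/2
    return f(x - 1) * (1 + c)

final = f(2)
for entry in log:
    print(entry)
```

Entries appear in the order the calls finish, so the innermost call f(0) is recorded first and the outermost f(2) last.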
III
Now let's see how modifying function bodies can help. Don't be afraid of what's written below, take your time and play with that thing a little.
class Logger:
    def __init__(self, lst, cur_val):
        self.lst = lst
        self.cur_val = cur_val
    def bind(self, f):
        res = f(self.cur_val)
        return Logger([self.cur_val] + res.lst + self.lst, res.cur_val)
    def __repr__(self):
        return "Logger( " + repr({'value' : self.cur_val,'lst' : self.lst}) + " )"
def unit(x):
    return Logger([], x)
# you can also play with lala
def lala(x):
    if x <= 0:
        return unit(1)
    else:
        return lala(x - 1).bind(lambda y: unit(2*y))
def f(x):
    a = 2
    if x == 0:
        return unit(0)
    else:
        return g(x).bind(lambda y: unit(y + a))
def g(x):
    b = 1
    return h(x).bind(lambda y: unit(y - b))
def h(x):
    c = 1/2
    return f(x-1).bind(lambda y: unit(y*(1+c)))
f(4) # see for yourself
Logger is what is called a monad. I'm not very familiar with this concept myself, but I believe I'm using it correctly. f, g, h are functions that take a number and return a Logger instance. Logger's bind takes a function (like f) and returns a Logger with the new value (computed by that function) and updated 'logs'. The key point, as I see it, is the ability to do whatever we want with the collected values in the order in which the resulting value was calculated.
Afterword
I'm by no means a 'guru' of functional programming, and I believe I'm missing a lot of things here. But what I've understood is that functional programming is about inverting the flow of the program. That's why, for instance, I totally agree with your opinion about generators being opposed to functional programming. When we use a generator gen in, say, a function func, we yield values one by one to func, and func does something with them, e.g. in a loop. The functional approach would be to make gen a function taking func as a parameter and have func perform computations on the 'yielded' values. It's as if gen and func exchanged places, so the flow is inverted. There are plenty of other ways of inverting the flow, and monads are one of them.
itertools.islice takes an iterable, a start value and a stop value, and gives you the elements between start and stop as an iterator. Note that start and stop must be passed positionally; islice does not accept keyword arguments. If islice is not clear, you can check the docs here: https://docs.python.org/3/library/itertools.html
import itertools
intermediate_result = map(f, range(T))
final_result = next(itertools.islice(intermediate_result, T-1, T))
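A small runnable version of the islice idea, with start and stop passed positionally. The body of f here is only a placeholder I made up so the snippet runs; substitute your own function.

```python
import itertools

def f(x):
    # Hypothetical stand-in for the real function from the question.
    return x * x

T = 5
intermediate_result = map(f, range(T))          # lazy: nothing computed yet
# Skip the first T-1 items and take the item at index T-1.
final_result = next(itertools.islice(intermediate_result, T - 1, T))
print(final_result)  # 16
```

Keep in mind that this consumes the map iterator, so the earlier intermediate values are not kept unless you store them yourself.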

Is there a way to change the argument name of a nested function?

The easiest way to illustrate my question is through an example. Let's say I want to write a function function(a=None, b=None, c=None). This function will only take two arguments at a time, and the third one will be left blank. For any two arguments given to the function, it will return the missing one such that a+b=c. So for example function(a=5, c=15) would return 10, and function(a=5, b=10) would return 15. Now for the sake of the argument let's say that a could not be written as a function of b and c, or that it's simply too complicated to find a closed-form solution (this is clearly not the case here, because to find a I could simply say a = c-b). Anyway, if I were to write such a function I'd do something like this:
#import a root finder
from scipy.optimize import newton
def function(a= None, b= None, c = None):
    #find the missing parameter:
    vars = [a,b,c]
    comp = vars.index(None)
    if comp == 0:
        def aux_fun(a):
            return (a+b-c)
    elif comp == 1:
        def aux_fun(b):
            return (a+b-c)
    else:
        def aux_fun(c):
            return (a+b-c)
    return newton(aux_fun, 0)
I have not found a solution to this other than writing 3 different functions and calling the correct one in newton. This works for this small example, but if I have a bigger problem, let's say with 100 variables, writing 100 functions is not pretty.
My question is: is there any way to write aux_fun only once and change its parameter based on the missing parameter of function?
Thanks a lot for your answers!
I haven't fully grasped what you are trying to do, but I don't think you quite understand how newton works.
optimize.newton is going to call your function with
func(x, a, b, c ...)
where x is a scalar or array of the size of the initial value, the x0. The other arguments are passed via the args tuple. This pattern of passing an iteration variable and args to the function is widely used in these scipy.optimize functions. I've answered a number of questions regarding these arguments.
newton(func, x0, args=(a,b,c))
These aren't keyword arguments.
Read and experiment with the examples in the docs. People often mess up the args tuple. And then explore a few small examples of your own before seriously trying to do this 'renaming'.
edit
This might work - I haven't tested it:
def function(a= None, b= None, c = None):
    #find the missing parameter:
    vars = [a,b,c]
    comp = vars.index(None)
    def aux_fun(x):
        vars[comp] = x
        a,b,c = vars
        return (a+b-c)
    return newton(aux_fun, 0)
What you want is impossible. Given a generic function f(a, b, c, d) = 0, there is no way to generically turn it into a set of functions a = f1(b, c, d), b = f2(a, c, d), etc. You are entering the realm of symbolic computation. You would need your code to understand trigonometry, algebra, calculus, exponentiation and logarithms.
Updating based on comments below.
So you want something like:
def aux_func(x):
    vars[comp] = x
    return f(*vars)
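Putting the pieces together, a self-contained sketch could look like the following. To keep it runnable without SciPy I swapped in a tiny hand-written secant solver; it is only a stand-in for scipy.optimize.newton, not its implementation. The names secant, solve_missing and equation are made up for this illustration.

```python
def secant(func, x0, tol=1e-12, max_iter=50):
    """Very small secant-method root finder (stand-in for a real solver)."""
    x1 = x0 + 1.0
    f0, f1 = func(x0), func(x1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:   # converged, or flat: stop
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, func(x1)
    return x1

def solve_missing(equation, values):
    """values holds every argument of equation, with exactly one None.

    Returns the value for the missing argument that makes
    equation(*values) equal to zero.
    """
    values = list(values)
    comp = values.index(None)
    def aux_fun(x):
        # Write the trial value into the missing slot, whichever it is.
        trial = values[:comp] + [x] + values[comp + 1:]
        return equation(*trial)
    return secant(aux_fun, 0.0)

def equation(a, b, c):
    return a + b - c

print(solve_missing(equation, [5, None, 15]))   # solves for b
print(solve_missing(equation, [5, 10, None]))   # solves for c
```

aux_fun is written once and works for any number of arguments, because the position of the missing one is found with index(None) instead of being hard-coded in the parameter name.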

Pure functions optimization

A pure function is a function whose return value is the same for the same arguments and that doesn't have any side effects.
Does CPython recognize that the return value will be the same and make an optimization calling the function only once? If not, do other python interpreters do it?
Below I wrote an example using os.path.join, assuming it's a pure function (I don't actually know its implementation), but the question extends to all pure functions.
dirpath = "C:\\Users"
dirname = "Username"
mylist = [["another\\directory", 6], ["C:\\Users\\Username", 8], ["foo\\bar", 3]]
count = 0
for pair in mylist:
    if os.path.join(dirpath, dirname) == pair[0]:
        count = pair[1]
dirpath and dirname aren't modified inside the for loop. Given the same input, os.path.join always has the same return value.
The standard python implementation does almost no optimization of user code.
However you can use the lru cache decorator on pure functions to get the functionality that you want.
from functools import lru_cache
def fib(n):
    """
    Calculate the n'th Fibonacci number
    With exponential runtime (every call spawns two more calls)
    """
    if n < 2: return n
    return fib(n-1) + fib(n-2)
@lru_cache(maxsize=None)
def fib2(n):
    """
    Calculate the n'th Fibonacci number
    With O(N) <linear> runtime
    """
    if n < 2: return n
    return fib2(n-1) + fib2(n-2)
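You can check that the cache is doing its job with cache_info(), which every lru_cache-wrapped function provides. A minimal sketch, repeating fib2 so the snippet runs on its own:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib2(n):
    if n < 2:
        return n
    return fib2(n - 1) + fib2(n - 2)

value = fib2(30)
info = fib2.cache_info()
# misses = number of times the body actually ran (once per distinct n),
# hits = number of calls answered from the cache.
print(value, info.hits, info.misses)
```

Each distinct argument from 0 to 30 runs the function body exactly once; every other call is answered from the cache.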
Strictly speaking, Python does not have pure functions. It is well-defined to modify the meaning of a function at any time.
>>> def add(a, b): return a + b
>>> def sub(a, b): return a - b
>>> add(10, 5)
15
>>> add.__code__ = sub.__code__
>>> add(10, 5)
5
In addition, it is possible to change the builtins, globals and closures that a function accesses.
The reference implementation CPython makes no optimisations based on the purity of functions.
The implementation PyPy uses a tracing JIT which is capable of pure optimisations. Note that this applies to low-level operations (not necessarily entire functions) and only in often-used code.
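Since CPython will not hoist the call for you, the usual fix for the loop in the question is to do it by hand. A sketch: I use ntpath here instead of os.path only so the Windows-style paths behave the same on any platform, which is my assumption and not part of the question. The name target is also mine.

```python
import ntpath  # Windows flavour of os.path, available on every platform

dirpath = "C:\\Users"
dirname = "Username"
mylist = [["another\\directory", 6], ["C:\\Users\\Username", 8], ["foo\\bar", 3]]

# Compute the loop-invariant value once, outside the loop.
target = ntpath.join(dirpath, dirname)

count = 0
for pair in mylist:
    if target == pair[0]:
        count = pair[1]

print(count)  # 8
```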

Creating a copy of a function with some vars fixed

Assume I have a function
def multiply_by(x, multiplier):
    return x * multiplier
How can I create a copy of that function and fix the multiplier in that function?
multiply_by_5 = multiply_by? <-- here I need python magic
such that multiply_by_5 would have only one argument x and the multiplier would be 5? So that
multiply_by_5(2)
10
Is there a way in Python 2.7 to do that?
You can use functools.partial with keyword argument:
>>> def multiply_by(x, multiplier):
...     return x * multiplier
...
>>> from functools import partial
>>> multiply_by_5 = partial(multiply_by, multiplier=5)
>>> multiply_by_5(2)
10
functools.partial is made exactly for this.
you can use it like
import functools
multiply_by_5=functools.partial(multiply_by,multiplier=5)
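One thing worth knowing about the resulting object: a partial remembers the original function and the fixed arguments in its func and keywords attributes, and a keyword fixed this way can still be overridden at call time. A short sketch:

```python
from functools import partial

def multiply_by(x, multiplier):
    return x * multiplier

multiply_by_5 = partial(multiply_by, multiplier=5)

print(multiply_by_5(2))                 # 10
print(multiply_by_5.func.__name__)      # the wrapped function's name
print(multiply_by_5.keywords)           # the fixed keyword arguments
print(multiply_by_5(2, multiplier=3))   # keyword overridden for this call: 6
```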
As suggested by @niemmi's answer, functools.partial is probably the way to go.
However, similar work can be done using curried functions:
def multiply_by(multiplier):
    def multiply(x):
        return multiplier * x
    return multiply
>>> multiply_by_5 = multiply_by(5) # no magic
>>> multiply_by_5(2)
10
Or using the lambda syntax:
def multiply_by(multiplier):
    return lambda x: multiplier * x
Note that partial is more succinct, more efficient, and more directly expresses your intent in a standard way. The above technique is an example of the concept called a closure, which means that a function defined in an inner scope may refer to variables defined in enclosing scopes and "close" over them, remembering them and even mutating them.
Since this technique is more general, it might take the reader of your code more time to understand what exactly do you mean in your code, since your code may be arbitrarily complicated.
Specifically for multiplication (and other operators) partial can be combined with operator.mul:
>>> import functools, operator
>>> multiply_by_5 = functools.partial(operator.mul, 5)
>>> multiply_by_5(2)
10
Here's an alternative that doesn't use functools.partial. Instead we define a function inside a function. The inner function "remembers" any of the local variables of the outer function that it needs (including the outer function's arguments). The magic that makes this happen is called closure.
def multiply_factory(multiplier):
    def fixed_multiply(x):
        return x * multiplier
    return fixed_multiply
multiply_by_3 = multiply_factory(3)
multiply_by_5 = multiply_factory(5)
for i in range(5):
    print(i, multiply_by_3(i), multiply_by_5(i))
output
0 0 0
1 3 5
2 6 10
3 9 15
4 12 20
If you want, you can use your existing multiply_by function in the closure, although that's slightly less efficient, due to the overhead of an extra function call. Eg:
def multiply_factory(multiplier):
    def fixed_multiply(x):
        return multiply_by(x, multiplier)
    return fixed_multiply
That can be written more compactly using lambda syntax:
def multiply_factory(multiplier):
    return lambda x: multiply_by(x, multiplier)
If you cannot change the multiply_by() function, the simplest and perhaps best way is probably
def multiply_by_5(x):
    return multiply_by(x, 5)
You can also use lambda if you really want a one-liner.
However, you may want to change your first function to
def multiply_by(x, multiplier = 5):
    return x * multiplier
Then you can do either of these:
print(multiply_by(4, 3))
12
print(multiply_by(2))
10

calculating current value based on previous value

I would like to perform a calculation using Python, where the current value (i) of the equation is based on the previous value of the equation (i-1). This is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list; I need to create it! My equation is shown below.
h=(2*b)-h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I evaluate the equation I'm referring to a value I haven't created yet, and if I set h=0 I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h=[]
    h=(2*b)-h[i-1]
    x+=1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1]) # As you add to h, you keep up with its tail
For a large list b (brr, a one-letter identifier), you can avoid creating a large slice:
from itertools import islice # For a big list this keeps the code less wasteful
for val in islice(b, 1, None):
    ....
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with something generator-like instead: a function that keeps its own state.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use thus
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
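A minimal sketch of that true-generator variant, assuming the same starting value h = 0. The name running_filter is mine.

```python
def running_filter(h=0):
    """Generator version: send() a sample b, get back the new h."""
    b = yield              # wait for the first sample
    while True:
        h = 2 * b - h      # h_i = 2*b_i - h_(i-1)
        b = yield h        # hand back h, then wait for the next sample

flt = running_filter()
next(flt)                  # prime the generator up to the first yield

outputs = [flt.send(2), flt.send(3), flt.send(2)]
print(outputs)  # [4, 2, 2]
```

The next() call is needed once before the first send(), because a freshly created generator has not yet reached a yield that can receive a value.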
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i? Or do you just want h[i]?
If you just need the i-th value you could use recursion:
def get_h(i):
    if i>0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth. (Thanks for pointing this out kdopen) In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i%2 == 0:
        return h_0
    else:
        return (2*b)-h_0
In both cases h_0 is the initial value that you start out with.
h = []
for i in range(len(b)):
    if i>0:
        h.append(2*b[i] - h[i-1])
    else:
        # handle i=0 case here, e.g. seed with the first sample
        h.append(b[0])
You are successively applying a function (equation) to the result of a previous application of that function, so the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """ yields f(x), f(f(x)), f(f(f(x))) ....
    for each f in functions
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b = b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print(list(composition(functions, seed)))
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """ Returns only the final value ...h(g(f(x)))
    after applying each function in functions in turn
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
    return x
print(composition(functions, seed))
Or more generally, with no limitations on call signature:
from functools import reduce  # reduce is not a builtin in Python 3
def compose(funcs):
    '''Return a callable composed of successive application of functions
    funcs is an iterable producing callables
    for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)
def plus2(x):
    return x + 2
def times2(x):
    return x * 2
def mod16(x):
    return x % 16
funcs = (mod16, plus2, times2)
eq = compose(funcs) # mod16(plus2(times2(x)))
print(eq(15))
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&As.
