Pure function optimization - Python

A pure function is a function whose return value is the same for the same arguments and that doesn't have any side effects.
Does CPython recognize that the return value will be the same and optimize by calling the function only once? If not, do other Python interpreters do it?
Below I wrote an example using os.path.join, assuming it's a pure function (I don't actually know its implementation), but the question extends to all pure functions.
import os

dirpath = "C:\\Users"
dirname = "Username"
mylist = [["another\\directory", 6], ["C:\\Users\\Username", 8], ["foo\\bar", 3]]
count = 0
for pair in mylist:
    if os.path.join(dirpath, dirname) == pair[0]:
        count = pair[1]
dirpath and dirname aren't modified inside the for loop. Given the same input, os.path.join always has the same return value.

The standard Python implementation (CPython) does almost no optimization of user code.
However, you can use the functools.lru_cache decorator on pure functions to get the behaviour you want.
from functools import lru_cache

def fib(n):
    """
    Calculate the n'th Fibonacci number
    with exponential runtime (no caching)
    """
    if n < 2: return n
    return fib(n-1) + fib(n-2)
@lru_cache(maxsize=None)
def fib2(n):
    """
    Calculate the n'th Fibonacci number
    with O(N) <linear> runtime thanks to caching
    """
    if n < 2: return n
    return fib2(n-1) + fib2(n-2)
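For example, after one call the cache holds every subproblem; cache_info() (attached to the wrapped function by functools.lru_cache) shows the hits and misses:
>>> fib2(30)
832040
>>> fib2.cache_info()
CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)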

Strictly speaking, Python does not have pure functions: the language explicitly allows the meaning of a function to be changed at any time.
>>> def add(a, b): return a + b
>>> def sub(a, b): return a - b
>>> add(10, 5)
15
>>> add.__code__ = sub.__code__
>>> add(10, 5)
5
In addition, it is possible to change the builtins, globals and closures that a function accesses.
The reference implementation CPython makes no optimisations based on the purity of functions.
The implementation PyPy uses a tracing JIT which is capable of optimisations based on purity. Note that this applies to low-level operations (not necessarily entire functions) and only in often-used code.
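Given that, the practical fix for the loop in the question is simply to hoist the call out of the loop yourself; a minimal sketch:
joined = os.path.join(dirpath, dirname)   # computed once, outside the loop
for pair in mylist:
    if joined == pair[0]:
        count = pair[1]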

Intermediate results from recursion

I have a problem where I need to produce something which is naturally computed recursively, but where I also need to be able to interrogate the intermediate steps in the recursion if needed.
I know I can do this by passing and mutating a list or similar structure. However, this looks ugly to me and I'm sure there must be a neater way, e.g. using generators. What I would ideally love to be able to do is something like:
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
in an efficient way. While my solution is not performance critical, I can't justify the massive amount of redundant effort in that first line. It looks to me like a generator would be perfect for this except for the fact that f is fundamentally much more suited to recursion in my case (which at least in my mind is the complete opposite of a generator, but maybe I'm just not thinking far enough outside of the box).
Is there a neat Pythonic way of doing something like this that I just don't know about, or do I just need to just capitulate and pollute my function f by passing it an intermediate_results list which I then mutate as a side-effect?
I have a generic solution for you using a decorator. We create a Memoize class which stores the results of previous executions of the function (including recursive calls). If the arguments given have already been seen, the cached result is used for a quick lookup.
The custom class has an advantage over lru_cache in that you can inspect the cached results.
from functools import wraps

class Memoize:
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            if args not in self.store:
                self.store[args] = fun(*args)
            return self.store[args]
        return wrapper

m = Memoize()

@m.save
def fibo(n):
    if n <= 0: return 0
    elif n == 1: return 1
    else: return fibo(n-1) + fibo(n-2)
Then after running different things you can see what the cache contains. When you run future function calls, m.store will be used as a lookup so calculation doesn't need to be redone.
>>> fibo(8)
21
>>> m.store
{(1,): 1,
(0,): 0,
(2,): 1,
(3,): 2,
(4,): 3,
(5,): 5,
(6,): 8,
(7,): 13,
(8,): 21}
You could modify the save function to use the name of the function and the args as the key, so that multiple function results can be stored in the same Memoize class.
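A minimal sketch of that modification (only the save method changes; the cache key now includes the function's name):
# replacement for Memoize.save
def save(self, fun):
    @wraps(fun)
    def wrapper(*args):
        key = (fun.__name__,) + args   # key by function name plus arguments
        if key not in self.store:
            self.store[key] = fun(*args)
        return self.store[key]
    return wrapper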
You can use your existing solution that makes many "redundant" calls to f, but employ function caching to save the results of previous calls to f.
In other words, when f(x1) is called, its input arguments and corresponding return value are saved, and the next time it is called with the same arguments, the result is simply pulled from the cache.
See functools.lru_cache for the standard-library solution to this, i.e.:
from functools import lru_cache

@lru_cache(maxsize=None)
def f(x):
    ...  # your existing (pure) function body

intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
Note, however, f must be a pure function (no side-effects, 1-to-1 mapping) for this to work properly
Having considered your comments, I'll now try to give another perspective on the problem.
So, let's consider a concrete example:
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

def g(x):
    b = 1
    return h(x) - b

def h(x):
    c = 1/2
    return f(x-1)*(1+c)
I
First of all, it should be mentioned that (in our particular case) the algorithm has the form f(x) = p(f(x - 1)) for some p. It follows that f(x) = p^x(f(0)) = p^x(0). That means we should just apply p to 0 x times to get the desired result, which can be done in an iterative process, so this can be written without recursion. Though I believe that your real case is much harder. Moreover, it would be too boring and uninformative to stop here.
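A minimal sketch of that iterative form, with the single step p inlined from the f, g, h above (p(v) = (v*(1+1/2) - 1) + 2 = 1.5*v + 1):
def f_iterative(x):
    value = 0                      # f(0)
    for _ in range(x):
        value = 1.5 * value + 1    # apply p once
    return value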
II
Generally speaking, we can divide all possible solutions into two groups: the ones that require refactoring (i.e. rewriting functions f, g, h) and the ones that do not. I have little to offer from the latter group (and I don't think anyone does). Consider the following, however:
def fk(x, k):
    a = 2
    return k(gk(x, k) + a if x != 0 else 0)

def gk(x, k):
    b = 1
    return k(hk(x, k) - b)

def hk(x, k):
    c = 1/2
    return k(fk(x-1, k)*(1+c))

def printret(x):
    print(x)
    return x

fk(4, printret)  # see what happens
Inspired by continuation-passing style, but that's totally not it.
What's the point? It's something between your idea of passing a list to write down all the computations and memoizing. This k carries additional behaviour with it, such as printing or writing to a list (you can make a function that writes to some list, why not?). But if you look carefully you'll see that it leaves the inner code of these functions practically untouched (only each function's input and output are affected), so one can produce a decorator associated with a function like printret that does essentially the same thing for f, g, h; see the sketch after the pros and cons below.
Pros: no need to modify the code, much more flexible than passing a list, no additional work (unlike memoizing).
Cons: impure (printing or modifying something), not as flexible as we would like.
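A minimal sketch of such a decorator (hypothetical name with_k; it simply applies k to everything the decorated function returns):
def with_k(k):
    def decorate(fun):
        def wrapper(*args, **kwargs):
            return k(fun(*args, **kwargs))   # run k on every return value
        return wrapper
    return decorate

# decorating f, g and h with @with_k(printret) then gives the same trace
# as fk(4, printret) above, without threading k through by hand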
III
Now let's see how modifying function bodies can help. Don't be afraid of what's written below, take your time and play with that thing a little.
class Logger:
    def __init__(self, lst, cur_val):
        self.lst = lst
        self.cur_val = cur_val

    def bind(self, f):
        res = f(self.cur_val)
        return Logger([self.cur_val] + res.lst + self.lst, res.cur_val)

    def __repr__(self):
        return "Logger( " + repr({'value': self.cur_val, 'lst': self.lst}) + " )"

def unit(x):
    return Logger([], x)

# you can also play with lala
def lala(x):
    if x <= 0:
        return unit(1)
    else:
        return lala(x - 1).bind(lambda y: unit(2*y))

def f(x):
    a = 2
    if x == 0:
        return unit(0)
    else:
        return g(x).bind(lambda y: unit(y + a))

def g(x):
    b = 1
    return h(x).bind(lambda y: unit(y - b))

def h(x):
    c = 1/2
    return f(x-1).bind(lambda y: unit(y*(1+c)))

f(4)  # see for yourself
Logger is what is called a monad. I'm not very familiar with this concept myself, but I think the idea holds here. f, g, h are functions that take a number and return a Logger instance. Logger's bind takes a function (like the lambdas above) and returns a Logger with the new value and updated 'logs'. The key point, as I see it, is the ability to do whatever we want with the collected values in the order the resulting value was calculated.
Afterword
I'm not at all some kind of 'guru' of functional programming, and I believe I'm missing a lot of things here. But what I've understood is that functional programming is about inverting the flow of the program. That's why, for instance, I totally agree with your opinion about generators being opposed to functional programming. When we use a generator gen in, say, a function func, we yield values one by one to func and func does something with them in e.g. a loop. The functional approach would be to make gen a function taking func as a parameter and have func perform computations on the 'yielded' values. It's like gen and func exchanged their places. So the flow is inverted! And there are plenty of other ways of inverting the flow. Monads are one of them.
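A tiny illustration of that inversion (hypothetical gen/func names, just to make the idea concrete):
# generator style: func drives the loop and pulls values out of gen
def gen():
    for i in range(3):
        yield i * i

def func(values):
    for v in values:
        print(v)

func(gen())

# inverted style: gen drives the loop and pushes values into func
def gen_inverted(func):
    for i in range(3):
        func(i * i)

gen_inverted(print)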
itertools.islice takes an iterable, a start value and a stop value, and gives you the elements between the start and stop positions as an iterator. If islice is not clear you can check the docs here: https://docs.python.org/3/library/itertools.html
import itertools

intermediate_result = map(f, range(T))
final_result = next(itertools.islice(intermediate_result, T-1, T))

Setting parameters with decorators vs nested functions

I need to call a multi-parameter function many times while all but one parameter is fixed. I was thinking of using decorators:
# V1 - with @decorator
def dec_adder(num):
    def wrap(fun):
        def wrapped_fun(n1):
            return fun(n1, second_num=num)
        return wrapped_fun
    return wrap

@dec_adder(2)
def adder(first_num, second_num):
    return first_num + second_num

print(adder(5))
# 7
But this seems confusing since it appears to be calling a 2-parameter function, adder with only one argument.
Another approach is to use a nested function definition that uses local variables from the parent function:
# V2 - without @decorator
def add_wrapper(num):
    def wrapped_adder(num_2):
        return num + num_2
    return wrapped_adder

adder = add_wrapper(2)
print(adder(5))
# 7
But I hesitate to use this approach since in my actual implementation the wrapped function is very complex. My instinct is that it should have a stand-alone definition.
Forgive me if this ventures into the realm of opinion, but is either approach considered better design and/or more Pythonic? Is there some other approach I should consider?
functools.partial should work nicely in this case:
from functools import partial

def adder(n1, n2):
    return n1 + n2

adder_2 = partial(adder, 2)
adder_2(5)
Its docstring:
partial(func, *args, **keywords) - new function with partial application
of the given arguments and keywords.
-- so, you can set keyword arguments as well, for example:
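# fixing the second argument by keyword instead of by position
adder_2 = partial(adder, n2=2)
adder_2(5)  # calls adder(5, n2=2) and returns 7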
PS
Sadly, the built-in sum does not suit this case: it sums over an iterable (in fact, sum(iterable[, start]) -> value), so partial(sum, 2) does not work.
Another possible solution - you can use functools and a parametrized decorator:
from functools import wraps

def decorator(num):
    def decor(f):
        @wraps(f)
        def wrapper(n, *args, **kwargs):
            return f(n+num, *args, **kwargs)
        return wrapper
    return decor

@decorator(num=2)  # number to add to the parameter
def test(n, *args, **kwargs):
    print(n)

test(10)  # base amount - prints 12

SICP "streams as signals" in Python

I have found some nice examples (here, here) of implementing SICP-like streams in Python. But I am still not sure how to handle an example like the integral found in SICP 3.5.3 "Streams as signals."
The Scheme code found there is
(define (integral integrand initial-value dt)
  (define int
    (cons-stream initial-value
                 (add-streams (scale-stream integrand dt)
                              int)))
  int)
What is tricky about this one is that the returned stream int is defined in terms of itself (i.e., the stream int is used in the definition of the stream int).
I believe Python could have something similarly expressive and succinct... but not sure how. So my question is, what is an analogous stream-y construct in Python? (What I mean by a stream is the subject of 3.5 in SICP, but briefly, a construct (like a Python generator) that returns successive elements of a sequence of indefinite length, and can be combined and processed with operations such as add-streams and scale-stream that respect streams' lazy character.)
There are two ways to read your question. The first is simply: How do you use Stream constructs, perhaps the ones from your second link, but with a recursive definition? That can be done, though it is a little clumsy in Python.
In Python you can represent looped data structures but not directly. You can't write:
l = [l]
but you can write:
l = [None]
l[0] = l
Similarly you can't write:
def integral(integrand, initial_value, dt):
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      int_rec))
    return int_rec
but you can write:
def integral(integrand, initial_value, dt):
    placeholder = Stream(initial_value, lambda: None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      placeholder))
    placeholder._compute_rest = lambda: int_rec
    return int_rec
Note that we need to clumsily pre-compute the first element of placeholder and then only fix up the recursion for the rest of the stream. But this does all work (alongside appropriate definitions of all the rest of the code - I'll stick it all at the bottom of this answer).
However, the second part of your question seems to be asking how to do this naturally in Python. You ask for an "analogous stream-y construct in Python". Clearly the answer to that is exactly the generator. The generator naturally provides the lazy evaluation of the stream concept. It differs by not being naturally expressed recursively but then Python does not support that as well as Scheme, as we will see.
In other words, the strict stream concept can be expressed in Python (as in the link and above) but the idiomatic way to do it is to use generators.
It is more or less possible to replicate the Scheme example by a kind of direct mechanical transformation of stream to generator (but avoiding the built-in int):
def integral_rec(integrand, initial_value, dt):
    def int_rec():
        for x in cons_stream(initial_value,
                             add_streams(scale_stream(integrand, dt), int_rec())):
            yield x
    for x in int_rec():
        yield x

def cons_stream(a, b):
    yield a
    for x in b:
        yield x

def add_streams(a, b):
    while True:
        yield next(a) + next(b)

def scale_stream(a, b):
    for x in a:
        yield x * b
The only tricky thing here is to realise that you need to eagerly call the recursive use of int_rec as an argument to add_streams. Calling it doesn't start it yielding values - it just creates the generator ready to yield them lazily when needed.
This works nicely for small integrands, though it's not very pythonic. The Scheme version works by optimising the tail recursion - the Python version will exceed the max stack depth if your integrand is too long. So this is not really appropriate in Python.
A direct and natural pythonic version would look something like this, I think:
def integral(integrand, initial_value, dt):
    value = initial_value
    yield value
    for x in integrand:
        value += dt * x
        yield value
This works efficiently and correctly treats the integrand lazily as a "stream". However, it uses iteration rather than recursion to unpack the integrand iterable, which is more the Python way.
In moving to natural Python I have also removed the stream combination functions - for example, replaced add_streams with +=. But we could still use them if we wanted a sort of halfway house version:
def accum(initial_value, a):
    value = initial_value
    yield value
    for x in a:
        value += x
        yield value

def integral_hybrid(integrand, initial_value, dt):
    for x in accum(initial_value, scale_stream(integrand, dt)):
        yield x
This hybrid version uses the stream combinations from the Scheme and avoids only the tail recursion. This is still pythonic and python includes various other nice ways to work with iterables in the itertools module. They all "respect streams' lazy character" as you ask.
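For instance, itertools.accumulate can express the same integral directly; a minimal sketch (assuming Python 3.8+ for the initial keyword):
from itertools import accumulate, islice

def integral_accumulate(integrand, initial_value, dt):
    # running sum of the scaled integrand, seeded with the initial value
    return accumulate((dt * x for x in integrand), initial=initial_value)

print(list(islice(integral_accumulate(iter(range(1, 6)), 8, .5), 6)))
# [8, 8.5, 9.5, 11.0, 13.0, 15.5]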
Finally here is all the code for the first recursive stream example, much of it taken from the Berkeley reference:
class Stream(object):
    """A lazily computed recursive list."""
    def __init__(self, first, compute_rest, empty=False):
        self.first = first
        self._compute_rest = compute_rest
        self.empty = empty
        self._rest = None
        self._computed = False

    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        assert not self.empty, 'Empty streams have no rest.'
        if not self._computed:
            self._rest = self._compute_rest()
            self._computed = True
        return self._rest

    def __repr__(self):
        if self.empty:
            return '<empty stream>'
        return 'Stream({0}, <compute_rest>)'.format(repr(self.first))

Stream.empty = Stream(None, None, True)

def cons_stream(a, b):
    return Stream(a, lambda: b)

def add_streams(a, b):
    if a.empty or b.empty:
        return Stream.empty
    def compute_rest():
        return add_streams(a.rest, b.rest)
    return Stream(a.first + b.first, compute_rest)

def scale_stream(a, scale):
    if a.empty:
        return Stream.empty
    def compute_rest():
        return scale_stream(a.rest, scale)
    return Stream(a.first * scale, compute_rest)

def make_integer_stream(first=1):
    def compute_rest():
        return make_integer_stream(first+1)
    return Stream(first, compute_rest)

def truncate_stream(s, k):
    if s.empty or k == 0:
        return Stream.empty
    def compute_rest():
        return truncate_stream(s.rest, k-1)
    return Stream(s.first, compute_rest)

def stream_to_list(s):
    r = []
    while not s.empty:
        r.append(s.first)
        s = s.rest
    return r

def integral(integrand, initial_value, dt):
    placeholder = Stream(initial_value, lambda: None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand, dt),
                                      placeholder))
    placeholder._compute_rest = lambda: int_rec
    return int_rec

a = truncate_stream(make_integer_stream(), 5)
print(stream_to_list(integral(a, 8, .5)))

calculating current value based on previous value

I would like to perform a calculation in Python where the current value (i) of the equation is based on the previous value of the equation (i-1). This is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list, I need to create it! My equation is shown below.
h = (2*b) - h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I try to do the equation I'm calling a value I haven't created yet, and if I set h=0 then I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h = []
    h = (2*b) - h[i-1]
    x += 1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # as you add to h, you keep up with its tail
For a large b list (brr, one-letter identifier), to avoid creating a large slice:
from itertools import islice  # for a big list it will keep the code less wasteful
for val in islice(b, 1, None):
    ....
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator instead.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use thus
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
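A minimal sketch of that (the generator keeps the previous value between send calls):
def eqn_gen(h0=0):
    h = h0
    b = yield          # prime with next() before the first send()
    while True:
        h = 2*b - h
        b = yield h

g = eqn_gen()
next(g)        # prime the generator
g.send(2)      # 4
g.send(3)      # 2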
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i? Or do you just want h[i]?
If you just need the i-th value you could use recursion:
def get_h(i):
    if i > 0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth. (Thanks for pointing this out kdopen) In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i % 2 == 0:
        return h_0
    else:
        return (2*b) - h_0
In both cases h_0 is the initial value that you start out with.
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2*b - h[i-1])
    else:
        h.append(0)  # handle the i=0 case here (seed value)
You are successively applying a function (your equation) to the result of a previous application of that function; the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """Yields f(x), f(f(x)), f(f(f(x))), ...
    for each f in functions.
    functions is an iterable of callables taking one argument.
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b=b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print(list(composition(functions, seed)))
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """Returns f(x), then g(f(x)), then h(g(f(x))), ... keeping only the last,
    for each function in functions.
    functions is an iterable of callables taking one argument.
    """
    for f in functions:
        x = f(x)
    return x
print(composition(functions, seed))
Or more generally, with no limitations on call signature:
from functools import reduce  # needed on Python 3

def compose(funcs):
    '''Return a callable composed of successive application of functions.
    funcs is an iterable producing callables;
    for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)

def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print(eq(15))
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&A's.

Function closure performance

I thought that I would improve performance when I replaced this code:
import math

def f(a, b):
    return math.sqrt(a) * b

result = []
a = 100
for b in range(1000000):
    result.append(f(a, b))
with:
def g(a):
    def f(b):
        return math.sqrt(a) * b
    return f

result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so math.sqrt(a) would be repeated just once instead of 1000000 times.
Is my understanding always correct, or always incorrect, or correct/incorrect depending on the implementation?
I noticed that the code object for func is built (at least in CPython) before runtime, and is immutable. The code object then seems to use global environment to achieve the closure. This seems to suggest that the optimization I hoped for does not happen.
I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so
math.sqrt(a) would be repeated just once instead of 1000000 times.
That assumption is wrong, I don't know where it came from. A closure just captures variable bindings, in your case it captures the value of a, but that doesn't mean that any more magic is going on: The expression math.sqrt(a) is still evaluated every time f is called.
After all, it has to be computed every time because the interpreter doesn't know that sqrt is "pure" (the return value is only dependent on the argument and no side-effects are performed). Optimizations like the ones you expect are practical in functional languages (referential transparency and static typing help a lot here), but would be very hard to implement in Python, which is an imperative and dynamically typed language.
That said, if you want to precompute the value of math.sqrt(a), you need to do that explicitly:
def g(a):
    s = math.sqrt(a)
    def f(b):
        return s * b
    return f
Or using lambda:
def g(a):
    s = math.sqrt(a)
    return lambda b: s * b
Now that g really returns a function with 1 parameter, you have to call the result with only one argument.
The code is not evaluated statically; the code inside the function is still calculated each time. The function object contains all the byte code which expresses the code in the function; it doesn't evaluate any of it. You could improve matters by calculating the expensive value once:
def g(a):
    root_a = math.sqrt(a)
    def f(b):
        return root_a * b
    return f

result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
Naturally, in this trivial example, you could improve performance much more:
a = 100
root_a = math.sqrt(a)
result = [root_a * b for b in range(1000000)]
But I presume you're working with a more complex example than that where that doesn't scale?
As usual, the timeit module is your friend. Try some things and see how it goes. If you don't care about writing ugly code, this might help a little as well:
def g(a):
    def f(b, _local_func=math.sqrt):
        return _local_func(a) * b
    return f
Apparently Python pays a small performance penalty whenever it looks up a "global" variable or function. If you can make that lookup local, as with the default argument above, you can shave off a little time.
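For the timeit comparison mentioned above, a minimal sketch (illustrative setup; the numbers will vary by machine):
import timeit

setup = """
import math
def g(a):
    root_a = math.sqrt(a)
    def f(b):
        return root_a * b
    return f
func = g(100)
"""

# time one million calls of the precomputed-closure version
print(timeit.timeit("func(12345)", setup=setup, number=1000000))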
