I have a problem where I need to produce something which is naturally computed recursively, but where I also need to be able to interrogate the intermediate steps in the recursion if needed.
I know I can do this by passing and mutating a list or similar structure. However, this looks ugly to me and I'm sure there must be a neater way, e.g. using generators. What I would ideally love to be able to do is something like:
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
in an efficient way. While my solution is not performance critical, I can't justify the massive amount of redundant effort in that first line. It looks to me like a generator would be perfect for this except for the fact that f is fundamentally much more suited to recursion in my case (which at least in my mind is the complete opposite of a generator, but maybe I'm just not thinking far enough outside of the box).
Is there a neat Pythonic way of doing something like this that I just don't know about, or do I need to capitulate and pollute my function f by passing it an intermediate_results list which I then mutate as a side effect?
I have a generic solution for you using a decorator. We create a Memoize class which stores the results of previous calls to the function (including recursive calls). If the given arguments have been seen before, the cached result is looked up instead of being recomputed.
The custom class has the benefit over an lru_cache in that you can see the results.
from functools import wraps

class Memoize:
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            # compute only on a cache miss, then always answer from the store
            if args not in self.store:
                self.store[args] = fun(*args)
            return self.store[args]
        return wrapper

m = Memoize()

@m.save
def fibo(n):
    if n <= 0: return 0
    elif n == 1: return 1
    else: return fibo(n-1) + fibo(n-2)
Then after running different things you can see what the cache contains. When you run future function calls, m.store will be used as a lookup so calculation doesn't need to be redone.
>>> fibo(8)
21
>>> m.store
{(1,): 1,
(0,): 0,
(2,): 1,
(3,): 2,
(4,): 3,
(5,): 5,
(6,): 8,
(7,): 13,
(8,): 21}
You could modify the save function to use the name of the function and the args as the key, so that the results of multiple functions can be stored in the same Memoize instance.
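A sketch of that modification, assuming a function's __name__ is unique enough for your purposes (two functions with the same name would share entries). The key layout (name, *args) and the extra square function are choices made here for illustration, not part of the original answer:

```python
from functools import wraps

class Memoize:
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            # hypothetical key layout: function name first, then the arguments
            key = (fun.__name__,) + args
            if key not in self.store:
                self.store[key] = fun(*args)
            return self.store[key]
        return wrapper

m = Memoize()

@m.save
def fibo(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    return fibo(n - 1) + fibo(n - 2)

@m.save
def square(n):
    return n * n

fibo(5)
square(3)
print(m.store)  # entries for ('fibo', 0) ... ('fibo', 5) and ('square', 3)
```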
You can keep your existing solution that makes many "redundant" calls to f, but use function caching to save the results of previous calls to f.
In other words, when f(x1) is called, its input arguments and the corresponding return value are saved, and the next time it is called with the same arguments, the result is simply pulled from the cache.
See functools.lru_cache for the standard library solution to this, i.e.:
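from functools import lru_cache

@lru_cache(maxsize=None)  # the decorator goes on the definition of f, not on the list comprehension
def f(x):
    ...  # your existing recursive definition, unchanged

intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]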
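Note, however, that f must be a pure function (no side effects, and the same arguments always give the same result) for this to work properly; its arguments must also be hashable, since they are used as cache keys.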
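As a runnable sketch of the caching approach, here is a hypothetical stand-in for f (f(x) = 2*f(x-1) + 1, not your real function). cache_info() shows that each f(x) body runs only once, even though the list comprehension calls f T times and each call recurses:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(x):
    # hypothetical stand-in for the real recursive f
    if x == 0:
        return 0
    return 2 * f(x - 1) + 1

T = 10
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T - 1]

print(final_result)           # 511
print(f.cache_info().misses)  # 10: one real computation per distinct x
```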
Having considered your comments, I'll now try to give another perspective on the problem.
So, let's consider a concrete example:
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

def g(x):
    b = 1
    return h(x) - b

def h(x):
    c = 1/2
    return f(x-1)*(1+c)
I
First of all, it should be mentioned that (in our particular case) the algorithm has the form f(x) = p(f(x - 1)) for some p. It follows that f(x) = p^x(f(0)) = p^x(0), where p^x means applying p x times. So we just need to apply p to 0 x times to get the desired result, which can be done iteratively, without recursion. I suspect your real case is much harder, though, and it would be too boring and uninformative to stop here.
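For these particular f, g, h the step can be written out directly: h multiplies by (1 + c), g subtracts b and f adds a, so p(y) = y*(1 + c) - b + a. A minimal iterative sketch (p and f_all are names introduced here for illustration) that collects every intermediate value:

```python
def p(y, a=2, b=1, c=1/2):
    # one full f -> g -> h cycle: h multiplies, g subtracts b, f adds a
    return y * (1 + c) - b + a

def f_all(x):
    """Return [f(0), f(1), ..., f(x)] by applying p repeatedly to f(0) = 0."""
    results = [0]
    for _ in range(x):
        results.append(p(results[-1]))
    return results

print(f_all(4))  # [0, 1.0, 2.5, 4.75, 8.125]
```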
II
Generally speaking, we can divide all possible solutions into two groups: those that require refactoring (i.e. rewriting functions f, g, h) and those that do not. I have little to offer for the latter group (and I don't think anyone can). Consider the following, however:
def fk(x, k):
    a = 2
    return k(gk(x, k) + a if x != 0 else 0)

def gk(x, k):
    b = 1
    return k(hk(x, k) - b)

def hk(x, k):
    c = 1/2
    return k(fk(x-1, k)*(1+c))

def printret(x):
    print(x)
    return x

fk(4, printret)  # see what happens
This is inspired by continuation-passing style, but it is not actually CPS.
What's the point? It's something between your idea of passing a list to record all the computations and memoizing. This k carries additional behavior with it, such as printing or writing to a list (you can make a function that writes to some list, why not?). But if you look carefully, you'll see that it leaves the inner code of these functions practically untouched (only the input and output of each function are affected), so one can write a decorator associated with a function like printret that does essentially the same thing for f, g, h.
Pros: no need to modify code, much more flexible than passing a list, no additional work (like in memoizing).
Cons: impure (printing or modifying something), not as flexible as we would like.
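A minimal sketch of such a decorator, assuming you are free to decorate f, g, h (log_results and trace are hypothetical names). It appends each call's name, arguments and return value to a list; because the recursive calls go through the decorated names, every intermediate step is recorded, innermost call first:

```python
from functools import wraps

def log_results(log):
    """Decorator factory: append (name, args, result) of every call to `log`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            result = func(*args)
            log.append((func.__name__, args, result))
            return result
        return wrapper
    return decorator

trace = []

@log_results(trace)
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

@log_results(trace)
def g(x):
    b = 1
    return h(x) - b

@log_results(trace)
def h(x):
    c = 1/2
    return f(x - 1) * (1 + c)

f(2)
for entry in trace:
    print(entry)
```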
III
Now let's see how modifying function bodies can help. Don't be afraid of what's written below, take your time and play with that thing a little.
class Logger:
    def __init__(self, lst, cur_val):
        self.lst = lst
        self.cur_val = cur_val

    def bind(self, f):
        res = f(self.cur_val)
        return Logger([self.cur_val] + res.lst + self.lst, res.cur_val)

    def __repr__(self):
        return "Logger( " + repr({'value' : self.cur_val,'lst' : self.lst}) + " )"

def unit(x):
    return Logger([], x)

# you can also play with lala
def lala(x):
    if x <= 0:
        return unit(1)
    else:
        return lala(x - 1).bind(lambda y: unit(2*y))

def f(x):
    a = 2
    if x == 0:
        return unit(0)
    else:
        return g(x).bind(lambda y: unit(y + a))

def g(x):
    b = 1
    return h(x).bind(lambda y: unit(y - b))

def h(x):
    c = 1/2
    return f(x-1).bind(lambda y: unit(y*(1+c)))

f(4)  # see for yourself
Logger is what is called a monad (it is close to what Haskell calls the Writer monad). I'm not very familiar with this concept myself, but I believe I'm doing everything right. f, g, h are functions that take a number and return a Logger instance. Logger's bind takes in a function (like f) and returns a Logger with the new value (computed by that function) and updated 'logs'. The key point, as I see it, is the ability to do whatever we want with the collected values, in the order in which the resulting value was calculated (with this bind, the most recent value comes first in lst).
Afterword
I'm not at all some kind of 'guru' of functional programming; I believe I'm missing a lot of things here. But what I've understood is that functional programming is about inverting the flow of the program. That's why, for instance, I totally agree with your opinion about generators being opposed to functional programming. When we use a generator gen in, say, a function func, gen yields values one by one to func, and func does something with them, e.g. in a loop. The functional approach would be to make gen a function taking func as a parameter and have func perform its computations on the 'yielded' values. It's as if gen and func exchanged places. So the flow is inverted! And there are plenty of other ways of inverting the flow. Monads are one of them.
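A small illustration of that inversion (gen_squares and push_squares are made-up names): in the first version the consumer's loop pulls values out of the generator, in the second the producer drives the loop and pushes each value into a callback:

```python
def gen_squares(n):
    # pull style: the consumer decides when the next value is produced
    for i in range(n):
        yield i * i

def push_squares(n, consumer):
    # inverted style: the producer runs the loop and hands each value to consumer
    for i in range(n):
        consumer(i * i)

pulled = []
for value in gen_squares(4):
    pulled.append(value)

pushed = []
push_squares(4, pushed.append)

print(pulled, pushed)  # [0, 1, 4, 9] [0, 1, 4, 9]
```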
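itertools.islice takes an iterable (such as a generator), a start value and a stop value, and returns an iterator over the elements between start and stop. If islice is not clear, you can check the docs here: https://docs.python.org/3/library/itertools.html. Note that islice still has to consume (and therefore compute) every element before start; it only avoids storing them.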
import itertools

intermediate_result = map(f, range(T))
# islice accepts only positional arguments: islice(iterable, start, stop)
final_result = next(itertools.islice(intermediate_result, T-1, T))
A pure function is a function whose return value is the same for the same arguments and that doesn't have any side effects.
Does CPython recognize that the return value will be the same and make an optimization calling the function only once? If not, do other python interpreters do it?
Below I wrote an example using os.path.join, assuming it's a pure function (I don't actually know its implementation), but the question extends to all pure functions.
import os

dirpath = "C:\\Users"
dirname = "Username"
mylist = [["another\\directory", 6], ["C:\\Users\\Username", 8], ["foo\\bar", 3]]
count = 0
for pair in mylist:
    if os.path.join(dirpath, dirname) == pair[0]:
        count = pair[1]
dirpath and dirname aren't modified inside the for loop. Given the same input, os.path.join always has the same return value.
The standard python implementation does almost no optimization of user code.
However, you can use the functools.lru_cache decorator on pure functions to get the functionality that you want.
from functools import lru_cache

def fib(n):
    """
    Calculate the n'th Fibonacci number
    With exponential runtime
    """
    if n < 2: return n
    return fib(n-1) + fib(n-2)

@lru_cache(maxsize=None)
def fib2(n):
    """
    Calculate the n'th Fibonacci number
    With O(N) <linear> runtime
    """
    if n < 2: return n
    return fib2(n-1) + fib2(n-2)
Strictly speaking, Python does not have pure functions. It is well-defined to modify the meaning of a function at any time.
>>> def add(a, b): return a + b
>>> def sub(a, b): return a - b
>>> add(10, 5)
15
>>> add.__code__ = sub.__code__
>>> add(10, 5)
5
In addition, it is possible to change the builtins, globals and closures that a function accesses.
The reference implementation CPython makes no optimisations based on the purity of functions.
The implementation PyPy uses a tracing JIT which is capable of pure optimisations. Note that this applies to low-level operations (not necessarily entire functions) and only in often-used code.
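Since neither CPython nor PyPy can be relied on to hoist the call for you, the portable fix for the loop in the question is to compute the loop-invariant value once yourself. A sketch (target is a name introduced here; the example paths are Windows-style, so the match only succeeds where os.path joins with a backslash):

```python
import os

dirpath = "C:\\Users"
dirname = "Username"
mylist = [["another\\directory", 6], ["C:\\Users\\Username", 8], ["foo\\bar", 3]]

# Compute the loop-invariant value once, before the loop.
target = os.path.join(dirpath, dirname)

count = 0
for path, value in mylist:
    if path == target:
        count = value
print(count)
```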
Curve fitting tools such as those in scipy tend to assume that the parameters of the model functions are real-valued.
When fitting a model that depends on complex-valued parameters to a complex-valued data set, one therefore first has to create a version of the model, in which each complex parameter is replaced by two real ones.
First, a simple example:
# original function, representing a model where a,b may be complex-valued
def f(x, a, b):
    return a+b*x

# modified function, complex parameters have been replaced by two real ones
def f_r(x, a_r, a_i, b_r, b_i):
    return f(x, a_r + 1J*a_i, b_r+1J*b_i)

print( f(1,2+3J,4+5J) == f_r(1,2,3,4,5) )
Note: The output of the model is still complex-valued, but this can easily be taken care of by appropriately defining the residual function.
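As a sketch of such a residual function (residuals, xs and ys are illustrative names), one can stack the real and imaginary parts of the complex differences into a single real-valued list, which is the kind of 1-D residual vector a least-squares routine such as scipy.optimize.least_squares expects:

```python
def f(x, a, b):
    return a + b * x

def f_r(x, a_r, a_i, b_r, b_i):
    return f(x, a_r + 1j * a_i, b_r + 1j * b_i)

def residuals(params, xs, ys):
    """Real-valued residuals: real parts of (model - data), then imaginary parts."""
    diffs = [f_r(x, *params) - y for x, y in zip(xs, ys)]
    return [d.real for d in diffs] + [d.imag for d in diffs]

xs = [0.0, 1.0, 2.0]
ys = [f(x, 2 + 3j, 4 + 5j) for x in xs]  # synthetic data from known parameters
print(residuals([2, 3, 4, 5], xs, ys))  # all zeros at the true parameters
```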
Now, instead of having to write new code for every function f, I would like to have a "function factory" to which I pass the function object f together with a list of booleans is_complex specifying which arguments of f are to be assumed complex-valued (and therefore need to be replaced by two real-valued arguments).
This list of booleans could e.g. be inferred from the initial values provided together with f.
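One hedged way to do that inference, assuming the initial values are plain Python numbers (infer_is_complex and split_complex are hypothetical helpers, and x is assumed never to be complex), which also flattens the complex initial values into the real-valued form the fitter needs:

```python
def infer_is_complex(initial_values):
    """One flag per parameter: True if its initial value is complex."""
    return [isinstance(v, complex) for v in initial_values]

def split_complex(initial_values):
    """Flatten initial values: each complex v becomes v.real, v.imag."""
    flat = []
    for v in initial_values:
        if isinstance(v, complex):
            flat.extend([v.real, v.imag])
        else:
            flat.append(v)
    return flat

p0 = [2 + 3j, 4 + 5j]                        # initial guesses for a and b
is_complex = [False] + infer_is_complex(p0)  # leading False for x
flat_p0 = split_complex(p0)
print(is_complex, flat_p0)  # [False, True, True] [2.0, 3.0, 4.0, 5.0]
```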
I am new to this kind of problem, so I looked around on the web and came across the decorator module. Before going to the generic case, here is the example from above using the FunctionMaker class:
import decorator

def f(x, a, b):
    return a+b*x

f_r = decorator.FunctionMaker.create(
    'f_r(x, a_r, a_i, b_r, b_i)',
    'return f(x, a_r + 1J*a_i, b_r + 1J*b_i)',
    dict(f=f))
For the generic case, one can now synthesize the two strings that are passed to the function maker:
import decorator
import inspect

def f(x, a, b):
    return a+b*x

def fmaker(f, is_complex):
    argspec = inspect.getargspec(f)
    args = argspec.args[:]
    fname = f.__name__  # works in Python 2 and 3 (func_name is Python 2 only)
    s1 = "{}_r(".format(fname)
    s2 = "return f("
    for arg, cplx in zip(args, is_complex):
        if not cplx:
            s1 += "{},".format(arg)
            s2 += "{},".format(arg)
        else:
            s1 += "{}_r,".format(arg)
            s1 += "{}_i,".format(arg)
            s2 += "{}_r+1J*{}_i,".format(arg,arg)
    s1 += ')'
    s2 += ')'
    return decorator.FunctionMaker.create(s1,s2,dict(f=f))

is_complex = [False, True, True]
f_r = fmaker(f,is_complex)
# prints ArgSpec(args=['x', 'a_r', 'a_i', 'b_r', 'b_i'], varargs=None, keywords=None, defaults=())
print(inspect.getargspec(f_r))
print( f(1,2+3J,4+5J) == f_r(1,2,3,4,5) )
This seems to solve the problem.
My question is: is this a reasonable way of doing this? Are there better/simpler ways in python?
P.S. I am not a computer scientist, so if I am using technical terms incorrectly, please feel free to revise.
You do not have to do any nasty string-based generation; you can simply use basic function closures to create a wrapper:
def complex_unroll(f, are_complex):
    # This function will have access to are_complex and f through python closure
    # *args gives us access to all positional parameters as a tuple
    def g(*args, **kwargs):
        # new_args stores the new list of parameters, with complex pairs merged
        new_args = []
        # arg_id is an index used to keep track of where we are in the original args
        arg_id = 0
        for is_complex in are_complex:
            if is_complex:
                # if we request complex unroll, we merge two consecutive params
                new_args.append(args[arg_id] + 1J*args[arg_id+1])
                # and move the index 2 slots
                arg_id += 2
            else:
                # otherwise, just copy the argument
                new_args.append(args[arg_id])
                arg_id += 1
        # finally we return a call to original function f with new args
        return f(*new_args, **kwargs)
    # our unroll function returns a newly designed function g
    return g
And now
def f(x, a, b):
    return a+b*x

def f_r(x, a_r, a_i, b_r, b_i):
    return f(x, a_r + 1J*a_i, b_r+1J*b_i)

f_u = complex_unroll(f, [False, True, True])
print(f(1,2+3J,4+5J))
print(f_r(1,2,3,4,5))
print(f_u(1,2,3,4,5))
f_u2 = complex_unroll(f, [True, True, True])
print(f_u2(1,0,2,3,4,5))
Works as desired.
Why would I prefer this approach over the one proposed in the question?
It does not use any additional modules/libraries, just Python's very basic mechanisms for handling arguments and closures. In particular, your solution uses reflection: it analyzes the signature of the defined function, which is a fairly complex operation compared to what you are trying to achieve.
It handles named arguments just fine, so if you have f(x, a, b, flag), you can still just use g = complex_unroll(f, [False, True, True]) and call g(0, 0, 0, 0, 0, flag = True), which would fail in your code. You could add support for this, though.
I would like to perform a calculation using Python, where the current value (i) of the equation is based on the previous value of the equation (i-1). This is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list; I need to create it! My equation is shown below.
h=(2*b)-h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I try to evaluate the equation, I'm calling a value I haven't created yet, and if I set h=0 then I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h=[]
    h=(2*b)-h[i-1]
    x+=1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # As you add to h, you keep up with its tail
For a large b list (brr, a one-letter identifier), you can avoid creating a large slice:
from itertools import islice  # for a big list this keeps the code less wasteful
h = [b[0]]
for val in islice(b, 1, None):
    h.append(2 * val - h[-1])
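The same recurrence can also be written with itertools.accumulate (a swapped-in standard-library technique; the two-argument function form needs Python 3.3+). accumulate yields b[0] first and then step(previous, next) for each later element. A sketch, with h_sequence as an illustrative name:

```python
from itertools import accumulate

def h_sequence(b):
    # h[0] = b[0]; h[i] = 2*b[i] - h[i-1]
    return list(accumulate(b, lambda h_prev, val: 2 * val - h_prev))

print(h_sequence([1, 2, 3, 4]))  # [1, 3, 3, 5]
```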
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator-like stateful function instead.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use it like this:
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
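A sketch of that generator version (h_filter is a made-up name), starting from h = 0 as in the function above. The generator has to be advanced once with next() before values can be sent to it:

```python
def h_filter(h=0):
    b = yield  # wait for the first b
    while True:
        h = 2 * b - h
        b = yield h  # hand back the new h, then wait for the next b

h_gen = h_filter()
next(h_gen)           # advance to the first yield so it can receive values
print(h_gen.send(2))  # 4
print(h_gen.send(3))  # 2
print(h_gen.send(2))  # 2
```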
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i? Or do you just want h[i]?
If you just need the i-th value, you could use recursion:
def get_h(i):
    if i>0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth (thanks to kdopen for pointing this out). In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i%2 == 0:
        return h_0
    else:
        return (2*b)-h_0
In both cases h_0 is the initial value that you start out with.
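The for-loop alternative avoids the recursion limit entirely. A sketch that passes b and h_0 explicitly instead of relying on globals (get_h_loop and get_h_closed are names introduced here), checked against the closed form:

```python
def get_h_loop(i, b, h_0):
    h = h_0
    for _ in range(i):
        h = 2 * b - h  # apply the recurrence i times, no recursion
    return h

def get_h_closed(i, b, h_0):
    return h_0 if i % 2 == 0 else 2 * b - h_0

print(get_h_loop(100001, 10, 3))  # 17, with no recursion-depth problem
```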
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2*b[i] - h[i-1])
    else:
        # handle the i=0 case here; b[0] is just one possible seed
        h.append(b[0])
You are successively applying a function (equation) to the result of a previous application of that function, and the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """ yields f(x), f(f(x)), f(f(f(x))) ....
    for each f in functions
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b = b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print(list(composition(functions, seed)))
If the intermediate results are not needed, composition can be redefined as:
def composition(functions, x):
    """ Returns h(g(f(x))) for functions [f, g, h], i.e.
    applies each function in functions in turn
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
    return x

print(composition(functions, seed))
Or more generally, with no limitations on call signature:
from functools import reduce  # reduce is no longer a builtin in Python 3

def compose(funcs):
    '''Return a callable composed of successive application of functions
    funcs is an iterable producing callables
    for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)

def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print(eq(15))
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&As.
I thought that I would improve performance if I replaced this code:
import math

def f(a, b):
    return math.sqrt(a) * b

result = []
a = 100
for b in range(1000000):
    result.append(f(a, b))
with:
def g(a):
    def f(b):
        return math.sqrt(a) * b
    return f

result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so math.sqrt(a) would be repeated just once instead of 1000000 times.
Is my understanding always correct, or always incorrect, or correct/incorrect depending on the implementation?
I noticed that the code object for func is built (at least in CPython) before runtime, and is immutable. The code object then seems to use global environment to achieve the closure. This seems to suggest that the optimization I hoped for does not happen.
I assumed that since a is fixed when the closure is performed, the interpreter would precompute everything that involves a, and so math.sqrt(a) would be repeated just once instead of 1000000 times.
That assumption is wrong; I don't know where it came from. A closure just captures variable bindings. In your case it captures the variable a, but that doesn't mean that any more magic is going on: the expression math.sqrt(a) is still evaluated every time f is called.
After all, it has to be computed every time because the interpreter doesn't know that sqrt is "pure" (the return value is only dependent on the argument and no side-effects are performed). Optimizations like the ones you expect are practical in functional languages (referential transparency and static typing help a lot here), but would be very hard to implement in Python, which is an imperative and dynamically typed language.
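You can check this by inspecting the closure: it stores only the binding of a, while the compiled code of f still refers to math and sqrt by name, so they are looked up and sqrt is called on every invocation:

```python
import math

def g(a):
    def f(b):
        return math.sqrt(a) * b
    return f

func = g(100)

print(func.__code__.co_freevars)         # ('a',): the captured variable
print(func.__closure__[0].cell_contents)  # 100: only the binding is stored
print('sqrt' in func.__code__.co_names)   # True: sqrt is still looked up by name
```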
That said, if you want to precompute the value of math.sqrt(a), you need to do that explicitly:
def g(a):
    s = math.sqrt(a)
    def f(b):
        return s * b
    return f
Or using lambda:
def g(a):
    s = math.sqrt(a)
    return lambda b: s * b
Now that g really returns a function with 1 parameter, you have to call the result with only one argument.
The code is not evaluated statically; the code inside the function is still calculated each time. The function object contains all the byte code which expresses the code in the function; it doesn't evaluate any of it. You could improve matters by calculating the expensive value once:
def g(a):
    root_a = math.sqrt(a)
    def f(b):
        return root_a * b
    return f

result = []
a = 100
func = g(a)
for b in range(1000000):
    result.append(func(b))
Naturally, in this trivial example, you could improve performance much more:
a = 100
root_a = math.sqrt(a)
result = [root_a * b for b in range(1000000)]
But I presume you're working with a more complex example than that where that doesn't scale?
As usual, the timeit module is your friend. Try some things and see how it goes. If you don't care about writing ugly code, this might help a little as well:
def g(a):
    def f(b, _local_func=math.sqrt):
        return _local_func(a)*b
    return f
Apparently Python takes a performance penalty whenever it looks up a "global" variable/function (here, the global name math plus the attribute lookup for sqrt). If you can make that access local, you can shave off a little time.
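A minimal timeit comparison of the original closure and the version that precomputes the square root (g_naive and g_hoisted are names used here for illustration; absolute timings depend on your machine and interpreter, so only the relative difference is meaningful):

```python
import math
import timeit

def g_naive(a):
    def f(b):
        return math.sqrt(a) * b  # sqrt is recomputed on every call
    return f

def g_hoisted(a):
    root_a = math.sqrt(a)  # computed once, when g_hoisted is called
    def f(b):
        return root_a * b
    return f

naive = g_naive(100)
hoisted = g_hoisted(100)

t_naive = timeit.timeit(lambda: [naive(b) for b in range(10_000)], number=20)
t_hoisted = timeit.timeit(lambda: [hoisted(b) for b in range(10_000)], number=20)
print(f"naive: {t_naive:.4f}s  hoisted: {t_hoisted:.4f}s")
```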