I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).
Related
I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).
def cons(a, b):
def pair(f):
return f(a, b)
return pair
def car(f):
def left(a, b):
return a
return f(left)
def cdr(f):
def right(a, b):
return b
return f(right)
Found this python code on git.
Just want to know what is f(a,b) in cons definition is, and how does it work?
(Not a function I guess)
cons is a function, that takes two arguments, and returns a function that takes another function, which will consume these two arguments.
For example, consider the following function:
def add(a, b):
return a + b
This is just a function that adds the two inputs, so, for instance, add(2, 5) == 7
As this function takes two arguments, we can use cons to call this function:
func_caller = cons(2, 5) # cons receives two arguments and returns a function, which we call func_caller
result = func_caller(add) # func_caller receives a function, that will process these two arguments
print(result) # result is the actual result of doing add(2, 5), i.e. 7
This technique is useful for wrapping functions and executing stuff, before and after calling the appropriate functions.
For example, we can modify our cons function to actually print the values before and after calling add:
def add(a, b):
print('Adding {} and {}'.format(a, b))
return a + b
def cons(a, b):
print('Received arguments {} and {}'.format(a, b))
def pair(f):
print('Calling {} with {} and {}'.format(f, a, b))
result = f(a, b)
print('Got {}'.format(result))
return result
return pair
With this update, we get the following outputs:
func_caller = cons(2, 5)
# prints "Received arguments 2 and 5" from inside cons
result = func_caller(add)
# prints "Calling add with 2 and 5" from inside pair
# prints "Adding 2 and 5" from inside add
# prints "Got 7" from inside pair
This isn't going to make any sense to you until you know what cons, car, and cdr mean.
In Lisp, lists are stored as a very simple form of linked list. A list is either nil (like None) for an empty list, or it's a pair of a value and another list. The cons function takes a value and a list and returns you another list just by making a pair:
def cons(head, rest):
return (head, rest)
And the car and cdr functions (they stand for "Contents of Address|Data Register", because those are the assembly language instructions used to implement them on a particular 1950s computer, but that isn't very helpful) return the first or second value from a pair:
def car(lst):
return lst[0]
def cdr(lst):
return lst[1]
So, you can make a list:
lst = cons(1, cons(2, cons(3, None)))
… and you can get the second value from it:
print(car(cdr(lst))
… and you can even write functions to get the nth value:
def nth(lst, n):
if n == 0:
return car(lst)
return nth(cdr(lst), n-1)
… or print out the whole list:
def printlist(lst):
if lst:
print(car(lst), end=' ')
printlist(cdr(lst))
If you understand how these work, the next step is to try them on those weird definitions you found.
They still do the same thing. So, the question is: How? And the bigger question is: What's the point?
Well, there's no practical point to using these weird functions; the real point is to show you that everything in computer science can be written with just functions, no built-in data structures like tuples (or even integers; that just takes a different trick).
The key is higher-order functions: functions that take functions as values and/or return other functions. You actually use these all the time: map, sort with a key, decorators, partial… they’re only confusing when they’re really simple:
def car(f):
def left(a, b):
return a
return f(left)
This takes a function, and calls it on a function that returns the first of its two arguments.
And cdr is similar.
It's hard to see how you'd use either of these, until you see cons:
def cons(a, b):
def pair(f):
return f(a, b)
return pair
This takes two things and returns a function that takes another function and applies it to those two things.
So, what do we get from cons(3, None)? We get a function that takes a function, and applies it to the arguments 3 and None:
def pair3(f):
return f(3, None)
And if we call cons(2, cons(3, None))?
def pair23(f):
return f(2, pair3)
And what happens if you call car on that function? Trace through it:
def left(a, b):
return a
return pair23(left)
That pair23(left) does this:
return left(2, pair3)
And left is dead simple:
return 2
So, we got the first element of (2, cons(3, None)).
What if you call cdr?
def right(a, b):
return a
return pair23(right)
That pair23(right) does this:
return right(2, pair3)
… and right is dead simple, so it just returns pair3.
You can work out that if we call car(cdr(pair23)), we're going to get the 3 out of it.
And now you can write lst = cons(1, cons(2, cons(3, None))), write the recursive nth and printlist functions above, and trace through how they work on lst.
I mentioned above that you can even get rid of integers. How do you do that? Read about Church numerals. You define zero and successor functions. Then you can define one as successor(zero) and two as successor(one). You can even recursively define add so that add(x, zero) is x but add(x, successor(y)) is successor(add(x, y)), and go on to define mul, etc.
You also need a special function you can use as a value for nil.
Anyway, once you've done that, using all of the other definitions above, you can do lst = cons(zero(cons(one, cons(two, cons(three, nil)))), and nth(lst, two) will give you back one. (Of course writing printlist will be a bit trickier…)
Obviously, this is all going to be a lot slower than just using tuples and integers and so on. But theoretically, it’s interesting.
Consider this: we could write a tiny dialect of Python that has only three kinds of statements—def, return, and expression statements—and only three kinds of expressions—literals, identifiers, and function calls—and it could do everything normal Python does. (In fact, you could get rid of statements altogether just by having a function-defining expression, which Python already has.) That tiny language would be a pain to use, but it would a lot easier to write a program to reason about programs in that tiny language. And we even know how to translate code using tuples, loops, etc. into code in this tiny subset language, which means we can write a program that reasons about that real Python code.
In fact, with a couple more tricks (curried functions and/or static function types, and lazy evaluation), the compiler/interpreter could do that kind of reasoning on the fly and optimize our code for us. It’s easy to tell programmatically that car(cdr(cons(2, cons(3, None)) is going to return 3 without having to actually evaluate most of those function calls, so we can just skip evaluating them and substitute 3 for the whole expression.
Of course this breaks down if any function can have side effects. You obviously can’t just substitute None for print(3) and get the same results. So instead, you need some clever trick where IO is handled by some magic object that evaluates functions to figure out what it should read and write, and then the whole rest of the program, the part that users write, becomes pure and can be optimized however you want. With a couple more abstractions, we can even make IO something that doesn’t have to be magical to do that.
And then you can build a standard library that gives you back all those things we gave up, written in terms of defining and calling functions, so it’s actually usable—but under the covers it’s all just reducing pure function calls, which is simple enough for a computer to optimize. And then you’ve basically written Haskell.
I'm doing an exercise where I'm to create a class representing functions (written as lambda expressions) and several methods involving them.
The ones I've written so far are:
class Func():
def __init__(self, func, domain):
self.func = func
self.domain = domain
def __call__(self, x):
if self.domain(x):
return self.func(x)
return None
def compose(self, other):
comp_func= lambda x: self.func(other(x))
comp_dom= lambda x: other.domain(x) and self.domain(other(x))
return Func(comp_func, comp_dom)
def exp(self, k):
exp_func= self
for i in range(k-1):
exp_func = Func.compose(exp_func, self)
return exp_func
As you can see above, the function exp composes a function with itself k-1 times. Now I'm to write a recursive version of said function, taking the same arguments "self" and "k".
However I'm having difficulty figuring out how it would work. In the original exp I wrote I had access to the original function "self" throughout all iterations, however when making a recursive function I lose access to the original function and with each iteration only have access to the most recent composed function. So for example, if I try composing self with self a certain number of times I will get:
f= x+3
f^2= x+6
(f^2)^2= x+12
So we skipped the function x+9.
How do I get around this? Is there a way to still retain access to the original function?
Update:
def exp_rec(self, k):
if k==1:
return self
return Func.compose(Func.exp_rec(self, k-1), self)
This is an exercise, so I won't provide the answer.
In recursion, you want to do two things:
Determine and check a "guard condition" that tells you when to stop; and
Determine and compute the "recurrence relation" that tells you the next value.
Consider a simple factorial function:
def fact(n):
if n == 1:
return 1
return n * fact(n - 1)
In this example, the guard condition is fairly obvious- it's the only conditional statement! And the recurrence relation is in the return statement.
For your exercise, things are slightly less obvious, since the intent is to define a function composition, rather than a straight integer computation. But consider:
f = Func(lambda x: x + 3)
(This is your example.) You want f.exp(1) to be the same as f, and f.exp(2) to be f(f(x)). That right there tells you the guard condition and the recurrence relation:
The guard condition is that exp() only works for positive numbers. This is because exp(0) might have to return different things for different input types (what does exp(0) return when f = Func(lambda s: s + '!') ?).
So test for exp(1), and let that condition be the original lambda.
Then, when recursively defining exp(n+1), let that be the composition of your original lambda with exp(n).
You have several things to consider: First, your class instance has data associated with it. That data will "travel along" with you in your recursion, so you don't have to pass so many parameters recursively. Second, you need to decide whether Func.exp() should create a new Func(), or whether it should modify the existing Func object. Finally, consider how you would write a hard-coded function, Func.exp2() that just constructed what we would call Func.exp(2). That should give you an idea of your recurrence relation.
Update
Based on some comments, I feel like I should show this code. If you are going to have your recursive function modify the self object, instead of returning a new object, then you will need to "cache" the values from self before they get modified, like so:
func = self.func
domain = self.domain
... recursive function modifies self.func and self.domain
Is there a Pythonic way to encapsulate a lazy function call, whereby on first use of the function f(), it calls a previously bound function g(Z) and on the successive calls f() returns a cached value?
Please note that memoization might not be a perfect fit.
I have:
f = g(Z)
if x:
return 5
elif y:
return f
elif z:
return h(f)
The code works, but I want to restructure it so that g(Z) is only called if the value is used. I don't want to change the definition of g(...), and Z is a bit big to cache.
EDIT: I assumed that f would have to be a function, but that may not be the case.
I'm a bit confused whether you seek caching or lazy evaluation. For the latter, check out the module lazy.py by Alberto Bertogli.
Try using this decorator:
class Memoize:
def __init__ (self, f):
self.f = f
self.mem = {}
def __call__ (self, *args, **kwargs):
if (args, str(kwargs)) in self.mem:
return self.mem[args, str(kwargs)]
else:
tmp = self.f(*args, **kwargs)
self.mem[args, str(kwargs)] = tmp
return tmp
(extracted from dead link: http://snippets.dzone.com/posts/show/4840 / https://web.archive.org/web/20081026130601/http://snippets.dzone.com/posts/show/4840)
(Found here: Is there a decorator to simply cache function return values? by Alex Martelli)
EDIT: Here's another in form of properties (using __get__) http://code.activestate.com/recipes/363602/
You can employ a cache decorator, let see an example
from functools import wraps
class FuncCache(object):
def __init__(self):
self.cache = {}
def __call__(self, func):
#wraps(func)
def callee(*args, **kwargs):
key = (args, str(kwargs))
# see is there already result in cache
if key in self.cache:
result = self.cache.get(key)
else:
result = func(*args, **kwargs)
self.cache[key] = result
return result
return callee
With the cache decorator, here you can write
my_cache = FuncCache()
#my_cache
def foo(n):
"""Expensive calculation
"""
sum = 0
for i in xrange(n):
sum += i
print 'called foo with result', sum
return sum
print foo(10000)
print foo(10000)
print foo(1234)
As you can see from the output
called foo with result 49995000
49995000
49995000
The foo will be called only once. You don't have to change any line of your function foo. That's the power of decorators.
There are quite a few decorators out there for memoization:
http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
http://code.activestate.com/recipes/498110-memoize-decorator-with-o1-length-limited-lru-cache/
http://code.activestate.com/recipes/496879-memoize-decorator-function-with-cache-size-limit/
Coming up with a completely general solution is harder than you might think. For instance, you need to watch out for non-hashable function arguments and you need to make sure the cache doesn't grow too large.
If you're really looking for a lazy function call (one where the function is only actually evaluated if and when the value is needed), you could probably use generators for that.
EDIT: So I guess what you want really is lazy evaluation after all. Here's a library that's probably what you're looking for:
http://pypi.python.org/pypi/lazypy/0.5
Just for completness, here is a link for my lazy-evaluator decorator recipe:
https://bitbucket.org/jsbueno/metapython/src/f48d6bd388fd/lazy_decorator.py
Here's a pretty brief lazy-decorator, though it lacks using #functools.wraps (and actually returns an instance of Lazy plus some other potential pitfalls):
class Lazy(object):
def __init__(self, calculate_function):
self._calculate = calculate_function
def __get__(self, obj, _=None):
if obj is None:
return self
value = self._calculate(obj)
setattr(obj, self._calculate.func_name, value)
return value
# Sample use:
class SomeClass(object):
#Lazy
def someprop(self):
print 'Actually calculating value'
return 13
o = SomeClass()
o.someprop
o.someprop
Curious why you don't just use a lambda in this scenario?
f = lambda: g(z)
if x:
return 5
if y:
return f()
if z:
return h(f())
Even after your edit, and the series of comments with detly, I still don't really understand. In your first sentence, you say the first call to f() is supposed to call g(), but subsequently return cached values. But then in your comments, you say "g() doesn't get called no matter what" (emphasis mine). I'm not sure what you're negating: Are you saying g() should never be called (doesn't make much sense; why does g() exist?); or that g() might be called, but might not (well, that still contradicts that g() is called on the first call to f()). You then give a snippet that doesn't involve g() at all, and really doesn't relate to either the first sentence of your question, or to the comment thread with detly.
In case you go editing it again, here is the snippet I am responding to:
I have:
a = f(Z)
if x:
return 5
elif y:
return a
elif z:
return h(a)
The code works, but I want to
restructure it so that f(Z) is only
called if the value is used. I don't
want to change the definition of
f(...), and Z is a bit big to cache.
If that is really your question, then the answer is simply
if x:
return 5
elif y:
return f(Z)
elif z:
return h(f(Z))
That is how to achieve "f(Z) is only called if the value is used".
I don't fully understand "Z is a bit big to cache". If you mean there will be too many different values of Z over the course of program execution that memoization is useless, then maybe you have to resort to precalculating all the values of f(Z) and just looking them up at run time. If you can't do this (because you can't know the values of Z that your program will encounter) then you are back to memoization. If that's still too slow, then your only real option is to use something faster than Python (try Psyco, Cython, ShedSkin, or hand-coded C module).
Let's have a method that would cache results it calculates.
"If" approach:
def calculate1(input_values):
if input_values not in calculate1.cache.keys():
# do some calculation
result = input_values
calculate1.cache[input_values] = result
return calculate1.cache[input_values]
calculate1.cache = {}
"Except" approach:
def calculate2(input_values):
try:
return calculate2.cache[input_values]
except AttributeError:
calculate2.cache = {}
except KeyError:
pass
# do some calculation
result = input_values
calculate2.cache[input_values] = result
return result
"get/has" approach:
def calculate3(input_values):
if not hasattr(calculate3, cache):
calculate3.cache = {}
result = calculate3.cache.get(input_values)
if not result:
# do some calculation
result = input_values
calculate3.cache[input_values] = result
return result
Is there another (faster) way? Which one is most pythonic? Which one would you use?
Note: There's a speed difference:
calculate = calculateX # depening on test run
for i in xrange(10000):
calculate(datetime.utcnow())
Results time python test.py:
calculate1: 0m9.579s
calculate2: 0m0.130s
calculate3: 0m0.095s
Use a collections.defaultdict. It's designed precisely for this purpose.
Of course; this is Python after all: Just use a defaultdict.
Well if you are trying to memoize something, its best to use a Memoize class and decorators.
class Memoize(object):
def __init__(self, func):
self.func = func
self.cache = {}
def __call__(self, *args):
if args not in self.cache:
self.cache[args] = self.func(*args)
return self.cache[args]
Now define some function to be memoized, say a key-strengthening function that does say 100,000 md5sums of a string hashes:
import md5
def one_md5(init_str):
return md5.md5(init_str).hexdigest()
#Memoize
def repeat_md5(cur_str, num=1000000, salt='aeb4f89a2'):
for i in xrange(num):
cur_str = one_md5(cur_str+salt)
return cur_str
The #Memoize function decorator is equivalent to defining the function and then defining repeat_md5 = Memoize(repeat_md5). The first time you call it for a particular set of arguments, the function takes about a second to compute; and the next time you call its near instantaneous as it read from its cache.
As for the method of memoization; as long as you aren't doing something silly (like the first method where you do if key in some_dict.keys() rather than if key in some_dict) there shouldn't be much a significant difference. (The first method is bad as you generate an array from the dictionary first, and then check to see if the key is in it; rather than just check to see whether the key is in the dict (See Coding like a pythonista)). Also catching exceptions will be slower than if statements by nature (you have to create an exception then the exception-handler has to handle it; and then you catch it).