Rewriting recursive algorithm to memoized algorithm - python

I have written the following recursive algorithm:
p = [2,3,2,1,4]
def fn(c,i):
    if(c < 0 or i < 0):
        return 0
    if(c == 0):
        return 1
    return fn(c,i-1)+fn(c-p[i-1],i-1)
It's a solution to a problem where you have c coins and have to find out how many ways you can spend your c coins on beers. There are n different beers, and only one of each beer.
i denotes the i'th beer, which has price p[i]; the prices are stored in the array p.
The algorithm recursively calls itself, and if c == 0, it returns 1, as it has found a valid permutation. If c or i is less than 0, it returns 0 as it's not a valid permutation, as it exceeds the amount of coins available.
Now I need to rewrite the algorithm as a Memoized algorithm. This is my first time trying this, so I'm a little confused on how to do it.
I've been trying different things; my latest attempt is the following code:
p = [2,3,2,1,4]
prev = np.empty([5, 5])
def fni(c,i):
    if(prev[c][i] != None):
        return prev[c][i]
    if(c < 0 or i < 0):
        prev[c][i] = 0
        return 0
    if(c == 0):
        prev[c][i] = 1
        return 1
    prev[c][i] = fni(c,i-1)+fni(c-p[i-1],i-1)
    return prev[c][i]
"Surprisingly" it doesn't work, and I'm sure it's completely wrong. My thought was to save the results of the recursive calls in a 5x5 2D array, check at the start whether the result is already saved in the array, and if it is, just return it.
I only provided my above attempt to show something, so don't take the code too seriously.
My prev array is all 0's, and should be values of null so just ignore that.
My task is actually only to solve it as pseudocode, but I thought it would be easier to write it as code to make sure that it would actually work, so pseudo code would help as well.
I hope I have provided enough information, else feel free to ask!
EDIT: I forgot to mention that I have 5 coins, and 5 different beers (one of each beer). So c = 5, and i = 5

First, np.empty() by default gives an array of uninitialized values, not Nones, as the documentation points out:
>>> np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
       [ 2.13182611e-314, 3.06959433e-309]]) #uninitialized
Secondly, although this is more subjective, you should default to using dictionaries for memoization in Python. Arrays may be more efficient if you know you'll actually memoize most of the possible values, but it can be hard to tell that ahead of time. At the very least, make sure your array values are initialized. It's good that you're using numpy-- that will help you avoid the common beginner mistake of writing memo = [[0]*5]*5.
Thirdly, you should perform checks for 'out of bounds' or negative parameters (c < 0 or i < 0) before you use them to access an array as in prev[c][i] != None. Negative indices in Python could map you to a different memoized parameter's value.
Besides those details, your memoization code and strategy are sound.
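To make the three points above concrete, here is a minimal sketch of a dictionary-based memoization of the recurrence from the question. The names count_ways, go, prices and memo are made up for illustration, and the bounds check is done before the memo lookup so a negative index can never alias another entry.

```python
def count_ways(c, i, prices=(2, 3, 2, 1, 4)):
    """Count the ways to spend exactly c coins on the first i beers."""
    memo = {}  # maps (coins_left, beers_left) -> number of ways

    def go(c, i):
        # Reject invalid states before touching the memo.
        if c < 0 or i < 0:
            return 0
        if c == 0:
            return 1
        key = (c, i)
        if key not in memo:
            # Same recurrence as the question: skip beer i, or buy it.
            memo[key] = go(c, i - 1) + go(c - prices[i - 1], i - 1)
        return memo[key]

    return go(c, i)


print(count_ways(5, 5))  # 4
```

With the prices from the question, the four ways are 2+3 (using either beer that costs 2), 1+4, and 2+2+1.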


Finding the Maximum Pyramidal Number by recursion in Python

I'm given the task to define a function to find the largest pyramidal number. For context, this is what pyramidal numbers are:
1 = 1^2
5 = 1^2 + 2^2
14 = 1^2 + 2^2 + 3^2
And so on.
The first part of the question requires me to find iteratively the largest pyramidal number within the range of argument n. To which, I successfully did:
def largest_square_pyramidal_num(n):
    total = 0
    i = 0
    while total <= n:
        total += i**2
        i += 1
    if total > n:
        return total - (i-1)**2
    else:
        return total
So far, I can catch on.
The next part of the question then requires me to define the same function, but this time recursively. That's where I was instantly stunned. For the usual recursive functions that I have worked on before, I had always operated ON the argument, but had never come across a function where the argument was the condition instead. I struggled for quite a while and ended up with a function I knew clearly would not work. But I simply could not wrap my head around how to "recurse" such function. Here's my obviously-wrong code:
def largest_square_pyramidal_num_rec(n):
    m = 0
    pyr_number = 0
    pyr_number += m**2
    def pyr_num(m):
        if pyr_number >= n:
            return pyr_number
        else:
            return pyr_num(m+1)
    return pyr_number
I know this is erroneous, and I can say why, but I don't know how to correct it. Does anyone have any advice?
Edit: At the kind request of a fellow programmer, here is my logic and what I know is wrong:
Here's my logic: The process that repeats itself is the addition of square numbers to give the pyr num. Hence this is the recursive process. But this isn't what the argument is about, hence I need to redefine the recursive argument. In this case, m, and build up to a pyr num of pyr_number, to which I will compare with the condition of n. I'm used to recursion in decrements, but it doesn't make sense to me (I mean, where to start?) so I attempted to recall the function upwards.
BUT this clearly isn't right. First of all, I'm sceptical of defining the element m and pyr_num outside of the pyr_num subfunction. Next, m isn't pre-defined. Which is wrong. Lastly and most importantly, the calling of pyr_num will always call back pyr_num = 0. But I cannot figure out another way to write this logic out
Here's a recursive function to calculate the pyramid number, based on how many terms you give it.
def pyramid(terms: int) -> int:
    if terms <= 1:
        return 1
    return terms * terms + pyramid(terms - 1)

pyramid(3) # 14
If you can understand what the function does and how it works, you should be able to figure out another function that gives you the largest pyramidal number that does not exceed n.
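def base(n):
    return rec(n, 0, 0)

def rec(n, i, tot):
    # tot is the sum of squares below i; stop once it overshoots n
    if tot > n:
        return tot - (i-1)**2
    else:
        return rec(n, i+1, tot+i**2)

print(base(NUMBER))  # NUMBER is a placeholder for your input
This outputs the same thing as your non-recursive function.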

Is my code correct to find a prime number by means of recursion in python? or is the answer key?

I am studying Python from the book "a beginner guide to python 3" by Mr. John Hunt. In chapter 8, which is about recursion, there is an exercise that asks for code that decides whether a number is prime by means of recursion. I wrote the first code below independently, but the answer key is structured differently. Because I am very unsure about recursion, what is your analysis of these two? Which is more recursive?
My code:
def is_prime(n, holder = 1):
    if n == 2:
        return True
    else:
        if (n-1 + holder)%(n-1) == 0:
            return False
        else:
            return is_prime(n-1, holder+1)

print('is_prime(9):', is_prime(9))
print('is_prime(31):', is_prime(31))
Answer key:
def is_prime(n, i=2):
    # Base cases
    if n <= 2:
        return True if (n == 2) else False
    if n % i == 0:
        return False
    if i * i > n:
        return True
    # Check for next divisor
    return is_prime(n, i + 1)

print('is_prime(9):', is_prime(9))
print('is_prime(31):', is_prime(31))
My suggestion in this case would be not to use recursion at all. Whilst I understand that you want to use this as a learning example of how to use recursion, it is also important to learn when to use recursion.
Recursion has a maximum allowed depth, because the deeper the recursion, the more items need to be put on the call stack. As such, this is not a good example to use recursion for, because it is easy to reach the maximum in this case. Even the "model" example code suffers from this. The exact maximum recursion depth may be implementation-dependent, but for example, if I try to use it to compute is_prime(1046527) then I get an error:
RecursionError: maximum recursion depth exceeded while calling a Python object
and inserting a print(i) statement shows that it is encountered when i=998.
A simple non-recursive equivalent of the "model" example will not have this problem. (There are more efficient solutions, but this one is trying to stay close to the model solution apart from not using recursion.)
def is_prime(n):
    if n == 2:
        return True
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True
(In practice you would probably also want to handle n<2 cases.)
If you want a better example of a problem to practise recursive programming, check out the Tower of Hanoi problem. In this case, you will find that using recursion allows you to make a simpler and cleaner solution than is possible without it, while being unlikely to involve exceeding the maximum recursion depth (you are unlikely to need to consider a tower 1000 disks high, because the solution would require a vast number of moves, 2^1000-1 or about 10^301).
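As a rough illustration of why the Tower of Hanoi suits recursion, here is a minimal sketch. The function name hanoi, the peg labels and the moves list are assumptions made up for this example, not part of any library.

```python
def hanoi(n, source, target, spare, moves):
    """Append to `moves` the (from_peg, to_peg) steps that move n disks."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way onto the spare peg.
    hanoi(n - 1, source, spare, target, moves)
    # Move the largest remaining disk directly to the target.
    moves.append((source, target))
    # Move the n-1 smaller disks from the spare peg onto the target.
    hanoi(n - 1, spare, target, source, moves)


moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 7, i.e. 2**3 - 1
```

The recursion depth here is only n, while the number of moves is 2^n - 1, so the running time becomes impractical long before the stack limit matters.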
As another good example of where recursion can be usefully employed, try using turtle graphics to draw a Koch snowflake.
I'd say the Answer Key needs improvement. We can make it faster and handle the base cases more cleanly:
def is_prime(n, i=3):
    # Base cases
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    if i * i > n:
        return True
    if n % i == 0:
        return False
    # Check for next divisor
    return is_prime(n, i + 2)
The original answer key starts at 2 and counts up by 1 -- here we start at 3 and count up by 2.
As far as your answer goes, there's a different flaw to consider. Python's default stack depth is 1,000 frames, and your function fails shortly above input of 1,000. The solution above uses recursion more sparingly and can handle input of up to nearly 4,000,000 before hitting up against Python's default stack limit.
Yes, your example seems to work correctly. Note, however, that by the nature of the implementation, the answer key is more efficient. To verify that a number n is prime, your algorithm uses a maximum of n-1 function calls, while the provided answer stops once the trial divisor exceeds sqrt(n). Checking larger divisors generally makes no sense, since if n is divisible without remainder by a value a > sqrt(n), it must also be divisible by b = n / a, which is smaller than sqrt(n).
Furthermore, your code raises an exception when evaluated at n = 1, since the modulo by 0 is not defined (Python raises ZeroDivisionError).
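The sqrt(n) argument above can be checked with a small sketch that lists divisors in pairs (a, n // a). The helper name divisor_pairs is made up for illustration; math.isqrt requires Python 3.8 or later.

```python
import math


def divisor_pairs(n):
    """Return every pair (a, b) with a * b == n and a <= b."""
    pairs = []
    # Only candidates up to floor(sqrt(n)) need to be tried:
    # each divisor above sqrt(n) shows up as the partner n // a.
    for a in range(1, math.isqrt(n) + 1):
        if n % a == 0:
            pairs.append((a, n // a))
    return pairs


print(divisor_pairs(36))  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]
print(divisor_pairs(31))  # [(1, 31)]
```

A number greater than 1 is prime exactly when the only pair found is (1, n).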

Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:
1_000_000_000_000_000 in range(1_000_000_000_000_001)
Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).
I have also tried things like this, but the calculation is still almost instant:
# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)
If I try to implement my own range function, the result is not so nice!
def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return
What is the range() object doing under the hood that makes it so fast?
Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.
The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.
The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.
From the range() object documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
So at a minimum, your range() object would do:
class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('my_range object index out of range')

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0
This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.
I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.
* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this an O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.
The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.
You can tell this pretty easily:
>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]
If it were a generator, iterating it once would exhaust it:
>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]
What range actually is, is a sequence, just like a list. You can even test this:
>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True
This means it has to follow all the rules of being a sequence:
>>> a[3] # indexable
3
>>> len(a) # sized
5
>>> 3 in a # membership
True
>>> reversed(a) # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3) # implements 'index'
3
>>> a.count(3) # implements 'count'
1
The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.
(As a side note, if you print(iter(a)), you'll notice that in CPython 3 range gets its own range_iterator type rather than reusing list's list_iterator. The iterator does not need any stored values; it only needs the range's parameters to produce each value on demand.)
Now, there's nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?
But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhary points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)
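The fallback described above can be observed on a small range, where the slow path is still instant. This is only a sketch of the behaviour under CPython; the class name MyInt is made up for the example.

```python
from decimal import Decimal

r = range(0, 10, 2)  # 0, 2, 4, 6, 8


class MyInt(int):
    """An int subclass; not an exact int, so no arithmetic shortcut."""


# Exact ints use the arithmetic test in CPython.
print(4 in r)           # True
print(5 in r)           # False

# Other numeric types are compared against the values one by one,
# but the answer is the same because they compare equal to ints.
print(4.0 in r)         # True
print(3.0 in r)         # False
print(Decimal(4) in r)  # True
print(MyInt(4) in r)    # True
```

The results are identical on both paths; only the cost differs, which is why the difference only becomes visible on very large ranges.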
If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
Use the source, Luke!
In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:
Check that the number is between start and stop, and
Check that the stride value doesn't "step over" our number.
For example, 994 is in range(4, 1000, 2) because:
4 <= 994 < 1000, and
(994 - 4) % 2 == 0.
The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:
static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}
The "meat" of the idea is mentioned in the comment lines:
/* positive steps: start <= ob < stop */
/* negative steps: stop < ob <= start */
/* result = ((int(ob) - start) % step) == 0 */
As a final note - look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):
>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
...
>>> x_ = MyInt(x)
>>> x in r # calculates immediately :)
True
>>> x_ in r # iterates for ages.. :(
^\Quit (core dumped)
To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):
static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}
So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).
If it’s not an int object, it falls back to iterating until it finds the value (or not).
The whole logic could be translated to pseudo-Python like this:
def range_contains (rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long (r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0
If you're wondering why this optimization was added to range.__contains__, and why it wasn't added to xrange.__contains__ in 2.7:
First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)
Meanwhile:
Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:
Range objects have very little behavior: they only support indexing, iteration, and the len function.
This wasn't quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.
Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2's range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)
So, there were two chances to get this optimization backported to 2.7, but they were both rejected.
* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.
** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn't apply.
The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:
>>> r = range(5)
>>> for i in r:
...     print(i, 2 in r, list(r))
...
0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]
As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.
It's all about a lazy approach to evaluation plus some extra optimization of range.
Values in ranges don't need to be computed until they are really used, and thanks to the extra optimization sometimes not even then.
By the way, your integer is not that big; consider sys.maxsize.
sys.maxsize in range(sys.maxsize) is pretty fast
thanks to the optimization: it's easy to compare the given integer against just the min and max of the range.
But:
Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.
(In this case the optimization in range does not apply, so when Python receives an unexpected Decimal, it compares it against every number.)
You should be aware of this implementation detail, but you should not rely on it, because it may change in the future.
TL;DR
The object returned by range() is actually a range object. This object implements the iterable interface, so you can iterate over its values sequentially, just like a generator, list, or tuple.
But it also implements the __contains__ interface which is actually what gets called when an object appears on the right-hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).
Due to optimization, it is very easy to compare given integers just with min and max range.
The reason that the range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
So for explaining the logic here:
Check whether the number is between the start and stop.
Check whether the step lands exactly on our number rather than skipping over it.
Take an example, 997 is in range(4, 1000, 3) because:
4 <= 997 < 1000, and (997 - 4) % 3 == 0.
Try x-1 in (i for i in range(x)) for large x values, which uses a generator expression to avoid invoking the range.__contains__ optimisation.
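A rough way to see the difference is to time both forms with timeit. This is only a sketch: the value of x and the repeat count are arbitrary choices, and the absolute numbers depend on the machine.

```python
import timeit

x = 10**6

# Direct membership test: CPython answers this arithmetically.
fast = timeit.timeit(lambda: x - 1 in range(x), number=3)

# Wrapping the range in a generator expression hides __contains__,
# so `in` has to consume values one at a time until it finds x - 1.
slow = timeit.timeit(lambda: x - 1 in (i for i in range(x)), number=3)

print(f"range:     {fast:.6f} s")
print(f"generator: {slow:.6f} s")
```

Growing x makes the generator version proportionally slower, while the direct test stays essentially flat.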
TLDR;
A range is an arithmetic series, so it can very easily calculate whether the object is in it. It can even find the index of the object really quickly, just as if it were a list.
The __contains__ method compares directly with the start and end of the range.

How many combinations are possible?

The recursive formula for computing the number of ways of choosing k items out of a set of n items, denoted C(n,k), is:
         { 1                         if k = 0
C(n,k) = { 0                         if n < k
         { C(n-1,k-1) + C(n-1,k)     otherwise
I’m trying to write a recursive function C that computes C(n,k) using this recursive formula. The code I have written should work according to myself but it doesn’t give me the correct answers.
This is my code:
def combinations(n,k):
    # base case
    if k == 0:
        return 1
    elif n < k:
        return 0
    # recursive case
    else:
        return combinations(n-1,k-1) + combinations(n-1,k)
The answers should look like this:
>>> c(2, 1)
0
>>> c(1, 2)
2
>>> c(2, 5)
10
but I get other numbers... don’t see where the problem is in my code.
I would try reversing the arguments, because as written n < k.
I think you mean this:
>>> c(2, 1)
2
>>> c(5, 2)
10
Your calls, e.g. c(2, 5) means that n=2 and k=5 (as per your definition of c at the top of your question). So n < k and as such the result should be 0. And that’s exactly what happens with your implementation. And all other examples do yield the actually correct results as well.
Are you sure that the arguments of your example test cases have the correct order? Because they are all c(k, n)-calls. So either those calls are wrong, or the order in your definition of c is off.
This is one of those times where you really shouldn't be using a recursive function. Computing combinations is very simple to do directly. For some things, like a factorial function, using recursion is no big deal, because each step makes only a single recursive call (although note that CPython does not perform tail-call optimization, so the recursion depth limit still applies).
Here's the reason why:
Why do we never use this definition for the Fibonacci sequence when we are writing a program?
def fibbonacci(idx):
    if(idx < 2):
        return idx
    else:
        return fibbonacci(idx-1) + fibbonacci(idx-2)
The reason is that, because of the recursion, it is prohibitively slow: every call spawns two further calls, so the same values are recomputed over and over. Multiple separate recursive calls should be avoided if possible, for the same reason.
If you do insist on using recursion, I would recommend reading this page first. A better recursive implementation will require only one recursive call each time. Rosetta code seems to have some pretty good recursive implementations as well.
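As one possible sketch of a recursion with a single recursive call per step, the identity C(n,k) = C(n-1,k-1) * n / k can be used. The function name choose is made up here, and this is not necessarily the implementation the linked pages describe. The standard library's math.comb (Python 3.8+) computes the same value directly and is used below only as a cross-check.

```python
import math


def choose(n, k):
    """Number of ways to choose k items out of n, one recursive call per step."""
    if k < 0 or k > n:
        return 0
    if k == 0:
        return 1
    # C(n, k) * k == C(n-1, k-1) * n, so the division below is exact.
    return choose(n - 1, k - 1) * n // k


print(choose(5, 2))     # 10
print(choose(2, 5))     # 0
print(math.comb(5, 2))  # 10
```

The recursion depth is k rather than exponential in n, so there is no repeated work to memoize.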

Memoization In Python

I'd really love your help with understanding this use of memoization in Python. I'm new to Python and I'm not quite sure how to understand this syntax.
def fib_mem(n):
    return fib_mem_helper(n,[0,1]+[-1]*(n-1))

def fib_mem_helper(i,mem):
    if mem[i] == -1:
        mem[i]=fib_mem_helper(i-1,mem) + fib_mem_helper(i-2,mem)
        return mem[i]
This is code I saw for evaluating Fibonacci numbers using memoization. What does [0,1]+[-1]*(n-1) mean? Can you please explain what this notation represents?
[0, 1] + [-1] * (n - 1) means "concatenate two lists, one is [0, 1], the other one is a -1 repeated n-1 times".
[-1]*5 will create a new list with five elements, each being -1, i.e. [-1, -1, -1, -1, -1]
[0, 1]+[-1]*5 will concatenate the two lists, giving [0, 1, -1, -1, -1, -1, -1]
Strange coding, though. Looks like syntax errors. But according to your question:
[0,1] is a list with two elements, the first is an integer 0 and the second one is an integer 1.
A sensible implementation of such a function with memoization in Python would be:
def fib(i):
    try:
        return fib._memory[i]
    except KeyError:
        pass
    if i == 1 or i == 2:
        return 1
    else:
        f = fib(i-1) + fib(i-2)
        fib._memory[i] = f
        return f

fib._memory = {}
Memoization is a technique to avoid re-computing the same problem. I will come back to your question but here is an easier to understand solution.
mem = {0:0, 1:1}

def fib(n):
    global mem
    if mem.get(n,-1) == -1:
        mem[n] = fib(n-1) + fib(n-2)
    return mem[n]
By making mem a global variable, you can take advantage of memoization across calls to fib(). The line mem.get(n,-1) == -1 checks if mem already contains the computation for n. If so, it returns the result mem[n]. Otherwise, it makes recursive calls to fib() to compute fib(n) and stores this in mem[n].
Let's walk through your code. The second argument in fib_mem_helper(n,[0,1]+[-1]*(n-1)) passes a list of the form [0,1,-1,-1,...] with a length of (n+1). Within the fib_mem_helper function, this list is referenced by the variable mem. If mem[i] turns out to be -1, you compute mem[i]; otherwise you use the already computed result for mem[i]. Since you are not persisting mem across calls to fib_mem(), every call starts with a fresh memo and recomputes everything, so repeated calls are much slower than they need to be.
First, I have to say that even after editing, your code still has a wrong indentation: return mem[i] should be unindented.
Among list operations, "+" means concatenation and "*" means repetition, so [0,1]+[-1]*(n-1) means the list [0, 1, -1, ..., -1] (with n-1 copies of -1 in total).
More explanation:
List [0, 1, -1, ..., -1] stores calculated fibonacci sequences(memoization). Initially it only contains two valid values: 0 and 1, all "-1" elements mean the sequence at that index has not been computed yet. This memo is passed as the 2nd parameter to function fib_mem_helper. If the specified index(i.e. i)'s fibonacci number hasn't been computed(test if mem[i] == -1), fib_mem_helper will recursively compute it and store it to mem[i]. If it's been computed, just return from the memo without recomputing.
That's the whole story.
Final word:
This code is not efficient enough, although it makes use of memoization. In fact, it creates a new list each time fib_mem is called. For example, if you call fib_mem(8) twice, the second call still has to recreate the list and recompute everything afresh. The reason is that you store the memo inside the scope of fib_mem. To fix it, you could save the memo as a dictionary that lives outside fib_mem.
The speed-up in Python can be a million-fold or more when using memoization on certain functions. Here is an example with the Fibonacci series. The conventional recursive way looks like this and takes forever.
def recursive_fibonacci(number):
    if number==0 or number==1:
        return 1
    return recursive_fibonacci(number-1) + recursive_fibonacci(number-2)

print(recursive_fibonacci(50))
The same algorithm with memoization takes a few milliseconds. Try it yourself!
def memoize(f):
    memo={}
    def helper(x):
        # compute each argument only once, then reuse the stored result
        if x not in memo:
            memo[x]=f(x)
        return memo[x]
    return helper

@memoize
def recursive_fibonacci(number):
    if number==0 or number==1:
        return 1
    return recursive_fibonacci(number-1) + recursive_fibonacci(number-2)

print(recursive_fibonacci(50))
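As an alternative to a hand-written decorator, the standard library's functools.lru_cache provides the same kind of memoization. The sketch below swaps it in and uses the conventional base cases fib(0) = 0 and fib(1) = 1, which differ from the function above; the name fib is chosen only for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache: every distinct n is computed once
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(50))           # 12586269025
print(fib.cache_info())  # hit/miss statistics for the cache
```

Because the cache lives on the decorated function, it persists across calls, so a later fib(40) is answered from the cache without any recursion.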
