I wrote a recursive solution for something today, and as it goes, curiosity led me down a weird path. I wanted to see how an optimized recursive solution compares to an iterative solution so I chose a classic, the Nth Fibonacci to test with.
I was surprised to find that the recursive solution with memoization is much faster than the iterative solution and I would like to know why.
Here is the code (using python3):
import time
import sys
sys.setrecursionlimit(10000)
## recursive:
def fibr(n, memo = {}):
    if n <= 1:
        return n
    if n in memo:
        return memo[n]
    memo[n] = fibr(n-1, memo) + fibr(n-2, memo)
    return memo[n]

## iterative:
def fibi(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

rstart = time.time()
for n in range(10000):
    fibr(n)
rend = time.time()

istart = time.time()
for n in range(10000):
    fibi(n)
iend = time.time()

print(f"recursive: {rend-rstart}")
print(f"iterative: {iend-istart}")
My results:
recursive: 0.010010004043579102
iterative: 6.274333238601685
Unless I'm mistaken, both the recursive solution and the iterative solution are about as optimized as they can get? If I'm wrong about that, I'd like to know why.
If not, what would cause the iterative solution to be so much slower? It seems to be slower for all values of n, but harder to notice when n is something more reasonable, like <1000. (I'm using 10000 as you can see above)
Some things I've tried:
I thought it might be the magic swapping in the iterative solution a, b = b, a + b, so I tried replacing it with a more traditional "swap" pattern:
tmp = a + b
a = b
b = tmp
#a, b = b, a + b
But the results are basically the same, so that's not the problem there.
I re-arranged the code so that the iterative solution runs first, just to see if there was some weird cache issue at the OS level. It doesn't change the results, so that's probably not it.
My understanding here (and it might be wrong) is that the recursive solution with memoization is O(n). And the iterative solution is also O(n) simply because it iterates from 0..n.
Am I missing something really obvious? I feel like I must be missing something here.
You might expect that def fibr(n, memo = {}): — when called without a supplied memo — turns into something translated a bit like:
def _fibr_without_defaults(n, memo):
    ...

def fibr(n):
    return _fibr_without_defaults(n, {})
That is, if memo is missing it implicitly gets a blank dictionary per your default.
In actuality, it translates into something more like:
def _fibr_without_defaults(n, memo):
    ...

_fibr_memo_default = {}

def fibr(n):
    return _fibr_without_defaults(n, _fibr_memo_default)
That is, a default argument value is not "use this to construct a default value" but instead "use this actual value by default". Every single call to fibr you make (without supplying memo) is sharing a default memo dictionary.
That means in:
for n in range(10000):
    fibr(n)
The prior iterations of your loop are filling out memo for the future iterations. When n is 1000, for example, all the work performed by n<=999 is still stored.
By contrast, the iterative version always starts iterating from 0, no matter what work prior iterative calls performed.
If you perform the translation above by hand, so that you really do get a fresh empty memo for each call, you'll see the iterative version is faster. (Makes sense; inserting things into a dictionary and retrieving them just to do the same work as simple iteration will be slower.)
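For example, here is a minimal sketch of that hand translation plus a wrapper that really does get a fresh memo on every call (fibr_fresh is my name, not from your code):

def _fibr_without_defaults(n, memo):
    if n <= 1:
        return n
    if n in memo:
        return memo[n]
    memo[n] = _fibr_without_defaults(n-1, memo) + _fibr_without_defaults(n-2, memo)
    return memo[n]

def fibr_fresh(n):
    # a brand-new, empty memo for every top-level call
    return _fibr_without_defaults(n, {})

With this version the benchmark loop no longer benefits from earlier iterations: each fibr_fresh(n) redoes O(n) work, just like fibi(n), plus the dictionary overhead. (Note that with a fresh memo the recursion really is about n frames deep, so for n near 10000 the raised recursion limit matters.)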
They are not the same.
The recursive version uses the memoization pattern: it calculates the result of fibr(n) only once and caches it for an instant return if it's needed again. Across your benchmark loop that amounts to O(n) total work.
The iterative version calculates everything from scratch on every call, so the benchmark loop does O(n^2) total work (I think).
Related
I tried a problem on Project Euler where I needed to find the sum of all the even Fibonacci terms under 4 million. It took me a long time, but then I found out that I could use memoization, and it still seemed to take a long time. After a lot of research, I found out that I can use a decorator called lru_cache from the built-in functools module. My question is: why isn't my memoization as fast as lru_cache?
Here's my code:
from functools import lru_cache

@lru_cache(maxsize=1000000)
def fibonacci_memo(input_value):
    global value
    fibonacci_cache = {}
    if input_value in fibonacci_cache:
        return fibonacci_cache[input_value]
    if input_value == 0:
        value = 1
    elif input_value == 1:
        value = 1
    elif input_value > 1:
        value = fibonacci_memo(input_value - 1) + fibonacci_memo(input_value - 2)
    fibonacci_cache[input_value] = value
    return value

def sumOfFib():
    SUM = 0
    for n in range(500):
        if fibonacci_memo(n) < 4000000:
            if fibonacci_memo(n) % 2 == 0:
                SUM += fibonacci_memo(n)
    return SUM

print(sumOfFib())
The code works by the way. It takes less than a second to run it when I use the lru_cache module.
The other answer is the correct way to calculate the fibonacci sequence, indeed, but you should also know why your memoization wasn't working. To be specific:
fibonacci_cache = {}
This line being inside the function means you were emptying your cache every time fibonacci_memo was called.
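A minimal fix, keeping the rest of your structure, is to create the cache once, outside the function, so it survives between calls. This is just a sketch (it keeps your F(0) = F(1) = 1 convention and drops the global):

fibonacci_cache = {}  # created once, at module level, so it persists across calls

def fibonacci_memo(input_value):
    if input_value in fibonacci_cache:
        return fibonacci_cache[input_value]
    if input_value <= 1:
        value = 1
    else:
        value = fibonacci_memo(input_value - 1) + fibonacci_memo(input_value - 2)
    fibonacci_cache[input_value] = value
    return value

With the cache hoisted out, each Fibonacci number is computed only once, which is essentially what lru_cache was doing for you.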
You shouldn't be computing the Fibonacci numbers one by one, not even by dynamic programming. Since the Fibonacci sequence satisfies a linear recurrence relation with constant coefficients and constant order, so does the sequence of its partial sums.
Definitely don't cache all the values; that consumes memory unnecessarily. When the recurrence has constant order, you only need to remember as many previous terms as the order of the recurrence.
Furthermore, there is a way to turn recurrences of constant order into systems of recurrences of order one. The solution of the latter is given by a power of a matrix. This gives a faster algorithm for large values of n, although each step is more expensive. So the best method would use a combination of the two, choosing the first method for small values of n and the matrix method for large inputs.
O(n) using the recurrence for the sum
Denote by S_n = F_0 + F_1 + ... + F_n the sum of the Fibonacci numbers F_0, F_1, ..., F_n.
Observe that
S_{n+1}-S_n=F_{n+1}
S_{n+2}-S_{n+1}=F_{n+2}
S_{n+3}-S_{n+2}=F_{n+3}
Since F_{n+3}=F_{n+2}+F_{n+1} we get that S_{n+3}-S_{n+2}=S_{n+2}-S_n. So
S_{n+3}=2S_{n+2}-S_n
with the initial conditions S_0=F_0=1, S_1=F_0+F_1=1+1=2, and S_2=S_1+F_2=2+2=4.
One thing that you can do is compute S_n bottom up, remembering the values of only the previous three terms at each step. You don't need to remember all of the values of S_k, from k=0 to k=n. This gives you an O(n) algorithm with O(1) amount of memory.
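For example, a minimal sketch of that bottom-up computation (the function name is mine), using the recurrence S_{n+3} = 2S_{n+2} - S_n and the F_0 = F_1 = 1 convention used here:

def sum_fib_linear(n):
    # returns S_n = F_0 + F_1 + ... + F_n, keeping only the last three sums
    if n == 0:
        return 1
    if n == 1:
        return 2
    s0, s1, s2 = 1, 2, 4  # S_0, S_1, S_2
    for _ in range(n - 2):
        s0, s1, s2 = s1, s2, 2*s2 - s0
    return s2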
O(ln(n)) by matrix exponentiation
You can also get an O(ln(n)) algorithm in the following way:
Let X_n be the column vector with components S_{n+2}, S_{n+1}, S_n.
Then the recurrence above becomes
X_{n+1}=AX_n
where A is the matrix
[
[2,0,-1],
[1,0,0],
[0,1,0],
]
Therefore X_n = A^n X_0. We know X_0, and A^n can be computed with exponentiation by squaring.
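A sketch of that idea in plain Python (the function names are mine; as a quick check, sum_fib_log(3) should give S_3 = 7):

def mat_mult(A, B):
    # multiply two 3x3 integer matrices
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(A, e):
    # exponentiation by squaring
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, A)
        A = mat_mult(A, A)
        e >>= 1
    return R

def sum_fib_log(n):
    # X_n = A^n X_0 with X_0 = (S_2, S_1, S_0) = (4, 2, 1); its last component is S_n
    if n <= 2:
        return (1, 2, 4)[n]
    A = [[2, 0, -1],
         [1, 0, 0],
         [0, 1, 0]]
    An = mat_pow(A, n)
    return sum(An[2][j] * x for j, x in enumerate((4, 2, 1)))

This does O(ln(n)) matrix multiplications, although for very large n the cost of multiplying the big integers inside the matrix starts to dominate each step.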
For the sake of completeness, here are implementations of the general ideas described in #NotDijkstra's answer, plus my humble optimizations, including the "closed form" solution implemented in integer arithmetic.
We can see that the "smart" methods are not only an order of magnitude faster but also seem to scale better, consistent with the fact (thanks #NotDijkstra) that Python big ints use better-than-naive multiplication.
import numpy as np
import operator as op
from simple_benchmark import BenchmarkBuilder, MultiArgument

B = BenchmarkBuilder()

def pow(b,e,mul=op.mul,unit=1):
    if e == 0:
        return unit
    res = b
    for bit in bin(e)[3:]:
        res = mul(res,res)
        if bit=="1":
            res = mul(res,b)
    return res

def mul_fib(a,b):
    return (a[0]*b[0]+5*a[1]*b[1])>>1 , (a[0]*b[1]+a[1]*b[0])>>1

def fib_closed(n):
    return pow((1,1),n+1,mul_fib)[1]

def fib_mat(n):
    return pow(np.array([[1,1],[1,0]],'O'),n,op.matmul)[0,0]

def fib_sequential(n):
    t1,t2 = 1,1
    for i in range(n-1):
        t1,t2 = t2,t1+t2
    return t2

def sum_fib_direct(n):
    t1,t2,res = 1,1,1
    for i in range(n):
        t1,t2,res = t2,t1+t2,res+t2
    return res

def sum_fib(n,method="closed"):
    if method == "direct":
        return sum_fib_direct(n)
    return globals()[f"fib_{method}"](n+2)-1

methods = "closed mat sequential direct".split()

def f(method):
    def f(n):
        return sum_fib(n,method)
    f.__name__ = method
    return f

for method in methods:
    B.add_function(method)(f(method))
B.add_arguments('N')(lambda:(2*(1<<k,) for k in range(23)))

r = B.run()
r.plot()

import matplotlib.pylab as P
P.savefig("fib.png")
I am not sure how you are taking anything near a second. Here is the memoized version without fanciness:
class fibs(object):
    def __init__(self):
        self.thefibs = {0:0, 1:1}

    def __call__(self, n):
        if n not in self.thefibs:
            self.thefibs[n] = self(n-1)+self(n-2)
        return self.thefibs[n]

dog = fibs()
sum([dog(i) for i in range(40) if dog(i) < 4000000])
Inspired by the math of this video, I was curious how long it takes to get a number n using only the number 1 and a few operations like addition and multiplication, working in binary. Currently, I have things coded up like this:
import itertools as it
def brute(m):
    out = set()
    for combo in it.product(['+','*','|'], repeat=m):
        x = parse(combo)
        if type(x) == int and 0 < x-1:
            out.add(x)
    return out

def parse(ops):
    eq = ""
    last = 1
    for op in ops:
        if op == "|":
            last *= 2
            last += 1
        else:
            eq += str(last)
            eq += op
            last = 1
    eq += str(last)
    return eval(eq)
Here, "|" refers to concatenation, thus combo = ("+","|","*","|","|") would parse as 1+11*111 = 1+3*7 = 22. I'm using this to build a table of how many numbers are m operations away and study how this grows. Currently I am using the itertools product function to search all possible operation combos, despite this being a bit repetitive, as "+" and "*" are both commutative operations. Is there a clean way to only generate the commutatively unique expressions?
Currently, my best idea is to have a second brute force program, brute2(k) which only uses the operations "*" and "|", and then have:
def brute(m):
    out = set()
    for k in range(m):
        for a,b in it.product(brute(k), brute2(m-k)):
            out.add(a+b)
    return out
If I memoize things, this would be pretty decent, but I'm not sure if there's a more Pythonic or efficient way than this. Furthermore, this fails to generalize cleanly if I decide to add more operations, like subtraction.
What I'm hoping to achieve is insight on if there is some module or simple method which will efficiently iterate through the operation combos. Currently, itertools.product has complexity O(3^m). However, with memoizing and using the brute2 method, it seems I could get things down to complexity O(|brute(m-1)|), which seems to be asymptotically ~O(1.5^m). While I am moderately happy with this second method, it would be nice if there was a more generalizable method which would extend for arbitrary amounts of operations, some of which are commutative.
Update:
I've now gotten my second idea coded up. With this, I was able to quickly get all numbers reachable in 42 operations, when my old code got stuck for hours after 20 operations. Here is the new code:
memo1 = {0:{1}}
def brute1(m):
    if m not in memo1:
        out = set(brute2(m+1))
        for i in range(m):
            for a,b in it.product(brute1(i), brute2(m-i)):
                out.add(a+b)
        memo1[m] = set(out)
    return memo1[m]

memo2 = {0:{1}}
def brute2(m):
    if m not in memo2:
        out = set()
        for i in range(m):
            for x in brute2(i):
                out.add(x*(2**(m-i)-1))
        memo2[m] = set(out)
    return memo2[m]
If I were to generalize this: order all your commutative operations, [op_1, op_2, ..., op_x], have brute(n, 0) return all numbers reachable with n non-commutative operations, and then have:
memo = {}
def brute(n, i):
    if (n, i) not in memo:
        out = set(brute(n, i-1))  # copy, to cover the case where I don't use op_i at all
        for x in range(1, n):  # x is the number of operations before my last use of op_i; if combo = (op_i,"|","+","+","*","+",op_i,"*","|"), this case would be covered when x = 6
            for a, b in it.product(brute(x, i), brute(n-x-1, i-1)):
                out.add(op_i(a, b))  # op_i stands for the i-th commutative operation, applied as a function
        memo[(n, i)] = out
    return memo[(n, i)]
However, you have to be careful about the order of operations. If I did addition and then multiplication, the expressions would group differently and the results would change. If there's a hierarchy like PEMDAS and you don't want to consider parentheses, I believe you just list the operations in decreasing order of priority, i.e. op_1 is the first operation you do, etc. If you allow general parenthesization, then I believe you should have something like this:
memo = {0: {1}}
def brute(m, ops):
    if m not in memo:
        out = set()
        for i in range(m):
            for a, b in it.product(brute(i, ops), brute(m-i-1, ops)):
                for op in ops:
                    out.add(op(a, b))
        memo[m] = out
    return memo[m]
I'm more interested in the case where we have some sort of PEMDAS system in place, and the parentheses are implied by the sequence of operations, but feedback on the validity/efficiency to either case is still welcome.
"Base" meaning without just using lru_cache. All of these are "fast enough" -- I'm not looking for the fastest algorithm -- but the timings surprised me so I was hoping I could learn something about how Python "works".
Simple loop (/tail recursion):
def fibonacci(n):
    a, b = 0, 1
    if n in (a, b): return n
    for _ in range(n - 1):
        a, b = b, a + b
    return b
Simple memoized:
def fibonacci(n, memo={0:0, 1:1}):
    if len(memo) <= n:
        memo[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return memo[n]
Using a generator:
def fib_seq():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

def fibonacci(n):
    return next(x for (i, x) in enumerate(fib_seq()) if i == n)
I expected the first, being dead simple, to be the fastest. It's not. The second is by far the fastest, despite the recursion and lots of function calls. The third is cool, and uses "modern" features, but is even slower, which is disappointing. (I was tempted to think of generators as in some ways an alternative to memoization -- since they remember their state -- and since they're implemented in C I was hoping they'd be faster.)
Typical results:
loop: about 140 μs
memo: about 430 ns
genr: about 250 μs
So can anyone explain, in particular, why memoization is an order of magnitude faster than a simple loop?
EDIT:
Clear now that I have (like many before me) simply stumbled upon Python's mutable default arguments. This behavior explains the real and the apparent gains in execution speeds.
What you're seeing is the whole point of memoization. The first time you call the function, the memo cache is empty and it has to recurse. But the next time you call it with the same or a lower parameter, the answer is already in the cache, so it returns immediately. If you perform thousands of calls, you're amortizing that first call's time over all the other calls. That's what makes memoization such a useful optimization: you only pay the cost the first time.
If you want to see how long it takes when the cache is fresh and you have to do all the recursions, you can pass the initial cache as an explicit argument in the benchmark call:
fibonacci(100, {0:0, 1:1})
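For instance, a sketch of such a measurement with timeit. Note that because the recursive calls in the memoized version above don't pass memo along, they still fall back on the shared default dictionary, so to time a genuinely cold cache this sketch threads the memo through explicitly (the function name is mine):

import timeit

def fibonacci_explicit(n, memo):
    # same memoized recursion, but the cache is passed through every call
    if len(memo) <= n:
        memo[n] = fibonacci_explicit(n - 1, memo) + fibonacci_explicit(n - 2, memo)
    return memo[n]

cold = timeit.timeit("fibonacci_explicit(100, {0: 0, 1: 1})",
                     globals=globals(), number=1000)
print(f"cold cache: {cold / 1000 * 1e6:.1f} microseconds per call on average")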
I created this program for an assignment in which we were required to create an implementation of Quichesort. This is a hybrid sorting algorithm that uses Quicksort until it reaches a certain recursion depth (log2(N), where N is the length of the list), then switches to Heapsort, to avoid exceeding the maximum recursion depth.
While testing my implementation, I discovered that although it generally performed better than regular Quicksort, Heapsort consistently outperformed both. Can anyone explain why Heapsort performs better, and under what circumstances Quichesort would be better than both Quicksort and Heapsort?
Note that for some reason, the assignment referred to the algorithm as "Quipsort".
Edit: Apparently, "Quichesort" is actually identical to
Introsort.
I also noticed that a logic error in my medianOf3() function was
causing it to return the wrong value for certain inputs. Here is an improved
version of the function:
def medianOf3(lst):
    """
    From a lst of unordered data, find and return the median value from
    the first, middle and last values.
    """
    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    middle = lst[(len(lst) - 1) // 2]
    return sorted((first, middle, last))[1]
Would this explain the algorithm's relatively poor performance?
Code for Quichesort:
import heapSort  # heapSort
import math      # log2 (for quicksort depth limit)

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the median value from
    the first, middle and last values.
    """
    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    median = lst[len(lst) // 2]
    return max(min(first, median), min(median, last))

def partition(pivot, lst):
    """
    partition: pivot (element in lst) * List(lst) ->
    tuple(List(less), List(same, List(more))).
    Where:
    List(Less) has values less than the pivot
    List(same) has pivot value/s, and
    List(more) has values greater than the pivot
    e.g. partition(5, [11,4,7,2,5,9,3]) == [4,2,3], [5], [11,7,9]
    """
    less, same, more = [], [], []
    for val in lst:
        if val < pivot:
            less.append(val)
        elif val > pivot:
            more.append(val)
        else:
            same.append(val)
    return less, same, more

def quipSortRec(lst, limit):
    """
    A non in-place, depth limited quickSort, using median-of-3 pivot.
    Once the limit drops to 0, it uses heapSort instead.
    """
    if lst == []:
        return []
    if limit == 0:
        return heapSort.heapSort(lst)
    limit -= 1
    pivot = medianOf3(lst)
    less, same, more = partition(pivot, lst)
    return quipSortRec(less, limit) + same + quipSortRec(more, limit)

def quipSort(lst):
    """
    The main routine called to do the sort. It should call the
    recursive routine with the correct values in order to perform
    the sort
    """
    depthLim = int(math.log2(len(lst)))
    return quipSortRec(lst, depthLim)
Code for Heapsort:
import heapq  # mkHeap (for adding/removing from heap)

def heapSort(lst):
    """
    heapSort(List(Orderable)) -> List(Ordered)
    performs a heapsort on 'lst' returning a new sorted list
    Postcondition: the argument lst is not modified
    """
    heap = list(lst)
    heapq.heapify(heap)
    result = []
    while len(heap) > 0:
        result.append(heapq.heappop(heap))
    return result
The basic facts are as follows:
Heapsort has worst-case O(n log(n)) performance but tends to be slow in practice.
Quicksort has O(n log(n)) performance on average and O(n^2) in the worst case, but is fast in practice.
Introsort is intended to harness the fast-in-practice performance of quicksort, while still guaranteeing the worst-case O(n log(n)) behavior of heapsort.
One question to ask is, why is quicksort faster "in practice" than heapsort? This is a tough one to answer, but most answers point to how quicksort has better spatial locality, leading to fewer cache misses. However, I'm not sure how applicable this is to Python, as it is running in an interpreter and has a lot more junk going on under the hood than other languages (e.g. C) that could interfere with cache performance.
As to why your particular introsort implementation is slower than Python's heapsort - again, this is difficult to determine. First of all, note that although heapq ships with a pure-Python implementation, CPython swaps in C versions of its core functions (heapify, heappush, heappop) when available, so it is not on an entirely even footing with your code. It may be that creating and concatenating many smaller lists is costly, so you could try rewriting your quicksort to act in-place and see if that helps. You could also try tweaking various aspects of the implementation to see how that affects performance, or run the code through a profiler and see if there are any hot spots. But in the end I think it's unlikely you'll find a definite answer. It may just boil down to which operations are particularly fast or slow in the Python interpreter.
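If you do experiment with an in-place version, one possible shape is sketched below. This is my own sketch, not the assignment's required interface; it reuses the math and heapSort imports from the question's code and a Hoare-style partition:

def quipSortInPlace(lst, lo=0, hi=None, limit=None):
    # in-place, depth-limited quicksort; sorts the slice with heapSort
    # once the depth limit is exhausted
    if hi is None:
        hi = len(lst) - 1
    if limit is None:
        limit = max(1, int(math.log2(len(lst) or 1)))
    if lo >= hi:
        return
    if limit == 0:
        lst[lo:hi+1] = heapSort.heapSort(lst[lo:hi+1])
        return
    pivot = lst[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:  # partition around the pivot without building new lists
        while lst[i] < pivot:
            i += 1
        while lst[j] > pivot:
            j -= 1
        if i <= j:
            lst[i], lst[j] = lst[j], lst[i]
            i += 1
            j -= 1
    quipSortInPlace(lst, lo, j, limit - 1)
    quipSortInPlace(lst, i, hi, limit - 1)

Whether this actually beats the list-building version is exactly the kind of thing a profiler run would tell you.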
The recursive formula for computing the number of ways of choosing k items out of a set of n items, denoted C(n,k), is:
C(n, k) = 1                          if k = 0
C(n, k) = 0                          if n < k
C(n, k) = C(n-1, k-1) + C(n-1, k)    otherwise
I’m trying to write a recursive function C that computes C(n,k) using this recursive formula. As far as I can tell, the code I have written should work, but it doesn’t give me the correct answers.
This is my code:
def combinations(n,k):
    # base case
    if k == 0:
        return 1
    elif n < k:
        return 0
    # recursive case
    else:
        return combinations(n-1,k-1) + combinations(n-1,k)
The answers should look like this:
>>> c(2, 1)
0
>>> c(1, 2)
2
>>> c(2, 5)
10
but I get other numbers... I don’t see where the problem is in my code.
I would try reversing the arguments, because as written n < k.
I think you mean this:
>>> c(2, 1)
2
>>> c(5, 2)
10
Your calls, e.g. c(2, 5), mean that n=2 and k=5 (as per your definition of c at the top of your question). So n < k, and as such the result should be 0. And that’s exactly what happens with your implementation. All the other examples yield the correct results as well.
Are you sure that the arguments of your example test cases have the correct order? Because they are all c(k, n)-calls. So either those calls are wrong, or the order in your definition of c is off.
This is one of those times where you really shouldn't be using a recursive function. Computing combinations is very simple to do directly. For some things, like a factorial function, recursion is no big deal, because there is only one recursive call, which tail-call optimization could eliminate anyway (in languages that do it; CPython does not).
Here's the reason why:
Why do we never use this definition for the Fibonacci sequence when we are writing a program?
def fibbonacci(idx):
    if idx < 2:
        return idx
    else:
        return fibbonacci(idx-1) + fibbonacci(idx-2)
The reason is that, because of the repeated recursive calls, it is prohibitively slow. Multiple separate recursive calls should be avoided where possible, for the same reason.
If you do insist on using recursion, I would recommend reading this page first. A better recursive implementation will require only one recursive call each time. Rosetta code seems to have some pretty good recursive implementations as well.
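For illustration, here is one shape such a single-recursive-call implementation can take, based on the identity C(n, k) = C(n-1, k-1) * n / k (the function name is mine), next to the direct route via math.comb:

import math

def comb_single_recursion(n, k):
    # C(n, k) = C(n-1, k-1) * n / k, so only one recursive call per step
    if k == 0:
        return 1
    if n < k:
        return 0
    return comb_single_recursion(n - 1, k - 1) * n // k

print(comb_single_recursion(5, 2))  # 10
print(math.comb(5, 2))              # 10 (direct computation, Python 3.8+)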