Complexity analysis of nested recursive functions - python

I've written out a recursive algorithm for a little homegrown computer algebra system, where I'm applying pairwise reductions to the list of operands of an algebraic operation (adjacent operands only, as the algebra is non-commutative). I'm trying to get an idea of the runtime complexity of my algorithm (but unfortunately, as a physicist it's been a very long time since I took any undergrad CS courses that dealt with complexity analysis). Without going into details of the specific problem, I think I can formalize the algorithm in terms of a function f that is a "divide" step and a function g that combines the results. My algorithm would then take the following formal representation:
f(1) = 1 # recursion anchor for f
f(n) = g(f(n/2), f(n/2))
g(n, 0) = n, g(0, m) = m # recursion ...
g(1, 0) = g(0, 1) = 1 # ... anchors for g
          / g(g(n-1, 1), m-1)   if reduction is "non-neutral"
g(n, m) = | g(n-1, m-1)         if reduction is "neutral"
          \ n + m               if no reduction is possible
In this notation, the functions f and g receive lists as arguments and return lists, with the length of the input/output lists being the argument and the right-hand-side of the equations above.
For the full story, the actual code corresponding to f and g is the following:
def _match_replace_binary(cls, ops: list) -> list:
    """Reduce list of `ops`"""
    n = len(ops)
    if n <= 1:
        return ops
    ops_left = ops[:n//2]
    ops_right = ops[n//2:]
    return _match_replace_binary_combine(
        cls,
        _match_replace_binary(cls, ops_left),
        _match_replace_binary(cls, ops_right))
def _match_replace_binary_combine(cls, a: list, b: list) -> list:
    """combine two fully reduced lists a, b"""
    if len(a) == 0 or len(b) == 0:
        return a + b
    if len(a) == 1 and len(b) == 1:
        return a + b
    r = _get_binary_replacement(a[-1], b[0], cls._binary_rules)
    if r is None:
        return a + b
    if r == cls.neutral_element:
        return _match_replace_binary_combine(cls, a[:-1], b[1:])
    r = [r, ]
    return _match_replace_binary_combine(
        cls,
        _match_replace_binary_combine(cls, a[:-1], r),
        b[1:])
I'm interested in the worst-case number of times _get_binary_replacement is called, depending on the size of ops.

So I think I've got it now. To restate the problem: find the number of calls to _get_binary_replacement when calling _match_replace_binary with an input of size n.
Define a function g(n, m) (as in the original question) that maps the sizes of the two inputs of _match_replace_binary_combine to the size of the output.
Define a function T_g(n, m) that maps the sizes of the two inputs of _match_replace_binary_combine to the total number of calls to g that are required to obtain the result. This is also the (worst case) number of calls to _get_binary_replacement, as each call to _match_replace_binary_combine calls _get_binary_replacement at most once.
We can now consider the worst case and best case for g:
best case (no reduction): g(n,m) = n + m, T_g(n, m) = 1
worst case (all non-neutral reduction): g(n, m) = 1, T_g(n, m) = 2*(n+m) - 1 (I determined this empirically)
Now, the master theorem (WP) applies:
Going through the description on WP:
k=1 (the recursion anchor is for size 1)
We split into a = 2 subproblems of size n/2 in constant (d = 1) time
After solving the subproblems, the amount of work required to combine the results is c = T_g(n/2, n/2). With the formulas above, this is 2n-1 (i.e. linear in n) in the worst case and 1 in the best case
Thus, following the examples on the WP page for the master theorem, the worst case complexity is n * log(n), and the best case complexity is n
Empirical trials seem to bear out this result. Any objections to my line of reasoning?
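As a rough numerical cross-check of this reasoning, the recurrence can be evaluated directly. This is only a sketch of the size-level model, not of the actual algorithm: total_calls, worst_combine and best_combine are hypothetical names, and the worst-case combine cost is taken at face value from the empirical formula T_g(n, m) = 2*(n+m) - 1, applied to the two halves of a list of length k.

```python
import math
from functools import lru_cache

def total_calls(n, combine_cost):
    """Evaluate T(n) = T(n//2) + T(n - n//2) + combine_cost(n), with T(0) = T(1) = 0."""
    @lru_cache(maxsize=None)
    def T(k):
        if k <= 1:
            return 0
        return T(k // 2) + T(k - k // 2) + combine_cost(k)
    return T(n)

def worst_combine(k):
    # assumed worst case: T_g(a, b) = 2*(a + b) - 1 with a + b = k
    return 2 * k - 1

def best_combine(k):
    # best case: a single call, no reduction possible
    return 1

for n in (8, 64, 1024):
    worst = total_calls(n, worst_combine)
    best = total_calls(n, best_combine)
    # the third column should approach a constant if worst ~ n*log(n)
    print(n, worst, round(worst / (n * math.log2(n)), 3), best)
```

For powers of two this model gives 2*n*log2(n) - n + 1 in the worst case and n - 1 in the best case, which agrees with the n * log(n) and n estimates above.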

Related

Finding if the lcm of a list is in the set 3^d

I need to find a better way to raise the 'tes' variable in python till I can get the print out to terminate.
(Running rather slow at 700)
I think I need a better way of defining the 'shifts' list, and of reducing redundancies, since the problem is symmetric: lcm(shifts(n, m)) = lcm(shifts(m, n))
import math as ma
N, M, k = 3, 3, 0
tes = 700
sides = []
def gcd(n, m):
    if m == 0:
        return n
    return gcd(m, n % m)
for N in range(tes+1):
    M = 3
    for M in range(tes+1):
        n, m, shifts = N, M, []
        while ma.floor(min(n, m)/2) >= 1:
            shifts.append(2*n+2*m-4)
            n -= 2
            m -= 2
        # lcm of shifts list
        lcm = 1
        for i in shifts:
            lcm = lcm * i // gcd(lcm, i)
        p = ma.log(lcm) / ma.log(3)
        # checking to see if power lcm is in {3^d} where d in Natural numbers
        if (p - int(p) == 0) and lcm > 1:
            print(shifts, N, M, lcm)
        M += 1  # no effect: the for loop reassigns M on the next iteration
    N + 1  # no effect: the result of this expression is discarded
So far I have attempted to come up with an equation to make composing the list more efficient (so far this pythonic manner seems to be working better). I was working with sympy, but it became too cumbersome to use sympy products to find the lcm of the list directly, because I couldn't get the bounds correct.
-I would like a way to avoid redundancies, since the lcm is symmetric: f(m,n)=f(n,m).
-I need a list comprehension for the shifts, if not a basic formula.
-I have been looking into mathematical ways to prove whether the two sets (3^d, lcm(shifts)) intersect for some natural numbers, but I need to transform lcm(shifts(n,m)) into an analytic function from the current programmatic/numerical method.
-Any resources are also helpful, because this is just a segment of the total project, and I'm sure any further reading will help with the overall project.
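One possible direction, sketched under assumptions: the shifts list can be written as a comprehension, the symmetry can be used by iterating only over m >= n, and the power-of-3 test can be done in exact integer arithmetic instead of floating-point logarithms. The helper names shifts, lcm_list, is_power_of_three and search are hypothetical and not from the original code.

```python
from functools import reduce
from math import gcd

def shifts(n, m):
    """Same values as the while loop: 2n+2m-4, with n and m both dropping by 2 per step."""
    return [2 * (n + m) - 4 - 8 * k for k in range(min(n, m) // 2)]

def lcm_list(values):
    """lcm of a list of positive integers (1 for an empty list)."""
    return reduce(lambda acc, v: acc * v // gcd(acc, v), values, 1)

def is_power_of_three(x):
    """True if x == 3**d for some integer d >= 1, using exact integer arithmetic."""
    if x < 3:
        return False
    while x % 3 == 0:
        x //= 3
    return x == 1

def search(limit):
    """Collect (n, m, lcm) with lcm a power of 3, visiting each unordered pair once."""
    hits = []
    for n in range(limit + 1):
        for m in range(n, limit + 1):  # m >= n only, since lcm(shifts(n, m)) is symmetric
            value = lcm_list(shifts(n, m))
            if is_power_of_three(value):
                hits.append((n, m, value))
    return hits

print(search(50))
```

If this reading of the loop is right, every entry 2n+2m-4-8k is even, so the lcm of a non-empty shifts list is always even and can never equal 3^d. The exact check then never fires, and any hit reported by the floating-point test would be a rounding artifact.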

Recursion with memory vs loop

I've made two functions for computing the Fibonacci Sequence, one using recursion with memory and one using a loop;
def fib_rec(n, dic = {0 : 0, 1 : 1}):
    if n in dic:
        return dic[n]
    else:
        fib = fib_rec(n - 2, dic) + fib_rec(n - 1, dic)
        dic[n] = fib
        return fib
def fib_loop(n):
    if n == 0 or n == 1:
        return n
    else:
        smaller = 0
        larger = 1
        for i in range(1, n):
            smaller, larger = larger, smaller + larger
        return larger
I've heard that the Fibonacci Sequence often is solved using recursion, but I'm wondering why. Both my algorithms are of linear time complexity, but the one using a loop will not have to carry a dictionary of all past Fibonacci numbers, it also won't exceed Python's recursion depth.
Is this problem solved using recursion only to teach recursion or am I missing something?
The usual recursive O(N) Fibonacci implementation is more like this:
def fib(n, a=0, b=1):
    if n == 0: return a
    if n == 1: return b
    return fib(n - 1, b, a + b)
The advantage with this approach (aside from the fact that it uses O(1) memory) is that it is tail-recursive: some compilers and/or runtimes can take advantage of that to secretly convert it to a simple JUMP instruction. This is called tail-call optimization.
Python, sadly, doesn't use this strategy, so it will use extra memory for the call stack, which as you noted quickly runs into Python's recursion depth limit.
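To make the point about the recursion limit concrete, here is a small sketch. The names fib_tail, fib_iter and exceeds_recursion_limit are hypothetical. It shows that the tail-recursive form fails once n exceeds the interpreter's recursion limit, while the equivalent loop (the rewrite a tail-call-optimizing runtime would effectively perform) does not.

```python
import sys

def fib_tail(n, a=0, b=1):
    """Tail-recursive Fibonacci; still uses one stack frame per step in Python."""
    if n == 0:
        return a
    if n == 1:
        return b
    return fib_tail(n - 1, b, a + b)

def fib_iter(n):
    """The same computation with the tail call turned into a loop by hand."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def exceeds_recursion_limit(n):
    """Return True if fib_tail(n) raises RecursionError."""
    try:
        fib_tail(n)
    except RecursionError:
        return True
    return False

big = sys.getrecursionlimit() + 100
print(exceeds_recursion_limit(big))   # the recursive version runs out of stack
print(fib_iter(big) > 0)              # the loop handles the same n
```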
The Fibonacci sequence is mostly a toy problem, used for teaching people how to write algorithms and about big Oh notation. It has elegant functional solutions as well as showing the strengths of dynamic programming (basically your dictionary-based solution), but it's also practically a solved problem.
We can also go a lot faster. The page https://www.nayuki.io/page/fast-fibonacci-algorithms describes how. It includes a fast doubling algorithm written in Python:
#
# Fast doubling Fibonacci algorithm (Python)
# by Project Nayuki, 2015. Public domain.
# https://www.nayuki.io/page/fast-fibonacci-algorithms
#
# (Public) Returns F(n).
def fibonacci(n):
    if n < 0:
        raise ValueError("Negative arguments not implemented")
    return _fib(n)[0]
# (Private) Returns the tuple (F(n), F(n+1)).
def _fib(n):
    if n == 0:
        return (0, 1)
    else:
        a, b = _fib(n // 2)
        c = a * (b * 2 - a)
        d = a * a + b * b
        if n % 2 == 0:
            return (c, d)
        else:
            return (d, c + d)

Theoretical vs actual time-complexity for algorithm calculating 2^n

I am trying to compute the time-complexity and compare it with the actual computation times.
If I am not mistaken, the time-complexity is O(log(n)), but looking at the actual computation times it looks more like O(n) or even O(nlog(n)).
What could be reason for this difference?
def pow(n):
    """Return 2**n, where n is a nonnegative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x
Theoretical time-complexity:
Actual run times:
I was suspecting your time calculation is not accurate, so I did it using timeit, here're my stats:
import timeit
# N
sx = [10, 100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6]
# average runtime in seconds
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]
Update:
Well, the code did run with O(n*log(n))...! A possible explanation is that multiplication / division is not O(1) for large numbers so this part doesn't hold:
T(n) = 1 + T(n//2)
     = 1 + 1 + T(n//4)
#      ^   ^
#      mul>1
#      div>1
# when n is large
Experiment with multiplication and division:
mul = lambda x: x*x
div = lambda y: y//2
s1 = [timeit.timeit('mul(%d)' % i, number=1000, globals=globals()) for i in sx]
s2 = [timeit.timeit('div(%d)' % i, number=1000, globals=globals()) for i in sx]
The plots look the same for mul and div: they do not appear to be O(1) (?). Small integers seem to be more efficient, but there is no big difference for large integers, so I don't know what the cause could be. (I'll keep the answer here in case it helps.)
The number of iterations will be log(n,2) but each iteration needs to perform a multiplication between two numbers that are twice as large as the preceding iteration's.
The best multiplication algorithms for variable precision numbers perform in O(N * log(N) * log(log(N))) or O(N^log(3)) where N is the number of digits (bits or words) needed to represent the number. It would seem that the two complexities combine to produce execution times that are larger than O(log(n)) in practice.
The digit count of the two numbers at each iteration is 2^i. So the total time will be the sum of multiplication (x*x) complexities for numbers going through the log(n) iterations
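The doubling of the operand size can be observed directly. The sketch below uses a hypothetical helper, operand_bits, that reimplements the question's pow with one extra line to record the bit length of x at each level of the recursion.

```python
def operand_bits(n):
    """Bit lengths of x at each level of the recursion, deepest call first."""
    sizes = []

    def pow2(k):
        # same logic as the pow(n) from the question
        if k == 0:
            return 1
        x = pow2(k // 2)
        sizes.append(x.bit_length())  # size of the operand that gets squared here
        return x * x if k % 2 == 0 else 2 * x * x

    pow2(n)
    return sizes

print(operand_bits(100))   # [1, 2, 4, 7, 13, 26, 51]
```

Each entry is roughly twice the previous one, so the last few multiplications dominate the total cost.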
To compute the function's time complexity based on the Schönhage–Strassen multiplications algorithm, we would need to add the time complexity of each iteration using : O(N * log(N) * log(log(N))):
∑ 2^i * log(2^i) * log(log(2^i)) [i = 0...log(n)]
∑ 2^i * i * log(i) [i = 0...log(n)]
which would be quite complex, so let's look at a simpler scenario.
If Python's variable precision multiplications used the most naive O(N^2) algorithm, the worst case time could be expressed as:
∑ (2^i)^2 [i = 0...log(n)]
∑ 4^i [i = 0...log(n)]
(4^(log(n)+1)-1)/3 # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)
( 4*4^log(n) - 1 ) / 3
( 4*(2^log(n))^2 - 1 ) / 3
(4*n^2-1)/3 # 2^log(n) = n
(4/3)*n^2-1/3
This would be O(n^2), which suggests that the log(n) iteration time cancels itself out in favour of the multiplication's complexity profile.
The same reasoning applies to the Karatsuba multiplication algorithm, O(N^log(3)), where the multiplication cost again dominates:
∑ (2^i)^log(3) [i=0..log(n)]
∑ (2^log(3))^i [i=0..log(n)]
∑ 3^i [i=0..log(n)]
( 3^(log(n)+1) - 1 ) / 2 # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)
( 3*3^log(n) - 1 ) / 2
( 3*(2^log(3))^log(n) - 1 ) / 2
( 3*(2^log(n))^log(3) - 1 ) / 2
(3/2)*n^log(3) - 1/2
which corresponds to O(n^log(3)) and corroborates the theory.
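The two closed forms derived above can be checked numerically for powers of two. This is only a sanity check of the algebra, with log taken as base 2; geometric_cost is a hypothetical helper name.

```python
def geometric_cost(n, base):
    """Sum of base**i for i = 0..log2(n), for n a power of two."""
    k = n.bit_length() - 1          # k = log2(n) when n is a power of two
    return sum(base ** i for i in range(k + 1))

for n in (2, 16, 1024):
    k = n.bit_length() - 1
    # naive O(N^2) multiplication: sum of 4^i equals (4*n^2 - 1) / 3
    assert geometric_cost(n, 4) == (4 * n * n - 1) // 3
    # Karatsuba: sum of 3^i equals (3^(log(n)+1) - 1) / 2
    assert geometric_cost(n, 3) == (3 ** (k + 1) - 1) // 2
    print(n, geometric_cost(n, 4), geometric_cost(n, 3))
```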
Note that the last column of your measurement table is misleading because you're making n progress exponentially. This changes the meaning of t[i]/t[i-1] and its interpretation for the evaluation of time complexity. It would be more meaningful if the progression between N[i] and N[i-1] was linear.
Taking into account the N[i]/N[i-1] ratio in the calculation, I found that the results seem to correlate more with O(n^log(3)) which would suggest that Python uses Karatsuba for large integer multiplications. (for version 3.7.1 on MacOS) However, this correlation is very weak.
FINAL ANSWER: O(log(N))
After doing more tests, I realized that there are wild variations in the time taken to multiply large numbers. Sometimes larger numbers take considerably less time than smaller ones. This makes the timing figures suspect and correlation to a time complexity based on a small and irregular sample is not going to be conclusive.
With a larger and more evenly distributed sample, the time strongly correlates (0.99) with log(N). This would mean the differences introduced by multiplication overhead only impact fixed points in the value range. Intentionally selecting values of N that are orders of magnitude apart exacerbated the impact of these fixed points thus skewing the results.
So you can ignore all the nice theories I wrote above, because the data shows that the time complexity is indeed Log(n). You just have to use a more meaningful sample (and better rate of change calculations).
It's because multiplying two small numbers is O(1), but multiplying two long numbers of magnitude N takes O(log(N)**2) with the schoolbook algorithm. https://en.wikipedia.org/wiki/Multiplication_algorithm
So the time per step is not constant, and the total time does not grow as O(log(N)).
This can be complex, but there are different cases that you will have to examine for different values of n since this is recursive. This should explain it https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms).
You have to consider the true input size of the function. It's not the magnitude of n, but the number of bits needed to represent n, which is logarithmic in the magnitude. That is, dividing a number by 2 doesn't cut the input size in half: it only reduces it by 1 bit. This means that for an n-bit number (whose value is between 2^(n-1) and 2^n), the running time is indeed logarithmic in magnitude, but linear in the number of bits.
n        lg n                 bits to represent n
--------------------------------------------------
10       between 3 and 4      4  (1010)
100      between 6 and 7      7  (1100100)
1000     just under 10        10 (1111101000)
10000    between 13 and 14    14 (10011100010000)
Each time you multiply n by 10, you are only increasing the input size by 3-4 bits (lg 10 is about 3.32), not by a factor of 10.
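The relationship between magnitude and input size can be reproduced with Python's built-in int.bit_length. The helper size_table is a hypothetical name used only for this illustration.

```python
import math

def size_table(values):
    """Return (n, lg n rounded to 2 places, bit length, binary digits) for each value."""
    return [(n, round(math.log2(n), 2), n.bit_length(), bin(n)[2:]) for n in values]

for row in size_table([10, 100, 1000, 10000]):
    print(row)
```

The bit length is floor(lg n) + 1, so it grows by 3 or 4 each time n is multiplied by 10.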
For some integer values Python will internally use a "long representation". In your case this happens somewhere after n=63, so your theoretical time complexity should be correct only for values of n < 63.
For the "long representation", multiplying two numbers (x * y) has a complexity greater than O(1):
for x == y (e.g. x*x) the complexity is around O(Py_SIZE(x)² / 2).
for x != y (e.g. 2*x) the multiplication is performed like "schoolbook long multiplication", so the complexity will be O(Py_SIZE(x)*Py_SIZE(y)). In your case this might affect performance a little too, because 2*x*x is evaluated as (2*x)*x, while the faster way would be 2*(x*x).
So for n>=63 the theoretical complexity must also account for the complexity of the multiplications.
It's possible to measure "pure" complexity of custom pow (ignoring complexity of multiplication) if you can reduce complexity of multiplication to O(1). For example:
SQUARE_CACHE = {}
HALFS_CACHE = {}
def square_and_double(x, do_double=False):
    key = hash((x, do_double))
    if key not in SQUARE_CACHE:
        if do_double:
            SQUARE_CACHE[key] = 2 * square_and_double(x, False)
        else:
            SQUARE_CACHE[key] = x*x
    return SQUARE_CACHE[key]
def half_and_remainder(x):
    key = hash(x)
    if key not in HALFS_CACHE:
        HALFS_CACHE[key] = divmod(x, 2)
    return HALFS_CACHE[key]
def pow(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    return square_and_double(x, do_double=bool(n % 2 != 0))
def pow_alt(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    half_n, remainder = half_and_remainder(n)
    x = pow_alt(half_n)
    return square_and_double(x, do_double=bool(remainder != 0))
import timeit
import math
# Values of n:
sx = sorted([int(x) for x in [100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6, 10e7, 10e8, 10e9]])
# Fill caches of `square_and_double` and `half_and_remainder` to ensure that complexity of both `x*x` and of `divmod(x, 2)` are O(1):
[pow_alt(n) for n in sx]
# Average runtime in ms:
sy = [timeit.timeit('pow_alt(%d)' % n, number=500, globals=globals())*1000 for n in sx]
# Theoretical values:
base = 2
sy_theory = [sy[0]]
t0 = sy[0] / (math.log(sx[0], base))
sy_theory.extend([
t0*math.log(x, base)
for x in sx[1:]
])
print("real timings:")
print(sy)
print("\ntheory timings:")
print(sy_theory)
print('\n\nt/t_prev:')
print("real:")
print(['--' if i == 0 else "%.2f" % (sy[i]/sy[i-1]) for i in range(len(sy))])
print("\ntheory:")
print(['--' if i == 0 else "%.2f" % (sy_theory[i]/sy_theory[i-1]) for i in range(len(sy_theory))])
# OUTPUT:
real timings:
[1.7171500003314577, 2.515988002414815, 4.5264500004122965, 4.929114998958539, 5.251838003459852, 5.606903003354091, 6.680275000690017, 6.948587004444562, 7.609975000377744, 8.97067000187235, 16.48820400441764]
theory timings:
[1.7171500003314577, 2.5757250004971866, 4.292875000828644, 4.892993172417281, 5.151450000994373, 5.409906829571465, 5.751568172583011, 6.010025001160103, 6.868600001325832, 7.727175001491561, 8.585750001657289]
t/t_prev:
real:
['--', '1.47', '1.80', '1.09', '1.07', '1.07', '1.19', '1.04', '1.10', '1.18', '1.84']
theory:
['--', '1.50', '1.67', '1.14', '1.05', '1.05', '1.06', '1.04', '1.14', '1.12', '1.11']
Results are still not perfect but close to theoretical O(log(n))
You can generate textbook-like results if you count what they count, the steps taken:
import math
def pow(n):
    """Return 2**n, where n is a nonnegative integer."""
    global calls
    calls += 1
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x
def steppow(n):
    global calls
    calls = 0
    pow(n)
    return calls
sx = [math.pow(10,n) for n in range(1,11)]
sy = [steppow(n)/math.log(n) for n in sx]
print(sy)
Then it produces something like this:
[2.1714724095162588, 1.737177927613007, 1.5924131003119235, 1.6286043071371943, 1.5634601348517065, 1.5200306866613815, 1.5510517210830421, 1.5200306866613813, 1.4959032154445342, 1.5200306866613813]
Where the 1.52... appears to be some kind of favourite.
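The recurring 1.52 has a simple explanation, which the sketch below illustrates. It counts the calls by following only the n -> n//2 recursion, without performing the large multiplications; count_calls is a hypothetical helper, not part of the code above. For integer n the call count equals n.bit_length() + 1, and math.log is the natural logarithm, so the ratio tends to 1/ln(2) (about 1.4427) as n grows.

```python
import math

def count_calls(n):
    """Number of calls pow(n) makes, following only the n -> n//2 recursion."""
    calls = 1              # the initial call
    while n != 0:
        n //= 2
        calls += 1         # one further call per halving, down to pow(0)
    return calls

for e in range(1, 11):
    n = 10 ** e
    # the call count and bit_length() + 1 agree in every row
    print(e, count_calls(n), n.bit_length() + 1, count_calls(n) / math.log(n))

print(1 / math.log(2))     # asymptotic value of the ratio
```

For n = 10^6, 10^8 and 10^10 the call counts are 21, 28 and 35, i.e. exactly 3.5 calls per decimal digit, which is why 3.5/ln(10) = 1.52003... shows up repeatedly.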
But the actual runtime also includes the seemingly innocent mathematical operations, which grow in complexity too as the number physically grows in memory. CPython uses a number of multiplication implementations, branching at various points:
long_mul is the entry:
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
    stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
    return PyLong_FromLongLong((long long)v);
}
z = k_mul(a, b);
if the numbers fit into a CPU word, they get multiplied in place (but the result may be larger, hence the LongLong (*)), otherwise they will use k_mul() which stands for Karatsuba multiplication, which also checks a couple things based on size and value:
i = a == b ? KARATSUBA_SQUARE_CUTOFF : KARATSUBA_CUTOFF;
if (asize <= i) {
    if (asize == 0)
        return (PyLongObject *)PyLong_FromLong(0);
    else
        return x_mul(a, b);
}
for shorter numbers, a classic algorithm is used, x_mul(), and the shortness check also depends on whether the product is a square, because x_mul() has an optimized code path for calculating x*x-like expressions. However, above a certain in-memory size, the computation stays in k_mul(), which then checks how much the magnitudes of the two values differ:
if (2 * asize <= bsize)
    return k_lopsided_mul(a, b);
possibly branching to yet another algorithm, k_lopsided_mul(), which is still Karatsuba, but optimized for multiplying numbers with significant difference in magnitude.
In short, even the 2*x*x has significance, if you replace it with x*x*2, timeit results will differ:
2*x*x: [0.00020009249478223623, 0.0002965123323532072, 0.00034258906889154733, 0.0024181753953639975, 0.03395215528201522, 0.4794894526936972, 4.802882867816082]
x*x*2: [0.00014974939375012042, 0.00020265231347948998, 0.00034002925019471775, 0.0024501731290706985, 0.03400164511014836, 0.462764023966729, 4.841786565730171]
(measured as
sx = [math.pow(10,n) for n in range(1,8)]
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]
)
(*) by the way, as the size of the result is often overestimated (like at the very beginning, long*long may or may not fit into a long afterwards), there is a long_normalize function too. It does not spend time on freeing extra memory (see the comment above it), but it still sets the correct size for the internal object, which involves a loop counting the zeroes in front of the actual number.

Time Complexity of recursive function

Need help proving the time complexity of a recursive function.
Supposedly it's 2^n. I need to prove that this is the case.
def F(n):
    if n == 0:
        return 0
    else:
        result = 0
        for i in range(n):
            result += F(i)
        return n*result+n
Here's another version that does the same thing. The assignment said to use an array to store values in an attempt to reduce the time complexity so what I did was this
def F2(n,array):
    if n < len(array):
        answer = array[n]
    elif n == 0:
        answer = 0
        array.append(answer)
    else:
        result = 0
        for i in range(n):
            result += F2(i,array)
        answer = n*result+n
        array.append(answer)
    return answer
Again what I am looking for is the explanation of how to find the complexities of the two snippets of code, not interested in just knowing the answer.
All and any help greatly appreciated.
PS: for some reason, I can't get "def F2" to stay in the code block...sorry about that
Okay, the first function you wrote is an example of exhaustive search, where you explore every possible branch that can be formed from the set of whole numbers below n (n is the argument you passed in, and the for loop walks over that set). To explain the time complexity I am going to view the recursion stack as a tree (to represent a recursive function call stack you can use either a stack or an n-ary tree).
Let's call your first function F1:
F1(3): three branches are formed, one for each number in the set S (the whole numbers below n). I have taken n = 3 because it is easy to draw the diagram for it. You can try other, larger numbers and observe the recursion call stack.
        3
      / | \
     0  1  2      ----> the leftmost node returns 0 because (n == 0) is the base case
        |  / \
        0 0   1
              |
              0   ----> returns 0
So here you have explored every possible branch. If you try to write the recursive equation for the above problem then:
T(n) = 1; n is 0
     = 1 + T(n-1) + T(n-2) + T(n-3) + ... + T(0); otherwise
Here,
T(n-1) = 1 + T(n-2) + T(n-3) + ... + T(0).
So, 1 + T(n-1) + T(n-2) + T(n-3) + ... + T(0) = T(n-1) + T(n-1)
So, the recursive equation becomes:
T(n) = 1; n is 0
     = 2*T(n-1); otherwise
Now you can easily solve this recurrence relation by repeated substitution (the master theorem does not apply here, because the subproblem size shrinks by subtraction rather than by division). You will get the time complexity as O(2^n).
Solving the recurrence relation:
T(n) = 2T(n-1)
     = 2(2T(n-2)) = 4T(n-2)
     = 4(2T(n-3))
     = 8T(n-3)
     = 2^k T(n-k), for some integer `k` ----> equation 1
Now we are given the base case where n is 0, so let,
n-k = 0 , i.e. k = n;
Put k = n in equation 1,
T(n) = 2^n * T(n-n)
= 2^n * T(0)
= 2^n * 1; // as T(0) is 1
= 2^n
So, T.C = O(2^n)
So this is how you can get the time complexity for your first function. Next, if you observe the recursion Tree formed above (each node in the tree is a subproblem of the main problem), you will see that the nodes are repeating (i.e. the subproblems are repeating). So you have used a memory in your second function F2 to store the already computed value and whenever the sub-problems are occurring again (i.e. repeating subproblems) you are using the pre-computed value (this saves time for computing the sub-problems again and again). The approach is also known as Dynamic Programming.
Now let's look at the second function. It returns answer, but along the way it builds a list named array, and that is where the main cost lies. Calculating its time complexity is simple because there is always just one level of recursion involved (or casually you can say no recursion involved): every number i in range(n) is less than n, so when F2(i, array) is called for a value that is already stored, the first if condition is taken and control returns from F2 immediately. So each call can't go deeper than 2 levels in the call stack.
So,
Time complexity of second function = time taken to build the array
= 1 comparison + 1 comparison + 2 comparisons + ... + (n-1) comparisons
= 1 + 2 + 3 + ... + (n-1)
= O(n^2).
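Both results can be checked by counting calls instead of timing them. In the sketch below, count_calls_F and count_calls_F2 are hypothetical wrappers around the two functions from the question, with the same logic plus a counter.

```python
def count_calls_F(n):
    """Total number of calls made by the plain recursive F(n)."""
    calls = 0

    def F(k):
        nonlocal calls
        calls += 1
        if k == 0:
            return 0
        return k * sum(F(i) for i in range(k)) + k

    F(n)
    return calls

def count_calls_F2(n):
    """Total number of calls made by the array-backed F2(n), starting from an empty array."""
    calls = 0
    array = []

    def F2(k):
        nonlocal calls
        calls += 1
        if k < len(array):          # value already stored
            return array[k]
        answer = 0 if k == 0 else k * sum(F2(i) for i in range(k)) + k
        array.append(answer)
        return answer

    F2(n)
    return calls

for n in (4, 10, 15):
    print(n, count_calls_F(n), 2 ** n, count_calls_F2(n), 1 + n * (n + 1) // 2)
```

The first count doubles with every increment of n (exactly 2^n calls), while the second grows as 1 + n(n+1)/2, i.e. quadratically.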
Let me give you a simple way to observe such recursions more deeply. You can print the recursion stack on the console and observe how the function calls are being made. Below I have written your code where I am printing the function calls.
Code:
def indent(n):
    # print two spaces per recursion level, without a trailing newline
    print('  ' * n, end='')
# second argument rec_cnt is just taken to print the indented function properly
def F(n, rec_cnt):
    indent(rec_cnt)
    print('F(' + str(n) + ')')
    if n == 0:
        return 0
    else:
        result = 0
        for i in range(n):
            result += F(i, rec_cnt+1)
        return n*result+n
# third argument is just taken to print the indented function properly
def F2(n, array, rec_cnt):
    indent(rec_cnt)
    print('F2(' + str(n) + ')')
    if n < len(array):
        answer = array[n]
    elif n == 0:
        answer = 0
        array.append(answer)
    else:
        result = 0
        for i in range(n):
            result += F2(i, array, rec_cnt+1)
        answer = n*result+n
        array.append(answer)
    return answer
print(F(4, 1))
lis = []
print(F2(4, lis, 1))
Now observe the output:
  F(4)
    F(0)
    F(1)
      F(0)
    F(2)
      F(0)
      F(1)
        F(0)
    F(3)
      F(0)
      F(1)
        F(0)
      F(2)
        F(0)
        F(1)
          F(0)
96
  F2(4)
    F2(0)
    F2(1)
      F2(0)
    F2(2)
      F2(0)
      F2(1)
    F2(3)
      F2(0)
      F2(1)
      F2(2)
96
In the first function's call stack, i.e. F1, you see that each call is explored down to 0, i.e. we are exploring each possible branch down to 0 (the base case), so we call it exhaustive search.
In the second function's call stack, you can see that the function calls only go two levels deep, i.e. they use the pre-computed values to solve the repeated subproblems. Thus, its time complexity is lower than that of F1.

Subset sum for large sums

The subset sum problem is well-known for being NP-complete, but there are various tricks to solve versions of the problem somewhat quickly.
The usual dynamic programming algorithm requires space that grows with the target sum. My question is: can we reduce this space requirement?
I am trying to solve a subset sum problem with a modest number of elements but a very large target sum. The number of elements is too large for the exponential time algorithm (and shortcut method) and the target sum is too large for the usual dynamic programming method.
Consider this toy problem that illustrates the issue. Given the set A = [2, 3, 6, 8] find the number of subsets that sum to target = 11 . Enumerating all subsets we see the answer is 2: (3, 8) and (2, 3, 6).
The dynamic programming solution gives the same result, of course - ways[11] returns 2:
def subset_sum(A, target):
    ways = [0] * (target + 1)
    ways[0] = 1
    ways_next = ways[:]
    for x in A:
        for j in range(x, target + 1):
            ways_next[j] += ways[j - x]
        ways = ways_next[:]
    return ways[target]
Now consider targeting the sum target = 1100 with the set A = [200, 300, 600, 800]. Clearly there are still 2 solutions: (300, 800) and (200, 300, 600). However, the ways array has grown by a factor of 100.
Is it possible to skip over certain weights when filling out the dynamic programming storage array? For my example problem I could compute the greatest common divisor of the input set and then reduce all items by that constant, but this won't work for my real application.
This SO question is related, but those answers don't use the approach I have in mind. The second comment by Akshay on this page says:
...in the cases where n is very small (eg. 6) and sum is very large
(eg. 1 million) then the space complexity will be too large. To avoid
large space complexity n HASHTABLES can be used.
This seems closer to what I'm looking for, but I can't seem to actually implement the idea. Is this really possible?
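One way to read the hashtable suggestion, sketched here as an assumption rather than as what the commenter had in mind, is to key the DP on the sums that are actually reachable instead of on every integer up to the target. The function name subset_sum_sparse is hypothetical, and the sketch assumes all elements are positive integers.

```python
def subset_sum_sparse(A, target):
    """Count subsets of A summing to target, storing only reachable sums."""
    ways = {0: 1}                       # sum -> number of subsets producing it
    for x in A:
        ways_next = dict(ways)          # subsets that do not use x
        for s, count in ways.items():
            t = s + x
            if t <= target:             # sums above the target can never come back down
                ways_next[t] = ways_next.get(t, 0) + count
        ways = ways_next
    return ways.get(target, 0)

print(subset_sum_sparse([2, 3, 6, 8], 11))            # 2
print(subset_sum_sparse([200, 300, 600, 800], 1100))  # 2
```

The storage no longer depends on the magnitude of the target, but the dictionary can hold up to 2^n distinct sums in the worst case, so this only helps when many subset sums coincide or exceed the target.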
Edited to add: A smaller example of a problem to solve. There is 1 solution.
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
A real problem is:
target = 262988806539946324131984661067039976436265064677212251086885351040000
A = [116883914017753921836437627140906656193895584300983222705282378240000,
65747201634986581032996165266759994109066266169303062771721337760000,
42078209046391411861117545770726396229802410348353960173901656166400,
29220978504438480459109406785226664048473896075245805676320594560000,
21468474003260924418937523352411426647858372626711204170357987840000,
16436800408746645258249041316689998527266566542325765692930334440000,
12987101557528213537381958571211850688210620477887024745031375360000,
10519552261597852965279386442681599057450602587088490043475414041600,
8693844844295746252297013588993057072273225278585528961549928960000,
7305244626109620114777351696306666012118474018811451419080148640000,
6224587137040149683597270084426981690799173128454727836375984640000,
5367118500815231104734380838102856661964593156677801042589496960000,
4675356560710156873457505085636266247755823372039328908211295129600,
4109200102186661314562260329172499631816641635581441423232583610000,
3639983481521748430892521260443459881470796742937193786669693440000,
3246775389382053384345489642802962672052655119471756186257843840000,
2914003396564502206448583502127866774917064428556368433095682560000,
2629888065399463241319846610670399764362650646772122510868853510400,
2385386000362324935437502594712380738650930291856800463373109760000,
2173461211073936563074253397248264268068306319646382240387482240000,
1988573206351200938616141104476672789688204647842814753019927040000,
1826311156527405028694337924076666503029618504702862854770037160000,
1683128361855656474444701830829055849192096413934158406956066246656,
1556146784260037420899317521106745422699793282113681959093996160000,
1443011284169801504153550952356872298690068941987447193892375040000,
1341779625203807776183595209525714165491148289169450260647374240000,
1250838556670374906691960338012080744048823137584838292922165760000,
1168839140177539218364376271409066561938955843009832227052823782400,
1094646437211014876720019400903392201607763016346356924399106560000,
1027300025546665328640565082293124907954160408895360355808145902500,
965982760477305139144112620999228563585913919842836551283325440000,
909995870380437107723130315110864970367699185734298446667423360000,
858738960130436976757500934096457065914334905068448166814319513600,
811693847345513346086372410700740668013163779867939046564460960000,
768411414287644482489363509326632509674989232073666182868912640000,
728500849141125551612145875531966693729266107139092108273920640000,
691620793004461075955252231602997965644352569828303092930664960000,
657472016349865810329961652667599941090662661693030627717213377600,
625791330255672395317036671188673352614551016483550865168079360000,
596346500090581233859375648678095184662732572964200115843277440000,
568931977371436071675467087219123799753953628290345594563299840000,
543365302768484140768563349312066067017076579911595560096870560000,
519484062301128541495278342848474027528424819115480989801255014400,
497143301587800234654035276119168197422051161960703688254981760000,
476213321032044045508347054897310957784092466595223632570186240000,
456577789131851257173584481019166625757404626175715713692509290000,
438132122515529069774235170457376054037925971973698044293020160000,
420782090463914118611175457707263962298024103483539601739016561664,
404442609057972047876946806715939986830088526993021531852188160000,
389036696065009355224829380276686355674948320528420489773499040000,
374494562534633427030238036407319297168052779889230688624970240000,
360752821042450376038387738089218074672517235496861798473093760000,
347753793771829850091880543559722282890929011143421158461997158400,
335444906300951944045898802381428541372787072292362565161843560000,
323778155173833578494287055791985197213007158728485381455075840000,
312709639167593726672990084503020186012205784396209573230541440000,
302199145693704480473409550206308504954053507241841138853071360000,
292209785044384804591094067852266640484738960752458056763205945600,
282707666261699891568916593460940582033071824431295083135592960000,
273661609302753719180004850225848050401940754086589231099776640000,
265042888929147215048611399412486748738992254650755607041456640000,
256825006386666332160141270573281226988540102223840088952036475625,
248983485481605987343890803377079267631966925138189113455039385600,
241495690119326284786028155249807140896478479960709137820831360000,
234340660761814501342824380545368657996226388663143017230461440000,
227498967595109276930782578777716242591924796433574611666855840000,
220952578483466770957349011608519198854244960871423861446658560000,
214684740032609244189375233524114266478583726267112041703579878400,
208679870295533683104133831435857945991878646837700655494453760000,
202923461836378336521593102675185167003290944966984761641115240000,
197401994025105141026072179446079922264038329650750423033879040000,
192102853571911120622340877331658127418747308018416545717228160000,
187014262428406274938300203425450649910232934881573156328451805184,
182125212285281387903036468882991673432316526784773027068480160000,
177425404985627474536673746714144021883127046501745489011223040000,
172905198251115268988813057900749491411088142457075773232666240000,
168555556186474170249629649778586749838977769381324948621621760000,
164368004087466452582490413166899985272665665423257656929303344400]
In the particular comment you linked to, the suggestion is to use a hash table that stores only the values which actually arise as the sum of some subset. In the worst case, the number of such sums is exponential in the number of elements, so it is basically equivalent to the brute force approach you already mentioned and ruled out.
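As a rough illustration of that hash table idea (a minimal sketch, not the code from the linked comment; the name `reachable_sums` is made up here), one can keep a dictionary keyed by every sum reached so far:

```python
def reachable_sums(values, target):
    """Map each achievable subset sum <= target to one subset producing it."""
    sums = {0: ()}  # the empty subset sums to 0
    for v in values:
        # Iterate over a snapshot so that each value is used at most once.
        for s, subset in list(sums.items()):
            new = s + v
            if new <= target and new not in sums:
                sums[new] = subset + (v,)
    return sums


table = reachable_sums([3, 5, 6, 10], 14)
print(table.get(14))  # one subset summing to 14, or None
print(len(table))     # number of distinct sums that had to be stored
```

With inputs such as distinct powers of two, every subset yields a different sum, so the dictionary ends up with one entry per subset. That is the exponential worst case.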
In general, there are two parameters to the problem: the number of elements in the set and the size (bit length) of the target sum. Naive brute force is exponential in the first, while the standard dynamic programming solution is exponential in the second (it is polynomial in the target's value, hence exponential in its number of bits). Either approach works well when the corresponding parameter is small, but you already indicated that both parameters are too big for an exponential solution. Therefore, you are stuck with the "hard" general case of the problem.
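To make the dependence on the target concrete, here is a minimal sketch of the standard dynamic programming approach, assuming non-negative integer values. It uses a Python integer as a bit set (an implementation choice made for this example), with one bit per candidate sum from 0 to the target:

```python
def subset_sum_dp(values, target):
    """Return True if some subset of `values` sums to `target`."""
    reachable = 1                   # bit i set <=> sum i is achievable; start with 0
    mask = (1 << (target + 1)) - 1  # keep only bits 0..target
    for v in values:
        # Either skip v (old bits) or add it to every known sum (shifted bits).
        reachable |= (reachable << v) & mask
    return bool((reachable >> target) & 1)


print(subset_sum_dp([3, 5, 6, 10], 14))

# The table needs one bit per value up to the target, so its size doubles
# with every extra bit in the target.
big_target = 5213096522073683233230240000
print(big_target.bit_length())
```

For a small target this is fast regardless of how many elements there are. For a target like the one in the question, the table itself is far too large to build.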
Most NP-complete problems have some underlying graph, whether implicit or explicit. Using graph partitioning and DP, such a problem can be solved in time that is exponential in the treewidth of the graph but only polynomial in the size of the graph when the treewidth is held constant. Of course, without access to your data, it is impossible to say what the underlying graph might look like, or whether it belongs to one of the classes of graphs that have bounded treewidth and can hence be solved efficiently.
Edit: I just wrote the following code to show what I meant by reducing the problem modulo small numbers. It solves your first problem in less than a second, but it doesn't work on the larger problem (though it does reduce it to n=57, log(t)=68).
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
import itertools, time
from functools import reduce
from math import gcd

def gcd_r(seq):
    """Greatest common divisor of all numbers in seq."""
    return reduce(gcd, seq)

def miniSolve(t, vals):
    """Brute-force search for a subset of vals that sums to t."""
    vals = [x for x in vals if x and x <= t]
    # + 1 so that the subset containing every value is tried as well
    for k in range(len(vals) + 1):
        for sub in itertools.combinations(vals, k):
            if sum(sub) == t:
                return sub
    return None

def tryMod(n, state, answer):
    """Try to shrink the problem by looking at it modulo n."""
    t, vals, mult = state
    mods = [x%n for x in vals if x%n]
    # only usable if the nonzero residues cannot wrap around modulo n
    if (t%n or mods) and sum(mods) < n:
        print('Filtering with', n)
        print(t.bit_length(), len(vals))
    else:
        return state
    newvals = list(vals)
    tmod = t%n
    if not tmod:
        # no value with a nonzero residue can be part of the solution
        for x in vals:
            if x%n:
                newvals.remove(x)
    else:
        if len(set(mods)) != len(mods):
            # don't want to deal with the complexity of multisets for now
            print('skipping', n)
        else:
            mini = miniSolve(tmod, mods)
            if mini is None:
                return None
            mini = set(mini)
            for x in vals:
                mod = x%n
                if mod:
                    if mod in mini:
                        t -= x
                        answer.add(x*mult)
                    newvals.remove(x)
    # divide out the common factor to keep the numbers small
    g = gcd_r(newvals + [t])
    t = t//g
    newvals = [x//g for x in newvals]
    mult *= g
    return (t, newvals, mult)

def solve(t, vals):
    answer = set()
    mult = 1
    for d in itertools.count(2):
        if not t:
            return answer
        elif not vals or t < min(vals):
            return None  # no solution
        res = tryMod(d, (t, vals, mult), answer)
        if res is None:
            return None
        t, vals, mult = res
        if len(vals) < 23:
            break
        if (d % 10000) == 0:
            print('d', d)
    # don't want to deal with the complexity of multisets for now
    assert(len(set(vals)) == len(vals))
    # the remaining problem is small enough for brute force
    rest = miniSolve(t, vals)
    if rest is None:
        return None
    answer.update(x*mult for x in rest)
    return answer

start_t = time.time()
answer = solve(target, A)
assert(answer <= set(A) and sum(answer) == target)
print(answer)
print('elapsed: %.3f s' % (time.time() - start_t))
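The filtering step inside tryMod may be easier to follow on a toy instance. The sketch below isolates just the case where the target is divisible by the modulus. The helper name `filter_by_modulus` and the numbers are made up for illustration. If the nonzero residues add up to less than n, no combination of them can sum to a multiple of n, so every value with a nonzero residue can be discarded:

```python
def filter_by_modulus(target, values, n):
    """Drop values that cannot belong to a solution, judged modulo n.

    The argument only applies when target % n == 0 and the nonzero
    residues sum to less than n; otherwise the values are returned as-is.
    """
    residues = [v % n for v in values if v % n]
    if target % n != 0 or sum(residues) >= n:
        return list(values)
    return [v for v in values if v % n == 0]


# Residues mod 7 are 0, 0, 0, 3, 2. Since 3 + 2 < 7, neither 3 nor 2
# can appear in a subset that sums to 70.
print(filter_by_modulus(70, [14, 21, 35, 3, 2], 7))
```

The code above goes one step further: when the target's residue is nonzero, it brute-forces the small residue problem to decide which of those values must be taken.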
