Solution Performance (DP, hashtable) for Partition Equal Subset Sum - python

I know some related questions have already been asked on Stack Overflow. However, this question is specifically about the performance difference among 3 approaches.
The question is: Given a non-empty array containing only positive integers, find if the array can be partitioned into two subsets such that the sum of elements in both subsets is equal. https://leetcode.com/problems/partition-equal-subset-sum/
e.g. [1, 5, 11, 5] = True, [1, 5, 9] = False
By solving this problem, I have tried 3 approaches:
Approach 1: Dynamic Programming. Top-down recursion + memoization (Result: Time Limit Exceeded):
def canPartition(nums):
    total, n = sum(nums), len(nums)
    if total & 1 == 1: return False
    half = total >> 1
    mem = [[0 for _ in range(half)] for _ in range(n)]
    def dp(n, half, mem):
        if half == 0: return True
        if n == -1: return False
        if mem[n - 1][half - 1]: return mem[n - 1][half - 1]
        mem[n - 1][half - 1] = dp(n - 1, half, mem) or dp(n - 1, half - nums[n - 1], mem)
        return mem[n - 1][half - 1]
    return dp(n - 1, half, mem)
Approach 2: Dynamic Programming. Bottom up. (Result: 2208 ms Accepted):
def canPartition(self, nums):
    total, n = sum(nums), len(nums)
    if total & 1 == 1: return False
    half = total >> 1
    matrix = [[0 for _ in range(half + 1)] for _ in range(n)]
    for i in range(n):
        for j in range(1, half + 1):
            if i == 0:
                if j >= nums[i]: matrix[i][j] = nums[i]
                else: matrix[i][j] = 0
            else:
                if j >= nums[i]:
                    matrix[i][j] = max(matrix[i - 1][j], nums[i] + matrix[i - 1][j - nums[i]])
                else: matrix[i][j] = matrix[i - 1][j]
            if matrix[i][j] == half: return True
    return False
Approach 3: HashTable (Dict). Result (172 ms Accepted):
def canPartition(self, nums):
    total = sum(nums)
    if total & 1 == 0:
        half = total >> 1
        cur = {0}
        for number in nums:
            cur |= {number + x for x in cur}  # the set is hash-based, so already-seen sums are discarded
            if half in cur: return True
    return False
There are two things I really don't understand about the time complexity of the above 3 approaches:
I would expect approach 1 and approach 2 to perform similarly. Both use a table (matrix) to record calculated states, so why is the bottom-up approach quicker?
I don't know why approach 3 is so much quicker than the others. Note: at a glance, approach 3 looks like an O(2^n) approach, but it uses a hash table to discard duplicate values, so the time complexity should be O(n * half).

My guess about the difference between approach 1 and the others is that, due to the recursion, approach 1 needs to generate significantly more stack frames, which costs more in system resources than just allocating a matrix and iterating over it with a conditional. But if I were you, I would use some kind of process and memory profiler to determine and confirm what's actually happening. Approach 1 allocates a matrix sized to the full range, but the algorithm actually performs potentially far fewer iterations, since each recursive call jumps straight to the sum minus the current array element rather than combing through all possibilities.
Approach 3 depends solely on the number of input elements and the number of sums that can be generated. In each iteration, it adds the current number in the input to all previously achievable numbers, adding only new ones to that list. Given the list [50000, 50000, 50000], for example, approach 3 would iterate over at most three sums: 50000, 100000, and 150000. But since it depends on the range, approach 2 would iterate at least 75000 * 3 times!
Given the list [50000, 50000, 50000], approaches 1, 2 and 3 generate the following numbers of iterations: 15, 225000, and 6.
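For a rough empirical check of those numbers (my own instrumentation, not part of the original approaches), you can count the inner-loop work of approach 3 directly and compare it with the size of the table approach 2 fills:
def count_iterations(nums):
    # counts how many sums the set-based approach actually touches
    half = sum(nums) >> 1
    set_iterations = 0
    cur = {0}
    for number in nums:
        set_iterations += len(cur)        # approach 3 visits each reachable sum once per element
        cur |= {number + x for x in cur}
    matrix_iterations = len(nums) * half  # approach 2 always fills the full n * half table
    return set_iterations, matrix_iterations

print(count_iterations([50000, 50000, 50000]))  # (6, 225000)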

You are right, approaches 1) and 3) have the same time complexity; approach 2 is the DP version of the 0/1 knapsack, and approach 1 is the branch-and-bound version. You can improve approach 1 by pruning the tree with any of the knapsack heuristics, but the pruning has to be strict, e.g. if the current sum plus the sum of the remaining elements at level K is < half, skip that branch (a sketch of this idea follows). This way approach 1) can have a better computational complexity than 3).
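Here is a minimal sketch of that pruning idea (my own illustration, not the asker's code): precompute suffix sums so a branch is abandoned as soon as the remaining elements cannot cover what is still needed.
def can_partition_pruned(nums):
    total = sum(nums)
    if total & 1:
        return False
    half = total >> 1
    # suffix[i] = sum of nums[i:], used to prune hopeless branches
    suffix = [0] * (len(nums) + 1)
    for i in range(len(nums) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + nums[i]
    def dfs(i, remaining):
        if remaining == 0:
            return True
        if remaining < 0 or i == len(nums):
            return False
        if suffix[i] < remaining:   # prune: even taking everything left cannot reach half
            return False
        return dfs(i + 1, remaining - nums[i]) or dfs(i + 1, remaining)
    return dfs(0, half)

print(can_partition_pruned([1, 5, 11, 5]))  # True
print(can_partition_pruned([1, 5, 9]))      # False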
Why do approaches 1) and 3) have different running times?
[To some extent]
That has more to do with the implementation of dictionaries (and sets) in Python. They are implemented natively in the interpreter, so any operation on them is simply faster than logic that has to be interpreted, repeatedly. Also, function calls have higher overhead in Python because they are objects, so calling one is not a simple push-the-stack-and-jmp/call operation.
[To a large extent]
Another aspect to mull over is the time complexity of the third approach. For approach 3, the only way the time complexity can become exponential is if each iteration inserts about as many new elements as the set already contains.
cur |= { number + x for x in cur}
In the worst case, the above line doubles |cur|.
I think that a sequence like
s = {k, k^2, k^3, ..., k^n}
(where k is a prime > 2)
can give worst-case time on the order of 2^n for approach 3, because no two subset sums ever collide. I'm not sure yet what the average expected time complexity is.
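A quick experiment along those lines (my own, just to illustrate the worst case): with powers of a prime, no subset sums collide, so |cur| doubles on every step.
nums = [3, 9, 27, 81, 243]
cur = {0}
for number in nums:
    cur |= {number + x for x in cur}
    print(len(cur))   # 2, 4, 8, 16, 32 -- the set doubles each time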


Trying to understand the time complexity of this dynamic recursive subset sum

# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
    # construct a unique key from dynamic elements of the input
    key = (n, k)
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
        # assign true if we get subset by including or excluding the current item
        lookup[key] = include or exclude
    # return solution to the current subproblem
    return lookup[key]


if __name__ == '__main__':
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
    # create a dictionary to store solutions to subproblems
    lookup = {}
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')
It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why.
Can someone help me? A wordy explanation or a recurrence relation, anything is fine.
The simplest explanation I can give is to realize that when lookup[(n, k)] has a value, it is True or False and indicates whether some subset of A[:n+1] sums to k.
Imagine a naive algorithm that just fills in all the elements of lookup row by row.
lookup[(0, i)] (for 0 ≤ i ≤ total) is true for just two values, i = A[0] and i = 0; all the other entries are false.
lookup[(1, i)] (for 0 ≤ i ≤ total) is true if lookup[(0, i)] is true, or i ≥ A[1] and lookup[(0, i - A[1])] is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those.
...
lookup[(r, i)] (for 0 ≤ i ≤ total) is true if lookup[(r - 1, i)] is true, or i ≥ A[r] and lookup[(r - 1, i - A[r])] is true.
Filling in the table this way, it is clear that we can completely fill the lookup table for rows 0 ≤ row < len(A) in time len(A) * total, since filling in each element takes constant time. And our final answer is just checking whether (len(A) - 1, sum) is True in the table.
Your program is doing the exact same thing, but calculating the value of entries of lookup as they are needed.
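For concreteness, here is a sketch of that naive row-by-row filling (my own illustration, using the same A and k as the question); it does exactly len(A) * (k + 1) constant-time steps:
def subset_sum_bottom_up(A, k):
    n = len(A)
    # table[r][i] is True if some subset of A[:r+1] sums to i
    table = [[False] * (k + 1) for _ in range(n)]
    for r in range(n):
        table[r][0] = True                            # the empty subset always sums to 0
        for i in range(1, k + 1):
            without = table[r - 1][i] if r > 0 else False
            with_it = (table[r - 1][i - A[r]] if r > 0 else i == A[r]) if i >= A[r] else False
            table[r][i] = without or with_it
    return table[n - 1][k]

print(subset_sum_bottom_up([7, 3, 2, 5, 8], 14))      # True (e.g. 7 + 2 + 5)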
Sorry for submitting two answers. I think I came up with a slightly simpler explanation.
Take your code and imagine moving the three lines inside if key not in lookup: into a separate function, calculateLookup(A, n, k, lookup). I'm going to define "the cost of calling calculateLookup for a specific value of n and k" to be the total time spent in the call to calculateLookup(A, n, k, lookup), excluding any recursive calls to calculateLookup.
The key insight is that as defined above, the cost of calling calculateLookup() for any n and k is O(1). Since we are excluding recursive calls in the cost, and there are no for loops, the cost of calculateLookup is the cost of just executing a few tests.
The entire algorithm does a fixed amount of work, calls calculateLookup, and then a small amount of work. Hence the amount of time spent in our code is the same as asking how many times do we call calculateLookup?
Now we're back to previous answer. Because of the lookup table, every call to calculateLookup is called with a different value for (n, k). We also know that we check the bounds of n and k before each call to calculateLookup so 1 ≤ k ≤ sum and 0 ≤ n ≤ len(A). So calculateLookup is called at most (len(A) * sum) times.
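One way to check that bound empirically (my own instrumentation; it just reuses the question's subsetSum and counts distinct keys):
A = [7, 3, 2, 5, 8]
k = 14
lookup = {}
subsetSum(A, len(A) - 1, k, lookup)
print(len(lookup))   # number of distinct (n, k) subproblems actually solved
print(len(A) * k)    # the len(A) * sum upper bound discussed above (here the target sum is 14)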
In general, for these algorithms that use memoization/cacheing, the easiest thing to do is to separately calculate and then sum:
How long things take assuming all values you need are cached.
How long it takes to fill the cache.
The algorithm you presented is just filling up the lookup cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.
The code would be slightly faster with
lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)
Doesn't change the O() of the code in the worst case, but can avoid some unnecessary calculations.

Big O of backtracking solution counts permutations with range

I have a problem and I've been struggling with my solution's time and space complexity:
Given an array of integers A (possibly containing duplicates) and integers min, low, and high.
Find the total number of combinations of items in A such that:
low <= A[i] <= high
Each combination has at least min numbers.
Numbers in one combination can repeat, because duplicates in A are considered distinct items, but the combinations themselves cannot repeat. E.g.: for [1,1,2] the combinations [1,1], [1,2], [1,1,2] are OK, but counting [1,1] twice, or both [1,2] and [2,1], is not.
Example: A=[4, 6, 3, 13, 5, 10], min = 2, low = 3, high = 5
There are 4 ways to combine valid integers in A: [4,3],[4,5],[4,3,5],[3,5]
Here's my solution and it works:
class Solution:
    def __init__(self):
        pass

    def get_result(self, arr, min_size, low, high):
        return self._count_ways(arr, min_size, low, high, 0, 0)

    def _count_ways(self, arr, min_size, low, high, idx, comb_size):
        if idx == len(arr):
            return 0
        count = 0
        for i in range(idx, len(arr)):
            if arr[i] >= low and arr[i] <= high:
                comb_size += 1
                if comb_size >= min_size:
                    count += 1
                count += self._count_ways(arr, min_size, low, high, i + 1, comb_size)
                comb_size -= 1
        return count
I use backtracking so:
Time: O(n!), because for every single integer I check against each and every remaining one in the worst case, i.e. when all integers can form combinations.
Space: O(n), since I need at most n calls on the call stack and I only use 2 variables to keep track of my combinations.
Is my analysis correct?
Also, a bit out of the scope but: Should I do some kind of memoization to improve it?
If I understand your requirements correctly, your algorithm is far too complicated. You can do it as follows:
Compute array B containing all elements in A between low and high.
Return sum of Choose(B.length, k) for k = min .. B.length, where Choose(n,k) is n(n-1)..(n-k+1)/k!.
Time and space complexities are O(n) if you use memoization to compute the numerators/denominators of the Choose function (e.g. if you have already computed 5*4*3, you only need one multiplication to compute 5*4*3*2 etc.).
In your example, you would get B = [4, 3, 5], so B.length = 3, and the result is
Choose(3, 2) + Choose(3, 3)
= (3 * 2)/(2 * 1) + (3 * 2 * 1)/(3 * 2 * 1)
= 3 + 1
= 4
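In Python this can be written directly (a sketch on my part; it assumes Python 3.8+, where math.comb plays the role of Choose, so no hand-rolled memoization is needed):
from math import comb

def count_combinations(A, min_size, low, high):
    b_len = sum(1 for x in A if low <= x <= high)            # size of B
    return sum(comb(b_len, k) for k in range(min_size, b_len + 1))

print(count_combinations([4, 6, 3, 13, 5, 10], 2, 3, 5))     # 4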
Your analysis of the time complexity isn't quite right.
I understand where you're getting O(n!): the for i in range(idx, len(arr)): loop decreases in length with every recursive call, so it seems like you're doing n*(n-1)*(n-2)*....
However, the recursive calls spawned by a loop of length m do not all contain a loop of size m - 1. Suppose your outermost call has 3 elements. The loop iterates over 3 possible values, each spawning a new call. The first such call has a loop that iterates over 2 values, the next call iterates over only 1 value, and the last immediately hits your base case and stops. So the work does not multiply out as 3*2*1; each recursive call does strictly less work than the one that spawned it.
A call to _count_ways with an array of size n takes twice as long as a call with size n - 1. To see this, consider the first decision in the call of size n, which is whether to choose the first element or not. First we choose that first element, which leads to a recursive call with size n - 1. Second we do not choose that first element, which leaves n - 1 elements to iterate over, so it's as if we had a second recursive call with size n - 1.
Each increase in n increases the time complexity by a factor of 2, so the time complexity of your solution is O(2^n). This makes sense: you're checking every combination, and there are 2^n combinations in a set of size n.
However, as you're only trying to count the combinations and not do something with each of them, this is highly inefficient. See Mo B.'s answer for a better solution.

Maximum non-contiguous sum of values in list less than or equal to k

I have a list of values [6,1,1,5,2] and a value k = 10. I want to find the maximum sum of values from the list that is less than or equal to k, return the value and the numbers used. In this case the output would be: 10, [6,1,1,2].
I was using this code from GeeksForGeeks as an example but it doesn't work correctly (in this case, the code's result is 9).
The values do not need to be contiguous - they can be in any order.
def maxsum(arr, n, sum):
    curr_sum = arr[0]
    max_sum = 0
    start = 0
    for i in range(1, n):
        if (curr_sum <= sum):
            max_sum = max(max_sum, curr_sum)
        while (curr_sum + arr[i] > sum and start < i):
            curr_sum -= arr[start]
            start += 1
        curr_sum += arr[i]
    if (curr_sum <= sum):
        max_sum = max(max_sum, curr_sum)
    return max_sum


if __name__ == '__main__':
    arr = [6, 1, 1, 5, 2]
    n = len(arr)
    sum = 10
    print(maxsum(arr, n, sum))
I also haven't figured out how to output the values that are used for the sum as a list.
This problem is at least as hard as the well-studied subset sum problem, which is NP-complete. In particular, any algorithm which solves your problem can be used to solve the subset sum problem, by finding the maximum sum <= k and then outputting True if the sum equals k, or False if the sum is less than k.
This means your problem is NP-hard, and there is no known algorithm which solves it in polynomial time. Your algorithm's running time is linear in the length of the input array, so it cannot correctly solve the problem, and no similar algorithm can correctly solve the problem.
One approach that can work is a backtracking search - for each element, try including it in the sum, then backtrack and try not including it in the sum. This will take exponential time in the length of the input array.
If your array elements are always integers, another option is dynamic programming; there is a standard dynamic programming algorithm which solves the integer subset sum problem in pseudopolynomial time, which could easily be adapted to solve your form of the problem.
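A sketch of that adaptation (my own, not a standard library routine): track every total <= k that is reachable, remember one subset for each, and report the largest one together with its numbers.
def max_subset_sum_at_most_k(values, k):
    best = {0: []}                          # reachable total -> one subset achieving it
    for v in values:
        for total, subset in list(best.items()):
            new_total = total + v
            if new_total <= k and new_total not in best:
                best[new_total] = subset + [v]
    top = max(best)
    return top, best[top]

print(max_subset_sum_at_most_k([6, 1, 1, 5, 2], 10))   # (10, [6, 1, 1, 2])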
Here's a solution using itertools.combinations. It's fast enough for small lists, but slows down significantly if you have a large sum and large list of values.
from itertools import combinations

def find_combo(values, k):
    for num_sum in range(k, 0, -1):
        for quant in range(1, len(values) + 1):
            for combo in combinations(values, quant):
                if sum(combo) == num_sum:
                    return combo

values = [6, 1, 1, 5, 2]
k = 10

answer = find_combo(values, k)
print(answer, sum(answer))
This solution works for any values in a list and any k, as long as the number of values needed in the solution sum doesn't become large.
The solution presented by user10987432 has a flaw that this function avoids: it greedily accepts any value that keeps the running sum at or below k. With that solution, the values are ordered from largest to smallest and then iterated over, each being added to the result if it doesn't push the sum above k. However, a simple example shows this to be inaccurate:
values = [7, 5, 4, 1], k = 10
In that solution, the sum would begin at 0, then go up to 7 with the first item, and finish at 8 after reaching the last index. The correct solution, however, is 5 + 4 + 1 = 10.
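That example is easy to check (illustrative only; find_combo is the function defined above):
values = [7, 5, 4, 1]
k = 10
greedy_total = 0
for v in sorted(values, reverse=True):      # "largest first, keep it if it still fits"
    if greedy_total + v <= k:
        greedy_total += v
print(greedy_total)                         # 8  (7 + 1)
combo = find_combo(values, k)
print(combo, sum(combo))                    # (5, 4, 1) 10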

Which algorithm is faster to sort these number pairs?

I wrote two solutions in Python. They are supposed to take a list of numbers and collect the pairs that add up to a given sum; both return the same pairs, but which one is more efficient? I'm not sure whether using Python's count method does more work behind the scenes, making the second one slower.
numbers = [1, 2, 4, 4, 4, 4, 5, 7, 7, 8, 8, 8, 9]

match = []
for i in range(len(numbers)):
    for j in range(len(numbers)):
        if (i != j):
            if (numbers[i] + numbers[j] == sum):
                match.append([numbers[i], numbers[j]])

match2 = []
for i in range(len(numbers)):
    counterPart = abs(numbers[i] - sum)
    numberOfCounterParts = numbers.count(counterPart)
    if (numberOfCounterParts >= 1):
        if (counterPart == numbers[i]):
            for j in range(numbers.count(counterPart) - 1):
                match2.append([numbers[i], counterPart])
        else:
            for j in range(numbers.count(counterPart)):
                match2.append([numbers[i], counterPart])
Is there an even better solution that I'm missing?
When comparing algorithms, you should compare their time complexities. Measuring the time is also a good idea, but it is heavily dependent on the input, which here is tiny.
The first algorithm takes:
O(N^2)
because of the double for loop.
For the second algorithm, you should take into account that count() has a time complexity of O(N). You have one for loop, and in its body count() is called twice, once after abs() and once inside whichever branch of the if-else statement you go into. As a result the time complexity is O(N) * 2 * O(N) = 2 * O(N^2), which yields:
O(N^2)
That means both algorithms have the same time complexity. As a result, it now makes sense to measure performance, by running many experiments and taking the average of the time measurements, with input big enough to reflect real performance.
It's almost always useful to measure the complexity of your algorithms.
Both of your algorithms have O(N^2) complexity, so they are almost interchangeable in terms of performance.
You may improve your algorithm by keeping a mapping of value-index pairs. It will reduce the complexity to O(N): basically, you'll have one loop (a sketch of this idea is below).
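A sketch of that idea (mine, using a Counter of the values seen so far rather than explicit indices): each element is paired with its complement in O(1) on average, so there is a single pass over the list.
from collections import Counter

def count_pairs(numbers, target):
    seen = Counter()
    pairs = 0
    for x in numbers:
        pairs += seen[target - x]   # every earlier complement forms one new pair
        seen[x] += 1
    return pairs

numbers = [1, 2, 4, 4, 4, 4, 5, 7, 7, 8, 8, 8, 9]
print(count_pairs(numbers, 12))     # arbitrary target 12 -> 14: twelve (4, 8) pairs and two (5, 7) pairs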
You can run the test yourself using the timeit module:
from timeit import timeit

t1 = timeit(setup='from __main__ import sort1, numbers',
            stmt='sort1(numbers)',
            number=1)
t2 = timeit(setup='from __main__ import sort2, numbers',
            stmt='sort2(numbers)',
            number=1)
print(t1)
print(t2)
also note that sum is a built-in and therefore not a good name for a variable...
there are way better algorithms for that! especially considering you have duplicates in your list.
here is a faster version that will only give you the matches but not the multiplicity of the matches:
def sum_target(lst, target):
    # make list unique
    unique_set = set(lst)
    unique_list = list(unique_set)
    remainders = {number: target - number for number in unique_list}
    print(remainders)
    match = set()
    for a, b in remainders.items():
        if a == b and lst.count(a) >= 2:
            match.add((a, b))
        else:
            if b in remainders:
                match.add(frozenset((a, b)))
    return match
Yes, there is a better algorithm that can be used if you know the lower and upper bounds of the data: counting sort, which takes O(N) time; its space is not constant (it depends on the range between the lower and upper bounds).
Refer to Counting Sort.
PS: counting sort is not a comparison-based sorting algorithm.
Refer to the sample code below:
def counting_sort(numbers, k):
    counter = [0] * (k + 1)
    for i in numbers:
        counter[i] += 1
    ndx = 0
    for i in range(len(counter)):
        while 0 < counter[i]:
            numbers[ndx] = i
            ndx += 1
            counter[i] -= 1

Number of multiples less than the max number

For the following problem on SingPath:
Given an input of a list of numbers and a high number,
return the number of multiples of each of
those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each
other.
Here is my code:
def countMultiples(l, max_num):
    counting_list = []
    for i in l:
        for j in range(1, max_num):
            if (i * j < max_num) and (i * j) not in counting_list:
                counting_list.append(i * j)
    return len(counting_list)
Although my algorithm works okay, it gets stuck when the maximum number is way too big
>>> countMultiples([3],30)
9 #WORKS GOOD
>>> countMultiples([3,5],100)
46 #WORKS GOOD
>>> countMultiples([13,25],100250)
Line 5: TimeLimitError: Program exceeded run time limit.
How to optimize this code?
3 and 5 share some multiples, like 15.
You should avoid counting those shared multiples twice, and you will get the right answer.
Also, you should check the inclusion-exclusion principle: https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
EDIT:
The problem can be solved in constant time. As previously linked, the solution lies in the inclusion-exclusion principle.
Say you want the number of multiples of 3 strictly below 100: that is floor(99/3). The same applies for 5: floor(99/5).
Now, to get the multiples of 3 and 5 that are less than 100, you add those two counts and subtract the numbers that are multiples of both, i.e. subtract the multiples of 15.
So the answer for multiples of 3 and 5 below 100 is floor(99/3) + floor(99/5) - floor(99/15) = 33 + 19 - 6 = 46.
If you have more than 2 numbers it gets a bit more complicated, but the same approach applies (a sketch is below); for more, check https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
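A sketch of the general case (my own illustration; it assumes Python 3.9+ for math.lcm): sum over every non-empty subset of the divisors, adding or subtracting floor((max_num - 1) / lcm(subset)) depending on the subset's size. Using the lcm also keeps it correct when the divisors are not pairwise coprime.
from itertools import combinations
from math import lcm

def count_multiples(divisors, max_num):
    total = 0
    for size in range(1, len(divisors) + 1):
        sign = 1 if size % 2 == 1 else -1        # inclusion-exclusion alternates signs
        for subset in combinations(divisors, size):
            total += sign * ((max_num - 1) // lcm(*subset))
    return total

print(count_multiples([3, 5], 100))        # 46
print(count_multiples([13, 25], 100250))   # same answer as the loop version, instantly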
EDIT2:
Also, the loop variant can be sped up.
Your current algorithm appends every multiple to a list and checks membership against that list, which is very slow.
You should switch the inner and outer for loops. That way, for each candidate number you check whether any of the divisors divides it.
So just add a boolean variable that tells you whether any of your divisors divides the number, and count the times the variable is true.
It would look like this:
def countMultiples(l, max_num):
    nums = 0
    for j in range(1, max_num):
        isMultiple = False
        for i in l:
            if (j % i == 0):
                isMultiple = True
        if (isMultiple == True):
            nums += 1
    return nums

print(countMultiples([13, 25], 100250))
If the length of the list is all you need, you'd be better off with a tally instead of creating another list.
def countMultiples(l, max_num):
    count = 0
    counting_list = []
    for i in l:
        for j in range(1, max_num):
            if (i * j < max_num) and (i * j) not in counting_list:
                counting_list.append(i * j)   # still record seen multiples so shared ones aren't counted twice
                count += 1
    return count
