Partitioning array to k subsets using dynamic programming in python

I found this geeksforgeeks article explaining how to approach the "partitioning array into 2 subsets" problem. As the problem I am working on is a 3-partition problem, I would like to extend the logic of the code used in this article to the 3-partition problem. However, I am unsure how the 2-D array is used to determine whether the array can be partitioned into 2 subsets.
This is the description of the problem from the page:
Partition problem is to determine whether a given set can be partitioned into two subsets such that the sum of elements in both subsets is the same.
Here is the link to the page: https://www.geeksforgeeks.org/partition-problem-dp-18/
Here is the code implemented on the post:
def findPartition(arr, n):
    sum = 0
    i, j = 0, 0
    # calculate sum of all elements
    for i in range(n):
        sum += arr[i]
    if sum % 2 != 0:
        return False
    part = [[True for i in range(n + 1)]
            for j in range(sum // 2 + 1)]
    # initialize top row as true
    for i in range(0, n + 1):
        part[0][i] = True
    # initialize leftmost column,
    # except part[0][0], as False
    for i in range(1, sum // 2 + 1):
        part[i][0] = False
    # fill the partition table in
    # bottom up manner
    for i in range(1, sum // 2 + 1):
        for j in range(1, n + 1):
            part[i][j] = part[i][j - 1]
            if i >= arr[j - 1]:
                part[i][j] = (part[i][j] or
                              part[i - arr[j - 1]][j - 1])
    return part[sum // 2][n]
I understand that the 2 for loops traverse the table row by row, column by column and the default value for the current cell is the value of the cell to the left of it (meaning, the number is assumed not to be added to the subset).
    for i in range(1, sum // 2 + 1):
        for j in range(1, n + 1):
            part[i][j] = part[i][j - 1]
The if condition ensures that the current number does not exceed the target value i of the current row (which is at most sum // 2):
            if i >= arr[j - 1]:
This leads me to my question: using this code, wouldn't the bottom-right cell (whose value is returned to the caller) only tell us whether sum // k can be obtained for at least one subset? How can we assume that sum // k can be achieved for all k subsets based on this? We do not know which numbers from the array have been used, or whether the same number has been used in more than one subset.
                part[i][j] = (part[i][j] or part[i - arr[j - 1]][j - 1])
I have also found a similar solution for the 3-partition problem posted by rishabh1005 on github (link: https://github.com/rishabh1005/Algorithmic-Toolbox/blob/master/week6_dynamic_programming2/2_partitioning_souvenirs/partition3.py). However, he implements a counter to check if the value of each cell is equal to the target value of sum // k. The counter is incremented each time a cell is equal to sum // k.
if count<3: return 0
else: return 1
However, I am similarly unsure about the final if conditions he used. Why can the array be partitioned into 3 subsets if the counter value is >= 3?
Thanks everyone for your help.
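For two subsets the single table is enough because the second subset is forced: if some subset sums to sum // 2, the remaining numbers also sum to sum // 2. That argument no longer holds for k = 3, because finding one subset that sums to sum // 3 says nothing about whether the rest can be split again. A minimal sketch of one way around this, assuming non-negative integers, is to track which pairs (sum of subset 1, sum of subset 2) are reachable. The name can_partition3 is made up for illustration, and this is not the code from either linked page.

```python
def can_partition3(arr):
    """Return True if arr can be split into 3 disjoint subsets of equal sum.

    Assumes non-negative integers. Each number is assigned to subset 1,
    subset 2, or (implicitly) subset 3, so no number is used twice.
    """
    total = sum(arr)
    if total % 3 != 0:
        return False
    target = total // 3

    # reachable holds every (sum1, sum2) obtainable from the numbers seen so far
    reachable = {(0, 0)}
    for x in arr:
        updated = set(reachable)          # x goes to subset 3: pair unchanged
        for s1, s2 in reachable:
            if s1 + x <= target:
                updated.add((s1 + x, s2))  # x goes to subset 1
            if s2 + x <= target:
                updated.add((s1, s2 + x))  # x goes to subset 2
        reachable = updated

    # if subsets 1 and 2 both hit the target, subset 3 holds the remaining third
    return (target, target) in reachable


print(can_partition3([1, 2, 3, 4, 5, 5, 7, 7, 8, 10, 12, 19, 25]))
print(can_partition3([1, 1, 1, 3]))
```

The second call is the kind of input the question worries about: the total is 6 and the subset {1, 1} does sum to 2, yet no split into three equal parts exists, so a check that only looks for one subset summing to sum // 3 would be misled. The set of pairs can hold up to (sum // 3 + 1)^2 entries, so this is pseudo-polynomial like the original table.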

Related

Getting all subsets from subset sum problem on Python using Dynamic Programming

I am trying to extract all subsets from a list of elements which add up to a certain value.
Example -
List = [1,3,4,5,6]
Sum - 9
Output Expected = [[3,6],[5,4]]
I have tried different approaches and get the expected output, but on a huge list of elements they take a significant amount of time.
Can this be optimized using Dynamic Programming or any other technique?
Approach-1
def subset(array, num):
    result = []
    def find(arr, num, path=()):
        if not arr:
            return
        if arr[0] == num:
            result.append(path + (arr[0],))
        else:
            find(arr[1:], num - arr[0], path + (arr[0],))
        find(arr[1:], num, path)
    find(array, num)
    return result
numbers = [2, 2, 1, 12, 15, 2, 3]
x = 7
subset(numbers,x)
Approach-2
def isSubsetSum(arr, subset, N, subsetSize, subsetSum, index , sum):
    global flag
    if (subsetSum == sum):
        flag = 1
        for i in range(0, subsetSize):
            print(subset[i], end = " ")
        print("")
    else:
        for i in range(index, N):
            subset[subsetSize] = arr[i]
            isSubsetSum(arr, subset, N, subsetSize + 1,
                        subsetSum + arr[i], i + 1, sum)

If you want to output all subsets you can't do better than a sluggish O(2^n) complexity, because in the worst case that will be the size of your output and time complexity is lower-bounded by output size (this is a known NP-Complete problem). But, if rather than returning a list of all subsets, you just want to return a boolean value indicating whether achieving the target sum is possible, or just one subset summing to target (if it exists), you can use dynamic programming for a pseudo-polynomial O(nK) time solution, where n is the number of elements and K is the target integer.
The DP approach involves filling in an (n+1) x (K+1) table, with the sub-problems corresponding to the entries of the table being:
DP[i][k] = subset(A[i:], k) for 0 <= i <= n, 0 <= k <= K
That is, subset(A[i:], k) asks, 'Can I sum to (little) k using the suffix of A starting at index i?' Once you fill in the whole table, the answer to the overall problem, subset(A[0:], K) will be at DP[0][K]
The base cases are for i=n: they indicate that you can't sum to anything except for 0 if you're working with the empty suffix of your array
subset(A[n:], k>0) = False, subset(A[n:], k=0) = True
The recursive cases to fill in the table are:
subset(A[i:], k) = subset(A[i+1:], k) OR (A[i] <= k AND subset(A[i+1:], k-A[i]))
This simply relates the idea that you can use the current array suffix to sum to k either by skipping over the first element of that suffix and using the answer you already had in the previous row (when that first element wasn't in your array suffix), or by using A[i] in your sum and checking if you could make the reduced sum k-A[i] in the previous row. Of course, you can only use the new element if it doesn't itself exceed your target sum.
ex: subset(A[i:] = [3,4,1,6], k = 8)
would check: could I already sum to 8 with the previous suffix (A[i+1:] = [4,1,6])? No. Or, could I use the 3 which is now available to me to sum to 8? That is, could I sum to k = 8 - 3 = 5 with [4,1,6]? Yes. Because at least one of the conditions was true, I set DP[i][8] = True
Because all the base cases are for i=n, and the recurrence relation for subset(A[i:], k) relies on the answers to the smaller sub-problems subset(A[i+1:],...), you start at the bottom of the table, where i = n, fill out every k value from 0 to K for each row, and work your way up to row i = 0, ensuring you have the answers to the smaller sub-problems when you need them.
def subsetSum(A: list[int], K: int) -> bool:
    N = len(A)
    DP = [[None] * (K+1) for x in range(N+1)]
    DP[N] = [True if x == 0 else False for x in range(K+1)]
    for i in range(N-1, -1, -1):
        Ai = A[i]
        DP[i] = [DP[i+1][k] or (Ai <=k and DP[i+1][k-Ai]) for k in range(0, K+1)]
    # print result
    print(f"A = {A}, K = {K}")
    print('Ai,k:', *range(0,K+1), sep='\t')
    for (i, row) in enumerate(DP): print(A[i] if i < N else None, *row, sep='\t')
    print(f"DP[0][K] = {DP[0][K]}")
    return DP[0][K]
subsetSum([1,4,3,5,6], 9)
If you want to return an actual possible subset alongside the bool indicating whether or not it's possible to make one, then for every True flag in your DP you should also store the k index for the previous row that got you there (it will either be the current k index or k-A[i], depending on which table lookup returned True, which will indicate whether or not A[i] was used). Then you walk backwards from DP[0][K] after the table is filled to get a subset. This makes the code messier but it's definitely do-able. You can't get all subsets this way though (at least not without increasing your time complexity again) because the DP table compresses information.
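The walk back through the table can be sketched as follows. This is a simplified variant rather than exactly what is described above: instead of storing a back-pointer in every True cell, it re-reads the finished table and takes A[i] only when skipping it would make the remaining target unreachable. The name subset_sum_witness is made up for illustration, and the sketch assumes non-negative integers.

```python
def subset_sum_witness(A, K):
    """Return one sub-list of A that sums to K, or None if none exists.

    Assumes A holds non-negative integers and K >= 0.
    """
    N = len(A)
    # DP[i][k] is True when some subset of the suffix A[i:] sums to k
    DP = [[False] * (K + 1) for _ in range(N + 1)]
    DP[N][0] = True
    for i in range(N - 1, -1, -1):
        for k in range(K + 1):
            DP[i][k] = DP[i + 1][k] or (A[i] <= k and DP[i + 1][k - A[i]])

    if not DP[0][K]:
        return None

    # Walk from row 0 downwards, keeping DP[i][k] True at every step
    chosen, k = [], K
    for i in range(N):
        if DP[i + 1][k]:
            continue            # target still reachable without A[i]: skip it
        chosen.append(A[i])     # otherwise A[i] must be part of the subset
        k -= A[i]
    return chosen


print(subset_sum_witness([1, 4, 3, 5, 6], 9))
```

Because the walk prefers skipping an element whenever that is possible, it returns a single subset determined by the element order; other subsets with the same sum are not reported.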

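Here is a faster solution, intended to run in about O(n^2). The trade-off is that it keeps only one partial subset per remaining difference, so it does not enumerate every subset: for data = [1, 3, 4, 5, 6] and target = 9 it returns [[4, 5], [3, 6]] but not [1, 3, 5].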
def get_subsets(data: list, target: int):
    # initialize final result which is a list of all subsets summing up to target
    subsets = []
    # records the difference between the target value and a group of numbers
    differences = {}
    for number in data:
        prospects = []
        # iterate through every record in differences
        for diff in differences:
            # the number complements a record in differences, i.e. a desired subset is found
            if number - diff == 0:
                new_subset = [number] + differences[diff]
                new_subset.sort()
                if new_subset not in subsets:
                    subsets.append(new_subset)
            # the number fell short to reach the target; add to prospect instead
            elif number - diff < 0:
                prospects.append((number, diff))
        # update the differences record
        for prospect in prospects:
            new_diff = target - sum(differences[prospect[1]]) - prospect[0]
            differences[new_diff] = differences[prospect[1]] + [prospect[0]]
        differences[target - number] = [number]
    return subsets

Understanding print matrix in clockwise spiral in Python

I found this exercise while studying matrices (2D lists) in Python (I'm a beginner):
'''
start row index - k
end row index - m
start column index - l
end column index - n
iterator - i
array or list - a
'''
# Python3 program to print
# given matrix in spiral form
def spiralPrint(m, n, a) :
    k = 0; l = 0
    ''' k - starting row index
        m - ending row index
        l - starting column index
        n - ending column index
        i - iterator '''
    while (k < m and l < n) :
        # Print the first row from
        # the remaining rows
        for i in range(l, n) :
            print(a[k][i], end = " ")
        k += 1
        # Print the last column from
        # the remaining columns
        for i in range(k, m) :
            print(a[i][n - 1], end = " ")
        n -= 1
        # Print the last row from
        # the remaining rows
        if ( k < m) :
            for i in range(n - 1, (l - 1), -1) :
                print(a[m - 1][i], end = " ")
            m -= 1
        # Print the first column from
        # the remaining columns
        if (l < n) :
            for i in range(m - 1, k - 1, -1) :
                print(a[i][l], end = " ")
            l += 1
a =[[1,2,3,4,5],
    [6,7,8,9,10],
    [11,12,13,14,15],
    [16,17,18,19,20]]
R = 4
C = 5
spiralPrint(R,C,a)
I want to understand the logic behind it: why are four loops used to iterate, and why are the indices assigned to rows and columns this way?
Also, how does the function know that when it reaches the end of a row it has to move down to the next list and keep going around?

There are four for-loops, one for each side of the spiral.
Each iteration of the while-loop writes out one ring of the matrix. k and m control how much of each side column needs to be printed. k starts at 0 and m at the height of the matrix. Each time a top row is printed, k gets incremented by 1, so the subsequent columns to be printed start one row lower. Each time a bottom row is printed, m gets decremented by 1, so the subsequent columns to be printed stop one row higher.
Similarly, l and n control how much of each row needs to be printed, and get adjusted every time a column is printed.
Everything is easier to follow if you draw the matrix on a sheet of paper and write out k, m, l and n while you mark the parts that are being printed.
The exercise is a good illustration of how ranges work in Python.
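To make the shrinking boundaries easier to follow, here is a sketch of the same loop with the four indices renamed (top, bottom, left and right stand in for k, m, l and n) and with the values collected into a list instead of printed. The function name spiral_order and the new variable names are chosen for illustration only; the logic is meant to mirror spiralPrint.

```python
def spiral_order(a):
    """Return the elements of a rectangular matrix in clockwise spiral order."""
    if not a or not a[0]:
        return []
    top, bottom = 0, len(a)        # rows still to visit: top .. bottom - 1
    left, right = 0, len(a[0])     # columns still to visit: left .. right - 1
    out = []
    while top < bottom and left < right:
        # top side, left to right
        for c in range(left, right):
            out.append(a[top][c])
        top += 1
        # right side, top to bottom
        for r in range(top, bottom):
            out.append(a[r][right - 1])
        right -= 1
        # bottom side, right to left (only if a row is left)
        if top < bottom:
            for c in range(right - 1, left - 1, -1):
                out.append(a[bottom - 1][c])
            bottom -= 1
        # left side, bottom to top (only if a column is left)
        if left < right:
            for r in range(bottom - 1, top - 1, -1):
                out.append(a[r][left])
            left += 1
    return out


a = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15],
     [16, 17, 18, 19, 20]]
print(spiral_order(a))
# [1, 2, 3, 4, 5, 10, 15, 20, 19, 18, 17, 16, 11, 6, 7, 8, 9, 14, 13, 12]
```

The two if checks matter for matrices with a single remaining row or column: without them the bottom or left side would print elements that the top or right side already covered.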

How to apply recursion to this code about the number of ways to sum up to 'N'?

Given a list of integers, and a target integer N, I want to find the number of ways in which the integers in the list can be added to get N. Repetition is allowed.
This is the code:
def countWays(arr, m, N):
    count = [0 for i in range(N + 1)]
    # base case
    count[0] = 1
    # Count ways for all values up
    # to 'N' and store the result
    # m=len(arr)
    for i in range(1, N + 1):
        for j in range(m):
            # if i >= arr[j] then
            # accumulate count for value 'i' as
            # ways to form value 'i-arr[j]'
            if (i >= arr[j]):
                count[i] += count[i - arr[j]]
    # required number of ways
    return count[N]
(from Geeksforgeeks)
Any idea on how to do it using recursion and memoization?

The problem you are trying to solve is the same as the number of ways to make a change for an amount given a list of denominations. In your case, the amount is analogous to target number N and the denominations are analogous to the list of integers. Here is the recursive code. The link is https://www.geeksforgeeks.org/coin-change-dp-7/
# Returns the count of ways we can sum
# arr[0...m-1] coins to get sum N
def count(arr, m, N ):
    # If N is 0 then there is 1
    # solution (do not include any coin)
    if (N == 0):
        return 1
    # If N is less than 0 then no
    # solution exists
    if (N < 0):
        return 0
    # If there are no coins and N
    # is greater than 0, then no
    # solution exist
    if (m <=0 and N >= 1):
        return 0
    # count is sum of solutions (i)
    # including arr[m-1] (ii) excluding arr[m-1]
    return count( arr, m - 1, N ) + count( arr, m, N-arr[m-1] )
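One detail worth checking before adding memoization: the iterative countWays in the question counts ordered sums (1 + 2 and 2 + 1 are two ways), while the recursive coin-change code above counts unordered combinations (1 + 2 and 2 + 1 are one way). The sketch below memoizes both with functools.lru_cache so the difference can be seen side by side. The function names are made up for illustration, and positive integers in arr are assumed.

```python
from functools import lru_cache


def count_ways_ordered(arr, N):
    """Memoized recursion matching the iterative countWays (order matters)."""
    numbers = tuple(arr)

    @lru_cache(maxsize=None)
    def ways(rest):
        if rest == 0:
            return 1
        # pick any number as the next term of the sum
        return sum(ways(rest - x) for x in numbers if x <= rest)

    return ways(N)


def count_ways_unordered(arr, N):
    """Memoized version of the coin-change recursion (order ignored)."""
    numbers = tuple(arr)

    @lru_cache(maxsize=None)
    def ways(m, rest):
        if rest == 0:
            return 1
        if rest < 0 or m == 0:
            return 0
        # (i) never use numbers[m-1] again, or (ii) use it once more
        return ways(m - 1, rest) + ways(m, rest - numbers[m - 1])

    return ways(len(numbers), N)


print(count_ways_ordered([1, 5, 6], 7))    # 6
print(count_ways_unordered([1, 5, 6], 7))  # 3
```

Both versions recurse to a depth of roughly N in the worst case, so for large targets the bottom-up table from the question avoids Python's recursion limit.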

Interviewstreet's Insertion sort program

I tried to program Interviewstreet's Insertion sort challenge (Link for the challenge)
in Python and here is my code shown below.
The program runs fine up to a certain number of input elements (I'm not sure what the limit is), but returns wrong output for larger inputs. Can anyone tell me what I am doing wrong?
# This program tries to identify number of times swapping is done to sort the input array
"""
=>Get input values and print them
=>Get number of test cases and get inputs for those test cases
=>Complete Insertion sort routine
=>Add a variable to count the swapping's
"""
def sort_swap_times(nums):
    """ This function takes a list of elements and then returns the number of times
        swapping was necessary to complete the sorting
    """
    times_swapped = 0L
    # perform the insertion sort routine
    for j in range(1, len(nums)):
        key = nums[j]
        i = j - 1
        while i >= 0 and nums[i] > key:
            # perform swap and update the tracker
            nums[i + 1] = nums[i]
            times_swapped += 1
            i = i - 1
        # place the key value in the position identified
        nums[i + 1] = key
    return times_swapped
# get the upper limit.
limit = int(raw_input())
swap_count = []
# get the length and elements.
for i in range(limit):
    length = int(raw_input())
    elements_str = raw_input() # returns the whole line as one string
    # convert the given elements from str to int
    elements_int = map(int, elements_str.split())
    # pass integer elements list to perform the sorting
    # get the number of times swapping was needed and append the return value to swap_count list
    swap_count.append(sort_swap_times(elements_int))
# print the swap counts for each input array
for x in swap_count:
    print x

Your algorithm is correct, but this is a naive approach to the problem and will give you a Time Limit Exceeded verdict on large test cases (i.e., len(nums) > 10000). Let's analyze the run-time complexity of your algorithm.
for j in range(1, len(nums)):
    key = nums[j]
    i = j - 1
    while i >= 0 and nums[i] > key:
        # perform swap and update the tracker
        nums[i + 1] = nums[i]
        times_swapped += 1
        i = i - 1
    # place the key value in the position identified
    nums[i + 1] = key
The number of steps required in the above snippet is proportional to 1 + 2 + .. + len(nums)-1, or len(nums)*(len(nums)-1)/2 steps, which is O(len(nums)^2).
Hint:
Use the fact that all values will be within [1,10^6]. What you are really doing here is finding the number of inversions in the list, i.e. find all pairs of i < j s.t. nums[i] > nums[j]. Think of a data structure that allows you to find the number of swaps needed for each insert operation in logarithmic time complexity. Of course, there are other approaches.
Spoiler:
Binary Indexed Trees
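A minimal sketch of the Binary Indexed Tree (Fenwick tree) idea, written in Python 3 and assuming all values are positive integers: for each element, ask the tree how many earlier elements are less than or equal to it, and every other earlier element contributes one inversion (one shift in the insertion sort). The function name count_inversions and the max_value parameter are made up for illustration.

```python
def count_inversions(nums, max_value=None):
    """Count pairs i < j with nums[i] > nums[j] in O(n log max_value).

    Assumes every value is an integer in [1, max_value].
    """
    size = max_value if max_value is not None else max(nums, default=0)
    tree = [0] * (size + 1)        # 1-based Fenwick tree of value counts

    def add(value):
        # record one occurrence of value
        while value <= size:
            tree[value] += 1
            value += value & -value

    def count_up_to(value):
        # number of recorded values that are <= value
        total = 0
        while value > 0:
            total += tree[value]
            value -= value & -value
        return total

    inversions = 0
    for seen, value in enumerate(nums):
        # earlier elements strictly greater than value
        inversions += seen - count_up_to(value)
        add(value)
    return inversions


print(count_inversions([2, 1, 3, 1, 2]))  # 4
```

Equal elements are not counted, which matches the strict comparison nums[i] > key in the original loop. A merge-sort based count is a common alternative when the value range is not bounded.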

better algorithm for checking 5 in a row/col in a matrix

Is there a good algorithm for checking whether there are 5 identical elements in a row, in a column or on a diagonal of a square matrix, say 6x6?
There is of course the naive algorithm of iterating through every spot and then, for each point in the matrix, iterating through that row, column and diagonal. I am wondering if there is a better way of doing it.

You could keep a histogram in a dictionary (mapping element type -> int). And then you iterate over your row or column or diagonal, and increment histogram[element], and either check at the end to see if you have any 5s in the histogram, or if you can allow more than 5 copies, you can just stop once you've reached 5 for any element.
Simple, one-dimensional, example:
m = ['A', 'A', 'A', 'A', 'B', 'A']
h = {}
for x in m:
    if x in h:
        h[x] += 1
    else:
        h[x] = 1
print "Histogram:", h
for k in h:
    if h[k]>=5:
        print "%s appears %d times." % (k,h[k])
Output:
Histogram: {'A': 5, 'B': 1}
A appears 5 times.
Essentially, h[x] will store the number of times the element x appears in the array (in your case, this will be the current row, or column or diagonal). The elements don't have to appear consecutively, but the counts would be reset each time you start considering a new row/column/diagonal.

You can check whether there are k same elements in a matrix of integers in a single pass.
Suppose that n is the size of the matrix and m is the largest absolute value of any element. We have n columns, n rows and 1 diagonal (the main one, where i = j).
For each column, row or diagonal we have at most n distinct elements.
Now we can create a histogram containing (n + n + 1) * (2 * m + 1) entries: one block of 2 * m + 1 counters
for each row, each column and the diagonal.
size = (n + n + 1) * (2 * m + 1)
histogram = zeros(size, Int)
Now the tricky part is how to update this histogram.
Consider this function in pseudo-code:
updateHistogram(i, j, element)
    if (element < 0)
        element = m - element
    width = 2 * m + 1
    rowIndex = i * width + element
    columnIndex = n * width + j * width + element
    diagonalIndex = 2 * n * width + element
    histogram[rowIndex] = histogram[rowIndex] + 1
    histogram[columnIndex] = histogram[columnIndex] + 1
    if (i == j)
        histogram[diagonalIndex] = histogram[diagonalIndex] + 1
Now all you have to do is iterate through the histogram and check whether any entry is >= k.

Your best approach may depend on whether you control the placement of elements.
For example, if you were building a game and just placed the most recent element on the grid, you could capture into four strings the vertical, horizontal, and diagonal strips that intersected that point, and use the same algorithm on each strip, tallying each element and evaluating the totals. The algorithm may be slightly different depending on whether you're counting five contiguous elements out of the six, or allow gaps as long as the total is five.
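For the contiguous case, the check around the most recent move can be sketched like this. It is a variation on the idea above rather than a literal implementation: instead of copying the four strips into strings, it walks outward from the placed cell in both directions along each of the four lines. The name wins_at and the convention that None marks an empty cell are assumptions made for illustration.

```python
def wins_at(board, r, c, k=5):
    """Return True if the piece at (r, c) is part of k contiguous equal cells.

    board is a list of equal-length rows; None marks an empty cell.
    """
    piece = board[r][c]
    if piece is None:
        return False
    rows, cols = len(board), len(board[0])
    # one step vector per line through (r, c):
    # horizontal, vertical, diagonal, anti-diagonal
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        run = 1                      # the placed cell itself
        for sign in (1, -1):         # walk both ways along the line
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == piece:
                run += 1
                rr += sign * dr
                cc += sign * dc
        if run >= k:
            return True
    return False


board = [[None] * 6 for _ in range(6)]
for col in range(5):
    board[2][col] = 'X'
print(wins_at(board, 2, 4))  # True
```

Each call touches at most a few cells per direction, so the cost per move does not depend on the size of the board beyond the run length.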

For rows you can keep a counter, which indicates how many of the same elements in a row you currently have. To do this, iterate through the row and
if current element matches the previous element, increase the counter by one. If counter is 5, then you have found the 5 elements you wanted.
if current element doesn't match previous element, set the counter to 1.
The same principle can be applied to columns and diagonals as well. You probably want to use array of counters for columns (one element for each column) and diagonals so you can iterate through the matrix once.
Here is a small example for a smaller case (3 in a row on a 4x4 matrix), but you can easily change it:
n = 3
matrix = [[1, 2, 3, 4],
          [1, 2, 3, 1],
          [2, 3, 1, 3],
          [2, 1, 4, 2]]
col_counter = [1, 1, 1, 1]
for row in range(0, len(matrix)):
    row_counter = 1
    for col in range(0, len(matrix[row])):
        current_element = matrix[row][col]
        # check elements in a same row
        if col > 0:
            previous_element = matrix[row][col - 1]
            if current_element == previous_element:
                row_counter = row_counter + 1
                if row_counter == n:
                    print n, 'in a row at:', row, col - n + 1
            else:
                row_counter = 1
        # check elements in a same column
        if row > 0:
            previous_element = matrix[row - 1][col]
            if current_element == previous_element:
                col_counter[col] = col_counter[col] + 1
                if col_counter[col] == n:
                    print n, 'in a column at:', row - n + 1, col
            else:
                col_counter[col] = 1
I left out diagonals to keep the example short and simple, but for diagonals you can use the same principle as you use on columns. The previous element would be one of the following (depending on the direction of diagonal):
matrix[row - 1][col - 1]
matrix[row - 1][col + 1]
Note that you will need to make a little bit extra effort in the second case. For example traverse the row in the inner loop from right to left.

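I don't think you can avoid iteration. XOR can replace a single comparison, since a ^ b == 0 exactly when a == b, but XOR-ing a whole run of elements together does not tell you that they are all equal (1 ^ 2 ^ 3 == 0, for example), so you still need to compare neighbouring elements pairwise.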

You can try to improve your method with some heuristics: use knowledge of the matrix size to rule out element sequences that cannot fit, and stop unnecessary calculation early. If the given vector size is 6, you want to find 5 equal elements, and the first 3 elements are all different, further calculation makes no sense.
This approach can give you a significant advantage, if 5 equal elements in a row happen rarely enough.

If you code the rows/columns/diagonals as bitmaps, "five in a row" means "mask % 31== 0 && mask / 31 == power_of_two"
00011111 := 0x1f 31 (five in a row)
00111110 := 0x3e 62 (five in a row)
00111111 := 0x3f 63 (six in a row)
If you want to treat the six-in-a-row case also as five-in-a-row, the easiest way is probably to:
if (mask == 0) return 0;
for ( ; !(mask & 1) ; mask >>= 1 ) {;}
return ((mask & 0x1f) == 0x1f) ? 1 : 0;
Maybe the Stanford bit-tweaking department has a better solution or suggestion that does not need looping?
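One loop-free possibility, sketched in Python for consistency with the rest of the page: AND the mask with itself shifted by 1, 2, 3 and 4 positions. A bit survives all five terms only if it and the four bits above it are set, so a non-zero result means at least five consecutive set bits somewhere in the mask. The function name is made up for illustration, and the mask is assumed to be a non-negative integer.

```python
def has_five_in_a_row(mask):
    """Return True if mask contains at least five consecutive 1 bits."""
    return (mask & (mask >> 1) & (mask >> 2) & (mask >> 3) & (mask >> 4)) != 0


for m in (0b011111, 0b111110, 0b111111, 0b101111, 0):
    print(format(m, '06b'), has_five_in_a_row(m))
```

Unlike the modulo test, this also accepts six or more in a row and works for masks wider than six bits without any change.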
