Analyzing dynamic programming solutions to different maximum subarray problems - python

I was trying out some dynamic programming problems. I came across the following problems:
1. Maximum (consecutive) subset-sum
Finding the maximum sum of any consecutive set of elements in the given input array.
One possible solution is as follows:
def max_subarray_sum(arr):
    curr_sum = arr[0]
    max_so_far = arr[0]
    for num in arr[1:]:
        curr_sum = max(num, curr_sum + num)
        max_so_far = max(curr_sum, max_so_far)
    return max_so_far
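For instance, with a made-up sample input:
print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from the slice [4, -1, 2, 1]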
2. Maximum non-adjacent elements subset-sum
Given an array of integers, find the subset of non-adjacent elements with the maximum sum, and calculate the sum of that subset (this is a HackerRank problem).
One possible solution in Python is as follows:
def maxSubsetSum(arr):
    second_last_max_sum = arr[0]
    last_max_sum = max(arr[0], arr[1])
    for num in arr[2:]:
        curr_max_sum = max(num, last_max_sum, second_last_max_sum + num)
        second_last_max_sum = last_max_sum
        last_max_sum = curr_max_sum
    return last_max_sum
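Again with a made-up sample, where the best choice is the non-adjacent pair 7 and 6:
print(maxSubsetSum([3, 7, 4, 6, 5]))  # 13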
The solution to the first problem involves two max() calls per iteration, while the second involves a single one. So I was wondering whether I can somehow build the logic with just a single max() for the first problem, say by changing the arguments to that single max(), as in the second problem.
Q1. After considerable pondering, I came to the conclusion that I cannot, because of the constraint in the first problem: the sub-array needs to be formed from consecutive elements. Am I right about this?
Q2. Can I generalize this further? That is, what "kinds" of problems can be solved with just a single max() and which kinds require more than one? In other words, what are the "characteristics" of constraints on the subarray that allow a single max(), and of those that need at least two?

Asymmetric Swaps - minimising max/min difference in list through swaps

I was doing some exercises on CodeChef and came across the Asymmetric Swaps problem:
Problem
Chef has two arrays A and B of the same size N.
In one operation, Chef can:
choose two indices i and j (1 ≤ i, j ≤ N) and swap the elements A_i and B_j.
Chef came up with a task: find the minimum possible value of (A_max − A_min) after performing the swap operation any (possibly zero) number of times.
Since Chef is busy, can you help him solve this task?
Note that A_max and A_min denote the maximum and minimum elements of the array A, respectively.
I have tried the logic below, but it fails for some test cases. I have no access to the failing test cases, so I can't tell where exactly the code fails to produce the required output.
T = int(input())
for _ in range(T):
    arraySize = int(input())
    A = list(map(int, input().split()))
    B = list(map(int, input().split()))
    sortedList = sorted(A + B)
    minLower = sortedList[arraySize-1] - sortedList[0]              # first half of sortedList
    minUpper = sortedList[(arraySize*2)-1] - sortedList[arraySize]  # second half of sortedList
    print(min(minLower, minUpper))
I saw some submitted answers but didn't understand the reasoning behind them. Can someone point out what I am missing?
The approach to sort the input into one list is the right one. But it is not enough to look at the left and the right half of that sorted list.
It could well be that there is another sublist of length N that has its extreme values closer to each other.
Take for instance this input:
A = [1,4,5]
B = [6,11,12]
Then the sorted list is [1,4,5,6,11,12] and [4,5,6] is actually the sublist which minimises the difference between its maximum and minimum value.
So implement a loop over the sorted list L that selects the minimum of L[i+N-1] - L[i] over all valid starting points i, as in the sketch below.
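A minimal sketch of that loop, keeping the input handling from the question (with 2N merged values there are N+1 windows of length N):
T = int(input())
for _ in range(T):
    N = int(input())
    A = list(map(int, input().split()))
    B = list(map(int, input().split()))
    merged = sorted(A + B)
    # every window of N consecutive values in the sorted merge is a candidate
    # for the final contents of A; take the one with the smallest spread
    print(min(merged[i+N-1] - merged[i] for i in range(N + 1)))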

Why no index out of range error?

lst = [
8,2,22,97,38,15,0,40,0,75,4,5,7,78,52,12,50,77,91,8, #0-19
49,49,99,40,17,81,18,57,60,87,17,40,98,43,69,48,4,56,62,0, #20-39
81,49,31,73,55,79,14,29,93,71,40,67,53,88,30,3,49,13,36,65, #40-59
52,70,95,23,4,60,11,42,69,24,68,56,1,32,56,71,37,2,36,91,
22,31,16,71,51,67,63,89,41,92,36,54,22,40,40,28,66,33,13,80,
24,47,32,60,99,3,45,2,44,75,33,53,78,36,84,20,35,17,12,50,
32,98,81,28,64,23,67,10,26,38,40,67,59,54,70,66,18,38,64,70,
67,26,20,68,2,62,12,20,95,63,94,39,63,8,40,91,66,49,94,21,
24,55,58,5,66,73,99,26,97,17,78,78,96,83,14,88,34,89,63,72,
21,36,23,9,75,0,76,44,20,45,35,14,0,61,33,97,34,31,33,95,
78,17,53,28,22,75,31,67,15,94,3,80,4,62,16,14,9,53,56,92,
16,39,5,42,96,35,31,47,55,58,88,24,0,17,54,24,36,29,85,57,
86,56,0,48,35,71,89,7,5,44,44,37,44,60,21,58,51,54,17,58,
19,80,81,68,5,94,47,69,28,73,92,13,86,52,17,77,4,89,55,40,
4,52,8,83,97,35,99,16,7,97,57,32,16,26,26,79,33,27,98,66,
88,36,68,87,57,62,20,72,3,46,33,67,46,55,12,32,63,93,53,69,
4,42,16,73,38,25,39,11,24,94,72,18,8,46,29,32,40,62,76,36, #320-339
20,69,36,41,72,30,23,88,34,62,99,69,82,67,59,85,74,4,36,16, #340-359
20,73,35,29,78,31,90,1,74,31,49,71,48,86,81,16,23,57,5,54, #360-379
1,70,54,71,83,51,54,69,16,92,33,48,61,43,52,1,89,19,67,48] #380-399
prodsum = 1

def prod(iter):
    p = 1
    for n in iter:
        p *= n
    return p

for n in range(0, 5000, 20):  # NOT OUT OF RANGE???
    for i in range(0, 17):
        if prod(lst[n+i:n+i+4]) > prodsum:
            prodsum = prod(lst[n+i:n+i+4])
I'm trying to learn/improve my very rudimentary skills in Python, so I've been going through Project Euler challenges. The challenge question is more complex, but I basically have a 20x20 grid and have to find 4 adjacent numbers with the largest product.
I basically turned the grid into a list (with 400 values) and was going to scan row indices. I accidentally entered a large number for my for loop and noticed I didn't get an out-of-range error. Why is this?
You would get an out-of-range error with plain indexing, e.g. if you had a list of 10 elements and you asked for my_list[20]. However, with slicing, my_list[a:b] gives you the elements from a to b-1, or up to the end of the list if b is past it. That's just a design decision of the language.
You don't get an out-of-range error because you never directly index your list with n.
You use lst[n+i:n+i+4] to get a slice of lst, which is simply empty if your indices are out of range, so prod(...) is called with [] and returns 1.
Slicing outside the bounds of a sequence doesn't cause an error. If you try to index a single item out of range, you'll get an error.
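A quick demonstration (the sample list is mine):
my_list = list(range(10))
print(my_list[8:20])   # [8, 9] -- a slice is silently truncated at the end
print(my_list[20:24])  # []     -- an entirely out-of-range slice is just empty
try:
    my_list[20]        # plain indexing past the end raises
except IndexError as e:
    print(e)           # list index out of range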

How to find the median between two sorted arrays?

I'm working on a competitive programming problem where we're trying to find the median of two sorted arrays. The optimal algorithm is to perform a binary search and identify splitting points, i and j, between the two arrays.
I'm having trouble deriving the solution myself. I don't understand the initial logic. I will follow how I think of the problem so far.
The concept of the median is to partition the given array into two sets. Consider a hypothetical left array and a hypothetical right array after merging the two given arrays. Both these arrays are of the same length.
We know that the median given both those hypothetical arrays works out to be [max(left) + min(right)]/2. This makes sense so far. But the issue here is now knowing how to construct the left and right arrays.
We can choose a splitting point on ArrayA as i and a splitting point on ArrayB as j. Note that len(ArrayA[:i] + ArrayB[:j]) == len(ArrayA[i:] + ArrayB[j:]).
Now we just need to find the cutting points. We could try all splitting points i, j that satisfy the median condition, but this works out to be O(M*N), where M is the size of ArrayB and N is the size of ArrayA.
I'm not sure how to get where I am to the binary search solution using my train of thought. If someone could give me pointers - that would be awesome.
Here is the approach I managed to come up with.
First of all, we know that the resulting array will contain N+M elements. Assuming N+M is even for the moment, the left part will contain (N+M)/2 elements, and the right part will contain (N+M)/2 elements as well. Let's denote the resulting array as Ans, and the size of one of its parts as PartSize.
Perform a binary search operation on array A. The range of such binary search will be [0, N]. This binary search operation will help you determine the number of elements from array A that will form the left part of the resulting array.
Now, suppose we are testing the value i. If i elements from array A are supposed to be included in the left part of the resulting array, then j = PartSize - i elements must be included from array B in the left part as well. We have the following possibilities:
j > M: this is an invalid state. It means we still need to take more elements from array A, so our new binary search range becomes [i+1, N].
j <= M and A[i+1] < B[j]: this is a tricky case. Think about it: if the next element in array A is smaller than element j in array B, then element A[i+1] is supposed to be in the left part rather than element B[j]. Our new binary search range becomes [i+1, N].
j <= M and A[i] > B[j+1]: this is close to the previous case. If the next element in array B is smaller than element i in array A, then element B[j+1] is supposed to be in the left part rather than element A[i]. Our new binary search range becomes [0, i-1].
j <= M and A[i+1] >= B[j] and A[i] <= B[j+1]: this is the optimal case, and you have finally found your answer.
After the binary search operation is finished, and you managed to calculate both i and j, you can now easily find the value of the median. You need to handle a few cases here depending on whether N+M is odd or even.
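A minimal sketch of that search in Python, 0-indexed, with infinite sentinels standing in for the boundary checks (the names half, lo, hi are mine):
def median_two_sorted(A, B):
    if len(A) > len(B):            # search on the shorter array so j stays in range
        A, B = B, A
    n, m = len(A), len(B)
    half = (n + m + 1) // 2        # size of the left part
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2         # elements taken from A into the left part
        j = half - i               # elements taken from B into the left part
        a_left  = A[i-1] if i > 0 else float("-inf")
        a_right = A[i]   if i < n else float("inf")
        b_left  = B[j-1] if j > 0 else float("-inf")
        b_right = B[j]   if j < m else float("inf")
        if a_left > b_right:       # took too many elements from A
            hi = i - 1
        elif b_left > a_right:     # took too few elements from A
            lo = i + 1
        elif (n + m) % 2:          # odd total: the median is the last element of the left part
            return max(a_left, b_left)
        else:                      # even total: average the two middle values
            return (max(a_left, b_left) + min(a_right, b_right)) / 2.0

print(median_two_sorted([1, 2], [3, 4]))  # 2.5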
Hope it helps!

Choosing python data structures to speed up algorithm implementation

So I'm given a large collection (roughly 200k) of lists. Each contains a subset of the numbers 0 through 27. I want to return two of the lists where the product of their lengths is greater than the product of the lengths of any other pair of lists. There's another condition, namely that the lists have no numbers in common.
There's an algorithm I found for this (can't remember the source, apologies for non-specificity of props) which exploits the fact that the universe of possible subsets of the numbers 0 through 27 (2^28 of them) is small enough to enumerate, unlike the roughly 200k^2 pairs of lists.
The first thing I've done is loop through all the lists, find the unique set of integers comprising each, and index it as a number between 0 and (1<<28) - 1, as follows:
def index_lists(lists):
    index_hash = {}
    for raw_list in lists:
        length = len(raw_list)
        index = find_index(raw_list)  # maps the list's set of elements to an int in [0, 1<<28)
        if length > index_hash.get(index, {}).get("length", 0):
            index_hash[index] = {"list": raw_list, "length": length}
    return index_hash
This gives me the longest list, and its length, for each subset that's actually present in the given collection of lists. Naturally, not all subsets from 0 to (1<<28)-1 are necessarily included, since there's no guarantee the supplied collection has a list realizing each unique subset.
What I then want, for each subset 0 through (1<<28)-1 (all of them this time), is the longest list that contains at most that subset. This is the part that is killing me. At a high level it should, for each subset, first check to see if that subset is contained in index_hash. It should then compare the length of that entry in the hash (if it exists there) to the lengths stored previously in the new hash for the current subset with one number removed (this is an inner loop over the 28 bits). The greatest of these is stored in the new hash for the current subset of the outer loop. The code right now looks like this:
def at_most_hash(index_hash):
    most_hash = {}
    for i in xrange(1 << 28):  # pretty sure this is a bad idea
        max_entry = index_hash.get(i)
        if max_entry:
            max_length = max_entry["length"]
            max_list = max_entry["list"]
        else:
            max_length = 0
            max_list = []
        for j in xrange(28):  # again, probably not great
            subset_index = i & ~(1 << j)  # gets us a pre-computed subset
            at_most_entry = most_hash.get(subset_index, {})
            at_most_length = at_most_entry.get("length", 0)
            if at_most_length > max_length:
                max_length = at_most_length
                max_list = at_most_entry["list"]
        most_hash[i] = {"length": max_length, "list": max_list}
    return most_hash
This loop obviously takes several forevers to complete. I feel that I'm new enough to python that my choice of how to iterate and what data structures to use may have been completely disastrous. Not to mention the prospective memory problems from attempting to fill the dictionary. Is there perhaps a better structure or package to use as data structures? Or a better way to set up the iteration? Or maybe I can do this more sparsely?
The next part of the algorithm just cycles through all the lists we were given, takes the product of each subset's max_length and the complementary subset's max_length by looking them up in at_most_hash, and keeps the maximum of those products.
Any suggestions here? I appreciate the patience for wading through my long-winded question and less than decent attempt at coding this up.
In theory, this is still a better approach than working with the collection of lists alone, since that approach is roughly O(200k^2) and this one is roughly O(28 * 2^28 + 200k); yet my implementation is holding me back.
Given that your indexes are just ints, you could save some time and space by using lists instead of dicts. I'd go further and bring in NumPy arrays. They offer compact storage representation and efficient operations that let you implicitly perform repetitive work in C, bypassing a ton of interpreter overhead.
Instead of index_hash, we start by building a NumPy array where index_array[i] is the length of the longest list whose set of elements is represented by i, or 0 if there is no such list:
import numpy

index_array = numpy.zeros(1 << 28, dtype=int)  # We could probably get away with dtype=int8.
for raw_list in lists:
    i = find_index(raw_list)
    index_array[i] = max(index_array[i], len(raw_list))
We then use NumPy operations to bubble up the lengths in C instead of interpreted Python. Things might get confusing from here:
for bit_index in xrange(28):
    index_array = index_array.reshape([1 << (28 - bit_index), 1 << bit_index])
    numpy.maximum(index_array[::2], index_array[1::2], out=index_array[1::2])
index_array = index_array.reshape([1 << 28])
Each reshape call takes a new view of the array where data in even-numbered rows corresponds to sets with the bit at bit_index clear, and data in odd-numbered rows corresponds to sets with the bit at bit_index set. The numpy.maximum call then performs the bubble-up operation for that bit. At the end, each cell index_array[i] of index_array represents the length of the longest list whose elements are a subset of set i.
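To see the bubble-up on something small, here is a toy check on a 3-number universe (8 subsets) with made-up lengths:
toy = numpy.array([0, 5, 0, 0, 2, 0, 7, 0])  # lengths indexed by subsets of {0, 1, 2}
for bit_index in xrange(3):
    toy = toy.reshape([1 << (3 - bit_index), 1 << bit_index])
    numpy.maximum(toy[::2], toy[1::2], out=toy[1::2])
toy = toy.reshape([8])
print(toy)  # [0 5 0 5 2 5 7 7]; e.g. toy[0b101] == 5, the best length over subsets of {0, 2}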
We then compute the products of lengths at complementary indices:
products = index_array * index_array[::-1]  # We'd probably have to adjust this part
                                            # if we picked dtype=int8 earlier.
find where the best product is:
best_product_index = products.argmax()
and the longest lists whose elements are subsets of the set represented by best_product_index and its complement are the lists we want.
This is a bit too long for a comment so I will post it as an answer. One more direct way to index your subsets as integers is to use "bitsets" with each bit in the binary representation corresponding to one of the numbers.
For example, the set {0,2,3} would be represented by 2^0 + 2^2 + 2^3 = 13, and {4,5} would be represented by 2^4 + 2^5 = 48.
This would allow you to use simple lists instead of dictionaries and Python's generic hashing function.
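A minimal sketch of that encoding (this could serve as the find_index the question assumes exists):
def find_index(raw_list):
    index = 0
    for n in raw_list:   # each n is one of the numbers 0 through 27
        index |= 1 << n  # set the bit corresponding to that number
    return index

print(find_index([0, 2, 3]))  # 13
print(find_index([4, 5]))     # 48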

4-sum algorithm in Python [duplicate]

I am trying to find whether a list has 4 elements that sum to 0 (and later find what those elements are). I'm trying to build a solution based on the even-k algorithm described at https://cs.stackexchange.com/questions/2973/generalised-3sum-k-sum-problem.
I came up with this code in Python, using combinations from the standard library:
from itertools import combinations

def foursum(arr):
    seen = {sum(subset) for subset in combinations(arr, 2)}
    return any(-x in seen for x in seen)
But this fails for input like [-1, 1, 2, 3]. It fails because it matches the sum (-1+1) with itself. I think this problem will get even worse when I want to find the elements because you can separate a set of 4 distinct items into 2 sets of 2 items in 6 ways: {1,4}+{-2,-3}, {1,-2}+{4,-3} etc etc.
How can I make an algorithm that correctly returns all solutions avoiding this problem?
EDIT: I should have added that I want as efficient an algorithm as possible. O(len(arr)^4) is too slow for my task...
This works.
def foursum(arr):
    seen = {}
    for i in xrange(len(arr)):
        for j in xrange(i + 1, len(arr)):
            if arr[i] + arr[j] in seen:
                seen[arr[i] + arr[j]].add((i, j))
            else:
                seen[arr[i] + arr[j]] = {(i, j)}
    for key in seen:
        if -key in seen:
            for (i, j) in seen[key]:
                for (p, q) in seen[-key]:
                    if i != p and i != q and j != p and j != q:
                        return True
    return False
EDIT: This can be made more Pythonic, I think; I don't know enough Python.
It is normal for the 4SUM problem to permit input elements to be used multiple times. For instance, given the input (2 3 1 0 -4 -1), valid solutions are (3 1 0 -4) and (0 0 0 0).
The basic algorithm is O(n^2): Use two nested loops, each running over all the items in the input, to form all sums of pairs, storing the sums and their components in some kind of dictionary (hash table, AVL tree). Then scan the pair-sums, reporting any quadruple for which the negative of the pair-sum is also present in the dictionary.
If you insist on not duplicating input elements, you can modify the algorithm slightly. When computing the two nested loops, start the second loop beyond the current index of the first loop, so no input elements are taken twice. Then, when scanning the dictionary, reject any quadruples that include duplicates.
I discuss this problem at my blog, where you will see solutions in multiple languages, including Python.
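A minimal sketch of that basic algorithm, allowing element reuse and storing one example pair per sum (the name foursum_any is mine):
def foursum_any(arr):
    pair_sums = {}                 # sum -> one example pair of elements
    for i, a in enumerate(arr):
        for b in arr[i:]:          # the inner loop starts at i, so an element may pair with itself
            pair_sums.setdefault(a + b, (a, b))
    for s, (a, b) in pair_sums.items():
        if -s in pair_sums:        # the negated pair-sum completes a quadruple
            c, d = pair_sums[-s]
            return (a, b, c, d)
    return None

print(foursum_any([2, 3, 1, 0, -4, -1]))  # a valid quadruple, e.g. (2, 2, 0, -4)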
First note that the problem is O(n^4) in the worst case, since the output itself might be of size O(n^4) (you are looking for all solutions, not just answering the binary question).
Proof:
Take the array [-1]*(n/2) + [1]*(n/2). You need to "choose" two instances of -1 without repeats, giving (n/2)*(n/2-1)/2 possibilities, and two instances of 1 without repeats, giving another (n/2)*(n/2-1)/2 possibilities. This totals (n/2)*(n/2-1)*(n/2)*(n/2-1)/4 solutions, which is in Theta(n^4).
Now that we understand we cannot achieve an O(n^2*log(n)) worst case, we can get to the following algorithm (pseudo-code), which should run closer to O(n^2*log(n)) for "good" cases (few identical sums) and degrade to O(n^4) in the worst case (as expected).
Pseudo-code:
subsets <- all subsets of size 2 of the indices (not the values!)
l <- empty list
for each s in subsets:
    # appending a triplet of (sum, idx1, idx2):
    l.append((arr[s[0]] + arr[s[1]], s[0], s[1]))
sort l by the first element (sum) in each tuple
for each x in l:
    binary search l for -x[0]  # for the sum
    for each element y that satisfies the above:
        if x[1] != y[1] and x[2] != y[1] and x[1] != y[2] and x[2] != y[2]:
            yield arr[x[1]], arr[x[2]], arr[y[1]], arr[y[2]]
A Pythonic version of the above would probably be more elegant and readable, but I am not a Python expert, I'm afraid.
EDIT: Of course the algorithm must take at least as much time as it takes to emit its output!
If the number of possible solutions is not "large" compared to n, here is a suggested solution in O(N^3):
Find pair-wise sums of all elements and build a NxN matrix of the sums.
For each element in this matrix, build a struct that would have sumValue, row and column as it fields.
Sort all these N^2 struct elements in a 1D array. (in O(N^2 logN) time).
For each element x in this array, conduct a binary search for its partner y such that x + y = 0 (O(logn) per search).
Now, if you find a partner y, check whether its row or column field matches that of the element x. If so, iterate sequentially in both directions until there is no more such y.
If you find some y's that do not have a common row or column with x, then increment the count (or print the solution).
This iteration can take at most 2N steps, because each row and each column has length N.
Hence the total complexity of this algorithm is O(N^2 * N) = O(N^3).
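A rough sketch of this approach, using a sorted flat list of (sum, row, column) triples and bisect (the name foursums and the de-duplication by index sets are mine, since each quadruple is reachable from several pair splits):
import bisect

def foursums(arr):
    n = len(arr)
    # steps 1-3: all pairwise sums, tagged with their index pair, sorted by sum
    entries = sorted((arr[r] + arr[c], r, c)
                     for r in range(n) for c in range(r + 1, n))
    sums = [e[0] for e in entries]
    found = set()
    for s, r, c in entries:
        # step 4: binary search for a partner pair summing to -s
        k = bisect.bisect_left(sums, -s)
        # step 5: walk the run of equal sums, rejecting index clashes
        while k < len(sums) and sums[k] == -s:
            _, p, q = entries[k]
            if len({r, c, p, q}) == 4:  # all four indices distinct
                found.add(tuple(sorted((r, c, p, q))))
            k += 1
    return [[arr[i] for i in quad] for quad in sorted(found)]

print(foursums([2, 3, 1, 0, -4, -1]))  # [[2, 3, -4, -1], [3, 1, 0, -4]]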
