def minOperation(A, N):
    operations = 0
    for i in range(N):
        if sum(A[:i+1]) < 0:
            A[i] = A[i] + 1
            operations += 1
    return operations
What am I doing wrong in this code?
The question says:
Given an array A[] of N integers. In each operation, the person can increase the ith element by 1 (i.e. set A[i] = A[i] + 1). The task is to calculate the minimum number of operations required such that there is no prefix in the array A[] whose sum is less than zero (i.e. for all i, the condition A[1] + A[2] + ... + A[i] >= 0 should be satisfied).
Your code assumes that a prefix sum that becomes negative can always be fixed by just adding one to the last term, but you might need to add a one to all the terms visited so far. For example:
A = [1, 1, 1, 1, -9]
Your code will not add a one to any of the first 4 terms, and then will only change -9 to -8, which is not enough. At that moment you should really consider giving every previous term an extra point, so that the prefix sum becomes 2+2+2+2-8, which then is still OK.
So the idea is to keep track of how many ones you have added, which at the same time indicates at which index you could add the next one -- if needed. Each time your running total becomes negative, you know how many additional operations you would need. If those are available, use them to bring the total back to zero.
Code:
def minOperations(lst):
    operations = 0
    total = 0
    for i, val in enumerate(lst):
        total += val
        if total < 0:
            # Add the number of operations needed to make the total 0
            operations += -total
            total = 0
            # If more operations are needed than available...
            if operations > i + 1:
                return -1  # ...it cannot be solved
    return operations

# Example run
lst = [1, 1, -5, 3, 2, -4, -1]
print(minOperations(lst))  # 3
NB: it is not efficient to recalculate the sum of the subarray at each iteration. Just keep adding the current value to a running sum.
Walk through the array and calculate prefix sums.
If for every i (1-based) the inequality
- PrefixSum[i] <= i
holds, then the result is
max(0, - min(PrefixSum))
Otherwise the result is None (we cannot add enough ones).
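A minimal sketch of this prefix-sum recipe (my own code, not from the answer; the function name is illustrative):

```python
def min_operations(a):
    # Running prefix sums of the array.
    prefix = []
    total = 0
    for x in a:
        total += x
        prefix.append(total)
    # Feasibility: by position i (0-based), at most i + 1 ones can have been added.
    for i, p in enumerate(prefix):
        if -p > i + 1:
            return None  # we cannot add enough ones
    # The deepest dip below zero is exactly the number of ones we must add.
    return max(0, -min(prefix))

print(min_operations([1, 1, -5, 3, 2, -4, -1]))  # 3
```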
# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
    # construct a unique key from dynamic elements of the input
    key = (n, k)
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
        # assign true if we get a subset by including or excluding the current item
        lookup[key] = include or exclude
    # return solution to the current subproblem
    return lookup[key]

if __name__ == '__main__':
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
    # create a dictionary to store solutions to subproblems
    lookup = {}
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')
It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why.
Can someone help me? A wordy explanation or a recurrence relation, anything is fine.
The simplest explanation I can give is to realize that when lookup[(n, k)] has a value, it is True or False and indicates whether some subset of A[:n+1] sums to k.
Imagine a naive algorithm that just fills in all the elements of lookup row by row.
lookup[(0, i)] (for 0 ≤ i ≤ total) has just two true entries, i = A[0] and i = 0; all the other entries are false.
lookup[(1, i)] (for 0 ≤ i ≤ total) is true if lookup[(0, i)] is true, or if i ≥ A[1] and lookup[(0, i - A[1])] is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those.
...
lookup[(r, i)] (for 0 ≤ i ≤ total) is true if lookup[(r - 1, i)] is true, or if i ≥ A[r] and lookup[(r - 1, i - A[r])] is true.
Filling in the table this way, it is clear that we can completely fill the lookup table for rows 0 ≤ row < len(A) in time len(A) * total, since filling in each element takes constant time. And our final answer is just checking whether (len(A) - 1, sum) is True in the table.
Your program is doing the exact same thing, but calculating the value of entries of lookup as they are needed.
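The row-by-row fill described above can be written out directly. A bottom-up sketch (my own code, not from the original post; names are illustrative):

```python
def subset_sum_table(A, k):
    n = len(A)
    # reach[r][i] is True if some subset of A[:r+1] sums to exactly i
    reach = [[False] * (k + 1) for _ in range(n)]
    # Row 0: only the empty subset (sum 0) and the subset {A[0]} are reachable.
    reach[0][0] = True
    if A[0] <= k:
        reach[0][A[0]] = True
    for r in range(1, n):
        for i in range(k + 1):
            # Either skip A[r], or use it (if it fits).
            reach[r][i] = reach[r - 1][i] or (i >= A[r] and reach[r - 1][i - A[r]])
    return reach[n - 1][k]

print(subset_sum_table([7, 3, 2, 5, 8], 14))  # True (7 + 2 + 5)
```

Each cell is filled in constant time, so the total work is O(len(A) * k), matching the claimed O(n * sum).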
Sorry for submitting two answers. I think I came up with a slightly simpler explanation.
Take your code and imagine putting the three lines inside if key not in lookup: into a separate function, calculateLookup(A, n, k, lookup). I'm going to define "the cost of calling calculateLookup for n and k", for a specific value of n and k, to be the total time spent in the call to calculateLookup(A, n, k, lookup), excluding any recursive calls to calculateLookup.
The key insight is that as defined above, the cost of calling calculateLookup() for any n and k is O(1). Since we are excluding recursive calls in the cost, and there are no for loops, the cost of calculateLookup is the cost of just executing a few tests.
The entire algorithm does a fixed amount of work, calls calculateLookup, and then does a small amount of work. Hence the time spent in our code comes down to asking: how many times do we call calculateLookup?
Now we're back to the previous answer. Because of the lookup table, every call to calculateLookup is made with a different value of (n, k). We also know that we check the bounds of n and k before each call to calculateLookup, so 1 ≤ k ≤ sum and 0 ≤ n ≤ len(A). So calculateLookup is called at most (len(A) * sum) times.
In general, for these algorithms that use memoization/cacheing, the easiest thing to do is to separately calculate and then sum:
How long things take assuming all values you need are cached.
How long it takes to fill the cache.
The algorithm you presented is just filling up the lookup cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.
The code would be slightly faster with
lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)
Doesn't change the O() of the code in the worst case, but can avoid some unnecessary calculations.
I have a problem and I've been struggling with my solution time and space complexity:
Given an array of integers A (possibly containing duplicates) and integers min, low, high.
Find the total number of combinations of items in A such that:
low <= A[i] <= high for every chosen element;
Each combination has at least min numbers.
Numbers in one combination can repeat, since they're considered unique in A, but combinations cannot be duplicates. E.g. for [1,1,2]: the combinations [1,1], [1,2], [1,1,2] are OK, but [1,1], [1,1], [1,2], [2,1] ... are not.
Example: A=[4, 6, 3, 13, 5, 10], min = 2, low = 3, high = 5
There are 4 ways to combine valid integers in A: [4,3],[4,5],[4,3,5],[3,5]
Here's my solution and it works:
class Solution:
    def __init__(self):
        pass

    def get_result(self, arr, min_size, low, high):
        return self._count_ways(arr, min_size, low, high, 0, 0)

    def _count_ways(self, arr, min_size, low, high, idx, comb_size):
        if idx == len(arr):
            return 0
        count = 0
        for i in range(idx, len(arr)):
            if arr[i] >= low and arr[i] <= high:
                comb_size += 1
                if comb_size >= min_size:
                    count += 1
                count += self._count_ways(arr, min_size, low, high, i + 1, comb_size)
                comb_size -= 1
        return count
I use backtracking so:
Time: O(n!), because for every single integer I check each and every remaining one in the worst case, i.e. when all integers can form combinations.
Space: O(n), since I need at most n calls on the call stack and I only use 2 variables to keep track of my combinations.
Is my analysis correct?
Also, a bit out of the scope but: Should I do some kind of memoization to improve it?
If I understand your requirements correctly, your algorithm is far too complicated. You can do it as follows:
Compute an array B containing all elements of A between low and high.
Return the sum of Choose(B.length, k) for k = min .. B.length, where Choose(n, k) is n(n-1)...(n-k+1)/k!.
Time and space complexities are O(n) if you compute the numerators/denominators of the Choose function incrementally (e.g. if you have already computed 5*4*3, you only need one multiplication to compute 5*4*3*2, etc.).
In your example, you would get B = [4, 3, 5], so B.length = 3, and the result is
Choose(3, 2) + Choose(3, 3)
= (3 * 2)/(2 * 1) + (3 * 2 * 1)/(3 * 2 * 1)
= 3 + 1
= 4
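A direct translation of the two steps, using math.comb (Python 3.8+) for the binomial coefficients (a quick sketch of the approach, not the answerer's code; the function name is mine):

```python
from math import comb

def count_combinations(a, min_size, low, high):
    # Step 1: count the elements of A that lie in [low, high].
    m = sum(low <= x <= high for x in a)
    # Step 2: sum Choose(m, k) for k = min_size .. m.
    return sum(comb(m, k) for k in range(min_size, m + 1))

print(count_combinations([4, 6, 3, 13, 5, 10], 2, 3, 5))  # 4
```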
Your analysis of the time complexity isn't quite right.
I understand where you're getting O(n!): the for i in range(idx, len(arr)): loop decreases in length with every recursive call, so it seems like you're doing n*(n-1)*(n-2)*....
However, the recursive calls from a loop of length m do not all contain a loop of size m-1. Suppose your outermost call has 3 elements. The loop iterates through 3 possible values, each spawning a new call. The first such call has a loop over 2 values, the next call iterates over only 1 value, and the last immediately hits your base case and stops. So this level spawns 2 + 1 + 0 further calls, not 2 * 2 * 2.
A call to _count_ways with an array of size n takes twice as long as a call with size n-1. To see this, consider that the first decision in the call of size n is whether to choose the first element or not. First we choose that first element, which leads to a recursive call with size n-1. Second we do not choose that first element, which leaves n-1 elements to iterate over, so it's as if we had a second recursive call with size n-1.
Each increase in n increase time complexity by a factor of 2, so the time complexity of your solution is O(2^n). This makes sense: you're checking every combination, and there are 2^n combinations in a set of size n.
However, as you're only trying to count the combinations and not do something with each one, this is highly inefficient. See Mo B.'s answer for a better solution.
I have a list of values [6,1,1,5,2] and a value k = 10. I want to find the maximum sum of values from the list that is less than or equal to k, return the value and the numbers used. In this case the output would be: 10, [6,1,1,2].
I was using this code from GeeksForGeeks as an example but it doesn't work correctly (in this case, the code's result is 9).
The values do not need to be contiguous - they can be in any order.
def maxsum(arr, n, sum):
    curr_sum = arr[0]
    max_sum = 0
    start = 0
    for i in range(1, n):
        if curr_sum <= sum:
            max_sum = max(max_sum, curr_sum)
        while curr_sum + arr[i] > sum and start < i:
            curr_sum -= arr[start]
            start += 1
        curr_sum += arr[i]
    if curr_sum <= sum:
        max_sum = max(max_sum, curr_sum)
    return max_sum

if __name__ == '__main__':
    arr = [6, 1, 1, 5, 2]
    n = len(arr)
    sum = 10
    print(maxsum(arr, n, sum))
I also haven't figured out how to output the values that are used for the sum as a list.
This problem is at least as hard as the well-studied subset sum problem, which is NP-complete. In particular, any algorithm which solves your problem can be used to solve the subset sum problem, by finding the maximum sum <= k and then outputting True if the sum equals k, or False if the sum is less than k.
This means your problem is NP-hard, and there is no known algorithm which solves it in polynomial time. Your algorithm's running time is linear in the length of the input array, so it cannot correctly solve the problem, and no similar algorithm can correctly solve the problem.
One approach that can work is a backtracking search - for each element, try including it in the sum, then backtrack and try not including it in the sum. This will take exponential time in the length of the input array.
If your array elements are always integers, another option is dynamic programming; there is a standard dynamic programming algorithm which solves the integer subset sum problem in pseudopolynomial time, which could easily be adapted to solve your form of the problem.
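One way that adaptation could look, assuming the values are non-negative integers (a sketch of the standard pseudopolynomial DP, not code from the question; the function name is mine):

```python
def max_subset_sum_leq(values, k):
    # reachable[s] is True if some subset of the values sums to exactly s.
    reachable = [False] * (k + 1)
    reachable[0] = True  # the empty subset
    for v in values:
        # Iterate downwards so each value is used at most once.
        for s in range(k, v - 1, -1):
            if reachable[s - v]:
                reachable[s] = True
    # The answer is the largest reachable sum not exceeding k.
    return max(s for s in range(k + 1) if reachable[s])

print(max_subset_sum_leq([6, 1, 1, 5, 2], 10))  # 10
```

Recovering which values were used can be done by additionally storing, for each reachable sum, the value that last made it reachable, then walking backwards from the answer.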
Here's a solution using itertools.combinations. It's fast enough for small lists, but slows down significantly if you have a large sum and large list of values.
from itertools import combinations

def find_combo(values, k):
    for num_sum in range(k, 0, -1):
        for quant in range(1, len(values) + 1):
            for combo in combinations(values, quant):
                if sum(combo) == num_sum:
                    return combo

values = [6, 1, 1, 5, 2]
k = 10
answer = find_combo(values, k)
print(answer, sum(answer))
This solution works for any values in a list and any k, as long as the number of values needed in the solution sum doesn't become large.
The solution presented by user10987432 has a flaw that this function avoids: it greedily accepts values as long as they keep the sum at or below k. In that solution, the values are ordered from largest to smallest, then iterated through and added to the running sum whenever they don't push it above k. However, a simple example shows this to be inaccurate:
values = [7, 5, 4, 1], k = 10
With that approach, the sum begins at 0, goes up to 7 with the first item, and finishes at 8 after reaching the last index. The correct solution, however, is 5 + 4 + 1 = 10.
I have an interesting problem. Given two sorted arrays:
a with n elements, b with n-1 elements.
b has all the elements of a, except that one element is missing.
How to find that element in O(log n) time?
I have tried this code:
def lostElements2(a, b):
    if len(a) < len(b):
        a, b = b, a
    l, r = 0, len(a) - 1
    while l < r:
        m = l + (r - l) // 2
        if a[m] == b[m]:
            l = m + 1
        else:
            r = m - 1
    return a[r]

print(lostElements2([-1, 0, 4, 5, 7, 9], [-1, 0, 4, 5, 9]))
I am not getting what I should return from the function: should it be a[l] or a[r]?
I understand how the logic inside the function should work: if the mid values of both arrays match, then b up to the mid point is the same as a, and hence the missing element must be to the right of mid.
But I am not able to turn that into a final solution: when should the loop stop, and what should be returned? How is it guaranteed that a[l] or a[r] is indeed the missing element?
The point of l and r should be that l is always a position where the lists are equal, while r is always a position where they differ, i.e.
a[l] == b[l] and a[r] != b[r]
The only mistake in the code is updating r to m - 1 instead of m. If we know that a[m] != b[m], we can safely set r = m. But setting it to m - 1 risks getting a[r] == b[r], which breaks the algorithm.
def lostElements2(a, b):
    if len(a) < len(b):
        a, b = b, a
    if a[0] != b[0]:
        return a[0]
    l, r = 0, len(a) - 1
    while l < r:
        m = l + (r - l) // 2
        if a[m] == b[m]:
            l = m + 1
        else:
            r = m  # The only change
    return a[r]
(As @btilly points out, this algorithm fails if we allow for repeated values.)
Edit from @btilly:
To fix that potential flaw: when the values at the probe are equal, search for the boundaries of the run of equal values. To do that, walk forward in steps of size 1, 2, 4, 8 and so on until the value switches, then binary search; do the same walking backwards. Now look for a difference at each edge of the run.
The effort required for that search is O(log(k)), where k is the length of the run of repeated values. So we are now replacing O(log(n)) lookups with searches. If there is an upper bound K on the length of any such run, the overall running time is O(log(n)log(K)). That makes the worst case running time O(log(n)^2), and if K is close to sqrt(n), it is easy to actually hit that worst case.
I claimed in a comment that if at most K elements are repeated more than K times, then the running time is O(log(n)log(K)). On further analysis, that claim is wrong. If K = log(n) and the log(n) runs of length sqrt(n) are placed so as to hit all the choices of the search, then you get running time O(log(n)^2), not O(log(n)log(log(n))).
However if at most log(K) elements are repeated more than K times, then you DO get a running time of O(log(n)log(K)). Which should be good enough for most cases. :-)
The principle of this problem is simple, the details are hard.
You have arranged that array a is the longer one. Good, that simplifies life. Now you need to return the value of a at the first position where the value of a differs from the value of b.
Now you need to be sure to deal with the following edge cases.
The differing value is the last (i.e. at a position where only array a has a value).
The differing value is the very first. (Binary search algorithms are easy to screw up for this case.)
There is a run of equal values. That is, a = [1, 1, 2, 2, 2, 2, 3] while b = [1, 2, 2, 2, 2, 3]. When you land in the middle, the fact that the values match can mislead you!
Good luck!
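The third pitfall can be seen with a quick check (my own snippet):

```python
# Edge case 3 above: a run of equal values makes the midpoint comparison lie.
a = [1, 1, 2, 2, 2, 2, 3]
b = [1, 2, 2, 2, 2, 3]
m = 3  # a midpoint probe
print(a[m] == b[m])  # True, even though a and b already differ at index 1
```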
Your code does not handle the case where the missing element is at index m itself. The if/else clause that follows always moves the bounds of where the missing element can be so that they exclude m.
You could fix this by including an additional check:
if a[m] == b[m]:
    l = m + 1
elif m == 0 or a[m-1] == b[m-1]:
    return a[m]
else:
    r = m - 1
An alternative would be to store the last value of m:
last_m = 0
...
else:
    last_m = m
    r = m - 1
...
return a[last_m]
Which would cause it to return the last time a mismatch was detected.
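Putting the last_m variant together into one runnable function (a sketch assembled from the fragments above; the bounds handling is mine, and it assumes distinct values, as noted earlier):

```python
def lost_elements(a, b):
    # Make a the longer array.
    if len(a) < len(b):
        a, b = b, a
    l, r = 0, len(a) - 1
    last_m = len(a) - 1  # if no mismatch is ever probed, the extra element is the last
    while l <= r:
        m = l + (r - l) // 2
        # Equal here means the missing element lies strictly to the right of m.
        if m < len(b) and a[m] == b[m]:
            l = m + 1
        else:
            # Mismatch (or past the end of b): remember it and search left.
            last_m = m
            r = m - 1
    return a[last_m]

print(lost_elements([-1, 0, 4, 5, 7, 9], [-1, 0, 4, 5, 9]))  # 7
```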
Given below is the problem statement and the solution. I am not able to grasp the logic behind the solution.
Problem Statement:
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
Sample Input: [3 4 1 4 1]
Output: 1
The Solution for the problem posted on leetcode is:
class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        low = 1
        high = len(nums) - 1
        while low < high:
            mid = low + (high - low) // 2  # integer division
            count = 0
            for i in nums:
                if i <= mid:
                    count += 1
            if count <= mid:
                low = mid + 1
            else:
                high = mid
        return low
Explanation for the above code (as per the author):
This solution is based on binary search.
At first the search space is the numbers between 1 and n. Each time I select a number mid (the one in the middle) and count all the numbers less than or equal to mid. Then if the count is more than mid, the search space will be [1, mid], otherwise [mid+1, n]. I do this until the search space is only one number.
Let's say n=10 and I select mid=5. Then I count all the numbers in the array which are less than or equal to mid. If there are more than 5 numbers that are less than or equal to 5, then by the Pigeonhole Principle (https://en.wikipedia.org/wiki/Pigeonhole_principle) one of them has occurred more than once. So I shrink the search space from [1, 10] to [1, 5]. Otherwise the duplicate number is in the second half, so for the next step the search space would be [6, 10].
Doubt: In the above solution, when count <= mid, why are we changing low to mid + 1, and otherwise changing high to mid? What's the logic behind it?
I am unable to understand the logic behind this algorithm
Related Link:
https://discuss.leetcode.com/topic/25580/two-solutions-with-explanation-o-nlog-n-and-o-n-time-o-1-space-without-changing-the-input-array
Well it's a binary search. You are cutting the search space in half and repeating.
Think about it this way: you have a list of 101 items, and you know it contains the values 1-100. Take the halfway point, 50. Count how many items are less than or equal to 50. If there are more than 50 such items, then the duplicate is in the range 1-50; otherwise the duplicate is in the range 51-100.
Binary search simply keeps cutting the range in half: looking at 1-50, taking midpoint 25, and repeating.
The crucial part of this algorithm which I believe is causing confusion is the for loop. I'll attempt to explain it. Firstly note that there is no usage of indices anywhere in this algorithm - just inspect the code and you'll see that index references do not exist. Secondly, note that the algorithm loops through the entire collection for each iteration of the while loop.
Let me make the following change, then consider the value of inspection_count after every iteration of the while loop.
inspection_count = 0
for i in nums:
    inspection_count += 1
    if i <= mid:
        count += 1
Of course inspection_count will be equal to len(nums). The for loop iterates over the entire collection, and for every element checks whether it is within the candidate range (of values, not indices).
The duplication test itself is simple and elegant - as others pointed out, this is the pigeonhole principle. Given a collection of n values where every value is in the range {p..q}, if q - p + 1 < n then there must be duplicates in the collection. Think of some easy cases -
p = 0, q = 5, n = 10
"I have ten values, and every value is between zero and five.
At least one of these values must be duplicated."
We can generalize this, but a more relevant example is
p = 50, q = 98, n = 50
"I have a collection of fifty values, and every value is between fifty and ninety-eight.
There are only forty-nine *distinct* values available to my collection.
Therefore there is a duplicate."
The logic behind setting low = mid+1 or high = mid is essentially what makes it a solution based on binary search. The search space is divided in half and the while loop is searching only in the lower half (high = mid) or the higher half (low = mid+1) on its next iteration.
So I shrink the search space from [1 10] to [1 5]. Otherwise the duplicate number is in the second half so for the next step the search space would be [6 10].
This is the part of the explanation regarding your question.
Let's say you have 10 numbers.
a = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
Then mid = 5,
and the number of elements that are less than or equal to 5 is 6 (1, 2, 2, 3, 4, 5).
Now count = 6, which is greater than mid. This implies that there is at least one duplicate in the first half, so the code shrinks the search space from [1-10] to [1-5], and so on.
Otherwise a duplicate occurs in the second half, so the search space would be [6-10].
Do tell me if you have doubts.
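The counting step in this walkthrough can be checked directly (a quick snippet of mine):

```python
a = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
mid = 5
# Count how many elements are <= mid.
count = sum(1 for x in a if x <= mid)
print(count)  # 6: more than mid, so a duplicate must lie in [1, 5]
```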
public static void findDuplicateInArrayTest() {
    int[] arr = {1, 7, 7, 3, 6, 7, 2, 4};
    int dup = findDuplicateInArray(arr, 0, arr.length - 1);
    System.out.println("duplicate: " + dup);
}

public static int findDuplicateInArray(int[] arr, int l, int r) {
    while (l != r) {
        int m = (l + r) / 2;
        int count = 0;
        for (int i = 0; i < arr.length; i++)
            if (arr[i] <= m)
                count++;
        if (count > m)
            r = m;
        else
            l = m + 1;
    }
    return l;
}