sliding window algorithm - condition for start < n - python

Below is a sliding window solution for finding the minimum length subarray with a sum greater than x from geeksforgeeks (https://www.geeksforgeeks.org/minimum-length-subarray-sum-greater-given-value/)
# O(n) solution for finding smallest
# subarray with sum greater than x
# Returns length of smallest subarray
# with sum greater than x. If there
# is no subarray with given sum, then
# returns n + 1
def smallestSubWithSum(arr, n, x):
    # Initialize current sum and minimum length
    curr_sum = 0
    min_len = n + 1
    # Initialize starting and ending indexes
    start = 0
    end = 0
    while (end < n):
        # Keep adding array elements while current
        # sum is smaller than x
        while (curr_sum <= x and end < n):
            curr_sum += arr[end]
            end+= 1
        # If current sum becomes greater than x.
        while (curr_sum > x and start < n):
            # Update minimum length if needed
            if (end - start < min_len):
                min_len = end - start
            # remove starting elements
            curr_sum -= arr[start]
            start+= 1
    return min_len
I have tested that this solution can work, but I'm confused by why in the last while loop, start is checked for being less than n - wouldn't you want it to be less than end, otherwise start can go beyond end, which doesn't really make sense to me?

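Since curr_sum is the sum of the elements from start up to (but not including) end, it shrinks to zero by the time start reaches end, provided the elements are non-negative. For x >= 0 the condition curr_sum > x therefore fails before start can pass end, and the while loop exits. This also implies that the algorithm will probably not work with negative numbers in the array.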
Personally I would have written it a little differently. Here's an example with the negative number condition taken into account:
def minSub(arr,x):
    subTotal = 0
    size,minSize = 0,len(arr)+1
    start = iter(arr)
    for value in arr:
        subTotal += value
        size += 1
        while subTotal not in range(0,x+1):
            if subTotal>x :
                minSize = min(minSize,size)
            subTotal -= next(start,0)
            size -= 1
    return minSize
output:
arr = [1, 4, 45, 6, 0, 19]
x = 51
print(minSub(arr,x)) #3
arr = [-8, 1, 4, -1, 3, -6]
x = 6
print(minSub(arr,x)) # 4
arr = [1, 11, 100, 1, 0, 200, 3, 2, 1, 250]
x = 280
print(minSub(arr,x)) # 4
arr = [1, 10, 5, 2, 7]
x = 9
print(minSub(arr,x)) # 1
arr = [1, 2, 4]
x = 8
print(minSub(arr,x)) # 4
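To check the claim that start never overtakes end, here is a minimal sketch of the same sliding window with an assertion inside the shrinking loop. The function name smallest_sub_with_sum is made up for this example, and the sketch assumes non-negative elements and x >= 0; it is not the geeksforgeeks code, only a variant written to make the invariant visible.

```python
def smallest_sub_with_sum(arr, x):
    """Length of the shortest subarray with sum > x, or len(arr) + 1 if none.

    Assumes every element is non-negative and x >= 0.
    """
    n = len(arr)
    curr_sum = 0
    min_len = n + 1
    start = 0
    for end in range(n):
        curr_sum += arr[end]  # window is arr[start:end + 1]
        while curr_sum > x:
            # An empty window sums to 0 <= x, so the window is non-empty here
            assert start <= end
            min_len = min(min_len, end - start + 1)
            curr_sum -= arr[start]
            start += 1
    return min_len


print(smallest_sub_with_sum([1, 4, 45, 6, 0, 19], 51))  # 3
print(smallest_sub_with_sum([1, 2, 4], 8))              # 4, i.e. no such subarray
```

Because the inner loop stops as soon as curr_sum <= x, no separate bound check on start is needed under these assumptions.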

Related

Cut a sequence of length N into subsequences such that the sum of each subarray is less than M and the cut minimizes the sum of max of each part

Given an integer sequence a_n of length N, cut the sequence into several parts such that each part is a consecutive subsequence of the original sequence.
Every part must satisfy the following:
The sum of each part is not greater than a given integer M
Find a cut that minimizes the sum of the maximum integer of each part
For example:
input : n = 8, m = 17 arr = [2, 2, 2, 8, 1, 8, 2, 1]
output = 12
explanation: subarrays = [2, 2, 2], [8, 1, 8], [2, 1]
sum = 2 + 8 + 2 = 12
0 <= N <= 100000
each integer is between 0 and 1000000
If no such cut exists, return -1
I believe this is a dynamic programming question, but I am not sure how to approach this.
I am relatively new to coding, and came across this question in an interview which I could not do. I would like to know how to solve it for future reference.
Here's what I tried:
n = 8
m = 17
arr = [2, 2, 2, 8, 1, 8, 2, 1]
biggest_sum, i = 0, 0
while (i < len(arr)):
    seq_sum = 0
    biggest_in_seq = -1
    while (seq_sum <= m and i < len(arr)):
        if (seq_sum + arr[i] <= m ):
            seq_sum += arr[i]
            if (arr[i] > biggest_in_seq):
                biggest_in_seq = arr[i]
            i += 1
        else:
            break
    biggest_sum += biggest_in_seq
if (biggest_sum == 0):
    print(-1)
else:
    print(biggest_sum)
This gives the result 16, and the subsequences are: [[2, 2, 2, 8, 1], [8, 2, 1]]
The problem is that you are filling every sequence from left to right up to the maximum allowed sum m. You should instead evaluate different sequence lengths and minimize the result, which in the example means that the two 8 values must end up in the same sequence.
A possible solution could be:
n = 8
m = 17
arr = [2, 2, 2, 8, 1, 8, 2, 1]
def find_solution(arr, m, n):
    if max(arr)>m:
        return -1
    optimal_seq_length = [0] * n
    optimal_max_sum = [0] * n
    for seq_start in reversed(range(n)):
        seq_len = 0
        seq_sum = 0
        seq_max = 0
        while True:
            seq_len += 1
            seq_end = seq_start + seq_len
            if seq_end > n:
                break
            last_value_in_seq = arr[seq_end - 1]
            seq_sum += last_value_in_seq
            if seq_sum > m:
                break
            seq_max = max(seq_max, last_value_in_seq)
            max_sum_from_next_seq_on = 0 if seq_end >= n else optimal_max_sum[seq_end]
            max_sum = max_sum_from_next_seq_on + seq_max
            if seq_len == 1 or max_sum < optimal_max_sum[seq_start]:
                optimal_max_sum[seq_start] = max_sum
                optimal_seq_length[seq_start] = seq_len
    # create solution list of lists
    solution = []
    seg_start = 0
    while seg_start < n:
        seg_length = optimal_seq_length[seg_start]
        solution.append(arr[seg_start:seg_start+seg_length])
        seg_start += seg_length
    return solution
print(find_solution(arr, m, n))
# [[2, 2, 2], [8, 1, 8], [2, 1]]
Key aspects of my proposal:
- Start from a small array (only the last element), and make the problem array grow to the front:
    [1]
    [2, 1]
    [8, 2, 1]
    etc.
- For each of the above problem arrays, store:
    - the optimal sum of the maximum of each sequence (optimal_max_sum), which is the value to be minimized
    - the length of the first sequence (optimal_seq_length) that achieves this optimal value
- Do this as follows: for each allowed sequence length starting at the beginning of the problem array:
    - calculate the new max_sum value by adding the maximum of this sequence to the previously calculated optimal_max_sum for the part after this sequence
    - keep the smallest max_sum, storing it in optimal_max_sum and the associated seq_len in optimal_seq_length
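If only the minimized value is needed (12 in the example) rather than the cut itself, the same recurrence can be sketched more compactly. The name min_sum_of_maxima and the best table are made up for this illustration; the sketch assumes non-negative integers, as stated in the question, and runs in O(N^2) in the worst case, so it may be too slow for the largest inputs.

```python
def min_sum_of_maxima(arr, m):
    """Minimum total of per-part maxima, or -1 if some element exceeds m.

    best[i] is the optimal value for the suffix arr[i:]; best[len(arr)] = 0.
    Assumes all elements are non-negative.
    """
    n = len(arr)
    if any(value > m for value in arr):
        return -1
    best = [float("inf")] * (n + 1)
    best[n] = 0
    for start in range(n - 1, -1, -1):
        part_sum = 0
        part_max = 0
        # Try every allowed first part arr[start:end + 1]
        for end in range(start, n):
            part_sum += arr[end]
            if part_sum > m:
                break
            part_max = max(part_max, arr[end])
            best[start] = min(best[start], part_max + best[end + 1])
    return best[0]


print(min_sum_of_maxima([2, 2, 2, 8, 1, 8, 2, 1], 17))  # 12
```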

Converting repetitive if statements into a loop

I have this code:
#!/usr/bin/python3
def contract(e, i, c, n):
    l = len(e)
    grid = [[0 for i in range(i + 1)] for x in range(l)]
    for num1, row1 in enumerate(grid):
        row1[0] = e[num1] #add exponents
    for num2, row2 in enumerate(grid):
        if 0 <= num2 < n[0]:
            grid[num2][1] = c[num2]
        if n[0] <= num2 < n[0] + n[1]:
            grid[num2][2] = c[num2]
        if n[0] + n[1] <= num2 < n[0] + n[1] + n[2]:
            grid[num2][3] = c[num2]
    for g in grid:
        print(g)
e = [0, 1, 2, 3]
i = 3
c = [4, 5, 6, 7]
n = [1, 2, 1]
contract(e, i, c, n)
The idea of this code is that I have a 2 dimensional grid that has dimensions len(e) x (i + 1). The first column contains exponents e. The rest of the columns should contain coefficients c in such a way that n determines the positions of the coefficients in the grid. For example, since n[0] = 1, column 1, row 0 in the grid contains number 4. The next element in n is 2, so the next column in the grid (column 2) should contain 2 numbers, meaning numbers 5 and 6 in rows below the row that I used previously (meaning rows 1 and 2 because row 0 is already used). n[2] = 1 so grid[3][3] = 7, etc.
I implemented this with repetitive if-statements and the code works fine, the output is as it should be:
[0, 4, 0, 0]
[1, 0, 5, 0]
[2, 0, 6, 0]
[3, 0, 0, 7]
However, I would like to make an extensible program that can do this for any number of coefficients and exponents. How can I convert those repetitive if statements to a single loop?
I would convert it into a for loop that keeps track of the sum of the elements seen so far, adjusting the corresponding element if the inequality holds for that iteration:
for num2, row2 in enumerate(grid):
    total = 0
    for n_idx, n_elem in enumerate(n):
        if total <= num2 < total + n_elem:
            grid[num2][n_idx + 1] = c[num2]
        total += n_elem
I would advise against using sum() in this loop, as it recomputes the sum from scratch on each iteration, which isn't very efficient.
Use a loop that sums successive slices of the n list.
for num2, row2 in enumerate(grid):
    for idx in range(len(n)):
        if sum(n[:idx]) <= num2 < sum(n[:idx+1]):
            grid[num2][idx+1] = c[num2]
This is a direct mapping of the code you wrote to a loop, and reasonable if n doesn't get too large. BrokenBenchmark's answer is optimized to take advantage of the fact that the sum of each slice is the sum of the previous slice plus the current element.
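Another way to look at it, sketched below, is to iterate over n instead of over the rows: each entry of n says how many consecutive rows belong to the next column, so a single row counter is enough. This is a variation for illustration rather than either answer's exact code. It returns the grid instead of printing it, and it assumes sum(n) <= len(e) and len(n) <= i.

```python
def contract(e, i, c, n):
    """Build a len(e) x (i + 1) grid: exponents in column 0, coefficients placed per n."""
    grid = [[0] * (i + 1) for _ in range(len(e))]
    for row, exponent in enumerate(e):
        grid[row][0] = exponent
    row = 0
    # Column 1 gets the first n[0] coefficients, column 2 the next n[1], and so on
    for col, count in enumerate(n, start=1):
        for _ in range(count):
            grid[row][col] = c[row]
            row += 1
    return grid


for g in contract([0, 1, 2, 3], 3, [4, 5, 6, 7], [1, 2, 1]):
    print(g)
# [0, 4, 0, 0]
# [1, 0, 5, 0]
# [2, 0, 6, 0]
# [3, 0, 0, 7]
```

Each row is visited exactly once, so no running total or repeated sum() call is needed.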

Get Indices To Split NumPy Array

Let's say I have a NumPy array:
x = np.array([3, 9, 2, 1, 5, 4, 7, 7, 8, 6])
If I sum up this array, I get 52. What I need is a way to split it up starting from left to right into roughly n chunks where n is chosen by the user. Essentially, the splits occur in a greedy fashion. So, for some number of chunks n, the first n - 1 chunks must each sum up to at least 52/n and they must be consecutive indices taken from left to right.
So, if n = 2 then the first chunk would consist of the first 7 elements:
chunk[0] = x[:7] # [3, 9, 2, 1, 5, 4, 7], sum = 31
chunk[1] = x[7:] # [7, 8, 6], sum = 21
Notice that the first chunk wouldn't consist of the first 6 elements only since the sum would be 24 which is less than 52/2 = 26. Also, notice that the number of elements in each chunk is allowed to vary as long as the sum criteria is met. Finally, it is perfectly fine for the last chunk to not be close to 52/2 = 26 since the other chunk(s) may take more.
However, the output that I need is a two column array that contains the start index in the first column and the (exclusive) stop index in the second column:
[[0, 7],
[7, 10]]
If n = 4, then the first 3 chunks need to each sum up to at least 52/4 = 13 and would look like this:
chunk[0] = x[:3] # [3, 9, 2], sum = 14
chunk[1] = x[3:7] # [1, 5, 4, 7], sum = 17
chunk[2] = x[7:9] # [7, 8], sum = 15
chunk[3] = x[9:] # [6], sum = 6
And the output that I need would be:
[[0, 3],
[3, 7],
[7, 9],
[9, 10]]
So, one naive approach using for loops might be:
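# Wrapped in a function (the name is chosen here) so that `return` is valid
def naive_ranges(x, n_chunks):
    ranges = np.zeros((n_chunks, 2), np.int64)
    target = x.sum() / n_chunks  # Minimum sum of each of the first n_chunks - 1 chunks
    ranges_idx = 0
    range_start_idx = 0
    running_sum = 0  # Avoids shadowing the builtin sum()
    for i in range(x.shape[0]):
        running_sum += x[i]
        # Close a chunk once it sums to at least the target; the last row is reserved
        if running_sum >= target and ranges_idx < n_chunks - 1:
            ranges[ranges_idx, 0] = range_start_idx
            ranges[ranges_idx, 1] = i + 1  # Exclusive stop index
            # Reset and Update
            range_start_idx = i + 1
            ranges_idx += 1
            running_sum = 0
    # Handle final range outside of for loop
    ranges[ranges_idx, 0] = range_start_idx
    ranges[ranges_idx, 1] = x.shape[0]
    # Any unused rows become empty ranges at the end of the array
    ranges[ranges_idx + 1:] = x.shape[0]
    return ranges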
I am looking for a nicer vectorized solution.
I found inspiration in a similar question that was answered:
def func(x, n):
    out = np.zeros((n, 2), np.int64)
    cum_arr = x.cumsum() / x.sum()
    idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
    out[1:, 0] = idx # Fill the first column with start indices
    out[:-1, 1] = idx # Fill the second column with exclusive stop indices
    out[-1, 1] = x.shape[0] # Handle the stop index for the final chunk
    return out
Update
To cover the pathological case, we need to be a little more precise and do something like:
def func(x, n, truncate=False):
    out = np.zeros((n, 2), np.int64)
    cum_arr = x.cumsum() / x.sum()
    idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
    out[1:, 0] = idx # Fill the first column with start indices
    out[:-1, 1] = idx # Fill the second column with exclusive stop indices
    out[-1, 1] = x.shape[0] # Handle the stop index for the final chunk
    # Handle pathological case
    diff_idx = np.diff(idx)
    if np.any(diff_idx == 0):
        row_truncation_idx = np.argmin(diff_idx) + 2
        out[row_truncation_idx:, 0] = x.shape[0]
        out[row_truncation_idx-1:, 1] = x.shape[0]
        if truncate:
            out = out[:row_truncation_idx]
    return out
Here is a solution that doesn't iterate over all elements:
def fun2(array, n):
    min_sum = np.sum(array) / n
    cumsum = np.cumsum(array)
    i = -1
    count = min_sum
    out = []
    while i < len(array)-1:
        j = np.searchsorted(cumsum, count)
        # Clamp the exclusive stop index: searchsorted returns len(array)
        # when the remaining elements never reach the target sum
        out.append([i+1, min(j+1, len(array))])
        i = j
        if i < len(array):
            count = cumsum[i] + min_sum
    return np.array(out)
For the two test cases it produces the results you expected. HTH
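For comparing the vectorized versions against something simple, a pure-Python reference can be handy. The sketch below uses only the standard library (itertools.accumulate and bisect_left) and follows the greedy rule from the question. The name chunk_ranges is made up, and the sketch assumes a non-empty input of non-negative values so that the cumulative sums are sorted. If fewer than n_chunks chunks can be formed, it returns fewer rows.

```python
from bisect import bisect_left
from itertools import accumulate


def chunk_ranges(values, n_chunks):
    """Greedy [start, stop) ranges; every range but the last sums to at least total / n_chunks."""
    cumsum = list(accumulate(values))
    target = cumsum[-1] / n_chunks
    ranges = []
    start = 0
    consumed = 0  # Sum of everything before `start`
    while start < len(values) and len(ranges) < n_chunks - 1:
        # First index at which the running sum of the current chunk reaches the target
        j = bisect_left(cumsum, consumed + target, lo=start)
        if j >= len(values):
            break
        ranges.append([start, j + 1])
        consumed = cumsum[j]
        start = j + 1
    ranges.append([start, len(values)])  # Final chunk takes whatever is left
    return ranges


x = [3, 9, 2, 1, 5, 4, 7, 7, 8, 6]
print(chunk_ranges(x, 2))  # [[0, 7], [7, 10]]
print(chunk_ranges(x, 4))  # [[0, 3], [3, 7], [7, 9], [9, 10]]
```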

How to count the number of items in several bins using loop in python? details showed in picture

The question details are shown in the picture. Thanks for your help.
Write a function histogram(values, dividers) that takes as argument a sequence of values and a sequence of bin dividers, and returns the histogram as a sequence of a suitable type (say, an array) with the counts in each bin. The number of bins is the number of dividers + 1; the first bin has no lower limit and the last bin has no upper limit. As in (a), elements that are equal to one of the dividers are counted in the bin below.
For example, suppose the sequence of values is the numbers 1,..,10 and the bin dividers are array(2, 5, 7); the histogram should be array(2, 3, 2, 3).
Here is my code
def histogram(values, dividers):
    count=0
    for element in values:
        index=0
        i=0
        count[i]=0
        while index < len(dividers) - 2:
            if element <= dividers[index]:
                i=dividers[index]
                count[i] += 1
                index=len(dividers)
            elif element > dividers[index] and element <= dividers[index+1]:
                i=dividers[index]
                count[i] += 1
                index= len(dividers)
            index += 1
    return count[i]
from bisect import bisect_left
# Using Python builtin to find where value is in dividers
# (this is O(log n) for each value; dividers must be sorted)
def histogram(values, dividers):
    count = [0]*(1+len(dividers))
    for element in values:
        i = bisect_left(dividers, element)
        count[i] += 1
    return count
values = list(range(1, 11)) # list from 1 through 10
bins = [2, 5, 7]
c = histogram(values, bins) # Result [2, 3, 2, 3]
Explanation of histogram
1. bisect_left finds the index at which the value would be inserted into dividers; that index is the value's bin
2. We update the count array according to this index. The count array has size
(1+len(bins)), to allow for values > bins[-1]
A simple implementation would be to prepare a list of counters of size len(dividers)+1.
Go through all numbers provided:
- if the current number is bigger than the largest bin divider, increment the last bin's counter
- else go through the dividers until the number is no longer bigger than the current divider, and increment that bin's counter by 1
This leads to:
def histogram(values, dividers):
    bins = [0 for _ in range(len(dividers)+1)]
    for num in values:
        if num > dividers[-1]:
            bins[-1] += 1
        else:
            k = 0
            while num > dividers[k]:
                k+=1
            bins[k] += 1
    return bins
print(histogram(range(20),[2,4,9]))
Output:
# counts
[3, 2, 5, 10]
Explanation
Dividers: [2,4,9]
Bins: [ 2 and less | 4 | 9 | 10 and more ]
Numbers: 0..19
0, 1, 2 -> not bigger than 9, smaller than or equal to 2
3, 4 -> not bigger than 9, smaller than or equal to 4
5, 6, 7, 8, 9 -> not bigger than 9, smaller than or equal to 9
10, 11, 12, 13, 14, 15, 16, 17, 18, 19 -> bigger than 9
This is a naive implementation, and there are faster ones using tree-like data structures. Consider dividers [5,6,7] and values [7,7,7,7,7,7]: each of the 6 values is tested against 3 dividers in the while loop (bigger than 5, bigger than 6, not bigger than 7), which is 18 comparisons in total.
More efficient algorithms are possible using better-suited data structures.
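As a sanity check on the two approaches discussed in this thread, the sketch below puts a linear scan and a binary search side by side and confirms that they agree on random data. The function names and the random test data are made up for this example; sorted dividers are assumed.

```python
import random
from bisect import bisect_left


def histogram_linear(values, dividers):
    """Scan the dividers for each value: O(len(dividers)) per value."""
    bins = [0] * (len(dividers) + 1)
    for num in values:
        k = 0
        while k < len(dividers) and num > dividers[k]:
            k += 1
        bins[k] += 1
    return bins


def histogram_bisect(values, dividers):
    """Binary search in the sorted dividers: O(log len(dividers)) per value."""
    bins = [0] * (len(dividers) + 1)
    for num in values:
        bins[bisect_left(dividers, num)] += 1
    return bins


random.seed(0)  # Fixed seed so the check is reproducible
dividers = sorted(random.sample(range(1000), 50))
values = [random.randrange(1000) for _ in range(10_000)]
assert histogram_linear(values, dividers) == histogram_bisect(values, dividers)
print(histogram_bisect(range(1, 11), [2, 5, 7]))  # [2, 3, 2, 3]
```

Both functions put a value equal to a divider into the bin below, matching the exercise statement.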

Largest Subset whose sum is less than equal to a given sum

A list is defined as follows: [1, 2, 3]
and the sub-lists of this are:
[1], [2], [3],
[1,2]
[1,3]
[2,3]
[1,2,3]
Given K, for example 3, the task is to find the largest length of a sub-list whose sum of elements is less than or equal to K.
I am aware of itertools in Python, but it results in a segmentation fault for larger lists. Is there any other efficient algorithm to achieve this? Any help would be appreciated.
My code is as follows:
from itertools import combinations
def maxLength(a, k):
    #print a,k
    l= []
    i = len(a)
    while(i>=0):
        lst= list(combinations(sorted(a),i))
        for j in lst:
            #print list(j)
            lst = list(j)
            #print sum(lst)
            sum1=0
            sum1 = sum(lst)
            if sum1<=k:
                return len(lst)
        i=i-1
You can use the dynamic programming solution that #Apy linked to. Here's a Python example:
def largest_subset(items, k):
    res = 0
    # We can form subset with value 0 from empty set,
    # items[0], items[0...1], items[0...2]
    arr = [[True] * (len(items) + 1)]
    for i in range(1, k + 1):
        # Subset with value i can't be formed from empty set
        cur = [False] * (len(items) + 1)
        for j, val in enumerate(items, 1):
            # cur[j] is True if we can form a set with value of i from
            # items[0...j-1]
            # There are two possibilities
            # - Set can be formed already without even considering item[j-1]
            # - There is a subset with value i - val formed from items[0...j-2]
            cur[j] = cur[j-1] or ((i >= val) and arr[i-val][j-1])
        if cur[-1]:
            # If subset with value of i can be formed store
            # it as current result
            res = i
        arr.append(cur)
    return res
ITEMS = [5, 4, 1]
for i in range(sum(ITEMS) + 1):
    print('{} -> {}'.format(i, largest_subset(ITEMS, i)))
Output:
0 -> 0
1 -> 1
2 -> 1
3 -> 1
4 -> 4
5 -> 5
6 -> 6
7 -> 6
8 -> 6
9 -> 9
10 -> 10
In above arr[i][j] is True if set with value of i can be chosen from items[0...j-1]. Naturally arr[0] contains only True values since empty set can be chosen. Similarly for all the successive rows the first cell is False since there can't be empty set with non-zero value.
For rest of the cells there are two options:
If there already is a subset with value of i even without considering item[j-1] the value is True
If there is a subset with value of i - items[j - 1] then we can add item to it and have a subset with value of i.
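The same reachability idea can be sketched without the full table by keeping one bit per reachable sum in a Python integer. This is a different encoding from the code above, shown only as an illustration; the name largest_subset_sum is made up, and non-negative integer items and k >= 0 are assumed. Like the table version, it returns the largest reachable sum not exceeding k, not a subset length.

```python
def largest_subset_sum(items, k):
    """Largest subset sum that does not exceed k (non-negative integer items)."""
    mask = (1 << (k + 1)) - 1  # Keep only the bits for sums 0..k
    reachable = 1              # Bit 0 set: the empty subset has sum 0
    for val in items:
        # Every reachable sum s also makes s + val reachable
        reachable |= (reachable << val) & mask
    return reachable.bit_length() - 1  # Index of the highest set bit


ITEMS = [5, 4, 1]
print([largest_subset_sum(ITEMS, i) for i in range(sum(ITEMS) + 1)])
# [0, 1, 1, 1, 4, 5, 6, 6, 6, 9, 10]
```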
As far as I can see (since you treat sub array as any items of the initial array) you can use greedy algorithm with O(N*log(N)) complexity (you have to sort the array):
1. Assign entire array to the sub array
2. If sum(sub array) <= k then stop and return sub array
3. Remove the maximum item from the sub array
4. goto 2
Example
[1, 2, 3, 5, 10, 25]
k = 12
Solution
sub array = [1, 2, 3, 5, 10, 25], sum = 46 > 12, remove 25
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, remove 10
sub array = [1, 2, 3, 5], sum = 11 <= 12, stop and return
As an alternative, you can start with an empty sub array and add items from minimum to maximum while the sum is less than or equal to k:
sub array = [], sum = 0 <= 12, add 1
sub array = [1], sum = 1 <= 12, add 2
sub array = [1, 2], sum = 3 <= 12, add 3
sub array = [1, 2, 3], sum = 6 <= 12, add 5
sub array = [1, 2, 3, 5], sum = 11 <= 12, add 10
sub array = [1, 2, 3, 5, 10], sum = 21 > 12, stop,
return prior one: [1, 2, 3, 5]
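The second variant (adding items from smallest to largest) can be sketched in a few lines. The function name max_length_greedy is made up for this example, and the sketch assumes non-negative items; with negative values the early break would no longer be safe.

```python
def max_length_greedy(items, k):
    """Largest number of items whose sum is <= k, assuming non-negative items."""
    count = 0
    total = 0
    for value in sorted(items):  # Smallest first; sorting dominates at O(N log N)
        if total + value > k:
            break
        total += value
        count += 1
    return count


print(max_length_greedy([1, 2, 3, 5, 10, 25], 12))  # 4, i.e. [1, 2, 3, 5]
```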
Look, for generating the power-set it takes O(2^n) time. It's pretty bad. You can instead use the dynamic programming approach.
Check in here for the algorithm.
http://www.geeksforgeeks.org/dynamic-programming-subset-sum-problem/
And yes, https://www.youtube.com/watch?v=s6FhG--P7z0 (Tushar explains everything well) :D
Assume everything is positive. (Handling negatives is a simple extension of this and is left to the reader as an exercise). There exists an O(n) algorithm for the described problem. Using the O(n) median select, we partition the array based on the median. We find the sum of the left side. If that is greater than k, then we cannot take all elements, we must thus recur on the left half to try to take a smaller set. Otherwise, we subtract the sum of the left half from k, then we recur on the right half to see how many more elements we can take.
Partitioning the array based on median select and recurring on only 1 of the halves yields a runtime of n+n/2 +n/4 +n/8.. which geometrically sums up to O(n).
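A rough sketch of this partition-and-recurse idea follows. Note one substitution: it picks a random pivot instead of running a deterministic O(n) median select, so the O(n) bound holds only in expectation. The function name is made up, and positive items are assumed, as in the answer.

```python
import random


def max_count_within_budget(items, k):
    """Largest number of items whose sum is <= k, assuming all items are positive."""
    items = list(items)
    taken = 0
    while items:
        pivot = random.choice(items)  # Random pivot stands in for median select
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        larger = [v for v in items if v > pivot]
        smaller_sum = sum(smaller)
        if smaller_sum > k:
            # Not everything below the pivot fits: continue on the smaller side only
            items = smaller
            continue
        # Everything below the pivot fits, so take it all
        k -= smaller_sum
        taken += len(smaller)
        # Take as many pivot-valued items as the remaining budget allows
        fit = min(len(equal), int(k // pivot))
        taken += fit
        k -= fit * pivot
        if fit < len(equal):
            return taken  # Budget is below the pivot, so no larger item fits either
        items = larger
    return taken


print(max_count_within_budget([1, 2, 3, 5, 10, 25], 12))  # 4
```

Each round discards at least the pivot and continues on one side only, which is where the geometric series in the running time comes from.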
