Given an array and a number k, count the number of subarrays in which k is the maximum.
For example, for the array [4,1,2,3,1,5] and k=3, the count is 6.
I came up with the following solution:
def count_subarrays(a, k):
    count = 0
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            b = a[i:j + 1]  # subarray a[i..j], inclusive of j
            if max(b) == k:  # implies k is in b
                count += 1
    return count
Enumerating the subarrays is O(n^2), and the slice and max on each one add another factor of n. How can I optimize it (preferably with a two-pointer approach) to get an O(n) solution?
One solution for a unique k in the list:
k = 3
a = [4,1,2,3,1,5]
length = len(a)
ucount, lcount = 0, 0
# Find the index of k:
index = a.index(k)
# From this position, go in one direction until a larger number is found
# increment ucount for each step
upper = index
while upper < length and a[upper] <= k:
    ucount += 1
    upper += 1
# After that, go from index backwards until a larger number is found
# increment lcount for each step
lower = index
while lower >= 0 and a[lower] <= k:
    lcount += 1
    lower -= 1
# Multiply the upper and lower count
print(ucount*lcount)
Worst case, that's O(n) for finding the index and O(n) again for both while loops together, which is still O(n) altogether.
Another solution would be collecting lower, index and upper while traversing the list once.
For multiple occurrences of k it gets more complicated, especially when they overlap (when they are connected by numbers < k).
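One way to sidestep the overlap problem, sketched here as an alternative rather than as the approach above: the subarrays whose maximum is exactly k are those with every element <= k, minus those with every element < k. Both counts come from a single pass over runs of qualifying elements. The helper names `count_runs` and `count_max_equals` are made up for this illustration.

```python
def count_runs(a, ok):
    """Count subarrays in which every element satisfies ok(x)."""
    total = 0
    run = 0  # length of the current run of qualifying elements
    for x in a:
        run = run + 1 if ok(x) else 0
        total += run  # exactly 'run' qualifying subarrays end at this position
    return total


def count_max_equals(a, k):
    """Count subarrays whose maximum is exactly k, in O(n)."""
    at_most_k = count_runs(a, lambda x: x <= k)
    below_k = count_runs(a, lambda x: x < k)
    return at_most_k - below_k


print(count_max_equals([4, 1, 2, 3, 1, 5], 3))  # 6
```

Because the subtraction never looks at where k occurs, repeated or adjacent occurrences of k need no special handling.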
The problem is to find the total number of sub-lists of a given list that contain no number greater than a specified upper bound, say right, and whose maximum is greater than or equal to a lower bound, say left. Suppose my list is x=[2, 0, 11, 3, 0], the upper bound for sub-list elements is 10 and the lower bound is 1; then the valid sub-lists are [[2],[2,0],[3],[3,0]], as sub-lists are always contiguous. My script runs well and produces the correct output, but needs some optimization:
def query(sliced,left,right):
    end_index=0
    count=0
    leng=len(sliced)
    for i in range(leng):
        stack=[]
        end_index=i
        while(end_index<leng and sliced[end_index]<=right):
            stack.append(sliced[end_index])
            if max(stack)>=left:
                count+=1
            end_index+=1
    print (count)
origin=[2,0,11,3,0]
left=1
right=10
query(origin,left,right)
Output: 4
For a list such as x=[2,0,0,1,11,14,3,5], the valid sub-lists are [[2],[2,0],[2,0,0],[2,0,0,1],[0,0,1],[0,1],[1],[3],[5],[3,5]], 10 in total.
Brute force
Generate every possible sub-list and check if the given criteria hold for each sub-list.
Worst case scenario: For every element e in the array, left < e < right.
Time complexity: O(n^3)
Optimized brute force (OP's code)
For every index in the array, incrementally build a temporary list (not really needed though) which is valid.
Worst case scenario: For every element e in the array, left < e < right.
Time complexity: O(n^2)
A more optimized solution
If the array has n elements, then the number of sub-lists in the array is 1 + 2 + 3 + ... + n = (n * (n + 1)) / 2 = O(n^2). We can use this formula strategically.
First, as @Tim mentioned, we can partition the list around the numbers greater than right and handle each piece separately. This reduces the task to pieces whose elements are all less than or equal to right; the answers for the pieces are then summed.
Next, within each piece, partition again around the numbers greater than or equal to left. Each of the resulting runs contains only numbers smaller than left, so every sub-list of a run is invalid; a run of length k contributes k * (k + 1) / 2 invalid sub-lists. Add these up over all runs in the piece (store the sum in, say, w), then compute the total number of sub-lists of the piece and subtract w.
Then aggregate your results by sum.
Worst case scenario: For every element e in the array, e < left.
Time Complexity: O(n)
I know this is very difficult to understand, so I have included working code:
def compute(sliced, lo, hi, left):
    num_invalid = 0
    start = 0
    search_for_start = True
    for end in range(lo, hi):
        if search_for_start and sliced[end] < left:
            start = end
            search_for_start = False
        elif not search_for_start and sliced[end] >= left:
            num_invalid += (end - start) * (end - start + 1) // 2
            search_for_start = True
    if not search_for_start:
        num_invalid += (hi - start) * (hi - start + 1) // 2
    return ((hi - lo) * (hi - lo + 1)) // 2 - num_invalid
def query(sliced, left, right):
    ans = 0
    start = 0
    search_for_start = True
    for end in range(len(sliced)):
        if search_for_start and sliced[end] <= right:
            start = end
            search_for_start = False
        elif not search_for_start and sliced[end] > right:
            ans += compute(sliced, start, end, left)
            search_for_start = True
    if not search_for_start:
        ans += compute(sliced, start, len(sliced), left)
    return ans
Categorise the numbers as small, valid and large (S,V and L) and further index the valid numbers: V_1, V_2, V_3 etc. Let us start off by assuming there are no large numbers.
Consider the list A = [S,S,…,S,V_1,X,X,X,X,…,X]. If V_1 has index n, there are n+1 sub-lists ending at V_1: [V_1], [S,V_1], [S,S,V_1] and so on. Each of these n+1 sub-lists can be left as it is or extended with one of the len(A)-n-1 sequences [X], [X,X], [X,X,X] and so on, giving a total of (n+1)(len(A)-n) sub-lists containing V_1.
But we can partition the set of all valid sub-lists into those containing V_k but no V_n for n less than k. Hence we simply perform the same calculation on the remaining X,X,…,X part of the list using V_2, and iterate. This would require something like this:
def query(sliced,left,right,total):
    index=0
    while index<len(sliced):
        if sliced[index]>=left:
            total+=(index+1)*(len(sliced)-index)
            return total+query(sliced[index+1:],left,right,0)
        else:
            index+=1
    return total
To incorporate the large numbers, we can just partition the whole list according to where the large numbers occur and add up the number of valid sub-lists for each part. If we call our first function sub_query, then we arrive at the following:
def sub_query(sliced,left,right,total):
    index=0
    while index<len(sliced):
        if sliced[index]>=left:
            total+=(index+1)*(len(sliced)-index)
            return total+sub_query(sliced[index+1:],left,right,0)
        else:
            index+=1
    return total
def query(sliced,left,right):
    index=0
    count=0
    while index<len(sliced):
        if sliced[index]>right:
            count+=sub_query(sliced[:index],left,right,0)
            sliced=sliced[index+1:]
            index=0
        else:
            index+=1
    count+=sub_query(sliced,left,right,0)
    print (count)
This seems to run through the list and check for max/min values fewer times. Note that it doesn't distinguish between sub-lists that are the same but come from different positions in the original list (as would arise from a list such as [0,1,0,0,1,0]). But the code from the original post wouldn't do that either, so I am guessing this is not a requirement.
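The same counting idea can also be written with index arithmetic only, which avoids the list copies made by slicing and the recursion. The following is a sketch under that assumption, not the code from the answer; the name `count_bounded_max` is hypothetical. Within each stretch free of large numbers, a valid element at index i is the first valid element of exactly (i - prev) * (hi - i) sub-lists, where prev is the index of the previous valid element (or the position just before the stretch) and hi is the end of the stretch.

```python
def count_bounded_max(values, left, right):
    """Count contiguous sub-lists with no element > right and maximum >= left."""
    total = 0
    n = len(values)
    lo = 0
    while lo < n:
        if values[lo] > right:  # large number: it separates two stretches
            lo += 1
            continue
        hi = lo
        while hi < n and values[hi] <= right:
            hi += 1
        # values[lo:hi] now holds no large numbers
        prev = lo - 1  # index just before the earliest allowed start
        for i in range(lo, hi):
            if values[i] >= left:
                # sub-lists whose first valid element sits at index i
                total += (i - prev) * (hi - i)
                prev = i
        lo = hi
    return total


print(count_bounded_max([2, 0, 11, 3, 0], 1, 10))            # 4
print(count_bounded_max([2, 0, 0, 1, 11, 14, 3, 5], 1, 10))  # 10
```

Every index is visited a constant number of times, so the sketch runs in O(n).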
Problem:
You are given a list of size N, initialized with zeroes. You have to perform M operations on the list and output the maximum of the final values of all the elements in the list. For every operation, you are given three integers a, b and k, and you have to add the value k to all the elements from index a to index b (both inclusive).
Input Format
First line will contain two integers N and M separated by a single space.
The next M lines will each contain three integers a, b and k separated by a single space.
Numbers in list are numbered from 1 to N.
Here is the code which I have written:
n,m=map(int,input().split())
arr=[]
for i in range(n+1):
    arr.append(0)
for j in range(m):
    a,b,k=map(int,input().split())
    for i in range(a,b+1):
        arr[i]+=k
print(max(arr))
When I try to submit my solution I get a "TERMINATED DUE TO TIMEOUT" message. Could you please suggest a strategy to avoid this kind of error, and also a solution to the problem?
Thanks in advance!
Don't loop over the list range; instead, use map again to increment the indicated values. Something like
for j in range(m):
    a,b,k=map(int,input().split())
    # increment every element of the slice by k
    arr[a:b+1] = map(lambda x: x + k, arr[a:b+1])
This should let your resident optimization swoop in and save some time.
You probably need an algorithm that has better complexity than O(M*N).
You can put interval delimiters in a list:
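n,m=map(int,input().split())
intervals = []
arr = [0] * (n + 1)  # indices run from 1 to n; slot 0 is unused
for j in range(m):
    a,b,k=map(int,input().split())
    intervals.append((a, "begin", k))
    intervals.append((b, "end", k))
# sort numerically by position; at equal positions "begin" sorts before "end"
intervals.sort(key=lambda x: (x[0], x[1]))
k, i = 0, 0
for ind, kind, value in intervals:
    if kind == "begin":
        # positions before this start still carry the previous running sum
        while i < ind:
            arr[i] += k
            i += 1
        k += value
    else:
        # positions up to and including this end carry the running sum
        while i <= ind:
            arr[i] += k
            i += 1
        k -= value
print(max(arr))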
If the sorting algorithm is O(MlogM), this is O(MlogM + N)
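For comparison, a common alternative that needs no sorting is a difference array: record +k at index a and -k at index b+1, then take a running sum. This is a swapped-in technique, not the interval method above; the function name `max_after_updates` and the list-of-tuples input are assumptions made for the sketch.

```python
from itertools import accumulate


def max_after_updates(n, operations):
    """Largest final value after adding k to positions a..b (1-based, inclusive)."""
    diff = [0] * (n + 2)  # one spare slot so that b + 1 == n + 1 is a valid index
    for a, b, k in operations:
        diff[a] += k      # the increase starts at a
        diff[b + 1] -= k  # and stops after b
    # the running sum of the differences reconstructs the final values 1..n
    return max(accumulate(diff[1:n + 1]))


ops = [(1, 2, 100), (2, 5, 100), (3, 4, 100)]
print(max_after_updates(5, ops))  # 200
```

Each operation costs O(1) and the final pass costs O(N), so the whole thing is O(N + M).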
File input.txt consists of two lines: the first has an integer N, a space, then an integer K (1 ≤ N,K ≤ 250000). The second has N space-delimited integers, each less than or equal to K. It is guaranteed that each integer from 1 to K is in the array. The task is to find the subarray of minimum length that contains all integers from 1 to K, and print its start and end. Note that indexing starts from 1.
Examples:
Input           Output
5 3             2 4
1 2 1 3 2

6 4             2 6
2 4 2 3 3 1
I had this task in a recent programming competition. It is over, and I am not cheating. I've implemented it using python 3:
with open('input.txt') as file:
    N, K = [int(x) for x in file.readline().split()]
    alley = [int(x) for x in file.readline().split()]
trees = {}
min_idx = (1, N)
min_length = N
for i in range(N):
    trees[alley[i]] = i
    if len(trees) == K:
        idx = (min(trees.values())+1, max(trees.values())+1)
        length = idx[1] - idx[0] + 1
        if length < min_length:
            min_idx = idx
            min_length = length
        if min_length == K:
            break
print (str(min_idx[0]) + " " + str(min_idx[1]))
The idea is to save the last position of the i-th tree in a dictionary and, once the dictionary contains all items, check whether the current subarray is the shortest so far.
The 16th test showed that my algorithm exceeded the time limit, which was 1 second. I think my algorithm is O(N), because it finishes in one pass over the array and dictionary access costs O(1).
How can one speed up this algorithm? Can the complexity be reduced, or is there something about Python I misunderstand that takes so much time?
Your algorithm is good, but ignoring the time during which len(trees) < K, it's O(NK), because every call to min and max is O(K). There's no need to call max, because max(trees.values()) == i. Dealing with min is trickier, but if you keep track of which key corresponds to the minimum index, you only need to recalculate it when that key is updated.
A minor point is that your last if doesn't always need to be checked.
Overall:
trees = {}
min_idx = (1, N)
min_length = N
first_index = -1
for i in range(N):
    trees[alley[i]] = i
    if len(trees) == K:
        if first_index == -1 or alley[first_index] == alley[i]:
            first_index = min(trees.values())
        idx = (first_index+1, i+1)
        length = idx[1] - idx[0] + 1
        if length < min_length:
            min_idx = idx
            min_length = length
            if min_length == K:
                break
Make an integer array Counts[K] and fill it with zeros.
Keep some variables: left index L, right index R (like your idx[0] and idx[1]), and zero count Z.
Set L and R to 1, increment Counts[A[1]], and set Z to K-1.
Move R, incrementing Counts[A[R]] and decrementing Z whenever a zero entry is updated, until Z becomes 0.
At this moment the subarray [L..R] contains all values from 1 to K.
Now move L, decrementing the Counts entry for each value leaving the window. Increment Z if some entry becomes 0. When Z becomes non-zero, stop moving L and move R again.
When R reaches N and L stops, the process is over. The minimum length is the minimum of (R-L+1) over the valid pairs.
Example run for your [1 2 1 3 2]
Move R
1 0 0 Z=2
1 1 0 Z=1
2 1 0 Z=1
2 1 1 Z=0
Move L
1 1 1 Z=0
1 0 1 Z=1 Stop moving L, check previous L,R pair 2,4
Move R
1 1 1 Z=0
Move L
0 1 1 Z=1 Stop moving L, check previous L,R pair 3,5
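The steps above can be sketched in Python roughly as follows. The function name `shortest_window` is hypothetical, `missing` plays the role of Z, and the sketch uses 0-based indices internally while returning 1-based positions as the task requires.

```python
def shortest_window(values, k):
    """Return 1-based (start, end) of the shortest subarray containing 1..k."""
    counts = [0] * (k + 1)  # counts[v] = occurrences of v inside the window
    missing = k             # how many of 1..k are absent from the window (Z)
    best = None
    left = 0
    for right, v in enumerate(values):
        if counts[v] == 0:
            missing -= 1
        counts[v] += 1
        # shrink from the left for as long as the window stays complete
        while missing == 0:
            if best is None or right - left < best[1] - best[0]:
                best = (left + 1, right + 1)
            counts[values[left]] -= 1
            if counts[values[left]] == 0:
                missing += 1
            left += 1
    return best


print(shortest_window([1, 2, 1, 3, 2], 3))     # (2, 4)
print(shortest_window([2, 4, 2, 3, 3, 1], 4))  # (2, 6)
```

Both pointers only move forward, so the loop does O(N) work in total regardless of K.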
So I'm working on some practice problems and having trouble reducing the complexity. I am given an array of distinct integers a[] and a threshold value T. I need to find the number of triplets i,j,k such that a[i] < a[j] < a[k] and a[i] + a[j] + a[k] <= T. I've gotten this down from O(n^3) to O(n^2 log n) with the following python script. I'm wondering if I can optimize this any further.
import sys
import bisect
first_line = sys.stdin.readline().strip().split(' ')
num_numbers = int(first_line[0])
threshold = int(first_line[1])
count = 0
if num_numbers < 3:
    print count
else:
    numbers = sys.stdin.readline().strip().split(' ')
    numbers = map(int, numbers)
    numbers.sort()
    for i in xrange(num_numbers - 2):
        for j in xrange(i+1, num_numbers - 1):
            k_1 = threshold - (numbers[i] + numbers[j])
            if k_1 < numbers[j]:
                break
            else:
                cross_thresh = bisect.bisect(numbers,k_1) - (j+1)
                if cross_thresh > 0:
                    count += cross_thresh
    print count
In the script above, the first input line provides the number of numbers and the threshold. The next line is the full list. If the list has fewer than 3 elements, no triplets can exist, so we print 0. If not, we read in the full list of integers, sort them, and then process them as follows: we iterate over every pair (i, j) with i < j and compute the highest value of a[k] that would not break a[i] + a[j] + a[k] <= T. We then find the index (s) of the first element in the list that violates this condition and add the number of elements between j and s to the count. For 30,000 elements in a list, this takes about 7 minutes to run. Is there any way to make it faster?
You are performing binary search for each (i,j) pair to find the corresponding value for k. Hence O(n^2 log(n)).
I can suggest an algorithm that will have the worst case time complexity of O(n^2).
Assume the list is sorted from left to right and elements are numbered from 1 to n. Then the pseudo code is:
for i = 1 to n - 2:
    j = i + 1
    find maximal k with binary search
    while j < k:
        j = j + 1
        find maximal k with linear search to the left, starting from last k position
The reason this has the worst case time complexity of O(n^2) and not O(n^3) is because the position k is monotonically decreasing. Thus even with linear scanning, you are not spending O(n) for each (i,j) pair. Rather, you are spending a total of O(n) time to scan for k for each distinct i value.
O(n^2) version implemented in Python (based on wookie919's answer):
def triplets(N, T):
    N = sorted(N)
    result = 0
    for i in xrange(len(N)-2):
        k = len(N)-1
        for j in xrange(i+1, len(N)-1):
            while k>=0 and N[i]+N[j]+N[k]>T:
                k-=1
            result += max(k, j)-j
    return result
import random
sample = random.sample(xrange(1000000), 30000)
print triplets(sample, 500000)
I tried to solve Interviewstreet's Insertion Sort challenge in Python; my code is shown below.
The program runs fine up to some input size (which I'm not sure of), but returns a wrong output for larger inputs. Can anyone tell me what I am doing wrong?
# This program tries to identify number of times swapping is done to sort the input array
"""
=>Get input values and print them
=>Get number of test cases and get inputs for those test cases
=>Complete Insertion sort routine
=>Add a variable to count the swaps
"""
def sort_swap_times(nums):
    """ This function takes a list of elements and then returns the number of times
    swapping was necessary to complete the sorting
    """
    times_swapped = 0L
    # perform the insertion sort routine
    for j in range(1, len(nums)):
        key = nums[j]
        i = j - 1
        while i >= 0 and nums[i] > key:
            # perform swap and update the tracker
            nums[i + 1] = nums[i]
            times_swapped += 1
            i = i - 1
        # place the key value in the position identified
        nums[i + 1] = key
    return times_swapped
# get the upper limit.
limit = int(raw_input())
swap_count = []
# get the length and elements.
for i in range(limit):
    length = int(raw_input())
    elements_str = raw_input() # returns a single string of space-separated numbers
    # convert the given elements from str to int
    elements_int = map(int, elements_str.split())
    # pass integer elements list to perform the sorting
    # get the number of times swapping was needed and append the return value to swap_count list
    swap_count.append(sort_swap_times(elements_int))
# print the swap counts for each input array
for x in swap_count:
    print x
Your algorithm is correct, but it is a naive approach to the problem and will give you a Time Limit Exceeded result on large test cases (i.e., len(nums) > 10000). Let's analyze the run-time complexity of your algorithm.
for j in range(1, len(nums)):
    key = nums[j]
    i = j - 1
    while i >= 0 and nums[i] > key:
        # perform swap and update the tracker
        nums[i + 1] = nums[i]
        times_swapped += 1
        i = i - 1
    # place the key value in the position identified
    nums[i + 1] = key
The number of steps required in the above snippet is proportional to 1 + 2 + .. + len(nums)-1, or len(nums)*(len(nums)-1)/2 steps, which is O(len(nums)^2).
Hint:
Use the fact that all values will be within [1,10^6]. What you are really doing here is finding the number of inversions in the list, i.e. find all pairs of i < j s.t. nums[i] > nums[j]. Think of a data structure that allows you to find the number of swaps needed for each insert operation in logarithmic time complexity. Of course, there are other approaches.
Spoiler:
Binary Indexed Trees
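A minimal sketch of how a binary indexed tree (Fenwick tree) could count the inversions, assuming, as the hint says, that every value is an integer in [1, 10^6]. It is written in Python 3, and the names `count_inversions` and `max_value` are made up for this illustration. For each element, the number of shifts insertion sort would perform equals the number of earlier elements that are strictly greater.

```python
def count_inversions(nums, max_value=10**6):
    """Count pairs i < j with nums[i] > nums[j]; values assumed in [1, max_value]."""
    tree = [0] * (max_value + 1)  # 1-based Fenwick tree over the value range

    def add(pos):
        # record one occurrence of the value 'pos'
        while pos <= max_value:
            tree[pos] += 1
            pos += pos & -pos

    def prefix_count(pos):
        # number of recorded values that are <= pos
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    inversions = 0
    for seen, value in enumerate(nums):
        # 'seen' earlier elements exist; those <= value cause no shift
        inversions += seen - prefix_count(value)
        add(value)
    return inversions


print(count_inversions([2, 1, 3, 1, 2]))  # 4
```

Each update and query touches O(log max_value) tree entries, so a list of length n is processed in O(n log max_value) instead of O(n^2).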