Find the initial position of the lowest/longest sequence in an array - python

Similar to sleep cycles alarms, I need to cut my array in the best place possible (low numbers in my scenario) respecting a range of min/max amount of values...
To simplify, if I am able to find the longest lowest sequence in an array, I think I can move forward.
For example:
[1,2,3,0,1,4,5,6,6,0.1,1.1,2,4]
Should return 9, because 0.1 is the first value of the longest lowest, even though I have a lower value than 0.1;
[4,5,7,10,0.13,0.2,0.12,8,9,28,0.1,0.11,0.102]
Should return 10, because it is lower than 1, even though, the sequence has the same amount of numbers...
Longer sequences (in my scenario) are more important than lower ones. Any idea how to start? I don't have a threshold, but a solution that uses one should be OK, I think (if it is calculated on the fly).

I'm not sure about your convoluted logic behind "longest lowest", but this could be a good start:
data = [4,5,7,10,0.13,0.2,0.12,8,9,28,0.1,0.11,0.102]
result = [[0,0]]
threshold = 1.0
for num, val in enumerate(data):
    if val < threshold:
        if result[-1][1] == 0: # start a new sequence
            result[-1][0] = num
            result[-1][1] = 1
        else: # continue existing sequence
            result[-1][1] += 1
    else: # end the previous sequence
        if result[-1][1] > 0:
            result.append([0,0])
This leaves [index of first element, sequence length] pairs in result:
[[4, 3], [10, 3]]
which you can then analyse further by length, by the value of the first element, or by whatever you like.
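As a rough sketch of that further analysis, the runs can be ranked by length first and by mean value second, which matches "longer is more important than lower" from the question. The names runs_below and best_run_start are made up for this illustration, and the tie-break by mean is an assumption, not something the question specifies.

```python
def runs_below(data, threshold):
    """Return [start_index, length] for each maximal run of values below threshold."""
    runs = []
    start = None
    for i, val in enumerate(data):
        if val < threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append([start, i - start])  # the run ended just before i
            start = None
    if start is not None:  # a run that reaches the end of the data
        runs.append([start, len(data) - start])
    return runs


def best_run_start(data, threshold):
    """Start index of the longest run; ties go to the run with the lowest mean (assumed rule)."""
    runs = runs_below(data, threshold)
    if not runs:
        return None

    def rank(run):
        start, length = run
        mean = sum(data[start:start + length]) / length
        return (-length, mean)  # longer first, then lower

    return min(runs, key=rank)[0]


data = [4, 5, 7, 10, 0.13, 0.2, 0.12, 8, 9, 28, 0.1, 0.11, 0.102]
print(runs_below(data, 1.0))
print(best_run_start(data, 1.0))
```

The threshold still has to come from somewhere; computing it on the fly (for example from a percentile of the data) is left open here.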

Related

Find the closest two values in an array where the difference is at least x

Task: Given an unsorted array, how do I find the closest two values such that their absolute difference is at least x (return -1 if there is no two elements with this difference in the array)?
My approach: I tried using a sliding window approach:
I start the first pointer at index 0 and the second pointer at index 1.
I increment the second pointer until the difference between the array elements at the first and second pointers is at least x.
After that, I advance the first pointer towards the second, keeping track of any values in between that also give a difference of at least x.
I do this until the second pointer makes it to the end of the array.
Here's my code with an example:
length = 6
x = 5
arr = [1, 1, 4, 3, 7, 5]
smallestRange = float('inf')
first = 0
second = 1
while (second < length):
    # Keep going until the proper difference is found
    while (second < length and abs(arr[first] - arr[second]) < x):
        second += 1
    # No matches are found
    if (second == length):
        break
    # Let first catch up to second
    minVal = first
    while (first != second):
        if (arr[first] <= arr[minVal] and abs(arr[first] - arr[second]) >= x):
            minVal = first
        first += 1
    if (minVal != second):
        if (second - minVal < smallestRange):
            smallestRange = second - minVal
    second += 1
if (smallestRange == float('inf')):
    smallestRange = -1
print(smallestRange)
Problem: This isn't covering all of the possible cases. For example, if the first element doesn't have any absolute differences with any of the other elements, my second pointer will just go straight to the end and my code just returns -1. I've also tried checking other input arrays by hand and sometimes it doesn't work. Could I get any pointers on how to fix this?
The example should result in printing 3 since 1 and 7 would be the closest pair that has a difference of at least 5.
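When checking inputs by hand, a brute-force baseline can help. The sketch below is not the sliding-window approach from the question; it is a quadratic reference that tries index gaps from smallest to largest, assuming "closest" means the smallest distance between indices (as the expected answer of 3 suggests). The name closest_pair_distance is made up for this illustration.

```python
def closest_pair_distance(arr, x):
    """Smallest index distance between two elements whose absolute
    difference is at least x, or -1 if no such pair exists."""
    n = len(arr)
    for gap in range(1, n):          # try the closest pairs first
        for i in range(n - gap):
            if abs(arr[i] - arr[i + gap]) >= x:
                return gap
    return -1


print(closest_pair_distance([1, 1, 4, 3, 7, 5], 5))
```

It runs in O(n^2) in the worst case, so it is only meant for comparing against a faster solution on small inputs.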

Recovering Subsets in Subset Sum Problem - Not All Subsets Appear

Brushing up on dynamic programming (DP) when I came across this problem. I managed to use DP to determine how many solutions there are in the subset sum problem.
def SetSum(num_set, num_sum):
    #Initialize DP matrix with base cases set to 1
    matrix = [[0 for i in range(0, num_sum+1)] for j in range(0, len(num_set)+1)]
    for i in range(len(num_set)+1): matrix[i][0] = 1
    for i in range(1, len(num_set)+1): #Iterate through set elements
        for j in range(1, num_sum+1): #Iterate through sum
            if num_set[i-1] > j: #When current element is greater than sum take the previous solution
                matrix[i][j] = matrix[i-1][j]
            else:
                matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-num_set[i-1]]
    #Retrieve elements of subsets
    subsets = SubSets(matrix, num_set, num_sum)
    return matrix[len(num_set)][num_sum]
Based on Subset sum - Recover Solution, I used the following method to retrieve the subsets since the set will always be sorted:
def SubSets(matrix, num_set, num):
    #Initialize variables
    height = len(matrix)
    width = num
    subset_list = []
    s = matrix[0][num-1] #Keeps track of number until a change occurs
    for i in range(1, height):
        current = matrix[i][width]
        if current > s:
            s = current #keeps track of changing value
            cnt = i -1 #backwards counter, -1 to exclude current value already appended to list
            templist = [] #to store current subset
            templist.append(num_set[i-1]) #Adds current element to subset
            total = num - num_set[i-1] #Initial total will be sum - max element
            while cnt > 0: #Loop backwards to find remaining elements
                if total >= num_set[cnt-1]: #Takes current element if it is less than total
                    templist.append(num_set[cnt-1])
                    total = total - num_set[cnt-1]
                cnt = cnt - 1
            templist.sort()
            subset_list.append(templist) #Add subset to solution set
    return subset_list
However, since it is a greedy approach it only works when the max element of each subset is distinct. If two subsets have the same max element then it only returns the one with the larger values. So for elements [1, 2, 3, 4, 5] with sum of 10 it only returns
[1, 2, 3, 4] , [1, 4, 5]
When it should return
[1, 2, 3, 4] , [2, 3, 5] , [1, 4, 5]
I could add another loop inside the while loop to leave out each element but that would increase the complexity to O(rows^3) which can potentially be more than the actual DP, O(rows*columns). Is there another way to retrieve the subsets without increasing the complexity? Or to keep track of the subsets while the DP approach is taking place? I created another method that can retrieve all of the unique elements in the solution subsets in O(rows):
def RecoverSet(matrix, num_set):
    height = len(matrix) - 1
    width = len(matrix[0]) - 1
    subsets = []
    while height > 0:
        current = matrix[height][width]
        top = matrix[height-1][width]
        if current > top:
            subsets.append(num_set[height-1])
            if top == 0:
                width = width - num_set[height-1]
        height -= 1
    return subsets
Which would output [1, 2, 3, 4, 5]. However, getting the actual subsets from it seems like solving the subset problem all over again. Any ideas/suggestions on how to store all of the solution subsets (not print them)?
That's actually a very good question, but it seems mostly you got the right intuition.
The DP approach allows you to build a 2D table and essentially encode how many subsets sum up to the desired target sum, which takes time O(target_sum*len(num_set)).
Now if you want to actually recover all solutions, this is another story in the sense that the number of solution subsets might be very large, in fact much larger than the table you built while running the DP algorithm. If you want to find all solutions, you can use the table as a guide but it might take a long time to find all subsets. In fact, you can find them by going backwards through the recursion that defined your table (the if-else in your code when filling up the table). What do I mean by that?
Well let's say you try to find the solutions, having only the filled table at your disposal. The first thing to do to tell whether there is a solution is to check that the element at row len(num_set) and column num has value > 0, indicating that at least one subset sums up to num. Now there are two possibilities, either the last number in num_set is used in a solution in which case we must then check whether there is a subset using all numbers except that last one, which sums up to num-num_set[-1]. This is one possible branch in the recursion. The other one is when the last number in num_set is not used in a solution, in which case we must then check whether we can still find a solution to sum up to num, but having all numbers except that last one.
If you keep going you will see that the recovering can be done by doing the recursion backwards. By keeping track of the numbers along the way (so the different paths in the table that lead to the desired sum) you can retrieve all solutions, but again bear in mind that the running time might be extremely long because we want to actually find all solutions, not just know their existence.
This code should be what you are looking for to recover the solutions, given the filled matrix:
def recover_sol(matrix, set_numbers, target_sum):
    up_to_num = len(set_numbers)
    ### BASE CASES (BOTTOM OF RECURSION) ###
    # If the target_sum becomes negative or there is no solution in the matrix, then
    # return an empty list and inform that this solution is not a successful one
    if target_sum < 0 or matrix[up_to_num][target_sum] == 0:
        return [], False
    # If bottom of recursion is reached, that is, target_sum is 0, just return an empty list
    # and inform that this is a successful solution
    if target_sum == 0:
        return [], True
    ### IF NOT BASE CASE, NEED TO RECURSE ###
    # Case 1: last number in set_numbers is not used in solution --> same target but one item less
    s1_sols, success1 = recover_sol(matrix, set_numbers[:-1], target_sum)
    # Case 2: last number in set_numbers is used in solution --> target is lowered by item up_to_num
    s2_sols, success2 = recover_sol(matrix, set_numbers[:-1], target_sum - set_numbers[up_to_num-1])
    # If Case 2 is a success but bottom of recursion was reached
    # so that it returned an empty list, just set current sol as the current item
    if s2_sols == [] and success2:
        # The set of solutions is just the list containing one item (so this explains the list in list)
        s2_sols = [[set_numbers[up_to_num-1]]]
    # Else there are already solutions and it is a success, go through the multiple solutions
    # of Case 2 and add the current number to them
    else:
        s2_sols = [[set_numbers[up_to_num-1]] + s2_subsol for s2_subsol in s2_sols]
    # Join lists of solutions for both Cases, and set success value to True
    # if either case returns a successful solution
    return s1_sols + s2_sols, success1 or success2
For the full solution with matrix filling AND recovering of solutions you can then do
def subset_sum(set_numbers, target_sum):
    n_numbers = len(set_numbers)
    #Initialize DP matrix with base cases set to 1
    matrix = [[0 for i in range(0, target_sum+1)] for j in range(0, n_numbers+1)]
    for i in range(n_numbers+1):
        matrix[i][0] = 1
    for i in range(1, n_numbers+1): #Iterate through set elements
        for j in range(1, target_sum+1): #Iterate through sum
            if set_numbers[i-1] > j: #When current element is greater than sum take the previous solution
                matrix[i][j] = matrix[i-1][j]
            else:
                matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-set_numbers[i-1]]
    return recover_sol(matrix, set_numbers, target_sum)[0]
Cheers!
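For comparison, here is a minimal sketch of the same backward walk written with a row index instead of list slicing, and as a generator so the subsets are produced one at a time. It assumes all numbers are positive integers; count_table and iter_subsets are names invented for this example, not part of the answer above.

```python
def count_table(numbers, target):
    """table[i][j] = number of subsets of numbers[:i] that sum to j."""
    table = [[0] * (target + 1) for _ in range(len(numbers) + 1)]
    for row in table:
        row[0] = 1  # the empty subset sums to 0
    for i, value in enumerate(numbers, start=1):
        for j in range(1, target + 1):
            table[i][j] = table[i - 1][j]
            if value <= j:
                table[i][j] += table[i - 1][j - value]
    return table


def iter_subsets(table, numbers, i, remaining):
    """Yield every subset of numbers[:i] that sums to remaining, guided by table."""
    if remaining == 0:
        yield []  # only the empty subset is left (numbers assumed positive)
        return
    if i == 0 or table[i][remaining] == 0:
        return  # dead end: the table says no subset exists here
    # Branch 1: numbers[i-1] is not used.
    yield from iter_subsets(table, numbers, i - 1, remaining)
    # Branch 2: numbers[i-1] is used.
    value = numbers[i - 1]
    if value <= remaining:
        for rest in iter_subsets(table, numbers, i - 1, remaining - value):
            yield rest + [value]


numbers = [1, 2, 3, 4, 5]
table = count_table(numbers, 10)
print(table[len(numbers)][10])
print(list(iter_subsets(table, numbers, len(numbers), 10)))
```

Because every branch that is entered is backed by a non-zero table entry, the walk never explores a path that cannot end in a solution; the total work is still proportional to the number (and size) of the subsets found.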

Find all elements that appear more than n/4 times in a sorted array

My question is similar to Find all elements that appear more than n/4 times in linear time: you are given an array of size n and must find all elements that appear more than n/4 times. The difference is that the array is sorted and the runtime should be better than O(n).
My approach is to do 3 binary searches for the first occurrence of the elements at positions n/4, n/2 and 3*n/4. Since the array is sorted, we can tell whether each element appears more than n/4 times by checking whether the element n/4 positions further on has the same value.
I have written the following code in python3. Do you think my approach is correct, and is there anything that can be simplified?
import bisect
# return -1 if x doesn't exist in arr
def binary_search(arr, x):
    pos = bisect.bisect_left(arr, x)
    return pos if pos != len(arr) and arr[pos] == x else -1
def majority(arr):
    n = len(arr)
    output = []
    quarters = [arr[n//4],arr[n//2],arr[3*n//4]]
    # avoid repeating answer in output array
    if arr[n//4] == arr[n//2]:
        quarters.remove(arr[n//4])
        quarters.remove(arr[n//2])
        output.append(arr[n//2])
    if arr[n//2] == arr[3*n//4]:
        if arr[n//2] in arr:
            quarters.remove(quarters[n//2])
        if arr[3*n//4] in arr:
            quarters.remove(quarters[3*n//4])
        if arr[n//2] not in output:
            output.append(arr[n//2])
    for quarter in quarters:
        pos = binary_search(arr, quarter)
        if pos != -1 and pos+n//4 < len(arr) and arr[pos] == arr[pos+n//4]:
            output.append(arr[pos])
    return output
print(majority([1,1,1,6,6,6,9,145]))
I think what you want is more like:
Examine the element at position n/4.
Do a binary search to find the first occurrence of that item.
Do a binary search to find the last occurrence of that item.
If last - first + 1 > n/4, then output it.
Repeat that process for n/2 and 3n/4.
There is an early-out opportunity if the run of the previous item extends beyond the next n/4 marker.
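The steps above can be sketched with the standard bisect module. This is an illustration under the assumption that arr is sorted in ascending order; the function name more_than_quarter is made up here.

```python
from bisect import bisect_left, bisect_right


def more_than_quarter(arr):
    """Values that occur more than len(arr)/4 times in a sorted list."""
    n = len(arr)
    result = []
    if n == 0:
        return result
    for pos in (n // 4, n // 2, 3 * n // 4):
        value = arr[pos]
        if result and result[-1] == value:
            continue  # early out: this value was already reported
        first = bisect_left(arr, value)   # index of the first occurrence
        end = bisect_right(arr, value)    # one past the last occurrence
        if 4 * (end - first) > n:         # count > n/4 without fractions
            result.append(value)
    return result


print(more_than_quarter([1, 1, 1, 6, 6, 6, 9, 145]))
```

Only three probe positions are needed because any run longer than n/4 must cover at least one of them; each probe costs two binary searches, so the whole thing is O(log n).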
I would make the following improvement. Take the 9 values that are at position 0, n/8, n/4, 3n/8, ..., n. You only need to consider values that were repeated twice.
When you do the binary search, you can do both ends in tandem. That way if the most common value is much smaller or much larger than 1/4 then you don't do most of the binary search.

Pythonic way of checking if indefinite # of consec elements in list sum to given value

Having trouble figuring out a nice way to get this task done.
Say I have a list of triangular numbers up to 1000 -> [0,1,3,6,10,15,..] etc.
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute-force way would iteratively check every possible run of consecutive elements starting at each element (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc. until the sum is greater than the target number). But that's probably O(n^2) or maybe worse. Any way to do it better?
UPDATE:
Ok, so a friend of mine figured out a solution that works at O(n) (I think) and is pretty intuitively easy to follow. This might be similar (or the same) to Gabriel's answer, but it was just difficult for me to follow and I like that this solution is understandable even from a basic perspective. this is an interesting question, so I'll share her answer:
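from functools import reduce # needed on Python 3; reduce is a builtin on Python 2
def findConsec(input1 = 7735):
    list1 = range(1, 1001)
    newlist = [reduce(lambda x,y: x+y,list1[0:i]) for i in list1] # triangular numbers 1, 3, 6, ...
    curr = 0
    end = 2
    num = sum(newlist[curr:end])
    while num != input1:
        if num < input1: # window sum too small: extend it to the right
            num += newlist[end]
            end += 1
        elif num > input1: # window sum too large: drop the leftmost element
            num -= newlist[curr]
            curr += 1
        if curr == end:
            return []
    if num == input1:
        return newlist[curr:end]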
A 3-iteration max solution
Another solution is to start close to where your number would be and walk forward from one position behind. For any number in the triangular list vec, its value is defined by its index:
vec[i] = sum(range(0,i+1))
Dividing the target sum by the length of the group gives the average of the group, which lies within the group's range of values even though it may not itself be an element of the list.
Therefore, for a group of n numbers whose sum should match a value val, you can take the integer part of val/n as the starting point. As that value may not be in the list, use the position of the element that minimizes the difference to it.
# vec as np.ndarray -> the triangular or whatever-type series
# val as int -> sum of n elements you are looking after
# n as int -> number of elements to be summed
import numpy as np
def seq_index(vec,n,val):
    index0 = np.argmin(abs(vec-(val//n)))-n//2-1 # covers odd and even n values
    index0 = max(index0, 0) # do not start before the beginning of vec
    intsum = 0 # sum of which to keep track
    count = 0 # counter
    seq = [] # indices of vec that sum up to val
    while count<=2: # walking forward from the initial guess of where the group begins or prior to it
        intsum = sum(vec[(index0+count):(index0+count+n)])
        if intsum == val:
            seq.append(list(range(index0+count,index0+count+n)))
        count += 1
    return seq
# Example
vec = []
for i in range(0,100):
    vec.append(sum(range(0,i))) # builds 0, 0, 1, 3, ..., 4851 (one more leading 0 than the formula above)
vec = np.array(vec) # convert to numpy to make it easier to query ranges
# looking for a value that belongs to the interval 0-4851
indices = seq_index(vec,3,4)
# print indices
print(indices[0])
print(vec[indices[0]])
print(sum(vec[indices[0]]))
Returns
print(indices[0]) -> [1, 2, 3]
print(vec[indices[0]]) -> [0 1 3]
print(sum(vec[indices[0]])) -> 4 (which we were looking for)
This seems like an algorithm question rather than a question on how to do it in python.
Thinking backwards I would copy the list and use it in a similar way to the Sieve of Eratosthenes. I would not consider the numbers that are greater than x. Then start from the greatest number and sum backwards. Then if I get greater than x, subtract the greatest number (exclude it from the solution) and continue to sum backward.
This seems the most efficient way to me and actually is O(n) - you never go back (or forward in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need accessing the list again - just a temp var.
To answer Dunes' question:
Yes, there is a reason: to subtract the next largest element when the running sum overshoots without a solution. Going from the first element, hitting a no-solution would require accessing the list again, or the temporary solution list, to subtract a set of elements that sum to more than the next element to add. You risk increasing the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence you can search for the smaller and larger pair using binary search. Once a pair of 2 elements, smaller than x is found then you can sum the pair and if it sums larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not what it is in theory and you can do whatever you like :)
You should pick the first three elements and sum them; then you keep subtracting the first of the three and adding the next element in the list, and check whether the sum adds up to whatever number you want. That would be O(n).
# vec as np.ndarray, whatever as the target sum (both defined elsewhere)
import numpy as np
itsum = sum(vec[0:3]) # running sum of the current window of three elements
sequence = [range(0,3)] if itsum == whatever else [] # index ranges of vec that add up to whatever (creation)
for i in range(3,len(vec)):
    itsum -= vec[i-3] # drop the element that leaves the window
    itsum += vec[i] # add the element that enters the window
    if itsum == whatever:
        sequence.append(range(i-2,i+1)) # list of sequences that add up to whatever
The solution you provide in the question isn't truly O(n) time complexity: the way you compute your triangle numbers makes the computation O(n^2). The list comprehension throws away the previous work that went into calculating the last triangle number. That is: tn_i = tn_(i-1) + i (where tn is a triangle number). Since you also store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an equivalent algorithm that is O(n) time complexity and O(1) space complexity (written for python 3).
# for python 2, replace things like `highest = next(high)` with `highest = high.next()`
from itertools import count, takewhile, accumulate
def find(to_find):
    # next(low) == lowest number in total
    # next(high) == highest number not in total
    low = accumulate(count(1)) # generator of triangle numbers
    high = accumulate(count(1))
    total = highest = next(high)
    # highest = highest number in the sequence that sums to total
    # definitely can't find solution if the highest number in the sum is greater than to_find
    while highest <= to_find:
        # found a solution
        if total == to_find:
            # keep taking numbers from the low iterator until we find the highest number in the sum
            return list(takewhile(lambda x: x <= highest, low))
        elif total < to_find:
            # add the next highest triangle number not in the sum
            highest = next(high)
            total += highest
        else: # if total > to_find
            # subtract the lowest triangle number in the sum
            total -= next(low)
    return []
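For readers who prefer to keep the explicit list from the question, the same two-pointer idea can be sketched over any list of non-negative numbers. This is an illustrative variant, not the generator version above; consecutive_run is a hypothetical name, and the list of triangular numbers up to 1000 is built here only for the demonstration.

```python
from itertools import accumulate, count, takewhile


def consecutive_run(values, target):
    """Return a run of consecutive elements of values (all non-negative)
    that sums to target, or [] if there is none."""
    start = 0
    total = 0
    for end, value in enumerate(values):
        total += value                       # extend the window to the right
        while total > target and start < end:
            total -= values[start]           # shrink it from the left
            start += 1
        if total == target:
            return values[start:end + 1]
    return []


# Triangular numbers 0, 1, 3, 6, ... up to 1000, as in the question.
triangular = list(takewhile(lambda t: t <= 1000, accumulate(count(0))))

for wanted in (64, 225, 371, 882):
    print(wanted, consecutive_run(triangular, wanted))
```

Each element enters and leaves the window at most once, so the scan is O(n) over the list. Note that a target which is itself in the list comes back as a one-element run; add a length check if single elements should not count.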

Difficulty solving a problem in O(log n)

I wrote a function that takes as input a sorted list of unique ints (from smallest to largest). I'm supposed to find an index in the list whose value equals the index itself; for example, if L[2]==2 the output is true.
After doing that in O(log n), I now want to count how many indexes behave like that in the given list, with the same O(log n) complexity.
I'm posting my first function, which does the first part, and the second function, which I need help with:
def steady_state(L):
    lower= 0
    upper= len(L) -1
    while lower<=upper:
        middle_i= (upper+ lower)//2
        if L[middle_i]== middle_i:
            return middle_i
        elif L[middle_i]>middle_i:
            upper= middle_i-1
        else:
            lower= middle_i +1
    return None
def cnt_steady_states(L):
    lower= 0
    upper= len(L) -1
    a=b=steady_state(L)
    if steady_state(L)== None:
        return 0
    else:
        cnt=1
        while True:
            if L[upper] == upper and a<=upper:
                cnt+= upper-a
                upper= a
            if L[lower]== lower and b>=lower:
                cnt+= b- lower
                lower = b
It's not possible with the restrictions you've given so far. The best complexity you can theoretically achieve is O(n).
O() assumes the worst case (just a definition, you could drop that part). And in the worst case you will always have to look at each item in order to check it for being equal to its index.
The case changes if you have more restrictions (e. g. the numbers are all ints and none may appear more than once, i. e. no two consecutive numbers are equal). Maybe this is the case?
EDIT:
After hearing that in fact my assumed restrictions apply (i. e. only once-appearing ints) I now propose this approach: You can safely assume that you can have only exactly one continuous range where all your matching entries are located. I. e. you only need to find a lower bound and upper bound. The wanted result will then be the size of that range.
Each bound can safely be found using a binary search, of which each has O(log n).
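def binsearch(field, lower=True, a=-1, b=None):
    # a and b are exclusive sentinels: the bound we look for lies strictly between them.
    # a starts at -1 (not 0) so that a match at index 0 is not skipped.
    if b is None:
        b = len(field)
    while a + 1 < b:
        c = (a + b) // 2 # integer division, also on Python 3
        if lower:
            if field[c] < c:
                a = c
            else:
                b = c
        else: # search for upper bound
            if field[c] > c:
                b = c
            else:
                a = c
    return b if lower else a
def indexMatchCount(field):
    upper = binsearch(field, lower=False)
    lower = binsearch(field, b=upper+1)
    return upper - lower + 1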
This I used for testing:
import random
field = list({ random.randint(-10, 30) for i in range(30) })
field.sort()
upper = binsearch(field, lower=False)
lower = binsearch(field, b=upper+1)
for i, f in enumerate(field):
    print(lower <= i <= upper, i == f, i, f)
Assuming negative integers are OK:
I think the key is that if you get a value less than your index, you know all indices to the left also do not match their value (since the integers are strictly increasing). Also, once you get an index whose value is greater than the index, everything to the right is incorrect (same reason). You can then do a divide and conquer algorithm like you did in the first case. Something along the lines of:
check middle index:
    if equal:
        count = count + 1
        check both halves, minus this index
    elif value > index:
        check left side (lower to index)
    elif index > value:
        check right side (index to upper)
In the worst case (every index matches the value), we still have to check every index.
If the integers are non-negative, then you know even more. You now also know that if an index matches the value, all indices to the left must also match the value (why?). Thus, you get:
check middle index:
    if equal:
        count = count + indices to the left (index-lower)
        check the right side (index to upper)
    elif value > index:
        check left side (lower to index)
    elif index > value:
        ##Can't happen in this case
Now our worst case is significantly improved. Instead of finding an index that matches and not gaining any new information from it, we gain a ton of information when we find one that matches, and now know half of the indices match.
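The observation behind both cases can be sketched directly: for a strictly increasing list of ints, L[i] - i never decreases as i grows, so the matching indices form one contiguous block whose two ends can each be found by binary search. The code below is an illustrative sketch under that assumption (negative values allowed); first_index_where and count_index_matches are names invented here.

```python
def first_index_where(L, predicate):
    """Smallest index i with predicate(i) true, or len(L) if there is none.
    Assumes predicate is false for a prefix of indices and true afterwards."""
    lo, hi = 0, len(L)
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def count_index_matches(L):
    """Number of indices i with L[i] == i, for strictly increasing ints."""
    start = first_index_where(L, lambda i: L[i] - i >= 0)  # first i with L[i] >= i
    end = first_index_where(L, lambda i: L[i] - i > 0)     # first i with L[i] > i
    return end - start


print(count_index_matches([-3, 0, 2, 3, 4, 9]))
```

Both searches are O(log n), so the count is O(log n) even when every index matches.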
If "all of the numbers are ints and they appear only once", then you can simply do a binary search for the first pair of numbers where L[i]==i && L[i+1]!=i+1.
To allow negative ints, check if L[0]<0, and if so, search between 1..N for:
i>0 && L[i]==i && L[i-1]!=i-1. Then perform the previous search between i and N.
