Binary search: weird middle point calculation

Regarding the calculation of the list midpoint: why is it
i = (first + last) // 2
with last initialized to len(a_list) - 1? From my quick tests, this algorithm works correctly without the -1.
def binary_search(a_list, item):
    """Performs iterative binary search to find the position of an integer in a given, sorted, list.
    a_list -- sorted list of integers
    item -- integer you are searching for the position of
    """
    first = 0
    last = len(a_list) - 1
    while first <= last:
        i = (first + last) / 2
        if a_list[i] == item:
            return '{item} found at position {i}'.format(item=item, i=i)
        elif a_list[i] > item:
            last = i - 1
        elif a_list[i] < item:
            first = i + 1
    else:
        return '{item} not found in the list'.format(item=item)

The last legal index is len(a_list) - 1. The algorithm will work correctly, as first will always be no more than this, so that the truncated mean will never go out of bounds. However, without the -1, the midpoint computation will be one larger than optimum about half the time, resulting in a slight loss of speed.

Consider the case where the item you're searching for is greater than all the elements of the list. In that case the statement first = i + 1 gets executed repeatedly. Finally you get to the last iteration of the loop, where first == last. In that case i is also equal to last, but if last had been initialized to len(a_list), then i is off the end of the list! The first if statement will then fail with an index out of range error.
See for yourself: https://ideone.com/yvdTzo
You have another error in that code too, but I'll let you find it for yourself.
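A minimal Python 3 sketch of the search, assuming you want the index (or -1) back rather than a message string. The buggy_bound parameter is hypothetical, added only to reproduce the failure described above: with last = len(a_list), searching for a value larger than every element eventually reads a_list[len(a_list)]. Note also that in Python 3, / always returns a float, which cannot be used as a list index, so the midpoint is computed with //.

```python
def binary_search(a_list, item, buggy_bound=False):
    """Return the index of item in the sorted a_list, or -1 if it is absent."""
    first = 0
    # The correct upper bound is the last legal index, len(a_list) - 1.
    # buggy_bound=True (hypothetical) reproduces the off-by-one version.
    last = len(a_list) if buggy_bound else len(a_list) - 1
    while first <= last:
        i = (first + last) // 2  # floor division keeps i an int in Python 3
        if a_list[i] == item:
            return i
        elif a_list[i] > item:
            last = i - 1
        else:
            first = i + 1
    return -1


print(binary_search([1, 3, 5], 5))  # 2
print(binary_search([1, 3, 5], 9))  # -1
try:
    binary_search([1, 3, 5], 9, buggy_bound=True)
except IndexError as exc:
    print("IndexError:", exc)  # the off-the-end read
```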

Related

Leetcode jumpgame recursive approach

Given the following problem:
You are given an integer array nums. You are initially positioned at the array's first index, and each element in the array represents your maximum jump length at that position.
Return true if you can reach the last index, or false otherwise.
Example 1:
Input: nums = [2,3,1,1,4]
Output: True
Explanation: Jump 1 step from index 0 to 1, then 3 steps to the last index.
Example 2:
Input: nums = [3,2,1,0,4]
Output: False
Explanation: You will always arrive at index 3 no matter what. Its maximum jump length is 0, which makes it impossible to reach the last index.
I am trying to come up with a recursive solution. This is what I have so far. I am not looking for the optimal solution. I am just trying to solve using recursion for now. If n[i] is 0 I want the loop to go back to the previous loop and continue recursing, but I can't figure out how to do it.
def jumpGame(self, n: []) -> bool:
    if len(n) < 2:
        return True
    for i in range(len(n)):
        for j in range(1, n[i]+1):
            next = i + j
            return self.jumpGame(n[next:])
    return False

If you want to do it recursively and, as you said, it doesn't need to be optimal (so no memoization), you could use the method below. You don't need nested loops.
There is also no need to explore every path: you can skip jumps that would overshoot the end by checking that i + jump < n.
def jumpGame(a, i):
    if i > len(a) - 1:
        return False
    if i == len(a) - 1:
        return True
    reached = False
    for j in range(1, a[i] + 1):
        if i + j < len(a):
            reached = jumpGame(a, i + j)
            if reached:
                return True
    return reached
print(jumpGame([2, 3, 1, 1, 4], 0))
print(jumpGame([3,2,1,0,4], 0))
True
False
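For comparison, here is a sketch of the same index-based recursion with memoization added through functools.lru_cache, which the answer above deliberately skips. can_jump and reach are hypothetical names. Caching means each index is solved at most once, so the total work is bounded by the sum of the jump lengths instead of growing with the number of paths; very long inputs can still hit Python's default recursion limit.

```python
from functools import lru_cache


def can_jump(nums):
    """Return True if the last index of nums is reachable from index 0."""
    n = len(nums)
    if n == 0:
        return False  # no last index to reach

    @lru_cache(maxsize=None)
    def reach(i):
        # Base case: we are standing on the last index
        if i == n - 1:
            return True
        # Recursive case: try each jump length that stays inside the list
        for j in range(1, nums[i] + 1):
            if i + j < n and reach(i + j):
                return True
        return False

    return reach(0)


print(can_jump([2, 3, 1, 1, 4]))  # True
print(can_jump([3, 2, 1, 0, 4]))  # False
```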

When considering recursive solutions, the first thing you should consider is the 'base case', followed by the 'recursive case'. The base case is just 'what is the smallest form of this problem for which I can determine an answer', and the recursive case is 'can I get from some form n of this problem to some form n - 1'.
That's a bit pedantic, but let's apply it to your situation. What is the base case? It is a list of length 1: you are already standing on the last index, so you can return True. If you have a list of length 0, there is no last index and you can return False. That would simply be:
if len(n) == 0:
    return False
if len(n) == 1:
    return True
Since we don't care what is in the last index, only about arriving at it, these if statements handle our base cases.
Now for the recursive step. Assuming you have a list of length n, we must consider how to reduce the size of the problem. We do this by making a 'jump', and we know that we can make a jump of any length up to the value at the current index. Then we just need to test each of these lengths. If any of them returns True, we succeed.
any(jump_game(n[jump:]) for jump in range(1, n[0] + 1))
There are two mechanisms we are using here to make this easy. The first is any, which takes an iterable and stops as soon as one value is true, returning True; if none of them are true, it returns False. The second is the list slice n[jump:], which takes a slice of the list from index jump to the end. This might result in an empty list.
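A small demonstration of both mechanisms, using a hypothetical helper is_big that records every value it is called with:

```python
calls = []


def is_big(x):
    calls.append(x)  # record every value any() actually evaluates
    return x > 1


print(any(is_big(x) for x in [0, 2, 5]))  # True
print(calls)  # [0, 2]: any() stopped at the first True, so 5 was never checked
print(any(is_big(x) for x in []))  # False: an empty iterable has no true value
print([2, 3, 1][5:])  # []: slicing past the end gives an empty list, not an IndexError
```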
Putting this together we get:
def jump_game(n: list) -> bool:
    # Base cases
    if len(n) == 0:
        return False
    if len(n) == 1:
        return True
    # Recursive case
    return any(jump_game(n[jump:]) for jump in range(1, n[0] + 1))
The results:
>>> jump_game([2,3,1,1,4])
True
>>> jump_game([3,2,1,0,1])
False
>>> jump_game([])
False
>>> jump_game([1])
True
I'm trying to lay out the rigorous approach here, because I think it helps to clarify where recursion goes wrong. In your recursive case you do need to iterate through your options - but that is only one loop, not the two you have. In your solution, in each recursion, you're iterating (for i in range(len(n))) through the entire list. So, you're really hiding an iterative solution inside a recursive one. Further, your base case is wrong, because a list of length 0 is considered a valid solution - but in fact, only a list of length 1 should return a True result.
What you should focus on for recursion is, again, solving the smallest possible form(s) of the problem. Here, that is a list of length one or zero. Then you need to reduce every other possible size of the problem (length of the list) step by step to a base case. We know we can do that by examining the first element and choosing to jump any distance up to that value. This gives us our options. We try each in turn until a solution is found, or until the options are exhausted and we can confidently say there is no solution.

Binary search on array with duplicate

First time posting here, so apologies in advance if I am not following best practices. My algorithm is supposed to do the following in a sorted array with possible duplicates.
Return -1 if the element does not exist in the array
Return the smallest index where the element is present.
I have written a binary search algorithm for an array without duplicates. It returns a position of the element or -1. Based on black-box testing, I know that the non-duplicate version of the binary search works. I then call that function recursively from another function, searching from 0 to position-1 to find the first occurrence of the element, if any.
I am currently failing a black box test. I am getting a wrong answer error and not a time out error. I have tried most of the corner cases that I could think of and also ran a brute force test with the naive search algorithm and could not find an issue.
I am looking for some guidance on what might be wrong in the implementation rather than an alternate solution.
The format is as follows:
Input:
5 #array size
3 4 7 7 8 #array elements need to be sorted
5 #search query array size
3 7 2 8 4 #query elements
Output:
0 2 -1 4 1
My code is shown below:
import numpy as np

class BinarySearch:
    def __init__(self,input_list,query):
        self.array=input_list
        self.length=len(input_list)
        self.query=query
        return
    def binary_search(self,low,high):
        '''
        Implementing the binary search algorithm with distinct numbers on a
        sorted input.
        '''
        #trivial case
        if (self.query<self.array[low]) or (self.query>self.array[high-1]):
            return -1
        elif (low>=high-1) and self.array[low]!=self.query:
            return -1
        else:
            m=low+int(np.floor((high-low)/2))
            if self.array[low]==self.query:
                return low
            elif (self.array[m-1]>=self.query):
                return self.binary_search(low,m)
            elif self.array[high-1]==self.query:
                return high-1
            else:
                return self.binary_search(m,high)
        return
class DuplicateBinarySearch(BinarySearch):
    def __init__(self,input_list,query):
        BinarySearch.__init__(self,input_list,query)
    def handle_duplicate(self,position):
        '''
        Function handles the duplicate number problem.
        Input: position where query is identified.
        Output: updated earlier position if it exists else return
        original position.
        '''
        if position==-1:
            return -1
        elif position==0:
            return 0
        elif self.array[position-1]!=self.query:
            return position
        else:
            new_position=self.binary_search(0,position)
            if new_position==-1 or new_position>=position:
                return position
            else:
                return self.handle_duplicate(new_position)
    def naive_duplicate(self,position):
        old_position=position
        if position==-1:
            return -1
        else:
            while position>=0 and self.array[position]==self.query:
                position-=1
            if position==-1:
                return old_position
            else:
                return position+1
if __name__ == '__main__':
    num_keys = int(input())
    input_keys = list(map(int, input().split()))
    assert len(input_keys) == num_keys
    num_queries = int(input())
    input_queries = list(map(int, input().split()))
    assert len(input_queries) == num_queries
    for q in input_queries:
        item=DuplicateBinarySearch(input_keys,q)
        #res=item.handle_duplicate(item.binary_search(0,item.length))
        #res=item.naive_duplicate(item.binary_search(0,item.length))
        #assert res_check==res
        print(item.handle_duplicate(item.binary_search(0,item.length)), end=' ')
        #print(item.naive_duplicate(item.binary_search(0,item.length)), end=' ')
When I run the naive duplicate algorithm, I get a time-out error:
Failed case #56/57: time limit exceeded (Time used: 10.00/5.00, memory used: 42201088/536870912.)
When I run the binary search with duplicate algorithm, I get a wrong answer error on a different test case:
Failed case #24/57: Wrong answer
(Time used: 0.11/5.00, memory used: 42106880/536870912.)
The problem statement is as follows:
[Image: Problem Statement]
Update:
I could make the code work by making the following change but I have not been able to create a test case to see why the code would fail in the first case.
The original binary search function works with no duplicates but fails an unknown edge case when handle_duplicate calls it recursively. I changed the binary search function to the following:
def binary_search(self,low,high):
    '''
    Implementing the binary search algorithm with distinct numbers on a sorted input.
    '''
    #trivial case
    if (low>=high-1) and self.array[low]!=self.query:
        return -1
    elif (self.query<self.array[low]) or (self.query>self.array[high-1]):
        return -1
    else:
        m=low+(high-low)//2
        if self.array[low]==self.query:
            return low
        elif (self.array[m-1]>=self.query):
            return self.binary_search(low,m)
        elif self.array[m]<=self.query:
            return self.binary_search(m,high)
        elif self.array[high-1]==self.query:
            return high-1
        else:
            return -1

Since you are implementing binary search recursively, I would suggest adding a variable, result, which acts as the return value and holds the most recent index whose value equals the target.
Here is an example:
def binarySearchRecursive(nums, left, right, target, result):
    """
    This is your exit point.
    If the target is not found, result will be -1 since it won't change from initial value.
    If the target is found, result will be the index of the first occurrence of the target.
    """
    if left > right:
        return result
    # Overflow-safe midpoint (a habit from fixed-width integer languages; Python ints cannot overflow)
    mid = left + (right - left) // 2
    if nums[mid] == target:
        # We are not sure if this is the first occurrence of the target.
        # So we will store the index to the result now, and keep checking.
        result = mid
        # Since we are looking for "first occurrence", we discard right half.
        return binarySearchRecursive(nums, left, mid - 1, target, result)
    elif target < nums[mid]:
        return binarySearchRecursive(nums, left, mid - 1, target, result)
    else:
        return binarySearchRecursive(nums, mid + 1, right, target, result)

if __name__ == '__main__':
    nums = [2,4,4,4,7,7,9]
    target = 4
    (left, right) = (0, len(nums)-1)
    result = -1 # Initial value
    index = binarySearchRecursive(nums, left, right, target, result)
    if index != -1:
        print(index)
    else:
        print('Not found')
In your updated version, I still find the exit point of your function a little unintuitive (your "trivial case" section).
The only condition under which your search should stop is that you have searched every possible section of the list, that is, when the search range is empty and there is no element left to check. In the implementation, that is when left > right (or high < low) is true.
The result variable is initialized to -1 when the function is first called from main, and it won't change if no match is found. After each successful match, since we cannot be sure it is the first occurrence, we store its index in result. If there are more matches further left, the value will be updated; if not, the value will eventually be returned. If the target is not in the list, the return value will be -1, its initial value.
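Since the question mentions comparing against a brute-force search, here is a sketch of that kind of randomized stress test. It uses the standard library's bisect.bisect_left, which returns the leftmost insertion point (the first occurrence when the value is present), as the candidate. first_occurrence, naive_first_occurrence and stress_test are hypothetical names; the small value range is chosen on purpose so duplicates are frequent. Any other candidate with the same (arr, target) signature can be passed to stress_test instead.

```python
import bisect
import random


def first_occurrence(arr, target):
    """Smallest index of target in sorted arr, or -1 if it is absent."""
    pos = bisect.bisect_left(arr, target)  # leftmost insertion point
    return pos if pos < len(arr) and arr[pos] == target else -1


def naive_first_occurrence(arr, target):
    """Brute-force reference: linear scan from the left."""
    return arr.index(target) if target in arr else -1


def stress_test(candidate, trials=2000, seed=0):
    """Compare candidate against the naive version on small random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small value range so that duplicates show up often
        arr = sorted(rng.randint(0, 5) for _ in range(rng.randint(1, 10)))
        target = rng.randint(-1, 6)
        expected = naive_first_occurrence(arr, target)
        got = candidate(arr, target)
        assert got == expected, (arr, target, got, expected)


stress_test(first_occurrence)
# The sample from the question: array 3 4 7 7 8, queries 3 7 2 8 4
print([first_occurrence([3, 4, 7, 7, 8], q) for q in [3, 7, 2, 8, 4]])  # [0, 2, -1, 4, 1]
```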

Always have this error 'IndexError: string index out of range', when i have taken into consideration of index range

I need to write a program that prints the longest substring of s in which the letters occur in alphabetical order.
For example, for s = 'onsjdfjqiwkvftwfbx', it should return 'dfjq'.
As a beginner, I wrote the code below:
y=()
z=()
for i in range(len(s)-1):
    letter=s[i]
    while s[i]<=s[i+1]:
        letter+=s[i+1]
        i+=1
    y=y+(letter,)
    z=z+(len(letter),)
print(y[z.index(max(z))])
However, the code above always raises
IndexError: string index out of range.
It only produces the desired result once I change it to range(len(s)-3).
I would like advice on:
1. Why does range(len(s)-1) lead to this error? To handle index i+1, I have already reduced the range value by 1.
My rationale is: if the length of s is 14, it has indices 0-13, and range(14) produces values 0-13. Since my code also uses index i+1, I reduced the range by 1 to take care of that.
2. How can I amend the code above to produce the correct result?
If s = 'abcdefghijklmnopqrstuvwxyz', the code above with range(len(s)-3) raises IndexError: string index out of range again. Why? What's wrong with this code?
Any help is appreciated~

The reason for the out-of-range index is that in your inner while loop you are advancing i without checking its range. Your code is also very inefficient, as you have nested loops and you do a lot of relatively expensive string concatenation. A linear-time algorithm without concatenation would look something like this:
s = 'onsjdfjqiwkvftwfbcdefgxa'
# Start by assuming the longest substring is the first letter
longest_end = 1  # exclusive end index, so the slice below is s[0:1] at first
longest_length = 1
length = 1
for i in range(1, len(s)):
    if s[i] >= s[i - 1]:
        # If the current character is in order after the previous one, increment the current length
        length += 1
        if length > longest_length:
            # If the current length is longer than the previous maximum, remember its position
            longest_end = i + 1
            longest_length = length
    else:
        # If not in alphabetical order, reset the current length
        length = 1
print(s[longest_end - longest_length:longest_end])
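To connect this back to the original code, here is a sketch that keeps the question's nested-loop idea but checks i + 1 < len(s) before reading s[i + 1], which is exactly the check the inner while loop was missing. longest_alphabetical is a hypothetical name. It slices instead of concatenating, but it is still quadratic in the worst case (an already sorted string), so the linear version above is preferable for long inputs.

```python
def longest_alphabetical(s):
    """Longest substring of s whose letters are in (non-decreasing) alphabetical order."""
    best = ''
    for start in range(len(s)):
        i = start
        # Bounds check first, so s[i + 1] is only read when it exists
        while i + 1 < len(s) and s[i] <= s[i + 1]:
            i += 1
        # s[start:i + 1] is the ordered run beginning at start
        if i - start + 1 > len(best):
            best = s[start:i + 1]
    return best


print(longest_alphabetical('onsjdfjqiwkvftwfbx'))  # dfjq
print(longest_alphabetical('abcdefghijklmnopqrstuvwxyz'))  # the whole alphabet, no IndexError
```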

Regarding "1":
For this particular string, using range(len(s)-2) would also work.
The reason range(len(s)-1) breaks:
For 'onsjdfjqiwkvftwfbx', len(s) is 18, so the maximum index you can refer to is 17 (since indexing starts at 0).
When you loop through the values of i, at some point the inner while loop increases i to 17 (which corresponds to len(s)-1) and then tries to access s[i+1] in its comparison, which is impossible.
Regarding "2":
The following should work:
current_output = ''
biggest_output = ''
for letter in s:
    if current_output == '':
        current_output += letter
    else:
        if current_output[-1]<=letter:
            current_output += letter
        else:
            if len(current_output) > len(biggest_output):
                biggest_output = current_output
            current_output = letter
if len(current_output) > len(biggest_output):
    biggest_output = current_output
print(biggest_output)

Improving the time complexity of a function that returns the index of the first occurrence of an element in a list

UPDATE 1 (Oct. 16): The original code had a few logic errors, which have been fixed. The updated code below should now produce the correct output for all lists L that meet the criteria for a special list.
I am trying to decrease the running time of the following function:
The "firstrepeat" function takes a special list L and an index i, and returns the smallest index j such that L[j] == L[i]. In other words, whatever the element L[i] is, the "firstrepeat" function returns the index of the first occurrence of this element in the list.
What is special about the list L?:
The list may contain repeated elements on the decreasing side of the list or on the increasing side, but not both, e.g. [3,2,1,1,1,5,6] is fine but [4,3,2,2,1,2,3] is not.
The list is decreasing (or staying the same) and then increasing (or staying the same).
Examples:
L = [4,2,0,1,3]
L = [3,3,3,1,0,7,8,9,9]
L = [4,3,3,1,1,1]
L = [1,1,1,1]
Example Output:
Say we have L = [4,3,3,1,1,1]
firstrepeat(L,2) would output 1
firstrepeat(L,5) would output 3
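As a baseline for testing any faster version, the definition can be checked with a plain linear scan. firstrepeat_naive is a hypothetical name; it runs in O(n) because list.index scans from the left.

```python
def firstrepeat_naive(L, i):
    """O(n) reference: the first index whose value equals L[i]."""
    return L.index(L[i])  # list.index returns the first matching position


L = [4, 3, 3, 1, 1, 1]
print(firstrepeat_naive(L, 2))  # 1
print(firstrepeat_naive(L, 5))  # 3
```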
I have the following code. I believe the complexity is O(log n) or better (though I could be missing something). I am looking for ways to improve the time complexity.
def firstrepeat(L, i):
    left = 0
    right = i
    doubling = 1
    #A Doubling Search
    #In doubling search, we start at one index and then we look at one step
    #forward, then two steps forward, then four steps, then 8, then 16, etc.
    #Once we have gone too far, we do a binary search on the subset of the list
    #between where we started and where we went too far.
    while True:
        if (right - doubling) < 0:
            left = 0
            break
        if L[i] != L[right - doubling]:
            left = right - doubling
            break
        if L[i] == L[right - doubling]:
            right = right - doubling
            doubling = doubling * 2
    #A generic Binary search
    while right - left > 1:
        median = (left + right) // 2
        if L[i] != L[median]:
            left = median
        else:
            right = median
    if L[left] == L[right]:
        return left
    else:
        return right
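A rough cost bound for the code above, assuming (as the special-list rules suggest) that all copies of L[i] at or before index i form one contiguous run. Let j be the first index with L[j] = L[i] and d = i - j. After t successful doubling steps, right has moved back by 2^t - 1 and the next probe lands at p_t below; the loop stops at the first probe that leaves the run, and the binary search then works on an interval of length at most 2^T:

$$
p_t = i - \left(2^{t+1} - 1\right), \qquad t = 0, 1, 2, \dots
$$
$$
T = \min\{\, t : 2^{t+1} - 1 > d \,\} \;\Longrightarrow\; 2^{T} - 1 \le d \;\Longrightarrow\; T \le \log_2(d + 1)
$$
$$
\text{comparisons} = \underbrace{O(T)}_{\text{doubling}} + \underbrace{O\!\left(\log_2 2^{T}\right)}_{\text{binary search}} = O\big(\log(d + 1)\big) \subseteq O(\log n)
$$

So under that assumption the function is O(log n) in the worst case and O(1) when L[i - 1] differs from L[i].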

Find all elements that appear more than n/4 times in a sorted array

My question is similar to "Find all elements that appear more than n/4 times in linear time": you are given an array of size n and must find all elements that appear more than n/4 times. The difference is that here the array is sorted and the running time should be better than O(n).
My approach is to do 3 binary searches for the first occurrence of the elements at positions n/4, n/2 and 3n/4. Since the array is sorted, we can tell whether each of these elements appears more than n/4 times by checking whether the element n/4 positions later has the same value.
I have written the following code in Python 3. Do you think my approach is correct, and is there anything that can be simplified?
import bisect
# return -1 if x doesn't exist in arr
def binary_search(arr, x):
    pos = bisect.bisect_left(arr, x)
    return pos if pos != len(arr) and arr[pos] == x else -1
def majority(arr):
    n = len(arr)
    output = []
    quarters = [arr[n//4],arr[n//2],arr[3*n//4]]
    # avoid repeating answer in output array
    if arr[n//4] == arr[n//2]:
        quarters.remove(arr[n//4])
        quarters.remove(arr[n//2])
        output.append(arr[n//2])
    if arr[n//2] == arr[3*n//4]:
        if arr[n//2] in arr:
            quarters.remove(quarters[n//2])
        if arr[3*n//4] in arr:
            quarters.remove(quarters[3*n//4])
        if arr[n//2] not in output:
            output.append(arr[n//2])
    for quarter in quarters:
        pos = binary_search(arr, quarter)
        if pos != -1 and pos+n//4 < len(arr) and arr[pos] == arr[pos+n//4]:
            output.append(arr[pos])
    return output
print(majority([1,1,1,6,6,6,9,145]))

I think what you want is more like:
Examine the element at position n/4
Do a binary search to find the first occurrence of that item.
Do a binary search to find the last occurrence of that item.
If last - first + 1 > n/4 (that is, the item occurs more than n/4 times), output it.
Repeat that process for n/2 and 3n/4.
There is an early-out opportunity if the previous item's run extends beyond the next n/4 marker.
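A sketch of these steps in Python, assuming the end of each run is found with bisect.bisect_right, which returns one past the last copy, so the count is simply end - first. frequent_elements is a hypothetical name. Any value occurring more than n/4 times in a sorted array must cover one of the positions n//4, n//2 or 3n//4, so checking those three candidates is enough, and each check costs two O(log n) searches.

```python
import bisect


def frequent_elements(arr):
    """Return the values that occur more than len(arr) / 4 times in sorted arr."""
    n = len(arr)
    result = []
    # Any run longer than n/4 must cover one of these three positions.
    for pos in (n // 4, n // 2, 3 * n // 4):
        if pos >= n:
            continue  # only happens for an empty array
        value = arr[pos]
        if result and result[-1] == value:
            continue  # early out: this run has already been reported
        first = bisect.bisect_left(arr, value)   # first occurrence
        end = bisect.bisect_right(arr, value)    # one past the last occurrence
        if (end - first) * 4 > n:                # count > n/4, without floats
            result.append(value)
    return result


print(frequent_elements([1, 1, 1, 6, 6, 6, 9, 145]))  # [1, 6]
```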

I would make the following improvement. Take the 9 values at positions 0, n/8, n/4, 3n/8, ..., n (using n - 1 for the last one). You only need to consider values that appear at least twice among these samples.
When you do the binary search, you can do both ends in tandem. That way if the most common value is much smaller or much larger than 1/4 then you don't do most of the binary search.
