Is my code's worst-case time complexity O(log n)? - python

The method foo gets as a parameter a sorted list of distinct numbers and returns the count of all the occurrences such that i == list[i] (where i is an index, 0 <= i < len(list)).
def foo_helper(lst, start, end):
    if start > end:
        # end of recursion
        return 0
    if lst[end] < end or lst[start] > start:
        # no point checking this part of the list
        return 0
    # all indexes must be equal to their values
    if abs(end - start) == lst[end] - lst[start]:
        return end - start + 1
    middle = (end + start) // 2
    print(lst[start:end+1], start, middle, end)
    if lst[middle] == middle:
        #print("lst[" , middle , "]=", lst[middle])
        return 1 + foo_helper(lst, middle+1, end) + foo_helper(lst, start, middle-1)
    elif lst[middle] < middle:
        return foo_helper(lst, middle+1, end)
    else:
        return foo_helper(lst, start, middle-1)

def foo(lst):
    return foo_helper(lst, 0, len(lst)-1)
My question is: is this code's worst-case complexity O(log n)?
If not, what should I do differently?

If you have a list of N numbers, all unique, and known to be sorted, then if list[0] == 0 and list[N-1] == N-1, then the uniqueness and ordering properties dictate that the entire list meets the property that list[i] == i. This can be determined in O(1) time - just check the first and last list entries.
The uniqueness and ordering properties force any list to have three separate regions - a possibly empty prefix region where list[i] < i, a possibly empty middle region where list[i] == i, and a possibly empty suffix region where list[i] > i. In the general case, finding the middle region requires O(n) time - a scan from the front to find the first index where list[i] == i, and a scan from the back to find the last such index (or you could do both with one single forward scan). Once you find those, you are guaranteed by uniqueness and ordering that all the indexes in between will have the same property.
Edit: As pointed out by @tobias_k below, you could also do a binary search to find the two end points, which would be O(log n) instead of O(n). This would be the better option if your inputs are completely general.
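To make that edit concrete, here is a rough sketch of the two binary searches (the helper name count_fixed_points is mine, purely illustrative). It relies on the fact that for a sorted list of unique integers, lst[i] - i never decreases, so each boundary can be located with a standard monotone-predicate binary search:
def count_fixed_points(lst):
    # Sketch only: count indices i with lst[i] == i, assuming lst is sorted
    # and contains unique integers (so lst[i] - i is non-decreasing).
    n = len(lst)
    # First index where lst[i] >= i (start of the middle region, if any).
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid] < mid:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # First index where lst[i] > i (one past the end of the middle region).
    lo, hi = first, n
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid] > mid:
            hi = mid
        else:
            lo = mid + 1
    return lo - first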

To expand on my comment trying to think about this problem: consider the graph of the identity function, which represents the indices. We want to know where this sorted list (a strictly monotonic function) intersects the line representing the indices, y = x, considering only integer locations. I think you should be able to find this in O(n) time (as commented, it seems binary search for the intersection bounds should work), though I need to look at your code more closely to see what it's doing.
Because we have a sorted list with unique elements, i == list[i] holds either
- at no place,
- at one place,
- or, if it holds at multiple places, those places must be consecutive (once you're above the line you can never come back down).
Code used:
import numpy as np
import matplotlib.pyplot as plt
a = np.unique(np.random.randint(-25, 50, 50))
indices = range(len(a))
plt.scatter(indices, indices, c='b')
plt.scatter(indices, a, c='r')
plt.show()

Improving the time complexity of a function that returns the index of the first occurrence of an element in a list

UPDATE 1 (Oct. 16): The original code had a few logic errors which have been rectified. The updated code below should now produce the correct output for all lists L, provided they meet the criteria for a special list.
I am trying to decrease the running time of the following function:
The "firstrepeat" function takes in a special list L and an index, and produces the smallest index such that L[i] == L[j]. In other words, whatever the element at L[i] is, the "firstrepeat" function returns the index of the first occurrence of this element in the list.
What is special about the list L?
The list may contain repeated elements on the increasing side of the list, or the decreasing side, but not both; e.g. [3,2,1,1,1,5,6] is fine but not [4,3,2,2,1,2,3].
The list is decreasing (or staying the same) and then increasing (or staying the same).
Examples:
L = [4,2,0,1,3]
L = [3,3,3,1,0,7,8,9,9]
L = [4,3,3,1,1,1]
L = [1,1,1,1]
Example Output:
Say we have L = [4,3,3,1,1,1]
firstrepeat(L,2) would output 1
firstrepeat(L,5) would output 3
I have the following code. I believe the complexity is O(log n) or better (though I could be missing something). I am looking for ways to improve the time complexity.
def firstrepeat(L, i):
    left = 0
    right = i
    doubling = 1
    # A doubling search:
    # in doubling search, we start at one index and then we look at one step
    # forward, then two steps forward, then four steps, then 8, then 16, etc.
    # Once we have gone too far, we do a binary search on the subset of the list
    # between where we started and where we went too far.
    while True:
        if (right - doubling) < 0:
            left = 0
            break
        if L[i] != L[right - doubling]:
            left = right - doubling
            break
        if L[i] == L[right - doubling]:
            right = right - doubling
            doubling = doubling * 2
    # A generic binary search
    while right - left > 1:
        median = (left + right) // 2
        if L[i] != L[median]:
            left = median
        else:
            right = median
    if L[left] == L[right]:
        return left
    else:
        return right

Binary search: weird middle point calculation

Regarding the calculation of the list mid-point: why is it
i = (first + last) // 2
and why is last initialized to len(a_list) - 1? From my quick tests, this algorithm works correctly without the -1.
def binary_search(a_list, item):
    """Performs iterative binary search to find the position of an integer in a given, sorted, list.
    a_list -- sorted list of integers
    item -- integer you are searching for the position of
    """
    first = 0
    last = len(a_list) - 1
    while first <= last:
        i = (first + last) / 2
        if a_list[i] == item:
            return '{item} found at position {i}'.format(item=item, i=i)
        elif a_list[i] > item:
            last = i - 1
        elif a_list[i] < item:
            first = i + 1
        else:
            return '{item} not found in the list'.format(item=item)
The last legal index is len(a_list) - 1. The algorithm will work correctly, as first will always be no more than this, so that the truncated mean will never go out of bounds. However, without the -1, the midpoint computation will be one larger than optimum about half the time, resulting in a slight loss of speed.
Consider the case where the item you're searching for is greater than all the elements of the list. In that case the statement first = i + 1 gets executed repeatedly. Finally you get to the last iteration of the loop, where first == last. In that case i is also equal to last, but if last = len(a_list) then i is off the end of the list! The first if statement will fail with an index out of range.
See for yourself: https://ideone.com/yvdTzo
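In case that link ever goes stale, here is a minimal reproduction of the out-of-range failure; it is a simplified variant of the code above (written for Python 3, hence //, and returning an index instead of a string):
def bad_binary_search(a_list, item):
    first = 0
    last = len(a_list)            # off by one: the last legal index is len(a_list) - 1
    while first <= last:
        i = (first + last) // 2
        if a_list[i] == item:     # blows up once first == last == len(a_list)
            return i
        elif a_list[i] > item:
            last = i - 1
        else:
            first = i + 1
    return None

bad_binary_search([1, 3, 5], 99)  # raises IndexError: list index out of range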
You have another error in that code too, but I'll let you find it for yourself.

Find all elements that appear more than n/4 times in a sorted array

My question is similar to Find all elements that appear more than n/4 times in linear time: you are given an array of size n and must find all elements that appear more than n/4 times. The difference is that here the array is sorted and the runtime should be better than O(n).
My approach is to do 3 binary searches for the first occurrence of the elements at positions n/4, n/2 and 3*n/4. Since the array is sorted, we can tell whether each of those elements appears more than n/4 times by checking whether the element n/4 positions later has the same value.
I have written the following code in Python 3. Do you think my approach is correct, and is there anything that can be simplified?
import bisect

# return -1 if x doesn't exist in arr
def binary_search(arr, x):
    pos = bisect.bisect_left(arr, x)
    return pos if pos != len(arr) and arr[pos] == x else -1

def majority(arr):
    n = len(arr)
    output = []
    quarters = [arr[n//4], arr[n//2], arr[3*n//4]]
    # avoid repeating answer in output array
    if arr[n//4] == arr[n//2]:
        quarters.remove(arr[n//4])
        quarters.remove(arr[n//2])
        output.append(arr[n//2])
    if arr[n//2] == arr[3*n//4]:
        if arr[n//2] in arr:
            quarters.remove(quarters[n//2])
        if arr[3*n//4] in arr:
            quarters.remove(quarters[3*n//4])
        if arr[n//2] not in output:
            output.append(arr[n//2])
    for quarter in quarters:
        pos = binary_search(arr, quarter)
        if pos != -1 and pos+n//4 < len(arr) and arr[pos] == arr[pos+n//4]:
            output.append(arr[pos])
    return output

print(majority([1,1,1,6,6,6,9,145]))
I think what you want is more like:
Examine the element at position n/4
Do a binary search to find the first occurrence of that item.
Do a binary search to find the next occurrence of that item.
If last-first > n/4, then output it.
Repeat that process for n/2 and 3n/4 (a bisect-based sketch of these steps follows below).
There is an early out opportunity if the previous item extends beyond the next n/4 marker.
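A rough, bisect-based sketch of those steps (the function name is illustrative; bisect_right is used here to stand in for "the next occurrence", since it returns the position one past the last occurrence, so last - first is the count):
from bisect import bisect_left, bisect_right

def more_than_quarter(arr):
    n = len(arr)
    out = []
    for pos in (n // 4, n // 2, 3 * n // 4):
        candidate = arr[pos]
        first = bisect_left(arr, candidate)   # first occurrence
        last = bisect_right(arr, candidate)   # one past the last occurrence
        if last - first > n // 4 and candidate not in out:
            out.append(candidate)
    return out

print(more_than_quarter([1, 1, 1, 6, 6, 6, 9, 145]))   # [1, 6]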
I would make the following improvement. Take the 9 values at positions 0, n/8, n/4, 3n/8, ..., n. You only need to consider values that appear at two or more of those sample positions.
When you do the binary search, you can do both ends in tandem. That way if the most common value is much smaller or much larger than 1/4 then you don't do most of the binary search.
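One possible reading of that improvement, again only a sketch (sampling positions are clamped to the last index, and every candidate is still verified with two binary searches, so a false positive from the clamping is harmless):
from bisect import bisect_left, bisect_right

def over_quarter_sampled(arr):
    n = len(arr)
    if n == 0:
        return []
    # Sample 9 positions: 0, n/8, 2n/8, ..., n (clamped to n - 1). Any value
    # occurring more than n/4 times must cover two consecutive sample points.
    samples = [arr[min(k * n // 8, n - 1)] for k in range(9)]
    candidates = {samples[k] for k in range(8) if samples[k] == samples[k + 1]}
    out = []
    for v in sorted(candidates):
        first = bisect_left(arr, v)
        last = bisect_right(arr, v)
        if last - first > n // 4:
            out.append(v)
    return out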

recursion, finding max, why is it not stopping?

I tried to find the maximum value in a sorted list, but the recursion is not stopping. Please, can somebody help me?
A = [5,16,28,43,0,1]
start = 0
end = len(A) - 1
mid = 0
print mid

def search(start, end, mid):
    mid = int((start + end) / 2)
    print mid
    if A[mid] > [mid - 1] and A[mid] > A[mid + 1]:
        return A[mid]
    else:
        if A[mid - 1] > A[mid + 1]:
            search(start, mid, mid)
        else:
            search(mid, end, mid)

print search(start, end, mid)
You need to add a "basis case" (where the recursion stops).
A natural basis case for this problem: if start is equal to end, just return A[start]
EDIT:
I just looked at this and the more I look the more confused I get. Why are you using recursion to find a max? It would make more sense to use recursion to do a "binary search" to find a value inside a sorted list.
If you want to really find a max value, that's pretty easy. With recursion, we first want a "basis case" that gives us a trivial solution; then we want more code that will take us one step closer to that solution.
In this case, the basis case: we have only one value in the list; return it as the max. To be specific, if the start and end together specify just one value, return that one value. To make it proof against errors, might as well make this also handle the case where start is equal to or even greater than end.
Next remember the first value.
Next make a recursive call, but add one to start to reduce the size of the list we are considering. This is the part that takes us one step closer to a solution. Repeat this step enough times and we arrive at the basis case where there is only one value in the list to consider.
Finally compare the remembered first value with the result of the recursive call and return the larger of the two.
I'll lay it out in pseudocode for you:
BASIS CASE: start and end specify one value: return A[start]
save A[0] in a variable
save recursive_call_to_this_function(start+1, end) in a variable
compare two saved values and return the larger
Once you have tried to write the above in code, peek below this line for my working tested solution.
def recursive_max(start, end):
    if start >= end - 1:
        return A[start]
    x0 = A[start]
    x1 = recursive_max(start+1, end)
    if x0 >= x1:
        return x0
    else:
        return x1

print recursive_max(start, end)
I agree with most of steveha's answer, but rather than picking off one element at a time I'd suggest dividing the list into halves and finding the max in each half. You won't do any fewer comparisons, but the growth of the recursive stack will be O(log(len(A))) rather than O(len(A)). For large lists this would be the difference between getting a stack overflow or not.
My implementation (which takes the list as an argument rather than expecting it to be global) follows:
def recursive_max(value_list, start, end):
    if start >= end:
        return value_list[start]
    mid = start + (end - start) // 2
    lower_half_max = recursive_max(value_list, start, mid)
    upper_half_max = recursive_max(value_list, mid+1, end)
    if lower_half_max > upper_half_max:
        return lower_half_max
    else:
        return upper_half_max

Difficulty solving a code in O(log n)

I wrote a function that gets as input a list of unique ints in increasing order (from small to big). I'm supposed to find an index in the list that matches the value at that index; for example, if L[2] == 2 the output is True.
Having done that in O(log n), I now want to count how many indexes behave like that in the given list, with the same O(log n) complexity.
I'm uploading my first code, which does the first part, and the second code, which I need help with:
def steady_state(L):
    lower = 0
    upper = len(L) - 1
    while lower <= upper:
        middle_i = (upper + lower) // 2
        if L[middle_i] == middle_i:
            return middle_i
        elif L[middle_i] > middle_i:
            upper = middle_i - 1
        else:
            lower = middle_i + 1
    return None

def cnt_steady_states(L):
    lower = 0
    upper = len(L) - 1
    a = b = steady_state(L)
    if steady_state(L) == None:
        return 0
    else:
        cnt = 1
        while True:
            if L[upper] == upper and a <= upper:
                cnt += upper - a
                upper = a
            if L[lower] == lower and b >= lower:
                cnt += b - lower
                lower = b
It's not possible with the restrictions you've given yet. The best complexity you can theoretically achieve is O(n).
O() assumes the worst case (just a definition, you could drop that part). And in the worst case you will always have to look at each item in order to check it for being equal to its index.
The case changes if you have more restrictions (e.g. the numbers are all ints and none may appear more than once, i.e. no two consecutive numbers are equal). Maybe this is the case?
EDIT:
After hearing that my assumed restrictions do in fact apply (i.e. the ints appear only once), I now propose this approach: you can safely assume that there is only exactly one continuous range where all your matching entries are located, i.e. you only need to find a lower bound and an upper bound. The wanted result will then be the size of that range.
Each bound can safely be found using a binary search, each of which is O(log n).
def binsearch(field, lower=True, a=0, b=None):
    if b is None:
        b = len(field)
    while a + 1 < b:
        c = (a + b) / 2
        if lower:
            if field[c] < c:
                a = c
            else:
                b = c
        else:  # search for upper bound
            if field[c] > c:
                b = c
            else:
                a = c
    return b if lower else a

def indexMatchCount(field):
    upper = binsearch(field, lower=False)
    lower = binsearch(field, b=upper+1)
    return upper - lower + 1
This I used for testing:
import random

field = list({ random.randint(-10, 30) for i in range(30) })
field.sort()
upper = binsearch(field, lower=False)
lower = binsearch(field, b=upper+1)
for i, f in enumerate(field):
    print lower <= i <= upper, i == f, i, f
Assuming negative integers are OK:
I think the key is that if you get a value less than your index, you know all indices to the left also do not match their value (since the integers are strictly increasing). Also, once you get an index whose value is greater than the index, everything to the right is incorrect (same reason). You can then do a divide and conquer algorithm like you did in the first case. Something along the lines of:
check middle index:
    if equal:
        count = count + 1
        check both halves, minus this index
    elif value > index:
        check left side (lower to index)
    elif index > value:
        check right side (index to upper)
In the worst case (every index matches the value), we still have to check every index.
If the integers are non-negative, then you know even more. You now also know that if an index matches the value, all indices to the left must also match the value (why?). Thus, you get:
check middle index:
    if equal:
        count = count + indices to the left (index - lower)
        check the right side (index to upper)
    elif value > index:
        check left side (lower to index)
    elif index > value:
        ## Can't happen in this case
Now our worst case is significantly improved. Instead of finding an index that matches and not gaining any new information from it, we gain a ton of information when we find one that matches, and now know half of the indices match.
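For the non-negative case described above, the matching indices form a prefix, so the count reduces to a single binary search for the length of that prefix. A minimal sketch, assuming unique, non-negative, strictly increasing ints (which forces lst[i] >= i for every i; the function name is illustrative only):
def count_matches_nonneg(lst):
    # Invariant: every index < lo satisfies lst[i] == i; every index >= hi does not.
    lo, hi = 0, len(lst)
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid] == mid:
            lo = mid + 1
        else:
            hi = mid
    return lo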
If "all of the numbers are ints and they appear only once", then you can simply do a binary search for the first pair of numbers where L[i]==i && L[i+1]!=i+1.
To allow negative ints, check if L[0]<0, and if so, search between 1..N for:
i>0 && L[i]==i && L[i-1]!=i-1. Then perform the previous search between i and N.
