Difficulty solving a problem in O(log n) - Python

I wrote a function that takes as input a list of unique ints sorted in ascending order. I'm supposed to find an index in the list whose value equals the index itself. For example, if L[2]==2, then index 2 is such a match.
Having done that in O(log n), I now want to count how many indexes behave like that in the given list, with the same O(log n) complexity.
I'm posting my first code, which does the first part, and the second code, which I need help with:
    def steady_state(L):
        lower= 0
        upper= len(L) -1
        while lower<=upper:
            middle_i= (upper+ lower)//2
            if L[middle_i]== middle_i:
                return middle_i
            elif L[middle_i]>middle_i:
                upper= middle_i-1
            else:
                lower= middle_i +1
        return None

    def cnt_steady_states(L):
        lower= 0
        upper= len(L) -1
        a=b=steady_state(L)
        if steady_state(L)== None:
            return 0
        else:
            cnt=1
            while True:
                if L[upper] == upper and a<=upper:
                    cnt+= upper-a
                    upper= a
                if L[lower]== lower and b>=lower:
                    cnt+= b- lower
                    lower = b

It's not possible with the restrictions you've given so far. The best complexity you can theoretically achieve is O(n).
O() is taken here to mean the worst case (just a definition, you could drop that part). And in the worst case you will always have to look at each item in order to check whether it equals its index.
The situation changes if you have more restrictions (e.g. the numbers are all ints and none may appear more than once, i.e. no two consecutive numbers are equal). Maybe this is the case?
EDIT:
After hearing that my assumed restrictions do in fact apply (i.e. ints that each appear only once), I now propose this approach: you can safely assume that all your matching entries lie in exactly one contiguous range, so you only need to find a lower bound and an upper bound. The wanted result is then the size of that range.
Each bound can safely be found using a binary search, each of which takes O(log n).
    def binsearch(field, lower=True, a=-1, b=None):
        # The boundary always lies strictly between a and b.
        # a starts at -1 (not 0) so that index 0 is examined as well.
        if b is None:
            b = len(field)
        while a + 1 < b:
            c = (a + b) // 2  # integer division, also under Python 3
            if lower:
                if field[c] < c:
                    a = c
                else:
                    b = c
            else:  # search for upper bound
                if field[c] > c:
                    b = c
                else:
                    a = c
        return b if lower else a

    def indexMatchCount(field):
        upper = binsearch(field, lower=False)
        lower = binsearch(field, b=upper+1)
        return upper - lower + 1
This is what I used for testing:
    import random

    field = list({ random.randint(-10, 30) for i in range(30) })
    field.sort()
    upper = binsearch(field, lower=False)
    lower = binsearch(field, b=upper+1)
    for i, f in enumerate(field):
        print(lower <= i <= upper, i == f, i, f)

Assuming negative integers are OK:
I think the key is that if you get a value less than your index, you know all indices to the left also do not match their value (since the integers are strictly increasing). Also, once you get an index whose value is greater than the index, everything to the right is incorrect (same reason). You can then do a divide and conquer algorithm like you did in the first case. Something along the lines of:
    check middle index:
        if equal:
            count = count + 1
            check both halves, minus this index
        elif value > index:
            check left side (lower to index - 1)
        elif index > value:
            check right side (index + 1 to upper)
In the worst case (every index matches the value), we still have to check every index.
If the integers are non-negative, then you know even more. You now also know that if an index matches the value, all indices to the left must also match the value (why? The values to the left are distinct non-negative integers smaller than the index, so they can only be 0, 1, ..., index - 1 in that order). Thus, you get:
    check middle index:
        if equal:
            count = count + (index - lower + 1)   # this index and everything to its left
            check the right side (index + 1 to upper)
        elif value > index:
            check left side (lower to index - 1)
        elif index > value:
            ## Can't happen in this case
Now our worst case is significantly improved. Instead of finding an index that matches and not gaining any new information from it, we gain a ton of information when we find one that matches: we now know that every index in the left half matches as well.
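As a rough sketch of the non-negative case, assuming the input is sorted, distinct and non-negative: the matching indices then form a prefix of the list, so a single binary search for the length of that prefix is enough. The function name count_fixed_points_nonneg is made up for this example.

```python
def count_fixed_points_nonneg(values):
    """Count indices i with values[i] == i.

    Assumes values is sorted ascending and holds distinct, non-negative ints,
    so values[i] >= i everywhere and all matches sit at the front of the list.
    """
    lo, hi = 0, len(values)
    # Invariant: every index below lo matches, every index from hi on does not.
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] == mid:
            lo = mid + 1   # mid and everything to its left match
        else:
            hi = mid       # values[mid] > mid, so nothing from mid on matches
    return lo


print(count_fixed_points_nonneg([0, 1, 2, 5, 9]))  # 3
print(count_fixed_points_nonneg([1, 2, 3]))        # 0
```

Each iteration halves the remaining range, so this runs in O(log n).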

If "all of the numbers are ints and they appear only once", then you can simply do a binary search for the first pair of numbers where L[i]==i && L[i+1]!=i+1.
To allow negative ints, check if L[0]<0, and if so, search between 1..N for:
i>0 && L[i]==i && L[i-1]!=i-1. Then perform the previous search between i and N.
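Here is one way this could look in code, negative ints included. It is not the exact pair test described above but the same idea written as two boundary searches: because the ints are distinct and sorted, L[i] - i never decreases, so both "first index with L[i] >= i" and "first index with L[i] > i" can be found by binary search. The helper names first_index and count_fixed_points are invented for this sketch.

```python
def first_index(values, pred):
    """Return the first index i where pred(i, values[i]) is true, or len(values).

    Assumes pred is false for a (possibly empty) prefix and true afterwards.
    """
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid, values[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def count_fixed_points(values):
    """Count indices i with values[i] == i in a sorted list of distinct ints."""
    start = first_index(values, lambda i, v: v >= i)  # first i with L[i] >= i
    stop = first_index(values, lambda i, v: v > i)    # first i with L[i] > i
    return stop - start


print(count_fixed_points([-3, -1, 2, 3, 4, 9]))  # 3 (indices 2, 3 and 4)
print(count_fixed_points([5, 6, 7]))             # 0
```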

Related

Find the initial position of the lowest/longest sequence in an array

Similar to sleep cycles alarms, I need to cut my array in the best place possible (low numbers in my scenario) respecting a range of min/max amount of values...
To simplify, if I am able to find the longest lowest sequence in an array, I think I can move forward.
For example:
[1,2,3,0,1,4,5,6,6,0.1,1.1,2,4]
Should return 9, because 0.1 is the first value of the longest lowest, even though I have a lower value than 0.1;
[4,5,7,10,0.13,0.2,0.12,8,9,28,0.1,0.11,0.102]
Should return 10, because it is lower than 1, even though, the sequence has the same amount of numbers...
Longer sequences (in my scenario) are more important than lower. Any idea how to start this? I don't have a threshold, but a solution involving this should be ok I think (if calculated on-the-fly)
I'm not sure about your convoluted logic behind "longest lowest", but this could be a good start:
    data = [4,5,7,10,0.13,0.2,0.12,8,9,28,0.1,0.11,0.102]
    result = [[0,0]]
    threshold = 1.0
    for num, val in enumerate(data) :
        if val < threshold :
            if result[-1][1] == 0 : # start a new sequence
                result[-1][0] = num
                result[-1][1] = 1
            else : # continue existing sequence
                result[-1][1] += 1
        else : # end the previous sequence
            if result[-1][1] > 0 :
                result.append([0,0])
returns (first element, sequence length) pairs:
[[4, 3], [10, 3]]
that you may further analyse for the length, value of the first element or whatever you like.
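One possible way to do that further analysis, as a sketch only: pick the longest run below the threshold and break ties by the smaller sum of the run. The tie-break rule, the function name longest_low_run_start, and the threshold of 2.5 used for the first example are all assumptions on my part, since the question does not pin them down.

```python
from itertools import groupby


def longest_low_run_start(data, threshold):
    """Return the start index of the longest run of values below threshold.

    Ties between equally long runs go to the run with the smaller sum
    (an assumed rule). Returns None if no value is below the threshold.
    """
    best_key = None
    best_start = None
    index = 0
    for is_low, group in groupby(data, key=lambda v: v < threshold):
        run = list(group)
        if is_low:
            key = (-len(run), sum(run))  # longer first, then lower
            if best_key is None or key < best_key:
                best_key, best_start = key, index
        index += len(run)
    return best_start


print(longest_low_run_start([4, 5, 7, 10, 0.13, 0.2, 0.12, 8, 9, 28, 0.1, 0.11, 0.102], 1.0))  # 10
print(longest_low_run_start([1, 2, 3, 0, 1, 4, 5, 6, 6, 0.1, 1.1, 2, 4], 2.5))                 # 9
```

Note that the result depends heavily on the threshold chosen, which is why it is left as a parameter.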

Binary search: weird middle point calculation

Regarding calculation of the list mid-point: why is there
i = (first +last) //2
and last is initialized to len(a_list) - 1? From my quick tests, this algorithm without -1 works correctly.
    def binary_search(a_list, item):
        """Performs iterative binary search to find the position of an integer in a given, sorted, list.
        a_list -- sorted list of integers
        item -- integer you are searching for the position of
        """
        first = 0
        last = len(a_list) - 1
        while first <= last:
            i = (first + last) // 2
            if a_list[i] == item:
                return '{item} found at position {i}'.format(item=item, i=i)
            elif a_list[i] > item:
                last = i - 1
            elif a_list[i] < item:
                first = i + 1
            else:
                return '{item} not found in the list'.format(item=item)
The last legal index is len(a_list) - 1. The algorithm will work correctly, as first will always be no more than this, so that the truncated mean will never go out of bounds. However, without the -1, the midpoint computation will be one larger than optimum about half the time, resulting in a slight loss of speed.
Consider the case where the item you're searching for is greater than all the elements of the list. In that case the statement first = i + 1 gets executed repeatedly. Finally you get to the last iteration of the loop, where first == last. In that case i is also equal to last, but if last=len() then i is off the end of the list! The first if statement will fail with an index out of range.
See for yourself: https://ideone.com/yvdTzo
You have another error in that code too, but I'll let you find it for yourself.
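To make the out-of-range case concrete, here is a small sketch. It is a simplified variant of the function above (it returns an index or -1 instead of a message), and the last parameter is added purely so both initializations can be tried side by side; neither change is part of the original code.

```python
def binary_search(a_list, item, last=None):
    """Return the index of item in the sorted list a_list, or -1 if absent.

    last is a hypothetical override for the initial upper bound,
    used here only to compare len(a_list) - 1 with len(a_list).
    """
    first = 0
    if last is None:
        last = len(a_list) - 1
    while first <= last:
        i = (first + last) // 2
        if a_list[i] == item:
            return i
        elif a_list[i] > item:
            last = i - 1
        else:
            first = i + 1
    return -1


data = [1, 3, 5, 7]
print(binary_search(data, 9))  # -1: item is larger than every element

try:
    binary_search(data, 9, last=len(data))  # upper bound one past the end
except IndexError as exc:
    print("IndexError:", exc)
```

With last = len(data), the search for 9 ends up probing index 4, which does not exist in a four-element list.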

Pythonic way of checking if an indefinite number of consecutive elements in a list sum to a given value

Having trouble figuring out a nice way to get this task done.
Say i have a list of triangular numbers up to 1000 -> [0,1,3,6,10,15,..]etc
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute force way would iteratively check every possible run of consecutive elements starting at each element (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc. until the sum is greater than the target number). But that's probably O(n^2) or maybe worse. Any way to do it better?
UPDATE:
Ok, so a friend of mine figured out a solution that works in O(n) (I think) and is pretty intuitive to follow. This might be similar (or identical) to Gabriel's answer, but that one was difficult for me to follow, and I like that this solution is understandable even from a basic perspective. This is an interesting question, so I'll share her answer:
    from functools import reduce  # needed on Python 3; harmless on Python 2.6+

    def findConsec(input1 = 7735):
        list1 = range(1, 1001)
        newlist = [reduce(lambda x,y: x+y,list1[0:i]) for i in list1]
        curr = 0
        end = 2
        num = sum(newlist[curr:end])
        while num != input1:
            if num < input1:
                num += newlist[end]
                end += 1
            elif num > input1:
                num -= newlist[curr]
                curr += 1
            if curr == end:
                return []
        if num == input1:
            return newlist[curr:end]
A 3-iteration max solution
Another solution would be to start close to where your numbers should be and walk forward from one position before that. For any number in the triangular list vec, the value can be defined by its index as:
    vec[i] = sum(range(0,i+1))
The target sum divided by the length of the group is the average of the group and, hence, lies within the group's range of values, although it need not itself be an element of the list.
Therefore, to find a group of n numbers whose sum matches a value val, you can set the starting point at the integer part of val divided by n. As that number may not be in the list, the position used is the one whose value is closest to it.
    # vec as np.ndarray -> the triangular or whatever-type series
    # val as int -> sum of n elements you are looking after
    # n as int -> number of elements to be summed
    import numpy as np

    def seq_index(vec, n, val):
        # initial guess, clamped at 0 so that a negative start cannot wrap around
        index0 = max(np.argmin(abs(vec - (val // n))) - n // 2 - 1, 0)  # covers odd and even n values
        intsum = 0  # sum of which to keep track
        count = 0  # counter
        seq = []  # indices of vec that sum up to val
        while count <= 2:  # walking forward from the initial guess of where the group begins or prior to it
            intsum = sum(vec[(index0 + count):(index0 + count + n)])
            if intsum == val:
                seq.append(list(range(index0 + count, index0 + count + n)))
            count += 1
        return seq

    # Example
    vec = []
    for i in range(0, 100):
        vec.append(sum(range(0, i)))  # note: this builds vec[i] = sum(range(i)), so the series here is 0, 0, 1, 3, ... up to 4851
    vec = np.array(vec)  # convert to numpy to make it easier to query ranges
    # looking for a value that belongs to the interval 0-4851
    indices = seq_index(vec, 3, 4)
    print(indices[0])
    print(vec[indices[0]])
    print(sum(vec[indices[0]]))
Returns
    print(indices[0]) -> [1, 2, 3]
    print(vec[indices[0]]) -> [0 1 3]
    print(sum(vec[indices[0]])) -> 4 (which we were looking for)
This seems like an algorithm question rather than a question on how to do it in python.
Thinking backwards I would copy the list and use it in a similar way to the Sieve of Eratosthenes. I would not consider the numbers that are greater than x. Then start from the greatest number and sum backwards. Then if I get greater than x, subtract the greatest number (exclude it from the solution) and continue to sum backward.
This seems the most efficient way to me and actually is O(n) - you never go back (or forward in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need accessing the list again - just a temp var.
To answer Dunes' question:
Yes, there is a reason: to subtract the next largest element when a partial sum overshoots without giving a solution. Starting from the first element, hitting such a dead end would require accessing the list again, or the temporary solution list, to subtract a set of elements that sum to more than the next element to add. You risk increasing the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence you can search for the smaller and larger pair using binary search. Once a pair of 2 elements, smaller than x is found then you can sum the pair and if it sums larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not what it is in theory and you can do whatever you like :)
You should pick the first three elements and sum them, then keep subtracting the first of the three and adding the next element in the list, checking each time whether the sum adds up to whatever number you want. That would be O(n).
    # vec as np.ndarray
    import numpy as np

    vec = np.cumsum(np.arange(100))  # example input (not from the original answer): 0, 1, 3, 6, ...
    whatever = 64                    # example target sum (not from the original answer)
    itsum = sum(vec[0:3])  # the sum of the current three elements, updated as the window slides
    sequence = [list(range(0, 3))] if itsum == whatever else []  # index triples that add up to whatever
    for i in range(3, len(vec)):
        itsum -= vec[i-3]
        itsum += vec[i]
        if itsum == whatever:
            sequence.append(list(range(i-2, i+1)))  # list of sequences that add up to whatever
The solution you provide in the question isn't truly O(n) time complexity: the way you compute your triangle numbers makes the computation O(n^2). The list comprehension throws away the previous work that went into calculating the last triangle number. That is: tn_i = tn_(i-1) + i (where tn is a triangle number). Since you also store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an identical algorithm, but with O(n) time complexity and O(1) space complexity (written for Python 3).
    # for python 2, replace things like `highest = next(high)` with `highest = high.next()`
    from itertools import count, takewhile, accumulate

    def find(to_find):
        # next(low) == lowest number in total
        # next(high) == highest number not in total
        low = accumulate(count(1))  # generator of triangle numbers
        high = accumulate(count(1))
        total = highest = next(high)
        # highest = highest number in the sequence that sums to total
        # definitely can't find solution if the highest number in the sum is greater than to_find
        while highest <= to_find:
            # found a solution
            if total == to_find:
                # keep taking numbers from the low iterator until we find the highest number in the sum
                return list(takewhile(lambda x: x <= highest, low))
            elif total < to_find:
                # add the next highest triangle number not in the sum
                highest = next(high)
                total += highest
            else:  # if total > to_find
                # subtract the lowest triangle number in the sum
                total -= next(low)
        return []
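For comparison, here is a sketch of the same sliding-window idea applied to a list that has already been built, which is how the question is phrased. It assumes the items are non-negative and returns the first matching run it meets; the name find_consecutive and the way the list is generated are my own choices, not something taken from the answers above.

```python
def find_consecutive(values, target):
    """Return the first run of consecutive items in values that sums to target.

    Assumes all items are non-negative. Returns [] if no such run exists.
    """
    lo = 0
    total = 0
    for hi, value in enumerate(values):
        total += value                      # grow the window on the right
        while total > target and lo <= hi:  # shrink it from the left if too big
            total -= values[lo]
            lo += 1
        if total == target and lo <= hi:
            return values[lo:hi + 1]
    return []


triangular = [n * (n + 1) // 2 for n in range(45)]  # 0, 1, 3, ..., 990

print(find_consecutive(triangular, 64))   # [15, 21, 28]
print(find_consecutive(triangular, 225))  # [105, 120]
print(find_consecutive(triangular, 882))  # []
```

Each element enters and leaves the window at most once, so the loop is O(n) over the list.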

Is my code's worst-case time complexity O(log n)?

The method foo gets as a parameter a sorted list of distinct numbers and returns the count of all the occurrences such that i == list[i] (where i is an index, 0 <= i < len(list)).
    def foo_helper(lst, start, end):
        if start > end:
            # end of recursion
            return 0
        if lst[end] < end or lst[start] > start:
            # no point checking this part of the list
            return 0
        # all indexes must be equal to their values
        if abs(end - start) == lst[end] - lst[start]:
            return end - start + 1
        middle = (end + start) // 2
        print(lst[start:end+1], start, middle, end)
        if lst[middle] == middle:
            #print("lst[" , middle , "]=", lst[middle])
            return 1 + foo_helper(lst, middle+1, end) + foo_helper(lst, start, middle-1)
        elif lst[middle] < middle:
            return foo_helper(lst, middle+1, end)
        else:
            return foo_helper(lst, start, middle-1)

    def foo(lst):
        return foo_helper(lst, 0, len(lst)-1)
My question is if this code's worst-case complexity = log(n)?
If not, What should I do different?
If you have a list of N numbers, all unique, and known to be sorted, then if list[0] == 0 and list[N-1] == N-1, then the uniqueness and ordering properties dictate that the entire list meets the property that list[i] == i. This can be determined in O(1) time - just check the first and last list entries.
The uniqueness and ordering properties force any list to have three separate regions: a possibly empty prefix region where list[i] < i, a possibly empty middle region where list[i] == i, and a possibly empty suffix region where list[i] > i. In the general case, finding the middle region with a plain scan requires O(n) time: a scan from the front to find the first index where list[i] == i, and a scan from the back to find the last such index (or you could do both with one single forward scan). Once you find those, you are guaranteed by uniqueness and ordering that all the indexes in between will have the same property.
Edit: As pointed out by @tobias_k below, you could also do a binary search to find the two end points, which would be O(log n) instead of O(n). This would be the better option if your inputs are completely general.
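A quick way to see the three regions for yourself is to print the sign of list[i] - i for each index. This is only an illustration (it scans the whole list, so it is O(n)), and the helper name region_signs is made up:

```python
def region_signs(values):
    """Return -1, 0 or 1 per index, for values[i] < i, == i or > i respectively."""
    return [(v > i) - (v < i) for i, v in enumerate(values)]


signs = region_signs([-4, -1, 2, 3, 7, 9])
print(signs)  # [-1, -1, 0, 0, 1, 1]
```

For a sorted list of distinct ints the signs never decrease, which is exactly what makes a binary search for the two end points of the middle region possible.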
To expand on my comment, here is one way to think about this problem. Consider the graph of the identity function y = x, which represents the indices. We want to know where this sorted list (a strictly monotonic function) intersects that line, considering only integer locations. I think you should be able to find this in O(log n) time (as commented, a binary search for the intersection bounds should work), though I need to look at your code more closely to see what it's doing.
Because we have a sorted list with unique elements, i == list[i] holds either at no place, at exactly one place, or, if there are multiple places, they must be consecutive (once you're above the line you can never come back down).
Code used:
import numpy as np
import matplotlib.pyplot as plt
a = np.unique(np.random.randint(-25, 50, 50))
indices = range(len(a))
plt.scatter(indices, indices, c='b')
plt.scatter(indices, a, c='r')
plt.show()

(Binary) Summing the elements of a list

I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
    def binary_search(l, low=0,high=-1):
        if not l: return -1
        if(high == -1): high = len(l)-1
        if low == high:
            if l[low] == 1: return low
            else: return -1
        mid = (low + high)//2
        upper = [l[mid:high]]
        lower = [l[0:mid-1]]
        u = sum(int(x) for x in upper)
        lo = sum(int(x) for x in lower)
        if u == 1: return binary_search(upper, mid, high)
        elif lo == 1: return binary_search(lower, low, mid-1)
        return -1

    l = [0 for x in range(255)]
    l[123] = 1
    binary_search(l)
The code I'm using to test
u = sum(int(x) for x in upper)
works fine in the interpreter, but gives me the error
TypeError: int() argument must be a string or a number, not 'list'
I've just started to use python, and can't figure out what's going wrong (the version I've written in c++ doesn't work either).
Does anyone have any pointers?
Also, how would I do the sum so that it is a binary xor, not simply decimal addition?
You don't actually want a sum; you want to know whether upper or lower contains a 1 value. Just take advantage of Python's basic container-type syntax:
    if 1 in upper:
        # etc
    if 1 in lower:
        # etc
The reason you're getting the error, by the way, is because you're wrapping upper and lower with an extra nested list when you're trying to split l (rename this variable, by the way!!). You just want to split it like this:
    upper = the_list[mid:high]
    lower = the_list[:mid-1]
Finally, it's worth noting that your logic is pretty weird. This is not a binary search in the classic sense of the term. It looks like you're implementing "find the index of the first occurrence of 1 in this list". Even ignoring the fact that there's a built-in function to do this already, you would be much better served by just iterating through the whole list until you find a 1. Right now, you've got O(nlogn) time complexity (plus a bunch of extra one-off loops), which is pretty silly considering the output can be replicated in O(n) time by:
    def first_one(the_list):
        for i in range(len(the_list)):
            if the_list[i] == 1:
                return i
        return -1
Or of course even more simply by using the built-in list method index:
    def first_one(the_list):
        try:
            return the_list.index(1)
        except ValueError:
            return -1
I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
What's wrong with
int(1 in l)
I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
No need to sum the whole list; you can stop at the first 1. Simply use any(). It will return True if there is at least one truthy value in the container and False otherwise, and it short-circuits (i.e. if a truthy value is found early in the list, it doesn't scan the rest). Conveniently, 1 is truthy and 0 is not.
True and False work as 1 and 0 in an arithmetic context (Booleans are a subclass of integers), but if you want specifically 1 and 0, just wrap any() in int().
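A minimal sketch of that suggestion, with has_one as a made-up helper name:

```python
def has_one(bits):
    """Return 1 if any element of bits is truthy (for example a 1), else 0."""
    return int(any(bits))


flags = [0] * 255
flags[123] = 1

print(has_one(flags))      # 1
print(has_one([0, 0, 0]))  # 0
print(has_one([]))         # 0
```

Because any() short-circuits, the scan stops at the first 1 it meets.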
Stop making nested lists.
upper = l[mid:high]
lower = l[0:mid-1]
