Consolidate IPs into ranges in Python

Assume I have a list of IP ranges (where the end of a range is usually given as the last octet only) that may or may not overlap:
('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
I'm looking for a way to identify any overlapping ranges and consolidate them into single ranges, so that the list above becomes:
('1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3')
My current idea for an algorithm is to expand all ranges into a list of individual IPs, eliminate duplicates, sort, and consolidate any consecutive IPs.
Are there any more Pythonic algorithm suggestions?
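A minimal sketch of the expand-everything approach described in the question, assuming Python 3 and swapping in the standard-library ipaddress module for the integer conversions. consolidate_by_expansion is a hypothetical name, and the result is returned as (first, last) address pairs rather than the short '1.1.1.1-25' form. It is simple, but its memory use grows with the number of addresses covered, so it only suits small ranges.

```python
import ipaddress


def consolidate_by_expansion(ranges):
    """Expand every range to single addresses, dedupe, sort, and regroup consecutive runs."""
    addresses = set()
    for rng in ranges:
        first, last = rng.split('-')
        if '.' not in last:
            # short form '1.1.1.1-7': reuse the first three octets of the start address
            last = first.rsplit('.', 1)[0] + '.' + last
        lo = int(ipaddress.IPv4Address(first))
        hi = int(ipaddress.IPv4Address(last))
        addresses.update(range(lo, hi + 1))  # one entry per address (memory grows with range size)

    runs = []
    for n in sorted(addresses):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n  # extends the current consecutive run
        else:
            runs.append([n, n])  # starts a new run
    return [(str(ipaddress.IPv4Address(a)), str(ipaddress.IPv4Address(b))) for a, b in runs]
```

The answers below avoid the expansion step by merging the integer end points directly, which costs O(n log n) in the number of ranges rather than in the number of addresses.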

Here is my version, as a module. My algorithm is identical to the one lunixbochs mentions in his answer, and the conversion from range string to integers and back is nicely modularized.
import socket
import struct
def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
def expandrange(rng):
    # expand '1.1.1.1-7' to ['1.1.1.1', '1.1.1.7']
    start, end = [ip.split('.') for ip in rng.split('-')]
    return ['.'.join(start), '.'.join(start[:len(start) - len(end)] + end)]
def compressrange(rng):
    # compress ['1.1.1.1', '1.1.1.7'] to '1.1.1.1-7'
    start, end = rng[0].split('.'), rng[1].split('.')
    # keep the end address from the first octet that differs (at least the last octet)
    first_diff = next((i for i in range(4) if start[i] != end[i]), 3)
    return '.'.join(start) + '-' + '.'.join(end[first_diff:])
def strings_to_ints(ranges):
    # turn range strings into a list of [start, end] integer pairs
    return [[ip2long(ip) for ip in expandrange(rng)] for rng in ranges]
def ints_to_strings(ranges):
    # turn [start, end] integer pairs back into range strings
    return [compressrange([long2ip(n) for n in rng]) for rng in ranges]
def consolidate(ranges):
    # join overlapping ranges in a sorted iterable
    iranges = iter(ranges)
    try:
        startmin, startmax = next(iranges)
    except StopIteration:
        return  # empty input: nothing to consolidate
    for endmin, endmax in iranges:
        # leave out the '+ 1' if you want to join overlapping ranges
        # but not consecutive ranges.
        if endmin <= (startmax + 1):
            startmax = max(startmax, endmax)
        else:
            yield startmin, startmax
            startmin, startmax = endmin, endmax
    yield startmin, startmax
def convert_consolidate(ranges):
    # convert a list of possibly overlapping IP range strings
    # to a sorted, consolidated list of non-overlapping IP range strings
    return ints_to_strings(consolidate(sorted(strings_to_ints(ranges))))
if __name__ == '__main__':
    ranges = ('1.1.1.1-7',
              '2.2.2.2-10',
              '3.3.3.3-3.3.3.3',
              '1.1.1.4-25',
              '2.2.2.4-6')
    print(convert_consolidate(ranges))
    # prints ['1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3']

Convert your ranges into pairs of numbers. These functions will convert individual IPs to and from integer values.
import socket
import struct
def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
Now you can sort/merge the edges of each range as numbers, then convert back to IPs to get a nice representation. A related question about merging time ranges describes a nice algorithm for this.
Parse strings in the form 1.1.1.1-1.1.1.2 or 1.1.1.1-2 into a pair of numbers. For the latter format, you could do:
x = '1.1.1.1-2'
first, add = x.split('-')
second = first.rsplit('.', 1)[0] + '.' + add
pair = ip2long(first), ip2long(second)
Merge the overlapping ranges using simple number comparisons.
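A minimal sketch of the merge step, assuming the ranges have already been parsed into (start, end) integer pairs as above; merge_pairs is a hypothetical name. It sorts by start and extends the last merged pair whenever the next one overlaps it. The `+ 1` also joins ranges that are merely adjacent (e.g. ...7 and ...8); drop it to merge only true overlaps.

```python
def merge_pairs(pairs):
    """Merge overlapping or adjacent (start, end) integer pairs."""
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1] + 1:
            # overlaps (or touches) the previous pair: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(p) for p in merged]
```

Each merged pair can then be converted back as shown next.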
Convert back to the string representation (still assuming the latter format):
first, second = pair
range_str = long2ip(first) + '-' + long2ip(second).rsplit('.', 1)[1]

I once faced the same problem. The only difference was that I had to efficiently keep line segments in a list for a Monte Carlo simulation, where newly generated random line segments had to be added to the existing sorted and merged segments.
I adapted the algorithm to your problem, using the answer by lunixbochs to convert IPs to integers.
This solution allows adding a new IP range to an existing list of already merged ranges (the other solutions rely on sorting the whole list of ranges to merge, and do not allow adding a new range to an already merged list). The add_range function uses the bisect module to find where to insert the new IP range, then deletes the redundant IP intervals and inserts the new range with its boundaries adjusted so that it covers all the deleted ranges.
import socket
import struct
import bisect
def ip2long(ip):
    '''IP to integer'''
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    '''integer to IP'''
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
def get_ips(s):
    '''Convert a string IP interval to a tuple of the integer boundary IPs
    '1.1.1.1-7' -> (a, b)'''
    s1, s2 = s.split('-')
    if s2.isdigit():
        # short form: replace the whole last octet of s1 (it may have several digits)
        s2 = s1.rsplit('.', 1)[0] + '.' + s2
    return (ip2long(s1), ip2long(s2))
def add_range(iv, R):
    '''Add a new range R to the already merged, sorted list iv in place'''
    left, right = get_ips(R)
    # left, right are the left and right boundaries of the range
    # If this is the very first range, just add it to the list
    if not iv:
        iv.append((left, right))
        return
    # Find the last interval whose left boundary is <= left
    p = bisect.bisect_right(iv, (left, right)) - 1
    # Interval: |----X----|             (delete)
    # Range:       <--<--|----------|   (extend)
    # Detect whether the left side of the range is inside (or adjacent to) that interval
    if p >= 0:  # p == -1 means no such interval exists
        if iv[p][1] >= right:
            # The range is completely inside the interval: drop it
            return  # I think this will be a very common case
        if iv[p][1] >= left - 1:
            left = iv[p][0]  # extend the range to the left
            del iv[p]  # delete the interval from the list
            p -= 1  # correct the index to keep the invariant
    # Intervals: |----X----| |---X---|   (delete)
    # Range:   |-----------------------------|
    # Delete all intervals that lie entirely inside the range
    p += 1
    while p < len(iv) and iv[p][1] <= right:
        del iv[p]  # the following intervals shift down to index p
    # Interval:        |--------X--------|  (delete)
    # Range:   |-----------|-->-->          (extend)
    # The right side of the range is inside (or adjacent to) the next interval
    if p < len(iv) and iv[p][0] <= right + 1:
        right = iv[p][1]  # extend the range to the right
        del iv[p]  # delete the now redundant interval
    # p now points at the insertion position
    iv.insert(p, (left, right))
def merge_ranges(ranges):
    '''Merge the ranges'''
    iv = []
    for R in ranges:
        add_range(iv, R)
    return ['-'.join((long2ip(left), long2ip(right))) for left, right in iv]
ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
print(merge_ranges(ranges))
Output:
['1.1.1.1-1.1.1.25', '2.2.2.2-2.2.2.10', '3.3.3.3-3.3.3.3']
This was a lot of fun for me to code! Thank you for that :)

The netaddr package does what you want.
See summarizing adjacent subnets with python netaddr cidr_merge
and https://netaddr.readthedocs.io/en/latest/tutorial_01.html#summarizing-list-of-addresses-and-subnets
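netaddr is a third-party package; as a rough standard-library analogue (swapped in here, not the netaddr API), Python 3's ipaddress module can do the merging. summarize_address_range turns each range into CIDR blocks, collapse_addresses merges overlapping and adjacent blocks, and a final pass joins touching blocks back into (first, last) ranges. merge_with_ipaddress is a hypothetical name, and input is assumed to be full start/end address pairs.

```python
import ipaddress


def merge_with_ipaddress(ranges):
    """ranges: iterable of (first, last) dotted-quad strings; returns merged (first, last) pairs."""
    nets = []
    for first, last in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))
    # collapse_addresses yields sorted, non-overlapping CIDR blocks
    collapsed = ipaddress.collapse_addresses(nets)
    result = []
    for net in collapsed:
        lo, hi = net.network_address, net.broadcast_address
        if result and int(lo) == int(result[-1][1]) + 1:
            result[-1][1] = hi  # block continues the previous range
        else:
            result.append([lo, hi])
    return [(str(a), str(b)) for a, b in result]
```

Note that this also joins ranges that are merely adjacent, not just overlapping.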

Unify the format of your IPs and turn each range into a pair of ints.
Now the task is much simpler: "consolidate" integer ranges. I believe there are many existing efficient algorithms for that; below is just my naive try:
>>> orig_ranges = [(1,5), (7,12), (2,3), (13,13), (13,17)] # should result in (1,5), (7,12), (13,17)
>>> temp_ranges = {}
>>> for r in orig_ranges:
...     temp_ranges.setdefault(r[0], []).append('+')
...     temp_ranges.setdefault(r[1], []).append('-')
...
>>> start_count = end_count = 0
>>> start = None
>>> for key in sorted(temp_ranges):  # positions must be visited in increasing order
...     if start is None:
...         start = key
...     start_count += temp_ranges[key].count('+')
...     end_count += temp_ranges[key].count('-')
...     if start_count == end_count:
...         print(start, key)
...         start = None
...         start_count = end_count = 0
...
1 5
7 12
13 17
The general idea is this: after stacking the ranges on top of each other (in the temp_ranges dict), we can find the combined ranges simply by counting the beginnings and endings of the original ranges in order of position; whenever the two counts become equal, we have found a merged range.
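The same counting idea can be sketched as a sweep over sorted events, assuming integer (start, end) pairs; merge_by_counting is a hypothetical name. Opening events sort before closing events at the same position, so ranges that share an end point, such as (1, 5) and (5, 8), are joined.

```python
def merge_by_counting(ranges):
    """Merge (start, end) pairs by sweeping over their sorted start/end events."""
    events = []
    for start, end in ranges:
        events.append((start, 0))  # 0 = open; sorts before a close at the same position
        events.append((end, 1))    # 1 = close
    events.sort()
    merged, depth, current_start = [], 0, None
    for pos, kind in events:
        if kind == 0:
            if depth == 0:
                current_start = pos  # a new merged range begins
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                merged.append((current_start, pos))  # openings and closings balance
    return merged
```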

I had these lying around in case you need them; using socket/struct is probably the better way to go, though.
def ip_str_to_int(address):
    """Convert IP address in form X.X.X.X to an int.
    >>> ip_str_to_int('74.125.229.64')
    1249764672
    """
    parts = address.split('.')
    parts.reverse()
    return sum(int(v) * 256 ** i for i, v in enumerate(parts))
def ip_int_to_str(address):
    """Convert IP address int into the form X.X.X.X.
    >>> ip_int_to_str(1249764672)
    '74.125.229.64'
    """
    # << binds tighter than &, so this masks out octet i and shifts it down
    parts = [(address & 255 << 8 * i) >> 8 * i for i in range(4)]
    parts.reverse()
    return '.'.join(str(x) for x in parts)
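On Python 3 the standard-library ipaddress module performs both conversions directly; a minimal sketch, with the function names being hypothetical wrappers:

```python
import ipaddress


def ip_str_to_int_stdlib(address):
    """'74.125.229.64' -> 1249764672 using the standard library."""
    return int(ipaddress.IPv4Address(address))


def ip_int_to_str_stdlib(n):
    """1249764672 -> '74.125.229.64' using the standard library."""
    return str(ipaddress.IPv4Address(n))
```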

Related

Python script to make every combination of a string with placed characters

I'm looking for help creating a script that adds periods to a string at every position except before the first and after the last character, using as many periods as needed to create every possible combination:
The output for the string 1234 would be:
["1234", "1.234", "12.34", "123.4", "1.2.34", "1.23.4" etc. ]
And obviously this needs to work for all lengths of string.
You should try to solve this type of problem yourself; these are simple data-manipulation algorithms that you should learn to come up with.
However, here is the solution (long version for more clarity):
my_str = "1234" # original string
# recursive function for constructing dots
def construct_dot(s, t):
# s - the string to put dots
# t - number of dots to put
# zero dots will return the original string in a list (stop criteria)
if t==0: return [s]
# allocation for results list
new_list = []
# iterate the next dot location, considering the remaining dots.
for p in range(1,len(s) - t + 1):
new_str = str(s[:p]) + '.' # put the dot in the location
res_str = str(s[p:]) # crop the string frot the dot to the end
sub_list = construct_dot(res_str, t-1) # make a list with t-1 dots (recursive)
# append concatenated strings
for sl in sub_list:
new_list.append(new_str + sl)
# we result with a list of the string with the dots.
return new_list
# now we will iterate the number of the dots that we want to put in the string.
# 0 dots will return the original string, and we can put maximum of len(string) -1 dots.
all_list = []
for n_dots in range(len(my_str)):
all_list.extend(construct_dot(my_str,n_dots))
# and see the results
print(all_list)
Output is:
['1234', '1.234', '12.34', '123.4', '1.2.34', '1.23.4', '12.3.4', '1.2.3.4']
A concise solution without recursion: use binary combinations (think of 0, 1, 10, 11, etc.) to determine where to insert the dots.
Between each letter, put a dot when there's a 1 at this index and an empty string when there's a 0.
your_string = "1234"
def dot_combinations(string):
i = 0
combinations = []
# Iter while the binary representation length is smaller than the string size
while i.bit_length() < len(string):
current_word = []
for index, letter in enumerate(string):
current_word.append(letter)
# Append a dot if there's a 1 in this position
if (1 << index) & i:
current_word.append(".")
i+=1
combinations.append("".join(current_word))
return combinations
print dot_combinations(your_string)
Output:
['1234', '1.234', '12.34', '1.2.34', '123.4', '1.23.4', '12.3.4', '1.2.3.4']
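The same binary idea can be sketched with itertools.product, which enumerates every choice of '' or '.' for the len(s) - 1 gaps between characters (2 ** (len(s) - 1) strings in total). dot_combinations_product is a hypothetical name; the output order differs from the version above.

```python
from itertools import product


def dot_combinations_product(s):
    """Return every way of inserting '.' between adjacent characters of s."""
    if not s:
        return [s]
    combinations = []
    # Each gap gets either '' or '.', like one bit of a binary counter
    for seps in product(['', '.'], repeat=len(s) - 1):
        combinations.append(s[0] + ''.join(sep + ch for sep, ch in zip(seps, s[1:])))
    return combinations
```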

Python list manipulation: Given a list of ranges number, return the list of combined ranges

I was given this problem during a phone interview:
Suppose there is a list of ranges. For example, [[1-6],[10-19],[5-8]].
Write a function that returns the list of combined ranges, such that the input [[1-6],[10-19],[5-8]] returns [[1,8],[10,19]] (only the start and end numbers). Note that the input list may contain an arbitrary number of ranges.
My solution to this problem is:
Expand all the ranges into one list:
[[1-6],[10-19],[5-8]] -> [1-6,10-19,5-8]
Sort the list:
list = sorted(list) -> [1,2,3,4,5,5,6,6,7,8,10...]
Use list = set(list) to get rid of the redundant numbers
Iterate through the list and find the ranges
I know this solution is definitely not what they were looking for (that's why I failed the interview terribly), as the time complexity is O(n log n) (sorting), where n is the number of distinct numbers in the ranges.
Can any Python expert give an O(n) solution, where n is the number of ranges in the original list?
First of all, the solution mentioned in the question is not O(n lg n), where n is the number of segments. It is O(X lg X), where X = length of a segment * number of segments, which is terribly slow.
An O(NlgN) solution exists where N is the number of segments.
Sort the segments by their starting point.
Sweep across the sorted list and check if the current segment overlaps with the previous one. If yes, then extend the previous segment if required.
Sample code:
inp = [[1,6], [10,19], [5,8]]
inp = sorted(inp)
segments = []
for i in inp:
    if segments:
        if segments[-1][1] >= i[0]:
            segments[-1][1] = max(segments[-1][1], i[1])
            continue
    segments.append(list(i))  # copy, so extending it does not modify the input lists
print(segments) # [[1, 8], [10, 19]]
You could use heapq to create a heap from the ranges. Then pop a range from the heap and, if it overlaps with the top of the heap, replace the top with the merged range. If there's no overlap, or there are no more ranges, append it to the result:
import heapq
def merge(ranges):
    heapq.heapify(ranges)  # note: this reorders (and later empties) the input list
    res = []
    while ranges:
        start, end = heapq.heappop(ranges)
        if ranges and ranges[0][0] <= end:
            heapq.heapreplace(ranges, [start, max(end, ranges[0][1])])
        else:
            res.append((start, end))
    return res
ranges = [[1,6],[10,19],[5,8]]
print(merge(ranges))
Output:
[(1, 8), (10, 19)]
The above has O(n log n) time complexity, where n is the number of ranges.
If each range is [x, y] with non-negative integer bounds and max y is fairly small (say, within a few million), you can do this.
The idea is to use a hashing-like technique (each lower bound is used as a list index) to put the ranges in sorted order, taking advantage of the small max_y.
We then iterate and keep the current 'good' range in the variables mn and mx.
When a new range comes in that lies entirely outside the 'good' range, we append the good range to the answer and make the new range the good range. Otherwise we extend the good range accordingly.
max_y = 1000000
range_sort = [None] * max_y  # all bounds must be < max_y
ranges = [[1,6],[10,19],[5,8]]
for r in ranges:
    if range_sort[r[0]] is not None and range_sort[r[0]] >= r[1]:
        continue  # handling the case [1,5] [1,8]
    range_sort[r[0]] = r[1]  # in the list the lower value is stored as the index, the higher as the value
mx = -1
mn = 1000000000
ans = []
for x, y in enumerate(range_sort):  # The values are correct as explained in the comment above
    if y is None:
        continue  # skip empty slots
    if x < mn:
        mn = x  # This will change the lower value of the current range
    if x > mx and mx >= 0:  # If the lower value x is higher than the current upper mx
        ans.append([mn, mx])  # append the current lower (mn) and upper (mx)
        mn = x
        mx = y  # change the current upper and lower to the new one
    if y > mx:
        mx = y  # This will change the upper value of the current range
ans.append([mn, mx])  # This has to be outside the loop, as the last range won't get appended otherwise
print(ans)
Output: [[1,8],[10,19]]
Time complexity O(MAX_y)

Python - Find matching elements between and within nested lists

Background: I have a lengthy script which calculates possible chemical formulae for a given mass (based on a number of criteria), and outputs (amongst other things) a code which corresponds to the 'class' of compounds that the formula belongs to. I calculate formulae from batches of masses which should all be members of the same class. However, given instrumentation limits etc., it is possible to calculate several possible formulae for each mass. I need to check whether any of the calculated classes are common to all peaks and, if so, return the position of the match, etc.
I'm struggling to work out how to write an iterative if/for loop that checks every combination for matches (in an efficient way).
[Image: summary of the issue, with screenshots of the actual data structure]
As you can see, I have a list called "formulae" which has a variable number of elements (in this case, 12).
Each element in formulae is a list, again with a variable number of elements.
Each element within those lists is a list, containing 15 7 elements. I wish to compare the 11th element amongst different elements.
I.e.
formulae[0][0][11] == formulae[1][0][11]
formulae[0][0][11] == formulae[1][1][11]
...
formulae[0][1][11] == formulae[11][13][11]
I imagine the answer might involve a couple of nested for and if statements, but I can't get my head around it.
I then will need to export the lists which matched (like formulae[0][0]) to a new array.
Unless I'm doing this wrong?
Thanks for any help!
EDIT:
1- My data structure has changed slightly, and I need to check that elements [?][?][4] and [?][?][5] and [?][?][6] and [?][?][7] all match the corresponding elements in another list.
I've attempted to adapt some of the code suggested, but can't quite get it to work...
check_O = 4
check_N = 5
check_S = 6
check_Na = 7
# start with base (left-hand) formula
nbase_i = len(formulae)
for base_i in range(len(formulae)): # length of first index
for base_j in range(len(formulae[base_i])): # length of second index
count = 0
# check against comparison (right-hand) formula
for comp_i in range(len(formulae)): # length of first index
for comp_j in range(len(formulae[comp_i])): # length of second index
if base_i != comp_i:
o_test = formulae[base_i][base_j][check_O] == formulae[comp_i][comp_j][check_O]
n_test = formulae[base_i][base_j][check_N] == formulae[comp_i][comp_j][check_N]
s_test = formulae[base_i][base_j][check_S] == formulae[comp_i][comp_j][check_S]
na_test = formulae[base_i][base_j][check_Na] == formulae[comp_i][comp_j][check_Na]
if o_test == n_test == s_test == na_test == True:
count = count +1
else:
count = 0
if count < nbase_i:
print base_i, base_j, comp_i,comp_j
o_test = formulae[base_i][base_j][check_O] == formulae[comp_i][comp_j][check_O]
n_test = formulae[base_i][base_j][check_N] == formulae[comp_i][comp_j][check_N]
s_test = formulae[base_i][base_j][check_S] == formulae[comp_i][comp_j][check_S]
na_test = formulae[base_i][base_j][check_Na] == formulae[comp_i][comp_j][check_Na]
if o_test == n_test == s_test == na_test == True:
count = count +1
else:
count = 0
elif count == nbase_i:
matching = "Got a match! " + "[" +str(base_i) + "][" + str(base_j) + "] matches with " + "[" + str(comp_i) + "][" + str(comp_j) +"]"
print matching
else:
count = 0
I would take a look at using the in operator, for example:
agg = []
for x in arr:
    matched = [y for y in arr2 if x in y]
    agg.append(matched)
Prune's answer is not right; it should be like this:
check_index = 11
# start with base (left-hand) formula
for base_i in range(len(formulae)):  # length of first index
    for base_j in range(len(formulae[base_i])):  # length of second index
        # check against comparison (right-hand) formula
        for comp_i in range(len(formulae)):  # length of first index
            for comp_j in range(len(formulae[comp_i])):  # length of second index
                if formulae[base_i][base_j][check_index] == formulae[comp_i][comp_j][check_index]:
                    print("Got a match")
                    # Here you add whatever info *you* need to identify the match
I'm not sure I fully understand your data structure, so I won't write code here, but I'll propose an idea: how about an inverted index?
That is, you scan the lists once, building a kind of summary of where each value you are looking for appears.
You could create a dictionary composed as follows:
{
'ValueOfInterest1': [ (position1), (position2) ],
'ValueOfInterest2': [ (positionX) ]
}
Then at the end you can have a look at the dictionary and see if any of the values (basically lists) have length > 1.
Of course you'd need to find a way to create a position format that makes sense to you.
Just an idea.
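A minimal sketch of that inverted-index idea, assuming the nested structure described in the question (formulae[peak][candidate][field]) with hashable field values. common_matches and key_indices are hypothetical names: use key_indices=(11,) for the original question, or (4, 5, 6, 7) for the edited version. One pass builds the index; a key that appears for every peak is a class common to all peaks.

```python
from collections import defaultdict


def common_matches(formulae, key_indices=(11,)):
    """Return {key: {peak_index: [candidate_indices]}} for keys found in every peak."""
    # key -> peak index -> candidate positions that produced this key
    positions = defaultdict(lambda: defaultdict(list))
    for i, candidates in enumerate(formulae):
        for j, candidate in enumerate(candidates):
            key = tuple(candidate[k] for k in key_indices)
            positions[key][i].append(j)
    # keep only keys present for all peaks
    return {key: dict(by_peak) for key, by_peak in positions.items()
            if len(by_peak) == len(formulae)}
```

The candidate lists that matched can then be exported as formulae[i][j] for each (i, j) in the result.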
Does this get you going?
check_index = 11
# start with base (left-hand) formula
for base_i in range(len(formulae)):  # length of first index
    for base_j in range(len(formulae[0])):  # length of second index
        # check against comparison (right-hand) formula
        for comp_i in range(len(formulae)):  # length of first index
            for comp_j in range(len(formulae[0])):  # length of second index
                if formulae[base_i][base_j] == formulae[comp_i][comp_j]:
                    print("Got a match")
                    # Here you add whatever info *you* need to identify the match

Merging arrays slices in Python

So I have a string that looks like this:
data="ABCABDABDABBCBABABDBCABBDBACBBCDB"
And I am taking random 10-character slices out of it:
start=int(random.random()*100)
end = start+10
slice = data[start:start+10]
But what I am trying to do now is count the number of 'gaps' or 'holes' that were not sliced out at all.
slices_indices = []
for i in range(0, 100):
    start = int(random.random() * 100)
    end = start + 10
    slice = data[start:end]
    ...
    slices_indices.append([start, end])
For instance, after running this a couple of times, I covered this much:
ABCAB DABD ABBCBABABDB C ABBDBACBBCDB
But that left two 'gaps' that were never sliced. Is there a 'Pythonic' way to find the number of these gaps? Basically, I am looking for a function count_gaps that counts the gaps given the slice indices.
For the example above,
count_gaps(slices_indices)
would return 2.
Thanks in advance
There are several ways, although all involve a bit of messing about.
You could compare the removed strings against the original, and work out which characters you didn't hit.
That's a very roundabout way of doing it, though, and it won't work properly if the same 10 characters ever appear in the string twice (e.g. 1234123 or something).
A better solution would be to store the values of i you use, then step back through the data string comparing the current position to the values of i you used (plus 10). If it doesn't match, job done.
For example, in Python:
# Make a list the same length as the string
charsUsed = [False] * len(data)
# Do whatever you were doing before; here the slice start is picked as in the question
for i in range(100):
    start = int(random.random() * len(data))
    # Store our "used chars" in the list
    for char in range(start, start + 10):
        if char < len(data):  # Don't go out of bounds on the list!
            charsUsed[char] = True
Then, to see which chars weren't used, just walk through the charsUsed array and count whatever you want to count (consecutive gaps, etc.).
Edit in response to updated question:
I'd still use the above method to make a "which chars were used" array. Your count_gaps() function then just needs to walk through the array to "find" the gaps
The idea is essentially to check whether the current position is false (i.e. not used) while the previous position is true (used), meaning it's the start of a "new" gap. If both are false, we're in the middle of a gap, and if both are true, we're in the middle of a "used" string. For example:
def find_gaps(charsUsed):
    # Count the gaps
    numGaps = 0
    # What did we look at last (to see if this is the start of a gap)?
    # Start with True if you want to count a gap at the start of the string, False if you don't.
    lastPositionUsed = True
    for used in charsUsed:
        if not used and lastPositionUsed:
            numGaps += 1
        lastPositionUsed = used
    return numGaps
The other option would be to step through the charsUsed array again and "group" consecutive values into a smaller array, then count the groups you want; it's essentially the same thing with a different approach. In the find_gaps example I just ignore the group I don't want and the "rest" of the group I do want, counting only the boundaries between the group we don't want and the group we do.
It is a bit of a messy task, but I think sets are the way to go. I hope my code below is self-explanatory, but if there are parts you don't understand please let me know.
#! /usr/bin/env python3
''' Count gaps.
Find and count the sections in a sequence that weren't touched by random slicing
From http://stackoverflow.com/questions/26060688/merging-arrays-slices-in-python
Written by PM 2Ring 2014.09.27
'''
import random
from string import ascii_lowercase
def main():
    def rand_slice():
        start = random.randint(0, len(data) - slice_width)
        return start, start + slice_width
    #The data to slice
    data = 5 * ascii_lowercase
    print('Data:\n%s\nLength : %d\n' % (data, len(data)))
    random.seed(42)
    #A set to capture slice ranges
    slices = set()
    slice_width = 10
    num_slices = 10
    print('Extracting %d slices from data' % num_slices)
    for i in range(num_slices):
        start, end = rand_slice()
        slices |= set(range(start, end))
        data_slice = data[start:end].upper()
        print('\n%2d, %2d : %s' % (start, end, data_slice))
        data = data[:start] + data_slice + data[end:]
        print(data)
    #print(sorted(slices))
    print('\nSlices:\n%s\n' % sorted(slices))
    print('\nSearching for gaps missed by slicing')
    unsliced = sorted(set(range(len(data))) - slices)
    print('Unsliced:\n%s\n' % (unsliced,))
    gaps = []
    if unsliced:
        last = start = unsliced[0]
        for i in unsliced[1:]:
            if i > last + 1:
                t = (start, last + 1)
                gaps.append(t)
                print(t)
                start = i
            last = i
        t = (start, last + 1)
        gaps.append(t)
        print(t)
    print('\nGaps:\n%s\nCount: %d' % (gaps, len(gaps)))
if __name__ == '__main__':
    main()
I'd use some kind of bitmap. For example, extending your code:
data="ABCABDABDABBCBABABDBCABBDBACBBCDB"
slices_indices = [0]*len(data)
for i in xrange(0,100):
start=int(random.random()*len(data))
end=start + 10
slice = data[start:end]
slices_indices[start:end] = [1] * len(slice)
I've used a list here, but you could use any other appropriate data structure, probably something more compact, if your data is rather big.
So, we've initialized the bitmap with zeros, and marked with ones the selected chunks of data. Now we can use something from itertools, for example:
from itertools import groupby
groups = groupby(slices_indices)
groupby returns an iterator where each element is a tuple (element, iterator). To just count gaps you can do something simple, like:
gaps = len([x for x in groups if x[0] == 0])
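Alternatively, count_gaps can work directly on the recorded [start, end] pairs, using the same sort-and-merge idea as the range-merging answers above, without building a bitmap. A minimal sketch; the extra length parameter (the length of data) is an assumption added here, since it is needed to detect a trailing gap, and gaps at the very start or end of the data are counted too.

```python
def count_gaps(slices_indices, length):
    """Count maximal runs of positions in range(length) not covered by any [start, end) slice."""
    gaps = 0
    covered_to = 0  # every position before this index is covered or already counted
    for start, end in sorted(slices_indices):
        start, end = min(start, length), min(end, length)  # ignore parts beyond the data
        if start > covered_to:
            gaps += 1  # positions covered_to .. start - 1 were never sliced
        covered_to = max(covered_to, end)
    if covered_to < length:
        gaps += 1  # trailing gap at the end of the data
    return gaps
```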

Is my code's worst-case time complexity O(log n)?

The method foo takes as a parameter a sorted list of distinct numbers and returns the count of all positions such that i == list[i] (where i is the index, 0 <= i < len(list)).
def foo_helper(lst, start, end):
    if start > end:
        # end of recursion
        return 0
    if lst[end] < end or lst[start] > start:
        # no point checking this part of the list
        return 0
    # all indexes must be equal to their values
    if abs(end - start) == lst[end] - lst[start]:
        return end - start + 1
    middle = (end + start) // 2
    print(lst[start:end+1], start, middle, end)
    if lst[middle] == middle:
        #print("lst[" , middle , "]=", lst[middle])
        return 1 + foo_helper(lst, middle+1, end) + foo_helper(lst, start, middle-1)
    elif lst[middle] < middle:
        return foo_helper(lst, middle+1, end)
    else:
        return foo_helper(lst, start, middle-1)
def foo(lst):
    return foo_helper(lst, 0, len(lst)-1)
My question is whether this code's worst-case complexity is O(log n).
If not, what should I do differently?
If you have a list of N numbers, all unique, and known to be sorted, then if list[0] == 0 and list[N-1] == N-1, then the uniqueness and ordering properties dictate that the entire list meets the property that list[i] == i. This can be determined in O(1) time - just check the first and last list entries.
The uniqueness and ordering properties force any list to have three separate regions: a possibly empty prefix region where list[i] < i, a possibly empty middle region where list[i] == i, and a possibly empty suffix region where list[i] > i. In the general case, finding the middle region requires O(n) time: a scan from the front to find the first index where list[i] == i, and a scan from the back to find the last such index (or you could do both with one single forward scan). Once you find those, you are guaranteed by uniqueness and ordering that all the indexes in between will have the same property...
Edit: As pointed out by @tobias_k below, you could also do a binary search to find the two end points, which would be O(log n) instead of O(n). This would be the better option if your inputs are completely general.
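A minimal sketch of that binary-search variant, assuming the list holds distinct integers (so lst[i] - i never decreases and the three regions described above are contiguous). first_true and count_fixed_points are hypothetical names. Two binary searches locate the start of the middle region and the start of the suffix region; their difference is the count, in O(log n) time.

```python
def first_true(n, pred):
    """Smallest i in range(n) with pred(i) True, or n if none; pred must be monotone."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def count_fixed_points(lst):
    """Count indices i with lst[i] == i in a sorted list of distinct integers."""
    n = len(lst)
    start = first_true(n, lambda i: lst[i] >= i)  # first index of the middle region
    end = first_true(n, lambda i: lst[i] > i)     # first index of the suffix region
    return end - start
```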
To expand on my comment, here is one way to think about this problem. Consider the graph of the identity function, which represents the indices. We want to know where this sorted list (a strictly increasing function) intersects the line y = x representing the indices, considering only integer locations. I think you should be able to find this in O(log n) time (as commented, a binary search for the intersection bounds seems to work), though I need to look at your code more closely to see what it's doing.
Because we have a sorted list with unique elements, i == list[i] holds either at no place, at exactly one place, or at several places, which must then be consecutive (once you're above the line you can never come back down).
Code used:
import numpy as np
import matplotlib.pyplot as plt
a = np.unique(np.random.randint(-25, 50, 50))
indices = range(len(a))
plt.scatter(indices, indices, c='b')
plt.scatter(indices, a, c='r')
plt.show()
