Merging array slices in Python

So I have a string that looks like this:
data="ABCABDABDABBCBABABDBCABBDBACBBCDB"
And I am taking random 10 character slices out of it:
start=int(random.random()*100)
end = start+10
slice = data[start:start+10]
But what I am trying to do now is count the number of 'gaps' or 'holes' that were not sliced out at all.
slices_indices = []
for i in xrange(0,100):
    start = int(random.random()*100)
    end = start + 10
    slice = data[start:end]
    ...
    slices_indices.append([start,end])
For instance, after running this a couple of times, I covered this much:
ABCAB DABD ABBCBABABDB C ABBDBACBBCDB
But that left two 'gaps' between the slices. Is there a 'Pythonic' way to find the number of these gaps? Basically, I am looking for a function count_gaps that takes the slice indices and returns the number of gaps.
For the example above,
count_gaps(slices_indices)
would give me two.
Thanks in advance.

There are several, although all involve a bit of messing about.
You could compare the removed strings against the original, and work out which characters you didn't hit.
That's a very roundabout way of doing it, though, and won't work properly if you ever have the same 10 characters in the string twice, e.g. 1234123 or something.
A better solution would be to store the values of i you use, then step back through the data string comparing the current position to the values of i you used (plus 10). If it doesn't match, job done.
eg (pseudo code)
# Make an array the same length as the string
charsUsed = array(data.length)
# Do whatever
for i in xrange(0,100)
    someStuffYouWereDoingBefore()
    # Store our "used chars" in the array
    for(char = i; char < i+10; char++)
        if(char < data.length) # Don't go out of bounds on the array!
            charsUsed[char] = true
Then to see which chars weren't used, just walk through the charsUsed array and count whatever it is you want to count (consecutive gaps etc).
Edit in response to updated question:
I'd still use the above method to make a "which chars were used" array. Your count_gaps() function then just needs to walk through the array to "find" the gaps
eg (pseudo...something. This isn't even vaguely Python. Hopefully you get the idea though)
The idea is essentially to see if the current position is false (i.e. not used) and the last position is true (used), meaning it's the start of a "new" gap. If both are false, we're in the middle of a gap, and if both are true, we're in the middle of a "used" string.
function find_gaps(array charsUsed)
{
    # Count the gaps
    numGaps = 0
    # What did we look at last (to see if it's the start of a gap)
    # Assume it's true if you want to count "gaps" at the start of the string, assume it's false if you don't.
    lastPositionUsed = true
    for(i = 0; i < charsUsed.length; i++)
    {
        if(charsUsed[i] == false && lastPositionUsed == true)
        {
            numGaps++
        }
        lastPositionUsed = charsUsed[i]
    }
    return numGaps
}
The other option would be to step through the charsUsed array again and "group" consecutive values into a smaller array, then count the value you want... essentially the same thing but with a different approach. With this example I just ignore the groups I don't want and the "rest" of the groups I do, counting only the boundaries between the group we don't want and the group we do.
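For reference, the "used characters" idea from this answer could look roughly like the following in Python 3. This is only a sketch: the function name count_gaps comes from the question, but the second parameter length (the length of the data string) and the (start, end) pair format are assumptions made here for illustration.

```python
def count_gaps(slice_indices, length):
    """Count runs of positions not covered by any (start, end) slice.

    `length` is an assumed extra parameter: the length of the sliced string.
    """
    used = [False] * length
    for start, end in slice_indices:
        # Clamp to the string bounds, as Python slicing itself would.
        for pos in range(max(start, 0), min(end, length)):
            used[pos] = True

    gaps = 0
    previous_used = True  # True means a gap starting at position 0 is counted
    for flag in used:
        if not flag and previous_used:
            gaps += 1  # first unused position after a used one: a new gap
        previous_used = flag
    return gaps


print(count_gaps([(0, 3), (5, 8)], 12))  # positions 3-4 and 8-11 are unused
```

Setting previous_used to False initially would ignore a gap at the very start of the string, as the pseudocode comment above suggests.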

It is a bit of a messy task, but I think sets are the way to go. I hope my code below is self-explanatory, but if there are parts you don't understand please let me know.
#! /usr/bin/env python
''' Count gaps.
    Find and count the sections in a sequence that weren't touched by random slicing
    From http://stackoverflow.com/questions/26060688/merging-arrays-slices-in-python
    Written by PM 2Ring 2014.09.27
'''
import random
from string import ascii_lowercase
def main():
    def rand_slice():
        start = random.randint(0, len(data) - slice_width)
        return start, start + slice_width
    #The data to slice
    data = 5 * ascii_lowercase
    print 'Data:\n%s\nLength : %d\n' % (data, len(data))
    random.seed(42)
    #A set to capture slice ranges
    slices = set()
    slice_width = 10
    num_slices = 10
    print 'Extracting %d slices from data' % num_slices
    for i in xrange(num_slices):
        start, end = rand_slice()
        slices |= set(xrange(start, end))
        data_slice = data[start:end].upper()
        print '\n%2d, %2d : %s' % (start, end, data_slice)
        data = data[:start] + data_slice + data[end:]
        print data
        #print sorted(slices)
    print '\nSlices:\n%s\n' % sorted(slices)
    print '\nSearching for gaps missed by slicing'
    unsliced = sorted(tuple(set(xrange(len(data))) - slices))
    print 'Unsliced:\n%s\n' % (unsliced,)
    gaps = []
    if unsliced:
        last = start = unsliced[0]
        for i in unsliced[1:]:
            if i > last + 1:
                t = (start, last + 1)
                gaps.append(t)
                print t
                start = i
            last = i
        t = (start, last + 1)
        gaps.append(t)
        print t
    print '\nGaps:\n%s\nCount: %d' % (gaps, len(gaps))
if __name__ == '__main__':
    main()

I'd use some kind of bitmap. For example, extending your code:
import random
data="ABCABDABDABBCBABABDBCABBDBACBBCDB"
slices_indices = [0]*len(data)
for i in xrange(0,100):
    start=int(random.random()*len(data))
    end=start + 10
    slice = data[start:end]
    slices_indices[start:end] = [1] * len(slice)
I've used a list here, but you could use any other appropriate data structure, probably something more compact, if your data is rather big.
So, we've initialized the bitmap with zeros, and marked the selected chunks of data with ones. Now we can use something from itertools, for example:
from itertools import groupby
groups = groupby(slices_indices)
groupby returns an iterator where each element is a tuple (key, group iterator), one per run of equal consecutive values. To just count gaps you can do something simple, like:
gaps = len([x for x in groups if x[0] == 0])
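To make the groupby step concrete, here is a small Python 3 sketch on a hand-written bitmap. The helper name count_gaps_bitmap and the sample bitmap are made up for illustration; only itertools.groupby is taken from the answer.

```python
from itertools import groupby


def count_gaps_bitmap(bitmap):
    """Count runs of consecutive zeros in a 0/1 list."""
    # groupby yields one (key, group) pair per run of equal consecutive
    # values, so each pair whose key is 0 corresponds to exactly one gap.
    return sum(1 for value, _run in groupby(bitmap) if value == 0)


bitmap = [1, 1, 0, 0, 1, 0, 1, 1, 0]
print(count_gaps_bitmap(bitmap))  # three separate runs of zeros
```

Note that this counts runs of zeros at the very start or end of the bitmap as gaps too.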

Related

Python: optimizing the Van Eck sequence

I am writing Python code for the platform Coding Games. The code is about Van Eck's sequence and I pass 66% of the "tests".
Everything is working as expected; the problem is that the process runs out of the time allowed.
Yes, the code is slow.
I am not a Python programmer, so I would like to ask whether you could optimize this piece of code. If your method is complex (for example, something using vectorized data, rather than just swapping an if, which is easy to understand), please give a good explanation for your choice.
Here is my code for the problem:
import sys
import math
def LastSeen(array):
    startingIndex = 0
    lastIndex = len(array) - 1
    closestNum = 0
    for startingIndex in range(len(array)-1,-1,-1):
        if array[lastIndex] == array[startingIndex] and startingIndex != lastIndex :
            closestNum = abs(startingIndex - lastIndex)
            break
    array.append(closestNum)
    return closestNum
def calculateEck(elementFirst,numSeq):
    number = numSeq
    first = elementFirst
    result = 0
    sequence.append(first)
    sequence.append(0)
    number -= 2
    while number != 0 :
        result = LastSeen(sequence)
        number -= 1
    print(result)
firstElement = int(input())
numSequence = int(input())
sequence = []
calculateEck(firstElement,numSequence)
So here is my code without dictionaries; van_eck contains the sequence at the end. Usually I would use a dict to track the last position of each element to save runtime. Otherwise you need to iterate over the list to find the last occurrence, which can take very long.
Instead of a dict, I simply initialized an array of sufficient size and use it like a dict. To determine its size, keep in mind that every number in the Van Eck sequence is either 0 or tells you how far away the last occurrence is. So none of the first n numbers of the sequence can be greater than n. Hence, you can just give the array a length equal to the size of the sequence you want to have in the end.
-1 means the element has not been seen before.
DIGITS = 100
van_eck = [0]
last_pos = [0] + [-1] * DIGITS
for i in range(DIGITS):
    current_element = van_eck[i]
    if last_pos[current_element] == -1:
        van_eck.append(0)
    else:
        van_eck.append(i - last_pos[current_element])
    last_pos[current_element] = i
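The dict-based variant mentioned above might look like the sketch below. The function name van_eck_term and its 1-based n parameter are assumptions chosen to match the question's two inputs (first element, sequence length); they do not come from the original code.

```python
def van_eck_term(first, n):
    """Return the n-th term (1-based) of the Van Eck sequence starting at `first`."""
    last_seen = {}  # value -> index of its most recent earlier occurrence
    current = first
    for i in range(n - 1):
        # Next term: distance back to the previous occurrence, or 0 if new.
        nxt = i - last_seen[current] if current in last_seen else 0
        last_seen[current] = i
        current = nxt
    return current


print([van_eck_term(0, k) for k in range(1, 11)])
```

Each step is a single dict lookup, so the whole run is linear in n, whereas rescanning the list for every term makes it quadratic. Only the current term and the dict are kept, so the full sequence is never stored.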

How to loop to generate string in sequence?

I am trying to write a loop that generates strings. What I want is a small collection of strings, from 1 character up to 5 characters long.
So, starting from the string 1, I want to go up to 55555. With plain numbers this seems easy because I can just add to them, but with alphanumeric characters it gets tricky.
Here is an explanation:
I have a collection of alphanumeric characters as a string s = "123ABC". I want to create all possible 1-character strings from it, so I will have 1, 2, 3, A, B, C. After that I want to increase the string length by one, so I get 11, 12, 13 and so on, until I have all possible combinations up to CA, CB, CC, and I want to continue like this up to CCCCCC. I am confused about the loop: I can generate a temp string, but looping inside it to rotate the characters is tricky.
This is what I have done so far:
i = 0
strr = "123ABC"
while i < len(strr):
    t = strr[0] * (i+1)
    for q in range(0, len(t)):
        # Here I need help to rotate more
        pass
    i += 1
Can anyone explain this to me or point me to a resource where I can find a solution?
You may want to use the itertools.permutations function:
import itertools
chars = '123ABC'
for i in xrange(1, len(chars)+1):
    print list(itertools.permutations(chars, i))
EDIT:
To get a list of strings, try this:
import itertools
chars = '123ABC'
strings = []
for i in xrange(1, len(chars)+1):
    strings.extend(''.join(x) for x in itertools.permutations(chars, i))
This is a nested loop. Each level of recursion adds one more position, so the different depths together produce all possible combinations.
strr = "123ABC"
def prod(items, level):
    if level == 0:
        yield []
    else:
        for first in items:
            for rest in prod(items, level-1):
                yield [first] + rest
for ln in range(1, len(strr)+1):
    print("length:", ln)
    for s in prod(strr, ln):
        print(''.join(s))
This is also called the Cartesian product, and there is a corresponding function in itertools.
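The itertools function in question is itertools.product. A minimal Python 3 sketch follows; the generator name all_strings and the max_len parameter are illustrative choices, not from the thread. Unlike itertools.permutations, product with repeat=n allows a character to be reused, which is what strings such as 11 or CC require.

```python
from itertools import product


def all_strings(chars, max_len):
    """Yield every string over `chars` of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        # product(chars, repeat=length) is the Cartesian product of
        # `chars` with itself `length` times, in lexicographic order.
        for combo in product(chars, repeat=length):
            yield "".join(combo)


for s in all_strings("123ABC", 2):
    print(s)
```

For an alphabet of k characters this yields k + k^2 + ... + k^max_len strings, so the output grows quickly with max_len.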

How to turn an input string into a list, change it into a palindrome (if it isn't one already) and convert it back to a string

A string is a palindrome if it reads the same forward and backward. Given a string that contains only lowercase English letters, you are required to create a new palindrome string from the given string following the rules given below:
1. You can reduce (but not increase) any character in a string by one; for example you can reduce the character h to g but not g to h.
2. In order to achieve your goal, you can reduce a character of the string repeatedly until it becomes the letter a; but once it becomes a, you cannot reduce it any further.
Each reduction operation is counted as one, so you also need to count how many reductions you make. Write a Python program that reads a string from user input (using the raw_input statement), creates a palindrome string from the given string with the minimum possible number of operations, and then prints the palindrome string created and the number of operations needed to create it.
I tried to convert the string to a list first, then modify the list so that whatever string is given, if it's not a palindrome, it is automatically edited into a palindrome. After modifying the list, I convert it back to a string and print the result.
c=raw_input("enter a string ")
x=list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
    if x[i] < x[j]:
        a += ord(x[j]) - ord(x[i])
        x[j] = x[i]
        print x
    else:
        a += ord(x[i]) - ord(x[j])
        x [i] = x[j]
        print x
    i = i + 1
    j = (len(x)-1)-1
print "The number of operations is ",a
print "The palindrome created is",( ''.join(x) )
Am I approaching it the right way, or is there something I'm missing?
Since only reduction is allowed, the number of reductions for each pair is simply the difference between the two characters. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
The difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using the absolute value to avoid checking which letter has the higher ASCII value.
I hope this approach will help.
s=input()
l=len(s)
count=0
m=0
n=l-1
while m<n:
    count+=abs(ord(s[m])-ord(s[n]))
    m+=1
    n-=1
print(count)
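The loop above only counts the operations. A small extension that also builds the palindrome could look like this Python 3 sketch; the function name make_palindrome is an assumption for illustration. In each mirrored pair, the larger character is reduced to the smaller one, since characters can only go down.

```python
def make_palindrome(s):
    """Return (palindrome, operations) using only character reductions."""
    chars = list(s)
    ops = 0
    i, j = 0, len(chars) - 1
    while i < j:
        # Both ends must end up equal; only the larger one can be reduced.
        low = min(chars[i], chars[j])
        ops += abs(ord(chars[i]) - ord(chars[j]))
        chars[i] = chars[j] = low
        i += 1
        j -= 1
    return "".join(chars), ops


print(make_palindrome("abcd"))
```

For 'abcd' the pairs (a,d) and (b,c) cost 3 and 1 reductions respectively.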
This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick is to use string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
    half, rem = divmod(len(string), 2)
    a, b, c = '', '', ''
    a, b = string[:half], string[half:]
    if rem > 0:
        # odd length: the middle character goes into c
        b, c = string[half + 1:], string[half]
    return (a, b, c)
The above should recreate the string if you do a + c + b. Next, convert a and b to lists and map the ord function over each half. Leave the remainder alone, if any.
def convert_to_ord_list(string):
    return map(ord, list(string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
    halfone, halftwo, rem = split_string_to_halves(string)
    if halfone == halftwo[::-1]:
        # already a palindrome: nothing to reduce
        return string, 0
    halftwo = halftwo[::-1]
    zipped = zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo))
    counter = sum([max(x) - min(x) for x in zipped])
    floors = [min(x) for x in zipped]
    res = "".join(map(chr, floors))
    res += rem + res[::-1]
    return res, counter
Finally, some tests:
target = 'ideal'
print convert_to_palindrome(target) # ('iaeai', 6)
target = 'euler'
print convert_to_palindrome(target) # ('eelee', 29)
target = 'ohmygodthisisinsane'
print convert_to_palindrome(target) # ('ehasgidihihidigsahe', 84)
I'm not sure if this is optimized nor if I covered all bases. But I think this pretty much covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck and let us know how this works for you.

Python - Find matching elements between and within nested lists

Background: I have a lengthy script which calculates possible chemical formulae for a given mass (based on a number of criteria), and outputs (amongst other things) a code which corresponds to the 'class' of compounds that each formula belongs to. I calculate formulae from batches of masses which should all be members of the same class. However, given the limits of the instrumentation etc., it is possible to calculate several possible formulae for each mass. I need to check whether any of the calculated classes are common to all peaks, and if so, return the position of the match etc.
I'm struggling to work out how to write an iterative if/for loop which checks every combination for matches (in an efficient way).
[Images: a diagram summarising the issue and screenshots of the data structure]
As you can see, I have a list called "formulae" which has a variable number of elements (in this case, 12).
Each element in formulae is a list, again with a variable number of elements.
Each element within those lists is a list, containing 15 7 elements. I wish to compare the 11th element amongst different elements.
I.e.
formulae[0][0][11] == formulae[1][0][11]
formulae[0][0][11] == formulae[1][1][11]
...
formulae[0][1][11] == formulae[11][13][11]
I imagine the answer might involve a couple of nested for and if statements, but I can't get my head around it.
I then will need to export the lists which matched (like formulae[0][0]) to a new array.
Unless I'm doing this wrong?
Thanks for any help!
EDIT:
1- My data structure has changed slightly, and I need to check that elements [?][?][4] and [?][?][5] and [?][?][6] and [?][?][7] all match the corresponding elements in another list.
I've attempted to adapt some of the code suggested, but can't quite get it to work...
check_O = 4
check_N = 5
check_S = 6
check_Na = 7
# start with base (left-hand) formula
nbase_i = len(formulae)
for base_i in range(len(formulae)): # length of first index
    for base_j in range(len(formulae[base_i])): # length of second index
        count = 0
        # check against comparison (right-hand) formula
        for comp_i in range(len(formulae)): # length of first index
            for comp_j in range(len(formulae[comp_i])): # length of second index
                if base_i != comp_i:
                    o_test = formulae[base_i][base_j][check_O] == formulae[comp_i][comp_j][check_O]
                    n_test = formulae[base_i][base_j][check_N] == formulae[comp_i][comp_j][check_N]
                    s_test = formulae[base_i][base_j][check_S] == formulae[comp_i][comp_j][check_S]
                    na_test = formulae[base_i][base_j][check_Na] == formulae[comp_i][comp_j][check_Na]
                    if o_test == n_test == s_test == na_test == True:
                        count = count +1
                    else:
                        count = 0
                    if count < nbase_i:
                        print base_i, base_j, comp_i,comp_j
                        o_test = formulae[base_i][base_j][check_O] == formulae[comp_i][comp_j][check_O]
                        n_test = formulae[base_i][base_j][check_N] == formulae[comp_i][comp_j][check_N]
                        s_test = formulae[base_i][base_j][check_S] == formulae[comp_i][comp_j][check_S]
                        na_test = formulae[base_i][base_j][check_Na] == formulae[comp_i][comp_j][check_Na]
                        if o_test == n_test == s_test == na_test == True:
                            count = count +1
                        else:
                            count = 0
                    elif count == nbase_i:
                        matching = "Got a match! " + "[" +str(base_i) + "][" + str(base_j) + "] matches with " + "[" + str(comp_i) + "][" + str(comp_j) +"]"
                        print matching
                else:
                    count = 0
I would take a look at using the in operator, such as:
agg = []
for x in arr:
    matched = [y for y in arr2 if x in y]
    agg.append(matched)
Prune's answer is not quite right; it should be like this:
check_index = 11
# start with base (left-hand) formula
for base_i in range(len(formulae)): # length of first index
    for base_j in range(len(formulae[base_i])): # length of second index
        # check against comparison (right-hand) formula
        for comp_i in range(len(formulae)): # length of first index
            for comp_j in range(len(formulae[comp_i])): # length of second index
                if formulae[base_i][base_j][check_index] == formulae[comp_i][comp_j][check_index]:
                    print "Got a match"
                    # Here you add whatever info *you* need to identify the match
I'm not sure I fully understand your data structure, hence I'm not gonna write code here but propose an idea: how about an inverted index?
Like you scan once the lists creating kind of a summary of where the value you look for is.
You could create a dictionary composed as follows:
{
'ValueOfInterest1': [ (position1), (position2) ],
'ValueOfInterest2': [ (positionX) ]
}
Then at the end you can have a look at the dictionary and see if any of the values (basically lists) have length > 1.
Of course you'd need to find a way to create a position format that makes sense to you.
Just an idea.
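A possible Python 3 sketch of the inverted-index idea, under the assumption that formulae is a list (one entry per peak) of lists of candidate records, and that the class code sits at a fixed position key_index in each record. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def build_index(formulae, key_index):
    """Map each value found at `key_index` to the (i, j) positions holding it."""
    index = defaultdict(list)
    for i, candidates in enumerate(formulae):
        for j, candidate in enumerate(candidates):
            index[candidate[key_index]].append((i, j))
    return index


def common_to_all(formulae, key_index):
    """Keep only the values that occur in every top-level list."""
    index = build_index(formulae, key_index)
    n = len(formulae)
    return {value: positions for value, positions in index.items()
            if len({i for i, _ in positions}) == n}


# Toy data: three "peaks", each with candidate records [name, class_code].
toy = [[["a", "X"], ["b", "Y"]], [["c", "Y"]], [["d", "Y"], ["e", "Z"]]]
print(common_to_all(toy, 1))
```

This scans the data once instead of comparing every pair of candidates, and the stored (i, j) positions can be used afterwards to pull out the matching records, e.g. formulae[i][j].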
Does this get you going?
check_index = 11
# start with base (left-hand) formula
for base_i in range(len(formulae)): # length of first index
    for base_j in range(len(formulae[0])): # length of second index
        # check against comparison (right-hand) formula
        for comp_i in range(len(formulae)): # length of first index
            for comp_j in range(len(formulae[0])): # length of second index
                if formulae[base_i][base_j] == formulae[comp_i][comp_j]:
                    print "Got a match"
                    # Here you add whatever info *you* need to identify the match

Consolidate IPs into ranges in python

Assume I have a list of IP ranges (last term only) that may or may not overlap:
('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
I'm looking for a way to identify any overlapping ranges and consolidate them into single ranges.
('1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3')
Current thought for algorithm is to expand all ranges into a list of all IPs, eliminate duplicates, sort, and consolidate any consecutive IPs.
Any more python-esque algorithm suggestions?
Here is my version, as a module. My algorithm is identical to the one lunixbochs mentions in his answer, and the conversion from range string to integers and back is nicely modularized.
import socket, struct
def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
def expandrange(rng):
    # expand '1.1.1.1-7' to ['1.1.1.1', '1.1.1.7']
    start, end = [ip.split('.') for ip in rng.split('-')]
    return map('.'.join, (start, start[:len(start) - len(end)] + end))
def compressrange((start, end)):
    # compress ['1.1.1.1', '1.1.1.7'] to '1.1.1.1-7'
    start, end = start.split('.'), end.split('.')
    return '-'.join(map('.'.join,
        (start, end[next((i for i in range(4) if start[i] != end[i]), 3):])))
def strings_to_ints(ranges):
    # turn range strings into list of lists of ints
    return [map(ip2long, rng) for rng in map(expandrange, ranges)]
def ints_to_strings(ranges):
    # turn lists of lists of ints into range strings
    return [compressrange(map(long2ip, rng)) for rng in ranges]
def consolodate(ranges):
    # join overlapping ranges in a sorted iterable
    iranges = iter(ranges)
    startmin, startmax = next(iranges)
    for endmin, endmax in iranges:
        # leave out the '+ 1' if you want to join overlapping ranges
        # but not consecutive ranges.
        if endmin <= (startmax + 1):
            startmax = max(startmax, endmax)
        else:
            yield startmin, startmax
            startmin, startmax = endmin, endmax
    yield startmin, startmax
def convert_consolodate(ranges):
    # convert a list of possibly overlapping ip range strings
    # to a sorted, consolodated list of non-overlapping ip range strings
    return list(ints_to_strings(consolodate(sorted(strings_to_ints(ranges)))))
if __name__ == '__main__':
    ranges = ('1.1.1.1-7',
              '2.2.2.2-10',
              '3.3.3.3-3.3.3.3',
              '1.1.1.4-25',
              '2.2.2.4-6')
    print convert_consolodate(ranges)
    # prints ['1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3']
Convert your ranges into pairs of numbers. These functions will convert individual IPs to and from integer values.
import socket, struct
def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
Now you can sort/merge the edges of each range as numbers, then convert back to IPs to get a nice representation. This question about merging time ranges has a nice algorithm.
Parse your strings of the form 1.1.1.1-1.1.1.2 and 1.1.1.1-2 into a pair of numbers. For the latter format, you could do:
x = '1.1.1.1-2'
first, add = x.split('-')
second = first.rsplit('.', 1)[0] + '.' + add
pair = ip2long(first), ip2long(second)
Merge the overlapping ranges using simple number comparisons.
Convert back to the string representation (still assuming the latter format):
first, second = pair
first = long2ip(first) + '-' + long2ip(second).rsplit('.', 1)[1]
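Putting the three steps together, a Python 3 sketch might look like this. It swaps in the standard-library ipaddress module for the socket/struct helpers above, and the names parse_range, merge and format_range are made up for illustration. It assumes IPv4 and that both ends of a range share the first three octets when formatting.

```python
import ipaddress


def parse_range(text):
    """'1.1.1.1-7' or '1.1.1.1-1.1.1.7' -> (first_int, last_int)."""
    first, _, last = text.partition("-")
    if "." not in last:
        # Short form: reuse the first three octets of the start address.
        last = first.rsplit(".", 1)[0] + "." + last
    return int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last))


def merge(pairs):
    """Merge overlapping or directly adjacent (lo, hi) integer ranges."""
    merged = []
    for lo, hi in sorted(pairs):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


def format_range(lo, hi):
    """Back to the short '1.1.1.1-25' form (assumes a shared prefix)."""
    last_octet = str(ipaddress.IPv4Address(hi)).rsplit(".", 1)[1]
    return str(ipaddress.IPv4Address(lo)) + "-" + last_octet


ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
print([format_range(lo, hi) for lo, hi in merge(map(parse_range, ranges))])
```

Dropping the "+ 1" in merge would join only overlapping ranges and leave directly adjacent ones separate.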
I once faced the same problem. The only difference was that I had to keep line segments in a list efficiently. It was for a Monte Carlo simulation, and the newly generated random line segments had to be added to the existing sorted and merged line segments.
I adapted the algorithm to your problem, using the answer by lunixbochs to convert IPs to integers.
This solution allows adding a new IP range to an existing list of already merged ranges (while other solutions rely on having the list of ranges to merge sorted, and do not allow adding a new range to an already merged list). This is done in the add_range function, which uses the bisect module to find the place where the new IP range should be inserted, then deletes the redundant IP intervals and inserts the new range with adjusted boundaries so that the new range covers all the deleted ranges.
import socket
import struct
import bisect
def ip2long(ip):
    '''IP to integer'''
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]
def long2ip(n):
    '''integer to IP'''
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)
def get_ips(s):
    '''Convert string IP interval to tuple with integer representations of boundary IPs
    '1.1.1.1-7' -> (a,b)'''
    s1,s2 = s.split('-')
    if s2.isdigit():
        #replace the whole last octet of s1, not just its last character
        s2 = s1.rsplit('.', 1)[0] + '.' + s2
    return (ip2long(s1),ip2long(s2))
def add_range(iv,R):
    '''add new Range to already merged ranges inplace'''
    left,right = get_ips(R)
    #left,right are left and right boundaries of the Range respectively
    #If this is the very first Range just add it to the list
    if not iv:
        iv.append((left,right))
        return
    #Searching the first interval with left_boundary < left range side
    p = bisect.bisect_right(iv, (left,right)) #place after the needed interval
    p -= 1 #calculating the number of interval basing on the position where the insertion is needed
    #Interval: |----X----| (delete)
    #Range: <--<--|----------| (extend)
    #Detect if the left Range side is inside the found interval
    if p >=0: #if p==-1 then there was no interval found
        if iv[p][1]>= right:
            #Detect if the Range is completely inside the interval
            return #drop the Range; I think it will be a very common case
        if iv[p][1] >= left-1:
            left = iv[p][0] #extending the left Range interval
            del iv[p] #deleting the interval from the interval list
            p -= 1 #correcting index to keep the invariant
    #Intervals: |----X----| |---X---| (delete)
    #Range: |-----------------------------|
    #Deleting all the intervals which are inside the Range interval
    while True:
        p += 1
        if p >= len(iv) or iv[p][0] >= right or iv[p][1] > right:
            #Stop searching for the intervals which are inside the Range interval:
            #there are no more intervals or
            #the interval is to the right of the right Range side
            # it's the next case (right Range side is inside the interval)
            break
        del iv[p] #delete the now redundant interval from the interval list
        p -= 1 #correcting index to keep the invariant
    #Interval: |--------X--------| (delete)
    #Range: |-----------|-->--> (extend)
    #Working the case when the right Range side is inside the interval
    if p < len(iv) and iv[p][0] <= right-1:
        #there is no condition for right interval side since
        #this case would have already been worked in the previous block
        right = iv[p][1] #extending the right Range side
        del iv[p] #delete the now redundant interval from the interval list
    #No p -= 1, so that p is now pointing to the beginning of the next interval
    #which is the position of insertion
    #Inserting the new interval to the list
    iv.insert(p, (left,right))
def merge_ranges(ranges):
    '''Merge the ranges'''
    iv = []
    for R in ranges:
        add_range(iv,R)
    return ['-'.join((long2ip(left),long2ip(right))) for left,right in iv]
ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
print(merge_ranges(ranges))
Output:
['1.1.1.1-1.1.1.25', '2.2.2.2-2.2.2.10', '3.3.3.3-3.3.3.3']
This was a lot of fun for me to code! Thank you for that :)
The netaddr package does what you want.
See summarizing adjacent subnets with python netaddr cidr_merge
and https://netaddr.readthedocs.io/en/latest/tutorial_01.html#summarizing-list-of-addresses-and-subnets
Unify the format of your IPs and turn each range into a pair of ints.
Now the task is much simpler: "consolidate" integer ranges. I believe there are a lot of existing efficient algorithms to do that; below is only my naive try:
>>> orig_ranges = [(1,5), (7,12), (2,3), (13,13), (13,17)] # should result in (1,5), (7,12), (13,17)
>>> temp_ranges = {}
>>> for r in orig_ranges:
...     temp_ranges.setdefault(r[0], []).append('+')
...     temp_ranges.setdefault(r[1], []).append('-')
...
>>> start_count = end_count = 0
>>> start = None
>>> for key in sorted(temp_ranges):
...     if start is None:
...         start = key
...     start_count += temp_ranges[key].count('+')
...     end_count += temp_ranges[key].count('-')
...     if start_count == end_count:
...         print start, key
...         start = None
...         start_count = end_count = 0
...
1 5
7 12
13 17
The general idea is as follows: after we lay the ranges on top of one another (in the temp_ranges dict), we can find the new composed ranges simply by walking the boundaries in sorted order and counting the beginnings and endings of the original ranges; once the two counts are equal, we have found a united range.
I had these lying around in case you need them; using socket/struct is probably the better way to go, though.
def ip_str_to_int(address):
    """Convert IP address in form X.X.X.X to an int.
    >>> ip_str_to_int('74.125.229.64')
    1249764672
    """
    parts = address.split('.')
    parts.reverse()
    return sum(int(v) * 256 ** i for i, v in enumerate(parts))
def ip_int_to_str(address):
    """Convert IP address int into the form X.X.X.X.
    >>> ip_int_to_str(1249764672)
    '74.125.229.64'
    """
    parts = [(address & 255 << 8 * i) >> 8 * i for i in range(4)]
    parts.reverse()
    return '.'.join(str(x) for x in parts)
