How to join integer intervals in Python?

I have used the interval module pyinterval (http://pyinterval.readthedocs.io/en/latest/index.html)
and created an interval from a set of (start, end) tuples:
intervals = interval.interval([1,8], [7,10], [15,20])
This results in interval([1.0, 10.0], [15.0, 20.0]), because [1,8] and [7,10] overlap.
But this module interprets the values of the pairs as real numbers, so two contiguous integer intervals are not joined together.
Example:
intervals = interval.interval([1,8], [9,10], [11,20])
results in: interval([1.0, 8.0], [9.0, 10.0], [11.0, 20.0])
My question is: how can I join these intervals as integers rather than as real numbers? In the last example the result would be interval([1.0, 20.0]).

The pyinterval module works with real numbers, not integers. You can either write your own integer interval class or write a helper that joins integer intervals on top of the interval module:
def join_int_intervals(int1, int2):
    # Join when int2 starts no later than one past the end of int1.
    if int(int1[-1][-1]) + 1 >= int(int2[-1][0]):
        return interval.interval([int1[-1][0], int2[-1][-1]])
    else:
        return interval.interval()

I believe you can use pyinterval for integer intervals too by adding interval([-0.5, 0.5]), which widens every component by 0.5 on each side so that consecutive integer intervals touch. With your example you get
In[40]: interval([1,8], [9,10], [11,20]) + interval([-0.5, 0.5])
Out[40]: interval([0.5, 20.5])
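The result above is still padded by 0.5 on each side. A minimal sketch of the same idea in plain Python, without pyinterval: pad, merge as real intervals, then remove the padding again. The helper names merge_real and merge_int_by_padding are made up for this illustration.

```python
def merge_real(intervals):
    """Merge closed real intervals that overlap or touch."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


def merge_int_by_padding(intervals):
    """Pad integer intervals by 0.5, merge as reals, then undo the padding."""
    padded = [(lo - 0.5, hi + 0.5) for lo, hi in intervals]
    return [(int(lo + 0.5), int(hi - 0.5)) for lo, hi in merge_real(padded)]


print(merge_int_by_padding([(1, 8), (9, 10), (11, 20)]))  # [(1, 20)]
print(merge_int_by_padding([(1, 8), (7, 10), (15, 20)]))  # [(1, 10), (15, 20)]
```

The padding turns "adjacent integers" into "touching real intervals", so an ordinary real-interval merge does the rest.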

This takes a list of tuples like l = [(25,24), (17,18), (5,9), (24,16), (10,13), (15,19), (22,25)]
# Idea by Ben Voigt in https://stackoverflow.com/questions/32869247/a-container-for-integer-intervals-such-as-rangeset-for-c
def sort_condense(ivs):
    if len(ivs) == 0:
        return []
    if len(ivs) == 1:
        if ivs[0][0] > ivs[0][1]:
            return [(ivs[0][1], ivs[0][0])]
        else:
            return ivs
    # Build a sorted list of endpoints; False marks a left end, True a right end.
    eps = []
    for iv in ivs:
        ivl = min(iv)
        ivr = max(iv)
        eps.append((ivl, False))
        eps.append((ivr, True))
    eps.sort()
    ret = []
    level = 0
    i = 0
    while i < len(eps)-1:
        if not eps[i][1]:
            level = level+1
            if level == 1:
                left = eps[i][0]
        else:
            if level == 1:
                # Next interval starts at the adjacent integer: keep the run open.
                if (not eps[i+1][1]
                        and eps[i+1][0] == eps[i][0]+1):
                    i = i+2
                    continue
                right = eps[i][0]
                ret.append((left, right))
            level = level-1
        i = i+1
    ret.append((left, eps[len(eps)-1][0]))
    return ret
In [1]: sort_condense(l)
Out[1]: [(5, 13), (15, 25)]
The idea is outlined in Ben Voigt's answer to A container for integer intervals, such as RangeSet, for C++
Python is not my main language, sorry.

I came up with the following program:
ls = [[1,8], [7,10], [15,20]]
ls2 = []
prevList = ls[0]
for lists in ls[1:]:
    if lists[0] <= prevList[1]+1:
        prevList = [prevList[0], lists[1]]
    else:
        ls2.append(prevList)
        prevList = lists
ls2.append(prevList)
print(ls2) # prints [[1, 10], [15, 20]]
It iterates through all the lists and checks whether the first element of each list is less than or equal to the last element of the previous list + 1. If so, it merges the two.
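The loop above assumes the input is already sorted by start and that no interval is nested inside the previous one. A minimal sketch that lifts both assumptions by sorting first and keeping the larger end; the function name merge_int_intervals is hypothetical.

```python
def merge_int_intervals(intervals):
    """Merge inclusive integer intervals that overlap or are adjacent."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend, never shrink, the current run.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


print(merge_int_intervals([[11, 20], [1, 8], [9, 10]]))  # [[1, 20]]
print(merge_int_intervals([[1, 10], [2, 3], [15, 20]]))  # [[1, 10], [15, 20]]
```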

Related

Iterate through ranges and return those not in any range?

I have a list of floats.
values = [2.3, 6.4, 11.3]
What I want to do is find a range from each value in the list of size delta = 2, then iterate through another range of floats and compare each float to each range, then return the floats that do not fall in any ranges.
What I have so far is,
not_in_range = []
for x in values:
    pre = float(x - delta)
    post = float(x + delta)
    for y in numpy.arange(0,15,0.5):
        if (pre <= y <= post) == True:
            pass
        else:
            not_in_range.append(y)
But obviously, this does not work for several reasons: redundancy, does not check all ranges at once, etc. I am new to coding and I am struggling to think abstractly enough to solve this problem. Any help in formulating a plan of action would be greatly appreciated.
EDIT
For clarity, what I want is a list of ranges from each value (or maybe a numpy array?) as
[0.3, 4.3]
[4.4, 8.4]
[9.3, 13.3]
And to return any float from 0 - 15 in increments of 0.5 that do not fall in any of those ranges, so the final output would be:
not_in_ranges = [0, 8.5, 9, 13.5, 14, 14.5]
To generate the list of ranges, you could do a quick list comprehension:
ranges = [[x-2, x+2] for x in values]
## [[0.3, 4.3], [4.4, 8.4], [9.3, 13.3]]
Then, to return any float from 0 to 15 (in increments of 0.5) that don't fall in any of the ranges, you can use:
not_in_ranges = []
for y in numpy.arange(0, 15, 0.5): # for all desired values to check
    if not any(pre < y and y < post for pre, post in ranges):
        not_in_ranges.append(y) # if it is in none of the intervals, append it
## [0.0, 8.5, 9.0, 13.5, 14.0, 14.5]
Explanation: This loops through each of the possible values and appends it to the not_in_ranges list if it is not in any of the intervals. To check if it is in the intervals, I use the builtin python function any to check if there are any pre and post values in the list ranges that return True when pre < y < post (i.e. if y is in any of the intervals). If this is False, then it doesn't fit into any of the intervals and so is added to the list of such values.
Alternatively, if you only need the result (and not both of the lists), you can combine the two with something like:
not_in_ranges = []
for y in numpy.arange(0, 15, 0.5):
    if not any(x-2 < y and y < x+2 for x in values):
        not_in_ranges.append(y)
You could even use list comprehension again, giving the very pythonic looking:
not_in_ranges = [y for y in numpy.arange(0, 15, 0.5) if not any(x-2 < y and y < x+2 for x in values)]
Note that the last one is likely the fastest, since a list comprehension avoids the repeated append call and is usually faster than the equivalent loop. It may not be the easiest to read at a glance, though, if you aren't already used to Python's list comprehension syntax.
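Each variant above scans every interval for every candidate value. When there are many intervals, one option is to merge them once and then locate each candidate with a binary search. The following is only a sketch under that assumption; outside_all is a made-up name, and the intervals are treated as open, matching the pre < y < post test used above.

```python
from bisect import bisect_right


def outside_all(candidates, centers, delta):
    """Return the candidates lying in none of the open intervals (c - delta, c + delta)."""
    # Merge the intervals so that they are sorted and pairwise disjoint.
    merged = []
    for lo, hi in sorted((c - delta, c + delta) for c in centers):
        if merged and lo < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    starts = [lo for lo, _ in merged]

    result = []
    for y in candidates:
        k = bisect_right(starts, y) - 1  # last interval starting at or before y
        if k < 0 or not (merged[k][0] < y < merged[k][1]):
            result.append(y)
    return result


candidates = [i * 0.5 for i in range(30)]  # 0.0, 0.5, ..., 14.5
print(outside_all(candidates, [2.3, 6.4, 11.3], 2))
# [0.0, 8.5, 9.0, 13.5, 14.0, 14.5]
```

With n candidates and m intervals this does one sort of the intervals plus a binary search per candidate, instead of n times m comparisons.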
I have done a comparative analysis (in a Jupyter notebook). See the results.
# First cell
import numpy as np
values = np.random.randn(1000000)
values.shape
# Second cell
%%time
not_in_range = []
for x in values:
    pre = float(x - 2)
    post = float(x + 2)
    for y in np.arange(0,15,0.5):
        if (pre <= y <= post) == True:
            pass
        else:
            not_in_range.append(y)
# Second cell output - Wall time: 37.2 s
# Third cell
%%time
pre = values - 2
post = values + 2
whole_range = np.arange(0,15,0.5)
whole_range
search_range = []
for pr, po in zip(pre, post):
    pr = (int(pr) + 0.5) if (pr%5) else int(pr)
    po = (int(po) + 0.5) if (po%5) else int(po)
    search_range += list(np.arange(pr, po, 0.5))
whole_range = set(whole_range)
search_range = set(search_range)
print(whole_range.difference(search_range))
# Third cell output - Wall time: 3.99 s
You can use the interval library intvalpy
from intvalpy import Interval
import numpy as np
values = [2.3, 6.4, 11.3]
delta = 2
intervals = values + Interval(-delta, delta)
not_in_ranges = []
for k in np.arange(0, 15, 0.5):
    if k not in intervals:
        not_in_ranges.append(k)
print(not_in_ranges)
Intervals are created according to the constructive definitions of interval arithmetic operations.
The in operator checks whether a point (or an interval) is contained within another interval.

Replacing 'NA's in a nested list

I am trying to do the following: identify if there is a 'NA' value in a nested list, and if so, to replace it with the average value of the sum of the other elements of the list. The elements of the lists should be floats. For example:
[["1.2","3.1","0.2"],["44.0","NA","90.0"]]
should return
[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]
The code below, albeit long and redundant, works:
def convert_data(data):
    first = []
    second = []
    third = []
    fourth = []
    count = 0
    for i in data:
        for y in i:
            if 'NA' not in i:
                y = float(y)
                first.append(y)
            elif 'NA' in i:
                a = i.index('NA')
                second.append(y)
    second[a] = 0
    for q in second:
        q = float(q)
        third.append(q)
        count+= q
    length = len(third)
    count = count/(length-1)
    third[a] = count
    fourth.extend([first,third])
    return fourth
For example:
data = [["1.2","3.1","0.2"],["44.0","NA","90.0"]]
convert_data(data)
returns the desired output:
[[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]
but if the 'NA' is in the first list e.g.
data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]
then it doesn't. Can someone please explain how to fix this?
data_var = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]
def replace_na_with_mean(list_entry):
    for i in range(len(list_entry)):
        index_list = []
        m = 0
        # Remove every 'NA', remembering its position in the original sublist.
        while 'NA' in list_entry[i]:
            index_list.append(list_entry[i].index('NA') + m)
            del list_entry[i][list_entry[i].index('NA')]
            m += 1  # each deletion shifts the remaining items left by one
        if list_entry[i]:
            for n in range(len(list_entry[i])):
                list_entry[i][n] = float(list_entry[i][n])
        if index_list:
            if list_entry[i]:
                avg = sum(list_entry[i]) / len(list_entry[i])
            else:
                avg = 0
            # Put the average back at each original 'NA' position.
            for l in index_list:
                list_entry[i].insert(l, avg)
    return list_entry
print(replace_na_with_mean(data_var))
I'd suggest using pandas, since these types of operations are exactly what pandas was developed for. You can achieve what you want in just a few lines of code:
import numpy as np
import pandas as pd
data = [["1.2","NA","0.2"],["44.0","67.00","90.0"]]
# pd.np is no longer available in recent pandas versions, so use NumPy's nan directly
df = pd.DataFrame(data).T.replace("NA", np.nan).astype(float)
res = df.fillna(df.mean()).T.values.tolist()
which returns the wanted output:
[[1.2, 0.7, 0.2], [44.0, 67.0, 90.0]]
Btw your code works for me just fine in this simple case:
convert_data(data)
> [[44.0, 67.0, 90.0], [1.2, 0.7, 0.2]]
It will start failing or giving faulty results in more complicated cases. For example, if a nested list contains more than one "NA" value, you will get a ValueError exception (the code will try to convert the string "NA" into a float).
This should do the trick, using numpy:
import numpy as np
x=[["1.2","3.1","0.2"],["44.0","NA","90.0"]]
#convert to float (np.float was removed from NumPy; the builtin float is equivalent)
x=np.char.replace(np.array(x), "NA", "nan").astype(float)
#replace nan-s with the row mean (assumes at most one "NA" per row)
mask=np.isnan(x)
x[mask]=np.nanmean(x, axis=1)[mask.any(axis=1)]
Output:
[[ 1.2 3.1 0.2]
[44. 67. 90. ]]
One reason why your code ended up a little overcomplicated is that you tried to start by solving the problem of a "nested list." But really, all you need is a function that processes a list of numeric strings with some "NA" values, and then you can just apply that function to every item in the list.
def float_or_average(list_of_num_strings):
    # First, convert every item that you can to a number. You need to do this
    # before you can handle even ONE "NA" value, because the "NA" values need
    # to be replaced with the average of all the numbers in the collection.
    # So for now, convert ["1.2", "NA", "2.0"] to [1.2, "NA", 2.0]
    parsed = []
    # While we're at it, let's record the sum of the floats and their count,
    # so that we can compute that average.
    numeric_sum = 0.0
    numeric_count = 0
    for item in list_of_num_strings:
        if item == "NA":
            parsed.append(item)
        else:
            floating_point_value = float(item)
            parsed.append(floating_point_value)
            numeric_sum += floating_point_value
            numeric_count += 1
    # Now we can calculate the average:
    average = numeric_sum / numeric_count
    # And replace the "NA" values with it.
    for i, item in enumerate(parsed):
        if item == "NA":
            parsed[i] = average
    return parsed
    # Or, with a list comprehension (replacing the previous four lines of
    # code):
    # return [number if number != "NA" else average for number in parsed]
# Using this function on a nested list is as easy as
example_data = [["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]
parsed_nested_list = []
for sublist in example_data:
    parsed_nested_list.append(float_or_average(sublist))
# Or, using a list comprehension (replacing the previous three lines of code):
parsed_nested_list = [float_or_average(sublist) for sublist in example_data]
def convert_data(data):
    for lst in data:
        total = 0  # running sum of the numeric entries
        index_na = list()
        for elem in range(len(lst)):
            if lst[elem] != 'NA':
                total += float(lst[elem])
                lst[elem] = float(lst[elem])
            else:
                index_na.append(elem)
        if len(index_na) > 0:
            mean_value = total / (len(lst)-len(index_na))
            for i in index_na:
                lst[i] = float("{0:.2f}".format(mean_value))
    return data
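Several of the snippets above divide by the number of numeric entries, which fails for a sublist that contains only "NA". A minimal sketch that also covers that case, assuming a fallback of 0.0 (the same choice replace_na_with_mean makes); fill_na_with_mean and its fallback parameter are hypothetical names, and statistics.fmean needs Python 3.8 or later.

```python
from statistics import fmean


def fill_na_with_mean(rows, fallback=0.0):
    """Convert strings to floats, replacing "NA" with the mean of its own row."""
    filled = []
    for row in rows:
        numbers = [float(x) for x in row if x != "NA"]
        # A row with no numeric entries has no mean; use the fallback instead.
        mean = fmean(numbers) if numbers else fallback
        filled.append([mean if x == "NA" else float(x) for x in row])
    return filled


print(fill_na_with_mean([["1.2", "3.1", "0.2"], ["44.0", "NA", "90.0"]]))
# [[1.2, 3.1, 0.2], [44.0, 67.0, 90.0]]
print(fill_na_with_mean([["NA", "2.0", "NA"], ["NA", "NA"]]))
# [[2.0, 2.0, 2.0], [0.0, 0.0]]
```

Unlike the in-place versions, this builds new lists and leaves the input untouched.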

Most frequently overlapping range - Python3.x

I'm a beginner, trying to write code listing the most frequently overlapping ranges in a list of ranges.
So, input is various ranges (#1 through #7 in the example figure; https://prntscr.com/kj80xl) and I would like to find the most common range (in the example 3,000- 4,000 in 6 out of 7 - 86 %). Actually, I would like to find top 5 most frequent.
Not all ranges overlap. Ranges are always positive and given as integers with 1 distance (standard range).
What I have now is only code comparing one sequence to another and returning the overlap, but after that I'm stuck.
def range_overlap(range_x,range_y):
    x = (range_x[0], (range_x[-1])+1)
    y = (range_y[0], (range_y[-1])+1)
    overlap = (max(x[0],y[0]),min(x[-1],(y[-1])))
    if overlap[0] <= overlap[1]:
        return range(overlap[0], overlap[1])
    else:
        return "Out of range"
I would be very grateful for any help.
Better solution
I came up with a simpler solution (at least IMHO) so here it is:
def get_abs_min(ranges):
    return min([min(r) for r in ranges])
def get_abs_max(ranges):
    return max([max(r) for r in ranges])
def count_appearances(i, ranges):
    return sum([1 for r in ranges if i in r])
def create_histogram(ranges):
    keys = [str(i) for i in range(len(ranges) + 1)]
    histogram = dict.fromkeys(keys)
    results = []
    # Use the function argument (not an undefined global) and avoid
    # shadowing the builtins min and max.
    lo = get_abs_min(ranges)
    hi = get_abs_max(ranges)
    for i in range(lo, hi):
        count = str(count_appearances(i, ranges))
        if histogram[count] is None:
            histogram[count] = dict(start=i, end=None)
        elif histogram[count]['end'] is None:
            histogram[count]['end'] = i
        elif histogram[count]['end'] == i - 1:
            histogram[count]['end'] = i
        else:
            start = histogram[count]['start']
            end = histogram[count]['end']
            results.append((range(start, end + 1), count))
            histogram[count]['start'] = i
            histogram[count]['end'] = None
    for count, d in histogram.items():
        if d is not None and d['start'] is not None and d['end'] is not None:
            results.append((range(d['start'], d['end'] + 1), count))
    return results
def main(ranges, top):
    appearances = create_histogram(ranges)
    return sorted(appearances, key=lambda t: t[1], reverse=True)[:top]
The idea here is simply to iterate through the superposition of all the ranges and build a histogram of appearances (i.e. the number of original ranges that the current i appears in).
After that just sort and slice according to the chosen size of the results.
Just call main with the ranges and the top number you want (or None if you want to see all results).
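The histogram above visits every integer between the smallest and largest bound. A sketch of an alternative that only looks at the range endpoints (a sweep over start/stop events) is shown below. The names overlap_segments and most_overlapped and the sample ranges are made up for illustration, and the inputs are assumed to be range objects with step 1.

```python
from collections import defaultdict


def overlap_segments(ranges):
    """Split the covered span into maximal segments with a constant overlap count."""
    delta = defaultdict(int)
    for r in ranges:
        if len(r):
            delta[r.start] += 1  # one more range is active from here on
            delta[r.stop] -= 1   # and one fewer from its exclusive stop
    segments = []
    active = 0
    points = sorted(delta)
    for left, right in zip(points, points[1:]):
        active += delta[left]
        if active == 0:
            continue  # nothing covers [left, right)
        if segments and segments[-1][1] == active and segments[-1][0].stop == left:
            # Same count as the segment ending right here: extend it.
            segments[-1] = (range(segments[-1][0].start, right), active)
        else:
            segments.append((range(left, right), active))
    return segments


def most_overlapped(ranges, top):
    """Return the `top` segments covered by the most input ranges."""
    return sorted(overlap_segments(ranges), key=lambda seg: seg[1], reverse=True)[:top]


sample = [range(1, 10), range(5, 15), range(8, 12)]  # hypothetical input
print(most_overlapped(sample, 2))
# [(range(8, 10), 3), (range(5, 8), 2)]
```

The running time depends on the number of ranges rather than on how wide they are, and the counts stay integers, so sorting by count does not rely on string comparison.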
OLDER EDITS BELOW
I (almost) agree with @Kasramvd's answer.
Here is my take on it:
from collections import Counter
from itertools import combinations
def range_overlap(x, y):
    # Sort the intersection: set iteration order is not guaranteed.
    common_part = sorted(set(x) & set(y))
    if common_part:
        return range(common_part[0], common_part[-1] + 1)
    else:
        return False
def get_most_common(range_list, top_frequent):
    overlaps = Counter(range_overlap(i, j) for i, j in
                       combinations(range_list, 2))
    return [(r, i) for (r, i) in overlaps.most_common(top_frequent) if r]
You need to pass in the range_list and the number of top_frequent results you want.
EDIT
the previous answer solved this question for all 2's combinations over the range list.
This edit is tested against your input and results with the correct answer:
from collections import Counter
from itertools import combinations
def range_overlap(*args):
    sets = [set(r) for r in args]
    # Sort the intersection: set iteration order is not guaranteed.
    common_part = sorted(set(args[0]).intersection(*sets))
    if common_part:
        return range(common_part[0], common_part[-1] + 1)
    else:
        return False
def get_all_possible_combinations(range_list):
    all_combos = []
    for i in range(2, len(range_list)):
        all_combos.append(combinations(range_list, i))
    all_combos = [list(combo) for combo in all_combos]
    return all_combos
def get_most_common_for_combo(combo):
    return list(filter(None, [range_overlap(*option) for option in combo]))
def get_most_common(range_list, top_frequent):
    all_overlaps = []
    combos = get_all_possible_combinations(range_list)
    for combo in combos:
        all_overlaps.extend(get_most_common_for_combo(combo))
    return [r for (r, i) in Counter(all_overlaps).most_common(top_frequent) if r]
And to get the results just run get_most_common(range_list, top_frequent)
Tested on my machine (Ubuntu 16.04 with Python 3.5.2) with your input range_list and top_frequent = 5, with the results:
[range(3000, 4000), range(2500, 4000), range(1500, 4000), range(3000, 6000), range(1, 4000)]
You can first change your function to return a valid range in both cases so that you can use it in a set of comparisons. Also, since Python's range objects are not pre-built lists but lazy objects that only store the start, stop and step attributes and produce their values on demand, you can simplify your function as well.
def range_overlap(range_x,range_y):
    # stop is already exclusive, so no +1 is needed here
    rng = range(max(range_x.start, range_y.start),
                min(range_x.stop, range_y.stop))
    if rng.start < rng.stop:
        return rng.start, rng.stop
Now, if you have a set of ranges and you want to compare all the pairs, you can use itertools.combinations to get all the pairs, and then with range_overlap and collections.Counter you can count how often each overlap occurs.
from collections import Counter
from itertools import combinations
overlaps = Counter(range_overlap(i,j) for i, j in
                   combinations(list_of_ranges, 2))

How to reduce a collection of ranges to a minimal set of ranges [duplicate]

I'm trying to remove overlapping values from a collection of ranges.
The ranges are represented by a string like this:
499-505 100-115 80-119 113-140 500-550
I want the above to be reduced to two ranges: 80-140 499-550. That covers all the values without overlap.
Currently I have the following code.
cr = "100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250".split(" ")
ar = []
br = []
for i in cr:
    (left,right) = i.split("-")
    ar.append(left);
    br.append(right);
inc = 0
for f in br:
    i = int(f)
    vac = []
    jnc = 0
    for g in ar:
        j = int(g)
        if(i >= j):
            vac.append(j)
            del br[jnc]
        jnc += jnc
    print(vac)
    inc += inc
I split the array by - and store the range limits in ar and br. I iterate over these limits pairwise and if the i is at least as great as the j, I want to delete the element. But the program doesn't work. I expect it to produce this result: 80-125 500-550 200-250 180-185
For a quick and short solution,
from operator import itemgetter
from itertools import groupby
cr = "499-505 100-115 80-119 113-140 500-550".split(" ")
fullNumbers = []
for i in cr:
    a = int(i.split("-")[0])
    b = int(i.split("-")[1])
    fullNumbers += range(a, b+1)
# Remove duplicates and sort it
fullNumbers = sorted(set(fullNumbers))
# Taken From http://stackoverflow.com/questions/2154249
def convertToRanges(data):
    result = []
    # Consecutive integers share the same (index - value) difference.
    for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
        group = list(map(itemgetter(1), g))
        result.append(str(group[0])+"-"+str(group[-1]))
    return result
print(convertToRanges(fullNumbers))
#Output: ['80-140', '499-550']
For the given set in your program, output is ['80-125', '180-185', '200-250', '500-550']
Main possible drawback of this solution: it may not scale, because it materializes every integer in every range.
Let me offer another solution that doesn't take time linearly proportional to the sum of the range sizes. Its running time is linearly proportional to the number of ranges.
def reduce(range_text):
    parts = range_text.split()
    if parts == []:
        return ''
    ranges = [ tuple(map(int, part.split('-'))) for part in parts ]
    ranges.sort()
    new_ranges = []
    left, right = ranges[0]
    for rng in ranges[1:]:
        next_left, next_right = rng
        if right + 1 < next_left:               # Is the next range to the right?
            new_ranges.append((left, right))    # Close the current range.
            left, right = rng                   # Start a new range.
        else:
            right = max(right, next_right)      # Extend the current range.
    new_ranges.append((left, right))            # Close the last range.
    return ' '.join([ '-'.join(map(str, rng)) for rng in new_ranges ])
This function works by sorting the ranges, then looking at them in order and merging consecutive ranges that intersect or are adjacent.
Examples:
print(reduce('499-505 100-115 80-119 113-140 500-550'))
# => 80-140 499-550
print(reduce('100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250'))
# => 80-125 180-185 200-250 500-550

python - unique set of ranges, merging when needed

Is there a data structure that will maintain a unique set of ranges, merging any contiguous or overlapping ranges that are added? I need to track which ranges have been processed, but this may occur in an arbitrary order. E.g.:
range_set = RangeSet() # doesn't exist that I know of, this is what I need help with
def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...
process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)
print(range_set.missing_ranges())
# [[16,19], [31, 49]]
print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]
Notice that overlapping or contiguous ranges get merged together. What is the best way to do this? I looked at using the bisect module, but its use didn't seem terribly clear.
Another approach is based on sympy.sets.
>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]
You could get some similar functionality with Python's built-in set data structure, supposing only integer values are valid for start and end.
>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6)) # processed range(3,5) twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])
This still lacks many 'range'-features but might be sufficient.
A simple query if a certain range was already processed could look like this:
>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]
I've only lightly tested it, but it sounds like you're looking for something like this. You'll need to add the methods to get the ranges and missing ranges yourself, but it should be very straightforward, as RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could write a convenience method that converts it to a list of 2-tuples, for example.
EDIT: I've just modified it to use less-than-or-equal comparisons for merging. Note, however, that this won't merge "adjacent" entries (e.g. it won't merge (1, 5) and (6, 10)). To do this you'd need to simply modify the condition in Range.check_merge().
import bisect
class Range(object):
    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]
    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end
    def __lt__(self, other):
        """Sort ranges by their initial item (bisect only needs <)."""
        return self.start < other.start
    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False
class RangeSet(object):
    def __init__(self):
        self.ranges = []
    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges, advancing through the following entries.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
               new_range.check_merge(self.ranges[check_offset])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []
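The class above leaves the ranges() and missing_ranges() accessors to the reader and does not merge adjacent entries. As a rough sketch of how the whole interface from the question could look for inclusive integer ranges, the following keeps two parallel sorted lists and uses bisect on both. IntRangeSet is a hypothetical name and this is not the answer author's code.

```python
import bisect


class IntRangeSet:
    """Sorted, disjoint list of inclusive integer ranges [start, end]."""

    def __init__(self):
        self._starts = []
        self._ends = []

    def add_range(self, start, end):
        # First stored range that overlaps or touches: its end >= start - 1.
        i = bisect.bisect_left(self._ends, start - 1)
        # One past the last such range: stored ranges with start <= end + 1.
        j = bisect.bisect_right(self._starts, end + 1)
        if i < j:
            start = min(start, self._starts[i])
            end = max(end, self._ends[j - 1])
        # Replace the absorbed ranges (possibly none) with the merged one.
        self._starts[i:j] = [start]
        self._ends[i:j] = [end]

    def ranges(self):
        return [[s, e] for s, e in zip(self._starts, self._ends)]

    def missing_ranges(self):
        # Gaps between consecutive stored ranges.
        return [[e + 1, s - 1] for e, s in zip(self._ends, self._starts[1:])]


range_set = IntRangeSet()
for start, end in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    range_set.add_range(start, end)
print(range_set.missing_ranges())  # [[16, 19], [31, 49]]
print(range_set.ranges())          # [[0, 15], [20, 30], [50, 60]]
```

Because stored ranges are disjoint and sorted, both lists are sorted, so two binary searches are enough to find every range the new one absorbs.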
You have hit on a good solution in your example use case. Rather than try to maintain a set of the ranges that have been used, keep track of the ranges that haven't been used. This makes the problem pretty easy.
class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max
    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            # Keep the part of the gap left of lo and the part right of hi.
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps
    def missing_ranges(self):
        return self.__gaps
    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]
The magic is in the add method, which checks each existing gap to see whether it is affected by the new range, and adjusts the list of gaps accordingly.
Note that the behaviour of the tuples used for ranges here is the same as Python's range objects, i.e. they are inclusive of the start value and exclusive of the stop value. This class will not behave in exactly the way you described in your question, where your ranges seem to be inclusive of both.
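To illustrate the difference, here is a small sketch that applies the same gap-splitting step to the inclusive integer ranges from the question by converting them to half-open form (end + 1) on the way in and back (stop - 1) on the way out. The function name subtract and the domain 0..60 are assumptions made for this example.

```python
def subtract(gaps, lo, hi):
    """Remove the half-open range [lo, hi) from a list of half-open gaps."""
    new_gaps = []
    for g_lo, g_hi in gaps:
        # Keep whatever is left of the gap on either side of [lo, hi).
        for piece in ((g_lo, min(g_hi, lo)), (max(g_lo, hi), g_hi)):
            if piece[1] > piece[0]:
                new_gaps.append(piece)
    return new_gaps


gaps = [(0, 61)]  # the inclusive domain 0..60, written half-open
for start, end in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    gaps = subtract(gaps, start, end + 1)  # inclusive -> half-open

missing = [(lo, hi - 1) for lo, hi in gaps]  # half-open -> inclusive
print(missing)  # [(16, 19), (31, 49)]
```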
Have a look at portion (https://pypi.org/project/portion/). I'm the maintainer of this library, and it supports disjunctions of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.
Consider the intervals provided in your example:
>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)
>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]
>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)
Similar to DavidT's answer – also based on sympy's sets, but using a list of any length and addition (union) in a single operation:
import sympy
intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)
symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)
merged = sympy.Union(*symintervals) # one operation; adding to a union one by one is much slower for a large number of intervals
print(merged)
for i in merged.args: # assumes that the "merged" result is a union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals
Here's my solution:
def flatten(collection):
    subset = set()
    for elem in collection:
        to_add = elem
        to_remove = set()
        for s in subset:
            # s overlaps to_add if either end of to_add falls inside s,
            # or if to_add completely contains s.
            if s[0] <= to_add[0] <= s[1] or s[0] <= to_add[1] <= s[1] or (s[0] > to_add[0] and s[1] < to_add[1]):
                to_remove.add(s)
                to_add = (min(to_add[0], s[0]), max(to_add[1], s[1]))
        subset -= to_remove
        subset.add(to_add)
    return subset
range_set = {(-12, 4), (3, 20), (21, 25), (25, 30), (-13, -11), (5, 10), (-13, 20)}
print(flatten(range_set))
# {(21, 30), (-13, 20)}
