For each element of a python list, I want to check if others are in a given range corresponding to that element and then return those elements with their indices.
For example:
list = [9, 10, 15, 20, 21, 22]
range = (x-2 to x+2)
Result:
9 - range: 7-11, element:10, index:1
10 - range: 8-12, element:9, index:0
15 - range: 13-17, element:, index:
20 - range: 18-22, element:21,22, index:4,5
21 - range: 19-23, element:20,22, index:3,5
22 - range: 20-24, element:20,21, index:3,4
You can use a nested list comprehension, using enumerate to get indexes of each element and comparing all the other values with the current value +/-2:
l = [9, 10, 15, 20, 21, 22]
result = [[(v, i) for i, v in enumerate(l) if abs(v-x) < 3 and i != j] for j, x in enumerate(l)]
Output:
[
[(10, 1)],
[(9, 0)],
[],
[(21, 4), (22, 5)],
[(20, 3), (22, 5)],
[(20, 3), (21, 4)]
]
Since your list is sorted you can make this more efficient, but you can use this as a starting point.
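One way the sorted-input shortcut could look is sketched below, using the standard bisect module to find the window of candidates instead of scanning the whole list. The function name neighbours_in_range and the width parameter are made up for this illustration, and the sketch assumes the list is already sorted in ascending order.

```python
from bisect import bisect_left, bisect_right

def neighbours_in_range(values, width=2):
    """For each element of a sorted list, collect (value, index) pairs of the
    other elements that lie within [x - width, x + width]."""
    result = []
    for j, x in enumerate(values):
        lo = bisect_left(values, x - width)    # first index with value >= x - width
        hi = bisect_right(values, x + width)   # one past the last index with value <= x + width
        result.append([(values[i], i) for i in range(lo, hi) if i != j])
    return result

l = [9, 10, 15, 20, 21, 22]
print(neighbours_in_range(l))
```

Each lookup costs two binary searches plus the size of the window, so this should help mostly when the list is long and the windows are small.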
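values = [9, 10, 15, 20, 21, 22]  # named `values` to avoid shadowing the built-in `list`
for j, a in enumerate(values):
    in_range = []
    indices = []
    for i, b in enumerate(values):
        if i == j:  # skip the element itself (by position, so equal values elsewhere still count)
            continue
        if a - 2 <= b <= a + 2:
            in_range.append(b)
            indices.append(i)
    print(f'{a} - range: {a - 2}-{a + 2}, element:{in_range}, index:{indices}')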
I have three lists price1, price2 and deviation. I wanted to find the top three highest price between price1 and price2. So to achieve that I first sorted them in decreasing order then I took the first 3 elements from both the list price1 and price2, and after that, I found the max between each max. Now I want to know the original position of the max value that is obtained after all this so that I can put it in deviation. Before sorting price1 and price2:
price1 = [12,1,2,4,45,67,433,111,45,12] #Total 10 elements in all the lists
price2 = [23,12,233,11,54,232,323,12,42,4]
deviation = [23,45,56,67,78,786,45,34,2,1]
After sorting to get the top 3 prices:
print("Price1: ", sorted(price1, reverse=True)[:3])
print("Price2: ", sorted(price2, reverse=True)[:3])
output:
Price1: [433, 111, 67]
Price2: [323, 233, 232]
To get the max from it:
sortedPrice1 = sorted(price1, reverse=True)
sortedPrice2 = sorted(price2, reverse=True)
print("Price1: ", sortedPrice1[:3])
print("Price2: ", sortedPrice2[:3])
for i, j in zip(sortedPrice1[:3], sortedPrice2[:3]):
    print("max: ", max(i, j))
output:
Price1: [433, 111, 67]
Price2: [323, 233, 232]
max: 433
max: 233
max: 232
Now I want to find the positions of these max values in the original lists. For example:
433 is in `price1` at 6th position
233 is in `price2` at 2nd position
232 is in `price2` at 5th position
Ultimately I want to use these positions to look up the corresponding values in the deviation list, so:
deviation[6] = 45
deviation[2] = 56
deviation[5] = 786
No need to use list.index(), zip() is enough:
price1 = [12,1,2,4,45,67,433,111,45,12] #Total 10 elements in all the lists
price2 = [23,12,233,11,54,232,323,12,42,4]
deviation = [23,45,56,67,78,786,45,34,2,1]
for a, b, _ in zip(sorted(zip(price1, deviation), key=lambda k: k[0], reverse=True),
                   sorted(zip(price2, deviation), key=lambda k: k[0], reverse=True),
                   range(3)):  # range(3) because we want at most 3 elements
    print(max(a, b, key=lambda k: k[0]))
Prints:
(433, 45)
(233, 56)
(232, 786)
EDIT: To have sublists in price1/price2/deviation lists, you can do:
price1 = [[12, 2, 3, 4],[1, 2, 5, 56],[12,34,45,3],[23,2,3,4],[1,6,55,34]]
price2 = [[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4]]
deviation = [[10, 20, 30, 40],[10, 20, 30, 40],[10, 20, 30, 40],[10, 20, 30, 40],[10, 20, 30, 40]]
for p1, p2, dev in zip(price1, price2, deviation):
    print('price1=', p1)
    print('price2=', p2)
    print('dev=', dev)
    for a, b, _ in zip(sorted(zip(p1, dev), key=lambda k: k[0], reverse=True),
                       sorted(zip(p2, dev), key=lambda k: k[0], reverse=True),
                       range(3)):  # range(3) because we want at most 3 elements
        print(max(a, b, key=lambda k: k[0]))
    print()
Prints:
price1= [12, 2, 3, 4]
price2= [1, 2, 3, 4]
dev= [10, 20, 30, 40]
(12, 10)
(4, 40)
(3, 30)
price1= [1, 2, 5, 56]
price2= [1, 2, 3, 4]
dev= [10, 20, 30, 40]
(56, 40)
(5, 30)
(2, 20)
price1= [12, 34, 45, 3]
price2= [1, 2, 3, 4]
dev= [10, 20, 30, 40]
(45, 30)
(34, 20)
(12, 10)
price1= [23, 2, 3, 4]
price2= [1, 2, 3, 4]
dev= [10, 20, 30, 40]
(23, 10)
(4, 40)
(3, 30)
price1= [1, 6, 55, 34]
price2= [1, 2, 3, 4]
dev= [10, 20, 30, 40]
(55, 30)
(34, 40)
(6, 20)
The list method index returns the position of an element in a list. If you replace your max function with an if statement, you can produce the desired output
for i, j in zip(sortedPrice1[:3], sortedPrice2[:3]):
    if i > j:
        print("%d is in `price1` at position %d" % (i, price1.index(i)))
    else:
        print("%d is in `price2` at position %d" % (j, price2.index(j)))
Note that this code:
Doesn't deal with the case i=j
Doesn't deal with multiple members of price1/price2 which are both in the top three.
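A possible way around both limitations is to sort the positions rather than the values, so that the original index is never lost and list.index() is not needed. This is only a sketch: the helper names top_positions and top_deviations are invented here, and sending ties to price1 is an arbitrary choice.

```python
price1 = [12, 1, 2, 4, 45, 67, 433, 111, 45, 12]
price2 = [23, 12, 233, 11, 54, 232, 323, 12, 42, 4]
deviation = [23, 45, 56, 67, 78, 786, 45, 34, 2, 1]

def top_positions(prices, n=3):
    """Return the original indices of the n largest prices, largest first."""
    return sorted(range(len(prices)), key=prices.__getitem__, reverse=True)[:n]

def top_deviations(p1, p2, dev, n=3):
    """Compare the k-th largest of p1 with the k-th largest of p2 and report
    (source list, original index, price, deviation) for the larger one."""
    rows = []
    for i, j in zip(top_positions(p1, n), top_positions(p2, n)):
        if p1[i] >= p2[j]:  # ties go to price1 (arbitrary choice)
            rows.append(("price1", i, p1[i], dev[i]))
        else:
            rows.append(("price2", j, p2[j], dev[j]))
    return rows

for row in top_deviations(price1, price2, deviation):
    print(row)
```

Because each index is carried along with its value, repeated prices such as the two 45s in price1 map back to distinct positions.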
You can do it with try/except too. It has the same limitations that @Yberman states in his answer, but it needs no variables other than the lists themselves:
price1 = [12,1,2,4,45,67,433,111,45,12] #Total 10 elements in all the lists
price2 = [23,12,233,11,54,232,323,12,42,4]
deviation = [23,45,56,67,78,786,45,34,2,1]
for i, j in zip(sorted(price1, reverse=True)[:3], sorted(price2, reverse=True)[:3]):
    try:
        print(deviation[price1.index(max(i, j))])
    except ValueError:
        print(deviation[price2.index(max(i, j))])
Output:
45
56
786
First create a copy of all 3 lists such as,
price1 = [12,1,2,4,45,67,433,111,45,12] #Total 10 elements in all the lists
price2 = [23,12,233,11,54,232,323,12,42,4]
deviation = [23,45,56,67,78,786,45,34,2,1]
d_p1 = price1.copy()  # plain assignment (d_p1 = price1) would only alias the same list
d_p2 = price2.copy()
d_dev = deviation.copy()
Then perform all the operations needed to get the max elements on these duplicate lists. Once you have the max elements in a list such as
max_l = [433, 233, 232]
traverse this list to get the positions in the untouched originals:
for i in max_l:
    if i in price1:
        print("index:", price1.index(i))
    else:
        print("index:", price2.index(i))
If the positions of the elements change after some operations, this process helps you find the original positions.
You can do it in two lines of code.
First find the index.
item_index_in_original_list = price1.index(433)
Then use it in your deviation list
deviation_of_max_item = deviation[item_index_in_original_list]
To repeat this for other items you can make a function with these two lines and call it.
def find_deviation(item_value, original_list):
    item_index_in_original_list = original_list.index(item_value)
    return deviation[item_index_in_original_list]
required_deviation = find_deviation(433, price1)
I have two sorted lists of same length, e.g.:
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
I want to know whether each number of the first list lies in the interval formed by two consecutive numbers of the second list. Then I want to print the value from the first list together with the interval.
As an example:
3 in range(0,11) # True
5 in range(0,11) # True
15 in range(0,11) # False, next
15 in range(11,22) # True
and so on.
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
aux = 0
count = 1
n_iterations = len(first_list)
iteration = 0
while iteration != n_iterations:
    iteration += 1
    for i in first_list:
        counter = first_list.index(i)
        try:
            interval = range(second_list[aux], second_list[count])
            if i in interval:
                print(i, 'in', interval)
            if i not in interval:
                aux += 1
                count += 1
                interval = range(second_list[aux], second_list[count])
                if i in interval:
                    print(i, 'in', interval)
        except IndexError:
            break
The code I've tried does actually work, but it seems so inefficient and hard to read that there must be another way to do it.
edit: Since the question was updated, I updated my answer a little. I happily concede that this solution is not optimal, and takes more time than user7440787's solution. I focused more on the "hard-to-read" than "inefficient" part of the problem.
First, my answer, then some explanations.
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
for low, high in zip(second_list, second_list[1:]):
    for val in first_list:
        if low <= val <= high:
            print("{} is in {}".format(val, (low, high)))
Output:
3 is in (0, 11)
5 is in (0, 11)
15 is in (11, 22)
19 is in (11, 22)
23 is in (22, 34)
This gives you simple elegance, while still being readable. It's always a fun challenge to stick everything on to one line, but the code is much easier to handle when you break it into significant parts. Of course, there are other ways you could have done this, but this shows off quite a few niceties of python.
zip(second_list, second_list[1:]) goes through the list of pairs in second_list. The [1:] slices the list to be all but the first element of it, while the zip takes the two views of the list and yields a tuple for each pair, which then get assigned to low and high.
low <= val <= high is a logic error in C or C++ (and does not compile in C#), but is perfectly right in Python. It is a chained comparison, defined as "low <= val and val <= high", which is exactly what you're looking for in this case.
Using str.format is an easy way to output all types of variables in python.
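The three points above can be tried in isolation. The snippet below is a small sketch that reuses the second_list from the question; the names pairs, chained and explicit are introduced only for this illustration.

```python
second_list = [0, 11, 22, 34, 43]

# zip pairs each element with its successor; the slice drops the first element
pairs = list(zip(second_list, second_list[1:]))
print(pairs)

low, high = pairs[0]
val = 5
chained = low <= val <= high             # Python chains the two comparisons
explicit = low <= val and val <= high    # the equivalent spelled-out form
print(chained, explicit)

print("{} is in {}".format(val, (low, high)))
```

Note that both bounds are inclusive here, so a value equal to a boundary such as 11 would match two neighbouring intervals.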
If you also want to print each time an entry is not in the current interval, then you need:
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
regions = list(zip(second_list[:-1], second_list[1:]))
n, i = 0, 0
print_stm = '%i %sin range(%i,%i)'  # %s is filled with '' or 'not '
while n < len(regions) and i < len(first_list):
    if regions[n][0] <= first_list[i] <= regions[n][1]:
        print(print_stm % ((first_list[i], '') + regions[n]))
        i += 1
    else:
        print(print_stm % ((first_list[i], 'not ') + regions[n]))
        n += 1
The result is:
3 in range(0,11)
5 in range(0,11)
15 not in range(0,11)
15 in range(11,22)
19 in range(11,22)
23 not in range(11,22)
23 in range(22,34)
If you're only interested in knowing the intervals for each entry in first_list, you can simplify it to this:
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
regions = list(zip(second_list[:-1], second_list[1:]))  # list(): a bare zip would be exhausted after the first value
intervals = [(i, n) for i in first_list for n in regions if n[0] < i < n[1]]
The result is:
[(3, (0, 11)), (5, (0, 11)), (15, (11, 22)), (19, (11, 22)), (23, (22, 34))]
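Since both lists are sorted, a binary search per value is another option that avoids comparing every value against every interval. The following is a sketch using the standard bisect module; the function name locate is made up here, and the intervals are treated as half-open [low, high) like range(), which is an assumption about what is wanted at the boundaries.

```python
from bisect import bisect_right

def locate(values, boundaries):
    """Pair each value with the (low, high) interval that contains it,
    or with None when it falls outside all intervals."""
    found = []
    for v in values:
        k = bisect_right(boundaries, v)  # number of boundaries <= v
        if 0 < k < len(boundaries):
            found.append((v, (boundaries[k - 1], boundaries[k])))
        else:
            found.append((v, None))
    return found

first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
print(locate(first_list, second_list))
```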
Here's what I've done:
first_list = [3, 5, 15, 19, 23]
second_list = [0, 11, 22, 34, 43]
L = list(zip(second_list[:-1], second_list[1:]))
i = 0
n = 0
interval = L[i]
while n < len(first_list):
    nb = first_list[n]
    while nb not in range(interval[0], interval[1]):
        print(nb, ' is not in range', interval)
        i += 1
        if i >= len(L):
            break
        interval = L[i]
    if i >= len(L):
        break
    print(nb, ' is in range', interval)
    n += 1
The output :
3 is in range(0, 11)
5 is in range(0, 11)
15 is not in range(0, 11)
15 is in range(11, 22)
19 is in range(11, 22)
23 is not in range(11, 22)
23 is in range(22, 34)
One line solution for all possible pairs of ranges defined by the second list.
Your code seems fine but here is a shorter version:
# define all duo pairs based on `second_list`
all_pairs = [[second_list[p1], second_list[p2]] for p1 in range(len(second_list)) for p2 in range(p1+1,len(second_list))]
results = list()
results.append([[element,"is in", (all_pairs[i][0],all_pairs[i][1])] for element in first_list for i in range(len(all_pairs)) if element in range(all_pairs[i][0],all_pairs[i][1])])
If I understood correctly, the above should work.
Output:
[[[3, 'is in', (0, 11)],
[3, 'is in', (0, 22)],
[3, 'is in', (0, 34)],
[3, 'is in', (0, 43)],
[5, 'is in', (0, 11)],
[5, 'is in', (0, 22)],
[5, 'is in', (0, 34)],
[5, 'is in', (0, 43)],
[15, 'is in', (0, 22)],
[15, 'is in', (0, 34)],
[15, 'is in', (0, 43)],
[15, 'is in', (11, 22)],
[15, 'is in', (11, 34)],
[15, 'is in', (11, 43)],
[19, 'is in', (0, 22)],
[19, 'is in', (0, 34)],
[19, 'is in', (0, 43)],
[19, 'is in', (11, 22)],
[19, 'is in', (11, 34)],
[19, 'is in', (11, 43)],
[23, 'is in', (0, 34)],
[23, 'is in', (0, 43)],
[23, 'is in', (11, 34)],
[23, 'is in', (11, 43)],
[23, 'is in', (22, 34)],
[23, 'is in', (22, 43)]]]
I'd like to identify groups of consecutive numbers in a list, so that:
myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
Returns:
[(2,5), (12,17), 20]
And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).
Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.
EDIT 2: To answer the OP new requirement
ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])
Output:
[xrange(2, 5), xrange(12, 17), 20]
You can replace xrange with range or any other custom class.
Python docs have a very neat recipe for this:
from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    print(map(itemgetter(1), g))
Output:
[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]
If you want to get the exact same output, you can do this:
ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    group = map(itemgetter(1), g)
    ranges.append((group[0], group[-1]))
output:
[(2, 5), (12, 17)]
EDIT: The example is already explained in the documentation but maybe I should explain it more:
"The key to the solution is differencing with a range so that consecutive numbers all appear in same group."
If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent to the following:
groupby(
[(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
(5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
lambda (i,x):i-x
)
The lambda function subtracts the element value from the element index. Applying the lambda to each item gives the following keys for groupby:
[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.
I hope this makes it more readable.
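The keys can also be checked directly. The snippet below is a Python 3 sketch of the same idea (Python 3 no longer accepts the tuple-unpacking lambda, so the pair is indexed instead); the names keys and groups are introduced only for this illustration.

```python
from itertools import groupby

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]

# index minus value is constant inside a run of consecutive numbers
keys = [i - x for i, x in enumerate(data)]
print(keys)

# groupby splits the enumerated pairs wherever that key changes
groups = [[x for _, x in g]
          for _, g in groupby(enumerate(data), lambda pair: pair[0] - pair[1])]
print(groups)
```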
A Python 3 version may be helpful for beginners. Import the required libraries first:
from itertools import groupby
from operator import itemgetter
ranges = []
for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
more_itertools.consecutive_groups was added in version 4.0.
Demo
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
Code
Applying this tool, we make a generator function that finds ranges of consecutive numbers.
def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).
Note: more_itertools is a third-party package installable via pip install more_itertools.
The "naive" solution, which I find somewhat readable at least.
x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]
def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last:  # Part of the group, bump the end
            last = n
        else:  # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last  # Yield the last group
>>> print(list(group(x)))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
Assuming your list is sorted:
>>> from itertools import groupby
>>> def ranges(lst):
...     pos = (j - i for i, j in enumerate(lst))
...     t = 0
...     for i, els in groupby(pos):
...         l = len(list(els))
...         el = lst[t]
...         t += l
...         yield range(el, el+l)
...
>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
Here is something that should work, without any imports:
def myfunc(lst):
    ret = []
    a = b = lst[0]  # a and b are range's bounds
    for el in lst[1:]:
        if el == b+1:
            b = el  # range grows
        else:  # range ended
            ret.append(a if a==b else (a,b))  # is a single or a range?
            a = b = el  # let's start again with a single
    ret.append(a if a==b else (a,b))  # corner case for last single/range
    return ret
Please note that the code using groupby doesn't work as given in Python 3 so use this.
for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
This doesn't use a standard function; it just iterates over the input, but it should work:
def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
                if q > p:
                    r.append('%s-%s' % (p, q))
                else:
                    r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)
Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.
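The omitted validation could look roughly like the sketch below. The function name validate is invented for this illustration, and reading "ascending" as strictly ascending (no repeated values) is an assumption.

```python
def validate(numbers):
    """Raise ValueError unless numbers is a list of positive integers
    in strictly ascending order."""
    for n in numbers:
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(n, int) or isinstance(n, bool) or n <= 0:
            raise ValueError("expected positive integers, got %r" % (n,))
    for previous, current in zip(numbers, numbers[1:]):
        if current <= previous:
            raise ValueError("input must be ascending: %r follows %r" % (current, previous))

validate([2, 3, 4, 5, 12, 13, 20])  # passes silently
```

Calling it at the top of myfunc would turn silently wrong output into an explicit error.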
import numpy as np
myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((int(np.min(s)), int(np.max(s))))  # int() so plain numbers are printed, not NumPy scalars
    else:
        l.append(int(s[0]))
print(l)
Output:
[(2, 5), (12, 17), 20]
I think this way is simpler than any of the answers I've seen here (Edit: fixed based on comment from Pleastry):
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
starts = [x for x in data if x-1 not in data and x+1 in data]
ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
singles = [x for x in data if x-1 not in data and x+1 not in data]
list(zip(starts, ends)) + singles
Output:
[(2, 5), (12, 17), 20]
Edited:
As @dawg notes, this is O(n**2). One option to improve performance is to use a set for the membership tests, while still iterating over the sorted list so that starts and ends stay in matching order:
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
data_as_set = set(data)
starts = [x for x in data if x-1 not in data_as_set and x+1 in data_as_set]
startset = set(starts)
ends = [x for x in data if x-1 in data_as_set and x+1 not in data_as_set and x not in startset]
singles = [x for x in data if x-1 not in data_as_set and x+1 not in data_as_set]
print(list(zip(starts, ends)) + singles)
Using groupby and count from itertools gives us a short solution. The idea is that, in an increasing sequence, the difference between the index and the value will remain the same.
To keep track of the index, we can use itertools.count, which makes the code cleaner than using enumerate:
from itertools import groupby, count
def intervals(data):
    out = []
    counter = count()
    for key, group in groupby(data, key=lambda x: x - next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out
Some sample output:
print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]
print(intervals([2, 3, 4, 5]))
# [[2, 5]]
This is my method in which I tried to prioritize readability. Note that it returns a tuple of the same values if there is only one value in a group. That can be fixed easily in the second snippet I'll post.
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    yield first, last  # this is needed to yield the last set of numbers
Here is the result of a test:
values = [0, 5, 6, 7, 12, 13, 21, 22, 23, 24, 25, 26, 30, 44, 45, 50]
result = list(group(values))
print(result)
result = [(0, 0), (5, 7), (12, 13), (21, 26), (30, 30), (44, 45), (50, 50)]
If you want to return only a single value in the case of a single value in a group, just add a conditional check to the yields:
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            if first == last:
                yield first
            else:
                yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    if first == last:
        yield first
    else:
        yield first, last
result = [0, (5, 7), (12, 13), (21, 26), 30, (44, 45), 50]
Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.
First a quick helper function:
def getpreviousitem(mylist, myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item
            return mylist[position-1]
And then the actual code:
def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist, item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:
                # We've found a new range.
                newrange = [item-1, item]
                # Update to show we are now in a range
                inrange = True
        else:
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range
            inrange = False
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist
Example run:
getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
returns:
[[2, 5], [12, 17]]
Using numpy and a list comprehension:
With numpy's diff function, consecutive entries of the input vector whose difference is not equal to one can be identified. The start and the end of the input vector need to be added separately.
import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
d = [i for i, df in enumerate(np.diff(data)) if df!= 1]
d = np.hstack([-1, d, len(data)-1]) # add first and last elements
d = np.vstack([d[:-1]+1, d[1:]]).T
print(data[d])
Output:
[[ 2 5]
[12 17]
[20 20]]
Note: The request that individual numbers should be treated differently, (returned as individual, not ranges) was omitted. This can be reached by further post-processing the results. Usually this will make things more complex without gaining any benefit.
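For completeness, that post-processing step might be sketched as follows in plain Python. The helper name collapse_singletons is made up here; for the NumPy result above, one could presumably pass data[d].tolist() as the list of pairs.

```python
def collapse_singletons(pairs):
    """Turn (start, end) pairs with start == end into bare numbers
    and leave real ranges as tuples."""
    return [start if start == end else (start, end) for start, end in pairs]

pairs = [[2, 5], [12, 17], [20, 20]]  # same shape as the rows printed above
print(collapse_singletons(pairs))
```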
One-liner in Python 2.7 if interested:
x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]
d = iter(x[:1] + sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1+1), []))
print zip(d, d)
>>> [(2, 3), (6, 8), (14, 15), (19, 21)]
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
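Put together, the half-open variant might look like the sketch below. The name ranges_half_open is invented here to keep it apart from the original function; everything else follows the code above.

```python
def ranges_half_open(nums):
    """Like ranges(), but each pair is (start, end) with end exclusive,
    so it can be passed straight to range(start, end)."""
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    # zip(edges, edges) consumes the iterator two items at a time
    return [(s, e + 1) for s, e in zip(edges, edges)]

for start, end in ranges_half_open([2, 3, 4, 7, 8, 9, 15]):
    print(list(range(start, end)))
```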
I copied this answer over from another question that was marked as a duplicate of this one with the intention to make it easier findable (after I just now searched again for this topic, finding only the question here at first and not being satisfied with the answers given).
The versions by Mark Byers, Andrea Ambu, SilentGhost, Nadia Alramli, and truppo are simple and fast. The 'truppo' version encouraged me to write a version that retains the same nimble behavior while handling step sizes other than 1 (and lists as singletons elements that don't extend more than 1 step with a given step size). It is given here.
>>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
[(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]
Not the best approach, but here are my 2 cents:
def getConsecutiveValues2(arr):
    final = []
    start = 0
    for i in range(1, len(arr)):
        if arr[i] - arr[i-1] != 1:  # gap found: close the current run
            final.append(arr[start:i])
            start = i
    if arr:
        final.append(arr[start:])  # close the last run
    return final
x = [1,2,3,5,6,8,9,10,11,12]
print(getConsecutiveValues2(x))
>> [[1, 2, 3], [5, 6], [8, 9, 10, 11, 12]]
This implementation works for regular or irregular steps
I needed to achieve the same thing, with the slight difference that the steps can be irregular. This is my implementation:
def ranges(l):
    if not len(l):
        return range(0,0)
    elif len(l)==1:
        return range(l[0],l[0]+1)
    # get steps
    sl = sorted(l)
    steps = [i-j for i,j in zip(sl[1:],sl[:-1])]
    # get unique steps indexes range
    groups = [[0,0,steps[0]],]
    for i,s in enumerate(steps):
        if s==groups[-1][-1]:
            groups[-1][1] = i+1
        else:
            groups.append( [i+1,i+1,s] )
            g2 = groups[-2]
            if g2[0]==g2[1]:
                if sl[i+1]-sl[i]==s:
                    _=groups.pop(-2)
                    groups[-1][0] = i
    # create list of ranges
    return [range(sl[i],sl[j]+s,s) if s!=0 else [sl[i]]*(j+1-i) for i,j,s in groups]
Here's an example
from timeit import timeit
# for regular ranges
l = list(range(1000000))
ranges(l)
>>> [range(0, 1000000)]
l = list(range(10)) + list(range(20,25)) + [1,2,3]
ranges(l)
>>> [range(0, 2), range(1, 3), range(2, 4), range(3, 10), range(20, 25)]
sorted(l);[list(i) for i in ranges(l)]
>>> [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24]
>>> [[0, 1], [1, 2], [2, 3], [3, 4, 5, 6, 7, 8, 9], [20, 21, 22, 23, 24]]
# for irregular steps list
l = [1, 3, 5, 7, 10, 11, 12, 100, 200, 300, 400, 60, 99, 4000,4001]
ranges(l)
>>> [range(1, 9, 2), range(10, 13), range(60, 138, 39), range(100, 500, 100), range(4000, 4002)]
## Speed test
timeit("ranges(l)","from __main__ import ranges,l", number=1000)/1000
>>> 9.303160999934334e-06
Yet another solution if you expect your input to be a set:
def group_years(years):
    consecutive_years = []
    for year in sorted(years):  # sorted: set iteration order is not guaranteed
        close = {y for y in years if abs(y - year) == 1}
        for group in consecutive_years:
            if len(close.intersection(group)):
                group |= {year, *close}
                break
        else:
            consecutive_years.append({year, *close})
    return consecutive_years
Example:
group_years({2016, 2017, 2019, 2020, 2022})
Out[54]: [{2016, 2017}, {2019, 2020}, {2022}]
I'd like to identify groups of consecutive numbers in a list, so that:
myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
Returns:
[(2,5), (12,17), 20]
And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).
Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.
EDIT 2: To answer the OP new requirement
ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
group = map(itemgetter(1), group)
if len(group) > 1:
ranges.append(xrange(group[0], group[-1]))
else:
ranges.append(group[0])
Output:
[xrange(2, 5), xrange(12, 17), 20]
You can replace xrange with range or any other custom class.
Python docs have a very neat recipe for this:
from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
print(map(itemgetter(1), g))
Output:
[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]
If you want to get the exact same output, you can do this:
ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
group = map(itemgetter(1), g)
ranges.append((group[0], group[-1]))
output:
[(2, 5), (12, 17)]
EDIT: The example is already explained in the documentation but maybe I should explain it more:
The key to the solution is
differencing with a range so that
consecutive numbers all appear in same
group.
If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent of the following:
groupby(
[(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
(5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
lambda (i,x):i-x
)
The lambda function subtracts the element index from the element value. So when you apply the lambda on each item. You'll get the following keys for groupby:
[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.
I hope this makes it more readable.
python 3 version may be helpful for beginners
import the libraries required first
from itertools import groupby
from operator import itemgetter
ranges =[]
for k,g in groupby(enumerate(data),lambda x:x[0]-x[1]):
group = (map(itemgetter(1),g))
group = list(map(int,group))
ranges.append((group[0],group[-1]))
more_itertools.consecutive_groups was added in version 4.0.
Demo
import more_itertools as mit
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
Code
Applying this tool, we make a generator function that finds ranges of consecutive numbers.
def find_ranges(iterable):
"""Yield range of consecutive numbers."""
for group in mit.consecutive_groups(iterable):
group = list(group)
if len(group) == 1:
yield group[0]
else:
yield group[0], group[-1]
iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
The source implementation emulates a classic recipe (as demonstrated by #Nadia Alramli).
Note: more_itertools is a third-party package installable via pip install more_itertools.
The "naive" solution which I find somewhat readable atleast.
x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]
def group(L):
first = last = L[0]
for n in L[1:]:
if n - 1 == last: # Part of the group, bump the end
last = n
else: # Not part of the group, yield current group and start a new
yield first, last
first = last = n
yield first, last # Yield the last group
>>>print list(group(x))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
Assuming your list is sorted:
>>> from itertools import groupby
>>> def ranges(lst):
pos = (j - i for i, j in enumerate(lst))
t = 0
for i, els in groupby(pos):
l = len(list(els))
el = lst[t]
t += l
yield range(el, el+l)
>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
Here it is something that should work, without any import needed:
def myfunc(lst):
ret = []
a = b = lst[0] # a and b are range's bounds
for el in lst[1:]:
if el == b+1:
b = el # range grows
else: # range ended
ret.append(a if a==b else (a,b)) # is a single or a range?
a = b = el # let's start again with a single
ret.append(a if a==b else (a,b)) # corner case for last single/range
return ret
Please note that the code using groupby doesn't work as given in Python 3 so use this.
for k, g in groupby(enumerate(data), lambda x:x[0]-x[1]):
group = list(map(itemgetter(1), g))
ranges.append((group[0], group[-1]))
This doesn't use a standard function - it just iiterates over the input, but it should work:
def myfunc(l):
r = []
p = q = None
for x in l + [-1]:
if x - 1 == q:
q += 1
else:
if p:
if q > p:
r.append('%s-%s' % (p, q))
else:
r.append(str(p))
p = q = x
return '(%s)' % ', '.join(r)
Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.
import numpy as np
myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
if len(s) > 1:
l.append((np.min(s), np.max(s)))
else:
l.append(s[0])
print(l)
Output:
[(2, 5), (12, 17), 20]
I think this way is simpler than any of the answers I've seen here (Edit: fixed based on comment from Pleastry):
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
starts = [x for x in data if x-1 not in data and x+1 in data]
ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
singles = [x for x in data if x-1 not in data and x+1 not in data]
list(zip(starts, ends)) + singles
Output:
[(2, 5), (12, 17), 20]
Edited:
As @dawg notes, this is O(n**2). One option to improve performance would be to convert the original list to a set (and also the starts list to a set) and use those for the membership tests:
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
data_as_set = set(data)
# Iterate over the list (sets have no guaranteed order, which could mismatch
# starts and ends); use the sets only for fast membership tests.
starts = [x for x in data if x-1 not in data_as_set and x+1 in data_as_set]
startset = set(starts)
ends = [x for x in data if x-1 in data_as_set and x+1 not in data_as_set and x not in startset]
singles = [x for x in data if x-1 not in data_as_set and x+1 not in data_as_set]
print(list(zip(starts, ends)) + singles)
Using groupby and count from itertools gives us a short solution. The idea is that, within a run of consecutive integers, the difference between the value and its index stays the same.
To keep track of the index, we can use itertools.count, which makes the code cleaner than using enumerate:
from itertools import groupby, count
def intervals(data):
    out = []
    counter = count()
    for key, group in groupby(data, key = lambda x: x-next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out
Some sample output:
print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]
print(intervals([2, 3, 4, 5]))
# [[2, 5]]
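To make the grouping key concrete, this small sketch prints the value-minus-index keys that groupby sees for the first sample input. The names data and keys are just for illustration.

```python
from itertools import count

data = [0, 1, 3, 4, 6]
counter = count()
# value minus running index: constant within each run of consecutive integers
keys = [x - next(counter) for x in data]
print(keys)  # [0, 0, 1, 1, 2]
```

Each distinct key corresponds to one interval, which is why groupby splits the input into [0, 1], [3, 4] and [6].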
This is my method in which I tried to prioritize readability. Note that it returns a tuple of the same values if there is only one value in a group. That can be fixed easily in the second snippet I'll post.
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    yield first, last  # this is needed to yield the last set of numbers
Here is the result of a test:
values = [0, 5, 6, 7, 12, 13, 21, 22, 23, 24, 25, 26, 30, 44, 45, 50]
result = list(group(values))
print(result)
result = [(0, 0), (5, 7), (12, 13), (21, 26), (30, 30), (44, 45), (50, 50)]
If you want to return only a single value in the case of a single value in a group, just add a conditional check to the yields:
def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""
    values = sorted(values)
    first = last = values[0]
    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            if first == last:
                yield first
            else:
                yield first, last
            first = index  # update first only if in a new group
        last = index  # update last on every iteration
    if first == last:
        yield first
    else:
        yield first, last
result = [0, (5, 7), (12, 13), (21, 26), 30, (44, 45), 50]
Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.
First a quick helper function:
def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item
            return mylist[position-1]
And then the actual code:
def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:
                # We've found a new range.
                newrange = [item-1,item]
                # Update to show we are now in a range
                inrange = True
        else:
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range
            inrange = False
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist
Example run:
getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
returns:
[[2, 5], [12, 17]]
Using numpy + list comprehensions:
NumPy's diff function identifies consecutive entries of the input vector whose difference is not equal to one. The start and end of the input vector need to be handled separately.
import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
d = [i for i, df in enumerate(np.diff(data)) if df!= 1]
d = np.hstack([-1, d, len(data)-1]) # add first and last elements
d = np.vstack([d[:-1]+1, d[1:]]).T
print(data[d])
Output:
[[ 2 5]
[12 17]
[20 20]]
Note: The request that individual numbers be treated differently (returned as single values, not ranges) was omitted. This can be achieved by post-processing the results, although it usually makes things more complex without much benefit.
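One way the post-processing might look, as a sketch only: it works on a plain list of [start, end] pairs (for example what data[d].tolist() would give) and the helper name collapse_singletons is made up for this illustration.

```python
def collapse_singletons(pairs):
    """Turn [start, end] pairs into tuples, or a bare value when start == end."""
    return [start if start == end else (start, end) for start, end in pairs]


# Plain-list form of the array printed above
pairs = [[2, 5], [12, 17], [20, 20]]
print(collapse_singletons(pairs))  # [(2, 5), (12, 17), 20]
```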
One-liner in Python 2.7 if interested:
x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]
d = iter(x[:1] + sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1+1), []))
print zip(d, d)
>>> [(2, 3), (6, 8), (14, 15), (19, 21)]
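The one-liner above uses the Python 2 print statement and relies on zip returning a list. A possible Python 3 port, offered as a sketch rather than the original author's code (the intermediate names breaks and result are introduced here only for readability):

```python
x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]
# Collect the values on either side of each gap; the wrap-around pair
# (last element, first element) supplies the end of the final range.
breaks = sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1 + 1), [])
d = iter(x[:1] + breaks)
result = list(zip(d, d))  # pair up consecutive edges as (start, end)
print(result)  # [(2, 3), (6, 8), (14, 15), (19, 21)]
```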
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
Example:
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]
>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]
>>> ranges(range(100))
[(0, 99)]
>>> ranges([0])
[(0, 0)]
>>> ranges([])
[]
This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).
Note that it could easily be modified to return "traditional" half-open ranges [start, end), e.g. by altering the return statement:
return [(s, e+1) for s, e in zip(edges, edges)]
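Put together, the half-open variant might look like this. The function name half_open_ranges is chosen here only to keep it apart from ranges above; the body is otherwise the same code with the altered return statement.

```python
def half_open_ranges(nums):
    """Variant of ranges() returning (start, end) pairs with end exclusive."""
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return [(s, e + 1) for s, e in zip(edges, edges)]


pairs = half_open_ranges([2, 3, 4, 7, 8, 9, 15])
print(pairs)  # [(2, 5), (7, 10), (15, 16)]
# Each pair can be passed straight to range() to rebuild the original values
print([list(range(s, e)) for s, e in pairs])  # [[2, 3, 4], [7, 8, 9], [15]]
```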
I copied this answer over from another question that was marked as a duplicate of this one, to make it easier to find (when I searched for this topic again just now, I found only this question at first and was not satisfied with the answers given).
The versions by Mark Byers, Andrea Ambu, SilentGhost, Nadia Alramli, and truppo are simple and fast. The 'truppo' version encouraged me to write a version that retains the same nimble behavior while handling step sizes other than 1 (and that lists, as singletons, elements that don't extend more than one step at a given step size). It is given here.
>>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
[(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]
Not the best approach, but here are my 2 cents:
def getConsecutiveValues2(arr):
    final = []
    end = 0
    start = 0
    for i in range(1,len(arr)) :
        if arr[i] - arr[i-1] == 1 :
            end = i
        else :
            final.append(arr[start:end+1])
            # a new group starts (and, so far, also ends) at i
            start = end = i
        if i == len(arr) - 1 :
            final.append(arr[start:end+1])
    return final
x = [1,2,3,5,6,8,9,10,11,12]
print(getConsecutiveValues2(x))
>> [[1, 2, 3], [5, 6], [8, 9, 10, 11, 12]]
This implementation works for regular or irregular steps.
I needed to achieve the same thing, with the slight difference that steps can be irregular. This is my implementation:
def ranges(l):
    if not len(l):
        return range(0,0)
    elif len(l)==1:
        return range(l[0],l[0]+1)
    # get steps
    sl = sorted(l)
    steps = [i-j for i,j in zip(sl[1:],sl[:-1])]
    # get unique steps indexes range
    groups = [[0,0,steps[0]],]
    for i,s in enumerate(steps):
        if s==groups[-1][-1]:
            groups[-1][1] = i+1
        else:
            groups.append( [i+1,i+1,s] )
            g2 = groups[-2]
            if g2[0]==g2[1]:
                if sl[i+1]-sl[i]==s:
                    _=groups.pop(-2)
                    groups[-1][0] = i
    # create list of ranges
    return [range(sl[i],sl[j]+s,s) if s!=0 else [sl[i]]*(j+1-i) for i,j,s in groups]
Here's an example
from timeit import timeit
# for regular ranges
l = list(range(1000000))
ranges(l)
>>> [range(0, 1000000)]
l = list(range(10)) + list(range(20,25)) + [1,2,3]
ranges(l)
>>> [range(0, 2), range(1, 3), range(2, 4), range(3, 10), range(20, 25)]
sorted(l);[list(i) for i in ranges(l)]
>>> [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24]
>>> [[0, 1], [1, 2], [2, 3], [3, 4, 5, 6, 7, 8, 9], [20, 21, 22, 23, 24]]
# for irregular steps list
l = [1, 3, 5, 7, 10, 11, 12, 100, 200, 300, 400, 60, 99, 4000,4001]
ranges(l)
>>> [range(1, 9, 2), range(10, 13), range(60, 138, 39), range(100, 500, 100), range(4000, 4002)]
## Speed test
timeit("ranges(l)","from __main__ import ranges,l", number=1000)/1000
>>> 9.303160999934334e-06
Yet another solution if you expect your input to be a set:
def group_years(years):
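    consecutive_years = []
    # iterate in sorted order: set iteration order is not guaranteed, and an
    # unsorted pass can leave a year out of its group or split a group in two
    for year in sorted(years):
        close = {y for y in years if abs(y - year) == 1}
        for group in consecutive_years:
            if len(close.intersection(group)):
                group |= close
                break
        else:
            consecutive_years.append({year, *close})
    return consecutive_years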
Example:
group_years({2016, 2017, 2019, 2020, 2022})
Out[54]: [{2016, 2017}, {2019, 2020}, {2022}]