Python split list into n chunks - python

I know this question has been covered many times but my requirement is different.
I have a list like: range(1, 26). I want to divide this list into a fixed number n. Assuming n = 6.
>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
>>> l = [ x [i:i + 6] for i in range(0, len(x), 6) ]
>>> l
[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]
As you can see, I didn't get 6 chunks (six sublists containing the elements of the original list). How do I divide a list so that I get exactly n chunks, which may be even or uneven in size?

Use numpy
>>> import numpy
>>> x = range(25)
>>> l = numpy.array_split(numpy.array(x),6)
or
>>> import numpy
>>> x = numpy.arange(25)
>>> l = numpy.array_split(x, 6)
You can also use numpy.split, but that one raises an error if the length is not exactly divisible.

The solution(s) below have many advantages:
Uses generator to yield the result.
No imports.
Lists are balanced (you never end up with 4 lists of size 4 and one list of size 1 if you split a list of length 17 into 5).
def chunks(l, n):
    """Yield n number of striped chunks from l."""
    for i in range(0, n):
        yield l[i::n]
The code above produces the output below for l = list(range(16)) and n = 6:
[0, 6, 12]
[1, 7, 13]
[2, 8, 14]
[3, 9, 15]
[4, 10]
[5, 11]
If you need the chunks to be sequential instead of striped use this:
def chunks(l, n):
    """Yield n number of sequential chunks from l."""
    d, r = divmod(len(l), n)
    for i in range(n):
        si = (d+1)*(i if i < r else r) + d*(0 if i < r else i - r)
        yield l[si:si+(d+1 if i < r else d)]
Which for l = list(range(16)) and n = 6 produces:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10, 11]
[12, 13]
[14, 15]
See this stackoverflow link for more information on the advantages of generators.
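To see how the divmod arithmetic balances the chunks, here is a small sketch that only computes the chunk sizes. The helper name `chunk_sizes` is hypothetical and not part of the answer above; it just isolates the `d, r = divmod(...)` step.

```python
def chunk_sizes(total, n):
    """Return the sizes of n balanced, sequential chunks for `total` items."""
    d, r = divmod(total, n)
    # The first r chunks each get one extra item.
    return [d + 1 if i < r else d for i in range(n)]

print(chunk_sizes(16, 6))  # [3, 3, 3, 3, 2, 2]
print(chunk_sizes(25, 6))  # [5, 4, 4, 4, 4, 4]
```

For the list in the question (25 items, n = 6) this gives one chunk of 5 and five chunks of 4.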

If order doesn't matter:
def chunker_list(seq, size):
    return (seq[i::size] for i in range(size))
print(list(chunker_list([1, 2, 3, 4, 5], 2)))
# [[1, 3, 5], [2, 4]]
print(list(chunker_list([1, 2, 3, 4, 5], 3)))
# [[1, 4], [2, 5], [3]]
print(list(chunker_list([1, 2, 3, 4, 5], 4)))
# [[1, 5], [2], [3], [4]]
print(list(chunker_list([1, 2, 3, 4, 5], 5)))
# [[1], [2], [3], [4], [5]]
print(list(chunker_list([1, 2, 3, 4, 5], 6)))
# [[1], [2], [3], [4], [5], []]

more_itertools.divide is one approach to solve this problem:
import more_itertools as mit
iterable = range(1, 26)
[list(c) for c in mit.divide(6, iterable)]
Output
[[ 1, 2, 3, 4, 5], # remaining item
[ 6, 7, 8, 9],
[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]
As shown, if the iterable is not evenly divisible, the remaining items are distributed from the first to the last chunk.
See more about the more_itertools library here.

My answer is to simply use Python's built-in slice:
# Assume x is the list we wish to slice
x = list(range(1, 26))
# Slice it into chunks of 6 items each (for 25 items this yields 5 chunks, not 6)
result = []
for i in range(0, len(x), 6):
    slice_item = slice(i, i + 6, 1)
    result.append(x[slice_item])
# Result would be equal to
# [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12],
#  [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]

I came up with the following solution:
l = [x[i::n] for i in range(n)]
For example:
n = 6
x = list(range(26))
l = [x[i::n] for i in range(n)]
print(l)
Output:
[[0, 6, 12, 18, 24], [1, 7, 13, 19, 25], [2, 8, 14, 20], [3, 9, 15, 21], [4, 10, 16, 22], [5, 11, 17, 23]]
As you can see, the output consists of n chunks, which have roughly the same number of elements.
How does it work?
The trick is to use the list slice step (the number after the two colons) and to increment the offset of the stepped slicing. First, it takes every n-th element starting from the first, then every n-th element starting from the second, and so on. This completes the task.
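A small sketch of what the stepped slice does, spelled out with explicit indices. The variable names `striped` and `explicit` are just for illustration.

```python
x = list(range(26))
n = 6

for offset in range(n):
    striped = x[offset::n]
    # The same chunk written with explicit indices: offset, offset + n, offset + 2n, ...
    explicit = [x[j] for j in range(offset, len(x), n)]
    assert striped == explicit

print(x[0::n])  # [0, 6, 12, 18, 24]
print(x[2::n])  # [2, 8, 14, 20]
```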

Try this:
from __future__ import division
import math
def chunked(iterable, n):
    """ Split iterable into ``n`` iterables of similar size
    Examples::
        >>> l = [1, 2, 3, 4]
        >>> list(chunked(l, 4))
        [[1], [2], [3], [4]]
        >>> l = [1, 2, 3]
        >>> list(chunked(l, 4))
        [[1], [2], [3], []]
        >>> l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        >>> list(chunked(l, 4))
        [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
    """
    chunksize = int(math.ceil(len(iterable) / n))
    return (iterable[i * chunksize:i * chunksize + chunksize]
            for i in range(n))
It returns an iterator instead of a list for efficiency (I'm assuming you want to loop over the chunks), but you can replace that with a list comprehension if you want. When the number of items is not divisible by number of chunks, the last chunk is smaller than the others.
EDIT: Fixed second example to show that it doesn't handle one edge case
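To make the edge case concrete, here is a rough comparison of the ceil-based split with a divmod-based one. Both helper names (`ceil_chunks`, `balanced_chunks`) are made up for this sketch; `ceil_chunks` mirrors the logic of `chunked` above but returns a list.

```python
import math

def ceil_chunks(seq, n):
    """Ceil-based split, mirroring the `chunked` function above."""
    size = math.ceil(len(seq) / n)
    return [seq[i * size:(i + 1) * size] for i in range(n)]

def balanced_chunks(seq, n):
    """divmod-based split: chunk sizes differ by at most one."""
    d, r = divmod(len(seq), n)
    out, start = [], 0
    for i in range(n):
        end = start + d + (1 if i < r else 0)
        out.append(seq[start:end])
        start = end
    return out

items = [1, 2, 3, 4, 5]
print(ceil_chunks(items, 4))      # [[1, 2], [3, 4], [5], []]
print(balanced_chunks(items, 4))  # [[1, 2], [3], [4], [5]]
```

With 5 items and 4 chunks the ceil-based version leaves the last chunk empty, while the divmod-based one fills every chunk.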

Here are my 2 cents. Note that this splits by chunk size, not by number of chunks:
from math import ceil
size = 3
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
chunks = [
    seq[i * size:(i * size) + size]
    for i in range(ceil(len(seq) / size))
]
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11]]

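Hint:
x is the string to be split.
k is the number of chunks.
n = len(x) // k  # integer division, so n can be used as a slice step
[x[i:i+n] for i in range(0, len(x), n)]
Note that this gives exactly k chunks only when len(x) is divisible by k; otherwise you get more than k chunks.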

One way would be to make the last list uneven and the rest even. This can be done as follows:
>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
>>> m = len(x) // 6
>>> test = [x[i:i+m] for i in range(0, len(x), m)]
>>> test[-2:] = [test[-2] + test[-1]]
>>> test
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24, 25]]

Assuming you want to divide into n chunks:
n = 6
num = float(len(x))/n
l = [ x [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(x[(n-1)*int(num):])
This method simply divides the length of the list by the number of chunks and, in case the length is not a multiple of the number, adds the extra elements in the last list.
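A quick sketch of how uneven the last chunk can get with this approach. The wrapper name `split_remainder_last` is hypothetical, and it assumes len(x) >= n (otherwise the chunk size is 0 and `range` raises a ValueError).

```python
def split_remainder_last(x, n):
    """Split x into n chunks, putting all leftover items in the last chunk."""
    size = len(x) // n  # assumes len(x) >= n, so size is at least 1
    chunks = [x[i:i + size] for i in range(0, (n - 1) * size, size)]
    chunks.append(x[(n - 1) * size:])
    return chunks

print([len(c) for c in split_remainder_last(list(range(25)), 6)])  # [4, 4, 4, 4, 4, 5]
print([len(c) for c in split_remainder_last(list(range(11)), 6)])  # [1, 1, 1, 1, 1, 6]
```

For 25 items the result is nearly balanced, but for 11 items the last chunk holds more than half of the list.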

If you want to have the chunks as evenly sized as possible:
from typing import List, Tuple
def chunk_ranges(items: int, chunks: int) -> List[Tuple[int, int]]:
    """
    Split the items by best effort into equally-sized chunks.
    If there are fewer items than chunks, each chunk contains an item and
    there are fewer returned chunk indices than the argument `chunks`.
    :param items: number of items in the batch.
    :param chunks: number of chunks
    :return: list of (chunk begin inclusive, chunk end exclusive)
    """
    assert chunks > 0, \
        "Unexpected non-positive chunk count: {}".format(chunks)
    result = []  # type: List[Tuple[int, int]]
    if items <= chunks:
        for i in range(0, items):
            result.append((i, i + 1))
        return result
    chunk_size, extras = divmod(items, chunks)
    start = 0
    for i in range(0, chunks):
        if i < extras:
            end = start + chunk_size + 1
        else:
            end = start + chunk_size
        result.append((start, end))
        start = end
    return result
Test case:
def test_chunk_ranges(self):
    self.assertListEqual(chunk_ranges(items=8, chunks=1),
                         [(0, 8)])
    self.assertListEqual(chunk_ranges(items=8, chunks=2),
                         [(0, 4), (4, 8)])
    self.assertListEqual(chunk_ranges(items=8, chunks=3),
                         [(0, 3), (3, 6), (6, 8)])
    self.assertListEqual(chunk_ranges(items=8, chunks=5),
                         [(0, 2), (2, 4), (4, 6), (6, 7), (7, 8)])
    self.assertListEqual(chunk_ranges(items=8, chunks=6),
                         [(0, 2), (2, 4), (4, 5), (5, 6), (6, 7), (7, 8)])
    self.assertListEqual(
        chunk_ranges(items=8, chunks=7),
        [(0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)])
    self.assertListEqual(
        chunk_ranges(items=8, chunks=9),
        [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)])

If your list contains elements of different types, or iterables that hold values of different types (for example, some elements are integers and some are strings), splitting it with numpy's array_split function gives you chunks whose elements have all been converted to a single type:
import numpy as np
data1 = [(1, 2), ('a', 'b'), (3, 4), (5, 6), ('c', 'd'), ('e', 'f')]
chunks = np.array_split(data1, 3)
print(chunks)
# [array([['1', '2'],
# ['a', 'b']], dtype='<U11'), array([['3', '4'],
# ['5', '6']], dtype='<U11'), array([['c', 'd'],
# ['e', 'f']], dtype='<U11')]
data2 = [1, 2, 'a', 'b', 3, 4, 5, 6, 'c', 'd', 'e', 'f']
chunks = np.array_split(data2, 3)
print(chunks)
# [array(['1', '2', 'a', 'b'], dtype='<U11'), array(['3', '4', '5', '6'], dtype='<U11'),
# array(['c', 'd', 'e', 'f'], dtype='<U11')]
If you want the chunks to keep the original element types, you can modify the source code of numpy's array_split function or use this implementation:
from itertools import accumulate
def list_split(input_list, num_of_chunks):
    n_total = len(input_list)
    n_each_chunk, extras = divmod(n_total, num_of_chunks)
    chunk_sizes = ([0] + extras * [n_each_chunk + 1] + (num_of_chunks - extras) * [n_each_chunk])
    div_points = list(accumulate(chunk_sizes))
    sub_lists = []
    for i in range(num_of_chunks):
        start = div_points[i]
        end = div_points[i + 1]
        sub_lists.append(input_list[start:end])
    return (sub_list for sub_list in sub_lists)
result = list(list_split(data1, 3))
print(result)
# [[(1, 2), ('a', 'b')], [(3, 4), (5, 6)], [('c', 'd'), ('e', 'f')]]
result = list(list_split(data2, 3))
print(result)
# [[1, 2, 'a', 'b'], [3, 4, 5, 6], ['c', 'd', 'e', 'f']]

This solution is based on the zip "grouper" pattern from the Python 3 docs. The small addition is that if N does not divide the list length evenly, all the extra items are placed into the first chunk.
import itertools
def segment_list(l, N):
    chunk_size, remainder = divmod(len(l), N)
    first, rest = l[:chunk_size + remainder], l[chunk_size + remainder:]
    return itertools.chain([first], zip(*[iter(rest)] * chunk_size))
Example usage (wrapped in list, since the function returns an iterator):
>>> my_list = list(range(10))
>>> list(segment_list(my_list, 2))
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>> list(segment_list(my_list, 3))
[[0, 1, 2, 3], (4, 5, 6), (7, 8, 9)]
>>>
The advantages of this solution are that it preserves the order of the original list, and is written in a functional style that lazily evaluates the list only once when called.
Note that because it returns an iterator, the result can only be consumed once. If you want the convenience of a non-lazy list, you can wrap the result in list:
>>> x = list(segment_list(my_list, 2))
>>> x
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>> x
[[0, 1, 2, 3, 4], (5, 6, 7, 8, 9)]
>>>
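To illustrate the "can only be consumed once" point, here is a short sketch. It repeats `segment_list` from above so it runs on its own; the names `first_pass` and `second_pass` are only for the demo.

```python
import itertools

def segment_list(l, N):
    # Same function as above, repeated so this snippet is self-contained.
    chunk_size, remainder = divmod(len(l), N)
    first, rest = l[:chunk_size + remainder], l[chunk_size + remainder:]
    return itertools.chain([first], zip(*[iter(rest)] * chunk_size))

segments = segment_list(list(range(10)), 3)
first_pass = list(segments)
second_pass = list(segments)  # the iterator is already exhausted here

print(first_pass)   # [[0, 1, 2, 3], (4, 5, 6), (7, 8, 9)]
print(second_pass)  # []
```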

I would simply do
(let's say you want n chunks)
import numpy as np
# convert x to numpy.ndarray
x = np.array(x)
l = np.array_split(x, n)
It works and it's only 2 lines.
Example:
# your list
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
# amount of chunks you want
n = 6
x = np.array(x)
l = np.array_split(x, n)
print(l)
>> [array([1, 2, 3, 4, 5]), array([6, 7, 8, 9]), array([10, 11, 12, 13]), array([14, 15, 16, 17]), array([18, 19, 20, 21]), array([22, 23, 24, 25])]
And if you want a list of lists:
l = [elem.tolist() for elem in l]  # tolist() also converts NumPy scalars to plain Python ints
print(l)
>> [[1, 2, 3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13], [14, 15, 16, 17], [18, 19, 20, 21], [22, 23, 24, 25]]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
chunk = len(x) // 6  # integer division, so chunk can be used as a slice bound in Python 3
l = []
i = 0
while i < len(x):
    if len(l) <= 4:
        l.append(x[i:i + chunk])
    else:
        # sixth chunk: take everything that is left
        l.append(x[i:])
        break
    i += chunk
print(l)
# output: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24, 25]]

arr1 = [-20, 20, -10, 0, 4, 8, 10, 6, 15, 9, 18, 35, 40, -30, -90, 99]
n = 4  # chunk size (items per chunk), not the number of chunks
final = [arr1[i * n:(i + 1) * n] for i in range((len(arr1) + n - 1) // n)]
print(final)
Output:
[[-20, 20, -10, 0], [4, 8, 10, 6], [15, 9, 18, 35], [40, -30, -90, 99]]

This function returns a list of lists, each holding at most chunk_size values.
def chuncker(list_to_split, chunk_size):
    list_of_chunks = []
    start_chunk = 0
    end_chunk = start_chunk + chunk_size
    # Loop while there are items left, so no empty trailing chunk is appended
    # when the length is an exact multiple of chunk_size.
    while start_chunk < len(list_to_split):
        chunk_ls = list_to_split[start_chunk:end_chunk]
        list_of_chunks.append(chunk_ls)
        start_chunk = start_chunk + chunk_size
        end_chunk = end_chunk + chunk_size
    return list_of_chunks
Example:
ls = list(range(20))
chuncker(list_to_split = ls, chunk_size = 6)
output:
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19]]

This accepts generators without consuming them all at once. If we know the size of the generator, the binsize can be calculated by max(1, size // n_chunks).
from time import sleep
def chunks(items, binsize):
    lst = []
    for item in items:
        lst.append(item)
        if len(lst) == binsize:
            yield lst
            lst = []
    if len(lst) > 0:
        yield lst
def g():
    for item in [1, 2, 3, 4, 5, 6, 7]:
        print("accessed:", item)
        sleep(1)
        yield item
for a in chunks(g(), 3):
    print("chunk:", list(a), "\n")
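One thing worth checking with the max(1, size // n_chunks) formula: because it rounds down, the number of chunks can exceed n_chunks. A minimal sketch, repeating the `chunks` generator so it runs on its own (the values 7 and 3 are just an example):

```python
def chunks(items, binsize):
    # Same generator as above, repeated so this snippet is self-contained.
    lst = []
    for item in items:
        lst.append(item)
        if len(lst) == binsize:
            yield lst
            lst = []
    if lst:
        yield lst

size, n_chunks = 7, 3
binsize = max(1, size // n_chunks)  # 7 // 3 == 2
result = list(chunks(iter(range(1, size + 1)), binsize))
print(result)  # [[1, 2], [3, 4], [5, 6], [7]]
```

Here 7 items with n_chunks = 3 come out as 4 chunks, so this approach fixes the bin size rather than the exact chunk count.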

For people looking for an answer in Python 3(.6) without imports.
x is the list to be split.
n is the length of each chunk.
L is the new list.
n = 6
L = [x[i:i + n] for i in range(0, len(x), n)]
# [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25]]

Related

Create 3 lists from one list [duplicate]

Get list of numbers then return the numbers in the list that are increasing [duplicate]

The aim is to find groups of increasing numbers in a list of integers. Each item in a resulting group must be exactly 1 greater than the previous item.
Given an input:
x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
I need to find groups of increasing numbers and achieve:
increasing_numbers = [(7,8,9,10), (0,1,2,3,4,5)]
And eventually also the number of increasing numbers:
len(list(chain(*increasing_numbers)))
And also the len of the groups:
increasing_num_groups_length = [len(i) for i in increasing_numbers]
I have tried the following to get the number of increasing numbers:
>>> from itertools import tee, chain
>>> def pairwise(iterable):
...     a, b = tee(iterable)
...     next(b, None)
...     return zip(a, b)
...
>>> x = [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
>>> set(list(chain(*[(i,j) for i,j in pairwise(x) if j-1==i])))
set([1, 2, 3, 4, 5, 6, 8, 9, 10, 11])
>>> len(set(list(chain(*[(i,j) for i,j in pairwise(x) if j-1==i]))))
10
But I'm unable to keep the order and the groups of increasing numbers.
How can I achieve the increasing_numbers groups of integer tuples and also the increasing_num_groups_length?
Also, is there a name for such/similar problem?
EDITED
I came up with this solution, but it's super verbose and I'm sure there's an easier way to achieve the increasing_numbers output:
>>> from itertools import tee, chain
>>> def pairwise(iterable):
...     a, b = tee(iterable)
...     next(b, None)
...     return zip(a, b)
...
>>> x = [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
>>> boundary = iter([0] + [i+1 for i, (j,k) in enumerate(pairwise(x)) if j+1!=k] + [len(x)])
>>> [tuple(x[i:next(boundary)]) for i in boundary]
[(8, 9, 10, 11), (1, 2, 3, 4, 5, 6)]
Is there a more pythonic / less verbose way to do this?
Another input/output example:
[in]:
[17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
[out]:
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
(28, 29, 30, 31, 32, 33, 34, 35, 36)]
EDIT:
Here's a code-golf solution (142 characters):
def f(x):s=[0]+[i for i in range(1,len(x)) if x[i]!=x[i-1]+1]+[len(x)];return [x[j:k] for j,k in [s[i:i+2] for i in range(len(s)-1)] if k-j>1]
Expanded version:
def igroups(x):
    s = [0] + [i for i in range(1, len(x)) if x[i] != x[i-1] + 1] + [len(x)]
    return [x[j:k] for j, k in [s[i:i+2] for i in range(len(s)-1)] if k - j > 1]
Commented version:
def igroups(x):
    # find the boundaries where numbers are not consecutive
    boundaries = [i for i in range(1, len(x)) if x[i] != x[i-1] + 1]
    # add the start and end boundaries
    boundaries = [0] + boundaries + [len(x)]
    # take the boundaries as pairwise slices
    slices = [boundaries[i:i + 2] for i in range(len(boundaries) - 1)]
    # extract all sequences with length greater than one
    return [x[start:end] for start, end in slices if end - start > 1]
Original solution:
Not sure whether this counts as "pythonic" or "not too verbose":
def igroups(iterable):
    items = iter(iterable)
    a, b = None, next(items, None)
    result = [b]
    while b is not None:
        a, b = b, next(items, None)
        if b is not None and a + 1 == b:
            result.append(b)
        else:
            if len(result) > 1:
                yield tuple(result)
            result = [b]
print(list(igroups([])))
print(list(igroups([0, 0, 0])))
print(list(igroups([7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5])))
print(list(igroups([8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6])))
print(list(igroups([9, 1, 2, 3, 1, 1, 2, 3, 5])))
Output:
[]
[]
[(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
[(8, 9, 10, 11), (1, 2, 3, 4, 5, 6)]
[(1, 2, 3), (1, 2, 3)]
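The question also asks for the group lengths and the total count. A minimal sketch, assuming a list-based `igroups` in the style of the boundary version above (pairing the boundaries with `zip` is my own shorthand, not the answer's exact code):

```python
def igroups(x):
    # Indices where the +1 chain breaks, plus the start and end of the list.
    boundaries = [0] + [i for i in range(1, len(x)) if x[i] != x[i - 1] + 1] + [len(x)]
    # Pair each boundary with the next one and keep runs longer than one item.
    return [tuple(x[s:e]) for s, e in zip(boundaries, boundaries[1:]) if e - s > 1]

x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
increasing_numbers = igroups(x)
increasing_num_groups_length = [len(g) for g in increasing_numbers]

print(increasing_numbers)                 # [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
print(increasing_num_groups_length)       # [4, 6]
print(sum(increasing_num_groups_length))  # 10
```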
A couple of different ways using itertools and numpy:
from itertools import groupby, tee, cycle
x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35,
36, 1, 2, 3, 4,34,54]
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    for k, v in grps:
        if k:
            yield tuple(v) + (next((next(grps)[1])),)
print(list(sequences(x)))
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
Or using python3 and yield from:
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    yield from (tuple(v) + (next((next(grps)[1])),) for k,v in grps if k)
print(list(sequences(x)))
Using a variation of my answer here with numpy.split:
import numpy as np
out = [tuple(arr) for arr in np.split(x, np.where(np.diff(x) != 1)[0] + 1) if arr.size > 1]
print(out)
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
And similar to ekhumoro's answer:
def sequences(x):
    it = iter(x)
    prev, temp = next(it), []
    while prev is not None:
        start = next(it, None)
        if prev + 1 == start:
            temp.append(prev)
        elif temp:
            yield tuple(temp + [prev])
            temp = []
        prev = start
To get the length and the tuple:
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    for k, v in grps:
        if k:
            t = tuple(v) + (next(next(grps)[1]),)
            yield t, len(t)
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, lambda j: j + 1 == next(x2))
    yield from ((t, len(t)) for t in (tuple(v) + (next(next(grps)[1]),)
                                      for k, v in grps if k))
def sequences(x):
    it = iter(x)
    prev, temp = next(it), []
    while prev is not None:
        start = next(it, None)
        if prev + 1 == start:
            temp.append(prev)
        elif temp:
            yield tuple(temp + [prev]), len(temp) + 1
            temp = []
        prev = start
Output will be the same for all three:
[((19, 20, 21, 22), 4), ((0, 1, 2), 3), ((4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), 11)
, ((28, 29, 30, 31, 32, 33, 34, 35, 36), 9), ((1, 2, 3, 4), 4)]
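For comparison, another groupby-based idiom that is not from the answer above: within a run of +1 steps, the difference between a value and its index stays constant, so that difference can serve as the grouping key. A minimal sketch (the function name `consecutive_runs` is hypothetical):

```python
from itertools import groupby

def consecutive_runs(seq):
    """Return runs of +1 increments as tuples, skipping single items."""
    runs = []
    # Adjacent items share the key value - index exactly when they differ by 1.
    for _, group in groupby(enumerate(seq), key=lambda pair: pair[1] - pair[0]):
        run = tuple(value for _, value in group)
        if len(run) > 1:
            runs.append(run)
    return runs

x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
print(consecutive_runs(x))  # [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
```

This avoids the stateful `next(x2)` call inside the key function, at the cost of building index/value pairs.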
I think the most maintainable solution would be to make it simple:
def group_by(l):
    res = [[l[0]]]
    for i in range(1, len(l)):
        if l[i-1] < l[i]:
            res[-1].append(l[i])
        else:
            res.append([l[i]])
    return res
This solution does not filter out single-element sequences, but that can easily be added. Additionally, it has O(n) complexity, and you can make it a generator as well if you want.
By maintainable I mean code that is not a one-liner of 300 characters with some convoluted expressions. Then maybe you would want to use Perl :). At least you will know how the function behaves one year later.
>>> x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> print(group_by(x))
[[7, 8, 9, 10], [6], [0, 1, 2, 3, 4, 5]]
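A possible sketch of the variant hinted at above: a generator that drops single-element runs. Note that it also swaps the `<` comparison for a strict +1 check to match the question's requirement; that change and the name `increasing_runs` are my assumptions, not part of the answer.

```python
def increasing_runs(l):
    """Yield runs of +1 increments as tuples, skipping single-element runs."""
    run = []
    for value in l:
        if run and value == run[-1] + 1:
            run.append(value)
        else:
            # The current run is broken; emit it only if it has more than one item.
            if len(run) > 1:
                yield tuple(run)
            run = [value]
    if len(run) > 1:
        yield tuple(run)

print(list(increasing_runs([7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5])))
# [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
```

Unlike `group_by`, this version also works on an empty list, since it never indexes `l[0]`.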
If two consecutive numbers increase by one, I add them as a tuple to a list (group).
When the next pair is non-increasing and the list (group) is non-empty, I unpack it and zip again to rebuild the sequence that was broken into pairs by the zip. I use a set comprehension to eliminate duplicate numbers.
def extract_increasing_groups(seq):
    seq = tuple(seq)
    def is_increasing(a,b):
        return a + 1 == b
    def unzip(seq):
        return tuple(sorted({ y for x in zip(*seq) for y in x}))
    group = []
    for a,b in zip(seq[:-1],seq[1:]):
        if is_increasing(a,b):
            group.append((a,b))
        elif group:
            yield unzip(group)
            group = []
    if group:
        yield unzip(group)
if __name__ == '__main__':
    x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12,
         13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
    for group in extract_increasing_groups(x):
        print(group)
Simpler one using set;
from collections import namedtuple
from itertools import islice, tee
def extract_increasing_groups(iterable):
iter1, iter2 = tee(iterable)
iter2 = islice(iter2,1,None)
is_increasing = lambda a,b: a + 1 == b
Igroup = namedtuple('Igroup','group, len')
group = set()
for pair in zip(iter1, iter2):
if is_increasing(*pair):
group.update(pair)
elif group:
yield Igroup(tuple(sorted(group)),len(group))
group = set()
if group:
yield Igroup(tuple(sorted(group)), len(group))
if __name__ == '__main__':
x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
total = 0
for group in extract_increasing_groups(x):
total += group.len
print('Group: {}\nLength: {}'.format(group.group, group.len))
print('Total: {}'.format(total))
def igroups(L):
R=[[]]
[R[-1].append(L[i]) for i in range(len(L)) if (L[i-1]+1==L[i] if L[i-1]+1==L[i] else R.append([L[i]]))]
return [P for P in R if len(P)>1]
tests=[[],
[0, 0, 0],
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5],
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6],
[9, 1, 2, 3, 1, 1, 2, 3, 5],
[4,3,2,1,1,2,3,3,4,3],
[1, 4, 3],
[1],
[1,2],
[2,1]
]
for L in tests:
print(L)
print(igroups(L))
print("-"*10)
outputting the following:
[]
[]
----------
[0, 0, 0]
[]
----------
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
[[7, 8, 9, 10], [0, 1, 2, 3, 4, 5]]
----------
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
[[8, 9, 10, 11], [1, 2, 3, 4, 5, 6]]
----------
[9, 1, 2, 3, 1, 1, 2, 3, 5]
[[1, 2, 3], [1, 2, 3]]
----------
[4, 3, 2, 1, 1, 2, 3, 3, 4, 3]
[[1, 2, 3], [3, 4]]
----------
[1, 4, 3]
[]
----------
[1]
[]
----------
[1, 2]
[[1, 2]]
----------
[2, 1]
[]
----------
EDIT
My first attemp using itertools.groupby was a fail, sorry for that.
With itertools.groupby, the problem of partionning a list of integers L in sublists of adjacent and increasing consecutive items from L can be done with a one-liner. Nevertheless I don't know how pythonic it can be considered ;)
Here is the code with some simple tests:
[EDIT : now subsequences are increasing by 1, I missed this point the first time.]
from itertools import groupby
def f(i):
return L[i-1]+1==L[i]
def igroups(L):
return [[L[I[0]-1]]+[L[i] for i in I] for I in [I for (v,I) in [(k,[i for i in list(g)]) for (k, g) in groupby(range(1, len(L)), f)] if v]]
outputting:
tests=[
[0, 0, 0, 0],
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5],
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6],
[9, 1, 2, 3, 1, 1, 2, 3, 5],
[4,3,2,1,1,2,3,3,4,3],
[1, 4, 3],
[1],
[1,2, 2],
[2,1],
[0, 0, 0, 0, 2, 5, 5, 8],
]
for L in tests:
print(L)
print(igroups(L))
print('-'*10)
[0, 0, 0, 0]
[]
----------
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
[[7, 8, 9, 10], [0, 1, 2, 3, 4, 5]]
----------
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
[[8, 9, 10, 11], [1, 2, 3, 4, 5, 6]]
----------
[9, 1, 2, 3, 1, 1, 2, 3, 5]
[[1, 2, 3], [1, 2, 3]]
----------
[4, 3, 2, 1, 1, 2, 3, 3, 4, 3]
[[1, 2, 3], [3, 4]]
----------
[1, 4, 3]
[]
----------
[1]
[]
----------
[1, 2, 2]
[[1, 2]]
----------
[2, 1]
[]
----------
[0, 0, 0, 0, 2, 5, 5, 8]
[]
----------
Some explanation. If you "unroll" the code, the logic is more apparant :
from itertools import groupby
def f(i):
return L[i]==L[i-1]+1
def igroups(L):
monotonic_states = [(k,list(g)) for (k, g) in groupby(range(1, len(L)), f)]
increasing_items_indices = [I for (v,I) in monotonic_states if v]
print("\nincreasing_items_indices ->", increasing_items_indices, '\n')
full_increasing_items= [[L[I[0]-1]]+[L[i] for i in I] for I in increasing_items_indices]
return full_increasing_items
L= [2, 8, 4, 5, 6, 7, 8, 5, 9, 10, 11, 12, 25, 26, 27, 42, 41]
print(L)
print(igroups(L))
outputting :
[2, 8, 4, 5, 6, 7, 8, 5, 9, 10, 11, 12, 25, 26, 27, 42, 41]
increasing_items_indices -> [[3, 4, 5, 6], [9, 10, 11], [13, 14]]
[[4, 5, 6, 7, 8], [9, 10, 11, 12], [25, 26, 27]]
We need a key function f that compares an item with the preceding one in the given list. Now, the important point is that the groupby function with the key function f provides a tuple (k, S) where S represents adjacent indices from the initial list and where the state of f is constant, the state being given by the value of k: if k is True, then S represents increasing (by 1) items indices else non-increasing items indices. (in fact, as the example above shows, the list S is incomplete and lacks the first item).
I also made some random tests with one million items lists : igroups function returns always the correct response but is 4 times slower than a naive implementation! Simpler is easier and faster ;)
Thanks alvas for your question, it gives me a lot of fun!
A (really) simple implementation:
x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
result = []
current = x[0]
temp = []
for i in xrange(1, len(x)):
if (x[i] - current == 1):
temp.append( x[i] )
else:
if (len(temp) > 1):
result.append(temp)
temp = [ x[i] ]
current = x[i]
result.append(temp)
And you will get [ [7, 8, 9, 10], [0, 1, 2, 3, 4, 5] ]. From there, you can get the number of increasing numbers by [ len(x) for x in result ] and the total number of numbers sum( len(x) for x in result).
I think this works. It's not fancy but it's simple. It constructs a start list sl and an end list el, which should always be the same length, then uses them to index into x:
def igroups(x):
sl = [i for i in range(len(x)-1)
if (x == 0 or x[i] != x[i-1]+1) and x[i+1] == x[i]+1]
el = [i for i in range(1, len(x))
if x[i] == x[i-1]+1 and (i == len(x)-1 or x[i+1] != x[i]+1)]
return [x[sl[i]:el[i]+1] for i in range(len(sl))]
Late answer, but a simple implementation, generalized to take a predicate so that it need not necessarily be increasing numbers, but could readily be any relationship between the two numbers.
def group_by(lst, predicate):
result = [[lst[0]]]
for i, x in enumerate(lst[1:], start=1):
if not predicate(lst[i - 1], x):
result.append([x])
else:
result[-1].append(x)
return list(filter(lambda lst: len(lst) > 1, result))
Testing this:
>>> group_by([1,2,3,4, 7, 1, 0, 2], lambda x, y: x < y)
[[1, 2, 3, 4, 7], [0, 2]]
>>> group_by([1,2,3,4, 7, 1, 0, 2], lambda x, y: x > y)
[[7, 1, 0]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x < y)
[[1, 2, 3, 4, 7], [0, 2]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x > y)
[[7, 1, 0]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x >= y)
[[7, 1, 0, 0]]
>>>
And now we can easily specialize this:
>>> def ascending_groups(lst):
... return group_by(lst, lambda x, y: x < y)
...
>>> ascending_groups([1,2,3,4, 7, 1, 0, 0, 2])
[[1, 2, 3, 4, 7], [0, 2]]
>>>

Splitting list based on difference between consecutive elements

I found this question that is related to mine. In that question a specific case is treated, and that's splitting a list of integers when a difference of more than 1 is present between consecutive elements.
I was wondering: is there a way to make this work for a difference of N, where N is a parameter? Namely, suppose we have this list:
[1,2,3,6,8,10,14,15,17,20]
For N=2, the output should be:
[[1,2,3], [6,8,10], [14,15,17], [20]]
For N=3, the output should be:
[[1,2,3,6,8,10], [14,15,17,20]]
And for N=4, the output should be the same input list.
I did it like this:
from itertools import takewhile
input_list = [1,2,3,6,8,10,14,15,17,20]
N = 4
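def fun(l, N, output=None):
    # Default to None: a mutable default list would be shared between calls.
    if output is None:
        output = []
    if len(l):
        output.append([x[1] for x in takewhile(lambda x: x[1]-x[0]<=N,
                                               zip([l[0]]+l, l))])
        fun(l[len(output[-1]):], N, output)
    return output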
fun(input_list, N)
But I don't really like it: it's unreadable. Something stylish as a one-liner or something pretty pythonic would be appreciated!
Two lines with list-comprehension:
def split_list(l, n):
    index_list = [None] + [i for i in range(1, len(l)) if l[i] - l[i - 1] > n] + [None]
    return [l[index_list[j - 1]:index_list[j]] for j in range(1, len(index_list))]
test:
example = [1, 2, 3, 6, 8, 10, 14, 15, 17, 20]
for i in range(2,5):
    print(split_list(example, i))
# [[1, 2, 3], [6, 8, 10], [14, 15, 17], [20]]
# [[1, 2, 3, 6, 8, 10], [14, 15, 17, 20]]
# [[1, 2, 3, 6, 8, 10, 14, 15, 17, 20]]
def spacer(data, n=1):
    output = [[data[0]]]
    for i in data[1:]:
        # start a new sublist when the gap to the previous item exceeds n
        if i - output[-1][-1] > n:
            output.append([i])
        else:
            output[-1].append(i)
    return output
data = [1, 2, 3, 6, 8, 10, 14, 15, 17, 20]
for i in range(1, 4):
    print("N={}, {}".format(i, spacer(data, n=i)))
output:
N=1, [[1, 2, 3], [6], [8], [10], [14, 15], [17], [20]]
N=2, [[1, 2, 3], [6, 8, 10], [14, 15, 17], [20]]
N=3, [[1, 2, 3, 6, 8, 10], [14, 15, 17, 20]]
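The answers above both index into a list. As a minimal sketch of the same idea for any iterable, assuming the items support subtraction, a generator can yield each group as soon as a gap larger than N appears. The name split_on_gap and its parameters are hypothetical and not taken from the answers.

```python
def split_on_gap(values, max_gap):
    """Yield lists of neighbouring items whose differences are at most max_gap."""
    group = []
    for value in values:
        # Close the current group when the jump from the previous item is too large.
        if group and value - group[-1] > max_gap:
            yield group
            group = []
        group.append(value)
    if group:
        yield group


data = [1, 2, 3, 6, 8, 10, 14, 15, 17, 20]
for n in (2, 3, 4):
    print(n, list(split_on_gap(data, n)))
```

Unlike the slicing versions, this sketch yields nothing for an empty input instead of raising an IndexError or returning a list containing one empty list.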

Sorting List of Integers into List of Lists by Digit Sums

I am trying to write a Python function to sort a list of numbers into a list of lists of numbers with each sublist only containing numbers that have the digit sum of the index of the sub list in the larger list.
So, for example, for all of the numbers from 1 to 25, it should yield a list of lists like this:
[[], [1, 10], [2, 11, 20], [3, 12, 21], [4, 13, 22], [5, 14, 23], [6, 15, 24], [7, 16, 25], [8, 17], [9, 18], [19]]
I have the following code so far:
def digit_sum(integer_data_type):
    int_string = str(integer_data_type)
    total = 0
    for digit in int_string:
        total += int(digit)
    return total
def organize_by_digit_sum(integer_list):
    integer_list.sort()
    max_ds = 9*len(str(max(integer_list)))+1
    list_of_lists = []
    current_ds = 0
    while current_ds <= max_ds:
        current_list = []
        for n in integer_list:
            if digit_sum(n) == current_ds:
                current_list.append(n)
        list_of_lists.append(current_list)
        current_ds += 1
    return list_of_lists
Obviously, this is inefficient because it has to loop through the entire integer list over and over for each digit sum from 0 through the maximum digit sum.
Also, it initially assumes the maximum digit sum is 9 times the length of the maximum integer. To be clear, I do want to always have a sublist for the possible digit_sum of zero so that I can refer to a particular digit sum's sublist by the index of the list of lists.
I want the function to loop through each integer in the list exactly once and append it to the correct sublist.
I would appreciate any help or insights about this.
The following loops exactly once on the data and returns a dictionary whose keys are the sums, and values are the items that correspond to that sum:
from collections import defaultdict
from pprint import pprint
def group_by_sum(lst):
    d = defaultdict(list)
    for i in lst:
        d[sum(int(j) for j in str(i))].append(i)
    return d
pprint(group_by_sum(range(1, 25)))
# {1: [1, 10],
# 2: [2, 11, 20],
# 3: [3, 12, 21],
# 4: [4, 13, 22],
# 5: [5, 14, 23],
# 6: [6, 15, 24],
# 7: [7, 16],
# 8: [8, 17],
# 9: [9, 18],
# 10: [19]}
You can sort the dictionary values based on the sums to have a list, but I think keeping your data as a dictionary might serve you better.
If you don't mind using itertools, here is a way that should be more efficient.
from itertools import groupby
digit_sum = lambda x: sum(int(i) for i in str(x))
# replace range(1,26) below with your actual data
[list(g) for _, g in groupby(sorted(range(1,26), key = digit_sum), key = digit_sum)]
# [[1, 10],
# [2, 11, 20],
# [3, 12, 21],
# [4, 13, 22],
# [5, 14, 23],
# [6, 15, 24],
# [7, 16, 25],
# [8, 17],
# [9, 18],
# [19]]
How it works: sorted() orders the original list by digit sum, which groupby() needs because it only groups adjacent items that share a key. Each group is then converted to a list.
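To illustrate why the sort matters, here is a small sketch comparing groupby() on unsorted and sorted input. The sample list is made up for illustration, and digit_sum is written as a regular function rather than a lambda.

```python
from itertools import groupby


def digit_sum(n):
    return sum(int(d) for d in str(n))


numbers = [1, 10, 2, 19, 11]

# Without sorting, 2 and 11 share a digit sum but are not adjacent,
# so groupby() puts them in separate groups.
unsorted_groups = [list(g) for _, g in groupby(numbers, key=digit_sum)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(numbers, key=digit_sum), key=digit_sum)]

print(unsorted_groups)  # [[1, 10], [2], [19], [11]]
print(sorted_groups)    # [[1, 10], [2, 11], [19]]
```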
Update:
To get a list in which the index of each sublist equals its digit sum, you can first create a dictionary:
dict_ = dict((k,list(g)) for k, g in groupby(sorted(range(1,26), key = digit_sum), key = digit_sum))
dict_
# {1: [1, 10],
# 2: [2, 11, 20],
# 3: [3, 12, 21],
# 4: [4, 13, 22],
# 5: [5, 14, 23],
# 6: [6, 15, 24],
# 7: [7, 16, 25],
# 8: [8, 17],
# 9: [9, 18],
# 10: [19]}
[dict_.get(key, []) for key in range(max(dict_) + 1)]
# [[],
# [1, 10],
# [2, 11, 20],
# [3, 12, 21],
# [4, 13, 22],
# [5, 14, 23],
# [6, 15, 24],
# [7, 16, 25],
# [8, 17],
# [9, 18],
# [19]]
If you want a solution that leaves empty lists, and space efficiency isn't your main concern, I would use a list of tuples:
>>> def digit_sum(digits):
...     total = 0
...     while digits != 0:
...         total += digits % 10
...         digits = digits // 10
...     return total
...
>>> numbers = list(range(1,26))
>>> pairs = sorted((digit_sum(n),n) for n in numbers)
>>> pairs
[(1, 1), (1, 10), (2, 2), (2, 11), (2, 20), (3, 3), (3, 12), (3, 21), (4, 4), (4, 13), (4, 22), (5, 5), (5, 14), (5, 23), (6, 6), (6, 15), (6, 24), (7, 7), (7, 16), (7, 25), (8, 8), (8, 17), (9, 9), (9, 18), (10, 19)]
>>> maximum_sum = pairs[-1][0]
>>> list_of_lists = [[] for _ in range(maximum_sum+1)]
>>> for pair in pairs:
...     list_of_lists[pair[0]].append(pair[1])
...
>>> list_of_lists
[[], [1, 10], [2, 11, 20], [3, 12, 21], [4, 13, 22], [5, 14, 23], [6, 15, 24], [7, 16, 25], [8, 17], [9, 18], [19]]
>>>
So, suppose your data is much more sparse:
>>> numbers = [4,25,47,89]
>>> pairs = sorted((digit_sum(n),n) for n in numbers)
>>> pairs
[(4, 4), (7, 25), (11, 47), (17, 89)]
>>> maximum_sum = pairs[-1][0]
>>> list_of_lists = [[] for _ in range(maximum_sum+1)]
>>> for pair in pairs:
...     list_of_lists[pair[0]].append(pair[1])
...
>>> from pprint import pprint
>>> pprint(list_of_lists,width=2)
[[],
[],
[],
[],
[4],
[],
[],
[25],
[],
[],
[],
[47],
[],
[],
[],
[],
[],
[89]]
>>>
And you can access your data as such:
>>> list_of_lists[17]
[89]
>>> list_of_lists[8]
[]
>>>
Very easy:
list_of_lists = [[] for _ in range(11)]  # digit sums 0..10 cover the numbers 1..25
for n in range(1, 26):
    digit_sum = sum(int(d) for d in str(n))
    list_of_lists[digit_sum].append(n)
print(list_of_lists)
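The snippet above hard-codes 11 buckets. A minimal sketch that still visits each number exactly once but grows the outer list on demand could look like the following. It assumes non-negative integers, and the function body is an illustration rather than any answerer's code.

```python
def organize_by_digit_sum(numbers):
    """Single pass: buckets[s] holds the numbers whose digit sum is s."""
    buckets = [[]]  # index 0 always exists, even if nothing has digit sum 0
    for n in numbers:
        s = sum(int(d) for d in str(n))
        # Extend the outer list until index s is valid.
        while len(buckets) <= s:
            buckets.append([])
        buckets[s].append(n)
    return buckets


print(organize_by_digit_sum(range(1, 26)))
print(organize_by_digit_sum([4, 25, 47, 89])[17])  # [89]
```

Items keep their input order inside each bucket, so sort the input first if sorted sublists are needed.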

Finding groups of increasing numbers in a list

The aim is to find groups of increasing/monotonic numbers given a list of integers. Each item in a resulting group must be exactly 1 greater than the previous item.
Given an input:
x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
I need to find groups of increasing numbers and achieve:
increasing_numbers = [(7,8,9,10), (0,1,2,3,4,5)]
And eventually also the number of increasing numbers:
len(list(chain(*increasing_numbers)))
And also the len of the groups:
increasing_num_groups_length = [len(i) for i in increasing_numbers]
I have tried the following to get the number of increasing numbers:
>>> from itertools import tee, chain
>>> def pairwise(iterable):
...     a, b = tee(iterable)
...     next(b, None)
...     return zip(a, b)
...
>>> x = [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
>>> set(list(chain(*[(i,j) for i,j in pairwise(x) if j-1==i])))
set([1, 2, 3, 4, 5, 6, 8, 9, 10, 11])
>>> len(set(list(chain(*[(i,j) for i,j in pairwise(x) if j-1==i]))))
10
But I'm unable to keep the order and the groups of increasing numbers.
How can I achieve the increasing_numbers groups of integer tuples and also the increasing_num_groups_length?
Also, is there a name for such/similar problem?
EDITED
I came up with this solution, but it's super verbose and I'm sure there's an easier way to achieve the increasing_numbers output:
>>> from itertools import tee, chain
>>> def pairwise(iterable):
...     a, b = tee(iterable)
...     next(b, None)
...     return zip(a, b)
...
>>> x = [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
>>> boundary = iter([0] + [i+1 for i, (j,k) in enumerate(pairwise(x)) if j+1!=k] + [len(x)])
>>> [tuple(x[i:next(boundary)]) for i in boundary]
[(8, 9, 10, 11), (1, 2, 3, 4, 5, 6)]
Is there a more pythonic / less verbose way to do this?
Another input/output example:
[in]:
[17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
[out]:
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
(28, 29, 30, 31, 32, 33, 34, 35, 36)]
EDIT:
Here's a code-golf solution (142 characters):
def f(x):s=[0]+[i for i in range(1,len(x)) if x[i]!=x[i-1]+1]+[len(x)];return [x[j:k] for j,k in [s[i:i+2] for i in range(len(s)-1)] if k-j>1]
Expanded version:
def igroups(x):
    s = [0] + [i for i in range(1, len(x)) if x[i] != x[i-1] + 1] + [len(x)]
    return [x[j:k] for j, k in [s[i:i+2] for i in range(len(s)-1)] if k - j > 1]
Commented version:
def igroups(x):
    # find the boundaries where numbers are not consecutive
    boundaries = [i for i in range(1, len(x)) if x[i] != x[i-1] + 1]
    # add the start and end boundaries
    boundaries = [0] + boundaries + [len(x)]
    # take the boundaries as pairwise slices
    slices = [boundaries[i:i + 2] for i in range(len(boundaries) - 1)]
    # extract all sequences with length greater than one
    return [x[start:end] for start, end in slices if end - start > 1]
Original solution:
Not sure whether this counts as "pythonic" or "not too verbose":
def igroups(iterable):
    items = iter(iterable)
    a, b = None, next(items, None)
    result = [b]
    while b is not None:
        a, b = b, next(items, None)
        if b is not None and a + 1 == b:
            result.append(b)
        else:
            if len(result) > 1:
                yield tuple(result)
            result = [b]
print(list(igroups([])))
print(list(igroups([0, 0, 0])))
print(list(igroups([7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5])))
print(list(igroups([8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6])))
print(list(igroups([9, 1, 2, 3, 1, 1, 2, 3, 5])))
Output:
[]
[]
[(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
[(8, 9, 10, 11), (1, 2, 3, 4, 5, 6)]
[(1, 2, 3), (1, 2, 3)]
A couple of different ways using itertools and numpy:
from itertools import groupby, tee, cycle
x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35,
36, 1, 2, 3, 4,34,54]
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    for k, v in grps:
        if k:
            yield tuple(v) + (next((next(grps)[1])),)
print(list(sequences(x)))
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
Or using python3 and yield from:
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    yield from (tuple(v) + (next((next(grps)[1])),) for k,v in grps if k)
print(list(sequences(x)))
Using a variation of my answer here with numpy.split:
import numpy as np
out = [tuple(arr) for arr in np.split(x, np.where(np.diff(x) != 1)[0] + 1) if arr.size > 1]
print(out)
[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
And similar to ekhumoro's answer:
def sequences(x):
    it = iter(x)
    prev, temp = next(it), []
    while prev is not None:
        start = next(it, None)
        if prev + 1 == start:
            temp.append(prev)
        elif temp:
            yield tuple(temp + [prev])
            temp = []
        prev = start
To get the length and the tuple:
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, key=lambda j: j + 1 == next(x2))
    for k, v in grps:
        if k:
            t = tuple(v) + (next(next(grps)[1]),)
            yield t, len(t)
def sequences(l):
    x2 = cycle(l)
    next(x2)
    grps = groupby(l, lambda j: j + 1 == next(x2))
    yield from ((t, len(t)) for t in (tuple(v) + (next(next(grps)[1]),)
                                      for k, v in grps if k))
def sequences(x):
    it = iter(x)
    prev, temp = next(it), []
    while prev is not None:
        start = next(it, None)
        if prev + 1 == start:
            temp.append(prev)
        elif temp:
            yield tuple(temp + [prev]), len(temp) + 1
            temp = []
        prev = start
Output will be the same for all three:
[((19, 20, 21, 22), 4), ((0, 1, 2), 3), ((4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), 11), ((28, 29, 30, 31, 32, 33, 34, 35, 36), 9), ((1, 2, 3, 4), 4)]
I think the most maintainable solution would be to make it simple:
def group_by(l):
    res = [[l[0]]]
    for i in range(1, len(l)):
        # the question asks for steps of exactly +1, not just any increase
        if l[i-1] + 1 == l[i]:
            res[-1].append(l[i])
        else:
            res.append([l[i]])
    return res
This solution does not filter out single-element sequences, but that is easy to add. Additionally, it has O(n) complexity, and you can turn it into a generator if you want.
By maintainable I mean code that is not a 300-character one-liner full of convoluted expressions (for that, maybe you would want to use Perl :). At least you will still know how the function behaves a year later.
>>> x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> print(group_by(x))
[[7, 8, 9, 10], [6], [0, 1, 2, 3, 4, 5]]
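The answer mentions that filtering single-element groups and turning the loop into a generator are easy additions. One possible sketch of both follows. The name increasing_runs and the min_len parameter are hypothetical, and the code assumes that "increasing" means exactly +1 per step, as in the question.

```python
def increasing_runs(seq, min_len=2):
    """Yield tuples of items that each exceed the previous one by exactly 1."""
    run = []
    for item in seq:
        # A break in the +1 pattern ends the current run.
        if run and item != run[-1] + 1:
            if len(run) >= min_len:
                yield tuple(run)
            run = []
        run.append(item)
    if len(run) >= min_len:
        yield tuple(run)


x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
print(list(increasing_runs(x)))  # [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
```

Because it never indexes into the input, this version also accepts arbitrary iterables and yields nothing for empty input.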
Whenever two consecutive numbers increase by one, I append them as a tuple to a list (group).
When a pair is not increasing and the group is non-empty, I unpack the group and zip it again to rebuild the sequence that was split into pairs. A set comprehension eliminates the duplicate numbers.
def extract_increasing_groups(seq):
    seq = tuple(seq)
    def is_increasing(a,b):
        return a + 1 == b
    def unzip(seq):
        return tuple(sorted({ y for x in zip(*seq) for y in x}))
    group = []
    for a,b in zip(seq[:-1],seq[1:]):
        if is_increasing(a,b):
            group.append((a,b))
        elif group:
            yield unzip(group)
            group = []
    if group:
        yield unzip(group)
if __name__ == '__main__':
    x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12,
         13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
    for group in extract_increasing_groups(x):
        print(group)
A simpler one using a set:
from collections import namedtuple
from itertools import islice, tee
def extract_increasing_groups(iterable):
    iter1, iter2 = tee(iterable)
    iter2 = islice(iter2,1,None)
    is_increasing = lambda a,b: a + 1 == b
    Igroup = namedtuple('Igroup','group, len')
    group = set()
    for pair in zip(iter1, iter2):
        if is_increasing(*pair):
            group.update(pair)
        elif group:
            yield Igroup(tuple(sorted(group)),len(group))
            group = set()
    if group:
        yield Igroup(tuple(sorted(group)), len(group))
if __name__ == '__main__':
    x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35, 36, 40]
    total = 0
    for group in extract_increasing_groups(x):
        total += group.len
        print('Group: {}\nLength: {}'.format(group.group, group.len))
    print('Total: {}'.format(total))
def igroups(L):
    R=[[]]
    [R[-1].append(L[i]) for i in range(len(L)) if (L[i-1]+1==L[i] if L[i-1]+1==L[i] else R.append([L[i]]))]
    return [P for P in R if len(P)>1]
tests=[[],
       [0, 0, 0],
       [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5],
       [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6],
       [9, 1, 2, 3, 1, 1, 2, 3, 5],
       [4,3,2,1,1,2,3,3,4,3],
       [1, 4, 3],
       [1],
       [1,2],
       [2,1]
       ]
for L in tests:
    print(L)
    print(igroups(L))
    print("-"*10)
outputting the following:
[]
[]
----------
[0, 0, 0]
[]
----------
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
[[7, 8, 9, 10], [0, 1, 2, 3, 4, 5]]
----------
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
[[8, 9, 10, 11], [1, 2, 3, 4, 5, 6]]
----------
[9, 1, 2, 3, 1, 1, 2, 3, 5]
[[1, 2, 3], [1, 2, 3]]
----------
[4, 3, 2, 1, 1, 2, 3, 3, 4, 3]
[[1, 2, 3], [3, 4]]
----------
[1, 4, 3]
[]
----------
[1]
[]
----------
[1, 2]
[[1, 2]]
----------
[2, 1]
[]
----------
EDIT
My first attempt using itertools.groupby was a failure, sorry for that.
With itertools.groupby, partitioning a list of integers L into sublists of adjacent, consecutively increasing items from L can be done with a one-liner. Nevertheless, I don't know how pythonic it can be considered ;)
Here is the code with some simple tests:
[EDIT : now subsequences are increasing by 1, I missed this point the first time.]
from itertools import groupby
def igroups(L):
    # the key function closes over the argument L instead of relying on a global named L
    return [[L[I[0]-1]]+[L[i] for i in I] for I in [I for (v,I) in [(k,list(g)) for (k, g) in groupby(range(1, len(L)), lambda i: L[i-1]+1==L[i])] if v]]
tests=[
    [0, 0, 0, 0],
    [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5],
    [8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6],
    [9, 1, 2, 3, 1, 1, 2, 3, 5],
    [4,3,2,1,1,2,3,3,4,3],
    [1, 4, 3],
    [1],
    [1,2, 2],
    [2,1],
    [0, 0, 0, 0, 2, 5, 5, 8],
]
for L in tests:
    print(L)
    print(igroups(L))
    print('-'*10)
outputting:
[0, 0, 0, 0]
[]
----------
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
[[7, 8, 9, 10], [0, 1, 2, 3, 4, 5]]
----------
[8, 9, 10, 11, 7, 1, 2, 3, 4, 5, 6]
[[8, 9, 10, 11], [1, 2, 3, 4, 5, 6]]
----------
[9, 1, 2, 3, 1, 1, 2, 3, 5]
[[1, 2, 3], [1, 2, 3]]
----------
[4, 3, 2, 1, 1, 2, 3, 3, 4, 3]
[[1, 2, 3], [3, 4]]
----------
[1, 4, 3]
[]
----------
[1]
[]
----------
[1, 2, 2]
[[1, 2]]
----------
[2, 1]
[]
----------
[0, 0, 0, 0, 2, 5, 5, 8]
[]
----------
Some explanation. If you "unroll" the code, the logic is more apparent:
from itertools import groupby
def igroups(L):
    def f(i):
        # True when item i is exactly one more than its predecessor
        return L[i]==L[i-1]+1
    monotonic_states = [(k,list(g)) for (k, g) in groupby(range(1, len(L)), f)]
    increasing_items_indices = [I for (v,I) in monotonic_states if v]
    print("\nincreasing_items_indices ->", increasing_items_indices, '\n')
    full_increasing_items= [[L[I[0]-1]]+[L[i] for i in I] for I in increasing_items_indices]
    return full_increasing_items
L= [2, 8, 4, 5, 6, 7, 8, 5, 9, 10, 11, 12, 25, 26, 27, 42, 41]
print(L)
print(igroups(L))
outputting :
[2, 8, 4, 5, 6, 7, 8, 5, 9, 10, 11, 12, 25, 26, 27, 42, 41]
increasing_items_indices -> [[3, 4, 5, 6], [9, 10, 11], [13, 14]]
[[4, 5, 6, 7, 8], [9, 10, 11, 12], [25, 26, 27]]
We need a key function f that compares an item with the preceding one in the given list. The important point is that groupby with the key function f yields tuples (k, S), where S is a run of adjacent indices of the initial list over which f keeps the same value k: if k is True, S holds the indices of items that increase by 1; otherwise it holds the indices of non-increasing items. (In fact, as the example above shows, the list S is incomplete: it lacks the index of the first item of each run, which is why the code prepends L[I[0]-1].)
I also ran some random tests on lists of one million items: the igroups function always returns the correct result but is 4 times slower than a naive implementation! Simpler is easier and faster ;)
Thanks alvas for your question, it gives me a lot of fun!
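For comparison, a different and fairly well-known groupby idiom avoids the missing-first-index issue by grouping on value minus position, which stays constant along a run of +1 steps. This is not the technique used in the answer above, only a sketch of an alternative, and the names consecutive_runs and min_len are made up for illustration.

```python
from itertools import groupby


def consecutive_runs(values, min_len=2):
    """Group adjacent items that increase by exactly 1, using value - index as the key."""
    runs = []
    # Within a run of +1 steps, value and index grow together, so their difference is constant.
    for _, grp in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        run = tuple(value for _, value in grp)
        if len(run) >= min_len:
            runs.append(run)
    return runs


print(consecutive_runs([7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]))
# [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
```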
A (really) simple implementation:
x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
result = []
current = x[0]
temp = [x[0]]  # start the first group with the first element
for i in range(1, len(x)):
    if x[i] - current == 1:
        temp.append(x[i])
    else:
        if len(temp) > 1:
            result.append(temp)
        temp = [x[i]]
    current = x[i]
if len(temp) > 1:  # keep the last group only if it is a real run
    result.append(temp)
And you will get [[7, 8, 9, 10], [0, 1, 2, 3, 4, 5]]. From there, you can get the length of each group with [len(g) for g in result] and the total number of increasing numbers with sum(len(g) for g in result).
I think this works. It's not fancy but it's simple. It constructs a start list sl and an end list el, which should always be the same length, then uses them to index into x:
def igroups(x):
    # sl: indices where a run starts; el: indices where a run ends
    sl = [i for i in range(len(x)-1)
          if (i == 0 or x[i] != x[i-1]+1) and x[i+1] == x[i]+1]
    el = [i for i in range(1, len(x))
          if x[i] == x[i-1]+1 and (i == len(x)-1 or x[i+1] != x[i]+1)]
    return [x[sl[i]:el[i]+1] for i in range(len(sl))]
Late answer, but a simple implementation, generalized to take a predicate so that it need not necessarily be increasing numbers, but could readily be any relationship between the two numbers.
def group_by(lst, predicate):
    result = [[lst[0]]]
    for i, x in enumerate(lst[1:], start=1):
        if not predicate(lst[i - 1], x):
            result.append([x])
        else:
            result[-1].append(x)
    return list(filter(lambda lst: len(lst) > 1, result))
Testing this:
>>> group_by([1,2,3,4, 7, 1, 0, 2], lambda x, y: x < y)
[[1, 2, 3, 4, 7], [0, 2]]
>>> group_by([1,2,3,4, 7, 1, 0, 2], lambda x, y: x > y)
[[7, 1, 0]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x < y)
[[1, 2, 3, 4, 7], [0, 2]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x > y)
[[7, 1, 0]]
>>> group_by([1,2,3,4, 7, 1, 0, 0, 2], lambda x, y: x >= y)
[[7, 1, 0, 0]]
>>>
And now we can easily specialize this:
>>> def ascending_groups(lst):
...     return group_by(lst, lambda x, y: x < y)
...
>>> ascending_groups([1,2,3,4, 7, 1, 0, 0, 2])
[[1, 2, 3, 4, 7], [0, 2]]
>>>
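The same predicate idea can be specialized to the question's "+1" requirement and used to produce the group lengths and the total the question asks for. The following is a sketch under that assumption. The helper group_adjacent is a hypothetical variant of the group_by function above that also tolerates an empty list.

```python
from itertools import chain


def group_adjacent(lst, predicate):
    """Group neighbours while predicate(previous, current) holds; drop single-item groups."""
    groups = []
    for item in lst:
        if groups and predicate(groups[-1][-1], item):
            groups[-1].append(item)
        else:
            groups.append([item])
    return [g for g in groups if len(g) > 1]


x = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
increasing_numbers = [tuple(g) for g in group_adjacent(x, lambda a, b: b == a + 1)]
increasing_num_groups_length = [len(g) for g in increasing_numbers]
total = len(list(chain.from_iterable(increasing_numbers)))

print(increasing_numbers)            # [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
print(increasing_num_groups_length)  # [4, 6]
print(total)                         # 10
```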
