This question already has answers here:
Rolling or sliding window iterator?
(29 answers)
Closed last month.
Is there an efficient or elegant way to retrieve all the k-size sublists of a list in Python? For example:
arr = [2, 3, 5, 7, 11, 13]
I want all 3-element sublists:
result = [[2, 3, 5],
[3, 5, 7],
[5, 7, 11],
[7, 11, 13]]
I know I could create this with a for loop, slicing the list with arr[i:i+3], but the lists I'm dealing with are gigantic and I'm hoping for an efficient mechanism, or at least an elegant or Pythonic mechanism.
I'm using Pandas as well, so happy to use a Pandas mechanism.
If you actually want to construct the list, I don't think you'll do better than a basic list comprehension like this:
arr = [2, 3, 5, 7, 11, 13]
k = 3
result = [arr[i:i+k] for i in range(len(arr) - k + 1)]
If you want to minimize memory use, you could use a generator:
arr = [2, 3, 5, 7, 11, 13]
def window(arr, k):
    for i in range(len(arr) - k + 1):
        yield arr[i:i+k]

for group in window(arr, 3):
    ...  # do something with group
You could also do something where you zip together k copies of the list, each offset by one. But that would take as much memory as the first solution, probably without much performance advantage.
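For illustration, a minimal sketch of that zip approach (assuming k is the window size):
import itertools  # not needed here; plain zip suffices

arr = [2, 3, 5, 7, 11, 13]
k = 3
# zip k copies of the list, each shifted one further to the right
result = [list(t) for t in zip(*(arr[i:] for i in range(k)))]
# [[2, 3, 5], [3, 5, 7], [5, 7, 11], [7, 11, 13]]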
There may be something quick and efficient in numpy or pandas, but you would need to show more about what your input and output should look like.
There are some other ideas here, but they are focused on general iterables (where you can only pull items out once), rather than lists (where you can access items by index, possibly repeatedly).
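That said, one NumPy option worth knowing, as a sketch (assuming NumPy 1.20 or later, which added sliding_window_view):
import numpy as np

arr = np.array([2, 3, 5, 7, 11, 13])
# a zero-copy view whose rows are the 3-element windows
windows = np.lib.stride_tricks.sliding_window_view(arr, 3)
# array([[ 2,  3,  5],
#        [ 3,  5,  7],
#        [ 5,  7, 11],
#        [ 7, 11, 13]])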
You can use more_itertools:
import more_itertools
list(more_itertools.windowed(arr, 3))
[(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]
Or, using itertools:
from itertools import islice
def pairwise(iterable, n):
    "s -> (s0,s1,..s(n-1)), (s1,s2,..,sn), (s2,s3,..,s(n+1)), ..."
    iters = iter(iterable)
    result = tuple(islice(iters, n))
    if len(result) == n:
        yield result
    for elem in iters:
        result = result[1:] + (elem,)
        yield result
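A quick usage sketch of this recipe:
arr = [2, 3, 5, 7, 11, 13]
print(list(pairwise(arr, 3)))
# [(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]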
You can use strides:
import numpy as np

arr = [2, 3, 5, 7, 11, 13]

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = rolling_window(np.array(arr), 3)
print(a)
[[ 2 3 5]
[ 3 5 7]
[ 5 7 11]
[ 7 11 13]]
print(a.tolist())
[[2, 3, 5],
[3, 5, 7],
[5, 7, 11],
[7, 11, 13]]
If your source list is gigantic, the source should produce values on demand. The way to do that is with a generator.
A hypothetical source generator that reads from a file:
def gen_value():
    with open('big-file.txt') as f:
        for line in f:
            for x in line.split():
                yield int(x)
The grouper function recipe may be used to consume the generator:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
So you may call list(grouper(gen_value(), 3)).
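A hypothetical run (assuming big-file.txt contains the whitespace-separated numbers 1 through 8):
list(grouper(gen_value(), 3))
# [(1, 2, 3), (4, 5, 6), (7, 8, None)]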
I want to do something like the following:
s = set()
s.add('a')
s.add('q')
s.add('x')
s.add('banana')
s1 = s[:1]
s2 = s[2:3]
Is the above a valid operation on sets? Is there a way to apply the above to a generator such as:
def Chunks(container, size):
    for i in xrange(0, len(container), size):
        yield container[i:i+size]
You can convert the set to a list, do the slicing, and then convert it back to sets:
In [13]: s = {1,2,3}
In [14]: s_l = list(s)
In [15]: print set(s_l[:1]), set(s_l[1:])
set([1]) set([2, 3])
Do note that sets do not support ordering, so any such ordering operation would have to be done within a list:
In [16]: s = {1,2,3,0}
In [17]: s
Out[17]: {0, 1, 2, 3}
In [18]: s_l = list(s)
In [19]: print set(s_l[:2]), set(s_l[2:])
set([0, 1]) set([2, 3])
Using itertools.islice, it is easy:
>>> from itertools import islice
>>> def chunk(it, size):
...     it = iter(it)
...     return iter(lambda: tuple(islice(it, size)), ())
>>> data = {i for i in range(20)}
>>> for j in chunk(data, 4):
...     print(j)
...
(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)
Sets are not indexable in the same way that lists are, but you can iterate over "chunks" or subgroups of them. Use the grouper recipe from the itertools docs:
from itertools import izip_longest  # Python 2; on Python 3, use zip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
s2 = set('Andy Sandy Jack Jill Frank Fred Sally Sarah Bill Billy Bob'.split())
for g in grouper(s2, 3):
    print g
Yields:
('Sarah', 'Frank', 'Bill')
('Fred', 'Billy', 'Jill')
('Andy', 'Jack', 'Bob')
('Sally', 'Sandy', None)
The initial order of the set initializer is not preserved because, unlike lists, sets don't preserve order.
You can use itertools.islice, though sets don't guarantee an ordering, so be careful if you aren't looking at the whole set.
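A minimal sketch (which elements you get depends on the set's arbitrary iteration order):
from itertools import islice

s = {'a', 'q', 'x', 'banana'}
first_two = set(islice(s, 2))  # some two elements; order is arbitrary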
I'm trying to write a function that creates a set of dynamic sublists, each containing 5 elements from a list passed to it. Here's my attempt at the code:
def sublists(seq):
    i=0
    x=[]
    while i<len(seq)-1:
        j=0
        while j<5:
            x.append(seq[i])  # How do I change x after it reaches size 5?
    #return set of sublists
EDIT:
Sample input: [1,2,3,4,5,6,7,8,9,10]
Expected output: [[1,2,3,4,5],[6,7,8,9,10]]
Well, for starters, you'll need to (or at least should) have two lists: a temporary one and a permanent one that you return. (Also, you will need to increment j and i or, more practically, use a for loop, but I assume you just forgot to post that.)
EDIT: removed the first code sample, since its style doesn't match easily with the expected results; see the other two possibilities.
Or, more sensibly:
def sublists(seq):
    x = []
    for i in range(0, len(seq), 5):
        x.append(seq[i:i+5])
    return x
Or, more sensibly again, a simple list comprehension:
def sublists(seq):
    return [seq[i:i+5] for i in range(0, len(seq), 5)]
When given the list:
l = [1,2,3,4,5,6,7,8,9,10]
They will return
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
Have you considered using itertools.combinations(...)?
For example:
>>> from itertools import combinations
>>> l = [1,2,3,4,5,6]
>>> list(combinations(l, 5))
[(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 5, 6), (1, 2, 4, 5, 6), (1, 3, 4, 5, 6), (2, 3, 4, 5, 6)]
By "dynamic sublists", do you mean break up the list into groups of five elements? This is similar to your approach:
def sublists(lst, n):
    ret = []
    i = 0
    while i < len(lst):
        ret.append(lst[i:i+n])
        i += n
    return ret
Or, using iterators:
import itertools

def sublists(seq, n):
    it = iter(seq)
    while True:
        r = list(itertools.islice(it, n))
        if not r:
            break
        yield r
which returns an iterator over lists of length up to n. (If you took out the list call, weird things would happen if you didn't consume each inner iterator before moving on to the next.)
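A usage sketch of the iterator version:
list(sublists([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
# [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]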
This question already has answers here:
How do I split a list into equally-sized chunks?
(66 answers)
Closed 8 months ago.
I am surprised I could not find a "batch" function that would take as input an iterable and return an iterable of iterables.
For example:
for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]
or:
for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]
Now, I wrote what I thought was a pretty simple generator:
def batch(iterable, n=1):
    current_batch = []
    for item in iterable:
        current_batch.append(item)
        if len(current_batch) == n:
            yield current_batch
            current_batch = []
    if current_batch:
        yield current_batch
But the above does not give me what I would have expected:
for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]
So I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?
[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]
This is probably more efficient (faster):
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print x
Example using a list:
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # list of data

for x in batch(data, 3):
    print(x)
# Output
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]
It avoids stepping through the input one item at a time; each batch comes from a single slice. Note that it requires a sequence that supports len() and slicing, not an arbitrary iterable.
The recipes in the itertools module provide two ways to do this depending on how you want to handle a final odd-sized lot (keep it, pad it with a fillvalue, ignore it, or raise an exception):
from itertools import islice, zip_longest
def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
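A quick usage sketch of both recipes:
list(batched('ABCDEFG', 3))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
list(grouper('ABCDEFG', 3, fillvalue='x'))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]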
More-itertools includes two functions that do what you need:
chunked(iterable, n) returns an iterable of lists, each of length n (except the last one, which may be shorter);
ichunked(iterable, n) is similar, but returns an iterable of iterables instead.
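For example, a short sketch of both (more_itertools is a third-party package):
import more_itertools

list(more_itertools.chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
[list(c) for c in more_itertools.ichunked(range(10), 3)]
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]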
As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, you could use the following recipe:
from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # next() both tests for exhaustion and grabs the first item
            yield chain([next(batchiter)], batchiter)
        except StopIteration:
            return
A solution for Python 3.8+, if you are working with iterables that don't define a len function and get exhausted:
from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch
Example usage:
def my_gen():
    yield from range(10)

for batch in batcher(my_gen(), 3):
    print(batch)
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
It could, of course, be implemented without the walrus operator as well.
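For example, a pre-3.8 sketch of the same batcher:
from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    batch = list(islice(iterator, batch_size))
    while batch:
        yield batch
        batch = list(islice(iterator, batch_size))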
This is a very short code snippet that, I know, does not use len and works under both Python 2 and 3 (not my creation):
from itertools import chain, islice

def chunks(iterable, size):
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
Weird, seems to work fine for me in Python 2.x
>>> def batch(iterable, n=1):
...     current_batch = []
...     for item in iterable:
...         current_batch.append(item)
...         if len(current_batch) == n:
...             yield current_batch
...             current_batch = []
...     if current_batch:
...         yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
A workable version that avoids the new features in Python 3.8, adapted from @Atra Azami's answer.
import itertools

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)
    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)
Output:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
def batch(iterable, n):
    iterable = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                if chunk:  # don't yield an empty trailing chunk
                    yield chunk
                return
        yield chunk
list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:
from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])
Inspired by more-itertools, and shortened to the essence of that code.
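A quick usage sketch:
list(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]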
This is what I use in my project. It handles iterables or lists as efficiently as possible.
from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to the slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return
    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]
def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:
        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more
        yield [k for k in chunk()]
I like this one:
def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]
This returns a list of batches of size bs; you can, of course, make it a generator by swapping the list comprehension for a generator expression.
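For instance, a generator sketch of the same idea (batch_gen is just an illustrative name):
def batch_gen(x, bs):
    # same slicing, evaluated lazily via a generator expression
    return (x[i:i+bs] for i in range(0, len(x), bs))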
Here is an approach using the reduce function.
One-liner:
from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])
Or more readable version:
from functools import reduce

def batch(input_list, batch_size):
    def reducer(cumulator, item):
        if len(cumulator[-1]) < batch_size:
            cumulator[-1].append(item)
            return cumulator
        else:
            cumulator.append([item])
            return cumulator
    return reduce(reducer, input_list, [[]])
Test:
>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]
This would work for any iterable.
from itertools import zip_longest, filterfalse
def batch_iterable(iterable, batch_size=2):
    args = [iter(iterable)] * batch_size
    return (tuple(filterfalse(lambda x: x is None, group))
            for group in zip_longest(fillvalue=None, *args))
It would work like this:
>>> list(batch_iterable(range(0, 5), 2))
[(0, 1), (2, 3), (4,)]
PS: It would not work if the iterable has None values.
You can just group iterable items by their batch index.
import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
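A usage sketch (each inner generator must be consumed before advancing to the next, as with itertools.groupby generally):
print([list(b) for b in batch([1, 9, 3, 5, 2, 4, 2], 4)])
# [[1, 9, 3, 5], [2, 4, 2]]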
It is often the case that you want to collect the inner iterables, so here is a more advanced version.
def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
Examples:
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]
Related functionality you may need:
def batch(size, i):
    """Get the i'th batch of the given size"""
    return slice(size * i, size * i + size)
Usage:
>>> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10][batch(3, 1)]
[4, 5, 6]
It gets the i'th batch from the sequence, and it can work with other data structures as well, like pandas dataframes (df.iloc[batch(100, 0)]) or numpy arrays (array[batch(100, 0)]).
from itertools import zip_longest, filterfalse

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group))
            for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None] * 10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
I use:
import math

def batchify(arr, batch_size):
    num_batches = math.ceil(len(arr) / batch_size)
    return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
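A usage sketch:
batchify([1, 2, 3, 4, 5, 6, 7], 3)
# [[1, 2, 3], [4, 5, 6], [7]]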
Keep taking (at most) n elements until it runs out.
def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk

def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
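A usage sketch (the walrus operator requires Python 3.8+):
print(list(chop(3, range(10))))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]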
This code has the following features:
Can take lists or generators (no len()) as input
Does not require imports of other packages
No padding added to last batch
def batch_generator(items, batch_size):
    batch = []  # the batch currently being filled
    for item in items:
        batch.append(item)  # append items to the batch
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # yield the last, possibly shorter, batch
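A usage sketch:
list(batch_generator(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]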