Say I have a list that gets generated such as:
list_x = [0,1,2,3,4,5,6,7,8,9]
Then I divide it up like this:
list_x = [01,23,45,67,89]
with this list comprehension:
list_x = [0,1,2,3,4,5,6,7,8,9]
grp_count = 2
new_list = map(int, [list_x[i+0]+list_x[i+1] for i in range(0, len(list_x)-1, grp_count)])
How can I generalize this code so that it groups based on `grp_count`?
For example, if grp_count = 5:
list_x = [01234,56789]
I know I somehow have to add another list_x[i+n] term for each increase in the group size.
This looks a lot like the grouper recipe from the itertools docs (https://docs.python.org/3/library/itertools.html):

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
then
list_x = [0,1,2,3,4,5,6,7,8,9]
print(list(grouper(list_x, 5, 0)))
gives
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
As I said in a comment, there's no way to create a list of integers in Python that would display like [01234,56789] because Python doesn't show the value of integers with a leading zero like that. The closest you could get is [1234, 56789].
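A quick interpreter check makes the point (the exact SyntaxError wording varies across Python 3 versions):

>>> [01234, 56789]
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> int('01234')  # a leading zero is simply dropped when parsing a string
1234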
However you can create a list of strings with those digits in them like this:
def grouper(n, iterable):
    return zip(*[iter(iterable)] * n)
list_x = [0,1,2,3,4,5,6,7,8,9]
grp_count = 5
new_list = [''.join(map(str, g)) for g in grouper(grp_count, list_x)]
print(new_list) #-> ['01234', '56789']
You can use a list comprehension to do the trick, as in this example:
def grouper(a, num):
    if num > int(len(a) / 2):
        # reject group sizes larger than half the list
        return []
    # If you need a list of lists instead of strings, use:
    # return [a[k:k+num] for k in range(0, len(a), num)]
    return ["".join(map(str, a[k:k+num])) for k in range(0, len(a), num)]
a = [0,1,2,3,4,5,6,7,8,9]
print(grouper(a, 2))
print(grouper(a, 5))
Output:
['01', '23', '45', '67', '89']
['01234', '56789']
I have the following situation. Say I have a variable batch_size and a list called data. I want to pull batch_size elements out of data, so that when I hit the end I wrap around. In other words:
data =[1,2,3,4,5]
batch_size = 4
-> [1,2,3,4], [5,1,2,3], [4,5,1,2], ...
Is there some nice idiomatic way of returning slices like this? The start index is always batch_size * batch, modulo the length of data. But is there a simple way of "wrapping around" to the beginning when batch_size * (batch + 1) goes past the end of the list? I can of course patch together two slices in that case, but I was hoping there's some really clean way of doing this.
The only assumption I'm making is that batch_size < len(data).
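For reference, the "patch together two slices" approach mentioned above could look like this (a minimal sketch; wrap_batch is a hypothetical name, and it assumes batch_size < len(data)):

def wrap_batch(data, batch_size, batch):
    # start index always wraps via modulo
    start = (batch_size * batch) % len(data)
    end = start + batch_size
    if end <= len(data):
        return data[start:end]
    # wrap around: glue the tail of the list to its head
    return data[start:] + data[:end - len(data)]

data = [1, 2, 3, 4, 5]
print([wrap_batch(data, 4, b) for b in range(3)])
# [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 1, 2]]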
You could use itertools.cycle and the grouper recipe from itertools:

import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
data = [1,2,3,4,5]
batch_size = 4
how_many_groups = 5
groups = grouper(itertools.cycle(data), batch_size)
chunks = [next(groups) for _ in range(how_many_groups)]
The result of chunks is then:
[(1, 2, 3, 4),
(5, 1, 2, 3),
(4, 5, 1, 2),
(3, 4, 5, 1),
(2, 3, 4, 5)]
So if you actually need lists rather than tuples, you'll have to convert them yourself ([list(next(groups)) for ...]).
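Spelled out with the same variables as above, that would be:

chunks = [list(next(groups)) for _ in range(how_many_groups)]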
You can also use deque from the collections module and rotate the deque once per batch, as in this example:
from collections import deque

def grouper(iterable, elements, rotations):
    if elements > len(iterable):
        return []
    b = deque(iterable)
    for _ in range(rotations):
        yield list(b)[:elements]
        # rotate right by one step; because elements == len(data) - 1 here,
        # this matches advancing the start index by `elements` each batch
        b.rotate(1)
data = [1,2,3,4,5]
elements = 4
rotations = 5
final = list(grouper(data, elements, rotations))
print(final)
Output:
[[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 1, 2], [3, 4, 5, 1], [2, 3, 4, 5]]
I want to do something like the following:
s = set()
s.add('a')
s.add('q')
s.add('x')
s.add('banana')
s1 = s(:1)
s2 = s(2:3)
Is the above a valid operation on sets? Is there a way to apply the above to a generator such as:
def Chunks(container, size):
    for i in xrange(0, len(container), size):
        yield container(i:i+size)
You can convert the set to a list, do the slicing, and then convert it back to sets:
In [13]: s = {1,2,3}
In [14]: s_l = list(s)
In [15]: print set(s_l[:1]), set(s_l[1:])
set([1]) set([2, 3])
Do note that sets do not support ordering, so any such ordering operation would have to be done within a list:
In [16]: s = {1,2,3,0}
In [17]: s
Out[17]: {0, 1, 2, 3}
In [18]: s_l = list(s)
In [19]: print set(s_l[:2]), set(s_l[2:])
set([0, 1]) set([2, 3])
Using itertools.islice, this is easy:
>>> from itertools import islice
>>> def chunk(it, size):
...     it = iter(it)
...     return iter(lambda: tuple(islice(it, size)), ())
...
>>> data = {i for i in range(20)}
>>> for j in chunk(data, 4):
...     print(j)
...
(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)
Sets are not indexable in the same way that lists are, but you can iterate over "chunks" or subgroups from them. Use the grouper recipe from itertools:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

s2 = set('Andy Sandy Jack Jill Frank Fred Sally Sarah Bill Billy Bob'.split())
for g in grouper(s2, 3):
    print(g)
Yields:
('Sarah', 'Frank', 'Bill')
('Fred', 'Billy', 'Jill')
('Andy', 'Jack', 'Bob')
('Sally', 'Sandy', None)
The initial order of the set initializer is not preserved because, unlike lists, sets don't preserve order.
You can use itertools.islice, though sets don't guarantee an ordering, so be careful if you aren't looking at the whole set.
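A minimal sketch of that caveat in action (which elements land in which chunk depends on the set's internal order):

from itertools import islice

s = {'a', 'q', 'x', 'banana'}
it = iter(s)
first_two = tuple(islice(it, 2))
rest = tuple(it)
print(first_two, rest)  # the split is arbitrary because sets are unordered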
I am surprised I could not find a "batch" function that would take as input an iterable and return an iterable of iterables.
For example:
for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]
or:
for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]
Now, I wrote what I thought was a pretty simple generator:
def batch(iterable, n = 1):
    current_batch = []
    for item in iterable:
        current_batch.append(item)
        if len(current_batch) == n:
            yield current_batch
            current_batch = []
    if current_batch:
        yield current_batch
But the above does not give me what I would have expected:
for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]
So I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?
[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]
This is probably more efficient (faster), though it only works on sequences that support len() and slicing:
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)
Example using a list:
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data
for x in batch(data, 3):
print(x)
# Output
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]
It avoids accumulating each batch item by item; for inputs like range or numpy arrays, slicing doesn't copy into new lists at all.
The recipes in the itertools module provide two ways to do this, depending on how you want to handle the final odd-sized batch (keep it, pad it with a fillvalue, ignore it, or raise an exception):
from itertools import islice, zip_longest

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
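For example, with the two definitions above:

letters = 'ABCDEFG'

print(list(batched(letters, 3)))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

print(list(grouper(letters, 3, incomplete='fill', fillvalue='x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

print(list(grouper(letters, 3, incomplete='ignore')))
# [('A', 'B', 'C'), ('D', 'E', 'F')]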
The more-itertools package includes two functions that do what you need:
chunked(iterable, n) returns an iterable of lists, each of length n (except the last one, which may be shorter);
ichunked(iterable, n) is similar, but returns an iterable of iterables instead.
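A quick sketch of both in use (a minimal example, assuming the package is installed as more_itertools):

from more_itertools import chunked, ichunked

print(list(chunked(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

for c in ichunked(range(10), 4):
    print(list(c))  # each chunk is itself a lazy iterable
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]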
As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, consider the following recipe:
from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # next() raises StopIteration when the source is exhausted;
            # under PEP 479 we must catch it and return explicitly
            yield chain([next(batchiter)], batchiter)
        except StopIteration:
            return
Solution for Python 3.8+, if you are working with iterables that don't define a len function and that get exhausted:
from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch
Example usage:
def my_gen():
    yield from range(10)

for batch in batcher(my_gen(), 3):
    print(batch)

Output:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
Could of course be implemented without the walrus operator as well.
Here is a very short code snippet that does not use len and works under both Python 2 and 3 (not my creation):
def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
Weird, seems to work fine for me in Python 2.x
>>> def batch(iterable, n = 1):
...     current_batch = []
...     for item in iterable:
...         current_batch.append(item)
...         if len(current_batch) == n:
...             yield current_batch
...             current_batch = []
...     if current_batch:
...         yield current_batch
...
>>> for x in batch(range(0, 10), 3):
... print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
A workable version without the new Python 3.8 features, adapted from @Atra Azami's answer:
import itertools

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)
    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)
Output:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
def batch(iterable, n):
    iterable = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                if chunk:  # don't emit a trailing empty chunk
                    yield chunk
                return
        yield chunk
list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:
from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable, sentinel).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])
Inspired by more-itertools, and shortened to the essence of that code.
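For example:

for part in chunked(range(10), 4):
    print(part)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]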
This is what I use in my project. It handles sequences (which have a length) and plain generators as efficiently as possible.
from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to the slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return
    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]

def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:
        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more
        yield [k for k in chunk()]
I like this one:

def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]
This returns a list of batches of size bs; you can of course make it a generator by using a generator expression (i for i in iterable) instead.
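For instance, a lazy variant along those lines (batch_gen is a hypothetical name for illustration):

def batch_gen(x, bs):
    # generator expression: batches are produced lazily, one at a time
    return (x[i:i+bs] for i in range(0, len(x), bs))

print(list(batch_gen(list(range(7)), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]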
Here is an approach using the reduce function.
One-liner (assuming input_array and batch_size are already defined):

from functools import reduce
reduce(lambda cumulator, item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])
Or a more readable version:

from functools import reduce

def batch(input_list, batch_size):
    def reducer(cumulator, item):
        if len(cumulator[-1]) < batch_size:
            cumulator[-1].append(item)
            return cumulator
        else:
            cumulator.append([item])
            return cumulator
    return reduce(reducer, input_list, [[]])
Test:
>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]
This would work for any iterable.
from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2):
    args = [iter(iterable)] * batch_size
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))
It would work like this:
>>> list(batch_iterable(range(0, 5), 2))
[(0, 1), (2, 3), (4,)]
PS: It would not work if the iterable contains None values.
You can just group iterable items by their batch index.
import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
It is often the case that you want to collect the inner iterables, so here is a more advanced version.
def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
Examples:
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]
Related functionality you may need:
def batch(size, i):
    """Get the i'th batch of the given size."""
    return slice(size * i, size * i + size)
Usage:
>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
[4, 5, 6]
It gets the i'th batch from the sequence and it can work with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy array (array[batch(100,0)]).
from itertools import zip_longest, filterfalse

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
I use:

import math

def batchify(arr, batch_size):
    num_batches = math.ceil(len(arr) / batch_size)
    return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
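For example:

print(batchify([1, 2, 3, 4, 5, 6, 7], 3))
# [[1, 2, 3], [4, 5, 6], [7]]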
Keep taking (at most) n elements until it runs out.
def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk

def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
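Usage:

print(list(chop(3, range(10))))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]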
This code has the following features:
Can take lists or generators (no len()) as input
Does not require imports of other packages
No padding added to last batch
def batch_generator(items, batch_size):
    batch = []  # current batch being collected
    for item in items:
        batch.append(item)  # append items until the batch is full
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # yield the final partial batch, if any
I have an array of bytes and what I want to do is take four bytes from the array, do something with them, and then take the next four bytes. Is it at all possible to do this in a list comprehension, or to make a for loop take four items from the array instead of one?
Another option is to use itertools (http://docs.python.org/library/itertools.html), specifically the grouper() recipe:

from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
def clumper(s, count=4):
    for x in range(0, len(s), count):
        yield s[x:x+count]
>>> list(clumper("abcdefghijklmnopqrstuvwxyz"))
['abcd', 'efgh', 'ijkl', 'mnop', 'qrst', 'uvwx', 'yz']
>>> list(clumper("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
In one line:

x = "12345678987654321"
y = [x[i:i+4] for i in range(0, len(x), 4)]
print(y)
suxmac2:Music ajung$ cat xx.py
lst = range(20)
for i in range(0, len(lst)/4):
print lst[i*4 : i*4+4]
suxmac2:Music ajung$ python2.5 xx.py
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]