Slicing a list into n nearly-equal-length partitions [duplicate] - python

This question already has answers here:
Splitting a list into N parts of approximately equal length
(36 answers)
Closed 5 years ago.
I'm looking for a fast, clean, pythonic way to divide a list into exactly n nearly-equal partitions.
partition([1,2,3,4,5],5)->[[1],[2],[3],[4],[5]]
partition([1,2,3,4,5],2)->[[1,2],[3,4,5]] (or [[1,2,3],[4,5]])
partition([1,2,3,4,5],3)->[[1,2],[3,4],[5]] (there are other ways to slice this one too)
There are several answers under "Iteration over list slices" that come very close to what I want, except that they focus on the size of each sublist, and I care about the number of sublists (some of them also pad with None). These are trivially converted, obviously, but I'm looking for a best practice.
Similarly, people have pointed out great solutions under "How do you split a list into evenly sized chunks?" for a very similar problem, but I'm more interested in the number of partitions than the specific size, as long as the sizes are within 1 of each other. Again, this is trivially convertible, but I'm looking for a best practice.

Just a different take, that only works if [[1,3,5],[2,4]] is an acceptable partition, in your example.
def partition(lst, n):
    return [lst[i::n] for i in xrange(n)]
This satisfies the example mentioned in @Daniel Stutzbach's answer:
partition(range(105),10)
# [[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
# [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101],
# [2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102],
# [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103],
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104],
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
# [6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
# [7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
# [8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
# [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]]

Here's a version that's similar to Daniel's: it divides as evenly as possible, but puts all the larger partitions at the start:
def partition(lst, n):
    q, r = divmod(len(lst), n)
    indices = [q*i + min(i, r) for i in xrange(n+1)]
    return [lst[indices[i]:indices[i+1]] for i in xrange(n)]
It also avoids the use of float arithmetic, since that always makes me uncomfortable. :)
Edit: an example, just to show the contrast with Daniel Stutzbach's solution
>>> print [len(x) for x in partition(range(105), 10)]
[11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
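For readers on Python 3, the same integer-only index arithmetic can be sketched as below. The function name partition_front_loaded is made up for this illustration; the logic is the divmod approach from this answer with xrange swapped for range.

```python
def partition_front_loaded(lst, n):
    """Split lst into n contiguous parts; the first len(lst) % n parts get one extra item."""
    q, r = divmod(len(lst), n)
    # Boundary i sits at q*i, shifted right by one for each of the first r parts.
    indices = [q * i + min(i, r) for i in range(n + 1)]
    return [lst[indices[i]:indices[i + 1]] for i in range(n)]


parts = partition_front_loaded(list(range(105)), 10)
print([len(p) for p in parts])  # [11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
print(partition_front_loaded([1, 2, 3, 4, 5], 3))  # [[1, 2], [3, 4], [5]]
```

Because only integers are involved, the boundaries do not depend on how halves are rounded.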

def partition(lst, n):
    division = len(lst) / float(n)
    return [lst[int(round(division * i)):int(round(division * (i + 1)))] for i in xrange(n)]
>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3, 4], [5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Python 3 version:
def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
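One thing worth knowing when moving this to Python 3: the built-in round() rounds exact halves to the nearest even integer, so a boundary of 2.5 becomes 2 rather than 3. The small sketch below (the name partition_rounded is only for illustration) shows that the five-element example then splits as [[1, 2], [3, 4, 5]] instead of the Python 2 output shown above. Both results satisfy the question.

```python
def partition_rounded(lst, n):
    """Contiguous split using rounded float boundaries (Python 3 semantics)."""
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]


# round() rounds exact halves to the nearest even integer in Python 3.
print(round(2.5))                              # 2
print(partition_rounded([1, 2, 3, 4, 5], 2))   # [[1, 2], [3, 4, 5]]
```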

Below is one way.
def partition(lst, n):
    increment = len(lst) / float(n)
    last = 0
    i = 1
    results = []
    while last < len(lst):
        idx = int(round(increment * i))
        results.append(lst[last:idx])
        last = idx
        i += 1
    return results
If len(lst) cannot be evenly divided by n, this version will distribute the extra items at roughly equal intervals. For example:
>>> print [len(x) for x in partition(range(105), 10)]
[11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
The code could be simpler if you don't mind all of the 11s being at the beginning or the end.
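As a rough sketch of that simpler variant, here is one way to put all the longer parts at the end, written for Python 3. The name partition_back_loaded is hypothetical and this is just one of several equivalent formulations.

```python
def partition_back_loaded(lst, n):
    """Split lst into n contiguous parts; the last len(lst) % n parts get one extra item."""
    q, r = divmod(len(lst), n)
    parts = []
    start = 0
    for i in range(n):
        # The final r parts are one item longer than the others.
        size = q + (1 if i >= n - r else 0)
        parts.append(lst[start:start + size])
        start += size
    return parts


print([len(p) for p in partition_back_loaded(list(range(105)), 10)])
# [10, 10, 10, 10, 10, 11, 11, 11, 11, 11]
```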

This answer provides a function split(list_, n, max_ratio), for people
who want to split their list into n pieces whose lengths differ by a
factor of at most max_ratio. It allows for more variation than the
questioner's 'at most 1 difference in piece length'.
It works by sampling n piece lengths within the desired ratio range
[1 , max_ratio), placing them after each other to form a 'broken
stick' with the right distances between the 'break points' but the wrong
total length. Scaling the broken stick to the desired length gives us
the approximate positions of the break points we want. To get integer
break points requires subsequent rounding.
Unfortunately, the roundings can conspire to make pieces just too short,
and let you exceed the max_ratio. See the bottom of this answer for an
example.
import random
def splitting_points(length, n, max_ratio):
    """n+1 slice points [0, ..., length] for n random-sized slices.
    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    ratios = [random.uniform(1, max_ratio) for _ in range(n)]
    normalized_ratios = [r / sum(ratios) for r in ratios]
    cumulative_ratios = [
        sum(normalized_ratios[0:i])
        for i in range(n+1)
    ]
    scaled_distances = [
        int(round(r * length))
        for r in cumulative_ratios
    ]
    return scaled_distances
def split(list_, n, max_ratio):
    """Slice a list into n randomly-sized parts.
    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    points = splitting_points(len(list_), n, max_ratio)
    return [
        list_[ points[i] : points[i+1] ]
        for i in range(n)
    ]
You can try it out like so:
for _ in range(10):
    parts = split('abcdefghijklmnopqrstuvwxyz', 4, 2)
    print([(len(part), part) for part in parts])
Example of a bad result:
parts = split('abcdefghijklmnopqrstuvwxyz', 10, 2)
# lengths range from 1 to 4, not 2 to 4
[(3, 'abc'), (3, 'def'), (1, 'g'),
(4, 'hijk'), (3, 'lmn'), (2, 'op'),
(2, 'qr'), (3, 'stu'), (2, 'vw'),
(3, 'xyz')]
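To spot results like this one, a small helper can measure the ratio actually achieved. The sketch below is not part of the answer's code; the name length_ratio is made up, and it is applied to the bad result listed above.

```python
def length_ratio(parts):
    """Ratio between the longest and the shortest part (inf if any part is empty)."""
    lengths = [len(p) for p in parts]
    if min(lengths) == 0:
        return float("inf")
    return max(lengths) / min(lengths)


# The pieces from the bad result above.
bad = ['abc', 'def', 'g', 'hijk', 'lmn', 'op', 'qr', 'stu', 'vw', 'xyz']
print(length_ratio(bad))  # 4.0, which is larger than the requested max_ratio of 2
```

A caller who needs the bound to hold strictly could compare this value against max_ratio and simply draw again when it is exceeded.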

Related

Indexing numpy.ndarrays periodically

I am trying to access (read/write) numpy.ndarrays periodically. In other words, if I have my_array with the shape of 10*10 and I use the access operator with the inputs:
my_array[10, 10] or access_function(my_array, 10, 10)
I can have access to element
my_array[0, 0].
I want to have read/write ability at my returned element of periodically indexed array.
Can anyone tell me how to do it without making a shifted copy of my original array?
I think this does what you want but I'm not sure whether there's something more elegant that exists. It's probably possible to write a general function for an Nd array but this does 2D only. As you said it uses modular arithmetic.
import numpy as np
def access(shape, ixr, ixc):
    """ Returns a selection. """
    return np.s_[ixr % shape[0], ixc % shape[1]]
arr = np.arange(100)
arr.shape = 10,10
arr[ access(arr.shape, 45, 87) ]
# 57
arr[access(arr.shape, 45, 87)] = 100
In [18]: arr
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [ 50, 51, 52, 53, 54, 55, 56, 100, 58, 59],
# [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Edit - Generic nD version
def access(shape, *args):
    if len(shape) != len(args):
        error = 'Inconsistent number of dimensions: {} & number of indices: {} in coords.'
        raise IndexError(error.format(len(shape), len(args)))
    res = []
    for limit, ix in zip(shape, args):
        res.append(ix % limit)
    return tuple(res)
Usage/Test
a = np.arange(24)
a.shape = 2,3,4
a[access(a.shape, 5, 6, 7)]
# 15
a[access(a.shape, 5,6,7) ] = 100
a
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[ 12, 13, 14, 100],
# [ 16, 17, 18, 19],
# [ 20, 21, 22, 23]]])
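As a possible alternative to hand-written modular arithmetic, NumPy's np.ravel_multi_index accepts mode='wrap', which wraps out-of-range indices and returns a flat index. The sketch below assumes a C-ordered array and uses arr.flat for both reading and writing, so no copy of the array is made.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

# Wrap (45, 87) into the 10x10 shape and convert it to a flat index.
flat_index = np.ravel_multi_index((45, 87), arr.shape, mode='wrap')
print(flat_index)            # 57

print(arr.flat[flat_index])  # 57, i.e. the element at arr[5, 7]
arr.flat[flat_index] = 100   # writes into arr itself, not into a copy
print(arr[5, 7])             # 100
```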

How to count number of values in the randomly generated list?

I have created a random list with the command below:
import random
a=[random.randrange(0,100) for i in xrange(50)]
print a
Now, what could be the command for counting the number of values that are between 0 and 9, 10 and 19, 20 and 29, and so on.
I can print them as below:
import random
a = [random.randrange(0,100) for i in xrange(50)]
for b in a:
    if b<10:
        print b
But, I don't know how to write a command to count the number of the values after printing b.
Thanks for your comments.
Just make a dictionary, enumerate and count.
>>> import random
>>> a = [random.randrange(0,100) for i in xrange(50)]
>>> a
[88, 48, 7, 92, 22, 13, 66, 38, 72, 34, 8, 18, 13, 29, 48, 63, 23, 30, 91, 40, 96, 89, 27, 8, 92, 26, 98, 83, 31, 45, 81, 4, 55, 4, 42, 94, 64, 35, 19, 64, 18, 96, 26, 12, 1, 54, 89, 67, 82, 62]
>>> counts = {}
>>> for i in a:
        t = counts.setdefault(i/10,0)
        counts[i/10] = t + 1
>>> counts
{0: 6, 1: 6, 2: 6, 3: 5, 4: 5, 5: 2, 6: 6, 7: 1, 8: 6, 9: 7}
# Means: 0-9=> 6 numbers, 10-19=> 6 numbers etc.
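The same counting can be sketched with collections.Counter from the standard library. The list below is just the first ten values of the sample above, and // is used so that the division stays an integer division on Python 3 as well.

```python
from collections import Counter

a = [88, 48, 7, 92, 22, 13, 66, 38, 72, 34]

# Key each value by its tens digit: 0 means 0-9, 1 means 10-19, and so on.
counts = Counter(value // 10 for value in a)

for decade in range(10):
    low = decade * 10
    print("%d-%d: %d" % (low, low + 9, counts[decade]))
```

A Counter returns 0 for keys it has never seen, so empty ranges (50-59 here) need no special handling.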
If I understood you correctly, something like this:
import random
a = [random.randrange(0,100) for i in xrange(50)]
print len(filter(lambda x: 0 <= x < 10,a))
print len(filter(lambda x: 10 <= x < 20,a))
etc
You may use bisect.bisect(...) to achieve this as:
from bisect import bisect
import random
random_nums = [random.randint(0,100) for _ in xrange(100)]
bucket = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] # can also be created using:
# range(10, 101, 10)
random_nums.sort() # sort the initial list in order to use it with `bisect`
counts = []
last_bucket_count = 0 # index just past the end of the previous bucket
for range_max in bucket:
    # start the search where the previous bucket ended
    i = bisect(random_nums, range_max, last_bucket_count)
    counts.append(i - last_bucket_count)
    last_bucket_count = i
Sample Run:
When the value of random_nums is:
>>> random_nums
[0, 1, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 10, 10, 11, 11, 12, 13, 13, 13, 16, 17, 18, 18, 18, 18, 19, 20, 21, 22, 24, 24, 25, 25, 26, 26, 26, 26, 26, 29, 30, 30, 31, 33, 37, 37, 38, 42, 42, 43, 44, 44, 47, 47, 49, 51, 52, 55, 55, 57, 57, 58, 59, 63, 63, 63, 63, 64, 64, 65, 66, 67, 68, 71, 73, 73, 73, 74, 77, 79, 82, 83, 83, 83, 84, 85, 87, 87, 88, 89, 89, 90, 92, 93, 95, 96, 98, 98, 99, 99]
the above program returns count as:
>>> counts
[ 14, 14, 14, 5, 8, 8, 10, 7, 12, 8]
# ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
# 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
In data analysis and statistics this is called 'binning'. If you poke around on the 'net for terms like 'bin' and 'bins' you'll find a plethora of pages about software and how to do this.
But a really good one uses the preeminent product for Python, numpy.
>>> import random
>>> a=[random.randrange(0,100) for i in range(50)]
>>> from numpy import histogram
In your case you need to set up the end points of the bins which are -0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, and 99.5. (I've chosen -0.5 for the low end only because it made my calculation easier.) histogram counts how many items fall within each of the ranges given by these numbers, taken in pairs (-0.5 to 9.5, 9.5 to 19.5, etc).
>>> bins = [-0.5+10*i for i in range(11)]
>>> hist,_ = histogram(a, bins)
And here's the result.
>>> hist
array([6, 6, 2, 6, 2, 3, 6, 9, 5, 5], dtype=int64)

python loop x + 1 times in a list of list until number y

What I want is for each list appended to num to hold one more item than the previous one, until y (which is a large number) is generated, at which point the loop stops.
def loop_num(y):
    num = []
    num.append([1])
    num.append([2,3])
    num.append([4,5,6])
    num.append([7,8,9,10])
    ... # stop appending when y is in the appended list
    # if y = 9, then `append [7,8]` and `return num`
    return num
# something like [[1 item], [2 items], [3 items], ...]
# the numbers appended to the list can be random numbers or ascending integers.
Sorry if this is not clear.
Two itertools.count objects should do what you want:
from itertools import count
def loop_num(y):
    counter, x = count(1), count(1)
    n = 0
    while n < y:
        num = []
        for i in range(next(x)):
            num.append(next(counter))
            if num[-1] == y:
                break
        yield num
        n = num[-1]
Output:
>>> list(loop_num(100))
[[1],
[2, 3],
[4, 5, 6],
[7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28],
[29, 30, 31, 32, 33, 34, 35, 36],
[37, 38, 39, 40, 41, 42, 43, 44, 45],
[46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78],
[79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91],
[92, 93, 94, 95, 96, 97, 98, 99, 100]]
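A variation on the same idea, sketched with itertools.islice: take rows of growing size from a single iterator over 1..y until it runs dry. The name loop_num_islice is hypothetical, and like the generator above it includes y itself in the last row.

```python
from itertools import islice


def loop_num_islice(y):
    """Rows of sizes 1, 2, 3, ... holding 1..y; the last row may be shorter."""
    numbers = iter(range(1, y + 1))
    rows = []
    size = 1
    while True:
        # Take up to `size` numbers; an empty result means 1..y is used up.
        row = list(islice(numbers, size))
        if not row:
            break
        rows.append(row)
        size += 1
    return rows


print(loop_num_islice(9))   # [[1], [2, 3], [4, 5, 6], [7, 8, 9]]
```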
def loop_num(x):
    i=1
    cnt=0
    sum=0
    while sum<x:
        sum+=i
        cnt=cnt+1
        i=i+1
    num=[ [] for x in range(cnt)]
    count=0
    sz=1
    init=1
    while(count<cnt):
        cur=1
        while(cur<=sz):
            num[count].append(init)
            init=init+1
            cur=cur+1
        count=count+1
        sz=sz+1
    return num
From a python file you may run it from command line (say filename is test.py)
python -c 'import test; print test.loop_num(55)'
I am not sure what your question is, but I assume you have done something similar to this.
x=[[1], [2,3], [4,5,6]]
b=[str(a)+' items' for a in [j for i in x for j in i]]
And what you are looking for is this.
c=max([j for i in x for j in i])
to do this.
z=[]
z.append(str(c)+' items')

Subset of an ndarray based on another array

I have an array of ints, they need to be grouped by 4 each. I'd also like to select them based on another criterion, start < t < stop. I tried
data[group].reshape((-1,4))[start < t < stop]
but that complains about the start < t < stop because that's hardcoded syntax. Can I somehow intersect the two arrays from start < t and t < stop?
The right way of boolean indexing for an array should be like this:
>>> import numpy as np
>>> a=np.random.randint(0,20,size=24)
>>> b=np.arange(24)
>>> b[(8<a)&(a<15)] #rather than 8<a<15
array([ 3, 5, 6, 11, 13, 16, 17, 18, 20, 21, 22, 23])
But you may not be able to reshape the resulting array into a shape of (-1,4), it is a coincidence that the resulting array here contains 3*4 elements.
EDIT, now I understand your OP better. You always reshape data[group] first, right?:
>>> b=np.arange(96)
>>> b.reshape((-1,4))[(8<a)&(a<15)]
array([[12, 13, 14, 15],
[20, 21, 22, 23],
[24, 25, 26, 27],
[44, 45, 46, 47],
[52, 53, 54, 55],
[64, 65, 66, 67],
[68, 69, 70, 71],
[72, 73, 74, 75],
[80, 81, 82, 83],
[84, 85, 86, 87],
[88, 89, 90, 91],
[92, 93, 94, 95]])
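To make the difference concrete, the sketch below uses made-up values for t, start and stop. Python expands the chained form into (start < t) and (t < stop), and `and` needs a single truth value from an array, which NumPy refuses to give for arrays with more than one element.

```python
import numpy as np

t = np.arange(300, 316, 2)   # 300, 302, ..., 314
start, stop = 302, 312

try:
    start < t < stop         # evaluated as (start < t) and (t < stop)
    chained_ok = True
except ValueError:
    # "The truth value of an array with more than one element is ambiguous"
    chained_ok = False

# The elementwise version: combine the two boolean arrays with &.
mask = (start < t) & (t < stop)
selected = t[mask]
print(chained_ok)   # False
print(selected)     # [304 306 308 310]
```

The parentheses around each comparison matter, because & binds more tightly than < in Python.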
How about this?
import numpy as np
arr = np.arange(32)
t = np.arange(300, 364, 2)
start = 310
stop = 352
mask = np.logical_and(start < t, t < stop)
print(mask)
print(arr[mask].reshape((-1,4)))
I did the masking before the reshaping, not sure if that's what you wanted. The key part is probably the logical_and().
