Indexing a numpy array using a numpy array of slices - python

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each of the subdivisions has at most max_chunk_size elements in total.
Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function doesn't operate on the data itself; instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve a subdivision by calling the_array[indexer[i]] (see examples below).
Also, the array of indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, etc.
I was expecting that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]] and that the_array[indexer] would return just the_array, however such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean) type
Is there a simple way around this error?
Here's the code:
import numpy as np
import itertools
from functools import reduce  # reduce is no longer a builtin in Python 3

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()
    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int), slice_shape)]
    # one object-array element per combination of slices
    # (np.object was removed in NumPy 1.24; the builtin object is equivalent)
    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result
Here's an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
Here's a solution based on hpaulj's answer:
import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()
        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]
        # the last axis holds one slice per data dimension
        # (np.object was removed in NumPy 1.24; the builtin object is equivalent)
        self.slices = np.array(list(itertools.product(*slices)), dtype=object).reshape(
            tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple:
            args = (args,)
        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] +
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])]
                             for i in range(len(slices.shape) - 1)))
Example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])

Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My numpy version expects me to turn the subslice into a tuple. This is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2 element tuple - of slices, lists, arrays, or integers. It won't work with an array of slice objects.
However, it is possible to concatenate the results into a larger array. That can be done after indexing, or the slices can be converted into index lists.
np.bmat, for example, concatenates a 2d arrangement of arrays:
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is np.matrix but can be converted back to array.
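One possible way to generalize the np.bmat example without ending up with an np.matrix is np.block, which accepts the same kind of nested list and returns a plain ndarray. This is only a sketch: the grid literal below is a hand-written stand-in for slices[:2, 1:3], and the variable names are assumptions, not part of the original code.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# Hypothetical 2x2 grid of slice pairs, written out by hand to mirror
# subslice = slices[:2, 1:3] from the question.
grid = [
    [(slice(0, 2), slice(6, 12)), (slice(0, 2), slice(12, 15))],
    [(slice(2, 4), slice(6, 12)), (slice(2, 4), slice(12, 15))],
]

# Index each block with a tuple of slices, then stitch the nested list together.
blocks = [[ar[idx] for idx in row] for row in grid]
combined = np.block(blocks)

print(type(combined).__name__)  # ndarray
print(combined.shape)           # (4, 9)
```

Because the blocks are adjacent in ar, the result here matches ar[:4, 6:15]; np.block builds a new array rather than a view.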
The other approach is to use tools like np.arange, np.r_, np.ix_ to create index arrays. It'll take some playing around to generate an example.
To combine the [0,0] and [0,1] subslices:
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.
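The np.r_ and np.ix_ steps above could be wrapped into a small helper along these lines. The function name combined_index and its argument layout (one list of slices per axis) are assumptions made for illustration; it relies only on np.r_ expanding slices into integer ranges and np.ix_ building an open mesh from them.

```python
import numpy as np

def combined_index(slices_per_axis):
    """Hypothetical helper: one list of slices per axis -> np.ix_ index tuple."""
    # np.r_ expands each slice to its integers and concatenates them.
    index_arrays = [np.r_[tuple(axis_slices)] for axis_slices in slices_per_axis]
    return np.ix_(*index_arrays)

ar = np.arange(90).reshape(6, 15)

# Rows 0-1 and 2-3, columns 6-11 and 12-14, as in the subslice example.
ix = combined_index([[slice(0, 2), slice(2, 4)],
                     [slice(6, 12), slice(12, 15)]])
block = ar[ix]
print(block.shape)  # (4, 9)
```

Note that indexing with integer arrays is advanced indexing, so block is a copy; writing into it does not modify ar, although ar[ix] = value does assign into ar.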

Related

How do you split an array into specific intervals in NumPy for Python?

The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function but am unable to split at these specific intervals. Can anyone suggest another method, or am I doing it wrong?
Here's both the manual way and the numpy way with split.
import numpy as np

# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0] + split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
The manual method gives the five arrays below; np.split returns the same five plus a sixth array holding the remaining elements 95 to 99.
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
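A quick way to check where np.split actually cuts is to look at the chunk lengths. The sketch below reuses the x and split_indices from the answer; the names lengths and trimmed are introduced here only for illustration.

```python
import numpy as np

x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]

# np.split cuts before each listed index, so k indices give k + 1 chunks.
chunks = np.split(x, split_indices)
lengths = [len(c) for c in chunks]
print(lengths)  # [10, 15, 20, 30, 20, 5]

# Drop the tail (elements 95..99) if only the listed intervals are wanted.
trimmed = chunks[:-1]
print(len(trimmed))  # 5
```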

Transform a list of ranges into a single list

I have a data frame that has some points to mark another dataset.
I'm creating a range from the starting mark and the stopping mark that I want to transform into a single list or numpy array.
I have the following:
list(map(lambda limits : np.arange(limits[1] - limits[0]-1, -1, -1),
zip(df_cycles['Start_point'], df_cycles['Stop_point']))
)
This is returning a list of arrays:
[array([1155, 1154, 1153, ..., 2, 1, 0]),
array([71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55,
54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38,
37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21,
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
3, 2, 1, 0]),
...]
How can I modify or transform the output to have a single list or NumPy array like this:
array([1155, 1154, 1153, ..., 2, 1, 0, 71, 70, 69, 68, 67, 66, 65,
64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48,
47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31,
30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14,
13, 12, 11, 10, 9, 8, 7, 6, 5, 4,3, 2, 1, 0,...])
Just do:
flatarray = np.concatenate(list_of_arrays)
concatenate puts together two or more arrays into a single new array. You don't want to call it once per array as you go (that creates a Schlemiel the Painter's algorithm), but once you've got them all, it's an efficient way to combine them.
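As a small end-to-end sketch of this, with made-up start and stop values standing in for the df_cycles columns (the arrays starts and stops below are assumptions, not data from the question):

```python
import numpy as np

# Hypothetical stand-ins for df_cycles['Start_point'] and df_cycles['Stop_point'].
starts = np.array([0, 10])
stops = np.array([5, 13])

# Build every countdown range first...
pieces = [np.arange(stop - start - 1, -1, -1) for start, stop in zip(starts, stops)]

# ...then join them with a single concatenate call.
flat = np.concatenate(pieces)
print(flat)  # [4 3 2 1 0 2 1 0]
```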

How to transpose every n rows into columns in NumPy?

I have a numpy array of shape (1000000,).
I would like every n=1000 rows to become columns.
The resulting shape should be (1000, 1000)
How can I do this with NumPy? np.transpose() doesn't seem to do what I want.
I don't want to use a for loop for performance reasons.
You can use reshape with the order='F' parameter:
Example with a (100,) 1D array converted to (10,10) 2D array:
a = np.arange(100)  # array([0, 1, 2, ..., 98, 99])
b = a.reshape((10,10), order='F')
Output:
>>> b
array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])
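For comparison, the same layout can also be reached with an ordinary row-major reshape followed by a transpose. The sketch below uses a small 12-element array and n = 4 so the result is easy to inspect; both names by_columns and via_transpose are introduced here for illustration.

```python
import numpy as np

a = np.arange(12)
n = 4  # every n consecutive elements should become one column

# Column-major reshape: fills the result column by column.
by_columns = a.reshape((n, -1), order='F')

# Equivalent: group into rows of length n, then transpose.
via_transpose = a.reshape(-1, n).T

print(by_columns[:, 0])  # [0 1 2 3]
print(np.array_equal(by_columns, via_transpose))  # True
```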

Indexing numpy.ndarrays periodically

I am trying to access (read/write) numpy.ndarrays periodically. In other words, if I have my_array with a shape of 10*10 and I use the access operator with the inputs:
my_array[10, 10] or access_function(my_array, 10, 10)
I get access to the element
my_array[0, 0].
I want to have read/write ability on the returned element of the periodically indexed array.
Can anyone tell me how to do it without making a shifted copy of my original array?
I think this does what you want but I'm not sure whether there's something more elegant that exists. It's probably possible to write a general function for an Nd array but this does 2D only. As you said it uses modular arithmetic.
import numpy as np
def access(shape, ixr, ixc):
    """ Returns a selection. """
    return np.s_[ixr % shape[0], ixc % shape[1]]
arr = np.arange(100)
arr.shape = 10,10
arr[ access(arr.shape, 45, 87) ]
# 57
arr[access(arr.shape, 45, 87)] = 100
In [18]: arr
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [ 50, 51, 52, 53, 54, 55, 56, 100, 58, 59],
# [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Edit - Generic nD version
def access(shape, *args):
    if len(shape) != len(args):
        error = 'Inconsistent number of dimensions: {} & number of indices: {} in coords.'
        raise IndexError(error.format(len(shape), len(args)))
    res = []
    for limit, ix in zip(shape, args):
        res.append(ix % limit)
    return tuple(res)
Usage/Test
a = np.arange(24)
a.shape = 2,3,4
a[access(a.shape, 5, 6, 7)]
# 15
a[access(a.shape, 5,6,7) ] = 100
a
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[ 12, 13, 14, 100],
# [ 16, 17, 18, 19],
# [ 20, 21, 22, 23]]])
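An alternative that avoids writing the modulo by hand, offered as a sketch rather than a replacement for the answer above, is np.ravel_multi_index with mode='wrap'. It converts the wrapped coordinates into a flat index, which can then be used with the array's .flat attribute for both reading and writing. The variable names are chosen here for illustration.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# mode='wrap' wraps each out-of-range coordinate modulo the axis length.
flat_idx = np.ravel_multi_index((5, 6, 7), a.shape, mode='wrap')

value = a.flat[flat_idx]  # read: same element as a[1, 0, 3]
print(value)              # 15

a.flat[flat_idx] = 100    # write in place, no shifted copy involved
print(a[1, 0, 3])         # 100
```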

Numpy array split by pairs of irregular (start, stop)

I have a NumPy array with the x and y values of points. I have another array which contains pairs of start and end indices. Originally this data was in a pandas DataFrame, but since it has over 60 million items, indexing with loc was very slow. Is there a fast NumPy method to split this?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
The next pair does not always start at the end of the previous one.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_paris]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
This uses basic slicing of the form array[rows, columns]: the : takes all rows, and x:y takes the columns from x up to (but not including) y.
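The same list comprehension also handles pairs that leave gaps, such as the [20, 22] pair from the question. The sketch below rebuilds the question's array with np.arange; the names split_pairs and chunks are introduced here for illustration.

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)
split_pairs = [[0, 10], [10, 13], [13, 17], [20, 22]]  # note the gap before 20

# Each chunk is a basic slice, so it is a view into xy_array, not a copy.
chunks = [xy_array[:, start:stop] for start, stop in split_pairs]

print(chunks[3])
# [[20 21]
#  [70 71]]
print(np.shares_memory(chunks[0], xy_array))  # True
```

Since no data is copied, the cost is dominated by the Python loop over the pairs rather than by the size of xy_array.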
You can also use the np.array_split function provided by NumPy with the ranges you want:
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
