NumPy nearest value along axis of multidimensional array

NumPy nearest value along axis of multidimensional array - python

I'd like to create a function, that returns the nearest value in the array along a specified axis to a given value.
To get the index of the nearest value I use the following code where arr is a multidimensional array and value is the value to look for:
def nearest_index( arr, value, axis=None ):
return ( np.abs( arr - value ) ).argmin( axis=axis )
But I struggle with using the result of this function to get the values from the array.
It is easy with 1D-arrays:
In [14]: arr_1 = np.random.randint( 10, 100, size=( 10, ) )
In [15]: arr_1
Out[15]: array([67, 49, 90, 29, 60, 80, 31, 55, 29, 10])
In [16]: nearest_index( arr_1, 50 )
Out[16]: 1
In [17]: arr_1[nearest_index( arr_1, 50 )]
Out[17]: 49
or with flattened arrays:
In [25]: arr_3 = np.random.randint( 10, 100, size=( 2, 3, 4, ) )
In [26]: arr_3
Out[26]:
array([[[85, 51, 74, 79],
[63, 42, 27, 75],
[89, 68, 80, 63]],
[[85, 72, 74, 16],
[85, 22, 47, 78],
[44, 70, 98, 34]]])
In [27]: idx_flat = nearest_index( arr_3, 50, axis=None )
In [28]: idx_flat
Out[28]: 1
In [29]: idx = np.unravel_index( idx_flat, arr_3.shape )
In [30]: idx
Out[30]: (0, 0, 1)
In [31]: arr_3[idx]
Out[31]: 51
How can I create a function, that returns the values along the defined axis?
I tried the solution for this Question, but I only got it working for axis=-1.
Note that it is not an issue to me, if only the first occurance of the result is found if multiple elements in the array are equally near the expected value.

For a multi-dimensional array, we need to use advanced-indexing. So, for a generic n-dim array and with a specified axis, we could do something like this -
def argmin_values_along_axis(arr, value, axis):
argmin_idx = np.abs(arr - value).argmin(axis=axis)
shp = arr.shape
indx = list(np.ix_(*[np.arange(i) for i in shp]))
indx[axis] = np.expand_dims(argmin_idx, axis=axis)
return np.squeeze(arr[indx])
Sample runs -
In [203]: arr_3 = np.random.randint( 10, 100, size=( 2, 3, 4, ) )
In [204]: arr_3
Out[204]:
array([[[94, 55, 26, 51],
[82, 66, 80, 66],
[96, 54, 93, 57]],
[[59, 28, 95, 56],
[47, 48, 17, 77],
[15, 57, 57, 25]]])
In [205]: argmin_values_along_axis(arr_3, value=50, axis=0)
Out[205]:
array([[59, 55, 26, 51],
[47, 48, 80, 66],
[15, 54, 57, 57]])
In [206]: argmin_values_along_axis(arr_3, value=50, axis=1)
Out[206]:
array([[82, 54, 26, 51],
[47, 48, 57, 56]])
In [207]: argmin_values_along_axis(arr_3, value=50, axis=2)
Out[207]:
array([[51, 66, 54],
[56, 48, 57]])

Well it works for me.
def nearest_index(arr, value, axis=None):
return np.argmin(np.abs( arr - value ), axis=axis)
>>> X
array([[76, 94, 56, 93, 28, 0, 44, 50, 89, 93],
[80, 99, 29, 98, 39, 27, 55, 70, 19, 76],
[87, 7, 28, 78, 47, 95, 34, 97, 66, 27],
[75, 78, 82, 30, 15, 0, 2, 25, 58, 69],
[31, 2, 34, 1, 56, 7, 87, 78, 32, 77],
[89, 80, 76, 97, 49, 18, 62, 35, 94, 41],
[ 2, 44, 83, 3, 64, 4, 49, 93, 46, 8],
[51, 63, 45, 57, 77, 90, 93, 4, 26, 81],
[43, 92, 22, 98, 93, 36, 46, 25, 35, 36],
[30, 14, 42, 91, 86, 14, 78, 9, 37, 19]])
>>> X[nearest_index(X, 2, axis=0), np.arange(10)]
array([ 2, 2, 22, 1, 15, 0, 2, 4, 19, 8])
>>> X[np.arange(10), nearest_index(X, 2, axis=1)]
array([ 0, 19, 7, 2, 2, 18, 2, 4, 22, 9])

Related

Fancy indexing across multiple dimensions in numpy [duplicate]

I've got a strange situation.
I have a 2D Numpy array, x:
x = np.random.random_integers(0,5,(20,8))
And I have 2 indexers--one with indices for the rows, and one with indices for the column. In order to index X, I am having to do the following:
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,column_indices]
Instead of just:
x_new = x[row_indices,column_indices]
(which fails with: error, cannot broadcast (20,) with (2,))
I'd like to be able to do the indexing in one line using the broadcasting, since that would keep the code clean and readable...also, I don't know all that much about python under the hood, but as I understand it, it should be faster to do it in one line (and I'll be working with pretty big arrays).
Test Case:
x = np.random.random_integers(0,5,(20,8))
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,col_indices]
x_doesnt_work = x[row_indices,col_indices]

Selections or assignments with np.ix_ using indexing or boolean arrays/masks
1. With indexing-arrays
A. Selection
We can use np.ix_ to get a tuple of indexing arrays that are broadcastable against each other to result in a higher-dimensional combinations of indices. So, when that tuple is used for indexing into the input array, would give us the same higher-dimensional array. Hence, to make a selection based on two 1D indexing arrays, it would be -
x_indexed = x[np.ix_(row_indices,col_indices)]
B. Assignment
We can use the same notation for assigning scalar or a broadcastable array into those indexed positions. Hence, the following works for assignments -
x[np.ix_(row_indices,col_indices)] = # scalar or broadcastable array
2. With masks
We can also use boolean arrays/masks with np.ix_, similar to how indexing arrays are used. This can be used again to select a block off the input array and also for assignments into it.
A. Selection
Thus, with row_mask and col_mask boolean arrays as the masks for row and column selections respectively, we can use the following for selections -
x[np.ix_(row_mask,col_mask)]
B. Assignment
And the following works for assignments -
x[np.ix_(row_mask,col_mask)] = # scalar or broadcastable array
Sample Runs
1. Using np.ix_ with indexing-arrays
Input array and indexing arrays -
In [221]: x
Out[221]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
In [222]: row_indices
Out[222]: [4, 2, 5, 4, 1]
In [223]: col_indices
Out[223]: [1, 2]
Tuple of indexing arrays with np.ix_ -
In [224]: np.ix_(row_indices,col_indices) # Broadcasting of indices
Out[224]:
(array([[4],
[2],
[5],
[4],
[1]]), array([[1, 2]]))
Make selections -
In [225]: x[np.ix_(row_indices,col_indices)]
Out[225]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As suggested by OP, this is in effect same as performing old-school broadcasting with a 2D array version of row_indices that has its elements/indices sent to axis=0 and thus creating a singleton dimension at axis=1 and thus allowing broadcasting with col_indices. Thus, we would have an alternative solution like so -
In [227]: x[np.asarray(row_indices)[:,None],col_indices]
Out[227]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As discussed earlier, for the assignments, we simply do so.
Row, col indexing arrays -
In [36]: row_indices = [1, 4]
In [37]: col_indices = [1, 3]
Make assignments with scalar -
In [38]: x[np.ix_(row_indices,col_indices)] = -1
In [39]: x
Out[39]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, -1, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -1, 56, -1, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [40]: rand_arr = -np.arange(4).reshape(2,2)
In [41]: x[np.ix_(row_indices,col_indices)] = rand_arr
In [42]: x
Out[42]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 0, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -2, 56, -3, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
2. Using np.ix_ with masks
Input array -
In [19]: x
Out[19]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Input row, col masks -
In [20]: row_mask = np.array([0,1,1,0,0,1,0],dtype=bool)
In [21]: col_mask = np.array([1,0,1,0,1,1,0,0],dtype=bool)
Make selections -
In [22]: x[np.ix_(row_mask,col_mask)]
Out[22]:
array([[88, 46, 44, 81],
[31, 47, 52, 15],
[74, 95, 81, 97]])
Make assignments with scalar -
In [23]: x[np.ix_(row_mask,col_mask)] = -1
In [24]: x
Out[24]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[-1, 92, -1, 67, -1, -1, 17, 67],
[-1, 70, -1, 90, -1, -1, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[-1, 46, -1, 27, -1, -1, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [25]: rand_arr = -np.arange(12).reshape(3,4)
In [26]: x[np.ix_(row_mask,col_mask)] = rand_arr
In [27]: x
Out[27]:
array([[ 17, 39, 88, 14, 73, 58, 17, 78],
[ 0, 92, -1, 67, -2, -3, 17, 67],
[ -4, 70, -5, 90, -6, -7, 24, 22],
[ 19, 59, 98, 19, 52, 95, 88, 65],
[ 85, 76, 56, 72, 43, 79, 53, 37],
[ -8, 46, -9, 27, -10, -11, 93, 69],
[ 49, 46, 12, 83, 15, 63, 20, 79]])

What about:
x[row_indices][:,col_indices]
For example,
x = np.random.random_integers(0,5,(5,5))
## array([[4, 3, 2, 5, 0],
## [0, 3, 1, 4, 2],
## [4, 2, 0, 0, 3],
## [4, 5, 5, 5, 0],
## [1, 1, 5, 0, 2]])
row_indices = [4,2]
col_indices = [1,2]
x[row_indices][:,col_indices]
## array([[1, 5],
## [2, 0]])

import numpy as np
x = np.random.random_integers(0,5,(4,4))
x
array([[5, 3, 3, 2],
[4, 3, 0, 0],
[1, 4, 5, 3],
[0, 4, 3, 4]])
# This indexes the elements 1,1 and 2,2 and 3,3
indexes = (np.array([1,2,3]),np.array([1,2,3]))
x[indexes]
# returns array([3, 5, 4])
Notice that numpy has very different rules depending on what kind of indexes you use. So indexing several elements should be by a tuple of np.ndarray (see indexing manual).
So you need only to convert your list to np.ndarray and it should work as expected.

I think you are trying to do one of the following (equlvalent) operations:
x_does_work = x[row_indices,:][:,col_indices]
x_does_work = x[:,col_indices][row_indices,:]
This will actually create a subset of x with only the selected rows, then select the columns from that, or vice versa in the second case. The first case can be thought of as
x_does_work = (x[row_indices,:])[:,col_indices]

Your first try would work if you write it with np.newaxis
x_new = x[row_indices[:, np.newaxis],column_indices]

How do I neatly select a numpy sub-array in 1 line? [duplicate]

I've got a strange situation.
I have a 2D Numpy array, x:
x = np.random.random_integers(0,5,(20,8))
And I have 2 indexers--one with indices for the rows, and one with indices for the column. In order to index X, I am having to do the following:
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,column_indices]
Instead of just:
x_new = x[row_indices,column_indices]
(which fails with: error, cannot broadcast (20,) with (2,))
I'd like to be able to do the indexing in one line using the broadcasting, since that would keep the code clean and readable...also, I don't know all that much about python under the hood, but as I understand it, it should be faster to do it in one line (and I'll be working with pretty big arrays).
Test Case:
x = np.random.random_integers(0,5,(20,8))
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,col_indices]
x_doesnt_work = x[row_indices,col_indices]

Selections or assignments with np.ix_ using indexing or boolean arrays/masks
1. With indexing-arrays
A. Selection
We can use np.ix_ to get a tuple of indexing arrays that are broadcastable against each other to result in a higher-dimensional combinations of indices. So, when that tuple is used for indexing into the input array, would give us the same higher-dimensional array. Hence, to make a selection based on two 1D indexing arrays, it would be -
x_indexed = x[np.ix_(row_indices,col_indices)]
B. Assignment
We can use the same notation for assigning scalar or a broadcastable array into those indexed positions. Hence, the following works for assignments -
x[np.ix_(row_indices,col_indices)] = # scalar or broadcastable array
2. With masks
We can also use boolean arrays/masks with np.ix_, similar to how indexing arrays are used. This can be used again to select a block off the input array and also for assignments into it.
A. Selection
Thus, with row_mask and col_mask boolean arrays as the masks for row and column selections respectively, we can use the following for selections -
x[np.ix_(row_mask,col_mask)]
B. Assignment
And the following works for assignments -
x[np.ix_(row_mask,col_mask)] = # scalar or broadcastable array
Sample Runs
1. Using np.ix_ with indexing-arrays
Input array and indexing arrays -
In [221]: x
Out[221]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
In [222]: row_indices
Out[222]: [4, 2, 5, 4, 1]
In [223]: col_indices
Out[223]: [1, 2]
Tuple of indexing arrays with np.ix_ -
In [224]: np.ix_(row_indices,col_indices) # Broadcasting of indices
Out[224]:
(array([[4],
[2],
[5],
[4],
[1]]), array([[1, 2]]))
Make selections -
In [225]: x[np.ix_(row_indices,col_indices)]
Out[225]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As suggested by OP, this is in effect same as performing old-school broadcasting with a 2D array version of row_indices that has its elements/indices sent to axis=0 and thus creating a singleton dimension at axis=1 and thus allowing broadcasting with col_indices. Thus, we would have an alternative solution like so -
In [227]: x[np.asarray(row_indices)[:,None],col_indices]
Out[227]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As discussed earlier, for the assignments, we simply do so.
Row, col indexing arrays -
In [36]: row_indices = [1, 4]
In [37]: col_indices = [1, 3]
Make assignments with scalar -
In [38]: x[np.ix_(row_indices,col_indices)] = -1
In [39]: x
Out[39]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, -1, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -1, 56, -1, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [40]: rand_arr = -np.arange(4).reshape(2,2)
In [41]: x[np.ix_(row_indices,col_indices)] = rand_arr
In [42]: x
Out[42]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 0, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -2, 56, -3, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
2. Using np.ix_ with masks
Input array -
In [19]: x
Out[19]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Input row, col masks -
In [20]: row_mask = np.array([0,1,1,0,0,1,0],dtype=bool)
In [21]: col_mask = np.array([1,0,1,0,1,1,0,0],dtype=bool)
Make selections -
In [22]: x[np.ix_(row_mask,col_mask)]
Out[22]:
array([[88, 46, 44, 81],
[31, 47, 52, 15],
[74, 95, 81, 97]])
Make assignments with scalar -
In [23]: x[np.ix_(row_mask,col_mask)] = -1
In [24]: x
Out[24]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[-1, 92, -1, 67, -1, -1, 17, 67],
[-1, 70, -1, 90, -1, -1, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[-1, 46, -1, 27, -1, -1, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [25]: rand_arr = -np.arange(12).reshape(3,4)
In [26]: x[np.ix_(row_mask,col_mask)] = rand_arr
In [27]: x
Out[27]:
array([[ 17, 39, 88, 14, 73, 58, 17, 78],
[ 0, 92, -1, 67, -2, -3, 17, 67],
[ -4, 70, -5, 90, -6, -7, 24, 22],
[ 19, 59, 98, 19, 52, 95, 88, 65],
[ 85, 76, 56, 72, 43, 79, 53, 37],
[ -8, 46, -9, 27, -10, -11, 93, 69],
[ 49, 46, 12, 83, 15, 63, 20, 79]])

What about:
x[row_indices][:,col_indices]
For example,
x = np.random.random_integers(0,5,(5,5))
## array([[4, 3, 2, 5, 0],
## [0, 3, 1, 4, 2],
## [4, 2, 0, 0, 3],
## [4, 5, 5, 5, 0],
## [1, 1, 5, 0, 2]])
row_indices = [4,2]
col_indices = [1,2]
x[row_indices][:,col_indices]
## array([[1, 5],
## [2, 0]])

import numpy as np
x = np.random.random_integers(0,5,(4,4))
x
array([[5, 3, 3, 2],
[4, 3, 0, 0],
[1, 4, 5, 3],
[0, 4, 3, 4]])
# This indexes the elements 1,1 and 2,2 and 3,3
indexes = (np.array([1,2,3]),np.array([1,2,3]))
x[indexes]
# returns array([3, 5, 4])
Notice that numpy has very different rules depending on what kind of indexes you use. So indexing several elements should be by a tuple of np.ndarray (see indexing manual).
So you need only to convert your list to np.ndarray and it should work as expected.

I think you are trying to do one of the following (equlvalent) operations:
x_does_work = x[row_indices,:][:,col_indices]
x_does_work = x[:,col_indices][row_indices,:]
This will actually create a subset of x with only the selected rows, then select the columns from that, or vice versa in the second case. The first case can be thought of as
x_does_work = (x[row_indices,:])[:,col_indices]

Your first try would work if you write it with np.newaxis
x_new = x[row_indices[:, np.newaxis],column_indices]

Indexing numpy.ndarrays periodically

I am trying to access (read/write) numpy.ndarrays periodically. In other words, if I have my_array with the shape of 10*10 and I use the access operator with the inputs:
my_arrray[10, 10] or acess_function(my_array, 10, 10)
I can have access to element
my_array[0, 0].
I want to have read/write ability at my returned element of periodically indexed array.
Can anyone how to do it without making a shifted copy of my original array?

I think this does what you want but I'm not sure whether there's something more elegant that exists. It's probably possible to write a general function for an Nd array but this does 2D only. As you said it uses modular arithmetic.
import numpy as np
def access(shape, ixr, ixc):
""" Returns a selection. """
return np.s_[ixr % shape[0], ixc % shape[1]]
arr = np.arange(100)
arr.shape = 10,10
arr[ access(arr.shape, 45, 87) ]
# 57
arr[access(arr.shape, 45, 87)] = 100
In [18]: arr
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [ 50, 51, 52, 53, 54, 55, 56, **100**, 58, 59],
# [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Edit - Generic nD version
def access(shape, *args):
if len(shape) != len(args):
error = 'Inconsistent number of dimemsions: {} & number of indices: {} in coords.'
raise IndexError( error.format(len(shape), len(args)))
res = []
for limit, ix in zip(shape, args):
res.append(ix % limit)
return tuple(res)
Usage/Test
a = np.arange(24)
a.shape = 2,3,4
a[access(a.shape, 5, 6, 7)]
# 15
a[access(a.shape, 5,6,7) ] = 100
a
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[ 12, 13, 14, 100],
# [ 16, 17, 18, 19],
# [ 20, 21, 22, 23]]])

Numpy: using entries of array to populate larger array with repeats [duplicate]

I've got a strange situation.
I have a 2D Numpy array, x:
x = np.random.random_integers(0,5,(20,8))
And I have 2 indexers--one with indices for the rows, and one with indices for the column. In order to index X, I am having to do the following:
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,column_indices]
Instead of just:
x_new = x[row_indices,column_indices]
(which fails with: error, cannot broadcast (20,) with (2,))
I'd like to be able to do the indexing in one line using the broadcasting, since that would keep the code clean and readable...also, I don't know all that much about python under the hood, but as I understand it, it should be faster to do it in one line (and I'll be working with pretty big arrays).
Test Case:
x = np.random.random_integers(0,5,(20,8))
row_indices = [4,2,18,16,7,19,4]
col_indices = [1,2]
x_rows = x[row_indices,:]
x_indexed = x_rows[:,col_indices]
x_doesnt_work = x[row_indices,col_indices]

Selections or assignments with np.ix_ using indexing or boolean arrays/masks
1. With indexing-arrays
A. Selection
We can use np.ix_ to get a tuple of indexing arrays that are broadcastable against each other to result in a higher-dimensional combinations of indices. So, when that tuple is used for indexing into the input array, would give us the same higher-dimensional array. Hence, to make a selection based on two 1D indexing arrays, it would be -
x_indexed = x[np.ix_(row_indices,col_indices)]
B. Assignment
We can use the same notation for assigning scalar or a broadcastable array into those indexed positions. Hence, the following works for assignments -
x[np.ix_(row_indices,col_indices)] = # scalar or broadcastable array
2. With masks
We can also use boolean arrays/masks with np.ix_, similar to how indexing arrays are used. This can be used again to select a block off the input array and also for assignments into it.
A. Selection
Thus, with row_mask and col_mask boolean arrays as the masks for row and column selections respectively, we can use the following for selections -
x[np.ix_(row_mask,col_mask)]
B. Assignment
And the following works for assignments -
x[np.ix_(row_mask,col_mask)] = # scalar or broadcastable array
Sample Runs
1. Using np.ix_ with indexing-arrays
Input array and indexing arrays -
In [221]: x
Out[221]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
In [222]: row_indices
Out[222]: [4, 2, 5, 4, 1]
In [223]: col_indices
Out[223]: [1, 2]
Tuple of indexing arrays with np.ix_ -
In [224]: np.ix_(row_indices,col_indices) # Broadcasting of indices
Out[224]:
(array([[4],
[2],
[5],
[4],
[1]]), array([[1, 2]]))
Make selections -
In [225]: x[np.ix_(row_indices,col_indices)]
Out[225]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As suggested by OP, this is in effect same as performing old-school broadcasting with a 2D array version of row_indices that has its elements/indices sent to axis=0 and thus creating a singleton dimension at axis=1 and thus allowing broadcasting with col_indices. Thus, we would have an alternative solution like so -
In [227]: x[np.asarray(row_indices)[:,None],col_indices]
Out[227]:
array([[76, 56],
[70, 47],
[46, 95],
[76, 56],
[92, 46]])
As discussed earlier, for the assignments, we simply do so.
Row, col indexing arrays -
In [36]: row_indices = [1, 4]
In [37]: col_indices = [1, 3]
Make assignments with scalar -
In [38]: x[np.ix_(row_indices,col_indices)] = -1
In [39]: x
Out[39]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, -1, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -1, 56, -1, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [40]: rand_arr = -np.arange(4).reshape(2,2)
In [41]: x[np.ix_(row_indices,col_indices)] = rand_arr
In [42]: x
Out[42]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 0, 46, -1, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, -2, 56, -3, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
2. Using np.ix_ with masks
Input array -
In [19]: x
Out[19]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[88, 92, 46, 67, 44, 81, 17, 67],
[31, 70, 47, 90, 52, 15, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[74, 46, 95, 27, 81, 97, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Input row, col masks -
In [20]: row_mask = np.array([0,1,1,0,0,1,0],dtype=bool)
In [21]: col_mask = np.array([1,0,1,0,1,1,0,0],dtype=bool)
Make selections -
In [22]: x[np.ix_(row_mask,col_mask)]
Out[22]:
array([[88, 46, 44, 81],
[31, 47, 52, 15],
[74, 95, 81, 97]])
Make assignments with scalar -
In [23]: x[np.ix_(row_mask,col_mask)] = -1
In [24]: x
Out[24]:
array([[17, 39, 88, 14, 73, 58, 17, 78],
[-1, 92, -1, 67, -1, -1, 17, 67],
[-1, 70, -1, 90, -1, -1, 24, 22],
[19, 59, 98, 19, 52, 95, 88, 65],
[85, 76, 56, 72, 43, 79, 53, 37],
[-1, 46, -1, 27, -1, -1, 93, 69],
[49, 46, 12, 83, 15, 63, 20, 79]])
Make assignments with 2D block(broadcastable array) -
In [25]: rand_arr = -np.arange(12).reshape(3,4)
In [26]: x[np.ix_(row_mask,col_mask)] = rand_arr
In [27]: x
Out[27]:
array([[ 17, 39, 88, 14, 73, 58, 17, 78],
[ 0, 92, -1, 67, -2, -3, 17, 67],
[ -4, 70, -5, 90, -6, -7, 24, 22],
[ 19, 59, 98, 19, 52, 95, 88, 65],
[ 85, 76, 56, 72, 43, 79, 53, 37],
[ -8, 46, -9, 27, -10, -11, 93, 69],
[ 49, 46, 12, 83, 15, 63, 20, 79]])

What about:
x[row_indices][:,col_indices]
For example,
x = np.random.random_integers(0,5,(5,5))
## array([[4, 3, 2, 5, 0],
## [0, 3, 1, 4, 2],
## [4, 2, 0, 0, 3],
## [4, 5, 5, 5, 0],
## [1, 1, 5, 0, 2]])
row_indices = [4,2]
col_indices = [1,2]
x[row_indices][:,col_indices]
## array([[1, 5],
## [2, 0]])

import numpy as np
x = np.random.random_integers(0,5,(4,4))
x
array([[5, 3, 3, 2],
[4, 3, 0, 0],
[1, 4, 5, 3],
[0, 4, 3, 4]])
# This indexes the elements 1,1 and 2,2 and 3,3
indexes = (np.array([1,2,3]),np.array([1,2,3]))
x[indexes]
# returns array([3, 5, 4])
Notice that numpy has very different rules depending on what kind of indexes you use. So indexing several elements should be by a tuple of np.ndarray (see indexing manual).
So you need only to convert your list to np.ndarray and it should work as expected.

I think you are trying to do one of the following (equlvalent) operations:
x_does_work = x[row_indices,:][:,col_indices]
x_does_work = x[:,col_indices][row_indices,:]
This will actually create a subset of x with only the selected rows, then select the columns from that, or vice versa in the second case. The first case can be thought of as
x_does_work = (x[row_indices,:])[:,col_indices]

Your first try would work if you write it with np.newaxis
x_new = x[row_indices[:, np.newaxis],column_indices]

Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution basing on hpaulj's answer, see code at the bottom of this post)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each of the subdivisions has max_chunk_size elements in total.
Since I need to subdivide many arrays of same shapes and then perform operations on the corresponding chunks, it doesn't actually operate on the data rather than creates an array of "indexers", i. e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve subdivisions by calling the_array[indexer[i]] (see examples below).
Also, the array of these indexers has same number of dimensions as input and divisions are aligned along corresponding axes, i. e. blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjusent along the 0-axis, etc.
I was expecting that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]] and that the_array[indexer] would return just the_array, however such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean)
type
Is there a simple way around this error?
Here's the code:
import numpy as np
import itertools
def subdivide(shape, max_chunk_size=500000):
shape = np.array(shape).astype(float)
total_size = shape.prod()
# calculate maximum slice shape:
slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
# create a list of slices for each dimension:
slices = [[slice(left, min(right, n)) \
for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
for n, step_size in zip(shape.astype(int), slice_shape)]
result = np.empty(reduce(lambda a,b:a*len(b), slices, 1), dtype=np.object)
for i, el in enumerate(itertools.product(*slices)): result[i] = el
result.shape = np.ceil(shape / slice_shape).astype(int)
return result
Here's an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
Here's a solution based on hpaulj's answer:
import numpy as np
import itertools
class Subdivision():
def __init__(self, shape, max_chunk_size=500000):
shape = np.array(shape).astype(float)
total_size = shape.prod()
# calculate maximum slice shape:
slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
# create a list of slices for each dimension:
slices = [[slice(left, min(right, n)) \
for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
for n, step_size in zip(shape.astype(int), slice_shape)]
self.slices = \
np.array(list(itertools.product(*slices)), \
dtype=np.object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))
def __getitem__(self, args):
if type(args) != tuple: args = (args,)
# turn integer index into equivalent slice
args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
# select the slices
# always select all elements from the last axis (which contains slices for each data dimension)
slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
[0] * (len(slices.shape) - 2 - i) + [i])])] \
for i in range(len(slices.shape) - 1)))
Example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])

Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My numpy version expects me to turn the subslice into a tuple. This is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2 element tuple - of slices, lists, arrays, or integers. It won't work with an array of slice objects.
How ever it is possible to concatenate the results into a larger array. That can be done after indexing or the slices can be converted into indexing lists.
np.bmat for example concatenates together a 2d arangement of arrays:
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is np.matrix but can be converted back to array.
The other approach is to use tools like np.arange, np.r_, np.xi_ to create index arrays. It'll take some playing around to generate an example.
To combine the [0,0] and [0,1] subslices:
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

NumPy nearest value along axis of multidimensional array - python

Related

Fancy indexing across multiple dimensions in numpy [duplicate]

How do I neatly select a numpy sub-array in 1 line? [duplicate]

Indexing numpy.ndarrays periodically

Numpy: using entries of array to populate larger array with repeats [duplicate]

Indexing a numpy array using a numpy array of slices

Categories

Resources