How does integer-array indexing work in numpy?

I am not able to understand integer array indexing in numpy.
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
Please explain what is happening here.

x[[0,1,2],[0,1,0]]
[0,1,2] <- here you specify which rows (sub-arrays) you will be using
[0,1,0] <- here you choose one element from each of the specified rows
So element 0 from row 0, element 1 from row 1, and so on.

In [76]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [77]: x
Out[77]:
array([[1, 2],
[3, 4],
[5, 6]])
Because the 1st and 2nd indexing lists match in size, their values are paired up to select elements from x. I'll illustrate it with list indexing:
In [78]: x[[0, 1, 2], [0, 1, 0]]
Out[78]: array([1, 4, 5])
In [79]: list(zip([0, 1, 2], [0, 1, 0]))
Out[79]: [(0, 0), (1, 1), (2, 0)]
In [80]: [x[i,j] for i,j in zip([0, 1, 2], [0, 1, 0])]
Out[80]: [1, 4, 5]
Or more explicitly, it is returning x[0,0], x[1,1] and x[2,0], as a 1d array. Another way to think of it is that you've picked elements [0,1,0] from the 3 rows (respectively).

I find it easiest to understand as follows:
In [179]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [180]: x
Out[180]:
array([[1, 2],
[3, 4],
[5, 6]])
Say we want to select 1, 4, and 5 from this matrix. These are the 0th column of the 0th row, the 1st column of the 1st row, and the 0th column of the 2nd row. Now index with two arrays (one for each dimension of the matrix), populating them with the rows and then the columns we are interested in:
In [181]: rows = np.array([0, 1, 2])
In [182]: cols = np.array([0, 1, 0])
In [183]: x[rows, cols]
Out[183]: array([1, 4, 5])
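As a small sketch of the pairing described above (the names paired, explicit, corner_rows and corner_cols are only illustrative), the snippet below checks that x[rows, cols] matches an explicit loop over (row, col) pairs. It also shows that the result takes the shape of the index arrays, so 2-D index arrays give a 2-D result.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 0])

# One (row, col) pair per position in the index arrays.
paired = x[rows, cols]
explicit = np.array([x[r, c] for r, c in zip(rows, cols)])

# The result has the shape of the index arrays, so 2-D index arrays
# produce a 2-D result (here: the four corner elements of x).
corner_rows = np.array([[0, 0], [2, 2]])
corner_cols = np.array([[0, 1], [0, 1]])
corners = x[corner_rows, corner_cols]

print(paired)    # same values as `explicit`
print(corners)
```

Here corners picks x[0,0], x[0,1], x[2,0] and x[2,1], arranged in the same 2x2 layout as the index arrays.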

Related

Removing values from a 3D array of indices

I have a 3D array of indices generated by np.argsort along the 0-th axis, so that each column along that axis is a sorting index. I want to drop some values from this array, say 0. Of course, I can remove the 0-th slice and then sort again, but I need to repeat this many times, removing different values each time, so I would like to know whether there is a more efficient way to generate the array. I think this problem is the same as shifting NaN values along axis=0 to the end.
Example
Consider the following 3D array of sorting indices. Notice that along axis=0 the array has unique values.
arr = np.array(
    [[[0, 0],
      [1, 2]],
     [[1, 2],
      [0, 1]],
     [[2, 1],
      [2, 0]]]
)
Suppose I would like to remove the value 0 from it. The result would look like
array([[[1, 2],
        [1, 2]],
       [[2, 1],
        [2, 1]]])
What I've tried
I tried removing the values using np.where and then reshaping the array, but the result differs from the expected array.
>>> arr[np.where(arr != 0)]
array([1, 2, 1, 2, 1, 2, 1, 2])
>>> arr[np.where(arr != 0)].reshape(-1, 2, 2)
array([[[1, 2],
        [1, 2]],
       [[1, 2],
        [1, 2]]])
Explanation of output
Consider arr[:, 1, 0] = [1, 0, 2]. After dropping 0, the new column is [1, 2]. Therefore new_arr[:, 1, 0] = [1, 2].
I just realized that you can specify the axis order in np.transpose, and I came up with a solution with that.
Solution
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> new_arr = np.transpose(arr_dp_rs, (2, 0 ,1))
>>> new_arr
array([[[1, 2],
        [1, 2]],
       [[2, 1],
        [2, 1]]])
Explanation
We first transpose arr so that the 0-th axis becomes the innermost axis. This ensures that, after subsetting, the values that belonged to the same column along the 0-th axis stay next to each other.
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_t
array([[[0, 1, 2],
        [0, 2, 1]],
       [[1, 0, 2],
        [2, 1, 0]]])
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp
array([1, 2, 2, 1, 1, 2, 2, 1])
Now the values are in the desired order, but they are grouped along what will be the last axis. We reshape the flat array and then transpose the axes back.
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> arr_dp_rs
array([[[1, 2],
        [2, 1]],
       [[1, 2],
        [2, 1]]])
>>> new_arr = np.transpose(arr_dp_rs, (2, 0, 1))
>>> new_arr
array([[[1, 2],
        [1, 2]],
       [[2, 1],
        [2, 1]]])
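The transpose, mask and reshape steps can be wrapped in one function. The sketch below is only one way to write it: drop_value_along_axis0 is a hypothetical name, np.moveaxis stands in for the explicit np.transpose calls, and it assumes every column along axis 0 contains the dropped value the same number of times (otherwise the reshape fails).

```python
import numpy as np

def drop_value_along_axis0(arr, value):
    """Drop `value` from every axis-0 column (hypothetical helper).

    Assumes each column along axis 0 contains `value` equally often.
    """
    moved = np.moveaxis(arr, 0, -1)        # axis 0 becomes the innermost axis
    kept = moved[moved != value]           # flat array, still grouped by column
    kept = kept.reshape(moved.shape[:-1] + (-1,))
    return np.moveaxis(kept, -1, 0)        # restore the original axis order

arr = np.array([[[0, 0], [1, 2]],
                [[1, 2], [0, 1]],
                [[2, 1], [2, 0]]])
new_arr = drop_value_along_axis0(arr, 0)
print(new_arr)
```

Because np.moveaxis does not hard-code the number of dimensions, the same function should also work for arrays that are not 3D, under the same assumption.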

Interpretation of counts for `numpy.unique` when applied on a matrix

numpy.unique has an optional argument return_counts. From the docs:
return_counts bool, optional If True, also return the number of times
each unique item appears in ar.
New in version 1.9.0.
This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
])
When I apply np.unique:
np.unique(m_sample, axis=1, return_counts=True)
(array([[1, 1, 2],
[2, 2, 2],
[3, 3, 3],
[1, 5, 4]]), array([1, 1, 1]))
I'm not really sure what the returned matrix here represents, much less so the counts array. Is this perhaps a bug in numpy (or maybe a case the developer did not consider)? Am I misunderstanding how to use the parameters in this case?
When you specify an axis, np.unique returns the unique subarrays indexed along this axis. To see this better, assume that one of the rows repeats:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
[1, 2, 1]
])
In such case np.unique(m_sample, axis=0, return_counts=True) gives:
(array([[1, 2, 1],
[1, 4, 5],
[2, 2, 2],
[3, 3, 3]]),
array([2, 1, 1, 1]))
The first element of this tuple lists the unique rows of the array, and the second gives how many times each row appears in the array. In this example, the row [1, 2, 1] appears twice.
To get unique values in each row you can try, for example, the following:
import numpy as np
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5]
])
s = np.sort(m_sample, axis=1)
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
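If the per-row counts are needed as well, one simple option is to call np.unique on each row. This is a plain Python loop rather than a vectorized solution, shown here only as a sketch; per_row is an illustrative name.

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# One (unique_values, counts) pair per row of the matrix.
per_row = [np.unique(row, return_counts=True) for row in m_sample]

for values, counts in per_row:
    print(values, counts)
```

The rows generally have different numbers of unique values, so the result is a list of array pairs rather than a single rectangular array.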

numpy.where , numpy.take and indices on multidimensional arrays

I've got a 2-d array that looks like this:
my_array
array([[6, 1, 4],
[4, 8, 4],
[6, 3, 5]])
After using np.where I get a tuple with two arrays, one for each dimension:
indices = np.where(my_array > 4)
(array([0, 1, 2, 2], dtype=int64),
array([0, 1, 0, 2], dtype=int64))
My questions are:
Is there a way to turn these arrays into an iterable of paired indices without looping over them with a for loop?
What I would like to get is a list of tuples with pairs of indices that I could use directly on the array to get the elements one at a time.
So in this case
paired_indices = [ (0,0), (1,1), (2,0), (2,2) ]
my_array[paired_indices[0]]
6
my_array[paired_indices[1]]
8
How can I use numpy.take(my_array, indices) to get the proper elements when the arrays are multidimensional? It works fine on a 1-dimensional array, but so far I couldn't figure out how to deal with more dimensions.
When I use:
a = np.take(my_array, indices)
the result is:
array([[6, 1, 4, 4],
[6, 1, 6, 4]])
How was this result calculated, and how should I interpret it?
In [13]: arr = np.array([[6, 1, 4],
...: [4, 8, 4],
...: [6, 3, 5]])
In [14]: idx = np.nonzero(arr>4)
In [15]: idx
Out[15]: (array([0, 1, 2, 2]), array([0, 1, 0, 2]))
This tuple can be used directly to index the array, the result being a 1d array of the >4 values:
In [16]: arr[idx]
Out[16]: array([6, 8, 6, 5])
It can also be used to modify values:
In [17]: arr1 = arr.copy()
In [18]: arr1[idx] = 10
In [19]: arr1
Out[19]:
array([[10, 1, 4],
[ 4, 10, 4],
[10, 3, 10]])
You can get an array of index pairs with:
In [20]: np.argwhere(arr>4)
Out[20]:
array([[0, 0],
[1, 1],
[2, 0],
[2, 2]])
argwhere just applies transpose to the result of where (nonzero):
In [21]: np.transpose(idx)
Out[21]:
array([[0, 0],
[1, 1],
[2, 0],
[2, 2]])
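To get the list of tuples asked for in the question, each row of the argwhere result can be converted to a tuple. This matters because indexing with a tuple selects a single element, while indexing with a 1d array such as np.array([0, 0]) would select whole rows instead. A minimal sketch, reusing the question's name paired_indices:

```python
import numpy as np

arr = np.array([[6, 1, 4],
                [4, 8, 4],
                [6, 3, 5]])

# Each row of argwhere is an index pair; a tuple can index one element directly.
paired_indices = [tuple(pair) for pair in np.argwhere(arr > 4)]

first = arr[paired_indices[0]]
second = arr[paired_indices[1]]
print(paired_indices, first, second)
```

The list comprehension is itself a Python-level loop, so this is for convenience rather than speed.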
arr[idx] is the equivalent of:
In [22]: arr[idx[0],idx[1]]
Out[22]: array([6, 8, 6, 5])
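np.take indexes along just one axis at a time (or, when no axis is given, into the flattened array), so it isn't very useful in this context. That is why np.take(my_array, indices) returned a (2, 4) array: each index was looked up in the flattened my_array. The indices in idx are meant to be used together, not individually.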
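The sketch below reproduces the result from the question by indexing the flattened array directly. It also shows one way to make np.take return the expected elements: convert the index pairs to flat positions with np.ravel_multi_index first. The variable names are illustrative.

```python
import numpy as np

arr = np.array([[6, 1, 4],
                [4, 8, 4],
                [6, 3, 5]])
idx = np.nonzero(arr > 4)

# With no axis given, take looks every index up in the flattened array.
taken = np.take(arr, idx)
same_as_flat = arr.ravel()[np.asarray(idx)]

# Converting the (row, col) pairs to flat positions gives the paired elements.
flat_idx = np.ravel_multi_index(idx, arr.shape)
values = np.take(arr, flat_idx)

print(taken)
print(flat_idx, values)
```

In practice arr[idx] is the simpler route; the flat-index version is mainly useful for seeing what take does.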

argmax for multidimensional array along some axis

I have a multidimensional array that looks like this:
my_array = np.arange(2)[:,None,None] *np.arange(4)[:, None]*np.arange(8)
I am looking for a multidimensional equivalent of the 2-D argmax.
In particular, I am looking for the argmax of the maxima along axis = 2. I tried reshaping first, but reshaping destroys the original index information of the array, so it probably won't work. I have no idea how to do this and would appreciate any help. Thank you in advance.
EDIT: The desired output is:
[(0,0,0),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5),(1,3,6),(1,3,7)]
This is exactly the array of the indices of the maxima along axis = 2.
For finding such argmax indices along the last axis of a 3D ndarray, we can use something along these lines -
In [66]: idx = my_array.reshape(-1,my_array.shape[-1]).argmax(0)
In [67]: r,c = np.unravel_index(idx,my_array.shape[:-1])
In [68]: l = np.arange(len(idx))
In [69]: np.c_[r,c,l]
Out[69]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
To extend this to a generic ndarray -
In [99]: R = np.unravel_index(idx,my_array.shape[:-1])
In [104]: np.hstack((np.c_[R],l[:,None]))
Out[104]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
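The steps above can be collected into one function. This is a sketch only: argmax_over_leading_axes is a hypothetical name, and np.column_stack is used in place of np.c_ and np.hstack. The final lines check that the returned indices really point at the maxima.

```python
import numpy as np

def argmax_over_leading_axes(a):
    """For each position on the last axis, return the full index of the
    maximum over all other axes (hypothetical helper)."""
    flat = a.reshape(-1, a.shape[-1])          # collapse the leading axes
    idx = flat.argmax(axis=0)                  # flat position of each maximum
    lead = np.unravel_index(idx, a.shape[:-1]) # back to per-axis indices
    last = np.arange(a.shape[-1])
    return np.column_stack(lead + (last,))

my_array = np.arange(2)[:, None, None] * np.arange(4)[:, None] * np.arange(8)
result = argmax_over_leading_axes(my_array)
print(result)

# Each row of `result` indexes the maximum for that last-axis position.
maxima = my_array[tuple(result.T)]
```

Ties are resolved the way argmax resolves them, by taking the first occurrence, which is why the all-zero position 0 maps to (0, 0, 0).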

Sampling unique column indexes for each row of a numpy array

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fix the required number of columns to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a non-loop NumPy-based solution. I tried choice, but with replace=False I get the error
ValueError: Cannot take a larger sample than population when 'replace=False'
Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    return np.random.rand(m, n).argsort(1)[:, :N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
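The same argsort idea can be written with NumPy's Generator API, which makes the result reproducible through a seed. This is a sketch under assumptions: sample_columns_per_row and its seed parameter are hypothetical names, not part of the answer above. It works because argsort of each row of random numbers is a random permutation of the column indexes, so the first k entries are distinct.

```python
import numpy as np

def sample_columns_per_row(shape, k, seed=None):
    """Pick k distinct column indexes for every row (hypothetical helper)."""
    m, n = shape
    if k > n:
        raise ValueError("k cannot exceed the number of columns")
    rng = np.random.default_rng(seed)
    # Each row of argsort is a random permutation of 0..n-1.
    return rng.random((m, n)).argsort(axis=1)[:, :k]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
picked = sample_columns_per_row(A.shape, k=2, seed=0)
print(picked)
```

The selected columns can then be read from A with np.take_along_axis(A, picked, axis=1).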
Like this?
B = np.random.randint(5, size=(len(A), 2))
You can use np.random.choice() as follows:
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
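Note that np.random.randint, and np.random.choice with its default replace=True, sample with replacement, so a row may contain the same column index more than once. The sketch below is one way to check a result for that; rows_with_duplicates is a hypothetical helper, and the sample array is made up for illustration.

```python
import numpy as np

def rows_with_duplicates(indices):
    """Return a boolean array marking rows that repeat an index
    (hypothetical helper)."""
    s = np.sort(indices, axis=1)
    # After sorting, equal values are adjacent within a row.
    return (s[:, 1:] == s[:, :-1]).any(axis=1)

# Made-up example: the second row picks column 1 twice.
sample = np.array([[0, 2],
                   [1, 1],
                   [3, 1]])
flags = rows_with_duplicates(sample)
print(flags)
```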
As an experimental approach, here is a way that will give unique indices in 99% of cases:
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
Slicing does not raise an error here: if fewer than 4 unique rows are found, return ind[index][:4] simply returns fewer than 4 rows. In that case you can call the function again until you get the desired result.
