Removing values from a 3D array of indices

I have a 3D array of indices generated by np.argsort along the 0-th axis, so that each column holds a sorting index. I want to drop some values from this array, say 0. Of course, I could remove the 0-th slice and then sort again, but I need to repeat this many times, removing different values each time, so I would like to see if there is a more efficient way to generate the array. I think this problem is the same as shifting a NaN value along axis=0 to the end.
Example
Consider the following 3D array of sorting indices. Notice that along axis=0 the array has unique values.
arr = np.array(
[[[0, 0],
[1, 2]],
[[1, 2],
[0, 1]],
[[2, 1],
[2, 0]]]
)
Suppose I would like to remove the value 0 from it. The result would look like
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
What I've tried
I tried removing the values using np.where and then reshaping the array, but the result differs from the expected array.
>>> arr[np.where(arr != 0)]
array([1, 2, 1, 2, 1, 2, 1, 2])
>>> arr[np.where(arr != 0)].reshape(-1, 2, 2)
array([[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]]])
Explanation of output
Consider arr[:, 1, 0] = [1, 0, 2]. After dropping 0, the new column is [1, 2]. Therefore new_arr[:, 1, 0] = [1, 2].

I just realized that you can specify the axis order in np.transpose, and I came up with a solution with that.
Solution
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> new_arr = np.transpose(arr_dp_rs, (2, 0 ,1))
>>> new_arr
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
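The four steps above could be wrapped in a small helper. This is only a sketch: the name drop_value is hypothetical, np.moveaxis stands in for the explicit np.transpose calls, and it assumes the value occurs exactly once in every column along axis 0 (true for np.argsort output), otherwise the reshape is not valid.

```python
import numpy as np

def drop_value(arr, value):
    """Drop `value` from every column along axis 0 (hypothetical helper)."""
    # Move axis 0 to the end so each column is contiguous in C order.
    moved = np.moveaxis(arr, 0, -1)
    # Boolean masking flattens the result, keeping the order within each column.
    kept = moved[moved != value]
    # Assumption: exactly one element was removed from every column.
    kept = kept.reshape(*moved.shape[:-1], arr.shape[0] - 1)
    # Put the shortened axis back in front.
    return np.moveaxis(kept, -1, 0)

arr = np.array(
    [[[0, 0], [1, 2]],
     [[1, 2], [0, 1]],
     [[2, 1], [2, 0]]]
)
new_arr = drop_value(arr, 0)
```

Because the helper only needs a mask and a reshape, calling it repeatedly with different values avoids re-sorting the original data.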
Explanation
We first transpose arr so that the 0-th axis becomes the innermost axis. This ensures that, after subsetting, the values are still ordered as they were along the 0-th axis.
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_t
array([[[0, 1, 2],
[0, 2, 1]],
[[1, 0, 2],
[2, 1, 0]]])
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp
array([1, 2, 2, 1, 1, 2, 2, 1])
Now the values are in the desired order, but they run along the last axis instead of the 0-th axis, so we reshape the array and then swap the axes back.
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> arr_dp_rs
array([[[1, 2],
[2, 1]],
[[1, 2],
[2, 1]]])
>>> new_arr = np.transpose(arr_dp_rs, (2, 0, 1))
>>> new_arr
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
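The question's idea of "shifting the value to the end" can also be sketched directly, without transposing. This is an alternative to the solution above, not the same method: a stable argsort of the boolean mask moves the unwanted entries to the end of each column while keeping the relative order of the rest, and the last slice is then cut off. It assumes, again, that the value occurs exactly once per column.

```python
import numpy as np

arr = np.array(
    [[[0, 0], [1, 2]],
     [[1, 2], [0, 1]],
     [[2, 1], [2, 0]]]
)

# True where the unwanted value sits; False sorts before True.
mask = arr == 0
# A stable sort keeps the original order among the kept (False) entries.
order = np.argsort(mask, axis=0, kind="stable")
# Reorder each column, then drop the last slice, which now holds the zeros.
shifted = np.take_along_axis(arr, order, axis=0)
new_arr = shifted[:-1]
```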

Related

Numpy: Why advance indexing of > 2 dimensional arrays with a list and numpy array give different results

I have 3-dimensional numpy array
a = np.array([
[
[1, 3],
[0, 2]
],
[
[2, 1],
[4, 2]
]
], dtype=np.int32)
And I want to get the elements using the indices [0, 1, 1], and [1, 0, 1]
which I expect to give me:
[2, 1]
If I index with list, it returns the result I wanted, but if I index with Numpy array, it gives different results, why is that?
>>> indices = [[0, 1], [1, 0], [1, 1]]
>>> indices_arr = np.array(indices, dtype=np.int32)
>>> a[indices]
# OUTPUT
# array([2, 1], dtype=int32)
>>> a[indices_arr]
# OUTPUT
'''
array([[[[1, 3],
[0, 2]],
[[2, 1],
[4, 2]]],
[[[2, 1],
[4, 2]],
[[1, 3],
[0, 2]]],
[[[2, 1],
[4, 2]],
[[2, 1],
[4, 2]]]], dtype=int32)
'''

In a recent numpy version, indexing with the list indices gives a warning:
In [46]: a[indices_arr].shape
Out[46]: (3, 2, 2, 2)
In [47]: a[indices]
C:\Users\paul\AppData\Local\Temp\ipykernel_8668\2035022355.py:1:
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated;
use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted
as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
a[indices]
Out[47]: array([2, 1])
Doing the tuple conversion myself:
In [48]: a[tuple(indices)]
Out[48]: array([2, 1])
Your a[indices_arr] output is the "different result" it warns about.
With the tuple, we are applying each sublist to a separate dimension of a:
In [49]: a[[0, 1], [1, 0], [1, 1]]
Out[49]: array([2, 1])
With indices_arr, the array is applied to just the first dimension, so the leading dimension of size 2 is replaced by the index array's shape (3, 2), and the result has shape (3, 2, 2, 2).
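The two interpretations can be compared side by side. The sketch below deliberately avoids a[indices] with a bare list, since (as the warning says) its meaning depends on the numpy version; it only uses the explicit tuple and array forms.

```python
import numpy as np

a = np.array([[[1, 3], [0, 2]],
              [[2, 1], [4, 2]]], dtype=np.int32)
indices = [[0, 1], [1, 0], [1, 1]]
indices_arr = np.array(indices, dtype=np.int32)

# Tuple: one index list per dimension, so this picks a[0, 1, 1] and a[1, 0, 1].
per_dim = a[tuple(indices)]

# Single array: indexes only the first dimension, so each of the 3 x 2 entries
# selects a whole 2 x 2 block of a.
first_dim_only = a[indices_arr]

# Iterating over an array yields its rows, so this is again one index per dimension.
per_dim_from_arr = a[tuple(indices_arr)]
```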

Interpretation of counts for `numpy.unique` when applied on a matrix

numpy.unique has an optional argument return_counts. From the docs:
return_counts bool, optional If True, also return the number of times
each unique item appears in ar.
New in version 1.9.0.
This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
])
When I apply np.unique:
np.unique(m_sample, axis=1, return_counts=True)
(array([[1, 1, 2],
[2, 2, 2],
[3, 3, 3],
[1, 5, 4]]), array([1, 1, 1]))
I'm not really sure what the returned matrix here represents, much less so the counts array. Is this perhaps a bug in numpy (or maybe a case the developer did not consider)? Am I misunderstanding how to use the parameters in this case?

When you specify an axis, np.unique returns the unique subarrays indexed along this axis. To see this better, assume that one of the rows repeats:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
[1, 2, 1]
])
In such case np.unique(m_sample, axis=0, return_counts=True) gives:
(array([[1, 2, 1],
[1, 4, 5],
[2, 2, 2],
[3, 3, 3]]),
array([2, 1, 1, 1]))
The first element of this tuple lists the unique rows of the array, and the second gives how many times each row appears in the array. In this example, the row [1, 2, 1] appears twice.
To get unique values in each row you can try, for example, the following:
import numpy as np
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5]
])
s = np.sort(m_sample, axis=1)
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
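If the per-row counts are needed as well, a simpler (though not vectorized) sketch is to call np.unique on each row separately. This is an alternative to the mask-and-split approach above and may be slower for matrices with many rows.

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# One (values, counts) pair per row; rows may have different numbers of
# unique values, so the result is a list rather than a single array.
per_row = [np.unique(row, return_counts=True) for row in m_sample]

for values, counts in per_row:
    print(values, counts)
```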

Looping through multi-dimensional array and filtering based on a condition

I believe similar questions have been asked but none which deal with the problem I am facing.
I have an array of shape (H,W,L). I must loop through each layer of the array and find the (x, y) locations of the values that meet a particular criterion (say val > t_r and val < t_c), and I must repeat this for each of the L layers.
For example, if we have an array of shape (2,3,4), listed here layer by layer (so the literal itself has shape (4,2,3)):
A = [[[1,2,3,], [3,4,5,]],
[[6,7,8],[1,4,5]],
[[5,7,7],[9,4,3]],
[[1,2,4],[4,6,7]]]
Suppose the first criterion is val > 2 and the second criterion is val < 6, and I want to store the matching locations in an N x 3 array, where the first 2 values are the 'row' and 'col' and the last one corresponds to the layer / third dimension.
Then the expected output of the operation should be something like:
output = [[0,2,0],[1,0,0],[1,1,0],[1,2,0]....] where these entries correspond to the values filtered from A[:,:,0]
One approach I have thought of is using 3 for loops (i, j, k) to loop over each of the elements, but I am unable to figure out the exact implementation. I would also like to vectorize wherever possible. I could use some guidance.

You may use np.nonzero and vectorize your comparisons.
a = np.asarray(A)
res = np.vstack(np.nonzero((a>2)&(a<6))).T
array([[0, 0, 2],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
[1, 1, 1],
[1, 1, 2],
[2, 0, 0],
[2, 1, 1],
[2, 1, 2],
[3, 0, 2],
[3, 1, 0]], dtype=int64)
You can always reorder the columns to your liking e.g.:
res[:, [1,2,0]]
array([[0, 2, 0],
[1, 0, 0],
[1, 1, 0],
[1, 2, 0],
[1, 1, 1],
[1, 2, 1],
[0, 0, 2],
[1, 1, 2],
[1, 2, 2],
[0, 2, 3],
[1, 0, 3]], dtype=int64)
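As a side note, np.argwhere returns the same N x 3 index array in one call. The sketch below checks that it agrees with the np.nonzero version and then reorders the columns to (row, col, layer) as above.

```python
import numpy as np

A = [[[1, 2, 3], [3, 4, 5]],
     [[6, 7, 8], [1, 4, 5]],
     [[5, 7, 7], [9, 4, 3]],
     [[1, 2, 4], [4, 6, 7]]]
a = np.asarray(A)

# Combine both conditions element-wise; the parentheses are required
# because & binds more tightly than the comparisons.
cond = (a > 2) & (a < 6)

res = np.vstack(np.nonzero(cond)).T   # version from the answer
res_argwhere = np.argwhere(cond)      # equivalent single call

# Columns are (layer, row, col); reorder to (row, col, layer).
row_col_layer = res_argwhere[:, [1, 2, 0]]
```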

How does integer-array indexing work in numpy?

I am not able to understand integer array indexing in numpy.
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
Please explain what is happening here.

x[[0,1,2],[0,1,0]]
[0,1,2] <- here you specify which arrays you will be using
[0,1,0] <- here you choose elements from each of specified arrays
So you get element 0 from array (row) 0, element 1 from array 1, and so on.

In [76]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [77]: x
Out[77]:
array([[1, 2],
[3, 4],
[5, 6]])
Because the 1st and 2nd indexing lists match in size, their values are paired up to select elements from x. I'll illustrate it with list indexing:
In [78]: x[[0, 1, 2], [0, 1, 0]]
Out[78]: array([1, 4, 5])
In [79]: list(zip([0, 1, 2], [0, 1, 0]))
Out[79]: [(0, 0), (1, 1), (2, 0)]
In [80]: [x[i,j] for i,j in zip([0, 1, 2], [0, 1, 0])]
Out[80]: [1, 4, 5]
Or more explicitly, it is returning x[0,0], x[1,1] and x[2,0] as a 1d array. Another way to think of it is that you've picked the [0,1,0] elements from the 3 rows (respectively).

I find it easiest to understand as follows:
In [179]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [180]: x
Out[180]:
array([[1, 2],
[3, 4],
[5, 6]])
Say we want to select 1, 4, and 5 from this matrix. That is, column 0 of row 0, column 1 of row 1, and column 0 of row 2. Now provide the index with two arrays (one for each dimension of the matrix), where we populate these arrays with the rows and then the columns we are interested in:
In [181]: rows = np.array([0, 1, 2])
In [182]: cols = np.array([0, 1, 0])
In [183]: x[rows, cols]
Out[183]: array([1, 4, 5])
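To contrast this pairing behaviour with selecting a full sub-block, here is a small sketch. np.ix_ is not used in the answers above; it is shown only to make clear that paired index arrays pick individual elements, not the cross product of rows and columns.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = np.array([0, 2])
cols = np.array([0, 1])

# Paired: picks x[0, 0] and x[2, 1], one element per (row, col) pair.
paired = x[rows, cols]

# Cross product: every listed row combined with every listed column.
block = x[np.ix_(rows, cols)]

# The same cross product via broadcasting a column of row indices.
block_broadcast = x[rows[:, None], cols]
```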

np.choose not giving desired result after broadcasting

I would like to pick the nth elements, as specified in maxsuit, from suitCounts. I broadcast the maxsuit array, so I do get a result, but not the desired one. Any suggestions about what I'm doing conceptually wrong are appreciated. I don't understand the result of np.choose(self.maxsuit[:,:,None]-1, self.suitCounts), which is not what I'm looking for.
>>> self.maxsuit
Out[38]:
array([[3, 3],
[1, 1],
[1, 1]], dtype=int64)
>>> self.maxsuit[:,:,None]-1
Out[33]:
array([[[2],
[2]],
[[0],
[0]],
[[0],
[0]]], dtype=int64)
>>> self.suitCounts
Out[34]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
>>> np.choose(self.maxsuit[:,:,None]-1, self.suitCounts)
Out[35]:
array([[[2, 2, 0, 0],
[1, 1, 1, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]]])
The desired result would be:
[[3,3],[4,3],[2,1]]

You could use advanced-indexing for a broadcasted way to index into the array, like so -
In [415]: val # Data array
Out[415]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
In [416]: idx # Indexing array
Out[416]:
array([[3, 3],
[1, 1],
[1, 1]])
In [417]: m,n = val.shape[:2]
In [418]: val[np.arange(m)[:,None],np.arange(n),idx-1]
Out[418]:
array([[3, 3],
[4, 3],
[2, 1]])
A bit cleaner way with np.ogrid to use open range arrays -
In [424]: d0,d1 = np.ogrid[:m,:n]
In [425]: val[d0,d1,idx-1]
Out[425]:
array([[3, 3],
[4, 3],
[2, 1]])
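Another option, not mentioned in the answer above, is np.take_along_axis (available in NumPy 1.15 and later), which picks one entry per position along a given axis. A minimal sketch using the same val and idx:

```python
import numpy as np

val = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                [[4, 1, 2, 0], [3, 0, 3, 0]],
                [[2, 2, 0, 0], [1, 1, 1, 0]]])
idx = np.array([[3, 3], [1, 1], [1, 1]])

# idx is 1-based, so subtract 1; the index array needs the same number of
# dimensions as val, hence the trailing axis of length 1.
picked = np.take_along_axis(val, (idx - 1)[..., None], axis=2)

# Drop the trailing length-1 axis to get a (3, 2) result.
result = picked[..., 0]
```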

This is the best I can do with choose
In [23]: np.choose([[1,2,0],[1,2,0]], suitcounts[:,:,:3])
Out[23]:
array([[4, 2, 3],
[3, 1, 3]])
choose prefers that we use a list of arrays, rather than a single one. It's supposed to prevent misuse. So the problem could be written as:
In [24]: np.choose([[1,2,0],[1,2,0]], [suitcounts[0,:,:3], suitcounts[1,:,:3], suitcounts[2,:,:3]])
Out[24]:
array([[4, 2, 3],
[3, 1, 3]])
The idea is to select items from the 3 subarrays, based on an index array like:
In [25]: np.array([[1,2,0],[1,2,0]])
Out[25]:
array([[1, 2, 0],
[1, 2, 0]])
The output will match the indexing array in shape. The choice arrays have to match in shape as well, hence my use of [...,:3].
Values for the first column are selected from suitcounts[1,:,:3], for the 2nd column from suitcounts[2...] etc.
choose is limited to 32 choices; this is a limitation imposed by the broadcasting mechanism.
Speaking of broadcasting I could simplify the expression
In [26]: np.choose([1,2,0], suitcounts[:,:,:3])
Out[26]:
array([[4, 2, 3],
[3, 1, 3]])
This broadcasts [1,2,0] to match the 2x3 shape of the subarrays.
I could get the target order by reordering the columns:
In [27]: np.choose([0,1,2], suitcounts[:,:,[2,0,1]])
Out[27]:
array([[3, 4, 2],
[3, 3, 1]])
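To make the selection rule concrete: for a 2D index array, np.choose(idx, choices)[j, k] is choices[idx[j, k]][j, k]. The sketch below checks this against plain advanced indexing; the data is the suitCounts array from the question, under the lowercase name suitcounts used in this answer.

```python
import numpy as np

suitcounts = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                       [[4, 1, 2, 0], [3, 0, 3, 0]],
                       [[2, 2, 0, 0], [1, 1, 1, 0]]])
choices = suitcounts[:, :, :3]            # three 2 x 3 choice arrays
idx = np.array([[1, 2, 0], [1, 2, 0]])    # which choice array to use per cell

via_choose = np.choose(idx, choices)

# Same selection with advanced indexing: the first index picks the choice
# array, the open grids supply the (row, col) position inside it.
rows, cols = np.ogrid[:2, :3]
via_indexing = choices[idx, rows, cols]
```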
