Interpretation of counts for `numpy.unique` when applied on a matrix - python

numpy.unique has an optional argument return_counts. From the docs:
return_counts : bool, optional
    If True, also return the number of times each unique item appears in ar.
New in version 1.9.0.
This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
])
When I apply np.unique:
np.unique(m_sample, axis=1, return_counts=True)
(array([[1, 1, 2],
[2, 2, 2],
[3, 3, 3],
[1, 5, 4]]), array([1, 1, 1]))
I'm not really sure what the returned matrix here represents, much less so the counts array. Is this perhaps a bug in numpy (or maybe a case the developer did not consider)? Am I misunderstanding how to use the parameters in this case?

When you specify an axis, np.unique returns the unique subarrays indexed along that axis: unique rows for axis=0, unique columns for axis=1. To see this better, assume that one of the rows repeats:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
[1, 2, 1]
])
In that case, np.unique(m_sample, axis=0, return_counts=True) gives:
(array([[1, 2, 1],
[1, 4, 5],
[2, 2, 2],
[3, 3, 3]]),
array([2, 1, 1, 1]))
The first element of this tuple lists the unique rows of the array (in sorted order), and the second gives how many times each of those rows appears in the array. In this example, the row [1, 2, 1] appears twice.
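To connect this back to the axis=1 call from the question, here is a small sketch that runs both variants on the original 4x3 matrix. With axis=1 every column is treated as one item, so the returned matrix holds the three columns in sorted order, and the counts say that each column occurs once. The variable names are only for illustration.

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# axis=1: each column is one item; the unique columns come back sorted
cols, col_counts = np.unique(m_sample, axis=1, return_counts=True)

# axis=0: each row is one item; the unique rows come back sorted
rows, row_counts = np.unique(m_sample, axis=0, return_counts=True)

print(cols, col_counts)
print(rows, row_counts)
```

So the output in the question is not a bug: it is the set of distinct columns, none of which repeats.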
To get unique values in each row you can try, for example, the following:
import numpy as np
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5]
])
s = np.sort(m_sample, axis=1)
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
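The snippet above returns only the unique values. If the per-row counts are needed as well, a simpler (though not vectorized) option is to call np.unique on each row separately. This is a minimal sketch, assuming the matrix is small enough that a Python-level loop over rows is acceptable; per_row is just an illustrative name.

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# One (values, counts) pair per row; rows may have different numbers of uniques
per_row = [np.unique(row, return_counts=True) for row in m_sample]

for values, counts in per_row:
    print(values, counts)
```

The result has to be a list rather than a 2-D array because the rows can contain different numbers of distinct values.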

Related

Removing values from a 3D array of indices

I have a 3D array of indices generated from np.argsort, sorted along the 0-th axis, so that each column holds the sorting indices. However, I want to drop some values from this array, say 0. Of course, I could remove the 0-th slice and then sort again, but I need to repeat this many times, removing different values each time, so I would like to know whether there is a more efficient way to generate the array. I think this problem is the same as shifting a NaN value along axis=0 to the end.
Example
Consider the following 3D array of sorting indices. Notice that along axis=0 the array has unique values.
arr = np.array(
[[[0, 0],
[1, 2]],
[[1, 2],
[0, 1]],
[[2, 1],
[2, 0]]]
)
Suppose I would like to remove the value 0 from it. The result would look like
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
What I've tried
I tried removing the values using np.where and then reshaping the array, but the result differs from the expected array.
>>> arr[np.where(arr != 0)]
array([1, 2, 1, 2, 1, 2, 1, 2])
>>> arr[np.where(arr != 0)].reshape(-1, 2, 2)
array([[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]]])
Explanation of output
Consider arr[:, 1, 0] = [1, 0, 2]. After dropping 0, the new array is [1, 2]. Therefore new_arr[:, 1, 0] = [1, 2].
I just realized that you can specify the axis order in np.transpose, and I came up with a solution with that.
Solution
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> new_arr = np.transpose(arr_dp_rs, (2, 0, 1))
>>> new_arr
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
Explanation
We first transpose arr so that the 0-th axis becomes the innermost axis. This ensures that, after subsetting, the remaining values keep their order along the original 0-th axis.
>>> arr_t = np.transpose(arr, (1, 2, 0))
>>> arr_t
array([[[0, 1, 2],
[0, 2, 1]],
[[1, 0, 2],
[2, 1, 0]]])
>>> arr_dp = arr_t[arr_t != 0]
>>> arr_dp
array([1, 2, 2, 1, 1, 2, 2, 1])
Now the values are in the desired order, but they run along the last axis instead of the 0-th axis, so we reshape the result and then swap the axes back.
>>> arr_dp_rs = arr_dp.reshape(arr.shape[1], arr.shape[2], -1)
>>> arr_dp_rs
array([[[1, 2],
[2, 1]],
[[1, 2],
[2, 1]]])
>>> new_arr = np.transpose(arr_dp_rs, (2, 0, 1))
>>> new_arr
array([[[1, 2],
[1, 2]],
[[2, 1],
[2, 1]]])
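Since this has to be repeated for different values, the same steps could be wrapped in a small helper. The following is only a sketch: drop_value_axis0 is a made-up name, and it assumes that the value to drop occurs exactly once in every column along axis 0 (which holds for argsort output), otherwise the reshape fails or mixes columns. It uses np.moveaxis instead of an explicit transpose so it is not tied to 3D input.

```python
import numpy as np

def drop_value_axis0(arr, value):
    """Remove `value` from every column along axis 0 (hypothetical helper).

    Assumes `value` occurs exactly once in each column along axis 0.
    """
    # Move axis 0 to the end so each column is contiguous in C order
    moved = np.moveaxis(arr, 0, -1)
    # Boolean indexing flattens in C order, keeping each column's order
    kept = moved[moved != value]
    # Restore the leading axes; the last axis is now one element shorter
    kept = kept.reshape(*moved.shape[:-1], -1)
    # Move the shortened axis back to position 0
    return np.moveaxis(kept, -1, 0)

arr = np.array(
    [[[0, 0],
      [1, 2]],
     [[1, 2],
      [0, 1]],
     [[2, 1],
      [2, 0]]]
)
new_arr = drop_value_axis0(arr, 0)
print(new_arr)
```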

Subsampling 3D array using the neighbourhood sum

The title is probably confusing. I have a reasonably large 3D numpy array. I'd like to cut its size by a factor of 2^3 by binning blocks of size (2,2,2). Each element in the new 3D array should then contain the sum of the elements in its respective block in the original array.
As an example, consider a 4x4x4 array:
input = [[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]],
[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]],
... ]]]
(I'm only representing half of it to save space). Notice that all the elements with the same value constitute a (2x2x2) block. The output should be a 2x2x2 array such that each element is the sum of a block:
output = [[[8, 16],
[24, 32]],
... ]]]
So 8 is the sum of all 1's, 16 is the sum of the 2's, and so on.
scikit-image provides a function for such block-wise reductions, skimage.measure.block_reduce -
In [36]: a
Out[36]:
array([[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]],
[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]]])
In [37]: from skimage.measure import block_reduce
In [39]: block_reduce(a, block_size=(2,2,2), func=np.sum)
Out[39]:
array([[[ 8, 16],
[24, 32]]])
Other reduction functions work as well, say a max-reduction -
In [40]: block_reduce(a, block_size=(2,2,2), func=np.max)
Out[40]:
array([[[1, 2],
[3, 4]]])
Implementing such a function isn't that difficult with NumPy tools and could be done like so -
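import numpy as np

def block_reduce_numpy(a, block_size, func):
    # Assumes every axis length is divisible by its block size
    shp = a.shape
    # Split each axis of length i into (i // j, j): block index, offset in block
    new_shp = np.hstack([(i // j, j) for (i, j) in zip(shp, block_size)])
    # The within-block axes are the odd ones: 1, 3, 5, ...
    select_axes = tuple(np.arange(a.ndim) * 2 + 1)
    return func(a.reshape(new_shp), axis=select_axes)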
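To see what the reshape trick does, here is the same idea written out by hand for a 4x4x4 array. The array below is an illustrative stand-in, not the asker's full input (which was only partly shown): it simply repeats the displayed 4x4 layer four times. Each axis of length 4 is split into (2 blocks, 2 elements per block), and the sum runs over the three within-block axes.

```python
import numpy as np

layer = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [3, 3, 4, 4],
                  [3, 3, 4, 4]])

# Illustrative input: the same layer stacked four times, shape (4, 4, 4)
a = np.stack([layer] * 4)

# Axes after reshape: (block0, in0, block1, in1, block2, in2)
blocks = a.reshape(2, 2, 2, 2, 2, 2)

# Sum over the within-block axes to get one value per (2, 2, 2) block
out = blocks.sum(axis=(1, 3, 5))
print(out)
```

The reshape creates no copy for a contiguous array, so the cost is essentially that of a single sum over the data.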

argmax for multidimensional array along some axis

I have a multidimensional array that looks like this:
my_array = np.arange(2)[:,None,None] *np.arange(4)[:, None]*np.arange(8)
I am looking for a multidimensional equivalent of the 2-D argmax.
In particular, I am looking for the argmax of the maxima along axis = 2. I tried reshaping first, but reshaping destroys the original index information of the array, so it probably won't work. I have no idea how to do this and would appreciate any help. Thank you in advance.
EDIT: Desired output:
[(0,0,0),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5),(1,3,6),(1,3,7)]
This is exactly the array of indices of the maxima along axis = 2.
To find, for each position along the last axis of a 3D ndarray, the index of the maximum over the remaining axes, we can use something along these lines -
In [66]: idx = my_array.reshape(-1,my_array.shape[-1]).argmax(0)
In [67]: r,c = np.unravel_index(idx,my_array.shape[:-1])
In [68]: l = np.arange(len(idx))
In [69]: np.c_[r,c,l]
Out[69]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
To extend this to a generic ndarray -
In [99]: R = np.unravel_index(idx,my_array.shape[:-1])
In [104]: np.hstack((np.c_[R],l[:,None]))
Out[104]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
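Put together as a plain script, the approach above might look like the sketch below. It also checks the result by reading the array back at the computed indices and comparing with the maximum taken over the two leading axes. The names flat, lead, last and indices are only illustrative.

```python
import numpy as np

my_array = np.arange(2)[:, None, None] * np.arange(4)[:, None] * np.arange(8)

# Collapse all leading axes into one, keep the last axis: shape (8, 8)
flat = my_array.reshape(-1, my_array.shape[-1])

# For each position on the last axis, flat index of the maximum
idx = flat.argmax(axis=0)

# Convert the flat indices back to indices on the leading axes
lead = np.unravel_index(idx, my_array.shape[:-1])
last = np.arange(my_array.shape[-1])

# One row per position on the last axis: (i, j, k)
indices = np.column_stack(lead + (last,))
print(indices)

# Sanity check: values at those indices are the maxima over axes 0 and 1
assert np.array_equal(my_array[tuple(indices.T)], my_array.max(axis=(0, 1)))
```

Note that argmax returns the first occurrence when there are ties, which is why the all-zero slice at k = 0 yields (0, 0, 0).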

How to count the number of 1D arrays in a 2D array (Python)?

If I have a numpy 2D array, say:
a = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3]]
How do I count the number of instances of [1, 2, 3] in a? (The answer I'm looking for is 2 in this case)
Since you said it's a numpy array, rather than a list, you can do something like:
>>> a = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3]])
>>> sum((a == [1,2,3]).all(1))
2
(a == [1,2,3]).all(1) gives you a boolean array that is True for every row in which all the values match [1,2,3]: array([ True, False, False,  True]). The sum of that array is the number of True values in it.
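The intermediate steps can be made visible like this. It is just a sketch of the same idea, with np.count_nonzero swapped in for the built-in sum; target, elementwise and row_matches are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3]])
target = [1, 2, 3]

# Broadcasting compares every row with target element by element: shape (4, 3)
elementwise = (a == target)

# A row matches only if all of its elements match: shape (4,)
row_matches = elementwise.all(axis=1)

# Count the matching rows
count = np.count_nonzero(row_matches)
print(row_matches, count)
```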
If you want the counts of all the arrays you could use unique:
import numpy as np
a = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3]])
uniques, counts = np.unique(a, return_counts=True, axis=0)
print([(unique, count) for unique, count in zip(uniques, counts)])
Output
[(array([1, 2, 3]), 2), (array([2, 3, 4]), 1), (array([3, 4, 5]), 1)]

How to sort in descending order with numpy?

I have a numpy array like this:
A = array([[1, 3, 2, 7],
[2, 4, 1, 3],
[6, 1, 2, 3]])
I would like to sort the rows of this matrix in descending order and get the arguments of the sorted matrix like this:
As = array([[3, 1, 2, 0],
[1, 3, 0, 2],
[0, 3, 2, 1]])
I did the following:
import numpy
A = numpy.array([[1, 3, 2, 7], [2, 4, 1, 3], [6, 1, 2, 3]])
As = numpy.argsort(A, axis=1)
But this gives me the sorting in ascending order. After spending some time looking for a solution on the internet, I expected the argsort function from numpy to have an argument that reverses the sort order. But apparently there is no such argument! Why!?
There is an argument called order. I tried, by guessing, numpy.argsort(..., order=reverse) but it does not work.
I looked for a solution in previous questions here and I found that I can do:
import numpy
A = numpy.array([[1, 3, 2, 7], [2, 4, 1, 3], [6, 1, 2, 3]])
As = numpy.argsort(A, axis=1)
As = As[::-1]
For some reason, As = As[::-1] does not give me the desired output.
Well, I guess it must be simple but I am missing something.
How can I sort a numpy array in descending order?
Just negate your matrix (multiply it by -1), so that ascending order of the negated values corresponds to descending order of the original values:
[In]: A = np.array([[1, 3, 2, 7],
[2, 4, 1, 3],
[6, 1, 2, 3]])
[In]: print( np.argsort(-A) )
[Out]: [[3 1 2 0]
[1 3 0 2]
[0 3 2 1]]
