NumPy 2D array boolean indexing with each axis - python

I created a 2D array and indexed it with two boolean index arrays: the first for axis 0, the second for axis 1.
I expected this to select the values where True on one axis crosses True on the other, as in Pandas, but that is not the result.
How does the code below work? I would also like a link to the official NumPy documentation that describes this behaviour.
Thanks in advance.
a = np.arange(9).reshape(3,3)
a
----------------------------
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
a[ [True, False, True], [True, False, True] ]
--------------------------
array([0, 8])
My expectation is [0, 6, 2, 8].
(I know how to get the result that I expect.)

In [20]: a = np.arange(9).reshape(3,3)
If the lists are passed to np.ix_, the result is two arrays that broadcast against each other and can be used to index the desired block:
In [21]: np.ix_([True, False, True], [True, False, True] )
Out[21]:
(array([[0],
[2]]),
array([[0, 2]]))
In [22]: a[_]
Out[22]:
array([[0, 2],
[6, 8]])
This isn't 1d, but can be easily raveled.
Trying to make equivalent boolean arrays does not work:
In [23]: a[[[True], [False], [True]], [True, False, True]]
Traceback (most recent call last):
File "<ipython-input-23-26bc93cfc53a>", line 1, in <module>
a[[[True], [False], [True]], [True, False, True]]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
A boolean index must be either 1d, or nd with a shape matching the target, here (3,3). Such a full mask can be built by broadcasting:
In [26]: np.array([True, False, True])[:,None]& np.array([True, False, True])
Out[26]:
array([[ True, False, True],
[False, False, False],
[ True, False, True]])
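On the "how does it work" part of the question: as far as I understand the boolean array indexing section of the NumPy indexing guide, each boolean index array behaves like the integer positions of its True entries (as if nonzero() had been called on it), and the resulting integer arrays are then paired element by element rather than crossed. A minimal sketch of that equivalence, using the same a as above (the names rows, cols, r and c are mine):

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rows = np.array([True, False, True])
cols = np.array([True, False, True])

# Each boolean index behaves like the integer positions of its True entries.
r = np.nonzero(rows)[0]   # array([0, 2])
c = np.nonzero(cols)[0]   # array([0, 2])

# Integer index arrays are paired element-wise: (0, 0) and (2, 2).
paired = a[r, c]
same_as_bool = a[rows, cols]

# np.ix_ reshapes the indices so that they broadcast to the full cross product.
block = a[np.ix_(rows, cols)]

print(paired)        # [0 8]
print(same_as_bool)  # [0 8]
print(block)         # [[0 2]
                     #  [6 8]]
```

Because the indices are paired, two masks with different numbers of True values (say two and three) cannot be broadcast together and raise an IndexError instead of returning a block.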

What you want is two consecutive indexing steps, first on the rows and then on the columns: a[[True, False, True]][:,[True, False, True]]
a = np.arange(9).reshape(3,3)
x = [True, False, True]
y = [True, False, True]
a[x][:,y]
As a flat array (column-major order):
a[[True, False, True]][:,[True, False, True]].flatten(order='F')
output: array([0, 6, 2, 8])
Alternative
NB. this requires NumPy arrays rather than lists, because & and [:,None] are not defined for lists
a = np.arange(9).reshape(3,3)
x = np.array([True, False, True])
y = np.array([True, False, True])
a.T[x&y[:,None]]
output: array([0, 6, 2, 8])
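A small sketch, under the same inputs, checking that the chained indexing, np.ix_, and the 2-D mask on the transpose all agree. The variable names chained, via_ix, flat_f and flat_mask are mine.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
x = np.array([True, False, True])   # rows to keep
y = np.array([True, False, True])   # columns to keep

chained = a[x][:, y]                 # row selection, then column selection
via_ix = a[np.ix_(x, y)]             # same block in a single indexing step

# Column-major flattening gives the order asked for in the question.
flat_f = chained.flatten(order='F')

# Full 2-D mask built by broadcasting, applied to the transpose:
# mask[i, j] is True when column i and row j of `a` are both kept.
flat_mask = a.T[y[:, None] & x]

print(chained)    # [[0 2]
                  #  [6 8]]
print(flat_f)     # [0 6 2 8]
print(flat_mask)  # [0 6 2 8]
```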

Related

Histogramming boolean numpy arrays by giving each array a unique label

I have a large sample (M) of boolean arrays of length 'N'. So there are 2^N unique boolean arrays possible.
I would like to know how many arrays are duplicates of each other and create a histogram.
One possibility is to create a unique integer (a[0] + a[1]*2 + a[2]*4 + ... + a[N-1]*2^(N-1)) for each unique array and histogram that integer.
But this is going to be O(M*N). What is the best way to do this?
numpy.ravel_multi_index is able to do this for you:
arr = np.array([[True, True, True],
[True, False, True],
[True, False, True],
[False, False, False],
[True, False, True]], dtype=int)
nums = np.ravel_multi_index(arr.T, (2,) * arr.shape[1])
>>> nums
array([7, 5, 5, 0, 5], dtype=int64)
And since you need a histogram, use
>>> np.histogram(nums, bins=np.arange(2**arr.shape[1]+1))
(array([1, 0, 0, 0, 0, 3, 0, 1], dtype=int64),
array([0, 1, 2, 3, 4, 5, 6, 7, 8]))
Another option is to use np.unique:
>>> np.unique(arr, return_counts=True, axis=0)
(array([[0, 0, 0],
[1, 0, 1],
[1, 1, 1]]),
array([1, 3, 1], dtype=int64))
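The integer-key idea from the question can also be written as one matrix-vector product followed by np.bincount. This is only a sketch: it still reads every element once, and it assumes N is small enough that 2**N bins fit in memory and the keys fit in a 64-bit integer. The names weights, keys and counts are mine.

```python
import numpy as np

arr = np.array([[True, True, True],
                [True, False, True],
                [True, False, True],
                [False, False, False],
                [True, False, True]])

n = arr.shape[1]
# Most significant bit first, so [True, False, True] -> 0b101 == 5.
weights = 1 << np.arange(n - 1, -1, -1)
keys = arr.astype(np.int64) @ weights

# counts[k] is the number of rows whose key equals k.
counts = np.bincount(keys, minlength=2 ** n)

print(keys)    # [7 5 5 0 5]
print(counts)  # [1 0 0 0 0 3 0 1]
```

For large N, where most of the 2**N bins would be empty, np.unique(keys, return_counts=True) avoids allocating the full histogram.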
With vectorized operations, creating the key is much faster than evaluating a[0] + a[1]*2 + a[2]*4 + ... + a[N-1]*2^(N-1) term by term. I don't think there is a fundamentally better solution, though: in any case you need to read each value at least once, and that requires M*N steps.
import numpy as np

N = 3
M = 5
sample = [
    np.array([True, True, True]),
    np.array([True, False, True]),
    np.array([True, False, True]),
    np.array([False, False, False]),
    np.array([True, False, True]),
]
# Powers of two, most significant bit first: [4, 2, 1] for N = 3
multipliers = [2<<i for i in range(N-2, -1, -1)]+[1]
buckets = {}   # key -> number of occurrences
buck2vec = {}  # key -> one representative array
for s in sample:
    key = sum(s*multipliers)
    if key not in buckets:
        buckets[key] = 0
        buck2vec[key] = s
    buckets[key]+=1
# Sort the keys so the output order does not depend on insertion order
for key in sorted(buckets):
    print(f"{buck2vec[key]} -> {buckets[key]} occurrence(s)")
Results:
[False False False] -> 1 occurrence(s)
[ True False  True] -> 3 occurrence(s)
[ True  True  True] -> 1 occurrence(s)

Unexpected masked element in numpy's isin() with masked arrays. Bug?

Using numpy and the following masked arrays
import numpy.ma as ma
a = ma.MaskedArray([[1,2,3],[4,5,6]], [[True,False,False],[False,False,False]])
ta = ma.array([1,4,5])
>>> a
masked_array(
data=[[--, 2, 3],
[4, 5, 6]],
mask=[[ True, False, False],
[False, False, False]],
fill_value=999999)
>>> ta
masked_array(data=[1, 4, 5],
mask=False,
fill_value=999999)
To check for each element of a whether it is in ta, I use
ma.isin(a, ta)
This command gives
masked_array(
data=[[False, False, False],
[True, True, --]],
mask=[[False, False, False],
[False, False, True]],
fill_value=True)
Why is the last element in the result masked? Neither of the input arrays is masked at that position.
Using the standard numpy version produces the expected result for that element:
>>> import numpy as np
>>> np.isin(a, ta)
array([[ True, False, False],
[ True, True, False]])
Here, however, the very first element is True because the mask of a was ignored.
Tested with Python 3.9.4 and numpy 1.20.3.
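I can't tell from the output alone why ma.isin masks that element. As a workaround sketch (not an explanation of the behaviour), one can run np.isin on the underlying data, using only the unmasked values of ta, and then reattach the mask of a explicitly. The helper name masked_isin is hypothetical and not part of NumPy.

```python
import numpy as np
import numpy.ma as ma

def masked_isin(element, test_elements):
    """Hypothetical helper: membership test that keeps the mask of `element`."""
    element = ma.asarray(element)
    # Use only the valid (unmasked) values of the test set.
    valid_tests = ma.asarray(test_elements).compressed()
    hits = np.isin(element.data, valid_tests)
    # Reattach the original mask so masked inputs stay masked in the result.
    return ma.array(hits, mask=ma.getmaskarray(element))

a = ma.MaskedArray([[1, 2, 3], [4, 5, 6]],
                   [[True, False, False], [False, False, False]])
ta = ma.array([1, 4, 5])

result = masked_isin(a, ta)
print(result)
```

With this approach the result is masked exactly where a is masked, and the unmasked entries agree with np.isin.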

Boolean indices of extended slice

I have the following numpy array:
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I can use extended slicing to select e.g. columns:
>>> a[:,0::2]
array([[1, 3],
[4, 6],
[7, 9]])
>>> a[:,1::2]
array([[2],
[5],
[8]])
But I want to produce the following:
array([[True, False, True],
[True, False, True],
[True, False, True]])
array([[False, True, False],
[False, True, False],
[False, True, False]])
import numpy as np
bools = np.array([[False, False, False],
[False, False, False],
[False, False, False]])
bools[:, 0::2] = True
print(bools)
Output:
[[ True False True]
[ True False True]
[ True False True]]
np.array([[True if y%2==0 else False for y,z in enumerate(x)] for x in bools])
np.array([[False if y%2==0 else True for y,z in enumerate(x)] for x in bools])
Explanation:
By using a list comprehension, the variable 'x' iterates over each row of bools (which has the same shape as a). The inner comprehension iterates over the elements of that row. In your desired output, the True and False values depend on the index of each element rather than on its value. Hence enumerate() gives the index of each element in 'y' and its value in 'z', and the condition on 'y' decides whether to emit True or False.
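Since the pattern depends only on the column index, it can also be built without a Python-level loop. A minimal sketch, assuming the same a as in the question (the names even_cols, mask_even and mask_odd are mine):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Boolean pattern along the columns: True at even column indices.
even_cols = np.arange(a.shape[1]) % 2 == 0

# Repeat that pattern for every row. broadcast_to returns a read-only view,
# so copy it if the mask needs to be modified later.
mask_even = np.broadcast_to(even_cols, a.shape).copy()
mask_odd = ~mask_even

print(a[mask_even])  # [1 3 4 6 7 9]
print(a[mask_odd])   # [2 5 8]
```

The two masks correspond to the slices a[:, 0::2] and a[:, 1::2] respectively.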

Mask from max values in numpy array, specific axis

Input example:
I have a numpy array, e.g.
a=np.array([[0,1], [2, 1], [4, 8]])
Desired output:
I would like to produce a mask array with the max value along a given axis, in my case axis 1, being True and all others being False. e.g. in this case
mask = np.array([[False, True], [True, False], [False, True]])
Attempt:
I have tried approaches using np.amax but this returns the max values in a flattened list:
>>> np.amax(a, axis=1)
array([1, 2, 8])
and np.argmax similarly returns the indices of the max values along that axis.
>>> np.argmax(a, axis=1)
array([1, 0, 1])
I could iterate over this in some way but once these arrays become bigger I want the solution to remain something native in numpy.
Method #1
Using broadcasting, we can use comparison against the max values, while keeping dims to facilitate broadcasting -
a.max(axis=1,keepdims=1) == a
Sample run -
In [83]: a
Out[83]:
array([[0, 1],
[2, 1],
[4, 8]])
In [84]: a.max(axis=1,keepdims=1) == a
Out[84]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Method #2
Alternatively, compare the argmax indices, via broadcasting, against the range of indices along the columns -
In [92]: a.argmax(axis=1)[:,None] == range(a.shape[1])
Out[92]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Method #3
To finish off the set, and if we are looking for performance, use initialization and then advanced indexing -
out = np.zeros(a.shape, dtype=bool)
out[np.arange(len(a)), a.argmax(axis=1)] = 1
Create an identity matrix and select from its rows using argmax on your array:
np.identity(a.shape[1], bool)[a.argmax(axis=1)]
# array([[False, True],
# [ True, False],
# [False, True]], dtype=bool)
Please note that this ignores ties, it just goes with the value returned by argmax.
You're already halfway to the answer. Once you compute the max along an axis, you can compare it with the input array and you'll have the required binary mask!
In [7]: maxx = np.amax(a, axis=1)
In [8]: maxx
Out[8]: array([1, 2, 8])
In [12]: a >= maxx[:, None]
Out[12]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Note: This uses NumPy broadcasting when doing the comparison between a and maxx
In one line: np.equal(a.max(1)[:,None], a) or np.equal(a.max(1), a.T).T.
But this can lead to several True values in a row when the maximum is tied.
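The difference between the comparison-based and argmax-based answers only shows up with ties. A short sketch, using an input with a tied row that I made up for illustration (it is not the array from the question):

```python
import numpy as np

a = np.array([[0, 1],
              [2, 2],   # tie: both entries are the row maximum
              [4, 8]])

# Comparison against the row maximum marks every tied entry.
mask_all = a == a.max(axis=1, keepdims=True)

# argmax reports only the first maximum, so exactly one True per row.
mask_first = np.zeros(a.shape, dtype=bool)
mask_first[np.arange(a.shape[0]), a.argmax(axis=1)] = True

print(mask_all.sum(axis=1))    # [1 2 1]
print(mask_first.sum(axis=1))  # [1 1 1]
```

Which behaviour is wanted depends on whether the mask must select exactly one element per row.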
In a multi-dimensional case you can also use np.indices. Let's suppose you have an array:
a = np.array([[
[0, 1, 2],
[3, 8, 5],
[6, 7, -1],
[9, 5, 8]],[
[5, 2, 8],
[7, 6, -3],
[-1, 2, 1],
[3, 5, 6]]
])
you can mark the argmax positions calculated for axis 0 like so:
ind = np.indices(a.shape[1:])
k = np.zeros((2, 4, 3), bool)
k[a.argmax(0), ind[0], ind[1]] = 1
The output would be:
array([[[False, False, False],
[False, True, True],
[ True, True, False],
[ True, True, True]],
[[ True, True, True],
[ True, False, False],
[False, False, True],
[False, False, False]]])
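For an arbitrary axis, the same mask can probably be written more compactly with np.put_along_axis (available since NumPy 1.15). This is a sketch rather than the answer's method, and argmax_mask is a hypothetical helper name:

```python
import numpy as np

def argmax_mask(arr, axis):
    """Hypothetical helper: True at the first maximum along `axis`."""
    # Keep the reduced axis as length 1 so the indices line up with `arr`.
    idx = np.expand_dims(arr.argmax(axis=axis), axis)
    mask = np.zeros(arr.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=axis)
    return mask

a = np.array([[[0, 1, 2], [3, 8, 5], [6, 7, -1], [9, 5, 8]],
              [[5, 2, 8], [7, 6, -3], [-1, 2, 1], [3, 5, 6]]])

k = argmax_mask(a, axis=0)
print(k[0])
print(k[1])
```

As with the other argmax-based versions, ties are resolved in favour of the first maximum along the axis.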

Different starting indices for slices in NumPy

I'm wondering if it's possible without iterating with a for loop to do something like this:
a = np.array([[1, 2, 5, 3, 4],
[4, 5, 6, 7, 8]])
cleaver = np.argmax(a == 5, axis=1) # np.array([2, 1])
foo(a, cleaver)
>>> np.array([[False, False, True, True, True],
[False, True, True, True, True]])
Is there a way to accomplish this through slicing or some other non-iterative function? The arrays I'm using are quite large and iterating over them row by row is prohibitively expensive.
You can use some broadcasting magic -
cleaver[:,None] <= np.arange(a.shape[1])
Sample run -
In [60]: a
Out[60]:
array([[1, 2, 5, 3, 4],
[4, 5, 6, 7, 8]])
In [61]: cleaver
Out[61]: array([2, 1])
In [62]: cleaver[:,None] <= np.arange(a.shape[1])
Out[62]:
array([[False, False, True, True, True],
[False, True, True, True, True]], dtype=bool)
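One caveat: np.argmax(a == 5, axis=1) returns 0 for a row that contains no 5, so the broadcasted comparison marks that whole row True. If such rows can occur, a running logical OR along each row is one possible alternative. A sketch, with a third row added purely for illustration:

```python
import numpy as np

a = np.array([[1, 2, 5, 3, 4],
              [4, 5, 6, 7, 8],
              [1, 2, 3, 4, 6]])   # extra row without a 5 (added for illustration)

# argmax-based version: a row with no match gets cleaver == 0.
cleaver = np.argmax(a == 5, axis=1)
by_argmax = cleaver[:, None] <= np.arange(a.shape[1])

# Running OR along each row: True from the first 5 onwards, all False if none.
by_accumulate = np.logical_or.accumulate(a == 5, axis=1)

print(by_argmax[2])      # [ True  True  True  True  True]
print(by_accumulate[2])  # [False False False False False]
```

For rows that do contain a 5, both versions give the same mask.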
