Select elements of numpy array via boolean mask array

Select elements of numpy array via boolean mask array - python

I have a boolean mask array a of length n:
a = np.array([True, True, True, False, False])
I have a 2d array with n columns:
b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
I want a new array which contains only the "True"-values, for example
c = ([[1,2,3], [1,2,3]])
c = a * b does not work because it contains also "0" for the false columns what I don't want
c = np.delete(b, a, 1) does not work
Any suggestions?

You probably want something like this:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
>>> b[:,a]
array([[1, 2, 3],
[1, 2, 3]])
Note that for this kind of indexing to work, it needs to be an ndarray, like you were using, not a list, or it'll interpret the False and True as 0 and 1 and give you those columns:
>>> b[:,[True, True, True, False, False]]
array([[2, 2, 2, 1, 1],
[2, 2, 2, 1, 1]])

You can use numpy.ma module and use np.ma.masked_array function to do so.
>>> x = np.array([1, 2, 3, -1, 5])
>>> mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0])
masked_array(data=[1, 2, 3, --, 5], mask=[False, False, False, True, False], fill_value=999999)

Hope I'm not too late! Here's your array:
X = np.array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]])
Let's create an array of zeros of the same shape as X:
mask = np.zeros_like(X)
# array([[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0]])
Then, specify the columns that you want to mask out or hide with a 1. In this case, we want the last 2 columns to be masked out.
mask[:, -2:] = 1
# array([[0, 0, 0, 1, 1],
# [0, 0, 0, 1, 1]])
Create a masked array:
X_masked = np.ma.masked_array(X, mask)
# masked_array(data=[[1, 2, 3, --, --],
# [1, 2, 3, --, --]],
# mask=[[False, False, False, True, True],
# [False, False, False, True, True]],
# fill_value=999999)
We can then do whatever we want with X_masked, like taking the sum of each column (along axis=0):
np.sum(X_masked, axis=0)
# masked_array(data=[2, 4, 6, --, --],
# mask=[False, False],
# fill_value=1e+20)
Great thing about this is that X_masked is just a view of X, not a copy.
X_masked.base is X
# True

Related

Histogramming boolean numpy arrays by giving each array a unique label

I have a large sample (M) of boolean arrays of length 'N'. So there are 2^N unique boolean arrays possible.
I would like to know how many arrays are duplicates of each other and create a histogram.
One possibility is to create a unique integer (a[0] + a[1]*2 + a[3]*4 + ...+a[N]*2^(N-1)) for each unique array and histogram that integer.
But this is going to be O(M*N). What is the best way to do this?

numpy.ravel_multi_index is able to do this for you:
arr = np.array([[True, True, True],
[True, False, True],
[True, False, True],
[False, False, False],
[True, False, True]], dtype=int)
nums = np.ravel_multi_index(arr.T, (2,) * arr.shape[1])
>>> nums
array([7, 5, 5, 0, 5], dtype=int64)
And since you need a histogram, use
>>> np.histogram(nums, bins=np.arange(2**arr.shape[1]+1))
(array([1, 0, 0, 0, 0, 3, 0, 1], dtype=int64),
array([0, 1, 2, 3, 4, 5, 6, 7, 8]))
Another option is to use np.unique:
>>> np.unique(arr, return_counts=True, axis=0)
(array([[0, 0, 0],
[1, 0, 1],
[1, 1, 1]]),
array([1, 3, 1], dtype=int64))

With vectorized operation, the creation of a key is much more faster than a[0] + a[1]x2 + a[3]x4 + ...+a[N]*2^(N-1). I think that's no a better solution... in any case you need to almost "read" one time each value, and this require MxN step.
N = 3
M = 5
sample = [
np.array([True, True, True]),
np.array([True, False, True]),
np.array([True, False, True]),
np.array([False, False, False]),
np.array([True, False, True]),
]
multipliers = [2<<i for i in range(N-2, -1, -1)]+[1]
buckets = {}
buck2vec = {}
for s in sample:
key = sum(s*multipliers)
if key not in buckets:
buckets[key] = 0
buck2vec[key] = s
buckets[key]+=1
for key in buckets:
print(f"{buck2vec[key]} -> {buckets[key]} occurency")
Results:
[False False False] -> 1 occurency
[ True False True] -> 3 occurency
[ True True True] -> 1 occurency

is there a method for finding the indexes of a 2d-array based on a given array

suppose we have two arrays like these two:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I intend to write a code in which I can find the indexes of an array such as A based on values in
the array B
for example, I want the final results to be something like this:
C=[[0 1]
[0 4]
[1 0]
[1 2]]
can anybody provide me with a solution or a hint?

Do you mean?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]])
B shape of (1,4) where the initial 1 isn't necessary. That's why I used B[0], though isin, via in1d ravels it anyways.
where is result is often more useful
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])

You can try in this way by using np.where().
index = []
for num in B:
for nums in num:
x,y = np.where(A == nums)
index.append([x,y])
print(index)
>>array([[0,1],
[0,4],
[1,0],
[1,2]])

With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]], dtype=int64)

reduce ndarray dimesion to max nonzero index

I have a large ndarray and I want to reduce it's first dimension, replacing it with the highest nonzero index number of that dimension. For example, if [:,0,0] is (0,1,6,0,0), I would like to [0,0] of the new array to equal "3". I've run ndarray.nonzero, but I'm unsure of where to go from there. Any help appreciated. A little code below to get started:
import numpy as np
myary = np.random.randint(0,4,(10,5,5))
newary = np.nonzero(myary)

"Mask, flip and argmax" solution
You can still use argmax for another purpose: getting the first Trues along the flipped subarrays.
Condensed version
def last_nonzero(arr):
return arr.shape[-1] - np.argmax(np.flip(arr>0, axis=-1), axis=-1) - 1
last_nonzero(np.array([[0, 1, 6, 0, 0], [1, 2, 0, 3, 0]]))
# > array([2, 3], dtype=int64)
Explanation
Just generating sparse data:
import numpy as np
myary = np.random.randint(0,2, (4,3,2))*np.random.randint(0,20,(4, 3, 2))
# > array([[[18, 5, 3, 17],
# [ 0, 0, 0, 0],
# [ 0, 14, 0, 17]],
# [[ 0, 0, 10, 8],
# [ 0, 18, 13, 0],
# [ 7, 3, 0, 0]]])
We can now mask for values > 0 to get a boolean array, then flip it along the last axis:
flipped_mask = np.flip(myary>0, axis=-1)
# > array([[[ True, True, True, True],
# [False, False, False, False],
# [ True, False, True, False]],
# [[ True, True, False, False],
# [False, True, True, False],
# [False, False, True, True]]])
Now we can take the argmax along the last axis. It will give us the index of the first True in each subarray of the mask.
max_mask = np.argmax(flipped_mask, axis=-1)
# > array([[0, 0, 0],
# [0, 1, 2]], dtype=int64)
Finally, since we processed the flipped version, we must "flip back" the indices by substracting them to the length of the subarrays (minus 1 because of zero-indexing):
last_nonzero = myary.shape[-1] - max_mask - 1
# > array([[3, 3, 3],
# [3, 2, 1]], dtype=int64)
Observation
If a "subarray" is full of zeros only, this solution will return the last index of the subarray.
last_nonzero(np.array([[0,1,2,3,0], [0, 0, 0, 0, 0]]))
# > array([3, 4], dtype=int64)

Numpy element-wise in operation

Suppose I have a column vector y with length n, and I have a matrix X of size n*m. I want to check for each element i in y, whether the element is in the corresponding row in X. What is the most efficient way of doing this?
For example:
y = [1,2,3,4].T
and
X =[[1, 2, 3],[3, 4, 5],[4, 3, 2],[2, 2, 2]]
Then the output should be
[1, 0, 1, 0] or [True, False, True, False]
which ever is easier.
Of course we can use a for loop to iterate through both y and X, but is there any more efficient way of doing this?

Vectorized approach using broadcasting -
((X == y[:,None]).any(1)).astype(int)
Sample run -
In [41]: X # Input 1
Out[41]:
array([[1, 2, 3],
[3, 4, 5],
[4, 3, 2],
[2, 2, 2]])
In [42]: y # Input 2
Out[42]: array([1, 2, 3, 4])
In [43]: X == y[:,None] # Broadcasted comparison
Out[43]:
array([[ True, False, False],
[False, False, False],
[False, True, False],
[False, False, False]], dtype=bool)
In [44]: (X == y[:,None]).any(1) # Check for any match along each row
Out[44]: array([ True, False, True, False], dtype=bool)
In [45]: ((X == y[:,None]).any(1)).astype(int) # Convert to 1s and 0s
Out[45]: array([1, 0, 1, 0])

python numpy get masked data without flattening

How do I get the masked data only without flattening the data into a 1D array? That is, suppose I have a numpy array
a = np.array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
and I mask all elements greater than 1,
b = ma.masked_greater(a, 1)
masked_array(data =
[[0 1 -- --]
[0 1 -- --]
[0 1 -- --]],
mask =
[[False False True True]
[False False True True]
[False False True True]],
fill_value = 999999)
How do I get only the masked elements without flattening the output? That is, I need to get
array([[ 2, 3],
[2, 3],
[2, 3]])

Lets try an example that produces a ragged result - different number of 'masked' values in each row.
In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [294]: a<6
Out[294]:
array([[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False]], dtype=bool)
The flattened list of values that match this condition. It can't return a regular 2d array, so it has to revert to a flattened array.
In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])
do the same thing, but iterating row by row
In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]
Trying to make an array of the result produces an object type array, which is little more than a list in an array wrapper:
In [297]: np.array([a1[a1<6] for a1 in a])
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)
The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
Here's another way of producing the list of arrays. With sum I find how many elements there are in each row, and then use this to split the flattened array into sublists.
In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]
It does use 'flattening'. And for these small examples it is slower than the row iteration. (Don't worry about the empty float array, split had to construct something and used a default dtype. )
A different mask, without empty rows clearly shows the equivalence of the 2 approaches.
In [344]: mask=np.tri(3,4,dtype=bool) # lower tri
In [345]: mask
Out[345]:
array([[ True, False, False, False],
[ True, True, False, False],
[ True, True, True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8, 9, 10])]

Zip the two lists together, and then filter them out:
data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
mask = [[False, False, True, True],
[False, False, True, True],
[False, False, True, True]]
zipped = zip(data, mask) # [([0, 1, 1, 1], [False, False, True, True]), ([0, 1, 1, 1], [False, False, True, True]), ([0, 1, 1, 1], [False, False, True, True])]
masked = []
for lst, mask in zipped:
pairs = zip(lst, mask) # [(0, False), (1, False), (1, True), (1, True)]
masked.append([num for num, b in pairs if b])
print(masked) # [[1, 1], [1, 1], [1, 1]]
or more succinctly:
zipped = [...]
masked = [[num for num, b in zip(lst, mask) if b] for lst, mask in zipped]
print(masked) # [[1, 1], [1, 1], [1, 1]]

Due to vectorization in numpy you can use np.where to select items from the first array and use None (or some arbitrary value) to indicate the places that a value has been masked out. Note that this means you have to use a less compact representation for the array so may want to use -1 or some special value.
import numpy as np
a = np.array([
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
mask = np.array([[ True, True, True, True],
[ True, False, True, True],
[False, True, True, False]])
np.where(a, np.array, None)
This produces
array([[0, 1, 2, 3],
[0, None, 2, 3],
[None, 1, 2, None]], dtype=object)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Select elements of numpy array via boolean mask array - python

You can use numpy.ma module and use np.ma.masked_array function to do so. >>> x = np.array([1, 2, 3, -1, 5]) >>> mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0]) masked_array(data=[1, 2, 3, --, 5], mask=[False, False, False, True, False], fill_value=999999)

Related

Histogramming boolean numpy arrays by giving each array a unique label

is there a method for finding the indexes of a 2d-array based on a given array

reduce ndarray dimesion to max nonzero index

Numpy element-wise in operation

python numpy get masked data without flattening

Categories

Resources