Elementwise AND or OR operations in python for 2D array - python

Is there any method in python to compute elementwise OR or AND operations for 2D arrays across rows or columns?
For example, for the following array, elementwise OR operations across row would result in a vector [1, 0, 0, 0, 0, 0, 0, 0].
array([[1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

NumPy has logical_or, logical_xor and logical_and, which have a reduce method:
>>> np.logical_or.reduce(a, axis=0)
array([ True, False, False, False, False, False, False, False], dtype=bool)
As you can see in the example, they coerce to bool dtype, so if you require uint8 you have to cast back at the end.
Since bools are stored as single bytes, you can use cheap view casting for that.
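For example, a minimal sketch of the reduce-then-view-cast round trip:

```python
import numpy as np

a = np.array([[1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)

result = np.logical_or.reduce(a, axis=0)  # bool dtype
result_u8 = result.view(np.uint8)         # free view cast: each bool is one byte
```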
With the axis keyword you can select along which axis to reduce. It is also possible to select multiple axes:
>>> np.logical_or.reduce(a, axis=1)
array([ True,  True,  True,  True], dtype=bool)
>>> np.logical_or.reduce(a, axis=(0, 1))
True
The keepdims keyword is useful for broadcasting, for example to find all "crosses" of rows and columns whose entries are all >= 2 in array b:
>>> b = np.random.randint(0, 10, (4, 4))
>>> b
array([[0, 5, 3, 4],
       [4, 1, 5, 4],
       [4, 5, 5, 5],
       [2, 4, 6, 1]])
>>> rows = np.logical_and.reduce(b >= 2, axis=1, keepdims=True)
# keepdims=False (default) -> rows.shape == (4,); keepdims=True -> rows.shape == (4, 1)
>>> cols = np.logical_and.reduce(b >= 2, axis=0, keepdims=True)
# keepdims=False (default) -> cols.shape == (4,); keepdims=True -> cols.shape == (1, 4)
>>> rows & cols  # shapes (4, 1) and (1, 4) are broadcast to (4, 4)
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]], dtype=bool)
Notice the slight abuse of the & operator, which stands for bitwise_and. Since the effect is the same on bools (in fact, trying to use `and` in its place would have thrown an exception), this is common practice.
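A quick sketch of the difference:

```python
import numpy as np

x = np.array([True, False, True])
y = np.array([True, True, False])

print(x & y)  # elementwise bitwise AND, same effect as logical_and on bools

try:
    x and y  # `and` asks each array for a single truth value, which is ambiguous
except ValueError as e:
    print("`and` raised:", e)
```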
As @ajcr points out, the popular np.any and np.all are shorthand for np.logical_or.reduce and np.logical_and.reduce. Note, however, that there are subtle differences: without an axis argument, np.any reduces over all axes, while the ufunc reduce defaults to axis 0:
>>> np.logical_or.reduce(a)
array([ True, False, False, False, False, False, False, False], dtype=bool)
>>> np.any(a)
True
Alternatively, if you want to stick with uint8 and know for certain that all your entries will be 0 or 1, you can use bitwise_and, bitwise_or and bitwise_xor:
>>> np.bitwise_or.reduce(a, axis=0)
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

Related

How to efficiently filter/create mask of numpy.array based on list of tuples

I am trying to create a mask of a numpy.array based on a list of tuples. Here is my solution, which produces the expected result:
import numpy as np

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])

mask = np.array([], dtype=bool)
for f_val in filter_vals:
    if mask.size == 0:
        mask = (data == f_val).all(-1)
    else:
        mask = mask | (data == f_val).all(-1)
Output/mask:
array([[False,  True, False],
       [False,  True,  True],
       [ True, False, False]])
The problem is that with a bigger data array and an increasing number of tuples in filter_vals, it gets slower.
Is there any better solution? I tried np.isin(data, filter_vals), but it does not provide the result I need.
A classical approach using broadcasting would be:
*A, B = data.shape
(data.reshape((-1, B)) == np.array(filter_vals)[:, None]).all(-1).any(0).reshape(A)
This will, however, be memory expensive, so applicability really depends on your use case.
Output:
array([[False,  True, False],
       [False,  True,  True],
       [ True, False, False]])
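If the broadcasted comparison is too memory-hungry, a middle ground (a sketch of the same OR-accumulation as the question's loop, just without the empty-mask special case) is to initialize the mask up front and OR into it in place:

```python
import numpy as np

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])

# peak memory per iteration is just one boolean array of shape data.shape[:-1]
mask = np.zeros(data.shape[:-1], dtype=bool)
for f_val in filter_vals:
    mask |= (data == f_val).all(-1)
```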

reduce ndarray dimesion to max nonzero index

I have a large ndarray and I want to reduce its first dimension, replacing it with the highest nonzero index number of that dimension. For example, if [:, 0, 0] is (0, 1, 6, 0, 0), I would like [0, 0] of the new array to equal "3". I've run ndarray.nonzero, but I'm unsure of where to go from there. Any help appreciated. A little code below to get started:
import numpy as np
myary = np.random.randint(0, 4, (10, 5, 5))
newary = np.nonzero(myary)
"Mask, flip and argmax" solution
You can still use argmax for another purpose: getting the first Trues along the flipped subarrays.
Condensed version:
def last_nonzero(arr):
    return arr.shape[-1] - np.argmax(np.flip(arr > 0, axis=-1), axis=-1) - 1

last_nonzero(np.array([[0, 1, 6, 0, 0], [1, 2, 0, 3, 0]]))
# > array([2, 3], dtype=int64)
Explanation
Just generating some sparse data (note the shape (2, 3, 4) to match the output below):
import numpy as np
myary = np.random.randint(0, 2, (2, 3, 4)) * np.random.randint(0, 20, (2, 3, 4))
# > array([[[18,  5,  3, 17],
#           [ 0,  0,  0,  0],
#           [ 0, 14,  0, 17]],
#          [[ 0,  0, 10,  8],
#           [ 0, 18, 13,  0],
#           [ 7,  3,  0,  0]]])
We can now mask for values > 0 to get a boolean array, then flip it along the last axis:
flipped_mask = np.flip(myary > 0, axis=-1)
# > array([[[ True,  True,  True,  True],
#           [False, False, False, False],
#           [ True, False,  True, False]],
#          [[ True,  True, False, False],
#           [False,  True,  True, False],
#           [False, False,  True,  True]]])
Now we can take the argmax along the last axis. It gives us the index of the first True in each subarray of the mask:
max_mask = np.argmax(flipped_mask, axis=-1)
# > array([[0, 0, 0],
#          [0, 1, 2]], dtype=int64)
Finally, since we processed the flipped version, we must "flip back" the indices by subtracting them from the length of the subarrays (minus 1 because of zero-indexing):
last_nonzero = myary.shape[-1] - max_mask - 1
# > array([[3, 3, 3],
#          [3, 2, 1]], dtype=int64)
Observation
If a "subarray" is full of zeros only, this solution will return the last index of the subarray.
last_nonzero(np.array([[0,1,2,3,0], [0, 0, 0, 0, 0]]))
# > array([3, 4], dtype=int64)
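If that behaviour is undesirable, one possible fix (a sketch; the -1 sentinel is my choice, not part of the original answer) is to detect all-zero subarrays with any and substitute the sentinel:

```python
import numpy as np

def last_nonzero_safe(arr, fill=-1):
    mask = arr > 0
    idx = arr.shape[-1] - np.argmax(np.flip(mask, axis=-1), axis=-1) - 1
    # subarrays with no nonzero entry get the sentinel instead of a bogus index
    return np.where(mask.any(axis=-1), idx, fill)

last_nonzero_safe(np.array([[0, 1, 2, 3, 0], [0, 0, 0, 0, 0]]))
# > array([ 3, -1])
```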

Numpy array , Find columns with 1s

I am looking to find the columns in a numpy array where at least one cell has a 1.
Input Array
[0,0,1,0,0,0,1,0,0,1]
[0,1,0,0,0,0,0,0,1,0]
[0,0,0,0,0,0,0,1,0,0]
[0,0,0,1,0,0,1,0,0,0]
Expected Output
[0,1,1,1,0,0,1,1,1,1]
Use numpy.any with axis=0 (to reduce along the first axis, i.e. down the rows):
>>> np.any(a, axis=0)
array([False, True, True, True, False, False, True, True, True, True], dtype=bool)
Of course, you can convert the boolean array into integers easily:
>>> np.any(a, axis=0)*1
array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])
You can simply | (OR) them all together:
>>> np.array([0,0,1,0,0,0,1,0,0,1]) | np.array([0,1,0,0,0,0,0,0,1,0])
array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1])
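For the full input array from the question, the same ufunc reduction shown in the first answer ORs every row at once and keeps the integer dtype (a sketch):

```python
import numpy as np

a = np.array([[0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]])

result = np.bitwise_or.reduce(a, axis=0)  # OR of all rows at once, stays integer
# > array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])
```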

How can you turn an index array into a mask array in Numpy?

Is it possible to convert an array of indices to an array of ones and zeros, given the range?
i.e. [2,3] -> [0, 0, 1, 1, 0], in range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499
Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can eliminate index_array, and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False, True, True, True, True,
True, False, False, False, False, False], dtype=bool)
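For the specific range case in the question, comparing against np.arange also yields a boolean mask directly, without touching individual elements (a sketch with a smaller n):

```python
import numpy as np

n = 15
idx = np.arange(n)
mask_array = (idx >= 5) & (idx < 10)  # True exactly for indices 5..9
```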
For a single dimension, try:
import numpy as np
n = (15,)
index_array = [2, 5, 7]
mask_array = np.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones, then use ravel:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]]  # you may need to transpose your indices!
mask_array = np.zeros(n)
flat_index_array = np.ravel_multi_index(index_array, mask_array.shape)
np.ravel(mask_array)[flat_index_array] = 1
There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the approach Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.
As requested, here it is as an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like you asked for, but it will use bools instead of 0s and 1s.

Select elements of numpy array via boolean mask array

I have a boolean mask array a of length n:
a = np.array([True, True, True, False, False])
I have a 2d array with n columns:
b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
I want a new array which contains only the "True"-values, for example
c = ([[1,2,3], [1,2,3]])
c = a * b does not work, because it also contains "0" for the False columns, which I don't want.
c = np.delete(b, a, 1) does not work either.
Any suggestions?
You probably want something like this:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
>>> b[:,a]
array([[1, 2, 3],
[1, 2, 3]])
Note that for this kind of indexing to work, the mask needs to be an ndarray, like you were using, not a plain list, or older NumPy versions will interpret the False and True as 0 and 1 and give you those columns instead:
>>> b[:,[True, True, True, False, False]]
array([[2, 2, 2, 1, 1],
       [2, 2, 2, 1, 1]])
You can use the numpy.ma module's np.ma.masked_array function to do so:
>>> x = np.array([1, 2, 3, -1, 5])
>>> mx = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
>>> mx
masked_array(data=[1, 2, 3, --, 5],
             mask=[False, False, False, True, False],
       fill_value=999999)
Hope I'm not too late! Here's your array:
X = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])
Let's create an array of zeros of the same shape as X:
mask = np.zeros_like(X)
# array([[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0]])
Then, specify the columns that you want to mask out or hide with a 1. In this case, we want the last 2 columns to be masked out.
mask[:, -2:] = 1
# array([[0, 0, 0, 1, 1],
# [0, 0, 0, 1, 1]])
Create a masked array:
X_masked = np.ma.masked_array(X, mask)
# masked_array(data=[[1, 2, 3, --, --],
# [1, 2, 3, --, --]],
# mask=[[False, False, False, True, True],
# [False, False, False, True, True]],
# fill_value=999999)
We can then do whatever we want with X_masked, like taking the sum of each column (along axis=0):
np.sum(X_masked, axis=0)
# masked_array(data=[2, 4, 6, --, --],
#              mask=[False, False, False, True, True],
#        fill_value=1e+20)
The great thing about this is that X_masked is just a view of X, not a copy:
X_masked.base is X
# True
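When you need a plain ndarray again, filled() substitutes a value for the masked entries (a quick sketch):

```python
import numpy as np

X = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])
mask = np.zeros_like(X)
mask[:, -2:] = 1                      # hide the last two columns

X_masked = np.ma.masked_array(X, mask)
plain = X_masked.filled(0)            # masked entries become 0
# > array([[1, 2, 3, 0, 0],
#          [1, 2, 3, 0, 0]])
```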
