Numpy: Mask of elements in array1 that are also elements of array2 - python

I wonder if there is a natural NumPy way of creating a binary mask over array1 for elements that are also in array2. In other words, a binary mask over array1 for the intersection of array1 and array2.
This works:
import numpy as np

def bin_mask(a, b):
    # Add up one boolean comparison per element of b (True counts as 1)
    return sum(a == n for n in b)

a = np.array([1,2,3,4,5,6,7,8,9,20])
b = np.array([3,5,7])
In: bin_mask(a,b)
Out: array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
But I wonder if there is some NumPy built-in I am missing.
Edit:
The correct answer from the comments is np.isin(a, b). I also marked in1d as the accepted answer; both work.

The in1d function does the trick too (note that np.in1d is deprecated as of NumPy 2.0 in favour of np.isin):
>>> np.in1d(a,b)
array([False, False, True, False, True, False, True, False, False, False])
>>> np.in1d(a,b).astype(int)
array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
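For reference, a minimal sketch of the np.isin route mentioned in the comments. It returns a boolean mask with the shape of its first argument, and invert=True gives the complement; the variable names below are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20])
b = np.array([3, 5, 7])

# Boolean mask: True where an element of a also occurs in b
mask = np.isin(a, b)

# The same mask as 0/1 integers, like the output of bin_mask
int_mask = mask.astype(int)

# invert=True marks the elements of a that are NOT in b
not_in_b = np.isin(a, b, invert=True)

# The mask can also select the common elements directly
common = a[mask]

print(int_mask)  # [0 0 1 0 1 0 1 0 0 0]
print(common)    # [3 5 7]
```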

Related

How to efficiently filter/create mask of numpy.array based on list of tuples

I am trying to create a mask of a numpy.array based on a list of tuples. Here is my solution, which produces the expected result:
import numpy as np
filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = np.array([], dtype=bool)
for f_val in filter_vals:
    if mask.size == 0:
        mask = (data == f_val).all(-1)
    else:
        mask = mask | (data == f_val).all(-1)
Output/mask:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
The problem is that with a bigger data array and an increasing number of tuples in filter_vals, it gets slower.
Is there any better solution? I tried np.isin(data, filter_vals), but it does not give the result I need.
A classical approach using broadcasting would be:
*A, B = data.shape
(data.reshape((-1,B)) == np.array(filter_vals)[:,None]).all(-1).any(0).reshape(A)
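This will, however, be memory-expensive: the comparison builds a boolean intermediate of shape (len(filter_vals), N, B), where N is the number of rows in data.reshape((-1, B)). So its applicability really depends on your use case.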
output:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
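If the entries are small non-negative integers (as in the 0/1 example), a cheaper alternative is to encode each length-B row as a single integer and run np.isin on the codes, which avoids the (len(filter_vals), N, B) intermediate. This is a sketch under that assumption: rows_in_filter and its base parameter are hypothetical names, and base**B must fit in int64. Negative values would break the encoding.

```python
import numpy as np

def rows_in_filter(data, filter_vals, base=None):
    """Hypothetical helper: mask of last-axis rows of data that appear in filter_vals.

    Assumes every entry is a non-negative integer smaller than base and
    that base**B fits in int64, where B = data.shape[-1].
    """
    filt = np.asarray(filter_vals, dtype=np.int64)
    *lead, B = data.shape
    if base is None:
        # Smallest base that can represent every value in either array
        base = int(max(data.max(), filt.max())) + 1
    # Positional weights base**0, base**1, ..., base**(B-1)
    weights = base ** np.arange(B, dtype=np.int64)
    # Each row becomes one integer; two rows get the same code only if they are equal
    codes = data.reshape(-1, B).astype(np.int64) @ weights
    filter_codes = filt @ weights
    return np.isin(codes, filter_codes).reshape(lead)

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = rows_in_filter(data, filter_vals)
print(mask)
```

The cost is one matrix-vector product plus one np.isin call, so memory stays proportional to data.size rather than len(filter_vals) * data.size.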

Changing the array value above the first non-zero element in the column

I'm looking for a vectorized way to change the array values above the first non-zero element in each column.
for x in range(array.shape[1]):
    for y in range(array.shape[0]):
        if array[y,x]>0:
            break
        else:
            array[y,x]=255
As you wrote about an array (not a DataFrame), I assume that you have a NumPy array and want to use NumPy methods.
To do this, run the following code:
np.where(np.cumsum(np.not_equal(array, 0), axis=0), array, 255)
Example and explanation of steps:
1. The source array:
array([[0, 1, 0],
[0, 0, 1],
[1, 1, 0],
[1, 0, 0]])
2. np.not_equal(array, 0) computes a boolean array with True for elements != 0:
array([[False, True, False],
[False, False, True],
[ True, True, False],
[ True, False, False]])
3. np.cumsum(..., axis=0) computes the cumulative sum (True counted as 1) along axis 0 (down each column):
array([[0, 1, 0],
[0, 1, 1],
[1, 2, 1],
[2, 2, 1]], dtype=int32)
4. The above array is used as the condition in np.where. Where the condition is True (i.e. != 0, meaning a non-zero element has already appeared in that column), take the value from the corresponding element of array; otherwise take 255:
np.where(..., array, 255)
The result (for my array) is:
array([[255, 1, 255],
[255, 0, 1],
[ 1, 1, 0],
[ 1, 0, 0]])
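Put together as a self-contained script (the array is the example from this answer), the steps above look like this:

```python
import numpy as np

array = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0],
                  [1, 0, 0]])

# Running count of non-zero elements down each column;
# it is still 0 exactly at the positions above the first non-zero element
seen_nonzero = np.cumsum(np.not_equal(array, 0), axis=0)

# Keep the original value once a non-zero has been seen, otherwise use 255
result = np.where(seen_nonzero, array, 255)
print(result)
```

A column that contains only zeros becomes all 255, which matches what the original loop does.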
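Use masking (note that this replaces every zero, not only the zeros above the first non-zero element in each column):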
array[array == 0] = 255
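A masking variant that only touches the zeros above the first non-zero element of each column can be sketched with np.logical_or.accumulate. This is an alternative, not part of the original answer:

```python
import numpy as np

array = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0],
                  [1, 0, 0]])

# True from the first non-zero element downwards in each column
seen = np.logical_or.accumulate(array != 0, axis=0)

# Positions not yet "seen" are exactly those above the first non-zero element
array[~seen] = 255
print(array)
```

This modifies array in place, like the original loop, instead of building a new array with np.where.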

Elementwise AND or OR operations in python for 2D array

Is there any method in python to compute elementwise OR or AND operations for 2D arrays across rows or columns?
For example, for the following array, an elementwise OR across rows would result in the vector [1, 0, 0, 0, 0, 0, 0, 0].
array([[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
NumPy has logical_or, logical_xor and logical_and, which have a reduce method:
>>> np.logical_or.reduce(a, axis=0)
array([ True, False, False, False, False, False, False, False], dtype=bool)
As you can see in the example, they coerce to the bool dtype, so if you require uint8 you have to cast back at the end.
Since bools are stored as bytes, you can use cheap view casting for that.
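A minimal sketch of that view cast, using the uint8 array from the question:

```python
import numpy as np

a = np.array([[1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)

# The reduction returns a bool array
col_or = np.logical_or.reduce(a, axis=0)

# bool and uint8 are both one byte wide, so a view reinterprets the data without copying
col_or_u8 = col_or.view(np.uint8)
print(col_or_u8)  # [1 0 0 0 0 0 0 0]
```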
With the axis keyword you can select the axis along which to reduce. It is also possible to select multiple axes:
>>> np.logical_or.reduce(a, axis=1)
array([ True, True, True, True], dtype=bool)
>>> np.logical_or.reduce(a, axis=(0, 1))
True
The keepdims keyword is useful for broadcasting, for example to find all "crosses" of rows and columns in array b whose elements are all >= 2:
>>> b = np.random.randint(0,10, (4, 4))
>>> b
array([[0, 5, 3, 4],
[4, 1, 5, 4],
[4, 5, 5, 5],
[2, 4, 6, 1]])
>>> rows = np.logical_and.reduce(b >= 2, axis=1, keepdims=True)
# keepdims=False (default) -> rows.shape==(4,) keepdims=True -> rows.shape==(4, 1)
>>> cols = np.logical_and.reduce(b >= 2, axis=0, keepdims=True)
# keepdims=False (default) -> cols.shape==(4,) keepdims=True -> cols.shape==(1, 4)
>>> rows & cols # shapes (4, 1) and (1, 4) are broadcast to (4, 4)
array([[False, False, False, False],
[False, False, False, False],
[False, False, True, False],
[False, False, False, False]], dtype=bool)
Notice the slight abuse of the & operator, which stands for bitwise_and. Since the effect is the same on bools (in fact, using and in this place would have raised a ValueError), this is common practice.
As @ajcr points out, the popular np.any and np.all are shorthand for np.logical_or.reduce and np.logical_and.reduce.
Note, however, that there are subtle differences: np.any and np.all reduce over all axes by default (axis=None), whereas ufunc.reduce defaults to axis=0:
>>> np.logical_or.reduce(a)
array([ True, False, False, False, False, False, False, False], dtype=bool)
>>> np.any(a)
True
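A small check of that difference, sketched on the array from the question: once the axis is passed explicitly, the two spellings agree.

```python
import numpy as np

a = np.array([[1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)

# np.any reduces over all axes by default (axis=None)
total_any = np.any(a)
# ufunc.reduce defaults to axis=0, so pass the same axis to compare
same_or = np.array_equal(np.any(a, axis=0), np.logical_or.reduce(a, axis=0))
same_and = np.array_equal(np.all(a, axis=1), np.logical_and.reduce(a, axis=1))
# axis=None also works for the ufunc form
flat_or = np.logical_or.reduce(a, axis=None)

print(total_any, same_or, same_and, flat_or)  # True True True True
```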
OR:
If you want to stick with uint8 and know for certain that all your entries will be 0 or 1, you can use bitwise_and, bitwise_or and bitwise_xor:
>>> np.bitwise_or.reduce(a, axis=0)
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)

Numpy array: find columns with 1s

I am looking to find the columns in a NumPy array where at least one cell has a 1.
Input Array
[0,0,1,0,0,0,1,0,0,1]
[0,1,0,0,0,0,0,0,1,0]
[0,0,0,0,0,0,0,1,0,0]
[0,0,0,1,0,0,1,0,0,0]
Expected Output
[0,1,1,1,0,0,1,1,1,1]
Use numpy.any with axis=0 (to reduce along the first axis, i.e. down the rows):
>>> np.any(a, axis=0)
array([False, True, True, True, False, False, True, True, True, True], dtype=bool)
Of course, you can convert the boolean array into integers easily:
>>> np.any(a, axis=0)*1
array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])
You can simply | (bitwise OR) the rows together; for example, the first two rows:
>>> np.array([0,0,1,0,0,0,1,0,0,1]) | np.array([0,1,0,0,0,0,0,0,1,0])
array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1])

How can you turn an index array into a mask array in Numpy?

Is it possible to convert an array of indices to an array of ones and zeros, given the range?
e.g. [2,3] -> [0, 0, 1, 1, 0], in a range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499
Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can drop index_array and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False, True, True, True, True,
True, False, False, False, False, False], dtype=bool)
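Applied to the setup in the question, a minimal sketch (the helper name indices_to_mask and the placeholder data array are hypothetical). A boolean mask matters here: ~ on an integer 0/1 array is a bitwise NOT that yields -1 and -2, not a logical complement.

```python
import numpy as np

def indices_to_mask(index_array, n):
    """Hypothetical helper: boolean mask of length n, True at index_array."""
    mask = np.zeros(n, dtype=bool)
    mask[index_array] = True
    return mask

index_array = np.arange(200, 300)
mask_array = indices_to_mask(index_array, 500)

data = np.arange(500)             # placeholder data for illustration
train_part = data[mask_array]     # elements 200..299
predict_part = data[~mask_array]  # elements 0..199 and 300..499
print(train_part.size, predict_part.size)  # 100 400
```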
For a single dimension, try:
import numpy as np

n = (15,)
index_array = [2, 5, 7]
mask_array = np.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones with np.ravel_multi_index, then assign through the raveled array:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]] # you may need to transpose your indices!
mask_array = np.zeros(n)
flat_index_array = np.ravel_multi_index(
    index_array,
    mask_array.shape)
# np.ravel returns a view of this contiguous array, so the assignment writes into mask_array
np.ravel(mask_array)[flat_index_array] = 1
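A shorter alternative, sketched here as an option rather than the answer's method: converting the list of per-axis index lists to a tuple lets NumPy treat it as one index array per axis, so no flattening is needed.

```python
import numpy as np

n = (15, 15)
# Row indices in the first list, column indices in the second
index_array = [[1, 4, 6], [10, 11, 2]]

mask_array = np.zeros(n, dtype=bool)
# A tuple of index sequences is interpreted as one index per axis
mask_array[tuple(index_array)] = True

print(np.argwhere(mask_array).tolist())  # [[1, 10], [4, 11], [6, 2]]
```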
There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the approach Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.
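The same one-liner can be sketched with np.isin, which supersedes np.in1d in newer NumPy versions. In this particular example y < 0 is of course already the mask; the arange trick is useful when all you have is the index array.

```python
import numpy as np

x = np.linspace(-2, 2, 10)
y = x**2 - 1
# np.nonzero returns a tuple with one index array per axis; take the only one
idxs = np.nonzero(y < 0)[0]

# True at every position of arange(len(x)) that appears in idxs
mask = np.isin(np.arange(len(x)), idxs)
print(mask)
```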
As requested, here it is as an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like the one you asked for, but as a Python list of booleans instead of 0s and 1s (wrap it in np.array(...) to use it for indexing).
