numpy - Multidimensional boolean mask - python

I'm quite new to Python and numpy and I just cannot get this to work without manual iteration.
I have an n-dimensional data array with floating point values and an equally shaped boolean "mask" array. From these I need a new array of the same shape that holds the values from the data array wherever the mask array is True at the same position. Everything else should be 0:
# given
data = np.array([[1., 2.], [3., 4.]])
mask = np.array([[True, False], [False, True]])
# target
[[1., 0.], [0., 4.]]
Seems like numpy.where() might offer this but I could not get it to work.
Bonus: Don't create a new array, but replace the data values in place where mask is False, to avoid a new memory allocation.
Thanks!

This should work
data[~mask] = 0
A NumPy boolean array can be used as an index (https://docs.scipy.org/doc/numpy-1.15.0/user/basics.indexing.html#boolean-or-mask-index-arrays). The assignment is applied only to the elements where the index array is True. Because you want to modify the elements where the mask is False, you first need to invert the mask with ~ so that False becomes True.
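As a minimal sketch using the arrays from the question, the two variants asked about (a new array via np.where, and the in-place update) could look like this. The name masked_copy is only illustrative.

```python
import numpy as np

data = np.array([[1., 2.], [3., 4.]])
mask = np.array([[True, False], [False, True]])

# Out of place: take data where mask is True, 0.0 everywhere else.
masked_copy = np.where(mask, data, 0.0)

# In place: invert the mask and zero the selected elements of data itself.
data[~mask] = 0

print(masked_copy)
print(data)
```

np.where allocates a new result array, while the indexed assignment modifies data directly (the inverted mask ~mask is still a small temporary boolean array).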

Alternatively, you can just multiply them, because True and False are treated as 1 and 0 respectively when a boolean array is used in arithmetic operations. So,
#element-wise multiplication
data*mask
or
np.multiply(data, mask)
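One caveat worth checking before relying on multiplication: NaN or infinite values in data are not zeroed, because nan * 0 and inf * 0 both evaluate to nan. The sketch below uses made-up data (not from the question) to contrast this with np.where.

```python
import numpy as np

# Hypothetical data containing a NaN and an inf at masked-out positions.
data = np.array([[1., np.nan], [np.inf, 4.]])
mask = np.array([[True, False], [False, True]])

# inf * 0 triggers an "invalid value" warning; silence it for this demo.
with np.errstate(invalid="ignore"):
    product = data * mask

# np.where selects values instead of multiplying, so it yields clean zeros.
selected = np.where(mask, data, 0.0)

print(product)
print(selected)
```

If the data is known to be finite, both approaches give the same result.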

Related

Convert boolean tensor to binary in tensorflow

I have a Boolean tensor and I want to convert it to a binary tensor of ones and zeros.
To put it into context - I have the following tensor
[[ True False True]
[False True False]
[ True False True]]
which I need to turn into ones and zeros so that I can multiply it element-wise with a value tensor, i.e.:
[[1. 0.64082676 0.90568966]
[0.64082676 1. 0.37999165]
[0.90568966 0.37999165 1. ]]
I tried both these functions
masks = tf.map_fn(logical, masks, dtype=tf.float32)
masks = tf.vectorized_map(logical, masks)
with
#tf.function
def logical(x):
    if tf.equal(x, True):
        return zero
    return one
but unfortunately no luck. I also tried to multiply directly with the Boolean tensor but that was not allowed.
So any guidance on how to resolve this?
I think I solved it using this and some magic. Let penalties be the value tensor
test = tf.where(masks, penalties * 0.0, penalties * 1.0)
For people who need literally what the question asked:
tf.where(r, 1, 0)
where r is your boolean tensor

Is there a way to vectorize applying the mean function to masked regions in an ndarray?

Let's say I have two ndarrays defined as such:
import numpy as np
mask = np.array([[1,1],[1,2]])
values = np.array([[1., 3.],[2., 2.]])
My goal is to calculate the mean of the values based on the mask regions indicated by the integer in mask. Naturally, I would use a for-loop:
out = np.zeros(len(np.unique(mask)))
for j,i in enumerate(np.unique(mask)):
    out[j] = np.nanmean(values[mask==i])
However, this serialized solution becomes very slow for large, multidimensional arrays. Is there a way to vectorize this operation efficiently? Thank you for your help in advance!
You can use np.bincount:
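unq,inv,cnt = np.unique(mask,return_inverse=True,return_counts=True)
# ravel() the inverse: newer NumPy versions return it in the shape of mask, and bincount needs 1-D input
np.bincount(inv.ravel(),values.ravel())/cnt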
# array([2., 2.])
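For reference, here is the same idea as a self-contained sketch with longer, illustrative variable names. Note one difference from the loop in the question: np.bincount sums the weights as they are, so a NaN in values would propagate into that region's mean instead of being skipped as np.nanmean does.

```python
import numpy as np

mask = np.array([[1, 1], [1, 2]])
values = np.array([[1., 3.], [2., 2.]])

# labels: sorted unique region ids; inverse: index of each element's label;
# counts: number of elements per region.
labels, inverse, counts = np.unique(mask, return_inverse=True, return_counts=True)

# Sum the values per region, then divide by the region sizes.
sums = np.bincount(inverse.ravel(), weights=values.ravel())
means = sums / counts

print(labels)
print(means)
```

Region 1 covers the values 1, 3 and 2, and region 2 covers the single value 2, so both means come out as 2.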

NumPy - Faster Operations on Masked Array?

I have a numpy array:
import numpy as np
arr = np.random.rand(100)
If I want to find its maximum value, I run np.amax which runs 155,357 times a second on my machine.
However, for some reason, I have to mask some of its values. Let's, for example, mask just one cell:
import numpy.ma as ma
arr = ma.masked_array(arr, mask=[0]*99 + [1])
Now, finding the max is much slower, running 26,574 times a second.
This is only 17% of the speed of this operation on a non-masked array.
Other operations, such as subtract, add, and multiply, are affected too. Although on a masked array they operate on ALL OF THE VALUES, they run at only about 3% of the speed of the same operations on a non-masked array (15,343/497,663).
I'm looking for a faster way to operate on masked arrays like this, whether it's using numpy or not.
(I need to run this on real data, which is arrays with multiple dimensions, and millions of cells)
MaskedArray is a subclass of the base numpy ndarray. It does not have compiled code of its own. Look at the numpy/ma/ directory for details, or the main file:
/usr/local/lib/python3.6/dist-packages/numpy/ma/core.py
A masked array has two key attributes, data and mask: one is the data array you used to create it, the other a boolean array of the same size.
So all operations have to take those two arrays into account. Not only does it calculate new data, it also has to calculate a new mask.
It can take several approaches (depending on the operation):
use the data as is
use compressed data - a new array with the masked values removed
use filled data, where the masked values are replaced by the fillvalue or some innocuous value (e.g. 0 when doing addition, 1 when doing multiplication).
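The three approaches listed above can be inspected directly on a small masked array. This is only an illustrative sketch with made-up values. It shows what each view of the data contains, not which one a given numpy.ma operation picks internally.

```python
import numpy as np

# Hypothetical example: the third value (3.0) is masked.
marr = np.ma.masked_array([1., 2., 3., 4.], mask=[0, 0, 1, 0])

raw = marr.data                # the data as is, masked value included
compressed = marr.compressed() # new array with the masked value removed
filled_add = marr.filled(0)    # masked value replaced by 0 (neutral for a sum)
filled_mul = marr.filled(1)    # masked value replaced by 1 (neutral for a product)

print(raw, compressed, filled_add, filled_mul)
print(filled_add.sum(), filled_mul.prod())
```

Summing filled_add or multiplying filled_mul gives the same numbers as marr.sum() and marr.prod().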
The number of masked values, whether none or all, makes little, if any, difference in speed.
So the speed differences that you see are not surprising. There's a lot of extra calculation going on. The ma.core.py file says this package was first developed in pre-numpy days, and incorporated into numpy around 2005. While there have been changes to keep it up to date, I don't think it has been significantly reworked.
Here's the code for np.ma.max method:
def max(self, axis=None, out=None, fill_value=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}
    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    if fill_value is None:
        fill_value = maximum_fill_value(self)
    # No explicit output
    if out is None:
        result = self.filled(fill_value).max(
            axis=axis, out=out, **kwargs).view(type(self))
        if result.ndim:
            # Set the mask
            result.__setmask__(newmask)
            # Get rid of Infs
            if newmask.ndim:
                np.copyto(result, result.fill_value, where=newmask)
        elif newmask:
            result = masked
        return result
    # Explicit output
    ....
The key steps are
fill_value = maximum_fill_value(self) # depends on dtype
self.filled(fill_value).max(
axis=axis, out=out, **kwargs).view(type(self))
You can experiment with filled to see what happens with your array.
In [40]: arr = np.arange(10.)
In [41]: arr
Out[41]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [42]: Marr = np.ma.masked_array(arr, mask=[0]*9 + [1])
In [43]: Marr
Out[43]:
masked_array(data=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, --],
mask=[False, False, False, False, False, False, False, False,
False, True],
fill_value=1e+20)
In [44]: np.ma.maximum_fill_value(Marr)
Out[44]: -inf
In [45]: Marr.filled()
Out[45]:
array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
8.e+00, 1.e+20])
In [46]: Marr.filled(_44)
Out[46]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., -inf])
In [47]: arr.max()
Out[47]: 9.0
In [48]: Marr.max()
Out[48]: 8.0
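If the mask bookkeeping is not needed, the same fill-then-reduce idea can be applied to plain ndarrays by keeping the mask as a separate boolean array. The following is a sketch under that assumption, not a drop-in replacement for MaskedArray. Whether it is actually faster on your data would need to be timed.

```python
import numpy as np

arr = np.arange(10.)
mask = np.array([False] * 9 + [True])  # True marks a value to ignore

# Same idea as filled(-inf).max(): overwrite ignored entries with -inf, then reduce.
max_via_fill = np.where(mask, -np.inf, arr).max()

# Alternative: let the reduction skip ignored entries via the where= argument.
# initial= is required so the result is defined even if everything is ignored.
max_via_where = arr.max(where=~mask, initial=-np.inf)

print(max_via_fill, max_via_where)
```

Both expressions return 8.0 here, matching Marr.max() above.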

Numpy zeros 2d array: substituting elements at specific indices

For a function I have to write again for CodeSignal, I create an 'empty' matrix with numpy called 'result'. During the course of a for loop, I want to add 1s to certain elements of this zeros matrix:
matrix = [[True, False, False],
[False, True, False],
[False, False, False]]
matrix = np.array(matrix) ## input matrix
(row, col) = matrix.shape
result = np.zeros((row,col), dtype=int) ## made empty matrix of same size
for i in range(0, row):
    for j in range(0, col):
        mine = matrix[i,j],[i,j]
        if mine[0] == True: ##for indices in input matrix where element is called True..
            result[i+1,j+1][i+1,j+1] = 1 ##..replace neighbouring elements with 1 (under construction ;) )
print(result)
My very first problem comes with the last part, substituting elements at given indices with another value.
E.g. result[1,1][1,1] = 1
I always get the error
TypeError: object does not support item assignment
and this happened after setting np.zeros to various object types - int32, int8, complex, float64...
If I try:
E.g. result[1,1][1,1] == 1
I get:
IndexError: invalid index to scalar variable.
So what is the way to change or add elements to 2d np arrays at specific locations?
It makes no sense to write:
matrix[i,j][i,j]
The matrix is a 2d array, which means that matrix[i,j] is a scalar, not an array. Indexing that scalar again, as in 0[i,j], is nonsensical.
You can implement this as:
for i in range(row-1):
    for j in range(col-1):
        if matrix[i,j]:
            result[i+1,j+1] = 1
Here you "shift" the values of matrix one position to the right and one down. It is better, however, to do this with a slice assignment:
result[1:,1:] = matrix[:-1,:-1]
This then gives us:
>>> result
array([[0., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
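Put together with the input from the question, a self-contained sketch of both variants looks like this (the names looped and sliced are illustrative). With dtype=int, as in the question, the result prints as integers rather than floats.

```python
import numpy as np

matrix = np.array([[True, False, False],
                   [False, True, False],
                   [False, False, False]])
row, col = matrix.shape

# Loop version: set the lower-right neighbour of every True element.
looped = np.zeros((row, col), dtype=int)
for i in range(row - 1):
    for j in range(col - 1):
        if matrix[i, j]:
            looped[i + 1, j + 1] = 1

# Slice version: copy the top-left block into the bottom-right block.
sliced = np.zeros((row, col), dtype=int)
sliced[1:, 1:] = matrix[:-1, :-1]

print(sliced)
```

Both arrays are identical. The slice assignment simply does the shift in a single vectorized step.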

Is there a quick way to ignore certain values of an array when performing numpy operations?

Sorry if the title is confusing, but it is very hard to put what I would like to do in a single sentence. Imagine you have an image stack, stack, in the form of N m x n matrices stored as a numpy array of shape (m, n, N). If I want to apply numpy.median, for example, along the stack axis N, it is very easy: numpy.median(stack, 0). The problem is that for each image of the stack I also have a mask of pixels that I would not like to include in the operation, in this case numpy.median. Is there an efficient way to do that?
So far, all I could think of is this, but it is incredibly slow and absolutely not feasible:
median = [[] for _ in range(images[0].size)]  # one independent list per pixel
for i in range(len(images)):
    image = images[i].flatten()
    m = mask[i].flatten()
    for j in range(len(median)):
        if m[j] == 0:  # keep only the pixels that are not masked out
            median[j].append(image[j])
for j in range(len(median)):
    median[j] = np.median(median[j]) if median[j] else 0
median = np.array(median).reshape(images[0].shape)
There has to be a better way.
What you can do is build an array with NaNs at the positions you want to ignore and compute np.nanmedian (which ignores NaNs). You can build such an array "on the fly" using np.where, which here keeps x where m is True and puts NaN everywhere else:
x = np.arange(4*3*4).reshape((4,3,4))
m = x%2 == 0
np.nanmedian(np.where(m, x, np.nan), axis=2)
>>array([[ 1., 5., 9.],
[13., 17., 21.],
[25., 29., 33.],
[37., 41., 45.]])
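Applied to the setup in the question, this might look like the sketch below. It assumes the stack is stored with shape (N, m, n), as the question's loop over images suggests, and that nonzero mask entries mark pixels to exclude. The sample values are made up.

```python
import numpy as np

# Hypothetical stack of N=3 images, each 2 x 2, with shape (N, m, n).
stack = np.array([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]],
                  [[9., 10.], [11., 12.]]])

# Assumed convention: nonzero mask entries mark pixels to exclude.
mask = np.zeros(stack.shape, dtype=int)
mask[2, 0, 0] = 1  # ignore the value 9. at pixel (0, 0) of the third image

# Replace excluded pixels by NaN, then take the NaN-aware median over the stack axis.
masked_median = np.nanmedian(np.where(mask == 0, stack, np.nan), axis=0)

print(masked_median)
```

Pixel (0, 0) is the median of 1 and 5 only, i.e. 3. A pixel that is masked in every image would come out as NaN (with a warning) and could be mapped to 0 afterwards, for example with np.nan_to_num.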
I have a hard time understanding what you are trying to say, but hopefully this will help:
You can use np.where to find and replace - or ignore/remove - values that you want to exclude.
Or you can use bin_mask = stack != value_you_want_to_ignore to get a boolean array that you can use to ignore your critical values.
