Weird issue masking 3D arrays iteratively in Python loop - python

I have a 3D array, containing 10 2D maps of the world. I created a mask of the oceans, and I am trying to create a second array, identical to my first 3D array, but where the oceans are masked for each year. I thought that this should work:
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_year[i,:,:] = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
where SIF_year0 is the initial 3D array, and SIF_year is the version that has been masked. However, SIF_year comes out looking just like SIF_year0. Interestingly, if I do:
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
then SIF_test is the masked 2D array I need. I have tried saving the masked array to SIF_test and then resaving it into SIF_year[i,:,:], but then SIF_year looks like SIF_year0 again!
There must be some obvious mistake I'm missing...

I think I have solved the problem by adding an extra step in the loop that replaces the masked values with np.nan using np.ma.filled (https://docs.scipy.org/doc/numpy/reference/routines.ma.html):
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
    SIF_year[i,:,:] = np.ma.filled(SIF_test, np.nan)
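The behaviour described in the question is consistent with a plain ndarray having no place to store a mask: assigning a masked array into a slice of SIF_year keeps only the underlying data. The sketch below shows two loop-free alternatives. The array shapes and random data are hypothetical stand-ins for the question's SIF_year0 and mask_global_land, and the mask is assumed to be True over land.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays (small shapes for illustration)
rng = np.random.default_rng(0)
SIF_year0 = rng.random((10, 4, 5))            # 10 yearly maps of 4x5 cells
mask_global_land = rng.random((4, 5)) > 0.5   # assumed: True over land, False over ocean

# Option 1: plain float array with NaN over the oceans.
# The 2D mask broadcasts against the last two axes of the 3D array.
SIF_year_nan = np.where(mask_global_land, SIF_year0, np.nan)

# Option 2: a genuine masked array. The mask must have the data's shape,
# so the 2D ocean mask is broadcast to 3D and copied into a writable array.
ocean_3d = np.broadcast_to(~mask_global_land, SIF_year0.shape).copy()
SIF_year_ma = np.ma.array(SIF_year0, mask=ocean_3d)
```

Option 1 gives the same result as the loop with np.ma.filled; Option 2 keeps the mask, so reductions such as SIF_year_ma.mean(axis=0) skip the ocean cells.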

Related

Mask a three dimensional array to perform segmentation

I'm working with Python. I want to know if there is a Pythonic way to mask a 3D array XYZ (a volumetric image) in order to perform a segmentation analysis such as skeletonization.
I'm handling a 600x600x300 array, so reducing the number of candidates to be analyzed is key to performance. I tried array[mask], but the result collapses to a 1D array. The np.where approach, as in "How to Correctly mask 3D Array with numpy", changes one value at a time, but skeletonization needs to analyze each voxel's neighbors.
Edit: This is something simple but it might help you to get the idea. It's to create a 3d AOI inside a volume.
import numpy as np
from skimage import morphology
# create array with random numbers
Array = np.random.random([10, 10, 10])
# create a boolean mask of zeros
maskArr = np.zeros_like(Array, dtype=bool)
# set a few values in the mask to True
maskArr[1:8,1:5,1:3] = True
# try to analyze the data with the mask (boolean indexing flattens the result to 1D)
process = morphology.skeletonize(Array[maskArr])
This is the error caused by the 1D array:
ValueError: skeletonize requires a 2D or 3D image as input, got 1D.
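One way to keep the 3D shape, sketched here under the assumption that values outside the area of interest can simply be treated as background, is to zero them out with np.where and then crop to the bounding box of the mask so that fewer voxels need to be analyzed. The names volume, masked_volume and sub_volume are illustrative and not from the question.

```python
import numpy as np

volume = np.random.random((10, 10, 10))
mask = np.zeros_like(volume, dtype=bool)
mask[1:8, 1:5, 1:3] = True

# Keep the 3D shape: values outside the mask become 0 (assumed background)
masked_volume = np.where(mask, volume, 0.0)

# Crop to the bounding box of the mask to reduce the work downstream
coords = np.argwhere(mask)                 # (n_true, 3) indices of True voxels
lo = coords.min(axis=0)
hi = coords.max(axis=0) + 1                # exclusive upper bounds
box = tuple(slice(a, b) for a, b in zip(lo, hi))
sub_volume = masked_volume[box]            # still 3D, here of shape (7, 4, 2)
```

sub_volume is still a 3D array, so it could be thresholded and passed to a neighborhood-based routine such as skeletonize; the offsets in lo map results back into the full volume.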

Efficiently filter 3D matrix in numpy with variable 2D masks

I have a 3D numpy array points of shape [10000x3000x128], where the first dimension is the number of frames, the second is the number of points in each frame, and the third is a 128-element feature vector associated with each point. I want to efficiently filter the points in each frame using a 2D boolean mask of shape [10000x3000], and for each selected point also keep its 128-dimensional feature vector. Moreover, the output should still be a 3D array rather than a merged 2D array, and I would like to avoid any for loop if possible.
What I'm currently doing is:
# example of points (placeholder data with the shape described above)
points = np.zeros((10000, 3000, 128))
# fg, bg = 2D boolean np.array masks
# init empty lists
fg_points, bg_points = [], []
for i in range(points.shape[0]):
    fg_mask_tmp, bg_mask_tmp = fg[i], bg[i]
    fg_points.append(points[i,fg_mask_tmp,:])
    bg_points.append(points[i,bg_mask_tmp,:])
fg_features, bg_features = np.array(fg_points), np.array(bg_points)
But this is quite a naive solution that can surely be improved in a more numpy-like way.
In addition, I also tried other solutions such as:
fg_features = points[fg,:]
But this solution does not preserve the dimensions of the array: it merges the first two dimensions, since the number of filtered points can vary from frame to frame.
Another solution I tried is to enlarge the 2D masks by appending a [128] true value to the last dimension, but without any success.
Does anyone know a possible efficient solution?
Thank you in advance for any help!
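Because each frame can keep a different number of points, the result cannot be a regular 3D array without padding. A possible loop-free sketch is to select everything at once with points[fg] and then split the flat result by per-frame counts, or to keep the full 3D shape and zero out the rejected points. The small shapes and random data below are hypothetical stand-ins for the arrays in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((4, 6, 3))      # small stand-in for the 10000x3000x128 array
fg = rng.random((4, 6)) > 0.5       # 2D boolean mask, one row per frame

# Boolean indexing walks the mask in row-major order, so the selected
# rows come out grouped frame by frame.
selected = points[fg]                               # shape (n_selected_total, 3)
counts = fg.sum(axis=1)                             # selected points per frame
fg_points = np.split(selected, np.cumsum(counts)[:-1])  # list of (n_i, 3) arrays

# Alternative that keeps a regular 3D shape: zero out unselected points
fg_zeroed = np.where(fg[..., None], points, 0.0)
```

fg_points is a list with one array per frame, matching what the loop builds; fg_zeroed trades memory for a fixed shape.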

Taking mean of numpy ndarray with masked elements

I have an MxN array of values taken from an experiment. Some of these values are invalid and are set to 0 to indicate this. I can construct a mask of valid/invalid values using
mask = (mat1 == 0) & (mat2 == 0)
which produces an MxN array of bool. It should be noted that the masked locations do not neatly follow columns or rows of the matrix - so simply cropping the matrix is not an option.
Now, I want to take the mean along one axis of my array (e.g., end up with a 1xN array) while excluding those invalid values from the mean calculation. Intuitively I thought
np.mean(mat1[mask],axis=1)
should do it, but the mat1[mask] operation produces a 1D array which appears to just be the elements where mask is true - which doesn't help when I only want a mean across one dimension of the array.
Is there a 'python-esque' or numpy way to do this? I suppose I could use the mask to set masked elements to NaN and use np.nanmean - but that still feels kind of clunky. Is there a way to do this 'cleanly'?
I think the best way to do this would be something along the lines of:
masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)
Then take the mean with
masked.mean(axis=1)
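As a small worked example of the masked-array approach, the sketch below uses a single hypothetical 2x3 matrix in which 0 marks an invalid entry; the two-matrix condition from the question would be passed to masked_where in the same way.

```python
import numpy as np

# Hypothetical data: zeros mark invalid measurements
mat1 = np.array([[1.0, 0.0, 3.0],
                 [4.0, 5.0, 0.0]])

invalid = mat1 == 0
masked = np.ma.masked_where(invalid, mat1)

# Masked entries are ignored: the row means are 2.0 and 4.5
row_means = masked.mean(axis=1)

# Convert back to a plain ndarray (NaN where a whole row was masked)
row_means_plain = row_means.filled(np.nan)
```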
One similarly clunky but efficient way is to multiply your array with the mask, setting the masked values to zero. Then of course you'll have to divide by the number of non-masked values manually. Hence clunkiness. But this will work with integer-valued arrays, something that can't be said about the nan case. It also seems to be fastest for both small and larger arrays (including the masked array solution in another answer):
import numpy as np
def nanny(mat, mask):
    mat = mat.astype(float).copy() # don't mutate the original
    mat[~mask] = np.nan # mask values
    return np.nanmean(mat, axis=0) # compute mean
def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)
# set up dummy data for testing
N,M = 400,400
mat1 = np.random.randint(0,N,(N,M))
mask = np.random.randint(0,2,(N,M)).astype(bool)
print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True

Convert list of NDarrays to dataframe

I'm trying to convert a list of ND arrays to a dataframe in order to run an Isomap on it, but the conversion fails. Does anyone know how to convert it so that I can run an Isomap on it?
# Creation and filling of list samples
samples = list()
for i in range(72):
    img = misc.imread('Datasets/ALOI/32/32_r'+str(i*5)+'.png')
    samples.append(img)
...
df = pd.DataFrame(samples)  # This doesn't work; it gives
# ValueError: Must pass 2-d input
...
iso = manifold.Isomap(n_neighbors=4, n_components=3)
iso.fit(df) #The end goal of my DataFrame
That is obvious, isn't it? All images are 2D data, rows and columns. Stacking them in a list causes it to gain a third dimension. DataFrames are by nature 2D. Hence the error.
You have 2 possible fixes:
Create a Panel (note that Panel has been removed from recent pandas releases, so this only works on old versions).
wp = pd.Panel.from_dict(dict(zip([str(i*5) for i in range(72)], samples)))
Stack your arrays one on top of the other, or side by side:
# On top of another:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=0,
               keys=[str(i*5) for i in range(72)])
# Side by side:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=1,
               keys=[str(i*5) for i in range(72)])
Another way to do it is to convert your 2D arrays (images) to 1D arrays (that are expected by sklearn) using the reshape method on the images:
for i in range(yourRange):
    img = misc.imread(yourFile)
    samples.append(img.reshape(-1))
df = pd.DataFrame(samples)
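The idea of one flattened image per row can also be sketched without a loop by stacking the images and reshaping once. The synthetic 4x5 arrays below are hypothetical stand-ins for the loaded images, which are assumed to all have the same shape.

```python
import numpy as np

# Hypothetical stand-ins for three equally sized grayscale images
samples = [np.full((4, 5), i, dtype=np.uint8) for i in range(3)]

# Stack into (n_images, height, width), then flatten each image into one row
X = np.stack(samples).reshape(len(samples), -1)   # shape (3, 20)
```

X is a regular 2D array with one row per image, which is the layout pd.DataFrame and scikit-learn estimators such as Isomap expect.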
Olivera almost had it.
the problem
When you run misc.imread, the output is an NxM (2D) array. Putting these arrays in a list makes the data 3D, but DataFrame expects a 2D input.
the fix
Before it goes in the list, the array should be 'flattened' using ravel:
img =misc.imread('Datasets/ALOI/32/32_r'+str(i*5)+'.png' ).ravel()
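a note on .reshape(-1)
On a NumPy array, .reshape(-1) also returns a flat 1D array of length N*M, so it gives the same result as ravel() here. What would not work is a reshape that keeps two dimensions, such as .reshape(-1, 1), which produces an (N*M)x1 array per image instead of the flat row that DataFrame needs.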

How to "remove" mask from numpy array after performing operations?

I have a 2D numpy array that I need to mask based on a condition so that I can apply an operation to the masked array then revert the masked values back to the original.
For example:
import numpy as np
array = np.random.random((3,3))
condition = np.random.randint(0, 2, (3,3))
masked = np.ma.array(array, mask=condition)
masked += 2.0
But how can I change the masked values back to the original and "remove" the mask after applying a given operation to the masked array?
The reason why I need to do this is that I am generating a boolean array based on a set of conditions and I need to modify the elements of the array that satisfy the condition.
I could use boolean indexing to do this with a 1D array, but with the 2D array I need to retain its original shape, i.e., not return a 1D array with only the values satisfying the condition(s).
The accepted answer doesn't answer the question. Assigning the mask to False works in practice but many algorithms do not support masked arrays (e.g. scipy.linalg.lstsq()) and this method doesn't get rid of it. So you will experience an error like this:
ValueError: masked arrays are not supported
The only way to really get rid of the mask is assigning the variable only to the data of the masked array.
import numpy as np
array = np.random.random((3,3))
condition = np.random.randint(0, 2, (3,3))
masked = np.ma.array(array, mask=condition)
masked += 2.0
masked.mask = False
hasattr(masked, 'mask')
>> True
Assigning the variable to the data using the MaskedArray data attribute:
masked = masked.data
hasattr(masked, 'mask')
>> False
You already have it: it's called array!
This is because, while masked makes sure you only increment certain values in the matrix, the data is never actually copied: by default np.ma.array wraps the original array rather than copying it. So once your code executes, array has its unmasked elements (those where condition is 0) incremented, and the masked ones remain unchanged.
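The sketch below checks this behaviour on a small example, assuming the default copy=False of np.ma.array and a float input array. The variable original is added only to compare values before and after, and plain is an illustrative name for the unmasked result.

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.random((3, 3))
original = array.copy()                       # kept only for comparison
condition = rng.integers(0, 2, (3, 3)).astype(bool)

# With the default copy=False, the masked array wraps `array` without copying
masked = np.ma.array(array, mask=condition)
masked += 2.0                                 # only unmasked entries change

# `.data` returns the underlying values as a plain ndarray with no mask
plain = masked.data
```

After this runs, array itself holds the updated values in its original 3x3 shape, and plain can be passed to routines that reject masked arrays.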
