Always yield a masked array with netCDF4 - python

Question:
Is there a way of forcing netCDF4 to always output a masked array, regardless of whether the slice contains any fill values?
Background:
I have a netCDF dataset of gridded values over time, which I read using the netCDF4 package.
nc_data = netCDF4.Dataset('file.nc', 'r')
The initial timesteps yield masked arrays:
var1_t0 = nc_data.variables['var1'][0][:]
var1_t0
masked_array(...)
The later timesteps yield standard ndarrays:
var1_t200 = nc_data.variables['var1'][200][:]
var1_t200
ndarray(...)
Desired result:
I would like masked arrays for the latter with a mask of all False, rather than a standard ndarray.

I don't think this is directly possible, but you can work around it by creating a masked array when necessary:
var1_t0 = nc_data.variables['var1'][0][:]
if not isinstance(var1_t0, numpy.ma.MaskedArray):
    # plain ndarray: wrap it with an all-False mask
    var1_t0 = numpy.ma.MaskedArray(var1_t0, mask=numpy.zeros(var1_t0.shape, dtype=bool))
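The same idea can be wrapped in a small helper so that every slice goes through one code path. This is only a sketch: `as_full_masked` is a hypothetical name, and the two arrays below are stand-ins for the netCDF4 slices, not data from the question. It relies on `numpy.ma.getmaskarray`, which returns a full boolean array even when the input has no mask.

```python
import numpy as np

def as_full_masked(values):
    """Return `values` as a MaskedArray whose mask is a full boolean array."""
    masked = np.ma.asarray(values)
    # getmaskarray() expands the `nomask` sentinel into an all-False array
    full_mask = np.ma.getmaskarray(masked)
    return np.ma.MaskedArray(masked.data, mask=full_mask)

# Stand-ins for the two kinds of slices described in the question
plain = np.arange(6.0).reshape(2, 3)
already_masked = np.ma.masked_equal(plain, 0.0)

wrapped_plain = as_full_masked(plain)
wrapped_masked = as_full_masked(already_masked)
```

Existing masks are preserved, and plain arrays come back with a mask of all False that has the same shape as the data.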

Related

What is the fastest way to read in an image to an array of tuples?

I am trying to assign provinces to an area for use in a game mod. I have two separate maps for area and provinces.
provinces file,
area file.
Currently I am reading in an image in Python and storing it in an array using PIL like this:
import numpy as np
from PIL import Image
land_prov_pic = Image.open(INPUT_FILES_DIR + land_prov_str)
land_prov_array = np.array(land_prov_pic)
image_size = land_prov_pic.size
for x in range(image_size[0]):
    if x % 100 == 0:
        print(x)
    for y in range(image_size[1]):
        land_prov_array[x][y] = land_prov_pic.getpixel((x,y))
Where you end up with land_prov_array[x][y] = (R,G,B)
However, this gets really slow, especially for large images. I tried reading it in using OpenCV like this:
import cv2
land_prov_array = cv2.imread(INPUT_FILES_DIR + land_prov_str)
land_prov_array = cv2.cvtColor(land_prov_array, cv2.COLOR_BGR2RGB) #Convert from BGR to RGB
But now land_prov_array[x][y] = [R G B] which is an ndarray and can't be inserted into a set. But it's way faster than the previous for loop. How do I convert [R G B] to (R,G,B) for every element in the array without for loops or, better yet, read it in that way?
EDIT: Added pictures, more description, and code blocks for readability.
It is best to convert the [R,G,B] array to tuple when you need it to be a tuple, rather than converting the whole image to this form. An array of tuples takes up a lot more memory, and will be a lot slower to process, than a numeric array.
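As a rough illustration of converting only when needed, the sketch below keeps the image numeric, turns a single pixel into a tuple on demand, and builds a set of distinct colours by deduplicating with NumPy first. The tiny `image` array is a made-up stand-in for the result of `cv2.imread`, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical 2x3 RGB image standing in for the array returned by cv2.imread
image = np.array(
    [[[255, 0, 0], [0, 255, 0], [255, 0, 0]],
     [[0, 0, 255], [0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)

# One pixel converted on demand
pixel = tuple(image[0, 1].tolist())

# Distinct colours: deduplicate numerically, then convert the few survivors
unique_rows = np.unique(image.reshape(-1, 3), axis=0)
unique_colours = {tuple(row) for row in unique_rows.tolist()}
```

Only the unique rows are converted to Python tuples, so the cost of the conversion scales with the number of distinct colours rather than the number of pixels.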
The answer by isCzech shows how to create a NumPy view over a 3D array that presents the data as if it were a 2D array of tuples. This might not require the additional memory of an actual array of tuples, but it is still a lot slower to process.
Most importantly, most NumPy functions (such as np.mean) and operators (such as +) cannot be applied to such an array. Thus, one is obliged to iterate over the array in Python code (or with an np.vectorize function), which is a lot less efficient than using NumPy functions and operators that work on the array as a whole.
For transformation from a 3D array (data3D) to a 2D array (data2D), I've used this approach:
import numpy as np
dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
data2D = data3D.view(dtype=dt).squeeze()
The .view call changes the data type and still returns a 3D array, with a last dimension of size 1 that can then be removed by .squeeze. Alternatively, you can use .squeeze(axis=-1) to squeeze only the last dimension (in case some of your other dimensions are of size 1 too).
Please note I've used uint8 ('u1') - your type may be different.
Trying to do this using a loop is very slow, indeed (compared to this approach at least).
Similar question here: Show a 2d numpy array where contents are tuples as an image
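A self-contained version of the structured-view approach might look like the following. The small `data3D` array is invented for illustration, and the `np.ascontiguousarray` call is an added precaution on the assumption that the input may not be contiguous.

```python
import numpy as np

# Hypothetical small RGB image; np.ascontiguousarray guards against
# non-contiguous input, which .view() with a wider dtype cannot handle
data3D = np.ascontiguousarray(
    np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
)

dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
data2D = data3D.view(dtype=dt).squeeze(axis=-1)

first_pixel = data2D[0, 0].item()          # a plain Python tuple
colour_set = set(data2D.ravel().tolist())  # tuples are hashable
```

The view shares memory with `data3D`, so no pixel data is copied until `.tolist()` or `.item()` is called.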

Taking mean of numpy ndarray with masked elements

I have an MxN array of values taken from an experiment. Some of these values are invalid and are set to 0 to indicate such. I can construct a mask of valid/invalid values using
mask = (mat1 == 0) & (mat2 == 0)
which produces an MxN array of bool. It should be noted that the masked locations do not neatly follow columns or rows of the matrix - so simply cropping the matrix is not an option.
Now, I want to take the mean along one axis of my array (e.g. end up with a 1xN array) while excluding those invalid values in the mean calculation. Intuitively I thought
np.mean(mat1[mask],axis=1)
should do it, but the mat1[mask] operation produces a 1D array which appears to just be the elements where mask is true - which doesn't help when I only want a mean across one dimension of the array.
Is there a 'python-esque' or numpy way to do this? I suppose I could use the mask to set masked elements to NaN and use np.nanmean - but that still feels kind of clunky. Is there a way to do this 'cleanly'?
I think the best way to do this would be something along the lines of:
masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)
Then take the mean with
masked.mean(axis=1)
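A small worked example of the masked-array approach, using made-up 2x3 data in which a cell is invalid when it is zero in both matrices:

```python
import numpy as np

mat1 = np.array([[1.0, 0.0, 3.0],
                 [4.0, 5.0, 0.0]])
mat2 = np.array([[2.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])

invalid = (mat1 == 0) & (mat2 == 0)
masked = np.ma.masked_where(invalid, mat1)

row_means = masked.mean(axis=1)   # one value per row: 2.0 and 4.5
col_means = masked.mean(axis=0)   # one value per column: 2.5, 5.0 and 3.0
```

Each mean is taken over the unmasked entries only, so the first row averages 1 and 3 and ignores the invalid zero.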
One similarly clunky but efficient way is to multiply your array with the mask, setting the masked values to zero. Then of course you'll have to divide by the number of non-masked values manually. Hence clunkiness. But this will work with integer-valued arrays, something that can't be said about the nan case. It also seems to be fastest for both small and larger arrays (including the masked array solution in another answer):
import numpy as np
def nanny(mat, mask):
    mat = mat.astype(float).copy() # don't mutate the original
    mat[~mask] = np.nan # mask values
    return np.nanmean(mat, axis=0) # compute mean
def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)
# set up dummy data for testing
N,M = 400,400
mat1 = np.random.randint(0,N,(N,M))
mask = np.random.randint(0,2,(N,M)).astype(bool)
print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True

Weird issue masking 3D arrays iteratively in Python loop

I have a 3D array, containing 10 2D maps of the world. I created a mask of the oceans, and I am trying to create a second array, identical to my first 3D array, but where the oceans are masked for each year. I thought that this should work:
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_year[i,:,:] = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
where SIF_year0 is the initial 3D array, and SIF_year is the version that has been masked. However, SIF_year comes out looking just like SIF_year0. Interestingly, if I do:
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
then SIF_test is the masked 2D array I need. I have tried saving the masked array to SIF_test and then resaving it into SIF_year[i,:,:], but then SIF_year looks like SIF_year0 again!
There must be some obvious mistake I'm missing...
I think I have solved the problem by adding an extra step in the loop that replaces the masked values with np.nan using ma.filled (https://docs.scipy.org/doc/numpy/reference/routines.ma.html):
SIF_year = np.ndarray((SIF_year0.shape))
for i in range(0,SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask = np.logical_not(mask_global_land))
    SIF_year[i,:,:] = np.ma.filled(SIF_test, np.nan)
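The original loop loses the mask because a plain ndarray has nowhere to store one: assigning a masked array into a slice of it copies only the data. One possible loop-free alternative, sketched here with small invented arrays in place of the real maps, is to broadcast the 2D mask across the year axis and build a single 3D masked array.

```python
import numpy as np

# Hypothetical stand-ins: 3 "years" of a 2x2 map, and a land mask
SIF_year0 = np.arange(12, dtype=float).reshape(3, 2, 2)
mask_global_land = np.array([[True, False],
                             [False, True]])

# Broadcast the 2D ocean mask across the leading (year) axis
ocean_3d = np.broadcast_to(~mask_global_land, SIF_year0.shape).copy()
SIF_masked = np.ma.array(SIF_year0, mask=ocean_3d)

# Plain ndarray with NaN over the oceans, equivalent to the loop result
SIF_year = SIF_masked.filled(np.nan)
```

`SIF_masked` keeps the mask for further masked-array operations, while `SIF_year` is the NaN-filled ndarray that the loop version produces.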

How to "remove" mask from numpy array after performing operations?

I have a 2D numpy array that I need to mask based on a condition so that I can apply an operation to the masked array then revert the masked values back to the original.
For example:
import numpy as np
array = np.random.random((3,3))
condition = np.random.randint(0, 2, (3,3))
masked = np.ma.array(array, mask=condition)
masked += 2.0
But how can I change the masked values back to the original and "remove" the mask after applying a given operation to the masked array?
The reason why I need to do this is that I am generating a boolean array based on a set of conditions and I need to modify the elements of the array that satisfy the condition.
I could use boolean indexing to do this with a 1D array, but with the 2D array I need to retain its original shape ie. not return a 1D array with only the values satisfying the condition(s).
The accepted answer doesn't answer the question. Assigning the mask to False works in practice but many algorithms do not support masked arrays (e.g. scipy.linalg.lstsq()) and this method doesn't get rid of it. So you will experience an error like this:
ValueError: masked arrays are not supported
The only way to really get rid of the mask is assigning the variable only to the data of the masked array.
import numpy as np
array = np.random.random((3,3))
condition = np.random.randint(0, 2, (3,3))
masked = np.ma.array(array, mask=condition)
masked += 2.0
masked.mask = False
hasattr(masked, 'mask')
>> True
Assigning the variable to the data using the MaskedArray data attribute:
masked = masked.data
hasattr(masked, 'mask')
>> False
You already have it: it's called array!
This is because masked only ensures that you increment certain values in the matrix; the data is never actually copied. So once your code executes, array has the unmasked elements (where condition is 0) incremented, and the masked elements remain unchanged.
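The shared-memory behaviour can be checked directly. The sketch below uses a fixed `condition` instead of a random one so the result is reproducible, and it also shows that boolean-index assignment gives the same update while keeping the 2D shape. The names `original`, `shares` and `direct` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.random((3, 3))
original = array.copy()
condition = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=bool)

masked = np.ma.array(array, mask=condition)   # copy=False by default
masked += 2.0                                 # in place, skips masked cells

shares = np.shares_memory(array, masked.data)

# The same update without a masked array: boolean-index assignment
# modifies the selected cells and keeps the 2D shape
direct = original.copy()
direct[~condition] += 2.0
```

After this runs, `array` itself holds the updated values, so there is no mask left to remove.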

numpy create 2D mask from list of indices [+ then draw from masked array]

I have a 2-D array of values and need to mask certain elements of that array (with indices taken from a list of ~ 100k tuple-pairs) before drawing random samples from the remaining elements without replacement.
I need something that is both quite fast/efficient (hopefully avoiding for loops) and has a small memory footprint because in practice the master array is ~ 20000 x 20000.
For now I'd be content with something like (for illustration):
xys=[(1,2),(3,4),(6,9),(7,3)]
gxx,gyy=numpy.mgrid[0:100,0:100]
mask = numpy.where((gxx,gyy) not in set(xys)) # The bit I can't get right
# Now sample the masked array
draws=numpy.random.choice(master_array[mask].flatten(),size=40,replace=False)
Fortunately for now I don't need the x,y coordinates of the drawn fluxes - but bonus points if you know an efficient way to do this all in one step (i.e. it would be acceptable for me to identify those coordinates first and then use them to fetch the corresponding master_array values; the illustration above is a shortcut).
Thanks!
Linked questions:
Numpy mask based on if a value is in some other list
Mask numpy array based on index
Implementation of numpy in1d for 2D arrays?
You can do it efficiently using a sparse COO matrix:
import numpy
from scipy import sparse
xys=[(1,2),(3,4),(6,9),(7,3)]
rows, cols = zip(*xys) # zip returns an iterator in Python 3, so unpack it
mask = sparse.coo_matrix((numpy.ones(len(rows), dtype=bool), (rows, cols)), shape=master_array.shape, dtype=bool)
draws=numpy.random.choice(master_array[~mask.toarray()].flatten(), size=10, replace=False)
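For the bonus part of the question (recovering the coordinates of the drawn values), one option that needs only NumPy is to sample flat indices of the allowed cells and convert them back with `np.unravel_index`. This is a sketch on a small invented `master_array`. It builds a dense boolean mask and a full index array, so the memory cost for a 20000 x 20000 array would need to be checked before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)
master_array = rng.random((10, 10))          # hypothetical small stand-in
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]

rows, cols = np.array(xys).T
keep = np.ones(master_array.shape, dtype=bool)
keep[rows, cols] = False                     # exclude the listed cells

# Draw flat positions of allowed cells, then recover coordinates and values
allowed = np.flatnonzero(keep)
picked = rng.choice(allowed, size=40, replace=False)
draw_rows, draw_cols = np.unravel_index(picked, master_array.shape)
draws = master_array[draw_rows, draw_cols]
```

`draw_rows` and `draw_cols` give the x,y position of every drawn value, and none of them can coincide with an entry of `xys`.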
