NetCDF4 [[--]] value for lat, long interpolation - python

Some of my requests to a netCDF4 object return a [[--]] value for invalid locations. For valid locations the real numeric value is something like [[someNumerical]].
How can I catch this? It doesn't seem to be covered in the interp documentation (http://matplotlib.org/basemap/api/basemap_api.html).
The reason I am getting it is that my lat/long values are out of bounds for reasonable interpolation, but I simply do not understand how to catch this return value.
Here's my call to it:
value = interp(theData, longitudes, latitudes, np.asarray([[convertLongitude(longitude)]]), np.asarray([[convertLatitude(latitude)]]), checkbounds=True, masked=True, order=1)
Well, a workaround is of course to do
if str(value) == '[[--]]':
    doSomething()

Your question is unclear as to where you think the problem lies - in the values fetched via netCDF4, or in the values returned by interp.
However, looking at the documentation for interp, I find:
masked
If True, points outside the range of xin and yin are masked (in a masked array). If masked is set to a number, then points outside the range of xin and yin will be set to that number. Default False.
http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.interp
The [[--]] value makes sense in the context of a masked array.
In a masked array, masked values (usually invalid ones) are displayed as --:
In [380]: x=np.ma.masked_greater(np.arange(4), 2)
In [381]: x
Out[381]:
masked_array(data = [0 1 2 --],
mask = [False False False True],
fill_value = 999999)
You need to read up on masked arrays if you want to use the masked=True parameter.
You can do things like replace the masked elements with a fill value:
In [387]: x.filled()
Out[387]: array([ 0, 1, 2, 999999])
In [388]: x.filled(-1)
Out[388]: array([ 0, 1, 2, -1])
or remove them
In [389]: x.compressed()
Out[389]: array([0, 1, 2])
The fact that you are seeing [[--]] suggests that value might be a 2d array. If so, compressed might not be useful.
But a key point is that the value array does not actually contain -- values. That is just what is displayed, as a filler.
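If you want to catch the masked result directly rather than comparing strings, here is a minimal sketch (np.ma.is_masked is a standard NumPy helper; the masked_all array below stands in for an out-of-bounds interp result):
import numpy as np
value = np.ma.masked_all((1, 1))   # stand-in for interp(..., masked=True) out of bounds
if np.ma.is_masked(value):         # True if any element is masked
    print('point outside the input grid')
else:
    print(float(value))            # safe to use as a number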

Extract 2d ndarray from arbitrarily dimensional ndarray using index arrays

I want to extract parts of a numpy ndarray based on arrays of index positions for some of the dimensions. Let me show this with an example.
Example data
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary; it could just as well be 5x2x4x100.
X is a matrix of index values, here X[:,0] are the indices of the first dimension of dummy, X[:,1] those of the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
I want to extract an ndarray of the following form for this example
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could just be done by dummy[X[:,0],X[:,1],:]. Sadly the dimensionality can differ, e.g. dummy could be a 5x2x4x6x100 ndarray and X correspondingly would then be 3x4. My attempts at dealing with this have not yielded the desired result.
dummy[X,:] yields a 3x2x2x100 ndarray for this example, the same as dummy[X].
Iteratively reducing dummy by doing something like dummy = dummy[X[:,i],:], with i an iterator over the number of columns of X, also does not reduce the ndarray in the example past 3x2x100.
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
I will try to add some explanation to Michael Szczesny's answer.
First, notice that if you have an np.array with n dimensions and pass m indices where m < n, it is the same as using : for the remaining dimensions. In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
which is the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
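For a single True/False answer you can wrap the comparison, e.g.:
np.array_equal(dummy[([0, 1], [0, 0])], np.array([dummy[(0, 0)], dummy[(1, 0)]]))
# True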
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
Here you are getting the indices for each dimension as a separate array, so dummy[(*X.T,)] gives you:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: instead of using (*X.T,), you can use tuple(X.T), which to me makes more sense.
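For example, with X as above (this matches the (*X.T,) output shown earlier):
tuple(X.T)
# (array([0, 4, 2]), array([1, 1, 0]))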
As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] are the indices of the first dimension of dummy and X[:,1] are the indices of the second dimension, if you transpose X (X.T) you'll have the indices of the first dimension as X.T[0] and the indices of the second dimension as X.T[1].
Now, to slice dummy as you want, you can specify the indices of the first and of the second dimension in this way:
dummy[(first_dim_indices, second_dim_indices)]  # here: dummy[(X.T[0], X.T[1])]
To simplify the code (and to avoid transposing the X matrix twice), you can unpack X.T into a tuple as (*X.T,), so writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].
This notation is also useful if the number of dimensions to slice through is not fixed, because unpacking X.T yields as many index arrays as there are dimensions to slice in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]) and dummy[(*X.T,)] is still valid.
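Putting it together, a minimal runnable sketch (shapes as in the question; the 4d case shows that nothing changes when dummy gains dimensions):
import numpy as np

dummy = np.random.rand(5, 2, 100)        # arbitrary trailing dimension
X = np.array([[0, 1], [4, 1], [2, 0]])   # one row per point, one column per indexed axis

out = dummy[tuple(X.T)]                  # same as dummy[(*X.T,)]
print(out.shape)                         # (3, 100)

# works unchanged with more leading dimensions:
dummy4 = np.random.rand(5, 2, 4, 100)
X3 = np.array([[0, 1, 2], [4, 1, 0], [2, 0, 3]])
print(dummy4[tuple(X3.T)].shape)         # (3, 100)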

What is the difference between resize and reshape when using arrays in NumPy?

I have just started using NumPy. What is the difference between resize and reshape for arrays?
From what I've read, reshape doesn't change the data, whereas resize does.
Here are some examples:
>>> numpy.random.rand(2,3)
array([[ 0.6832785 , 0.23452056, 0.25131171],
[ 0.81549186, 0.64789272, 0.48778127]])
>>> ar = numpy.random.rand(2,3)
>>> ar.reshape(1,6)
array([[ 0.43968751, 0.95057451, 0.54744355, 0.33887095, 0.95809916,
0.88722904]])
>>> ar
array([[ 0.43968751, 0.95057451, 0.54744355],
[ 0.33887095, 0.95809916, 0.88722904]])
After reshape, the array itself didn't change; reshape only returned a new, reshaped array.
>>> ar.resize(1,6)
>>> ar
array([[ 0.43968751, 0.95057451, 0.54744355, 0.33887095, 0.95809916,
0.88722904]])
After resize the array changed its shape.
One major difference is that reshape() does not change your data, but resize() does. resize() first accommodates all the values of the original array. After that, if there is extra space (i.e. the new array is larger than the original), it fills in its own values. As @David mentioned in the comments, what values resize() adds depends on how it is called.
You can call reshape() and resize() function in the following two ways.
numpy.resize()
ndarray.resize() - where ndarray is an n dimensional array you are resizing.
You can similarly call reshape as numpy.reshape() and ndarray.reshape(); here the two are almost the same, apart from the syntax.
One point to notice is that reshape() will always try to return a view wherever possible; otherwise it returns a copy. You can't always tell in advance which one you'll get, but you can make your code raise an error whenever the data would be copied.
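A minimal sketch of that trick (assigning to .shape attempts an in-place reshape and raises instead of silently copying; np.shares_memory is another way to check for a view):
import numpy as np

a = np.arange(6)
v = a.reshape(2, 3)              # a view: no copy was needed
print(np.shares_memory(a, v))    # True

b = v.T                          # transposed view, non-contiguous
try:
    b.shape = (6,)               # in-place reshape: errors instead of copying
except AttributeError as e:
    print('copy would be required:', e)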
For the resize() function, numpy.resize() returns a new copy of the array, whereas ndarray.resize() resizes in place. Neither ever returns a view.
Now to the question of what the values of the extra elements should be. The documentation says:
If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.
So for ndarray.resize() the fill value is 0, but for numpy.resize() it is the values of the array itself, repeated (as many as fit in the new size). The code snippet below makes this clear.
In [40]: arr = np.array([1, 2, 3, 4])
In [41]: np.resize(arr, (2,5))
Out[41]:
array([[1, 2, 3, 4, 1],
[2, 3, 4, 1, 2]])
In [42]: arr.resize((2,5))
In [43]: arr
Out[43]:
array([[1, 2, 3, 4, 0],
[0, 0, 0, 0, 0]])
You can also see that ndarray.resize() returns None and does the resizing in-place.
reshape() is able to change the shape only (i.e. the meta info), not the number of elements.
If the array has five elements, we may use e.g. reshape(5, ), reshape(1, 5),
reshape(1, 5, 1), but not reshape(2, 3).
reshape() in general doesn't modify the data itself, only the meta info about it;
the .reshape() method (of ndarray) returns the reshaped array, keeping the original array untouched.
resize() is able to change both the shape and the number of elements.
So for an array with five elements we may use resize(5, 1), but also resize(2, 2) or resize(7, 9).
The .resize() method (of ndarray) returns None, changing only the original array (an in-place change).
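For example, with the numpy.resize() function form (which repeats the data when growing):
>>> a = np.arange(5)
>>> np.resize(a, (2, 2))      # shrinks: keeps the first four values
array([[0, 1],
       [2, 3]])
>>> np.resize(a, (7,))        # grows: repeats values from the start
array([0, 1, 2, 3, 4, 0, 1])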
Suppose you have the following np.ndarray:
a = np.array([1, 2, 3, 4]) # Shape of this is (4,)
Now we try 'a.reshape'
a.reshape(1, 4)
array([[1, 2, 3, 4]])
a.shape # This will again return (4,)
We see that the shape of a hasn't changed.
Let's try 'a.resize' now:
a.resize(1,4)
a.shape # Now the shape changes to (1,4)
'resize' changed the shape of our original NumPy array a (it changes the shape in-place).
One more point is:
np.reshape can take -1 in one dimension. np.resize can't.
Example as below:
arr = np.arange(20)
arr.resize(5, 2, 2)    # OK: 5*2*2 == 20 elements, resized in place
arr.reshape(2, 2, -1)  # the -1 is inferred, giving shape (2, 2, 5)

Numpy Masking with Array

I'm not certain of the best way of asking this question, so I apologize ahead of time.
I'm trying to find a peak on each row of an NxM numpy array of audio signals. Each row in the array is treated individually, and I'd like to get all values a certain number of standard deviations above the noise floor for each row, in frequency space. In this experiment I know that I do not have a signal above 400Hz, so I'm using that region as my noise floor. I'm running into issues when trying to mask. Here is my code snippet:
from scipy import signal
import numpy as np
Pxx_den = signal.periodogram(input, fs=sampleRate, nfft=sampleRate, axis=1)
p = np.array(Pxx_den)[1].astype(np.float)
noiseFloor = np.mean(p[:,400:],axis=1)
stdFloor = np.std(p[:,400:],axis=1)
p = np.ma.masked_less(p,noiseFloor+stdFloor*2)
This example will generate an error of:
ValueError: operands could not be broadcast together with shapes (91,5001) (91,)
I've deduced that this is because ma.masked_less works with a single value and does not accept an array. I would like the output to be an NxM array of values greater than the condition. Is there a NumPy way of doing what I'd like, or an efficient alternative?
I've also looked at some peak detection routines such as peakUtils and scipy.signal.find_peaks_cwt() but they seem to only act on 1D arrays.
Thanks in advance
Before getting too far into using masked arrays, make sure that the code downstream handles them. It has to be aware of how masked arrays work, or defer to masked-array methods.
As to the specific problem, I think this recreates it:
In [612]: x=np.arange(10).reshape(2,5)
In [613]: np.ma.masked_less(x,np.array([3,6]))
...
ValueError: operands could not be broadcast together with shapes (2,5) (2,)
I have a 2d array, and I try to apply the < mask with a different value for each row.
Instead I can generate the mask as a 2d array matching x:
In [627]: mask= x<np.array([3,6])[:,None]
In [628]: np.ma.masked_where(mask,x)
Out[628]:
masked_array(data =
[[-- -- -- 3 4]
[-- 6 7 8 9]],
mask =
[[ True True True False False]
[ True False False False False]],
fill_value = 999999)
I can also select the values, though in a way that loses the 2d structure.
In [631]: x[~mask]
Out[631]: array([3, 4, 6, 7, 8, 9])
In [632]: np.ma.masked_where(mask,x).compressed()
Out[632]: array([3, 4, 6, 7, 8, 9])
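Applied to the snippet in the question, that would look roughly like this (a sketch; the random data stands in for the audio array, and the per-row threshold is reshaped to a column so it broadcasts against the (N, M) periodogram):
from scipy import signal
import numpy as np

sampleRate = 10000
data = np.random.randn(91, sampleRate)    # stand-in for the audio signals

freqs, p = signal.periodogram(data, fs=sampleRate, nfft=sampleRate, axis=1)
noiseFloor = np.mean(p[:, 400:], axis=1)  # per-row noise floor, shape (91,)
stdFloor = np.std(p[:, 400:], axis=1)

# (91,) -> (91, 1) so each row is compared to its own threshold
p_masked = np.ma.masked_less(p, (noiseFloor + 2 * stdFloor)[:, None])
print(p_masked.shape)                     # (91, 5001)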

is Numpy's masked array memory efficient?

I was wondering: are numpy's masked arrays able to store a compact representation of the available values? In other words, if I have a numpy array with no set values, will it be stored in memory with negligible size?
Actually, this is not just a casual question, but I need such memory optimization for an application I am developing.
No, a masked array is not more compact.
In [344]: m = np.ma.masked_array([1,2,3,4],[1,0,0,1])
In [345]: m
Out[345]:
masked_array(data = [-- 2 3 --],
mask = [ True False False True],
fill_value = 999999)
In [346]: m.data
Out[346]: array([1, 2, 3, 4])
In [347]: m.mask
Out[347]: array([ True, False, False, True], dtype=bool)
It contains both the original (full) array, and a mask. The mask may be a scalar, or it may be a boolean array with the same shape as the data.
scipy.sparse stores just the nonzero values of an array, though the space savings depends on the storage format and the sparsity. So you might simulate your masking with sparsity. Or you could take ideas from that representation.
What do you plan to do with these arrays? Just access items, or do calculations?
Masked arrays are most useful for data that is mostly good, with a modest number of 'bad' values. For example, real-life data series with occasional glitches, or monthly data padded to 31 days. Masking lets you keep the data in a rectangular arrangement and still calculate things like the mean and sum without using the masked values.
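A rough way to see the cost (a sketch; .nbytes reports the underlying buffer sizes):
import numpy as np
from scipy import sparse

data = np.zeros(1000000)
data[::1000] = 1.0                     # only 0.1% of the values are 'interesting'

m = np.ma.masked_equal(data, 0)        # full data array plus a full boolean mask
print(m.data.nbytes + m.mask.nbytes)   # 9000000 bytes: 8 MB data + 1 MB mask

s = sparse.csr_matrix(data)            # stores just the 1000 nonzero values
print(s.data.nbytes)                   # 8000 bytes (plus small index arrays)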

derivative with numpy.diff problems

I have this problem:
I have an array of 7 elements:
vector = [array([ 76.27789424]), array([ 76.06870298]), array([ 75.85016864]), array([ 75.71155968]), array([ 75.16982466]), array([ 73.08832948]), array([ 68.59935515])]
(this array is the result of a lot of operation)
Now I want to calculate the derivative with numpy.diff(vector), but I know that the argument must be a numpy array.
For this, I type:
vector=numpy.array(vector);
If I print the vector now, the result is:
[[ 76.27789424]
[ 76.06870298]
[ 75.85016864]
[ 75.71155968]
[ 75.16982466]
[ 73.08832948]
[ 68.59935515]]
But if I try to calculate the derivative, the result is [].
Can you help me, please?
Thanks a lot!
vector is a list of arrays; to get a 1-D NumPy array, use a list comprehension and pass it to numpy.array:
>>> vector = numpy.array([x[0] for x in vector])
>>> numpy.diff(vector)
array([-0.20919126, -0.21853434, -0.13860896, -0.54173502, -2.08149518,
-4.48897433])
vector = numpy.array(vector);
gives you a two dimensional array with seven rows and one column
>>> vector.shape
(7, 1)
The shape reads like: (length axis 0, length axis 1, length axis 2, ...)
As you can see, the last axis is axis 1, and its length is 1.
From the docs:
numpy.diff(a, n=1, axis=-1)
...
axis : int, optional
The axis along which the difference is taken, default is the last axis.
There is no way to take a difference along an axis of length 1. So let's use the first axis, which has a length of 7. Since axis counting starts with zero, the first axis is 0:
>>> np.diff(vector, axis=0)
array([[-0.20919126],
[-0.21853434],
[-0.13860896],
[-0.54173502],
[-2.08149518],
[-4.48897433]])
Note that every degree of the derivative will be one element shorter, so the new shape is (7-1, 1), which is (6, 1). Let's verify that:
>>> np.diff(vector, axis=0).shape
(6, 1)
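Alternatively, since the second axis has length 1, you can flatten the array first and keep the default axis (the values are the same as above):
>>> np.diff(vector.ravel())
array([-0.20919126, -0.21853434, -0.13860896, -0.54173502, -2.08149518,
       -4.48897433])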
