Numpy Masking with Array - python

I'm not certain of the best way of asking this question, so I apologize ahead of time.
I'm trying to find a peak on each row of an NxM numpy array of audio signals. Each row is treated individually, and for each of the N rows I'd like to get all values that lie a certain number of standard deviations above the noise floor in frequency space. In this experiment I know that I do not have a signal above 400 Hz, so I'm using that region as my noise floor. I'm running into issues when trying to mask. Here is my code snippet:
from scipy import signal
import numpy as np
Pxx_den = signal.periodogram(input, fs=sampleRate, nfft=sampleRate, axis=1)
p = Pxx_den[1].astype(float)  # periodogram returns (frequencies, power)
noiseFloor = np.mean(p[:,400:],axis=1)
stdFloor = np.std(p[:,400:],axis=1)
p = np.ma.masked_less(p,noiseFloor+stdFloor*2)
This example will generate an error of:
ValueError: operands could not be broadcast together with shapes (91,5001) (91,)
I've deduced that this is because ma.masked_less works with a single value and does not take in an array. I would like the output to be an NxM array of values greater than the condition. Is there a Numpy way of doing what I'd like or an efficient alternative?
I've also looked at some peak detection routines such as peakUtils and scipy.signal.find_peaks_cwt() but they seem to only act on 1D arrays.
Thanks in advance

Before getting too far into using masked arrays, make sure that the code that consumes them can handle them. It has to be aware of how masked arrays work, or defer to masked array methods.
As to the specific problem, I think this recreates it:
In [612]: x=np.arange(10).reshape(2,5)
In [613]: np.ma.masked_less(x,np.array([3,6]))
...
ValueError: operands could not be broadcast together with shapes (2,5) (2,)
I have a 2d array, and I try to apply the < mask with different values for each row.
Instead I can generate the mask as a 2d array matching x:
In [627]: mask= x<np.array([3,6])[:,None]
In [628]: np.ma.masked_where(mask,x)
Out[628]:
masked_array(data =
[[-- -- -- 3 4]
[-- 6 7 8 9]],
mask =
[[ True True True False False]
[ True False False False False]],
fill_value = 999999)
I can also select the values, though in a way that loses the 2d structure.
In [631]: x[~mask]
Out[631]: array([3, 4, 6, 7, 8, 9])
In [632]: np.ma.masked_where(mask,x).compressed()
Out[632]: array([3, 4, 6, 7, 8, 9])
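Applied to the original question, the same idea might look like the sketch below. It assumes p is an (N, M) array of power values with one frequency bin per column, and it uses synthetic random data in place of the real periodogram. The names rng, threshold and peaks are illustrative and not from the question.

```python
import numpy as np

# Synthetic stand-in for the (N, M) periodogram output; not real audio data.
rng = np.random.default_rng(0)
p = rng.random((4, 1000))
p[:, 100] += 5.0  # inject an obvious peak in every row, below bin 400

# Per-row noise statistics from the bins assumed to contain no signal.
noise_floor = p[:, 400:].mean(axis=1)
std_floor = p[:, 400:].std(axis=1)

# Shape (N,) -> (N, 1) so the threshold broadcasts across each row.
threshold = (noise_floor + 2 * std_floor)[:, None]

# Mask everything below the per-row threshold; the result keeps shape (N, M).
peaks = np.ma.masked_where(p < threshold, p)
```

The only change relative to the failing call is the [:, None], which turns the per-row thresholds into a column so that they line up with the rows of p.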

Related

Sum pattern across array

I'm having trouble finding the proper way to do something I think should be trivial using numpy. I have an array (1000x1000) and I want to calculate the sum of a specific pattern across the array.
For example:
If I have this array and want to calculate the sum of a two-cell-right diagonal I would expect [7,12,11,8,12,6,11,7] (a total of 8 sums).
How can I do this?
This operation is called a 2-dimensional convolution:
>>> import numpy as np
>>> from scipy.signal import convolve2d
>>> kernel = np.eye(2, dtype=int)
>>> a = np.array([[5,3,7,1,2],[3,2,9,4,7],[8,9,4,2,3]])
>>> convolve2d(a, kernel, mode='valid')
array([[ 7, 12, 11, 8],
[12, 6, 11, 7]])
Should you want to generalize it to arbitrary dimensions, there is also scipy.ndimage.convolve available. It will also work for this 2d case, but does not offer the mode='valid' convenience.
l = [[5,3,7,1,2],[3,2,9,4,7],[8,9,4,2,3]]
[q+l[w+1][t+1] for w,i in enumerate(l[:-1]) for t,q in enumerate(i[:-1])]
then you can avoid using numpy :) and the output is
[7,12,11,8,12,6,11,7]
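For this particular two-cell diagonal pattern, plain numpy slicing is another option. The sketch below adds the array to a copy of itself shifted by one row and one column; the name diag_sums is illustrative.

```python
import numpy as np

a = np.array([[5, 3, 7, 1, 2],
              [3, 2, 9, 4, 7],
              [8, 9, 4, 2, 3]])

# Each cell plus its lower-right neighbour: add the two overlapping views.
diag_sums = a[:-1, :-1] + a[1:, 1:]

# Flatten if a single list of sums is wanted.
flat_sums = diag_sums.ravel()
```

Both slices are views, so no Python-level loop over the 1000x1000 array is needed.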

numpy: Why is there a difference between (x,1) and (x, ) dimensionality

I am wondering why numpy has both one dimensional arrays of dimension (length, 1) and one dimensional arrays of dimension (length, ), without a second value.
I am running into this quite frequently, e.g. when using np.concatenate() which then requires a reshape step beforehand (or I could directly use hstack/vstack).
I can't think of a reason why this behavior is desirable. Can someone explain?
Edit:
It was suggested by one of the comments that my question is a possible duplicate. I am more interested in the underlying working logic of Numpy and not that there is a distinction between 1d and 2d arrays which I think is the point of the mentioned thread.
The data of a ndarray is stored as a 1d buffer - just a block of memory. The multidimensional nature of the array is produced by the shape and strides attributes, and the code that uses them.
The numpy developers chose to allow for an arbitrary number of dimensions, so the shape and strides are represented as tuples of any length, including 0 and 1.
In contrast MATLAB was built around FORTRAN programs that were developed for matrix operations. In the early days everything in MATLAB was a 2d matrix. In the late 1990s (version 5) it was generalized to allow more than 2d, but never less. The numpy np.matrix still follows that old 2d MATLAB constraint.
If you come from a MATLAB world you are used to these 2 dimensions, and the distinction between a row vector and column vector. But in math and physics that isn't influenced by MATLAB, a vector is a 1d array. Python lists are inherently 1d, as are C arrays. To get 2d you have to have lists of lists or arrays of pointers to arrays, with x[1][2] style of indexing.
Look at the shape and strides of this array and its variants:
In [48]: x=np.arange(10)
In [49]: x.shape
Out[49]: (10,)
In [50]: x.strides
Out[50]: (4,)
In [51]: x1=x.reshape(10,1)
In [52]: x1.shape
Out[52]: (10, 1)
In [53]: x1.strides
Out[53]: (4, 4)
In [54]: x2=np.concatenate((x1,x1),axis=1)
In [55]: x2.shape
Out[55]: (10, 2)
In [56]: x2.strides
Out[56]: (8, 4)
MATLAB adds new dimensions at the end. It orders its values like a order='F' array, and can readily change a (n,1) matrix to a (n,1,1,1). numpy is default order='C', and readily expands an array dimension at the start. Understanding this is essential when taking advantage of broadcasting.
Thus x1 + x is a (10,1)+(10,) => (10,1)+(1,10) => (10,10)
Because of broadcasting a (n,) array is more like a (1,n) one than a (n,1) one. A 1d array is more like a row matrix than a column one.
In [64]: np.matrix(x)
Out[64]: matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [65]: _.shape
Out[65]: (1, 10)
The point with concatenate is that it requires matching dimensions. It does not use broadcasting to adjust dimensions. There are a bunch of stack functions that ease this constraint, but they do so by adjusting the dimensions before using concatenate. Look at their code (readable Python).
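A small sketch of that difference between broadcasting and concatenate, using the same x and x1 as in the session above. The names summed, failed, joined and stacked are illustrative.

```python
import numpy as np

x = np.arange(10)          # shape (10,)
x1 = x.reshape(10, 1)      # shape (10, 1)

# Arithmetic broadcasts: (10, 1) + (10,) -> (10, 10).
summed = x1 + x

# concatenate does not broadcast; mixing 2d and 1d inputs raises ValueError.
try:
    np.concatenate((x1, x), axis=1)
    failed = False
except ValueError:
    failed = True

# Adding the missing axis explicitly makes the dimensions match.
joined = np.concatenate((x1, x[:, None]), axis=1)   # shape (10, 2)

# column_stack turns 1d inputs into columns before concatenating.
stacked = np.column_stack((x1, x))                  # shape (10, 2)
```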
So a proficient numpy user needs to be comfortable with that generalized shape tuple, including the empty () (0d array), (n,) 1d, and up. For more advanced stuff understanding strides helps as well (look for example at the strides and shape of a transpose).
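As a rough illustration of the transpose remark, the sketch below compares shape and strides before and after transposing, and also shows a 0d array. The actual stride values depend on the integer size of the platform (the 4-byte strides shown earlier correspond to 4-byte integers), so no specific numbers are assumed here.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
xt = x.T

# The transpose is a view on the same buffer: shape and strides are reversed.
shapes = (x.shape, xt.shape)        # ((2, 3), (3, 2))
strides = (x.strides, xt.strides)   # second tuple is the first one reversed
same_buffer = np.shares_memory(x, xt)

# A 0d array has an empty shape tuple.
scalar = np.array(5)
scalar_shape = scalar.shape         # ()
```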
Much of it is a matter of syntax. This tuple (x) isn't a tuple at all (just a redundancy). (x,), however, is.
The difference between (x,) and (x,1) goes even further. You can take a look into the examples of previous questions like this. Quoting the example from it, this is an 1D numpy array:
>>> np.array([1, 2, 3]).shape
(3,)
But this one is 2D:
>>> np.array([[1, 2, 3]]).shape
(1, 3)
Reshape does not make a copy unless it needs to, so it should be safe to use.
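One way to check whether a given reshape produced a view or a copy is np.shares_memory. The sketch below shows one case of each; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)
col = x.reshape(10, 1)

# Here reshape returns a view: both arrays use the same buffer.
is_view = np.shares_memory(x, col)
col[0, 0] = 99          # writes through to x as well

# A transposed array cannot be flattened in C order without moving data,
# so in this case reshape has to copy.
t = np.arange(6).reshape(2, 3).T
flat = t.reshape(-1)
is_copy = not np.shares_memory(t, flat)
```

Because the result may be a view, writing into the reshaped array can change the original, as the assignment to col[0, 0] does here.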

is Numpy's masked array memory efficient?

I was wondering: are numpy's masked arrays able to store a compact representation of the available values? In other words, if I have a numpy array with no set values, will it be stored in memory with negligible size?
Actually, this is not just a casual question, but I need such memory optimization for an application I am developing.
No, a masked array is not more compact.
In [344]: m = np.ma.masked_array([1,2,3,4],[1,0,0,1])
In [345]: m
Out[345]:
masked_array(data = [-- 2 3 --],
mask = [ True False False True],
fill_value = 999999)
In [346]: m.data
Out[346]: array([1, 2, 3, 4])
In [347]: m.mask
Out[347]: array([ True, False, False, True], dtype=bool)
It contains both the original (full) array, and a mask. The mask may be a scalar, or it may be a boolean array with the same shape as the data.
scipy.sparse stores just the nonzero values of an array, though the space savings depends on the storage format and the sparsity. So you might simulate your masking with sparsity. Or you could take ideas from that representation.
What do you plan to do with these arrays? Just access items, or do calculations?
Masked arrays are most useful for data that is mostly good, with a modest number of 'bad' values. For example, real life data series with occasional glitches, or monthly data padded to 31 days. Masking lets you keep the data in a rectangular arrangement, and still calculate things like the mean and sum without using the masked values.
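A quick way to see the memory cost is to compare nbytes of the data and the mask. The sketch below uses an arbitrary 1000-element float array with every other value masked; the sizes and names are illustrative.

```python
import numpy as np

data = np.arange(1000, dtype=float)
mask = np.zeros(1000, dtype=bool)
mask[::2] = True                      # mask every other value

m = np.ma.masked_array(data, mask=mask)

# The full data buffer is kept, plus one byte per element for the mask.
data_bytes = m.data.nbytes            # 8 bytes per float64 element
mask_bytes = m.mask.nbytes            # 1 byte per bool element

# Reductions skip the masked entries.
masked_mean = m.mean()

# compressed() keeps only the unmasked values, but loses their positions.
kept = m.compressed()
```

So masking half of the values saves nothing; only compressed() yields a smaller array, at the cost of the original layout.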

Python numpy array indexing. How is this working?

I came across this python code (which works) and to me it seems amazing. However, I am unable to figure out what this code is doing. To replicate it, I sort of wrote a test code:
import numpy as np
# Create a random array which represent the 6 unique coeff.
# of a symmetric 3x3 matrix
x = np.random.rand(10, 10, 6)
So, I have 100 symmetric 3x3 matrices and I am only storing the unique components. Now, I want to generate the full 3x3 matrix and this is where the magic happens.
indices = np.array([[0, 1, 3],
[1, 2, 4],
[3, 4, 5]])
I see what this is doing. This is how the 0-5 index components should be arranged in the 3x3 matrix to have a symmetric matrix.
mat = x[..., indices]
This line has me lost. So, it is working on the last dimension of the x array but it is not at all clear to me how the rearrangement and reshaping is done but this indeed returns an array of shape (10, 10, 3, 3). I am amazed and confused!
From the advanced indexing documentation - bi rico's link.
Example
Suppose x.shape is (10,20,30) and ind is a (2,3,4)-shaped indexing intp array, then result = x[...,ind,:] has shape (10,2,3,4,30) because the (20,)-shaped subspace has been replaced with a (2,3,4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2,3,4)-shaped subspace then result[...,i,j,k,:] = x[...,ind[i,j,k],:]. This example produces the same result as x.take(ind, axis=-2).
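Applied to the code in the question, the last axis of length 6 is replaced by the (3, 3) shape of indices, so that mat[a, b, i, j] equals x[a, b, indices[i, j]]. The sketch below checks this with a seeded random array; the names same_as_take and symmetric are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 10, 6))

indices = np.array([[0, 1, 3],
                    [1, 2, 4],
                    [3, 4, 5]])

# The length-6 axis is replaced by the (3, 3) index array.
mat = x[..., indices]                 # shape (10, 10, 3, 3)

# Element-wise check of the rule for one position.
one_entry_ok = mat[2, 5, 0, 2] == x[2, 5, indices[0, 2]]

# Equivalent to take along the last axis.
same_as_take = np.array_equal(mat, x.take(indices, axis=-1))

# Each 3x3 block is symmetric because indices itself is symmetric.
symmetric = np.array_equal(mat, mat.swapaxes(-1, -2))
```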

Convert 1D array into numpy matrix

I have a simple, one dimensional Python array with random numbers. What I want to do is convert it into a numpy Matrix of a specific shape. My current attempt looks like this:
import random
import numpy as np

randomWeights = []
for i in range(80):
    randomWeights.append(random.uniform(-1, 1))
W = np.mat(randomWeights)
W.reshape(8,10)
Unfortunately it always creates a matrix of the form:
[[random1, random2, random3, ...]]
So only the first element of one dimension gets used and the reshape command has no effect. Is there a way to convert the 1D array to a matrix so that the first x items will be row 1 of the matrix, the next x items will be row 2 and so on?
Basically this would be the intended shape:
[[1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, ... , 16],
[..., 800]]
I suppose I can always build a new matrix in the desired form manually by parsing through the input array. But I'd like to know if there is a simpler, more elegant solution with built-in functions I'm not seeing. If I have to build those matrices manually I'll have a ton of extra work in other areas of the code since all my source data comes in simple 1D arrays but will be computed as matrices.
reshape() doesn't reshape in place, you need to assign the result:
>>> W = W.reshape(8,10)
>>> W.shape
(8, 10)
You can also use W.resize() (that is, ndarray.resize()), which changes the shape in place.
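As a minimal sketch of the same idea with a plain ndarray instead of np.matrix, the weights can be drawn in one call and reshaped by assigning the result. The use of np.random.default_rng and the name weights are assumptions for illustration, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# 80 uniform random values in [-1, 1), as a 1d array.
weights = rng.uniform(-1, 1, size=80)

# reshape returns a new array object; the result must be assigned.
W = weights.reshape(8, 10)

# The original 1d array keeps its shape.
original_shape = weights.shape        # (80,)

# Rows are filled in order: the first 10 values form row 0, and so on.
first_row_matches = np.array_equal(W[0], weights[:10])
```

With reshape(8, 10) each row holds 10 consecutive values; rows of 8 values would need reshape(10, 8) instead.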
