Averaging when iterating over numpy array - python

I have a dataset called MEL of shape (94824,) wherein most instances have shape (99, 13) but some have smaller shapes. It consists of (float) MEL frequencies. I'm trying to put all the values in an empty numpy matrix of shape (94824, 99 , 13). So some instances are left empty. Any suggestions?
type(MEL) = numpy.ndarray
for i in MEL: type(i) = <class 'numpy.ndarray'>
for j in i: type(j) = <class 'numpy.ndarray'>

Since your MEL array is not of homogeneous shape, first we need to filter out the arrays whose shape is common (i.e. (99, 13)). For this, we could use:
filtered = []
for arr in MEL:
    if arr.shape == (99, 13):
        filtered.append(arr)
    else:
        continue
Then we can initialize an array to hold the results. And then we can iterate over this filtered list of arrays and calculate the mean over axis 1 like:
averaged_arr = np.zeros((len(filtered), 99))
for idx, arr in enumerate(filtered):
    averaged_arr[idx] = np.mean(arr, axis=1)
This should compute the desired matrix.
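The same averages can also be computed without the explicit loop by stacking the filtered arrays first. This is only a minimal sketch on a small made-up dataset; the sizes and the seeded `rng` are assumptions for illustration, not the asker's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for MEL: an object array of 2-D arrays, one of them shorter.
MEL = np.empty(5, dtype=object)
for idx, n_frames in enumerate([99, 99, 42, 99, 99]):
    MEL[idx] = rng.standard_normal((n_frames, 13))

# Keep only the arrays with the common shape.
filtered = [arr for arr in MEL if arr.shape == (99, 13)]

# Stack into one (k, 99, 13) array and average over the last axis,
# which matches np.mean(arr, axis=1) applied to each (99, 13) array.
stacked = np.stack(filtered)
averaged_arr = stacked.mean(axis=2)

print(averaged_arr.shape)  # (4, 99)
```

Stacking requires every array to have the same shape, which is why the filtering step comes first.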
Here is a demo to reproduce your setup, assuming all arrays of the same shape:
# inputs
In [20]: MEL = np.empty(94824, dtype=object)
In [21]: for idx in range(94824):
    ...:     MEL[idx] = np.random.randn(99, 13)
# shape of the array of arrays
In [13]: MEL.shape
Out[13]: (94824,)
# shape of each array
In [15]: MEL[0].shape
Out[15]: (99, 13)
# to hold results
In [17]: averaged_arr = np.zeros((94824, 99))
# compute average
In [18]: for idx, arr in enumerate(MEL):
    ...:     averaged_arr[idx] = np.mean(arr, axis=1)
# check the shape of resultant array
In [19]: averaged_arr.shape
Out[19]: (94824, 99)
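If the shorter instances should be kept rather than dropped, as the question describes, one option is to copy each array into a preallocated (N, 99, 13) array and leave the unused rows at a fill value. This is a sketch under that assumption; `pad_to_common_shape` and its parameters are hypothetical names, and the tiny dataset is made up:

```python
import numpy as np

def pad_to_common_shape(arrays, n_frames=99, n_coeffs=13, fill_value=0.0):
    """Copy each 2-D array into the top-left corner of a fixed-size slot."""
    out = np.full((len(arrays), n_frames, n_coeffs), fill_value, dtype=float)
    for idx, arr in enumerate(arrays):
        rows, cols = arr.shape
        out[idx, :rows, :cols] = arr
    return out

rng = np.random.default_rng(0)

# Stand-in for MEL with two shorter instances.
MEL = np.empty(4, dtype=object)
for idx, n in enumerate([99, 80, 99, 50]):
    MEL[idx] = rng.standard_normal((n, 13))

padded = pad_to_common_shape(MEL)
print(padded.shape)  # (4, 99, 13)
```

Passing `fill_value=np.nan` instead of zero would make the padding easy to ignore later with functions such as `np.nanmean`.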

Related

Writing the transpose of a vector in Python

I have to write a Python function that computes the vector
r_n = A xn - (xn^T A xn) xn
where A is n by n and xn is n by 1.
I'm using numpy, but .T doesn't work on vectors, and when I just write
r_n = A@xn - (xn@A@xn)@xn, the xn@A@xn part gives me a scalar.
I've tried swapping A and xn, but nothing seems to work.
Trying to make a 3x1 numpy array like this...
import numpy as np
a = np.array([1, 2, 3])
...and then attempting to take its transpose like this...
a_transpose = a.T
...will, confusingly, return the array unchanged, because transposing a 1-D array of shape (3,) is a no-op:
# [1 2 3]
If you want to define a (column) vector whose transpose you can meaningfully take, and get a row vector in return, you need to define it like this:
a = np.reshape(np.array([1, 2, 3]), (3, 1))
print(a)
# [[1]
# [2]
# [3]]
a_transpose = a.T
print(a_transpose)
# [[1 2 3]]
If you want to define a 1 x n array whose transpose you can take to get an n x 1 array, you can do it like this:
a = np.array([[1, 2, 3]])
and then get its transpose by calling a.T.
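The shapes involved can be checked directly. A small sketch, using `np.newaxis` as an alternative to `np.reshape` for adding the missing dimension:

```python
import numpy as np

a = np.array([1, 2, 3])          # 1-D array, shape (3,)
print(a.shape, a.T.shape)        # (3,) (3,)

col = a[:, np.newaxis]           # column vector, shape (3, 1)
row = a[np.newaxis, :]           # row vector, shape (1, 3)
print(col.shape, col.T.shape)    # (3, 1) (1, 3)
print(row.shape, row.T.shape)    # (1, 3) (3, 1)
```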
If A is (n,n) and xn is (n,1):
A@xn - (xn@A@xn)@xn
(n,n)@(n,1) - ((n,1)@(n,n)@(n,1)) @ (n,1)
(n,1) error (1 does not match n)
If xn@A@xn gives a scalar, that's because xn has shape (n,); as per the np.matmul docs, the product of two 1-D arrays is their inner product, a scalar:
(n,)@(n,n)@(n,) => (n,)@(n,) -> scalar
I think you want
(1,n) @ (n,n) @ (n,1) => (1,1)
Come to think of it, that (1,1) array should hold the same single value as the scalar.
Sample calculation; 1st with the (n,) shape:
In [6]: A = np.arange(1,10).reshape(3,3); x = np.arange(1,4)
In [7]: A@x
Out[7]: array([14, 32, 50]) # (3,3)@(3,)=>(3,)
In [8]: x@A@x # scalar
Out[8]: 228
In [9]: (x@A@x)@x
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 (x@A@x)@x
ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)
matmul does not like to work with scalars. But we can use np.dot instead, or simply multiply:
In [10]: (x@A@x)*x
Out[10]: array([228, 456, 684]) # (3,)
In [11]: A@x - (x@A@x)*x
Out[11]: array([-214, -424, -634])
Change the array to (3,1):
In [12]: xn = x[:,None]; xn.shape
Out[12]: (3, 1)
In [13]: A@xn - (xn.T@A@xn)*xn
Out[13]:
array([[-214],
       [-424],
       [-634]]) # same numbers but in (3,1) shape
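Putting the two cases together, a small helper can accept either a (n,) or a (n,1) vector. This is just a sketch; the function name `residual` is hypothetical, and flattening the vector to compute the scalar is one design choice among several:

```python
import numpy as np

def residual(A, x):
    """Return A x - (x^T A x) x for x of shape (n,) or (n, 1)."""
    x = np.asarray(x)
    v = x.reshape(-1)          # flat view used only for the scalar term
    scalar = v @ A @ v         # inner product, a true scalar
    return A @ x - scalar * x  # elementwise scaling keeps the shape of x

A = np.arange(1, 10).reshape(3, 3)
x = np.arange(1, 4)

print(residual(A, x))                 # [-214 -424 -634]
print(residual(A, x[:, None]).shape)  # (3, 1)
```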

Pick 3D numpy array with 1D array of indices

Given two NumPy arrays:
array_3d with shape (100, 10, 2),
indices_1d with shape (100,)
What is the NumPy way/equivalent to do this:
result = []
for i, j in zip(range(len(array_3d)), indices_1d):
    result.append(array_3d[i, j])
which should return result.shape (100, 2)?
The closest I've come is using fancy indexing:
result = array_3d[np.arange(len(array_3d)), indices_1d]
The fancy-indexing expression in the question already reproduces the zip loop and returns shape (100, 2). Note that array_3d[:, indices_1d].reshape(-1, 2) is different: it is equivalent to a nested loop over every (i, j) combination and returns shape (10000, 2), for example:
import numpy as np
a = np.arange(100*10*2).reshape(100,10,2) # 3d array
b = np.random.randint(0, 10, 100) # 1d indices
def fun(a,b):
    result = []
    for i in range(len(a)):
        for j in b:
            result.append(a[i,j])
    return np.array(result)
assert (a[:, b].reshape(-1, 2) == fun(a, b)).all()
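For comparison, here is a quick check that pairing `np.arange` with the index array matches the zip loop from the question. The names `arr3d` and `idx1d` are stand-ins chosen for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
arr3d = np.arange(100 * 10 * 2).reshape(100, 10, 2)
idx1d = rng.integers(0, 10, size=100)

# Loop version: one (row, index) pair per step.
looped = np.array([arr3d[i, j] for i, j in zip(range(len(arr3d)), idx1d)])

# Vectorized version: the two index arrays are paired element by element.
paired = arr3d[np.arange(len(arr3d)), idx1d]

print(paired.shape)                   # (100, 2)
print(np.array_equal(looped, paired)) # True
```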

Simplify numpy.dot within loop

Is it possible to simplify this:
import numpy as np
a = np.random.random_sample((40, 3))
data_base = np.random.random_sample((20, 3))
mean = np.random.random_sample((40,))
data = []
for s in data_base:
    data.append(mean + np.dot(a, s))
data should be of size (20, 40). I was wondering if I could do some broadcasting instead of the loop. I was not able to do it with np.add and some [:, None]. I certainly do not use this correctly.
Your data creates a (20,40) array:
In [385]: len(data)
Out[385]: 20
In [386]: data = np.array(data)
In [387]: data.shape
Out[387]: (20, 40)
The straightforward application of dot produces the same thing:
In [388]: M2=mean+np.dot(data_base, a.T)
In [389]: np.allclose(M2,data)
Out[389]: True
The matmul operator also works with these arrays (no need to expand and squeeze):
M3 = data_base@a.T + mean
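No `[:, None]` is needed here because `mean` has shape (40,), which broadcasts against the trailing axis of the (20, 40) product. A self-contained check, using a seeded generator as an assumption so the run is repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((40, 3))
data_base = rng.random((20, 3))
mean = rng.random(40)

# Original loop: one (40,) row per sample in data_base.
looped = np.array([mean + np.dot(a, s) for s in data_base])

# (20, 3) @ (3, 40) -> (20, 40); mean (40,) broadcasts across the rows.
vectorized = data_base @ a.T + mean

print(vectorized.shape)                 # (20, 40)
print(np.allclose(looped, vectorized))  # True
```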

Dynamically indexing/choosing the dimension of numpy array

Just working on a CNN and am stuck on a tensor algorithm.
I want to be able to iterate through a list, or tuple, of dimensions and choose a range of elements of X (a multi dimensional array) from that dimension, while leaving the other dimensions alone.
x = np.random.random((10,3,32,32)) #some multi dimensional array
dims = [2,3] #aka the 32s
#for a dimension in dims
#I want the array of numbers from i:i+window in that dimension
#something like
arr1 = x.index(i:i+3,axis = dims[0])
#returns shape 10,3,3,32
arr2 = arr1.index(i:i+3,axis = dims[1])
#returns shape 10,3,3,3
np.take should work for you (read its docs)
In [237]: x=np.ones((10,3,32,32),int)
In [238]: dims=[2,3]
In [239]: arr1=x.take(range(1,1+3), axis=dims[0])
In [240]: arr1.shape
Out[240]: (10, 3, 3, 32)
In [241]: arr2=x.take(range(1,1+3), axis=dims[1])
In [242]: arr2.shape
Out[242]: (10, 3, 32, 3)
You can try slicing with
arr1 = x[:,:,i:i+3,:]
and
arr2 = arr1[:,:,:,i:i+3]
Shape is then
>>> x[:,:,i:i+3,:].shape
(10, 3, 3, 32)
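To choose the axes at run time while still using plain slicing, one approach is to build a tuple of `slice` objects. This is a sketch; the helper name `window` and its parameters are hypothetical:

```python
import numpy as np

def window(x, start, size, axes):
    """Slice start:start+size along each axis in axes, leaving the rest whole."""
    index = [slice(None)] * x.ndim
    for ax in axes:
        index[ax] = slice(start, start + size)
    return x[tuple(index)]

x = np.ones((10, 3, 32, 32), dtype=int)
dims = [2, 3]

print(window(x, 1, 3, dims[:1]).shape)  # (10, 3, 3, 32)
print(window(x, 1, 3, dims).shape)      # (10, 3, 3, 3)

# Chaining take on the previous result gives the same final shape.
chained = x.take(range(1, 4), axis=dims[0]).take(range(1, 4), axis=dims[1])
print(chained.shape)                    # (10, 3, 3, 3)
```

Basic slicing returns a view of `x`, whereas `take` returns a copy, which may matter for large tensors.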

Vectorized way of accessing row specific elements in a numpy array

I have a 2-D NumPy array and a set of indices the size of which is the first dimension of the NumPy array.
X = np.random.rand(5, 3)
a = np.random.randint(0, 3, 5)
I need to do something like
for i, ind in enumerate(a):
    print(X[i][ind])
Is there a vectorized way of doing this?
Here you go:
X = np.random.rand(5, 3)
a = np.random.randint(0, 3, 5)
In [12]: X[np.arange(a.size), a]
Out[12]: array([ 0.99653335, 0.30275346, 0.92844957, 0.54728781, 0.43535668])
In [13]: for i, ind in enumerate(a):
   ....:     print(X[i][ind])
   ....:
#0.996533345844
#0.30275345582
#0.92844956619
#0.54728781105
#0.435356681672
I'm assuming here that you don't need each value on a separate line and just want to extract the values.
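As a runnable check, the indexing expression can be compared with the loop directly. `np.take_along_axis` is shown as an alternative; the seeded generator is an assumption made for repeatability:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 3))
a = rng.integers(0, 3, size=5)

# Loop version from the question, collected into an array.
looped = np.array([X[i][ind] for i, ind in enumerate(a)])

# Pair each row number with its column index.
picked = X[np.arange(a.size), a]

# Alternative: take one column per row along axis 1.
picked_alt = np.take_along_axis(X, a[:, None], axis=1)[:, 0]

print(np.array_equal(looped, picked))      # True
print(np.array_equal(looped, picked_alt))  # True
```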
