I am working on .wav signals using Python 3.5 and trying to extract MFCC, MFCC delta, MFCC delta-delta, and other signal features, but an error is raised only for the MFCC delta:
Traceback (most recent call last):
mfcc_delta = librosa.feature.delta(mfcc)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\librosa\feature\utils.py", line 116, in delta
**kwargs)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\scipy\signal\_savitzky_golay.py", line 337, in savgol_filter
coeffs = savgol_coeffs(window_length, polyorder, deriv=deriv, delta=delta)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\scipy\signal\_savitzky_golay.py", line 139, in savgol_coeffs
coeffs, _, _, _ = lstsq(A, y)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\scipy\linalg\basic.py", line 1226, in lstsq
% (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None
I am working on the following code:
import librosa
import numpy as np
from scipy import signal
from scipy.signal import butter, filtfilt
import scipy.stats
def preprocess_cough(x, fs, cutoff=6000, normalize=True, filter_=True, downsample=True):
    # Preprocess data
    fs_downsample = cutoff * 2  # assumption: target rate is twice the cutoff (undefined in the original post)
    fs_new = fs
    if len(x.shape) > 1:
        x = np.mean(x, axis=1)  # Convert to mono
    if normalize:
        x = x / (np.max(np.abs(x)) + 1e-17)  # Normalize to the range -1 to 1
    if filter_:
        b, a = butter(4, fs_downsample / fs, btype='lowpass')  # 4th-order Butterworth low-pass filter
        x = filtfilt(b, a, x)
    if downsample:
        x = signal.decimate(x, int(fs / fs_downsample))  # Downsample after low-pass filtering
        fs_new = fs_downsample
    return np.float32(x), fs_new
audio_data = 'F:/test/'
files = librosa.util.find_files(audio_data, ext=['wav'])
for myFile in files:  # assumption: myFile iterates over the files found above
    x, fs = librosa.load(myFile, sr=48000)
    arr, f = preprocess_cough(x, fs)
    mfcc = librosa.feature.mfcc(y=arr, sr=f, n_mfcc=13)
    mfcc_delta = librosa.feature.delta(mfcc)
    mfcc_delta2 = librosa.feature.delta(mfcc, order=2)
When I remove the MFCC calculations and compute the other wav signal features, the error does not appear. I have also tried removing the n_mfcc=13 parameter, but the error is still raised.
Sample of the output and the shape of mfcc variable
[-3.86701782e+02 -4.14421021e+02 -4.67373749e+02 -4.76989105e+02
-4.23713501e+02 -3.71329285e+02 -3.47003693e+02 -3.19309082e+02
-3.29547089e+02 -3.32584625e+02 -2.78399109e+02 -2.43284348e+02
-2.47878128e+02 -2.59308533e+02 -2.71102844e+02 -2.87314514e+02
-2.58869965e+02 -6.01125565e+01 1.66160011e+01 -8.58060551e+00
-8.49179382e+01 -9.29880371e+01 -9.96001358e+01 -1.04499428e+02
-3.65511665e+01 -3.82106819e+01 -8.69802475e+01 -1.22267052e+02
-1.70187592e+02 -2.35996841e+02 -2.96493286e+02 -3.39086365e+02
-3.59514771e+02]
and the shape is (13,33)
Can anyone help me, please?
Thanks in advance
As with the issue raised in this question, the problem is related to the underlying numerical operations that librosa delegates to SciPy. SciPy depends on a LAPACK library, so I would first check that a working one is installed.
You may also want to step through the script in a debugger, follow the call into SciPy, and examine the actual values passed from librosa.feature.delta to scipy.signal.savgol_filter. Cross-checking them against the documentation may reveal the cause.
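One way to narrow this down is to bypass librosa and call SciPy directly. The sketch below is only an illustration: the input is a synthetic stand-in with the same (13, 33) shape as the MFCC matrix, and the filter arguments (window length 9, polyorder 1, deriv 1) are an assumption about what librosa.feature.delta passes by default, so check them against your installed version.

```python
import numpy as np
from scipy import linalg, signal

# Synthetic stand-in for the MFCC matrix (13 coefficients x 33 frames).
# Each row is a straight line with slope 2, so the true derivative is known.
frames = np.arange(33, dtype=float)
mfcc_like = np.tile(2.0 * frames, (13, 1))

# Step 1: exercise the LAPACK-backed least-squares solver on its own.
# The columns of A are [x, 1], and y = x + 4, so the fit should give [1, 4].
A = np.vander(np.arange(-4, 5, dtype=float), 2)
y = np.arange(9, dtype=float)
solution, _, rank, _ = linalg.lstsq(A, y)

# Step 2: call the Savitzky-Golay filter directly.
# The arguments below are assumed defaults, not read from librosa itself.
delta_like = signal.savgol_filter(
    mfcc_like, window_length=9, polyorder=1, deriv=1,
    delta=1.0, axis=-1, mode="interp",
)
print(solution, delta_like.shape)
```

If step 1 already fails with the same ValueError, the problem lies in the SciPy/LAPACK installation rather than in the audio data or in librosa.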
I'm trying to run a K-S test on some data. I have the code working, but I'm not sure I understand what's going on, and I also get an error when trying to set the loc. Essentially I get both the K-S statistic and the p-value, but I'm not sure I grasp them well enough to use the result.
I'm using the scipy.stats.ks_2samp module found here.
This is the code I am running
import numpy as np
from scipy import stats
np.random.seed(12345678) #fix random seed to get the same result
n1 = len(low_ni_sample) # size of first sample
n2 = len(high_ni_sample) # size of second sample
# Scale is standard deviation
scale = 3
rvs1 = stats.norm.rvs(low_ni_sample[:,0], size=n1, scale=scale)
rvs2 = stats.norm.rvs(high_ni_sample[:,0], size=n2, scale=scale)
ksresult = stats.ks_2samp(rvs1, rvs2)
ks_val = ksresult[0]
p_val = ksresult[1]
print('K-S Statistics ' + str(ks_val))
print('P-value ' + str(p_val))
Which gives this:
K-S Statistics 0.04507948306145837
P-value 0.8362207851676332
In the examples I've seen, the loc is added like this:
rvs1 = stats.norm.rvs(low_ni_sample[:,0], size=n1, loc=0., scale=scale)
rvs2 = stats.norm.rvs(high_ni_sample[:,0], size=n2, loc=0.5, scale=scale)
If I do that however, I get this error:
Traceback (most recent call last):
File "<ipython-input-342-aa890a947919>", line 13, in <module>
rvs1 = stats.norm.rvs(low_ni_sample[:,0], size=n1, loc=0., scale=scale)
File "/home/kongstad/anaconda3/envs/tensorflow/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py", line 937, in rvs
args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
TypeError: _parse_args_rvs() got multiple values for argument 'loc'
Here is a snapshot, showing the content of the two datasets being used.
low_ni_sample, high_ni_sample.
So my questions are:
Why can't I add a loc value, and what does it represent?
Changing the scale changes the result significantly. Why, and what should I go by?
How would I plot this in a way that makes sense?
After running Silma's suggestion I stumbled upon a new error.
import numpy as np
from scipy import stats
np.random.seed(12345678) #fix random seed to get the same result
n1 = len(low_ni_sample) # size of first sample
n2 = len(high_ni_sample) # size of second sample
# Scale is standard deviation
scale = 3
ndist = stats.norm(loc=0., scale=scale)
rvs1 = ndist.rvs(low_ni_sample[:,0],size=n1)
rvs2 = ndist.rvs(high_ni_sample[:,0],size=n2)
#rvs1 = stats.norm.rvs(low_ni_sample[:,2], size=n1, scale=scale)
#rvs2 = stats.norm.rvs(high_ni_sample[:,2], size=n2, scale=scale)
ksresult = stats.ks_2samp(rvs1, rvs2)
ks_val = ksresult[0]
p_val = ksresult[1]
print('K-S Statistics ' + str(ks_val))
print('P-value ' + str(p_val))
With this error message
rvs1 = ndist.rvs(low_ni_sample[:,0],size=n1)
TypeError: rvs() got multiple values for argument 'size'
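The error occurs because the first positional argument of stats.norm.rvs is loc, so passing low_ni_sample[:,0] positionally and loc as a keyword supplies loc twice. You should first create an instance of the normal distribution before using it: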
ndist = stats.norm(loc=0., scale=scale)
then do
rvs1 = ndist.rvs(size=n1)
to generate n1 samples drawn from a normal distribution centered on 0 with a standard deviation of scale.
The location is therefore the mean of your distribution.
Changing the scale changes the variance of your distribution (you get more variability), so this obviously impacts the KS test...
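To make the roles of loc and scale concrete, here is a minimal sketch. The array data is a hypothetical stand-in for low_ni_sample[:,0], and the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

scale = 3
data = np.zeros(5)  # hypothetical stand-in for low_ni_sample[:, 0]

# Passing an array positionally fills loc, so adding loc= as a keyword clashes.
try:
    stats.norm.rvs(data, size=5, loc=0.0, scale=scale)
    clash = False
except TypeError:
    clash = True

# Using keywords only: loc is the mean, scale is the standard deviation.
samples = stats.norm.rvs(loc=0.5, scale=scale, size=10000, random_state=12345678)
sample_mean = samples.mean()
sample_std = samples.std()
print(clash, sample_mean, sample_std)
```

The sample mean should land close to loc and the sample standard deviation close to scale, which is why changing either one changes what the K-S test sees.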
As for the plot, I'm not sure I see what you mean... if you want to plot the histograms, then do
import matplotlib.pyplot as plt
plt.hist(rvs1)
plt.show()
Or, even better, install seaborn and use its distplot function, for instance with the KDE.
Overall, I would advise you to read a little more about distributions and K-S tests before you go any further; see for instance the Wikipedia page.
EDIT
The code shown above generates random samples from a normal distribution (which I assumed was your goal, to compare with your samples).
If what you want to do is directly compare your two sample data, then all you need is
ksresult = stats.ks_2samp(low_ni_sample[:,0], high_ni_sample[:,0])
Again, this assumes that low_ni_sample[:,0] and high_ni_sample[:,0] are 1D arrays containing many measurements of the quantity of interest; cf. the ks_2samp documentation.
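As a rough illustration of comparing two samples directly, the sketch below uses synthetic 1D arrays in place of the real columns. The names low and high, the sample sizes, and the shift between the two distributions are all made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(12345678)

# Hypothetical stand-ins for low_ni_sample[:, 0] and high_ni_sample[:, 0].
low = rng.normal(loc=0.0, scale=3.0, size=300)
high = rng.normal(loc=2.0, scale=3.0, size=300)

# Two samples drawn from different distributions.
result_diff = stats.ks_2samp(low, high)

# A sample compared with itself: the empirical CDFs coincide.
result_same = stats.ks_2samp(low, low)

print('K-S statistic', result_diff.statistic, 'p-value', result_diff.pvalue)
print('K-S statistic', result_same.statistic, 'p-value', result_same.pvalue)
```

The statistic is the largest vertical distance between the two empirical CDFs. A small p-value suggests the samples come from different distributions, while a large one means the test found no evidence of a difference.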
I am fairly new to applying machine learning, so I was trying to teach myself how to do linear regression with data from mldata.org using the Python scikit-learn package. I tested the linear regression example code (http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html) and it worked well with the diabetes dataset. However, when I tried to use the code with other datasets, such as one about earthquakes on mldata (http://mldata.org/repository/data/viewslug/global-earthquakes/), it failed because of problems with the dimensions of the data.
Warning (from warnings module):
File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 55
warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 65
ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
Traceback (most recent call last):
File "/home/anthony/Documents/Programming/Python/Machine Learning/Scikit/earthquake_linear_regression.py", line 38, in <module>
regr.fit(earthquake_X_train, earthquake_y_train)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 371, in fit
linalg.lstsq(X, y)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 518, in lstsq
raise ValueError('incompatible dimensions')
ValueError: incompatible dimensions
How do I set up the dimensions of the data?
Size of the data:
earthquake_X.shape
(59209, 1, 4)
earthquake_X_train.shape
(59189, 1)
earthquake_y_test.shape
(3, 59209)
earthquake.target.shape
(3, 59209)
The code:
# Code source: Jaques Grobler
# License: BSD 3 clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
#Experimenting with earthquake data
from sklearn.datasets.mldata import fetch_mldata
import tempfile
test_data_home = tempfile.mkdtemp()
# Load the earthquake dataset
earthquake = fetch_mldata('Global Earthquakes', data_home = test_data_home)
# Use only one feature
earthquake_X = earthquake.data[:, np.newaxis]
earthquake_X_temp = earthquake_X[:, :, 2]
# Split the data into training/testing sets
earthquake_X_train = earthquake_X_temp[:-20]
earthquake_X_test = earthquake_X_temp[-20:]
# Split the targets into training/testing sets
earthquake_y_train = earthquake.target[:-20]
earthquake_y_test = earthquake.target[-20:]
print "Splitting of data for performance check completed"
# Create linear regression object
regr = linear_model.LinearRegression()
print "Created linear regression object"
# Train the model using the training sets
regr.fit(earthquake_X_train, earthquake_y_train)
print "Dataset trained"
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean square error
print("Residual sum of squares: %.2f"
% np.mean((regr.predict(earthquake_X_test) - earthquake_y_test) ** 2))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % regr.score(earthquake_X_test, earthquake_y_test))
# Plot outputs
plt.scatter(earthquake_X_test, earthquake_y_test, color='black')
plt.plot(earthquake_X_test, regr.predict(earthquake_X_test), color='blue',
linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Your array of targets (earthquake_y_train) has the wrong shape. In fact, it is empty.
When you do
earthquake_y_train = earthquake.target[:-20]
you select all rows except the last 20 along the first axis. According to the data you posted, earthquake.target has shape (3, 59209), so it has only 3 rows and there are none left to select!
But even if there were any, it would still be an error, because the first dimensions of X and y must match. According to sklearn's documentation, LinearRegression's fit expects X to have shape [n_samples, n_features] and y to have shape [n_samples, n_targets].
To fix it, change the definitions of the ys to the following:
earthquake_y_train = earthquake.target[:, :-20].T
earthquake_y_test = earthquake.target[:, -20:].T
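The effect of the two slicing styles can be checked on placeholder arrays. This sketch uses zero-filled arrays with the shapes reported in the question (the real values are not needed to see the shape problem), and the variable names are illustrative.

```python
import numpy as np

# Placeholder arrays with the shapes reported in the question.
target = np.zeros((3, 59209))   # stand-in for earthquake.target
X = np.zeros((59209, 1))        # stand-in for the single selected feature

# Original slicing: cuts 20 rows off an array that has only 3 rows.
wrong_y_train = target[:-20]

# Corrected slicing: slice along the sample axis, then transpose
# so that samples come first, as fit() expects.
y_train = target[:, :-20].T
y_test = target[:, -20:].T
X_train = X[:-20]

print(wrong_y_train.shape)  # (0, 59209)
print(X_train.shape, y_train.shape, y_test.shape)
```

After the fix, X_train and y_train share the same first dimension (59189 samples), which is the condition fit needs.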
P.S. Even if you fix all of these problems, there is still one more in your script: plt.scatter can't work with "multidimensional" ys.
I recently tried to use scipy.odr package to conduct a regression analysis. Whenever I try to load a list of data where the elements depend on a function, a value error is raised:
ValueError: x could not be made into a suitable array
I have been using the same kind of programming to make fits using scipy's leastsq and curve_fit routines without problems.
Any idea of what to change and how to proceed? Thanks a lot...
Here I include a minimal working example:
from scipy import odr
from functools import partial
import numpy as np
import matplotlib.pyplot as plt
### choose select=0: myModel is called with a list of elements that are functions of some parameters
### this results in the error message: ValueError: x could not be made into a suitable array
### choose select=1: the function temp is excluded, and a fit is generated
### what do I have to do in order to run the program successfully using select=0?
## choose here!
select=1
pfit=[1.0,1.0]
q0=[1,2,3,4,5]
q1=[3,8,10,19,27]
def temp(par, val):
    p1,p2=par
    temp_out = p1*val**p2
    return temp_out
def fodr(a,x):
    if select==0:
        fitf = np.array([xi(a) for xi in x])
    else:
        fitf= a[0]*x**a[1]
    return fitf
# define model
myModel = odr.Model(fodr)
# load data
damy=q1
if select==0:
    damx=[]
    for el in q0:
        elm=partial(temp,val=el)
        damx.append(elm)
    #damx=[el(pfit) for el in damx] # check that function temp works fine
    #print damx
else:
    damx=q0
myData = odr.Data(damx, damy)
myOdr = odr.ODR(myData, myModel , beta0=pfit, maxit=100, ifixb=[1,1])
out = myOdr.run()
out.pprint()
Edit:
@Robert:
Thanks for your reply. I am using scipy version '0.14.0'. Using select==0 in my minimal example I get following traceback:
Traceback (most recent call last):
File "scipy-odr.py", line 48, in <module>
out = myOdr.run()
File "/home/tg/anaconda/lib/python2.7/site-packages/scipy/odr/odrpack.py", line 1061, in run
self.output = Output(odr(*args, **kwds))
ValueError: x could not be made into a suitable array
In short, your code does not work because damx is now a list of functools.partial objects.
scipy.odr is a thin wrapper around the Fortran Orthogonal Distance Regression library (ODRPACK). Both xdata and ydata have to be numerical, since they are converted to Fortran types under the hood. It doesn't know what to do with a list of functools.partial objects, hence the error.
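The difference between the two kinds of input can be seen with NumPy alone. This is only a sketch of the conversion problem, not of ODRPACK's internals: it reuses temp and q0 from the question and checks whether each candidate x can become a floating-point array.

```python
from functools import partial
import numpy as np

def temp(par, val):
    p1, p2 = par
    return p1 * val ** p2

q0 = [1, 2, 3, 4, 5]

# select == 0 in the question: x is a list of callables.
damx = [partial(temp, val=el) for el in q0]
as_objects = np.array(damx)  # NumPy can only store these as generic objects

try:
    np.asarray(damx, dtype=float)
    convertible = True
except (TypeError, ValueError):
    convertible = False

# select == 1 in the question: plain numbers convert without trouble.
numeric_x = np.asarray(q0, dtype=float)
print(as_objects.dtype, convertible, numeric_x.dtype)
```

Keeping the raw numbers in the data and moving the parameter-dependent computation into the model function, as in the select=1 branch, avoids the problem.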
The code computes the Fourier transform of a .tiff image on my Ubuntu 11.04, but on Windows XP it produces a memory error. What should I change? Thank you.
import glob
import numpy
import numpy as np
import scipy.misc
from scipy import fftpack
def fouriertransform(result): #function for Fourier transform computation
    for filename in glob.iglob('*.tif'):
        imgfourier = scipy.misc.imread(filename) #read the image
        arrayfourier = numpy.array([imgfourier]) #make an array
        # Take the fourier transform of the image.
        F1 = fftpack.fft2(arrayfourier)
        # Now shift so that low spatial frequencies are in the center.
        F2 = fftpack.fftshift(F1)
        # the 2D power spectrum is:
        psd2D = np.abs(F2)**2
        L = psd2D
        np.set_printoptions(threshold=3)
        #np.set_printoptions(precision = 3, threshold = None, edgeitems = None, linewidth = 3, suppress = True, nanstr = None, infstr = None, formatter = None)
        for subarray in L:
            for array in subarray:
                for elem in array:
                    print '%3.10f\n' % elem
The error output is:
Traceback (most recent call last):
File "C:\Documents and Settings\HrenMudak\Мои документы\Моя музыка\fourier.py", line 27, in <module>
F1 = fftpack.fft2(arrayfourier)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 571, in fft2
return fftn(x,shape,axes,overwrite_x)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 521, in fftn
return _raw_fftn_dispatch(x, shape, axes, overwrite_x, 1)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 535, in _raw_fftn_dispatch
return _raw_fftnd(tmp,shape,axes,direction,overwrite_x,work_function)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 463, in _raw_fftnd
x, copy_made = _fix_shape(x, s[i], waxes[i])
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 134, in _fix_shape
z = zeros(s,x.dtype.char)
MemoryError
I tried to run your code, except that I replaced mahotas.imread with scipy.misc.imread because I don't have that library, and I could not reproduce your error.
Some further remarks:
Can you try scipy.misc.imread instead of the mahotas function? I suppose the issue could be there.
What is the actual exception that is thrown (and any other output)?
What are the dimensions of your image? Is it gray-scale or RGB? Printing all values of a large image could indeed take up quite some memory, so it might be better to visualize the results with e.g. matplotlib's imshow function.
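A rough sketch of computing the power spectrum as a single 2D array is given below, which keeps memory use predictable and makes the result easy to plot instead of print. The function name power_spectrum is made up for this example, np.fft is used in place of fftpack, and averaging the colour channels is just one possible way to get a gray-scale image.

```python
import numpy as np

def power_spectrum(image):
    """Return the centred 2D power spectrum of a gray-scale or RGB image."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        # Collapse the colour channels so the FFT runs over rows and columns only.
        image = image.mean(axis=2)
    # Shift so that low spatial frequencies sit in the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2

# Tiny synthetic RGB image: all the energy ends up in the zero-frequency bin.
psd = power_spectrum(np.ones((8, 8, 3)))
print(psd.shape)
```

The returned array can be passed to matplotlib's imshow (often after taking a logarithm) rather than printed element by element.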
I am trying to convert an image from cartesian to polar so that I can unravel the image, but I am getting a runtime error. If you are curious how this looks visually, see this example.
Code:
import scipy
import scipy.ndimage
import numpy as np
from math import *
import cv2
def logpolar(input):
    # This takes a numpy array and returns it in Log-Polar coordinates.
    coordinates = np.mgrid[0:max(input.shape[:])*2,0:360] # We create a cartesian array which will be used to compute log-polar coordinates.
    log_r = 10**(coordinates[0,:]/(input.shape[0]*2.)*log10(input.shape[1])) # This contains a normalized logarithmic gradient
    angle = 2.*pi*(coordinates[1,:]/360.) # This is a linear gradient going from 0 to 2*Pi
    # Using scipy's map_coordinates(), we map the input array on the log-polar coordinate. Do not forget to center the coordinates!
    lpinput = scipy.ndimage.interpolation.map_coordinates(input,(log_r*np.cos(angle)+input.shape[0]/2.,log_r*np.sin(angle)+input.shape[1]/2.),order=3,mode='constant')
    # Returning log-normal...
    return lpinput
# Load image
image = cv2.imread("test.jpg")
result = logpolar(image)
Error message in console:
Traceback (most recent call last):
File "test.py", line 23, in <module>
result = logpolar(image)
File "test.py", line 15, in logpolar
lpinput = scipy.ndimage.interpolation.map_coordinates(input,(log_r*np.cos(angle)+input.shape[0]/2.,log_r*np.sin(angle)+input.shape[1]/2.),order=3,mode='constant')
File "/Library/Python/2.7/site-packages/scipy-0.13.0.dev_c31f167_20130415-py2.7-macosx-10.8-intel.egg/scipy/ndimage/interpolation.py", line 295, in map_coordinates
raise RuntimeError('invalid shape for coordinate array')
RuntimeError: invalid shape for coordinate array
My first guess would be that you are passing in a colour image, which is 3-dimensional. At first glance I don't think your code can handle that.
My guess was based on the error you pasted, specifically
"invalid shape for coordinate array"
With higher-dimensional arrays like that, you usually have to pass extra parameters specifying which axis to repeat the operations over, and even then it sometimes does not work. I didn't see an extra integer repeated at the end of your argument lists, so I figured you weren't handling that case explicitly and might have forgotten to check your array dimensions after reading in the image.
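One way to handle the colour case is to resample each channel separately, so that the coordinate array always has one row per axis of the 2D slice being sampled. The sketch below is an assumption about how that could be done: sample_channels is a hypothetical helper, and it uses linear interpolation (order=1) on a small synthetic array rather than the real image.

```python
import numpy as np
from scipy import ndimage

def sample_channels(image, rows, cols, order=1):
    """Sample a 2D or 3D (rows, cols, channels) image at the given coordinates."""
    image = np.asarray(image, dtype=float)
    # One coordinate array per axis of a 2D slice: shape (2, ...).
    coords = np.array([rows, cols], dtype=float)
    if image.ndim == 2:
        return ndimage.map_coordinates(image, coords, order=order, mode='constant')
    # Colour image: apply the same 2D mapping to every channel.
    channels = [
        ndimage.map_coordinates(image[..., c], coords, order=order, mode='constant')
        for c in range(image.shape[2])
    ]
    return np.stack(channels, axis=-1)

# Identity mapping on a small synthetic 3-channel image.
image = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
rows, cols = np.mgrid[0:4, 0:5]
out = sample_channels(image, rows, cols)
print(image.ndim, out.shape)
```

Alternatively, converting the image to gray-scale before calling logpolar gives a 2D array that matches the two coordinate arrays the function already builds.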
Glad it helped :)