Error while trying to use Regression on N_dimensional Arrays - python

My code is supposed to read audio files and predict another audio file (I don't care about accuracy for now, just the error):
regr = svm.SVR()
print('Fitting...')
regr.fit(data0, data1)
clf1 = regr.fit(sample_rate1, sample_rate0)
clf0 = regr.fit(data, data1)
print('Done!')
predata = clf.predict(data2)
predrate = clf1.predict(sample_rate2)
wavfile.write('result.wav', predrate, predata)  # using the predicted ndarrays it saves the audio file
The error which I get is:
Traceback (most recent call last):
File "D:\ Folder\Python\module properties\wav.py", line 10, in <module>
regr.fit(data0, data1)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\svm\_base.py", line 169, in fit
X, y = self._validate_data(X, y, dtype=np.float64,
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 826, in check_X_y
y = column_or_1d(y, warn=True)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\Admin1\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py", line 864, in column_or_1d
raise ValueError(
ValueError: y should be a 1d array, got an array of shape (8960, 2) instead.

Check your independent and dependent variable X and y assignments.
The fit method is used in the form model.fit(X, y), and the line that establishes the model fit and gave you the error seems to be:
regr.fit(data0, data1)
Thus your predictor variables as written should be X = data0, and your target (output) variable should be y = data1.
Make sure you don't have them reversed; it should not be:
regr.fit(data1, data0)
If the data is correctly assigned, try flattening the target array.
The ValueError also tells you: "y should be a 1d array, got an array of shape (8960, 2) instead."
Flattening means converting a multidimensional array to a one-dimensional array. Try reshape(-1):
data1 = data1.reshape(-1)
I hope this helps! Without any additional information about the dataset and the model's code, it's hard to figure out what to do next.
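As a hedged sketch of why the shape matters (the arrays below are random stand-ins for the audio data, so the names and sizes are illustrative): a shape of (8960, 2) looks like a stereo buffer, and calling reshape(-1) on it doubles its length, so the row counts of X and y would no longer match. For a stereo target, one option is to average the two channels down to mono instead:

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))          # stand-in predictor, 100 samples
y_stereo = rng.normal(size=(100, 2))   # stereo target: shape (100, 2) -> ValueError

# reshape(-1) would flatten to 200 values for only 100 rows of X;
# averaging the two channels keeps one target value per sample instead.
y_mono = y_stereo.mean(axis=1)         # shape (100,)

regr = svm.SVR()
regr.fit(X, y_mono)                    # no "y should be a 1d array" error
```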


statsmodels SARIMAX with exogenous variables matrices are different sizes

I'm running a SARIMAX model but running into problems with specifying the exogenous variables. In the first block of code (below) I specify one exogenous variable lesdata['LESpost'] and the model runs without a problem. However, when I add in another exogenous variable I end up with an error message (see stack trace).
ar = (1,0,1) # AR(1 3)
ma = (0) # No MA terms
mod1 = sm.tsa.statespace.SARIMAX(lesdata['emadm'], exog= (lesdata['LESpost'],lesdata['QOF']), trend='c', order=(ar,0,ma), mle_regression=True)
Traceback (most recent call last):
File "<ipython-input-129-d1300aeaeffc>", line 4, in <module>
mle_regression=True)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\statespace\sarimax.py", line 510, in __init__
endog, exog=exog, k_states=k_states, k_posdef=k_posdef, **kwargs
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 84, in __init__
missing='none')
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 43, in __init__
super(TimeSeriesModel, self).__init__(endog, exog, missing=missing)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 212, in __init__
super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 63, in __init__
**kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 88, in _handle_data
data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 630, in handle_data
**kwargs)
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 80, in __init__
self._check_integrity()
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 496, in _check_integrity
super(PandasData, self)._check_integrity()
File "C:\Users\danie\Anaconda2\lib\site-packages\statsmodels\base\data.py", line 403, in _check_integrity
raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes
Is there something obvious I am missing here? The variables are all of the same length and there are no missing data.
Thanks for reading and hope you can help!
Two-dimensional data needs to have observations in rows and variables in columns after applying numpy.asarray.
exog = (lesdata['LESpost'],lesdata['QOF'])
Applying numpy.asarray to this tuple puts the variables in rows, which is the NumPy default (inherited from C's row-major layout) and not what statsmodels wants.
DataFrames are already shaped the appropriate way, so one option is to use a DataFrame with the desired columns:
exog = lesdata[['LESpost', 'QOF']]
Another option for lists or tuples of array_likes is to use numpy.column_stack, e.g.
exog = np.column_stack((lesdata['LESpost'].values,lesdata['QOF'].values))
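A minimal sketch of the shape difference (the toy DataFrame below is a hypothetical stand-in for lesdata):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for lesdata
lesdata = pd.DataFrame({'LESpost': [1., 2., 3., 4.],
                        'QOF':     [5., 6., 7., 8.]})

# asarray on a tuple of Series gives (n_vars, n_obs): variables in rows
bad = np.asarray((lesdata['LESpost'], lesdata['QOF']))
print(bad.shape)    # (2, 4)

# column_stack gives (n_obs, n_vars): observations in rows, as statsmodels expects
good = np.column_stack((lesdata['LESpost'].values, lesdata['QOF'].values))
print(good.shape)   # (4, 2)
```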

Cannot plot my function : return array(a, dtype, copy=False, order=order) TypeError: float() argument must be a string or a number

I'm trying to plot a function that gives the arctan of the angle of several scatterplots (it's a physics experiment):
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
filename='rawPhaseDataf2f_17h_15m.dat'
datatype=np.dtype( [('Shotnumber',np.dtype('>f8')),('A1',np.dtype('>f8')), ('A2',np.dtype('>f8')), ('f2f',np.dtype('>f8')), ('intensity',np.dtype('>f8'))])
data=np.fromfile(filename,dtype=datatype)
#time=data['Shotnumber']/9900 # reprate is 9900 Hz -> time in seconds
A1=data['A1']
A2=data['A2']
#np.sort()
i=range(1,209773)
def x(i) :
    return arctan((A1.item(i)/A2.item(i))*(i/209772))
def y(i) :
    return i*2*pi/209772
plot(x,y)
plt.figure('Scatterplot')
plt.plot(A1,A2,',') #Scatterplot
plt.xlabel('A1')
plt.ylabel('A2')
plt.figure('2D Histogram')
plt.hist2d(A1,A2,100) # 2D Histogram
plt.xlabel('A1')
plt.ylabel('A2')
plt.show()
My error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell /sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/home/nelly/Bureau/ Téléchargements/Kr4 Experiment/read_rawPhaseData.py", line 21, in <module>
plot(x,y)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2987, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 4138, in plot
self.add_line(line)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1497, in add_line
self._update_line_limits(line)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1508, in _update_line_limits
path = line.get_path()
File "/usr/lib/pymodules/python2.7/matplotlib/lines.py", line 743, in get_path
self.recache()
File "/usr/lib/pymodules/python2.7/matplotlib/lines.py", line 420, in recache
x = np.asarray(xconv, np.float_)
File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number
I know that the problem is from the plot(x,y). I think my error comes from the definitions of x and y. A1 and A2 are arrays, N is the number of points, and k indexes them. I want to plot arctan(A1[k]/A2[k])*(k/N).
There are lots of problems with your code, and your understanding of python and array operations. I'm just going to handle the first part of the code (and the error you get), and hopefully you can continue to fix it from there.
This should fix the error you're getting and generate a plot:
# size = 209772
size = A1.size  # I'm assuming that the size of the array is 209772
z = np.arange(1, size + 1) / float(size + 1)  # array of values from 1/209773 up to ~1.0
# Calculate the x and y arrays
x = np.arctan((A1 / A2) * z)
y = z * 2 * np.pi
# Plot x and y
plt.plot(x, y)
Discussion:
There are lots of issues with this chunk of code:
i=range(1,209773)
def x(i) :
    return arctan((A1.item(i)/A2.item(i))*(i/209772))
def y(i) :
    return i*2*pi/209772
plot(x, y)
You're defining two functions called x and y, and then you are passing those functions to the plotting method. The plotting method accepts numbers (in lists or arrays), not functions. That is the reason for the error that you are getting. So you instead need to construct a list/array of numbers and pass that to the function.
You're defining a variable i which is a list of numbers. But when you define the functions x and y, you are creating new variables named i which have nothing to do with the list you created earlier. This is because of how "scope" works in python.
The functions arctan and plot are not defined "globally"; they live in the numpy and matplotlib packages, so you need to call them through those packages (e.g. np.arctan, plt.plot).
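The three points above can be sketched in a few runnable lines (the A1/A2 values here are random stand-ins for the experimental data): build arrays of numbers first, then pass the arrays, not functions, to the namespaced plot call.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
A1 = rng.normal(size=100)            # stand-in data
A2 = rng.normal(size=100) + 2.0      # offset to keep the ratio well-behaved

size = A1.size
z = np.arange(1, size + 1) / float(size + 1)

x = np.arctan((A1 / A2) * z)         # arrays of numbers, not functions
y = z * 2 * np.pi
plt.plot(x, y)                       # namespaced call, accepts the arrays
plt.savefig('arctan_plot.png')
```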

Scikit-learn: linear_model.SGDClassifier(): ValueError: ndarray is not C-contiguous when calling partial_fit()

I am trying to run a linear_model.SGDClassifier() and have it update after every example it classifies.
My code works for a small feature file (10 features), but when I give it a bigger feature file (some 80000 features, but very sparse) it keeps giving me errors straight away, the first time partial_fit() is called.
This is what I do in pseudocode:
X, y = load_svmlight_file(train_file)
classifier = linear_model.SGDClassifier()
classifier.fit(X, y)
for every test_line in test file:
    test_X, test_y = getFeatures(test_line)
    # This gives me a Python list for X
    # and an integer label for y
    print "prediction: %f" % classifier.predict([test_X])
    classifier.partial_fit(csr_matrix([test_X]),
                           csr_matrix([Y_GroundTruth]),
                           classes=np.unique(y))
The error I keep getting for the partial_fit() line is:
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 487, in partial_fit
coef_init=None, intercept_init=None)
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 371, in _partial_fit
sample_weight=sample_weight, n_iter=n_iter)
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 451, in _fit_multiclass
for i in range(len(self.classes_)))
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 517, in __call__
self.dispatch(function, args, kwargs)
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 312, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 136, in __init__
self.results = func(*args, **kwargs)
File "/bla/bla/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 284, in fit_binary
est.power_t, est.t_, intercept_decay)
File "sgd_fast.pyx", line 327, in sklearn.linear_model.sgd_fast.plain_sgd (sklearn/linear_model/sgd_fast.c:7568)
ValueError: ndarray is not C-contiguous
I also tried feeding partial_fit() Python lists, or numpy arrays (which are C-contiguous (order='C') by default, I thought), but this gives the same result.
The classes attribute is not the problem, I think: the same error appears if I leave it out or hard-code the right classes.
I do notice that when I print the flags of the _coef array of the classifier, it says:
Flags of coef_ array:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
I am sure I am doing something wrong, but really, I don't see what...
Any help appreciated!
For the record (so this question doesn't appear unanswered), this question was previously answered on the scikit-learn mailing list. It's a bug in scikit-learn 0.14's SGDClassifier. The workaround is to replace the initial fit with partial_fit.
Update: I fixed the bug a couple of minutes ago.
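A hedged sketch of the workaround on toy data (random stand-in features; in the original setup X and y would come from load_svmlight_file): skip the initial fit and use partial_fit from the start, passing the full class list on the first call.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.integers(0, 3, size=30)

clf = SGDClassifier()
clf.partial_fit(X, y, classes=np.unique(y))  # first call replaces fit()
clf.partial_fit(X[:1], y[:1])                # later incremental updates omit classes
print(clf.predict(X[:1]))
```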

ZeroDivisionError when using scipy.interpolate.griddata

I'm getting a ZeroDivisionError from the following code:
#stacking the array into a complex array allows np.unique to choose
#truly unique points. We also keep a handle on the unique indices
#to allow us to index `self` in the same order.
unique_points,index = np.unique(xdata[mask]+1j*ydata[mask],
return_index=True)
#Now we break it into the data structure we need.
points = np.column_stack((unique_points.real,unique_points.imag))
xx1,xx2 = self.meta['rcm_xx1'],self.meta['rcm_xx2']
yy1 = self.meta['rcm_yy2']
gx = np.arange(xx1,xx2+dx,dx)
gy = np.arange(-yy1,yy1+dy,dy)
GX,GY = np.meshgrid(gx,gy)
xi = np.column_stack((GX.ravel(),GY.ravel()))
gdata = griddata(points,self[mask][index],xi,method='linear',
fill_value=np.nan)
Here, xdata, ydata and self are all 2D numpy.ndarrays (or subclasses thereof) with the same shape and dtype=np.float32. mask is a 2D ndarray with the same shape and dtype=bool. See the scipy.interpolate.griddata documentation for reference.
Originally, xdata and ydata are derived from a non-uniform cylindrical grid that has a 4-point stencil -- I thought that the error might be coming from the same point being defined multiple times, so I made the set of input points unique as suggested in a related question. Unfortunately, that hasn't seemed to help. The full traceback is:
Traceback (most recent call last):
File "/xxxxxxx/rcm.py", line 428, in <module>
x[...,1].to_pz0()
File "/xxxxxxx/rcm.py", line 285, in to_pz0
fill_value=fill_value)
File "/usr/local/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", line 183, in griddata
ip = LinearNDInterpolator(points, values, fill_value=fill_value)
File "interpnd.pyx", line 192, in scipy.interpolate.interpnd.LinearNDInterpolator.__init__ (scipy/interpolate/interpnd.c:2935)
File "qhull.pyx", line 996, in scipy.spatial.qhull.Delaunay.__init__ (scipy/spatial/qhull.c:6607)
File "qhull.pyx", line 183, in scipy.spatial.qhull._construct_delaunay (scipy/spatial/qhull.c:1919)
ZeroDivisionError: float division
For what it's worth, the code "works" (No exception) if I use the "nearest" method.
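For context (an illustration on toy points, not the original data): method='linear' builds a Delaunay triangulation of the input points, which is where the qhull frames in the traceback come from, and the triangulation fails on degenerate inputs such as collinear points, while method='nearest' only needs a nearest-neighbour lookup. Recent SciPy versions raise a QhullError here rather than a ZeroDivisionError:

```python
import numpy as np
from scipy.interpolate import griddata

# Degenerate input: all points on one line, so no 2-D triangulation exists
points = np.column_stack((np.arange(5.0), np.arange(5.0)))
values = np.arange(5.0)
xi = np.array([[1.5, 1.5]])

print(griddata(points, values, xi, method='nearest'))  # works

try:
    griddata(points, values, xi, method='linear')
except Exception as exc:  # QhullError in recent SciPy
    print('linear failed:', type(exc).__name__)
```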

Drawing under a curve in matplotlib

For a subplot (self.intensity), I want to shade the area under the graph.
I tried this, hoping it was the correct syntax:
self.intensity.fill_between(arange(l,r), 0, projection)
by which I intend to shade the projection numpy array within the (l, r) integer limits.
But it gives me an error. How do I do it correctly?
Here's the traceback:
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_wx.py", line 1289, in _onLeftButtonDown
FigureCanvasBase.button_press_event(self, x, y, 1, guiEvent=evt)
File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 1576, in button_press_event
self.callbacks.process(s, mouseevent)
File "/usr/lib/pymodules/python2.7/matplotlib/cbook.py", line 265, in process
proxy(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/cbook.py", line 191, in __call__
return mtd(*args, **kwargs)
File "/root/dev/spectrum/spectrum/plot_handler.py", line 55, in _onclick
self._call_click_callback(event.xdata)
File "/root/dev/spectrum/spectrum/plot_handler.py", line 66, in _call_click_callback
self.__click_callback(data)
File "/root/dev/spectrum/spectrum/plot_handler.py", line 186, in _on_plot_click
band_data = self._band_data)
File "/root/dev/spectrum/spectrum/plot_handler.py", line 95, in draw
self.intensity.fill_between(arange(l,r), 0, projection)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6457, in fill_between
raise ValueError("Argument dimensions are incompatible")
ValueError: Argument dimensions are incompatible
It seems like you are trying to fill the part of the projection from l to r. fill_between expects the x and y arrays to be of equal length, so you cannot fill only part of the curve this way.
To get what you want, you can do either of the following:
1. send only the part of the projection that needs to be filled to the command, and draw the rest of the projection separately.
2. pass a separate boolean array via the where argument that defines the sections to fill in. See the documentation!
For the former method, see the example code below:
from pylab import *
a = subplot(111)
t = arange(1, 100)/50.
projection = sin(2*pi*t)
# Draw the original curve
a.plot(t, projection)
# Define areas to fill in
l, r = 10, 50
# Fill the areas
a.fill_between(t[l:r], projection[l:r])
show()
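The second option can be sketched like this (same toy curve; the Agg backend is used only so the snippet runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

t = np.arange(1, 100) / 50.
projection = np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, projection)
# Fill only where the mask is True: here, the index range [10, 50)
mask = (np.arange(t.size) >= 10) & (np.arange(t.size) < 50)
ax.fill_between(t, 0, projection, where=mask)
fig.savefig('filled.png')
```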