Finding error range for peak point using polynomial fitting - python

I have data for a spectral line which makes a noisy U-shaped curve.
I want to fit a curve and find the x, y values for the minimum point.
I fitted a third-order polynomial to it using polyfit.
I then found the minimum point on the fitted curve.
NB: The original curve is not symmetric (the left side is slightly steeper than the right).
Therefore min(original) is slightly left of min(fitted_curve).
How do I find the x and y errors for this point?
Here are the bones of my code:
import pylab, numpy
x = [... linear list of floats ...]
y = [... list of floats ...]  # Produces a noisy U-shaped curve.
fit = numpy.polyfit(x, y, 3)  # Fit a third-order polynomial.
fit2 = numpy.polyval(fit, x)  # Evaluate the fit at the x values.
miny = min(fit2)              # Minimum y value on the fitted curve.
minx = x[numpy.argmin(fit2)]  # Corresponding x value (not min(x)).
pylab.plot(x, y, 'k-')
pylab.plot(x, fit2, 'r-')
pylab.plot(minx, miny, 'ro')
pylab.show()
Now that I have the original [x, y], the fitted curve [x, fit2] and the minimum point on the fitted curve [minx, miny], how do I find the error range for this single point?
Thanks.

As of NumPy 1.7, polyfit has the option cov=True, which returns the covariance matrix of the coefficients as an additional output. From this, using Gaussian error propagation, you can get the error of the minimum. But what kind of spectrum is it? Very often there are model shapes to fit, so there may be no need for a polynomial fit.
You might also want to look at scipy.optimize.curve_fit
PS: What makes you think that the true value is left of your fitted value? This would be true if your fit function were symmetric and applied to an asymmetric peak. A third-order polynomial, however, should be able to accommodate the asymmetry.
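To make the error propagation concrete, here is a minimal sketch with made-up data (the original x and y are not given): it draws coefficient samples from the covariance matrix returned by polyfit and reads the x and y errors of the minimum off the spread of the sampled minima. The data-generating line is purely hypothetical.

import numpy as np

# Hypothetical noisy, slightly asymmetric U-shaped data; substitute your own x, y.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 60)
y = (x - 0.1)**2 + 0.1*x**3 + rng.normal(0.0, 0.05, x.size)

coeffs, cov = np.polyfit(x, y, 3, cov=True)  # cov=True needs NumPy >= 1.7

# Monte Carlo error propagation: sample coefficient sets from the fitted
# covariance and collect the minimum location of each sampled cubic.
xmins, ymins = [], []
for a, b, c, d in rng.multivariate_normal(coeffs, cov, size=5000):
    disc = b**2 - 3*a*c                # discriminant of p'(x) = 3a*x**2 + 2b*x + c
    if disc <= 0:
        continue                       # this draw has no real stationary point
    xm = (-b + np.sqrt(disc)) / (3*a)  # the root where p'' = 2*sqrt(disc) > 0
    xmins.append(xm)
    ymins.append(np.polyval([a, b, c, d], xm))

print("x_min = %.4f +/- %.4f" % (np.mean(xmins), np.std(xmins)))
print("y_min = %.4f +/- %.4f" % (np.mean(ymins), np.std(ymins)))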

Related

How are the coefficients derived in the scipy.interpolate.splprep function

Pardon this long post.
I am new to B-splines and am struggling to understand a few things. I have a set of data points for which I need to construct a B-spline curve. The data points are as follows:
x = [150, 130, 148]
y = [149, 114, 79]
After running the following function:
from scipy.interpolate import splprep, splev
tck, u = splprep([x, y], k=2, s=0)
I am getting
parameters
u = [0. 0.505987 1.]
knots t = [0, 0, 0, 1, 1, 1]
coefficients c = [array([150. , 111.01850233, 148. ]), array([149. , 114.83829958, 79. ])]
k = 2 (this is the degree of the curve I used as input to splprep).
I now need to test whether the t,c,k values generated are correct or not.
I ran the following function -
newPts = splev(u, tck)
This gives me back the x and y data points I used in splprep:
newPts[0] = x
newPts[1] = y
Plotting newPts[0] against newPts[1] gives me the following spline:
[plot: spline evaluation 1]
The second test I ran was changing the parameters value to
u = np.linspace(0,1,5)
then ran the following
newPts = splev(u, tck)
This time my spline curve looks like the following:
[plot: spline evaluation 2]
From the following links, computing the parameters and knot vector generation, I deduced that my parameters (u) and knots (t) are derived correctly. However, the computation of the coefficients looks complicated. From the global curve interpolation formula, found here, coefficient matrix, I can see the coefficient matrix is an n×n matrix, which in my case should be 3×3. But the coefficient matrix I am getting is 2×3, where the first array holds the coefficients for x and the second the coefficients for y.
I really need a concrete way to prove if the coefficients derived from the splprep library are correct or not.
Really appreciate the help.
Yes, the values are correct. Let me show you how I have checked them using the Wolfram language (AKA Mathematica):
First, I take the control points (what you saved as c)
cps={{150,149},{111.01850233,114.83829958},{148,79}};
Since there are no internal knots (i.e., t=[0, 0, 0, 1, 1, 1]), your B-spline actually reduces to a Bézier curve. Let's create it:
curve:=BezierFunction[cps]
Now we can evaluate it at the parameter values u and check that it interpolates your data.
In[23]:= curve[0]
Out[23]= {150.,149.}
In[24]:= curve[0.505987]
Out[24]= {130.,114.}
In[25]:= curve[1]
Out[25]= {148.,79.}
We can even plot the entire curve:
data={{150,149}, {130,114},{148,79}};
Graphics[{PointSize[Large],BezierCurve[cps], Green, Line[cps],Red,Point[cps],Blue, Point[data]}]
The curve is black, its control polygon green (with the control points in red), and the data points blue; clearly, the curve passes through all three data points.
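If you would rather stay in Python, here is a small sketch of the same check using scipy: because the knot vector has no interior knots, the spline is exactly a quadratic Bézier curve, so evaluating the Bernstein form of the returned control points (the bezier helper below is just that, written inline) must reproduce what splev gives.

import numpy as np
from scipy.interpolate import splprep, splev

x = [150, 130, 148]
y = [149, 114, 79]
tck, u = splprep([x, y], k=2, s=0)

# With knots [0, 0, 0, 1, 1, 1] the B-spline reduces to a quadratic Bezier,
# so it can be evaluated directly from the control points.
cps = np.array(tck[1]).T                     # control points, shape (3, 2)

def bezier(t):
    t = np.asarray(t)[:, None]
    return (1 - t)**2 * cps[0] + 2*t*(1 - t) * cps[1] + t**2 * cps[2]

uu = np.linspace(0, 1, 7)
spline_pts = np.array(splev(uu, tck)).T
print(np.allclose(spline_pts, bezier(uu)))   # True: the coefficients check out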

Fit curve to data, get analytical form, & check when curve crosses threshold

I have 40 points for each curve and I would like to smooth the function and estimate when the curve crosses a threshold on the y axis.
Is there a fitting function that I can easily apply to this? I can use interpolation to plot the new function, but I can't figure out how to request the x value for which y equals the threshold.
Unfortunately the curves don't all have the same shape so I can't use scipy.optimize.curve_fit.
Thanks!
Update: Adding two curves:
Curve 1
[942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688]
Curve 2
[-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980]
x values are
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]
For fitting a smooth curve, you can fit Legendre polynomials using numpy.polynomial.legendre.Legendre's fit method.
# import packages we need later
import matplotlib.pyplot as plt
import numpy as np
Fitting Legendre Polynomials
Preparing data as numpy arrays:
curve1 = \
np.asarray([942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688])
curve2 = \
np.asarray([-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980])
xvals = \
np.asarray([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40])
Let's fit the Legendre polynomials, with degree being the highest polynomial degree used in the fit.
degree=10
legendrefit_curve1 = np.polynomial.legendre.Legendre.fit(xvals, curve1, deg=degree)
legendrefit_curve2 = np.polynomial.legendre.Legendre.fit(xvals, curve2, deg=degree)
Calculate these fitted curves at evenly spaced points using the linspace method. n is the number of point pairs we want to have.
n=100
fitted_vals_curve1 = legendrefit_curve1.linspace(n=n)
fitted_vals_curve2 = legendrefit_curve2.linspace(n=n)
Let's plot the result, along with a threshold (using axhline):
plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)
plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')
threshold=100000
plt.axhline(y=threshold)
The curves fit beautifully.
When is the threshold crossed?
To check where the threshold is crossed in each series, you can do:
for x, y in zip(fitted_vals_curve1[0], fitted_vals_curve1[1]):
    if y > threshold:
        xcross_curve1 = x
        break
for x, y in zip(fitted_vals_curve2[0], fitted_vals_curve2[1]):
    if y > threshold:
        xcross_curve2 = x
        break
xcross_curve1 and xcross_curve2 hold the x values where curve1 and curve2 first cross the threshold, if they cross it at all; if they do not, they remain undefined.
Let's plot them to check if it works (see the axvline docs):
plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)
plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')
plt.axhline(y=threshold)
try:
    plt.axvline(x=xcross_curve1)
except NameError:
    print('curve1 is not passing the threshold')
try:
    plt.axvline(x=xcross_curve2)
except NameError:
    print('curve2 is not passing the threshold')
As expected, we get this plot:
(and a text output: curve2 is not passing the threshold.)
If you would like to increase accuracy of xcross_curve1 or xcross_curve2, you can increase degree and n defined above.
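As a sketch of a more direct alternative (no scanning of the linspace output needed): numpy's polynomial classes support arithmetic and root finding, so you can subtract the threshold from the fitted series and ask for its roots.

# Subtracting a scalar from a fitted series gives another Legendre series;
# its real roots inside the data range are the threshold crossings.
shifted = legendrefit_curve1 - threshold
roots = shifted.roots()
real = roots[np.isreal(roots)].real
crossings = real[(real >= xvals.min()) & (real <= xvals.max())]
print(crossings)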
From Legendre to Polynomial form
We have fitted a curve which roughly has the form:

f(x) = q_0*P_0(s(x)) + q_1*P_1(s(x)) + ... + q_10*P_10(s(x))

where P_n is the nth Legendre polynomial and s(x) is some function which transforms x into the range the P_n expect (some math stuff which we don't need to know now).

We want our fitted line in the form:

f(x) = a_0 + a_1*s(x) + a_2*s(x)**2 + ... + a_10*s(x)**10
We'll use legendre() of scipy.special:
from scipy.special import legendre
We'll also use np.pad (docs, good SO post).
legendredict = {}
for icoef, coef in enumerate(legendrefit_curve1.coef):
    legendredict[icoef] = coef*np.pad(legendre(icoef).coef, (10-icoef, 0), mode='constant')
legendredict will hold keys from 0 to 10, and each value in the dict will be an array of floats. The key refers to the degree of the Legendre polynomial, and the array gives the coefficients of the powers of x contributed by that component of our fit, highest power first.
For example, P_4 is:

P_4(x) = (35*x**4 - 30*x**2 + 3)/8

and legendredict[4] is:
array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 3.29634565e+05, 3.65967884e-11,
-2.82543913e+05, 1.82983942e-11, 2.82543913e+04])
Meaning that in the sum of the P_ns (f(x) above) we have q_4 of P_4, which is equivalent to having 2.82543913e+04 of 1s, 1.82983942e-11 of x, -2.82543913e+05 of x^2, etc., coming from the P_4 component alone.
So if we want to know how many 1s, xs, x^2s, etc. we need to form the polynomial sum, we add up the contributions from all the different P_ns. This is what we do:
polycoeffs = np.sum(np.stack(list(legendredict.values())),axis=0)
Then let's form a polynomial sum:
for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*s(x)**'+str(icoef), end='\n +')
Giving the output:
-874.1456709637822*s(x)**0
+2893.7228005540596*s(x)**1
+50415.38472217957*s(x)**2
+-6979.322584205707*s(x)**3
+-453363.49985790614*s(x)**4
+-250464.7549807652*s(x)**5
+1250129.5521521813*s(x)**6
+1267709.5031024509*s(x)**7
+-493280.0177807359*s(x)**8
+-795684.224334346*s(x)**9
+-134370.1696946264*s(x)**10
+
(We are going to ignore the last + sign, formatting is not the main point here.)
We need to calculate s(x) as well. If we are working in a Jupyter Notebook / Google Colab, executing a cell containing just legendrefit_curve1 displays the fitted series, including the domain-to-window mapping, from which we can see that s(x) is -1.0512820512820513 + 0.05128205128205128*x. If we want to do it in a more programmatic way:
2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) is 0.05128205128205128 &
-1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) is just -1.0512820512820513
This works because s is the affine map s(x) = (2*x - domain[0] - domain[1])/(domain[1] - domain[0]) that sends the fit domain [1, 40] onto the window [-1, 1] where the Legendre polynomials live; the second expression equals -(domain[0]+domain[1])/(domain[1]-domain[0]) here only because domain[0] is 1 (related Q).
So we can define:
def s(x):
    a = -1 - 2/(legendrefit_curve1.domain[1] - legendrefit_curve1.domain[0])
    b = 2/(legendrefit_curve1.domain[1] - legendrefit_curve1.domain[0])
    return a + b*x
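As a side note, numpy exposes this mapping directly, so a short sketch using the built-in accessor could replace the hand-rolled version:

# mapparms() returns (offset, scale) such that s(x) = offset + scale*x.
off, scl = legendrefit_curve1.mapparms()
print(off, scl)   # -1.0512820512820513 0.05128205128205128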
Also, let's define, based on the above obtained sum of polynomials of s(x):
def polyval(x):
    return -874.1456709637822*s(x)**0+2893.7228005540596*s(x)**1+50415.38472217957*s(x)**2+-6979.322584205707*s(x)**3+-453363.49985790614*s(x)**4+-250464.7549807652*s(x)**5+1250129.5521521813*s(x)**6+1267709.5031024509*s(x)**7+-493280.0177807359*s(x)**8+-795684.224334346*s(x)**9+-134370.1696946264*s(x)**10
In a more programmatic way:
def polyval(x):
    return sum([coef*s(x)**icoef for icoef, coef in enumerate(reversed(polycoeffs))])
Check that our polynomial indeed fits:
plt.scatter(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve1[0],[polyval(val) for val in fitted_vals_curve1[0]])
It does:
So let's print out our pure polynomial sum, with s(x) being replaced by an explicit function:
for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*(-1.0512820512820513+0.05128205128205128*x)**'+str(icoef), end='\n +')
Giving the output:
-874.1456709637822*(-1.0512820512820513+0.05128205128205128*x)**0
+2893.7228005540596*(-1.0512820512820513+0.05128205128205128*x)**1
+50415.38472217957*(-1.0512820512820513+0.05128205128205128*x)**2
+-6979.322584205707*(-1.0512820512820513+0.05128205128205128*x)**3
+-453363.49985790614*(-1.0512820512820513+0.05128205128205128*x)**4
+-250464.7549807652*(-1.0512820512820513+0.05128205128205128*x)**5
+1250129.5521521813*(-1.0512820512820513+0.05128205128205128*x)**6
+1267709.5031024509*(-1.0512820512820513+0.05128205128205128*x)**7
+-493280.0177807359*(-1.0512820512820513+0.05128205128205128*x)**8
+-795684.224334346*(-1.0512820512820513+0.05128205128205128*x)**9
+-134370.1696946264*(-1.0512820512820513+0.05128205128205128*x)**10
+
Which can be simplified, as desired. (Ignore the last + sign.)
If you want a higher (or lower) degree polynomial fit, just fit higher (or lower) degrees of Legendre polynomials.
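Finally, a shortcut worth knowing, which should sidestep most of the bookkeeping above: numpy can convert the fitted Legendre series into an ordinary power-series polynomial, mapping through the domain so that the coefficients are in plain x.

# Convert the Legendre series to a power-series polynomial in plain x.
poly_curve1 = legendrefit_curve1.convert(kind=np.polynomial.Polynomial)
print(poly_curve1.coef)   # coefficients of 1, x, x**2, ..., lowest power first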

How to plot svm hyperplane with only one feature

I have a dataset with one feature, and I'm using scikit-learn to train a support vector classifier. I'd like to visualize the results, but I'm a little perplexed about how to plot the scatter. I'm getting my hyperplane by doing the following:
slope = clf.coef_[0][0]
intercept = clf.intercept_[0]
Which gives me y = -.01x + 2.5
I'm assuming this is my hyperplane. I can't seem to figure out how to plot my data around this with only one feature. What would I use for my y-axis?
It's an interesting problem. On the surface it's very simple — one feature means one dimension, hence the hyperplane has to be 0-dimensional, i.e. a point. Yet what scikit-learn gives you is a line. So the question is really how to turn this line into a point.
I've spent about an hour looking for answers in the documentation for scikit-learn, but there is simply nothing on 1-d SVM classifiers (probably because they are not practical). So I decided to play with the sample code below to see if I can figure out the answer:
import numpy as np
from sklearn import svm
n_samples = 100
X = np.concatenate([np.random.normal(0,0.1,n_samples), np.random.normal(10,0.1,n_samples)]).reshape(-1,1)
y = np.array([0]*n_samples+[1]*n_samples)
clf = svm.LinearSVC(max_iter = 10000)
clf.fit(X,y)
slope = clf.coef_
intercept = clf.intercept_
print(slope, intercept)
print(-intercept/slope)
X is the array of samples such that the first 100 points are sampled from N(0,0.1), and the next 100 points are sampled from N(10,0.1). y is the array of labels (100 of class '0' and 100 of class '1'). Intuitively it's clear the hyperplane should be halfway between 0 and 10.
Once you fit the classifier, you find out the intercept is about -0.96 which is nowhere near where the 0-d hyperplane (i.e. a point) should be. However, if you take y=0 and back-calculate x, it will be pretty close to 5. Now try changing the means of the distributions that make up X, and you will find that the answer is always -intercept/slope. That's your 0-d hyperplane (point) for the classifier.
So to visualize, you merely need to plot your data on a number line (use different colours for the classes), and then plot the boundary obtained by dividing the negative intercept by the slope. I'm not sure how to plot a number line, but you can always resort to a scatter plot with all y coordinates set to 0.
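Here is a minimal sketch of that visualization, assuming the X, y, n_samples, slope and intercept variables from the snippet above:

import matplotlib.pyplot as plt

# Data on a number line (all y coordinates set to 0), one colour per class,
# with the 0-d hyperplane -intercept/slope drawn as a vertical line.
boundary = (-intercept / slope).item()
plt.scatter(X[y == 0].ravel(), np.zeros(n_samples), c='b', label='class 0')
plt.scatter(X[y == 1].ravel(), np.zeros(n_samples), c='r', label='class 1')
plt.axvline(x=boundary, c='k', ls='--', label='decision boundary')
plt.yticks([])
plt.legend()
plt.show()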

What does np.polyfit do and return?

I went through the docs, but I'm not able to interpret them correctly.
In my code, I wanted to find a line that goes through 2 points (x1,y1), (x2,y2), so I used
np.polyfit((x1,x2),(y1,y2),1)
since it's a 1-degree polynomial (a straight line).
It returns [ -1.04 727.2 ]
Though my code (which is a much larger file) runs properly and does what it is intended to do, I want to understand what this returns.
I assumed polyfit returns a line (curved, straight, whatever) that satisfies (goes through) the points given to it, so how can a line be represented by the two numbers it returns?
From the numpy.polyfit documentation:
Returns:
p : ndarray, shape (deg + 1,) or (deg + 1, K)
Polynomial coefficients, highest power first. If y was 2-D, the coefficients for k-th data set are in p[:,k].
So these numbers are the coefficients of your polynomial. Thus, in your case:
y = -1.04*x + 727.2
By the way, numpy.polyfit will only return an equation that goes through all the points (say you have N) if the degree of the polynomial is at least N-1. Otherwise, it will return a best fit that minimises the squared error.
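A quick round trip with numpy.polyval makes this concrete. The two points below are made up, chosen only so that the fit reproduces the coefficients from the question:

import numpy as np

# Hypothetical points: a line through (0, 727.2) and (100, 623.2) has
# slope -1.04 and intercept 727.2, matching the output in the question.
x1, y1 = 0.0, 727.2
x2, y2 = 100.0, 623.2
coeffs = np.polyfit((x1, x2), (y1, y2), 1)
print(coeffs)                                          # approx [-1.04, 727.2]
print(np.polyval(coeffs, x1), np.polyval(coeffs, x2))  # recovers y1 and y2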
In regression terms, these are essentially the beta and alpha values for the given data, where beta is the slope (in finance, a measure of volatility) and alpha is the intercept.

Reducing difference between two graphs by optimizing more than one variable in MATLAB/Python?

Suppose h is a function of x, y, z and t, and it gives us a graph line (t,h) (simulated). At the same time we also have an observed graph (observed values of h against t). How can I reduce the difference between the observed (t,h) and simulated (t,h) graphs by optimizing the values of x, y and z? I want to change the simulated graph so that it imitates the observed graph more and more closely in MATLAB/Python. In the literature I have read that people have done the same thing with the Levenberg-Marquardt algorithm, but I don't know how to do it.
You are actually trying to fit the parameters x,y,z of the parametrized function h(x,y,z;t).
MATLAB
You're right that in MATLAB you should either use lsqcurvefit of the Optimization toolbox, or fit of the Curve Fitting Toolbox (I prefer the latter).
Looking at the documentation of lsqcurvefit:
x = lsqcurvefit(fun,x0,xdata,ydata);
It says in the documentation that you have a model F(x,xdata) with coefficients x and sample points xdata, and a set of measured values ydata. The function returns the least-squares parameter set x, with which your function is closest to the measured values.
Fitting algorithms usually need starting points; some implementations can choose them randomly, and in the case of lsqcurvefit this is what x0 is for. If you have
h = @(x,y,z,t) ... %// actual function here
t_meas = ... %// actual measured times here
h_meas = ... %// actual measured data here
then in the conventions of lsqcurvefit,
fun <--> @(params,t) h(params(1),params(2),params(3),t)
x0 <--> starting guess for [x,y,z]: [x0,y0,z0]
xdata <--> t_meas
ydata <--> h_meas
Your function h(x,y,z,t) should be vectorized in t, such that for vector input in t the return value is the same size as t. Then the call to lsqcurvefit will give you the optimal set of parameters:
x = lsqcurvefit(@(params,t) h(params(1),params(2),params(3),t),[x0,y0,z0],t_meas,h_meas);
h_fit = h(x(1),x(2),x(3),t_meas); %// best guess from curve fitting
Python
In Python, you'd have to use the scipy.optimize module, and something like scipy.optimize.curve_fit in particular. With the above conventions you need something along the lines of this:
import scipy.optimize as opt
popt,pcov = opt.curve_fit(lambda t,x,y,z: h(x,y,z,t), t_meas, h_meas, p0=[x0,y0,z0])
Note that the p0 starting array is optional, but all parameters will be set to 1 if it's missing. The result you need is the popt array, containing the optimal values for [x,y,z]:
x,y,z = popt
h_fit = h(x,y,z,t_meas)
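For a self-contained sketch, here is the full round trip with a made-up model h, purely for illustration; note that for unconstrained problems curve_fit defaults to a Levenberg-Marquardt-type solver, which is exactly the algorithm the question mentions.

import numpy as np
import scipy.optimize as opt

# Hypothetical model; substitute your actual h(x, y, z, t).
def h(x, y, z, t):
    return x * np.exp(-y * t) + z

# Synthetic "observed" data generated from known parameters plus noise.
rng = np.random.default_rng(1)
t_meas = np.linspace(0.0, 5.0, 50)
h_meas = h(2.0, 1.3, 0.5, t_meas) + rng.normal(0.0, 0.02, t_meas.size)

popt, pcov = opt.curve_fit(lambda t, x, y, z: h(x, y, z, t),
                           t_meas, h_meas, p0=[1.0, 1.0, 0.0])
x, y, z = popt
h_fit = h(x, y, z, t_meas)   # simulated curve with the fitted parameters
print(popt)                  # close to the true values (2.0, 1.3, 0.5)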
