I have a very specific task, where I need to find the slope of my exponential function.
I have two arrays, one denoting the wavelength range between 400 and 750 nm, the other the absorption spectrum. x = wavelengths, y = absorption.
My fit function should look something like that:
y_mod = np.float(a_440) * np.exp(-S*(x - 440.))
where S is the slope and in the image equals 0.016, which should be in the range of S values I should get (+/- 0.003). a_440 is the reference absorption at 440 nm, x is the wavelength.
Modelled vs. original plot:
I would like to know how to define my function in order to get an exponential fit (not on log transformed quantities) of it without guessing beforehand what the S value is.
What I've tried so far was to define the function in such way:
def func(x, a, b):
return a * np.exp(-b * (x-440))
And it gives pretty nice matches
fitted vs original.
What I'm not sure is whether this approach is correct or should I do it differently?
How would one use also the least squares or the absolute differences in y approaches for minimization in order to remove the effect of overliers?
Is it possible to also add random noise to the data and recompute the fit?
Your situation is the same as the one described in the documentation for scipy's curve_fit.
The problem you're incurring is that your definition of the function accepts only one argument when it should receive three: x (the independent variable where the function is evaluated), plus a_440 and S.
Cleaning a bit, the function should be more like this.
def func(x, A, S):
return A*np.exp(-S*(x-440.))
It might be that you run into a warning about the covariance matrix. you solve that by providing a decent starting point to the curve_fit through the argument p0 and providing a list. For example in this case p0=[1,0.01] and in the fitting call it would look like the following
curve_fit(func, x, y, p0=[1,0.01])
Related
I'm working on two functions. I have two data sets, eg [[x(1), y(1)], ..., [x(n), y(n)]], dataSet and testData.
createMatrix(D, S) which returns a data matrix, where D is the degree and S is a vector of real numbers [s(1), s(2), ..., s(n)].
I know numpy has a function called polyfit. But polyfit takes in three variables, any advice on how I'd create the matrix?
polyFit(D), which takes in the polynomial of degree D and fits it to the data sets using linear least squares. I'm trying to return the weight vector and errors. I also know that there is lstsq in numpy.linag that I found in this question: Fitting polynomials to data
Is it possible to use that question to recreate what I'm trying?
This is what I have so far, but it isn't working.
def createMatrix(D, S):
x = []
y = []
for i in dataSet:
x.append(i[0])
y.append(i[1])
polyfit(x, y, D)
What I don't get here is what does S, the vector of real numbers, have to do with this?
def polyFit(D)
I'm basing a lot of this on the question posted above. I'm unsure about how to get just w though, the weight vector. I'll be coding the errors, so that's fine I was just wondering if you have any advice on getting the weight vectors themselves.
It looks like all createMatrix is doing is creating the two vectors required by polyfit. What you have will work, but, the more pythonic way to do it is
def createMatrix(dataSet, D):
D = 3 # set this to whatever degree you're trying
x, y = zip(*dataSet)
return polyfit(x, y, D)
(This S/O link provides a detailed explanation of the zip(*dataSet) idiom.)
This will return a vector of coefficients that you can then pass to something like poly1d to generate results. (Further explanation of both polyfit and poly1d can be found here.)
Obviously, you'll need to decide what value you want for D. The simple answer to that is 1, 2, or 3. Polynomials of higher order than cubic tend to be rather unstable and the intrinsic errors make their output rather meaningless.
It sounds like you might be trying to do some sort of correlation analysis (i.e., does y vary with x and, if so, to what extent?) You'll almost certainly want to just use linear (D = 1) regression for this type of analysis. You can try to do a least squares quadratic fit (D = 2) but, again, the error bounds are probably wider than your assumptions (e.g. normality of distribution) will tolerate.
Suppose 'h' is a function of x,y,z and t and it gives us a graph line (t,h) (simulated). At the same time we also have observed graph (observed values of h against t). How can I reduce the difference between observed (t,h) and simulated (t,h) graph by optimizing values of x,y and z? I want to change the simulated graph so that it imitates closer and closer to the observed graph in MATLAB/Python. In literature I have read that people have done same thing by Lavenberg-marquardt algorithm but don't know how to do it?
You are actually trying to fit the parameters x,y,z of the parametrized function h(x,y,z;t).
MATLAB
You're right that in MATLAB you should either use lsqcurvefit of the Optimization toolbox, or fit of the Curve Fitting Toolbox (I prefer the latter).
Looking at the documentation of lsqcurvefit:
x = lsqcurvefit(fun,x0,xdata,ydata);
It says in the documentation that you have a model F(x,xdata) with coefficients x and sample points xdata, and a set of measured values ydata. The function returns the least-squares parameter set x, with which your function is closest to the measured values.
Fitting algorithms usually need starting points, some implementations can choose randomly, in case of lsqcurvefit this is what x0 is for. If you have
h = #(x,y,z,t) ... %// actual function here
t_meas = ... %// actual measured times here
h_meas = ... %// actual measured data here
then in the conventions of lsqcurvefit,
fun <--> #(params,t) h(params(1),params(2),params(3),t)
x0 <--> starting guess for [x,y,z]: [x0,y0,z0]
xdata <--> t_meas
ydata <--> h_meas
Your function h(x,y,z,t) should be vectorized in t, such that for vector input in t the return value is the same size as t. Then the call to lsqcurvefit will give you the optimal set of parameters:
x = lsqcurvefit(#(params,t) h(params(1),params(2),params(3),t),[x0,y0,z0],t_meas,h_meas);
h_fit = h(x(1),x(2),x(3),t_meas); %// best guess from curve fitting
Python
In python, you'd have to use the scipy.optimize module, and something like scipy.optimize.curve_fit in particular. With the above conventions you need something along the lines of this:
import scipy.optimize as opt
popt,pcov = opt.curve_fit(lambda t,x,y,z: h(x,y,z,t), t_meas, y_meas, p0=[x0,y0,z0])
Note that the p0 starting array is optional, but all parameters will be set to 1 if it's missing. The result you need is the popt array, containing the optimal values for [x,y,z]:
x,y,z = popt
h_fit = h(x,y,z,t_meas)
I want to fit an array of data (in the program called "data", of size "n") with a Gaussian function and I want to get the estimations for the parameters of the curve, namely the mean and the sigma. Is the following code, which I found on the Web, a fast way to do that? If so, how can I actually get the estimated values of the parameters?
import pylab as plb
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
x = ar(range(n))
y = data
n = len(x) #the number of data
mean = sum(x*y)/n #note this correction
sigma = sum(y*(x-mean)**2)/n #note this correction
def gaus(x,a,x0,sigma,c):
return a*exp(-(x-x0)**2/(sigma**2))+c
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma,0.0])
print popt
print pcov
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit')
plt.xlabel('q')
plt.ylabel('data')
plt.show()
To answer your first question, "Is the following code, which I found on the Web, a fast way to do that?"
The code that you have is in fact the right way to proceed with fitting your data, when you believe is Gaussian and know the fitting function (except change the return function to
a*exp(-(x-x0)**2/(sigma**2)).
I believe for a Gaussian function you don't need the constant c parameter.
A common use of least-squares minimization is curve fitting, where one has a parametrized model function meant to explain some phenomena and wants to adjust the numerical values for the model to most closely match some data. With scipy, such problems are commonly solved with scipy.optimize.curve_fit.
To answer your second question, "If so, how can I actually get the estimated values of the parameters?"
You can go to the link provided for scipy.optimize.curve_fit and find that the best fit parameters reside in your popt variable. In your example, popt will contain the mean and sigma of your data. In addition to the best fit parameters, pcov will contain the covariance matrix, which will have the errors of your mean and sigma. To obtain 1sigma standard deviations, you can simply use np.sqrt(pcov) and obtain the same.
I am trying to fit a polynomial to my data, e.g.
import scipy as sp
x = [1,6,9,17,23,28]
y = [6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2]
and say I know the degree of polynomial (e.g.: 3), then I just use scipy.polyfit method to get the polynomial of a given degree:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fittedModelFunction = sp.polyfit(x, y, 3)
func = sp.poly1d(fittedModelFunction)
++++++++++++++++++++++++++++++
QUESTIONS: ++++++++++++++++++++++++++++++
1) How can I tell in addition that the resulting function func must be always positive (i.e. f(x) >= 0 for any x)?
2) How can I further define a constraint (e.g. number of (local) min and max points, etc.) in order to get a better fitting?
Is there smth like this:
http://mail.scipy.org/pipermail/scipy-user/2007-July/013138.html
but more accurate?
Always Positve
I haven't been able to find a scipy reference that determines if a function is positive-definite, but an indirect way would be to find the all the roots - Scipy Roots - of the function and inspect the limits near those roots. There are a few cases to consider:
No roots at all
Pick any x and evaluate the function. Since the function does not cross the x-axis because of a lack of roots, any positive result will indicate the function is positive!
Finite number of roots
This is probably the most likely case. You would have to inspect the limits before and after each root - Scipy Limits. You would have to specify your own minimum acceptable delta for the limit however. I haven't seen a 2-sided limit method provided by Scipy, but it looks simple enough to make your own.
from sympy import limit
// f: function, v: variable to limit, p: point, d: delta
// returns two limit values
def twoSidedLimit(f, v, p, d):
return limit(f, v, p-d), limit(f, v, p+d)
Infinite roots
I don't think that polyfit would generate an oscillating function, but this is something to consider. I don't know how to handle this with the method I have already offered... Um, hope it does not happen?
Constraints
The only built-in form of constraints seems to be limited to the optimize library of SciPy. A crude way to enforce constraints for polyfit would be to get the function from polyfit, generate a vector of values for various x, and try to select values from the vector that violate the constraint. If you try to use filter, map, or lambda it may be slow with large vectors since python's filter makes a copy of the list/vector being filtered. I can't really help in this regard.
I have two lists .
import numpy
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])
They make a U shaped curve.
Now I want to fit a gaussian to that curve.
from scipy.optimize import curve_fit
n = len(x)
mean = sum(y)/n
sigma = sum(y - mean)**2/n
def gaus(x,a,x0,sigma,c):
return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c
popt, pcov = curve_fit(gaus,x,y,p0=[-1,mean,sigma,-5])
pylab.plot(x,y,'r-')
pylab.plot(x,gaus(x,*popt),'k-')
pylab.show()
I just end up with the noisy original U-shaped curve and a straight horizontal line running through the curve.
I am not sure what the -1 and the -5 represent in the above code but I am sure that I need to adjust them or something else to get the gaussian curve. I have been playing around with possible values but to no avail.
Any ideas?
First of all, your variable sigma is actually variance, i.e. sigma squared --- http://en.wikipedia.org/wiki/Variance#Definition.
This confuses the curve_fit by giving it a suboptimal starting estimate.
Then, your fitting ansatz, gaus, includes an amplitude a and an offset, is this what you actually need? And the starting values are a=-1 (negated bell shape) and offset c=-5. Where do they come from?
Here's what I'd do:
fix your fitting model. Do you want just a gaussian, does it need to be normalized. If it does, then the amplitude a is fixed by sigma etc.
Have a look at the actual data. What's the tail (offset), what's the sign (amplitude sign).
If you're actually want just a gaussian without any bells and whistles, you might not actually need curve_fit: a gaussian is fully defined by two first moments, mean and sigma. Calculate them as you do, plot them over the data and see if you're not all set.
p0 in your call to curve_fit gives the initial guesses for the additional parameters of you function in addition to x. In the above code you are saying that I want the curve_fit function to use -1 as the initial guess for a, -5 as the initial guess for c, mean as the initial guess for x0, and sigma as the guess for sigma. The curve_fit function will then adjust these parameters to try and get a better fit. The problem is your initial guesses at your function parameters are really bad given the order of (x,y)s.
Think a little bit about the order of magnitude of your different parameters for the Gaussian. a should be around the size of your y values (10**-16) as at the peak of the Gaussian the exponential part will never be larger than 1. x0 will give the position within your x values at which the exponential part of your Gaussian will be 1, so x0 should be around 7500, probably somewhere in the centre of your data. Sigma indicates the width, or spread of your Gaussian, so perhaps something in the 100's just a guess. Finally c is just an offset to shift the whole Gaussian up and down.
What I would recommend doing, is before fitting the curve, pick some values for a, x0, sigma, and c that seem reasonable and just plot the data with the Gaussian, and play with a, x0, sigma, and c until you get something that looks at least some what the way you want the Gaussian to fit, then use those as the starting points for curve_fit p0 values. The values I gave should get you started, but may not do exactly what you want. For instance a probably needs to be negative if you want to flip the Gaussian to get a "U" shape.
Also printing out the values that curve_fit thinks are good for your a,x0,sigma, and c might help you see what it is doing and if that function is on the right track to minimizing the residual of the fit.
I have had similar problems doing curve fitting with gnuplot, if the initial values are too far from what you want to fit it goes in completely the wrong direction with the parameters to minimize the residuals, and you could probably do better by eye. Think of these functions as a way to fine tune your by eye estimates of these parameters.
hope that helps
I don't think you are estimating your initial guesses for mean and sigma correctly.
Take a look at the SciPy Cookbook here
I think it should look like this.
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])
n = len(x)
mean = sum(x*y)/sum(y)
sigma = sqrt(abs(sum((x-mean)**2*y)/sum(y)))
def gaus(x,a,x0,sigma,c):
return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c
popy, pcov = curve_fit(gaus,x,y,p0=[-max(y),mean,sigma,min(x)+((max(x)-min(x)))/2])
pylab.plot(x,gaus(x,*popt))
If anyone has a link to a simple explanation why these are the correct moments I would appreciate it. I am going on faith that SciPy Cookbook got it right.
Here is the solution thanks to everyone .
x = numpy.array([7250, ... list of 600 ints ... ,7849])
y = numpy.array([2.4*10**-16, ... list of 600 floats ... , 4.3*10**-16])
n = len(x)
mean = sum(x)/n
sigma = math.sqrt(sum((x-mean)**2)/n)
def gaus(x,a,x0,sigma,c):
return a*numpy.exp(-(x-x0)**2/(2*sigma**2))+c
popy, pcov = curve_fit(gaus,x,y,p0=[-max(y),mean,sigma,min(x)+((max(x)-min(x)))/2])
pylab.plot(x,gaus(x,*popt))
Maybe it is because I use matlab and fminsearch or my fits have to work on much fewer datapoints (~ 5-10), I have much better results with the following starter values (as simple as they are):
a = max(y)-min(y);
imax= find(y==max(y),1);
mean = x(imax);
avg = sum(x.*y)./sum(y);
sigma = sqrt(abs(sum((x-avg).^2.*y) ./ sum(y)));
c = min(y);
The sigma works fine.