I'm trying to use Scipy's ODR to fit various different curves to data. These curves have to be given as an ODR Model, which is defined by a function. This function has two arguments: p and x. p is a list of the parameters that will be optimised. Example:
from scipy.odr import ODR, Model, RealData

def f(p, x):
    m, c = p
    return m*x + c

model = Model(f)
data = RealData(xdata, ydata)
odr_setup = ODR(data, model, beta0=[0, 0], partol=0.001)
odr_result = odr_setup.run()
My problem: the length of p depends on the function I am fitting. If the length of beta0 and p are not the same, it gives a ValueError. To get round this, I have constructed some nested try-except statements to get the matching length. Is there a more elegant way of achieving the same thing?
Ideally I would like something like ODR(data, model, beta0=[0]*n_params, partol=0.001). How would I find n_params? Or is there a better way?
Thanks!
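One possible pattern that avoids the nested try-excepts (this is just a suggestion, not anything built into scipy.odr; the registry dict and the fit helper below are hypothetical names): keep the parameter count next to each model function and build beta0 from it.

from scipy.odr import ODR, Model, RealData

def linear(p, x):       # 2 parameters
    m, c = p
    return m*x + c

def quadratic(p, x):    # 3 parameters
    a, b, c = p
    return a*x**2 + b*x + c

# Keep the parameter count next to each model function.
n_params = {linear: 2, quadratic: 3}

def fit(f, xdata, ydata):
    model = Model(f)
    data = RealData(xdata, ydata)
    odr_setup = ODR(data, model, beta0=[0.0] * n_params[f], partol=0.001)
    return odr_setup.run()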
I am new to Python. I want to perform orthogonal distance regression using Scipy ODR with the code below. I do not know how to extract the slope and intercept from the output, or on what basis the values in "myodr = odr.ODR(mydata, linear, beta0=[0., 1.])" are chosen for beta0.
import scipy.odr as odr

def f(B, x):
    return B[0]*x + B[1]

linear = odr.Model(f)
mydata = odr.Data(x, y, wd=1./xerr, we=1./yerr)
myodr = odr.ODR(mydata, linear, beta0=[0., 1.])  # how to put beta0 values here
myoutput = myodr.run()
myoutput.pprint()
The fitted parameters are stored in the beta attribute of the Output instance returned by the run() method.
You used the generic Model, so the meaning of the parameters is determined by how you use them in f. In your case, you can recover the slope and intercept with
slope, intercept = myoutput.beta
ODR fitting involves solving a nonlinear problem with numerical methods that require a starting "guess" of the solution. The better the guess, the more likely the numerical method will converge to a solution. Determining a good initial guess for an arbitrary model can be difficult. Your model is linear, so there are several methods that might work fine. You could, for example, use the result of a standard least squares fit. Or you could use the equation of the line through two data points, say the points associated with the smallest and largest x values.
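For example, a minimal sketch of the first idea (assuming x and y are NumPy arrays; np.polyfit is only used here to produce a rough starting guess):

import numpy as np

# np.polyfit returns [slope, intercept], which matches the parameter order in f.
slope0, intercept0 = np.polyfit(x, y, 1)
myodr = odr.ODR(mydata, linear, beta0=[slope0, intercept0])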
Because your model is linear, you might prefer to use the predefined model unilinear. Then you don't have to define f, and you don't have to provide beta0, because the unilinear model includes its own method for generating an initial guess. The meaning of the parameters in the unilinear model is the same as in your f function, so you can continue to use slope, intercept = myoutput.beta to retrieve the results.
Something like this should work:
mydata = odr.Data(x, y, wd=1./xerr, we=1./yerr)
myodr = odr.ODR(mydata, model=odr.unilinear)
myoutput = myodr.run()
slope, intercept = myoutput.beta
I am new to Python and I am trying to learn how to plot and fit data.
I have an empirical formula describing the function y(x)
and I want to fit it to a power law of the form: y = a * x^b
I am using numpy arrays, but I am not sure numpy.polyfit is useful here, because I do not want to fit high-order polynomials, nor exponentials of the form y = a * e^(b*x).
Can you please suggest a way to do this?
My function is this one, here written as y(E_n):
import numpy as np

E_n = np.linspace(1, 10**6, 10**6)
y = 0.018*(E_n**(-2.7)) * (1/(1 + (2.77*np.cos(45)*E_n/115)) + 0.367/(1 + (1.18*np.cos(45)*E_n/850)))
Thank you
Consider using scipy.optimize.curve_fit. Define a function of the form you desire and pass it to curve_fit. Read the linked documentation well. In many cases, you may need to pass chosen initial values for the parameters. curve_fit takes all of them to be 1 by default, and this might not yield desirable results.
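For example, a minimal sketch with the power-law form from the question (the starting values in p0 are illustrative guesses, not values from the original post):

import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * x**b

E_n = np.linspace(1, 10**6, 10**6)
y = 0.018*(E_n**(-2.7)) * (1/(1 + (2.77*np.cos(45)*E_n/115)) + 0.367/(1 + (1.18*np.cos(45)*E_n/850)))

# Starting near the expected values helps curve_fit converge.
popt, pcov = curve_fit(power_law, E_n, y, p0=[0.02, -2.7])
a_fit, b_fit = popt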
I am new to this statistical thing.
So I have a data set y = f(x) for some values of x. I want to fit this data to a function so that for every point in y I can calculate the value of x.
Suppose the model I want to fit is something like
def func(x, a, b, c):
    return a + b*x/c
Now to use the minimize function, I have to define parameters:
params = Parameters()
params.add('a', value=10)
params.add('b', value=1)
params.add('c', value=2)
result = minimize(func, params, args=(x, y))
My question is, what if I want to make my x variable a parameter and pass it in as one?
Basically, when I pass x as a variable I am passing an array which corresponds to specific points in my data set. However, I want to use x as a parameter because I want to find the value of x for certain points of the data y.
Parameters and fitting variables are scalar floating-point numbers. That is to say, they have one value that can take a continuous range of values.
Do you mean that you want every element of x to be independently varied in the fit?
It is common to use minimization methods to fit data: find the set of values for the variables (say, your a, b, and c) so that y - func(x, a, b, c) is as small as possible.
Your incomplete code snippet (to be clear, it is always better to include a complete example) doesn't do that -- it doesn't pass y into func.
More importantly, you seem to be looking to "find the value of x for certain points of data y". That doesn't quite make sense to me.... Maybe clean up and clarify the question?
It's not a one-step process.
I achieved it by using the Model class in lmfit.
Consider this.
I have a model function, say MF, with which I calculate the same kind of samples as the raw data.
I have raw data: Raw_data (say, for example, coming from sensors).
Then I have certain parameters (x, y, z, samples).
Now I consider that samples is an independent variable.
My goal is to estimate one of the parameters.
First I have to create a Model instance:
mod = Model(MF, independent_vars=['samples'])
Then you set the initial parameters using the Parameters() class
and add initial values:
fit_params = Parameters()
fit_params.add('x', value=170)
fit_params.add('y', value=120)
fit_params.add('z', value=110)
Then you fit the model:
result = mod.fit(Raw_data, params=fit_params, samples=your_samples)
The result is a ModelResult object; the best-fit parameter values are available as a dict via result.best_values.
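Putting the pieces together, a minimal self-contained sketch (the model function MF and its parameters x, y, z are hypothetical stand-ins for the names used above, and the data is generated rather than coming from sensors):

import numpy as np
from lmfit import Model, Parameters

def MF(samples, x, y, z):
    # Hypothetical model evaluated at the independent variable `samples`.
    return x * np.exp(-samples / y) + z

samples = np.linspace(0, 500, 200)
Raw_data = MF(samples, 170, 120, 110) + np.random.normal(0, 1, samples.size)

mod = Model(MF, independent_vars=['samples'])

fit_params = Parameters()
fit_params.add('x', value=150)
fit_params.add('y', value=100)
fit_params.add('z', value=100)

result = mod.fit(Raw_data, params=fit_params, samples=samples)
print(result.best_values)   # dict of best-fit values for x, y, z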
I'm working on two functions. I have two data sets, eg [[x(1), y(1)], ..., [x(n), y(n)]], dataSet and testData.
createMatrix(D, S) which returns a data matrix, where D is the degree and S is a vector of real numbers [s(1), s(2), ..., s(n)].
I know numpy has a function called polyfit. But polyfit takes in three variables, any advice on how I'd create the matrix?
polyFit(D), which takes in the polynomial of degree D and fits it to the data sets using linear least squares. I'm trying to return the weight vector and errors. I also know that there is lstsq in numpy.linalg that I found in this question: Fitting polynomials to data
Is it possible to use that question to recreate what I'm trying?
This is what I have so far, but it isn't working.
def createMatrix(D, S):
    x = []
    y = []
    for i in dataSet:
        x.append(i[0])
        y.append(i[1])
    polyfit(x, y, D)
What I don't get here is what does S, the vector of real numbers, have to do with this?
def polyFit(D)
I'm basing a lot of this on the question posted above. I'm unsure about how to get just w though, the weight vector. I'll be coding the errors, so that's fine I was just wondering if you have any advice on getting the weight vectors themselves.
It looks like all createMatrix is doing is creating the two vectors required by polyfit. What you have will work, but the more pythonic way to do it is
def createMatrix(dataSet, D):
    # D is the degree of the polynomial you're fitting
    x, y = zip(*dataSet)
    return polyfit(x, y, D)
(This S/O link provides a detailed explanation of the zip(*dataSet) idiom.)
This will return a vector of coefficients that you can then pass to something like poly1d to generate results. (Further explanation of both polyfit and poly1d can be found here.)
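For example, a minimal sketch (numpy is assumed to be imported, and the dataSet values are made up for illustration):

import numpy as np

dataSet = [[0, 1.0], [1, 2.9], [2, 5.1], [3, 7.2]]   # illustrative data
x, y = zip(*dataSet)

coeffs = np.polyfit(x, y, 1)    # weight vector, highest-degree coefficient first
model = np.poly1d(coeffs)       # callable polynomial built from those coefficients
errors = [yi - model(xi) for xi, yi in dataSet]   # per-point residuals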
Obviously, you'll need to decide what value you want for D. The simple answer to that is 1, 2, or 3. Polynomials of higher order than cubic tend to be rather unstable and the intrinsic errors make their output rather meaningless.
It sounds like you might be trying to do some sort of correlation analysis (i.e., does y vary with x and, if so, to what extent?) You'll almost certainly want to just use linear (D = 1) regression for this type of analysis. You can try to do a least squares quadratic fit (D = 2) but, again, the error bounds are probably wider than your assumptions (e.g. normality of distribution) will tolerate.
First question:
I'm trying to fit experimental data with a function of the following form:
f(x) = m_0*(1 - exp(-t_0*x)) + ... + m_j*(1 - exp(-t_j*x))
Currently, I don't find a way to have an undetermined number of parameters m_j, t_j, I'm forced to do something like this:
def fitting_function(x, m_1, t_1, m_2, t_2):
    return m_1*(1. - numpy.exp(-t_1*x)) + m_2*(1. - numpy.exp(-t_2*x))

parameters, covariance = curve_fit(fitting_function, xExp, yExp, maxfev=100000)
(xExp and yExp are my experimental points)
Is there a way to write my fitting function like this:
def fitting_function(x, li):
    res = 0.
    for idx in range(len(li) // 2):
        res += li[2*idx]*(1 - numpy.exp(-li[2*idx + 1]*x))
    return res
where li is the list of fitting parameters and then do a curve_fitting? I don't know how to tell to curve_fitting what is the number of fitting parameters.
When I try this kind of form for fitting_function, I have errors like "ValueError: Unable to determine number of fit parameters."
Second question:
Is there any way to force my fitting parameters to be positive?
Any help appreciated :)
See my question and answer here. I've also made a minimal working example demonstrating how it could be done for your application. I make no claims that this is the best way - I am muddling through all this myself, so any critiques or simplifications are appreciated.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as pl

def wrapper(x, *args):  # take a flat list of arguments and break it into two lists for the fit function to understand
    N = len(args) // 2
    amplitudes = list(args[0:N])
    timeconstants = list(args[N:2*N])
    return fit_func(x, amplitudes, timeconstants)

def fit_func(x, amplitudes, timeconstants):  # the actual fit function
    fit = np.zeros(len(x))
    for m, t in zip(amplitudes, timeconstants):
        fit += m*(1.0 - np.exp(-t*x))
    return fit

def gen_data(x, amplitudes, timeconstants, noise=0.1):  # generate some fake data
    y = np.zeros(len(x))
    for m, t in zip(amplitudes, timeconstants):
        y += m*(1.0 - np.exp(-t*x))
    if noise:
        y += np.random.normal(0, noise, size=len(x))
    return y

def main():
    x = np.arange(0, 100)
    amplitudes = [1, 2, 3]
    timeconstants = [0.5, 0.2, 0.1]
    y = gen_data(x, amplitudes, timeconstants, noise=0.01)
    p0 = [1, 2, 3, 0.5, 0.2, 0.1]
    popt, pcov = curve_fit(lambda x, *p0: wrapper(x, *p0), x, y, p0=p0)  # call with a lambda function
    yfit = gen_data(x, popt[0:3], popt[3:6], noise=0)
    pl.plot(x, y, x, yfit)
    pl.show()
    print(popt)
    print(pcov)

if __name__ == "__main__":
    main()
A word of warning, though. A linear sum of exponentials is going to make the fit EXTREMELY sensitive to any noise, particularly for a large number of parameters. You can test that by adding even a small amount of noise to the data generated in the script: even small deviations cause it to get the wrong answer entirely, while the fit still looks perfectly valid by eye (test with noise=0, 0.01, and 0.1). Be very careful interpreting your results even if the fit looks good. It's also a form that allows for variable swapping: the best-fit solution is the same even if you swap any pair (m_i, t_i) with (m_j, t_j), meaning your chi-square has multiple identical local minima and your variables might get swapped around during fitting, depending on your initial conditions. This is unlikely to be a numerically robust way to extract these parameters.
To your second question, yes, you can, by defining your exponentials like so:
m_0**2*(1.0 - np.exp(-t_0**2*x)) + ...
Basically, square them all in your actual fit function, fit them, and then square the results (which could be negative or positive) to get your actual parameters. You can also define variables to be between a certain range by using different proxy forms.
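A minimal sketch of that squaring trick for a single term (my own illustration of the idea described above; the data is made up):

import numpy as np
from scipy.optimize import curve_fit

def model_squared(x, m0, t0):
    # The optimizer sees unconstrained m0 and t0, but only their squares enter
    # the model, so the effective amplitude and rate are always non-negative.
    return m0**2 * (1.0 - np.exp(-t0**2 * x))

x = np.linspace(0, 20, 200)
y = 2.5 * (1.0 - np.exp(-0.3 * x)) + np.random.normal(0, 0.02, x.size)

popt, pcov = curve_fit(model_squared, x, y, p0=[1.0, 1.0])
amplitude = popt[0]**2   # actual (non-negative) parameters
rate = popt[1]**2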