Non-linear least squares fitting with the variable as the integration limit - Python

I'm trying to do some non-linear fits with Python which involve an integral, where the limits of the integral depend on the independent variable. The code is the following:
import numpy as np
import scipy as sc
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.integrate import quad

T, M = np.genfromtxt("zfc.txt", unpack=True, skiprows=0)  # here I load the data to fit
plt.plot(T, M, 'o')

def arg_int1(x, sigma, Ebm):
    return (1/(np.sqrt(2*np.pi)*sigma*Ebm))*np.exp(-(np.log(x/float(Ebm))**2)/(2*sigma**2))

def arg_int2(x, sigma, Ebm):
    return (1/(np.sqrt(2*np.pi)*sigma*x))*np.exp(-(np.log(x/float(Ebm))**2)/(2*sigma**2))

def zfc(x, k1, k2, k3):
    Temp = x*k2*27/float(k2/1.36e-16)
    #Temp = k2*27/float(k2/1.36e-16)  # apparently x can't be fitted with curve_fit if it appears in the integral limits as well
    A = sc.integrate.quad(arg_int1, 0, Temp, args=(k3, k2))[0]
    B = sc.integrate.quad(arg_int2, Temp, 10*k2, args=(k3, k2))[0]
    M = k1*(k2/1.36e-16*A/x + B)
    return M

T_fit = np.linspace(1, 301, 301)
popt, pcov = curve_fit(zfc, T, M, p0=(0.5, 2.802e-13, 0.46))
M_fit = np.zeros(301)
for i in range(301):
    M_fit[i] = zfc(T_fit[i], popt[0], popt[1], popt[2])
plt.plot(T_fit, M_fit, 'g')
The error that I get is:
File "C:\Users\usuario\Anaconda\lib\site-packages\scipy\integrate\quadpack.py", line 329, in _quad
if (b != Inf and a != -Inf):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
which I don't understand, since the function is well defined. I know that the solution to my problem is the fed parameters (I have made the fit with Mathematica). I have tried to look for fits of the Bloch-Gruneisen function (where the independent variable defines the integral limits as well), but I have not found any clue.

The problem is that scipy.optimize.curve_fit expects zfc to work on array arguments, i.e. given an n-array of x-values (and scalar parameters k1, k2, k3), zfc(x, k1, k2, k3) should return an n-array containing the corresponding function values. This can, however, easily be achieved by wrapping the function with np.vectorize:
zfc_wrapper = np.vectorize(zfc)
popt, pcov = curve_fit(zfc_wrapper,T,M,p0=(0.5,2.802e-13,0.46))
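With the wrapper in place, the fitted curve can also be evaluated directly on an array of temperatures, which avoids the manual loop from the question. A small sketch, reusing the names defined above:
# Evaluate the fitted model on the whole grid via the vectorized wrapper,
# so that quad always receives scalar integration limits internally.
T_fit = np.linspace(1, 301, 301)
M_fit = zfc_wrapper(T_fit, *popt)
plt.plot(T_fit, M_fit, 'g')
plt.show()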
Next time it would be nice if you could provide some sample input data. I managed to run it with test data from some arbitrary function, but this may not always be the case.
Cheers.

Related

Python scipy curve_fit Exponential equation not fitting as expected

I have data I am trying to fit an exponential to. The data is not ideal; however, when I use JMP's built-in curve fit function it works as expected and I get a good approximation of my data (see the figure below: JMP Fit Curve Exponential 3P).
I am now trying to replicate this using the Python library scipy.optimize with the curve_fit function as described here. However, this is producing a very different curve; please see below.
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

df = pd.read_csv('test.csv', sep=',', index_col=None, engine='python')

def exponential_3p(x, a, b, c):
    return a + b * np.exp(c * x)

popt, pcov = curve_fit(exponential_3p, df.x, df.y)
a, b, c = popt
plt.plot(df.x, df.y)
plt.plot(df.x, exponential_3p(df.x, a, b, c))
(figure: scipy optimize.curve_fit exponential fit)
You are yet another victim of the incomprehensible stupidity of scipy.optimize.curve_fit.
Curve fitting and local optimization problems REQUIRE initial values for all variable parameters. They are not optional. There is no "default value" that makes sense. scipy.optimize.curve_fit lies to you about this: it allows you to not provide initial values and silently (not even a warning!) assumes that you meant all initial values to be 1. This is wrong, wrong, wrong.
You must give sensible starting values for the fit.
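For instance, a minimal sketch (assuming the exponential_3p model and the df from the question) that derives rough, order-of-magnitude starting values from the data itself:
# Rough, data-driven starting values for a + b*exp(c*x):
# a ~ the asymptotic offset, b ~ the remaining amplitude,
# c ~ a rate on the scale of the x-range (sign guessed here as a decay).
a0 = df.y.min()
b0 = df.y.max() - df.y.min()
c0 = -1.0 / (df.x.max() - df.x.min())
popt, pcov = curve_fit(exponential_3p, df.x, df.y, p0=(a0, b0, c0))
Even crude guesses like these are usually enough to move the optimizer into the right basin.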

Problem while fitting a set of data points to arbitrary curve with Scipy

I have a set of data points which, according to the model I want to implement, could be modelled with a certain curve (in this case, a product between an exponential and a complementary error function).
For fitting these data into such a curve, I tried:
import numpy as np
from scipy.optimize import curve_fit
from scipy import special

x_fit = np.linspace(0, 1, 1000)

def fitted_function(x_fit, c, d, S):
    return c*np.exp(((S*d/2)**2) - x_fit*d)*special.erfc(S*d/2 - x_fit/S)

# x_data, y_data hold the measured points
FitParameters, FitCovariance = curve_fit(fitted_function, x_data, y_data, maxfev=100000)
It does not give me any particular error, but the result of the fitting is evidently wrong. I strongly suspect that it has to do with the part x_fit/S, where the fitting parameter S appears as a denominator.
For example, I encounter the same problem while fitting a simple exponential: if I define the fitting curve with
return a*np.exp(-x_fit/b)
with a, b as fitting parameters, then since the fitting parameter b appears as a denominator, I find the same problem (i.e. the resulting fitted curve is a horizontal line for some reason).
For the case of a simple exponential I can simply bypass this by writing
return a*np.exp(-b*x_fit)
so that b is not a denominator anymore, and the fitted curve really is an exponential curve. For my current case, instead, I cannot do this, since S appears as a numerator and a denominator in different parts of the expression.
Any ideas? Thank you in advance!
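As the answer to the exponential question above suggests, a likely culprit is curve_fit's silent all-ones default for the starting values, which can leave the model in a flat region. A small, hedged sketch; the guesses below are hypothetical placeholders, not values derived from the question's data:
# Supply order-of-magnitude starting values for (c, d, S) instead of
# relying on the silent default p0 = (1, 1, 1). These numbers are
# hypothetical and would need to be adapted to the actual data.
p0_guess = (np.max(y_data), 1.0, 0.5)
FitParameters, FitCovariance = curve_fit(fitted_function, x_data, y_data,
                                         p0=p0_guess, maxfev=100000)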

Scipy.optimize.curve_fit won't fit cosine power law

For several hours now, I have been trying to fit a model to a (generated) dataset as a test case for a problem I've been struggling with. I generated data points for the function f(x) = A*cos^n(x) + b and added some noise. When I try to fit the dataset with this function and curve_fit, I get the error
./tester.py:10: RuntimeWarning: invalid value encountered in power
  return Amp*(np.cos(x))**n + b
/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
The code I'm using to generate the datapoints and fit the model is the following:
#!/usr/bin/env python
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure, show, rc, plot

def f(x, Amp, n, b):
    return np.real(Amp*(np.cos(x))**n + b)

x = np.arange(0, 6.28, 0.01)
randomPart = np.random.rand(len(x)) - 0.5
fig = figure()
sample = f(x, 5, 2, 5) + randomPart
frame = fig.add_subplot(1,1,1)
frame.plot(x, sample, label="Sample measurements")

popt, pcov = curve_fit(f, x, sample, p0=(1,1,1))
modeldata = f(x, popt[0], popt[1], popt[2])
print(modeldata)
frame.plot(x, modeldata, label="Best fit")
frame.legend()
frame.set_xlabel("x")
frame.set_ylabel("y")
show()
The noisy data is shown - see the image below.
Does any of you have a clue what's going on? I suspect it has something to do with the power law going into the complex domain, as the real part of the function is nowhere divergent. I have already tried returning only the real part of the function, setting realistic bounds in curve_fit, and using a numpy array instead of a Python list for p0. I'm running the latest version of scipy available to me, scipy 0.17.0-1.
The problem is the following:
>>> (-2)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.array(-2)**1.1
__main__:1: RuntimeWarning: invalid value encountered in power
nan
Unlike native python floats, numpy doubles usually refuse to take part in operations leading to complex results:
>>> np.sqrt(-1)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
As a quick workaround I suggest adding an np.abs call to your function, and using appropriate bounds for the fit to make sure this doesn't give a spurious result. If your model is near the truth and your sample (I mean the cosine in your sample) is positive, then adding an absolute value around it should be a no-op (update: I realize this is never the case; see the proper approach below).
def f(x, Amp, n, b):
    return Amp*(np.abs(np.cos(x)))**n + b  # only change here
With this small change I get this:
For reference, the parameters from the fit are (4.96482314, 2.03690954, 5.03709923), compared to the generating values (5, 2, 5).
After giving it a bit more thought I realized that the cosine will always be negative for half your domain (duh). So the workaround I suggested might be a bit problematic, or at least its correctness is non-trivial. On the other hand, thinking of your original formula containing cos(x)^n, with negative values for cos(x) this only makes sense as a model if n is an integer, otherwise you get a complex result. Since we can't solve Diophantine fitting problems, we need to handle this properly.
The most proper way (by which I mean the way that is least likely to bias your data) is this: first do the fitting with a model that converts your data to complex numbers, then takes the complex magnitude of the output:
def f(x, Amp, n, b):
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
This is obviously much less efficient than my workaround, since in each fitting step we create a new mesh, and do some extra work both in the form of complex arithmetic and an extra magnitude calculation. This gives me the following fit even with no bounds set:
The parameters are (5.02849409, 1.97655728, 4.96529108). These are close too. However, if we put these values back into the actual model (without np.abs), we get imaginary parts as large as -0.37, which is not overwhelming but significant.
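That check is easy to reproduce. A small sketch, assuming popt holds the three parameters returned by curve_fit for the complex-magnitude model above:
# Evaluate the raw complex model with the fitted non-integer exponent
# and inspect how large the imaginary parts get.
vals = popt[0] * np.cos(x.astype(np.complex128))**popt[1] + popt[2]
print(vals.imag.min(), vals.imag.max())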
So the second step should be redoing the fit with a proper model---one that has an integer exponent. Take the exponent 2 which is obvious from your fit, and do a new fit with this model. I don't believe any other approach gives you a mathematically sound result. You can also start from the original popt, hoping that it's indeed close to the truth. Of course we could use the original function with some currying, but it's much faster to use a dedicated double-specific version of your model.
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import subplots, show

def f_aux(x, Amp, n, b):
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b

def f_real(x, Amp, n, b):
    return Amp*np.cos(x)**n + b

x = np.arange(0, 2*np.pi, 0.01)  # one full period
randomPart = np.random.rand(len(x)) - 0.5
sample = f_real(x, 5, 2, 5) + randomPart

fig, (frame_aux, frame) = subplots(ncols=2)
for fr in frame_aux, frame:
    fr.plot(x, sample, label="Sample measurements")
    fr.legend()
    fr.set_xlabel("x")
    fr.set_ylabel("y")

# auxiliary fit for the n value
popt_aux, pcov_aux = curve_fit(f_aux, x, sample, p0=(1,1,1))
modeldata = f_aux(x, *popt_aux)
print('Auxiliary fit parameters: {}'.format(popt_aux))
frame_aux.plot(x, modeldata, label="Auxiliary fit")

# check visually, test if it's close to an integer, but otherwise:
n = np.round(popt_aux[1])

# actual fit with an integral exponent
popt, pcov = curve_fit(lambda x, Amp, b, n=n: f_real(x, Amp, n, b),
                       x, sample, p0=(popt_aux[0], popt_aux[2]))
modeldata = f_real(x, popt[0], n, popt[1])
print('Final fit parameters: {}'.format([popt[0], n, popt[1]]))
frame.plot(x, modeldata, label="Best fit")
frame_aux.legend()
frame.legend()
show()
Note that I changed a few things in your code which don't really affect my point. The figure from the above code, the one that shows both the auxiliary fit and the proper one:
The output:
Auxiliary fit parameters: [ 5.02628994 2.00886409 5.00652371]
Final fit parameters: [5.0288141074549699, 2.0, 5.0009730316739462]
Just to reiterate: while there might be no visual difference between the auxiliary fit and the proper one, only the latter gives a meaningful answer to your problem.

Gaussian fit in Python - parameters estimation

I want to fit an array of data (in the program called "data", of size "n") with a Gaussian function and I want to get the estimations for the parameters of the curve, namely the mean and the sigma. Is the following code, which I found on the Web, a fast way to do that? If so, how can I actually get the estimated values of the parameters?
import pylab as plb
from scipy.optimize import curve_fit
from scipy import asarray as ar, exp

y = data
n = len(y)  # the number of data points
x = ar(range(n))
mean = sum(x*y)/sum(y)  # weighted estimate of the mean
sigma = (sum(y*(x-mean)**2)/sum(y))**0.5  # weighted estimate of sigma

def gaus(x, a, x0, sigma, c):
    return a*exp(-(x-x0)**2/(sigma**2)) + c

popt, pcov = curve_fit(gaus, x, y, p0=[1, mean, sigma, 0.0])
print popt
print pcov

plb.plot(x, y, 'b+:', label='data')
plb.plot(x, gaus(x, *popt), 'ro:', label='fit')
plb.legend()
plb.title('Fig. 3 - Fit')
plb.xlabel('q')
plb.ylabel('data')
plb.show()
To answer your first question, "Is the following code, which I found on the Web, a fast way to do that?":
The code that you have is in fact the right way to proceed with fitting your data, when you believe it is Gaussian and know the fitting function, except that you should change the return function to
a*exp(-(x-x0)**2/(sigma**2))
since I believe for a Gaussian function you don't need the constant c parameter.
A common use of least-squares minimization is curve fitting, where one has a parametrized model function meant to explain some phenomena and wants to adjust the numerical values for the model to most closely match some data. With scipy, such problems are commonly solved with scipy.optimize.curve_fit.
To answer your second question, "If so, how can I actually get the estimated values of the parameters?":
You can go to the link provided for scipy.optimize.curve_fit and find that the best-fit parameters reside in your popt variable. In your example, popt will contain the mean and sigma of your data. In addition to the best-fit parameters, pcov will contain the covariance matrix, whose diagonal holds the variances of your mean and sigma. To obtain the 1-sigma standard deviations, take the square root of that diagonal, i.e. np.sqrt(np.diag(pcov)).
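Concretely, a small sketch continuing from the question's gaus model and its popt and pcov:
import numpy as np

# Unpack the best-fit parameters and their 1-sigma uncertainties.
a_fit, mean_fit, sigma_fit, c_fit = popt
perr = np.sqrt(np.diag(pcov))  # standard errors, one per parameter
print "mean  = %.4f +/- %.4f" % (mean_fit, perr[1])
print "sigma = %.4f +/- %.4f" % (sigma_fit, perr[2])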

How can I do a least squares fit in python, using data that is only an upper limit?

I am trying to perform a least squares fit in python to a known function with three variables. I am able to complete this task for randomly generated data with errors, but the actual data that I need to fit includes some data points that are upper limits on the values. The function describes the flux as a function of wavelength, but in some cases the flux measured at the given wavelength is not an absolute value with an error but rather a maximum value for the flux, with the real value being anything below that down to zero.
Is there some way of telling the fitting task that some data points are upper limits? Additionally, I have to do this for a number of data sets, and the number of data points which could be upper limits is different for each one, so being able to do this automatically would be beneficial but not a necessity.
I apologise if any of this is unclear, I will endeavour to explain it more clearly if it is needed.
The code I am using to fit my data is included below.
import numpy as np
from scipy.optimize import leastsq
import math
import matplotlib.pyplot as plt

def f_all(x, p):
    return np.exp(p[0])/((x**(3+p[1]))*((np.exp(14404.5/((x*1000000)*p[2])))-1))

def residual(p, y, x, error):
    err = (y - f_all(x, p))/error
    return err

p0 = [-30, 2.0, 35.0]
data = np.genfromtxt("./Data_Files/Object_001")
wavelength = data[:,0]
flux = data[:,1]
errors = data[:,2]

p, cov, infodict, mesg, ier = leastsq(residual, p0, args=(flux, wavelength, errors), full_output=True)
print p
scipy.optimize.leastsq is a convenient way to fit data, but the work underneath is the minimization of a function. scipy.optimize contains many minimization functions, some of them having the capability of handling constraints. Here I explain using fmin_slsqp, which I know; perhaps the others can do this as well (see the scipy.optimize documentation).
fmin_slsqp requires a function to minimize and an initial value for the parameters. The function to minimize is the sum of the squares of the residuals. For the parameters, I first perform a traditional leastsq fit and use the result as an initial value for the constrained minimization problem. Then there are several ways to impose constraints (see the doc); the simplest is the f_ieqcons parameter: it requires a function which returns an array whose values must always be positive (those are the constraints). Here the function returns positive values if, for all maximal-value points, the fit function is below the point.
import numpy
import scipy.optimize as scimin
import matplotlib.pyplot as mpl

datax = numpy.array([1,2,3,4,5])  # data coordinates
datay = numpy.array([2.95,6.03,11.2,17.7,26.8])
constraintmaxx = numpy.array([0])  # list of maximum constraints
constraintmaxy = numpy.array([1.2])

# least squares fit without constraints
def fitfunc(x, p):  # model f(x) = a*x**2 + c
    a, c = p
    return c + a*x**2

def residuals(p):  # array of residuals
    return datay - fitfunc(datax, p)

p0 = [1, 2]  # initial parameters guess
pwithout, cov, infodict, mesg, ier = scimin.leastsq(residuals, p0, full_output=True)  # traditional least squares fit

# least squares fit with constraints
def sum_residuals(p):  # the function we want to minimize
    return sum(residuals(p)**2)

def constraints(p):  # the constraints: all values of the returned array must be >= 0 at the end
    return constraintmaxy - fitfunc(constraintmaxx, p)

pwith = scimin.fmin_slsqp(sum_residuals, pwithout, f_ieqcons=constraints)  # minimization with constraint

# plotting
ax = mpl.figure().add_subplot(1,1,1)
ax.plot(datax, datay, ls="", marker="x", color="blue", mew=2.0, label="Data")
ax.plot(constraintmaxx, constraintmaxy, ls="", marker="x", color="red", mew=2.0, label="Max points")
morex = numpy.linspace(0,6,100)
ax.plot(morex, fitfunc(morex, pwithout), color="blue", label="Fit without constraints")
ax.plot(morex, fitfunc(morex, pwith), color="red", label="Fit with constraints")
ax.legend(loc=2)
mpl.show()
In this example I fit an imaginary sample of points on a parabola. Here is the result, without and with the constraint (the red cross on the left):
I hope this will do for your data sample; otherwise, please post one of your data files so that we can try with real data. I know my example does not take care of error bars on the data, but you can easily handle them by modifying the residuals function.
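For instance, a minimal sketch of that modification, assuming a hypothetical array dataerr of 1-sigma error bars matching datay:
dataerr = numpy.array([0.1, 0.2, 0.3, 0.4, 0.5])  # hypothetical error bars, one per data point

def residuals(p):  # weight each residual by its measurement uncertainty
    return (datay - fitfunc(datax, p))/dataerr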
