Problems using scipy.odr with math.erf() - python

I have a problem using orthogonal distance regression with a function that uses the error function math.erf(). More precisely, it seems I have a problem with the variables that should be fitted.
import numpy as np
from scipy import odr
import math

def CDF(x, a, b, c):
    # definition of the error function to fit the data
    return c/2 * (1 + math.erf((x - a)/(b*math.sqrt(2))))

# to make things a little shorter I excluded the parts in which the data is read and prepared
guess = Ergebnis_popt[1]  # using found values for a, b, c from curve_fit without errors
guess_a = guess[0]
guess_b = guess[1]
guess_c = guess[2]
xerr = Verschiebung_fit[2]  # determination of the x errors; the y errors are included in the prepared data
xerr = xerr/xerr
xerr = xerr/200
# building the model for the odr fit; as the erf function seems to only handle single values I use np.vectorize
cfd = odr.Model(np.vectorize(CDF, excluded=['a', 'b', 'c']), extra_args=['a', 'b', 'c'])
# definition of the data with errors for the fit
mydata = odr.RealData(Verschiebung_fit[2], y=y_values_fit[2], sx=xerr, sy=y_values_fit[4])
# basis for the odr fit
myodr = odr.ODR(mydata, cfd, beta0=[guess_a, guess_b, guess_c])
myoutput = myodr.run()
myoutput.pprint()
All this results in an error:
File "C:\Users\Public\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 2785, in _get_ufunc_and_otypes
outputs = func(*inputs)
File "C:\Users\Public\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 2750, in func
return self.pyfunc(*the_args, **kwargs)
TypeError: CDF() takes 4 positional arguments but 5 were given
I am a novice Python user. I guess there is a problem with how the variables to be fitted are passed to the model, but I cannot figure out where it occurs.
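For what it's worth, the TypeError arises because odr.Model calls the fit function as f(beta, x, *extra_args): with extra_args=['a','b','c'], the vectorized CDF receives five arguments instead of four. scipy.odr expects the parameters packed into a single beta vector rather than as separate arguments. A minimal sketch, with synthetic data standing in for the prepared Verschiebung_fit/y_values_fit arrays:

import numpy as np
from scipy import odr
from scipy.special import erf  # vectorized, so np.vectorize is not needed

def CDF(beta, x):
    # odr.Model calls the fit function as f(beta, x): the parameter
    # vector comes first, not separate a, b, c arguments
    a, b, c = beta
    return c/2 * (1 + erf((x - a)/(b*np.sqrt(2))))

x = np.linspace(-3, 3, 50)  # synthetic stand-in data
y = CDF([0.0, 1.0, 1.0], x) + np.random.normal(0, 0.01, x.size)

model = odr.Model(CDF)
data = odr.RealData(x, y, sx=0.005, sy=0.01)
fit = odr.ODR(data, model, beta0=[0.1, 0.9, 1.1]).run()
fit.pprint()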

Related

The covariance of the parameters cannot be estimated during curve fitting

I'm trying to solve for two unknown parameters in my function expression using scipy.optimize.curve_fit. The equation I used is as follows:
[image: the model equation, A(f) = omiga * exp(-pi * f * tstar) / (1 + (f/18.15)^2)]
My code is as follows:
p_freqs =np.array(0.,8.19672131,16.39344262,24.59016393,32.78688525,
40.98360656,49.18032787,57.37704918,65.57377049,73.7704918,
81.96721311,90.16393443,98.36065574,106.55737705,114.75409836,
122.95081967,131.14754098,139.3442623, 147.54098361,155.73770492,
163.93442623,172.13114754,180.32786885,188.52459016,196.72131148,
204.91803279,213.1147541, 221.31147541,229.50819672,237.70491803,
245.90163934)
p_fft_amp1 = np.array(3.34278536e-08,5.73549829e-08,1.94897033e-08,1.59088184e-08,
9.23948302e-09,3.71198908e-09,1.85535722e-09,1.86064653e-09,
1.52149363e-09,1.33626573e-09,1.19468040e-09,1.08304535e-09,
9.96594475e-10,9.25671797e-10,8.66775330e-10,8.17287132e-10,
7.75342888e-10,7.39541296e-10,7.08843676e-10,6.82440637e-10,
6.59712650e-10,6.40169517e-10,6.23422124e-10,6.09159901e-10,
5.97134297e-10,5.87146816e-10,5.79040074e-10,5.72691200e-10,
5.68006964e-10,5.64920239e-10,5.63387557e-10)
def cal_omiga_tstar(omiga,tstar,f):
    return omiga*np.exp(-np.pi*f*tstar)/(1+(f/18.15)**2)

omiga,tstar = optimize.curve_fit(cal_omiga_tstar,p_freqs,p_fft_amp1)[0]
When I run the code I get the following warning:
OptimizeWarning: Covariance of the parameters could not be estimated
I couldn't exactly pinpoint the cause of your error message, because your code has some errors prior to that. First, the construction of the two arrays uses invalid syntax, and your definition of cal_omiga_tstar has the wrong argument order. While fixing these problems I did get your error message once, but weirdly enough I haven't been able to reproduce it. However, I did manage to fit your function. You should supply initial guesses for the parameters, especially since your y has so many low values. There's no magic here: just plot your model and data until they're relatively close, then let the algorithm take the wheel.
Here's my code:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
# Changed here, was "np.array(0.,..."
p_freqs =np.array([0.,8.19672131,16.39344262,24.59016393,32.78688525,
40.98360656,49.18032787,57.37704918,65.57377049,73.7704918,
81.96721311,90.16393443,98.36065574,106.55737705,114.75409836,
122.95081967,131.14754098,139.3442623, 147.54098361,155.73770492,
163.93442623,172.13114754,180.32786885,188.52459016,196.72131148,
204.91803279,213.1147541, 221.31147541,229.50819672,237.70491803,
245.90163934])
p_fft_amp1 = np.array([3.34278536e-08,5.73549829e-08,1.94897033e-08,1.59088184e-08,
9.23948302e-09,3.71198908e-09,1.85535722e-09,1.86064653e-09,
1.52149363e-09,1.33626573e-09,1.19468040e-09,1.08304535e-09,
9.96594475e-10,9.25671797e-10,8.66775330e-10,8.17287132e-10,
7.75342888e-10,7.39541296e-10,7.08843676e-10,6.82440637e-10,
6.59712650e-10,6.40169517e-10,6.23422124e-10,6.09159901e-10,
5.97134297e-10,5.87146816e-10,5.79040074e-10,5.72691200e-10,
5.68006964e-10,5.64920239e-10,5.63387557e-10])
# Changed sequence from "omiga, tstar, f" to "f, omiga, tstar".
def cal_omiga_tstar(f, omiga, tstar):
    return omiga*np.exp(-np.pi*f*tstar)/(1+(f/18.15)**2)
# Changed this call to get popt, pcov, and supplied the initial guesses
popt, pcov = curve_fit(cal_omiga_tstar,p_freqs,p_fft_amp1, p0=(1E-5, 1E-2))
Here's popt: array([ 4.51365934e-08, -1.48124194e-06]) and pcov: array([[1.35757744e-17, 3.54656128e-12],[3.54656128e-12, 2.90508007e-06]]). As you can see, the covariance matrix could be estimated in this case.
Here's the model vs. data curve: [figure: the fitted model overlaid on the data]
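That plot can be reproduced with something like the following, continuing from the fitted code above (the log-scaled y axis is my assumption, chosen because the amplitudes span two orders of magnitude):

# continues the example above: popt from curve_fit
plt.semilogy(p_freqs, p_fft_amp1, 'o', label='data')
plt.semilogy(p_freqs, cal_omiga_tstar(p_freqs, *popt), '-', label='fit')
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.legend()
plt.show()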

Fitting multiple datasets with shared paramaters

I am trying to fit different data sets with different non-linear functions that share some parameters, and it looks something like this:
import matplotlib
from matplotlib import pyplot as plt
from scipy import optimize
import numpy as np

# some non-linear functions
def Sigma1x(x, C11, C111, C1111, C11111):
    return C11*x + 1/2*C111*pow(x,2) + 1/6*C1111*pow(x,3) + 1/24*C11111*pow(x,4)

def Sigma2x(x, C12, C112, C1112, C11112):
    return C12*x + 1/2*C112*pow(x,2) + 1/6*C1112*pow(x,3) + 1/24*C11112*pow(x,4)

def Sigma1y(y, C12, C111, C222, C112, C1111, C1112, C2222, C12222):
    return C12*y + 1/2*(C111-C222+C112)*pow(y,2) + 1/12*(C111+2*C1112-C2222)*pow(y,3) + 1/24*C12222*pow(y,4)

def Sigma2y(y, C11, C222, C2222, C22222):
    return C11*y + 1/2*C222*pow(y,2) + 1/6*C2222*pow(y,3) + 1/24*C22222*pow(y,4)

def Sigmaz(z, C11, C12, C111, C222, C112, C1111, C1112, C2222, C1122, C11111, C11112, C12222, C11122, C22222):
    return (C11+C12)*z + 1/2*(2*C111-C222+3*C112)*pow(z,2) + 1/6*(3/2*C1111+4*C1112-1/2*C222+3*C1122)*pow(z,3) + \
           1/24*(3*C11111+10*C11112-5*C12222+10*C11122-2*C22222)*pow(z,4)

# Experimental datasets
Xdata = np.loadtxt('x-direction.txt')  # contains the x axis and two datasets, to be fitted with Sigma1x and Sigma2x
Ydata = np.loadtxt('y-direction.txt')  # contains the y axis and two datasets, to be fitted with Sigma1y and Sigma2y
Zdata = np.loadtxt('z-direction.txt')  # contains the z axis and one dataset, to be fitted with Sigmaz
The question is how to use optimize.leastsq or other packages to fit the data with the appropriate functions, knowing that they share multiple parameters?
I was able to (partially) solve the initial question. I found symfit very comprehensive and easy to use, so I wrote the following code:
import matplotlib.pyplot as plt
from symfit import *
import numpy as np
from symfit.core.minimizers import DifferentialEvolution, BFGS

Y_strain = np.genfromtxt('Y_strain.csv', delimiter=',')
X_strain = np.genfromtxt('X_strain.csv', delimiter=',')

xmax = max(X_strain[:,0])
xmin = min(X_strain[:,0])
xdata = np.linspace(xmin, xmax, 50)
ymax = max(Y_strain[:,0])
ymin = min(Y_strain[:,0])
ydata = np.linspace(ymin, ymax, 50)

x, y, Sigma1x, Sigma2x, Sigma1y, Sigma2y = variables('x,y,Sigma1x,Sigma2x,Sigma1y,Sigma2y')
C11,C111,C1111,C11111,C12,C112,C1112,C11112,C222,C2222,C12222,C22222 = parameters('C11,C111,C1111,C11111,C12,C112,C1112,C11112,C222,C2222,C12222,C22222')

model = Model({
    Sigma1x: C11*x + 1/2*C111*pow(x,2) + 1/6*C1111*pow(x,3) + 1/24*C11111*pow(x,4),
    Sigma2x: C12*x + 1/2*C112*pow(x,2) + 1/6*C1112*pow(x,3) + 1/24*C11112*pow(x,4),
    #Sigma1y: C12*y + 1/2*(C111-C222+C112)*pow(y,2) + 1/12*(C111+2*C1112-C2222)*pow(y,3) + 1/24*C12222*pow(y,4),
    #Sigma2y: C11*y + 1/2*C222*pow(y,2) + 1/6*C2222*pow(y,3) + 1/24*C22222*pow(y,4),
})

fit = Fit(model, x=X_strain[:,0], Sigma1x=X_strain[:,1], Sigma2x=X_strain[:,2])
fit_result = fit.execute()
print(fit_result)

plt.scatter(Y_strain[:,0], Y_strain[:,2])
plt.scatter(Y_strain[:,0], Y_strain[:,1])
plt.plot(xdata, model(x=xdata, **fit_result.params).Sigma1x)
plt.plot(xdata, model(x=xdata, **fit_result.params).Sigma2x)
However, the resulting fit is very bad:
Parameter Value Standard Deviation
C11 1.203919e+02 3.988977e+00
C111 -6.541505e+02 5.643111e+01
C1111 1.520749e+03 3.713742e+02
C11111 -7.824107e+02 1.015887e+03
C11112 4.451211e+03 1.015887e+03
C1112 -1.435071e+03 3.713742e+02
C112 9.207923e+01 5.643111e+01
C12 3.272248e+01 3.988977e+00
Status message Desired error not necessarily achieved due to precision loss.
Number of iterations 59
Objective <symfit.core.objectives.LeastSquares object at 0x000001CC00C0A508>
Minimizer <symfit.core.minimizers.BFGS object at 0x000001CC7F84A548>
Goodness of fit qualifiers:
chi_squared 6.230510793023184
objective_value 3.115255396511592
r_squared 0.991979767376565
Any ideas how to improve the fit?
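One direction worth trying, sketched here under the assumption that Y_strain has the same column layout as X_strain: include the commented-out y-direction equations in the same Model and pass both datasets to Fit, so the shared parameters are constrained by all the data at once rather than by the x-direction data alone.

# sketch: fit both directions simultaneously (column layout of Y_strain assumed)
model = Model({
    Sigma1x: C11*x + 1/2*C111*pow(x,2) + 1/6*C1111*pow(x,3) + 1/24*C11111*pow(x,4),
    Sigma2x: C12*x + 1/2*C112*pow(x,2) + 1/6*C1112*pow(x,3) + 1/24*C11112*pow(x,4),
    Sigma1y: C12*y + 1/2*(C111-C222+C112)*pow(y,2) + 1/12*(C111+2*C1112-C2222)*pow(y,3) + 1/24*C12222*pow(y,4),
    Sigma2y: C11*y + 1/2*C222*pow(y,2) + 1/6*C2222*pow(y,3) + 1/24*C22222*pow(y,4),
})
fit = Fit(model,
          x=X_strain[:,0], y=Y_strain[:,0],
          Sigma1x=X_strain[:,1], Sigma2x=X_strain[:,2],
          Sigma1y=Y_strain[:,1], Sigma2y=Y_strain[:,2])
fit_result = fit.execute()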

statsmodels - printing summary of ARMA fit throws error

I want to fit an ARMA(p,q) model to simulated data, y, and check the effect of different estimation methods on the results. However, fitting the same model object twice, like so
model = tsa.ARMA(y,(1,1))
results_mle = model.fit(trend='c', method='mle', disp=False)
results_css = model.fit(trend='c', method='css', disp=False)
and printing the results
print results_mle.summary()
print results_css.summary()
generates the following error
File "C:\Anaconda\lib\site-packages\statsmodels\tsa\arima_model.py", line 1572, in summary
smry.add_table_params(self, alpha=alpha, use_t=False)
File "C:\Anaconda\lib\site-packages\statsmodels\iolib\summary.py", line 885, in add_table_params
use_t=use_t)
File "C:\Anaconda\lib\site-packages\statsmodels\iolib\summary.py", line 475, in summary_params
exog_idx]
IndexError: index 3 is out of bounds for axis 0 with size 3
If, instead, I do this
model1 = tsa.ARMA(y,(1,1))
model2 = tsa.ARMA(y,(1,1))
result_mle = model1.fit(trend='c',method='css-mle',disp=False)
print result_mle.summary()
result_css = model2.fit(trend='c',method='css',disp=False)
print result_css.summary()
no error occurs. Is that supposed to happen, or is it a bug that should be fixed?
BTW, I generated the ARMA process as follows:
from __future__ import division
import statsmodels.tsa.api as tsa
import numpy as np

# generate ARMA
a = -0.7
b = -0.7
c = 2
s = 10
y1 = np.random.normal(c/(1-a), s*(1+(a+b)**2/(1-a**2)))
e = np.random.normal(0, s, (100,))
y = [y1]
for t in xrange(e.size-1):
    arma = c + a*y[-1] + e[t+1] + b*e[t]
    y.append(arma)
y = np.array(y)
You could report this as a bug, even though it looks like a consequence of the current design.
Some attributes of the model change when the estimation method is changed, which should in general be avoided. Since both results instances access the same model, the older one is inconsistent with it in this case.
http://www.statsmodels.org/dev/pitfalls.html#repeated-calls-to-fit-with-different-parameters
In general, statsmodels tries to keep all parameters that change the model in model.__init__ rather than as arguments to fit, and to attach the outcome of fit to the Results instance.
However, this is not followed everywhere, especially not in older models that gained new options along the way.
trend is an example that is supposed to go into ARMA.__init__, because it is now handled together with exog (which makes it an ARMAX model), but it wasn't in pure ARMA. The estimation method belongs in fit and should not cause problems like these.
Aside: There is a helper function to simulate an ARMA process that uses scipy.signal.lfilter and should be much faster than an iteration loop in Python.
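A sketch of that approach, assuming the helper meant is arma_generate_sample from statsmodels.tsa.arima_process (the fourth positional argument is the innovation scale; the helper has no constant term, so the mean is added back separately):

import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample

a, b, c, s = -0.7, -0.7, 2, 10
# lag-polynomial convention: AR coefficients enter with the opposite sign
ar = np.r_[1, -a]
ma = np.r_[1, b]
y = arma_generate_sample(ar, ma, 100, s) + c/(1 - a)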

ValueError loading data for scipy.odr regression

I recently tried to use the scipy.odr package to conduct a regression analysis. Whenever I try to load a list of data whose elements depend on a function, a ValueError is raised:
ValueError: x could not be made into a suitable array
I have been using the same kind of programming to make fits using scipy's leastsq and curve_fit routines without problems.
Any idea of what to change and how to proceed? Thanks a lot...
Here I include a minimal working example:
from scipy import odr
from functools import partial
import numpy as np
import matplotlib.pyplot as plt

### choose select=0 and a list of elements is passed to myModel which are a function of some parameters
### this results in the error message: ValueError: x could not be made into a suitable array
### choose select=1, the function temp is excluded, and a fit is generated
### what do I have to do in order to run the program successfully using select=0?

## choose here!
select = 1

pfit = [1.0, 1.0]
q0 = [1, 2, 3, 4, 5]
q1 = [3, 8, 10, 19, 27]

def temp(par, val):
    p1, p2 = par
    temp_out = p1*val**p2
    return temp_out

def fodr(a, x):
    if select == 0:
        fitf = np.array([xi(a) for xi in x])
    else:
        fitf = a[0]*x**a[1]
    return fitf

# define model
myModel = odr.Model(fodr)

# load data
damy = q1
if select == 0:
    damx = []
    for el in q0:
        elm = partial(temp, val=el)
        damx.append(elm)
    #damx = [el(pfit) for el in damx]  # check that the function temp works fine
    #print damx
else:
    damx = q0

myData = odr.Data(damx, damy)
myOdr = odr.ODR(myData, myModel, beta0=pfit, maxit=100, ifixb=[1,1])
out = myOdr.run()
out.pprint()
Edit:
@Robert: Thanks for your reply. I am using scipy version 0.14.0. Using select==0 in my minimal example I get the following traceback:
Traceback (most recent call last):
  File "scipy-odr.py", line 48, in <module>
    out = myOdr.run()
  File "/home/tg/anaconda/lib/python2.7/site-packages/scipy/odr/odrpack.py", line 1061, in run
    self.output = Output(odr(*args, **kwds))
ValueError: x could not be made into a suitable array
In short, your code does not work because damx is now a list of functools.partial objects.
scipy.odr is a simple wrapper around ODRPACK, a Fortran orthogonal distance regression library; both xdata and ydata have to be numerical, since they are converted to Fortran types under the hood. It doesn't know what to do with a list of functools.partial objects, hence the error.
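A sketch of one way to keep the select=0 structure but hand odr plain numbers: evaluate the partials at a parameter guess first (whether evaluating at a fixed pfit matches the intended model is a separate question).

import numpy as np
from functools import partial
from scipy import odr

pfit = [1.0, 1.0]
q0 = [1, 2, 3, 4, 5]
q1 = [3, 8, 10, 19, 27]

def temp(par, val):
    p1, p2 = par
    return p1*val**p2

def fodr(a, x):
    return a[0]*x**a[1]

damx = [partial(temp, val=el) for el in q0]
# evaluate the partials so odr receives a plain numeric array
damx_numeric = np.asarray([el(pfit) for el in damx], dtype=float)

myData = odr.Data(damx_numeric, np.asarray(q1, dtype=float))
out = odr.ODR(myData, odr.Model(fodr), beta0=pfit, maxit=100).run()
out.pprint()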

Creating new distributions in scipy

I'm trying to create a distribution based on some data I have, then draw randomly from that distribution. Here's what I have:
from scipy import stats
import numpy

def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
    return rv()

if __name__ == "__main__":
    # pretend this is real data
    data = numpy.concatenate((numpy.random.normal(2,5,100), numpy.random.normal(25,5,100)))
    d = getDistribution(data)
    print d.rvs(size=100)  # this usually fails
I think this is doing what I want it to, but I frequently get an error (see below) when I try to do d.rvs(), and d.rvs(100) never works. Am I doing something wrong? Is there an easier or better way to do this? If it's a bug in scipy, is there some way to get around it?
Finally, is there more documentation on creating custom distributions somewhere? The best I've found is the scipy.stats.rv_continuous documentation, which is pretty spartan and contains no useful examples.
The traceback:
Traceback (most recent call last):
  File "testDistributions.py", line 19, in <module>
    print d.rvs(size=100)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 696, in rvs
    vals = self._rvs(*args)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1193, in _rvs
    Y = self._ppf(U,*args)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1212, in _ppf
    return self.vecfunc(q,*args)
  File "/usr/local/lib/python2.6/dist-packages/numpy-1.6.1-py2.6-linux-x86_64.egg/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1158, in _ppf_single_call
    return optimize.brentq(self._ppf_to_solve, self.xa, self.xb, args=(q,)+args, xtol=self.xtol)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/optimize/zeros.py", line 366, in brentq
    r = _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp)
ValueError: f(a) and f(b) must have different signs
Edit
For those curious, following the advice in the answer below, here's code that works:
from scipy import stats
import numpy

def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _rvs(self, *x, **y):
            # don't ask me why it's using self._size
            # nor why I have to cast to int
            return kernel.resample(int(self._size))
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
        def _pdf(self, x):
            return kernel.evaluate(x)
    return rv(name='kdedist', xa=-200, xb=200)
Specifically to your traceback:
rvs uses the inverse of the cdf, the ppf, to create random numbers. Since you are not specifying a ppf, it is calculated by a root-finding algorithm, brentq. brentq uses lower and upper bounds on where it should search for the value at which the function is zero (find x such that cdf(x) = q, where q is the quantile).
The defaults for the limits, xa and xb, are too small in your example. The following works for me with scipy 0.9.0; xa and xb can be set when creating the distribution instance:
def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
    return rv(name='kdedist', xa=-200, xb=200)
There is currently a pull request for scipy to improve this, so in the next release xa and xb will be expanded automatically to avoid the f(a) and f(b) must have different signs exception.
There is not much documentation on this; the easiest is to follow some examples (and to ask on the mailing list).
edit: addition
pdf: Since the density function is also given by gaussian_kde, I would add the _pdf method, which will make some calculations more efficient.
edit2: addition
rvs: If you are interested in generating random numbers, then gaussian_kde has a resample method. Random samples can be generated by sampling from the data and adding Gaussian noise, so this will be faster than the generic rvs, which uses the ppf method. I would write a ._rvs method that just calls gaussian_kde's resample method.
precomputing ppf: I don't know of any general way to precompute the ppf. However, the way I thought of doing it (but never tried so far) is to precompute the ppf at many points and then use linear interpolation to approximate the ppf function.
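A sketch of that interpolation idea (the grid bounds are assumed wide enough to cover the data, mirroring xa=-200, xb=200 above):

import numpy as np
from scipy import stats

data = np.concatenate((np.random.normal(2, 5, 100),
                       np.random.normal(25, 5, 100)))
kernel = stats.gaussian_kde(data)

# tabulate the cdf on a grid, then invert it by linear interpolation
grid = np.linspace(-200, 200, 2001)
cdf_vals = np.array([kernel.integrate_box_1d(-np.inf, g) for g in grid])

def ppf_approx(q):
    # the cdf is monotone increasing, so swapping x and y inverts it
    return np.interp(q, cdf_vals, grid)

samples = ppf_approx(np.random.uniform(size=100))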
edit3: about _rvs to answer Srivatsan's question in the comment
_rvs is the distribution-specific method that is called by the public method rvs. rvs is a generic method that does some argument checking, adds location and scale, sets the attribute self._size (the size of the requested array of random variables), and then calls the distribution-specific method ._rvs or its generic counterpart. The extra arguments in ._rvs are shape parameters, but since there are none in this case, *x and **y are redundant and unused.
I don't know how well the size or shape handling of the .rvs method works in the multivariate case. These distributions are designed for univariate distributions and might not fully work for the multivariate case, or might need some reshaping.
