I'm trying to estimate the GJR-GARCH model using the code below. The dataset is daily Bitcoin returns, 1600 observations in total. I'm getting the error "index 1 is out of bounds for axis 0 with size 1", and above it the traceback says:
File "C:\Users\georgios\Downloads\untitled1.py", line 97, in <module>
estimates = fmin_slsqp(gjr_garch_likelihood, startingVals,
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\slsqp.py", line 207, in fmin_slsqp
res = _minimize_slsqp(func, x0, args, jac=fprime, bounds=bounds,
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\slsqp.py", line 375, in _minimize_slsqp
sf = _prepare_scalar_function(func, x, jac=jac, args=args, epsilon=eps,
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\optimize.py", line 261, in _prepare_scalar_function
sf = ScalarFunction(fun, x0, args, grad, hess,
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 136, in __init__
self._update_fun()
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 226, in _update_fun
self._update_fun_impl()
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 133, in update_fun
self.f = fun_wrapped(self.x)
File "C:\Users\georgios\Nieuwe map\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 130, in fun_wrapped
return fun(x, *args)
File "C:\Users\georgios\Downloads\untitled1.py", line 21, in gjr_garch_likelihood
sigma2[t] = (omega + alpha * eps[t-1]**2
File "C:\Users\georgios\Nieuwe map\lib\site-packages\pandas\core\series.py", line 977, in __setitem__
values[key] = value
IndexError: index 1 is out of bounds for axis 0 with size 1
My dataset looks fine from what I can see: I have a single Excel file with one column of returns, and the GARCH model is univariate.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from numpy import size, log, pi, sum, array, zeros, diag, mat, asarray, sqrt, \
copy
from numpy.linalg import inv
from scipy.optimize import fmin_slsqp
def gjr_garch_likelihood(parameters, data, sigma2, out=None):
    ''' Returns negative log-likelihood for GJR-GARCH(1,1,1) model.'''
    mu = parameters[0]
    omega = parameters[1]
    alpha = parameters[2]
    gamma = parameters[3]
    beta = parameters[4]
    T = size(data,0)
    eps = data - mu
    # Data and sigma2 are T by 1 vectors
    for t in range(1,T):
        sigma2[t] = (omega + alpha * eps[t-1]**2
                     + gamma * eps[t-1]**2 * (eps[t-1]<0) + beta * sigma2[t-1])
    logliks = 0.5*(log(2*pi) + log(sigma2) + eps**2/sigma2)
    loglik = sum(logliks)
    if out is None:
        return loglik
    else:
        return loglik, logliks, copy(sigma2)
def gjr_constraint(parameters, data, sigma2, out=None):
    ''' Constraint that alpha+gamma/2+beta<=1'''
    alpha = parameters[2]
    gamma = parameters[3]
    beta = parameters[4]
    return array([1-alpha-gamma/2-beta])
def hessian_2sided(fun, theta, args):
    f = fun(theta, *args)
    h = 1e-5*np.abs(theta)
    thetah = theta + h
    h = thetah - theta
    K = size(theta,0)
    h = np.diag(h)
    fp = zeros(K)
    fm = zeros(K)
    for i in range(K):
        fp[i] = fun(theta+h[i], *args)
        fm[i] = fun(theta-h[i], *args)
    fpp = zeros((K,K))
    fmm = zeros((K,K))
    for i in range(K):
        for j in range(i,K):
            fpp[i,j] = fun(theta + h[i] + h[j], *args)
            fpp[j,i] = fpp[i,j]
            fmm[i,j] = fun(theta - h[i] - h[j], *args)
            fmm[j,i] = fmm[i,j]
    hh = (diag(h))
    hh = hh.reshape((K,1))
    hh = hh @ hh.T
    H = zeros((K,K))
    for i in range(K):
        for j in range(i,K):
            H[i,j] = (fpp[i,j] - fp[i] - fp[j] + f
                      + f - fm[i] - fm[j] + fmm[i,j])/hh[i,j]/2
            H[j,i] = H[i,j]
    return H
# Import data
FTSEreturn = pd.read_csv('1.csv')
# Starting values
startingVals = array([FTSEreturn.mean(),
FTSEreturn.var() * .01,
.03, .09, .90])
# Estimate parameters
finfo = np.finfo(np.float64)
bounds = [(-10*FTSEreturn.mean(), 10*FTSEreturn.mean()),
(finfo.eps, 2*FTSEreturn.var() ),
(0.0,1.0), (0.0,1.0), (0.0,1.0)]
T = FTSEreturn.shape[0]
sigma2 = T * FTSEreturn.var()
# Pass a NumPy array, not a pandas Series
args = (np.asarray(FTSEreturn), sigma2)
estimates = fmin_slsqp(gjr_garch_likelihood, startingVals,
f_ieqcons=gjr_constraint, bounds = bounds,
args = args)
loglik, logliks, sigma2final = gjr_garch_likelihood(estimates, FTSEreturn,
sigma2, out=True)
step = 1e-5 * estimates
scores = zeros((T,5))
for i in range(5):
    h = step[i]
    delta = np.zeros(5)
    delta[i] = h
    loglik, logliksplus, sigma2 = gjr_garch_likelihood(estimates + delta, \
                                  np.asarray(FTSEreturn), sigma2, out=True)
    loglik, logliksminus, sigma2 = gjr_garch_likelihood(estimates - delta, \
                                  np.asarray(FTSEreturn), sigma2, out=True)
    scores[:,i] = (logliksplus - logliksminus)/(2*h)
I = (scores.T @ scores)/T
J = hessian_2sided(gjr_garch_likelihood, estimates, args)
J = J/T
Jinv = mat(inv(J))
vcv = Jinv*mat(I)*Jinv/T
vcv = asarray(vcv)
output = np.vstack((estimates,sqrt(diag(vcv)),estimates/sqrt(diag(vcv)))).T
print('Parameter Estimate Std. Err. T-stat')
param = ['mu','omega','alpha','gamma','beta']
for i in range(len(param)):
    print('{0:<11} {1:>0.6f} {2:0.6f} {3: 0.5f}'.format(param[i],
          output[i,0], output[i,1], output[i,2]))
Any help would be appreciated; I have been stuck on this for a month now and can't solve it.
The error says there's an indexing error during a setitem in
sigma2[t] = (omega + alpha * eps[t-1]**2
In other words, t is too large for the array sigma2.
That's in the gjr_garch_likelihood function. t is the loop index, which starts at 1, and sigma2 is an argument. So we need to look at how the function is called, and what the corresponding argument is.
That's more complicated, since it's a function used by fmin_slsqp. So the next step is to review that function's docs, to understand how it calls the func, and especially what arguments it provides to your gjr.... I won't do that for you!
But it's a good idea when encountering errors like this to add some diagnostic prints to your gjr... function. You need to clearly understand what gets passed to it, paying particular attention to array shapes. Don't make assumptions. Verify.
You may need to change gjr ... to accommodate the arguments, or you may need to modify how you call fmin....
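A quick way to do that kind of shape check, as a minimal sketch. The data here are synthetic stand-ins for the returns file (the column name ret and the seed are assumptions), and building sigma2 with np.ones(T) is my guess at what was intended, not something the question confirms:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the one-column returns file (assumption)
rng = np.random.default_rng(0)
returns = pd.DataFrame({"ret": rng.normal(0.0, 0.02, size=1600)})
T = returns.shape[0]

# As in the question: DataFrame.var() gives one value per column,
# so T * var() is a Series of length 1, not a length-T vector
sigma2_bad = T * returns.var()
print(type(sigma2_bad).__name__, sigma2_bad.shape)

# One possible fix (assumption): a 1-D data array and a length-T sigma2
data = returns["ret"].to_numpy()
sigma2_ok = np.ones(T) * data.var(ddof=1)
print(data.shape, sigma2_ok.shape)
```

With a length-1 sigma2, the assignment sigma2[1] = ... in the loop is exactly the kind of statement that fails with "index 1 is out of bounds for axis 0 with size 1".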
Question - how much of this did you already figure out in the past month of struggle? If you already figured this out, you should have included that information in the question.
I'm trying to fit and plot a Gaussian curve to some given data. This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Generate data
mu, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(mu, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
def gauss (x, y):
    p = [x0, y0, sigma]
    return p[0] * np.exp(-(x-p[1])**2 / (2 * p[2]**2))
p0 = [1., 1., 1.]
fit = curve_fit(gauss, x, y, p0=p0)
plt.plot(gauss(x, y))
plt.show()
When I run the code I get this error:
TypeError: gauss() takes exactly 2 arguments (4 given)
I don't understand where I have given my function 4 arguments. I'm also not convinced I'm using the curve function correctly, but I'm not sure exactly what I'm doing wrong. Any help would be appreciated.
Edit
Here's the Traceback:
Traceback (most recent call last):
File "F:\Numerical methods\rw893 final assignment.py", line 21, in <module>
fitE, fitI = curve_fit(gauss, x, y, p0=p0)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 515, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 354, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 17, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 427, in _general_function
return function(xdata, *params) - ydata
TypeError: gauss() takes exactly 2 arguments (4 given)
Check the SciPy documentation first: docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.optimize.curve_fit.html
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted should take the fit parameters as separate scalar arguments (not as one sequence such as p0).
I want to remind you that you hand over the initialization parameters x0, y0, sigma to the function gauss during the call of curve_fit.
You call the initialization p0 = [x0, y0, sigma].
The function gauss returns the value y = y0 * np.exp(-((x - x0) / sigma)**2).
Therefore the input values need to be x, x0, y0, sigma.
The first parameter x is the data you know, together with the result of the function, y. The latter three parameters will be fitted; you hand them over as initialization parameters.
Working example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Create data:
x0, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(x0, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
# Function to be fitted
def gauss(x, x0, y0, sigma):
    p = [x0, y0, sigma]
    return p[1]* np.exp(-((x-p[0])/p[2])**2)
# Initialization parameters
p0 = [1., 1., 1.]
# Fit the data with the function
fit, tmp = curve_fit(gauss, x, y, p0=p0)
# Plot the results
plt.title('Fit parameters:\n x0=%.2e y0=%.2e sigma=%.2e' % (fit[0], fit[1], fit[2]))
# Data
plt.plot(x, y, 'r--')
# Fitted function
x_fine = np.linspace(xe[0], xe[-1], 100)
plt.plot(x_fine, gauss(x_fine, fit[0], fit[1], fit[2]), 'b-')
plt.savefig('Gaussian_fit.png')
plt.show()
Probably your callback is called in curve_fit with a different number of parameters.
Have a look at the documentation where it says:
The model function, f(x, ...). It must take the independent variable
as the first argument and the parameters to fit as separate remaining
arguments.
To make sure this works out you might want to take *args after the first argument and have a look at what you get.
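A minimal sketch of that check, assuming a three-parameter Gaussian (the names amp, x0, sigma and the calls list are made up for illustration). The model takes *params after x and records how many values curve_fit hands over on each call:

```python
import numpy as np
from scipy.optimize import curve_fit

calls = []  # number of fit parameters received on each call

def gauss(x, *params):
    calls.append(len(params))          # inspect what curve_fit passes in
    amp, x0, sigma = params
    return amp * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.linspace(-1, 1, 50)
y = gauss(x, 3.0, 0.1, 0.3)            # noiseless test data
calls.clear()

# With a *params signature, p0 tells curve_fit how many parameters to fit
popt, pcov = curve_fit(gauss, x, y, p0=[2.0, 0.0, 0.5])
print(set(calls))
print(popt)
```

Every call receives x plus one scalar per entry of p0, which is why a model declared as gauss(x, y) complains about getting 4 arguments.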
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d, x0):
    # Gaussian of height a and width d centred at x0, on a constant offset c
    # (b is not used in the model, so its fitted value carries no meaning)
    return a*np.exp(-(x-x0)**2/(2*d**2)) + c
x = np.linspace(0,4,50)
y = func(x, 2.5, 1.3, 0.5, 1.0, 2.0)
yn = y + 0.2*np.random.normal(size=len(x))
p = [1,1,1,1,1]
popt, pcov = curve_fit(func, x, yn, p0=p)
plt.plot(x,func(x,*popt))
plt.plot(x,yn,'r+')
plt.show()
This should help. The same approach extends to a Gaussian surface over two variables: the input array 'x' then becomes a (k, M)-shaped array holding the (x, y) coordinates, and 'yn' holds the z-values.
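A sketch of that extension, under my own assumptions: the model gauss2d, its parameter names, the grid, and the noise level are all invented for illustration. The (x, y) coordinates are stacked into one array of shape (2, M) and passed as xdata:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xy, amp, x0, y0, sx, sy, offset):
    # xy has shape (2, M): row 0 holds x, row 1 holds y
    x, y = xy
    return amp * np.exp(-((x - x0)**2 / (2 * sx**2)
                          + (y - y0)**2 / (2 * sy**2))) + offset

rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.linspace(0, 4, 30), np.linspace(0, 4, 30))
xy = np.vstack((gx.ravel(), gy.ravel()))        # shape (2, 900)

true = (2.5, 2.0, 1.5, 0.6, 0.9, 0.5)
z = gauss2d(xy, *true) + 0.05 * rng.normal(size=xy.shape[1])

popt, pcov = curve_fit(gauss2d, xy, z, p0=(2.0, 1.8, 1.7, 1.0, 1.0, 0.0))
print(np.round(popt, 2))
```

The z-values are flattened to one dimension so that they line up with the columns of xy.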
I have the following code:
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] + x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
It will raise TypeError: func() takes exactly 2 arguments (3 given). Well, that sounds fair - curve_fit unpacks the (0, 0) into two scalar inputs. So I tried this:
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=((0, 0),))
Again, it said: ValueError: object too deep for desired array
If I left it as default (not specifying p0):
popt, pcov = curve_fit(func, np.arange(10), np.arange(10))
It will raise IndexError: invalid index to scalar variable. Obviously, it only gave the function a scalar for p.
I can make def func(x, p1, p2): return p1 + p2 + x to get it working, but with more complicated situations the code is going to look verbose and messy. I'd really love it if there's a cleaner solution to this problem.
Thanks!
Not sure if this is cleaner, but at least it is easier now to add more parameters to the fitting function. Maybe one could even make an even better solution out of this.
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] * x
def func2(*args):
    return func(args[0],args[1:])
popt, pcov = curve_fit(func2, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
EDIT: This works for me
import numpy as np
from scipy.optimize import curve_fit
def func(x, *p): return p[0] + p[1] * x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
Problem
When using curve_fit you must explicitly say the number of fit parameters. Doing something like:
def f(x, *p):
    return sum( [p[i]*x**i for i in range(len(p))] )
would be great, since it would be a general nth-order polynomial fitting function, but unfortunately, in my SciPy 0.12.0, it raises:
ValueError: Unable to determine number of fit parameters.
Solution
So you should do:
def f_1(x, p0, p1):
    return p0 + p1*x
def f_2(x, p0, p1, p2):
    return p0 + p1*x + p2*x**2
and so forth...
Then you can call using the p0 argument:
curve_fit(f_1, xdata, ydata, p0=(0,0))
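For what it's worth, as far as I can tell the "Unable to determine number of fit parameters" error only comes up when p0 is omitted, because curve_fit then has to count the named arguments of the function. A minimal sketch with a variadic polynomial (the name poly and the test data are assumptions of mine), where the length of p0 sets the order:

```python
import numpy as np
from scipy.optimize import curve_fit

def poly(x, *coeffs):
    # coeffs[i] multiplies x**i, so len(coeffs) - 1 is the polynomial order
    return sum(c * x**i for i, c in enumerate(coeffs))

x = np.linspace(-2, 2, 40)
y = 1.0 - 0.5 * x + 0.25 * x**2        # noiseless quadratic

# Without p0 the number of parameters cannot be inferred from *coeffs
try:
    curve_fit(poly, x, y)
except ValueError as exc:
    print("without p0:", exc)

# With p0 of length 3, a quadratic is fitted
popt, pcov = curve_fit(poly, x, y, p0=np.zeros(3))
print(popt)
```

Changing the length of p0 changes the order of the fitted polynomial without writing f_1, f_2 and so on by hand.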
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted should take the fit parameters as separate scalar arguments (not as one sequence such as p0).
Remember that the result of the fit depends on the initialization parameters.
Working example
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def func(x, a0, a1):
    return a0 + a1 * x
x, y = np.arange(10), np.arange(10) + np.random.randn(10)/10
popt, pcov = curve_fit(func, x, y, p0=(1, 1))
# Plot the results
plt.title('Fit parameters:\n a0=%.2e a1=%.2e' % (popt[0], popt[1]))
# Data
plt.plot(x, y, 'rx')
# Fitted function
x_fine = np.linspace(x[0], x[-1], 100)
plt.plot(x_fine, func(x_fine, popt[0], popt[1]), 'b-')
plt.savefig('Linear_fit.png')
plt.show()
You can define functions that return other functions (see Passing additional arguments using scipy.optimize.curve_fit? )
Working example :
import numpy as np
import random
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
import math
def funToFit(x):
    return 0.5+2*x-3*x*x+0.2*x*x*x+0.1*x*x*x*x
xx=[random.uniform(1,5) for i in range(30)]
yy=[funToFit(xx[i])+random.uniform(-1,1) for i in range(len(xx))]
a=np.zeros(5)
def make_func(numarg):
    def func(x,*a):
        ng=numarg
        v=0
        for i in range(ng):
            v+=a[i]*np.power(x,i)
        return v
    return func
leastsq, covar = curve_fit(make_func(len(a)),xx,yy,tuple(a))
print(leastsq)
def fFited(x):
    v=0
    for i in range(len(leastsq)):
        v+=leastsq[i]*np.power(x,i)
    return v
xfine=np.linspace(1,5,200)
plt.plot(xx,yy,".")
plt.plot(xfine,fFited(xfine))
plt.show()
This is an old thread now, but I also just ran into this issue. Building on Emile Maras' solution, I expanded the function so that it returns either the nth-order polynomial fitting function for curve_fit or the y values computed from the fit results. This makes plotting and residual calculations easier. Here is an example that fits data to progressively higher-order polynomials and plots the results and residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def funToFit(x):
    return 0.5 + 2*x -3*x**2 + 0.2*x**3 + 0.1*x**4
x=np.arange(30)
y=funToFit(x)+np.random.normal(0,5000,x.size)
def polyfun(order=0,x=np.arange(0),P=np.arange(0)):
    if P.size>0 and x.size>0:
        # evaluate the polynomial with coefficients P at x
        y=0
        for i in range(P.size):
            y+=P[i]*np.power(x,i)
        return y
    elif order>0:
        # build a fitting function with order+1 coefficients for curve_fit
        def fitfun(x,*a):
            y=0
            for i in range(order+1):
                y+=a[i]*np.power(x,i)
            return y
        return fitfun
    else:
        raise Exception("Either order or x and P must be provided")
plt.figure("fits")
plt.plot(x,y,color="black")
for i in range(4):
    order = i+1
    [fit,covar] = curve_fit(polyfun(order=order),x,y,p0=(1,)*(order+1))
    yfit = polyfun(x=x,P=fit)
    res = yfit-y
    plt.figure("fits")
    plt.plot(x,yfit)
    plt.figure("res")
    plt.plot(x,res)
plt.show()
I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.
The code below is derived from the curve_fit documentation. The code provided is an arbitrary set of data for test purposes but displays the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
#def func(x, a, b, c):
# return a*np.exp(-b*x) + c
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))
y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])
plt.plot(x,yn)
plt.plot(x,y_fit)
The issue seems to pop up when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even those identical to my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it no longer fits the data.
Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn) ,0,0,1))
Changing this single line in your code, with sigma=10 and mu=50, produces a fit that follows the data (the plot from the original post is not included here).
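A slightly more explicit version of the same idea, as a sketch only. The helper initial_guess is hypothetical and its heuristics (peak position, weighted spread, area) are my own assumptions rather than anything from the SciPy docs. Note that it uses the x value at the peak, whereas np.argmax alone returns an index, which only roughly matches x on this grid:

```python
import numpy as np
import scipy.special as sp
from scipy.optimize import curve_fit

def func(x, sigmag, mu, alpha, c, a):
    # skewed, shifted Gaussian from the question
    normpdf = (1 / (sigmag * np.sqrt(2 * np.pi))) * np.exp(-(x - mu)**2 / (2 * sigmag**2))
    normcdf = 0.5 * (1 + sp.erf(alpha * (x - mu) / (sigmag * np.sqrt(2))))
    return 2 * a * normpdf * normcdf + c

def initial_guess(x, y):
    """Hypothetical helper: rough (sigmag, mu, alpha, c, a) read off the data."""
    c0 = np.min(y)                                   # baseline offset
    w = np.clip(y - c0, 0, None)                     # non-negative weights
    mu0 = x[np.argmax(y)]                            # x position of the peak
    sigma0 = np.sqrt(np.sum(w * (x - mu0)**2) / np.sum(w))  # weighted spread
    a0 = np.sum(w) * (x[1] - x[0])                   # area, assuming a uniform grid
    return sigma0, mu0, 0.0, c0, a0

rng = np.random.default_rng(0)
x = np.linspace(0, 100, 100)
yn = func(x, 10, 50, 0, 0, 1) + 0.001 * rng.normal(size=x.size)

p0 = initial_guess(x, yn)
try:
    popt, pcov = curve_fit(func, x, yn, p0=p0)
except RuntimeError:
    popt = None   # the fit did not converge from this guess
print("guess:", np.round(p0, 3))
print("fit:  ", popt)
```

The guess only needs to be in the right neighbourhood; the noise floor inflates the width estimate somewhat, which is usually harmless for a starting point.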
You can call curve_fit many times with random initial guesses, and keep the parameters that give the smallest error.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
results = []
for i in range(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except (RuntimeError, ValueError):
        # this starting point did not lead to a fit; try the next one
        continue
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break
err, popt = min(results, key=lambda x:x[0])
y_fit= func(x, *popt)
plt.plot(x,yn)
plt.plot(x,y_fit)
print(len(results))
I have a data surface that I'm fitting using SciPy's leastsq function.
I would like to have some estimate of the quality of the fit after leastsq returns. I'd expected that this would be included as a return from the function, but, if so, it doesn't seem to be clearly documented.
Is there such a return or, barring that, some function I can pass my data and the returned parameter values and fit function to that will give me an estimate of fit quality (R^2 or some such)?
Thanks!
If you call leastsq like this:
import scipy.optimize as optimize
p,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,a_guess,args=(x,y),full_output=True)
where
def residuals(a,x,y):
    return y-f(x,a)
then, using the definition of R^2 given here,
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
What is infodict['fvec'] you ask? It's the array of residuals:
In [48]: optimize.leastsq?
...
infodict -- a dictionary of optional outputs with the keys:
'fvec' : the function evaluated at the output
For example:
import scipy.optimize as optimize
import numpy as np
import collections
import matplotlib.pyplot as plt
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y
def residuals(p,x,y):
    return y - sigmoid(p,x)
Param=collections.namedtuple('Param','x0 y0 c k')
p_guess=Param(x0=600,y0=200,c=100,k=0.01)
p,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,p_guess,args=(x,y),full_output=True)
p=Param(*p)
xp = np.linspace(100, 1600, 1500)
print('''\
x0 = {p.x0}
y0 = {p.y0}
c = {p.c}
k = {p.k}
'''.format(p=p))
pxp=sigmoid(p,xp)
# You could compute the residuals this way:
resid=residuals(p,x,y)
print(resid)
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
# But you don't have to compute `resid` -- `infodict['fvec']` already
# contains the info.
print(infodict['fvec'])
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
print(rsquared)
# 0.996768131959
plt.plot(x, y, '.', xp, pxp, '-')
plt.xlim(100,1000)
plt.ylim(130,270)
plt.xlabel('x')
plt.ylabel('y',rotation='horizontal')
plt.grid(True)
plt.show()
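If you fit with curve_fit instead of leastsq, the same R^2 calculation can be wrapped in a small helper. This is only a sketch: r_squared and the straight-line model line are names I made up, and the data are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def r_squared(y, y_fit):
    # 1 - SS_err / SS_tot, the same definition as used above
    ss_err = np.sum((y - y_fit)**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    return 1.0 - ss_err / ss_tot

def line(x, a, b):
    return a + b * x

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = line(x, 1.0, 2.0) + rng.normal(scale=0.5, size=x.size)

popt, pcov = curve_fit(line, x, y)
r2 = r_squared(y, line(x, *popt))
print(popt, r2)
```

The helper only needs the observed values and the fitted values, so it works with whatever routine produced the parameters.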