Maximum Likelihood estimation of GARCH(1,1) with gamma distribution - python

I am trying to fit a GARCH(1,1) model to a dataset with Gamma(a, 1/a) distribution, using maximum likelihood estimation.
The log-likelihood function I'm using is:
LogL = -ln(Γ(nu)) + nu*ln(nu) + (nu - 1)*ln(x) - nu*(x/mu) - nu*ln(mu)
where x is the data, mu follows the GARCH(1,1) recursion, and nu is the shape parameter of the gamma distribution.
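(A quick way to sanity-check this parameterization, not part of the original question: the density above is that of a gamma distribution with shape nu and scale mu/nu, so the hand-written log-density should match scipy.stats.gamma.logpdf term for term. The test values below are arbitrary.)

import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma as gamma_dist

nu, mu, x = 4.0, 2.5, 1.3   # arbitrary test values

by_hand = (-gammaln(nu) + nu * np.log(nu) - nu * np.log(mu)
           + (nu - 1) * np.log(x) - nu * x / mu)
by_scipy = gamma_dist.logpdf(x, a=nu, scale=mu / nu)
print(by_hand, by_scipy)   # the two numbers should agree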
I am trying to estimate nu and the GARCH(1,1) parameters (omega, alpha, beta) simultaneously. The code I wrote is:
import numpy as np
import pandas as pd
import scipy.optimize as opt
from scipy.special import gamma

df_aex = pd.read_excel('/Users/jr/Desktop/Data_complete.xlsx', sheet_name=0)
rv = df_aex["rv5"]
rv = np.array(rv*100)

# GARCH(1,1) filter for the conditional mean mu_t
def garch_filter(omega, alpha1, beta, rv):
    irv = len(rv)
    mu_t = np.zeros(irv)
    for i in range(irv):
        if i == 0:
            mu_t[i] = omega/(1 - alpha1 - beta)
        else:
            mu_t[i] = omega + (alpha1 * rv[i - 1]) + (beta * mu_t[i - 1])
    return mu_t
# Negative log-likelihood of the gamma-GARCH model (this is the objective to minimize)
def gamma_loglike(vP, rv):
    omega = vP[0]
    alpha1 = vP[1]
    beta = vP[2]
    nu = vP[3]
    mu_t = garch_filter(omega, alpha1, beta, rv)
    # Sum of the per-observation gamma log-densities, negated for minimization
    LogL = - np.sum(- np.log(gamma(nu)) + (nu * np.log(nu)) + (nu - 1) * np.log(rv)
                    - nu * np.divide(rv, mu_t) - nu * np.log(mu_t))
    return LogL
# Constraints: alpha1 + beta < 1 (stationarity); all parameters non-negative
def constraint1(vP):
    return 1 - (vP[1] + vP[2])

cons1 = ({'type': 'ineq', 'fun': lambda x: np.array(x)})   # all parameters >= 0
cons2 = ({'type': 'ineq', 'fun': constraint1})
cons = [cons1, cons2]

# Initial guesses
vP0 = np.array([0.009, 0.10, 0.85, 0.5])

# Maximization of the likelihood (i.e. minimization of the negative log-likelihood)
res = opt.minimize(gamma_loglike, vP0, args=(rv,),
                   bounds=((0.0001, None), (0.0001, 0.999), (0.0001, 0.999), (0.0001, None)),
                   constraints=cons, options={'disp': True})
o_est = res.x[0]
a_est = res.x[1]
b_est = res.x[2]
nu_est = res.x[3]
print([o_est, a_est, b_est, nu_est])
I'm expecting output like [0.01, 0.05, 0.7, 4], but my first value (omega) comes out around 40, which is way too high.
Could someone help me with this problem?

My likelihood function was not quite right; I have fixed it now.
The code above can now be used as a maximum likelihood estimator for the GARCH(1,1) parameters and the shape parameter of the fitted gamma distribution.
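As a rough self-check (my own sketch, not part of the original question), one can simulate data from the gamma-GARCH model with known parameters and verify that the estimator above approximately recovers them. This assumes garch_filter and gamma_loglike from the code above are already defined (with the Excel read replaced by the simulated series):

import numpy as np
import scipy.optimize as opt

rng = np.random.default_rng(0)
omega_true, alpha_true, beta_true, nu_true = 0.01, 0.05, 0.7, 4.0

n = 5000
x_sim = np.zeros(n)
mu_sim = np.zeros(n)
mu_sim[0] = omega_true / (1 - alpha_true - beta_true)          # unconditional mean, as in garch_filter
x_sim[0] = rng.gamma(shape=nu_true, scale=mu_sim[0] / nu_true)
for t in range(1, n):
    mu_sim[t] = omega_true + alpha_true * x_sim[t - 1] + beta_true * mu_sim[t - 1]
    x_sim[t] = rng.gamma(shape=nu_true, scale=mu_sim[t] / nu_true)

res_sim = opt.minimize(gamma_loglike, np.array([0.009, 0.10, 0.85, 0.5]), args=(x_sim,),
                       bounds=((1e-4, None), (1e-4, 0.999), (1e-4, 0.999), (1e-4, None)),
                       constraints=[{'type': 'ineq', 'fun': lambda vP: 1 - (vP[1] + vP[2])}])
print(res_sim.x)   # should land reasonably close to [0.01, 0.05, 0.7, 4.0]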

Related

Integrating and fitting coupled ODEs for SIR modelling

In this case, there are 3 ODEs that describe an SIR model. The issue is that I want to calculate which beta and gamma values best fit the data points given by the x_axis and y_axis values. The method I'm currently using is to integrate the ODEs using odeint from the scipy library, together with the curve_fit method from the same library. In this case, how would you calculate the values for beta and gamma to fit the data points?
P.S. The current error is this: ValueError: operands could not be broadcast together with shapes (3,) (14,)
# initial values
S_I_R = (0.762/763, 1/763, 0)
x_axis = [m for m in range(1,15)]
y_axis = [3,8,28,75,221,291,255,235,190,125,70,28,12,5]

# ODE's that describe the system
def equation(SIR_Values, t, beta, gamma):
    Array = np.zeros((3))
    SIR = SIR_Values
    Array[0] = -beta * SIR[0] * SIR[1]
    Array[1] = beta * SIR[0] * SIR[1] - gamma * SIR[1]
    Array[2] = gamma * SIR[1]
    return Array

# Results = spi.odeint(equation,S_I_R,time)
# fitting the values
beta_values, gamma_values = curve_fit(equation, x_axis, y_axis)
import numpy as np
import scipy.integrate as spi
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Starting values
S0 = 762/763
I0 = 1/763
R0 = 0
x_axis = np.array([m for m in range(0,15)])
y_axis = np.array([1,3,8,28,75,221,291,255,235,190,125,70,28,12,5])
y_axis = np.divide(y_axis,763)

def sir_model(y, x, beta, gamma):
    S = -beta * y[0] * y[1]
    R = gamma * y[1]
    I = beta * y[0] * y[1] - gamma * y[1]
    return S, I, R

def fit_odeint(x, beta, gamma):
    return spi.odeint(sir_model, (S0, I0, R0), x, args=(beta, gamma))[:,1]

popt, pcov = curve_fit(fit_odeint, x_axis, y_axis)
beta, gamma = popt
fitted = fit_odeint(x_axis, *popt)

plt.plot(x_axis, y_axis, 'o', label="infected per day")
plt.plot(x_axis, fitted, label="fitted graph")
plt.xlabel("Time (in days)")
plt.ylabel("Fraction of infected")
plt.title("Fitted beta and gamma values")
plt.legend()
plt.show()
As in this example from the scipy documentation, the function passed to curve_fit must output an array with the same size as x_axis and y_axis.
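As a small addition (my own, not from the original answer): once popt and pcov are available from curve_fit, the fitted beta and gamma and their approximate standard errors can be reported like this:

import numpy as np

perr = np.sqrt(np.diag(pcov))   # one-standard-deviation errors from the covariance matrix
print("beta  = %.3f +/- %.3f" % (popt[0], perr[0]))
print("gamma = %.3f +/- %.3f" % (popt[1], perr[1]))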

How to solve / fit a geometric Brownian motion process in Python?

For example, the code below simulates a Geometric Brownian Motion (GBM) process, which satisfies the stochastic differential equation dS_t = mu * S_t dt + sigma * S_t dB_t.
The code is a condensed version of the code in this Wikipedia article.
import numpy as np
np.random.seed(1)

def gbm(mu=1, sigma=0.6, x0=100, n=50, dt=0.1):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()

series = gbm()
How to fit the GBM process in Python? That is, how to estimate mu and sigma and solve the stochastic differential equation given the timeseries series?
Parameter estimation for SDEs is a research level area, and thus rather non-trivial. Whole books exist on the topic. Feel free to look into those for more details.
But here's a trivial approach for this case. Firstly, note that the log of GBM is an affinely transformed Wiener process (i.e. a linear Ito drift-diffusion process), so by Ito's lemma
d ln(S_t) = (mu - sigma^2 / 2) dt + sigma dB_t
Thus we can estimate the log-process parameters and translate them to fit the original process. Check out [1], [2], [3], [4], for example.
Here's a script that does this in two simple ways for the drift (just wanted to see the difference), and just one for the diffusion (sorry). The drift of the log-process is estimated by (X_T - X_0) / T and via the incremental MLE (see code). The diffusion parameter is estimated (in a biased way) with its definition as the infinitesimal variance.
import numpy as np
np.random.seed(9713)

# Parameters
mu = 1.5
sigma = 0.9
x0 = 1.0
n = 1000
dt = 0.05

# Times
T = dt*n
ts = np.linspace(dt, T, n)

# Geometric Brownian motion generator
def gbm(mu, sigma, x0, n, dt):
    step = np.exp((mu - sigma**2 / 2) * dt) * np.exp(sigma * np.random.normal(0, np.sqrt(dt), (1, n)))
    return x0 * step.cumprod()

# Estimate mu just from the series end-points
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def simple_estimate_mu(series):
    return (series[-1] - x0) / T

# Use all the increments combined (maximum likelihood estimator)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
def incremental_estimate_mu(series):
    total = (1.0 / dt) * (ts**2).sum()
    return (1.0 / total) * (1.0 / dt) * (ts * series).sum()

# This just estimates the sigma by its definition as the infinitesimal variance (simple Monte Carlo)
# Note this is for a linear drift-diffusion process, i.e. the log of GBM
# One can do better than this of course (MLE?)
def estimate_sigma(series):
    return np.sqrt((np.diff(series)**2).sum() / (n * dt))

# Estimator helper
all_estimates0 = lambda s: (simple_estimate_mu(s), incremental_estimate_mu(s), estimate_sigma(s))

# Since log-GBM is a linear Ito drift-diffusion process (scaled Wiener process with drift), we
# take the log of the realizations, compute mu and sigma, and then translate the mu and sigma
# to that of the GBM (instead of the log-GBM). (For sigma, nothing is required in this simple case.)
def gbm_drift(log_mu, log_sigma):
    return log_mu + 0.5 * log_sigma**2

# Translates all the estimates from the log-series
def all_estimates(es):
    lmu1, lmu2, sigma = all_estimates0(es)
    return gbm_drift(lmu1, sigma), gbm_drift(lmu2, sigma), sigma

print('Real Mu:', mu)
print('Real Sigma:', sigma)

### Using one series ###
series = gbm(mu, sigma, x0, n, dt)
log_series = np.log(series)
print('Using 1 series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % all_estimates(log_series))

### Using K series ###
K = 10000
s = [np.log(gbm(mu, sigma, x0, n, dt)) for i in range(K)]
e = np.array([all_estimates(si) for si in s])
avgs = np.mean(e, axis=0)
print('Using %d series: mu1 = %.2f, mu2 = %.2f, sigma = %.2f' % (K, avgs[0], avgs[1], avgs[2]))
The output:
Real Mu: 1.5
Real Sigma: 0.9
Using 1 series: mu1 = 1.56, mu2 = 1.54, sigma = 0.96
Using 10000 series: mu1 = 1.51, mu2 = 1.53, sigma = 0.93
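For what it's worth, here is a closed-form variant of the same idea (my own sketch, not part of the answer above): the log-returns of a GBM over a fixed step dt are i.i.d. normal with mean (mu - sigma^2/2)*dt and variance sigma^2*dt, so their sample mean and (slightly biased) sample variance give the estimates directly. It reuses the gbm generator and the parameters defined above.

import numpy as np

def gbm_mle_from_path(series, dt):
    r = np.diff(np.log(series))                   # log-returns of the path
    sigma_hat = np.sqrt(r.var() / dt)             # diffusion from the return variance
    mu_hat = r.mean() / dt + 0.5 * sigma_hat**2   # translate the log-drift back to the GBM drift
    return mu_hat, sigma_hat

path = gbm(mu, sigma, x0, n, dt)                  # reuses the generator and parameters above
print('Closed form: mu = %.2f, sigma = %.2f' % gbm_mle_from_path(path, dt))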

Solving set of ODEs with Scipy

I am trying to develop an algorithm (using scipy.integrate.odeint()) that predicts the changing concentration of cells, substrate and product (i.e., 𝑋, 𝑆, 𝑃) over time until the system reaches steady state (~100 or 200 hours). The initial concentration of cells in the bioreactor is 0.1 𝑔/𝐿, and there is no glucose or product in the reactor initially. I want to test the algorithm for a range of different flow rates, 𝑄, between 0.01 𝐿/ℎ and 0.25 𝐿/ℎ, and analyze the impact of the flow rate on product production (i.e., 𝑄 ⋅ 𝑃 in 𝑔/ℎ). Eventually, I would like to generate a plot that shows the product production rate (y-axis) versus the flow rate, 𝑄 (x-axis). My goal is to estimate the flow rate that results in the maximum (or critical) production rate. This is my code so far:
from scipy.integrate import odeint
import numpy as np

# Constants
u_max = 0.65
K_s = 0.14
K_1 = 0.48
V = 2
X_in = 0
S_in = 4
Y_s = 0.38
Y_p = 0.2

# Variables
# Q - Flow Rate (L/h), value between 0.01 and 0.25 that produces best Q * P
# X - Cell Concentration (g/L)
# S - The glucose concentration (g/L)
# P - Product Concentration (g/L)

# Equations
def func_dX_dt(X, t, S):
    u = (u_max) / (1 + (K_s / S))
    dX_dt = (((Q * S_in) - (Q * S)) / V) + (u * X)
    return dX_dt

def func_dS_dt(S, t, X):
    u = (u_max) / (1 + (K_s / S))
    dS_dt = (((Q * S_in) - (Q * S)) / V) - (u * (X / Y_s))
    return dS_dt

def func_dP_dt(P, t, X, S):
    u = (u_max) / (1 + (K_s / S))
    dP_dt = ((-Q * P) / V) - (u * (X / Y_p))
    return dP_dt

t = np.linspace(0, 200, 200)

# Q placeholder
Q = 0.01

# Attempt to solve the Ordinary differential equations
sol_dX_dt = odeint(func_dX_dt, 0.1, t, args=(S,))
sol_dS_dt = odeint(func_dS_dt, 0.1, t, args=(X,))
sol_dP_dt = odeint(func_dP_dt, 0.1, t, args=(X,S))
In the program's current state there does not seem to be a way to generate the steady-state value for P. I attempted the following modification to get the value of X:
sol_dX_dt = odeint(func_dX_dt, 0.1, t, args=(odeint(func_dS_dt, 0.1, t, args=(X,)),))
It produces the error:
NameError: name 'X' is not defined
At this point I am not sure how to move forward.
(Edit 1: Added Original Equations)
You do not have to solve each equation separately; return a tuple of all three derivatives from a single function, as I show below:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

Q = 0.01
V = 2
Ys = 0.38
Sin = 4
Yp = 0.2
Xin = 0
umax = 0.65
Ks = 0.14
K1 = 0.48

def mu(S, umax, Ks, K1):
    return umax/((1+Ks/S)*(1+S/K1))

def dxdt(x, t, *args):
    X, S, P = x
    Q, V, Xin, Ys, Sin, Yp, umax, Ks, K1 = args
    m = mu(S, umax, Ks, K1)
    dXdt = (Q*Xin - Q*X)/V + m*X
    dSdt = (Q*Sin - Q*S)/V - m*X/Ys
    dPdt = -Q*P/V - m*X/Yp
    return dXdt, dSdt, dPdt

t = np.linspace(0, 200, 200)
X0 = 0.1
S0 = 0.1
P0 = 0.1
x0 = X0, S0, P0

sol = odeint(dxdt, x0, t, args=(Q, V, Xin, Ys, Sin, Yp, umax, Ks, K1))

plt.plot(t, sol[:, 0], 'r', label='X(t)')
plt.plot(t, sol[:, 1], 'g', label='S(t)')
plt.plot(t, sol[:, 2], 'b', label='P(t)')
plt.legend(loc='best')
plt.xlabel('t')
plt.grid()
plt.show()
Output: a plot of X(t), S(t) and P(t) against t.
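To get at the original goal, the production rate Q*P as a function of the flow rate Q, one possible extension (my own sketch, built on the dxdt function and constants above) is to sweep Q over the requested range, take P at the end of the 200 h horizon as the steady-state value, and plot Q*P against Q:

Qs = np.linspace(0.01, 0.25, 50)
production = []
for Qi in Qs:
    sol_Q = odeint(dxdt, x0, t, args=(Qi, V, Xin, Ys, Sin, Yp, umax, Ks, K1))
    P_end = sol_Q[-1, 2]                 # product concentration at the end of the horizon
    production.append(Qi * P_end)        # Q*P in g/h
production = np.array(production)
Q_best = Qs[np.argmax(production)]

plt.plot(Qs, production)
plt.xlabel('Q (L/h)')
plt.ylabel('Q*P (g/h)')
plt.title('Production rate vs flow rate (max near Q = %.3f L/h)' % Q_best)
plt.show()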

Gradient Descent in python implementation issue

Hey, I am trying to understand this algorithm for a linear hypothesis. I can't figure out whether my implementation is correct or not. I think it is not correct, but I can't figure out what I am missing.
theta0 = 1
theta1 = 1
alpha = 0.01
for i in range(0, le*10):
    for j in range(0, le):
        temp0 = theta0 - alpha * (theta1 * x[j] + theta0 - y[j])
        temp1 = theta1 - alpha * (theta1 * x[j] + theta0 - y[j]) * x[j]
        theta0 = temp0
        theta1 = temp1
print("Values of slope and y intercept derived using gradient descent ", theta1, theta0)
It gives me the correct answer to four decimal places, but when I compare it to other programs on the net I get confused.
Thanks in advance!
Implementation of the Gradient Descent algorithm:
import numpy as np

cur_x = 1          # Initial value
gamma = 1e-2       # step size multiplier
precision = 1e-10
prev_step_size = cur_x

# test function
def foo_func(x):
    y = (np.sin(x) + x**2)**2
    return y

# derivative of the test function; gradient descent steps along the negative gradient
def foo_grad(x):
    return 2 * (np.sin(x) + x**2) * (np.cos(x) + 2*x)

# Iteration loop until a certain error measure
# is smaller than a maximal error
while prev_step_size > precision:
    prev_x = cur_x
    cur_x += -gamma * foo_grad(prev_x)
    prev_step_size = abs(cur_x - prev_x)

print("The local minimum occurs at %f" % cur_x)

how can I do a maximum likelihood regression using scipy.optimize.minimize

How can I do a maximum likelihood regression using scipy.optimize.minimize? I specifically want to use the minimize function here, because I have a complex model and need to add some constraints. I am currently trying a simple example using the following:
import numpy as np
from scipy.optimize import minimize

def lik(parameters):
    m = parameters[0]
    b = parameters[1]
    sigma = parameters[2]
    for i in np.arange(0, len(x)):
        y_exp = m * x + b
    L = sum(np.log(sigma) + 0.5 * np.log(2 * np.pi) + (y - y_exp) ** 2 / (2 * sigma ** 2))
    return L

x = [1,2,3,4,5]
y = [2,3,4,5,6]

lik_model = minimize(lik, np.array([1,1,1]), method='L-BFGS-B', options={'disp': True})
When I run this, convergence fails. Does anyone know what is wrong with my code?
The message I get when running it is 'ABNORMAL_TERMINATION_IN_LNSRCH'. I am using the same algorithm that works for me with optim in R.
Thank you, Aleksander. You were correct that my likelihood function was wrong, not the code. Using a formula I found on Wikipedia, I adjusted the code to:
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

def lik(parameters):
    m = parameters[0]
    b = parameters[1]
    sigma = parameters[2]
    for i in np.arange(0, len(x)):
        y_exp = m * x + b
    L = (len(x)/2 * np.log(2 * np.pi) + len(x)/2 * np.log(sigma ** 2) + 1 /
         (2 * sigma ** 2) * sum((y - y_exp) ** 2))
    return L

x = np.array([1,2,3,4,5])
y = np.array([2,5,8,11,14])

lik_model = minimize(lik, np.array([1,1,1]), method='L-BFGS-B')

plt.scatter(x, y)
plt.plot(x, lik_model['x'][0] * x + lik_model['x'][1])
plt.show()
Now it seems to be working.
Thanks for the help!
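Since the stated reason for using minimize was the ability to add constraints, here is a small sketch (my own, with slightly noisy made-up data, an arbitrary illustrative constraint m >= 0, and a positivity bound on sigma) of how constraints and bounds can be attached to the same fit using the SLSQP method:

import numpy as np
from scipy.optimize import minimize

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])   # illustrative data, roughly y = 3x - 1 plus noise

def neg_loglik(params):
    m, b, sigma = params
    resid = y - (m * x + b)
    n = len(x)
    return n / 2 * np.log(2 * np.pi * sigma ** 2) + np.sum(resid ** 2) / (2 * sigma ** 2)

cons = ({'type': 'ineq', 'fun': lambda p: p[0]},)   # illustrative constraint: m >= 0
bnds = ((None, None), (None, None), (1e-6, None))   # keep sigma strictly positive
fit = minimize(neg_loglik, np.array([1.0, 1.0, 1.0]), method='SLSQP',
               bounds=bnds, constraints=cons)
print(fit.x)   # fitted m, b, sigma under the constraint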
