smoothed state disturbances in statsmodels' state space model - python

I am building a custom state-space model with statsmodels as follows
from statsmodels.tsa.statespace import mlemodel
mod = mlemodel.MLEModel(YY, k_states=n_st, k_posdef=n_sh)
mod['design'] = Z
mod['transition'] = T
mod['selection'] = R
mod['state_cov'] = np.eye(n_sh)
mod['obs_intercept'] = d
mod.initialize_stationary()
I am interested in the smoothed states and smoothed state disturbances, which I get with
results = mod.smooth([])
The smoothed states in results.smoothed_state are correct (I have the true values against which I am comparing), but the smoothed state disturbances in results.smoothed_state_disturbance are shifted forward one period: the first column contains the (correct) smoothed disturbances for the second period, and so on, while the last column contains zeros, which are the correct smoothed disturbances for one period after the end of the sample.
My understanding is that this has to do with the timing of the state equation, which according to statsmodels docs here is
alpha(t+1) = T alpha(t) + R eta(t) (1)
and therefore implies that the first observation y_{1} is related to the state alpha_{1}, which in turn depends on the disturbance eta_{0}, and that the smoothed value of that disturbance is not returned by the smoother. On the other hand, in this statsmodels doc, the timing of the state equation is
alpha(t) = T alpha(t-1) + R eta(t) (2)
and implies that the state alpha_{1} depends on eta_{1}, not on eta_{0}. Since both timing conventions (future (1) and contemporaneous (2)) appear in the statsmodels docs, I thought that it would be possible to choose which one to use. Unfortunately, I haven't been able to find out how. I tried changing the smoother timing with results = mod.smooth([], filter_timing=1), which according to the docs uses the Kim and Nelson (1999) (contemporaneous) timing rather than the default Durbin and Koopman (2012) (future) timing. But then I get totally different (and wrong, because I know what the true values are) results, not only from the smoother but also for the value of the log-likelihood. I also looked for examples in the unit tests for smoothing, but there are only tests against MATLAB and R libraries that also use the future timing; there are no tests (for disturbance smoothing) against STATA, which uses the alternative contemporaneous timing.
My question is: is there a way either to write the state equation with the contemporaneous timing ((2) above), or to recover the smoothed state disturbance associated with the observed data in the first period?
Here is some code for the following AR(1) model with a measurement error, using contemporaneous timing for the state equation, initialized with the stationary distribution.
alpha(0) ~ N(0, 1/(1-.5**2))
alpha(t) = .5 alpha(t-1) + eta(t), eta(t) ~ N(0, 1)
y(t) = alpha(t) + e(t), e(t) ~ N(0, 1)
from statsmodels.tsa.statespace import mlemodel
import numpy as np
import sys
from scipy.stats import multivariate_normal
from numpy.random import default_rng
gen = default_rng(42)
T = np.array([.5])
Z = np.array([1.])
Q = np.array([1.])
H = np.array([1.])
R = np.array([1.])
P0 = 1/(1-T**2)
# Simulate data for 2 periods
alpha0 = gen.normal(0, np.sqrt(P0))
eta1 = gen.normal(0, 1)
e1 = gen.normal(0, 1)
eta2 = gen.normal(0, 1)
e2 = gen.normal(0, 1)
alpha1 = .5*alpha0 + eta1
y1 = alpha1 + e1
alpha2 = .5*alpha1 + eta2
y2 = alpha2 + e2
First, use statsmodels.statespace to compute the smoothed state, smoothed state disturbance, and log-likelihood given just the first data point:
mod1 = mlemodel.MLEModel(y1, k_states=1, k_posdef=1)
mod1['design'] = Z
mod1['transition'] = T
mod1['selection'] = R
mod1['state_cov'] = Q
mod1['obs_cov'] = H
mod1.initialize_stationary()
results1 = mod1.smooth([])
results1.smoothed_state, results1.smoothed_state_disturbance, results1.llf
gives
(array([[-0.06491681]]), array([[0.]]), -1.3453530272821392)
Note that having observed y(1) we can compute the conditional expectation of eta(1); however, what is returned here is only the conditional expectation of eta(2). Since the model is stationary and Gaussian, the conditional expectations of alpha(1) and eta(1) given y(1) can be computed from their joint distribution (see here for the relevant formulae), as shown in the following code:
# Define a matrix L1 which maps [alpha(0), eta(1), e(1)] into [alpha0, eta1, e1, alpha1, y1]
L1 = np.vstack((np.eye(3),        # alpha(0), eta(1), e(1)
                np.r_[T, 1, 0],   # alpha(1)
                np.r_[T, 1, 1],   # y(1)
                ))
# check
np.testing.assert_array_equal(np.r_[alpha0, eta1, e1, alpha1, y1],
                              L1 @ np.r_[alpha0, eta1, e1])
# Compute Sigma1 as the covariance matrix of [alpha0, eta1, e1, alpha1, y1]
D1 = np.eye(3)
D1[0, 0] = P0
Sigma1 = L1 @ D1 @ L1.T
# [alpha0, eta1, e1, alpha1, y1] has a multivariate normal distribution, so we can apply
# well-known formulae to compute conditional expectations and the log-likelihood
ind_eta1 = 1
ind_e1 = 2
ind_alpha1 = 3
ind_y1 = 4
smooth_eta1 = (Sigma1[ind_eta1, ind_y1]/Sigma1[ind_y1, ind_y1])*y1
smooth_alpha1 = (Sigma1[ind_alpha1, ind_y1]/Sigma1[ind_y1, ind_y1])*y1
loglik1 = multivariate_normal.logpdf(y1, cov=Sigma1[ind_y1, ind_y1])
smooth_alpha1, smooth_eta1, loglik1
which gives
(array([-0.06491681]), array([-0.04868761]), -1.3453530272821392)
Extending to the first 2 periods, with statsmodels
y = np.array([y1, y2])
mod2 = mlemodel.MLEModel(y, k_states=1, k_posdef=1)
mod2.ssm.timing_init_filtered = True
mod2['design'] = Z
mod2['transition'] = T
mod2['selection'] = R
mod2['state_cov'] = Q
mod2['obs_cov'] = H
mod2.initialize_stationary()
results2 = mod2.smooth([])
results2.smoothed_state, results2.smoothed_state_disturbance, results2.llf
gives
(array([[-0.25292213, -0.78447967]]),
array([[-0.65801861, 0. ]]),
-3.1092778246103645)
And computing the conditional expectations from the joint distribution
# L2 maps [alpha(0), eta(1), e(1), eta(2), e(2)] into [alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2]
L2 = np.vstack((np.eye(5),                # alpha(0), eta(1), e(1), eta(2), e(2)
                np.r_[T, 1, 0, 0, 0],     # alpha(1)
                np.r_[T**2, T, 0, 1, 0],  # alpha(2)
                np.r_[T, 1, 1, 0, 0],     # y(1)
                np.r_[T**2, T, 0, 1, 1],  # y(2)
                ))
np.testing.assert_array_equal(np.r_[alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2],
                              L2 @ np.r_[alpha0, eta1, e1, eta2, e2])
# Sigma2 is the covariance of [alpha0, eta1, e1, eta2, e2, alpha1, alpha2, y1, y2]
D2 = np.eye(5)
D2[0, 0] = P0
Sigma2 = L2 @ D2 @ L2.T
ind_e = [2, 4]
ind_eta = [1, 3]
ind_alpha = [5, 6]
ind_y = [7, 8]
# compute smoothed disturbances and states, and the log-likelihood
smooth_eta = Sigma2[ind_eta, :][:, ind_y] @ np.linalg.solve(Sigma2[ind_y, :][:, ind_y], y)
smooth_alpha = Sigma2[ind_alpha, :][:, ind_y] @ np.linalg.solve(Sigma2[ind_y, :][:, ind_y], y)
loglik2 = multivariate_normal.logpdf(y.flatten(), cov=Sigma2[ind_y, :][:, ind_y])
smooth_alpha.flatten(), smooth_eta.flatten(), loglik2
gives
(array([-0.25292213, -0.78447967]),
array([-0.1896916 , -0.65801861]),
-3.1092778246103636)
The smoothed states alpha(t) and the log-likelihood values are the same. The smoothed disturbances returned by statsmodels.statespace.mlemodel, however, are those for eta(2) and eta(3).

Short answer
The short answer to your question is that you can do the following to recover the smoothed estimate that you're looking for:
r_0 = results1.smoother_results.scaled_smoothed_estimator_presample
R = mod1['selection']
Q = mod1['state_cov']
eta_hat_0 = Q @ R.T @ r_0
print(eta_hat_0)
This gives [-0.04868761], as you wanted. It comes from the usual disturbance smoother equation (e.g. Durbin and Koopman's 2012 book, equation 4.69), applied at period 0. In statsmodels, we store the smoothed disturbance estimates for the sample period (t = 1, ..., nobs), but we do also store r_0, as used above, so you can compute the value you want directly.
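If you want the whole sequence of smoothed disturbances under your contemporaneous timing, eta(1), ..., eta(nobs), a small convenience step (a sketch for this scalar example, reusing eta_hat_0 and results1 from above) is to prepend the presample estimate and drop the final zero column:
import numpy as np
# prepend eta_hat_0 and drop the last column, which under the default timing
# corresponds to the disturbance one period after the end of the sample
eta_contemp = np.c_[np.ravel(eta_hat_0)[:, None],
                    results1.smoothed_state_disturbance[:, :-1]]
print(eta_contemp)  # [[-0.04868761]] for the one-observation example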
Alternative, maybe easier, method
This is an alternative method to get this output, which we can derive by writing your problem in a different way. Under the timing assumption of statsmodels, as you point out, the distribution of the first state alpha(1) ~ N(a1, P1) is specified as a prior. We can instead view your desired model as specifying the 0th state, alpha(0) ~ N(a0, P0), as the prior and then consider the first state as being produced by the transition equation: alpha(1) = T alpha(0) + R eta(0).
This is now written using the error convention of Statsmodels, and we can use Statsmodels to compute the smoothed results. The only trick is that there is no observation y(0) associated with the first state alpha(0), because we are only including the alpha(0) -> alpha(1) transition step so that we can specify the prior as you wanted. But that is no problem, we can simply include a missing value.
So if we just modify your original model by putting a nan presample value at the start of the input data, we get the result we want:
mod1 = mlemodel.MLEModel(np.r_[np.nan, y1], k_states=1, k_posdef=1)
mod1['design'] = Z
mod1['transition'] = T
mod1['selection'] = R
mod1['state_cov'] = Q
mod1['obs_cov'] = H
mod1.initialize_stationary()
results1 = mod1.smooth([])
print(results1.smoothed_state, results1.smoothed_state_disturbance, results1.llf)
yields:
[[-0.03245841 -0.06491681]] [[-0.04868761 0. ]] -1.3453530272821392
Note on the filter_timing flag:
The filter_timing flag also allows you to change the timing convention for the prior, but not for the states. If you set the prior alpha(0|0) ~ N(a_00, P_00) in the alternative timing (filter_timing=1), then setting alpha(1) ~ N(a_1, P_1) with a_1 = T a_00 and P_1 = T P_00 T' + R Q R' in the default timing (filter_timing=0) will give you exactly the same results.
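For concreteness, here is a sketch of that equivalence for the scalar example above, reusing Z, T, R, Q, H, y1 and the mlemodel import from the question (a_00 and P_00 are hypothetical values for a filtered-timing prior alpha(0|0) ~ N(a_00, P_00); here they are set to the stationary moments, so the smoothed output should reproduce results1):
import numpy as np
# hypothetical filtered-timing prior alpha(0|0) ~ N(a_00, P_00)
a_00 = np.zeros(1)
P_00 = np.atleast_2d(1 / (1 - 0.5**2))
# convert it to the default-timing prior alpha(1) ~ N(a_1, P_1)
a_1 = T * a_00                     # scalar model, so plain products suffice
P_1 = T * P_00 * T + R * Q * R     # T P_00 T' + R Q R'
mod_alt = mlemodel.MLEModel(y1, k_states=1, k_posdef=1)
mod_alt['design'] = Z
mod_alt['transition'] = T
mod_alt['selection'] = R
mod_alt['state_cov'] = Q
mod_alt['obs_cov'] = H
mod_alt.ssm.initialize_known(a_1, np.atleast_2d(P_1))  # default (filter_timing=0) timing
results_alt = mod_alt.smooth([])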

Related

Gradient Descent algorithm for linear regression does not optimize the y-intercept parameter

I'm following Andrew Ng's Coursera course on Machine Learning and I tried to implement the gradient descent algorithm in Python. I'm having trouble with the y-intercept parameter because it doesn't seem to converge to the best value. Here's my code:
# IMPORTS
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Acquiring Data
# Source: https://github.com/mattnedrich/GradientDescentExample
data = pd.read_csv('data.csv')
def cost_function(a, b, x_values, y_values):
    '''
    Calculates the square mean error for a given dataset
    with (x, y) pairs and the model y' = a + bx
    a: y-intercept for the model
    b: slope of the curve
    x_values, y_values: points (x, y) of the dataset
    '''
    data_len = len(x_values)
    total_error = sum([((a + b * x_values[i]) - y_values[i])**2
                       for i in range(data_len)])
    return total_error / (2 * float(data_len))
def a_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'a'
    a, b: values for 'a' and 'b'
    x_values, y_values: points (x, y) of the dataset
    '''
    data_len = len(x_values)
    a_gradient = sum([((a + b * x_values[i]) - y_values[i])
                      for i in range(data_len)])
    return a_gradient / float(data_len)
def b_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'b'
    a, b: values for 'a' and 'b'
    x_values, y_values: points (x, y) of the dataset
    '''
    data_len = len(x_values)
    b_gradient = sum([(((a + b * x_values[i]) - y_values[i]) * x_values[i])
                      for i in range(data_len)])
    return b_gradient / float(data_len)
def gradient_descent_step(a_current, b_current, x_values, y_values, alpha):
    '''
    Take a step in the direction of the minimum of the cost_function using
    the 'a' and 'b' gradients. Return new values for 'a' and 'b'.
    a_current, b_current: the current values for 'a' and 'b'
    x_values, y_values: points (x, y) of the dataset
    '''
    new_a = a_current - alpha * a_gradient(a_current, b_current, x_values, y_values)
    new_b = b_current - alpha * b_gradient(a_current, b_current, x_values, y_values)
    return (new_a, new_b)
def run_gradient_descent(a, b, x_values, y_values, alpha, precision, plot=False, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a, b) until
    the value of the cost function varies less than 'precision'.
    a, b: initial values for the parameters a and b in the cost_function
    x_values, y_values: points (x, y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''
    iterations = 0
    delta_cost = cost_function(a, b, x_values, y_values)
    error_list = [delta_cost]
    iteration_list = [0]
    # The loop runs until delta_cost reaches the precision defined.
    # When the variation in the cost_function is small it means that
    # the function is near its minimum and the parameters 'a' and 'b'
    # are a good guess for modeling the dataset.
    while delta_cost > precision:
        iterations += 1
        iteration_list.append(iterations)
        # Calculates the initial error with current a, b values
        prev_cost = cost_function(a, b, x_values, y_values)
        # Calculates new values for a and b
        a, b = gradient_descent_step(a, b, x_values, y_values, alpha)
        # Updates the value of the error
        actual_cost = cost_function(a, b, x_values, y_values)
        error_list.append(actual_cost)
        # Calculates the difference between previous and actual error values
        delta_cost = prev_cost - actual_cost
    # Plot the error in each iteration to see how it decreases
    # and some information about our final results
    if plot:
        plt.plot(iteration_list, error_list, '-')
        plt.title('Error Minimization')
        plt.xlabel('Iteration', fontsize=12)
        plt.ylabel('Error', fontsize=12)
        plt.show()
    if verbose:
        print('Iterations = ' + str(iterations))
        print('Cost Function Value = ' + str(cost_function(a, b, x_values, y_values)))
        print('a = ' + str(a) + ' and b = ' + str(b))
    return (actual_cost, a, b)
When I run the algorithm with:
run_gradient_descent(0, 0, data['x'], data['y'], 0.0001, 0.01)
I get (a = 0.0496688656535 and b = 1.47825808018)
But the best value for 'a' is around 7.9 (I tried other resources for linear regression).
Also, if I change the initial guess for the parameter 'a', the algorithm simply tries to adjust the parameter 'b'.
For example, if I set a = 200 and b = 0
run_gradient_descent(200, 0, data['x'], data['y'], 0.0001, 0.01)
I get (a = 199.933763331 and b = -2.44824996193)
I couldn't find anything wrong with the code, and I realized that the problem is the initial guess for the a parameter. See my own answer below, where I defined a helper function to get a search range for the initial guess of a.
Gradient descent is not guaranteed to find the global optimum. Your chances of finding the global optimum depend on your starting values. To get the true values of the parameters, I first solved the least squares problem, which guarantees the global minimum.
data = pd.read_csv('data.csv', header=None)
x,y = data[0],data[1]
from scipy.stats import linregress
linregress(x,y)
This results in following statistics:
LinregressResult(slope=1.32243102275536, intercept=7.9910209822703848, rvalue=0.77372849988782377, pvalue=3.855655536990139e-21, stderr=0.109377979589804)
Thus b = 1.32243102275536 and a = 7.9910209822703848. Given this, using your code I solved the problem a couple of times using randomized starting values a and b:
a,b = np.random.rand()*10,np.random.rand()*10
print("Initial values of parameters: ")
print("a=%f\tb=%f" % (a,b))
run_gradient_descent(a, b,x,y,1e-4,1e-2)
Here is the solution that I got:
Initial values of parameters:
a=6.100305 b=2.606448
Iterations = 21
Cost Function Value = 55.2093808263
a = 6.07601889437 and b = 1.36310312751
Therefore, it seems the reason you cannot get close to the minimum is the choice of your initial parameter values. You will see this yourself if you put the a and b obtained from least squares into your gradient descent algorithm: it will iterate only once and stay where it is.
At some point delta_cost > precision becomes False and the loop stops there, treating the result as an optimum. If you decrease your precision and run it long enough, you might be able to find the global optimum.
The complete code for my Gradient Descent implementation could be found on my Github repository:
Gradient Descent for Linear Regression
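As a side note, the per-point Python loops in the gradient functions can be vectorized with numpy; here is a minimal sketch of an equivalent update step (assuming x and y are numpy arrays, e.g. data['x'].values and data['y'].values):
import numpy as np

def gradient_descent_step_vec(a, b, x, y, alpha):
    # residuals of the current model y' = a + b*x
    residuals = (a + b * x) - y
    grad_a = residuals.mean()          # same value as a_gradient(a, b, x, y)
    grad_b = (residuals * x).mean()    # same value as b_gradient(a, b, x, y)
    return a - alpha * grad_a, b - alpha * grad_b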
Thinking about what @relay said, that the gradient descent algorithm does not guarantee finding the global minimum, I tried to come up with a helper function to limit guesses for the parameter a to a certain search range, as follows:
def search_range(x, y, plot=False):
    '''
    Given a dataset with points (x, y) searches for a best guess for
    initial values of 'a'.
    '''
    data_lenght = len(x)              # Total size of the dataset
    q_lenght = int(data_lenght / 4)   # Size of a quartile of the dataset
    # Finding the max and min value for y in the first quartile
    min_Q1 = (x[0], y[0])
    max_Q1 = (x[0], y[0])
    for i in range(q_lenght):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q1[1]:
            min_Q1 = temp_point
        if temp_point[1] > max_Q1[1]:
            max_Q1 = temp_point
    # Finding the max and min value for y in the 4th quartile
    min_Q4 = (x[data_lenght - 1], y[data_lenght - 1])
    max_Q4 = (x[data_lenght - 1], y[data_lenght - 1])
    for i in range(data_lenght - 1, data_lenght - q_lenght, -1):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q4[1]:
            min_Q4 = temp_point
        if temp_point[1] > max_Q4[1]:
            max_Q4 = temp_point
    mean_Q4 = (((min_Q4[0] + max_Q4[0]) / 2), ((min_Q4[1] + max_Q4[1]) / 2))
    # Finding max_y and min_y given the points found above.
    # Two lines need to be defined, L1 and L2.
    # L1 will pass through min_Q1 and mean_Q4
    # L2 will pass through max_Q1 and mean_Q4
    # Calculating slopes for L1 and L2 given m = Delta(y) / Delta(x)
    slope_L1 = (min_Q1[1] - mean_Q4[1]) / (min_Q1[0] - mean_Q4[0])
    slope_L2 = (max_Q1[1] - mean_Q4[1]) / (max_Q1[0] - mean_Q4[0])
    # Calculating y-intercepts for L1 and L2 given the line equation y = mx + b
    # Float numbers are converted to int because they will be used as a range for iteration
    y_L1 = int(min_Q1[1] - min_Q1[0] * slope_L1)
    y_L2 = int(max_Q1[1] - max_Q1[0] * slope_L2)
    # Plotting L1 and L2
    if plot:
        L1 = [(y_L1 + slope_L1 * xi) for xi in x]
        L2 = [(y_L2 + slope_L2 * xi) for xi in x]
        plt.plot(x, y, '.')
        plt.plot(x, L1, '-', color='r')
        plt.plot(x, L2, '-', color='r')
        plt.title('Scatterplot of Sample Data')
        plt.xlabel('x', fontsize=12)
        plt.ylabel('y', fontsize=12)
        plt.show()
    return y_L1, y_L2
The idea is to run the gradient descent with guesses for a within the range given by the search_range() function and keep the minimum value found for the cost_function(). The new way to run the gradient descent becomes:
def run_search_gradient_descent(x_values, y_values, alpha, precision, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a, b) until
    the value of the cost function varies less than 'precision'.
    x_values, y_values: points (x, y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''
    from math import inf
    a1, a2 = search_range(x_values, y_values)
    best_guess = [inf, 0, 0]
    for a in range(a1, a2):
        cost, linear_coef, slope = run_gradient_descent(a, 0, x_values, y_values, alpha, precision)
        # Saving value for cost_function and parameters (a, b)
        if cost < best_guess[0]:
            best_guess = [cost, linear_coef, slope]
    if verbose:
        print('Cost Function = ' + str(best_guess[0]))
        print('a = ' + str(best_guess[1]) + ' and b = ' + str(best_guess[2]))
    return (best_guess[0], best_guess[1], best_guess[2])
Running the code
run_search_gradient_descent(data['x'], data['y'], 0.0001, 0.001, verbose=True)
I've got:
Cost Function = 55.1294483959
a = 8.02595996606 and b = 1.3209768383
For comparison, using the linear regression from scipy.stats it returned
a = 7.99102098227 and b = 1.32243102276

Autocorrelation to estimate periodicity with numpy

I have a large set of time series (> 500) and I'd like to select only the ones that are periodic. I did a bit of literature research and found out that I should look at the autocorrelation. Using numpy I calculate the autocorrelation as:
def autocorr(x):
    norm = x - np.mean(x)
    result = np.correlate(norm, norm, mode='full')
    acorr = result[result.size // 2:]
    acorr /= (x.var() * np.arange(x.size, 0, -1))
    return acorr
This returns a set of coefficients (r?) that, when plotted, should tell me whether the time series is periodic or not.
I generated two toy examples:
#random signal
s1 = np.random.randint(5, size=80)
#periodic signal
s2 = np.array([5,2,3,1] * 20)
When I generate the autocorrelation plots I obtain:
The second autocorrelation vector clearly indicates some periodicity:
Autocorr1 = [1, 0.28, -0.06, 0.19, -0.22, -0.13, 0.07 ..]
Autocorr2 = [1, -0.50, -0.49, 1, -0.50, -0.49, 1 ..]
My question is, how can I automatically determine, from the autocorrelation vector, if a time series is periodic? Is there a way to summarise the values into a single coefficient, e.g. 1 for perfect periodicity and 0 for no periodicity at all? I tried to calculate the mean but it is not meaningful. Should I look at the number of 1s?
I would use mode='same' instead of mode='full', because with mode='full' we get covariances for extreme shifts, where just one array element overlaps itself and the rest are zeros. Those are not going to be interesting. With mode='same' at least half of the shifted array overlaps the original one.
Also, to get the true correlation coefficient (r) you need to divide by the size of the overlap, not by the size of the original x (in my code this is np.arange(n-1, n//2, -1)). Then each of the outputs will be between -1 and 1.
A glance at the Durbin–Watson statistic, which is similar to 2(1-r), suggests that people consider its values below 1 to be a significant indication of autocorrelation, which corresponds to r > 0.5. So this is what I use below. For a statistically sound treatment of the significance of autocorrelation, refer to the statistics literature; a starting point would be to have a model for your time series.
def autocorr(x):
    n = x.size
    norm = (x - np.mean(x))
    result = np.correlate(norm, norm, mode='same')
    acorr = result[n//2 + 1:] / (x.var() * np.arange(n - 1, n//2, -1))
    lag = np.abs(acorr).argmax() + 1
    r = acorr[lag - 1]
    if np.abs(r) > 0.5:
        print('Appears to be autocorrelated with r = {}, lag = {}'.format(r, lag))
    else:
        print('Appears to be not autocorrelated')
    return r, lag
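For reference, this is how it is called on the two toy series from the question (s1 and s2 assumed to be defined as above):
r1, lag1 = autocorr(s1)  # random signal
r2, lag2 = autocorr(s2)  # periodic signal with period 4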
Output for your two toy examples:
Appears to be not autocorrelated
Appears to be autocorrelated with r = 1.0, lag = 4

Is there Implementation of Hawkes Process in PyMC?

I want to use a Hawkes process to model some data. I could not find whether PyMC supports Hawkes processes. More specifically, I want an observed variable following a Hawkes process and to learn a posterior over its parameters.
If it is not there, could I define it in PyMC in some way, e.g. with @deterministic, etc.?
It's been quite a long time since your question, but I've worked it out in PyMC today, so I thought I'd share the gist of my implementation for other people who might come across the same problem. We're going to infer the parameters λ and α of a Hawkes process. I'm not going to cover the temporal scale parameter β; I'll leave that as an exercise for the readers.
First let's generate some data:
import numpy as np

def hawkes_intensity(mu, alpha, points, t):
    p = np.array(points)
    p = p[p <= t]
    p = np.exp(p - t)
    return mu + alpha * np.sum(p)

def simulate_hawkes(mu, alpha, window):
    t = 0
    points = []
    lambdas = []
    while t < window:
        m = hawkes_intensity(mu, alpha, points, t)
        s = np.random.exponential(scale=1/m)
        ratio = hawkes_intensity(mu, alpha, points, t + s)
        t = t + s
        if t < window:
            points.append(t)
            lambdas.append(ratio)
        else:
            break
    points = np.sort(np.array(points, dtype=np.float32))
    lambdas = np.array(lambdas, dtype=np.float32)
    return points, lambdas

# parameters
window = 1000
mu = 8
alpha = 0.25

points, lambdas = simulate_hawkes(mu, alpha, window)
num_points = len(points)
We just generated some temporal points using functions that I adapted from here: https://nbviewer.jupyter.org/github/MatthewDaws/PointProcesses/blob/master/Temporal%20points%20processes.ipynb
Now, the trick is to create a matrix of size (num_points, num_points) that contains the temporal distance of the ith point from all the other points. So the (i, j) entry of the matrix is the temporal interval separating the ith point from the jth. This matrix will be used to compute the sum of the exponentials of the Hawkes process, i.e. the self-exciting part. The way to create this matrix, as well as the sum of the exponentials, is a bit tricky. I'd recommend checking every line yourself so you can see what it does.
tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)
tile = np.tril(np.exp(-tile), k=-1)
Σ = np.sum(tile, axis=1)[:-1] # this is our self-exciting sum term
We have the points and we have a matrix containing the sums of the excitation terms.
The duration between two consecutive events of a Hawkes process follows an exponential distribution with parameter λ = λ0 + ∑ excitation. This is what we are going to model, but first we have to compute the duration between consecutive points of our generated data.
interval = points[1:] - points[:-1]
We're now ready for inference:
import pymc3 as pm          # this answer uses the PyMC3 API
import matplotlib.pyplot as plt

with pm.Model() as model:
    λ = pm.Exponential("λ", 1)
    α = pm.Uniform("α", 0, 1)
    lam = pm.Deterministic("lam", λ + α * Σ)
    interarrival = pm.Exponential(
        "interarrival", lam, observed=interval)
    trace = pm.sample(2000, tune=4000)

pm.plot_posterior(trace, var_names=["λ", "α"])
plt.show()

print(np.mean(trace["λ"]))
print(np.mean(trace["α"]))
7.829
0.284
Note: the tile matrix can become quite large if you have many data points.
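If memory becomes a problem, the same sums can be built in O(n) space with a simple recursion instead of materializing the (num_points, num_points) matrix; a sketch, assuming (as in the code above) that the decay parameter β is fixed at 1:
# Sigma_rec[i] = sum_{j < i} exp(-(t_i - t_j)), computed via the recursion
# Sigma_rec[i] = exp(-(t_i - t_{i-1})) * (Sigma_rec[i-1] + 1)
Sigma_rec = np.zeros(num_points)
for i in range(1, num_points):
    Sigma_rec[i] = np.exp(-(points[i] - points[i - 1])) * (Sigma_rec[i - 1] + 1.0)
Sigma_rec = Sigma_rec[:-1]
np.testing.assert_allclose(Sigma_rec, Σ, rtol=1e-3, atol=1e-6)  # matches the tile-based Σ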

Using dopri5 to plot a system of ODEs in matrix form

The system of equations I'm interested in plotting is the following:
I was able to plot them by modifying an example someone posted, as follows:
import scipy as sp
import pylab as plt
import numpy as np
import scipy.integrate as spi
#Constants
c13 = 4.2
c14 = 4.2
c21 = 4.3
c32 = 4.4
c34 = 4.4
c42 = 4.4
c43 = 4.4
e12 = 1.9
e23 = 2.5
e24 = 2.2
e31 = 2.0
e41 = 2.0
#Time
t_end = 700
t_start = 0
t_step = 1
t_interval = np.arange(t_start, t_end, t_step)
#Initial Condition
r = [0.2,0.3,0.3,0.5]
def model(t, r):
    Eqs = np.zeros((4))
    Eqs[0] = (r[0]*(1-r[0]*r[0]-r[1]*r[1]-r[2]*r[2]-r[3]*r[3])-c21*((r[1]*r[1])*r[0])+e31*((r[2]*r[2])*r[0])+e41*((r[3]*r[3])*r[0]))
    Eqs[1] = (r[1]*(1-r[0]*r[0]-r[1]*r[1]-r[2]*r[2]-r[3]*r[3])+e12*((r[0]*r[0])*r[1])-c32*((r[2]*r[2])*r[1])-c42*((r[3]*r[3])*r[1]))
    Eqs[2] = (r[2]*(1-r[0]*r[0]-r[1]*r[1]-r[2]*r[2]-r[3]*r[3])-c13*((r[0]*r[0])*r[2])+e23*((r[1]*r[1])*r[2])-c43*((r[3]*r[3])*r[2]))
    Eqs[3] = (r[3]*(1-r[0]*r[0]-r[1]*r[1]-r[2]*r[2]-r[3]*r[3])-c14*((r[0]*r[0])*r[3])+e24*((r[1]*r[1])*r[3])-c34*((r[2]*r[2])*r[3]))
    return Eqs
ode = spi.ode(model)
ode.set_integrator('dopri5')
ode.set_initial_value(r,t_start)
ts = []
ys = []
while ode.successful() and ode.t < t_end:
    ode.integrate(ode.t + t_step)
    ts.append(ode.t)
    ys.append(ode.y)
t = np.vstack(ts)
x1,x2,x3,x4 = np.vstack(ys).T
plt.subplot(1, 1, 1)
plt.plot(t, x1, 'r', label = 'x1')
plt.plot(t, x2, 'b', label = 'x2')
plt.plot(t, x3, 'g', label = 'x3')
plt.plot(t, x4, 'purple', label = 'x4')
plt.xlim([0,t_end])
plt.legend()
plt.ylim([-0.2,1.5])
plt.show()
This certainly appears to give me the plot I want. However, I want to end up doing stochastic analysis with this set of ODEs, and for that reason, it is much easier to model this if the system of ODEs is written in matrix form (that way, I can easily change the dimension of the noise and see how that affects the ODEs). I understand how mathematically to write the equation in matrix form, but I don't understand how to modify my code so that in the "def model(t,r):" part, it's read as an array/matrix. To convert the equations to matrix form, I can define:
b = np.array([1, 1, 1, 1])
A = np.array([[1,       1 + c21, 1 - e31, 1 - e41],
              [1 - e12, 1,       1 + c32, 1 + c42],
              [c13 + 1, 1 - e23, 1,       1 + c43],
              [c14 + 1, 1 - e24, 1 + c34, 1]])
And then the system of equations would be (where x is the vector (x1,x2,x3,x4)):
x'(t) = diag(x) [b^T - A diag(x) x]
So my question is: How do I modify where I defined my ODEs so that I can enter them as a matrix instead of writing out each equation individually? (this would also make it easier if I later look at a system with more than 4 dimensions)
Using the fact that numpy.array operations act element-wise (in contrast to numpy.matrix operations, which operate in matrix fashion), the formula for the system of equations is simply
def model(t, x):
    return x*(b-A.dot(x*x))
x*x produces the vector of element-wise squares (x**2 would be another option), A.dot(x*x) performs the matrix-vector product for numpy.array objects, and x*(b-...) is again the vector-valued element-wise product of the two operand vectors.
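So, concretely, the matrix-form model drops straight into the question's existing dopri5 loop; a minimal sketch reusing A, b, r and the time settings defined above (only the model function changes):
import numpy as np
import scipy.integrate as spi

def model(t, x):
    # element-wise products plus one matrix-vector product
    return x * (b - A.dot(x * x))

ode = spi.ode(model)
ode.set_integrator('dopri5')
ode.set_initial_value(r, t_start)
ts, ys = [], []
while ode.successful() and ode.t < t_end:
    ode.integrate(ode.t + t_step)
    ts.append(ode.t)
    ys.append(ode.y)
x1, x2, x3, x4 = np.vstack(ys).T  # same plotting code as before applies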
Using u=x*x as variables reduces the system to
dotu = 2*u*(b-A.dot(u))
thus it has one degree less and is quadratic in u which may help in the examination of the stationary points. I suspect that they all are hyperbolic, so that there is no asymptotically stable solution.
Using the substitution u=log(x) and thus
dotu = b-A.dot(exp(2*u))
hides the stationary points at minus infinity, thus the analytical value of this substitution may be limited. However, the positivity of x=exp(u) is built-in, which may allow for more aggressive numerical methods or provide a bit more accuracy using the same cautiousness as before.
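Both substitutions translate directly into alternative model functions; a short sketch (same A and b as before):
import numpy as np

def model_u(t, u):      # u = x*x
    return 2 * u * (b - A.dot(u))

def model_logx(t, w):   # w = log(x), so x = exp(w)
    return b - A.dot(np.exp(2 * w))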

Capturing high multi-collinearity in statsmodels

Say I fit a model in statsmodels
mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit()
When I do mod.summary() I may see the following:
Warnings:
[1] The condition number is large, 1.59e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
Sometimes the warning is different (e.g. based on eigenvalues of the design matrix). How can I capture high-multi-collinearity conditions in a variable? Is this warning stored somewhere in the model object?
Also, where can I find a description of the fields in summary()?
You can detect high-multi-collinearity by inspecting the eigen values of correlation matrix. A very low eigen value shows that the data are collinear, and the corresponding eigen vector shows which variables are collinear.
If there is no collinearity in the data, you would expect that none of the eigen values are close to zero:
>>> xs = np.random.randn(100, 5) # independent variables
>>> corr = np.corrcoef(xs, rowvar=0) # correlation matrix
>>> w, v = np.linalg.eig(corr) # eigen values & eigen vectors
>>> w
array([ 1.256 , 1.1937, 0.7273, 0.9516, 0.8714])
However, if say x[4] - 2 * x[0] - 3 * x[2] = 0, then
>>> noise = np.random.randn(100) # white noise
>>> xs[:,4] = 2 * xs[:,0] + 3 * xs[:,2] + .5 * noise # collinearity
>>> corr = np.corrcoef(xs, rowvar=0)
>>> w, v = np.linalg.eig(corr)
>>> w
array([ 0.0083, 1.9569, 1.1687, 0.8681, 0.9981])
one of the eigen values (here the very first one), is close to zero. The corresponding eigen vector is:
>>> v[:,0]
array([-0.4077, 0.0059, -0.5886, 0.0018, 0.6981])
Ignoring the almost-zero coefficients, the above basically says x[0], x[2] and x[4] are collinear (as expected). If one standardizes the xs values and multiplies by this eigen vector, the result will hover around zero with small variance:
>>> std_xs = (xs - xs.mean(axis=0)) / xs.std(axis=0) # standardized values
>>> ys = std_xs.dot(v[:,0])
>>> ys.mean(), ys.var()
(0, 0.0083)
Note that ys.var() is basically the eigen value which was close to zero.
So, in order to capture high multicollinearity, look at the eigen values of the correlation matrix.
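To capture this in a variable, you could wrap the check above in a small helper; a sketch (the helper name and thresholds are illustrative, not part of statsmodels):
import numpy as np

def collinearity_diagnostic(xs, eig_tol=1e-2, load_tol=0.1):
    """Return the smallest eigenvalue of the correlation matrix and the
    indices of the variables loading on the corresponding eigenvector."""
    corr = np.corrcoef(xs, rowvar=0)
    w, v = np.linalg.eigh(corr)   # eigh: corr is symmetric, eigenvalues come back ascending
    smallest, vec = w[0], v[:, 0]
    involved = np.where(np.abs(vec) > load_tol)[0] if smallest < eig_tol else np.array([], dtype=int)
    return smallest, involved
For the collinear example above this should return the near-zero eigenvalue (about 0.0083) and flag variables 0, 2 and 4.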
Based on a similar question for R, there are some other options that may help people. I was looking for a single number that captured the collinearity, and options include the determinant and condition number of the correlation matrix.
According to one of the R answers, determinant of the correlation matrix will "range from 0 (Perfect Collinearity) to 1 (No Collinearity)". I found the bounded range helpful.
Translated example for determinant:
import numpy as np
import pandas as pd
# Create a sample random dataframe
np.random.seed(321)
x1 = np.random.rand(100)
x2 = np.random.rand(100)
x3 = np.random.rand(100)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
# Now create a dataframe with multicollinearity
multicollinear_df = df.copy()
multicollinear_df['x3'] = multicollinear_df['x1'] + multicollinear_df['x2']
# Compute both correlation matrices
corr = np.corrcoef(df, rowvar=0)
multicollinear_corr = np.corrcoef(multicollinear_df, rowvar=0)
# Compare the determinants
print(np.linalg.det(corr))                 # 0.988532159861
print(np.linalg.det(multicollinear_corr))  # 2.97779797328e-16
And similarly, the condition number of the covariance matrix will approach infinity with perfect linear dependence.
print(np.linalg.cond(corr))                 # 1.23116253259
print(np.linalg.cond(multicollinear_corr))  # 6.19985218873e+15
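As for whether the number in the warning is stored somewhere: I believe the fitted OLS results object exposes it as a condition_number attribute, so you can capture it in a variable directly; a small sketch (the 1e3 cutoff is illustrative, not a statsmodels default):
# `mod` is the fitted results object from the question, i.e. smf.ols(...).fit()
cond = mod.condition_number
high_collinearity = cond > 1e3  # illustrative threshold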
