I want to calculate the autocovariance of three arrays, X1, X2 and Y, which are all stationary random processes. Is there any function in SciPy or another library that can solve this problem?
Statsmodels has auto- and cross-covariance functions:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acovf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.ccovf.html
plus the correlation functions and the partial autocorrelation:
http://statsmodels.sourceforge.net/devel/tsa.html#descriptive-statistics-and-tests
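If you only need NumPy, the same quantities are short to write by hand. Below is a minimal sketch (not statsmodels' implementation) of the biased sample cross-covariance c_xy(k) = (1/N) * sum_t (x[t+k] - mean(x)) * (y[t] - mean(y)) for k = 0..max_lag. Passing the same array twice gives the autocovariance. The function name cross_covariance and the random data are made up for illustration. The statsmodels functions have their own options for normalization and demeaning, so check their documentation before comparing numbers.

```python
import numpy as np

def cross_covariance(x, y, max_lag):
    # Biased sample cross-covariance, one value per lag k = 0..max_lag:
    #   c_xy(k) = (1/N) * sum_{t=0}^{N-1-k} (x[t+k] - mean(x)) * (y[t] - mean(y))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    n = x.size
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    xc = x - x.mean()
    yc = y - y.mean()
    return np.array([np.dot(xc[k:], yc[:n - k]) / n for k in range(max_lag + 1)])

# Made-up data standing in for the question's X1, X2 and Y
rng = np.random.default_rng(0)
X1 = rng.standard_normal(1000)
X2 = rng.standard_normal(1000)
Y = 0.5 * X1 + rng.standard_normal(1000)

auto_X1 = cross_covariance(X1, X1, 10)     # x == y gives the autocovariance
cross_X1_Y = cross_covariance(X1, Y, 10)
```

For the three arrays in the question you would call it once per pair, for example cross_covariance(X1, Y, max_lag) and cross_covariance(X2, Y, max_lag). Note the convention: here the first argument is the series shifted forward by k.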
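According to the standard estimator of the autocovariance coefficient for discrete signals, which (with the 1/(N-1) normalization used in the code below) can be expressed as:

$$C(k) = \frac{1}{N-1}\sum_{i=1}^{N-k}\left(x(i)-\bar{x}\right)\left(x(i+k)-\bar{x}\right)$$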
...where x(i) is a given signal (i.e., a specific 1D vector), k is the shift of x(i) by k samples, N is the length of x(i), and

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x(i)$$

is the simple average, we can write:
'''
Calculate the autocovariance coefficient.
'''
import numpy as np
Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)
def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov
print("Autocovariance:", autocovariance(Xi, N, k, Xs))
If you would like to normalize the autocovariance coefficient, it becomes the autocorrelation coefficient:

$$r(k) = \frac{C(k)}{C(0)}$$

where C(k) is the autocovariance coefficient at lag k. Then you just have to add two lines to the above code:
def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)
Here is the full script:
'''
Calculate the autocovariance and autocorrelation coefficients.
'''
import numpy as np
Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)
def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov
def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)
print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())
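As a worked check of the example (hand arithmetic on the Xi series above, with the same 1/(N-1) normalization, writing C(k) for the autocovariance at lag k): the mean is 3, so the centered series is (-2, -1, 0, 1, 2) repeated three times. At lags 0 and 5 the shifted values coincide with the originals, so each full period contributes 4 + 1 + 0 + 1 + 4 = 10 to the sum.

$$C(0) = \frac{1}{14}\,(3 \cdot 10) = \frac{30}{14} \approx 2.143, \qquad C(5) = \frac{1}{14}\,(2 \cdot 10) = \frac{20}{14} \approx 1.429$$

$$r(5) = \frac{C(5)}{C(0)} = \frac{20}{30} = \frac{2}{3}$$

So even though a lag of 5 is exactly one period of the signal, the estimate is 2/3 rather than 1. Only N - k = 10 of the 15 samples enter the lag-5 sum, while the denominator stays the same.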
A small tweak to the previous answers that avoids Python for loops and uses NumPy array operations instead. This will be quicker if you have a lot of data.
import numpy as np
def lagged_auto_cov(Xi,t):
    r"""
    for series of values x_i, length N, compute empirical auto-cov with lag t
    defined: 1/(N-1) * \sum_{i=0}^{N-t-1} ( x_i - x_s ) * ( x_{i+t} - x_s )
    """
    N = len(Xi)
    # use sample mean estimate from whole series
    Xs = np.mean(Xi)
    # construct copies of series shifted relative to each other,
    # with mean subtracted from values
    end_padded_series = np.zeros(N+t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N+t)
    start_padded_series[t:] = Xi - Xs
    auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
    return auto_cov
I compared this against @bluevoxel's code on a time series of 50,000 data points, computing the autocorrelation for a single fixed lag. The Python for-loop code averaged about 30 ms, while the NumPy version averaged under 0.3 ms (running on my laptop).
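A rough way to reproduce that comparison yourself is sketched below. The helper names loop_auto_cov and sliced_auto_cov are made up here. The vectorized version uses aligned slices instead of the zero-padded copies above, which gives the same sum without allocating two length-(N + t) arrays. Timings will of course depend on the machine.

```python
import timeit
import numpy as np

def loop_auto_cov(x, k):
    # Python-loop estimator in the style of the first answer (1/(N-1) normalization)
    n = len(x)
    xs = np.mean(x)
    total = 0.0
    for i in range(n - k):
        total += (x[i + k] - xs) * (x[i] - xs)
    return total / (n - 1)

def sliced_auto_cov(x, k):
    # Same estimator: one dot product between the series and its lagged copy
    n = len(x)
    xc = x - np.mean(x)
    return np.dot(xc[k:], xc[:n - k]) / (n - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)

print("loop      :", timeit.timeit(lambda: loop_auto_cov(x, 5), number=3) / 3, "s")
print("vectorized:", timeit.timeit(lambda: sliced_auto_cov(x, 5), number=3) / 3, "s")
```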
Get the sample autocovariance:
import numpy as np
# cov_auto_samp(X,delta)/cov_auto_samp(X,0) = autocorrelation
def cov_auto_samp(X,delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    times = 0.0
    for i in np.arange(0, N-delta):
        autoCov += (X[i+delta]-Xs)*(X[i]-Xs)
        times +=1
    return autoCov/times
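The answers in this thread use different denominators for the same lagged sum. The loop above divides by the number of terms, N - delta. The earlier answers divide by N - 1, and the next answer divides by N. Below is a minimal sketch (hypothetical helper names lagged_products_sum and autocov) that makes the choice explicit.

```python
import numpy as np

def lagged_products_sum(x, k):
    # sum_{i=0}^{N-k-1} (x[i+k] - mean) * (x[i] - mean), mean taken over the whole series
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.dot(xc[k:], xc[:len(x) - k])

def autocov(x, k, normalization="n"):
    # Divide the same lagged sum by one of the denominators used in this thread
    n = len(x)
    denominators = {"n": n, "n-1": n - 1, "n-k": n - k}
    return lagged_products_sum(x, k) / denominators[normalization]

x = np.array([1, 2, 3, 4, 5] * 3)   # the 15-sample series from the first answer
for norm in ("n", "n-1", "n-k"):
    print(norm, autocov(x, 5, norm))
```

With this series and lag 5, the lagged sum is 20, so the three conventions give 20/15, 20/14 and 20/10. Dividing by N keeps the estimated autocovariance sequence positive semi-definite. Dividing by N - k reduces bias at large lags, at the cost of higher variance.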
@user333700 has the right answer. Using a library (such as statsmodels) is generally preferred over writing your own. However, it is insightful to implement your own at least once.
import numpy as np
def _check_autocovariance_input(x):
    if len(x) < 2:
        raise ValueError('Need at least two elements to calculate autocovariance')
def get_autocovariance_given_lag(x, lag):
    _check_autocovariance_input(x)
    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)
def get_autocovariance(x):
    _check_autocovariance_input(x)
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)
The function get_autocovariance_given_lag calculates the autocovariance for a single given lag.
If you are interested in all lags, get_autocovariance can be used instead. The np.correlate function is what statsmodels uses under the hood. It computes the cross-correlation, which is a sliding dot product. For example, suppose the array is [1, 2, 3]. Then we get:
      [1, 2, 3]      = 3 * 1 = 3
[1, 2, 3]

   [1, 2, 3]         = 2 * 1 + 3 * 2 = 8
[1, 2, 3]

[1, 2, 3]            = 1 * 1 + 2 * 2 + 3 * 3 = 14
[1, 2, 3]

[1, 2, 3]            = 2 * 1 + 3 * 2 = 8
   [1, 2, 3]

[1, 2, 3]            = 3 * 1 = 3
      [1, 2, 3]
But note that we are interested in the covariance starting at lag 0. Where is that? It occurs after the array has moved N - 1 positions to the right, where N is the length of the array. This is why we return the array starting at index N - 1.
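The small check below uses the same [1, 2, 3] example and then a centered version of it. It shows the full sliding-dot-product output and confirms that slicing from index N - 1 gives the lag-0-onward autocovariance.

```python
import numpy as np

a = np.array([1, 2, 3])
full = np.correlate(a, a, mode="full")   # sliding dot products for lags -(N-1)..(N-1)
print(full)                              # [ 3  8 14  8  3]
print(full[len(a) - 1:])                 # lag 0 onward: [14  8  3]

# Same indexing on the centered series gives the 1/N autocovariance at every lag
x = np.array([1.0, 2.0, 3.0])
xc = x - x.mean()
via_correlate = np.correlate(xc, xc, mode="full")[len(x) - 1:] / len(x)
via_dots = np.array([np.dot(xc[k:], xc[:len(x) - k]) for k in range(len(x))]) / len(x)
print(via_correlate)
```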
I am implementing the Francis double-step QR iteration algorithm using the notes and pseudocode from this lecture: https://people.inf.ethz.ch/arbenz/ewp/Lnotes/chapter4.pdf (Algorithm 4.5).
The pseudocode is provided in MATLAB, I believe.
Below is my implementation.
import numpy as np
import scipy.linalg
# compute upper hessenberg form of matrix
def hessenberg(A):
    m,n = A.shape
    H = A.astype(np.float64)
    for k in range(n-2):
        x = H[k+1:, k]
        v = np.concatenate([np.array([np.sign(x[0]) * np.linalg.norm(x)]), x[1:]])
        v = v / np.linalg.norm(v)
        H[k+1:, k:] -= 2 * np.outer(v, np.dot(v, H[k+1:, k:]))
        H[:, k+1:] -= 2 * np.outer(np.dot(H[:, k+1:], v), v)
    return(H)
# compute first three elements of M
def first_three_M(T,s,t):
    x = T[0, 0]**2 + T[0, 1] * T[1, 0] - s * T[0, 0] + t
    y = T[1, 0] * (T[0, 0] + T[1, 1] - s)
    z = T[1, 0] * T[2, 1]
    return(x,y,z)
# householder reflection
def householder_reflection_step(x_1):
    v = x_1[0] + np.sign(x_1[0]) * np.linalg.norm(x_1)
    v = v / np.linalg.norm(v)
    P = np.eye(3) - 2 * np.outer(v, v)
    return(P)
# update elements of M
def update_M(T,k,p):
    x = T[k+1, k]
    y = T[k+2, k]
    if k < p - 3:
        z = T[k+3, k]
    else:
        z = 0
    return(x,y,z)
# givens rotation
def givens_step(T,x_2,x,y,p,q,n):
    # calculate c and s
    c = x / np.sqrt(x**2 + y**2)
    s = -y / np.sqrt(x**2 + y**2)
    P = np.array([[c, s], [-s, c]])
    T[q-1:p, p-3:n] = P.T @ T[q-1:p, p-3:n]
    T[0:p, p-2:p] = T[0:p, p-2:p] @ P
    return(T)
# deflation step
def deflation_step(T,p,q,epsilon):
    if abs(T[p-1, p-2]) < epsilon * (abs(T[p-2, p-2]) + abs(T[p-1, p-1])):
        T[p-1, p-2] = 0
        p = p - 1
        q = p - 1
    elif abs(T[p-2, p-3]) < epsilon * (abs(T[p-3, p-3]) + abs(T[p-2, p-2])):
        T[p-2, p-3] = 0
        p = p - 2
        q = p - 1
    return(T,p,q)
# francis qr step
def francis_step(H, epsilon=0.90):
    n = H.shape[0]
    T = H.copy().astype(np.float64)
    p = n - 1
    while p > 2:
        q = p - 1
        s = T[q, q] + T[p, p]
        t = T[q, q] * T[p, p] - T[q, p] * T[p, q]
        # Compute M
        x,y,z = first_three_M(T,s,t)
        x_1 = np.transpose([[x], [y], [z]])
        # Bulge chasing
        for k in range(p - 3):
            # Compute Householder reflector
            P = householder_reflection_step(x_1)
            r = max(1, k-1)
            T[k:k+3, r:] = P.T @ T[k:k+3, r:]
            r = min(k + 3, p)
            T[0:r, k:k+3] = T[0:r, k:k+3] @ P
            # Update M
            x,y,z = update_M(T,k,p)
        x_2 = np.transpose([[x], [y]])
        # Compute Givens rotation
        T = givens_step(T,x_2,x,y,p,q,n)
        # Check for convergence
        T,p,q = deflation_step(T,p,q,epsilon)
    return(T)
# francis qr iteration
def francis_qr_iteration(A):
    m,n = A.shape
    H = hessenberg(A)
    eigvals = []
    iters = 0
    max_iters = 100
    while iters<max_iters:
        # Perform Francis step
        T = francis_step(H)
        eigvals.append(np.diag(T))
        iters+=1
    return(eigvals)
# for quick testing
A = np.array([[2, 2, 3, 4, 2],
              [1, 2, 4, 2, 3],
              [4, 1, 2, 1, 5],
              [5, 2, 5, 2, 1],
              [3, 6, 3, 1, 4]])
eigenvals = francis_qr_iteration(A)
#comparing our method to scipy - final eigvals obtained
print(len(eigenvals))
print(sorted(eigenvals[-1]))
print(sorted(scipy.linalg.eig(A)[0].real))
And this is the output I am getting.
100
[-4.421235127393854, -0.909209110641351, -0.8342390091346807, 3.7552499102751575, 8.215454029003958]
[-3.0411228516834217, -1.143605409373778, -1.143605409373778, 3.325396565009845, 14.002937105421134]
The matrix T is not changing, and hence it does not converge to the Schur form from which I could obtain the eigenvalues with np.diag(T). I believe the error comes either from the Givens rotation step or from the Householder reflection step. It could be an indexing issue, since I translated MATLAB pseudocode to Python. Please let me know where I am going wrong so I can fix the code and make it converge.
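This is not a complete answer, but one concrete thing is worth checking. In householder_reflection_step, the expression x_1[0] + np.sign(x_1[0]) * np.linalg.norm(x_1) adds a signed norm to every component of the vector. The usual Householder vector only shifts the first component: v = x + sign(x_0) ||x|| e_1. (Separately, francis_qr_iteration passes the same H to francis_step on every iteration, so every iteration starts again from the same matrix.) Below is a minimal sketch of a reflector that zeroes everything below the first entry. The name householder_reflector is hypothetical, and the sketch assumes x is not the zero vector.

```python
import numpy as np

def householder_reflector(x):
    # P = I - 2 v v^T with v = (x + s * ||x|| * e_1) normalized, s = sign(x[0]),
    # so that P @ x = -s * ||x|| * e_1 (zeros below the first entry).
    # Assumes x is not the zero vector.
    x = np.asarray(x, dtype=float).ravel()   # also accepts a (1, 3) array like x_1
    s = 1.0 if x[0] >= 0 else -1.0           # np.sign(0) would be 0, so use +1 there
    v = x.copy()
    v[0] += s * np.linalg.norm(x)            # shift only the first component
    v /= np.linalg.norm(v)
    return np.eye(x.size) - 2.0 * np.outer(v, v)

x = np.array([3.0, 1.0, 2.0])
P = householder_reflector(x)
print(P @ x)   # approximately [-sqrt(14), 0, 0]
```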
I have a rational function: f(x) = P(x)/Q(x).
For example:
f(x) = (5x + 3)/(1-x^2)
Because f(x) is a generating function it can be written as:
f(x) = a0 + a1*x + a2*x² + ... + a_n*x^n + ... = P(x)/Q(x)
How can I use SymPy to find the nth term of the generating function f(x) (that is, a_n)?
If there is no such implementation in SymPy, I am also curious whether this is implemented in other packages, such as Maxima.
I appreciate any help.
To get the general formula for a_n of a rational generating function, SymPy's rational_algorithm can be used.
For example:
from sympy import simplify
from sympy.abc import x, n
from sympy.series.formal import rational_algorithm
f = (5*x + 3)/(1-x**2)
func_n, independent_term, order = rational_algorithm(f, x, n, full=True)
print(f"The general formula for a_n is {func_n}")
for k in range(10):
    print(f"a_{k} = {simplify(func_n.subs(n, k))}")
Output:
The general formula for a_n is (-1)**(-n - 1) + 4
a_0 = 3
a_1 = 5
a_2 = 3
a_3 = 5
a_4 = 3
a_5 = 5
a_6 = 3
a_7 = 5
a_8 = 3
a_9 = 5
Here is another example:
f = x / (1 - x - 2 * x ** 2)
func_n, independent_term, order = rational_algorithm(f, x, n, full=True)
print(f"The general formula for a_n is {func_n.simplify()}")
print("First terms:", [simplify(func_n.subs(n, k)) for k in range(20)])
The general formula for a_n is 2**n/3 - (-1)**(-n)/3
First terms: [0, 1, 1, 3, 5, 11, 21, 43, 85, 171, 341, 683, 1365, 2731, 5461, 10923, 21845, 43691, 87381, 174763]
You could take the kth derivative, substitute 0 for x, and divide by factorial(k):
>>> from sympy import factorial
>>> from sympy.abc import x
>>> f = (5*x + 3) / (1-x**2)
>>> f.diff(x, 20).subs(x, 0)/factorial(20)
3
The reference here talks about rational generating functions. Looking for a recurrence, you can see the pattern pretty quickly using differentiation:
[f.diff(x,i).subs(x,0)/factorial(i) for i in range(6)]
[3, 5, 3, 5, 3, 5]
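The recurrence can also be computed directly, without symbolic differentiation. Writing A(x) = sum a_n x^n, the identity Q(x) A(x) = P(x) gives a_n = (p_n - sum_{j=1}^{n} q_j a_{n-j}) / q_0, with p_n = 0 beyond deg P and q_j = 0 beyond deg Q. This is a linear recurrence of order deg Q. Below is a minimal pure-Python sketch using exact fractions. The helper names series_coefficients and poly_mul are made up for this example, and the sketch assumes Q(0) != 0.

```python
from fractions import Fraction
from functools import reduce

def poly_mul(u, v):
    # Product of two polynomials given as coefficient lists (lowest power first)
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def series_coefficients(p, q, n_terms):
    # First n_terms power-series coefficients of P(x)/Q(x) around x = 0,
    # from Q(x) * A(x) = P(x):  a_n = (p_n - sum_{j>=1} q_j * a_{n-j}) / q_0
    if q[0] == 0:
        raise ValueError("Q(0) must be nonzero")
    a = []
    for n in range(n_terms):
        acc = Fraction(p[n]) if n < len(p) else Fraction(0)
        for j in range(1, min(n, len(q) - 1) + 1):
            acc -= q[j] * a[n - j]
        a.append(acc / q[0])
    return a

print([int(c) for c in series_coefficients([3, 5], [1, 0, -1], 6)])    # [3, 5, 3, 5, 3, 5]
print([int(c) for c in series_coefficients([0, 1], [1, -1, -2], 10)])  # [0, 1, 1, 3, 5, 11, 21, 43, 85, 171]

# Coin example: Q(x) = (1 - x)(1 - x^5)(1 - x^10)(1 - x^25)(1 - x^50), P(x) = 1
coin_q = reduce(poly_mul, ([1] + [0] * (k - 1) + [-1] for k in (1, 5, 10, 25, 50)))
ways = series_coefficients([1], coin_q, 101)
print(ways[50], ways[100])   # 50 292
```

The first two lines match the rational_algorithm results for (5x + 3)/(1 - x^2) and x/(1 - x - 2x^2), and the coin counts match the Cut The Knot example.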
Adapting the approach of this post, you could try the following:
from sympy import *
from sympy.abc import x
f = (5*x + 3) / (1-x**2)
print(f.series(n=20))
k = 50
coeff50 = Poly(series(f, x, n=k + 1).removeO(), x).coeff_monomial(x ** k)
print(f"The coefficient of x^{k} of the generating function of {f} is {coeff50}")
# to get the first 100 coefficients (reversing the list so that a[0] is the
# coefficient of x**0, etc.):
a = Poly(series(f, x, n=100).removeO(), x).all_coeffs()[::-1]
Output:
3 + 5*x + 3*x**2 + 5*x**3 + 3*x**4 + 5*x**5 + 3*x**6 + 5*x**7 + 3*x**8 + 5*x**9 + 3*x**10 + 5*x**11 + 3*x**12 + 5*x**13 + 3*x**14 + 5*x**15 + 3*x**16 + 5*x**17 + 3*x**18 + 5*x**19 + O(x**20)
The coefficient of x^50 of the generating function of (5*x + 3)/(1 - x**2) is 3
Following this example at Cut The Knot, the approach can be used to find out the number of ways an amount n can be paid with coins of 1, 5, 10, 25 and 50 cents.
f = 1/((1 - x)*(1 - x**5)*(1 - x**10)*(1 - x**25)*(1 - x**50))
a = Poly(series(f, x, n=101).removeO(), x).all_coeffs()[::-1]
print(a[50]) # there are 50 ways to pay 50 cents
print(a[100]) # there are 292 ways to pay 100 cents
In Maxima:
powerseries((5*x+3)/(1-x^2),x,0);
returns the power series as a symbolic sum over the index i1.
Use part to extract the general term:
part(''%,1);
(4-(-1)^i1)x^i1
and coeff to get the coefficient:
a(i1) := coeff(''%, x, i1);
[a(0), a(1), a(2)];
[3, 5, 3]
Another nice way to approach this is to use the ring series:
>>> from sympy import ZZ
>>> from sympy.polys.ring_series import rs_mul, rs_pow
>>> from sympy.polys.rings import ring
>>> R, x = ring('x', ZZ)
>>> nmax = 100
>>> s = rs_mul(5*x+3, rs_pow(1-x**2, -1, x, nmax+1), x, nmax+1)
>>> [s.coeff(x**i) for i in (2,3,5,17,100)]
[3, 5, 5, 5, 3]
Currently I'm working on a project that implements cubic spline interpolation. So far I have managed to calculate coefficients for my equations.
Now I'm trying to return an interpolating function that for any x returns y.
Let's assume that we have
x = [1, 3, 5]
y = [6, -2, 4]
The coefficients that we get are as follows:
[ 6, -5.75, 0, 0.4375, -2, -0.5, 2.625, -0.4375]
This corresponds to
[a0, b0, c0, d0, a1, b1, c1, d1]
The interpolating polynomials are
S0(x) = a0 + b0*(x - x_0) + c0*(x - x_0)^2 + d0*(x - x_0)^3, x ∈ [1, 3]
S1(x) = a1 + b1*(x - x_1) + c1*(x - x_1)^2 + d1*(x - x_1)^3, x ∈ (3, 5]
where x_0 = 1 and x_1 = 3 are the left knots of the intervals (with these coefficients, S0(1) = 6, S0(3) = S1(3) = -2 and S1(5) = 4).
And so on; it can be calculated for more than just 3 points.
Right now I have implemented a method that works only if a single x is given as input.
def interpolate_spline(x, x_array, coefficients):
    # find the interval [x_array[i], x_array[i+1]] that contains x
    i = 1
    while x_array[i] < x:
        i += 1
    i = i - 1
    a = coefficients[4 * i]
    b = coefficients[4 * i + 1]
    c = coefficients[4 * i + 2]
    d = coefficients[4 * i + 3]
    # each polynomial is written in terms of the offset from its left knot
    dx = x - x_array[i]
    return a + b * dx + c * (dx ** 2) + d * (dx ** 3)
So, coming back to my question: can this be vectorized, or at least made to accept a whole array as input?
I don't know whether it matters, but assume that x_array is sorted.
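One way to vectorize it is to let np.searchsorted find the interval index for every x at once, then pick the coefficient rows with fancy indexing. The sketch below assumes that the coefficients are in the local form a_i + b_i*(x - x_i) + ..., which is the form in which the numbers in the question reproduce y = 6, -2, 4 at the knots. The name interpolate_spline_vec is hypothetical.

```python
import numpy as np

def interpolate_spline_vec(x, x_array, coefficients):
    # Vectorized evaluation of a piecewise cubic stored as
    # [a0, b0, c0, d0, a1, b1, c1, d1, ...], each piece in the local form
    # a_i + b_i*dx + c_i*dx**2 + d_i*dx**3 with dx = x - x_array[i].
    x = np.asarray(x, dtype=float)
    x_array = np.asarray(x_array, dtype=float)
    coef = np.asarray(coefficients, dtype=float).reshape(-1, 4)   # one row per interval
    # Same interval choice as the loop: the first i with x <= x_array[i + 1]
    i = np.searchsorted(x_array, x, side="left") - 1
    i = np.clip(i, 0, len(coef) - 1)   # points outside the range use the end pieces
    a, b, c, d = coef[i].T
    dx = x - x_array[i]
    return a + dx * (b + dx * (c + dx * d))   # Horner's rule

x_array = [1, 3, 5]
coefficients = [6, -5.75, 0, 0.4375, -2, -0.5, 2.625, -0.4375]
print(interpolate_spline_vec([1, 2, 3, 4, 5], x_array, coefficients))
```

It also accepts a scalar x. Unlike the loop, which raises an IndexError for x beyond the last knot, this version extrapolates with the end pieces. Whether that is desirable depends on your application.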
I have a list of N unit-normalized 3D vectors p stored in a numpy ndarray with shape (N, 3). I have another such list, q. I want to calculate an ndarray U of shape (N, 3, 3) storing the rotation matrices that rotate each point in p to the corresponding point q.
The list of rotation matrices U should satisfy:
np.allclose(np.einsum('ijk,ik->ij', U, p), q)
On a point-by-point basis, the problem reduces to being able to compute a rotation matrix for a rotation of some angle about some axis. Code solving the single-point case appears below:
import numpy as np
def rotation_matrix(angle, direction):
    direction = np.atleast_1d(direction).astype('f4')
    sina = np.sin(angle)
    cosa = np.cos(angle)
    direction = direction/np.sqrt(np.sum(direction*direction))
    R = np.diag([cosa, cosa, cosa])
    R += np.outer(direction, direction) * (1.0 - cosa)
    direction *= sina
    R += np.array(((0.0, -direction[2], direction[1]),
                   (direction[2], 0.0, -direction[0]),
                   (-direction[1], direction[0], 0.0)))
    return R
What I need is a function that behaves exactly like the above function, but instead of accepting a single angle and a single direction, it accepts an angles array of shape (npts,) and a directions array of shape (npts, 3). The code below is only partially finished: the problem is that neither np.diag nor np.outer accepts an axis argument.
def rotation_matrices(angles, directions):
    directions = np.atleast_2d(directions)
    angles = np.atleast_1d(angles)
    npts = directions.shape[0]
    directions = directions/np.sqrt(np.sum(directions*directions, axis=1)).reshape((npts, 1))
    sina = np.sin(angles)
    cosa = np.cos(angles)
    # Lines below require extension to 2d case - np.diag and np.outer do not support axis arguments
    R = np.diag([cosa, cosa, cosa])
    R += np.outer(directions, directions) * (1.0 - cosa)
    directions *= sina
    R += np.array(((0.0, -directions[2], directions[1]),
                   (directions[2], 0.0, -directions[0]),
                   (-directions[1], directions[0], 0.0)))
    return R
Does either numpy or scipy have a compact vectorized function computing the appropriate rotation matrices in a way that avoids using for loops? The problem is that neither np.diag nor np.outer accept axis as an argument. My application will have N be very large, 1e7 or greater, so a vectorized function that keeps all the relevant axes aligned is necessary for performance reasons.
Dropping this here for now; I will explain later. It uses the Levi-Civita symbols from @jaime's answer here, the matrix form of the Rodrigues formula here, and some algebra based on k = (a x b)/sin(theta).
import numpy as np
def rotmatx(p, q):
    eijk = np.zeros((3, 3, 3))
    eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
    eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1
    d = (p * q).sum(-1)[:, None, None]
    c = (p.dot(eijk) @ q[..., None]).squeeze()  # cross product (optimized): this is q x p = -(p x q)
    cx = c.dot(eijk)  # cross-product matrix of p x q
    # note: undefined when p == -q (d == -1)
    return np.eye(3) + cx + cx @ cx / (1 + d)
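EDIT: the question changed, so here is a version that takes angles and directions:
def rotation_matrices(angles, directions):
    eijk = np.zeros((3, 3, 3))
    eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
    eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1
    theta = angles[:, None, None]
    # directions are assumed to be unit vectors.
    # directions.dot(eijk) is minus the cross-product matrix [k]_x, so negate it;
    # without the minus sign the result is the rotation by -theta.
    K = -directions.dot(eijk)
    return np.eye(3) + K * np.sin(theta) + K @ K * (1 - np.cos(theta))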
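For comparison, the question's own formula can also be vectorized directly, without the Levi-Civita tensor. Replace np.diag with a broadcast multiple of np.eye(3), replace np.outer with np.einsum('ni,nj->nij', ...), and fill the cross-product matrix entry by entry. Below is a minimal sketch (rotation_matrices_broadcast is a hypothetical name). As far as I know, recent SciPy versions also provide scipy.spatial.transform.Rotation.from_rotvec(angles[:, None] * directions).as_matrix(), which builds the same stack from unit directions.

```python
import numpy as np

def rotation_matrices_broadcast(angles, directions):
    # Stack of Rodrigues rotation matrices, one per (angle, axis) pair:
    #   R = cos(a) I + (1 - cos(a)) k k^T + sin(a) [k]_x
    # i.e. the question's rotation_matrix formula applied row by row.
    angles = np.atleast_1d(np.asarray(angles, dtype=float))
    k = np.atleast_2d(np.asarray(directions, dtype=float))
    k = k / np.linalg.norm(k, axis=1, keepdims=True)   # unit axes
    cosa = np.cos(angles)[:, None, None]               # shape (N, 1, 1)
    sina = np.sin(angles)[:, None, None]
    outer = np.einsum("ni,nj->nij", k, k)              # batched np.outer
    cross = np.zeros((k.shape[0], 3, 3))               # batched [k]_x
    cross[:, 0, 1] = -k[:, 2]
    cross[:, 0, 2] = k[:, 1]
    cross[:, 1, 0] = k[:, 2]
    cross[:, 1, 2] = -k[:, 0]
    cross[:, 2, 0] = -k[:, 1]
    cross[:, 2, 1] = k[:, 0]
    return cosa * np.eye(3) + (1.0 - cosa) * outer + sina * cross

R = rotation_matrices_broadcast([np.pi / 2], [[0.0, 0.0, 1.0]])
print(np.round(R[0], 12))   # 90 degrees about z
```

For N around 1e7, each (N, 3, 3) float64 array is about 720 MB, and the intermediates add to that, so processing in chunks may be worthwhile.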
Here is another solution, for bulk rotation of an Nx3x3 array of matrices, where the 3x3 components represent vector components laid out as
[[11, 12, 13],
[21, 22, 23],
[31, 32, 33]]
Rotating all of them at once with np.einsum:
import numpy as np
data = np.random.uniform(size=(500, 3, 3))
rotmat = np.random.uniform(size=(3, 3))  # placeholder; use a real rotation matrix in practice
data_rot = np.einsum('ij,...jk,lk->...il', rotmat, data, rotmat)
This is equivalent to
for data_mat in data:
    np.dot(np.dot(rotmat, data_mat), rotmat.T)
Speedup over a np.dot-loop is around 250x.
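np.matmul (the @ operator) also broadcasts over leading axes, so rotmat @ data @ rotmat.T gives the same stack of R D R^T products without an einsum subscript string. The small check below uses a genuine rotation about z (the 30 degree angle is arbitrary) instead of a random matrix. I have not timed it against einsum.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(size=(500, 3, 3))

t = np.deg2rad(30.0)   # arbitrary angle for illustration
rotmat = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])

via_einsum = np.einsum("ij,...jk,lk->...il", rotmat, data, rotmat)
via_matmul = rotmat @ data @ rotmat.T        # broadcasts over the leading axis
via_loop = np.array([rotmat @ m @ rotmat.T for m in data])
```

Because rotmat is orthogonal, each rotated matrix keeps the trace of the original, which is a cheap sanity check on real data.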
I'm trying to write a recursive function to calculate matrix multiplication.
EDITED:
This is the code:
def mult_mat(x, nbr):
    result = [[2, 4],
              [1, 3]]
    if nbr == 1:
        return result
    else:
        for i in range(len(x)):
            for j in range(len(result[0])):
                for k in range(len(result)):
                    result[i][j] += x[i][k] * result[k][j]
        mult_mat(result, nbr-1)
    return result
m = [[2, 4],
     [1, 3]]
# the number of times m1 will be multiplied
n = 3
res = mult_mat(m, n)
for r in res:
    print(r)
As an example, for n = 3 I am trying to get the result:
m1 * m1 will be [[8, 20], [5, 13]] = result, and result * m1 will be [[36, 92], [23, 59]], and so on.
The output of this code is:
[10, 24]
[44, 108]
and what I want is:
[36, 92]
[23, 59]
Okay, let's understand conceptually what you want to achieve with recursion. You want to multiply a matrix, M, with itself. mult_mat(M, 2) should give M * M, so mult_mat(M, 1) just returns M itself.
In the multiplication, there are three matrices involved: x and y are the two matrices you're multiplying together, and the product is stored in result. Now, let's look at what happens for the first few multiplications.
x * x # n = 2
x * (x * x) # n = 3
# here, we initially calculate x * x,
# which we pass as y in the next stack for x * y
As you can see, for n = 2 you multiply x by itself, but for n > 2, y is different from x, so you must pass it on to the function somehow. We can code this idea as follows.
def mult_mat(x, nbr, y=None):
    if nbr == 1:
        # if y is None, it means we called `mult_mat(x, 1)`, so return `x`
        if y is not None:
            return y
        return x
    if y is None:
        y = x
    result = [[0, 0],
              [0, 0]]
    for i in range(len(x)):
        for j in range(len(result[0])):
            for k in range(len(result)):
                result[i][j] += x[i][k] * y[k][j]
    return mult_mat(x, nbr-1, result)
m = [[2, 4],
     [1, 3]]
# the number of times m1 will be multiplied
n = 3
res = mult_mat(m, n)
for r in res:
    print(r)
It may look like ugly code, and that's probably because there are better ways to achieve what you want without recursion. However, I couldn't think of a different way while using recursion. My solution follows logically from the points I laid out at the beginning.
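If you do want to keep recursion, one alternative is exponentiation by squaring: M^n = (M^(n//2))^2, times one extra M when n is odd. This is a sketch, not a drop-in replacement for the code above. It needs only O(log n) matrix multiplications instead of n - 1. The helper names mat_mult and mat_power are made up for this example, and the sketch works for any square matrix given as nested lists, not just 2x2.

```python
def mat_mult(a, b):
    # Plain-list matrix product: entry (i, j) is row i of a dotted with column j of b
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_power(m, n):
    # Recursive exponentiation by squaring: M^n = (M^(n // 2))^2, times M if n is odd
    if n < 1:
        raise ValueError("n must be >= 1")
    if n == 1:
        return m
    half = mat_power(m, n // 2)
    squared = mat_mult(half, half)
    return mat_mult(squared, m) if n % 2 else squared

print(mat_power([[2, 4], [1, 3]], 3))   # [[36, 92], [23, 59]]
```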