Python code explanation for stationary distribution of a Markov chain

I have got this code:
import numpy as np
from scipy.linalg import eig
transition_mat = np.matrix([
[.95, .05, 0., 0.],\
[0., 0.9, 0.09, 0.01],\
[0., 0.05, 0.9, 0.05],\
[0.8, 0., 0.05, 0.15]])
S, U = eig(transition_mat.T)
stationary = np.array(U[:, np.where(np.abs(S - 1.) < 1e-8)[0][0]].flat)
stationary = stationary / np.sum(stationary)
print(stationary)
[ 0.34782609 0.32608696 0.30434783 0.02173913]
But I can't understand the line:
stationary = np.array(U[:, np.where(np.abs(S - 1.) < 1e-8)[0][0]].flat)
Can anyone explain the part: U[:, np.where(np.abs(S - 1.) < 1e-8)[0][0]].flat ?
I know that the routine returns S (the eigenvalues) and U (the eigenvectors). I need to find the eigenvector corresponding to the eigenvalue 1. I wrote the code below:
for i in range(len(S)):
    if S[i] == 1.0:
        j = i
matrix = np.array(U[:, j].flat)
I am getting this output:
[ 0.6144763 0.57607153 0.53766676 0.03840477]
but it does not match the output above. Why?

How to find a stationary distribution.
Ok, I came to this post looking to see if there was a built-in method to find the stationary distribution. It looks like there's not. So, for anyone coming in from Google, this is how I would find the stationary distribution in this circumstance:
import numpy as np
#note: the matrix is row stochastic.
#A markov chain transition will correspond to left multiplying by a row vector.
Q = np.array([
[.95, .05, 0., 0.],
[0., 0.9, 0.09, 0.01],
[0., 0.05, 0.9, 0.05],
[0.8, 0., 0.05, 0.15]])
#We have to transpose so that Markov transitions correspond to right multiplying by a column vector. np.linalg.eig finds right eigenvectors.
evals, evecs = np.linalg.eig(Q.T)
evec1 = evecs[:,np.isclose(evals, 1)]
#Since np.isclose will return an array, we've indexed with an array
#so we still have our 2nd axis. Get rid of it, since it's only size 1.
evec1 = evec1[:,0]
stationary = evec1 / evec1.sum()
#np.linalg.eig can return complex eigenvalues and eigenvectors, so you'll want the real part.
stationary = stationary.real
What that one weird line is doing.
Let's break that line into parts:
#Find the eigenvalues that are really close to 1.
eval_close_to_1 = np.abs(S-1.) < 1e-8
#Find the indices of the eigenvalues that are close to 1.
indices = np.where(eval_close_to_1)
#With only a condition, np.where returns a tuple holding one index array per dimension. In this case that is a 1-tuple with an array of size 1 in it.
the_array = indices[0]
index = the_array[0]
#Now we have the index of the eigenvector with eigenvalue 1.
stationary = U[:, index]
#For some really weird reason, the person that wrote the code
#also does this step, which is completely redundant.
#It just flattens the array, but the array is already 1-d.
stationary = np.array(stationary.flat)
If you compress all these lines of code into one line you get stationary = np.array(U[:, np.where(np.abs(S-1.)<1e-8)[0][0]].flat)
If you remove the redundant stuff you get stationary = U[:, np.where(np.abs(S - 1.) < 1e-8)[0][0]]
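As a rough illustration of the indexing step above, here is a small sketch. The eigenvalue array S below is made up for the example (it is not the output of eig on the matrix in the question), and np.flatnonzero and np.argmin are shown only as possible alternatives to np.where(...)[0][0].

```python
import numpy as np

# Hypothetical eigenvalues, chosen only to illustrate the indexing.
S = np.array([0.3, 1.0 + 1e-12, -0.5])

mask = np.abs(S - 1.0) < 1e-8          # boolean array, True where S is close to 1
where_result = np.where(mask)          # a tuple with one index array (1-D input)
index_via_where = where_result[0][0]   # first index array, then its first entry

# Two alternatives that avoid the double [0][0]:
index_via_flatnonzero = np.flatnonzero(mask)[0]
index_via_argmin = np.argmin(np.abs(S - 1.0))

print(where_result, index_via_where, index_via_flatnonzero, index_via_argmin)
```

Note that np.argmin always returns an index, even when no eigenvalue is actually close to 1, so the tolerance check is the safer choice when that case matters.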
Why your code gives a different stationary vector.
As @Forzaa pointed out, your vector cannot represent a vector of probabilities because it does not sum to 1. If you divide it by its sum, you'll get the vector the original code snippet has.
Just add this line:
stationary = matrix/matrix.sum()
Your stationary distribution will then match.
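A minimal sketch of why the two outputs differ, assuming np.linalg.eig (the names raw, l2 and residual are illustrative and not from the original code). The eigenvector routine returns columns scaled to unit Euclidean length, so the raw vector is a valid eigenvector but only becomes a probability vector after dividing by its sum. The sketch also picks the eigenvalue nearest to 1 instead of testing S[i] == 1.0, because an exact floating-point comparison can fail.

```python
import numpy as np

Q = np.array([
    [0.95, 0.05, 0.00, 0.00],
    [0.00, 0.90, 0.09, 0.01],
    [0.00, 0.05, 0.90, 0.05],
    [0.80, 0.00, 0.05, 0.15],
])

evals, evecs = np.linalg.eig(Q.T)
idx = np.argmin(np.abs(evals - 1.0))   # eigenvalue closest to 1
raw = evecs[:, idx].real               # eigenvector as returned (unit L2 norm)

l2 = np.linalg.norm(raw)               # Euclidean length of the raw vector
stationary = raw / raw.sum()           # rescale so the entries sum to 1

# A stationary distribution must satisfy pi Q = pi.
residual = np.abs(stationary @ Q - stationary).max()

print(l2, stationary, residual)
```

Dividing by the sum also takes care of the sign: if the routine happens to return the eigenvector with all entries negative, the ratio is still positive.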

stationary = np.array(U[:,np.where(np.abs(S-1.) < 1e-8)[0][0]].flat)
This piece of code selects the column of U whose corresponding eigenvalue is within 1e-8 of 1, i.e. the first eigenvector for which |eigenvalue - 1| < 1e-8.

Actually, just do a simple while iteration. I'll use a random P as an example:
import numpy as np
def get_stationary(n):
    # Start from the uniform distribution over the n states.
    pi = np.full((1, n), 1 / n)
    T = np.array([[1/4, 1/2, 1/4],
                  [1/3, 0, 2/3],
                  [1/2, 0, 1/2]])
    while True:
        new_pi = np.dot(pi, T)
        # Stop once another step no longer changes the distribution.
        if np.allclose(pi, new_pi):
            return pi
        pi = new_pi
print(get_stationary(3))
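A variant of the same iteration, sketched with the transition matrix passed in as an argument, an explicit tolerance and a cap on the number of steps. The function name and the parameters tol and max_iter are illustrative choices, not part of the answer above. The approach assumes the chain is irreducible and aperiodic; for a periodic chain the iterates can oscillate and never settle, which is why the loop is bounded.

```python
import numpy as np

def stationary_by_iteration(T, tol=1e-12, max_iter=10_000):
    """Repeatedly apply pi <- pi T until pi stops changing."""
    n = T.shape[0]
    pi = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        new_pi = pi @ T
        if np.abs(new_pi - pi).max() < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("iteration did not converge; the chain may be periodic")

T = np.array([[1/4, 1/2, 1/4],
              [1/3, 0,   2/3],
              [1/2, 0,   1/2]])

pi = stationary_by_iteration(T)
print(pi)   # close to [3/8, 3/16, 7/16]
```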

Related

cv2 triangulatePoints always returns same Z value

I am trying to get 3D points using cv2.triangulatePoints, but it always returns almost the same Z value for every point, so the output has no depth.
Here is my triangulation:
def triangulate(self, proj_mat1, pts0, pts1):
    proj_mat0 = np.zeros((3,4))
    proj_mat0[:, :3] = np.eye(3)
    pts0, pts1 = self.normalize(pts0), self.normalize(pts1)
    pts4d = cv2.triangulatePoints(proj_mat0, proj_mat1, pts0.T, pts1.T).T
    pts4d /= pts4d[:, 3:]
    out = np.delete(pts4d, 3, 1)
    print(out)
    return out
Here is my projection matrix calculation:
def getP(self, rmat, tvec):
    P = np.concatenate([rmat, tvec.reshape(3, 1)], axis = 1)
    return P
Here is the part that I get rmat, tvec and call triangulation:
E, mask = cv2.findEssentialMat(np.array(aa), np.array(bb), self.K)
_, R, t, mask = cv2.recoverPose(E, np.array(aa), np.array(bb), self.K)
proj_mat1 = self.getP(R, t)
out = self.triangulate(proj_mat1, np.array(aa, dtype = np.float32), np.array(bb, dtype = np.float32))
My camera matrix:
array([[787.8113353 , 0. , 318.49905794],
[ 0. , 786.9638204 , 245.98673477],
[ 0. , 0. , 1. ]])
My projection matrix 1:
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.]])
Explanations:
aa and bb are matched points from 2 frames.
self.K is my camera matrix
rotation and translation matrices are extracted from Essential matrix
Essential matrix calculated from matched keypoints. It changes every frame.
Projection matrix 2 changes every frame.
Output after changing first projection matrix (I switched from matplotlib to pangolin as 3D visualization tool):
Output after using P1 and P2 that I mentioned in comments:
Where is my mistake? Please let me know if any further information needed. I will update my question.

Unfortunately I can't double-check this directly, but my gut feeling is that the issues you are facing are essentially due to the choice of your first projection matrix.
I did some research and found this great paper covering both theory and practice. Although it differs a little from your approach, one point is worth highlighting.
If you check carefully, the first projection matrix there is exactly the camera matrix with an additional last column equal to zero. In fact, the rotation matrix for the first camera reduces to the identity matrix and the corresponding translation vector is a null vector, so using this general formula:
P = KT
where P is the projection matrix, K the camera matrix and T the matrix obtained by placing the translation vector t next to the rotation matrix R:
T = [R|t]
you get, for the first camera:
P = K[I|0] = [K|0]
Coming back to your case, first of all I would suggest changing your first projection matrix as just described.
Also, I understand that you plan to work with something different at every frame, but if things still don't match after the suggested change, then in your shoes I'd start working with just 2 images (I think you implicitly did this already to create the correspondence between aa and bb), first calculating the matrices with your algorithm and then checking them against the ones obtained by following the article above.
In this way you would be able to work out which matrices are causing you trouble.
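To make the P = K[R|t] convention concrete, here is a NumPy-only sketch that builds both projection matrices, projects a known 3D point and recovers it with a simple linear (DLT) triangulation. It is not cv2.triangulatePoints; the helper names, the pose (R, t) and the test point are hypothetical, and only K is taken from the question. The point of the sketch is consistency: pixel coordinates go with K[R|t], whereas coordinates that were already multiplied by the inverse of K go with [R|t] alone.

```python
import numpy as np

K = np.array([[787.8113353, 0.0, 318.49905794],
              [0.0, 786.9638204, 245.98673477],
              [0.0, 0.0, 1.0]])

def projection_matrix(K, R, t):
    """P = K [R | t] for pixel coordinates."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(P0, P1, x0, x1):
    """Linear triangulation of one point from two views."""
    A = np.vstack([
        x0[0] * P0[2] - P0[0],
        x0[1] * P0[2] - P0[1],
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A (homogeneous point)
    return X[:3] / X[3]

# Hypothetical second-camera pose: pure sideways translation.
R = np.eye(3)
t = np.array([-0.2, 0.0, 0.0])

P0 = projection_matrix(K, np.eye(3), np.zeros(3))   # equals [K | 0]
P1 = projection_matrix(K, R, t)

X_true = np.array([0.3, -0.1, 4.0])                 # hypothetical 3D point
X_est = triangulate_dlt(P0, P1, project(P0, X_true), project(P1, X_true))
print(X_est)
```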

Thank you so much for all the effort @Antonino. My webcams were pretty bad. After changing every part of my code and making many trials I decided to change my webcams and bought good webcams. It worked :D Here is the result:

What type of normalization happens with sklearn

I have a matrix which I'm trying to normalize by transforming each feature column to zero mean and unit standard deviation.
I have the following code that I'm using, but I want to know if that method actually does what I'm trying to or if it uses a different method.
from sklearn import preprocessing
mat_normalized = preprocessing.normalize(mat_from_df)

sklearn.preprocessing.normalize scales each sample vector to unit norm. (The default axis is 1, not 0.) Here's proof of that:
import numpy as np
from sklearn.preprocessing import normalize
np.random.seed(444)
data = np.random.normal(loc=5, scale=2, size=(15, 2))
np.linalg.norm(normalize(data), axis=1)
# array([ 1., 1., 1., 1., 1., 1., ...
It sounds like you're looking for sklearn.preprocessing.scale, which standardizes each feature (column) to zero mean and unit variance.
from sklearn.preprocessing import scale
# Are the scaled column-wise means approx. 0.?
np.allclose(scale(data).mean(axis=0), 0.)
# True
# Are the scaled column-wise stdevs. approx. 1.?
np.allclose(scale(data).std(axis=0), 1.)
# True

Like the documentation states:
sklearn.preprocessing.normalize(X, norm='l2',
axis=1, copy=True,
return_norm=False)
Scale input vectors individually to unit norm (vector length).
So it takes the norm (by default the L2 norm) of each vector and rescales the vector to unit length.
So if we take an n×m matrix as input, the output is an n×m matrix in which every row (an m-vector) is normalized. For norm='l2' (the default), this means that the length of each row is calculated (as the square root of the sum of the squared components) and every element is divided by that length, so that the result is a vector of length 1.
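The two operations can be written out by hand to see the difference. This is a NumPy-only sketch of the arithmetic described above, not a call into scikit-learn; the variable names and the random data are illustrative. Row-wise division by the L2 norm mirrors what normalize does by default, and column-wise standardization with the population standard deviation mirrors what scale does.

```python
import numpy as np

rng = np.random.default_rng(444)
data = rng.normal(loc=5, scale=2, size=(15, 2))

# Row-wise: divide every sample (row) by its Euclidean length.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_rows = data / row_norms

# Column-wise: subtract each feature's mean and divide by its standard deviation.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.linalg.norm(unit_rows, axis=1))   # all ones
print(standardized.mean(axis=0))           # approximately zero
print(standardized.std(axis=0))            # all ones
```

The column means of unit_rows are generally not zero, which is why normalize is the wrong tool when the goal is zero mean and unit standard deviation per feature.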

Why do Mathematica and Python's answers differ when dealing with singular matrix equations?

I have been dealing with linear algebra problems of the form A = Bx in Python and comparing this to a colleague's code in MATLAB and Mathematica. We have noticed differences between Python and the others when B is a singular matrix. When using numpy.linalg.solve() I get a singular matrix error, so I've instead used .pinv() (the Moore-Penrose pseudoinverse).
I understand that storing the inverse is computationally inefficient and am first of all curious if there's a better way of dealing with singular matrices in Python. However the thrust of my question lies in how Python chooses an answer from an infinite solution space, and why it chooses a different one than MATLAB and Mathematica do.
Here is my toy problem:
import numpy as np
B = np.array([[2,4,6],[1,0,3],[0,7,0]])
A = np.array([[12],[4],[7]])
BI = np.linalg.pinv(B)
x = BI.dot(A)
The answer that Python outputs to me is:
[[ 0.4]
[ 1. ]
[ 1.2]]
While this is certainly a correct answer, it isn't the one I had intended: (1,1,1). Why does Python generate this particular solution? Is there a way to return the space of solutions rather than one possible solution? My colleague's code returned (1, 1, 1) - is there a reason that Python is different from Mathematica and MATLAB?

In short, your code (and apparently np.linalg.lstsq) uses the Moore-Penrose pseudoinverse, which is implemented in np.linalg.pinv. MATLAB and Mathematica likely use Gaussian elimination to solve the system. We can replicate this latter approach in Python using the LU decomposition:
import numpy as np
import scipy.linalg
B = np.array([[2,4,6],[1,0,3],[0,7,0]])
y = np.array([[12],[4],[7]])
P, L, U = scipy.linalg.lu(B)
This decomposes B as B = P L U, where U is now an upper-triangular matrix, and P L is invertible. In particular, we find:
>>> U
array([[ 2., 4., 6.],
[ 0., 7., 0.],
[ 0., 0., 0.]])
and
>>> np.linalg.inv(P @ L) @ y
array([[ 12.],
[ 7.],
[ 0.]])
The goal is to solve this under-determined, transformed problem, U x = (P L)^{-1} y. The solution set is the same as for our original problem. Let a solution be written as x = (x_1, x_2, x_3). From the second row, 7 x_2 = 7, so any solution must have x_2 = 1. The first row then gives 2 x_1 + 4 + 6 x_3 = 12. Solving for x_1, we get x_1 = 4 - 3 x_3. And so any solution is of the form (4 - 3 x_3, 1, x_3).
The easiest way to generate a solution for the above is to simply choose x_3 = 1. Then x_1 = 1, and you recover the solution that MATLAB gives you: (1, 1, 1).
On the other hand, np.linalg.pinv computes the Moore-Penrose pseudoinverse, which is the unique matrix satisfying the pseudoinverse properties for B. The emphasis here is on unique. Therefore, when you say:
my question lies in how Python chooses an answer from an infinite solution space
the answer is that all of the choosing is actually done by you when you use the pseudoinverse, because np.linalg.pinv(B) is a unique matrix, and hence np.linalg.pinv(B) @ y is unique. Among all solutions it is the one with the smallest Euclidean norm, which here is (0.4, 1, 1.2).
To generate the full set of solutions, see the comment above by @ali_m.
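One possible way to describe the whole solution set, sketched with NumPy only: take the minimum-norm solution as a particular solution and add any multiple of a null-space vector of B obtained from the SVD. The tolerance 1e-10 and the names x_min, null_vec and solution are assumptions made for this example.

```python
import numpy as np

B = np.array([[2.0, 4.0, 6.0],
              [1.0, 0.0, 3.0],
              [0.0, 7.0, 0.0]])
y = np.array([12.0, 4.0, 7.0])

# Particular solution: the minimum-norm one, same as np.linalg.pinv(B) @ y.
x_min = np.linalg.pinv(B) @ y

# Null space of B from the SVD: rows of Vt whose singular value is ~0.
_, s, Vt = np.linalg.svd(B)
rank = int((s > 1e-10 * s[0]).sum())
null_vec = Vt[rank:][0]            # B has rank 2, so one null direction

def solution(c):
    """Every solution has the form x_min + c * null_vec."""
    return x_min + c * null_vec

# The coefficient that reproduces the MATLAB/Mathematica answer (1, 1, 1).
c_ones = null_vec @ (np.ones(3) - x_min)
print(x_min, solution(c_ones))
```

Because null_vec is orthogonal to x_min, any nonzero c makes the solution longer, which is another way to see that the pseudoinverse picks the shortest one.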

Python - euclidean distance of all pairs of subsequences of given length from given array

Let's say I have a NumPy array [5,7,2,3,4,6] and I choose the subsequence length to be 3.
I want to get the Euclidean distances between such subsequences.
Possible subsequences are:
[5,7,2]
[7,2,3]
[2,3,4]
[3,4,6]
The (squared) distance between subsequences 1 and 3 would be calculated as (5-2)^2 + (7-3)^2 + (2-4)^2. I want to do this for all pairs of subsequences.
Is there a way to avoid loops?
My real array is quite long so the solution should be memory efficient as well.
EDIT>
To elaborate more: I have a time series of 10^5 to 10^8 elements.
The time series is growing. Each time a new point is added I need to take the L newest points and find the closest match to these points among the past points of the dataset. (But I want all the distance values, not only the closest match.)
Repeating the whole calculation is unnecessary. The distances of the "previously newest L points" can be updated by subtracting the point of age L+1 and adding the point of age 0 (the newest).
E.g. let's say the size of the time series is currently 100 and L=10. I calculate the distances of subsequence A[90:100] to all previous subsequences. When the 101st point arrives I can reuse the distances and only update them by adding the squared distances of the 101st point from the time series and subtracting the squares for the 90th point.
EDIT 2>
Thanks a lot for the ideas, looks like magic. I have one more idea that might be efficient, especially for the online time series when new elements of the time series are being added.
I am thinking about this way of updating the distances. To calculate the distances of the first subsequence of length L=4 to the matrix we need the first 4 columns of the following matrix (the triangles on top and bottom could be omitted). Then the distances would be squared and summed as shown with colors.
To obtain the distances of the second subsequence of L=4 we can actually reuse the previously calculated distances, subtract the first column (squared) from them and add the 4th column (squared). For L=4 it might not make sense, but for L=100 it might. One distance has to be calculated from scratch. (Actually 2 have to be calculated if the time series grows in size.)
This way I can keep in memory just the distances of one subsequence and update them to obtain the distances of the next subsequence.
Do you think this would be efficient with NumPy? Is there an easy way to implement it?

Assuming A as the input array and L as the length of subsequence, you can get a sliding 2D array version of A with broadcasting and then use pdist from scipy.spatial.distance, like so -
import numpy as np
from scipy.spatial.distance import pdist
# Get sliding 2D array version of input array
A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]
# Get pairwise distances with pdist
pairwise_dist = pdist(A2D,'sqeuclidean')
Please note that if you meant euclidean distances, you need to replace 'sqeuclidean' with 'euclidean' or just leave out that argument as it's the default one.
Sample run -
In [209]: # Inputs
...: A = np.array([5,7,2,3,4,6])
...: L = 3
...:
In [210]: A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]
In [211]: A2D
Out[211]:
array([[5, 7, 2],
[7, 2, 3],
[2, 3, 4],
[3, 4, 6]])
In [212]: pdist(A2D,'sqeuclidean')
Out[212]: array([ 30., 29., 29., 27., 29., 6.])
# [1] element (= 29) is (5-2)^2 + (7-3)^2 + (2-4)^2
To get the corresponding IDs, you could use np.triu_indices like so -
idx1,idx2 = np.triu_indices(A2D.shape[0],1)
And, finally show IDs alongside the distances like so -
ID_dist = np.column_stack((idx1,idx2,pairwise_dist))
Sample run -
In [201]: idx1,idx2
Out[201]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
In [202]: np.column_stack((idx1,idx2,pairwise_dist))
Out[202]:
array([[ 0., 1., 30.],
[ 0., 2., 29.], # This was your (5-2)^2 + (7-3)^2 + (2-4)^2
[ 0., 3., 29.],
[ 1., 2., 27.],
[ 1., 3., 29.],
[ 2., 3., 6.]])
For cases where you are dealing with millions of elements in A and L is in the hundreds, it might be a better idea to compute the pairwise differences of such subsequences one pair at a time in a loop, like so -
# Get pairwise IDs
idx1,idx2 = np.triu_indices(A.size-L+1,1)
# Store range array for L as it would be used frequently in the loop
R = np.arange(L)
# Initialize output array and start computing
pairwise_dist = np.empty(len(idx1))
for i in range(len(idx1)):
    pairwise_dist[i] = ((A[R+idx2[i]] - A[R+idx1[i]])**2).sum()
You can also use np.einsum to get us the squared summations at each iteration, like so -
diffs = A[R+idx2[i]] - A[R+idx1[i]]
pairwise_dist[i] = np.einsum('i,i->',diffs,diffs)
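For the online use case in the question (distances from the newest window to every window), a possible sketch uses np.lib.stride_tricks.sliding_window_view, which is available in NumPy 1.20 and later and returns a view instead of copying the windows. The variable names are illustrative. The subtraction still allocates a temporary array of shape (number of windows, L), so for very long series it may need to be done in chunks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.array([5, 7, 2, 3, 4, 6], dtype=float)
L = 3

windows = sliding_window_view(A, L)    # shape (len(A) - L + 1, L), no copy
newest = windows[-1]                   # the L most recent points

# Squared Euclidean distance from the newest window to every window.
sq_dist_to_newest = ((windows - newest) ** 2).sum(axis=1)
print(sq_dist_to_newest)               # [29. 29.  6.  0.]
```

The last entry is the distance of the newest window to itself and is always zero.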

"extended" IFFT

If I have a waveform x such as
x = [math.sin(W*t + Ph) for t in range(16)]
with arbitrary W and Ph, and I calculate its (Real) FFT f with
f = numpy.fft.rfft(x)
I can get the original x with
numpy.fft.irfft(f)
Now, what if I need to extend the range of the recovered waveform a number of samples to the left and to the right? I.e. a waveform y such that len(y) == 48, y[16:32] == x and y[0:16], y[32:48] are the periodic extensions of the original waveform.
In other words, if the FFT assumes its input is an infinite function f(t) sampled over t = 0, 1, ... N-1, how can I recover the values of f(t) for t<0 and t>=N?
Note: I used a perfect sine wave as an example, but in practice x could be anything: arbitrary signals such as x = range(16) or x = np.random.rand(16), or a segment of any length taken from a random .wav file.

Now, what if I need to extend the range of the recovered waveform a number of samples to the left and to the right? I.e. a waveform y such that len(y) == 48, y[16:32] == x and y[0:16], y[32:48] are the periodic extensions of the original waveform.
The periodic extensions are also just copies of x, precisely because they are periodic extensions.
In other words, if the FFT assumes its input is an infinite function f(t) sampled over t = 0, 1, ... N-1, how can I recover the values of f(t) for t<0 and t>=N?
The "N-point FFT assumes" that your signal is periodic with a period of N. That's because all the harmonic basis functions your block is decomposed into are periodic, in the sense that the previous N and succeeding N samples are just a copy of the main N samples.
If you allow any value for W, your input sinusoid won't be periodic with a period of N. But that does not stop the FFT function from decomposing it into a sum of many periodic sinusoids. And the sum of sinusoids with a period of N will also have a period of N.
Clearly, you have to rethink the problem.
Maybe you could make use of linear prediction. Compute a couple of linear prediction coefficients based on your fragment's windowed auto-correlation and the Levinson-Durbin recursion and extrapolate using those prediction coefficients. However, for a stable prediction filter, the prediction will converge to zero and the speed of convergence depends on what kind of signal you have. The perfect linear prediction coefficients for white noise, for example, are all zero. In that case you would "extrapolate" zeros to the left and the right. But there's not much you can do about it. If you have white noise, there is just no information in your fragment about surrounding samples because all the samples are independent (that's what white noise is about).
This kind of linear prediction is actually able to predict sinusoid samples perfectly. So, if your input is sin(W*t+p) for arbitrary W and p you will only need linear prediction with order two. For more complex signals I suggest an order of 10 or 16.
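A small sketch of the order-two claim for a pure sinusoid. It swaps in a plain least-squares fit of the prediction coefficients (np.linalg.lstsq) instead of the windowed autocorrelation and Levinson-Durbin recursion mentioned above, which is simpler for noise-free data. The function names and the values of W and Ph are hypothetical. For the left extension the same procedure is applied to the reversed signal.

```python
import numpy as np

def fit_lp(x, order):
    """Least-squares fit of x[n] ~ sum_k a[k] * x[n-1-k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds x[n-1-k] for n = order .. len(x)-1.
    past = np.column_stack([x[order - 1 - k : n - 1 - k] for k in range(order)])
    coeffs, *_ = np.linalg.lstsq(past, x[order:], rcond=None)
    return coeffs

def extend_right(x, coeffs, n_extra):
    """Append n_extra predicted samples to x."""
    out = list(np.asarray(x, dtype=float))
    order = len(coeffs)
    for _ in range(n_extra):
        recent = out[-1 : -order - 1 : -1]      # x[n-1], x[n-2], ...
        out.append(float(np.dot(coeffs, recent)))
    return np.array(out)

W, Ph = 0.7, 0.3                                # hypothetical frequency and phase
x = np.sin(W * np.arange(16) + Ph)

right = extend_right(x, fit_lp(x, 2), 16)[16:]
x_rev = x[::-1]
left = extend_right(x_rev, fit_lp(x_rev, 2), 16)[16:][::-1]

y = np.concatenate([left, x, right])            # len(y) == 48, y[16:32] == x
print(np.abs(y - np.sin(W * np.arange(-16, 32) + Ph)).max())
```

For a sinusoid the fitted coefficients come out close to (2 cos W, -1), which is the exact recurrence sin(Wn + p) = 2 cos(W) sin(W(n-1) + p) - sin(W(n-2) + p). With noisy or more complex signals the extrapolation will drift, as described above.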

The following examples should give you a good idea of how to go about it:
>>> x1 = np.random.rand(4)
>>> x2 = np.concatenate((x1, x1))
>>> x3 = np.concatenate((x1, x1, x1))
>>> np.fft.rfft(x1)
array([ 2.30410617+0.j , -0.89574460-0.26838271j, -0.26468792+0.j ])
>>> np.fft.rfft(x2)
array([ 4.60821233+0.j , 0.00000000+0.j ,
-1.79148921-0.53676542j, 0.00000000+0.j , -0.52937585+0.j ])
>>> np.fft.rfft(x3)
array([ 6.91231850+0.j , 0.00000000+0.j ,
0.00000000+0.j , -2.68723381-0.80514813j,
0.00000000+0.j , 0.00000000+0.j , -0.79406377+0.j ])
Of course the easiest way to get three periods is to concatenate 3 copies of the inverse FFT in the time domain:
np.concatenate((np.fft.irfft(f),) * 3)
But if you want or have to do this in the frequency domain, you can do the following:
>>> a = np.arange(4)
>>> f = np.fft.rfft(a)
>>> n = 3
>>> ext_f = np.zeros(((len(f) - 1) * n + 1,), dtype=f.dtype)
>>> ext_f[::n] = f * n
>>> np.fft.irfft(ext_f)
array([ 0., 1., 2., 3., 0., 1., 2., 3., 0., 1., 2., 3.])
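The frequency-domain trick above can be wrapped in a small helper. This is a sketch with a hypothetical function name; it passes the output length to irfft explicitly so that inputs of odd length are handled as well, which the snippet above does not need because its input has even length.

```python
import numpy as np

def repeat_via_spectrum(x, n):
    """Return n periods of x by spreading its rfft bins n apart."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    f = np.fft.rfft(x)
    ext_f = np.zeros((N * n) // 2 + 1, dtype=complex)
    # Bin k of x moves to bin k*n; all other bins stay zero.
    ext_f[: (len(f) - 1) * n + 1 : n] = f * n
    return np.fft.irfft(ext_f, n=N * n)

a = np.arange(4)
y_even = repeat_via_spectrum(a, 3)

b = np.array([1.0, -2.0, 0.5, 3.0, 4.0])        # odd length
y_odd = repeat_via_spectrum(b, 2)

print(y_even)
print(y_odd)
```

The factor n compensates for the 1/(N*n) normalization of the longer inverse transform.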

For stationary waveforms that are periodic in the FFT aperture or length, you can just cyclically repeat the waveform, or the IFFT(FFT()) re-synthesized equivalent waveform, to extend them in the time domain. For waveforms which are windowed in time from sources that are not periodic in the FFT aperture or length, the FFT result will be the spectrum convolved with a Sinc function. So some sort of equivalent to a de-convolution will be required to recover the original un-windowed spectral content. Since this deconvolution is difficult or impossible, most commonly an analysis/re-synthesis method is used instead, such as a phase-vocoder process or other frequency estimators. Then those estimated frequencies, which may be different from those in the bins of a single raw FFT result, can be fed to a bank of sinusoidal synthesizers, a mix of phase-modified IFFTs, or other re-synthesis methods, to create a longer waveform with approximately the same spectral content.
