How to distinguish if Python or Matlab is wrong/faulty? - python

I am trying to use SVD and an Eigendecomposition for some data analysis using Dynamic Mode Decomposition. I am running into a simple problem of getting different results from Matlab and Python. I'm confused and don't know why Python is giving me wrong results/matrix values but everything looks (I think IS) correct.
So instead of using real data this time and looking at large data sets, I generated data. I will try to look at an eigenvalue plot after the eigendecomposition. I also use a delay embedding for the data because I will work with a data vector which is only (2x100), so I will perform a type of Hankel matrix to enrich the data with 10 delays.
clear all; close all; clc;
data = linspace(1,100);
data2 = linspace(2,101);
data = [data;data2];
numDelays = 10;
relTol= 10^-6;
%% Create first and second snap shot matrices for DMD. Any columns with missing
% data are not used.
disp('Constructing Data Matricies:')
X = zeros((numDelays+1)*size(data,1),size(data,2)-(numDelays+1));
Y = zeros(size(X));
for i = 1:numDelays+1
X(1 + (i-1)*size(data,1):i*size(data,1),:) = ...
data(:,(i):size(data,2)-(numDelays+1) + (i-1));
Y(1 + (i-1)*size(data,1):i*size(data,1),:) = ...
data(:,(i+1):size(data,2)-(numDelays+1) + (i));
end
[U,S,V] = svd(X);
r = find(diag(S)>S(1,1)*relTol,1,'last');
disp(['DMD subspace dimension:',num2str(r)])
U = U(:,1:r);
S = S(1:r,1:r);
V = V(:,1:r);
Atil = (U'*Y)*V*(S^-1);
[what,lambda] = eig(Atil);
Phi = (Y*V)*(S^-1)*what;
Keigs = diag(lambda);
tt = linspace(0,2*pi,101);
figure;
plot(real(Keigs),imag(Keigs),'ro')
hold on
plot(cos(tt),sin(tt),'--')
import scipy.io as sc
import math as m
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys
from numpy import dot, multiply, diag, power, pi, exp, sin, cos, cosh, tanh, real, imag
from scipy.linalg import expm, sinm, cosm, fractional_matrix_power, svd, eig, inv
def dmd(X, Y, relTol):
U2,Sig2,Vh2 = svd(X, False) # SVD of input matrix
S = np.zeros((Sig2.shape[0], Sig2.shape[0])) # Create S matrix with zeros based on Diag of S
np.fill_diagonal(S, Sig2) # Fill diagonal of S matrix with the nonzero values
r = np.count_nonzero(np.diag(S) > S[0,0] * relTol) # rank truncation
U = U2[:,:r]
Sig = diag(Sig2)[:r,:r] #GOOD =)
V = Vh2.conj().T[:,:r]
Atil = dot(dot(dot(U.conj().T, Y), V), inv(Sig)) # build A tilde
print(Atil)
mu,W = eig(Atil)
Phi = dot(dot(dot(Y, V), inv(Sig)), W) # build DMD modes
return mu, Phi
data = np.array([(np.linspace(1,100,100)),(np.linspace(2,101,100))])
Data = np.array(data)
######### Choose number of Delays ###########
# observable (coordinates of feature points). Setting to zero means only
# experimental observables will be used.
numDelays = 10
relTol = 10**-6
########## Create Data Matrices for DMD ###############
# Create first and second snap shot matrices for DMD. Any columns with missing
# data are not used.
X = np.zeros(((numDelays + 1) * data.shape[0], data.shape[1] - (numDelays + 1)))
Y = np.zeros(X.shape)
for i in range(1, numDelays + 2):
X[0 + (i - 1) * Data.shape[0]:i * Data.shape[0], :] = Data[:, (i):Data.shape[1] - (numDelays + 1) + (i - 0)]
Y[0 + (i - 1) * Data.shape[0]:i * Data.shape[0], :] = Data[:, (i + 0):Data.shape[1] - (numDelays + 1) + (i)]
Keigs, Phi = dmd(X, Y, relTol)
tt = np.linspace(0,2*np.pi,101)
plt.figure()
plt.plot(np.cos(tt),np.sin(tt),'--')
plt.plot(Keigs.real,Keigs.imag,'ro')
plt.title('DMD Eigenvalues')
plt.xlabel(r'Real $\ lambda$')
plt.ylabel(r'Imaginary $\ lambda$')
# plt.axes().set_aspect('equal')
plt.show()
So in matlab and python, I get my eigenvalues to all sit on the unit circle (as expect) and I get precisely one, sitting at 1.
So the problem comes when I look at the matrices from SVD, they appear to have different values. The only matrix that is the same is the 'S or Sig' matrix. The rest will differ a number or +/- sign. The biggest thing that peaked my interest is the Atil matrix.
In matlab, it looks like,
[1.0157, -0.3116; 7.91229e-4, 0.9843]
And python it looks like,
[1.0, -4.508e-15; -4.439e-18, 1.0]
Now this may look slightly off due to numerical error possibly but when I look at real data and these differ, it messes up my analysis.

SVD of a non-square matrix is not unique in U and V. Even if you have a square matrix with non-zero, non-degenerate singular values, singular vectors in U and V are only unique up to a sign factor.
https://math.stackexchange.com/questions/644327/how-unique-on-non-unique-are-u-and-v-in-singular-value-decomposition-svd
Moreover, Matlab (LAPACK + BLAS) and scipy.linalg.svd may use different algorithms for SVD.
This can lead to the differences you have experienced.

Related

How to apply crank-nicolson method in python to a wave equation like schrodinger's

I'm trying to do a particle in a box simulation with no potential field. Took me some time to find out that simple explicit and implicit methods break unitary time evolution so I resorted to crank-nicolson, which is supposed to be unitary. But when I try it I find that it still is not so. I'm not sure what I'm missing.. The formulation I used is this:
where T is the tridiagonal Toeplitz matrix for the second derivative wrt x and
The system simplifies to
The A and B matrices are:
I just solve this linear system for using the sparse module. The math makes sense and I found the same numeric scheme in some papers so that led me to believe my code is where the problem is.
Here's my code so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz
from scipy.sparse.linalg import spsolve
from scipy import sparse
# Spatial discretisation
N = 100
x = np.linspace(0, 1, N)
dx = x[1] - x[0]
# Time discretisation
K = 10000
t = np.linspace(0, 10, K)
dt = t[1] - t[0]
alpha = (1j * dt) / (2 * (dx ** 2))
A = sparse.csc_matrix(toeplitz([1 + 2 * alpha, -alpha, *np.zeros(N-4)]), dtype=np.cfloat) # 2 less for both boundaries
B = sparse.csc_matrix(toeplitz([1 - 2 * alpha, alpha, *np.zeros(N-4)]), dtype=np.cfloat)
# Initial and boundary conditions (localized gaussian)
psi = np.exp((1j * 50 * x) - (200 * (x - .5) ** 2))
b = B.dot(psi[1:-1])
psi[0], psi[-1] = 0, 0
for index, step in enumerate(t):
# Within the domain
psi[1:-1] = spsolve(A, b)
# Enforce boundaries
# psi[0], psi[N - 1] = 0, 0
b = B.dot(psi[1:-1])
# Square integration to show if it's unitary
print(np.trapz(np.abs(psi) ** 2, dx))
You are relying on the Toeplitz constructor to produce a symmetric matrix, so that the entries below the diagonal are the same as above the diagonal. However, the documentation for scipy.linalg.toeplitz(c, r=None) says not "transpose", but
*"If r is not given, r == conjugate(c) is assumed."
so that the resulting matrix is self-adjoint. In this case this means that the entries above the diagonal have their sign switched.
It makes no sense to first construct a dense matrix and then extract a sparse representation. Construct it as sparse tridiagonal matrix from the start, using scipy.sparse.diags
A = sparse.diags([ (N-3)*[-alpha], (N-2)*[1+2*alpha], (N-3)*[-alpha]], [-1,0,1], format="csc");
B = sparse.diags([ (N-3)*[ alpha], (N-2)*[1-2*alpha], (N-3)*[ alpha]], [-1,0,1], format="csc");

Any suggestions for a Python Lasso solver that handles complex values?

I am looking for a Python Lasso solver that works with complex numbers to use in beamforming problems. The objective function is affine, XW - Y. I believe that there at least one such solver implemented for Matlab,
http://www.cs.ubc.ca/~schmidtm/Software/code.html
I have tried to use scikit-learn MultiTaskLasso, following a suggestion from
Is it possible to use complex numbers as target labels in scikit learn?
The matrix 21 norm in the MultiTaskLasso is the correct way to handle the L1 norm for complex numbers. However, my approach requires some gymnastics to force the solver to follow the rules of complex multiplication. Essentially, I need to minimize the L2 norm of
[Re{X}, Im{X}] * [[Re{W}, Im{W}], [-Im{W}, Re{W}]] - [Re{Y}, Im{Y}]
I attempted to enforce the relationship between the two columns of W by adding another row to the X matrix, [-Im{X}, Re{X}], and the row [-Im{Y}, Re{Y}] to Y. This ideally would equate the cost of a change a value each column of W with the corresponding value in the other column
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import MultiTaskLasso
# experiment specifications
numEl = 20
numLook = 1e2
arrayPosition = np.arange(numEl) * np.pi
thetaLook = np.r_[0 : np.pi : numLook * 1j] - (np.pi / 2)
thetaSource = -0.3758
# Make grid of look vectors
W = np.exp(1j * np.sin(thetaLook)[:,None] * arrayPosition)
data = np.exp(1j * np.sin(thetaSource) * arrayPosition).T
# Bartlet beamformer
bartlet = np.abs(np.dot(W.conj(), data))**2
B_bart = 10 * np.log10(np.abs(bartlet)); B_bart-=np.max(B_bart)
# Lasso setup
X = W.T
XSplit = np.vstack((np.hstack((X.real, X.imag)),\
np.hstack((-X.imag, X.real))))
YSplit = np.hstack((np.vstack((data.real, data.imag)),\
np.vstack((-data.imag, data.real)))).T
lasso_solver = MultiTaskLasso(alpha=0.1)
lasso = lasso_solver.fit(XSplit, YSplit).coef_
# Manipulate result back into complex values
stack1 = np.squeeze(lasso[0,:])
stack1 = np.squeeze(stack1[:numLook] + 1j * stack1[numLook:])
B_lasso = 10 * np.log10(np.abs(stack1) + np.spacing(1)); B_lasso -= np.max(B_lasso)
# stack1 ?= stack2 (Should be exact)
stack2 = np.squeeze(lasso[1,:])
stack2 = np.squeeze(-1j * stack2[:numLook] + stack2[numLook:])
np.testing.assert_almost_equal(stack1, stack2, decimal=1)
# Plot both beamformer results
_ = plt.plot(np.rad2deg(thetaLook), B_bart)
_ = plt.plot(np.rad2deg(thetaLook), B_lasso, 'r.')
_ = plt.ylim(-40,3); plt.ylabel('Beamformer Output, dB')
_ = plt.xlabel('Look Direction, deg')
While this approach seems to work for simple problems like the one above, it fails when the problems get more complicated. I define failure when the relationship between the first and second column of W no longer holds. A simple way to create small divergent behavior in the above example is to substitute a Ridge solver for the MultiTaskLasso.
Does anyone know of a Lasso solver that can solve the complex valued problem with rigorous treatment of complex numbers?

Issues Translating Custom Discrete Fourier Transform from MATLAB to Python

I'm developing Python software for someone and they specifically requested that I use their DFT function, written in MATLAB, in my program. My translation is just plain not working, tested with sin(2*pi*r).
The MATLAB function below:
function X=dft(t,x,f)
% Compute DFT (Discrete Fourier Transform) at frequencies given
% in f, given samples x taken at times t:
% X(f) = sum { x(k) * e**(2*pi*j*t(k)*f) }
% k
shape = size(f);
t = t(:); % Format 't' into a column vector
x = x(:); % Format 'x' into a column vector
f = f(:); % Format 'f' into a column vector
W = exp(-2*pi*j * f*t');
X = W * x;
X = reshape(X,shape);
And my Python interpretation:
def dft(t, x, f):
i = 1j #might not have to set it to a variable but better safe than sorry!
w1 = f * t
w2 = -2 * math.pi * i
W = exp(w1 * w2)
newArr = W * x
return newArr
Why am I having issues? The MATLAB code works fine but the Python translation outputs a weird increasing sine curve instead of a Fourier transform. I get the feeling Python is handling the calculations slightly differently but I don't know why or how to fix this.
Here's your MATLAB code -
t = 0:0.005:10-0.005;
x = sin(2*pi*t);
f = 30*(rand(size(t))+0.225);
shape = size(f);
t = t(:); % Format 't' into a column vector
x = x(:); % Format 'x' into a column vector
f = f(:); % Format 'f' into a column vector
W = exp(-2*pi*1j * f*t'); %//'
X = W * x;
X = reshape(X,shape);
figure,plot(f,X,'ro')
And here's one version of numpy ported code might look like -
import numpy as np
from numpy import math
import matplotlib.pyplot as plt
t = np.arange(0, 10, 0.005)
x = np.sin(2*np.pi*t)
f = 30*(np.random.rand(t.size)+0.225)
N = t.size
i = 1j
W = np.exp((-2 * math.pi * i)*np.dot(f.reshape(N,1),t.reshape(1,N)))
X = np.dot(W,x.reshape(N,1))
out = X.reshape(f.shape).T
plt.plot(f, out, 'ro')
MATLAB Plot -
Numpy Plot -
Numpy arrays do element wise multiplication with *.
You need np.dot(w1,w2) for matrix multiplication using numpy arrays (not the case for numpy matrices)
Make sure you are clear on the distinction between Numpy arrays and matrices. There is a good help page "Numpy for Matlab Users":
http://wiki.scipy.org/NumPy_for_Matlab_Users
Doesn't appear to be working at present so here is a temporary link.
Also, use t.T to transpose a numpy array called t.

Do I underestimate the power of NumPy.. again?

I don't think I can optimize my function anymore, but it won't be my first time that I underestimate the power of NumPy.
Given:
2 rank NumPy array with coordinates
1 rank NumPy array with elevation of each coordinate
Pandas DataFrame with stations
Function:
def Function(xy_coord):
# Apply a KDTree search for (and select) 8 nearest stations
dist_tree_real, ix_tree_real = tree.query(xy_coord, k=8, eps=0, p=1)
df_sel = df.ix[ix_tree_real]
# Fits multi-linear regression to find coefficients
M = np.vstack((np.ones(len(df_sel['POINT_X'])),df_sel['POINT_X'], df_sel['POINT_Y'],df_sel['Elev'])).T
b1,b2,b3 = np.linalg.lstsq(M,df_sel['TEMP'])[0][1:4]
# Compute IDW using the coefficients
return sum( (1/dist_tree_real)**2)**-1 * sum((df_sel['TEMP'] + (b1*(xy_coord[0] - df_sel['POINT_X'])) +
(b2*(xy_coord[1]-df_sel['POINT_Y'])) + (b3*(dem[index]-df_sel['Elev']))) *
(1/dist_tree_real)**2)
And I apply the function on the coordinates as follow:
for index, coord in enumerate(xy):
outarr[index] = func(coord)
This is an iterative process, if I try this outarr = np.vectorize(func)(xy) then Python crashes, so I guess that's something I should avoid doing.
I also prepared an IPython Notebook, so I could write LaTeX, something I've always dreamed of doing for a long time. Till now. The day has come. Yeah
Off topic: the math won't show up in the nbviewer.. on my local machine it looks like this:
My suggest is don't use DataFrame for the calculation, use numpy array only. Here is the code:
dist, idx = tree.query(xy, k=8, eps=0, p=1)
columns = ["POINT_X", "POINT_Y", "Elev", "TEMP"]
px, py, elev, tmp = df[columns].values.T[:, idx, None]
tmp = np.squeeze(tmp)
one = np.ones_like(px)
m = np.concatenate((one, px, py, elev), axis=-1)
mtm = np.einsum("ijx,ijy->ixy", m, m)
mty = np.einsum("ijx,ij->ix", m, tmp)
b1,b2,b3 = np.linalg.solve(mtm, mty)[:, 1:].T
px, py, elev = px.squeeze(), py.squeeze(), elev.squeeze()
b1 = b1[:,None]
b2 = b2[:,None]
b3 = b3[:,None]
rdist = (1/dist)**2
t0 = tmp + b1*(xy[:,0,None]-px) + b2*(xy[:,1,None]-py) + b3*(dem[:,None]-elev)
outarr = (t0*rdist).sum(1) / rdist.sum(1)
print outarr
output:
[ -499.24287422 -540.28111668 -512.43789349 -589.75389439 -411.65598912
-233.1779803 -1249.63803291 -232.4924416 -273.3978919 -289.35240473]
There are some trick in the code:
np.linalg.solve in numpy 1.8 is a generalized ufunc that can solve many linear equations by one call, but lstsq is not. So I need use solve to calculate lstsq.
To do many matrix multiply by one call, we can't use dot, einsum() does the trick, but I think it may be slower than dot. You can timeit for your real data.

How to change elements in sparse matrix in Python's SciPy?

I have built a small code that I want to use for solving eigenvalue problems involving large sparse matrices. It's working fine, all I want to do now is to set some elements in the sparse matrix to zero, i.e. the ones in the very top row (which corresponds to implementing boundary conditions). I can just adjust the column vectors (C0, C1, and C2) below to achieve that. However, I wondered if there is a more direct way. Evidently, NumPy indexing does not work with SciPy's sparse package.
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt
#discretize x-axis
N = 11
x = np.linspace(-5,5,N)
print(x)
V = x * x / 2
h = len(x)/(N)
hi2 = 1./(h**2)
#discretize Schroedinger Equation, i.e. build
#banded matrix from difference equation
C0 = np.ones(N)*30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
diagonals = np.array([-2,-1,0,1,2])
H = sp.spdiags([C2, C1, C0,C1,C2],[-2,-1,0,1,2], N, N)
H *= hi2 * (- 1./12.) * (- 1. / 2.)
#solve for eigenvalues
EV = la.eigsh(H,return_eigenvectors = False)
#check structure of H
plt.figure()
plt.spy(H)
plt.show()
This is a visualisation of the matrix that is build by the code above. I want so set the elements in the first row zero.
As suggested in the comments, I'll post the answer that I found to my own question. There are several matrix classes in in SciPy's sparse package, they are listed here. One can convert sparse matrices from one class to another. So for what I need to do, I choose to convert my sparse matrix to the class csr_matrix, simply by
H = sp.csr_matrix(H)
Then I can set the elements in the first row to 0 by using the regular NumPy notation:
H[0,0] = 0
H[0,1] = 0
H[0,2] = 0
For completeness, I post the full modified code snippet below.
#SciPy Sparse linear algebra takes care of sparse matrix computations
#http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt
#discretize x-axis
N = 1100
x = np.linspace(-100,100,N)
V = x * x / 2.
h = len(x)/(N)
hi2 = 1./(h**2)
#discretize Schroedinger Equation, i.e. build
#banded matrix from difference equation
C0 = np.ones(N)*30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
H = sp.spdiags([C2, C1, C0, C1, C2],[-2,-1,0,1,2], N, N)
H *= hi2 * (- 1./12.) * (- 1. / 2.)
H = sp.csr_matrix(H)
H[0,0] = 0
H[0,1] = 0
H[0,2] = 0
#check structure of H
plt.figure()
plt.spy(H)
plt.show()
EV = la.eigsh(H,return_eigenvectors = False)
Using lil_matrix is much more efficient in scipy to change elements than simple numpy method.
H = sp.csr_matrix(H)
HL = H.tolil()
HL[1,1] = 5 # same as the numpy indexing notation
print HL
print HL.todense() # if numpy style matrix is required
H = HL.tocsr() # if csr is required

Categories