LASSO regression result different in Matlab and Python - python

I am now trying to learn the ADMM algorithm (Boyd 2010) for LASSO regression.
I found out a very good example on this page.
The matlab code is shown here.
I tried to convert it into python language so that I could develop a better understanding.
Here is the code:
import scipy.io as io
import scipy.sparse as sp
import scipy.linalg as la
import numpy as np
def l1_norm(x):
return np.sum(np.abs(x))
def l2_norm(x):
return np.dot(x.ravel().T, x.ravel())
def fast_threshold(x, threshold):
return np.multiply(np.sign(x), np.fmax(abs(x) - threshold, 0))
def lasso_admm(X, A, gamma):
c = X.shape[1]
r = A.shape[1]
C = io.loadmat("C.mat")["C"]
L = np.zeros(X.shape)
rho = 1e-4
maxIter = 200
I = sp.eye(r)
maxRho = 5
cost = []
for n in range(maxIter):
B = la.solve(np.dot(A.T, A) + rho * I, np.dot(A.T, X) + rho * C - L)
C = fast_threshold(B + L / rho, gamma / rho)
L = L + rho * (B - C);
rho = min(maxRho, rho * 1.1);
cost.append(0.5 * l2_norm(X - np.dot(A, B)) + gamma * l1_norm(B))
cost = np.array(cost).ravel()
return B, cost
data = io.loadmat("lasso.mat")
A = data["A"]
X = data["X"]
B, cost = lasso_admm(X, A, gamma)
I have found the loss function did not converge after 100+ iterations. Matrix B did not tend to be sparse, on the other hand, the matlab code worked in different situations.
I have checked with different input data and compared with Matlab outputs, yet I still could not get hints.
Could anybody take a try?
Thank you in advance.

My gut feeling as to why this is not working to your expectations is your la.solve() call. la.solve() assumes that the matrix is full rank and is independent (i.e. invertible). When you use \ in MATLAB, what MATLAB does under the hood is that if the matrix is full rank, the exact inverse is found. However, should the matrix not be this way (i.e. overdetermined or underdetermined), the solution to the system is solved by least-squares instead. I would suggest you modify that call so that you're using lstsq instead of solve. As such, simply replace your la.solve() call with this:
sol = la.lstsq(np.dot(A.T, A) + rho * I, np.dot(A.T, X) + rho * C - L)
B = sol[0]
Note that lstsq returns a whole bunch of outputs in a 4-element tuple, in addition to the solution. The solution of the system is in the first element of this tuple, which is why I did B = sol[0]. What is also returned are the sums of residues (second element), the rank (third element) and the singular values of the matrix you are trying to invert when solving (fourth element).
Also some peculiarities that I have noticed:
One thing that may or may not matter is the random generation of numbers. MATLAB and Python NumPy generate random numbers differently, so this may or may not affect your solution.
In MATLAB, Simon Lucey's code initializes L to be a zero matrix such that L = zeros(size(X));. However, in your Python code, you initialize L to be this way: L = np.zeros(C.shape);. You are using different variables to ascertain the shape of L. Obviously, the
code wouldn't work if there was a dimension mismatch, but that's another thing that's different. Not sure if this will affect your solution either.
So far I haven't found anything out of the ordinary, so try that fix and let me know.

Related

Derivatives in python

I am trying to find the coefficients of a finite series, $f(x) = \sum_n a_nx^n$. To get the $m$th coefficient, we can take the $m$th derivative evaluated at zero. Therefore, the $m$th coefficient is
$$
a_n = \frac{1}{2\pi i } \oint_C \frac{f(z)}{z^{n+1}} dz
$$
I believe this code takes the derivative of a function using the above contour integral.
import math
import numpy
import matplotlib.pyplot as plt
def F(x):
mean=10
return math.exp(mean*(x.real-1))
def p(n):
mean=10
return (math.pow(mean, n) * math.exp(-mean)) / math.factorial(n)
def integration(func, a, n, r, n_steps):
z = r * numpy.exp(2j * numpy.pi * numpy.arange(0, 1, 1. / n_steps))
return math.factorial(n) * numpy.mean(func(a + z) / z**n)
ns = list(range(20))
f2 = numpy.vectorize(F)
plt.plot(ns,[p(n) for n in ns], label='Actual')
plt.plot(ns,[integration(f2, a=0., n=n, r=1., n_steps=100).real/math.factorial(n) for n in ns], label='Numerical derivative')
plt.legend()
However, it is clear that the numerical derivative is completely off the actual values of the coefficients of the series. What am I doing wrong?
The formulas in the Mathematics Stack Exchange answer that you're using to derive the coefficients of the power series expansion of F are based on complex analysis - coming for example from Cauchy's residue theorem (though other derivations are possible). One of the assumptions necessary to make those formulas work is that you have a holomorphic (i.e., complex differentiable) function.
Your definition of F gives a function that's not holomorphic. (For one thing, it always gives a real result for any complex input, which isn't possible for a non-constant holomorphic function.) But it's easily fixed to be holomorphic, while continuing to return the same result for real inputs.
Here's a fixed version of F, which replaces x.real with x. Since the input to exp is now complex, it's also necessary to use cmath.exp instead of math.exp to avoid a TypeError:
import cmath
def F(x):
mean=10
return cmath.exp(mean*(x-1))
After that fix for F, if I run your code I get rather surprisingly accurate results. Here's the graph that I get. (I had to print out the values to double check that that graph really did show two lines on top of one another.)

Solving coupled complex ODEs with Python (Propagating two time-dependent signals in space)

Im trying to simulate wave-propagation in nonlinear material. I consider a fundamental wave ![A(z, t)] which enters the material with a defined signal-shape for z0, and a second harmonic wave B(z, t) with B(z0, t) = 0. Nonlinear effects in the material causing the fundamental wave A(z, t) to generate second harmonics B(z, t) while travelling though the material in z-direction.
the generation can be described as follows:
I tried to implement this using Pythons odeint like this:
def NL_coupling(t, Amp, c):
A, B = Amp
rho1, rho2 = c
return [1j * rho1 * B * np.conj(A),
1j * rho2 * A * A]
solveNonlinear = integrate.ode(NL_coupling).set_integrator('zvode', method='bdf')
solveNonlinear.set_f_params(paramCoupling).set_initial_value([A, B], 0)
When I try to iterate the solver, I get the following error:
ZWORK length needed, LENZW (=I1), exceeds LZW (=I2)
I used the same solver before to propagate a single wave without coupling which worked fine. keep in mind that A and B are (complex) arrays containing the time-resolved pulseform.
So how could I possibly do this?
I tried solving the ODE for a single timepoint t, so that [A, B] is a list containing only two elements, than iterating over the whole signal. But since the solver does not calculate the value for the exact same point every time, this method does not really provide useful results.
Thanks for your help

Integrate a 2D vectorfield-array (reversing np.gradient)

i have the following problem:
I want to integrate a 2D array, so basically reversing a gradient operator.
Assuming i have a very simple array as follows:
shape = (60, 60)
sampling = 1
k_mesh = np.meshgrid(np.fft.fftfreq(shape[0], sampling), np.fft.fftfreq(shape[1], sampling))
Then i construct my vectorfield as a complex-valued arreay (x-vector = real part, y-vector = imaginary part):
k = k_mesh[0] + 1j * k_mesh[1]
So the real part for example looks like this
Now i take the gradient:
k_grad = np.gradient(k, sampling)
I then use Fourier transforms to reverse it, using the following function:
def freq_array(shape, sampling):
f_freq_1d_y = np.fft.fftfreq(shape[0], sampling[0])
f_freq_1d_x = np.fft.fftfreq(shape[1], sampling[1])
f_freq_mesh = np.meshgrid(f_freq_1d_x, f_freq_1d_y)
f_freq = np.hypot(f_freq_mesh[0], f_freq_mesh[1])
return f_freq
def int_2d_fourier(arr, sampling):
freqs = freq_array(arr.shape, sampling)
k_sq = np.where(freqs != 0, freqs**2, 0.0001)
k = np.meshgrid(np.fft.fftfreq(arr.shape[0], sampling), np.fft.fftfreq(arr.shape[1], sampling))
v_int_x = np.real(np.fft.ifft2((np.fft.fft2(arr[1]) * k[0]) / (2*np.pi * 1j * k_sq)))
v_int_y = np.real(np.fft.ifft2((np.fft.fft2(arr[0]) * k[0]) / (2*np.pi * 1j * k_sq)))
v_int_fs = v_int_x + v_int_y
return v_int_fs
k_int = int_2d_fourier(k, sampling)
Unfortunately, the result is not very accurate at the position where k has an abrupt change, as can be seen in the plot below, which displayes a horizontal line profile of k and k_int.
Any ideas how to improve the accuracy? Is there a way to make it exactly the same?
I actually found a solution. The integration itself yields very accurate results.
However, the gradient function from numpy calculates second order accurate central differences, which means that the gradient itself already is an approximation.
When you replace the problem above with an analytical formula such as a 2D Gaussian, one can calculate the derivative analytically. When integrating this analytically derived function, the error is on the order of 10^-10 (depending on the width of the Gaussian, which can lead to aliasing effects).
So long story short: The integration function proposed above works as intended!

numpy fit coefficients to linear combination of polynomials

I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.minimize?
import numpy as np
from scipy.optimize import minimize as mini
x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin (30 * x1 - x2 / 10)
def fitness(x, degree=5):
poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
return np.sum((y - (poly1 + poly2)) ** 2 )
# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)
# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print fitness(x.x)
Your intuition is right. You can solve this as a system of equations of the form Ax = b.
However:
The system is overdefined and you want to get the least-squares solution, so you need to use np.linalg.lstsq instead of np.linalg.solve.
You can't use polyval because you need to separate the coefficients and powers of the independent variable.
This is how to construct the system of equations and solve it:
A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx)) # test the result with original fitness function
Of course you can generalize over the degree:
A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
With the example data, the least squares solution runs much faster than the minimize solution (800µs vs 35ms on my laptop). However, A can become quite large, so if memory is an issue minimize might still be an option.
Update:
Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:
def construct_A(valfunc, degree):
columns1 = []
columns2 = []
for p in range(degree):
c = np.zeros(degree)
c[p] = 1
columns1.append(valfunc(x1, c))
columns2.append(valfunc(x2, c))
return np.stack(columns1 + columns2).T
A = construct_A(np.polynomial.polynomial.polyval, 5)
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx)) # test the result with original fitness function

Computing filter(b,a,x,zi) using FFTs

I would like to try to compute y=filter(b,a,x,zi) and dy[i]/dx[j] using FFTs rather than in the time domain for possible speedup in a GPU implementation.
I am not sure it's possible, particularly when zi is non-zero. I looked through how scipy.signal.lfilter in scipy and filter in octave are implemented. They are both done directly in the time domain, with scipy using direct form 2 and octave direct form 1 (from looking through code in DLD-FUNCTIONS/filter.cc). I haven't seen anywhere an FFT implementation analogous to fftfilt for FIR filters in MATLAB (i.e. a = [1.]).
I tried doing y = ifft(fft(b) / fft(a) * fft(x)) but this seems to be conceptually wrong. Also, I am not sure how to handle the initial transient zi. Any references, pointer to existing implementation, would be appreciated.
Example code,
import numpy as np
import scipy.signal as sg
import matplotlib.pyplot as plt
# create an IRR lowpass filter
N = 5
b, a = sg.butter(N, .4)
MN = max(len(a), len(b))
# create a random signal to be filtered
T = 100
P = T + MN - 1
x = np.random.randn(T)
zi = np.zeros(MN-1)
# time domain filter
ylf, zo = sg.lfilter(b, a, x, zi=zi)
# frequency domain filter
af = sg.fft(a, P)
bf = sg.fft(b, P)
xf = sg.fft(x, P)
yfft = np.real(sg.ifft(bf/af * xf))[:T]
# error
print np.linalg.norm(yfft - ylf)
# plot, note error is larger at beginning and with larger N
plt.figure(1)
plt.clf()
plt.plot(ylf)
plt.plot(yfft)
You can reduce the error in your existing implementation by replacing P = T + MN - 1 with P = T + 2*MN - 1. This is purely intuitive, but it seems to me that the division of bf and af will require 2*MN terms, due to wraparound.
C.S. Burrus has a pretty terse writeup of how to regard filtering, whether FIR or IIR, in a block oriented way, here. I haven't read it in detail, but I think it gives you the equations you need to implement IIR filtering by convolution, including intermediate states.
I've forgotten what little I knew about FFTs but you could take a look at sedit.py and frequency.py at http://jc.unternet.net/src/ and see if anything there would help.
Try scipy.signal.lfiltic(b, a, y, x=None) to obtain the initial conditions.
Doc text for lfiltic:
Given a linear filter (b,a) and initial conditions on the output y
and the input x, return the inital conditions on the state vector zi
which is used by lfilter to generate the output given the input.
If M=len(b)-1 and N=len(a)-1. Then, the initial conditions are given
in the vectors x and y as
x = {x[-1],x[-2],...,x[-M]}
y = {y[-1],y[-2],...,y[-N]}
If x is not given, its inital conditions are assumed zero.
If either vector is too short, then zeros are added
to achieve the proper length.
The output vector zi contains
zi = {z_0[-1], z_1[-1], ..., z_K-1[-1]} where K=max(M,N).

Categories