Difference in fmin_l_bfgs_b implemented in MATLAB and Scipy.optimize

I was following the Sparse Autoencoder tutorial in the Stanford UFLDL series (http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder).
I finished the MATLAB implementation and it works perfectly fine, but I noticed some discrepancies when I ported the same implementation to Python (with numpy, scipy, and Matplotlib).
I noticed that the cost function was not minimized to the same magnitude. I am aware that because the thetas are randomly initialized, each run gives a different final cost value, but after rerunning both implementations 20+ times, the Python implementation always ends at f_cost = 4.57e-1 while the MATLAB version gives values around f_cost = 4.46e-1. In other words, there is a consistent difference of ~0.01.
Since the two implementations are identical in theory (same cost function, same gradient, same minFunc, LBFGS), I would expect them to reach the same cost.
Because I suspect the problem depends on the cost function and gradient computation, I could not reproduce it in a few lines. But you can find the full code in Python and MATLAB on GitHub (https://github.com/alanhyue/cs294a_2011-sparseAutoencoders).
Additional details
Here are a few more details that might help clarify the problem.
LBFGS and LBFGS-B
The starter code provided by the tutorial uses LBFGS to minimize the cost over the thetas. I did not find an exact equivalent in Scipy, so I am using scipy.optimize.fmin_l_bfgs_b. I read on Wikipedia that LBFGS-B is a box-constrained version of LBFGS. I suppose they should give the same result?
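One thing that could be checked here is the stopping rule. With bounds=None, fmin_l_bfgs_b runs unconstrained, but it still stops according to its own factr, pgtol and maxiter settings, which need not match the options used on the MATLAB side. The sketch below uses the Rosenbrock function as a stand-in objective (it is not the autoencoder cost) and only shows that the final cost depends on those settings. Whether this explains the 0.01 gap is an assumption that would need testing on the real code.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b, rosen, rosen_der


def cost_and_grad(theta):
    # Stand-in objective: returns (cost, gradient) in one call,
    # the same shape of interface a sparse autoencoder cost would have.
    return rosen(theta), rosen_der(theta)


theta0 = np.array([-1.2, 1.0])

# Loose stopping rule: stop once the relative decrease in cost is small.
theta_loose, f_loose, info_loose = fmin_l_bfgs_b(
    cost_and_grad, theta0, bounds=None, factr=1e12
)

# SciPy defaults (factr=1e7, pgtol=1e-5), still without bounds.
theta_default, f_default, info_default = fmin_l_bfgs_b(
    cost_and_grad, theta0, bounds=None
)

print("loose   :", f_loose, "iterations:", info_loose["nit"])
print("default :", f_default, "iterations:", info_default["nit"])
```

If the two platforms stop at different iteration counts, comparing the cost after the same number of iterations from the same initial theta would be a fairer test.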
Both implementations passed numerical gradient checking
I assume this means the gradient calculation is correct.
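A caveat on that assumption: a gradient check only confirms that the gradient is consistent with the cost function it is paired with. It does not show that the MATLAB and Python cost functions are the same function. A minimal sketch of such a check with scipy.optimize.check_grad, using a made-up toy cost (the names cost, grad and buggy_grad are hypothetical, not from the repository):

```python
import numpy as np
from scipy.optimize import check_grad


def cost(theta):
    # Toy cost, chosen only because its gradient is easy to write down.
    return np.sum(theta ** 2) + np.sum(np.sin(theta))


def grad(theta):
    # Analytical gradient of the toy cost.
    return 2.0 * theta + np.cos(theta)


def buggy_grad(theta):
    # Deliberately wrong gradient, to show what a failed check looks like.
    return 2.0 * theta + 0.9 * np.cos(theta)


rng = np.random.default_rng(0)
theta = rng.normal(size=10)

# check_grad returns the 2-norm of (analytical - finite-difference) gradient.
err_ok = check_grad(cost, grad, theta)
err_bad = check_grad(cost, buggy_grad, theta)
print("correct gradient error:", err_ok)
print("buggy gradient error  :", err_bad)
```

To compare the two ports directly, evaluating both cost functions on the same saved theta and the same patches would be more informative than two separate gradient checks.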
Results look somewhat correct.
As indicated in the lecture notes, a correct implementation should get a collection of line detectors, which means each patch would look like a picture of a straight line.
Here is the result from Python (with a cost of 0.457).
Results of Python implementation
Here is the result from MATLAB (with a cost of 0.446).

Related

Signal Convolution in C++ like Python np.convolve

I am writing a numerical simulation code where a convolution of a signal and a response function is needed (full mode). Now this sounds like a standard problem and I have used np.convolve etc. in python to great effect.
However, given that I need a faster computation (this convolution needs to be performed millions of times per simulation), I have started to implement this in C++, but I have struggled to find an analogue of np.convolve or scipy.fftconvolve in C++, where I would just plug in two std::vector<double> arrays and get the result of the discrete convolution. The only thing remotely resembling what I need is the implementation of a convolution from Numerical Recipes, but compared to the numpy results this implementation seems to be just wrong.
So my question boils down to: Where can I find a library/code that performs the convolution just like the Python implementations do? Surely there must be some already existing, fast solution.
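For reference, here is a sketch of what "full" mode computes, written in Python rather than C++ so it can be checked against np.convolve directly. The function names convolve_full_direct and convolve_full_fft are made up for this illustration. The double loop is the definition a C++ port over two std::vector<double> inputs would have to reproduce, and the FFT variant is the usual faster route for long inputs.

```python
import numpy as np


def convolve_full_direct(signal, response):
    # Direct definition: out[i + j] accumulates signal[i] * response[j].
    n, m = len(signal), len(response)
    out = np.zeros(n + m - 1)
    for i in range(n):
        for j in range(m):
            out[i + j] += signal[i] * response[j]
    return out


def convolve_full_fft(signal, response):
    # Same result via zero-padded FFTs (linear, not circular, convolution).
    size = len(signal) + len(response) - 1
    spectrum = np.fft.rfft(signal, size) * np.fft.rfft(response, size)
    return np.fft.irfft(spectrum, size)


sig = np.array([1.0, 2.0, 3.0, 4.0])
resp = np.array([0.5, -1.0, 0.25])

reference = np.convolve(sig, resp, mode="full")
direct = convolve_full_direct(sig, resp)
via_fft = convolve_full_fft(sig, resp)
print(reference, direct, via_fft)
```

Comparing a candidate C++ routine against the direct version on a few short inputs is a quick way to find out whether it computes a circular convolution or uses a different output length.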

Is there a way to define a 'heterogeneous' kernel design to incorporate linear operators into the regression for GPflow (or GPytorch/GPy/...)?

I'm trying to perform a GP regression with linear operators as described in for example this paper by Särkkä: https://users.aalto.fi/~ssarkka/pub/spde.pdf In this example we can see from equation (8) that I need a different kernel function for the four covariance blocks (of training and test data) in the complete covariance matrix.
This is definitely possible and valid, but I would like to include this in a kernel definition of (preferably) GPflow, or GPyTorch, GPy or the like.
However, in the documentation for kernel design in GPflow, the only possibility is to define a covariance function that acts on all covariance blocks. In principle, the method above should be straightforward to add myself (the kernel function expressions can be derived analytically), but I don't see any way of incorporating the 'heterogeneous' kernel functions into the regression or kernel classes. I tried to consult other packages such as GPyTorch and GPy, but again, the kernel design does not seem to allow this.
Maybe I'm missing something here, or maybe I'm not familiar enough with the underlying implementation to assess this, but if someone has done this before or sees a way to implement it (which should be reasonably straightforward?), I would be happy to find out.
Thank you very much in advance for your answer!
Kind regards

This should be reasonably straightforward, though it requires building a custom kernel. Basically, you need a kernel that knows, for each input, what the linear operator for the corresponding output is (whether this is a function observation/identity operator, integral observation, derivative observation, etc). You can achieve this by including an extra column in your input matrix X, similar to how it's done for the gpflow.kernels.Coregion kernel (see this notebook). You would then need to define a new kernel with K and K_diag methods that, for each linear operator type, find the corresponding rows in the input matrix and pass them to the appropriate covariance function (using tf.dynamic_partition and tf.dynamic_stitch; this is used in a very similar way in GPflow's SwitchedLikelihood class).
The full implementation would probably take half a day or so, which is beyond what I can do here, but I hope this is a useful starting pointer, and you're very welcome to join the GPflow slack (invite link in the GPflow README) and discuss it in more detail there!
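To make the idea concrete without committing to any GPflow internals, here is a NumPy-only sketch. It is not GPflow code, and the function name operator_rbf is made up. It assumes a 1-D squared-exponential kernel and two operator types stored in the extra input column: 0 for a plain function observation and 1 for a derivative observation. Each covariance block then uses the matching derivative of the base kernel.

```python
import numpy as np


def operator_rbf(X1, X2, lengthscale=1.0, variance=1.0):
    # Column 0 holds the input location, column 1 the operator index
    # (0 = identity, 1 = d/dx). This layout is an assumption of the sketch.
    x1, op1 = X1[:, 0], X1[:, 1].astype(int)
    x2, op2 = X2[:, 0], X2[:, 1].astype(int)

    r = x1[:, None] - x2[None, :]
    k = variance * np.exp(-0.5 * (r / lengthscale) ** 2)
    l2 = lengthscale ** 2
    a, b = op1[:, None], op2[None, :]

    K = np.zeros_like(k)
    K = np.where((a == 0) & (b == 0), k, K)                 # cov(f, f)
    K = np.where((a == 0) & (b == 1), k * r / l2, K)        # cov(f, f')
    K = np.where((a == 1) & (b == 0), -k * r / l2, K)       # cov(f', f)
    K = np.where((a == 1) & (b == 1),
                 k * (1.0 / l2 - r ** 2 / l2 ** 2), K)      # cov(f', f')
    return K


# Six function observations and three derivative observations.
x_f = np.linspace(0.0, 3.0, 6)
x_d = np.array([0.5, 1.5, 2.5])
X = np.vstack([
    np.column_stack([x_f, np.zeros_like(x_f)]),
    np.column_stack([x_d, np.ones_like(x_d)]),
])

K = operator_rbf(X, X, lengthscale=0.7)
print(K.shape)
```

In a real GPflow kernel the np.where masks would presumably be replaced by the tf.dynamic_partition / tf.dynamic_stitch pattern mentioned above, but the block structure stays the same.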

Scipy Linear algebra LinearOperator function utilised in Conjugate Gradient

I am preconditioning a matrix using spilu. However, to pass this preconditioner into cg (the built-in conjugate gradient method) it is necessary to use the LinearOperator function. Can someone explain the matvec parameter to me, and why I need to use it? Below is my current code:
Ainv=scla.spilu(A,drop_tol= 1e-7)
Ainv=scla.LinearOperator(Ainv.shape,matvec=Ainv)
scla.cg(A,b,maxiter=maxIterations, M = Ainv)
However, this doesn't work and I get the error TypeError: 'SuperLU' object is not callable. I have played around and tried
Ainv=scla.LinearOperator(Ainv.shape,matvec=Ainv.solve)
instead. This seems to work but I want to know why matvec needs Ainv.solve rather than just Ainv, and is it the right thing to feed LinearOperator?
Thanks for your time

Without having much experience with this part of scipy, some comments:
According to the docs you don't have to use LinearOperator, although you can:
M : {sparse matrix, dense matrix, LinearOperator}, so you can use explicit matrices too!
The idea/advantage of the LinearOperator:
Many iterative methods (e.g. cg, gmres) do not need to know the individual entries of a matrix to solve a linear system A*x=b. Such solvers only require the computation of matrix vector products docs
Depending on the task, sometimes even matrix-free approaches are available which can be much more efficient
The working approach you presented is indeed the correct one (some other source does it similarly, and some course materials do it like that)
The idea of using solve() here, rather than the inverse matrix, is to avoid forming the inverse explicitly (which might be very costly)
A similar idea is very common in BFGS-based optimization algorithms, although wiki might not give much insight here
scipy has an extra LinearOperator for exactly this, so that the inverse is not formed explicitly! (although I think it's only used for statistics / finishing some optimization; but I successfully built some LBFGS-based optimizers with this one)
Source: a scicomp.stackexchange thread discussing this without touching scipy
And because of that I would assume spilu is completely going for this too (returning an object with a solve method)
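Putting the pieces together, a small self-contained sketch (the test matrix is a 1-D Laplacian chosen only because it is symmetric positive definite, as cg requires; it is not the asker's matrix). matvec has to be a callable that maps a vector v to M*v. For a preconditioner M that approximates the inverse of A, that callable is ilu.solve, which applies the incomplete LU factors to v without ever building an inverse. The SuperLU object itself is not callable, hence the TypeError.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as scla

# Symmetric positive definite test matrix (1-D Laplacian), CSC for spilu.
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU factorisation; ilu.solve(v) approximates inv(A) @ v.
ilu = scla.spilu(A, drop_tol=1e-7)

# Wrap the solve method so cg can call it like a matrix-vector product.
M = scla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = scla.cg(A, b, maxiter=200, M=M)
print("info:", info, "residual:", np.linalg.norm(A @ x - b))
```

An info value of 0 means cg reported convergence.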

Periodogram in MATLAB and Python scipy gives different results? [duplicate]

I am porting some matlab code to python using scipy and got stuck with the following line:
Matlab/Octave code
[Pxx, f] = periodogram(x, [], 512, 5)
Python code
f, Pxx = signal.periodogram(x, 5, nfft=512)
The problem is that I get different output on the same data. More specifically, the Pxx vectors are different. I tried different windows for signal.periodogram, yet no luck (and it seems that scipy's default boxcar window is the same as matlab's default rectangular window). Another strange behavior is that in python, the first element of Pxx is always 0, no matter what the input data is.
Am I missing something? Any advice would be greatly appreciated!
Simple Matlab/Octave code with actual data: http://pastebin.com/czNeyUjs
Simple Python+scipy code with actual data: http://pastebin.com/zPLGBTpn

After researching octave's and scipy's periodogram source code I found that they use different algorithms to calculate the power spectral density estimate. Octave (and MATLAB) use FFT, whereas scipy's periodogram uses the Welch method.
As @georgesl has mentioned, the output looks quite alike, but still, it differs. And for porting purposes that was critical. In the end, I simply wrote a small function to calculate the PSD estimate using FFT, and now the output is the same. According to timeit testing, it works ~50% faster (1.9006s vs 2.9176s on a loop with 10,000 iterations). I think this is because the plain FFT is faster than scipy's Welch implementation, or it is simply faster overall.
Thanks to everyone who showed interest.

I faced the same problem but then I came across the documentation of scipy's periodogram
As you can see there, detrend='constant' is the default argument. This means that python automatically subtracts the mean of the input data from each point (read here), while Matlab/Octave do no such thing. I believe that is the reason why the outputs are different. Try specifying detrend=False when calling scipy's periodogram, and you should get the same output as Matlab.
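A sketch to check this on synthetic data (the signal below is made up, not the pastebin data). The helper fft_periodogram is a hypothetical name; it implements a plain one-sided FFT periodogram with a rectangular window and density scaling, which is the convention I assume MATLAB/Octave follow here. It also assumes len(x) <= nfft, since the two libraries may treat longer inputs differently.

```python
import numpy as np
from scipy import signal


def fft_periodogram(x, fs, nfft):
    # One-sided PSD estimate, rectangular window, no detrending.
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x, n=nfft)          # zero-pads up to nfft
    pxx = np.abs(spectrum) ** 2 / (fs * len(x))
    if nfft % 2 == 0:
        pxx[1:-1] *= 2                         # keep DC and Nyquist single
    else:
        pxx[1:] *= 2                           # keep DC single
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, pxx


fs, nfft = 5, 512
rng = np.random.default_rng(0)
t = np.arange(200) / fs
x = 3.0 + np.sin(2 * np.pi * 0.8 * t) + 0.1 * rng.normal(size=t.size)

f_fft, pxx_fft = fft_periodogram(x, fs, nfft)
f_raw, pxx_raw = signal.periodogram(x, fs, nfft=nfft, detrend=False)
f_def, pxx_def = signal.periodogram(x, fs, nfft=nfft)   # detrend='constant'

print("DC bin, detrend=False  :", pxx_raw[0])
print("DC bin, default detrend:", pxx_def[0])
```

With the default detrend the mean is removed before the FFT, so the DC bin comes out as (numerically) zero, which matches the "first element of Pxx is always 0" observation in the question.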

After reading the Matlab and Scipy documentation, another contribution to the different values could be that they use different default window functions. Matlab uses a Hamming window, and Scipy uses a Hanning window. The two window functions are similar but not identical.

Did you look at the results?
The slight differences between the two results may come from optimizations, default windows, implementation details, etc.

Comparing fsolve results in python and matlab

I have a follow up question to the post written a couple days ago, thank you for the previous feedback:
Finding complex roots from set of non-linear equations in python
I have now set up the set of non-linear equations in python so that fsolve handles the real and imaginary parts independently. However, there are still problems with the python "fsolve" converging to the correct solution. I have exactly the same inputs that are used in Matlab, and after double checking, the set of equations is exactly the same as well. Matlab, no matter how I set the initial values, will always converge to the correct solution. With python, however, every initial condition produces a different result, and never the correct one. After a fraction of a second, the following warning appears with python:
/opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages/scipy/optimize/minpack.py:227:
RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
warnings.warn(msg, RuntimeWarning)
I was wondering if there are some known differences between the fsolve in python and Matlab, and if there are some known methods to optimize the performance in python.
Thank you very much

I don't think that you should rely on the fact that the names are the same. I see from your other question that you are specifying that Matlab's fsolve use the 'levenberg-marquardt' algorithm rather than the default. Python's scipy.optimize.fsolve uses MINPACK's hybrd algorithms. Levenberg-Marquardt finds roots approximately by minimizing the sum of squares of the function and is quite robust. It is not a true root-finding method like the default 'trust-region-dogleg' algorithm. I don't know how the hybrd schemes work, but they claim to be a modification of Powell's method.
If you want something similar to what you're doing in Matlab, I'd look for an optimization scheme that implements Levenberg-Marquardt, such as scipy.optimize.root, which you were also using in your previous question. Is there a reason why you're not using that?
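A minimal sketch of what that could look like, using a toy equation z^2 + 1 = 0 in place of the actual system (which is not shown in this thread). The helper names f_complex and f_split are made up. The complex unknown is split into real and imaginary parts, and the same residual function is handed to scipy.optimize.root once with method='lm' (Levenberg-Marquardt) and once with method='hybr', the MINPACK hybrd routine that fsolve wraps.

```python
import numpy as np
from scipy.optimize import root


def f_complex(z):
    # Toy complex equation with roots at +1j and -1j.
    return z ** 2 + 1.0


def f_split(v):
    # v = [Re(z), Im(z)]; return [Re(f), Im(f)] so the solver sees real numbers.
    w = f_complex(complex(v[0], v[1]))
    return [w.real, w.imag]


v0 = [0.5, 0.5]

sol_lm = root(f_split, v0, method="lm")       # Levenberg-Marquardt
sol_hybr = root(f_split, v0, method="hybr")   # same algorithm as fsolve

z_lm = complex(*sol_lm.x)
z_hybr = complex(*sol_hybr.x)
print("lm  :", z_lm, "residual:", abs(f_complex(z_lm)))
print("hybr:", z_hybr, "residual:", abs(f_complex(z_hybr)))
```

On a problem this easy both methods should do fine; the point is only that switching algorithms is a one-argument change, so it is cheap to test whether 'lm' behaves more like the Matlab run on the real system.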
