Periodogram in MATLAB and Python scipy gives different results? [duplicate]

I am porting some matlab code to python using scipy and got stuck with the following line:
Matlab/Octave code
[Pxx, f] = periodogram(x, [], 512, 5)
Python code
f, Pxx = signal.periodogram(x, 5, nfft=512)
The problem is that I get different output on the same data. More specifically, the Pxx vectors are different. I tried different windows for signal.periodogram, yet no luck (and it seems that scipy's default boxcar window is the same as Matlab's default rectangular window). Another strange behavior is that in Python the first element of Pxx is always 0, no matter what the input data is.
Am I missing something? Any advice would be greatly appreciated!
Simple Matlab/Octave code with actual data: http://pastebin.com/czNeyUjs
Simple Python+scipy code with actual data: http://pastebin.com/zPLGBTpn

After researching Octave's and scipy's periodogram source code, I found that they use different algorithms to calculate the power spectral density estimate. Octave (and MATLAB) use the FFT directly, whereas scipy's periodogram uses the Welch method.
As @georgesl has mentioned, the outputs look quite alike, but they still differ, and for porting purposes that was critical. In the end, I simply wrote a small function to calculate the PSD estimate directly with the FFT, and now the output is the same. According to timeit testing, it also runs ~50% faster (1.9006 s vs 2.9176 s on a loop with 10,000 iterations). I think that's due to the FFT being faster than the Welch path in scipy's implementation, or just being faster in general.
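For reference, here is a minimal sketch of the kind of FFT-based PSD estimate described above; this is a reconstruction of the approach rather than the exact function from the original code, and it assumes a rectangular window with no detrending (fs is the sampling frequency, nfft the FFT length):

import numpy as np

def fft_periodogram(x, fs, nfft=512):
    # One-sided periodogram via a plain FFT, following the conventions of
    # Matlab/Octave's periodogram(x, [], nfft, fs).
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x, n=nfft)
    # Scale so the PSD has units of power per Hz.
    Pxx = (np.abs(X) ** 2) / (fs * len(x))
    # Double all bins except DC (and Nyquist for even nfft) for the one-sided spectrum.
    if nfft % 2 == 0:
        Pxx[1:-1] *= 2
    else:
        Pxx[1:] *= 2
    f = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return f, Pxx

With that, f, Pxx = fft_periodogram(x, 5, nfft=512) should line up with the Octave call above.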
Thanks to everyone who showed interest.

I faced the same problem, but then I came across the documentation of scipy's periodogram.
As you can see there, detrend='constant' is the default argument. This means that Python automatically subtracts the mean of the input data from each point (read here), while Matlab/Octave do no such thing. I believe that is the reason why the outputs are different. Try specifying detrend=False when calling scipy's periodogram; you should get the same output as Matlab.
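For example, assuming the same x, sampling rate, and nfft as in the question, the call would be:

from scipy import signal

# Disable the default mean removal so the output matches Matlab/Octave
f, Pxx = signal.periodogram(x, 5, nfft=512, detrend=False)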

After reading the Matlab and Scipy documentation, another contribution to the different values could be that they use different default window functions. Matlab uses a Hamming window, and Scipy uses a Hanning (Hann) window. The two window functions are similar but not identical.
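A quick sketch to illustrate the point about the windows (just comparing the window shapes, nothing specific to the original data):

import numpy as np
from scipy.signal import get_window

n = 512
hamming = get_window('hamming', n)
hann = get_window('hann', n)

# The shapes are close, but the Hamming window does not go to zero at the edges,
# so spectra computed with the two will differ.
print(hamming[0], hann[0])
print(np.max(np.abs(hamming - hann)))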

Did you look at the results?
The slight differences between the two results may come from optimizations, default windows, implementation details, etc.

Related

What is the most efficient way to calculate the eigenvalues of a large covariance matrix?

I have been trying for some days to calculate the nearest positive semi-definite matrix from a very large covariance matrix to be able to sample from it.
I have tried MATLAB for this, but the memory usage is insane and it always crashes eventually, with no error message or log file as far as I could find. The function used for the calculation can be found here: https://www.mathworks.com/matlabcentral/fileexchange/42885-nearestspd. Optimizing the function to remove intermediate matrices seemed to reduce the memory usage, but it eventually crashes in much the same way.
I found this approach for doing the calculation (https://stackoverflow.com/a/63131309/18660401) and switched to Python, in hopes of finding some GPU libraries to accelerate the calculations, but it seems I cannot find an up-to-date library that supports calculating eigenvectors the way the numpy function does. This is the function I am using:
import numpy as np

def get_near_psd(A):
    # Symmetrize, clip negative eigenvalues to zero, and reconstruct
    C = (A + A.T) / 2
    eigval, eigvec = np.linalg.eig(C)
    eigval[eigval < 0] = 0
    return eigvec.dot(np.diag(eigval)).dot(eigvec.T)
I am currently trying to run the same function with numba, in hopes that the translation to LLVM is enough to make the calculation run in reasonable time; the only modification to the version above is the @jit decorator from numba.
There does not seem to be a well-optimized way to do this as far as I can find on my own, so any suggestions to crack this are much appreciated.
Edit: The matrix is a two-dimensional 60416x60416 covariance matrix, to be used to generate new samples from the distribution defined by the mean and covariance calculated from a set of samples produced by a GAN. For training purposes, samples also need to be generated by randomly sampling that distribution, for which I intend to use numpy's multivariate_normal function.
A very up-to-date library that does have these capabilities, including GPU support, is PyTorch; check out the examples for the torch.linalg.eig function and the corresponding accelerated function torch.linalg.eigh for symmetric/Hermitian matrices. You do have to convert these matrices from numpy arrays to PyTorch tensors first to do the computation (and then convert back), but you can definitely use it in a very similar way.
Of course this library can't just magically give you more memory either, but it is highly optimized.
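A rough sketch of what that could look like, adapting the get_near_psd function above to torch (the device and dtype choices here are assumptions, not part of the original answer):

import numpy as np
import torch

def get_near_psd_torch(A, device='cuda'):
    # Move the numpy array onto the GPU as a torch tensor (float32 to save memory)
    C = torch.as_tensor(A, dtype=torch.float32, device=device)
    C = (C + C.T) / 2
    # eigh exploits symmetry and is typically much faster and more stable than eig here
    eigval, eigvec = torch.linalg.eigh(C)
    eigval = torch.clamp(eigval, min=0)
    # Reconstruct eigvec @ diag(eigval) @ eigvec.T via broadcasting
    result = (eigvec * eigval) @ eigvec.T
    # Convert back to a numpy array on the CPU
    return result.cpu().numpy()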

Signal Convolution in C++ like Python np.convolve

I am writing numerical simulation code where a convolution of a signal and a response function is needed ('full' mode). This sounds like a standard problem, and I have used np.convolve etc. in Python to great effect.
However, given that I need faster computation (this convolution needs to be performed millions of times per simulation), I have started to implement this in C++, but I have struggled to find an analogue of np.convolve or scipy.fftconvolve in C++ where I could just plug in two std::vector<double> arrays and get the result of the discrete convolution. The only thing remotely resembling what I need is the convolution implementation from Numerical Recipes, but compared to the numpy results that implementation seems to be just wrong.
So my question boils down to: where can I find a library/code that performs the convolution just like the Python implementations do? Surely there must be some already existing, fast solution.
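For reference, the 'full'-mode discrete convolution that np.convolve computes can be spelled out directly, which may serve as a ground truth when checking a C++ port (a plain-Python sketch, not an optimized implementation):

import numpy as np

def convolve_full(a, b):
    # Direct 'full' convolution: output length is len(a) + len(b) - 1,
    # out[n] = sum over k of a[k] * b[n - k].
    b = np.asarray(b, dtype=float)
    out = np.zeros(len(a) + len(b) - 1)
    for k, ak in enumerate(a):
        out[k:k + len(b)] += ak * b
    return out

# Should agree with NumPy's result:
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0])
assert np.allclose(convolve_full(a, b), np.convolve(a, b, mode='full'))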

Difference in fmin_l_bfgs_b implemented in MATLAB and Scipy.optimize

I was following the Sparse Autoencoder tutorial in the Stanford UFLDL series (http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder).
I finished the MATLAB version of the implementation and it works perfectly fine, but I noticed some discrepancies when I ported the same implementation to Python (with numpy, scipy, and Matplotlib).
I noticed that the cost function was not minimized to the same magnitude. I am aware that because the thetas are randomly initialized, each run gives different final cost values, but after rerunning both implementations 20+ times, I always see the Python implementation end up at f_cost = 4.57e-1 while the MATLAB version gives answers around f_cost = 4.46e-1. In other words, there is a consistent difference of ~0.01.
Since these are two implementations that are identical in theory (same cost function, same gradient, same minFunc, L-BFGS), I don't understand where the difference comes from.
Since I suspect the problem depends on the cost function and gradient computation, I could not reproduce it in a few lines, but you can find the full code in Python and MATLAB on GitHub (https://github.com/alanhyue/cs294a_2011-sparseAutoencoders).
Additional details
Here are a few more details that might help clarify the problem.
LBFGS and LBFGS-B
The starter code provided by the tutorial uses L-BFGS to minimize the thetas. Since I did not find an exact equivalent in Scipy, I am using scipy.optimize.fmin_l_bfgs_b. I read on Wikipedia that L-BFGS-B is a box-constrained version of L-BFGS. I suppose they should give the same result?
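For what it's worth, here is a minimal sketch of the calling convention being compared, using a toy quadratic in place of the autoencoder cost (the names cost_and_grad and theta0 are placeholders, not from the actual repo):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# The objective returns both the cost and its gradient, as the autoencoder cost does.
def cost_and_grad(theta):
    cost = 0.5 * np.sum(theta ** 2)
    grad = theta
    return cost, grad

theta0 = np.random.randn(10)
# With bounds=None (the default), L-BFGS-B runs effectively unconstrained,
# so in principle it should behave like plain L-BFGS on this problem.
theta_opt, f_cost, info = fmin_l_bfgs_b(cost_and_grad, theta0, maxiter=400)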
Both implementations passed numerical gradient checking
I assume this means the gradient calculation is correct.
Results look somewhat correct.
As indicated in the lecture notes, a correct implementation should produce a collection of line detectors, which means each patch should look like a picture of a straight line.
Here is the result from Python (with a cost of 0.457).
Results of Python implementation
Here is the result from MATLAB (with a cost of 0.446).
Results of MATLAB implementation

Scipy Linear algebra LinearOperator function utilised in Conjugate Gradient

I am preconditioning a matrix using spilu; however, to pass this preconditioner into cg (the built-in conjugate gradient method) it is necessary to use LinearOperator. Can someone explain the matvec parameter to me, and why I need to use it? Below is my current code:
Ainv = scla.spilu(A, drop_tol=1e-7)
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv)
scla.cg(A, b, maxiter=maxIterations, M=Ainv)
However, this doesn't work and I am given the error TypeError: 'SuperLU' object is not callable. I have played around and tried
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv.solve)
instead. This seems to work, but I want to know why matvec needs Ainv.solve rather than just Ainv, and whether that is the right thing to feed LinearOperator.
Thanks for your time
Without having much experience with this part of scipy, some comments:
According to the docs you don't have to use a LinearOperator; the parameter is documented as M : {sparse matrix, dense matrix, LinearOperator}, so you can use explicit matrices too!
The idea/advantage of the LinearOperator:
Many iterative methods (e.g. cg, gmres) do not need to know the individual entries of a matrix to solve a linear system A*x=b; such solvers only require the computation of matrix-vector products (docs).
Depending on the task, sometimes even matrix-free approaches are available, which can be much more efficient.
The working approach you presented is indeed the correct one (some other sources do it similarly, and some course materials do it like that).
The idea of using solve() instead of an inverse matrix is to avoid forming the inverse explicitly (which can be very costly).
A similar idea is very common in BFGS-based optimization algorithms, although the wiki might not give much insight here.
scipy has an extra LinearOperator for exactly this purpose of not forming the inverse explicitly (although I think it's only used for statistics / finishing off some optimizations; I successfully built some LBFGS-based optimizers with it).
Source: a scicomp.stackexchange discussion of this without touching scipy.
And because of that I would assume spilu is going for exactly this too (returning an object with a solve method).
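Putting the pieces together, the working pattern from the question looks roughly like this (a sketch with a toy SPD system in place of the actual A and b):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as scla

# Toy SPD system just to make the example runnable
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

# spilu returns a SuperLU object; its solve method applies the approximate inverse
# to a vector, which is exactly the matrix-vector product the preconditioner must provide.
ilu = scla.spilu(A, drop_tol=1e-7)
M = scla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = scla.cg(A, b, M=M)
print(info)  # 0 means convergence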

Comparing fsolve results in python and matlab

I have a follow-up question to the post written a couple of days ago; thank you for the previous feedback:
Finding complex roots from set of non-linear equations in python
I have now set up the set of non-linear equations in Python so that fsolve will handle the real and imaginary parts independently. However, there are still problems with Python's fsolve converging to the correct solution. I have exactly the same inputs that are used in Matlab, and after double-checking, the set of equations is exactly the same as well. Matlab, no matter how I set the initial values, will always converge to the correct solution. With Python, however, every initial condition produces a different result, and never the correct one. After a fraction of a second, the following warning appears in Python:
/opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages/scipy/optimize/minpack.py:227:
RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
warnings.warn(msg, RuntimeWarning)
I was wondering if there are any known differences between fsolve in Python and Matlab, and if there are known methods to optimize the performance in Python.
Thank you very much
I don't think you should rely on the fact that the names are the same. I see from your other question that you are specifying that Matlab's fsolve uses the 'levenberg-marquardt' algorithm rather than the default. Python's scipy.optimize.fsolve uses MINPACK's hybrd algorithms. Levenberg-Marquardt finds roots approximately by minimizing the sum of squares of the function and is quite robust, but it is not a true root-finding method like the default 'trust-region-dogleg' algorithm. I don't know how the hybrd schemes work, but they claim to be a modification of Powell's method.
If you want something similar to what you're doing in Matlab, I'd look for an optimization scheme that implements Levenberg-Marquardt, such as scipy.optimize.root, which you were also using in your previous question. Is there a reason why you're not using that?
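For example, a minimal sketch of using scipy.optimize.root with the Levenberg-Marquardt backend (the system below is a stand-in, not the equations from the question):

import numpy as np
from scipy.optimize import root

# Stand-in system with two real residuals (real/imaginary parts would be split similarly)
def equations(v):
    x, y = v
    return [x**2 + y**2 - 4.0,
            x - y - 1.0]

sol = root(equations, x0=[1.0, 1.0], method='lm')
print(sol.success, sol.x)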
