In many tutorials/blogs I've seen the output of np.fft.fft(signal) divided by the number of sample points N.
I understand that in some implementations that the transform is scaled/normalized by some factor like multiplying by N. However, I just read the docs, and by default the output of fft.fft() is unscaled. Yet I still see the output divided by N everywhere.
Why is this?
I have noticed that by scaling the output by 1/N I get back the correct amplitudes of the contributing wave signals. So obviously it is necessary, but I'd like to understand what the pure output is as compared to the scaled output.
For the DFT to be reversible (x == IDFT(DFT(x))), you need to divide by N somewhere. In signal processing this normalization is typically done in the inverse transform. For example Wikipedia shows it this way.
In other fields it is more often done in the forward transform. In physics I have seen half the normalization (1/sqrt(N)) applied to each transform, making them symmetric.
When the forward transform normalizes, then the values it returns are independent of the signal length (for example the zero frequency is the mean of all signal values). This is therefore the more useful variant when studying signal power.
The variant where the normalization is applied in the inverse transform (as commonly implemented in signal processing software, such as np.fft.fft(), and MATLAB's fft), then computing the convolution by multiplication in the frequency domain is easiest: one can directly write g = IDFT(DFT(f)*DFT(h)). If the normalization is applied elsewhere, it must be partly undone to obtain a correctly scaled result.
Other software, for example the FFTW library, does not normalize the transform at all, leaving that up to the user. This avoids unnecessary multiplications if the user wants a different normalization variant than what the library chooses.
Based on Parseval's theorem
This expresses that the energies in the time- and frequency-domain are the same. It means the magnitude |X[k]| of each frequency bin k is contributed by N samples. In order to find out the average contribution by each sample, the magnitude is normalized as |X[k]|/N, which leads to
where the LHS is the power of the signal.
However, such a scale normally doesn't matter unless you care about the unit of the magnitude, like in the case of the Sound Pressure Level (SPL) spectrum.
Related
I cannot make sense of this behavior of scipy. From the scipy.fftpack docs i learn that the output of fft is obviously complex and shaped like:
[y(0),y(1),..,y(n/2),y(1-n/2),...,y(-1)] if n is even
[y(0),y(1),..,y((n-1)/2),y(-(n-1)/2),...,y(-1)] if n is odd
so that if you plot something like np.abs(fft(signal)) you get the magnitude of the FFT.
If my signal is real-valued, then the negative frequencies give no information, and therefore one can use rfft to speed things up a little. And here comes what I don't understand: why is the output of rfft real valued, and shaped oddly like:
[y(0),Re(y(1)),Im(y(1)),...,Re(y(n/2))] if n is even
[y(0),Re(y(1)),Im(y(1)),...,Re(y(n/2)),Im(y(n/2))] if n is odd
Indeed with this definition, np.abs(rfft(signal)) gives you rubbish (you get alternately the absolute value of the real and imaginary parts of the FFT...), and you need some hacking to get the FFT magnitude. Why doesn't rfft it simply output complex valued y(j)s as:
[y(0),y(1),..,y(n/2)] if n is even
[y(0),y(1),..,y((n-1)/2)] if n is odd
so that things work nicely exactly as fft (as one would expect)?
What am I missing?
EDIT: the issue is being discussed here
Not a good reason, but one possible reason is to match the number of independent degrees of freedom with the number of variables in the input and output of the transform function. If you output only complex vector elements from strictly real input, then Im(y(0)) and Im(y(N/2)) are always zero (for even N), and thus a complex function return would waste memory while carrying no additional information by making the output two element components larger than the input, even though the degrees of freedom should be exactly the same.
The equality of input and output vector memory sizes also allows doing an in-place RFFT without requiring any additional memory allocation, which might be important in real-time systems with constrained memory relative to the FFT size.
Whereas FFTW is more typically run on big systems where it is more common for programmers use(waste?) massive amounts of memory more than the minimum theoretically needed.
Looking at this answer:
Python Scipy FFT wav files
The technical part is obvious and working, but I have two theoretical questions (the code mentioned is below):
1) Why do I have to normalized (b=...) the frames? What would happen if I used the raw data?
2) Why should I only use half of the FFT result (d=...)?
3) Why should I abs(c) the FFT result?
Perhaps I'm missing something due to inadequate understanding of WAV format or FFT, but while this code works just fine, I'd be glad to understand why it works and how to make the best use of it.
Edit: in response to the comment by #Trilarion :
I'm trying to write a simple, not 100% accurate but more like a proof-of-concept Speaker Diarisation in Python. That means taking a wav file (right now I am using this one for my tests) and in each second (or any other resolution) say if the speaker is person #1 or person #2. I know in advance that these are 2 persons and I am not trying to link them to any known voice signatures, just to separate. Right now take each second, FFT it (and thus get a list of frequencies), and cluster them using KMeans with the number of clusters between 2 and 4 (A, B [,Silence [,A+B]]).
I'm still new to analyzing wav files and audio in general.
import matplotlib.pyplot as plt
from scipy.io import wavfile # get the api
fs, data = wavfile.read('test.wav') # load the data
a = data.T[0] # this is a two channel soundtrack, I get the first track
b=[(ele/2**8.)*2-1 for ele in a] # this is 8-bit track, b is now normalized on [-1,1)
c = sfft.fft(b) # create a list of complex number
d = len(c)/2 # you only need half of the fft list
plt.plot(abs(c[:(d-1)]),'r')
plt.show()
To address these in order:
1) You don't need to normalize, but the input normalization is close to the raw structure of the digitized waveform so the numbers are unintuitive. For example, how loud is a value of 67? It's easier to normalize it to be in the range -1 to 1 to interpret the values. (But if you wanted to implement a filter, for example, where you did an FFT, modified the FFT values, followed by an IFFT, normalizing would be an unnecessary hassle.)
2) and 3) are similar in that they both have to do with the math living primarily in the complex numbers space. That is, FFTs take a waveform of complex numbers (eg, [.5+.1j, .4+.7j, .4+.6j, ...]) to another sequence of complex numbers.
So in detail:
2) It turns out that if the input waveform is real instead of complex, then the FFT has a symmetry about 0, so only the values that have a frequency >=0 are uniquely interesting.
3) The values output by the FFT are complex, so they have a Re and Im part, but this can also be expressed as a magnitude and phase. For audio signals, it's usually the magnitude that's the most interesting, because this is primarily what we hear. Therefore people often use abs (which is the magnitude), but the phase can be important for different problems as well.
That depends on what you're trying to do. It looks like you're only looking to plot the spectral density and then it's OK to do so.
In general the coefficient in the DFT is depending on the phase for each frequency so if you want to keep phase information you have to keep the argument of the complex numbers.
The symmetry you see is only guaranteed if the input is real numbered sequence (IIRC). It's related to the mirroring distortion you'll get if you have frequencies above the Nyquist frequency (half the sampling frequency), the original frequency shows up in the DFT, but also the mirrored frequency.
If you're going to inverse DFT you should keep the full data and also keep the arguments of the DFT-coefficients.
I'm trying to use KernelPCA for reducing the dimensionality of a dataset to 2D (both for visualization purposes and for further data analysis).
I experimented computing KernelPCA using a RBF kernel at various values of Gamma, but the result is unstable:
(each frame is a slightly different value of Gamma, where Gamma is varying continuously from 0 to 1)
Looks like it is not deterministic.
Is there a way to stabilize it/make it deterministic?
Code used to generate transformed data:
def pca(X, gamma1):
kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=gamma1)
X_kpca = kpca.fit_transform(X)
#X_back = kpca.inverse_transform(X_kpca)
return X_kpca
KernelPCA should be deterministic and evolve continuously with gamma. It is different from RBFSampler that does have built-in randomness in order to provide an efficient (more scalable) approximation of the RBF kernel.
However what can change in KernelPCA is the order of the principal components: in scikit-learn they are returned sorted in order of descending eigenvalue, so if you have 2 eigenvalues close to each other it could be that the order changes with gamma.
My guess (from the gif) is that this is what is happening here: the axes along which you are plotting are not constant so your data seems to jump around.
Could you provide the code you used to produce the gif?
I'm guessing it is a plot of the data points along the 2 first principal components but it would help to see how you produced it.
You could try to further inspect it by looking at the values of kpca.alphas_ (the eigenvectors) for each value of gamma.
Hope this makes sense.
EDIT: As you noted it looks like the points are reflected against the axis, the most plausible explanation is that one of the eigenvector flips sign (note this does not affect the eigenvalue).
I put in a simple gist to reproduce the issue (you'll need a Jupyter notebook to run it). You can see the sign-flipping when you change the value of gamma.
As a complement note that this kind of discrepancy happens only because you fit several times the KernelPCA object several times. Once you settled with a particular gamma value and you've fit kpca once you can call transform several times and get consistent results.
For the classical PCA the docs mention that:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
I don't know about the behavior of a single KernelPCA object that you would fit several times (I did not find anything relevant in the docs).
It does not apply to your case though as you have to fit the object with several gamma values.
So... I can't give you a definitive answer on why KernelPCA is not deterministic. The behavior resembles the differences I've observed between the results of PCA and RandomizedPCA. PCA is deterministic, but RandomizedPCA is not, and sometimes the eigenvectors are flipped in sign relative to the PCA eigenvectors.
That leads me to my vague idea of how you might get more deterministic results....maybe. Use RBFSampler with a fixed seed:
def pca(X, gamma1):
kernvals = RBFSampler(gamma=gamma1, random_state=0).fit_transform(X)
kpca = PCA().fit_transform(X)
X_kpca = kpca.fit_transform(X)
return X_kpca
Ok, so what I'm trying to do is a scale space on a 1D set of data where the entire data set presumably is taken from a sum of gaussians function. To do this, I have to apply a gaussian convolution to the data set. My end goal is to find the number of gaussians in this data set by the number of zero crossings in the second order derivative of the convoluted data. The reasons for this come from this article.
Now the problem occurs with scipy's gaussian_filter1d that I'm using to do the convolution. I assume that when it says filter it only means a convolution with a gaussian because there is already a separate function for fourier_gaussian_filter. In addition, to avoid approximation, I'm using the gaussian_filter1d's own 2nd order derivative and then apply the convolution. The problem occurs when I keep lowering the sigma of the gaussian filter that you would assume that it would act more like an dirac delta. And this is what actually occurs at smaller values of sigma in the zero order derivate. Unfortunately, when I apply a 2nd order derivate gaussian filter, the data does not have the zero-crossings that I expect it to. In fact, it doesn't have any zero crossings even when there is only one gaussian in the original data.
Some possible ideas that came to me about what could be the problem is that an actual delta function doesn't have a derivative and that the derivative of a really small sigma Gaussian can't approximate the derivative of a delta. But I wanted to hear the community's thoughts on the problem. Thank you for reading this post.
Does anyone know if it is possible to find a power spectral density of a signal with gaps in it. For example (in matlab syntax cause that is what I'm familiar with)
ta=1:1000;
tb=1200:3000;
t=[ta tb]; % this is the timebase
signal=randn(size(t)); this is a signal
figure(101)
plot(t,signal,'.')
I'd like to be able to determine frequencies on a longer time base that just the individual sections of data. Obviously I could just take the PSD of individual sections but that will limit the lowest frequency. I could interpolate the data, but this would colour the PSD.
Any thoughts would be much appreciated.
The Lomb-Scargle periodogram algorithm is usually used to perform analysis on unevenly spaced data (sampled at arbitrary time points) or when a proportion of the data is missing.
Here's a couple of MATLAB implementations:
lombscargle.m (FEX)
Lomb (Lomb-Scargle) Periodogram (FEX)
lomb.m - ECG tools by Gari Clifford
I found this Non Uniform FFT but I'm not sure that its exactly what I need as it might really be for data that is mostly sampled on an uneven time base, rather than evenly spaced data with significant gaps. I'll give it a go!
Leaving out segments of the Fourier basis vectors results in exactly the same FT, thus PSD, as using the complete basis, but multiplying by zeros within a zero padding in any signal "gaps".