This article https://www.bitweenie.com/listings/fft-zero-padding/ gives a simple relation between time-length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this Waveform frequency resolution.
In other words, if two input frequencies are closer together than 1/time-length_of_input_data, they show up as a single peak in the FFT plot.
My question is: is there a way to increase this Waveform frequency resolution? I am finding it difficult to work with rather short data-series due to this limitation.
As an example, if I use a combination of sine series with periods 9.5, 10, and 11 over 240 datapoints I cannot distinguish between the different frequencies.
To have good frequency resolution you need a long time series.
This is a fundamental issue, called the uncertainty principle. It cannot be overcome within Fourier analysis (Fourier transform, DFT, short-time Fourier transform and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make peaks sharper or more separated.
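A minimal sketch of this point, assuming NumPy and two made-up tones that are 0.003 cycles/sample apart in a 64-sample record (closer than 1/64). Zero padding is done through the `n` argument of `np.fft.rfft`; all numbers are illustrative.

```python
import numpy as np

N = 64                       # record length (illustrative)
n = np.arange(N)
f1, f2 = 0.100, 0.103        # cycles/sample; separation is below 1/N
x = np.sin(2 * np.pi * f1 * n) + np.sin(2 * np.pi * f2 * n)

pad = 16                     # zero-padding factor (illustrative)
X = np.abs(np.fft.rfft(x))                 # N-point spectrum
X_pad = np.abs(np.fft.rfft(x, n=pad * N))  # zero-padded spectrum

# Every pad-th bin of the padded spectrum is an original bin:
# the same spectrum, just sampled more densely.
same_info = np.allclose(X_pad[::pad], X)

# Count local maxima above half the global maximum: the two tones
# still merge into a single peak.
inner = X_pad[1:-1]
is_peak = (inner > X_pad[:-2]) & (inner > X_pad[2:]) & (inner > 0.5 * X_pad.max())
n_peaks = int(is_peak.sum())
```

The padded curve looks smoother when plotted, but it has no more distinct peaks than the unpadded one.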
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
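As a rough illustration, assuming the data really contain exactly one sinusoid and no noise, the peak of a heavily zero-padded spectrum can be located much more finely than 1/N. This is only one possible estimator (a least-squares sinusoid fit is another); the tone frequency and padding factor are made up.

```python
import numpy as np

N = 64
n = np.arange(N)
f_true = 0.1037              # cycles/sample, deliberately off the 1/N grid
x = np.sin(2 * np.pi * f_true * n)

pad = 64                     # dense frequency grid via zero padding
spectrum = np.abs(np.fft.rfft(x, n=pad * N))
freqs = np.fft.rfftfreq(pad * N)       # in cycles/sample
f_est = freqs[np.argmax(spectrum)]

error = abs(f_est - f_true)  # compare with the 1/N resolution limit
```

Here `error` comes out as a small fraction of 1/N. The same trick fails as soon as a second, nearby component is present, which is why the single-component assumption matters.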
Alternatively, you can use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but it generates "cross-terms", i.e. frequency component artifacts. However, when you have only a few frequency components this might be acceptable; it depends on the use case.
I want to compare the power spectra of the time traces of two random processes but the frequency range returned is different.
How is that frequency range chosen and how can I modify it ?
More specifically, what I do is the following:
from scipy import signal as sgn
spectrum1=sgn.periodogram(signal1,fs=fs1)
spectrum2=sgn.periodogram(signal2,fs=fs2)
and my problem is that spectrum1[0] has a significantly different range with respect to spectrum2[0].
The periodogram is computed using the FFT (Fast Fourier Transform), which implements the DFT (Discrete Fourier Transform). The DFT of a periodic signal features discrete frequencies, all multiples of a fundamental frequency set by the duration of the frame T: f_0=1/T.
As a consequence, to get the same frequencies, the durations of the frames must be equal, or at least a multiple of one another:
len(signal1)/fs1 = k*len(signal2)/fs2
This may require truncating one of the arrays. The argument nfft of scipy.signal.periodogram() may also be tried; the requirement then becomes:
nfft1/fs1 = k*nfft2/fs2
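A small sketch of the nfft route with k = 1, assuming placeholder noise data and made-up sampling rates of 100 Hz and 250 Hz. The only requirement used is that fs/nfft (the bin spacing) is the same for both calls.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs1, fs2 = 100.0, 250.0               # illustrative sampling rates (Hz)
signal1 = rng.standard_normal(300)    # placeholder data
signal2 = rng.standard_normal(900)

# Pick nfft so that the bin spacing fs/nfft is 0.25 Hz for both.
# nfft larger than the signal length means the signal is zero-padded.
nfft1, nfft2 = 400, 1000
f1, p1 = signal.periodogram(signal1, fs=fs1, nfft=nfft1)
f2, p2 = signal.periodogram(signal2, fs=fs2, nfft=nfft2)

# f2 extends to fs2/2 = 125 Hz; keep only the range shared with f1.
f2_common, p2_common = f2[:len(f1)], p2[:len(f1)]
```

The two frequency axes now coincide up to fs1/2, so `p1` and `p2_common` can be compared bin by bin.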
If the duration of the frame is not consistent with the actual period of the signal, or if the signal is not periodic, windowing may limit the effects of spectral leakage. It is so useful that it is built into scipy.signal.periodogram() as the argument window. You may try values such as 'hann' or 'parzen' as listed here.
If the sampling rates are not similar, resampling the signal may be required. To this end scipy.signal.resample() can be applied. It also features the argument window and makes use of FFT for resampling, thus avoiding some errors that linear interpolation would trigger.
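For instance, bringing a signal from 250 Hz down to 100 Hz could look like the sketch below. The rates and the 5 Hz test tone are assumptions for illustration; `num` is simply the duration times the new rate.

```python
import numpy as np
from scipy import signal

fs_old, fs_new = 250.0, 100.0
duration = 4.0                                   # seconds
t_old = np.arange(int(duration * fs_old)) / fs_old
x_old = np.sin(2 * np.pi * 5.0 * t_old)          # 5 Hz test tone

num = int(duration * fs_new)                     # number of samples at the new rate
x_new, t_new = signal.resample(x_old, num, t=t_old)
```

Because the method works in the Fourier domain, it treats the frame as one period of a periodic signal; a tone that does not fit the frame an integer number of times will show some edge effects.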
I have data with unevenly spaced (time) samples. How can I find the FFT of the signal and plot it?
Apart from the suggested answers, if your goal is to find the frequencies (and you do not have to use the FFT for some reason, which I can't infer from your question), you can consider using periodograms; more specifically, the Lomb-Scargle periodogram, which can yield frequencies from unevenly spaced data.
Here is a great answer illustrating this suggestion.
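A minimal sketch using `scipy.signal.lombscargle`, assuming randomly placed sample times and a made-up 1.5 Hz test tone. Note that this function expects angular frequencies (rad/s), hence the factor 2π.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 20.0, 200))   # uneven sample times (s)
y = np.sin(2 * np.pi * 1.5 * t)            # 1.5 Hz test tone
y = y - y.mean()                           # remove the mean before the analysis

freqs_hz = np.linspace(0.1, 5.0, 981)      # trial frequencies, 0.005 Hz apart
pgram = lombscargle(t, y, 2 * np.pi * freqs_hz)
peak_hz = freqs_hz[np.argmax(pgram)]
```

Plotting `pgram` against `freqs_hz` gives the spectrum-like picture the question asks for, without resampling the data first.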
You can't do an FFT of an unevenly sampled signal. That invalidates the assumptions of the math the FFT is based upon.
You'll have to resample the signal so you have evenly spaced samples.
This is slightly out of scope of this forum, but you can start in the dsp stackexchange
If you want a quick and dirty solution use the following approach :
choose a time step dt less than or equal to your smallest time between points, or alternatively 20% of the inverse of the maximum frequency you are interested in.
make a buffer with N points with N a power of 2 and N*dt > Tmax - Tmin, or whatever the time window you are interested in.
distribute your points over the 2 closest points, or if you do not mind a bit more 'fuzz' just put it at the nearest point.
You'll end up with a buffer with spikes and zeroes in it, but with the same energy as your original signal.
Now FFT and only use the lowest 20% or so of the frequency lines.
This is an incredibly 'raw' and 'approximative' way of doing things, but it will give some approximation of wiggly power bars over frequency. You can clean the signal up by applying windows.
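The steps above can be sketched roughly as follows, using the simpler "nearest point" variant. The function name `gridded_spectrum`, the 0.8 Hz test tone and dt = 0.04 s (20% of the inverse of an assumed 5 Hz maximum frequency) are all illustrative choices.

```python
import numpy as np

def gridded_spectrum(t, y, dt):
    """Nearest-point gridding of uneven samples followed by an FFT."""
    span = t.max() - t.min()
    n = 1 << int(np.ceil(np.log2(span / dt + 1)))   # power of 2 with n*dt > span
    buf = np.zeros(n)
    idx = np.rint((t - t.min()) / dt).astype(int)   # nearest grid point
    np.add.at(buf, idx, y)                          # spikes on a grid of zeros
    freqs = np.fft.rfftfreq(n, d=dt)
    spec = np.abs(np.fft.rfft(buf))
    keep = len(freqs) // 5                          # lowest ~20% of the lines
    return freqs[:keep], spec[:keep]

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 30.0, 300))            # uneven sample times (s)
y = np.sin(2 * np.pi * 0.8 * t)
freqs, spec = gridded_spectrum(t, y, dt=0.04)
peak_hz = freqs[1:][np.argmax(spec[1:])]            # skip the DC line
```

The result is noisy, as warned, but the dominant line still lands close to the true frequency.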
Note that digital signal processing is a field unto itself. I recommend exploring that rabbit hole, but do expect to spend quite some time down there.
To use an FFT, you will need to create a vector of samples evenly spaced in time.
If the signal was bandlimited to below a sample rate implied by the widest sample spacings, you can try polynomial interpolation between your unevenly spaced samples to create a grid of about the same number of equally spaced samples in time. But, depending on polynomial degree, this might be highly sensitive to any noise in the bandlimiting or sampling process.
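One way to try this, sketched under the assumption that a cubic spline (a piecewise polynomial, used here in place of a single global polynomial) is acceptable. The jittered sample times and the 0.5 Hz tone are made up for the example.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)
t = 0.1 * np.arange(200) + rng.uniform(-0.04, 0.04, 200)  # jittered times (s)
y = np.sin(2 * np.pi * 0.5 * t)

# Same number of samples, now evenly spaced over the same time span.
t_even = np.linspace(t[0], t[-1], len(t))
y_even = CubicSpline(t, y)(t_even)

dt = t_even[1] - t_even[0]
spectrum = np.abs(np.fft.rfft(y_even))
freqs = np.fft.rfftfreq(len(t_even), d=dt)
peak_hz = freqs[np.argmax(spectrum)]
```

With noisy data or large gaps the interpolant can misbehave between samples, so it is worth plotting `y_even` over the raw points before trusting the spectrum.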
I am currently doing a project at university where I am distinguishing between different instruments playing notes of the same pitch using python.
I have recorded various notes on different instruments using a microphone attached to a computer.
I have also recorded background for the room.
So far I have plots for different notes on different instruments, where on the y-axis I have the amplitude in dB: 20*log10(|FFT(signal)|)
And on the x-axis I have DFT sample frequencies
Some of the harmonic peaks are small enough (or the background is large enough) for noise to be a factor (can't post images as I'm a noob!). My problem is calculating the level of uncertainty in the height of the peaks when accounting for background noise.
My question is:
Well, how to calculate the level of uncertainty in the height of the peaks (their relative harmonic amplitudes) when accounting for background noise.
Some ideas:
What dB threshold should I use when classifying what is a harmonic peak and what is attributable to noise (should I discount a peak lower than the maximum background (~28000dB) or the mean (~15000) or perhaps twice one of these values)?
Also, to take account of the noise introduced by background, is it legitimate to subtract the value in FFT bin n for the background, from FFT bin n for my instrument recording?
Also I have looked at this post how can the noise be removed from a recorded sound,using fft in MATLAB? , there seem to be very differing opinions on there.
If it's relevant I can post segments of my code; I'm wary of putting too much up, though, in case of classmate plagiarism.
Links to literature that would help with the project would be very much appreciated. (Still at the stage where I'm plotting the data every which way I can think of to look for distinguishing attributes for each instrument).
Thanks in advance
You seem to be asking many questions. Let me start by answering your first one:
Well, how to calculate the level of uncertainty in the height of the
peaks (their relative harmonic amplitudes) when accounting for
background noise.
You would expect the sound to summate linearly (to a first order approximation). The natural thing to do would be to do some recordings of only the background and then measure the mean amplitude and standard deviation of the harmonics in the background.
As an example, say you are looking at 3 harmonics - 20KHz, 11KHz and 33KHz. Do some recordings of only background and you find mean amplitudes of 1.3dB 2.2dB and 2.3dB with standard deviations of say +/-0.1, +/-0.2 and +/-0.4dB. You now have an uncertainty estimate and a mean background harmonic to subtract from.
There are smarter ways to do this but it's a start.
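A sketch of that bookkeeping, assuming 20 synthetic one-second "background" recordings and three made-up harmonic frequencies. The helper `harmonic_levels_db` is hypothetical, and averaging in dB (rather than in linear amplitude) is a choice, not a rule.

```python
import numpy as np

def harmonic_levels_db(recording, fs, harmonics_hz):
    """Level in dB at the FFT bin nearest to each harmonic frequency."""
    spectrum = np.abs(np.fft.rfft(recording))
    freqs = np.fft.rfftfreq(len(recording), d=1.0 / fs)
    bins = [np.argmin(np.abs(freqs - f)) for f in harmonics_hz]
    return 20 * np.log10(spectrum[bins])

rng = np.random.default_rng(0)
fs = 44100
harmonics_hz = [440.0, 880.0, 1320.0]             # illustrative harmonics
background = [0.01 * rng.standard_normal(fs) for _ in range(20)]

levels = np.array([harmonic_levels_db(b, fs, harmonics_hz) for b in background])
mean_db = levels.mean(axis=0)                     # mean background level per harmonic
std_db = levels.std(axis=0, ddof=1)               # its spread across recordings
```

Running the same function on the instrument recordings gives peak heights measured in the same way, so the two sets of numbers are directly comparable.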
Now then, to get on to your second question
What dB threshold I should use when classifying what is a harmonic
peak and what is attributable to noise (should I discount a peak lower
than the maximum background (~28000dB) or the mean (~15000) or perhaps
twice one of these values)?
If a peak is within the mean plus the uncertainty of the background (one or two standard deviations; this is arbitrary really and depends on convention), you cannot say it's significant; only a peak above that level is. E.g. if you find that the noise level at 3 kHz is 1.2dB with an uncertainty of +/- 0.3dB and you measure your harmonic to be 1.3dB with an uncertainty (measured in the same way) of 0.1dB, then it's not significant.
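Written as a tiny check, with the numbers from the example above. The function name and the default of one standard deviation are assumptions; pick whatever multiple your convention calls for.

```python
def is_significant(peak_db, noise_mean_db, noise_std_db, n_sigma=1.0):
    """True if the peak stands above the background mean plus n_sigma deviations."""
    return peak_db > noise_mean_db + n_sigma * noise_std_db

# Noise at 1.2 dB +/- 0.3 dB, harmonic measured at 1.3 dB.
example = is_significant(1.3, 1.2, 0.3)
```

For this example `example` is false, since 1.3 dB does not exceed 1.2 + 0.3 = 1.5 dB.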
Now for the third part:
Also, to take account of the noise introduced by background, is it
legitimate to subtract the value in FFT bin n for the background, from
FFT bin n for my instrument recording?
Yes (generally speaking). If you really want to convince yourself of this you can either A)do some simulations with summating waves and doing an FFT of them, B)do an experiment and the same as in A or C)Go through the mathematics of Fourier transforms.
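Option A can be as short as the sketch below, which assumes a synthetic 440 Hz tone plus white noise. It checks the linearity of the FFT on the complex coefficients, with the very same noise realization on both sides.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)              # stand-in for the instrument
noise = 0.1 * rng.standard_normal(len(t))         # stand-in for the background

recorded = tone + noise
recovered = np.fft.rfft(recorded) - np.fft.rfft(noise)
linear = np.allclose(recovered, np.fft.rfft(tone))
```

Keep in mind that a background recorded at a different time is a different noise realization, so in practice only average levels can be compared, and dB values themselves do not subtract like this.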
With regard to literature, I think that would depend on what you're doing specifically, if you're a physics student "Mathematical Methods in the Physical Sciences" by Mary Boas treats Fourier transforms well, if you're a computer scientist/engineer you probably want something different.
Let me know if you need more help.
As a musician [bassoon] and neural network researcher who uses the fft to compress bird song, I have a few suggestions:
musical instruments are defined by the bore -- straight, conical, or a combination. This results in emphasis of harmonic variations that will help distinguish.
double Reed, single Reed, flute, and brass instruments have different vibration patterns.
fft can resolve sound into fundamental and harmonics.
train a neural network to recognize harmonic patterns [normalize if the fundamentals differ?] -- use a separate input for each frequency bin AND a separate output for each instrument [or family? -- it may be difficult to distinguish between saxes, for instance, while oboe, English horn, bassoon, and contra-bassoon may be distinguished]. I like neural nets with at least 3 layers and had excellent results doing OCR with 4 layers [2 internal]
For data that is known to have seasonal or daily patterns, I'd like to use Fourier analysis to make predictions. After running fft on time series data, I obtain coefficients. How can I use these coefficients for prediction?
I believe FFT assumes all data it receives constitute one period, then, if I simply regenerate data using ifft, I am also regenerating the continuation of my function, so can I use these values for future values?
Simply put: I run fft for t=0,1,2,..10 then using ifft on coef, can I use regenerated time series for t=11,12,..20 ?
I'm aware that this question may no longer be relevant for you, but for others who are looking for answers I wrote a very simple example of Fourier extrapolation in Python https://gist.github.com/tartakynov/83f3cd8f44208a1856ce
Before you run the script make sure that you have all dependencies installed (numpy, matplotlib). Feel free to experiment with it.
P.S. Locally Stationary Wavelet may be better than Fourier extrapolation. LSW is commonly used in predicting time series. The main disadvantage of Fourier extrapolation is that it just repeats your series with period N, where N is the length of your time series.
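The repetition is easy to verify directly. The sketch below (not the code from the gist) takes 11 random samples, standing in for t = 0..10, and evaluates the inverse DFT sum for t = 0..21.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 11
x = rng.standard_normal(N)          # samples at t = 0..10
X = np.fft.fft(x)

# Evaluate the inverse DFT sum beyond the observed range.
t = np.arange(2 * N)
k = np.arange(N)
basis = np.exp(2j * np.pi * np.outer(t, k) / N)
extended = (basis @ X).real / N
```

Both `extended[:N]` and `extended[N:]` reproduce `x`, so the "prediction" for t = 11..21 is a copy of the observed data.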
It sounds like you want a combination of extrapolation and denoising.
You say you want to repeat the observed data over multiple periods. Well, then just repeat the observed data. No need for Fourier analysis.
But you also want to find "patterns". I assume that means finding the dominant frequency components in the observed data. Then yes, take the Fourier transform, preserve the largest coefficients, and eliminate the rest.
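import numpy as np
X = np.fft.fft(x)
Y = np.zeros(len(X), dtype=complex)     # complex, so the coefficients are not truncated
keep = np.argsort(np.abs(X))[-n_keep:]  # "important" here: the n_keep largest magnitudes (n_keep is your choice)
Y[keep] = X[keep]
As for periodic repetition: let z = [x, x], i.e., two periods of the signal x. Then, with the unnormalized forward DFT that NumPy uses, Z[2k] = 2 X[k] for all k in {0, 1, ..., N-1}, and Z is zero at the odd indices.
Z = np.zeros(2 * len(X), dtype=complex)
Z[::2] = 2 * X                          # np.fft.ifft(Z).real then equals [x, x]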
When you run an FFT on time series data, you transform it into the frequency domain. The coefficients multiply the terms in the series (sines and cosines or complex exponentials), each with a different frequency.
Extrapolation is always a dangerous thing, but you're welcome to try it. You're using past information to predict the future when you do this: "Predict tomorrow's weather by looking at today." Just be aware of the risks.
I'd recommend reading "Black Swan".
You can use the script that @tartakynov posted and, so as not to repeat exactly the same time series in the forecast (overfitting), you can add a new parameter to the function called n_param and fix a lower bound h for the amplitudes of the frequencies.
def fourierExtrapolation(x, n_predict,n_param):
Usually you will find that some frequencies in a signal have significantly higher amplitude than the others, so if you select these frequencies you will be able to isolate the periodic nature of the signal.
You can add these two lines, which keep only the n_param largest-amplitude frequencies:
h=np.sort(np.absolute(x_freqdom))[-n_param]  # amplitude of the n_param-th largest component
x_freqdom=[ x_freqdom[i] if np.absolute(x_freqdom[i])>=h else 0 for i in range(len(x_freqdom)) ]
just adding this you will be able to forecast nice and smooth
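To see the effect in isolation, here is a self-contained sketch of the thresholding idea. It is not the gist's function: `fourier_forecast` is a hypothetical name, there is no detrending step, and the noisy period-12 test series is made up.

```python
import numpy as np

def fourier_forecast(x, n_predict, n_param):
    """Keep the n_param largest-amplitude DFT coefficients and extend the sum."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.sort(np.abs(X))[-n_param]            # amplitude threshold
    X = np.where(np.abs(X) >= h, X, 0)          # zero the weaker components
    t = np.arange(N + n_predict)
    k = np.arange(N)
    return (np.exp(2j * np.pi * np.outer(t, k) / N) @ X).real / N

rng = np.random.default_rng(0)
t = np.arange(120)
x = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))

# n_param=2 keeps the positive- and negative-frequency bins of the sine.
forecast = fourier_forecast(x, n_predict=24, n_param=2)
clean = np.sin(2 * np.pi * np.arange(144) / 12)
max_error = np.max(np.abs(forecast - clean))
```

With the noise bins zeroed, the extension follows the underlying sine instead of replaying the noisy samples.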
Another useful article about FFT:
forecast FFT in R
Does anyone know if it is possible to find a power spectral density of a signal with gaps in it. For example (in matlab syntax cause that is what I'm familiar with)
ta=1:1000;
tb=1200:3000;
t=[ta tb]; % this is the timebase
signal=randn(size(t)); % this is a signal
figure(101)
plot(t,signal,'.')
I'd like to be able to determine frequencies on a longer time base than just the individual sections of data. Obviously I could just take the PSD of individual sections, but that will limit the lowest frequency. I could interpolate the data, but this would colour the PSD.
Any thoughts would be much appreciated.
The Lomb-Scargle periodogram algorithm is usually used to perform analysis on unevenly spaced data (sampled at arbitrary time points) or when a proportion of the data is missing.
Here's a couple of MATLAB implementations:
lombscargle.m (FEX)
Lomb (Lomb-Scargle) Periodogram (FEX)
lomb.m - ECG tools by Gari Clifford
I found this Non Uniform FFT, but I'm not sure that it's exactly what I need, as it might really be meant for data that is mostly sampled on an uneven time base rather than evenly spaced data with significant gaps. I'll give it a go!
Leaving the gap samples out of the Fourier basis vectors gives exactly the same FT, and thus the same PSD, as using the complete basis vectors and filling the signal "gaps" with zeros.
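This equivalence can be checked numerically. The sketch below is in Python rather than MATLAB and assumes a 300-sample time base with samples 100..119 missing; the signal itself is random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300
t = np.concatenate([np.arange(0, 100), np.arange(120, N)])  # samples 100..119 missing
y = rng.standard_normal(len(t))

# Route 1: fill the gap with zeros and take an ordinary FFT.
filled = np.zeros(N)
filled[t] = y
F_zero_filled = np.fft.fft(filled)

# Route 2: use only the rows of the Fourier basis where data exist.
k = np.arange(N)
partial_basis = np.exp(-2j * np.pi * np.outer(k, t) / N)
F_partial = partial_basis @ y

same = np.allclose(F_partial, F_zero_filled)
```

Either way the gap acts like a window multiplied onto the signal, so the resulting PSD is still shaped by the gap pattern.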