Does anyone know if it is possible to find the power spectral density of a signal with gaps in it? For example (in MATLAB syntax because that is what I'm familiar with):
ta=1:1000;
tb=1200:3000;
t=[ta tb]; % this is the timebase
signal=randn(size(t)); % this is a signal
figure(101)
plot(t,signal,'.')
I'd like to be able to determine frequencies on a longer time base than just the individual sections of data. Obviously I could just take the PSD of the individual sections, but that will limit the lowest frequency. I could interpolate the data, but this would colour the PSD.
Any thoughts would be much appreciated.
The Lomb-Scargle periodogram algorithm is usually used to perform analysis on unevenly spaced data (sampled at arbitrary time points) or when a proportion of the data is missing.
Here's a couple of MATLAB implementations:
lombscargle.m (FEX)
Lomb (Lomb-Scargle) Periodogram (FEX)
lomb.m - ECG tools by Gari Clifford
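For anyone working in Python rather than MATLAB, here is a minimal sketch using scipy.signal.lombscargle on the gapped time base from the question. The test signal (a sinusoid with a 250-sample period plus noise) and the frequency grid are my own assumptions for illustration, not part of the original question.

```python
import numpy as np
from scipy.signal import lombscargle

# Time base with a gap, mirroring the question: 1..1000 and 1200..3000
t = np.concatenate([np.arange(1, 1001), np.arange(1200, 3001)]).astype(float)

# Illustrative test signal (an assumption): sinusoid with a 250-sample period plus noise
rng = np.random.default_rng(0)
f_true = 1.0 / 250.0
y = np.sin(2 * np.pi * f_true * t) + 0.5 * rng.standard_normal(t.size)
y = y - y.mean()  # remove the mean before computing the periodogram

# Candidate frequencies in cycles per sample; lombscargle expects angular frequencies
freqs = np.linspace(1e-4, 0.02, 2000)
pgram = lombscargle(t, y, 2 * np.pi * freqs)

f_peak = freqs[np.argmax(pgram)]
print(f"strongest periodogram peak at {f_peak:.5f} cycles/sample")
```

The frequency grid can extend below 1/1800, which is the limit you would hit by analysing the longer section on its own.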
I found this Non Uniform FFT but I'm not sure that it's exactly what I need, as it might really be for data that is mostly sampled on an uneven time base, rather than evenly spaced data with significant gaps. I'll give it a go!
Leaving out segments of the Fourier basis vectors results in exactly the same FT, and thus PSD, as using the complete basis and filling any signal "gaps" with zeros.
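A small numerical check of this statement, as a sketch with made-up sizes (300 samples, of which samples 100..119 are treated as missing): the FFT of the zero-filled signal matches a DFT sum taken only over the available samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n_total = 300
n = np.arange(n_total)
keep = (n < 100) | (n >= 120)        # samples 100..119 are treated as missing
x = rng.standard_normal(n_total)

# (a) Fill the gap with zeros and take an ordinary FFT
x_zero_filled = np.where(keep, x, 0.0)
X_fft = np.fft.fft(x_zero_filled)

# (b) Sum over the available samples only, using truncated basis vectors
k = np.arange(n_total)
basis = np.exp(-2j * np.pi * np.outer(k, n[keep]) / n_total)
X_partial = basis @ x[keep]

print(np.allclose(X_fft, X_partial))
```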
This article https://www.bitweenie.com/listings/fft-zero-padding/ gives a simple relation between time-length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this Waveform frequency resolution.
In other words; if two input-frequencies are closer in frequency than 1/time-length_of_input_data, they will show as only one peak in the FFT-plot.
My question is: is there a way to increase this Waveform frequency resolution? I am finding it difficult to work with rather short data-series due to this limitation.
As an example, if I use a combination of sine series with periods 9.5, 10, and 11 over 240 datapoints I cannot distinguish between the different frequencies.
To have good frequency resolution you need a long time series.
This is a fundamental issue, called the uncertainty principle. It cannot be overcome within Fourier analysis (Fourier transform, DFT, short-time Fourier transform and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make peaks sharper or more separated.
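To illustrate with the numbers from the question (periods 9.5, 10 and 11 over 240 points), here is a small sketch. The padding factor of 4 is an arbitrary choice of mine. The padded spectrum passes through exactly the same values at the original bin positions; it only adds samples in between.

```python
import numpy as np

n = np.arange(240)
# The three periods from the question: 9.5, 10 and 11 samples
x = sum(np.sin(2 * np.pi * n / p) for p in (9.5, 10.0, 11.0))

X = np.fft.rfft(x)                           # 240-point transform
pad = 4                                      # arbitrary padding factor
X_padded = np.fft.rfft(x, n=pad * x.size)    # zero-padded to 960 points

# Every 4th bin of the padded spectrum coincides with the unpadded one
same_at_shared_bins = np.allclose(X_padded[::pad], X)
print(same_at_shared_bins)
```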
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
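As a rough sketch of what such an assumption buys you: if the data is assumed to contain exactly one sinusoid, you can scan candidate frequencies around the FFT peak and, for each one, fit amplitude and phase by linear least squares. The helper name fit_single_frequency and the grid size are hypothetical choices for illustration, and the approach breaks down once a second component is present.

```python
import numpy as np

def fit_single_frequency(x, n_grid=2001):
    """Estimate the frequency (cycles/sample) of a single sinusoid in x.

    Hypothetical helper: assumes x contains exactly one sinusoid.
    """
    n = np.arange(x.size)
    # Coarse estimate: strongest non-DC FFT bin
    spectrum = np.abs(np.fft.rfft(x))
    k_peak = 1 + np.argmax(spectrum[1:])
    bin_width = 1.0 / x.size
    candidates = np.linspace((k_peak - 1) * bin_width,
                             (k_peak + 1) * bin_width, n_grid)

    best_f, best_res = None, np.inf
    for f in candidates:
        # For a fixed frequency, amplitude and phase follow from linear least squares
        A = np.column_stack([np.cos(2 * np.pi * f * n),
                             np.sin(2 * np.pi * f * n)])
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        res = np.sum((x - A @ coef) ** 2)
        if res < best_res:
            best_f, best_res = f, res
    return best_f

# Illustrative data (an assumption): 64 samples, frequency between two FFT bins
f_true = 0.1037
x = np.cos(2 * np.pi * f_true * np.arange(64) + 0.3)
f_est = fit_single_frequency(x)
print(f_est, 1.0 / x.size)   # estimate, and the FFT bin spacing for comparison
```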
Also you can use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but generates "cross-terms", i.e. frequency component artifacts. However, when you only have a few frequency components this might be acceptable. It depends on the use case.
In many tutorials/blogs I've seen the output of np.fft.fft(signal) divided by the number of sample points N.
I understand that some implementations scale/normalize the transform by some factor, like multiplying by N. However, I just read the docs, and by default the output of fft.fft() is unscaled. Yet I still see the output divided by N everywhere.
Why is this?
I have noticed that by scaling the output by 1/N I get back the correct amplitudes of the contributing wave signals. So obviously it is necessary, but I'd like to understand what the pure output is as compared to the scaled output.
For the DFT to be reversible (x == IDFT(DFT(x))), you need to divide by N somewhere. In signal processing this normalization is typically done in the inverse transform. For example Wikipedia shows it this way.
In other fields it is more often done in the forward transform. In physics I have seen half the normalization (1/sqrt(N)) applied to each transform, making them symmetric.
When the forward transform normalizes, then the values it returns are independent of the signal length (for example the zero frequency is the mean of all signal values). This is therefore the more useful variant when studying signal power.
With the variant where the normalization is applied in the inverse transform (as commonly implemented in signal processing software, such as np.fft.fft() and MATLAB's fft), computing the convolution by multiplication in the frequency domain is easiest: one can directly write g = IDFT(DFT(f)*DFT(h)). If the normalization is applied elsewhere, it must be partly undone to obtain a correctly scaled result.
Other software, for example the FFTW library, does not normalize the transform at all, leaving that up to the user. This avoids unnecessary multiplications if the user wants a different normalization variant than what the library chooses.
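A short sketch of these conventions in NumPy. The test signals and the length N = 128 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 128
x = rng.standard_normal(N)

X = np.fft.fft(x)                        # NumPy default: forward transform unscaled
X_fwd = X / N                            # forward-normalized variant
X_ortho = np.fft.fft(x, norm="ortho")    # symmetric variant, 1/sqrt(N) each way

x_back = np.fft.ifft(X).real             # ifft applies the 1/N, so the pair is reversible
dc_is_mean = np.isclose(X_fwd[0].real, x.mean())

# A cosine of amplitude 3 with exactly 8 cycles in the frame
n = np.arange(N)
s = 3.0 * np.cos(2 * np.pi * 8 * n / N)
amp = 2 * np.abs(np.fft.fft(s))[8] / N   # factor 2: the +f and -f bins share the amplitude

# Circular convolution with the unscaled forward transform: no extra factors needed
f = rng.standard_normal(N)
h = rng.standard_normal(N)
g = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real
```

Dividing by N (and doubling for a real signal) is what recovers the amplitude of the cosine, which is why the 1/N shows up in so many tutorials.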
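Based on Parseval's theorem for the (unnormalized) DFT,
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2$$
This expresses that the energies in the time and frequency domains are the same. It means the magnitude |X[k]| of each frequency bin k is contributed by N samples. In order to find the average contribution of each sample, the magnitude is normalized as |X[k]|/N, which leads to
$$\frac{1}{N} \sum_{n=0}^{N-1} |x[n]|^2 = \sum_{k=0}^{N-1} \left| \frac{X[k]}{N} \right|^2$$
where the LHS is the power of the signal.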
However, such a scale normally doesn't matter unless you care about the unit of the magnitude, like in the case of the Sound Pressure Level (SPL) spectrum.
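Parseval's theorem for the unscaled DFT is easy to check numerically. This is a sketch with an arbitrary random signal of 256 samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(256)
N = x.size
X = np.fft.fft(x)                          # unscaled forward transform

# Energy: the time-domain sum equals the frequency-domain sum divided by N
energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / N

# Power: mean square of the signal equals the sum of |X[k]/N|^2
power_time = energy_time / N
power_freq = np.sum(np.abs(X / N) ** 2)

print(np.isclose(energy_time, energy_freq), np.isclose(power_time, power_freq))
```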
I want to compare the power spectra of the time traces of two random processes but the frequency range returned is different.
How is that frequency range chosen and how can I modify it ?
More specifically, what I do is the following:
from scipy import signal as sgn
spectrum1=sgn.periodogram(signal1,fs=fs1)
spectrum2=sgn.periodogram(signal2,fs=fs2)
and my problem is that spectrum1[0] has a significantly different range with respect to spectrum2[0].
The periodogram is computed using the FFT (Fast Fourier Transform), which implements the DFT (Discrete Fourier Transform). The DFT of a periodic signal features discrete frequencies, all multiples of a fundamental frequency set by the duration of the frame T: f_0=1/T.
As a consequence, to get the same frequencies, the durations of the frames must be equal, or at least a multiple of one another:
len(signal1)/fs1 = k*len(signal2)/fs2
This may require truncating one of the arrays. The argument nfft of scipy.signal.periodogram() may also be tried, in which case the requirement becomes:
nfft1/fs1 = k*nfft2/fs2
If the duration of the frame is not consistent with the actual period of the signal, or if the signal is not periodic, windowing may limit the effects of spectral leakage. It is so useful that it is integrated into scipy.signal.periodogram() as an argument. You may try values 'hann' or 'parzen' as listed here.
If the sampling rates are not similar, resampling the signal may be required. To this end scipy.signal.resample() can be applied. It also features the argument window and makes use of FFT for resampling, thus avoiding some errors that linear interpolation would trigger.
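As a sketch of the nfft approach: the sampling rates, the signal lengths and the random data below are made up for illustration. Only the condition nfft1/fs1 = nfft2/fs2 matters.

```python
import numpy as np
from scipy import signal as sgn

rng = np.random.default_rng(4)
fs1, fs2 = 100.0, 50.0                 # illustrative sampling rates (assumptions)
signal1 = rng.standard_normal(1000)    # 10 s at fs1
signal2 = rng.standard_normal(400)     # 8 s at fs2

# Choose nfft so that nfft1/fs1 == nfft2/fs2 (here both equal 10 s)
nfft1, nfft2 = 1000, 500
f1, p1 = sgn.periodogram(signal1, fs=fs1, window="hann", nfft=nfft1)
f2, p2 = sgn.periodogram(signal2, fs=fs2, window="hann", nfft=nfft2)

# Both grids now have the same 0.1 Hz spacing; f2 stops at fs2/2, f1 at fs1/2
n_common = min(f1.size, f2.size)
print(f1[1] - f1[0], f2[1] - f2[0], n_common)
```

The spectra can then be compared bin by bin over the first n_common frequencies. The upper limits still differ because each one is half of its own sampling rate.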
I have data with unevenly spaced (time) samples. How can I find the FFT of the signal and plot it?
Apart from the suggested answers, if your goal is to find the frequencies (and you do not have to use the FFT for some reason, which I can't infer from your question), you can consider using periodograms; more specifically, the Lomb-Scargle periodogram, which can yield frequencies from unevenly spaced data.
Here is a great answer illustrating this suggestion.
You can't do an FFT of an unevenly sampled signal. That invalidates the assumptions of the math the FFT is based upon.
You'll have to resample the signal so you have evenly spaced samples.
This is slightly out of scope of this forum, but you can start in the dsp stackexchange
If you want a quick and dirty solution use the following approach :
choose a time step dt less than or equal to your smallest time between points, or alternatively 20% of the inverse of the maximum frequency you are interested in.
make a buffer with N points with N a power of 2 and N*dt > Tmax - Tmin, or whatever the time window you are interested in.
distribute your points over the 2 closest points, or if you do not mind a bit more 'fuzz' just put it at the nearest point.
You'll end up with a buffer with spikes and zeroes in it, but with the same energy as your original signal.
Now FFT and only use the lowest 20% or so of the frequency lines.
This is an incredibly 'raw' and 'approximative' way of doing things, but it will give some approximation of wiggly power bars over frequency. You can clean the signal up by applying windows.
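The steps above can be sketched roughly as follows. The function name quick_spectrum is hypothetical. It uses the simpler nearest-point placement rather than splitting each sample over its two neighbours, and it assumes all sample times are distinct. The jittered 1 Hz test signal is made up for illustration.

```python
import numpy as np

def quick_spectrum(t, y, keep_fraction=0.2):
    """Grid unevenly spaced samples onto a fine uniform buffer, then FFT.

    Hypothetical helper; nearest-point placement, distinct sample times assumed.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t0 = t.min()
    span = t.max() - t0
    # Step slightly below the smallest spacing so no two samples share a slot
    dt = 0.9 * np.min(np.diff(np.sort(t)))
    # Power-of-2 buffer length with n * dt > span
    n = 1 << int(np.ceil(np.log2(span / dt + 1)))
    buf = np.zeros(n)
    idx = np.floor((t - t0) / dt + 0.5).astype(int)
    np.add.at(buf, idx, y)               # spikes at sample positions, zeros elsewhere
    spectrum = np.fft.rfft(buf)
    freqs = np.fft.rfftfreq(n, d=dt)
    keep = int(keep_fraction * freqs.size)   # only the lowest ~20% of lines
    return freqs[:keep], np.abs(spectrum[:keep]) ** 2

# Illustrative data (an assumption): a 1 Hz sinusoid on a jittered time base
rng = np.random.default_rng(6)
t = 0.1 * np.arange(200) + rng.uniform(-0.03, 0.03, 200)
y = np.sin(2 * np.pi * 1.0 * t)
freqs, power = quick_spectrum(t, y)
f_peak = freqs[np.argmax(power)]
print(f_peak)
```

Note that the buffer length grows as the smallest spacing shrinks, so a single pair of very close samples can make the buffer large.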
Note that digital signal processing is a field unto itself. I recommend exploring that rabbit hole, but do expect to spend quite some time down there.
To use an FFT, you will need to create a vector of samples evenly spaced in time.
If the signal was bandlimited to below a sample rate implied by the widest sample spacings, you can try polynomial interpolation between your unevenly spaced samples to create a grid of about the same number of equally spaced samples in time. But, depending on polynomial degree, this might be highly sensitive to any noise in the bandlimiting or sampling process.
I spent a couple of days trying to solve this problem, but had no luck, so I turn to you. I have a file with photometry of a star, containing time and amplitude data. I'm supposed to use this data to find period changes. I used Lomb-Scargle from the pysca library, but I have to use Fourier analysis. I tried fft (dft) from scipy and numpy but I couldn't get anything that would resemble a frequency spectrum or Fourier coefficients. I even tried to use nfft from the pynfft library because my data are not evenly sampled, but I did not get anywhere with this. So if any of you know how to get the main frequency of periodic data from Fourier analysis, please let me know.
Before doing the FFT, you will need to resample or interpolate the data until you get a set of amplitude values equally spaced in time.
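A minimal sketch of that idea, using linear interpolation with np.interp onto a uniform grid and then picking the strongest FFT bin. The helper name dominant_frequency and the synthetic data are assumptions for illustration, not the photometry file from the question. Linear interpolation acts as a low-pass filter, so treat amplitudes at higher frequencies with caution.

```python
import numpy as np

def dominant_frequency(t, y, n_grid=None):
    """Interpolate (t, y) onto a uniform grid and return the strongest frequency.

    Hypothetical helper for illustration; uses linear interpolation.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(t)                 # np.interp needs increasing sample times
    t, y = t[order], y[order]
    if n_grid is None:
        n_grid = t.size                   # keep roughly the same number of samples
    t_uniform = np.linspace(t[0], t[-1], n_grid)
    y_uniform = np.interp(t_uniform, t, y)
    y_uniform = y_uniform - y_uniform.mean()   # remove the mean so DC does not dominate
    dt = t_uniform[1] - t_uniform[0]
    spectrum = np.abs(np.fft.rfft(y_uniform))
    freqs = np.fft.rfftfreq(n_grid, d=dt)
    return freqs[np.argmax(spectrum)]

# Synthetic stand-in data (an assumption): 0.5 cycles per time unit, random sample times
rng = np.random.default_rng(5)
t_obs = np.sort(rng.uniform(0.0, 40.0, 400))
y_obs = np.sin(2 * np.pi * 0.5 * t_obs)
f_main = dominant_frequency(t_obs, y_obs)
print(f_main)
```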