How to find the FFT of an unevenly sampled signal in python3?

I have data with unevenly spaced (time) samples. How can I find the FFT of the signal and plot it?

Apart from the suggested answers, if your goal is to find the frequencies (and you are not required to use the FFT for some reason, which I can't infer from your question), you can consider using periodograms; more specifically, the Lomb-Scargle periodogram, which can yield frequencies from unevenly spaced data.
Here is a great answer illustrating this suggestion.
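As a rough illustration of that suggestion, here is a minimal sketch using scipy.signal.lombscargle on a made-up 1.5 Hz sine sampled at random times (the normalize flag needs a reasonably recent SciPy):
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 200))     # unevenly spaced sample times
y = np.sin(2 * np.pi * 1.5 * t)          # toy signal with a 1.5 Hz component

freqs = np.linspace(0.1, 5, 1000)        # frequencies to evaluate, in Hz
pgram = lombscargle(t, y, 2 * np.pi * freqs, normalize=True)

plt.plot(freqs, pgram)                   # the peak shows up near 1.5 Hz
plt.xlabel('Frequency [Hz]')
plt.ylabel('Normalized Lomb-Scargle power')
plt.show()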

You can't do an FFT of an unevenly sampled signal. That invalidates the assumptions of the math the FFT is based upon.
You'll have to resample the signal so you have evenly spaced samples.
This is slightly out of scope for this forum, but you can start at the DSP Stack Exchange.
If you want a quick and dirty solution, use the following approach:
Choose a time step dt less than or equal to your smallest spacing between points, or alternatively 20% of the inverse of the maximum frequency you are interested in.
Make a buffer of N points, with N a power of 2 and N*dt > Tmax - Tmin, or whatever time window you are interested in.
Distribute each sample over the two closest buffer points, or, if you do not mind a bit more 'fuzz', just put it at the nearest point.
You'll end up with a buffer with spikes and zeroes in it, but with the same energy as your original signal.
Now take the FFT and only use the lowest 20% or so of the frequency lines.
This is an incredibly 'raw' and 'approximate' way of doing things, but it will give some approximation of wiggly power bars over frequency (see the sketch below). You can clean the signal up by applying windows.
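Here is a rough sketch of that recipe (nearest-point version only; t and y stand for your uneven sample times and values, which are not given in the question):
import numpy as np

def binned_fft(t, y, dt=None):
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if dt is None:
        dt = np.min(np.diff(np.sort(t)))                       # smallest spacing between samples
    n = int(2 ** np.ceil(np.log2((t.max() - t.min()) / dt)))   # power-of-2 buffer length
    buf = np.zeros(n)
    idx = np.clip(np.rint((t - t.min()) / dt).astype(int), 0, n - 1)
    np.add.at(buf, idx, y)                                     # spikes and zeroes, same energy
    freqs = np.fft.rfftfreq(n, d=dt)
    spectrum = np.abs(np.fft.rfft(buf))
    keep = freqs <= 0.2 * freqs.max()                          # only trust the lowest ~20%
    return freqs[keep], spectrum[keep]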
Note that digital signal processing is a field unto itself. I recommend exploring that rabbit hole, but do expect to spend quite some time down there.

To use an FFT, you will need to create a vector of samples evenly spaced in time.
If the signal is bandlimited to below the sample rate implied by the widest sample spacing, you can try polynomial interpolation between your unevenly spaced samples to create a grid of about the same number of equally spaced samples in time. But, depending on the polynomial degree, this might be highly sensitive to any noise in the bandlimiting or sampling process.
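A minimal sketch of that idea, using plain linear interpolation (t and y are assumed to hold the uneven sample times, sorted ascending, and the sample values; swap np.interp for scipy.interpolate if you want a higher-order polynomial):
import numpy as np

t_even = np.linspace(t.min(), t.max(), len(t))   # roughly the same number of samples
y_even = np.interp(t_even, t, y)                 # resample onto the even grid
spectrum = np.fft.rfft(y_even)
freqs = np.fft.rfftfreq(len(y_even), d=t_even[1] - t_even[0])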

Related

Waveform frequency resolution for FFT – any way to increase it?

This article https://www.bitweenie.com/listings/fft-zero-padding/ gives a simple relation between the time length of the input data to the FFT and the minimum distance between two frequencies that can be distinguished in the FFT. The article calls this the waveform frequency resolution.
In other words, if two input frequencies are closer together than 1/(time length of input data), they will show up as only one peak in the FFT plot.
My question is: is there a way to increase this waveform frequency resolution? I am finding it difficult to work with rather short data series due to this limitation.
As an example, if I use a combination of sine series with periods 9.5, 10, and 11 over 240 datapoints I cannot distinguish between the different frequencies.
To have good frequency resolution you need a long time series.
This is a fundamental issue, called the uncertainty principle. It cannot be overcome within Fourier analysis (Fourier transform, DFT, short-time Fourier transform and so on).
Also note that zero padding will not overcome this issue.
It gives more points in the frequency domain, in the sense that the same spectral information is sampled more densely, but it will not make peaks sharper or more separated.
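A quick sketch to illustrate this (the numbers are made up; the two tones are closer together than 1/T, so they merge with or without padding):
import numpy as np

fs = 24.0                                   # samples per second
t = np.arange(240) / fs                     # T = 10 s, so the resolution limit is 0.1 Hz
y = np.sin(2 * np.pi * 1.00 * t) + np.sin(2 * np.pi * 1.05 * t)   # tones 0.05 Hz apart

a_plain = np.abs(np.fft.rfft(y))                   # single broad peak near 1 Hz
a_padded = np.abs(np.fft.rfft(y, n=8 * len(y)))    # denser samples of the same lobe
# Both spectra show one merged peak; padding only interpolates the same curve.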
The only way to overcome the uncertainty principle is to make further assumptions on the data.
If for example you know that there is only a single frequency component, it is possible to determine its frequency more accurately than the uncertainty principle predicts.
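For example (a sketch, not part of the original answer), with a single dominant tone you can refine the frequency well beyond the 1/T bin spacing by quadratic interpolation of the log magnitude around the strongest bin:
import numpy as np

def refined_peak_frequency(y, fs):
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))   # mild window against leakage
    k = int(np.argmax(spec))
    if 0 < k < len(spec) - 1:
        a, b, c = np.log(spec[k - 1:k + 2])
        delta = 0.5 * (a - c) / (a - 2 * b + c)          # sub-bin offset in [-0.5, 0.5]
    else:
        delta = 0.0
    return (k + delta) * fs / len(y)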
You can also use transforms such as the Wigner-Ville transform. It is not bound by the uncertainty principle, but it generates "cross terms", i.e. frequency-component artifacts. However, when you only have a few frequency components this might be acceptable. It depends on the use case.

Optimising parameters for finding peaks in 1D array

I need to optimise a method for finding the number of data peaks in a 1D array. The data is a time-series of the amplitude of a wav file.
I have the code implemented already:
from scipy.io.wavfile import read
from scipy.signal import find_peaks
_, amplitudes = read('audio1.wav')
indexes, _ = find_peaks(amplitudes, height=80)
print(f'Number of peaks: {len(indexes)}')
When plotted, the data looks like this:
(Figure: the data at a general scale)
The 'peaks' that I am interested in are clear to the human eye - there are 23 in this particular dataset.
However, because the array is so large, the data is extremely variant within the peaks that are clear at a general scale (hence the many hundreds of peaks labelled with blue crosses):
(Figure: zoomed-in view of one peak)
Peak-finding questions have been asked many times before (I've been through a lot of them!) - but I can't find any help or explanation of optimising the parameters for finding only the peaks I want. I know a little about Python, but am blind when it comes to mathematical analysis!
Analysing by width seems useless because, as per the second image, the peaks clear at a large scale are actually interspersed with 'silent' ranges. Distance is not helpful because I do not know how close the peaks will be in other wav files. Prominence has been suggested as the best method but I could not get the results I needed; threshold likewise. I have also tried decimating the signal, smoothing the signal with a Savitzky-Golay filter, and different combinations of parameters and values, all with inaccurate results.
Height alone has been useful because I can see from the charts that peaks always reach above 80.
This is a common task in audio processing and there are several approaches which totally depend on your data.
However, there are implementations out there which are used for finding peaks in novelty functions (e.g., the output from a beat tracker). Try these ones:
https://madmom.readthedocs.io/en/latest/modules/features/onsets.html#madmom.features.onsets.peak_picking
https://librosa.github.io/librosa/generated/librosa.util.peak_pick.html#librosa.util.peak_pick
Basically they implement the same method but there might be differences in the details.
Furthermore, you could check whether you really need to work at this high sampling frequency. Try downsampling the signal or using a moving-average filter, as in the sketch below.
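A sketch of that "smooth first, then pick peaks" idea (the window length, height and distance values are guesses to tune, not known-good settings, and a mono wav file is assumed):
import numpy as np
from scipy.io.wavfile import read
from scipy.signal import find_peaks

fs, amplitudes = read('audio1.wav')                 # assumes a mono file
envelope = np.abs(amplitudes.astype(float))

win = 2048                                          # roughly 50 ms at 44.1 kHz
smoothed = np.convolve(envelope, np.ones(win) / win, mode='same')  # moving average

# With a smooth envelope, height and distance become much easier to choose.
peaks, _ = find_peaks(smoothed, height=80, distance=fs // 2)
print(f'Number of peaks: {len(peaks)}')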
Look at 0d persistent homology to find a good strategy, where the parameter you can optimize for is peak persistence. A nice blog post here explains the basics.
But in short the idea is to imagine your graph being filled by water, and then slowly draining the water. Every time a piece of the graph comes above water a new island is born. When two islands are next to each other they merge, which causes the younger island (with the lower peak) to die.
Then each data point has a birth time and a death time. The most significant peaks are those with the longest persistence, which is death - birth.
If the water level drops at a continuous rate, then persistence is defined in terms of peak height. Another possibility is to drop the water instantaneously from point to point as time goes from step t to step t+1, in which case persistence is defined as peak width in terms of signal samples.
For you it seems that using the original definition in terms of peak height > 70 finds all the peaks you are interested in, albeit possibly too many, clustered together. You can limit this by choosing the first peak in each cluster, or the highest peak in each cluster, or by doing both approaches and only keeping peaks that have both great height persistence and great width persistence.
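A minimal sketch of the water-level idea (not a polished library; it returns (birth, death, index) triples sorted by persistence, so you threshold on birth - death instead of on raw height):
import numpy as np

def persistence_peaks(y):
    y = np.asarray(y, dtype=float)
    order = np.argsort(-y)       # visit samples from highest to lowest (draining water)
    parent = {}                  # sample index -> representative peak index
    birth = {}                   # peak index -> height at which its island appeared
    triples = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = y[i]
        for j in (i - 1, i + 1):
            if j not in parent:
                continue                     # neighbour is still under water
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            if birth[ri] < birth[rj]:
                ri, rj = rj, ri              # the younger (lower) island dies in the merge
            triples.append((birth[rj], y[i], rj))
            parent[rj] = ri
    for r in {find(i) for i in parent}:
        triples.append((birth[r], -np.inf, r))   # surviving islands never die
    return sorted(triples, key=lambda p: p[0] - p[1], reverse=True)
The most significant peaks come first; keeping only the triples whose persistence exceeds some threshold plays the role that height=80 plays in the original code.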

Choosing a precise frequency interval for power spectrum

I want to compare the power spectra of the time traces of two random processes, but the frequency ranges returned are different.
How is that frequency range chosen, and how can I modify it?
More specifically, what I do is the following:
from scipy import signal as sgn
spectrum1=sgn.periodogram(signal1,fs=fs1)
spectrum2=sgn.periodogram(signal2,fs=fs2)
and my problem is that spectrum1[0] has a significantly different range with respect to spectrum2[0].
The periodogram is computed using the FFT (Fast Fourier Transform), which implements the DFT (Discrete Fourier Transform). The DFT of a periodic signal features discrete frequencies, all multiples of a fundamental frequency set by the duration of the frame T: f_0 = 1/T.
As a consequence, to get the same frequencies, the durations of the frames must be equal, or at least a multiple of one another:
len(signal1)/fs1 = k*len(signal2)/fs2
It may require truncating one of the arrays. The argument nfft of scipy.signal.periodogram() may also be tried, in which case the requirement becomes:
nfft1/fs1 = k*nfft2/fs2
If the duration of the frame is not consistent with the actual period of the signal, or if the signal is not periodic, windowing may limit the effects of spectral leakage. It is so useful that it is integrated into scipy.signal.periodogram() as an argument. You may try the values 'hann' or 'parzen' as listed here.
If the sampling rates are not similar, resampling the signal may be required. To this end scipy.signal.resample() can be applied. It also features the window argument and makes use of the FFT for resampling, thus avoiding some of the errors that linear interpolation would introduce.
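A sketch of the nfft route, reusing the names from the question (signal1, signal2, fs1, fs2); because of rounding, the grids match only approximately if T*fs is not an integer:
from scipy import signal as sgn

T = min(len(signal1) / fs1, len(signal2) / fs2)   # common frame duration in seconds
nfft1 = int(round(T * fs1))
nfft2 = int(round(T * fs2))

f1, p1 = sgn.periodogram(signal1[:nfft1], fs=fs1, nfft=nfft1, window='hann')
f2, p2 = sgn.periodogram(signal2[:nfft2], fs=fs2, nfft=nfft2, window='hann')
# f1 and f2 now share the same spacing 1/T; their upper limits still differ
# (fs1/2 versus fs2/2) unless the signals are also resampled to a common rate.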

How can I discriminate the falling edge of a signal with Python?

I am working on discriminating some signals for the calculation of the free-period oscillation and damping ratio of a spring-mass system (seismometer). I am using Python as the main processing program. What I need to do is import this signal, parse the signal and find the falling edge, then return a list of the peaks from the tops of each oscillation as a list so that I can calculate the damping ratio. The calculation of the free-period is fairly straightforward once I've determined the location of the oscillation within the dataset.
Where I am mostly hung up are how to parse through the list, identify the falling edge, and then capture each of the Z0..Zn elements. The oscillation frequency can be calculated using an FFT fairly easily, once I know where that falling edge is, but if I process the whole file, a lot of energy from the energizing of the system before the release can sometimes force the algorithm to kick out an ultra-low frequency that represents the near-DC offset, rather than the actual oscillation. (It's especially a problem at higher damping ratios where there might be only four or five measurable rebounds).
Has anyone got some ideas on how I can go about this? Right now, the code below uses the arbitrarily assigned values for the signal in the screenshot. However I need to have code calculate those values. Also, I haven't yet determined how to create my list of peaks for the calculation of the damping ratio h. Your help in getting some ideas together for solving this would be very welcome. Due to the fact I have such a small Stackoverflow reputation, I have included my signal in a sample screenshot at the following link:
(Boy, I hope this works!)
Tycho's sample signal -->
https://github.com/tychoaussie/Sigcal_v1/blob/066faca7c3691af3f894310ffcf3bbb72d730601/Freeperiod_Damping%20Ratio.jpg
##########################################################
import os, sys, csv
from scipy import signal
from scipy.integrate import simps
import pylab as plt
import numpy as np
import scipy as sp
#
# code goes here that imports the data from the csv file into lists.
# One of those lists is called laser, a time-history list in counts from an analog-digital-converter
# sample rate is 130.28 samples / second.
#
#
# Find the period of the observed signal
#
delta = 0.00767 # Calculated elsewhere in code, represents 130.28 samples/sec
# laser is a list with about 20,000 elements representing time-history data from a laser position sensor
# The release of the mass and system response occurs starting at sample number 2400 in this particular instance.
sense = signal.detrend(laser[2400:(2400+8192)]) # Remove the linear trend (and mean) from the signal
N = len(sense)
W = np.fft.fft(sense)
freq = np.fft.fftfreq(len(sense),delta) # First argument is the number of samples, delta is the sample interval in seconds
#
# Take the sample with the largest amplitude as our center frequency.
# This only works if the signal is heavily sinusoidal and stationary
# in nature, like our calibration data.
#
idx = np.where(abs(W)==max(np.abs(W)))[0][-1]
Frequency = abs(freq[idx]) # Frequency in Hz
period = 1/(Frequency*delta) # represents the number of samples for one cycle of the test signal.
#
# create an axis representing time.
#
dt = [] # Create an x axis that represents elapsed time in seconds. delta = seconds per sample, i represents sample count
for i in range(0,len(sense)):
    dt.append(i*delta)
#
# At this point, we know the frequency interval, the delta, and we have the arrays
# for signal and laser. We can now discriminate out the peaks of each rebound and use them to process the damping ratio of either the 'undamped' system or the damping ratio of the 'electrically damped' system.
#
print('Frequency calculated to', Frequency, 'Hz.')
Here's a somewhat unconventional idea, which I think might be fairly robust and doesn't require a lot of heuristics and guesswork. The data you have is really high quality and fits a known curve, so that helps a lot here. Here I assume the "good part" of your curve has the form:
V = a * exp(-γ * t) * cos(2 * π * f * t + φ) + V0 # [Eq1]
V: voltage
t: time
γ: damping constant
f: frequency
a: starting amplitude
φ: starting phase
V0: DC offset
Outline of the algorithm
Getting rid of the offset
Firstly, calculate the derivative numerically. Since the data quality is quite high, the noise shouldn't affect things too much.
The voltage derivative V_deriv has the same form as the original data: same frequency and damping constant, but with a different phase ψ and amplitude b,
V_deriv = b * exp(-γ * t) * cos(2 * π * f * t + ψ) # [Eq2]
The nice thing is that this will automatically get rid of your DC offset.
Alternative: This step isn't completely needed – the offset is a relatively minor complication to the curve fitting, since you can always provide a good guess for the offset by averaging the entire curve. The trade-off is that you get better signal-to-noise if you don't use the derivative.
Preliminary guesswork
Now consider the curve of the derivative (or the original curve if you skipped the last step). Start with the data point on the very far right, and then follow the curve leftward as many oscillations as you can until you reach an oscillation whose amplitude is greater than some amplitude threshold A. You want to find a section of the curve containing one oscillation with a good signal-to-noise ratio.
How you determine the amplitude threshold is difficult to say. It depends on how good your sensors are. I suggest leaving this as a parameter for tweaking later.
Curve-fitting
Now that you have captured one oscillation, you can do lots of easy things: estimate the frequency f and damping constant γ. Using these initial estimates, you can perform a nonlinear curve fit of all the data to the right of this oscillation (including the oscillation itself).
The function that you fit is [Eq2] if you're using the derivative, or [Eq1] if you use the original curve (but they are the same functions anyway, so it's rather a moot point).
Why do you need these initial estimates? For a nonlinear curve fit to a decaying wave, it's critical that you give a good initial guess for the parameters, especially the frequency and damping constant. The other parameters are comparatively less important (at least that's my experience), but here's how you can get the others just in case:
The phase is 0 if you start from a maximum, or π if you start at a minimum.
You can guess the starting amplitude too, though the fitting algorithm will typically do fine even if you just set it to 1.
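As a rough illustration of this step, here is a hedged sketch of the nonlinear fit of [Eq1] with scipy.optimize.curve_fit; t, V, f0 and gamma0 are assumed to come from the preliminary single-oscillation analysis described above:
import numpy as np
from scipy.optimize import curve_fit

def damped_cosine(t, a, gamma, f, phi, V0):
    return a * np.exp(-gamma * t) * np.cos(2 * np.pi * f * t + phi) + V0

p0 = [1.0, gamma0, f0, 0.0, np.mean(V)]      # crude amplitude/phase/offset guesses
popt, pcov = curve_fit(damped_cosine, t, V, p0=p0)
a_fit, gamma_fit, f_fit, phi_fit, V0_fit = popt
perr = np.sqrt(np.diag(pcov))                # one-sigma parameter uncertainty estimates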
Finding the edge
The curve fit should be really good at this point – you can tell because the discrepancy between your curve and the fit is very low. Now the trick is to attempt to increase the domain of the curve to the left.
If you stay within the sinusoidal region, the curve fit should remain quite good (how you judge the "goodness" requires some experimenting). As soon as you hit the flat region of the curve, however, the errors will start to increase dramatically and the parameters will start to deviate. This can be used to determine where the "good" data ends.
You don't have to do this point by point though – that's quite inefficient. A binary search should work quite well here (possibly the "one-sided" variation of it).
Summary
You don't have to follow this exact procedure, but the basic gist is that you can perform some analysis on a small part of the data starting on the very right, and then gradually increase the time domain until you reach a point where you find that the errors are getting larger rather than smaller as they ought to be.
Furthermore, you can also combine different heuristics to see if they agree with each other. If they don't, then it's probably a rather sketchy data point that requires some manual intervention.
Note that one particular advantage of the algorithm I sketched above is that you will get the results you want (damping constant and frequency) as part of the process, along with uncertainty estimates.
I neglected to mention most of the gory mathematical and algorithmic details so as to provide a general overview, but if needed I can provide more detail.

Power spectral density of a signal with gaps?

Does anyone know if it is possible to find the power spectral density of a signal with gaps in it? For example (in MATLAB syntax, because that is what I'm familiar with):
ta=1:1000;
tb=1200:3000;
t=[ta tb]; % this is the timebase
signal=randn(size(t)); % this is the signal
figure(101)
plot(t,signal,'.')
I'd like to be able to determine frequencies on a longer time base than just the individual sections of data. Obviously I could just take the PSD of the individual sections, but that would limit the lowest frequency. I could interpolate the data, but this would colour the PSD.
Any thoughts would be much appreciated.
The Lomb-Scargle periodogram algorithm is usually used to perform analysis on unevenly spaced data (sampled at arbitrary time points) or when a proportion of the data is missing.
Here's a couple of MATLAB implementations:
lombscargle.m (FEX)
Lomb (Lomb-Scargle) Periodogram (FEX)
lomb.m - ECG tools by Gari Clifford
I found this Non-Uniform FFT, but I'm not sure that it's exactly what I need, as it might really be for data that is mostly sampled on an uneven time base, rather than evenly spaced data with significant gaps. I'll give it a go!
Leaving out segments of the Fourier basis vectors gives exactly the same FT, and thus the same PSD, as using the complete basis and simply zero-filling the signal in any "gaps".
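In Python terms (a sketch of the same observation, mirroring the MATLAB example above): put the available samples on the full uniform grid, leave zeros in the gap, and take the periodogram of the long record. The zeros contribute nothing to the Fourier sums, though the gap still acts as an implicit window and causes some leakage.
import numpy as np
from scipy import signal as sgn

rng = np.random.default_rng(0)
t = np.concatenate([np.arange(0, 1000), np.arange(1200, 3000)])   # gapped timebase
x = rng.standard_normal(t.size)

full = np.zeros(3000)
full[t] = x                              # zeros fill the gap from 1000 to 1199
f, psd = sgn.periodogram(full, fs=1.0)   # PSD over the full, long time base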
