How to extract features from FFT? - python
I am gathering data from X, Y and Z accelerometer sensors sampled at 200 Hz. The 3 axes are combined into a single signal called 'XYZ_Acc'. I followed tutorials on how to transform a time-domain signal into the frequency domain using the scipy fftpack library.
The code I'm using is the below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fftpack import fft
# get a 500ms slice from dataframe
sample500ms = df.loc[pd.to_datetime('2019-12-15 11:01:31.000'):pd.to_datetime('2019-12-15 11:01:31.495')]['XYZ_Acc']
f_s = 200 # sensor sampling frequency 200 Hz
T = 0.005 # 5 milliseconds between successive observation T =1/f_s
N = 100 # 100 samples in 0.5 seconds
f_values = np.linspace(0.0, f_s/2, N//2)
fft_values = fft(sample500ms)
fft_mag_values = 2.0/N * np.abs(fft_values[0:N//2])
Then I plot the frequency vs the magnitude
fig_fft = plt.figure(figsize=(5,5))
ax = fig_fft.add_axes([0,0,1,1])
ax.plot(f_values,fft_mag_values)
Screenshot:
My difficulty now is how to extract features out of this data, such as Irregularity, Fundamental Frequency, Flux...
Can someone guide me into the right direction?
Update 06/01/2019 - adding more context to my question.
I'm relatively new in machine learning, so any feedback is appreciated. X, Y, Z are linear acceleration signals, sampled at 200 Hz from a smart phone. I'm trying to detect road anomalies by analysing spectral and temporal statistics.
Here's a sample of the csv file which is being parsed into a pandas dataframe with the timestamp as the index.
X,Y,Z,Latitude,Longitude,Speed,timestamp
0.8756,-1.3741,3.4166,35.894833,14.354166,11.38,2019-12-15 11:01:30:750
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:755
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:760
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:765
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:770
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:775
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:780
In answer to 'francis', two columns are then added via this code:
df['XYZ_Acc_Mag'] = (abs(df['X']) + abs(df['Y']) + abs(df['Z']))
df['XYZ_Acc'] = (df['X'] + df['Y'] + df['Z'])
'XYZ_Acc_Mag' is to be used to extract temporal statistics.
'XYZ_Acc' is to be used to extract spectral statistics.
Data 'XYZ_Acc_Mag' is then resampled at 0.5-second intervals, and temporal stats such as mean, standard deviation, etc. have been extracted into a new dataframe. Pair plots reveal the anomaly shown at time 11:01:35 in the line plot above.
Now back to my original question. I'm resampling data 'XYZ_Acc', also at 0.5 seconds, and obtaining the magnitude array 'fft_mag_values'. The question is how do I extract temporal features such as Irregularity, Fundamental Frequency, Flux out of it?
Since 'XYZ_Acc' is defined as a linear combination of the components of the signal, taking its DFT makes sense. It is equivalent to using a 1D accelerometer in direction (1,1,1). But a more physical, energy-related viewpoint can be adopted.
Computing the DFT is similar to writing the signal as a sum of sines. If the acceleration vector is written that way, the corresponding velocity vector can be written as the sum of the integrated sine terms, and the specific kinetic energy follows from the squared norm of that velocity.
This method requires computing the DFT of each component before computing the magnitude corresponding to each frequency.
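A sketch of the sum-of-sines picture described above. It is written under a simplifying assumption that is not in the original: each angular frequency $\omega_k$ has one vector amplitude $\mathbf{A}_k$ and one phase $\varphi_k$ shared by the three axes. The symbols are illustrative.

```latex
\begin{aligned}
\mathbf{a}(t) &= \sum_k \mathbf{A}_k \sin(\omega_k t + \varphi_k), \\
\mathbf{v}(t) &= \mathbf{v}_0 - \sum_k \frac{\mathbf{A}_k}{\omega_k} \cos(\omega_k t + \varphi_k), \\
e(t) &= \tfrac{1}{2}\,\lVert \mathbf{v}(t) \rVert^2, \\
\langle e \rangle &= \sum_k \frac{\lVert \mathbf{A}_k \rVert^2}{4\,\omega_k^2}
\qquad (\mathbf{v}_0 = 0,\ \omega_k \text{ distinct and non-zero}).
\end{aligned}
```

The last line is the long-time average: cross terms between different frequencies average out, so each frequency contributes through the squared norm of its amplitude vector. That squared norm is the sum of the squared per-axis magnitudes, which is why the DFT is taken axis by axis first.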
Another issue is that the DFT (Discrete Fourier Transform) implicitly treats the frame as one period of a periodic signal, built by repeating the frame. The actual frame is almost never an exact period of a periodic signal, so repeating it creates artificial discontinuities where the end of one copy meets the beginning of the next. The effect of these strong discontinuities in the spectral domain, called spectral leakage, can be reduced by windowing the frame. Computing the real-to-complex DFT of the windowed frame results in a power distribution featuring peaks at particular frequencies.
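To make the leakage argument concrete, here is a minimal sketch comparing a rectangular window with a Hann window on one 0.5 s frame. The 31 Hz test tone, the 10 Hz "far from the tone" threshold and the helper name `power_spectrum` are assumptions chosen for illustration.

```python
import numpy as np

fs = 200.0          # sampling rate in Hz, as in the question
n = 100             # one 0.5 s frame
t = np.arange(n) / fs
# 31 Hz does not fall on a DFT bin (bins are 2 Hz apart), so the frame
# is not a whole number of periods and leakage appears.
x = np.sin(2 * np.pi * 31.0 * t)


def power_spectrum(frame, window):
    """One-sided power spectrum of a windowed frame, normalised by the window sum."""
    spec = np.fft.rfft(frame * window)
    return (np.abs(spec) / window.sum()) ** 2


freqs = np.fft.rfftfreq(n, d=1.0 / fs)
p_rect = power_spectrum(x, np.ones(n))
p_hann = power_spectrum(x, np.hanning(n))

# Fraction of the power that lands more than 10 Hz away from the tone.
far = np.abs(freqs - 31.0) > 10.0
leak_rect = p_rect[far].sum() / p_rect.sum()
leak_hann = p_hann[far].sum() / p_hann.sum()
print(f"leakage, rectangular window: {leak_rect:.2e}")
print(f"leakage, Hann window:        {leak_hann:.2e}")
```

The Hann window widens the main peak slightly but pushes far less power into distant bins, which is the trade-off windowing makes.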
In addition the frequency of a given peak is better estimated as the mean frequency with respect to power density, as shown in Why are frequency values rounded in signal using FFT?
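A small sketch of that power-weighted mean frequency, assuming a Hann-windowed frame and a neighbourhood of two bins on each side of the peak. The function name `refine_peak_frequency` and the 30.3 Hz test tone are hypothetical.

```python
import numpy as np


def refine_peak_frequency(power, freqs, peak_index, half_width=2):
    """Power-weighted mean frequency over the bins around a peak."""
    lo = max(peak_index - half_width, 0)
    hi = min(peak_index + half_width + 1, len(power))
    weights = power[lo:hi]
    return float(np.sum(weights * freqs[lo:hi]) / np.sum(weights))


fs = 200.0
n = 100
t = np.arange(n) / fs
true_freq = 30.3                      # deliberately between two bins
frame = np.sin(2 * np.pi * true_freq * t) * np.hanning(n)

power = np.abs(np.fft.rfft(frame)) ** 2
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
k = int(np.argmax(power))
estimate = refine_peak_frequency(power, freqs, k)
print("nearest bin:", freqs[k], "Hz, refined estimate:", estimate, "Hz")
```

The raw bin frequency can only be a multiple of 2 Hz here, while the weighted mean can land between bins.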
Another tool to estimate fundamental frequencies is to compute the autocorrelation of the signal: it is higher near the periods of the signal. Since the signal is a vector of 3 components, an autocorrelation matrix can be built. It is a 3x3 Hermitian matrix for each time and therefore features real eigenvalues. The maxima of the largest eigenvalue can be pictured as the magnitude of vibrations, while the corresponding eigenvector is a complex direction, somewhat similar to the direction of vibrations combined with angular offsets. The angular offset may signal an ellipsoidal vibration.
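As a simplified illustration of the autocorrelation idea, the sketch below handles a single axis only; it is not the 3x3 matrix version described above. The function name, the zero-padding and the 0.3 relative height threshold are assumptions.

```python
import numpy as np
import scipy.signal


def fundamental_from_autocorr(x, fs, min_rel_height=0.3):
    """Estimate a fundamental frequency from the first strong autocorrelation peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to 2n so the FFT-based autocorrelation is linear, not circular.
    spec = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    acf = acf / acf[0]                      # lag 0 normalised to 1
    peaks, _ = scipy.signal.find_peaks(acf[: n // 2], height=min_rel_height)
    if len(peaks) == 0:
        return None
    return fs / peaks[0]                    # lag in samples -> frequency in Hz


fs = 200.0
t = np.arange(100) / fs
f0 = fundamental_from_autocorr(np.sin(2 * np.pi * 20.0 * t), fs)
print("estimated fundamental:", f0, "Hz")
```

The resolution is limited to whole-sample lags, so a weighted average around the peak lag would be a natural refinement.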
Here is a fake signal, built by adding Gaussian noise and sine waves:
Here is the power density spectrum for a given frame overlapping one of the sine waves:
Here are the resulting eigenvalues of the autocorrelation of the same frame, where the period of the 50Hz sine wave is visible. Vertical scaling is wrong:
Here goes a sample code:
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal

n=2000
t=np.linspace(0.,n/200,num=n,endpoint=False)

# an artificial signal, just for tests
ax=0.3*np.random.normal(0,1.,n)
ay=0.3*np.random.normal(0,1.,n)
az=0.3*np.random.normal(0,1.,n)
ay[633:733]=ay[633:733]+np.sin(2*np.pi*30*t[633:733])
az[433:533]=az[433:533]+np.sin(2*np.pi*50*t[433:533])
#ax=np.sin(2*np.pi*10*t)
#ay=np.sin(2*np.pi*30*t)
#az=np.sin(2*np.pi*50*t)

plt.plot(t,ax, label='x')
plt.plot(t,ay, label='y')
plt.plot(t,az, label='z')
plt.xlabel('t, s')
plt.ylabel('acc, m.s^-2')
plt.legend()
plt.show()

# splitting the signal into frames of 0.5 s (100 samples at 200 Hz)
noiseheight=0.
for i in range(2*(n//200)):
    print('frame', i, ' time ', i*0.5, ' s')
    framea=np.zeros((100,3))
    framea[:,0]=ax[i*100:i*100+100]
    framea[:,1]=ay[i*100:i*100+100]
    framea[:,2]=az[i*100:i*100+100]
    # for that frame, apply window. Factor 2 so that average remains 1.
    window = np.hanning(100)
    framea[:,0]=framea[:,0]*window*2
    framea[:,1]=framea[:,1]*window*2
    framea[:,2]=framea[:,2]*window*2
    # DFT transform.
    hatacc=np.fft.rfft(framea,axis=0, norm=None)
    # scaling by length of frame.
    hatacc=hatacc/100.
    # computing the magnitude: all non-zero frequencies are doubled to merge energy in bin N-k exp(-2ik/n) to bin k
    accmag=2*(np.abs(hatacc[:,0])*np.abs(hatacc[:,0])+np.abs(hatacc[:,1])*np.abs(hatacc[:,1])+np.abs(hatacc[:,2])*np.abs(hatacc[:,2]))
    accmag[0]=accmag[0]*0.5
    # first frame says something about noise
    if i==0:
        noiseheight=2.*np.max(accmag)
    if np.max(accmag)>noiseheight:
        peaks, peaksdat=scipy.signal.find_peaks(accmag, height=noiseheight)

        timestep=0.005
        # rfftfreq matches the length of the rfft output (51 bins, 0 to 100 Hz)
        freq= np.fft.rfftfreq(100, d=timestep)
        #see https://stackoverflow.com/questions/54714169/why-are-frequency-values-rounded-in-signal-using-fft/54775867#54775867
        # frequencies of peaks are better estimated as mean frequency of peak, with respect to power density
        for ind in peaks:
            # up to 5 bins centred on the peak, clipped to the array bounds
            lo, hi = max(ind-2, 0), min(ind+3, len(accmag))
            totalweight=np.sum(accmag[lo:hi])
            totalweightedfreq=np.sum(accmag[lo:hi]*freq[lo:hi])
            print('found peak at frequency', totalweightedfreq/totalweight, ' of height', accmag[ind])

        # plotting
        plt.plot(freq[0:50],accmag[0:50], label='||acc||^2')
        plt.xlabel('frequency, Hz')
        plt.ylabel('||acc||^2, m^2.s^-4')
        plt.legend()
        plt.show()

        # another approach to find fundamental frequencies: computing the autocorrelation of the windowed signal and searching for maximums.
        # building the autocorrelation matrix
        autocorr=np.zeros((100,3,3), dtype=complex)
        acxfft=np.fft.fft(framea[:,0],axis=0, norm=None)
        acyfft=np.fft.fft(framea[:,1],axis=0, norm=None)
        aczfft=np.fft.fft(framea[:,2],axis=0, norm=None)
        # drop the mean of each component
        acxfft[0]=0.
        acyfft[0]=0.
        aczfft[0]=0.
        autocorr[:,0,0]=np.fft.ifft(acxfft*np.conj(acxfft),axis=0, norm=None)
        autocorr[:,0,1]=np.fft.ifft(acxfft*np.conj(acyfft),axis=0, norm=None)
        autocorr[:,0,2]=np.fft.ifft(acxfft*np.conj(aczfft),axis=0, norm=None)
        autocorr[:,1,0]=np.fft.ifft(acyfft*np.conj(acxfft),axis=0, norm=None)
        autocorr[:,1,1]=np.fft.ifft(acyfft*np.conj(acyfft),axis=0, norm=None)
        autocorr[:,1,2]=np.fft.ifft(acyfft*np.conj(aczfft),axis=0, norm=None)
        autocorr[:,2,0]=np.fft.ifft(aczfft*np.conj(acxfft),axis=0, norm=None)
        autocorr[:,2,1]=np.fft.ifft(aczfft*np.conj(acyfft),axis=0, norm=None)
        autocorr[:,2,2]=np.fft.ifft(aczfft*np.conj(aczfft),axis=0, norm=None)
        # at a given time, the 3x3 matrix autocorr is Hermitian.
        # Its eigenvalues are real, its unitary eigenvectors signal directions of vibrations and phase between components.
        autocorreigval=np.zeros((100,3))
        autocorreigvec=np.zeros((100,3,3), dtype=complex)
        for j in range(100):
            autocorreigval[j,:], autocorreigvec[j,:,:]=np.linalg.eigh(autocorr[j,:,:],UPLO='L')

        # peaks of the largest eigenvalue, at least 30% of its value at lag 0
        peaks, peaksdat=scipy.signal.find_peaks(autocorreigval[:50,2], 0.3*autocorreigval[0,2])
        cleared=np.zeros(len(peaks))
        peakperiod=np.zeros(len(peaks))
        for j in range(len(peaks)):
            totalweight=autocorreigval[peaks[j]-1,2]+autocorreigval[peaks[j],2]+autocorreigval[peaks[j]+1,2]
            totalweightedperiod=0.005*(autocorreigval[peaks[j]-1,2]*(peaks[j]-1)+autocorreigval[peaks[j],2]*(peaks[j])+autocorreigval[peaks[j]+1,2]*(peaks[j]+1))
            peakperiod[j]=totalweightedperiod/totalweight
        #cleared[0]=1.
        fundfreq=1
        for j in range(len(peaks)):
            if cleared[j]==0:
                print("found fundamental frequency :", 1.0/(peakperiod[j]), 'eigenvalue', autocorreigval[peaks[j],2], ' dir vibration ', autocorreigvec[peaks[j],:,2])
                # mark peaks whose period is close to a multiple of this one
                for k in range(j,len(peaks),1):
                    mm=np.floor_divide(peakperiod[k],peakperiod[j])
                    if ( np.abs(peakperiod[k]-peakperiod[j]*mm)< 0.2*peakperiod[j] or np.abs(peakperiod[k]-(peakperiod[j])*(mm+1))< 0.2*peakperiod[j]) :
                        cleared[k]=fundfreq
                    #else :
                    #    print(k,j,mm)
                    #    print(peakperiod[k], peakperiod[j]*mm, peakperiod[j]*(mm+1) , peakperiod[j])
                fundfreq=fundfreq+1

        plt.plot(t[i*100:i*100+100],autocorreigval[:,2], label='autocorrelation, large eigenvalue')
        plt.plot(t[i*100:i*100+100],autocorreigval[:,1], label='autocorrelation, medium eigenvalue')
        plt.plot(t[i*100:i*100+100],autocorreigval[:,0], label='autocorrelation, small eigenvalue')
        plt.xlabel('t, s')
        plt.ylabel('acc^2, m^2.s^-4')
        plt.legend()
        plt.show()
The output is:
frame 0 time 0.0 s
frame 1 time 0.5 s
frame 2 time 1.0 s
frame 3 time 1.5 s
frame 4 time 2.0 s
found peak at frequency 50.11249238149811 of height 0.2437842149351196
found fundamental frequency : 50.31467771196368 eigenvalue 47.03344783764712 dir vibration [-0.11441502+0.00000000e+00j 0.0216911 +2.98101624e-18j
-0.9931962 -5.95276353e-17j]
frame 5 time 2.5 s
frame 6 time 3.0 s
found peak at frequency 30.027895460975156 of height 0.3252387031089667
found fundamental frequency : 29.60690406120401 eigenvalue 61.51059682797539 dir vibration [ 0.11384195+0.00000000e+00j -0.98335779-4.34688198e-17j
-0.14158908+3.87566125e-18j]
frame 7 time 3.5 s
found peak at frequency 26.39622018109896 of height 0.042081187689137545
found fundamental frequency : 67.65844834016518 eigenvalue 6.875616417422696 dir vibration [0.8102307 +0.00000000e+00j 0.32697001-8.83058693e-18j
0.48643275-4.76094302e-17j]
frame 8 time 4.0 s
frame 9 time 4.5 s
Frequencies 50Hz and 30Hz got caught as 50.11/50.31Hz and 30.02/29.60Hz and directions are quite accurate as well. The last feature at 26.39Hz/67.65Hz is likely garbage, as it features different frequencies for the two methods and lower magnitude/eigenvalue.
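The question also asks about irregularity and flux. The sketch below shows how such per-frame features could be computed from a magnitude spectrum. The definitions are assumptions, since several variants exist: irregularity is taken here as the sum of squared differences between neighbouring bins divided by the spectral energy, flux as the squared distance between the energy-normalised spectra of two consecutive frames, and the centroid as the magnitude-weighted mean frequency. The function name and test frames are hypothetical.

```python
import numpy as np


def spectral_features(mag, freqs, prev_mag=None):
    """Return (centroid, irregularity, flux) for one magnitude spectrum."""
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = mag.sum()
    energy = np.sum(mag ** 2)
    centroid = float(np.sum(freqs * mag) / total) if total > 0 else 0.0
    irregularity = float(np.sum(np.diff(mag) ** 2) / energy) if energy > 0 else 0.0
    flux = 0.0
    if prev_mag is not None:
        prev_mag = np.asarray(prev_mag, dtype=float)
        prev_energy = np.sum(prev_mag ** 2)
        if energy > 0 and prev_energy > 0:
            diff = mag / np.sqrt(energy) - prev_mag / np.sqrt(prev_energy)
            flux = float(np.sum(diff ** 2))
    return centroid, irregularity, flux


# Two synthetic 0.5 s frames at 200 Hz: a 30 Hz tone, then a 50 Hz tone.
fs, n = 200.0, 100
t = np.arange(n) / fs
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
window = np.hanning(n)
mag_a = np.abs(np.fft.rfft(np.sin(2 * np.pi * 30.0 * t) * window))
mag_b = np.abs(np.fft.rfft(np.sin(2 * np.pi * 50.0 * t) * window))

features_a = spectral_features(mag_a, freqs)
features_b = spectral_features(mag_b, freqs, prev_mag=mag_a)
print("frame a (centroid, irregularity, flux):", features_a)
print("frame b (centroid, irregularity, flux):", features_b)
```

With normalised spectra the flux lies between 0 (identical shape) and 2 (no overlap), so a jump from one tone to another gives a value close to 2.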
Regarding monitoring of road surfaces to improve maintenance, I know of a project at my company, called Aigle3D. A laser fitted at the back of a van scans the road at highway speed with millimetric accuracy. The van is also fitted with a server, cameras and other sensors, thus providing a huge amount of data on road geometry and defects, presently covering hundreds of km of the French national road network. Detecting and repairing small early defects and cracks may extend the life expectancy of the road at limited cost. If useful, data from accelerometers of daily users could indeed complement the monitoring system, allowing a faster reaction whenever a large pothole appears.
Related
Interpolating measured sine wave using python
I have 2 sampled sine waves obtained as a measurement from a DSO. The sampling rate of the DSO is 160 GSa/s and my signal is 60 GHz. I need to find the phase difference between the two sine waves. Both are the same frequency. However, the sampling rate is not enough to accurately determine the phase. Is there any way to interpolate the measured signal to get a better sine wave and then calculate the phase difference?
You may fit to sine functions, but for the phase difference (delta phi = 2 pi * frequency * delta t) it would be sufficient to detect and compare the zero-crossings (with respect to a possible constant offset), which may be found from a segment of your series by an interpolation like
import numpy as np
from scipy import interpolate
w = 6.38                          # some radian frequency
t = np.linspace(0, 0.5)           # time interval containing ONE zero-crossing
delta_phi = 0.1                   # some phase difference
x = np.sin(w*t - delta_phi)       # x(t)
f = interpolate.interp1d(x, t)    # interpolate t(x), default is linear
delta_t = f(0)                    # zero-crossing time referred to t=0
delta_phi_detected = w*delta_t
You need to relate two adjacent zero-crossings of your signals. Alternatively, you may obtain an average value by multiplying both signals and integrating numerically over a time T, which converges to (T/2)cos(delta_phi) if both signals have (or are made to have) zero mean value.
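The multiplication approach can be sketched as follows, assuming two equal-frequency sines observed over a whole number of periods. Dividing by the RMS values removes the amplitudes, so the mean of the product gives cos(delta_phi) directly. The function name and the 1600-sample record are assumptions, and the method returns only the magnitude of the phase difference, not its sign.

```python
import numpy as np


def phase_difference_by_product(x, y):
    """Magnitude of the phase difference between two same-frequency sines."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()                   # enforce zero mean, as the method requires
    y = y - y.mean()
    cos_dphi = np.mean(x * y) / np.sqrt(np.mean(x ** 2) * np.mean(y ** 2))
    return float(np.arccos(np.clip(cos_dphi, -1.0, 1.0)))


fs = 160e9                 # sampling rate from the question, 160 GSa/s
f0 = 60e9                  # signal frequency, 60 GHz
n = 1600                   # 1600 samples cover exactly 600 periods
t = np.arange(n) / fs
true_dphi = 0.1
x = np.sin(2 * np.pi * f0 * t)
y = np.sin(2 * np.pi * f0 * t - true_dphi)

estimate = phase_difference_by_product(x, y)
print("estimated |delta_phi|:", estimate, "rad")
```

No interpolation is needed here because the average uses every sample; with a record that is not a whole number of periods, the estimate carries a small bias that shrinks as the record gets longer.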
Amplitude of rfft for real-valued array
I'm calculating the RFFT of a signal of length 3000 (sampled at 100 Hz) with only real-valued entries:
from scipy.fft import rfft
coeffs = rfft(values)
coeffs = np.abs(coeffs)
With rfft I'm only getting half of the coefficients, i.e. the symmetric ones are discarded (due to the real-valued input). Is it correct to scale the values by coeffs = (2 / len(values)) * coeffs to get the amplitudes?
Edit: Below I have appended a plot of the amplitudes vs. frequency (bins) for accelerometer and gyroscope (shaded area is standard deviation). For the accelerometer the energy in the first FFT bin is much higher than in the other bins (> 2 in the first bin and around < 0.4 in the other bins). For the gyroscope it is different and the energy is much more distributed. Does that mean that for the accelerometer the FFT looks good but for the gyroscope it is worse? Further, is it reasonable to cut the FFT at 100 Hz (i.e. take only bins < 100 Hz) or take the first few bins until 95% of the energy is kept?
The approximate relationship I provided in this post holds whether you throw out half the coefficients or not. So, if the conditions indicated in that post apply to your situation, then you could get an estimate of the amplitude of a dominant sinusoidal component with approx_sinusoidal_amplitude = (2 / len(values)) * np.abs(coeffs[k]) for some index k corresponding to the frequency of the sinusoidal component (which according to the limitations indicated in my other post has to be at or near a multiple of 100/3000 ~ 0.033Hz in your case). For a dominant sinusoidal component, this index would typically correspond to a local peak in the frequency spectrum. Note however that if your signal is a mixture of various frequency components, the individual components may affect the frequency spectrum in such a way that the peak does not appear clearly.
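A quick check of that scaling on a synthetic signal, assuming a 5 Hz sine (an exact multiple of 100/3000 Hz) with amplitude 1.5 plus a constant offset of 0.2. All values are illustrative.

```python
import numpy as np

fs = 100.0
n = 3000
t = np.arange(n) / fs
amplitude = 1.5
f0 = 5.0                                   # exact multiple of fs / n
values = amplitude * np.sin(2 * np.pi * f0 * t) + 0.2

coeffs = np.abs(np.fft.rfft(values))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

k = int(np.argmax(coeffs[1:])) + 1         # strongest bin, skipping the DC bin
approx_sinusoidal_amplitude = (2 / len(values)) * coeffs[k]
dc_level = coeffs[0] / len(values)         # the DC bin is not doubled

print("peak at", freqs[k], "Hz, amplitude", approx_sinusoidal_amplitude)
print("constant offset:", dc_level)
```

The factor 2 accounts for the discarded negative-frequency half, which is why it does not apply to bin 0.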
I have an array of samples at equal intervals of 25 ms. I want to determine the frequency spectrum for the fundamental
From say N=1000 voltage samples at 1 ms sample rate. I need to find precisely with python/numpy the amplitude and angle of the fundamental, which is between 45 and 55 Hz as well as any side bands, that may exist. Do I need a phase lock loop to do that, or can it be done without ?
Your measurement frequency is fundamentally too low and should be more than double your expected event frequency! measured: 0.025s event range: 0.0182-0.0222s more here: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
A phase lock loop would be a reasonable approach to estimate the frequency etc. of the fundamental. Supposing you have collected the N samples up front, another way to do the analysis is:
- Apply a window, like np.hanning(N), by multiplying it pointwise with the samples.
- Compute the spectrum with np.fft.rfft. For a sampling interval of Ts seconds, the nth element of the resulting array is the DFT coefficient for frequency n / (N * Ts) in units of Hz (or for the values N=1000, Ts=0.001, simply n Hz).
- Find the bin with the peak magnitude between 45 and 55 Hz. The location of the peak gives the frequency of the fundamental. You can interpolate a polynomial (np.polyfit) across a few neighboring bins and find its peak to get a more precise estimate.
- Magnitude and complex phase of the peak give the amplitude and phase (angle) of the fundamental.
- Plot the magnitude of the spectrum to look for sidebands.
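The steps above can be sketched as follows. The function name, the three-bin parabola and the scaling by the window sum are assumptions for illustration. The phase returned is the raw angle of the peak bin, which is only approximate when the tone falls between bins.

```python
import numpy as np


def estimate_fundamental(samples, ts, f_lo=45.0, f_hi=55.0):
    """Estimate frequency, amplitude and phase of the strongest tone in [f_lo, f_hi]."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    window = np.hanning(n)
    spectrum = np.fft.rfft(samples * window)
    freqs = np.fft.rfftfreq(n, d=ts)        # bin k sits at k / (n * ts) Hz
    # Scale so that a sine of amplitude A on a bin centre has magnitude A.
    mag = 2.0 * np.abs(spectrum) / window.sum()

    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    k = band[np.argmax(mag[band])]

    # Parabola through the peak bin and its two neighbours.
    a2, a1, a0 = np.polyfit(np.array([-1.0, 0.0, 1.0]), mag[k - 1:k + 2], 2)
    shift = -a1 / (2.0 * a2)                # vertex position, in bins
    freq = freqs[k] + shift * (freqs[1] - freqs[0])
    amplitude = a0 - a1 ** 2 / (4.0 * a2)   # parabola value at the vertex
    phase = float(np.angle(spectrum[k]))
    return float(freq), float(amplitude), phase


ts = 0.001
n = 1000
t = np.arange(n) * ts
x = 3.0 * np.sin(2 * np.pi * 50.3 * t + 0.4)
freq, amplitude, phase = estimate_fundamental(x, ts)
print("frequency:", freq, "Hz, amplitude:", amplitude)
```

With 1 Hz bins the raw peak would be reported at 50 Hz; the parabola recovers most of the 0.3 Hz offset.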
Amplitude from scipy.fft
Why is the amplitude I compute far, far away from the original after a fast Fourier transform (FFT)? I have a signal with 1024 points and sampling frequency of 1/120000. I apply the fast Fourier transform in Python with scipy.fftpack. I normalize the calculated magnitude by the number of bins and multiply by 2, as I plot only positive values. As my initial signal amplitude is around 64 dB, I get very low amplitude values, less than 1. Please see my code.
Signal = well.ReadWellData(SignalNDB)
y, x = Signal.GetData(numpy=np)
N = y.size # Number of sample points 1024 ...
T = 1/120000 # sampling frequency (sec)
x = np.linspace(0.0, N*T, N)
yf = abs(fft(y)) # Perform fft returning Magnitude
xf = np.linspace(0.0, 1.0/(2.0*T), N//2) # Calculate frequency bins
freqs = fftfreq(N, T)
ax1=plt.subplot(211)
ax1.plot(x,y)
plt.grid()
ax2=plt.subplot(212)
yf2 = 2/N * np.abs(yf[0:N//2]); # Normalize Magnitude by number of bins and multiply by 2
ax2.semilogy(xf, yf2) # freq vs ampl - positive only freq
plt.grid()
ax1.set_title(["check"])
#ax2.set_xlim([0,4000])
plt.show()
Please see my plot:
EDIT: Finally my signal amplitude after the FFT is exactly what I expected. What I did: first I did the FFT for the signal in mV. Then I converted the results to dB as per the formula 20*log10(mV)+60, where 60 represents 1 mV, as provided by the tool manufacturer. Therefore the dB values are presented on a linear scale (the bottom plot) rather than on a log scale. Please see the resulting plot below. Results
Looks good to me. The FFT, or the Fourier transform in general, gives you the representation of your time-domain signal in the frequencies domain. By taking a look at your signal, you have two main components : something oscillating at around 500Hz (period of 0.002s) and an offset (which corresponds to freq = 0Hz). Looking at the result of the FFT, we can see mainly two peaks : one at 0Hz and the other one could be at 500Hz (difficult to be sure without zooming on the signal). The only relation between the intensities is defined by the Parseval's theorem, but having a signal oscillating around 64dB doesn't mean its FFT should have values close to 64dB. I suggest you take a look here.
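Parseval's theorem can be checked numerically. The sketch below assumes a synthetic signal (an offset plus a 500 Hz sine, sampled every 1/120000 s) rather than the data from the question, and uses NumPy's unnormalised FFT convention.

```python
import numpy as np

n = 1024
T = 1.0 / 120000.0                      # sampling interval in seconds
t = np.arange(n) * T
y = 2.0 + np.sin(2 * np.pi * 500.0 * t)   # constant offset plus a 500 Hz tone

Y = np.fft.fft(y)

# Parseval: energy in the time domain equals energy in the frequency domain,
# with a 1/n factor for the unnormalised forward transform.
energy_time = np.sum(np.abs(y) ** 2)
energy_freq = np.sum(np.abs(Y) ** 2) / n
print(energy_time, energy_freq)

# The 0 Hz bin divided by n recovers the mean of the signal.
offset = np.abs(Y[0]) / n
print("offset:", offset, "mean of y:", y.mean())
```

The theorem relates total energies only; it does not say that individual FFT values match the signal level, which is the point made above.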
How do I match lomb-scargle and FFT plots of same dataset?
I am doing some work comparing the interpolated FFT of the concentrations of some gases over a period, which is unevenly sampled, with the Lomb-Scargle periodogram of the same data. I am using scipy's fft function to calculate the Fourier transform and then squaring the modulus of this to give what I believe to be the power spectral density, in units of parts per billion (ppb) squared. I can get the Lomb-Scargle plot to match almost the exact pattern of the FFT but never the same scale of magnitude: the FFT power spectral density is always higher, even though I thought the Lomb-Scargle power was a power spectral density. Now the Lomb code I am using, http://www.astropython.org/snippet/2010/9/Fast-Lomb-Scargle-algorithm, normalises the dataset by taking away the average and dividing by 2 times the variance of the data, therefore I have normalised the FFT data in the same manner, but still the magnitudes do not match. I did some more research and found that the normalised Lomb-Scargle power could be unitless, and therefore I cannot make the plots match. This leads me to 2 questions:
What units (if any) is the power spectral density of a normalised Lomb-Scargle periodogram in?
How would I proceed to match my FFT plot with my Lomb-Scargle plot, in terms of magnitude and pattern?
Thank you.
The squared modulus of the Fourier transform of a series is defined as the energy spectral density (ESD). You need to divide the ESD by the length of the series to convert to an estimate of power spectral density (PSD).
Units
The units of a PSD are [units]**2/[frequency] where [units] represents the units of your original series.
Normalization
To check for proper normalization, one can numerically integrate the PSD of a white noise (with known variance). If the integrated spectrum equals the variance of the series, the normalization is correct. A factor of 2 (too low) is not incorrect, though, and may indicate the PSD is normalized to be double-sided; in that case, just multiply by 2 and you have a properly normalized, single-sided PSD. Using numpy, the randn function generates pseudo-random numbers that are Gaussian distributed. For example, 10 * np.random.randn(1, 100) produces a 1-by-100 array with mean=0 and variance=100. If the sampling frequency is, say, 1 Hz, the single-sided PSD will theoretically be flat at 200 units**2/Hz over [0, 0.5] Hz; the integrated spectrum would thus be 100, equaling the variance of the series.
Update
I modified the example included in the python code you linked to demonstrate the normalization for a normally distributed series of length 20, with variance 1, and sampling frequency 10:
import numpy
import lomb
numpy.random.seed(999)
nd = 20
fs = 10
x = numpy.arange(nd)
y = numpy.random.randn(nd)
fx, fy, nout, jmax, prob = lomb.fasper(x, y, 1., fs)
fNy = fx[-1]
fy = fy/fs
Si = numpy.mean(fy)*fNy
print(fNy, Si, Si*2)
This gives, for me:
5.26315789474 0.482185882163 0.964371764327
which shows you a few things:
The "Nyquist" frequency asked for is actually the sampling frequency.
The result needs to be divided by the sampling frequency.
The output is normalized for a double-sided PSD, so multiplying by 2 makes the integrated spectrum nearly 1.
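The white-noise check described under Normalization can be sketched with scipy.signal.periodogram, used here as a stand-in for the FFT-based estimate. The seed and series length are arbitrary assumptions.

```python
import numpy as np
import scipy.signal

rng = np.random.default_rng(999)
fs = 1.0                                  # sampling frequency in Hz
n = 4096
x = 10.0 * rng.standard_normal(n)         # white noise, variance close to 100

# One-sided PSD estimate in units**2/Hz; the mean is removed before the FFT.
freqs, psd = scipy.signal.periodogram(
    x, fs=fs, detrend='constant', scaling='density', return_onesided=True
)

df = freqs[1] - freqs[0]
integrated = np.sum(psd) * df             # numerical integral over [0, fs/2]
level = np.mean(psd[1:])                  # average PSD level, excluding 0 Hz

print("variance of the series:", np.var(x))
print("integrated spectrum:   ", integrated)
print("mean single-sided PSD: ", level)
```

The integrated spectrum matches the variance, and the average level sits near 200 units**2/Hz, twice the double-sided value of 100.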
In the time since this question was asked and answered, the AstroPy project has gained a Lomb-Scargle method, and this question is addressed in the documentation: http://docs.astropy.org/en/stable/stats/lombscargle.html#psd-normalization-unnormalized
In brief, you can compute a Fourier periodogram and compare it to the astropy Lomb-Scargle periodogram as follows
import numpy as np
# newer AstroPy releases expose this class as astropy.timeseries.LombScargle
from astropy.stats import LombScargle

def fourier_periodogram(t, y):
    N = len(t)
    frequency = np.fft.fftfreq(N, t[1] - t[0])
    y_fft = np.fft.fft(y)
    positive = (frequency > 0)
    return frequency[positive], (1. / N) * abs(y_fft[positive]) ** 2

t = np.arange(100)
y = np.random.randn(100)
frequency, PSD_fourier = fourier_periodogram(t, y)
PSD_LS = LombScargle(t, y).power(frequency, normalization='psd')
np.allclose(PSD_fourier, PSD_LS)
# True
Since AstroPy is a common tool used in astronomy, I thought this might be more useful than an answer based on the code snippet mentioned above.