Interpolating measured sine wave using python

Interpolating measured sine wave using python - python

I have 2 sampled sine waves obtained as a measurement from a DSO. The sampling rate of the DSO is 160 GSa/s and my signal is 60 GHz. I need to find the phase difference between the two sine waves. Both are the same frequency. However, the sampling rate is not enough to accurately determine the phase. Is there any way to interpolate the measured signal to get a better sine wave and then calculate the phase difference?

You may fit to sine functions, but for the phase difference (delta phi=2pi frequency delta t) it would be sufficient to detect and compare the zero-crossings (respective a possible constant offset), which may be found from a segment of your series by an interpolation like
w=6.38 # some radian frequency
t = np.linspace(0, 0.5) # time interval containing ONE zero-crossing
delta_phi=0.1 # some phase difference
x = np.sin(w*t-delta_phi) # x(t)
f = interpolate.interp1d(x, t) # interpolate t(x), default is linear
delta_t = f(0) # zero-crossing time referred to t=0
delta_phi_detected= w*delta_t
You need to relate two adjacent zero-crossings of your signals.
Alternatively, you may obtain an average value by multiplication of both signals and numerical integration over time T which converges to (T/2)cos(delta_phi), if both signals have (or are made to) zero mean value.

Related

Parseval's Theorem with Numpy FFT is not fulfilled

I am trying to determine the total energy recorded by a detector in time domain by means of it's spectrum.
The first step after performing the Fast Fourier Transformation with Numpy's FFT library was to confirm Parseval's theorem.
According to the theorem, the total energy in time domain and in frequency domain must be the same. I have two problems that I am not able to solve.
I can confirm the theorem when I don't use the proper units for the x-Axis during the np.trapz() integration. As soon as I use my the actual sample points/frequencies, the result is off. I do not understand why this is the case and am wondering if I can apply a normalization to solve this error.
I cannot confirm the theorem when I apply a DC offset to the signal (uncomment the f = np.sin(np.pi**t)* line).
Below is my code with an examplatory Sine function.
# Python code
import matplotlib.pyplot as plt
import numpy as np
# Create a Sine function
dt = 0.001 # Time steps
t = np.arange(0,10,dt) # Time array
f = np.sin(np.pi*t) # Sine function
# f = np.sin(np.pi*t)+1 # Sine function with DC offset
N = len(t) # Number of samples
# Energy of function in time domain
energy_t = np.trapz(abs(f)**2)
# Energy of function in frequency domain
FFT = np.sqrt(2) * np.fft.rfft(f) # only positive frequencies; correct magnitude due to discarding of negative frequencies
FFT[0] /= np.sqrt(2) # DC magnitude does not have to be corrected
FFT[-1] /= np.sqrt(2) # Nyquist frequency does not have to be corrected
frq = np.fft.rfftfreq(N,d=dt) # FFT frequenices
# Energy of function in frequency domain
energy_f = np.trapz(abs(FFT)**2) / N
print('Parsevals theorem fulfilled: ' + str(energy_t - energy_f))
# Parsevals theorem with proper sample points
energy_t = np.trapz(abs(f)**2, x=t)
energy_f = np.trapz(abs(FFT)**2, x=frq) / N
print('Parsevals theorem NOT fulfilled: ' + str(energy_t - energy_f))

The FFT computes the Discrete Fourier Transform (DFT), which is not the same as the (continuous-domain) Fourier Transform.
For the DFT, Parseval’s theorem states that the sum of the square magnitude of the discrete signal equals the sum of the square magnitude of the DFT of the signal. There is no integration involved, and therefore you should not use trapz. Just use sum.
Note that a discrete signal is a set of samples x[n] at n=0..N-1. Fourier analysis in the discrete domain, and all related operations, only consider n, not t. The sampling frequency and the actual times those samples were recorded is irrelevant in these analyses. Likewise, the DFT produces a set of samples X[k] at k=0..N-1, not at any specific f or ω related to any sampling frequency.
Now it is possible to relate n to t because we know the sampling frequency, and it is possible to relate k to f because we know the sampling frequency. But these conversions should not make us think that X[k] is a sampling of the continuous-domain Fourier transform of the original continuous-domain signal. And they should especially not make us think that we can interpolate X[k].
Reconstructing the samples x[n] is accomplished by adding N sinusoids with parameters given by X[k]. “In between” those DFT components should not be anything. Interpolating them would mean we add sinusoids that do not exist in the samples x[n].
trapz uses linear interpolation to obtain an estimate of the integral, and therefore is inappropriate to use in discrete Fourier analysis.

Amplitude of rfft for real-valued array

I'm calculating the RFFT of a signal of length 3000 (sampled at 100 Hz) with only real valued entries:
from scipy.fft import rfft
coeffs = rfft(values)
coeffs = np.abs(coeffs)
With rfft I'm only getting half of the coefficients, i.e. the symmetric ones are dicarded (due to real valued input).
Is it correct to scale the values by coeffs = (2 / len(values)) * coeffs to get the amplitudes?
Edit: Below I have appended a plot of the amplitudes vs. Frequency (bins) for accelerometer and gyroscope (shaded area is standard deviation). For accelerometer the energy in the first FFT bin is much higher than in the other bins (> 2 in the first bin and around < 0.4 in the other bins). For gyroscope it is different and the energy is much more distributed.
Does that mean that for acccelerometer the FFT looks good but for gyroscope it is worse? Further, is it reasonable to cut the FFT at 100 Hz (i.e. take only bins < 100 Hz) or take the first few bins until 95% of the energy is kept?

The approximate relationship I provided in this post holds whether you throw out half the coefficients or not.
So, if the conditions indicated in that post apply to your situation, then you could get an estimate of the amplitude of a dominant sinusoidal component with
approx_sinusoidal_amplitude = (2 / len(values)) * np.abs(coeffs[k])
for some index k corresponding to the frequency of the sinusoidal component (which according to the limitations indicated in my other post has to be at or near a multiple of 100/3000 ~ 0.033Hz in your case). For a dominant sinusoidal component, this index would typically correspond to a local peak in the frequency spectrum. Note however that if your signal is a mixture of various frequency components, the individual components may affect the frequency spectrum in such a way that the peak does not appear clearly.

I have an array of samples at equal intervals of 25 ms. I want to determine the frequency spectrum for the fundamental

From say N=1000 voltage samples at 1 ms sample rate. I need to find precisely with python/numpy the amplitude and angle of the fundamental, which is between 45 and 55 Hz as well as any side bands, that may exist.
Do I need a phase lock loop to do that, or can it be done without ?

Your measurement frequency is fundamentally too low and should be more than double your expected event frequency!
measured: 0.025s
event range: 0.0182-0.0222s
more here: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

A phase lock loop is be a reasonable approach to estimate the frequency etc. of the fundamental. Supposing you have collected the N samples up front, another way to do the analysis is:
Apply a window, like np.hanning(N), by multiplying it pointwise with the samples.
Compute the spectrum with np.fft.rfft. For a sampling interval of Ts seconds, the nth element of the resulting array is the DFT coefficient for frequency n * N * Ts in units of Hz (or for the values N=1000, Ts=0.001, simple n Hz).
Find the bin with the peak magnitude between 45 and 55. The location of the peak gives the frequency of the fundamental. You can interpolate a polynomial (np.polyfit) across a few neighboring bins and find its peak to get a more precise estimate. Magnitude and complex phase of the peak give the amplitude and phase (angle) of the fundamental.
Plot the magnitude of the spectrum to look for sidebands.

How to extract features from FFT?

I am gathering data from X, Y and Z accelerometer sensors sampled at 200 Hz. The 3 axis are combined into a single signal called 'XYZ_Acc'. I followed tutorials on how to transform time domain signal into frequency domain using scipy fftpack library.
The code I'm using is the below:
from scipy.fftpack import fft
# get a 500ms slice from dataframe
sample500ms = df.loc[pd.to_datetime('2019-12-15 11:01:31.000'):pd.to_datetime('2019-12-15 11:01:31.495')]['XYZ_Acc']
f_s = 200 # sensor sampling frequency 200 Hz
T = 0.005 # 5 milliseconds between successive observation T =1/f_s
N = 100 # 100 samples in 0.5 seconds
f_values = np.linspace(0.0, f_s/2, N//2)
fft_values = fft(sample500ms)
fft_mag_values = 2.0/N * np.abs(fft_values[0:N//2])
Then I plot the frequency vs the magnitude
fig_fft = plt.figure(figsize=(5,5))
ax = fig_fft.add_axes([0,0,1,1])
ax.plot(f_values,fft_mag_values)
Screenshot:
My difficulty now is how to extract features out of this data, such as Irregularity, Fundamental Frequency, Flux...
Can someone guide me into the right direction?
Update 06/01/2019 - adding more context to my question.
I'm relatively new in machine learning, so any feedback is appreciated. X, Y, Z are linear acceleration signals, sampled at 200 Hz from a smart phone. I'm trying to detect road anomalies by analysing spectral and temporal statistics.
Here's a sample of the csv file which is being parsed into a pandas dataframe with the timestamp as the index.
X,Y,Z,Latitude,Longitude,Speed,timestamp
0.8756,-1.3741,3.4166,35.894833,14.354166,11.38,2019-12-15 11:01:30:750
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:755
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:760
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:765
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:770
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:775
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:780
In answer to 'francis', two columns are then added via this code:
df['XYZ_Acc_Mag'] = (abs(df['X']) + abs(df['Y']) + abs(df['Z']))
df['XYZ_Acc'] = (df['X'] + df['Y'] + df['Z'])
'XYZ_Acc_Mag' is to be used to extract temporal statistics.
'XYZ_Acc' is to be used to extract spectral statistics.
Data 'XYZ_Acc_Mag' is then re sampled in 0.5 second frequency and temporal stats such as mean, standard-deviation, etc have been extracted in a new dataframe. Pair plots reveal the anomaly shown at time 11:01:35 in the line plot above.
Now back to my original question. I'm re sampling data 'XYZ_Acc', also at 0.5 seconds, and obtaining the magnitude array 'fft_mag_values'. The question is how do I extract temporal features such as Irregularity, Fundamental Frequency, Flux out of it?

Since 'XYZ_Acc' is defined as a linear combination of the components of the signal, taking its DFT makes sense. It is equivalent to using a 1D accelometer in direction (1,1,1). But a more physical energy-related viewpoint can be adopted.
Computing the DFT is similar to writing the signal as a sum of sines. If the acceleration vector writes :
The corresponding velocity vector could write:
and the specific kinetic energy writes:
This method requires computing the DFT a each component before the magnitude corresponding to each frequency.
Another issue is that the DFT is intended to compute the Discrete Fourrier Transform of a periodic signal, that signal being build by periodizing the frame. Nevertheless, the actual frame is never a period of a periodic signal and repeating the period creates artificial discontinuities at the end/begin of the frame. The effects strong discontinuities in the spectral domain, deemded spectral leakage, could be reduced by windowing the frame. Computing the real-to-complex DFT result in a power distribution, featuring peaks at particular frequencies.
In addition the frequency of a given peak is better estimated as the mean frequency with respect to power density, as shown in Why are frequency values rounded in signal using FFT?
Another tool to estimate fundamental frequencies is to compute the autocorrelation of the signal: it is higher near the periods of the signal. Since the signal is a vector of 3 components, an autocorelation matrix can be built. It is a 3x3 Hermitian matrix for each time and therefore features real eigenvalues. The maxima of the higher eigen value can be picture as the magnitude of vaibrations while the correponding eigenvector is a complex direction, somewhat similar to the direction of vibrations combined to angular offsets. The angular offset may signal an ellipsoidal vibration.
Here is a fake signal, build by adding a guassian noise and sine waves:
Here is the power density spectrum for a given frame overlapping on sine wave:
Here is the resulting eigenvalues of the autocorrelation of the same frame, where the period of the 50Hz sine wave is visible. Vertical scaling is wrong:
Here goes a sample code:
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal
n=2000
t=np.linspace(0.,n/200,num=n,endpoint=False)
# an artificial signal, just for tests
ax=0.3*np.random.normal(0,1.,n)
ay=0.3*np.random.normal(0,1.,n)
az=0.3*np.random.normal(0,1.,n)
ay[633:733]=ay[633:733]+np.sin(2*np.pi*30*t[633:733])
az[433:533]=az[433:533]+np.sin(2*np.pi*50*t[433:533])
#ax=np.sin(2*np.pi*10*t)
#ay=np.sin(2*np.pi*30*t)
#az=np.sin(2*np.pi*50*t)
plt.plot(t,ax, label='x')
plt.plot(t,ay, label='y')
plt.plot(t,az, label='z')
plt.xlabel('t, s')
plt.ylabel('acc, m.s^-2')
plt.legend()
plt.show()
#splitting the sgnal into frames of 0.5s
noiseheight=0.
for i in range(2*(n/200)):
print 'frame', i,' time ', i*0.5, ' s'
framea=np.zeros((100,3))
framea[:,0]=ax[i*100:i*100+100]
framea[:,1]=ay[i*100:i*100+100]
framea[:,2]=az[i*100:i*100+100]
#for that frame, apply window. Factor 2 so that average remains 1.
window = np.hanning(100)
framea[:,0]=framea[:,0]*window*2
framea[:,1]=framea[:,1]*window*2
framea[:,2]=framea[:,2]*window*2
#DFT transform.
hatacc=np.fft.rfft(framea,axis=0, norm=None)
# scaling by length of frame.
hatacc=hatacc/100.
#computing the magnitude : all non-zero frequency are doubled to merge energy in bin N-k exp(-2ik/n) to bin k
accmag=2*(np.abs(hatacc[:,0])*np.abs(hatacc[:,0])+np.abs(hatacc[:,1])*np.abs(hatacc[:,1])+np.abs(hatacc[:,2])*np.abs(hatacc[:,2]))
accmag[0]=accmag[0]*0.5
#first frame says something about noise
if i==0:
noiseheight=2.*np.max(accmag)
if np.max(accmag)>noiseheight:
peaks, peaksdat=scipy.signal.find_peaks(accmag, height=noiseheight)
timestep=0.005
freq= np.fft.fftfreq(100, d=timestep)
#see https://stackoverflow.com/questions/54714169/why-are-frequency-values-rounded-in-signal-using-fft/54775867#54775867
# frequencies of peaks are better estimated as mean frequency of peak, with respect to power density
for ind in peaks:
totalweight=accmag[ind-2]+accmag[ind-1]+accmag[ind]+accmag[ind+1]+accmag[ind+2]
totalweightedfreq=accmag[ind-2]*freq[ind-2]+accmag[ind-1]*freq[ind-1]+accmag[ind]*freq[ind]+accmag[ind+1]*freq[ind+1]+accmag[ind+2]*freq[ind+2]
print 'found peak at frequency' , totalweightedfreq/totalweight, ' of height', accmag[ind]
#ploting
plt.plot(freq[0:50],accmag[0:50], label='||acc||^2')
plt.xlabel('frequency, Hz')
plt.ylabel('||acc||^2, m^2.s^-4')
plt.legend()
plt.show()
#another approach to find fundamental frequencies: computing the autocorrelation of the windowed signal and searching for maximums.
#building the autocorellation matrix
autocorr=np.zeros((100,3,3), dtype=complex)
acxfft=np.fft.fft(framea[:,0],axis=0, norm=None)
acyfft=np.fft.fft(framea[:,1],axis=0, norm=None)
aczfft=np.fft.fft(framea[:,2],axis=0, norm=None)
acxfft[0]=0.
acyfft[0]=0.
aczfft[0]=0.
autocorr[:,0,0]=np.fft.ifft(acxfft*np.conj(acxfft),axis=0, norm=None)
autocorr[:,0,1]=np.fft.ifft(acxfft*np.conj(acyfft),axis=0, norm=None)
autocorr[:,0,2]=np.fft.ifft(acxfft*np.conj(aczfft),axis=0, norm=None)
autocorr[:,1,0]=np.fft.ifft(acyfft*np.conj(acxfft),axis=0, norm=None)
autocorr[:,1,1]=np.fft.ifft(acyfft*np.conj(acyfft),axis=0, norm=None)
autocorr[:,1,2]=np.fft.ifft(acyfft*np.conj(aczfft),axis=0, norm=None)
autocorr[:,2,0]=np.fft.ifft(aczfft*np.conj(acxfft),axis=0, norm=None)
autocorr[:,2,1]=np.fft.ifft(aczfft*np.conj(acyfft),axis=0, norm=None)
autocorr[:,2,2]=np.fft.ifft(aczfft*np.conj(aczfft),axis=0, norm=None)
# at a given time, the 3x3 matrix autocorr is Hermitian.
#Its eigenvalues are real, its unitary eigenvectors signals directions of vibrations and phase between components.
autocorreigval=np.zeros((100,3))
autocorreigvec=np.zeros((100,3,3), dtype=complex)
for j in range(100):
autocorreigval[j,:], autocorreigvec[j,:,:]=np.linalg.eigh(autocorr[j,:,:],UPLO='L')
peaks, peaksdat=scipy.signal.find_peaks(autocorreigval[:50,2], 0.3*autocorreigval[0,2])
cleared=np.zeros(len(peaks))
peakperiod=np.zeros(len(peaks))
for j in range(len(peaks)):
totalweight=autocorreigval[peaks[j]-1,2]+autocorreigval[peaks[j],2]+autocorreigval[peaks[j]+1,2]
totalweightedperiod=0.005*(autocorreigval[peaks[j]-1,2]*(peaks[j]-1)+autocorreigval[peaks[j],2]*(peaks[j])+autocorreigval[peaks[j]+1,2]*(peaks[j]+1))
peakperiod[j]=totalweightedperiod/totalweight
#cleared[0]=1.
fundfreq=1
for j in range(len(peaks)):
if cleared[j]==0:
print "found fundamental frequency :", 1.0/(peakperiod[j]), 'eigenvalue', autocorreigval[peaks[j],2],' dir vibration ', autocorreigvec[peaks[j],:,2]
for k in range(j,len(peaks),1):
mm=np.zeros(1)
np.floor_divide(peakperiod[k],peakperiod[j],out=mm)
if ( np.abs(peakperiod[k]-peakperiod[j]*mm[0])< 0.2*peakperiod[j] or np.abs(peakperiod[k]-(peakperiod[j])*(mm[0]+1))< 0.2*peakperiod[j]) :
cleared[k]=fundfreq
#else :
# print k,j,mm[0]
# print peakperiod[k], peakperiod[j]*mm[0], peakperiod[j]*(mm[0]+1) , peakperiod[j]
fundfreq=fundfreq+1
plt.plot(t[i*100:i*100+100],autocorreigval[:,2], label='autocorrelation, large eigenvalue')
plt.plot(t[i*100:i*100+100],autocorreigval[:,1], label='autocorrelation, medium eigenvalue')
plt.plot(t[i*100:i*100+100],autocorreigval[:,0], label='autocorrelation, small eigenvalue')
plt.xlabel('t, s')
plt.ylabel('acc^2, m^2.s^-4')
plt.legend()
plt.show()
The output is:
frame 0 time 0.0 s
frame 1 time 0.5 s
frame 2 time 1.0 s
frame 3 time 1.5 s
frame 4 time 2.0 s
found peak at frequency 50.11249238149811 of height 0.2437842149351196
found fundamental frequency : 50.31467771196368 eigenvalue 47.03344783764712 dir vibration [-0.11441502+0.00000000e+00j 0.0216911 +2.98101624e-18j
-0.9931962 -5.95276353e-17j]
frame 5 time 2.5 s
frame 6 time 3.0 s
found peak at frequency 30.027895460975156 of height 0.3252387031089667
found fundamental frequency : 29.60690406120401 eigenvalue 61.51059682797539 dir vibration [ 0.11384195+0.00000000e+00j -0.98335779-4.34688198e-17j
-0.14158908+3.87566125e-18j]
frame 7 time 3.5 s
found peak at frequency 26.39622018109896 of height 0.042081187689137545
found fundamental frequency : 67.65844834016518 eigenvalue 6.875616417422696 dir vibration [0.8102307 +0.00000000e+00j 0.32697001-8.83058693e-18j
0.48643275-4.76094302e-17j]
frame 8 time 4.0 s
frame 9 time 4.5 s
Frequencies 50Hz and 30Hz got caught as 50.11/50.31Hz and 30.02/29.60Hz and directions are quite accurate as well. The last feature at 26.39Hz/67.65Hz is likely garbage, as it features different frequencies for the two methods and lower magnitude/eigenvalue.
Regarding monitoring of road surface to improve maintenance, I know of a project at my compagny, called Aigle3D. A laser fitted at the back of a van scans the road at highway speed at milimetric accuracy. The van is also fitted with a server, cameras and other sensors, thus providing a huge amount of data on road geometry and defects, presently covering hundreds of km of the french national road network. Detecting and repairing small early defects and cracks may extend the life expectancy of the road at limited cost. If useful, data from accelerometers of daily users could indeed complete the monitoring system, allowing a faster reaction whenether a large pothole appears.

Amplitude from scipy.fft

Why is the amplitude I compute far, far away from original after fast Fourier transform (FFT)?
I have a signal with 1024 points and sampling frequency of 1/120000. I apply the fast Fourier transform in Python with scipy.fftpack. I normalize the calculated magnitude by number of bins and multiply by 2 as I plot only positive values.
As my initial signal amplitude is around 64 dB, I get very low amplitude values less then 1.
Please see my code.
Signal = well.ReadWellData(SignalNDB)
y, x = Signal.GetData(numpy=np)
N = y.size # Number of sample points 1024 ...
T = 1/120000 # sampling frequency (sec)
x = np.linspace(0.0, N*T, N)
yf = abs(fft(y)) # Perform fft returning Magnitude
xf = np.linspace(0.0, 1.0/(2.0*T), N//2) # Calculatel frequency bins
freqs = fftfreq(N, T)
ax1=plt.subplot(211)
ax1.plot(x,y)
plt.grid()
ax2=plt.subplot(212)
yf2 = 2/N * np.abs(yf[0:N//2]); # Normalize Magnitude by number of bins and multiply by 2
ax2.semilogy(xf, yf2) # freq vs ampl - positive only freq
plt.grid()
ax1.set_title(["check"])
#ax2.set_xlim([0,4000])
plt.show()
Please see my plot:
EDIT:
Finally my signal Amplitude after fft is exactly what I expected. What I did.
First I did fft for signal in mV. Then I converted the results to dB as per the formula: 20*log10(mV)+60; where 60 represents 1 mV proveded by the tool manufacturer.Therefore dB values presented on a linear scale format # the bottom plot rather than on the log format.
Please see the resulting plot below.
Results

Looks good to me. The FFT, or the Fourier transform in general, gives you the representation of your time-domain signal in the frequencies domain.
By taking a look at your signal, you have two main components : something oscillating at around 500Hz (period of 0.002s) and an offset (which corresponds to freq = 0Hz). Looking at the result of the FFT, we can see mainly two peaks : one at 0Hz and the other one could be at 500Hz (difficult to be sure without zooming on the signal).
The only relation between the intensities is defined by the Parseval's theorem, but having a signal oscillating around 64dB doesn't mean its FFT should have values close to 64dB. I suggest you take a look here.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.