The spectrum shows ripples that we can visually quantify as ~50 MHz ripples. I am looking for a method to calculate the frequency of these ripples other than by visual inspection of thousands of spectra. Since the function is in frequency domain, taking FFT would get it back into time domain (with time reversal if I am correct). How can we get frequency of these ripples?
The problem arises from a confusion between two meanings of 'frequency': the frequency axis along which you are measuring, and the frequency content of your data.
What you want is the ripple 'frequency', which is actually the period of the ripple along that frequency axis.
With that out of the way, let's have a look at how to fix your fft.
As pointed out by Dmitrii's answer, you must determine the sampling frequency of your data and also get rid of the low frequency components in your FFT result.
To determine the sampling frequency, first determine the sampling period by subtracting from each sample its predecessor and averaging the differences. The average sampling frequency is just the inverse of that.
fs = 1 / np.mean(freq[1:] - freq[:-1])
For the high-pass filter, you may use a Butterworth filter; this is a good implementation.
# Defining a high pass filter
def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = signal.butter(order, normal_cutoff, btype='high', analog=False)
    return b, a

def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = signal.filtfilt(b, a, data)
    return y
Next, when plotting the FFT, you need to take its absolute value; that is what you are after. Also, since it gives you both the positive and negative parts, you can just use the positive one. As far as the x-axis is concerned, it runs from 0 to half of your sampling frequency. This is further explored in this answer.
fft_amp = np.abs(np.fft.fft(amp, amp.size))
fft_amp = fft_amp[0:fft_amp.size // 2]
fft_freq = np.linspace(0, fs / 2, fft_amp.size)
Now, to determine the ripple frequency, simply locate the peak of the FFT. The value you are looking for (around 50 MHz) is the period of the ripple, i.e. the inverse of the peak position. It comes out in GHz, since your original data was in GHz, so multiply by 1000 to get MHz. For this example, it is actually around 57 MHz.
peak = fft_freq[np.argmax(fft_amp)]
ripple_period = 1 / peak * 1000
print(f'The ripple period is {ripple_period} MHz')
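To sanity-check the procedure without the original file, here is a minimal sketch on synthetic data. It assumes a uniformly spaced frequency axis in GHz and a made-up 0.05 GHz (50 MHz) ripple on a slow drift. `np.fft.rfft` and `np.fft.rfftfreq` are used instead of slicing the two-sided FFT, and a low-order polynomial detrend is swapped in for the Butterworth filter, so this is an illustration rather than the exact method above.

```python
import numpy as np

# Synthetic spectrum: hypothetical values, not the data from the question
freq = np.linspace(1.0, 3.0, 2000, endpoint=False)    # frequency axis in GHz
ripple_period_ghz = 0.05                               # a 50 MHz ripple
amp = (0.5 * np.sin(2 * np.pi * freq / ripple_period_ghz)   # the ripple
       + 3.0 * (freq - 2.0) ** 2)                      # slow baseline drift

# Remove the slow baseline with a low-order polynomial fit
coeffs = np.polyfit(freq, amp, deg=3)
detrended = amp - np.polyval(coeffs, freq)

# One-sided amplitude spectrum; its x-axis is in cycles per GHz
step = freq[1] - freq[0]
spectrum = np.abs(np.fft.rfft(detrended))
cycles_per_ghz = np.fft.rfftfreq(detrended.size, d=step)

# Skip the zero bin, then take the strongest peak
peak_index = 1 + np.argmax(spectrum[1:])
estimated_period_mhz = 1000.0 / cycles_per_ghz[peak_index]
print(f"Estimated ripple period: {estimated_period_mhz:.1f} MHz")
```

Because the estimate is quantised to the FFT bins, a longer frequency span gives a finer estimate of the ripple period.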
And here is the complete code, which also plots the data.
import numpy as np
import pylab as plt
from scipy import signal as signal
# Defining a high pass filter
def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = signal.butter(order, normal_cutoff, btype='high', analog=False)
    return b, a

def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = signal.filtfilt(b, a, data)
    return y

with open('ripple.csv', 'r') as fil:
    data = np.genfromtxt(fil, delimiter=',', skip_header=True)
amp = data[:, 0]
freq = data[:, 1]
# Determine the sampling frequency of the data (around 500 samples per GHz)
fs = 1 / np.mean(freq[1:] - freq[:-1])
# Apply a median filter to remove the noise
amp = signal.medfilt(amp)
# Apply a highpass filter to remove the low frequency components. A cutoff of
# 5 (in cycles per GHz) was chosen by visual inspection. Depending on the
# problem, you might want to choose a different value
cutoff_freq = 5
amp = butter_highpass_filter(amp, cutoff_freq, fs)
_, ax = plt.subplots(ncols=2, nrows=1)
ax[0].plot(freq, amp)
ax[0].set_xlabel('Frequency GHz')
ax[0].set_ylabel('Intensity dB')
ax[0].set_title('Filtered signal')
# The FFT part is as follows
fft_amp = np.abs(np.fft.fft(amp, amp.size))
fft_amp = fft_amp[0:fft_amp.size // 2]
fft_freq = np.linspace(0, fs / 2, fft_amp.size)
ax[1].plot(fft_freq, 2 / fft_amp.size * fft_amp, 'r-') # the red plot
ax[1].set_xlabel('FFT frequency')
ax[1].set_ylabel('Intensity dB')
plt.show()
peak = fft_freq[np.argmax(fft_amp)]
ripple_period = 1 / peak * 1000
print(f'The ripple period is {ripple_period} MHz')
And here is the plot:
To get a proper spectrum for the blue plot you need to do two things:
Properly calculate the frequencies for the spectrum plot (the red one).
Remove the bias in the data so the spectrum is less contaminated with low frequencies. That's because you're interested in the ripple, not in the slow fluctuations.
Note that when you compute the FFT, you get complex values that contain information about both the amplitude and the phase of the oscillations at each frequency. In your case, the red plot should be an amplitude spectrum (as opposed to a phase spectrum). To get that, we take the absolute values of the FFT coefficients.
Also, the spectrum you get with fft is two-sided and symmetric (since the signal is real). You really need only one side to get the idea where your ripple peak frequency is. I've implemented this in code.
After playing with your data, here's what I've got:
import pandas as pd
import numpy as np
import pylab as plt
import plotly.graph_objects as go
from scipy import signal as sig
df = pd.read_csv("ripple.csv")
f = df.Frequency.to_numpy()
data = df.Data
data = sig.medfilt(data) # median filter to remove the spikes
fig = go.Figure()
fig.add_trace(go.Scatter(x=f, y=(data - data.mean())))
fig.update_layout(
    xaxis_title="Frequency in GHz", yaxis_title="dB"
)  # the blue plot with ripples
fig.show()
# Remove bias to get rid of low frequency peak
data_fft = np.fft.fft(data - data.mean())
L = len(data) # number of samples
# Compute two-sided spectrum
tssp = abs(data_fft / L)
# Compute one-sided spectrum
ossp = tssp[0 : int(L / 2)]
ossp[1:-1] = 2 * ossp[1:-1]
delta_freq = f[1] - f[0] # without this freqs computation is incorrect
freqs = np.fft.fftfreq(f.shape[-1], delta_freq)
# Use first half of freqs since spectrum is one-sided
plt.plot(freqs[: int(L / 2)], ossp, "r-") # the red plot
plt.xlim([0, 50])
plt.xticks(np.arange(0, 50, 1))
plt.grid()
plt.xlabel("Oscillations per GHz")
plt.show()
So you can see there are two peaks: slow oscillations at between 1 and 2 oscillations per GHz, and your ripple at around 17 oscillations per GHz (a period of roughly 59 MHz).
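Since the x-axis of that spectrum counts oscillations per GHz, the ripple period follows by inverting the peak position. Here is a small sketch of how that could be automated; the helper `ripple_period_mhz` and its `min_cycles_per_ghz` cutoff are hypothetical names, and the toy spectrum only mimics the two peaks described above.

```python
import numpy as np

def ripple_period_mhz(cycles_per_ghz, spectrum, min_cycles_per_ghz=5.0):
    """Return the ripple period in MHz from a one-sided amplitude spectrum.

    Bins below min_cycles_per_ghz are ignored so that the slow-drift peak
    is not mistaken for the ripple.
    """
    mask = cycles_per_ghz >= min_cycles_per_ghz
    peak = cycles_per_ghz[mask][np.argmax(spectrum[mask])]
    return 1000.0 / peak

# Toy spectrum: a strong slow-drift peak at 1.5 and a ripple peak at 17
axis = np.arange(0.0, 50.0, 0.5)
toy = np.zeros_like(axis)
toy[axis == 1.5] = 3.0
toy[axis == 17.0] = 1.0

period = ripple_period_mhz(axis, toy)
print(f"{period:.1f} MHz")
```

The cutoff has to be chosen per data set: it needs to sit above the slow fluctuations but below the expected ripple.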
Let us suppose I have some Gaussian peaks, at a known spacing from one another, sitting on some low-frequency noise. Each Gaussian has a probability given by a Poisson distribution, so all the peaks have different heights.
I would like to design a filter to extract the peaks (or, subtract the noise).
I have tried to implement a Butterworth filter to remove the low frequency noise.
My implementation at the moment seems to produce a negative signal, which is not what I expect:
The signal is below 0, which is not what I expect;
My peaks do not appear to sit on a flat line. I think this is a result of my peaks being different amplitudes.
There appears to be some sort of 'ghost peaks' to the left and right of the peaks I am interested in.
Being new to Butterworth filters, I am unsure of what exactly it is I am doing incorrectly.
Can someone explain my mistake?
My implementation is as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.stats import norm
from scipy.stats import skewnorm
from scipy.stats import poisson
from astropy.stats import freedman_bin_width
from scipy.interpolate import interp1d
Delta_X = 5
gaus_width_0 = 0.5
gaus_width_1 = 0.02
mu=2
######
# DEFINE HIGH PASS BUTTERWORTH FILTER
######
def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = signal.butter(order, normal_cutoff, btype='high', analog=False)
    return b, a

def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = signal.filtfilt(b, a, data)
    return y
######
# GENERATE SIGNAL - GAUSSIANS SEPARATED BY Delta_X, WITH SOME SMEARING. EACH PEAK HAS A POISSON PROBABILITY
######
data = []
for i in range(10):
    mean = float(i)*Delta_X
    std = gaus_width_0 + gaus_width_1*i
    gaus = norm.rvs(mean, std, size=int(20000*poisson.pmf(i, mu)))
    data.append(gaus)
######
# GENERATE NOISE - A SKEWED NORMAL DISTRIBUTION
######
noise = 4*Delta_X*skewnorm.rvs(5, size=100000)
data.append(noise)
data = np.concatenate(data)
######
# GENERATE A HISTOGRAM
####
bw, bins = freedman_bin_width(data, return_bins=True)
counts, _ = np.histogram(data, bins=bins)
bin_centres = (bins[:-1] + bins[1:])/2.
nbins = len(bin_centres)
bin_numbers = np.arange(0, nbins)
######
# INTERPOLATE BETWEEN BIN UNITS AND MEASURED UNITS TO FIND THE FREQUENCY, IN BINS
####
f_inv = interp1d(bin_centres, bin_numbers)
freq_bins = f_inv(Delta_X) - f_inv(0)
######
# APPLY THE BUTTERWORTH FILTER
####
filtered_counts = butter_highpass_filter(counts, freq_bins, nbins)
plt.figure(figsize=(10,10))
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel("x", fontsize=20)
plt.plot(bin_centres, counts, lw=3, label="Original Histogram")
plt.plot(bin_centres, filtered_counts, lw=3, label="Filtered Histogram")
plt.legend(fontsize=20)
plt.grid(which="both")
plt.show()
I get the following result:
The Situation
I am currently writing a program that will later be used to analyze a signal that is roughly an asymmetric Gaussian. I am interested in how many frequencies I need to reproduce the signal reasonably accurately, and especially in the amplitudes of those frequencies.
Before I input the real data I'm testing the program with a default (asymmetric) Gaussian, as can be seen in the code below.
My Problem
To ensure that I get the amplitudes right, I am rebuilding the original signal using the whole frequency spectrum, but there are some difficulties. I can reproduce the signal reasonably well by multiplying amp by 0.16, a value I got by looking at the ratio rebuild/original. Of course, this is really unsatisfying and can't be the correct solution.
To be precise, the difference does not depend on the time length and seems to be a Gaussian too, following the form of the original and increasing in asymmetry according to the skewnorm function itself. The amplitude of the difference function is linearly correlated with 'height'.
My Question
I am writing this post because I am out of ideas for getting the amplitude right. Maybe anyone has had the same / a similar problem and can share their solution for this / give a hint.
Further information
Before focusing on an (asymmetric) Gaussian I analyzed periodic signals and rectangular pulses, which sadly were very sensitive to variations in the time length of the input signal. In this context, I experimented with window functions, which seemed to speed up the process and increase the stability, the reason being that I had to integrate the peaks. For the Gaussian I was told to take each peak as returned by the bare FFT and ditch the integration approach, hence my uncertainty about the amplitude described above. Maybe someone has an opinion on the approach I chose and can suggest an improvement if necessary.
Code
from numpy.fft import fft, fftfreq
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
np.random.seed(1234)
def data():
    height = 1
    data = height * skewnorm.pdf(t, a=0, loc=t[int(N/2)])
    # noise_power = 1E-6
    # noise = np.random.normal(scale=np.sqrt(noise_power), size=t.shape)
    # data += noise
    return data

def fft_own(data):
    freq = fftfreq(N, dt)
    data_fft = fft(data) * np.pi
    amp = 2/N * np.abs(data_fft) # * factor (depending on t1)
    # amp = 2/T * np.abs(data_fft)**2
    phase = np.angle(data_fft)
    peaks, = np.where(amp >= 0) # use whole spectrum for rebuild
    return freq, amp, phase, peaks

def rebuild(fft_own):
    freq, amp, phase, peaks = fft_own
    df = freq[1] - freq[0]
    data_rebuild = 0
    for i in peaks:
        amplitude = amp[i] * df
        # amplitude = amp[i] * 0.1
        # amplitude = np.sqrt(amp[i] * df)
        data_rebuild += amplitude * np.exp(0+1j * (2*np.pi * freq[i] * t
                                                   + phase[i]))
    f, ax = plt.subplots(1, 1)
    # mask = (t >= 0) & (t <= t1-1)
    ax.plot(t, data_init, label="initial signal")
    ax.plot(t, np.real(data_rebuild), label="rebuild")
    # ax.plot(t[mask], (data_init - np.real(data_rebuild))[mask], label="diff")
    ax.set_xlim(0, t1-1)
    ax.legend()
t0 = 0
t1 = 10 # diff(t0, t1) ∝ df
# T = t1- t0
N = 4096
t = np.linspace(t0, t1, int(N))
dt = (t1 - t0) / N
data_init = data()
fft_init = fft_own(data_init)
rebuild_init = rebuild(fft_init)
You should get a perfect reconstruction if you divide amp by N, and remove all your other factors.
Currently you do:
data_fft = fft(data) * np.pi # Multiply by pi
amp = 2/N * np.abs(data_fft) # Multiply by 2/N
amplitude = amp[i] * df # Multiply by df = 1/(dt*N) = 1/10
This means that you currently multiply by a total of pi * 2 / 10, or 0.628, that you shouldn't (only the 1/N factor in there is correct).
Correct code:
def fft_own(data):
    freq = fftfreq(N, dt)
    data_fft = fft(data)
    amp = np.abs(data_fft) / N
    phase = np.angle(data_fft)
    peaks, = np.where(amp >= 0) # use whole spectrum for rebuild
    return freq, amp, phase, peaks

def rebuild(fft_own):
    freq, amp, phase, peaks = fft_own
    data_rebuild = 0
    for i in peaks:
        data_rebuild += amp[i] * np.exp(0+1j * (2*np.pi * freq[i] * t
                                                + phase[i]))
Your program can be significantly simplified by using ifft. Simply set to 0 those frequencies in data_fft that you don't want to include in the reconstruction, and apply ifft to it:
data_fft = fft(data)
data_fft[np.abs(data_fft) < threshold] = 0
rebuild = ifft(data_fft).real
Note that the Fourier transform of a Gaussian is a Gaussian, so you won't be picking out individual peaks, you are picking a compact range of frequencies that will always include 0. This is an ideal low-pass filter.
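As an illustration of the ifft route, here is a minimal self-contained sketch. It assumes a symmetric unit-width Gaussian on a 10 s window (similar to the test signal in the question) and a hypothetical relative threshold of 1e-3; neither value comes from the original post.

```python
import numpy as np
from numpy.fft import fft, ifft

N = 4096
t = np.linspace(0, 10, N, endpoint=False)
# Symmetric Gaussian centred in the window (stand-in for skewnorm with a=0)
data = np.exp(-0.5 * (t - 5.0) ** 2) / np.sqrt(2 * np.pi)

data_fft = fft(data)

# Keeping every coefficient reproduces the signal up to rounding error
full_rebuild = ifft(data_fft).real

# Keeping only the strongest coefficients gives a low-pass approximation
threshold = 1e-3 * np.abs(data_fft).max()    # hypothetical cutoff
kept = np.abs(data_fft) >= threshold
partial_rebuild = ifft(np.where(kept, data_fft, 0)).real

print("coefficients kept:", kept.sum(), "of", N)
print("max error (all coefficients):", np.abs(full_rebuild - data).max())
print("max error (thresholded):", np.abs(partial_rebuild - data).max())
```

No manual scaling is needed here because `ifft` already applies the 1/N factor.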
I am trying to remove the baseline wander noise from an ECG signal. Baseline wander is a low-frequency artefact of around 0.5 Hz, so I tried a digital Butterworth high-pass filter:
code of filter
frequency response
The ECG signal used is record 100 from the MIT-BIH Arrhythmia Database (sampled at 360 samples per second). First I read the record using the wfdb package and then applied the filter to it, but the result looks something like this:
code of filtering the signal
the result
The result looks off. Where is the problem?
I think the problem is that your filter does not know the time step of your dataset, particularly in your call to bilinear where you explicitly give a sampling frequency of 1.
To demonstrate, let's start with a pretty strong signal:
import numpy as np
import matplotlib.pyplot as plt
N = 1_000
T = 4.
tau = 0.25 # period of signal, frequency of 4 Hz
f = 1 / tau # frequency
omega = 2 * np.pi * f # angular frequency
t, dt = np.linspace(0, T, N, endpoint = False, retstep = True)
fs = 1 / dt # sampling frequency in Hz
noise = np.random.random(N)
sig = np.sin(t * omega)
x = 0.2 * noise + sig
plt.plot(t, x)
plt.xlabel('time')
Now let's start with the filter you've developed. My wave has a higher frequency than your cutoff, so it should pass through the high-pass filter, but it doesn't.
from scipy import signal
b, a = signal.butter(4, 0.5, 'high', analog = True, output = 'ba')
z, p = signal.bilinear(b, a, 1.)
x_filt = signal.lfilter(z, p, x)
plt.figure()
plt.plot(t, x)
plt.plot(t, x_filt)
plt.xlabel('time')
Note that in your call to bilinear you are setting a sampling frequency of 1. Let's instead use our sampling frequency (defined above as fs).
z, p = signal.bilinear(b, a, fs)
x_filt = signal.lfilter(z, p, x)
plt.figure()
plt.plot(t, x)
plt.plot(t, x_filt)
plt.xlabel('time')
As a last note, you may sometimes observe a phase shift with Butterworth filters.
Again, I am not an expert and have only ever used them in cases where I was not worried about the phase, but see here for suggestion on using a different type of filter if this is an issue. The comments here also suggest using a different filter for this reason.
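If the phase shift matters, one commonly used option is zero-phase filtering, which runs the filter forwards and then backwards. The sketch below is an assumption-laden illustration rather than the approach from the question: it designs the Butterworth filter digitally by passing `fs` to `signal.butter`, uses second-order sections, and applies it with `signal.sosfiltfilt` to a made-up test signal.

```python
import numpy as np
from scipy import signal

fs = 250.0                                   # sampling frequency in Hz (made up)
t = np.arange(5000) / fs                     # 20 seconds of samples
clean = np.sin(2 * np.pi * 4.0 * t)          # 4 Hz test tone
drift = 0.8 * np.sin(2 * np.pi * 0.1 * t)    # slow baseline wander
x = clean + drift

# Digital 4th-order Butterworth high-pass with a 0.5 Hz cutoff.
# Passing fs lets SciPy interpret the cutoff in Hz.
sos = signal.butter(4, 0.5, btype='highpass', fs=fs, output='sos')

# Forward-backward filtering cancels the phase shift of a single pass
x_filt = signal.sosfiltfilt(sos, x)

# Compare away from the edges, where start-up transients are largest
middle = slice(len(t) // 4, 3 * len(t) // 4)
deviation = np.abs(x_filt[middle] - clean[middle]).max()
print("max deviation from the clean tone:", deviation)
```

Forward-backward filtering needs the whole record up front, so it suits offline analysis such as a stored ECG record but not sample-by-sample processing.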
I'm trying to make some examples of FFTs. The idea here is to have three sine waves for three different musical notes (A, C, E), add them together (to form the A minor chord) and then do an FFT to retrieve the original frequencies.
import numpy as np
import matplotlib.pyplot as plt
import scipy.fft
def generate_sine_wave(freq, sample_rate, duration):
    x = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    frequencies = x * freq
    # 2pi because np.sin takes radians
    y = np.sin(2 * np.pi * frequencies)
    return x, y

def main():
    # Frequency of note in Aminor chord (A, C, E)
    # note_names = ('A', 'C', 'E')
    # fs = (440, 261.63, 329.63)
    fs = (27.50, 16.35, 20.60)
    # duration, in seconds.
    duration = .5
    # sample rate. determines how many data points the signal uses to represent
    # the sine wave per second. So if the signal had a sample rate of 10 Hz and
    # was a five-second sine wave, then it would have 10 * 5 = 50 data points.
    sample_rate = 1000
    fig, ax = plt.subplots(5)
    all_wavelengths = []
    # Create a linspace, with N samples from 0 to duration
    # x = np.linspace(0.0, T, N)
    for i, f in enumerate(fs):
        x, y = generate_sine_wave(f, sample_rate, duration)
        # y = np.sin(2 * np.pi * F * x)
        all_wavelengths.append(y)
        ax[i].plot(x, y)
    # sum of all notes
    aminor = np.sum(all_wavelengths, axis=0)
    ax[i].plot(x, aminor)
    yf = np.abs(scipy.fft.rfft(aminor))
    xf = scipy.fft.rfftfreq(int(sample_rate * duration), 1 / sample_rate)
    ax[i + 1].plot(xf, yf)
    ax[i + 1].vlines(fs, ymin=0, ymax=(max(yf)), color='purple')
    plt.show()

if __name__ == '__main__':
    main()
However, the FFT plot (last subplot) does not have the proper peak frequencies (highlighted through vertical purple lines). Why is that?
The FFT will only recover the contained frequencies exactly if the sampling window covers a multiple of the signal's period. Otherwise, if there is a "remainder", the frequency peaks will deviate from the exact values.
Since your A minor signal contains three distinct frequencies (27.50, 16.35 and 20.60 Hz), you need a sampling duration that covers a whole number of periods of each component. To find that duration, write the frequencies in hundredths of a hertz (2750, 1635, 2060) and compute their greatest common divisor:
>>> import math
>>> math.gcd(2750, 1635, 2060)
5
All three frequencies are therefore integer multiples of 5/100 = 0.05 Hz, so the combined signal repeats every 1 / 0.05 = 20 seconds. For a duration of 20 seconds (550, 327 and 412 whole periods respectively), the frequencies are recovered exactly. Of course, any other multiple of 20 seconds works as well. The following plot is obtained for such a whole-period duration:
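A candidate duration can be checked exactly by counting how many cycles of each component fit into it. The following is a minimal sketch using exact fractions; the helper names `cycles_in` and `is_whole_period` are hypothetical, and the search is limited to whole-second durations.

```python
from fractions import Fraction

# The three component frequencies in Hz, as exact fractions
freqs = [Fraction("27.50"), Fraction("16.35"), Fraction("20.60")]

def cycles_in(duration, freqs):
    """Number of cycles of each component that fit into `duration` seconds."""
    return [f * duration for f in freqs]

def is_whole_period(duration, freqs):
    """True if every component completes an integer number of cycles."""
    return all(c.denominator == 1 for c in cycles_in(duration, freqs))

# Smallest whole-second duration that holds only complete cycles
smallest = next(d for d in range(1, 1000) if is_whole_period(d, freqs))
print(smallest, [int(c) for c in cycles_in(smallest, freqs)])
```

The same helper can be used to test any other candidate duration before running the FFT.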
I think that, within the margin of error, the results do in fact match your frequencies:
You can see in your frequency plot that the bins closest to your actual frequencies do indeed have the highest amplitudes.
However, because this is a DFT, the frequencies being returned are discrete, so they don't exactly match the frequencies you used to construct your sample.
What you can try is making your sample (i.e. the number of time points in your input data) longer and/or a multiple of your input periods. That should increase the frequency resolution and/or move the sampled output frequencies closer to the input frequencies.
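To see how the duration controls the resolution, here is a small sketch that lists the FFT bin spacing and the bins nearest to the three note frequencies. The sample rate and frequencies are taken from the question; the list of durations is an arbitrary choice for illustration.

```python
import numpy as np

sample_rate = 1000                            # Hz, as in the question
targets = np.array([27.50, 16.35, 20.60])     # note frequencies in Hz

for duration in (0.5, 2.0, 20.0):
    n = int(sample_rate * duration)
    bins = np.fft.rfftfreq(n, d=1 / sample_rate)
    spacing = bins[1] - bins[0]               # equals 1 / duration
    # For each target, pick the bin with the smallest distance to it
    nearest = bins[np.abs(bins[:, None] - targets).argmin(axis=0)]
    print(f"duration {duration:>4} s: spacing {spacing:.2f} Hz, nearest bins {nearest}")
```

The bin spacing depends only on the duration, not on the sample rate, which is why sampling faster does not sharpen the peaks.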
I set up a sine wave of a certain amplitude, frequency and phase, and tried recovering the amplitude and phase:
import numpy as np
import matplotlib.pyplot as plt
N = 1000 # Sample points
T = 1 / 800 # Spacing
t = np.linspace(0.0, N*T, N) # Time
frequency = np.fft.fftfreq(t.size, d=T) # Normalized Fourier frequencies in spectrum.
f0 = 25 # Frequency of the sampled wave
phi = np.pi/6 # Phase
A = 50 # Amplitude
s = A * np.sin(2 * np.pi * f0 * t - phi) # Signal
S = np.fft.fft(s) # Unnormalized FFT
fig, [ax1,ax2] = plt.subplots(nrows=2, ncols=1, figsize=(10, 5))
ax1.plot(t,s,'.-', label='time signal')
ax2.plot(frequency[0:N//2], 2/N * np.abs(S[0:N//2]), '.', label='amplitude spectrum')
plt.show()
index, = np.where(np.isclose(frequency, f0, atol=1/(T*N))) # Getting the normalized frequency close to f0 in Hz)
magnitude = np.abs(S[index[0]]) # Magnitude
phase = np.angle(S[index[0]]) # Phase
print(magnitude)
print(phase)
print(phi)
#21785.02149316858
#-1.2093259641890741
#0.5235987755982988
Now the amplitude should be 50, instead of 21785, and the phase pi/6=0.524, instead of -1.2.
Am I misinterpreting the output, or the answer on the post referred to in the link above?
You need to normalize the fft by 1/N with one of the two following changes (I used the 2nd one):
S = np.fft.fft(s) --> S = 1/N*np.fft.fft(s)
magnitude = np.abs(S[index[0]]) --> magnitude = 1/N*np.abs(S[index[0]])
Don't use index, = np.where(np.isclose(frequency, f0, atol=1/(T*N))): the FFT is not exact and the highest magnitude may not be at f0. Use np.argmax(np.abs(S)) instead, which gives the peak of the spectrum and will be very close to f0.
np.angle returns the phase relative to a cosine, which is offset by pi/2 from the phase of your sine, so compute it manually with np.arctan(np.real(x)/np.imag(x)).
Use more points (I made N higher) and make T smaller for higher accuracy.
Since a DFT (discrete Fourier transform) is double-sided and has peaks at both the negative and the positive frequency, the peak on the positive side is only half the actual magnitude. You need to multiply every frequency by two, except f=0, to account for this. I multiplied by 2 in magnitude = np.abs(S[index])*2/N.
N = 10000
T = 1/5000
...
index = np.argmax(np.abs(S))
magnitude = np.abs(S[index])*2/N
freq_max = frequency[index]
phase = np.arctan(np.real(S[index])/np.imag(S[index]))
print(f"magnitude: {magnitude}, freq_max: {freq_max}, phase: {phase}")
print(phi)
Output: magnitude: 49.996693276663564, freq_max: 25.0, phase: 0.5079341239733628
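For comparison, here is a minimal sketch that recovers both values with `np.angle`. It is not the code above: it assumes the time axis excludes the endpoint and that the window holds a whole number of periods (N = 800 samples at 800 Hz, chosen here so that 25 Hz falls exactly on a bin).

```python
import numpy as np

N = 800                      # number of samples (chosen so 25 Hz is a bin)
fs = 800.0                   # sampling rate in Hz
t = np.arange(N) / fs        # excludes the endpoint, so bins sit at k * fs / N
f0, A, phi = 25.0, 50.0, np.pi / 6

s = A * np.sin(2 * np.pi * f0 * t - phi)

S = np.fft.rfft(s)                        # one-sided FFT of a real signal
freqs = np.fft.rfftfreq(N, d=1 / fs)
k = np.argmax(np.abs(S))                  # index of the spectral peak

amplitude = 2 * np.abs(S[k]) / N          # factor 2 for the one-sided spectrum
# np.angle gives the phase of a cosine; a sine lags a cosine by pi/2:
# A*sin(w*t - phi) = A*cos(w*t - phi - pi/2)
phase = -(np.angle(S[k]) + np.pi / 2)

print(freqs[k], amplitude, phase)
```

When the signal frequency does not fall on a bin, spectral leakage biases both estimates, which is one reason the numbers above are close to, but not exactly, 50 and pi/6.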