Power Spectrum Analysis for a very large set of data - python

I have a voltage signal that I am trying to denoise. The signal comes in very large files (524288 cells). When I take the whole file and set n equal to the length of the data set, I get two extremely large peaks, one at frequency 0 and one at the maximum frequency.
sig = np.genfromtxt(directory + '/'+ file, skip_header=3, dtype=np.float64)
n = len(sig)
freq = np.arange(n)
fhat = np.fft.fft(sig, n)
PSD = np.real(fhat * np.conj(fhat)) / n  # the product is real-valued; drop the zero imaginary part
plt.plot(freq,PSD)
plt.show()
indices = PSD > 100  # boolean mask of the bins to keep
fhat = indices * fhat
ffilt = np.fft.ifft(fhat)
plt.plot(ffilt)
plt.show()
Is there a way to analyse the whole file, or do I have to split it into smaller data sets?

You can analyze the whole file in one go. Those high peaks might be a padding issue.
Anyway, I would suggest using one of the methods already implemented in Python, such as the Welch method from SciPy. It would save you time and the headache of figuring out whether your implementation is correct. Here is an example adapted from the SciPy lectures:
import numpy as np
from matplotlib import pyplot as plt
from scipy import signal
# Seed the random number generator
np.random.seed(0)
time_step = .01
time_vec = np.arange(0, 70, time_step)
# A signal with a small frequency chirp
sig = np.sin(0.5 * np.pi * time_vec * (1 + .1 * time_vec))
plt.figure()
plt.plot(time_vec, sig)
# Compute the Power Spectral Density
freqs, psd = signal.welch(sig)
plt.figure()
plt.semilogx(freqs, psd)
plt.title('PSD: power spectral density')
plt.xlabel('Frequency')
plt.ylabel('Power')
plt.tight_layout()
plt.show()
And here are the results (two figures: the chirp signal over time and its power spectral density).
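As a complement to the SciPy example above, here is a minimal sketch of the same call on a record the size of the one in the question. The sampling rate `fs`, the 50 Hz test tone, the constant offset, and `nperseg=4096` are all assumptions made for illustration; the real file would replace the synthetic `sig`. By default `signal.welch` removes the mean of each segment (`detrend='constant'`), which suppresses a peak at frequency 0 caused by a constant offset, and for real input it returns only the non-negative frequencies, so no mirrored peak appears at the top of the axis.

```python
import numpy as np
from scipy import signal

fs = 1000.0    # assumed sampling rate in Hz (not given in the question)
n = 524288     # record length from the question

# Synthetic stand-in for the voltage file: offset + 50 Hz tone + noise
rng = np.random.default_rng(0)
t = np.arange(n) / fs
sig = 2.0 + np.sin(2 * np.pi * 50.0 * t) + 0.5 * rng.standard_normal(n)

# Average the periodograms of overlapping 4096-sample segments
freqs, psd = signal.welch(sig, fs=fs, nperseg=4096)

# Frequency of the strongest spectral component
peak_freq = freqs[np.argmax(psd)]
print(peak_freq)
```

A longer `nperseg` gives finer frequency resolution at the cost of a noisier estimate, so the value is worth tuning to the data.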

Related

Correct amplitude of the python fft (for a Skew normal distribution)

The Situation
I am currently writing a program that will later be used to analyze a signal that resembles an asymmetric Gaussian. I am interested in how many frequencies I need to reproduce the signal reasonably accurately, and especially in the amplitudes of those frequencies.
Before I input the real data I'm testing the program with a default (asymmetric) Gaussian, as can be seen in the code below.
My Problem
To ensure that I get the amplitudes right, I am rebuilding the original signal using the whole frequency spectrum, but there are some difficulties. I can reproduce the signal fairly well by multiplying amp by 0.16, a value I got by looking at the ratio rebuild/original. Of course, this is really unsatisfying and can't be the correct solution.
To be precise, the difference does not depend on the time length and seems to be a Gaussian too, following the form of the original and increasing in asymmetry according to the skewnorm function itself. The amplitude of the difference function scales linearly with 'height'.
My Question
I am writing this post because I am out of ideas for getting the amplitude right. Maybe someone has had the same or a similar problem and can share their solution or give a hint.
Further information
Before focusing on an (asymmetric) Gaussian, I analyzed periodic signals and rectangular pulses, which sadly were very sensitive to variations in the time length of the input signal. In this context, I experimented with window functions, which seemed to speed up the process and increase the stability, the reason being that I had to integrate the peaks. When working with the Gaussian, I was told to take each peak obtained from the bare fft and drop the integration approach, hence my uncertainty about the amplitude described above. If anyone has an opinion on the approach I chose and can suggest an improvement, I would welcome it.
Code
from numpy.fft import fft, fftfreq
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
np.random.seed(1234)
def data():
    height = 1
    data = height * skewnorm.pdf(t, a=0, loc=t[int(N/2)])
    # noise_power = 1E-6
    # noise = np.random.normal(scale=np.sqrt(noise_power), size=t.shape)
    # data += noise
    return data
def fft_own(data):
    freq = fftfreq(N, dt)
    data_fft = fft(data) * np.pi
    amp = 2/N * np.abs(data_fft) # * factor (depending on t1)
    # amp = 2/T * np.abs(data_fft)**2
    phase = np.angle(data_fft)
    peaks, = np.where(amp >= 0) # use whole spectrum for rebuild
    return freq, amp, phase, peaks
def rebuild(fft_own):
    freq, amp, phase, peaks = fft_own
    df = freq[1] - freq[0]
    data_rebuild = 0
    for i in peaks:
        amplitude = amp[i] * df
        # amplitude = amp[i] * 0.1
        # amplitude = np.sqrt(amp[i] * df)
        data_rebuild += amplitude * np.exp(0+1j * (2*np.pi * freq[i] * t
                                                   + phase[i]))
    f, ax = plt.subplots(1, 1)
    # mask = (t >= 0) & (t <= t1-1)
    ax.plot(t, data_init, label="initial signal")
    ax.plot(t, np.real(data_rebuild), label="rebuild")
    # ax.plot(t[mask], (data_init - np.real(data_rebuild))[mask], label="diff")
    ax.set_xlim(0, t1-1)
    ax.legend()
t0 = 0
t1 = 10 # diff(t0, t1) ∝ df
# T = t1- t0
N = 4096
t = np.linspace(t0, t1, int(N))
dt = (t1 - t0) / N
data_init = data()
fft_init = fft_own(data_init)
rebuild_init = rebuild(fft_init)
You should get a perfect reconstruction if you divide amp by N, and remove all your other factors.
Currently you do:
data_fft = fft(data) * np.pi # Multiply by pi
amp = 2/N * np.abs(data_fft) # Multiply by 2/N
amplitude = amp[i] * df # Multiply by df = 1/(dt*N) = 1/10
This means that you currently multiply by a total of pi * 2 / 10, or 0.628, that you shouldn't (only the 1/N factor in there is correct).
Correct code:
def fft_own(data):
    freq = fftfreq(N, dt)
    data_fft = fft(data)
    amp = np.abs(data_fft) / N
    phase = np.angle(data_fft)
    peaks, = np.where(amp >= 0) # use whole spectrum for rebuild
    return freq, amp, phase, peaks
def rebuild(fft_own):
    freq, amp, phase, peaks = fft_own
    data_rebuild = 0
    for i in peaks:
        data_rebuild += amp[i] * np.exp(0+1j * (2*np.pi * freq[i] * t
                                                + phase[i]))
Your program can be significantly simplified by using ifft (imported from numpy.fft alongside fft). Simply set to 0 those frequencies in data_fft that you don't want to include in the reconstruction, and apply ifft to it:
data_fft = fft(data)
data_fft[np.abs(data_fft) < threshold] = 0
rebuild = ifft(data_fft).real
Note that the Fourier transform of a Gaussian is a Gaussian, so you won't be picking out individual peaks, you are picking a compact range of frequencies that will always include 0. This is an ideal low-pass filter.
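The ifft approach can be tried on a plain Gaussian as follows. This is only a sketch: the unit-width Gaussian and the relative threshold of 1e-3 are assumptions chosen for illustration, not values from the question.

```python
import numpy as np
from numpy.fft import fft, ifft

N = 4096
t = np.linspace(0, 10, N)

# Symmetric Gaussian centred in the window (stand-in for the skewnorm data)
data = np.exp(-0.5 * (t - 5.0) ** 2) / np.sqrt(2 * np.pi)

data_fft = fft(data)

# Assumed cutoff: keep coefficients above 0.1 % of the largest magnitude
threshold = 1e-3 * np.abs(data_fft).max()
kept = np.abs(data_fft) >= threshold

# Zero the discarded coefficients and transform back; ifft applies the 1/N factor
rebuild = ifft(np.where(kept, data_fft, 0)).real

max_err = np.max(np.abs(rebuild - data))
print(kept.sum(), "coefficients kept, max error", max_err)
```

Because the spectrum of a Gaussian is itself a Gaussian centred on zero, the surviving coefficients form one compact low-frequency band rather than isolated peaks.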

Output of fft.fft() for magnitude and phase (angle) not corresponding to the values set up

I set up a sine wave of a certain amplitude, frequency and phase, and tried recovering the amplitude and phase:
import numpy as np
import matplotlib.pyplot as plt
N = 1000 # Sample points
T = 1 / 800 # Spacing
t = np.linspace(0.0, N*T, N) # Time
frequency = np.fft.fftfreq(t.size, d=T) # Normalized Fourier frequencies in spectrum.
f0 = 25 # Frequency of the sampled wave
phi = np.pi/6 # Phase
A = 50 # Amplitude
s = A * np.sin(2 * np.pi * f0 * t - phi) # Signal
S = np.fft.fft(s) # Unnormalized FFT
fig, [ax1,ax2] = plt.subplots(nrows=2, ncols=1, figsize=(10, 5))
ax1.plot(t,s,'.-', label='time signal')
ax2.plot(frequency[0:N//2], 2/N * np.abs(S[0:N//2]), '.', label='amplitude spectrum')
plt.show()
index, = np.where(np.isclose(frequency, f0, atol=1/(T*N))) # Getting the normalized frequency close to f0 (in Hz)
magnitude = np.abs(S[index[0]]) # Magnitude
phase = np.angle(S[index[0]]) # Phase
print(magnitude)
print(phase)
phi
#21785.02149316858
#-1.2093259641890741
#0.5235987755982988
Now the amplitude should be 50, instead of 21785, and the phase pi/6=0.524, instead of -1.2.
Am I misinterpreting the output, or the answer on the post referred to in the link above?
You need to normalize the FFT by 1/N, with one of the following two changes (I used the second one):
S = np.fft.fft(s) --> S = 1/N*np.fft.fft(s)
magnitude = np.abs(S[index[0]]) --> magnitude = 1/N*np.abs(S[index[0]])
Don't use index, = np.where(np.isclose(frequency, f0, atol=1/(T*N))). The FFT is not exact and the highest magnitude may not be at f0; use np.argmax(np.abs(S)) instead, which gives you the peak of the signal and will be very close to f0.
np.angle does not return phi directly. It measures the phase relative to a cosine, while the signal here is a sine with phase -phi, so the angle of the peak bin is about -phi - pi/2. I computed the phase manually with np.arctan(np.real(x)/np.imag(x)), which removes that offset for phi between -pi/2 and pi/2.
Use more points (I made N higher) and make T smaller for higher accuracy.
Since a DFT (discrete Fourier transform) is double sided and has peaks at both the negative and the positive frequencies, the peak on the positive side will only be half the actual magnitude. For an FFT you need to multiply every frequency by two, except for f=0, to account for this. I multiplied by 2 in magnitude = np.abs(S[index])*2/N
N = 10000
T = 1/5000
...
index = np.argmax(np.abs(S))
magnitude = np.abs(S[index])*2/N
freq_max = frequency[index]
phase = np.arctan(np.real(S[index])/np.imag(S[index]))
print(f"magnitude: {magnitude}, freq_max: {freq_max}, phase: {phase}")
print(phi)
Output: magnitude: 49.996693276663564, freq_max: 25.0, phase: 0.5079341239733628
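The remaining small error in the phase above is consistent with spectral leakage: np.linspace(0.0, N*T, N) produces a spacing of N*T/(N-1) rather than T, so the window does not hold a whole number of periods. The sketch below is one way to check this, under the assumption that the window can be chosen to contain exactly 25 periods (N = 800 samples at 800 Hz, which differs from the N used in the question). It uses np.arange for an exact spacing and recovers phi from np.angle by removing the sine-versus-cosine offset.

```python
import numpy as np

N = 800          # assumed: 1 second of data, exactly 25 periods of f0
T = 1 / 800      # sample spacing in seconds
t = np.arange(N) * T   # spacing is exactly T, endpoint excluded

f0 = 25          # frequency in Hz
phi = np.pi / 6  # phase
A = 50           # amplitude

s = A * np.sin(2 * np.pi * f0 * t - phi)

# One-sided FFT of a real signal
S = np.fft.rfft(s)
frequency = np.fft.rfftfreq(N, d=T)

k = np.argmax(np.abs(S))
amplitude = 2 * np.abs(S[k]) / N          # factor 2: one-sided spectrum
phase = -np.angle(S[k]) - np.pi / 2       # sin(x - phi) = cos(x - phi - pi/2)

print(frequency[k], amplitude, phase)
```

When the record cannot be trimmed to a whole number of periods, a window function and peak interpolation are the usual alternatives, at the cost of exactness.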

How do I get the power at a particular frequency of a sound file?

I'm working on my final degree thesis, in which I have to measure the Sound Pressure Level of underwater recordings (wav files) at a particular frequency (2000 Hz). So I came up with this code:
'''
def get_value(filename, f0, NFFT=8192, plot = False):
    #Load audio
    data, sampling_frequency = soundfile.read(filename)
    # remove stereo
    if len(data.shape)> 1:
        data = data[:, 0]
    # remove extra length
    if len(data)>sampling_frequency:
        data = data[0:sampling_frequency]
    # remove DC
    data = data - data.mean()
    # power without filtering
    total_power = 10*np.log10(np.mean(data**2))
    # fft
    NFFT = 4096 # number of samples in the FFT
    window = np.array(1) #np.hamming(len(data))
    fftdata = np.fft.fft(data / NFFT, n = NFFT)
    SPL = 20 * np.log10(np.abs(fftdata)) # Sound Pressure Level [dB]
    freq = np.linspace(0, sampling_frequency, NFFT) # frequency axis [Hz]
    # take value at desired frequency
    power_at_frequency = SPL[np.argmin(np.abs(freq-f0))]
    print(power_at_frequency)
'''
However, I checked the value with Audacity and it is completely different.
Thanks beforehand.
If you are interested in only one frequency, you don't have to compute the FFT; you can simply use
totalEnergy = np.sum((data - np.mean(data)) ** 2)
freqEnergy = np.abs(np.sum(data * np.exp(2j * np.pi * np.arange(len(data)) * target_freq / sampling_freq)))
And if you are using the FFT and the window size is not a multiple of the wave period, the power will leak into other frequencies. To avoid this, your window length should be a whole number of wave periods, as the following example shows:
import numpy as np
import matplotlib.pyplot as plt
sampling_frequency = 48000
target_frequency = 2000.0
ns = 1000000
data = np.sin(2*np.pi * np.arange(ns) * target_frequency / sampling_frequency)
# power
print('a sine wave has power 0.5 ~', np.mean(data**2), 'that will be split in two ')
## Properly scaled frequency
plt.figure(figsize=(12, 5))
plt.subplot(121)
z = np.abs(np.fft.fft(data[:8192])**2) / 8192**2
print('tuned with 8192 samples', max(z), ' some power leaked in other frequencies')
plt.semilogy(np.fft.fftfreq(len(z)) * sampling_frequency, z)
plt.ylabel('power')
plt.title('some power leaked')
plt.subplot(122)
# 6000 samples = 1/8 second is multiple of 1/2000 second
z = np.abs(np.fft.fft(data[:6000])**2) / 6000**2
print('tuned with 6000 samples', max(z))
plt.semilogy(np.fft.fftfreq(len(z)) * sampling_frequency, z)
plt.xlabel('frequency')
plt.title('all power in exact two symmetric bins')
## FFT of size not multiple of 2000
print(np.sum(np.abs(np.fft.fft(data[:8192]))**2) / 8192)
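The single-frequency sum from the start of this answer can be turned into an amplitude and a level in dB as sketched below. The test tone, its amplitude of 0.3, and the 6000-sample window are assumptions for illustration. The result is in dB relative to full scale, not a calibrated Sound Pressure Level; converting to SPL would need the hydrophone sensitivity, which is not part of this example. The sketch uses the DFT sign convention (a negative exponent); for real data the magnitude is the same with either sign.

```python
import numpy as np

sampling_frequency = 48000.0
target_frequency = 2000.0
n = 6000   # 1/8 second, a whole number of 2000 Hz periods

k = np.arange(n)
# Assumed test signal: a pure tone of amplitude 0.3
data = 0.3 * np.sin(2 * np.pi * target_frequency * k / sampling_frequency)

# Projection of the data onto a single complex exponential (one DFT bin)
coeff = np.sum(data * np.exp(-2j * np.pi * k * target_frequency / sampling_frequency))

amplitude = 2 * np.abs(coeff) / n   # factor 2: the power is split over +f and -f
power = amplitude ** 2 / 2          # mean-square value of a sinusoid
power_db = 10 * np.log10(power)     # dB relative to full scale

print(amplitude, power, power_db)
```

For a pure tone the recovered `power` matches `np.mean(data**2)`, which gives a quick consistency check before applying the same projection to a recording.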

Python: Designing a time-series filter after Fourier analysis

I have a time series of 3-hourly temperature data that I have analyzed and found the power spectrum for using Fourier analysis.
data = np.genfromtxt('H:/RData/3hr_obs.txt',
skip_header=3)
step = data[:,0]
t = data[:,1]
y = data[:,2]
freq = 0.125
yps = np.abs(np.fft.fft(y))**2
yfreqs = np.fft.fftfreq(y.size, freq)
y_idx = np.argsort(yfreqs)
fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(111)
ax.semilogy(yfreqs[y_idx],yps[y_idx])
ax.set_ylim(1e-3,1e8)
(Three figures appeared here: Original Data, Frequency Spectrum, and Power Spectrum.)
Now that I know that the signal is strongest at frequencies of 1 and 2, I want to create a filter (non-boxcar) that can smooth out the data to keep those dominant frequencies.
Is there a specific numpy or scipy function that can do this? Will this be something that will have to be created outside the main packages?
An example with some synthetic data:
# fourier filter example (1D)
%matplotlib inline
import matplotlib.pyplot as p
import numpy as np
# make up a noisy signal
dt=0.01
t= np.arange(0,5,dt)
f1,f2= 5, 20 #Hz
n=t.size
s0= 0.2*np.sin(2*np.pi*f1*t)+ 0.15 * np.sin(2*np.pi*f2*t)
sr= np.random.rand(np.size(t))
s=s0+sr
#fft
s-= s.mean() # remove DC (spectrum easier to look at)
fr=np.fft.fftfreq(n,dt) # a nice helper function to get the frequencies
fou=np.fft.fft(s)
#make up a narrow bandpass with a Gaussian
df=0.1
gpl= np.exp(- ((fr-f1)/(2*df))**2)+ np.exp(- ((fr-f2)/(2*df))**2) # pos. frequencies
gmn= np.exp(- ((fr+f1)/(2*df))**2)+ np.exp(- ((fr+f2)/(2*df))**2) # neg. frequencies
g=gpl+gmn
filt=fou*g #filtered spectrum = spectrum * bandpass
#ifft
s2=np.fft.ifft(filt)
p.figure(figsize=(12,8))
p.subplot(511)
p.plot(t,s0)
p.title('data w/o noise')
p.subplot(512)
p.plot(t,s)
p.title('data w/ noise')
p.subplot(513)
p.plot(np.fft.fftshift(fr) ,np.fft.fftshift(np.abs(fou) ) )
p.title('spectrum of noisy data')
p.subplot(514)
p.plot(fr,g*50, 'r')
p.plot(fr,np.abs(filt))
p.title('filter (red) + filtered spectrum')
p.subplot(515)
p.plot(t,np.real(s2))
p.title('filtered time data')
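As an alternative to multiplying the spectrum by a hand-built Gaussian, SciPy's filter-design functions can produce a smooth (non-boxcar) band-pass directly. The sketch below swaps in a Butterworth filter applied with zero-phase filtering; it is not the method used in the answer above. The synthetic series, the band edges of 0.5 and 2.5 cycles per day, and the filter order are assumptions for illustration, with the 3-hourly spacing of 0.125 days taken from the question.

```python
import numpy as np
from scipy import signal

dt = 0.125        # sample spacing in days (3-hourly data)
fs = 1 / dt       # 8 samples per day
t = np.arange(0, 60, dt)

# Assumed synthetic series: cycles at 1 and 2 per day plus white noise
rng = np.random.default_rng(0)
clean = 3.0 * np.sin(2 * np.pi * 1.0 * t) + 1.5 * np.sin(2 * np.pi * 2.0 * t)
noisy = clean + rng.standard_normal(t.size)

# Band-pass that keeps roughly 0.5 to 2.5 cycles per day
sos = signal.butter(4, [0.5, 2.5], btype='bandpass', fs=fs, output='sos')

# Forward-backward filtering, so the output has no phase lag
smoothed = signal.sosfiltfilt(sos, noisy)

noisy_err = np.std(noisy - clean)
smooth_err = np.std(smoothed - clean)
print(noisy_err, smooth_err)
```

A single wide band keeps everything between the two dominant frequencies; two narrower bands, one per frequency, would reject more noise at the cost of longer edge transients.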

Lomb-Scargle vs FFT power spectrum: crashes with evenly spaced data

I am trying to create some routines to compute power spectra for both evenly and unevenly sampled data, using the Lomb-Scargle periodogram (LSP) and FFT-Power spectrum. The problem I am having is that when using the LSP implementation in scipy, I experience crashes with evenly sampled data.
The code below works, and produces near identical (and correct) output, as far as I can tell. However, I was forced to insert a kludge in the Lomb-Scargle function to add jitter to the frequencies, so they don't exactly match up to the FFT ones. When I comment out that line, I get a divide-by-zero error.
Is this an issue with the Lomb-Scargle implementation in scipy, or am I simply not supposed to use it with evenly sampled data? Thanks in advance.
import numpy as np
import scipy.signal as sp
import matplotlib.pyplot as plt
def one_sided_fft(t,x):
    full_amplitude_spectrum = np.abs(np.fft.fft(x))/x.size
    full_freqs = np.fft.fftfreq(x.size, np.mean(np.ediff1d(t)))
    oneinds = np.where(full_freqs >=0.0)
    one_sided_freqs = full_freqs[oneinds]
    one_sided_amplitude_spectrum=2*full_amplitude_spectrum[oneinds]
    return one_sided_freqs, one_sided_amplitude_spectrum
def power_spectrum(t,x):
    onef, oneamps = one_sided_fft(t,x)
    return onef, oneamps**2
def lomb_scargle_pspec(t, x):
    tstep = np.mean(np.ediff1d(t))
    freqs = np.fft.fftfreq(x.size, tstep)
    idxx = np.argsort(freqs)
    one_sided_freqs = freqs[idxx]
    one_sided_freqs = one_sided_freqs[one_sided_freqs>0]
    #KLUDGE TO KEEP PERIODOGRAM FROM CRASHING
    one_sided_freqs = one_sided_freqs+0.00001*np.random.random(one_sided_freqs.size)
    #THE FOLLOWING LINE CRASHES WITHOUT THE KLUDGE
    pgram = sp.lombscargle(t, x, one_sided_freqs*2*np.pi)
    return one_sided_freqs, (pgram/(t.size/4))
if __name__ == "__main__":
    #Sample data
    fs = 100.0
    fund_freq=5
    ampl = 0.4
    t = np.arange(0,10,1/fs)
    x = ampl*np.cos(2*np.pi*fund_freq*t)
    #power spectrum calculations
    powerf, powerspec = power_spectrum(t,x)
    lsf, lspspec = lomb_scargle_pspec(t,x)
    #plotting
    fig, (ax0, ax1, ax2)= plt.subplots(nrows=3)
    fig.tight_layout()
    ax0.plot(t, x)
    ax0.set_title('Input Data, '+str(fund_freq)+' Hz, '+
                  'Amplitude: '+str(ampl)+
                  ' Fs = '+str(fs)+' Hz')
    ax0.set_ylabel('Volts')
    ax0.set_xlabel('Time[s]')
    ax1.plot(powerf, powerspec)
    ax1.set_title('FFT-based Power Spectrum')
    ax1.set_ylabel('Volts**2')
    ax1.set_xlabel('Freq[Hz]')
    ax2.plot(lsf, lspspec)
    ax2.set_title('Lomb-Scargle Power Spectrum')
    ax2.set_ylabel('Volts**2')
    ax2.set_xlabel('Freq[Hz]')
    plt.show()
It was a bug in lombscargle. The code contained an arctan calculation implemented as atan(2 * cs / (cc - ss)), where cc and ss depend on elements of x and freqs. For some inputs, cc - ss can be 0. The fixed code using atan2(2 * cs, cc - ss) was included in scipy 0.15.0.
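With a SciPy release that includes the fix, the jitter kludge should no longer be needed. The following sketch, which assumes such a release is installed, evaluates the periodogram of the question's evenly sampled cosine on an evenly spaced frequency grid; the grid limits of 0.1 to 20 Hz are an arbitrary choice for illustration. The division by t.size/4 is the same scaling used in the question's code.

```python
import numpy as np
from scipy import signal

fs = 100.0       # sampling rate in Hz, as in the question
fund_freq = 5    # signal frequency in Hz
ampl = 0.4       # signal amplitude

# Evenly sampled cosine, as in the question
t = np.arange(0, 10, 1 / fs)
x = ampl * np.cos(2 * np.pi * fund_freq * t)

# Evenly spaced frequency grid with no jitter added (assumed range)
freqs_hz = np.linspace(0.1, 20.0, 200)

# lombscargle expects angular frequencies
pgram = signal.lombscargle(t, x, 2 * np.pi * freqs_hz)

# Same scaling as in the question's code
power = pgram / (t.size / 4)

peak_freq = freqs_hz[np.argmax(pgram)]
print(peak_freq)
```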
