Fast Fourier Transform using NumPy: why does it look like this? - python

As far as I understand, the Fast Fourier Transform is a fast method of computing the Discrete Fourier Transform.
I've been playing with the NumPy math library and got the following plot with this code:
import numpy as np
from numpy.fft import fft, fftfreq
import matplotlib.pyplot as plt
t = np.arange(0, 10, step=0.001)
signal = np.sin(t) + np.sin(10*t)
sp = fft(signal)
freq = fftfreq(signal.size, d=0.001)
plt.plot(freq, sp)
plt.show()
It seems to me that the result should look like d(x-1) + d(x-10) ... // d is the delta function
(A Discrete Fourier Transform should look like the plain Fourier Transform, but with sloping edges, as far as I understand.)
But it doesn't. It looks more like "d(x-0.1) + d(x-1.5) ..." and I wonder why. Is there a problem with fftfreq?

It's been many a year since I studied this ...
You're expecting to see peaks at 1 and 10 Hz (cycles/sec)?
Then you need to change the arguments of the sin functions.
sin takes its argument in radians. 1 Hz is 2*pi radians/sec and 10 Hz is 10*2*pi rad/sec.
Change your signal to signal = np.sin(2*np.pi*t) + np.sin(10*2*np.pi*t) # simplify the math as desired.
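Putting that together, here is a minimal sketch of the corrected script; plotting the magnitude of the (complex) spectrum is my own addition, since plt.plot would otherwise only show the real part:
import numpy as np
from numpy.fft import fft, fftfreq
import matplotlib.pyplot as plt
t = np.arange(0, 10, step=0.001)
signal = np.sin(2*np.pi*1*t) + np.sin(2*np.pi*10*t)  # 1 Hz and 10 Hz components
sp = fft(signal)
freq = fftfreq(signal.size, d=0.001)
plt.plot(freq, np.abs(sp))  # sp is complex; plot its magnitude
plt.show()
This should give sharp peaks at +/-1 Hz and +/-10 Hz.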

Related

How to make a PSD plot using `np.fft.fft`?

I want to make a plot of power spectral density versus frequency for a signal using the numpy.fft.fft function. I want to do this so that I can preserve the complex information in the transform and know what I'm doing, as opposed to relying on higher-level functions provided by numpy (like the periodogram function). I'm following MathWorks' nice page about doing PSD analysis using Matlab's fft function: https://www.mathworks.com/help/matlab/ref/fft.html
In this example, I expect the PSD to peak at the frequency I used to construct the signal, which was 100 in this case. I generate the signal using 1000 time points and a frequency of 100 inverse time units. I thought that the fft magnitude could be plotted against [0, nt/2] and the peaks would show up where there is the most energy in the frequency. When I did this, things went wrong. I expected my PSD to peak at 100.
How can I make a spectral density plot of frequency vs energy contained in that frequency using np.fft.fft?
Edit
to clarify, in my real problem, I only know that my characteristic frequency is much larger than my sample frequency
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(1000)
sp = np.fft.fft(np.sin(100 * t * np.pi))
trange = np.linspace(0, t[-1] / 2, t.size)
plt.plot(trange, np.abs(sp) / t.size)
plt.show()
This is a sketch I made of the expected output:
What is your sample frequency? The sequence you are generating can represent an infinite number of continuous-time signals, depending on the sample frequency.
The sample frequency needs to be at least twice the maximum signal frequency, as stated by the Sampling Theorem, so, using fs = 250 Hz and a 10-second sine, it becomes:
import matplotlib.pyplot as plt
import numpy as np
fs = 250
t = np.arange(0, 10, 1/fs)
sp = np.fft.fft(np.sin(2*np.pi * 100 * t))
trange = np.linspace(0, fs, len(t))
plt.plot(trange, np.abs(sp))
plt.show()
If you run this you will see a peak at 100Hz as expected.
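If you only want the positive half of the spectrum (and want to avoid the mirrored peak at fs - 100 Hz), np.fft.rfft and np.fft.rfftfreq give you that directly. A minimal sketch; the 1/len(t) scaling is just a rough normalisation choice on my part:
import matplotlib.pyplot as plt
import numpy as np
fs = 250
t = np.arange(0, 10, 1/fs)
x = np.sin(2*np.pi * 100 * t)
sp = np.fft.rfft(x)                      # one-sided FFT of a real signal
freqs = np.fft.rfftfreq(len(x), d=1/fs)  # matching frequency axis, 0 .. fs/2
plt.plot(freqs, np.abs(sp) / len(t))
plt.show()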

How to get time/freq from FFT in Python

I've got a little problem managing FFT data. I was looking at many examples of how to do FFT, but I couldn't get what I want from any of them. I have a random wave file with a 44 kHz sample rate and I want to get the magnitude of N harmonics every X ms, let's say 100 ms should be enough. I tried this code:
import scipy.io.wavfile as wavfile
import numpy as np
import pylab as pl
rate, data = wavfile.read("sound.wav")
t = np.arange(len(data[:,0]))*1.0/rate
p = 20*np.log10(np.abs(np.fft.rfft(data[:2048, 0])))
f = np.linspace(0, rate/2.0, len(p))
pl.plot(f, p)
pl.xlabel("Frequency(Hz)")
pl.ylabel("Power(dB)")
pl.show()
This was the last example I used; I found it somewhere on Stack Overflow. The problem is, this gets the magnitude which I want, and gets the frequency, but no time at all. FFT analysis is 3D as far as I know, and this is the "merged" result of all harmonics. I get this:
X-axis = Frequency, Y-axis = Magnitude, Z-axis = Time (invisible)
From my understanding of the code, t is time - and it seems like that, but it is not needed in the code - we'll maybe need it though. p is an array of powers (or magnitudes), but it seems to be some average of all magnitudes of each frequency f, which is an array of frequencies. I don't want an averaged/merged value, I want the magnitude of N harmonics every X milliseconds.
Long story short, what we can get: one magnitude of all frequencies.
What we want: all magnitudes of N frequencies, including the time at which a certain magnitude is present.
Result should look like this array: [time,frequency,amplitude]
So in the end if we want 3 harmonics, it would look like:
[0,100,2.85489] #100Hz harmonic has 2.85489 amplitude on 0ms
[0,200,1.15695] #200Hz ...
[0,300,3.12215]
[100,100,1.22248] #100Hz harmonic has 1.22248 amplitude on 100ms
[100,200,1.58758]
[100,300,2.57578]
[200,100,5.16574]
[200,200,3.15267]
[200,300,0.89987]
Visualization is not needed, result should be just arrays (or hashes/dictionaries) as listed above.
Further to @Paul R's answer, scipy.signal.spectrogram is a spectrogram function in SciPy's signal processing module.
The example at the above link is as follows:
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
# Generate a test signal, a 2 Vrms sine wave whose frequency linearly
# changes with time from 1kHz to 2kHz, corrupted by 0.001 V**2/Hz of
# white noise sampled at 10 kHz.
fs = 10e3
N = int(1e5)
amp = 2 * np.sqrt(2)
noise_power = 0.001 * fs / 2
time = np.arange(N) / fs
freq = np.linspace(1e3, 2e3, N)
x = amp * np.sin(2*np.pi*freq*time)
x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)
#Compute and plot the spectrogram.
f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
It looks like you're trying to implement a spectrogram, which is a sequence of power spectrum estimates, typically implemented with a succession of (usually overlapping) FFTs. Since you only have one FFT (spectrum) then you have no time dimension yet. Put your FFT code in a loop, and process one block of samples (e.g. 1024) per iteration, with a 50% overlap between successive blocks. The sequence of generated spectra will then be a 3D array of time v frequency v magnitude.
I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:
N = length of data input
N_FFT = no of samples per block (== FFT size, e.g. 1024)
i = 0 ;; i = index of spectrum within 3D output array
for block_start = 0 to N - N_FFT
block_end = block_start + N_FFT
get samples from block_start .. block_end
apply window function to block (e.g. Hamming)
apply FFT to windowed block
calculate magnitude spectrum (20 * log10(sqrt(re*re + im*im)))
store spectrum in output array at index i
block_start += N_FFT / 2 ;; NB: 50% overlap
i++
end
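For reference, here is one possible NumPy translation of that pseudo code; treat it as a sketch (the function name, the Hamming window choice and the small epsilon guarding against log10(0) are my own assumptions, while the block size and 50% overlap follow the pseudo code above):
import numpy as np
def block_spectra(data, n_fft=1024):
    # Sequence of magnitude spectra (in dB) with 50% overlap between blocks
    window = np.hamming(n_fft)
    hop = n_fft // 2
    spectra = []
    for block_start in range(0, len(data) - n_fft + 1, hop):
        block = data[block_start:block_start + n_fft] * window   # window the block
        spectrum = np.fft.rfft(block)                             # FFT of the windowed block
        spectra.append(20 * np.log10(np.abs(spectrum) + 1e-12))   # magnitude spectrum in dB
    return np.array(spectra)  # shape: (number of blocks, n_fft // 2 + 1)
Row i of the result is the spectrum of the block starting at sample i * n_fft / 2, so its time is i * n_fft / (2 * sample_rate).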
Edit: Oh, so it seems this returns values, but they don't fit the audio file at all. Even though they can be used as magnitudes for a spectrogram, they won't work, for example, in those classic audio visualizers which you can see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is the same.
import os
import wave
import pylab
import math
from numpy import amax
from numpy import amin
def get_wav_info(wav_file, mi, mx):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = pylab.fromstring(frames, 'Int16')
    frame_rate = wav.getframerate()
    wav.close()
    spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate)
    n = 0
    while n < 20:
        for index, power in enumerate(spectrum[n]):
            print("%s,%s,%s" % (n, int(round(t[index]*1000)), math.ceil(power*100)/100))
        n += 1

get_wav_info("wave.wav", 1, 20)
Any tips on how to obtain dB values that are usable for visualization?
Basically, we apparently have everything we need from the code above; the question is just how to make it return normal values. Ignore mi and mx, as these are just for adjusting the values in the array to fit into the mi..mx interval - that would be for visualization usage. If I am correct, spectrum in this code returns an array of arrays which contains the amplitudes for each frequency from the freqs array, present at the times given by the t array. But how does the value work - is it really an amplitude if it returns these weird values, and if it is, how do I convert it to dB, for example?
tl;dr I need output like the visualizers music players have, but it doesn't need to work in real time; I just want the data, but the values don't fit the wav file.
Edit2: I noticed there's one more issue. For a 90-second wav, the t array contains times up to 175.x, which seems very weird considering the frame_rate matches the wav file. So now we have 2 problems: spectrum doesn't seem to return correct values (maybe it will fit if we get the correct time) and t seems to span exactly double the duration of the wav.
Fixed: Case completely solved.
import os
import pylab
import math
from numpy import amax
from numpy import amin
from scipy.io import wavfile
wav_file = "wave.wav"  # same file as in the snippet above
frame_rate, snd = wavfile.read(wav_file)
sound_info = snd[:,0]
spectrum, freqs, t, im = pylab.specgram(sound_info,NFFT=1024,Fs=frame_rate,noverlap=5,mode='magnitude')
specgram needed a little adjustment, and I loaded only one channel, using the scipy.io library (instead of the wave library). Also, without mode set to 'magnitude', it returns 10log10 instead of 20log10, which is the reason it didn't return correct values.
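If you then want dB values for a visualizer, converting the magnitude spectrum is just a log transform; a minimal sketch (the small epsilon to avoid log10(0) is my own addition):
import numpy as np
# 'spectrum' is the magnitude array returned by pylab.specgram(..., mode='magnitude') above
spectrum_db = 20 * np.log10(spectrum + 1e-12)  # magnitude -> decibels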

Fourier transform with python

I have a set of data. It obviously has some periodic nature. I want to find out what frequency it has by using the Fourier transform and plot it.
Here is a shot of mine, but it doesn't look right.
This is the corresponding code; I don't know why it fails:
import numpy
from pylab import *
from scipy.fftpack import fft,fftfreq
import matplotlib.pyplot as plt
dataset = numpy.genfromtxt(fname='data.txt',skip_header=1)
t = dataset[:,0]
signal = dataset[:,1]
npts=len(t)
FFT = abs(fft(signal))
freqs = fftfreq(npts, t[1]-t[0])
subplot(211)
plot(t[:npts], signal[:npts])
subplot(212)
plot(freqs,20*log10(FFT),',')
xlim(-10,10)
show()
My question is: since the original data looks very periodic, I expect a very sharp peak in the frequency domain; how can I make the peak look nicer?
It's a problem of data analysis.
The FFT works with complex numbers, so the spectrum is symmetric for real data input: restrict it with xlim(0, max(freqs)).
The sampling period is not good: increasing the period while keeping the same total number of input points will lead to a better quality spectrum in this example.
EDIT.
with :
dataset = numpy.genfromtxt(fname='data.txt',skip_header=1)[::30];
t,signal = dataset.T
(...)
plot(freqs,FFT)
xlim(0,1)
ylim(0,30)
the spectrum is
For the best quality spectrum, just reacquire the signal for a long, long time (for beautiful peaks), with a sampling frequency of 1 Hz, which will give you a [0, 0.5 Hz] frequency scale (see the Nyquist criterion).
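For completeness, a full version of the edited script might look like the sketch below; the part elided with (...) is filled in from the question's own code, so treat this as an illustration rather than the exact script behind the plot:
import numpy as np
from scipy.fftpack import fft, fftfreq
import matplotlib.pyplot as plt
# keep every 30th sample, i.e. increase the sampling period by a factor of 30
dataset = np.genfromtxt(fname='data.txt', skip_header=1)[::30]
t, signal = dataset.T
npts = len(t)
FFT = np.abs(fft(signal))
freqs = fftfreq(npts, t[1] - t[0])
plt.plot(freqs, FFT)
plt.xlim(0, 1)   # positive frequencies only
plt.ylim(0, 30)
plt.show()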

How to make this matplotlib plot less noisy?

How can I plot the following noisy data with a smooth, continuous line without considering each individual value? I would like to only show the behavior in a nicer way, without caring about noisy and extreme values. This is the code I am using:
import numpy
import sys
import matplotlib.pyplot as plt
from scipy.interpolate import spline
dataset = numpy.genfromtxt(fname='data', delimiter=",")
dic = {}
for d in dataset:
    dic[d[0]] = d[1]
plt.plot(range(len(dic)), dic.values(),linestyle='-', linewidth=2)
plt.savefig('plot.png')
plt.show()
In a previous answer, I was introduced to the Savitzky Golay filter, a particular type of low-pass filter, well adapted for data smoothing. How "smooth" you want your resulting curve to be is a matter of preference, and this can be adjusted by both the window-size and the order of the interpolating polynomial. Using the cookbook example for sg_filter:
import numpy as np
import sg_filter
import matplotlib.pyplot as plt
# Generate some sample data similar to your post
X = np.arange(1,1000,1)
Y = np.log(X**3) + 10*np.random.random(X.shape)
Y2 = sg_filter.savitzky_golay(Y, 101, 3)
plt.plot(X,Y,linestyle='-', linewidth=2,alpha=.5)
plt.plot(X,Y2,color='r')
plt.show()
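If the cookbook sg_filter module is not to hand, SciPy ships the same filter as scipy.signal.savgol_filter; a minimal sketch of the equivalent call, using the same window size and polynomial order as above:
from scipy.signal import savgol_filter
Y2 = savgol_filter(Y, 101, 3)  # window length 101, polynomial order 3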
There is more than one way to do it!
Here I show how to reduce noise using a variety of techniques:
Moving average
LOWESS regression
Low pass filter
Interpolation
Sticking with @Hooked's example data for consistency:
import numpy as np
import matplotlib.pyplot as plt
X = np.arange(1, 1000, 1)
Y = np.log(X ** 3) + 10 * np.random.random(X.shape)
plt.plot(X, Y, alpha = .5)
plt.show()
Moving average
Sometimes all you need is a moving average.
For example, using pandas with a window size of 100:
import pandas as pd
df = pd.DataFrame(Y, X)
df_mva = df.rolling(100).mean() # moving average with a window size of 100
df_mva.plot(legend = False);
You will probably have to try several window sizes with your data. Note that the first 99 values of df_mva will be NaN (the window is not yet full there), but these can be removed with the dropna method.
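For example, a quick one-line sketch dropping those rows before plotting:
df_mva.dropna().plot(legend = False)  # same moving average with the leading NaN rows removed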
Usage details for the pandas rolling function.
LOWESS regression
I've used LOWESS (Locally Weighted Scatterplot Smoothing) successfully to remove noise from repeated measures datasets. More information on local regression methods, including LOWESS and LOESS, here. It's a simple method with only one parameter to tune which in my experience gives good results.
Here is how to apply the LOWESS technique using the statsmodels implementation:
import statsmodels.api as sm
y_lowess = sm.nonparametric.lowess(Y, X, frac = 0.3) # 30 % lowess smoothing
plt.plot(y_lowess[:, 0], y_lowess[:, 1]) # some noise removed
plt.show()
It may be necessary to vary the frac parameter, which is the fraction of the data used when estimating each y value. Increase the frac value to increase the amount of smoothing. The frac value must be between 0 and 1.
Further details on statsmodels lowess usage.
Low pass filter
Scipy provides a set of low pass filters which may be appropriate.
After application of lfilter:
from scipy.signal import lfilter
n = 50 # larger n gives smoother curves
b = [1.0 / n] * n # numerator coefficients
a = 1 # denominator coefficient
y_lf = lfilter(b, a, Y)
plt.plot(X, y_lf)
plt.show()
Check scipy lfilter documentation for implementation details regarding how numerator and denominator coefficients are used in the difference equations.
There are other filters in the scipy.signal package.
Interpolation
Finally, here is an example of radial basis function interpolation:
from scipy.interpolate import Rbf
rbf = Rbf(X, Y, function = 'multiquadric', smooth = 500)
y_rbf = rbf(X)
plt.plot(X, y_rbf)
plt.show()
A smoother approximation can be achieved by increasing the smooth parameter. Alternative function values to consider include 'cubic' and 'thin_plate'. I usually try 'thin_plate' first, followed by 'cubic'; however, both 'thin_plate' and 'cubic' seemed to struggle with the noise in this dataset.
Check other Rbf options in the scipy docs. Scipy provides other univariate and multivariate interpolation techniques (see this tutorial).

FFT y-scale confusion - Python scipy

I am performing FFTs on random binary data. I am confused by what the y-axis scaling factor is. My random data has a repetition rate of 400 Hz, i.e. an interval between measurements of 0.0025 seconds. The number of data points is 12489.
The code below works, and gives a mean amplitude of around 50.
My questions:
What does y.size exactly do in this context?
What is the expected amplitude of an FFT performed on 12489 random binary points? (I understand that this question is specifically for here, but if it's understood I'd appreciate the help).
The working code: (If you wish to copy and paste it into Python to look at)
from numpy import *
import pylab as P
import numpy as N
import scipy as S
import array
import scipy.fftpack
from random import *
#Produce random binary data
x = N.linspace(0,12489,12489)
randBinList = lambda n: [randint(0,1) for b in range(1,n+1)]
y = randBinList(12489)
y = asarray(y)
#Perform an FFT
FFT = abs(S.fft(y))
freqs = S.fftpack.fftfreq(y.size,0.0025)
#What does y.size do???
x_range = freqs[(freqs>0)]
y_range = FFT[(freqs>0)]
P.plot(x_range,y_range,'.r')
P.show()
fftfreq generates the frequency of each bin of the FFT result; it is computed from the number of samples you pass in and the sample spacing (the inverse of the sampling rate) (doc).
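For example, numpy's fftfreq (equivalent to the scipy.fftpack one used above) with the question's sample spacing:
import numpy as np
# 8 samples spaced 0.0025 s apart -> bin spacing of 1 / (8 * 0.0025) = 50 Hz
print(np.fft.fftfreq(8, d=0.0025))
# [   0.   50.  100.  150. -200. -150. -100.  -50.]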
