How to get time/freq from FFT in Python - python

I've got a little problem managing FFT data. I was looking for many examples of how to do FFT, but I couldn't get what I want from any of them. I have a random wave file with 44kHz sample rate and I want to get magnitude of N harmonics each X ms, let's say 100ms should be enough. I tried this code:
import scipy.io.wavfile as wavfile
import numpy as np
import pylab as pl
rate, data = wavfile.read("sound.wav")
t = np.arange(len(data[:,0]))*1.0/rate
p = 20*np.log10(np.abs(np.fft.rfft(data[:2048, 0])))
f = np.linspace(0, rate/2.0, len(p))
pl.plot(f, p)
pl.xlabel("Frequency(Hz)")
pl.ylabel("Power(dB)")
pl.show()
This was last example I used, I found it somewhere on stackoverflow. The problem is, this gets magnitude which I want, gets frequency, but no time at all. FFT analysis is 3D as far as I know and this is "merged" result of all harmonics. I get this:
X-axis = Frequency, Y-axis = Magnitude, Z-axis = Time (invisible)
From my understanding of the code, t is time - and it seems like that, but is not needed in the code - We'll maybe need it though. p is array of powers (or magnitude), but it seems like some average of all magnitudes of each frequency f, which is array of frequencies. I don't want average/merged value, I want magnitude for N harmonics each X milliseconds.
Long story short, we can get: 1 magnitude of all frequencies.
We want: All magnitudes of N freqeuencies including time when certain magnitude is present.
Result should look like this array: [time,frequency,amplitude]
So in the end if we want 3 harmonics, it would look like:
[0,100,2.85489] #100Hz harmonic has 2.85489 amplitude on 0ms
[0,200,1.15695] #200Hz ...
[0,300,3.12215]
[100,100,1.22248] #100Hz harmonic has 1.22248 amplitude on 100ms
[100,200,1.58758]
[100,300,2.57578]
[200,100,5.16574]
[200,200,3.15267]
[200,300,0.89987]
Visualization is not needed, result should be just arrays (or hashes/dictionaries) as listed above.

Further to #Paul R's answer, scipy.signal.spectrogram is a spectrogram function in scipy's signal processing module.
The example at the above link is as follows:
from scipy import signal
import matplotlib.pyplot as plt
# Generate a test signal, a 2 Vrms sine wave whose frequency linearly
# changes with time from 1kHz to 2kHz, corrupted by 0.001 V**2/Hz of
# white noise sampled at 10 kHz.
fs = 10e3
N = 1e5
amp = 2 * np.sqrt(2)
noise_power = 0.001 * fs / 2
time = np.arange(N) / fs
freq = np.linspace(1e3, 2e3, N)
x = amp * np.sin(2*np.pi*freq*time)
x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)
#Compute and plot the spectrogram.
f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()

It looks like you're trying to implement a spectrogram, which is a sequence of power spectrum estimates, typically implemented with a succession of (usually overlapping) FFTs. Since you only have one FFT (spectrum) then you have no time dimension yet. Put your FFT code in a loop, and process one block of samples (e.g. 1024) per iteration, with a 50% overlap between successive blocks. The sequence of generated spectra will then be a 3D array of time v frequency v magnitude.
I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:
N = length of data input
N_FFT = no of samples per block (== FFT size, e.g. 1024)
i = 0 ;; i = index of spectrum within 3D output array
for block_start = 0 to N - block_start
block_end = block_start + N_FFT
get samples from block_start .. block_end
apply window function to block (e.g. Hamming)
apply FFT to windowed block
calculate magnitude spectrum (20 * log10( re*re + im*im ))
store spectrum in output array at index i
block_start += N_FFT / 2 ;; NB: 50% overlap
i++
end

Edit: Oh, so it seems this returns values, but they don't fit to the audio file at all. Even though they can be used as magnitude on spectrogram, they won't work for example in those classic audio visualizers which you can see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is same.
import os
import wave
import pylab
import math
from numpy import amax
from numpy import amin
def get_wav_info(wav_file,mi,mx):
wav = wave.open(wav_file, 'r')
frames = wav.readframes(-1)
sound_info = pylab.fromstring(frames, 'Int16')
frame_rate = wav.getframerate()
wav.close()
spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate)
n = 0
while n < 20:
for index,power in enumerate(spectrum[n]):
print("%s,%s,%s" % (n,int(round(t[index]*1000)),math.ceil(power*100)/100))
n += 1
get_wav_info("wave.wav",1,20)
Any tips how to obtain dB that's usable in visualization?
Basically, we apparently have all we need from the code above, just how to make it return normal values? Ignore mi and mx as these are just adjusting values in array to fit into mi..mx interval - that would be for visualization usage. If I am correct, spectrum in this code returns array of arrays which contains amplitudes for each frequency from freqs array, which are present on time according to t array, but how does the value work - is it really amplitude if it returns these weird values and if it is, how to convert it to dBs for example.
tl;dr I need output for visualizer like music players have, but it shouldn't work realtime, I want just the data, but values don't fit the wav file.
Edit2: I noticed there's one more issue. For 90 seconds wav, t array contains times till 175.x, which seems very weird considering the frame_rate is correct with the wav file. So now we have 2 problems: spectrum doesn't seem to return correct values (maybe it will fit if we get correct time) and t seems to return exactly double time of the wav.
Fixed: Case completely solved.
import os
import pylab
import math
from numpy import amax
from numpy import amin
from scipy.io import wavfile
frame_rate, snd = wavfile.read(wav_file)
sound_info = snd[:,0]
spectrum, freqs, t, im = pylab.specgram(sound_info,NFFT=1024,Fs=frame_rate,noverlap=5,mode='magnitude')
Specgram needed a little adjustment and I loaded only one channel with scipy.io library (instead of wave library). Also without mode set to magnitude, it returns 10log10 instead of 20log10, which is reason why it didn't return correct values.

Related

Fast Fourier Transform for an accelerometer in Python

I have data from the accelerometer in m/s2 (Y-axis) for a time period in seconds (X-axis). I would like to convert this data real-time so that I get the value of an acceleration related to the frequency in Hz. I know that, for example, there is an FFT function in numpy, but I have no idea at all how to use it. I would appreciate, if somebody could provide an example code to convert the raw data (Y: m/s2, X: s) to the desired data (Y: m/s2, X: Hz). It should not be necessarily exactly this function. Thanks!
First, let's create a time-domain signal.
For simplicity, I will create a sine wave with frequency components 12Hz and 24Hz and you can assume the unit of the values are m/s^2:
import numpy as np
import matplotlib.pyplot as plt
# This would be the actual sample rate of your signal
# since you didn't provide that, I just picked one
# big enough to make our graphs look pretty
sample_rate = 22050
# To produce a 1-second wave
length = 1
# The x-axis of your time-domain signal
t = np.linspace(0, length, sample_rate * length)
# A signal with 2 frequency components
# - 12Hz and 24Hz
y = np.sin(12 * (2 * np.pi) * t) + 0.5*np.sin(24 * (2 * np.pi) * t)
# Plot time domain signal
plt.plot(t, y)
plt.xlabel("Time (s)")
plt.show()
This will output:
Now, we continue on with the script by taking the Fourier transform of our original time-domain signal and then creating the magnitude spectrum (since that gives us a better way to visualize how each component is contributing than the phase spectrum):
# This returns the fourier transform coeficients as complex numbers
transformed_y = np.fft.fft(y)
# Take the absolute value of the complex numbers for magnitude spectrum
freqs_magnitude = np.abs(transformed_y)
# Create frequency x-axis that will span up to sample_rate
freq_axis = np.linspace(0, sample_rate, len(freqs_magnitude))
# Plot frequency domain
plt.plot(freq_axis, freqs_magnitude)
plt.xlabel("Frequency (Hz)")
plt.xlim(0, 100)
plt.show()
So now we can visualize the frequency-domain:
Notice how the magnitude of the 24Hz component is about half of the 12Hz component. That is because I purposely timed the 24Hz component by 0.5 on the time-domain signal, so the 24Hz component 'contributes' less to the overall signal, hence we get this halved spike for that component.
Note, also, that the y-axis of our output signal is not really in m/s^2 per Hz as you wanted. But you could compute the actual m/s^2 values by taking the integral over your desired frequency band.
I'll leave the jupyter notebook I created available here, feel free to use it and open issues if you have any problems with it.

Is np.fft.fft working properly? I am getting very large frequency values

I'm trying to do some audio cleaning, which I have never done before using python or otherwise and I came across the idea that I could use FFT to find the frequencies that make up my audio and eliminate the frequencies that don't belong. i did it on normal audio, but I couldn't understand the results, so I tried it on a simple sine wave.
I made it
frequency = 1000
num_samples = 48000
# The sampling rate of the analog to digital convert
sampling_rate = 48000.0
sine_wave = [np.sin(2 * np.pi * frequency * x1 / sampling_rate) for x1 in range(num_samples)]
sine_wave = np.array(sine_wave)
Then I played and plotted it and it looked and sounded like a normal size wave.
fig, ax = plt.subplots(figsize=(20,3))
ax.plot(sine_wave[:500])
IPython.display.Audio(data=sine_wave, rate=44100)
But when I did fft and looked at the frequencies on a graph it didn't make sense
def do_fft(data_samples):
data_fft = np.fft.fft(data_samples)
freq = (np.abs(data_fft[:len(data_fft)]))
plt.subplots(figsize=(20,10))
plt.plot(freq)
print("The frequency is {} Hz".format(np.argmax(freq)))
return freq
sine_freq = do_fft(sine_wave)
sine_freq[47000]
For one, I don't really understand what my frequency array is supposed to mean. I know that a high number at a certain index K means that K Hz appears a lot in the sound. This would make sense since I got a value of like 23,999.99999 at 1000 Hz, which is what my wave frequency is. What doesn't make sense is that I got 24,000 for 47,000 Hz. That doesn't make any sense to me. Did I do something wrong? Is fft not working properly?
The FFT of strictly real (all imaginary components == zero) data is always conjugate mirror symmetric. That's just the way the math of an FFT works. So your 47kHz peak (the same as -1kHz) is just the mirror image of 1kHz at a 48k sample rate. The Nyquist folding frequency or mirroring hinge is at half the sample rate (and/or zero if you consider the upper bin frequencies negative).
I prefer to explicitly define time and calculate the frequencies. The FT should be plotted against those frequencies. The argmax alone computes the place in a vector, not a frequency.
A sampling rate (sa/sec) is not a frequency. The Nyquist theorem states (among others) that your maximum frequency (in Hz) is 1/2 the sampling rate in Sa/sec.
Since we are using the complex FT (pos. and neg. frequencies shown), the -999.979 Hz is actually +1000 Hz.
import numpy as np
import matplotlib.pyplot as p
sampling_rate = 48000
t= np.linspace(0,1,sampling_rate+1) # time vector
dt=t[1]-t[0]
print(f'first/last time {t[0]}, {t[-1]}')
print(f'time interval : {dt}')
f = 1000
sig = np.sin(2 * np.pi * f*t)
fig = p.figure(figsize=(15,10))
p.subplot(211)
p.plot(sig[:500])
p.subplot(212)
ft = np.fft.fftshift(np.fft.fft(sig))
freq=np.fft.fftshift(np.fft.fftfreq(len(t),dt))
p.plot(freq,ft )
print(f'argmax of FT is not a frequency, but a position in a vector : {np.argmax(ft)}')
f0=freq[np.argmax(ft)]
print(f'the frequency is {f0:.3f} Hz')

How to make a PSD plot using `np.fft.fft`?

I want to make a plot of power spectral density versus frequency for a signal using the numpy.fft.fft function. I want to do this so that I can preserve the complex information in the transform and know what I'm doing, as apposed to relying on higher-level functions provided by numpy (like the periodogram function). I'm following Mathwork's nice page about doing PSD analysis using Matlab's fft function: https://www.mathworks.com/help/matlab/ref/fft.html
In this example, I expect the PSD to peak at the frequency I used to construct the signal, which was 100 in this case. I generate the signal using 1000 time points a frequency of 100 inverse time units. I thought that the fft magnitude could be plotted against [0, nt/2] and the peaks would show up where there is the most energy in the frequency. When I did this, things went wrong. I expected my PSD to peak at 100.
How can I make a spectral density plot of frequency vs energy contained in that frequency using np.fft.fft?
Edit
to clarify, in my real problem, I only know that my characteristic frequency is much larger than my sample frequency
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(1000)
sp = np.fft.fft(np.sin(100 * t * np.pi))
trange = np.linspace(0, t[-1] / 2, t.size)
plt.plot(trange, np.abs(sp) / t.size)
plt.show()
This is a sketch I made of the expected output:
What is your sample frequency? This sequence you are generating can represent a infinite number of continuous time signals according to the sample frequency.
The sample frequency needs to be at least twice the maximum signal frequency, as stated by the Sampling Theorem, so, using fs = 250Hz and using a sine of 10 seconds it becomes:
import matplotlib.pyplot as plt
import numpy as np
fs = 250
t = np.arange(0, 10, 1/fs)
sp = np.fft.fft(np.sin(2*np.pi * 100 * t))
trange = np.linspace(0, fs, len(t))
plt.plot(trange, np.abs(sp))
plt.show()
If you run this you will see a peak at 100Hz as expected.

FFT y-scale confusion - Python scipy

I am performing FFTs on random binary data. I am confused by what the y-axis scaling factor is. My random data has a repetition rate of 400Hz, or a interval between measurements of 0.0025 seconds. The number of data points is 12489.
The code below works, and gives a mean amplitude of around 50.
My questions:
What does y.size exactly do in this context?
What is the expected amplitude of an FFT performed on 12489 random binary points? (I understand that this question is specifically for here, but if it's understood I'd appreciate the help).
The working code: (If you wish to copy and paste it into Python to look at)
from numpy import *
import pylab as P
import numpy as N
import scipy as S
import array
import scipy.fftpack
from random import *
#Produce random binary data
x = N.linspace(0,12489,12489)
randBinList = lambda n: [randint(0,1) for b in range(1,n+1)]
y = randBinList(12489)
y = asarray(y)
#Perform an FFT
FFT = abs(S.fft(y))
freqs = S.fftpack.fftfreq(y.size,0.0025)
#What does y.size do???
x_range = freqs[(freqs>0)]
y_range = FFT[(freqs>0)]
P.plot(x_range,y_range,'.r')
P.show()
fftfreq generates the frequencies of each bin of the result of the FFT, which is computed from the number of samples you pass in and the sampling rate (doc).

Clipping FFT Matrix

Audio processing is pretty new for me. And currently using Python Numpy for processing wave files. After calculating FFT matrix I am getting noisy power values for non-existent frequencies. I am interested in visualizing the data and accuracy is not a high priority. Is there a safe way to calculate the clipping value to remove these values, or should I use all FFT matrices for each sample set to come up with an average number ?
regards
Edit:
from numpy import *
import wave
import pymedia.audio.sound as sound
import time, struct
from pylab import ion, plot, draw, show
fp = wave.open("500-200f.wav", "rb")
sample_rate = fp.getframerate()
total_num_samps = fp.getnframes()
fft_length = 2048.
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
tempb = fp.readframes(fft_length);
data = struct.unpack("%dH"%(fft_length), tempb)
temp[i,:] = array(data, short)
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[0])
show()
Here is the plot in non-logarithmic scale, for synthetic wave file containing 500hz(fading out) + 200hz sine wave created using Goldwave.
Simulated waveforms shouldn't show FFTs like your figure, so something is very wrong, and probably not with the FFT, but with the input waveform. The main problem in your plot is not the ripples, but the harmonics around 1000 Hz, and the subharmonic at 500 Hz. A simulated waveform shouldn't show any of this (for example, see my plot below).
First, you probably want to just try plotting out the raw waveform, and this will likely point to an obvious problem. Also, it seems odd to have a wave unpack to unsigned shorts, i.e. "H", and especially after this to not have a large zero-frequency component.
I was able to get a pretty close duplicate to your FFT by applying clipping to the waveform, as was suggested by both the subharmonic and higher harmonics (and Trevor). You could be introducing clipping either in the simulation or the unpacking. Either way, I bypassed this by creating the waveforms in numpy to start with.
Here's what the proper FFT should look like (i.e. basically perfect, except for the broadening of the peaks due to the windowing)
Here's one from a waveform that's been clipped (and is very similar to your FFT, from the subharmonic to the precise pattern of the three higher harmonics around 1000 Hz)
Here's the code I used to generate these
from numpy import *
from pylab import ion, plot, draw, show, xlabel, ylabel, figure
sample_rate = 20000.
times = arange(0, 10., 1./sample_rate)
wfm0 = sin(2*pi*200.*times)
wfm1 = sin(2*pi*500.*times) *(10.-times)/10.
wfm = wfm0+wfm1
# int test
#wfm *= 2**8
#wfm = wfm.astype(int16)
#wfm = wfm.astype(float)
# abs test
#wfm = abs(wfm)
# clip test
#wfm = clip(wfm, -1.2, 1.2)
fft_length = 5*2048.
total_num_samps = len(times)
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
temp[i,:] = wfm[i*fft_length:(i+1)*fft_length]
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[2], linewidth=3)
xlabel("freq (Hz)")
ylabel('abs(FFT)')
show()
FFT's because they are windowed and sampled cause aliasing and sampling in the frequency domain as well. Filtering in the time domain is just multiplication in the frequency domain so you may want to just apply a filter which is just multiplying each frequency by a value for the function for the filter you are using. For example multiply by 1 in the passband and by zero every were else. The unexpected values are probably caused by aliasing where higher frequencies are being folded down to the ones you are seeing. The original signal needs to be band limited to half your sampling rate or you will get aliasing. Of more concern is aliasing that is distorting the area of interest because for this band of frequencies you want to know that the frequency is from the expected one.
The other thing to keep in mind is that when you grab a piece of data from a wave file you are mathmatically multiplying it by a square wave. This causes a sinx/x to be convolved with the frequency response to minimize this you can multiply the original windowed signal with something like a Hanning window.
It's worth mentioning for a 1D FFT that the first element (index [0]) contains the DC (zero-frequency) term, the elements [1:N/2] contain the positive frequencies and the elements [N/2+1:N-1] contain the negative frequencies. Since you didn't provide a code sample or additional information about the output of your FFT, I can't rule out the possibility that the "noisy power values at non-existent frequencies" aren't just the negative frequencies of your spectrum.
EDIT: Here is an example of a radix-2 FFT implemented in pure Python with a simple test routine that finds the FFT of a rectangular pulse, [1.,1.,1.,1.,0.,0.,0.,0.]. You can run the example on codepad and see that the FFT of that sequence is
[0j, Negative frequencies
(1+0.414213562373j), ^
0j, |
(1+2.41421356237j), |
(4+0j), <= DC term
(1-2.41421356237j), |
0j, v
(1-0.414213562373j)] Positive frequencies
Note that the code prints out the Fourier coefficients in order of ascending frequency, i.e. from the highest negative frequency up to DC, and then up to the highest positive frequency.
I don't know enough from your question to actually answer anything specific.
But here are a couple of things to try from my own experience writing FFTs:
Make sure you are following Nyquist rule
If you are viewing the linear output of the FFT... you will have trouble seeing your own signal and think everything is broken. Make sure you are looking at the dB of your FFT magnitude. (i.e. "plot(10*log10(abs(fft(x))))" )
Create a unitTest for your FFT() function by feeding generated data like a pure tone. Then feed the same generated data to Matlab's FFT(). Do a absolute value diff between the two output data series and make sure the max absolute value difference is something like 10^-6 (i.e. the only difference is caused by small floating point errors)
Make sure you are windowing your data
If all of those three things work, then your fft is fine. And your input data is probably the issue.
Check the input data to see if there is clipping http://www.users.globalnet.co.uk/~bunce/clip.gif
Time doamin clipping shows up as mirror images of the signal in the frequency domain at specific regular intervals with less amplitude.

Categories