Audio processing is pretty new for me. And currently using Python Numpy for processing wave files. After calculating FFT matrix I am getting noisy power values for non-existent frequencies. I am interested in visualizing the data and accuracy is not a high priority. Is there a safe way to calculate the clipping value to remove these values, or should I use all FFT matrices for each sample set to come up with an average number ?
regards
Edit:
from numpy import *
import wave
import pymedia.audio.sound as sound
import time, struct
from pylab import ion, plot, draw, show
fp = wave.open("500-200f.wav", "rb")
sample_rate = fp.getframerate()
total_num_samps = fp.getnframes()
fft_length = 2048.
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
tempb = fp.readframes(fft_length);
data = struct.unpack("%dH"%(fft_length), tempb)
temp[i,:] = array(data, short)
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[0])
show()
Here is the plot in non-logarithmic scale, for synthetic wave file containing 500hz(fading out) + 200hz sine wave created using Goldwave.
Simulated waveforms shouldn't show FFTs like your figure, so something is very wrong, and probably not with the FFT, but with the input waveform. The main problem in your plot is not the ripples, but the harmonics around 1000 Hz, and the subharmonic at 500 Hz. A simulated waveform shouldn't show any of this (for example, see my plot below).
First, you probably want to just try plotting out the raw waveform, and this will likely point to an obvious problem. Also, it seems odd to have a wave unpack to unsigned shorts, i.e. "H", and especially after this to not have a large zero-frequency component.
I was able to get a pretty close duplicate to your FFT by applying clipping to the waveform, as was suggested by both the subharmonic and higher harmonics (and Trevor). You could be introducing clipping either in the simulation or the unpacking. Either way, I bypassed this by creating the waveforms in numpy to start with.
Here's what the proper FFT should look like (i.e. basically perfect, except for the broadening of the peaks due to the windowing)
Here's one from a waveform that's been clipped (and is very similar to your FFT, from the subharmonic to the precise pattern of the three higher harmonics around 1000 Hz)
Here's the code I used to generate these
from numpy import *
from pylab import ion, plot, draw, show, xlabel, ylabel, figure
sample_rate = 20000.
times = arange(0, 10., 1./sample_rate)
wfm0 = sin(2*pi*200.*times)
wfm1 = sin(2*pi*500.*times) *(10.-times)/10.
wfm = wfm0+wfm1
# int test
#wfm *= 2**8
#wfm = wfm.astype(int16)
#wfm = wfm.astype(float)
# abs test
#wfm = abs(wfm)
# clip test
#wfm = clip(wfm, -1.2, 1.2)
fft_length = 5*2048.
total_num_samps = len(times)
num_fft = (total_num_samps / fft_length ) - 2
temp = zeros((num_fft,fft_length), float)
for i in range(num_fft):
temp[i,:] = wfm[i*fft_length:(i+1)*fft_length]
pts = fft_length/2+1
data = (abs(fft.rfft(temp, fft_length)) / (pts))[:pts]
x_axis = arange(pts)*sample_rate*.5/pts
spec_range = pts
plot(x_axis, data[2], linewidth=3)
xlabel("freq (Hz)")
ylabel('abs(FFT)')
show()
FFT's because they are windowed and sampled cause aliasing and sampling in the frequency domain as well. Filtering in the time domain is just multiplication in the frequency domain so you may want to just apply a filter which is just multiplying each frequency by a value for the function for the filter you are using. For example multiply by 1 in the passband and by zero every were else. The unexpected values are probably caused by aliasing where higher frequencies are being folded down to the ones you are seeing. The original signal needs to be band limited to half your sampling rate or you will get aliasing. Of more concern is aliasing that is distorting the area of interest because for this band of frequencies you want to know that the frequency is from the expected one.
The other thing to keep in mind is that when you grab a piece of data from a wave file you are mathmatically multiplying it by a square wave. This causes a sinx/x to be convolved with the frequency response to minimize this you can multiply the original windowed signal with something like a Hanning window.
It's worth mentioning for a 1D FFT that the first element (index [0]) contains the DC (zero-frequency) term, the elements [1:N/2] contain the positive frequencies and the elements [N/2+1:N-1] contain the negative frequencies. Since you didn't provide a code sample or additional information about the output of your FFT, I can't rule out the possibility that the "noisy power values at non-existent frequencies" aren't just the negative frequencies of your spectrum.
EDIT: Here is an example of a radix-2 FFT implemented in pure Python with a simple test routine that finds the FFT of a rectangular pulse, [1.,1.,1.,1.,0.,0.,0.,0.]. You can run the example on codepad and see that the FFT of that sequence is
[0j, Negative frequencies
(1+0.414213562373j), ^
0j, |
(1+2.41421356237j), |
(4+0j), <= DC term
(1-2.41421356237j), |
0j, v
(1-0.414213562373j)] Positive frequencies
Note that the code prints out the Fourier coefficients in order of ascending frequency, i.e. from the highest negative frequency up to DC, and then up to the highest positive frequency.
I don't know enough from your question to actually answer anything specific.
But here are a couple of things to try from my own experience writing FFTs:
Make sure you are following Nyquist rule
If you are viewing the linear output of the FFT... you will have trouble seeing your own signal and think everything is broken. Make sure you are looking at the dB of your FFT magnitude. (i.e. "plot(10*log10(abs(fft(x))))" )
Create a unitTest for your FFT() function by feeding generated data like a pure tone. Then feed the same generated data to Matlab's FFT(). Do a absolute value diff between the two output data series and make sure the max absolute value difference is something like 10^-6 (i.e. the only difference is caused by small floating point errors)
Make sure you are windowing your data
If all of those three things work, then your fft is fine. And your input data is probably the issue.
Check the input data to see if there is clipping http://www.users.globalnet.co.uk/~bunce/clip.gif
Time doamin clipping shows up as mirror images of the signal in the frequency domain at specific regular intervals with less amplitude.
Related
I am attempting to calculate the MTF from a test target. I calculate the spread function easily enough, but the FFT results do not quite make sense to me. To summarize,the values seem to alternate giving me a reflection of what I would expect. To test, I used a simple square wave and numpy:
from numpy import fft
data = []
for x in range (0, 20):
data.append(0)
data[9] = 10
data[10] = 10
data[11] = 10
dataFFT = fft.fft(data)
The results look correct, with the exception of the sign... I am seeing the following for the first 4 values as an example:
30.00000000 +0.00000000e+00j
-29.02113033 +7.10542736e-15j
26.18033989 -1.24344979e-14j
-21.75570505 +1.24344979e-14j
So my question is why positive->negative->positive->negative in the real plane? This is not what I would expect... It I plot it, it almost appears that the correct function is mirrored around the x axis.
Note: I was expecting the following as an example:
This is what I am getting:
Your pulse is symmetric and positioned in the center of your FFT window (around N/2). Symmetric real data corresponds to only the cosine or "real" components of an FFT result. Note that the cosine function alternates between being -1 and 1 at the center of the FFT window, depending on the frequency bin index (representing cosine periods per FFT width). So the correlation of these FFT basis functions with a positive going pulse will also alternate as long as the pulse is narrower than half the cosine period.
If you want the largest FFT coefficients to be mostly positive, try centering your narrow rectangular pulse around time 0 (or circularly, time N), where the cosine function is always 1 for any frequency.
It works if you shift the data around 0 instead of half your array, with:
dataFFT = fft.fft(np.fftshift(data))
This isn't all that unexpected. If you want to check against conventional plots, make sure you convert that info to magnitude and phase before coming to any conclusions.
I did a quick check using your code and numpy.abs for mag, numpy,angle for phase. It sure looks like a sinc() function to me, which is what would be expected if the time-domain is a square pulse. If you do this, you'll find a pretty wide sinc, as would be expeceted for a short duration pulse on so few samples.
you forget to specify if your data is Real or Complex
not everyone code in python/numpy (including me) and if you do not know this then you probably handle data to/from FFT the wrong way.
FFT input can be both real or complex domain
FFT output is complex domain
so check the docs for your FFT implementation and specify it and also repair your data handling accordingly. Complex domain usually have first value Re and Second Im but that depends on FFT implementation/configuration.
signal
here is an example of impulse response from FFT
first is input Real domain signal (Im=0) single finite nonzero width pulse and second is the Re part of FFT output. The third is the Im part of FFT output. If you zoom it a bit then you will see amplitude range of y axis of each signal (on left).
Do not forget that different FFT implementations can have different normalization constants which will change the amplitude of signal. If you want magnitude and phase convert it like this:
mag=sqrt(Re*Re+Im*Im); // power
ang=atanxy(Re,Im); // phase angle
atanxy(dx,dy) is 4 quadrant arctan also called atan2 but be careful to get the operand order the same as your atanxy/atan2 implementation needs. Also can use mine C++ atanxy implementation
[Notes]
if your input signal is Real domain then FFT output is symmetric. Both Re and Im signals will be like:
{ a0,a1,a2,a3,...,a(n-1),a(n-1)...,a3,a2,a1,a0 }
exactly like on the image above. On the left are low frequencies and in the middle is the top frequency. If your input signal is Complex domain then the output can be anything.
I'm working on a problem where I would like to extract and compare the time domain amplitudes of two different signals at each frequency. The signals are real world, so have noise, and multiple frequencies, so I'm trying to work in the FFT world.
I wrote a function to take the FFT of a dataset, and return the amplitudes. This seems to work okay for a simulated pure sin wave, but when performed on actual datasets, the amplitudes are always attenuated by some amount.
def amplitudePowerSpectrum(time,data):
dt = np.zeros(time.size-1,)
avgdt = np.mean(time[1:-1] - time[0:-2])
sampFreq = 1.0/(avedt)
nyquistFreq = sampFreq/2.0
FFTData = np.abs(scipy.fftpack.fft(data))
## Only care about positive frequencies
FFTData = FFTData[0:len(FFTData)/2]
## This is how we get the power spectrum in terms of time-domain amplitudes
amplitudeSpectrum = FFTData/len(FFTData)
freqsData = scipy.fftpack.fftfreq(data.size, avgdt)
freq = freqsData[0:len(freqsData)/2]
return (freq,amplitudeSpectrum,(sampFreq,nyquistFreq))
Here is a plot of a raw dataset, followed by one of the computed amplitude spectrum.As you can see, there are two specifically different frequencies, with other noise on top.
I'd expect the amplitudes in figure 2 to match the time domain amplitudes in figure 1. But they are attenuated by a pretty decent factor. The end goal is a scale factor between the input (blue) and output (red) signals at each frequency.
First, is obataining time domain amplitudes accurately possible in the Fourrier domain on real datasets? If so, what am I missing? I'm working with python numpy and scipy packages
I am following this link to do a smoothing of my data set.
The technique is based on the principle of removing the higher order terms of the Fourier Transform of the signal, and so obtaining a smoothed function.
This is part of my code:
N = len(y)
y = y.astype(float) # fix issue, see below
yfft = fft(y, N)
yfft[31:] = 0.0 # set higher harmonics to zero
y_smooth = fft(yfft, N)
ax.errorbar(phase, y, yerr = err, fmt='b.', capsize=0, elinewidth=1.0)
ax.plot(phase, y_smooth/30, color='black') #arbitrary normalization, see below
However some things do not work properly.
Indeed, you can check the resulting plot :
The blue points are my data, while the black line should be the smoothed curve.
First of all I had to convert my array of data y by following this discussion.
Second, I just normalized arbitrarily to compare the curve with data, since I don't know why the original curve had values much higher than the data points.
Most importantly, the curve is like "specular" to the data point, and I don't know why this happens.
It would be great to have some advices especially to the third point, and more generally how to optimize the smoothing with this technique for my particular data set shape.
Your problem is probably due to the shifting that the standard FFT does. You can read about it here.
Your data is real, so you can take advantage of symmetries in the FT and use the special function np.fft.rfft
import numpy as np
x = np.arange(40)
y = np.log(x + 1) * np.exp(-x/8.) * x**2 + np.random.random(40) * 15
rft = np.fft.rfft(y)
rft[5:] = 0 # Note, rft.shape = 21
y_smooth = np.fft.irfft(rft)
plt.plot(x, y, label='Original')
plt.plot(x, y_smooth, label='Smoothed')
plt.legend(loc=0)
plt.show()
If you plot the absolute value of rft, you will see that there is almost no information in frequencies beyond 5, so that is why I choose that threshold (and a bit of playing around, too).
Here the results:
From what I can gather you want to build a low pass filter by doing the following:
Move to the frequency domain. (Fourier transform)
Remove undesired frequencies.
Move back to the time domain. (Inverse fourier transform)
Looking at your code, instead of doing 3) you're just doing another fourier transform. Instead, try doing an actual inverse fourier transform to move back to the time domain:
y_smooth = ifft(yfft, N)
Have a look at scipy signal to see a bunch of already available filters.
(Edit: I'd be curious to see the results, do share!)
I would be very cautious in using this technique. By zeroing out frequency components of the FFT you are effectively constructing a brick wall filter in the frequency domain. This will result in convolution with a sinc in the time domain and likely distort the information you want to process. Look up "Gibbs phenomenon" for more information.
You're probably better off designing a low pass filter or using a simple N-point moving average (which is itself a LPF) to accomplish the smoothing.
I am new to Python.
I intend to do Fourier Transform to an array of discrete points, (time, acceleration), and plot the result out.
I copy and paste the sample FFT code, and modify accordingly.
Please see codes:
import numpy as np
import matplotlib.pyplot as plt
# Load the .txt file in
myData = np.loadtxt('twenty_z_up.txt')
# Extract the time and acceleration columns
time = copy(myData[:,0])
# Extract the acceleration columns
zAcc = copy(myData[:,3])
t = np.arange(10080)
sp = np.fft.fft(zAcc)
freq = np.fft.fftfreq(t.shape[-1])
plt.plot(freq, sp.real)
myData is a rectangular matrix with 10080 rows and 10 columns.
Thus, zAcc is the row3 extracted from the matrix.
In the plot drawn by Spyder, most of the harmonics concentrated around 0.
They are all extremely small.
But my data are actually the accelerations of the phone carried by a walking person (including the gravity). So I expect the most significant harmonic happens around 2Hz.
Why is the graph non-sense?
Thanks in advance!
==============UPDATES: My Graphs======================
The first time domain one:
x-axis is in millisecond.
y-axis is in m/s^2, due to earth gravity, it has a DC offset of ~10.
You do get two spikes at (approximately) 2Hz. Your sampling period is around 2.8 ms (as best as I can infer from your first plot), giving +/-2Hz the normalized frequency of +/-0.056, which is about where your spikes are. fft.fftfreq by default returns the normalized frequency (which scales the sampling period). You can set the d argument to be the sampling period, and you'll get a vector containing the actual frequency.
Your huge spike in the middle is obviously the DC offset (which you can trivially remove by subtracting the mean).
As others said, we need to see the data, post it somewhere. Just to check, try first fixing the timestep size in fftfreq, then plot this synthetic signal, and then plot your signal to see how they compare:
timestep=1./50.#Assume sampling at 50Hz. Change this accordingly.
N=10080#the number of samples
T=N*timestep
t = np.linspace(0,T,N)#needed only to generate xAcc_synthetic
freq=2.#peak a frequency at 2Hz
#generate synthetic signal at 2Hz and add some noise to it
xAcc_synthetic = sin((2*np.pi)*freq*t)+np.random.rand(N)*0.2
sp_synthetic = np.fft.fft(xAcc_synthetic)
freq = np.fft.fftfreq(t.size,d=timestep)
print max(abs(freq))==(1/timestep)/2.#simple check highest freq.
plt.plot(freq, abs(sp_synthetic))
xlabel('Hz')
Now, at the x axis equal to 2 you actually have a physical frequency of 2Hz, and you may spot the more pronounced peak you are looking for. Moreover, you may want to have a look also at yAcc and zAcc.
Operators used to examine the spectrum, knowing the location and width of each peak and judge the piece the spectrum belongs to. In the new way, the image is captured by a camera to a screen. And the width of each band must be computed programatically.
Old system: spectroscope -> human eye
New system: spectroscope -> camera -> program
What is a good method to compute the width of each band, given their approximate X-axis positions; given that this task used to be performed perfectly by eye, and must now be performed by program?
Sorry if I am short of details, but they are scarce.
Program listing that generated the previous graph; I hope it is relevant:
import Image
from scipy import *
from scipy.optimize import leastsq
# Load the picture with PIL, process if needed
pic = asarray(Image.open("spectrum.jpg"))
# Average the pixel values along vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)
# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
#print projection
# Fit function, two gaussians, adjust as needed
def fitfunc(p,x):
return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p,x)-y
# Use scipy to fit, p0 is inital guess
p0 = array([0,20,1,0,75,10])
X = xrange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X,projection))
Y = fitfunc(p1,X)
# Output the result
print "Mean values at: ", p1[1], p1[4]
# Plot the result
from pylab import *
#subplot(211)
#imshow(pic)
#subplot(223)
#plot(projection)
#subplot(224)
#plot(X,Y,'r',lw=5)
#show()
subplot(311)
imshow(pic)
subplot(312)
plot(projection)
subplot(313)
plot(X,Y,'r',lw=5)
show()
Given an approximate starting point, you could use a simple algorithm that finds a local maxima closest to this point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not).
Here's some code that demonstrates simple peak finding from a user-given starting point:
#!/usr/bin/env python
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt
# Sample data with two peaks: small one at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2)
# Say we have an approximate starting point of 0.35
start_point = 0.35
# Nearest index in "ts" to this starting point is...
start_index = np.argmin(np.abs(ts - start_point))
# Find the local maxima in our data by looking for a sign change in
# the first difference
# From http://stackoverflow.com/a/9667121/188535
maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1
# Find which of these peaks is closest to our starting point
index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))]
print "Peak centre at: %.3f" % ts[index_of_peak]
# Quick plot showing the results: blue line is data, green dot is
# starting point, red dot is peak location
plt.plot(ts, xs, '-b')
plt.plot(ts[start_index], xs[start_index], 'og')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.show()
This method will only work if the ascent up the peak is perfectly smooth from your starting point. If this needs to be more resilient to noise, I have not used it, but PyDSTool seems like it might help. This SciPy post details how to use it for detecting 1D peaks in a noisy data set.
So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double-peaks, or for noisy data.
The FWHM is exactly what its name suggests: you find the width of the peak were it's halfway to the maximum. Here's some code that does that (it just continues on from above):
# FWHM...
half_max = xs[index_of_peak]/2
# This finds where in the data we cross over the halfway point to our peak. Note
# that this is global, so we need an extra step to refine these results to find
# the closest crossovers to our peak.
# Same sign-change-in-first-diff technique as above
hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1
# Add "index_of_peak" to result because we cut off the left side of the data!
hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak
# Find closest half-max index to peak
hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))]
hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))]
# And the width is...
fwhm = ts[hm_right_index] - ts[hm_left_index]
print "Width: %.3f" % fwhm
# Plot to illustrate FWHM: blue line is data, red circle is peak, red line
# shows FWHM
plt.plot(ts, xs, '-b')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.plot(
[ts[hm_left_index], ts[hm_right_index]],
[xs[hm_left_index], xs[hm_right_index]], '-r')
plt.show()
It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process.
A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak — say, from a local minima on one side to a local minima on the other — and use one of the parameters of that curve (eg. sigma) to calculate the width.
I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate.
Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.
Late to the party, but for anyone coming across this question in the future...
Eye movement data looks very similar to this; I'd base an approach off that used by Nystrom + Holmqvist, 2010. Smooth the data using a Savitsky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact - the authors recommend using an order of 2 and a window size of about twice the width of the smallest peak you want to be able to detect. You can find where the bands are by arbitrarily removing all values above a certain y value (set them to numpy.nan). Then take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points - but depending on your data, certain values of [parameter] may not stabilise. Then use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively). To get even more accurate start and end points, you can scan along the data backward from each start and forward from each end to find the nearest local minimum which has value smaller than mean + [another parameter]*std (I think they use 3 in the paper). Then you just need to count the data points between each start and end.
This won't work for that double peak; you'd have to do some extrapolation for that.
The best method might be to statistically compare a bunch of methods with human results.
You would take a large variety data and a large variety of measurement estimates (widths at various thresholds, area above various thresholds, different threshold selection methods, 2nd moments, polynomial curve fits of various degrees, pattern matching, and etc.) and compare these estimates to human measurements of the same data set. Pick the estimate method that correlates best with expert human results. Or maybe pick several methods, the best one for each of various heights, for various separations from other peaks, and etc.