How to implement/perform DFT on a segment in python?

I am trying to write a simple program in python that will calculate and display DFT output of 1 segment.
My signal is 3 seconds long, I want to calculate DFT for every 10ms long segment. Sampling rate is 44100. So one segment is 441 samples long.
Since I am in the phase of testing this and original program is much larger(speech recognition) here is an isolated part for testing purposes that unfortunately behaves odd. Either that or my lack of knowledge on the subject.
1) I read somewhere that DFT input should be rounded to power of 2 so I arranged my array to 512 instead 441. Is this true?
2) If I am sampling at a rate of 44100, at most I can reach frequency of 22050Hz and for sample of length 512(~441) at least 100Hz ?
3) If 2) is true, then I can have all frequencies between 100hz and 22050hz in that 10ms segments, but the length of segment is 512(441) samples only, output of fft returns array of 256(220) values, they cannot contain all 21950 frequencies in there, can they?
4) My first guess is that the values in output of fft should be multiplied by 100, since 10ms is 100th of a second. Is this good reasoning?
The following program for two given frequencies 1000 and 2000 returns two spikes on graph at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since ratio of numbers is okay (2000:1000 = 48:24) I wonder if I should ignore some starting part of the fft output?
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import fft
t = np.arange(0, 1, 1/512.0) # We create 512 long array
# We calculate here two sinusoids together at 1000hz and 2000hz
y = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*2000*t)
n = len(y)
k = np.arange(n)
# Problematic part is around here, I am not quite sure what
# should be on the horizontal line
T = n/44100.0
frq = k/T
frq = frq[range(n//2)]
Y = fft(y)
Y = Y[range(n//2)]
# Convert from complex numbers to magnitudes
iY = []
for f in Y:
    iY.append(np.sqrt(f.imag * f.imag + f.real * f.real))
plt.plot(frq, iY, 'r')
plt.xlabel('freq (HZ)')
plt.show()

I read somewhere that the DFT input should be rounded to power of 2 so I arranged my array to 512 instead 441. Is this true?
The DFT is defined for all sizes. However, implementations of the DFT such as the FFT are generally much more efficient for sizes which can be factored into small primes. Some library implementations have limitations and do not support sizes other than powers of 2, but that isn't the case with numpy.
If I am sampling at a rate of 44100, at most I can reach frequency of 22050Hz and for sample of length 512(~441) at least 100Hz?
The highest frequency for even sized DFT will be 44100/2 = 22050Hz as you've correctly pointed out. Note that for odd sized DFT the highest frequency bin will correspond to a frequency slightly less than the Nyquist frequency. As for the minimum frequency, it will always be 0Hz. The next non-zero frequency will be 44100.0/N where N is the DFT length in samples (which gives 100Hz if you are using a DFT length of 441 samples and ~86Hz with a DFT length of 512 samples).
If 2) is true, then I can have all frequencies between 100Hz and 22050Hz in that 10ms segments, but the length of segment is 512(441) samples only, output of fft returns array of 256(220) values, they cannot contain all 21950 frequencies in there, can they?
First there aren't 21950 frequencies between 100Hz and 22050Hz since frequencies are continuous and not limited to integer frequencies. That said, you are correct in your realization that the output of the DFT will be limited to a much smaller set of frequencies. More specifically the DFT represents the frequency spectrum at discrete frequency step: 0, 44100/N, 2*44100/N, ...
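As a minimal sketch of that discrete frequency grid, the snippet below lists the bin frequencies for both segment lengths discussed in the question. It assumes numpy's `np.fft.rfftfreq`, which returns only the non-negative half of the grid for real input; the variable names are illustrative.

```python
import numpy as np

fs = 44100.0  # sampling rate in Hz, as in the question

# Bin center frequencies of the one-sided spectrum for both candidate lengths
freqs_441 = np.fft.rfftfreq(441, d=1 / fs)  # 441 samples = one 10 ms segment
freqs_512 = np.fft.rfftfreq(512, d=1 / fs)  # 512 samples = power-of-two length

# Number of bins, spacing between bins, and highest bin frequency
print(len(freqs_441), freqs_441[1], freqs_441[-1])
print(len(freqs_512), freqs_512[1], freqs_512[-1])
```

With 441 samples the bins fall on multiples of 100Hz and the odd length means the last bin sits just below the Nyquist frequency; with 512 samples the spacing is about 86Hz and the last bin is exactly 22050Hz.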
My first guess is that the values in output of FFT should be multiplied by 100, since 10ms is 100th of a second. Is this good reasoning?
There is no need to multiply the FFT output by 100. But if you meant multiples of 100Hz with a DFT of length 441 and a sampling rate of 44100Hz, then your guess would be correct.
The following program for two given frequencies 1000 and 2000 returns two spikes on graph at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since ratio of numbers is okay (2000:1000 = 48:24) I wonder if I should ignore some starting part of the fft output?
Here the problem is more significant. As you declare the array
t = np.arange(0, 1, 1/512.0) # We create 512 long array
you are in fact representing a signal with a sampling rate of 512Hz instead of 44100Hz. As a result the tones you are generating are severely aliased (to 24Hz and 48Hz respectively). This is further compounded by the fact that you then use a sampling rate of 44100Hz for the frequency axis conversion. This is why the peaks are not appearing at the expected 1000Hz and 2000Hz frequencies.
To represent 512 samples of a signal sampled at a rate of 44100Hz, you should instead use
t = np.arange(512) / 44100.0
at which point the formula you used for the frequency axis would be correct (since it is based on the same 44100Hz sampling rate). You should then be able to see peaks near the expected 1000Hz and 2000Hz (the closest frequency bins of the peaks being at ~1033Hz and ~1981Hz).
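Putting the correction together, here is a hedged sketch of the test program with the time axis built from the 44100Hz sampling rate. Plotting is left out so the snippet runs anywhere; the 1500Hz split used to separate the two peaks is an arbitrary choice made only for this illustration.

```python
import numpy as np

fs = 44100.0                 # sampling rate in Hz
n = 512                      # samples in the segment
t = np.arange(n) / fs        # sample times spaced 1/fs apart

# Two sinusoids at 1000 Hz and 2000 Hz, as in the question
y = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 2000 * t)

frq = np.fft.rfftfreq(n, d=1 / fs)   # frequency axis in Hz for bins 0..n/2
mag = np.abs(np.fft.rfft(y))         # magnitude of each bin

# Strongest bin below and above 1500 Hz (arbitrary split between the two tones)
low = frq < 1500
peak_low = frq[low][np.argmax(mag[low])]
peak_high = frq[~low][np.argmax(mag[~low])]
print(peak_low, peak_high)
```

Passing `frq` and `mag` to `plt.plot` would give the same kind of graph as the original program, with the peaks now landing on the bins closest to 1000Hz and 2000Hz.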

1) I read somewhere that DFT input should be rounded to power of 2 so
I aranged my array to 512 instead 441. Is this true?
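Not strictly. The DFT works for any length, but many FFT implementations are fastest when the length is a power of two. If you want a 512-point transform, just pad the input with zeros to 512 samples.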
2) If I am sampling at a rate of 44100, at most I can reach frequency
of 22050hz and for sample of length 512(~441) at least 100hz ?
Yes, the highest frequency you can get is half the sampling rate. It's called the Nyquist frequency.
No, the lowest frequency bin you get (the first bin of the DFT) is called the DC component and marks the average of the signal. The next lowest frequency bin in your case is 22050 / 256 = 86Hz, and then 172Hz, 258Hz, and so on until 22050Hz.
You can get these frequencies with the numpy.fft.fftfreq() function.
3) If 2) is true, then I can have all frequencies between 100hz and
22050hz in that 10ms segments, but the length of segment is 512(441)
samples only, output of fft returns array of 256(220) values, they
cannot contain all 21950 frequencies in there, can they?
The DFT doesn't lose any of the original signal's data, but its frequency grid is coarse when the DFT size is small. You may zero-pad the input to a larger DFT size, such as 1024 or 2048, to sample the spectrum on a finer grid.
The DFT bin refers to a frequency range centered at each of the N output
points. The width of the bin is sample rate/N,
and it extends from: center frequency -(sample rate/N)/2 to center
frequency +(sample rate/N)/2. In other words, half of the bin extends
below each of the N output points, and half above it.
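To illustrate the effect of zero-padding, the sketch below compares the peak location of a single 1000Hz test tone (an assumed example signal) with and without padding. It relies on the `n` argument of `np.fft.rfft`, which pads the input with zeros when `n` exceeds the input length. Note that padding only interpolates the spectrum on a finer grid; it does not help separate two tones that are closer together than the original segment length can resolve.

```python
import numpy as np

fs = 44100.0
n = 512
t = np.arange(n) / fs
y = np.sin(2 * np.pi * 1000 * t)   # single 1000 Hz test tone (assumed example)

def peak_frequency(signal, nfft):
    """Return the frequency in Hz of the largest-magnitude bin of an nfft-point FFT."""
    mag = np.abs(np.fft.rfft(signal, n=nfft))   # nfft > len(signal) zero-pads
    return np.fft.rfftfreq(nfft, d=1 / fs)[np.argmax(mag)]

plain = peak_frequency(y, 512)     # bin spacing fs/512, about 86 Hz
padded = peak_frequency(y, 8192)   # bin spacing fs/8192, about 5.4 Hz
print(plain, padded)
```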
4) My first guess is that the values in output of fft should be
multiplied by 100, since 10ms is 100th of a second. Is this good
reasoning?
No, the value should not be multiplied if you want to preserve the magnitude.
The following program for two given frequencies 1000 and 2000 returns
two spikes on graph at positions 24 and 48 in the output array and
~2071 and ~4156 on the graph. Since ratio of numbers is okay
(2000:1000 = 48:24) I wonder if I should ignore some starting part of
the fft output?
The DFT result is mirrored for real input. In other words, your frequencies will be like this:
n    0    1    2    3    4    ...  255    256    257    ...  511
Hz   DC   86   172  258  344  ...  21964  22050  21964  ...  86
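The mirroring can be checked numerically. This is a small sketch using random real-valued data (any real signal would do): bin n-k of the full FFT is the complex conjugate of bin k, and `np.fft.fftfreq` reports the upper half of the bins as negative frequencies.

```python
import numpy as np

fs = 44100.0
n = 512
rng = np.random.default_rng(0)
y = rng.standard_normal(n)           # any real-valued signal will do

Y = np.fft.fft(y)
freqs = np.fft.fftfreq(n, d=1 / fs)  # bins above n/2 are reported as negative frequencies

# For real input, bin n-k is the complex conjugate of bin k
mirrored = np.allclose(Y[1:], np.conj(Y[1:][::-1]))
print(mirrored)
print(freqs[1], freqs[256], freqs[511])
```

Because of this symmetry, `np.fft.rfft` can be used for real input to compute only bins 0 through n/2.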

Related

Amplitude of rfft for real-valued array

I'm calculating the RFFT of a signal of length 3000 (sampled at 100 Hz) with only real valued entries:
from scipy.fft import rfft
coeffs = rfft(values)
coeffs = np.abs(coeffs)
With rfft I'm only getting half of the coefficients, i.e. the symmetric ones are discarded (due to real valued input).
Is it correct to scale the values by coeffs = (2 / len(values)) * coeffs to get the amplitudes?
Edit: Below I have appended a plot of the amplitudes vs. Frequency (bins) for accelerometer and gyroscope (shaded area is standard deviation). For accelerometer the energy in the first FFT bin is much higher than in the other bins (> 2 in the first bin and around < 0.4 in the other bins). For gyroscope it is different and the energy is much more distributed.
Does that mean that for accelerometer the FFT looks good but for gyroscope it is worse? Further, is it reasonable to cut the FFT at 100 Hz (i.e. take only bins < 100 Hz) or take the first few bins until 95% of the energy is kept?

The approximate relationship I provided in this post holds whether you throw out half the coefficients or not.
So, if the conditions indicated in that post apply to your situation, then you could get an estimate of the amplitude of a dominant sinusoidal component with
approx_sinusoidal_amplitude = (2 / len(values)) * np.abs(coeffs[k])
for some index k corresponding to the frequency of the sinusoidal component (which according to the limitations indicated in my other post has to be at or near a multiple of 100/3000 ~ 0.033Hz in your case). For a dominant sinusoidal component, this index would typically correspond to a local peak in the frequency spectrum. Note however that if your signal is a mixture of various frequency components, the individual components may affect the frequency spectrum in such a way that the peak does not appear clearly.
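A quick sanity check of that scaling, as a sketch under favourable assumptions: a single sinusoid of known amplitude (1.5, chosen for illustration) whose frequency falls exactly on a bin. It uses `np.fft.rfft` rather than `scipy.fft.rfft`; the two return the same coefficients for this purpose.

```python
import numpy as np

fs = 100.0                    # sampling rate in Hz
n = 3000                      # number of samples
t = np.arange(n) / fs
amplitude = 1.5               # assumed test amplitude
f0 = 5.0                      # 5 Hz is an exact multiple of fs/n = 1/30 Hz
values = amplitude * np.sin(2 * np.pi * f0 * t)

coeffs = np.fft.rfft(values)
k = int(round(f0 * n / fs))   # index of the bin located at f0
approx_sinusoidal_amplitude = (2 / len(values)) * np.abs(coeffs[k])
print(k, approx_sinusoidal_amplitude)
```

The factor of 2 accounts for the discarded mirrored half, so it does not apply to the DC bin (or to the Nyquist bin for even lengths), which have no mirrored counterpart.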

Usage of signal.welch

I want to use python signal.welch. The usage of signal.welch is as follows,
f, Pxx_den = signal.welch(x, fs, nperseg=1024)
In my case, x is for example gyroscopy signal ( 1 x 1024 samples (for about 10 sec data)), fs = 100 Hz. In my case, how can I decide nperseg? I want to know how I can select nperseg when the number of samples of the input is 1024 (about 10 sec).

scipy.signal.welch estimates the power spectral density by dividing the data into segments and averaging periodograms computed on each segment. The nperseg arg is the segment length and (by default) also determines the FFT size.
On the one hand, making nperseg smaller allows the input to divide into more segments, good for more averaging to get a more reliable estimate. On the other hand, making nperseg larger improves the frequency resolution of the result. In any case, nperseg should be smaller than the input size in order to get multiple segments.
The default segment length is 256 samples, which seems like a reasonable starting point for a 1024-sample input.
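To see the trade-off in practice, the sketch below runs `scipy.signal.welch` on a synthetic 1024-sample input with a few segment lengths. The test signal (a 5 Hz tone in white noise) is an assumption made only for illustration.

```python
import numpy as np
from scipy import signal

fs = 100.0
rng = np.random.default_rng(0)
t = np.arange(1024) / fs
# Assumed test input: a 5 Hz tone buried in white noise
x = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(t.size)

for nperseg in (128, 256, 512):
    f, Pxx_den = signal.welch(x, fs, nperseg=nperseg)
    # number of frequency points, frequency step, and location of the peak
    print(nperseg, len(f), f[1], f[np.argmax(Pxx_den)])
```

Larger nperseg gives a smaller frequency step (fs/nperseg) but fewer segments to average, so the estimate gets noisier.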

I have an array of samples at equal intervals of 25 ms. I want to determine the frequency spectrum for the fundamental

From say N=1000 voltage samples at 1 ms sample rate. I need to find precisely with python/numpy the amplitude and angle of the fundamental, which is between 45 and 55 Hz as well as any side bands, that may exist.
Do I need a phase lock loop to do that, or can it be done without ?

Your measurement frequency is fundamentally too low and should be more than double your expected event frequency!
measured: 0.025s
event range: 0.0182-0.0222s
more here: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

A phase lock loop is a reasonable approach to estimate the frequency etc. of the fundamental. Supposing you have collected the N samples up front, another way to do the analysis is:
Apply a window, like np.hanning(N), by multiplying it pointwise with the samples.
Compute the spectrum with np.fft.rfft. For a sampling interval of Ts seconds, the nth element of the resulting array is the DFT coefficient for frequency n / (N * Ts) in units of Hz (or for the values N=1000, Ts=0.001, simply n Hz).
Find the bin with the peak magnitude between 45 and 55. The location of the peak gives the frequency of the fundamental. You can interpolate a polynomial (np.polyfit) across a few neighboring bins and find its peak to get a more precise estimate. Magnitude and complex phase of the peak give the amplitude and phase (angle) of the fundamental.
Plot the magnitude of the spectrum to look for sidebands.
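The steps above can be sketched as follows. The test signal (a 50.3 Hz sinusoid of amplitude 2.0) is an assumption for illustration, and the snippet covers only the frequency and amplitude estimates. The parabola fit is a rough interpolation, so expect errors of a few hundredths of a Hz in frequency and a few percent in amplitude.

```python
import numpy as np

N = 1000
Ts = 0.001                         # sampling interval in seconds
t = np.arange(N) * Ts
# Assumed test signal: 50.3 Hz fundamental with amplitude 2.0
x = 2.0 * np.sin(2 * np.pi * 50.3 * t + 0.4)

w = np.hanning(N)                  # window applied pointwise to the samples
X = np.fft.rfft(x * w)
freqs = np.fft.rfftfreq(N, d=Ts)   # bin n is at n / (N * Ts) Hz
mag = np.abs(X)

# Largest bin between 45 and 55 Hz
band = np.flatnonzero((freqs >= 45) & (freqs <= 55))
k = band[np.argmax(mag[band])]

# Fit a parabola through the peak bin and its two neighbours
a, b, c = np.polyfit(freqs[k - 1:k + 2], mag[k - 1:k + 2], 2)
f_est = -b / (2 * a)                        # vertex of the parabola
peak_mag = np.polyval([a, b, c], f_est)
amp_est = 2 * peak_mag / w.sum()            # undo the window gain
print(f_est, amp_est)
```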

power spectral density-scipy.signal

While trying to compute the Power spectral density with an acquisition rate of 300000hz using ... signal.periodogram(x, fs,nfft=4096) , I get the graph upto 150000Hz and not upto 300000. Why is this upto half the value ? What is the meaning of sampling rate here?
In the example given in scipy documentation , the sampling rate is 10000Hz but we see in the plot only upto 5000Hz.
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.periodogram.html

The spectrum of real-valued signal is always symmetric with respect to the Nyquist frequency (half of the sampling rate). As a result, there is often no need to store or plot the redundant symmetric portion of the spectrum.
If you still want to see the whole spectrum, you can set the return_onesided argument to False as follows:
f, Pxx_den = signal.periodogram(x, fs, return_onesided=False)
The resulting plot of the same example provided in the scipy.signal.periodogram documentation would then cover a 10000Hz frequency range, as would be expected.

If you check the length of f in the example:
>>> len(f)
50001
Note that the length is 50001, not 50000. This is because scipy.signal.periodogram calls scipy.signal.welch with the parameter nperseg=x.shape[-1] by default. This is the correct input for scipy.signal.welch. However, if you dig into the source and see lines 328-329 (as of now), you'll see the reason why the size of the output is 50001.
if nfft % 2 == 0: # even
    outshape[-1] = nfft // 2 + 1
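The same count can be reproduced with numpy alone. This sketch assumes nfft = 100000 and fs = 10000 Hz, which is consistent with the reported length of 50001: for an even nfft, the one-sided grid has nfft // 2 + 1 points because it includes both the 0 Hz bin and the Nyquist bin.

```python
import numpy as np

fs = 1e4
nfft = 100000                        # assumed length, consistent with len(f) == 50001
f = np.fft.rfftfreq(nfft, d=1 / fs)  # one-sided frequency grid
print(len(f), f[0], f[-1])
```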

Analyze audio using Fast Fourier Transform

I am trying to create a graphical spectrum analyzer in python.
I am currently reading 1024 bytes of a 16 bit dual channel 44,100 Hz sample rate audio stream and averaging the amplitude of the 2 channels together. So now I have an array of 256 signed shorts. I now want to perform an FFT on that array, using a module like numpy, and use the result to create the graphical spectrum analyzer, which, to start will just be 32 bars.
I have read the wikipedia articles on Fast Fourier Transform and Discrete Fourier Transform but I am still unclear of what the resulting array represents. This is what the array looks like after I perform an FFT on my array using numpy:
[ -3.37260500e+05 +0.00000000e+00j 7.11787022e+05 +1.70667403e+04j
4.10040193e+05 +3.28653370e+05j 9.90933073e+04 +1.60555003e+05j
2.28787050e+05 +3.24141951e+05j 2.09781047e+04 +2.31063376e+05j
-2.15941453e+05 +1.63773851e+05j -7.07833051e+04 +1.52467334e+05j
-1.37440802e+05 +6.28107674e+04j -7.07536614e+03 +5.55634993e+03j
-4.31009964e+04 -1.74891657e+05j 1.39384348e+05 +1.95956947e+04j
1.73613033e+05 +1.16883207e+05j 1.15610357e+05 -2.62619884e+04j
-2.05469722e+05 +1.71343186e+05j -1.56779748e+04 +1.51258101e+05j
-2.08639913e+05 +6.07372799e+04j -2.90623668e+05 -2.79550838e+05j
-1.68112214e+05 +4.47877871e+04j -1.21289916e+03 +1.18397979e+05j
-1.55779104e+05 +5.06852464e+04j 1.95309737e+05 +1.93876325e+04j
-2.80400414e+05 +6.90079265e+04j 1.25892113e+04 -1.39293422e+05j
3.10709174e+04 -1.35248953e+05j 1.31003438e+05 +1.90799303e+05j...
I am wondering what exactly these numbers represent and how I would convert these numbers into a percentage of a height for each of the 32 bars. Also, should I be averaging the 2 channels together?

The array you are showing is the Fourier Transform coefficients of the audio signal. These coefficients can be used to get the frequency content of the audio. The FFT is defined for complex valued input functions, so the coefficients you get out will be complex numbers even though your input is all real values. In order to get the amount of power in each frequency, you need to calculate the magnitude of the FFT coefficient for each frequency. This is not just the real component of the coefficient, you need to calculate the square root of the sum of the square of its real and imaginary components. That is, if your coefficient is a + b*j, then its magnitude is sqrt(a^2 + b^2).
Once you have calculated the magnitude of each FFT coefficient, you need to figure out which audio frequency each FFT coefficient belongs to. An N point FFT will give you the frequency content of your signal at N equally spaced frequencies, starting at 0. Because your sampling frequency is 44100 samples / sec. and the number of points in your FFT is 256, your frequency spacing is 44100 / 256 = 172 Hz (approximately)
The first coefficient in your array will be the 0 frequency coefficient. That is the DC component: the sum of all the samples, which is proportional to the average level of the signal. The rest of your coefficients will count up from 0 in multiples of 172 Hz until you get to coefficient 128. In an FFT, you can only measure frequencies up to half your sampling rate. Read these links on the Nyquist Frequency and Nyquist-Shannon Sampling Theorem if you are a glutton for punishment and need to know why, but the basic result is that your lower frequencies are going to be replicated or aliased in the higher frequency buckets. So the frequencies will start from 0, increase by 172 Hz for each coefficient up to the N/2 coefficient, then decrease by 172 Hz until the N - 1 coefficient.
That should be enough information to get you started. If you would like a much more approachable introduction to FFTs than is given on Wikipedia, you could try Understanding Digital Signal Processing: 2nd Ed.. It was very helpful for me.
So that is what those numbers represent. Converting to a percentage of height could be done by scaling each frequency component magnitude by the sum of all component magnitudes. Although, that would only give you a representation of the relative frequency distribution, and not the actual power for each frequency. You could try scaling by the maximum magnitude possible for a frequency component, but I'm not sure that that would display very well. The quickest way to find a workable scaling factor would be to experiment on loud and soft audio signals to find the right setting.
Finally, you should be averaging the two channels together if you want to show the frequency content of the entire audio signal as a whole. You are mixing the stereo audio into mono audio and showing the combined frequencies. If you want two separate displays for right and left frequencies, then you will need to perform the Fourier Transform on each channel separately.
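As a rough sketch of the whole chain described above, the snippet below builds a stand-in for the 1024 bytes read from the stream (the two test tones are hypothetical), mixes the channels to mono, and computes one magnitude per frequency. It assumes the bytes are interleaved left/right signed 16-bit samples in the machine's native byte order.

```python
import numpy as np

fs = 44100
# Hypothetical stand-in for 1024 bytes read from the stream:
# 256 interleaved stereo frames of signed 16-bit samples
t = np.arange(256) / fs
left = (10000 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)
right = (10000 * np.sin(2 * np.pi * 3000 * t)).astype(np.int16)
raw = np.column_stack((left, right)).tobytes()

samples = np.frombuffer(raw, dtype=np.int16).reshape(-1, 2)
mono = samples.mean(axis=1)                   # average the two channels
mag = np.abs(np.fft.rfft(mono))               # sqrt(real**2 + imag**2) per bin
freqs = np.fft.rfftfreq(len(mono), d=1 / fs)  # frequency of each bin in Hz
print(len(raw), len(mono), len(mag), freqs[1])
```

Using `np.fft.rfft` keeps only the bins from 0 Hz up to the Nyquist frequency, so the mirrored upper half never has to be discarded by hand.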

Although this thread is years old, I found it very helpful. I just wanted to give my input to anyone who finds this and is trying to create something similar.
As for the division into bars, this should not be done as antti suggests, by dividing the data equally based on the number of bars. The most useful would be to divide the data into octave parts, each octave being double the frequency of the previous. (ie. 100hz is one octave above 50hz, which is one octave above 25hz).
Depending on how many bars you want, you divide the whole range into 1/X octave ranges.
Based on a given center frequency of A on the bar, you get the upper and lower limits of the bar from:
upper limit = A * 2 ^ ( 1 / 2X )
lower limit = A / 2 ^ ( 1 / 2X )
To calculate the next adjoining center frequency you use a similar calculation:
next lower = A / 2 ^ ( 1 / X )
next higher = A * 2 ^ ( 1 / X )
You then average the data that fits into these ranges to get the amplitude for each bar.
For example:
We want to divide into 1/3 octaves ranges and we start with a center frequency of 1khz.
Upper limit = 1000 * 2 ^ ( 1 / ( 2 * 3 ) ) = 1122.5
Lower limit = 1000 / 2 ^ ( 1 / ( 2 * 3 ) ) = 890.9
Given 44100hz and 1024 samples (43hz between each data point) we should average out values 21 through 26. ( 890.9 / 43 = 20.72 ~ 21 and 1122.5 / 43 = 26.10 ~ 26 )
(1/3 octave bars would get you around 30 bars between ~40hz and ~20khz).
As you can figure out by now, as we go higher we will average a larger range of numbers. Low bars typically only include 1 or a small number of data points. While the higher bars can be the average of hundreds of points. The reason being that 86hz is an octave above 43hz... while 10086hz sounds almost the same as 10043hz.
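The band-limit formulas and the worked 1/3-octave example can be expressed as a small helper. This is only a sketch: the function name `band_bins` and its defaults are made up for illustration, and rounding to the nearest bin follows the worked example above.

```python
import numpy as np

def band_bins(center_hz, x, fs=44100, nfft=1024):
    """Return (lower, upper, first_bin, last_bin) for a 1/x-octave band."""
    lower = center_hz / 2 ** (1 / (2 * x))
    upper = center_hz * 2 ** (1 / (2 * x))
    spacing = fs / nfft                       # Hz between FFT data points
    first_bin = int(round(lower / spacing))
    last_bin = int(round(upper / spacing))
    return lower, upper, first_bin, last_bin

lower, upper, first_bin, last_bin = band_bins(1000.0, 3)
print(round(lower, 1), round(upper, 1), first_bin, last_bin)

# Bar amplitude = average of the magnitudes that fall inside the band
mag = np.ones(513)                            # placeholder magnitudes for a 1024-point FFT
bar = mag[first_bin:last_bin + 1].mean()
```

Calling the helper repeatedly with the center multiplied by 2 ** (1 / x) walks up through the adjoining bands.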

What you have is a sample whose length in time is 256/44100 = 0.00580499 seconds. This means that your frequency resolution is 1 / 0.00580499 = 172 Hz. The 256 values you get out from Python correspond to the frequencies 0 Hz, 172 Hz, 344 Hz, and so on; because the input is real, only the first 129 values (0 Hz to 22050 Hz) are unique and the remaining ones mirror them. The numbers you get out are complex numbers (hence the "j" at the end of every second number).
EDITED: FIXED WRONG INFORMATION
You need to convert the complex numbers into amplitude by calculating sqrt(i^2 + j^2), where i and j are the real and imaginary parts, resp.
If you want to have 32 bars, you should as far as I understand take the average of four successive amplitudes over the first 128 (non-mirrored) values, getting 128 / 4 = 32 bars as you want.
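A minimal sketch of that grouping: a 256-point FFT of real input has 128 non-mirrored bins below the Nyquist frequency, so averaging them in groups of four gives 32 bars. The random frame and the scaling to a percentage of the tallest bar are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.integers(-2000, 2000, size=256)   # hypothetical 256-sample mono frame

mag = np.abs(np.fft.rfft(frame))[:128]        # 128 unique bins, Nyquist bin dropped
bars = mag.reshape(32, 4).mean(axis=1)        # average each group of four bins
heights = 100 * bars / bars.max()             # percentage of the tallest bar
print(bars.shape, heights.max())
```

Scaling against the tallest bar of each frame makes quiet and loud passages look alike; a fixed reference level found by experiment may display better.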

The FFT returns N complex values, for each of which you can compute the modulus = sqrt(real_part^2 + imaginary_part^2). To get the value for each band you have to sum the moduli of all harmonics inside the band. Below you can see an example for a 10-bar spectrum analyzer. The C code has to be wrapped to get a pyd Python module.
float *samples_vett;
float *out_filters_vett;
int Nsamples;
float band_power = 0.0;
float harmonic_amplitude = 0.0;
int i, out_index;
out_index = 0;
for (i = 0; i < Nsamples / 2 + 1; i++)
{
    /* A band edge was reached: store the accumulated band and start the next one */
    if (i == 1 || i == 2 || i == 4 || i == 8 || i == 17 || i == 33 || i == 66 || i == 132 || i == 264 || i == 511)
    {
        out_filters_vett[out_index] = band_power;
        band_power = 0;
        out_index++;
    }
    /* ttfr_out_vett holds the complex FFT output; it is not declared in this excerpt */
    harmonic_amplitude = sqrt(pow(ttfr_out_vett[i].r, 2) + pow(ttfr_out_vett[i].i, 2));
    band_power += harmonic_amplitude;
}
I designed and made a whole 10 LED bar spectrum analyzer in Python. Instead of using the numpy library (too big and useless to get just the FFT), a Python pyd module (just 27KB) was created to get the FFT and to split the entire audio spectrum into bands.
In addition, to read the output audio a loopback WASapi portaudio pyd module was created. You can see the project (block diagram) in the image
10BarsSpectrumAnalyzerWithWASapi.jpg
Just added a tutorial video on my YouTube channel: how to design and make a very smart Python Spectrum Analyzer 10 Led Bar
