I am trying to create a graphical spectrum analyzer in Python.
I am currently reading 1024 bytes of a 16-bit, dual-channel, 44,100 Hz sample rate audio stream and averaging the amplitude of the 2 channels together. So now I have an array of 256 signed shorts. I now want to perform an FFT on that array, using a module like numpy, and use the result to create the graphical spectrum analyzer, which, to start, will just be 32 bars.
I have read the Wikipedia articles on the Fast Fourier Transform and the Discrete Fourier Transform but I am still unclear on what the resulting array represents. This is what the array looks like after I perform an FFT on my array using numpy:
[ -3.37260500e+05 +0.00000000e+00j 7.11787022e+05 +1.70667403e+04j
4.10040193e+05 +3.28653370e+05j 9.90933073e+04 +1.60555003e+05j
2.28787050e+05 +3.24141951e+05j 2.09781047e+04 +2.31063376e+05j
-2.15941453e+05 +1.63773851e+05j -7.07833051e+04 +1.52467334e+05j
-1.37440802e+05 +6.28107674e+04j -7.07536614e+03 +5.55634993e+03j
-4.31009964e+04 -1.74891657e+05j 1.39384348e+05 +1.95956947e+04j
1.73613033e+05 +1.16883207e+05j 1.15610357e+05 -2.62619884e+04j
-2.05469722e+05 +1.71343186e+05j -1.56779748e+04 +1.51258101e+05j
-2.08639913e+05 +6.07372799e+04j -2.90623668e+05 -2.79550838e+05j
-1.68112214e+05 +4.47877871e+04j -1.21289916e+03 +1.18397979e+05j
-1.55779104e+05 +5.06852464e+04j 1.95309737e+05 +1.93876325e+04j
-2.80400414e+05 +6.90079265e+04j 1.25892113e+04 -1.39293422e+05j
3.10709174e+04 -1.35248953e+05j 1.31003438e+05 +1.90799303e+05j...
I am wondering what exactly these numbers represent and how I would convert these numbers into a percentage of a height for each of the 32 bars. Also, should I be averaging the 2 channels together?
The array you are showing contains the Fourier transform coefficients of the audio signal. These coefficients can be used to get the frequency content of the audio. The FFT is defined for complex-valued input functions, so the coefficients you get out will be complex numbers even though your input is all real values. In order to get the amount of power in each frequency, you need to calculate the magnitude of the FFT coefficient for each frequency. This is not just the real component of the coefficient; you need to calculate the square root of the sum of the squares of its real and imaginary components. That is, if your coefficient is a + b*j, then its magnitude is sqrt(a^2 + b^2).
Once you have calculated the magnitude of each FFT coefficient, you need to figure out which audio frequency each FFT coefficient belongs to. An N-point FFT will give you the frequency content of your signal at N equally spaced frequencies, starting at 0. Because your sampling frequency is 44100 samples/sec and the number of points in your FFT is 256, your frequency spacing is 44100 / 256 = 172 Hz (approximately).
The first coefficient in your array is the zero-frequency (DC) coefficient; it is proportional to the average level of the signal. The rest of your coefficients count up from 0 in multiples of 172 Hz until you get to the N/2 = 128 coefficient. In an FFT, you can only measure frequencies up to half your sample points. Read up on the Nyquist frequency and the Nyquist-Shannon sampling theorem if you are a glutton for punishment and need to know why, but the basic result is that your lower frequencies are going to be replicated, or aliased, in the higher frequency buckets. So the frequencies will start from 0, increase by 172 Hz for each coefficient up to the N/2 coefficient, then decrease by 172 Hz until the N - 1 coefficient.
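For example, numpy will compute the bin frequencies for you; a quick sketch for the 256-point case above:

import numpy as np

freqs = np.fft.fftfreq(256, d=1/44100.0)  # bin k -> k * 172.27 Hz
# freqs[0] is DC; the second half of the array holds the mirrored negative frequencies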
That should be enough information to get you started. If you would like a much more approachable introduction to FFTs than is given on Wikipedia, you could try Understanding Digital Signal Processing (2nd ed.). It was very helpful for me.
So that is what those numbers represent. Converting to a percentage of height could be done by scaling each frequency component's magnitude by the sum of all component magnitudes. That would only give you a representation of the relative frequency distribution, though, not the actual power at each frequency. You could try scaling by the maximum magnitude possible for a frequency component, but I'm not sure that would display very well. The quickest way to find a workable scaling factor is to experiment on loud and soft audio signals until you find the right setting.
Finally, you should be averaging the two channels together if you want to show the frequency content of the entire audio signal as a whole. You are mixing the stereo audio into mono audio and showing the combined frequencies. If you want two separate displays for right and left frequencies, then you will need to perform the Fourier Transform on each channel separately.
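Putting those steps together, a minimal sketch (assuming samples holds the 256 averaged shorts from the question):

import numpy as np

spectrum = np.fft.fft(samples)                      # 256 complex coefficients
magnitudes = np.abs(spectrum[:len(spectrum) // 2])  # sqrt(a^2 + b^2) for bins 0..127
bars = magnitudes.reshape(32, -1).mean(axis=1)      # group the 128 usable bins into 32 bars
heights = bars / bars.sum()                         # relative height of each bar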
Although this thread is years old, I found it very helpful. I just wanted to give my input to anyone who finds this and is trying to create something similar.
As for the division into bars, this should not be done as antti suggests, by dividing the data equally based on the number of bars. The most useful approach is to divide the data into octave parts, each octave spanning double the frequency of the previous one (i.e. 100 Hz is one octave above 50 Hz, which is one octave above 25 Hz).
Depending on how many bars you want, you divide the whole range into 1/X octave ranges.
Based on a given center frequency of A on the bar, you get the upper and lower limits of the bar from:
upper limit = A * 2 ^ ( 1 / 2X )
lower limit = A / 2 ^ ( 1 / 2X )
To calculate the next adjoining center frequency you use a similar calculation:
next lower = A / 2 ^ ( 1 / X )
next higher = A * 2 ^ ( 1 / X )
You then average the data that fits into these ranges to get the amplitude for each bar.
For example:
We want to divide into 1/3-octave ranges and we start with a center frequency of 1 kHz.
Upper limit = 1000 * 2 ^ ( 1 / ( 2 * 3 ) ) = 1122.5
Lower limit = 1000 / 2 ^ ( 1 / ( 2 * 3 ) ) = 890.9
Given 44100 Hz and 1024 samples (43 Hz between each data point) we should average out values 21 through 26. ( 890.9 / 43 = 20.72 ~ 21 and 1122.5 / 43 = 26.10 ~ 26 )
(1/3-octave bars would get you around 30 bars between ~40 Hz and ~20 kHz.)
As you can figure out by now, as we go higher we will average a larger range of numbers. Low bars typically include only one or a few data points, while the higher bars can be the average of hundreds of points. The reason is that 86 Hz is an octave above 43 Hz, while 10086 Hz sounds almost the same as 10043 Hz.
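A rough Python sketch of that octave binning (assuming mags holds the FFT bin magnitudes and df is the bin spacing in Hz, e.g. 43 Hz for 1024 samples at 44100 Hz):

import numpy as np

X = 3                                                      # 1/3-octave bars
centers = 1000.0 * 2.0 ** (np.arange(-14, 14) / float(X))  # ~28 bands, ~39 Hz to ~20 kHz
bars = []
for A in centers:
    lo = max(int(round((A / 2 ** (1.0 / (2 * X))) / df)), 1)  # lower band edge as a bin index
    hi = int(round((A * 2 ** (1.0 / (2 * X))) / df))          # upper band edge as a bin index
    bars.append(mags[lo:hi + 1].mean())                       # average the bins in this band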
What you have is a sample whose length in time is 256/44100 = 0.00580499 seconds. This means that your frequency resolution is 1 / 0.00580499 = 172 Hz. The 256 values you get out from Python correspond to the frequencies k * 172 Hz for k = 0, 1, ..., 255, i.e. from 0 Hz up to 255 * 172 Hz = 43860 Hz; for real input, the bins above the Nyquist frequency (22050 Hz) mirror the lower half. The numbers you get out are complex numbers (hence the "j" at the end of every second number).
You need to convert the complex numbers into amplitudes by calculating sqrt(i^2 + j^2), where i and j are the real and imaginary parts, respectively.
If you want to have 32 bars, you should, as far as I understand, take the average of eight successive amplitudes, getting 256 / 8 = 32 bars as you want.
The FFT returns N complex values, from each of which you can compute the magnitude = sqrt(real_part^2 + imaginary_part^2). To get the value for each band, you sum the magnitudes of all the harmonics inside that band. Below you can see an example for a 10-bar spectrum analyzer. The C code has to be wrapped to get a .pyd Python module.
#include <math.h>

/* one FFT output point: real and imaginary parts */
typedef struct { float r; float i; } complex_f;

/* Sum the bin magnitudes into 10 logarithmically spaced bands.
   ttfr_out_vett holds Nsamples/2 + 1 points (real-input FFT, e.g. Nsamples = 1024). */
void fill_bands(const complex_f *ttfr_out_vett, int Nsamples, float *out_filters_vett)
{
    const int edges[10] = {1, 2, 4, 8, 17, 33, 66, 132, 264, 511};
    float band_power = 0.0f;
    float harmonic_amplitude = 0.0f;
    int i, out_index = 0;

    for (i = 0; i < Nsamples / 2 + 1 && out_index < 10; i++)
    {
        if (i == edges[out_index])
        {
            out_filters_vett[out_index] = band_power; /* close the current band */
            band_power = 0.0f;
            out_index++;
        }
        /* magnitude of bin i = sqrt(re^2 + im^2) */
        harmonic_amplitude = sqrtf(ttfr_out_vett[i].r * ttfr_out_vett[i].r +
                                   ttfr_out_vett[i].i * ttfr_out_vett[i].i);
        band_power += harmonic_amplitude;
    }
}
I designed and made a whole 10-LED-bar spectrum analyzer in Python. Instead of using the numpy library (too big when all you need is the FFT), a Python .pyd module (just 27 KB) was created to compute the FFT and split the entire audio spectrum into bands.
In addition, a loopback WASAPI PortAudio .pyd module was created to read the output audio. You can see the project (block diagram) in the image below.
[Image: 10BarsSpectrumAnalyzerWithWASapi.jpg (block diagram of the analyzer)]
Just added a tutorial video on my YouTube channel: how to design and make a very smart Python Spectrum Analyzer 10 Led Bar
I'm working on spatial frequency filtering using code from this site.
https://www.djmannion.net/psych_programming/vision/sf_filt/sf_filt.html
There is similar code here on stack exchange. What I was wondering though is how to convert the cutoff used in the Butterworth filter, which is a number from 0 to 1, to cycles / degree in the image when I report it. I feel like I'm missing something obvious. I'm imagining it has to do with the visual angle the image subtends and the resolution.
The Butterworth filter commonly used in psychology software to filter spatial frequencies from images is typically used as a low-pass or high-pass filter, with the cutoff given as a value between 0 and 1. That value is the 50% cut point for the filter. For example, for a low-pass filter, frequencies below the cutoff are retained at 50% or more, and frequencies above the cutoff at 50% or less. The opposite is true for the high-pass setting. The function rises steeply near the cutoff, so the cutoff is a good single number to describe your filtered image.
So, what is the cutoff value and what does it mean? It's just a proportion of the maximum frequency in cycles/pixel. Once you know that maximum it's easy to derive, and it turns out the maximum is a constant. Suppose your image is 128x128. In 128 pixels you could have a maximum frequency of 64 cycles, or 0.5 cycles/pixel. Looking at the fftfreq function in numpy or MATLAB reveals the maximum is always 0.5 cycles/pixel. That makes sense, because you need two pixels for a period. This means that a cutoff of 0.2 corresponds to 0.1 cycles/pixel. Whatever cutoff you pick, just divide it in half and that's the cycles/pixel. Then all you need to do is scale that to cycles/degree.
Cycles / pixel can be converted to cycles / degree of visual angle once you know the size of the image presented in degrees of visual angle. For example, if someone's eye is d distance from the image of h height then the image will span 2 * atan( (h/2) / d) degrees of visual angle (assuming your atan function reports degrees and not radians, you may have to convert). Take the number of pixels in your image and divide it by the total span in degrees to get pixels / degree. Then multiply frequency (cycles / pixel) by pixels / degree to get cycles / degree.
Pseudo code equation version below:
cycles/pixel = cutoff * 0.5
d = distance_from_image_in_cm
h = height_of_image_in_cm
degrees_of_visual_angle = 2 * atan( (h/2) / d)
pixels/degree = total_pixels / degrees_of_visual_angle
cycles/degree = cycles/pixel * pixels/degree
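The same in Python (a sketch; the viewing distance and image size are placeholder values):

import math

cutoff = 0.2         # Butterworth cutoff, as a proportion of the max frequency
total_pixels = 128   # image height in pixels
d = 57.0             # viewing distance in cm (placeholder)
h = 10.0             # image height in cm (placeholder)

cycles_per_pixel = cutoff * 0.5
degrees_of_visual_angle = 2 * math.degrees(math.atan((h / 2) / d))
pixels_per_degree = total_pixels / degrees_of_visual_angle
print(cycles_per_pixel * pixels_per_degree)  # ~1.28 cycles/degree for these placeholders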
I'm calculating the RFFT of a signal of length 3000 (sampled at 100 Hz) with only real valued entries:
from scipy.fft import rfft
coeffs = rfft(values)
coeffs = np.abs(coeffs)
With rfft I'm only getting half of the coefficients, i.e. the symmetric ones are discarded (due to the real-valued input).
Is it correct to scale the values by coeffs = (2 / len(values)) * coeffs to get the amplitudes?
Edit: Below I have appended a plot of the amplitudes vs. frequency (bins) for accelerometer and gyroscope (the shaded area is the standard deviation). For the accelerometer, the energy in the first FFT bin is much higher than in the other bins (> 2 in the first bin and < 0.4 in the others). For the gyroscope it is different and the energy is much more distributed.
Does that mean that for the accelerometer the FFT looks good but for the gyroscope it is worse? Further, is it reasonable to cut the FFT at 100 Hz (i.e. take only bins < 100 Hz) or to take the first few bins until 95% of the energy is kept?
The approximate relationship I provided in this post holds whether you throw out half the coefficients or not.
So, if the conditions indicated in that post apply to your situation, then you could get an estimate of the amplitude of a dominant sinusoidal component with
approx_sinusoidal_amplitude = (2 / len(values)) * np.abs(coeffs[k])
for some index k corresponding to the frequency of the sinusoidal component (which according to the limitations indicated in my other post has to be at or near a multiple of 100/3000 ~ 0.033Hz in your case). For a dominant sinusoidal component, this index would typically correspond to a local peak in the frequency spectrum. Note however that if your signal is a mixture of various frequency components, the individual components may affect the frequency spectrum in such a way that the peak does not appear clearly.
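As a quick self-check, a sketch with a synthetic tone whose frequency falls exactly on a bin:

import numpy as np
from scipy.fft import rfft, rfftfreq

fs, N = 100.0, 3000
t = np.arange(N) / fs
values = 1.5 * np.sin(2 * np.pi * 5.0 * t)  # 5 Hz tone, amplitude 1.5
coeffs = np.abs(rfft(values))
k = np.argmax(coeffs[1:]) + 1               # spectral peak, skipping the DC bin
print((2 / N) * coeffs[k])                  # ~1.5, the recovered amplitude
print(rfftfreq(N, 1 / fs)[k])               # ~5.0 Hz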
I am new to Python as well as to signal processing. I am trying to calculate the mean value over some frequency range of a signal.
What I am trying to do is as follows:
import numpy as np
data = <my 1d signal>
lF = <lower frequency>
uF = <upper frequency>
ps = np.abs(np.fft.fft(data)) ** 2 #array of power spectrum
time_step = 1.0 / 2000.0
freqs = np.fft.fftfreq(data.size, time_step) # array of frequencies
idx = np.argsort(freqs) # sorting frequencies
sum = 0
c = 0
for i in idx:
    if (freqs[i] >= lF) and (freqs[i] <= uF):
        sum += ps[i]
        c += 1
avgValue = sum / c
print 'mean value is=', avgValue
I think the calculation is fine, but it takes a lot of time: for data of more than 15 GB the processing time grows dramatically. Is there any faster way to get the mean value of the power spectrum within some frequency range? Thanks in advance.
EDIT 1
I followed this code for the calculation of the power spectrum.
EDIT 2
This doesn't answer my question, as it calculates the mean over the whole array/list, but I want the mean over part of the array.
EDIT 3
The solution by jez of using a mask reduces the time. Actually I have more than 10 channels of 1D signal and I want to treat them all in the same manner, i.e. average the frequencies in a range for each channel separately. I think Python loops are slow. Is there any alternative for that?
Like this:
for i in xrange(0, 15):
    data = signals[:, i]
    ps = np.abs(np.fft.fft(data)) ** 2
    freqs = np.fft.fftfreq(data.size, time_step)
    mask = np.logical_and(freqs >= lF, freqs <= uF)
    avgValue = ps[mask].mean()
    print 'mean value is=', avgValue
The following performs a mean over a selected region:
mask = numpy.logical_and( freqs >= lF, freqs <= uF )
avgValue = ps[ mask ].mean()
For proper scaling of power values that have been computed as abs(fft coefficients)**2, you will need to multiply by (2.0 / len(data))**2 (Parseval's theorem).
Note that it gets slightly fiddly if your frequency range includes the Nyquist frequency: for precise results, handling of that single frequency component would need to depend on whether data.size is even or odd. So for simplicity, ensure that uF is strictly less than max(freqs). (For similar reasons you should ensure lF > 0.)
The reasons for this are tedious to explain and even more tedious to correct for, but basically: the DC component is represented once in the DFT, whereas most other frequency components are represented twice (positive frequency and negative frequency) at half-amplitude each time. The even-more-annoying exception is the Nyquist frequency which is represented once at full amplitude if the signal length is even, but twice at half amplitude if the signal length is odd. All of this would not affect you if you were averaging amplitude: in a linear system, being represented twice compensates for being at half amplitude. But you're averaging power, i.e. squaring the values before averaging, so this compensation doesn't work out.
I've pasted my code for grokking all of this. This code also shows how you can work with multiple signals stacked in one numpy array, which addresses your follow-up question about avoiding loops in the multi-channel case. Remember to supply the correct axis argument both to numpy.fft.fft() and to my fft2ap().
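For the multi-channel follow-up, the loop can be avoided by letting numpy work along one axis; a sketch, assuming signals has shape (n_samples, n_channels) as in the question:

import numpy as np

ps = np.abs(np.fft.fft(signals, axis=0)) ** 2        # power spectrum, one column per channel
freqs = np.fft.fftfreq(signals.shape[0], time_step)  # shared frequency axis
mask = np.logical_and(freqs >= lF, freqs <= uF)
avgValues = ps[mask].mean(axis=0)                    # one mean per channel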
If you really have a signal of 15 GB size, you'll not be able to calculate the FFT in an acceptable time. You can avoid using the FFT if it is acceptable for you to approximate your frequency range by a band-pass filter. The justification is Parseval's theorem, which states that the sum of squares (i.e. the power) is not changed by a Fourier transform. Staying in the time domain will let the processing time grow only linearly with the signal length.
The following code designs a Butterworth band-pass filter, plots the filter response, and filters a sample signal:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

dd = np.random.randn(10**4)  # generate sample data
T = 1./2e3                   # sampling interval
n, f_s = len(dd), 1./T       # number of points and sampling frequency

# design band-pass filter:
f_l, f_u = 50, 500                       # band from 50 Hz to 500 Hz
wp = np.array([f_l, f_u])*2/f_s          # normalized pass band frequencies
ws = np.array([0.8*f_l, 1.2*f_u])*2/f_s  # normalized stop band frequencies
b, a = signal.iirdesign(wp, ws, gpass=1, gstop=60, ftype="butter",
                        analog=False)    # 1 dB passband ripple, 60 dB stopband attenuation

# plot filter response:
w, h = signal.freqz(b, a, whole=False)
ff_w = w*f_s/(2*np.pi)
fg, ax = plt.subplots()
ax.set_title('Butterworth filter amplitude response')
ax.plot(ff_w, np.abs(h))
ax.set_ylabel('relative Amplitude')
ax.grid(True)
ax.set_xlabel('Frequency in Hertz')
fg.canvas.draw()

# do the filtering:
zi = signal.lfilter_zi(b, a)*dd[0]  # initial state to avoid a startup transient
dd1, _ = signal.lfilter(b, a, dd, zi=zi)

# calculate the average power:
avg = np.mean(dd1**2)
print("Mean power is %g" % avg)

plt.show()
Read SciPy's filter design documentation to learn how to modify the parameters of the filter.
If you want to stay with the FFT, read the docs on signal.welch and plt.psd. The Welch algorithm is a method to efficiently calculate the power spectral density of a signal (with some trade-offs).
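A sketch of the Welch route (the nperseg value is illustrative):

from scipy import signal

f, Pxx = signal.welch(data, fs=2000.0, nperseg=4096)  # PSD estimate from averaged segments
mask = (f >= lF) & (f <= uF)
avgValue = Pxx[mask].mean()                           # mean spectral density in the band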
It is much easier to work with the FFT if your array length is a power of 2. When you do an FFT, the frequencies range from -pi/timestep to pi/timestep (assuming that frequency is defined as w = 2*pi/t; change the values accordingly if you use the f = 1/t representation). The raw spectrum is arranged from 0 up to the maximum frequency, followed by the negative frequencies coming back toward zero. You can use the fftshift function to swap the two halves so that the spectrum reads min-freq -- DC -- max-freq. Now you can easily determine your desired frequency range, because it is already sorted.
The frequency step is dw = 2*pi/(time span), or max-frequency/(N/2) where N is the array size.
After the shift, the N/2 point is DC (0 frequency) and the last position is the maximum frequency, so you can easily determine your range:
Lower_freq_indx = N/2 + N/2 * Lower_freq / Max_freq
Higher_freq_indx = N/2 + N/2 * Higher_freq / Max_freq
avg = sum(ps[Lower_freq_indx:Higher_freq_indx]) / (Higher_freq_indx - Lower_freq_indx)
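A numpy sketch of this approach, reusing ps, data and time_step from the question:

import numpy as np

ps_shifted = np.fft.fftshift(ps)                                       # min-freq .. DC .. max-freq
freqs_shifted = np.fft.fftshift(np.fft.fftfreq(data.size, time_step))  # now sorted ascending
lo = np.searchsorted(freqs_shifted, lF)
hi = np.searchsorted(freqs_shifted, uF)
avg = ps_shifted[lo:hi].mean()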
I hope this helps.
I am trying to write a simple program in python that will calculate and display DFT output of 1 segment.
My signal is 3 seconds long, I want to calculate DFT for every 10ms long segment. Sampling rate is 44100. So one segment is 441 samples long.
Since I am still in the testing phase and the original program is much larger (speech recognition), here is an isolated part for testing purposes that unfortunately behaves oddly. Either that, or it's my lack of knowledge on the subject.
I read somewhere that DFT input should be rounded to a power of 2, so I arranged my array to 512 instead of 441. Is this true?
If I am sampling at a rate of 44100, at most I can reach a frequency of 22050 Hz, and for a sample of length 512 (~441), at least 100 Hz?
If 2. is true, then I can have all frequencies between 100 Hz and 22050 Hz in that 10 ms segment; but the segment is only 512 (441) samples long, and the output of the FFT returns an array of 256 (220) values. They cannot contain all 21950 frequencies in there, can they?
My first guess is that the values in the output of the FFT should be multiplied by 100, since 10 ms is a 100th of a second. Is this good reasoning?
The following program for two given frequencies, 1000 and 2000, returns two spikes on the graph, at positions 24 and 48 in the output array and ~2071 and ~4156 on the graph. Since the ratio of the numbers is okay (2000:1000 = 48:24), I wonder if I should ignore some starting part of the FFT output?
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import fft

t = np.arange(0, 1, 1/512.0) # We create 512 long array
# We calculate here two sinusoids together at 1000hz and 2000hz
y = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*2000*t)
n = len(y)
k = np.arange(n)
# Problematic part is around here, I am not quite sure what
# should be on the horizontal line
T = n/44100.0
frq = k/T
frq = frq[:n//2]
Y = fft(y)
Y = Y[:n//2]
# Convert from complex numbers to magnitudes
iY = []
for f in Y:
    iY.append(np.sqrt(f.imag * f.imag + f.real * f.real))
plt.plot(frq, iY, 'r')
plt.xlabel('freq (HZ)')
plt.show()
I read somewhere that the DFT input should be rounded to a power of 2 so I arranged my array to 512 instead of 441. Is this true?
The DFT is defined for all sizes. However, implementations of the DFT such as the FFT are generally much more efficient for sizes which can be factored in small primes. Some library implementations have limitations and do not support sizes other than powers of 2, but that isn't the case with numpy.
If I am sampling at a rate of 44100, at most I can reach a frequency of 22050 Hz, and for a sample of length 512 (~441), at least 100 Hz?
The highest frequency for even sized DFT will be 44100/2 = 22050Hz as you've correctly pointed out. Note that for odd sized DFT the highest frequency bin will correspond to a frequency slightly less than the Nyquist frequency. As for the minimum frequency, it will always be 0Hz. The next non-zero frequency will be 44100.0/N where N is the DFT length in samples (which gives 100Hz if you are using a DFT length of 441 samples and ~86Hz with a DFT length of 512 samples).
If 2) is true, then I can have all frequencies between 100 Hz and 22050 Hz in that 10 ms segment, but the segment is only 512 (441) samples long and the output of the FFT returns an array of 256 (220) values; they cannot contain all 21950 frequencies in there, can they?
First there aren't 21950 frequencies between 100Hz and 22050Hz since frequencies are continuous and not limited to integer frequencies. That said, you are correct in your realization that the output of the DFT will be limited to a much smaller set of frequencies. More specifically the DFT represents the frequency spectrum at discrete frequency step: 0, 44100/N, 2*44100/N, ...
My first guess is that the values in the output of the FFT should be multiplied by 100, since 10 ms is a 100th of a second. Is this good reasoning?
There is no need to multiply the FFT output by 100. But if you meant multiples of 100Hz with a DFT of length 441 and a sampling rate of 44100Hz, then your guess would be correct.
The following program for two given frequencies, 1000 and 2000, returns two spikes on the graph at positions 24 and 48 in the output array, and ~2071 and ~4156 on the graph. Since the ratio of the numbers is okay (2000:1000 = 48:24), I wonder if I should ignore some starting part of the FFT output?
Here the problem is more significant. As you declare the array
t = np.arange(0, 1, 1/512.0) # We create 512 long array
you are in fact representing a signal with a sampling rate of 512Hz instead of 44100Hz. As a result the tones you are generating are severely aliased (to 24Hz and 48Hz respectively). This is further compounded by the fact that you then use a sampling rate of 44100Hz for the frequency axis conversion. This is why the peaks are not appearing at the expected 1000Hz and 2000Hz frequencies.
To represent 512 samples of a signal sampled at a rate of 44100Hz, you should instead use
t = np.arange(512) / 44100.0
at which point the formula you used for the frequency axis would be correct (since it is based on the same 44100 Hz sampling rate). You should then be able to see peaks near the expected 1000 Hz and 2000 Hz (the closest frequency bins of the peaks being at ~1033 Hz and ~1981 Hz).
1) I read somewhere that DFT input should be rounded to a power of 2, so I arranged my array to 512 instead of 441. Is this true?
Strictly speaking, no: the DFT is defined for any length. But FFT implementations are fastest for power-of-two lengths, so zero-padding the input to 512 is common practice.
2) If I am sampling at a rate of 44100, at most I can reach a frequency of 22050 Hz, and for a sample of length 512 (~441), at least 100 Hz?
Yes, the highest frequency you can get is half the sampling rate; it's called the Nyquist frequency.
No, the lowest frequency bin you get (the first bin of the DFT) is called the DC component and marks the average of the signal. The next lowest frequency bin in your case is 22050 / 256 = 86Hz, and then 172Hz, 258Hz, and so on until 22050Hz.
You can get these frequencies with the numpy.fft.fftfreq() function.
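For example, a quick sketch for the 512-sample case:

import numpy as np

freqs = np.fft.fftfreq(512, d=1/44100.0)
print(freqs[:4])   # [  0.    86.13  172.27  258.4 ] approximately
print(freqs[256])  # -22050.0 (the Nyquist bin shows up as negative)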
3) If 2) is true, then I can have all frequencies between 100 Hz and 22050 Hz in that 10 ms segment, but the segment is only 512 (441) samples long and the output of the FFT returns an array of 256 (220) values; they cannot contain all 21950 frequencies in there, can they?
The DFT doesn't lose the original signal's data, but it lacks frequency resolution when the DFT size is small. You may zero-pad it to make the DFT size larger, such as 1024 or 2048.
A DFT bin refers to a frequency range centered at each of the N output points. The width of the bin is sample rate/N, and it extends from: center frequency - (sample rate/N)/2 to center frequency + (sample rate/N)/2. In other words, half of the bin extends below each of the N output points, and half above it.
4) My first guess is that the values in the output of the FFT should be multiplied by 100, since 10 ms is a 100th of a second. Is this good reasoning?
No, the values should not be multiplied if you want to preserve the magnitude.
The following program for two given frequencies, 1000 and 2000, returns two spikes on the graph at positions 24 and 48 in the output array, and ~2071 and ~4156 on the graph. Since the ratio of the numbers is okay (2000:1000 = 48:24), I wonder if I should ignore some starting part of the FFT output?
The DFT result is mirrored for real-valued input. In other words, your frequencies will be like this:
n:  0   1   2    3    4   ...  255    256    257   ...  511
Hz: DC  86  172  258  344 ...  21964  22050  21964 ...  86
I am working on doing some digital filter work using Python and Numpy/Scipy.
I'm using scipy.signal.iirdesign to generate my filter coefficients, but it requires the filter passband edge frequencies in a format I am not familiar with:
wp, ws : float
Passband and stopband edge frequencies, normalized from 0 to 1 (1 corresponds
to pi radians / sample).
For example:
Lowpass: wp = 0.2, ws = 0.3
Highpass: wp = 0.3, ws = 0.2
(from here)
I'm not familiar with digital filters (I'm coming from a hardware design background). In an analog context, I would determine the desired slope and the 3 dB down point, and calculate component values from that.
In this context, how do I take a known sample rate, a desired corner frequency, and a desired rolloff, and calculate the wp, ws values from that?
(This might be more appropriate for math.stackexchange. I'm not sure)
If your sampling rate is fs, the Nyquist rate is fs/2. This is the highest frequency you can represent without aliasing. It is also equivalent to the normalized value of 1 referred to by the documentation. Therefore, if you are designing a low-pass filter with a corner frequency of fc, you'd enter it as fc / (fs/2).
For example, you have fs = 8000, so fs/2 = 4000. You want a low-pass filter with a corner frequency of 3100 and a stopband frequency of 3300. The resulting value would be wp = fc/(fs/2) = 3100/4000 = 0.775. The stopband frequency would be ws = 3300/4000 = 0.825.
Make sense?
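In code, that normalization looks like this (a sketch; the ripple and attenuation values are illustrative):

from scipy import signal

fs = 8000.0
wp = 3100 / (fs / 2)  # passband edge -> 0.775
ws = 3300 / (fs / 2)  # stopband edge -> 0.825
b, a = signal.iirdesign(wp, ws, gpass=1, gstop=40)  # 1 dB ripple, 40 dB attenuation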
Take the function x(t) = cos(2*pi*fa*t). If we're sampling at frequency fs, the sampled function is x(n*ts) = x(n/fs) = cos(2*pi*n*fa/fs). The maximum frequency before aliasing (folding) is the Nyquist frequency fa = fs/2, which normalizes to (fs/2)/fs = 1/2. The normalized angular frequency is 2*pi*1/2 rad/sample = pi rad/sample. Thus the signal x[n] = cos[pi*n] = [1,-1,1,-1,...].
The sampled version of a given frequency such as a corner frequency 2*pi*fc rad/s would be 2*pi*fc/fs rad/sample. As a fraction of the Nyquist frequency pi, that's 2*fc/fs = fc/(fs/2).
A few formulas to live by:
exp[j*w*n] = cos[w*n] + j*sin[w*n]
x_even[n] = 0.5*x[n] + 0.5*x[-n]
cos[w*n] = 0.5*exp[j*w*n] + 0.5*exp[-j*w*n] # cos is even
x_odd[n] = 0.5*x[n] - 0.5*x[-n]
j*sin[w*n] = 0.5*exp[j*w*n] - 0.5*exp[-j*w*n] # sin is odd
The DFT of the even component (a sum of cosines) of a real-valued signal will be real and symmetric while the DFT of the odd component (a sum of sines) will be imaginary and anti-symmetric. Thus for real-valued signals such as the impulse response of a typical filter, the magnitude spectrum is symmetric while the phase spectrum is antisymmetric. Thus you only have to specify a filter for the range 0 to pi, which is normalized to [0,1].
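A quick numpy check of that symmetry (a sketch):

import numpy as np

x = np.random.randn(8)                        # any real-valued signal
X = np.fft.fft(x)
print(np.allclose(X[1:], np.conj(X[:0:-1])))  # True: X[k] == conj(X[N-k])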