Calculate non-integer frequency with NumPy FFT - python

I would like to calculate the frequency of a periodic time series using NumPy FFT. As an example, let's say my time series y is defined as follows:
import numpy as np
freq = 12.3
x = np.arange(10000)
y = np.cos(x * 2 * np.pi * freq / 10000)
If the frequency is an integer, I can calculate it using np.argmax(np.abs(np.fft.fft(y))). However, in case the frequency is not an integer, how do I calculate the frequency with more precision?
EDIT: To clarify, we are not supposed to know how the time series y is generated. The above code snippet is just an artificial example of how a non-integer frequency could come up. Obviously if we already know the function that generates the time series, we don't need FFT to determine the frequency.

You need to give your signal more resolution
import numpy as np
freq = 12.3
x = np.arange(100000) # 10 times more resolution
y = np.cos(x * 2 * np.pi * freq / 10000) # don't change this
print(np.argmax(np.abs(np.fft.fft(y))) / 10) # divide by 10
# 12.3
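The number of points in x needs to be 10 times the number you divide by in y (100000 vs. 10000). Bin k of the FFT corresponds to k cycles over the whole record, so with 100000 samples the peak lands at bin 123, i.e. 12.3 cycles per 10000 samples. If you control how y is generated, you can get the same effect like this (note that this changes the signal itself rather than the record length):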
x = np.arange(10000)
y = np.cos(x * 2 * np.pi * freq / 1000)
print(np.argmax(np.abs(np.fft.fft(y))) / 10)
# 12.3
To find the frequency to two decimal places, the resolution needs to be 100 times higher:
freq = 12.34
x = np.arange(10000)
y = np.cos(x * 2 * np.pi * freq / 100) # 100 times more resolution
print(np.argmax(np.abs(np.fft.fft(y))) / 100) # divide by 100
# 12.34
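The sketch below shows why the longer record helps. For illustration only, it assumes that the divisor 10000 in y is a sample rate of fs = 10000 samples per unit time. Bin k of an n-point FFT then sits at k * fs / n. np.fft.rfftfreq returns those bin frequencies directly, so the manual division by 10 is not needed.

```python
import numpy as np

fs = 10000          # assumption: the divisor in y acts as the sample rate
freq = 12.3
n = 100000          # 10 times longer record than in the question
x = np.arange(n)
y = np.cos(2 * np.pi * freq * x / fs)

bins = np.fft.rfftfreq(n, d=1 / fs)     # bin k sits at k * fs / n
spectrum = np.abs(np.fft.rfft(y))       # y is real, so only non-negative frequencies are needed
estimate = bins[np.argmax(spectrum)]
print(bins[1], estimate)                # bin spacing and peak frequency
```

The bin spacing fs / n is the reciprocal of the record duration. Two decimals therefore need a record 100 times longer than the divisor.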

You can pad the data with zeros before computing the FFT.
For example, here's your original calculation. It finds the Fourier coefficient with the maximum magnitude at frequency 12.0:
In [84]: freq = 12.3
In [85]: x = np.arange(10000)
In [86]: y = np.cos(x * 2 * np.pi * freq / 10000)
In [87]: f = np.fft.fft(y)
In [88]: k = np.argmax(np.abs(f))
In [89]: np.fft.fftfreq(len(f), d=1/10000)[k]
Out[89]: 12.0
Now recompute the Fourier transform, but pad the input to have a length of six times the original length (you can adjust that factor as needed). With the padded signal the Fourier coefficient with maximum magnitude is associated with frequency 12.333:
In [90]: f = np.fft.fft(y, 6*len(y))
In [91]: k = np.argmax(np.abs(f))
In [92]: np.fft.fftfreq(len(f), d=1/10000)[k]
Out[92]: 12.333333333333332
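The padding approach can be wrapped in a small helper. This is a sketch: peak_frequency and pad_factor are hypothetical names. np.fft.rfft(y, n=...) zero-pads y to length n before transforming, just like np.fft.fft(y, 6*len(y)) above.

```python
import numpy as np

def peak_frequency(y, fs, pad_factor=1):
    """Frequency of the largest FFT magnitude after zero-padding y.

    The FFT length is pad_factor * len(y); np.fft.rfft fills the extra
    samples with zeros.
    """
    n_fft = pad_factor * len(y)
    spectrum = np.abs(np.fft.rfft(y, n=n_fft))   # non-negative frequencies only
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    return freqs[np.argmax(spectrum)]

fs = 10000
x = np.arange(10000)
y = np.cos(x * 2 * np.pi * 12.3 / 10000)
for factor in (1, 6, 100):
    print(factor, peak_frequency(y, fs, factor))
```

Larger factors sample the spectrum more finely, but they do not narrow the main lobe. Two nearby tones are therefore not separated any better.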
Here's a plot that illustrates the effect of padding the signal. The signal is not the same as above; I used different values and a much shorter signal to make the effect easier to see. Padding does not change the shape of the lobes; it only increases the number of frequencies at which the spectrum is sampled.
The plot is generated by the following script:
import numpy as np
import matplotlib.pyplot as plt
fs = 10
T = 1.4
t = np.arange(T*fs)/fs
freq = 2.6
y = np.cos(2*np.pi*freq*t)
fy = np.fft.fft(y)
magfy = np.abs(fy)
freqs = np.fft.fftfreq(len(fy), d=1/fs)
plt.plot(freqs, magfy, 'd', label='no padding')
for (factor, markersize) in [(2, 9), (16, 4)]:
    fy_padded = np.fft.fft(y, factor*len(y))
    magfy_padded = np.abs(fy_padded)
    freqs_padded = np.fft.fftfreq(len(fy_padded), d=1/fs)
    plt.plot(freqs_padded, magfy_padded, '.', label='padding factor %d' % factor,
             alpha=0.5, markersize=markersize)
plt.xlabel('Frequency')
plt.ylabel('Magnitude of Fourier Coefficient')
plt.grid()
plt.legend(framealpha=1, shadow=True)
plt.show()

You can try either interpolation or zero-padding (which is equivalent to interpolating the entire spectrum) to potentially improve your frequency estimate, if the signal-to-noise ratio allows. Sinc-kernel interpolation is more accurate than parabolic interpolation.
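Here is a sketch of the parabolic variant. It rests on two assumptions that the answer does not state. First, the signal is multiplied by a Hann window (np.hanning) before the FFT. Second, the parabola is fitted to the log-magnitudes of the peak bin and its two neighbours, which typically reduces the bias compared with fitting raw magnitudes. interpolated_peak is a hypothetical helper name.

```python
import numpy as np

def interpolated_peak(y, fs):
    """Estimate the dominant frequency by fitting a parabola through the
    log-magnitudes of the peak FFT bin and its two neighbours."""
    windowed = y * np.hanning(len(y))          # Hann window: an assumption, not from the answer
    mag = np.abs(np.fft.rfft(windowed))
    k = int(np.argmax(mag))
    if k == 0 or k == len(mag) - 1:
        return k * fs / len(y)                 # no neighbour on one side: return the bin itself
    a, b, c = np.log(mag[k - 1:k + 2])
    # Vertex of the parabola through (-1, a), (0, b), (1, c), measured in bins
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return (k + offset) * fs / len(y)

x = np.arange(10000)
y = np.cos(x * 2 * np.pi * 12.3 / 10000)
print(interpolated_peak(y, fs=10000))
```

For the question's 12.3 example, this should land much closer to 12.3 than the plain argmax (12.0). Some bias remains, and its size depends on the window.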

Related

Fourier Transform - strange results

I'm trying to make some examples of FFTs. The idea is to take sine waves for 3 different musical notes (A, C, E), add them together to form an A minor chord, and then use an FFT to retrieve the original frequencies.
import numpy as np
import matplotlib.pyplot as plt
import scipy.fft
def generate_sine_wave(freq, sample_rate, duration):
    x = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    frequencies = x * freq
    # 2pi because np.sin takes radians
    y = np.sin(2 * np.pi * frequencies)
    return x, y
def main():
    # Frequency of note in Aminor chord (A, C, E)
    # note_names = ('A', 'C', 'E')
    # fs = (440, 261.63, 329.63)
    fs = (27.50, 16.35, 20.60)
    # duration, in seconds.
    duration = .5
    # sample rate. determines how many data points the signal uses to represent
    # the sine wave per second. So if the signal had a sample rate of 10 Hz and
    # was a five-second sine wave, then it would have 10 * 5 = 50 data points.
    sample_rate = 1000
    fig, ax = plt.subplots(5)
    all_wavelengths = []
    # Create a linspace, with N samples from 0 to duration
    # x = np.linspace(0.0, T, N)
    for i, f in enumerate(fs):
        x, y = generate_sine_wave(f, sample_rate, duration)
        # y = np.sin(2 * np.pi * F * x)
        all_wavelengths.append(y)
        ax[i].plot(x, y)
    # sum of all notes, drawn in the fourth subplot
    aminor = np.sum(all_wavelengths, axis=0)
    ax[i + 1].plot(x, aminor)
    # spectrum of the chord, drawn in the last (fifth) subplot
    yf = np.abs(scipy.fft.rfft(aminor))
    xf = scipy.fft.rfftfreq(int(sample_rate * duration), 1 / sample_rate)
    ax[i + 2].plot(xf, yf)
    ax[i + 2].vlines(fs, ymin=0, ymax=(max(yf)), color='purple')
    plt.show()
if __name__ == '__main__':
    main()
However, the FFT plot (last subplot) does not have the proper peak frequencies (highlighted through vertical purple lines). Why is that?
The FFT will only recover the contained frequencies exactly if the sampling window covers a whole number of periods of the signal. Otherwise, if there is a "remainder", the frequency peaks will deviate from the exact values.
Since your A-minor signal contains three distinct frequencies, 27.50, 16.35 and 20.60 Hz, you need a sampling duration that covers a whole number of periods of each component. In other words, duration × frequency must be an integer for every frequency. Written as reduced fractions, the fractional parts are 0.50 = 1/2, 0.35 = 7/20 and 0.60 = 3/5. The duration in seconds must therefore be a multiple of each denominator, so the shortest suitable duration is their least common multiple:
>>> import math
>>> math.lcm(2, 20, 5)
20
So for a duration of 20 seconds the frequencies are recovered exactly. The products 27.50 × 20 = 550, 16.35 × 20 = 327 and 20.60 × 20 = 412 are all integers, so each tone falls exactly on an FFT bin. With sample_rate = 1000 this is also a whole number of samples (20000). Any other multiple of 20 seconds works as well.
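The sketch below automates this check. It takes each frequency as an exact decimal string and uses fractions.Fraction to find the denominator of its reduced form. It then takes the least common multiple of those denominators (math.lcm needs Python 3.9+), builds the chord over that duration, and lists the three strongest bins. It uses np.fft.rfft and np.fft.rfftfreq, which should give the same result as the scipy.fft functions in the question.

```python
import math
from fractions import Fraction

import numpy as np

notes = ("27.50", "16.35", "20.60")                       # exact decimal strings, no float round-off
denominators = [Fraction(f).denominator for f in notes]   # 2, 20, 5
duration = math.lcm(*denominators)                        # shortest whole-period duration, in seconds

sample_rate = 1000
t = np.arange(sample_rate * duration) / sample_rate
chord = sum(np.sin(2 * np.pi * float(f) * t) for f in notes)

yf = np.abs(np.fft.rfft(chord))
xf = np.fft.rfftfreq(len(t), 1 / sample_rate)
peaks = np.sort(xf[np.argsort(yf)[-3:]])                  # the three strongest bins
print(duration, peaks)
```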
I think that, within the margin of error, the results do in fact match your frequencies.
You can see in your frequency plot that the plotted frequencies closest to your actual frequencies do indeed have the highest amplitudes.
However, because this is a DFT, the frequencies it returns are discrete (spaced sample_rate / N apart, i.e. 1 / duration). They therefore don't exactly match the frequencies you used to construct your sample.
What you can try is making your sample (i.e. the number of time points in your input data) longer, and/or making it cover a whole number of periods of each input frequency. That should increase the frequency resolution and/or move the sampled output frequencies closer to the input frequencies.
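To see how coarse the grid is for the question's original settings (sample_rate = 1000, duration = 0.5), here is a quick check of the bin spacing and of the bin closest to each note:

```python
import numpy as np

sample_rate = 1000
duration = 0.5
n = int(sample_rate * duration)                       # 500 samples
xf = np.fft.rfftfreq(n, 1 / sample_rate)
spacing = xf[1]                                       # sample_rate / n, i.e. 1 / duration
notes = np.array([27.50, 16.35, 20.60])
nearest = xf[np.round(notes / spacing).astype(int)]   # grid point closest to each note
print(spacing, nearest)
```

With bins 2 Hz apart, the largest peaks can at best appear at 28, 16 and 20 Hz.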

Compute sum of sine waves in numpy; avoid large matrices while maintaining numpy performance

I'm currently trying to compute the sum of multiple sine waves, generated like so:
np.sin(2 * np.pi * freq * np.arange(N) / SAMPLE_RATE)
where N is the number of samples I want to generate, and SAMPLE_RATE is the sample rate in Hz. One neat thing I realized is that if I pass in a column vector for freq, it'll generate a matrix of samples due to broadcasting, where each row corresponds to a single sine wave for a single frequency, and each column corresponds to a single sample point for each frequency. As an example:
import numpy as np
freq = np.array([[1,2,3,4]]).T
SAMPLE_RATE = 20
N = 5
print(np.sin(2 * np.pi * freq * np.arange(N) / SAMPLE_RATE))
outputs
[[ 0. 0.30901699 0.58778525 0.80901699 0.95105652]
[ 0. 0.58778525 0.95105652 0.95105652 0.58778525]
[ 0. 0.80901699 0.95105652 0.30901699 -0.58778525]
[ 0. 0.95105652 0.58778525 -0.58778525 -0.95105652]]
If this matrix is summed along axis 0, the result is the sum of all the sine waves. However, this takes O(N * len(freq)) space, which is unacceptable when N and len(freq) are large. Is there a way to do this in O(N) space without giving up the vectorized summation at the end? The question generalizes to any case where a matrix is generated through broadcasting only to be immediately collapsed by summation.
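OK, I found a closed form on Math Stack Exchange. It applies as long as freq is of the form np.arange(m) + 1 (in this case m=4). The identity sin(t) + sin(2t) + ... + sin(mt) = sin(mt/2) * sin((m+1)t/2) / sin(t/2), with t = 2 * pi * x / 20, then gives the sum directly:
s = lambda x, m: np.sin(np.pi * m * x / 20)
f = lambda x, m: np.where(x == 0, 0, s(x, m + 1) * s(x, m) / s(x, 1))
# ^^^ replaces the 0/0 at x == 0 with 0; np.where evaluates both branches,
# so NumPy still emits a RuntimeWarning for that point
f(np.arange(N), 4)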
Out[]:
array([0.00000000e+00, 2.65687576e+00, 3.07768354e+00, 1.48130525e+00,
1.22464680e-16])
np.sin(2 * np.pi * freq * np.arange(N) / 20).sum(0)
Out[]:
array([0.00000000e+00, 2.65687576e+00, 3.07768354e+00, 1.48130525e+00,
1.11022302e-16])
You'll get floating-point differences when it's very close to zero (as with the last point there) but it should otherwise work.
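The sketch below uses the same closed form and keeps the O(N) memory, but avoids the 0/0 warning. It only divides where sin(t/2) is non-zero. Wherever x is a multiple of the sample rate, every sin(k t) term vanishes, so it sets the sum to 0 there. Like the answer, it only covers freq = 1, 2, ..., m. harmonic_sine_sum is a hypothetical name, and the modulo test assumes an integer sample rate.

```python
import numpy as np

def harmonic_sine_sum(n, m, sample_rate):
    """Sum of sin(2*pi*k*x/sample_rate) for k = 1..m at x = 0..n-1, in O(n) memory.

    Uses sin(t) + ... + sin(m t) = sin(m t/2) sin((m+1) t/2) / sin(t/2),
    with t = 2*pi*x/sample_rate.
    """
    x = np.arange(n)
    half = np.pi * x / sample_rate           # t / 2
    out = np.zeros(n)
    # sin(t/2) vanishes when x is a multiple of sample_rate; every term
    # sin(k t) is zero there as well, so those entries stay 0.
    ok = (x % sample_rate) != 0
    out[ok] = (np.sin(m * half[ok]) * np.sin((m + 1) * half[ok])
               / np.sin(half[ok]))
    return out

# Compare against the memory-hungry broadcast version from the question
n, m, rate = 100, 4, 20
freq = np.arange(1, m + 1)[:, None]
brute = np.sin(2 * np.pi * freq * np.arange(n) / rate).sum(axis=0)
print(np.max(np.abs(harmonic_sine_sum(n, m, rate) - brute)))
```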

Simple DFT Coefficients => Amplitude/Frequencies => Plot

I'm experimenting with the DFT and FFT in Python using numpy and pyplot.
My sample vector is
x = np.array([1,2,4,3])
The DFT coefficients for that vector are
K = [10+0j, -3+1j, 0+0j, -3-1j]
so basically we have 10, -3+i, 0 and -3-i as DFT coefficients.
My problem now is to get a combination of sin and cos that fits all 4 points.
Let's assume a sample rate of 1 Hz.
This is my code :
from matplotlib import pyplot as plt
import numpy as np
x = np.array([1,2,4,3])
fft = np.fft.fft(x)
space = np.linspace(0,4,50)
values = np.array([1,2,3,4])
cos0 = fft[0].real * np.cos(0 * space)
cos1 = fft[1].real * np.cos(1/4 * np.pi * space)
sin1 = fft[1].imag * np.sin(1/4 * np.pi * space)
res = cos0 + cos1 + sin1
plt.scatter(values, x, label="original")
plt.plot(space, cos0, label="cos0")
plt.plot(space, cos1, label="cos1")
plt.plot(space, sin1, label="sin1")
plt.plot(space, res, label="combined")
plt.legend()
As a result I get this plot:
(source: heeser-it.de)
Why isn't the final curve hitting any of the points?
I would appreciate your help. Thanks!
EDIT:
N = 1000
dataPoints = np.linspace(0, np.pi, N)
function = np.sin(dataPoints)
fft = np.fft.fft(function)
F = np.zeros((N,))
for i in range(0, N):
    F[i] = (2 * np.pi * i) / N
F_sin = np.zeros((N,N))
F_cos = np.zeros((N,N))
res = 0
for i in range(0, N):
    F_sin[i] = fft[i].imag / 500 * np.sin(dataPoints * F[i])
    F_cos[i] = fft[i].real / 500 * np.cos(dataPoints * F[i])
    res = res + F_sin[i] + F_cos[i]
plt.plot(dataPoints, function)
plt.plot(dataPoints, res)
My plot looks like this:
(source: heeser-it.de)
Where am I going wrong?
Your test vector x looks a bit like a sawtooth, because it rises linearly and then starts to decrease, although with so few data points it's hard to tell what signal it is. A sawtooth has an infinite Fourier series, which means it contains many higher-harmonic frequency components. So to describe it with DFT coefficients and get close to the original points, you would need
a higher sample rate, to capture information about higher frequencies (see the Nyquist sampling theorem)
more data points (samples), so you can extract more precise information about the frequencies in your signal. This means you need more items in your array x.
You could also try fitting a simpler signal first, such as a sine. Generate 1000 data points of a low-frequency sine (1 Hz, i.e. one cycle per 1000 samples) and then run a DFT on it to check that your code works.
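There are a few mistakes:
The x values you assigned to the original samples are off by one: sample n sits at n = 0, 1, 2, 3, not at 1, 2, 3, 4.
The frequency you assigned to fft[1] is incorrect: bin k of an N-point DFT completes k cycles over N samples, i.e. 2*pi*k/N radians per sample, which is pi/2 for k = 1 and N = 4.
The coefficients are incorrectly scaled: the DC term must be divided by N, and bin 1 together with its conjugate bin 3 contributes (2/N)(Re X1 cos(pi n/2) - Im X1 sin(pi n/2)), hence the /2 and the minus sign on the sine term (bin 2 is zero here).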
This one works:
from matplotlib import pyplot as plt
import numpy as np
x = np.array([1,2,4,3])
fft = np.fft.fft(x)
space = np.linspace(0,4,50)
values = np.array([0,1,2,3])
cos0 = fft[0].real * np.cos(0 * space)/4
cos1 = fft[1].real * np.cos(1/2 * np.pi * space)/2
sin1 = -fft[1].imag * np.sin(1/2 * np.pi * space)/2
res = cos0 + cos1 + sin1
plt.scatter(values, x, label="original")
plt.plot(space, cos0, label="cos0")
plt.plot(space, cos1, label="cos1")
plt.plot(space, sin1, label="sin1")
plt.plot(space, res, label="combined")
plt.legend()
plt.show()
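The same idea generalizes to any length. In the sketch below, trig_interpolant is a hypothetical name; it rebuilds the continuous trigonometric interpolant from np.fft.rfft. The DC term is divided by N, and each bin k with a conjugate partner contributes (2/N)(Re X_k cos - Im X_k sin). For even N, the Nyquist bin has no partner and is divided by N.

```python
import numpy as np

def trig_interpolant(samples, t):
    """Evaluate the trigonometric curve through `samples` (taken at t = 0..N-1) at times t."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    coeffs = np.fft.rfft(samples)
    t = np.asarray(t, dtype=float)
    result = np.full(t.shape, coeffs[0].real / n)          # DC term: divide by N
    # Bins 1 .. (n-1)//2 each pair with their conjugate bin n-k: factor 2/N
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * np.pi * k / n                                # radians per sample
        result += (2.0 / n) * (coeffs[k].real * np.cos(w * t)
                               - coeffs[k].imag * np.sin(w * t))
    if n % 2 == 0:
        # Nyquist bin k = n/2 has no partner; it is real for real input
        result += coeffs[n // 2].real / n * np.cos(np.pi * t)
    return result

samples = np.array([1, 2, 4, 3])
space = np.linspace(0, 4, 50)
curve = trig_interpolant(samples, space)                   # smooth curve for plotting
print(trig_interpolant(samples, np.arange(4)))              # compare with the original samples
```

At integer t the curve passes exactly through the samples. For even N the curve between samples is not unique, and using cos(pi t) for the Nyquist bin is one common convention.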

Two dimensional FFT using python results in slightly shifted frequency

I know there have been several questions about using the Fast Fourier Transform (FFT) method in python, but unfortunately none of them could help me with my problem:
I want to use Python to calculate the Fast Fourier Transform of a given two-dimensional signal f, i.e. f(x,y). Python's documentation helps a lot in solving a few of the issues the FFT brings with it, but I still end up with a frequency slightly shifted from the one I expect. Here is my Python code:
from scipy.fftpack import fft, fftfreq, fftshift
import matplotlib.pyplot as plt
import numpy as np
import math
fq = 3.0 # frequency of signal to be sampled
N = 100.0 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 2.0 * np.pi, N) # creating equally spaced vector from 0 to 2pi, with spacing 2pi/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
ft = np.fft.fft2(fnc) # calculating the fft coefficients
dx = x[1] - x[0] # spacing in x (and also y) direction (real space)
sampleFrequency = 2.0 * np.pi / dx
nyquisitFrequency = sampleFrequency / 2.0
freq_x = np.fft.fftfreq(ft.shape[0], d = dx) # return the DFT sample frequencies
freq_y = np.fft.fftfreq(ft.shape[1], d = dx)
freq_x = np.fft.fftshift(freq_x) # order sample frequencies, such that 0-th frequency is at center of spectrum
freq_y = np.fft.fftshift(freq_y)
half = len(ft) / 2 + 1 # calculate half of spectrum length, in order to only show positive frequencies
plt.imshow(
2 * abs(ft[:half,:half]) / half,
aspect = 'auto',
extent = (0, freq_x.max(), 0, freq_y.max()),
origin = 'lower',
interpolation = 'nearest',
)
plt.grid()
plt.colorbar()
plt.show()
And this is what I get when I run it:
Now you see that the frequency in x direction is not exactly at fq = 3, but slightly shifted to the left. Why is this?
I would assume that it has to do with the fact that the FFT is an algorithm using symmetry arguments, and that
half = len(ft) / 2 + 1
is used to show the frequencies at the proper place. But I don't quite understand what the exact problem is and how to fix it.
Edit: I have also tried using a higher sampling frequency (N = 10000.0), which did not solve the issue but instead shifted the frequency slightly too far to the right. So I am pretty sure that the problem is not the sampling frequency.
Note: I'm aware that the leakage effect leads to unphysical amplitudes here, but in this post I am primarily interested in the correct frequencies.
I found a number of issues:
You use 2 * np.pi twice, once in the linspace range and once in the argument to np.sin. Express the angle in radians in only one of those places if you want a whole number of cycles.
Additionally, np.linspace defaults to endpoint=True, so the last of your 100 points duplicates the start of the next period, and the spacing is 1/99 of the interval instead of 1/100.
fq = 3.0 # frequency of signal to be sampled
N = 100 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 1, N, endpoint=False) # creating equally spaced vector from 0 (inclusive) to 1 (exclusive), with spacing 1/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
You can check these fixes:
len(x)
Out[228]: 100
plt.plot(fnc[0])
With N = 100 samples there is an even number of FFT bins, so you can drop the + 1 in the half calculation. Use integer division (//) so that half can be used as a slice index.
matshow() appears to have better defaults. The extent = (0, freq_x.max(), 0, freq_y.max()) argument in your imshow call appears to garble the FFT bin numbering.
from scipy.fftpack import fft, fftfreq, fftshift
import matplotlib.pyplot as plt
import numpy as np
import math
fq = 3.0 # frequency of signal to be sampled
N = 100 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 1, N, endpoint=False) # creating equally spaced vector from 0 (inclusive) to 1 (exclusive), with spacing 1/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
plt.plot(fnc[0])
ft = np.fft.fft2(fnc) # calculating the fft coefficients
#dx = x[1] - x[0] # spacing in x (and also y) direction (real space)
#sampleFrequency = 2.0 * np.pi / dx
#nyquisitFrequency = sampleFrequency / 2.0
#
#freq_x = np.fft.fftfreq(ft.shape[0], d=dx) # return the DFT sample frequencies
#freq_y = np.fft.fftfreq(ft.shape[1], d=dx)
#
#freq_x = np.fft.fftshift(freq_x) # order sample frequencies, such that 0-th frequency is at center of spectrum
#freq_y = np.fft.fftshift(freq_y)
half = len(ft) // 2 # calculate half of spectrum length, in order to only show positive frequencies
plt.matshow(
2 * abs(ft[:half, :half]) / half,
aspect='auto',
origin='lower'
)
plt.grid()
plt.colorbar()
plt.show()
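The sketch below is a quick numeric check of the corrected setup. With endpoint=False and 2*pi used only in the sine, the largest coefficient of np.fft.rfft2 lands at x-frequency 3 and y-frequency 0. rfft2 keeps only non-negative frequencies along the last axis (x here), so the mirror peak does not appear.

```python
import numpy as np

fq = 3.0
N = 100
x = np.linspace(0, 1, N, endpoint=False)     # spacing exactly 1/N, no duplicated endpoint
xx, yy = np.meshgrid(x, x)
fnc = np.sin(2 * np.pi * fq * xx)             # varies along columns (the x direction)

ft = np.fft.rfft2(fnc)                        # last axis (x) keeps only non-negative frequencies
fy = np.fft.fftfreq(N, d=x[1] - x[0])         # row frequencies (y)
fx = np.fft.rfftfreq(N, d=x[1] - x[0])        # column frequencies (x)
row, col = np.unravel_index(np.argmax(np.abs(ft)), ft.shape)
print(fy[row], fx[col])
```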
Zoomed-in view of the plot:

Scaling Amplitude After Windowing FFT to Recover Correct Amplitude

I am trying to apply a Hann window to a sinusoidal signal, with the idea of applying an FFT to recover the frequency and the amplitude. This is a canonical case I have created to improve my understanding before I move on to my data (a real-time signal where I want to accurately determine the frequency content and amplitude). In the code below, I need to multiply by an additional factor of 2.0 to recover the amplitudes. I understand multiplying by 2/N, but now I am multiplying by 4/N. Has anyone come across this, or can anyone explain why this is? Here is my code:
import numpy as np
import matplotlib.pyplot as plt
# first create the time signal, which has two frequencies: 13.2 Hz and 43.9 Hz
f_s = 100.0 # Hz sampling frequency
f = 1.0 # Hz
time = np.arange(0.0, 10.0, 1/f_s)
x = 5 * np.sin(13.2 * 2 * np.pi * f * time) + 3 * np.sin(43.9 * 2 * np.pi * f * time)
x = x + np.random.randn(len(time)) #inject some noise
# apply hann window and take the FFT
win = np.hanning(len(x))
FFT = np.fft.fft(win * x) * 2.0 # IT SEEMS I NEED AN ADDITIONAL FACTOR OF 2 TO RECOVER THE AMPLITUDES
n = len(FFT)
freq_hanned = np.fft.fftfreq(n, 1/f_s)
half_n = int(np.ceil(n / 2.0))  # slice indices must be integers
fft_hanned_half = (2.0 / n) * FFT[:half_n]
freq_hanned_half = freq_hanned[:half_n]
# and plot
plt.plot(freq_hanned_half, np.abs(fft_hanned_half))
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
The mean value of the von Hann window is (approximately) 0.5, for N=1000 you have
>>> N = 1000; print(f"{np.hanning(N).mean():.4f}")
0.4995
>>>
Does this explain the necessity of multiplying by two to recover the discrete amplitudes?
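Here is a sketch of the scaling this points to. A window multiplies a bin-centred sinusoid's peak by the window's mean value (its coherent gain, about 0.5 for Hann). Dividing by sum(win) instead of N therefore restores the amplitude, which for a Hann window is roughly the 4/N factor found empirically. The noise is left out so the numbers are reproducible, and the 10 s record puts both tones exactly on the 0.1 Hz bin grid.

```python
import numpy as np

f_s = 100.0
time = np.arange(1000) / f_s                  # 10 s at 100 Hz, 0.1 Hz bins
x = 5 * np.sin(2 * np.pi * 13.2 * time) + 3 * np.sin(2 * np.pi * 43.9 * time)
# noise omitted so the check is deterministic

win = np.hanning(len(x))
spectrum = np.fft.rfft(win * x)
freqs = np.fft.rfftfreq(len(x), 1 / f_s)

# Normalize by the window's sum (coherent gain times N) instead of by N
amplitude = 2.0 * np.abs(spectrum) / win.sum()
for f in (13.2, 43.9):
    k = np.argmin(np.abs(freqs - f))
    print(f, amplitude[k])
```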
