I asked a since-deleted question about how to determine Fourier coefficients from time series data. I am resubmitting it because I have formulated the problem better and now have a solution, which I'll give below since I think others may find it very useful.
I have some time series data that I have binned into equally spaced time bins (a fact which will be crucial to my solution), and from that data I want to determine the Fourier series (or any function, really) that best describes the data. Here is an MWE with some test data to show what I'm trying to fit:
import numpy as np
import matplotlib.pyplot as plt
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, 101) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
This outputs the following:
The question is how to determine the Fourier series best describing this data. The usual formula for determining the Fourier coefficients requires inserting a function into an integral, but if I had a function to describe the data I wouldn't need the Fourier coefficients at all; the whole point of finding this series is to have a functional representation of the data. In the absence of such a function, then, how are the coefficients found?
My solution to this problem is to apply a discrete Fourier transform to the data using NumPy's implementation of the Fast Fourier Transform, numpy.fft.fft(); this is why it's critical that the data is evenly spaced in time, as FFT requires this. While the FFT is typically used to perform analysis of the frequency spectrum, the desired Fourier coefficients are directly related to the output of this function.
Specifically, for N data points this function outputs a series of complex-valued coefficients c_n. The Fourier series coefficients are then found using the relations a_n = 2·Re(c_n)/N and b_n = -2·Im(c_n)/N, with the constant term given by a_0/2.
Therefore the FFT allows the Fourier coefficients to be directly computed. Here is the MWE of my solution to this problem, expanding the example given above:
import numpy as np
import matplotlib.pyplot as plt
# Set the number of equal-time bins to create.
n_bins = 101
# Set the number of Fourier coefficients to use.
n_coeff = 51
# Define a function to generate a Fourier series based on the coefficients determined by the Fast Fourier Transform.
# This also includes a series of phases x to pass through the function.
def create_fourier_series(x, coefficients):
    # Begin the series with the zeroth-order Fourier coefficient.
    fourier_series = coefficients[0][0] / 2
    # Now generate the first through n_coeff'th terms. The period is defined to be 1 since we're operating in phase
    # space.
    for n in range(1, n_coeff):
        fourier_series += (coefficients[n][0] * np.cos(2 * np.pi * n * x) + coefficients[n][1] *
                           np.sin(2 * np.pi * n * x))
    return fourier_series
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, n_bins) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Determine the fast Fourier transform for this test data.
fast_fourier_transform = np.fft.fft(test_data[n_bins // 2:] + test_data[:n_bins // 2])
# Create an empty list to hold the values of the Fourier coefficients.
fourier_coeff = []
# Loop through the FFT and pick out the a and b coefficients, which are the real and imaginary parts of the
# coefficients calculated by the FFT.
for n in range(0, n_coeff):
    a = 2 * fast_fourier_transform[n].real / n_bins
    b = -2 * fast_fourier_transform[n].imag / n_bins
    fourier_coeff.append([a, b])
# Create the Fourier series approximating this data.
fourier_series = create_fourier_series(test_array, fourier_coeff)
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
# Plot the Fourier series approximation.
ax.plot(test_array, fourier_series, color="b", lw=0.5)
This outputs the following:
Note that the way I built the FFT input (passing the second half of the data followed by the first half) is a consequence of how this data was generated. Specifically, the data runs from -0.5 to 0.5 in phase, but the FFT assumes it runs from 0.0 to 1.0, necessitating this shift.
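As an aside, the same reordering can be done with NumPy's ifftshift; a minimal sketch, assuming the test_data list and n_bins from the code above:
import numpy as np
# For this odd-length list, ifftshift rolls the second half in front of the first,
# which matches test_data[n_bins // 2:] + test_data[:n_bins // 2].
fast_fourier_transform = np.fft.fft(np.fft.ifftshift(test_data))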
I've found that this works quite well for data that doesn't include very sharp and narrow discontinuities. I would be interested to hear if anyone has another suggested solution to this problem, and I hope people find this explanation clear and helpful.
Not sure if it helps you in any way; I wrote a program to interpolate your data. This is done using buildingblocks==0.0.15.
Please see below:
import matplotlib.pyplot as plt
from buildingblocks import bb
import numpy as np
Ydata = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
Xdata=list(range(0,len(Ydata)))
Xnew=list(np.linspace(0,len(Ydata),200))
Ynew=bb.interpolate(Xdata,Ydata,Xnew,40)
plt.figure()
plt.plot(Xdata,Ydata)
plt.plot(Xnew,Ynew,'*')
plt.legend(['Given Data', 'Interpolated Data'])
plt.show()
Should you want to write further code, I have also given a snippet so that you can look at the source code of the module and learn from it:
import module
import inspect
src = inspect.getsource(module)
print(src)
Related
I am conducting PCA on a dataset. I am attempting to add a line in my 3d graph which shows the first principal component. I have tried a few methods but have not been able to display the first principal component as a line in my 3d graph. Any help is greatly appreciated. My code is as follows:
import numpy as np
np.set_printoptions (suppress=True, precision=5, linewidth=150)
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
file_name = 'C:/Users/data'
input_data = pd.read_csv (file_name + '.csv', header=0, index_col=0)
A = input_data.A.values.astype(float)
B = input_data.B.values.astype(float)
C = input_data.C.values.astype(float)
D = input_data.D.values.astype(float)
E = input_data.E.values.astype(float)
F = input_data.F.values.astype(float)
X = np.column_stack((A, B, C, D, E, F))
ncompo = int (input ("Number of components to study: "))
print("")
pca = PCA (n_components = ncompo)
pcafit = pca.fit(X)
cov_mat = np.cov(X, rowvar=0)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
perc = pcafit.explained_variance_ratio_
perc_x = range(1, len(perc)+1)
plt.plot(perc_x, perc)
plt.xlabel('Components')
plt.ylabel('Percentage of Variance Explained')
plt.show()
#3d Graph
plt.clf()
le = LabelEncoder()
le.fit(input_data.Grade)
number = le.transform(input_data.Grade)
colormap = np.array(['green', 'blue', 'red', 'yellow'])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(D, E, F, c=colormap[number])
ax.set_xlabel('D')
ax.set_ylabel('E')
ax.set_zlabel('F')
plt.title('PCA')
plt.show()
Some remarks to begin with:
You are computing PCA twice! Computing PCA means computing the eigenvalues and eigenvectors of the covariance matrix. So either you use the sklearn function pca.fit, or you do it yourself. But you don't need to do both, unless you want to look inside pca.fit and see for yourself that it does exactly what you expect it to do (if that is what you wanted, fine; that kind of checking is a good thing to do, and I did it once myself). Of course pca.fit has another advantage: once you have it, it also provides pca.transform to project points into component space. But that too is simply a change of basis using the eigenvector matrix.
The pca object lets you get the eigenvectors (pca.components_) and the eigenvalues (pca.explained_variance_).
pca.fit is an 'in-place' method: it does not return a new PCA object, it just fits the one you have. So there is no need to assign pcafit and use it.
This is not a minimal reproducible example as required on SO. We should be able to copy it, paste it, and run it, to see exactly your problem, not to guess what kind of secret data you have. And at the same time, it should be minimal. So it should contain example data generation (it doesn't matter if those data don't make sense; sometimes that is even better, since it allows some testing. In my code below, I generate my own noisy data along an axis, which allows me to verify that, indeed, I am able to "guess" what that axis was). Plus, since your problem concerns only the 3D plot, there is no need to include the plotting of explained variance here. That part is not part of your question.
Now, to draw the principal component: well, you already did the hard part. Twice. That is, you computed it. It is the eigenvector associated with the largest eigenvalue.
With the pca object there is no need to search for it; the components are already sorted. So it is simply pca.components_[0]. And since you want to plot in the space D, E, F, you simply need to draw the vector pca.components_[0][3:].
With correct scaling.
You can do that with plot, providing just two points (the first and the last).
Here is my version (which, by the way, also shows what a minimal reproducible example is):
import numpy as np
np.set_printoptions (suppress=True, precision=5, linewidth=150)
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# Generation of random data along a given vector
vec=np.array([1, -1, 0.5, -0.5, 0.75, 0.75]).reshape(-1,1)
# 10000 random data, that are U[0,10]×vec + gaussian noise std=1
X=(vec*np.random.rand(10000)*10 + np.random.normal(0,1,(6,10000))).T
(A,B,C,D,E,F)=X.T
input_data = pd.DataFrame({'A':A,'B':B,'C':C,'D':D,'E':E, 'F':F, 'Grade':np.random.randint(1,5, (10000,))})
ncompo=6
pca = PCA (n_components = ncompo)
pca.fit(X)
# Redundant
cov_mat = np.cov(X, rowvar=0)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
# See
print("Eigen values")
print(eig_vals)
print(pca.explained_variance_)
print("Eigen vec")
print(eig_vecs)
print(pca.components_)
# Note, compare first components to
print("Main component")
print(vec/np.linalg.norm(vec))
print(pca.components_[0])
#3d Graph
le = LabelEncoder()
le.fit(input_data.Grade)
number = le.transform(input_data.Grade)
fig = plt.figure()
colormap = np.array(['green', 'blue', 'red', 'yellow'])
ax = fig.add_subplot(111, projection='3d')
ax.scatter(D, E, F, c=colormap[number])
U=pca.components_[0]
sc1=max(D)/U[3]
sc2=min(D)/U[3]
# Draw the 1st principal component as a blue line
ax.plot([sc1*U[3],sc2*U[3]], [sc1*U[4], sc2*U[4]], [sc1*U[5], sc2*U[5]], linewidth=3)
ax.set_xlabel('D')
ax.set_ylabel('E')
ax.set_zlabel('F')
plt.title('PCA')
plt.show()
My example is not that minimal, because I took advantage of it to illustrate my first remark, and also computed PCA twice, to compare both results.
So, here I print the eigenvalues:
Eigen values
[30.88941 1.01334 0.99512 0.96493 0.97692 0.98101]
[30.88941 1.01334 0.99512 0.98101 0.97692 0.96493]
(the 1st being your computation by diagonalisation of the covariance matrix, the 2nd pca.explained_variance_)
As you can see, they are the same, except for the sorting of the 1st one.
Likewise,
Eigen vec
[[-0.52251 -0.27292 0.40863 -0.06321 0.26699 0.6405 ]
[ 0.52521 0.07577 -0.34211 0.27583 -0.04161 0.72357]
[-0.26266 -0.41332 -0.60091 0.38027 0.47573 -0.16779]
[ 0.26354 -0.52548 0.47284 0.59159 -0.24029 -0.15204]
[-0.39493 0.63946 0.07496 0.64966 -0.08619 0.00252]
[-0.3959 -0.25276 -0.35452 -0.0572 -0.79718 0.12217]]
[[ 0.52251 -0.52521 0.26266 -0.26354 0.39493 0.3959 ]
[-0.27292 0.07577 -0.41332 -0.52548 0.63946 -0.25276]
[-0.40863 0.34211 0.60091 -0.47284 -0.07496 0.35452]
[-0.6405 -0.72357 0.16779 0.15204 -0.00252 -0.12217]
[-0.26699 0.04161 -0.47573 0.24029 0.08619 0.79718]
[-0.06321 0.27583 0.38027 0.59159 0.64966 -0.0572 ]]
Also the same, except for sorting and transposition.
Eigenvectors are presented column-wise when you diagonalise a matrix,
whereas for pca.components_ each row is an eigenvector.
But you can see that in the 1st matrix, the eigenvector associated with the biggest eigenvalue, that is, since the biggest eigenvalue was the 1st one, the 1st column (-0.52, 0.52, etc.),
is the same as the first row of pca.components_ (up to an irrelevant ×-1 factor).
Likewise, the 4th biggest eigenvalue in your diagonalisation was the last one.
And if you look at the last column of your eigenvectors (0.64, 0.72, -0.17, ...), it is the same as the 4th row of pca.components_ (again with an irrelevant ×-1 factor).
So, long story short, you already have the eigenvalues in pca.explained_variance_, sorted from the biggest to the smallest, and the eigenvectors in pca.components_, in the same order.
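If you want to check that programmatically, here is a small sketch (assuming eig_vals, eig_vecs and pca from the code above):
# Sort the hand-computed eigenvalues in descending order and compare with sklearn's.
order = np.argsort(eig_vals)[::-1]
print(np.allclose(eig_vals[order], pca.explained_variance_))
# Compare the eigenvectors up to sign: columns of eig_vecs vs rows of pca.components_.
for i, j in enumerate(order):
    v, w = eig_vecs[:, j], pca.components_[i]
    print(i, np.allclose(v, w) or np.allclose(v, -w))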
The last thing I print here is a comparison between the first component (pca.components_[0]) and the vector I used to generate the data in the first place (my data are all collinear to a vector vec, plus Gaussian noise).
Main component
[[ 0.52523]
[-0.52523]
[ 0.26261]
[-0.26261]
[ 0.39392]
[ 0.39392]]
[ 0.52251 -0.52521 0.26266 -0.26354 0.39493 0.3959 ]
As expected, PCA correctly found that main axis.
So, those were just side comments.
What you were really looking for is:
ax.plot([sc1*U[3],sc2*U[3]], [sc1*U[4], sc2*U[4]], [sc1*U[5], sc2*U[5]], linewidth=3)
sc1 and sc2 are just scaling factors (here I chose them so that the line scales approximately like the data). Another way would have been to set ax.set_xlim, ax.set_ylim, ax.set_zlim from D.min(), D.max(), E.min(), E.max(), etc.,
and then just use big values for sc1 and sc2, like:
sc1=1000
sc2=-1000
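For completeness, a rough sketch of that alternative (assuming ax, U, D, E and F from the code above):
# Pin the axis limits to the data, then use large scale factors so the line spans the whole view.
ax.set_xlim(D.min(), D.max())
ax.set_ylim(E.min(), E.max())
ax.set_zlim(F.min(), F.max())
sc1, sc2 = 1000, -1000
ax.plot([sc1 * U[3], sc2 * U[3]], [sc1 * U[4], sc2 * U[4]], [sc1 * U[5], sc2 * U[5]], linewidth=3)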
I have been thinking about it for a long time, but I can't figure out what the problem is. I hope you can help me, thank you.
The Gaussian function F(s):
F(s) = 1/(√(2π)·s) · e^(-(w-μ)²/(2s²))
Code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fft import fft
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, np.real(np.fft.fft(F_S(w, 0, 1))))
plt.show()
Result:
As was mentioned before, you want the absolute value, not the real part.
A minimal example, showing the re/im and abs/phase spectra:
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
n=1001 # add 1 to keep the interval a round number when using linspace
t = np.linspace(-5, 5, n ) # presumed to be time
dt=t[1]-t[0] # time resolution
print(f'sampling every {dt:.3f} sec , so at {1/dt:.1f} Sa/sec, max. freq will be {1/2/dt:.1f} Hz')
y = np.exp(-(t**2)/0.01) # signal in time
fr= np.fft.fftshift(np.fft.fftfreq(n, dt)) # shift helps with sorting the frequencies for better plotting
ft=np.fft.fftshift(np.fft.fft(y)) # fftshift only necessary for plotting in sequence
p.figure(figsize=(20,12))
p.subplot(231)
p.plot(t,y,'.-')
p.xlabel('time (secs)')
p.title('signal in time')
p.subplot(232)
p.plot(fr,np.abs(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, abs');
p.subplot(233)
p.plot(fr,np.real(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, real');
p.subplot(235)
p.plot(fr,np.angle(ft), '.-', lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, phase');
p.subplot(236)
p.plot(fr,np.imag(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, imag');
You have to change from the time scale to the frequency scale.
When you take an FFT you will get a symmetric transformation, i.e., a mirror of the positive-frequency curve onto the negative side. Usually you will only look at the positive side.
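For real-valued input this mirror symmetry holds up to complex conjugation; a tiny illustration:
import numpy as np
y = np.random.rand(8)                      # any real-valued signal
Y = np.fft.fft(y)
# The negative-frequency half is the complex conjugate mirror of the positive half.
print(np.allclose(Y[1:], np.conj(Y[1:][::-1])))  # True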
Also, you should take care with the sample rate: since the FFT transforms time-domain input into the frequency domain, the time step (sample rate) of the input matters. So pass your sample spacing via np.fft.fftfreq(n, d=timestep) to get the frequency axis.
If you simply want to take the FFT of a Gaussian signal, here is another question about it with some good explanations of why you are getting this behaviour:
Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python
There are two mistakes in your code:
Don't take the real part; take the absolute value when plotting.
From the docs:
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean
of the signal), which is always purely real for real inputs. Then
A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains
the negative-frequency terms, in order of decreasingly negative
frequency.
You can rearrange the elements with np.fft.fftshift.
The working code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fftpack import fft, fftshift
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))))
plt.show()
Also, you might want to consider scaling the x axis too.
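For instance, a small sketch of one way to do that, continuing from the working code above (F_S and w already defined); the frequency axis here is in cycles per unit of w:
dw = w[1] - w[0]                                   # sample spacing of the w grid
freqs = fftshift(np.fft.fftfreq(len(w), d=dw))     # frequency axis matching the shifted spectrum
plt.plot(freqs, fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))))
plt.xlabel('frequency')
plt.show()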
Could you please advise me on the following:
I gather data from an Arduino ADC and store the data in a list on a Raspberry Pi 4 with Python 3.
The list is called 'dataList' and contains 1024 10-bit samples. This all works fine: I can reproduce the sampled signal on the Raspberry.
I would like to obtain the power spectrum of the acquired signal using the NumPy FFT.
I tried the following:
[see below]
This should illustrate what I'm trying to do; however this produces incoherent output. The sampled signal has a frequency of about 300 Hz. I would be very grateful for any hints in the right direction!
def show_FFT(window):
    fft = np.fft.fft(dataList, 1024, -1, None)
    for X_value in range(0, 512, 1):
        Y_value = fft[X_value]
        gfxdraw.pixel(window, X_value, int(abs(Y_value)), black)
As you mentioned in your question, you have a data set with X starting from 0 to ..., but for numpy.fft.fft you must keep in mind that it is a discrete Fourier transform (DFT), which calculates the FFT of equally spaced samples, and I must mention that it should be a symmetric range of data from -x to x. You can simply try it with a Gaussian function, change the parameters as you wish, and see the results.
Since you didn't give any data set here, I would refer you to a general case with the code below:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
# create data from dataframes
x = np.random.rand(50)  # unequally spaced measurements
x.sort()
y = np.exp(-x*x)  # measured signal
Based on the answer here, you can resample your data into equally spaced points by:
f = interpolate.interp1d(x, y)
num = 500
xx = np.linspace(x[0], x[-1], num)
yy = f(xx)
plt.close('all')
plt.plot(x,y,'bo')
plt.plot(xx,yy, 'g.-')
plt.show()
Then you can make your x data symmetric very simply by:
x = xx
y = yy
# Centre the x values around zero.
xsample = x - ((x.max() - x.min()) / 2)
xsample = xsample - (xsample.max() + xsample.min()) / 2
x = xsample
Then, if you try the FFT, you will get the correct results, as in:
ysample =yy
ysample_fft = np.fft.fftshift(np.abs(np.fft.fft(ysample / ysample.max()))) / np.sqrt(len(ysample))
plt.plot(xsample,ysample_fft/ysample_fft.max(),'b--')
plt.show()
I have some data I gathered analyzing the change of acceleration with respect to time. But when I wrote the code below to get a good fit for the sinusoidal wave, this was the result. Is this because I don't have enough data, or am I doing something wrong here?
Here you can see my graph:
Measurements plotted directly (no fit)
Fit with horizontal and vertical shift (curve_fit)
Increased data by linspace
Manually manipulated amplitude
Edit: I increased the data size by using the linspace function and plotting it, but I am not sure why the amplitude doesn't match. Is it because there are very few data points to analyze? (I was able to manipulate the amplitude manually, but I don't understand why the fit can't do it.)
The code I am using for the fit:
def model(x, a, b):
    return a * np.sin(b * x)

param, param_cov = cf(model, time, z_values)
array_x = np.linspace(800, 1400, 1000)
fig = plt.figure(figsize=(9, 4))
plt.scatter(time, z_values, color="#3333cc", label="Data")
plt.plot(array_x, model(array_x, *param), label="Sin Fit")
I'd use an FFT to get a first guess at the parameters, as this sort of thing is highly non-linear and curve_fit is unlikely to get very far otherwise. The reason for using an FFT is to get an initial idea of the frequency involved, not much more. 3Blue1Brown has a great video on FFTs if you've not seen it.
I used web plot digitizer to get your data out of your plots, then pulled into Python and made sure it looked OK with:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('sinfit2.csv')
print(df.head())
giving me:
x y
0 809.3 0.3
1 820.0 0.3
2 830.3 19.6
3 839.9 19.6
4 849.6 0.4
I started by doing a basic FFT with NumPy (SciPy has the full fftpack which is more complete, but not needed here):
import numpy as np
from numpy.fft import fft
d = fft(df.y)
plt.plot(np.abs(d)[:len(d)//2], '.')
The np.abs(d) is because you get complex numbers back containing both phase and amplitude, and [:len(d)//2] is because (for real-valued input) the output is symmetric about the midpoint, i.e. abs(d[5]) == abs(d[-5]).
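(As an aside, for real-valued input np.fft.rfft returns just that non-redundant half directly; a small sketch, assuming the same df as above:)
from numpy.fft import rfft
d_half = rfft(df.y)                         # length len(df)//2 + 1, non-negative frequencies only
print(np.argmax(np.abs(d_half[1:])) + 1)    # index of the dominant non-DC component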
This says the largest component was at index 18; I tried plotting this by hand and it looked OK:
x = np.linspace(0, np.pi * 2, len(df))
plt.plot(df.x, df.y, '.-', lw=1)
plt.plot(df.x, np.sin(x * 18) * 10 + 10)
I'm multiplying by 10 and adding 10 because the range of a sine is (-1, +1) and we need to take it to (0, 20).
Next, I passed these to curve_fit with a simplified model to help it along:
from scipy.optimize import curve_fit
def model(x, a, b):
    return np.sin(x * a + b) * 10 + 10
(a, b), cov = curve_fit(model, x, df.y, [18, 0])
Again I'm hardcoding the * 10 + 10 to get the range to match your data, which gives me a=17.8 and b=2.97.
Finally, I plot the function sampled at a higher frequency to make sure all is OK:
plt.plot(df.x, df.y)
plt.plot(
np.linspace(810, 1400, 501),
model(np.linspace(0, np.pi*2, 501), a, b)
)
giving me:
which seems to look OK. Note that you might want to change these parameters so they fit your original X, and note my df.x starts at 810, so I might have missed the first point.
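If it helps, a rough sketch of how the same model could be expressed directly in the original x units (assuming the df, np and curve_fit names from above; the linear rescaling of x onto [0, 2π] is my own assumption, mirroring the fit above):
def model_orig(x_orig, a, b):
    # Map the original x range onto [0, 2*pi] before applying the sine.
    x_scaled = (x_orig - df.x.min()) / (df.x.max() - df.x.min()) * 2 * np.pi
    return np.sin(x_scaled * a + b) * 10 + 10

(a2, b2), cov2 = curve_fit(model_orig, df.x, df.y, [18, 0])
plt.plot(df.x, df.y, '.-', lw=1)
plt.plot(np.linspace(810, 1400, 501), model_orig(np.linspace(810, 1400, 501), a2, b2))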
I am trying to plot a normal distribution curve using Python. First I did it manually using the normal probability density function, and then I found there's an existing function pdf in SciPy's stats module. However, the results I get are quite different.
Below is the example that I tried:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
mean = 5
std_dev = 2
num_dist = 50
# Draw random samples from a normal (Gaussian) distribution
normalDist_dataset = np.random.normal(mean, std_dev, num_dist)
# Sort these values.
normalDist_dataset = sorted(normalDist_dataset)
# Create the bins and histogram
plt.figure(figsize=(15,7))
count, bins, ignored = plt.hist(normalDist_dataset, num_dist, density=True)
new_mean = np.mean(normalDist_dataset)
new_std = np.std(normalDist_dataset)
normal_curve1 = stats.norm.pdf(normalDist_dataset, new_mean, new_std)
normal_curve2 = (1/(new_std *np.sqrt(2*np.pi))) * (np.exp(-(bins - new_mean)**2 / (2 * new_std**2)))
plt.plot(normalDist_dataset, normal_curve1, linewidth=4, linestyle='dashed')
plt.plot(bins, normal_curve2, linewidth=4, color='y')
The result shows how the two curves I get are very different from each other.
My guess is that it has something to do with the bins, or that pdf behaves differently from the usual formula. I have used the same (new) mean and standard deviation for both plots. So, how do I change my code to match what stats.norm.pdf is doing?
I don't know yet which curve is correct.
The plot function simply connects the dots with line segments. Your bins do not have enough points to show a smooth curve. Possible solution:
....
normal_curve1 = stats.norm.pdf(normalDist_dataset, new_mean, new_std)
bins = normalDist_dataset # Add this line
normal_curve2 = (1/(new_std *np.sqrt(2*np.pi))) * (np.exp(-(bins - new_mean)**2 / (2 * new_std**2)))
....
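Alternatively, a small sketch that evaluates both curves on a dense grid instead of reusing the sample points or bins (assuming the variables from the question):
x_dense = np.linspace(min(normalDist_dataset), max(normalDist_dataset), 500)
normal_curve1 = stats.norm.pdf(x_dense, new_mean, new_std)
normal_curve2 = (1/(new_std * np.sqrt(2*np.pi))) * np.exp(-(x_dense - new_mean)**2 / (2 * new_std**2))
plt.plot(x_dense, normal_curve1, linewidth=4, linestyle='dashed')
plt.plot(x_dense, normal_curve2, linewidth=4, color='y')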