NP.FFT on python list - python

Could you please advise me on the following:
I gather data from an Arduino ADC and store the data in a list on a Raspberry Pi 4 with Python 3.
The list is called 'dataList' and contains 1024 10 bits samples. This all works fine: I can reproduce the sampled signal on the Raspberry.
I would like to use the power spectrum of the acquired signal using numpy FFT.
I tried the following:
[see below]
This should illustrate what I'm trying to do; however this produces incoherent output. The sampled signal has a frequency of about 300 Hz. I would be very grateful for any hints in the right direction!
def show_FFT(window):
fft = np.fft.fft (dataList, 1024, -1, None)
for X_value in range (0,512, 1):
Y_value = fft ([X_value]
gfxdraw.pixel (window, X_value, int(abs(Y_value), black)

As you mentioned in your question, you have a data set whith X starting from 0 to... but for numpy.fft.fft you must keep in mind that it is a discrete Fourier transform (DFT) which caculate the fft of equaly spaced samples and i must mntion that it must be a symetric range of dataset from -x to x. You can simply try it with a gausian finction and change the parameters as you wish and see what are the results...
Since you didn''t give any data set here , I would refer you to a generl case with below code:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
# create data from dataframes
x = np.random.rand(50) #unequaly spaced measurment
x.sort()
y = np.exp(-x*x) #measured signal
based on the answer here you can resample your data into equaly spaced points by:
f = interpolate.interp1d(x, y)
num = 500
xx = np.linspace(x[0], x[-1], num)
yy = f(xx)
plt.close('all')
plt.plot(x,y,'bo')
plt.plot(xx,yy, 'g.-')
plt.show()
enter image description here
then you can make your x data symetric very simply by :
x=xx
y=yy
xsample = x-((x.max()-x.min())/2)
xsample=xsample-(xsample.max()+xsample.min())/2
x=xsample
thne if you try fft you will get the corect results as:
ysample =yy
ysample_fft = np.fft.fftshift(np.abs(np.fft.fft(ysample/ysample.max()))) /
np.sqrt(len(ysample))
plt.plot(xsample,ysample_fft/ysample_fft.max(),'b--')
plt.show()
enter image description here

Related

Time series dBFS plot output modification - current output plot not as expected (matplotlib)

I'm trying to plot the Amplitude (dBFS) vs. Time (s) plot of an audio (.wav) file using matplotlib. I managed to do that with the following code:
def convert_to_decibel(sample):
ref = 32768 # Using a signed 16-bit PCM format wav file. So, 2^16 is the max. value.
if sample!=0:
return 20 * np.log10(abs(sample) / ref)
else:
return 20 * np.log10(0.000001)
from scipy.io.wavfile import read as readWav
from scipy.fftpack import fft
import matplotlib.pyplot as gplot1
import matplotlib.pyplot as gplot2
import numpy as np
import struct
import gc
wavfile1 = '/home/user01/audio/speech.wav'
wavsamplerate1, wavdata1 = readWav(wavfile1)
wavdlen1 = wavdata1.size
wavdtype1 = wavdata1.dtype
gplot1.rcParams['figure.figsize'] = [15, 5]
pltaxis1 = gplot1.gca()
gplot1.axhline(y=0, c="black")
gplot1.xticks(np.arange(0, 10, 0.5))
gplot1.yticks(np.arange(-200, 200, 5))
gplot1.grid(linestyle = '--')
wavdata3 = np.array([convert_to_decibel(i) for i in wavdata1], dtype=np.int16)
yvals3 = wavdata3
t3 = wavdata3.size / wavsamplerate1
xvals3 = np.linspace(0, t3, wavdata3.size)
pltaxis1.set_xlim([0, t3 + 2])
pltaxis1.set_title('Amplitude (dBFS) vs Time(s)')
pltaxis1.plot(xvals3, yvals3, '-')
which gives the following output:
I had also plotted the Power Spectral Density (PSD, in dBm) using the code below:
from scipy.signal import welch as psd # Computes PSD using Welch's method.
fpsd, wPSD = psd(wavdata1, wavsamplerate1, nperseg=1024)
gplot2.rcParams['figure.figsize'] = [15, 5]
pltpsdm = gplot2.gca()
gplot2.axhline(y=0, c="black")
pltpsdm.plot(fpsd, 20*np.log10(wPSD))
gplot2.xticks(np.arange(0, 4000, 400))
gplot2.yticks(np.arange(-150, 160, 10))
pltpsdm.set_xlim([0, 4000])
pltpsdm.set_ylim([-150, 150])
gplot2.grid(linestyle = '--')
which gives the output as:
The second output above, using the Welch's method plots a more presentable output. The dBFS plot though informative is not very presentable IMO. Is this because of:
the difference in the domains (time in case of 1st output vs frequency in the 2nd output)?
the way plot function is implemented in pyplot?
Also, is there a way I can plot my dBFS output as a peak-to-peak style of plot just like in my PSD (dBm) plot rather than a dense stem plot?
Would be much helpful and would appreciate any pointers, answers or suggestions from experts here as I'm just a beginner with matplotlib and plots in python in general.
TLNR
This has nothing to do with pyplot.
The frequency domain is different from the time domain, but that's not why you didn't get what you want.
The calculation of dbFS in your code is wrong.
You should frame your data, calculate RMSs or peaks in every frame, and then convert that value to dbFS instead of applying this transformation to every sample point.
When we talk about the amplitude, we are talking about a periodic signal. And when we read in a series of data from a sound file, we read in a series of sample points of a signal(may be or be not periodic). The value of every sample point represents a, say, voltage value, or sound pressure value sampled at a specific time.
We assume that, within a very short time interval, maybe 10ms for example, the signal is stationary. Every such interval is called a frame.
Some specific function is applied to each frame usually, to reduce the sudden change at the edge of this frame, and these functions are called window functions. If you did nothing to every frame, you added rectangle windows to them.
An example: when the sampling frequency of your sound is 44100Hz, in a 10ms-long frame, there are 44100*0.01=441 sample points. That's what the nperseg argument means in your psd function but it has nothing to do with dbFS.
Given the knowledge above, now we can talk about the amplitude.
There are two methods a get the value of amplitude in every frame:
The most straightforward one is to get the maximum(peak) values in every frame.
Another one is to calculate the RMS(Root Mean Sqaure) of every frame.
After that, the peak values or RMS values can be converted to dbFS values.
Let's start coding:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
# Determine full scall(maximum possible amplitude) by bit depth
bit_depth = 16
full_scale = 2 ** bit_depth
# dbFS function
to_dbFS = lambda x: 20 * np.log10(x / full_scale)
# Read in the wave file
fname = "01.wav"
fs,data = wavfile.read(fname)
# Determine frame length(number of sample points in a frame) and total frame numbers by window length(how long is a frame in seconds)
window_length = 0.01
signal_length = data.shape[0]
frame_length = int(window_length * fs)
nframes = signal_length // frame_length
# Get frames by broadcast. No overlaps are used.
idx = frame_length * np.arange(nframes)[:,None] + np.arange(frame_length)
frames = data[idx].astype("int64") # Convert to in 64 to avoid integer overflow
# Get RMS and peaks
rms = ((frames**2).sum(axis=1)/frame_length)**.5
peaks = np.abs(frames).max(axis=1)
# Convert them to dbfs
dbfs_rms = to_dbFS(rms)
dbfs_peak = to_dbFS(peaks)
# Let's start to plot
# Get time arrays of every sample point and ever frame
frame_time = np.arange(nframes) * window_length
data_time = np.linspace(0,signal_length/fs,signal_length)
# Plot
f,ax = plt.subplots()
ax.plot(data_time,data,color="k",alpha=.3)
# Plot the dbfs values on a twin x Axes since the y limits are not comparable between data values and dbfs
tax = ax.twinx()
tax.plot(frame_time,dbfs_rms,label="RMS")
tax.plot(frame_time,dbfs_peak,label="Peak")
tax.legend()
f.tight_layout()
# Save serval details
f.savefig("whole.png",dpi=300)
ax.set_xlim(1,2)
f.savefig("1-2sec.png",dpi=300)
ax.set_xlim(1.295,1.325)
f.savefig("1.2-1.3sec.png",dpi=300)
The whole time span looks like(the unit of the right axis is dbFS):
And the voiced part looks like:
You can see that the dbFS values become greater while the amplitudes become greater at the vowel start point:

Determining Fourier Coefficients from Time Series Data

I asked a since deleted question regarding how to determine Fourier coefficients from time series data. I am resubmitting this because I have better formulated the problem and have a solution that I'll give as I think others may find this very useful.
I have some time series data that I have binned into equally spaced time bins (a fact which will be crucial to my solution), and from that data I want to determine the Fourier series (or any function, really) that best describes the data. Here is a MWE with some test data to show the data I'm trying to fit:
import numpy as np
import matplotlib.pyplot as plt
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, 101) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
This outputs the following:
The question is how to determine the Fourier series best describing this data. The usual formula for determining the Fourier coefficients requires inserting a function into an integral, but if I had a function to describe the data I wouldn't need the Fourier coefficients at all; the whole point of finding this series is to have a functional representation of the data. In the absence of such a function, then, how are the coefficients found?
My solution to this problem is to apply a discrete Fourier transform to the data using NumPy's implementation of the Fast Fourier Transform, numpy.fft.fft(); this is why it's critical that the data is evenly spaced in time, as FFT requires this. While the FFT is typically used to perform analysis of the frequency spectrum, the desired Fourier coefficients are directly related to the output of this function.
Specifically, this function outputs a series of i complex-valued coefficients c. The Fourier series coefficients are found using the relations:
Therefore the FFT allows the Fourier coefficients to be directly computed. Here is the MWE of my solution to this problem, expanding the example given above:
import numpy as np
import matplotlib.pyplot as plt
# Set the number of equal-time bins to create.
n_bins = 101
# Set the number of Fourier coefficients to use.
n_coeff = 51
# Define a function to generate a Fourier series based on the coefficients determined by the Fast Fourier Transform.
# This also includes a series of phases x to pass through the function.
def create_fourier_series(x, coefficients):
# Begin the series with the zeroeth-order Fourier coefficient.
fourier_series = coefficients[0][0] / 2
# Now generate the first through n_coeff'th terms. The period is defined to be 1 since we're operating in phase
# space.
for n in range(1, n_coeff):
fourier_series += (fourier_coeff[n][0] * np.cos(2 * np.pi * n * x) + fourier_coeff[n][1] *
np.sin(2 * np.pi * n * x))
return fourier_series
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, n_bins) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Determine the fast Fourier transform for this test data.
fast_fourier_transform = np.fft.fft(test_data[n_bins / 2:] + test_data[:n_bins / 2])
# Create an empty list to hold the values of the Fourier coefficients.
fourier_coeff = []
# Loop through the FFT and pick out the a and b coefficients, which are the real and imaginary parts of the
# coefficients calculated by the FFT.
for n in range(0, n_coeff):
a = 2 * fast_fourier_transform[n].real / n_bins
b = -2 * fast_fourier_transform[n].imag / n_bins
fourier_coeff.append([a, b])
# Create the Fourier series approximating this data.
fourier_series = create_fourier_series(test_array, fourier_coeff)
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
# Plot the Fourier series approximation.
ax.plot(test_array, fourier_series, color="b", lw=0.5)
This outputs the following:
Note that how I defined the FFT (importing the second half of the data followed by the first half) is a consequence of how this data was generated. Specifically, the data runs from -0.5 to 0.5, but the FFT assumes it runs from 0.0 to 1.0, necessitating this shift.
I've found that this works quite well for data that doesn't include very sharp and narrow discontinuities. I would be interested to hear if anyone has another suggested solution to this problem, and I hope people find this explanation clear and helpful.
Not sure if it helps you in anyway; I wrote a programme to interpoplate your data. This is done using buildingblocks==0.0.15
Please see below,
import matplotlib.pyplot as plt
from buildingblocks import bb
import numpy as np
Ydata = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
Xdata=list(range(0,len(Ydata)))
Xnew=list(np.linspace(0,len(Ydata),200))
Ynew=bb.interpolate(Xdata,Ydata,Xnew,40)
plt.figure()
plt.plot(Xdata,Ydata)
plt.plot(Xnew,Ynew,'*')
plt.legend(['Given Data', 'Interpolated Data'])
plt.show()
Should you want to further write code, I have also give code so that you can see the source code and learn:
import module
import inspect
src = inspect.getsource(module)
print(src)

Fast Fourier Transform in Python

I am new to the fourier theory and I've seen very good tutorials on how to apply fft to a signal and plot it in order to see the frequencies it contains. Somehow, all of them create a mix of sines as their data and i am having trouble adapting it to my real problem.
I have 242 hourly observations with a daily periodicity, meaning that my period is 24. So I expect to have a peak around 24 on my fft plot.
A sample of my data.csv is here:
https://pastebin.com/1srKFpJQ
Data plotted:
My code:
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 / N, width=1.5) # 1 / N is a normalization factor
plt.show()
This outputs a very weird result where it seems I am getting the same value for every frequency.
I suppose that the problems comes with the definition of N, t and T but I cannot find anything online that has helped me understand this clearly. Please help :)
EDIT1:
With the code provided by charles answer I have a spike around 0 that seems very weird. I have used rfft and rfftfreq instead to avoid having too much frequencies.
I have read that this might be because of the DC component of the series, so after substracting the mean i get:
I am having trouble interpreting this, the spikes seem to happen periodically but the values in Hz don't let me obtain my 24 value (the overall frequency). Anybody knows how to interpret this ? What am I missing ?
The problem you're seeing is because the bars are too wide, and you're only seeing one bar. You will have to change the width of the bars to 0.00001 or smaller to see them show up.
Instead of using a bar chart, make your x axis using fftfreq = np.fft.fftfreq(len(s)) and then use the plot function, plt.plot(fftfreq, fft):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
fftfreq = np.fft.fftfreq(len(s))
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.plot(fftfreq,fft)
plt.show()

Apply a filter on an audio sample with python

I'm trying to migrate my Matlab code over Python, but I'm having some troubles with filtering functions.
Here's the context :
I created a butterworth filter by calling the following function (fc is the bandpass center frequency, q its quality factor and n its order) :
def bandpass(fc, q, n):
bw = fc / q
low = fc - bw/2
high = fc + bw/2
return butter(N=n, Wn=[low, high], btype='band', analog=True)
And I have retrieved data from an audio signal (mono, sampled at fs = 48000, on 16-bits integers). Both are giving what I'm expecting when I plot the frequency response of the filter or the amplitude spectrum of the audio sample.
Here's the code :
# Imports
import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as sw
from scipy import signal
from scipy.signal import butter, lfilter
# Read audio file
fs, y = sw.read(file)
# Nb of samples and time scale
N = len(y)
t = np.linspace(0, N/fs, N)
# Plot amplitude spectrum
plt.plot(t, y/max(y))
# Filter creation
b, a = bandpass(1000, 5, 2)
w, h = signal.freqs(b, a, np.logspace(0, 5, 20000))
# Plot frequential response
plt.semilogx(w, 20 * np.log10(abs(h)));
# Apply filter on audio signal
lfilter(b, a, y) # < Give unexepected results
Then comes the part in which I'm stuck : I'm trying to apply the butterworth filter on the audio sample but it doesn't seem to work as the filtered signal diverge towards infinity and eventually end up in NaN values for an unknown reason. I assume that I missed a step despite following the docs. I also tried filtfilt() because it appeared in some researchs I did but it didn't work either.
Here's the expected result I got with Matlab which I'm trying to replicate.
What am I missing ?
Thanks for your answers :)
Bonus question : How can I can achieve to plot this 3d view (view(-45,65) in Matlab) ?
You have a sampled (i.e. discrete-time) signal. To filter it, you must use a discrete filter, so the analog argument of butter must be False (which is the default).
To analyze the frequency response of the digital filter, use freqz, not freqs.
See How to implement band-pass Butterworth filter with Scipy.signal.butter for a related question and answer regarding a bandpass filter.

Curve fitting with large number of data points

this is quite a specific problem I was hoping the community could help me out with. Thanks in advance.
So I have 2 sets of data, one is experimental and the other is based off of an equation. I am trying to fit my data points to this curve and hence obtain the missing variables I am interested in. Namely, a and b in the Ebfit function.
Here is the code:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as spys
from scipy.optimize import curve_fit
time = [60,220,520,1840]
Moment = [0.64227262,0.468318916,0.197100772,0.104512508]
Temperature = 25 # Bake temperature in degrees C
Nb = len(Moment) # Number of bake measurements
Baketime_a = time #[s]
N_Device = 10000 # No. of devices considered in the array
T_ambient = 273 + Temperature
kt = 0.0256*(T_ambient/298) # In units of eV
f0 = 1e9 # Attempt frequency
def Ebfit(x,a,b):
Eb_mean = a*(0.0256/kt) # Eb at bake temperature
Eb_sigma = b*Eb_mean
Foursigma = 4*Eb_sigma
Eb_a = np.linspace(Eb_mean-Foursigma,Eb_mean+Foursigma,N_Device)
dEb = Eb_a[1] - Eb_a[0]
pdfEb_a = spys.norm.pdf(Eb_a,Eb_mean,Eb_sigma)
## Retention Time
DMom = np.zeros(len(x),float)
tau = (1/f0)*np.exp(Eb_a)
for bb in range(len(x)):
DMom[bb]= (1 - 2*(sum(pdfEb_a*(1 - np.exp(np.divide(-x[bb],tau))))*dEb))
return DMom
a = 30
b = 0.10
params,extras = curve_fit(Ebfit,time,Moment)
x_new = list(range(0,2000,1))
y_new = Ebfit(x_new,params[0],params[1])
plt.plot(time,Moment, 'o', label = 'data points')
plt.plot(x_new,y_new, label = 'fitted curve')
plt.legend()
The main problem I am having is that the fitting of the data to the function does not work when I use large number of points. In the above code When I use the 4 points (time & moment), this code works fine.
I get the following values for a and b.
array([ 29.11832766, 0.13918353])
The expected values for a is (23-50) and b is (0.06 - 0.15). So these values are within the acceptable range. This is the corresponding plot:
However, when I use my actual experimental normalized data with about 500 points.
EDIT: This data:
Normalized Data
https://www.dropbox.com/s/64zke4wckxc1r75/Normalized%20Data.csv?dl=0
Raw Data
https://www.dropbox.com/s/ojgse5ibp59r8nw/Data1.csv?dl=0
I get the following values and plot for a and b which are out of the acceptable range,
array([-13.76687781, -12.90494196])
I know these values are wrong and if I were to do it manually (slowly adjusting values to obtain the proper fit) it would be around a=30.1 and b=0.09. And when plotted looks as such:
I have tried changing the initial guess values for a & b, other sets of experimental data as well and other suggestions in similar threads. None seem to work for me. Any help you can provide is appreciated. Thanks.
.
.
.
.
ADDITIONAL INFORMATION
The model I am trying to fit the data to comes from the following equation:
where Dmom = 1 - 2*Psw
a is the Eb value while b is the Sigma value where, Eb has a range of values determined by the probability density function and 4 times of the sigma values (i.e. Foursigma). This distribution is then summed up to use for the final equation.
It looks like you do need to play around with the initial guesses for a and b after all. Perhaps the function you're fitting is not very well behaved, which is why it's so prone to fail for intitial guesses away from the global minumum. That being said, here's a working example of how to fit your data:
import pandas as pd
data_df = pd.read_csv('data.csv')
time = data_df['Time since start, Time [s]'].values
moment = data_df['Signal X direction, Moment [emu]'].values
params, extras = curve_fit(Ebfit, time, moment, p0=[40, 0.3])
Yields the values of a and b of:
In [6]: params
Out[6]: array([ 30.47553689, 0.08839412])
Which results in a nicely aligned fit of a function.
x_big = np.linspace(1, 1800, 2000)
y_big = Ebfit(x_big, params[0], params[1])
plt.plot(time, moment, 'o', alpha=0.5, label='all points')
plt.plot(x_big, y_big, label = 'fitted curve')
plt.legend()
plt.show()

Categories