How can I fit this sinusoidal wave with my current data? - python

I have some data I gathered analyzing the change of acceleration regarding time. But when I wrote the code below to have a good fit for the sinusoidal wave, this was the result. Is this because I don't have enough data or am I doing something wrong here?
Here you can see my graph:
Measurements plotted directly(no fit)
Fit with horizontal and vertical shift (curve_fit)
Increased data by linspace
Manually manipulated amplitude
Edit: I increased the data size by using the linspace function and plotting it but I am not sure why the amplitude doesn't match, is it because there are very few data to analyze? (I was able to manipulate the amplitude manually but I don't understand why it can't do it)
The code I am using for the fit
def model(x, a, b):
return a * np.sin(b * x)
param, parav_cov = cf(model, time, z_values)
array_x = np.linspace(800, 1400, 1000)
fig = plt.figure(figsize = (9, 4))
plt.scatter(time, z_values, color = "#3333cc", label = "Data")
plt.plot(array_x, model(array_x, param[0], param[1], param[2], param[3]), label = "Sin Fit")

I'd use an FFT to get a first guess at parameters, as this sort of thing is highly non-linear and curve_fit is unlikely to get very far otherwise. the reason for using a FFT is to get an initial idea of the frequency involved, not much more. 3Blue1Brown has a great video on FFTs if you've not seem it
I used web plot digitizer to get your data out of your plots, then pulled into Python and made sure it looked OK with:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('sinfit2.csv')
print(df.head())
giving me:
x y
0 809.3 0.3
1 820.0 0.3
2 830.3 19.6
3 839.9 19.6
4 849.6 0.4
I started by doing a basic FFT with NumPy (SciPy has the full fftpack which is more complete, but not needed here):
import numpy as np
from numpy.fft import fft
d = fft(df.y)
plt.plot(np.abs(d)[:len(d)//2], '.')
the np.abs(d) is because you get a complex number back containing both phase and amplitude, and [:len(d)//2] is because (for real valued input) the output is symmetric about the midpoint, i.e. d[5] == d[-5].
this says the largest component was 18, I tried plotting this by hand and it looked OK:
x = np.linspace(0, np.pi * 2, len(df))
plt.plot(df.x, df.y, '.-', lw=1)
plt.plot(df.x, np.sin(x * 18) * 10 + 10)
I'm multiplying by 10 and adding 10 is because the range of a sine is (-1, +1) and we need to take it to (0, 20).
next I passed these to curve_fit with a simplified model to help it along:
from scipy.optimize import curve_fit
def model(x, a, b):
return np.sin(x * a + b) * 10 + 10
(a, b), cov = curve_fit(model, x, df.y, [18, 0])
again I'm hardcoding the * 10 + 10 to get the range to match your data, which gives me a=17.8 and b=2.97
finally I plot the function sampled at a higher frequency to make sure all is OK:
plt.plot(df.x, df.y)
plt.plot(
np.linspace(810, 1400, 501),
model(np.linspace(0, np.pi*2, 501), a, b)
)
giving me:
which seems to look OK. note you might want to change these parameters so they fit your original X, and note my df.x starts at 810, so I might have missed the first point.

Related

SciPy curve_fit displays straight line and is not fit to the data

I'm trying to fit a set of data to a CDF exponential function. However, I'm not sure what is going wrong either in my code or in the initial parameter guess, but it only creates a straight line. Data was imported from a CSV file.
#Plot Data
plt.figure(1,dpi=120)
plt.title("Cell A3")
plt.xlabel(rawdata[0][0])
plt.ylabel(rawdata[0][1])
plt.scatter(xdata,ydata,label="A3 Cell 1")
#Define Function
def func(t,lam):
return 1 - (np.exp(-lam * t))
funcdata = func(xdata,1.17)
plt.plot(xdata,funcdata,label="Model")
plt.legend()
#CurveFit data to model
popt, pcov = curve_fit(func,xdata,ydata,p0=(-0.64))
perr = np.sqrt(np.diag(pcov))
Image of the graph I get with the initial data and the straight line that the curve_fit gives
You cannot fit correctly such a simple exponential function of this kind :
y=( 1 - (np.exp(-lam * t)) ) * scale
to the data because the shape of this function is far to the shape of the data in the range of 0<t<5.
Better consider a function of the logistic kind, For example :
Think about your data and your function. ydata is quite a large value. What is the maximum value of
def func(t,lam):
return 1 - (np.exp(-lam * t))
I think you will find the max of the function occurs as lam approaches infinity, the function approaches 1. How can a function with max value == 1 fit data in the 1000s? If you want to be able to scale beyond 1, you need more parameters in your function. Try with
def func(t,lam,scale):
return ( 1 - (np.exp(-lam * t)) ) * scale
and see if scipy is able to better fit the data.
EDIT:
I mananaged to get that to work, however, you aren't even plotting the optimum parameters. To do that, see my code with simulated xdata and ydata:
#Plot Data
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
def func(t,lam,scale):
return ( 1 - (np.exp(-lam * t)) ) * scale
xdata = np.arange(25.)
ydata = func(xdata, 1.12, 2000.)
plt.figure(1,dpi=120)
plt.title("Cell A3")
plt.xlabel(rawdata[0][0])
plt.ylabel(rawdata[0][1])
plt.scatter(xdata,ydata,label="A3 Cell 1")
#CurveFit data to model
popt, pcov = curve_fit(func,xdata,ydata,p0=[0.5, 1000.1])
plt.plot(np.arange(25),func(np.arange(25), *popt),label="Model")
plt.legend()
outputs:

Determining Fourier Coefficients from Time Series Data

I asked a since deleted question regarding how to determine Fourier coefficients from time series data. I am resubmitting this because I have better formulated the problem and have a solution that I'll give as I think others may find this very useful.
I have some time series data that I have binned into equally spaced time bins (a fact which will be crucial to my solution), and from that data I want to determine the Fourier series (or any function, really) that best describes the data. Here is a MWE with some test data to show the data I'm trying to fit:
import numpy as np
import matplotlib.pyplot as plt
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, 101) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
This outputs the following:
The question is how to determine the Fourier series best describing this data. The usual formula for determining the Fourier coefficients requires inserting a function into an integral, but if I had a function to describe the data I wouldn't need the Fourier coefficients at all; the whole point of finding this series is to have a functional representation of the data. In the absence of such a function, then, how are the coefficients found?
My solution to this problem is to apply a discrete Fourier transform to the data using NumPy's implementation of the Fast Fourier Transform, numpy.fft.fft(); this is why it's critical that the data is evenly spaced in time, as FFT requires this. While the FFT is typically used to perform analysis of the frequency spectrum, the desired Fourier coefficients are directly related to the output of this function.
Specifically, this function outputs a series of i complex-valued coefficients c. The Fourier series coefficients are found using the relations:
Therefore the FFT allows the Fourier coefficients to be directly computed. Here is the MWE of my solution to this problem, expanding the example given above:
import numpy as np
import matplotlib.pyplot as plt
# Set the number of equal-time bins to create.
n_bins = 101
# Set the number of Fourier coefficients to use.
n_coeff = 51
# Define a function to generate a Fourier series based on the coefficients determined by the Fast Fourier Transform.
# This also includes a series of phases x to pass through the function.
def create_fourier_series(x, coefficients):
# Begin the series with the zeroeth-order Fourier coefficient.
fourier_series = coefficients[0][0] / 2
# Now generate the first through n_coeff'th terms. The period is defined to be 1 since we're operating in phase
# space.
for n in range(1, n_coeff):
fourier_series += (fourier_coeff[n][0] * np.cos(2 * np.pi * n * x) + fourier_coeff[n][1] *
np.sin(2 * np.pi * n * x))
return fourier_series
# Create a dependent test variable to define the x-axis of the test data.
test_array = np.linspace(0, 1, n_bins) - 0.5
# Define some test data to try to apply a Fourier series to.
test_data = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
# Determine the fast Fourier transform for this test data.
fast_fourier_transform = np.fft.fft(test_data[n_bins / 2:] + test_data[:n_bins / 2])
# Create an empty list to hold the values of the Fourier coefficients.
fourier_coeff = []
# Loop through the FFT and pick out the a and b coefficients, which are the real and imaginary parts of the
# coefficients calculated by the FFT.
for n in range(0, n_coeff):
a = 2 * fast_fourier_transform[n].real / n_bins
b = -2 * fast_fourier_transform[n].imag / n_bins
fourier_coeff.append([a, b])
# Create the Fourier series approximating this data.
fourier_series = create_fourier_series(test_array, fourier_coeff)
# Create a figure to view the data.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
# Plot the data.
ax.scatter(test_array, test_data, color="k", s=1)
# Plot the Fourier series approximation.
ax.plot(test_array, fourier_series, color="b", lw=0.5)
This outputs the following:
Note that how I defined the FFT (importing the second half of the data followed by the first half) is a consequence of how this data was generated. Specifically, the data runs from -0.5 to 0.5, but the FFT assumes it runs from 0.0 to 1.0, necessitating this shift.
I've found that this works quite well for data that doesn't include very sharp and narrow discontinuities. I would be interested to hear if anyone has another suggested solution to this problem, and I hope people find this explanation clear and helpful.
Not sure if it helps you in anyway; I wrote a programme to interpoplate your data. This is done using buildingblocks==0.0.15
Please see below,
import matplotlib.pyplot as plt
from buildingblocks import bb
import numpy as np
Ydata = [0.9783883464566918, 0.979599093567252, 0.9821424606299206, 0.9857575507812502, 0.9899278899999995,
0.9941848228346452, 0.9978438300395263, 1.0003009205426352, 1.0012208923679058, 1.0017130521235522,
1.0021799664031628, 1.0027475606936413, 1.0034168260869563, 1.0040914266144825, 1.0047781181102355,
1.005520348837209, 1.0061899214145387, 1.006846206627681, 1.0074483048543692, 1.0078691461988312,
1.008318736328125, 1.008446947572815, 1.00862051262136, 1.0085134881422921, 1.008337095516569,
1.0079539881889774, 1.0074857334630352, 1.006747783037474, 1.005962048923679, 1.0049115434782612,
1.003812267822736, 1.0026427549407106, 1.001251963531669, 0.999898555335968, 0.9984976286266923,
0.996995982142858, 0.9955652088974847, 0.9941647321428578, 0.9927727076023389, 0.9914750532544377,
0.990212467710371, 0.9891098035363466, 0.9875998927875242, 0.9828093773946361, 0.9722532524271845,
0.9574084365384614, 0.9411012303149601, 0.9251820309477757, 0.9121488392156851, 0.9033119748549322,
0.9002445803921568, 0.9032760564202343, 0.91192435882353, 0.9249696964980555, 0.94071381372549,
0.957139088974855, 0.9721083392156871, 0.982955287937743, 0.9880613320235758, 0.9897455322896282,
0.9909590626223097, 0.9922601592233015, 0.9936513112840472, 0.9951442427184468, 0.9967071285988475,
0.9982921493123781, 0.9998775465116277, 1.001389230174081, 1.0029109110251453, 1.0044033691406251,
1.0057110841487276, 1.0069551867704276, 1.008118776264591, 1.0089884470588228, 1.0098663972602735,
1.0104514566473979, 1.0109849223300964, 1.0112043902912626, 1.0114717968750002, 1.0113343036750482,
1.0112205972495087, 1.0108811786407768, 1.010500276264591, 1.0099054552529192, 1.009353759223301,
1.008592596116505, 1.007887223091976, 1.0070715634615386, 1.0063525891472884, 1.0055587861271678,
1.0048733732809436, 1.0041832862669238, 1.0035913326848247, 1.0025318871595328, 1.000088536345776,
0.9963596140350871, 0.9918380684931506, 0.9873937281553398, 0.9833394624277463, 0.9803621496062999,
0.9786476100386117]
Xdata=list(range(0,len(Ydata)))
Xnew=list(np.linspace(0,len(Ydata),200))
Ynew=bb.interpolate(Xdata,Ydata,Xnew,40)
plt.figure()
plt.plot(Xdata,Ydata)
plt.plot(Xnew,Ynew,'*')
plt.legend(['Given Data', 'Interpolated Data'])
plt.show()
Should you want to further write code, I have also give code so that you can see the source code and learn:
import module
import inspect
src = inspect.getsource(module)
print(src)

Fast Fourier Transform in Python

I am new to the fourier theory and I've seen very good tutorials on how to apply fft to a signal and plot it in order to see the frequencies it contains. Somehow, all of them create a mix of sines as their data and i am having trouble adapting it to my real problem.
I have 242 hourly observations with a daily periodicity, meaning that my period is 24. So I expect to have a peak around 24 on my fft plot.
A sample of my data.csv is here:
https://pastebin.com/1srKFpJQ
Data plotted:
My code:
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 / N, width=1.5) # 1 / N is a normalization factor
plt.show()
This outputs a very weird result where it seems I am getting the same value for every frequency.
I suppose that the problems comes with the definition of N, t and T but I cannot find anything online that has helped me understand this clearly. Please help :)
EDIT1:
With the code provided by charles answer I have a spike around 0 that seems very weird. I have used rfft and rfftfreq instead to avoid having too much frequencies.
I have read that this might be because of the DC component of the series, so after substracting the mean i get:
I am having trouble interpreting this, the spikes seem to happen periodically but the values in Hz don't let me obtain my 24 value (the overall frequency). Anybody knows how to interpret this ? What am I missing ?
The problem you're seeing is because the bars are too wide, and you're only seeing one bar. You will have to change the width of the bars to 0.00001 or smaller to see them show up.
Instead of using a bar chart, make your x axis using fftfreq = np.fft.fftfreq(len(s)) and then use the plot function, plt.plot(fftfreq, fft):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values
N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data
fft = np.fft.fft(s)
fftfreq = np.fft.fftfreq(len(s))
T = t[1] - t[0]
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.plot(fftfreq,fft)
plt.show()

Curve fitting with large number of data points

this is quite a specific problem I was hoping the community could help me out with. Thanks in advance.
So I have 2 sets of data, one is experimental and the other is based off of an equation. I am trying to fit my data points to this curve and hence obtain the missing variables I am interested in. Namely, a and b in the Ebfit function.
Here is the code:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as spys
from scipy.optimize import curve_fit
time = [60,220,520,1840]
Moment = [0.64227262,0.468318916,0.197100772,0.104512508]
Temperature = 25 # Bake temperature in degrees C
Nb = len(Moment) # Number of bake measurements
Baketime_a = time #[s]
N_Device = 10000 # No. of devices considered in the array
T_ambient = 273 + Temperature
kt = 0.0256*(T_ambient/298) # In units of eV
f0 = 1e9 # Attempt frequency
def Ebfit(x,a,b):
Eb_mean = a*(0.0256/kt) # Eb at bake temperature
Eb_sigma = b*Eb_mean
Foursigma = 4*Eb_sigma
Eb_a = np.linspace(Eb_mean-Foursigma,Eb_mean+Foursigma,N_Device)
dEb = Eb_a[1] - Eb_a[0]
pdfEb_a = spys.norm.pdf(Eb_a,Eb_mean,Eb_sigma)
## Retention Time
DMom = np.zeros(len(x),float)
tau = (1/f0)*np.exp(Eb_a)
for bb in range(len(x)):
DMom[bb]= (1 - 2*(sum(pdfEb_a*(1 - np.exp(np.divide(-x[bb],tau))))*dEb))
return DMom
a = 30
b = 0.10
params,extras = curve_fit(Ebfit,time,Moment)
x_new = list(range(0,2000,1))
y_new = Ebfit(x_new,params[0],params[1])
plt.plot(time,Moment, 'o', label = 'data points')
plt.plot(x_new,y_new, label = 'fitted curve')
plt.legend()
The main problem I am having is that the fitting of the data to the function does not work when I use large number of points. In the above code When I use the 4 points (time & moment), this code works fine.
I get the following values for a and b.
array([ 29.11832766, 0.13918353])
The expected values for a is (23-50) and b is (0.06 - 0.15). So these values are within the acceptable range. This is the corresponding plot:
However, when I use my actual experimental normalized data with about 500 points.
EDIT: This data:
Normalized Data
https://www.dropbox.com/s/64zke4wckxc1r75/Normalized%20Data.csv?dl=0
Raw Data
https://www.dropbox.com/s/ojgse5ibp59r8nw/Data1.csv?dl=0
I get the following values and plot for a and b which are out of the acceptable range,
array([-13.76687781, -12.90494196])
I know these values are wrong and if I were to do it manually (slowly adjusting values to obtain the proper fit) it would be around a=30.1 and b=0.09. And when plotted looks as such:
I have tried changing the initial guess values for a & b, other sets of experimental data as well and other suggestions in similar threads. None seem to work for me. Any help you can provide is appreciated. Thanks.
.
.
.
.
ADDITIONAL INFORMATION
The model I am trying to fit the data to comes from the following equation:
where Dmom = 1 - 2*Psw
a is the Eb value while b is the Sigma value where, Eb has a range of values determined by the probability density function and 4 times of the sigma values (i.e. Foursigma). This distribution is then summed up to use for the final equation.
It looks like you do need to play around with the initial guesses for a and b after all. Perhaps the function you're fitting is not very well behaved, which is why it's so prone to fail for intitial guesses away from the global minumum. That being said, here's a working example of how to fit your data:
import pandas as pd
data_df = pd.read_csv('data.csv')
time = data_df['Time since start, Time [s]'].values
moment = data_df['Signal X direction, Moment [emu]'].values
params, extras = curve_fit(Ebfit, time, moment, p0=[40, 0.3])
Yields the values of a and b of:
In [6]: params
Out[6]: array([ 30.47553689, 0.08839412])
Which results in a nicely aligned fit of a function.
x_big = np.linspace(1, 1800, 2000)
y_big = Ebfit(x_big, params[0], params[1])
plt.plot(time, moment, 'o', alpha=0.5, label='all points')
plt.plot(x_big, y_big, label = 'fitted curve')
plt.legend()
plt.show()

Improve Polynomial Curve Fitting using numpy/Scipy in Python Help Needed

I have two NumPy arrays time and no of get requests. I need to fit this data using a function so that i could make future predictions.
These data were extracted from cassandra table which stores the details of a log file. So basically the time format is epoch-time and the training variable here is get_counts.
from cassandra.cluster import Cluster
import numpy as np
import matplotlib.pyplot as plt
from cassandra.query import panda_factory
session = Cluster(contact_points=['127.0.0.1'], port=9042).connect(keyspace='ASIA_KS')
session.row_factory = panda_factory
df = session.execute("SELECT epoch_time, get_counts FROM ASIA_TRAFFIC")
.sort(columns=['epoch_time','get_counts'], ascending=[1,0])
time = np.array([x[1] for x in enumerate(df['epoch_time'])])
get = np.array([x[1] for x in enumerate(df['get_counts'])])
plt.title('Trend')
plt.plot(time, byte,'o')
plt.show()
The data is as follows:
there are around 1000 pairs of data
time -> [1391193000 1391193060 1391193120 ..., 1391279280 1391279340 1391279400 1391279460]
get -> [577 380 430 ...,250 275 365 15]
Plot image (full size here):
Can someone please help me in providing a function so that i could properly fit in the data? I am new to python.
EDIT *
fit = np.polyfit(time, get, 3)
yp = np.poly1d(fit)
plt.plot(time, yp(time), 'r--', time, get, 'b.')
plt.xlabel('Time')
plt.ylabel('Number of Get requests')
plt.title('Trend')
plt.xlim([time[0]-10000, time[-1]+10000])
plt.ylim(0, 2000)
plt.show()
print yp(time[1400])
the fit curve looks like this:
https://drive.google.com/file/d/0B-r3Ym7u_hsKUTF1OFVqRWpEN2M/view?usp=sharing
However at the later part of the curve the value of y becomes (-ve) which is wrong. The curve must change its slope back to (+ve) somewhere in between.
Can anyone please suggest me how to go about it.
Help will be much appreciated.
You could try:
time = np.array([x[1] for x in enumerate(df['epoch_time'])])
byte = np.array([x[1] for x in enumerate(df['byte_transfer'])])
fit = np.polyfit(time, byte, n) # step up n value here,
# where n is the degree of the polynomial
yp = np.poly1d(fit)
print yp # displays function in cx^n +- cx^n-1...c format
plt.plot(x, yp(x), '-')
plt.xlabel('Time')
plt.ylabel('Bytes Transfered')
plt.title('Trend')
plt.plot(time, byte,'o')
plt.show()
I'm new to Numpy and curve fitting as well, but this is how I've been attempting to do it.

Categories