I am trying to fit an exponential CDF to my data to see if it is a good fit and to develop an equation from the fit, but I am not sure how, since I think scipy.stats fits the PDF, not the CDF. If I have the data below:
eta = [1,0.5,0.3,0.25,0.2];
q = [1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10];
How do I fit an exponential CDF to the data? Or how do I find the distribution that fits the data best?
You can define a general exponential function and use curve_fit from scipy.optimize:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def exp_func(x, a, b, c):
    return a * np.exp(-b * x) + c
eta = np.array([1,0.5,0.3,0.25,0.2])
cdf = np.array([1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10])
popt, pcov = curve_fit(exp_func, eta, cdf)
plt.plot(eta, cdf)
plt.plot(eta, exp_func(eta, *popt), 'r-', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.legend()
plt.show()
And you'll get an exponential function which matches your values very closely:
From the fitted parameters, you can see the function is approximately y = np.exp(-19.213 * x).
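If you want to read the numbers off yourself rather than from the plot legend, you can unpack popt directly; this is just a quick inspection snippet using the variables from the code above:
a, b, c = popt
print(f"fit: y = {a:.4g} * exp(-{b:.4g} * x) + {c:.4g}")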
Update:
If you want to make sure this is really a CDF, you'll need to calculate the PDF (by taking the derivative):
x = np.linspace(0, 1, 1000)
cdf_fit = exp_func(x, *popt)
cdf_diff = np.r_[cdf_fit[0], np.diff(cdf_fit)]
You can do a sanity check:
plt.plot(x, np.cumsum(cdf_diff))
And then use scipy to fit the PDF to an exponential distribution:
from scipy.stats import expon
params = expon.fit(cdf_diff)
pdf_fit = expon.pdf(x, *params)
I must warn you that something doesn't add up: pdf_fit doesn't align with cdf_diff. Maybe your CDF isn't a real distribution function? The last value of a CDF should be 1.
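If you want to check whether your q values can be treated as a CDF at all, a minimal sanity check (my suggestion, not part of the fit above) is to look at their maximum and rescale, since a proper CDF must reach 1:
import numpy as np
q = np.array([1e-9, 9.9981e-10, 9.9504e-10, 9.7905e-10, 9.492e-10])
print(q.max())            # ~1e-9, nowhere near 1, which is why the warning above applies
q_scaled = q / q.max()    # rescale so the largest value becomes 1 before fitting
print(q_scaled)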
Related
Please tell me how to determine the unknown parameters of a calculated curve using scipy optimization, given an experimental curve as input. I need to determine the unknown parameters a, b, c (in the code below) from the calculated curve, so that the standard-deviation functional is minimal.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from math import pi
def func(a, b, c):
    return -a/(2*np.tan(c*(pi/2))) + np.sqrt(b + (a**2)/(4*np.tan(c*(pi/2))))
file = '...'  # experimental curve in a .txt file
pd_file=pd.read_csv(file, sep="\s+",header=None,names=['frequence', 'y'],
skiprows=1)
xdata=pd_file['frequence']
ydata=pd_file['y']
popt, pcov = curve_fit(func, xdata, ydata, p0=[0.6,1], maxfev=500000000)
print('popt',popt)
I do not think your functional form is suitable for fitting the data you have. After some experimentation, may I suggest a different one:
def func2(x, b, d):
    return 0.2 / (1 + b * x + d * np.log(1 + x))
file='chi_strich_strich_H0.txt'
pd_file=pd.read_csv(file, sep="\s+",header=None,names=['frequence', 'y'],
skiprows=1)
xdata=pd_file['frequence']
ydata=pd_file['y']
popt, pcov = curve_fit(func2, xdata, ydata, p0=[0,0], maxfev=500000000)
print('popt',popt)
yfit = func2(xdata,popt[0], popt[1])
plt.plot(xdata, ydata, '.', label = 'data')
plt.plot(xdata, yfit, '-', label = 'fit')
plt.legend(loc = 'best')
plt.show()
popt: [ 1.84672386e-05 -7.69652828e-02]
The fit is on this plot:
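If you also want a quantitative measure of how good this fit is (not shown in the original answer, just a common check using the variables defined above), you can look at the root-mean-square error of the residuals:
residuals = ydata - yfit
rmse = np.sqrt(np.mean(residuals**2))   # root-mean-square error of the fit
print('RMSE:', rmse)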
I'm trying to plot a histogram in Python by importing data from an Excel file.
Also, the histogram needs to be fitted with an exponential function.
How can I do this plotting and fitting procedure?
For plotting, just use plt.hist with your data:
import random
import matplotlib.pyplot as plt
# data for test
data = [random.randint(1,20) for i in range(20)]
n, x, _ = plt.hist(data)
bin_centers = 0.5*(x[1:]+x[:-1])
plt.plot(bin_centers,n);
For fitting, you can extract the bin centers and fit them with curve_fit:
import numpy as np
from scipy.optimize import curve_fit
# some exponential function
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
popt, pcov = curve_fit(func, bin_centers, n, bounds=(0, [3., 1., 0.5]))
# bounds are variable, so you can change them as you wish
plt.plot(bin_centers, n, label='data')
plt.plot(bin_centers, func(bin_centers, *popt), label='fit')
plt.legend()
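As an alternative (assuming your raw samples really are exponentially distributed), you can skip the histogram step and fit the distribution directly with scipy.stats.expon; this is just a sketch reusing the same data list as above:
import numpy as np
from scipy.stats import expon
import matplotlib.pyplot as plt
loc, scale = expon.fit(data)            # fit location and scale to the raw samples
xs = np.linspace(min(data), max(data), 200)
plt.hist(data, density=True, alpha=0.5, label='data')
plt.plot(xs, expon.pdf(xs, loc, scale), label='expon fit')
plt.legend()
plt.show()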
I would like to find and plot a function f that represents a curve fitted to a number of set points that I already know, x and y.
After some research I started experimenting with scipy.optimize and curve_fit, but in the reference guide I found that the program fits a function to the data instead, assuming ydata = f(xdata, *params) + eps.
So my question is this: what do I have to change in my code to use curve_fit, or any other library, to find the function of the curve from my set points? (Note: I want to know the function as well so I can integrate it later for my project and plot it.) I know that it's going to be a decaying exponential function, but I don't know the exact parameters. This is what I tried in my program:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
I am currently developing this project on a Raspberry Pi, in case that changes anything, and I would like to use the least-squares method since it is great and precise, but any other method that works well is welcome.
Again, this is based on the reference guide of the scipy library. Also, I get the following graph, which is not even a curve: [graph and curve based on the set points]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# c is a constant, so taking the derivative makes it go to zero
def deriv(x, a, b, c):
    return -a * b * np.exp(-b * x)

# Integrating gives another constant of integration (offset); call it c1 and set it to zero by default
def integ(x, a, b, c, c1=0):
    return -a/b * np.exp(-b * x) + c*x + c1
#There are only 4 (x,y) points here
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
#curve_fit already uses "non-linear least squares to fit a function, f, to data"
popt, pcov = curve_fit(func, xdata, ydata)
a,b,c = popt #these are the optimal parameters for fitting your 4 data points
#Now get more x values to plot the curve along so it looks like a curve
step = 0.01
fit_xs = np.arange(min(xdata),max(xdata),step)
#Plot the results
plt.plot(xdata, ydata, 'bx', label='data')
plt.plot(fit_xs, func(fit_xs,a,b,c), 'r-', label='fit')
plt.plot(fit_xs, deriv(fit_xs,a,b,c), 'g-', label='deriv')
plt.plot(fit_xs, integ(fit_xs,a,b,c), 'm-', label='integ')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
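Since you mentioned wanting to integrate the fitted function later, you can also cross-check the analytic antiderivative integ above against a numerical integral; this is just a sketch using scipy.integrate.quad with the fitted parameters from the code above:
from scipy.integrate import quad
# numerical integral of the fitted curve over the data range
area_num, _ = quad(func, min(xdata), max(xdata), args=(a, b, c))
# same integral from the analytic antiderivative defined above
area_ana = integ(max(xdata), a, b, c) - integ(min(xdata), a, b, c)
print('numerical:', area_num, 'analytic:', area_ana)   # the two should agree closely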
So I have two lists of data, which I can plot in a scatter plot, as such:
from matplotlib import pyplot as plt
x = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
plt.scatter(x, y)
This gives a plot that looks like a Gaussian distribution, which is good, as it should.
My issue, however, is that I am trying to fit a Gaussian distribution to this and failing miserably, because (a) it's only half a Gaussian instead of a full one, and (b) what I've used before has only ever taken a single array of numbers. So something like:
# best fit of data
num_bins = 20
(mu, sigma) = norm.fit(sixteen)
y = mlab.normpdf(num_bins, mu, sigma)
n, bins, patches = plt.hist(deg_array, num_bins, normed=1, facecolor='blue', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
Does this approach work at all here, or am I going about this in the wrong way completely? Thanks...
It seems that your usual approach is to find the expectation value and standard deviation of the data directly instead of using a least-squares fit. Here is a solution using curve_fit from scipy.optimize.
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
x = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
# Define a gaussian function with offset
def gaussian_func(x, a, x0, sigma, c):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2)) + c
initial_guess = [1,20,2,0]
popt, pcov = curve_fit(gaussian_func, x, y,p0=initial_guess)
xplot = np.linspace(0,30,1000)
plt.scatter(x,y)
plt.plot(xplot,gaussian_func(xplot,*popt))
plt.show()
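If you also want the fitted numbers rather than just the plot, you can unpack popt in the order the parameters appear in gaussian_func:
a_fit, x0_fit, sigma_fit, c_fit = popt
print('amplitude:', a_fit, 'centre:', x0_fit, 'sigma:', sigma_fit, 'offset:', c_fit)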
I'm using scipy.optimize.curve_fit to approximate peaks in my data with Gaussian functions. This works well for strong peaks, but it is more difficult with weaker peaks. However, I think fixing a parameter (say, the width of the Gaussian) would help with this. I know I can set initial "estimates", but is there a way to easily fix a single parameter without changing the function I'm fitting to?
If you want to "fix" a parameter of your fit function, you can just define a new fit function which makes use of the original fit function but sets one argument to a fixed value:
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
Here's a complete example of fixing the sigma of a Gaussian function to 0.05 (instead of the optimal value 0.1). Of course, this doesn't really make sense here because the algorithm has no problem finding the optimal values. Still, you can see how mu is found despite the fixed sigma.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
def gaussian(x, mu, sigma):
    return 1 / sigma / np.sqrt(2 * np.pi) * np.exp(-(x - mu)**2 / 2 / sigma**2)
# Create sample data
x = np.linspace(0, 2, 200)
y = gaussian(x, 1, 0.1) + np.random.rand(*x.shape) - 0.5
plt.plot(x, y, label="sample data")
# Fit with original fit function
popt, _ = scipy.optimize.curve_fit(gaussian, x, y)
plt.plot(x, gaussian(x, *popt), label="gaussian")
# Fit with custom fit function with fixed `sigma`
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
popt, _ = scipy.optimize.curve_fit(custom_gaussian, x, y)
plt.plot(x, custom_gaussian(x, *popt), label="custom_gaussian")
plt.legend()
plt.show()
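An equivalent way to fix a parameter, if you prefer it over a lambda, is functools.partial; this is just a sketch of the same idea (passing p0 explicitly so curve_fit does not need to inspect the partial object's signature):
import functools
# same fixed-sigma fit as above, with sigma pinned to 0.05 by keyword
fixed_sigma_gaussian = functools.partial(gaussian, sigma=0.05)
popt, _ = scipy.optimize.curve_fit(fixed_sigma_gaussian, x, y, p0=[1.0])
print('fitted mu with sigma fixed at 0.05:', popt[0])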
Hopefully this is helpful. I had to use some hacks, since curve_fit is pretty strict about the functions it accepts.
import numpy as np
from numpy import random
import scipy as sp
from scipy.optimize import curve_fit
import matplotlib.pyplot as pl
def exp1(t, a1, tau1):
    # A1*exp(-t/t1)
    val = (a1 * np.exp(-t / tau1)) * np.heaviside(t, 0)
    return val

def wrapper(t, *args):
    global hold
    global p0
    # Build a call string like "exp1(t,args[0],500.0)" in which held
    # parameters are replaced by their fixed values from p0.
    wrapperName = 'exp1(t,'
    j = 0  # index into the free parameters passed in *args
    for i in range(0, len(hold)):
        if hold[i]:
            wrapperName += str(p0[i])
        else:
            wrapperName += 'args[' + str(j) + ']'
            j += 1
        if i < len(hold) - 1:
            wrapperName += ','
    wrapperName += ')'
    return eval(wrapperName)
p0=np.array([1.5,500.])
hold=np.array([0,1])
p1=np.delete(p0,1)
timepoints = np.arange(0.,2000.,20.)
y=exp1(timepoints,1,1000)+np.random.normal(0, .1, size=len(timepoints))
popt, pcov = curve_fit(exp1, timepoints, y, p0=p0)
print('unheld parameters:', popt, pcov)
popt, pcov = curve_fit(wrapper, timepoints, y, p0=p1)
for i in range(0, len(hold)):
    if hold[i]:
        popt = np.insert(popt, i, p0[i])
yfit=exp1(timepoints,popt[0],popt[1])
pl.plot(timepoints,y,timepoints,yfit)
pl.show()
print('held parameters:', popt, pcov)
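For what it's worth, the same hold-some-parameters trick can be written without eval by assembling the full parameter vector inside a small closure; this is only an alternative sketch of the idea above, reusing exp1, p0, hold, p1, timepoints and y from the code:
def make_held(model, p0, hold):
    # Return a function f(t, *free) that fills held parameters from p0
    # and takes only the free parameters from curve_fit.
    def wrapped(t, *free):
        it = iter(free)
        full = [p0[i] if hold[i] else next(it) for i in range(len(hold))]
        return model(t, *full)
    return wrapped

popt_free, pcov = curve_fit(make_held(exp1, p0, hold), timepoints, y, p0=p1)
print('free parameters:', popt_free)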