I'm using scipy.optimize.curve_fit to approximate peaks in my data with Gaussian functions. This works well for strong peaks, but it is more difficult with weaker peaks. However, I think fixing a parameter (say, the width of the Gaussian) would help with this. I know I can set initial "estimates", but is there a way to easily fix a single parameter without changing the function I'm fitting to?
If you want to "fix" a parameter of your fit function, you can simply define a new fit function that wraps the original one and sets one argument to a fixed value:
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
Here's a complete example of fixing sigma of a Gaussian function to 0.05 (instead of the optimal value 0.1). Of course, this doesn't really make sense here because the algorithm has no problem finding the optimal values. Still, you can see how mu is found despite the fixed sigma.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
def gaussian(x, mu, sigma):
    return 1 / sigma / np.sqrt(2 * np.pi) * np.exp(-(x - mu)**2 / 2 / sigma**2)
# Create sample data
x = np.linspace(0, 2, 200)
y = gaussian(x, 1, 0.1) + np.random.rand(*x.shape) - 0.5
plt.plot(x, y, label="sample data")
# Fit with original fit function
popt, _ = scipy.optimize.curve_fit(gaussian, x, y)
plt.plot(x, gaussian(x, *popt), label="gaussian")
# Fit with custom fit function with fixed `sigma`
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
popt, _ = scipy.optimize.curve_fit(custom_gaussian, x, y)
plt.plot(x, custom_gaussian(x, *popt), label="custom_gaussian")
plt.legend()
plt.show()
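If you'd rather avoid a lambda, functools.partial achieves the same thing. Here's a minimal sketch of that variant; I pass p0 explicitly on the assumption that curve_fit may not be able to inspect a partial object's signature to count its parameters:
import functools
# Fix sigma by keyword; only x and mu remain free
custom_gaussian = functools.partial(gaussian, sigma=0.05)
# p0 given explicitly so curve_fit does not need to introspect the partial's signature
popt, _ = scipy.optimize.curve_fit(custom_gaussian, x, y, p0=[1.0])
plt.plot(x, custom_gaussian(x, *popt), label="partial gaussian")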
Hopefully this is helpful. I had to use a bit of a hack, since curve_fit is pretty strict about the signature of the function it accepts.
import numpy as np
from numpy import random
import scipy as sp
from scipy.optimize import curve_fit
import matplotlib.pyplot as pl
def exp1(t, a1, tau1):
    # A1*exp(-t/t1)
    val = (a1 * np.exp(-t / tau1)) * np.heaviside(t, 0)
    return val
def wrapper(t, *args):
    global hold
    global p0
    wrapperName = 'exp1(t,'
    free_idx = 0  # index into args, which only contains the free (unheld) parameters
    for i in range(len(hold)):
        if hold[i]:
            wrapperName += str(p0[i])
        else:
            wrapperName += 'args[' + str(free_idx) + ']'
            free_idx += 1
        if i < len(hold) - 1:
            wrapperName += ','
    wrapperName += ')'
    return eval(wrapperName)
p0 = np.array([1.5, 500.])   # full parameter set (a1, tau1)
hold = np.array([0, 1])      # 1 = hold that parameter fixed at its p0 value
p1 = np.delete(p0, 1)        # initial guess for the free parameters only
timepoints = np.arange(0.,2000.,20.)
y=exp1(timepoints,1,1000)+np.random.normal(0, .1, size=len(timepoints))
popt, pcov = curve_fit(exp1, timepoints, y, p0=p0)
print('unheld parameters:', popt, pcov)
popt, pcov = curve_fit(wrapper, timepoints, y, p0=p1)
for i in range(0, len(hold)):
    if hold[i]:
        popt = np.insert(popt, i, p0[i])
yfit=exp1(timepoints,popt[0],popt[1])
pl.plot(timepoints,y,timepoints,yfit)
pl.show()
print('hold parameters:', popt, pcov)
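For what it's worth, the same hold/free bookkeeping can be done without eval by generating the wrapper as a closure. This is just a sketch of that alternative (make_wrapper is my own name), reusing exp1, p0, hold, p1, timepoints and y from above:
def make_wrapper(func, p0, hold):
    # Return a function of t and the free parameters only; held ones are taken from p0
    def wrapped(t, *free_args):
        free = iter(free_args)
        full = [p0[i] if hold[i] else next(free) for i in range(len(hold))]
        return func(t, *full)
    return wrapped

wrapped_exp1 = make_wrapper(exp1, p0, hold)
popt_free, pcov_free = curve_fit(wrapped_exp1, timepoints, y, p0=p1)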
I have been trying to fit a Gaussian curve to my data using the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def gaus(x, y0, a, b, c):
    return y0 + a*np.exp(-np.power(x - b, 2)/(2*np.power(c, 2)))
popt, pcov = curve_fit(gaus, x, y)
plt.figure()
plt.scatter(x, y, c='grey', marker = 'o', label = "Measured values", s = 2)
plt.plot(x, gaus(x, *popt), c='grey', linestyle = '-')
And that's what I am getting:
[result plot]
I have the x/y data available here in case you want to try it by yourself.
Any idea on how can I get a fit? This data is obviously gaussian shaped, so it seems weird I cannot fit a gaussian curve.
The fit needs a decent starting point. Per the docs, if you do not specify a starting point, all parameters are set to 1, which is clearly not appropriate here, and the fit gets stuck in a wrong local minimum. Try this, where I chose the starting point by eyeballing the data:
popt, pcov = curve_fit(gaus, x, y, p0 = (1500,2000,20, 1))
you would get something like this: [plot of the good fit]
and the solution found by the solver is
popt
array([1559.13138798, 2128.64718985, 21.50092272, 0.16298357])
Even just getting the mean (parameter b) roughly right is enough for the solver to find the solution; e.g. try this:
popt, pcov = curve_fit(gaus, x, y, p0 = (1,1,20, 1))
you should see the same (good) result
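If you don't want to eyeball the data, a rough p0 can usually be computed from the data itself. Here's a sketch (the *_guess names are mine, and it assumes x and y are NumPy arrays):
y0_guess = np.min(y)              # baseline offset
a_guess = np.max(y) - y0_guess    # peak height above the baseline
b_guess = x[np.argmax(y)]         # location of the largest value
# crude width: spread of x weighted by the baseline-subtracted signal
w = np.clip(y - y0_guess, 0, None)
c_guess = np.sqrt(np.sum(w * (x - b_guess)**2) / np.sum(w))
popt, pcov = curve_fit(gaus, x, y, p0=(y0_guess, a_guess, b_guess, c_guess))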
I'm currently working on a lab report for Brownian Motion using this PDF equation with the intent of evaluating D:
[Brownian PDF equation]
And I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fits, it's a line and does not appear correctly on the histogram.
[Example histogram with the bad curve_fit]
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
plt.hist(r, bins=10, density=True, label='Counts')
# Curve fit
x,y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03
def f(r, a):
    return (((1e-6)*3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))
print(x) # these are flipped for some reason
print(y)
plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0 = (1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke D down so that the Boltzmann constant appears as a in the function, which is why there are more numbers in f than in the equation above. [Formulas for D and Γ]
I've tried messing with the initial conditions and plotting the function with 1.38e-23 instead of popt, but that does this (the purple line). This tells me something is wrong with the equation for f, but no issues jump out to me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
    return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue: the curve_fit result is still a line with a positive slope instead of anything remotely like what I'm interested in. Now I am even more confused as to what the issue is.
There are a few things here. I don't think x and y were ever flipped, or at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code; for example, I'm not sure why you compute two different histograms, and I think there may have been problems handling the single-element tuple of parameters. Also, for curve fitting, the initial parameter guess often needs to be in the right ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
x = bin_centers[2:] # not necessary, and I'm not sure why the OP did this, but I'm doing this here because OP does
y = hist_values[2:]
def f(r, a):
    return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))
plt.plot(x, y, label='Fitting this', color='red')
err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0 = (1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in Meters') # Motion seems to be in micron range, but calculation and plot has been done in meters
plt.legend()
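As a cross-check on the fitted a, the scale of a Rayleigh distribution can also be estimated directly from the raw samples with scipy.stats, without binning at all. A quick sketch:
from scipy import stats
# floc=0 pins the location parameter at zero so only the scale is fitted
loc, scale = stats.rayleigh.fit(r, floc=0)
print(scale)  # should be close to the 0.5e-6 used to generate r, and to the fitted popt[0]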
I am trying to fit an exponential CDF to my data to see if it is a good fit and to develop an equation from the fit, but I am not sure how, since I think scipy.stats fits the PDF, not the CDF. If I have the data below:
eta = [1,0.5,0.3,0.25,0.2];
q = [1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10];
How do I fit an exponential CDF to the data? Or how do find the distribution that fits the data the best?
You can define a general exp function, and use curve_fit from scipy.optimize:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def exp_func(x, a, b, c):
    return a * np.exp(-b * x) + c
eta = np.array([1,0.5,0.3,0.25,0.2])
cdf = np.array([1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10])
popt, pcov = curve_fit(exp_func, eta, cdf)
plt.plot(eta, cdf)
plt.plot(eta, exp_func(eta, *popt), 'r-', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.legend()
plt.show()
And you'll get an exp function which is very similar to your values:
From the fitted parameters, you can see the function is y=np.exp(-19.213 * x).
Update:
If you want to make sure this is really a CDF function, you'll need to calculate the pdf (by taking the derivative):
x = np.linspace(0, 1, 1000)
cdf_fit = exp_func(x, *popt)
cdf_diff = np.r_[cdf_fit[0], np.diff(cdf_fit)]
You can do a sanity check:
plt.plot(x, np.cumsum(cdf_diff))
And then use scipy to fit the pdf to an exponent distribution:
from scipy.stats import expon
params = expon.fit(cdf_diff)
pdf_fit = expon.pdf(x, *params)
I must warn you that something doesn't add up: pdf_fit doesn't align with cdf_diff. Maybe your CDF isn't a real distribution function? The last value of a CDF should be 1.
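If it really is supposed to be an exponential CDF, another option is to fit the CDF form directly so the fitted curve at least has the right shape. A sketch (exp_cdf, lam and scale are names I made up; a true CDF would have scale equal to 1, which your data clearly doesn't, since it tops out around 1e-9):
def exp_cdf(x, lam, scale):
    # scaled exponential CDF: scale * (1 - exp(-lam * x))
    return scale * (1 - np.exp(-lam * x))

popt_cdf, _ = curve_fit(exp_cdf, eta, cdf, p0=(20, 1e-9))
plt.plot(eta, exp_cdf(eta, *popt_cdf), 'g--', label='CDF-form fit')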
I would like to find and plot a function f that represents a curve fitted on some number of set points that I already know, x and y.
After some research I started experimenting with scipy.optimize and curve_fit but on the reference guide I found that the program uses a function to fit the data instead and it assumes ydata = f(xdata, *params) + eps.
So my question is this: what do I have to change in my code to use curve_fit, or any other library, to find the function of the curve using my set points? (Note: I want to know the function as well, so I can integrate it later for my project and plot it.) I know that it's going to be a decaying exponential function, but I don't know the exact parameters. This is what I tried in my program:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
I am currently developing this project on a Raspberry Pi, in case that changes anything. I would like to use the least squares method since it is precise, but any other method that works well is welcome.
Again, this is based on the reference guide of the scipy library. Also, I get the following graph, which is not even a curve: [graph of the set points and the plotted fit]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# c is a constant, so taking the derivative makes it go to zero
def deriv(x, a, b, c):
    return -a * b * np.exp(-b * x)

# Integrating gives you another constant (offset); let's call it c1 and set it to zero by default
def integ(x, a, b, c, c1=0):
    return -a/b * np.exp(-b * x) + c*x + c1
#There are only 4 (x,y) points here
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
#curve_fit already uses "non-linear least squares to fit a function, f, to data"
popt, pcov = curve_fit(func, xdata, ydata)
a,b,c = popt #these are the optimal parameters for fitting your 4 data points
#Now get more x values to plot the curve along so it looks like a curve
step = 0.01
fit_xs = np.arange(min(xdata),max(xdata),step)
#Plot the results
plt.plot(xdata, ydata, 'bx', label='data')
plt.plot(fit_xs, func(fit_xs,a,b,c), 'r-', label='fit')
plt.plot(fit_xs, deriv(fit_xs,a,b,c), 'g-', label='deriv')
plt.plot(fit_xs, integ(fit_xs,a,b,c), 'm-', label='integ')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
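If you want to sanity-check the closed-form integ, scipy.integrate.quad can integrate the fitted func numerically over the same range. A quick sketch, reusing a, b and c from the fit above:
from scipy.integrate import quad
lo, hi = min(xdata), max(xdata)
numeric, abserr = quad(func, lo, hi, args=(a, b, c))
analytic = integ(hi, a, b, c) - integ(lo, a, b, c)
print(numeric, analytic)  # the two values should agree to within quad's error estimate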
I have fitted a curve to a set of data points. I would like to know how to find the maximum point of my curve and then annotate that point (I don't want to just use the largest y value from my data to do this). I cannot post my exact code, but here is the basic layout of it.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [1,2,3,4,5]
y = [1,4,16,4,1]
def f(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

p0 = (8, 16, 0.1)  # guess parameters
plt.plot(x,y,"ro")
popt, pcov = curve_fit(f, x, y, p0)
plt.plot(x, f(x, *popt))
Also is there a way to find the peak width?
Am I missing a simple built-in function that could do this? Could I differentiate the function and find the point at which the derivative is zero? If so, how?
Once the fit has found the best parameters, you can locate the peak of the fitted function using minimize_scalar (or one of the other methods from scipy.optimize).
Note that below, I've shifted x[2] to 3.2 so that the peak of the curve doesn't land exactly on a data point and we can be sure we're finding the peak of the curve, not of the data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, minimize_scalar
x = [1,2,3.2,4,5]
y = [1,4,16,4,1]
def f(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

p0 = (8, 16, 0.1)  # guess parameters
plt.plot(x,y,"ro")
popt, pcov = curve_fit(f, x, y, p0)
# find the peak
fm = lambda x: -f(x, *popt)
r = minimize_scalar(fm, bounds=(1, 5), method='bounded')
print("maximum:", r["x"], f(r["x"], *popt))  # maximum: 2.99846874275 18.3928199902
x_curve = np.linspace(1, 5, 100)
plt.plot(x_curve, f(x_curve, *popt))
plt.plot(r['x'], f(r['x'], *popt), 'ko')
plt.show()
Of course, rather than optimizing the function, we could just calculate it for a bunch of x-values and get close:
x = np.linspace(1, 5, 10000)
y = f(x, *popt)
imax = np.argmax(y)
print(imax, x[imax])  # 4996 2.99859985999
If you don't mind using sympy, it's pretty easy. Assuming the code you posted has already been run:
import sympy
sym_x = sympy.symbols('x', real=True)
sym_f = f(sym_x, *popt)
sym_df = sym_f.diff()
solns = sympy.solve(sym_df) # returns [3.0]
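The same sympy expression also answers the peak-width part of the question: solve f(x) = f_max/2 for the two half-maximum points and take their separation. For this Lorentzian-style function the full width at half maximum works out to p1, i.e. popt[0]. A sketch:
f_max = sym_f.subs(sym_x, solns[0])               # height of the curve at its peak
half_pts = sympy.solve(sym_f - f_max / 2, sym_x)  # the two half-maximum positions
fwhm = abs(half_pts[1] - half_pts[0])             # full width at half maximum (= popt[0] here)
print(fwhm)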