I'm currently working on a lab report for Brownian Motion using this PDF equation with the intent of evaluating D:
Brownian PDF equation
And I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fits, it's a line and does not appear correctly on the histogram.
Example Histogram with bad curve_fit
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
plt.hist(r, bins=10, density=True, label='Counts')
# Curve fit
x,y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03
def f(r, a):
return (((1e-6)3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))
print(x) # these are flipped for some reason
print(y)
plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0 = (1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke down D to get the Boltzmann constant as a in the function, which is why there are more numbers in f than the equation above. D and Gamma.
I've tried messing with the initial conditions and plotting the function with 1.38e-23 instead of popt, but that does this (the purple line). This tells me something is wrong with the equation for f, but no issues jump out to me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue that the curve_fit is a line with a positive slope instead of anything remotely what I'm interested in. Now I am more confused as to what the issue is.
There are a few things here. I don't think x and y were ever flipped, or at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code, for example, I'm not sure why you call two different histograms; and I think there may have been problems handling the single element tuple of parameters. Also, for curve fitting, the initial parameter guess often needs to be in the ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
x = bin_centers[2:] # not necessary, and I'm not sure why the OP did this, but I'm doing this here because OP does
y = hist_values[2:]
def f(r, a):
return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))
plt.plot(x, y, label='Fitting this', color='red')
err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0 = (1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in Meters') # Motion seems to be in micron range, but calculation and plot has been done in meters
plt.legend()
Related
I have been trying to fit a gaussian curve to my data
data
I have used the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def gaus(x, y0, a, b, c):
return y0 + a*np.exp(-np.power(x - b, 2)/(2*np.power(c, 2)))
popt, pcov = curve_fit(gaus, x, y)
plt.figure()
plt.scatter(x, y, c='grey', marker = 'o', label = "Measured values", s = 2)
plt.plot(x, gaus(x, *popt), c='grey', linestyle = '-')
And that's what I am getting:
result
I have the x/y data available here in case you want to try it by yourself.
Any idea on how can I get a fit? This data is obviously gaussian shaped, so it seems weird I cannot fit a gaussian curve.
The fit needs a decent starting point. Per the docs if you do not specify the starting point all parameters are set to 1 which is clearly not appropriate, and the fit gets stuck in some wrong local minima. Try this, where I chose the starting point by eyeballing the data
popt, pcov = curve_fit(gaus, x, y, p0 = (1500,2000,20, 1))
you would get something like this:
and the solution found by the solver is
popt
array([1559.13138798, 2128.64718985, 21.50092272, 0.16298357])
Even just getting the mean (parameter b) roughly right is enough for the solver to find the solution, eg try this
popt, pcov = curve_fit(gaus, x, y, p0 = (1,1,20, 1))
you should see the same (good) result
So I have two lists of data, which I can plot in a scatter plot, as such:
from matplotlib import pyplot as plt
x = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
plt.scatter(degrees,RMS_one_image)
This gives you a plot that looks like a Gaussian distribution, which is good as it should-
My issue is however I am trying to fit a Gaussian distribution to this, and failing miserably because a. it's only half a Gaussian instead of a full one, and b. what I've used before has only ever used one bunch of numbers. So something like:
# best fit of data
num_bins = 20
(mu, sigma) = norm.fit(sixteen)
y = mlab.normpdf(num_bins, mu, sigma)
n, bins, patches = plt.hist(deg_array, num_bins, normed=1, facecolor='blue', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
Does this approach work at all here, or am I going about this in the wrong way completely? Thanks...
It seems that your normal solution is to find the expectation value and standard deviation of the data directly instead of using a least square fit. Here is a solution using curve_fit from scipy.optimize.
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
x = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
# Define a gaussian function with offset
def gaussian_func(x, a, x0, sigma,c):
return a * np.exp(-(x-x0)**2/(2*sigma**2)) + c
initial_guess = [1,20,2,0]
popt, pcov = curve_fit(gaussian_func, x, y,p0=initial_guess)
xplot = np.linspace(0,30,1000)
plt.scatter(x,y)
plt.plot(xplot,gaussian_func(xplot,*popt))
plt.show()
I'm using scipy.optimize.curve_fit to approximate peaks in my data with Gaussian functions. This works well for strong peaks, but it is more difficult with weaker peaks. However, I think fixing a parameter (say, width of the Gaussian) would help with this. I know I can set initial "estimates" but is there a way that I can easily define a single parameter without changing the function I'm fitting to?
If you want to "fix" a parameter of your fit function, you can just define a new fit function which makes use of the original fit function, yet setting one argument to a fixed value:
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
Here's a complete example of fixing sigma of a Gaussian function to 0.05 (instead of optimal value 0.1). Of course, this doesn't really make sense here because the algorithm has no problem in finding optimal values. Yet, you can see how mu is still found despite the fixed sigma.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
def gaussian(x, mu, sigma):
return 1 / sigma / np.sqrt(2 * np.pi) * np.exp(-(x - mu)**2 / 2 / sigma**2)
# Create sample data
x = np.linspace(0, 2, 200)
y = gaussian(x, 1, 0.1) + np.random.rand(*x.shape) - 0.5
plt.plot(x, y, label="sample data")
# Fit with original fit function
popt, _ = scipy.optimize.curve_fit(gaussian, x, y)
plt.plot(x, gaussian(x, *popt), label="gaussian")
# Fit with custom fit function with fixed `sigma`
custom_gaussian = lambda x, mu: gaussian(x, mu, 0.05)
popt, _ = scipy.optimize.curve_fit(custom_gaussian, x, y)
plt.plot(x, custom_gaussian(x, *popt), label="custom_gaussian")
plt.legend()
plt.show()
Hopefully this is helpful. Had to use hax. Curve_fit is pretty strict about what it takes.
import numpy as np
from numpy import random
import scipy as sp
from scipy.optimize import curve_fit
import matplotlib.pyplot as pl
def exp1(t,a1,tau1):
#A1*exp(-t/t1)
val=0.
val=(a1*np.exp(-t/tau1))*np.heaviside(t,0)
return val
def wrapper(t,*args):
global hold
global p0
wrapperName='exp1(t,'
for i in range(0, len(hold)):
if hold[i]:
wrapperName+=str(p0[i])
else:
if i%2==0:
wrapperName+='args['+str(i)+']'
else:
wrapperName+='args'+str(i)+']'
if i<len(hold):
wrapperName+=','
wrapperName+=')'
return eval(wrapperName)
p0=np.array([1.5,500.])
hold=np.array([0,1])
p1=np.delete(p0,1)
timepoints = np.arange(0.,2000.,20.)
y=exp1(timepoints,1,1000)+np.random.normal(0, .1, size=len(timepoints))
popt, pcov = curve_fit(exp1, timepoints, y, p0=p0)
print 'unheld parameters:', popt, pcov
popt, pcov = curve_fit(wrapper, timepoints, y, p0=p1)
for i in range(0, len(hold)):
if hold[i]:
popt=np.insert(popt,i,p0[i])
yfit=exp1(timepoints,popt[0],popt[1])
pl.plot(timepoints,y,timepoints,yfit)
pl.show()
print 'hold parameters:', popt, pcov
I have fitted curve to a set of data points. I would like to know how to find the maximum point of my curve and then I would like to annotate that point (I don't want to use by largest y value from my data to do this). I cannot exactly write my code but here is the basic layout of my code.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [1,2,3,4,5]
y = [1,4,16,4,1]
def f(x, p1, p2, p3):
return p3*(p1/((x-p2)**2 + (p1/2)**2))
p0 = (8, 16, 0.1) # guess perameters
plt.plot(x,y,"ro")
popt, pcov = curve_fit(f, x, y, p0)
plt.plot(x, f(x, *popt))
Also is there a way to find the peak width?
Am I missing a simple built in function that could do this? Could I differentiate the function and find the point at which it is zero? If so how?
After you fit to find the best parameters to maximize your function, you can find the peak using minimize_scalar (or one of the other methods from scipy.optimize).
Note that in below, I've shifted x[2]=3.2 so that the peak of the curve doesn't land on a data point and we can be sure we're finding the peak to the curve, not the data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, minimize_scalar
x = [1,2,3.2,4,5]
y = [1,4,16,4,1]
def f(x, p1, p2, p3):
return p3*(p1/((x-p2)**2 + (p1/2)**2))
p0 = (8, 16, 0.1) # guess perameters
plt.plot(x,y,"ro")
popt, pcov = curve_fit(f, x, y, p0)
# find the peak
fm = lambda x: -f(x, *popt)
r = minimize_scalar(fm, bounds=(1, 5))
print "maximum:", r["x"], f(r["x"], *popt) #maximum: 2.99846874275 18.3928199902
x_curve = np.linspace(1, 5, 100)
plt.plot(x_curve, f(x_curve, *popt))
plt.plot(r['x'], f(r['x'], *popt), 'ko')
plt.show()
Of course, rather than optimizing the function, we could just calculate it for a bunch of x-values and get close:
x = np.linspace(1, 5, 10000)
y = f(x, *popt)
imax = np.argmax(y)
print imax, x[imax] # 4996 2.99859985999
If you don't mind using sympy, it's pretty easy. Assuming the code you posted has already been run:
import sympy
sym_x = sympy.symbols('x', real=True)
sym_f = f(sym_x, *popt)
sym_df = sym_f.diff()
solns = sympy.solve(sym_df) # returns [3.0]
I would like to visually plot a 3D graph of the error function calculated for a given slope and y-intercept for a linear regression.
This graph will be used to illustrate a gradient descent application.
Let’s suppose we want to model a set of points with a line. To do this we’ll use the standard y=mx+b line equation where m is the line’s slope and b is the line’s y-intercept. To find the best line for our data, we need to find the best set of slope m and y-intercept b values.
A standard approach to solving this type of problem is to define an error function (also called a cost function) that measures how “good” a given line is. This function will take in a (m,b) pair and return an error value based on how well the line fits the data. To compute this error for a given line, we’ll iterate through each (x,y) point in the data set and sum the square distances between each point’s y value and the candidate line’s y value (computed at mx+b). It’s conventional to square this distance to ensure that it is positive and to make our error function differentiable. In python, computing the error for a given line will look like:
# y = mx + b
# m is slope, b is y-intercept
def computeErrorForLineGivenPoints(b, m, points):
totalError = 0
for i in range(0, len(points)):
totalError += (points[i].y - (m * points[i].x + b)) ** 2
return totalError / float(len(points))
Since the error function consists of two parameters (m and b) we can visualize it as a two-dimensional surface.
Now my question, how can we plot such 3D-graph using python ?
Here is a skeleton code to build a 3D plot. This code snippet is totally out of the question context but it show the basics for building a 3D plot.
For my example i would need the x-axis being the slope, the y-axis being the y-intercept and the z-axis, the error.
Can someone help me build such example of graph ?
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import random
def fun(x, y):
return x**2 + y
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = y = np.arange(-3.0, 3.0, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([fun(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
ax.plot_surface(X, Y, Z)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()
The above code produce the following plot, which is very similar to what i am looking for.
Simply replace fun with computeErrorForLineGivenPoints:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import collections
def error(m, b, points):
totalError = 0
for i in range(0, len(points)):
totalError += (points[i].y - (m * points[i].x + b)) ** 2
return totalError / float(len(points))
x = y = np.arange(-3.0, 3.0, 0.05)
Point = collections.namedtuple('Point', ['x', 'y'])
m, b = 3, 2
noise = np.random.random(x.size)
points = [Point(xp, m*xp+b+err) for xp,err in zip(x, noise)]
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ms = np.linspace(2.0, 4.0, 10)
bs = np.linspace(1.5, 2.5, 10)
M, B = np.meshgrid(ms, bs)
zs = np.array([error(mp, bp, points)
for mp, bp in zip(np.ravel(M), np.ravel(B))])
Z = zs.reshape(M.shape)
ax.plot_surface(M, B, Z, rstride=1, cstride=1, color='b', alpha=0.5)
ax.set_xlabel('m')
ax.set_ylabel('b')
ax.set_zlabel('error')
plt.show()
yields
Tip: I renamed computeErrorForLineGivenPoints as error. Generally, there is no need to name a function compute... since almost all functions compute something. You also do not need to specify "GivenPoints" since the function signature shows that points is an argument. If you have other error functions or variables in your program, line_error or total_error might be a better name for this function.