I'm trying to fit 2D data points with a polynomial curve; see the picture below. The blue points are the data. The blue dashed line is a second-order polynomial fit to these points. I want to force my fit to have exactly the same shape as the black line, and I want to compute the y offset of the new fit from the black curve. Any ideas on how this would be possible? Thanks in advance.
import numpy as np
import matplotlib.pyplot as plt

# data_x, data_y hold the measured data points (blue dots)
x = np.linspace(6.0, 12.0, num=100)

# parameters of the black reference curve
a = -0.0864
b = 11.18
c = 9.04
fit_y = a*(x - b)**2 + c  # black line

# unconstrained 2nd-order polynomial fit (blue dashed line)
z = np.polyfit(data_x, data_y, 2)
zfit = z[2] + z[1]*x + z[0]*x**2

fig, ax = plt.subplots()
ax.plot(data_x, data_y, '.', color='b')
ax.plot(x, fit_y, color='black')                    # curve whose shape we want
ax.plot(x, zfit, color='blue', linestyle='dashed')  # polynomial fit
ax.set_xlim([6.5, 11.0])
ax.set_ylim([6.5, 10.5])
plt.show()
Edit: This is the solution to my problem:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(6.0, 12.0, num=100)

# We want to keep a and b fixed to keep the same shape
# a = -0.0864
# b = 11.18
c = 9.04

# Only c is a variable, because we only want to shift the curve along the y axis
def f(x, c):
    return -0.0864*(x - 11.18)**2 + c

popt, pcov = curve_fit(f, data_x, data_y)  # popt holds the fitted parameter

plt.plot(data_x, data_y, '.')    # blue data points
plt.plot(x, f(x, c), 'black')    # black line: the shape we want our fit to have
plt.plot(x, f(x, *popt), 'red')  # new fit to the data (same shape as the black line)
plt.xlim([6.5, 11.0])
plt.ylim([6.5, 10.5])
plt.show()

print("y offset:", popt[0] - c)
y offset: 0.23492393887717355
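As an aside (my addition, not part of the original post): since curve_fit also returns the covariance matrix pcov, you can attach an uncertainty to this offset. Because the reference value c is a fixed constant, the standard error of the fitted c is also the standard error of the y offset. A minimal sketch:

# the diagonal of pcov holds the parameter variances; with a single
# free parameter, pcov is a 1x1 matrix
offset_err = np.sqrt(pcov[0][0])
print("y offset: %.4f +/- %.4f" % (popt[0] - c, offset_err))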
You want to use scipy.optimize.curve_fit. As you can see in the documentation, you can define your own function fit_y with the parameters to fit. Once the fit is done, you can compute the y offset (with respect to the origin?) simply by evaluating the function at x=0. Below is example code where I used a root function (which is what your black curve looks like):
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def f(x, a, b, c):
    return a * np.power(x, b) + c

x_data = np.arange(100)
noise = np.random.normal(size=100)
y_data = np.power(x_data, 0.5) + noise

# curve_fit initializes every parameter to 1 by default, which is good
# enough here; pass p0=... if you need a different starting point
popt, _ = curve_fit(f, x_data, y_data)  # popt holds the fitted parameters

plt.scatter(x_data, y_data)
plt.plot(x_data, f(x_data, *popt), 'r')  # fitted curve
plt.show()

print("y offset:", f(0, *popt))
I don't have enough reputation to post the plot, but just run the code to see for yourself.
Related
I have been trying to fit a Gaussian curve to my data.
I have used the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# x, y are loaded from the data file linked below
def gaus(x, y0, a, b, c):
    return y0 + a*np.exp(-np.power(x - b, 2)/(2*np.power(c, 2)))

popt, pcov = curve_fit(gaus, x, y)

plt.figure()
plt.scatter(x, y, c='grey', marker='o', label="Measured values", s=2)
plt.plot(x, gaus(x, *popt), c='grey', linestyle='-')
And that's what I am getting:
I have the x/y data available here in case you want to try it yourself. Any idea on how I can get a fit? This data is obviously Gaussian shaped, so it seems weird that I cannot fit a Gaussian curve to it.
The fit needs a decent starting point. Per the docs, if you do not specify a starting point, all parameters are set to 1, which is clearly not appropriate here, and the fit gets stuck in a wrong local minimum. Try this, where I chose the starting point by eyeballing the data:
popt, pcov = curve_fit(gaus, x, y, p0 = (1500,2000,20, 1))
You would get a good fit, and the solution found by the solver is:
popt
array([1559.13138798, 2128.64718985, 21.50092272, 0.16298357])
Even just getting the mean (parameter b) roughly right is enough for the solver to find the solution; e.g. try this:
popt, pcov = curve_fit(gaus, x, y, p0 = (1,1,20, 1))
You should see the same (good) result.
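If you prefer not to eyeball the data, a reasonable alternative (my sketch, not from the original answer) is to derive the starting point from the data itself, assuming x and y are NumPy arrays:

# crude, data-driven initial guess for (y0, a, b, c)
y0_guess = y.min()                    # baseline offset
a_guess = y.max() - y.min()           # peak amplitude above the baseline
b_guess = x[np.argmax(y)]             # x position of the peak ~ mean
c_guess = (x.max() - x.min()) / 10.0  # rough width: a tenth of the x range
popt, pcov = curve_fit(gaus, x, y, p0=(y0_guess, a_guess, b_guess, c_guess))

The width divisor (10 here) is arbitrary; the solver only needs the guess to be in the right ballpark.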
I am trying to fit a power law to some data that follows a power law with noise, displayed on a log-log scale. The fit with scipy curve_fit is the orange line, and the red line is the noiseless power law.
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def powerFit(x, A, B):
    return A * x**(-B)

x = np.logspace(np.log10(1), np.log10(1e5), 30)
A = 1
B = 2
np.random.seed(1)
y_noise = np.random.normal(1, 0.2, size=x.size)
y = powerFit(x, A, B) * y_noise  # multiplicative noise

plt.plot(x, y, 'o')
plt.xscale('log')
plt.yscale('log')

popt, pcov = curve_fit(powerFit, x, y, p0=[A, B])
perr = np.sqrt(np.diag(pcov))  # parameter standard errors

# coefficient of determination (R^2)
residuals = y - powerFit(x, *popt)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - np.mean(y))**2)
r_squared = 1 - (ss_res / ss_tot)

plt.plot(x, powerFit(x, *popt),
         label='fit: A*x**(-B)\n A=%.1e +- %.0e\n B=%.1e +- %.0e\n R2 = %.2f'
               % tuple(np.concatenate((np.array([popt, perr]).T.flatten(), [r_squared]))))
plt.plot(x, powerFit(x, A, B), 'r--', label='A = %d, B = %d' % (A, B))
plt.legend()
plt.savefig("fig.png", dpi=300)
I do not understand what is happening with the fitted power law. Why does it look wrong, and how could I solve this?
Note: I know you could also fit the power law by plotting log(y) vs log(x). But according to this answer, curve_fit should also manage to do it right directly. So my question is whether it is possible to do a power-law fit on a log-log scale without the log transformation. I am interested in avoiding the log-log transformation because it cannot be applied to every fit (consider, for example, a fit to y = A*x**(-B*x)).
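For what it's worth, a sketch of one common workaround (my addition, not an answer from the thread): plain least squares weights every point equally in absolute terms, so the few large-y points at small x dominate the cost and the tail of the power law is effectively ignored. Telling curve_fit that the noise is relative, via the sigma argument, rebalances the decades:

# sigma=0.2*y encodes the ~20% multiplicative noise used to generate the
# data above, so all decades of x contribute comparably to the fit
popt, pcov = curve_fit(powerFit, x, y, p0=[A, B], sigma=0.2*y, absolute_sigma=True)

With relative weights the fit should track the red reference line across the whole log-log range.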
I'm currently working on a lab report for Brownian Motion using this PDF equation with the intent of evaluating D:
[image: the Brownian PDF equation]
And I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fits, it's a line and does not appear correctly on the histogram.
[image: example histogram with a bad curve_fit]
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5

# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))

# Histogram
plt.hist(r, bins=10, density=True, label='Counts')

# Curve fit
x, y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03

def f(r, a):
    return (((1e-6)*3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))

print(x)  # these are flipped for some reason
print(y)

plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0=(1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)

plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke D down so that the Boltzmann constant appears as the parameter a in the function, which is why there are more numbers in f than in the equation above (the expressions for D and Gamma).
I've tried messing with the initial conditions and plotting the function with 1.38e-23 instead of popt, but that does this (the purple line). This tells me something is wrong with the equation for f, but no issues jump out at me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
    return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue that the curve_fit is a line with a positive slope instead of anything remotely like what I'm interested in. Now I am even more confused as to what the issue is.
There are a few things here. I don't think x and y were ever flipped; at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code: for example, I'm not sure why you build two different histograms, and I think there may have been problems handling the single-element tuple of parameters. Also, for curve fitting, the initial parameter guess often needs to be in the right ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))

# Histogram (reuse the values matplotlib already computed instead of
# building a second histogram with np.histogram)
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2

x = bin_centers[2:]  # not necessary, and I'm not sure why the OP did this, but kept for consistency
y = hist_values[2:]

def f(r, a):
    return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))

plt.plot(x, y, label='Fitting this', color='red')

err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0=(1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')

plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in meters')  # motion is in the micron range, but the calculation and plot use meters
plt.legend()
I am currently using a polynomial function to do 3D curve fitting, but to no avail.
[image 1: scatter plot; image 2: curve fit]
The code is given below:
import numpy
import scipy.optimize
import matplotlib.pyplot as plt
from matplotlib import cm

# import excel data into x_data, y_data, z_data
"""
how can I improve this polynomial function,
is there any better method than a polynomial?
"""
def func(data, a, b, c, d):
    x = data[0]
    y = data[1]
    # note: data[2] (z) is passed in but never used by the model
    return a + b * x + c * y + d * x**2

# use curve fitting to fit the function to the data
fittedParameters, pcov = scipy.optimize.curve_fit(
    func, [x_data, y_data, z_data],
    z_data, p0=None, method='lm', maxfev=5000000
)

# make the mesh grid
xModel = numpy.linspace(min(x_data), max(x_data), 80)
yModel = numpy.linspace(min(y_data), max(y_data), 80)
X, Y = numpy.meshgrid(xModel, yModel)

a, b, c, d = fittedParameters
Z = a + b * X + c * Y + d * X**2

figure = plt.figure()
axes = figure.add_subplot(111, projection='3d')
axes.plot_surface(
    X, Y, Z,
    rstride=1, cstride=1, cmap=cm.coolwarm,
    linewidth=1, antialiased=True
)
axes.scatter(x_data, y_data, z_data)  # show data along with the plotted surface

# add a title and axis labels for the surface plot
axes.set_title('Surface plot of LN(PoF) and length & depth')
axes.set_xlabel('Depth (mm)')
axes.set_ylabel('Length (mm)')
axes.set_zlabel('LN(PoF)')  # z axis data label
plt.show()
You can use the in-built scipy.interpolate module (splprep and splev):
from scipy import interpolate
import numpy as np

#%% splprep and splev for the 2D smoothing of x and y values
def splprep_2d(x, y):
    # fit a parametric smoothing spline through the (x, y) points
    tck, u = interpolate.splprep([x, y], s=2,
                                 task=0, full_output=0, quiet=0,
                                 k=5, t=None)
    # evaluate the spline at the original parameter values
    fittedParameters = interpolate.splev(u, tck)
    xnew = np.array(fittedParameters[0])
    ynew = np.array(fittedParameters[1])
    return xnew, ynew

xnew, ynew = splprep_2d(x, y)
s = 2 is the smoothing factor: a lower value keeps the curve closer to the data points, while a higher value gives a smoother curve.
k is the degree of the spline; degrees up to 5 can be used.
These are your smoothed coordinates:
xnew = np.array(fittedParameters[0])
ynew = np.array(fittedParameters[1])
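Note that splev(u, tck) only evaluates the spline at the original parameter values, so the result has as many points as the input. If you want a denser, visually smooth curve, you can evaluate on a finer parameter grid; a small sketch (my variant of the function above, relying on splprep's default parametrization, which runs from 0 to 1):

def splprep_2d_fine(x, y, n=200):
    # same spline fit as above, evaluated on a dense parameter grid
    tck, u = interpolate.splprep([x, y], s=2, k=5)
    u_fine = np.linspace(0, 1, n)
    x_fine, y_fine = interpolate.splev(u_fine, tck)
    return x_fine, y_fine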
I would like to visually plot a 3D graph of the error function calculated for a given slope and y-intercept for a linear regression.
This graph will be used to illustrate a gradient descent application.
Let’s suppose we want to model a set of points with a line. To do this we’ll use the standard y=mx+b line equation where m is the line’s slope and b is the line’s y-intercept. To find the best line for our data, we need to find the best set of slope m and y-intercept b values.
A standard approach to solving this type of problem is to define an error function (also called a cost function) that measures how "good" a given line is. This function takes an (m, b) pair and returns an error value based on how well the line fits the data. To compute this error for a given line, we iterate through each (x, y) point in the data set and sum the squared distances between each point's y value and the candidate line's y value (computed at mx + b). It's conventional to square this distance to ensure that it is positive and to make our error function differentiable. In Python, computing the error for a given line looks like this:
# y = mx + b
# m is slope, b is y-intercept
def computeErrorForLineGivenPoints(b, m, points):
    totalError = 0
    for i in range(0, len(points)):
        totalError += (points[i].y - (m * points[i].x + b)) ** 2
    return totalError / float(len(points))
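Equivalently, if the point coordinates are held in NumPy arrays, the same mean squared error can be computed without an explicit loop (a sketch with hypothetical array names xs and ys):

import numpy as np

def error_vectorized(b, m, xs, ys):
    # mean of the squared vertical distances between the data and y = m*x + b
    return np.mean((ys - (m * xs + b)) ** 2)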
Since the error function has two parameters (m and b), we can visualize it as a two-dimensional surface.
Now my question: how can we plot such a 3D graph using Python?
Here is skeleton code for building a 3D plot. This snippet is unrelated to the question context, but it shows the basics of building a 3D plot.
For my example, I would need the x axis to be the slope, the y axis to be the y-intercept, and the z axis the error.
Can someone help me build such an example graph?
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

def fun(x, y):
    return x**2 + y

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = y = np.arange(-3.0, 3.0, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([fun(x, y) for x, y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
ax.plot_surface(X, Y, Z)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()
The above code produces the following plot, which is very similar to what I am looking for.
Simply replace fun with computeErrorForLineGivenPoints:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import collections

def error(m, b, points):
    totalError = 0
    for i in range(0, len(points)):
        totalError += (points[i].y - (m * points[i].x + b)) ** 2
    return totalError / float(len(points))

x = y = np.arange(-3.0, 3.0, 0.05)
Point = collections.namedtuple('Point', ['x', 'y'])
m, b = 3, 2
noise = np.random.random(x.size)
points = [Point(xp, m*xp + b + err) for xp, err in zip(x, noise)]

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ms = np.linspace(2.0, 4.0, 10)
bs = np.linspace(1.5, 2.5, 10)
M, B = np.meshgrid(ms, bs)
zs = np.array([error(mp, bp, points)
               for mp, bp in zip(np.ravel(M), np.ravel(B))])
Z = zs.reshape(M.shape)

ax.plot_surface(M, B, Z, rstride=1, cstride=1, color='b', alpha=0.5)
ax.set_xlabel('m')
ax.set_ylabel('b')
ax.set_zlabel('error')
plt.show()
which yields a surface plot of the error as a function of m and b.
Tip: I renamed computeErrorForLineGivenPoints to error. Generally, there is no need to name a function compute..., since almost all functions compute something. You also do not need to include "GivenPoints" in the name, since the function signature shows that points is an argument. If you have other error functions or variables in your program, line_error or total_error might be a better name for this function.