I am using Python to batch process some data and plot it. I can fit it quite well using scipy.optimize.curve_fit, a bi-exponential function, and some sensible initial guesses. Here is a code snippet:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def biexpfunc(x, a, b, c, d, e):
    y_new = []
    for i in range(len(x)):
        y = (a * np.exp(b*x[i])) + (c * np.exp(d*x[i])) + e
        y_new.append(y)
    return y_new

x = np.linspace(0, 160, 100)
y = biexpfunc(x, 50, -0.2, 50, -0.1, 10)
jitter_y = y + 0.5 * np.random.rand(len(y)) - 0.1
plt.scatter(x, jitter_y)

sigma = np.ones(len(x))
sigma[[0, -1]] = 0.01
popt, pcov = curve_fit(biexpfunc, x, jitter_y, p0=(50, -0.2, 50, -0.1, 10),
                       sigma=sigma)
x_fit = np.linspace(0, x[-1])
y_fit = biexpfunc(x_fit, *popt)
plt.plot(x_fit, y_fit, 'r--')
plt.show()
I know how to interpolate this to find y for a given value of x (by putting it back into the function), but how can I find x for a given value of y? I feel like there must be a sensible method that doesn't require re-arrangement and defining a new function (partially because maths is not my strong suit and I don't know how to!). If the curve fits the data well is there a way to simply read off a value? Any assistance would be greatly appreciated!
Turns out, your question has nothing to do with curve fitting but is actually about root finding. scipy.optimize has a whole arsenal of functions for this task. Choosing and configuring the right one is sometimes difficult. I might not be the best guide here, but since no one else stepped up...
Root finding tries to determine x-values for which f(x) is zero. To find an x0 where f(x0) equals a certain value y0, we simply transform the function into g(x) = f(x) - y0, whose root is the desired x0.
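In code, that shift is a one-liner; a minimal sketch, where f stands for any fitted function of x and y0 for the target value:

g = lambda x: f(x) - y0  # any root finder applied to g now solves f(x) = y0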
Since your function is monotonic, at most one root is to be expected for a given y-value. We also know the x-interval in which to search, so bisect seems to be a reasonable strategy:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit, bisect
def biexpfunc(x, a, b, c, d, e):
    return (a * np.exp(b*x)) + (c * np.exp(d*x)) + e
np.random.seed(123)
x = np.linspace(0, 160, 100)
y = biexpfunc(x, 50, -0.2, 50, -0.1, 10)
jitter_y = y + 0.5 *np.random.rand(len(y)) - 0.1
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(x, jitter_y, marker="x", color="blue", label="raw data")
#your curve fit routine
sigma = np.ones(len(x))
sigma[[0, -1]] = 0.01
popt, pcov = curve_fit(biexpfunc, x, jitter_y, p0=(50, -0.2, 50, -0.1, 10), sigma=sigma)
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = biexpfunc(x_fit, *popt)
ax.plot(x_fit, y_fit, 'r--', label="fit")
#y-value for which we want to determine the x-value(s)
y_test = 55
test_popt = popt.copy()
test_popt[-1] -= y_test
#here, the bisect method tries to establish the x for which f(x)=0
x_test = bisect(biexpfunc, x.min(), x.max(), args=tuple(test_popt))
#we calculate the deviation from the expected y-value
tol_test, = np.abs(y_test - biexpfunc(np.asarray([x_test]), *popt))
#and mark the determined point in the graph
ax.axhline(y_test, ls="--", color="grey")
ax.axvline(x_test, ls="--", color="grey")
ax.plot(x_test, y_test, c="tab:orange", marker="o", markersize=15, alpha=0.5)
ax.annotate(f"X: {x_test:.2f}, Y: {y_test:.2f}\ntol: {tol_test:.4f}",
            xy=(x_test, y_test), xytext=(50, 50), textcoords="offset points",
            arrowprops=dict(facecolor="tab:orange", shrink=0.05))
ax.legend(title="root finding: bisect")
plt.show()
Sample output:
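For the same bracketed search, scipy.optimize.brentq can serve as a drop-in alternative to bisect and typically converges in fewer iterations; a minimal sketch reusing x and test_popt from the script above:

from scipy.optimize import brentq

# same bracket and shifted parameters as in the bisect call above
x_test = brentq(biexpfunc, x.min(), x.max(), args=tuple(test_popt))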
Another way to determine roots for more complex functions is, surprise, root. The script is largely identical; only the root-finding routine differs, and we can, for instance, choose the root-finding method:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit, root
def biexpfunc(x, a, b, c, d, e):
    return (a * np.exp(b*x)) + (c * np.exp(d*x)) + e
np.random.seed(123)
x = np.linspace(0, 160, 100)
y = biexpfunc(x, 50, -0.2, 50, -0.1, 10)
jitter_y = y + 0.5 *np.random.rand(len(y)) - 0.1
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(x, jitter_y, marker="x", color="blue", label="raw data")
#your curve fit routine
sigma = np.ones(len(x))
sigma[[0, -1]] = 0.01
popt, pcov = curve_fit(biexpfunc, x, jitter_y, p0=(50, -0.2, 50, -0.1, 10), sigma=sigma)
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = biexpfunc(x_fit, *popt)
ax.plot(x_fit, y_fit, 'r--', label="fit")
#y-value for which we want to determine the x-value(s)
y_test = 55
test_popt = popt.copy()
test_popt[-1] -= y_test
#calculate corresponding x-value with root finding
r = root(biexpfunc, x.mean(), args=tuple(test_popt), method="lm")
x_test, = r.x
tol_test, = np.abs(y_test - biexpfunc(r.x, *popt))
#mark point in graph
ax.axhline(y_test, ls="--", color="grey")
ax.axvline(x_test, ls="--", color="grey")
ax.plot(x_test, y_test, c="tab:orange", marker="o", markersize=15, alpha=0.5)
ax.annotate(f"X: {x_test:.2f}, Y: {y_test:.2f}\ntol: {tol_test:.4f}",
            xy=(x_test, y_test), xytext=(50, 50), textcoords="offset points",
            arrowprops=dict(facecolor="tab:orange", shrink=0.05))
ax.legend(title="root finding: lm")
plt.show()
Sample output:
In this case, the two graphs look identical. That is not necessarily so for every function; just as with curve fitting, choosing the right approach can dramatically improve the outcome.
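As a side note: because the fitted curve here is monotonically decreasing, the "simply read off a value" idea from the question also works with plain linear interpolation along the densely sampled fit; a minimal sketch reusing x_fit and y_fit from above (np.interp needs increasing x-coordinates, hence the reversals):

# approximate x for y = 55 by interpolating the fit curve with the axes swapped
x_approx = np.interp(55, y_fit[::-1], x_fit[::-1])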
I'm new to the function curve_fit() from scipy.optimize, and I can't get it to work.
I have a really simple bar plot, and I would like to create a curve that fits it.
My code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x = [i for i in range(15)]
y = [1, 3, 4, 6, 8, 4, 2, 1, 5, 8, 6, 5, 5, 8, 5]
plt.bar(x, y, color='yellow')
plt.show()
# that is working

curve_fit(x, y)  # I want the curve to fit the barplot
# but it returns an error...
plt.show()
Result: an error raised by curve_fit.
If you could help me, that would be really great.
That is a bonus, don't waste too much time on it, but would you know how to draw the curve and do some forecasting? For instance, the result could be:
curve_fit
You need to pass a fitting function to curve_fit. Note that the line you've drawn is quite overfit for such a small sample and would require a high-order polynomial (even a cubic fit won't look like that).
Here is an example of using a quartic fitting function f_curve4:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# curve_fit requires x and y to be arrays (not lists)
x = np.arange(15)
y = np.array([1, 3, 4, 6, 8, 4, 2, 1, 5, 8, 6, 5, 5, 8, 5])
plt.bar(x, y, color='cyan')
# fit
f_curve4 = lambda x, a, b, c, d, e: a*x**4 + b*x**3 + c*x**2 + d*x + e
popt, pcov = curve_fit(f_curve4, x, y)
plt.plot(x, f_curve4(x, *popt), '--', label='fit')
# forecast
x_new = np.arange(max(x), max(x) + 2)
plt.plot(x_new, f_curve4(x_new, *popt), 'r:', label='forecast')
polyfit
Alternatively use polyfit and just pass deg=N without manually defining an Nth-order fitting function:
plt.bar(x, y, color='cyan')
# fit
f_poly4 = np.polyfit(x, y, deg=4)
x_fit = np.linspace(min(x), max(x), 100)
y_fit = np.polyval(f_poly4, x_fit)
plt.plot(x_fit, y_fit, '--', label='fit')
# forecast
x_new = np.linspace(max(x), max(x) + 1, 10)
y_new = np.polyval(f_poly4, x_new)
plt.plot(x_new, y_new, 'r:', lw=2, label='forecast')
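One caveat: polynomial extrapolation is notoriously unstable, so whether it comes from curve_fit or polyfit, a quartic "forecast" is only trustworthy very close to the edge of the data; the further out x_new goes, the faster the curve shoots off.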
interp1d
Depending on your use case, consider interpolating with interp1d instead of fitting a polynomial. Here is an example using a cubic interpolation function f_interp:
from scipy import interpolate

plt.bar(x, y, color='cyan')
f_interp = interpolate.interp1d(x, y, kind='cubic')
x2 = np.linspace(min(x), max(x), 100)
plt.plot(x2, f_interp(x2), '--')
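Note that interp1d passes exactly through every data point, so it is interpolation rather than fitting: it will not smooth out noise, and by default it raises an error outside [min(x), max(x)], so it cannot forecast unless you pass fill_value='extrapolate' (with the usual caveats).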
I have two variables, x and y, that are random variables. I want to fit a curve to them that plateaus. I've been able to do this using an exponential fit but I'd like to do so with a quadratic fit as well.
How can I get the fit to flatten out at the top? FWIW, the y data were generated such that no value goes above 4300, so the new curve should probably respect that bound.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.asarray([70,37,39,42,35,35,44,40,42,51,65,32,56,51,33,47,33,42,33,44,46,38,53,38,54,54,51,46,50,51,48,48,50,32,54,60,41,40,50,49,58,35,53,66,41,48,43,54,51])
y = np.asarray([3781,3036,3270,3366,2919,2966,3326,2812,3053,3496,3875,1823,3510,3615,2987,3589,2791,2819,1885,3570,3431,3095,3678,2297,3636,3569,3547,3553,3463,3422,3516,3538,3671,1888,3680,3775,2720,3450,3563,3345,3731,2145,3364,3928,2720,3621,3425,3687,3630])
def polyfit(x, y, degree):
    results = {}
    coeffs = np.polyfit(x, y, degree)
    # Polynomial Coefficients
    results['polynomial'] = coeffs.tolist()
    # r-squared, fit values, and average
    p = np.poly1d(coeffs)
    yhat = p(x)
    ybar = np.sum(y) / len(y)
    ssreg = np.sum((yhat - ybar)**2)
    sstot = np.sum((y - ybar)**2)
    results['determination'] = ssreg / sstot
    return results, yhat, ybar

def plot_polyfit(x=None, y=None, degree=None):
    # degree = degree of the fitting polynomial
    xmin = min(x)
    xmax = max(x)
    fig, ax = plt.subplots(figsize=(5, 4))
    p = np.poly1d(np.polyfit(x, y, degree))
    t = np.linspace(xmin, xmax, len(x))
    ax.plot(x, y, 'ok', t, p(t), '-', markersize=3, alpha=0.6, linewidth=2.5)
    results, yhat, ybar = polyfit(x, y, degree)
    R_squared = results['determination']
    textstr = r'$r^2=%.2f$' % (R_squared, )
    props = dict(boxstyle='square', facecolor='lightgray', alpha=0.5)
    fig.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=12,
             verticalalignment='top', bbox=props)
    results['polynomial'][0]

plot_polyfit(x=x, y=y, degree=2)
In contrast, I can use the same functions and get the curve to plateau better when the data look like this:
x2 = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12])
y2 = np.asarray([2, 4, 8, 12, 14, 18, 20, 21, 22, 23, 24, 24])
plot_polyfit(x=x2, y=y2, degree=2)
Edits suggested by @tstanisl:
def plot_newfit(xdat, ydat):
    x, y = xdat, ydat
    xmax = 4300

    def new_fit(A, x, B):
        return A*(x - xmax)**2 + B  # testing this out

    fig, axs = plt.subplots(figsize=(5, 4))
    # Find best fit.
    popt, pcov = curve_fit(new_fit, x, y)
    # Plot data and best fit curve.
    axs.plot(x, y, 'ok', alpha=0.6)
    axs.plot(np.sort(x), new_fit(np.sort(x), *popt), '-')
    # r2
    residuals = y - new_fit(x, *popt)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    r_squared = 1 - (ss_res / ss_tot)
    # Add text
    textstr = r'$r^2=%.2f$' % (r_squared, )
    props = dict(boxstyle='square', facecolor='lightgray', alpha=0.5)
    fig.text(0.05, 0.95, textstr, transform=axs.transAxes, fontsize=12,
             verticalalignment='top', bbox=props)

plot_newfit(x, y)
You just need to slightly modify new_fit() so that the independent variable x comes first and the function fits A and B, rather than x and B.
Set xmax to the desired location of the peak. Using x.max() will guarantee that the fitted curve flattens at the last sample.
def new_fit(x, A, B):
    xmax = x.max()  # or 4300
    return A*(x - xmax)**2 + B  # testing this out
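With the signature (x, A, B), curve_fit(new_fit, x, y) now receives the independent variable first, as it expects, and only A and B are treated as free parameters.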
Result:
I'm not too familiar with scipy.optimize, but if you find the Euclidean distance between the point that contains your x-max and the one that contains your y-max, divide it in half, and do some trigonometry, you could use that coordinate either to force your quadratic through it or to add it to your array. (Again, I'm not too familiar with scipy.optimize, so I'm not sure the first option is possible, but the second should lessen the downward curve.)
I can provide the proof if you don't understand.
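A rough sketch of one reading of this idea (assumptions, since the answer leaves them open: "the point that contains x-max" is the sample with the largest x, "the one that contains your y-max" is the sample with the largest y, and the midpoint between them is appended to the data with a tiny sigma so that curve_fit is pulled through it):

import numpy as np
from scipy.optimize import curve_fit

# midpoint of the segment between the x-max sample and the y-max sample
p1 = np.array([x.max(), y[np.argmax(x)]])
p2 = np.array([x[np.argmax(y)], y.max()])
mid = (p1 + p2) / 2

# append the midpoint and weight it heavily via a small sigma
x_aug = np.append(x, mid[0])
y_aug = np.append(y, mid[1])
sigma = np.ones_like(x_aug, dtype=float)
sigma[-1] = 0.01

quad = lambda x, a, b, c: a*x**2 + b*x + c
popt, pcov = curve_fit(quad, x_aug, y_aug, sigma=sigma)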
I have the following data set:
x = 0, 5, 10, 15, 20, 25, 30
y = 0, 0.13157895, 0.31578947, 0.40789474, 0.46052632, 0.5, 0.53947368
Now, I want to plot this data, fit the data set with my defined function f(x) = A*K*x / (1 + K*x), and find the parameters A and K.
I wrote the following Python script, but it seems like it can't do the fitting I require:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x = np.array([0, 5, 10, 15, 20, 25, 30])
y = np.array([0, 0.13157895, 0.31578947, 0.40789474, 0.46052632, 0.5, 0.53947368])
def func(x, A, K):
    return (A*K*x / (1 + K*x))
plt.plot(x, y, 'b-', label='data')
popt, pcov = curve_fit(func, x, y)
plt.plot(x, func(x, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Still, it's not giving a good curve fit. Can anyone help me with changes to the Python script, or a new script, so that I can properly fit the data with my desired fitting function?
The classic problem: you didn't give an initial guess for either A or K. In this case the default value is 1 for all parameters, which is not suitable for your data set, and the fit converges somewhere else. You can figure out the guesses in different ways: by looking at the data, from the real meaning of the parameters, etc. You can supply guesses via the p0 parameter of scipy.optimize.curve_fit. It accepts a list of values in the order in which they appear in the func you want to optimize. I used 0.1 for both and got this curve:
popt, pcov = curve_fit(func, x, y, p0=[0.1, 0.1])
Try Minuit, which is a package implemented at CERN.
from iminuit import Minuit
import numpy as np
import matplotlib.pyplot as plt

def func(x, A, K):
    return (A*K*x / (1 + K*x))

def least_squares(a, b):
    yvar = 0.01
    return sum((y - func(x, a, b)) ** 2 / yvar)

x = np.array([0, 5, 10, 15, 20, 25, 30])
y = np.array([0, 0.13157895, 0.31578947, 0.40789474, 0.46052632, 0.5, 0.53947368])

m = Minuit(least_squares, a=5, b=5)
m.migrad()  # finds minimum of least_squares function
m.hesse()   # computes errors

plt.plot(x, y, "o")
plt.plot(x, func(x, *m.values.values()))

# print parameter values and uncertainty estimates
for p in m.parameters:
    print("{} = {} +/- {}".format(p, m.values[p], m.errors[p]))
And the outcome:
a = 0.955697134431429 +/- 0.4957121286951612
b = 0.045175437602766676 +/- 0.04465599806912648
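One assumption to be aware of in the least_squares definition above: yvar = 0.01 hard-codes the measurement variance, and the uncertainty estimates reported by hesse() scale with it, so if the true variance differs, the +/- values shift accordingly.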
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# df is the DataFrame holding the raw columns
yp = df.final_sulfur / df.start_sulfur
xp = df.mag / df.hm_weight

sns.set(font_scale=1.5, font='DejaVu Sans')
fig, ax = plt.subplots(1, 1, figsize=(9, 9))
p = ax.plot(xp, yp, marker='o', markersize=1, alpha=0.2, linestyle='none')
p = ax.set_xlabel('mag/hm_weight')
p = ax.set_ylabel('final/start sulfur')
p = ax.set_title('final S vs mag')
p = ax.set_xlim(0.08, 12)
p = ax.set_ylim(3.0e-3, 1.5)
p = ax.set_xscale('log')
p = ax.set_yscale('log')
leg = plt.legend()
I am trying to fit the curve. Below is the equation, which I think should be an exponential decay, but what I am getting is a completely bad result:
from scipy.optimize import curve_fit

def func(xp, a, b, c):
    return a * np.exp(-b*xp) + c

popt2, pcov2 = curve_fit(func, xp, yp, p0=None)
a2, b2, c2 = popt2
print('a2=', a2, 'b2=', b2, 'c2=', c2)
print('func=', func(xp, a2, b2, c2))
ax.plot(xp, func(xp, *popt2), 'b-', label='Fit', linestyle='--', color='red')
Note: I need to use a log scale for plotting, which is when the data makes sense.
Here is what my sample data looks like (it's difficult to put all the data here, unfortunately).
xp (sample), converted to a list for the sake of SO:
[1.8535530937584435,
0.3220338983050847,
1.184544992374174,
0.7583873696081196,
0.3209681662720337,
1.158380317785751,
1.6285714285714286,
0.44209925841414716,
0.7396205664008799,
0.27983539094650206,
0.575319622012229,
0.3287671232876712,
1.382921589688507,
0.8247978436657682,
1.315934065934066,
0.23450134770889489,
0.5697069296083265,
1.0015731515469324,
1.2841602547094721,
0.645600653772814,
0.4599483204134367,
0.8340260459961208,
0.8992900341835393,
0.961340206185567,
0.5845225027442371,
0.9623773173391493,
1.3451708366962605,
0.8418230563002681,
0.7456025203465477,
1.9345156889495225]
yp (sample):
[0.05202312138728324,
0.47058823529411764,
0.04833333333333333,
0.11392405063291139,
0.36363636363636365,
0.020588235294117647,
0.008823529411764706,
0.25641025641025644,
0.12,
0.47826086956521735,
0.1826923076923077,
0.3333333333333333,
0.01282051282051282,
0.029411764705882353,
0.03225806451612903,
0.26666666666666666,
0.05,
0.011428571428571429,
0.12080536912751677,
0.11764705882352941,
0.2926829268292683,
0.049999999999999996,
0.06578947368421052,
0.08024691358024691,
0.15517241379310343,
0.024390243902439025,
0.017543859649122806,
0.05479452054794521,
0.03571428571428571,
0.007142857142857143]
This is what I have so far; I am hoping to get a better curve, more of an exponential decay:
yp = df.final_sulfur / df.start_sulfur
xp = df.mag / df.hm_weight
test_data = xp.to_frame(name='xp').join(yp.to_frame(name='yp'))

sns.set(font_scale=1.5, font='DejaVu Sans')
fig, ax = plt.subplots(1, 1, figsize=(9, 15))
p = ax.plot(xp, yp, marker='o', markersize=1, alpha=0.2, linestyle='none')
#p = sns.regplot(x=xp, y=yp, data=test_data, logx=True)

model = lambda x, A, x0, sigma, offset: offset + A*np.exp(-((x - x0)/sigma)**1)
popt, pcov = curve_fit(model, xp.values, yp.values, p0=[0, 0.5, 1, 1])
x = np.linspace(xp.values.min(), xp.values.max())
p = ax.plot(x, model(x, *popt), label="fit", color='red', linestyle='--')

model2 = lambda x, sigma: model(x, 0.5, 0, sigma**1, 0)
x2 = np.linspace(xp.values.min(), xp.values.max())
popt2, pcov2 = curve_fit(model2, xp.values, yp.values, p0=[1])
p = ax.plot(x2, model2(x2, *popt2), label="fit2", color='green')

model3 = lambda x, A, x0, sigma, offset: offset + A*np.exp(-((x + x0)/sigma)**1)
popt3, pcov3 = curve_fit(model3, xp.values, yp.values, p0=[0, 0.5, 1, 1])
x3 = np.linspace(xp.values.min(), xp.values.max())
p = ax.plot(x3, model3(x3, *popt3), label="fit3", color='blue')  # was plotting model with model3's parameters
I'm trying to optimize a logarithmic fit to a data set with scipy.optimize.curve_fit. Before trying it on an actual data set, I wrote code to run on a dummy data set.
def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))
    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))
    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()
Of course, do_fitting() relies on func(), which it passes to curve_fit. Here's the problem. When I pass a func() that contains np.log, i.e. the function that I actually want to fit to, curve_fit declares that p0 (the initial condition) is the optimal solution and returns immediately with an infinite covariance.
Here's what happens if I run do_fitting() with a non-logarithmic func():
def func(x, a, b, c):
    return a * np.exp(x*b) + c
popt = [ 0.90894173 0.44279212 5.19928151]
pcov = [[ 0.02044817 -0.00471525 -0.02601574]
[-0.00471525 0.00109879 0.00592502]
[-0.02601574 0.00592502 0.0339901 ]]
Here's what happens when I run do_fitting() with a logarithmic func():
def func(x, a, b, c):
    return a * np.log(x*b) + c
popt = [ 2. 0.5 1. ]
pcov = inf
You'll notice that the logarithmic solution for popt is equal to the value I gave curve_fit for p0 in the above do_fitting(). This is true, and pcov is infinite, for every value of p0 I have tried.
What am I doing wrong here?
The problem is very simple: since the first value in your x array is 0, you are taking the log of 0, which is equal to -inf:
x = np.linspace(0, 4, 100)
p0 = np.array([2, 0.5, 1])
print(func(x, *p0).min())
# -inf
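Because the residuals at p0 are already non-finite, the least-squares routine has no usable gradient information, so curve_fit gives up at the initial guess and reports an infinite covariance, which is exactly the behavior you observed.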
I was able to fit a logarithmic function just fine using the following code (hardly modified from your original):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.log(x+b) + c

def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))
    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))
    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()
do_fitting()
(Unfortunately I can't post a picture of the final fit, but it agrees quite nicely with the clean data).
Likely your problem is not the logarithm itself, but some difficulty curve_fit is having with the specific function you're trying to fit. Can you edit your question to provide an example of the exact logarithmic function you're trying to fit?
EDIT: The function you provided is not well-defined for x=0, and produces a RuntimeWarning upon execution. curve_fit is not good at handling NaNs, and will not be able to fit the function in this case. If you change x to
x = np.linspace(1, 4, 100)
curve_fit performs just fine.
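Alternatively, keep the original grid and just drop the offending first sample; np.log(x*b) is -inf at x = 0 for any positive b, so removing that one point is enough:

x = np.linspace(0, 4, 100)[1:]  # same grid, with the x = 0 sample removed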