import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
x = [333,500,1000,2000,5000,10000]
y = [195.3267, 233.0235, 264.5914,294.8728, 328.3523,345.4688]
popt, pcov = curve_fit(func, x, y)
plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
Error:
C:\Users\Aidan\Anaconda3\lib\site-packages\scipy\optimize\minpack.py:794:
OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
     14 plt.figure()
     15 plt.plot(x, y, 'ko', label="Original Noised Data")
---> 16 plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
     17 plt.legend()
     18 plt.show()
 in func(x, a, b, c)
      4
      5 def func(x, a, b, c):
----> 6     return a * np.exp(-b * x) + c
      7
      8 x = [333,500,1000,2000,5000,10000]
TypeError: 'numpy.float64' object cannot be interpreted as an integer
For some reason I am not able to get a curve fit of my data. I am following the exponential example from here: How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting.
But I am using two arrays rather than made-up random data. I am new to Python!
There are a few issues with your code.
You use lists instead of numpy.ndarray: the numpy and scipy routines are meant to work with numpy.ndarray and they use them internally. You should use them as well.
You are likely to run into overflow or underflow issues with your data and this function; for example, np.exp(-1000) already underflows to zero in Python 3.
You are trying to fit a function that is unlikely to fit your data: it looks more like an exponential recovery than a decay.
The following code tentatively addresses all of these issues:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * (1 - np.exp(-b * x)) + c
x = np.array([333.0, 500.0, 1000.0, 2000.0, 5000.0, 10000.0]) / 1000
y = np.array([195.3267, 233.0235, 264.5914, 294.8728, 328.3523, 345.4688]) / 10
popt, pcov = curve_fit(func, x, y)
print(popt)
plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
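Since x was divided by 1000 and y by 10 before fitting, popt describes the rescaled data. A small sketch of how the fit could be evaluated back in the original units, reusing func and popt from the snippet above:
x_orig = np.array([333.0, 500.0, 1000.0, 2000.0, 5000.0, 10000.0])
y_fit = func(x_orig / 1000, *popt) * 10   # undo the x/1000 and y/10 scaling
print(y_fit)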
I'm trying to calculate the area under the curve of a Gaussian. I even managed to fit my data, but I can't compute an integral using this fit.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as mpl
# Let's create a function to model and create data
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(4*sigma**2))
# Generating clean data
x = dados.col1
y = dados.col2
# Adding noise to the data
yn = y + 0.2 * np.random.normal(size=len(x))
# Plot out the current state of the data and model
fig = mpl.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, c='k', label='Function')
ax.scatter(x, yn)
# Executing curve_fit on noisy data
popt, pcov = curve_fit(func, x, yn)
#popt returns the best fit values for parameters of the given model (func)
print (popt)
ym = func(x, popt[0], popt[1], popt[2])
ax.plot(x, ym, c='r', label='Best fit')
ax.legend()
fig.savefig('model_fit.png')
I would like to obtain the area under this fitted function.
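One way to get that area, sketched here under the assumption that popt holds the fitted (a, x0, sigma) from the snippet above, is to integrate the fitted model numerically with scipy.integrate.quad; for this particular Gaussian form the analytic area is a*2*sigma*sqrt(pi):
from scipy.integrate import quad
# Numerical area under the fitted Gaussian over the whole real line;
# quad returns (value, estimated_error).
area, err = quad(func, -np.inf, np.inf, args=tuple(popt))
print(area)
# Analytic check: the integral of a*exp(-(x-x0)**2/(4*sigma**2)) over all x is a*2*sigma*sqrt(pi).
a, x0, sigma = popt
print(a * 2 * abs(sigma) * np.sqrt(np.pi))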
Can anyone help me with a fitting issue with curve_fit? I would like to fit my data to a second-order equation, but I obtained a result that looks like a linear equation.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    f = a*np.power(x, 2) + b*x + c
    return f
xdata_prime=[3.0328562996216282, 3.101784841139168, 3.1707134502066894, 3.2396419917242292, 3.308570533241769, 3.3774990747593088, 3.3774990747593088, 3.4337789932367149, 3.4900589392912855, 3.5463388577686916, 3.6026187762460977, 3.6588987223006684]
ydata_prime=[6.344300000000002, 6.723900000000002, 7.080399999999999, 7.399800000000001, 7.649099999999999, 7.753100000000002, 7.753100000000002, 7.658600000000002, 7.442100000000002, 7.180100000000001, 6.902700000000001, 6.6211]
plt.plot(xdata_prime, ydata_prime, 'b-', label='data')
popt, pcov = curve_fit(func, xdata_prime, ydata_prime)
popt
plt.plot(xdata_prime, func(xdata_prime, *popt), 'r-',label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Your arrays need to be numpy arrays because your function performs vectorized operations (namely a*np.power(x, 2) and b*x). So with this change your code will work:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    f = a*np.power(x, 2) + b*x + c
    return f
xdata_prime=np.array([3.0328562996216282, 3.101784841139168, 3.1707134502066894, 3.2396419917242292, 3.308570533241769, 3.3774990747593088, 3.3774990747593088, 3.4337789932367149, 3.4900589392912855, 3.5463388577686916, 3.6026187762460977, 3.6588987223006684])
ydata_prime=np.array([6.344300000000002, 6.723900000000002, 7.080399999999999, 7.399800000000001, 7.649099999999999, 7.753100000000002, 7.753100000000002, 7.658600000000002, 7.442100000000002, 7.180100000000001, 6.902700000000001, 6.6211])
plt.plot(xdata_prime, ydata_prime, 'b-', label='data')
popt, pcov = curve_fit(func, xdata_prime, ydata_prime)
plt.plot(xdata_prime, func(xdata_prime, *popt), 'r-',label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
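If the fitted curve still looks like straight segments because it is only evaluated at the 12 data points, a denser x grid can be used for the fit line; a small sketch reusing func and popt from above:
xs = np.linspace(min(xdata_prime), max(xdata_prime), 200)  # dense grid for a smooth curve
plt.plot(xdata_prime, ydata_prime, 'b.', label='data')
plt.plot(xs, func(xs, *popt), 'r-', label='fit')
plt.legend()
plt.show()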
I would like to find and plot a function f that represents a curve fitted to a number of set points that I already know, x and y.
After some research I started experimenting with scipy.optimize and curve_fit, but in the reference guide I found that the program fits a function to the data instead, assuming ydata = f(xdata, *params) + eps.
So my question is this: what do I have to change in my code to use curve_fit, or any other library, to find the function of the curve through my set points? (Note: I want to know the function as well, so I can integrate it later for my project and plot it.) I know it's going to be a decaying exponential function, but I don't know the exact parameters. This is what I tried in my program:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
I am currently developing this project on a Raspberry Pi, if that changes anything. I would like to use the least squares method, since it is precise, but any other method that works well is welcome.
Again, this is based on the reference guide of the scipy library. Also, I get the following graph, which is not even a curve: Graph and curve based on set points
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
#c is a constant, so taking the derivative makes it go to zero
def deriv(x, a, b, c):
    return -a * b * np.exp(-b * x)
#Integrating introduces another constant (offset); call it c1 and set it to zero by default
def integ(x, a, b, c, c1=0):
    return -a/b * np.exp(-b * x) + c*x + c1
#There are only 4 (x,y) points here
xdata = np.array([0.2, 0.5, 0.8, 1])
ydata = np.array([6, 1, 0.5, 0.2])
#curve_fit already uses "non-linear least squares to fit a function, f, to data"
popt, pcov = curve_fit(func, xdata, ydata)
a,b,c = popt #these are the optimal parameters for fitting your 4 data points
#Now get more x values to plot the curve along so it looks like a curve
step = 0.01
fit_xs = np.arange(min(xdata),max(xdata),step)
#Plot the results
plt.plot(xdata, ydata, 'bx', label='data')
plt.plot(fit_xs, func(fit_xs,a,b,c), 'r-', label='fit')
plt.plot(fit_xs, deriv(fit_xs,a,b,c), 'g-', label='deriv')
plt.plot(fit_xs, integ(fit_xs,a,b,c), 'm-', label='integ')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
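If the goal is the area under the fitted curve between two x values, the integ() antiderivative above can be evaluated at the endpoints; a short usage sketch over the data range:
x_lo, x_hi = min(xdata), max(xdata)
area = integ(x_hi, a, b, c) - integ(x_lo, a, b, c)  # definite integral of the fit over [x_lo, x_hi]
print(area)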
I'm trying to optimize a logarithmic fit to a data set with scipy.optimize.curve_fit. Before trying it on an actual data set, I wrote code to run on a dummy data set.
def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))
    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))
    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()
Of course, do_fitting() relies on func(), which it passes to curve_fit. Here's the problem. When I pass a func() that contains np.log, i.e. the function that I actually want to fit to, curve_fit declares that p0 (the initial condition) is the optimal solution and returns immediately with an infinite covariance.
Here's what happens if I run do_fitting() with a non-logarithmic func():
def func(x, a, b, c):
    return a * np.exp(x*b) + c
popt = [ 0.90894173 0.44279212 5.19928151]
pcov = [[ 0.02044817 -0.00471525 -0.02601574]
[-0.00471525 0.00109879 0.00592502]
[-0.02601574 0.00592502 0.0339901 ]]
Here's what happens when I run do_fitting() with a logarithmic func():
def func(x, a, b, c):
    return a * np.log(x*b) + c
popt = [ 2. 0.5 1. ]
pcov = inf
You'll notice that the logarithmic solution for popt is equal to the value I gave curve_fit for p0 in the above do_fitting(). This is true, and pcov is infinite, for every value of p0 I have tried.
What am I doing wrong here?
The problem is very simple - since the first value in your x array is 0, you are taking the log of 0, which is equal to -inf:
x = np.linspace(0, 4, 100)
p0 = np.array([2, 0.5, 1])
print(func(x, *p0).min())
# -inf
I was able to fit a logarithmic function just fine using the following code (hardly modified from your original):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.log(x+b) + c

def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))
    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))
    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()
do_fitting()
(Unfortunately I can't post a picture of the final fit, but it agrees quite nicely with the clean data).
Likely your problem is not the logarithm itself, but some difficulty curve_fit is having with the specific function you're trying to fit. Can you edit your question to provide an example of the exact logarithmic function you're trying to fit?
EDIT: The function you provided is not well-defined for x=0, and produces a RuntimeWarning upon execution. curve_fit is not good at handling NaNs, and will not be able to fit the function in this case. If you change x to
x = np.linspace(1, 4, 100)
curve_fit performs just fine.
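A quick sanity check along these lines, sketched below with the func and p0 from the question, is to verify that the model is finite at the initial guess before calling curve_fit:
p0 = np.array([2, 0.5, 1])
x = np.linspace(0, 4, 100)
with np.errstate(divide='ignore', invalid='ignore'):
    finite = np.all(np.isfinite(func(x, *p0)))  # False if the model hits -inf or NaN at p0
if not finite:
    print("Model is not finite at p0; shift x or change the function (e.g. np.log(x + b)).")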
How do I calculate the coefficient of determination (R2) and the root mean square error (RMSE) for non-linear curve fitting in Python? The following code does everything up to the curve fitting. How do I then calculate R2 and RMSE?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
x = np.linspace(0,4,50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
You could do it like this:
print "Mean Squared Error: ", np.mean((y-func(x, *popt))**2)
ss_res = np.dot((yn - func(x, *popt)),(yn - func(x, *popt)))
ymean = np.mean(yn)
ss_tot = np.dot((yn-ymean),(yn-ymean))
print "Mean R :", 1-ss_res/ss_tot
This takes the definitions directly, as given for example on Wikipedia:
http://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions
Martin Böschen, it should be yn, not y, here:
np.mean((y-func(x, *popt))**2)
And read this about root-mean-square error (RMSE): http://en.wikipedia.org/wiki/Regression_analysis
residuals = yn - func(x, *popt)
print("RMSE", (np.sum(residuals**2)/(residuals.size-2))**0.5)
This now matches what the Excel 2003 Analysis ToolPak calculates.
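Putting the pieces above together, a compact version of the same calculation (a sketch reusing x, yn, func and popt from the question's snippet; this RMSE is the plain root of the mean squared residual, without the degrees-of-freedom correction used above):
residuals = yn - func(x, *popt)
ss_res = np.sum(residuals**2)                   # residual sum of squares
ss_tot = np.sum((yn - np.mean(yn))**2)          # total sum of squares
r_squared = 1 - ss_res / ss_tot                 # coefficient of determination (R2)
rmse = np.sqrt(np.mean(residuals**2))           # root mean square error
print("R2:", r_squared, "RMSE:", rmse)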