The fitted curve doesn't fit the data points (xH_data, nH_data) as expected. Does anyone know what the issue might be here?
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
plt.plot(xH_data, nH_data)
plt.yscale("log")
plt.xscale("log")
def eTemp(x, A, a, B):
    n = B*(A+x)**a
    return n
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = fit_B *(fit_A + r)**(fit_a)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
Thanks in advance for the help.
Ok, here is a different approach. As usual, the main problem is the initial guesses for the non-linear fit (for details, check this). Here, they are obtained from an integral relation of the fit function y(x) = a (x - c)^p, namely

int y dx = (x - c) y / (p + 1) + d = x y / (p + 1) - c y / (p + 1) + d

so we can get c and p via a linear fit of int y against x y and y. Once those are known, a follows from a simple linear fit. It will turn out that these guesses are already quite good. Nevertheless, they go as initial values into a non-linear fit that provides the final result. In detail this goes like this:
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.optimize import curve_fit
xHdata = np.array(
[
1.0, 1.03, 1.06, 1.1, 1.2, 1.3, 1.5,
1.7, 2.0, 2.6, 3.0, 4.0, 5.0, 6.0
]
)
nHdata = np.array(
[
403.0, 316.0, 235.0, 160.0, 70.8, 37.6,
14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029
]
) * 1.0e6
def fit_func( x, a, c, p ):
    out = a * ( x - c )**p
    return out
### fitting the non-linear parameters as part of an integro-equation
### this is the standard matrix formulation of a linear fit
Sy = cumulative_trapezoid( nHdata, x=xHdata, initial=0 ) ## int( y )
VMXT = np.array( [ xHdata * nHdata , nHdata, np.ones( len( nHdata ) ) ] ) ## ( x y, y, d )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, Sy )
sol = np.linalg.solve( A , SV )
print ( sol )
pF = 1 / sol[0] - 1
print( pF )
cF = -sol[1] * ( pF + 1 )
print( cF )
### making a linear fit on the scale
### the short version of the matrix form if only one factor is calculated
fk = fit_func( xHdata, 1, cF, pF )
aF = np.dot( nHdata, fk ) / np.dot( fk, fk )
print( aF )
#### using these guesses as input for a final non-linear fit
sol, cov = curve_fit(fit_func, xHdata, nHdata, p0=( aF, cF, pF ) )
print( sol )
print( cov )
### plotting
xth = np.linspace( 1, 6, 125 )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xHdata, nHdata )
ax.plot( xth, fit_func( xth, aF, cF, pF ), ls=':' )
ax.plot( xth, fit_func( xth, *sol ) )
plt.show()
Providing:
[-3.82334284e-01 2.51613126e-01 5.41522867e+07]
-3.6155122388787175
0.6580972107001803
8504146.59883185
[ 5.32486242e+07 2.44780953e-01 -7.24897172e+00]
[[ 1.03198712e+16 -2.71798924e+07 -2.37545914e+08]
[-2.71798924e+07 7.16072922e-02 6.26461373e-01]
[-2.37545914e+08 6.26461373e-01 5.49910325e+00]]
(note the high correlation of a with c and p)
I know of two things that might help you:
1. Provide the p0 input parameter to curve_fit with a set of appropriate starting parameters for the function. That can keep the algorithm from running wild.
2. Change the function you are fitting so that it returns np.log(n) and then fit it to np.log(nH_data). As it is now, there is a far larger penalty for not fitting the first data points than for not fitting the last data points, because the values are about 10^2 larger for the first ones. Thus, the first data points become "more important" for the algorithm to fit. Taking the logarithm puts them on roughly the same scale, so that all points are weighted equally (a quick numeric check follows below).
Go ahead and play around with it. I managed a pretty decent fit with these parameters:
[-7.21450545e-01 -3.36131028e+00 5.97293632e+06]
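A quick numeric check on the second point, using the nH_data from the question, shows why the raw-scale fit is dominated by the first few points:

import numpy as np

nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11,
                    2.81, 0.665, 0.313, 0.090, 0.044, 0.029]) * 1.0e6

# On the raw scale the first value is roughly 1.4e4 times the last one, so its residual
# dominates the least-squares sum; after taking logs the spread is only a factor of ~2.
print(nH_data[0] / nH_data[-1])                   # ~1.4e4
print(np.log(nH_data[0]) / np.log(nH_data[-1]))   # ~1.9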
I think you're nearly there; you just need to fit on a log scale and throw in a decent guess. To make the guess, you just need to make a quick plot like
plt.figure()
plt.plot(np.log(xH_data), np.log(nH_data))
and you'll see it's nearly linear. So your B will be the exponentiated intercept (i.e. exp(20-ish)) and a is the approximate slope (about -5). A is a weird one; does it have some physical meaning, or did you just throw it in there? If there's no physical meaning, I'd say get rid of it.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
xH_data = np.array([1., 1.03, 1.06, 1.1, 1.2, 1.3, 1.5, 1.7, 2., 2.6, 3., 4., 5., 6.])
nH_data = np.array([403., 316., 235., 160., 70.8, 37.6, 14.8, 7.11, 2.81, 0.665, 0.313, 0.090, 0.044, 0.029])*1.0e6
def eTemp(x, A, a, B):
    logn = np.log(B*(x + A)**a)
    return logn
parameters, covariance = curve_fit(eTemp, xH_data, np.log(nH_data), p0=[np.exp(0.1), -5, np.exp(20)], maxfev=200000)
fit_A = parameters[0]
fit_a = parameters[1]
fit_B = parameters[2]
print(fit_A)
print(fit_a)
print(fit_B)
r = np.logspace(0, 0.7, 1000)
ne = np.exp(eTemp(r, fit_A, fit_a, fit_B))
plt.plot(xH_data, nH_data)
plt.plot(r, ne)
plt.yscale("log")
plt.xscale("log")
There is a problem with your fit equation. If A is less than -1 and your a parameter is negative and non-integer, then A + x is negative for the smallest x values and (A + x)**a is no longer a real number (NumPy returns NaN) within your fit range. For this reason you need to add bounds and an initial set of parameters to your curve_fit call, for example:
parameters, covariance = curve_fit(eTemp, xH_data, nH_data, method='dogbox', p0 = [100, -3.3, 10E8], bounds=((-0.9, -10, 0), (200, -1, 10e9)), maxfev=200000)
Because bounds are supplied, curve_fit can no longer use the default 'lm' method; here 'dogbox' is selected, although 'trf' (the default whenever bounds are given) would also work.
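A tiny illustration of why the unconstrained fit can fail (the values -0.5 and -3.3 are just examples):

import numpy as np

# A negative base with a non-integer exponent has no real result, so NumPy returns NaN
# (plain Python would return a complex number), and the least-squares step breaks down.
print(np.power(-0.5, -3.3))   # nan, with an "invalid value encountered in power" warning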
I'm having trouble with a fit: depending on my start parameters, it either fails to converge or gives a NaN error. I'm using quad to integrate and lmfit for the fitting. Any help is appreciated.
I'm fitting my data to a Langevin function, weighted by a log-normal distribution. Stackoverflow won't let me post an image of the function because of my reputation score, but it's in the code below.
I'm plugging in H (field) and fitting for Ms, Dm, and sigma, while mu_0, Msb, kb, and T are all constants.
Here's what I'm working with, using some example data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
from numpy import vectorize, sqrt, log, inf, exp, pi, tanh
from scipy.constants import k, mu_0
from lmfit import Parameters
from scipy.integrate import quad
x_data = [-7.0, -6.5, -6.0, -5.5, -5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0, -1.5, -1.0,
-0.95, -0.9, -0.85, -0.8, -0.75, -0.7, -0.65, -0.6, -0.55, -0.5, -0.45, -0.4,
-0.35, -0.3, -0.25, -0.2, -0.1,-0.05, 3e-6, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3,
0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0,
1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
y_data = [-61.6, -61.6, -61.6, -61.5, -61.5, -61.4, -61.3, -61.2, -61.1, -61.0, -60.8,
-60.4, -59.8, -59.8, -59.7, -59.5, -59.4, -59.3, -59.1, -58.9, -58.7, -58.4,
-58.1, -57.7, -57.2, -56.5, -55.6, -54.3, -52.2, -48.7, -41.8, -27.3, 2.6,
30.1, 43.1, 49.3, 52.6, 54.5, 55.8, 56.6, 57.3, 57.8, 58.2, 58.5, 58.7, 59.0,
59.1, 59.3, 59.5, 59.6, 59.7, 59.8, 59.9, 60.5, 60.8, 61.0, 61.2, 61.3, 61.4,
61.4, 61.5, 61.6, 61.6, 61.7, 61.7]
params = Parameters()
params.add('Dm' , value = 8e-9 , vary = True, min = 0, max = 1) # magnetic diameter (m)
params.add('s' , value = 0.4 , vary = True, min = 0.0, max = 10.0) # sigma, unitless
params.add('Ms' , value = 61.0 , vary = True) #, min = 30.0 , max = 100.0) # saturation magnetization (emu/g)
params.add('Msb', value = 446000 * 1e-16, vary = False) # Bulk magnetite saturation magnetization (A/m)
params.add('T' , value = 300 , vary = False) # Temperature (K)
def Mag(x_data, params):
    v = params.valuesdict()  # put parameters into a dictionary
    def numerator(D, x_data, params):
        # langevin
        a_numerator = pi * v['Msb'] * x_data * D**3
        a_denominator = 6*k*v['T']
        a = a_numerator / a_denominator
        langevin = (1/tanh(a)) - (1/a)
        # PDF
        exp_num = (log(D/v['Dm']))**2
        exp_denom = 2 * v['s']
        exponential = exp(-exp_num/exp_denom)
        pdf = exponential/(sqrt(2*pi) * v['s'] * D)
        return D**3 * langevin * pdf
    def denominator(D, params):
        # PDF
        exp_num = (log(D/v['Dm']))**2
        exp_denom = 2 * v['s']
        exponential = exp(-exp_num/exp_denom)
        pdf = exponential/(sqrt(2*pi) * v['s'] * D)
        return D**3 * pdf
    # return the ratio of the two integrals, scaled by Ms
    return v['Ms'] * quad(numerator, 0, inf, args=(x_data, params))[0] / quad(denominator, 0, inf, args=(params,))[0]
# vectorize
vcurve = np.vectorize(Mag, excluded=set([1]))
plt.plot(x_data, vcurve(x_data, params))
plt.scatter(x_data, y_data)
This plots the data and the fit equation with start parameters. I have an issue somewhere with units in the Langevin and have to multiply the numerator by 1e-16 to get the curve looking correct...
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit
def fit_function(params, x_data, y_data):
    model1 = vcurve(x_data, params)
    resid1 = y_data - model1
    return resid1
minner = Minimizer(fit_function, params, fcn_args=(x_data, y_data))
result = minner.minimize()
report_fit(result)
result.params.pretty_print()
Depending on the sigma (s) value I choose, which should be able to range from 0 to infinity, the integral won't converge, giving the following error:
/var/folders/pz/tbd_dths0_512bm6l43vpg680000gp/T/ipykernel_68003/1413445460.py:39: IntegrationWarning: The algorithm does not converge. Roundoff error is detected
in the extrapolation table. It is assumed that the requested tolerance
cannot be achieved, and that the returned result (if full_output = 1) is
the best which can be obtained.
return v['Ms'] * quad(numerator, 0, inf, args=(x_data, params))[0] / quad(denominator, 0, inf,args=(params))[0]
I'm stuck on why the fit isn't converging. Is this an issue because I'm using very small numbers or is this an issue with quad/lmfit? Thank you!
Having parameters that are closer to order 1 (say, between 1e-7 and 1e7) is a good idea. If you expect a parameter is in the 1.e-9 (or 1.e-16!) range, you could definitely scale it (in the fitting function) so that the value passed back and forth by the fitting algorithm is closer to order 1. But, I sort of doubt that is the main problem you are having.
It looks to me like your Mag function is not very sensitive to the values of your variable parameters Dm and s. I am not 100% sure why that is. Have you verified that calculations using your "Mag" or "vcurve" do what you expect them to do?
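As a minimal sketch of the rescaling idea in lmfit (the helper parameter Dm_nm is hypothetical, not part of the original code): let the optimizer vary the diameter in nanometres, which is of order 1, and derive Dm in metres from it through a constraint expression.

from lmfit import Parameters

params = Parameters()
# Vary the diameter in nanometres (order 1) instead of metres (order 1e-9);
# Dm itself is derived from the scaled parameter and is not varied directly.
params.add('Dm_nm', value=8.0, vary=True, min=0.1, max=100.0)   # hypothetical scaled parameter
params.add('Dm', expr='Dm_nm * 1e-9')                           # Dm in metres, tied to Dm_nm
params.add('s', value=0.4, vary=True, min=1e-3, max=10.0)
params.pretty_print()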
I've tried scipy.optimize's curve_fit, but its sigma argument only seems to weight by the data points. I want to apply a 1/Y^2 weighting based on the fitted curve during the fit (weighted least squares on the residuals). I'm not sure how to target yfit instead of ydata, or whether I should use something else. Any help would be appreciated.
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as spo

xdata = np.array([0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 1.12, 1.12, 1.12, 1.12, 1.12, 1.12, 2.89, 2.89, 2.89, 2.89,
2.89, 2.89, 6.19, 6.19, 6.19, 6.19, 6.19, 23.30, 23.30, 23.30, 23.30, 23.30, 23.30, 108.98, 108.98,
108.98, 108.98, 108.98, 255.33, 255.33, 255.33, 255.33, 255.33, 255.33, 1188.62, 1188.62, 1188.62,
1188.62, 1188.62], dtype=float)
ydata = np.array([0.264352, 0.412386, 0.231238, 0.483558, 0.613206, 0.728528, -1.15391, -1.46504, -0.942926,
-2.12808, -2.90962, -1.51093, -3.09798, -5.08591, -4.75703, -4.91317, -5.1966, -4.04019, -13.8455,
-16.9911, -11.0881, -10.6453, -15.1288, -52.4669, -68.2344, -74.7673, -70.2025, -65.8181, -55.7344,
-271.286, -329.521, -436.097, -654.034, -396.45, -826.195, -1084.43, -984.344, -1124.8, -1076.27,
-1072.03, -3968.22, -3114.46, -3771.61, -2805.4, -4078.05], dtype=float)
def fourPL(x, A, B, C, D):
    return ((A-D)/(1.0+((x/C)**B))) + D
params, params_covariance = spo.curve_fit(fourPL, xdata, ydata)
params_list = params
roundy = [round(num, 4) for num in params_list]
print(roundy)
popt2, pcov2 = spo.curve_fit(fourPL, xdata, ydata, sigma=1/ydata**2, absolute_sigma=True)
yfit2 = fourPL(xdata, *popt2)
params_list2 = popt2
roundy2 = [round(num, 4) for num in params_list2]
print(roundy2)
x_min, x_max = np.amin(xdata), np.amax(xdata)
xs = np.linspace(x_min, x_max, 1000)
plt.scatter(xdata, ydata)
plt.plot(xs, fourPL(xs, *params), 'm--', label='No Weight')
plt.plot(xs, fourPL(xs, *popt2), 'b--', label='Weights')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05, 0.1, 0.1),
ncol=3, fancybox=True, shadow=True)
plt.xlabel('µg/mL')
plt.ylabel('kHz/s')
#plt.xscale('log')
plt.show()
This would be my version using a manual least_squares fit. I compare it to the solution obtained with the simple curve_fit. Actually the difference is not very big and the curve_fit result looks good to me.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import least_squares
from scipy.optimize import curve_fit
np.set_printoptions( linewidth=250, precision=2 ) ## avoids OPs "roundy"
xdata = np.array(
[
0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 1.12, 1.12, 1.12, 1.12,
1.12, 1.12, 2.89, 2.89, 2.89, 2.89, 2.89, 2.89, 6.19, 6.19,
6.19, 6.19, 6.19, 23.30, 23.30, 23.30, 23.30, 23.30, 23.30,
108.98, 108.98, 108.98, 108.98, 108.98, 255.33, 255.33, 255.33,
255.33, 255.33, 255.33, 1188.62, 1188.62, 1188.62, 1188.62,
1188.62
], dtype=float
)
ydata = np.array(
[
0.264352, 0.412386, 0.231238, 0.483558, 0.613206, 0.728528,
-1.15391, -1.46504, -0.942926, -2.12808, -2.90962, -1.51093,
-3.09798, -5.08591, -4.75703, -4.91317, -5.1966, -4.04019,
-13.8455, -16.9911, -11.0881, -10.6453, -15.1288, -52.4669,
-68.2344, -74.7673, -70.2025, -65.8181, -55.7344, -271.286,
-329.521, -436.097, -654.034, -396.45, -826.195, -1084.43,
-984.344, -1124.8, -1076.27, -1072.03, -3968.22, -3114.46,
-3771.61, -2805.4, -4078.05
], dtype=float
)
def fourPL( x, a, b, c, d ):
    out = ( a - d ) / ( 1 + ( x / c )**b ) + d
    return out

def residuals( params, xlist, ylist ):
    # ~a, b, c, d = params
    yth = np.fromiter( ( fourPL( x, *params ) for x in xlist ), float )
    diff = np.subtract( yth, ylist )
    weights = 1 / np.abs( yth )**1
    ## not sure if this makes sense, but we weight with the function value;
    ## here it enters inverse linear, as it gets squared in the chi-square,
    ## but other weightings may be required
    return diff * weights
### for initial guess
xl = np.linspace( .1, 1200, 150 )
yl0 = np.fromiter( ( fourPL( x, 1, 1, 1000, -6000 ) for x in xl ), float )
### standard curve_fit
cfsol, cfcov = curve_fit( fourPL, xdata, ydata, p0=( 1, 1, 1000, -6000 ) )
print( cfsol )
yl1 = np.fromiter( ( fourPL( x, *cfsol ) for x in xl ), float )
### least squares with a manual residual function including the "unusual" weighting
lssol = least_squares( residuals, x0=( 1, 1, 1000, -6000 ), args=( xdata, ydata ) )
print( lssol.x )
yl2 = np.fromiter( ( fourPL( x, *( lssol.x ) ) for x in xl ), float )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xdata, ydata )
ax.plot( xl, yl1 )
ax.plot( xl, yl2 )
plt.show()
Looking at the covariance matrix (kept as cfcov above) not only reveals rather large errors, but also a quite high correlation between the parameters. This is due to the nature of the model, of course, but it should be handled with care, especially when it comes to data interpretation.
The problem becomes even more obvious when we only consider data for small x; the data for large x scatter a lot anyhow. For small x or large c the Taylor expansion is (a - d)(1 - b x / c) + d, which is a - (a - d) b x / c, i.e. basically a + e x. So b, c and d are essentially doing the same job.
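If you want to quantify that, the covariance kept as cfcov above can be normalized into a correlation matrix (a minimal sketch, assuming cfcov is available from the curve_fit call):

import numpy as np

perr = np.sqrt(np.diag(cfcov))          # one-sigma parameter uncertainties
corr = cfcov / np.outer(perr, perr)     # normalized correlation matrix
print(perr)
print(corr)                             # off-diagonal entries close to +/-1 indicate strong correlation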
I have what may be quite a basic question, but a quick googling was not able to solve it.
So I have some experimental data that I need to fit with an equation like
a * exp(-x/t)
and, in the case of needing more components, the expression is
a * exp(-x/t1) + b * exp(-x/t2) + ... + n * exp(-x/tn)
for n components.
Right now I have the following code
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.array([0.0001, 0.0004, 0.0006, 0.0008, 0.001, 0.0015, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8, 1, 1.5, 2, 4, 6, 8])
y1= np.array([5176350.00, 5144208.69, 4998297.04, 4787100.79, 4555731.93, 4030741.17, 3637802.79, 2949911.45, 2816472.26, 2831962.09, 2833262.53, 2815205.34, 2610685.14, 3581566.94, 1820610.74, 2100882.80, 1762737.50, 1558251.40, 997259.21, 977892.00, 518709.91, 309594.88, 186184.52])
y2 = np.array([441983.26, 423371.31, 399370.82, 390603.58, 378351.08, 356511.93, 349582.29, 346425.39, 351191.31, 329363.40, 325154.86, 352906.21, 333150.81, 301613.81, 94043.05, 100885.77, 86193.40, 75548.26, 27958.11, 20262.68, 27945.10])
def fitcurve(x, a, b, t1, t2):
    return a * np.exp(- x / t1) + b * np.exp(- x / t2)

y = y1   # fit the first data set (swap in y2 for the second)
popt, pcov = curve_fit(fitcurve, x, y)
print('a = ', popt[0], 'b = ', popt[1], 't1 = ', popt[2], 't2 = ', popt[3])
plt.plot(x, y, 'bo')
plt.plot(x, fitcurve(x, *popt))
Something important is that a + b + ... + n is equal to 1; basically, each coefficient is the percentage of its component. Ideally, I want to plot fits with 1, 2, 3 and 4 components and see which one provides the better fit.
I am afraid that your data cannot be fitted with a simple sum of exponential functions. Did you draw the points on a graph to see the shape of the curve?
This looks more like a function of the logistic kind (though not exactly logistic) than a sum of exponentials.
I could provide some advice on fitting a sum of exponentials (even with the condition on the sum of the coefficients), but that would be of no use with your data. Of course, if you have other data suited to fitting a sum of exponentials, I would be pleased to show how to proceed.
I am not going into the model-fitting procedure, but what you can do is accept a variable number of parameters (*args) and then try the fit for various numbers of exponentials. You can make use of NumPy's broadcasting to achieve this.
EDIT: you have to take care of the number of elements in *args; only even counts work now. I leave it up to you to edit that part (trivial).
Target
We want to fit $$\sum_{i=1}^{N} a_i \exp(-b_i x)$$ for a variable number of components $N$.
Implementation:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize, ndimage, interpolate
x = np.array([0.0001, 0.0004, 0.0006, 0.0008, 0.0010, 0.0015, 0.0020, 0.0040, 0.0060, 0.0080, 0.0100, 0.0500, 0.1000, 0.2000, 0.5000, 0.6000, 0.8000, 1.0000, 1.5000, 2.0000, 4.0000, 6.0000, 8.0000, 10.0000])
y = np.array([416312.6500, 387276.6400, 364153.7600, 350981.7000, 336813.8800, 314992.6100, 310430.4600, 318255.1700, 318487.1700, 291768.9700, 276617.3000, 305250.2100, 272001.3500, 260540.5600, 173677.1900, 155821.5500, 151502.9700, 83559.9000, 256097.3600, 20761.8400, 1.0000, 1.0000, 1.0000, 1.0000])
# variable args fit
def fitcurve(x, *args):
    args = np.array(args)
    half = len(args) // 2
    y = args[:half] * np.exp(-x[:, None] * args[half:])
    return y.sum(-1)

# data seems to contain an outlier?
# y = ndimage.median_filter(y, 5)
popt, pcov = optimize.curve_fit(fitcurve, x, y,
                                bounds=(0, np.inf),
                                p0=np.ones(6),   # set variable size
                                maxfev=1000,
                                )
fig, ax = plt.subplots()
ax.plot(x,y, 'bo')
# ax.set_yscale('log')
ax.set_xscale('symlog')
ax.plot(x,fitcurve(x, *popt))
fig.show()
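As a follow-up on the question's requirement that the coefficients sum to one, here is a hedged sketch of one way to impose it for two components: fit a single overall amplitude plus one fraction f, so the weights are f and 1 - f. The names two_exp_frac, A and f are illustrative, and the data are the x / y1 arrays from the question.

import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.0001, 0.0004, 0.0006, 0.0008, 0.001, 0.0015, 0.002, 0.004, 0.006,
              0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8, 1, 1.5, 2, 4, 6, 8])
y1 = np.array([5176350.00, 5144208.69, 4998297.04, 4787100.79, 4555731.93, 4030741.17,
               3637802.79, 2949911.45, 2816472.26, 2831962.09, 2833262.53, 2815205.34,
               2610685.14, 3581566.94, 1820610.74, 2100882.80, 1762737.50, 1558251.40,
               997259.21, 977892.00, 518709.91, 309594.88, 186184.52])

def two_exp_frac(x, A, f, t1, t2):
    # A is the total amplitude; f and (1 - f) are the component fractions, so they sum to 1
    return A * (f * np.exp(-x / t1) + (1 - f) * np.exp(-x / t2))

p0 = (y1[0], 0.5, 0.01, 2.0)                                  # rough starting guesses
bounds = ([0, 0, 1e-6, 1e-6], [np.inf, 1, np.inf, np.inf])    # keep f in [0, 1]
popt, pcov = curve_fit(two_exp_frac, x, y1, p0=p0, bounds=bounds)
print(popt)   # the two fractions are popt[1] and 1 - popt[1]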
I would like to interpolate between two lists, where the first one contains numbers and the second one contains arrays.
I tried using interp1d from scipy, but it did not work.
from scipy import interpolate
r = [2,3,4]
t = [5,6,7]
f = [r,t]
q = [10,20]
c = interpolate.interp1d(q, f)
I would like to get an array at, for example, the value 15, containing the values interpolated between the r and t arrays.
Error message:
ValueError: x and y arrays must be equal in length along interpolation axis.
In the simple example of the OP it does not make a difference whether one uses 1D or 2D interpolation. If more vectors come into play, however, it does make a difference. Here are both options, using numpy and taking care of floating-point types.
from scipy.interpolate import interp1d
from scipy.interpolate import interp2d
import numpy as np
r = np.array( [ 1, 1, 2], float )
s = np.array( [ 2, 3, 4], float )
t = np.array( [ 5, 6, 12], float ) # length of r, s, t, etc. must be equal
f = np.array( [ r, s, t ] )
q = np.array( [ 0, 10, 20 ], float ) # length of q equals the number of rows of f
def interpolate_my_array1D( x, xData, myArray ):
    out = myArray[0].copy()
    n = len( out )
    for i in range( n ):
        vec = myArray[ : , i ]
        func = interp1d( xData, vec )
        out[ i ] = func( x )
    return out

def interpolate_my_array2D( x, xData, myArray ):
    out = myArray[0].copy()
    n = len( out )
    xDataLoc = np.concatenate( [ [ xx ] * n for xx in xData ] )
    yDataLoc = np.array( list( range( n ) ) * len( xData ), float )
    zDataLoc = np.concatenate( myArray )
    func = interp2d( xDataLoc, yDataLoc, zDataLoc )
    out = np.fromiter( ( func( x, yy ) for yy in range( n ) ), float )
    return out

print( interpolate_my_array1D( 15., q, f ) )
print( interpolate_my_array2D( 15., q, f ) )
giving
>> [3.5 4.5 5.5]
>> [2.85135135 4.17567568 6.05405405]
Here is the link to the interp1d function in the SciPy documentation: interpolate SciPy.
From the docs you can see that the inputs should be NumPy arrays rather than plain lists of lists, and that interp1d interpolates along the last axis of y by default. With f stacked as rows of samples, the last axis has length 3 while q has length 2, which is exactly the "x and y arrays must be equal in length along interpolation axis" error; pass axis=0 instead.
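A minimal sketch of that fix, using the arrays from the question:

import numpy as np
from scipy import interpolate

r = [2, 3, 4]
t = [5, 6, 7]
f = np.array([r, t], dtype=float)        # shape (2, 3): one row per known point
q = [10, 20]

# interpolate along axis 0, i.e. along q, so len(q) must equal f.shape[0]
c = interpolate.interp1d(q, f, axis=0)
print(c(15))                             # -> [3.5 4.5 5.5], halfway between r and t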
I have the following data:
>>> x
array([ 3.08, 3.1 , 3.12, 3.14, 3.16, 3.18, 3.2 , 3.22, 3.24,
3.26, 3.28, 3.3 , 3.32, 3.34, 3.36, 3.38, 3.4 , 3.42,
3.44, 3.46, 3.48, 3.5 , 3.52, 3.54, 3.56, 3.58, 3.6 ,
3.62, 3.64, 3.66, 3.68])
>>> y
array([ 0.000857, 0.001182, 0.001619, 0.002113, 0.002702, 0.003351,
0.004062, 0.004754, 0.00546 , 0.006183, 0.006816, 0.007362,
0.007844, 0.008207, 0.008474, 0.008541, 0.008539, 0.008445,
0.008251, 0.007974, 0.007608, 0.007193, 0.006752, 0.006269,
0.005799, 0.005302, 0.004822, 0.004339, 0.00391 , 0.003481,
0.003095])
Now, I want to fit these data with, say, a 4 degree polynomial. So I do:
>>> coefs = np.polynomial.polynomial.polyfit(x, y, 4)
>>> ffit = np.poly1d(coefs)
Now I create a new grid for x values to evaluate the fitting function ffit:
>>> x_new = np.linspace(x[0], x[-1], num=len(x)*10)
When I do all the plotting (data set and fitting curve) with the command:
>>> fig1 = plt.figure()
>>> ax1 = fig1.add_subplot(111)
>>> ax1.scatter(x, y, facecolors='None')
>>> ax1.plot(x_new, ffit(x_new))
>>> plt.show()
I get the following plot (fitting_data.png), in which the fitted curve does not follow the data.
What I expect is the fitting function to fit correctly (at least near the maximum value of the data). What am I doing wrong?
Unfortunately, np.polynomial.polynomial.polyfit returns the coefficients in the opposite order from np.polyfit and np.polyval (or np.poly1d, as you used). To illustrate:
In [40]: np.polynomial.polynomial.polyfit(x, y, 4)
Out[40]:
array([ 84.29340848, -100.53595376, 44.83281408, -8.85931101,
0.65459882])
In [41]: np.polyfit(x, y, 4)
Out[41]:
array([ 0.65459882, -8.859311 , 44.83281407, -100.53595375,
84.29340846])
In general: np.polynomial.polynomial.polyfit returns coefficients [A, B, C] to A + Bx + Cx^2 + ..., while np.polyfit returns: ... + Ax^2 + Bx + C.
So if you want to use this combination of functions, you must reverse the order of coefficients, as in:
ffit = np.polyval(coefs[::-1], x_new)
However, the documentation states clearly to avoid np.polyfit, np.polyval, and np.poly1d, and instead to use only the new(er) package.
You're safest to use only the polynomial package:
import numpy.polynomial.polynomial as poly
coefs = poly.polyfit(x, y, 4)
ffit = poly.polyval(x_new, coefs)
plt.plot(x_new, ffit)
Or, to create the polynomial function:
ffit = poly.Polynomial(coefs) # instead of np.poly1d
plt.plot(x_new, ffit(x_new))
Note that you can use the Polynomial class directly to do the fitting and return a Polynomial instance.
from numpy.polynomial import Polynomial
p = Polynomial.fit(x, y, 4)
plt.plot(*p.linspace())
p uses scaled and shifted x values for numerical stability. If you need the usual form of the coefficients, you will need to follow with
pnormal = p.convert(domain=(-1, 1))
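As a quick sanity check of that conversion, here is a small sketch with synthetic data (the quadratic below is purely illustrative): after convert(), the coefficients agree with poly.polyfit, both lowest order first.

import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as poly

# Synthetic quadratic data, purely for illustration
x = np.linspace(3.08, 3.68, 31)
y = 0.5 - 8.0 * x + 4.0 * x**2

p = Polynomial.fit(x, y, 2)              # fitted in the scaled/shifted domain
pnormal = p.convert(domain=(-1, 1))      # coefficients in terms of plain x
print(pnormal.coef)                      # approximately [0.5, -8.0, 4.0]
print(poly.polyfit(x, y, 2))             # same values, same (lowest-order-first) ordering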