Is there a Python library for multivariate interpolation? Right now I have three independent variables and one dependent variable. My data looks like this:
X1=[3,3,3.1,3.1,4.2,5.2,6.3,2.3,7.4,8.4,5.4,3.4,3.4,3.4,...]
X2=[12.1,12.7,18.5,18.3,18.4,18.6,24.2,24.4,24.3,24.5,30.9,30.7,30.3,30.4,6.1,6.2,...]
X3=[0.3,9.2,0.3,9.4,0.1,9.8,0.4,9.3,0.7,9.7,18.3,27.4,0.6,9.44,...]
Y=[-5.890,-5.894,2.888,-3.8706,2.1516,-2.7334,1.4723,-2.1049,0.9167,-1.7281,-2.091,-6.7394,0.8777,-1.7046,...]
and len(X1)=len(X2)=len(X3)=len(Y)=400
I want to fit or interpolate the data so that, given arbitrary x1, x2, x3 values, the function f(x1,x2,x3) yields an estimated y value. For example, given x1=4.11, x2=10.34, and x3=10.78, the function would yield -8.7567 (best estimate). I'd imagine the function will be polynomial, so maybe a spline interpolation is the best option here?
curve_fit in scipy.optimize works. In the code below the model is a linear function, but a better one might exist.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
X1=[3,3,3.1,3.1,4.2,5.2,6.3,2.3,7.4,8.4,5.4,3.4,3.4,3.4]
X2=[12.1,12.7,18.5,18.3,18.4,18.6,24.2,24.4,24.3,24.5,30.9,30.7,30.3,30.4]
X3=[0.3,9.2,0.3,9.4,0.1,9.8,0.4,9.3,0.7,9.7,18.3,27.4,0.6,9.44]
Y=[-5.890,-5.894,2.888,-3.8706,2.1516,-2.7334,1.4723,-2.1049,0.9167,-1.7281,-2.091,-6.7394,0.8777,-1.7046]
def fitFunc(x, a, b, c, d):
    return a + b*x[0] + c*x[1] + d*x[2]
fitParams, fitCovariances = curve_fit(fitFunc, [X1, X2, X3], Y)
print(' fit coefficients:\n', fitParams)
# fit coefficients:
# [-6.11934208 0.21643939 0.26186705 -0.33794415]
Then fitParams[0] + fitParams[1] * x1 + fitParams[2] * x2 + fitParams[3] * x3 is the estimated y.
# get single y
def estimate(x1, x2, x3):
    return fitParams[0] + fitParams[1] * x1 + fitParams[2] * x2 + fitParams[3] * x3
Compare the result with the original y:
Y_estimated = [estimate(X1[i], X2[i], X3[i]) for i in range(len(X1))]
fig, ax = plt.subplots()
ax.scatter(Y, Y_estimated)
lims = [
np.min([ax.get_xlim(), ax.get_ylim()]), # min of both axes
np.max([ax.get_xlim(), ax.get_ylim()]), # max of both axes
]
ax.set_xlabel('Y')
ax.set_ylabel('Y_estimated')
ax.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
ax.set_aspect('equal')
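If true interpolation of the scattered data is preferred over a global linear fit, scipy.interpolate handles arbitrary point clouds directly. A minimal sketch, assuming the full 400-point X1, X2, X3, Y lists from the question are available:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
# stack the three independent variables into an (n, 3) point cloud
points = np.column_stack([X1, X2, X3])
interp = LinearNDInterpolator(points, Y)
print(interp(4.11, 10.34, 10.78))  # returns nan outside the convex hull of the data
scipy.interpolate.RBFInterpolator is a smoother alternative; it takes an (m, 3) array of query points instead of separate coordinates.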
References: scipy, stackoverflow-multifit, stackoverflow-plot xy
Related
I just used scipy's odeint to solve a system of differential equations, and matplotlib to plot it. I got the graphs. My question is: can I get specific data points, e.g. the values of x1, x2, x3 at t = 1? I need the concentration values at t = 1, 2, 3, 4, ... Thank you.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
Dose = 100
V = 43.8
k12 = 1.2 # rate of central -> peripheral
k21 = 1.4 # rate of peripheral -> central
kel = 0.20 # rate of excrete from plasma
def diff(d_list, t):
    x1, x2, x3 = d_list  # X1(t), X2(t), X3(t)
    return np.array([(-k12*x1 - kel*x1 + k21*x2),
                     (k12*x1 - k21*x2),
                     (kel*x1)])
t = np.linspace(0, 24, 960)
result = odeint(diff, [(Dose/V), 0, 0], t)
plt.plot(t, result[:, 0], label='x1: central')
plt.plot(t, result[:, 1], label='x2: tissue')
plt.plot(t, result[:, 2], label='x3: excreted')
plt.legend()
plt.xlabel('t (hr)')
plt.ylabel('Concentration (mg/L)')
plt.show()
This is not related to matplotlib or scipy. You can either interpolate or get the closest data point.
Interpolated value
If you need to get the x1, x2 and x3 for values of t which do not correspond to a data point (you mentioned 1,2,3,4 which are not in your t array), you will need to interpolate. To get x1, x2 and x3 at t=1, you can do (at the end of your script):
valuesAt1 = [np.interp(1, t, result[:,col]) for col in range(result.shape[1])]
The output of print(valuesAt1) is then:
[1.1059703843218311, 0.8813129004034452, 0.2958217381057726]
If you only need x1, just do
valuesAt1 = np.interp(1, t, result[:,0])
then, the output of print(valuesAt1) is:
1.1059703843218311
Closest data point
If you do not want to interpolate but want the values of x1, x2 and x3 at the element of the t array closest to 1, do:
valuesAtClosestPointFrom1 = result[np.argmin(np.abs(t - 1))]
The output from print(valuesAtClosestPointFrom1) is:
[1.10563546 0.88141641 0.29605315]
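A third option keeps everything inside the question's own code: odeint evaluates the solution at whatever time points you pass it, so you can simply include the times of interest in t. A sketch, reusing diff and the initial conditions from the question:
# ask odeint for the solution exactly at t = 0, 1, ..., 24
t_wanted = np.arange(0, 25)
result_wanted = odeint(diff, [Dose/V, 0, 0], t_wanted)
print(result_wanted[1])  # x1, x2, x3 at t = 1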
This can also be done by interpolation, using scipy.interpolate.InterpolatedUnivariateSpline, as follows:
from scipy.interpolate import InterpolatedUnivariateSpline
splx1 = InterpolatedUnivariateSpline(t, result[:,0])
splx2 = InterpolatedUnivariateSpline(t, result[:,1])
splx3 = InterpolatedUnivariateSpline(t, result[:,2])
First, pass the x and y data you want to interpolate. Second, create an array of the x values at which you want the interpolated y values.
import numpy as np
desired_time = np.arange(1,25)
x1 = splx1(desired_time)
x2 = splx2(desired_time)
x3 = splx3(desired_time)
Lastly, pass it to the respective spline object to get the desired values. In the example above, a desired_time array from 1 to 24 is created with np.arange and passed to the three spline objects.
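To collect the three interpolated series into one table, a small usage sketch:
# one row per desired time, with columns x1, x2, x3
values = np.column_stack([x1, x2, x3])
print(values[0])  # x1, x2, x3 at t = 1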
I have an original curve and am developing a model curve that should match it closely. Everything runs, but the two curves do not match. How can I control the curvature of my model curve? The code below is based on the answer here.
My code:
def curve_line(point1, point2):
    a = (point2[1] - point1[1])/(np.cosh(point2[0]) - np.cosh(point1[0]))
    b = point1[1] - a*np.sinh(point1[0])
    x = np.linspace(point1[0], point2[0], 100).tolist()
    y = (a*np.cosh(x) + b).tolist()
    return x, y
###### A sample of my code is given below
point1 = [10,100]
point2 = [20,50]
x,y = curve_line(point1, point2)
plt.plot(point1[0], point1[1], 'o')
plt.plot(point2[0], point2[1], 'o')
plt.plot(x,y) ## len(x)
My present output: [plot omitted]
I tried the following function as well:
y = 50*np.exp(-x/10) + 2.5
The output is: [plot omitted]
Instead of just guessing the right parameters of your model function, you can fit a model curve to your data using curve_fit.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([ 1.92, 14.35, 21.50, 25.27, 27.34, 30.32, 32.31, 34.09, 34.21])
y = np.array([8.30, 8.26, 8.13, 7.49, 6.66, 4.59, 2.66, 0.60, 0.06])
def fun(x, a, b, c):
    return a * np.cosh(b * x) + c
coef,_ = curve_fit(fun, x, y)
plt.plot(x, y, label='Original curve')
plt.plot(x, fun(x, *coef), label='Model: %5.3f cosh(%4.2f x) + %4.2f' % tuple(coef))
plt.legend()
plt.show()
If it is important that the start and end points are closely fitted, you can pass uncertainties to curve_fit, adjusting them to lower values towards the ends, e.g. by
s = np.ones(len(x))
s[1:-1] = s[1:-1] * 3
coef,_ = curve_fit(fun, x, y, sigma=s)
Your other approach, a * np.exp(b * x) + c, will also work and gives -0.006 exp(0.21 x) + 8.49.
In some cases you'll have to provide an educated guess for the initial values of the coefficients to curve_fit (it uses 1 as default).
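To illustrate that remark with the exponential model from the question (the p0 values here are illustrative assumptions, seeded near the coefficients quoted above):
def fun_exp(x, a, b, c):
    return a * np.exp(b * x) + c

# without p0, curve_fit starts from all ones and may fail to converge here
coef_exp, _ = curve_fit(fun_exp, x, y, p0=(-0.01, 0.2, 8.0))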
OK, I have a function which uses a range of parameters to calculate the effect on two separate variables over time. These variables have already been curve-matched to some existing data to minimize the variation (shown below)
I want to be able to check the previous working, and match new data. I have been trying to use the scipy.optimize.curve_fit function, by stacking the x and y data resulting from my function (as suggested here: fit multiple parametric curves with scipy).
It may not be the right method, or I may just be misunderstanding it, but my code keeps running into a type error: TypeError: Improper input: N=3 must not exceed M=2.
My simplified prototype code was initially taken from here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    result = ([], [])
    for i in x:
        # set up 2 example curves
        result[0].append(a * np.exp(-b * i) + c)
        result[1].append(a * np.exp(-b * i) + c**2)
    return result  # as a tuple containing 2 lists
#Define the data to be fit with some noise:
xdata = list(np.arange(0, 10, 1))
y = func(xdata, 2.5, 5, 0.5)[0]
y2 = func(xdata, 1, 1, 2)[1]
#Add some noise
y_noise = 0.1 * np.random.normal(size=len(xdata))
y2_noise = 0.1 * np.random.normal(size=len(xdata))
ydata=[]
ydata2=[]
for i in range(len(y)):  # clunky
    ydata.append(y[i] + y_noise[i])
    ydata2.append(y2[i] + y2_noise[i])
plt.scatter(xdata, ydata, label='data')
plt.scatter(xdata, ydata2, label='data2')
#plt.plot(xdata, y, 'k-', label='data (original function)')
#plt.plot(xdata, y2, 'k-', label='data2 (original function)')
#stack the data
xdat = xdata+xdata
ydat = ydata+ydata2
popt, pcov = curve_fit(func, xdat, ydat)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Any help much appreciated!
Here is graphing example code that fits two different equations with a single shared parameter; if this looks like what you need, it can easily be adapted to your specific problem.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y1 = np.array([ 16.00, 18.42, 20.84, 23.26])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)
x1 = np.array([5.0, 6.1, 7.2, 8.3])
x2 = np.array([15.0, 16.1, 17.2, 18.3, 19.4])
comboX = np.append(x1, x2)
if len(y1) != len(x1):
    raise Exception('Unequal x1 and y1 data length')
if len(y2) != len(x2):
    raise Exception('Unequal x2 and y2 data length')

def function1(data, a, b, c):  # not all parameters are used here, c is shared
    return a * data + c

def function2(data, a, b, c):  # not all parameters are used here, c is shared
    return b * data + c

def combinedFunction(comboData, a, b, c):
    # single data reference passed in, extract separate data
    extract1 = comboData[:len(x1)]  # first data
    extract2 = comboData[len(x1):]  # second data
    result1 = function1(extract1, a, b, c)
    result2 = function2(extract2, a, b, c)
    return np.append(result1, result2)
# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])
# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(combinedFunction, comboX, comboY, initialParameters)
# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = function1(x1, a, b, c) # first data set, first equation
y_fit_2 = function2(x2, a, b, c) # second data set, second equation
plt.plot(comboX, comboY, 'D') # plot the raw data
plt.plot(x1, y_fit_1) # plot the equation using the fitted parameters
plt.plot(x2, y_fit_2) # plot the equation using the fitted parameters
plt.show()
print('a, b, c:', fittedParameters)
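Adapting this to the question's two exponential curves only means swapping the two model functions; a sketch, assuming all three parameters are shared between the curves as in the question's func:
def function1(data, a, b, c):
    return a * np.exp(-b * data) + c

def function2(data, a, b, c):
    return a * np.exp(-b * data) + c**2
Because combinedFunction returns one flat array, curve_fit sees one residual per data point rather than the M=2 it derives from a 2-tuple return, which is what produced the TypeError in the question.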
Suppose I have x and y vectors with a weight vector wgt. I can fit a cubic curve (y = a x^3 + b x^2 + c x + d) by using np.polyfit as follows:
y_fit = np.polyfit(x, y, deg=3, w=wgt)
Now, suppose I want to do another fit, but this time, I want the fit to pass through 0 (i.e. y = a x^3 + b x^2 + c x, d = 0), how can I specify a particular coefficient (i.e. d in this case) to be zero?
Thanks
You can try something like the following:
Import curve_fit from scipy, i.e.
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
Define the curve fitting function. In your case,
def fit_func(x, a, b, c):
    # Curve fitting function
    return a * x**3 + b * x**2 + c * x  # d=0 is implied
Perform the curve fitting,
# Curve fitting
params = curve_fit(fit_func, x, y)
[a, b, c] = params[0]
x_fit = np.linspace(x[0], x[-1], 100)
y_fit = a * x_fit**3 + b * x_fit**2 + c * x_fit
Plot the results if you please,
plt.plot(x, y, '.r') # Data
plt.plot(x_fit, y_fit, 'k') # Fitted curve
This does not answer the question in the strict sense of making numpy's polyfit pass through the origin, but it solves the underlying problem.
Hope someone finds it useful :)
You can use np.linalg.lstsq and construct your coefficient matrix manually. To start, I'll create the example data x and y, and the "exact fit" y0:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
y0 = 0.07 * x ** 3 + 0.3 * x ** 2 + 1.1 * x
y = y0 + 1000 * np.random.randn(x.shape[0])
Now I'll create a full cubic polynomial 'training' or 'independent variable' matrix that includes the constant d column.
XX = np.vstack((x ** 3, x ** 2, x, np.ones_like(x))).T
Let's see what I get if I compute the fit with this dataset and compare it to polyfit:
p_all = np.linalg.lstsq(XX, y, rcond=None)[0]
pp = np.polyfit(x, y, 3)
print(np.isclose(pp, p_all).all())
# Returns True
Where I've used np.isclose because the two algorithms do produce very small differences.
You're probably thinking 'that's nice, but I still haven't answered the question'. From here, forcing the fit to have a zero offset is the same as dropping the np.ones column from the array:
p_no_offset = np.linalg.lstsq(XX[:, :-1], y, rcond=None)[0]  # use [0] to just grab the coefs
Ok, let's see what this fit looks like compared to our data:
y_fit = np.dot(p_no_offset, XX[:, :-1].T)
plt.plot(x, y0, 'k-', linewidth=3)
plt.plot(x, y_fit, 'y--', linewidth=2)
plt.plot(x, y, 'r.', ms=5)
This gives the following figure: [plot omitted]
WARNING: When using this method on data that does not actually pass through (x,y)=(0,0), you will bias your estimates of the output solution coefficients (p), because lstsq will be trying to compensate for the fact that there is an offset in your data. Sort of a 'square peg, round hole' problem.
Furthermore, you could also fit your data to the cubic term only by doing:
p_ = np.linalg.lstsq(XX[:, :1], y, rcond=None)[0]
Here again the warning above applies. If your data contains quadratic, linear or constant terms the estimate of the cubic coefficient will be biased. There can be times when - for numerical algorithms - this sort of thing is useful, but for statistical purposes my understanding is that it is important to include all of the lower terms. If tests turn out to show that the lower terms are not statistically different from zero that's fine, but for safety's sake you should probably leave them in when you estimate your cubic.
Best of luck!
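One loose end from the original question: the weight vector wgt. np.polyfit's w multiplies the residuals, so the same weighting can be reproduced with lstsq by scaling each row of the design matrix and of y. A sketch, assuming wgt follows polyfit's convention:
wgt = np.random.rand(x.shape[0]) + 0.5  # stand-in for the question's weight vector
p_w = np.linalg.lstsq(XX * wgt[:, None], y * wgt, rcond=None)[0]
print(np.isclose(p_w, np.polyfit(x, y, 3, w=wgt)).all())  # should print True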
Python's curve_fit calculates the best-fit parameters for a function with a single independent variable, but is there a way, using curve_fit or something else, to fit for a function with multiple independent variables? For example:
def func(x, y, a, b, c):
    return log(a) + b*log(x) + c*log(y)
where x and y are the independent variable and we would like to fit for a, b, and c.
You can pass curve_fit a multi-dimensional array for the independent variables, but then your func must accept the same thing. For example, calling this array X and unpacking it to x, y for clarity:
import numpy as np
from scipy.optimize import curve_fit
def func(X, a, b, c):
    x, y = X
    return np.log(a) + b*np.log(x) + c*np.log(y)
# some artificially noisy data to fit
x = np.linspace(0.1,1.1,101)
y = np.linspace(1.,2., 101)
a, b, c = 10., 4., 6.
z = func((x,y), a, b, c) * 1 + np.random.random(101) / 100
# initial guesses for a,b,c:
p0 = 8., 2., 7.
print(curve_fit(func, (x,y), z, p0))
Gives the fit:
(array([ 9.99933937, 3.99710083, 6.00875164]), array([[ 1.75295644e-03, 9.34724308e-05, -2.90150983e-04],
[ 9.34724308e-05, 5.09079478e-06, -1.53939905e-05],
[ -2.90150983e-04, -1.53939905e-05, 4.84935731e-05]]))
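To keep the fitted parameters around for prediction instead of only printing them, a small usage sketch:
popt, pcov = curve_fit(func, (x, y), z, p0)
print(func((0.5, 1.5), *popt))  # model estimate at x=0.5, y=1.5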
Optimizing a function with multiple input dimensions and a variable number of parameters
This example shows how to fit a polynomial with a two-dimensional input (R^2 -> R) by an increasing number of coefficients. The design is flexible: the callable f passed to curve_fit is defined once for any number of non-keyword arguments.
Minimal reproducible example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def poly2d(xy, *coefficients):
    x = xy[:, 0]
    y = xy[:, 1]
    proj = x + y
    res = 0
    for order, coef in enumerate(coefficients):
        res += coef * proj ** order
    return res
nx = 31
ny = 21
range_x = [-1.5, 1.5]
range_y = [-1, 1]
target_coefficients = (3, 0, -19, 7)
xs = np.linspace(*range_x, nx)
ys = np.linspace(*range_y, ny)
im_x, im_y = np.meshgrid(xs, ys)
xdata = np.c_[im_x.flatten(), im_y.flatten()]
im_target = poly2d(xdata, *target_coefficients).reshape(ny, nx)
fig, axs = plt.subplots(2, 3, figsize=(29.7, 21))
axs = axs.flatten()
ax = axs[0]
ax.set_title('Unknown polynomial P(x+y)\n[secret coefficients: ' + str(target_coefficients) + ']')
sm = ax.imshow(
    im_target,
    cmap=plt.get_cmap('coolwarm'),
    origin='lower'
)
fig.colorbar(sm, ax=ax)
for order in range(5):
    ydata = im_target.flatten()
    # p0 fixes the number of coefficients: with a *args model, curve_fit
    # cannot infer it, so len(p0) sets the polynomial order
    popt, pcov = curve_fit(poly2d, xdata=xdata, ydata=ydata, p0=[0]*(order+1))
    im_fit = poly2d(xdata, *popt).reshape(ny, nx)
    ax = axs[1+order]
    title = 'Fit O({:d}):'.format(order)
    for o, p in enumerate(popt):
        if o % 2 == 0:
            title += '\n'
        if o == 0:
            title += ' {:=-{w}.1f} (x+y)^{:d}'.format(p, o, w=int(np.log10(max(abs(p), 1))) + 5)
        else:
            title += ' {:=+{w}.1f} (x+y)^{:d}'.format(p, o, w=int(np.log10(max(abs(p), 1))) + 5)
    title += '\nrms: {:.1f}'.format(np.mean((im_fit - im_target)**2)**.5)
    ax.set_title(title)
    sm = ax.imshow(
        im_fit,
        cmap=plt.get_cmap('coolwarm'),
        origin='lower'
    )
    fig.colorbar(sm, ax=ax)
for ax in axs.flatten():
    ax.set_xlabel('x')
    ax.set_ylabel('y')
plt.show()
P.S. The concept of this answer is identical to my other answer here, but the code example is considerably clearer. In due time, I will delete the other answer.
Fitting to an unknown number of parameters
In this example, we try to reproduce some measured data measData.
In this example, measData is generated by the function measuredData(x, a=.2, b=-2, c=-.8, d=.1). In practice, we might have measured measData in some way, so we have no idea how it is described mathematically. Hence the fit.
We fit with a polynomial, described by the function polynomFit(inp, *args). As we want to try out different polynomial orders, it is important to be flexible in the number of parameters.
The independent variables (x and y in your case) are encoded in the 'columns'/second dimension of inp.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def measuredData(inp, a=.2, b=-2, c=-.8, d=.1):
    x = inp[:, 0]
    y = inp[:, 1]
    return a + b*x + c*x**2 + d*x**3 + y

def polynomFit(inp, *args):
    x = inp[:, 0]
    y = inp[:, 1]
    res = 0
    for order in range(len(args)):
        res += args[order] * x**order
    return res + y
inpData=np.linspace(0,10,20).reshape(-1,2)
inpDataStr=['({:.1f},{:.1f})'.format(a,b) for a,b in inpData]
measData=measuredData(inpData)
fig, ax = plt.subplots()
ax.plot(np.arange(inpData.shape[0]), measData, label='measured', marker='o', linestyle='none')
for order in range(5):
    popt, pcov = curve_fit(polynomFit, xdata=inpData, ydata=measData, p0=[0]*(order+1))
    fitData = polynomFit(inpData, *popt)
    ax.plot(np.arange(inpData.shape[0]), fitData, label='polyn. fit, order '+str(order), linestyle='--')
    ax.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
    print(order, popt)
ax.set_xticklabels(inpDataStr, rotation=90)
Result: [plot omitted]
Yes, we can pass multiple variables to curve_fit. I have written a piece of code:
import numpy as np
from scipy.optimize import curve_fit
x = np.random.randn(2,100)
w = np.array([1.5,0.5]).reshape(1,2)
esp = np.random.randn(1,100)
y = np.dot(w,x)+esp
y = y.reshape(100,)
In the above code I have generated x, a 2D data set of shape (2, 100), i.e. two variables with 100 data points each. I have fit the dependent variable y to the independent variables x with some noise.
def model_func(x, w1, w2, b):
    w = np.array([w1, w2]).reshape(1, 2)
    b = np.array([b]).reshape(1, 1)
    y_p = np.dot(w, x) + b
    return y_p.reshape(100,)
We have defined a model function that establishes relation between y & x.
Note: the output of the model function (the predicted y) must have shape (number of data points,).
popt, pcov = curve_fit(model_func,x,y)
popt is a 1D numpy array containing the fitted parameters; in our case there are 3 of them.
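A quick check of the recovered parameters against the generating weights, as a usage sketch:
print(popt)  # should be close to [1.5, 0.5, 0.0]
y_pred = model_func(x, *popt)
print(np.mean((y - y_pred)**2))  # residual mean squared error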
Yes, there is: simply give curve_fit a multi-dimensional array for xData.