Doing many iterations of scipy's `curve_fit` in one go - python

Consider the following MWE
import numpy as np
from scipy.optimize import curve_fit

X = np.arange(1, 10, 1)
Y = abs(X + np.random.randn(15, 9))

def linear(x, a, b):
    return (x/b)**a

coeffs = []
for ix in range(Y.shape[0]):
    print(ix)
    c0, pcov = curve_fit(linear, X, Y[ix])
    coeffs.append(c0)
I have a problem where I have to do basically that, but instead of 15 iterations it's thousands and it's pretty slow.
Is there any way to do all of those iterations at once with curve_fit? I know the result from the function is supposed to be a 1D-array, so just passing the args like this
c0, pcov = curve_fit(linear, X, Y)
is not going to work. Also I think the answer has to be in flattening Y, so I can get a flattened result, but I just can't get anything to work.
EDIT
I know that if I do something like
XX = np.tile(X, Y.shape[0])
c0, pcov = curve_fit(linear, XX, Y.flatten())
then I get a "mean" value of the coefficients, but that's not what I want.
EDIT 2
For the record, I solved it using Jacques Kvam's set-up, but implemented with NumPy (because of a limitation):
lX = np.log(X)
lY = np.log(Y)
A = np.vstack([lX, np.ones(len(lX))]).T
m, c = np.linalg.lstsq(A, lY.T)[0]
Then m is a, and b is recovered as:
b = np.exp(-c/m)
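For context on why this works: taking logs of y = (x/b)**a gives log y = a*log x - a*log b, which is linear in log x, so the slope m is a and the intercept c is -a*log b, hence b = exp(-c/m). A minimal self-contained sketch of this vectorized log-log fit (illustrative data shapes, assuming all Y values are strictly positive so the log is defined) might look like:

import numpy as np

# Illustrative data in the same shape as the MWE: many rows of Y, one shared X
X = np.arange(1, 10, 1)
Y = np.abs(X + np.random.randn(1000, 9))

lX = np.log(X)
lY = np.log(Y)

# One least-squares solve handles every row at once:
# each column of lY.T is one target vector
A = np.vstack([lX, np.ones_like(lX)]).T
(m, c), *_ = np.linalg.lstsq(A, lY.T, rcond=None)

a = m                 # fitted exponent, one value per row of Y
b = np.exp(-c / m)    # fitted divisor, one value per row of Y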

Least squares won't give the same result because the noise is transformed by log in this case. If the noise is zero, both methods give the same result.
import numpy as np
from numpy import random as rng
from scipy.optimize import curve_fit
rng.seed(0)
X=np.arange(1,7)
Y = np.zeros((4, 6))
for i in range(4):
    b = a = i + 1
    Y[i] = (X/b)**a + 0.01 * rng.randn(6)

def linear(x, a, b):
    return (x/b)**a
coeffs=[]
for ix in range(Y.shape[0]):
    print(ix)
    c0, pcov = curve_fit(linear, X, Y[ix])
    coeffs.append(c0)
coeffs is
[array([ 0.99309127, 0.98742861]),
array([ 2.00197613, 2.00082722]),
array([ 2.99130237, 2.99390585]),
array([ 3.99644048, 3.9992937 ])]
I'll use scikit-learn's implementation of linear regression since I believe that scales well.
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
Take logs of X and Y
lX = np.log(X)[None, :]
lY = np.log(Y)
Now fit and check that the coefficients are the same as before.
lr.fit(lX.T, lY.T)
lr.coef_
Which gives similar exponents:
array([ 0.98613517, 1.98643974, 2.96602892, 4.01718514])
Now check the divisor.
np.exp(-lr.intercept_ / lr.coef_.ravel())
Which gives similar divisors, though you can see the two methods diverging somewhat in their answers:
array([ 0.99199406, 1.98234916, 2.90677142, 3.73416501])

It might be useful in some situations to have the best fit parameters as a numpy array for further calculations. One can add the following after the for loop:
bestfit_par = np.asarray(coeffs)
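For example (a small usage sketch, assuming bestfit_par from the line above), the two columns then give all fitted exponents and all fitted divisors at once:

all_a = bestfit_par[:, 0]   # fitted exponents a, one per row of Y
all_b = bestfit_par[:, 1]   # fitted divisors b, one per row of Y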

Related

Why is scipy.optimize.curve_fit not producing a line of best fit for my points?

I am trying to plot several datasets for repeat R-T measurements and fit a cubic root line of best fit through each dataset using scipy.optimize.curve_fit.
My code produces a line for each dataset, but not a cubic root line of best fit. Each dataset is colour-coded to its corresponding line of best fit:
I've tried increasing the order of magnitude of my data, as I heard that sometimes scipy.optimize.curve_fit doesn't like very small numbers, but this made no change. If anyone could point out where I am going wrong I would be extremely grateful:
import numpy as np
from scipy.optimize import curve_fit
import scipy.optimize as scpo
import matplotlib.pyplot as plt
files = [ '50mA30%set1.lvm','50mA30%set3.lvm', '50mA30%set4.lvm',
'50mA30%set5.lvm']
for file in files:
    data = np.loadtxt(file)

    current_YBCO = data[:, 1]
    voltage_YBCO = data[:, 2]
    current_thermometer = data[:, 3]
    voltage_thermometer = data[:, 4]
    T = data[:, 5]

    R = voltage_thermometer / current_thermometer

    p = np.polyfit(R, T, 4)
    T_fit = p[0]*R**4 + p[1]*R**3 + p[2]*R**2 + p[3]*R + p[4]

    y = voltage_YBCO / current_YBCO

    def test(T_fit, a, b, c):
        return a * (T_fit + b)**(1/3) + c

    param, param_cov = curve_fit(test, np.array(T_fit), np.array(y),
                                 maxfev=100000)
    ans = param[0]*(np.array(T_fit) + param[1])**(1/3) + param[2]

    plt.scatter(T_fit, y, 0.5)
    plt.plot(T_fit, ans, '--', label="optimized data")
    plt.xlabel("YBCO temperature (K)")
    plt.ylabel("Resistance of YBCO (Ohms)")
    plt.xlim(97, 102)
    plt.ylim(-.00025, 0.00015)
Two things are making this harder for you.
First, cube roots of negative numbers for numpy arrays. If you try this you'll see that you aren't getting the result you want:
x = np.array([-8, 0, 8])
x**(1/3) # array([nan, 0., 2.])
This means that your test function is going to have a problem any time it gets a negative value, and you need the negative values to create the left hand side of the curves. Instead, use np.cbrt
x = np.array([-8, 0, 8])
np.cbrt(x) # array([-2., 0., 2.])
Secondly, your function is
def test(T_fit, a, b, c):
    return a * (T_fit + b)**(1/3) + c
Unfortunately, this just doesn't look very much like the graph you show. This makes it really hard for the optimisation to find a "good" fit. Things I particularly dislike about this function are
- it goes vertical at T_fit == -b, whereas your data has a definite slope at that point;
- it keeps growing quite strongly away from T_fit == -b, whereas your data goes horizontal.
However, it is sometimes possible to get a more "sensible fit" by giving the optimisation a good starting point.
You haven't given us any data to work from, which makes this much harder. So, by way of illustration, try this:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
fig, ax = plt.subplots(1)
# Generate some data which looks a bit like yours
x = np.linspace(95, 105, 1001)
y = 0.001 * (-0.5 + 1/(1 + np.exp((100-x)/0.5)) + 0.125 * np.random.rand(len(x)))
# A fitting function
def fit(x, a, b, c):
    return a * np.cbrt((x + b)) + c
# Perform the fitting
param, param_cov = scipy.optimize.curve_fit(fit, x, y, p0=[0.004, -100, 0.001], maxfev=100000)
# Calculate the fitted data:
yf = fit(x, *param)
print(param)
# Plot data and the fitted curve
ax.plot(x, y, '.')
ax.plot(x, yf, '-')
Now, if I run this code I do get a fit which roughly follows the data. However, if I take the initial guess out, i.e. do the fitting by calling
param, param_cov = scipy.optimize.curve_fit(fit, x, y, maxfev=100000)
then the fit is much worse. The reason why is that curve_fit will start from an initial guess of [1, 1, 1]. The solution which looks approximately right lies in a different valley to [1, 1, 1] and therefore it isn't the solution which is found. Said another way, it only finds the local minimum, not the global.
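Applied back to the question's own setup, a hedged sketch of both fixes (np.cbrt plus an explicit starting guess) could look like the following; the synthetic T_fit/y arrays and the p0 values are purely illustrative, not taken from the question's .lvm files:

import numpy as np
from scipy.optimize import curve_fit

# Synthetic stand-in for the question's (T_fit, y) arrays
T_fit = np.linspace(95, 105, 1001)
y = 0.001 * (-0.5 + 1/(1 + np.exp((100 - T_fit)/0.5))) + 0.0001*np.random.randn(len(T_fit))

def test(T_fit, a, b, c):
    # np.cbrt handles negative arguments, unlike **(1/3)
    return a * np.cbrt(T_fit + b) + c

# Illustrative starting guess: right order of magnitude, step near 100 K
p0 = [0.0003, -100, 0.0]
param, param_cov = curve_fit(test, T_fit, y, p0=p0, maxfev=100000)
print(param)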

How to determine unknown parameters of a differential equation based on the best fit to a data set in Python?

I am trying to fit different differential equations to a given data set with Python. For this I use the scipy package, specifically the solve_ivp function.
This works fine for me as long as I have a rough estimate of the parameters (b = 0.005) included in the differential equation, e.g.:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np
def f(x, y, b):
    dydx = [-b[0] * y[0]]
    return dydx
xspan= np.linspace(1, 500, 25)
yinit= [5]
b= [0.005]
sol= solve_ivp(lambda x, y: f(x, y, b),
[xspan[0], xspan[-1]], yinit, t_eval= xspan)
print(sol)
print("\n")
print(sol.t)
print(sol.y)
plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined automatically, based on the best fit of the solved differential equation to a given data set (x and y). Is there a way this can be done, for example by combining this example with scipy's curve_fit function, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work; it should be easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now have to define a function curve(x, *p) that transforms any list of points into a list of values according to the single parameter b.
def curve(x, b):
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b])
    return res.y[0]
Add optional arguments for error tolerances as necessary.
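For instance, a small sketch of how those tolerances could be passed through (odefun here is a stand-in for the exponential-decay right-hand side from the question; rtol and atol are solve_ivp's standard tolerance options):

from scipy.integrate import solve_ivp

def odefun(t, y, b):
    # same right-hand side as in the question: dy/dt = -b*y
    return -b * y

def curve(x, b):
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b],
                    rtol=1e-8, atol=1e-10)
    return res.y[0]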
To make this more realistic, also make the initial value a parameter. Then it also becomes more obvious where a list is expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero less rapid, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

xmin, xmax = 1, 500

def f(t, y, b):
    dydt = -b * y
    return dydt

def curve(t, b, y0):
    sol = solve_ivp(lambda t, y: f(t, y, b),
                    [xmin, xmax], [y0], t_eval=t)
    return sol.y[0]

xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata) + 0.02*np.random.randn(*xdata.shape)

y0 = 5
b = 0.005
p0 = [b, y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve:
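A minimal plotting sketch, assuming xdata, ydata, curve and popt from the block above are still in scope:

import matplotlib.pyplot as plt

plt.plot(xdata, ydata, 'o', label="noisy test data")
plt.plot(xdata, curve(xdata, *popt), '-', label="fitted solution")
plt.legend()
plt.show()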

Scipy ValueError: object too deep for desired array with optimize.leastsq

I am trying to fit my 3D data with linear 3D function Z = ax+by+c. I import the data with pandas:
dataframe = pd.read_csv('3d_data.csv',names=['x','y','z'],header=0)
print(dataframe)
x y z
0 52.830740 7.812507 0.000000
1 44.647931 61.031381 8.827942
2 38.725318 0.707952 52.857968
3 0.000000 31.026271 17.743218
4 57.137854 51.291656 61.546131
5 46.341341 3.394429 26.462564
6 3.440893 46.333864 70.440650
I have done some digging and found that the best way to fit 3D data is to use optimize from scipy with the model equation and a residual function:
def model_calc(parameter, x, y):
    a, b, c = parameter
    return a*x + b*y + c

def residual(parameter, data, x, y):
    res = []
    for _x in x:
        for _y in y:
            res.append(data - model_calc(parameter, x, y))
    return res
I fit the data with:
params0 = [0.1, -0.2,1.]
result = scipy.optimize.leastsq(residual,params0,(dataframe['z'],dataframe['x'],dataframe['y']))
fittedParams = result[0]
But the result is a ValueError:
ValueError: object too deep for desired array [...]
minpack.error: Result from function call is not a proper array of floats.
I tried making the residual function return only a single value or a single np.array, but it didn't help. I don't know where the problem is, or whether the parameter search space is perhaps too complex. I would be very grateful for some hints!
If you are fitting parameters to a function, you can use curve_fit. Here's an implementation:
from scipy.optimize import curve_fit
def model_calc(X, a, b, c):
    x, y = X
    return a*x + b*y + c
p0 = [0.1, -0.2, 1.]
popt, pcov = curve_fit(model_calc, (dataframe.x, dataframe.y), dataframe.z, p0) #popt is the fit, pcov is the covariance matrix (see the docs)
Note that your syntax must be of the form f(X, a, b, c), where X can be a 2D vector (see this post).
(Another approach)
If you know your fit is going to be linear, you can use numpy.linalg.lstsq. See here. Example solution:
import numpy as np
from numpy.linalg import lstsq
A = np.vstack((dataframe.x, dataframe.y, np.ones_like(dataframe.y))).T
B = dataframe.z
a, b, c = lstsq(A, B)[0]
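As a quick sanity check (a sketch, reusing dataframe and the a, b, c just obtained), the fitted plane can be evaluated and compared against the measured z:

import numpy as np

z_fit = a*dataframe.x + b*dataframe.y + c
residuals = dataframe.z - z_fit
print("RMS residual:", np.sqrt(np.mean(residuals**2)))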

how to fit a step function in python

I have a question about fitting a step function using scipy routines like curve_fit. I have trouble making it vectorized, for example:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xobs=np.linspace(0,10,100)
yl=np.random.rand(50); yr=np.random.rand(50)+100
yobs=np.concatenate((yl,yr),axis=0)
def model(x, rf, T1, T2):
    #1: x=np.vectorize(x)
    if x < rf:
        ret = T1
    else:
        ret = T2
    return ret
#2: model=np.vectorize(model)
popt, pcov = curve_fit(model, xobs, yobs, [40.,0.,100.])
It says
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I add #1 or #2 it runs but doesn't really fit the data:
OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)
[ 40. 50.51182064 50.51182064] [[ inf inf inf]
[ inf inf inf]
[ inf inf inf]]
Anybody know how to fix that? THX
Here's what I did. I retained xobs and yobs:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xobs=np.linspace(0,10,100)
yl=np.random.rand(50); yr=np.random.rand(50)+100
yobs=np.concatenate((yl,yr),axis=0)
Now, the Heaviside function must be generated. To give you an overview of this function, consider the half-maximum convention of the Heaviside function: H(x) = 0 for x < 0, H(0) = 1/2, and H(x) = 1 for x > 0.
In Python, this is equivalent to: def f(x): return 0.5 * (np.sign(x) + 1)
A sample plot would be:
xval = sorted(np.concatenate([np.linspace(-5,5,100),[0]])) # includes x = 0
yval = f(xval)
plt.plot(xval,yval,'ko-')
plt.ylim(-0.1,1.1)
plt.xlabel('x',size=18)
plt.ylabel('H(x)',size=20)
Now, plotting xobs and yobs gives:
plt.plot(xobs,yobs,'ko-')
plt.ylim(-10,110)
plt.xlabel('xobs',size=18)
plt.ylabel('yobs',size=20)
Notice that, comparing the two figures, the second plot is shifted by 5 units and the maximum increases from 1.0 to 100. I infer that the function for the second plot can be represented as 100 * H(x - 5),
or in Python: 0.5 * (np.sign(x-5) + 1) * 100 = 50 * (np.sign(x-5) + 1)
Combining the plots yields (where Fit represents the above fitting function)
The plot confirms that my guess is correct. Now, assuming that YOU DO NOT KNOW how this correct fitting function came about, a generalized fitting function is created: def f(x,a,b,c): return a * (np.sign(x-b) + c), where theoretically, a = 50, b = 5, and c = 1.
Proceed to estimation:
popt, pcov = curve_fit(f, xobs, yobs, bounds=([49,4.75,0],[50,5,2]))
Now, bounds = ([lower bound of each parameter (a,b,c)],[upper bound of each parameter]). Technically, this means that 49 < a < 50, 4.75 < b < 5, and 0 < c < 2.
Here are MY results for popt and pcov:
pcov represents the estimated covariance of popt. The diagonals provide the variance of the parameter estimate [Source].
Results show that the parameter estimates popt are near the theoretical values.
Basically, a generalized Heaviside function can be represented by: a * (np.sign(x-b) + c)
Here is the code that will generate parameter estimates and the corresponding covariances:
import numpy as np
from scipy.optimize import curve_fit
xobs = np.linspace(0,10,100)
yl = np.random.rand(50); yr=np.random.rand(50)+100
yobs = np.concatenate((yl,yr),axis=0)
def f(x,a,b,c): return a * (np.sign(x-b) + c) # Heaviside fitting function
popt, pcov = curve_fit(f,xobs,yobs,bounds=([49,4.75,0],[50,5,2]))
print('popt = %s' % popt)
print('pcov = \n %s' % pcov)
Finally, note that the estimates of popt and pcov vary from run to run, since xobs and yobs are generated randomly.
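If reproducible numbers are wanted, one option (not part of the original answer) is to seed NumPy's random generator before generating yl and yr:

np.random.seed(0)   # makes yl/yr, and hence popt/pcov, repeatable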
This question is pretty old, but in case it can be useful to other people: The Heaviside function is not differentiable at the step, and this is causing issues in the minimization. In such cases, I fit a logistic function, as shown below.
Fitting a Heaviside function directly always fails in my case.
import numpy as np
import matplotlib.pyplot as plt
import scipy.special
import scipy.optimize as optim

x = np.linspace(0, 10, 101)
y = np.heaviside((x - 5), 0.)
def sigmoid(x, x0, b):
    return scipy.special.expit((x - x0) * b)
args, cov = optim.curve_fit(sigmoid, x, y)
plt.scatter(x, y)
plt.plot(x, sigmoid(x, *args))
print(args)
[ 5.05006427 532.21427701]
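Applied back to the question's noisy 0-to-100 step data, a hedged sketch with a scaled logistic (the amplitude/offset parameters and the p0 guess here are illustrative additions, not part of the answer above) might be:

import numpy as np
import scipy.special
import scipy.optimize as optim

xobs = np.linspace(0, 10, 100)
yobs = np.concatenate((np.random.rand(50), np.random.rand(50) + 100))

def scaled_sigmoid(x, x0, b, a, c):
    # amplitude a and offset c let the logistic span roughly 0..100
    return a * scipy.special.expit((x - x0) * b) + c

p0 = [5.0, 10.0, 100.0, 0.0]   # illustrative starting guess
popt, pcov = optim.curve_fit(scaled_sigmoid, xobs, yobs, p0=p0)
print(popt)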

numpy.polyfit with adapted parameters

Regarding this: polynomial equation parameters
where I get 3 parameters for a quadratic function y = a*x² + b*x + c, I now want to get only the first parameter for a quadratic that describes my function y = a*x². In other words: I want to set b = c = 0 and get the fitted parameter a. If I understand it correctly, polyfit isn't able to do this.
This can be done by numpy.linalg.lstsq. To explain how to use it, it is maybe easiest to show how you would do a standard 2nd order polyfit 'by hand'. Assuming you have your measurement vectors x and y, you first construct a so-called design matrix M like so:
M = np.column_stack((x**2, x, np.ones_like(x)))
after which you can obtain the usual coefficients as the least-square solution to the equation M * k = y using lstsq like this:
k, _, _, _ = np.linalg.lstsq(M, y)
where k is the column vector [a, b, c] with the usual coefficients. Note that lstsq returns some other parameters, which you can ignore. This is a very powerful trick, which allows you to fit y to any linear combination of the columns you put into your design matrix. It can be used e.g. for 2D fits of the type z = a * x + b * y (see e.g. this example, where I used the same trick in Matlab), or polyfits with missing coefficients like in your problem.
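For example, a hedged sketch of that 2D variant (z = a*x + b*y, with synthetic data purely for illustration) could be:

import numpy as np

# Synthetic 2D data for illustration
x = np.random.rand(200)
y = np.random.rand(200)
z = 2.0*x - 3.0*y + 0.05*np.random.randn(200)

# One column per term of the linear combination
M = np.column_stack((x, y))
k, *_ = np.linalg.lstsq(M, z, rcond=None)
a, b = k   # should come out close to 2.0 and -3.0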
In your case, the design matrix is simply a single column containing x**2. Quick example:
import numpy as np
import matplotlib.pylab as plt
# generate some noisy data
x = np.arange(1000)
y = 0.0001234 * x**2 + 3*np.random.randn(len(x))
# do fit
M = np.column_stack((x**2,)) # construct design matrix
k, _, _, _ = np.linalg.lstsq(M, y) # least-square fit of M * k = y
# quick plot
plt.plot(x, y, '.', x, k*x**2, 'r', linewidth=3)
plt.legend(('measurement', 'fit'), loc=2)
plt.title('best fit: y = {:.8f} * x**2'.format(k[0]))
plt.show()
Result:
The coefficients are found by minimizing the squared error; you don't assign them. However, you can set some of the coefficients to zero afterwards if they are insignificant. E.g., I have a list of points on the curve y = 33*x²:
In [51]: x=np.arange(20)
In [52]: y=33*x**2 #y = 33*x²
In [53]: coeffs=np.polyfit(x, y, 2)
In [54]: coeffs
Out[54]: array([ 3.30000000e+01, 8.99625199e-14, -7.62430619e-13])
In [55]: epsilon=np.finfo(np.float32).eps
In [56]: coeffs[np.abs(coeffs)<epsilon]=0
In [57]: coeffs
Out[57]: array([ 33., 0., 0.])
