I am attempting to write a program that reads two sets of data from a .csv file into arrays and then fits them to a piecewise function. What's most important to me is that these fits are done simultaneously, because they share the same parameters. This piecewise function is my attempt to do so, though if you know a better way to fit them simultaneously I'd also greatly appreciate advice on that.
To avoid having to upload the csv files I've added the data directly into the arrays.
import numpy
import csv
import matplotlib
from scipy import optimize
xdata = [2.0, 10.0, 30.0, 50.0, 70.0, 90.0, 110.0, 130.0, 150.0, 250.0, 400.0, 1002.0, 1010.0, 1030.0, 1050.0, 1070.0, 1090.0, 1110.0, 1130.0, 1150.0, 1250.0, 1400.0]
ydata = [0.013833958803215633, 0.024273268442992078, 0.08792766000711709, 0.23477725658012044, 0.31997367288103884, 0.3822895295625711, 0.46037063893452784, 0.5531831477605121, 0.559757863748663, 0.6443036770720387, 0.7344601382896991, 2.6773979205076136e-09, 9.297289736857164e-10, 0.10915332214935693, 0.1345307163724643, 0.1230161681870127, 0.11286094974672768, 0.09186485171688986, 0.06609131137369342, 0.052616358869021135, 0.034629686697483314, 0.03993853791147095]
I want to fit the first 11 points to the function labeled 'SSdecay' and the second 11 points to the function labeled 'SUdecay'. My attempt at doing this simultaneously was to write the piecewise function labeled 'fitfunction'.
#defines functions to be used in fitting
#to fit the first half of data
def SSdecay(x, lam1, lam2, norm, xoff):
    return norm*(1 + lam2/(lam1 - lam2)*numpy.exp(-lam1*(x - xoff)) -
                 lam1/(lam1 - lam2)*numpy.exp(-lam2*(x - xoff)))
#to fit the second half of data
def SUdecay(x, lam1, lam2, norm, xoff):
    return norm*(lam1/(lam1 - lam2))*(-numpy.exp(-lam1*(x - xoff)) +
                                      numpy.exp(-lam2*(x - xoff)))
#piecewise function combining SS and SU functions to fit the whole data set
def fitfunction(x, lam1, lam2, norm, xoff):
    y = numpy.piecewise(x,[x < 1000, x >= 1000],[SSdecay(x, lam1, lam2, norm, xoff),SUdecay(x, lam1, lam2, norm, xoff)])
    return y
#fits the piecewise function with initial guesses for parameters
p0=[0.01,0.02,1,0]
popt, pcov = optimize.curve_fit(fitfunction, xdata, ydata, p0)
print(popt)
print(pcov)
After running this I get the error:
ValueError: NumPy boolean array indexing assignment cannot assign 22 input values to the 11 output values where the mask is true
It seems as though curve_fit does not like that I'm using a piecewise function but I am unsure why or if it is a fixable kind of problem.
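A likely cause, offered as a sketch rather than a definitive diagnosis: numpy.piecewise expects its third argument to be a list of callables (or scalars), one per condition, and it calls each of them only on the elements where the matching condition is true. In the code above the list holds arrays that were already evaluated on all 22 points, so NumPy tries to assign 22 values to the 11 masked positions. Wrapping each branch in a lambda avoids that. The names ss_decay, su_decay and fit_function below are just renamed copies of the functions in the question.

```python
import numpy as np

def ss_decay(x, lam1, lam2, norm, xoff):
    # Same expression as SSdecay in the question
    return norm * (1 + lam2 / (lam1 - lam2) * np.exp(-lam1 * (x - xoff))
                   - lam1 / (lam1 - lam2) * np.exp(-lam2 * (x - xoff)))

def su_decay(x, lam1, lam2, norm, xoff):
    # Same expression as SUdecay in the question
    return norm * (lam1 / (lam1 - lam2)) * (np.exp(-lam2 * (x - xoff))
                                            - np.exp(-lam1 * (x - xoff)))

def fit_function(x, lam1, lam2, norm, xoff):
    x = np.asarray(x, dtype=float)
    # Each entry is a callable; piecewise passes it only the masked x values
    return np.piecewise(
        x,
        [x < 1000, x >= 1000],
        [lambda t: ss_decay(t, lam1, lam2, norm, xoff),
         lambda t: su_decay(t, lam1, lam2, norm, xoff)],
    )
```

fit_function can then be handed to optimize.curve_fit in place of fitfunction. Note that this still forces both segments to share norm and xoff, which may or may not suit the data.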
Here are my results for separately fitting the two functions using the normalized data. It looks unlikely that these will work as a single piecewise equation; please see the plot image and source code below. The fitted parameters for the two equations are also very different:
SS parameters: [ 0.0110936, 0.09560932, 0.72929264, 6.82520026]
SU parameters: [ 3.46853883e-02, 9.54208972e-03, 1.99877873e-01, 1.00465563e+03]
import numpy
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
xdata = [2.0, 10.0, 30.0, 50.0, 70.0, 90.0, 110.0, 130.0, 150.0, 250.0, 400.0, 1002.0, 1010.0, 1030.0, 1050.0, 1070.0, 1090.0, 1110.0, 1130.0, 1150.0, 1250.0, 1400.0]
ydata = [0.013833958803215633, 0.024273268442992078, 0.08792766000711709, 0.23477725658012044, 0.31997367288103884, 0.3822895295625711, 0.46037063893452784, 0.5531831477605121, 0.559757863748663, 0.6443036770720387, 0.7344601382896991, 2.6773979205076136e-09, 9.297289736857164e-10, 0.10915332214935693, 0.1345307163724643, 0.1230161681870127, 0.11286094974672768, 0.09186485171688986, 0.06609131137369342, 0.052616358869021135, 0.034629686697483314, 0.03993853791147095]
#to fit the first half of data
def SSdecay(x, lam1, lam2, norm, xoff):
    return norm*(1 + lam2/(lam1 - lam2)*numpy.exp(-lam1*(x - xoff)) -
                 lam1/(lam1 - lam2)*numpy.exp(-lam2*(x - xoff)))
#to fit the second half of data
def SUdecay(x, lam1, lam2, norm, xoff):
    return norm*(lam1/(lam1 - lam2))*(-numpy.exp(-lam1*(x - xoff)) +
                                      numpy.exp(-lam2*(x - xoff)))
# some initial parameter values
initialParameters_ss = numpy.array([0.01, 0.02, 1.0, 0.0])
initialParameters_su = initialParameters_ss # same values for this example
# curve fit the equations individually to their respective data
ssParameters, pcov = curve_fit(SSdecay, xdata[:11], ydata[:11], initialParameters_ss)
suParameters, pcov = curve_fit(SUdecay, xdata[11:], ydata[11:], initialParameters_su)
# values for display of fitted function
lam1_ss, lam2_ss, norm_ss, xoff_ss = ssParameters
lam1_su, lam2_su, norm_su, xoff_su = suParameters
# for plotting the fitting results
y_fit_ss = SSdecay(xdata[:11], lam1_ss, lam2_ss, norm_ss, xoff_ss) # first data set, first equation
y_fit_su = SUdecay(xdata[11:], lam1_su, lam2_su, norm_su, xoff_su) # second data set, second equation
plt.plot(xdata, ydata, 'D') # plot the raw data as a scatterplot
plt.plot(xdata[:11], y_fit_ss) # plot the SS equation using the fitted parameters
plt.plot(xdata[11:], y_fit_su) # plot the SU equation using the fitted parameters
plt.show()
print('SS parameters:', ssParameters)
print('SU parameters:', suParameters)
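If the goal is still one simultaneous fit, one possible compromise (an assumption on my part, not something the question asks for) is to share only lam1 and lam2 between the two segments and give each segment its own norm and xoff. The sketch below builds a single model over the concatenated data. N_SS, combined and fit_shared are hypothetical names, and the split at 11 points is taken from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def ss_decay(x, lam1, lam2, norm, xoff):
    return norm * (1 + lam2 / (lam1 - lam2) * np.exp(-lam1 * (x - xoff))
                   - lam1 / (lam1 - lam2) * np.exp(-lam2 * (x - xoff)))

def su_decay(x, lam1, lam2, norm, xoff):
    return norm * (lam1 / (lam1 - lam2)) * (np.exp(-lam2 * (x - xoff))
                                            - np.exp(-lam1 * (x - xoff)))

N_SS = 11  # assumption: the first 11 points belong to the SS segment

def combined(x, lam1, lam2, norm_ss, xoff_ss, norm_su, xoff_su):
    # Shared rates lam1 and lam2; separate norm and xoff for each segment
    x = np.asarray(x, dtype=float)
    return np.concatenate([
        ss_decay(x[:N_SS], lam1, lam2, norm_ss, xoff_ss),
        su_decay(x[N_SS:], lam1, lam2, norm_su, xoff_su),
    ])

def fit_shared(xdata, ydata, p0):
    # p0 order: lam1, lam2, norm_ss, xoff_ss, norm_su, xoff_su
    return curve_fit(combined, xdata, ydata, p0=p0)
```

The separately fitted values above would be a natural starting point for p0. Whether the shared-rate fit describes both segments well is something only the residuals can tell.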
Related
I am trying to solve the following ODE to get displacement values with respect to time using scipy.integrate.odeint. Here is the code I am working with.
import numpy as np
import scipy as sp
from scipy.integrate import odeint
import matplotlib.pyplot as plt
%matplotlib inline
rho_water = 0.9970470
h=7.5
def Lor(Z, t,args):
    h = args[0]
    g = args[1]
    Omeg = args[2]
    if Z[1]>0:
        return [Z[1], 1/Z[0] - 1 -Omeg*Z[1] - (Z[1])**2/Z[0]]
    else:
        return [Z[1], 1/Z[0] - 1 -Omeg*Z[1] ]
def osc(h=7.5, g=9.80665e2,R=0.05,eta=0.0000889):
    n= (16)*(eta)*((h)**(1/2))
    d=(rho_water)*(R)**2*(g)**(1/2)
    Omeg = (n/d)
    params = (h,g,Omeg)
    t=np.arange(0,500,0.001)
    Z=sp.integrate.odeint(Lor, [0.17, 0.00], t, args=(params,))
    z=Z[:,0]*h
    n= (16)*(eta)*((h)**(1/2))
    d=(rho_water)*(R)**2*(g)**(1/2)
    Omeg = (n/d)
    params = (h,g,Omeg)
    return z
z_0_10=osc(h=7.5, g=9.80665e2,R=0.05,eta=0.0000889)
t=np.arange(0,500,0.001)
This code works when I use np.arange and create an array of time values that way but as I have data that I want to fit to the model described by the ODE, I need to solve the ODE over an array of very specific time values, so that I can get displacement values at those times. I want to use those solutions to calculate residuals for my data wrt the model. The array of time values I want to use is:
t_0_10=np.array([0.000, 0.067, 0.100, 0.133, 0.167, 0.200, 0.233, 0.267, 0.300, 0.333, 0.367, 0.400, 0.433, 0.467, 0.500, 0.533, 0.567, 0.600, 0.633, 0.667, 0.700, 0.767, 0.800, 0.833, 0.867, 0.900, 0.933, 0.967, 1.000, 1.033, 1.067, 1.133, 1.167, 1.200, 1.233, 1.333, 1.367, 1.400, 1.433, 1.467, 1.500, 1.533, 1.567, 1.600, 1.633, 1.733, 1.767, 1.867, 1.967, 2.000, 2.033, 2.067, 2.100])
When I pass this array of time values to the ODE solver instead of the np.arange array, I no longer get the correct results for displacement: the shape of the graph I plot is not correct. Here is the code for the graph I get when I use np.arange, which has the correct shape. When I use my time values, I get a different graph.
graph= plt.subplots(nrows=1, ncols=1, figsize=(14,8))
fig, ax1 =graph
ax1.plot(t*(h*1e-2/9.8)**0.5, z_0_10, c='orange', label='Model Solution', marker="o")
ax1.set_ylabel('Fluid level (Z) / cm',fontsize=12)
ax1.set_xlabel('Time Elapsed (t) / s',fontsize=12)
ax1.tick_params(axis="x", labelsize=12)
ax1.set_xlim([0,2.10])
How can I fix this issue and get the correct values for displacement using my array of time values? Or, if this is not possible, how can I obtain an array of the correct displacement values at those specific times, while still using np.arange to solve the ODE?
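One possible explanation, inferred only from the plotting code: the plot multiplies t by (h*1e-2/9.8)**0.5, which suggests the solver runs in dimensionless time and the factor sqrt(h/g) converts it to seconds. If so, the measured times (in seconds) would need to be divided by that factor before being passed to odeint. The sketch below does this. osc_at and lor are hypothetical names, and the ODE itself is copied from the question.

```python
import numpy as np
from scipy.integrate import odeint

def lor(Z, tau, omeg):
    # Same right-hand side as Lor in the question, in dimensionless time tau
    z, v = Z
    if v > 0:
        return [v, 1 / z - 1 - omeg * v - v ** 2 / z]
    return [v, 1 / z - 1 - omeg * v]

def osc_at(t_seconds, h=7.5, g=9.80665e2, R=0.05, eta=0.0000889,
           rho_water=0.9970470):
    omeg = 16 * eta * np.sqrt(h) / (rho_water * R ** 2 * np.sqrt(g))
    # Assumption: time scale is sqrt(h/g), with h in cm and g in cm/s^2
    tau = np.asarray(t_seconds, dtype=float) / np.sqrt(h / g)
    Z = odeint(lor, [0.17, 0.0], tau, args=(omeg,))
    return Z[:, 0] * h  # displacement at exactly the requested times
```

Calling osc_at(t_0_10) would then return one displacement per measured time, which can be compared with the data directly and without rescaling the time axis again. odeint treats the first entry of the time array as the initial time, so the array should start at the moment the initial condition applies (0.000 here).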
I am currently trying to evaluate some data of mine and tried replicating the fit function described here: https://www.graphpad.com/guides/prism/latest/curve-fitting/reg_classic_dr_variable.htm
At first I was having some trouble with numpy.float_power overflowing, but I think I fixed it (did I really?).
I am now using scipy.optimize.curve_fit to fit the described sigmoid to my data, but it never actually seems to fit; instead it produces constant functions, and I have no idea why.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
'''
Just a method that produces some simple test data
'''
def test_data_1():
    return np.array([[0.000610352, 0.002441406, 0.009765625, 0.0390625, 0.15625, 0.625, 2.5, 10],
                     [0.89, 0.81, 0.64, 0.48, 0.45, 0.50, 0.58, 0.70]])
'''
Just a simple method that produces some more test data
'''
def test_data_2():
    return np.array([[0.000610352, 0.002441406, 0.009765625, 0.0390625, 0.15625, 0.625, 2.5, 10],
                     [1, 0.83, 0.68, 0.52, 0.48, 0.59, 0.75, 0.62]])
'''
Dose response curve as described in: https://www.graphpad.com/guides/prism/latest/curve-fitting/reg_classic_dr_variable.htm
'''
def sigmoidal_dose_response_with_variable_slope(x_data, *params):
    # Extract relevant parameters. Flattening the array just in case?
    r_params = np.array(params).flatten()
    bottom = r_params[0]
    top = r_params[1]
    logec50 = r_params[2]
    slope = r_params[3]
    # Calculating the numerator
    numerator = top - bottom
    # Calculating the denominator
    denominator = 1 + np.float_power(10, (logec50 - x_data) * slope, dtype=np.longdouble)
    return np.array(bottom + (numerator / denominator), dtype=np.float64)
if __name__ == "__main__":
    x_data, y_data = test_data_1()
    # Guessing bottom and top as the highest and lowest y-values.
    bottom_guess = np.min(y_data)
    bottom_guess_idx = np.argmin(y_data)
    top_guess = np.max(y_data)
    top_guess_idx = np.argmax(y_data)
    # Guessing logec50 as the middle between those parameters
    logec50_guess = np.linalg.norm(x_data[top_guess_idx] - x_data[bottom_guess_idx]) / 2 \
                    + np.min([x_data[top_guess_idx], x_data[bottom_guess_idx]])
    # Guessing a slope of 1
    slope_guess = 1
    p0 = [bottom_guess, top_guess, logec50_guess, slope_guess]
    # Fitting the curve to my data
    popt, pcov = curve_fit(sigmoidal_dose_response_with_variable_slope, x_data, y_data, p0)
    # Making the x-axis scale logarithmically
    fig, ax = plt.subplots()
    ax.set_xscale('log')
    # Plot my data
    plt.plot(x_data, y_data, 's')
    # Calculate function data. The borders are merely a guess
    x_val = np.linspace(0, 10, 100)
    y_val = sigmoidal_dose_response_with_variable_slope(x_val, popt)
    # Plot
    plt.plot(x_val, y_val)
    plt.show()
It should be easily testable.
Update:
Something like this is what I am looking for:
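One thing that may be worth checking, as a hedged suggestion: the model subtracts x from logEC50, so it appears to expect x as the base-10 logarithm of the dose, while the test data hold raw concentrations spanning four orders of magnitude. A minimal sketch that converts the concentrations first is shown below. dose_response and fit_dose_response are hypothetical names, and the initial guesses are simple heuristics rather than a recommended recipe.

```python
import numpy as np
from scipy.optimize import curve_fit

def dose_response(log_x, bottom, top, logec50, slope):
    # Four-parameter sigmoid in log10(dose), same form as in the question
    return bottom + (top - bottom) / (1.0 + np.power(10.0, (logec50 - log_x) * slope))

def fit_dose_response(conc, response):
    # Assumption: conc holds raw (positive) concentrations, so take log10 first
    log_x = np.log10(np.asarray(conc, dtype=float))
    response = np.asarray(response, dtype=float)
    # Heuristic guesses: data range for bottom/top, median log-dose for logEC50
    p0 = [response.min(), response.max(), np.median(log_x), 1.0]
    popt, pcov = curve_fit(dose_response, log_x, response, p0=p0)
    return popt, pcov
```

With the question's data this would be called as fit_dose_response(x_data, y_data). Both test sets fall and then rise again, so a single monotonic sigmoid may still describe them poorly even once the x values are on a log scale.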
I have an (x, y) signal with non-uniform sample rate in x. (The sample rate is roughly proportional to 1/x). I attempted to uniformly re-sample it using scipy.signal's resample function. From what I understand from the documentation, I could pass it the following arguments:
scipy.signal.resample(array_of_y_values, number_of_sample_points, array_of_x_values)
and it would return the array of
[[resampled_y_values],[new_sample_points]]
I'd expect it to return uniformly sampled data with roughly the same shape as the original, and with the same minimal and maximal x values. But it doesn't:
# nu_data = [[x1, x2, ..., xn], [y1, y2, ..., yn]]
# with x values in ascending order
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
uniform_data = np.array([resampled[1], resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
blue: nu_data, orange: uniform_data
It doesn't look unaltered, and the x scale has been resized too. If I try to fix the range by constructing the desired uniform x values myself and using them instead, the distortion remains:
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
delta = (nu_data[0,-1] - nu_data[0,0]) / length
new_samplepoints = np.arange(nu_data[0,0], nu_data[0,-1], delta)
uniform_data = np.array([new_samplepoints, resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
What is the proper way to re-sample my data uniformly, if not this?
Please look at this rough solution:
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.array([0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20])
y = np.exp(-x/3.0)
flinear = interpolate.interp1d(x, y)
fcubic = interpolate.interp1d(x, y, kind='cubic')
xnew = np.arange(0.001, 20, 1)
ylinear = flinear(xnew)
ycubic = fcubic(xnew)
plt.plot(x, y, 'X', xnew, ylinear, 'x', xnew, ycubic, 'o')
plt.show()
That is a slightly updated example from the SciPy documentation. If you execute it, you should see something like this:
The blue crosses are the initial function, i.e. your signal with a non-uniform sampling distribution. There are two results: the orange x markers show linear interpolation and the green dots show cubic interpolation. The question is which option you prefer. Personally I don't like either of them, which is why I usually take 4 points and interpolate between them, then the next points, and so on, to get cubic interpolation without those strange bumps. That is much more work, and I can't see a way to do it with scipy, so it will be slow. That is why I asked about the size of the data.
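For context, scipy.signal.resample uses a Fourier method and, according to its documentation, assumes that the input samples (and the optional t array) are equally spaced, which would explain the distortion seen in the question. A minimal sketch of the interpolation route onto a uniform grid with the same end points follows. resample_uniform is a hypothetical helper and uses plain linear interpolation via np.interp.

```python
import numpy as np

def resample_uniform(x, y, num=None):
    """Linearly interpolate (x, y) onto a uniform grid spanning the same x range."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if num is None:
        num = len(x)  # keep the same number of samples by default
    # x must be in ascending order for np.interp
    x_uniform = np.linspace(x[0], x[-1], num)
    y_uniform = np.interp(x_uniform, x, y)
    return x_uniform, y_uniform
```

For the arrays in the question this would be called as resample_uniform(nu_data[0], nu_data[1]). Because the original sampling is dense at small x and sparse at large x, a uniform grid with the same number of points will lose detail at small x, so a larger num may be needed.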
I am trying to fit a Gaussian to a spectrum whose y values are on the order of 10^(-19). curve_fit gives me a poor fit, both before and after I multiply my whole data set by 10^(-19). My code is attached; it is a fairly simple data set except that the values are very small. If I want to keep my original values, how would I get a reasonable Gaussian fit that gives me the correct parameters?
#get fits data
aaa=pyfits.getdata('p1.cal.fits')
aaa=np.matrix(aaa)
nrow=np.shape(aaa)[0]
ncol=np.shape(aaa)[1]
ylo=79
yhi=90
xlo=0
xhi=1023
glo=430
ghi=470
#sum all the rows to get spectrum
ysum=[]
for x in range(xlo,xhi):
    sum=np.sum(aaa[ylo:yhi,x])
    ysum.append(sum)
wavelen_pix=range(xhi-xlo)
max=np.max(ysum)
print "maximum is at x=", np.where(ysum==max)
##fit gaussian
#fit only part of my data in the chosen range [glo:ghi]
x=wavelen_pix[glo:ghi]
y=ysum[glo:ghi]
def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))
sig=np.std(ysum[500:1000]) #std of background noise
popt, pcov = curve_fit(func, x, sig)
print popt
#this gives me [1.,1.,1.], which is obviously wrong
gaus=func(x,popt[0],popt[1],popt[2])
aaa is a 153 by 1024 image matrix, partly looks like this:
matrix([[ -8.99793629e-20, 8.57133275e-21, 4.83523386e-20, ...,
-1.54811004e-20, 5.22941515e-20, 1.71179195e-20],
[ 2.75769318e-20, 1.03177243e-20, -3.19634928e-21, ...,
1.66583803e-20, -9.88712568e-22, -2.56897725e-20],
[ 2.88121935e-20, 8.57964252e-21, -2.60784327e-20, ...,
1.72335180e-20, -7.61189937e-21, -3.45333075e-20],
...,
[ 1.04006903e-20, 1.61200683e-20, 7.04195205e-20, ...,
1.72459645e-20, 4.29404029e-20, 1.99889374e-20],
[ 3.22315752e-21, -5.61394194e-21, 3.28763096e-20, ...,
1.99063583e-20, 2.12989880e-20, -1.23250648e-21],
[ 3.66591810e-20, -8.08647455e-22, -6.22773168e-20, ...,
-4.06145681e-21, 4.92453132e-21, 4.23689309e-20]], dtype=float32)
You are calling curve_fit incorrectly. Here is the usage:
curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, **kw)
f is your function whose first arg is an array of independent variables, and whose subsequent args are the function parameters (such as amplitude, center, etc)
xdata are the independent variables
ydata is the dependent variable
p0 is an initial guess at the function parameters (for a Gaussian these are amplitude, center, width)
By default p0 is set to a list of ones [1,1,...], which is probably why you get that as a result, the fit just never executed because you called it incorrectly.
Try estimating the amplitude, center, and width from the data, then make a p0 object (see below for details)
init_guess = ( a_i, x0_i, sig_i) # same order as they are supplied to your function
popt, pcov = curve_fit(func, xdata=x,ydata=y,p0=init_guess)
Here is a short example
xdata = np.linspace(0, 4, 50)
mygauss = ( 10,2,0.5) #( amp, center, width)
y = func(xdata, *mygauss ) # using your func defined above
ydata = y + 2*(np.random.random(50)- 0.5) # add some noise to create fake data
Now I can guess the fit params
ai = np.max( ydata) # guess the amplitude
xi = xdata[ np.argmax( ydata)] # guess the position of center
Guessing the width is tricky, I would first find where the half max is located (there are two, but you only need to find one, as the Gaussian is symmetric):
pos_half = np.argmin( np.abs( ydata-ai/2 ) ) # subtract half the amplitude and find the minimum
Now evaluate how far this is from the center of the gaussian (xi) :
sig_i = np.abs( xi - xdata[ pos_half] ) # estimate the width
Now you can make the initial guess
init_guess = (ai, xi, sig_i)
and fit
params, variance = curve_fit( func, xdata=xdata, ydata=ydata, p0=init_guess)
print(params)
#array([ 9.99457443, 2.01992858, 0.49599629])
which is very close to mygauss. Hope it helps.
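For data on the order of 1e-19, one option (a sketch, not the only way) is to rescale only inside the fit and convert the amplitude back afterwards, so the returned parameters stay in the original units. fit_small_gauss is a hypothetical helper, and the initial guesses follow the same recipe as above.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2.0 * sigma ** 2))

def fit_small_gauss(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scale = np.max(np.abs(y))      # bring the data to order 1 for the optimizer
    y_scaled = y / scale
    a_i = y_scaled.max()                       # amplitude guess
    x0_i = x[np.argmax(y_scaled)]              # center guess
    pos_half = np.argmin(np.abs(y_scaled - a_i / 2.0))
    # Width guess from the half-maximum position; never smaller than one x step
    sig_i = max(abs(x0_i - x[pos_half]), abs(x[1] - x[0]))
    popt, pcov = curve_fit(gauss, x, y_scaled, p0=(a_i, x0_i, sig_i))
    a, x0, sigma = popt
    return a * scale, x0, abs(sigma)  # amplitude converted back to original units
```

Only the amplitude needs converting back, because the center and width live on the x axis and are unaffected by scaling y.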
Forget about rescaling, or making linear changes, or using the p0 parameter, which usually doesn't work! Try using the bounds parameter of curve_fit for n parameters like this:
a0=np.array([a01,...,a0n]) # lower bounds, one per parameter
af=np.array([af1,...,afn]) # upper bounds, one per parameter
popt, pcov = curve_fit(func, x, y, method="trf", bounds=(a0,af))
Hope it works!
;)
I have two lists that I am trying to do an exponential fit of form y=a*e^(bx) between. I am using an approach similar to the second answer from here but the results are not matching what I know to be true from testing with excel. Here is my code:
import numpy as np
from scipy.optimize import curve_fit
exp_constants = [62.5, 87.5, 112.5, 137.5, 162.5, 187.5, 212.5, 237.5, 262.5, 287.5]
means = [211.94, 139.30, 80.09, 48.29, 26.94, 12.12, 3.99, 1.02, 0.09, 0.02]
def func(x1, a, b):
return a * np.exp(b * x1)
popt, pcov = curve_fit(func, exp_constants, means)
When returning popt[0] and popt[1] I get 3.222e-127 and 1.0 respectively. However, when checking with excel the correct exponential equation should be y=7231.3e^(-0.04x). I am not very familiar with the curve_fit approach, is there something that I am missing in my code or a better approach to getting the correct exponential fit?
Edit: Here is the plot that is made with the following code:
plt.figure()
plt.plot(exp_constants, means, 'ko', label="Data")
plt.plot(exp_constants, func(exp_constants, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
I guess the problem is that you do not provide an initial guess for the parameters, so as per the manual, curve_fit uses [1, 1] as a guess. The optimization might then get stuck at some local minimum. One other thing you should do is to change your xdata and ydata lists to numpy arrays, as shown by this answer:
import numpy as np
from scipy.optimize import curve_fit
exp_constants = np.array([62.5, 87.5, 112.5, 137.5, 162.5, 187.5, 212.5,
237.5, 262.5, 287.5])
means = np.array([211.94, 139.30, 80.09, 48.29, 26.94, 12.12, 3.99,
1.02, 0.09, 0.02])
def func(x1, a, b):
return a * np.exp(b * x1)
guess = [100, -0.1]
popt, pcov = curve_fit(func, exp_constants, means, p0 = guess)
The exact value of the guess is not important, but you should probably have at least the order of magnitude and the signs right, so that the optimization can converge to the optimal value. I just used some random numbers close to the 'correct answer' you mentioned. When you don't know what to guess, you can do a polyfit(xdata, log(ydata), 1) and some basic math to get an initial value, as shown by this answer to the question you linked.
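The polyfit idea can be sketched as follows, using the data from the question. Taking logs turns y = a*exp(b*x) into log(y) = log(a) + b*x, so a straight-line fit gives b as the slope and log(a) as the intercept. The variable names are my own.

```python
import numpy as np
from scipy.optimize import curve_fit

exp_constants = np.array([62.5, 87.5, 112.5, 137.5, 162.5, 187.5, 212.5,
                          237.5, 262.5, 287.5])
means = np.array([211.94, 139.30, 80.09, 48.29, 26.94, 12.12, 3.99,
                  1.02, 0.09, 0.02])

def func(x1, a, b):
    return a * np.exp(b * x1)

# Straight-line fit in log space: log(y) = log(a) + b*x
slope, intercept = np.polyfit(exp_constants, np.log(means), 1)
guess = [np.exp(intercept), slope]

# Refine with a least-squares fit on the original (linear) scale
popt, pcov = curve_fit(func, exp_constants, means, p0=guess)
```

As far as I know, Excel's exponential trendline is itself a straight-line fit to log(y), so the guess here should land close to the Excel equation, while the refined popt minimizes residuals on the original scale and can differ noticeably from it. Neither is wrong; they weight the small y values differently.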
Quick plot:
x = np.linspace(exp_constants[0], exp_constants[-1], 1000)
plt.plot(exp_constants, means, 'ko', x, popt[0]*np.exp(popt[1]*x), 'r')
plt.show()
Result: