create a function for chi_square

create a function for chi_square - python

I need to input chi_square function and got stuck because it always shows there is a invalid syntax when run it, wonder how should I write the script? And how do I input "v"?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.loadtxt("214 ohm.txt", skiprows=1)
xdata = [row[0] for row in data ]#x represents current unit is "V"
ydata = [row[1] for row in data]#y represents voltage unit is "mA"
percision_error_V = np.array(xdata) * 0.0025 #we are using last digit of reading and multiply by measured voltage
accuracy_error_V = 0.01#we are using DC Vlotage, so use the error it provided online
erry = []
for i in range(len(percision_error_V)):
#to compare percision_error and accuracy_error for Voltage and use the larger one
erry.append(max(percision_error_V[i], accuracy_error_V))
def model_function (x, a, b):
return a*x + b
p0 = [0 , 0.]#214ohm is measured by ohmeter
p_opt , p_cov = curve_fit ( model_function ,
xdata , ydata , p0,
erry , True )
print(erry)
a_opt = p_opt[0]
b_opt = p_opt[1]
print(p_cov)
print("diagonal of P-cov is",np.diag(p_cov))
print("a_opt, b_opt is ",a_opt, b_opt)
xhat = np.arange(0, 16, 0.1)
plt.plot(xhat, model_function(xhat, a_opt, b_opt), 'r-', label="model function")
plt.errorbar(xdata, ydata,np.array(erry),linestyle="",marker='s', label="error bar")
plt.legend()
plt.ylabel('Current (mA)')
plt.xlabel('Voltage(V)')
plt.title("Voltage vs. Current with 220ohm Resistor")
plt.show()
p_sigma = np.sqrt(np.diag(p_cov))
print("p_sigma is" ,p_sigma)
for i in range(len(xdata)):
sum=sum((ydata[i]-model_function(xdata[i], a_opt, b_opt))
chi.append(sum)
this is the required function I'm supposed to put on python
Thanks
my code is alright until the equation of chi-square, I wonder how should I fix it?

You have an indentation, 1 missing parenthesis, and variable naming issues so far in this sample of code
FROM
for i in range(len(xdata)):
sum=sum((ydata[i]-model_function(xdata[i], a_opt, b_opt))
1.append(sum)
TO
for i in range(len(xdata)):
sum=sum((ydata[i]-model_function(xdata[i], a_opt, b_opt)) )
a.append(sum)
Variables cannot be named with numbers. E.g. 1,2,3. They must start with string - a1, alfa, betta, or s_t, _s.

Related

Cannot sample values of a trigonometric function

I have a problem with my code.
So i try to represent the sampled values of a function 'sin(t^3)/2^tan(t)' for
t between 0 and 1.5 and frequency fs=50Hz.
I have created a function 'sampleFunction' which takes as parameters the string which represents the trigonometric function,beginning of the interval,end of interval and the frequency.
I create tVector(0,0.02,0.04,..,1.48)
Then I take the elements of tVector and use them to evaluate the string and put the result in another vector y
I return both y and tVector
But I encounter a problem when i run it saying 'y' is not defined
This is the code:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
t=t0
for i in range(0,len(tVector)):
t=tVector[i]
y[i]=eval(functionString)
return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
functionString='math.sin(t**3)/2**math.tan(t)'
y,t=sampleFunction(functionString,t0,t1,fs)
plt.plot(t,y)
plt.xlabel('time')
plt.ylabel('Amplitude')

You can change your code in the following way:
def sampleFunction(functionString,t0,t1,fs):
tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
t=t0
y = np.zeros( tVector.shape )
for i in range(0,len(tVector)):
t=tVector[i]
y[i]=eval(functionString)
return y,tVector
However, this is not good python. There are a couple of issues:
You should use vectorized operations.
You should avoid eval like the plague. This has security implications.
For vectorized operations, simply do:
def sampleFunction(functionString,t0,t1,fs):
t = np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
y = eval(functionString)
return y, t
and call it as:
sampleFunction('np.sin(t**3)/2**np.tan(t)', 0, 10, 100)
This is much faster (especially for large arrays)
Finally, the vectorized form is only a single line long. You probably don't need the extra function.

You have a problem with the allocation of the 'y' variable as Harold is saying.
However, there are multiple ways of achieving what you are doing and the eval function is, unless you have a very good reason, the absolute worst. Maybe consider one of the possible examples below:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
t=t0
y = [float]*len(tVector) # <------------------- Allocate 'y' variable
for i in range(0,len(tVector)):
t = tVector[i]
y[i]=eval(functionString)
return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
# Your code
functionString = 'math.sin(t**3)/2**math.tan(t)'
y, t = sampleFunction(functionString,t0,t1,fs)
plt.plot(t, y, color='cyan')
# Using the 'map' built-in function
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = map(lambda ti: 0.9*math.sin(ti**3)/2**math.tan(ti), t)
plt.plot(t, y, color='magenta')
# Using Numpy's 'sin' and 'tan'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = 0.8*np.sin(t**3)/2**np.tan(t)
plt.plot(t, y, color='darkorange')
# Using 'list comprehensions'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = [ 0.7*math.sin(ti**3)/2**math.tan(ti) for ti in t]
plt.plot(t, y, color='darkgreen')
plt.xlabel('time')
plt.ylabel('Amplitude')
plt.show()
The result is:

When running the above code, you should have gotten an error message saying, in the end, "name 'y' is not defined". If you look at your function definition, you will see that it really isn't. You cannot passing a value to y[i] without defining y first! The following line before the "for" loop fixes that particular problem:
y = [None] * len(tVector)
The code will run fine after that correction.
But: why do you want to pass a function string when you can pass a function? Functions, in Python, are first-class-objects!

Confine a gaussian fit with curve_fit

in the framework of my bachelor's thesis, I need to evaluate my data with python. Unfortunately there's no suiting script of my fellow students yet and I'm quite new to programming.
I have this data set and I'm trying to fit it with a gaussian by using scipy.optimize.curve_fit. Since there are a lot of unusable counts especially at the end of the axis, I'd like to confine the part that is to be fitted.
Picture raw data
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x=np.arange(5120)
y=array([ 0.81434599, 1.17054264, 0.85279188, ..., 1. ,
1. , 13.56291391]) #most of the data isn't interesting
#to me, part of interest see below
def Gauss(x, a, x0, sigma):
return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma],
maxfev=360000)
plt.plot(x,y,label='data')
plt.plot(x,Gauss(x, *popt), 'r-',label='fit')
On docs.scipy.org I've found a general description about curve_fit
If I try using
bounds=([2400,-np.inf, -np.inf],[2600, np.inf, np.inf]),
I'm getting the ValueError: x0 is infeasible. What is the problem here?
I also tried to confine it with
popt,pcov = curve_fit(Gauss, x[2400:2600], y[2400:2600], p0=[max(y), mean, sigma], maxfev=360000)
as suggested in a comment on this question: "Error when obtaining gaussian fit for graph" at stackoverflow
In this case I only get a straight line though.
Picture: Confinement with x[2400:2600],y[2400:2600] as arguments of curve_fit
I really hope you can help me out here. I only need a way to fit a small part of my data. Thanks in advance!
interesting data:
y=array([ 0.93396226, 1.00884956, 1.15457413, 1.07590759,
0.88915094, 1.07142857, 1.10714286, 1.14171123, 1.06666667,
0.84975369, 0.95480226, 0.99388379, 1.01675978, 0.83967391,
0.9771987 , 1.02402402, 1.04531722, 1.07492795, 0.97135417,
0.99714286, 1.0248139 , 1.26223776, 1.1533101 , 0.99099099,
1.18867925, 1.15772871, 0.95076923, 1.03313253, 1.02278481,
0.93265993, 1.06705539, 1.00265252, 1.02023121, 0.92076503,
0.99728997, 1.03353659, 1.15116279, 1.04336043, 0.95076923,
1.05515588, 0.92571429, 0.93448276, 1.02702703, 0.90056818,
0.96068796, 1.08493151, 1.13584906, 1.1212938 , 1.0739645 ,
0.98972603, 0.94594595, 1.07913669, 0.98425197, 0.87762238,
0.96811594, 1.02710843, 0.99392097, 0.91384615, 1.09809264,
1.00630915, 0.93175074, 0.87572254, 1.00651466, 0.78772379,
1.12244898, 1.2248062 , 0.97109827, 0.94607843, 0.97900262,
0.97527473, 1.01212121, 1.16422287, 1.20634921, 0.97275204,
1.01090909, 0.99404762, 1.00561798, 1.01146132, 1.08695652,
0.97214485, 1.03525641, 0.99096386, 1.05135952, 1.16451613,
0.90462428, 0.76876877, 0.47701149, 0.27607362, 0.21580547,
0.20598007, 0.16766467, 0.15533981, 0.19745223, 0.15407855,
0.18925831, 0.26997245, 0.47603834, 0.596875 , 0.85126582, 0.96
, 1.06578947, 1.08761329, 0.89548023, 0.99705882, 1.07142857,
0.95677233, 0.86119874, 1.02857143, 0.98250729, 0.94214876,
1.04166667, 0.96024465, 1.07022472, 1.10344828, 1.04859335,
0.96655518, 1.06424581, 1.01754386, 1.03492063, 1.18627451,
0.91036415, 1.03355705, 1.09116809, 0.96083551, 1.01298701,
1.03691275, 1.02923977, 1.11612903, 1.01457726, 1.06285714,
0.98186528, 1.16470588, 0.86645963, 1.07317073, 1.09615385,
1.21192053, 0.94385027, 0.94244604, 0.88390501, 0.95718654,
0.9691358 , 1.01729107, 1.01119403, 1.20350877, 1.12890625,
1.06940063, 0.90410959, 1.14662757, 0.97093023, 1.03021148,
1.10629921, 0.97118156, 1.10693642, 1.07917889, 0.9484127 ,
1.07581227, 0.98006645, 0.98986486, 0.90066225, 0.90066225,
0.86779661, 0.86779661, 0.96996997, 1.01438849, 0.91186441,
0.91290323, 1.03745318, 1.0615942 , 0.97202797, 1.16608997,
0.94182825, 1.08333333, 0.9076087 , 1.18181818, 1.20618557,
1.01273885, 0.93606138, 0.87457627, 0.90575916, 1.09756098,
0.99115044, 1.13380282, 1.04333333, 1.04026846, 1.0297619 ,
1.04334365, 1.03395062, 0.92553191, 0.98198198, 1. ,
0.9439528 , 1.02684564, 1.1372549 , 0.96676737, 0.99649123,
1.07051282, 1.10367893, 1.0866426 , 1.15384615, 0.99667774])

You might find the lmfit module (https://lmfit.github.io/lmfit-py/) useful for this. It is designed to make curve fitting very easy, has built-in models for common peaks like Gaussian, and has many useful features such as allowing you to set bounds on parameters. A fit to your data with lmfit might look like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel, ConstantModel
y = np.array([.....]) # uses your shorter data range
x = np.arange(len(y))
# make a model that is a Gaussian + a constant:
model = GaussianModel(prefix='peak_') + ConstantModel()
# make parameters with starting values:
params = model.make_params(c=1.0, peak_center=90,
peak_sigma=5, peak_amplitude=-5)
# it's not really needed for this data, but you can put bounds on
# parameters like this (or set .vary=False to fix a parameter)
params['peak_sigma'].min = 0 # sigma > 0
params['peak_amplitude'].max = 0 # amplitude < 0
params['peak_center'].min = 80
params['peak_center'].max = 100
# run fit
result = model.fit(y, params, x=x)
# print, plot results
print(result.fit_report())
plt.plot(x, y)
plt.plot(x, result.best_fit)
plt.show()
This will print out
[[Model]]
(Model(gaussian, prefix='peak_') + Model(constant))
[[Fit Statistics]]
# function evals = 54
# data points = 200
# variables = 4
chi-square = 1.616
reduced chi-square = 0.008
Akaike info crit = -955.625
Bayesian info crit = -942.432
[[Variables]]
peak_sigma: 4.03660814 +/- 0.204240 (5.06%) (init= 5)
peak_center: 91.2246614 +/- 0.200267 (0.22%) (init= 90)
peak_amplitude: -9.79111362 +/- 0.445273 (4.55%) (init=-5)
c: 1.02138228 +/- 0.006796 (0.67%) (init= 1)
peak_fwhm: 9.50548558 +/- 0.480950 (5.06%) == '2.3548200*peak_sigma'
peak_height: -0.96766623 +/- 0.041854 (4.33%) == '0.3989423*peak_amplitude/max(1.e-15, peak_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(peak_sigma, peak_amplitude) = -0.599
C(peak_amplitude, c) = -0.328
C(peak_sigma, c) = 0.196
and make a plot like this:

Understanding scipy deconvolve

I'm trying to understand scipy.signal.deconvolve.
From the mathematical point of view a convolution is just the multiplication in fourier space so I would expect
that for two functions f and g:
Deconvolve(Convolve(f,g) , g) == f
In numpy/scipy this is either not the case or I'm missing an important point.
Although there are some questions related to deconvolve on SO already (like here and here) they do not address this point, others remain unclear (this) or unanswered (here). There are also two questions on SignalProcessing SE (this and this) the answers to which are not helpful in understanding how scipy's deconvolve function works.
The question would be:
How do you reconstruct the original signal f from a convoluted signal,
assuming you know the convolving function g.?
Or in other words: How does this pseudocode Deconvolve(Convolve(f,g) , g) == f translate into numpy / scipy?
Edit: Note that this question is not targeted at preventing numerical inaccuracies (although this is also an open question) but at understanding how convolve/deconvolve work together in scipy.
The following code tries to do that with a Heaviside function and a gaussian filter.
As can be seen in the image, the result of the deconvolution of the convolution is not at
all the original Heaviside function. I would be glad if someone could shed some light into this issue.
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
# Define heaviside function
H = lambda x: 0.5 * (np.sign(x) + 1.)
#define gaussian
gauss = lambda x, sig: np.exp(-( x/float(sig))**2 )
X = np.linspace(-5, 30, num=3501)
X2 = np.linspace(-5,5, num=1001)
# convolute a heaviside with a gaussian
H_c = np.convolve( H(X), gauss(X2, 1), mode="same" )
# deconvolute a the result
H_dc, er = scipy.signal.deconvolve(H_c, gauss(X2, 1) )
#### Plot ####
fig , ax = plt.subplots(nrows=4, figsize=(6,7))
ax[0].plot( H(X), color="#907700", label="Heaviside", lw=3 )
ax[1].plot( gauss(X2, 1), color="#907700", label="Gauss filter", lw=3 )
ax[2].plot( H_c/H_c.max(), color="#325cab", label="convoluted" , lw=3 )
ax[3].plot( H_dc, color="#ab4232", label="deconvoluted", lw=3 )
for i in range(len(ax)):
ax[i].set_xlim([0, len(X)])
ax[i].set_ylim([-0.07, 1.2])
ax[i].legend(loc=4)
plt.show()
Edit: Note that there is a matlab example, showing how to convolve/deconvolve a rectangular signal using
yc=conv(y,c,'full')./sum(c);
ydc=deconv(yc,c).*sum(c);
In the spirit of this question it would also help if someone was able to translate this example into python.

After some trial and error I found out how to interprete the results of scipy.signal.deconvolve() and I post my findings as an answer.
Let's start with a working example code
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
# let the signal be box-like
signal = np.repeat([0., 1., 0.], 100)
# and use a gaussian filter
# the filter should be shorter than the signal
# the filter should be such that it's much bigger then zero everywhere
gauss = np.exp(-( (np.linspace(0,50)-25.)/float(12))**2 )
print gauss.min() # = 0.013 >> 0
# calculate the convolution (np.convolve and scipy.signal.convolve identical)
# the keywordargument mode="same" ensures that the convolution spans the same
# shape as the input array.
#filtered = scipy.signal.convolve(signal, gauss, mode='same')
filtered = np.convolve(signal, gauss, mode='same')
deconv, _ = scipy.signal.deconvolve( filtered, gauss )
#the deconvolution has n = len(signal) - len(gauss) + 1 points
n = len(signal)-len(gauss)+1
# so we need to expand it by
s = (len(signal)-n)/2
#on both sides.
deconv_res = np.zeros(len(signal))
deconv_res[s:len(signal)-s-1] = deconv
deconv = deconv_res
# now deconv contains the deconvolution
# expanded to the original shape (filled with zeros)
#### Plot ####
fig , ax = plt.subplots(nrows=4, figsize=(6,7))
ax[0].plot(signal, color="#907700", label="original", lw=3 )
ax[1].plot(gauss, color="#68934e", label="gauss filter", lw=3 )
# we need to divide by the sum of the filter window to get the convolution normalized to 1
ax[2].plot(filtered/np.sum(gauss), color="#325cab", label="convoluted" , lw=3 )
ax[3].plot(deconv, color="#ab4232", label="deconvoluted", lw=3 )
for i in range(len(ax)):
ax[i].set_xlim([0, len(signal)])
ax[i].set_ylim([-0.07, 1.2])
ax[i].legend(loc=1, fontsize=11)
if i != len(ax)-1 :
ax[i].set_xticklabels([])
plt.savefig(__file__ + ".png")
plt.show()
This code produces the following image, showing exactly what we want (Deconvolve(Convolve(signal,gauss) , gauss) == signal)
Some important findings are:
The filter should be shorter than the signal
The filter should be much bigger than zero everywhere (here > 0.013 is good enough)
Using the keyword argument mode = 'same' to the convolution ensures that it lives on the same array shape as the signal.
The deconvolution has n = len(signal) - len(gauss) + 1 points.
So in order to let it also reside on the same original array shape we need to expand it by s = (len(signal)-n)/2 on both sides.
Of course, further findings, comments and suggestion to this question are still welcome.

As written in the comments, I cannot help with the example you posted originally. As #Stelios has pointed out, the deconvolution might not work out due to numerical issues.
I can, however, reproduce the example you posted in your Edit:
That is the code which is a direct translation from the matlab source code:
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
x = np.arange(0., 20.01, 0.01)
y = np.zeros(len(x))
y[900:1100] = 1.
y += 0.01 * np.random.randn(len(y))
c = np.exp(-(np.arange(len(y))) / 30.)
yc = scipy.signal.convolve(y, c, mode='full') / c.sum()
ydc, remainder = scipy.signal.deconvolve(yc, c)
ydc *= c.sum()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(4, 4))
ax[0][0].plot(x, y, label="original y", lw=3)
ax[0][1].plot(x, c, label="c", lw=3)
ax[1][0].plot(x[0:2000], yc[0:2000], label="yc", lw=3)
ax[1][1].plot(x, ydc, label="recovered y", lw=3)
plt.show()

Wrong Exponential Power Plot - How to improve curve fit

Unfortunately, the power fit with scipy does not return a good fit. I tried to use p0 as an input argument with close values which did not help.
I would be very glad if someone could point out to me my problem.
# Imports
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]
# Fill lists to store x and y value
x_data,y_data = [], []
for i in data:
x_data.append(i[0])
y_data.append(i[1])
# Exponential Function
def func(x,m,c):
return x**m * c
# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1]
# Plot function
x_function = np.linspace(0, 1.5, 100)
y = x_function**m * c
a = plt.scatter(x_data, y_data, s=30, marker = "v")
yfunction = x_function**m * c
plt.plot(x_function, yfunction, '-')
plt.show()
Another dataset for which the fit is really bad would be:
data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

I might miss something but I think the curve_fit just works fine. When I compare the residuals obtained by curve_fit to the ones one would obtain using the parameters obtained by excel which you provide in the comments, the python results always lead to lower residuals (code is provided below). You say "Unfortunately the power fit with scipy does not return a good fit." but what exactly is your measure for a "good fit"? The python fit seems always be better than the excel fit with respect to the residuals.
Not sure whether it has to be exactly this function but if not, you could also consider to add a third parameter to your function (below it is named "d") which will lead to better results.
Here is the modified code. I changed your "func" and also increased the resolution for the plot. Then the residuals are printed as well. For the first data set, one obtains for excel around 79.35 and with python around 34.29. For the second data set it is 15220.79 with excel and 601.08 with python (assuming I did not mess anything up).
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]
#data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]
# Fill lists to store x and y value
x_data,y_data = [], []
for i in data:
x_data.append(i[0])
y_data.append(i[1])
# Exponential Function
def func(x,m,c):
#slightly rewritten; you could also consider using a third parameter d
return c*np.power(x,m) # + d
# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1] #, coeff[2]
print m, c #, d
# Plot function
a = plt.scatter(x_data, y_data, s=30, marker = "v")
x_function = np.linspace(0, 1.5, 1000)
yfunction = c*np.power(x_function,m) # + d
plt.plot(x_function, yfunction, '-')
plt.show()
print "residuals python:",((y_data - func(x_data, *coeff))**2).sum()
#compare to excel, first data set
print "residuals excel:",((y_data - func(x_data, -0.425,7.027))**2).sum()
#compare to excel, second data set
print "residuals excel:",((y_data - func(x_data, -0.841,1.0823))**2).sum()

Taking your second dataset as an example: If you plot the raw data, a difficulty with the data becomes obvious: your data are very non-uniform. Now, since your function has a pure power law form, it's easiest to do the fitting in log scale:
In [1]: import numpy as np
In [2]: import matplotlib.pyplot as plt
In [3]: plt.ion()
In [4]: data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]
In [5]: data = np.asarray(data) # just for convenience
In [6]: data.shape
Out[6]: (10, 2)
In [7]: x, y = data[:, 0], data[:, 1]
In [8]: lx, ly = np.log(x), np.log(y)
In [9]: plt.plot(lx, ly, 'ro')
Out[9]: [<matplotlib.lines.Line2D at 0x323a250>]
In [10]: def lfunc(x, a, b):
....: return a*x + b
....:
In [11]: from scipy.optimize import curve_fit
In [12]: opt, cov = curve_fit(lfunc, lx, ly)
In [13]: opt
Out[13]: array([-0.84071518, 0.07906558])
In [14]: plt.plot(lx, lfunc(lx, *opt), 'b-')
Out[14]: [<matplotlib.lines.Line2D at 0x3be0f90>]
Whether this is an adequate model for the data is a separate concern.

curve fitting with a known function numpy

I have a x and y one-dimension numpy array and I would like to reproduce y with a known function to obtain "beta". Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y = array([ 0.04022493, 0.04287536, 0.03983657, 0.0393201 , 0.03810298,
0.0363814 , 0.0331144 , 0.03074823, 0.02795767, 0.02413816,
0.02180802, 0.01861309, 0.01632699, 0.01368056, 0.01124232,
0.01005323, 0.00867196, 0.00940864, 0.00961282, 0.00892419,
0.01048963, 0.01199101, 0.01533408, 0.01855704, 0.02163586,
0.02630014, 0.02971127, 0.03511223, 0.03941218, 0.04280329,
0.04689105, 0.04960554, 0.05232003, 0.05487037, 0.05843364,
0.05120701])
x= array([ 0., 0.08975979, 0.17951958, 0.26927937, 0.35903916,
0.44879895, 0.53855874, 0.62831853, 0.71807832, 0.80783811,
0.8975979 , 0.98735769, 1.07711748, 1.16687727, 1.25663706,
1.34639685, 1.43615664, 1.52591643, 1.61567622, 1.70543601,
1.7951958 , 1.88495559, 1.97471538, 2.06447517, 2.15423496,
2.24399475, 2.33375454, 2.42351433, 2.51327412, 2.60303391,
2.6927937 , 2.78255349, 2.87231328, 2.96207307, 3.05183286,
3.14159265])
def func(x,beta):
return 1.0/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))
guesses = [20]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = 1/(4.0*np.pi)*(1+popt[0]*(3.0/2*np.cos(x)**2-1.0/2))
plt.figure(1)
plt.plot(x,y,'ro',x,y_fit,'k-')
plt.show()
The code works but the fitting is completely off (see picture). Any idea why?
It looks like the formula to use contains an additional parameter, i.e. p
def func(x,beta,p):
return p/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))
guesses = [20,5]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = func(angle_plot,*popt)
plt.figure(2)
plt.plot(x,y,'ro',x,y_fit,'k-')
plt.show()
print popt # [ 1.23341604 0.27362069]
In the popt which one is beta and which one is p?

This is perhaps not what you want but, if you are just trying to get a good fit to the data, you could use np.polyfit:
fit = np.polyfit(x,y,4)
fit_fn = np.poly1d(fit)
plt.scatter(x,y,label='data',color='r')
plt.plot(x,fit_fn(x),color='b',label='fit')
plt.legend(loc='upper left')
Note that fit gives the coefficient values of, in this case, a 4th order polynomial:
>>> fit
array([-0.00877534, 0.05561778, -0.09494909, 0.02634183, 0.03936857])

This is going to be as good as you can get (assuming you get the equation right as #mdurant suggested), an additional intercept term is required to further improve the fit:
def func(x,beta, icpt):
return 1.0/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))+icpt
guesses = [20, 0]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = func(x, *popt)
plt.figure(1)
plt.plot(x,y,'ro', x,y_fit,'k-')
print popt #[ 0.33748816 -0.05780343]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

create a function for chi_square - python

Related

Cannot sample values of a trigonometric function

Confine a gaussian fit with curve_fit

Understanding scipy deconvolve

Wrong Exponential Power Plot - How to improve curve fit

curve fitting with a known function numpy

Categories

Resources