In the framework of my bachelor's thesis, I need to evaluate my data with Python. Unfortunately, none of my fellow students has a suitable script yet, and I'm quite new to programming.
I have this data set and I'm trying to fit it with a Gaussian using scipy.optimize.curve_fit. Since there are a lot of unusable counts, especially at the end of the axis, I'd like to restrict the fit to part of the data.
Picture raw data
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x=np.arange(5120)
y=np.array([ 0.81434599, 1.17054264, 0.85279188, ..., 1. ,
1. , 13.56291391]) #most of the data isn't interesting
#to me, part of interest see below
def Gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma],
maxfev=360000)
plt.plot(x,y,label='data')
plt.plot(x,Gauss(x, *popt), 'r-',label='fit')
On docs.scipy.org I've found a general description about curve_fit
If I try using
bounds=([2400,-np.inf, -np.inf],[2600, np.inf, np.inf]),
I'm getting the ValueError: x0 is infeasible. What is the problem here?
I also tried to confine it with
popt,pcov = curve_fit(Gauss, x[2400:2600], y[2400:2600], p0=[max(y), mean, sigma], maxfev=360000)
as suggested in a comment on this question: "Error when obtaining gaussian fit for graph" at stackoverflow
In this case I only get a straight line though.
Picture: Confinement with x[2400:2600],y[2400:2600] as arguments of curve_fit
I really hope you can help me out here. I only need a way to fit a small part of my data. Thanks in advance!
interesting data:
y=np.array([ 0.93396226, 1.00884956, 1.15457413, 1.07590759,
0.88915094, 1.07142857, 1.10714286, 1.14171123, 1.06666667,
0.84975369, 0.95480226, 0.99388379, 1.01675978, 0.83967391,
0.9771987 , 1.02402402, 1.04531722, 1.07492795, 0.97135417,
0.99714286, 1.0248139 , 1.26223776, 1.1533101 , 0.99099099,
1.18867925, 1.15772871, 0.95076923, 1.03313253, 1.02278481,
0.93265993, 1.06705539, 1.00265252, 1.02023121, 0.92076503,
0.99728997, 1.03353659, 1.15116279, 1.04336043, 0.95076923,
1.05515588, 0.92571429, 0.93448276, 1.02702703, 0.90056818,
0.96068796, 1.08493151, 1.13584906, 1.1212938 , 1.0739645 ,
0.98972603, 0.94594595, 1.07913669, 0.98425197, 0.87762238,
0.96811594, 1.02710843, 0.99392097, 0.91384615, 1.09809264,
1.00630915, 0.93175074, 0.87572254, 1.00651466, 0.78772379,
1.12244898, 1.2248062 , 0.97109827, 0.94607843, 0.97900262,
0.97527473, 1.01212121, 1.16422287, 1.20634921, 0.97275204,
1.01090909, 0.99404762, 1.00561798, 1.01146132, 1.08695652,
0.97214485, 1.03525641, 0.99096386, 1.05135952, 1.16451613,
0.90462428, 0.76876877, 0.47701149, 0.27607362, 0.21580547,
0.20598007, 0.16766467, 0.15533981, 0.19745223, 0.15407855,
0.18925831, 0.26997245, 0.47603834, 0.596875 , 0.85126582,
0.96 , 1.06578947, 1.08761329, 0.89548023, 0.99705882, 1.07142857,
0.95677233, 0.86119874, 1.02857143, 0.98250729, 0.94214876,
1.04166667, 0.96024465, 1.07022472, 1.10344828, 1.04859335,
0.96655518, 1.06424581, 1.01754386, 1.03492063, 1.18627451,
0.91036415, 1.03355705, 1.09116809, 0.96083551, 1.01298701,
1.03691275, 1.02923977, 1.11612903, 1.01457726, 1.06285714,
0.98186528, 1.16470588, 0.86645963, 1.07317073, 1.09615385,
1.21192053, 0.94385027, 0.94244604, 0.88390501, 0.95718654,
0.9691358 , 1.01729107, 1.01119403, 1.20350877, 1.12890625,
1.06940063, 0.90410959, 1.14662757, 0.97093023, 1.03021148,
1.10629921, 0.97118156, 1.10693642, 1.07917889, 0.9484127 ,
1.07581227, 0.98006645, 0.98986486, 0.90066225, 0.90066225,
0.86779661, 0.86779661, 0.96996997, 1.01438849, 0.91186441,
0.91290323, 1.03745318, 1.0615942 , 0.97202797, 1.16608997,
0.94182825, 1.08333333, 0.9076087 , 1.18181818, 1.20618557,
1.01273885, 0.93606138, 0.87457627, 0.90575916, 1.09756098,
0.99115044, 1.13380282, 1.04333333, 1.04026846, 1.0297619 ,
1.04334365, 1.03395062, 0.92553191, 0.98198198, 1. ,
0.9439528 , 1.02684564, 1.1372549 , 0.96676737, 0.99649123,
1.07051282, 1.10367893, 1.0866426 , 1.15384615, 0.99667774])
You might find the lmfit module (https://lmfit.github.io/lmfit-py/) useful for this. It is designed to make curve fitting very easy, has built-in models for common peaks like Gaussian, and has many useful features such as allowing you to set bounds on parameters. A fit to your data with lmfit might look like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel, ConstantModel
y = np.array([.....]) # uses your shorter data range
x = np.arange(len(y))
# make a model that is a Gaussian + a constant:
model = GaussianModel(prefix='peak_') + ConstantModel()
# make parameters with starting values:
params = model.make_params(c=1.0, peak_center=90,
peak_sigma=5, peak_amplitude=-5)
# it's not really needed for this data, but you can put bounds on
# parameters like this (or set .vary=False to fix a parameter)
params['peak_sigma'].min = 0 # sigma > 0
params['peak_amplitude'].max = 0 # amplitude < 0
params['peak_center'].min = 80
params['peak_center'].max = 100
# run fit
result = model.fit(y, params, x=x)
# print, plot results
print(result.fit_report())
plt.plot(x, y)
plt.plot(x, result.best_fit)
plt.show()
This will print out
[[Model]]
(Model(gaussian, prefix='peak_') + Model(constant))
[[Fit Statistics]]
# function evals = 54
# data points = 200
# variables = 4
chi-square = 1.616
reduced chi-square = 0.008
Akaike info crit = -955.625
Bayesian info crit = -942.432
[[Variables]]
peak_sigma: 4.03660814 +/- 0.204240 (5.06%) (init= 5)
peak_center: 91.2246614 +/- 0.200267 (0.22%) (init= 90)
peak_amplitude: -9.79111362 +/- 0.445273 (4.55%) (init=-5)
c: 1.02138228 +/- 0.006796 (0.67%) (init= 1)
peak_fwhm: 9.50548558 +/- 0.480950 (5.06%) == '2.3548200*peak_sigma'
peak_height: -0.96766623 +/- 0.041854 (4.33%) == '0.3989423*peak_amplitude/max(1.e-15, peak_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(peak_sigma, peak_amplitude) = -0.599
C(peak_amplitude, c) = -0.328
C(peak_sigma, c) = 0.196
and make a plot like this:
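If you would rather stay with scipy.optimize.curve_fit, the same model (a Gaussian plus a constant) might be sketched as below. Two details matter here: each entry in bounds lines up with the model's parameters in order, so limits on the centre belong in the second slot rather than the first, and p0 has to lie inside the bounds, otherwise curve_fit raises "x0 is infeasible". The data is synthetic, and the names gauss_plus_const, lower and upper are made up for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss_plus_const(x, a, x0, sigma, c):
    # Gaussian of height a (negative for a dip) sitting on a constant baseline c
    return c + a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# synthetic stand-in for the 200-point data range (assumed values, not the real data)
rng = np.random.default_rng(0)
x = np.arange(200)
y = gauss_plus_const(x, -0.85, 91.0, 4.0, 1.0) + rng.normal(0, 0.03, x.size)

# initial guess and bounds, both in parameter order: a, x0, sigma, c
p0 = [-0.5, 90.0, 5.0, 1.0]
lower = [-np.inf, 80.0, 0.5, -np.inf]
upper = [0.0, 100.0, np.inf, np.inf]

popt, pcov = curve_fit(gauss_plus_const, x, y, p0=p0, bounds=(lower, upper))
print(dict(zip(["a", "x0", "sigma", "c"], popt)))
```

Slicing x and y to a window before the call works the same way, as long as p0 is computed from the sliced arrays rather than from the full data.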
I have been struggling for apparently no reason trying to fit a sin function to a small dataset that resembles a sinusoid. I've looked at many other questions and tried different libraries and can't seem to find any glaring mistake in my code. Also in many answers people are fitting a function onto data where y = f(x); but I'm retrieving both of my lists independently from stellar spectra.
These are the lists for reference:
time = np.array([2454294.5084288 , 2454298.37039515, 2454298.6022165 ,
2454299.34790096, 2454299.60750029, 2454300.35176022,
2454300.61361622, 2454301.36130122, 2454301.57111912,
2454301.57540159, 2454301.57978822, 2454301.5842906 ,
2454301.58873511, 2454302.38635047, 2454302.59553152,
2454303.41548415, 2454303.56765036, 2454303.61479213,
2454304.38528718, 2454305.54043812, 2454306.36761011,
2454306.58025083, 2454306.60772791, 2454307.36686591,
2454307.49460991, 2454307.58258509, 2454308.3698358 ,
2454308.59468672, 2454309.40004997, 2454309.51208756,
2454310.43078368, 2454310.6091061 , 2454311.40121502,
2454311.5702085 , 2454312.39758274, 2454312.54580053,
2454313.52984047, 2454313.61734047, 2454314.37609003,
2454315.56721061, 2454316.39218499, 2454316.5672538 ,
2454317.49410168, 2454317.6280825 , 2454318.32944441,
2454318.56913047])
velocities = np.array([-2.08468951, -2.26117398, -2.44703149, -2.10149768, -2.09835213,
-2.20540079, -2.4221183 , -2.1394637 , -2.0841663 , -2.2458154 ,
-2.06177386, -2.47993416, -2.13462117, -2.26602791, -2.47359571,
-2.19834895, -2.17976339, -2.37745005, -2.48849617, -2.15875901,
-2.27674409, -2.39054554, -2.34029665, -2.09267843, -2.20338104,
-2.49483926, -2.08860222, -2.26816951, -2.08516229, -2.34925637,
-2.09381667, -2.21849357, -2.43438148, -2.28439031, -2.43506056,
-2.16953358, -2.24405359, -2.10093237, -2.33155007, -2.37739938,
-2.42468714, -2.19635302, -2.368558 , -2.45959665, -2.13392004,
-2.25268181])
These are radial velocities of a star observed at different times. When plotted they look like this:
Plotted Data
This is then the code I'm using to fit a test sine on the data:
x = time
y = velocities
def sin_fit(x, A, w):
    return A * np.sin(w * x)
popt, pcov = curve_fit(sin_fit,x,y) #try to calculate exoplanet parameters with these data
xfit = np.arange(min(x),max(x),0.1)
fit = sin_fit(xfit,*popt)
mod = plt.figure()
plt.xlabel("Time (G. Days)")
plt.ylabel("Radial Velocity")
plt.scatter(x,[i for i in y],color="b",label="Data")
plt.plot(x,[i for i in y],color="b",alpha=0.2)
plt.plot(xfit,fit,color="r",label="Model Fit")
plt.legend()
mod.savefig("Data with sin fit.png")
plt.show()
I thought this was right, and it seems right by looking at other answers, but then this is what I get:
Data with model sine
What am I doing wrong?
Thank you in advance.
I suspect the problem is that sin_fit cannot describe the data at all: a bare sine oscillates around y=0, while your data oscillates around roughly y=-2.3.
I tried your code and extended sin_fit with an offset, which gives much better results (although still not perfect):
def sin_fit(x, A, w, offset):
    return A * np.sin(w * x) + offset
With this, the function at least has a chance to fit.
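Going one step further, a sine fit usually also needs a phase term and a reasonable starting frequency, and it tends to behave better when the large Julian dates are shifted to start at zero. The following is only a sketch on synthetic data: the phase parameter, the 3.5-day period and the value of period_guess are assumptions made for the illustration, not properties of the real measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def sin_fit(t, A, w, phase, offset):
    # sine with free amplitude, angular frequency, phase and vertical offset
    return A * np.sin(w * t + phase) + offset

# synthetic radial velocities (assumed 3.5-day period, made up for this example)
rng = np.random.default_rng(1)
time = 2454294.5 + np.sort(rng.uniform(0.0, 24.0, 46))
true_period = 3.5
velocities = (0.2 * np.sin(2 * np.pi / true_period * (time - time[0]) + 0.5)
              - 2.27 + rng.normal(0, 0.02, time.size))

# shift the time axis so that w * t does not involve values of order 1e6
t = time - time[0]

# starting values: half the peak-to-peak range, a rough period, zero phase, the mean
period_guess = 3.4  # hypothetical; in practice taken from a periodogram or by eye
p0 = [(velocities.max() - velocities.min()) / 2,
      2 * np.pi / period_guess, 0.0, velocities.mean()]

popt, pcov = curve_fit(sin_fit, t, velocities, p0=p0)
A, w, phase, offset = popt
print("period:", 2 * np.pi / w, "offset:", offset)
```

Without a frequency guess that is already close, curve_fit can settle on a nearly flat line, because the sum of squares has many local minima along the frequency axis.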
I am having issues fitting a Gaussian to my data. Currently the output of my code looks like this, where orange is the data, blue is my Gaussian fit and green is the output of a built-in Gaussian fitter. I do not wish to use the built-in fitter because its curve never quite starts at zero and I do not have access to its code. I would like my output to look something like this, where the curve drawn in red is the Gaussian fit.
I have read the curve_fit documentation, but at best I get a fit like this, which runs over all of the data. That is undesirable because I am only interested in the central peak. My main issue is that I do not know how to get curve_fit to fit a Gaussian to the central peak only, as in the second image.
I have considered using a weights function like np.random.choice(), or taking the data file's maximum value and then looking at the second derivative on either side of the central peak to find the inflection points, but I am unsure how best to implement this.
How would I best go about this? I have done a lot of googling but can't quite get my head around changing curve_fit to suit my needs.
Cheers for any pointers!
This is a data file.
https://drive.google.com/file/d/1qrAkD74U6L46GoGnvMiUHdPuLEToS6Pv/view?usp=sharing
This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure
plt.close('all')
fpathB4 = r'E:\.1. Work - Current Projects + Old Projects\Current Projects\PF 4MHz Laser System\.8. 1050 SYSTEM\AC traces'
fpath = fpathB4.replace('\\','/') + ('/')
filename = '300'
with open(fpath+filename) as f:
    dataraw = f.readlines()
FWHM = dataraw[8].split(':')[1].split()[0]
FWHM = float(FWHM)
print("##### For AC file -", filename, "#####")
print("Auto-co guess -", FWHM, "ps")
pulseduration = FWHM/np.sqrt(2)
pulseduration = str(pulseduration)
dataraw = dataraw[15:]
print("Pulse duration -", pulseduration, "ps" + "\n")
time = np.array([])
acf1 = np.array([]) #### DATA
fit = np.array([]) #### Gaussian fit
for k in dataraw:
    data = k.split()
    time = np.append(time, float(data[0]))
    acf1 = np.append(acf1, float(data[1]))
    fit = np.append(fit, float(data[2]))
n = len(time)
y = acf1.copy()
x = time.copy()
mean = sum(x*y)/n
sigma = sum(y*(x-mean)**2)/n
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,gaus(x,*popt)/np.max(gaus(x,*popt)))
figure(num=1, figsize=(8, 3), dpi=96, facecolor='w', edgecolor='k') # figsize = (length, height)
plt.plot(time, acf1/np.max(acf1), label = 'Data - ' + filename, linewidth = 1)
plt.plot(time, fit/np.max(fit), label = r'$FWHM_{{\Delta t}}$ (ps) = ' + pulseduration)
plt.autoscale(enable = True, axis = 'x', tight = True)
plt.title("Auto-Correlation Data")
plt.xlabel("Time (ps)")
plt.ylabel("Intensity (a.u.)")
plt.legend()
I think the problem might be that the data are not completely Gaussian-like. It seems you have some kind of Airy/sinc function due to the time resolution of your acquisition instrument. Still, if you are only interested in the center you can still fit it using a single gaussian:
import numpy as np
import fitwrap as fw
import pandas as pd
df = pd.read_csv('300', skip_blank_lines=True, skiprows=13, sep=r'\s+')
def gaussian_no_offset(x, x0=2, sigma=1, amp=300):
    return amp*np.exp(-(x-x0)**2/sigma**2)
fw.fit(gaussian_no_offset, df.time, df.acf1)
x0: 2.59158 +/- 0.00828 (0.3%) initial:2
sigma: 0.373 +/- 0.0117 (3.1%) initial:1
amp: 355.02 +/- 9.65 (2.7%) initial:300
If you want to be slightly more precise I can think of a sinc squared function for the peak and a broad gaussian offset. The fit seems nicer, but it really depends on what the data actually represents...
def sinc(x, x0=2.5, amp=300, width=1, amp_g=20, sigma=3):
    return amp*(np.sinc((x-x0)/width))**2 + amp_g*np.exp(-(x-x0)**2/sigma**2)
fw.fit(sinc, df.time, df.acf1)
x0: 2.58884 +/- 0.0021 (0.1%) initial:2.5
amp: 303.84 +/- 3.7 (1.2%) initial:300
width: 0.49211 +/- 0.00565 (1.1%) initial:1
amp_g: 81.32 +/- 2.11 (2.6%) initial:20
sigma: 1.512 +/- 0.0351 (2.3%) initial:3
I'd add a constant offset to the Gaussian and limit its range through the bounds parameter of curve_fit, so that the fitted baseline cannot be pushed up.
So your function would be (curve_fit expects the independent variable x as the first argument, followed by the parameters):
def gaus(x,y0,a,x0,sigma):
    return y0 + a*np.exp(-(x-x0)**2/(2*sigma**2))
and the curve_fit bounds would be something like this:
curve_fit(..... ,bounds = [[0,a_min, x0_min, sigma_min],[0.1, a_max, x0_max, sigma_max]])
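To fit only the central peak, one option is to cut a window around the maximum and pass just that slice to curve_fit, together with bounds in parameter order. The sketch below uses synthetic data (a narrow peak on a broad pedestal) in place of the AC trace; half_width and all numeric values are assumptions chosen for the illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus(x, y0, a, x0, sigma):
    # Gaussian of height a on a constant offset y0
    return y0 + a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# synthetic trace: narrow central peak plus a broad pedestal plus noise
rng = np.random.default_rng(2)
x = np.linspace(-10.0, 15.0, 501)
y = (300.0 * np.exp(-(x - 2.6) ** 2 / (2 * 0.3 ** 2))
     + 60.0 * np.exp(-(x - 2.6) ** 2 / (2 * 3.0 ** 2))
     + rng.normal(0, 2.0, x.size))

# keep only the points within half_width of the highest sample
half_width = 1.0  # hypothetical window size, to be adapted to the real peak
x_peak = x[np.argmax(y)]
mask = np.abs(x - x_peak) <= half_width
xw, yw = x[mask], y[mask]

# starting values and bounds, ordered as y0, a, x0, sigma
p0 = [yw.min(), yw.max() - yw.min(), x_peak, half_width / 2]
lower = [0.0, 0.0, xw.min(), 1e-3]
upper = [yw.max(), np.inf, xw.max(), 2 * half_width]

popt, pcov = curve_fit(gaus, xw, yw, p0=p0, bounds=(lower, upper))
print("centre:", popt[2], "sigma:", popt[3])
```

The fitted curve can then be drawn over the full x range with gaus(x, *popt), even though only the window was used for the fit.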
There is an equation of exponential truncated power law in the article below:
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.
like this:
P(rg) = (rg + rg0)^(-beta) * exp(-rg / K)
It is an exponentially truncated power law. There are three parameters to be estimated: rg0, beta and K. We have the radius of gyration (rg) of several users and have uploaded the values to GitHub: radius of gyrations.txt
The following code can be used to read the data and calculate P(rg):
import numpy as np
# read radius of gyration from file
rg = []
with open('/path-to-the-data/radius of gyrations.txt', 'r') as f:
    for i in f:
        rg.append(float(i.strip('\n')))
# calculate P(rg)
rg = sorted(rg, reverse=True)
rg = np.array(rg)
prg = np.arange(len(rg)) / float(len(rg)-1)
or you can directly get rg and prg data as the following:
rg = np.array([ 20.7863444 , 9.40547933, 8.70934714, 8.62690145,
7.16978087, 7.02575052, 6.45280959, 6.44755478,
5.16630287, 5.16092884, 5.15618737, 5.05610068,
4.87023561, 4.66753197, 4.41807645, 4.2635671 ,
3.54454372, 2.7087178 , 2.39016885, 1.9483156 ,
1.78393238, 1.75432688, 1.12789787, 1.02098332,
0.92653501, 0.32586582, 0.1514813 , 0.09722761,
0. , 0. ])
prg = np.array([ 0. , 0.03448276, 0.06896552, 0.10344828, 0.13793103,
0.17241379, 0.20689655, 0.24137931, 0.27586207, 0.31034483,
0.34482759, 0.37931034, 0.4137931 , 0.44827586, 0.48275862,
0.51724138, 0.55172414, 0.5862069 , 0.62068966, 0.65517241,
0.68965517, 0.72413793, 0.75862069, 0.79310345, 0.82758621,
0.86206897, 0.89655172, 0.93103448, 0.96551724, 1. ])
I can plot the P(r_g) and r_g using the following python script:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(rg, prg, 'bs', alpha = 0.3)
# roughly estimated params:
# rg0=1.8, beta=0.15, K=5
plt.plot(rg, (rg+1.8)**-.15*np.exp(-rg/5))
plt.yscale('log')
plt.xscale('log')
plt.xlabel('$r_g$', fontsize = 20)
plt.ylabel('$P(r_g)$', fontsize = 20)
plt.show()
How can I use these data of rgs to estimate the three parameters above? I hope to solve it using python.
According to @Michael's suggestion, we can solve the problem using scipy.optimize.curve_fit:
def func(rg, rg0, beta, K):
    return (rg + rg0) ** (-beta) * np.exp(-rg / K)
from scipy import optimize
popt, pcov = optimize.curve_fit(func, rg, prg, p0=[1.8, 0.15, 5])
print(popt)
print(pcov)
The results are given below:
[ 1.04303608e+03 3.02058550e-03 4.85784945e+00]
[[ 1.38243336e+18 -6.14278286e+11 -1.14784675e+11]
[ -6.14278286e+11 2.72951900e+05 5.10040746e+04]
[ -1.14784675e+11 5.10040746e+04 9.53072925e+03]]
Then we can inspect the results by plotting the fitted curve.
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(rg, prg, 'bs', alpha = 0.3)
plt.plot(rg, (rg+popt[0])**-(popt[1])*np.exp(-rg/popt[2]) )
plt.yscale('log')
plt.xscale('log')
plt.xlabel('$r_g$', fontsize = 20)
plt.ylabel('$P(r_g)$', fontsize = 20)
plt.show()
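The very large entries in pcov above hint that rg0 and beta are poorly constrained by this fit. One common workaround, which is not what the code above does, is to fit the logarithm of the model and to keep the parameters positive with bounds. The sketch below uses synthetic data generated from the same functional form; log_func, the sample grid and the "true" values are assumptions for the illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_func(rg, rg0, beta, K):
    # logarithm of (rg + rg0)**(-beta) * exp(-rg / K)
    return -beta * np.log(rg + rg0) - rg / K

# synthetic data over three decades with small noise in log space
rng = np.random.default_rng(3)
rg = np.logspace(0, 3, 60)
log_p = log_func(rg, 5.8, 1.65, 350.0) + rng.normal(0, 0.02, rg.size)

# starting values and bounds in parameter order: rg0, beta, K
p0 = [3.0, 1.0, 200.0]
lower = [1e-6, 0.0, 1e-6]
upper = [np.inf, np.inf, np.inf]

popt, pcov = curve_fit(log_func, rg, log_p, p0=p0, bounds=(lower, upper))
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties from the covariance diagonal
rms = np.sqrt(np.mean((log_func(rg, *popt) - log_p) ** 2))
print(popt, perr, rms)
```

Fitting in log space weights every decade of rg similarly, whereas a fit on the raw values is dominated by the few largest P(rg) values. Points with P(rg) = 0 would have to be dropped before taking the logarithm.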
I'm quite new to Python and the lmfit module and am having some trouble. I want to fit a peak function (something like a Gaussian or Voigtian profile) to my experimental data, but it never gives me good results. Its best fit is a linear function, which roughly describes the baseline of my peak profile.
The x data for the fitting process are simply numbers running from 0 to 100. Here are my y data:
array([ 0.99518284, 0.99449661, 0.99609029, 0.996 , 0.994307 ,
0.999693 , 0.99826185, 0.99680361, 0.99474041, 0.99793228,
0.99385553, 0.99869526, 1.00044695, 0.99625734, 0.99758916,
0.99489842, 1.00032957, 0.9967088 , 0.99655982, 0.99990068,
0.99515576, 0.99665914, 0.99990068, 0.99595034, 0.99792777,
0.9941851 , 0.99458691, 0.99312415, 0.99815801, 0.99851919,
0.99637472, 0.996 , 0.99632957, 0.99185102, 0.99173363,
0.9915395 , 0.99038826, 0.9917246 , 0.99315124, 0.98968397,
0.99120993, 0.98981038, 0.9892009 , 0.99009932, 0.98853725,
0.98624379, 0.98620316, 0.9826772 , 0.99204966, 0.98455982,
0.99049661, 0.98591422, 0.98906546, 0.98664108, 0.98740858,
0.99076298, 0.99046953, 0.99067269, 0.99255982, 0.99264108,
0.99215801, 0.99990068, 0.9948623 , 0.99616704, 0.99307449,
0.99626637, 0.9934447 , 0.99476749, 0.99636117, 0.99840181,
0.9984921 , 0.99782844, 0.99853273, 0.99575621, 0.9985553 ,
0.99936343, 0.99643792, 0.99825734, 0.9964605 , 0.99879007,
1.00068172, 0.99580135, 0.99898871, 1.00069074, 0.99920993,
0.9963702 , 0.99591874, 0.99730023, 0.99765237, 0.99334537,
0.99798194, 0.99770655, 0.99702935, 0.99716027, 0.99662754,
0.99779684, 0.9967088 , 0.99736343, 0.99786907, 0.9968623 ,
0.99961174])
I tried the following approaches with different model functions (Gaussian, Voigtian and PseudoVoigtian):
>>> from lmfit.models import PseudoVoigtModel
>>> mod = PseudoVoigtModel()
>>> pars = mod.guess(y, x=x)
>>> out = mod.fit(y, pars, x=x)
>>> print(out.fit_report(min_correl=0.25))
>>> out.plot()
The exact same code works very well for a profile test function, which I created, so I guess there is nothing wrong with it. But for the real measurement data, it always gives a linear function, no matter which profile model I choose. Here is an example:
>>> out.best_fit
array([ 0.99410398, 0.99412124, 0.99413851, 0.99415577, 0.99417303,
0.99419029, 0.99420755, 0.99422481, 0.99424207, 0.99425932,
0.99427658, 0.99429383, 0.99431108, 0.99432833, 0.99434558,
0.99436283, 0.99438007, 0.99439732, 0.99441456, 0.9944318 ,
0.99444904, 0.99446628, 0.99448351, 0.99450075, 0.99451798,
0.99453522, 0.99455245, 0.99456968, 0.99458691, 0.99460413,
0.99462136, 0.99463858, 0.99465581, 0.99467303, 0.99469025,
0.99470747, 0.99472468, 0.9947419 , 0.99475912, 0.99477633,
0.99479354, 0.99481075, 0.99482796, 0.99484517, 0.99486237,
0.99487958, 0.99489678, 0.99491398, 0.99493118, 0.99494838,
0.99496558, 0.99498278, 0.99499997, 0.99501716, 0.99503436,
0.99505155, 0.99506874, 0.99508592, 0.99510311, 0.9951203 ,
0.99513748, 0.99515466, 0.99517184, 0.99518902, 0.9952062 ,
0.99522338, 0.99524055, 0.99525772, 0.9952749 , 0.99529207,
0.99530924, 0.9953264 , 0.99534357, 0.99536074, 0.9953779 ,
0.99539506, 0.99541222, 0.99542938, 0.99544654, 0.9954637 ,
0.99548085, 0.99549801, 0.99551516, 0.99553231, 0.99554946,
0.99556661, 0.99558376, 0.9956009 , 0.99561805, 0.99563519,
0.99565233, 0.99566947, 0.99568661, 0.99570375, 0.99572088,
0.99573802, 0.99575515, 0.99577228, 0.99578941, 0.99580654,
0.99582367])
I used the following approach for another try, but here, it didn't fit something at all and I only got nan values back, although it works fine for my Gaussian test function:
from lmfit.models import GaussianModel
from lmfit import Model
import numpy as np
def gaussian(x, amp, cen, wid):
    "1-d gaussian: gaussian(x, amp, cen, wid)"
    return (amp/(np.sqrt(2*np.pi)*wid)) * np.exp(-(x-cen)**2 /(2*wid**2))
gmod = Model(gaussian)
gmod.set_param_hint('cen', value=47)
gmod.set_param_hint('wid', value=20)
gmod.set_param_hint('amp', value=0.2)
pars = gmod.make_params()
out = gmod.fit(normedy, pars, x=x)
print(out.fit_report(min_correl=0.1))
plt.figure(5, figsize=(8,8))
out.plot_fit()
I tried to fit the data with Origin and it definitely works (so the data are not 'unfittable'), but how can I do it properly with Python? Do you know any other approaches I can try, or things I can initialise, to make it work?
A PseudoVoigt function (or Voigt or Gaussian or Lorentzian) goes to 0 at +/- infinity. Your data looks to go to ~1.0, with a dip around x=50.
You almost certainly want to add either a linear or constant component to the model. For a linear component, try:
from lmfit.models import PseudoVoigtModel, LinearModel
mod = PseudoVoigtModel()
pars = mod.guess(y, x=x)
mod = mod + LinearModel()
pars.add('intercept', value=1, vary=True)
pars.add('slope', value=0, vary=True)
out = mod.fit(y, pars, x=x)
print(out.fit_report(min_correl=0.25))
or for a constant, try:
from lmfit.models import PseudoVoigtModel, ConstantModel
mod = PseudoVoigtModel()
pars = mod.guess(y, x=x)
mod = mod + ConstantModel()
pars.add('c', value=1, vary=True)
out = mod.fit(y, pars, x=x)
print(out.fit_report(min_correl=0.25))
as a better model for this data.
Also, to get better initial values for the parameters, you might try:
mod = PseudoVoigtModel()
pars = mod.guess((1-y), x=x) # Note '1-y'
so that the curve being used for initial values is more like a positive peak. Of course, the sign of the amplitude will be wrong, but its magnitude will be close, and the starting center and width will be close to correct. That should make the fit more robust.
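The idea of guessing from the inverted curve can also be tried by hand, without lmfit. The sketch below uses plain NumPy on synthetic data: it flips the dip into a positive peak, keeps the samples above half maximum, and reads off a centre and a width. The helper guess_dip and the baseline of 1.0 are assumptions for this illustration; note that it returns a signed height, whereas lmfit's amplitude for a Gaussian is the area (height times sigma times sqrt(2*pi)).

```python
import numpy as np

def guess_dip(x, y, baseline=1.0):
    # turn the dip into a positive peak
    inverted = baseline - y
    height = inverted.max()
    # samples above half maximum define the core of the peak
    above = inverted > height / 2
    center = np.sum(x[above] * inverted[above]) / np.sum(inverted[above])
    dx = x[1] - x[0]  # assumes evenly spaced x
    fwhm = dx * np.count_nonzero(above)
    sigma = fwhm / 2.3548  # FWHM = 2.3548 * sigma for a Gaussian
    return -height, center, sigma  # negative height because the feature is a dip

# synthetic dip of depth 0.01 at x=47 on a baseline of 1.0
rng = np.random.default_rng(4)
x = np.arange(101, dtype=float)
y = (1.0 - 0.01 * np.exp(-(x - 47.0) ** 2 / (2 * 8.0 ** 2))
     + rng.normal(0, 0.0005, x.size))

depth_guess, cen_guess, sig_guess = guess_dip(x, y)
print(depth_guess, cen_guess, sig_guess)
```

These numbers are only starting values; the actual fit still has to be run with a model that includes the baseline.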
I have two one-dimensional NumPy arrays, x and y, and I would like to reproduce y with a known function in order to obtain "beta". Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y = np.array([ 0.04022493, 0.04287536, 0.03983657, 0.0393201 , 0.03810298,
0.0363814 , 0.0331144 , 0.03074823, 0.02795767, 0.02413816,
0.02180802, 0.01861309, 0.01632699, 0.01368056, 0.01124232,
0.01005323, 0.00867196, 0.00940864, 0.00961282, 0.00892419,
0.01048963, 0.01199101, 0.01533408, 0.01855704, 0.02163586,
0.02630014, 0.02971127, 0.03511223, 0.03941218, 0.04280329,
0.04689105, 0.04960554, 0.05232003, 0.05487037, 0.05843364,
0.05120701])
x= np.array([ 0., 0.08975979, 0.17951958, 0.26927937, 0.35903916,
0.44879895, 0.53855874, 0.62831853, 0.71807832, 0.80783811,
0.8975979 , 0.98735769, 1.07711748, 1.16687727, 1.25663706,
1.34639685, 1.43615664, 1.52591643, 1.61567622, 1.70543601,
1.7951958 , 1.88495559, 1.97471538, 2.06447517, 2.15423496,
2.24399475, 2.33375454, 2.42351433, 2.51327412, 2.60303391,
2.6927937 , 2.78255349, 2.87231328, 2.96207307, 3.05183286,
3.14159265])
def func(x,beta):
    return 1.0/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))
guesses = [20]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = 1/(4.0*np.pi)*(1+popt[0]*(3.0/2*np.cos(x)**2-1.0/2))
plt.figure(1)
plt.plot(x,y,'ro',x,y_fit,'k-')
plt.show()
The code works but the fitting is completely off (see picture). Any idea why?
It looks like the formula to use contains an additional parameter, i.e. p
def func(x,beta,p):
    return p/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))
guesses = [20,5]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = func(x,*popt)
plt.figure(2)
plt.plot(x,y,'ro',x,y_fit,'k-')
plt.show()
print(popt) # [ 1.23341604 0.27362069]
In the popt which one is beta and which one is p?
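On the ordering: curve_fit returns popt in the same order as the model function's arguments after x, so for func(x, beta, p) the first entry is beta and the second is p. A minimal sketch on synthetic data (the values 1.2 and 0.3 are made up) that unpacks the result by name and reads the uncertainties from pcov:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, beta, p):
    return p / (4.0 * np.pi) * (1 + beta * (3.0 / 2 * np.cos(x) ** 2 - 1.0 / 2))

# synthetic data generated with known beta and p, plus a little noise
rng = np.random.default_rng(5)
x = np.linspace(0.0, np.pi, 36)
y = func(x, 1.2, 0.3) + rng.normal(0, 0.0005, x.size)

popt, pcov = curve_fit(func, x, y, p0=[1.0, 1.0])

# popt follows the argument order of func after x: (beta, p)
beta, p = popt
# the square roots of the diagonal of pcov are the one-sigma uncertainties
beta_err, p_err = np.sqrt(np.diag(pcov))
print("beta =", beta, "+/-", beta_err)
print("p    =", p, "+/-", p_err)
```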
This is perhaps not what you want but, if you are just trying to get a good fit to the data, you could use np.polyfit:
fit = np.polyfit(x,y,4)
fit_fn = np.poly1d(fit)
plt.scatter(x,y,label='data',color='r')
plt.plot(x,fit_fn(x),color='b',label='fit')
plt.legend(loc='upper left')
Note that fit gives the coefficient values of, in this case, a 4th order polynomial:
>>> fit
array([-0.00877534, 0.05561778, -0.09494909, 0.02634183, 0.03936857])
This is about as good as you can get (assuming you have the equation right, as @mdurant suggested). To improve the fit further, an additional intercept term is required:
def func(x,beta, icpt):
    return 1.0/(4.0*np.pi)*(1+beta*(3.0/2*np.cos(x)**2-1.0/2))+icpt
guesses = [20, 0]
popt,pcov = curve_fit(func,x,y,p0=guesses)
y_fit = func(x, *popt)
plt.figure(1)
plt.plot(x,y,'ro', x,y_fit,'k-')
print(popt) #[ 0.33748816 -0.05780343]