I am trying to fit Gaussian-shaped data to a specific three-term Gaussian (in which the amplitude in one term is equal to twice the standard deviation of the next term). Here is my attempt:
import numpy as np
#from scipy.optimize import curve_fit
import scipy.optimize as optimize
import matplotlib.pyplot as plt
#r=np.linspace(0.0e-15,4e-15, 100)
data = np.loadtxt('V_lambda_n.dat')
r = data[:, 0]
V = data[:, 1]
def func(x, ps1, ps2, ps3, ps4):
    return ps1*np.exp(-(x/ps2)**2) + ps2*np.exp(-(x/ps3)**2) + ps3*np.exp(-(x/ps4)**2)
popt, pcov = optimize.curve_fit(func, r, V, maxfev=10000)
#params = optimize.curve_fit(func, ps1, ps2, ps3, ps4)
#[ps1, ps2, ps2, ps4] = params[0]
p1=plt.plot(r, V, 'bo', label='data')
p2=plt.plot(r, func(r, *popt), 'r-', label='fit')
plt.xticks(np.linspace(0, 4, 9, endpoint=True))
plt.yticks(np.linspace(-50, 150, 9, endpoint=True))
plt.show()
Here is the result:
How may I fix this code to improve the fit? Thanks
With help from the scipy-user forum, I tried the following initial guess:
p0=[V.max(), std_dev, V.max(), 2]
The fit got a lot better. The new fit is as shown
I hope the fit could get better than this.
Please tell me how to determine the unknown parameters of a calculated curve using scipy optimization, given an experimental curve as input. I need to determine the unknown parameters a, b, c (in the code below) of the calculated curve so that the standard deviation functional is minimal.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from math import pi
def func(a,b,c):
    return -a/(2*np.tan(c*(pi/2)))+np.sqrt(b+(a**2)/(4*np.tan((c)*(pi/2))))
file=experimental curve in .txt file
# raw string so that \s is passed to the regex engine unchanged
pd_file=pd.read_csv(file, sep=r"\s+", header=None, names=['frequence', 'y'],
                    skiprows=1)
xdata=pd_file['frequence']
ydata=pd_file['y']
popt, pcov = curve_fit(func, xdata, ydata, p0=[0.6,1], maxfev=500000000)
print('popt',popt)
I do not think your functional form is suitable for fitting the data you have. After some experimentation may I suggest a different one:
def func2(x, b, d):
    return 0.2 / (1 + b * x + d * np.log(1+x))
file='chi_strich_strich_H0.txt'
# raw string so that \s is passed to the regex engine unchanged
pd_file=pd.read_csv(file, sep=r"\s+", header=None, names=['frequence', 'y'],
                    skiprows=1)
xdata=pd_file['frequence']
ydata=pd_file['y']
popt, pcov = curve_fit(func2, xdata, ydata, p0=[0,0], maxfev=500000000)
print('popt',popt)
yfit = func2(xdata,popt[0], popt[1])
plt.plot(xdata, ydata, '.', label = 'data')
plt.plot(xdata, yfit, '-', label = 'fit')
plt.legend(loc = 'best')
plt.show()
popt: [ 1.84672386e-05 -7.69652828e-02]
The fit is on this plot:
I have been trying to fit a gaussian curve to my data
I have used the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def gaus(x, y0, a, b, c):
    return y0 + a*np.exp(-np.power(x - b, 2)/(2*np.power(c, 2)))
popt, pcov = curve_fit(gaus, x, y)
plt.figure()
plt.scatter(x, y, c='grey', marker = 'o', label = "Measured values", s = 2)
plt.plot(x, gaus(x, *popt), c='grey', linestyle = '-')
And that's what I am getting:
I have the x/y data available here in case you want to try it by yourself.
Any idea on how can I get a fit? This data is obviously gaussian shaped, so it seems weird I cannot fit a gaussian curve.
The fit needs a decent starting point. Per the docs, if you do not specify the starting point, all parameters are set to 1, which is clearly not appropriate here, and the fit gets stuck in a wrong local minimum. Try this, where I chose the starting point by eyeballing the data:
popt, pcov = curve_fit(gaus, x, y, p0 = (1500,2000,20, 1))
you would get something like this:
and the solution found by the solver is
popt
array([1559.13138798, 2128.64718985, 21.50092272, 0.16298357])
Even just getting the mean (parameter b) roughly right is enough for the solver to find the solution; for example, try this:
popt, pcov = curve_fit(gaus, x, y, p0 = (1,1,20, 1))
you should see the same (good) result
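If eyeballing is not practical, the starting point can also be read off the data itself. Below is a minimal sketch of that idea, not a definitive recipe: guess_p0 is a hypothetical helper, and the data are synthetic stand-ins generated with assumed parameter values, since the original x/y file is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus(x, y0, a, b, c):
    # Gaussian with constant offset y0, amplitude a, centre b, width c
    return y0 + a * np.exp(-(x - b) ** 2 / (2 * c ** 2))

def guess_p0(x, y):
    # Hypothetical helper: rough starting values read off the data
    y0 = np.min(y)                       # baseline
    a = np.max(y) - y0                   # peak height above the baseline
    b = x[np.argmax(y)]                  # position of the highest point
    above_half = x[y > y0 + a / 2]       # samples above half maximum
    fwhm = above_half.max() - above_half.min()
    # FWHM = 2*sqrt(2*ln 2)*sigma, roughly 2.355*sigma; never go below one step
    c = max(fwhm, x[1] - x[0]) / 2.355
    return y0, a, b, c

# Synthetic stand-in for the measured data (all values are assumptions)
rng = np.random.default_rng(0)
x = np.linspace(20.0, 23.0, 300)
true_params = (1500.0, 2000.0, 21.5, 0.16)
y = gaus(x, *true_params) + rng.normal(0.0, 20.0, x.size)

p0 = guess_p0(x, y)
popt, pcov = curve_fit(gaus, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))            # one-sigma parameter uncertainties
print("p0  :", p0)
print("popt:", popt)
print("perr:", perr)
```

The helper assumes a single positive peak on a flat baseline and x sorted in increasing order; for other shapes the guesses would need to be adapted.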
I'm currently working on a lab report for Brownian Motion using this PDF equation with the intent of evaluating D:
Brownian PDF equation
And I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fits, it's a line and does not appear correctly on the histogram.
Example Histogram with bad curve_fit
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
plt.hist(r, bins=10, density=True, label='Counts')
# Curve fit
x,y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03
def f(r, a):
    return (((1e-6)*3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))
print(x) # these are flipped for some reason
print(y)
plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0 = (1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke down D to get the Boltzmann constant as a in the function, which is why there are more numbers in f than the equation above. D and Gamma.
I've tried messing with the initial conditions and plotting the function with 1.38e-23 instead of popt, but that does this (the purple line). This tells me something is wrong with the equation for f, but no issues jump out to me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
    return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue that the curve_fit is a line with a positive slope instead of anything remotely what I'm interested in. Now I am more confused as to what the issue is.
There are a few things here. I don't think x and y were ever flipped, or at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code, for example, I'm not sure why you call two different histograms; and I think there may have been problems handling the single element tuple of parameters. Also, for curve fitting, the initial parameter guess often needs to be in the ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
x = bin_centers[2:] # not necessary, and I'm not sure why the OP did this, but I'm doing this here because OP does
y = hist_values[2:]
def f(r, a):
    return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))
plt.plot(x, y, label='Fitting this', color='red')
err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0 = (1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in Meters') # Motion seems to be in micron range, but calculation and plot has been done in meters
plt.legend()
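For reference, here is a compact, plot-free sketch of the same idea that also cross-checks the fitted scale against an estimate that needs no binning. It is only an illustration under assumptions: the data are synthetic, the displacements are expressed in micrometres so that all numbers are of order 1, and the names rayleigh_pdf, sigma_fit and sigma_mle are mine, not the OP's.

```python
import numpy as np
from scipy import optimize

def rayleigh_pdf(r, sigma):
    # Rayleigh probability density with scale parameter sigma
    return (r / sigma**2) * np.exp(-r**2 / (2 * sigma**2))

rng = np.random.default_rng(1)
true_sigma = 0.5                         # micrometres (assumed value)
r = rng.rayleigh(true_sigma, 5000)       # synthetic displacements

# np.histogram returns (values, edges); edges has one more entry than values
hist_values, bin_edges = np.histogram(r, bins=20, density=True)
bin_centers = (bin_edges[1:] + bin_edges[:-1]) / 2

# (1.0,) is a one-element tuple; (1.0) would be a plain float
popt, pcov = optimize.curve_fit(rayleigh_pdf, bin_centers, hist_values, p0=(1.0,))
sigma_fit = abs(popt[0])                 # the model only depends on sigma**2

# Cross-check without binning: maximum-likelihood estimate of sigma
sigma_mle = np.sqrt(np.mean(r**2) / 2)
print("sigma from histogram fit:", sigma_fit)
print("sigma from samples      :", sigma_mle)
```

If the two estimates disagree badly, that usually points at the model function or the units rather than at curve_fit itself.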
I am trying to fit some sample data in a semilogy plot with curve_fit function from scipy. My best fit curve looks okay with the code I am following, but I am having trouble with the 2 sigma curves, which I want to show simultaneously along with the best fit curve and grey-filled. My code looks like the following:
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.optimize as optimization
M = np.array([-2, -1, 0, 1, 2, 3,4])
Y_z = np.array([0.05, 0.2, 3, 8, 50, 344, 2400 ])
# curve fit linear function
def line(x, a, b):
    return a*x+b
popt, pcov = curve_fit(line, M, np.log10(Y_z)) # change here
# plotting
plt.semilogy(M , Y_z, 'o')
plt.semilogy(M, 10**line(M, popt[0], popt[1]), ':', label = 'curve-fit')
# plot 1 sigma -error
y1 = 10**(line(M, popt[0] + pcov[0,0]**0.5, popt[1] - pcov[1,1]**0.5))
y2 = 10**(line(M, popt[0] - pcov[0,0]**0.5, popt[1] + pcov[1,1]**0.5))
plt.semilogy(M, y1, ':')
plt.semilogy(M, y2, ':')
plt.fill_between(M, y1, y2, facecolor="gray", alpha=0.15)
plt.xlabel(r"$\log X$")
plt.ylabel('Y')
plt.legend()
plt.show()
Any help with the variance curves is much appreciated.
In principle, a linear fit doesn't need non-linear least-squares curve-fitting at all: linear regression should work.
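As a rough sketch of that remark, the straight line can be obtained with np.polyfit on the same M and log10(Y_z) values from the question. The variable names slope, intercept and their _err counterparts are my own.

```python
import numpy as np

M = np.array([-2, -1, 0, 1, 2, 3, 4])
Y_z = np.array([0.05, 0.2, 3, 8, 50, 344, 2400])

# Degree-1 polynomial fit of log10(Y_z) against M; cov=True also returns
# the covariance matrix of the coefficients (highest power first)
coeffs, cov = np.polyfit(M, np.log10(Y_z), 1, cov=True)
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

print(f"slope     = {slope:.4f} +/- {slope_err:.4f}")
print(f"intercept = {intercept:.4f} +/- {intercept_err:.4f}")
```

Because the problem is linear in a and b, the slope and intercept found this way should agree with what curve_fit returns for the same line model.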
That said, to address your questions, you might find lmfit (http://lmfit.github.io/lmfit-py/) useful here. It has a slightly higher-level and slightly more Pythonic approach to curve-fitting, and adds many features. One of these is calculating the uncertainty in the result for a selected value of sigma.
Your fit, done with lmfit, would look like this:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimization
import lmfit
M = np.array([-2, -1, 0, 1, 2, 3,4])
Y_z = np.array([0.05, 0.2, 3, 8, 50, 344, 2400 ])
# curve fit linear function
def line(x, a, b):
    return a*x+b
# set up model and create parameters from model function
# note that function argument names are used for parameters
model = lmfit.Model(line)
params = model.make_params(a=1, b=0)
result = model.fit(np.log10(Y_z), params, x=M)
print(result.fit_report())
which will print out a report about the fit like this:
[[Model]]
    Model(line)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 8
    # data points      = 7
    # variables        = 2
    chi-square         = 0.10468256
    reduced chi-square = 0.02093651
    Akaike info crit   = -25.4191304
    Bayesian info crit = -25.5273101
[[Variables]]
    a:  0.77630819 +/- 0.02734470 (3.52%) (init = 1)
    b:  0.22311337 +/- 0.06114460 (27.41%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, b) = -0.447
You can calculate the 2-sigma uncertainty in the best-fit result as
# calculate 2-sigma uncertainty in result
del2 = result.eval_uncertainty(sigma=2, x=M)
and then use this and the fit results to plot the results (slightly modified from your form):
plt.plot(M, np.log10(Y_z), 'o', label='data')
plt.plot(M, result.best_fit, ':', label = 'curve-fit')
plt.fill_between(M, result.best_fit-del2, result.best_fit+del2, facecolor="grey", alpha=0.15)
plt.xlabel(r"$\log X$")
plt.ylabel('Y')
plt.legend()
plt.show()
which should produce a plot like
hope that helps.
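If you would rather stay with scipy only, a similar band can be sketched from the covariance matrix that curve_fit already returns. This is plain first-order error propagation including the a-b covariance term, which the question's y1/y2 curves leave out; it is an assumption-laden sketch and not necessarily identical to what lmfit's eval_uncertainty computes. The names band, lower and upper are mine.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a * x + b

M = np.array([-2, -1, 0, 1, 2, 3, 4])
Y_z = np.array([0.05, 0.2, 3, 8, 50, 344, 2400])

popt, pcov = curve_fit(line, M, np.log10(Y_z))

# First-order error propagation for y = a*x + b:
# var(y) = x^2 var(a) + 2 x cov(a, b) + var(b)
x = np.linspace(M.min(), M.max(), 50)
var_y = x**2 * pcov[0, 0] + 2 * x * pcov[0, 1] + pcov[1, 1]
band = 2 * np.sqrt(var_y)        # half-width of a 2-sigma band, in log10 units

best = line(x, *popt)
lower = 10 ** (best - band)      # back to linear units for a semilogy plot
upper = 10 ** (best + band)
```

The lower and upper arrays can then be passed to plt.fill_between(x, lower, upper, facecolor="gray", alpha=0.15) on top of the semilogy plot from the question.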
So I have two lists of data, which I can plot in a scatter plot, as such:
from matplotlib import pyplot as plt
x = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
plt.scatter(x, y)
This gives a plot that looks like part of a Gaussian distribution, which is what I expect.
My issue is however I am trying to fit a Gaussian distribution to this, and failing miserably because a. it's only half a Gaussian instead of a full one, and b. what I've used before has only ever used one bunch of numbers. So something like:
# best fit of data
num_bins = 20
(mu, sigma) = norm.fit(sixteen)
y = mlab.normpdf(num_bins, mu, sigma)
n, bins, patches = plt.hist(deg_array, num_bins, normed=1, facecolor='blue', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
Does this approach work at all here, or am I going about this in the wrong way completely? Thanks...
It seems that your usual approach is to compute the expectation value and standard deviation of the data directly instead of using a least-squares fit. Here is a solution using curve_fit from scipy.optimize.
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
x = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
# Define a gaussian function with offset
def gaussian_func(x, a, x0, sigma, c):
    return a * np.exp(-(x-x0)**2/(2*sigma**2)) + c
initial_guess = [1,20,2,0]
popt, pcov = curve_fit(gaussian_func, x, y,p0=initial_guess)
xplot = np.linspace(0,30,1000)
plt.scatter(x,y)
plt.plot(xplot,gaussian_func(xplot,*popt))
plt.show()
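To make the difference between the two approaches concrete, here is a small sketch on synthetic data (all numbers are assumed values, not the OP's measurements): norm.fit estimates the mean and standard deviation of a bunch of samples, while curve_fit adjusts the curve parameters to (x, y) points, which still works when only one flank of the peak is sampled.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def gaussian_func(x, a, x0, sigma, c):
    # Gaussian with amplitude a, centre x0, width sigma and offset c
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2)) + c

# Case 1: a bunch of samples -> estimate mean and standard deviation directly
rng = np.random.default_rng(2)
samples = rng.normal(5.0, 2.0, 5000)
mu, std = norm.fit(samples)

# Case 2: (x, y) points on a curve -> least-squares fit of the curve itself.
# Synthetic half peak: only the rising flank of the Gaussian is sampled.
true_params = (80.0, 20.0, 3.0, 20.0)    # assumed values for a, x0, sigma, c
x = np.arange(20.0)
y = gaussian_func(x, *true_params)

# Rough guesses from the data: height, peak near the right edge, baseline
p0 = [y.max() - y.min(), x[-1], 3.0, y.min()]
popt, pcov = curve_fit(gaussian_func, x, y, p0=p0)

print("norm.fit :", mu, std)
print("curve_fit:", popt)
```

With only half a peak, the amplitude and centre are strongly correlated, so it is worth inspecting np.sqrt(np.diag(pcov)) on real, noisy data before trusting the fitted values.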