I plotted a graph of entropy values. Now I want to fit a curve to it, but I don't understand how to get started. I tried curve_fit from scipy, but what I get is a straight line rather than a Gaussian curve. Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [18,23,95,142,154,156,157,158,258,318,367,382,484,501,522,574,681,832,943,1071,1078,1101,1133,1153,1174,1264]
y = [0.179,0.179,0.692,0.574,0.669,0.295,0.295,0.295,0.387,0.179,0.179,0.462,0.179,0.179,0.179,0.179,0.179,0.179,0.179,0.179,0.179,0.462,0.179,0.387,0.179,0.295]
x = np.asarray(x)
y = np.asarray(y)
def Gauss(x, A, B):
    y = A*np.exp(-1*B*x**2)
    return y
parameters_g, covariance_g = curve_fit(Gauss, x, y)
fit_A = parameters_g[0]
fit_B = parameters_g[1]
print(fit_A)
print(fit_B)
fit_y = Gauss(x, fit_A, fit_B)
plt.figure(figsize=(18,10))
plt.plot(x, y, 'o', label='data')
plt.plot(x, fit_y, '-', label='fit')
plt.legend()
plt.show()
I just connected the values using spline and got this kind of curve:
Can anyone suggest how to fit a Gaussian curve to this data?
Edit 1: Now I have a bar plot (sort of) with the y value for each corresponding x value. My x-axis ranges from 0 to 1273 and the y-axis from 0 to 1. How can I do a curve fit, and what would be the right curve here? I was trying to fit a bimodal distribution to the data. The plot and the data are linked below.
Bar plot image: https://i.stack.imgur.com/1Awt5.png
Data : https://drive.google.com/file/d/1_uiweIWRWgzy5wNVLOvn4WN25jteu8rQ/view?usp=sharing
You don't have a line; you do have a Gaussian curve (centered at zero because of your definition of Gauss). You can see that clearly when you plot the function on a different scale:
x_arr = np.linspace(-2,2,100)
fit_y = Gauss(x_arr, fit_A, fit_B)
plt.figure(figsize=(18,10))
plt.plot(x_arr, fit_y, '-', label='fit')
plt.legend()
plt.show()
That image was made with the estimated fit_A == fit_B == 1 from your code.
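To see why the solver never leaves the default starting point, it may help to evaluate the model and its partial derivatives at A = B = 1 over the x values from the question. This is only a rough sketch; `gauss_jacobian` is a hypothetical helper written for this illustration, not something scipy provides.

```python
import numpy as np

# x values copied from the question
x = np.array([18, 23, 95, 142, 154, 156, 157, 158, 258, 318, 367, 382, 484,
              501, 522, 574, 681, 832, 943, 1071, 1078, 1101, 1133, 1153,
              1174, 1264], dtype=float)

def Gauss(x, A, B):
    return A * np.exp(-B * x**2)

def gauss_jacobian(x, A, B):
    """Partial derivatives of Gauss with respect to A and B (hypothetical helper)."""
    dA = np.exp(-B * x**2)
    dB = -A * x**2 * np.exp(-B * x**2)
    return np.column_stack([dA, dB])

# Model values and derivatives at the default guess A = B = 1
model_at_default = Gauss(x, 1.0, 1.0)
jac_at_default = gauss_jacobian(x, 1.0, 1.0)

print(model_at_default.max())
print(np.abs(jac_at_default).max())
```

Both printed numbers are vanishingly small: at x >= 18 the model is flat at zero, so nudging A or B does not change the residuals and the least-squares solver has no direction to move in.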
You can pass a different initial guess, which leads to a different result:
parameters_g, covariance_g = curve_fit(Gauss, x, y, p0=(1,1/1000))
That said, I don't think your data is described well by a Gaussian curve.
One thing I would always recommend is setting those values by hand to see the effect they have on the plot. That way you get a feeling for whether the task you are trying to automate is realistic in the first place.
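One way to try parameters by hand is to compute the sum of squared residuals for a few candidate pairs before running any fit. The sketch below uses the data from the question; the candidate values and the helper name `sum_sq_residuals` are my own choices for illustration.

```python
import numpy as np

# Data copied from the question
x = np.array([18, 23, 95, 142, 154, 156, 157, 158, 258, 318, 367, 382, 484,
              501, 522, 574, 681, 832, 943, 1071, 1078, 1101, 1133, 1153,
              1174, 1264], dtype=float)
y = np.array([0.179, 0.179, 0.692, 0.574, 0.669, 0.295, 0.295, 0.295, 0.387,
              0.179, 0.179, 0.462, 0.179, 0.179, 0.179, 0.179, 0.179, 0.179,
              0.179, 0.179, 0.179, 0.462, 0.179, 0.387, 0.179, 0.295])

def Gauss(x, A, B):
    return A * np.exp(-B * x**2)

def sum_sq_residuals(A, B):
    """Sum of squared differences between the data and Gauss(x, A, B)."""
    return float(np.sum((y - Gauss(x, A, B))**2))

# Hand-picked candidates (arbitrary choices): the default guess plus two wider curves
candidates = [(1.0, 1.0), (0.5, 1e-5), (0.5, 1e-6)]
for A, B in candidates:
    print(f"A={A}, B={B}: SSE={sum_sq_residuals(A, B):.3f}")
```

With A = B = 1 the model contributes nothing at these x values, so the residual is just the sum of y squared. A much smaller B, on the scale of 1/x**2, is needed before the curve reaches the data at all.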
Related
I have been trying to fit a gaussian curve to my data
data
I have used the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def gaus(x, y0, a, b, c):
    return y0 + a*np.exp(-np.power(x - b, 2)/(2*np.power(c, 2)))
popt, pcov = curve_fit(gaus, x, y)
plt.figure()
plt.scatter(x, y, c='grey', marker = 'o', label = "Measured values", s = 2)
plt.plot(x, gaus(x, *popt), c='grey', linestyle = '-')
And that's what I am getting:
result
I have the x/y data available here in case you want to try it by yourself.
Any idea how I can get a fit? This data is obviously Gaussian shaped, so it seems weird that I cannot fit a Gaussian curve to it.
The fit needs a decent starting point. Per the docs, if you do not specify the starting point, all parameters are set to 1, which is clearly not appropriate here, and the fit gets stuck in a wrong local minimum. Try this, where I chose the starting point by eyeballing the data:
popt, pcov = curve_fit(gaus, x, y, p0 = (1500,2000,20, 1))
you would get something like this:
and the solution found by the solver is
popt
array([1559.13138798, 2128.64718985, 21.50092272, 0.16298357])
Even just getting the mean (parameter b) roughly right is enough for the solver to find the solution, e.g. try this:
popt, pcov = curve_fit(gaus, x, y, p0 = (1,1,20, 1))
You should see the same (good) result.
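Instead of eyeballing, the starting point can also be estimated from the data itself. The sketch below is one possible way to do that, not the only one; `guess_p0` is a hypothetical helper, and the data is a synthetic stand-in generated from values close to the solution above, since the question's file is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus(x, y0, a, b, c):
    return y0 + a * np.exp(-(x - b)**2 / (2 * c**2))

def guess_p0(x, y):
    """Rough starting point read off the data (hypothetical helper)."""
    y0 = np.min(y)                    # baseline: lowest value
    a = np.max(y) - y0                # amplitude: peak height above baseline
    b = x[np.argmax(y)]               # center: position of the highest point
    above = x[y > y0 + a / 2]         # points above half maximum
    fwhm = above.max() - above.min() if above.size > 1 else (x.max() - x.min()) / 10
    c = max(fwhm / 2.355, np.finfo(float).eps)  # FWHM is about 2.355 * sigma
    return y0, a, b, c

# Synthetic stand-in data (assumed values, NOT the question's file)
rng = np.random.default_rng(0)
x = np.linspace(20.0, 23.0, 300)
y = gaus(x, 1559.0, 2128.0, 21.5, 0.16) + rng.normal(0.0, 20.0, x.size)

p0 = guess_p0(x, y)
popt, pcov = curve_fit(gaus, x, y, p0=p0)
print(p0)
print(popt)
```

The guess only has to land in the right neighbourhood; the solver does the rest.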
I'm currently working on a lab report for Brownian Motion using this PDF equation with the intent of evaluating D:
Brownian PDF equation
And I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fits, it's a line and does not appear correctly on the histogram.
Example Histogram with bad curve_fit
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
plt.hist(r, bins=10, density=True, label='Counts')
# Curve fit
x,y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03
def f(r, a):
    return (((1e-6)*3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))
print(x) # these are flipped for some reason
print(y)
plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0 = (1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke down D to get the Boltzmann constant as a in the function, which is why there are more numbers in f than the equation above. D and Gamma.
I've tried messing with the initial conditions and plotting the function with 1.38e-23 instead of popt, but that does this (the purple line). This tells me something is wrong with the equation for f, but no issues jump out to me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
    return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue: the curve_fit result is still a line with a positive slope instead of anything remotely like what I'm interested in. Now I am more confused about what the issue is.
There are a few things here. I don't think x and y were ever flipped, or at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code, for example, I'm not sure why you call two different histograms; and I think there may have been problems handling the single element tuple of parameters. Also, for curve fitting, the initial parameter guess often needs to be in the ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
x = bin_centers[2:] # not necessary, and I'm not sure why the OP did this, but I'm doing this here because OP does
y = hist_values[2:]
def f(r, a):
    return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))
plt.plot(x, y, label='Fitting this', color='red')
err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0 = (1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in Meters') # Motion seems to be in micron range, but calculation and plot has been done in meters
plt.legend()
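Since the stated goal was to evaluate D, here is a rough sketch of how the fitted scale could be converted afterwards. It rests on two assumptions that are not confirmed by the post: that the PDF in the question is the usual 2D form P(r) = r/(2Dt) * exp(-r**2/(4Dt)), i.e. a Rayleigh distribution with sigma**2 = 2*D*t, and that D follows the Stokes-Einstein relation D = k_B*T/(6*pi*eta*ra). The function names are hypothetical.

```python
import math

def diffusion_from_sigma(sigma, t):
    """D from the Rayleigh scale, ASSUMING sigma**2 == 2*D*t."""
    return sigma**2 / (2.0 * t)

def boltzmann_from_diffusion(D, eta, radius, T):
    """k_B from D, ASSUMING Stokes-Einstein: D = k_B*T / (6*pi*eta*radius)."""
    return 6.0 * math.pi * eta * radius * D / T

# Constants from the question
eta = 1e-3      # viscosity
ra = 0.95e-6    # particle radius
T = 296.5       # temperature
t = 0.5         # time step

# Stand-in for popt[0]; 0.5e-6 is the scale used to generate the random data
sigma_fit = 0.5e-6

D = diffusion_from_sigma(sigma_fit, t)
k_B = boltzmann_from_diffusion(D, eta, ra, T)
print(D, k_B)
```

Fitting the single scale parameter in meters and converting at the end keeps the fit parameter at a sensible magnitude, which tends to be easier on the solver than fitting a quantity of order 1e-23 directly.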
I have this histogram:
I want the resulting plot/waveform to look like the picture below. What code or Python process should I use? I was thinking of using a Gaussian distribution.
Here is some of the code I've been using:
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
mean = np.mean(y)
sigma = np.std(y)
# y is histogram list
popt, pcov = curve_fit(gaus, x, y, p0=[1, mean, sigma])
plt.plot(x, y, 'b+:', label='data')
z = gaus(x, *popt)
plt.plot(x, z, 'ro:', label='fit')
plt.show()
EDIT / UPDATE:
I have edited the histogram plot so that it does not look continuous. The horizontal axis is the bin edges (they are measured voltages; the measurement was repeated more than 5000 times), and the vertical axis is the number of times each voltage was hit. What I want from the resulting plot/waveform is two peaks: one at the highest count and a second at the second-highest count. How do I upload a list (.txt) here so I can give you the whole set of measurements?
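For a histogram with two peaks, one common option is to fit the sum of two Gaussians. The sketch below reuses the gaus function from above on made-up counts, because the real measurements were not posted; `two_gaus`, the synthetic data, and the starting values are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

def two_gaus(x, a1, x01, s1, a2, x02, s2):
    """Sum of two Gaussians, one per peak (hypothetical helper)."""
    return gaus(x, a1, x01, s1) + gaus(x, a2, x02, s2)

# Made-up stand-in for bin centers and counts (NOT the real measurements)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = two_gaus(x, 120.0, 3.0, 0.5, 60.0, 7.0, 0.8) + rng.normal(0.0, 2.0, x.size)

# One rough guess per peak: height, position, width
p0 = [100.0, 2.5, 1.0, 50.0, 7.5, 1.0]
popt, pcov = curve_fit(two_gaus, x, y, p0=p0)
print(popt)
```

As with a single Gaussian, the positions in p0 should be near the visible peaks, otherwise both components may collapse onto the same one.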
So I have two lists of data, which I can plot in a scatter plot, as such:
from matplotlib import pyplot as plt
x = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
plt.scatter(x, y)
This gives a plot that looks like a Gaussian distribution, which is good, as it should.
My issue, however, is that I am trying to fit a Gaussian distribution to this and failing miserably, because (a) it's only half a Gaussian instead of a full one, and (b) what I've used before only ever worked on a single set of numbers. So something like:
# best fit of data
num_bins = 20
(mu, sigma) = norm.fit(sixteen)
y = mlab.normpdf(num_bins, mu, sigma)
n, bins, patches = plt.hist(deg_array, num_bins, normed=1, facecolor='blue', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
Does this approach work at all here, or am I going about this in the wrong way completely? Thanks...
It seems that your usual approach is to compute the expectation value and standard deviation of the data directly instead of doing a least-squares fit. Here is a solution using curve_fit from scipy.optimize.
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
x = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
y = [22.4155688819,22.3936180362,22.3177538001,22.1924849792,21.7721194577,21.1590235248,20.6670446864,20.4996957642,20.4260953411,20.3595072628,20.3926201626,20.6023149681,21.1694961343,22.1077417713,23.8270366414,26.5355924353,31.3179807276,42.7871637946,61.9639549412,84.7710953311]
# Define a gaussian function with offset
def gaussian_func(x, a, x0, sigma,c):
    return a * np.exp(-(x-x0)**2/(2*sigma**2)) + c
initial_guess = [1,20,2,0]
popt, pcov = curve_fit(gaussian_func, x, y,p0=initial_guess)
xplot = np.linspace(0,30,1000)
plt.scatter(x,y)
plt.plot(xplot,gaussian_func(xplot,*popt))
plt.show()
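The distinction between the two approaches can be sketched side by side: norm.fit estimates the mean and standard deviation of raw samples, while curve_fit adjusts a function to (x, y) points. The data below is synthetic and the true values are assumptions picked to resemble the half-Gaussian case.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# Case 1: raw samples drawn from a distribution -> estimate mean and std directly
samples = rng.normal(loc=5.0, scale=2.0, size=5000)
mu_hat, sigma_hat = norm.fit(samples)

# Case 2: (x, y) points lying on a curve -> least-squares fit of a function
def gaussian_func(x, a, x0, sigma, c):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2)) + c

x = np.linspace(0.0, 19.0, 20)
# Only the rising side of the peak is sampled, as in the question (assumed true values)
y = gaussian_func(x, 70.0, 20.0, 2.0, 20.0) + rng.normal(0.0, 0.2, x.size)
popt, pcov = curve_fit(gaussian_func, x, y, p0=[50.0, 19.0, 3.0, 20.0])

print(mu_hat, sigma_hat)
print(popt)
```

In the second case the values in y are heights of the curve, not samples, so their mean and standard deviation say nothing about the position or width of the peak.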
I am working on fitting some data to a Gaussian curve using the lmfit module for Python. The fit was easy enough following an online tutorial, but I would also like to plot the error bars for this curve. Does anyone know how to accomplish this? If you're interested, the data represents counts of radioactive decay per 4 seconds from Cesium-137 (the x-axis is the number of counts, the y-axis is the frequency with which those counts occurred). Here is some of my code:
import matplotlib.pyplot as plt
from numpy import loadtxt, sqrt, pi, exp
from lmfit import Model
data = loadtxt('CS137gaussian.txt')
x = data[:, 0]
y = data[:, 1]
def gaussian(x, amp, mu, sigma):
    "1-d gaussian: gaussian(x, amp, mu, sigma)"
    return (amp/(sqrt(2*pi)*sigma)) * exp(-(x-mu)**2 /(2*sigma**2))
gmod = Model(gaussian)
result = gmod.fit(y, x=x, amp=20, mu=300, sigma=1)
plt.plot(x, y, 'bo')
plt.plot(x, result.init_fit, 'k--')
plt.plot(x, result.best_fit, 'r-')
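For counted data, one common convention is to take the square root of each count as its 1-sigma error bar. That assumes Poisson statistics for the frequencies on the y-axis, which seems plausible for decay counts but is my assumption, not something stated above. A minimal sketch with made-up numbers, where `poisson_errors` is a hypothetical helper:

```python
import numpy as np

def poisson_errors(counts):
    """1-sigma error bars for counted data, ASSUMING Poisson statistics."""
    counts = np.asarray(counts, dtype=float)
    # Use at least 1 so empty bins do not end up with a zero error / infinite weight
    return np.sqrt(np.maximum(counts, 1.0))

y_example = np.array([0, 4, 25, 100])  # made-up frequencies, not the Cs-137 data
yerr = poisson_errors(y_example)
weights = 1.0 / yerr                   # weights in the form a least-squares fit expects

print(yerr)
print(weights)
```

With the arrays from the question, plt.errorbar(x, y, yerr=poisson_errors(y), fmt='bo') would draw the bars, and passing weights=1.0/poisson_errors(y) to gmod.fit should make lmfit weight the residuals accordingly.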