Gaussian fit in log-log space - python

Would anything have to be changed in the answer for Gaussian fit for Python to fit data in log-log space? Specifically, for both x and y data covering several orders of magnitude and this code snippet:
import numpy as np
import matplotlib.pyplot as pl
from scipy.optimize import curve_fit

def gaus(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

b = np.genfromtxt('Stuff.dat', delimiter=None, filling_values=0)
x = b[:, 0]
y = b[:, 1]

n = len(x)                           # the number of data points
mean = sum(x * y) / n                # note this correction
sigma = sum(y * (x - mean)**2) / n   # note this correction

popt, pcov = curve_fit(gaus, x, y, p0=[max(y), mean, sigma])

ax = pl.gca()
ax.plot(x, y, 'r.-')
ax.plot(x, gaus(x, *popt), 'ro:')
ax.set_xscale('log')
ax.set_yscale('log')
The "fits" are horizontal lines and I am not sure whether I am missing something in my code, or if my data simply isn't fittable by a Gaussian. Any help will be appreciated!

This is what I was missing: the data needs to be transformed (here with base-10 logs) before doing the fitting, and the fit transformed back before plotting on the log axes:
import numpy as np
import matplotlib.pyplot as pl
from scipy.optimize import curve_fit

def gaus(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

b = np.genfromtxt('Stuff.dat', delimiter=None, filling_values=0)
x = np.log10(b[:, 0])
y = np.log10(b[:, 1])

n = len(x)                           # the number of data points
mean = sum(x * y) / n                # note this correction
sigma = sum(y * (x - mean)**2) / n   # note this correction

popt, pcov = curve_fit(gaus, x, y, p0=[max(y), mean, sigma])

ax = pl.gca()
ax.plot(10**x, 10**y, 'r.-')                # data, transformed back
ax.plot(10**x, 10**gaus(x, *popt), 'ro:')   # fit, transformed back
ax.set_xscale('log')
ax.set_yscale('log')
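If you want a smoother curve than the dotted fit evaluated only at the data points, one option (a minimal sketch, reusing x, popt, gaus and ax from the snippet above) is to evaluate the fit on a dense grid in log10 space and read the peak position back in the original units:
xs = np.linspace(x.min(), x.max(), 500)       # dense grid in log10 units
ax.plot(10**xs, 10**gaus(xs, *popt), 'b-')    # smooth fitted curve
a_fit, x0_fit, sigma_fit = popt
print('peak location in original units:', 10**x0_fit)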

Related

Python multivariable nonlinear regression calculation

I was trying to fit a curve to my data and want to find a nonlinear regression equation.
This is what my plot looks like: I have x, y data, which is my reference data, and x0, y0, which are my second set of points; dx and dy are the differences between them.
When I display the differences as vectors they show a clear pattern, and when I convert dx, dy to R and theta the result has an x^2 + y^2 form.
Is it possible to find that equation from this data? Here's my current code:
import matplotlib.pyplot as plt
import numpy as np
import math
import seaborn as sns
from statsmodels.formula.api import ols
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import griddata
"""setting dpi for graph shown in editor"""
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import pandas as pd
"""reading data from excel sheet 1"""
df = pd.read_excel(r'C:\Users\JRKIM\Desktop\data\2513data.xlsx')
"""variable selection"""
tx_0 = df.loc[:,'TRUE_x_0']
ty_0 = df.loc[:,'TRUE_y_0']
v_x_0 = df.loc[:,'vx']
v_y_0 = df.loc[:,'vy']
dx0_0 = tx_0-v_x_0
dy0_0 = ty_0-v_y_0
dr0_0 = df.loc[:,'dr']
fig1, ax0 = plt.subplots()
ax0.set_title("delta0 in vector")
qk = ax0.quiver(tx_0,ty_0,dx0_0,dy0_0)
ax0.scatter(tx_0, ty_0, color='r', s=1)
"""3d graph with vector and position """
fig4 = plt.figure()
ax4 = fig4.add_subplot(111, projection='3d')
ax4.scatter(tx_0, ty_0, dr0_0, marker='*',linewidth = 0.01, cmap="jet")
ax4.set_xlabel('X Label')
ax4.set_ylabel('Y Label')
ax4.set_zlabel('dr Label')
You can use scipy.optimize.curve_fit as follows.
from scipy.optimize import curve_fit
def quadratic_function(x, a, b):
    return a * x[0]**2 + b * x[1]**2  # + c * x[0]*x[1] if you want the full quadratic form
xdata = np.vstack([np.array(dx0_0).flatten(),np.array(dy0_0).flatten()])
ydata = np.array(dr0_0).flatten()
popt, pcov = curve_fit(quadratic_function,xdata,ydata)
print(popt) # values for a and b
This should get you the coefficients a and b for the least-squares fit of a*x**2 + b*y**2 = r. You still have to compute the fit error and check whether it is low enough for your purposes.
Edit: corrected the dimensions of the inputs.
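As a rough sketch of that error check (reusing quadratic_function, xdata, ydata and popt from above), you can compare the model prediction with the measured dr values:
pred = quadratic_function(xdata, *popt)    # model prediction for each point
residuals = ydata - pred
rmse = np.sqrt(np.mean(residuals**2))      # root-mean-square error
print('RMSE:', rmse)
print('relative RMSE:', rmse / np.mean(np.abs(ydata)))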

Rayleigh distribution Curve_fit on python

I'm currently working on a lab report on Brownian motion, using this PDF equation with the intent of evaluating D:
[Brownian PDF equation]
I am trying to curve_fit it to a histogram. However, whenever I plot my curve_fit, it comes out as a line and does not sit correctly on the histogram.
[Example histogram with bad curve_fit]
And here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Variables
eta = 1e-3
ra = 0.95e-6
T = 296.5
t = 0.5
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
plt.hist(r, bins=10, density=True, label='Counts')
# Curve fit
x,y = np.histogram(r, bins=10, density=True)
x = x[2:]
y = y[2:]
bin_width = y[1] - y[2]
print(bin_width)
bin_centers = (y[1:] + y[:-1])/2
err = x*0 + 0.03
def f(r, a):
    return (((1e-6)*3*np.pi*r*eta*ra)/(a*T*t))*np.exp(((-3*(1e-6 * r)**2)*eta*ra*np.pi)/(a*T*t))
print(x) # these are flipped for some reason
print(y)
plt.plot(bin_centers, x, label='Fitting this', color='red')
popt, pcov = optimize.curve_fit(f, bin_centers, x, p0 = (1.38e-23), sigma=err, maxfev=1000)
plt.plot(y, f(y, popt), label='PDF', color='orange')
print(popt)
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in micrometers')
plt.legend()
Is the issue with my curve_fit? Or is there an underlying issue I'm missing?
EDIT: I broke D down so that the Boltzmann constant appears as a in the function, which is why there are more numbers in f than in the equation above (the expressions for D and Gamma).
I've tried changing the initial conditions and plotting the function with 1.38e-23 instead of popt, but that produces the purple line shown. This tells me something is wrong with the equation for f, but no issues jump out at me when I look at it. Am I missing something?
EDIT 2: I changed the function to this to simplify it and match the numpy.random.rayleigh() distribution:
def f(r, a):
    return ((r)/(a))*np.exp((-1*(r)**2)/(2*a))
But this doesn't resolve the issue that the curve_fit is a line with a positive slope instead of anything remotely what I'm interested in. Now I am more confused as to what the issue is.
There are a few things here. I don't think x and y were ever flipped, or at least when I assumed they weren't, everything seemed to work fine. I also cleaned up a few parts of the code; for example, I'm not sure why you call the histogram twice, and I think there may have been problems handling the single-element tuple of parameters. Also, for curve fitting the initial parameter guess often needs to be in the right ballpark, so I changed that too.
Here's a version that works for me:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Random data
r = np.array(np.random.rayleigh(0.5e-6, 500))
# Histogram
hist_values, bin_edges, patches = plt.hist(r, bins=10, density=True, label='Counts')
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
x = bin_centers[2:] # not necessary, and I'm not sure why the OP did this, but I'm doing this here because OP does
y = hist_values[2:]
def f(r, a):
    return (r/(a*a))*np.exp((-1*(r**2))/(2*a*a))
plt.plot(x, y, label='Fitting this', color='red')
err = x*0 + 0.03
popt, pcov = optimize.curve_fit(f, x, y, p0 = (1.38e-6,), sigma=err, maxfev=1000)
plt.plot(x, f(x, *popt), label='PDF', color='orange')
plt.title('Distance vs Counts')
plt.ylabel('Counts')
plt.xlabel('Distance in Meters') # Motion seems to be in micron range, but calculation and plot has been done in meters
plt.legend()
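Since the synthetic data above is drawn from np.random.rayleigh with scale 0.5e-6, and a in f is exactly that scale parameter, a quick sanity check (a small sketch using the names from the code above) is:
# The fitted a should come out close to the 0.5e-6 scale used to generate r.
print('fitted scale a =', popt[0])
print('true scale     =', 0.5e-6)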

Peak finding and analysis on python

I have written code that reads in my data file, plots it, then fits it and finds the peaks. However, I have 6 peaks and the code is currently only fitting 2 of them and isn't returning any data on them. My code is as follows:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
data = np.genfromtxt("C:\\Users\\lenovo laptop\\practice_data_ll16ame1.dat", skip_header = 15)
x = data[: , 0]
y = data[: , 1]
plt.plot(x,y)
plt.show()
def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp(-((x - ctr)/wid)**2)
    return y

guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60+80*i, 46000, 25]
popt, pcov = curve_fit(func, x, y, p0=guess)
fit = func(x, *popt)
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()
When I looked at the plot of your custom function, it was clear that the majority of points were in a more-or-less horizontal line, so the function wouldn't fit well to your peaks. Because there is no noise and the peaks are so prominent, you just need to pass the y values and a threshold to the find_peaks function.
By implementing find_peaks instead of your custom function, you get the following code:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
data = np.genfromtxt("C:\\Users\\lenovo laptop\\practice_data_ll16ame1.dat", skip_header = 15)
x = data[: , 0]
y = data[: , 1]
points = find_peaks(y, height = 100)
plt.plot(x, y)
for i in points[0]:
    plt.scatter(x[i], y[i])
plt.show()
find_peaks returns a tuple consisting of two things:
1. An array with the indices of the peaks (points[0] in the code above)
2. A dictionary of peak properties; because a height threshold was passed, it includes the height of each peak (points[1]['peak_heights'])
The code yields the following plot, which I believe is what you want:
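If you still want Gaussian parameters for all six peaks, one option (a sketch that reuses the func and curve_fit from the question's code together with the find_peaks results above; the width guess of 10 is an assumption you may need to adjust) is to build the initial guess from the detected peaks:
# Sketch: use each detected peak as the initial guess for one Gaussian.
guess = []
for idx, height in zip(points[0], points[1]['peak_heights']):
    guess += [x[idx], height, 10]   # centre, amplitude, width (assumed)
popt, pcov = curve_fit(func, x, y, p0=guess)
plt.plot(x, y)
plt.plot(x, func(x, *popt), 'r-')
plt.show()
# popt now holds one (centre, amplitude, width) triple per detected peak.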

Python Data Fitting

I am getting a horrible fit when I am trying to fit a parabola to this data.
I first make a histogram of the data (the position of an object) and then plot the negative log of the histogram bin counts against position, fitting a parabola to that.
The code I am using is this:
time, pos = postime()
plt.plot(time, pos)
poslen = len(pos)
plt.xlabel('Time')
plt.ylabel('Positions')
plt.show()

n, bins, patches = plt.hist(pos, bins=100)
n = n.tolist()
plt.show()

l = len(bins)
s = len(n)
posx = []
i = 0
j = 0
pbin = []
sig = []
while j < (l-1):
    pbin.append((bins[j]+bins[j+1])/2)
    j = j+1
while i < s:
    if n[i] == 0:
        pbin[i] = 0
    else:
        sig.append(np.power(1/n[i], 2))
        n[i] = n[i]/poslen
        n[i] = np.log(n[i])
        n[i] = n[i]*(-1)
    i = i+1
n[:] = [y for y in n if y != 0]
pbin[:] = [y for y in pbin if y != 0]

from scipy.optimize import curve_fit
def parabola(x, a, b):
    return a * (np.power(x, 2)) + b
popt, pcov = curve_fit(parabola, pbin, n)
print(popt)
plt.plot(pbin, n)
plt.plot(pbin, parabola(pbin, *popt), 'r-')
I am not sure why you are computing the histogram... But here is a working example which does not require histogram computation.
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot
time_ = np.arange(-5, 5, 0.1)
pos = time_**2 + np.random.rand(len(time_))*5
def parabola(x, a, b):
    return a * (np.power(x, 2)) + b
popt, pcov = curve_fit(parabola, time_, pos)
yfit = parabola(time_, *popt)
pyplot.plot(time_, pos, 'o')
pyplot.plot(time_, yfit)
Also, if your time_ vector is not uniformly sampled and you want it to be uniformly sampled for the fit, you can do fittime_ = np.linspace(np.min(time_), np.max(time_)) and then yfit = parabola(fittime_, *popt).
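If you do want to keep the histogram / negative-log approach from the question, here is a minimal sketch of the mechanics (using the pos array from the example above purely as stand-in data, and skipping empty bins so log(0) never occurs):
counts, edges = np.histogram(pos, bins=100)
centers = (edges[:-1] + edges[1:]) / 2
mask = counts > 0                                # drop empty bins
neg_log = -np.log(counts[mask] / len(pos))
popt, pcov = curve_fit(parabola, centers[mask], neg_log)
pyplot.plot(centers[mask], neg_log, 'o')
pyplot.plot(centers[mask], parabola(centers[mask], *popt), 'r-')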
You can also use matrix inversion.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-5,5,100)
Y = (np.power(x,2) + np.random.normal(0,1,x.shape)).reshape(-1,1)
X = np.c_[np.ones(x.shape), x, np.power(x,2)]
A = np.linalg.inv(X.transpose().dot(X)).dot(X.transpose().dot(Y))
Yp = X.dot(A)
fig = plt.figure()
ax = fig.add_subplot()
plt.plot(x,Y,'o',alpha=0.5)
plt.plot(x,Yp)
plt.show()
The matrix form is X * A = Y, which is solved by A = (X^T * X)^(-1) * X^T * Y.
You can find a fuller explanation here if needed. It does not always work out, and you may want to apply some form of regularization.
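A sketch of one such regularization is ridge regression, which adds a small lam * I term to the normal equations (lam is an assumed tuning constant) and uses np.linalg.solve rather than an explicit inverse for better numerical behaviour:
# Ridge-regularised normal equations: A = (X^T * X + lam*I)^(-1) * X^T * Y.
lam = 1e-3                        # assumed tuning constant; lam = 0 gives the plain fit
I = np.eye(X.shape[1])
A_reg = np.linalg.solve(X.T.dot(X) + lam * I, X.T.dot(Y))
Yp_reg = X.dot(A_reg)
plt.plot(x, Y, 'o', alpha=0.5)
plt.plot(x, Yp_reg, '--')
plt.show()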

Python interpolation sin function using nearest method

I wrote some simple code that interpolates a sine function using the nearest method. My question is: is this code correct? It seems to me that the result should consist of straight line segments, yet curved lines appear on the generated graph.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import math
# Original data set: sin(x) sampled at 9 integer points.
x0 = np.arange(9)
y0 = [math.sin(i) for i in x0]
plt.plot(x0, y0, 'o', label='Data')
plt.grid(linestyle="-", color=(0.7, 0.8, 1.0))
x = np.linspace(0, 8, len(x0)*2)
# Available options for interp1d
options = ('linear', 'nearest')
f = interp1d(x0, y0, kind='nearest') # interpolation function
plt.plot(x, f(x), label='nearest') # plot of interpolated data
plt.legend()
plt.show()
EDIT:
I would like to implement my own interpolation algorithm. I tried to take the sum of each pair of adjacent values divided by 2:
lst = list(x0)
for i, val in enumerate(lst):
    lst[i] = lst[i] + lst[i+1] / 2
x0 = tuple(lst)
plt.plot(x0, y0, label='nearest')
But it's not working correctly
The problem is that the green line is drawn as a connected graph between all the points, and you have too few points. Maybe you have misunderstood how np.linspace works. If you increase the number of points, (and change to plot only the points instead as connected lines) you will get a result that looks much more like you probably expect:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import math
# Original data set: sin(x) sampled at 9 integer points.
x0 = np.arange(9)
y0 = [math.sin(i) for i in x0]
plt.plot(x0, y0, 'o', label='Data')
plt.grid(linestyle="-", color=(0.7, 0.8, 1.0))
x = np.linspace(0, 8, 1000)
# Available options for interp1d
options = ('linear', 'nearest')
f = interp1d(x0, y0, kind='nearest') # interpolation function
plt.plot(x, f(x), '.', label='nearest') # plot of interpolated data
plt.legend()
plt.show()
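Regarding the EDIT about implementing your own interpolation: a minimal sketch of a hand-rolled nearest-neighbour interpolation (assuming the x0, y0 and x defined above) picks, for every query point, the y value of the sample whose x value is closest:
# For each query point, return the y of the closest sample in x0.
def nearest_interp(xq, xp, yp):
    xq, xp, yp = np.asarray(xq), np.asarray(xp), np.asarray(yp)
    idx = np.abs(xq[:, None] - xp[None, :]).argmin(axis=1)   # index of nearest sample
    return yp[idx]

plt.plot(x, nearest_interp(x, x0, y0), '.', label='own nearest')
plt.legend()
plt.show()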
