I would like to evaluate a 4d Gaussian / normal distribution on a 4d grid. Let's call the variables (x1, y1, x2, y2). If I have means = (x1=1, y1=0, x2=2, y2=0), I expect a 2d contour plot in the x1, x2 direction, at y1=y2=0, to show a Gaussian centered at (x1=1, x2=2). However, I see the mean/center at (x1=2, x2=0) instead.
What am I missing here? Is it how I define the grid to begin with?
For a 2d normal distribution it works as expected.
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import multivariate_normal
xy_min = -5
xy_max = 5
npoints = 50
x = np.linspace(xy_min, xy_max, npoints)
dim = 4
xx1, yy1, xx2, yy2 = np.meshgrid(x, x, x, x)
points = np.concatenate([xx1[..., None], yy1[..., None], xx2[..., None], yy2[..., None]], axis=-1)
cov = np.diag(np.ones(4))
mean = np.array([1, 0, 2, 0])
rv = multivariate_normal.pdf(points, mean=mean, cov=cov)
plt.figure()
plt.contourf(x, x, rv[:,0,:,0])
I tried manually reshaping the evaluation points first, but it gives the same result. So I think I am missing something conceptual here?
points_resh = np.reshape(points, [npoints**4, dim], order='C')
rv_resh = multivariate_normal.pdf(points_resh, mean=mean, cov=cov)
rv2 = np.reshape(rv_resh, [npoints, npoints, npoints, npoints], order='C')
plt.figure()
plt.contourf(x, x, rv2[:,0,:,0])
** EDIT: SOLVED **
Using indexing='ij' for meshgrid, everything works as expected. You only need to keep in mind that the array must be transposed for contour plotting. See the example below:
#%% Instead use ij indexing
x = np.linspace(-5, 5, 50)
y = np.linspace(-3, 3, 30)
z = np.linspace(-2, 2, 20)
w = np.linspace(-1, 1, 10)
x4d, y4d, z4d, w4d = np.meshgrid(x, y, z, w, indexing='ij')
points4d = np.concatenate([x4d[..., None], y4d[..., None], z4d[..., None], w4d[..., None]], axis=-1)
rv4d = multivariate_normal.pdf(points4d, mean=[1, 0.0, 2, 0.0], cov=[0.1, 0.1, 0.1, 0.1])
fig, ax = plt.subplots()
ax.contourf(x, z, rv4d[:, 0, :, 0].T)
ax.set(xlabel='x', ylabel='z')
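For reference, here is a minimal sketch (my addition, not part of the original post) of what went wrong: with the default indexing='xy', meshgrid swaps the first two output axes relative to the argument order, so axis 0 of rv varied along y1 rather than x1, and rv[:, 0, :, 0] fixed x1 and y2 instead of y1 and y2.
import numpy as np
x = np.linspace(-5, 5, 10)  # small grid to keep memory low
g_xy = np.meshgrid(x, x, x, x)  # default indexing='xy'
g_ij = np.meshgrid(x, x, x, x, indexing='ij')
# the 'xy' output equals the 'ij' output with the first two axes swapped:
print(np.allclose(g_xy[0], g_ij[0].swapaxes(0, 1)))  # True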
I have a curve parameterized by time that intersects a shape (in this case just a rectangle). Following this elegant suggestion, I used shapely to determine where the objects intersect; however, from there on I struggle to find a good way to determine when that happens. Currently, I am approximating the time awkwardly by finding the point of the curve that is closest (in space) to the intersection and then using its time stamp.
But I believe there should be a better solution, e.g. by solving the polynomial equation, maybe using the roots method of a numpy polynomial. I'm just not sure how to do this, because I guess you would need to somehow introduce tolerances, as it is likely that the curve will never assume exactly the intersection coordinates determined by shapely.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Ellipse
from matplotlib.collections import LineCollection
from shapely.geometry import LineString, Polygon
# the parameterized curve
coeffs = np.array([
    [-2.65053088e-05, 2.76890591e-05],
    [-5.70681576e-02, -2.69415587e-01],
    [7.92564148e+02, 6.88557419e+02],
])
t_fit = np.linspace(-2400, 3600, 1000)
x_fit = np.polyval(coeffs[:, 0], t_fit)
y_fit = np.polyval(coeffs[:, 1], t_fit)
curve = LineString(np.column_stack((x_fit, y_fit)))
# the shape it intersects
area = {'x': [700, 1000], 'y': [1300, 1400]}
area_shape = Polygon([
    (area['x'][0], area['y'][0]),
    (area['x'][1], area['y'][0]),
    (area['x'][1], area['y'][1]),
    (area['x'][0], area['y'][1]),
])
# attempt at finding the time of intersection
intersection = curve.intersection(area_shape).coords[-1]
distances = np.hypot(x_fit-intersection[0], y_fit-intersection[1])
idx = np.where(distances == min(distances))
fit_intersection = x_fit[idx][0], y_fit[idx][0]
t_intersection = t_fit[idx]
print(t_intersection)
# code for visualization
fig, ax = plt.subplots(figsize=(5, 5))
ax.margins(0.4, 0.2)
ax.invert_yaxis()
area_artist = Rectangle(
    (area['x'][0], area['y'][0]),
    width=area['x'][1] - area['x'][0],
    height=area['y'][1] - area['y'][0],
    edgecolor='gray', facecolor='none'
)
ax.add_artist(area_artist)
points = np.array([x_fit, y_fit]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
z = np.linspace(0, 1, points.shape[0])
norm = plt.Normalize(z.min(), z.max())
lc = LineCollection(
    segments, cmap='autumn', norm=norm, alpha=1,
    linewidths=2, picker=8, capstyle='round',
    joinstyle='round'
)
lc.set_array(z)
ax.add_collection(lc)
ax.autoscale_view()
ax.relim()
trans = (ax.transData + ax.transAxes.inverted()).transform
intersection_point = Ellipse(
    xy=trans(fit_intersection), width=0.02, height=0.02, fc='none',
    ec='black', transform=ax.transAxes, zorder=3,
)
ax.add_artist(intersection_point)
plt.show()
And just for the visuals, here is what the problem looks like in a plot:
The best approach is to use interpolation functions to compute (x(t), y(t)), plus a function d(t) that computes the distance to the intersection point. Then we use scipy.optimize.minimize on d(t) to find the t value at which d(t) is minimal. Interpolation ensures good accuracy.
So, I added a few modifications to your code:
- definitions of the interpolation functions and the distance calculation
- a test whether there is indeed an intersection, otherwise it doesn't make sense
- computation of the intersection time by minimization
The code (UPDATED):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Ellipse
from matplotlib.collections import LineCollection
from shapely.geometry import LineString, Polygon
from scipy.optimize import minimize
# Interpolate (x, y) at time t:
def interp_xy(t, tp, fpx, fpy):
    # tp: time grid points, fpx, fpy: the corresponding x, y values
    x = np.interp(t, tp, fpx)
    y = np.interp(t, tp, fpy)
    return x, y
# Compute distance to intersection:
def dist_to_intersect(t, tp, fpx, fpy, intersection):
    x, y = interp_xy(t, tp, fpx, fpy)
    d = np.hypot(x - intersection[0], y - intersection[1])
    return d
# the parameterized curve
t_fit = np.linspace(-2400, 3600, 1000)
#t_fit = np.linspace(-4200, 0, 1000)
coeffs = np.array([[-2.65053088e-05, 2.76890591e-05],[-5.70681576e-02, -2.69415587e-01],[7.92564148e+02, 6.88557419e+02],])
#t_fit = np.linspace(-2400, 3600, 1000)
#coeffs = np.array([[4.90972365e-05, -2.03897149e-04],[2.19222264e-01, -1.63335372e+00],[9.33624672e+02, 1.07067102e+03], ])
#t_fit = np.linspace(-2400, 3600, 1000)
#coeffs = np.array([[-2.63100091e-05, -7.16542227e-05],[-5.60829940e-04, -3.19183803e-01],[7.01544289e+02, 1.24732452e+03], ])
#t_fit = np.linspace(-2400, 3600, 1000)
#coeffs = np.array([[-2.63574223e-05, -9.15525038e-05],[-8.91039302e-02, -4.13843734e-01],[6.35650643e+02, 9.40010900e+02], ])
x_fit = np.polyval(coeffs[:, 0], t_fit)
y_fit = np.polyval(coeffs[:, 1], t_fit)
curve = LineString(np.column_stack((x_fit, y_fit)))
# the shape it intersects
area = {'x': [700, 1000], 'y': [1300, 1400]}
area_shape = Polygon([
    (area['x'][0], area['y'][0]),
    (area['x'][1], area['y'][0]),
    (area['x'][1], area['y'][1]),
    (area['x'][0], area['y'][1]),
])
# attempt at finding the time of intersection
curve_intersection = curve.intersection(area_shape)
# We check if the intersection is empty or not:
if not curve_intersection.is_empty:
    # We can get the coords because the intersection is not empty
    intersection = curve_intersection.coords[-1]
    distances = np.hypot(x_fit - intersection[0], y_fit - intersection[1])
    print("Looking for minimal distance to intersection: ")
    print('-------------------------------------------------------------------------')
    # Call to minimize. We pass:
    # - the function to be minimized (dist_to_intersect)
    # - a starting value for t: the time of the curve sample closest to the intersection
    # - extra arguments, the method, and the convergence tolerance tol
    # - the option disp=True for verbose output
    dmin = np.min((x_fit - intersection[0])**2 + (y_fit - intersection[1])**2)
    index = np.where((x_fit - intersection[0])**2 + (y_fit - intersection[1])**2 == dmin)
    t0 = t_fit[index]
    res = minimize(dist_to_intersect, t0, args=(t_fit, x_fit, y_fit, intersection),
                   method='Nelder-Mead', tol=1e-6, options={'disp': True})
    print('-------------------------------------------------------------------------')
    print("Result of the optimization:")
    print(res)
    print('-------------------------------------------------------------------------')
    print("Intersection at time t = ", res.x[0])
    fit_intersection = interp_xy(res.x[0], t_fit, x_fit, y_fit)
    print("Intersection point : ", fit_intersection)
else:
    print("No intersection.")
# code for visualization
fig, ax = plt.subplots(figsize=(5, 5))
ax.margins(0.4, 0.2)
ax.invert_yaxis()
area_artist = Rectangle(
    (area['x'][0], area['y'][0]),
    width=area['x'][1] - area['x'][0],
    height=area['y'][1] - area['y'][0],
    edgecolor='gray', facecolor='none'
)
ax.add_artist(area_artist)
#plt.plot(x_fit,y_fit)
points = np.array([x_fit, y_fit]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
z = np.linspace(0, 1, points.shape[0])
norm = plt.Normalize(z.min(), z.max())
lc = LineCollection(
    segments, cmap='autumn', norm=norm, alpha=1,
    linewidths=2, picker=8, capstyle='round',
    joinstyle='round'
)
lc.set_array(z)
ax.add_collection(lc)
# Again, we check that the intersection exists, because we don't want to draw
# a non-existing point (it would generate an error)
if not curve_intersection.is_empty:
    plt.plot(fit_intersection[0], fit_intersection[1], 'o')
plt.show()
OUTPUT:
Looking for minimal distance to intersection:
-------------------------------------------------------------------------
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 31
Function evaluations: 62
-------------------------------------------------------------------------
Result of the optimization:
final_simplex: (array([[-1898.91943932],
[-1898.91944021]]), array([8.44804735e-09, 3.28684898e-07]))
fun: 8.448047349426054e-09
message: 'Optimization terminated successfully.'
nfev: 62
nit: 31
status: 0
success: True
x: array([-1898.91943932])
-------------------------------------------------------------------------
Intersection at time t = -1898.919439315796
Intersection point : (805.3563860471179, 1299.9999999916085)
Whereas your code gives a much less precise result:
t=-1901.5015015
intersection point: (805.2438793482748,1300.9671136070717)
Figure:
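For completeness, the question also asked about solving the polynomial directly. Here is a sketch of that root-based approach (my own addition, not part of the answer above): for each rectangle edge, subtract the edge value from the quadratic's constant coefficient, solve exactly with np.roots, and keep the real roots at which the curve actually lies on the edge segment and t falls inside the fitted range.
import numpy as np
# coeffs, t_fit and area as defined in the question
candidates = []
for axis, key, other_key in ((0, 'x', 'y'), (1, 'y', 'x')):
    for edge in area[key]:
        c = coeffs[:, axis].copy()
        c[-1] -= edge  # roots of c are the times where the curve crosses this edge line
        for t in np.roots(c):
            if abs(t.imag) > 1e-9:
                continue  # complex pair: the curve never reaches this edge
            t = t.real
            other = np.polyval(coeffs[:, 1 - axis], t)
            if t_fit[0] <= t <= t_fit[-1] and area[other_key][0] <= other <= area[other_key][1]:
                candidates.append(t)
print(sorted(candidates))  # entry/exit times of the curve through the rectangle
This avoids the interpolation error of the sampled LineString entirely, at the price of handling the edge bookkeeping yourself.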
I have an (x, y) signal with a non-uniform sample rate in x (the sample rate is roughly proportional to 1/x). I attempted to uniformly re-sample it using scipy.signal's resample function. From what I understand from the documentation, I could pass it the following arguments:
scipy.signal.resample(array_of_y_values, number_of_sample_points, array_of_x_values)
and it would return the array of
[[resampled_y_values], [new_sample_points]]
I'd expect it to return uniformly sampled data with roughly the same shape as the original and with the same minimal and maximal x value. But it doesn't:
import numpy as np
import scipy.signal as sg
import matplotlib.pyplot as plt
# nu_data = [[x1, x2, ..., xn], [y1, y2, ..., yn]]
# (a 2×n numpy array with x values in ascending order)
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
uniform_data = np.array([resampled[1], resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
blue: nu_data, orange: uniform_data
It doesn't look unaltered, and the x scale has been rescaled, too. If I try to fix the range by constructing the desired uniform x values myself and using them instead, the distortion remains:
length = len(nu_data[0])
resampled = sg.resample(nu_data[1], length, nu_data[0])
delta = (nu_data[0,-1] - nu_data[0,0]) / length
new_samplepoints = np.arange(nu_data[0,0], nu_data[0,-1], delta)
uniform_data = np.array([new_samplepoints, resampled[0]])
plt.plot(nu_data[0], nu_data[1], uniform_data[0], uniform_data[1])
plt.show()
What is the proper way to re-sample my data uniformly, if not this?
scipy.signal.resample works in the Fourier domain and assumes the input is already uniformly sampled (the t argument is only used to compute the positions of the returned sample points), which is why your non-uniform signal comes out distorted. Interpolation is the right tool here. Please look at this rough solution:
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = np.array([0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20])
y = np.exp(-x/3.0)
flinear = interpolate.interp1d(x, y)
fcubic = interpolate.interp1d(x, y, kind='cubic')
xnew = np.arange(0.001, 20, 1)
ylinear = flinear(xnew)
ycubic = fcubic(xnew)
plt.plot(x, y, 'X', xnew, ylinear, 'x', xnew, ycubic, 'o')
plt.show()
That is a slightly updated example from the scipy page. If you execute it, you should see something like this:
Blue crosses are the initial function, your signal with its non-uniform sampling distribution. There are two results: orange x markers represent linear interpolation and green dots cubic interpolation. The question is which option you prefer. Personally I don't like either of them, which is why I usually take 4 points at a time and interpolate between them, then the next points, to get cubic interpolation without those strange overshoots. That is much more work, and I can't see a way to do it with scipy, so it will be slow. That is why I asked about the size of the data.
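Applied to the data layout from the question, a minimal sketch of mine (assuming nu_data is the 2×n array described there) for uniform re-sampling would be:
import numpy as np
from scipy import interpolate
# nu_data = [[x1, ..., xn], [y1, ..., yn]], x ascending
f = interpolate.interp1d(nu_data[0], nu_data[1], kind='cubic')
x_uniform = np.linspace(nu_data[0][0], nu_data[0][-1], len(nu_data[0]))
uniform_data = np.array([x_uniform, f(x_uniform)])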
I'm trying to follow this as an example but can't seem to adapt it to work with my dataset, since I need truncated normals:
https://stackoverflow.com/questions/35990467/fit-two-gaussians-to-a-histogram-from-one-set-of-data-python#=
I have a dataset that is definitely a mixture of 2 truncated normals. The minimum value in the domain is 0 and the maximum is 1. I want to create an object that I can fit to optimize the parameters and then use to get the likelihood of a sequence of numbers being drawn from that distribution. One option may be to just use the KDE model and its pdf to get the likelihood. However, I want the exact means and standard deviations of the 2 distributions. I guess I could split the data in half and model the 2 normals separately, but I also want to learn how to use optimize in SciPy. I'm just starting to experiment with this type of statistical analysis, so my apologies if this seems naive.
I'm not sure how to get a pdf this way that integrates to 1 and has a domain constrained between 0 and 1.
import requests
from ast import literal_eval
from scipy import optimize, stats
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Actual Data
u = np.asarray(literal_eval(requests.get("https://pastebin.com/raw/hP5VJ9vr").text))
# u.size ==> 6000
u.min(), u.max()
# (1.3628525454666037e-08, 0.99973136607553781)
# Distribution
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    sns.kdeplot(u, color="black", ax=ax)
    ax.axvline(0, linestyle=":", color="red")
    ax.axvline(1, linestyle=":", color="red")
kde = stats.gaussian_kde(u)
# KDE Model
def truncated_gaussian_lower(x, mu, sigma, A):
    return np.clip(A*np.exp(-(x-mu)**2/2/sigma**2), a_min=0, a_max=None)
def truncated_gaussian_upper(x, mu, sigma, A):
    return np.clip(A*np.exp(-(x-mu)**2/2/sigma**2), a_min=None, a_max=1)
def mixture_model(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
kde = stats.gaussian_kde(u)
# Estimates: mu sigma A
estimates = [0.1, 1, 3,
             0.9, 1, 1]
params, cov = optimize.curve_fit(mixture_model, u, kde.pdf(u), estimates)
# ---------------------------------------------------------------------------
# RuntimeError Traceback (most recent call last)
# <ipython-input-265-b2efb2ca0e0a> in <module>()
# 32 estimates= [0.1, 1, 3,
# 33 0.9, 1, 1]
# ---> 34 params,cov= optimize.curve_fit(mixture_model,u,kde.pdf(u),estimates )
# /Users/mu/anaconda/lib/python3.6/site-packages/scipy/optimize/minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
# 738 cost = np.sum(infodict['fvec'] ** 2)
# 739 if ier not in [1, 2, 3, 4]:
# --> 740 raise RuntimeError("Optimal parameters not found: " + errmsg)
# 741 else:
# 742 # Rename maxfev (leastsq) to max_nfev (least_squares), if specified.
# RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 1400.
In response to @Uvar's very helpful explanation below, I am trying to test whether the integral from 0 to 1 equals 1, but I'm getting 0.3. I think I'm missing a crucial step in logic:
# KDE Model
from scipy import integrate  # needed for the normalization below
def truncated_gaussian(x, mu, sigma, A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)
def mixture_model(x, mu1, sigma1, A1, mu2, sigma2, A2):
    if type(x) == np.ndarray:
        norm_probas = truncated_gaussian(x, mu1, sigma1, A1) + truncated_gaussian(x, mu2, sigma2, A2)
        mask_lower = x < 0
        mask_upper = x > 1
        mask_floor = (mask_lower.astype(int) + mask_upper.astype(int)) > 1
        norm_probas[mask_floor] = 0
        return norm_probas
    else:
        if (x < 0) or (x > 1):
            return 0
        return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
kde = stats.gaussian_kde(u, bw_method=2e-2)
# Estimates:  mu sigma A
estimates = [0.1, 1, 3,
             0.9, 1, 1]
params, cov = optimize.curve_fit(mixture_model, u, kde.pdf(u)/integrate.quad(kde, 0, 1)[0], estimates, maxfev=5000)
# params
# array([ 9.89751700e-01, 1.92831695e-02, 7.84324114e+00,
# 3.73623345e-03, 1.07754038e-02, 3.79238972e+01])
# Test the integral from 0 to 1
x = np.linspace(0, 1, 1000)
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    ax.plot(x, kde(x), color="black", label="Data")
    ax.plot(x, mixture_model(x, *params), color="red", label="Model")
    ax.legend()
# Integrating from 0 to 1
integrate.quad(lambda x: mixture_model(x, *params), 0, 1)[0]
# 0.3026863969781809
It seems you are misspecifying the fitting procedure.
You are trying to fit the kde.pdf(u) while constraining half-bounds.
foo = kde.pdf(u)
min(foo)
Out[329]: 0.22903365654960098
max(foo)
Out[330]: 4.0119283429320332
As you can see, the probability density function of u is not constrained to [0, 1]. As such, just deleting the clipping action will result in an exact fit.
def truncated_gaussian_lower(x, mu, sigma, A):
    return A*np.exp((-(x-mu)**2)/(2*sigma**2))
def truncated_gaussian_upper(x, mu, sigma, A):
    return A*np.exp((-(x-mu)**2)/(2*sigma**2))
def mixture_model(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
estimates = [0.15, 1, 3,
             0.95, 1, 1]
params, cov = optimize.curve_fit(f=mixture_model, xdata=u, ydata=kde.pdf(u), p0=estimates)
params
Out[327]:
array([ 0.00672248, 0.07462657, 4.01188383, 0.98006841, 0.07654998,
1.30569665])
y3 = mixture_model(u, params[0], params[1], params[2], params[3], params[4], params[5])
plt.plot(kde.pdf(u)+0.1) #add offset for visual inspection purpose
plt.plot(y3)
So, let's now say I change what I am plotting to:
plt.figure(); plt.plot(u,y3,'.')
Because, indeed:
np.allclose(y3, kde(u), atol=1e-2)
>>True
You can edit the mixture model a bit to be 0 outside of the domain [0, 1]:
def mixture_model(x, mu1, sigma1, A1, mu2, sigma2, A2):
    if (x < 0) or (x > 1):
        return 0
    return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
Doing so, however, will lose the option of instantly evaluating the function over an array of x. So for the sake of argument, I will leave it out for now.
Anyway, we want our integral to sum to 1 over the domain [0, 1], and one way to do this (feel free to play around with the bandwidth estimator in stats.gaussian_kde as well) is to divide the probability density estimate by its integral over the domain. Take care that optimize.curve_fit only allows 1400 function evaluations by default in this implementation, so the initial parameter estimates matter.
from scipy import integrate
sum_prob = integrate.quad(kde, 0 , 1)[0]
y = kde(u)/sum_prob
# Estimates: mu sigma A
estimates = [0.15, 1, 5,
             0.95, 0.5, 3]
params, cov = optimize.curve_fit(f=mixture_model, xdata=u, ydata=y, p0=estimates)
>>array([ 6.72247814e-03, 7.46265651e-02, 7.23699661e+00,
9.80068414e-01, 7.65499825e-02, 2.35533297e+00])
y3 = mixture_model(np.arange(0, 1, 0.001), params[0], params[1], params[2],
                   params[3], params[4], params[5])
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    sns.kdeplot(u, color="black", ax=ax)
    ax.axvline(0, linestyle=":", color="red")
    ax.axvline(1, linestyle=":", color="red")
    plt.plot(np.arange(0, 1, 0.001), y3)  # The red line is now your custom pdf with area-under-curve = 0.998 in the domain
To check the area under the curve, I used this hacky solution of redefining mixture_model:
def mixture_model(x):
    mu1 = params[0]; sigma1 = params[1]; A1 = params[2]; mu2 = params[3]; sigma2 = params[4]; A2 = params[5]
    return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
from scipy import integrate
integrated_value, error = integrate.quad(mixture_model, 0, 1) #0 lower bound, 1 upper bound
>>(0.9978588016186962, 5.222293368393178e-14)
Or doing the integral a second way:
import sympy
x = sympy.symbols('x', real=True, nonnegative=True)
foo = sympy.integrate(params[2]*sympy.exp((-(x-params[0])**2)/(2*params[1]**2))+params[5]*sympy.exp((-(x-params[3])**2)/(2*params[4]**2)),(x,0,1), manual=True)
foo.doit()
>>0.562981541724715*sqrt(pi) #this evaluates to 0.9978588016186956
And actually doing it your way as described in your edited question:
def mixture_model(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return truncated_gaussian_lower(x, mu1, sigma1, A1) + truncated_gaussian_upper(x, mu2, sigma2, A2)
integrate.quad(lambda x: mixture_model(x, *params), 0,1)[0]
>>0.9978588016186962
If I set my bandwidth to your level (2e-2), the evaluation indeed comes down to 0.92, which is a worse result than the 0.998 we had earlier, but that is still significantly different from the 0.3 you report, which is something I cannot recreate even when copying your code snippets. Do you perhaps accidentally redefine functions or variables somewhere?
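As a closing side note (my own sketch, not part of the discussion above): if you want a pdf that is normalized on [0, 1] by construction, scipy.stats.truncnorm provides proper truncated normals, and a convex mixture of two of them integrates to 1 automatically; you can then fit by maximum likelihood instead of fitting against the KDE:
import numpy as np
from scipy import stats, optimize
def truncnorm_pdf(x, mu, sigma):
    # truncation points 0 and 1 expressed in standard units, as truncnorm expects
    a, b = (0 - mu) / sigma, (1 - mu) / sigma
    return stats.truncnorm.pdf(x, a, b, loc=mu, scale=sigma)
def mixture_pdf(x, mu1, sigma1, mu2, sigma2, w):
    # convex combination of two truncated normals -> integrates to 1 on [0, 1]
    return w * truncnorm_pdf(x, mu1, sigma1) + (1 - w) * truncnorm_pdf(x, mu2, sigma2)
def neg_log_likelihood(theta, data):
    mu1, s1, mu2, s2, w = theta
    if s1 <= 0 or s2 <= 0 or not (0 < w < 1):
        return np.inf  # reject invalid parameters
    return -np.sum(np.log(mixture_pdf(data, mu1, s1, mu2, s2, w) + 1e-300))
res = optimize.minimize(neg_log_likelihood, x0=[0.1, 0.1, 0.9, 0.1, 0.5],
                        args=(u,), method='Nelder-Mead')  # u is the data from the question
mu1, s1, mu2, s2, w = res.x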
I have x and y one-dimensional numpy arrays, and I would like to reproduce y with a known function to obtain beta. Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y = np.array([0.04022493, 0.04287536, 0.03983657, 0.0393201, 0.03810298,
              0.0363814, 0.0331144, 0.03074823, 0.02795767, 0.02413816,
              0.02180802, 0.01861309, 0.01632699, 0.01368056, 0.01124232,
              0.01005323, 0.00867196, 0.00940864, 0.00961282, 0.00892419,
              0.01048963, 0.01199101, 0.01533408, 0.01855704, 0.02163586,
              0.02630014, 0.02971127, 0.03511223, 0.03941218, 0.04280329,
              0.04689105, 0.04960554, 0.05232003, 0.05487037, 0.05843364,
              0.05120701])
x = np.array([0., 0.08975979, 0.17951958, 0.26927937, 0.35903916,
              0.44879895, 0.53855874, 0.62831853, 0.71807832, 0.80783811,
              0.8975979, 0.98735769, 1.07711748, 1.16687727, 1.25663706,
              1.34639685, 1.43615664, 1.52591643, 1.61567622, 1.70543601,
              1.7951958, 1.88495559, 1.97471538, 2.06447517, 2.15423496,
              2.24399475, 2.33375454, 2.42351433, 2.51327412, 2.60303391,
              2.6927937, 2.78255349, 2.87231328, 2.96207307, 3.05183286,
              3.14159265])
def func(x, beta):
    return 1.0/(4.0*np.pi)*(1 + beta*(3.0/2*np.cos(x)**2 - 1.0/2))
guesses = [20]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = 1/(4.0*np.pi)*(1 + popt[0]*(3.0/2*np.cos(x)**2 - 1.0/2))
plt.figure(1)
plt.plot(x, y, 'ro', x, y_fit, 'k-')
plt.show()
The code works but the fitting is completely off (see picture). Any idea why?
It looks like the formula to use contains an additional parameter, p:
def func(x, beta, p):
    return p/(4.0*np.pi)*(1 + beta*(3.0/2*np.cos(x)**2 - 1.0/2))
guesses = [20, 5]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = func(x, *popt)
plt.figure(2)
plt.plot(x, y, 'ro', x, y_fit, 'k-')
plt.show()
print(popt)  # [ 1.23341604  0.27362069]
In popt, which one is beta and which one is p?
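(A note of mine, not from the thread: curve_fit returns the parameters in the order they appear in the function signature after x, so popt[0] is beta and popt[1] is p.)
beta, p = popt  # func(x, beta, p): popt follows the signature order
print(beta, p)  # 1.23341604 0.27362069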
This is perhaps not what you want, but if you are just trying to get a good fit to the data, you could use np.polyfit:
fit = np.polyfit(x,y,4)
fit_fn = np.poly1d(fit)
plt.scatter(x,y,label='data',color='r')
plt.plot(x,fit_fn(x),color='b',label='fit')
plt.legend(loc='upper left')
Note that fit gives the coefficient values of, in this case, a 4th order polynomial:
>>> fit
array([-0.00877534, 0.05561778, -0.09494909, 0.02634183, 0.03936857])
This is going to be as good as you can get (assuming you get the equation right, as @mdurant suggested); an additional intercept term is required to further improve the fit:
def func(x, beta, icpt):
    return 1.0/(4.0*np.pi)*(1 + beta*(3.0/2*np.cos(x)**2 - 1.0/2)) + icpt
guesses = [20, 0]
popt, pcov = curve_fit(func, x, y, p0=guesses)
y_fit = func(x, *popt)
plt.figure(1)
plt.plot(x, y, 'ro', x, y_fit, 'k-')
print(popt)  # [ 0.33748816 -0.05780343]