Most accurate way to interpolate data and find the peak? - python

The data I have always lies on a second degree polynomial (a quadratic function). I want to find the peak of the interpolated function as accurately as possible.
So far I've been using interp1d and then extracting the peak value with linspace and a simple for loop. Although you can use a large number of newly generated samples in linspace, you can still be more precise by using the derivative of the fitted polynomial. I haven't found a way to do that with interp1d.
The only function I've found that returns the fitted polynomial coefficients is polyfit, but the fitted function is quite inaccurate (most of the time it doesn't even go through the data points).
I've tried UnivariateSpline: the fitted function seems to be quite accurate, and it's very simple to get the derivative spline and its root.
Other polynomial interpolators (BarycentricInterpolator, KroghInterpolator, ...) state that they do not compute polynomial coefficients, for reasons of numerical stability.
How accurate is UnivariateSpline and its derivatives, or are there any better options out there?
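For reference, the UnivariateSpline approach mentioned above looks roughly like this (a minimal sketch with made-up quadratic samples; k=4 is used so that the derivative is a cubic spline, which is what roots() supports):
import numpy as np
from scipy.interpolate import UnivariateSpline

# Made-up samples lying on a quadratic with its peak at x = 1.5
x = np.array([-4.0, -2.0, 0.0, 1.0, 3.0, 5.0])
y = -(x - 1.5) ** 2 + 10.0

# k=4 so that the derivative spline is cubic (roots() only supports cubic splines)
spline = UnivariateSpline(x, y, k=4, s=0)
peak_x = spline.derivative().roots()
peak_y = spline(peak_x)
print(peak_x, peak_y)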

If all you need is to find the min/max of a second degree polynomial, why not do this:
import matplotlib.pyplot as plt
from scipy.interpolate import KroghInterpolator
import numpy as np

# Quadratic sample data: y = x**2 + 25
x = np.arange(-20, 20)
y = x ** 2 + 25

# Keep every fifth point so the interpolator has something to do
x = x[1::5]
y = y[1::5]

f = KroghInterpolator(x, y)

# Evaluate the interpolant on a fine grid and take its minimum
# (use xfine[np.argmin(yfine)] if you also want its location)
xfine = np.arange(min(x), max(x), 0.5)
yfine = f(xfine)
val_interp = min(yfine)
print(val_interp)

plt.scatter(x, y)
plt.plot(xfine, yfine)
plt.show()

In the end I went with polyfit. Although the fitted function didn't go exactly through the data points, the end result was still good. From the returned coefficients I got the desired x and y coordinates of the peak.
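A minimal sketch of that approach (with placeholder quadratic data; the vertex of y = a*x**2 + b*x + c lies at x = -b / (2*a)):
import numpy as np

# Placeholder points lying on y = -x**2 + 4x + 1; replace with your own data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 5.0, 4.0, 1.0])

a, b, c = np.polyfit(x, y, 2)          # coefficients of the fitted quadratic
peak_x = -b / (2 * a)                  # vertex of the parabola
peak_y = np.polyval([a, b, c], peak_x)
print(peak_x, peak_y)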

Related

Numpy polyfit find least divergent

I am using numpy polyfit to create a number of plots which show a line of best fit. This works fine. But I am wondering: is it possible to ascertain WHICH one of my plots has the "straightest" line?
Not sure what the correct term is...
I guess from the data points given, which set of data is least divergent?
ie:
X = [1,2,3,4,5,6,7,8,9,10]
Y = [1,2,3,4,5,6,7,8,9,10]
this would give me a perfect fit... how can I find which data set gives the most perfect fit?
Fitting algorithms like regression have an accuracy metric called RMSE (root mean square error), which measures how far the curve deviates from the data points. It is explained well here.
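A minimal sketch of how that could be computed with numpy (assuming each data set is a pair of X/Y lists; the data set with the smallest RMSE has the "straightest" line relative to its own fit):
import numpy as np

def rmse_of_linear_fit(x, y):
    # Fit a straight line, then measure how far the points deviate from it
    coeffs = np.polyfit(x, y, 1)
    residuals = np.asarray(y) - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals ** 2))

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rmse_of_linear_fit(X, Y))   # ~0 for a perfectly straight line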

Can't fit Poisson to histogram

I've looked at a bunch of examples on here and tried using snippets of other code, but they're not working for me. I have 4 data sets, but I'll include just one here. My professor told me that the data appeared to be Poisson distributed, so I am trying to fit a Poisson to a histogram of the data. Here is my code:
######## Poisson fit ########
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import factorial

data = data59[4]

# Normalised histogram of the data (density=True replaces the old normed=True argument)
entries, bin_edges, patches = plt.hist(data, 60, [1, 10], density=True)
bin_middles = 0.5 * (bin_edges[1:] + bin_edges[:-1])

def poisson(k, lamb):
    # Poisson pmf: exp(-lamb) * lamb**k / k!
    return np.exp(-lamb) * (lamb ** k) / factorial(k)

# Fit the pmf to the bin heights
popt, pcov = curve_fit(poisson, bin_middles, entries)

x = np.linspace(1, 10, 100)
plt.plot(x, poisson(x, *popt))
plt.show()
I tried plotting other distributions on top of the histogram, like normal and Rayleigh, using scipy.stats instead of curve_fit. Those only kind of worked because they have a scale parameter, which scipy.stats.poisson doesn't. The resulting distribution comes out looking exactly the same as the curve_fit one. I'm not sure how to resolve this issue. Perhaps the data is not even Poisson distributed!
Thanks for helping!!
Update: The data is IceCube data from the TXS 0506+056 blazar. I used SkyDrive to get a URL for the file. I hope it works. The first column is the modified Julian day and the last column is the log of the energy proxy. I am using the last column. I have null and alternative hypotheses surrounding this data and am using maximum likelihood estimation (from a certain distribution, Poisson in my first case) to analyze the data.
Also, here is where I got the data: https://icecube.wisc.edu/science/data/TXS0506_point_source
The data presented in your histogram does not have a Poisson distribution. The Poisson is a counting distribution (what's the probability of getting 0, 1, 2,... observations per unit of time or space), so its support is the non-negative integers. Your histogram clearly shows that you have fractional values, since there are non-zero spikes at non-integer locations.
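If the data really were counts, the maximum-likelihood estimate of the Poisson rate is simply the sample mean, so no histogram or curve_fit is needed; a minimal sketch with made-up count data:
import numpy as np
from scipy.stats import poisson

# Made-up integer count data drawn from a Poisson distribution
rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.2, size=1000)

# For a Poisson sample, the MLE of lambda is just the sample mean
lamb_hat = counts.mean()
print(lamb_hat)

# Compare observed frequencies against the fitted pmf
ks = np.arange(counts.max() + 1)
observed = np.bincount(counts) / len(counts)
expected = poisson.pmf(ks, lamb_hat)
print(np.column_stack([ks, observed, expected]))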

Frontier Equation - Fit a polynomial to the top of a data set

How can I fit a polynomial to an empirical data set using python such that it fits the "top" of the data, i.e. for every value of x the output of the function is greater than the largest y at that x, but at the same time it should hug the data as closely as possible? An example of what I'm referring to is seen in the image below:
You need to use cvxopt to find the coordinates of the efficient frontier, which is a quadratic programming problem, and then feed those coordinates into numpy's polyfit to get the polynomial that fits the frontier. This Quantopian blog post does both: https://blog.quantopian.com/markowitz-portfolio-optimization-2/
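If you want to avoid cvxopt, the "stay above every point" constraint can also be written as a small linear program; here is a rough sketch (not the blog's method) using scipy.optimize.linprog with made-up data, minimizing the total gap between the polynomial and the points while forcing it to lie at or above each of them:
import numpy as np
from scipy.optimize import linprog

# Made-up scattered data; replace with your own points
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = np.sin(x) + 0.3 * rng.random(60)

degree = 3
A = np.vander(x, degree + 1, increasing=True)   # columns: x**0, x**1, ..., x**degree

# Minimise sum_i p(x_i) (equivalently the total gap above the data)
# subject to p(x_i) >= y_i for every point
c = A.sum(axis=0)
res = linprog(c, A_ub=-A, b_ub=-y, bounds=[(None, None)] * (degree + 1))

coeffs = res.x                    # polynomial coefficients, lowest order first
top_fit = A @ coeffs
print(coeffs)
print((top_fit - y).min())        # should be >= 0: the curve never dips below the data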

curve fitting in scipy is of poor quality. How can I improve it?

I'm fitting a set of results to a predicted function. The function might be interpreted as linear, but I might have to change it a little, so I am doing curve fitting instead of linear regression. I use the curve_fit function in scipy. Here is how I use it:
kappa = 1
alpha = 2
popt, pcov = curve_fit(fitFunc1, self.X[0:3], self.Y[0:3], sigma=self.Err[0:3], p0=[kappa, alpha])
and here is fitFunc1
def fitFunc1(X, kappa, alpha):
    # Model: -(log(kappa) + 4*log(pi) + alpha*x - 2*log(2)) for each x
    out = []
    for x in X:
        y = np.log(kappa)
        y += 4 * np.log(np.pi)
        y += alpha * x
        y -= 2 * np.log(2)
        out.append(-y)
    return np.array(out)
Here is an example of the fit. The green line is a matlab fit. The red one is a scipy fit. I carry out the fit over the first three data points.
You are using non-linear fitting routines to fit the data, not linear least-squares as invoked by A\b. The result is that the matlab and/or scipy minimization routines are getting stuck in local minima during the optimizations, leading to different results.
You should get the same results (to within numerical precision) if you apply logs to the raw data prior to linear fitting with A\b (in matlab).
edit
Inspecting the function fitFunc1, it looks like the x/y data have already been transformed prior to the fit within scipy.
I performed a linear fit with the data shown, using matlab. The result using linear least squares with the operation polyfit(x,y,1) (essentially a linear fit) is very similar to the scipy result:
In any case, the data looks piecewise linear, so a better solution may be to attempt a piecewise linear fit. On the other hand, the log transformation can do all sorts of unwanted things, so performing nonlinear fits on the original data, without a log transform, may be the best solution.
If you don't mind a little bit of extra work, I suggest using PyMinuit or iminuit; both are minimisation packages based on SEAL Minuit.
Then you can minimise a chi-square function or maximise the likelihood of your data in relation to your fit function. They also provide all the errors and everything else you would like to know about the fit.
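As a rough sketch of what that looks like with iminuit's built-in least-squares cost function (made-up data in place of self.X / self.Y / self.Err, and the model from the question rewritten in vectorised form; check the iminuit docs for the current API):
import numpy as np
from iminuit import Minuit
from iminuit.cost import LeastSquares

def fitFunc1(X, kappa, alpha):
    # Same model as in the question, vectorised
    return -(np.log(kappa) + 4 * np.log(np.pi) + alpha * np.asarray(X) - 2 * np.log(2))

# Made-up data; replace with your own x, y and errors
x = np.array([0.1, 0.2, 0.3])
y = fitFunc1(x, 1.5, 2.5) + np.array([0.02, -0.01, 0.03])
yerr = np.array([0.05, 0.05, 0.05])

least_squares = LeastSquares(x, y, yerr, fitFunc1)
m = Minuit(least_squares, kappa=1, alpha=2)  # starting values
m.limits["kappa"] = (1e-6, None)             # keep kappa positive so log(kappa) is defined
m.migrad()   # run the minimisation
m.hesse()    # compute parameter uncertainties

print(m.values)  # fitted kappa and alpha
print(m.errors)  # their uncertainties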
Hope this helps! xD

Python Least-Squares Natural Splines

I am trying to find a numerical package which will fit a natural spline which minimizes weighted least squares.
There is a package in scipy which does what I want for unnatural splines.
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x = np.arange(0, 5, 1.0 / 6)
xs = np.arange(0, 5, 1.0 / 500)
y = np.sin(x + 1) + 0.2 * np.random.rand(len(x)) - 0.1

# Least-squares cubic spline with fixed interior knots (task=-1)
knots = np.array([1, 2, 3, 4])
tck = interpolate.splrep(x, y, s=0, k=3, t=knots, task=-1)
ys = interpolate.splev(xs, tck, der=0)

plt.figure()
plt.plot(xs, ys, x, y, 'x')
plt.show()
The spline.py file inside this tar file from this page does a natural spline fit by default. There is also some code on this page that claims to do mostly what you want. The pyD3D package also has a natural spline function in its pyDataUtils module. This last one looks the most promising to me. However, it doesn't appear to have the option of setting your own knots. Maybe if you look at the source you can find a way to rectify that.
Also, I found this message on the Scipy mailing list which says that using s=0.0 (as in your given code) makes splines fitted with the above procedure natural, according to the writer of the message. I did find this splmake function that has an option to do a natural spline fit, but upon looking at the source I found that it isn't implemented yet.
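One way to check that claim on your own data is to evaluate the second derivative of the fitted spline at the endpoints: a natural cubic spline has zero second derivative there. A minimal sketch, reusing x and tck from the code above:
from scipy import interpolate
import numpy as np

# A natural cubic spline has zero second derivative at both boundaries;
# evaluate der=2 at the endpoints of the fitted spline from the question's code
endpoints = np.array([x[0], x[-1]])
print(interpolate.splev(endpoints, tck, der=2))   # close to 0 means effectively natural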
