SciPy: n-dimensional interpolation of sparse data

I currently have a collection of n-dimensional data points, each with a value associated with it (n will typically range from 2 to 4).
I would like to apply some form of non-linear interpolation to the data points I am supplied with, so that I can try to minimise this value. Of course, I am open to better methods of minimising the value.
At the moment, I have code that works for 1D and 2D arrays:
import numpy as np
from scipy.interpolate import griddata

mesh = np.meshgrid(*[i['grid2'] for i in self.cambParams], indexing='ij')
chi2 = griddata(data[:, :-1], data[:, -1], tuple(mesh), method='cubic')
However, scipy.interpolate.griddata only supports linear interpolation above two dimensions, which makes interpolation useless for minimisation: the minimum of a piecewise-linear interpolant is always one of the supplied data points. Does anyone know of an alternative interpolation method that might work, or a better way of solving the problem in general?
Cheers

Received a tip from an external source that worked, so I'm posting the answer in case it helps anyone in the future.
SciPy has an Rbf interpolation method (radial basis functions) which supports better-than-linear interpolation in arbitrary dimensions.
Given a variable data whose rows are (x1, x2, ..., xn, v) values, the following modification to the code in the original post performs the interpolation:
from scipy.interpolate import Rbf

rbfi = Rbf(*data.T)  # unpacks each column; the last column is the value v
mesh = np.meshgrid(*[i['grid2'] for i in self.cambParams], indexing='ij')
chi2 = rbfi(*mesh)
The scipy.interpolate.Rbf documentation is useful, and it includes a simple, easy-to-follow example that will make more sense than the code snippet above.
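As a self-contained illustration, here is a minimal sketch of the same idea (the 3-D sample points and the toy chi-squared surface are made up for the example, not from the original post): interpolate the scattered values with Rbf, then minimise the smooth interpolant.

import numpy as np
from scipy.interpolate import Rbf
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pts = rng.uniform(-2, 2, size=(50, 3))   # 50 scattered sample points in 3-D (made up)
vals = np.sum(pts**2, axis=1)            # toy surface whose true minimum is the origin

rbfi = Rbf(*pts.T, vals)                 # columns are x1, x2, x3; last argument is v
res = minimize(lambda p: float(rbfi(*p)), x0=pts[np.argmin(vals)])
print(res.x)                             # should land near [0, 0, 0]

Starting the minimiser from the best sampled point keeps it inside the region where the interpolant is trustworthy.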

Related

Python/Numpy/Scipy, Solving non-linear least squares, with grouped covariates?

Basically, I'm trying to make this function happen:
(formula image from the original post)
I'm solving for beta; gamma, alpha, and x all come from the data.
Originally, I just used the summary statistic mean(x_i / gamma_i), which meant that everything in that summation could be pre-calculated, and I would simply hand a plain NumPy array to the non-linear optimizer. Now there is no way to pre-calculate the summary statistic, since it is not immediately clear how beta will affect f when f changes in response to alpha_i, so I'm not sure how to go about presenting that array. Is it possible to embed those covariates as lists (NumPy object arrays) while still presenting a NumPy array, and then unpack the lists within the residual function? Or am I going about this the wrong way?
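One hedged way to sidestep the object-array question (a sketch, not from the original thread): keep the ragged per-group covariates in plain Python lists, close over them, and let the residual function rebuild the summary statistic for every trial beta. The group data and the placeholder model below are purely illustrative.

import numpy as np
from scipy.optimize import least_squares

# Hypothetical grouped data: one (x_i, gamma_i, alpha_i) block per observation,
# with different lengths per group, so no rectangular array is possible.
groups = [
    (np.array([1.0, 2.0]), np.array([0.5, 1.5]), 0.3),
    (np.array([0.7, 2.2, 1.1]), np.array([1.0, 0.9, 1.2]), 0.8),
]
y_obs = np.array([2.1, 1.4])

def residuals(beta):
    # The statistic is recomputed per group because it depends on beta through
    # alpha_i; the expression here is a stand-in for the real model.
    pred = [np.mean(x / g) * beta[0] + a * beta[1] for x, g, a in groups]
    return np.asarray(pred) - y_obs

fit = least_squares(residuals, x0=[1.0, 0.0])
print(fit.x)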

Most efficient method of returning coefficients for a fit in Python for use in other languages?

So, I have the following data, which I've plotted in Python (plot from the original post).
The data is input for a forcing term in a system of differential equations I am working with, so I need to fit a continuous function to it; that way I will not have to deal with the stability issues that the discontinuities of a stepwise function could cause. Unfortunately, it's a pretty large data set.
I am trying to end up with a fitted function that is feasible, and not too tedious, to translate into Stan, the language I am coding the differential equations in, so I would prefer something in piecewise-polynomial form with at most a few pieces that I can code manually.
I started off with polyfit from numpy, which was not very good. Using UnivariateSpline from scipy gave me a decent fit, but not something that looked tractable for translation into Stan. Hence, I am looking for suggestions for other fits I could try that would return functions more easily translatable into other languages. Looking at the shape of my data, is there a periodic spline fit that could be useful?
The UnivariateSpline object has get_knots and get_coeffs methods. They give you the knots and coefficients of the fit in the b-spline basis.
An alternative, equivalent, way is to use splrep for fitting (and splev for evaluations).
To convert to a piecewise polynomial representation, use PPoly.from_spline (check its docs for the exact coefficient format).
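A minimal sketch of that route (the data here is synthetic, not the poster's):

import numpy as np
from scipy.interpolate import splrep, PPoly

x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=x.size)  # synthetic data

tck = splrep(x, y, s=1.0)    # (knots, b-spline coefficients, degree); s controls knot count
pp = PPoly.from_spline(tck)  # same curve, piecewise power-basis form
print(pp.x)                  # breakpoints
print(pp.c)                  # coefficients, shape (degree + 1, number of intervals)

The pp.x / pp.c pair is exactly the kind of small, fixed table of numbers that is easy to hard-code in Stan.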
If what you want is a Fourier-space representation, you can use leastsq or least_squares. It is essential to provide sensible starting values for the NLSQ fit parameters; I would start from, e.g., a max-to-max distance estimate for the period and a max-to-min estimate for the amplitude.
As always with non-linear fitting, YMMV, however.
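A hedged sketch of that suggestion, with synthetic data and the seed values estimated roughly as described above:

import numpy as np
from scipy.optimize import least_squares

x = np.linspace(0, 20, 300)
y = (3.0 * np.sin(2 * np.pi * x / 7.0 + 0.4) + 1.0
     + 0.2 * np.random.default_rng(1).normal(size=x.size))  # synthetic data

def resid(p):
    A, T, phi, c = p
    return A * np.sin(2 * np.pi * x / T + phi) + c - y

p0 = [(y.max() - y.min()) / 2, 7.5, 0.0, y.mean()]  # crude amplitude and period guesses
fit = least_squares(resid, p0)
print(fit.x)  # roughly [3.0, 7.0, 0.4, 1.0]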
From the direction field, it seems that a fit involving the sum or composition of multiple sinusoidal functions might be it.
For example: sin(cos(2x)), sin(x) + 2cos(x), etc.
I would use Wolfram Alpha, Mathematica, or Matlab to create direction fields.

Generate two-dimensional normal distribution given a mean and standard deviation

I'm looking for a two-dimensional analog to the numpy.random.normal routine: numpy.random.normal generates a one-dimensional array given a mean, standard deviation and sample count as input, and what I'm looking for is a way to generate points in two-dimensional space from those same input parameters.
It looks like numpy.random.multivariate_normal can do this, but I don't quite understand what the cov parameter is supposed to be. The following excerpt, from the scipy docs, describes this parameter in more detail:
Covariance matrix of the distribution. Must be symmetric and
positive-semidefinite for “physically meaningful” results.
Later in the page, in the examples section, a sample cov value is given:
cov = [[1,0],[0,100]] # diagonal covariance, points lie on x or y-axis
The concept is still quite opaque to me, however.
If someone could clarify what the cov should be or suggest another way to generate points in two-dimensional space given a mean and standard deviation using python I would appreciate it.
If you pass size=[1, 2] to the normal() function, you get a 2D array whose single row is one point in two-dimensional space, which is actually what you're looking for:
>>> numpy.random.normal(size=[1, 2])
array([[-1.4734477 , -1.50257962]])
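To address the cov question directly, a small sketch (the numbers are illustrative): for independent axes, cov is just a diagonal matrix carrying each axis's variance, i.e. the squared standard deviation.

import numpy as np

mean = [2.0, -1.0]  # per-axis means (illustrative)
std = [0.5, 3.0]    # per-axis standard deviations (illustrative)
cov = np.diag(np.square(std))   # [[0.25, 0.0], [0.0, 9.0]]

pts = np.random.multivariate_normal(mean, cov, size=1000)  # shape (1000, 2)
print(pts.mean(axis=0), pts.std(axis=0))                   # close to mean and std

Off-diagonal entries of cov would introduce correlation between the two coordinates; with zeros there, the axes are independent, matching the one-dimensional intuition.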

curve fitting in scipy is of poor quality. How can I improve it?

I'm fitting a set of results to a predicted function. The function might be interpreted as linear, but I may have to change it a little, so I am doing curve fitting instead of linear regression. I use the curve_fit function from scipy. Here is how I use it:
from scipy.optimize import curve_fit

kappa = 1
alpha = 2
popt, pcov = curve_fit(fitFunc1, self.X[0:3], self.Y[0:3],
                       sigma=self.Err[0:3], p0=[kappa, alpha])
and here is fitFunc1
from math import log, pi
import numpy as np

def fitFunc1(X, kappa, alpha):
    # model: -(log(kappa) + 4*log(pi) + alpha*x - 2*log(2)) for each x
    out = []
    for x in X:
        y = log(kappa)
        y += 4 * log(pi)
        y += alpha * x
        y -= 2 * log(2)
        out.append(-y)
    return np.array(out)
Here is an example of the fit (plot from the original post): the green line is a matlab fit, the red one is a scipy fit. I carry the fit over the first three dots.
You are using non-linear fitting routines to fit the data, not linear least-squares as invoked by A\b. The result is that the matlab and/or scipy minimization routines are getting stuck in local minima during the optimizations, leading to different results.
You should get the same results (to within numerical precision) if you apply logs to the raw data prior to linear fitting with A\b (in matlab).
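In numpy terms, that linear fit looks something like the sketch below; the data is fabricated from the model in fitFunc1 with alpha = 2 and kappa = 1, just to show the round trip.

import numpy as np

# Fake data generated from the (already log-transformed) model in fitFunc1.
x = np.array([0.1, 0.2, 0.3])
y = -(np.log(1.0) + 4 * np.log(np.pi) + 2.0 * x - 2 * np.log(2))

slope, intercept = np.polyfit(x, y, 1)  # linear least squares: no local minima
alpha_fit = -slope                      # undo the sign flip in fitFunc1
kappa_fit = np.exp(-intercept - 4 * np.log(np.pi) + 2 * np.log(2))
print(alpha_fit, kappa_fit)             # 2.0 and 1.0, up to round-off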
Edit:
Inspecting fitFunc1, it looks like the x/y data have already been transformed prior to the fit within scipy.
I performed a linear fit on the data shown, using matlab. The result using linear least squares with polyfit(x,y,1) (essentially a linear fit) is very similar to the scipy result (plot from the original post).
In any case, the data looks piecewise linear, so a better solution may be to attempt a piecewise linear fit. On the other hand, the log transformation can do all sorts of unwanted things, so performing non-linear fits on the original data, without the log transform, may be the best option.
If you don't mind a little extra work, I suggest using PyMinuit or iminuit; both are minimisation packages based on SEAL Minuit.
Then you can minimise a chi-squared function, or maximise the likelihood of your data with respect to your fit function, as in the sketch below. They also provide all the errors and everything else you would like to know about the fit.
Hope this helps! xD
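A rough sketch of the iminuit route, assuming the iminuit 2.x API; the data and model below are stand-ins, not the poster's:

import numpy as np
from iminuit import Minuit

x = np.array([0.1, 0.2, 0.3])
y = -(4 * np.log(np.pi) + 2.0 * x - 2 * np.log(2))  # fake data: alpha=2, kappa=1
err = np.full_like(x, 0.05)

def chi2(kappa, alpha):
    model = -(np.log(kappa) + 4 * np.log(np.pi) + alpha * x - 2 * np.log(2))
    return np.sum(((model - y) / err) ** 2)

m = Minuit(chi2, kappa=1.0, alpha=1.5)
m.limits["kappa"] = (1e-6, None)   # keep log(kappa) defined
m.errordef = Minuit.LEAST_SQUARES  # chi-squared error convention
m.migrad()                         # run the minimiser
print(m.values["kappa"], m.values["alpha"], m.errors["alpha"])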

Finding the length of a cubic B-spline

Using scipy's interpolate.splprep function, I get a parametric spline on a parameter u, but the domain of u is not the line integral of the spline; it is a piecewise-linear connection of the input coordinates. I've tried integrate.splint, but that just gives the individual integrals over u. Obviously, I can numerically integrate a bunch of Cartesian differential distances, but I was wondering if there was a closed-form method for getting the length of a spline or spline segment (using scipy or numpy) that I was overlooking.
Edit: I am looking for a closed-form solution, or a very fast way to converge to a machine-precision answer. I have all but given up on the numerical root-finding methods and am now primarily after a closed-form answer. If anyone has any experience integrating elliptic functions, or can point me to a good resource (other than Wolfram), that would be great.
I'm going to try Maxima to get the indefinite integral of what I believe is the function for one segment of the spline. I cross-posted this on MathOverflow.
Because both x and y are cubic parametric functions, there isn't a closed-form solution in terms of simple functions. Numerical integration is the way to go: either integrate the arc-length expression, or simply add up line-segment lengths, depending on the accuracy you are after and how much effort you want to exert.
An accurate and fast "adding lengths of line segments" method:
Use recursive subdivision (a form of de Casteljau's algorithm) to generate points; this can give you a highly accurate representation with a minimal number of points.
Only subdivide a subdivision if it fails to meet a criterion. Usually the criterion is based on the lengths joining the control points (the hull or cage).
For a cubic, this usually means comparing the closeness of P0P1 + P1P2 + P2P3 to P0P3, where P0, P1, P2 and P3 are the control points that define your Bezier.
The original post links to some Delphi code for this; it should be relatively easy to convert to Python, as in the sketch below. The code generates the points, and it already calculates the lengths of the segments in order to test the criterion, so you can simply accumulate those length values along the way.
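A hedged Python sketch of that idea for a single cubic Bezier segment (the tolerance handling and the chord/polygon blend are illustrative choices, not taken from the Delphi code):

import numpy as np

def split(p):
    # de Casteljau split of cubic control points p (a 4x2 array) at t = 0.5
    a = (p[0] + p[1]) / 2
    b = (p[1] + p[2]) / 2
    c = (p[2] + p[3]) / 2
    d = (a + b) / 2
    e = (b + c) / 2
    f = (d + e) / 2
    return np.array([p[0], a, d, f]), np.array([f, e, c, p[3]])

def bezier_length(p, tol=1e-9):
    chord = np.linalg.norm(p[3] - p[0])                            # P0P3
    poly = sum(np.linalg.norm(p[i + 1] - p[i]) for i in range(3))  # P0P1+P1P2+P2P3
    if poly - chord < tol:
        return (2 * chord + poly) / 3  # blend of the lower and upper bounds
    left, right = split(p)
    return bezier_length(left, tol / 2) + bezier_length(right, tol / 2)

pts = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])  # example segment
print(bezier_length(pts))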
You can integrate the function sqrt(x'(u)**2 + y'(u)**2) over u, where you calculate the derivatives x' and y' of your coordinates with scipy.interpolate.splev. The integration can be done with one of the routines from scipy.integrate (quad, an adaptive Gauss-Kronrod routine, is precise; romberg is generally faster). This should be more precise, and probably faster, than adding up lots of small distances (which is equivalent to integrating with the rectangle rule).
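A minimal sketch of this approach, using a semicircle so the correct answer (pi) is known in advance:

import numpy as np
from scipy.interpolate import splprep, splev
from scipy.integrate import quad

theta = np.linspace(0, np.pi, 50)
tck, u = splprep([np.cos(theta), np.sin(theta)], s=0)  # unit semicircle

def speed(t):
    dx, dy = splev(t, tck, der=1)  # derivatives of the parametric spline
    return np.hypot(dx, dy)

length, abserr = quad(speed, 0, 1)
print(length)  # close to pi = 3.14159...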
