When I execute the following code
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import Rbf
x_coarse, y_coarse = np.mgrid[0:5, 0:5]
x_fine, y_fine = np.mgrid[1:4:0.23,1:4:0.23]
data_coarse = np.ones([5,5])
rbfi = Rbf(x_coarse.ravel(), y_coarse.ravel(), data_coarse.ravel())
interpolated_data = rbfi(x_fine.ravel(), y_fine.ravel()).reshape([x_fine.shape[0],
y_fine.shape[0]])
plt.imshow(interpolated_data)
the array interpolated_data has values ranging from 0.988 to 1.002 and the corresponding plot looks like this:
However, I would expect that in such a simple interpolation case, the interpolated values would be a lot closer to the correct value, i.e. 1.000.
I think the variations in the interpolated values are caused by the different distances from the interpolated points to the given data points.
My question is: Is there a way to avoid this behavior? How can I get an interpolation that is not weighted by the distance of the interpolated points to the data points and gives me nothing but 1.000 in interpolated_data?
I would expect that in such a simple interpolation case,
An unwarranted expectation. RBF interpolation, as its name says, uses radial basis functions. By default the basis function is sqrt((r/epsilon)**2 + 1), where r is the distance from a data point and epsilon is a positive shape parameter. There is no way for a weighted sum of such functions to be identically constant. RBF interpolation isn't like linear or bilinear interpolation; it's a rough interpolation suitable for rough data.
By setting an absurdly large epsilon you can get closer to 1, simply because it makes the basis functions nearly identical on the grid:
rbfi = Rbf(x_coarse.ravel(), y_coarse.ravel(), data_coarse.ravel(), epsilon=10)
# ...
print(interpolated_data.min(), interpolated_data.max())
# outputs 0.9999983458255883 1.0000002402521204
However this is not a good idea, because when the data is not constant, there will be too much long-range influence in the interpolant.
gives me nothing but 1.000 in interpolated_data?
That would be linear interpolation. LinearNDInterpolator has similar syntax to Rbf, in that it returns a callable.
from scipy.interpolate import LinearNDInterpolator

linear = LinearNDInterpolator(np.stack((x_coarse.ravel(), y_coarse.ravel()), axis=-1),
                              data_coarse.ravel())
interpolated_data = linear(x_fine.ravel(), y_fine.ravel()).reshape([x_fine.shape[0], y_fine.shape[0]])
print(interpolated_data.min(), interpolated_data.max())
# outputs 1.0 1.0
There is also scipy.interpolate.griddata, which offers several interpolation methods ('nearest', 'linear' and 'cubic').
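For completeness, here is a rough sketch of how griddata could be used on the same coarse/fine grids from the question; the 'linear' method should also reproduce the constant exactly:

from scipy.interpolate import griddata

# same coarse/fine grids as in the question; method can be 'nearest', 'linear' or 'cubic'
points = np.stack((x_coarse.ravel(), y_coarse.ravel()), axis=-1)
interpolated_data = griddata(points, data_coarse.ravel(), (x_fine, y_fine), method='linear')

print(interpolated_data.min(), interpolated_data.max())
# expected: 1.0 1.0 for this constant data set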
savgol_filter gives me the smoothed series, but I want to get the underlying polynomial function (the function of the red line in the picture below), so that I can extrapolate a point beyond the given x range, or find the slope of the function at the two extreme data points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
yhat = savgol_filter(y, 51, 3) # window size 51, polynomial order 3
plt.plot(x,y)
plt.plot(x,yhat, color='red')
plt.show()
**edit**
Since the filter uses least-squares regression to fit the data in a small window to a polynomial of a given degree, you can probably only extrapolate from the ends. The fitted curve is effectively a piecewise collection of these local fits, and no single piece is a good representation of the data as a whole. What you could do is take the end windows of your data and fit them to a polynomial of the same degree as the Savitzky-Golay filter (using numpy's polyfit). It likely will not be accurate very far beyond the window though.
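As a rough sketch of that end-window idea, reusing the data and filter settings from the example above (this is an ordinary polynomial fit to the last window, not the exact Savitzky-Golay fit):

import numpy as np

# same data as the example above
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

window, order = 51, 3                                   # same window size and order as the filter
coeffs = np.polyfit(x[-window:], y[-window:], order)    # fit only the last window
poly = np.poly1d(coeffs)

print(poly(x[-1] + 0.1))      # extrapolate slightly beyond the data range
print(poly.deriv()(x[-1]))    # slope at the right-hand end (roughly cos(2*pi) = 1)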
You can also use scipy.signal.savgol_coeffs() to get the coefficients of the filter. I think you take the dot product of the coefficient array with a window of your data to get the filtered value at each point. You can also pass the deriv argument to get the slope at the ends of your data.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_coeffs.html#scipy.signal.savgol_coeffs
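For example, something along these lines (a sketch based on the pos and use='dot' options documented for savgol_coeffs) should give the slope at the right-hand end of the data:

import numpy as np
from scipy.signal import savgol_coeffs

# same data and filter settings as the example above
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2
window, order = 51, 3
dx = x[1] - x[0]

# coefficients that evaluate the first derivative at the last position of a window;
# dotting them with the last `window` samples gives the slope at the end of the data
c = savgol_coeffs(window, order, deriv=1, delta=dx, pos=window - 1, use='dot')
print(c.dot(y[-window:]))     # roughly cos(2*pi) = 1 for this noisy sine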
I am new to Python and a bit confused about the interpolation and least-squares fitting of two ndarrays.
I have 2 ndarrays:
My final goal is to make Least-squares fitting of the modelled spectrum (blue curve) to the observed spectrum (orange curve).
Blue curve ndarray has the following parameters:
Orange curve ndarray has the following parameters:
As a first and easiest step I wanted to plot the residuals (difference) between those two ndarrays, but the problem is that they have different sizes, 391 and 256 respectively. I've tried to use the numpy.reshape and ndarray.reshape functions, but they lead to errors.
Probably the proper solution is to start by interpolating the blue curve onto the less dense grid of the orange curve. I've tried to use the numpy.interp function, but it also leads to errors.
Something along the lines of the following:
import numpy as np
import matplotlib.pyplot as plt
n_denser = 33
n_coarser = 7
x_denser = np.linspace(0,1,n_denser)
y_denser = np.power(x_denser, 2) + np.random.randn(n_denser)/10.
x_coarser = np.linspace(0,1,n_coarser)
y_coarser = np.power(x_coarser, 2) + np.random.randn(n_coarser)/10. + 0.5
y_dense_interp = np.interp(x_coarser, x_denser, y_denser)
plt.plot(x_denser, y_denser, 'b+-')
plt.plot(x_coarser, y_coarser, 'ro:')
plt.plot(x_coarser, y_dense_interp, 'go')
plt.legend(['dense data', 'coarse data', 'interp data'])
plt.show()
Which returns something like:
Your confusion seems to stem from mixing up the methods you mention. Least squares is not an interpolation method; rather it is a minimization-based curve-fitting method. One key difference is that with interpolation the curve always passes through the original data points. With least squares this can happen, but it is not generally the case.
Cubic-spline interpolation will give you 'nice' plots if you need to pass through the original data points.
If you want to use least-squares, you need to know what degree polynomial you want to fit. The most common is linear (first order).
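Continuing the example above, the residuals can be computed on the coarse grid once the dense curve has been interpolated onto it, and a least-squares polynomial of whatever degree you choose can then be fitted to them (a first-order fit with np.polyfit is shown here as a sketch):

# residuals on the coarse grid, using the arrays from the example above
residuals = y_coarser - y_dense_interp

# first-order (linear) least-squares fit to the residuals
slope, intercept = np.polyfit(x_coarser, residuals, 1)
print(slope, intercept)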
I came up with a custom interpolation method for my problem and I'd like to ask if there are any risks using it. I am not a math or programming expert, that's why I'd like a feedback :)
Story:
I was searching for a good curve-fit method for my data when I came up with an idea to interpolate the data.
I am mixing paints together and making reflectance measurements with a spectrophotometer when the film is dry. I would like to calculate the required proportions of white and colored paints to reach a certain lightness, regardless of any hue shift (e.g. black+white paints gives a bluish grey) or chroma loss (e.g. orange+white gives "pastel" yellowish orange, etc.)
I checked whether the Beer-Lambert law applies, but it does not: pigment mixing behaves in a more complicated fashion than dye dilution. So I wanted to fit a curve to my data points (the process is explained here: Interpolation for color-mixing).
The first step was making a calibration curve: I tested the following ratios of colored vs. white paints mixed together:
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
This is the plot of my carefully prepared samples, measured with a spectrophotometer: the blue curve represents the full color (ratio = 1), the red curve represents the white paint (ratio = 0), and the black curves the mixed samples:
As a second step I wanted to derive from this data a function that would compute a spectral curve for any ratio between 0 and 1. I tested several curve-fitting (fitting an exponential function) and interpolation (quadratic, cubic) methods, but the results were of poor quality.
For example, this is my reflectance data at 380nm for all the color samples:
This is the result of scipy.optimize.curve_fit using the function:
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

popt, pcov = curve_fit(func, x, y)
Then I came up with this idea: the logarithm of the spectral data gives a closer match to a straight line, and the logarithm of the logarithm of the data is almost a straight line, as demonstrated by this code and graph:
import numpy as np
import matplotlib.pyplot as plt
reflectance_at_380nm = 5.319, 13.3875, 24.866, 35.958, 47.1105, 56.2255, 65.232, 83.9295
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
linear_approx = np.log(np.log(reflectance_at_380nm))
plt.plot(ratios, linear_approx)
plt.show()
What I did then was to interpolate the linear approximation and then convert the data back, which gave me a very nice interpolation of my data, much better than what I got before:
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
reflectance_at_380nm = 5.319, 13.3875, 24.866, 35.958, 47.1105, 56.2255, 65.232, 83.9295
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
linear_approx = np.log(np.log(reflectance_at_380nm))
xnew = np.arange(100)/100.
cs = scipy.interpolate.spline(ratios, linear_approx, xnew, order=1)  # note: spline() has been removed from newer SciPy versions
cs = np.exp(np.exp(cs))
plt.plot(xnew,cs)
plt.plot(ratios, reflectance_at_380nm, 'ro')
plt.show()
So my question for the experts is: how good is this interpolation method, and what are the risks of using it? Can it lead to wrong results?
Also: can this method be improved, or does it already exist, and if so what is it called?
Thank you for reading
This looks similar to the Kernel Method that is used for fitting regression lines or finding decision boundaries for classification problems.
The idea behind the kernel trick is that the data is transformed into another space (often higher dimensional) where it is linearly separable (for classification) or admits a linear curve fit (for regression). After the curve fitting is done, the inverse transformation can be applied. In your case successive logarithms (log(log(x))) seem to be the transformation and successive exponentiations (exp(exp(x))) the inverse transformation.
I am not sure if there is a kernel that does exactly this, but the intuition is similar. Here is a medium article explaining this for classification using SVM:
https://medium.com/@zxr.nju/what-is-the-kernel-trick-why-is-it-important-98a98db0961d
Since it is a method that is quite popularly used in Machine Learning, I doubt it will lead to wrong results if the fit is done properly (not under-fit or over-fit) - and this needs to be judged by statistical testing.
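For reference, here is a minimal sketch of the same transform / interpolate / inverse-transform pattern using only numpy (np.interp needs the x values in increasing order, so the ratios are sorted first; this stands in for the old scipy.interpolate.spline call in the question):

import numpy as np

reflectance_at_380nm = 5.319, 13.3875, 24.866, 35.958, 47.1105, 56.2255, 65.232, 83.9295
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0

# transform, interpolate linearly in the transformed space, then invert the transform
order = np.argsort(ratios)                          # np.interp needs increasing x
x_sorted = np.asarray(ratios)[order]
t_sorted = np.log(np.log(reflectance_at_380nm))[order]

xnew = np.arange(100) / 100.
interpolated = np.exp(np.exp(np.interp(xnew, x_sorted, t_sorted)))
print(interpolated[0])                              # should recover the ratio = 0 measurement (~83.9)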
Suppose 'h' is a function of x, y, z and t, and it gives us a simulated graph (t, h). At the same time we also have an observed graph (observed values of h against t). How can I reduce the difference between the observed and simulated (t, h) graphs by optimizing the values of x, y and z? I want to change the simulated graph so that it matches the observed graph more and more closely, in MATLAB or Python. In the literature I have read that people have done the same thing with the Levenberg-Marquardt algorithm, but I don't know how to do it.
You are actually trying to fit the parameters x,y,z of the parametrized function h(x,y,z;t).
MATLAB
You're right that in MATLAB you should either use lsqcurvefit of the Optimization toolbox, or fit of the Curve Fitting Toolbox (I prefer the latter).
Looking at the documentation of lsqcurvefit:
x = lsqcurvefit(fun,x0,xdata,ydata);
It says in the documentation that you have a model F(x,xdata) with coefficients x and sample points xdata, and a set of measured values ydata. The function returns the least-squares parameter set x, with which your function is closest to the measured values.
Fitting algorithms usually need starting points; some implementations can choose them randomly, but in the case of lsqcurvefit this is what x0 is for. If you have
h = @(x,y,z,t) ... %// actual function here
t_meas = ... %// actual measured times here
h_meas = ... %// actual measured data here
then in the conventions of lsqcurvefit,
fun <--> @(params,t) h(params(1),params(2),params(3),t)
x0 <--> starting guess for [x,y,z]: [x0,y0,z0]
xdata <--> t_meas
ydata <--> h_meas
Your function h(x,y,z,t) should be vectorized in t, such that for vector input in t the return value is the same size as t. Then the call to lsqcurvefit will give you the optimal set of parameters:
x = lsqcurvefit(@(params,t) h(params(1),params(2),params(3),t),[x0,y0,z0],t_meas,h_meas);
h_fit = h(x(1),x(2),x(3),t_meas); %// best guess from curve fitting
Python
In Python, you'd use the scipy.optimize module, and scipy.optimize.curve_fit in particular. With the above conventions you need something along the lines of this:
import scipy.optimize as opt
popt, pcov = opt.curve_fit(lambda t,x,y,z: h(x,y,z,t), t_meas, h_meas, p0=[x0,y0,z0])
Note that the p0 starting array is optional, but all parameters will be set to 1 if it's missing. The result you need is the popt array, containing the optimal values for [x,y,z]:
x,y,z = popt
h_fit = h(x,y,z,t_meas)
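As a self-contained toy example (the model h below is purely hypothetical, chosen only to make the sketch runnable):

import numpy as np
import scipy.optimize as opt

def h(x, y, z, t):
    # hypothetical model, for illustration only
    return x * np.exp(-y * t) + z

# synthetic "measurements" generated from known parameters plus noise
rng = np.random.default_rng(0)
t_meas = np.linspace(0, 5, 50)
h_meas = h(2.0, 1.3, 0.5, t_meas) + 0.05 * rng.standard_normal(t_meas.size)

x0, y0, z0 = 1.0, 1.0, 0.0                       # starting guess
popt, pcov = opt.curve_fit(lambda t, x, y, z: h(x, y, z, t),
                           t_meas, h_meas, p0=[x0, y0, z0])

x, y, z = popt
h_fit = h(x, y, z, t_meas)
print(popt)                                      # should be close to [2.0, 1.3, 0.5]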
I need to (numerically) calculate the first and second derivatives of a function, for which I've attempted to use both splrep and UnivariateSpline to create splines for interpolating the function in order to take the derivatives.
However, it seems that there's an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which are (rapidly) oscillating.
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import pi, sin
k = linspace(0, 6.*pi, num=10000) #interval (0,6*pi) in 10'000 steps
y=[]
A = 1.e0 # Amplitude of sine function
for i in range(len(k)):
y.append(A*sin(k[i]))
tck = interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M=tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example).
I just realized that you're evaluating the spline at the original input points while using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try putting s=0 in instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100) # interval (0, 6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
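A small sketch of that rule of thumb, using the noisy sine setup from above with an assumed noise standard deviation (the noise level here is made up purely for illustration):

import numpy as np
from scipy import interpolate

A = 1.e-4                       # amplitude of the sine
std = 0.05 * A                  # assumed noise standard deviation (illustrative)

x = np.linspace(0, 6.*np.pi, num=100)
y = A*np.sin(x) + std*np.random.randn(x.size)

m = len(y)
s = m * std**2                  # midpoint of the recommended range
spl = interpolate.UnivariateSpline(x, y, k=5, s=s)
print(np.abs(spl(x) - A*np.sin(x)).max())   # should stay on the order of the noise level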