I am getting some errors when interpolating with RBF. Here is an example in 1D. I think that it has to do with how close my y values are to each other. Is there any fix for this?
import numpy as np
from scipy.interpolate import Rbf, interp1d
import matplotlib.pyplot as plt
x = np.array([0.77639752, 0.8136646, 0.85093168, 0.88819876, 0.92546584, 0.96273292, 1.])
y = np.array([0.97119742, 0.98089758, 0.98937066, 0.99540737, 0.99917735, 1., 0.99779049])
xi = np.linspace(min(x),max(x),1000)
fig = plt.figure(1)
plt.plot(x,y,'ko', label='Raw Data')
#RBF
rbfi = Rbf(x,y, function='linear')
plt.plot(xi,rbfi(xi), label='RBF (linear)')
rbfi = Rbf(x,y, function='cubic')
plt.plot(xi,rbfi(xi), label='RBF (cubic)')
#1D
f = interp1d(x,y, kind='cubic')
plt.plot(xi,f(xi), label='Interp1D (cubic)')
plt.plot(x,y,'ko', label=None)
plt.grid()
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.tight_layout()
plt.savefig('RBFTest.png')
Indeed, when implemented properly, RBF interpolation using the polyharmonic spline r^3 in 1D coincides with the natural cubic spline, and is the "smoothest" interpolant in the sense that it minimizes the integral of the squared second derivative.
Unfortunately, scipy.interpolate.Rbf, despite the name, does not appear to be a correct implementation of the RBF methods known from approximation theory. The error is around the line
self.nodes = linalg.solve(self.A, self.di)
They forgot the (linear) polynomial term in the construction of the polyharmonic RBF! The interpolant should have the form s(x) = sum_j w_j |x - x_j|^3 + c0 + c1*x, and the linear system should be the augmented one,
[[A, P], [P^T, 0]] [[w], [c]] = [[y], [0]],
where A_ij = |x_i - x_j|^3, the columns of P are 1 and x_i, and the zero block on the right-hand side enforces the orthogonality conditions on the weights w.
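For reference, here is a minimal hand-rolled sketch of that augmented system on the data from the question. This is the textbook construction, not what scipy.interpolate.Rbf actually does:
import numpy as np
import matplotlib.pyplot as plt

# data from the question
x = np.array([0.77639752, 0.8136646, 0.85093168, 0.88819876,
              0.92546584, 0.96273292, 1.])
y = np.array([0.97119742, 0.98089758, 0.98937066, 0.99540737,
              0.99917735, 1., 0.99779049])

# polyharmonic (r^3) RBF with an appended linear polynomial:
# solve [[A, P], [P^T, 0]] [[w], [c]] = [[y], [0]]
A = np.abs(x[:, None] - x[None, :])**3        # A_ij = |x_i - x_j|^3
P = np.column_stack([np.ones_like(x), x])     # linear polynomial basis: 1, x
n, m = len(x), P.shape[1]
M = np.block([[A, P], [P.T, np.zeros((m, m))]])
rhs = np.concatenate([y, np.zeros(m)])
coef = np.linalg.solve(M, rhs)
w, c = coef[:n], coef[n:]

# evaluate the interpolant on a fine grid
xi = np.linspace(x.min(), x.max(), 1000)
yi = np.abs(xi[:, None] - x[None, :])**3 @ w + np.column_stack([np.ones_like(xi), xi]) @ c

plt.plot(x, y, 'ko', xi, yi, '-')
plt.show()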
Now, one shouldn't trust interp1d blindly either. Asking what algorithm the interp1d function in scipy.interpolate uses suggests that it may not be a natural cubic spline but one with a different end condition. There is no mention of it in the help page: one needs to go into the Python source, and I'm afraid of what we will find there.
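One way to sidestep the ambiguity, at least for checking purposes, is scipy.interpolate.CubicSpline, which takes the end condition explicitly. A small sketch on the question's data; whether interp1d's cubic actually corresponds to the not-a-knot condition is exactly the sort of thing one would have to verify against the source:
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.77639752, 0.8136646, 0.85093168, 0.88819876,
              0.92546584, 0.96273292, 1.])
y = np.array([0.97119742, 0.98089758, 0.98937066, 0.99540737,
              0.99917735, 1., 0.99779049])

# natural spline: second derivative forced to zero at both ends
cs_natural = CubicSpline(x, y, bc_type='natural')
# not-a-knot: a common default end condition for cubic spline routines
cs_nak = CubicSpline(x, y, bc_type='not-a-knot')

xi = np.linspace(x.min(), x.max(), 1000)
# how much the choice of end condition matters for this data
print(np.max(np.abs(cs_natural(xi) - cs_nak(xi))))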
Is there a fix for this?
If it's serious work, make your own implementation of the RBF interpolation algorithm. Or, if you want to try a different implementation in Python, there is apparently one from the University of Michigan: https://rbf.readthedocs.io. If you do, could you post your findings here? If not, you've already done a good service by demonstrating an important SciPy error -- thank you!
I'm having a lot of trouble fitting this data, particularly getting the fit parameters to match the expected parameters.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def gaussian_model(x, a, b, c, d):  # add constant d
    return a*np.exp(-(x-b)**2/(2*c**2)) + d

x = np.linspace(0, 20, 100)
# xdata, ydata: my measured data (not shown here)
mu, cov = curve_fit(gaussian_model, xdata, ydata)
fit_A = mu[0]
fit_B = mu[1]
fit_C = mu[2]
fit_D = mu[3]
fit_y = gaussian_model(x, fit_A, fit_B, fit_C, fit_D)  # evaluate the fit on the plotting grid
print(mu)
plt.plot(x, fit_y)
plt.scatter(xdata, ydata)
plt.show()
Here's the plot
When I printed the parameters, I got values of -17 for amplitude, 2.6 for mean, -2.5 for standard deviation, and 110 for the base. This is very far off from what I would expect from the scatter plot. Any ideas why?
Also, I'm pretty new to coding, so any advice is helpful! Thanks everyone :)
Edit: figured out what was wrong! Just needed to add some guesses.
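For anyone landing here later, a minimal sketch of what "adding some guesses" can look like, reusing gaussian_model, xdata and ydata from the snippet above. The numbers are placeholders eyeballed from a scatter plot, not the asker's actual values:
# rough initial guesses: amplitude, mean, standard deviation, baseline
p0 = [3.0, 10.0, 2.0, 0.0]
mu, cov = curve_fit(gaussian_model, xdata, ydata, p0=p0)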
This is not an answer as expected.
This is an alternative method of fitting a Gaussian.
The process is not iterative and doesn't require initial "guessed" values of the parameters to start, as the usual methods do.
The general principle is explained with examples in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . It is a linear regression with respect to an integral equation whose solution is the Gaussian function.
If one wants a more accurate and/or more specific result according to some specified fitting criterion, one has to use software with a non-linear regression process. One can then use the above result as initial parameter values for a more robust iterative process.
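A rough sketch of the linearization idea behind that document, for the simplified case of a Gaussian without the constant offset d. The synthetic data and variable names are illustrative, not from the answer:
import numpy as np
from scipy.integrate import cumulative_trapezoid

# synthetic noisy Gaussian: a*exp(-(x-b)^2/(2*c^2)) with a=3, b=10, c=2
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 100)
y = 3.0*np.exp(-(x - 10.0)**2/(2*2.0**2)) + rng.normal(0, 0.05, x.size)

# Integrating y' = -(x-b)/c^2 * y gives  y(x) - y(x0) = (b/c^2)*S1 - (1/c^2)*S2,
# with S1 = integral of y and S2 = integral of x*y (trapezoidal rule here).
S1 = cumulative_trapezoid(y, x, initial=0.0)
S2 = cumulative_trapezoid(x*y, x, initial=0.0)
A, B = np.linalg.lstsq(np.column_stack([S1, S2]), y - y[0], rcond=None)[0]

c = np.sqrt(-1.0/B)          # B = -1/c^2
b = A*c**2                   # A = b/c^2

# with b and c fixed, the amplitude a follows from one more linear regression
g = np.exp(-(x - b)**2/(2*c**2))
a = np.dot(g, y)/np.dot(g, g)
print(a, b, c)               # should land close to 3, 10, 2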
I'm trying to interpolate a set of points using the UnivariateSpline function, but I'm getting the usual big oscillations at the edges of the set. Do you know any way to solve this?
My code looks like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x=pd.read_csv('thrustlaw.txt')
x1=x['Time(sec)']
y1=x['Thrust(N)']
def splines(x1, y1):
    from scipy.interpolate import UnivariateSpline
    si = UnivariateSpline(x1, y1, s=0, k=3)
    xs = np.linspace(0, x1[len(x1)-1], 10000)
    ys = si(xs)
    plt.plot(x1, y1, 'go')
    plt.plot(xs, ys)
    plt.ylabel("Thrust[N]")
    plt.xlabel("Time[sec]")
    plt.title("Thrust curve (splines)")
    plt.grid()
    plt.show()
splines(x1,y1)
Result:
Fitting high-degree polynomials or exact interpolating splines to noisy data tends to do this. An interpolation method that doesn't have this problem is the (unique) piecewise cubic polynomial that, for each pair of successive points i, i+1:
goes through x_i, y_i
goes through x_{i+1}, y_{i+1}
at x_i, has slope (y_{i+1} - y_{i-1}) / (x_{i+1} - x_{i-1})
at x_{i+1}, has slope (y_{i+2} - y_i) / (x_{i+2} - x_i)
So the tangent at each point is parallel to the straight line segment from the previous point to the next. This forces the derivative to be "somewhat similar" to the original data, so it doesn't oscillate wildly.
If I'm not mistaken, this is a Catmull-Rom spline, a particular case of a cubic Hermite spline. Maybe this question will help you implement it in scipy, or to find another interpolation method to your liking.
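If it helps, here is one possible sketch of that idea using scipy.interpolate.CubicHermiteSpline, with centered-difference slopes in the interior and one-sided slopes at the two endpoints. The thrust data below is made up, since the question's thrustlaw.txt isn't available:
import numpy as np
from scipy.interpolate import CubicHermiteSpline
import matplotlib.pyplot as plt

# stand-in data for the thrust curve from the question
x = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 6.5])
y = np.array([0.0, 80.0, 100.0, 90.0, 60.0, 10.0, 0.0])

# Catmull-Rom-style slopes: centered differences in the interior,
# one-sided differences at the two endpoints
slopes = np.empty_like(y)
slopes[1:-1] = (y[2:] - y[:-2]) / (x[2:] - x[:-2])
slopes[0] = (y[1] - y[0]) / (x[1] - x[0])
slopes[-1] = (y[-1] - y[-2]) / (x[-1] - x[-2])

spline = CubicHermiteSpline(x, y, slopes)
xs = np.linspace(x[0], x[-1], 1000)
plt.plot(x, y, 'go', xs, spline(xs), '-')
plt.show()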
How could I get the coordinates of the point in space with the greatest density?
I have this code to generate random points and run a density analysis on them.
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)
x_data = random_data(50)
y_data = random_data(50)
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]), bw_method=0.05)
b = plt.plot(x_data, y_data, 'ro')
df = pd.DataFrame({"x":x_data,"y":y_data})
p = sns.jointplot(data=df,x='x', y='y',kind='kde')
plt.show()
Thank you for help. :)
For starters, let me state the obvious by saying that sns.jointplot computes the kernel density on its own, so your kernel variable is as of yet unused.
Here's what sns.jointplot generated for me with a random sample:
There's a nice maximum at around (7, 5.4).
Here's what your kernel corresponds to:
x,y = np.mgrid[:10:100j, :10:100j] # 100 x 100 grid for plotting
z = kernel.pdf(np.array([x.ravel(),y.ravel()])).reshape(x.shape)
fig,ax = plt.subplots()
ax.contourf(x, y, z, levels=10)
ax.axis('scaled')
This will clearly not do: the density consists of sharp peaks centered on your input points, and you will never get an estimate similar to what sns.jointplot gave you this way.
We can easily fix this: you just have to drop the custom bw_method argument in the call to gaussian_kde:
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]))
x,y = np.mgrid[:10:100j, :10:100j] # 100 x 100 grid for plotting
z = kernel.pdf(np.array([x.ravel(),y.ravel()])).reshape(x.shape)
fig,ax = plt.subplots()
ax.contourf(x, y, z, levels=10)
ax.axis('scaled')
This looks just the way you want it:
Now you know that kernel.pdf is a bivariate function whose maximum you're looking for.
To find that maximum you should probably use something from scipy.optimize, for instance scipy.optimize.minimize (the trick is to look at the negative of your function, which turns maxima into minima).
Since your function will probably have a few local maxima, finding the global maximum reliably is not trivial. I would either evaluate the density on a coarse mesh over the relevant domain to find the best candidate and then polish it with the aforementioned minimize, or use a heavy-weight solver such as differential_evolution, a stochastic solver that's supposed to be good at finding the true global minimum of a function.
Root finding and minimization is always fickle business, so you will have to play around with your real data and available methods to find a reliable workflow that gives you your maximum.
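To make that concrete, here is a rough sketch of the coarse-mesh-then-polish approach; the random data merely stands in for yours:
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x_data = rng.uniform(0., 10., 50)
y_data = rng.uniform(0., 10., 50)
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]))  # default bandwidth

# coarse grid search for a good starting candidate
x, y = np.mgrid[0:10:50j, 0:10:50j]
grid = np.vstack([x.ravel(), y.ravel()])
x0 = grid[:, np.argmax(kernel.pdf(grid))]

# polish the candidate by minimizing the negative density
res = optimize.minimize(lambda p: -kernel.pdf(p)[0], x0)
print("density maximum near", res.x)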
There are two ways to specify the noise level for Gaussian Process Regression (GPR) in scikit-learn.
The first way is to specify the parameter alpha in the constructor of the class GaussianProcessRegressor which just adds values to the diagonal as expected.
The second way is incorporate the noise level in the kernel with WhiteKernel.
The documentation of GaussianProcessRegressor (see documentation here) says that specifying alpha is "equivalent to adding a WhiteKernel with c=alpha". However, I am experiencing a different behavior and want to find out what the reason is for that (and, of course, what the "correct" way or "truth" is).
Here is a code snippet plotting two different regression fits for a perturbed version of the function f(x)=x^2, although they should be identical:
import matplotlib.pyplot as plt
import numpy as np
import numpy.random as rnd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, RBF, WhiteKernel
rnd.seed(0)
n = 40
xs = np.linspace(-1, 1, num=n)
noise = 0.1
kernel1 = C()*RBF() + WhiteKernel(noise_level=noise)
kernel2 = C()*RBF()
data = xs**2 + rnd.multivariate_normal(mean=np.zeros(n), cov=noise*np.eye(n))
gpr1 = GaussianProcessRegressor(kernel=kernel1, alpha=0.0, optimizer=None)
gpr1.fit(xs[:, np.newaxis], data)
gpr2 = GaussianProcessRegressor(kernel=kernel2, alpha=noise, optimizer=None)
gpr2.fit(xs[:, np.newaxis], data)
xs_plt = np.linspace(-1., 1., num=100)
for gpr in [gpr1, gpr2]:
    pred, pred_std = gpr.predict(xs_plt[:, np.newaxis], return_std=True)

    plt.figure()
    plt.plot(xs_plt, pred, 'C0', lw=2)
    plt.scatter(xs, data, c='C1', s=20)
    plt.fill_between(xs_plt, pred - 1.96*pred_std, pred + 1.96*pred_std,
                     alpha=0.2, color='C0')
    plt.title("Kernel: %s\n Log-Likelihood: %.3f"
              % (gpr.kernel_, gpr.log_marginal_likelihood(gpr.kernel_.theta)),
              fontsize=12)
    plt.ylim(-1.2, 1.2)
    plt.tight_layout()
plt.show()
I have already looked into the implementation in the scikit-learn package, but was not able to find out what is going wrong. Or maybe I am just overlooking something and the outputs make perfect sense.
Does anyone have an idea of what is going on here or had a similar experience?
Thanks a lot!
I might be wrong here, but I believe the claim 'specifying alpha is "equivalent to adding a WhiteKernel with c=alpha"' is subtly incorrect.
When setting the GP regression noise via alpha, the noise is added only to K, the covariance between the training points. When adding a WhiteKernel, the noise is also added to K**, the covariance between the test points.
In your case, the test points and training points are identical. However, the three different matrices (K, K* and K**) are likely still created internally. This could lead to the discrepancy observed here.
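As a quick sanity check of that claim, one can compare the predictive standard deviations of the two models, reusing gpr1, gpr2 and xs from the question's snippet (this is my reading of the behaviour, not something the scikit-learn docs spell out):
# predictive standard deviations at the training inputs
_, std_white = gpr1.predict(xs[:, np.newaxis], return_std=True)  # WhiteKernel in the kernel
_, std_alpha = gpr2.predict(xs[:, np.newaxis], return_std=True)  # noise only in alpha
print(std_white.mean(), std_alpha.mean())  # the WhiteKernel variant reports the larger std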
I argue that the documentation is incorrect. See scikit-learn GitHub issue #13267 about this (which I opened).
In practice, what I do is fit a GP with the WhiteKernel and then take that noise level. I then add that value to alpha and recompute the necessary variables. An easier alternative is to make a new GP with alpha set and the same length scales, but not fit it.
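A sketch of that workflow with scikit-learn. The data here is synthetic, and the attribute path kernel_.k2.noise_level assumes the kernel is written as a sum with the WhiteKernel last:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, RBF, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 40)[:, np.newaxis]
y = X.ravel()**2 + rng.normal(0, 0.1, X.shape[0])

# 1) fit with a WhiteKernel so the noise level is learned
gpr_white = GaussianProcessRegressor(kernel=C()*RBF() + WhiteKernel())
gpr_white.fit(X, y)
noise = gpr_white.kernel_.k2.noise_level       # learned noise variance

# 2) rebuild the GP with the noise moved into alpha, keeping the learned
#    RBF hyperparameters and disabling further optimization
gpr_alpha = GaussianProcessRegressor(kernel=gpr_white.kernel_.k1,
                                     alpha=noise, optimizer=None)
gpr_alpha.fit(X, y)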
I should note that it is not universally accepted whether or not this is the right approach. I had this discussion with a colleague, and we came to the following conclusion. This pertains to the data's noise being experimental error:
If you want to sample the GP to predict what a new experiment with more independent measurements would give, you want the WhiteKernel
If you want to sample the possible underlying truth, you do not want the WhiteKernel since you want a smooth response
Maybe you can use the GPflow package, which makes separate predictions for the latent function f and the observations y (f + noise); see https://gpflow.readthedocs.io/en/awav-documentation/notebooks/regression.html
m.predict_f returns the mean and variance of the latent function (f) at the points Xnew.
m.predict_y returns the mean and variance of a new data point (i.e. includes the noise variance).
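A rough sketch of that route, assuming the GPflow 2.x API (check the linked notebook for the version you actually have); the data here is synthetic:
import numpy as np
import gpflow

X = np.linspace(-1, 1, 40)[:, None]
Y = X**2 + np.random.default_rng(0).normal(0, 0.1, X.shape)
Xnew = np.linspace(-1, 1, 100)[:, None]

model = gpflow.models.GPR(data=(X, Y), kernel=gpflow.kernels.SquaredExponential())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

f_mean, f_var = model.predict_f(Xnew)   # latent function f: excludes noise variance
y_mean, y_var = model.predict_y(Xnew)   # new observations y: includes noise variance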
I have the following graph that I want to digitize to a high-quality publication grade figure using Python and Matplotlib:
I used a digitizer program to grab a few samples from one of the 3 data sets:
x_data = np.array([
1,
1.2371,
1.6809,
2.89151,
5.13304,
9.23238,
])
y_data = np.array([
0.0688824,
0.0490012,
0.0332843,
0.0235889,
0.0222304,
0.0245952,
])
I have already tried 3 different methods of fitting a curve through these data points. The first method was to draw a spline through the points using spline from scipy.interpolate.
This results in (with the actual data points drawn as blue markers):
This is obviously no good.
My second attempt was to draw a curve fit using a series of polynomials of different orders with curve_fit from scipy.optimize. Even up to a fourth-order polynomial the answer is useless (the lower-order ones were even more useless):
Finally, I used interp1d from scipy.interpolate to try and interpolate between the data points. Linear interpolation obviously yields expected results, but the lines are straight and the whole purpose of this exercise is to get a nice smooth curve:
If I then use cubic interpolation I get a rubbish result; however, quadratic interpolation yields a slightly better result:
But it's not quite there yet, and I don't think interp1d can do higher order interpolation.
Is there anyone out there who has a good method of doing this? Maybe I would be better off trying to do it in IPE or something?
Thank you!
A standard cubic spline is not very good at producing reasonable-looking interpolations between data points that are very unevenly spaced. Fortunately, there are plenty of other interpolation algorithms and SciPy provides a number of them. Here are a few applied to your data:
import numpy as np
from scipy.interpolate import UnivariateSpline, Akima1DInterpolator, PchipInterpolator
import matplotlib.pyplot as plt

x_data = np.array([1, 1.2371, 1.6809, 2.89151, 5.13304, 9.23238])
y_data = np.array([0.0688824, 0.0490012, 0.0332843, 0.0235889, 0.0222304, 0.0245952])

x_data_smooth = np.linspace(min(x_data), max(x_data), 1000)

fig, ax = plt.subplots(1, 1)

# quadratic smoothing spline forced through every point (s=0) -- blue
spl = UnivariateSpline(x_data, y_data, s=0, k=2)
y_data_smooth = spl(x_data_smooth)
ax.plot(x_data_smooth, y_data_smooth, 'b')

# Akima interpolation: a local method that resists overshoot -- green
bi = Akima1DInterpolator(x_data, y_data)
y_data_smooth = bi(x_data_smooth)
ax.plot(x_data_smooth, y_data_smooth, 'g')

# PCHIP: keeps the fit monotonic between data points -- black
bi = PchipInterpolator(x_data, y_data)
y_data_smooth = bi(x_data_smooth)
ax.plot(x_data_smooth, y_data_smooth, 'k')

ax.scatter(x_data, y_data)
plt.show()
I suggest looking through these, and also a few others, and finding one that matches what you think looks right. Also, though, you may want to sample a few more points. For example, I think the PCHIP algorithm wants to keep the fit monotonic between data points, so digitizing your minimum point would be useful (and probably a good idea regardless of the algorithm you use).