Getting spline equation from UnivariateSpline object - python

I'm using UnivariateSpline to construct piecewise polynomials for some data that I have. I would then like to use these splines in other programs (either in C or FORTRAN) and so I would like to understand the equation behind the generated spline.
Here is my code:
import numpy as np
import scipy as sp
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
import bisect
data = np.loadtxt('test_C12H26.dat')
Tmid = 800.0
print "Tmid", Tmid
nmid = bisect.bisect(data[:,0],Tmid)
fig = plt.figure()
plt.plot(data[:,0], data[:,7],ls='',marker='o',markevery=20)
npts = len(data[:,0])
#print "npts", npts
w = np.ones(npts)
w[0] = 100
w[nmid] = 100
w[npts-1] = 100
spline1 = UnivariateSpline(data[:nmid,0],data[:nmid,7],s=1,w=w[:nmid])
coeffs = spline1.get_coeffs()
print coeffs
print spline1.get_knots()
print spline1.get_residual()
print coeffs[0] + coeffs[1] * (data[0,0] - data[0,0]) \
+ coeffs[2] * (data[0,0] - data[0,0])**2 \
+ coeffs[3] * (data[0,0] - data[0,0])**3, \
data[0,7]
print coeffs[0] + coeffs[1] * (data[nmid,0] - data[0,0]) \
+ coeffs[2] * (data[nmid,0] - data[0,0])**2 \
+ coeffs[3] * (data[nmid,0] - data[0,0])**3, \
data[nmid,7]
print Tmid,data[-1,0]
spline2 = UnivariateSpline(data[nmid-1:,0],data[nmid-1:,7],s=1,w=w[nmid-1:])
print spline2.get_coeffs()
print spline2.get_knots()
print spline2.get_residual()
plt.plot(data[:,0],spline1(data[:,0]))
plt.plot(data[:,0],spline2(data[:,0]))
plt.savefig('test.png')
And here is the resulting plot. I believe I have valid splines for each interval, but it looks like my spline equation is not correct. I can't find any reference to what it is supposed to be in the SciPy documentation. Does anybody know? Thanks!

The scipy documentation does not have anything to say about how one can take the coefficients and manually generate the spline curve. However, it is possible to figure out how to do this from the existing literature on B-splines. The following function bspleval shows how to construct the B-spline basis functions (the matrix B in the code), from which one can easily generate the spline curve by multiplying the coefficients with the highest-order basis functions and summing:
import numpy as np
import matplotlib.pyplot as plt

def bspleval(x, knots, coeffs, order, debug=False):
    '''
    Evaluate a B-spline at a set of points.
    Parameters
    ----------
    x : list or ndarray
        The set of points at which to evaluate the spline.
    knots : list or ndarray
        The set of knots used to define the spline.
    coeffs : list or ndarray
        The set of spline coefficients.
    order : int
        The order of the spline (the polynomial degree k, e.g. 3 for cubic).
    Returns
    -------
    y : ndarray
        The value of the spline at each point in x.
    '''
    x = np.asarray(x, dtype=float)   # allow plain lists as documented
    k = order
    t = knots
    m = len(t)                       # np.alen was removed from NumPy; len() is equivalent here
    npts = len(x)
    B = np.zeros((m-1, k+1, npts))
    if debug:
        print('k=%i, m=%i, npts=%i' % (k, m, npts))
        print('t=', t)
        print('coeffs=', coeffs)
    ## Create the zero-order B-spline basis functions.
    for i in range(m-1):
        B[i,0,:] = np.logical_and(x >= t[i], x < t[i+1]).astype(float)
    if (k == 0):
        B[m-2,0,-1] = 1.0
    ## Next iteratively define the higher-order basis functions, working from lower order to higher.
    for j in range(1, k+1):
        for i in range(m-j-1):
            if (t[i+j] - t[i] == 0.0):
                first_term = 0.0
            else:
                first_term = ((x - t[i]) / (t[i+j] - t[i])) * B[i,j-1,:]
            if (t[i+j+1] - t[i+1] == 0.0):
                second_term = 0.0
            else:
                second_term = ((t[i+j+1] - x) / (t[i+j+1] - t[i+1])) * B[i+1,j-1,:]
            B[i,j,:] = first_term + second_term
        B[m-j-2,j,-1] = 1.0
    if debug:
        plt.figure()
        for i in range(m-1):
            plt.plot(x, B[i,k,:])
        plt.title('B-spline basis functions')
    ## Evaluate the spline by multiplying the coefficients with the highest-order basis functions.
    y = np.zeros(npts)
    for i in range(m-k-1):
        y += coeffs[i] * B[i,k,:]
    if debug:
        plt.figure()
        plt.plot(x, y)
        plt.title('spline curve')
        plt.show()
    return y
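For reference, the loops in bspleval appear to follow the standard Cox-de Boor recursion for B-spline basis functions. Using the same names as the code (knots t_i, degree k, m knots in total, coefficients c_i), and taking any term whose denominator is zero to be zero, a sketch of the formulas is:

```latex
B_{i,0}(x) =
\begin{cases}
1, & t_i \le x < t_{i+1} \\
0, & \text{otherwise}
\end{cases}

B_{i,j}(x) = \frac{x - t_i}{t_{i+j} - t_i}\, B_{i,j-1}(x)
           + \frac{t_{i+j+1} - x}{t_{i+j+1} - t_{i+1}}\, B_{i+1,j-1}(x),
\qquad j = 1, \dots, k

s(x) = \sum_{i=0}^{m-k-2} c_i\, B_{i,k}(x)
```

The half-open interval in the first formula leaves the right end of the knot range uncovered, which is presumably why the code sets the basis value at the last evaluation point to 1 by hand.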
As an example of how this can be used with SciPy's existing univariate spline functions, the following script takes the input data and fits a spline using both SciPy's functional interface (splrep/splev) and its object-oriented interface (UnivariateSpline). Taking the knot points and coefficients from the fit and passing them to our manually written routine bspleval reproduces the same curve. Note that the difference between the manually evaluated curve and SciPy's own evaluation is so small that it is almost certainly floating-point noise.
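import matplotlib.pyplot as plt
from numpy import array
from scipy.interpolate import splrep, splev, UnivariateSpline
x = array([-273.0, -176.4, -79.8, 16.9, 113.5, 210.1, 306.8, 403.4, 500.0])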
y = array([2.25927498e-53, 2.56028619e-03, 8.64512988e-01, 6.27456769e+00, 1.73894734e+01,
3.29052124e+01, 5.14612316e+01, 7.20531200e+01, 9.40718450e+01])
x_nodes = array([-273.0, -263.5, -234.8, -187.1, -120.3, -34.4, 70.6, 194.6, 337.8, 500.0])
y_nodes = array([2.25927498e-53, 3.83520726e-46, 8.46685318e-11, 6.10568083e-04, 1.82380809e-01,
2.66344008e+00, 1.18164677e+01, 3.01811501e+01, 5.78812583e+01, 9.40718450e+01])
## Now get scipy's spline fit.
k = 3
tck = splrep(x_nodes, y_nodes, k=k, s=0)
knots = tck[0]
coeffs = tck[1]
print('knot points=', knots)
print('coefficients=', coeffs)
## Now try scipy's object-oriented version. It represents the same spline as "tck", but get_knots()
## omits the repeated boundary knots and get_coeffs() omits the trailing zero padding found in tck[1].
uspline = UnivariateSpline(x_nodes, y_nodes, s=0)
uspline_knots = uspline.get_knots()
uspline_coeffs = uspline.get_coeffs()
## Here are scipy's native spline evaluation methods. Again, "ytck" and "y_uspline" are exactly equal.
ytck = splev(x, tck)
y_uspline = uspline(x)
y_knots = uspline(knots)
## Now let's try our manually-calculated evaluation function.
y_eval = bspleval(x, knots, coeffs, k, debug=False)
plt.plot(x, ytck, label='tck')
plt.plot(x, y_uspline, label='uspline')
plt.plot(x, y_eval, label='manual')
## Next plot the knots and nodes.
plt.plot(x_nodes, y_nodes, 'ko', markersize=7, label='input nodes') ## nodes
plt.plot(knots, y_knots, 'mo', markersize=5, label='tck knots') ## knots
plt.xlim((-300.0,530.0))
plt.legend(loc='best', prop={'size':14})
plt.figure()
plt.title('difference')
plt.plot(x, ytck-y_uspline, label='tck-uspl')
plt.plot(x, ytck-y_eval, label='tck-manual')
plt.legend(loc='best', prop={'size':14})
plt.show()

The coefficients given by get_coeffs are B-spline (Basis spline) coefficients, described here: B-spline (Wikipedia)
Probably whatever other program/language you will be using has an implementation. Supply the knot locations and coefficients, and you should be all set.
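If the target C or Fortran code has no B-spline routine, one option is to convert the spline into ordinary per-interval polynomial coefficients first. The sketch below uses scipy.interpolate.PPoly.from_spline on a tck tuple from splrep. The sample data and the helper name eval_piecewise are made up for illustration. On the interval starting at breakpoint x_i, the curve is sum over m of c[m, i] * (x - x_i)**(k - m), with the highest power first.

```python
import numpy as np
from scipy.interpolate import splrep, splev, PPoly

# Hypothetical sample data standing in for the real measurements.
x_data = np.linspace(0.0, 10.0, 25)
y_data = np.sin(x_data)

tck = splrep(x_data, y_data, k=3, s=0)   # (knots, B-spline coefficients, degree)
pp = PPoly.from_spline(tck)              # same curve as piecewise polynomials

def eval_piecewise(breaks, c, xq):
    """Evaluate sum_m c[m, i] * (xq - breaks[i])**(k - m) on the interval containing xq."""
    k = c.shape[0] - 1
    # Index of the interval [breaks[i], breaks[i+1]) that contains each query point.
    i = np.clip(np.searchsorted(breaks, xq, side='right') - 1, 0, len(breaks) - 2)
    dx = xq - breaks[i]
    y = np.zeros_like(xq, dtype=float)
    for m in range(k + 1):               # Horner's rule, highest power first
        y = y * dx + c[m, i]
    return y

xq = np.linspace(0.5, 9.5, 50)
y_manual = eval_piecewise(pp.x, pp.c, xq)
y_scipy = splev(xq, tck)
print(np.max(np.abs(y_manual - y_scipy)))
```

The arrays pp.x (breakpoints) and pp.c (coefficients, one column per interval) are plain numbers that can be written to a file and evaluated with the same few lines of arithmetic in any language.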

Related

Inverse of numpy.gradient function

I need to create a function which would be the inverse of the np.gradient function.
Where the Vx,Vy arrays (Velocity component vectors) are the input and the output would be an array of anti-derivatives (Arrival Time) at the datapoints x,y.
I have data on a (x,y) grid with scalar values (time) at each point.
I have used the numpy gradient function and linear interpolation to determine the gradient vector Velocity (Vx,Vy) at each point (See below).
I have achieved this by:
#LinearTriInterpolator applied to a delaunay triangular mesh
LTI= LinearTriInterpolator(masked_triang, time_array)
#Gradient requested at the mesh nodes:
(Vx, Vy) = LTI.gradient(triang.x, triang.y)
The first image below shows the velocity vectors at each point, and the point labels represent the time value which formed the derivatives (Vx,Vy)
The next image shows the resultant scalar value of the derivatives (Vx,Vy) plotted as a colored contour graph with associated node labels.
So my challenge is:
I need to reverse the process!
Using the gradient vectors (Vx,Vy) or the resultant scalar value to determine the original Time-Value at that point.
Is this possible?
Knowing that the numpy.gradient function is computed using second-order accurate central differences at the interior points and either first- or second-order accurate one-sided (forward or backward) differences at the boundaries, I am sure there is a function which would reverse this process.
I was thinking that taking a line derivative between the original point (t=0 at x1,y1) to any point (xi,yi) over the Vx,Vy plane would give me the sum of the velocity components. I could then divide this value by the distance between the two points to get the time taken..
Would this approach work? And if so, which numpy integrate function would be best applied?
An example of my data can be found here [http://www.filedropper.com/calculatearrivaltimefromgradientvalues060820]
Your help would be greatly appreciated
EDIT:
Maybe this simplified drawing might help understand where I'm trying to get to..
EDIT:
Thanks to @Aguy, who has contributed to this code. I have tried to get a more accurate representation using a meshgrid with a spacing of 0.5 x 0.5 m and calculating the gradient at each mesh point; however, I am not able to integrate it properly. I also have some edge effects which are affecting the results and which I don't know how to correct.
import numpy as np
from scipy import interpolate
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D
#Createmesh grid with a spacing of 0.5 x 0.5
stepx = 0.5
stepy = 0.5
xx = np.arange(min(x), max(x), stepx)
yy = np.arange(min(y), max(y), stepy)
xgrid, ygrid = np.meshgrid(xx, yy)
grid_z1 = interpolate.griddata((x,y), Arrival_Time, (xgrid, ygrid), method='linear') #Interpolating the Time values
#Formatdata
X = np.ravel(xgrid)
Y= np.ravel(ygrid)
zs = np.ravel(grid_z1)
Z = zs.reshape(X.shape)
#Calculate Gradient
(dx,dy) = np.gradient(grid_z1) #Find gradient for points on meshgrid
Velocity_dx= dx/stepx #velocity ms/m
Velocity_dy= dy/stepx #velocity ms/m
Resultant = (Velocity_dx**2 + Velocity_dy**2)**0.5 #Resultant scalar value ms/m
Resultant = np.ravel(Resultant)
#Plot Original Data F(X,Y) on the meshgrid
fig = pyplot.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x,y,Arrival_Time,color='r')
ax.plot_trisurf(X, Y, Z)
ax.set_xlabel('X-Coordinates')
ax.set_ylabel('Y-Coordinates')
ax.set_zlabel('Time (ms)')
pyplot.show()
#Plot the Derivative of f'(X,Y) on the meshgrid
fig = pyplot.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(X,Y,Resultant,color='r',s=0.2)
ax.plot_trisurf(X, Y, Resultant)
ax.set_xlabel('X-Coordinates')
ax.set_ylabel('Y-Coordinates')
ax.set_zlabel('Velocity (ms/m)')
pyplot.show()
#Integrate to compare the original data input
dxintegral = np.nancumsum(Velocity_dx, axis=1)*stepx
dyintegral = np.nancumsum(Velocity_dy, axis=0)*stepy
valintegral = np.ma.zeros(dxintegral.shape)
for i in range(len(yy)):
    for j in range(len(xx)):
        valintegral[i, j] = np.ma.sum([dxintegral[0, len(xx) // 2],
                                       dyintegral[i, len(yy) // 2], dxintegral[i, j], - dxintegral[i, len(xx) // 2]])
valintegral = valintegral * np.isfinite(dxintegral)
Now the np.gradient is applied at every meshnode (dx,dy) = np.gradient(grid_z1)
In my process I would then analyse the gradient values above and make some adjustments (there are some unusual edge effects being created which I need to rectify), and would then integrate the values to get back to a surface which should be very similar to the f(x,y) shown above.
I need some help adjusting the integration function:
#Integrate to compare the original data input
dxintegral = np.nancumsum(Velocity_dx, axis=1)*stepx
dyintegral = np.nancumsum(Velocity_dy, axis=0)*stepy
valintegral = np.ma.zeros(dxintegral.shape)
for i in range(len(yy)):
    for j in range(len(xx)):
        valintegral[i, j] = np.ma.sum([dxintegral[0, len(xx) // 2],
                                       dyintegral[i, len(yy) // 2], dxintegral[i, j], - dxintegral[i, len(xx) // 2]])
valintegral = valintegral * np.isfinite(dxintegral)
And now I need to calculate the new 'Time' values at the original (x,y) point locations.
UPDATE (08-09-20): I am getting some promising results using the help from @Aguy. The results can be seen below (with the blue contours representing the original data, and the red contours representing the integrated values).
I am still working on an integration approach which can remove the inaccuracies in the areas near min(y) and max(y).
from matplotlib.tri import (Triangulation, UniformTriRefiner,
CubicTriInterpolator,LinearTriInterpolator,TriInterpolator,TriAnalyzer)
import pandas as pd
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
#-------------------------------------------------------------------------
# STEP 1: Import data from Excel file, and set variables
#-------------------------------------------------------------------------
df_initial = pd.read_excel(
r'C:\Users\morga\PycharmProjects\venv\Development\Trial'
r'.xlsx')
Inputdata can be found here link
df_initial = df_initial .sort_values(by='Delay', ascending=True) #Update dataframe and sort by Delay
x = df_initial ['X'].to_numpy()
y = df_initial ['Y'].to_numpy()
Arrival_Time = df_initial ['Delay'].to_numpy()
# Createmesh grid with a spacing of 0.5 x 0.5
stepx = 0.5
stepy = 0.5
xx = np.arange(min(x), max(x), stepx)
yy = np.arange(min(y), max(y), stepy)
xgrid, ygrid = np.meshgrid(xx, yy)
grid_z1 = interpolate.griddata((x, y), Arrival_Time, (xgrid, ygrid), method='linear') # Interpolating the Time values
# Calculate Gradient (velocity ms/m)
(dy, dx) = np.gradient(grid_z1) # Find gradient for points on meshgrid
Velocity_dx = dx / stepx # x velocity component ms/m
Velocity_dy = dy / stepx # y velocity component ms/m
# Integrate to compare the original data input
dxintegral = np.nancumsum(Velocity_dx, axis=1) * stepx
dyintegral = np.nancumsum(Velocity_dy, axis=0) * stepy
valintegral = np.ma.zeros(dxintegral.shape) # Makes an array filled with 0's the same shape as dx integral
for i in range(len(yy)):
    for j in range(len(xx)):
        valintegral[i, j] = np.ma.sum(
            [dxintegral[0, len(xx) // 2], dyintegral[i, len(xx) // 2], dxintegral[i, j], - dxintegral[i, len(xx) // 2]])
valintegral[np.isnan(dx)] = np.nan
min_value = np.nanmin(valintegral)
valintegral = valintegral + (min_value * -1)
##Plot Results
fig = plt.figure()
ax = fig.add_subplot()
ax.scatter(x, y, color='black', s=7, zorder=3)
ax.set_xlabel('X-Coordinates')
ax.set_ylabel('Y-Coordinates')
ax.contour(xgrid, ygrid, valintegral, levels=50, colors='red', zorder=2)
ax.contour(xgrid, ygrid, grid_z1, levels=50, colors='blue', zorder=1)
ax.set_aspect('equal')
plt.show()
TL;DR;
You have multiple challenges to address in this issue, mainly:
Potential reconstruction (scalar field) from its gradient (vector field)
But also:
Observation in a concave hull with non rectangular grid;
Numerical 2D line integration and numerical inaccuracy;
It seems it can be solved by choosing a suitable interpolant and a smart way to integrate (as pointed out by @Aguy).
MCVE
First, let's build an MCVE to highlight the key points mentioned above.
Dataset
We recreate a scalar field and its gradient.
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
def f(x, y):
    return x**2 + x*y + 2*y + 1
Nx, Ny = 21, 17
xl = np.linspace(-3, 3, Nx)
yl = np.linspace(-2, 2, Ny)
X, Y = np.meshgrid(xl, yl)
Z = f(X, Y)
zl = np.arange(np.floor(Z.min()), np.ceil(Z.max())+1, 2)
dZdy, dZdx = np.gradient(Z, yl, xl, edge_order=1)
V = np.hypot(dZdx, dZdy)
The scalar field looks like:
axe = plt.axes(projection='3d')
axe.plot_surface(X, Y, Z, cmap='jet', alpha=0.5)
axe.view_init(elev=25, azim=-45)
And, the vector field looks like:
axe = plt.contour(X, Y, Z, zl, cmap='jet')
axe.axes.quiver(X, Y, dZdx, dZdy, V, units='x', pivot='tip', cmap='jet')
axe.axes.set_aspect('equal')
axe.axes.grid()
As expected, the gradient is normal to the potential level curves. We also plot the gradient magnitude:
axe = plt.contour(X, Y, V, 10, cmap='jet')
axe.axes.set_aspect('equal')
axe.axes.grid()
Raw field reconstruction
If we naively reconstruct the scalar field from the gradient:
SdZx = np.cumsum(dZdx, axis=1)*np.diff(xl)[0]
SdZy = np.cumsum(dZdy, axis=0)*np.diff(yl)[0]
Zhat = np.zeros(SdZx.shape)
for i in range(Zhat.shape[0]):
    for j in range(Zhat.shape[1]):
        Zhat[i,j] += np.sum([SdZy[i,0], -SdZy[0,0], SdZx[i,j], -SdZx[i,0]])
Zhat += Z[0,0] - Zhat[0,0]
We can see the global result is roughly correct, but levels are less accurate where the gradient magnitude is low:
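As a side note, the double loop in the naive reconstruction can also be written with NumPy broadcasting. The sketch below rebuilds the same synthetic field as the MCVE (the setup is repeated here only so the snippet runs on its own) and checks that the loop and the broadcast expression agree. The names Zloop and Zvec are introduced for this illustration.

```python
import numpy as np

# Synthetic field and its numerical gradient, mirroring the MCVE setup.
xl = np.linspace(-3, 3, 21)
yl = np.linspace(-2, 2, 17)
X, Y = np.meshgrid(xl, yl)
Z = X**2 + X*Y + 2*Y + 1
dZdy, dZdx = np.gradient(Z, yl, xl, edge_order=1)

# Cumulative sums along each axis, scaled by the (uniform) grid step.
SdZx = np.cumsum(dZdx, axis=1) * np.diff(xl)[0]
SdZy = np.cumsum(dZdy, axis=0) * np.diff(yl)[0]

# Loop version, as in the text.
Zloop = np.zeros(SdZx.shape)
for i in range(Zloop.shape[0]):
    for j in range(Zloop.shape[1]):
        Zloop[i, j] = SdZy[i, 0] - SdZy[0, 0] + SdZx[i, j] - SdZx[i, 0]

# Broadcasting version: [:, [0]] keeps column 0 as a 2-D column vector.
Zvec = SdZy[:, [0]] - SdZy[0, 0] + SdZx - SdZx[:, [0]]

# Fix the arbitrary integration constant at the first grid node.
Zloop += Z[0, 0] - Zloop[0, 0]
Zvec += Z[0, 0] - Zvec[0, 0]
print(np.max(np.abs(Zloop - Zvec)))
```

The broadcast form does the same arithmetic in a single array expression, which matters once the grid is refined to a few hundred points per side.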
Interpolated field reconstruction
If we increase the grid resolution and pick a specific interpolant (as is usual when dealing with a mesh grid), we can get a finer field reconstruction:
r = np.stack([X.ravel(), Y.ravel()]).T
Sx = interpolate.CloughTocher2DInterpolator(r, dZdx.ravel())
Sy = interpolate.CloughTocher2DInterpolator(r, dZdy.ravel())
Nx, Ny = 200, 200
xli = np.linspace(xl.min(), xl.max(), Nx)
yli = np.linspace(yl.min(), yl.max(), Ny)
Xi, Yi = np.meshgrid(xli, yli)
ri = np.stack([Xi.ravel(), Yi.ravel()]).T
dZdxi = Sx(ri).reshape(Xi.shape)
dZdyi = Sy(ri).reshape(Xi.shape)
SdZxi = np.cumsum(dZdxi, axis=1)*np.diff(xli)[0]
SdZyi = np.cumsum(dZdyi, axis=0)*np.diff(yli)[0]
Zhati = np.zeros(SdZxi.shape)
for i in range(Zhati.shape[0]):
    for j in range(Zhati.shape[1]):
        Zhati[i,j] += np.sum([SdZyi[i,0], -SdZyi[0,0], SdZxi[i,j], -SdZxi[i,0]])
Zhati += Z[0,0] - Zhati[0,0]
This performs noticeably better:
So basically, increasing the grid resolution with a suitable interpolant may help you get a more accurate result. The interpolant also solves the need to get a regular rectangular grid from a triangular mesh in order to perform the integration.
Concave and convex hull
You have also pointed out inaccuracy at the edges. This results from the combination of the interpolant choice and the integration methodology. The integration methodology fails to properly compute the scalar field when it reaches a concave region with few interpolated points. The problem disappears when choosing a mesh-free interpolant that is able to extrapolate.
To illustrate it, let's remove some data from our MCVE:
q = np.full(dZdx.shape, False)
q[0:6,5:11] = True
q[-6:,-6:] = True
dZdx[q] = np.nan
dZdy[q] = np.nan
Then the interpolant can be constructed as follows:
q2 = ~np.isnan(dZdx.ravel())
r = np.stack([X.ravel(), Y.ravel()]).T[q2,:]
Sx = interpolate.CloughTocher2DInterpolator(r, dZdx.ravel()[q2])
Sy = interpolate.CloughTocher2DInterpolator(r, dZdy.ravel()[q2])
Performing the integration, we see that in addition to the classical edge effect we have less accurate values in concave regions (the wavy dot-dash lines where the hull is concave), and we have no data outside the convex hull because Clough-Tocher is a mesh-based interpolant:
Vl = np.arange(0, 11, 1)
axe = plt.contour(X, Y, np.hypot(dZdx, dZdy), Vl, cmap='jet')
axe.axes.contour(Xi, Yi, np.hypot(dZdxi, dZdyi), Vl, cmap='jet', linestyles='-.')
axe.axes.set_aspect('equal')
axe.axes.grid()
So basically, the errors we see in the corners are most likely due to integration issues combined with interpolation being limited to the convex hull.
To overcome this we can choose a different interpolant, such as RBF (radial basis functions), which can produce values outside the convex hull:
Sx = interpolate.Rbf(r[:,0], r[:,1], dZdx.ravel()[q2], function='thin_plate')
Sy = interpolate.Rbf(r[:,0], r[:,1], dZdy.ravel()[q2], function='thin_plate')
dZdxi = Sx(ri[:,0], ri[:,1]).reshape(Xi.shape)
dZdyi = Sy(ri[:,0], ri[:,1]).reshape(Xi.shape)
Notice the slightly different interface of this interpolator (mind how the parameters are passed).
The result is the following:
We can see that the region outside the convex hull can now be extrapolated (RBFs are mesh-free). So choosing a suitable interpolant is definitely a key point in solving your problem. But keep in mind that extrapolation, even when it appears to perform well, is not backed by data and can be misleading.
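Newer SciPy releases (1.7 and later) also provide scipy.interpolate.RBFInterpolator, which takes the sample sites as a single (n, 2) array rather than separate coordinate arrays. The sketch below is not the code used for the figures in this answer. It swaps that class in for Rbf and uses a made-up planar field so that the extrapolated values can be checked against the exact answer.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(60, 2))       # scattered sample sites (hypothetical)
vals = 1.0 + 2.0 * pts[:, 0] + 3.0 * pts[:, 1]   # a plane, so the exact answer is known

# Thin-plate-spline kernel, comparable to function='thin_plate' in the older Rbf class.
S = RBFInterpolator(pts, vals, kernel='thin_plate_spline')

# Query points well outside the convex hull of the samples.
outside = np.array([[1.5, 1.5], [-2.0, 0.5]])
pred = S(outside)
exact = 1.0 + 2.0 * outside[:, 0] + 3.0 * outside[:, 1]
print(pred, exact)
```

A plane is the easy case, because the thin-plate spline carries a linear polynomial term. A real arrival-time field will not extrapolate this cleanly, which is the caution raised above.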
Solving your problem
The answer provided by @Aguy is perfectly fine, as it sets up a clever way to integrate that is not disturbed by missing points outside the convex hull. But, as you mentioned, there is inaccuracy in concave regions inside the convex hull.
If you wish to remove the edge effect you detected, you will have to resort to an interpolant that can extrapolate as well, or find another way to integrate.
Interpolant change
Using RBF interpolant seems to solve your problem. Here is the complete code:
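import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import interpolate
df = pd.read_excel('./Trial-Wireup 2.xlsx')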
x = df['X'].to_numpy()
y = df['Y'].to_numpy()
z = df['Delay'].to_numpy()
r = np.stack([x, y]).T
#S = interpolate.CloughTocher2DInterpolator(r, z)
#S = interpolate.LinearNDInterpolator(r, z)
S = interpolate.Rbf(x, y, z, epsilon=0.1, function='thin_plate')
N = 200
xl = np.linspace(x.min(), x.max(), N)
yl = np.linspace(y.min(), y.max(), N)
X, Y = np.meshgrid(xl, yl)
#Zp = S(np.stack([X.ravel(), Y.ravel()]).T)
Zp = S(X.ravel(), Y.ravel())
Z = Zp.reshape(X.shape)
dZdy, dZdx = np.gradient(Z, yl, xl, edge_order=1)
SdZx = np.nancumsum(dZdx, axis=1)*np.diff(xl)[0]
SdZy = np.nancumsum(dZdy, axis=0)*np.diff(yl)[0]
Zhat = np.zeros(SdZx.shape)
for i in range(Zhat.shape[0]):
    for j in range(Zhat.shape[1]):
        #Zhat[i,j] += np.nansum([SdZy[i,0], -SdZy[0,0], SdZx[i,j], -SdZx[i,0]])
        Zhat[i,j] += np.nansum([SdZx[0,N//2], SdZy[i,N//2], SdZx[i,j], -SdZx[i,N//2]])
Zhat += Z[100,100] - Zhat[100,100]
lz = np.linspace(0, 5000, 20)
axe = plt.contour(X, Y, Z, lz, cmap='jet')
axe = plt.contour(X, Y, Zhat, lz, cmap='jet', linestyles=':')
axe.axes.plot(x, y, '.', markersize=1)
axe.axes.set_aspect('equal')
axe.axes.grid()
Which graphically renders as follows:
The edge effect is gone because the RBF interpolant can extrapolate over the whole grid. You can confirm this by comparing with the results of mesh-based interpolants.
Linear
Clough Tocher
Integration variable order change
We can also try to find a better way to integrate and mitigate the edge effect, e.g. by changing the order of the integration variables:
Zhat[i,j] += np.nansum([SdZy[N//2,0], SdZx[N//2,j], SdZy[i,j], -SdZy[N//2,j]])
With a classic linear interpolant, the result is quite good, but we still have an edge effect in the bottom-left corner:
As you noticed, the problem occurs in the middle of the axis, in the region where the integration starts and lacks a reference point.
Here is one approach:
First, in order to be able to do integration, it's good to be on a regular grid. Using here variable names x and y as short for your triang.x and triang.y we can first create a grid:
import numpy as np
n = 200 # Grid density
stepx = (max(x) - min(x)) / n
stepy = (max(y) - min(y)) / n
xspace = np.arange(min(x), max(x), stepx)
yspace = np.arange(min(y), max(y), stepy)
xgrid, ygrid = np.meshgrid(xspace, yspace)
Then we can interpolate dx and dy on the grid using the same LinearTriInterpolator function:
fdx = LinearTriInterpolator(masked_triang, dx)
fdy = LinearTriInterpolator(masked_triang, dy)
dxgrid = fdx(xgrid, ygrid)
dygrid = fdy(xgrid, ygrid)
Now comes the integration part. In principle, any path we choose should get us to the same value. In practice, since there are missing values and different densities, the choice of path is very important to get a reasonably accurate answer.
Below I choose to integrate over dxgrid in the x direction from 0 to the middle of the grid at n/2. Then integrate over dygrid in the y direction from 0 to the i point of interest. Then over dxgrid again from n/2 to the point j of interest. This is a simple way to make sure most of the path of integration is inside the bulk of available data by simply picking a path that goes mostly in the "middle" of the data range. Other alternative consideration would lead to different path selections.
So we do:
dxintegral = np.nancumsum(dxgrid, axis=1) * stepx
dyintegral = np.nancumsum(dygrid, axis=0) * stepy
and then (by somewhat brute force for clarity):
valintegral = np.ma.zeros(dxintegral.shape)
for i in range(n):
    for j in range(n):
        valintegral[i, j] = np.ma.sum([dxintegral[0, n // 2], dyintegral[i, n // 2], dxintegral[i, j], - dxintegral[i, n // 2]])
valintegral = valintegral * np.isfinite(dxintegral)
valintegral would be the result up to an arbitrary constant which can help put the "zero" where you want.
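The remark that any integration path should, in principle, lead to the same value can be checked on a synthetic field where the gradient is known exactly. The sketch below is an illustration only: the test field is made up, and it uses scipy.integrate.cumulative_trapezoid instead of a plain cumulative sum. It integrates along two different L-shaped paths and compares both with the true field.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(-3, 3, 31)
y = np.linspace(-2, 2, 21)
X, Y = np.meshgrid(x, y)
F = X**2 + X*Y + 2*Y + 1          # hypothetical test field
Fx = 2*X + Y                      # exact dF/dx
Fy = X + 2                        # exact dF/dy

# Running integrals: along x within each row, and along y within each column.
Ix = cumulative_trapezoid(Fx, x, axis=1, initial=0)
Iy = cumulative_trapezoid(Fy, y, axis=0, initial=0)

# Path A: along x in the first row, then along y in column j.
path_a = Ix[0, :][None, :] + Iy
# Path B: along y in the first column, then along x in row i.
path_b = Iy[:, 0][:, None] + Ix

# Both paths start at node (0, 0), so they recover F up to that constant.
target = F - F[0, 0]
print(np.max(np.abs(path_a - target)), np.max(np.abs(path_b - target)))
```

Here both paths agree because the gradient is exact and complete. With missing values and numerically estimated gradients the paths start to differ, which is why the route through the middle of the data is chosen above.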
With your data shown here:
ax.tricontourf(masked_triang, time_array)
This is what I'm getting reconstructed when using this method:
ax.contourf(xgrid, ygrid, valintegral)
Hopefully this is somewhat helpful.
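If you want to revisit the values at the original triangulation points, you can interpolate the regular-grid valintegral data back onto them, for example with interp2d (removed in SciPy 1.14; RegularGridInterpolator is the usual replacement in newer versions).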
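A minimal sketch of that last step, assuming a recent SciPy and using scipy.interpolate.RegularGridInterpolator: the grid, the planar stand-in for valintegral, and the point arrays px and py are all made up here so that the snippet runs on its own and the result can be checked.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Stand-ins for the regular grid and the integrated field (hypothetical values).
xspace = np.linspace(0.0, 10.0, 41)
yspace = np.linspace(0.0, 5.0, 21)
xgrid, ygrid = np.meshgrid(xspace, yspace)
valintegral = 1.0 + 2.0 * xgrid + 3.0 * ygrid     # shape (len(yspace), len(xspace))

# Axis 0 of the array runs along y, so the y coordinates are listed first.
interp = RegularGridInterpolator((yspace, xspace), valintegral,
                                 bounds_error=False, fill_value=np.nan)

# Scattered points standing in for triang.x and triang.y.
px = np.array([0.3, 4.75, 9.9])
py = np.array([0.1, 2.5, 4.6])
values_at_points = interp(np.column_stack([py, px]))
print(values_at_points)
```

If valintegral is a masked array, as in the code above, it would probably need converting first, for example with np.ma.filled(valintegral, np.nan).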
EDIT:
In reply to your edit, your adaptation above has a few errors:
Change the line (dx,dy) = np.gradient(grid_z1) to (dy,dx) = np.gradient(grid_z1)
In the integration loop change the dyintegral[i, len(yy) // 2] term to dyintegral[i, len(xx) // 2]
Better to replace the line valintegral = valintegral * np.isfinite(dxintegral) with valintegral[np.isnan(dx)] = np.nan
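The first correction comes from how np.gradient orders its outputs: it returns one array per axis, in axis order, and for a grid built with np.meshgrid(xx, yy) axis 0 runs along y. A small sketch with a made-up plane shows this:

```python
import numpy as np

xx = np.arange(0.0, 5.0, 0.5)
yy = np.arange(0.0, 3.0, 0.25)
xgrid, ygrid = np.meshgrid(xx, yy)      # arrays of shape (len(yy), len(xx))
Z = 3.0 * xgrid + 5.0 * ygrid           # dZ/dx = 3, dZ/dy = 5

# Outputs follow the axis order: axis 0 (rows, y) first, then axis 1 (columns, x).
d_axis0, d_axis1 = np.gradient(Z, yy, xx)
print(d_axis0[0, 0], d_axis1[0, 0])
```

So the first returned array is the derivative along y and the second along x, which is why the unpacking should read (dy, dx). Passing the coordinate arrays yy and xx also means no manual division by the step size is needed.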

Python natural smoothing splines

I am trying to find a python package that would give an option to fit natural smoothing splines with user selectable smoothing factor. Is there an implementation for that? If not, how would you use what is available to implement it yourself?
By natural spline I mean that there should be a condition that the second derivative of the fitted function at the endpoints is zero (linear).
By smoothing spline I mean that the spline should not be 'interpolating' (passing through all the datapoints). I would like to decide the correct smoothing factor lambda (see the Wikipedia page for smoothing splines) myself.
What I have found
scipy.interpolate.CubicSpline [link]: Does natural (cubic) spline fitting. Does interpolation, and there is no way to smooth the data.
scipy.interpolate.UnivariateSpline [link]: Does spline fitting with user selectable smoothing factor. However, there is no option to make the splines natural.
After hours of investigation, I did not find any pip installable packages which could fit a natural cubic spline with user-controllable smoothness. However, after deciding to write one myself, while reading about the topic I stumbled upon a blog post by github user madrury. He has written python code capable of producing natural cubic spline models.
The model code is available here (NaturalCubicSpline) with a BSD-licence. He has also written some examples in an IPython notebook.
But since this is the Internet and links tend to die, I will copy the relevant parts of the source code here + a helper function (get_natural_cubic_spline_model) written by me, and show an example of how to use it. The smoothness of the fit can be controlled by using different number of knots. The position of the knots can be also specified by the user.
Example
from matplotlib import pyplot as plt
import numpy as np
def func(x):
    return 1/(1+25*x**2)
# make example data
x = np.linspace(-1,1,300)
y = func(x) + np.random.normal(0, 0.2, len(x))
# The number of knots can be used to control the amount of smoothness
model_6 = get_natural_cubic_spline_model(x, y, minval=min(x), maxval=max(x), n_knots=6)
model_15 = get_natural_cubic_spline_model(x, y, minval=min(x), maxval=max(x), n_knots=15)
y_est_6 = model_6.predict(x)
y_est_15 = model_15.predict(x)
plt.plot(x, y, ls='', marker='.', label='originals')
plt.plot(x, y_est_6, marker='.', label='n_knots = 6')
plt.plot(x, y_est_15, marker='.', label='n_knots = 15')
plt.legend(); plt.show()
The source code for get_natural_cubic_spline_model
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
def get_natural_cubic_spline_model(x, y, minval=None, maxval=None, n_knots=None, knots=None):
    """
    Get a natural cubic spline model for the data.
    For the knots, give (a) `knots` (as an array) or (b) minval, maxval and n_knots.
    If the knots are not directly specified, the resulting knots are equally
    spaced within the *interior* of (max, min). That is, the endpoints are
    *not* included as knots.
    Parameters
    ----------
    x: np.array of float
        The input data
    y: np.array of float
        The output data
    minval: float
        Minimum of interval containing the knots.
    maxval: float
        Maximum of the interval containing the knots.
    n_knots: positive integer
        The number of knots to create.
    knots: array or list of floats
        The knots.
    Returns
    --------
    model: a model object
        The returned model will have following method:
        - predict(x):
            x is a numpy array. This will return the predicted y-values.
    """
    # "is not None" rather than truthiness, so that numpy arrays of knots also work
    if knots is not None:
        spline = NaturalCubicSpline(knots=knots)
    else:
        spline = NaturalCubicSpline(max=maxval, min=minval, n_knots=n_knots)
    p = Pipeline([
        ('nat_cubic', spline),
        ('regression', LinearRegression(fit_intercept=True))
    ])
    p.fit(x, y)
    return p
class AbstractSpline(BaseEstimator, TransformerMixin):
    """Base class for all spline basis expansions."""
    def __init__(self, max=None, min=None, n_knots=None, n_params=None, knots=None):
        if knots is None:
            if not n_knots:
                n_knots = self._compute_n_knots(n_params)
            knots = np.linspace(min, max, num=(n_knots + 2))[1:-1]
            max, min = np.max(knots), np.min(knots)
        self.knots = np.asarray(knots)
    @property
    def n_knots(self):
        return len(self.knots)
    def fit(self, *args, **kwargs):
        return self
class NaturalCubicSpline(AbstractSpline):
    """Apply a natural cubic basis expansion to an array.
    The features created with this basis expansion can be used to fit a
    piecewise cubic function under the constraint that the fitted curve is
    linear *outside* the range of the knots. The fitted curve is continuously
    differentiable to the second order at all of the knots.
    This transformer can be created in two ways:
      - By specifying the maximum, minimum, and number of knots.
      - By specifying the cutpoints directly.
    If the knots are not directly specified, the resulting knots are equally
    spaced within the *interior* of (max, min). That is, the endpoints are
    *not* included as knots.
    Parameters
    ----------
    min: float
        Minimum of interval containing the knots.
    max: float
        Maximum of the interval containing the knots.
    n_knots: positive integer
        The number of knots to create.
    knots: array or list of floats
        The knots.
    """
    def _compute_n_knots(self, n_params):
        return n_params
    @property
    def n_params(self):
        return self.n_knots - 1
    def transform(self, X, **transform_params):
        X_spl = self._transform_array(X)
        if isinstance(X, pd.Series):
            col_names = self._make_names(X)
            X_spl = pd.DataFrame(X_spl, columns=col_names, index=X.index)
        return X_spl
    def _make_names(self, X):
        first_name = "{}_spline_linear".format(X.name)
        rest_names = ["{}_spline_{}".format(X.name, idx)
                      for idx in range(self.n_knots - 2)]
        return [first_name] + rest_names
    def _transform_array(self, X, **transform_params):
        X = X.squeeze()
        try:
            X_spl = np.zeros((X.shape[0], self.n_knots - 1))
        except IndexError: # For arrays with only one element
            X_spl = np.zeros((1, self.n_knots - 1))
        X_spl[:, 0] = X.squeeze()
        def d(knot_idx, x):
            def ppart(t): return np.maximum(0, t)
            def cube(t): return t*t*t
            numerator = (cube(ppart(x - self.knots[knot_idx]))
                         - cube(ppart(x - self.knots[self.n_knots - 1])))
            denominator = self.knots[self.n_knots - 1] - self.knots[knot_idx]
            return numerator / denominator
        for i in range(0, self.n_knots - 2):
            X_spl[:, i+1] = (d(i, X) - d(self.n_knots - 2, X)).squeeze()
        return X_spl
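Reading the _transform_array method above, the features it builds look like the truncated-power form of a natural cubic spline basis. With knots \xi_0 < ... < \xi_{K-1} (zero-based, as in the code, with K = n_knots) and (u)_+ = max(0, u), a sketch of what is computed is:

```latex
d_i(x) = \frac{(x - \xi_i)_+^3 - (x - \xi_{K-1})_+^3}{\xi_{K-1} - \xi_i}

N_0(x) = x, \qquad
N_{i+1}(x) = d_i(x) - d_{K-2}(x), \quad i = 0, \dots, K-3

\hat{y}(x) = \beta_0 + \sum_{j=0}^{K-2} \beta_{j+1}\, N_j(x)
```

The intercept \beta_0 and the weights \beta_{j+1} are the quantities fitted by the LinearRegression step of the pipeline. With fewer knots there are fewer basis functions, which is how the number of knots controls the smoothness.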
You could use csaps, a numpy/scipy implementation of natural cubic smoothing splines, for univariate/multivariate data smoothing. The smoothing parameter should be in the range [0.0, 1.0]. With a smoothing parameter of 1.0 we get the natural cubic spline interpolant without any data smoothing. The implementation also supports vectorization for univariate data.
Univariate example:
import numpy as np
import matplotlib.pyplot as plt
import csaps
np.random.seed(1234)
x = np.linspace(-5., 5., 25)
y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3
sp = csaps.UnivariateCubicSmoothingSpline(x, y, smooth=0.85)
xs = np.linspace(x[0], x[-1], 150)
ys = sp(xs)
plt.plot(x, y, 'o', xs, ys, '-')
plt.show()
Bivariate example:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import csaps
xdata = [np.linspace(-3, 3, 61), np.linspace(-3.5, 3.5, 51)]
i, j = np.meshgrid(*xdata, indexing='ij')
ydata = (3 * (1 - j)**2. * np.exp(-(j**2) - (i + 1)**2)
- 10 * (j / 5 - j**3 - i**5) * np.exp(-j**2 - i**2)
- 1 / 3 * np.exp(-(j + 1)**2 - i**2))
np.random.seed(12345)
noisy = ydata + (np.random.randn(*ydata.shape) * 0.75)
sp = csaps.MultivariateCubicSmoothingSpline(xdata, noisy, smooth=0.988)
ysmth = sp(xdata)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(j, i, noisy, linewidths=0.5, color='r')
ax.scatter(j, i, noisy, s=5, c='r')
ax.plot_surface(j, i, ysmth, linewidth=0, alpha=1.0)
plt.show()
The Python package patsy has functions for generating spline bases, including a natural cubic spline basis. These are described in its documentation.
Any library can then be used for fitting a model, e.g. scikit-learn or statsmodels.
The df parameter of cr() can be used to control the "smoothness".
Note that a df that is too low can result in underfitting (see below).
A simple example using scikit-learn:
import numpy as np
from sklearn.linear_model import LinearRegression
from patsy import cr
import matplotlib.pyplot as plt
n_obs = 600
np.random.seed(0)
x = np.linspace(-3, 3, n_obs)
y = 1 / (x ** 2 + 1) * np.cos(np.pi * x) + np.random.normal(0, 0.2, size=n_obs)
def plot_smoothed(df=5):
    # Generate spline basis with different degrees of freedom
    x_basis = cr(x, df=df, constraints="center")
    # Fit model to the data
    model = LinearRegression().fit(x_basis, y)
    # Get estimates
    y_hat = model.predict(x_basis)
    plt.plot(x, y_hat, label=f"df={df}")
plt.scatter(x, y, s=4, color="tab:blue")
for df in (5, 7, 10, 25):
    plot_smoothed(df)
plt.legend()
plt.title("Natural cubic spline with varying degrees of freedom")
plt.show()
For a project of mine, I needed to create intervals for time-series modeling, and to make the procedure more efficient I created tsmoothie: A python library for time-series smoothing and outlier detection in a vectorized way.
It provides different smoothing algorithms together with the ability to compute intervals.
In the case of SplineSmoother of natural cubic type:
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.smoother import *
def func(x):
    return 1/(1+25*x**2)
# make example data
x = np.linspace(-1,1,300)
y = func(x) + np.random.normal(0, 0.2, len(x))
# operate smoothing
smoother = SplineSmoother(n_knots=10, spline_type='natural_cubic_spline')
smoother.smooth(y)
# generate intervals
low, up = smoother.get_intervals('prediction_interval', confidence=0.05)
# plot the first smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.plot(smoother.data[0], '.k')
plt.fill_between(range(len(smoother.data[0])), low[0], up[0], alpha=0.3)
Note also that tsmoothie can carry out the smoothing of multiple time-series in a vectorized way.
The programming language R offers a very good implementation of natural cubic smoothing splines. You can use R functions in Python with rpy2:
import numpy as np
import matplotlib.pyplot as plt
import rpy2.robjects as robjects
r_y = robjects.FloatVector(y_train)
r_x = robjects.FloatVector(x_train)
# extract the R function
r_smooth_spline = robjects.r['smooth.spline']
# run the smoothing function
spline1 = r_smooth_spline(x=r_x, y=r_y, spar=0.7)
# evaluate the fitted spline at x_smooth and convert the result to a numpy array
ySpline=np.array(robjects.r['predict'](spline1,robjects.FloatVector(x_smooth)).rx2('y'))
plt.plot(x_smooth,ySpline)
If you want to set lambda directly, spline1 = r_smooth_spline(x=r_x, y=r_y, lambda=42) doesn't work, because lambda is a reserved keyword in Python. There is a solution, though: How to use the lambda argument of smooth.spline in RPy WITHOUT Python interprating it as lambda.
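One general Python technique for passing a keyword argument whose name is a reserved word is dictionary unpacking with **. The sketch below uses a hypothetical stand-in function, fake_smooth_spline, in place of the rpy2 function object so that it runs without R. It assumes that the rpy2 callable accepts keyword arguments in the same way.

```python
def fake_smooth_spline(**kwargs):
    """Hypothetical stand-in for the rpy2 function object; it only returns its keyword arguments."""
    return kwargs

# Writing lambda=42 in the call is a SyntaxError, because lambda is a reserved word.
# Unpacking a dict passes the same keyword without writing it as an identifier.
result = fake_smooth_spline(x=[1.0, 2.0, 3.0], y=[2.0, 4.0, 6.0], **{"lambda": 42})
print(result["lambda"])
```

With rpy2 the call would then presumably look like r_smooth_spline(x=r_x, y=r_y, **{'lambda': 42}).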
To get the code running, you first need to define the data x_train and y_train. You can define x_smooth=np.array(np.linspace(-3,5,1920)) if you want to plot it between -3 and 5 in Full HD resolution.
Note that this code is not fully compatible with Jupyter-notebooks for the latest versions of rpy2. You can fix this by using !pip install -Iv rpy2==3.4.2 as described in NotImplementedError: Conversion 'rpy2py' not defined for objects of type '<class 'rpy2.rinterface.SexpClosure'>' only after I run the code twice

numpy polyfit passing through 0

Suppose I have x and y vectors with a weight vector wgt. I can fit a cubic curve (y = a x^3 + b x^2 + c x + d) by using np.polyfit as follows:
y_fit = np.polyfit(x, y, deg=3, w=wgt)
Now, suppose I want to do another fit, but this time, I want the fit to pass through 0 (i.e. y = a x^3 + b x^2 + c x, d = 0), how can I specify a particular coefficient (i.e. d in this case) to be zero?
Thanks
You can try something like the following:
Import curve_fit from scipy, i.e.
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
Define the curve fitting function. In your case,
def fit_func(x, a, b, c):
    # Curve fitting function
    return a * x**3 + b * x**2 + c * x # d=0 is implied
Perform the curve fitting,
# Curve fitting
params = curve_fit(fit_func, x, y)
[a, b, c] = params[0]
x_fit = np.linspace(x[0], x[-1], 100)
y_fit = a * x_fit**3 + b * x_fit**2 + c * x_fit
Plot the results if you please,
plt.plot(x, y, '.r') # Data
plt.plot(x_fit, y_fit, 'k') # Fitted curve
This does not answer the question in the strict sense, since it does not use numpy's polyfit function to force the fit through the origin, but it solves the problem.
Hope someone finds it useful :)
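The question also has a weight vector wgt. A minimal sketch of how it could be carried over to curve_fit is shown below, assuming wgt follows the np.polyfit convention in which each weight multiplies the unsquared residual. Since curve_fit minimises sum((residual / sigma)**2), this corresponds to sigma = 1 / wgt. The data and weights here are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, a, b, c):
    # Cubic without a constant term, so the curve passes through (0, 0)
    return a * x**3 + b * x**2 + c * x

# Made-up example data and weights (assumptions for illustration only)
rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 60)
y = fit_func(x, 0.5, -1.0, 2.0) + rng.normal(0.0, 0.1, size=x.size)
wgt = np.linspace(1.0, 3.0, x.size)

# curve_fit minimises sum((residual / sigma)**2), so sigma = 1 / wgt
# reproduces a fit in which wgt multiplies each residual.
popt, pcov = curve_fit(fit_func, x, y, sigma=1.0 / wgt)
a, b, c = popt
```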
You can use np.linalg.lstsq and construct your coefficient matrix manually. To start, I'll create the example data x and y, and the "exact fit" y0:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
y0 = 0.07 * x ** 3 + 0.3 * x ** 2 + 1.1 * x
y = y0 + 1000 * np.random.randn(x.shape[0])
Now I'll create a full cubic polynomial 'training' or 'independent variable' matrix that includes the constant d column.
XX = np.vstack((x ** 3, x ** 2, x, np.ones_like(x))).T
Let's see what I get if I compute the fit with this dataset and compare it to polyfit:
p_all = np.linalg.lstsq(XX, y)[0]
pp = np.polyfit(x, y, 3)
print(np.isclose(pp, p_all).all())
# Returns True
Where I've used np.isclose because the two algorithms do produce very small differences.
You're probably thinking 'that's nice, but I still haven't answered the question'. From here, forcing the fit to have a zero offset is the same as dropping the np.ones column from the array:
p_no_offset = np.linalg.lstsq(XX[:, :-1], y)[0] # use [0] to just grab the coefs
Ok, let's see what this fit looks like compared to our data:
y_fit = np.dot(p_no_offset, XX[:, :-1].T)
plt.plot(x, y0, 'k-', linewidth=3)
plt.plot(x, y_fit, 'y--', linewidth=2)
plt.plot(x, y, 'r.', ms=5)
This gives the following figure (image not reproduced here).
WARNING: When using this method on data that does not actually pass through (x,y)=(0,0), you will bias your estimates of the solution coefficients (p), because lstsq will be trying to compensate for the fact that there is an offset in your data. Sort of a 'square peg round hole' problem.
Furthermore, you could also fit your data to the cubic term only by keeping just the first column:
p_ = np.linalg.lstsq(XX[:, :1], y)[0]
Here again the warning above applies. If your data contains quadratic, linear or constant terms the estimate of the cubic coefficient will be biased. There can be times when - for numerical algorithms - this sort of thing is useful, but for statistical purposes my understanding is that it is important to include all of the lower terms. If tests turn out to show that the lower terms are not statistically different from zero that's fine, but for safety's sake you should probably leave them in when you estimate your cubic.
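The question also has a weight vector wgt, which the lstsq approach can accommodate. The following is a minimal sketch, assuming the np.polyfit convention in which each weight multiplies the unsquared residual: each row of the coefficient matrix and each entry of y is scaled by its weight before solving. The data, weights and variable names are made up for illustration.

```python
import numpy as np

# Made-up, noise-free example data so the recovered coefficients are easy to check
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
true_coefs = np.array([0.5, -1.0, 2.0])           # a, b, c with d = 0
y = true_coefs[0] * x**3 + true_coefs[1] * x**2 + true_coefs[2] * x
wgt = rng.uniform(0.5, 2.0, size=x.size)          # made-up weights

# Coefficient matrix without the constant column: the fit is forced through 0
A = np.vstack((x**3, x**2, x)).T

# Minimising sum((wgt * residual)**2) is an ordinary least-squares problem
# on the row-scaled system (wgt * A) p = wgt * y.
coefs, *_ = np.linalg.lstsq(A * wgt[:, None], y * wgt, rcond=None)
print(coefs)
```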
Best of luck!

Why is my 2D interpolant generating a matrix with swapped axes in SciPy?

I solve a differential equation with vector inputs
y' = f(t,y), y(t_0) = y_0
where y0 = y(x)
using the explicit Euler method, which says that
y_(i+1) = y_i + h*f(t_i, y_i)
where t is a time vector, h is the step size, and f is the right-hand side of the differential equation.
The python code for the method looks like this:
for i in np.arange(0,n-1):
    y[i+1,...] = y[i,...] + dt*myode(t[i],y[i,...])
The result is a k,m matrix y, where k is the size of the t dimension, and m is the size of y.
The vectors y and t are returned.
t, x, and y are passed to scipy.interpolate.RectBivariateSpline(t, x, y, kx=1, ky=1):
g = scipy.interpolate.RectBivariateSpline(t, x, y, kx=1, ky=1)
The resulting object g is called with new vectors ti, xi, i.e. g(ti, xi), to give y_int, which is y interpolated at the grid points defined by ti and xi.
Here is my problem:
The documentation for RectBivariateSpline describes the __call__ method in terms of x and y:
__call__(x, y[, mth]) Evaluate spline at the grid points defined by the coordinate arrays
The matplotlib documentation for plot_surface uses similar notation:
Axes3D.plot_surface(X, Y, Z, *args, **kwargs)
with the important difference that X and Y are 2D arrays which are generated by numpy.meshgrid().
When I compute simple examples, the input order is the same in both and the result is exactly what I would expect. In my explicit Euler example, however, the initial order is ti,xi, yet the surface plot of the interpolant output only makes sense if I reverse the order of the inputs, like so:
ax2.plot_surface(xi, ti, u, cmap=cm.coolwarm)
While I am glad that it works, I'm not satisfied because I cannot explain why, nor why (apart from the array geometry) it is necessary to swap the inputs. Ideally, I would like to restructure the code so that the input order is consistent.
Here is a working code example to illustrate what I mean:
# Heat equation example with explicit Euler method
import numpy as np
import matplotlib.pyplot as mplot
import matplotlib.cm as cm
import scipy.sparse as sp
import scipy.interpolate as interp
from mpl_toolkits.mplot3d import Axes3D
import pdb
# explicit Euler method
def eev(myode,tspan,y0,dt):
    # Preprocessing
    # Time steps
    tspan[1] = tspan[1] + dt
    t = np.arange(tspan[0],tspan[1],dt,dtype=float)
    n = t.size
    m = y0.shape[0]
    y = np.zeros((n,m),dtype=float)
    y[0,:] = y0
    # explicit Euler recurrence relation
    for i in np.arange(0,n-1):
        y[i+1,...] = y[i,...] + dt*myode(t[i],y[i,...])
    return y,t
# generate matrix A
# u'(t) = A*u(t) + g*u(t)
def a_matrix(n):
    aa = sp.diags([1, -2, 1],[-1,0,1],(n,n))
    return aa
# System of ODEs with finite differences
def f(t,u):
    dydt = np.divide(1,h**2)*A.dot(u)
    return dydt
# homogeneous Dirichlet boundary conditions
def rbd(t):
    ul = np.zeros((t,1))
    return ul
# Initial value problem -----------
def main():
    # Metal rod
    # spatial discretization
    # number of inner nodes
    m = 20
    x0 = 0
    xn = 1
    x = np.linspace(x0,xn,m+2)
    # Step size
    global h
    h = x[1]-x[0]
    # Initial values
    u0 = np.sin(np.pi*x)
    # A matrix
    global A
    A = a_matrix(m)
    # Time
    t0 = 0
    tend = 0.2
    # Time step width
    dt = 0.0001
    tspan = [t0,tend]
    # Test r for stability
    r = np.divide(dt,h**2)
    if r <= 0.5:
        u,t = eev(f,tspan,u0[1:-1],dt)
    else:
        print('r = ',r)
        print('r > 0.5. Explicit Euler method will not be stable.')
    # Add boundary values back
    rb = rbd(t.size)
    u = np.hstack((rb,u,rb))
    # Interpolate heat values
    # Create interpolant. Note the parameter order
    fi = interp.RectBivariateSpline(t, x, u, kx=1, ky=1)
    # Create vectors for interpolant
    xi = np.linspace(x[0],x[-1],100)
    ti = np.linspace(t0,tend,100)
    # Compute function values from interpolant
    u_int = fi(ti,xi)
    # Change xi, ti in to 2D arrays
    xi,ti = np.meshgrid(xi,ti)
    # Create figure and axes objects
    fig3 = mplot.figure(1)
    ax3 = fig3.gca(projection='3d')
    print('xi.shape =',xi.shape,'ti.shape =',ti.shape,'u_int.shape =',u_int.shape)
    # Plot surface. Note the parameter order, compare with interpolant!
    ax3.plot_surface(xi, ti, u_int, cmap=cm.coolwarm)
    ax3.set_xlabel('xi')
    ax3.set_ylabel('ti')
main()
mplot.show()
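As far as I can see, you define:
# Change xi, ti in to 2D arrays
xi,ti = np.meshgrid(xi,ti)
Change this to:
ti,xi = np.meshgrid(ti,xi,indexing='ij')
and
ax3.plot_surface(xi, ti, u_int, cmap=cm.coolwarm)
to
ax3.plot_surface(ti, xi, u_int, cmap=cm.coolwarm)
and it works fine (if I understood correctly). The indexing='ij' argument is needed so that the 2D coordinate arrays have the same (len(ti), len(xi)) layout as u_int. With the default indexing='xy', np.meshgrid(ti,xi) returns arrays of shape (len(xi), len(ti)), which are transposed relative to u_int.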
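The shape conventions behind the swapped arguments can be checked on a small example. The sketch below uses made-up grids and a made-up function u(t, x) = t + 10*x. It shows that RectBivariateSpline returns an array indexed as [t, x], that np.meshgrid with indexing='ij' produces coordinate arrays in the same layout, and that the default indexing='xy' needs the arguments in reverse order to produce that layout.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Made-up grids and data: u[i, j] = t[i] + 10 * x[j]
t = np.linspace(0.0, 1.0, 6)
x = np.linspace(0.0, 2.0, 9)
u = t[:, None] + 10.0 * x[None, :]

fi = RectBivariateSpline(t, x, u, kx=1, ky=1)

ti = np.linspace(0.0, 1.0, 4)
xi = np.linspace(0.0, 2.0, 5)
u_int = fi(ti, xi)                     # first axis follows ti, second follows xi

# 'ij' indexing keeps the argument order: both outputs have shape (len(ti), len(xi))
T_ij, X_ij = np.meshgrid(ti, xi, indexing='ij')

# The default 'xy' indexing gives the same layout only when the inputs are swapped
X_xy, T_xy = np.meshgrid(xi, ti)

print(u_int.shape, T_ij.shape, X_xy.shape)
```

Either pair of coordinate arrays lines up element by element with u_int, so plot_surface(T_ij, X_ij, u_int) and plot_surface(X_xy, T_xy, u_int) would draw the same surface with the two horizontal axes exchanged.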

finding inflection points in spline fitted 1d data

I have some one-dimensional data and fit it with a spline. Then I want to find the inflection points (ignoring saddle points) in it. Currently I search for the extrema of its first derivative by using scipy.signal.argrelmin (and argrelmax) on a lot of values generated by splev.
import scipy.interpolate
import scipy.optimize
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt
import operator
y = [-1, 5, 6, 4, 2, 5, 8, 5, 1]
x = np.arange(0, len(y))
tck = scipy.interpolate.splrep(x, y, s=0)
print 'roots', scipy.interpolate.sproot(tck)
# output:
# [0.11381478]
xnew = np.arange(0, len(y), 0.01)
ynew = scipy.interpolate.splev(xnew, tck, der=0)
ynew_deriv = scipy.interpolate.splev(xnew, tck, der=1)
min_idxs = scipy.signal.argrelmin(ynew_deriv)
max_idxs = scipy.signal.argrelmax(ynew_deriv)
mins = zip(xnew[min_idxs].tolist(), ynew_deriv[min_idxs].tolist())
maxs = zip(xnew[max_idxs].tolist(), ynew_deriv[max_idxs].tolist())
inflection_points = sorted(mins + maxs, key=operator.itemgetter(0))
print 'inflection_points', inflection_points
# output:
# [(3.13, -2.9822449358974357),
# (5.03, 4.3817785256410255)
# (7.13, -4.867132628205128)]
plt.legend(['data','Cubic Spline', '1st deriv'])
plt.plot(x, y, 'o',
xnew, ynew, '-',
xnew, ynew_deriv, '-')
plt.show()
But this feels terribly wrong. I guess there is a way to find what I am looking for without generating so many values. Something like sproot, but applicable to the second derivative, perhaps?
The derivative of a B-spline is also a B-spline. You can therefore first fit a spline to your data, then use the derivative formula to construct the coefficients of the derivative spline, and finally use the spline root finding to get the roots of the derivative spline. These are then the maxima/minima of the original curve.
Here is code to do it: https://gist.github.com/pv/5504366
The relevant computation of the coefficients is:
t, c, k = scipys_spline_representation
# Compute the denominator in the differentiation formula.
dt = t[k+1:-1] - t[1:-k-1]
# Compute the new coefficients
d = (c[1:-1-k] - c[:-2-k]) * k / dt
# Adjust knots
t2 = t[1:-1]
# Pad coefficient array to same size as knots (FITPACK convention)
d = np.r_[d, [0]*k]
# Done, a new spline
new_spline_repr = t2, d, k-1
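A minimal sketch of how this formula could be applied to the inflection-point question is given below. It wraps the computation above in a helper (the name spline_derivative is an assumption, not part of scipy) and checks it against splev with der=1 and der=2. Because sproot only accepts cubic splines, the sketch fits a quintic spline (k=5) instead of the cubic one used in the question, so that the second derivative is cubic. This is a swapped-in choice, not something the answer prescribes.

```python
import numpy as np
from scipy.interpolate import splrep, splev, sproot

def spline_derivative(tck):
    """Return the FITPACK-style (t, c, k) representation of the derivative spline."""
    t, c, k = tck
    # Denominator in the differentiation formula
    dt = t[k+1:-1] - t[1:-k-1]
    # New coefficients
    d = (c[1:-1-k] - c[:-2-k]) * k / dt
    # Drop one knot at each end
    t2 = t[1:-1]
    # Pad the coefficient array to the same size as the knots (FITPACK convention)
    d = np.r_[d, [0.0] * k]
    return t2, d, k - 1

# Data from the question
y = [-1, 5, 6, 4, 2, 5, 8, 5, 1]
x = np.arange(len(y), dtype=float)

# Quintic interpolating spline, so that its second derivative is a cubic spline
tck = splrep(x, y, k=5, s=0)
d1 = spline_derivative(tck)      # degree 4
d2 = spline_derivative(d1)       # degree 3, accepted by sproot

# Roots of the second derivative are the inflection point candidates
inflection_x = sproot(d2)
print(inflection_x)

xs = np.linspace(x[0], x[-1], 200)
print(np.allclose(splev(xs, d1), splev(xs, tck, der=1)))
```

The roots are only candidates: a root where the second derivative does not change sign is not an inflection point, and this sketch does not filter such cases. Newer scipy versions also provide scipy.interpolate.splder, which performs the differentiation step directly.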
