How to interpolate data with NaN values using scipy's interpolation routines?

I have some 2D data-arrays which I need to interpolate. SciPy offers some solutions to this, as nicely explained for example here. My data very roughly looks like a 2D parabola (as a function of x1 and x2), with the opening pointing downwards, as illustrated in Fig. 1 for a horizontal cut along x2=0. As you can see, there are no negative values (the datapoints are all exactly 0 there).
I wanted to perform cubic interpolation as I require smooth data. This causes a problem at the edges, resulting in "wiggling" or "overshooting" of the fit/interpolation, as illustrated in Fig. 2. Negative values are, however, not allowed in the subsequent post-processing of the interpolated data (and overshooting to positive values where the data should be zero also needs to be suppressed).
I thought that a clever "solution" would be to simply set the values which are 0 (note that those are all exactly the same) to NaN, so that they are ignored by the interpolation. But SciPy's griddata with the cubic method does not work with NaNs. The linear method can handle them, but I need cubic.
My question is: am I missing something, or am I doing something wrong that results in griddata not working properly with NaNs and the cubic method?
An example code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate as interp
def some_funct( x1, x2 ):
    result = -x1**2 - 2.*x2**2 + 60.
    result[result < .0] = .0
    return result
# define original (sparse) grid
N_x1, N_x2 = 20, 20
x1_old = np.linspace( -9, 10, N_x1 )
x2_old = np.linspace( -9, 10, N_x2 )
X1_old, X2_old = np.meshgrid( x1_old, x2_old )
# datapoints to be interpolated
z_old = some_funct( X1_old, X2_old )
# set 0 datapoints to nan
z_old[z_old==0] = np.nan
# grid for interpolation
x1_new = np.linspace( -9, 10, 10*N_x1 )
x2_new = np.linspace( -9, 10, 10*N_x2 )
X1_new, X2_new = np.meshgrid( x1_new, x2_new )
# perform interpolation
z_new = interp.griddata( np.array([X1_old.ravel(),X2_old.ravel()]).T, z_old.ravel(),
                         (X1_new, X2_new),
                         method='cubic', fill_value=.0  # fill_value only works for 'linear'
                       )
# plot horizontal cut along x2=0
fig1 = plt.figure( figsize=(8,6) )
ax1 = fig1.add_subplot( 1,1,1 )
x2_old_0_id = (np.abs(x2_old - .0)).argmin()
x2_new_0_id = (np.abs(x2_new - .0)).argmin()
ax1.plot( x1_old, z_old[ x2_old_0_id , : ], marker='x', linestyle='None', label='data' )
ax1.plot( x1_new, z_new[ x2_new_0_id , : ], label='interpolation' )
ax1.legend()
ax1.set_xlabel( 'x1' )
ax1.set_ylabel( 'z' )
plt.show()
Any hints are greatly appreciated!
Update: Forgot to include the versions I am using:
numpy: 1.15.1
scipy: 1.1.0

For monotone cubic interpolation, which does not overshoot, use pchip or Akima1DInterpolator
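For example, a minimal 1D sketch along the x2=0 cut of the question's clipped parabola (hypothetical data; for the full 2D grid you would apply it along rows and columns, or use a genuinely 2D method):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import PchipInterpolator, Akima1DInterpolator
x = np.linspace( -9, 10, 20 )
y = np.clip( -x**2 + 60., 0., None )  # clipped parabola, like the data above
x_fine = np.linspace( -9, 10, 200 )
pchip = PchipInterpolator( x, y )     # monotone cubic: no overshoot below zero
akima = Akima1DInterpolator( x, y )
plt.plot( x, y, 'x', label='data' )
plt.plot( x_fine, pchip(x_fine), label='pchip' )
plt.plot( x_fine, akima(x_fine), label='Akima' )
plt.legend()
plt.show()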

Related

contourf() plots white space over finite data

I'm attempting to plot a 3D chart using matplotlib.pyplot.contourf() with the following program:
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
# calculates Fast Fourier transforms for each value in the 1D array "Altitude"
# and stacks them vertically to form a 2D array of fft values called "Fourier"
Fourier = np.array([])
for i in range(len(Altitude)):
    Ne_fft = Ne_lowpass[i,:]/np.average(Ne_lowpass[i,:])
    Ne_fft = Ne_fft - Ne_fft.mean()
    W = scipy.fftpack.fftfreq(10*Ne_fft.size, d=(Time[-1]-Time[0])/len(Ne_fft))
    P = 1/abs(W)
    FFT = abs(scipy.fftpack.fft(Ne_fft, n=10*len(Ne_fft)))
    FFT = FFT**2
    if len(Fourier) == 0:
        Fourier = FFT
    else:
        Fourier = np.vstack((Fourier,FFT))
# plots the 2D contourf plot of "Fourier", with respect to "Altitude" and period "P"
plt.figure(5)
C = plt.contourf(P,Altitude,Fourier,100,cmap='jet')
plt.xscale('log')
plt.xlim([1,P[np.argmax(P)+1]])
plt.ylim([59,687])
plt.ylabel("Altitude")
plt.xlabel("Period")
plt.title("Power spectrum of Ne")
cbar = plt.colorbar(C)
cbar.set_label("Power", fontsize = 16)
For the most part it is working fine; however, in some places useless white space is plotted. The plot produced can be found here (sorry, I don't have enough reputation points to attach images directly).
The purpose of this program is to calculate a series of Fast Fourier Transforms across one axis of a 2-dimensional numpy array, and stack them up to display a contour plot depicting which periodicities are most prominent in the data.
I checked the parts of the plotted quantity that appear white, and finite values are still present, although much smaller than the noticeable quantities elsewhere in the plot:
print(Fourier[100:,14000:])
[[ 2.41147887e-03 1.50783490e-02 4.82620482e-02 ..., 1.49769976e+03
5.88859945e+02 1.31930217e+02]
[ 2.12684922e-03 1.44076962e-02 4.65881565e-02 ..., 1.54719976e+03
6.14086374e+02 1.38727145e+02]
[ 1.84414615e-03 1.38162140e-02 4.51940720e-02 ..., 1.56478339e+03
6.23619105e+02 1.41367042e+02]
...,
[ 3.51539440e-03 3.20182148e-03 2.38117665e-03 ..., 2.43824864e+03
1.18676851e+03 3.13067945e+02]
[ 3.51256439e-03 3.19924000e-03 2.37923875e-03 ..., 2.43805298e+03
1.18667139e+03 3.13042038e+02]
[ 3.50985146e-03 3.19677302e-03 2.37741084e-03 ..., 2.43790243e+03
1.18659640e+03 3.13021994e+02]]
print(np.isfinite(Fourier).all())
True
print(np.isnan(Fourier).any())
False
Is the white space present because the values are so small compared to the rest of the plot? I'm not sure at all how to fix this.
You can fix this problem by adding option extend='both'.
Example:
C = plt.contourf(P,Altitude, Fourier,100, cmap='jet', extend='both')
Ref: https://matplotlib.org/examples/pylab_examples/contourf_demo.html
In the line plt.contourf(P,Altitude,Fourier,100,cmap='jet') you are taking 100 automatically chosen levels for the contour plot. "Automatic" in this case does not guarantee that those levels include all data.
If you want to make sure all the data is included, you may define your own levels to use:
plt.contourf(x, y, Z, np.linspace(Z.min(), Z.max(), 100))
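For example, a self-contained toy setup (hypothetical data, not the Ne spectrum from the question) where the values span several orders of magnitude, automatic levels may clip the smallest values, and explicit levels guarantee full coverage:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 10, 50)
y = np.linspace(0, 1, 50)
X, Y = np.meshgrid(x, y)
Z = X * np.exp(10*Y)   # values span several orders of magnitude
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.contourf(X, Y, Z, 100)                                  # automatic levels
ax2.contourf(X, Y, Z, np.linspace(Z.min(), Z.max(), 100))   # explicit levels covering all data
plt.show()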

Using FFT for 3D array representation of 2D field

I need to obtain the Fourier transform of a complex field. I'm using Python.
My input is a 2D snapshot of the electric field in the xy-plane.
I currently have a 3D array F[x][y][z] where F[x][y][0] contains the real component and F[x][y][1] contains the imaginary component of the field.
My current code is very simple and does this:
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
I have the following questions:
1) Does this correctly compute the Fourier transform of the field, or should the field be entered as a 2D matrix with each element containing both the real and imaginary components instead?
2) I entered the imaginary component values of the field using the real multiplier only (i.e. if the complex value is 6i I entered 6); is this correct, or should this be entered as a complex value instead (i.e. entered as '6j')?
3) As this is technically a 2D input field, should I use np.fft.fft2 instead? Doing this means the output is not centered in the middle.
4) The output does not look like what I'd expect the Fourier transform of F to look like, and I'm unsure what I'm doing wrong.
Full example code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
x, y = np.meshgrid(np.linspace(-1,1,100), np.linspace(-1,1,100))
d = np.sqrt(x*x+y*y)
sigma, mu = .35, 0.0
g1 = np.exp(-( (d-mu)**2 / ( 2.0 * sigma**2 ) ) )
F=np.empty(shape=(300,300,2),dtype=complex)
for x in range(0,300):
    for y in range(0,300):
        if y<50 or x<100 or y>249 or x>199:
            F[x][y][0]=g1[0][0]
            F[x][y][1]=0j
        elif y<150:
            F[x][y][0]=g1[x-100][y-50]
            F[x][y][1]=0j
        else:
            F[x][y][0]=g1[x-100][y-150]
            F[x][y][1]=0j
F_2D=np.empty(shape=(300,300))
for x in range(0,300):
    for y in range(0,300):
        F_2D[x][y]=np.absolute(F[x][y][0])+np.absolute(F[x][y][1])
plt.imshow(F_2D)
plt.show()
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
result_2D=np.empty(shape=(300,300))
for x in range(0,300):
    for y in range(0,300):
        result_2D[x][y]=np.absolute(result[x][y][0])+np.absolute(result[x][y][1])
plt.imshow(result_2D)
plt.show()
Plotting F gives this:
With np.fft.fftn, the image shown at the end is:
And with np.fft.fft2:
Neither of these looks like what I would expect the Fourier transform of F to look like.
I add here another answer, suitable to the added code.
The answer is still np.fft.fft2(). Here's an example. I modified the code slightly. To verify that we need fft2 I discarded one of the blobs, and then we know that a single Gaussian blob should transform into a Gaussian blob (with a certain phase, that's not shown when plotting absolute value). I also decreased the standard deviation so that the frequency response will widen a little.
Code:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.meshgrid(np.linspace(-1,1,100), np.linspace(-1,1,100))
d = np.sqrt(x**2+y**2)
sigma, mu = .1, 0.0
g1 = np.exp(-( (d-mu)**2 / ( 2.0 * sigma**2 ) ) )
N = 300
positions = [ [150,100] ]#, [150,200] ]
sz2 = [int(s/2) for s in g1.shape]
F_2D = np.zeros([N,N])
for x0,y0 in positions:
    F_2D[ x0-sz2[0]:x0+sz2[0], y0-sz2[1]:y0+sz2[1] ] = g1  # g1 is real; fft2 accepts real input
result = np.fft.fftshift(np.fft.fft2(F_2D))
plt.subplot(211); plt.imshow(F_2D)
plt.subplot(212); plt.imshow(np.absolute(result))
plt.title(r'$\sigma$=.1')
plt.show()
Result:
To get back to the original problem, we need only change
positions = [ [150,100] , [150,200] ]
and sigma=.35 instead of sigma=.1.
You should use complex numpy variables (by using 1j) and use fft2. For example:
N = 16
x0 = np.random.randn(N,N,2)
x = x0[:,:,0] + 1j*x0[:,:,1]
X = np.fft.fft2(x)
Using fftn on x0 will do a 3D FFT, and using fft will do a 1D FFT along the last axis.
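A quick shape check illustrating the difference (a minimal sketch):
import numpy as np
N = 16
x0 = np.random.randn(N, N, 2)        # real and imaginary parts stacked in the last axis
x = x0[:,:,0] + 1j*x0[:,:,1]         # proper complex field
X2 = np.fft.fft2(x)                  # 2D FFT over both spatial axes -- what we want
X3 = np.fft.fftn(x0)                 # 3D FFT, also transforms along the real/imag axis
X1 = np.fft.fft(x)                   # 1D FFT along the last axis only
print(X2.shape, X3.shape, X1.shape)  # (16, 16) (16, 16, 2) (16, 16)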

How to interpolate a line between two other lines in python

Note: I asked this question before but it was closed as a duplicate; however, I, along with several others, believe it was unduly closed, and I explain why in an edit in my original post. So I would like to re-ask this question here again.
Does anyone know of a python library that can interpolate between two lines? For example, given the two solid lines below, I would like to produce the dashed line in the middle. In other words, I'd like to get the centreline. The input is just two numpy arrays of coordinates, with sizes N x 2 and M x 2 respectively.
Furthermore, I'd like to know if someone has written a function for this in some optimized python library, although optimization isn't strictly necessary.
Here is an example of two lines that I might have, you can assume they do not overlap with each other and an x/y can have multiple y/x coordinates.
array([[ 1233.87375018, 1230.07095987],
[ 1237.63559365, 1253.90749041],
[ 1240.87500801, 1264.43925132],
[ 1245.30875975, 1274.63795396],
[ 1256.1449357 , 1294.48254424],
[ 1264.33600095, 1304.47893299],
[ 1273.38192911, 1313.71468591],
[ 1283.12411536, 1322.35942538],
[ 1293.2559388 , 1330.55873344],
[ 1309.4817002 , 1342.53074698],
[ 1325.7074616 , 1354.50276051],
[ 1341.93322301, 1366.47477405],
[ 1358.15898441, 1378.44678759],
[ 1394.38474581, 1390.41880113]])
array([[ 1152.27115094, 1281.52899302],
[ 1155.53345506, 1295.30515742],
[ 1163.56506781, 1318.41642169],
[ 1168.03497425, 1330.03181319],
[ 1173.26135672, 1341.30559949],
[ 1184.07110925, 1356.54121651],
[ 1194.88086178, 1371.77683353],
[ 1202.58908737, 1381.41765447],
[ 1210.72465255, 1390.65097106],
[ 1227.81309742, 1403.2904646 ],
[ 1244.90154229, 1415.92995815],
[ 1261.98998716, 1428.56945169],
[ 1275.89219696, 1438.21626352],
[ 1289.79440676, 1447.86307535],
[ 1303.69661656, 1457.50988719],
[ 1323.80994319, 1470.41028655],
[ 1343.92326983, 1488.31068591],
[ 1354.31738934, 1499.33260989],
[ 1374.48879779, 1516.93734053],
[ 1394.66020624, 1534.54207116]])
Visualizing this we have:
So my attempt at this has been using the skeletonize function in the skimage.morphology library by first rasterizing the coordinates into a filled in polygon. However, I get branching at the ends like this:
First of all, pardon the overkill; I had fun with your question. If the description is too long, feel free to skip to the bottom, I defined a function that does everything I describe.
Your problem would be relatively straightforward if your arrays were the same length. In that case, all you would have to do is find the average between the corresponding x values in each array, and the corresponding y values in each array.
So what we can do is create arrays of the same length, that are more or less good estimates of your original arrays. We can do this by fitting a polynomial to the arrays you have. As noted in comments and other answers, the midline of your original arrays is not specifically defined, so a good estimate should fulfill your needs.
Note: In all of these examples, I've gone ahead and named the two arrays that you posted a1 and a2.
Step one: Create new arrays that estimate your old lines
Looking at the data you posted:
These aren't particularly complicated functions, it looks like a 3rd degree polynomial would fit them pretty well. We can create those using numpy:
import numpy as np
# Find the range of x values in a1
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
# Create an evenly spaced array that ranges from the minimum to the maximum
# I used 100 elements, but you can use more or fewer.
# This will be used as your new x coordinates
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
# Fit a 3rd degree polynomial to your data
a1_coefs = np.polyfit(a1[:,0],a1[:,1], 3)
# Get your new y coordinates from the coefficients of the above polynomial
new_a1_y = np.polyval(a1_coefs, new_a1_x)
# Repeat for array 2:
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], 3)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
The result:
That's not so bad! If you have more complicated functions, you'll have to fit a higher-degree polynomial, or find some other adequate function to fit to your data.
Now, you've got two sets of arrays of the same length (I chose a length of 100, you can do more or less depending on how smooth you want your midpoint line to be). These sets represent the x and y coordinates of the estimates of your original arrays. In the example above, I named these new_a1_x, new_a1_y, new_a2_x and new_a2_y.
Step two: calculate the average between each x and each y in your new arrays
Then, we want to find the average x and average y value for each of our estimate arrays. Just use np.mean:
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
midx and midy now represent the midpoint between our 2 estimate arrays. Now, just plot your original (not estimate) arrays, alongside your midpoint array:
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
And voilà:
This method still works with more complex, noisy data (but you have to fit the function thoughtfully):
As a function:
I've put the above code in a function, so you can use it easily. It returns an array of your estimated midpoints, in the format you had your original arrays in.
The arguments: a1 and a2 are your 2 input arrays, poly_deg is the degree of the polynomial you want to fit, n_points is the number of points you want in your midpoint array, and plot is a boolean indicating whether you want to plot the result.
import matplotlib.pyplot as plt
import numpy as np
def interpolate(a1, a2, poly_deg=3, n_points=100, plot=True):
    min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
    new_a1_x = np.linspace(min_a1_x, max_a1_x, n_points)
    a1_coefs = np.polyfit(a1[:,0],a1[:,1], poly_deg)
    new_a1_y = np.polyval(a1_coefs, new_a1_x)
    min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
    new_a2_x = np.linspace(min_a2_x, max_a2_x, n_points)
    a2_coefs = np.polyfit(a2[:,0],a2[:,1], poly_deg)
    new_a2_y = np.polyval(a2_coefs, new_a2_x)
    midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(n_points)]
    midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(n_points)]
    if plot:
        plt.plot(a1[:,0], a1[:,1],c='black')
        plt.plot(a2[:,0], a2[:,1],c='black')
        plt.plot(midx, midy, '--', c='black')
        plt.show()
    return np.array([[x, y] for x, y in zip(midx, midy)])
[EDIT]:
I was thinking back on this question, and I overlooked a simpler way to do this, by "densifying" both arrays to the same number of points using np.interp. This method follows the same basic idea as the line-fitting method above, but instead of approximating lines using polyfit / polyval, it just densifies:
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
new_a1_y = np.interp(new_a1_x, a1[:,0], a1[:,1])
new_a2_y = np.interp(new_a2_x, a2[:,0], a2[:,1])
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
The "line between two lines" is not so well defined. You can obtain a decent though simple solution by triangulating between the two curves (you can triangulate by progressing from vertex to vertex, choosing the diagonals that produce the less skewed triangle).
Then the interpolated curve joins the middles of the sides.
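A minimal sketch of this idea, using a shorter-diagonal heuristic in place of a full skew test (centerline is a hypothetical helper, not from any library):
import numpy as np
def centerline(P, Q):
    # Greedy triangulation between polylines P (N x 2) and Q (M x 2):
    # at each step advance along the shorter diagonal and collect the
    # midpoint of the current connecting edge.
    i, j = 0, 0
    mids = [(P[0] + Q[0]) / 2.]
    while i < len(P) - 1 or j < len(Q) - 1:
        if i == len(P) - 1:
            j += 1
        elif j == len(Q) - 1:
            i += 1
        elif np.linalg.norm(P[i+1] - Q[j]) < np.linalg.norm(P[i] - Q[j+1]):
            i += 1
        else:
            j += 1
        mids.append((P[i] + Q[j]) / 2.)
    return np.array(mids)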
I work with rivers, so this is a common problem. One of my solutions is exactly like the one you showed in your question--i.e. skeletonize the blob. You see that the boundaries have problems, so what I've done that seems to work well is to simply mirror the boundaries. For this approach to work, the blob must not intersect the corners of the image.
You can find my implementation in RivGraph; this particular algorithm is in rivers/river_utils.py called "mask_to_centerline".
Here's an example output showing how the ends of the centerline extend to the desired edge of the object:
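A minimal sketch of the mirroring idea, assuming mask is the rasterized binary blob (the padding width is a tuning choice; see RivGraph for the full implementation):
import numpy as np
from skimage.morphology import skeletonize
pad = 20                                         # mirror width in pixels (assumption)
mask_padded = np.pad(mask, pad, mode='reflect')  # mirror the blob at the image borders
skel = skeletonize(mask_padded)
skel = skel[pad:-pad, pad:-pad]                  # crop back to the original frame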
sacuL's solution almost worked for me, but I needed to aggregate more than just two curves.
Here is my generalization for sacuL's solution:
import numpy as np
import matplotlib.pyplot as plt
def interp(*axis_list):
    min_max_xs = [(min(axis[:,0]), max(axis[:,0])) for axis in axis_list]
    new_axis_xs = [np.linspace(min_x, max_x, 100) for min_x, max_x in min_max_xs]
    new_axis_ys = [np.interp(new_x_axis, axis[:,0], axis[:,1]) for axis, new_x_axis in zip(axis_list, new_axis_xs)]
    midx = [np.mean([new_axis_xs[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    midy = [np.mean([new_axis_ys[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    for axis in axis_list:
        plt.plot(axis[:,0], axis[:,1],c='black')
    plt.plot(midx, midy, '--', c='black')
    plt.show()
If we now run an example:
a1 = np.array([[x, x**2+5*(x%4)] for x in range(10)])
a2 = np.array([[x-0.5, x**2+6*(x%3)] for x in range(10)])
a3 = np.array([[x+0.2, x**2+7*(x%2)] for x in range(10)])
interp(a1, a2, a3)
we get the plot:

Why does InterpolatedUnivariateSpline return NaN values

I have some data, y vs x, which I would like to interpolate at a finer resolution xx using a cubic spline.
Here is my dataset:
import numpy as np
import scipy
print(np.version.version)     # 1.9.2
print(scipy.version.version)  # 0.15.1
x = np.array([0.5372973, 0.5382103, 0.5392305, 0.5402197, 0.5412042, 0.54221, 0.543209,
0.5442277, 0.5442277, 0.5452125, 0.546217, 0.5472153, 0.5482086,
0.5492241, 0.5502117, 0.5512249, 0.5522136, 0.5532056, 0.5532056,
0.5542281, 0.5552039, 0.5562125, 0.5567836])
y = np.array([0.01, 0.03108, 0.08981, 0.18362, 0.32167, 0.50941, 0.72415, 0.90698,
0.9071, 0.97955, 0.99802, 1., 0.97863, 0.9323, 0.85344, 0.72936,
0.56413, 0.36997, 0.36957, 0.17623, 0.05922, 0.0163, 0.01, ])
xx = np.array([0.5372981, 0.5374106, 0.5375231, 0.5376356, 0.5377481, 0.5378606,
0.5379731, 0.5380856, 0.5381981, 0.5383106, 0.5384231, 0.5385356,
0.5386481, 0.5387606, 0.5388731, 0.5389856, 0.5390981, 0.5392106,
0.5393231, 0.5394356, 0.5395481, 0.5396606, 0.5397731, 0.5398856,
0.5399981, 0.5401106, 0.5402231, 0.5403356, 0.5404481, 0.5405606,
0.5406731, 0.5407856, 0.5408981, 0.5410106, 0.5411231, 0.5412356,
0.5413481, 0.5414606, 0.5415731, 0.5416856, 0.5417981, 0.5419106,
0.5420231, 0.5421356, 0.5422481, 0.5423606, 0.5424731, 0.5425856,
0.5426981, 0.5428106, 0.5429231, 0.5430356, 0.5431481, 0.5432606,
0.5433731, 0.5434856, 0.5435981, 0.5437106, 0.5438231, 0.5439356,
0.5440481, 0.5441606, 0.5442731, 0.5443856, 0.5444981, 0.5446106,
0.5447231, 0.5448356, 0.5449481, 0.5450606, 0.5451731, 0.5452856,
0.5453981, 0.5455106, 0.5456231, 0.5457356, 0.5458481, 0.5459606,
0.5460731, 0.5461856, 0.5462981, 0.5464106, 0.5465231, 0.5466356,
0.5467481, 0.5468606, 0.5469731, 0.5470856, 0.5471981, 0.5473106,
0.5474231, 0.5475356, 0.5476481, 0.5477606, 0.5478731, 0.5479856,
0.5480981, 0.5482106, 0.5483231, 0.5484356, 0.5485481, 0.5486606,
0.5487731, 0.5488856, 0.5489981, 0.5491106, 0.5492231, 0.5493356,
0.5494481, 0.5495606, 0.5496731, 0.5497856, 0.5498981, 0.5500106,
0.5501231, 0.5502356, 0.5503481, 0.5504606, 0.5505731, 0.5506856,
0.5507981, 0.5509106, 0.5510231, 0.5511356, 0.5512481, 0.5513606,
0.5514731, 0.5515856, 0.5516981, 0.5518106, 0.5519231, 0.5520356,
0.5521481, 0.5522606, 0.5523731, 0.5524856, 0.5525981, 0.5527106,
0.5528231, 0.5529356, 0.5530481, 0.5531606, 0.5532731, 0.5533856,
0.5534981, 0.5536106, 0.5537231, 0.5538356, 0.5539481, 0.5540606,
0.5541731, 0.5542856, 0.5543981, 0.5545106, 0.5546231, 0.5547356,
0.5548481, 0.5549606, 0.5550731, 0.5551856, 0.5552981, 0.5554106,
0.5555231, 0.5556356, 0.5557481, 0.5558606, 0.5559731, 0.5560856,
0.5561981, 0.5563106, 0.5564231, 0.5565356, 0.5566481, 0.5567606])
I am trying to fit using the scipy InterpolatedUnivariateSpline method, interpolated with a 3rd-order spline (k=3) and extrapolated as zeros (ext='zeros'):
import scipy.interpolate as interp
yspline = interp.InterpolatedUnivariateSpline(x,y, k=3, ext='zeros')
yvals = yspline(xx)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'ko', label='Values')
ax.plot(xx, yvals, 'b-.', lw=2, label='Spline')
plt.xlim([min(x), max(x)])
However, as you can see in this image, my Spline returns NaN values :(
Is there a reason? I am pretty sure my x values are all increasing, so I am stumped as to why this is happening. I have many other datasets I am fitting using this method, and it only fails on this specific set of data.
Any help is greatly appreciated.
Thank you for reading.
EDIT!
The solution was that I had duplicate x values, with differing y values!
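A quick way to spot such duplicates before fitting (a small sketch using the arrays above):
import numpy as np
dupes = x[:-1][np.diff(x) == 0]
print(dupes)                                      # the duplicated x values
keep = np.concatenate(([True], np.diff(x) > 0))   # keep only the first of each duplicate
x_clean, y_clean = x[keep], y[keep]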
For this interpolation, you should rather use scipy.interpolate.interp1d with the argument kind='cubic' (see a related SO question).
I have yet to find a use case where InterpolatedUnivariateSpline can be used in practice (or maybe I just don't understand its purpose). With your code I get,
So the interpolation works but shows extremely strong oscillations, making it unusable, which is typically the result I was getting with this interpolation method in the past. With a lower-order spline (e.g. k=1) it works better, but then you lose the advantage of cubic interpolation.
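For reference, a minimal interp1d call; the duplicated x values noted in the question's edit must still be removed first, since a cubic fit needs strictly increasing x (sketch):
import numpy as np
import scipy.interpolate as interp
xu, iu = np.unique(x, return_index=True)      # drop the duplicated x values
f = interp.interp1d(xu, y[iu], kind='cubic')
yvals = f(xx)                                 # xx must stay within [xu.min(), xu.max()]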
I've also encountered the problem of InterpolatedUnivariateSpline returning NaN values. But in my case the reason was not duplicates in the x array, but that the values in x were decreasing, while the docs state that the values "must be increasing".
So, in such a case, instead of the original x and y one must supply them reversed: x[::-1] and y[::-1].
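A small guard for that case (sketch):
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
if np.all(np.diff(x) < 0):   # x is decreasing: reverse both arrays
    x, y = x[::-1], y[::-1]
spline = InterpolatedUnivariateSpline(x, y, k=3)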

find tangent vector at a point for discrete data points

I have a vector with a minimum of two points in space, e.g.:
A = np.array([-1452.18133319, 3285.44737438, -7075.49516676])
B = np.array([-1452.20175668, 3285.29632734, -7075.49110863])
I want to find the tangent of the vector at discrete points along the curve, e.g. the beginning and end of the curve. I know how to do it in Matlab but I want to do it in Python. This is the code in Matlab:
A = [-1452.18133319 3285.44737438 -7075.49516676];
B = [-1452.20175668 3285.29632734 -7075.49110863];
points = [A; B];
distance = [0.; 0.1667];
pp = interp1(distance, points,'pchip','pp');
[breaks,coefs,l,k,d] = unmkpp(pp);
dpp = mkpp(breaks,repmat(k-1:-1:1,d*l,1).*coefs(:,1:k-1),d);
ntangent=zeros(length(distance),3);
for j=1:length(distance)
    ntangent(j,:) = ppval(dpp, distance(j));
end
%The solution would be at beginning and end:
%ntangent =
% -0.1225 -0.9061 0.0243
% -0.1225 -0.9061 0.0243
Any ideas? I tried to find the solution using numpy and scipy using multiple methods, e.g.
tck, u= scipy.interpolate.splprep(data)
but none of the methods seem satisfy what I want.
Give der=1 to splev to get the derivative of the spline:
from scipy import interpolate
import numpy as np
t=np.linspace(0,1,200)
x=np.cos(5*t)
y=np.sin(7*t)
tck, u = interpolate.splprep([x,y])
ti = np.linspace(0, 1, 200)
dxdt, dydt = interpolate.splev(ti,tck,der=1)
OK, I found the solution, which is a small modification of pv's answer above (note that splev works only on 1D vectors).
One problem I was having originally with tck, u = scipy.interpolate.splprep(data) is that it requires a minimum of 4 points to work (Matlab works with two points). I was using two points. After increasing the number of data points, it works as I want.
Here is the solution for completeness:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
data = np.array([[-1452.18133319 , 3285.44737438, -7075.49516676],
[-1452.20175668 , 3285.29632734, -7075.49110863],
[-1452.32645025 , 3284.37412457, -7075.46633213],
[-1452.38226151 , 3283.96135828, -7075.45524248]])
distance=np.array([0., 0.15247556, 1.0834, 1.50007])
data = data.T
tck,u = interpolate.splprep(data, u=distance, s=0)
yderv = interpolate.splev(u,tck,der=1)
and the tangents are (which match the Matlab results if the same data is used):
(-0.13394599723751408, -0.99063114953803189, 0.026614957159932656)
(-0.13394598523149195, -0.99063115868512985, 0.026614950816003666)
(-0.13394595055068903, -0.99063117647357712, 0.026614941718878599)
(-0.13394595652952143, -0.9906311632471152, 0.026614954146007865)
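If unit tangent vectors are needed, one could normalize the derivatives returned by splev, e.g. (sketch):
import numpy as np
d = np.array(yderv).T                                    # one derivative vector per point
tangents = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit tangents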
