Why does InterpolatedUnivariateSpline return nan values - python

I have some data, y vs x, which I would like to interpolate at a finer resolution xx using a cubic spline.
Here is my dataset:
import numpy as np
print np.version.version
import scipy
print scipy.version.version
1.9.2
0.15.1
x = np.array([0.5372973, 0.5382103, 0.5392305, 0.5402197, 0.5412042, 0.54221, 0.543209,
0.5442277, 0.5442277, 0.5452125, 0.546217, 0.5472153, 0.5482086,
0.5492241, 0.5502117, 0.5512249, 0.5522136, 0.5532056, 0.5532056,
0.5542281, 0.5552039, 0.5562125, 0.5567836])
y = np.array([0.01, 0.03108, 0.08981, 0.18362, 0.32167, 0.50941, 0.72415, 0.90698,
0.9071, 0.97955, 0.99802, 1., 0.97863, 0.9323, 0.85344, 0.72936,
0.56413, 0.36997, 0.36957, 0.17623, 0.05922, 0.0163, 0.01, ])
xx = np.array([0.5372981, 0.5374106, 0.5375231, 0.5376356, 0.5377481, 0.5378606,
0.5379731, 0.5380856, 0.5381981, 0.5383106, 0.5384231, 0.5385356,
0.5386481, 0.5387606, 0.5388731, 0.5389856, 0.5390981, 0.5392106,
0.5393231, 0.5394356, 0.5395481, 0.5396606, 0.5397731, 0.5398856,
0.5399981, 0.5401106, 0.5402231, 0.5403356, 0.5404481, 0.5405606,
0.5406731, 0.5407856, 0.5408981, 0.5410106, 0.5411231, 0.5412356,
0.5413481, 0.5414606, 0.5415731, 0.5416856, 0.5417981, 0.5419106,
0.5420231, 0.5421356, 0.5422481, 0.5423606, 0.5424731, 0.5425856,
0.5426981, 0.5428106, 0.5429231, 0.5430356, 0.5431481, 0.5432606,
0.5433731, 0.5434856, 0.5435981, 0.5437106, 0.5438231, 0.5439356,
0.5440481, 0.5441606, 0.5442731, 0.5443856, 0.5444981, 0.5446106,
0.5447231, 0.5448356, 0.5449481, 0.5450606, 0.5451731, 0.5452856,
0.5453981, 0.5455106, 0.5456231, 0.5457356, 0.5458481, 0.5459606,
0.5460731, 0.5461856, 0.5462981, 0.5464106, 0.5465231, 0.5466356,
0.5467481, 0.5468606, 0.5469731, 0.5470856, 0.5471981, 0.5473106,
0.5474231, 0.5475356, 0.5476481, 0.5477606, 0.5478731, 0.5479856,
0.5480981, 0.5482106, 0.5483231, 0.5484356, 0.5485481, 0.5486606,
0.5487731, 0.5488856, 0.5489981, 0.5491106, 0.5492231, 0.5493356,
0.5494481, 0.5495606, 0.5496731, 0.5497856, 0.5498981, 0.5500106,
0.5501231, 0.5502356, 0.5503481, 0.5504606, 0.5505731, 0.5506856,
0.5507981, 0.5509106, 0.5510231, 0.5511356, 0.5512481, 0.5513606,
0.5514731, 0.5515856, 0.5516981, 0.5518106, 0.5519231, 0.5520356,
0.5521481, 0.5522606, 0.5523731, 0.5524856, 0.5525981, 0.5527106,
0.5528231, 0.5529356, 0.5530481, 0.5531606, 0.5532731, 0.5533856,
0.5534981, 0.5536106, 0.5537231, 0.5538356, 0.5539481, 0.5540606,
0.5541731, 0.5542856, 0.5543981, 0.5545106, 0.5546231, 0.5547356,
0.5548481, 0.5549606, 0.5550731, 0.5551856, 0.5552981, 0.5554106,
0.5555231, 0.5556356, 0.5557481, 0.5558606, 0.5559731, 0.5560856,
0.5561981, 0.5563106, 0.5564231, 0.5565356, 0.5566481, 0.5567606])
I am trying to fit using the scipy InterpolatedUnivariateSpline method, interpolated with a 3rd order spline k=3, and extrapolated as zeros ext='zeros':
import scipy.interpolate as interp
yspline = interp.InterpolatedUnivariateSpline(x,y, k=3, ext='zeros')
yvals = yspline(xx)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'ko', label='Values')
ax.plot(xx, yvals, 'b-.', lw=2, label='Spline')
plt.xlim([min(x), max(x)])
However, as you can see in this image, my Spline returns NaN values :(
Is there a reason? I am pretty sure my x values are all increasing, so I am stumped as to why this is happening. I have many other datasets I am fitting using this method, and it only fails on this specific set of data.
Any help is greatly appreciated.
Thank you for reading.
EDIT!
The solution was that I have duplicate x values, with differing y values!

For this interpolation, you should rather use scipy.interpolate.interp1d with the argument kind='cubic' (see a related SO question )
I have yet to find a use case where InterpolatedUnivariateSpline can be used in practice (or maybe I just don't understand its purpose). With your code I get,
So the interpolation works but shows extremely strong oscillations, making it unusable, which is typically the result I was getting with this interpolation method in the past. With a lower order spline (e.g. k=1) that works better, but then you lose the advantage of cubic interpolation.

I've also encountered the problem with InterpolatedUnivariateSpline returning NaN values. But in my case the reason was not in having duplicates in x array but because values in x were decreasing when docs states that values "must be increasing".
So, in such a case, instead of original x and y one must supply them reversed: x[::-1] and y[::-1].

Related

scipy.interpolate.rbf messes up interpolation for large z

Can someone tell me what the limitations of .Rbf are? I have quite large z values but also when scaling them down I get weirdly random values for a non integer grid. When using a integer grid it seems to work alright...
import scipy.interpolate as sci
import numpy as np
x=np.linspace(-1.4,-1.25641026,3)
y=np.linspace(-30,-23.68421052,5)
afd=np.array([[67012.37999509, 65416.76027155, 59893.18041826],[52767.97720273, 56538.82517715, 54604.71202974],[50725.8556277 , 42643.22600322, 44000.07955214],[42121.84799917, 39075.56307817, 39949.93033376],[31695.19321716, 41350.21431499, 32807.28556624]])
A,B=np.meshgrid(x,y)
x1=A.ravel()
x2=B.ravel()
z=afd.ravel()
tck = sci.Rbf(x1, x2, z, smooth=0)
xnew=np.linspace(-1.4,-1.25641026,3)
ynew=np.linspace(-30,-23.68421052,1*5) #double grid density
Anew,Bnew=np.meshgrid(xnew,ynew)
plt.subplot(121)
plt.pcolor(Anew,Bnew,tck(Anew,Bnew))
print(tck(Anew,Bnew))
plt.subplot(122)
xnew=np.linspace(-1.4,-1.25641026,2*3-1)
ynew=np.linspace(-30,-23.68421052,2*5-1) #double grid density
Anew,Bnew=np.meshgrid(xnew,ynew)
plt.pcolor(Anew,Bnew,tck(Anew,Bnew))
print(tck(Anew,Bnew))

Fitting sin curve using python

I am having two list:
# on x-axis:
# list1:
[70.434654, 37.147266, 8.5787086, 161.40877, -27.31284, 80.429482, -81.918106, 52.320129, 64.064552, -156.40771, 12.37026, 15.599689, 166.40984, 134.93636, 142.55002, -38.073524, -38.073524, 123.88509, -82.447571, 97.934402, 106.28793]
# on y-axis:
# list2:
[86683.961, -40564.863, 50274.41, 80570.828, 63628.465, -87284.016, 30571.402, -79985.648, -69387.891, 175398.62, -132196.5, -64803.133, -269664.06, 36493.316, 22769.121, 25648.252, 25648.252, 53444.855, 684814.69, 82679.977, 103244.58]
I need to fit a sine curve a+bsine(2*3.14*list1+c) in the data points obtained by plotting list1(on x-axis) against(on-y-axis) using python.
I am not able to get any good result.Can anyone help me with a suitable code,explanation...
Thanks!
this is my graph after plotting the list1(on x-axis) and list2(on y-axis)
Well, if you used lmfit setting up and running your fit would look like this:
xdeg = [70.434654, 37.147266, 8.5787086, 161.40877, -27.31284, 80.429482, -81.918106, 52.320129, 64.064552, -156.40771, 12.37026, 15.599689, 166.40984, 134.93636, 142.55002, -38.073524, -38.073524, 123.88509, -82.447571, 97.934402, 106.28793]
y = [86683.961, -40564.863, 50274.41, 80570.828, 63628.465, -87284.016, 30571.402, -79985.648, -69387.891, 175398.62, -132196.5, -64803.133, -269664.06, 36493.316, 22769.121, 25648.252, 25648.252, 53444.855, 684814.69, 82679.977, 103244.58]
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
def sinefunction(x, a, b, c):
return a + b * np.sin(x*np.pi/180.0 + c)
smodel = Model(sinefunction)
result = smodel.fit(y, x=xdeg, a=0, b=30000, c=0)
print(result.fit_report())
plt.plot(xdeg, y, 'o', label='data')
plt.plot(xdeg, result.best_fit, '*', label='fit')
plt.legend()
plt.show()
That is assuming your X data is in degrees, and that you really intended to convert that to radians (as numpy's sin() function requires).
But that just addresses the mechanics of how to do the fit (and I'll leave the display of results up to you - it seems like you may need the practice).
The fit result is terrible, because these data are not sinusoidal. They are also not well ordered, which isn't a problem for doing the fit, but does make it harder to see what is going on.

`ValueError: A value in x_new is above the interpolation range.` - what other reasons than not ascending values?

I receive this error in scipy interp1d function. Normally, this error would be generated if the x was not monotonically increasing.
import scipy.interpolate as spi
def refine(coarsex,coarsey,step):
finex = np.arange(min(coarsex),max(coarsex)+step,step)
intfunc = spi.interp1d(coarsex, coarsey,axis=0)
finey = intfunc(finex)
return finex, finey
for num, tfile in enumerate(files):
tfile = tfile.dropna(how='any')
x = np.array(tfile['col1'])
y = np.array(tfile['col2'])
finex, finey = refine(x,y,0.01)
The code is correct, because it successfully worked on 6 data files and threw the error for the 7th. So there must be something wrong with the data. But as far as I can tell, the data increase all the way down.
I am sorry for not providing an example, because I am not able to reproduce the error on an example.
There are two things that could help me:
Some brainstorming - if the data are indeed monotonically
increasing, what else could produce this error? Another hint,
regarding the decimals, could be in this question, but I think
my solution (the min and max of x) is robust enough to avoid it. Or
isn't it?
Is it possible (how?) to return the value of x_new and
it's index when throwing the ValueError: A value in x_new is above the interpolation range. so that I could actually see where in the
file is the problem?
UPDATE
So the problem is that, for some reason, max(finex) is larger than max(coarsex) (one is .x39 and the other is .x4). I hoped rounding the original values to 2 significant digits would solve the problem, but it didn't, it displays fewer digits but still counts with the undisplayed. What can I do about it?
If you are running Scipy v. 0.17.0 or newer, then you can pass fill_value='extrapolate' to spi.interp1d, and it will extrapolate to accomadate these values of your's that lie outside the interpolation range. So define your interpolation function like so:
intfunc = spi.interp1d(coarsex, coarsey,axis=0, fill_value="extrapolate")
Be forewarned, however!
Depending on what your data looks like and the type on interpolation you are performing, the extrapolated values can be erroneous. This is especially true if you have noisy or non-monotonic data. In your case you might be ok because your x_new value is only slighly beyond your interpolation range.
Here's simple demonstration of how this feature can work nicely but also give erroneous results.
import scipy.interpolate as spi
import numpy as np
x = np.linspace(0,1,100)
y = x + np.random.randint(-1,1,100)/100
x_new = np.linspace(0,1.1,100)
intfunc = spi.interp1d(x,y,fill_value="extrapolate")
y_interp = intfunc(x_new)
import matplotlib.pyplot as plt
plt.plot(x_new,y_interp,'r', label='interp/extrap')
plt.plot(x,y, 'b--', label='data')
plt.legend()
plt.show()
So the interpolated portion (in red) worked well, but the extrapolated portion clearly fails to follow the otherwise linear trend in this data because of the noise. So have some understanding of your data and proceed with caution.
A quick test of your finex calc shows that it can (always?) gets into the extrapolation region.
In [124]: coarsex=np.random.rand(100)
In [125]: max(coarsex)
Out[125]: 0.97393109991816473
In [126]: step=.01;finex=np.arange(min(coarsex), max(coarsex)+step, step);(max(
...: finex),max(coarsex))
Out[126]: (0.98273730602114795, 0.97393109991816473)
In [127]: step=.001;finex=np.arange(min(coarsex), max(coarsex)+step, step);(max
...: (finex),max(coarsex))
Out[127]: (0.97473730602114794, 0.97393109991816473)
Again it is a quick test, and may be missing some critical step or value.

Scipy's RectBivariateSpline returns wrong value?

Trying to interpolate data from a regular input grid, and came across this in the documentation for scipy.interpolate.interp2d:
See also RectBivariateSpline Much faster 2D interpolation if your
input data is on a grid
So I tried using scipy.interpolate.RectBivariateSpline instead of interp2d. Docs for both functions seem very similar, so I expected this to produce similar results:
import numpy as np
from scipy.interpolate import RectBivariateSpline, interp2d
from .constants import data
x_coords = y_coords = np.arange(data.shape[0]) # Square array
interp_fun = interp2d(x_coords, y_coords, data)
bivar_fun = RectBivariateSpline(x_coords, y_coords, data)
data[250, 60] # 76.1451873779
interp_fun(60, 250) # 76.14518738
bivar_fun(60, 250, grid=False) # 345.24444
Am I calling this wrong? I have no idea why the interpolation based on RectBivariateSpline is so far off?
I did suspect that maybe RectBivariateSpline operates on a cartesian grid and inverted the y-Axis of input data, but still no luck.
Right, just before submitting this I thought I should try calling bivar_fun(y, x) instead of bivar_fun(x, y) and things suddenly work:
data[250, 60] # 76.1451873779
interp_fun(60, 250) # 76.14518738
bivar_fun(250, 60, grid=False) # [ 76.14518738]
Still not quite sure why, because the first arguments to interp_fun and bivar_fun should be the same:
RectBivariateSpline.__call__(x, y, mth=None, dx=0, dy=0, grid=True)
RectBivariateSpline Docs
interp2d.__call__(x, y, dx=0, dy=0)
Interp2d Docs
There's also a related issue on Github: https://github.com/scipy/scipy/issues/3164

Gradient in noisy data, python

I have an energy spectrum from a cosmic ray detector. The spectrum follows an exponential curve but it will have broad (and maybe very slight) lumps in it. The data, obviously, contains an element of noise.
I'm trying to smooth out the data and then plot its gradient.
So far I've been using the scipy sline function to smooth it and then the np.gradient().
As you can see from the picture, the gradient function's method is to find the differences between each point, and it doesn't show the lumps very clearly.
I basically need a smooth gradient graph. Any help would be amazing!
I've tried 2 spline methods:
def smooth_data(y,x,factor):
print "smoothing data by interpolation..."
xnew=np.linspace(min(x),max(x),factor*len(x))
smoothy=spline(x,y,xnew)
return smoothy,xnew
def smooth2_data(y,x,factor):
xnew=np.linspace(min(x),max(x),factor*len(x))
f=interpolate.UnivariateSpline(x,y)
g=interpolate.interp1d(x,y)
return g(xnew),xnew
edit: Tried numerical differentiation:
def smooth_data(y,x,factor):
print "smoothing data by interpolation..."
xnew=np.linspace(min(x),max(x),factor*len(x))
smoothy=spline(x,y,xnew)
return smoothy,xnew
def minim(u,f,k):
""""functional to be minimised to find optimum u. f is original, u is approx"""
integral1=abs(np.gradient(u))
part1=simps(integral1)
part2=simps(u)
integral2=abs(part2-f)**2.
part3=simps(integral2)
F=k*part1+part3
return F
def fit(data_x,data_y,denoising,smooth_fac):
smy,xnew=smooth_data(data_y,data_x,smooth_fac)
y0,xnnew=smooth_data(smy,xnew,1./smooth_fac)
y0=list(y0)
data_y=list(data_y)
data_fit=fmin(minim, y0, args=(data_y,denoising), maxiter=1000, maxfun=1000)
return data_fit
However, it just returns the same graph again!
There is an interesting method published on this: Numerical Differentiation of Noisy Data. It should give you a nice solution to your problem. More details are given in another, accompanying paper. The author also gives Matlab code that implements it; an alternative implementation in Python is also available.
If you want to pursue the interpolation with splines method, I would suggest to adjust the smoothing factor s of scipy.interpolate.UnivariateSpline().
Another solution would be to smooth your function through convolution (say with a Gaussian).
The paper I linked to claims to prevent some of the artifacts that come up with the convolution approach (the spline approach might suffer from similar difficulties).
I won't vouch for the mathematical validity of this; it looks like the paper from LANL that EOL cited would be worth looking into. Anyway, I’ve gotten decent results using SciPy’s splines’ built-in differentiation when using splev.
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import splrep, splev
x = np.arange(0,2,0.008)
data = np.polynomial.polynomial.polyval(x,[0,2,1,-2,-3,2.6,-0.4])
noise = np.random.normal(0,0.1,250)
noisy_data = data + noise
f = splrep(x,noisy_data,k=5,s=3)
#plt.plot(x, data, label="raw data")
#plt.plot(x, noise, label="noise")
plt.plot(x, noisy_data, label="noisy data")
plt.plot(x, splev(x,f), label="fitted")
plt.plot(x, splev(x,f,der=1)/10, label="1st derivative")
#plt.plot(x, splev(x,f,der=2)/100, label="2nd derivative")
plt.hlines(0,0,2)
plt.legend(loc=0)
plt.show()
You can also use scipy.signal.savgol_filter.
Result
Example
import matplotlib.pyplot as plt
import numpy as np
import scipy
from random import random
# generate data
x = np.array(range(100))/10
y = np.sin(x) + np.array([random()*0.25 for _ in x])
dydx = scipy.signal.savgol_filter(y, window_length=11, polyorder=2, deriv=1)
# Plot result
plt.plot(x, y, label='Original signal')
plt.plot(x, dydx*10, label='1st Derivative')
plt.plot(x, np.cos(x), label='Expected 1st Derivative')
plt.legend()
plt.show()

Categories