scipy.interp2d warning and different result than expected - python

I'm trying to convert MATLAB code to equivalent python.
I have 3 arrays and I want to compute interp2d:
nuA = np.asarray([2.439,2.5,2.6,2.7,2.8,3.0,3.2,3.5,4.0,5.0,6.0,8.0,10,15,25])
nuB = np.asarray([0,0.1,0.2,0.3,0.5,0.7,1])
a, b = np.meshgrid(nuA, nuB)
betaTab = np.transpose(np.asarray([[0.0,2.16,1.0,1.0,1.0,1.0,1.0],[0.0,1.592,3.39,1.0,1.0,1.0,1.0],[0.0,0.759,1.8,1.0,1.0,1.0,1.0],[0.0,0.482,1.048,1.694,1.0,1.0,1.0],[0.0,0.36,0.76,1.232,2.229,1.0,1.0],[0.0,0.253,0.518,0.823,1.575,1.0,1.0],[0.0,0.203,0.41,0.632,1.244,1.906,1.0],[0.0,0.165,0.332,0.499,0.943,1.56,1.0],[0.0,0.136,0.271,0.404,0.689,1.23,2.195],[0.0,0.109,0.216,0.323,0.539,0.827,1.917],[0.0,0.096,0.19,0.284,0.472,0.693,1.759],[0.0,0.082,0.163,0.243,0.412,0.601,1.596],[0.0,0.074,0.147,0.22,0.377,0.546,1.482],[0.0,0.064,0.128,0.191,0.33,0.478,1.362],[0.0,0.056,0.112,0.167,0.285,0.428,1.274]]))
ip = scipy.interpolate.interp2d(a,b,betaTab)
when I try to run it, this warning is displayed:
/usr/local/lib/python2.7/dist-packages/scipy/interpolate/fitpack.py:981: RuntimeWarning: No more knots can be added because the additional knot would
coincide with an old one. Probable cause: s too small or too large
a weight to an inaccurate data point. (fp>s)
kx,ky=1,1 nx,ny=4,14 m=105 fp=21.576347 s=0.000000
warnings.warn(RuntimeWarning(_iermess2[ierm][0] + _mess))
I know that interp2d differs from MATLAB's interp2 and that in Python the RectBivariateSpline function is preferred. But I can't use the latter function because of the length of my data. Also, the final result of ip(xi,yi) is different from the MATLAB answer.
How can I compute interp2d without warning and compute it correctly?

Your input data seems to be quite ill-defined. Here's a surface plot of your input points:
This is not an easy problem to interpolate. Incidentally, I recently ran into problems where interp2d couldn't even interpolate a smooth data set. So I would suggest checking out scipy.interpolate.griddata instead:
import numpy as np
import scipy.interpolate as interp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
#define your data as you did in your question: a, b and betaTab
ip = interp.interp2d(a,b,betaTab) # original interpolator
aplotv = np.linspace(a.min(),a.max(),100) # to interpolate at
bplotv = np.linspace(b.min(),b.max(),100) # to interpolate at
aplot,bplot = np.meshgrid(aplotv,bplotv) # mesh to interpolate at
# actual values from interp2d:
betainterp2d = ip(aplotv,bplotv)
# actual values from griddata:
betagriddata = interp.griddata(np.array([a.ravel(),b.ravel()]).T,betaTab.ravel(),np.array([aplot.ravel(),bplot.ravel()]).T)
# ^ this probably could be written in a less messy way,
# I'll keep thinking about it
#plot results
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(aplot,bplot,betainterp2d,cmap='viridis',cstride=1,rstride=1)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(aplot,bplot,betagriddata,cmap='viridis',cstride=1,rstride=1)
Results: (left: interp2d, right: griddata)
Conclusion: use scipy.interpolate.griddata.
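Since the table in the question sits on a rectilinear grid (one axis for nuA, one for nuB), a grid-based linear interpolator is another option worth trying. The following is only a sketch: it uses scipy.interpolate.RegularGridInterpolator, which is not mentioned in the thread, and a three-by-three subset of the question's table to stay short. MATLAB's interp2 defaults to linear interpolation, so this may come closer to its output, but that has not been checked against the asker's MATLAB results.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Subset of the question's table (first three nuA and nuB values only).
nuA = np.array([2.439, 2.5, 2.6])
nuB = np.array([0.0, 0.1, 0.2])

# Rows follow nuB, columns follow nuA, like the transposed betaTab.
betaTab = np.array([
    [0.0, 0.0, 0.0],
    [2.16, 1.592, 0.759],
    [1.0, 3.39, 1.8],
])

# Axis order must match the array layout: (row axis, column axis).
interp = RegularGridInterpolator((nuB, nuA), betaTab, method="linear")

# Each query point is a (nuB, nuA) pair: one grid node, one midpoint in nuB.
query = np.array([[0.1, 2.5], [0.05, 2.5]])
values = interp(query)
print(values)
```

At a grid node the interpolator returns the tabulated value, and between nodes it blends the neighbouring entries linearly, so no knot-placement warning is involved.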

Related

Smooth Contourf plot completely filled

I have data with (X,Y,Z) values. I tried to make a density plot using the Z values as intensity. However, the plot I get is not smooth and shows polygonal patches, i.e. it is not completely filled.
The following is the code with the data. I want to obtain a smooth and completely filled plot:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
import xlrd
location = "~/Desktop/Data.xlsx"
data = xlrd.open_workbook(location)
sheet = data.sheet_by_index(0)
sample=2000
x=np.array(sheet.col_values(0))[0:sample]
y=np.array(sheet.col_values(1))[0:sample]
z=np.hamming(9000)[0:sample]
print(z)
def plot_contour(x,y,z,resolution = 500,contour_method='cubic'):
    resolution = str(resolution)+'j'
    X,Y = np.mgrid[min(x):max(x):complex(resolution), min(y):max(y):complex(resolution)]
    points = [[a,b] for a,b in zip(x,y)]
    Z = griddata(points, z, (X, Y), method=contour_method)
    return X,Y,Z
X,Y,Z = plot_contour(x,y,z,resolution = 500,contour_method='linear')
plt.style.context("seaborn-deep")
plt.contourf(X,Y,Z)
plt.colorbar()
plt.show()
This is the output:
This is what I want to achieve using contourf:
plt.contourf() is not the main problem here, it's just working with the data it has. The problem is the linear interpolation in scipy.interpolate.griddata().
I recommend not using griddata, but instead using one of the following methods:
scipy.interpolate.Rbf() — this is what you were using before (see my previous answer).
verde — an awesome gridding package.
sklearn.gaussian_process — or some other prediction model.
All of these methods will fill in the grid. If you plot the result with plt.imshow() you'll get the type of plot you show in your question — that is not a plt.contourf() plot.
Here's a demo notebook showing all of these approaches (including griddata).
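As a rough illustration of the difference, the sketch below grids the same scattered points with griddata (linear) and with Rbf. The data are synthetic stand-ins (random points and a made-up surface), not the asker's spreadsheet, and the thin-plate kernel is just one possible choice.

```python
import numpy as np
from scipy.interpolate import Rbf, griddata

# Synthetic scattered samples standing in for the (x, y, z) columns.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = rng.uniform(0, 1, 200)
z = np.sin(4 * x) * np.cos(4 * y)

# Regular grid covering the full unit square.
X, Y = np.mgrid[0:1:50j, 0:1:50j]

# Linear griddata leaves NaN outside the convex hull of the samples.
Z_linear = griddata((x, y), z, (X, Y), method="linear")

# Rbf is defined everywhere, so the whole grid gets a value.
rbf = Rbf(x, y, z, function="thin_plate")
Z_rbf = rbf(X, Y)

print("NaNs with griddata:", np.isnan(Z_linear).sum())
print("NaNs with Rbf:", np.isnan(Z_rbf).sum())
```

Either array can then be passed to plt.contourf(X, Y, Z) or plt.imshow(Z.T, origin="lower"); only the Rbf result is filled out to the grid corners.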

How to extract x,y data from kdensity plot from matplotlib for python

I am trying to figure out how to make a 3d figure of uni-variate kdensity plots as they change over time (since they pull from a sliding time window of data over time).
Since I can't figure out how to do that directly, I am first trying to get the x,y plotting data for kdensity plots of matplotlib in python. I hope after I extract them I can use them along with a time variable to make a three dimensional plot.
I see several posts telling how to do this in Matlab. All reference getting Xdata and Ydata from the underlying figure:
x=get(h,'Xdata')
y=get(h,'Ydata')
How about in python?
The answer was already contained in another thread (How to create a density plot in matplotlib?). It is pretty easy to get a set of kdensity x's and y's from a set of data.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8 # data is a set of univariate data
xs = np.linspace(0,max(data),200) # This 200 sets the # of x (and so also y) points of the kdensity plot
density = gaussian_kde(data)
density.covariance_factor = lambda : .25
density._compute_covariance()
ys = density(xs)
plt.plot(xs,ys)
And there you have it: both the kdensity plot and its underlying x,y data.
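The same bandwidth factor can probably be set without touching the private _compute_covariance method, by passing a scalar bw_method to gaussian_kde. This is a sketch using the same sample data; a scalar bw_method is used directly as the covariance factor, so it is expected to give the same curve.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same univariate sample as in the answer above.
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8

# 200 evaluation points for the density curve.
xs = np.linspace(0, max(data), 200)

# A scalar bw_method sets the bandwidth factor through the public API.
density = gaussian_kde(data, bw_method=0.25)
ys = density(xs)

# xs and ys are the plotting data; stack them if a single array is handy.
curve = np.column_stack([xs, ys])
```

For the sliding-window case, one such (xs, ys) pair per time window could be collected and plotted against the window time as the third axis.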
Not sure how kdensity plots work, but note that matplotlib.pyplot.plot returns a list of the added Line2D objects, which are, in fact, where the X and Y data are stored. I suspect they did that to make it work similarly to MATLAB.
import matplotlib.pyplot as plt
h = plt.plot([1,2,3],[2,4,6]) # [<matplotlib.lines.Line2D object at 0x021DA9F0>]
x = h[0].get_xdata() # [1,2,3]
y = h[0].get_ydata() # [2,4,6]

Interpolating Data Using SciPy

I have two arrays of data that correspond to x and y values, that I would like to interpolate with a cubic spline.
I have tried to do this, but my interpolated function doesn't pass through my data points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
re = np.array([0.2,2,20,200,2000,20000],dtype = float)
cd = np.array([103,13.0,2.72,0.800,0.401,0.433],dtype = float)
plt.yscale('log')
plt.xscale('log')
plt.xlabel( "Reynold's number" )
plt.ylabel( "Drag coefficient" )
plt.plot(re,cd,'x', label='Data')
x = np.linspace(0.2,20000,200000)
f = interp1d(re,cd,kind='cubic')
plt.plot(x,f(x))
plt.legend()
plt.show()
What I end up with looks like this:
Which is clearly an awful representation of my function. What am I missing here?
Thank you.
You can get the result you probably expect (smooth spline on the log axes) by doing this:
f = interp1d(np.log(re),np.log(cd), kind='cubic')
plt.plot(x,np.exp(f(np.log(x))))
This will build the interpolation in the log space and plot it correctly. Plot your data on a linear scale to see how the cubic has to flip to get the tail on the left hand side.
The main thing you are missing is the log scaling on your axes. The spline shown is not an unreasonable result given your input data. Try drawing the plot with plt.xscale('linear') instead of plt.xscale('log'). Perhaps a cubic spline is not the best interpolation technique, at least on the raw data. A better option may be to interpolate on the log of the data instead.
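A self-contained sketch of the log-space approach, using the data from the question. The evaluation points are generated in log space (log_x is a name introduced here) so that they stay inside the interpolation range and are evenly spread on a log axis.

```python
import numpy as np
from scipy.interpolate import interp1d

re = np.array([0.2, 2, 20, 200, 2000, 20000], dtype=float)
cd = np.array([103, 13.0, 2.72, 0.800, 0.401, 0.433], dtype=float)

# Build the cubic spline on (log re, log cd) instead of the raw values.
log_re = np.log(re)
log_f = interp1d(log_re, np.log(cd), kind='cubic')

# Evaluate on points evenly spaced in log space, then map back.
log_x = np.linspace(log_re[0], log_re[-1], 200)
x = np.exp(log_x)
y = np.exp(log_f(log_x))

# The curve still passes through the original data points.
cd_at_data = np.exp(log_f(log_re))
```

Plotting x against y with log-scaled axes then gives the smooth curve; because of the exponential, the interpolated drag coefficient also stays positive everywhere.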

histogram matching in Python

I am trying to do histogram matching of simulated data to observed precipitation data. The code below shows a simple simulated case. I got the CDFs of both the simulated and observed data and got stuck there. I hope a clue will help me get across. Thank you in advance.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import scipy.stats as st
sim = st.gamma(1,loc=0,scale=0.8) # Simulated
obs = st.gamma(2,loc=0,scale=0.7) # Observed
x = np.linspace(0,4,1000)
simpdf = sim.pdf(x)
obspdf = obs.pdf(x)
plt.plot(x,simpdf,label='Simulated')
plt.plot(x,obspdf,'r--',label='Observed')
plt.title('PDF of Observed and Simulated Precipitation')
plt.legend(loc='best')
plt.show()
plt.figure(1)
simcdf = sim.cdf(x)
obscdf = obs.cdf(x)
plt.plot(x,simcdf,label='Simulated')
plt.plot(x,obscdf,'r--',label='Observed')
plt.title('CDF of Observed and Simulated Precipitation')
plt.legend(loc='best')
plt.show()
# Inverse CDF
invcdf = interp1d(obscdf,x)
transfer_func = invcdf(simcdf)
plt.figure(2)
plt.plot(transfer_func,x,'g-')
plt.show()
I tried to reproduce your code, and got the following error:
ValueError: A value in x_new is above the interpolation range.
If you look at the plot of your two CDFs, it is pretty straightforward to figure out what is going on:
When you now define invcdf = interp1d(obscdf, x), notice that obscdf ranges from
>>> obscdf[0]
0.0
>>> obscdf[-1]
0.977852889924409
and so invcdf can only interpolate values between those limits: beyond them we would have to do extrapolation, which is not all that well defined. SciPy's default behavior is to raise an error when asked to extrapolate. Which is exactly what happens when you ask for invcdf(simcdf), because
>>> simcdf[-1]
0.99326205300091452
is beyond the interpolation range.
If you read the interp1d docs you will see that this behavior can be modified by passing bounds_error=False:
invcdf = interp1d(obscdf, x, bounds_error=False)
and now everything works out fine, although you need to reverse the order of your plotting arguments to plt.plot(x, transfer_func,'g-') to get the same as in the figure you posted:
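One caveat worth checking: with bounds_error=False, interp1d returns NaN (its default fill_value) for any requested value outside the range, so the tail of the transfer function is simply missing. The sketch below shows this and, as an alternative that is not part of the answer above, uses the frozen distribution's ppf method (the exact inverse CDF), which only applies when the observed distribution is known analytically as in this toy case.

```python
import numpy as np
import scipy.stats as st
from scipy.interpolate import interp1d

sim = st.gamma(1, loc=0, scale=0.8)  # Simulated
obs = st.gamma(2, loc=0, scale=0.7)  # Observed
x = np.linspace(0, 4, 1000)
simcdf = sim.cdf(x)
obscdf = obs.cdf(x)

# Interpolated inverse CDF: out-of-range requests become NaN.
invcdf = interp1d(obscdf, x, bounds_error=False)
transfer_interp = invcdf(simcdf)
n_missing = np.isnan(transfer_interp).sum()

# Exact inverse CDF of the observed distribution, no range limit.
transfer_exact = obs.ppf(simcdf)

print("points lost to NaN:", n_missing)
```

Where both are defined the two transfer functions agree to within the grid spacing of x; the ppf version also covers the upper tail.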

Add more sample points to data

Given some data of shape 20x45, where each row is a separate data set, say 20 different sine curves with 45 data points each, how would I go about getting the same data, but with shape 20x100?
In other words, I have some data A of shape 20x45, and some data B of length 20x100, and I would like to have A be of shape 20x100 so I can compare them better.
This is for Python and Numpy/Scipy.
I assume it can be done with splines, so I am looking for a simple example, maybe just 2x10 to 2x20 or something, where each row is just a line, to demonstrate the solution.
Thanks!
Ubuntu beat me to it while I was typing this example, but his example just uses linear interpolation, which can be done more easily with numpy.interp... (The difference is only a keyword argument in scipy.interpolate.interp1d, however).
I figured I'd include my example, as it shows using scipy.interpolate.interp1d with a cubic spline...
import numpy as np
import scipy as sp
import scipy.interpolate
import matplotlib.pyplot as plt
# Generate some random data
y = (np.random.random(10) - 0.5).cumsum()
x = np.arange(y.size)
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
# Plot the results
plt.figure()
plt.subplot(2,1,1)
plt.plot(x, y, 'bo-')
plt.title('Using 1D Cubic Spline Interpolation')
plt.subplot(2,1,2)
plt.plot(new_x, new_y, 'ro-')
plt.show()
One way would be to use scipy.interpolate.interp1d:
import scipy as sp
import scipy.interpolate
import numpy as np
x=np.linspace(0,2*np.pi,45)
y=np.zeros((2,45))
y[0,:]=np.sin(x)
y[1,:]=np.sin(2*x)
f=sp.interpolate.interp1d(x,y)
y2=f(np.linspace(0,2*np.pi,100))
If your data is fairly dense, it may not be necessary to use higher order interpolation.
If your application is not sensitive to precision or you just want a quick overview, you could just fill the unknown data points with averages from neighbouring known data points (in other words, do naive linear interpolation).
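Putting the pieces together for the 20x45 to 20x100 case, here is a sketch that resamples every row at once. The sine curves and the names A, x_old and x_new are made up for illustration; interp1d interpolates along the axis given by its axis argument, and numpy.interp is shown row by row for the linear case.

```python
import numpy as np
from scipy.interpolate import interp1d

# 20 illustrative sine curves with 45 samples each.
x_old = np.linspace(0, 2 * np.pi, 45)
freqs = np.linspace(0.5, 2.0, 20)
A = np.sin(np.outer(freqs, x_old))          # shape (20, 45)

# Target sample positions over the same interval.
x_new = np.linspace(0, 2 * np.pi, 100)

# Cubic spline along axis 1 handles all rows in one call.
A_cubic = interp1d(x_old, A, kind='cubic', axis=1)(x_new)

# Linear alternative: numpy.interp works on one row at a time.
A_linear = np.vstack([np.interp(x_new, x_old, row) for row in A])

print(A_cubic.shape, A_linear.shape)
```

Both results have shape (20, 100) and can be compared directly with a 20x100 array B sampled on the same x_new positions.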
