Repeating Scipy's griddata - python

Gridding data (d) on an irregular grid (x and y) with SciPy's griddata is time-consuming when there are many datasets. However, the longitudes and latitudes (x and y) are always the same; only the data (d) change. In this case, after calling griddata once, how can I repeat the procedure with different d arrays to get results faster?
import numpy as np, matplotlib.pyplot as plt
from scipy.interpolate import griddata
x = np.array([110, 112, 114, 115, 119, 120, 122, 124]).astype(float)
y = np.array([60, 61, 63, 67, 68, 70, 75, 81]).astype(float)
d = np.array([4, 6, 5, 3, 2, 1, 7, 9]).astype(float)
ulx, lrx = np.min(x), np.max(x)
uly, lry = np.max(y), np.min(y)
xi = np.linspace(ulx, lrx, 15)
yi = np.linspace(uly, lry, 15)
grided_data = griddata((x, y), d, (xi.reshape(1,-1), yi.reshape(-1,1)), method='nearest',fill_value=0)
The above code works for one array of d.
But I have hundreds of other arrays.

griddata with method='nearest' ends up using NearestNDInterpolator. That's a class that creates a callable interpolator object, which is then called with xi:
elif method == 'nearest':
    ip = NearestNDInterpolator(points, values, rescale=rescale)
    return ip(xi)
So you could create your own NearestNDInterpolator and call it multiple times with different xi.
But I think in your case you want to change the values. Looking at the code for that class I see
self.tree = cKDTree(self.points)
self.values = y
the __call__ does:
dist, i = self.tree.query(xi)
return self.values[i]
I don't know the relative cost of creating the tree versus query.
So it should be easy to change values between uses of __call__. And it looks like values could have multiple columns, since it's just indexing on the 1st dimension.
This interpolator is simple enough that you could write your own using the same tree idea.
Here's a Nearest Interpolator that lets you repeat the interpolation for the same points, but different z values. I haven't done timings yet to see how much time it saves
import numpy as np
from scipy import interpolate

class MyNearest(interpolate.NearestNDInterpolator):
    # normal interpolation, but returns the nearest-neighbor indices as well
    # (relies on private SciPy helpers, whose location may differ between versions)
    def __call__(self, *args):
        xi = interpolate.interpnd._ndim_coords_from_arrays(args, ndim=self.points.shape[1])
        xi = self._check_call_shape(xi)
        xi = self._scale_x(xi)
        dist, i = self.tree.query(xi)
        return i, self.values[i]

def my_griddata(points, values, method='linear', fill_value=np.nan,
                rescale=False):
    points = interpolate.interpnd._ndim_coords_from_arrays(points)
    if points.ndim < 2:
        ndim = points.ndim
    else:
        ndim = points.shape[-1]
    # simplified call for 2d 'nearest'
    ip = MyNearest(points, values, rescale=rescale)
    return ip  # ip(xi)  # return the interpolator, not values
ip = my_griddata((xreg, yreg), z, method='nearest',fill_value=0)
xi = (xi.reshape(1,-1), yi.reshape(-1,1))
I, data = ip(xi)
z1 = xreg+yreg # new z data
data = z1[I] # should show diagonal color bars
So as long as z has the same shape as before (and as xreg), z[I] will return the nearest value for each xi.
It can also interpolate 2D data (e.g. of shape (225, n)):
z1 = np.array([xreg+yreg, xreg-yreg]).T
print(z1.shape) # (225,2)
data = z1[I]
print(data.shape) # (20,20,2)
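The same idea can be sketched without subclassing, using only the public cKDTree class: query the tree once for the target grid, keep the index array, and reuse it for every new d. This is a minimal sketch using the sample coordinates from the question; the helper name regrid is hypothetical, and it covers method='nearest' only.

```python
import numpy as np
from scipy.spatial import cKDTree

# Fixed source coordinates (sample values from the question)
x = np.array([110, 112, 114, 115, 119, 120, 122, 124], dtype=float)
y = np.array([60, 61, 63, 67, 68, 70, 75, 81], dtype=float)

# Fixed target grid: rows follow yi, columns follow xi
xi = np.linspace(x.min(), x.max(), 15)
yi = np.linspace(y.max(), y.min(), 15)
XI, YI = np.meshgrid(xi, yi)

# Build the tree and query it once; idx holds, for every grid cell,
# the position of its nearest source point
tree = cKDTree(np.column_stack([x, y]))
_, idx = tree.query(np.column_stack([XI.ravel(), YI.ravel()]))
idx = idx.reshape(XI.shape)

def regrid(d):
    """Nearest-neighbour gridding of d by plain indexing (hypothetical helper)."""
    return np.asarray(d)[idx]

d1 = np.array([4, 6, 5, 3, 2, 1, 7, 9], dtype=float)
grid1 = regrid(d1)                       # shape (15, 15)

# Many datasets at once: stack them as columns, shape (n_points, n_datasets)
stacked = np.column_stack([d1, d1 * 2, d1 + 1])
grids = regrid(stacked)                  # shape (15, 15, 3)
```

Each additional dataset then costs only one fancy-indexing operation, since the tree construction and the query are not repeated.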


Applying a function to meshgrids only under select polygon criteria?

My question is similar to this post but I'm having some trouble adapting it:
Ambiguous truth value for meshgrid and user-defined functions using if-statement
Essentially, I would like the conditional statement to not look like this:
import numpy as np
def test(x, y):
    a = 1.0/(1+x*x)
    b = np.ones(y.shape)
    mask = (y!=0)
    b[mask] = np.sin(y[mask])/y[mask]
    return a*b
Rather, I want the "mask" to depend on whether (x, y) lies within a certain polygon. Every value in the resulting array starts as 1, and a polygon is defined by 4 vertices. I only want the function to apply to the points from the 2 meshgrid inputs (X, Y) that lie inside the polygon.
x and y are real numbers that can be negative.
I'm not sure how to pass in the array items as singular values.
I ultimately want to plot Z on a colour plot
i.e. points within a polygon undergo a transformation, points outside the polygon remain as 1
For example, I would expect my function to look like this
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
def f(x, y, poly):
    a = 1.0/(1+x*x)
    b = np.ones(y.shape)
    mask = (Point(x,y).within(poly) == True)
    b[mask] = a*b
    return b
x and y are meshgrids of arbitrary dimensions
I should add that I get the following error:
"only size-1 arrays can be converted to Python scalars"
X and Y are generated and the function is called via
coords = [(0, 0), (4,0), (4,4), (0,4)]
poly = Polygon(coords)
x = np.linspace(0,10, 11, endpoint = True) # x intervals
y = np.linspace(0,10,11, endpoint = True) # y intervals
X, Y = np.meshgrid(x,y)
Z = f(X, Y, poly)
Error Message:
Traceback (most recent call last):
File "", line 28, in <module>
Z = f(X, Y, poly)
File "", line 16, in f
mask = (Point(x,y).within(poly) != True)
File "C:\Users\Nick\AppData\Local\Programs\Python\Python37\lib\site-packages\shapely\geometry\", line 48, in __init__
File "C:\Users\Nick\AppData\Local\Programs\Python\Python37\lib\site-packages\shapely\geometry\", line 137, in _set_coords
self._geom, self._ndim = geos_point_from_py(tuple(args))
File "C:\Users\Nick\AppData\Local\Programs\Python\Python37\lib\site-packages\shapely\geometry\", line 214, in geos_point_from_py
dx = c_double(coords[0])
TypeError: only size-1 arrays can be converted to Python scalars
Matplotlib has a function that accepts an array of points. Demo:
import numpy as np
from matplotlib.path import Path
coords = [(0, 0), (4,0), (4,4), (0,4)]
x = np.linspace(0, 10, 11, endpoint=True)
y = np.linspace(0, 10, 11, endpoint=True)
X, Y = np.meshgrid(x, y)
points = np.c_[X.ravel(), Y.ravel()]
mask = Path(coords).contains_points(points).reshape(X.shape)
You are passing an array to the function Point that accepts single values.
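Putting the mask into the function from the question could look like the sketch below. It assumes the intended transformation is 1/(1+x^2) inside the polygon and 1 elsewhere, and it passes the vertex list coords instead of a shapely Polygon; both are assumptions, not part of the original answer.

```python
import numpy as np
from matplotlib.path import Path

def f(x, y, coords):
    """Return 1/(1+x^2) inside the polygon given by coords, and 1 outside."""
    # Flatten the meshgrids into an (N, 2) array of points for the test
    points = np.column_stack([x.ravel(), y.ravel()])
    mask = Path(coords).contains_points(points).reshape(x.shape)
    b = np.ones(x.shape)
    b[mask] = 1.0 / (1.0 + x[mask] ** 2)   # transform only the inside points
    return b

coords = [(0, 0), (4, 0), (4, 4), (0, 4)]
x = np.linspace(0, 10, 11)
y = np.linspace(0, 10, 11)
X, Y = np.meshgrid(x, y)
Z = f(X, Y, coords)
```

Points lying exactly on the polygon boundary may be classified either way, so if edge points matter it is worth checking them separately. Z can then be passed to a colour plot such as pcolormesh.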

Number format python

I want the legend of the plot to show the values from a list, but what I get looks like the element index rather than the value itself. I don't know how to fix it. I'm referring to the plt.plot line. Thanks for the help.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)
n = len(x)
d_ij = []
for i in range(n):
    for j in range(i+1,n):
        a = np.sqrt((x[i]-x[j])**2+(y[i]-y[j])**2)
        d_ij.append(a)
epsilon = np.linspace(0.01,1,num=10)
sigma = np.linspace(0.01,1,num=10)
def lj_pot(epsi,sig,d):
    result = []
    for i in range(len(d)):
        a = 4*epsi*((sig/d[i])**12-(sig/d[i])**6)
        result.append(a)
    return result
for i in range(len(epsilon)):
    for j in range(len(sigma)):
        a = epsilon[i]
        b = sigma[j]
        plt.ylim([-1.5, 1.5])
        plt.xlim([0, 2])
        plt.plot(sorted(d_ij),lj_pot(epsilon[i],sigma[j],sorted(d_ij)),label = 'epsilon = %d, sigma =%d' %(a,b))
        plt.savefig("epsilon_%d_sigma_%d.png" % (i,j))
Your code is a bit unpythonic, so I tried to clean it up to the best of my knowledge. numpy.random.random and numpy.random.uniform(0, 1) draw from the same distribution, and both accept the shape of the array to return, in this case 1000 rows and two columns, (1000, 2); uniform additionally lets you choose the bounds. I then unpack the two columns of the returned array into x and y in a single line.
numpy.hypot does as the name suggests and calculates the hypotenuse for x and y. It works element-wise on arrays of the same size, which saves you the for loops; you should try to avoid those in Python since they are pretty slow.
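Note that np.hypot(x, y) applied to the coordinate arrays themselves gives each point's distance from the origin, whereas the double loop in the question computes the distance between every pair of points. If the pairwise distances are what is wanted, one possible vectorised sketch (the use of np.triu_indices here is a suggestion, not part of the original answer) is:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)

# Index pairs (i, j) with i < j, in the same order as the nested loops
i, j = np.triu_indices(len(x), k=1)

# Element-wise hypotenuse of the coordinate differences = pairwise distance
d_ij = np.hypot(x[i] - x[j], y[i] - y[j])
```

For 1000 points this yields 1000*999/2 = 499500 distances without any Python-level loop.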
You used plt for all your plotting, which is fine as long as you only have one figure, but I would recommend to be as explicit as possible, according to one of Python's key notions:
explicit is better than implicit.
I recommend you read through this guide, in particular the section called 'Stateful Versus Stateless Approaches'. I changed your commands accordingly.
It is also very unpythonic to loop over items of a list using the index of the item in the list like you did (for i in range(len(list)): item = list[i]). You can just reference the item directly (for item in list:).
Lastly I changed your formatted strings to the more convenient f-strings. Have a read here.
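The labels in the question look like indices because %d formats a number as an integer, so every float between 0 and 1 is truncated to 0. A short sketch of the difference, with sample values chosen only for illustration:

```python
epsilon, sigma = 0.12, 0.45   # illustrative values, not taken from the question

# %d truncates floats to integers, so the actual values are lost
old_label = 'epsilon = %d, sigma =%d' % (epsilon, sigma)

# %.2f (or the equivalent f-string format spec) keeps two decimals
pct_label = 'epsilon = %.2f, sigma = %.2f' % (epsilon, sigma)
new_label = f'epsilon = {epsilon:.2f}, sigma = {sigma:.2f}'

print(old_label)   # epsilon = 0, sigma =0
print(new_label)   # epsilon = 0.12, sigma = 0.45
```

The same applies to the file name passed to savefig if the values rather than the loop indices should appear in it.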
import matplotlib.pyplot as plt
import numpy as np
def pot(epsi, sig, d):
    result = 4*epsi*((sig/d)**12 - (sig/d)**6)
    return result
# I am not sure why you would create the independent variable this way,
# maybe you are simulating something. In that case, the code below is
# simpler than your version. Note that it gives each point's distance
# from the origin, not the pairwise distances computed in your loops.
# x, y = zip(*np.random.uniform(0, 1, (1000, 2)))
# d = np.array(sorted(np.hypot(x, y)))
# If you only want to plot your pot function then creating the value range
# like this is just fine.
d = np.linspace(0.001, 1, 1000)
epsilons = sigmas = np.linspace(0.01, 1, num=10)
fig, ax = plt.subplots()
ax.set_xlim([0, 2])
ax.set_ylim([-1.5, 1.5])
line = None
for epsilon in epsilons:
    for sigma in sigmas:
        if line is None:
            # ax.plot returns a list of lines; keep the single Line2D
            line = ax.plot(
                d, pot(epsilon, sigma, d),
                label=f'epsilon = {epsilon}, sigma = {sigma}'
            )[0]
        else:
            line.set_data(d, pot(epsilon, sigma, d))
        # plt.savefig(f"epsilon_{epsilon}_sigma_{sigma}.png")

How can I interpolate data in python?

I have a 4D dataset (time, z, y, x) and I would like to interpolate the data to get a higher resolution, this is a simple example code:
import numpy as np
from scipy.interpolate import griddata
x_0 = 10
cut_index = 10
res = 200j
x_index = x_0
y_index = np.linspace(0, 100, 50).astype(int)
z_index = np.linspace(0, 50, 25).astype(int)
#Time, zyx-coordinate
u = np.random.randn(20, 110, 110, 110)
z_index, y_index = np.meshgrid(z_index, y_index)
data = u[cut_index, z_index, y_index, x_index]
res = 200j
y_f = np.mgrid[0:100:res]
z_f = np.mgrid[0:50:res]
z_f, y_f = np.meshgrid(z_f, y_f)
data = griddata((z_index, y_index), data, (z_f, y_f))
I am getting the ValueError: invalid shape for input data points error. What kind of input is expected by the griddata function?
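griddata expects the points as 1D coordinate arrays (or a single (n, D) array) and the values as a matching 1D array, but your index arrays and data are 2D meshgrids. Try flattening the arrays: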
data = griddata((z_index.flatten(), y_index.flatten()), data.flatten(), (z_f, y_f))
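A reduced, self-contained version of the question's example with the flattening applied is sketched below. The array u is made much smaller than in the question to keep it light, and the fine grid is built with np.linspace instead of np.mgrid; both are choices made for this illustration only.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
u = rng.standard_normal((2, 60, 110, 20))    # (time, z, y, x), reduced size

z_idx = np.linspace(0, 50, 25).astype(int)
y_idx = np.linspace(0, 100, 50).astype(int)
Z, Y = np.meshgrid(z_idx, y_idx)             # both of shape (50, 25)
data = u[1, Z, Y, 10]                        # coarse slice, shape (50, 25)

# Target grid with higher resolution
z_f, y_f = np.meshgrid(np.linspace(0, 50, 200), np.linspace(0, 100, 200))

# Points and values are flattened; the target grid may stay 2D
fine = griddata((Z.ravel(), Y.ravel()), data.ravel(), (z_f, y_f))
print(fine.shape)                            # (200, 200)
```

Because the source points here lie on a regular grid, scipy.interpolate.RegularGridInterpolator may be a faster alternative, although that goes beyond the original answer.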

IndexError: too many indices for array for an array that is definitely as big

I'm trying to make a movie by taking png images of an updating plot and stitching them together. There are three variables: degrees, ksB, and mp. Only mp changes each frame; the other two are constant. The data for mp for all times is stored in X. This is the relevant part of the code:
def plot(fname, haveMLPY=False):
    # Load data from .npz file.
    data = np.load(fname)
    X = data["X"]
    T = data["T"]
    N = X.shape[1]
    A = data["vipWeights"]
    degrees = A.sum(1)
    ksB = data["ksB"]
    # Initialize a figure.
    figure = plt.figure()
    # Generate a plottable axis as the first subplot in 1 rows and 1 columns.
    axis = figure.add_subplot(1,1,1)
    # MP is the first (0th) variable. Plot one trajectory for each cell over time.
    axis.plot(T, X[:,:,0], color="black")
    # Decorate the plot.
    axis.set_xlabel("time [hours]")
    axis.set_ylabel("MP [nM]")
    axis.set_title("PER mRNA concentration across all %d cells" % N)
    firstInd = int(T.size / 2)
    if haveMLPY:
        import circadian.analysis
        # Generate and plot a Signal object, which encapsulates wavelet analysis.
        signal = circadian.analysis.Signal(X[firstInd:, 0, 0], T[firstInd:])
    # filename for the name of the resulting movie
    filename = 'animation'
    mp = X[10**4-1,:,0]
    from mpl_toolkits.mplot3d import Axes3D
    for i in range(10**4):
        print i
        mp = X[i,:,0]
        data2 = np.c_[degrees, ksB, mp]
        # Find best fit surface for data2
        # regular grid covering the domain of the data
        mn = np.min(data2, axis=0)
        mx = np.max(data2, axis=0)
        X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
        XX = X.flatten()
        YY = Y.flatten()
        order = 2 # 1: linear, 2: quadratic
        if order == 1:
            # best-fit linear plane
            A = np.c_[data2[:,0], data2[:,1], np.ones(data2.shape[0])]
            C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2]) # coefficients
            # evaluate it on grid
            Z = C[0]*X + C[1]*Y + C[2]
            # or expressed using matrix/vector product
            #Z = np.dot(np.c_[XX, YY, np.ones(XX.shape)], C).reshape(X.shape)
        elif order == 2:
            # best-fit quadratic curve
            A = np.c_[np.ones(data2.shape[0]), data2[:,:2], np.prod(data2[:,:2], axis=1), data2[:,:2]**2]
            C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2])
            # evaluate it on a grid
            Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X.shape)
        fig = plt.figure()
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
        ax.scatter(degrees, ksB, mp)
        # form a filename
        fname2 = '_tmp%03d.png'%i
        # save the frame
        # append the filename to the list
    # call mencoder
    os.system("mencoder 'mf://_tmp*.png' -mf type=png:fps=10 -ovc lavc -lavcopts vcodec=wmv2 -oac copy -o " + filename + ".mpg")
    # cleanup
    for fname2 in files: os.remove(fname2)
Basically, all the data is stored in X. The format X[i, i, i] means X[time, neuron, data type]. Each time through the loop, I want to update the time, but still plot mp (the 0th variable) for all the neurons.
When I run this code, I get "IndexError: too many indices for array". I asked it to print i to see when the code was going wrong. I get an error when i = 1, meaning that the code loops through once but then has the error the second time.
However, I have data for 10^4 time steps. You can see in the first line of the provided code, I access X[10**4-1, :, 0] successfully. That's why it's confusing to me why X[1,:,0] would be out of range. If anybody could explain why/help me get around this, that would be great.
The traceback error is
Traceback (most recent call last):
File"/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts /", line 196, in module
File"/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts /", line 142, in plot
mp = X[i,:,0]
IndexError: too many indices for array
Your problem is that you overwrite your X inside your loop:
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
So afterwards X has a different shape and contains different data. I would suggest renaming this second X to X_grid (and Y to Y_grid), then checking where you need the grid and where you need the original X.
for example:
X_grid, Y_grid = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
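The effect can be reproduced in a few lines. This is a toy sketch with a small made-up array standing in for the real data; the function name run and the shapes are hypothetical.

```python
import numpy as np

def run(shadow):
    """Return the loop index at which indexing fails, or None if it never does."""
    X = np.zeros((5, 3, 2))               # toy data: (time, neuron, variable)
    for i in range(3):
        try:
            mp = X[i, :, 0]               # needs the original 3D array
        except IndexError:
            return i
        grid = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
        if shadow:
            X, Y = grid                   # overwrites X with a 2D (4, 4) grid
        else:
            X_grid, Y_grid = grid         # leaves the data array untouched
    return None

print(run(shadow=True))    # 1
print(run(shadow=False))   # None
```

The first pass works because X is still the 3D data array; from the second pass on it is a 2D grid, which matches the failure at i = 1 described in the question.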

Scipy Griddata Output Dimensions

I'm not sure what I'm doing wrong. I'm attempting to use scipy griddata to interpolate data in an irregular grid.
from scipy.interpolate import griddata
I have two lists, "x" and "y", that represent the axes of my original, uninterpolated grid. They are both lists of length 8.
Then, I make the arrays that represent the axes of the intended final, filled-in grid.
ny = np.linspace(0.0, max(y), y[len(y)-1]/min_interval+1)
nx = np.linspace(0.0, max(x), len(ny))
I've checked and both "ny" and "nx" are of shape (61,). Then, I create an 8 x 8 list "z". Finally, I attempt to make my final grid.
Z = griddata((np.array(x), np.array(y)), np.array(z), (nx, ny), method='nearest', fill_value=0)
print Z.shape
The resulting 2D array has dimensions (61,8). I tried using "x" and "y" as lists and arrays - no change. Why is it only interpolating in one direction? I was expecting a (61,61) array output.
I would have included actual numbers if I felt it would have been helpful, but I don't see how it would make a difference. Do I not understand how griddata works?
Here is the full code:
import numpy as np
from scipy.interpolate import griddata
# random data to interpolate
x = np.array([0, 10, 13, 17, 20, 50, 55, 60.0])
y = np.array([10, 20, 40, 80, 90, 95, 100, 120.0])
zg = np.random.randn(8, 8)
# select one of the following two lines, depending on the order in z
#xg, yg = np.broadcast_arrays(x[:, None], y[None, :])
xg, yg = np.broadcast_arrays(x[None, :], y[:, None])
yg2, xg2 = np.mgrid[y.min()-10:y.max()+10:100j, x.min()-10:x.max()+10:100j]
zg2 = griddata((xg.ravel(), yg.ravel()), zg.ravel(), (xg2.ravel(), yg2.ravel()), method="nearest")
zg2.shape = yg2.shape
import pylab as pl
pl.pcolormesh(xg2, yg2, zg2)
pl.scatter(xg.ravel(), yg.ravel(), c=zg.ravel())
The output is a pcolormesh plot of the interpolated grid with the original points overlaid as a scatter plot.
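The (61, 8) shape in the question follows from how the arguments are interpreted: (x, y) is read as 8 scattered points, the 8 x 8 z as 8 values per point, and (nx, ny) as 61 query points along a single line rather than a 61 x 61 grid. A minimal sketch of both calls, assuming z[i, j] belongs to (x[j], y[i]) and using a made-up z:

```python
import numpy as np
from scipy.interpolate import griddata

x = np.array([0, 10, 13, 17, 20, 50, 55, 60.0])
y = np.array([10, 20, 40, 80, 90, 95, 100, 120.0])
z = np.arange(64, dtype=float).reshape(8, 8)   # made-up values, z[i, j] at (x[j], y[i])

nx = np.linspace(0.0, x.max(), 61)
ny = np.linspace(0.0, y.max(), 61)

# As in the question: 8 points, 61 query points, 8 values each
Z_line = griddata((x, y), z, (nx, ny), method='nearest')
print(Z_line.shape)                            # (61, 8)

# Expand both the source axes and the target axes into full grids first
xg, yg = np.meshgrid(x, y)                     # 64 source points
nxg, nyg = np.meshgrid(nx, ny)                 # 61 x 61 target grid
Z_grid = griddata((xg.ravel(), yg.ravel()), z.ravel(), (nxg, nyg), method='nearest')
print(Z_grid.shape)                            # (61, 61)
```

If z is stored the other way round (z[i, j] at (x[i], y[j])), the source grid needs np.meshgrid(x, y, indexing='ij') instead.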
