I am trying to interpolate complex values from one irregular grid to another irregular grid using Python. The grids are in 2D and there are 103,113 data points. I am using Python 2.6.6, Scipy 0.7.2, Numpy 1.3.0, Matplotlib 0.99.3
In Matlab using griddata this is achieved in roughly 5 seconds.
BnGRID2 = griddata(R_GRID1,Z_GRID1,BnGRID1,R_GRID2,Z_GRID2) (MATLAB)
(Note all arrays are 201 x 513)
However, if I try using matplotlib.mlab.griddata I get a MemoryError even if I try to work with the real part only:
mlab.griddata(R_GRID1.flatten(),Z_GRID1.flatten(),num.real(BnGRID1.flatten()),R_GRID2.flatten(),Z_GRID2.flatten())
If I try using interp2d I get a segmentation fault and Python exits:
a = interp.interp2d(R_GRID1,Z_GRID1,num.real(BnGRID1))
I have tried using KDTree and this seems to work OK; however, it takes a few minutes compared with the few seconds Matlab needs, and I haven't explored this option much yet.
Does anyone have ideas on how I can get this done as quickly as Matlab manages? I noticed that the newer version of Scipy also has griddata; does anyone know if it can handle large irregular grids?
Scipy's griddata seems to be able to deal with data sets of this size without problems:
import numpy as np
import scipy.interpolate
# old grid
x, y = np.mgrid[0:1:201j, 0:1:513j]
z = np.sin(x*20) * (1j + np.cos(y*3))**2 # some data
# new grid
x2, y2 = np.mgrid[0.1:0.9:201j, 0.1:0.9:513j]
# interpolate onto the new grid
z2 = scipy.interpolate.griddata((x.ravel(), y.ravel()), z.ravel(), (x2, y2), method='cubic')
The griddata step takes about 5s on an old AMD Athlon.
If your data is on a grid (i.e., the coordinates corresponding to value z[i,j] are (x[i], y[j])), you can get more speed by using scipy.interpolate.RectBivariateSpline
z3 = (scipy.interpolate.RectBivariateSpline(x[:,0], y[0,:], z.real)(x2[:,0], y2[0,:])
      + 1j*scipy.interpolate.RectBivariateSpline(x[:,0], y[0,:], z.imag)(x2[:,0], y2[0,:]))
which takes 0.05s. It's much faster, because even if your grid spacings are irregular, a more efficient algorithm can be used as long as the grid is rectangular.
I am trying to find the most efficient way to regrid a portion of a 2-D array to a finer grid. To keep my program general, I would prefer a solution using a standard package like numpy or scipy, but perhaps this is beyond their capacities.
The intended input data is a geotiff DEM (digital elevation models) file, imported with GDAL, and converted to a numpy array. An issue is that many of the input files have no CRS information.
In my MWE below, regridding to a 2x2 grid (for demo) takes 96 seconds on my local machine. The actual fine grids will be much larger, and the fine grid will be created multiple times in a loop. I admit this MWE might be the least efficient way to do this!
import numpy as np
import scipy.interpolate as interp
import time
nx,ny = 1600,2400 # size of input data
z_in = np.random.normal(size=(nx,ny)) # make input data
x_in,y_in = np.mgrid[0:nx,0:ny]*0.0932248 # make x,y coordinate arrays for input data
# make new (finer) x,y coordinate arrays. none of these x,y coordinates
# necessarily overlap directly with those of the input dataset.
nx_new,ny_new = 2,2 # size of fine array (in reality, this will be much larger (up to 1000x1000))
x_new,y_new = np.mgrid[0:nx_new,0:ny_new]
x_new = x_new*0.01 + 60
y_new = y_new*0.01 + 85
# regrid the data
starttime=time.time()
flattened_xy_in = np.array([x_in.flatten(),y_in.flatten()]).transpose()
fine_DEM_z = interp.griddata(flattened_xy_in, z_in.flatten(), (x_new, y_new), method='cubic')
print('that took '+str(time.time()-starttime)+' seconds...')
Your input data lies on a regular rectangular grid, so you are wasting resources by using griddata, which is meant for interpolating unstructured data. Instead it makes sense to use RectBivariateSpline, which can speed up the interpolation massively when you already have data on a grid.
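A minimal sketch of this, assuming the same sizes and spacing as the MWE above (only the 1-D coordinate axes are needed, since the grid is rectangular):
import numpy as np
import scipy.interpolate as interp
nx, ny = 1600, 2400
z_in = np.random.normal(size=(nx, ny))
# 1-D coordinate axes; RectBivariateSpline does not need the full 2-D mgrid arrays
x_axis = np.arange(nx) * 0.0932248
y_axis = np.arange(ny) * 0.0932248
# new (finer) axes, matching the MWE
x_new = np.arange(2) * 0.01 + 60
y_new = np.arange(2) * 0.01 + 85
# build the spline once, then evaluate it on the new axes
spline = interp.RectBivariateSpline(x_axis, y_axis, z_in)
fine_DEM_z = spline(x_new, y_new)   # shape (len(x_new), len(y_new))
If the fine grid is rebuilt many times in a loop, the spline can be constructed once outside the loop and only the cheap evaluation repeated.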
Hi guys first question here, looked for an answer but could not find anything, I will try to give it my best.
I am currently working on a problem in the field of Computational Physics, solving the Navier-Stokes equations numerically using the Finite Difference Method. It's my first time working with Python (I'm using a Google Colaboratory notebook with Python 3). I am solving the equations on a grid of points in a two-dimensional plane, and I created this grid using numpy arrays:
import numpy as np
import matplotlib.pyplot as plt
N = 10
data = np.zeros((N,N))
and then manipulating it. For example
for i in range(N):
    for j in range(N):
        data[i,j] = i
which makes the values of the array increase with index i. However, if I plot my data-array now using
x = np.arange(N)
y = np.arange(N)
plt.contourf(x, y, data)
plt.colorbar()
The resulting plot shows that the data increases along the y-axis, even though my manipulation of the array should make it increase along the x-axis.
I noticed this happens because the array indexing (i,j) is different from the standard orientation of the x- and y-axes, but how can I plot my data array as if i=x and j=y?
You can use numpy's ndindex function to get the indices based on shape and then unzip the result.
x,y=list(zip(*np.ndindex((N,N))))
The data is indexed row by column; the matching coordinate arrays can be obtained with meshgrid. If you want the same manipulation, you can make the data with meshgrid as
dx,dy = np.meshgrid(np.arange(N),np.arange(N))
and then plot dx to get the variation along the x-axis (dy reproduces the original loop, which contourf draws varying along y).
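A minimal sketch of this, assuming the same N and contourf setup as in the question:
import numpy as np
import matplotlib.pyplot as plt
N = 10
# dx[i, j] = j (varies along x), dy[i, j] = i (varies along y, same as the original loop)
dx, dy = np.meshgrid(np.arange(N), np.arange(N))
plt.contourf(np.arange(N), np.arange(N), dx)   # equivalently: plt.contourf(x, y, data.T)
plt.colorbar()
plt.show()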
In Python 3.7, I have a numpy array with shape=(2, 34900). This array is a list of coordinates, where index 0 represents the x axis and index 1 represents the y axis.
When I use seaborn.kdeplot() to make a visualization of the distribution of this data, I get the result in about 5-15 seconds when running on an i5 7th-generation CPU.
But when I try to run the following piece of code:
#Find the kernel for the data
k = scipy.stats.kde.gaussian_kde(data, bw_method=.3)
#Define the grid
xi, yi = np.mgrid[0:1:2000*1j, 0:1:2000*1j]
#apply the function
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
which finds the Gaussian kernel for this data and applies it to the grid I defined, it takes much more time. I wasn't able to run it on the full array, but when running on a slice of size 140 it takes about 40 seconds to complete.
The 140-sized slice does produce an interesting result, which I was able to visualize using plt.pcolormesh().
My question is: what am I missing here? If I understand what is happening correctly, I'm using scipy.stats.kde.gaussian_kde() to create an estimate of a function defined by the data, applying that function to a 2D grid, and getting its Z component as the result, which I then plot. But how can this process be so different from what seaborn.kdeplot() does that it makes my code take so much longer?
Scipy's implementation just goes through each point doing this:
for i in range(self.n):
    diff = self.dataset[:, i, newaxis] - points
    tdiff = dot(self.inv_cov, diff)
    energy = sum(diff*tdiff, axis=0) / 2.0
    result = result + exp(-energy)
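Each pass of this loop handles one data point but touches every evaluation point, so the total work grows with the number of data points times the number of grid points; a rough back-of-the-envelope sketch with the numbers from the question (not a benchmark):
n_data = 34900            # samples in the (2, 34900) array
n_eval = 2000 * 2000      # points in the 2000x2000 evaluation grid
print(n_data * n_eval)    # roughly 1.4e11 kernel evaluations in total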
Seaborn has in general two ways to calculate the bivariate kde. If available, it uses statsmodels, if not, it falls back to scipy.
The scipy code is similar to what is shown in the question. It uses scipy.stats.gaussian_kde. The statsmodels code uses statsmodels.nonparametric.api.KDEMultivariate.
However, for a fair comparison we would need to use the same grid size for both methods. The default gridsize for seaborn is 100 points.
import numpy as np; np.random.seed(42)
import seaborn.distributions as sd
N = 34900
x = np.random.randn(N)
y = np.random.randn(N)
bw="scott"
gridsize=100
cut=3
clip = [(-np.inf, np.inf), (-np.inf, np.inf)]
f = lambda x,y : sd._statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
g = lambda x,y : sd._scipy_bivariate_kde(x, y, bw, gridsize, cut, clip)
If we time those two functions,
# statsmodels
%timeit f(x,y) # 1 loop, best of 3: 16.4 s per loop
# scipy
%timeit g(x,y) # 1 loop, best of 3: 8.67 s per loop
Scipy is hence twice as fast as statsmodels (the seaborn default). The reason why the code in the question takes so long is that instead of a grid of size 100, a grid of size 2000 is used.
Seeing those results, one would actually be tempted to use scipy instead of statsmodels. Unfortunately seaborn does not let you choose which one to use, so you need to set the respective private flag manually.
import seaborn as sns
import seaborn.distributions as sd
sd._has_statsmodels = False
# plot kdeplot with scipy.stats.kde.gaussian_kde
sns.kdeplot(x, y)
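Alternatively, staying with scipy but shrinking the evaluation grid in the question's own code to seaborn's default of 100 points per axis should bring the runtime into the same range; a hedged sketch with random stand-in data:
import numpy as np
import scipy.stats
data = np.random.randn(2, 34900)   # stand-in for the real (2, 34900) coordinate array
k = scipy.stats.gaussian_kde(data, bw_method=.3)
# 100x100 evaluation grid spanning the data, instead of 2000x2000
xi, yi = np.mgrid[data[0].min():data[0].max():100*1j,
                  data[1].min():data[1].max():100*1j]
zi = k(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)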
It seems that seaborn evaluates the KDE on a much coarser grid than the 2000x2000 one I defined. Since that grid is far smaller, it finishes in a short amount of time, whereas my code evaluates the kernel at every single point of my much finer grid, so it takes far longer.
I have created a program to take position and velocity state vectors and calculate all of the Keplerian orbital elements. The next step I want to do is plot the orbit! Any advice on how to approach this using Python 3? Also, any advice about where to migrate this question (if this spot is not appropriate) would be much appreciated.
The best plotting package is, by far, matplotlib's pyplot. It is essentially a port of the Matlab plotting system to Python, but it works better than the original. Install numpy and matplotlib and look at the simple plotting tutorials. Plotting would be something like:
import matplotlib.pyplot as plt
plt.plot(X, Y, color)
plt.show()
where X and Y are 1D arrays of the corresponding x, y values. The answer can't be more specific, since you don't give details about how the variables are stored.
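That said, a hedged sketch of what X and Y could look like for a closed Keplerian orbit drawn in its orbital plane (a and e are assumed names for the semi-major axis and eccentricity your program computes):
import numpy as np
import matplotlib.pyplot as plt
a, e = 1.0, 0.3                        # example semi-major axis and eccentricity
nu = np.linspace(0, 2*np.pi, 500)      # true anomaly
r = a*(1 - e**2) / (1 + e*np.cos(nu))  # conic (orbit) equation
plt.plot(r*np.cos(nu), r*np.sin(nu))
plt.axis('equal')                      # keep the ellipse from looking squashed
plt.show()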
I recommend OpenCV. Here I used the cv2 bindings for Python.
import numpy as np
import cv2
cv2.namedWindow("Orbit", cv2.WINDOW_AUTOSIZE)
im_old = np.zeros((100, 100), dtype=np.uint8)   # uint8 so cv2.imshow shows grey levels
for i in range(360*4):
    xc, yc = 50, 50                             # centre of the orbit
    im = im_old.copy()
    # integer pixel coordinates of the current position on a circle of radius 25
    x = int(round(25*np.cos(i*np.pi/180.0) + xc))
    y = int(round(25*np.sin(i*np.pi/180.0) + yc))
    im[(x-2):(x+3), (y-2):(y+3)] = 255          # bright square at the current position
    im_old[x, y] = 128                          # leave a faint trail behind
    cv2.imshow("Orbit", im)
    cv2.waitKey(10)
This is for Python 2.7, but I think it should still work.
EDIT: This is for visualizing the actual motion, if that's what you're looking for.
Is there a library module or other straightforward way to implement multivariate spline interpolation in python?
Specifically, I have a set of scalar data on a regularly-spaced three-dimensional grid which I need to interpolate at a small number of points scattered throughout the domain. For two dimensions, I have been using scipy.interpolate.RectBivariateSpline, and I'm essentially looking for an extension of that to three-dimensional data.
The N-dimensional interpolation routines I have found are not quite good enough: I would prefer splines over LinearNDInterpolator for smoothness, and I have far too many data points (often over one million) for, e.g., a radial basis function to work.
If anyone knows of a python library that can do this, or perhaps one in another language that I could call or port, I'd really appreciate it.
If I'm understanding your question correctly, your input "observation" data is regularly gridded?
If so, scipy.ndimage.map_coordinates does exactly what you want.
It's a bit hard to understand at first pass, but essentially, you just feed it a sequence of coordinates, in pixel/voxel/n-dimensional index units, at which you want to interpolate the values of the grid.
As a 2D example:
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
# Note that the output interpolated coords will be the same dtype as your input
# data. If we have an array of ints, and we want floating point precision in
# the output interpolated points, we need to cast the array as floats
data = np.arange(40).reshape((8,5)).astype(np.float)
# I'm writing these as row, column pairs for clarity...
coords = np.array([[1.2, 3.5], [6.7, 2.5], [7.9, 3.5], [3.5, 3.5]])
# However, map_coordinates expects the transpose of this
coords = coords.T
# The "mode" kwarg here just controls how the boundaries are treated
# mode='nearest' is _not_ nearest neighbor interpolation, it just uses the
# value of the nearest cell if the point lies outside the grid. The default is
# to treat the values outside the grid as zero, which can cause some edge
# effects if you're interpolating points near the edge
# The "order" kwarg controls the order of the splines used. The default is
# cubic splines, order=3
zi = ndimage.map_coordinates(data, coords, order=3, mode='nearest')
row, column = coords
nrows, ncols = data.shape
im = plt.imshow(data, interpolation='nearest', extent=[0, ncols, nrows, 0])
plt.colorbar(im)
plt.scatter(column, row, c=zi, vmin=data.min(), vmax=data.max())
for r, c, z in zip(row, column, zi):
    plt.annotate('%0.3f' % z, (c,r), xytext=(-10,10), textcoords='offset points',
                 arrowprops=dict(arrowstyle='->'), ha='right')
plt.show()
To do this in n-dimensions, we just need to pass in the appropriate sized arrays:
import numpy as np
from scipy import ndimage
data = np.arange(3*5*9).reshape((3,5,9)).astype(np.float)
coords = np.array([[1.2, 3.5, 7.8], [0.5, 0.5, 6.8]])
zi = ndimage.map_coordinates(data, coords.T)
As far as scaling and memory usage go, map_coordinates will create a filtered copy of the array if you're using an order > 1 (i.e. not linear interpolation). If you just want to interpolate at a very small number of points, this is a rather large overhead. It doesn't increase with the number of points you want to interpolate at, however. As long as you have enough RAM for a single temporary copy of your input data array, you'll be fine.
If you can't store a copy of your data in memory, you can either a) specify prefilter=False and order=1 and use linear interpolation, or b) replace your original data with a filtered version using ndimage.spline_filter, and then call map_coordinates with prefilter=False.
Even if you have enough RAM, keeping the filtered dataset around can be a big speedup if you need to call map_coordinates multiple times (e.g. for interactive use).
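A minimal sketch of option (b), reusing the 3-D example above: filter once, then interpolate repeatedly with prefilter=False:
import numpy as np
from scipy import ndimage
data = np.arange(3*5*9).reshape((3, 5, 9)).astype(float)
filtered = ndimage.spline_filter(data, order=3)    # the expensive step, done once
coords = np.array([[1.2, 3.5, 7.8], [0.5, 0.5, 6.8]])
# evaluate against the pre-filtered array; no filtered copy is made on each call
zi = ndimage.map_coordinates(filtered, coords.T, order=3, prefilter=False)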
Smooth spline interpolation in dim > 2 is difficult to implement, and so there are not many freely available libraries able to do that (in fact, I don't know any).
You can try inverse distance weighted interpolation; see Inverse Distance Weighted (IDW) Interpolation with Python.
This should produce reasonably smooth results, and scale better than RBF to larger data sets.
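For reference, a hedged sketch of a simple k-nearest-neighbour IDW interpolator built on scipy's cKDTree (the function name and parameters are my own choices, not taken from the linked recipe):
import numpy as np
from scipy.spatial import cKDTree

def idw(known_pts, known_vals, query_pts, k=8, power=2.0, eps=1e-12):
    """Weight the k nearest known values by inverse distance to each query point."""
    tree = cKDTree(known_pts)
    dist, idx = tree.query(query_pts, k=k)
    weights = 1.0 / (dist + eps)**power        # eps avoids division by zero on exact hits
    return np.sum(weights * known_vals[idx], axis=1) / np.sum(weights, axis=1)

# usage with random stand-in 3-D data
pts = np.random.rand(100000, 3)
vals = np.sin(pts[:, 0]) + pts[:, 1]**2
print(idw(pts, vals, np.random.rand(5, 3)))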