NumPy histogram2d with a rotated, non-orthogonal binning grid - python

I need to compute (and plot) a histogram2d but my binning grid is rotated and also non-orthogonal.
A way of doing this could be to apply a transformation to my data so I get it into a cartesian system, compute my histogram2d and then apply the inverse transformation.
Can this be done directly, without the overhead of this transformation?
I guess my question is: how do I define the bins for my histogram2d in this case? (AFAIK, histogram2d only accepts bins aligned with the x and y axes.)
My data is 2 huge lists of points (10k~100k each), the coordinates of which are given in a cartesian coordinate system (actually a projected CRS because these are real-world locations) but they are organized in a regular grid that is not aligned to X and Y axis (rotated) and that may or may not be orthogonal. The binning grid will be derived from it so it will be a (rotated) regular quadrilaterals grid.
I have seen that matplotlib has a QuadMesh object (see here) so I'm being hopeful but I'm not sure how to handle this in NumPy.
Basically this is what I want to achieve:

After some testing, I came to the conclusion that the overhead of transforming the coordinates into a Cartesian grid to compute the histogram and back for plotting is acceptable. Matrix operations in NumPy are fairly efficient and I can handle 115+ million points in less than 7 sec.
However, the "back" part can be handled by Matplotlib directly with matplotlib.transforms.
pcolormesh, hist2d and imshow all accept a transform keyword which can be used to plot the Cartesian data into the desired coordinates like so:
# set I, J, bins (in the Cartesian system) and cmap
# a, b, c, d, e, f are values of the transformation matrix
transform = matplotlib.transforms.Affine2D.from_values(a, b, c, d, e, f)
fig, ax = plt.subplots(figsize=figsize)
_, _, _, im = ax.hist2d(I, J, bins=bins, cmap=cmap, transform=transform + ax.transData)
fig.colorbar(im)
ax.autoscale()
It is not really much faster than handling the "back" conversion with NumPy but it can make the code lighter as it only requires 1 additional line and 1 additional keyword.
imshow can be a bit of a pain: it does not update the display extent after ax.autoscale(), and it treats coordinates in image (matrix) convention, so the transform has to be adjusted accordingly. For these reasons, I prefer hist2d.
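The forward step (real-world coordinates into the grid's own coordinate system) can be sketched with plain NumPy. This is only a minimal sketch: it assumes the grid is described by an origin and two cell vectors u and v, and the helper name histogram_on_skewed_grid is hypothetical, not part of the original answer.

```python
import numpy as np

def histogram_on_skewed_grid(x, y, origin, u, v, shape):
    """Count points in the cells of a rotated, possibly non-orthogonal grid.

    origin : (2,) corner of the grid in real-world coordinates
    u, v   : (2,) vectors spanning one cell along the grid's I and J axes
    shape  : (ni, nj) number of cells along each grid axis
    """
    # Columns of B are the cell vectors, so xy = origin + B @ (i, j).
    B = np.column_stack([u, v]).astype(float)
    xy = np.vstack([x, y]) - np.asarray(origin, dtype=float)[:, None]
    # Invert the affine map: fractional grid indices of every point.
    i, j = np.linalg.solve(B, xy)
    ni, nj = shape
    counts, i_edges, j_edges = np.histogram2d(
        i, j, bins=[np.arange(ni + 1), np.arange(nj + 1)]
    )
    return counts, i_edges, j_edges

# Hypothetical grid: first axis rotated by 30 degrees, second axis sheared.
origin = np.array([100.0, 200.0])
u = 2.0 * np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
v = 1.5 * np.array([-0.3, 1.0])

# One point at the centre of cell (1, 2) and two at the centre of cell (0, 0).
cells = np.array([[1.5, 2.5], [0.5, 0.5], [0.5, 0.5]])
pts = origin + cells @ np.vstack([u, v])
counts, _, _ = histogram_on_skewed_grid(pts[:, 0], pts[:, 1], origin, u, v, (3, 4))
```

The bin edges returned here are in grid-index units. With this convention, the six values passed to Affine2D.from_values for plotting would be a, b = u, then c, d = v, then e, f = origin.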
References:
https://matplotlib.org/3.1.1/api/transformations.html#module-matplotlib.transforms
https://matplotlib.org/3.1.1/tutorials/advanced/transforms_tutorial.html

Related

Bilinear interpolation from a (distorted) rectangular 2D grid to arbitrary points, in Python

The task at hand is seemingly simple:
I have a 2D grid of data. The data is available in 2D arrays for X and Y coordinates, as well as the input variable which I want to interpolate. This means I can plot the data using rectangular cells, which means it is possible to use bilinear interpolation. Unfortunately, the data is not precisely aligned with the coordinates, and also not precisely spaced. There were some numerics involved in creating the data, which means that all sampling locations are a little off the mark, and the cell spacing is smooth but not uniform.
I would like to interpolate from this input grid to a set of predefined sample coordinates (as opposed to simply refining the mesh).
In short, an example for my type of input is:
# a nice, regular grid
Xs, Ys = np.meshgrid(np.linspace(0, 1, num=3), np.linspace(0, 1, num=5))
# ...perturbed by some systematic and some random noise ...
X_in = Xs + np.random.normal(scale=0.03, size=(5, 3))
# ...and some systematic deviation
Y_in = (Ys + np.random.normal(scale=0.03, size=(5, 3)))* (1 + Xs**1.5)
# and some variable at each node to interpolate
Z_in = np.random.normal(scale=1, size=(5, 3))
So (X_in, Y_in) are arrays of shape (n, m) which define a mesh with quadrilateral cells, and Z_in is another array of the same shape which provides a value at each node in that mesh. I am looking for some Python library that performs bilinear interpolation of Z_in across those cells.
However, all methods I have found so far either ignore the rectangular structure (and triangulate the data, or fit some 2D spline through arbitrary point clouds), or require a perfectly rectangular and equally-spaced grid as input (which mine is not).
Examples of answers/methods that seem not to be applicable:
This answer recommends using scipy.ndimage.map_coordinates -- but that effectively uses the indices of the 2D input data array as coordinates, which won't work for me.
scipy.interpolate.interp2d requires either a regular grid (node locations provided by 1D X and Y arrays), or an irregular one, which is flattened, which means that the algorithm cannot know which nodes form a cell. This means it either fits some spline through unstructured data, or triangulates it. And it only interpolates onto regular grids or individual points.
scipy.interpolate.RectBivariateSpline is recommended for interpolation from gridded data but only accepts input points which are perfectly aligned with the coordinate system.
There's also a Matplotlib toolkit for interpolation, which I had thought should be able to do this sort of thing, as it also does interpolated contour plots of rectangular meshes, but as it turns out, even though mpl_toolkits.basemap.interp accepts arbitrary quadrilateral meshes as target for interpolation, it cannot use them as inputs ...
Upon closer inspection, it turns out that even matplotlib.pyplot.contour() does not seem to perform bilinear interpolation when plotting the input data:
plt.contour(X_in, Y_in, Z_in, levels=np.linspace(Z_in.min(), Z_in.max(), 50))
plt.plot(X_in, Y_in, 'k-')
plt.plot(X_in.T, Y_in.T, 'k-')
As you can see, the contour lines within the cells are straight, but with bilinear interpolation they should not be, and there should not be those empty quadrilateral areas in the middle of some cells. I suspect that Matplotlib only finds the contour values on the cell edges and simply draws straight lines between them.
I have found two explanations of the maths of bilinear interpolation from grids which are not perfectly aligned, but I was hoping to come across a ready-made implementation somewhere because I'm sure that this kind of task is not so rare, and a numpy or scipy implementation (if it exists) is probably way faster than whatever I'd implement myself.
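For a single quadrilateral cell, the maths can be sketched as follows: invert the bilinear map from local coordinates (s, t) to the target point with a few Newton steps, then blend the four nodal values. This is a minimal sketch, not a ready-made library routine. The function name bilinear_in_quad and the corner ordering are assumptions, and finding which cell contains a given point is not covered.

```python
import numpy as np

def bilinear_in_quad(corners, values, p, tol=1e-12, max_iter=50):
    """Interpolate at point p inside one quadrilateral cell.

    corners : (4, 2) array, ordered P00, P10, P01, P11
    values  : (4,) nodal values in the same order
    p       : (2,) target point
    Returns (value, s, t) with s, t the local coordinates in [0, 1].
    """
    P00, P10, P01, P11 = np.asarray(corners, dtype=float)
    p = np.asarray(p, dtype=float)
    s, t = 0.5, 0.5  # start from the cell centre
    for _ in range(max_iter):
        # Forward bilinear map and its residual at the current guess.
        pos = ((1 - s) * (1 - t) * P00 + s * (1 - t) * P10
               + (1 - s) * t * P01 + s * t * P11)
        res = pos - p
        if np.linalg.norm(res) < tol:
            break
        # Jacobian columns: d(pos)/ds and d(pos)/dt.
        J = np.column_stack([
            (1 - t) * (P10 - P00) + t * (P11 - P01),
            (1 - s) * (P01 - P00) + s * (P11 - P10),
        ])
        ds, dt = np.linalg.solve(J, -res)
        s, t = s + ds, t + dt
    z00, z10, z01, z11 = values
    value = ((1 - s) * (1 - t) * z00 + s * (1 - t) * z10
             + (1 - s) * t * z01 + s * t * z11)
    return value, s, t

# A distorted cell with made-up nodal values.
corners = np.array([[0.0, 0.0], [1.0, 0.1], [-0.1, 1.0], [1.2, 1.3]])
values = np.array([1.0, 2.0, 3.0, 5.0])
# Build a target point from known local coordinates, then recover them.
s_true, t_true = 0.3, 0.6
target = ((1 - s_true) * (1 - t_true) * corners[0] + s_true * (1 - t_true) * corners[1]
          + (1 - s_true) * t_true * corners[2] + s_true * t_true * corners[3])
value, s, t = bilinear_in_quad(corners, values, target)
```

For the arrays in the question, cell (i, j) would use the nodes [i, j], [i, j+1], [i+1, j] and [i+1, j+1] of X_in, Y_in and Z_in as P00, P10, P01 and P11 respectively.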

Plotting a 2D plane through a 3D surface

I'm trying to visualise a 2D plane cutting through a 3D graph with Numpy and Matplotlib to explain the intuition of partial derivatives.
Specifically, the function I'm using is J(θ1,θ2) = θ1^2 + θ2^2, and I want to plot a θ1-J(θ1,θ2) plane at θ2=0.
I have managed to plot a 2D plane with the below code but the superposition of the 2D plane and the 3D graph isn't quite right and the 2D plane is slightly off, as I want the plane to look like it's cutting the 3D at θ2=0.
It would be great if I can borrow your expertise on this, thanks.
def f(theta1, theta2):
    return theta1**2 + theta2**2
fig, ax = plt.subplots(figsize=(6, 6),
                       subplot_kw={'projection': '3d'})
x,z = np.meshgrid(np.linspace(-1,1,100), np.linspace(0,2,100))
X = x.T
Z = z.T
Y = 0 * np.ones((100, 100))
ax.plot_surface(X, Y, Z)
r = np.linspace(-1,1,100)
theta1_grid, theta2_grid = np.meshgrid(r,r)
J_grid = f(theta1_grid, theta2_grid)
ax.contour3D(theta1_grid,theta2_grid,J_grid,500,cmap='binary')
ax.set_xlabel(r'$\theta_1$',fontsize='large')
ax.set_ylabel(r'$\theta_2$',fontsize='large')
ax.set_zlabel(r'$J(\theta_1,\theta_2)$',fontsize='large')
ax.set_title(r'Fig.2 $J(\theta_1,\theta_2)=(\theta_1^2+\theta_2^2)$',fontsize='x-large')
plt.tight_layout()
plt.show()
This is the image output by the code:
As @ImportanceOfBeingErnest noted in a comment, your code is fine but matplotlib has a 2d engine, so 3d plots easily show weird artifacts. In particular, objects are rendered one at a time, so two 3d objects are typically either fully in front of or fully behind one another, which makes the visualization of interlocking 3d objects near impossible using matplotlib.
My personal alternative suggestion would be mayavi (incredible flexibility and visualizations, pretty steep learning curve), however I would like to show a trick with which the problem can often be removed altogether. The idea is to turn your two independent objects into a single one using an invisible bridge between your surfaces. Possible downsides of the approach are that
you need to plot both surfaces as surfaces rather than a contour3D, and
the output relies heavily on transparency, so you need a backend that can handle that.
Disclaimer: I learned this trick from a contributor to the matplotlib topic of the now-defunct Stack Overflow Documentation project, but unfortunately I don't remember who that user was.
In order to use this trick for your use case, we essentially have to turn that contour3D call into another plot_surface one. I don't think this is overall that bad; you perhaps need to reconsider the density of your cutting plane if you see that the resulting figure has too many faces for interactive use. We also have to explicitly define a point-by-point colormap, the alpha channel of which contributes the transparent bridge between your two surfaces. Since we need to stitch the two surfaces together, at least one "in-plane" dimension of the surfaces has to match; in this case I made sure that the points along "y" are the same in the two cases.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def f(theta1, theta2):
    return theta1**2 + theta2**2
fig, ax = plt.subplots(figsize=(6, 6),
                       subplot_kw={'projection': '3d'})
# plane data: X, Y, Z, C (first three shaped (nx,ny), last one shaped (nx,ny,4))
x,z = np.meshgrid(np.linspace(-1,1,100), np.linspace(0,2,100)) # <-- you can probably reduce these sizes
X = x.T
Z = z.T
Y = 0 * np.ones((100, 100))
# colormap for the plane: need shape (nx,ny,4) for RGBA values
C = np.full(X.shape + (4,), [0,0,0.5,1]) # dark blue plane, fully opaque
# surface data: theta1_grid, theta2_grid, J_grid, CJ (shaped (nx',ny) or (nx',ny,4))
r = np.linspace(-1,1,X.shape[1]) # <-- we are going to stitch the surface along the y dimension, sizes have to match
theta1_grid, theta2_grid = np.meshgrid(r,r)
J_grid = f(theta1_grid, theta2_grid)
# colormap for the surface; normalize data to [0, 1] before applying the colormap
# (np.ptp is used because the ndarray.ptp method is not available in NumPy 2.0+)
CJ = plt.get_cmap('binary')((J_grid - J_grid.min())/np.ptp(J_grid))
# construct a common dataset with an invisible bridge, shape (2,ny) or (2,ny,4)
X_bridge = np.vstack([X[-1,:],theta1_grid[0,:]])
Y_bridge = np.vstack([Y[-1,:],theta2_grid[0,:]])
Z_bridge = np.vstack([Z[-1,:],J_grid[0,:]])
C_bridge = np.full(Z_bridge.shape + (4,), [1,1,1,0]) # 0 opacity == transparent; probably needs a backend that supports transparency!
# join the datasets
X_surf = np.vstack([X,X_bridge,theta1_grid])
Y_surf = np.vstack([Y,Y_bridge,theta2_grid])
Z_surf = np.vstack([Z,Z_bridge,J_grid])
C_surf = np.vstack([C,C_bridge,CJ])
# plot the joint datasets as a single surface, pass colors explicitly, set strides to 1
ax.plot_surface(X_surf, Y_surf, Z_surf, facecolors=C_surf, rstride=1, cstride=1)
ax.set_xlabel(r'$\theta_1$',fontsize='large')
ax.set_ylabel(r'$\theta_2$',fontsize='large')
ax.set_zlabel(r'$J(\theta_1,\theta_2)$',fontsize='large')
ax.set_title(r'Fig.2 $J(\theta_1,\theta_2)=(\theta_1^2+\theta_2^2)$',fontsize='x-large')
plt.tight_layout()
plt.show()
The result from two angles:
As you can see, the result is pretty decent. You can start playing around with the individual transparencies of your surfaces to see if you can make that cross-section more visible. You can also switch the opacity of the bridge to 1 to see how your surfaces are actually stitched together. All in all what we had to do was take your existing data, make sure their sizes match, and define explicit colormaps and the auxiliary bridge between the surfaces.

Python - Problems contour plotting offset grid of data

My data is regularly spaced, but not quite a grid - each row of points is slightly offset from the one below.
The data is in the form of 3 1D arrays, x, y, z, with each index corresponding to a point. It is smoothly varying data - approximately Gaussian.
The point density is quite high. What is the best way to plot this data?
I tried meshgrid, but it gives me some bad contours through regions that have no data points near the contour's value.
I have tried rbf interpolation according to this post:
Python : 2d contour plot from 3 lists : x, y and rho?
but this just gives me nonsense - all the contours are on one edge - does not reflect the data at all.
Any other ideas for what I can try. Maybe I should be using some sort of nearest neighbour interpolation? Here is a picture of about a 1/4 of my data points: http://imgur.com/a/b00R6
I'm surprised it is causing me such difficulty - it seems like it should be fairly easy to plot.
The easiest way to plot ungridded data is probably tricontour or tricontourf (a filled tricontour plot).
Given 1D arrays x, y and z holding the coordinates and values, you'd simply call
plt.tricontourf(x,y,z, n, ...)
to obtain n levels of contours.
The other quick method is to interpolate on a grid using matplotlib.mlab.griddata to obtain a regular grid from the irregular points.
Both methods are compared in an example on the matplotlib page:
Tricontour vs. griddata
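Note that matplotlib.mlab.griddata was removed in Matplotlib 3.1; scipy.interpolate.griddata is a commonly used replacement. Below is a minimal sketch of that route on made-up data whose rows are offset from one another. The data and grid sizes are assumptions chosen for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Made-up data: rows of points, each row shifted a little relative to the one below.
cols, rows = np.meshgrid(np.arange(20), np.arange(15))
x = (cols + 0.1 * rows).ravel()
y = rows.ravel().astype(float)
z = 2.0 * x + 3.0 * y  # a plane, so linear interpolation reproduces it

# Regular target grid, kept inside the convex hull of the data.
xi = np.linspace(2.0, 18.0, 50)
yi = np.linspace(0.5, 13.5, 40)
XI, YI = np.meshgrid(xi, yi)

# rescale=True normalises x and y to the unit square before triangulating,
# which can help when the two axes have very different ranges.
ZI = griddata((x, y), z, (XI, YI), method="linear", rescale=True)
# ZI can then be passed to plt.contourf(XI, YI, ZI, n).
```

Target points outside the convex hull of the data come back as NaN with method="linear", which is why the target grid here is kept slightly inside the data extent.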
Found the answer: needed to rescale my data.

Solving for zeroes in interpolated data in numpy/matplotlib

I have some data over a 2D range that I am interested in analyzing. These data were originally in lists x,y, and z where z[i] was the value for the point located at (x[i],y[i]). I then interpolated this data onto a regular grid using
x=np.array(x)
y=np.array(y)
z=np.array(z)
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
zi=griddata(x,y,z,xi,yi)
I then plotted the xi,yi,zi data using
plt.contour(xi,yi,zi)
plt.pcolormesh(xi,yi,zi,cmap=plt.get_cmap('PRGn'),norm=plt.Normalize(-10,10))
This produced this plot:
In this plot you can see the S-like curve where the values are equal to zero (aside: the data doesn't vary as rapidly as shown in the colorbar -- that's simply a result of me normalizing the data to -10-10 when it actually extends far beyond that range; I did this to make the zero-valued region show up better -- maybe there's a better way of doing this too...).
The scattered dots are simply the points at which I have original data (yes, in this case my data was already on a regular grid). What I'm curious about is whether there is a good way for me to extract the values for which the curve is zero and obtain x,y pairs that, if plotted as a line, would trace that zero-region in the colormesh. I could interpolate to a really fine grid and then just brute force search for the values which are closest to zero. But is there a more automatic way of doing this, or a more automatic way of plotting this "zero-line"?
And a secondary question: I am using griddata correctly, right? I have these simple 1D arrays although elsewhere people use various meshgrids, loading texts, etc., before calling griddata.
Here is a full example:
import numpy as np
import matplotlib.pyplot as plt
y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = (x**2 + y**2)**4 - (x**2 - y**2)**2
plt.figure(figsize=(9,4))
plt.subplot(121)
extent = [np.min(x), np.max(x), np.min(y), np.max(y)]
cs = plt.contour(f, extent=extent, levels=[0.1],
                 colors=["b", "r"], linestyles=["solid", "dashed"], linewidths=[2, 2])
plt.subplot(122)
# get the points on the lines
for c in cs.collections:
    data = c.get_paths()[0].vertices
    plt.plot(data[:,0], data[:,1],
             color=c.get_color()[0], linewidth=c.get_linewidth()[0])
plt.show()
here is the output:
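The collections attribute of a contour set has been deprecated in recent Matplotlib releases, so a variant that reads the vertices from allsegs may be more portable. This is a minimal sketch on a made-up field whose zero level is the unit circle. It is not the original answer's code, and the Agg backend is chosen only so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up field whose zero level is the unit circle.
y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = x**2 + y**2 - 1.0

fig, ax = plt.subplots()
cs = ax.contour(x.ravel(), y.ravel(), f, levels=[0.0])

# allsegs[k] holds the polylines of level k, each as an (N, 2) array of x, y.
zero_lines = cs.allsegs[0]
longest = max(zero_lines, key=len)
radius = np.hypot(longest[:, 0], longest[:, 1])
plt.close(fig)
```

Each array in zero_lines is a list of x, y pairs that can be plotted directly as a line or used for further analysis.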

Multivariate spline interpolation in python/scipy?

Is there a library module or other straightforward way to implement multivariate spline interpolation in python?
Specifically, I have a set of scalar data on a regularly-spaced three-dimensional grid which I need to interpolate at a small number of points scattered throughout the domain. For two dimensions, I have been using scipy.interpolate.RectBivariateSpline, and I'm essentially looking for an extension of that to three-dimensional data.
The N-dimensional interpolation routines I have found are not quite good enough: I would prefer splines over LinearNDInterpolator for smoothness, and I have far too many data points (often over one million) for, e.g., a radial basis function to work.
If anyone knows of a python library that can do this, or perhaps one in another language that I could call or port, I'd really appreciate it.
If I'm understanding your question correctly, your input "observation" data is regularly gridded?
If so, scipy.ndimage.map_coordinates does exactly what you want.
It's a bit hard to understand at first pass, but essentially, you just feed it a sequence of coordinates that you want to interpolate the values of the grid at in pixel/voxel/n-dimensional-index coordinates.
As a 2D example:
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
# Note that the output interpolated coords will be the same dtype as your input
# data. If we have an array of ints, and we want floating point precision in
# the output interpolated points, we need to cast the array as floats
data = np.arange(40).reshape((8,5)).astype(float)
# I'm writing these as row, column pairs for clarity...
coords = np.array([[1.2, 3.5], [6.7, 2.5], [7.9, 3.5], [3.5, 3.5]])
# However, map_coordinates expects the transpose of this
coords = coords.T
# The "mode" kwarg here just controls how the boundaries are treated
# mode='nearest' is _not_ nearest neighbor interpolation, it just uses the
# value of the nearest cell if the point lies outside the grid. The default is
# to treat the values outside the grid as zero, which can cause some edge
# effects if you're interpolating points near the edge
# The "order" kwarg controls the order of the splines used. The default is
# cubic splines, order=3
zi = ndimage.map_coordinates(data, coords, order=3, mode='nearest')
row, column = coords
nrows, ncols = data.shape
im = plt.imshow(data, interpolation='nearest', extent=[0, ncols, nrows, 0])
plt.colorbar(im)
plt.scatter(column, row, c=zi, vmin=data.min(), vmax=data.max())
for r, c, z in zip(row, column, zi):
    plt.annotate('%0.3f' % z, (c,r), xytext=(-10,10), textcoords='offset points',
                 arrowprops=dict(arrowstyle='->'), ha='right')
plt.show()
To do this in n-dimensions, we just need to pass in the appropriate sized arrays:
import numpy as np
from scipy import ndimage
data = np.arange(3*5*9).reshape((3,5,9)).astype(float)
coords = np.array([[1.2, 3.5, 7.8], [0.5, 0.5, 6.8]])
zi = ndimage.map_coordinates(data, coords.T)
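Since map_coordinates works in index coordinates, data on a regularly spaced grid in physical units needs a conversion step first. Here is a minimal sketch of that conversion, assuming the grid is fully described by an origin and a constant spacing per axis. The helper name interp_regular_grid and the test field are hypothetical.

```python
import numpy as np
from scipy import ndimage

def interp_regular_grid(data, origin, spacing, points, order=3):
    """Spline-interpolate gridded data at physical coordinates.

    data    : N-d array sampled at origin + index * spacing along each axis
    origin  : (N,) physical coordinate of data[0, 0, ...]
    spacing : (N,) grid step along each axis
    points  : (M, N) physical coordinates to interpolate at
    """
    points = np.asarray(points, dtype=float)
    # Physical coordinates -> fractional array indices.
    idx = (points - np.asarray(origin)) / np.asarray(spacing)
    # map_coordinates wants shape (N, M): one row per array axis.
    return ndimage.map_coordinates(data, idx.T, order=order, mode="nearest")

# Made-up 3D grid: x in [0, 1], y in [10, 20], z in [-5, 5].
xs = np.linspace(0.0, 1.0, 11)
ys = np.linspace(10.0, 20.0, 21)
zs = np.linspace(-5.0, 5.0, 41)
X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
data = X + 2.0 * Y - Z  # a linear field, easy to check by hand

origin = np.array([xs[0], ys[0], zs[0]])
spacing = np.array([xs[1] - xs[0], ys[1] - ys[0], zs[1] - zs[0]])
pts = np.array([[0.33, 12.5, 1.2], [0.87, 18.25, -3.3]])
linear = interp_regular_grid(data, origin, spacing, pts, order=1)
cubic = interp_regular_grid(data, origin, spacing, pts, order=3)
```

With order=1 the linear test field is reproduced up to rounding; the cubic result may differ slightly near the edges of the grid because of how the boundary is handled.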
As far as scaling and memory usage go, map_coordinates will create a filtered copy of the array if you're using an order > 1 (i.e. not linear interpolation). If you just want to interpolate at a very small number of points, this is a rather large overhead. It doesn't increase with the number of points you want to interpolate at, however. As long as you have enough RAM for a single temporary copy of your input data array, you'll be fine.
If you can't store a copy of your data in memory, you can either a) specify prefilter=False and order=1 and use linear interpolation, or b) replace your original data with a filtered version using ndimage.spline_filter, and then call map_coordinates with prefilter=False.
Even if you have enough ram, keeping the filtered dataset around can be a big speedup if you need to call map_coordinates multiple times (e.g. interactive use, etc).
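Keeping the filtered array around could look roughly like this. It is a sketch on made-up random data, assuming mode="mirror" is used for the lookups so that the boundary handling matches the default of spline_filter; with other modes the two paths may differ near the edges, depending on the SciPy version.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
data = rng.normal(size=(30, 40, 50))          # made-up gridded data
coords = rng.uniform(5, 25, size=(3, 1000))   # index coordinates, shape (ndim, npoints)

# One-off call: map_coordinates prefilters internally on every call.
direct = ndimage.map_coordinates(data, coords, order=3, mode="mirror")

# Prefilter once, then reuse the filtered array for any number of calls.
filtered = ndimage.spline_filter(data, order=3)
reused = ndimage.map_coordinates(filtered, coords, order=3, mode="mirror",
                                 prefilter=False)
```

The spline order passed to spline_filter has to be the same as the one used in the later map_coordinates calls.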
Smooth spline interpolation in dim > 2 is difficult to implement, and so there are not many freely available libraries able to do that (in fact, I don't know any).
You can try inverse distance weighted interpolation, see: Inverse Distance Weighted (IDW) Interpolation with Python.
This should produce reasonably smooth results, and scale better than RBF to larger data sets.
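A k-nearest-neighbour variant of inverse distance weighting can be sketched with scipy.spatial.cKDTree. This is one possible formulation, not the code from the linked question. The function name idw_interpolate, the defaults k=8 and power=2, and the sample data are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_interpolate(points, values, queries, k=8, power=2.0):
    """Inverse distance weighted interpolation using the k nearest neighbours.

    points  : (N, D) sample locations
    values  : (N,) sample values
    queries : (M, D) locations to interpolate at
    Assumes k >= 2 so that the query results are 2-D arrays.
    """
    tree = cKDTree(points)
    dist, idx = tree.query(queries, k=k)        # both shaped (M, k)
    # Guard against division by zero when a query hits a sample exactly.
    dist = np.maximum(dist, 1e-12)
    weights = 1.0 / dist**power
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(weights * values[idx], axis=1)

# Made-up scattered 3D samples of a smooth field.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(5000, 3))
vals = np.sin(pts[:, 0]) + pts[:, 1] * pts[:, 2]

# Query at three of the samples themselves and at the centre of the domain.
queries = np.vstack([pts[:3], [[0.5, 0.5, 0.5]]])
est = idw_interpolate(pts, vals, queries)
flat = idw_interpolate(pts, np.ones(len(pts)), queries)
```

Because only the k nearest samples enter each estimate, the cost per query stays roughly constant as the data set grows, which is the scaling advantage over a global RBF fit.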
