I've gone through dozens of answers about heatmaps on this forum, but I am still running into problems, so I thought I'd ask myself. Bear in mind that until a month ago I had no idea what Python was.
So, I have a large file of data in three columns. The first two are standard x-y coordinates. For each point, there is a third variable, z, that I want to use as weighting to build some sort of heatmap.
I have seen several methods, e.g. using meshgrid or changing the array size, but I think the problem is that my array is not regular or rectangular. It's just a mess of random points in the x-y plane, not evenly spaced, each with a z value.
Here is just a small snippet of the data I have in my spreadsheet:
x y z
392 616 0.5
416 614 1
497 603 3
533 598 3.5
383 589 0.5
574 574 4
...
I tried several methods, e.g. reshaping the arrays, but I always get some kind of error. How can I plot this data as a heatmap with the weighting of each point given by z? Thank you.
I'm aware that, since the data points are not regularly spaced out, there might be gaps where the heatmap would be zero, but I can sort those out later by extrapolating their weighting via a method I figured out, so that wouldn't be a problem.
The closest I got to getting the graph I'm looking for is using this code:
plt.hist2d(x, y, bins=8, weights=z, cmap="Greys")
plt.colorbar()
However, the problem with this is that, if there is more than one point in a given "bin", it computes the "aggregate" weighting -- e.g. if in a particular bin there are two data points with weightings of 1 and 2.5, respectively, the bin will be coloured as if its weighting was 1+2.5=3.5. Is there any way I can get it to display the colour corresponding to the weighting of the data point closest to the bin center?
e.g. if the data point with weighting 2.5 was really close to the bin center while the one with weighting 1 was along one of the bin's edges, is there a way I can get the bin to have weighting 2.5?
Thank you and sorry for disturbing.
Have a look at http://scipy-cookbook.readthedocs.io/items/Matplotlib_Gridding_irregularly_spaced_data.html
The idea is to use griddata from scipy.interpolate to get your irregularly spaced data on a regularly spaced grid.
If I assume you have your data x, y, z in numpy arrays, you can modify the example given in the docs:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# define a regular grid spanning the data
xi = np.linspace(np.amin(x), np.amax(x), 100)
yi = np.linspace(np.amin(y), np.amax(y), 100)
# interpolate the scattered data onto the grid
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='cubic')
# contour the gridded data
CS = plt.contour(xi, yi, zi, 15, linewidths=0.5, colors='k')
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.jet)
plt.colorbar()  # draw colorbar
plt.show()
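If, as in the question, you want each grid cell to show the value of the data point nearest to it rather than an aggregate, the same call with method='nearest' gets close to that behaviour. A minimal sketch, assuming x, y and z are 1D numpy arrays:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# each grid cell takes the value of the nearest data point (no summing of weights)
xi = np.linspace(np.amin(x), np.amax(x), 100)
yi = np.linspace(np.amin(y), np.amax(y), 100)
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='nearest')

plt.pcolormesh(xi, yi, zi, cmap="Greys", shading="auto")
plt.colorbar()
plt.show()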
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say, temperature T(y,x), for some range of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ..., inverse-distance weighting is smooth and local, and the number of nearest neighbours k can be varied to trade off speed against accuracy.
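A minimal sketch of that idea, using scipy.spatial.cKDTree for the neighbour search (x, y, z are assumed to be 1D arrays of the scattered samples; the grid size, k and power are illustrative choices):

import numpy as np
from scipy.spatial import cKDTree

def idw_grid(x, y, z, nx=200, ny=200, k=8, power=2.0):
    """Inverse-distance-weighted interpolation of scattered (x, y, z) onto a regular grid."""
    x, y, z = map(np.asarray, (x, y, z))
    tree = cKDTree(np.column_stack([x, y]))
    xi = np.linspace(x.min(), x.max(), nx)
    yi = np.linspace(y.min(), y.max(), ny)
    XI, YI = np.meshgrid(xi, yi)
    # distances and indices of the k nearest samples for every grid node
    dist, idx = tree.query(np.column_stack([XI.ravel(), YI.ravel()]), k=k)
    weights = 1.0 / np.maximum(dist, 1e-12) ** power  # clamp avoids division by zero
    zi = np.sum(weights * z[idx], axis=1) / np.sum(weights, axis=1)
    return xi, yi, zi.reshape(ny, nx)

Grid nodes that coincide with a sample get an enormous weight from the clamp, so they effectively reproduce the sample value there.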
There is a nice inverse distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This interpolates to a regular grid, of course, but assuming you first project the data to a pixel grid with pyproj or something similar (being careful about which projection your data uses), it should work.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i])*(x - xv[i]) + (y - yv[i])*(y - yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points, return the
        # data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20

    # Create some data, with the coordinates and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]

    # Create the output grid (100x100 in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)

    # Run the interpolation and populate the output matrix
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)

    # Plot the result (plt.Normalize replaces the old plt.normalize; unused here, as in the original)
    norm = plt.Normalize(0.0, 100.0)
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI, shading='auto')
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There's a bunch of options here; which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data comes from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of the cells and interpolate inside it.
Finally, there's a heap of unstructured interpolation options, but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
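As a rough sketch of the triangulation route, scipy.interpolate.LinearNDInterpolator builds a Delaunay triangulation of the scattered points and interpolates linearly inside it (here lat, lon, temp are assumed to be flattened 1D arrays of your source grid, and lat1, lon1 the target points):

import numpy as np
from scipy.interpolate import LinearNDInterpolator

# lat, lon, temp: 1D arrays of the source samples (e.g. LAT.ravel(), LON.ravel(), T.ravel())
# lat1, lon1:     1D arrays of the ~10,000 target points
interp = LinearNDInterpolator(np.column_stack([lon, lat]), temp)
temp1 = interp(np.column_stack([lon1, lat1]))  # NaN outside the convex hull of the samples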
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest taking a look at the interpolation features of GRASS (an open source GIS package): http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html. It's not in python but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(Image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
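A rough, untested sketch of the f2py route (the file name bivar.f90 is an assumption, and the wrapped routine names depend on which BIVAR source you download):

# Hypothetical build step, run once in a shell:
#   f2py -c bivar.f90 -m bivar
# This compiles the FORTRAN source into a Python extension module named `bivar`.
import bivar

# The wrapped subroutines and their signatures depend on the source file,
# so inspect what f2py generated before calling anything.
print(bivar.__doc__)
print(dir(bivar))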
I have x,y,z data that define a surface (x and y position, z height).
The data is imperfect, in that it contains some noise, i.e. not every point lies precisely on the plane I wish to model, just very close to it.
I only have data within a triangular region, not the full x,y plane.
Here is an example with z represented by colour:
In this example the data has been sampled in the centres of triangles on a mesh like this (each blue dot is a sample):
If it is necessary, the samples could be evenly spaced on an x,y grid, though a solution where this is not required is preferable.
I want to represent this data as a sum of sines and cosines in order to manipulate it mathematically. Ideally using as few terms as are needed to keep the error of the fit acceptably low.
If this were a square region I would take the 2D Fourier transform and discard higher frequency terms.
However I think this situation has two key differences that make this approach not viable:
1. Ideally I want to use samples at the points indicated by the blue dots in my grid above. I could instead use a regular x,y grid if there is no alternative, but this is not an ideal solution.
2. I do not have data for the whole x,y plane. The white areas in the first image above do not contain data that should be considered in the fit.
So in summary my question is thus:
Is there a way to extract coefficients for a best-fit of this data using a linear combination of sines and cosines?
Ideally using python.
My apologies if this is more of a mathematics question and Stack Overflow is not the correct place to post this!
EDIT: Here is an example dataset in python style [x,y,z] form - sorry it's huge but apparently I can't use pastebin?:
[[1.7500000000000001e-08, 1.0103629710818452e-08, 14939.866751020554],
[1.7500000000000001e-08, 2.0207259421636904e-08, 3563.2218207404617],
[8.7500000000000006e-09, 5.0518148554092277e-09, 24529.964593228644],
[2.625e-08, 5.0518148554092261e-09, 24529.961688158553],
[1.7500000000000001e-08, 5.0518148554092261e-09, 21956.74682671843],
[2.1874999999999999e-08, 1.2629537138523066e-08, 10818.190869824304],
[1.3125000000000003e-08, 1.2629537138523066e-08, 10818.186813746233],
[1.7500000000000001e-08, 2.5259074277046132e-08, 3008.9480862705223],
[1.3125e-08, 1.7681351993932294e-08, 5630.9978116591838],
[2.1874999999999999e-08, 1.768135199393229e-08, 5630.9969846863969],
[8.7500000000000006e-09, 1.0103629710818454e-08, 13498.380006002562],
[4.3750000000000003e-09, 2.5259074277046151e-09, 40376.866196753763],
[1.3125e-08, 2.5259074277046143e-09, 26503.432370909999],
[2.625e-08, 1.0103629710818452e-08, 13498.379635232159],
[2.1874999999999999e-08, 2.5259074277046139e-09, 26503.430698738041],
[3.0625000000000005e-08, 2.525907427704613e-09, 40376.867011915041],
[8.7500000000000006e-09, 1.2629537138523066e-08, 11900.832515759088],
[6.5625e-09, 8.8406759969661469e-09, 17422.002946526718],
[1.09375e-08, 8.8406759969661469e-09, 17275.788904632376],
[4.3750000000000003e-09, 5.0518148554092285e-09, 30222.756636780832],
[2.1875000000000001e-09, 1.2629537138523088e-09, 64247.241146490327],
[6.5625e-09, 1.2629537138523084e-09, 35176.652106572205],
[1.3125e-08, 5.0518148554092277e-09, 22623.574247287044],
[1.09375e-08, 1.2629537138523082e-09, 27617.700396641056],
[1.5312500000000002e-08, 1.2629537138523078e-09, 25316.907231576402],
[2.625e-08, 1.2629537138523066e-08, 11900.834523905782],
[2.4062500000000001e-08, 8.8406759969661469e-09, 17275.796410700641],
[2.8437500000000002e-08, 8.8406759969661452e-09, 17422.004617294893],
[2.1874999999999999e-08, 5.0518148554092269e-09, 22623.570035270699],
[1.96875e-08, 1.2629537138523076e-09, 25316.9042194055],
[2.4062500000000001e-08, 1.2629537138523071e-09, 27617.700160860692],
[3.0625000000000005e-08, 5.0518148554092261e-09, 30222.765972585737],
[2.8437500000000002e-08, 1.2629537138523069e-09, 35176.65151453446],
[3.2812500000000003e-08, 1.2629537138523065e-09, 64247.246775422129],
[2.1875000000000001e-09, 2.5259074277046151e-09, 46711.23463223876],
[1.0937500000000001e-09, 6.3147685692615553e-10, 101789.89315354674],
[3.28125e-09, 6.3147685692615543e-10, 52869.788364220134],
[3.2812500000000003e-08, 2.525907427704613e-09, 46711.229428833962],
[3.1718750000000001e-08, 6.3147685692615347e-10, 52869.79233902022],
[3.3906250000000006e-08, 6.3147685692615326e-10, 101789.92509671643],
[1.0937500000000001e-09, 1.2629537138523088e-09, 82527.848790063814],
[5.4687500000000004e-10, 3.1573842846307901e-10, 137060.87432327325],
[1.640625e-09, 3.157384284630789e-10, 71884.380087542726],
[3.3906250000000006e-08, 1.2629537138523065e-09, 82527.861035177877],
[3.3359375000000005e-08, 3.1573842846307673e-10, 71884.398689011548],
[3.4453125000000001e-08, 3.1573842846307663e-10, 137060.96214950032],
[4.3750000000000003e-09, 6.3147685692615347e-09, 18611.868317256733],
[3.28125e-09, 4.4203379984830751e-09, 27005.961455364879],
[5.4687499999999998e-09, 4.4203379984830751e-09, 28655.126635802204],
[3.0625000000000005e-08, 6.314768569261533e-09, 18611.869287539808],
[2.9531250000000002e-08, 4.4203379984830734e-09, 28655.119850641502],
[3.1718750000000001e-08, 4.4203379984830726e-09, 27005.959731047784]]
Nothing stops you from doing normal linear least squares with whatever basis you like. (You'll have to work out the periodicity you want, as mikuszefski said.) The lack of samples outside the triangle will naturally blind the method to the function's behavior out there. You probably want to weight the samples according to the area of their mesh cell, to avoid overfitting the corners.
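A minimal sketch of that idea, fitting a small set of sine/cosine modes by ordinary least squares (x, y, z are 1D arrays built from the list above; the periods Lx, Ly, the number of modes K, and the uniform weights are all assumptions you would tune, e.g. by swapping in per-cell areas for w):

import numpy as np

# x, y, z: 1D arrays built from the [[x, y, z], ...] list in the question
Lx, Ly = x.max() - x.min(), y.max() - y.min()  # assumed periods of the fit

# Design matrix: one column per sine/cosine mode, low frequencies only.
K = 3                                          # modes per axis, kept small
cols = [np.ones_like(x)]                       # constant term
for kx in range(K):
    for ky in range(K):
        if kx == ky == 0:
            continue
        phase = 2 * np.pi * (kx * x / Lx + ky * y / Ly)
        cols.append(np.cos(phase))
        cols.append(np.sin(phase))
A = np.column_stack(cols)

# Per-sample weights, e.g. the area of each mesh cell (uniform here).
w = np.ones_like(z)
coeffs, *_ = np.linalg.lstsq(A * w[:, None], z * w, rcond=None)
z_fit = A @ coeffs                             # fitted values at the sample points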
Here is some code that might help to fit periodic spikes. It also shows the use of the basis x, x/2 + sqrt(3)/2 * y. The flat part can then be handled by a low-order Fourier series. I hope that gives an idea. (BTW, I agree with Davis Herring that area weighting is a good idea.) For the fit, I guess, good initial guesses are crucial.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2. * s**2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is deprecated in newer matplotlib
X = np.arange(-5, 5, 0.15)
Y = np.arange(-5, 5, 0.15)
X, Y = np.meshgrid(X, Y)
# hexagonal-style basis: x and x/2 + sqrt(3)/2 * y
kX = np.sin(X)
kY = np.sin(0.5 * X + 0.5 * np.sqrt(3.) * Y)
R = np.sqrt(kX**2 + kY**2)
Z = gauss(R, .4)
#~ surf = ax.plot_wireframe(X, Y, Z, linewidth=1)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, linewidth=0, antialiased=False)
plt.show()
Output:
My data is regularly spaced, but not quite a grid - each row of points is slightly offset from the one below.
The data is in the form of 3 1D arrays, x, y, z, with each index corresponding to a point. It is smoothly varying data - approximately Gaussian.
The point density is quite high. What is the best way to plot this data?
I tried meshgrid, but it gives me some bad contours through regions that have no data points near the contour's value.
I have tried rbf interpolation according to this post:
Python : 2d contour plot from 3 lists : x, y and rho?
but this just gives me nonsense - all the contours are on one edge; it does not reflect the data at all.
Any other ideas for what I can try? Maybe I should be using some sort of nearest-neighbour interpolation? Here is a picture of about a quarter of my data points: http://imgur.com/a/b00R6
I'm surprised it is causing me such difficulty - it seems like it should be fairly easy to plot.
The easiest way to plot ungridded data is probably tricontour or tricontourf (a filled tricontour plot).
Given 1D arrays x, y and z of the coordinates and values, you'd simply call
plt.tricontourf(x,y,z, n, ...)
to obtain n levels of contours.
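For example, a minimal self-contained sketch (with random data standing in for your arrays):

import numpy as np
import matplotlib.pyplot as plt

# x, y, z are 1D arrays of equal length; random data is used here for illustration
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = rng.uniform(0, 10, 500)
z = np.exp(-((x - 5)**2 + (y - 5)**2) / 4)   # smooth, roughly Gaussian values

plt.tricontourf(x, y, z, 14, cmap="viridis") # 14 filled contour levels
plt.colorbar()
plt.show()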
The other quick method is to interpolate on a grid using matplotlib.mlab.griddata to obtain a regular grid from the irregular points.
Both methods are compared in an example on the matplotlib page:
Tricontour vs. griddata
Found the answer: needed to rescale my data.
I have a relatively large (spatially) irregular grid. It consists of multiple curvilinear grids (see example) of which I have all the x, y coordinates and the z-value (height) of the individual cells.
These cells can be in the order of 50m x 50m.
I would like to interpolate this to a 1m x 1m grid, how would one do such a thing efficiently in python?
If possible I would like to not create a grid which also spans the regions without grid cells since I think it is likely that the interpolated dataset will be quite large.
EDIT: I tried to do 2D interpolation in python but can't get it to work.
I found this script: http://scipy-cookbook.readthedocs.io/items/Matplotlib_Gridding_irregularly_spaced_data.html
From this example I tried to do the same but get only nan values. The script:
XZ2 = XZ[XZ != 0.].ravel()
YZ2 = YZ[YZ != 0.].ravel()
dps2 = dps[XZ != 0.].ravel()
# define grid.
xi = np.arange(math.floor(min(XZ2)), math.ceil(max(XZ2))+1, 1)
yi = np.arange(math.floor(min(YZ2)), math.ceil(max(YZ2))+1, 1)
# grid the data.
zi = griddata((XZ2, YZ2), dps, (xi[None,:], yi[:,None]), method='linear')
Any idea why?
I didn't do it in QGIS because I feel I should be able to do this in python.
Thanks
The script above does work; it just creates lots of nan values, giving the impression that the result is incorrect.
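Those nan values are expected with method='linear': griddata leaves every grid node that falls outside the convex hull of the input points (or in gaps between them) as nan. A sketch of two common work-arounds, reusing xi, yi, zi, XZ2, YZ2 and the filtered values dps2 from the script above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# Option 1: mask the nans so they are simply not drawn
zi_masked = np.ma.masked_invalid(zi)
plt.pcolormesh(xi, yi, zi_masked, shading='auto')
plt.colorbar()
plt.show()

# Option 2: fill the nans with nearest-neighbour values
zi_near = griddata((XZ2, YZ2), dps2, (xi[None, :], yi[:, None]), method='nearest')
zi_filled = np.where(np.isnan(zi), zi_near, zi)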
I have some data over a 2D range that I am interested in analyzing. These data were originally in lists x,y, and z where z[i] was the value for the point located at (x[i],y[i]). I then interpolated this data onto a regular grid using
x=np.array(x)
y=np.array(y)
z=np.array(z)
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
zi=griddata(x,y,z,xi,yi)
I then plotted the xi,yi,zi data using
plt.contour(xi,yi,zi)
plt.pcolormesh(xi,yi,zi,cmap=plt.get_cmap('PRGn'),norm=plt.Normalize(-10,10),vmin=-10,vmax=10)
This produced this plot:
In this plot you can see the S-like curve where the values are equal to zero (aside: the data doesn't vary as rapidly as shown in the colorbar -- that's simply a result of me normalizing the data to the range -10 to 10 when it actually extends far beyond that range; I did this to make the zero-valued region show up better -- maybe there's a better way of doing this too...).
The scattered dots are simply the points at which I have original data (yes, in this case my data was already on a regular grid). What I'm curious about is whether there is a good way for me to extract the values for which the curve is zero and obtain x,y pairs that, if plotted as a line, would trace that zero-region in the colormesh. I could interpolate to a really fine grid and then just brute force search for the values which are closest to zero. But is there a more automatic way of doing this, or a more automatic way of plotting this "zero-line"?
And a secondary question: I am using griddata correctly, right? I have these simple 1D arrays although elsewhere people use various meshgrids, loading texts, etc., before calling griddata.
Here is a full example:
import numpy as np
import matplotlib.pyplot as plt

y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = (x**2 + y**2)**4 - (x**2 - y**2)**2

plt.figure(figsize=(9, 4))
plt.subplot(121)
extent = [np.min(x), np.max(x), np.min(y), np.max(y)]
cs = plt.contour(f, extent=extent, levels=[0.1],
                 colors=["b", "r"], linestyles=["solid", "dashed"], linewidths=[2, 2])

plt.subplot(122)
# get the points on the contour lines
for c in cs.collections:
    data = c.get_paths()[0].vertices
    plt.plot(data[:, 0], data[:, 1],
             color=c.get_color()[0], linewidth=c.get_linewidth()[0])

plt.show()
Here is the output: