Bin counting with hex-like bins in 2D - python

I use numpy's histogram2d
to count how many (training) data points lie in each bin. For a new point (x,y) I can then query how many points are in the same bin as (x,y).
Is there something similar for "hex" bins like in the matplotlib plots,
where I can fill the bins and then later query how many points are in each bin?

You can get the bin data, but it's not as simple as doing the same operation on a rectangular grid, because hex bins do not lend themselves to straightforward two-dimensional indexing. The function hexbin() returns a PolyCollection, which has the bin locations accessible through get_offsets() and the bin values accessible through get_array(). So:
import matplotlib.pyplot as plt
hb = plt.hexbin(...)
bin_xy = hb.get_offsets()
bin_counts = hb.get_array()
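If you then need to know which hex bin a new point (x, y) falls in, one approach (my own sketch, not part of the original answer) is to look up the nearest bin centre with a KD-tree. This is only exact if the hexagons are close to regular in data units, so rescale x and y first if their ranges differ a lot:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree

x = np.random.randn(10000)
y = np.random.randn(10000)

hb = plt.hexbin(x, y, gridsize=30)
bin_xy = hb.get_offsets()    # (n_bins, 2) array of bin centres
bin_counts = hb.get_array()  # count of points in each bin

tree = cKDTree(bin_xy)
_, idx = tree.query([(0.5, -0.2)])  # nearest bin centre to the query point
print(bin_counts[idx[0]])           # points in (approximately) the same bin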

Related

2D interpolate list of many points [duplicate]

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say, temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in SO inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ..., inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
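A minimal sketch of that combination, assuming the question's LAT/LON/T grids and lon1/lat1 query points (the helper name idw_knn and the parameter choices are mine, not from the linked answer):
import numpy as np
from scipy.spatial import cKDTree

def idw_knn(xy_known, z_known, xy_query, k=8, power=2, eps=1e-12):
    # Inverse-distance weighting restricted to the k nearest neighbours.
    tree = cKDTree(xy_known)
    dist, idx = tree.query(xy_query, k=k)          # shapes (n_query, k)
    weights = 1.0 / (dist + eps) ** power          # eps avoids division by zero
    weights /= weights.sum(axis=1, keepdims=True)  # normalise per query point
    return (weights * z_known[idx]).sum(axis=1)

# e.g. T_new = idw_knn(np.column_stack([LON.ravel(), LAT.ravel()]),
#                      T.ravel(),
#                      np.column_stack([lon1, lat1]))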
There is a nice inverse distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This is of course to a regular grid, and it assumes you first project the data to a pixel grid with pyproj or something similar, all the while being careful which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i]) * (x - xv[i]) + (y - yv[i]) * (y - yv[i]) + smoothing * smoothing)
        # If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    n = plt.Normalize(0.0, 100.0)  # plt.normalize no longer exists in newer matplotlib
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There's a bunch of options here; which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data comes from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2: instead of interpolating in LAT, LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cell that maps from tripolar space to cover your sample point (you can use a BSP or grid-type structure to speed up this search), pick one of the cells, and interpolate inside it.
Finally, there's a heap of unstructured interpolation options, but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh, as in the sketch below.
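A minimal sketch of the Delaunay option, assuming scipy and the question's LAT/LON/T arrays and lon1/lat1 query points (scipy.interpolate.LinearNDInterpolator builds the triangulation internally):
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Scattered source points in (lon, lat) and their temperatures
pts = np.column_stack([LON.ravel(), LAT.ravel()])
interp = LinearNDInterpolator(pts, T.ravel())

# Evaluate at the ~10,000 query locations
T_new = interp(lon1, lat1)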
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest you take a look at the interpolation features of GRASS (an open source GIS package) (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(Image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in Python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

Using python to plot a heat map from five arrays: x,y and 3 arrays indicating RGB

I have 2 arrays, x and y, respectively representing each point's coordinates on a 2D plane. I also have another 3 arrays of the same length as x and y. These three arrays represent the RGB values of a color. Therefore, each point in x,y corresponds to a color indicated by the RGB arrays. In Python, how can I plot a heat map with x,y as its axes and colors from the three RGB arrays? Each array is, say, 1000 in length.
As an example that takes the first 10 points, I have:
x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684, 33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001, 17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0,1,2,3,4,5,6,7,8,9]
G = [2,4,6,8,10,12,14,16,18,20]
B = [0,255,0,255,0,255,0,255,0,255]
I'd like to draw a heat map where, for example, the first point would have the coordinates (10.946028, -21.003803) and a color of R=0, G=2, B=0. The second point would have the coordinates (16.229064, 4.5) and a color of R=1, G=4, B=255.
OK, it seems like you want your own colormap for your heatmap. You can actually write your own, or just use one of matplotlib's templates. Check out this post for the use of heatmaps with matplotlib. If you want to do it on your own, the easiest way is to recombine the 5 one-dimensional vectors into a 3D RGB image. Afterwards you have to define a mapping function which combines the R, G and B values into a new single value for every pixel, like:
f(R,G,B) = a*R + b*G + c*B
a, b, c can be whatever you like; the formula can actually be way more complex, but you have to decide how the values should be weighted. From that you get a 2D matrix filled with values of your function f(R,G,B). Now you have to define which value of this new matrix gets what color. This can be a linear mapping by hand (like just writing a list: 0 = deep blue, 1 = light red, ...). Using this look-up table you can then get your own specific heatmap. But as you may see, that path takes some time, so I would recommend not doing it and just using one of the various templates of matplotlib. Example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
You can use various colormaps by changing the string after cmap='hot' to something from that list. Hope I could help you, gl hf.
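For completeness, a minimal sketch (my own, not part of the answer) of the f(R,G,B) mapping idea applied directly to scattered x, y points; the weights a, b, c below are arbitrary placeholders:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([10.946028, 16.229064, -36.855])
y = np.array([-21.003803, 4.5, 4.5])
R = np.array([0, 1, 2])
G = np.array([2, 4, 6])
B = np.array([0, 255, 0])

a, b, c = 0.3, 0.6, 0.1      # arbitrary weights
f = a * R + b * G + c * B    # one scalar per point
plt.scatter(x, y, c=f, cmap='hot', s=100)
plt.colorbar(label='f(R, G, B)')
plt.show()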

Alternative Voronoi diagram with specified minimum number of samples in each cell

I'm using the scipy.spatial.Voronoi function in Python to bin up my data, where the cell sizes decrease with increased 2D point density.
However, the function places cell boundaries around each point so that each cell contains only one point (which is the point of a Voronoi diagram anyway).
Is there a way to specify a minimum number of samples to lie in each bin, such that you still have smaller cells in over-dense regions but no cell contains only one point?
Link to the function in question: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.spatial.Voronoi.html
Example code:
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d
vor = Voronoi(xy.T)
fig = voronoi_plot_2d(vor, show_vertices=False, line_colors='orange',
                      line_width=2, line_alpha=0.6, point_size=2)
Note: xy can be any stacked x and y coordinate array of points here.
Perhaps there's an alternative algorithm entirely for this task? Any help will be greatly appreciated!

Plotting histogram from numpy array

I need to create histograms from the 2D arrays that I obtain from convolving an input array and a filter. The bins should span the range of the values in the array.
I tried following this example: How does numpy.histogram() work?
The code is this:
import matplotlib.pyplot as plt
import numpy as np
plt.hist(result, bins = (np.min(result), np.max(result),1))
plt.show()
I always get this error message:
AttributeError: bins must increase monotonically.
Thanks for any help.
What you are actually doing is specifying the bin edges: the first edge is np.min(result), the second is np.max(result) and the third is 1. What you need to do is provide where you want the bin edges to be located in the histogram, and these must be in increasing order. My guess is that you want bins spanning np.min(result) to np.max(result). The 1 seems a bit odd, but I'm going to ignore it. Also, you want to plot a 1D histogram of values, yet your input is 2D. If you'd like to plot the distribution of your data over all values in 1D, you'll need to unravel your 2D array when computing the histogram. Use np.ravel for that.
Now, I'd like to refer you to np.linspace. You can specify a minimum and maximum value and as many points as you want in between uniformly spaced. So:
bins = np.linspace(start, stop)
The default number of points in between start and stop is 50, but you can override this:
bins = np.linspace(start, stop, num=100)
This means that we generate 100 points in between start and stop.
As such, try doing this:
import matplotlib.pyplot as plt
import numpy as np
num_bins = 100 # <-- Change here - Specify total number of bins for histogram
plt.hist(result.ravel(), bins=np.linspace(np.min(result), np.max(result), num=num_bins)) #<-- Change here. Note the use of ravel.
plt.show()

Find density of points from their scatter plot in python

Can I divide the scatter plot into equally sized boxes, so I can calculate how many points there are on average in each box?
Or, is there a specific function in python to calculate this?
I don't want a colored density plot, but a number that represents the density of these points in the scatter plot.
Here is for example a plot of the eigenvalues of a random matrix:
How would I find their density?
import numpy as np
from scipy import linalg as la

e = la.eigvals(my_matrix)
hist, xedges, yedges = np.histogram2d(e.real, e.imag, bins=40, density=False)  # 'normed' was removed from newer numpy
So in this case, 'hist' would be a 40x40 array (since bins=40). Its elements are the number of eigenvalues for each bin.
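If you then want the count for a particular point in the complex plane, here is a small sketch (my own addition, with a hypothetical query point) using the returned edges:
# Look up the bin that contains a query point (px, py)
px, py = 0.1, -0.3
ix = np.searchsorted(xedges, px, side='right') - 1  # bin index along the real axis
iy = np.searchsorted(yedges, py, side='right') - 1  # bin index along the imaginary axis
if 0 <= ix < hist.shape[0] and 0 <= iy < hist.shape[1]:
    print(hist[ix, iy])  # number of eigenvalues in that bin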
Thanks to @jepio and @plonser for the comments.
