Summary
I have a 2880x2880 similarity matrix (about 8.3 million cells). My attempt with HoloViews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These arrays were about 28-31 cells long, and each cell held the measurement of the quantity I'm trying to analyze. So there are 24 such arrays for each month of the year, i.e. 24x12 = 288 arrays per place, and with measurements from 10 places that makes a total of 2880 arrays, all compared to each other and saved in a 2880x2880 matrix of similarity coefficients.
I'm trying to turn it into a radial similarity matrix like the one from holoviews, but without the ticks and tags (since the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape and colors and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB, so it never renders when I open it; the page is just blank. I've added a minimal example below of what I have, with the loading of the data file replaced by randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = [[random.random() for i in range(dimension)] for j in range(dimension)]  # Fake, similar data
    # Placeholder labels standing in for the real Place/Month/Hour labels:
    # one (row, column) label pair per cell of the flattened matrix
    row_vals = [str(i // dimension) for i in range(dimension * dimension)]
    col_vals = [str(i % dimension) for i in range(dimension * dimension)]
    val_vals = np.reshape(np.array(dummy_matrix), -1).tolist()  # Turn the matrix into a flat list
    idx_vals = list(range(dimension * dimension))  # Placeholder index
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it into this radial heatmap (if that's even possible for this case), since the radial heatmap doesn't seem to have that datashade feature.
So I have no idea how to tackle this. I would be content with a broad overview too; I don't need the details, the hover infobox, the ability to zoom, or any fancy extra features, just the general overview for a presentation. I'm open to any solution, really.
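As an aside, here is a rough sketch of what datashading a plain (non-radial) HoloViews heatmap might look like, using the rasterize operation so that only a screen-resolution aggregate ends up in the output; whether something like this can be combined with the radial layout is exactly what I don't know:
import numpy as np
import holoviews as hv
from holoviews.operation.datashader import rasterize
from bokeh.plotting import show
hv.extension('bokeh')
dummy_matrix = np.random.rand(2880, 2880)  # stand-in for the similarity matrix
img = hv.Image(dummy_matrix)  # plain rectangular heatmap of the matrix
rasterized = rasterize(img, dynamic=False).opts(cmap="viridis")  # datashader aggregates to screen resolution
show(hv.render(rasterized))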
I recommend using a plain heatmap instead of a radial heatmap for showing the similarity matrix. The reasons are:
The radial heatmap is designed for periodic variables. The time variable (the 288 hourly arrays) can be considered periodic, but I think the 288*10 combination (288 arrays, 10 places) is no longer periodic because of the "place" dimension.
Near the center of the radial heatmap, the colored points will be too dense to be read.
The following is some simple code to show a plain heatmap.
import matplotlib.cm
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = matplotlib.cm.get_cmap("Blues") # you may use another built-in colormap or define your own
# if your data is not in the range [0, 1], use a normalization; here it is normalized by the min and max values
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea that comes to me is that perhaps the computation of the similarity matrix is unnecessary, and you can plot the original 288 * 10 data using a radial heatmap or just a normal heatmap, and read the similarity of the data directly from the color distribution.
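A minimal sketch of that idea, assuming each of the 288 x 10 slots (288 hourly arrays, 10 places) can be summarized by a single value, e.g. the mean over its 28-31 cells:
import numpy as np
import matplotlib.pyplot as plt
raw = np.random.rand(288, 10)  # stand-in for the summarized 288 hourly arrays x 10 places
plt.imshow(raw, aspect='auto', cmap='viridis')
plt.colorbar(label='measurement')
plt.xlabel('place')
plt.ylabel('hour-of-month array (24 hours x 12 months)')
plt.show()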
Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import random
import numpy as np
import matplotlib.pyplot as plt
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = ([[ random.random() for i in range(n) ] for j in range(m)])
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap = 'Blues')
plt.plot(a, r, ls='none', color = 'k')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity; it took only about 20 seconds at most.
You would think it would turn out monstrous like that
But the sheer amount of points drowns out the jaggedness, WOOHOO!
There are some things left to be desired, like tags and ticks, but I think I'll figure that out.
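A small sketch of how the ticks could be hidden, assuming it is appended to the script above (so plt.gca() still refers to the polar axes):
ax = plt.gca()
ax.set_xticklabels([])  # hide the angular (theta) tick labels
ax.set_yticklabels([])  # hide the radial tick labels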
Hi everybody, I am trying to do something that should be easy in Python, but for me nothing seems to be easy. What I want to do is plot an n x n grid automatically: when I change the numbers passed to make, for instance (4,4), it should plot a 4x4 grid; if I write (10,10), a 10x10 grid; and so on. My code is below. Any help, please.
import matplotlib.pyplot as plt
import numpy as np
def make(X,Y):
    x, y = np.indices((X,Y))
    sqr = (x==0) & (y==0)
    for i in range(X):
        for j in range(Y):
            sqr = (x==i) & (y==j)
    plt.show()
    plt(sqr)
make(10,10)
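A minimal sketch of one way this could be done, assuming the goal is simply to draw an X-by-Y grid of empty cells (the function name make is kept from the question):
import numpy as np
import matplotlib.pyplot as plt

def make(X, Y):
    # Draw an X-by-Y grid by placing ticks at every integer coordinate
    # and turning on the grid lines between them.
    fig, ax = plt.subplots()
    ax.set_xticks(np.arange(X + 1))
    ax.set_yticks(np.arange(Y + 1))
    ax.set_xlim(0, X)
    ax.set_ylim(0, Y)
    ax.set_aspect('equal')
    ax.grid(True)
    plt.show()

make(10, 10)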
I have the following code to plot a 2d histogram in pyplot:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
MIN, MAX, num = .001, 5000, 500
minn=1
maxx=1000
zbins = 10 ** np.linspace(np.log10(MIN), np.log10(MAX), num)
x=np.linspace(100,600,50000)
y=np.linspace(0,500,50000)
fig1 = plt.figure(1)
counts1,xedges1,edges1,d=plt.hist2d(x,y,bins=zbins)
mesh1 = plt.pcolormesh(zbins, zbins, counts1)
plt.xlim([minn, maxx])
plt.ylim([minn, maxx])
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.colorbar()
plt.show()
Apologies for my horrible variable naming!
Anyways, when I plot this, the histogram seems to have the x and y axes switched. I checked the matplotlib 2d hist documentation and I was sure that I had the x and y arguments in the right order, but I cannot for the life of me figure out where I'm going wrong. Any help would be greatly appreciated!
The confusion comes from the fact that the returned counts array is not what you think it is.
plt.hist2d internally uses numpy.histogram2d to compute the two-dimensional histogram. The documentation states as return values:
H : ndarray, shape(nx, ny)
The bi-dimensional histogram of samples x and y. Values in x are histogrammed along the first dimension and values in y are histogrammed along the second dimension.
xedges : ndarray, shape(nx,)
The bin edges along the first dimension.
yedges : ndarray, shape(ny,)
The bin edges along the second dimension.
Apart from the fact that there seems to be a mistake concerning the exact shape of the edge arrays (they actually have nx+1 and ny+1 entries), we see that the first dimension of the returned histogram array is x and the second is y.
However, matplotlib always expects y to be the first dimension. Therefore, while plt.hist2d produces the correct plot, plt.pcolormesh needs a transposed version of the array:
plt.pcolormesh(X,Y, counts.T)
A full example, comparing plt.hist2d and plt.pcolormesh:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1,10,10)
y=np.linspace(6,9,10)
zbinsx= np.linspace(0,10,11)
zbinsy= np.linspace(5,10,6)
fig, (ax, ax2) = plt.subplots(ncols=2)
counts,xedges,yedges,d = ax.hist2d(x,y, bins=[zbinsx,zbinsy])
# counts has shape (10, 5)
X,Y = np.meshgrid(xedges,yedges)
mesh1 =ax2.pcolormesh(X,Y, counts.T)
plt.show()
I tried looking this up a lot, and there is a lot of information on specific examples, but they are too specific for me to understand.
How do I plot the data in a NumPy N-D array on a 3D graph? Please refer to the example below.
import numpy as np
X =20
Y = 20
Z = 2
sample = np.zeros(((X,Y,Z)))
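# Note: with Z = 2 the last axis only has valid indices 0 and 1, so some of the
# assignments below (e.g. an index of 2) raise an IndexError.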
sample[1][2][2]=45
sample[1][3][0]=52
sample[1][8][1]=42
sample[1][15][1]=30
sample[1][19][2]=15
I want to use the values at those X, Y, Z positions on a 3D graph (plot).
Thanks in advance
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# Define size of data
P= 25
X = 70
Y = 25
Z = 3
# Create meshgrid
x,y = np.meshgrid(np.arange(X),np.arange(Y))
# Create some random data (your example didn't work)
sample = np.random.randn((((P,X,Y,Z))))
# Create figure
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
fig.show()
# Define colors
colors=['b','r','g']
# Plot a wireframe for each entry in Z
for i in range(Z):
ax.plot_wireframe(x, y, sample[:,:,:,i],color=colors[i])
plt.draw()
plt.show()
But I want to draw X, Y, Z only.
When I use the above code, Python throws lots of errors like ValueError: too many values to unpack.
Are you looking for something like this?
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# Define size of data
X = 20
Y = 20
Z = 3
# Create meshgrid
x,y = np.meshgrid(np.arange(X),np.arange(Y))
# Create some random data (your example didn't work)
sample = np.random.randn(X,Y,Z)
# Create figure
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
fig.show()
# Define colors
colors=['b','r','g']
# Plot a wireframe for each entry in Z
for i in range(Z):
ax.plot_wireframe(x, y, sample[:,:,i],color=colors[i])
plt.draw()
plt.show()
which would give you:
There are plenty of other ways to display 3D data in matplotlib, see also here. However, you are always limited to 3 dimensions (or 4, if you do a 3D scatter plot where color encodes the 4th dimension). So you need to decide which dimensions you want to show, or whether you can summarize them somehow.
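A minimal sketch of that last option (a 3D scatter where color encodes the 4th dimension), using random stand-in data:
import numpy as np
import matplotlib.pyplot as plt

# Random (x, y, z, value) points; the 4th dimension is mapped to color.
rng = np.random.default_rng(0)
x, y, z, v = rng.random((4, 100))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
sc = ax.scatter(x, y, z, c=v, cmap='viridis')
fig.colorbar(sc, label='4th dimension')
plt.show()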
I have got something that may work for you. To make it understandable, I will briefly explain the process. I connected 4x4x4 = 64 point masses to each other and created a cube with dampers, springs and inner friction. I solved the kinematic and mechanical behaviour using numpy, and then I needed to visualise the cube; all I have is X, Y, Z points for each mass at each time step.
What I have is the 4x4x4 XYZ points of the cube for each time tn:
Here is how it goes:
import matplotlib.pyplot as plt
nmx = nmy = nmz = 4  # number of point masses along each axis (the 4x4x4 cube described above)
# 'points' holds the simulation output: an array of shape (timesteps, nmx, nmy, nmz, 3)
zeroPoint = points[50]  # the elastic cube in space at time step 50
surf0x=zeroPoint[0,:,:,0]
surf0y=zeroPoint[0,:,:,1]
surf0z=zeroPoint[0,:,:,2]
surf1x=zeroPoint[:,0,:,0]
surf1y=zeroPoint[:,0,:,1]
surf1z=zeroPoint[:,0,:,2]
surf2x=zeroPoint[:,:,0,0]
surf2y=zeroPoint[:,:,0,1]
surf2z=zeroPoint[:,:,0,2]
surf3x=zeroPoint[nmx-1,:,:,0]
surf3y=zeroPoint[nmx-1,:,:,1]
surf3z=zeroPoint[nmx-1,:,:,2]
surf4x=zeroPoint[:,nmy-1,:,0]
surf4y=zeroPoint[:,nmy-1,:,1]
surf4z=zeroPoint[:,nmy-1,:,2]
surf5x=zeroPoint[:,:,nmz-1,0]
surf5y=zeroPoint[:,:,nmz-1,1]
surf5z=zeroPoint[:,:,nmz-1,2]
fig = plt.figure(figsize=(10,10))
wf = plt.axes(projection ='3d')
wf.set_xlim(-0.5,2)
wf.set_ylim(-0.5,2)
wf.set_zlim(-0.5,2)
wf.plot_wireframe(surf0x, surf0y, surf0z, color ='green')
wf.plot_wireframe(surf1x, surf1y, surf1z, color ='red')
wf.plot_wireframe(surf2x, surf2y, surf2z, color ='blue')
wf.plot_wireframe(surf3x, surf3y, surf3z, color ='black')
wf.plot_wireframe(surf4x, surf4y, surf4z, color ='purple')
wf.plot_wireframe(surf5x, surf5y, surf5z, color ='orange')
# displaying the visualization
wf.set_title('Its a Cube :) ')
plt.show()
At time step 190, the same cube (the animation runs at 60 FPS):
The trick, as you see, is that you need to create surfaces from the points before you plot. You don't even need np.meshgrid to do that; people use it to compute parametric z values. If you already have all the points, you don't need it.
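A self-contained sketch of that idea, using an assumed static 4x4x4 grid as a stand-in for the simulation data (np.meshgrid appears here only to fabricate the dummy points, not to build the surfaces):
import numpy as np
import matplotlib.pyplot as plt

nmx = nmy = nmz = 4
# Dummy (nmx, nmy, nmz, 3) array of XYZ positions, standing in for one time step.
gx, gy, gz = np.meshgrid(np.arange(nmx), np.arange(nmy), np.arange(nmz), indexing='ij')
zeroPoint = np.stack([gx, gy, gz], axis=-1).astype(float)

fig = plt.figure()
wf = fig.add_subplot(111, projection='3d')
# One face of the cube: fix the first index and take the X, Y, Z of those 4x4 points.
wf.plot_wireframe(zeroPoint[0, :, :, 0], zeroPoint[0, :, :, 1], zeroPoint[0, :, :, 2], color='green')
plt.show()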
I am working with a large number of 3D points, each with x,y,z values stored in numpy arrays.
For background, the points will always fall within a cylinder of fixed radius, and height = max z value of the points.
My objective is to split the bounding cylinder (or column, if that is easier) into e.g. 1 m height strata, and then count the number of points within each cell of a regular grid (e.g. 1 m x 1 m) overlaid on each stratum.
Conceptually, the operation would be the same as overlaying a raster and counting the points intersecting each pixel.
The grid of cells can form a square or a disk, it doesn't matter.
After a lot of searching and reading, my current thinking is to use some combination of numpy.linspace and numpy.meshgrid to generate the vertices of each cell, store them in an array, and test each point against each cell to see whether it is 'in'. This seems inefficient, especially when working with thousands of points.
The numpy / scipy suite seems well suited to the problem, but I have not found a solution yet. Any suggestions would be much appreciated.
I have included a few example points and some code to visualize the data.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# This plot is instructive to visualize the problem
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x_vals, y_vals, z_vals, c='b', marker='o')
plt.show()
I am not sure I understand perfectly what you are looking for, but since every "cell" seems to have a 1 m side in all directions, couldn't you:
round all your values down to integers (rasterize your data), probably with some floor function;
create a bijection from these integer coordinates to something more convenient, with something like:
(64**2)*x + (64)*y + z # assuming all values are in [0,63]
You can put z at the beginning instead if you want to focus on height more easily later;
compute the histogram of each "cell" (several functions from numpy or scipy can do it);
revert the bijection if needed (i.e. recover the "true" coordinates of each cell once the count is known).
Maybe I didn't understand well, but in case it helps, a rough sketch of these steps follows below...
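A minimal sketch of the steps above, assuming the floored coordinates all fall in [0, 63]; np.bincount plays the role of the histogram, and the manual bijection could equally be replaced by np.ravel_multi_index:
import numpy as np

# Dummy points; stand-ins for the real x/y/z arrays.
rng = np.random.default_rng(0)
x_vals = rng.uniform(0, 40, 1000)
y_vals = rng.uniform(0, 40, 1000)
z_vals = rng.uniform(0, 5, 1000)

# 1. Rasterize: floor each coordinate to the 1 m cell it falls in.
xi = np.floor(x_vals).astype(int)
yi = np.floor(y_vals).astype(int)
zi = np.floor(z_vals).astype(int)

# 2. Bijection from (x, y, z) integer coordinates to a single index.
key = (64**2) * xi + 64 * yi + zi

# 3. Histogram: count the points falling in each cell.
counts = np.bincount(key, minlength=64**3)

# 4. Revert the bijection for the non-empty cells.
occupied = np.nonzero(counts)[0]
cx, rem = np.divmod(occupied, 64**2)
cy, cz = np.divmod(rem, 64)
# cx, cy, cz are the cell coordinates; counts[occupied] holds the point counts.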
Thanks @Baruchel. It turns out the n-dimensional histograms suggested by @DilithiumMatrix provide a fairly simple solution to the problem I posted. After some reading, here is my current solution for anyone else facing a similar problem.
As this is my first Python/Numpy effort, any improvements/suggestions, especially regarding performance, would be welcome. Thanks.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# Updated code below
# Variables needed for 2D,3D histograms
xmax, ymax, zmax = int(x_vals.max())+1, int(y_vals.max())+1, int(z_vals.max())+1
xmin, ymin, zmin = int(x_vals.min()), int(y_vals.min()), int(z_vals.min())
xrange, yrange, zrange = xmax-xmin, ymax-ymin, zmax-zmin
xedges = np.linspace(xmin, xmax, (xrange + 1), dtype=int)
yedges = np.linspace(ymin, ymax, (yrange + 1), dtype=int)
zedges = np.linspace(zmin, zmax, (zrange + 1), dtype=int)
# Make the 2D histogram
h2d, xedges, yedges = np.histogram2d(x_vals, y_vals, bins=(xedges, yedges))
assert np.count_nonzero(h2d) == len(x_vals), "Unclassified points in the array"
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.imshow(h2d.transpose(), extent=extent, interpolation='none', origin='lower')
# imshow treats the first array axis as rows (y), so the histogram is transposed and origin set to 'lower' to line up with the axes
# Plot settings, not sure yet on matplotlib update/override objects
plt.grid(True, which='both')
plt.xticks(xedges)
plt.yticks(yedges)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.plot(x_vals, y_vals, 'ro')
plt.show()
# 3-dimensional histogram with 1 x 1 x 1 m bins. Produces point counts in each 1m3 cell.
xyzstack = np.stack([x_vals,y_vals,z_vals], axis=1)
h3d, Hedges = np.histogramdd(xyzstack, bins=(xedges, yedges, zedges))
assert np.count_nonzero(h3d) == len(x_vals), "Unclassified points in the array"
h3d.shape # Shape of the array should be same as the edge dimensions
testzbin = np.sum(np.logical_and(z_vals >= 1, z_vals < 2)) # Slice to test with
np.sum(h3d[:,:,1]) == testzbin # Test num points in second bins
np.sum(h3d, axis=2) # Sum of all vertical points above each x,y 'pixel'
# only in this example the h2d and np.sum(h3d,axis=2) arrays will match as no z bins have >1 points
# Remaining issue - how to get a r x c count of empty z bins.
# i.e. for each 'pixel' how many z bins contained no points?
# Possible solution is to reshape to use logical operators
count2d = h3d.reshape(xrange * yrange, zrange) # Maintain dimensions per num 3D cells defined
zerobins = (count2d == 0).sum(1)
zerobins.shape
# Get back to x,y grid with counts - ready for output as image with counts=pixel digital number
bincount_pixels = zerobins.reshape(xrange,yrange)
# Appears to work; perhaps there is a way without reshaping?
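# One possible alternative (a sketch, not from the original post): count the empty
# z bins for each (x, y) pixel directly along axis 2, avoiding the reshape.
empty_bins_2d = (h3d == 0).sum(axis=2)  # same shape and values as bincount_pixels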
PS: if you are facing a similar problem, scikit patch extraction looks like another possible solution.