Radial heatmap from similarity matrix in Python

Summary
I have a 2880x2880 similarity matrix (about 8.3 million points). My attempt with Holoviews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These were about 28-31 cells long, and each cell had the measurement of the thing I'm trying to analyze. So there are these 24 arrays for each month of 1 whole year, i.e. 24x12 = 288 arrays per place. And there are measurements from 10 places. So a total of 2880 arrays were created and all compared to each other, and saved in a 2880x2880 matrix with similarity coefficients.
I'm trying to turn it into a radial similarity matrix like the one from holoviews, but without the ticks and tags (since the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape and colors and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB big, so it never shows up when I open it up. It's just blank. I've added a minimal example below of what I have, and replaced the loading of the datafile with some randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = ([[ random.random() for i in range(dimension) ] for j in range(dimension)]) #Fake, similar data
    col_vals = [str(i) for i in range(dimension*dimension)] # Placeholder
    row_vals = [str(i) for i in range(dimension*dimension)] # Placeholder
    val_vals = (np.reshape(np.array(dummy_matrix), -1)).tolist() # Turn matrix into an array
    idx_vals = [i for i in range(dimension*dimension)] # Placeholder
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it in (if it's possible for this case) to this radial heatmap, since it seems like the radial heatmap doesn't have that datashade-feature.
So I have no idea how to tackle this. I would be content with a broad overview too, I don't need the details nor the hover-infobox nor ability to zoom or any fancy extra features, I just need the general overview for a presentation. I'm open to any solution really.

I recommend using a plain heatmap instead of a radial heatmap to show the similarity matrix, for two reasons:
The radial heatmap is designed for periodic variables. The time variable (288 hours) can be considered periodic, but I think the combined 288*10 axis (288 hours, 10 places) is no longer periodic because of the "place" dimension.
Near the center of a radial heatmap, the colored cells become too dense for a person to read.
The following simple code shows a heatmap.
import matplotlib.cm
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = plt.get_cmap("Blues") # you may use another built-in colormap or define your own colormap
# if your data is not in the range [0, 1], use a normalization. Here it is normalized by the min and max values.
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea: perhaps computing the similarity matrix is unnecessary, and you can plot the original 288 * 10 data using a radial heatmap or just a normal heatmap, so that the similarity can be read directly from the color distribution.
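A minimal sketch of that last idea, assuming each of the 2880 arrays has already been reduced to a single number (for example its mean), which gives a 288 x 10 array. The name hourly_means and the random values are placeholders, not the asker's data.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: 288 hour-of-month slots (24 hours x 12 months) for 10 places
hourly_means = rng.random((288, 10))

fig, ax = plt.subplots(figsize=(6, 8))
# aspect="auto" stretches the narrow 288 x 10 grid to fill the axes
image = ax.imshow(hourly_means, cmap="Blues", aspect="auto", interpolation="nearest")
ax.set_xlabel("place")
ax.set_ylabel("hour-of-month slot (24 per month)")

# Draw a thin divider after every block of 24 slots, i.e. between months
for month_end in range(24, 288, 24):
    ax.axhline(month_end - 0.5, color="k", lw=0.5)

fig.colorbar(image, ax=ax)
fig.savefig("raw_data_heatmap.png", dpi=150)
```

With only 2880 cells the figure stays small, and similar rows or columns show up as similar color bands.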

Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import random
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = ([[ random.random() for i in range(n) ] for j in range(m)])
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap = 'Blues')
plt.plot(a, r, ls='none', color = 'k')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity, took only about 20 seconds max.
You would think it would turn out monstrous like that
But the sheer amount of points drowns out the jaggedness, WOOHOO!
There's some things left to be desired, like tags and ticks, but I think I'll figure that out.
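For a presentation-sized overview, one option is to average the matrix in blocks before plotting and to switch the ticks off altogether. The sketch below is only an illustration under assumptions: block_mean is a hypothetical helper, the factor of 10 and the inner radius of 2 are arbitrary choices, and the random matrix stands in for the real similarity data.

```python
import matplotlib.pyplot as plt
import numpy as np

def block_mean(matrix, factor):
    """Average non-overlapping factor x factor blocks of a square matrix."""
    n = matrix.shape[0] // factor
    trimmed = matrix[:n * factor, :n * factor]
    return trimmed.reshape(n, factor, n, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
similarity = rng.random((2880, 2880))  # placeholder for the real matrix
small = block_mean(similarity, 10)     # 288 x 288 cells

# Cell edges: one more edge than cells along each direction
theta_edges = np.linspace(0, 2 * np.pi, small.shape[0] + 1)
radius_edges = np.linspace(2, 10, small.shape[1] + 1)  # start at 2 to leave the center empty
theta, radius = np.meshgrid(theta_edges, radius_edges, indexing="ij")

fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(8, 8))
mesh = ax.pcolormesh(theta, radius, small, cmap="Blues", rasterized=True)
ax.set_ylim(0, 10)  # keep the origin at r = 0 so the hole stays visible
ax.set_xticks([])   # no angular ticks or labels
ax.set_yticks([])   # no radial ticks or labels
ax.grid(False)
fig.colorbar(mesh, ax=ax, shrink=0.8)
fig.savefig("radial_heatmap_downsampled.png", dpi=200)
```

Rows of the matrix run around the circle and columns run outward. Starting the radius at 2 avoids squeezing many cells into the center.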

Related

Finding spread of values between multiple arrays of different shapes in a line graph with matplotlib

I am plotting 16 arrays with different shapes onto one plot. However, things are getting difficult, because what I would like to do is plot the maximum spread of the data by finding the maximum and minimum value at each point on the y-axis and shading the area between them.
It would hopefully look something like this, except my lines are vertical so the spread would be across on the x rather than the y as shown here.
For simplicity, I'm working with some very simplified dummy data to figure out the best way to do this, but ultimately I'll be plotting more data with much larger lengths and a bit different values.
import matplotlib as mp
import matplotlib.pylab as plt
import numpy as np
d1 = np.random.randint(0,10,15)
d2 = np.random.randint(0,10,20)
d3 = np.random.randint(0,10,25)
d4 = np.random.randint(0,10,30)
y1=np.linspace(0,30,15,True)
y2=np.linspace(0,30,20,True)
y3=np.linspace(0,30,25,True)
y4=np.linspace(0,30,30,True)
plt.style.use('bmh')
fig = plt.figure(figsize=(5,15))
plt.plot(d1,y1,'red')
plt.plot(d2,y2,'blue')
plt.plot(d3,y3,'green')
plt.plot(d4,y4,'orange')
plt.xticks(np.arange(0,11,1))
plt.yticks(np.arange(0,31,1))
plt.show()
And with this code, this is currently what the plot looks like as shown on the left. What I want is something like on the right (which I did quickly by hand as an example) with the area between the black lines shaded. As you can see below, the black line follows whatever line is farthest on the outside (i.e. the smallest and largest values at each y point), and then I'd like to shade this region in between the two black lines.
Thanks for any help or advice!
Thank you for the advice people gave! I think I figured things out and I'll show how I did it. The graphs are a little different because I reran the entire code and of course I got a new set of random numbers for each of the dummy datasets.
I ultimately needed to interpolate the data to make them all the same length and from there I was able to stack them, find the max and min at each y, and then plot. You'll see the interpolation doesn't give a perfect representation, but I'm assuming that's in part because these are pretty small datasets with a larger step between values and for larger ones with a smaller step between values, it will do a little better.
import matplotlib as mp
import matplotlib.pylab as plt
import numpy as np
import scipy.interpolate as interp
d1 = np.random.randint(0,10,15)
d2 = np.random.randint(0,10,20)
d3 = np.random.randint(0,10,25)
d4 = np.random.randint(0,10,30)
y1=np.linspace(0,30,15)
y2=np.linspace(0,30,20)
y3=np.linspace(0,30,25)
y4=np.linspace(0,30,30)
y_common = np.linspace(0,30,30)
x1 = np.interp(y_common,y1,d1)
x2 = np.interp(y_common,y2,d2)
x3 = np.interp(y_common,y3,d3)
x4 = np.interp(y_common,y4,d4)
x = np.stack((x1,x2,x3,x4))
xmax = np.max(x,axis=0)
xmin = np.min(x,axis=0)
# %matplotlib inline  # uncomment when running in a Jupyter notebook
plt.style.use('bmh')
fig = plt.figure(figsize=(5,15))
plt.plot(d1,y1,'red')
plt.plot(d2,y2,'blue')
plt.plot(d3,y3,'green')
plt.plot(d4,y4,'orange')
plt.xticks(np.arange(0,11,1))
plt.yticks(np.arange(0,31,1))
plt.show()
fig = plt.figure(figsize=(5,15))
plt.plot(x1,y_common,'#A2A2A2')
plt.plot(x2,y_common,'#A2A2A2')
plt.plot(x3,y_common,'#A2A2A2')
plt.plot(x4,y_common,'#A2A2A2')
plt.plot(xmax,y_common,'black')
plt.plot(xmin,y_common,'black')
plt.rcParams['hatch.color'] = 'black'
plt.fill_betweenx(y_common, xmax, xmin, facecolor='none', hatch ='/', edgecolor='black', linewidth=2)
plt.xticks(np.arange(0,11,1))
plt.yticks(np.arange(0,31,1))
plt.show()
Here are the resulting plots with the original before interpolation on the left and the plot with interpolation and a few other changes to match what I want my final graph to look like with my own data.
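The interpolate, stack, and min/max steps can be wrapped in a small helper so that it scales to 16 or more arrays. This is a sketch only: spread_envelope is a hypothetical name, and the two short curves are made up so the result can be checked by hand.

```python
import numpy as np

def spread_envelope(curves, y_common):
    """Resample each (y, x) curve onto y_common and return (xmin, xmax)."""
    # np.interp expects the sample positions (y here) to be increasing
    resampled = np.stack([np.interp(y_common, y, x) for y, x in curves])
    return resampled.min(axis=0), resampled.max(axis=0)

# Two hypothetical curves sampled on different y grids
curves = [
    (np.array([0.0, 10.0, 20.0]), np.array([1.0, 3.0, 1.0])),
    (np.array([0.0, 20.0]), np.array([2.0, 2.0])),
]
y_common = np.array([0.0, 5.0, 10.0, 15.0, 20.0])

xmin, xmax = spread_envelope(curves, y_common)
print(xmin)  # smallest x at each common y
print(xmax)  # largest x at each common y
```

The returned arrays can be passed straight to plt.fill_betweenx(y_common, xmin, xmax). Building y_common from the union of all original y grids, for example with np.unique(np.concatenate(...)), keeps every original sample point on the common grid, which may reduce the mismatch noted above.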

Numpy N-D Matrix to a 3D Mesh Graph

I tried looking this up a lot, and there is a lot of information on specific examples, but they are too specific to understand.
How do I plot the data in a NumPy N-D array as a 3D graph? Please refer to the example below.
import numpy as np
X =20
Y = 20
Z = 2
sample = np.zeros(((X,Y,Z)))
sample[1][2][2]=45
sample[1][3][0]=52
sample[1][8][1]=42
sample[1][15][1]=30
sample[1][19][2]=15
I want the values at the X, Y, Z positions to be shown on a 3D graph (plot).
Thanks in advance
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# Define size of data
P= 25
X = 70
Y = 25
Z = 3
# Create meshgrid
x,y = np.meshgrid(np.arange(X),np.arange(Y))
# Create some random data (your example didn't work)
sample = np.random.randn((((P,X,Y,Z))))
# Create figure
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
fig.show()
# Define colors
colors=['b','r','g']
# Plot for each entry in Z
for i in range(Z):
    ax.plot_wireframe(x, y, sample[:,:,:,i],color=colors[i])
    plt.draw()
plt.show()
But I only want to draw X, Y, Z.
When I use the code above, Python throws lots of errors, such as ValueError: too many values to unpack
Are you looking for something like this?
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# Define size of data
X = 20
Y = 20
Z = 3
# Create meshgrid
x,y = np.meshgrid(np.arange(X),np.arange(Y))
# Create some random data (your example didn't work)
sample = np.random.randn(X,Y,Z)
# Create figure
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
fig.show()
# Define colors
colors=['b','r','g']
# Plot for each entry in Z
for i in range(Z):
    ax.plot_wireframe(x, y, sample[:,:,i],color=colors[i])
    plt.draw()
plt.show()
which would give you:
There are plenty of other ways to display 3D data in matplotlib, see also here. However, you are always limited to 3 dimensions (or 4, if you do a 3D scatter plot where color encodes the 4th dimension). So you need to make a decision which dimensions you want to show or if you can summarize them somehow.
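As a sketch of the 3D scatter option, the non-zero cells of an array can be drawn at their index positions with color encoding the stored value. The array below mirrors the asker's example, except that the third axis is given size 3 (an assumption) so that index 2 is valid.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older Matplotlib)

# Sparse array shaped like the asker's example, third axis enlarged to 3
sample = np.zeros((20, 20, 3))
sample[1, 2, 2] = 45
sample[1, 3, 0] = 52
sample[1, 8, 1] = 42
sample[1, 15, 1] = 30
sample[1, 19, 2] = 15

# Indices of the non-zero cells become the x, y, z coordinates
xs, ys, zs = np.nonzero(sample)
values = sample[xs, ys, zs]

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
points = ax.scatter(xs, ys, zs, c=values, cmap="viridis", s=60)
ax.set_xlabel("X index")
ax.set_ylabel("Y index")
ax.set_zlabel("Z index")
fig.colorbar(points, ax=ax, label="value")
fig.savefig("sample_scatter3d.png")
```

Cells that are zero are simply not drawn, so this suits sparse arrays better than a wireframe does.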
I have something that may work for you. To explain it, I will briefly describe my process. I connected 4x4x4 = 64 point masses to each other and created a cube with dampers, springs, and inner friction. I solved the kinematic and mechanical behaviour using numpy, and then needed to visualise the cube; all I have is the X, Y, Z points of each mass for each time step.
What I have is 4x4x4 XYZ points of a cube for each time tn:
Here is how it goes:
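import matplotlib.pyplot as plt
# points, nmx, nmy and nmz come from the simulation (not shown);
# points[t] is indexed as [ix, iy, iz, xyz]
zeroPoint=points[50] # at time step 50 elastic cube in space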
surf0x=zeroPoint[0,:,:,0]
surf0y=zeroPoint[0,:,:,1]
surf0z=zeroPoint[0,:,:,2]
surf1x=zeroPoint[:,0,:,0]
surf1y=zeroPoint[:,0,:,1]
surf1z=zeroPoint[:,0,:,2]
surf2x=zeroPoint[:,:,0,0]
surf2y=zeroPoint[:,:,0,1]
surf2z=zeroPoint[:,:,0,2]
surf3x=zeroPoint[nmx-1,:,:,0]
surf3y=zeroPoint[nmx-1,:,:,1]
surf3z=zeroPoint[nmx-1,:,:,2]
surf4x=zeroPoint[:,nmy-1,:,0]
surf4y=zeroPoint[:,nmy-1,:,1]
surf4z=zeroPoint[:,nmy-1,:,2]
surf5x=zeroPoint[:,:,nmz-1,0]
surf5y=zeroPoint[:,:,nmz-1,1]
surf5z=zeroPoint[:,:,nmz-1,2]
fig = plt.figure(figsize=(10,10))
wf = plt.axes(projection ='3d')
wf.set_xlim(-0.5,2)
wf.set_ylim(-0.5,2)
wf.set_zlim(-0.5,2)
wf.plot_wireframe(surf0x, surf0y, surf0z, color ='green')
wf.plot_wireframe(surf1x, surf1y, surf1z, color ='red')
wf.plot_wireframe(surf2x, surf2y, surf2z, color ='blue')
wf.plot_wireframe(surf3x, surf3y, surf3z, color ='black')
wf.plot_wireframe(surf4x, surf4y, surf4z, color ='purple')
wf.plot_wireframe(surf5x, surf5y, surf5z, color ='orange')
# displaying the visualization
wf.set_title('Its a Cube :) ')
plt.show()
At time step 190, the same cube (the animation runs at 60 FPS):
The trick, as you can see, is that you need to create surfaces from the points before you plot. You don't even need np.meshgrid to do that. People use it to calculate parametric z values. If you already have all the points, you don't need it.
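The six face slices can also be collected in a loop-friendly way. The sketch below assumes the points of one time step are stored as an (nx, ny, nz, 3) array; cube_faces is a hypothetical helper, and np.meshgrid is used here only to fabricate an undeformed demo cube in place of simulation output.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older Matplotlib)

def cube_faces(points):
    """Return the six boundary faces of an (nx, ny, nz, 3) grid of XYZ points."""
    return [
        points[0], points[-1],              # first and last layer along x
        points[:, 0], points[:, -1],        # first and last layer along y
        points[:, :, 0], points[:, :, -1],  # first and last layer along z
    ]

# Demo data only: an undeformed 4 x 4 x 4 unit cube
axis = np.linspace(0.0, 1.0, 4)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

faces = cube_faces(grid)
colors = ["green", "black", "red", "purple", "blue", "orange"]

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection="3d")
for face, color in zip(faces, colors):
    # Each face is a 2D grid of points, which is what plot_wireframe expects
    ax.plot_wireframe(face[..., 0], face[..., 1], face[..., 2], color=color)
fig.savefig("cube_faces.png")
```

Using index -1 for the last layer avoids having to carry nmx, nmy and nmz around separately.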

Python Matplotlib: plotting histogram with overlapping boundaries removed

I am plotting a histogram using Matplotlib in Python with the matplotlib.pyplot.bar() function. This gives me plots that look like this:
I am trying to produce a histogram that only plots the caps of each bar and the sides that don't directly share space with the border of another bar, more like this: (I edited this using gimp)
How can I achieve this using Python? Answers using matplotlib are preferable since that is what I have the most experience with but I am open to anything that works using Python.
For what it's worth, here's the relevant code:
import numpy as np
import matplotlib.pyplot as pp
bin_edges, bin_values = np.loadtxt("datafile.dat",unpack=True)
bin_edges = np.append(bin_edges,500.0)
bin_widths = []
for j in range(len(bin_values)):
    bin_widths.append(bin_edges[j+1] - bin_edges[j])
pp.bar(bin_edges[:-1],bin_values,width=bin_widths,color="none",edgecolor='black',lw=2)
pp.savefig("name.pdf")
I guess the easiest way is to use the step function instead of bar:
http://matplotlib.org/examples/pylab_examples/step_demo.html
Example:
import numpy as np
import matplotlib.pyplot as pp
# Simulate data
bin_edges = np.arange(100)
bin_values = np.exp(-np.arange(100)/5.0)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(bin_edges,bin_values, where='post',color='k',lw=2)
pp.tight_layout(pad=0.25)
pp.show()
If the given bin_edges represent the left edges, use where='post'; if they are the right edges, use where='pre'. The only issue I see is that step doesn't plot the last (first) bin correctly if you use post (pre). But you could just add another 0 bin before/after your data to make it draw everything properly.
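A small sketch of that padding, assuming the histogram is described by n values and n + 1 edges (the numbers are invented). A zero is added on each side so that the outline starts and ends on the baseline.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical histogram: 4 bins described by 5 edges (unequal widths are fine)
bin_edges = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
bin_values = np.array([3.0, 5.0, 2.0, 4.0])

# Repeat the first edge and pad the values with a zero at both ends.
# With where="post", y[i] is held from x[i] to x[i+1], so the last bin
# is drawn in full and the line then drops to zero at the final edge.
x = np.concatenate(([bin_edges[0]], bin_edges))
y = np.concatenate(([0.0], bin_values, [0.0]))

fig, ax = plt.subplots()
ax.step(x, y, where="post", color="k", lw=2)
fig.savefig("step_histogram.png")
```

Matplotlib 3.4 and later also provide ax.stairs(bin_values, bin_edges), which takes the values and edges directly and may be simpler if that version is available.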
Example 2 - If you want to bin some data and draw a histogram you could do something like this:
# Simulate data
data = np.random.rand(1000)
# Prepare histogram
nBins = 100
rng = [0,1]
n,bins = np.histogram(data,nBins,rng)
x = bins[:-1] + 0.5*np.diff(bins)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(x,n,where='mid',color='k',lw=2)
pp.show()

Changing axis options for Polar Plots in Matplotlib/Python

I have a problem changing my axis labels in Matplotlib. I want to change the radial axis options in my Polar Plot.
Basically, I'm computing the distortion of a cylinder, which is nothing but how much the radius deviates from the original (perfectly circular) cylinder. Some of the distortion values are negative, while some are positive due to tensile and compressive forces. I'm looking for a way to represent this in cylindrical coordinates graphically, so I thought that a polar plot was my best bet. Excel gives me a 'radar chart' option which is flexible enough to let me specify minimum and maximum radial axis values. I want to replicate this on Python using Matplotlib.
My Python script for plotting on polar coordinates is as follows.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_rmax(1)
ax1.plot(theta,R1,lw=2.5)
My plot looks as follows:
But this is not how I want to present it. I want to vary my radial axis, so that I can show the data as a deviation from some reference value, say -2. How do I ask Matplotlib in polar coordinates to change the minimum axis label? I can do this VERY easily in Excel. I choose a minimum radial value of -2, to get the following Excel radar chart:
On Python, I can easily offset my input data by a magnitude of 2. My new dataset is called R2, as shown:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R2 = [1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,\
1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642,\
1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642]
fig2 = plt.figure()
ax2 = fig2.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax2.plot(theta,R2,lw=2.5)
ax2.set_rmax(1.5*offset)
plt.show()
The plot is shown below:
Once I get this, I can MANUALLY add axis labels and hard-code it into my script. But this is a really ugly way. Is there any way I can directly get a Matplotlib equivalent of the Excel radar chart and change my axis labels without having to manipulate my input data?
You can just use the normal way of setting axis limits:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_ylim(-2,2)
ax1.set_yticks(np.arange(-2,2,0.5))
ax1.plot(theta,R1,lw=2.5)
plt.show()
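On a polar axes the radial limits can also be set through the polar-specific method names, which some readers find clearer than the y-axis names. The sketch below uses synthetic distortion values (an assumption, not the asker's R1 data), and the limits of -2 and 1 are chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

theta = np.deg2rad(np.arange(-180.0, 190.0, 10.0))
# Synthetic distortion with both negative and positive values
distortion = 0.48 * np.sin(2.0 * theta + 0.9)

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(theta, distortion, lw=2.5)

# The lower radial limit sits at the center of the plot,
# so negative values are drawn without shifting the data
ax.set_rlim(-2, 1)
ax.set_rticks([-2, -1, 0, 1])
fig.savefig("polar_distortion.png")
```

Because the tick labels show the real values, no manual offset or hand-written labels are needed.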

Matplotlib Agg Rendering Complexity Error

I am trying to print a 600 dpi graph using Python matplotlib. However Python plotted 2 out of 8 graphs, and output the error:
OverflowError: Agg rendering complexity exceeded. Consider downsampling or decimating your data.
I am plotting a huge chunk of data (7,500,000 data per column) so I guess either that would be some overloading problem or that I need to set a large cell_block_limit.
I tried searching for the solutions for changing a cell_block_limit on Google but to no avail. What would be a good approach?
The code as follows:-
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
majorLocator = MultipleLocator(200)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.set_ticks_position('bottom')
ax.xaxis.grid(True,which='minor')
ax.yaxis.grid(True)
plt.plot(timemat,fildata)
plt.xlabel(plotxlabel,fontsize=14)
plt.ylabel(plotylabel,fontsize=14)
plt.title(plottitle,fontsize=16)
fig.savefig(plotsavetitle,dpi=600)
In addition to @Lennart's point that there's no need for the full resolution, you might also consider a plot similar to the following.
Calculating the max/mean/min of a "chunked" version is very simple and efficient if you use a 2D view of the original array and the axis keyword arg to x.min(), x.max(), etc.
Even with the filtering, plotting this is much faster than plotting the full array.
(Note: to plot this many points, you'll have to tune down the noise level a bit. Otherwise you'll get the OverflowError you mentioned. If you want to compare against plotting the "full" dataset, change the 0.3 in the y += 0.3 * y.max() * np.random... line to something like 0.1, or remove that line completely.)
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1977)
# Generate some very noisy but interesting data...
num = int(1e7)  # must be an integer for np.linspace and np.random.random
x = np.linspace(0, 10, num)
y = np.random.random(num) - 0.5
y.cumsum(out=y)
y += 0.3 * y.max() * np.random.random(num)
fig, ax = plt.subplots()
# Wrap the array into a 2D array of chunks, truncating the last chunk if
# chunksize isn't an even divisor of the total size.
# (This part won't use _any_ additional memory)
chunksize = 10000
numchunks = y.size // chunksize
ychunks = y[:chunksize*numchunks].reshape((-1, chunksize))
xchunks = x[:chunksize*numchunks].reshape((-1, chunksize))
# Calculate the max, min, and means of chunksize-element chunks...
max_env = ychunks.max(axis=1)
min_env = ychunks.min(axis=1)
ycenters = ychunks.mean(axis=1)
xcenters = xchunks.mean(axis=1)
# Now plot the bounds and the mean...
ax.fill_between(xcenters, min_env, max_env, color='gray',
                edgecolor='none', alpha=0.5)
ax.plot(xcenters, ycenters)
fig.savefig('temp.png', dpi=600)
With 600 dpi you would have to make the plot about 12,500 inches (roughly 320 meters) wide to plot that data without decimating it: 7,500,000 samples at one dot each, divided by 600 dots per inch. :-)
I would suggest chunking the data into pieces a couple of hundred or maybe even a thousand samples long, and extracting the maximum value out of that.
Something like this:
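from itertools import islice
def chunkmax(data, chunk_size):
    source = iter(data)
    while True:
        # Take the next chunk_size samples (fewer for the final chunk)
        chunk = list(islice(source, chunk_size))
        if not chunk:
            return  # data exhausted
        yield max(chunk)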
This would then, with a chunk_size of 1000, give you 7500 points to plot, where you can then easily see where in the data the shock occurs. (Unless the data is so noisy that you would have to average it to see whether there is a shock or not. But that is also easily fixable.)
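If the data is already in a NumPy array, the per-chunk maximum can also be computed without a Python loop. A minimal sketch using np.maximum.reduceat, with a tiny made-up array so the result is easy to verify:

```python
import numpy as np

data = np.arange(10)  # stand-in for one column of measurements
chunk_size = 4

# Start index of every chunk; the final chunk may be shorter than chunk_size
starts = np.arange(0, data.size, chunk_size)
chunk_max = np.maximum.reduceat(data, starts)
print(chunk_max)  # maxima of data[0:4], data[4:8], data[8:10]
```

Unlike a reshape-based approach, this keeps the leftover samples at the end instead of truncating them.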
