Plotting a heatmap with interpolation in Python using excel file - python

I need to plot a heatmap in Python using x, y, z data from an Excel file.
All values of z are 1 except at (x=5, y=5), where z is 9. The plot should be red at point (5,5) and blue elsewhere, but I am getting false alarms that need to be removed. The colormap I have used is 'jet'.
X=[0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9]
Y=[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9]
Z=[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
Code I have used is:
import matplotlib.pyplot as plt
import numpy as np
from numpy import ravel
from scipy.interpolate import interp2d
import pandas as pd
import matplotlib as mpl
excel_data_df = pd.read_excel('test.xlsx')
X= excel_data_df['x'].tolist()
Y= excel_data_df['y'].tolist()
Z= excel_data_df['z'].tolist()
x_list = np.array(X)
y_list = np.array(Y)
z_list = np.array(Z)
# f will be a function with two arguments (x and y coordinates),
# but those can be array_like structures too, in which case the
# result will be a matrix representing the values in the grid
# specified by those arguments
f = interp2d(x_list,y_list,z_list,kind="linear")
x_coords = np.arange(min(x_list),max(x_list))
y_coords = np.arange(min(y_list),max(y_list))
z= f(x_coords,y_coords)
fig = plt.imshow(z,
extent=[min(x_list),max(x_list),min(y_list),max(y_list)],
origin="lower", interpolation='bicubic', cmap= 'jet', aspect='auto')
# Show the positions of the sample points, just to have some reference
fig.axes.set_autoscale_on(False)
#plt.scatter(x_list,y_list,400, facecolors='none')
plt.xlabel('X Values', fontsize = 15, va="center")
plt.ylabel('Y Values', fontsize = 15,va="center")
plt.title('Heatmap', fontsize = 20)
plt.tight_layout()
plt.show()
For convenience, you can also use the X, Y, Z arrays above instead of reading the Excel file.
The result that I am getting is:
Here you can see dark blue regions at (5,0) and (0,5). These are the false alarms I am getting, and I need to remove them.
I am probably making a beginner's mistake. I would be grateful to anyone who points it out. Regards

There are at least three problems in your example:
x_coords and y_coords are not properly resampled;
the interpolated z does not fill in the whole grid, which leads to incorrect output;
the output is then forced onto the original grid (extent), which adds to the confusion.
This leads to the following interpolated results:
on top of which you have applied extra smoothing with imshow.
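The first problem can be checked without plotting anything. Here is a minimal sketch, assuming the question's integer coordinates 0 to 9 (the variable names are illustrative): np.arange(min, max) stops before max, so the resampled axis is one point short and never reaches the last column, while np.linspace includes both endpoints.

```python
import numpy as np

# Same layout as the question's X list: 0,0,...,0,1,1,...,9
x_list = np.repeat(np.arange(10), 10)

# np.arange excludes the stop value, so x = 9 is never sampled
x_coords_arange = np.arange(x_list.min(), x_list.max())

# np.linspace includes both endpoints; 10 samples reproduce the original axis
x_coords_linspace = np.linspace(x_list.min(), x_list.max(), 10)

print(x_coords_arange)
print(x_coords_linspace)
```

The same applies to y_coords, which is why the interpolated array ends up smaller than the extent it is drawn on.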
Let's create your artificial input:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate  # used by the interpolation snippets below
x = np.arange(0, 11)
y = np.arange(0, 11)
X, Y = np.meshgrid(x, y)
Z = np.ones(X.shape)
Z[5,5] = 9
Depending on how you want to proceed, you can simply let imshow smooth your signal by interpolation:
fig, axe = plt.subplots()
axe.imshow(Z, origin="lower", cmap="jet", interpolation='bicubic')
And you are done, simple and efficient!
If you aim to do it by yourself, then choose the interpolant that suits you best and resample on a grid with a higher resolution:
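# Note: interp2d is deprecated and has been removed from recent SciPy releases;
# on such versions, pick another interpolant from scipy.interpolate instead.
interpolant = interpolate.interp2d(x, y, Z.ravel(), kind="linear")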
xlin = np.linspace(0, 10, 101)
ylin = np.linspace(0, 10, 101)
zhat = interpolant(xlin, ylin)
fig, axe = plt.subplots()
axe.imshow(zhat, origin="lower", cmap="jet")
Have a deeper look at the scipy.interpolate module to pick the best interpolant for your needs. Note that not all methods expose the same interface for input parameters. You may need to reshape your data to use other objects.
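As an illustration of how the interfaces differ, here is a minimal sketch of the same linear resampling with scipy.interpolate.RegularGridInterpolator, which may be useful where interp2d is unavailable. It is one possible substitute and not part of the original answer. It takes the grid axes as a tuple and the query points as an (n, 2) array; the names interp, Ylin and Xlin are illustrative.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

x = np.arange(0, 11)
y = np.arange(0, 11)
Z = np.ones((y.size, x.size))
Z[5, 5] = 9  # rows index y, columns index x

# The axes are given in the same order as the array dimensions: (y, x)
interp = RegularGridInterpolator((y, x), Z, method="linear")

# Resample on a grid ten times finer than the input
xlin = np.linspace(0, 10, 101)
ylin = np.linspace(0, 10, 101)
Ylin, Xlin = np.meshgrid(ylin, xlin, indexing="ij")
points = np.column_stack([Ylin.ravel(), Xlin.ravel()])
zhat = interp(points).reshape(Ylin.shape)
```

zhat can then be passed to imshow(zhat, origin="lower", cmap="jet") exactly like the interp2d result.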
MCVE
Here is a complete example using the trial data generated above. Just bind it to your excel columns:
# Flatten trial data to meet your requirement:
x = X.ravel()
y = Y.ravel()
z = Z.ravel()
# Resampling on a square grid with the given resolution:
resolution = 11
xlin = np.linspace(x.min(), x.max(), resolution)
ylin = np.linspace(y.min(), y.max(), resolution)
Xlin, Ylin = np.meshgrid(xlin, ylin)
# Nearest-neighbour multi-dimensional interpolation:
interpolant = interpolate.NearestNDInterpolator(list(zip(x, y)), z)
Zhat = interpolant(Xlin.ravel(), Ylin.ravel()).reshape(Xlin.shape)
# Render and interpolate again if necessary:
fig, axe = plt.subplots()
axe.imshow(Zhat, origin="lower", cmap="jet", interpolation='bicubic')
Which renders as expected:
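If the Excel columns already lie on a regular grid, as in the question, the 2D array can also be built without any interpolation. This is a minimal sketch, assuming columns named x, y and z; the DataFrame is constructed in memory here as a stand-in for pd.read_excel('test.xlsx').

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('test.xlsx'): same x, y, z columns as the question
xs, ys = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
df = pd.DataFrame({"x": xs.ravel(), "y": ys.ravel(), "z": 1})
df.loc[(df["x"] == 5) & (df["y"] == 5), "z"] = 9

# Rows are y values and columns are x values, the layout imshow expects
grid = df.pivot(index="y", columns="x", values="z").to_numpy()
print(grid.shape)
```

The resulting grid can be passed to imshow(grid, origin="lower", cmap="jet"), optionally with interpolation='bicubic' for smoothing.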

Related

How to convert a matrix to heatmap image in torch [duplicate]

Using Matplotlib, I want to plot a 2D heat map. My data is an n-by-n Numpy array, each with a value between 0 and 1. So for the (i, j) element of this array, I want to plot a square at the (i, j) coordinate in my heat map, whose color is proportional to the element's value in the array.
How can I do this?
The imshow() function with parameters interpolation='nearest' and cmap='hot' should do what you want.
Please review the interpolation parameter details, and see Interpolations for imshow and Image antialiasing.
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
Seaborn is a high-level API for matplotlib, which takes care of a lot of the manual work.
seaborn.heatmap automatically plots a gradient at the side of the chart etc.
import numpy as np
import seaborn as sns
import matplotlib.pylab as plt
uniform_data = np.random.rand(10, 12)
ax = sns.heatmap(uniform_data, linewidth=0.5)
plt.show()
You can even plot upper / lower left / right triangles of square matrices. For example, a correlation matrix, which is square and is symmetric, so plotting all values would be redundant.
corr = np.corrcoef(np.random.randn(10, 200))
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True, cmap="YlGnBu")
plt.show()
I would use matplotlib's pcolor/pcolormesh function since it allows nonuniform spacing of the data.
Example taken from matplotlib:
import matplotlib.pyplot as plt
import numpy as np
# generate 2 2d grids for the x & y bounds
y, x = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
# x and y are bounds, so z should be the value *inside* those bounds.
# Therefore, remove the last value from the z array.
z = z[:-1, :-1]
z_min, z_max = -np.abs(z).max(), np.abs(z).max()
fig, ax = plt.subplots()
c = ax.pcolormesh(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max)
ax.set_title('pcolormesh')
# set the limits of the plot to the limits of the data
ax.axis([x.min(), x.max(), y.min(), y.max()])
fig.colorbar(c, ax=ax)
plt.show()
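The comments in the example treat x and y as cell bounds, which is why one row and one column of z are dropped. When the coordinates are cell centers instead, the bounds can be derived from them. This is a minimal sketch with a hypothetical helper, centers_to_edges, which assumes at least two centers and extrapolates the outermost edges by half a cell.

```python
import numpy as np

def centers_to_edges(centers):
    """Return len(centers) + 1 cell edges for a 1-D array of cell centers."""
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[:-1] + centers[1:])      # edges between neighbouring centers
    first = centers[0] - (mid[0] - centers[0])    # extrapolate half a cell outwards
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])

x_centers = np.array([0.0, 1.0, 2.0, 4.0])  # nonuniform spacing is fine
x_edges = centers_to_edges(x_centers)
print(x_edges)
```

With edges computed this way for both axes, z keeps its full shape: an array of shape (len(y_centers), len(x_centers)) matches edge arrays that are one element longer in each direction.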
For a 2D NumPy array, simply using imshow() may help you:
import matplotlib.pyplot as plt
import numpy as np
def heatmap2d(arr: np.ndarray):
    plt.imshow(arr, cmap='viridis')
    plt.colorbar()
    plt.show()
test_array = np.arange(100 * 100).reshape(100, 100)
heatmap2d(test_array)
This code produces a continuous heatmap.
You can choose another built-in colormap from here.
Here's how to do it from a csv:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Load the space-delimited x, y, z columns
dat = np.genfromtxt('dat.xyz', delimiter=' ', skip_header=0)
# genfromtxt already returns a NumPy array, so the columns can be sliced directly
X = dat[:, 0]
Y = dat[:, 1]
Z = dat[:, 2]
# create x-y points to be used in heatmap
xi = np.linspace(X.min(), X.max(), 1000)
yi = np.linspace(Y.min(), Y.max(), 1000)
# Interpolate for plotting
zi = griddata((X, Y), Z, (xi[None,:], yi[:,None]), method='cubic')
# I control the range of my colorbar by removing data
# outside of my range of interest
zmin = 3
zmax = 12
zi[(zi < zmin) | (zi > zmax)] = np.nan
# Create the contour plot
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.rainbow,
vmax=zmax, vmin=zmin)
plt.colorbar()
plt.show()
where dat.xyz is in the form
x1 y1 z1
x2 y2 z2
...
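Overwriting out-of-range values modifies the interpolated array in place. A masked array is one non-destructive alternative. This is a minimal sketch on a toy array standing in for the griddata output, with zmin and zmax taken from the answer above.

```python
import numpy as np

zmin, zmax = 3, 12
# Toy stand-in for the interpolated grid, including a NaN as griddata may produce
zi = np.array([[1.0, 5.0, np.nan],
               [13.0, 3.0, 12.0]])

# Comparisons with NaN evaluate to False, so NaN entries end up masked as well
inside = (zi >= zmin) & (zi <= zmax)
zi_masked = np.ma.masked_where(~inside, zi)  # zi itself is left untouched

print(zi_masked)
```

The masked array can be passed to contourf in place of zi; masked cells are left blank.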
Use matshow() which is a wrapper around imshow to set useful defaults for displaying a matrix.
a = np.diag(range(15))
plt.matshow(a)
https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.matshow.html
This is just a convenience function wrapping imshow to set useful defaults for displaying a matrix. In particular:
Set origin='upper'.
Set interpolation='nearest'.
Set aspect='equal'.
Ticks are placed to the left and above.
Ticks are formatted to show integer indices.
Here is a new python package to plot complex heatmaps with different kinds of row/columns annotations in Python: https://github.com/DingWB/PyComplexHeatmap

Update interactive plot in Jupyter Notebook [duplicate]

I am trying to animate a pcolormesh in matplotlib. I have seen many of the examples using the package animation, most of them using a 1D plot routine, and some of them with imshow().
First, I want to use the FuncAnimation routine. My first problem is that I do not know whether I can initialize the plot with
fig,ax = plt.subplots()
quad = ax.pcolormesh(X,Y,Z)
I have tried a few simple lines:
fig,ax = plt.subplots()
quad = ax.pcolormesh([])
def init():
    quad.set_array([])
    return quad,
def animate(ktime):
    quad.set_array(X,Y,np.sin(Z)+ktime)
    return quad,
anim = animation.FuncAnimation(fig,animate,init_func=init,frames=Ntime,interval=200,blit=True)
plt.show()
By the way, how do I set labels in an animated plot? Can I animate the title if it shows a number that changes in time?
Thanks
The problem was that I was using the set_array() routine incorrectly. It is very important to note that you must pass a 1D array to this routine. Since pcolor, pcolormesh and similar functions usually plot multidimensional arrays, you should use .ravel().
One more important thing: in order to animate different plots at the same time, the blit option of animation.FuncAnimation must be False (see the section "Animating selected plot elements" of this link).
Here is the code of a simple program with various subplots:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.gridspec as gridspec
import matplotlib.animation as animation
y, x = np.meshgrid(np.linspace(-10, 10,100), np.linspace(-10, 10,100))
z = np.sin(x)*np.sin(x)+np.sin(y)*np.sin(y)
v = np.linspace(-10, 10,100)
t = np.sin(v)*np.sin(v)
tt = np.cos(v)*np.cos(v)
###########
fig = plt.figure(figsize=(16, 8),facecolor='white')
gs = gridspec.GridSpec(5, 2)
ax1 = plt.subplot(gs[0,0])
line, = ax1.plot([],[],'b-.',linewidth=2)
ax1.set_xlim(-10,10)
ax1.set_ylim(0,1)
ax1.set_xlabel('time')
ax1.set_ylabel('amplitude')
ax1.set_title('Oscillationsssss')
time_text = ax1.text(0.02, 0.95, '', transform=ax1.transAxes)
#############################
ax2 = plt.subplot(gs[1:3,0])
quad1 = ax2.pcolormesh(x,y,z,shading='gouraud')
ax2.set_xlabel('time')
ax2.set_ylabel('amplitude')
cb2 = fig.colorbar(quad1,ax=ax2)
#########################
ax3 = plt.subplot(gs[3:,0])
quad2 = ax3.pcolormesh(x, y, z,shading='gouraud')
ax3.set_xlabel('time')
ax3.set_ylabel('amplitude')
cb3 = fig.colorbar(quad2,ax=ax3)
############################
ax4 = plt.subplot(gs[:,1])
line2, = ax4.plot(v,tt,'b',linewidth=2)
ax4.set_xlim(-10,10)
ax4.set_ylim(0,1)
def init():
    line.set_data([],[])
    line2.set_data([],[])
    quad1.set_array([])
    return line,line2,quad1
def animate(iter):
    t = np.sin(2*v-iter/(2*np.pi))*np.sin(2*v-iter/(2*np.pi))
    tt = np.cos(2*v-iter/(2*np.pi))*np.cos(2*v-iter/(2*np.pi))
    z = np.sin(x-iter/(2*np.pi))*np.sin(x-iter/(2*np.pi))+np.sin(y)*np.sin(y)
    line.set_data(v,t)
    quad1.set_array(z.ravel())
    line2.set_data(v,tt)
    return line,line2,quad1
gs.tight_layout(fig)
anim = animation.FuncAnimation(fig,animate,frames=100,interval=50,blit=False,repeat=False)
plt.show()
print('Finished!!')
There is an ugly detail you need to take care of when using QuadMesh.set_array(). If you instantiate your QuadMesh with X, Y and C, you can update the values C by using set_array(). But set_array() does not support the same input as the constructor. Reading the source reveals that you need to pass a 1D array and, what is even more puzzling, that depending on the shading setting you might need to cut off part of your array C.
Edit: There is even a very old bug report about the confusing array size for shading='flat'.
That means:
Using QuadMesh.set_array() with shading = 'flat'
'flat' is default value for shading.
# preparation
import numpy as np
import matplotlib.pyplot as plt
plt.ion()
y = np.linspace(-10, 10, num=1000)
x = np.linspace(-10, 10, num=1000)
X, Y = np.meshgrid(x, y)
C = np.ones((1000, 1000)) * float('nan')
# instantiate empty plot (values = nan)
pcmesh = plt.pcolormesh(X, Y, C, vmin=-100, vmax=100, shading='flat')
# generate some new data
C = X * Y
# necessary for shading='flat'
C = C[:-1, :-1]
# ravel() converts C to a 1d-array
pcmesh.set_array(C.ravel())
# redraw to update plot with new data
plt.draw()
Looks like:
Note that if you omit C = C[:-1, :-1] you will get this broken graphic:
Using QuadMesh.set_array() with shading = 'gouraud'
# preparation (same as for 'flat')
import numpy as np
import matplotlib.pyplot as plt
plt.ion()
y = np.linspace(-10, 10, num=1000)
x = np.linspace(-10, 10, num=1000)
X, Y = np.meshgrid(x, y)
C = np.ones((1000, 1000)) * float('nan')
# instantiate empty plot (values = nan)
pcmesh = plt.pcolormesh(X, Y, C, vmin=-100, vmax=100, shading='gouraud')
# generate some new data
C = X * Y
# here the last row/column is not cut off!
# ravel() converts C to a 1d-array
pcmesh.set_array(C.ravel())
# redraw to update plot with new data
plt.draw()
If you cut off the last row/column with shading='gouraud' you will get:
ValueError: total size of new array must be unchanged
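The two cases can be wrapped in one small helper. This is a minimal sketch of the rule described in this answer, using a hypothetical function values_for_set_array and assuming C has the same shape as X and Y. Newer Matplotlib releases may be more permissive about the shapes set_array() accepts, so treat this as an illustration of the answer rather than of the current API.

```python
import numpy as np

def values_for_set_array(C, shading="flat"):
    """Flatten C as described above for QuadMesh.set_array()."""
    C = np.asarray(C)
    if shading == "flat":
        # one value per cell: drop the last row and column
        C = C[:-1, :-1]
    elif shading != "gouraud":
        raise ValueError(f"unsupported shading: {shading!r}")
    # 'gouraud' uses one value per grid point, so nothing is cut off
    return C.ravel()

C = np.arange(20.0).reshape(4, 5)
print(values_for_set_array(C, "flat").size)     # (4-1) * (5-1) values
print(values_for_set_array(C, "gouraud").size)  # 4 * 5 values
```

Usage would then be pcmesh.set_array(values_for_set_array(C, shading)) with the same shading value that was passed to pcolormesh.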
I am not sure why your quad = ax.pcolormesh(X,Y,Z) function is giving an error. Can you post the error?
Below is what I would do to create a simple animation using pcolormesh:
import matplotlib.pyplot as plt
import numpy as np
y, x = np.meshgrid(np.linspace(-3, 3,100), np.linspace(-3, 3,100))
z = np.sin(x**2+y**2)
z = z[:-1, :-1]
ax = plt.subplot(111)
quad = plt.pcolormesh(x, y, z)
plt.colorbar()
plt.ion()
plt.show()
for phase in np.linspace(0,10*np.pi,200):
    z = np.sin(np.sqrt(x**2+y**2) + phase)
    z = z[:-1, :-1]
    quad.set_array(z.ravel())
    plt.title('Phase: %.2f'%phase)
    plt.draw()
plt.ioff()
plt.show()
One of the frames:
Does this help? If not, maybe you can clarify the question.
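The same idea can be expressed with FuncAnimation, which also covers the question about a title that changes over time. This is a minimal sketch, assuming shading='gouraud' so that the data array keeps the full grid shape; the grid size, phase step and frame count are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import animation

y, x = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
r = np.sqrt(x**2 + y**2)

fig, ax = plt.subplots()
# Fix vmin/vmax so the color scale does not change between frames
quad = ax.pcolormesh(x, y, np.sin(r), shading="gouraud", vmin=-1, vmax=1)
fig.colorbar(quad, ax=ax)

def animate(frame):
    phase = 0.1 * frame
    # set_array() receives the new values as a flattened 1D array
    quad.set_array(np.sin(r + phase).ravel())
    ax.set_title(f"Phase: {phase:.2f}")
    return quad,

# blit=False so that the title, which lives outside the mesh, is redrawn too
anim = animation.FuncAnimation(fig, animate, frames=100, interval=50, blit=False)
```

Call plt.show() to view the animation, or anim.save(...) to write it to a file.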
Another answer presented here looks simpler and therefore better (IMHO).
Here is a copy and paste of the alternative solution, lightly adapted (X, Y and C are the coordinate and data arrays defined elsewhere):
import matplotlib.pylab as plt
from matplotlib import animation
# plt.hold() was removed in Matplotlib 3.0; axes now always keep existing artists
def animate(i):
    plt.title('Ray: %.2f'%i)
    # This is where new data is inserted into the plot.
    plt.pcolormesh(X[i-2:i], Y[i-2:i], C[i-2:i])
fig = plt.figure()
# We need to prime the pump, so to speak, and create a quadmesh for plt to work with
plt.pcolormesh(X[0:1], Y[0:1], C[0:1])
anim = animation.FuncAnimation(fig, animate, frames = range(2,155), blit = False)
plt.show()

Plotting scatter density plots in python [duplicate]

I'd like to make a scatter plot where each point is colored by the spatial density of nearby points.
I've come across a very similar question, which shows an example of this using R:
R Scatter Plot: symbol color represents number of overlapping points
What's the best way to accomplish something similar in python using matplotlib?
In addition to hist2d or hexbin as @askewchan suggested, you can use the same method that the accepted answer in the question you linked to uses.
If you want to do that:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100)
plt.show()
If you'd like the points to be plotted in order of density so that the densest points are always on top (similar to the linked example), just sort them by the z-values. I'm also going to use a smaller marker size here as it looks a bit better:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50)
plt.show()
Plotting >100k data points?
The accepted answer, using gaussian_kde(), will take a lot of time. On my machine, 100k rows took about 11 minutes. Here I will add two alternative methods (mpl-scatter-density and datashader) and compare the given answers with the same dataset.
In the following, I used a test data set of 100k rows:
import matplotlib.pyplot as plt
import numpy as np
# Fake data for testing
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
Output & computation time comparison
Below is a comparison of different methods.
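Timings like the ones quoted below depend on the machine and on what exactly is measured. This is a minimal sketch of a timing harness with a hypothetical helper, time_call. It is not the code used to produce the numbers in this answer, and it times only the binning step with np.histogram2d, not the drawing.

```python
import time
import numpy as np

def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Same kind of fake data as above, with a fixed seed for repeatability
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x * 3 + rng.normal(size=100_000)

(counts, _, _), elapsed = time_call(np.histogram2d, x, y, bins=(50, 50))
print(f"histogram2d on {x.size} points: {elapsed:.3f} s")
```

Any of the plotting functions defined below can be passed to time_call in the same way.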
1: mpl-scatter-density
Installation
pip install mpl-scatter-density
Example code
import mpl_scatter_density # adds projection='scatter_density'
from matplotlib.colors import LinearSegmentedColormap
# "Viridis-like" colormap with white background
white_viridis = LinearSegmentedColormap.from_list('white_viridis', [
(0, '#ffffff'),
(1e-20, '#440053'),
(0.2, '#404388'),
(0.4, '#2a788e'),
(0.6, '#21a784'),
(0.8, '#78d151'),
(1, '#fde624'),
], N=256)
def using_mpl_scatter_density(fig, x, y):
    ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
    density = ax.scatter_density(x, y, cmap=white_viridis)
    fig.colorbar(density, label='Number of points per pixel')
fig = plt.figure()
using_mpl_scatter_density(fig, x, y)
plt.show()
Drawing this took 0.05 seconds:
And the zoom-in looks quite nice:
2: datashader
Datashader is an interesting project. It has added support for matplotlib in datashader 0.12.
Installation
pip install datashader
Code (source & parameter listing for dsshow):
import datashader as ds
from datashader.mpl_ext import dsshow
import pandas as pd
def using_datashader(ax, x, y):
    df = pd.DataFrame(dict(x=x, y=y))
    dsartist = dsshow(
        df,
        ds.Point("x", "y"),
        ds.count(),
        vmin=0,
        vmax=35,
        norm="linear",
        aspect="auto",
        ax=ax,
    )
    plt.colorbar(dsartist)
fig, ax = plt.subplots()
using_datashader(ax, x, y)
plt.show()
It took 0.83 s to draw this:
It is also possible to colorize by a third variable. The third parameter of dsshow controls the coloring. See more examples here and the source for dsshow here.
3: scatter_with_gaussian_kde
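from scipy.stats import gaussian_kde
def scatter_with_gaussian_kde(ax, x, y):
    # https://stackoverflow.com/a/20107592/3015186
    # Answer by Joel Kington
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    # 'none' draws no marker edges; an empty string is not a valid color
    ax.scatter(x, y, c=z, s=100, edgecolor='none')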
It took 11 minutes to draw this:
4: using_hist2d
import matplotlib.pyplot as plt
def using_hist2d(ax, x, y, bins=(50, 50)):
    # https://stackoverflow.com/a/20105673/3015186
    # Answer by askewchan
    ax.hist2d(x, y, bins, cmap=plt.cm.jet)
It took 0.021 s to draw this with bins=(50,50):
It took 0.173 s to draw this with bins=(1000,1000):
Cons: The zoomed-in data does not look as good as with mpl-scatter-density or datashader. Also, you have to determine the number of bins yourself.
5: density_scatter
The code is as in the answer by Guillaume.
It took 0.073 s to draw this with bins=(50,50):
It took 0.368 s to draw this with bins=(1000,1000):
Also, if the number of points makes the KDE calculation too slow, the color can be interpolated from np.histogram2d [Update in response to comments: if you wish to show the colorbar, use plt.scatter() instead of ax.scatter(), followed by plt.colorbar()]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize
from scipy.interpolate import interpn
def density_scatter( x , y, ax = None, sort = True, bins = 20, **kwargs ) :
    """
    Scatter plot colored by 2d histogram
    """
    if ax is None :
        fig , ax = plt.subplots()
    else :
        # fig is needed for the colorbar below when an Axes is passed in
        fig = ax.get_figure()
    data , x_e, y_e = np.histogram2d( x, y, bins = bins, density = True )
    z = interpn( ( 0.5*(x_e[1:] + x_e[:-1]) , 0.5*(y_e[1:]+y_e[:-1]) ) , data , np.vstack([x,y]).T , method = "splinef2d", bounds_error = False)
    #To be sure to plot all data
    z[np.where(np.isnan(z))] = 0.0
    # Sort the points by density, so that the densest points are plotted last
    if sort :
        idx = z.argsort()
        x, y, z = x[idx], y[idx], z[idx]
    ax.scatter( x, y, c=z, **kwargs )
    norm = Normalize(vmin = np.min(z), vmax = np.max(z))
    cbar = fig.colorbar(cm.ScalarMappable(norm = norm), ax=ax)
    cbar.ax.set_ylabel('Density')
    return ax
if "__main__" == __name__ :
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    density_scatter( x, y, bins = [30,30] )
You could make a histogram:
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()
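The histogram can also be used to color individual points, by looking up the count of the bin each point falls into. This is a minimal sketch with a hypothetical helper, bin_density. It is a coarser alternative to KDE and to interpolating the histogram, and its result depends on the chosen number of bins.

```python
import numpy as np

def bin_density(x, y, bins=50):
    """Return, for each point, the number of points in its 2D histogram bin."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # np.digitize returns 1-based bin indices; clip so that points lying on
    # the last edge are assigned to the last bin, as histogram2d does
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a * 3 + rng.normal(size=1000)
density = bin_density(a, b, bins=50)
```

The result can be passed as c=density to plt.scatter(a, b, ...), optionally after sorting the points by density as in the KDE answer above.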

Python: How to revolve a surface around z axis and make a 3d plot?

I want to get 2D and 3D plots as shown below.
The equation of the curve is given.
How can we do this in Python?
I know there may be duplicates, but at the time of posting
I could not find any useful posts.
My initial attempt is like this:
# Imports
import numpy as np
import matplotlib.pyplot as plt
# to plot the surface rho = b*cosh(z/b) with rho^2 = r^2 + b^2
z = np.arange(-3, 3, 0.01)
rho = np.cosh(z) # take constant b = 1
plt.plot(rho,z)
plt.show()
Some related links are following:
Rotate around z-axis only in plotly
The 3d-plot should look like this:
OK, so I think you are really asking how to revolve a 2D curve around an axis to create a surface. I come from a CAD background, so that is how I explain things,
and I am not the greatest at math, so forgive any clunky terminology. Unfortunately, you have to do the rest of the math to get all the points for the mesh.
Here's your code:
#import for 3d
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
Change arange to linspace, which captures the endpoint; otherwise arange will be missing the 3.0 at the end of the array:
z = np.linspace(-3, 3, 600)
rho = np.cosh(z) # take constant b = 1
Since rho is your radius at every z height, we need to calculate x, y points around that radius. Before that, we have to figure out at what positions on that radius to get x, y coordinates:
#steps around circle from 0 to 2*pi(360degrees)
#reshape at the end is to be able to use np.dot properly
revolve_steps = np.linspace(0, np.pi*2, 600).reshape(1,600)
The trig way of getting points around a circle is:
x = r*cos(theta)
y = r*sin(theta)
For you, r is your rho and theta is revolve_steps.
By using np.dot to do matrix multiplication, you get a 2D array back where the rows of x's and y's correspond to the z's:
theta = revolve_steps
#convert rho to a column vector
rho_column = rho.reshape(600,1)
x = rho_column.dot(np.cos(theta))
y = rho_column.dot(np.sin(theta))
# expand z into a 2d array that matches dimensions of x and y arrays..
# i used np.meshgrid
zs, rs = np.meshgrid(z, rho)
#plotting
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
fig.tight_layout(pad = 0.0)
#transpose zs or you get a helix not a revolve.
# you could add rstride = int or cstride = int kwargs to control the mesh density
ax.plot_surface(x, y, zs.T, color = 'white', shade = False)
#view orientation
ax.elev = 30 #30 degrees for a typical isometric view
ax.azim = 30
#turn off the axes to closely mimic picture in original question
ax.set_axis_off()
plt.show()
# P.S. a 600x600 mesh takes a bit of time to render
I am not sure whether it has been fixed in the latest version of matplotlib, but setting the aspect ratio of 3D plots with:
ax.set_aspect('equal')
has not worked very well. You can find solutions in this Stack Overflow question.
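The revolve step itself is an outer product of the radius profile with cos(theta) and sin(theta). This is a minimal sketch of the same construction with np.outer on a smaller mesh; the sizes 60 and 90 are arbitrary choices. It also checks that every point of the surface sits at distance rho from the z axis.

```python
import numpy as np

z = np.linspace(-3, 3, 60)
rho = np.cosh(z)                       # radius at each height, with b = 1
theta = np.linspace(0, 2 * np.pi, 90)  # angles around the z axis

# Row i holds the circle of radius rho[i] at height z[i]
x = np.outer(rho, np.cos(theta))
y = np.outer(rho, np.sin(theta))
zs = np.tile(z[:, None], (1, theta.size))  # constant height along each row

# Sanity check: the distance from the z axis equals rho on every row
print(np.allclose(np.hypot(x, y), rho[:, None]))
```

The arrays x, y and zs all have shape (60, 90) and can be passed directly to ax.plot_surface(x, y, zs) on a 3D axes, without the transpose needed in the meshgrid version.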
Simply revolve around a different axis, in this case x:
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
np.seterr(divide='ignore', invalid='ignore')
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.linspace(-3, 3, 60)
rho = np.cosh(x)
v = np.linspace(0, 2*np.pi, 60)
X, V = np.meshgrid(x, v)
Y = np.cosh(X) * np.cos(V)
Z = np.cosh(X) * np.sin(V)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
ax.plot_surface(X, Y, Z, cmap='YlGnBu_r')
plt.plot(x, rho, 'or') # Show the curve that is going to be revolved
plt.show()
The result:
