Matplotlib 2D histogram seems transposed - python

I have the following code to plot a 2d histogram in pyplot:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
MIN, MAX, num = .001, 5000, 500
minn=1
maxx=1000
zbins = 10 ** np.linspace(np.log10(MIN), np.log10(MAX), num)
x=np.linspace(100,600,50000)
y=np.linspace(0,500,50000)
fig1 = plt.figure(1)
counts1,xedges1,edges1,d=plt.hist2d(x,y,bins=zbins)
mesh1 = plt.pcolormesh(zbins, zbins, counts1)
plt.xlim([minn, maxx])
plt.ylim([minn, maxx])
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.colorbar()
plt.show()
Apologies for my horrible variable naming!
Anyways, when I plot this, the histogram seems to have the x and y axes switched. I checked the matplotlib 2d hist documentation and I was sure that I had the x and y arguments in the right order, but I cannot for the life of me figure out where I'm going wrong. Any help would be greatly appreciated!

The confusion comes from the fact that the returned counts array is not what you think it is.
plt.hist2d internally uses numpy.histogram2d to compute the two-dimensional histogram. The documentation states as return values:
H : ndarray, shape(nx, ny)
The bi-dimensional histogram of samples x and y. Values in x are histogrammed along the first dimension and values in y are histogrammed along the second dimension.
xedges : ndarray, shape(nx,)
The bin edges along the first dimension.
yedges : ndarray, shape(ny,)
The bin edges along the second dimension.
Apart from the fact that the documented shapes of the edge arrays seem to be off (the edge arrays actually have nx+1 and ny+1 elements, one more than the number of bins), we see that the first dimension of the returned histogram array is x and the second is y.
However, matplotlib expects y to be the first dimension (the rows) of a 2D array. Therefore, while plt.hist2d produces the correct plot, plt.pcolormesh needs a transposed version of the array.
plt.pcolormesh(X,Y, counts.T)
A full example, comparing plt.hist2d and plt.pcolormesh:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1,10,10)
y=np.linspace(6,9,10)
zbinsx= np.linspace(0,10,11)
zbinsy= np.linspace(5,10,6)
fig, (ax, ax2) = plt.subplots(ncols=2)
counts,xedges,yedges,d = ax.hist2d(x,y, bins=[zbinsx,zbinsy])
# counts has shape (10, 5)
X,Y = np.meshgrid(xedges,yedges)
mesh1 =ax2.pcolormesh(X,Y, counts.T)
plt.show()
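The shapes involved can also be checked without plotting anything. The following is a minimal sketch that calls numpy.histogram2d directly with the same bins as the example above; the names xbins and ybins are just illustrative.

```python
import numpy as np

x = np.linspace(1, 10, 10)
y = np.linspace(6, 9, 10)
xbins = np.linspace(0, 10, 11)   # 11 edges -> 10 bins along x
ybins = np.linspace(5, 10, 6)    # 6 edges  -> 5 bins along y

H, xedges, yedges = np.histogram2d(x, y, bins=[xbins, ybins])

print(H.shape)                     # (10, 5): x along axis 0, y along axis 1
print(xedges.shape, yedges.shape)  # (11,) (6,): one more edge than bins
print(H.T.shape)                   # (5, 10): rows are y, as pcolormesh expects
```

With X, Y = np.meshgrid(xedges, yedges) the coordinate arrays have shape (6, 11), so the (5, 10) transposed array is the one that fits between the edges.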

Related

Plot matrix of weighted cells in grid with Matplotlib

I have a square matrix built from an array of random integers, defined below:
import numpy as np
dim_low, dim_high = 0, 20 #array of random integers' dimensions
matrix = np.random.random_integers(low = dim_low,high = dim_high, size=(dim_high,dim_high))
print(matrix) #the matrix of defined with repetitions of the array.
Resulted matrix in the picture:
https://i.stack.imgur.com/eEcCh.png
What could I do to plot the generated matrix in a grid with Matplotlib, so that the value of each cell (the weight) is printed in the center of the cell and there's a scale from 0 to 20 on the x and y axes, as in the picture below (notice that 'x' and 'o' are text in the example; what I need is the weights, in integer form, not text form):
https://i.stack.imgur.com/9mBuG.png (here)
I pulled most of this from this post.
import numpy as np
import matplotlib.pyplot as plt
low_dim = 0
high_dim = 20
matrix = np.random.randint(low_dim, high_dim, (high_dim,high_dim))
fig, ax = plt.subplots()
for i in range(0, high_dim):
    for j in range(0, high_dim):
        val = matrix[i,j]
        ax.text(i+0.5, j+0.5, str(val), va='center', ha='center')
ax.set_xlim(low_dim, high_dim)
ax.set_ylim(low_dim, high_dim)
ax.set_xticks(np.arange(high_dim))
ax.set_yticks(np.arange(high_dim))
ax.grid()
plt.show()
A good module for this would be seaborn. It has all the functionality you ask for and more.
Try using https://seaborn.pydata.org/generated/seaborn.heatmap.html. I won't take you through the different options because they're really well documented.
Good luck!
BTW, you'll want to use a pandas pivot table for comfortable compatibility.

Matplotlib 3D Waterfall Plot with Colored Heights

I'm trying to visualise a dataset in 3D which consists of a time series (along y) of x-z data, using Python and Matplotlib.
I'd like to create a plot like the one below (which was made in Python: http://austringer.net/wp/index.php/2011/05/20/plotting-a-dolphin-biosonar-click-train/), but where the colour varies with Z - i.e. so the intensity is shown by a colormap as well as the peak height, for clarity.
An example showing the colormap in Z is (apparently made using MATLAB):
This effect can be created using the waterfall plot option in MATLAB, but I understand there is no direct equivalent of this in Python.
I have also tried using the plot_surface option in Python (below), which works ok, but I'd like to 'force' the lines running over the surface to only be in the x direction (i.e. making it look more like a stacked time series than a surface). Is this possible?
Any help or advice greatly welcomed. Thanks.
I have written a function that replicates the MATLAB waterfall behaviour in matplotlib, but I don't think it is the best solution when it comes to performance.
I started from two examples in the matplotlib documentation: multicolor lines and multiple lines in a 3D plot. From these examples, the only way I found to draw a line whose color follows a colormap according to its z value is the one used in the multicolor example: reshape the input array so that the line is drawn as segments of 2 points, and set the color of each segment to the mean z value of its 2 points.
Thus, given n-by-m input matrices X, Y and Z, the function loops over the smaller of the two dimensions n and m and plots each line as in the example, as 2-point segments; the reshaping into segments uses the same code as the example.
def waterfall_plot(fig,ax,X,Y,Z):
    '''
    Make a waterfall plot
    Input:
        fig,ax : matplotlib figure and axes to populate
        Z : n,m numpy array. Must be a 2d array even if only one line should be plotted
        X,Y : n,m array
    '''
    # Set normalization to the same values for all plots
    norm = plt.Normalize(Z.min().min(), Z.max().max())
    # Check sizes to loop always over the smallest dimension
    n,m = Z.shape
    if n>m:
        X=X.T; Y=Y.T; Z=Z.T
        m,n = n,m
    for j in range(n):
        # reshape the X,Z into pairs
        points = np.array([X[j,:], Z[j,:]]).T.reshape(-1, 1, 2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        lc = LineCollection(segments, cmap='plasma', norm=norm)
        # Set the values used for colormapping
        lc.set_array((Z[j,1:]+Z[j,:-1])/2)
        lc.set_linewidth(2) # set linewidth a little larger to see properly the colormap variation
        line = ax.add_collection3d(lc,zs=(Y[j,1:]+Y[j,:-1])/2, zdir='y') # add line to axes
    fig.colorbar(lc) # add colorbar; as the normalization is the same for all, it doesn't matter which of the lc objects we use
Therefore, plots looking like matlab waterfall can be easily generated with the same input matrixes as a matplotlib surface plot:
import numpy as np; import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from mpl_toolkits.mplot3d import Axes3D
# Generate data
x = np.linspace(-2,2, 500)
y = np.linspace(-2,2, 40)
X,Y = np.meshgrid(x,y)
Z = np.sin(X**2+Y**2)
# Generate waterfall plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
waterfall_plot(fig,ax,X,Y,Z)
ax.set_xlabel('X') ; ax.set_xlim3d(-2,2)
ax.set_ylabel('Y') ; ax.set_ylim3d(-2,2)
ax.set_zlabel('Z') ; ax.set_zlim3d(-1,1)
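To see what the reshaping step inside waterfall_plot produces, here is a small sketch on a made-up 4-point line (the arrays x and z are illustrative only). It uses the same two numpy expressions as the function and shows that n points become n-1 two-point segments, each with one color value.

```python
import numpy as np

# A single line with 4 points (toy values, not from the original data)
x = np.array([0.0, 1.0, 2.0, 3.0])
z = np.array([0.0, 2.0, 4.0, 0.0])

# Same reshaping as in waterfall_plot: (4, 2) -> (4, 1, 2)
points = np.array([x, z]).T.reshape(-1, 1, 2)
# Pair each point with its successor: shape (3, 2, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
# One value per segment for the colormap: the mean z of its two end points
segment_values = (z[1:] + z[:-1]) / 2

print(segments.shape)   # (3, 2, 2)
print(segments[0])      # first segment: from (0, 0) to (1, 2)
print(segment_values)   # [1. 3. 2.]
```

Each entry of segments is a pair of (x, z) end points, which is the layout LineCollection takes for its segments argument.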
The function assumes that when generating the meshgrid, the x array is the longest, so by default the lines have fixed y and it is the x coordinate that varies. However, if the y dimension is larger, the matrices are transposed, generating lines with fixed x. This is what happens when the meshgrid is generated with the sizes inverted (len(x)=40 and len(y)=500).
With a pandas DataFrame that has the x axis as the first column and each spectrum as another column:
offset=0
for c in s.columns[1:]:
    plt.plot(s.wavelength,s[c]+offset)
    offset+=.25
plt.xlim([1325,1375])

Color 2D Grid with values from separate 2D array

I have two arrays of data, x and y. I would like to plot on a scatter plot y vs. x. The range of x is [0,3] and the range of y is [-3, 3]. I then want to grid up this region into an n by m grid and color the points in each region based on the values of a separate 2D numpy array (same shape as the grid, n by m). So, the top-leftmost grid cell of my plot should be colored based on the value of colorarr[0][0] and so on. Anyone have any suggestions on how to do this? The closest I've found so far is the following:
2D grid data visualization in Python
Unfortunately this simply displays the colorarr, and not the 2D region I would like to visualize.
Thanks!
I think what you want is a 2 dimensional histogram. Matplotlib.pyplot makes this really easy.
import numpy as np
import matplotlib.pyplot as plt
# Make some points
npoints = 500
x = np.random.uniform(low=0, high=3, size=npoints)
y = np.random.uniform(low=-3, high=3, size=npoints)
# Make the plot
plt.hist2d(x, y)
plt.colorbar()
plt.show()
You can do it from just the color array by setting the extent and aspect keywords of imshow:
import matplotlib.pyplot as plt
import numpy as np
zval = np.random.rand(100, 100)
plt.imshow(zval, extent=[0,3,-3,3], aspect="auto")
plt.show()
What you get is the zval array just "crunched in" to the [0:3, -3:3] range. Plot just the zval array with imshow to convince yourself of this.
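If the goal is to color the scatter points themselves by the cell they fall in, as the question describes, one possible approach is to compute each point's row and column with np.digitize and index into the color array. This is only a sketch: the helper name cell_values and the 4 by 6 grid are made up for illustration, and it assumes colorarr has n rows along y and m columns along x, with colorarr[0][0] at the top left.

```python
import numpy as np

def cell_values(x, y, colorarr, xlim=(0, 3), ylim=(-3, 3)):
    """Return, for each point (x, y), the colorarr value of its grid cell.

    Assumes colorarr has shape (n, m): n rows along y, m columns along x,
    and that colorarr[0][0] is the top-left cell of the plotted region.
    """
    n, m = colorarr.shape
    xedges = np.linspace(xlim[0], xlim[1], m + 1)
    yedges = np.linspace(ylim[0], ylim[1], n + 1)
    # np.digitize returns 1-based bin numbers; clip so that points on (or
    # beyond) the outer edges land in the first or last cell
    col = np.clip(np.digitize(x, xedges) - 1, 0, m - 1)
    row_from_bottom = np.clip(np.digitize(y, yedges) - 1, 0, n - 1)
    # Row 0 is taken to be the top of the plot, so flip the row index
    row = n - 1 - row_from_bottom
    return colorarr[row, col]

rng = np.random.default_rng(0)
colorarr = rng.random((4, 6))       # stand-in for the separate 2D array
x = rng.uniform(0, 3, 500)
y = rng.uniform(-3, 3, 500)
c = cell_values(x, y, colorarr)     # one value per point
```

The resulting array can then be passed to the scatter plot, e.g. plt.scatter(x, y, c=c).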

bin 3d points into 3d bins in python

How can I bin 3d points into 3d bins? Is there a multi dimensional version for np.digitize?
I can use np.digitize separately for each dimension, like here. Is there a better solution?
Thanks!
You can do this with numpy.histogramdd(sample), where the number of bins in each direction and the physical range can be adjusted as with a 1D histogram. More info on the reference page. For more general statistics, like the mean of another variable per point in a bin, you can use SciPy's scipy.stats.binned_statistic_dd function, see docs.
For your case with an array of three dimensional points, you would use this in the following way,
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from scipy import stats
#Setup some dummy data
points = np.random.randn(1000,3)
hist, binedges = np.histogramdd(points, density=False)
#Setup a 3D figure and plot points as well as a series of slices
fig = plt.figure()
ax1 = fig.add_subplot(111, projection='3d')
ax1.plot(points[:,0],points[:,1],points[:,2],'k.',alpha=0.3)
#Use one less than bin edges to give rough bin location
X, Y = np.meshgrid(binedges[0][:-1],binedges[1][:-1])
#Loop over range of slice locations (default histogram uses 10 bins)
for ct in [0,2,5,7,9]:
    # hist is indexed [x, y, z]; transpose the slice to match meshgrid's (y, x) layout
    cs = ax1.contourf(X,Y,hist[:,:,ct].T,
                      zdir='z',
                      offset=binedges[2][ct],
                      levels=100,
                      cmap=plt.cm.RdYlBu_r,
                      alpha=0.5)
ax1.set_xlim(-3, 3)
ax1.set_ylim(-3, 3)
ax1.set_zlim(-3, 3)
plt.colorbar(cs)
plt.show()
which gives a series of histogram slices of occupancy at each location.
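If what you need is the bin index of every point (the multi-dimensional analogue of np.digitize) rather than the counts, one way is to reuse the edges returned by np.histogramdd and digitize each coordinate against them. The sketch below assumes the default of 10 bins per axis and checks that re-accumulating the indices reproduces the histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 3))

hist, edges = np.histogramdd(points, bins=10)

# Per-point bin index along each axis. Only the interior edges are passed,
# so indices run from 0 to 9 and the maximum value falls in the last bin,
# which is how histogramdd treats its closed right-most edge.
idx = np.stack(
    [np.digitize(points[:, d], edges[d][1:-1]) for d in range(3)],
    axis=1,
)

# Rebuild the counts from the indices as a consistency check
counts = np.zeros_like(hist)
np.add.at(counts, tuple(idx.T), 1)
print(np.array_equal(counts, hist))
```

Here idx[k] holds the (i, j, l) bin of points[k], so it can be used to group or look up any other per-point quantity.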

Waterfall plot python?

Is there a python module that will do a waterfall plot like MATLAB does? I googled 'numpy waterfall', 'scipy waterfall', and 'matplotlib waterfall', but did not find anything.
You can do a waterfall in matplotlib using the PolyCollection class. See this specific example to have more details on how to do a waterfall using this class.
Also, you might find this blog post useful, since the author shows that you might obtain some 'visual bug' in some specific situation (depending on the view angle chosen).
Below is an example of a waterfall made with matplotlib (image from the blog post):
(source: austringer.net)
Have a look at mplot3d:
# copied from
# http://matplotlib.sourceforge.net/mpl_examples/mplot3d/wire3d_demo.py
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
plt.show()
I don't know how to get results as nice as Matlab does.
If you want more, you may also have a look at MayaVi: http://mayavi.sourceforge.net/
The kind of waterfall chart described on Wikipedia can also be obtained like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def waterfall(series):
    df = pd.DataFrame({'pos':np.maximum(series,0),'neg':np.minimum(series,0)})
    blank = series.cumsum().shift(1).fillna(0)
    df.plot(kind='bar', stacked=True, bottom=blank, color=['r','b'])
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan
    plt.plot(step.index, step.values,'k')
test = pd.Series(-1 + 2 * np.random.rand(10), index=list('abcdefghij'))
waterfall(test)
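The key quantity in the function above is blank, the running total before each bar, which is used as the bottom of that bar. A small numpy-only sketch with made-up numbers shows the same arithmetic without pandas:

```python
import numpy as np

# Toy series of changes (illustrative values only)
changes = np.array([3.0, -1.0, 2.0, -4.0])

running_total = np.cumsum(changes)
# Each bar starts where the running total stood before it; the first starts at 0.
# This mirrors series.cumsum().shift(1).fillna(0) in the pandas version.
bottoms = np.concatenate(([0.0], running_total[:-1]))
tops = bottoms + changes

print(bottoms)  # [0. 3. 2. 4.]
print(tops)     # [3. 2. 4. 0.]
```

Each bar therefore ends exactly where the next one begins, which is what gives the chart its stepped look.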
I have written a function that replicates the MATLAB waterfall behaviour in matplotlib. That is:
It generates the 3D shape as many independent and parallel 2D curves
Its color comes from a colormap applied to the z values
I started from two examples in the matplotlib documentation: multicolor lines and multiple lines in a 3D plot. From these examples, the only way I found to draw a line whose color follows a colormap according to its z value is the one used in the multicolor example: reshape the input array so that the line is drawn as segments of 2 points, and set the color of each segment to the mean z value of its 2 points.
Thus, given n-by-m input matrices X, Y and Z, the function loops over the smaller of the two dimensions n and m and plots each independent line of the waterfall as a line collection of 2-point segments, as explained above.
def waterfall_plot(fig,ax,X,Y,Z,**kwargs):
    '''
    Make a waterfall plot
    Input:
        fig,ax : matplotlib figure and axes to populate
        Z : n,m numpy array. Must be a 2d array even if only one line should be plotted
        X,Y : n,m array
        kwargs : kwargs are directly passed to the LineCollection object
    '''
    # Set normalization to the same values for all plots
    norm = plt.Normalize(Z.min().min(), Z.max().max())
    # Check sizes to loop always over the smallest dimension
    n,m = Z.shape
    if n>m:
        X=X.T; Y=Y.T; Z=Z.T
        m,n = n,m
    for j in range(n):
        # reshape the X,Z into pairs
        points = np.array([X[j,:], Z[j,:]]).T.reshape(-1, 1, 2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        # The values used by the colormap are the input to the array parameter
        lc = LineCollection(segments, cmap='plasma', norm=norm, array=(Z[j,1:]+Z[j,:-1])/2, **kwargs)
        line = ax.add_collection3d(lc,zs=(Y[j,1:]+Y[j,:-1])/2, zdir='y') # add line to axes
    fig.colorbar(lc) # add colorbar; as the normalization is the same for all,
    # it doesn't matter which of the lc objects we use
    ax.auto_scale_xyz(X,Y,Z) # set axis limits
Therefore, plots looking like matlab waterfall can be easily generated with the same input matrixes as a matplotlib surface plot:
import numpy as np; import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from mpl_toolkits.mplot3d import Axes3D
# Generate data
x = np.linspace(-2,2, 500)
y = np.linspace(-2,2, 60)
X,Y = np.meshgrid(x,y)
Z = np.sin(X**2+Y**2)-.2*X
# Generate waterfall plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
waterfall_plot(fig,ax,X,Y,Z,linewidth=1.5,alpha=0.5)
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
fig.tight_layout()
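The size check at the top of waterfall_plot decides which coordinate stays fixed along each line. As a rough illustration, the sketch below repeats just that check on a meshgrid built with the sizes inverted (60 points in x, 500 in y) and inspects the resulting shapes; nothing is plotted.

```python
import numpy as np

# Meshgrid with the sizes inverted: short x, long y
x = np.linspace(-2, 2, 60)
y = np.linspace(-2, 2, 500)
X, Y = np.meshgrid(x, y)          # both have shape (500, 60)
Z = np.sin(X**2 + Y**2) - .2*X

# Same check as in waterfall_plot: always loop over the smaller dimension
n, m = Z.shape
if n > m:
    X = X.T; Y = Y.T; Z = Z.T
    m, n = n, m

print(n, m)                             # 60 500: 60 lines of 500 points each
print(np.allclose(X[0, :], X[0, 0]))    # True: x is fixed along a line
print(Y[0, 0], Y[0, -1])                # -2.0 2.0: y varies along the line
```

So with the inverted sizes the function draws 60 lines, each at a fixed x, instead of 500 very short ones.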
The function assumes that when generating the meshgrid, the x array is the longest, so by default the lines have fixed y and it is the x coordinate that varies. However, if the y array is longer, the matrices are transposed, generating lines with fixed x. This is what happens when the meshgrid is generated with the sizes inverted (len(x)=60 and len(y)=500).
To see what can be passed through the **kwargs argument, refer to the LineCollection class documentation and to its set_ methods.
