How to make a spectrum plot - python

I am trying to replicate a spectrum plot like the figure below in both Python and Matlab, with no success so far.
The image is from Electric Field Instrument (EFI) data. The plot should have time on the x-axis, frequency on the y-axis, and a colorbar on the right.
The data is a two-dimensional matrix: each row is a time stamp and each column is a frequency bin after the FFT. The problem is that the data contains a lot of NaN values and only a few frequencies have data, so plt.imshow() gives me a completely blank image. Besides, the values are very small, ranging from 1e-12 to 1e-7.
Any hint on how to visualize an image like this would be greatly appreciated.
Screenshot of the data. The data is from NASA EFI data.
I used plt.imshow in Python and imagesc in Matlab on the whole 2D matrix; both give me a blank image in a single color.
Below are my Python attempts, all of which gave the wrong image:
plt.matshow(dt, cmap='jet')
plt.colorbar()
plt.show()
for i in range(dt.shape[0]):
    plt.plot(dt.iloc[i, :])
    plt.show()

You shouldn't use imshow here, because it renders the 2D matrix as an image.
Instead, plot each row as a separate series, like so:
import numpy as np
import matplotlib.pyplot as plt
sin1 = np.sin(np.linspace(0, 2*np.pi, 100))
sin2 = np.sin(np.linspace(0, 2*np.pi, 100)) + 0.5
sin3 = np.sin(np.linspace(0, 2*np.pi, 100)) + 1
sin1[10] = np.nan
sin2[20] = np.nan
sin3[30] = np.nan
data = np.array([sin1, sin2, sin3])
# plot each row as a separate series
for i in range(data.shape[0]):
    plt.plot(data[i, :])
plt.show()
The NaNs then simply show up as gaps in the lines.
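If a time-frequency image like the one in the question is still wanted, one possible alternative (not part of the answer above) is to mask the NaN cells and use a logarithmic color scale, since values between 1e-12 and 1e-7 collapse into a single color on a linear scale. The sketch below is only an illustration: the function name plot_spectrum and the synthetic data are assumptions, and it presumes that every finite value is strictly positive and that matplotlib 3.3 or newer is installed (for shading="auto").

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm


def plot_spectrum(times, freqs, power):
    """Draw power[time, frequency] as an image with a logarithmic color scale."""
    # NaN cells are masked so they stay blank instead of breaking the color scale.
    masked = np.ma.masked_invalid(power)
    # Assumption: all finite values are strictly positive, as LogNorm requires.
    norm = LogNorm(vmin=np.nanmin(power), vmax=np.nanmax(power))
    fig, ax = plt.subplots()
    # pcolormesh expects C[y, x], so the (time, frequency) matrix is transposed.
    mesh = ax.pcolormesh(times, freqs, masked.T, norm=norm, cmap="jet", shading="auto")
    ax.set_xlabel("time")
    ax.set_ylabel("frequency")
    fig.colorbar(mesh, ax=ax, label="power")
    return fig, mesh


# Synthetic stand-in for the EFI matrix: mostly NaN, three populated frequency columns.
rng = np.random.default_rng(0)
times = np.arange(200)
freqs = np.arange(64)
power = np.full((times.size, freqs.size), np.nan)
for col in (3, 10, 25):
    power[:, col] = 10.0 ** rng.uniform(-12, -7, size=times.size)

fig, mesh = plot_spectrum(times, freqs, power)
```

Calling fig.savefig(...) or plt.show() afterwards displays the result. For a pandas DataFrame like dt in the question, dt.to_numpy() would be passed as power.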

Related

Radial heatmap from similarity matrix in Python

Summary
I have a 2880x2880 similarity matrix (8.5 mil points). My attempt with Holoviews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These were about 28-31 cells long, and each cell had the measurement of the thing I'm trying to analyze. So there are these 24 arrays for each month of 1 whole year, i.e. 24x12 = 288 arrays per place. And there are measurements from 10 places. So a total of 2880 arrays were created and all compared to each other, and saved in a 2880x2880 matrix with similarity coefficients.
I'm trying to turn it into a radial similarity matrix like the one from holoviews, but without the ticks and tags (since the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape and colors and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB big, so it never shows up when I open it up. It's just blank. I've added a minimal example below of what I have, and replaced the loading of the datafile with some randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = [[random.random() for i in range(dimension)] for j in range(dimension)] # Fake, similar data
    col_vals = [str(i) for i in range(dimension*dimension)] # Placeholder
    row_vals = [str(i) for i in range(dimension*dimension)] # Placeholder
    val_vals = (np.reshape(np.array(dummy_matrix), -1)).tolist() # Turn matrix into an array
    idx_vals = [i for i in range(dimension*dimension)] # Placeholder
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it in (if it's possible for this case) to this radial heatmap, since it seems like the radial heatmap doesn't have that datashade-feature.
So I have no idea how to tackle this. I would be content with a broad overview too, I don't need the details nor the hover-infobox nor ability to zoom or any fancy extra features, I just need the general overview for a presentation. I'm open to any solution really.
I recommend using a plain heatmap instead of a radial heatmap to show the similarity matrix. The reasons are:
The radial heatmap is designed for periodic variables. The time variable (288 hours) can be considered periodic, but I think the 288*10 axis (288 hours, 10 places) is no longer periodic because of the "place" dimension.
Near the center of the radial heatmap, the colored points will be too dense for a human to make sense of.
The following is simple code to show a heatmap.
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = plt.get_cmap("Blues") # you may use another built-in colormap or define your own
# if your data is not in the range [0,1], use a normalization. Here it is normalized by the min and max values.
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea that comes to mind is that the similarity matrix may be unnecessary: you could plot the original 288 * 10 data using a radial heatmap or just a normal heatmap, and read the similarity directly from the color distribution.
Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import numpy as np
import matplotlib.pyplot as plt
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = np.random.rand(n, m) # same shape as r and th
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap='Blues')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity, took only about 20 seconds max.
You would think it would turn out monstrous like that
But the sheer amount of points drowns out the jaggedness, WOOHOO!
There are still some things left to be desired, like tags and ticks, but I think I'll figure that out.
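For a presentation-level overview, another option that none of the answers above use is to shrink the matrix before plotting by averaging non-overlapping blocks, which cuts the number of cells any plotting backend has to draw. The following is a minimal sketch; the helper name block_mean and the factor 10 are illustrative assumptions, and averaging does hide fine detail.

```python
import numpy as np


def block_mean(matrix, k):
    """Average non-overlapping k x k blocks.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    rows = matrix.shape[0] // k * k
    cols = matrix.shape[1] // k * k
    trimmed = matrix[:rows, :cols]
    # Split each axis into (block index, position inside block) and average inside blocks.
    return trimmed.reshape(rows // k, k, cols // k, k).mean(axis=(1, 3))


similarity = np.random.rand(2880, 2880)  # stand-in for the real similarity matrix
small = block_mean(similarity, 10)
print(small.shape)  # (288, 288)
```

The reduced array can then be passed to imshow or pcolormesh in place of the full matrix.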

How to draw heat map with large data set in python

I am trying to plot a sine wave, and the color of the curve at each point is represented by its tangential slope value.
For example, a 3600 * 1000 data frame should be filled:
import numpy as np
import pandas as pd
import seaborn
x_axis = list(range(0, 3601))
y_axis = list(range(-1000, 1001))
wave = pd.DataFrame(index=y_axis, columns=x_axis)
for i in range(0, 3601, 1):
    y = int(round(np.sin(np.radians(i / 10)), 3) * 1000)
    wave.loc[y, i] = -abs(y)
wave = wave.fillna(0)
wave[wave == 0] = np.nan
seaborn.heatmap(wave)
Calling seaborn.heatmap(wave) generates a heatmap like the attached image. But what I am looking for is to draw maybe 50-100 sine waves like this in one picture, so the dataframe would grow to about 360000*10000. At that size I still want to show a similar heatmap, or any kind of drawing that represents the value of each cell. My workstation seems to freeze when using seaborn heatmap with this dataset.
One of my thoughts is to normalize all the values to 0-255 and use some GLV plotting function; I am still researching it.
You could create a similar plot using plt.scatter:
import matplotlib.pyplot as plt
import numpy as np
x_axis = np.arange(0, 360, 0.1)
y = np.round(np.sin(np.radians(x_axis)), 3) * 1000
plt.scatter(x_axis, y, c=-np.abs(y), s=1, cmap='gist_heat')
plt.show()
To get a wider curve, just increase s. To get rid of the white part of the colormap, you can move the color limits (called vmin and vmax). By default they are the minimum and maximum of the given color values; in this case the maximum is 0 and the minimum is -1000. Setting vmax to +100 leaves roughly the top 10% of the colormap unused.
plt.scatter(x_axis, y, c=-np.abs(y), vmax=0.1*y.max(), s=10, cmap='gist_heat')
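To extend the scatter approach to the 50-100 waves mentioned in the question, the coordinates of all waves can be built as flat arrays and drawn with a single scatter call, so no large DataFrame is ever created. This is a sketch under assumptions: the helper stacked_sines and the vertical spacing of 2500 are made up for illustration, and the waves are simply stacked vertically.

```python
import numpy as np


def stacked_sines(n_waves, spacing=2500.0, step=0.1):
    """Return flat x, y and color arrays for n_waves vertically offset sine waves."""
    x = np.arange(0, 360, step)
    base = np.round(np.sin(np.radians(x)), 3) * 1000
    offsets = np.arange(n_waves) * spacing
    xs = np.tile(x, n_waves)                          # same x values for every wave
    ys = (base[None, :] + offsets[:, None]).ravel()   # wave k is shifted up by k * spacing
    cs = np.tile(-np.abs(base), n_waves)              # color depends only on the unshifted value
    return xs, ys, cs


xs, ys, cs = stacked_sines(50)
print(xs.shape, ys.shape, cs.shape)
```

The three arrays can then go into one call such as plt.scatter(xs, ys, c=cs, s=1, cmap='gist_heat').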

Gridding irregular scatter data on basemap, trying different resolutions

I am trying to regrid/interpolate my dataset of irregularly scattered values, each tied to a location (lat, lon), onto a grid of a certain size. My data is available as a dataframe with separate columns for the variable value, the latitude and the longitude.
I have to first grid this data, by optimizing grid size, and then find the best method to take average of different number of points lying within the grid box.
I have tried a code by following an online example. I use histogram2d function to grid the latitudes and longitudes. I fill the grid boxes having scatter points, with density count (equal to average of all points lying within the grid). (I will then have to use this newly gridded data, generated out of scatter points, to compare with another dataset that has a different grid resolution).
It should ideally work fine but grid boxes without scatter points are getting filled while those with the points are being left out. The mismatch is greater in finer resolution or smaller bin sizes.
I have looked up these examples - example 1, example 2.
Here is a part of my code:
df #Dataframe as a csv file opened in pandas
y = df['lon']
x = df['lat']
z = df['var']
# Bin the data onto a 10x10 grid or into any other size
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(5,5), weights=z)
counts, _, _ = np.histogram2d(y, x, bins=(5,5))
zi = zi / counts
zi = np.ma.masked_invalid(zi)
m = Basemap(llcrnrlat=45,urcrnrlat=55,llcrnrlon=25,urcrnrlon=30)
m.drawcoastlines(linewidth =0.75, color ="black")
m.drawcountries(linewidth =0.75, color ="black")
m.drawmapboundary()
p,q = m(yi,xi)
#cs=m.pcolormesh(xi, yi, zi, edgecolors='black',cmap = 'jet')
cs=m.pcolormesh(p, q, zi, edgecolors='black',cmap = 'jet')
m.colorbar(cs)
#scat = m.scatter(x,y, c=z, s=200,edgecolors='red')
scat=m.scatter(y,x, latlon=True,c=z, s =80)
The following is the image getting generated.
Any help will be much appreciated.
A friend helped me figure this out.
I had to transpose the array generated by the histogram before passing it to pcolormesh:
cs=m.pcolormesh(p, q, zi.T, edgecolors='black',cmap = 'jet')
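The reason the transpose is needed can be checked without a map: np.histogram2d puts its first argument along axis 0 of the result, while pcolormesh expects an array indexed as [y, x]. The small sketch below uses made-up sample points and deliberately different bin counts per axis so the orientation is visible; it also shows the per-cell mean as weighted sum divided by count.

```python
import numpy as np

# Made-up sample points (assumption, for illustration only)
lon = np.array([25.5, 25.6, 29.5])
lat = np.array([45.5, 45.7, 54.5])
val = np.array([1.0, 3.0, 10.0])

lon_edges = np.linspace(25, 30, 6)  # 5 longitude bins
lat_edges = np.linspace(45, 55, 5)  # 4 latitude bins

sums, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges), weights=val)
counts, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges))

# Empty cells give 0/0 = NaN; silence the warning and mask them.
with np.errstate(invalid="ignore"):
    mean = np.ma.masked_invalid(sums / counts)

print(mean.shape)    # axis 0 follows lon, axis 1 follows lat
print(mean.T.shape)  # [lat, lon] layout, which is what pcolormesh(lon_edges, lat_edges, ...) expects
```

With equal bin counts on both axes, as in the question, the untransposed array still plots without an error, which is why the mistake only shows up as misplaced cells.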

Matplotlib 3D Waterfall Plot with Colored Heights

I'm trying to visualise a dataset in 3D which consists of a time series (along y) of x-z data, using Python and Matplotlib.
I'd like to create a plot like the one below (which was made in Python: http://austringer.net/wp/index.php/2011/05/20/plotting-a-dolphin-biosonar-click-train/), but where the colour varies with Z - i.e. so the intensity is shown by a colormap as well as the peak height, for clarity.
An example showing the colormap in Z is (apparently made using MATLAB):
This effect can be created using the waterfall plot option in MATLAB, but I understand there is no direct equivalent of this in Python.
I have also tried using the plot_surface option in Python (below), which works ok, but I'd like to 'force' the lines running over the surface to only be in the x direction (i.e. making it look more like a stacked time series than a surface). Is this possible?
Any help or advice greatly welcomed. Thanks.
I have written a function that replicates the MATLAB waterfall behaviour in matplotlib, but I don't think it is the best solution when it comes to performance.
I started from two examples in the matplotlib documentation: multicolor lines and multiple lines in a 3D plot. From these examples, the only way I found to draw a line whose color follows a colormap according to its z value is to reshape the input array so that the line is drawn in 2-point segments, and to set the color of each segment to the mean z value of its two points.
Thus, given the n,m input matrices X, Y and Z, the function loops over the smaller of the two dimensions n and m and plots each line in 2-point segments, reshaping the arrays with the same code as the example.
def waterfall_plot(fig, ax, X, Y, Z):
    '''
    Make a waterfall plot
    Input:
        fig,ax : matplotlib figure and axes to populate
        Z : n,m numpy array. Must be a 2d array even if only one line should be plotted
        X,Y : n,m array
    '''
    # Set normalization to the same values for all plots
    norm = plt.Normalize(Z.min(), Z.max())
    # Check sizes to loop always over the smallest dimension
    n, m = Z.shape
    if n > m:
        X = X.T; Y = Y.T; Z = Z.T
        m, n = n, m
    for j in range(n):
        # reshape the X,Z into pairs
        points = np.array([X[j,:], Z[j,:]]).T.reshape(-1, 1, 2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        lc = LineCollection(segments, cmap='plasma', norm=norm)
        # Set the values used for colormapping
        lc.set_array((Z[j,1:]+Z[j,:-1])/2)
        lc.set_linewidth(2) # set linewidth a little larger to see properly the colormap variation
        line = ax.add_collection3d(lc, zs=(Y[j,1:]+Y[j,:-1])/2, zdir='y') # add line to axes
    fig.colorbar(lc) # add colorbar; as the normalization is the same for all, it doesn't matter which of the lc objects we use
Therefore, plots that look like the MATLAB waterfall can easily be generated from the same input matrices as a matplotlib surface plot:
import numpy as np; import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from mpl_toolkits.mplot3d import Axes3D
# Generate data
x = np.linspace(-2,2, 500)
y = np.linspace(-2,2, 40)
X,Y = np.meshgrid(x,y)
Z = np.sin(X**2+Y**2)
# Generate waterfall plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
waterfall_plot(fig,ax,X,Y,Z)
ax.set_xlabel('X') ; ax.set_xlim3d(-2,2)
ax.set_ylabel('Y') ; ax.set_ylim3d(-2,2)
ax.set_zlabel('Z') ; ax.set_zlim3d(-1,1)
The function assumes that, when the meshgrid is generated, the x array is the longest, so by default each line has fixed y and the x coordinate varies. However, if the y dimension is larger, the matrices are transposed and the lines are drawn with fixed x. Thus, generating the meshgrid with the sizes inverted (len(x)=40 and len(y)=500) yields:
With a pandas DataFrame (s) that has the x axis as the first column and each spectrum as another column, a simple offset plot also works:
offset = 0
for c in s.columns[1:]:
    plt.plot(s.wavelength, s[c] + offset)
    offset += .25
plt.xlim([1325, 1375])
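The segment reshaping that waterfall_plot relies on earlier in this thread can be checked in isolation on a tiny line. The helper name line_segments below is an assumption made for this sketch; the two array operations are the same ones the function uses.

```python
import numpy as np


def line_segments(x, z):
    """Split a polyline into 2-point segments and compute one color value per segment."""
    # Shape (N, 1, 2): one (x, z) point per row
    points = np.array([x, z]).T.reshape(-1, 1, 2)
    # Pair every point with its successor: shape (N - 1, 2, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    # Color each segment by the mean z of its two end points
    colors = (z[1:] + z[:-1]) / 2
    return segments, colors


x = np.array([0.0, 1.0, 2.0])
z = np.array([10.0, 20.0, 30.0])
segments, colors = line_segments(x, z)
print(segments.shape)  # (2, 2, 2)
print(colors)
```

An N-point line therefore becomes N - 1 segments, which is why the color array passed to set_array is one element shorter than Z.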

Using numpy arrays to count the number of points within the cells of a regular grid

I am working with a large number of 3D points, each with x,y,z values stored in numpy arrays.
For background, the points will always fall within a cylinder of fixed radius, and height = max z value of the points.
My objective is to split the bounding cylinder (or column if it is easier) into e.g. 1 m height strata, and then count the number of points within each cell of a regular grid (e.g. 1 m x 1 m) overlaid on each stratum.
Conceptually, the operation would be the same as overlaying a raster and counting the points intersecting each pixel.
The grid of cells can form a square or a disk, it doesn't matter.
After a lot of searching and reading, my current thinking is to use some combination of numpy.linspace and numpy.meshgrid to generate the vertices of each cell stored within an array and test each cell against each point to see if it is 'in'. This seems inefficient, especially when working with thousands of points.
The numpy / scipy suite seems well suited to the problem, but I have not found a solution yet. Any suggestions would be much appreciated.
I have included a few example points and some code to visualize the data.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# This plot is instructive to visualize the problem
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x_vals, y_vals, z_vals, c='b', marker='o')
plt.show()
I am not sure I understand exactly what you are looking for, but since every "cell" seems to have a 1 m side in all directions, couldn't you:
round all your values to integers (rasterize your data), probably with some floor function;
create a bijection from these integer coordinates to something more convenient, with something like:
(64**2)*x + (64)*y + z # assuming all values are in [0,63]
You can put z first instead if you want to focus on height more easily later.
compute the histogram over the "cells" (several functions from numpy/scipy can do it);
revert the bijection if needed (i.e. recover the "true" coordinates of each cell once the count is known).
Maybe I didn't understand well, but in case it helps...
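The steps above can be sketched with NumPy as follows. This is one possible reading of the suggestion, not the answerer's own code: np.ravel_multi_index stands in for the hand-written 64-based formula, np.bincount does the counting, and the function name count_points_per_cell is made up for this example.

```python
import numpy as np


def count_points_per_cell(x, y, z, cell=1.0):
    """Count points per cubic cell of side `cell`; returns the counts and the integer grid origin."""
    pts = np.column_stack([x, y, z])
    idx = np.floor(pts / cell).astype(int)   # step 1: rasterize to integer cell coordinates
    origin = idx.min(axis=0)
    idx -= origin                            # shift so every index starts at 0
    shape = tuple(int(s) for s in idx.max(axis=0) + 1)
    flat = np.ravel_multi_index(tuple(idx.T), shape)  # step 2: bijection (i, j, k) -> one integer
    counts = np.bincount(flat, minlength=int(np.prod(shape)))  # step 3: histogram of cell ids
    return counts.reshape(shape), origin     # step 4: back to an (x, y, z) grid


z_vals = np.array([3.08, 4.46, 0.27, 2.40, 0.48, 0.21, 0.31, 3.28, 4.09, 1.75])
x_vals = np.array([22.88, 20.00, 20.36, 24.11, 40.48, 29.08, 36.02, 29.14, 32.20, 18.96])
y_vals = np.array([31.31, 25.04, 31.86, 41.81, 38.23, 31.57, 42.65, 18.09, 35.78, 31.78])

counts, origin = count_points_per_cell(x_vals, y_vals, z_vals)
print(counts.shape, counts.sum())
```

Cell (i, j, k) of counts covers the cube whose lower corner is (origin + (i, j, k)) * cell.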
Thanks @Baruchel. It turns out the n-dimensional histograms suggested by @DilithiumMatrix provide a fairly simple solution to the problem I posted. After some reading, here is my current solution for anyone else who faces a similar problem.
As this is my first Python/Numpy effort, any improvements or suggestions, especially regarding performance, would be welcome. Thanks.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# Updated code below
# Variables needed for 2D,3D histograms
xmax, ymax, zmax = int(x_vals.max())+1, int(y_vals.max())+1, int(z_vals.max())+1
xmin, ymin, zmin = int(x_vals.min()), int(y_vals.min()), int(z_vals.min())
xrange, yrange, zrange = xmax-xmin, ymax-ymin, zmax-zmin
xedges = np.linspace(xmin, xmax, (xrange + 1), dtype=int)
yedges = np.linspace(ymin, ymax, (yrange + 1), dtype=int)
zedges = np.linspace(zmin, zmax, (zrange + 1), dtype=int)
# Make the 2D histogram
h2d, xedges, yedges = np.histogram2d(x_vals, y_vals, bins=(xedges, yedges))
assert h2d.sum() == len(x_vals), "Unclassified points in the array"
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.imshow(h2d.transpose(), extent=extent, interpolation='none', origin='lower')
# histogram2d puts x along the first axis, while imshow draws the first axis as rows (y) starting at the top, hence the transpose and origin='lower'
# Plot settings, not sure yet on matplotlib update/override objects
plt.grid(True, which='both')
plt.xticks(xedges)
plt.yticks(yedges)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.plot(x_vals, y_vals, 'ro')
plt.show()
# 3-dimensional histogram with 1 x 1 x 1 m bins. Produces point counts in each 1m3 cell.
xyzstack = np.stack([x_vals,y_vals,z_vals], axis=1)
h3d, Hedges = np.histogramdd(xyzstack, bins=(xedges, yedges, zedges))
assert h3d.sum() == len(x_vals), "Unclassified points in the array"
h3d.shape # Shape of the array should be same as the edge dimensions
testzbin = np.sum(np.logical_and(z_vals >= 1, z_vals < 2)) # Slice to test with
np.sum(h3d[:,:,1]) == testzbin # Test num points in second bins
np.sum(h3d, axis=2) # Sum of all vertical points above each x,y 'pixel'
# h2d and np.sum(h3d, axis=2) match whenever every point falls inside the z edges
# Remaining issue - how to get a r x c count of empty z bins.
# i.e. for each 'pixel' how many z bins contained no points?
# Possible solution is to reshape to use logical operators
count2d = h3d.reshape(xrange * yrange, zrange) # Maintain dimensions per num 3D cells defined
zerobins = (count2d == 0).sum(1)
zerobins.shape
# Get back to x,y grid with counts - ready for output as image with counts=pixel digital number
bincount_pixels = zerobins.reshape(xrange,yrange)
# Appears to work, perhaps there is a way without reshaping?
PS if you are facing a similar problem scikit patch extraction looks like another possible solution.
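On the open question in the code comments above (counting empty z bins without reshaping), comparing against zero and summing along the z axis appears to give the same result directly. A minimal sketch, with the helper name empty_z_bins and the random test points as assumptions:

```python
import numpy as np


def empty_z_bins(h3d):
    """For each (x, y) column of a 3D count array, count the z bins that hold no points."""
    return (h3d == 0).sum(axis=2)


# Random stand-in points inside a 4 x 4 x 4 m block with 1 m bins
rng = np.random.default_rng(1)
pts = rng.uniform(0, 4, size=(30, 3))
edges = np.arange(0, 5)
h3d, _ = np.histogramdd(pts, bins=(edges, edges, edges))

empty = empty_z_bins(h3d)
print(empty.shape)  # (4, 4)
```

Summing over axis 2 keeps the (x, y) layout, so the result can be used as an image without reshaping back.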
