Accessing (the right) data when using holoviews/bokeh - python

I am having difficulties accessing (the right) data when using holoviews/bokeh, either for connected plots showing a different aspect of the dataset, or just customising a plot with dynamic access to the data as plotted (say a tooltip).
TLDR: How to add a projection plot of my dataset (different set of dimensions and linked to main plot, like a marginal distribution but, you know, not restricted to histogram or distribution) and probably with a similar solution a related question I asked here on SO
Let me exemplify (straight from a ipynb, should be quite reproducible):
import numpy as np
import random, pandas as pd
import bokeh
import datashader as ds
import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
hv.extension('bokeh')
With imports set up, let's create a dataset (N target 10e12 ;) to use with datashader. Beside the key dimensions, I really need some value dimensions (here z and z2).
import numpy as np
import pandas as pd
N = int(10e6)
x_r = (0,100)
y_r = (100,2000)
z_r = (0,10e8)
x = np.random.randint(x_r[0]*1000,x_r[1]*1000,size=(N, 1))
y = np.random.randint(y_r[0]*1000,y_r[1]*1000,size=(N, 1))
z = np.random.randint(z_r[0]*1000,z_r[1]*1000,size=(N, 1))
z2 = np.ones((N,1)).astype(int)
df = pd.DataFrame(np.column_stack([x,y,z,z2]), columns=['x','y','z','z2'])
df[['x','y','z']] = df[['x','y','z']].div(1000, axis=0)
df
Now I plot the data, rasterised, and also activate the tooltip to see the defaults. Sure, x/y is trivial, but as I said, I care about the value dimensions. It shows z2 as x_y z2. I have a question related to tooltips with the same sort of data here on SO for value dimension access for the tooltips.
from matplotlib.cm import get_cmap
palette = get_cmap('viridis')
# palette_inv = palette.reversed()
p=hv.Points(df,['x','y'], ['z','z2'])
P=rasterize(p, aggregator=ds.sum("z2"),x_range=(0,100)).opts(cmap=palette)
P.opts(tools=["hover"]).opts(height=500, width=500,xlim=(0,100),ylim=(100,2000))
Now I can add a histogram or a marginal distribution which is pretty close to what I want, but there are issues with this soon past the trivial defaults. (E.g.: P << hv.Distribution(p, kdims=['y']) or P.hist(dimension='y',weight_dimension='x_y z',num_bins = 2000,normed=True))
Both are close approaches, but do not give me the other value dimension I'd like visualise. If I try to access the other value dimension ('x_y z') this fails. Also, the 'x_y z2' way seems very clumsy, is there a better way?
When I do something like this, my browser/notebook-extension blows up, of course.
transformed = p.transform(x=hv.dim('z'))
P << hv.Curve(transformed)
So how do I access all my data in the right way?

Related

How can I plot only particular values in xarray?

I am using data from cdasws to plot dynamic spectra. I am following the example found here https://cdaweb.gsfc.nasa.gov/WebServices/REST/jupyter/CdasWsExample.html
This is my code which I have modified to obtain a dynamic spectra for STEREO.
from cdasws import CdasWs
from cdasws.datarepresentation import DataRepresentation
import matplotlib.pyplot as plt
cdas = CdasWs()
import numpy as np
datasets = cdas.get_datasets(observatoryGroup='STEREO')
for index, dataset in enumerate(datasets):
print(dataset['Id'], dataset['Label'])
variables = cdas.get_variables('STEREO_LEVEL2_SWAVES')
for variable_1 in variables:
print(variable_1['Name'], variable_1['LongDescription'])
data = cdas.get_data('STEREO_LEVEL2_SWAVES', ['avg_intens_ahead'],
'2020-07-11T02:00:00Z', '2020-07-11T03:00:00Z',
dataRepresentation = DataRepresentation.XARRAY)[1]
print(data)
plt.figure(figsize = (15,7))
# plt.ylim(100,1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.yscale('log')
sorted_data.transpose().plot()
plt.xlabel("Time",size=18)
plt.ylabel("Frequency (kHz)",size=18)
plt.show()
Using this code gives a plot that looks something like this,
My question is, is there anyway of plotting this spectrum only for a particular frequency? For example, I want to plot just the intensity values at 636 kHz, is there any way I can do that?
Any help is greatly appreciated, I dont understand xarray, I have never worked with it before.
Edit -
Using the command,
data_stereo.avg_intens_ahead.loc[:,625].plot()
generates a plot that looks like,
While this is useful, what I needed is;
for the dynamic spectrum, if i choose a particular frequency like 600khz, can it display something like this (i have just added white boxes to clarify what i mean) -
If you still want the plot to be 2D, but to include a subset of your data along one of the dimensions, you can provide an array of indices or a slice object. For example:
data_stereo.avg_intens_ahead.sel(
frequency=[625]
).plot()
Or
# include a 10% band on either side
data_stereo.avg_intens_ahead.sel(
frequency=slice(625*0.9, 625*1.1)
).plot()
Alternatively, if you would actually like your plot to show white space outside this selected area, you could mask your data with where:
data_stereo.avg_intens_ahead.where(
data_stereo.frequency==625
).plot()

Holoviews: how to customize histogram for linked time series Curve plots

I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block is just to plot the data and show how, by design, one of the random walks have different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), ranges are different so the scaling is not quite there:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
hv.Points(data_df, dim).opts(color=c)
for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for a 20 minutes effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([hv.Curve(data_df,'x',dim).opts(height=300,width=1200,
color=c).hist(dim) for c,
dim in zip(colors,[d for d in dims])])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think is quite yet necessary with 15,000 points like in this case), I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
shade(rasterize(hv.Points(data_df, dims),
cmap=c)).opts(width=1200, height = 400).hist(dims[1])
for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above cod block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!
Sander answered how to color the histogram, but for the other question about coloring the datashaded plot, Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. So you were correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
rasterize(hv.Points(data_df, ds)).opts(cmap=c,width=1200, height = 400).hist(dims[1])
for c, ds in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
[hv.Curve(data_df,'x',dim)
.opts(height=300,width=600, color=c)
.hist(dim)
.opts(opts.Histogram(color=c))
for c, dim in zip(colors,[d for d in dims])]
)
link_selections(layout).cols(1)

Import bathemetry via csv and visualise a large body of water

I hope I explain this correctly.
I'm looking for a way to better visualise underwater noise.
I'm not after a solution (well, maybe I am) but more interested in what the perfect start would be, considering speed is of essence (so pretty much your opinion of Q1 to Q3).
I’m trying perform calculations and visualisations of a body of water.
For this I want basically import the bathymetry (csv containing x,y,z) of a substantial area (lets say 50kmx50km).
Q1: do I use a pandas dataframe or numpy array.
Q2: do you envision this as a mesh, where the column names are x and row names are y and the elevation(z) are the fields?
As z can be positive or negative, the landmass starts when z>0, which will always vary depending on tide. I want to be able to increase or decrease the low and high tide on the ‘fly’
The actual seafloor bottom is also important depending on the surface, salinity, water temperature per meter ect.
Q3: is this where I should go 3D (in mesh?)
For now I was just focussing on importing the bathymetry and visualise what I imported in a graphical way (and failing a bit).
So far my code looks like below, sorry about the lack of comments.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import pandas as pd
import tkinter as tk
from tkinter import filedialog
import scipy
from scipy import interpolate
# min_? is minimum bound, max_? is maximum bound,
# dim_? is the granularity in that direction
filename = filedialog.askopenfilename()
df = pd.read_csv(filename, delimiter=',',names=["X", "Y", "Z"])
df.sort_values(by=["X"])
mat = df.to_numpy()
min_x = np.array(df['X'].values.tolist()).min(axis=0)
max_x = np.array(df['X'].values.tolist()).max(axis=0)
min_y = np.array(df['Y'].values.tolist()).min(axis=0)
max_y = np.array(df['Y'].values.tolist()).max(axis=0)
min_Z = np.array(df['Z'].values.tolist()).min(axis=0)
max_z = np.array(df['Z'].values.tolist()).max(axis=0)
dim_x = np.array(df['X'].count())
dim_y = np.array(df['Y'].count())
x=df.columns[0:]
y=df.columns[1:]
z=df.columns[2:]
x = np.linspace(min_x, max_x, dim_x)
y = np.linspace(min_y, max_y, dim_y)
X,Y = np.meshgrid(x, y)
# Interpolate (x,y,z) points [mat] over a normal (x,y) grid [X,Y]
# Depending on your "error", you may be able to use other methods
Z = scipy.interpolate.griddata((mat[:,0], mat[:,1]), mat[:,2], (X,Y),method='linear')
plt.pcolormesh(X,Y,Z)
plt.show()

How to overplot arrays of different shape?

I'm trying to overplot two arrays with different shapes but I'm unable to project one on the top of the other. For example:
#importing the relevant packages
import numpy as np
import matplotlib.pyplot as plt
def overplot(data1,data2):
'''
This function should make a contour plot
of data2 over the data1 plot.
'''
#creating the figure
fig = plt.figure()
#adding an axe
ax = fig.add_axes([1,1,1,1])
#making the plot for the
#first dataset
ax.imshow(data1)
#overplotting the contours
#for the second dataset
ax.contour(data2, projection = data2,
levels = [0.5,0.7])
#showing the figure
plt.show(fig)
return
if __name__ == '__main__':
'''
testing zone
'''
#creating two mock datasets
data1 = np.random.rand(3,3)
data2 = np.random.rand(9,9)
#using the overplot
overplot(data1,data2)
Currently, my output is something like:
While what I actually would like is to project the contours of the second dataset into the first one. This way, if I got images of the same object but with different resolution for the cameras I would be able to do such plots. How can I do that?
Thanks for your time and attention.
It's generally best to make the data match, and then plot it. This way you have complete control over how things are done.
In the simple example you give, you could use repeat along each axis to expand the 3x3 data to match the 9x9 data. That is, you could use, data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0) to give:
But for the more interesting case of images, like you mention at the end of your question, then the axes probably won't be integer multiples and you'll be better served by a spline or other type interpolation. This difference is an example of why it's better to have control over this yourself, since there are many ways to to this type of mapping.

How to produce the following images (gabor patches)

I am trying to create four gabor patches, very similar to those below.
I don't need them to be identical to the pictures below, but similar.
Despite a bit of tinkering, I have been unable to reproduce these images...
I believe they were created in MATLAB originally. I don't have access to the original MATLAB code.
I have the following code in python (2.7.10):
import numpy as np
from scipy.misc import toimage # One can also use matplotlib*
data = gabor_fn(sigma = ???, theta = 0, Lambda = ???, psi = ???, gamma = ???)
toimage(data).show()
*graphing a numpy array with matplotlib
gabor_fn, from here, is defined below:
def gabor_fn(sigma,theta,Lambda,psi,gamma):
sigma_x = sigma;
sigma_y = float(sigma)/gamma;
# Bounding box
nstds = 3;
xmax = max(abs(nstds*sigma_x*numpy.cos(theta)),abs(nstds*sigma_y*numpy.sin(theta)));
xmax = numpy.ceil(max(1,xmax));
ymax = max(abs(nstds*sigma_x*numpy.sin(theta)),abs(nstds*sigma_y*numpy.cos(theta)));
ymax = numpy.ceil(max(1,ymax));
xmin = -xmax; ymin = -ymax;
(x,y) = numpy.meshgrid(numpy.arange(xmin,xmax+1),numpy.arange(ymin,ymax+1 ));
(y,x) = numpy.meshgrid(numpy.arange(ymin,ymax+1),numpy.arange(xmin,xmax+1 ));
# Rotation
x_theta=x*numpy.cos(theta)+y*numpy.sin(theta);
y_theta=-x*numpy.sin(theta)+y*numpy.cos(theta);
gb= numpy.exp(-.5*(x_theta**2/sigma_x**2+y_theta**2/sigma_y**2))*numpy.cos(2*numpy.pi/Lambda*x_theta+psi);
return gb
As you may be able to tell, the only difference (I believe) between the images is contrast. So, gabor_fn would likely needed to be altered to do allow for this (unless I misunderstand one of the params)...I'm just not sure how.
UPDATE:
from math import pi
from matplotlib import pyplot as plt
data = gabor_fn(sigma=5.,theta=pi/2.,Lambda=12.5,psi=90,gamma=1.)
unit = #From left to right, unit was set to 1, 3, 7 and 9.
bound = 0.0009/unit
fig = plt.imshow(
data
,cmap = 'gray'
,interpolation='none'
,vmin = -bound
,vmax = bound
)
plt.axis('off')
The problem you are having is a visualization problem (although, I think you are chossing too large parameters).
By default matplotlib, and scipy's (toimage) use bilinear (or trilinear) interpolation, depending on your matplotlib's configuration script. That's why your image looks so smooth. It is because your pixels values are being interpolated, and you are not displaying the raw kernel you have just calculated.
Try using matplotlib with no interpolation:
from matplotlib import pyplot as plt
plt.imshow(data, 'gray', interpolation='none')
plt.show()
For the following parameters:
data = gabor_fn(sigma=5.,theta=pi/2.,Lambda=25.,psi=90,gamma=1.)
You get this output:
If you reduce lamda to 15, you get something like this:
Additionally, the sigma you choose changes the strength of the smoothing, adding parameters vmin=-1 and vmax=1 to imshow (similar to what #kazemakase) suggested, will give you the desired contrast.
Check this guide for sensible values (and ways to use) gabor kernels:
http://scikit-image.org/docs/dev/auto_examples/plot_gabor.html
It seems like toimage scales the input data so that the min/max values are mapped to black/white.
I do not know what amplitudes to reasonably expect from gabor patches, but you should try something like this:
toimage(data, cmin=-1, cmax=1).show()
This tells toimage what range your data is in. You can try to play around with cmin and cmax, but make sure they are symmetric (i.e. cmin=-x, cmax=x) so that a value of 0 maps to grey.

Categories