Bokeh use of Column Data Source and Box_Select


I'm lost as to how to set up a ColumnDataSource so that I can select points in one graph and have the corresponding points highlighted in another graph. I am trying to learn more about how this works.
The sample code I am using is the Linked Brushing example. I'd like to see whether I can get the same effect with my own code, below. That page also covers Linked Selection with Filtered Data, but I don't understand what the code filters=[BooleanFilter([True if y > 250 or y < 100 else False for y in y1])] on that page does, so I'm not sure how to adapt it, or whether it's even relevant.
Here is my code:
from bokeh.plotting import figure, output_file, show, Column
from bokeh.models import ColumnDataSource, CDSView, BooleanFilter
from MyFiles import *
class bokehPlot:
    def __init__(self, filename, t, a, b, c, d):
        self.source = ColumnDataSource(data=dict(x=t, y1=a, y2=b, y3=c, y4=d))
        p1 = self.makePlot(filename, 'x', 'y1', 'A')
        p2 = self.makePlot(filename, 'x', 'y2', 'B', x_link=p1)
        p3 = self.makePlot(filename, 'x', 'y3', 'C', x_link=p1)
        p4 = self.makePlot(filename, 'x', 'y4', 'D', x_link=p1)
        output_file('scatter_plotting.html', mode='cdn')
        p = Column(p1, p2, p3, p4)
        show(p)
    def makePlot(self, filename, x0, y0, y_label, **optional):
        TOOLS = "box_zoom,box_select,reset"
        p = figure(tools=TOOLS, plot_width=1800, plot_height=300)
        if ('x_link' in optional):
            p0 = optional['x_link']
            p.x_range = p0.x_range
        p.scatter(x=x0, y=y0, marker='square', size=1, fill_color='red', source=self.source)
        p.title.text = filename
        p.title.text_color = 'orange'
        p.xaxis.axis_label = 'T'
        p.yaxis.axis_label = y_label
        p.xaxis.minor_tick_line_color = 'red'
        p.yaxis.minor_tick_line_color = None
        return p
And my main looks like this (set to pass up to 100K data points from the file):
p = readMyFile(path+filename+extension, 100000)
t = p.time()
a = p.a()
b = p.b()
c = p.c()
d = p.d()
v = bokehPlot(filename, t, a, b, c, d)
The variables t, a, b, c, and d are type numpy ndarray.
I've managed to link the plots so I can pan and zoom them all from one graph. I would like to select a cluster of points in one plot and see it highlighted, along with the corresponding points (at the same t values) on the other graphs.
In this code, I can draw a selection box, but it only stays for a moment, then disappears, and I see no effect on any plot. How is box_select linked to the source, and what causes the plots to redraw?
This is just one step in trying to familiarize myself with Bokeh. My next goal will be to use t-SNE to cluster my data and show the clusters with synchronized colors in each graph. But first, I want to understand the mechanics of using the ColumnDataSource here. In the sample code, for example, I don't see any explicit connection between the box_select operation and the source variable, or anything that causes the plot to redraw.

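My understanding is that BooleanFilter, IndexFilter and GroupFilter are used with a CDSView to choose which rows of the data one particular plot renders. For example, BooleanFilter([y > 250 or y < 100 for y in y1]) keeps only the rows whose y1 value is above 250 or below 100. They are not what links the plots. If you only want the second plot to respond to selections in the first plot, you should just use gridplot as suggested in the comment. As long as the plots share the same ColumnDataSource, they are linked: a box select stores the selected row indices in that source's selected property, and every renderer that draws from the source redraws with those rows highlighted.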
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
source = ColumnDataSource(data=dict(x=[1, 2, 3, 4, 5],
                                    y=[1, 2, 3, 4, 5],
                                    z=[3, 5, 1, 6, 7]))
tools = ["box_select", "hover", "reset"]
p_0 = figure(plot_height=300, plot_width=300, tools=tools)
p_0.circle(x="x", y="y", size=10, hover_color="red", source=source)
p_1 = figure(plot_height=300, plot_width=300, tools=tools)
p_1.circle(x="x", y="z", size=10, hover_color="red", source=source)
show(gridplot([[p_0, p_1]]))
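Applied to the four series from the question, a minimal sketch might look like the following. It assumes synthetic data (the question's readMyFile loader isn't available), and make_linked_plots is a hypothetical helper. The selection_* and nonselection_* keyword arguments make highlighted and faded points easier to tell apart than the default styling, which may be hard to see with size-1 markers. Setting source.selected.indices from Python stands in for drawing a box with box_select.

```python
import numpy as np
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def make_linked_plots(t, series):
    """Build one scatter plot per series, all drawing from one shared ColumnDataSource.

    Hypothetical helper: `series` maps a y-axis label to an array the same length as `t`.
    """
    data = {"x": t}
    data.update(series)
    source = ColumnDataSource(data=data)

    plots, renderers = [], []
    first = None
    for label in series:
        # Reuse the first plot's x range so panning/zooming stays in sync.
        kwargs = {"x_range": first.x_range} if first is not None else {}
        p = figure(tools="box_zoom,box_select,reset", **kwargs)
        r = p.scatter(x="x", y=label, source=source, marker="square", size=4,
                      selection_fill_color="red", selection_line_color="red",
                      nonselection_fill_alpha=0.1, nonselection_line_alpha=0.1)
        p.yaxis.axis_label = label
        plots.append(p)
        renderers.append(r)
        if first is None:
            first = p
    return column(*plots), source, renderers


t = np.linspace(0, 10, 200)
layout, source, renderers = make_linked_plots(
    t, {"A": np.sin(t), "B": np.cos(t), "C": t ** 2, "D": np.sqrt(t)})
# Simulate a box select over the first 20 rows: every renderer sharing `source` highlights them.
source.selected.indices = list(range(20))
# show(layout)  # uncomment to open the linked plots in a browser
```

Because all four renderers point at the same source object, there is no extra wiring to add for the highlight to propagate.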

Related

How to define Python Bokeh RangeSlider.on_change callback function to alter IndexFilter for plots?

I'm trying to implement a Python callback function for a RangeSlider. The slider value should determine which indices an IndexFilter keeps for display.
For example, if rangeslider.value is (3, 25), my plots should only contain data with indices from 3 to 25.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, GMapOptions, CustomJS, CDSView, IndexFilter
from bokeh.plotting import gmap, ColumnDataSource, figure
from bokeh.layouts import column, row
from bokeh.models.widgets import RangeSlider
import numpy as np
def slider_callback(attr, old, new):
    p.view = CDSView(source=source, filters=[IndexFilter(np.arange(new.value[0], new.value[1]))])
    v.view = CDSView(source=source, filters=[IndexFilter(np.arange(new.value[0], new.value[1]))])
# data set
lon = [[48.7886, 48.7887, 48.7888, 48.7889, 48.789],
[48.7876, 48.7877, 48.78878, 48.7879, 48.787],
[48.7866, 48.7867, 48.7868, 48.7869, 48.786],
[48.7856, 48.7857, 48.7858, 48.7859, 48.785],
[48.7846, 48.7847, 48.7848, 48.7849, 48.784]]
lat = [[8.92, 8.921, 8.922, 8.923, 8.924],
[8.91, 8.911, 8.912, 8.913, 8.914],
[8.90, 8.901, 8.902, 8.903, 8.904],
[8.89, 8.891, 8.892, 8.893, 8.894],
[8.88, 8.881, 8.882, 8.883, 8.884]]
time = [0, 1, 2, 3, 4, 5]
velocity = [23, 24, 25, 24, 20]
lenght_dataset = len(lon)
# define source and map
source = ColumnDataSource(data = {'x': lon, 'y': lat, 't': time, 'v': velocity})
view = CDSView(source=source, filters=[IndexFilter(np.arange(0, lenght_dataset))])
map_options = GMapOptions(lat=48.7886, lng=8.92, map_type="satellite", zoom=13)
p = gmap("MY_API_KEY", map_options, title="Trajectory Map")
v = figure(plot_width=400, plot_height=400, title="Velocity")
# plot lines on map
p.multi_line('y', 'x', view=view, source=source, line_width=1)
v.line('t', 'v', view=view, source=source, line_width=3)
# slider to limit plotted data
range_slider = RangeSlider(title="Data Range Slider: ", start=0, end=lenght_dataset, value=(0, lenght_dataset), step=1)
range_slider.on_change('value', slider_callback)
# Layout to plot and output
layout = row(column(p, range_slider),
column(v)
)
output_file("diag_plot_bike_data.html")
show(layout)

Some notes:
time is longer than the other columns, so you will receive a warning about it. In my code below, I just removed its last element.
A view with filters should generally not be used for connected glyphs like lines (v.line in particular; multi_line is fine). You will receive a warning about it, but if the indices in the IndexFilter are always contiguous, you should be fine. Either way, you can use the segment glyph to avoid the warning.
In your callback, you're trying to set view on the figures, but views only exist on glyph renderers.
In general, you don't want to recreate views; you want to recreate as few Bokeh models as possible. Ideally, you would just change the indices field of the filter, but some wiring is missing in Bokeh, so you have to set the filters field of the view instead, as below.
The new argument of a Python callback receives the new value of the attribute named as the first parameter of the corresponding on_change call. Here that value is a tuple, so instead of new.value[0] you should use new[0].
Since you've decided to use Python callbacks, you can no longer use show to produce a static HTML file. You have to use curdoc().add_root and run the script with bokeh serve, because the Python code needs to run somewhere at runtime.
When changing the slider values, you will notice that the separate segments of multi_line get joined together. It's a bug, and I just created https://github.com/bokeh/bokeh/issues/10589 for it.
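For the first note, a quick pre-check along these lines can flag the mismatched time column before the source is built. column_length_report is a hypothetical helper, not part of Bokeh, and the inner lists below keep only the first coordinate of each trajectory to stay short.

```python
from collections import Counter


def column_length_report(data):
    """Return {column: length} for each column whose length differs from the most common one.

    Hypothetical helper; Bokeh itself only warns when ColumnDataSource columns disagree.
    """
    lengths = {name: len(values) for name, values in data.items()}
    if not lengths:
        return {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    return {name: n for name, n in lengths.items() if n != expected}


# Same shape as the question's data (inner lists shortened): t has one entry too many.
data = {
    "x": [[48.7886], [48.7876], [48.7866], [48.7856], [48.7846]],
    "y": [[8.92], [8.91], [8.90], [8.89], [8.88]],
    "t": [0, 1, 2, 3, 4, 5],
    "v": [23, 24, 25, 24, 20],
}
print(column_length_report(data))  # {'t': 6}
```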
Here's a working example:
from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import GMapOptions, CDSView, IndexFilter
from bokeh.models.widgets import RangeSlider
from bokeh.plotting import gmap, ColumnDataSource, figure
lon = [[48.7886, 48.7887, 48.7888, 48.7889, 48.789],
[48.7876, 48.7877, 48.78878, 48.7879, 48.787],
[48.7866, 48.7867, 48.7868, 48.7869, 48.786],
[48.7856, 48.7857, 48.7858, 48.7859, 48.785],
[48.7846, 48.7847, 48.7848, 48.7849, 48.784]]
lat = [[8.92, 8.921, 8.922, 8.923, 8.924],
[8.91, 8.911, 8.912, 8.913, 8.914],
[8.90, 8.901, 8.902, 8.903, 8.904],
[8.89, 8.891, 8.892, 8.893, 8.894],
[8.88, 8.881, 8.882, 8.883, 8.884]]
time = [0, 1, 2, 3, 4]
velocity = [23, 24, 25, 24, 20]
lenght_dataset = len(lon)
# define source and map
source = ColumnDataSource(data={'x': lon, 'y': lat, 't': time, 'v': velocity})
view = CDSView(source=source, filters=[IndexFilter(list(range(lenght_dataset)))])
map_options = GMapOptions(lat=48.7886, lng=8.92, map_type="satellite", zoom=13)
p = gmap("API_KEY", map_options, title="Trajectory Map")
v = figure(plot_width=400, plot_height=400, title="Velocity")
p.multi_line('y', 'x', view=view, source=source, line_width=1)
v.line('t', 'v', view=view, source=source, line_width=3)
range_slider = RangeSlider(title="Data Range Slider: ", start=0, end=lenght_dataset, value=(0, lenght_dataset), step=1)
def slider_callback(attr, old, new):
    view.filters = [IndexFilter(list(range(*new)))]
range_slider.on_change('value', slider_callback)
layout = row(column(p, range_slider), column(v))
curdoc().add_root(layout)
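The callback above relies on new being the slider's (start, end) tuple. To check that mapping without running bokeh serve, the index computation can be factored into a small pure-Python function. indices_for_range is a hypothetical helper, and the int(round(...)) conversion is a defensive assumption in case the slider reports floats such as 3.0, which range() would reject.

```python
def indices_for_range(value):
    """Map a RangeSlider value (start, end) to the row indices an IndexFilter should keep.

    end is exclusive, matching list(range(*new)) in the callback above.
    """
    start, end = (int(round(v)) for v in value)
    return list(range(start, end))


print(indices_for_range((3, 25))[:3])  # [3, 4, 5]
print(indices_for_range((0.0, 5.0)))   # [0, 1, 2, 3, 4]
```

Inside the callback this would become view.filters = [IndexFilter(indices_for_range(new))].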

Plotting multiple curves (x, y1, y2, x, y3, y4) in the same plot

I'm trying to plot a graph with four different series of "y" values. I have 6 arrays: 2 of them hold the time values for the "x" axis, and the other 4 hold the corresponding "y" values (matched by position).
Example:
LT_TIME = ['18:14:17.566', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18:15:42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
The coordinates of the "LT" graph are:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5)
And with these coordinates, I can generate a graph with two "y" axes, which contains the points (-110,-113,-3,-5) and an "x" axis with the points ('18:14:17.566', '18:14:17.570').
Similarly, I can do the same with the "GNR" arrays. So how can I plot all the Cartesian points from both the "LT" and "GNR" arrays on the same graph? That is, how do I get the following coordinates on one graph:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5),
('18:15:42.489',-94), ('18:32:39.489',-94), ('18:15:42.489',-3), ('18:32:39.489',-7)

It sounds like your problem has two parts: formatting the data in a way that visualisation libraries would understand and actually visualising it using a dual axis.
Your example screenshot includes some interactive controls, so I suggest you use bokeh rather than matplotlib, since bokeh gives you zoom and pan for "free". Besides, I find that bokeh's way of adding a dual axis is more straightforward. If matplotlib is a must, here's another answer that should point you in the right direction.
For the first part, you can merge the data you have into a single dataframe, like so:
import pandas as pd
from bokeh.models import LinearAxis, Range1d, ColumnDataSource
from bokeh.plotting import figure, output_notebook, show
output_notebook() #if working in Jupyter Notebook, output_file() if not
LT_TIME = ['18:14:17.566', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18:15:42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
s1 = list(zip(LT_TIME, LT_RP)) + list(zip(GNR_TIME, GNR_RP))
s2 = list(zip(LT_TIME, LT_RQ)) + list(zip(GNR_TIME, GNR_RQ))
df1 = pd.DataFrame(s1, columns=["Date", "RP"])
df2 = pd.DataFrame(s2, columns=["Date", "RQ"])
df = df1.merge(df2, on="Date")
source = ColumnDataSource(df)
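Since each time stamp carries exactly one RP and one RQ value, the same frame can also be built directly from the lists, without zipping and merging. This is an alternative sketch, not what the answer uses. One reason to consider it: merging on "Date" would pair rows in every combination if an LT and a GNR sample ever shared a time stamp, whereas plain concatenation keeps one row per sample.

```python
import pandas as pd

LT_TIME = ['18:14:17.566', '18:14:17.570']
LT_RP = [-110, -113]
LT_RQ = [-3, -5]
GNR_TIME = ['18:15:42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]

# Concatenate the LT and GNR lists position by position: one row per time stamp.
df = pd.DataFrame({
    "Date": LT_TIME + GNR_TIME,
    "RP": LT_RP + GNR_RP,
    "RQ": LT_RQ + GNR_RQ,
})
print(df)
```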
To visualise the data as a dual axis line chart, we just need to specify the extra y-axis and position it in the layout:
p = figure(x_range=df["Date"], y_range=(-90, -120))
p.line(x="Date", y="RP", color="cadetblue", line_width=2, source=source)
p.extra_y_ranges = {"RQ": Range1d(start=0, end=-10)}
p.line(x="Date", y="RQ", color="firebrick", line_width=2, y_range_name="RQ", source=source)
p.add_layout(LinearAxis(y_range_name="RQ"), 'right')
show(p)
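With x_range=df["Date"], the time strings become categorical factors, so the four points are evenly spaced even though the two GNR samples are about 17 minutes apart. If the spacing should reflect elapsed time, one option is to parse the strings and use a datetime axis. This sketch assumes every string has the HH:MM:SS.fff shape (pandas then attaches a placeholder date of 1900-01-01) and otherwise reuses the dual-axis calls from the answer.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearAxis, Range1d
from bokeh.plotting import figure

df = pd.DataFrame({
    "Date": ['18:14:17.566', '18:14:17.570', '18:15:42.489', '18:32:39.489'],
    "RP": [-110, -113, -94, -94],
    "RQ": [-3, -5, -3, -7],
})
# Parse "HH:MM:SS.fff" strings into timestamps so x positions are proportional to time.
df["Time"] = pd.to_datetime(df["Date"], format="%H:%M:%S.%f")
source = ColumnDataSource(df)

p = figure(x_axis_type="datetime", y_range=(-90, -120))
p.line(x="Time", y="RP", color="cadetblue", line_width=2, source=source)
# Secondary y range and axis for RQ, as in the answer.
p.extra_y_ranges = {"RQ": Range1d(start=0, end=-10)}
p.line(x="Time", y="RQ", color="firebrick", line_width=2, y_range_name="RQ", source=source)
p.add_layout(LinearAxis(y_range_name="RQ"), "right")
# show(p)
```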

Is there a way to update legend patch labels using a CustomJS callback?

Using Bokeh 1.4 and Python 3.7. I have a set of patches whose color theme I'd like to switch between two different keys (and labels) from the same ColumnDataSource. I want to stick to one ColumnDataSource because my real file is quite large and the geometry (i.e. the xs and ys) is common to the two things I'd like to theme by.
See my working example:
from bokeh.io import show
from bokeh.models import ColumnDataSource,CustomJS, widgets, LinearColorMapper
from bokeh.palettes import RdBu6, Spectral11
from bokeh.plotting import figure
from bokeh.layouts import layout, column, row
source = ColumnDataSource(dict(
xs=[[1,2,2], [1,2,2], [3,4,4], [3,4,4]],
ys=[[3,3,4], [1,1,2], [3,3,4], [1,1,2]],
s1=[0, 50, 75, 50],
s2=[0, 25, 50, 75],
label_1=['Blue', 'Orangy', 'Red', 'Orangy'],
label_2=['S', 'P', 'E', 'C']
))
cmap1 = LinearColorMapper(palette='RdBu6', low = 0, high = 75)
cmap2 = LinearColorMapper(palette='Spectral11', low = 0, high = 75)
p = figure(x_range=(0, 7), y_range=(0, 5), plot_height=300)
patches = p.patches(xs='xs', ys='ys', fill_color={'field': 's1', 'transform': cmap1},
                    legend_field='label_1', source=source)
b = widgets.Button(label = 'RdBu')
b.js_on_click(CustomJS(args=dict(b=b,source=source,patches=patches,cmap1=cmap1,cmap2=cmap2,p=p),
code="""if (b.label == 'RdBu')
{b.label='Spectral';
patches.glyph.fill_color = {field: 's2',transform:cmap2};}
else if (b.label == 'Spectral')
{b.label='RdBu';
patches.glyph.fill_color = {field: 's1',transform:cmap1}}"""
))
layout=column(row(p),row(b))
show(layout)
Clicking the button shows that the fill_color part of the callback works correctly: the patch colors change, and even the colors in the legend change. However, I have been unable to find a way to make the CustomJS callback update the legend entries, so that after the click there would be 4 entries labeled 'S', 'P', 'E' and 'C'.
From what I can tell, when I create the patches object and specify a legend_field argument, it constructs a legend for me with some sort of groupby/aggregate function to generate unique legend entries for me, and then it adds that legend to the figure object?
So that led me down the path of trying to drill down into p.legend:
p.legend.items #returns a list containing one LegendItem object
p.legend.items[0].label #returns a dictionary: {'field': 'label_1'}
I tried putting p.legend.items[0].label['field'] = 'label_2' outside of the callback and it worked as I hoped - the legend now reads S,P,E,C. But when I try putting that into the callback code it doesn't seem to update:
b.js_on_click(CustomJS(args=dict(b=b,source=source,patches=patches,cmap1=cmap1,cmap2=cmap2,p=p),
code="""if (b.label == 'RdBu')
{b.label='Spectral';
patches.glyph.fill_color = {field: 's2',transform:cmap2};
p.legend.items[0].label['field']='label_2'}
else if (b.label == 'Spectral')
{b.label='RdBu';
patches.glyph.fill_color = {field: 's1',transform:cmap1}
p.legend.items[0].label['field']='label_1'}"""
))
I feel like I'm very close but just missing one or two key things.... any advice/help appreciated!

Solution from Carolyn here: https://discourse.bokeh.org/t/is-there-a-way-to-update-legend-patch-labels-using-a-customjs-callback/4504
... I was really close.
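For readers who can't follow the link, here is a sketch of one approach; it has not been checked against Bokeh 1.4 specifically. As far as we know, p.legend is a Python-side convenience that looks up the figure's Legend models, and the JavaScript Plot object has no legend attribute, so p.legend.items inside CustomJS fails. Passing the Legend model itself through args avoids that. The sketch also assigns a new label object instead of mutating label['field'] in place, on the assumption that BokehJS only re-renders the legend when the property itself is reassigned.

```python
from bokeh.layouts import column
from bokeh.models import Button, ColumnDataSource, CustomJS, LinearColorMapper
from bokeh.palettes import RdBu6, Spectral11
from bokeh.plotting import figure

source = ColumnDataSource(dict(
    xs=[[1, 2, 2], [1, 2, 2], [3, 4, 4], [3, 4, 4]],
    ys=[[3, 3, 4], [1, 1, 2], [3, 3, 4], [1, 1, 2]],
    s1=[0, 50, 75, 50],
    s2=[0, 25, 50, 75],
    label_1=['Blue', 'Orangy', 'Red', 'Orangy'],
    label_2=['S', 'P', 'E', 'C'],
))
cmap1 = LinearColorMapper(palette=RdBu6, low=0, high=75)
cmap2 = LinearColorMapper(palette=Spectral11, low=0, high=75)

p = figure(x_range=(0, 7), y_range=(0, 5))
patches = p.patches(xs='xs', ys='ys', fill_color={'field': 's1', 'transform': cmap1},
                    legend_field='label_1', source=source)
legend = p.legend[0]  # the Legend model that legend_field created

b = Button(label='RdBu')
callback = CustomJS(args=dict(b=b, patches=patches, legend=legend, cmap1=cmap1, cmap2=cmap2), code="""
    // Assign whole new objects so BokehJS registers a property change.
    if (b.label == 'RdBu') {
        b.label = 'Spectral';
        patches.glyph.fill_color = {field: 's2', transform: cmap2};
        legend.items[0].label = {field: 'label_2'};
    } else {
        b.label = 'RdBu';
        patches.glyph.fill_color = {field: 's1', transform: cmap1};
        legend.items[0].label = {field: 'label_1'};
    }
""")
b.js_on_click(callback)
layout = column(p, b)
# show(layout)  # from bokeh.io import show
```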

python bokeh: update scatter plot colors on callback

I only started to use Bokeh recently. I have a scatter plot in which I would like to color each marker according to a certain third property (say a quantity, while the x-axis is a date and the y-axis is a given value at that point in time).
Assuming my data is in a data frame, I managed to do this using a linear color map as follows:
min_q = df.quantity.min()
max_q = df.quantity.max()
mapper = linear_cmap(field_name='quantity', palette=palettes.Spectral6, low=min_q, high=max_q)
source = ColumnDataSource(data=get_data(df))
p = figure(x_axis_type="datetime")
p.scatter(x="date_column", y="value", marker="triangle", fill_color=mapper, line_color=None, source=source)
color_bar = ColorBar(color_mapper=mapper['transform'], width=8, location=(0,0))
p.add_layout(color_bar, 'right')
This seems to work as expected when I start the bokeh server.
Then I have a callback function update() triggered upon changing value in some widget (a select or a time picker).
def update():
    # get new df (according to new date/select)
    df = get_df()
    # update min/max for colormap
    min_q = df.quantity.min()
    max_q = df.quantity.max()
    # I think I should not create a new mapper but doing so I get closer
    mapper = linear_cmap(field_name='quantity', palette=palettes.Spectral6, low=min_q, high=max_q)
    color_bar.color_mapper = mapper['transform']
    source.data = get_data(df)
    # etc
This is the closest I could get. The color bar is updated with the new values, but the marker colors still follow the original mapping. For example, for a given quantity I would expect green, but the marker is blue, because it is still treated as < 4000, as in the map of the first plot before the callback.
Should I just add a "color" column to the data frame? I feel there is an easier/more convenient way to do that.
EDIT: Here is a minimal working example using the answer by bigreddot:
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import Button, ColumnDataSource, ColorBar, HoverTool
from bokeh.palettes import Spectral6
from bokeh.transform import linear_cmap
import numpy as np
x = [1,2,3,4,5,7,8,9,10]
y = [1,2,3,4,5,7,8,9,10]
z = [1,2,3,4,5,7,8,9,10]
source = ColumnDataSource(dict(x=x, y=y, z=z))
#Use the field name of the column source
mapper = linear_cmap(field_name='z', palette=Spectral6 ,low=min(y) ,high=max(y))
p = figure(plot_width=300, plot_height=300, title="Linear Color Map Based on Y")
p.circle(x='x', y='y', line_color=mapper,color=mapper, fill_alpha=1, size=12, source=source)
color_bar = ColorBar(color_mapper=mapper['transform'], width=8, location=(0,0))
p.add_tools(HoverTool(tooltips="@z", show_arrow=False, point_policy='follow_mouse'))
p.add_layout(color_bar, 'right')
b = Button()
def update():
    new_z = np.exp2(z)
    mapper = linear_cmap(field_name='z', palette=Spectral6, low=min(new_z), high=max(new_z))
    color_bar.color_mapper = mapper['transform']
    source.data = dict(x=x, y=y, z=new_z)
b.on_click(update)
curdoc().add_root(column(b, p))
Upon update, the circles are colored according to the original scale: everything bigger than 10 is red. Instead, I would expect everything to be blue except the last 3 circles on top, which should be colored green, yellow and red respectively.

It's possible that is a bug, feel free to open a GitHub issue.
That said, the above code does not represent best practices for Bokeh usage, which is: always make the smallest update possible. In this case, this means setting new property values on the existing color transform, rather than replacing the existing color transform.
Here is a complete working example (made with Bokeh 1.0.2) that demonstrates the glyph's colormapped colors updating in response to the data column changing:
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import Button, ColumnDataSource, ColorBar
from bokeh.palettes import Spectral6
from bokeh.transform import linear_cmap
import numpy as np
x = [1,2,3,4,5,7,8,9,10]
y = [1,2,3,4,5,7,8,9,10]
z = [1,2,3,4,5,7,8,9,10]
#Use the field name of the column source
mapper = linear_cmap(field_name='z', palette=Spectral6 ,low=min(y) ,high=max(y))
source = ColumnDataSource(dict(x=x, y=y, z=z))
p = figure(plot_width=300, plot_height=300, title="Linear Color Map Based on Y")
p.circle(x='x', y='y', line_color=mapper,color=mapper, fill_alpha=1, size=12, source=source)
color_bar = ColorBar(color_mapper=mapper['transform'], width=8, location=(0,0))
p.add_layout(color_bar, 'right')
b = Button()
def update():
    new_z = np.exp2(z)
    # update the existing transform in place instead of replacing it
    mapper['transform'].low = min(new_z)
    mapper['transform'].high = max(new_z)
    source.data = dict(x=x, y=y, z=new_z)
b.on_click(update)
curdoc().add_root(column(b, p))
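To see what the two scales imply for the nine circles, here is a pure-Python approximation of linear color bucketing. palette_index is a hypothetical helper, and its rules (values at or below low get the first color, values at or above high get the last, everything in between is split into equal-width bins) are an assumption about LinearColorMapper's behavior rather than a copy of its source; Bokeh can also use separate low_color/high_color for out-of-range values.

```python
import math


def palette_index(value, low, high, n_colors):
    """Approximate which of n_colors palette slots a linear color mapper picks for value."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    return math.floor((value - low) * n_colors / (high - low))


z = [1, 2, 3, 4, 5, 7, 8, 9, 10]
new_z = [2 ** v for v in z]  # same values as np.exp2(z)

stale = [palette_index(v, 1, 10, 6) for v in new_z]                      # old low/high
updated = [palette_index(v, min(new_z), max(new_z), 6) for v in new_z]   # new low/high
print(stale)    # [0, 2, 4, 5, 5, 5, 5, 5, 5]
print(updated)  # [0, 0, 0, 0, 0, 0, 1, 2, 5]
```

With Spectral6, index 0 is the blue end and index 5 the red end, so the stale scale paints everything above 10 red, while the updated low/high leave only the top three circles in the green-to-red range.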

Jupyter Bokeh: Non-existent column name in glyph renderer

I have a GlyphRenderer whose data_source.data is
{'index': [0, 1, 2, 3, 4, 5, 6, 7],
'color': ['#3288bd', '#66c2a5', '#abdda4', '#e6f598', '#fee08b', '#fdae61', '#f46d43', '#d53e4f']}
The renderer's glyph is
Oval(height=0.1, width=0.2, fill_color="color")
When rendering, I see
E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: color [renderer: GlyphRenderer(id='1d1031f5-6ee3-4744-a0f7-22309798e313', ...)]
I'm clearly missing something, but this is pretty much lifted from published examples. I verified in a debugger that data_source.column_names is just ['index']; what I don't understand is why the 'color' column doesn't appear in the data source's column_names, or why Bokeh produces this warning (the graph appears to be correctly rendered).
The complete source is available at https://pastebin.com/HXAEEujP

It's generally better to provide all relevant arguments when constructing an object rather than mutating the object after it's already been created. It's especially true for Bokeh - in many cases it does some additional work based on the arguments passed to __init__.
Take a look at this version of your code:
import math
from bokeh.io import show
from bokeh.models import GraphRenderer, StaticLayoutProvider, Oval, GlyphRenderer, ColumnDataSource, MultiLine
from bokeh.palettes import Spectral8
from bokeh.plotting import figure
N = 8
node_indices = list(range(N))
plot = figure(title="Graph Layout Demonstration", x_range=(-1.1, 1.1), y_range=(-1.1, 1.1),
plot_width=250, plot_height=250,
tools="", toolbar_location=None)
node_ds = ColumnDataSource(data=dict(index=node_indices,
color=Spectral8),
name="Node Renderer")
edge_ds = ColumnDataSource(data=dict(start=[0] * N,
end=node_indices),
name="Edge Renderer")
### start of layout code
circ = [i * 2 * math.pi / 8 for i in node_indices]
x = [math.cos(i) for i in circ]
y = [math.sin(i) for i in circ]
graph_layout = dict(zip(node_indices, zip(x, y)))
graph = GraphRenderer(node_renderer=GlyphRenderer(glyph=Oval(height=0.1, width=0.2, fill_color="color"),
                                                  data_source=node_ds),
                      edge_renderer=GlyphRenderer(glyph=MultiLine(),
                                                  data_source=edge_ds),
                      layout_provider=StaticLayoutProvider(graph_layout=graph_layout))
plot.renderers.append(graph)
show(plot)
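When this error appears, a quick diagnostic is to compare the field names a glyph refers to with the columns of the data source actually attached to that renderer, as the question did in the debugger. missing_columns is a hypothetical helper that takes the field names explicitly rather than inspecting the glyph. Columns passed at construction show up immediately in column_names, and ColumnDataSource.add can append one afterwards.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral8


def missing_columns(source, fields):
    """Return the referenced field names that `source` does not contain (hypothetical helper)."""
    return [f for f in fields if f not in source.column_names]


N = 8
# Both columns passed at construction, as in the answer's node_ds.
node_ds = ColumnDataSource(data=dict(index=list(range(N)), color=list(Spectral8)))
print(missing_columns(node_ds, ["color"]))  # []

# A source holding only "index" reproduces the situation described in the question.
bare_ds = ColumnDataSource(data=dict(index=list(range(N))))
print(missing_columns(bare_ds, ["color"]))  # ['color']

# ColumnDataSource.add(data, name) appends a column after construction.
bare_ds.add(list(Spectral8), "color")
print(missing_columns(bare_ds, ["color"]))  # []
```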
