Bokeh not interpreting correct scale with SQL data

Bokeh not interpreting correct scale with SQL data - python

Explored lots of various solutions here but not finding one that works. I'm using sqlite and pandas to read data from a SQL database, but Bokeh doesn't like the date. I've tried conversions to datetime, unixepoch, etc. and they all seem to yield the same result.
EDIT: Here's the full code:
from os.path import dirname, join
import pandas as pd
import pandas.io.sql as psql
import numpy as np
import sqlite3
import os
from math import pi
from bokeh.plotting import figure, output_file, show
from bokeh.io import output_notebook, curdoc
from bokeh.models import ColumnDataSource, Div, DatetimeTickFormatter
from bokeh.models.widgets import Slider, Select, RadioButtonGroup
from bokeh.layouts import layout, widgetbox
import warnings
import datetime
warnings.filterwarnings('ignore')
## Set up the SQL Connection
conn = sqlite3.connect('/Users/<>/Documents/python_scripts/reptool/reptool_db')
c = conn.cursor()
## Run the SQL
proj = pd.read_sql(
"""
SELECT
CASE WHEN df is null THEN ds ELSE df END AS 'projdate',
CASE WHEN yhat is null THEN y ELSE yhat END AS 'projvol',
strftime('%Y',ds) as 'year'
FROM forecast
LEFT JOIN actuals
ON forecast.ds = actuals.df
""", con=conn)
# HTML index page and inline CSS stylesheet
desc = Div(text=open("/Users/<>/Documents/python_scripts/reptool/description.html").read(), width=800)
## Rename Columns and create list sets
proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)
x=list(proj['x'])
y=list(proj['y'])
# proj['projdate'] = [datetime.datetime.strptime(x, "%Y-%m-%d").date() for x in proj['projdate']]
# Create input controls
radio_button_group = RadioButtonGroup(
labels=["Actuals", "Forecast","FY Projection"], active=0)
min_year = Slider(title="Period Start", start=2012, end=2018, value=2013, step=1)
max_year = Slider(title="Period End", start=2012, end=2018, value=2017, step=1)
## Declare systemic source
source = ColumnDataSource(data=dict(x=[], y=[], year=[]))
## Bokeh tools
TOOLS="pan,wheel_zoom,box_zoom,reset,xbox_select"
## Set up plot
p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")
p.line(x=proj.index, y=y, line_width=2, line_alpha=0.6)
p.xaxis.major_label_orientation = pi/4
# p.xaxis.formatter = DatetimeTickFormatter(seconds=["%Y:%M"],
# minutes=["%Y:%M"],
# minsec=["%Y:%M"],
# hours=["%Y:%M"])
# axis map
# definitions
def select_rep():
selected = proj[
(proj.year >= min_year.value) &
(proj.year >= max_year.value)
]
return selected
def update():
proj = select_rep()
source.data = dict(
year=proj["year"]
)
controls = [min_year, max_year]
for control in controls:
control.on_change('value', lambda attr, old, new: update())
sizing_mode = 'fixed' # 'scale_width' also looks nice with this example
## Build the html page and inline CSS
inputs = widgetbox(*controls)
l = layout([
[desc],
[p],
[inputs],
], )
# update()
curdoc().add_root(l)
curdoc().title = "REP"
The SQLite output in Terminal.app looks like this:
SQL
The result is, that the x-axis displays in milliseconds. Also, the y-axis is showing up as exponential notation:
Bokeh Plot
The issue seems somehow related to pandas use of indexing, and thus I can't reference "x" here. I rename the columns and force list sets which, by themselves, will print correctly... and should therefore plot into the line properly but as you'll see below, they don't:
proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)
x=list(proj['x'])
y=list(proj['y'])
To get the line to render in Bokeh, I have to pass it the index because passing it anything else doesn't seem to get the glyph to render. So currently I have this:
p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")
p.line(x=proj.index, y=y, line_width=2, line_alpha=0.6)
Tried converting to unixepoch in the SQL, same result.
Tried converting to unixepoch in the data, same result.
Tried using DateTimeTickFormatter, just shows all 5-6 years as one year (thinking it's just displaying the milliseconds as years rather than changing them from milliseconds to days.
I've looked here and in github, up and down, and tried different things but ultimately I can't find one working example where the source is a sql query not a csv.

None of these things have anything to do with SQL, Bokeh only cares about the data that you give it, not where it came from. You have specified that you want a datetime axis on the x-axis:
x_axis_type='datetime'
So, Bokeh will set up the plot with a ticker that picks "nice" values on a datetime scale, and with a tick formatter that displays tick locations as formatted dates. What is important, however, is that the data coordinates are in the appropriate units, which are floating point milliseconds since epoch.
You can provide x values directly in these units, but Bokeh will also automatically convert common datetime types (e.g. python stdlib, numpy, or pandas) to the right units automatically. So the easiest thing for you to do is pass a column of datetime values as the x values to line.
To be clear, this statement:
To render the line in Bokeh, it has to use the index
is incorrect. You can pass any dataframe column you like as the x-values, and I am suggesting you pass a column of datetimes.

I changed a line of the SQL to:
CASE WHEN df is null THEN strftime('%Y',ds) ELSE strftime('%Y',df) END AS 'projdate',
However, when I try expanding that specifier to %Y-%m-%d %H-%m-%s it just reads it as a string all over again.
And also by re-importing the data I was able to pass the date through here without using Index:
p.line(x=x, y=y, line_width=2, line_alpha=0.6)
But then I get this weird output: link.
So it's clear that it can read the year, but I need to pass through the full date to display the time series forecast. And it's still displaying the dates and y-values in the incorrect scale, regardless.
Going to noodle on this some more but if anyone has other suggestions, I'm thankful.

SOLVED the datetime problem. Added this after the SQL query:
proj['projdate'] = proj['projdate'].astype('datetime64[ns]')
Which in turn yields this:
Bokeh Plot
Still got a problem with the x-axis but since that's a straight numerical value, x_axis_type should fix it.
So far the working code looks like this (again, still iterating to add other controls but everything about the Bokeh plot itself works as intended):
# main.py
# created by: <>
# version: 0.1.2
# created date: 07-Aug-2018
# modified date: 09-Aug-2018
from os.path import dirname, join
import pandas as pd
import pandas.io.sql as psql
import numpy as np
import sqlite3
import os
from math import pi
from bokeh.plotting import figure, output_file, show
from bokeh.io import output_notebook, curdoc
from bokeh.models import ColumnDataSource, Div, DatetimeTickFormatter
from bokeh.models.widgets import Slider, Select, RadioButtonGroup
from bokeh.layouts import layout, widgetbox
import warnings
import datetime
warnings.filterwarnings('ignore')
## Set up the SQL Connection
conn = sqlite3.connect('/Users/<>/Documents/python_scripts/reptool/reptool_db')
c = conn.cursor()
## Run the SQL
proj = pd.read_sql(
"""
SELECT
CASE WHEN df is null THEN strftime('%Y-%m-%d',ds) ELSE strftime('%Y-%m-%d',df) END AS 'projdate',
CASE WHEN yhat is null THEN y ELSE yhat END AS 'projvol',
strftime('%Y',ds) as 'year'
FROM forecast
LEFT JOIN actuals
ON forecast.ds = actuals.df
""", con=conn)
proj['projdate'] = proj['projdate'].astype('datetime64[ns]')
# HTML index page and inline CSS stylesheet
desc = Div(text=open("/Users/<>/Documents/python_scripts/reptool/description.html").read(), width=800)
## Rename Columns and create list sets
proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)
x=list(proj['x'])
y=list(proj['y'])
# Create input controls
radio_button_group = RadioButtonGroup(
labels=["Actuals", "Forecast","FY Projection"], active=0)
min_year = Slider(title="Period Start", start=2012, end=2018, value=2013, step=1)
max_year = Slider(title="Period End", start=2012, end=2018, value=2017, step=1)
## Declare systemic source
source = ColumnDataSource(data=dict(x=[], y=[], year=[]))
## Bokeh tools
TOOLS="pan,wheel_zoom,box_zoom,reset,xbox_select"
## Set up plot
p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")
p.line(x=x, y=y, line_width=2, line_alpha=0.6)
p.xaxis.major_label_orientation = pi/4
# p.xaxis.formatter = DatetimeTickFormatter(seconds=["%Y:%M"],
# minutes=["%Y:%M"],
# minsec=["%Y:%M"],
# hours=["%Y:%M"])
# axis map
# definitions
def select_rep():
selected = proj[
(proj.year >= min_year.value) &
(proj.year >= max_year.value)
]
return selected
def update():
proj = select_rep()
source.data = dict(
year=proj["year"]
)
controls = [min_year, max_year]
for control in controls:
control.on_change('value', lambda attr, old, new: update())
sizing_mode = 'fixed' # 'scale_width' also looks nice with this example
## Build the html page and inline CSS
inputs = widgetbox(*controls)
l = layout([
[desc],
[p],
[inputs],
], )
# update()
curdoc().add_root(l)
curdoc().title = "REP"

Related

How to put interactive bokeh HTML image in Jupyterlab page

I'm using Jupyterlab (v 3.2.1) and bokeh to create a webpage that allows a user to load a .csv file containing a matrix, and a slider to optionally set a threshold on displayed results. The matrix contains simply some numerical values. The result would be an interactive heatmap displayed below the confirmation button. Whit my code the webpage is displayed correctly but the final plot is displayed in a new tab:
import warnings
warnings.filterwarnings('ignore')
import jupyter_bokeh
import ipywidgets as widgets
import pandas as pd
import io
from bokeh.io import show
from bokeh.models import ColorBar, ColumnDataSource, CategoricalColorMapper
from bokeh.plotting import figure
from bokeh.transform import transform
import bokeh.palettes
from IPython.display import display, clear_output, display_html
from bokeh.resources import CDN
from bokeh.embed import file_html
from bokeh.layouts import layout
#Display the webpage
file = widgets.FileUpload(accept=".txt, .csv, .dat", multiple=False)
threshold=widgets.IntSlider(value=0, min=0, max=20, step=1, description="Threshold:", disabled=False, continuous_update=False, orintation='horizontal', readout=True, readout_format="d")
button = widgets.Button(description='Run code')
text_0 = widgets.HTML(value="<header><h1>Phenotype Major Categories vs Genes Heatmap</h1></header>")
text_1 = widgets.HTML(value="<h3>Welcome to the heatmap plotter. By loading a csv file containing the counts of phenoypes for a gene into an IMPC major phenotype category, it will display an interactive heatmap.</h3>")
text_2 = widgets.HTML(value="Please load yor file (accepted formats: csv, txt, dat):")
text_3 = widgets.HTML(value="If desired, set a threshold for counts to be displayed:")
text_4 = widgets.HTML(value="<h2>Heatmap:</h2>")
vbox_head = widgets.VBox([text_0, text_1])
page_layout_plot = [text_2, file, text_3, threshold, button]
vbox_text = widgets.VBox(page_layout_plot)
page = widgets.VBox([vbox_head,vbox_text])
display(page)
#Set the endpage button to run the code
def on_button_clicked(result):
#Load the file and set the threshold
inp = list(file.value.values())[0] #if multiple setted to true, will not work!
content = inp['content']
content = io.StringIO(content.decode('utf-8'))
mat = pd.read_csv(content, sep="\t", index_col=0)
mat.index.name = 'MGI_id'
mat.columns.name = 'phen_sys'
#filtering phase
rem=[]
x = int(threshold.value)
if x != 0:
for i in mat.index:
if mat.loc[i].max() < x:
rem.append(i)
mat.drop(rem,inplace=True,axis=0)
#Create a custom palette and add a specific mapper to map color with values, we are converting them to strings to create a categorical color mapper to include only the
#values that we have in the matrix and retrieve a better representation
df = mat.stack(dropna=False).rename("value").reset_index()
fact= df.value.unique()
fact.sort()
fact = fact.astype(str)
df.value = df.value.astype(str)
mapper = CategoricalColorMapper(palette=bokeh.palettes.inferno(len(df.value.unique())), factors= fact, nan_color = 'gray')
#Define a figure
p = figure(
plot_width=1280,
plot_height=800,
x_range=list(df.phen_sys.drop_duplicates()[::-1]),
y_range=list(df.MGI_id.drop_duplicates()),
tooltips=[('Phenotype system','#phen_sys'),('Gene','#MGI_id'),('Phenotypes','#value')],
x_axis_location="above",
output_backend="webgl")
#Create rectangles for heatmap
p.rect(
x="phen_sys",
y="MGI_id",
width=1,
height=1,
source=ColumnDataSource(df),
fill_color=transform('value', mapper))
p.xaxis.major_label_orientation = 45
#Add legend
color_bar = ColorBar(
color_mapper=mapper,
label_standoff=6,
border_line_color=None)
p.add_layout(color_bar, 'right')
show(p)
button.on_click(on_button_clicked)
I already tried to use output_notebook() at the beginning but in that case nothing is displayed.
How can I fix it? It would be useful to display in real time the plot by changing the threshold without the need to click the confirmation button every time.
Thank you for all the help.

You might need to observe the value attribute of your treshold object to refresh your plot. So add something like this at the end of your code:
def on_value_change(change):
on_button_clicked(None)
threshold.observe(on_value_change, names='value')
More from the doc: https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html#Signatures

How to plot visualization with Interactive Feature Selection in Bokeh, Python

The task is to automate the Visualization. The CSV file contains large nos of features (column names e:g. 32 nos it may increase in future). The task is to plot Interactive Visualization. All the examples I found are hardcoded for the dynamic features selection.
But the requirement is to make the stuff dynamic. How to make it dynamic? Please guide.
I have successfully plotted the graph dynamically, but could not connect the interactive part. The code is as follows:
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.models import CustomJS,HoverTool,ColumnDataSource,Select
from bokeh.models.widgets import CheckboxGroup
from bokeh.models.annotations import Title, Legend
import itertools
from bokeh.palettes import inferno
from bokeh.layouts import row
def creat_plot(dataframe):
data=dataframe
#Converting the timestamp Column to Timestamp datatype so that it can be used for Plotting on X-axis
data['timestamp'] = pd.to_datetime(data['timestamp'])
#Segregating Date and Time from timestamp column. It will be used in Hover Tool
date = lambda x: str(x)[:10]
data['date'] = data[['timestamp']].applymap(date)
time= lambda x: str(x)[11:]
data['time'] = data[['timestamp']].applymap(time)
#Converting whole dataframe ColumnDatasource for easy usage in hover tool
source = ColumnDataSource(data)
# List all the tools that you want in your plot separated by comas, all in one string.
TOOLS="crosshair,pan,wheel_zoom,box_zoom,reset,hover"
# New figure
t = figure(x_axis_type = "datetime", width=1500, height=600,tools=TOOLS,title="Plot for Interactive Features")
#X-axis Legend Formatter
t.xaxis.formatter.days = '%d/%m/%Y'
#Axis Labels
t.yaxis.axis_label = 'Count'
t.xaxis.axis_label = 'Date and Time Span'
#Grid Line Formatter
t.ygrid.minor_grid_line_color = 'navy'
t.ygrid.minor_grid_line_alpha = 0.1
t.xgrid.visible = True
t.ygrid.visible= True
#Hover Tool Usage
t.select_one(HoverTool).tooltips = [('Date', '#date'),('Time', '#time')]
#A color iterator creation
colors = itertools.cycle(inferno(len(data.columns)))
#A Line type iterator creation
line_types= ['solid','dashed','dotted','dotdash','dashdot']
lines= itertools.cycle(line_types)
column_name=[]
#Looping over the columns to plot the Data
for m in data.columns[2:len(data.columns)-2]:
column_name.append(m)
a=t.line(data.columns[0], m ,color=next(colors),source=source,line_dash=next(lines), alpha= 1)
#Adding Label Selection Check Box List
column_name= list(column_name)
checkboxes = CheckboxGroup(labels = column_name, active= [0,1,2])
show(row(t,checkboxes))
The above function can be used as follows:
dataframe= pd.read_csv('data.csv')
creat_plot(dataframe)
**The above code is executed on following requirements:
Bokeh version: 2.2.3
Panda Version: 1.1.3
The plot should be linked with the checkbox values. The features selected through the checkboxes shall be plotted only.
The solution to the above requirement is as follows:
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show,output_file
from bokeh.models import CustomJS,HoverTool,ColumnDataSource,Select
from bokeh.models.widgets import CheckboxGroup
from bokeh.models.annotations import Title, Legend
import itertools
from bokeh.palettes import inferno
from bokeh.layouts import row
def creat_plot(dataframe):
data=dataframe
#Converting the timestamp Column to Timestamp datatype so that it can be used for Plotting on X-axis
data['timestamp'] = pd.to_datetime(data['timestamp'])
#Segregating Date and Time from timestamp column. It will be used in Hover Tool
date = lambda x: str(x)[:10]
data['date'] = data[['timestamp']].applymap(date)
time= lambda x: str(x)[11:]
data['time'] = data[['timestamp']].applymap(time)
#Converting whole dataframe ColumnDatasource for easy usage in hover tool
source = ColumnDataSource(data)
# List all the tools that you want in your plot separated by comas, all in one string.
TOOLS="crosshair,pan,wheel_zoom,box_zoom,reset,hover"
# New figure
t = figure(x_axis_type = "datetime", width=1500, height=600,tools=TOOLS,title="Plot for Interactive Visualization")
#X-axis Legend Formatter
t.xaxis.formatter.days = '%d/%m/%Y'
#Axis Labels
t.yaxis.axis_label = 'Count'
t.xaxis.axis_label = 'Date and Time Span'
#Grid Line Formatter
t.ygrid.minor_grid_line_color = 'navy'
t.ygrid.minor_grid_line_alpha = 0.1
t.xgrid.visible = True
t.ygrid.visible= True
#Hover Tool Usage
t.select_one(HoverTool).tooltips = [('Date', '#date'),('Time', '#time')]
#A color iterator creation
colors = itertools.cycle(inferno(len(data.columns)))
#A Line type iterator creation
line_types= ['solid','dashed','dotted','dotdash','dashdot']
lines= itertools.cycle(line_types)
feature_lines = []
column_name=[]
#Looping over the columns to plot the Data
for m in data.columns[2:len(data.columns)-2]:
column_name.append(m)
#Solution to my question is here
feature_lines.append(t.line(data.columns[0], m ,color=next(colors),source=source,line_dash=next(lines), alpha= 1, visible=False))
#Adding Label Selection Check Box List
column_name= list(column_name)
#Solution to my question,
checkbox = CheckboxGroup(labels=column_name, active=[])
#Solution to my question
callback = CustomJS(args=dict(feature_lines=feature_lines, checkbox=checkbox), code="""
for (let i=0; i<feature_lines.length; ++i) {
feature_lines[i].visible = i in checkbox.active
}
""")
checkbox.js_on_change('active', callback)
output_file('Interactive_data_visualization.html')
show(row(t, checkbox))

bokeh: How to edit a df or CDS-object through box_select?

I'm trying to label a pandas-df (containing timeseries data) with the help of
a bokeh-lineplot, box_select tool and a TextInput widget in a jupyter-notebook. How can I access the by the box_select selected data points?
I tried to edit a similar problems code (Get selected data contained within box select tool in Bokeh) by changing the CustomJS to something like:
source.callback = CustomJS(args=dict(p=p), code="""
var inds = cb_obj.get('selected')['1d'].indices;
[source.data['xvals'][i] for i in inds] = 'b'
"""
)
but couldn't apply a change on the source of the selected points.
So the shortterm goal is to manipulate a specific column of source of the selected points.
Longterm I want to use a TextInput widget to label the selected points by the supplied Textinput. That would look like:
EDIT:
That's the current code I'm trying in the notebook, to reconstruct the issue:
from random import random
import bokeh as bk
from bokeh.layouts import row
from bokeh.models import CustomJS, ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file, show, output_notebook
output_notebook()
x = [random() for x in range(20)]
y = [random() for y in range(20)]
hovertool=HoverTool(tooltips=[("Index", "$index"), ("Label", "#label")])
source = ColumnDataSource(data=dict(x=x, y=y, label=[i for i in "a"*20]))
p1 = figure(plot_width=400, plot_height=400, tools="box_select", title="Select Here")
p1.circle('x', 'y', source=source, alpha=0.6)
p1.add_tools(hovertool)
source.selected.js_on_change('indices', CustomJS(args=dict(source=source), code="""
var inds = cb_obj.indices;
for (var i = 0; i < inds.length; i++) {
source.data['label'][inds[i]] = 'b'
}
source.change.emit();
""")
)
layout = row(p1)
show(layout)

The main thing to note is that BokehJS can only automatically notice updates when actual assignments are made, e.g.
source.data = some_new_data
That would trigger an update. If you update the data "in place" then BokehJS is not able to notice that. You will have to be explicit and call source.change.emit() to let BokehJS know something has been updated.
However, you should also know that you are using three different things that are long-deprecated and will be removed in the release after next.
cb_obj.get('selected')
There is no need to ever use .get You can just access properties directly:
cb_obj.selected
The ['1d'] syntax. This dict approach was very clumsy and will be removed very soon. For most selections you want the indices property of the selection:
source.selected.indices
source.callback
This is an ancient ad-hoc callback. There is a newer general mechanism for callbacks on properties that should always be used instead
source.selected.js_on_change('indices', CustomJS(...))
Note that in this case, the cb_obj is the selection, not the data source.

With the help of this guide on how to embed a bokeh server in the notebook I figured out the following minimal example for my purpose:
from random import random
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.layouts import column
from bokeh.models import Button
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource, BoxSelectTool
from bokeh.models.widgets import TextInput
output_notebook()
def modify_doc(doc):
# create a plot and style its properties
TOOLS="pan,wheel_zoom,reset"
p = figure(title = "My chart", tools=TOOLS)
p.xaxis.axis_label = 'X'
p.yaxis.axis_label = 'Y'
hovertool=HoverTool(tooltips=[("Index", "$index"), ("Label", "#label")])
source = ColumnDataSource(
data=dict(
xvals=list(range(0, 10)),
yvals=list(np.random.normal(0, 1, 10)),
label = [i for i in "a"*10]
))
p.scatter("xvals", "yvals",source=source, color="white")
p.line("xvals", "yvals",source=source)
p.add_tools(BoxSelectTool(dimensions="width"))
p.add_tools(hovertool)
# create a callback that will add a number in a random location
def callback():
inds = source.selected.indices
for i in inds:
source.data['label'][i] = label_input.value.strip()
print(source.data)
new_data = pd.DataFrame(source.data)
new_data.to_csv("new_data.csv", index=False)
# TextInput to specify the label
label_input = TextInput(title="Label")
# add a button widget and configure with the call back
button = Button(label="Label Data")
button.on_click(callback)
# put the button and plot in a layout and add to the document
doc.add_root(column(button,label_input, p))
show(modify_doc, notebook_url="http://localhost:8888")
That generates the following UI:
BTW: Due to the non-existing box_select tool for the line glyph I use a workaround by combining it with invisible scatter points.
So far so good, is there a more elegant way to access the data.source/new_data df in the notebook outside modify_doc() than exporting it within the callback?

Bokeh multiline plot

I am trying to plot RPI, CPI and CPIH on one chart with a HoverTool showing the value of each when you pan over a given area of the chart.
I initially tried adding each line separately using line() which kind of worked:
However, the HoverTool only works correctly when you scroll over the individual lines.
I have tried using multi_line() like:
combined_inflation_metrics = 'combined_inflation_metrics.csv'
df_combined_inflation_metrics = pd.read_csv(combined_inflation_metrics)
combined_source = ColumnDataSource(df_combined_inflation_metrics)
l.multi_line(xs=['Date','Date','Date'],ys=['RPI', 'CPI', 'CPIH'], source=combined_source)
#l.multi_line(xs=[['Date'],['Date'],['Date']],ys=[['RPI'], ['CPI'], ['CPIH']], source=combined_source)
show(l)
However, this is throwing the following:
RuntimeError:
Supplying a user-defined data source AND iterable values to glyph methods is
not possibe. Either:
Pass all data directly as literals:
p.circe(x=a_list, y=an_array, ...)
Or, put all data in a ColumnDataSource and pass column names:
source = ColumnDataSource(data=dict(x=a_list, y=an_array))
p.circe(x='x', y='y', source=source, ...)
But I am not too sure why this is?
Update:
I figured out a workaround by adding all of the values in each of the data sources. It works, but doesn't feel most efficient and would still like to know how to do this properly.
Edit - Code request:
from bokeh.plotting import figure, output_file, show
from bokeh.models import NumeralTickFormatter, DatetimeTickFormatter, ColumnDataSource, HoverTool, CrosshairTool, SaveTool, PanTool
import pandas as pd
import os
os.chdir(r'path')
#output_file('Inflation.html', title='Inflation')
RPI = 'RPI.csv'
CPI = 'CPI.csv'
CPIH = 'CPIH.csv'
df_RPI = pd.read_csv(RPI)
df_CPI = pd.read_csv(CPI)
df_CPIH = pd.read_csv(CPIH)
def to_date_time(data_frame, data_series):
data_frame[data_series] = data_frame[data_series].astype('datetime64[ns]')
to_date_time(df_RPI, 'Date')
to_date_time(df_CPI, 'Date')
to_date_time(df_CPIH, 'Date')
RPI_source = ColumnDataSource(df_RPI)
CPI_source = ColumnDataSource(df_CPI)
CPIH_source = ColumnDataSource(df_CPIH)
l = figure(title="Historic Inflaiton Metrics", logo=None)
l.plot_width = 1200
l.xaxis[0].formatter=DatetimeTickFormatter(
days=["%d %B %Y"],
months=["%d %B %Y"],
years=["%d %B %Y"],
)
glyph_1 = l.line('Date','RPI',source=RPI_source, legend='TYPE', color='red')
glyph_2 = l.line('Date','CPI',source=CPI_source, legend='TYPE', color='blue')
glyph_3 = l.line('Date','CPIH',source=CPIH_source, legend='TYPE', color='gold')
hover = HoverTool(renderers=[glyph_1],
tooltips=[ ("Date","#Date{%F}"),
("RPI","#RPI"),
("CPI","#CPI"),
("CPIH","#CPIH")],
formatters={"Date": "datetime"},
mode='vline'
)
l.tools = [SaveTool(), PanTool(), hover, CrosshairTool()]
show(l)

The hover tool looks up the data to show in the ColumnDataSource. Because you created a new ColumnDataSource for each line and restricted the hover tool to line1 it can only lookup data in the data source there.
The general solution is to only create one ColumnDataSource and reuse that in each line:
df_RPI = pd.read_csv(RPI)
df_CPI = pd.read_csv(CPI)
df_CPIH = pd.read_csv(CPIH)
df = df_RPI.merge(dfd_CPI, on="date")
df = df.merge(df_CPIH, on="date")
source = ColumnDataSource(df)
l = figure(title="Historic Inflation Metrics", logo=None)
glyph_1 = l.line('Date','RPI',source=source, legend='RPI', color='red')
l.line('Date','CPI',source=source, legend='CPI', color='blue')
l.line('Date','CPIH',source=source, legend='CPIH', color='gold')
hover = HoverTool(renderers=[glyph_1],
tooltips=[ ("Date","#Date{%F}"),
("RPI","#RPI"),
("CPI","#CPI"),
("CPIH","#CPIH")],
formatters={"Date": "datetime"},
mode='vline'
)
show(l)
This is of course only possible if you all your dataframes can be merged into one, i.e. the measurement timepoints are the same. If they are not besides resampling/interpolating I do not know a good method to do what you want.

Select corresponding/multiple bars on bokeh vbar with taptool

I'm currently working with data from a sports related test. I want to visualize the (multidimensional) test-data of one athlete from several tests (different dates), so I made grouped vbar with the dates at the lowest level of grouping. Now I want to tap on one bar to select it and the corresponding ones from the same date should be selected (and highlighted), too.
Till now I was searching on stackoverflow with "[python][bokeh]taptool" query, I looked up the whole issues section on git/bokeh with the tag "taptool" and did a google with similar queries, but I can't find a matching thread.
To clarify my needs, I modified the grouped_bars_example from the bokeh repository. My goal is to select all bars of one manufacturer by clicking on one bar. (I know it's possible to hold shift-key for multiselections, but it is quiet annoying to select, for example, 6 corresponding bars out of 120 bars. Thatswhy I'm looking for an efficient way to do so with one click)
#### basic example code from ~/latest/docs/user_guide/categorical.html#grouped
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, HoverTool, TapTool
from bokeh.plotting import figure
from bokeh.palettes import Spectral5
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
# output_file('bars.html')
# preparing data for figure
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(('cyl', 'mfr'))
source = ColumnDataSource(group)
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
# setting up figure
p = figure(plot_width=800, plot_height=300, title="Mean MPG by # Cylinders and Manufacturer",
x_range=group, toolbar_location=None, tools="")
# adding grouped vbar
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=source,
line_color="white", fill_color=index_cmap, )
# figurestyling
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
# adding Tools
p.add_tools(HoverTool(tooltips=[("MPG", "#mpg_mean"), ("Cyl, Mfr", "#cyl_mfr")]))
p.add_tools(TapTool())
#### my additional code
import pandas as pd
import sys
from bokeh.plotting import curdoc, figure
# redirect output in files (just for debugging)
save_stderr = sys.stderr
f_err = open('error.log', 'w')
sys.stderr = f_err
save_stdout = sys.stdout
f_info = open('info.log', 'w')
sys.stdout = f_info
# callbackfunction to obtain selected bars
def callback_tap(attr, old, new):
# write selected indices to file
output = source.selected['1d']['indices'] # indices of selected bars
print(output, type(output))
# make calculations only, if one bar is selected
if len(output) == 1:
# find all corresponding indices to manufacturer based on selected bar
# get manufacturer corresponding to retrieved index
man = source.data['cyl_mfr'][output][1]
# temporary DataFrame
tmp = pd.DataFrame(source.data['cyl_mfr'].tolist(), columns=['cyl', 'mfr'])
# look up all corresponding indices for manufacturer "man"
indices = tmp.index[tmp.mfr == man].values.tolist()
# assing list of indices
source.selected['1d']['indices'] = indices
# assigning callbackfunction
source.on_change('selected', callback_tap)
curdoc().add_root(p)
Please note, that I change the output to run a bokeh server in order to have custom python callback. For debugging reasons, I redirected the outputs to files.
In my callback function, the first part is working fine, and I retrieve the indicees of the selected bar. Additionally, I find the corresponding indicees with an DataFrame, but at the end I'm struggle to assign the new indicees in a way, that the vbar figure is updated.
I'll be very happy, if someone can help me.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Bokeh not interpreting correct scale with SQL data - python

Related

How to put interactive bokeh HTML image in Jupyterlab page

How to plot visualization with Interactive Feature Selection in Bokeh, Python

bokeh: How to edit a df or CDS-object through box_select?

Bokeh multiline plot

Select corresponding/multiple bars on bokeh vbar with taptool

Categories

Resources