Struggling to understand why this bokeh visual will not allow me to change plots and see the predicted data. The plot and select (dropdown-looking) menu appears, but I'm not able to change the plot for items in the menu.
Running Bokeh 1.2.0 via Anaconda. The code has been run both inside & outside of Jupyter. No errors display when the code is run. I've looked through the handful of SO posts relating to this same issue, but I've not been able to apply the same solutions successfully.
I wasn't sure how to create a toy problem out of this, so in addition to the code sample below, the full code (including the regression code and corresponding data) can be found at my github here (code: Regression&Plotting.ipynb, data: pred_data.csv, historical_data.csv, features_created.pkd.)
import pandas as pd
import datetime
from bokeh.io import curdoc, output_notebook, output_file
from bokeh.layouts import row, column
from bokeh.models import Select, DataRange1d, ColumnDataSource
from bokeh.plotting import figure
#Must be run from the command line
def get_historical_data(src_hist, drug_id):
historical_data = src_hist.loc[src_hist['ndc'] == drug_id]
historical_data.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)#.dropna()
historical_data['date'] = pd.to_datetime(historical_data[['year', 'month', 'day']], infer_datetime_format=True)
historical_data = historical_data.set_index(['date'])
historical_data.sort_index(inplace = True)
# csd_historical = ColumnDataSource(historical_data)
return historical_data
def get_prediction_data(src_test, drug_id):
#Assign the new date
#Write a new dataframe with values for the new dates
df_pred = src_test.loc[src_test['ndc'] == drug_id].copy()
df_pred.loc[:, 'year'] = input_date.year
df_pred.loc[:, 'month'] = input_date.month
df_pred.loc[:, 'day'] = input_date.day
df_pred.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)
prediction = lin_model.predict(df_pred)
prediction_data = pd.DataFrame({'drug_id': prediction[0][0], 'predictions': prediction[0][1], 'date': pd.to_datetime(df_pred[['year', 'month', 'day']], infer_datetime_format=True, errors = 'coerce')})
prediction_data = prediction_data.set_index(['date'])
prediction_data.sort_index(inplace = True)
# csd_prediction = ColumnDataSource(prediction_data)
return prediction_data
def make_plot(historical_data, prediction_data, title):
#Historical Data
plot = figure(plot_width=800, plot_height = 800, x_axis_type = 'datetime',
toolbar_location = 'below')
plot.xaxis.axis_label = 'Time'
plot.yaxis.axis_label = 'Price ($)'
plot.axis.axis_label_text_font_style = 'bold'
plot.x_range = DataRange1d(range_padding = 0.0)
plot.grid.grid_line_alpha = 0.3
plot.title.text = title
plot.line(x = 'date', y='nadac_per_unit', source = historical_data, line_color = 'blue', ) #plot historical data
plot.line(x = 'date', y='predictions', source = prediction_data, line_color = 'red') #plot prediction data (line from last date/price point to date, price point for input_date above)
return plot
def update_plot(attrname, old, new):
ver = vselect.value
new_hist_source = get_historical_data(src_hist, ver) #calls the function above to get the data instead of handling it here on its own
historical_data.data = ColumnDataSource.from_df(new_hist_source)
# new_pred_source = get_prediction_data(src_pred, ver)
# prediction_data.data = new_pred_source.data
#Import data source
src_hist = pd.read_csv('data/historical_data.csv')
src_pred = pd.read_csv('data/pred_data.csv')
#Prep for default view
#Initialize plot with ID number
ver = 781593600
#Set the prediction date
input_date = datetime.datetime(2020, 3, 31) #Make this selectable in future
#Select-menu options
menu_options = src_pred['ndc'].astype(str) #already contains unique values
#Create select (dropdown) menu
vselect = Select(value=str(ver), title='Drug ID', options=sorted((menu_options)))
#Prep datasets for plotting
historical_data = get_historical_data(src_hist, ver)
prediction_data = get_prediction_data(src_pred, ver)
#Create a new plot with the source data
plot = make_plot(historical_data, prediction_data, "Drug Prices")
#Update the plot every time 'vselect' is changed'
vselect.on_change('value', update_plot)
controls = row(vselect)
curdoc().add_root(row(plot, controls))
UPDATED: ERRORS:
1) No errors show up in Jupyter Notebook.
2) CLI shows a UserWarning: Pandas doesn't allow columns to be careated via a new attribute name, referencing `historical_data.data = ColumnDatasource.from_df(new_hist_source).
Ultimately, the plot should have a line for historical data, and another line or dot for predicted data derived from sklearn. It also has a dropdown menu to select each item to plot (one at a time).
Your update_plot is a no-op that does not actually make any changes to Bokeh model state, which is what is necessary to change a Bokeh plot. Changing Bokeh model state means assigning a new value to a property on a Bokeh object. Typically, to update a plot, you would compute a new data dict and then set an existing CDS from it:
source.data = new_data # plain python dict
Or, if you want to update from a DataFame:
source.data = ColumnDataSource.from_df(new_df)
As an aside, don't assign the .data from one CDS to another:
source.data = other_source.data # BAD
By contrast, your update_plot computes some new data and then throws it away. Note there is never any purpose to returning anything at all from any Bokeh callback. The callbacks are called by Bokeh library code, which does not expect or use any return values.
Lastly, I don't think any of those last JS console errors were generated by BokehJS.
Related
I have a bokeh scatter plot that updates the data displayed based on the checkbox selection for time of day. The legend allows you to hide or show the different datasets.
I'd like to add a regression line that updates with the selections. For example, if hour 0 and both data sets are showing it would calculate and show the corresponding regression line. Then if hour 9 was added the regression line would update accordingly.
Example of Current Bokeh Plot
Below is a sample of the code corresponding to the example plot.
# create filtered plot
filt_time_fig = figure(title = 'AC Power vs Expected Power',
x_axis_label = 'AC Power',
y_axis_label = 'Expected Power')
# define source
source = ColumnDataSource(master_interval_df)
# create checkbox
active_hour = '0'
f = BooleanFilter(booleans=[l == active_hour for l in master_interval_df['hour']])
uniq_hours = list(map(str,list(range(0,24))))
cg = CheckboxGroup(labels=uniq_hours, active=[uniq_hours.index(active_hour)])
cg.js_on_change('active',
CustomJS(args=dict(source=source, f=f),
code="""\
const hours = cb_obj.active.map(idx => cb_obj.labels[idx]);
f.booleans = source.data.hour.map(l => hours.includes(l));
source.change.emit();
"""))
#create glyph for each dataset
for turb in turbine_list:
filt_time_fig.circle(x = 'AC_p_'+turb,
y = 'Exp_p_'+turb,
source = source,
legend_label = turb,
line_color = None,
fill_color={"field":"hour","transform":colormap},
view = CDSView(source=source, filters=[f]))
cg.max_width=50
filt_time_fig.legend.click_policy = 'hide'
show(filt_time_fig)
I am trying to build a visual that tracks widget counts by category using hbar. The source data is not aggregated. This is what it looks like:
This data is aggregated at MktCatKey level, but I want to group by category and then perform a calculation on the counts. Lets say if the category is Category_A, I want to add +10 to the counts. Finally, I want to display both current and projected on a visual.
This is how far I have gotten:
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(grouped)
p = figure(y_range=Cat)
p.hbar(y=Cat, right = CurrentCount, left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y=Cat, right = ProjCount, left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "#Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I was able to get this to work if I source directly from the dataset. But since I’m trying to perform a calculation, I cant use the source directly. I’m not fully familiar on how to do an if statement on CurrentCount to see if it’s for Category_A or not but that’s where I’m at.
I have additional things I want to do on this dataset (like bring in a goals dataset and plot against that), but taking small steps for now. Any help is appreciated.
Working code below:
import pyodbc
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput
from bokeh.embed import components
from bokeh.models.tools import HoverTool
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(pd.DataFrame(grouped))
Category = source.data['Category'].tolist()
p = figure(y_range=Category)
p.hbar(y='Category', right = 'Current', left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y='Category', right = 'Projected', left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "#Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I try to create an interactive diagram which plots values Y over dates X. So far so good. Now I want to adjust the limits xmin and xmax of my x-axis via a DateRangeSlider but I don't understand the js callback function (I want to have a standalone html file at the end) and since I don't even know how to print values from inside the function and without any errors produced, I have no idea what to do now.
here is a running example of code:
import numpy as np
import pandas as pd
from datetime import datetime
from bokeh.models import ColumnDataSource, DatetimeTickFormatter, HoverTool
from bokeh.models.widgets import DateRangeSlider
from bokeh.layouts import layout, column
from bokeh.models.callbacks import CustomJS
from bokeh.plotting import figure, output_file, show, save
datesX = pd.date_range(start='1/1/2018', periods=100)
valuesY = pd.DataFrame(np.random.randint(0,25,size=(100, 1)), columns=list('A'))
source = ColumnDataSource(data={'x': datesX, 'y': valuesY['A']})
# output to static HTML file
output_file('file.html')
hover = HoverTool(tooltips=[('Timestamp', '#x{%Y-%m-%d %H:%M:%S}'), ('Value', '#y')],
formatters={'x': 'datetime'},)
date_range_slider = DateRangeSlider(title="Zeitrahmen", start=datesX[0], end=datesX[99], \
value=(datesX[0], datesX[99]), step=1, width=300)
# create a new plot with a title and axis labels
p = figure(title='file1', x_axis_label='Date', y_axis_label='yValue', x_axis_type='datetime',
tools="pan, wheel_zoom, box_zoom, reset", plot_width=300, plot_height=200)
# add a line renderer with legend and line thickness
p.line(x='x', y='y', source=source, line_width=2)
p.add_tools(hover)
callback = CustomJS(args=dict(source=source), code="""
##### what to do???
source.change.emit();
""")
date_range_slider.js_on_change('value', callback)
layout = column(p, date_range_slider)
# show the results
show(layout)
I tried to adjust and adapt similar examples of people on stackoverflow and from the bokeh demos, but i didn't manage to produce running code. Hope everything is clear and You can help.
You need to create a new source.data when changing the sliders. To do that, you also need a "back-up" source that you don't change and which serves as a reference for what data to include. Passing both as arguments to the callback function makes them available to the Javascript code.
datesX = pd.date_range(start='1/1/2018', periods=100)
valuesY = pd.DataFrame(np.random.randint(0,25,size=(100, 1)), columns=list('A'))
# keep track of the unchanged, y-axis values
source = ColumnDataSource(data={'x': datesX, 'y': valuesY['A']})
source2 = ColumnDataSource(data={'x': datesX, 'y': valuesY['A']})
# output to static HTML file
output_file('file.html')
hover = HoverTool(
tooltips=[('Timestamp', '#x{%Y-%m-%d %H:%M:%S}'), ('Value', '#y')],
formatters={'x': 'datetime'},)
date_range_slider = DateRangeSlider(
title="Zeitrahmen", start=datesX[0], end=datesX[99],
value=(datesX[0], datesX[99]), step=1, width=300)
# create a new plot with a title and axis labels
p = figure(
title='file1', x_axis_label='Date', y_axis_label='yValue',
y_range=(0, 30), x_axis_type='datetime',
tools="pan, wheel_zoom, box_zoom, reset",
plot_width=600, plot_height=200)
# add a line renderer with legend and line thickness
p.line(x='x', y='y', source=source, line_width=2)
p.add_tools(hover)
callback = CustomJS(args=dict(source=source, ref_source=source2), code="""
// print out array of date from, date to
console.log(cb_obj.value);
// dates returned from slider are not at round intervals and include time;
const date_from = Date.parse(new Date(cb_obj.value[0]).toDateString());
const date_to = Date.parse(new Date(cb_obj.value[1]).toDateString());
const data = source.data;
const ref = ref_source.data;
const from_pos = ref["x"].indexOf(date_from);
// add + 1 if you want inclusive end date
const to_pos = ref["x"].indexOf(date_to);
// re-create the source data from "reference"
data["y"] = ref["y"].slice(from_pos, to_pos);
data["x"] = ref["x"].slice(from_pos, to_pos);
source.change.emit();
""")
date_range_slider.js_on_change('value', callback)
layout = column(p, date_range_slider)
# show the results
show(layout)
I found out that the answer above does not work since the timestamps of the ref_source data are different than the parsed timestamps which come from the bokeh Slider Object (cb_obj).
So for example the timestamps from the ref_source data create the following output when being parsed with new Date(source.data.["x"]);:
01/01/2020 02:00:00
The timestamps coming from the bokeh Slider Object cb_obj always have a time of 00:00:00. Therefore the timestamps cant be found when using const from_pos = ref["date"].indexOf(date_from);.
To parse the dates from the ref_source correctly I created a new array new_ref and added the correctly parsed dates to this array. However, I have to emphasize here that I am not a JavaScript expert and I am pretty sure that the code can be written more efficiently here.
This is my working example:
// print out array of date from, date to
console.log(cb_obj.value);
// dates returned from slider are not at round intervals and include time;
const date_from = Date.parse(new Date(cb_obj.value[0]).toDateString());
const date_to = Date.parse(new Date(cb_obj.value[1]).toDateString());
console.log(date_from, date_to)
// Creating the Data Sources
const data = source.data;
const ref = ref_source.data;
// Creating new Array and appending correctly parsed dates
let new_ref = []
ref["x"].forEach(elem => {
elem = Date.parse(new Date(elem).toDateString());
new_ref.push(elem);
console.log(elem);
})
// Creating Indices with new Array
const from_pos = new_ref.indexOf(date_from);
const to_pos = new_ref.indexOf(date_to) + 1;
// re-create the source data from "reference"
data["y"] = ref["y"].slice(from_pos, to_pos);
data["x"] = ref["x"].slice(from_pos, to_pos);
source.change.emit();
I hope it helped you a bit :)
Interesting problem and discussion. Adding the following two lines (one of which was lifted directly from the documentation) allowed the slider to work without using the CustomJS and js_on_change function - using the js_link function instead:
date_range_slider.js_link('value', p.x_range, 'start', attr_selector=0)
date_range_slider.js_link('value', p.x_range, 'end', attr_selector=1)
I had similar success to #Jan Burger but using the CustonJS to directly change the plots x_range rather than filtering the datasource.
callback = CustomJS(args=dict(p=p), code="""
p.x_range.start = cb_obj.value[0]
p.x_range.end = cb_obj.value[1]
p.x_range.change.emit()
""")
date_range_slider.js_on_change('value_throttled', callback)
I am trying to find the max and min value for each category within source = columndatasource where my stock data is organized into columns by (Open, High, Low, Close, AdjClose, Volume, etc....)
I tried using,
max(source.data['Close'])
min(source.data['Close'])
however, the problem with max(source.data['Open'] is that the values do not update when I update my data when using the slider and select widgets.
Is there a way in which that I can find the min and max of each column that will update each time when I update my data ?
from math import pi
import pandas as pd
import numpy as np
import datetime
import time
from datetime import date
from bokeh.layouts import row, widgetbox, column
from bokeh.models import DataRange1d, LinearAxis, Range1d, ColumnDataSource, PrintfTickFormatter, CDSView, BooleanFilter, NumeralTickFormatter
from bokeh.models.widgets import PreText, Select, DateRangeSlider, Button, DataTable, TableColumn, NumberFormatter
from bokeh.io import curdoc, show, reset_output
from bokeh.plotting import figure, output_file
DEFAULT_TICKERS = ['AAPL','GOOG','NFLX', 'TSLA']
ticker1 = Select(value='AAPL', options = DEFAULT_TICKERS)
range_slider1 = DateRangeSlider(start=date(2014,1,1) , end=date(2017,1,1), value=(date(2014,2,1),date(2016,3,1)), step=1)
def load_ticker(ticker):
fname = ( '%s.csv' % ticker.lower())
data = pd.read_csv( fname, header = None, parse_dates = ['Date'],
names =['Date','Open','High','Low','Close','AdjClose','Volume'])
return data
def get_data(t1):
data = load_ticker(t1)
return data
def ticker1_change(attrname, old, new):
update()
def range_slider_change(attrname, old, new):
update()
def update(selected=None):
t1 = ticker1.value
if isinstance(range_slider1.value[0], (int, float)):
# pandas expects nanoseconds since epoch
start_date = pd.Timestamp(float(range_slider1.value[0])*1e6)
end_date = pd.Timestamp(float(range_slider1.value[1])*1e6)
else:
start_date = pd.Timestamp(range_slider1.value[0])
end_date = pd.Timestamp(range_slider1.value[1])
datarange = get_data(t1)
datarange['Date'] = pd.to_datetime(datarange['Date'])
mask = (datarange['Date'] > start_date) & (datarange['Date'] <= end_date)
data = datarange.loc[mask]
source.data = source.from_df(data)
p.title.text = t1
data = get_data(ticker1.value)
source = ColumnDataSource(data)
p = figure(plot_width=900, plot_height=400, x_axis_type='datetime', y_range = Range1d(min(source.data['Close']), max(source.data['Close'])))
p.grid.grid_line_alpha = 0.3
p.line('Date', 'Close', source=source)
ticker1.on_change('value', ticker1_change)
range_slider1.on_change('value', range_slider_change)
update()
layout = column(ticker1,range_slider1, p)
curdoc().add_root(layout)
curdoc().title = "Stock"
Yes. Your question is a little convoluted
Short answer: You need to create another "source" that contains the max and min values.
Long answer:
Your code is not running properly. I copied/pasted your code ^^ and ran it on a local bokeh server. No output i.e. you need to fix your code first.
But, let's say that your code was running. The only way as of now to auto update a max or min each time you change your bokeh slider or other widget value is to create another source, let's say source2.
source = ColumnDataSource(data_max_min)
Then, match the keys to the same value. In your example^^, it would most likely be date in the dictionary (data_max_min).
E.g.
pd = read_csv('.../AAPL.csv', header=0, index=None)
aapl_close = pd.DataFrame(aapl_df['close'])
aapl_close.index = aapl_df.date
aapl_close
close
date
2018/11/23 172.29
2018/11/26 174.62
2018/11/27 174.24
I'm assuming that you want to get a max and min value for each time range that you want to analyze on a rolling basis (or something like that). My code will just get the max for each close (*it will be the same value) just as an example. If you don't understand this, I would recommend reading some of the documentation again.
aapl_max_df = pd.DataFrame()
aapl_max_df['max'] = [max(prices) for prices in aapl_close['close']]
aapl_max_df.index = aapl_close.index
aapl_max_min = {}
dates = aapl_max_min.index
for i in range(aapl_max_min.shape[0]):
aapl_max_min[aapl_max_min.index.values[i]] = aapl_max_min['max'].values[i]
source2 = ColumnDataSource(data=aapl_max_min[dates[0]])
Then when you update the slider, you will need to update the "date" for for both sources. This is something not yet in your code. There are several examples online on how to do this (https://github.com/bokeh/bokeh/tree/master/examples/app/gapminder).
like so-->
def slider_update(attrname, old, new):
year = slider.value
label.text = str(year)
source.data = data[year]
source2.data = data[year]
I am new to plotly and working on a script to generate a graph based on some results pulled from a database. However when I send the data over to plotly, only the first data point for each of the three traces is being graphed. I've verified that the lists contain the right data, I've even simply pasted the lists in instead of dynamically creating the variables. Unfortunately each time only the first data point is being graphed. Does anyone know what I am missing here? I am also open to another library if needed.
Is it also possible to have the x axis show as a string?
import plotly.plotly as py
import plotly.graph_objs as go
# Custom database class, works fine.
from classes.database import DatabaseConnection
# Database Connections and instances
db_instance = DatabaseConnection()
db_conn = db_instance.conn
db_cur = db_instance.cur
def main():
# Get a list of versions and their stats.
db_cur.execute(
"""
select row_to_json(x) from
(SELECT
versions.version_number,
cast(AVG(results.average) as double precision) as average,
cast(AVG(results.minimum) as double precision) as minimum,
cast(AVG(results.maximum) as double precision) as maximum
FROM versions,results
WHERE
versions.version_number = results.version_number
GROUP BY
versions.version_number) x;
"""
)
versions = []
average = []
minimum = []
maximum = []
unclean = db_cur.fetchall()
# Create lists for x and y coordinates.
for row in unclean:
versions.append(row[0]['version_number'])
average.append(int(row[0]['average']))
minimum.append(int(row[0]['minimum']))
maximum.append(int(row[0]['maximum']))
grph_average = go.Scatter(
x=versions,
y=average,
name = 'Average',
mode='lines',
)
grph_minimum = go.Scatter(
x=versions,
y=minimum,
name = 'Minimum',
mode='lines',
)
grph_maximum = go.Scatter(
x=versions,
y=maximum,
name = 'Maximum',
mode='lines',
)
data = go.Data([grph_average, grph_minimum, grph_maximum])
# Edit the layout
layout = dict(title = 'Responses',
xaxis = dict(title = 'Versions'),
yaxis = dict(title = 'Ms'),
)
fig = dict(data=data, layout=layout)
py.plot(fig, filename='response-times', auto_open=False)
if __name__ == '__main__':
main()
The data that query returns is as follows, if you want to plug in the values :
versions = ['6.1', '5.0', '5.2']
average = [11232, 29391, 10429]
minimum = [3641, 7729, 3483]
maximum = [57440, 62535, 45201]
Here is some matplotlib that might get you started on this:
import matplotlib.pyplot as plt
versions = ['6.1', '5.0', '5.2']
average = [11232, 29391, 10429]
minimum = [3641, 7729, 3483]
maximum = [57440, 62535, 45201]
plt.plot(minimum)
plt.plot(average)
plt.plot(maximum)
plt.xticks(range(len(versions)), versions)
It looks like it was an issue with my x axis. By adding some text before the version number and specifically type casting to a string I was able to get the graphs to generate properly.
# Create lists for x and y coordinates.
for row in unclean:
versions.append("Version: " + str(row[0]['version_number']))
average.append(int(row[0]['average']))
minimum.append(int(row[0]['minimum']))
maximum.append(int(row[0]['maximum']))