I am trying to build a visual that tracks widget counts by category using hbar. The source data is not aggregated by category. This is what it looks like:
This data is aggregated at the MktCatKey level, but I want to group by category and then perform a calculation on the counts. Let's say that if the category is Category_A, I want to add 10 to the counts. Finally, I want to display both current and projected counts on one visual.
This is how far I have gotten:
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(grouped)
p = figure(y_range=Cat)
p.hbar(y=Cat, right = CurrentCount, left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y=Cat, right = ProjCount, left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "@Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I was able to get this to work if I source directly from the dataset. But since I’m trying to perform a calculation, I can’t use the source directly. I’m not sure how to write an if statement on CurrentCount that checks whether the row is for Category_A, but that’s where I’m at.
I have additional things I want to do on this dataset (like bring in a goals dataset and plot against that), but taking small steps for now. Any help is appreciated.
Working code below:
import pyodbc
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput
from bokeh.embed import components
from bokeh.models.tools import HoverTool
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
# Aggregate to one row per category (a list of columns needs double brackets)
grouped = dataset.groupby('Category')[['Current', 'Projected']].sum()
# The groupby index becomes a 'Category' column in the ColumnDataSource
source = ColumnDataSource(grouped)
Category = source.data['Category'].tolist()
p = figure(y_range=Category)
# Refer to columns by name so the glyphs read from the source
p.hbar(y='Category', right = 'Current', left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y='Category', right = 'Projected', left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "@Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
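The working code above still leaves out the Category_A adjustment from the question. Here is a minimal sketch of one way to do it. It assumes the column names from the question (Category, Current, Projected), uses made-up rows in place of the SQL result, and assumes the +10 applies to the Current column; adjust to whichever count you meant.

```python
import pandas as pd

# Made-up rows standing in for the SQL result (one row per MktCatKey).
dataset = pd.DataFrame({
    "MktCatKey": [1, 2, 3, 4],
    "Category": ["Category_A", "Category_A", "Category_B", "Category_C"],
    "Current": [5, 7, 3, 8],
    "Projected": [6, 9, 4, 10],
})

# Aggregate to one row per category.
grouped = dataset.groupby("Category")[["Current", "Projected"]].sum()

# The "if" becomes a boolean mask on the index:
# add 10 only on the row where the category is Category_A.
is_cat_a = grouped.index == "Category_A"
grouped.loc[is_cat_a, "Current"] += 10

print(grouped)
```

The adjusted frame can then be passed to ColumnDataSource(grouped) exactly as in the working code.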
I am working on a Python application where I am collecting data from a device, and attempting to plot it in an excel file by using the Openpyxl library. I am successfully able to do everything including plotting the data, and formatting the scatter plot that I made, but I am having some trouble in adding minor gridlines to the plot.
I feel like this is definitely possible because in the API, I can see that under the openpyxl.chart.axis module there is a “minorGridlines” attribute, but it is not a boolean input (ON/OFF); rather, it takes a ChartLines object. I went a bit down the rabbit hole of seeing how I would do this, but I am wondering what the most straightforward way of adding the minor gridlines would be. Do you have to construct chart lines manually, or is there a simple way of doing this?
I would really appreciate any help!
I think I answered my own question, but I will post it here if anybody else needs this (as I don’t see any other answers to this question on the forum).
Example code (the relevant lines are the ChartLines import and the minorGridlines assignment):
# Imports for script
from openpyxl import Workbook # For plotting things in excel
from openpyxl.chart import ScatterChart, Reference, Series
from openpyxl.chart.axis import ChartLines
from math import log10
# Variables for script
fileName = 'testFile.xlsx'
dataPoints = 100
# Generating a workbook to test with
wb = Workbook()
ws = wb.active # Fill data into the first sheet
ws_name = ws.title
# We will just generate a logarithmic plot, and scale the axis logarithmically (will look linear)
x_data = []
y_data = []
for i in range(dataPoints):
    x_data.append(i + 1)
    y_data.append(log10(i + 1))
# Go back through the data, and place the data into the sheet
ws['A1'] = 'x_data'
ws['B1'] = 'y_data'
for i in range(dataPoints):
    ws['A%d' % (i + 2)] = x_data[i]
    ws['B%d' % (i + 2)] = y_data[i]
# Generate a reference to the cells that we can plot
x_axis = Reference(ws, range_string='%s!A2:A%d' % (ws_name, dataPoints + 1))
y_axis = Reference(ws, range_string='%s!B2:B%d' % (ws_name, dataPoints + 1))
function = Series(xvalues=x_axis, values=y_axis)
# Actually create the scatter plot, and append all of the plots to it
ScatterPlot = ScatterChart()
ScatterPlot.x_axis.minorGridlines = ChartLines()
ScatterPlot.x_axis.scaling.logBase = 10
ScatterPlot.series.append(function)
ScatterPlot.x_axis.title = 'X_Data'
ScatterPlot.y_axis.title = 'Y_Data'
ScatterPlot.title = 'Openpyxl Plotting Test'
ws.add_chart(ScatterPlot, 'D2')
# Save the file at the end to output it
wb.save(fileName)
Background on solution:
I looked at how the code for Openpyxl generates the Major axis gridlines, which seems to follow a similar convention as the Minor axis gridlines, and I found that in the ‘NumericAxis’ class, they generated the major gridlines with the following line (labeled ‘##### This Line #####’ which is originally copied from the ‘openpyxl->chart->axis’ file):
class NumericAxis(_BaseAxis):

    tagname = "valAx"

    axId = _BaseAxis.axId
    scaling = _BaseAxis.scaling
    delete = _BaseAxis.delete
    axPos = _BaseAxis.axPos
    majorGridlines = _BaseAxis.majorGridlines
    minorGridlines = _BaseAxis.minorGridlines
    title = _BaseAxis.title
    numFmt = _BaseAxis.numFmt
    majorTickMark = _BaseAxis.majorTickMark
    minorTickMark = _BaseAxis.minorTickMark
    tickLblPos = _BaseAxis.tickLblPos
    spPr = _BaseAxis.spPr
    txPr = _BaseAxis.txPr
    crossAx = _BaseAxis.crossAx
    crosses = _BaseAxis.crosses
    crossesAt = _BaseAxis.crossesAt

    crossBetween = NestedNoneSet(values=(['between', 'midCat']))
    majorUnit = NestedFloat(allow_none=True)
    minorUnit = NestedFloat(allow_none=True)
    dispUnits = Typed(expected_type=DisplayUnitsLabelList, allow_none=True)
    extLst = Typed(expected_type=ExtensionList, allow_none=True)

    __elements__ = _BaseAxis.__elements__ + ('crossBetween', 'majorUnit',
                                             'minorUnit', 'dispUnits',)

    def __init__(self,
                 crossBetween=None,
                 majorUnit=None,
                 minorUnit=None,
                 dispUnits=None,
                 extLst=None,
                 **kw
                ):
        self.crossBetween = crossBetween
        self.majorUnit = majorUnit
        self.minorUnit = minorUnit
        self.dispUnits = dispUnits
        kw.setdefault('majorGridlines', ChartLines())  ##### This Line #####
        kw.setdefault('axId', 100)
        kw.setdefault('crossAx', 10)
        super(NumericAxis, self).__init__(**kw)

    @classmethod
    def from_tree(cls, node):
        """
        Special case value axes with no gridlines
        """
        self = super(NumericAxis, cls).from_tree(node)
        gridlines = node.find("{%s}majorGridlines" % CHART_NS)
        if gridlines is None:
            self.majorGridlines = None
        return self
I took a stab, and after importing the ‘ChartLines’ class like so:
from openpyxl.chart.axis import ChartLines
I was able to add minor gridlines to the x-axis like so:
ScatterPlot.x_axis.minorGridlines = ChartLines()
As far as formatting the minor gridlines, I’m at a bit of a loss, and personally have no need, but this at least is a good start.
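For anyone who does need to format the minor gridlines, here is a hedged sketch. It relies on ChartLines accepting an spPr argument (a GraphicalProperties object) whose ln attribute is a LineProperties object. The colour and dash style are arbitrary choices for illustration, and I have not checked how every Excel version renders them.

```python
from openpyxl.chart import ScatterChart
from openpyxl.chart.axis import ChartLines
from openpyxl.chart.shapes import GraphicalProperties
from openpyxl.drawing.line import LineProperties

chart = ScatterChart()

# Light grey dashed line for the minor gridlines (values chosen arbitrarily).
minor_line = LineProperties(solidFill="BFBFBF", prstDash="dash")

# Attach the line style to the gridlines through the shape properties.
chart.x_axis.minorGridlines = ChartLines(spPr=GraphicalProperties(ln=minor_line))
```

The same pattern should apply to y_axis.minorGridlines and to the major gridlines.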
Struggling to understand why this bokeh visual will not allow me to change plots and see the predicted data. The plot and select (dropdown-looking) menu appears, but I'm not able to change the plot for items in the menu.
Running Bokeh 1.2.0 via Anaconda. The code has been run both inside & outside of Jupyter. No errors display when the code is run. I've looked through the handful of SO posts relating to this same issue, but I've not been able to apply the same solutions successfully.
I wasn't sure how to create a toy problem out of this, so in addition to the code sample below, the full code (including the regression code and corresponding data) can be found at my github here (code: Regression&Plotting.ipynb, data: pred_data.csv, historical_data.csv, features_created.pkd.)
import pandas as pd
import datetime
from bokeh.io import curdoc, output_notebook, output_file
from bokeh.layouts import row, column
from bokeh.models import Select, DataRange1d, ColumnDataSource
from bokeh.plotting import figure
#Must be run from the command line
def get_historical_data(src_hist, drug_id):
    historical_data = src_hist.loc[src_hist['ndc'] == drug_id]
    historical_data.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)#.dropna()
    historical_data['date'] = pd.to_datetime(historical_data[['year', 'month', 'day']], infer_datetime_format=True)
    historical_data = historical_data.set_index(['date'])
    historical_data.sort_index(inplace = True)
    # csd_historical = ColumnDataSource(historical_data)
    return historical_data

def get_prediction_data(src_test, drug_id):
    #Assign the new date
    #Write a new dataframe with values for the new dates
    df_pred = src_test.loc[src_test['ndc'] == drug_id].copy()
    df_pred.loc[:, 'year'] = input_date.year
    df_pred.loc[:, 'month'] = input_date.month
    df_pred.loc[:, 'day'] = input_date.day
    df_pred.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)
    prediction = lin_model.predict(df_pred)
    prediction_data = pd.DataFrame({'drug_id': prediction[0][0], 'predictions': prediction[0][1], 'date': pd.to_datetime(df_pred[['year', 'month', 'day']], infer_datetime_format=True, errors = 'coerce')})
    prediction_data = prediction_data.set_index(['date'])
    prediction_data.sort_index(inplace = True)
    # csd_prediction = ColumnDataSource(prediction_data)
    return prediction_data

def make_plot(historical_data, prediction_data, title):
    #Historical Data
    plot = figure(plot_width=800, plot_height = 800, x_axis_type = 'datetime',
                  toolbar_location = 'below')
    plot.xaxis.axis_label = 'Time'
    plot.yaxis.axis_label = 'Price ($)'
    plot.axis.axis_label_text_font_style = 'bold'
    plot.x_range = DataRange1d(range_padding = 0.0)
    plot.grid.grid_line_alpha = 0.3
    plot.title.text = title
    plot.line(x = 'date', y='nadac_per_unit', source = historical_data, line_color = 'blue', ) #plot historical data
    plot.line(x = 'date', y='predictions', source = prediction_data, line_color = 'red') #plot prediction data (line from last date/price point to date, price point for input_date above)
    return plot

def update_plot(attrname, old, new):
    ver = vselect.value
    new_hist_source = get_historical_data(src_hist, ver) #calls the function above to get the data instead of handling it here on its own
    historical_data.data = ColumnDataSource.from_df(new_hist_source)
    # new_pred_source = get_prediction_data(src_pred, ver)
    # prediction_data.data = new_pred_source.data
#Import data source
src_hist = pd.read_csv('data/historical_data.csv')
src_pred = pd.read_csv('data/pred_data.csv')
#Prep for default view
#Initialize plot with ID number
ver = 781593600
#Set the prediction date
input_date = datetime.datetime(2020, 3, 31) #Make this selectable in future
#Select-menu options
menu_options = src_pred['ndc'].astype(str) #already contains unique values
#Create select (dropdown) menu
vselect = Select(value=str(ver), title='Drug ID', options=sorted((menu_options)))
#Prep datasets for plotting
historical_data = get_historical_data(src_hist, ver)
prediction_data = get_prediction_data(src_pred, ver)
#Create a new plot with the source data
plot = make_plot(historical_data, prediction_data, "Drug Prices")
#Update the plot every time 'vselect' is changed'
vselect.on_change('value', update_plot)
controls = row(vselect)
curdoc().add_root(row(plot, controls))
UPDATED: ERRORS:
1) No errors show up in Jupyter Notebook.
2) The CLI shows a UserWarning: Pandas doesn't allow columns to be created via a new attribute name, referencing `historical_data.data = ColumnDataSource.from_df(new_hist_source)`.
Ultimately, the plot should have a line for historical data, and another line or dot for predicted data derived from sklearn. It also has a dropdown menu to select each item to plot (one at a time).
Your update_plot is a no-op that does not actually make any changes to Bokeh model state, which is what is necessary to change a Bokeh plot. Changing Bokeh model state means assigning a new value to a property on a Bokeh object. Typically, to update a plot, you would compute a new data dict and then set an existing CDS from it:
source.data = new_data # plain python dict
Or, if you want to update from a DataFrame:
source.data = ColumnDataSource.from_df(new_df)
As an aside, don't assign the .data from one CDS to another:
source.data = other_source.data # BAD
By contrast, your update_plot computes some new data and then throws it away. Note there is never any purpose to returning anything at all from any Bokeh callback. The callbacks are called by Bokeh library code, which does not expect or use any return values.
Lastly, I don't think any of those last JS console errors were generated by BokehJS.
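To make the point about model state concrete, here is a minimal sketch. The UserWarning quoted in the question suggests that historical_data there is a pandas DataFrame, so assigning to its .data attribute never reaches Bokeh. The sketch keeps a separate ColumnDataSource (named historical_source here, a name I made up) and assigns to that instead. The sample rows are invented, and the callback is invoked by hand rather than through a Select widget.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented rows standing in for the historical price table in the question.
src_hist = pd.DataFrame({
    "ndc": ["111", "111", "222", "222"],
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-01", "2019-02-01"]),
    "nadac_per_unit": [1.0, 1.5, 7.0, 6.5],
})

def get_historical_data(df, drug_id):
    # Return a plain DataFrame holding only the rows for one drug.
    rows = df.loc[df["ndc"] == drug_id, ["date", "nadac_per_unit"]]
    return rows.reset_index(drop=True)

# The glyphs should be drawn from this ColumnDataSource, not from the DataFrame.
historical_source = ColumnDataSource(get_historical_data(src_hist, "111"))

def update_plot(attrname, old, new):
    # Assigning to .data on an existing source is the model-state change
    # that Bokeh reacts to.
    new_df = get_historical_data(src_hist, new)
    historical_source.data = ColumnDataSource.from_df(new_df)

# In the app this would be registered with vselect.on_change('value', update_plot);
# here it is called directly to show the effect.
update_plot("value", "111", "222")
```

In the app, the plot.line call would take source=historical_source so that the line redraws when the callback runs.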
I'm using linear regression to predict the closing price of a stock on the current day. This works fine.
I'm using Django.
I need to add graphs (a time series and a candlestick). After searching, I found that Bokeh looks best for what I want to achieve.
Question:
I want to add a time-series graph and a candlestick graph to my Django project.
Code
This is how I'm predicting a stock's closing price on the current day.
stockprediction.py
def get_stock_data(name):
    try:
        if model_check(name) == False:
            data_path = os.getcwd()+"\\StockPrediction\\data\\HISTORICAL_DATA\\"
            df = pd.read_csv(data_path + name + '_data.csv')
            df.fillna(df.mean(), inplace=True)
            X = df.iloc[:, [1, 2, 3]]
            Y = df.iloc[:, [4]]
            reg = linear_model.LinearRegression()
            reg.fit(X,Y)
            y_today = reg.predict([get_nse_data(name)])
            model_path = os.getcwd() + "\\StockPrediction\\data\\saved_data\\"
            file = model_path + name + ".pkl"
            joblib.dump(reg, file)
            return y_today[0][0]
        else:
            model_path = os.getcwd()+"\\StockPrediction\\data\\saved_data\\"
            file = model_path + name+".pkl"
            model = joblib.load(file)
            y_today = model.predict([get_nse_data(name)])
            return y_today
    except:
        return ("Error")

def get_nse_data(name):
    data = nse.get_quote(name)
    current = [data['open'], data['dayHigh'], data['dayLow']]
    return current
Bonus Question:
I need graphs that are well suited to showing stock prices, like candlestick and time-series charts (can you suggest more?).
Help!
If you want to implement Bokeh, you can set up everything using the steps in the documentation (https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html#userguide-quickstart) and that will generate your .html file which you can include in your templates folder.
However, I find libraries like chart.js much more handy and customizable. They can be implemented in Django fairly easily. Here's a link to a very good tutorial that helped me a lot:
https://youtu.be/B4Vmm3yZPgc
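For the Bokeh route mentioned above, one commonly used alternative to generating a whole .html file is bokeh.embed.components, which returns a script block and a div placeholder as strings. The sketch below is only an illustration: build_chart_parts is a hypothetical helper name and the dates and prices are invented.

```python
import datetime as dt

from bokeh.embed import components
from bokeh.plotting import figure

def build_chart_parts(dates, closes):
    # Build a simple time-series line chart of closing prices.
    p = figure(x_axis_type="datetime", title="Closing price")
    p.line(dates, closes)
    # components() returns a <script> block and a <div> placeholder as strings.
    script, div = components(p)
    return script, div

# Invented sample data standing in for the stock history.
dates = [dt.datetime(2020, 1, 1), dt.datetime(2020, 1, 2), dt.datetime(2020, 1, 3)]
closes = [10.0, 10.5, 10.2]
script, div = build_chart_parts(dates, closes)
```

In a Django view, the two strings could be passed in the template context and rendered with the safe filter. The page would also need to load the BokehJS files matching the installed Bokeh version.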
I've found holoviews to be really nice for this kind of stuff - in your case, you want to work with RangeToolLink in conjunction with hv.Curve and a pandas dataframe to make a typical stock plot with a range tool on the bottom.
Here's a simple example, stolen from the holoviews website:
import bokeh
bokeh.sampledata.download() # only needs to run once
import pandas as pd
import holoviews as hv
from bokeh.sampledata.stocks import AAPL
from holoviews.plotting.links import RangeToolLink
from holoviews import opts
hv.extension('bokeh')
# Make dataframe from stock data
aapl_df = pd.DataFrame(AAPL['close'], columns=['close'], index=pd.to_datetime(AAPL['date']))
aapl_df.index.name = 'Date'
# Create stock curve
aapl_curve = hv.Curve(aapl_df, 'Date', ('close', 'Price ($)'))
# Labels and layout
tgt = aapl_curve.relabel('AAPL close price').opts(width=800, labelled=['y'], toolbar='disable')
src = aapl_curve.opts(width=800, height=100, yaxis=None, default_tools=[])
RangeToolLink(src, tgt)
# Merge rangetool
layout = (tgt + src).cols(1)
layout.opts(opts.Layout(shared_axes=False, merge_tools=False))
An even simpler example here uses candlesticks in bokeh:
from math import pi
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.sampledata.stocks import MSFT
df = pd.DataFrame(MSFT)[:50]  # use the MSFT sample data imported above
df["date"] = pd.to_datetime(df["date"])
inc = df.close > df.open
dec = df.open > df.close
w = 12*60*60*1000 # half day in ms
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title = "MSFT Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3
p.segment(df.date, df.high, df.date, df.low, color="black")
p.vbar(df.date[inc], w, df.open[inc], df.close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.date[dec], w, df.open[dec], df.close[dec], fill_color="#F2583E", line_color="black")
show(p)
This all works seamlessly in a Jupyter notebook, so it should be easy enough for you - you just need to get your predictions into a Pandas dataframe!
I am trying to find the max and min value for each column of source = ColumnDataSource(...), where my stock data is organized into columns (Open, High, Low, Close, AdjClose, Volume, etc.).
I tried using
max(source.data['Close'])
min(source.data['Close'])
However, the problem with max(source.data['Open']) is that the value does not update when I update my data using the slider and select widgets.
Is there a way to find the min and max of each column that updates each time I update my data?
from math import pi
import pandas as pd
import numpy as np
import datetime
import time
from datetime import date
from bokeh.layouts import row, widgetbox, column
from bokeh.models import DataRange1d, LinearAxis, Range1d, ColumnDataSource, PrintfTickFormatter, CDSView, BooleanFilter, NumeralTickFormatter
from bokeh.models.widgets import PreText, Select, DateRangeSlider, Button, DataTable, TableColumn, NumberFormatter
from bokeh.io import curdoc, show, reset_output
from bokeh.plotting import figure, output_file
DEFAULT_TICKERS = ['AAPL','GOOG','NFLX', 'TSLA']
ticker1 = Select(value='AAPL', options = DEFAULT_TICKERS)
range_slider1 = DateRangeSlider(start=date(2014,1,1) , end=date(2017,1,1), value=(date(2014,2,1),date(2016,3,1)), step=1)
def load_ticker(ticker):
    fname = ( '%s.csv' % ticker.lower())
    data = pd.read_csv( fname, header = None, parse_dates = ['Date'],
                        names =['Date','Open','High','Low','Close','AdjClose','Volume'])
    return data

def get_data(t1):
    data = load_ticker(t1)
    return data

def ticker1_change(attrname, old, new):
    update()

def range_slider_change(attrname, old, new):
    update()

def update(selected=None):
    t1 = ticker1.value
    if isinstance(range_slider1.value[0], (int, float)):
        # pandas expects nanoseconds since epoch
        start_date = pd.Timestamp(float(range_slider1.value[0])*1e6)
        end_date = pd.Timestamp(float(range_slider1.value[1])*1e6)
    else:
        start_date = pd.Timestamp(range_slider1.value[0])
        end_date = pd.Timestamp(range_slider1.value[1])
    datarange = get_data(t1)
    datarange['Date'] = pd.to_datetime(datarange['Date'])
    mask = (datarange['Date'] > start_date) & (datarange['Date'] <= end_date)
    data = datarange.loc[mask]
    source.data = source.from_df(data)
    p.title.text = t1
data = get_data(ticker1.value)
source = ColumnDataSource(data)
p = figure(plot_width=900, plot_height=400, x_axis_type='datetime', y_range = Range1d(min(source.data['Close']), max(source.data['Close'])))
p.grid.grid_line_alpha = 0.3
p.line('Date', 'Close', source=source)
ticker1.on_change('value', ticker1_change)
range_slider1.on_change('value', range_slider_change)
update()
layout = column(ticker1,range_slider1, p)
curdoc().add_root(layout)
curdoc().title = "Stock"
Yes. Your question is a little convoluted.
Short answer: You need to create another "source" that contains the max and min values.
Long answer:
Your code is not running properly. I copied and pasted your code above and ran it on a local Bokeh server, and got no output, so you need to fix your code first.
But, let's say that your code was running. The only way as of now to auto update a max or min each time you change your bokeh slider or other widget value is to create another source, let's say source2.
source2 = ColumnDataSource(data_max_min)
Then, match the keys to the same value. In your example^^, it would most likely be date in the dictionary (data_max_min).
E.g.
aapl_df = pd.read_csv('.../AAPL.csv', header=0)
aapl_close = pd.DataFrame(aapl_df['close'])
aapl_close.index = aapl_df.date
aapl_close
             close
date
2018/11/23  172.29
2018/11/26  174.62
2018/11/27  174.24
I'm assuming that you want to get a max and min value for each time range that you want to analyze on a rolling basis (or something like that). My code will just get the max for each close (*it will be the same value) just as an example. If you don't understand this, I would recommend reading some of the documentation again.
# One row per date; every row holds the overall max of the close column
aapl_max_df = pd.DataFrame(index=aapl_close.index)
aapl_max_df['max'] = aapl_close['close'].max()
# Build a dict keyed by date, each value shaped as ColumnDataSource data
aapl_max_min = {}
dates = aapl_max_df.index
for i in range(aapl_max_df.shape[0]):
    aapl_max_min[dates[i]] = {'max': [aapl_max_df['max'].values[i]]}
source2 = ColumnDataSource(data=aapl_max_min[dates[0]])
Then, when you update the slider, you will need to update the "date" for both sources. This is something not yet in your code. There are several examples online of how to do this (https://github.com/bokeh/bokeh/tree/master/examples/app/gapminder).
Like so:
def slider_update(attrname, old, new):
    year = slider.value
    label.text = str(year)
    source.data = data[year]
    source2.data = data[year]
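A different and simpler route than the second source described above is to recompute the extremes inside the update callback itself. In the question, min and max are evaluated once when the figure is created, which would explain why they never change. This is a sketch under my own assumptions about what the asker needs: the prices are made up, and update takes the dates as arguments instead of reading them from the widgets.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Range1d

# Made-up prices standing in for one ticker's CSV.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-02", "2014-06-02", "2015-01-02", "2015-06-02"]),
    "Close": [10.0, 14.0, 9.0, 20.0],
})

source = ColumnDataSource(prices)
# Initial range from the full data; this alone is computed only once.
y_range = Range1d(float(prices["Close"].min()), float(prices["Close"].max()))

def update(start_date, end_date):
    # Same date mask as in the question's update() function.
    mask = (prices["Date"] > start_date) & (prices["Date"] <= end_date)
    data = prices.loc[mask]
    source.data = ColumnDataSource.from_df(data)
    # Recompute min and max from the filtered frame on every update.
    y_range.start = float(data["Close"].min())
    y_range.end = float(data["Close"].max())

update(pd.Timestamp("2014-03-01"), pd.Timestamp("2015-03-01"))
```

Passing y_range=y_range to figure() would make the plot follow the recomputed values. The same two lines work for any other column.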
I am new to plotly and working on a script to generate a graph based on some results pulled from a database. However when I send the data over to plotly, only the first data point for each of the three traces is being graphed. I've verified that the lists contain the right data, I've even simply pasted the lists in instead of dynamically creating the variables. Unfortunately each time only the first data point is being graphed. Does anyone know what I am missing here? I am also open to another library if needed.
Is it also possible to have the x axis show as a string?
import plotly.plotly as py
import plotly.graph_objs as go
# Custom database class, works fine.
from classes.database import DatabaseConnection
# Database Connections and instances
db_instance = DatabaseConnection()
db_conn = db_instance.conn
db_cur = db_instance.cur
def main():
    # Get a list of versions and their stats.
    db_cur.execute(
        """
        select row_to_json(x) from
        (SELECT
            versions.version_number,
            cast(AVG(results.average) as double precision) as average,
            cast(AVG(results.minimum) as double precision) as minimum,
            cast(AVG(results.maximum) as double precision) as maximum
        FROM versions,results
        WHERE
            versions.version_number = results.version_number
        GROUP BY
            versions.version_number) x;
        """
    )
    versions = []
    average = []
    minimum = []
    maximum = []
    unclean = db_cur.fetchall()
    # Create lists for x and y coordinates.
    for row in unclean:
        versions.append(row[0]['version_number'])
        average.append(int(row[0]['average']))
        minimum.append(int(row[0]['minimum']))
        maximum.append(int(row[0]['maximum']))
    grph_average = go.Scatter(
        x=versions,
        y=average,
        name = 'Average',
        mode='lines',
    )
    grph_minimum = go.Scatter(
        x=versions,
        y=minimum,
        name = 'Minimum',
        mode='lines',
    )
    grph_maximum = go.Scatter(
        x=versions,
        y=maximum,
        name = 'Maximum',
        mode='lines',
    )
    data = go.Data([grph_average, grph_minimum, grph_maximum])
    # Edit the layout
    layout = dict(title = 'Responses',
                  xaxis = dict(title = 'Versions'),
                  yaxis = dict(title = 'Ms'),
                  )
    fig = dict(data=data, layout=layout)
    py.plot(fig, filename='response-times', auto_open=False)

if __name__ == '__main__':
    main()
The data that query returns is as follows, if you want to plug in the values :
versions = ['6.1', '5.0', '5.2']
average = [11232, 29391, 10429]
minimum = [3641, 7729, 3483]
maximum = [57440, 62535, 45201]
Here is some matplotlib that might get you started on this:
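import matplotlib.pyplot as plt
versions = ['6.1', '5.0', '5.2']
average = [11232, 29391, 10429]
minimum = [3641, 7729, 3483]
maximum = [57440, 62535, 45201]
plt.plot(minimum, label='Minimum')
plt.plot(average, label='Average')
plt.plot(maximum, label='Maximum')
# Use the version strings as tick labels at positions 0, 1, 2
plt.xticks(range(len(versions)), versions)
plt.legend()
plt.show()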
It looks like it was an issue with my x axis. By adding some text before the version number and specifically type casting to a string I was able to get the graphs to generate properly.
# Create lists for x and y coordinates.
for row in unclean:
    versions.append("Version: " + str(row[0]['version_number']))
    average.append(int(row[0]['average']))
    minimum.append(int(row[0]['minimum']))
    maximum.append(int(row[0]['maximum']))