I m using linear regression to predict the closing price of a stock on the current day. This works fine.
I m using Django.
I need to add graphs(time-series and a candlestick). after searching I found that Bokeh is best for what I want to achieve.
Question:
I want to add time-series and candlestick graph in my Django project.
Code
This is how I m predicting stocks closing price on the current day.
stockprediction.py
def get_stock_data(name):
try:
if model_check(name) == False:
data_path = os.getcwd()+"\\StockPrediction\\data\\HISTORICAL_DATA\\"
df = pd.read_csv(data_path + name + '_data.csv')
df.fillna(df.mean(), inplace=True)
X = df.iloc[:, [1, 2, 3]]
Y = df.iloc[:, [4]]
reg = linear_model.LinearRegression()
reg.fit(X,Y)
y_today = reg.predict([get_nse_data(name)])
model_path = os.getcwd() + "\\StockPrediction\\data\\saved_data\\"
file = model_path + name + ".pkl"
joblib.dump(reg, file)
return y_today[0][0]
else:
model_path = os.getcwd()+"\\StockPrediction\\data\\saved_data\\"
file = model_path + name+".pkl"
model = joblib.load(file)
y_today = model.predict([get_nse_data(name)])
return y_today
except:
return ("Error")
def get_nse_data(name):
data = nse.get_quote(name)
current = [data['open'], data['dayHigh'], data['dayLow']]
return current
Bonus Question:
I need graphs which are best for showing stocks price like candlestick and time-series(can you suggest more.)
Help!
If you want to implement Bokeh, you can set up everything using the steps in the documentation (https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html#userguide-quickstart) and that will generate your .html file which you can include in your templates folder.
However, I find libraries like chart.js much more handy and customizable. They can be implemented in django fairly easily. Here's a link to a very good tutorial that helped me a lot:
https://youtu.be/B4Vmm3yZPgc
I've found holoviews to be really nice for this kind of stuff - in your case, you want to work with RangeToolLink in conjunction with hv.Curve and a pandas dataframe to make a typical stock plot with a range tool on the bottom.
Here's a simple example, stolen from the holoviews website:
import bokeh
bokeh.sampledata.download() # only needs to run once
import pandas as pd
import holoviews as hv
from bokeh.sampledata.stocks import AAPL
from holoviews.plotting.links import RangeToolLink
from holoviews import opts
hv.extension('bokeh')
# Make dataframe from stock data
aapl_df = pd.DataFrame(AAPL['close'], columns=['close'], index=pd.to_datetime(AAPL['date']))
aapl_df.index.name = 'Date'
# Create stock curve
aapl_curve = hv.Curve(aapl_df, 'Date', ('close', 'Price ($)'))
# Labels and layout
tgt = aapl_curve.relabel('AAPL close price').opts(width=800, labelled=['y'], toolbar='disable')
src = aapl_curve.opts(width=800, height=100, yaxis=None, default_tools=[])
RangeToolLink(src, tgt)
# Merge rangetool
layout = (tgt + src).cols(1)
layout.opts(opts.Layout(shared_axes=False, merge_tools=False))
Here's what you should see:
An even simpler example here uses candlesticks in bokeh:
from math import pi
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.sampledata.stocks import MSFT
df = pd.DataFrame(AAPL)[:50]
df["date"] = pd.to_datetime(df["date"])
inc = df.close > df.open
dec = df.open > df.close
w = 12*60*60*1000 # half day in ms
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title = "MSFT Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3
p.segment(df.date, df.high, df.date, df.low, color="black")
p.vbar(df.date[inc], w, df.open[inc], df.close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.date[dec], w, df.open[dec], df.close[dec], fill_color="#F2583E", line_color="black")
show(p)
Result:
This all works seamlessly in a Jupyter notebook, so it should be easy enough for you - you just need to get your predictions into a Pandas dataframe!
Related
I am trying to visualize Air Quality Data as time-series charts using pycaret and plotly dash python libraries , but i am getting very weird graphs, below is my code:
import pandas as pd
import plotly.express as px
data = pd.read_csv('E:/Self Learning/Djang_Dash/2019-2020_5.csv')
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
#data.set_index('Date', inplace=True)
# combine store and item column as time_series
data['OBJECTID'] = ['Location_' + str(i) for i in data['OBJECTID']]
#data['AQI_Bins_AI'] = ['Bin_' + str(i) for i in data['AQI_Bins_AI']]
data['time_series'] = data[['OBJECTID']].apply(lambda x: '_'.join(x), axis=1)
data.drop(['OBJECTID'], axis=1, inplace=True)
# extract features from date
data['month'] = [i.month for i in data['Date']]
data['year'] = [i.year for i in data['Date']]
data['day_of_week'] = [i.dayofweek for i in data['Date']]
data['day_of_year'] = [i.dayofyear for i in data['Date']]
data.head(4000)
data['time_series'].nunique()
for i in data['time_series'].unique():
subset = data[data['time_series'] == i]
subset['moving_average'] = subset['CO'].rolling(window = 30).mean()
fig = px.line(subset, x="Date", y=["CO","moving_average"], title = i, template = 'plotly_dark')
fig.show()
require needful help in this regard,
here is my sample data Google Drive Link
data has not been provided in a usable way. Sought out publicly available similar data. found: https://www.kaggle.com/rohanrao/air-quality-data-in-india?select=station_hour.csv
using this data, with a couple of cleanups of your code, no issues with plots. I suspect your data has one of these issues
date is not datetime64[ns] in your data frame
date is not sorted, leading to lines being drawn in way you have noted
by refactoring way moving average is calculated, you can use animation instead of lots of separate figures
get some data
import kaggle.cli
import sys, math
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import plotly.express as px
# download data set
# https://www.kaggle.com/rohanrao/air-quality-data-in-india?select=station_hour.csv
sys.argv = [
sys.argv[0]
] + "datasets download rohanrao/air-quality-data-in-india".split(
" "
)
kaggle.cli.main()
zfile = ZipFile("air-quality-data-in-india.zip")
print([f.filename for f in zfile.infolist()])
plot using code from question
import pandas as pd
import plotly.express as px
from pathlib import Path
from distutils.version import StrictVersion
# data = pd.read_csv('E:/Self Learning/Djang_Dash/2019-2020_5.csv')
# use kaggle data
# dfs = {f.filename:pd.read_csv(zfile.open(f)) for f in zfile.infolist() if f.filename in ['station_day.csv',"stations.csv"]}
# data = pd.merge(dfs['station_day.csv'],dfs["stations.csv"], on="StationId")
# data['Date'] = pd.to_datetime(data['Date'])
# # kaggle data is different from question, make it compatible with questions data
# data = data.assign(OBJECTID=lambda d: d["StationId"])
# sample data from google drive link
data2 = pd.read_csv(Path.home().joinpath("Downloads").joinpath("AQI.csv"))
data2["Date"] = pd.to_datetime(data2["Date"])
data = data2
# as per very first commment - it's important data is ordered !
data = data.sort_values(["Date","OBJECTID"])
data['time_series'] = "Location_" + data["OBJECTID"].astype(str)
# clean up data, remove rows where there is no CO value
data = data.dropna(subset=["CO"])
# can do moving average in one step (can also be used by animation)
if StrictVersion(pd.__version__) < StrictVersion("1.3.0"):
data["moving_average"] = data.groupby("time_series",as_index=False)["CO"].rolling(window=30).mean().to_frame()["CO"].values
else:
data["moving_average"] = data.groupby("time_series",as_index=False)["CO"].rolling(window=30).mean()["CO"]
# just first two for purpose of demonstration
for i in data['time_series'].unique()[0:3]:
subset = data.loc[data['time_series'] == i]
fig = px.line(subset, x="Date", y=["CO","moving_average"], title = i, template = 'plotly_dark')
fig.show()
can use animation
px.line(
data,
x="Date",
y=["CO", "moving_average"],
animation_frame="time_series",
template="plotly_dark",
).update_layout(yaxis={"range":[data["CO"].min(), data["CO"].quantile(.97)]})
I am trying to build a visual that tracks widget counts by category using hbar. The source data is not aggregated. This is what it looks like:
This data is aggregated at MktCatKey level, but I want to group by category and then perform a calculation on the counts. Lets say if the category is Category_A, I want to add +10 to the counts. Finally, I want to display both current and projected on a visual.
This is how far I have gotten:
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(grouped)
p = figure(y_range=Cat)
p.hbar(y=Cat, right = CurrentCount, left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y=Cat, right = ProjCount, left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "#Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I was able to get this to work if I source directly from the dataset. But since I’m trying to perform a calculation, I cant use the source directly. I’m not fully familiar on how to do an if statement on CurrentCount to see if it’s for Category_A or not but that’s where I’m at.
I have additional things I want to do on this dataset (like bring in a goals dataset and plot against that), but taking small steps for now. Any help is appreciated.
Working code below:
import pyodbc
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput
from bokeh.embed import components
from bokeh.models.tools import HoverTool
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(pd.DataFrame(grouped))
Category = source.data['Category'].tolist()
p = figure(y_range=Category)
p.hbar(y='Category', right = 'Current', left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y='Category', right = 'Projected', left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "#Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I'm trying to create a dashboard that consists of two plots (heatmap and line graph) and one widget (selector):
When you select an option from widget both plots get updated;
When you tap on the first plot the second plot is updated based on tap info.
Currently I'm trying to do it in HoloViews. It seems that this should be very easy to do but I somehow can't wrap my head around it.
The code below shows how it should look like. However, the selector is not connected in any way to the dashboard since I don't know how to do it.
import pandas as pd
import numpy as np
import panel as pn
import holoviews as hv
hv.extension('bokeh')
def create_test_df(k_features, n_tickers=5, m_windows=5):
start_date = pd.Timestamp('01-01-2020')
window_len = pd.Timedelta(days=1)
cols = ['window_dt', 'ticker'] + [f'feature_{i}' for i in range(k_features)]
data = {c: [] for c in cols}
for w in range(m_windows):
window_dt = start_date + w*window_len
for t in range(n_tickers):
ticker = f'ticker_{t}'
data['window_dt'].append(window_dt)
data['ticker'].append(ticker)
for f in range(k_features):
data[f'feature_{f}'].append(np.random.rand())
return pd.DataFrame(data)
k_features = 3
features = [f'feature_{i}' for i in range(k_features)]
df = create_test_df(k_features)
selector = pn.widgets.Select(options=features)
heatmap = hv.HeatMap(df[['window_dt', 'ticker', f'{selector.value}']])
posxy = hv.streams.Tap(source=heatmap, x='01-01-2020', y='ticker_4')
def tap_heatmap(x, y):
scalar = np.random.randn()
x = np.linspace(-2*np.pi, 2*np.pi, 100)
data = list(zip(x, np.sin(x*scalar)))
return hv.Curve(data)
pn.Row(heatmap, hv.DynamicMap(tap_heatmap, streams=[posxy]), selector)
Ok I finally got it. It turned out to be simple (just as expected) but not quite intuitive. Basically, different approach for implementing selector (dropdown menu) should be used. Working code for such example is below:
import pandas as pd
import numpy as np
import panel as pn
import holoviews as hv
hv.extension('bokeh')
def create_test_df(k_features, n_tickers=5, m_windows=5):
start_date = pd.Timestamp('01-01-2020')
window_len = pd.Timedelta(days=1)
cols = ['window_dt', 'ticker'] + [f'feature_{i}' for i in range(k_features)]
data = {c: [] for c in cols}
for w in range(m_windows):
window_dt = start_date + w*window_len
for t in range(n_tickers):
ticker = f'ticker_{t}'
data['window_dt'].append(window_dt)
data['ticker'].append(ticker)
for f in range(k_features):
data[f'feature_{f}'].append(np.random.rand())
return pd.DataFrame(data)
def load_heatmap(feature):
return hv.HeatMap(df[['window_dt', 'ticker', f'{feature}']])
def tap_heatmap(x, y):
scalar = np.random.randn()
x = np.linspace(-2*np.pi, 2*np.pi, 100)
data = list(zip(x, np.sin(x*scalar)))
return hv.Curve(data)
k_features = 3
features = [f'feature_{i}' for i in range(k_features)]
df = create_test_df(k_features)
heatmap_dmap = hv.DynamicMap(load_heatmap, kdims='Feature').redim.values(Feature=features)
posxy = hv.streams.Tap(source=heatmap_dmap, x='01-01-2020', y='ticker_0')
sidegraph_dmap = hv.DynamicMap(tap_heatmap, streams=[posxy])
pn.Row(heatmap_dmap, sidegraph_dmap)
Struggling to understand why this bokeh visual will not allow me to change plots and see the predicted data. The plot and select (dropdown-looking) menu appears, but I'm not able to change the plot for items in the menu.
Running Bokeh 1.2.0 via Anaconda. The code has been run both inside & outside of Jupyter. No errors display when the code is run. I've looked through the handful of SO posts relating to this same issue, but I've not been able to apply the same solutions successfully.
I wasn't sure how to create a toy problem out of this, so in addition to the code sample below, the full code (including the regression code and corresponding data) can be found at my github here (code: Regression&Plotting.ipynb, data: pred_data.csv, historical_data.csv, features_created.pkd.)
import pandas as pd
import datetime
from bokeh.io import curdoc, output_notebook, output_file
from bokeh.layouts import row, column
from bokeh.models import Select, DataRange1d, ColumnDataSource
from bokeh.plotting import figure
#Must be run from the command line
def get_historical_data(src_hist, drug_id):
historical_data = src_hist.loc[src_hist['ndc'] == drug_id]
historical_data.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)#.dropna()
historical_data['date'] = pd.to_datetime(historical_data[['year', 'month', 'day']], infer_datetime_format=True)
historical_data = historical_data.set_index(['date'])
historical_data.sort_index(inplace = True)
# csd_historical = ColumnDataSource(historical_data)
return historical_data
def get_prediction_data(src_test, drug_id):
#Assign the new date
#Write a new dataframe with values for the new dates
df_pred = src_test.loc[src_test['ndc'] == drug_id].copy()
df_pred.loc[:, 'year'] = input_date.year
df_pred.loc[:, 'month'] = input_date.month
df_pred.loc[:, 'day'] = input_date.day
df_pred.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)
prediction = lin_model.predict(df_pred)
prediction_data = pd.DataFrame({'drug_id': prediction[0][0], 'predictions': prediction[0][1], 'date': pd.to_datetime(df_pred[['year', 'month', 'day']], infer_datetime_format=True, errors = 'coerce')})
prediction_data = prediction_data.set_index(['date'])
prediction_data.sort_index(inplace = True)
# csd_prediction = ColumnDataSource(prediction_data)
return prediction_data
def make_plot(historical_data, prediction_data, title):
#Historical Data
plot = figure(plot_width=800, plot_height = 800, x_axis_type = 'datetime',
toolbar_location = 'below')
plot.xaxis.axis_label = 'Time'
plot.yaxis.axis_label = 'Price ($)'
plot.axis.axis_label_text_font_style = 'bold'
plot.x_range = DataRange1d(range_padding = 0.0)
plot.grid.grid_line_alpha = 0.3
plot.title.text = title
plot.line(x = 'date', y='nadac_per_unit', source = historical_data, line_color = 'blue', ) #plot historical data
plot.line(x = 'date', y='predictions', source = prediction_data, line_color = 'red') #plot prediction data (line from last date/price point to date, price point for input_date above)
return plot
def update_plot(attrname, old, new):
ver = vselect.value
new_hist_source = get_historical_data(src_hist, ver) #calls the function above to get the data instead of handling it here on its own
historical_data.data = ColumnDataSource.from_df(new_hist_source)
# new_pred_source = get_prediction_data(src_pred, ver)
# prediction_data.data = new_pred_source.data
#Import data source
src_hist = pd.read_csv('data/historical_data.csv')
src_pred = pd.read_csv('data/pred_data.csv')
#Prep for default view
#Initialize plot with ID number
ver = 781593600
#Set the prediction date
input_date = datetime.datetime(2020, 3, 31) #Make this selectable in future
#Select-menu options
menu_options = src_pred['ndc'].astype(str) #already contains unique values
#Create select (dropdown) menu
vselect = Select(value=str(ver), title='Drug ID', options=sorted((menu_options)))
#Prep datasets for plotting
historical_data = get_historical_data(src_hist, ver)
prediction_data = get_prediction_data(src_pred, ver)
#Create a new plot with the source data
plot = make_plot(historical_data, prediction_data, "Drug Prices")
#Update the plot every time 'vselect' is changed'
vselect.on_change('value', update_plot)
controls = row(vselect)
curdoc().add_root(row(plot, controls))
UPDATED: ERRORS:
1) No errors show up in Jupyter Notebook.
2) CLI shows a UserWarning: Pandas doesn't allow columns to be careated via a new attribute name, referencing `historical_data.data = ColumnDatasource.from_df(new_hist_source).
Ultimately, the plot should have a line for historical data, and another line or dot for predicted data derived from sklearn. It also has a dropdown menu to select each item to plot (one at a time).
Your update_plot is a no-op that does not actually make any changes to Bokeh model state, which is what is necessary to change a Bokeh plot. Changing Bokeh model state means assigning a new value to a property on a Bokeh object. Typically, to update a plot, you would compute a new data dict and then set an existing CDS from it:
source.data = new_data # plain python dict
Or, if you want to update from a DataFame:
source.data = ColumnDataSource.from_df(new_df)
As an aside, don't assign the .data from one CDS to another:
source.data = other_source.data # BAD
By contrast, your update_plot computes some new data and then throws it away. Note there is never any purpose to returning anything at all from any Bokeh callback. The callbacks are called by Bokeh library code, which does not expect or use any return values.
Lastly, I don't think any of those last JS console errors were generated by BokehJS.
I am trying to find the max and min value for each category within source = columndatasource where my stock data is organized into columns by (Open, High, Low, Close, AdjClose, Volume, etc....)
I tried using,
max(source.data['Close'])
min(source.data['Close'])
however, the problem with max(source.data['Open'] is that the values do not update when I update my data when using the slider and select widgets.
Is there a way in which that I can find the min and max of each column that will update each time when I update my data ?
from math import pi
import pandas as pd
import numpy as np
import datetime
import time
from datetime import date
from bokeh.layouts import row, widgetbox, column
from bokeh.models import DataRange1d, LinearAxis, Range1d, ColumnDataSource, PrintfTickFormatter, CDSView, BooleanFilter, NumeralTickFormatter
from bokeh.models.widgets import PreText, Select, DateRangeSlider, Button, DataTable, TableColumn, NumberFormatter
from bokeh.io import curdoc, show, reset_output
from bokeh.plotting import figure, output_file
DEFAULT_TICKERS = ['AAPL','GOOG','NFLX', 'TSLA']
ticker1 = Select(value='AAPL', options = DEFAULT_TICKERS)
range_slider1 = DateRangeSlider(start=date(2014,1,1) , end=date(2017,1,1), value=(date(2014,2,1),date(2016,3,1)), step=1)
def load_ticker(ticker):
fname = ( '%s.csv' % ticker.lower())
data = pd.read_csv( fname, header = None, parse_dates = ['Date'],
names =['Date','Open','High','Low','Close','AdjClose','Volume'])
return data
def get_data(t1):
data = load_ticker(t1)
return data
def ticker1_change(attrname, old, new):
update()
def range_slider_change(attrname, old, new):
update()
def update(selected=None):
t1 = ticker1.value
if isinstance(range_slider1.value[0], (int, float)):
# pandas expects nanoseconds since epoch
start_date = pd.Timestamp(float(range_slider1.value[0])*1e6)
end_date = pd.Timestamp(float(range_slider1.value[1])*1e6)
else:
start_date = pd.Timestamp(range_slider1.value[0])
end_date = pd.Timestamp(range_slider1.value[1])
datarange = get_data(t1)
datarange['Date'] = pd.to_datetime(datarange['Date'])
mask = (datarange['Date'] > start_date) & (datarange['Date'] <= end_date)
data = datarange.loc[mask]
source.data = source.from_df(data)
p.title.text = t1
data = get_data(ticker1.value)
source = ColumnDataSource(data)
p = figure(plot_width=900, plot_height=400, x_axis_type='datetime', y_range = Range1d(min(source.data['Close']), max(source.data['Close'])))
p.grid.grid_line_alpha = 0.3
p.line('Date', 'Close', source=source)
ticker1.on_change('value', ticker1_change)
range_slider1.on_change('value', range_slider_change)
update()
layout = column(ticker1,range_slider1, p)
curdoc().add_root(layout)
curdoc().title = "Stock"
Yes. Your question is a little convoluted
Short answer: You need to create another "source" that contains the max and min values.
Long answer:
Your code is not running properly. I copied/pasted your code ^^ and ran it on a local bokeh server. No output i.e. you need to fix your code first.
But, let's say that your code was running. The only way as of now to auto update a max or min each time you change your bokeh slider or other widget value is to create another source, let's say source2.
source = ColumnDataSource(data_max_min)
Then, match the keys to the same value. In your example^^, it would most likely be date in the dictionary (data_max_min).
E.g.
pd = read_csv('.../AAPL.csv', header=0, index=None)
aapl_close = pd.DataFrame(aapl_df['close'])
aapl_close.index = aapl_df.date
aapl_close
close
date
2018/11/23 172.29
2018/11/26 174.62
2018/11/27 174.24
I'm assuming that you want to get a max and min value for each time range that you want to analyze on a rolling basis (or something like that). My code will just get the max for each close (*it will be the same value) just as an example. If you don't understand this, I would recommend reading some of the documentation again.
aapl_max_df = pd.DataFrame()
aapl_max_df['max'] = [max(prices) for prices in aapl_close['close']]
aapl_max_df.index = aapl_close.index
aapl_max_min = {}
dates = aapl_max_min.index
for i in range(aapl_max_min.shape[0]):
aapl_max_min[aapl_max_min.index.values[i]] = aapl_max_min['max'].values[i]
source2 = ColumnDataSource(data=aapl_max_min[dates[0]])
Then when you update the slider, you will need to update the "date" for for both sources. This is something not yet in your code. There are several examples online on how to do this (https://github.com/bokeh/bokeh/tree/master/examples/app/gapminder).
like so-->
def slider_update(attrname, old, new):
year = slider.value
label.text = str(year)
source.data = data[year]
source2.data = data[year]