How do I make a line graph from a dataframe in bokeh? - python

I'm reading a .csv file with two columns into Bokeh: one for dates and one for the value on each date. I'm trying to make a line graph with the dates on the x-axis and the values on the y-axis, but it isn't working. Any ideas?
CODE:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from datetime import datetime
from bokeh.palettes import Spectral3
output_file('output.html')
df = pd.read_csv('speedgraphak29.csv')
p = figure(x_axis_type="datetime")
p.line(x=df.dates, y=df.windspeed, line_width=2)
show(p)
It's returning an empty graph. What should I do?

Since you didn't provide an example of the input data, I had to make something up. As bigreddot noted, you probably forgot to specify that the dates column should be interpreted as datetime values. Here is a working example:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
output_file('output.html')
# Made-up sample data: the dates are strings, just like a freshly read CSV column
df = pd.DataFrame.from_dict({'dates': ["1-1-2019", "2-1-2019", "3-1-2019", "4-1-2019", "5-1-2019", "6-1-2019", "7-1-2019", "8-1-2019", "9-1-2019", "10-1-2019"], 'windspeed': [10, 15, 20, 30, 25, 5, 15, 30, 35, 25]})
# Convert the strings to real datetime values so the datetime axis can place them
df['dates'] = pd.to_datetime(df['dates'])
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime")
p.line(x='dates', y='windspeed', line_width=2, source=source)
show(p)
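To see why the original plot likely comes out empty, here is a pandas-only sketch. The real speedgraphak29.csv wasn't posted, so the contents below are made up (dates assumed to be stored as text like `2019-01-01`), and `io.StringIO` stands in for the file. Without `parse_dates`, the column stays as strings rather than datetimes; with it, the column gets a datetime dtype that a datetime axis can use.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for speedgraphak29.csv
csv_text = """dates,windspeed
2019-01-01,10
2019-02-01,15
2019-03-01,20
"""

# Without parse_dates, pandas keeps the dates as plain strings (not a datetime dtype)
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["dates"].dtype)

# With parse_dates, the column is converted to datetime values
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["dates"])
print(parsed["dates"].dtype)
```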

Just to add on to what Jasper had: say you have a CSV called sample_data.csv with columns Date and Amount. You could use this:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
output_file('output.html')
df = pd.read_csv('sample_data.csv', parse_dates=['Date'])
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime")
p.line(x='Date', y='Amount', line_width=2, source=source)
show(p)
Here, parse_dates tells pandas to read the Date column as datetime values. Using ColumnDataSource also lets you use more advanced features, such as hovering over the plot to see more details.
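A sketch of that hover feature, using made-up data with the same `Date`/`Amount` column names instead of reading sample_data.csv. Tooltips refer to ColumnDataSource columns by name with `@`; the `formatters={"@Date": "datetime"}` key syntax assumes Bokeh 2.0 or later.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up data with the same column names as sample_data.csv
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"]),
    "Amount": [10, 15, 20],
})
source = ColumnDataSource(df)

p = figure(x_axis_type="datetime")
p.line(x="Date", y="Amount", line_width=2, source=source)

# "@Date" and "@Amount" look up columns in the source by name;
# the "datetime" formatter turns the stored timestamps into readable dates
hover = HoverTool(
    tooltips=[("Date", "@Date{%F}"), ("Amount", "@Amount")],
    formatters={"@Date": "datetime"},
)
p.add_tools(hover)
```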
Alternatively, you can pass lists directly. Note that the variables go in without quotes, since a quoted string is interpreted as a column name:
p.line(x=my_list_of_dates, y=my_list_of_counts, line_width=2)
This means reading each column and converting it to a list first. All in all, ColumnDataSource is more convenient because it lets you refer to each column directly by its name.
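A minimal sketch of the list route, with a small made-up DataFrame standing in for sample_data.csv. Each column is pulled out with `.tolist()` and passed as data rather than as a column name:

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up data in place of sample_data.csv
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"]),
    "Amount": [10, 15, 20],
})

# Read each column out as a plain Python list
my_list_of_dates = df["Date"].tolist()
my_list_of_counts = df["Amount"].tolist()

p = figure(x_axis_type="datetime")
# No quotes and no source: the lists themselves are the data
r = p.line(x=my_list_of_dates, y=my_list_of_counts, line_width=2)
```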

Related

Formatting hover text when plotting with hvplot

I am trying to use hvplot.line to plot two y variables in a line chart. My goal is to format the hover text in a specific way (say, 1 decimal place). I used the standard method of formatting them in Bokeh's HoverTool and tried to pass it with ".opts(tools=)".
But the formatting is not reflected in the plot. I specified the format as '0.0', but the hover text still shows 3 decimals. What did I do wrong?
My code looks something like this:
import pandas as pd
import numpy as np
import hvplot.pandas
hvplot.extension('bokeh')
from numpy import random
from bokeh.models import HoverTool
df=pd.DataFrame({'length':np.linspace(0,4000,6),
'slope':np.linspace(1.7,2.4,6),
'Direction':np.linspace(1.2,-0.5,6),
'clearance':random.rand(6),
'weight':random.rand(6)},)
hover=HoverTool(tooltips=[('clearance','@clearance{0.0}'),('weight','@weight{0.0}')])
df.hvplot.line(x='length',y=['slope','Direction'],invert=True,hover_cols=['clearance','weight']).opts(tools=[hover])
But if I reduce the number of y variables to just one, it works fine, i.e. when I replace the last line of code with:
df.hvplot.line(x='length',y=['Direction'],invert=True,hover_cols=['clearance','weight']).opts(tools=[hover])
You can pass the tools to the plot call as a keyword argument instead of through .opts().
Change your code (the old line is commented out):
# df.hvplot.line(x='length',y=['slope','Direction'], hover_cols=['clearance','weight'], invert=True).opts(tools=[hover])
df.hvplot.line(x='length',y=['slope','Direction'], hover_cols=['clearance','weight'], tools=[hover], invert=True)
and your HoverTool with its formatters is applied.
Minimal Example
import hvplot.pandas
import numpy as np
import pandas as pd
from bokeh.models import HoverTool
hvplot.extension('bokeh')
df=pd.DataFrame({
'length':np.linspace(0,4000,6),
'slope':np.linspace(1.7,2.4,6),
'Direction':np.linspace(1.2,-0.5,6),
'clearance':np.random.rand(6),
'weight':np.random.rand(6)}
)
hover=HoverTool(tooltips=[('clearance','@clearance{0.0}'),('weight','@weight{0.0}')])
df.hvplot.line(
x='length',
y=['slope','Direction'],
hover_cols=['clearance','weight'],
tools=[hover],
invert=True
)

Bokeh categorical x-axis alignment on scatter

I have a scatter with a categorical x-axis, but my circles don't align with the axis. This code example replicates the issue:
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, HoverTool
data = [[1,12],[2,8]]
x_axis_rng = ['VAL 1','VAL 2']
df = pd.DataFrame(data)
df.columns = ['x','y']
chart_data = ColumnDataSource(df)
print(chart_data)
plot = figure(title='Example',
x_axis_label='x',
y_axis_label='y',
x_range=x_axis_rng)
plot.circle('x',
'y',
size=10,
source=chart_data)
hover = HoverTool(tooltips=[('x', '@x'),('y', '@y')])
plot.add_tools(hover)
output_file('test.html')
show(plot)
I read about offsetting the x-axis, but I could not get that to work, and it doesn't seem like it should be necessary in this case.
Any help appreciated!
Rob
You created an axis/range with categorical factors, but are giving circle coordinates as numbers. If you want to position glyphs according to categorical values, the coordinates have to reflect that:
data = [['VAL 1', 12],['VAL 2', 8]]
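Putting that change into the full example gives something like the sketch below. It uses `scatter` (whose default marker is a circle) instead of `circle`, on the assumption that you're on a recent Bokeh release where passing `size` to `circle` is deprecated; on older versions `plot.circle` behaves the same way here.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# x now holds the same categorical factors used for the x_range
data = [["VAL 1", 12], ["VAL 2", 8]]
df = pd.DataFrame(data, columns=["x", "y"])
chart_data = ColumnDataSource(df)

plot = figure(title="Example", x_axis_label="x", y_axis_label="y",
              x_range=["VAL 1", "VAL 2"])
# Each point is placed at the centre of its category on the x-axis
plot.scatter("x", "y", size=10, source=chart_data)
plot.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))
```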

Bokeh skip tick labels for categorical data

I'm using Bokeh version 0.12.13.
I have mixed numerical and categorical data: only one value on the x-axis is categorical and the rest are numerical. I converted everything to categorical data to do the plotting (which might not be the easiest way to achieve my goal). Now my x-axis tick labels are much denser than I need. I would like to space them out every 10th value so the labels are 10, 20, ..., 90, rest.
This is what I tried so far:
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models.tickers import FixedTicker
# create mock data
n = [str(i) for i in np.arange(1,100)]
n.append('rest')
t = pd.DataFrame([n,list(np.random.randint(25,size=100))]).T
t.columns = ['Number','Value']
t.loc[t['Number']==100,'Number'] = 'Rest'
source = ColumnDataSource(t)
p = figure(plot_width=800, plot_height=400, title="",
x_range=t['Number'].tolist(),toolbar_location=None, tools="")
p.vbar(x='Number', top='Value', width=1, source=source,
line_color="white")
#p.xaxis.ticker = FixedTicker(ticks=[i for i in range(0,100,10)])
show(p)
Ideally, I would like the grid and the x-axis labels to appear every 10th value. Any help on how to get there would be greatly appreciated.
An easier way is to keep the data numerical and use xaxis.major_label_overrides. Here is the code:
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models.tickers import FixedTicker
# create mock data
n = np.arange(1,101)
t = pd.DataFrame([n,list(np.random.randint(25,size=100))]).T
t.columns = ['Number','Value']
source = ColumnDataSource(t)
p = figure(plot_width=800, plot_height=400, title="",
toolbar_location=None, tools="")
p.vbar(x='Number', top='Value', width=1, source=source,
line_color="white")
# Place ticks and grid lines at 10, 20, ..., 100
ticker = FixedTicker(ticks=list(range(10, 101, 10)))
p.xaxis.ticker = ticker
p.xgrid.ticker = ticker
# Label the tick at 100 as 'Rest'
p.xaxis.major_label_overrides = {100: 'Rest'}
show(p)
You can now do this (in Bokeh 2.2.3) using FuncTickFormatter:
from bokeh.models import FuncTickFormatter
# This prints out only every 10th tick label on the x-axis
p.xaxis.formatter = FuncTickFormatter(code="""
if (index % 10 == 0)
{
return tick;
}
else
{
return "";
}
""")
Sometimes you might want to do this instead of using a numerical axis with major_label_overrides, e.g. in a heatmap, to get the content rects positioned correctly, or when you don't have numerical data at all but still want gaps in the axis labels.

TimeSeries in Bokeh using a dataframe with index

I'm trying to use Bokeh to plot a Pandas dataframe with a DateTime column containing years and a numeric column. If the DateTime column is specified as x, the behaviour is as expected (years on the x-axis). However, if I use set_index to turn the DateTime column into the index of the dataframe and then only specify y in the TimeSeries, I get time in milliseconds on the x-axis. A minimal example:
import pandas as pd
import numpy as np
from bokeh.charts import TimeSeries, output_file, show
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = TimeSeries(test,x='datetime',y='foo')
show(fig)
output_file('fig2.html')
test = test.set_index('datetime')
fig2 = TimeSeries(test,y='foo')
show(fig2)
Is this the expected behaviour or a bug? I would expect the same picture with both approaches.
Cheers!!
Bokeh used to add an index for internal reasons, but as of not-so-recent versions (>= 0.12.x) it no longer does this. It's also worth noting that the bokeh.charts API has been deprecated and removed. The equivalent code using the stable bokeh.plotting API yields the expected result:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import row
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = figure(x_axis_type="datetime")
fig.line(x='datetime',y='foo', source=test)
test = test.set_index('datetime')
fig2 = figure(x_axis_type="datetime")
fig2.line(x='datetime', y='foo', source=test)
show(row(fig, fig2))
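The index version works because converting a DataFrame to a ColumnDataSource resets a named index into an ordinary column carrying the index's name, so x='datetime' still resolves. A quick check, with a shorter date range for brevity:

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource

test = pd.DataFrame({"datetime": pd.date_range("1/1/1880", periods=5),
                     "foo": np.arange(5)})
indexed = test.set_index("datetime")

# The named index "datetime" becomes a regular column of the source
source = ColumnDataSource(indexed)
print(source.column_names)
```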

python bokeh plot how to format axis display

The y-axis ticks seem to be formatting numbers like 500000000 as 5.000e+8. Is there a way to control the display so that it shows 500000000?
Using Python 2.7, Bokeh 0.5.2.
I'm trying out the time series example from the Bokeh tutorials page.
The tutorial plots 'Adj Close' against 'Date', but I'm plotting 'Volume' against 'Date'.
You can also use NumeralTickFormatter, as in the toy plot below. The other possible format strings in place of '00' are listed in the NumeralTickFormatter documentation.
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.models import NumeralTickFormatter
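# dtype=np.int64 avoids an out-of-bounds error where NumPy's default integer is 32-bit
df = pd.DataFrame(np.random.randint(0, 90000000000, (10,1), dtype=np.int64), columns=['my_int'])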
p = figure(plot_width=700, plot_height=280, y_range=[0,100000000000])
output_file("toy_plot_with_commas")
for index, record in df.iterrows():
p.rect([index], [record['my_int']/2], 0.8, [record['my_int']], fill_color="red", line_color="black")
p.yaxis.formatter=NumeralTickFormatter(format="00")
show(p)
You need to turn off scientific notation on the axis formatter by adding p.left[0].formatter.use_scientific = False to your code. In the time series tutorial, it would be:
p1 = figure(title="Stocks")
p1.line(
AAPL['Date'],
AAPL['Adj Close'],
color='#A6CEE3',
legend='AAPL',
)
p1.left[0].formatter.use_scientific = False # <---- This forces showing 500000000 instead of 5.000e+8 as you want
show(VBox(p1, p2))
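The tutorial snippet depends on `AAPL`, `p2` and `VBox` from the old tutorial and won't run on its own. A self-contained sketch with made-up values, assuming a recent Bokeh where `figure` takes `width`/`height`, sets a `BasicTickFormatter` with `use_scientific=False` on the y-axis explicitly:

```python
from bokeh.models import BasicTickFormatter
from bokeh.plotting import figure

p = figure(width=700, height=280)
# Made-up values large enough to trigger scientific notation by default
p.line([1, 2, 3], [100000000, 500000000, 300000000], line_width=2)

# Replace the y-axis formatter with one that never switches to e-notation
p.yaxis.formatter = BasicTickFormatter(use_scientific=False)
```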
