python bokeh plot how to format axis display - python

The y-axis ticks seem to format numbers like 500000000 as 5.000e+8. Is there a way to control the display so that it shows 500000000 instead?
I'm using Python 2.7 and Bokeh 0.5.2.
I'm trying out the timeseries example on the Bokeh tutorials page.
The tutorial plots 'Adj Close' against 'Date', but I'm plotting 'Volume' against 'Date'.

You can also use NumeralTickFormatter as used in the toy plot below. The other possible values in place of '00' are listed here.
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.models import NumeralTickFormatter
df = pd.DataFrame(np.random.randint(0, 90000000000, (10,1)), columns=['my_int'])
p = figure(plot_width=700, plot_height=280, y_range=[0,100000000000])
output_file("toy_plot_with_commas")
for index, record in df.iterrows():
    p.rect([index], [record['my_int']/2], 0.8, [record['my_int']], fill_color="red", line_color="black")
p.yaxis.formatter=NumeralTickFormatter(format="00")
show(p)
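As a hedged illustration of a different format string: the sketch below assumes the numeral.js-style pattern "0,0", which the Bokeh documentation lists for thousands separators (so 500000000 would be shown as 500,000,000). The data values are made up for the example.

```python
from bokeh.models import NumeralTickFormatter
from bokeh.plotting import figure

# Made-up volumes, large enough to be shown in scientific notation by default
volumes = [150000000, 320000000, 500000000]

p = figure()
p.line([0, 1, 2], volumes)

# "0,0" asks for whole numbers with thousands separators
p.yaxis.formatter = NumeralTickFormatter(format="0,0")
```

Call show(p) afterwards to render the plot as in the example above.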

You have to add the option p.left[0].formatter.use_scientific = False to your code. In the timeseries tutorial, it'd be:
p1 = figure(title="Stocks")
p1.line(
    AAPL['Date'],
    AAPL['Adj Close'],
    color='#A6CEE3',
    legend='AAPL',
)
p1.left[0].formatter.use_scientific = False # <---- This forces showing 500000000 instead of 5.000e+8 as you want
show(VBox(p1, p2))
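In later Bokeh releases the same switch can be set through p.yaxis rather than p.left[0]. A minimal sketch, assuming a Bokeh version where BasicTickFormatter exposes use_scientific (the data is invented):

```python
from bokeh.models import BasicTickFormatter
from bokeh.plotting import figure

p = figure(title="Volume")
p.line([0, 1, 2], [150000000, 320000000, 500000000])

# Replace the default formatter on every y-axis with one that never
# switches to scientific notation
p.yaxis.formatter = BasicTickFormatter(use_scientific=False)
```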

Related

Plotly: How to add vertical lines at specified points?

I have a data frame plot of a time series along with a list of numeric values at which I'd like to draw vertical lines. The plot is an interactive one created using the cufflinks package. Here is an example of three time series with 1000 time values; I'd like to draw vertical lines at 500 and 800. My attempt using "axvline" is based upon suggestions I've seen for similar posts:
import numpy as np
import pandas as pd
import cufflinks
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig=df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.axvline([500,800], linewidth=5,color="black", linestyle="--")
fig.show()
The error message states 'Figure' object has no attribute 'axvline'.
I'm not sure whether this message is due to my lack of understanding about basic plots or stems from a limitation of using igraph.
The answer:
To add a line to an existing plotly figure, just use:
fig.add_shape(type='line',...)
The details:
I gather this is the post you've seen, since you're mixing in matplotlib. As has been stated in the comments, axvline has nothing to do with plotly; it was only used as an example of how you could have done it using matplotlib. Using plotly, I'd go for fig.add_shape(go.layout.Shape(type="line")). But before you try it out for yourself, please be aware that cufflinks has been deprecated. I really liked cufflinks, but now there are better options for building both quick and detailed graphs. If you'd like to stick to one-liners similar to iplot, I'd suggest using plotly.express. The only hurdle in your case is changing your dataset from the wide format to the long format that plotly.express prefers. The snippet below does just that to produce the following plot:
Code:
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.offline import iplot
#
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
df['id'] = df.index
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
# plotly line figure
fig = px.line(df, x='id', y='value', color='variable')
# lines to add, as (label, x-position) pairs; a dict would silently drop the repeated 'a' key
lines = [('a', 500), ('c', 700), ('a', 900), ('b', 950)]
# add lines using absolute references
for label, x_pos in lines:
    fig.add_shape(type='line',
                  yref="y",
                  xref="x",
                  x0=x_pos,
                  y0=df['value'].min()*1.2,
                  x1=x_pos,
                  y1=df['value'].max()*1.2,
                  line=dict(color='black', width=3))
    fig.add_annotation(
        x=x_pos,
        y=1.06,
        yref='paper',
        showarrow=False,
        text=label)
fig.show()
Not sure if this is what you want, but adding two scatter traces seems to work:
import numpy as np
import pandas as pd
import cufflinks
np.random.seed(123)
X = np.random.randn(1000,3)
df=pd.DataFrame(X, columns=['a','b','c'])
fig = df.iplot(asFigure=True,xTitle='time',yTitle='values',title='Time Series Plot')
fig.add_scatter(x=[500]*100, y=np.linspace(-4,4,100), name='lower')
fig.add_scatter(x=[800]*100, y=np.linspace(-4,4,100), name='upper')
fig.show()
Output:

How do I make a line graph from a dataframe in bokeh?

I'm reading a .csv file in bokeh which has two columns: one for date and one for the values corresponding to that date. I'm trying to make a line graph with the dates on the x axis and the values on y, but it isn't working. Any ideas?
CODE:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from datetime import datetime
from bokeh.palettes import Spectral3
output_file('output.html')
df = pd.read_csv('speedgraphak29.csv')
p = figure(x_axis_type="datetime")
p.line(x=df.dates, y=df.windspeed, line_width=2)
show(p)
It's returning an empty graph. What should I do?
Since you didn't provide an example of the input data I had to make something up. You probably forgot to specify that the dates column should be interpreted as datetime values as bigreddot noted. Here is a working example:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from datetime import datetime
from bokeh.palettes import Spectral3
output_file('output.html')
df = pd.DataFrame.from_dict({'dates': ["1-1-2019", "2-1-2019", "3-1-2019", "4-1-2019", "5-1-2019", "6-1-2019", "7-1-2019", "8-1-2019", "9-1-2019", "10-1-2019"], 'windspeed': [10, 15, 20,30 , 25, 5, 15, 30, 35, 25]})
df['dates'] = pd.to_datetime(df['dates'])
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime")
p.line(x='dates', y='windspeed', line_width=2, source=source)
show(p)
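One thing to watch with strings like "2-1-2019": without an explicit format, pandas has to guess whether the day or the month comes first. A small sketch, assuming the made-up dates above are day-month-year, that states the format explicitly:

```python
import pandas as pd

raw_dates = ["1-1-2019", "2-1-2019", "3-1-2019"]

# Spell out the layout so "2-1-2019" is read as 2 January, not 1 February
parsed = pd.to_datetime(raw_dates, format="%d-%m-%Y")

print(list(parsed.day))    # day of month for each entry
print(list(parsed.month))  # month for each entry
```

If your file uses month-day-year instead, swap the first two directives in the format string.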
You could use this. Say you have a CSV called sample_data.csv with columns Date and Amount. Just to add on to what Jasper had.
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
output_file('output.html')
df = pd.read_csv('sample_data.csv', parse_dates=['Date'])
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime")
p.line(x='Date', y='Amount', line_width=2, source=source)
show(p)
In this case, read the CSV with the column as a date format. Using ColumnDataSource allows you to use advanced features like hovering over a plot to see more details if needed.
Alternatively, you may use lists directly, which would look like this:
p.line(x=my_list_of_dates, y=my_list_of_counts, line_width=2)
This would mean reading each column and making a list from it. All in all, using ColumnDataSource lets you refer to a column directly by its name.

Bokeh skip tick labels for categorical data

I'm using Bokeh version 0.12.13.
I have mixed numerical and categorical data. Only one value on the x-axis is categorical; the rest are numerical. I converted everything to categorical data to do the plotting (this might not be the easiest way to achieve my goal). Now my x-axis tick labels are far denser than I need. I would like to show only every 10th value, so that the labels are 10, 20, ..., 90, rest.
This is what I tried so far:
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models.tickers import FixedTicker
# create mock data
n = [str(i) for i in np.arange(1,100)]
n.append('rest')
t = pd.DataFrame([n,list(np.random.randint(25,size=100))]).T
t.columns = ['Number','Value']
t.loc[t['Number']==100,'Number'] = 'Rest'
source = ColumnDataSource(t)
p = figure(plot_width=800, plot_height=400, title="",
x_range=t['Number'].tolist(),toolbar_location=None, tools="")
p.vbar(x='Number', top='Value', width=1, source=source,
line_color="white")
#p.xaxis.ticker = FixedTicker(ticks=[i for i in range(0,100,10)])
show(p)
Ideally, I would like the grid and the x-axis labels to appear every 10th value. Any help on how to get there would be greatly appreciated.
An easier way to do it is to keep the numerical data and use xaxis.major_label_overrides. Here is the code:
import numpy as np
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models.tickers import FixedTicker
# create mock data
n = np.arange(1,101)
t = pd.DataFrame([n,list(np.random.randint(25,size=100))]).T
t.columns = ['Number','Value']
source = ColumnDataSource(t)
p = figure(plot_width=800, plot_height=400, title="",
toolbar_location=None, tools="")
p.vbar(x='Number', top='Value', width=1, source=source,
line_color="white")
p.xaxis.major_label_overrides = {100: 'Rest'}
show(p)
You can do this now (in Bokeh 2.2.3) using FuncTickFormatter:
from bokeh.models import FuncTickFormatter
# This prints out only every 10th tick label
p.axis.formatter = FuncTickFormatter(code="""
if (index % 10 == 0)
{
return tick;
}
else
{
return "";
}
""")
Sometimes you might want to do this instead of using a numerical axis and major_label_overrides, e.g. in a heatmap, to get the content rects positioned in the right place, or if you don't have numerical data at all but still want gaps in the axis labels.
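Note that index % 10 == 0 keeps the ticks at positions 0, 10, 20, ..., so with factors '1' to '99' plus 'rest' the surviving labels would be 1, 11, 21, ... rather than 10, 20, .... The plain-Python sketch below mimics that selection, assuming index counts ticks from zero in factor order, and shows that shifting the test by one gives the labels asked for in the question. The helper visible_labels is made up for illustration.

```python
# Same factors as the question's x_range: '1' .. '99' followed by 'rest'
factors = [str(i) for i in range(1, 100)] + ["rest"]

def visible_labels(labels, keep):
    """Return the labels whose zero-based position passes the keep() test."""
    return [label for index, label in enumerate(labels) if keep(index)]

# Condition used in the formatter above
as_written = visible_labels(factors, lambda index: index % 10 == 0)

# Shifted condition: keeps every 10th factor when counting from one
shifted = visible_labels(factors, lambda index: (index + 1) % 10 == 0)

print(as_written)
print(shifted)
```

In the JavaScript snippet this corresponds to testing (index + 1) % 10 == 0 instead of index % 10 == 0.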

How to create a multi-line plot title in bokeh?

How do you create a multiline plot title in bokeh?... same question as https://github.com/bokeh/bokeh/issues/994
Is this resolved yet?
import bokeh.plotting as plt
plt.output_file("test.html")
plt.text(x=[1,2,3], y = [0,0,0], text=['hello\nworld!', 'hello\nworld!', 'hello\nworld!'], angle = 0)
plt.show()
Additionally, can the title text string accept rich text?
In recent versions of Bokeh, labels and text glyphs can accept newlines in the text, and these will be rendered as expected. For multi-line titles, you will have to add explicit Title annotations for each line you want. Here is a complete example:
from bokeh.io import output_file, show
from bokeh.models import Title
from bokeh.plotting import figure
output_file("test.html")
p = figure(x_range=(0, 5))
p.text(x=[1,2,3], y = [0,0,0], text=['hello\nworld!', 'hello\nworld!', 'hello\nworld!'], angle = 0)
p.add_layout(Title(text="Sub-Title", text_font_style="italic"), 'above')
p.add_layout(Title(text="Title", text_font_size="16pt"), 'above')
show(p)
Which produces:
Note that you are limited to the standard "text properties" that Bokeh exposes, since the underlying HTML Canvas does not accept rich text. If you need something like that, it might be possible with a custom extension.
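To avoid writing one add_layout call per line by hand, the lines can be added in a loop. This is only a sketch: add_multiline_title is a hypothetical helper, not part of Bokeh, and it assumes (as in the example above, where the sub-title is added first) that annotations added later to 'above' sit further from the plot, so the lines are added in reverse.

```python
from bokeh.models import Title
from bokeh.plotting import figure

def add_multiline_title(plot, text, **title_props):
    """Hypothetical helper: add one Title annotation per line of text."""
    # Later additions are placed further out, so start with the last line
    for line in reversed(text.split("\n")):
        plot.add_layout(Title(text=line, **title_props), "above")

p = figure(x_range=(0, 5))
add_multiline_title(p, "First line\nSecond line", text_font_size="12pt")
```

Every line gets the same text properties here; for a larger first line, call add_layout separately as in the example above.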
You can add a simple title to your plot with this:
from bokeh.plotting import figure, show, output_file
output_file("test.html")
p = figure(title="Your title")
p.text(x=[1,2,3], y = [0,0,0], text=['hello\nworld!', 'hello\nworld!', 'hello\nworld!'], angle = 0)
show(p)
Addendum
Here is a working example for plotting a pandas dataframe that you can copy and paste into a Jupyter notebook. It's neither elegant nor pythonic. I got it a long time ago from various SO posts. Sorry that I don't remember which ones anymore, so I can't cite them.
Code
# coding: utf-8
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import pandas as pd
import numpy as np
# Create some data
np_arr = np.array([[1,1,1], [2,2,2], [3,3,3], [4,4,4]])
pd_df = pd.DataFrame(data=np_arr)
pd_df
# Convert for multi-line plotting (Series.as_matrix() was removed in pandas 1.0; .values returns the same array)
data = [row[1].values for row in pd_df.iterrows()]
num_lines = len(pd_df)
cols = [pd_df.columns.values] * num_lines
data
# Init bokeh output for jupyter notebook - Adjust this to your needs
output_notebook()
# Plot
p = figure(plot_width=600, plot_height=300)
p.multi_line(xs=cols, ys=data)
show(p)
Plot

TimeSeries in Bokeh using a dataframe with index

I'm trying to use Bokeh to plot a Pandas dataframe with a DateTime column containing years and a numeric one. If the DateTime column is specified as x, the behaviour is as expected (years on the x-axis). However, if I use set_index to turn the DateTime column into the index of the dataframe and then only specify y in the TimeSeries, I get time in milliseconds on the x-axis. A minimal example:
import pandas as pd
import numpy as np
from bokeh.charts import TimeSeries, output_file, show
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = TimeSeries(test,x='datetime',y='foo')
show(fig)
output_file('fig2.html')
test = test.set_index('datetime')
fig2 = TimeSeries(test,y='foo')
show(fig2)
Is this the expected behaviour or a bug? I would expect the same picture with both approaches.
Cheers!!
Bokeh used to add an index for internal reasons but as of not-so-recent versions (>= 0.12.x) it no longer does this. Also it's worth noting that the bokeh.charts API has been deprecated and removed. The equivalent code using the stable bokeh.plotting API yields the expected result:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import row
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = figure(x_axis_type="datetime")
fig.line(x='datetime',y='foo', source=test)
test = test.set_index('datetime')
fig2 = figure(x_axis_type="datetime")
fig2.line(x='datetime', y='foo', source=test)
show(row(fig, fig2))
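Why x='datetime' still works after set_index: when a DataFrame is converted to a ColumnDataSource, a named index is included as a column under its own name. A short sketch of that behaviour, as I understand it from the Bokeh documentation (the five-row frame is a shortened stand-in for the one above):

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource

test = pd.DataFrame({'datetime': pd.date_range('1/1/1880', periods=5),
                     'foo': np.arange(5)})

# The index is named 'datetime', so it shows up as a 'datetime' column
indexed = ColumnDataSource(test.set_index('datetime'))
print(sorted(indexed.data.keys()))

# An unnamed default index is exposed under the generic name 'index'
unnamed = ColumnDataSource(test)
print(sorted(unnamed.data.keys()))
```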
