Create a stacked bar chart with grouping using pandas and bokeh - python

My dataset looks like this:
Count  Date        teams  sex
39     2017/12/28  a      m
26     2019/12/28  b      f
3      2016/12/28  c      f
8      2017/12/28  d      m
1      2019/12/28  f      f
22     2018/12/28  a      m
26     2016/12/29  b      m
I want a stacked bar chart, made with Bokeh, with sex as the stacks and the bars grouped by team for each day.
I found an answer, but it uses the old bokeh.charts API, which is now deprecated:
from bokeh.charts import Bar, output_file, show
p = Bar(df, label='date', values='count', stack='class', group='user',
)

Bar was deprecated a long time ago and subsequently removed completely. It cannot be used with any recent version, and I would highly recommend against using it with old versions either.
Instead, there is an entire chapter of the user's guide that describes how to make all sorts of bar charts, including stacked and grouped bar charts: Handling Categorical Data.
To create a stacked, grouped bar chart, you need to specify nested categories (tuples such as ("Q1", "jan")) together with vbar_stack.
A complete example is in the docs, but I have reproduced an abridged version here as well:
from bokeh.core.properties import value
from bokeh.io import show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
regions = ['east', 'west']
source = ColumnDataSource(data=dict(
x=factors,
east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],
))
p = figure(x_range=FactorRange(*factors), plot_height=250,
toolbar_location=None, tools="")
p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source,
legend=[value(x) for x in regions])
show(p)
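Applied to the question's own data, the same recipe means turning each (date, team) pair into a nested factor and giving each sex its own stack column. A minimal sketch, assuming the sample rows above are available as plain (Count, Date, teams, sex) tuples and that summing duplicate (date, team, sex) counts is acceptable. The names ROWS, build_stacked_data and make_plot are hypothetical, and make_plot assumes a recent Bokeh (legend_label= needs 1.4 or later; height= replaces plot_height=, which Bokeh 3 removed):

```python
from collections import defaultdict

# Sample rows copied from the question: (Count, Date, teams, sex)
ROWS = [
    (39, "2017/12/28", "a", "m"),
    (26, "2019/12/28", "b", "f"),
    (3, "2016/12/28", "c", "f"),
    (8, "2017/12/28", "d", "m"),
    (1, "2019/12/28", "f", "f"),
    (22, "2018/12/28", "a", "m"),
    (26, "2016/12/29", "b", "m"),
]


def build_stacked_data(rows, stacks=("f", "m")):
    """Return (factors, data) for vbar_stack with nested (date, team) factors.

    Counts for the same (date, team, sex) are summed; a missing combination
    gets 0 so every stack column has exactly one value per factor.
    """
    totals = defaultdict(int)
    for count, date, team, sex in rows:
        totals[(date, team, sex)] += count
    # Outer level = date, inner level = team; YYYY/MM/DD strings sort chronologically.
    factors = sorted({(date, team) for _, date, team, _ in rows})
    data = {"x": factors}
    for sex in stacks:
        data[sex] = [totals[(date, team, sex)] for date, team in factors]
    return factors, data


def make_plot(rows, stacks=("f", "m"), colors=("#e84d60", "#718dbf")):
    # Bokeh is imported here so build_stacked_data() works without it.
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure

    factors, data = build_stacked_data(rows, stacks)
    p = figure(x_range=FactorRange(*factors), height=250,
               toolbar_location=None, tools="")
    p.vbar_stack(list(stacks), x="x", width=0.9, color=list(colors),
                 source=ColumnDataSource(data=data), legend_label=list(stacks))
    return p
```

Calling show(make_plot(ROWS)) (show from bokeh.io) renders it. If the data is in a pandas DataFrame with the columns in the order Count, Date, teams, sex, list(df.itertuples(index=False, name=None)) produces the same kind of tuples.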

Related

Disable hue nesting in Seaborn

When plotting a bar chart with Seaborn and using the hue parameter to color the bars according to their column value, bars with identical column values are nested, or aggregated, and only a single bar is shown. The image below illustrates the problem. Patient number 1 has two samples of sample_type 1, with values 10 and 20. The two values have been nested, and both values are represented as a single bar (as the average of the two).
I'd like to avoid this nesting and instead get something like the image below.
Is this possible to achieve? MVE below. Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
"patient_number": [1, 1, 1, 2, 2, 2],
"sample_type": [1, 1, 2, 1, 2, 3],
"value": [10, 20, 15, 10, 11, 12]
})
sns.barplot(x="patient_number", y="value", hue="sample_type", data=df)
plt.show()
The following approach obtains the desired plot:
Seaborn's hue= parameter defines both the color and the position of the bars.
Per patient, an extra field ('idx') contains a unique number for each of the desired bars. This field restarts from 0 for each patient and is added to the dataframe.
'idx' can then be used as hue='idx' to get the desired columns, although they will simply be colored sequentially.
To get one color per sample type, an extra column contains a factorized version of the sample types (0 for the first type, 1 for the next, etc.).
Seaborn generates the bars per hue level, one for each patient. These bars can be accessed as a list via ax.patches. If a patient doesn't have a value for a given 'idx', a dummy bar will be added to the list.
By iterating through the patients and then through the 'idx', all bars can be visited and colored via 'sample_type'. As the ordering of the bars is a bit tricky, an adequate renumbering is needed.
The legend needs to be changed to reflect the sample types.
The given data is extended a bit to be able to test different numbers of samples per patient, and sample types that aren't simple subsequent numbers.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
'patient_number': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
'sample_type': ['st1', 'st1', 'st2', 'st1', 'st2', 'st3', 'st4', 'st4', 'st4', 'st4', 'st4'],
'value': [10, 20, 15, 10, 11, 12, 1, 2, 3, 4, 5]
})
df['idx'] = df.groupby('patient_number').cumcount()
df['sample_factors'], sample_labels = pd.factorize(df['sample_type'])
ax = sns.barplot(x='patient_number', y='value', hue='idx', data=df)
colors = plt.get_cmap('Set2').colors # https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
handles = [None for _ in sample_labels]
num_patients = len(ax.patches) // (df['idx'].max() + 1)
for i, (patient_id, group) in enumerate(df.groupby('patient_number')):
    for j, factor in enumerate(group['sample_factors']):
        patch = ax.patches[i + j * num_patients]
        patch.set_color(colors[factor])
        handles[factor] = patch
ax.legend(handles=handles, labels=list(sample_labels), title='Sample type')
plt.show()
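The renumbering in the loop (i + j * num_patients) relies on the order in which seaborn appends bars to ax.patches. A pure-Python sketch of that mapping, assuming the pre-0.13 seaborn layout in which all bars of one hue level (one 'idx' value) are added across every patient before the next level starts, with dummy bars filling the gaps. Seaborn 0.13 rewrote barplot, so this ordering may not hold there. The function names are hypothetical:

```python
def patch_index(x_pos, hue_pos, n_x):
    """Position in ax.patches of the bar at x category x_pos and hue level hue_pos.

    Assumes bars are appended hue level by hue level, each level spanning all
    n_x x categories (dummy bars included), as in seaborn before 0.13.
    """
    return x_pos + hue_pos * n_x


def patch_layout(samples_per_patient):
    """Map (patient, idx) -> index into ax.patches, for patients sorted by id."""
    patients = sorted(samples_per_patient)
    n_x = len(patients)
    return {
        (patient, j): patch_index(i, j, n_x)
        for i, patient in enumerate(patients)
        for j in range(samples_per_patient[patient])
    }


# Sample counts from the extended example: patients 1 and 2 have 3 samples, 3 has 5.
layout = patch_layout({1: 3, 2: 3, 3: 5})
```

With 5 hue levels and 3 patients, seaborn creates 15 patches; the 4 that get no entry in layout are the dummy bars for patients 1 and 2 at idx 3 and 4.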

Using Hovertool in Python with Bokeh

I am new to Python and am trying to make a plot with Bokeh. I want to use the hover tool, and it works when I hover over the dots. However, the X and Y values show ??? instead of the actual values. I'm not sure what I'm doing wrong, because the hover tool itself works but the values are not displayed.
from bokeh.plotting import figure
from bokeh.io import show, output_notebook
from bokeh.tile_providers import get_provider, Vendors
from bokeh.models import ColumnDataSource, HoverTool
tile_provider = get_provider(Vendors.CARTODBPOSITRON)
# Create a blank figure with labels
p = figure(plot_width = 600, plot_height = 600,
title = 'Example Glyphs',
x_axis_label = 'X', y_axis_label = 'Y')
hover = HoverTool(tooltips=[
("X", "@X"),
("Y", "@Y")])
p = figure(x_axis_type="mercator",
y_axis_type="mercator",
tools=[hover, 'wheel_zoom','save'])
p.add_tile(tile_provider)
# Example data
squares_x = [1, 3, 4, 5, 8]
squares_y = [8, 7, 3, 1, 10]
circles_x = [9, 12, 4, 3, 15]
circles_y = [8, 4, 11, 6, 10]
# Add squares glyph
p.circle(squares_x, squares_y, size = 12, color = 'navy', alpha = 0.6)
# Add circle glyph
p.circle(circles_x, circles_y, size = 12, color = 'red')
# Set to output the plot in the notebook
output_notebook()
# Show the plot
show(p)
If you aren't using an explicit ColumnDataSource (which would allow you to use and refer to whatever column names you want), then you must refer to the default column names Bokeh uses. In this case, for circle, the default column names are "x" and "y" (lower case, not upper case as you have above). So:
hover = HoverTool(tooltips=[
("X", "@x"),
("Y", "@y"),
])
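The ??? appears whenever a tooltip names a column that the data source doesn't have. A small pure-Python check makes such mismatches easy to spot. It assumes tooltips written as (label, template) pairs that use @column or @{column name}; $ special variables such as $x are skipped. The helper names are hypothetical, and make_plot illustrates the explicit-ColumnDataSource alternative mentioned above with made-up column names lon and lat:

```python
import re

# Matches @{column name} (group 1) or @column (group 2); $-variables are not matched.
_FIELD_RE = re.compile(r"@\{([^}]+)\}|@(\w+)")


def tooltip_fields(tooltips):
    """Column names referenced by a list of (label, template) tooltip pairs."""
    fields = []
    for _label, template in tooltips:
        for braced, bare in _FIELD_RE.findall(template):
            fields.append(braced or bare)
    return fields


def missing_fields(tooltips, columns):
    """Referenced columns absent from `columns` (Bokeh displays these as ???)."""
    available = set(columns)
    return [name for name in tooltip_fields(tooltips) if name not in available]


def make_plot():
    # Bokeh is imported here so the helpers above work without it.
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    # Explicit source with column names of our own choosing (hypothetical data).
    source = ColumnDataSource(data=dict(lon=[1, 3, 4, 5, 8], lat=[8, 7, 3, 1, 10]))
    tooltips = [("Lon", "@lon"), ("Lat", "@lat")]
    missing = missing_fields(tooltips, source.column_names)
    if missing:
        raise ValueError(f"tooltip refers to unknown columns: {missing}")
    p = figure(tools=[HoverTool(tooltips=tooltips), "wheel_zoom", "save"])
    p.scatter(x="lon", y="lat", size=12, color="navy", source=source)
    return p
```

For the question's tooltips, missing_fields([("X", "@X"), ("Y", "@Y")], ["x", "y"]) reports both upper-case names, which is exactly the case mismatch described above.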

Bokeh Position Legend outside plot area for stacked vbar

I have a stacked vbar chart in Bokeh, a simplified version of which can be reproduced with:
from bokeh.plotting import figure
from bokeh.io import show
months = ['JAN', 'FEB', 'MAR']
categories = ["cat1", "cat2", "cat3"]
data = {"month" : months,
"cat1" : [1, 4, 12],
"cat2" : [2, 5, 3],
"cat3" : [5, 6, 1]}
colors = ["#c9d9d3", "#718dbf", "#e84d60"]
p = figure(x_range=months, plot_height=250, title="Categories by month",
toolbar_location=None)
p.vbar_stack(categories, x='month', width=0.9, color=colors, source=data)
show(p)
I want to add a legend to the chart, but my real chart has a lot of categories in the stacks and therefore the legend would be very large, so I want it to be outside the plot area to the right.
There's an SO answer here which explains how to add a legend outside the plot area, but in that example each rendered glyph is assigned to a variable, which is then labelled and added to a Legend object. I understand how to do that, but I believe the vbar_stack method creates multiple glyphs in a single call, so I don't know how to label these and add them to a separate Legend object placed outside the chart area.
Alternatively, is there a simpler way to use the legend argument when calling vbar_stack and then place the legend outside the chart area?
Any help much appreciated.
For anyone interested, I have now fixed this by simply indexing into the list of renderers returned by vbar_stack. Solution below:
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.models import Legend
months = ['JAN', 'FEB', 'MAR']
categories = ["cat1", "cat2", "cat3"]
data = {"month" : months,
"cat1" : [1, 4, 12],
"cat2" : [2, 5, 3],
"cat3" : [5, 6, 1]}
colors = ["#c9d9d3", "#718dbf", "#e84d60"]
p = figure(x_range=months, plot_height=250, title="Categories by month",
toolbar_location=None)
v = p.vbar_stack(categories, x='month', width=0.9, color=colors, source=data)
legend = Legend(items=[
("cat1", [v[0]]),
("cat2", [v[1]]),
("cat3", [v[2]]),
], location=(0, -30))
p.add_layout(legend, 'right')
show(p)
Thanks, Toby Petty, for your answer.
I have slightly improved your code so that it automatically grabs the categories from the source data and assigns colors. I thought this might be handy, as the categories are often not explicitly stored in a variable and have to be taken from the data.
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.models import Legend
from bokeh.palettes import brewer
months = ['JAN', 'FEB', 'MAR']
data = {"month" : months,
"cat1" : [1, 4, 12],
"cat2" : [2, 5, 3],
"cat3" : [5, 6, 1],
"cat4" : [8, 2, 1],
"cat5" : [1, 1, 3]}
categories = list(data.keys())
categories.remove('month')
colors = brewer['YlGnBu'][len(categories)]
p = figure(x_range=months, plot_height=250, title="Categories by month",
toolbar_location=None)
v = p.vbar_stack(categories, x='month', width=0.9, color=colors, source=data)
legend = Legend(items=[(x, [v[i]]) for i, x in enumerate(categories)], location=(0, -30))
p.add_layout(legend, 'right')
show(p)
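One thing to watch with the brewer lookup above: as far as I know, the ColorBrewer palettes in bokeh.palettes.brewer only come in sizes 3 to 9, so brewer['YlGnBu'][len(categories)] raises a KeyError for fewer than 3 or more than 9 categories, which is exactly the "lot of categories" case from the question. A sketch that keeps the same approach but falls back to viridis(n), a palette generator in bokeh.palettes, outside that range. The helper names are hypothetical:

```python
def stack_categories(data, x_col):
    """All columns except the x column, in the dict's insertion order."""
    return [key for key in data if key != x_col]


def legend_items(categories, renderers):
    """(label, [renderer]) pairs matching stack labels to vbar_stack's renderers."""
    return [(label, [renderer]) for label, renderer in zip(categories, renderers)]


def make_plot(data, x_col="month"):
    # Bokeh is imported here so the two helpers above work without it.
    from bokeh.models import Legend
    from bokeh.palettes import brewer, viridis
    from bokeh.plotting import figure

    categories = stack_categories(data, x_col)
    n = len(categories)
    # brewer palettes exist only for 3..9 colors; otherwise use a generated palette.
    colors = list(brewer["YlGnBu"][n]) if 3 <= n <= 9 else list(viridis(n))
    p = figure(x_range=data[x_col], height=250, title="Categories by month",
               toolbar_location=None)
    renderers = p.vbar_stack(categories, x=x_col, width=0.9, color=colors, source=data)
    p.add_layout(Legend(items=legend_items(categories, renderers)), "right")
    return p
```

Passing the data dict from the answer above, show(make_plot(data)) should give the same chart, and it keeps working once the number of categories grows past nine.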

Change label ordering in bokeh heatmap

From the bokeh examples
from bokeh.charts import HeatMap, output_file, show
data = {'fruit': ['apples']*3 + ['bananas']*3 + ['pears']*3,
'fruit_count': [4, 5, 8, 1, 2, 4, 6, 5, 4],
'sample': [1, 2, 3]*3}
hm = HeatMap(data, x='fruit', y='sample', values='fruit_count',
title='Fruits', stat=None)
show(hm)
Is there a workaround for changing the order in which the labels are displayed? For example, what if I wanted to show pears first?
First, you should not use bokeh.charts. It was deprecated, and has been removed from core Bokeh to a separate bkcharts repo. It is completely unsupported and unmaintained. It will not see any new features, bugfixes, improvements, or documentation. It is a dead end.
There are two good options to create this chart:
1) Use the stable and well-supported bokeh.plotting API. This is slightly more verbose, but gives you explicit control over everything, e.g. the order of the categories. In the code below these are specified as x_range and y_range values to figure:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.palettes import Spectral9
from bokeh.plotting import figure
from bokeh.transform import transform
source = ColumnDataSource(data={
'fruit': ['apples']*3 + ['bananas']*3 + ['pears']*3,
'fruit_count': [4, 5, 8, 1, 2, 4, 6, 5, 4],
'sample': ['1', '2', '3']*3,
})
mapper = LinearColorMapper(palette=Spectral9, low=0, high=8)
p = figure(x_range=['apples', 'bananas', 'pears'], y_range=['1', '2', '3'],
title='Fruits')
p.rect(x='fruit', y='sample', width=1, height=1, line_color=None,
fill_color=transform('fruit_count', mapper), source=source)
show(p)
This yields the output below:
You can find much more information (as well as live examples) about categorical data with Bokeh in the Handling Categorical Data sections of the User's Guide.
2) Look into HoloViews, which is a very high level API on top of Bokeh that is actively maintained by a team, and endorsed by the Bokeh team as well. A simple HeatMap in HoloViews is typically a one-liner as with bokeh.charts.
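With option 1, the display order is simply the order of the lists passed as x_range and y_range, so showing pears first means passing ['pears', 'apples', 'bananas']. A minimal sketch that derives such a list from the data instead of typing it out; ordered_factors and make_heatmap are hypothetical names. Categorical y ranges are drawn from the bottom up, so the sketch also reverses the sample list to put sample 1 at the top:

```python
def ordered_factors(values, first=()):
    """Unique values in first-seen order, with any items from `first` moved to the front."""
    unique = list(dict.fromkeys(values))
    front = [v for v in first if v in unique]
    return front + [v for v in unique if v not in front]


def make_heatmap(first=("pears",)):
    # Bokeh is imported here so ordered_factors() works without it.
    from bokeh.models import ColumnDataSource, LinearColorMapper
    from bokeh.palettes import Spectral9
    from bokeh.plotting import figure
    from bokeh.transform import transform

    data = {
        "fruit": ["apples"] * 3 + ["bananas"] * 3 + ["pears"] * 3,
        "fruit_count": [4, 5, 8, 1, 2, 4, 6, 5, 4],
        "sample": ["1", "2", "3"] * 3,
    }
    mapper = LinearColorMapper(palette=Spectral9, low=0, high=8)
    p = figure(x_range=ordered_factors(data["fruit"], first),
               # categorical y factors run bottom to top, so reverse to put "1" on top
               y_range=list(reversed(ordered_factors(data["sample"]))),
               title="Fruits")
    p.rect(x="fruit", y="sample", width=1, height=1, line_color=None,
           fill_color=transform("fruit_count", mapper),
           source=ColumnDataSource(data=data))
    return p
```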

stacked Bar charts in Bokeh

I am drawing bar charts with Bokeh (http://docs.bokeh.org/en/latest/docs/user_guide.html). It is an amazing tool, but I think it is still a little immature. I have a stacked bar chart with 30 categories on the x axis and 40 classes for each category. I can't find a function that lets me change the colors (right now the colors are very hard to tell apart) and align the legend to the top. Alternatively, an information box that opens when someone hovers over a color would be helpful, but I have very little idea whether that can be done.
http://docs.bokeh.org/en/latest/docs/user_guide/charts.html#bar
My example is similar to this one except that I have many variables.
Any suggestions?
UPDATE:
I tried the solution below myself, but there seems to be a problem with Bar(): it is not recognized.
import bokeh.plotting as bp
data24 =OrderedDict()
for i in range(10):
data24[i] = np.random.randint(2, size=10)
figut = bp.figure(tools="reset, hover")
s1 = figut.Bar(data24, stacked= True,color=colors )
s1.select(dict(type=HoverTool)).tooltips = {"x":"$index"}
Running it I get:
AttributeError: 'Figure' object has no attribute 'Bar'
Here are the bar colors that I am getting. There is no way to distinguish between colors.
I've had a dig in the bokeh source code, and it seems that the bokeh.charts.Bar method will accept some keyword arguments. These can be properties of the Builder class, which includes the palette property, defined here. You should therefore be able to pass this as an argument to Bar.
Bar(...,palette=['red','green','blue'],...)
Just tested this out by modifying the example that bokeh provides:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.olympics2014 import data
df = pd.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort_values("medals.total", ascending=False)
# get the countries and we group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict(bronze=bronze, silver=silver, gold=gold)
output_file("stacked_bar.html")
bar = Bar(
medals, countries, title="Stacked bars", stacked=True,
palette=['brown', 'silver', 'gold'])
show(bar)
Both the original question and the other answer are very out of date. The bokeh.charts API was deprecated and removed years ago. For stacked bar charts in modern Bokeh, see the section on Handling Categorical Data.
Here is a complete example:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("stacked.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
colors = ["#c9d9d3", "#718dbf", "#e84d60"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=data,
legend=[value(x) for x in years])
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
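The question also asked for distinguishable colors and a hover box per class. vbar_stack gives each renderer it creates the name of its stacker, so a single tooltip can use $name (the hovered renderer's name) and @$name (the value in the column with that name). A sketch along the lines of the example above, assuming Bokeh 1.4 or later for legend_label= and using a generated viridis palette so the number of stacks is not capped by a fixed-size palette. stack_totals and make_plot are hypothetical helpers:

```python
FRUITS = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
YEARS = ["2015", "2016", "2017"]
DATA = {'fruits': FRUITS,
        '2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 4, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}


def stack_totals(data, stackers):
    """Total height of each bar, i.e. the sum of all its stacked segments."""
    return [sum(values) for values in zip(*(data[s] for s in stackers))]


def make_plot(data, stackers, x_col):
    # Bokeh is imported here so stack_totals() works without it.
    from bokeh.palettes import viridis
    from bokeh.plotting import figure

    # $name is the hovered renderer's name (set to its stacker by vbar_stack);
    # @$name looks up the column with that name, i.e. the segment's value.
    tooltips = f"$name @{x_col}: @$name"
    p = figure(x_range=data[x_col], height=250, title="Fruit Counts by Year",
               toolbar_location=None, tools="hover", tooltips=tooltips)
    p.vbar_stack(stackers, x=x_col, width=0.9, color=list(viridis(len(stackers))),
                 source=data, legend_label=list(stackers))
    p.y_range.start = 0
    # Leave headroom above the tallest bar for the horizontal legend.
    p.y_range.end = max(stack_totals(data, stackers)) * 1.3
    p.legend.location = "top_left"
    p.legend.orientation = "horizontal"
    return p
```

show(make_plot(DATA, YEARS, "fruits")) renders it. With 40 classes, neighbouring viridis shades become hard to tell apart, which is where the hover tooltip helps.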
