Python Bokeh - blending - python

Note from maintainers: This question concerns the obsolete bokeh.charts API, removed years ago. For information on creating all kinds of Bar charts with modern Bokeh, see:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
OBSOLETE:
I am trying to create a bar chart from a dataframe df in Python Bokeh library. The data I have simply looks like:
value datetime
5 01-01-2015
7 02-01-2015
6 03-01-2015
... ... (for 3 years)
I would like to have a bar chart that shows 3 bars per month:
one bar for the MEAN of 'value' for the month
one bar for the MAX of 'value' for the month
one bar for the MIN of 'value' for the month
I am able to create a bar chart for any one of MEAN/MAX/MIN with:
from bokeh.charts import Bar, output_file, show
p = Bar(df, 'datetime', values='value', title='mybargraph',
agg='mean', legend=None)
output_file('test.html')
show(p)
How could I have the 3 bars (mean, max, min) on the same plot? And, if possible, stacked above each other.
It looks like blend could help me, as in this example:
http://docs.bokeh.org/en/latest/docs/gallery/stacked_bar_chart.html
but I cannot find a detailed explanation of how it works. The Bokeh website is amazing, but this particular item is not documented in detail.

That blend example put me on the right track.
import pandas as pd
from pandas import Series
from dateutil.parser import parse
from bokeh.plotting import figure
from bokeh.layouts import row
from bokeh.charts import Bar, output_file, show
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
output_file("datestats.html")
Just some sample data, feel free to alter it as you see fit.
First I had to wrangle the data into a proper format.
# Sample data
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
"01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
"01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"
]
It looked like your date format was "day-month-year" - I used the dateutil.parser so pandas would recognize it properly.
# Parse the date strings as day-first datetime objects
days = [parse(x, dayfirst=True) for x in dates]
You also needed it grouped by month - I used pandas resample to downsample the dates, get the appropriate values for each month, and merge into a dataframe.
# Put data into dataframe broken into min, mean, and max values for each month
ts = Series(vals, index=days)
firstmerge = pd.merge(ts.resample('M').min().to_frame(name="min"),
ts.resample('M').mean().to_frame(name="mean"),
left_index=True, right_index=True)
frame = pd.merge(firstmerge, ts.resample('M').max().to_frame(name="max"),
left_index=True, right_index=True)
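As an aside, the two merges can probably be collapsed into a single aggregation. This is only a sketch under a few assumptions: it uses pd.to_datetime with an explicit day-first format instead of dateutil, and the "MS" (month start) resample alias instead of 'M', so the index is labelled with the first day of each month rather than the last. The 'Month' label column comes out the same either way.

```python
import pandas as pd

# Same sample data as above
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
         "01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
         "01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"]

# Parse day-month-year strings explicitly
index = pd.to_datetime(dates, format="%d-%m-%Y")
ts = pd.Series(vals, index=index)

# One resample call yields the min, mean, and max columns at once
frame = ts.resample("MS").agg(["min", "mean", "max"])

# String labels for the months, as in the answer
frame["Month"] = frame.index.strftime("%m-%Y")
print(frame)
```

The resulting frame has one row per month with the columns min, mean, max, and Month, which is the shape the charting code below expects.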
Bokeh allows you to use the pandas dataframe's index as the chart's x values, but it didn't like the datetime values, so I added a new column for date labels. See the timeseries comment below (***).
# You can use DataFrame index for bokeh x values but it doesn't like timestamp
frame['Month'] = frame.index.strftime('%m-%Y')
Finally we get to the charting part. Just like the Olympic medal example, we pass some arguments to Bar.
Play with these however you like, but note that I added the legend by building it outside of the chart altogether. If you have a lot of data points it gets very messy on the chart the way it's built here.
# Main object to render with stacking
bar = Bar(frame,
values=blend('min', 'mean', 'max',
name='values', labels_name='stats'),
label=cat(columns='Month', sort=False),
stack=cat(columns='values', sort=False),
color=color(columns='values',
palette=['SaddleBrown', 'Silver', 'Goldenrod'],
sort=True),
legend=None,
title="Statistical Values Grouped by Month",
tooltips=[('Value', '#values')]
)
# Legend info (displayed as separate chart using bokeh.layouts' row)
factors = ["min", "mean", "max"]
x = [0] * len(factors)
y = factors
pal = ['SaddleBrown', 'Silver', 'Goldenrod']
p = figure(width=100, toolbar_location=None, y_range=factors)
p.rect(x, y, color=pal, width=10, height=1)
p.xaxis.major_label_text_color = None
p.xaxis.major_tick_line_color = None
p.xaxis.minor_tick_line_color = None
# Display chart
show(row(bar, p))
If you copy and paste this code, this is the chart it will show.
If you render it yourself or serve it, hover over each block to see the tooltips (values).
I didn't abstract everything I could (colors come to mind).
This is the type of chart you asked for, but a different chart style would probably display the data more informatively, since stacked totals (min + mean + max) don't provide meaningful information. But I don't know what your data really are.
***You might consider a timeseries chart. This could remove some of the data wrangling done before plotting.
You might also consider grouping your bars instead of stacking them. That way you could easily visualize each month's numbers.
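Since bokeh.charts no longer exists, here is a rough sketch of the grouped-bar suggestion using the bokeh.plotting interface instead. It assumes a reasonably recent Bokeh release (one where vbar accepts legend_label), and it uses a small hand-written frame with hypothetical values matching the sample data's monthly min, mean, and max rather than the wrangling code above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.transform import dodge

# Hypothetical monthly statistics, shaped like the answer's `frame`
frame = pd.DataFrame({
    "Month": ["01-2015", "02-2015", "03-2015"],
    "min": [1, 5, 9],
    "mean": [2.5, 6.5, 10.5],
    "max": [4, 8, 12],
})

source = ColumnDataSource(frame)
months = list(frame["Month"])

# A categorical x axis with one slot per month
p = figure(x_range=months, title="Statistical Values Grouped by Month",
           toolbar_location=None)

# One vbar glyph per statistic, shifted sideways within each month by dodge()
offsets = {"min": -0.25, "mean": 0.0, "max": 0.25}
colors = {"min": "saddlebrown", "mean": "silver", "max": "goldenrod"}
for stat, offset in offsets.items():
    p.vbar(x=dodge("Month", offset, range=p.x_range), top=stat, width=0.2,
           source=source, color=colors[stat], legend_label=stat)

p.y_range.start = 0
# show(p)  # uncomment to open the chart in a browser
```

Grouping avoids the meaningless stacked totals: each month shows three bars side by side, and the built-in legend replaces the separate legend figure.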

Related

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With a histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count, but I don't get smooth bars, and the count appears for each case composing a bar rather than once for the whole bar.
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do any binning before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for all subcategories of your data. Since we know that px.bar adds data and annotations in the order in which the source is organized, we can simply set the text on the last subcategory applied using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
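For what it's worth, the loop over the transposed frame computes one total per x category, so the data munging can probably be shortened. A minimal sketch, assuming the same example frame as above (long_df is just a name picked here for the melted frame):

```python
import pandas as pd

# Same example data as in the answer, indexed by x
df = pd.DataFrame({'x': ['a', 'b', 'c', 'd'],
                   'y1': [1, 4, 9, 16],
                   'y2': [1, 4, 9, 16],
                   'y3': [6, 8, 4.5, 8]}).set_index('x')

# Row-wise totals: one value per x category (equivalent to the loop over df.T)
sums = df.sum(axis=1).tolist()

# Wide-to-long reshape for plotly express; melt's default output
# column names are 'variable' and 'value'
long_df = df.reset_index().melt(id_vars='x')

print(sums)
```

The sums list can then be assigned to fig.data[-1].text exactly as in the complete example.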
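If your first graph is built with the graph_objects library, you can try:
# Use textposition='auto' for direct text
# Note: go.Bar has no color or barmode arguments; barmode belongs to the layout,
# and coloring by gender would need one go.Bar trace per gender.
counts = val.iloc[:, 0]
fig = go.Figure(data=[go.Bar(x=val.index.get_level_values(0),
                             y=counts, text=counts, textposition='auto')])
fig.update_layout(barmode="group")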

Adding legend to layered chart in altair

Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend showing that the blue line is the maximum temperature and the red line is the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that Altair only adds a legend if the encoding sets color, size, or a similar channel, usually from a categorical column. That is not possible here, because I'm plotting a whole column and the label should be the column name (which is currently shown in the y-axis label).
I would do a fold transform such that the variables could be encoded correctly.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
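To make the fold step less magical: it reshapes the data from wide to long, so that the column name becomes a categorical field that color can encode. Here is a sketch of the same reshaping done in pandas with melt, on a made-up two-row frame (the values are hypothetical, only the column names follow the dataset used above):

```python
import pandas as pd

# Hypothetical two-day sample with the same column names as the weather data
wide = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "temp_max": [12.8, 10.6],
    "temp_min": [5.0, 2.8],
})

# Equivalent of transform_fold(['temp_max', 'temp_min'], as_=['variable', 'value'])
long = wide.melt(id_vars="date", value_vars=["temp_max", "temp_min"],
                 var_name="variable", value_name="value")

print(long)
```

Each original row turns into two rows, one per folded column, and the 'variable' field is what ends up in the legend.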
If you layer two charts that have the same columns and tell both to color by the same column, the legend will appear. I don't know if this helps, but:
For example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
and I charted both with color='Type' and called alt.layer(chart1, chart2), and it showed me a proper legend.

Change the order of bars in a grouped barplot with hvplot/holoviews

I try to create a grouped bar plot but can't figure out how to influence the order of the barplot.
Given these example data:
import pandas as pd
import hvplot.pandas
df = pd.DataFrame({
"lu": [200, 100, 10],
"le": [220, 80, 130],
"la": [60, 20, 15],
"group": [1, 2, 2],
})
df = df.groupby("group").sum()
I'd like to create a horizontal grouped bar plot showing the two groups 1 and 2 with all three columns. The columns should appear in the order of "le", "la" and "lu".
Naturally I'd try this with Hvplot:
df.hvplot.barh(x = "group", y = ["le", "la", "lu"])
With that I get the result below:
Hvplot does not seem to care about the order in which I add the columns (calling df.hvplot.barh(x = "group", y = ["lu", "le", "la"]) doesn't change anything). Nor does Hvplot seem to care about the original order in the dataframe.
Are there any options to influence the order of the bars?
For normal bar charts, you can just order your data in the way you want it to be plotted.
However, for grouped bar charts you can't set the order yet.
But development of this feature is on the way and probably available in one of the next releases: https://github.com/holoviz/holoviews/issues/3799
Current solutions with Hvplot 0.5.2 and Holoviews 1.12:
1) If you're using a Bokeh backend, you can use keyword hooks:
from itertools import product
# define hook function to set order on bokeh plot
def set_grouped_barplot_order(plot, element):
    # define your categorical ordering in a list of tuples
    factors = product(['2', '1'], ['le', 'la', 'lu'])
    # since you're using a horizontal bar plot, set the order on y_range.factors
    # for a normal (vertical) bar plot you would use x_range.factors
    plot.state.y_range.factors = [*factors]
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# apply your special ordering function
group_plot.opts(hooks=[set_grouped_barplot_order], backend='bokeh')
Hooks allow you to apply specific bokeh settings to your plots. You don't need hooks very often, but they are very handy in this case.
Documentation:
http://holoviews.org/user_guide/Customizing_Plots.html#Plot-hooks
https://holoviews.org/FAQ.html
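In case the factors line in the hook is unclear: product builds every (group, column) pair in the order given, and that list of tuples is what gets assigned to y_range.factors. A quick standalone look at it, using only the standard library:

```python
from itertools import product

# Outer level: group order; inner level: column order within each group
factors = list(product(['2', '1'], ['le', 'la', 'lu']))

for factor in factors:
    print(factor)
```

So the first list controls the order of the groups and the second list controls the order of the bars inside each group.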
2) Another solution would be to convert your Holoviews plot to an actual Bokeh plot and then set the ordering:
from itertools import product
import holoviews as hv
from bokeh.plotting import show
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# render your holoviews plot as a bokeh plot
my_bokeh_plot = hv.render(group_plot, backend='bokeh')
# set the custom ordering on your bokeh plot
factors = product(['2', '1'], ['le', 'la', 'lu'])
my_bokeh_plot.y_range.factors = [*factors]
show(my_bokeh_plot)
Personally I prefer the first solution because it stays within Holoviews.
Resulting plot:
This has just been fixed in HoloViews 1.13.
You can sort your barplot just like you wanted:
df.hvplot.barh(x="group", y=["lu", "la", "le"])
As I write this, HoloViews 1.13 is not officially available yet, but you can install it through:
pip install git+https://github.com/holoviz/holoviews.git
If you want even more control over the order, you can use .redim.values() on your grouped_barplot:
group_specific_order = [2, 1]
variable_specific_order = ['lu', 'la', 'le']
# Note that group and Variable are the variable names of your dimensions here
# when you use this on a different grouped barchart, then please change to the
# names of your own dimensions.
your_grouped_barplot.redim.values(
group=group_specific_order,
Variable=variable_specific_order,
)

How to plot pandas grouped values using pygal?

I have a csv like this:
name,version,color
AA,"version 1",yellow
BB,"version 2",black
CC,"version 3",yellow
DD,"version 1",black
AA,"version 1",green
BB,"version 2",green
FF,"version 3",green
GG,"version 3",red
BB,"version 3",yellow
BB,"version 2",red
BB,"version 1",black
I would like to draw a bar chart, which shows versions on x axis and an amount (number) of different colors on y axis.
So I want to group DataFrame by version, check which colors belong to a particular version, count colors and display the results on the pygal bar chart.
It should look similar to this:
What I tried so far:
df = pd.read_csv(results)
new_df = df.groupby('version')['color'].value_counts()
bar_chart = pygal.Bar(width=1000, height=600,
legend_at_bottom=True, human_readable=True,
title='versions vs colors',
x_title='Version',
y_title='Number')
versions = []
for index, row in new_df.iteritems():
    versions.append(index[0])
    bar_chart.add(index[1], row)
bar_chart.x_labels = map(str, versions)
bar_chart.render_to_file('bar-chart.svg')
Unfortunately, it does not work: it cannot match each group of colors to the proper version.
I also tried using matplotlib.pyplot and it works like a charm:
pd.crosstab(df['version'],df['color']).plot.bar(ax=ax)
plt.draw()
This works as well:
df.groupby(['version','color']).size().unstack(fill_value=0).plot.bar()
But the generated chart is not accurate enough for me. I would like to have a pygal chart.
I also checked:
How to plot pandas groupby values in a graph?
How to plot a pandas dataframe?
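One possible way to get the data into the shape a grouped bar chart needs, sketched here with pandas only: build the version-by-color count table with crosstab (as in the matplotlib attempt above), then turn each color column into one list of counts per version. The names x_labels and series are picked for this sketch, and the CSV is inlined through io.StringIO so the snippet stands alone.

```python
import io
import pandas as pd

csv_text = '''name,version,color
AA,"version 1",yellow
BB,"version 2",black
CC,"version 3",yellow
DD,"version 1",black
AA,"version 1",green
BB,"version 2",green
FF,"version 3",green
GG,"version 3",red
BB,"version 3",yellow
BB,"version 2",red
BB,"version 1",black
'''

df = pd.read_csv(io.StringIO(csv_text))

# Rows are versions, columns are colors, cells are counts (0 where absent)
counts = pd.crosstab(df["version"], df["color"])

# One label per version for the x axis
x_labels = list(counts.index)

# One list of counts per color, aligned with x_labels
series = {color: counts[color].tolist() for color in counts.columns}

print(x_labels)
print(series)
```

Presumably each entry could then be passed to the chart the same way the attempt above does, with bar_chart.x_labels = x_labels and one bar_chart.add(color, values) call per color, so that every series has exactly one value per version.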

Parsing CSV file using Pandas

I have been using matplotlib for quite some time now and it is great; however, I want to switch to pandas, and my first attempt at it didn't go so well.
My data set looks like this:
sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
etc.....
First, I want to search the csv for anything with the name mike and put it in its own list. With these lists I want to be able to do some math, for example add sam[3] + winter[4] or sam[1]/10. The last part would be to plot the columns against each other.
Going through this page
http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table
The only thing I see is if I have a column header, however, I don't have any headers. I only know the position in a row of the values I want.
So my question is:
How do I create a list for each name (sam, winter, summer)?
Is this method efficient if my csv has millions of data points?
Could I use matplotlib to plot a pandas dataframe?
ie :
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike[1], winter[3], label='Mike vs Winter speed', color = 'red')
You can read a csv without headers:
data=pd.read_csv(filepath, header=None)
Columns will be numbered starting from 0.
Selecting and filtering:
all_summers = data[data[0]=='summer']
If you want to do some operations grouping by the first column, it will look like this:
data.groupby(0).sum()
data.groupby(0).count()
...
Selecting a row after grouping:
sums = data.groupby(0).sum()
sums.loc['sam']
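Putting the pieces above together on the sample rows from the question, as a small self-contained sketch (the data is inlined through io.StringIO in place of a file path, and the variable names after sums are just illustrative):

```python
import io
import pandas as pd

raw = """sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
"""

# No header row: columns are numbered 0, 1, 2, ...
data = pd.read_csv(io.StringIO(raw), header=None)

# Filter rows by the name in column 0
all_summers = data[data[0] == "summer"]

# Per-name totals of every numeric column
sums = data.groupby(0).sum()

# Math across names and columns, addressed by (name, column number)
sam_col1_total = sums.loc["sam", 1]                      # 123 + 112
mixed = sums.loc["sam", 3] + sums.loc["winter", 4]       # like sam[3] + winter[4]
scaled = sums.loc["sam", 1] / 10                         # like sam[1] / 10

print(all_summers)
print(sums)
```

Whether you want sums, means, or the raw rows for each name depends on the math you have in mind; groupby(0) supports mean(), count(), and others in the same way.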
Plotting example:
sums.plot()
import matplotlib.pyplot as plt
plt.show()
For more details about plotting, see: http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html
df = pd.read_csv(filepath, header=None)
mike = df[df[0]=='mike'].values.tolist()
winter = df[df[0]=='winter'].values.tolist()
Then you can plot those lists as you wanted to above:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike, winter, label='Mike vs Winter speed', color = 'red')
