Plotly: How to display individual value on histogram? - python

I am trying to make dynamic plots with plotly. I want to plot counts of data that have been aggregated (using groupby).
I want to split the plot by color (and maybe even facet it by column). The problem is that I want the count to be displayed on each bar. With a histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count, but I don't get smooth bars, and the count does not appear once for the whole bar but once for each case composing that bar.
Here is my code for the bar plot:
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!

px.histogram does not seem to have a text attribute. So if you're willing to do the binning yourself before producing your plot, I would use px.bar. Normally, you add text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since we know that px.bar adds data and annotations in the order in which the source is organized, we can simply set the text on the last subcategory added using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe (i.e. the total height of each stacked bar)
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()

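If your first graph uses the graph_objects library, you can try:
# Use textposition='auto' to draw each value directly on its bar.
# go.Bar takes no color or barmode argument; barmode is a layout setting.
counts = val.iloc[:, 0]  # val is a one-column DataFrame
fig = go.Figure(data=[go.Bar(x=counts.index.get_level_values(0),
y=counts, text=counts, textposition='auto',
)])
fig.update_layout(barmode="group")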

Related

Python / Seaborn - How to plot the names of each value in a scatterplot

First of all, sorry in case I make any mistakes while writing this; English is not my first language.
I'm a beginner at data visualization with Python. I have a dataframe with 115 rows, and I want to make a scatterplot with 4 quadrants and show the values in quadrant R1.
At the moment this is my scatterplot. It's a football player dataset, so I want to plot the players' names in 'R1'. Is that possible?
You can annotate each point by making a sub-dataframe of just the players in a quadrant that you care about based on their x/y values using plt.annotate. So something like this:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
##### Making a mock dataset #################################################
names = ['one', 'two', 'three', 'four', 'five', 'six']
value_1 = [1, 2, 3, 4, 5, 6]
value_2 = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame(zip(names, value_1, value_2), columns = ['name', 'v_1', 'v_2'])
#############################################################################
plt.rcParams['figure.figsize'] = (10, 5) # sizing parameter to make the graph bigger
ax1 = sns.scatterplot(x = value_1, y = value_2, s = 100) # graph code
# Code to make a subset of data that fits the specific conditions that I want to annotate
quadrant = df[(df.v_1 > 3) & (df.v_2 > 3)].reset_index(drop = True)
# Code to annotate the above "quadrant" on the graph
for x in range(len(quadrant)):
    plt.annotate('Player: {}\nValue 1: {}\nValue 2: {}'.format(quadrant['name'][x], quadrant.v_1[x], quadrant.v_2[x]),
                 (quadrant.v_1[x], quadrant.v_2[x]))
Output graph:
If you're just working in the notebook and don't need to save the image with all the player's names, then using a "hover" feature might be a better idea. Annotating every player's name might become too busy for the graph, so just hovering over the point might work out better for you.
# place this or "%matplotlib notebook" at the top of your notebook;
# it allows you to work with interactive matplotlib graphs inline
%matplotlib widget
import mplcursors # this is the module that allows hovering features on your graph
# using the same dataframe from above
ax1 = sns.scatterplot(x = value_1, y = value_2, s = 100)
cursor = mplcursors.cursor(ax1, hover=2) # add the plot to the hover feature of mplcursors
@cursor.connect("add")
def _(sel):
    sel.annotation.set_text('Player: {}\nValue 1: {}\nValue 2: {}'.format(df['name'][sel.index], sel.target[0], sel.target[1])) # set the text
    # you don't need any of the below but I like to customize
    sel.annotation.get_bbox_patch().set(fc="lightcoral", alpha=1) # set the box color
    sel.annotation.arrow_patch.set(arrowstyle='-|>', connectionstyle='angle3', fc='black', alpha=.5) # set the arrow style
Example outputs from hovering:
You can do two (or more) scatter plots on a single figure.
If I understand correctly what you want to do, you could separate your dataset into two parts:
Points for which you don't want the name to be plotted
Points for which you want the name to be plotted
You can then plot the second data set and display the name.
Without any other details on your problem, it is difficult to do more. You could edit your question and add a minimal example of your data set.
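As a rough illustration of the two-scatter idea, here is a minimal sketch. The mock data, the column names and the "greater than 3" quadrant condition are all assumptions, since the real dataset is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names and values are made up
df = pd.DataFrame({
    "name": ["one", "two", "three", "four", "five", "six"],
    "v_1": [1, 2, 3, 4, 5, 6],
    "v_2": [1, 2, 3, 4, 5, 6],
})

# Split into points to label (the assumed quadrant) and the rest
labelled = df[(df["v_1"] > 3) & (df["v_2"] > 3)]
unlabelled = df.drop(labelled.index)

fig, ax = plt.subplots()
ax.scatter(unlabelled["v_1"], unlabelled["v_2"], color="grey")
ax.scatter(labelled["v_1"], labelled["v_2"], color="tab:red")

# Write the name next to each point of the second dataset only
for row in labelled.itertuples():
    ax.annotate(row.name, (row.v_1, row.v_2),
                xytext=(5, 5), textcoords="offset points")
```

Drawing the two subsets separately also makes it easy to give the labelled points their own color or marker.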

Using Altair to generate an unstacked barplot with an already stacked data

I created a DataFrame with stacked data, i.e., tested is already included in total.
import pandas as pd
import altair as alt
df = pd.DataFrame({
'date': ['2021-01-01', '2021-02-01', '2021-03-01'],
'total': [10, 15, 20],
'tested': [0, 5, 10]
})
dfm = df.melt(id_vars=['date'])
I would like to plot a stacked bar plot with Altair. Since the tested column is already contained in the total column, I would expect a chart with the max values of the total, but the result shows the sum.
alt.Chart(dfm).mark_bar().encode(
x='date:O',
y='value:Q',
color='variable:O'
)
I know I can use pandas to create an untested column and generate the plot using tested and untested columns, but I would like to know if I can achieve this result without transforming the data.
To create an unstacked bar chart you can set stack=False:
alt.Chart(dfm).mark_bar().encode(
x='date:O',
y=alt.Y('value:Q', stack=False),
color='variable:O'
)
Note that it will always show the rightmost column on top (tested in the image above).
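For comparison, the pandas route that the question mentions could look roughly like the sketch below. The untested column name comes from the question's wording; everything else reuses the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-02-01', '2021-03-01'],
    'total': [10, 15, 20],
    'tested': [0, 5, 10],
})

# untested = the part of total that has not been tested
stacked = df.assign(untested=df['total'] - df['tested']).drop(columns='total')
dfm = stacked.melt(id_vars=['date'])

# Sanity check: stacking tested + untested reproduces total for each date
totals = dfm.groupby('date')['value'].sum()
```

With this long-format frame, a regular stacked bar chart tops out at total, at the cost of the extra transformation step that stack=False avoids.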

Adding legend to layered chart in altair

Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend that shows, that the blue line shows the maximum temperature and the red line the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that altair only adds a legend if in the encoding, you set the color or size or so, usually with a categorical column, but that is not possible here because I'm plotting the whole column and the label should be the column name (which is now shown in the y-axis label).
I would do a fold transform such that the variables could be encoded correctly.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
If you layer two charts with the same columns and tell them to color by the same column, the legend will appear. I don't know if this helps, but:
For example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
and I charted both with color='Type' and called alt.layer(chart1, chart2), and it showed me a proper legend.

Change the order of bars in a grouped barplot with hvplot/holoviews

I try to create a grouped bar plot but can't figure out how to influence the order of the barplot.
Given these example data:
import pandas as pd
import hvplot.pandas
df = pd.DataFrame({
"lu": [200, 100, 10],
"le": [220, 80, 130],
"la": [60, 20, 15],
"group": [1, 2, 2],
})
df = df.groupby("group").sum()
I'd like to create a horizontal grouped bar plot showing the two groups 1 and 2 with all three columns. The columns should appear in the order of "le", "la" and "lu".
Naturally I'd try this with Hvplot:
df.hvplot.barh(x = "group", y = ["le", "la", "lu"])
With that I get the result below:
Hvplot does not seem to care about the order in which I add the columns (calling df.hvplot.barh(x = "group", y = ["lu", "le", "la"]) doesn't change anything). Nor does Hvplot seem to care about the original order in the dataframe.
Are there any options to influence the order of the bars?
For normal bar charts, you can just order your data in the way you want it to be plotted.
However, for grouped bar charts you can't set the order yet.
But development of this feature is on the way and probably available in one of the next releases: https://github.com/holoviz/holoviews/issues/3799
Current solutions with Hvplot 0.5.2 and Holoviews 1.12:
1) If you're using a Bokeh backend, you can use keyword hooks:
from itertools import product
# define hook function to set order on bokeh plot
def set_grouped_barplot_order(plot, element):
    # define your categorical ordering in a list of tuples
    factors = product(['2', '1'], ['le', 'la', 'lu'])
    # since you're using a horizontal bar plot, set the order on y_range.factors
    # if you had a normal (vertical) barplot you would use x_range.factors
    plot.state.y_range.factors = [*factors]
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# apply your special ordering function
group_plot.opts(hooks=[set_grouped_barplot_order], backend='bokeh')
Hooks allow you to apply specific bokeh settings to your plots. You don't need hooks very often, but they are very handy in this case.
Documentation:
http://holoviews.org/user_guide/Customizing_Plots.html#Plot-hooks
https://holoviews.org/FAQ.html
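To see what the hook assigns to the factors, it may help to look at what itertools.product generates on its own. This small sketch uses only the standard library; the groups and variables names are just for illustration.

```python
from itertools import product

groups = ['2', '1']               # outer level: the group labels, as strings
variables = ['le', 'la', 'lu']    # inner level: the column names

# Every (group, variable) pair, with the outer level varying slowest
factors = list(product(groups, variables))
print(factors)
# [('2', 'le'), ('2', 'la'), ('2', 'lu'), ('1', 'le'), ('1', 'la'), ('1', 'lu')]
```

Each tuple is one nested category, so reordering either input list changes the order of the generated factors.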
2) Another solution would be to convert your Holoviews plot to an actual Bokeh plot and then set the ordering:
from itertools import product
import holoviews as hv
from bokeh.plotting import show
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# render your holoviews plot as a bokeh plot
my_bokeh_plot = hv.render(group_plot, backend='bokeh')
# set the custom ordering on your bokeh plot
factors = product(['2', '1'], ['le', 'la', 'lu'])
my_bokeh_plot.y_range.factors = [*factors]
show(my_bokeh_plot)
Personally I prefer the first solution because it stays within Holoviews.
Resulting plot:
This has just been fixed in HoloViews 1.13.
You can sort your barplot just like you wanted:
df.hvplot.barh(x="group", y=["lu", "la", "le"])
As I write this, HoloViews 1.13 is not officially available yet, but you can install it through:
pip install git+https://github.com/holoviz/holoviews.git
If you want even more control over the order, you can use .redim.values() on your grouped_barplot:
group_specific_order = [2, 1]
variable_specific_order = ['lu', 'la', 'le']
# Note that group and Variable are the variable names of your dimensions here
# when you use this on a different grouped barchart, then please change to the
# names of your own dimensions.
your_grouped_barplot.redim.values(
group=group_specific_order,
Variable=variable_specific_order,
)

Python Bokeh - blending

Note from maintainers: This question concerns the obsolete bokeh.charts API, removed years ago. For information on creating all kinds of Bar charts with modern Bokeh, see:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
OBSOLETE:
I am trying to create a bar chart from a dataframe df in Python Bokeh library. The data I have simply looks like:
value datetime
5 01-01-2015
7 02-01-2015
6 03-01-2015
... ... (for 3 years)
I would like to have a bar chart that shows 3 bars per month:
one bar for the MEAN of 'value' for the month
one bar for the MAX of 'value' for the month
one bar for the MIN of 'value' for the month
I am able to create one bar chart any of MEAN/MAX/MIN with:
from bokeh.charts import Bar, output_file, show
p = Bar(df, 'datetime', values='value', title='mybargraph',
agg='mean', legend=None)
output_file('test.html')
show(p)
How could I have the 3 bars (mean, max, min) on the same plot? And, if possible, stacked above each other.
It looks like blend could help me (like in this example:
http://docs.bokeh.org/en/latest/docs/gallery/stacked_bar_chart.html)
but I cannot find detailed explanations of how it works. The bokeh website is amazing but for this particular item it is not really detailed.
That blend example put me on the right track.
import pandas as pd
from pandas import Series
from dateutil.parser import parse
from bokeh.plotting import figure
from bokeh.layouts import row
from bokeh.charts import Bar, output_file, show
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
output_file("datestats.html")
Just some sample data, feel free to alter it as you see fit.
First I had to wrangle the data into a proper format.
# Sample data
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
"01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
"01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"
]
It looked like your date format was "day-month-year" - I used the dateutil.parser so pandas would recognize it properly.
# Format data as pandas datetime objects, parsing day-first
days = [parse(x, dayfirst=True) for x in dates]
You also needed it grouped by month - I used pandas resample to downsample the dates, get the appropriate values for each month, and merge into a dataframe.
# Put data into dataframe broken into min, mean, and max values for each month
ts = Series(vals, index=days)
firstmerge = pd.merge(ts.resample('M').min().to_frame(name="min"),
ts.resample('M').mean().to_frame(name="mean"),
left_index=True, right_index=True)
frame = pd.merge(firstmerge, ts.resample('M').max().to_frame(name="max"),
left_index=True, right_index=True)
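The same monthly table can probably be built in one step with a groupby and agg, without the two merges. This is a sketch under the assumption that the dates are day-first strings as in the sample data; it is a swapped-in alternative, not the code the answer used.

```python
import pandas as pd

vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
         "01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
         "01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"]

# Parse the day-first strings explicitly
ts = pd.Series(vals, index=pd.to_datetime(dates, format="%d-%m-%Y"))

# Group by calendar month and compute all three statistics at once
frame = ts.groupby(ts.index.to_period("M")).agg(["min", "mean", "max"])

# String labels for the months, usable as categorical x values
frame["Month"] = frame.index.strftime("%m-%Y")
```

The result has one row per month with min, mean and max columns, which is the shape the chart code expects.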
Bokeh allows you to use the pandas dataframe's index as the chart's x values, but it didn't like the datetime values, so I added a new column for date labels. See the timeseries comment below***.
# You can use DataFrame index for bokeh x values but it doesn't like timestamp
frame['Month'] = frame.index.strftime('%m-%Y')
Finally we get to the charting part. Just like the Olympic medal example, we pass some arguments to Bar.
Play with these however you like, but note that I added the legend by building it outside of the chart altogether. If you have a lot of data points it gets very messy on the chart the way it's built here.
# Main object to render with stacking
bar = Bar(frame,
values=blend('min', 'mean', 'max',
name='values', labels_name='stats'),
label=cat(columns='Month', sort=False),
stack=cat(columns='values', sort=False),
color=color(columns='values',
palette=['SaddleBrown', 'Silver', 'Goldenrod'],
sort=True),
legend=None,
title="Statistical Values Grouped by Month",
tooltips=[('Value', '@values')]
)
# Legend info (displayed as separate chart using bokeh.layouts' row)
factors = ["min", "mean", "max"]
x = [0] * len(factors)
y = factors
pal = ['SaddleBrown', 'Silver', 'Goldenrod']
p = figure(width=100, toolbar_location=None, y_range=factors)
p.rect(x, y, color=pal, width=10, height=1)
p.xaxis.major_label_text_color = None
p.xaxis.major_tick_line_color = None
p.xaxis.minor_tick_line_color = None
# Display chart
show(row(bar, p))
If you copy/paste this code, this is what you will see.
If you render it yourself or if you serve it: hover over each block to see the tooltips (values).
I didn't abstract everything I could (colors come to mind).
This is the type of chart you wanted to build, but it seems like a different chart style would display the data more informatively since stacked totals (min + mean + max) don't provide meaningful information. But I don't know what your data really are.
***You might consider a timeseries chart. This could remove some of the data wrangling done before plotting.
You might also consider grouping your bars instead of stacking them. That way you could easily visualize each month's numbers.
