I have data in a dataframe that I want to plot with a stacked bar plot:
test_df = pd.DataFrame([[1, 5, 1, 'A'], [2, 10, 1, 'B'], [3, 3, 1, 'A']], columns = ('ID', 'Value', 'Bucket', 'Type'))
If I do the plot with Plotly Express, I get bars stacked on each other and correctly ordered (based on the index):
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack')
However, I want to color the data based on Type, hence I go for
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack', color='Type')
This works, except now the ordering is messed up, because all bars are now grouped by Type. I looked through the docs of Plotly Express and couldn't find a way to specify the ordering of the bars independently. Any tips on how to do this?
I found this one here, but the scenario is a bit different and the options mentioned there don't seem to help me:
How to disable plotly express from grouping bars based on color?
Edit: This goes in the right direction, but it uses Plotly graph_objects rather than Plotly Express:
import pandas as pd
import plotly.graph_objects as go
test_df = pd.DataFrame([[1, 5, 1, 'A', 'red'], [2, 10, 1, 'B', 'blue'], [3, 3, 1, 'A', 'red']], columns = ('ID', 'Value', 'Bucket', 'Type', 'Color'))
fig = go.Figure()
fig.add_trace(go.Bar(x=test_df["Bucket"], y=test_df["Value"], marker_color=test_df["Color"]))
Output:
Still, I'd prefer the Express version, because so many things are easier to handle there (Legend, Hover properties etc.).
The only way I can understand your question is that you don't want B to be stacked on top of A, but rather the opposite. If that's the case, then you can get what you want through:
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
Some details:
fig.data = fig.data[::-1] simply reverses the order that the traces appear in fig.data and ultimately in the plotted figure itself. This will however reverse the order of the legend as well. So without fig.layout.legend.traceorder = 'reversed' the result would be:
And so it follows that the complete work-around looks like this:
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
Complete code:
import pandas as pd
import plotly.express as px
test_df = pd.DataFrame([[1, 5, 1, 'A'], [2, 10, 1, 'B'], [3, 3, 1, 'A']], columns = ('ID', 'Value', 'Bucket', 'Type'))
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack', color='Type')
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
fig.show()
Ok, sorry for the long delay on this, but I finally got around to solving this.
My solution is possibly not the most straightforward one, but it does work.
The basic idea is to use graph_objects instead of express and then iterate over the dataframe and add each bar as a separate trace. This way, each trace can get a name that can be grouped in a certain way (which is not possible if adding all bars in a single trace, or at least I could not find a way).
Unfortunately, the ordering of the legend is messed up (if you have more than 2 buckets), and I could not find a way in plotly to sort it. But that's a minor thing.
The main thing that bothers me is that this could've been so much easier if plotly.express allowed for manual ordering of the bars by a certain column.
Maybe I'll submit that as a suggestion.
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = "browser"
test_df = pd.DataFrame(
[[1, 5, 1, 'B'], [3, 3, 1, 'A'], [5, 10, 1, 'B'],
[2, 8, 2, 'B'], [4, 5, 2, 'A'], [6, 3, 2, 'A']],
columns = ('ID', 'Value', 'Bucket', 'Type'))
# add named colors to the dataframe based on type
test_df.loc[test_df['Type'] == 'A', 'Color'] = 'Crimson'
test_df.loc[test_df['Type'] == 'B', 'Color'] = 'ForestGreen'
# ensure that the dataframe is sorted by the values
test_df.sort_values('ID', inplace=True)
fig = go.Figure()
# it's tedious to iterate over each item, but only this way we can ensure that everything is correctly ordered and labelled
# Set up legend_show_dict to check if an item should be shown or not. This should be only done for the first occurrence to avoid duplication.
legend_show_dict = {}
for i, row in test_df.iterrows():
    # show a legend entry only for the first trace of each Type
    if row['Type'] in legend_show_dict:
        legend_show = legend_show_dict[row['Type']]
    else:
        legend_show = True
        legend_show_dict[row['Type']] = False
    fig.add_trace(
        go.Bar(
            x=[row['Bucket']],
            y=[row['Value']],
            marker_color=row['Color'],
            name=row['Type'],
            legendgroup=row['Type'],
            showlegend=legend_show,
            hovertemplate="<br>".join([
                'ID: ' + str(row['ID']),
                'Value: ' + str(row['Value']),
                'Bucket: ' + str(row['Bucket']),
                'Type: ' + row['Type'],
            ])
        ))
fig.update_layout(
xaxis={'categoryorder': 'category ascending', 'title': 'Bucket'},
yaxis={'title': 'Value'},
legend={'traceorder': 'normal'}
)
fig.update_layout(barmode='stack', font_size=20)
fig.show()
This is what it should look like then:
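As a side note, the "first occurrence only" bookkeeping done with legend_show_dict could probably be computed up front with pandas instead. A small sketch, assuming the same test_df; the ShowLegend column name is made up for illustration:

```python
import pandas as pd

test_df = pd.DataFrame(
    [[1, 5, 1, 'B'], [3, 3, 1, 'A'], [5, 10, 1, 'B'],
     [2, 8, 2, 'B'], [4, 5, 2, 'A'], [6, 3, 2, 'A']],
    columns=('ID', 'Value', 'Bucket', 'Type'))
test_df = test_df.sort_values('ID')

# duplicated() is False for the first row of each Type and True afterwards,
# so its negation marks exactly the rows that should get a legend entry
test_df['ShowLegend'] = ~test_df['Type'].duplicated()

show_flags = test_df['ShowLegend'].tolist()
```

Inside the loop, showlegend=row['ShowLegend'] would then replace the dictionary lookup.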
In seaborn, is it possible to group observations based on a column without using the hue argument?
For example, how could I get these two lines to show up in the same colour, but as separate lines?
Code for generating this is below.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(
{
'group': ["group01", "group01", "group02", "group02"],
'x': [1, 2, 3, 5],
'y': [2, 4, 3, 5]
}
)
sns.lineplot(df, x='x', y='y', hue='group')
plt.show()
This is straightforward to do in R's ggplot, by mapping the group variable to group, rather than to colour. For example, see this.
The reason I want to do this is that I want to show multiple overlaid plots all in the same colour. This helps to show variability across different datasets. The different colours that I would get with seaborn's hue are unnecessary and distracting, especially when there would be dozens of them. Here is the sort of plot I want to create:
seaborn.lineplot has a units parameter, which seems to be equivalent to ggplot's group:
units: vector or key in data
Grouping variable identifying sampling units. When used, a separate line will be drawn for each unit with appropriate semantics,
but no legend entry will be added. Useful for showing distribution of
experimental replicates when exact identities are not needed.
sns.lineplot(df, x='x', y='y', units='group')
Output:
Combining units and hue in a more complex example:
df = pd.DataFrame(
{
'group': ["group01", "group01", "group02", "group02", "group01", "group01"],
'group2': ['A', 'A', 'A', 'A', 'B', 'B'],
'x': [1, 2, 3, 5, 2, 4],
'y': [2, 4, 3, 5, 3, 2]
}
)
sns.lineplot(df, x='x', y='y', units='group', hue='group2')
Output:
As per the Plotly website, in a simple line chart one can change the legend entry from the column name to a manually specified string of text. For example, this code results in the following chart:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y = [2, 3, 4, 3]
))
fig = px.line(
df,
x="x",
y="y",
width=800, height=600,
labels={
"y": "Series"
},
)
fig.show()
label changed:
However, when one plots multiple columns to the line chart, this label specification no longer works. There is no error message, but the legend entries are simply not changed. See this example and output:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y1 = [2, 3, 4, 3],
y2 = [2, 4, 6, 8]
))
fig = px.line(
df,
x="x",
y=["y1", "y2"],
width=800, height=600,
labels={
"y1": "Series 1",
"y2": "Series 2"
},
)
fig.show()
legend entries not changed:
Is this a bug, or am I missing something? Any idea how this can be fixed?
In case anybody read my previous post, I did some more digging and found the solution to this issue. At the heart, the labels one sees over on the right in the legend are attributes known as "names" and not "labels". Searching for how to revise those names, I came across another post about this issue with a solution Legend Label Update. Using that information, here is a revised version of your program.
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y1 = [2, 3, 4, 3],
y2 = [2, 4, 6, 8]
))
fig = px.line(df, x="x", y=["y1", "y2"], width=800, height=600)
fig.update_layout(legend_title_text='Variable', xaxis_title="X", yaxis_title="Series")
newnames = {'y1':'Series 1', 'y2': 'Series 2'} # From the other post
fig.for_each_trace(lambda t: t.update(name = newnames[t.name]))
fig.show()
Following is a sample graph.
Try that out to see if that addresses your situation.
Regards.
I get an overview of all distinct values from a data frame with this lambda function:
overview = df.apply(lambda col: col.unique())
Which returns the desired result like that:
ColA [1,2,3,...]
ColB [4,5,6,7,8,9...]
ColC [A,B,C]
... ...
How can I visualize this result using subplots / multiple bar plots?
My first attempt was just throwing the object into the plot method of the dataframe, which apparently does not work. So I tried to create a dataframe out of the object:
overview = {}
for attr, value in overview.iteritems():
    overview[attr] = value
df = pd.DataFrame(overview)
The output is:
ValueError: arrays must all be same length
So I tried using a list instead:
overview = []
for attr, value in obj_overview.iteritems():
    overview.append({attr: value})
df = pd.DataFrame(overview)
But the result is a cross-matrix, which has as many rows as columns and row n refers to column n. Which is wrong, too.
How can I get an overview using multiple bar charts / sub plots showing distinct values of a data frame?
There are in fact two possible goals I'd like to achieve:
There are multiple bar charts, where every chart represents one column in the original dataframe. The x-axis shows all distinct / unique values, the y-axis shows the occurrences of each of those values. This is the nice-to-have option. I know that my current approach cannot cover this. It's based on a similar tool that Alteryx, for example, offers:
This should be possible with my current approach: a single (stacked) bar chart shows all columns, where the x-axis shows every column and each bar contains all distinct values of that column.
Thanks!
Separate Plots via value_counts:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
'ColB': [4, 4, 6, 6, 6],
'ColC': ['A', 'C', 'C', 'E', 'E']})
for col in df:
    # one figure per column: distinct values on the x-axis, their counts on the y-axis
    df[col].value_counts().sort_index().plot(kind='bar', rot=0, ylabel='count')
    plt.show()
ColA
ColB
ColC
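Since the question asks for subplots, the same value_counts idea can presumably be put on one figure by passing an Axes to each plot call. A minimal sketch, assuming the same df; the figsize, the Agg backend and the counts dictionary are only illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})

# one subplot per column, side by side
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 3))
for ax, col in zip(axes, df.columns):
    df[col].value_counts().sort_index().plot(
        kind='bar', rot=0, ax=ax, title=col, ylabel='count')
fig.tight_layout()

# the numbers behind the bars, kept around for inspection
counts = {col: df[col].value_counts().sort_index().to_dict() for col in df.columns}
```

Call plt.show() at the end to display the figure.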
Single Stacked Plot via melt + crosstab:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
'ColB': [4, 4, 6, 6, 6],
'ColC': ['A', 'C', 'C', 'E', 'E']})
overview = df.melt()
overview = pd.crosstab(overview['variable'], overview['value'])
ax = overview.plot(kind='bar', stacked=True, rot=0, ylabel='count')
ax.legend(bbox_to_anchor=(1.2, 1))
plt.tight_layout()
plt.show()
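On the "arrays must all be same length" error from the question: the unique values of each column have different lengths, so they cannot be columns of one DataFrame as plain arrays. If you do want them side by side, wrapping each array in a Series lets pandas pad the shorter ones with NaN. A small sketch with the same df; unique_df is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 2, 4, 4, 5],
                   'ColB': [4, 4, 6, 6, 6],
                   'ColC': ['A', 'C', 'C', 'E', 'E']})

# each Series gets its own 0..n-1 index; pandas aligns on the index
# and fills the missing positions of shorter columns with NaN
unique_df = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df})
```

This is only useful as a lookup table, though; for plotting, the counts from value_counts or crosstab above are the better input.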
This will give you one heatmap for all numerical columns and one for all alphabetical columns, where the colour represents the number of occurrences. It's a different way to plot the info as an alternative.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
col_dict = {
'A': [1,2,3],
'B': [3,4,4,4,5,5,6],
'C': ['A','B','C'],
'D': ['C', 'D', 'D']
}
num_cols = []
num_idx = []
letter_cols = []
letter_idx = []
# split the columns into numerical and alphabetical ones and collect all their values
for col in col_dict:
    if isinstance(col_dict[col][0], int):
        num_cols += col_dict[col]
        num_idx.append(col)
    else:
        letter_cols += col_dict[col]
        letter_idx.append(col)
num_cols = sorted(list(set(num_cols)))
letter_cols = sorted(list(set(letter_cols)))
# one row per original column, one column per distinct value, all counts start at 0
num_df = pd.DataFrame(0, index=num_idx, columns=num_cols)
letter_df = pd.DataFrame(0, index=letter_idx, columns=letter_cols)
# count the occurrences of every value
for col in col_dict:
    if isinstance(col_dict[col][0], int):
        for item in col_dict[col]:
            num_df.loc[col, item] += 1
    else:
        for item in col_dict[col]:
            letter_df.loc[col, item] += 1
print(num_df)
print(letter_df)
plt.set_cmap('inferno')
plt.pcolor(num_df)
plt.yticks(np.arange(0.5, len(num_df.index), 1), num_df.index)
plt.xticks(np.arange(0.5, len(num_df.columns), 1), num_df.columns)
plt.colorbar()
plt.xlabel('Counts')
plt.ylabel('Columns')
plt.title('Numerical occurrences')
plt.figure()
plt.pcolor(letter_df)
plt.yticks(np.arange(0.5, len(letter_df.index), 1), letter_df.index)
plt.xticks(np.arange(0.5, len(letter_df.columns), 1), letter_df.columns)
plt.colorbar()
plt.xlabel('Counts')
plt.ylabel('Columns')
plt.title('Alphabetical occurrences')
plt.show()
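The counting loops could likely be shortened with value_counts. A sketch for the numerical table only, assuming the same col_dict; num_counts is an illustrative name and should hold the same numbers as num_df:

```python
import pandas as pd

col_dict = {
    'A': [1, 2, 3],
    'B': [3, 4, 4, 4, 5, 5, 6],
    'C': ['A', 'B', 'C'],
    'D': ['C', 'D', 'D']
}

# count the values of every numerical column separately
numeric = {name: pd.Series(values).value_counts()
           for name, values in col_dict.items()
           if isinstance(values[0], int)}

# rows = original columns, columns = distinct values, missing combinations = 0
num_counts = (pd.DataFrame(numeric).T
              .fillna(0).astype(int)
              .sort_index(axis=1))
```

The alphabetical table works the same way with the condition inverted, and either result can be passed to plt.pcolor as before.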
I have two dictionaries:
days = {'a':[1,2,3], 'b':[3,4,5]}
vals = {'a':[10,20,30], 'b':[9,16,25]}
Using plotly (ideally plotly express) I would like one line plot with two lines: the first line being days['a'] vs vals['a'] and the second line being days['b'] vs vals['b']. Of course in practice I may have many more potential lines. I am not sure how to pull this off. I'm happy to make a dataframe out of this data but not sure what the best structure is.
Thanks! Apologies for a noob question.
You can try the following:
import plotly.graph_objects as go
# your data
days = {'a':[1,2,3], 'b':[3,4,5]}
vals = {'a':[10,20,30], 'b':[9,16,25]}
# generate a plot for each dictionary key
data = []
for k in days.keys():
    plot = go.Scatter(x=days[k],
                      y=vals[k],
                      mode="lines",
                      name=k  # label for the plot legend
                      )
    data.append(plot)
# create a figure with all plots and display it
fig = go.Figure(data=data)
fig.show()
This gives:
With Plotly Express:
import plotly.express as px
import pandas as pd
days = {'a': [1, 2, 3], 'b': [3, 4, 5]}
vals = {'a': [10, 20, 30], 'b': [9, 16, 25]}
# build DataFrame
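# long form: one row per (day, value) pair, tagged with the key it came from
# (DataFrame.append was removed in pandas 2.0, so the pieces are combined with pd.concat)
df = pd.concat(
    [pd.DataFrame({"days": days[k], "vals": vals[k], "label": k}) for k in days],
    ignore_index=True,
)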
fig = px.line(df, x="days", y="vals", color="label")
fig.show()
The result is the same as above.
I have data as shown below:
So, from this, I need to display the count in each category, broken down by year_month_id. Since I have 12 months, there will be 12 subdivisions, and under each one the count of IDs within each class.
Something like the image below is what I am looking for.
Now, the examples in Bokeh use ColumnDataSource and dictionary mapping, but how do I do this for my dataset?
Can someone please help me with this?
Below is the expected output in tabular and chart format.
I believe the pandas Python package would come in handy for preparing your data for plotting. It's useful for manipulating table-like data structures.
Here is how I went about your problem:
from pandas import DataFrame
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis5
# Your sample data
df = DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1],
'year_month_id': [201612, 201612, 201612, 201612, 201612, 201612, 201612, 201612, 201612, 201701],
'class': ['A', 'D', 'B', 'other', 'other', 'other', 'A', 'other', 'A', 'B']
})
# Get counts of groups of 'class' and fill in 'year_month_id' column
df2 = DataFrame({'count': df.groupby(["year_month_id", "class"]).size()}).reset_index()
df2 now looks like this:
# Create new column to make plotting easier
df2['class-date'] = df2['class'] + "-" + df2['year_month_id'].map(str)
# x and y axes
class_date = df2['class-date'].tolist()
count = df2['count'].tolist()
# Bokeh's mapping of column names and data lists
source = ColumnDataSource(data=dict(class_date=class_date, count=count, color=Viridis5))
# Bokeh's convenience function for creating a Figure object
p = figure(x_range=class_date, y_range=(0, 5), plot_height=350, title="Counts",
toolbar_location=None, tools="")
# Render and show the vbar plot
p.vbar(x='class_date', top='count', width=0.9, color='color', source=source)
show(p)
So the Bokeh plot looks like this:
Of course you can alter it to suit your needs. The first thing I thought of was making the upper bound of y_range depend on the data instead of hard-coding it, so it accommodates other datasets better, though I have not tried it myself.
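A rough sketch of that idea, which I have only reasoned through for the sample data: derive the upper bound from the largest count, and build (month, class) tuples that could serve as nested categories. The names y_top and factors are made up for illustration:

```python
from pandas import DataFrame

df = DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1],
                'year_month_id': [201612, 201612, 201612, 201612, 201612,
                                  201612, 201612, 201612, 201612, 201701],
                'class': ['A', 'D', 'B', 'other', 'other', 'other', 'A', 'other', 'A', 'B']})

df2 = DataFrame({'count': df.groupby(["year_month_id", "class"]).size()}).reset_index()

# leave one unit of headroom above the tallest bar
y_top = int(df2['count'].max()) + 1

# (month, class) pairs, one per bar, grouped by month
factors = [(str(ym), cls) for ym, cls in zip(df2['year_month_id'], df2['class'])]
```

As far as I understand, Bokeh's FactorRange (from bokeh.models) accepts such tuples, so figure(x_range=FactorRange(*factors), y_range=(0, y_top)) with x=factors in the data source should give one group of bars per month, closer to the chart in the question.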