I have a rather simple strip plot with vertical data.
planets = sns.load_dataset("planets")
sns.stripplot(x="method", y="distance", data=planets, size=4, color=".7")
plt.xticks(rotation=45, ha="right")
plt.show()
I want to plot the mean of each x-element (method) as a small horizontal bar similar to what you get with:
sns.boxplot(
x="method",
y="distance",
data=planets,
whis=[50, 50],
showfliers=False,
showbox=False,
showcaps=False
)
But without the vertical lines (with whis=[50,50] just spots) for the first / third quartile and showing mean instead of median. Maybe there is a more elegant solution not involving a Boxplot.
Thanks in advance.
Boxplot objects are defined in matplotlib.pyplot.boxplot
showmeans=True
meanline=True makes a line instead of a marker
meanprops={'color': 'k', 'ls': '-', 'lw': 2} sets the color, style and width of the line.
See matplotlib.lines.Line2D for other line properties.
medianprops={'visible': False} makes the median line not visible
whiskerprops={'visible': False} makes the whisker line not visible
zorder=10 places the line on the top layer
Tested in matplotlib v3.4.2 and seaborn v0.11.1
import seaborn as sns
import matplotlib.pyplot as plt
# load the dataset
planets = sns.load_dataset("planets")
p = sns.stripplot(x="method", y="distance", data=planets, size=4, color=".7")
plt.xticks(rotation=45, ha="right")
p.set(yscale='log')
# plot the mean line
sns.boxplot(showmeans=True,
meanline=True,
meanprops={'color': 'k', 'ls': '-', 'lw': 2},
medianprops={'visible': False},
whiskerprops={'visible': False},
zorder=10,
x="method",
y="distance",
data=planets,
showfliers=False,
showbox=False,
showcaps=False,
ax=p)
plt.show()
Works similarly with a seaborn.swarmplot
Here's a solution using ax.hlines, finding the mean of each method with groupby and drawing the lines in a list comprehension:
import seaborn as sns
import matplotlib.pyplot as plt
# load the dataset
planets = sns.load_dataset("planets")
p = sns.stripplot(x="method", y="distance", data=planets, size=4, color=".7", zorder=1)
plt.xticks(rotation=45, ha="right")
p.set(yscale='log');
df_mean = planets.groupby('method', sort=False)['distance'].mean()
_ = [p.hlines(y, i-.25, i+.25, zorder=2) for i, y in df_mean.reset_index()['distance'].items()]
Output:
Here's another hack that is similar to the boxplot idea but requires less overriding: draw a pointplot but with a confidence interval of width 0, and activate the errorbar "caps" to get a horizontal line with a parametrizable width:
planets = sns.load_dataset("planets")
spec = dict(x="method", y="distance", data=planets)
sns.stripplot(**spec, size=4, color=".7")
sns.pointplot(**spec, join=False, ci=0, capsize=.7, scale=0)
plt.xticks(rotation=45, ha="right")
One downside that is evident here is that bootstrapping gets skipped for groups with a single observation, so you don't get a mean line there. This may or may not be a problem in an actual application.
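To check in advance which groups would be affected, you can count the observations per group. This is only a minimal sketch on a hypothetical toy dataframe (the values are made up and merely stand in for the planets data); the names stats and singletons are illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the planets dataset
df = pd.DataFrame({
    "method": ["Transit", "Transit", "Imaging",
               "Astrometry", "Astrometry", "Astrometry"],
    "distance": [100.0, 300.0, 50.0, 10.0, 20.0, 30.0],
})

# Mean and number of observations per method, in order of first appearance
stats = df.groupby("method", sort=False)["distance"].agg(["mean", "count"])

# Methods with exactly one observation
singletons = stats.index[stats["count"] == 1].tolist()

print(stats)
print(singletons)
```

Any method listed in singletons is one where the approach above would not draw a mean line.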
Another trick would be to do the groupby yourself and then draw a scatterplot with a very wide vertical line marker:
planets = sns.load_dataset("planets")
variables = dict(x="method", y="distance")
sns.stripplot(data=planets, **variables, size=4, color=".7")
sns.scatterplot(
data=planets.groupby("method")["distance"].mean().reset_index(),
**variables, marker="|", s=2, linewidth=25
)
plt.xticks(rotation=45, ha="right")
I have two different dataframes:
df_test1 = pd.DataFrame(
[['<18', 80841], ['18-24', 334725], ['25-44', 698261], ['45-64', 273087], ['65+', 15035]],
columns = ['age_group', 'total_arrests']
)
df_test2 = pd.DataFrame(
[['<18', 33979], ['18-24', 106857], ['25-44', 219324], ['45-64', 80647], ['65+', 4211]],
columns = ['age_group','total_arrests']
)
I created the following plot using matplotlib:
fig, ax = plt.subplots()
ax.bar(df_test1.age_group, df_test1.total_arrests, color = 'seagreen')
ax.bar(df_test2.age_group, df_test2.total_arrests, color = 'lightgreen')
ax.set_xlabel('Age Group')
ax.set_ylabel('Number of Arrests')
ax.set_title('Arrests vs. Felony Arrests by Age Group')
plt.xticks(rotation=0)
plt.legend(['All Arrests', 'Felony Arrests'])
ax.yaxis.set_major_formatter(
    ticker.FuncFormatter(lambda y,p: format(int(y), ','))
)
for i,j in zip(df_test1.age_group, df_test1.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
for i,j in zip(df_test2.age_group, df_test2.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
plt.show()
I was expecting two separate bars per age group, one for each dataframe (df_test1.total_arrests and df_test2.total_arrests), but instead I got a stacked bar chart. How can I get a chart with the bars next to one another, similar to the chart in "Matplotlib plot multiple bars in one graph"? I tried adapting my code to that example but couldn't get it to work.
With only two bars, it's fairly easy. The solution is to align the bars on the "edge" of the tick, one bar is aligned to the left, the other to the right.
Repeat the same logic for proper alignment of the annotations: half of them are left-aligned, the others are right-aligned.
fig, ax = plt.subplots()
ax.bar(df_test1.age_group, df_test1.total_arrests, color = 'seagreen', width=0.4, align='edge')
ax.bar(df_test2.age_group, df_test2.total_arrests, color = 'lightgreen', width=-0.4, align='edge')
ax.set_xlabel('Age Group')
ax.set_ylabel('Number of Arrests')
ax.set_title('Arrests vs. Felony Arrests by Age Group')
plt.xticks(rotation=0)
plt.legend(['All Arrests', 'Felony Arrests'])
ax.yaxis.set_major_formatter(
    matplotlib.ticker.FuncFormatter(lambda y,p: format(int(y), ','))
)
for i,j in zip(df_test1.age_group, df_test1.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
for i,j in zip(df_test2.age_group, df_test2.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j), ha='right')
plt.show()
If you have more than 2 bars, then the situation is more complicated (see the code that you linked above). You'll have an easier time using seaborn, but you have to transform your dataframe a bit:
df = pd.merge(left=df_test1, right=df_test2, on='age_group')
df.columns=['age_group','all_arrests', 'felonies']
df = df.melt(id_vars=['age_group'], var_name='Type', value_name='Number')
fig, ax = plt.subplots()
sns.barplot(y='Number',x='age_group',hue='Type', data=df, hue_order=['felonies','all_arrests'])
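If you would rather stay in plain matplotlib for more than two series, one common approach is to shift each series by a fraction of the tick spacing. The following is a rough sketch, not the code from the linked question; it reuses the numbers from the two dataframes above, and total_width and bar_width are names chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

age_groups = ['<18', '18-24', '25-44', '45-64', '65+']
series = {
    'All Arrests': [80841, 334725, 698261, 273087, 15035],
    'Felony Arrests': [33979, 106857, 219324, 80647, 4211],
}

x = np.arange(len(age_groups))          # one integer position per age group
total_width = 0.8                       # assumed share of each tick slot used by bars
bar_width = total_width / len(series)   # width of a single bar

fig, ax = plt.subplots()
for k, (label, values) in enumerate(series.items()):
    # Centre the whole group of bars on the tick, then shift series k
    offset = (k - (len(series) - 1) / 2) * bar_width
    ax.bar(x + offset, values, width=bar_width, label=label)

ax.set_xticks(x)
ax.set_xticklabels(age_groups)
ax.legend()
```

Adding a third entry to series should need no other change, since the offsets are derived from the number of series. Call plt.show() to display the figure.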
I am using ax.stem to draw a lollipop plot in Python. However, I found it difficult to assign a different color to each lollipop.
As you can see I have 2 categories "GWP" & "FDP".
In my project, each category should be divided into 4 subcategories "ingredient", "Waste", "energy" and "infrastructure". Therefore, I want to assign them different colors to indicate the subcategory.
There is a solution proposed here: https://python-graph-gallery.com/181-custom-lollipop-plot/
But this only teaches you how to change color for all lollipops.
And there is another solution: https://python-graph-gallery.com/183-highlight-a-group-in-lollipop/
But this one doesn't really use ax.stem.
Please let me know how to assign different colors to each lollipop.
(Also, I don't know somehow why my plot is displayed upside down. Also, the y axis does not align in order, and there is one dot not connected by a line. It displays correctly in my original plot though.)
Here is my code:
#%%
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# my dataset
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)']).reshape(5,1)
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02]).reshape(5,1)
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02]).reshape(5,1)
original_data = np.concatenate((types, gwp, fdp), axis = 1)
# produce dataframe
data = pd.DataFrame(original_data, columns = columns)
# types GWP100 (year) FDP (year)
#0 Total (ingredient) per kg 286.982617 135.455867
#1 Total (waste) per kg 216.824983 7.02868322
#2 energy (whole process) per kg 43.892076 12.662256
#3 Infrastructure 0.06024 0.0164568
#4 Total (Total) 547.759916 155.163263
#%% graph
fig = plt.figure(1, figsize =(8,6))
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
ax1.stem(gwp)
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
ax2.stem(fdp)
ax2.set_ylabel(r'kg oil-Eq', fontsize = 10)
ax2.set_xlabel('FDP', fontsize=10)
The stem currently consists of a couple of lines and a "line" consisting of dots on top. It has no option to colorize the lines separately within its interface.
You may replicate the stem plot to draw the lines manually with the color you like.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)'])
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02])
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02])
# produce dataframe
data = pd.DataFrame([types,gwp,fdp], index = columns).transpose()
colors = list("bgryk")
fig, (ax, ax2) = plt.subplots(ncols=2)
for t, y, c in zip(data["types"], data["GWP100 (year)"],colors):
    ax.plot([t,t], [0,y], color=c, marker="o", markevery=(1,2))
ax.set_ylim(0,None)
plt.setp(ax.get_xticklabels(), rotation=90)
fig.tight_layout()
plt.show()
A more efficient solution is of course to use a LineCollection in combination with a scatter plot for the dots.
from matplotlib.collections import LineCollection
fig, (ax, ax2) = plt.subplots(ncols=2)
segs = np.zeros((len(data), 2, 2))
segs[:,:,0] = np.repeat(np.arange(len(data)),2).reshape(len(data),2)
segs[:,1,1] = data["GWP100 (year)"].values
lc = LineCollection(segs, colors=colors)
ax.add_collection(lc)
ax.scatter(np.arange(len(data)), data["GWP100 (year)"].values, c=colors)
ax.set_xticks(np.arange(len(data)))
ax.set_xticklabels(data["types"], rotation=90)
ax.autoscale()
ax.set_ylim(0,None)
fig.tight_layout()
plt.show()
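A shorter variant of the same idea, offered only as a sketch, is to let ax.vlines build the line collection, since it accepts one color per line. The data are the GWP values from the question; the variable stems is just a name chosen here.

```python
import numpy as np
import matplotlib.pyplot as plt

types = ['Total (ingredient) per kg', 'Total (waste) per kg',
         'energy (whole process) per kg', 'Infrastructure', 'Total (Total)']
gwp = np.array([2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
                6.02400000e-02, 5.47759916e+02])
colors = list("bgryk")   # one color per lollipop

x = np.arange(len(types))
fig, ax = plt.subplots()

# One vertical line per value, from 0 up to the value, each with its own color
stems = ax.vlines(x, 0, gwp, colors=colors)
# Matching colored dots on top of the lines
ax.scatter(x, gwp, c=colors, zorder=3)

ax.set_xticks(x)
ax.set_xticklabels(types, rotation=90)
ax.set_ylim(0, None)
fig.tight_layout()
```

Call plt.show() to display the figure.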
I will answer one of your main questions: giving each stem line and its marker the same color, category by category. According to the official docs, there seems to be no direct option to pass a list of colors when calling ax1.stem(). In fact, they say the resulting plot might not be reasonable if one does so. Nevertheless, below is one trick to get things done your way.
The idea is as follows:
Get the objects (stemlines) displayed on the subplot
Get the x-y data of the markers
Loop over the data and change the color of each stemline. Plot each marker individually with the same color as its stemline. colors is a list specifying the colors of your choice.
Following is the relevant part of the code:
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
colors = ['r', 'g', 'b', 'y', 'k']
_, stemlines, _ = ax1.stem(gwp)
line = ax1.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
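# mec and mfc stand for markeredgecolor and markerfacecolor
# note: this indexes stemlines as a list of lines; newer matplotlib versions
# return a single LineCollection, where stemlines.set_color(colors) is needed instead
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])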
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
_, stemlines, _ = ax2.stem(fdp)
line = ax2.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])
I have a dataframe which has a number of values per date (datetime field). These values are classified as U (users) or S (session) using a Group column. Seaborn is used to visualize two boxplots per date, where the hue is set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The boxplot time series for the S group:
The combined boxplot time series using hue. Obviously it's not possible to see any information about the S group because of the scale of the Y axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
tmpU = groups.copy()  # groups is the dataframe plotted in the question
tmpU.loc[tmpU['Group']!='U','n_data'] = np.nan
tmpS = groups.copy()
tmpS.loc[tmpS['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
plt.setp(ax.get_xticklabels(),rotation=30,horizontalalignment='right')
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
In this case the figure turns out to be too dense to be published. Therefore I would adopt a visualization based on subplots, as Parfait suggested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.
Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
    # create the subplot first so that the title lands on the right axes
    plt.subplot(2, 1, i+1)
    plt.title('N_data of {}'.format(g[0]))
    seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
    plt.subplot(2, 1, i+1)
    tmp = df.copy()
    tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
    seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
                    palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
So one option to do a grouped box plot with two separate axes is to use hue_order=['value', np.nan] in your call to sns.boxplot:
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax.grid(alpha=0.5, which = 'major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14}, fontsize=12)
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
ax.tick_params(labelsize=14)
I have a bar plot created using seaborn. For example, the plot can be created as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name=['Letter'])
ax = sns.barplot(x="Location", y="value", hue="Letter", data=mdf, errwidth=0)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2), ncol=3, fancybox=True, shadow=True)
plt.show()
This gives the following plot
I would like to customize the chart as follows:
Remove the face color (set it to a white color)
Add a hash pattern to the image to distinguish the groups
How can this be achieved?
Removing the face color is easy, just do ax.set_facecolor('w'), although this will make the grid lines invisible. I'd recommend using sns.set_style('whitegrid') before plotting instead, you'll get a white background with only horizontal grid lines in grey.
As for the different hatch patterns, this is a little trickier with seaborn, but it can be done. You can pass the hatch keyword argument to barplot, but it will be applied to every bar, which doesn't help you distinguish them. Unfortunately, passing a dictionary here doesn't work. Instead, you can iterate over the bars after they're constructed and apply a hatch to each. You'll have to calculate the number of locations, but this is pretty straightforward with pandas. It turns out that seaborn plots all bars of one hue before moving on to the next hue, so in your example it would plot all blue bars, then all green bars, then all red bars. The hatch therefore only needs to change once every num_locations bars:
num_locations = len(mdf.Location.unique())
hatches = itertools.cycle(['/', '//', '+', '-', 'x', '\\', '*', 'o', 'O', '.'])
for i, bar in enumerate(ax.patches):
    if i % num_locations == 0:
        hatch = next(hatches)
    bar.set_hatch(hatch)
So the full script is
import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
data1 = pd.DataFrame(np.random.rand(17, 3), columns=['A', 'B', 'C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17, 3) + 0.2, columns=['A', 'B', 'C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17, 3) + 0.4, columns=['A', 'B', 'C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name=['Letter'])
ax = sns.barplot(x="Location", y="value", hue="Letter", data=mdf, errwidth=0)
num_locations = len(mdf.Location.unique())
hatches = itertools.cycle(['/', '//', '+', '-', 'x', '\\', '*', 'o', 'O', '.'])
for i, bar in enumerate(ax.patches):
    if i % num_locations == 0:
        hatch = next(hatches)
    bar.set_hatch(hatch)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1), ncol=3, fancybox=True, shadow=True)
plt.show()
And I get the output
Reference for setting hatches and the different hatches available: http://matplotlib.org/examples/pylab_examples/hatch_demo.html
Note: I adjusted your bbox_to_anchor for the legend because it was partially outside of the figure on my computer.
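If the goal is white bar faces rather than a white axes background, the same loop can also set the face and edge color of each bar. The sketch below uses plain matplotlib grouped bars with made-up heights instead of seaborn, so that the drawing order (all bars of one letter, then the next) is explicit. It assumes the bars of your actual plot are likewise ordered group by group.

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
locations = [1, 2, 3]
letters = ["A", "B", "C"]
x = np.arange(len(locations))
width = 0.25

fig, ax = plt.subplots()
for k, letter in enumerate(letters):
    heights = rng.random(len(locations)) + 0.2 * k   # made-up values
    offset = (k - (len(letters) - 1) / 2) * width    # centre the group on the tick
    ax.bar(x + offset, heights, width=width, label=letter)

# Bars were added letter by letter, so the hatch changes every len(locations) bars
hatches = itertools.cycle(["/", "+", "x"])
for i, bar in enumerate(ax.patches):
    if i % len(locations) == 0:
        hatch = next(hatches)
    bar.set_facecolor("white")   # remove the fill color
    bar.set_edgecolor("black")   # the hatch is drawn in the edge color
    bar.set_hatch(hatch)

ax.set_xticks(x)
ax.set_xticklabels(locations)
ax.legend()
```

The legend is created after the loop so that its handles can pick up the hatch and white face of each group. Call plt.show() to display the figure.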