I have a bar plot created using seaborn. For example, the plot can be created as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name=['Letter'])
ax = sns.barplot(x="Location", y="value", hue="Letter", data=mdf, errwidth=0)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2), ncol=3, fancybox=True, shadow=True)
plt.show()
This gives the following plot
I would like to customize the chart as follows:
Remove the face color (set it to white)
Add a hatch pattern to the bars to distinguish the groups
How can this be achieved?
Removing the face color is easy: just call ax.set_facecolor('w'), although this will make the grid lines invisible. I'd recommend calling sns.set_style('whitegrid') before plotting instead; you'll get a white background with only horizontal grid lines in grey.
As for the different hatch patterns, this is a little trickier with seaborn, but it can be done. You can pass the hatch keyword argument to barplot, but it'll be applied to every bar, which doesn't help you distinguish them. Unfortunately, passing a dictionary here doesn't work. Instead, you can iterate over the bars after they're constructed and apply a hatch to each one. You'll have to calculate the number of locations, but this is straightforward with pandas. It turns out that seaborn plots all bars of one hue before moving on to the next hue, so in your example it plots all blue bars, then all green bars, then all red bars, and the logic is simple:
import itertools

num_locations = len(mdf.Location.unique())
hatches = itertools.cycle(['/', '//', '+', '-', 'x', '\\', '*', 'o', 'O', '.'])
for i, bar in enumerate(ax.patches):
    # switch to the next hatch each time a new hue group starts
    if i % num_locations == 0:
        hatch = next(hatches)
    bar.set_hatch(hatch)
So the full script is
import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
data1 = pd.DataFrame(np.random.rand(17, 3), columns=['A', 'B', 'C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17, 3) + 0.2, columns=['A', 'B', 'C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17, 3) + 0.4, columns=['A', 'B', 'C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name=['Letter'])
ax = sns.barplot(x="Location", y="value", hue="Letter", data=mdf, errwidth=0)
num_locations = len(mdf.Location.unique())
hatches = itertools.cycle(['/', '//', '+', '-', 'x', '\\', '*', 'o', 'O', '.'])
for i, bar in enumerate(ax.patches):
    if i % num_locations == 0:
        hatch = next(hatches)
    bar.set_hatch(hatch)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1), ncol=3, fancybox=True, shadow=True)
plt.show()
And I get the output
Reference for setting hatches and the different hatches available: http://matplotlib.org/examples/pylab_examples/hatch_demo.html
Note: I adjusted your bbox_to_anchor for the legend because it was partially outside of the figure on my computer.
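The question can also be read as asking for white bars rather than a white background. ax.set_facecolor('w') only changes the axes background, so one possible approach is to set each bar's face color to white and give it a dark edge so the hatch stays visible. Here is a minimal sketch of that idea in plain matplotlib. The data and the names values and hatch_for_hue are made up for illustration; with seaborn the same loop would run over ax.patches, assuming the hue-by-hue drawing order described above.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical data: 3 hue levels (rows) at 3 locations (columns)
values = np.array([[0.5, 0.7, 0.9],
                   [0.4, 0.6, 1.0],
                   [0.6, 0.8, 0.8]])
hatch_for_hue = ['/', '+', 'x']
num_hues, num_locations = values.shape
width = 0.8 / num_hues

fig, ax = plt.subplots()
# draw one hue level at a time, mimicking the drawing order described above
for h in range(num_hues):
    ax.bar(np.arange(num_locations) + h * width, values[h], width=width)

for i, bar in enumerate(ax.patches):
    bar.set_facecolor('white')   # remove the fill of the bar itself
    bar.set_edgecolor('black')   # an explicit edge color also colors the hatch
    # integer division maps the flat bar index to its hue group
    bar.set_hatch(hatch_for_hue[i // num_locations])
```

Using i // num_locations instead of a cycling iterator makes the mapping from bar to hue group explicit. Call plt.show() or fig.savefig(...) afterwards to see the result.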
I am trying to draw a count plot using seaborn and matplotlib. Within each year, I want to sort the bars for the "drought types" by count so that the plot looks better. Currently they are unsorted within each year and look very messy.
Thank you!
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
count=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')
plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()
The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, as it might cause very unexpected changes with small changes in the data. loc='best' will be changed to one of the 9 possible locations depending on the available space.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
fig, ax = plt.subplots(figsize=(15, 8))
drought_types = count['Drought types'].unique()
palette = {drought_type: color
           for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
    year_data = count[count['Year'] == year]
    if len(year_data) > 0:
        # reindex is needed to make sure all drought_types are present
        hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
        sns.countplot(x='Year', order=all_years,
                      hue='Drought types', hue_order=hue_order,
                      data=year_data, palette=palette, ax=ax)
# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
           for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()
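The per-year hue_order computation is the core of this approach. The following dependency-free sketch shows the same idea on made-up records. The category names and counts are hypothetical, and there is one deliberate difference: a category missing from a year counts as 0 here and therefore sorts first, whereas in the pandas version the NaN produced by reindex() sorts last.

```python
from collections import Counter

# hypothetical (year, drought type) records
records = [
    (2000, 'mild'), (2000, 'mild'), (2000, 'severe'),
    (2001, 'severe'), (2001, 'severe'), (2001, 'moderate'), (2001, 'mild'),
]
drought_types = ['mild', 'moderate', 'severe']


def hue_order_for(year):
    """Return all drought types for one year, least frequent first."""
    counts = Counter(kind for y, kind in records if y == year)
    # Counter returns 0 for categories that do not occur in this year;
    # sorted() is stable, so ties keep the order of drought_types
    return sorted(drought_types, key=lambda kind: counts[kind])


for year in (2000, 2001):
    print(year, hue_order_for(year))
```

Each year gets its own ordering, which is why the plot has to be drawn one year at a time with a fixed color per category.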
I am trying to manipulate hatch of a countplot by hue. Here is the plot code and the corresponding plot I drew:
ax = sns.countplot(
    data=data, y='M_pattern', hue='HHFAMINC',
    palette=color_palette, lw=0.5, ec='black',
)
plt.yticks(rotation=45, ha='right')
legend_labels, _ = ax.get_legend_handles_labels()
hatches = ['-', '+', 'x', '\\']
# Loop over the bars
for i, thisbar in enumerate(ax.patches):
    # Set a different hatch for each bar
    thisbar.set_hatch(hatches[i])
plt.legend(
    legend_labels, [
        'Less than 10,000$ to 50,000$',
        '50,000$ to 100,000$',
        '100,000 to 150,000$',
        'More than 150,000'
    ],
    title='Income categories'
)
plt.ylabel('Mandatory trip pattern')
plt.show()
Is there a straightforward way to hatch each income category separately?
ax.containers is a list with one container per group of bars. To access individual bars, first loop through the containers and then through the bars in each of them. Instead of indexing with enumerate, it is recommended to use zip to loop through the two lists simultaneously.
To rename the legend entries, apply replace to the values of the hue column; this keeps the correspondence between each value and its long name intact even when the dataset is updated later.
Here is an example using seaborn's 'tips' dataset. Note that 3 * hatch_pattern repeats the pattern string, which makes the hatching three times as dense.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = sns.load_dataset('tips')
ax = sns.countplot(
    data=data.replace({'day': {'Thur': 'Thursday', 'Fri': 'Friday', 'Sat': 'Saturday', 'Sun': 'Sunday'}}),
    y='time', hue='day',
    palette='Set2', lw=0.5, ec='black')
plt.yticks(rotation=45, ha='right')
hatches = ['-', '+', 'x', '\\']
for hatch_pattern, these_bars in zip(hatches, ax.containers):
    for this_bar in these_bars:
        this_bar.set_hatch(3 * hatch_pattern)
ax.legend(loc='upper right', title='Days')
plt.show()
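One caveat with zip(hatches, ax.containers): zip stops at the shorter input, so if there are more groups than hatch patterns, the extra groups silently stay unhatched. A possible way around this is itertools.cycle, sketched here in plain matplotlib. Each ax.bar call creates one container, which stands in for one seaborn hue level; the data are made up.

```python
import itertools
import matplotlib.pyplot as plt

hatches = ['-', '+', 'x']  # deliberately fewer patterns than groups
group_heights = [[1, 2], [2, 3], [3, 1], [2, 2], [1, 3]]  # hypothetical data

fig, ax = plt.subplots()
for g, heights in enumerate(group_heights):
    # one bar() call per group -> one entry in ax.containers per group
    ax.bar([g * 0.15, 1 + g * 0.15], heights, width=0.15, label=f'group {g}')

# cycle() repeats the patterns, so every container gets a hatch
for hatch_pattern, these_bars in zip(itertools.cycle(hatches), ax.containers):
    for this_bar in these_bars:
        this_bar.set_hatch(hatch_pattern)

ax.legend()  # created after hatching, so the legend entries pick up the hatches
```

With five groups and three patterns, the fourth and fifth groups reuse the first two patterns, so combining hatches with distinct colors is still advisable.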
In seaborn, I can set the color of the mean marker by providing meanprops, e.g.:
meanprops={'marker': 'o', 'markeredgecolor': c,
           'markerfacecolor': 'none', 'markersize': 4}
However, if I make a plot using hue, this sets the same color for the mean markers of all categories. How can I apply the hue color to the mean markers as well?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_merge = pd.DataFrame(data={'AOD_440nm': np.random.rand(20),
                              'month': np.tile(['Jan','Feb'], 10),
                              'kind': np.repeat(['A', 'B'], 10)})
fig,ax = plt.subplots()
sns.boxplot(x='month', y='AOD_440nm', hue='kind', data=df_merge,
            showfliers=False, whis=[5, 95],
            palette=sns.color_palette(('r', 'k')),
            showmeans=True)
for i, artist in enumerate(ax.artists):
    # Set the linecolor on the artist to the facecolor, and set the facecolor to None
    col = artist.get_facecolor()
    artist.set_edgecolor(col)
    artist.set_facecolor('None')
In short, how can I change colour of means?
You could loop through all the "lines" generated by the boxplot. The boxplot generates multiple lines per box, one for each element. The marker for the mean is also a "line", but with linestyle None, so it only has a marker (similar to how plt.plot can draw markers). The exact number of lines per box depends on the options (with or without mean, whiskers, ...). Since changing the marker color of the non-marker lines has no visible effect, changing all marker colors is the easiest approach.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_merge = pd.DataFrame(data={'AOD_440nm': np.random.rand(20),
                              'month': np.tile(['Jan', 'Feb'], 10),
                              'kind': np.repeat(['A', 'B'], 10)})
fig, ax = plt.subplots()
sns.boxplot(x='month', y='AOD_440nm', hue='kind', data=df_merge,
            showfliers=False, whis=[5, 95],
            palette=sns.color_palette(('r', 'k')),
            showmeans=True)
num_artists = len(ax.artists)
num_lines = len(ax.lines)
lines_per_artist = num_lines // num_artists
for i, artist in enumerate(ax.artists):
    # Set the linecolor on the artist to the facecolor, and set the facecolor to None
    col = artist.get_facecolor()
    artist.set_edgecolor(col)
    artist.set_facecolor('None')
    # set the marker colors of the corresponding "lines" to the same color
    for j in range(lines_per_artist):
        ax.lines[i * lines_per_artist + j].set_markerfacecolor(col)
        ax.lines[i * lines_per_artist + j].set_markeredgecolor(col)
plt.show()
PS: An alternative to artist.set_facecolor('None') could be to use a strong transparency: artist.set_alpha(0.1).
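The index arithmetic above depends on how seaborn and matplotlib store the artists, and this has changed between versions (in newer matplotlib releases the boxes may appear in ax.patches instead of ax.artists). If calling matplotlib directly is an option, Axes.boxplot returns a dictionary of the artists it created, so boxes and mean markers can be paired without counting lines. This is a minimal sketch under that assumption, with random data and the colors 'r' and 'k' reused from the question; it is not a drop-in replacement for the hue-based seaborn plot.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = [rng.random(10), rng.random(10)]  # hypothetical data for two boxes
colors = ['r', 'k']

fig, ax = plt.subplots()
parts = ax.boxplot(samples, patch_artist=True, showmeans=True,
                   showfliers=False, whis=[5, 95])

# parts['boxes'] and parts['means'] follow the order of the input data
for box, mean_marker, color in zip(parts['boxes'], parts['means'], colors):
    box.set_edgecolor(color)
    box.set_facecolor('none')
    mean_marker.set_markerfacecolor(color)
    mean_marker.set_markeredgecolor(color)
```

The dictionary also has the keys 'medians', 'whiskers' and 'caps', which can be recolored the same way.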
I am creating bar graphs for data that comes from series. However the names (x-axis values) are extremely long. If they are rotated 90 degrees it is impossible to read the entire name and get a good image of the graph. 45 degrees is not much better. I am looking for a way to label the x-axis by numbers 1-15 and then have a legend listing the names that correspond to each number.
This is the completed function I have so far, including creating the series from a larger dataframe
def graph_average_expressions(TAD_matches, CAGE):
    """graphs the top 15 expression levels of each lncRNA"""
    for i, row in TAD_matches.iterrows():
        mask = (
            CAGE['short_description'].isin(row['peak_ID'])
        )  # finds expression level for peaks in an lncRNA
        average = CAGE[mask].iloc[:,8:].mean(axis=0).astype('float32').sort_values().tail(n=15)
        # made a new df of the top 15 highest expression levels for all averaged groups
        # a group is peaks belong to the same lncRNA
        cell_type = list(average.index)
        expression = list(average.values)
        average_df = pd.DataFrame(
            list(zip(cell_type, expression)),
            columns=['cell_type','expression']
        )
        colors = sns.color_palette(
            'husl',
            n_colors=len(cell_type)
        )
        p = sns.barplot(
            x=average_df.index,
            y='expression',
            data=average_df,
            palette=colors
        )
        cmap = dict(zip(average_df.cell_type, colors))
        patches = [Patch(color=v, label=k) for k, v in cmap.items()]
        plt.legend(
            handles=patches,
            bbox_to_anchor=(1.04, 0.5),
            loc='center left',
            borderaxespad=0
        )
        plt.title('expression_levels_of_lncRNA_' + row['lncRNA_name'])
        plt.xlabel('cell_type')
        plt.ylabel('expression')
        plt.show()
Here is an example of the data I am graphing
CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532 1.583428
Neutrophils_donor3.CNhs11905 1.832527
CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483 1.858384
CD14_monocytes_treated_with_Candida_donor1.CNhs13473 1.873013
CD14_Monocytes_donor2.CNhs11954 2.041607
CD14_monocytes_treated_with_Candida_donor2.CNhs13488 2.112112
CD14_Monocytes_donor3.CNhs11997 2.195365
CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469 2.974203
Eosinophils_donor3.CNhs12549 3.566822
CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470 3.685389
CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471 4.409062
CD14_monocytes_treated_with_Candida_donor3.CNhs13494 5.546789
CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492 5.673991
Neutrophils_donor1.CNhs10862 8.352045
Neutrophils_donor2.CNhs11959 11.595509
With the new code above this is the graph I get, but no legend or title.
A bit of a different route: build a string that maps the x values to the names and add it to the figure.
I made my own DataFrame for illustration.
from matplotlib import pyplot as plt
import pandas as pd
import string,random
df = pd.DataFrame({'name': [''.join(random.sample(string.ascii_letters, 15))
                            for _ in range(10)],
                   'data': [random.randint(1, 20) for _ in range(10)]})
Make the plot.
fig,ax = plt.subplots()
ax.bar(df.index,df.data)
Make the legend.
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
Add the legend as a Text artist and adjust the plot to accommodate it.
t = ax.text(.7,.2,x_legend,transform=ax.figure.transFigure)
fig.subplots_adjust(right=.65)
plt.show()
plt.close()
That can be made dynamic by getting and using the Text artist's size and the Figure's size.
# using imports and DataFrame from above
fig,ax = plt.subplots()
r = fig.canvas.get_renderer()
ax.bar(df.index,df.data)
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
t = ax.text(0,.1,x_legend,transform=ax.figure.transFigure)
# find the width of the Text and place it on the right side of the Figure
twidth = t.get_window_extent(renderer=r).width
*_,fwidth,fheight = fig.bbox.extents
tx,ty = t.get_position()
tx = .95 - (twidth/fwidth)
t.set_position((tx,ty))
# adjust the right edge of the plot/Axes
ax_right = tx - .05
fig.subplots_adjust(right=ax_right)
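If a regular legend is preferred over a free-floating Text artist, the number-to-name mapping can also be put into legend labels. This is a sketch with made-up names and values, not the asker's data. The bars are placed at positions 1..N so the tick labels are the numbers, and each legend entry reads "number - name".

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# hypothetical long names and values
names = ['a_very_long_sample_name_donor1',
         'another_long_sample_name_donor2',
         'yet_another_long_sample_name_donor3']
values = [1.5, 3.2, 2.4]
numbers = list(range(1, len(names) + 1))

fig, ax = plt.subplots()
bars = ax.bar(numbers, values)
ax.set_xticks(numbers)  # ticks are labelled 1..N instead of the long names

# one legend entry per bar, e.g. "1 - a_very_long_sample_name_donor1"
handles = [Patch(facecolor=bar.get_facecolor(), label=f'{n} - {name}')
           for n, name, bar in zip(numbers, names, bars)]
legend = ax.legend(handles=handles, bbox_to_anchor=(1.04, 0.5),
                   loc='center left', borderaxespad=0)
```

When saving, fig.savefig(..., bbox_inches='tight') usually keeps a legend placed outside the axes from being cut off.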
Set up the DataFrame
Verify that the index of the DataFrame to be plotted has been reset, so that it consists of integers beginning at 0, and use the index as the x-axis
Plot the values on the y-axis
Option 1A: Seaborn hue
The easiest way is probably to use seaborn.barplot and use the hue parameter with the 'names'
Seaborn: Choosing color palettes
This plot is using husl
Additional options for the husl palette can be found at seaborn.husl_palette
The bars will not be centered for this option, because they are placed according to the number of hue levels, and there are 15 levels in this case.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# plt styling parameters
# (on matplotlib >= 3.6 this style is named 'seaborn-v0_8')
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, hue='names')
# place the legend to the right of the plot
plt.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 1B: Seaborn palette
Using the palette parameter instead of hue, places the bars directly over the ticks.
This option requires "manually" associating 'names' with the colors and creating the legend.
patches uses Patch to create each item in the legend. (e.g. the rectangle, associated with color and name).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, palette=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 2: pandas.DataFrame.plot
This option also requires "manually" associating 'names' with the palette and creating the legend using Patch.
Choosing Colormaps in Matplotlib
This plot is using tab20c
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.patches import Patch
# plt styling parameters
# (on matplotlib >= 3.6 this style is named 'seaborn-v0_8')
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# choose a color map with enough colors for the number of bars
colors = [plt.cm.tab20c(np.arange(len(df)))]
# plot the dataframe
df.plot.bar(color=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors[0]))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Reproducible DataFrame
data = {'names': ['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532', 'Neutrophils_donor3.CNhs11905', 'CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483', 'CD14_monocytes_treated_with_Candida_donor1.CNhs13473', 'CD14_Monocytes_donor2.CNhs11954', 'CD14_monocytes_treated_with_Candida_donor2.CNhs13488', 'CD14_Monocytes_donor3.CNhs11997', 'CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469', 'Eosinophils_donor3.CNhs12549', 'CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470', 'CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471', 'CD14_monocytes_treated_with_Candida_donor3.CNhs13494', 'CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492', 'Neutrophils_donor1.CNhs10862', 'Neutrophils_donor2.CNhs11959'],
        'values': [1.583428, 1.832527, 1.858384, 1.873013, 2.041607, 2.112112, 2.195365, 2.974203, 3.566822, 3.685389, 4.409062, 5.546789, 5.673991, 8.352045, 11.595509]}
df = pd.DataFrame(data)
I know how to cycle through a list of colors in matplotlib. But is it possible to do something similar with line styles (plain, dotted, dashed, etc.)? I'd need to do that so my graphs would be easier to read when printed. Any suggestions how to do that?
Something like this might do the trick:
import matplotlib.pyplot as plt
from itertools import cycle
lines = ["-","--","-.",":"]
linecycler = cycle(lines)
plt.figure()
for i in range(10):
    x = range(i,i+10)
    plt.plot(range(10),x,next(linecycler))
plt.show()
Result:
Edit for newer version (v2.22)
import matplotlib.pyplot as plt
from cycler import cycler
# set the property cycle once, before the Axes is created
linestyle_cycler = cycler('linestyle',['-','--',':','-.'])
plt.rc('axes', prop_cycle=linestyle_cycler)
plt.figure()
for i in range(5):
    x = range(i,i+5)
    plt.plot(range(5),x)
plt.legend(['first','second','third','fourth','fifth'], loc='upper left', fancybox=True, shadow=True)
plt.show()
For more detailed information consult the matplotlib tutorial on "Styling with cycler"
The upcoming matplotlib v1.5 will deprecate color_cycle for the new prop_cycler feature: http://matplotlib.org/devdocs/users/whats_new.html?highlight=prop_cycle#added-axes-prop-cycle-key-to-rcparams
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'rgb') +"
                                   "cycler('lw', [1, 2, 3])")
Then go ahead and create your axes and plots!
Here are a few examples of using cyclers to develop sets of styles.
Cyclers can be added to give compositions (red with '-', blue with '--', ...):
plt.rc('axes', prop_cycle=(cycler('color', list('rbgk')) +
                           cycler('linestyle', ['-', '--', ':', '-.'])))
Direct use on an Axes:
ax1.set_prop_cycle(cycler('color', ['c', 'm', 'y', 'k']) +
                   cycler('lw', [1, 2, 3, 4]))
Cyclers can be multiplied (http://matplotlib.org/cycler/) to give a wider range of unique styles:
for ax in axarr:
    ax.set_prop_cycle(cycler('color', list('rbgykcm')) *
                      cycler('linestyle', ['-', '--']))
See also: http://matplotlib.org/examples/color/color_cycle_demo.html
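To make the difference between adding and multiplying cyclers concrete, here is a small standalone sketch that only uses the cycler package (installed together with matplotlib). The variable names are mine; the style values are the ones from the snippets above.

```python
from cycler import cycler

colors = cycler('color', list('rbgk'))
styles = cycler('linestyle', ['-', '--', ':', '-.'])

# addition pairs the entries element-wise, so both cyclers need the same length
paired = colors + styles

# multiplication forms every combination of the two cyclers
combined = cycler('color', list('rbgykcm')) * cycler('linestyle', ['-', '--'])

print(len(paired), len(combined))
print(list(paired)[:2])
print(list(combined)[:2])
```

Addition yields 4 styles here, while multiplication yields 7 * 2 = 14, which is why the product form is handy when many lines have to stay distinguishable in print.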
If you want the change to be automatic, you can add two lines to the axes.py file of matplotlib.
Look for this line:
self.color_cycle = itertools.cycle(clist)
and add the following line underneath:
self.line_cycle = itertools.cycle(["-",":","--","-.",])
And look for the line:
kw['color'] = self.color_cycle.next()
and add the line:
kw['linestyle'] = self.line_cycle.next()
I guess you can do the same for marker.
I usually use a combination of basic colors and linestyles to represent different data sets. Suppose we have 16 data sets in groups of four, where the members of each group have some property in common. They are then easy to visualize if each group gets a common color and its members get different line styles.
import numpy as np
import matplotlib.pyplot as plt
models=['00','01', '02', '03', '04', '05', '06', '07', '08', '09', '10',
        '11', '12', '13', '14', '15', '16']
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.linspace(-1,1,100)
y = np.sin(x)
clrs_list=['k','b','g','r'] # list of basic colors
styl_list=['-','--','-.',':'] # list of basic linestyles
for i in range(0,16):
    clrr=clrs_list[i // 4]
    styl=styl_list[i % 4]
    modl=models[i+1]
    frac=(i+1)/10.0
    ax.plot(x,y+frac,label=modl,color=clrr,ls=styl)
plt.legend()
plt.show()
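I use code similar to this to cycle through different linestyles. In older matplotlib versions the default colours repeat after 7 plots (after 10 since matplotlib 2.0).
import matplotlib.pyplot as plt

# datasets: an iterable of y-value sequences, defined elsewhere
for idx, ds in enumerate(datasets):
    if idx < 7:
        plt.plot(ds)
    elif idx < 14:
        plt.plot(ds, linestyle='--')
    else:
        plt.plot(ds, linestyle=':')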
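The if/elif chain can be generalized with integer division, so that the linestyle changes each time the color cycle wraps around. This is a small sketch; the helper name linestyle_for and its defaults are my own, and n_colors should be set to the length of the color cycle actually in use.

```python
def linestyle_for(idx, n_colors=7, linestyles=('-', '--', ':')):
    """Pick a linestyle that changes each time the color cycle wraps around."""
    # idx // n_colors counts completed passes through the colors;
    # the modulo keeps the index valid when the linestyles run out
    return linestyles[(idx // n_colors) % len(linestyles)]


for idx in (0, 6, 7, 13, 14, 21):
    print(idx, linestyle_for(idx))
```

Inside the plotting loop this would be used as plt.plot(ds, linestyle=linestyle_for(idx)).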
Similar to Avaris' graphs, but different:
import matplotlib.pyplot as plt
import numpy as np
#set linestyles (for-loop method)
colors=('k','y','m','c','b','g','r','#aaaaaa')
linestyles=('-','--','-.',':')
styles=[(color,linestyle) for linestyle in linestyles for color in colors]
#-- sample data
numLines=30
datax=np.arange(0,10)
datay=datax+np.array([np.arange(numLines)]).T
plt.figure(1)
#-----------
# -- array oriented method but I cannot set the line color and styles
# -- without changing Matplotlib code
plt.plot(datax[:,np.newaxis],datay.T)
plt.title('Default linestyles - array oriented programming')
#-----------
#-----------
# -- 'for loop' based approach to enable colors and linestyles to be specified
plt.figure(2)
for num in range(datay.shape[0]):
    plt.plot(datax,datay[num,:],color=styles[num][0],ls=styles[num][1])
plt.title('User defined linestyles using for-loop programming')
#-----------
plt.show()