Seaborn countplot with group order

Seaborn countplot with group order - python

I try to plot a count plot using seaborn and matplotlib. Given each year, I want to sort the count "drought types" within each year so that it looks better. Currently it is unsorted within each year and look very messy.
Thank you!
import seaborn as sns
import matplotlib.pyplot as plt
count=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')
plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()

The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, as it might cause very unexpected changes with small changes in the data. loc='best' will be changed to one of the 9 possible locations depending on the available space.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
fig, ax = plt.subplots(figsize=(15, 8))
drought_types = count['Drought types'].unique()
palette = {drought_type: color
for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
year_data = count[count['Year'] == year]
if len(year_data) > 0:
# reindex is needed to make sure all drought_types are present
hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
sns.countplot(x='Year', order=all_years,
hue='Drought types', hue_order=hue_order,
data=year_data, palette=palette, ax=ax)
# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()

Related

How to set xlim in seaborn barplot?

I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?

IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, it means even if we have data from day 41 onwards, seaborn is going to refer the starting category with x = 0, making for us difficult to tweak the lower limit of x-axis post function call.
The following code and corresponding plot clarifies what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although handling colors of the bars would be not straightforward compared to sns.barplot(..., hue=..., ...) :
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()

Pointplot and Scatterplot in one figure but X axis is shifting

Hi I'm trying to plot a pointplot and scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
plt.show()
When I plot, for some reason the points from the scatterplot are offsetting one ID spot right on the x-axis. When I plot the scatter or the point plot separately, they each are in the correct ID spot. Why would plotting them on the same plot cause the scatterplot to offset one right?
Edit: Tried to make the ID column categorical, but that didn't work either.

Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
ax.margins(x=0.02)
plt.tight_layout()
plt.show()

Identify name of default color palette being used by seaborn or matplotlib

This is the color palette, seaborn has used by default when I used a column with categorical variables to color the scatter points.
Is there a way to get the name or colors of color-palette being used?
I get this color scheme in the beginning but as soon as I use a diff scheme for a plotly, I can't get to this palette for the same chart.
This is not the scheme which comes from sns.color_palette. This can also be a matplotlib color scheme.
Minimum reproducible example
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
# download data
df = pd.read_csv("https://www.statlearning.com/s/Auto.csv")
df.head()
# remove rows with "?"
df.drop(df.index[~df.horsepower.str.isnumeric()], axis=0, inplace=True)
df['horsepower'] = pd.to_numeric(df.horsepower, errors='coerce')
# plot 1 (gives the desired color-palette)
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
# plot 2
# Converting column cylinder to factor before using for 'color'
df.cylinders = df.cylinders.astype('category')
# Scatter plot - Cylinders as hue
pal = ['#fdc086','#386cb0','#beaed4','#33a02c','#f0027f']
col_map = dict(zip(sorted(df.cylinders.unique()), pal))
fig = px.scatter(df, y='mpg', x='year', color='cylinders',
color_discrete_map=col_map,
hover_data=['name','origin'])
fig.update_layout(width=800, height=400, plot_bgcolor='#fff')
fig.update_traces(marker=dict(size=8, line=dict(width=0.2,color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.show()
# plot 1 run again
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);

The specific palette you have mentioned is the cubehelix and you can get it using:
sns.cubehelix_palette()
You can get the colours using indexing:
sns.cubehelix_palette()[:]
# [[0.9312692223325372, 0.8201921796082118, 0.7971480974663592],
# [0.8559578605899612, 0.6418993116910497, 0.6754191211563135],
# [0.739734329496642, 0.4765280683170713, 0.5959617419736206],
# [0.57916573903086, 0.33934576125314425, 0.5219003947563425],
# [0.37894937987025, 0.2224702044652721, 0.41140014301575434],
# [0.1750865648952205, 0.11840023306916837, 0.24215989137836502]]
In general, checking the official docs or in the case when you need to check some defaults of seaborn (which is the only case when you don't know what the palette is, otherwise you know because you're the one defining it), you can always check the github code (eg. here or here).

In your first graph the cylinders is a continuous variable of type int64 and seaborn is using a single color, in this case purple, and shading it to indicate the scale of the value, so that 8 cylinders would be darker than 4. This is done on purpose so you can easily tell what is what by the shade of the color.
Once you convert to categorical halfway through there is no longer such a relationship between the cylinder values, i.e. 8 cylinders is not twice as much a 4 cylinders anymore, they are essentially two totally different categories. To avoid associating the shade of color with the scale of the variable (since the values are no longer continuous and the relationship doesn't exist) a categorical color palette will be used by default, such that every color is distinct from the other.
In order to solve your problem, you will need to cast cylinders back to int64 prior to running your final chart with
df.cylinders = df.cylinders.astype('int64')
This will restore the variable to a continuous one and will allow seaborn to use gradients of the same color to represent the size of the values and your final plot will look just like the first one.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")
# download data
df = pd.read_csv("https://www.statlearning.com/s/Auto.csv")
df.head()
# remove rows with "?"
df.drop(df.index[~df.horsepower.str.isnumeric()], axis=0, inplace=True)
df['horsepower'] = pd.to_numeric(df.horsepower, errors='coerce')
# plot 1 (gives the desired color-palette)
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
# plot 2
# Converting column cylinder to factor before using for 'color'
df.cylinders = df.cylinders.astype('category')
# Scatter plot - Cylinders as hue
pal = ['#fdc086','#386cb0','#beaed4','#33a02c','#f0027f']
col_map = dict(zip(sorted(df.cylinders.unique()), pal))
fig = px.scatter(df, y='mpg', x='year', color='cylinders',
color_discrete_map=col_map,
hover_data=['name','origin'])
fig.update_layout(width=800, height=400, plot_bgcolor='#fff')
fig.update_traces(marker=dict(size=8, line=dict(width=0.2,color='DarkSlateGrey')),
selector=dict(mode='markers'))
fig.show()
# plot 1 run again
df.cylinders = df.cylinders.astype('int64')
fig = sns.PairGrid(df, vars=df.columns[~df.columns.isin(['cylinders','origin','name'])].tolist(), hue='cylinders')
plt.gcf().set_size_inches(17,15)
fig.map_diag(sns.histplot)
fig.map_upper(sns.scatterplot)
fig.map_lower(sns.kdeplot)
fig.add_legend(ncol=5, loc=1, bbox_to_anchor=(0.5, 1.05), borderaxespad=0, frameon=False);
Output

One way to get it back is with sns.set(). But that doesn't tell us the name of the color scheme.

How to turn x-axis values into a legend for matplotlib bar graph

I am creating bar graphs for data that comes from series. However the names (x-axis values) are extremely long. If they are rotated 90 degrees it is impossible to read the entire name and get a good image of the graph. 45 degrees is not much better. I am looking for a way to label the x-axis by numbers 1-15 and then have a legend listing the names that correspond to each number.
This is the completed function I have so far, including creating the series from a larger dataframe
def graph_average_expressions(TAD_matches, CAGE):
"""graphs the top 15 expression levels of each lncRNA"""
for i, row in TAD_matches.iterrows():
mask = (
CAGE['short_description'].isin(row['peak_ID'])
)#finds expression level for peaks in an lncRNA
average = CAGE[mask].iloc[:,8:].mean(axis=0).astype('float32').sort_values().tail(n=15)
#made a new df of the top 15 highest expression levels for all averaged groups
#a group is peaks belong to the same lncRNA
cell_type = list(average.index)
expression = list(average.values)
average_df = pd.DataFrame(
list(zip(cell_type, expression)),
columns=['cell_type','expression']
)
colors = sns.color_palette(
'husl',
n_colors=len(cell_type)
)
p = sns.barplot(
x=average_df.index,
y='expression',
data=average_df,
palette=colors
)
cmap = dict(zip(average_df.cell_type, colors))
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
plt.legend(
handles=patches,
bbox_to_anchor=(1.04, 0.5),
loc='center left',
borderaxespad=0
)
plt.title('expression_levels_of_lncRNA_' + row['lncRNA_name'])
plt.xlabel('cell_type')
plt.ylabel('expression')
plt.show()
Here is an example of the data I am graphing
CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532 1.583428
Neutrophils_donor3.CNhs11905 1.832527
CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483 1.858384
CD14_monocytes_treated_with_Candida_donor1.CNhs13473 1.873013
CD14_Monocytes_donor2.CNhs11954 2.041607
CD14_monocytes_treated_with_Candida_donor2.CNhs13488 2.112112
CD14_Monocytes_donor3.CNhs11997 2.195365
CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469 2.974203
Eosinophils_donor3.CNhs12549 3.566822
CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470 3.685389
CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471 4.409062
CD14_monocytes_treated_with_Candida_donor3.CNhs13494 5.546789
CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492 5.673991
Neutrophils_donor1.CNhs10862 8.352045
Neutrophils_donor2.CNhs11959 11.595509
With the new code above this is the graph I get, but no legend or title.

A bit of a different route. Made a string mapping x values to the names and added it to the figure.
Made my own DataFrame for illustration.
from matplotlib import pyplot as plt
import pandas as pd
import string,random
df = pd.DataFrame({'name':[''.join(random.sample(string.ascii_letters,15))
for _ in range(10)],
'data':[random.randint(1,20) for _ in range(10)]})
Make the plot.
fig,ax = plt.subplots()
ax.bar(df.index,df.data)
Make the legend.
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
Add the legend as a Text artist and adjust the plot to accommodate it.
t = ax.text(.7,.2,x_legend,transform=ax.figure.transFigure)
fig.subplots_adjust(right=.65)
plt.show()
plt.close()
That can be made dynamic by getting and using the Text artist's size and the Figure's size.
# using imports and DataFrame from above
fig,ax = plt.subplots()
r = fig.canvas.get_renderer()
ax.bar(df.index,df.data)
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
t = ax.text(0,.1,x_legend,transform=ax.figure.transFigure)
# find the width of the Text and place it on the right side of the Figure
twidth = t.get_window_extent(renderer=r).width
*_,fwidth,fheight = fig.bbox.extents
tx,ty = t.get_position()
tx = .95 - (twidth/fwidth)
t.set_position((tx,ty))
# adjust the right edge of the plot/Axes
ax_right = tx - .05
fig.subplots_adjust(right=ax_right)

Setup the dataframe
verify the index of the dataframe to be plotted is reset, so it's integers beginning at 0, and use the index as the x-axis
plot the values on the y-axis
Option 1A: Seaborn hue
The easiest way is probably to use seaborn.barplot and use the hue parameter with the 'names'
Seaborn: Choosing color palettes
This plot is using husl
Additional options for the husl palette can be found at seaborn.husl_palette
The bars will not be centered for this option, because they are placed according to the number of hue levels, and there are 15 levels in this case.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# plt styling parameters
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, hue='names')
# place the legend to the right of the plot
plt.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 1B: Seaborn palette
Using the palette parameter instead of hue, places the bars directly over the ticks.
This option requires "manually" associating 'names' with the colors and creating the legend.
patches uses Patch to create each item in the legend. (e.g. the rectangle, associated with color and name).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, palette=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 2: pandas.DataFrame.plot
This option also requires "manually" associating 'names' with the palette and creating the legend using Patch.
Choosing Colormaps in Matplotlib
This plot is using tab20c
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.patches import Patch
# plt styling parameters
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# chose a color map with enough colors for the number of bars
colors = [plt.cm.tab20c(np.arange(len(df)))]
# plot the dataframe
df.plot.bar(color=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors[0]))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Reproducible DataFrame
data = {'names': ['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532', 'Neutrophils_donor3.CNhs11905', 'CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483', 'CD14_monocytes_treated_with_Candida_donor1.CNhs13473', 'CD14_Monocytes_donor2.CNhs11954', 'CD14_monocytes_treated_with_Candida_donor2.CNhs13488', 'CD14_Monocytes_donor3.CNhs11997', 'CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469', 'Eosinophils_donor3.CNhs12549', 'CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470', 'CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471', 'CD14_monocytes_treated_with_Candida_donor3.CNhs13494', 'CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492', 'Neutrophils_donor1.CNhs10862', 'Neutrophils_donor2.CNhs11959'],
'values': [1.583428, 1.832527, 1.858384, 1.873013, 2.041607, 2.1121112, 2.195365, 2.974203, 3.566822, 3.685389, 4.409062, 5.546789, 5.673991, 8.352045, 11.595509]}
df = pd.DataFrame(data)

Seaborn (time series) boxplot using hue and different scale axes

I have a dataframe which has a number of values per date (datetime field). This values are classified in U (users) and S (session) by using a column Group. Seaborn is used to visualize two boxplots per date, where the hue is set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The S boxplot time series boxplot:
The combined boxplot time series using hue. Obviously it's not possible to see any information about the S group because of the scale of the Y axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
tmpU = groups.copy()
tmpU.loc[tmp['Group']!='U','n_data'] = np.nan
tmpS = grupos.copy()
tmpS.loc[tmp['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
plt.setp(ax.get_xticklabels(),rotation=30,horizontalalignment='right')
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
Which in this case turns out to be too dense to be published. Therefore I would adopt a visualization based in subplots, as Parfait susgested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.

Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
plt.title('N_data of {}'.format(g[0]))
plt.subplot(2, 1, i+1)
seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
plt.subplot(2, 1, i+1)
tmp = df.copy()
tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')

So one option to do a grouped box plot with two separate axis is to use hue_order= ['value, np.nan] in your argument for sns.boxplot:
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax1.grid(alpha=0.5, which = 'major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax, ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14}, fontsize=12)
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
ax.tick_params(labelsize=14)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Seaborn countplot with group order - python

Related

How to set xlim in seaborn barplot?

Pointplot and Scatterplot in one figure but X axis is shifting

Identify name of default color palette being used by seaborn or matplotlib

How to turn x-axis values into a legend for matplotlib bar graph

Seaborn (time series) boxplot using hue and different scale axes

Categories

Resources