How to make a bubble graph using seaborn - python

import matplotlib.pyplot as plt
import numpy as np
# data
x=["IEEE", "Elsevier", "Others"]
y=[7, 6, 2]
import seaborn as sns
plt.legend()
plt.scatter(x, y, s=300, c="blue", alpha=0.4, linewidth=3)
plt.ylabel("No. of Papers")
plt.figure(figsize=(10, 4))
I want to make a graph as shown in the image. I am not sure how to provide data for both journal and conference categories. (Currently, I just include one). Also, I am not sure how to add different colors for each category.

You can try this code snippet for you problem.
- I modified your Data format, I suggest you to use pandas for
data visualization.
- I added one more field to visualize the data more efficiently.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
# data
x=["IEEE", "Elsevier", "Others", "IEEE", "Elsevier", "Others"]
y=[7, 6, 2, 5, 4, 3]
z=["conference", "journal", "conference", "journal", "conference", "journal"]
# create pandas dataframe
data_list = pd.DataFrame(
{'x_axis': x,
'y_axis': y,
'category': z
})
# change size of data points
minsize = min(data_list['y_axis'])
maxsize = max(data_list['y_axis'])
# scatter plot
sns.catplot(x="x_axis", y="y_axis", kind="swarm", hue="category",sizes=(minsize*100, maxsize*100), data=data_list)
plt.grid()

How to create the graph with correct bubble sizes and with no overlap
Seaborn stripplot and swarmplot (or sns.catplot(kind=strip or kind=swarm)) provide the handy dodge argument which prevents the bubbles from overlapping. The only downside is that the size argument applies a single size to all bubbles and the sizes argument (as used in the other answer) is of no use here. They do not work like the s and size arguments of scatterplot. Therefore, the size of each bubble must be edited after generating the plot:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import seaborn as sns # v 0.11.0
# Create sample data
x = ['IEEE', 'Elsevier', 'Others', 'IEEE', 'Elsevier', 'Others']
y = np.array([7, 6, 3, 7, 1, 3])
z = ['conference', 'conference', 'conference', 'journal', 'journal', 'journal']
df = pd.DataFrame(dict(organisation=x, count=y, category=z))
# Create seaborn stripplot (swarmplot can be used the same way)
ax = sns.stripplot(data=df, x='organisation', y='count', hue='category', dodge=True)
# Adjust the size of the bubbles
for coll in ax.collections[:-2]:
y = coll.get_offsets()[0][1]
coll.set_sizes([100*y])
# Format figure size, spines and grid
ax.figure.set_size_inches(7, 5)
ax.grid(axis='y', color='black', alpha=0.2)
ax.grid(axis='x', which='minor', color='black', alpha=0.2)
ax.spines['bottom'].set(position='zero', color='black', alpha=0.2)
sns.despine(left=True)
# Format ticks
ax.tick_params(axis='both', length=0, pad=10, labelsize=12)
ax.tick_params(axis='x', which='minor', length=25, width=0.8, color=[0, 0, 0, 0.2])
minor_xticks = [tick+0.5 for tick in ax.get_xticks() if tick != ax.get_xticks()[-1]]
ax.set_xticks(minor_xticks, minor=True)
ax.set_yticks(range(0, df['count'].max()+2))
# Edit labels and legend
ax.set_xlabel('Organisation', labelpad=15, size=12)
ax.set_ylabel('No. of Papers', labelpad=15, size=12)
ax.legend(bbox_to_anchor=(1.0, 0.5), loc='center left', frameon=False);
Alternatively, you can use scatterplot with the convenient s argument (or size) and then edit the space between the bubbles to reproduce the effect of the missing dodge argument (note that the x_jitter argument seems to have no effect). Here is an example using the same data as before and without all the extra formatting:
# Create seaborn scatterplot with size argument
ax = sns.scatterplot(data=df, x='organisation', y='count',
hue='category', s=100*df['count'])
ax.figure.set_size_inches(7, 5)
ax.margins(0.2)
# Dodge bubbles
bubbles = ax.collections[0].get_offsets()
signs = np.repeat([-1, 1], df['organisation'].nunique())
for bubble, sign in zip(bubbles, signs):
bubble[0] += sign*0.15
As a side note, I recommend that you consider other types of plots for this data. A grouped bar chart:
df.pivot(index='organisation', columns='category').plot.bar()
Or a balloon plot (aka categorical bubble plot):
sns.scatterplot(data=df, x='organisation', y='category', s=100*count).margins(0.4)
Why? In the bubble graph, the counts are displayed using 2 visual attributes, i) the y-coordinate location and ii) the bubble size. Only one of them is really necessary.

Related

How to set xlim in seaborn barplot?

I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, it means even if we have data from day 41 onwards, seaborn is going to refer the starting category with x = 0, making for us difficult to tweak the lower limit of x-axis post function call.
The following code and corresponding plot clarifies what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although handling colors of the bars would be not straightforward compared to sns.barplot(..., hue=..., ...) :
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()

Seaborn countplot with group order

I try to plot a count plot using seaborn and matplotlib. Given each year, I want to sort the count "drought types" within each year so that it looks better. Currently it is unsorted within each year and look very messy.
Thank you!
import seaborn as sns
import matplotlib.pyplot as plt
count=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')
plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()
The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, as it might cause very unexpected changes with small changes in the data. loc='best' will be changed to one of the 9 possible locations depending on the available space.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
fig, ax = plt.subplots(figsize=(15, 8))
drought_types = count['Drought types'].unique()
palette = {drought_type: color
for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
year_data = count[count['Year'] == year]
if len(year_data) > 0:
# reindex is needed to make sure all drought_types are present
hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
sns.countplot(x='Year', order=all_years,
hue='Drought types', hue_order=hue_order,
data=year_data, palette=palette, ax=ax)
# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()

Pointplot and Scatterplot in one figure but X axis is shifting

Hi I'm trying to plot a pointplot and scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
plt.show()
When I plot, for some reason the points from the scatterplot are offsetting one ID spot right on the x-axis. When I plot the scatter or the point plot separately, they each are in the correct ID spot. Why would plotting them on the same plot cause the scatterplot to offset one right?
Edit: Tried to make the ID column categorical, but that didn't work either.
Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
ax.margins(x=0.02)
plt.tight_layout()
plt.show()

How to turn x-axis values into a legend for matplotlib bar graph

I am creating bar graphs for data that comes from series. However the names (x-axis values) are extremely long. If they are rotated 90 degrees it is impossible to read the entire name and get a good image of the graph. 45 degrees is not much better. I am looking for a way to label the x-axis by numbers 1-15 and then have a legend listing the names that correspond to each number.
This is the completed function I have so far, including creating the series from a larger dataframe
def graph_average_expressions(TAD_matches, CAGE):
"""graphs the top 15 expression levels of each lncRNA"""
for i, row in TAD_matches.iterrows():
mask = (
CAGE['short_description'].isin(row['peak_ID'])
)#finds expression level for peaks in an lncRNA
average = CAGE[mask].iloc[:,8:].mean(axis=0).astype('float32').sort_values().tail(n=15)
#made a new df of the top 15 highest expression levels for all averaged groups
#a group is peaks belong to the same lncRNA
cell_type = list(average.index)
expression = list(average.values)
average_df = pd.DataFrame(
list(zip(cell_type, expression)),
columns=['cell_type','expression']
)
colors = sns.color_palette(
'husl',
n_colors=len(cell_type)
)
p = sns.barplot(
x=average_df.index,
y='expression',
data=average_df,
palette=colors
)
cmap = dict(zip(average_df.cell_type, colors))
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
plt.legend(
handles=patches,
bbox_to_anchor=(1.04, 0.5),
loc='center left',
borderaxespad=0
)
plt.title('expression_levels_of_lncRNA_' + row['lncRNA_name'])
plt.xlabel('cell_type')
plt.ylabel('expression')
plt.show()
Here is an example of the data I am graphing
CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532 1.583428
Neutrophils_donor3.CNhs11905 1.832527
CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483 1.858384
CD14_monocytes_treated_with_Candida_donor1.CNhs13473 1.873013
CD14_Monocytes_donor2.CNhs11954 2.041607
CD14_monocytes_treated_with_Candida_donor2.CNhs13488 2.112112
CD14_Monocytes_donor3.CNhs11997 2.195365
CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469 2.974203
Eosinophils_donor3.CNhs12549 3.566822
CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470 3.685389
CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471 4.409062
CD14_monocytes_treated_with_Candida_donor3.CNhs13494 5.546789
CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492 5.673991
Neutrophils_donor1.CNhs10862 8.352045
Neutrophils_donor2.CNhs11959 11.595509
With the new code above this is the graph I get, but no legend or title.
A bit of a different route. Made a string mapping x values to the names and added it to the figure.
Made my own DataFrame for illustration.
from matplotlib import pyplot as plt
import pandas as pd
import string,random
df = pd.DataFrame({'name':[''.join(random.sample(string.ascii_letters,15))
for _ in range(10)],
'data':[random.randint(1,20) for _ in range(10)]})
Make the plot.
fig,ax = plt.subplots()
ax.bar(df.index,df.data)
Make the legend.
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
Add the legend as a Text artist and adjust the plot to accommodate it.
t = ax.text(.7,.2,x_legend,transform=ax.figure.transFigure)
fig.subplots_adjust(right=.65)
plt.show()
plt.close()
That can be made dynamic by getting and using the Text artist's size and the Figure's size.
# using imports and DataFrame from above
fig,ax = plt.subplots()
r = fig.canvas.get_renderer()
ax.bar(df.index,df.data)
x_legend = '\n'.join(f'{n} - {name}' for n,name in zip(df.index,df['name']))
t = ax.text(0,.1,x_legend,transform=ax.figure.transFigure)
# find the width of the Text and place it on the right side of the Figure
twidth = t.get_window_extent(renderer=r).width
*_,fwidth,fheight = fig.bbox.extents
tx,ty = t.get_position()
tx = .95 - (twidth/fwidth)
t.set_position((tx,ty))
# adjust the right edge of the plot/Axes
ax_right = tx - .05
fig.subplots_adjust(right=ax_right)
Setup the dataframe
verify the index of the dataframe to be plotted is reset, so it's integers beginning at 0, and use the index as the x-axis
plot the values on the y-axis
Option 1A: Seaborn hue
The easiest way is probably to use seaborn.barplot and use the hue parameter with the 'names'
Seaborn: Choosing color palettes
This plot is using husl
Additional options for the husl palette can be found at seaborn.husl_palette
The bars will not be centered for this option, because they are placed according to the number of hue levels, and there are 15 levels in this case.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# plt styling parameters
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, hue='names')
# place the legend to the right of the plot
plt.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 1B: Seaborn palette
Using the palette parameter instead of hue, places the bars directly over the ticks.
This option requires "manually" associating 'names' with the colors and creating the legend.
patches uses Patch to create each item in the legend. (e.g. the rectangle, associated with color and name).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Patch
# create a color palette the length of the dataframe
colors = sns.color_palette('husl', n_colors=len(df))
# plot
p = sns.barplot(x=df.index, y='values', data=df, palette=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Option 2: pandas.DataFrame.plot
This option also requires "manually" associating 'names' with the palette and creating the legend using Patch.
Choosing Colormaps in Matplotlib
This plot is using tab20c
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.patches import Patch
# plt styling parameters
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.rcParams["patch.force_edgecolor"] = True
# chose a color map with enough colors for the number of bars
colors = [plt.cm.tab20c(np.arange(len(df)))]
# plot the dataframe
df.plot.bar(color=colors)
# create color map with colors and df.names
cmap = dict(zip(df.names, colors[0]))
# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# add the legend
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
Reproducible DataFrame
data = {'names': ['CD14_monocytes_treated_with_Group_A_streptococci_donor2.CNhs13532', 'Neutrophils_donor3.CNhs11905', 'CD14_monocytes_treated_with_Trehalose_dimycolate_TDM_donor2.CNhs13483', 'CD14_monocytes_treated_with_Candida_donor1.CNhs13473', 'CD14_Monocytes_donor2.CNhs11954', 'CD14_monocytes_treated_with_Candida_donor2.CNhs13488', 'CD14_Monocytes_donor3.CNhs11997', 'CD14_monocytes_treated_with_Group_A_streptococci_donor1.CNhs13469', 'Eosinophils_donor3.CNhs12549', 'CD14_monocytes_treated_with_lipopolysaccharide_donor1.CNhs13470', 'CD14_monocytes_treated_with_Salmonella_donor1.CNhs13471', 'CD14_monocytes_treated_with_Candida_donor3.CNhs13494', 'CD14_monocytes_-_treated_with_Group_A_streptococci_donor3.CNhs13492', 'Neutrophils_donor1.CNhs10862', 'Neutrophils_donor2.CNhs11959'],
'values': [1.583428, 1.832527, 1.858384, 1.873013, 2.041607, 2.1121112, 2.195365, 2.974203, 3.566822, 3.685389, 4.409062, 5.546789, 5.673991, 8.352045, 11.595509]}
df = pd.DataFrame(data)

Seaborn (time series) boxplot using hue and different scale axes

I have a dataframe which has a number of values per date (datetime field). This values are classified in U (users) and S (session) by using a column Group. Seaborn is used to visualize two boxplots per date, where the hue is set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The S boxplot time series boxplot:
The combined boxplot time series using hue. Obviously it's not possible to see any information about the S group because of the scale of the Y axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
tmpU = groups.copy()
tmpU.loc[tmp['Group']!='U','n_data'] = np.nan
tmpS = grupos.copy()
tmpS.loc[tmp['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
plt.setp(ax.get_xticklabels(),rotation=30,horizontalalignment='right')
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
Which in this case turns out to be too dense to be published. Therefore I would adopt a visualization based in subplots, as Parfait susgested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.
Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
plt.title('N_data of {}'.format(g[0]))
plt.subplot(2, 1, i+1)
seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
plt.subplot(2, 1, i+1)
tmp = df.copy()
tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
So one option to do a grouped box plot with two separate axis is to use hue_order= ['value, np.nan] in your argument for sns.boxplot:
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax1.grid(alpha=0.5, which = 'major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax, ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14}, fontsize=12)
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
ax.tick_params(labelsize=14)

Categories