How to set xlim in seaborn barplot? - python

I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?

IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, it means even if we have data from day 41 onwards, seaborn is going to refer the starting category with x = 0, making for us difficult to tweak the lower limit of x-axis post function call.
The following code and corresponding plot clarifies what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although handling colors of the bars would be not straightforward compared to sns.barplot(..., hue=..., ...) :
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()

Related

Seaborn countplot with group order

I try to plot a count plot using seaborn and matplotlib. Given each year, I want to sort the count "drought types" within each year so that it looks better. Currently it is unsorted within each year and look very messy.
Thank you!
import seaborn as sns
import matplotlib.pyplot as plt
count=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')
plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()
The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, as it might cause very unexpected changes with small changes in the data. loc='best' will be changed to one of the 9 possible locations depending on the available space.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
fig, ax = plt.subplots(figsize=(15, 8))
drought_types = count['Drought types'].unique()
palette = {drought_type: color
for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
year_data = count[count['Year'] == year]
if len(year_data) > 0:
# reindex is needed to make sure all drought_types are present
hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
sns.countplot(x='Year', order=all_years,
hue='Drought types', hue_order=hue_order,
data=year_data, palette=palette, ax=ax)
# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()

Pointplot and Scatterplot in one figure but X axis is shifting

Hi I'm trying to plot a pointplot and scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
plt.show()
When I plot, for some reason the points from the scatterplot are offsetting one ID spot right on the x-axis. When I plot the scatter or the point plot separately, they each are in the correct ID spot. Why would plotting them on the same plot cause the scatterplot to offset one right?
Edit: Tried to make the ID column categorical, but that didn't work either.
Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
ax.margins(x=0.02)
plt.tight_layout()
plt.show()

Grouped boxplot with 2 y axes 2 variables per x tick

I am trying to make a boxplot of cost (in Rupees unit) and installed capacity (in Megawatt unit) with xaxis as share of renewables (in % unit).
That is each x tick is associated with two boxplots, one is the cost and one of the installed capacity. I have 3 xtick values (20%, 40%, 60%).
I tried this answer but I get error that is attached on the bottom.
I need two boxplots per xtick.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
plt.rcParams["font.family"] = "Times New Roman"
plt.style.use('seaborn-ticks')
plt.grid(color='w', linestyle='solid')
data1 = pd.read_csv('RES_cap.csv')
df=pd.DataFrame(data1, columns=['per','cap','cost'])
cost= df['cost']
cap=df['cap']
per_res=df['per']
fig, ax1 = plt.subplots()
xticklabels = 3
ax1.set_xlabel('Percentage of RES integration')
ax1.set_ylabel('Production Capacity (MW)')
res1 = ax1.boxplot(cost, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
plt.setp(res1[element])
for patch in res1['boxes']:
patch.set_facecolor('tab:blue')
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.set_ylabel('Costs', color='tab:orange')
res2 = ax2.boxplot(cap, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
plt.setp(res2[element], color='k')
for patch in res2['boxes']:
patch.set_facecolor('tab:orange')
ax1.set_xticklabels(['20%','40%','60%'])
fig.tight_layout()
plt.show()
sample data:
data attached
By testing your code and comparing it to the answer by Thomas Kühn in the linked question, I see several things that stand out:
the data you input for the x parameter has a 1-D shape instead of 2-D. You input one variable so you get one box instead of the three you actually want;
the positions argument has not been defined, which causes the boxes of both boxplots to overlap;
in the first for loop over res1, the color argument in plt.setp is missing;
you have set x tick labels without first setting the x ticks (as cautioned here) which causes an error message.
I offer the following solution which is based more on this answer by ImportanceOfBeingErnest. It solves the issue of shaping the data correctly and it makes use of dictionaries to define many of the parameters that are shared by multiple objects in the plot. This makes it easier to adjust the format to your taste and also makes the code cleaner as it avoids the need for the for loops (over the boxplot elements and the res objects) and the repetition of arguments in functions that share the same parameters.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create a random dataset similar to the one in the image you shared
rng = np.random.default_rng(seed=123) # random number generator
data = dict(per = np.repeat([20, 40, 60], [60, 30, 10]),
cap = rng.choice([70, 90, 220, 240, 320, 330, 340, 360, 410], size=100),
cost = rng.integers(low=2050, high=2250, size=100))
df = pd.DataFrame(data)
# Pivot table according to the 'per' categories so that the cap and
# cost variables are grouped by them:
df_pivot = df.pivot(columns=['per'])
# Create a list of the cap and cost grouped variables to be plotted
# in each (twinned) boxplot: note that the NaN values must be removed
# for the plotting function to work.
cap = [df_pivot['cap'][var].dropna() for var in df_pivot['cap']]
cost = [df_pivot['cost'][var].dropna() for var in df_pivot['cost']]
# Create figure and dictionary containing boxplot parameters that are
# common to both boxplots (according to my style preferences):
# note that I define the whis parameter so that values below the 5th
# percentile and above the 95th percentile are shown as outliers
nb_groups = df['per'].nunique()
fig, ax1 = plt.subplots(figsize=(9,6))
box_param = dict(whis=(5, 95), widths=0.2, patch_artist=True,
flierprops=dict(marker='.', markeredgecolor='black',
fillstyle=None), medianprops=dict(color='black'))
# Create boxplots for 'cap' variable: note the double asterisk used
# to unpack the dictionary of boxplot parameters
space = 0.15
ax1.boxplot(cap, positions=np.arange(nb_groups)-space,
boxprops=dict(facecolor='tab:blue'), **box_param)
# Create boxplots for 'cost' variable on twin Axes
ax2 = ax1.twinx()
ax2.boxplot(cost, positions=np.arange(nb_groups)+space,
boxprops=dict(facecolor='tab:orange'), **box_param)
# Format x ticks
labelsize = 12
ax1.set_xticks(np.arange(nb_groups))
ax1.set_xticklabels([f'{label}%' for label in df['per'].unique()])
ax1.tick_params(axis='x', labelsize=labelsize)
# Format y ticks
yticks_fmt = dict(axis='y', labelsize=labelsize)
ax1.tick_params(colors='tab:blue', **yticks_fmt)
ax2.tick_params(colors='tab:orange', **yticks_fmt)
# Format axes labels
label_fmt = dict(size=12, labelpad=15)
ax1.set_xlabel('Percentage of RES integration', **label_fmt)
ax1.set_ylabel('Production Capacity (MW)', color='tab:blue', **label_fmt)
ax2.set_ylabel('Costs (Rupees)', color='tab:orange', **label_fmt)
plt.show()
Matplotlib documentation: boxplot demo, boxplot function parameters, marker symbols for fliers, label text formatting parameters
Considering that it is quite an effort to set this up, if I were to do this for myself, I would go for side-by-side subplots instead of creating twinned Axes. This can be done quite easily in seaborn using the catplot function which takes care of a lot of the formatting automatically. Seeing as there are only three categories per variable, it is relatively easy to compare the boxplots side-by-side using a different color for each percentage category, as illustrated with this example based on the same data:
import seaborn as sns # v 0.11.0
# Convert dataframe to long format with 'per' set aside as a grouping variable
df_melt = df.melt(id_vars='per')
# Create side-by-side boxplots of each variable: note that the boxes
# are colored by default
g = sns.catplot(kind='box', data=df_melt, x='per', y='value', col='variable',
height=4, palette='Blues', sharey=False, saturation=1,
width=0.3, fliersize=2, linewidth=1, whis=(5, 95))
g.fig.subplots_adjust(wspace=0.4)
g.set_titles(col_template='{col_name}', size=12, pad=20)
# Format Axes labels
label_fmt = dict(size=10, labelpad=10)
for ax in g.axes.flatten():
ax.set_xlabel('Percentage of RES integration', **label_fmt)
g.axes.flatten()[0].set_ylabel('Production Capacity (MW)', **label_fmt)
g.axes.flatten()[1].set_ylabel('Costs (Rupees)', **label_fmt)
plt.show()

Python. Use two y axis for line and bar plots on Seaborn Facetgrid

Updated question and code!
Probably, the tips dataset is not the best example to use, however my issue is reproduced in it, i.e. we see that both point and bar plots share the same Y
I need to combine line and bar plots on one chart. To do this I used seaborn and the following code:
tips = sns.load_dataset('tips')
g = sns.FacetGrid(tips, hue='sex', col='sex', size=4, aspect=2.1, sharey=False, sharex=False)
g = g.map(sns.pointplot, 'day', 'tip', ci=0)
g = g.map(sns.barplot, 'day', 'total_bill', ci=0)
g.set_xticklabels(rotation=45, fontsize=9)
g.set_xticklabels(rotation=45, fontsize=9)
plt.show()
Here is the result:
Everything is okay except the fact that one Y axis is used for both bars and lines on each facetgrid object. I am new to seaborn and currently cannot find a solution. Tried to add "sharey=False" to this line of code
> `g.map(sns.pointplot, 'date', 'worthusdcount')`
however it didn't help.
Any solutions on how to add second Y axis would be appreciated
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
data = kwargs.pop('data')
dual_axis = kwargs.pop('dual_axis')
alpha = kwargs.pop('alpha', 0.2)
kwargs.pop('color')
ax = plt.gca()
if dual_axis:
ax2 = ax.twinx()
ax2.set_ylabel('Second Axis!')
ax.plot(data['x'],data['y1'], **kwargs, color='red',alpha=alpha)
if dual_axis:
ax2.bar(df['x'],df['y2'], **kwargs, color='blue',alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', size=6)
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.

Seaborn (time series) boxplot using hue and different scale axes

I have a dataframe which has a number of values per date (datetime field). This values are classified in U (users) and S (session) by using a column Group. Seaborn is used to visualize two boxplots per date, where the hue is set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The S boxplot time series boxplot:
The combined boxplot time series using hue. Obviously it's not possible to see any information about the S group because of the scale of the Y axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
tmpU = groups.copy()
tmpU.loc[tmp['Group']!='U','n_data'] = np.nan
tmpS = grupos.copy()
tmpS.loc[tmp['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
plt.setp(ax.get_xticklabels(),rotation=30,horizontalalignment='right')
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
Which in this case turns out to be too dense to be published. Therefore I would adopt a visualization based in subplots, as Parfait susgested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.
Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
plt.title('N_data of {}'.format(g[0]))
plt.subplot(2, 1, i+1)
seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
plt.subplot(2, 1, i+1)
tmp = df.copy()
tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
So one option to do a grouped box plot with two separate axis is to use hue_order= ['value, np.nan] in your argument for sns.boxplot:
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax1.grid(alpha=0.5, which = 'major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax, ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14}, fontsize=12)
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
ax.tick_params(labelsize=14)

Categories