I am trying to make a boxplot of cost (in rupees) and installed capacity (in megawatts), with the share of renewables (in %) on the x-axis.
That is, each x tick is associated with two boxplots: one for the cost and one for the installed capacity. I have 3 x tick values (20%, 40%, 60%).
I tried this answer, but I get the error attached at the bottom.
I need two boxplots per x tick.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
plt.rcParams["font.family"] = "Times New Roman"
plt.style.use('seaborn-ticks')
plt.grid(color='w', linestyle='solid')
data1 = pd.read_csv('RES_cap.csv')
df=pd.DataFrame(data1, columns=['per','cap','cost'])
cost= df['cost']
cap=df['cap']
per_res=df['per']
fig, ax1 = plt.subplots()
xticklabels = 3
ax1.set_xlabel('Percentage of RES integration')
ax1.set_ylabel('Production Capacity (MW)')
res1 = ax1.boxplot(cost, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
    plt.setp(res1[element])
for patch in res1['boxes']:
    patch.set_facecolor('tab:blue')
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.set_ylabel('Costs', color='tab:orange')
res2 = ax2.boxplot(cap, widths=0.4,patch_artist=True)
for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
    plt.setp(res2[element], color='k')
for patch in res2['boxes']:
    patch.set_facecolor('tab:orange')
ax1.set_xticklabels(['20%','40%','60%'])
fig.tight_layout()
plt.show()
Sample data: (attached as an image)
By testing your code and comparing it to the answer by Thomas Kühn in the linked question, I see several things that stand out:
the data you pass as the x parameter is 1-D instead of 2-D: you pass a single variable, so you get one box instead of the three you actually want;
the positions argument has not been defined, which causes the boxes of both boxplots to overlap;
in the first for loop over res1, the color argument in plt.setp is missing;
you set the x tick labels without first setting the x ticks (as cautioned here), which causes an error message.
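To illustrate the positions point: if the groups are centred on the integer x positions 0, 1, 2, each variable's boxes can be shifted by a constant offset so that the pairs sit side by side instead of overlapping, and the ticks then go on the integer centres. A minimal pure-Python sketch; the helper name `grouped_positions` is hypothetical and the even spacing is an assumption:

```python
def grouped_positions(n_groups, n_vars, spacing):
    """Return one list of x positions per variable (hypothetical helper).

    Group i is centred on x = i; the n_vars boxes of that group are
    spread symmetrically around the centre, `spacing` apart.
    """
    # Offset of the first variable, chosen so the offsets are centred on 0
    first = -spacing * (n_vars - 1) / 2
    return [
        [group + first + k * spacing for group in range(n_groups)]
        for k in range(n_vars)
    ]


# Two variables (cap and cost) at three RES shares: offsets of -0.15 and +0.15
cap_pos, cost_pos = grouped_positions(n_groups=3, n_vars=2, spacing=0.3)
print(cap_pos)   # approximately [-0.15, 0.85, 1.85]
print(cost_pos)  # approximately [0.15, 1.15, 2.15]
```

The two lists would be passed as `positions=` to the two `boxplot` calls, with `ax.set_xticks(range(3))` called before `set_xticklabels`.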
I offer the following solution which is based more on this answer by ImportanceOfBeingErnest. It solves the issue of shaping the data correctly and it makes use of dictionaries to define many of the parameters that are shared by multiple objects in the plot. This makes it easier to adjust the format to your taste and also makes the code cleaner as it avoids the need for the for loops (over the boxplot elements and the res objects) and the repetition of arguments in functions that share the same parameters.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create a random dataset similar to the one in the image you shared
rng = np.random.default_rng(seed=123) # random number generator
data = dict(per = np.repeat([20, 40, 60], [60, 30, 10]),
cap = rng.choice([70, 90, 220, 240, 320, 330, 340, 360, 410], size=100),
cost = rng.integers(low=2050, high=2250, size=100))
df = pd.DataFrame(data)
# Pivot table according to the 'per' categories so that the cap and
# cost variables are grouped by them:
df_pivot = df.pivot(columns=['per'])
# Create a list of the cap and cost grouped variables to be plotted
# in each (twinned) boxplot: note that the NaN values must be removed
# for the plotting function to work.
cap = [df_pivot['cap'][var].dropna() for var in df_pivot['cap']]
cost = [df_pivot['cost'][var].dropna() for var in df_pivot['cost']]
# Create figure and dictionary containing boxplot parameters that are
# common to both boxplots (according to my style preferences):
# note that I define the whis parameter so that values below the 5th
# percentile and above the 95th percentile are shown as outliers
nb_groups = df['per'].nunique()
fig, ax1 = plt.subplots(figsize=(9,6))
box_param = dict(whis=(5, 95), widths=0.2, patch_artist=True,
flierprops=dict(marker='.', markeredgecolor='black',
fillstyle=None), medianprops=dict(color='black'))
# Create boxplots for 'cap' variable: note the double asterisk used
# to unpack the dictionary of boxplot parameters
space = 0.15
ax1.boxplot(cap, positions=np.arange(nb_groups)-space,
boxprops=dict(facecolor='tab:blue'), **box_param)
# Create boxplots for 'cost' variable on twin Axes
ax2 = ax1.twinx()
ax2.boxplot(cost, positions=np.arange(nb_groups)+space,
boxprops=dict(facecolor='tab:orange'), **box_param)
# Format x ticks
labelsize = 12
ax1.set_xticks(np.arange(nb_groups))
ax1.set_xticklabels([f'{label}%' for label in df['per'].unique()])
ax1.tick_params(axis='x', labelsize=labelsize)
# Format y ticks
yticks_fmt = dict(axis='y', labelsize=labelsize)
ax1.tick_params(colors='tab:blue', **yticks_fmt)
ax2.tick_params(colors='tab:orange', **yticks_fmt)
# Format axes labels
label_fmt = dict(size=12, labelpad=15)
ax1.set_xlabel('Percentage of RES integration', **label_fmt)
ax1.set_ylabel('Production Capacity (MW)', color='tab:blue', **label_fmt)
ax2.set_ylabel('Costs (Rupees)', color='tab:orange', **label_fmt)
plt.show()
Matplotlib documentation: boxplot demo, boxplot function parameters, marker symbols for fliers, label text formatting parameters
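The pivot step pads each column with NaN (every row belongs to only one 'per' category), which is why dropna() is needed afterwards. An alternative that never creates the NaN values is to collect each category's values with groupby. A minimal sketch with a small made-up DataFrame in the same layout as RES_cap.csv (the numbers are illustrative only):

```python
import pandas as pd

# Small made-up stand-in for the RES_cap.csv data
df = pd.DataFrame({
    "per": [20, 20, 20, 40, 40, 60],
    "cap": [70, 90, 220, 240, 320, 330],
    "cost": [2050, 2100, 2150, 2200, 2210, 2240],
})

# groupby keeps each 'per' category's rows together, so no NaN padding is
# created and nothing has to be dropped before plotting.
groups = df.groupby("per")                        # keys are sorted by default
labels = [int(key) for key, _ in groups]          # [20, 40, 60]
cap = [g["cap"].to_numpy() for _, g in groups]    # one array per category
cost = [g["cost"].to_numpy() for _, g in groups]

print(labels)                 # [20, 40, 60]
print([len(c) for c in cap])  # [3, 2, 1]
```

The tick labels can then be built from `labels`, e.g. `[f"{p}%" for p in labels]`.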
Considering that it is quite an effort to set this up, if I were to do this for myself, I would go for side-by-side subplots instead of creating twinned Axes. This can be done quite easily in seaborn using the catplot function which takes care of a lot of the formatting automatically. Seeing as there are only three categories per variable, it is relatively easy to compare the boxplots side-by-side using a different color for each percentage category, as illustrated with this example based on the same data:
import seaborn as sns # v 0.11.0
# Convert dataframe to long format with 'per' set aside as a grouping variable
df_melt = df.melt(id_vars='per')
# Create side-by-side boxplots of each variable: note that the boxes
# are colored by default
g = sns.catplot(kind='box', data=df_melt, x='per', y='value', col='variable',
height=4, palette='Blues', sharey=False, saturation=1,
width=0.3, fliersize=2, linewidth=1, whis=(5, 95))
g.fig.subplots_adjust(wspace=0.4)
g.set_titles(col_template='{col_name}', size=12, pad=20)
# Format Axes labels
label_fmt = dict(size=10, labelpad=10)
for ax in g.axes.flatten():
    ax.set_xlabel('Percentage of RES integration', **label_fmt)
g.axes.flatten()[0].set_ylabel('Production Capacity (MW)', **label_fmt)
g.axes.flatten()[1].set_ylabel('Costs (Rupees)', **label_fmt)
plt.show()
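To see what the melt step hands to catplot, here is a tiny made-up frame in the same layout: each former column becomes a block of rows, with its name in 'variable' and its numbers in 'value', which is what lets col='variable' produce one panel per quantity:

```python
import pandas as pd

df = pd.DataFrame({"per": [20, 40], "cap": [70, 240], "cost": [2050, 2200]})

# 'per' stays as an identifier column; 'cap' and 'cost' are stacked
# one after the other into the 'variable' / 'value' pair of columns.
df_melt = df.melt(id_vars="per")
# rows: (20, cap, 70), (40, cap, 240), (20, cost, 2050), (40, cost, 2200)
print(df_melt)
```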
I want to make a violin plot of binned data, but at the same time be able to plot a model prediction and visualize how well the model describes the main part of each individual data distribution. My problem, I guess, is that after the violin plot the x-axis does not behave like a regular numeric axis, but more like an axis of string values that just happen to be numbers. Maybe that is not a good description, but in the example I would like to have a "normal" plot of a function, e.g. f(x) = 2*x**2, with the violins in the background at x=1, x=5.2, x=18.3 and x=27.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(10)
collectn_1 = np.random.normal(1, 2, 200)
collectn_2 = np.random.normal(802, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
ys = [collectn_1, collectn_2, collectn_3, collectn_4]
xs = [1, 5.2, 18.3, 27]
sns.violinplot(x=xs, y=ys)
xx = np.arange(0, 30, 10)
plt.plot(xx, 2*xx**2)
plt.show()
For some reason this code plots only bars, not violins; this happens only in this example, not in my original code. In my real code I want different "half-violins" on the two sides, so I use sns.violinplot(x="..", y="..", hue="..", data=.., split=True).
I think that would be hard to do with seaborn because it does not provide an easy way to manipulate the artists that it creates, particularly if there are other things plotted on the same Axes. Matplotlib's violinplot allows setting the position of the violins, but does not provide an option for plotting only half violins. Therefore, I would suggest using statsmodels.graphics.boxplots.violinplot, which does both.
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np
import seaborn as sns
from statsmodels.graphics.boxplots import violinplot

df = sns.load_dataset('tips')
x_col = 'day'
y_col = 'total_bill'
hue_col = 'smoker'
xs = [1, 5.2, 18.3, 27]
xx = np.arange(0, 30, 1)
yy = 0.1*xx**2
cs = ['C0','C1']
fig, ax = plt.subplots()
ax.plot(xx,yy)
# One pass per hue level: the first level is drawn as left halves, the second as right halves
for (_,gr0),side,c in zip(df.groupby(hue_col),['left','right'],cs):
    data = [gr1 for (_,gr1) in gr0.groupby(x_col)[y_col]]
    violinplot(ax=ax, data=data, positions=xs, side=side, show_boxplot=False, plot_opts=dict(violin_fc=c))
# violinplot above messes up which ticks are shown, the line below restores a sensible tick locator
ax.xaxis.set_major_locator(matplotlib.ticker.MaxNLocator())
plt.show()
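For full violins, plain Matplotlib is enough, because `Axes.violinplot` takes numeric `positions` and leaves the x-axis numeric. The sketch below reuses the question's synthetic data. The half-violin part is a swapped-in technique (clamping each violin's outline at its centre after drawing), not something the answer above uses, and it relies on editing the artists' paths, which is an implementation detail rather than a documented option. `widths=2` is an assumption chosen for a 0 to 30 axis:

```python
import matplotlib.pyplot as plt
import numpy as np

# Same synthetic samples as in the question
np.random.seed(10)
ys = [np.random.normal(1, 2, 200), np.random.normal(802, 30, 200),
      np.random.normal(90, 20, 200), np.random.normal(70, 25, 200)]
xs = [1, 5.2, 18.3, 27]

fig, ax = plt.subplots()
# Each violin is centred on a numeric x position, so the axis stays numeric
parts = ax.violinplot(ys, positions=xs, widths=2, showextrema=False)

# Optional: keep only the left half of each violin by clamping the x
# coordinates of its outline to the violin's centre position
for body, x in zip(parts["bodies"], xs):
    verts = body.get_paths()[0].vertices
    verts[:, 0] = np.clip(verts[:, 0], -np.inf, x)

# The model curve shares the same numeric x-axis
xx = np.linspace(0, 30, 100)
ax.plot(xx, 2 * xx**2, color="k")
plt.show()
```

Clamping with an upper bound keeps the left halves; swapping the bounds (`np.clip(..., x, np.inf)`) would keep the right halves instead, e.g. for a second hue level.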
The x- and y-axis variables in the stem plots are columns of the DataFrames.
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3, 1,figsize=(8,20))
fig.suptitle('Magnitude-Time Plot : R1')
fig.subplots_adjust(hspace=0.5)
# Defining custom 'xlim' and 'ylim' values.
xlim = (1990, 2005)
ylim = (0, 9)
# Setting the values for all axes.
plt.setp(ax, xlim=xlim, ylim=ylim)
# plot with no marker
#subplot(311) : Catalog8 in R1
markerline, stemlines, baseline = ax[0].stem(cata8q1.YearDeci,cata8q1.Magnitude, markerfmt=' ',use_line_collection=True,linefmt='b')
plt.setp(stemlines,linewidth=0.5)
ax[0].set_title('Catalog OLD')
#subplot(312) : catalog 9 _Uniq in R1
markerline, stemlines, baseline = ax[1].stem(cata9uniq1.YearDeci,cata9uniq1.Magnitude, markerfmt=' ',use_line_collection=True,linefmt='r')
plt.setp(stemlines,linewidth=0.5)
ax[1].set_title('Unique events in NEW CATALOG')
#subplot(313) : Catalog NEW in R1
markerline, stemlines, baseline = ax[2].stem(catanewq1.YearDeci,catanewq1.Magnitude, markerfmt=' ',use_line_collection=True)
plt.setp(stemlines,linewidth=0.5)
ax[2].set_title('OLD + unique events in NEW Catalog')
plt.show()
I am trying to plot a 3×1 grid of stem subplots, and I want a handier way to control the axes and figure properties with fewer lines of code.
You can do something like this:
catalogs = [(cata8q1,"Catalog Old", "b"), ...]
for i, (catalog, title, linefmt) in enumerate(catalogs):
    # ax[i] (not ax[0]) so that each catalog goes to its own subplot
    markerline, stemlines, baseline = ax[i].stem(catalog.YearDeci, catalog.Magnitude, markerfmt=' ', use_line_collection=True, linefmt=linefmt)
    plt.setp(stemlines, linewidth=0.5)
    ax[i].set_title(title)
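A fuller sketch of the same idea, with hypothetical random DataFrames standing in for cata8q1, cata9uniq1 and catanewq1. It zips the Axes with the catalogs instead of indexing, and hides the marker line with `set_visible(False)` rather than `markerfmt=' '`, assuming a recent Matplotlib where a marker format without a marker falls back to circles; `use_line_collection` is left out because it is deprecated in recent releases:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def fake_catalog(n):
    """Hypothetical stand-in for a real catalog DataFrame."""
    return pd.DataFrame({"YearDeci": np.sort(rng.uniform(1990, 2005, n)),
                         "Magnitude": rng.uniform(3, 8, n)})


catalogs = [
    (fake_catalog(50), "Catalog OLD", "b"),
    (fake_catalog(30), "Unique events in NEW CATALOG", "r"),
    (fake_catalog(80), "OLD + unique events in NEW Catalog", "C0"),
]

fig, axs = plt.subplots(len(catalogs), 1, figsize=(8, 20))
fig.suptitle("Magnitude-Time Plot : R1")
fig.subplots_adjust(hspace=0.5)
# plt.setp applies the same limits to every Axes in the array at once
plt.setp(axs, xlim=(1990, 2005), ylim=(0, 9))

# zip pairs each Axes with its catalog, so no index bookkeeping is needed
for ax, (catalog, title, linefmt) in zip(axs, catalogs):
    markerline, stemlines, baseline = ax.stem(
        catalog.YearDeci, catalog.Magnitude, linefmt=linefmt)
    markerline.set_visible(False)   # stems only, no marker heads
    stemlines.set_linewidth(0.5)
    ax.set_title(title)

plt.show()
```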
I have a dataframe which has a number of values per date (datetime field). These values are classified as U (users) and S (session) by a column Group. Seaborn is used to visualize two boxplots per date, with the hue set to Group.
The problem is that the values corresponding to U (users) are much bigger than those corresponding to S (session), which makes the S data illegible. Thus, I need a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The boxplot time series for S:
The combined boxplot time series using hue. Obviously, no information about the S group is visible because of the scale of the y-axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
# Keep only the U values; other rows become NaN, so the hue slots are preserved
tmpU = df.copy()
tmpU.loc[tmpU['Group']!='U','n_data'] = np.nan
# Keep only the S values
tmpS = df.copy()
tmpS.loc[tmpS['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
plt.setp(ax.get_xticklabels(),rotation=30,horizontalalignment='right')
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
In this case, however, the result turns out to be too dense to publish, so I would adopt a visualization based on subplots, as Parfait suggested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.
Consider building separate plots on the same figure, with y-axis ranges tailored to the subsetted data. The example below uses random data, seeded for reproducibility.
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
    # Create the subplot first, so that the title is attached to it
    plt.subplot(2, 1, i+1)
    plt.title('N_data of {}'.format(g[0]))
    seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain the original hue and grouping, set all n_data values that do not belong to the current group to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
    plt.subplot(2, 1, i+1)
    tmp = df.copy()
    tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
    seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
                    palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
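The masking step used in both the twinx solution and the loop above can be factored into a small helper. Masking instead of filtering keeps every row, so each Day and hue level still gets its slot in the plot, and only the other group's values are blanked. A minimal sketch; `mask_other_groups` is a hypothetical name and the tiny DataFrame is made-up data in the question's column layout:

```python
import numpy as np
import pandas as pd


def mask_other_groups(df, group, col="n_data", by="Group"):
    """Return a copy of df where `col` is NaN on every row not in `group`.

    All rows (and therefore all hue levels and all Days) are kept, so a
    hue-based boxplot still reserves a slot for each group.
    """
    out = df.copy()                         # never modify the caller's frame
    out.loc[out[by] != group, col] = np.nan
    return out


df = pd.DataFrame({
    "Day": ["2016-10-01", "2016-10-01", "2016-10-02", "2016-10-02"],
    "n_data": [700.0, 150.0, 650.0, 120.0],
    "Group": ["U", "S", "U", "S"],
})
tmp_u = mask_other_groups(df, "U")
print(tmp_u["n_data"].tolist())  # [700.0, nan, 650.0, nan]
```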
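So one option for a grouped box plot with two separate y-axes is to pass hue_order=[level, np.nan] to sns.boxplot on one Axes and hue_order=[np.nan, other_level] on the twin Axes, so that each Axes draws only one hue level and leaves the other slot empty: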
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# m (the data) and customPalette are assumed to be defined already
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
    width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
    width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax.grid(alpha=0.5, which = 'major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14})
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
ax.tick_params(labelsize=14)
plt.show()
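Because each Axes only collects its own artists, neither automatic legend covers both hue levels, which is why the code above builds proxy `Patch` handles. A minimal standalone sketch of that pattern, using the same colors and labels; attaching the legend to the twin Axes is an assumption meant to keep it above anything drawn on `ax2`. Note that `legend` ignores `fontsize` when `prop` is also given, so passing one of the two is enough:

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()

# Proxy artists: they are never drawn on the Axes, they only describe
# the legend entries for the two hue levels that live on different Axes.
handles = [
    mpatches.Patch(color="seagreen", label="$CO_2$"),
    mpatches.Patch(color="mediumaquamarine", label="$XCO_2$"),
]
legend = ax2.legend(handles=handles, loc="upper right", fontsize=12)
plt.show()
```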
Simply put, suppose I have two lists:
A -> Has the list of names ['A','B','C','D','E','F','G','H']
B -> Has the list of values [5,7,3,8,2,9,1,3]
A will be the names of the x-axis labels, and the corresponding values in B will be the heights of the bars (i.e. the y-axis).
%matplotlib inline
import pandas as pd
from matplotlib import rcParams
import matplotlib.pyplot as plt
from operator import itemgetter
rcParams.update({'figure.autolayout': True})
plt.figure(figsize=(14,9), dpi=600)
reso_names = ['A','B','C','D','E','F','G','H']
reso_values = [5,7,3,8,2,9,1,3]
plt.bar(range(len(reso_values)), reso_values, align='center')
plt.xticks(range(len(reso_names)), list(reso_names), rotation='vertical')
plt.margins(0.075)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.title('Graph', {'family' : 'Arial Black',
'weight' : 'bold',
'size' : 22})
plt.show()
This code gives the following output:
However I want it such that it makes subgraphs for every 2 values. In this case there should be 4 subgraphs:
1st Graph has 'A' and 'B'
2nd Graph has 'C' and 'D'
3rd Graph has 'E' and 'F'
4th Graph has 'G' and 'H'
This splitting should be done dynamically (not with 4 separate loops): it should break the graph into units of 2 depending on the size of the input, so if list A has 10 values it should give 5 subgraphs.
I figured out how to split the graph into two with half each but I need to achieve it using steps of N per graph (N in this example being 2).
The code I have for breaking the graph into 2 equal subgraphs is:
%matplotlib inline
import pandas as pd
from matplotlib import rcParams
import matplotlib.pyplot as plt
from operator import itemgetter
rcParams.update({'figure.autolayout': True})
plt.figure(figsize=(14,9), dpi=600)
reso_names = ['A','B','C','D','E','F','G','H']
reso_values = [5,7,3,8,2,9,1,3]
fig, axs = plt.subplots(nrows=2, sharey=True, figsize=(14,18), dpi=50)
size = int(len(reso_values))
half = int( size/2 )
fig.suptitle('Graph',
**{'family': 'Arial Black', 'size': 22, 'weight': 'bold'})
for ax, start, end in zip(axs, (0, half), (half, size)):
    names, values = list(reso_names[start:end]), reso_values[start:end]
    ax.bar(range(len(values)), values, align='center')
    ax.set_xlabel('X-Axis')
    ax.set_ylabel('Y-Axis')
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation='vertical')
    ax.set_xlim(0, len(names))
fig.subplots_adjust(bottom=0.05, top=0.95)
plt.show()
Which gives me:
I just want the program to dynamically split the graphs into subgraphs based on the splitting number N.
You can directly split your lists of values/names (each with size elements) into ceil(size/N) lists of at most N elements with this code:
N=3
sublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]
sublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]
Note that the last sublist will have fewer elements if N does not divide size.
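The slicing and the subplot count can be checked in isolation. The number of sublists is ceil(size/N); `size//N + 1` matches it only when N does not divide size. A minimal sketch; `chunk` and `n_chunks` are hypothetical helper names, and `-(-a // b)` is integer ceiling division:

```python
def chunk(seq, n):
    """Split seq into consecutive slices of at most n items."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]


def n_chunks(length, n):
    """Number of slices chunk() produces: ceil(length / n) with integers only."""
    return -(-length // n)


names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print(chunk(names, 3))                                # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H']]
print(n_chunks(len(names), 3), len(names) // 3 + 1)   # 3 3
print(n_chunks(len(names), 2), len(names) // 2 + 1)   # 4 5
```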
Then you just zip the sublists together and plot each one in a different graph:
from matplotlib import rcParams
import matplotlib.pyplot as plt
rcParams.update({'figure.autolayout': True})
reso_names = ['A','B','C','D','E','F','G','H']
reso_values = [5,7,3,8,2,9,1,3]
N=3
sublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]
sublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]
size = len(reso_values)
# One subplot per sublist: ceil(size / N) rows (size//N + 1 is one too many when N divides size)
nrows = -(-size // N)
# squeeze=False keeps axs 2-D, so the loop also works when there is a single subplot
fig, axs = plt.subplots(nrows=nrows, sharey=True, figsize=(14,18), dpi=50, squeeze=False)
fig.suptitle('Graph',
             **{'family': 'Arial Black', 'size': 22, 'weight': 'bold'})
for ax, names, values in zip(axs[:, 0], sublists_names, sublists_values):
    ax.bar(range(len(values)), values, align='center')
    ax.set_xlabel('X-Axis')
    ax.set_ylabel('Y-Axis')
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation='vertical')
    ax.set_xlim(0, len(names))
    #ax.set_xlim(0, N)
fig.subplots_adjust(bottom=0.05, top=0.95)
plt.show()
If the list length is not divisible by N, you can uncomment the commented-out line (ax.set_xlim(0, N)) so that the bars on the last subplot stay aligned with those on the other subplots.
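A variant of the full example, under the assumption that equal-width slots on every subplot are wanted: it sizes the grid with the sublist count, passes `squeeze=False` so that a single subplot still comes back as an array, and uses `set_xlim(-0.5, N - 0.5)`, because bars centred on 0..N-1 with the default width of 0.8 extend below 0 and the first bar would otherwise be half clipped:

```python
import matplotlib.pyplot as plt

reso_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
reso_values = [5, 7, 3, 8, 2, 9, 1, 3]
N = 3

sub_names = [reso_names[i:i + N] for i in range(0, len(reso_names), N)]
sub_values = [reso_values[i:i + N] for i in range(0, len(reso_values), N)]

# squeeze=False always returns a 2-D array of Axes, even for one subplot
fig, axs = plt.subplots(nrows=len(sub_names), ncols=1, sharey=True,
                        figsize=(6, 2.5 * len(sub_names)), squeeze=False)
fig.suptitle('Graph', fontsize=22, fontweight='bold')

for ax, names, values in zip(axs[:, 0], sub_names, sub_values):
    ax.bar(range(len(values)), values, align='center')
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation='vertical')
    # Every subplot gets N slots of width 1 centred on 0..N-1, so shorter
    # last groups stay aligned with the full ones and no bar is clipped
    ax.set_xlim(-0.5, N - 0.5)
    ax.set_ylabel('Y-Axis')
axs[-1, 0].set_xlabel('X-Axis')

fig.tight_layout()
plt.show()
```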