How to save shap partial dependence plot as pdf - python

How can I save the shap partial dependence plot as a PDF?
What I have tried:
shap.partial_dependence_plot(
"B", model.predict, X100, ice=False, show=False,
model_expected_value=True, feature_expected_value=True
)
plt.show()
plt.draw()
plt.savefig('K.pdf',bbox_inches='tight', dpi=600)
#fig = plt.figure()
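If the saved PDF comes out blank, one likely cause is the call order: plt.show() runs before plt.savefig(), and once the figure has been shown and closed, savefig usually writes an empty canvas. Below is a minimal sketch of the reordering, assuming that shap.partial_dependence_plot(..., show=False) leaves its plot as the current Matplotlib figure. A plain line plot stands in for the SHAP figure, and the temporary output path is made up for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for the figure that the SHAP call (with show=False) would leave open.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0.1, 0.4, 0.3])

# Hypothetical output location; any writable path ending in .pdf works.
out_path = os.path.join(tempfile.mkdtemp(), "K.pdf")

# Save the current figure first, and only afterwards show or close it.
plt.gcf().savefig(out_path, bbox_inches="tight")
plt.close(fig)

# Read back the size and the first bytes to confirm a PDF was written.
pdf_size = os.path.getsize(out_path)
with open(out_path, "rb") as fh:
    pdf_header = fh.read(4)
```

In an interactive session, plt.show() would go where plt.close(fig) is, that is, after the savefig call.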

Related

How to get the max x value of a seaborn distribution (and plot the corresponding line)?

I'm working on a project about my Spotify playlists, whose output is 6 distribution graphs with 3 lines each. Something like this:
[image]
For every curve, I would like to plot a vertical line reaching the top of the curve, and to put the x value on the x axis. Something like this:
[image]
I have found a solution here. But after trying a lot of workarounds, it seems that it doesn't work for me because I use subplots.
I have written a simpler, usable version of the code if you want to work with it.
from pylab import *
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
input1 = np.random.normal(loc=0.2, scale=1.0, size=25)
output1 = np.random.normal(loc=0.3, scale=1.0, size=25)
input2 = np.random.normal(loc=0.1, scale=1.0, size=25)
output2 = np.random.normal(loc=0.15, scale=1.0, size=25)
inputs = [input1 , input2]
outputs = [output1 , output2]
my_alpha = 0.03
my_linewidth = 0.7
for i in range(len(inputs)):
    fig = subplot(1,2,i+1)
    ax1 = sns.kdeplot(inputs[i], shade=True, alpha=my_alpha, linewidth=my_linewidth, label = 'Input Playlist', color = '#2cc758')
    ax2 = sns.kdeplot(outputs[i], shade=True, alpha=my_alpha, linewidth=my_linewidth, label = 'Recommandations Made', color = '#dbb818')
myLegend = plt.legend(loc='lower left', bbox_to_anchor=(-1.2,1.05), ncol=3) #legend location is set from the last plotted graph
myLegend.set_title("Tracks Set")
plt.setp(myLegend.get_title())
plt.show()
If you have any ideas, I'd be glad to hear them.
Thanks to the community.
For this to work with the latest seaborn versions, fill=False is needed to obtain a line. That line can then be used for fill_between and to extract the highest point. When there is more than one kdeplot on the same subplot, ax.lines[-1] gives the most recently added curve.
It helps to use for loops without a counter to make the code easier to maintain.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
input1 = np.random.normal(loc=0.2, scale=1.0, size=25)
output1 = np.random.normal(loc=0.3, scale=1.0, size=25)
input2 = np.random.normal(loc=0.1, scale=1.0, size=25)
output2 = np.random.normal(loc=0.15, scale=1.0, size=25)
inputs = [input1, input2]
outputs = [output1, output2]
fig, axs = plt.subplots(ncols=len(inputs), figsize=(15, 5))
for ax, input_i, output_i in zip(axs, inputs, outputs):
    for data, label, color in zip([input_i, output_i],
                                  ['Input Playlist', 'Recommandations Made'],
                                  ['#2cc758', '#dbb818']):
        sns.kdeplot(data, fill=False, linewidth=0.7, label=label, color=color, ax=ax)
        xs, ys = ax.lines[-1].get_data()
        ax.fill_between(xs, ys, color=color, alpha=0.05)
        mode_idx = np.argmax(ys)
        ax.vlines(xs[mode_idx], 0, ys[mode_idx], ls='--', color=color)
        ax.text(xs[mode_idx], -0.1, f'{xs[mode_idx]:.2f}', color=color, ha='center', transform=ax.get_xaxis_transform())
myLegend = axs[0].legend(loc='lower left', bbox_to_anchor=(0, 1.02), ncol=3)
myLegend.set_title("Tracks Set")
plt.tight_layout()
plt.show()
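The peak-extraction step can also be tried in isolation. The sketch below uses plain Matplotlib and a made-up bell curve with a known peak at x = 0.5 in place of a seaborn KDE line, so the data and variable names are assumptions for illustration only. It reads the curve back from ax.lines[-1] and locates its maximum with np.argmax.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up curve standing in for a KDE line; its maximum is at x = 0.5.
xs = np.linspace(-3, 3, 601)
ys = np.exp(-0.5 * (xs - 0.5) ** 2)

fig, ax = plt.subplots()
ax.plot(xs, ys)

# ax.lines[-1] is the most recently added line on this axes.
line_x, line_y = ax.lines[-1].get_data()
mode_idx = np.argmax(line_y)
peak_x, peak_y = line_x[mode_idx], line_y[mode_idx]

# Dashed vertical line from the x axis up to the top of the curve.
ax.vlines(peak_x, 0, peak_y, ls="--")
plt.close(fig)
```

The same get_data() and argmax calls apply unchanged to the line that sns.kdeplot adds, as long as fill=False is used so that a line exists.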

Legends not appearing in 3D plot

I'm using the following dummy code for generating a 3D plot.
import random
import pandas as pd
from matplotlib import pyplot as plt
random.seed(0)
D = [[random.random() for x in range(3)] for y in range(1000)]
df = pd.DataFrame(D,columns=['Feature_1','Feature_2','Feature_3'])
predictions = [random.randint(0,4) for x in range(1000)]
df['predictions'] = predictions
plt.rcParams["figure.figsize"]=(10,10)
plt.rcParams['legend.fontsize'] = 10
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(df['Feature_1'],df['Feature_2'],df['Feature_3'], c=df['predictions'], s =150,cmap='rainbow')
ax.legend(loc = 'upper left')
ax.set_xlabel('Feature_1',fontsize=20,labelpad=10)
ax.set_ylabel('Feature_2', fontsize=20, rotation=150,labelpad=10)
ax.set_zlabel('Feature_3', fontsize=20, rotation=60,labelpad=15)
plt.show()
I'm using the predictions column as the marker color, and I would like each value of that column to appear in the legend, but it does not.
Here's a screenshot of the resulting plot
You did not pass a label to the scatter call. If you replace your scatter call with the following line, a legend will show up:
ax.scatter(
    df['Feature_1'], df['Feature_2'], df['Feature_3'],
    c=df['predictions'], s=150, cmap='rainbow', label='Dummy data'
)
Or, to show the prediction classes as labels:
scatter = ax.scatter(df['Feature_1'], df['Feature_2'], df['Feature_3'],
                     c=df['predictions'], s=150, cmap='rainbow')
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="upper left", title="Classes")
ax.add_artist(legend1)
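For reference, here is a self-contained sketch of the second variant. It assumes a recent Matplotlib, where the 3D axes is created with fig.add_subplot(projection="3d") instead of Axes3D(fig). The random points and the class labels 0 to 4 are made up for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points = rng.random((100, 3))   # made-up 3D coordinates
classes = np.arange(100) % 5    # made-up class labels 0..4

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
scatter = ax.scatter(points[:, 0], points[:, 1], points[:, 2],
                     c=classes, s=50, cmap="rainbow")

# legend_elements() builds one handle and label per distinct value of c.
handles, labels = scatter.legend_elements()
legend = ax.legend(handles, labels, loc="upper left", title="Classes")
n_entries = len(legend.get_texts())
plt.close(fig)
```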

Seaborn jointplot color histogram

I'd like to color my histogram according to my palette. Here's the code I used to make this, and here's the error I received when I tried an answer I found on here.
g = sns.jointplot(data=emb_df, x='f0', y='y', kind="hist", hue='klabels', palette='tab10', marginal_kws={'hist_kws': {'palette': 'tab10'}})
plt.show()
UserWarning: The marginal plotting function has changed to `histplot`, which does not accept the following argument(s): hist_kws.
I have also tried this:
plt.setp(g.ax_marg_y.patches, color='grey')
But this does not color my histogram according to my 'klabels' parameter; it is just a flat grey.
The marginal plot is colored by default using the same palette with the corresponding hue, so you could just run it without marginal_kws=. The marginal_kws= go directly to histplot; instead of marginal_kws={'hist_kws': {'palette': 'tab10'}}, the correct use would be marginal_kws={'palette': 'tab10'}. If you would like stacked bars, you could try marginal_kws={'multiple': 'stack'}.
If you want the marginal plots to be larger, the ratio= parameter can be altered. The default is 5, meaning the central plot is 5 times as large as the marginal plots.
Here is an example:
from matplotlib import pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
g = sns.jointplot(data=iris, x='petal_length', y='sepal_length', kind="hist", hue='species', palette='tab10',
ratio=2, marginal_kws={'multiple': 'stack'})
sns.move_legend(g.ax_joint, loc='upper left') # optionally move the legend; seaborn >= 0.11.2 needed
plt.show()
To have these plots side-by-side as subplots, you can call the underlying sns.histplot either with both x= and y= filled in (2D histogram), only x= given (horizontal histogram) or only y= given (vertical histogram).
from matplotlib import pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 4))
sns.histplot(data=iris, x='petal_length', y='sepal_length', hue='species', palette='tab10', legend=False, ax=ax1)
sns.histplot(data=iris, x='petal_length', hue='species', palette='tab10', multiple='stack', legend=False, ax=ax2)
sns.histplot(data=iris, y='sepal_length', hue='species', palette='tab10', multiple='stack', ax=ax3)
sns.move_legend(ax3, bbox_to_anchor=[1.01, 1.01], loc='upper left')
plt.tight_layout()
plt.show()

scatterplot matrix with marginal probability distributions in seaborn

It is straightforward to make scatter plot matrices with seaborn's pairplot. jointplot also allows combining a scatter plot with marginal probability distributions for a single plot.
Although the option diag_kind='kde' lets you plot the probability distributions on the diagonal (useful when x_vars and y_vars are the same), I want to combine both to have marginal probability distributions in a matrix scatter plot. Something like this:
How do I get marginal probability distributions in a matrix scatter plot in seaborn as shown in my screenshot above?
Many thanks to mwaskom for the guidance.
As you suggested, I built my own matplotlib figure and plotted the seaborn plots there guided by this piece of documentation.
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# concat_convert is the DataFrame holding the columns listed in x_vars and y_vars
def basic_conf(f,a,xin,yin,x,y):
    ax = f.add_subplot(a)
    ax.tick_params(axis='both', which='major', labelsize=10)
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    if xin !=0:
        ax.set_yticklabels([])
        ax.set_ylabel(" ",fontsize=0).set_visible(False)
    ax.set_ylabel(y,fontsize=10)
    ax.set_xticklabels([])
    ax.set_xlabel(" ",fontsize=0).set_visible(False)
    return ax
def xhist_conf(f,a,x):
    ax = f.add_subplot(a)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.set_yticklabels([])
    ax.yaxis.set_ticks_position('none')
    ax.set_xlabel(x,fontsize=10)
    ax.set_ylabel(" ").set_visible(False)#,fontsize='xx-small'
    return ax
def yhist_conf(f,a,y):
    ax = f.add_subplot(a)
    ax.tick_params(axis='both', which='major', labelsize=10)
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.xaxis.set_ticks_position('none')
    ax.set_xlabel(" ",fontsize='xx-small').set_visible(False)
    ax.set_ylabel(" ",fontsize=0).set_visible(False)
    return ax
def includer(ax,x,y):
    r,_=stats.pearsonr(concat_convert[x],concat_convert[y])
    ax.text(0.1, 0.9, f'ρ = {r:.2f}', transform=ax.transAxes)#,fontsize='xx-small'
x_vars=["$P_{LA}$", "$R^{Ao}_P$", "$C^{Ao}_P$", "$R^{Ao}_S$", "$B_{VAD}$", "$A_{VAD}$", "HR", "EF"]
y_vars=["${Q}^{avg}_{M}$", "${Q}^{max}_{M}$","${Q}^{avg}_{Ao}$", "${Q}^{max}_{Ao}$", "${Q}^{avg}_{VAD}$", "${Q}^{max}_{VAD}$", "$Q_{RAT}$"]
sns.set(context="paper",font_scale=1.75,style="ticks")
f = plt.figure(figsize=(18, 16), dpi=600)
gs = f.add_gridspec(8, 9)
plt.rcParams['font.size'] = '10'
plt.rcParams['xtick.labelsize']='8'
with sns.axes_style("ticks"):
    # scatter/regression panels: one column per x variable, one row per y variable
    xin=0
    for x in x_vars:
        yin=0
        for y in y_vars:
            ax = basic_conf(f,gs[yin,xin],xin,yin,x,y)
            sns.regplot(ax=ax, data=concat_convert, x=x, y=y, scatter_kws={'s':4})
            includer(ax,x,y)
            yin=yin+1
        xin=xin+1
    # marginal histograms of the x variables in the bottom row
    xin=0
    for x in x_vars:
        ax = xhist_conf(f,gs[yin,xin],x)
        sns.histplot(ax=ax, data=concat_convert, x=x, kde=True)
        xin=xin+1
    # marginal histograms of the y variables in the rightmost column
    yin=0
    for y in y_vars:
        ax = yhist_conf(f,gs[yin,xin],y)
        sns.histplot(ax=ax, data=concat_convert, y=y, kde=True)
        yin=yin+1
    for i in range(len(y_vars)):
        ax = f.add_subplot(gs[i,2])
        ax.set_xlim((0.001,0.0014))
    ax = f.add_subplot(gs[len(y_vars),0])
    ax.ticklabel_format(style='sci',scilimits=(0,0), axis='x')
    ax = f.add_subplot(gs[len(y_vars),5])
    ax.ticklabel_format(style='sci',scilimits=(0,0), axis='x')
And it gets me exactly what I want:
Many thanks.
EDIT: Final code snippet and obtained plot.
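The layout idea behind the code above can be reduced to a single panel with its two marginals. The sketch below is a simplified illustration and not the code from this answer: it uses plain Matplotlib, made-up random data, and hypothetical axes names, with the x histogram in the bottom row and the y histogram in the right column of a grid spec.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=200)                        # made-up data
y = 0.5 * x + rng.normal(scale=0.5, size=200)

fig = plt.figure(figsize=(6, 6))
# 2 x 2 grid: large scatter panel, narrow strips for the marginal histograms.
gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(4, 1))

ax_main = fig.add_subplot(gs[0, 0])
ax_xhist = fig.add_subplot(gs[1, 0], sharex=ax_main)  # below the scatter
ax_yhist = fig.add_subplot(gs[0, 1], sharey=ax_main)  # right of the scatter

ax_main.scatter(x, y, s=4)
ax_xhist.hist(x, bins=20)
ax_yhist.hist(y, bins=20, orientation="horizontal")

n_axes = len(fig.axes)
plt.close(fig)
```

Scaling this up to a matrix means adding one grid column per x variable and one grid row per y variable, which is what the 8 by 9 grid spec in the answer does.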

Unable to generate legend using python / matplotlib for 4 lines all labelled

I want the labels for the Bollinger Bands (R) ('upper band', 'rolling mean', 'lower band') to show up in the legend. But the legend just applies the same label to every line: the pandas label of the first (and only) column, 'IBM'.
# Plot price values, rolling mean and Bollinger Bands (R)
ax = prices['IBM'].plot(title="Bollinger Bands")
rm_sym.plot(label='Rolling mean', ax=ax)
upper_band.plot(label='upper band', c='r', ax=ax)
lower_band.plot(label='lower band', c='r', ax=ax)
#
# Add axis labels and legend
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted Closing Price")
ax.legend(loc='upper left')
plt.show()
I know this code may represent a fundamental lack of understanding of how matplotlib works, so explanations are particularly welcome.
The problem is most probably that upper_band and lower_band, whatever they are, are not labeled.
One option is to label them by putting them as columns into a dataframe. This allows you to plot each dataframe column directly.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
y =np.random.rand(4)
yupper = y+0.2
ylower = y-0.2
df = pd.DataFrame({"price" : y, "upper": yupper, "lower": ylower})
fig, ax = plt.subplots()
df["price"].plot(label='Rolling mean', ax=ax)
df["upper"].plot(label='upper band', c='r', ax=ax)
df["lower"].plot(label='lower band', c='r', ax=ax)
ax.legend(loc='upper left')
plt.show()
Otherwise you can also plot the data directly.
import matplotlib.pyplot as plt
import numpy as np
y =np.random.rand(4)
yupper = y+0.2
ylower = y-0.2
fig, ax = plt.subplots()
ax.plot(y, label='Rolling mean')
ax.plot(yupper, label='upper band', c='r')
ax.plot(ylower, label='lower band', c='r')
ax.legend(loc='upper left')
plt.show()
In both cases, you'll get a legend with labels. If that isn't enough, I recommend reading the Matplotlib Legend Guide which also tells you how to manually add labels to legends.
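As a sketch of the manual route mentioned above, the handles and labels can be passed to ax.legend explicitly, which overrides whatever labels the plotted objects carry. The data and the variable names price_line, upper_line and lower_line are made up for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

y = np.array([1.0, 1.2, 0.9, 1.1])  # made-up values

fig, ax = plt.subplots()
# ax.plot returns a list of lines; unpack the single line from each call.
price_line, = ax.plot(y)
upper_line, = ax.plot(y + 0.2, c="r")
lower_line, = ax.plot(y - 0.2, c="r")

# Explicit handles and labels, independent of any labels set while plotting.
legend = ax.legend(
    [price_line, upper_line, lower_line],
    ["Rolling mean", "upper band", "lower band"],
    loc="upper left",
)
legend_texts = [t.get_text() for t in legend.get_texts()]
plt.close(fig)
```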
