PLS-DA Loading Plot in Python - python

How can I make a Loading plot with Matplotlib of a PLS-DA plot, like the loading plot like that of PCA?
This answer explains how it can be done with PCA:
Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)
However there are some significant differences between the two methods which makes the implementation different as well. (Some of the relevant differences are explained here https://learnche.org/pid/latent-variable-modelling/projection-to-latent-structures/interpreting-pls-scores-and-loadings )
To make the PLS-DA plot I use the following code:
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import PLSRegression
import numpy as np
import pandas as pd
targets = [0, 1]
x_vals = StandardScaler().fit_transform(df.values)
y = [g == targets[0] for g in sample_description]
y = np.array(y, dtype=int)
plsr = PLSRegression(n_components=2, scale=False)
plsr.fit(x_vals, y)
colormap = {
targets[0]: '#ff0000', # Red
targets[1]: '#0000ff', # Blue
}
colorlist = [colormap[c] for c in sample_description]
scores = pd.DataFrame(plsr.x_scores_)
scores.index = x.index
x_loadings = plsr.x_loadings_
y_loadings = plsr.y_loadings_
fig1, ax = get_default_fig_ax('Scores on LV 1', 'Scores on LV 2', title)
ax = scores.plot(x=0, y=1, kind='scatter', s=50, alpha=0.7,
c=colorlist, ax=ax)

I took your code and enhanced it. The biplot is obtained via simply overlaying the score and the loading plot.
Other, more rigerous plots could be made with truely shared axis according to https://blogs.sas.com/content/iml/2019/11/06/what-are-biplots.html#:~:text=A%20biplot%20is%20an%20overlay,them%20on%20a%20single%20plot.
The code below generates this image for a dataset with ~200 features (therefore there are ~200 red arrows shown):
from sklearn.cross_decomposition import PLSRegression
pls2 = PLSRegression(n_components=2)
pls2.fit(X_train, Y_train)
x_loadings = pls2.x_loadings_
y_loadings = pls2.y_loadings_
fig, ax = plt.subplots(constrained_layout=True)
scores = pd.DataFrame(pls2.x_scores_)
scores.plot(x=0, y=1, kind='scatter', s=50, alpha=0.7,
c=Y_train.values[:,0], ax = ax)
newax = fig.add_axes(ax.get_position(), frameon=False)
feature_n=x_loadings.shape[0]
print(x_loadings.shape)
for feature_i in range(feature_n):
comp_1_idx=0
comp_2_idx=1
newax.arrow(0, 0, x_loadings[feature_i,comp_1_idx], x_loadings[feature_i,comp_2_idx],color = 'r',alpha = 0.5)
newax.get_xaxis().set_visible(False)
newax.get_yaxis().set_visible(False)
plt.show()

Related

Save 3D Matplot as interactive HTML

I´m trying to save my 3D trisurface Plot as an interactive HTML figure, so it should be possible to zoom in/ out and change the viewpoint. In the IDE the plot already exists and works so far, but I
can`t save it in the HTML format because of the ValueError:
"The fig parameter must be a dict or Figure.
Received value of type <class 'matplotlib.figure.Figure'>: Figure(1600x900)".
I don´t understand why the "<class 'matplotlib.figure.Figure'>" is not a Figure?
This was my approach: https://plotly.com/python/interactive-html-export/
And I tried it with go.Figure() (Export rotable 3D plots from Python to HTML) already but it didn´t work with the trisurf.
Is there a way to keep my Plot settings (use trisurf as it is) and get the interactive figure in HTML?
Thanks a lot for any answer
#Import libraries
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator
import numpy as np
from mpl_toolkits.mplot3d import axes3d, Axes3D
import pandas as pd
import plotly.express as px
import io
import plotly.io as pio
%matplotlib notebook
E = np.arange(225)
D = np.arange(225)
A = np.arange(225)
E = [10000.0, 10000.0, ...]
D = [500.0, 1000.0, ...]
A = [1.9495, 1.9644, ...]
#Create figure
fig = plt.figure(figsize =(16, 9))
ax = plt.axes(projection ='3d')
# Creating color map
my_cmap = plt.get_cmap('hot')
# Data for three-dimensional scattered points
zdata = A
xdata = D
ydata = E
# Creating plot
trisurf = ax.plot_trisurf(xdata, ydata, zdata,
cmap = my_cmap,
linewidth = 0.2,
antialiased = True,
edgecolor = 'grey')
fig.colorbar(trisurf, ax = ax, shrink = 0.5, aspect = 10)
ax.set_title('AIE_SIM0.003__lowE_10000_upE_460000_stepE_30000_lowD_500.0_upD_8000.0_stepD_500.0')
ax.set_xlabel('Damping Ns/m')
ax.set_ylabel('Stifness N/m')
ax.set_zlabel('Amplification')
A2 = np.arange(225)
A2.fill(20.757)
# Creating color map
my_cmap2 = plt.get_cmap('gray')
# Data for three-dimensional scattered points
zdata2 = A2
xdata = D
ydata = E
# Creating plot
trisurf2 = ax.plot_trisurf(xdata, ydata, zdata2,
cmap = my_cmap2,
linewidth = 0.2,
antialiased = False,
edgecolor = 'none', alpha = 0.2)
fig.colorbar(trisurf2, ax = ax, shrink = 0.5, aspect = 10)
print(type(fig))
#fig.write_html("file.html")
plotly.io.to_html(fig=fig)
fig.savefig('3D_Plot_PNG_lowE_10000_upE_460000_stepE_30000_lowD_500.0_upD_8000.0_stepD_500.0.png')
fig.show()
------------------------------------------------------------------------------------------
Figure 1
printed: <class 'matplotlib.figure.Figure'>
ValueError:
The fig parameter must be a dict or Figure.
Received value of type <class 'matplotlib.figure.Figure'>: Figure(1600x900)
As far as I'm aware, Matplotlib is not able to generate 3D html plot.
Moreover, what you tried above is wrong. That error message is telling you that Plotly's to_html only works with Plotly's Figure. So mixing Plotly and Matplotlib is not going to work. You need to create a Plotly figure.
Also, I don't think that Plotly exposes something similar to Matplotlib's plot_trisurf. However, it exposes go.Mesh that allows us to achieve the same result.
The recipe:
Generate your numerical data.
Create a triangulation. We will use Matplotlib's Triangulation class for this part.
Create the Plotly figure and add the surface.
Export the figure to html.
Here I'm going to post an example to guide you:
import numpy as np
import matplotlib.tri as mtri
import plotly.graph_objects as go
### DATA GENERATION
# Make parameter spaces radii and angles.
n_angles = 36
n_radii = 8
min_radius = 0.25
radii = np.linspace(min_radius, 0.95, n_radii)
angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)
angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1)
angles[:, 1::2] += np.pi/n_angles
# Map radius, angle pairs to x, y, z points.
x = (radii*np.cos(angles)).flatten()
y = (radii*np.sin(angles)).flatten()
z = (np.cos(radii)*np.cos(3*angles)).flatten()
### TRIANGULATION
# Create the Triangulation; no triangles so Delaunay triangulation created.
triang = mtri.Triangulation(x, y)
# Mask off unwanted triangles.
xmid = x[triang.triangles].mean(axis=1)
ymid = y[triang.triangles].mean(axis=1)
mask = xmid**2 + ymid**2 < min_radius**2
triangles = triang.triangles[~mask]
### PLOT
fig = go.Figure(data=[
# go.Mesh allows to provide the triangulation
go.Mesh3d(
x=x, y=y, z=z,
colorbar_title='z',
colorscale="aggrnyl",
# Intensity of each vertex, which will be interpolated and color-coded
intensity =z,
# i, j and k give the vertices of triangles
i = triangles[:, 0],
j = triangles[:, 1],
k = triangles[:, 2],
showscale=True
)
])
fig.show()
### EXPORT TO HTML
# Please, execute `help(fig.write_html)` to learn about all the
# available keyword arguments to control the output
fig.write_html("test.html", include_plotlyjs=True, full_html=True)

Confidence Interval 3 dimensional plot

I have a 3-dimensional plot and I am able to plot it with the code written below.
Considering that my point distribution is represented by a 100x100 matrix, is it possible to plot a confidence interval on my data? In the code below, my data are called "result", while the upper bound and lower bound that I want to show are called "upper_bound" and "lower_bound".
For example, I am asking if exist something like this, but in 3 dimension (instead of 2 dimension like the picture below)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
interval = np.random.normal(0, 1, size=(100, 100))
x = np.arange(0.1,1.1,0.01)
y = np.linspace(-np.pi,np.pi,100)
X,Y = np.meshgrid(x,y)
result = []
for i,j in zip(X,Y):
result.append(np.log(i)+np.sin(j))
upper_bound = np.array(result)+interval
lower_bound = np.array(result)-interval
fig = plt.figure()
fig.set_figwidth(20)
fig.set_figheight(6)
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, np.array(result))
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
Check out this 3d surface plot using plotly graph objects:
import plotly.graph_objects as go
import numpy as np
x = np.arange(0.1,1.1,0.01)
y = np.linspace(-np.pi,np.pi,100)
X,Y = np.meshgrid(x,y)
result = []
for i,j in zip(X,Y):
result.append(np.log(i)+np.sin(j))
upper_bound = np.array(result)+1
lower_bound = np.array(result)-1
fig = go.Figure(data=[
go.Surface(z=result),
go.Surface(z=upper_bound, showscale=False, opacity=0.3,colorscale='purp'),
go.Surface(z=lower_bound, showscale=False, opacity=0.3,colorscale='purp'),
])
fig.show()
This plots 3 surfaces, the one for your results and the 2 bounds. However if you'd like something that looks more like a filled volume you'd have to add volume graphs with scaling opacity.

How to plot scatter plot using python?

I have used this code to create clusters and I want to plot the scatter plot of the clusters. The vectorAssembles_01 produces data with ID and features. Both should be used to plot the scatter plot.When I am running the code in google Collab I am getting an error message stating RecursionError: maximum recursion depth exceeded in comparison. please correct if I am wrong.
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
import numpy as np
import matplotlib.pyplot as plt
FEATURES_COL = ['Height(CM)', 'Weight(KG)',
'Crossing', 'Finishing', 'HeadingAccuracy',
'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
'FKAccuracy', 'LongPassing', 'BallControl',
'Acceleration', 'SprintSpeed', 'Agility',
'Reactions', 'Balance', 'ShotPower', 'Jumping',
'Stamina', 'Strength', 'LongShots', 'Aggression',
'Interceptions', 'Positioning', 'Vision', 'Penalties',
'Composure', 'Marking', 'StandingTackle', 'SlidingTackle']
vecAssembler_01 = VectorAssembler(inputCols=FEATURES_COL, outputCol="features")
df_kmeansn = vecAssembler_01.transform(df).select('ID','features')
df_kmeansn.show()
#df_kmeansn.plot("ID","fearures",kind="Scatter")
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
x = df_kmeansn.ID
y = df_kmeansn.features
ax.scatter(x, y, alpha=0.8, edgecolors='none')
The output of the df_kmeansn is as shown below.
I'm not sure you can just plot Spark Dataframe directly, perhaps you should call "to_pandas" first
# ...
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df_pandas = df_kmeansn.to_pandas()
x = df_pandas.ID
y = df_pandas.features
ax.scatter(x, y, alpha=0.8, edgecolors='none')

How can I do a histogram with 1D gaussian mixture with sklearn?

I would like to do an histogram with mixture 1D gaussian as the picture.
Thanks Meng for the picture.
My histogram is this:
I have a file with a lot of data (4,000,000 of numbers) in a column:
1.727182
1.645300
1.619943
1.709263
1.614427
1.522313
And I'm using the follow script with modifications than Meng and Justice Lord have done :
from matplotlib import rc
from sklearn import mixture
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import matplotlib.ticker as tkr
import scipy.stats as stats
x = open("prueba.dat").read().splitlines()
f = np.ravel(x).astype(np.float)
f=f.reshape(-1,1)
g = mixture.GaussianMixture(n_components=3,covariance_type='full')
g.fit(f)
weights = g.weights_
means = g.means_
covars = g.covariances_
plt.hist(f, bins=100, histtype='bar', density=True, ec='red', alpha=0.5)
plt.plot(f,weights[0]*stats.norm.pdf(f,means[0],np.sqrt(covars[0])), c='red')
plt.rcParams['agg.path.chunksize'] = 10000
plt.grid()
plt.show()
And when I run the script, I have the follow plot:
So, I don't have idea how put the start and end of all gaussians that must be there. I'm new in python and I'm confuse with the way to use the modules. Please, Can you help me and guide me how can I do this plot?
Thanks a lot
Although this is a reasonably old thread, I would like to provide my take on it. I believe my answer can be more comprehensible to some. Moreover, I include a test to check whether or not the desired number of components makes statistical sense via the BIC criterion.
# import libraries (some are for cosmetics)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter, AutoMinorLocator)
import astropy
from scipy.stats import norm
from sklearn.mixture import GaussianMixture as GMM
import matplotlib as mpl
mpl.rcParams['axes.linewidth'] = 1.5
mpl.rcParams.update({'font.size': 15, 'font.family': 'STIXGeneral', 'mathtext.fontset': 'stix'})
# create the data as in #Meng's answer
x = np.concatenate((np.random.normal(5, 5, 1000), np.random.normal(10, 2, 1000)))
x = x.reshape(-1, 1)
# first of all, let's confirm the optimal number of components
bics = []
min_bic = 0
counter=1
for i in range (10): # test the AIC/BIC metric between 1 and 10 components
gmm = GMM(n_components = counter, max_iter=1000, random_state=0, covariance_type = 'full')
labels = gmm.fit(x).predict(x)
bic = gmm.bic(x)
bics.append(bic)
if bic < min_bic or min_bic == 0:
min_bic = bic
opt_bic = counter
counter = counter + 1
# plot the evolution of BIC/AIC with the number of components
fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(1,2,1)
# Plot 1
plt.plot(np.arange(1,11), bics, 'o-', lw=3, c='black', label='BIC')
plt.legend(frameon=False, fontsize=15)
plt.xlabel('Number of components', fontsize=20)
plt.ylabel('Information criterion', fontsize=20)
plt.xticks(np.arange(0,11, 2))
plt.title('Opt. components = '+str(opt_bic), fontsize=20)
# Since the optimal value is n=2 according to both BIC and AIC, let's write down:
n_optimal = opt_bic
# create GMM model object
gmm = GMM(n_components = n_optimal, max_iter=1000, random_state=10, covariance_type = 'full')
# find useful parameters
mean = gmm.fit(x).means_
covs = gmm.fit(x).covariances_
weights = gmm.fit(x).weights_
# create necessary things to plot
x_axis = np.arange(-20, 30, 0.1)
y_axis0 = norm.pdf(x_axis, float(mean[0][0]), np.sqrt(float(covs[0][0][0])))*weights[0] # 1st gaussian
y_axis1 = norm.pdf(x_axis, float(mean[1][0]), np.sqrt(float(covs[1][0][0])))*weights[1] # 2nd gaussian
ax = fig.add_subplot(1,2,2)
# Plot 2
plt.hist(x, density=True, color='black', bins=np.arange(-100, 100, 1))
plt.plot(x_axis, y_axis0, lw=3, c='C0')
plt.plot(x_axis, y_axis1, lw=3, c='C1')
plt.plot(x_axis, y_axis0+y_axis1, lw=3, c='C2', ls='dashed')
plt.xlim(-10, 20)
#plt.ylim(0.0, 2.0)
plt.xlabel(r"X", fontsize=20)
plt.ylabel(r"Density", fontsize=20)
plt.subplots_adjust(wspace=0.3)
plt.show()
plt.close('all')
It's all about reshape.
First, you need to reshape f.
For pdf, reshape before using stats.norm.pdf. Similarly, sort and reshape before plotting.
from matplotlib import rc
from sklearn import mixture
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import matplotlib.ticker as tkr
import scipy.stats as stats
# x = open("prueba.dat").read().splitlines()
# create the data
x = np.concatenate((np.random.normal(5, 5, 1000),np.random.normal(10, 2, 1000)))
f = np.ravel(x).astype(np.float)
f=f.reshape(-1,1)
g = mixture.GaussianMixture(n_components=3,covariance_type='full')
g.fit(f)
weights = g.weights_
means = g.means_
covars = g.covariances_
plt.hist(f, bins=100, histtype='bar', density=True, ec='red', alpha=0.5)
f_axis = f.copy().ravel()
f_axis.sort()
plt.plot(f_axis,weights[0]*stats.norm.pdf(f_axis,means[0],np.sqrt(covars[0])).ravel(), c='red')
plt.rcParams['agg.path.chunksize'] = 10000
plt.grid()
plt.show()

how to generate a series of histograms on matplotlib?

I would like to generate a series of histogram shown below:
The above visualization was done in tensorflow but I'd like to reproduce the same visualization on matplotlib.
EDIT:
Using plt.fill_between suggested by #SpghttCd, I have the following code:
colors=cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
x = np.arange(100)
for i in range(10):
y = np.random.rand(100)
plt.fill_between(x, y + 10-i, 10-i,
facecolor=colors[i]
edgecolor='w')
plt.show()
This works great, but is it possible to use histogram instead of a continuous curve?
EDIT:
joypy based approach, like mentioned in the comment of october:
import pandas as pd
import joypy
import numpy as np
df = pd.DataFrame()
for i in range(0, 400, 20):
df[i] = np.random.normal(i/410*5, size=30)
joypy.joyplot(df, overlap=2, colormap=cm.OrRd_r, linecolor='w', linewidth=.5)
for finer control of colors, you can define a color gradient function which accepts a fractional index and start and stop color tuples:
def color_gradient(x=0.0, start=(0, 0, 0), stop=(1, 1, 1)):
r = np.interp(x, [0, 1], [start[0], stop[0]])
g = np.interp(x, [0, 1], [start[1], stop[1]])
b = np.interp(x, [0, 1], [start[2], stop[2]])
return (r, g, b)
Usage:
joypy.joyplot(df, overlap=2, colormap=lambda x: color_gradient(x, start=(.78, .25, .09), stop=(1.0, .64, .44)), linecolor='w', linewidth=.5)
Examples with different start and stop tuples:
original answer:
You could iterate over your dataarrays you'd like to plot with plt.fill_between, setting colors to some gradient and the line color to white:
creating some sample data:
import numpy as np
t = np.linspace(-1.6, 1.6, 11)
y = np.cos(t)**2
y2 = lambda : y + np.random.random(len(y))/5-.1
plot the series:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
for i in range(10):
plt.fill_between(t+i, y2()+10-i/10, 10-i/10, facecolor = colors[i], edgecolor='w')
If you want it to have more optimized towards your example you should perhaps consider providing some sample data.
EDIT:
As I commented below, I'm not quite sure if I understand what you want - or if you want the best for your task. Therefore here a code which plots besides your approach in your edit two smples of how to present a bunch of histograms in a way that they are better comparable:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
N = 10
np.random.seed(42)
colors=cm.OrRd_r(np.linspace(.2, .6, N))
fig1 = plt.figure()
x = np.arange(100)
for i in range(10):
y = np.random.rand(100)
plt.fill_between(x, y + 10-i, 10-i,
facecolor=colors[i],
edgecolor='w')
data = np.random.binomial(20, .3, (N, 100))
fig2, axs = plt.subplots(N, figsize=(10, 6))
for i, d in enumerate(data):
axs[i].hist(d, range(20), color=colors[i], label=str(i))
fig2.legend(loc='upper center', ncol=5)
fig3, ax = plt.subplots(figsize=(10, 6))
ax.hist(data.T, range(20), color=colors, label=[str(i) for i in range(N)])
fig3.legend(loc='upper center', ncol=5)
This leads to the following plots:
your plot from your edit:
N histograms in N subplots:
N histograms side by side in one plot:

Categories