A simple call to plotly's figure_factory routine to create a scatter matrix:
import pandas as pd
import numpy as np
from plotly import figure_factory
df = pd.DataFrame(np.random.randn(40,3))
fig = figure_factory.create_scatterplotmatrix(df, diag='histogram')
fig.show()
yields
My questions are:
How can I specify a single color for all the plots?
How can I set the axes ranges for each of the three variables on the scatter plot?
Is there a way to create a density (normalized) version of the histogram?
Is there a way to include the correlation coefficient (say, computed from df.corr()) in the upper right corner of the non-diagonal plots?
For the first question, update the marker color attribute of every trace in the generated figure data.
For the second, update the axis ranges in the generated layout in the same way. Only the x-axes are modified here; use the same technique for the y-axes if necessary.
For the third, set histnorm on the histogram traces, as in the example specification in the reference. If that does not give the normalization you want, I believe you could replace the data with values obtained from np.histogram() or similar.
For the fourth, I added the values obtained with df.corr() as annotations, addressing each subplot by its axis names.
import pandas as pd
import numpy as np
from plotly import figure_factory
import plotly.express as px  # needed for px.histogram below
np.random.seed(20220529)
df = pd.DataFrame(np.random.randn(40,3))
density = px.histogram(df, x=[0,1,2], histnorm='probability density')
df_corr = df.corr()
fig = figure_factory.create_scatterplotmatrix(df, diag='histogram', height=600, width=600)
# 1.How can I specify a single color for all the plots?
for i in range(9):
    fig.data[i]['marker']['color'] = 'blue'
# 2.How can I set the axes ranges for each of the three variables on the scatter plot?
for axes in ['xaxis2','xaxis3','xaxis4','xaxis6','xaxis7']:
    fig.layout[axes]['range']=(-4,4)
# 3.Is there a way to create a density (normalized) version of the histogram?
fig['data'][0]['histnorm'] = 'probability density'
fig['data'][4]['histnorm'] = 'probability density'
fig['data'][8]['histnorm'] = 'probability density'
# 4.Is there a way to include the correlation coefficient (say, computed from df.corr())
# in the upper right corner of the non-diagonal plots?
for r,x,y in zip(df_corr.values.flatten(),
                 ['x1','x2','x3','x4','x5','x6','x7','x8','x9'],
                 ['y1','y2','y3','y4','y5','y6','y7','y8','y9']):
    # Skip the diagonal, where the correlation of a variable with itself is 1
    if r == 1.0:
        pass
    else:
        fig.add_annotation(x=3.3, y=2, xref=x, yref=y, showarrow=False, text='R:'+str(round(r,2)))
fig.show()
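The answer mentions np.histogram() as a fallback for normalization. Here is a minimal sketch of what that normalization produces, independent of plotly. The variable names are illustrative, and the diagonal-index formula assumes the traces are ordered row by row, as the indices 0, 4 and 8 used above suggest:

```python
import numpy as np

rng = np.random.default_rng(20220529)
values = rng.standard_normal(40)

# density=True rescales the counts so that the bar areas sum to 1
heights, edges = np.histogram(values, bins=10, density=True)
area = float(np.sum(heights * np.diff(edges)))

# Positions of the diagonal (histogram) traces in an n x n matrix,
# assuming row-by-row trace order: 0, 4, 8 for n = 3
n = 3
diagonal = [i * (n + 1) for i in range(n)]

print(round(area, 6), diagonal)
```

The heights and edges could then be used to build replacement bar data if setting histnorm is not enough.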
I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted the feature maps of text samples right after the convolutional layer, and for filters of size 3 I got a tensor of size (filter_num)*(length_of_sentences).
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in a plot where I can't see all the ticks on the y-axis. I need them because I have to see which filters learn which information. Is there any way to properly show all the ticks on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer, so it will plot every n labels. We want every label, so we want to set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
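To illustrate what an integer yticklabels means, here is a plain-Python sketch of the "every n label" rule quoted above. thin_labels is a hypothetical helper written for this illustration, not a seaborn function, and it only models the documented behavior:

```python
def thin_labels(labels, every):
    """Keep every `every`-th label, starting with the first one."""
    return list(labels)[::every]


rows = list(range(50))
all_labels = thin_labels(rows, 1)    # like yticklabels=1: nothing is skipped
some_labels = thin_labels(rows, 5)   # like yticklabels=5: one label in five

print(len(all_labels), len(some_labels))
```

With a step of 1 every row keeps its label, which is why yticklabels=1 shows all 50 ticks.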
I'm trying to create a plot that contains both a violin plot and a stripplot with jitter. How do I go about doing this? I provided my attempt below. The problem that I have been encountering is that the violin plot seems to be invisible in the plots.
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
"n_genes_by_counts",
as_=["n_genes_by_counts", "density"],
).mark_area(orient="horizontal").encode(
y="n_genes_by_counts:Q",
x=alt.X("Density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
y="n_gene_by_counts",
x=alt.X("jitter:Q", title=None),
).transform_calculate(
jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)
# 3. Combine both
combined = stripplot + violin
I have a feeling that it could be a problem with the scaling of the X axis. That is, density is much, much smaller than jitter. If that's the case, how do I scale jitter so that it's on the same order of magnitude as density? Would it be possible for someone to show me how to create a violin+stripplot given a column name n_gene_by_counts that belongs to some pandas dataframe df? Here's an example image of the kind of plot I'm looking for:
As you suspected, the different scales will make the violin very small in the stripplot unless you adjust for it. In your case, you have also accidentally capitalized Density:Q in the channel encoding, which means that your violinplot is actually empty since this channel doesn't exist. This example works:
import altair as alt
from vega_datasets import data
df = data.cars()
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
    "Horsepower",
    as_=["Horsepower", "density"],
).mark_area().encode(
    x="Horsepower:Q",
    y=alt.Y("density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
    x="Horsepower",
    y=alt.Y("jitter:Q", title=None),
).transform_calculate(
    jitter="(random() / 400) + 0.0052"  # Narrowing and centering the points
)
# 3. Combine both
violin + stripplot
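The scale mismatch can be checked numerically. The following is a rough sketch with synthetic data (an assumption, not the cars dataset) and a histogram standing in for the density transform; it compares the peak density with normally distributed jitter like the expression in the question, then rescales the jitter:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a column spread over a wide range (assumed data)
values = rng.normal(loc=100, scale=40, size=400)

# A histogram-based density estimate: heights are on the order of 1 / (data range)
heights, _ = np.histogram(values, bins=30, density=True)
peak = heights.max()

# Normally distributed jitter, like the Box-Muller expression in the question
raw_jitter = rng.standard_normal(values.size)

# Rescale the jitter so that its amplitude is at most half the peak density
scaled_jitter = raw_jitter / np.abs(raw_jitter).max() * peak / 2

print(peak, raw_jitter.std(), np.abs(scaled_jitter).max())
```

The unscaled jitter has a spread near 1 while the density stays well below 0.1, so on a shared axis the violin collapses to a thin line unless the jitter is shrunk.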
By using scipy, you could also lay out the points themselves in the shape of the violin, which I am personally quite fond of (discussion in this issue):
import altair as alt
import numpy as np
import pandas as pd
from scipy import stats
from vega_datasets import data
# NAs are not supported in SciPy's density calculation
df = data.cars().dropna()
y = 'Horsepower'
# Compute the density function of the data
dens = stats.gaussian_kde(df[y])
# Compute the density value for each data point
pdf = dens(df[y].sort_values())
# Randomly jitter points within 0 and the upper bound of the probability density function
density_cloud = np.empty(pdf.shape[0])
for i in range(pdf.shape[0]):
    density_cloud[i] = np.random.uniform(0, pdf[i])
# To create a symmetric density/violin, we make every second point negative
# Distributing every other point like this is also more likely to preserve the shape of the violin
violin_cloud = density_cloud.copy()
violin_cloud[::2] = violin_cloud[::2] * -1
# Append the density cloud to the original data in the correctly sorted order
df_with_density = pd.concat([
    df,
    pd.DataFrame({
        'density_cloud': density_cloud,
        'violin_cloud': violin_cloud
        },
        index=df['Horsepower'].sort_values().index)],
    axis=1
)
# Visualize the point cloud
alt.Chart(df_with_density).mark_circle().encode(
    x='Horsepower',
    y='violin_cloud'
)
Both of these approaches will work with multiple categoricals without faceting in the next version of Altair, when support for x/y offset channels is added.
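As a side note, the per-point loop that fills density_cloud in the scipy example can also be written as one vectorized call, because NumPy's uniform sampler accepts an array as the upper bound. This is a sketch with made-up density values standing in for pdf:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the sorted density values `pdf` from the example (assumed values)
pdf = np.array([0.001, 0.004, 0.009, 0.012, 0.009, 0.004, 0.001])

# One draw per point, each between 0 and that point's own density
density_cloud = rng.uniform(0, pdf)

# Mirror every second point to get a symmetric, violin-shaped cloud
violin_cloud = density_cloud.copy()
violin_cloud[::2] *= -1

print(violin_cloud)
```

The result has the same shape as pdf, so it can be attached to the dataframe exactly as in the example.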
I'm trying to plot via:
g = sns.jointplot(x = etas, y = vs, marginal_kws=dict(bins=100), space = 0)
g.ax_joint.set_xscale('log')
g.ax_joint.set_yscale('log')
g.ax_joint.set_xlim(0.01)
g.ax_joint.set_ylim(0.01)
g.ax_joint.set_xlabel(r'$\eta$')
g.ax_joint.set_ylabel("V")
plt.savefig("simple_scatter_plot_Seanborn.png",figsize=(8,8), dpi=150)
Which leaves me with the following image:
This is not what I want. Why are the histograms filled at the end? There are no data points there so I don't get it...
You're setting a log scale on the matplotlib axes, but by the time you are doing that, seaborn has already computed the histogram. So the equal-width bins in linear space appear to have different widths; the lowest bin has a narrow range in terms of actual values, but that takes up a lot of space on the horizontal plot.
Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2
Solution: pass log_scale=True to the histograms:
import seaborn as sns
# test dataset
planets = sns.load_dataset('planets')
g = sns.jointplot(data=planets, x="orbital_period", y="distance", marginal_kws=dict(log_scale=True))
For comparison, without marginal_kws=dict(log_scale=True), setting the scale after the plot is created:
g = sns.jointplot(data=planets, x="orbital_period", y="distance")
g.ax_joint.set_xscale('log')
g.ax_joint.set_yscale('log')
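The effect described in the answer can be reproduced with NumPy alone. This sketch uses synthetic data (an assumption) and compares how wide each bin looks on a log axis when the bin edges are equally spaced in linear space versus log space:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic positive data spanning several orders of magnitude (assumed data)
data = 10 ** rng.uniform(-2, 3, size=1000)

# Equal-width bins in linear space, as computed before the axis is rescaled
linear_edges = np.linspace(data.min(), data.max(), 11)
# Equal-width bins in log space, which is what a log-scaled histogram needs
log_edges = np.geomspace(data.min(), data.max(), 11)

# Width of each bin as it appears on a log axis, in decades
linear_decades = np.diff(np.log10(linear_edges))
log_decades = np.diff(np.log10(log_edges))

print(linear_decades.round(2))
print(log_decades.round(2))
```

The first linear bin covers several decades on its own, which is why it appears as one wide filled block once the axis is switched to a log scale.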
I have one question about matplotlib and contourf.
I am using the latest version of matplotlib with Python 3.7. Basically, I have two matrices that I want to plot on the same contour plot, each with a different colormap. One important aspect is that, for instance, if matrixA and matrixB both have shape=(10,10), then the positions where matrixA is nonzero are exactly the positions where matrixB is zero, and vice versa.
In other words, I want to plot two different masks in different colors.
Thanks for your time.
Edited:
I add an example here
import numpy
import matplotlib.pyplot as plt
matrixA=numpy.random.randn(10,10).reshape(100,)
matrixB=numpy.random.randn(10,10).reshape(100,)
mask=numpy.random.uniform(10,10)
mask=mask.reshape(100,)
indexA=numpy.where(mask[mask>0.5])[0]
indexB=numpy.where(mask[mask<=0.5])[0]
matrixA_masked=numpy.zeros(100,)
matrixB_masked=numpy.zeros(100,)
matrixA_masked[indexA]=matrixA[indexA]
matrixB_masked[indexB]=matrixB[indexB]
matrixA_masked=matrixA_masked.reshape(100,100)
matrixB_masked=matrixB_masked.reshape(100,100)
x=numpy.linspace(0,10,1)
X,Y = numpy.meshgrid(x,x)
plt.contourf(X,Y,matrixA_masked,colormap='gray')
plt.contourf(X,Y,matrixB_masked,colormap='winter')
plt.show()
What I want is to be able to use different colormaps in the same plot. For instance, one part of the plot would be assigned to matrixA with one colormap (and 0 where matrixB takes its place), and the other part to matrixB with a different colormap.
In other words, each part of the contourf plot corresponds to one matrix. I am plotting decision surfaces of machine learning models.
I stumbled into some errors in your code so I have created my own dataset.
To have two colormaps on one plot you need to open a figure and define the axes:
import numpy
import matplotlib.pyplot as plt
matrixA=numpy.linspace(1,20,100)
matrixA[matrixA >= 10] = numpy.nan
matrixA_2 = numpy.reshape(matrixA,[50,2])
matrixB=numpy.linspace(1,20,100)
matrixB[matrixB <= 10] = numpy.nan
matrixB_2 = numpy.reshape(matrixB,[50,2])
fig,ax = plt.subplots()
a = ax.contourf(matrixA_2,cmap='copper',alpha=0.5,zorder=0)
fig.colorbar(a,ax=ax,orientation='vertical')
b=ax.contourf(matrixB_2,cmap='cool',alpha=0.5,zorder=1)
fig.colorbar(b,ax=ax,orientation='horizontal')
plt.show()
You'll also see I've changed the alpha and zorder
I hope this helps.
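For the complementary masks described in the question, the two inputs could be prepared along these lines. This is only a sketch: the names and the 0.5 threshold are taken loosely from the question, and blanking values with NaN follows the approach used in the answer above:

```python
import numpy as np

rng = np.random.default_rng(3)
matrix_a = rng.standard_normal((10, 10))
matrix_b = rng.standard_normal((10, 10))
mask = rng.uniform(0, 1, size=(10, 10))

# Keep each matrix only in its own region; blank the rest with NaN
a_masked = np.where(mask > 0.5, matrix_a, np.nan)
b_masked = np.where(mask <= 0.5, matrix_b, np.nan)

# Every cell belongs to exactly one of the two matrices
exclusive = np.isnan(a_masked) ^ np.isnan(b_masked)
print(exclusive.all())
```

The two arrays could then be passed to ax.contourf with different cmap values, as in the answer.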
I am plotting from a pandas dataframe with commands like
fig1 = plt.hist(dataset_1[dataset_1>-1.0],bins=bins,alpha=0.75,label=label1,normed=True)
and the plots comprise multiple histograms on one canvas. Since each histogram is normalised to its own integral (hence the histograms have the same area, because the purpose of the histograms is to illustrate the shape of the datasets rather than their relative sizes), the numbers on the y axis are not meaningful. For now, I am suppressing y axis labelling using
axes.set_ylabel("(Normalised to unity)")
axes.get_yaxis().set_ticks([])
Is there a way of adjusting the scaling of the y axis such that "1" corresponds to the highest value on any histogram? This would display a vertical scale to guide the eye and with which to judge the relative values of different bins. In essence, I mean re-normalising the maximum displayed y value without affecting the scaling of the histograms (i.e. decoupling the axis scale from what it represents).
You have two options:
Drawing histogram, adjusting y axis tick.
You may set the y tick to the location of the maximum and label it with 1 afterwards.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges,_ = plt.hist(a, ec="k")
plt.yticks([0,hist.max()], [0,1])
plt.show()
Normalizing histogram, drawing to scale.
You may normalize the histogram in the way you desire by first calculating the histogram, dividing it by its maximum and then plot a bar plot of it.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges = np.histogram(a)
hist = hist/float(hist.max())
# With align="edge" the x positions are the left edges of the bins
plt.bar(edges[:-1], hist, width=np.diff(edges)[0], align="edge", ec="k")
plt.yticks([0,1])
plt.show()
The output is the same in both cases.
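The normalization step of the second option can be checked without drawing anything. This is a small sketch that uses NumPy's Generator API instead of np.random.seed, so the sample differs from the one in the answer:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.rayleigh(scale=3, size=2000)

counts, edges = np.histogram(a)
# Divide by the tallest bin so that the maximum height becomes exactly 1
relative = counts / counts.max()

# Left edge and width of every bin, the inputs an edge-aligned bar plot needs
left_edges = edges[:-1]
widths = np.diff(edges)

print(relative.max(), len(left_edges), len(widths))
```

Because only the heights are rescaled, the shape of the histogram is unchanged and the tallest bar sits at 1 on the y axis.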