Scatter Matrix showing too many floating point values on graph - python

I'm trying to plot a scatter matrix using Python, but the y-axis ticks of the top-left plot have a lot of unnecessary digits. I'm plotting the graph directly from pandas, using the scatter_matrix function from pandas.plotting.
Also, I am quite new to Python, so sorry if this is a stupid question, but I just couldn't find an answer that fits my needs.
I've tried different axis-formatting options, such as yaxis.set_major_formatter (I'm not sure whether it fails because I'm plotting from pandas, but it has no effect either way), and pandas.set_option to customise the display.
from pandas.plotting import scatter_matrix
scatter_matrix(df, alpha=0.3, figsize=(9,9), diagonal='kde')
df:
            Tesla Ret  Ford Ret    GM Ret
Date
2012-01-03        NaN       NaN       NaN
2012-01-04  -0.013177  0.015274  0.004751
2012-01-05  -0.021292  0.025664  0.048227
2012-01-06  -0.008481  0.010354  0.033829
2012-01-09   0.013388  0.007686 -0.003490
2012-01-10   0.013578  0.000000  0.017513
2012-01-11   0.022085  0.022881  0.052926
2012-01-12   0.000708  0.005800  0.008173
2012-01-13  -0.193274 -0.008237 -0.015403
2012-01-17   0.167179 -0.001661 -0.003705
...
I've tried to use:
plt.gca().yaxis.set_major_formatter(StrMethodFormatter('{x:,.2f}')) and ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f')) after importing the respective modules, to no avail.
Figure is available here
Everything else in the figure is as it should be; only the y-axis of the top-left plot is affected. I would like it to show values with one or two decimal places, like the rest of the figure.
I'd greatly appreciate any help that could fix my issue.
Thanks.

P.S.: I have edited this answer based on the problem pointed out by @ImportanceOfBeingErnest (thanks to him). Please read the comments below the answer to see what I mean.
The new solution is to read the tick labels displayed on that particular axis and round them to 2 decimal places:
new_labels = [round(float(i.get_text()), 2) for i in axes[0,0].get_yticklabels()]
axes[0,0].set_yticklabels(new_labels)
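A self-contained sketch of the same idea, wrapped in a hypothetical helper `tidy_corner_yticklabels` and run on made-up return data. It assumes the top-left labels are plain numeric strings (which is how pandas writes them), maps matplotlib's Unicode minus sign back to "-" before parsing, and uses string formatting instead of round() so every label gets exactly two decimals. It uses diagonal='hist' only to avoid a SciPy dependency; pandas sets the top-left labels the same way with diagonal='kde'.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from pandas.plotting import scatter_matrix

UNICODE_MINUS = "\u2212"  # matplotlib may render negative numbers with this sign


def _to_float(text):
    """Parse a tick-label string, tolerating matplotlib's Unicode minus."""
    return float(text.replace(UNICODE_MINUS, "-"))


def tidy_corner_yticklabels(axes, decimals=2):
    """Hypothetical helper: rewrite the y tick labels of axes[0, 0].

    pandas writes these labels as fixed strings, so the strings themselves are
    reformatted; installing a new Formatter would format the tick *positions*,
    which belong to the diagonal plot, not to the data in that row.
    """
    ax = axes[0, 0]
    ax.figure.canvas.draw()  # make sure the label texts are populated
    texts = [t.get_text() for t in ax.get_yticklabels()]
    new = [f"{_to_float(s):.{decimals}f}" if s else "" for s in texts]
    ax.set_yticklabels(new)
    return new


# made-up daily-return-like data standing in for the question's df
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 0.02, size=(250, 3)),
                  columns=["Tesla Ret", "Ford Ret", "GM Ret"])

axes = scatter_matrix(df, alpha=0.3, figsize=(9, 9), diagonal="hist")
labels = tidy_corner_yticklabels(axes)
print(labels)
```

Rewriting the strings (rather than the formatter) keeps the tick positions pandas computed, so the labels stay aligned with the scatter plots in the same row.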
OLD ANSWER (kept for history; as you will see, the y-ticks in the figure it generates are not correct)
The problem is that you are using an ax object to format the labels, but what scatter_matrix returns is not a single axes object. It is a NumPy array containing 9 axes (a 3x3 grid of subplots). You can verify this by printing the shape of the axes variable.
axes = scatter_matrix(df, alpha=0.3, figsize=(9,9), diagonal='kde')
print (axes.shape)
# (3, 3)
The solution is either to iterate through all the axes or to change the formatting only for the problematic one. P.S.: The figure below doesn't match yours because I only used the small DataFrame you posted.
Here is how you can do it for all the y-axes:
from pandas.plotting import scatter_matrix
from matplotlib.ticker import FormatStrFormatter
axes = scatter_matrix(df, alpha=0.3, figsize=(9,9), diagonal='kde')
for ax in axes.flatten():
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
Alternatively, you can choose a particular axis. Your top-left subplot can be accessed using axes[0,0]:
axes[0,0].yaxis.set_major_formatter(FormatStrFormatter('%.2f'))

pandas.scatter_matrix suffers from an unfortunate design choice: it plots the KDE or histogram on the diagonal into the same axes that show the y-ticks for the rest of the row. This then requires faking the ticks and labels so that they fit the data in that row. To do so, a FixedLocator and a FixedFormatter are used, so the format of the tick labels is taken directly from the string representation of each number.
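To check this yourself, a small sketch with made-up data inspects the top-left axes that pandas returns. Its y-axis uses a FixedLocator (the formatter is a FixedFormatter in older matplotlib; newer releases may report a dict-backed FuncFormatter instead, which amounts to the same fixed strings). The tick positions are in the diagonal histogram's units, while the labels show the row's data values. That mismatch is why installing FormatStrFormatter on axes[0,0] prints the wrong numbers: a formatter formats the positions, not the faked labels.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
from matplotlib.ticker import FixedLocator
from pandas.plotting import scatter_matrix

# made-up data; column "a" is the row shown by the top-left axes
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(0, 0.02, size=(250, 3)), columns=["a", "b", "c"])
axes = scatter_matrix(df, diagonal="hist")
corner = axes[0, 0]
corner.figure.canvas.draw()  # populate tick label texts

locator = corner.yaxis.get_major_locator()
formatter = corner.yaxis.get_major_formatter()
# where the ticks actually sit: in histogram-count units
positions = corner.get_yticks()
# what the labels claim: values of column "a" (Unicode minus mapped to "-")
shown = [float(t.get_text().replace("\u2212", "-"))
         for t in corner.get_yticklabels()]

print(type(locator).__name__, type(formatter).__name__)
print("positions:", positions)
print("labels:   ", shown)
```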
I would propose a completely different design here: the diagonal axes stay empty, and twin axes are used instead to show the histogram or KDE curve. The problem from the question then cannot occur.
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def scatter_matrix(df, axes=None, **kw):
    n = df.columns.size
    diagonal = kw.pop("diagonal", "hist")
    figsize = kw.pop("figsize", None)
    if axes is None:  # "if not axes" fails for a NumPy array of axes
        fig, axes = plt.subplots(n, n, figsize=figsize,
                                 squeeze=False, sharex="col", sharey="row")
    else:
        flax = axes.flatten()
        fig = flax[0].figure
        assert len(flax) == n*n
    # no gaps between subplots
    fig.subplots_adjust(wspace=0, hspace=0)
    hist_kwds = kw.pop("hist_kwds", {})
    density_kwds = kw.pop("density_kwds", {})
    # off-diagonal scatter plots: row i shows column y, column j shows column x
    pairs = itertools.permutations(df.columns, r=2)
    idx = itertools.permutations(range(n), r=2)
    for (i, j), (y, x) in zip(idx, pairs):
        axes[i, j].scatter(df[x].values, df[y].values, **kw)
        axes[i, j].tick_params(left=False, labelleft=False,
                               bottom=False, labelbottom=False)
    # diagonal: draw into twin axes, so the row's own y-ticks stay untouched
    diagaxes = []
    for i, c in enumerate(df.columns):
        ax = axes[i, i].twinx()
        if i != 0:
            # share one y-scale among all diagonal twins (Axes.sharey needs
            # matplotlib >= 3.3; the original get_shared_y_axes().join() call
            # is no longer available in recent matplotlib releases)
            ax.sharey(diagaxes[0])
        diagaxes.append(ax)
        if diagonal == 'hist':
            ax.hist(df[c].values, **hist_kwds)
        elif diagonal in ('kde', 'density'):
            from scipy.stats import gaussian_kde
            y = df[c].values
            gkde = gaussian_kde(y)
            ind = np.linspace(y.min(), y.max(), 1000)
            ax.plot(ind, gkde.evaluate(ind), **density_kwds)
        ax.axis("off")
    # labels and ticks only on the left column and the bottom row
    for i, c in enumerate(df.columns):
        axes[i, i].tick_params(left=False, labelleft=False,
                               bottom=False, labelbottom=False)
        axes[i, 0].set_ylabel(c)
        axes[-1, i].set_xlabel(c)
        axes[i, 0].tick_params(left=True, labelleft=True)
        axes[-1, i].tick_params(bottom=True, labelbottom=True)
    return axes, diagaxes
df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
axes, diagaxes = scatter_matrix(df, diagonal='kde', alpha=0.5)
plt.show()

Related

multiple boxplots, side by side, using matplotlib from a dataframe

I'm trying to plot 60+ boxplots side by side from a dataframe and I was wondering if someone could suggest some possible solutions.
At the moment I have df_new, a dataframe with 66 columns, which I'm using to plot boxplots. The easiest way I found to plot the boxplots was to use the boxplot package inside pandas:
boxplot = df_new.boxplot(column=x, figsize = (100,50))
This gives me a very, very tiny chart with illegible axes, and I cannot seem to change the font size, so I'm trying to do this natively in matplotlib, but I cannot think of an efficient way of doing it. I'm trying to avoid creating 66 separate boxplots using something like:
fig, ax = plt.subplots(nrows = 1,
                       ncols = 66,
                       figsize = (10,5),
                       sharex = True)
ax[0].boxplot(...)  # insert parameters here (with nrows=1, ax is a 1-D array)
I actually do not know how to get the data from df_new.describe() into the boxplot function, so any tips on this would be greatly appreciated! The documentation is confusing, and I'm not sure what the x vectors should be.
Ideally I'd like to just give the boxplot function the dataframe and for it to automatically create all the boxplots by working out all the quartiles, column separations etc on the fly - is this even possible?
Thanks!
I tried to replace the boxplot with a ridge plot, which takes up less space because:
it requires half of the width
the ridges can partially overlap
it develops vertically, so you can scroll down through the whole plot
I took the code from the seaborn documentation and adapted it a little in order to have 60 different ridges, each normally distributed; here is the code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data: 60 groups (A1 to J6), each with n normally distributed values
n = 20
g = [item[0] + item[1] for item in itertools.product('ABCDEFGHIJ', '123456')]
x = np.random.randn(n * len(g))
df = pd.DataFrame({'x': x,
                   'g': n*g})
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
# (shade= and bw= are deprecated in newer seaborn; fill= and bw_method= replace them)
g.map(sns.kdeplot, "x", clip_on=False, fill=True, alpha=1, lw=1.5, bw_method=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_method=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
This is the result I get:
I don't know if it will suit your needs; in any case, keep in mind that keeping so many distributions next to each other will always require a lot of space (and a very big screen).
Maybe you could try dividing the distributions into smaller groups and plotting them a few at a time?
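For comparison, a minimal sketch of the plain-matplotlib route the question asks about, assuming df_new holds only numeric columns (the 66-column frame below is made-up stand-in data). Axes.boxplot accepts a sequence of arrays and computes the quartiles and whiskers itself, so df_new.describe() is not needed. NaNs are dropped per column as a precaution. Tick labels are set separately (rather than through boxplot's labels= argument, which recent matplotlib releases rename to tick_labels=) and shrunk with tick_params.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# made-up stand-in for the question's df_new: 66 numeric columns
rng = np.random.default_rng(0)
df_new = pd.DataFrame(rng.normal(size=(200, 66)),
                      columns=[f"col{i}" for i in range(66)])

# one array per column; boxplot computes quartiles and whiskers itself
data = [df_new[c].dropna().to_numpy() for c in df_new.columns]

fig, ax = plt.subplots(figsize=(20, 5))
bp = ax.boxplot(data)  # boxes sit at x = 1, 2, ..., 66 by default
ax.set_xticks(range(1, len(data) + 1))
ax.set_xticklabels(df_new.columns)
ax.tick_params(axis="x", labelsize=7, labelrotation=90)  # small, rotated names
fig.tight_layout()
```

The returned dict (bp) gives access to the individual box, whisker and median artists if they need restyling later.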

Matplotlib: Plot on double y-axis plot misaligned

I'm trying to plot two datasets into one plot with matplotlib. One of the two plots is misaligned by 1 on the x-axis.
This MWE pretty much sums up the problem. What do I have to adjust to bring the box-plot further to the left?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue',
'caps': 'Gray'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
plotData.plot.box(ax=ax1, color=color, sym='+')
failureRates.plot(ax=ax2, color='b', legend=False)
ax1.set_ylabel('Seconds')
ax2.set_ylabel('Failure Rate in %')
plt.xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.show()
Actual result. Note that there are only 8 box plots instead of 9, and that they start at index 1.
The issue is a mismatch between how box() and plot() work: box() starts at x-position 1, while plot() uses the index of the DataFrame (which defaults to starting at 0). There are only 8 boxes because the 9th is cut off by plt.xlim(-0.7, 8.7). There are several easy ways to fix this. As @Sheldore's answer indicates, you can explicitly set the positions for the box plot. Another way is to make the index of the failureRates DataFrame start at 1 when constructing it, i.e.
failureRates = pd.DataFrame(np.random.rand(9, 1), index=range(1, len(titles)+1))
Note that you need not specify the xticks or the xlim for the question's MCVE, but you may need to for your complete code.
You can specify the positions on the x-axis where you want the box plots. Since you have 9 boxes, use the following, which generates the figure below:
plotData.plot.box(ax=ax1, color=color, sym='+', positions=range(9))
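The underlying rule can be checked with plain matplotlib in a sketch with made-up data: both artists must use the same x coordinates, so boxes placed at positions 0 to 8 line up with a line drawn at x = 0 to 8. The check below relies on each box's median line being centred on its position.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
rng = np.random.default_rng(0)
box_data = rng.random((25, 9))      # 9 columns -> 9 boxes
failure_rates = rng.random(9)       # one value per box

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

positions = np.arange(len(titles))  # 0..8, the same x values the line uses
bp = ax1.boxplot(box_data, positions=positions, sym="+")
line, = ax2.plot(positions, failure_rates, color="b")

ax1.set_xticks(positions)
ax1.set_xticklabels(titles)
ax1.set_ylabel("Seconds")
ax2.set_ylabel("Failure Rate in %")
fig.tight_layout()

# the centre of each box's median line is the box's x position
box_centres = [m.get_xdata().mean() for m in bp["medians"]]
print(box_centres)
```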

Matplotlib Center plot vertically and horizontally [duplicate]

I need to generate a whole bunch of vertically-stacked plots in matplotlib. The result will be saved using savefig and viewed on a webpage, so I don't care how tall the final image is, as long as the subplots are spaced so they don't overlap.
No matter how big I allow the figure to be, the subplots always seem to overlap.
My code currently looks like this:
import matplotlib.pyplot as plt
import my_other_module
titles, x_lists, y_lists = my_other_module.get_data()
fig = plt.figure(figsize=(10,60))
for i, y_list in enumerate(y_lists):
    plt.subplot(len(titles), 1, i + 1)  # subplot numbering starts at 1
    plt.xlabel("Some X label")
    plt.ylabel("Some Y label")
    plt.title(titles[i])
    plt.plot(x_lists[i], y_list)
fig.savefig('out.png', dpi=100)
Please review the matplotlib Tight Layout guide and try using matplotlib.pyplot.tight_layout or matplotlib.figure.Figure.tight_layout.
As a quick example:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8))
fig.tight_layout() # Or equivalently, "plt.tight_layout()"
plt.show()
Without Tight Layout
With Tight Layout
You can use plt.subplots_adjust to change the spacing between the subplots.
call signature:
subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
The parameter meanings (and suggested defaults) are:
left = 0.125 # the left side of the subplots of the figure
right = 0.9 # the right side of the subplots of the figure
bottom = 0.1 # the bottom of the subplots of the figure
top = 0.9 # the top of the subplots of the figure
wspace = 0.2 # the amount of width reserved for blank space between subplots
hspace = 0.2 # the amount of height reserved for white space between subplots
The actual defaults are controlled by the rc file
Using subplots_adjust(hspace=0) or a very small number (hspace=0.001) will completely remove the whitespace between the subplots, whereas hspace=None does not.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tic
fig = plt.figure(figsize=(8, 8))
x = np.arange(100)
y = 3.*np.sin(x*2.*np.pi/100.)
for i in range(1, 6):
    temp = 510 + i
    ax = plt.subplot(temp)
    plt.plot(x, y)
    plt.subplots_adjust(hspace=0)
    temp = tic.MaxNLocator(3)
    ax.yaxis.set_major_locator(temp)
    ax.set_xticklabels(())
    ax.title.set_visible(False)
plt.show()
hspace=0 or hspace=0.001
hspace=None
Similar to tight_layout, matplotlib now (as of version 2.2) provides constrained_layout. In contrast to tight_layout, which may be called at any time in the code for a single optimized layout, constrained_layout is a property that, while active, optimizes the layout before every drawing step.
Hence it needs to be activated before or during subplot creation, such as figure(constrained_layout=True) or subplots(constrained_layout=True).
Example:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(4,4, constrained_layout=True)
plt.show()
constrained_layout can also be set via rcParams:
plt.rcParams['figure.constrained_layout.use'] = True
See the what's new entry and the Constrained Layout Guide
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10,60))
plt.subplots_adjust( ... )
The plt.subplots_adjust method:
def subplots_adjust(*args, **kwargs):
    """
    call signature::
      subplots_adjust(left=None, bottom=None, right=None, top=None,
                      wspace=None, hspace=None)
    Tune the subplot layout via the
    :class:`matplotlib.figure.SubplotParams` mechanism. The parameter
    meanings (and suggested defaults) are::
      left  = 0.125  # the left side of the subplots of the figure
      right = 0.9    # the right side of the subplots of the figure
      bottom = 0.1   # the bottom of the subplots of the figure
      top = 0.9      # the top of the subplots of the figure
      wspace = 0.2   # the amount of width reserved for blank space between subplots
      hspace = 0.2   # the amount of height reserved for white space between subplots
    The actual defaults are controlled by the rc file
    """
    fig = gcf()
    fig.subplots_adjust(*args, **kwargs)
    draw_if_interactive()
or
fig = plt.figure(figsize=(10,60))
fig.subplots_adjust( ... )
The size of the picture matters.
"I've tried messing with hspace, but increasing it only seems to make all of the graphs smaller without resolving the overlap problem."
Thus, to get more white space while keeping the subplot size, the total image needs to be bigger.
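Following that reasoning, a sketch that grows the figure with the number of stacked subplots and lets constrained_layout (matplotlib 2.2+) reserve room for titles and labels, so adding rows neither shrinks each plot nor causes overlap. The per-row height of 2.5 inches is an arbitrary assumption, and the sine data stands in for the question's my_other_module.get_data().

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# made-up stand-in for my_other_module.get_data()
n = 12
titles = [f"Series {i}" for i in range(n)]
x = np.linspace(0, 2 * np.pi, 200)
y_lists = [np.sin((i + 1) * x) for i in range(n)]

ROW_HEIGHT = 2.5  # inches per subplot; an assumption, tune to taste
fig, axes = plt.subplots(nrows=n, ncols=1, figsize=(10, ROW_HEIGHT * n),
                         constrained_layout=True)
for ax, title, y in zip(axes, titles, y_lists):
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel("Some X label")
    ax.set_ylabel("Some Y label")

fig.savefig("out.png", dpi=100)
```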
You could try the .subplot_tool()
plt.subplot_tool()
Resolving this issue when plotting a dataframe with pandas.DataFrame.plot, which uses matplotlib as the default backend.
The following works for whichever kind= is specified (e.g. 'bar', 'scatter', 'hist', etc.).
Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3
Imports and sample data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# sinusoidal sample data
sample_length = range(1, 15+1)
rads = np.arange(0, 2*np.pi, 0.01)
data = np.array([np.sin(t*rads) for t in sample_length])
df = pd.DataFrame(data.T, index=pd.Series(rads.tolist(), name='radians'), columns=[f'freq: {i}x' for i in sample_length])
# default plot with subplots; each column is a subplot
axes = df.plot(subplots=True)
Adjust the Spacing
Adjust the default parameters in pandas.DataFrame.plot
Change figsize: a width of 5 and a height of 4 for each subplot is a good place to start.
Change layout: (rows, columns) for the layout of subplots.
sharey=True and sharex=True so space isn't taken for redundant labels on each subplot.
The .plot method returns a NumPy array of matplotlib.axes.Axes, which can be flattened to make it easier to work with.
Use .get_figure() to extract the DataFrame.plot figure object from one of the Axes.
Use fig.tight_layout() if desired.
axes = df.plot(subplots=True, layout=(3, 5), figsize=(25, 16), sharex=True, sharey=True)
# flatten the axes array to easily access any subplot
axes = axes.flat
# extract the figure object
fig = axes[0].get_figure()
# use tight_layout
fig.tight_layout()
df
# display(df.head(3))
freq: 1x freq: 2x freq: 3x freq: 4x freq: 5x freq: 6x freq: 7x freq: 8x freq: 9x freq: 10x freq: 11x freq: 12x freq: 13x freq: 14x freq: 15x
radians
0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.01 0.010000 0.019999 0.029996 0.039989 0.049979 0.059964 0.069943 0.079915 0.089879 0.099833 0.109778 0.119712 0.129634 0.139543 0.149438
0.02 0.019999 0.039989 0.059964 0.079915 0.099833 0.119712 0.139543 0.159318 0.179030 0.198669 0.218230 0.237703 0.257081 0.276356 0.295520
This answer shows using fig.tight_layout after creating the figure. However, tight_layout can be set directly when creating the figure, because matplotlib.pyplot.subplots accepts additional parameters with **fig_kw. All additional keyword arguments are passed to the pyplot.figure call.
See How to plot in multiple subplots for accessing and plotting in subplots.
import matplotlib.pyplot as plt
# create the figure with tight_layout=True
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), tight_layout=True)

How to change the positions of subplot titles and axis labels in Seaborn FacetGrid?

I am trying to plot a polar plot using Seaborn's facetGrid, similar to what is detailed on seaborn's gallery
I am using the following code:
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.25)
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df_total, col="Construct", hue="Run", col_wrap=5, subplot_kws=dict(projection='polar'), height=5, sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.plot, 'Rad', 'y axis label', marker=".", ms=3, ls='None').set_titles("{col_name}")
plt.savefig('./image.pdf')
Which with my data gives the following:
I want to keep this organisation of 5 plots per line.
The problem is that the title of each subplot overlaps with the tick values, and the same happens with the y-axis label.
Is there a way to prevent this behaviour? Can I somehow shift the titles slightly above their current position and can I shift the y axis labels slightly on the left of their current position?
Many thanks in advance!
EDIT:
This is not a duplicate of this SO as the problem was that the title of one subplot overlapped with the axis label of another subplot.
Here my problem is that the title of one subplot overlaps with the ticks label of the same subplot and similarly the axis label overlaps with the ticks label of the same subplot.
I would also like to add that I do not care that they overlap in my Jupyter notebook (where the figure was created); however, I want the final saved image to have no overlap, so perhaps there is something I need to do to save the image in a slightly different format to avoid that, but I don't know what (I am only using plt.savefig to save it).
EDIT 2: If someone would like to reproduce the problem here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.5)
# Generate an example radial dataset
r = np.linspace(0, 10000, num=100)
df = pd.DataFrame({'label': r, 'slow': r, 'medium-slow': 1 * r, 'medium': 2 * r, 'medium-fast': 3 * r, 'fast': 4 * r})
# Convert the dataframe to long-form or "tidy" format
df = pd.melt(df, id_vars=['label'], var_name='speed', value_name='theta')
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df, col="speed", hue="speed",
                  subplot_kws=dict(projection='polar'), height=4.5, col_wrap=5,
                  sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.scatter, "theta", "label")
plt.savefig('./image.png')
plt.show()
This gives the following image, in which the titles are not as bad as in my original problem (but there is still some overlap) and the label on the left-hand side overlaps completely.
In order to move the title a bit higher, you can set a new position:
ax.title.set_position([.5, 1.1])
In order to move the ylabel a little further left, you can add some padding
ax.yaxis.labelpad = 25
To do this for the axes of the FacetGrid, you'd do:
for ax in g.axes.flat:
    ax.title.set_position([.5, 1.1])
    ax.yaxis.labelpad = 25
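One caveat: in recent matplotlib versions the title position is adjusted automatically at draw time, which can reset a y value set directly with title.set_position. Passing y through set_title switches that automatic placement off. A sketch on plain polar axes with made-up data shaped like the MWE; for the FacetGrid, the same calls would go inside the loop over g.axes.flat.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

r = np.linspace(0, 10000, num=100)
fig, axes = plt.subplots(1, 5, figsize=(22, 4.5),
                         subplot_kw=dict(projection="polar"))
for k, ax in enumerate(axes.flat):
    ax.scatter((k + 1) * r, r, s=5)   # theta, radius (as in the MWE)
    # y=1.1 lifts the title and disables automatic title placement
    ax.set_title(f"speed {k}", y=1.1)
    ax.set_ylabel("label")
    ax.yaxis.labelpad = 25            # push the y label away from the ticks

fig.savefig("image.png", bbox_inches="tight")
```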
The answer provided by ImportanceOfBeingErnest in this SO question may help.

Is there a function to make scatterplot matrices in matplotlib?

Example of scatterplot matrix
Is there such a function in matplotlib.pyplot?
For those who do not want to define their own functions, there is a great data analysis library in Python called pandas, where one can find the scatter_matrix() function:
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(1000, 4), columns = ['a', 'b', 'c', 'd'])
scatter_matrix(df, alpha = 0.2, figsize = (6, 6), diagonal = 'kde')
Generally speaking, matplotlib doesn't usually contain plotting functions that operate on more than one axes object (subplot, in this case). The expectation is that you'd write a simple function to string things together however you'd like.
I'm not quite sure what your data looks like, but it's quite simple to just build a function to do this from scratch. If you're always going to be working with structured or rec arrays, then you can simplify this a touch. (i.e. There's always a name associated with each data series, so you can omit having to specify names.)
As an example:
import itertools
import numpy as np
import matplotlib.pyplot as plt
def main():
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
            linestyle='none', marker='o', color='black', mfc='none')
    fig.suptitle('Simple Scatterplot Matrix')
    plt.show()
def scatterplot_matrix(data, names, **kwargs):
    """Plots a scatterplot matrix of subplots. Each row of "data" is plotted
    against other rows, resulting in a nrows by nrows grid of subplots with the
    diagonal subplots labeled with "names". Additional keyword arguments are
    passed on to matplotlib's "plot" command. Returns the matplotlib figure
    object containing the subplot grid."""
    numvars, numdata = data.shape
    fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
    fig.subplots_adjust(hspace=0.05, wspace=0.05)
    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        # Set up ticks only on one side for the "edge" subplots...
        # (ax.is_first_col() etc. were deprecated in matplotlib 3.4 and later
        # removed; the SubplotSpec methods do the same job)
        spec = ax.get_subplotspec()
        if spec.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if spec.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if spec.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if spec.is_last_row():
            ax.xaxis.set_ticks_position('bottom')
    # Plot the data.
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        for x, y in [(i,j), (j,i)]:
            axes[x,y].plot(data[x], data[y], **kwargs)
    # Label the diagonal subplots...
    for i, label in enumerate(names):
        axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
                ha='center', va='center')
    # Turn on the proper x or y axes ticks.
    for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
        axes[j,i].xaxis.set_visible(True)
        axes[i,j].yaxis.set_visible(True)
    return fig
main()
You can also use Seaborn's pairplot function:
import seaborn as sns
sns.set()
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Thanks for sharing your code! You figured out all the hard stuff for us. As I was working with it, I noticed a few little things that didn't look quite right.
[FIX #1] The axis ticks weren't lining up the way I would expect (i.e., in your example above, you should be able to draw a vertical and a horizontal line through any point across all plots, and the lines should cross through the corresponding point in the other plots, but as it stands this doesn't happen).
[FIX #2] If you have an odd number of variables, the bottom-right corner axes don't pull the correct x-ticks or y-ticks; they just keep the default 0..1 ticks.
Not a fix, but I made it optional to explicitly input names, so that it puts a default xi for variable i in the diagonal positions.
Below you'll find an updated version of your code that addresses these two points, otherwise preserving the beauty of your code.
import itertools
import numpy as np
import matplotlib.pyplot as plt
def scatterplot_matrix(data, names=None, **kwargs):
    """
    Plots a scatterplot matrix of subplots. Each row of "data" is plotted
    against other rows, resulting in a nrows by nrows grid of subplots with the
    diagonal subplots labeled with "names". Additional keyword arguments are
    passed on to matplotlib's "plot" command. Returns the matplotlib figure
    object containing the subplot grid.
    """
    numvars, numdata = data.shape
    fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
    fig.subplots_adjust(hspace=0.0, wspace=0.0)
    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        # Set up ticks only on one side for the "edge" subplots...
        # (SubplotSpec methods replace the removed ax.is_first_col() etc.)
        spec = ax.get_subplotspec()
        if spec.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if spec.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if spec.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if spec.is_last_row():
            ax.xaxis.set_ticks_position('bottom')
    # Plot the data.
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        for x, y in [(i,j), (j,i)]:
            # FIX #1: this needed to be changed from ...(data[x], data[y],...)
            axes[x,y].plot(data[y], data[x], **kwargs)
    # Label the diagonal subplots...
    if not names:
        names = ['x'+str(i) for i in range(numvars)]
    for i, label in enumerate(names):
        axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
                ha='center', va='center')
    # Turn on the proper x or y axes ticks.
    for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
        axes[j,i].xaxis.set_visible(True)
        axes[i,j].yaxis.set_visible(True)
    # FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
    # correct axes limits, so we pull them from other axes
    if numvars%2:
        xlimits = axes[0,-1].get_xlim()
        ylimits = axes[-1,0].get_ylim()
        axes[-1,-1].set_xlim(xlimits)
        axes[-1,-1].set_ylim(ylimits)
    return fig
if __name__=='__main__':
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
            linestyle='none', marker='o', color='black', mfc='none')
    fig.suptitle('Simple Scatterplot Matrix')
    plt.show()
Thanks again for sharing this with us. I have used it many times! Oh, and I re-arranged the main() part of the code so that it can be a formal example code or not get called if it is being imported into another piece of code.
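A variation, sketched under the assumption that you are happy to let matplotlib manage the limits: creating the grid with sharex='col' and sharey='row' ties every subplot to its column's x-range and its row's y-range, so the empty diagonal cells (including the bottom-right one from FIX #2) pick up the correct limits automatically. The orientation follows FIX #1 (the cell in row r, column c plots data[c] on x and data[r] on y). The function name shared_scatterplot_matrix is hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def shared_scatterplot_matrix(data, names=None, **kwargs):
    """Hypothetical variant: scatterplot matrix on a grid with shared axes.

    Row r, column c shows data[c] (x) against data[r] (y); diagonal cells
    only carry the variable name.
    """
    numvars = data.shape[0]
    if not names:
        names = [f"x{i}" for i in range(numvars)]
    fig, axes = plt.subplots(numvars, numvars, figsize=(8, 8),
                             sharex="col", sharey="row")
    fig.subplots_adjust(hspace=0.0, wspace=0.0)
    for r in range(numvars):
        for c in range(numvars):
            ax = axes[r, c]
            if r == c:
                ax.annotate(names[r], (0.5, 0.5), xycoords="axes fraction",
                            ha="center", va="center")
            else:
                ax.plot(data[c], data[r], **kwargs)
    # with sharex/sharey, plt.subplots shows tick labels only on the outer edges
    return fig, axes


np.random.seed(1977)
data = 10 * np.random.random((3, 10))   # odd number of variables
fig, axes = shared_scatterplot_matrix(data, ["mpg", "disp", "drat"],
                                      linestyle="none", marker="o",
                                      color="black", mfc="none")
```

The trade-off is that tick labels always sit on the left column and bottom row, rather than alternating between sides as in the code above.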
While reading the question I expected to see an answer including rpy. I think this is a nice option taking advantage of two beautiful languages. So here it is:
import rpy
import numpy as np
def main():
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    mpg = data[0,:]
    disp = data[1,:]
    drat = data[2,:]
    wt = data[3,:]
    rpy.set_default_mode(rpy.NO_CONVERSION)
    R_data = rpy.r.data_frame(mpg=mpg,disp=disp,drat=drat,wt=wt)
    # Figure saved as eps
    rpy.r.postscript('pairsPlot.eps')
    rpy.r.pairs(R_data,
                main="Simple Scatterplot Matrix Via RPy")
    rpy.r.dev_off()
    # Figure saved as png
    rpy.r.png('pairsPlot.png')
    rpy.r.pairs(R_data,
                main="Simple Scatterplot Matrix Via RPy")
    rpy.r.dev_off()
    rpy.set_default_mode(rpy.BASIC_CONVERSION)
if __name__ == '__main__': main()
I can't post an image to show the result :( sorry!
