Setting xticks and yticks for scatter plot matrix with pandas [duplicate] - python

I'm trying to modify the scatter_matrix plot available on Pandas.
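A simple usage would be the following plot, obtained with:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets  # assuming scikit-learn's datasets module
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# pd.tools.plotting.scatter_matrix in old pandas versions
pd.plotting.scatter_matrix(df, diagonal='kde', grid=False)
plt.show()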
I want to make several modifications, among which:
turn off the grid on all plots
rotate the x and y labels by 90 degrees
turn the ticks off
Is there a way for me to modify pandas' output without having to write my own scatter plot function? Where should I start in order to add non-existing options, fine-tuning, etc.?
Thanks !

pd.tools.plotting.scatter_matrix (pd.plotting.scatter_matrix in current pandas) returns an array of the axes it draws. The left and lower boundary axes correspond to indices [:,0] and [-1,:]. You can loop over these elements and apply any sort of modification. For example:
import textwrap
axs = pd.plotting.scatter_matrix(df, diagonal='kde')
def wrap(txt, width=8):
    '''helper function to wrap text for long labels'''
    return '\n'.join(textwrap.wrap(txt, width))
for ax in axs[:,0]: # the left boundary
    ax.grid(False, axis='both') # pass a boolean; the string 'off' is not reliably treated as False
    ax.set_ylabel(wrap(ax.get_ylabel()), rotation=0, va='center', labelpad=20)
    ax.set_yticks([])
for ax in axs[-1,:]: # the lower boundary
    ax.grid(False, axis='both')
    ax.set_xlabel(wrap(ax.get_xlabel()), rotation=90)
    ax.set_xticks([])
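If you want the grid and the ticks gone on every panel rather than only on the boundary axes, you can loop over the whole array that scatter_matrix returns. This is only a minimal sketch: the random DataFrame and its column names are made up for illustration, and diagonal='hist' is used so that SciPy is not needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Made-up data, standing in for the iris DataFrame of the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)),
                  columns=["sepal length", "sepal width", "petal length"])

axs = scatter_matrix(df, diagonal="hist")

# Every panel: no grid, no ticks.
for ax in axs.ravel():
    ax.grid(False)
    ax.set_xticks([])
    ax.set_yticks([])

# Only the boundary axes carry labels, so only those need rotating.
for ax in axs[:, 0]:
    ax.set_ylabel(ax.get_ylabel(), rotation=0, ha="right", va="center")
for ax in axs[-1, :]:
    ax.set_xlabel(ax.get_xlabel(), rotation=90)
```

Call plt.show() or plt.savefig(...) afterwards to see the result.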

Related

How can I rotate annotated seaborn heatmap data and legend?

I created a seaborn heatmap to summarize Theil's U coefficients. The data is displayed horizontally in the heatmap. Now I would like to rotate the data and the legend. I know that you can rotate the x-axis and y-axis labels in a plot, but how can I rotate the data and the legend?
This is my code:
# create a pandas DataFrame to hold the values
theilu = pd.DataFrame(index=['Y'],columns=matrix.columns)
# store the column names in the variable columns
columns = matrix.columns
# iterate through each variable
for j in range(0,len(columns)):
    # call the theil_u function on the "zipped" independent and dependent variable -> respectively x & y in the functions section
    u = theil_u(matrix['Y'].tolist(),matrix[columns[j]].tolist())
    # select the respective column needed for the output
    theilu.loc[:,columns[j]] = u
# handle NaNs, if any
theilu.fillna(value=np.nan,inplace=True)
# plot the correlation between fraud reported (y) and all other variables (x)
plt.figure(figsize=(20,1))
sns.heatmap(theilu,annot=True,fmt='.2f')
plt.show()
Here is an image of what I am looking for:
Please let me know if you need any sample data or the theil_u function to recreate the problem. Thank you.
The parameters of the annotation can be changed via annot_kws. One of them is the rotation.
Some parameters of the colorbar can be changed via the cbar_kws dict, but unfortunately the orientation of the labels isn't one of them. Therefore, you need a handle to the colorbar's ax. One way is to create an ax beforehand and pass it to sns.heatmap(..., cbar_ax=ax). An easier way is to get the handle afterwards: cbar = heatmap.collections[0].colorbar.
With this ax handle, you can change more properties of the colorbar, such as the orientation of its labels. Also, their vertical alignment can be changed to get them centered.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = np.random.rand(1, 12)
fig, ax = plt.subplots(figsize=(10,2))
heatmap = sns.heatmap(data, cbar=True, ax=ax,
annot=True, fmt='.2f', annot_kws={'rotation': 90})
cbar = heatmap.collections[0].colorbar
# heatmap.set_yticklabels(heatmap.get_yticklabels(), rotation=90)
heatmap.set_xticklabels(heatmap.get_xticklabels(), rotation=90)
cbar.ax.set_yticklabels(cbar.ax.get_yticklabels(), rotation=90, va='center')
plt.tight_layout()
plt.show()
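For completeness, the other route mentioned above (creating the colorbar axes beforehand and passing it as cbar_ax) could look roughly like this. It is a sketch under assumptions: the 30:1 width ratio and the random data are arbitrary choices, not something from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = np.random.default_rng(0).random((1, 12))

# One wide axes for the heatmap, one narrow axes reserved for the colorbar.
fig, (ax, cbar_ax) = plt.subplots(
    1, 2, figsize=(10, 2), gridspec_kw={"width_ratios": [30, 1]}
)

sns.heatmap(data, ax=ax, cbar_ax=cbar_ax,
            annot=True, fmt=".2f", annot_kws={"rotation": 90})

# The colorbar axes is an ordinary Axes, so its tick labels can be rotated.
cbar_ax.tick_params(axis="y", labelrotation=90)
```

Since you own cbar_ax from the start, there is no need to dig the colorbar out of heatmap.collections afterwards.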
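You can pass arguments to ax.text() (which is used to write the annotations) using the annot_kws= argument.
Therefore:
import seaborn as sns
import matplotlib.pyplot as plt
flights = sns.load_dataset("flights")
# keyword arguments: recent pandas versions no longer accept positional arguments here
flights = flights.pivot(index="month", columns="year", values="passengers")
fig, ax = plt.subplots(figsize=(8,8))
ax = sns.heatmap(flights, annot=True, fmt='d', annot_kws={'rotation':90})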

Using AxesGrid to plot data with different x and y range in a square [duplicate]

I want the x and y axes to be of equal length (i.e. the plot minus the legend should be square). I want to place the legend outside the plot (I have already been able to put the legend outside the box). The span of the x axis in the data (x_max - x_min) is not the same as the span of the y axis in the data (y_max - y_min).
This is the relevant part of the code that I have at the moment:
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize=15 )
plt.axis('equal')
plt.tight_layout()
The following link is an example of an output plot that I am getting : plot
How can I do this?
Would plt.axis('scaled') be what you're after? That would produce a square plot, if the data limits are of equal difference.
If they are not, you could get a square plot by setting the aspect of the axes to the ratio of xlimits and ylimits.
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1,2)
ax1.plot([-2.5, 2.5], [-4,13], "s-")
ax1.axis("scaled")
ax2.plot([-2.5, 2.5], [-4,13], "s-")
ax2.set_aspect(np.diff(ax2.get_xlim())[0]/np.diff(ax2.get_ylim())[0]) # [0] turns the 1-element arrays into scalars
plt.show()
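If your matplotlib is 3.3 or newer, Axes.set_box_aspect is another way to get a square plotting area, independent of the data limits. A minimal sketch with made-up data, keeping the legend outside as in the question:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([-2.5, 2.5], [-4, 13], "s-", label="example data")

# Height / width of the axes box in display space; 1 means square.
# The data limits themselves are left untouched.
ax.set_box_aspect(1)

ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
```

Unlike set_aspect, this does not tie the data units of x and y together, so it keeps working if the limits change later.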
One option is to set the limits manually, assuming that you know the extent of your dataset.
axes = plt.gca()
axes.set_xlim([xmin,xmax])
axes.set_ylim([ymin,ymax])
A better option would be to iterate through your data to find the maximum x- and y-coordinates, take the greater of those two numbers, add a little bit more to that value to act as a buffer, and set xmax and ymax to that new value. You can use a similar method to set xmin and ymin: instead of finding the maximums, find the minimums.
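The buffer idea described above could be sketched like this. The helper name padded_common_limits and the 5% default padding are my own choices for illustration, not anything built into matplotlib.

```python
import numpy as np

def padded_common_limits(x, y, pad_fraction=0.05):
    """Return one (low, high) pair that covers both x and y, plus a buffer.

    Hypothetical helper: takes the smaller of the two minimums and the
    greater of the two maximums, then widens the range by pad_fraction
    of its span on each side.
    """
    low = min(np.min(x), np.min(y))
    high = max(np.max(x), np.max(y))
    pad = pad_fraction * (high - low)
    return low - pad, high + pad

low, high = padded_common_limits([-2.5, 2.5], [-4, 13])
```

You would then call axes.set_xlim(low, high) and axes.set_ylim(low, high), so that both axes span the same range.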
To put the legend outside of the plot, I would look at this question: How to put the legend out of the plot

How to get rid of extra white space on subplots with shared axes?

I'm creating a plot using python 3.5.1 and matplotlib 1.5.1 that has two subplots (side by side) with a shared Y axis. A sample output image is shown below:
Notice the extra white space at the top and bottom of each set of axes. Try as I might I can't seem to get rid of it. The overall goal of the figure is to have a waterfall type plot on the left with a shared Y axes with the plot on the right.
Here's some sample code to reproduce the image above.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
# create some X values
periods = np.linspace(1/1440, 1, 1000)
# create some Y values (will be datetimes, not necessarily evenly spaced
# like they are in this example)
day_ints = np.linspace(1, 100, 100)
days = pd.to_timedelta(day_ints, 'D') + pd.to_datetime('2016-01-01')
# create some fake data for the number of points
points = np.random.random(len(day_ints))
# create some fake data for the color mesh
Sxx = np.random.random((len(days), len(periods)))
# Create the plots
fig = plt.figure(figsize=(8, 6))
# create first plot
ax1 = plt.subplot2grid((1,5), (0,0), colspan=4)
im = ax1.pcolormesh(periods, days, Sxx, cmap='viridis', vmin=0, vmax=1)
ax1.invert_yaxis()
ax1.autoscale(enable=True, axis='Y', tight=True)
# create second plot and use the same y axis as the first one
ax2 = plt.subplot2grid((1,5), (0,4), sharey=ax1)
ax2.scatter(points, days)
ax2.autoscale(enable=True, axis='Y', tight=True)
# Hide the Y axis scale on the second plot
plt.setp(ax2.get_yticklabels(), visible=False)
#ax1.set_adjustable('box-forced')
#ax2.set_adjustable('box-forced')
fig.colorbar(im, ax=ax1)
As you can see in the commented out code I've tried a number of approaches, as suggested by posts like https://github.com/matplotlib/matplotlib/issues/1789/ and Matplotlib: set axis tight only to x or y axis.
As soon as I remove the sharey=ax1 part of the second subplot2grid call the problem goes away, but then I also don't have a common Y axis.
Autoscale tends to add a buffer to the data so that all of the data points are easily visible and not part-way cut off by the axes.
Change:
ax1.autoscale(enable=True, axis='Y', tight=True)
to:
ax1.set_ylim(days.min(),days.max())
and
ax2.autoscale(enable=True, axis='Y', tight=True)
to:
ax2.set_ylim(days.min(),days.max())
To get:
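Stripped down to the essentials, the same fix looks like the sketch below. It uses plain numbers instead of datetimes and a line plot instead of pcolormesh, so it only illustrates how explicit limits behave on a shared axis, not the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.linspace(1, 100, 100)
rng = np.random.default_rng(0)

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(rng.random(len(y)), y)
ax2.scatter(rng.random(len(y)), y)

# Explicit limits replace the autoscale margins; because the y axis is
# shared, setting them on one axes applies to the other as well.
ax1.set_ylim(y.min(), y.max())
```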

Instead of grid lines on a plot, can matplotlib print grid crosses?

I want to have some grid lines on a plot, but actually full-length lines are too much/distracting, even dashed light grey lines. I went and manually did some editing of the SVG output to get the effect I was looking for. Can this be done with matplotlib? I had a look at the pyplot api for grid, and the only thing I can see that might be able to get near it are the xdata and ydata Line2D kwargs.
This cannot be done through the basic API, because the grid lines are created using only two points. The grid lines would need a 'data' point at every tick mark for there to be a marker drawn. This is shown in the following example:
import matplotlib.pyplot as plt
ax = plt.subplot(111)
ax.grid(clip_on=False, marker='o', markersize=10)
plt.savefig('crosses.png')
plt.show()
This results in:
Notice how the 'o' markers are only at the beginning and the end of the Axes edges, because the grid lines only involve two points.
You could write a method to emulate what you want, creating the cross marks using a series of Artists, but it's quicker to just leverage the basic plotting capabilities to draw the cross pattern.
This is what I do in the following example:
import matplotlib.pyplot as plt
import numpy as np
NPOINTS=100
def set_grid_cross(ax, in_back=True):
    xticks = ax.get_xticks()
    yticks = ax.get_yticks()
    xgrid, ygrid = np.meshgrid(xticks, yticks)
    kywds = dict()
    if in_back:
        kywds['zorder'] = 0
    grid_lines = ax.plot(xgrid, ygrid, 'k+', **kywds)
xvals = np.arange(NPOINTS)
yvals = np.random.random(NPOINTS) * NPOINTS
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
ax1.plot(xvals, yvals, linewidth=4)
ax1.plot(xvals, xvals, linewidth=7)
set_grid_cross(ax1)
ax2.plot(xvals, yvals, linewidth=4)
ax2.plot(xvals, xvals, linewidth=7)
set_grid_cross(ax2, in_back=False)
plt.savefig('gridpoints.png')
plt.show()
This results in the following figure:
As you can see, I take the tick marks in x and y to define a series of points where I want grid marks ('+'). I use meshgrid to turn the two 1D arrays into two 2D arrays, which corresponds to a double loop over every grid point. I plot this with the marker style '+', and I'm done... almost. By default this plots the crosses on top, so I added an extra keyword to control the drawing order: I adjust the zorder of the grid marks if they are to be drawn behind everything.*****
The example shows the left subplot where by default the grid is placed in back, and the right subplot disables this option. You can notice the difference if you follow the green line in each plot.
If you are bothered by having grid crosses on the border, you can remove the first and last tick marks for both x and y before you define the grid in set_grid_cross, like so:
xticks = ax.get_xticks()[1:-1] #< notice the slicing
yticks = ax.get_yticks()[1:-1] #< notice the slicing
xgrid, ygrid = np.meshgrid(xticks, yticks)
I do this in the following example, using a larger, different marker to make my point:
***** Thanks to the answer by @fraxel for pointing this out.
You can draw short line segments at every intersection of the tick points. It's pretty easy to do: grab the tick locations with get_ticklocs() for both axes, then loop through all combinations, drawing short line segments using axhline and axvline, thus creating a cross-hair at every intersection. I've set zorder=0 so that the cross-hairs are drawn first and end up behind the plot data. It's easy to control the color/alpha and cross-hair size. A couple of slight 'gotchas': do the plot before you get the tick locations, and note that the xmin and xmax parameters of axhline (and ymin and ymax of axvline) are fractions of the axes extent (0 to 1) rather than data coordinates, so the tick locations have to be normalised.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot((0,2,3,5,5,5,6,7,8,6,6,4,3,32,7,99), 'r-',linewidth=4)
x_ticks = ax.xaxis.get_ticklocs()
y_ticks = ax.yaxis.get_ticklocs()
for yy in y_ticks[1:-1]:
    for xx in x_ticks[1:-1]:
        plt.axhline(y=yy, xmin=xx / max(x_ticks) - 0.02,
                    xmax=xx / max(x_ticks) + 0.02, color='gray', alpha=0.5, zorder=0)
        plt.axvline(x=xx, ymin=yy / max(y_ticks) - 0.02,
                    ymax=yy / max(y_ticks) + 0.02, color='gray', alpha=0.5, zorder=0)
plt.show()
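Dividing by max(x_ticks) works when the axis starts at 0 and ends at the last tick. Since xmin/xmax of axhline and ymin/ymax of axvline are fractions of the axes extent, a somewhat more general sketch converts each tick using the current axis limits. The helper name add_cross_hairs is made up for this example, and it assumes linear axes.

```python
import matplotlib.pyplot as plt

def add_cross_hairs(ax, half_size=0.02, **line_kwargs):
    """Draw a small cross at every interior tick intersection of ax.

    Hypothetical helper; assumes linear axes. Returns the number of crosses.
    """
    x0, x1 = ax.get_xlim()
    y0, y1 = ax.get_ylim()
    # Freeze the limits so the fractions computed below stay valid.
    ax.set_xlim(x0, x1)
    ax.set_ylim(y0, y1)
    xs = [t for t in ax.get_xticks() if min(x0, x1) < t < max(x0, x1)]
    ys = [t for t in ax.get_yticks() if min(y0, y1) < t < max(y0, y1)]
    for yy in ys:
        fy = (yy - y0) / (y1 - y0)  # data coordinate -> axes fraction
        for xx in xs:
            fx = (xx - x0) / (x1 - x0)
            ax.axhline(y=yy, xmin=fx - half_size, xmax=fx + half_size, **line_kwargs)
            ax.axvline(x=xx, ymin=fy - half_size, ymax=fy + half_size, **line_kwargs)
    return len(xs) * len(ys)

fig, ax = plt.subplots()
ax.plot(range(-5, 15), "r-", linewidth=4)  # data that does not start at 0
n_crosses = add_cross_hairs(ax, color="gray", alpha=0.5, zorder=0)
```

As in the answer above, plot the data first so that the tick locations are final before the crosses are drawn.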

Is there a function to make scatterplot matrices in matplotlib?

Example of scatterplot matrix
Is there such a function in matplotlib.pyplot?
For those who do not want to define their own functions, there is a great data analysis library in Python, called Pandas, which provides the scatter_matrix() function:
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(1000, 4), columns = ['a', 'b', 'c', 'd'])
scatter_matrix(df, alpha = 0.2, figsize = (6, 6), diagonal = 'kde')
Generally speaking, matplotlib doesn't usually contain plotting functions that operate on more than one axes object (subplot, in this case). The expectation is that you'd write a simple function to string things together however you'd like.
I'm not quite sure what your data looks like, but it's quite simple to just build a function to do this from scratch. If you're always going to be working with structured or rec arrays, then you can simplify this a touch. (i.e. There's always a name associated with each data series, so you can omit having to specify names.)
As an example:
import itertools
import numpy as np
import matplotlib.pyplot as plt
def main():
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
            linestyle='none', marker='o', color='black', mfc='none')
    fig.suptitle('Simple Scatterplot Matrix')
    plt.show()
def scatterplot_matrix(data, names, **kwargs):
    """Plots a scatterplot matrix of subplots. Each row of "data" is plotted
    against other rows, resulting in a nrows by nrows grid of subplots with the
    diagonal subplots labeled with "names". Additional keyword arguments are
    passed on to matplotlib's "plot" command. Returns the matplotlib figure
    object containing the subplot grid."""
    numvars, numdata = data.shape
    fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
    fig.subplots_adjust(hspace=0.05, wspace=0.05)
    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        # Set up ticks only on one side for the "edge" subplots...
        # (ax.is_first_col() etc. were removed from recent matplotlib; the
        # SubplotSpec offers the same queries in matplotlib >= 3.4)
        ss = ax.get_subplotspec()
        if ss.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if ss.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if ss.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if ss.is_last_row():
            ax.xaxis.set_ticks_position('bottom')
    # Plot the data.
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        for x, y in [(i,j), (j,i)]:
            axes[x,y].plot(data[x], data[y], **kwargs)
    # Label the diagonal subplots...
    for i, label in enumerate(names):
        axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
                ha='center', va='center')
    # Turn on the proper x or y axes ticks.
    for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
        axes[j,i].xaxis.set_visible(True)
        axes[i,j].yaxis.set_visible(True)
    return fig
main()
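Regarding the remark about structured or rec arrays: because each field already carries a name, the names argument can be derived from the dtype. A small sketch of that idea follows; the helper split_structured and the sample records are made up for illustration.

```python
import numpy as np

def split_structured(records):
    """Turn a structured array into (data, names).

    Hypothetical helper: data has one row per field, which is the layout
    that scatterplot_matrix(data, names) expects.
    """
    names = list(records.dtype.names)
    data = np.vstack([records[name] for name in names]).astype(float)
    return data, names

# Made-up records with two named fields.
records = np.array(
    [(21.0, 160.0), (22.8, 108.0), (18.7, 360.0)],
    dtype=[("mpg", float), ("disp", float)],
)
data, names = split_structured(records)
```

The result can then be passed on as scatterplot_matrix(data, names, ...).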
You can also use Seaborn's pairplot function:
import seaborn as sns
sns.set()
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Thanks for sharing your code! You figured out all the hard stuff for us. As I was working with it, I noticed a few little things that didn't look quite right.
[FIX #1] The axis ticks weren't lining up like I would expect (i.e., in your example above, you should be able to draw a vertical and a horizontal line through any point across all plots, and the lines should cross through the corresponding point in the other plots, but as it stands this doesn't occur).
[FIX #2] If you have an odd number of variables, the bottom right corner axes doesn't pull the correct xticks or yticks. It just keeps the default 0..1 ticks.
Not a fix, but I made it optional to explicitly input names, so that it puts a default xi for variable i in the diagonal positions.
Below you'll find an updated version of your code that addresses these two points, otherwise preserving the beauty of your code.
import itertools
import numpy as np
import matplotlib.pyplot as plt
def scatterplot_matrix(data, names=None, **kwargs):
    """
    Plots a scatterplot matrix of subplots. Each row of "data" is plotted
    against other rows, resulting in a nrows by nrows grid of subplots with the
    diagonal subplots labeled with "names". Additional keyword arguments are
    passed on to matplotlib's "plot" command. Returns the matplotlib figure
    object containing the subplot grid.
    """
    numvars, numdata = data.shape
    fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
    fig.subplots_adjust(hspace=0.0, wspace=0.0)
    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        # Set up ticks only on one side for the "edge" subplots...
        # (ax.is_first_col() etc. were removed from recent matplotlib; the
        # SubplotSpec offers the same queries in matplotlib >= 3.4)
        ss = ax.get_subplotspec()
        if ss.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if ss.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if ss.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if ss.is_last_row():
            ax.xaxis.set_ticks_position('bottom')
    # Plot the data.
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        for x, y in [(i,j), (j,i)]:
            # FIX #1: this needed to be changed from ...(data[x], data[y],...)
            axes[x,y].plot(data[y], data[x], **kwargs)
    # Label the diagonal subplots...
    if not names:
        names = ['x'+str(i) for i in range(numvars)]
    for i, label in enumerate(names):
        axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
                ha='center', va='center')
    # Turn on the proper x or y axes ticks.
    for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
        axes[j,i].xaxis.set_visible(True)
        axes[i,j].yaxis.set_visible(True)
    # FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
    # correct axes limits, so we pull them from other axes
    if numvars%2:
        xlimits = axes[0,-1].get_xlim()
        ylimits = axes[-1,0].get_ylim()
        axes[-1,-1].set_xlim(xlimits)
        axes[-1,-1].set_ylim(ylimits)
    return fig
if __name__=='__main__':
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
            linestyle='none', marker='o', color='black', mfc='none')
    fig.suptitle('Simple Scatterplot Matrix')
    plt.show()
Thanks again for sharing this with us. I have used it many times! Oh, and I re-arranged the main() part of the code so that it can be a formal example code or not get called if it is being imported into another piece of code.
While reading the question I expected to see an answer including rpy. I think this is a nice option taking advantage of two beautiful languages. So here it is:
import rpy
import numpy as np
def main():
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    mpg = data[0,:]
    disp = data[1,:]
    drat = data[2,:]
    wt = data[3,:]
    rpy.set_default_mode(rpy.NO_CONVERSION)
    R_data = rpy.r.data_frame(mpg=mpg,disp=disp,drat=drat,wt=wt)
    # Figure saved as eps
    rpy.r.postscript('pairsPlot.eps')
    rpy.r.pairs(R_data,
        main="Simple Scatterplot Matrix Via RPy")
    rpy.r.dev_off()
    # Figure saved as png
    rpy.r.png('pairsPlot.png')
    rpy.r.pairs(R_data,
        main="Simple Scatterplot Matrix Via RPy")
    rpy.r.dev_off()
    rpy.set_default_mode(rpy.BASIC_CONVERSION)
if __name__ == '__main__': main()
I can't post an image to show the result :( sorry!
