How can I rotate annotated seaborn heatmap data and legend? - python

I created to a seaborn heatmap to summarize Teils_U coefficients. The data is horizontally displayed in the heatmap. Now, I would like to rotate the data and the legend. I know that you can roate the x axis and y axis labels in a plot, but how can I rotate the data and the legend ?
This is my code:
#creates padnas dataframe to hold the values
theilu = pd.DataFrame(index=['Y'],columns=matrix.columns)
#store column names in variable columns
columns = matrix.columns
#iterate through each variable
for j in range(0,len(columns)):
#call teil_u function on "ziped" independant and dependant variable -> respectivley x & y in the functions section
u = theil_u(matrix['Y'].tolist(),matrix[columns[j]].tolist())
#select respecive columns needed for output
theilu.loc[:,columns[j]] = u
#handle nans if any
theilu.fillna(value=np.nan,inplace=True)
#plot correlation between fraud reported (y) and all other variables (x)
plt.figure(figsize=(20,1))
sns.heatmap(theilu,annot=True,fmt='.2f')
plt.show()
Here an image of what I am looking for:
Please let me know if you need and sample data or the teil_u function to recreate the problem. Thank you

The parameters of the annotation can be changed via annot_kws. One of them is the rotation.
Some parameters of the colorbar can be changed via cbar_kwsdict, but the unfortunately the orientation of the labels isn't one of them. Therefore, you need a handle to the colorbar's ax. One way is to create an ax beforehand, and pass it to sns.heatmap(..., cbar_ax=ax). An easier way is to get the handle afterwards: cbar = heatmap.collections[0].colorbar.
With this ax handle, you can change more properties of the colorbar, such as the orientation of its labels. Also, their vertical alignment can be changed to get them centered.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = np.random.rand(1, 12)
fig, ax = plt.subplots(figsize=(10,2))
heatmap = sns.heatmap(data, cbar=True, ax=ax,
annot=True, fmt='.2f', annot_kws={'rotation': 90})
cbar = heatmap.collections[0].colorbar
# heatmap.set_yticklabels(heatmap.get_yticklabels(), rotation=90)
heatmap.set_xticklabels(heatmap.get_xticklabels(), rotation=90)
cbar.ax.set_yticklabels(cbar.ax.get_yticklabels(), rotation=90, va='center')
plt.tight_layout()
plt.show()

You can pass argument to ax.text() (which is used to write the annotation) using the annot_kws= argument.
Therefore:
flights = sns.load_dataset("flights")
flights = flights.pivot("month", "year", "passengers")
fig, ax = plt.subplots(figsize=(8,8))
ax = sns.heatmap(flights, annot=True, fmt='d', annot_kws={'rotation':90})

Related

Add multiple axes from different sources into same figure

I am using Python/matplotlib to create a figure whereby it has three subplots, each returned from a different 'source' or class method.
For example, I have a script called 'plot_spectra.py' that contains the Spectra() class with method Plot().
So, calling Spectra('filename.ext').Plot() will return a tuple, as per the code below:
# create the plot
fig, ax = plt.subplots()
ax.contour(xx, yy, plane, levels=cl, cmap=cmap)
ax.set_xlim(ppm_1h_0, ppm_1h_1)
ax.set_ylim(ppm_13c_0, ppm_13c_1)
# return the contour plot
return fig, ax
It is my understanding that the 'figure' is the 'window' in matplotlib, and the 'ax' is an individual plot. I would then want to say, plot three of these 'ax' objects in the same figure, but I am struggling to do so because I keep getting an empty window and I think I have misunderstood what each object actually is.
Calling:
hnca, hnca_ax = Spectra('data/HNCA.ucsf', type='sparky').Plot(plane_ppm=resi.N(), vline=resi.H())
plt.subplot(2,2,1)
plt.subplot(hnca_ax)
eucplot, barplot = PlotEucXYIntensity(scores, x='H', y='N')
plt.subplot(2,2,3)
plt.subplot(eucplot)
plt.subplot(2,2,4)
plt.subplot(barplot)
plt.show()
Ultimately, what I am trying to obtain is a single window that looks like this:
Where each plot has been returned from a different function or class method.
What 'object' do I need to return from my functions? And how do I incorporate these three objects into a single figure?
I would suggest this kind of approach, where you specify the ax on which you want to plot in the function:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
def Spectra(data, ax):
ax.plot(data)
def PlotIntensity(data, ax):
ax.hist(data)
def SeabornScatter(data, ax):
sns.scatterplot(data, data, ax = ax)
spectrum = np.random.random((1000,))
plt.figure()
ax1 = plt.subplot(1,3,1)
Spectra(spectrum, ax1)
ax2 = plt.subplot(1,3,2)
SeabornScatter(spectrum, ax2)
ax3 = plt.subplot(1,3,3)
PlotIntensity(spectrum, ax3)
plt.tight_layout()
plt.show()
You can specify the grid for the subplots in very different ways, and you probably also want to have a look on the gridspec module.
One way to do this is:
f = plt.figure()
gs = f.add_gridspec(2,2)
ax = f.add_subplot(gs[0,:])
Think of the '2,2' as adding 2 row x 2 columns.
On the third line 'gs[0,:]' is telling to add a chart on row 0, all columns. This will create the chart on the top of your top. Note that indices begin with 0 and not with 1.
To add the 'eucplot' you will have to call a different ax on row 1 and column 0:
ax2 = f.add_subplot(gs[1,0])
Lastly, the 'barplot' will go in yet a different ax on row 1, column 1:
ax3 = f.add_subplot(gs[1,1])
See this site here for further reference: Customizing Figure Layouts Using GridSpec and Other Functions

Editing the labels and position of the axis ticks on a seaborn heatmap results in an empty plot

I am trying to plot a seaborn heatmap with custom locations and labels on both axes. The dataframe looks like this:
Dataframe
I can plot this normally with seaborn.heatmap:
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(genome_freq.applymap(lambda x: np.log10(x+1)),
ax=ax)
plt.show()
Normal heatmap
I have a list of positions I'd like to set as the xticks (binned_chrom_genome_pos):
[1000000, 248000000, 491000000, 690000000, 881000000, 1062000000, 1233000000, 1392000000, 1538000000, 1679000000, 1814000000, 1948000000, 2081000000, 2195000000, 2301000000, 2402000000, 2490000000, 2569000000, 2645000000, 2709000000, 2772000000, 2819000000, 2868000000, 3023000000]
However, when I try to modify the xticks, the plot becomes empty:
plt.xticks(binned_chrom_genome_pos)
Modified heatmap
I also noticed that the x-axis labels do not correspond to the ticks specified.
Could someone assist me in plotting this properly?
why the code does what it does
ax.get_xticks() returns the positions of the ticks. You can see that they are between 0.5 and 3000. These values refer to the index of your data. Large values, set by plt.xticks, or ax.set_xticks are still interpreted as data indices. So, if you have 10 rows of data, and set xticks to [0, 1000], the data in your figure will only occupy 1% of the x-range, hence disappearing. I am not sure if I am making myself clear, so I will give an example with synthetic data:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#generating data
dic = {a:np.random.randint(0,1000,100) for a in range(0,1000000, 10000)}
genome_freq = pd.DataFrame(dic, index=range(0,1000000, 10000))
#plotting heatmaps
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
sns.heatmap(genome_freq.applymap(lambda x: np.log10(x+1)),
ax=ax1)
sns.heatmap(genome_freq.applymap(lambda x: np.log10(x+1)),
ax=ax2)
old_ticks = ax2.get_xticks()
print(np.min(old_ticks), np.max(old_ticks), len(old_ticks)) # prints 0.5 99.5 34
ax2.set_xticks([0,300]) # setting xticks with values way larger than your index squishes your data
plt.show()
what can be done to fix it
So, what you want to do, is to change the xticks based on the size of your data, and then overwrite xticklabels:
Given the new labels from your question:
new_labels = [1000000, 248000000, 491000000, 690000000, 881000000, 1062000000, 1233000000, 1392000000, 1538000000, 1679000000, 1814000000, 1948000000, 2081000000, 2195000000, 2301000000, 2402000000, 2490000000, 2569000000, 2645000000, 2709000000, 2772000000, 2819000000, 2868000000, 3023000000]
len(new_labels) # returns 24
fig, ax = plt.subplots(figsize=(4, 4))
sns.heatmap(genome_freq.applymap(lambda x: np.log10(x+1)),
ax=ax)
So, now we want 24 evenly spaced xticks between the former minimum and the former maximum. We can use np.linspace to achieve that:
old_ticks = ax.get_xticks()
new_ticks = np.linspace(np.min(old_ticks), np.max(old_ticks), len(new_labels))
ax.set_xticks(new_ticks)
ax.set_xticklabels(new_labels)
plt.show()

Matplotlib scatter legend with colors using categorical variable

I have made a simple scatterplot using matplotlib showing data from 2 numerical variables (varA and varB) with colors that I defined with a 3rd categorical string variable (col) containing 10 unique colors (corresponding to another string variable with 10 unique names), all in the same Pandas DataFrame with 100+ rows.
Is there an easy way to create a legend for this scatterplot that shows the unique colored dots and their corresponding category names? Or should I somehow group the data and plot each category in a subplot to do this? This is what I have so far:
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
varA = df['A']
varB = df['B']
col = df['Color']
plt.scatter(varA,varB, c=col, alpha=0.8)
plt.legend()
plt.show()
I had to chime in, because I could not accept that I needed a for-loop to accomplish this. It just seems really annoying and unpythonic - especially when I'm not using Pandas. However, after some searching, I found the answer. You just need to import the 'collections' package so that you can access the PathCollections class and specifically, the legend_elements() method. See implementation below:
# imports
import matplotlib.collections
import numpy as np
# create random data and numerical labels
x = np.random.rand(10,2)
y = np.random.randint(4, size=10)
# create list of categories
labels = ['type1', 'type2', 'type3', 'type4']
# plot
fig, ax = plt.subplots()
scatter = ax.scatter(x[:,0], x[:,1], c=y)
handles, _ = scatter.legend_elements(prop="colors", alpha=0.6) # use my own labels
legend1 = ax.legend(handles, labels, loc="upper right")
ax.add_artist(legend1)
plt.show()
scatterplot legend with custom labels
Source:
https://matplotlib.org/stable/gallery/lines_bars_and_markers/scatter_with_legend.html
https://matplotlib.org/stable/api/collections_api.html#matplotlib.collections.PathCollection.legend_elements
Considering, Color is the column that has all the colors and labels, you can simply do following.
colors = list(df['Color'].unique())
for i in range(0 , len(colors)):
data = df.loc[df['Color'] == colors[i]]
plt.scatter('A', 'B', data=data, color='Color', label=colors[i])
plt.legend()
plt.show()
A simple way is to group your data by color, then plot all of the data on one plot. Pandas has a built in groupby function. For example:
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
for color, group in df.groupby(['Color']):
plt.scatter(group['A'], group['B'], c=color, alpha=0.8, label=color)
plt.legend()
plt.show()
Notice that we call plt.scatter once for each grouping of data. Then we only need to call plt.legend and plt.show once all of the data is in our plot.

Adding second legend to scatter plot

Is there a way to add a secondary legend to a scatterplot, where the size of the scatter is proportional to some data?
I have written the following code that generates a scatterplot. The color of the scatter represents the year (and is taken from a user-defined df) while the size of the scatter represents variable 3 (also taken from a df but is raw data):
import pandas as pd
colors = pd.DataFrame({'1985':'red','1990':'b','1995':'k','2000':'g','2005':'m','2010':'y'}, index=[0,1,2,3,4,5])
fig = plt.figure()
ax = fig.add_subplot(111)
for i in df.keys():
df[i].plot(kind='scatter',x='variable1',y='variable2',ax=ax,label=i,s=df[i]['variable3']/100, c=colors[i])
ax.legend(loc='upper right')
ax.set_xlabel("Variable 1")
ax.set_ylabel("Variable 2")
This code (with my data) produces the following graph:
So while the colors/years are well and clearly defined, the size of the scatter is not.
How can I add a secondary or additional legend that defines what the size of the scatter means?
You will need to create the second legend yourself, i.e. you need to create some artists to populate the legend with. In the case of a scatter we can use a normal plot and set the marker accordingly.
This is shown in the below example. To actually add a second legend we need to add the first legend to the axes, such that the new legend does not overwrite the first one.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np; np.random.seed(1)
import pandas as pd
plt.rcParams["figure.subplot.right"] = 0.8
v = np.random.rand(30,4)
v[:,2] = np.random.choice(np.arange(1980,2015,5), size=30)
v[:,3] = np.random.randint(5,13,size=30)
df= pd.DataFrame(v, columns=["x","y","year","quality"])
df.year = df.year.values.astype(int)
fig, ax = plt.subplots()
for i, (name, dff) in enumerate(df.groupby("year")):
c = matplotlib.colors.to_hex(plt.cm.jet(i/7.))
dff.plot(kind='scatter',x='x',y='y', label=name, c=c,
s=dff.quality**2, ax=ax)
leg = plt.legend(loc=(1.03,0), title="Year")
ax.add_artist(leg)
h = [plt.plot([],[], color="gray", marker="o", ms=i, ls="")[0] for i in range(5,13)]
plt.legend(handles=h, labels=range(5,13),loc=(1.03,0.5), title="Quality")
plt.show()
Have a look at http://matplotlib.org/users/legend_guide.html.
It shows how to have multiple legends (about halfway down) and there is another example that shows how to set the marker size.
If that doesn't work, then you can also create a custom legend (last example).

Is there a function to make scatterplot matrices in matplotlib?

Example of scatterplot matrix
Is there such a function in matplotlib.pyplot?
For those who do not want to define their own functions, there is a great data analysis libarary in Python, called Pandas, where one can find the scatter_matrix() method:
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(1000, 4), columns = ['a', 'b', 'c', 'd'])
scatter_matrix(df, alpha = 0.2, figsize = (6, 6), diagonal = 'kde')
Generally speaking, matplotlib doesn't usually contain plotting functions that operate on more than one axes object (subplot, in this case). The expectation is that you'd write a simple function to string things together however you'd like.
I'm not quite sure what your data looks like, but it's quite simple to just build a function to do this from scratch. If you're always going to be working with structured or rec arrays, then you can simplify this a touch. (i.e. There's always a name associated with each data series, so you can omit having to specify names.)
As an example:
import itertools
import numpy as np
import matplotlib.pyplot as plt
def main():
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
linestyle='none', marker='o', color='black', mfc='none')
fig.suptitle('Simple Scatterplot Matrix')
plt.show()
def scatterplot_matrix(data, names, **kwargs):
"""Plots a scatterplot matrix of subplots. Each row of "data" is plotted
against other rows, resulting in a nrows by nrows grid of subplots with the
diagonal subplots labeled with "names". Additional keyword arguments are
passed on to matplotlib's "plot" command. Returns the matplotlib figure
object containg the subplot grid."""
numvars, numdata = data.shape
fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
fig.subplots_adjust(hspace=0.05, wspace=0.05)
for ax in axes.flat:
# Hide all ticks and labels
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
# Set up ticks only on one side for the "edge" subplots...
if ax.is_first_col():
ax.yaxis.set_ticks_position('left')
if ax.is_last_col():
ax.yaxis.set_ticks_position('right')
if ax.is_first_row():
ax.xaxis.set_ticks_position('top')
if ax.is_last_row():
ax.xaxis.set_ticks_position('bottom')
# Plot the data.
for i, j in zip(*np.triu_indices_from(axes, k=1)):
for x, y in [(i,j), (j,i)]:
axes[x,y].plot(data[x], data[y], **kwargs)
# Label the diagonal subplots...
for i, label in enumerate(names):
axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
ha='center', va='center')
# Turn on the proper x or y axes ticks.
for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
axes[j,i].xaxis.set_visible(True)
axes[i,j].yaxis.set_visible(True)
return fig
main()
You can also use Seaborn's pairplot function:
import seaborn as sns
sns.set()
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Thanks for sharing your code! You figured out all the hard stuff for us. As I was working with it, I noticed a few little things that didn't look quite right.
[FIX #1] The axis tics weren't lining up like I would expect (i.e., in your example above, you should be able to draw a vertical and horizontal line through any point across all plots and the lines should cross through the corresponding point in the other plots, but as it sits now this doesn't occur.
[FIX #2] If you have an odd number of variables you are plotting with, the bottom right corner axes doesn't pull the correct xtics or ytics. It just leaves it as the default 0..1 ticks.
Not a fix, but I made it optional to explicitly input names, so that it puts a default xi for variable i in the diagonal positions.
Below you'll find an updated version of your code that addresses these two points, otherwise preserving the beauty of your code.
import itertools
import numpy as np
import matplotlib.pyplot as plt
def scatterplot_matrix(data, names=[], **kwargs):
"""
Plots a scatterplot matrix of subplots. Each row of "data" is plotted
against other rows, resulting in a nrows by nrows grid of subplots with the
diagonal subplots labeled with "names". Additional keyword arguments are
passed on to matplotlib's "plot" command. Returns the matplotlib figure
object containg the subplot grid.
"""
numvars, numdata = data.shape
fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
fig.subplots_adjust(hspace=0.0, wspace=0.0)
for ax in axes.flat:
# Hide all ticks and labels
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
# Set up ticks only on one side for the "edge" subplots...
if ax.is_first_col():
ax.yaxis.set_ticks_position('left')
if ax.is_last_col():
ax.yaxis.set_ticks_position('right')
if ax.is_first_row():
ax.xaxis.set_ticks_position('top')
if ax.is_last_row():
ax.xaxis.set_ticks_position('bottom')
# Plot the data.
for i, j in zip(*np.triu_indices_from(axes, k=1)):
for x, y in [(i,j), (j,i)]:
# FIX #1: this needed to be changed from ...(data[x], data[y],...)
axes[x,y].plot(data[y], data[x], **kwargs)
# Label the diagonal subplots...
if not names:
names = ['x'+str(i) for i in range(numvars)]
for i, label in enumerate(names):
axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
ha='center', va='center')
# Turn on the proper x or y axes ticks.
for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
axes[j,i].xaxis.set_visible(True)
axes[i,j].yaxis.set_visible(True)
# FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
# correct axes limits, so we pull them from other axes
if numvars%2:
xlimits = axes[0,-1].get_xlim()
ylimits = axes[-1,0].get_ylim()
axes[-1,-1].set_xlim(xlimits)
axes[-1,-1].set_ylim(ylimits)
return fig
if __name__=='__main__':
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
linestyle='none', marker='o', color='black', mfc='none')
fig.suptitle('Simple Scatterplot Matrix')
plt.show()
Thanks again for sharing this with us. I have used it many times! Oh, and I re-arranged the main() part of the code so that it can be a formal example code or not get called if it is being imported into another piece of code.
While reading the question I expected to see an answer including rpy. I think this is a nice option taking advantage of two beautiful languages. So here it is:
import rpy
import numpy as np
def main():
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
mpg = data[0,:]
disp = data[1,:]
drat = data[2,:]
wt = data[3,:]
rpy.set_default_mode(rpy.NO_CONVERSION)
R_data = rpy.r.data_frame(mpg=mpg,disp=disp,drat=drat,wt=wt)
# Figure saved as eps
rpy.r.postscript('pairsPlot.eps')
rpy.r.pairs(R_data,
main="Simple Scatterplot Matrix Via RPy")
rpy.r.dev_off()
# Figure saved as png
rpy.r.png('pairsPlot.png')
rpy.r.pairs(R_data,
main="Simple Scatterplot Matrix Via RPy")
rpy.r.dev_off()
rpy.set_default_mode(rpy.BASIC_CONVERSION)
if __name__ == '__main__': main()
I can't post an image to show the result :( sorry!

Categories