Annotate heatmap with value from Pandas dataframe

I would like to annotate a heatmap with the values that I pass from a dataframe into the function below. I have looked at matplotlib.text but have not been able to get the values from my dataframe in a desired way in my heatmap. I have pasted in my function for generating a heatmap below, after that my dataframe and the output from the heatmap call. I would like to plot each value from my dataframe in the center of each cell in the heatmap.
Function for generating a heatmap:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
def heatmap_binary(df,
                   edgecolors='w',
                   #cmap=mpl.cm.RdYlGn,
                   log=False):
    width = len(df.columns)/7*10
    height = len(df.index)/7*10
    fig, ax = plt.subplots(figsize=(20,10))#(figsize=(width,height))
    cmap, norm = mcolors.from_levels_and_colors([0, 0.05, 1],['Teal', 'MidnightBlue'] ) # ['MidnightBlue', Teal]['Darkgreen', 'Darkred']
    heatmap = ax.pcolor(df ,
                        edgecolors=edgecolors, # put white lines between squares in heatmap
                        cmap=cmap,
                        norm=norm)
    ax.autoscale(tight=True) # get rid of whitespace in margins of heatmap
    ax.set_aspect('equal') # ensure heatmap cells are square
    ax.xaxis.set_ticks_position('top') # put column labels at the top
    ax.tick_params(bottom='off', top='off', left='off', right='off') # turn off ticks
    plt.yticks(np.arange(len(df.index)) + 0.5, df.index, size=20)
    plt.xticks(np.arange(len(df.columns)) + 0.5, df.columns, rotation=90, size= 15)
    # ugliness from http://matplotlib.org/users/tight_layout_guide.html
    from mpl_toolkits.axes_grid1 import make_axes_locatable
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", "3%", pad="1%")
    plt.colorbar(heatmap, cax=cax)
    plt.show()
Here is an example of my dataframe:
dataframe :
           0-5 km / h  5-40 km / h  40-80 km / h  80-120 km / h  \
NORDIC       0.113955     0.191888      0.017485      -0.277528
MIDDLE EU    0.117903     0.197084     -0.001447      -0.332677
KOREA        0.314008     0.236503     -0.067174      -0.396518
CHINA        0.314008     0.236503     -0.067174      -0.396518
           120-160 km / h  160-190 km / h  190 km / h
NORDIC          -0.054365        0.006107    0.002458
MIDDLE EU        0.002441        0.012097    0.004599
KOREA           -0.087191        0.000331    0.000040
CHINA           -0.087191        0.000331    0.000040
Generating the heatmap:
heatmap_binary(dataframe)
Any ideas?
Update to clarify my problem
I tried the proposed solution from question which has the result I'm looking for:
how to annotate heatmap with text in matplotlib?
However, I still have a problem using the matplotlib.text function for positioning the values in the heatmap:
Here is my code for trying this solution:
import matplotlib.pyplot as plt
import numpy as np
data = dataframe.values
heatmap_binary(dataframe)
for y in range(data.shape[0]):
    for x in range(data.shape[1]):
        plt.text(data[y,x] +0.05 , data[y,x] + 0.05, '%.4f' % data[y, x], #data[y,x] +0.05 , data[y,x] + 0.05
                 horizontalalignment='center',
                 verticalalignment='center',
                 color='w')
#plt.colorbar(heatmap)
plt.show()
added plot: (different coloring but same problem)

This functionality is provided by the seaborn package, which can draw a heatmap with each cell annotated with its value.
An example usage of seaborn is:
import seaborn as sns
sns.set()
# Load the example flights dataset and pivot it to wide form (months x years)
flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")
# Draw a heatmap with the numeric values in each cell
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5)

The coordinates you passed to plt.text in your for loop were wrong: you used the data values themselves, whereas the text has to be placed at the cell centres (x + 0.5, y + 0.5). You were also using plt.colorbar instead of something cleaner like fig.colorbar. Try this (it gets the job done, with no effort made to otherwise clean up the code):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
def heatmap_binary(df,
                   edgecolors='w',
                   #cmap=mpl.cm.RdYlGn,
                   log=False):
    width = len(df.columns)/7*10
    height = len(df.index)/7*10
    fig, ax = plt.subplots(figsize=(20,10))#(figsize=(width,height))
    cmap, norm = mcolors.from_levels_and_colors([0, 0.05, 1],['Teal', 'MidnightBlue'] ) # ['MidnightBlue', Teal]['Darkgreen', 'Darkred']
    heatmap = ax.pcolor(df ,
                        edgecolors=edgecolors, # put white lines between squares in heatmap
                        cmap=cmap,
                        norm=norm)
    data = df.values
    # write each value at the centre of its cell: column x, row y
    for y in range(data.shape[0]):
        for x in range(data.shape[1]):
            plt.text(x + 0.5 , y + 0.5, '%.4f' % data[y, x], #data[y,x] +0.05 , data[y,x] + 0.05
                     horizontalalignment='center',
                     verticalalignment='center',
                     color='w')
    ax.autoscale(tight=True) # get rid of whitespace in margins of heatmap
    ax.set_aspect('equal') # ensure heatmap cells are square
    ax.xaxis.set_ticks_position('top') # put column labels at the top
    ax.tick_params(bottom=False, top=False, left=False, right=False) # turn off ticks (booleans; the 'off' strings are no longer accepted)
    ax.set_yticks(np.arange(len(df.index)) + 0.5)
    ax.set_yticklabels(df.index, size=20)
    ax.set_xticks(np.arange(len(df.columns)) + 0.5)
    ax.set_xticklabels(df.columns, rotation=90, size= 15)
    # ugliness from http://matplotlib.org/users/tight_layout_guide.html
    from mpl_toolkits.axes_grid1 import make_axes_locatable
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", "3%", pad="1%")
    fig.colorbar(heatmap, cax=cax)
Then
df1 = pd.DataFrame(np.random.choice([0, 0.75], size=(4,5)), columns=list('ABCDE'), index=list('WXYZ'))
heatmap_binary(df1)
gives:

This is because you're using plt.text after you've added another axes.
The state machine will plot on the current axes, and after you've added a new one with divider.append_axes, the colorbar's axes is the current one. (Just calling plt.colorbar will not cause this, as it sets the current axes back to the original one afterwards if it creates the axes itself. If a specific axes object is passed in using the cax kwarg, it doesn't reset the "current" axes, as that's not what you'd normally want.)
Things like this are the main reason that you'll see so many people advising that you use the OO interface to matplotlib instead of the state machine interface. That way you know which axes object that you're plotting on.
For example, in your case, you could have heatmap_binary return the ax object that it creates, and then plot using ax.text instead of plt.text (and similarly for the other plotting methods).
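A minimal sketch of that suggestion follows. The function names heatmap_with_colorbar and annotate_cells and the small demo frame are made up for illustration and are not part of the original code; the point is only that the annotations go through ax.text on the returned axes, so it does not matter which axes is "current" after the colorbar axes has been appended.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable


def heatmap_with_colorbar(df):
    """Draw df as a heatmap with a colorbar and return the heatmap's axes."""
    fig, ax = plt.subplots()
    mesh = ax.pcolor(df.values, edgecolors="w")
    cax = make_axes_locatable(ax).append_axes("right", "3%", pad="1%")
    fig.colorbar(mesh, cax=cax)  # after this, cax may be the "current" axes
    return ax


def annotate_cells(ax, df, fmt="%.4f"):
    """Write each value at the centre of its cell, explicitly on the given axes."""
    data = df.values
    texts = []
    for y in range(data.shape[0]):
        for x in range(data.shape[1]):
            texts.append(ax.text(x + 0.5, y + 0.5, fmt % data[y, x],
                                 ha="center", va="center", color="w"))
    return texts


# Hypothetical demo data, only to exercise the two functions
df_demo = pd.DataFrame(np.arange(6).reshape(2, 3) / 10.0,
                       index=["NORDIC", "KOREA"], columns=list("ABC"))
ax = heatmap_with_colorbar(df_demo)
texts = annotate_cells(ax, df_demo)
```

Calling plt.show() afterwards displays the figure. Because every text object is created through ax, the labels land in the heatmap cells rather than in the colorbar.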

You can also use plotly.figure_factory to create a heatmap from a DataFrame, but you have to convert it into lists first.
import plotly.figure_factory as ff
z = your_dataframe.values.tolist()
x = your_dataframe.columns.tolist()
y = your_dataframe.index.tolist()
fig = ff.create_annotated_heatmap(z, x=x, y=y, annotation_text=z, colorscale='viridis')
# set the font size of the annotations in the heatmap
for i in range(len(fig.layout.annotations)):
    fig.layout.annotations[i].font.size = 12
# show your Heatmap
fig.show()

Related

How to customize seaborn boxplot with specific color sequence when boxplots have hue

I want to make boxplots with hues, but I want to color-code them so that each specific X string has a certain color, with the hue just being a lighter shade of that color. I am able to do this for a boxplot without a hue. When I incorporate the hue, I get the second boxplot, which loses the colors. Can someone help me customize the colors for the figure that contains the hue?
Essentially, it is what the answer to this question does, but with boxplots.
This is my code:
first boxplot
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax,palette=color_dict)
sns.stripplot(ax=ax,y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs,palette=color_dict,
jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
second boxplot
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax, hue=df_idrs["location"])
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs["location"],
jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
The only thing that changed was the palette to hue. I have seen many examples on here but I am unable to get them to work. Using the second code, I have tried the following:
Nothing happens for this one.
for ind, bp in enumerate(ax.findobj(PolyCollection)):
    rgb = to_rgb(colors[ind // 2])
    if ind % 2 != 0:
        rgb = 0.5 + 0.5 * np.array(rgb) # make whiter
    bp.set_facecolor(rgb)
I get index out of range for the following one.
for i in range(0,4):
    mybox = bp.artists[i]
    mybox.set_facecolor(color_dict[order[i]])

Matplotlib stores the boxes in ax.patches, but there are also 2 dummy patches (used to construct the legend) that need to be filtered away. The dots of the stripplot are stored in ax.collections. There are also 2 dummy collections for the legend, but as those come at the end, they don't form a problem.
Some remarks:
sns.boxplot returns the subplot on which it was drawn; as it is called with ax=ax it will return that same ax
Setting jitter=1 in the stripplot will smear the dots over a width of 1. 1 is the distance between the x positions, and the boxes are only 0.4 wide. To avoid clutter, the code below uses jitter=0.4.
Here is some example code starting from dummy test data:
from matplotlib import pyplot as plt
from matplotlib.legend_handler import HandlerTuple
from matplotlib.patches import PathPatch
from matplotlib.colors import to_rgb
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230215)
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
hue_order = ['A', 'B']
df_idrs = pd.DataFrame({'construct': np.repeat(order, 200),
'Norm_Ef_IDR/Ef_GS': (np.random.normal(0.03, 1, 800).cumsum() + 10) / 15,
'location': np.tile(np.repeat(hue_order, 100), 4)})
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df_idrs, x=df_idrs['construct'], y=df_idrs['Norm_Ef_IDR/Ef_GS'], hue='location',
            order=order, hue_order=hue_order, ax=ax)
box_colors = [f + (1 - f) * np.array(to_rgb(c)) # whiten colors depending on hue
              for c in colors for f in np.linspace(0, 0.5, len(hue_order))]
box_patches = [p for p in ax.patches if isinstance(p, PathPatch)]
for patch, color in zip(box_patches, box_colors):
    patch.set_facecolor(color)
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs['location'],
              jitter=0.4, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True, ax=ax)
for collection, color in zip(ax.collections, box_colors):
    collection.set_facecolor(color)
ax.axhline(y=1, linestyle='--', color='black', linewidth=2)
handles = [tuple(box_patches[i::len(hue_order)]) for i in range(len(hue_order))]
ax.legend(handles=handles, labels=hue_order, title='hue category',
handlelength=4, handler_map={tuple: HandlerTuple(ndivide=None, pad=0)},
loc='upper left', bbox_to_anchor=(1.01, 1))
plt.tight_layout()
plt.show()
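The whitening step in the code above can be looked at in isolation. The sketch below pulls the expression f + (1 - f) * rgb into a helper; the name whiten is made up here and is not a matplotlib or seaborn function. With f = 0 the color is unchanged, with f = 1 it becomes pure white, and the list comprehension produces one color per (construct, hue) pair in the same order as the boxes.

```python
import numpy as np
from matplotlib.colors import to_rgb


def whiten(color, f):
    """Blend a color towards white: f=0 keeps it, f=1 gives pure white."""
    return f + (1 - f) * np.array(to_rgb(color))


colors = ['gold', 'teal', 'darkorange', 'royalblue']
hue_order = ['A', 'B']

# One entry per box: the base color for the first hue level,
# a half-whitened version for the second.
box_colors = [whiten(c, f)
              for c in colors
              for f in np.linspace(0, 0.5, len(hue_order))]
```

With more hue levels, np.linspace spreads the whitening factor evenly between 0 and 0.5, so each additional level gets a progressively lighter shade.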

How to increase plottable space above a subplot in matplotlib?

I am currently making a plot on matplotlib, which looks like below.
The code for which is:
fig, ax1 = plt.subplots(figsize=(20,5))
ax2 = ax1.twinx()
# plt.subplots_adjust(top=1.4)
ax2.fill_between(dryhydro_df['Time'],dryhydro_df['Flow [m³/s]'],0,facecolor='lightgrey')
ax2.set_ylim([0,10])
AB = ax2.fill_between(dryhydro_df['Time'],[12]*len(dryhydro_df['Time']),9.25,facecolor=colors[0],alpha=0.5,clip_on=False)
ab = ax2.scatter(presence_df['Datetime'][presence_df['AB']==True],[9.5]*sum(presence_df['AB']==True),marker='X',color='black')
# tidal heights
ax1.plot(tide_df['Time'],tide_df['Tide'],color='dimgrey')
I want the blue shaded region and black scatter to be above the plot. I can move the elements above the plot by using clip_on=False, but I think I need to extend the space above the plot to visualise it. Is there a way to do this? A mock-up of what I need is below:

You can use clip_on=False to draw outside the main plot. To position the elements, an xaxis transform helps. That way, x-values can be used in the x direction, while the y-direction uses "axes coordinates". ax.transAxes (an attribute, not a method) uses "axes coordinates" for both directions.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
dates = pd.date_range('2018-07-01', '2018-07-31', freq='H')
xs = dates.to_numpy().astype(float)
ys = np.sin(xs * .091) * (np.sin(xs * .023) ** 2 + 1)
fig, ax1 = plt.subplots(figsize=(20, 5))
ax1.plot(dates, ys)
ax1.scatter(np.random.choice(dates, 10), np.repeat(1.05, 10), s=20, marker='*', transform=ax1.get_xaxis_transform(),
clip_on=False)
ax1.plot([0, 1], [1.05, 1.05], color='steelblue', lw=20, alpha=0.2, transform=ax1.transAxes, clip_on=False)
plt.tight_layout() # fit labels etc. nicely
plt.subplots_adjust(top=0.9) # make room for the additional elements
plt.show()
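The difference between the two transforms used above can be checked directly. This is only an illustrative sketch with arbitrary axis limits: it converts a point to display (pixel) coordinates through ax.get_xaxis_transform() and compares the result with ax.transData and ax.transAxes.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(-3, 7)

# Blended transform: x is read in data units, y as a fraction of the axes height
blended = ax.get_xaxis_transform()
x_disp, y_disp = blended.transform((5, 1.0))

# Reference points: the same x through the data transform,
# and the top-right corner of the axes through transAxes
x_data_disp, _ = ax.transData.transform((5, 0))
axes_right, axes_top = ax.transAxes.transform((1, 1))
```

Here x_disp matches x_data_disp and y_disp matches axes_top, whatever the y-limits are. That is why y=1.05 in the answer puts the markers just above the plot area without having to know the data range.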

Is there a Python package for plotting a spike map

A spike map (as shown in the image below, implemented with D3.js) is a method for displaying differences in the magnitude of a certain discrete, abruptly changing phenomenon such as counts of people.
Is there a package I could use (or example code I could follow) to create a static spike map, similar to the map shown above, in Python? e.g. Matplotlib

You could try with a Ridge Plot. It's not exactly the same, but maybe it can work for you. The implementation in seaborn looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x",
bw_adjust=.5, clip_on=False,
fill=True, alpha=1, linewidth=1.5)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_adjust=.5)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
And creates the following graph
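If the spikes themselves are wanted rather than a ridge plot, they can be drawn in plain matplotlib as narrow triangles. The following is only a rough sketch under stated assumptions: spike_polygons is a made-up helper, the locations and counts are invented, and no map outlines are drawn. Each point gets a triangle whose base is centred on the point and whose height is proportional to its value.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection


def spike_polygons(xs, ys, values, width=0.6, max_height=4.0):
    """Return one triangle per point: base centred on (x, y), height scaled to the value."""
    values = np.asarray(values, dtype=float)
    heights = max_height * values / values.max()
    return [[(x - width / 2, y), (x, y + h), (x + width / 2, y)]
            for x, y, h in zip(xs, ys, heights)]


# Made-up locations and counts, purely for illustration
xs = [1.0, 3.0, 4.5, 7.0]
ys = [2.0, 1.0, 3.0, 2.5]
counts = [120, 900, 300, 450]

polys = spike_polygons(xs, ys, counts)
fig, ax = plt.subplots()
spikes = PolyCollection(polys, facecolors="red", edgecolors="darkred", alpha=0.5)
ax.add_collection(spikes)
ax.set_xlim(0, 8)
ax.set_ylim(0, 8)
```

Calling plt.show() displays the result. On a real map, xs and ys would be projected coordinates, and the spikes would be added on top of whatever draws the region outlines.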

Python - dual y axis chart, align zero

I'm trying to create a horizontal bar chart, with dual x axes. The 2 axes are very different in scale, 1 set goes from something like -5 to 15 (positive and negative value), the other set is more like 100 to 500 (all positive values).
When I plot this, I'd like to align the 2 axes so zero shows at the same position, and only the negative values are to the left of this. Currently the set with all positive values starts at the far left, and the set with positive and negative starts in the middle of the overall plot.
I found the align_yaxis example, but I'm struggling to align the x axes.
Matplotlib bar charts: Aligning two different y axes to zero
Here is an example of what I'm working on with simple test data. Any ideas/suggestions? thanks
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
ax.invert_yaxis()
plt.show()

I followed the link from the question and eventually ended up at this answer: https://stackoverflow.com/a/10482477/5907969
The answer has a function to align the y-axes and I have modified the same to align x-axes as follows:
def align_xaxis(ax1, v1, ax2, v2):
    """adjust ax2 xlimit so that v2 in ax2 is aligned to v1 in ax1"""
    x1, _ = ax1.transData.transform((v1, 0))
    x2, _ = ax2.transData.transform((v2, 0))
    inv = ax2.transData.inverted()
    dx, _ = inv.transform((0, 0)) - inv.transform((x1-x2, 0))
    minx, maxx = ax2.get_xlim()
    ax2.set_xlim(minx+dx, maxx+dx)
And then use it within the code as follows:
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
align_xaxis(ax,0,ax2,0)
ax.invert_yaxis()
plt.show()
This will give you what you're looking for
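To see what align_xaxis does, the alignment can be checked numerically. The sketch below repeats the function so that it runs on its own, and uses explicit axis limits instead of the bar chart; the limits (-5 to 15 and 0 to 1000) are arbitrary values chosen for illustration. It compares the display (pixel) x position of zero on both axes after the call.

```python
import matplotlib.pyplot as plt


def align_xaxis(ax1, v1, ax2, v2):
    """Shift ax2's x-limits so that v2 on ax2 lines up with v1 on ax1."""
    x1, _ = ax1.transData.transform((v1, 0))
    x2, _ = ax2.transData.transform((v2, 0))
    inv = ax2.transData.inverted()
    dx, _ = inv.transform((0, 0)) - inv.transform((x1 - x2, 0))
    minx, maxx = ax2.get_xlim()
    ax2.set_xlim(minx + dx, maxx + dx)


fig, ax = plt.subplots()
ax2 = ax.twiny()
ax.set_xlim(-5, 15)    # zero sits a quarter of the way across
ax2.set_xlim(0, 1000)  # zero sits at the left edge

align_xaxis(ax, 0, ax2, 0)

# Pixel position of zero on each axis, and the shifted limits of ax2
zero_on_ax, _ = ax.transData.transform((0, 0))
zero_on_ax2, _ = ax2.transData.transform((0, 0))
new_limits = ax2.get_xlim()
```

After the call, zero_on_ax and zero_on_ax2 agree and the limits of ax2 have moved from (0, 1000) to (-250, 750). Note that the function shifts the limits without rescaling them, so the all-positive axis gains an empty region to the left of zero.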

Matplotlib subplot: imshow + plot

I want to create a plot that looks like the image below. There are two unique plots in the figure. img1 was generated using plt.imshow(), while img2 was generated using plt.plot(). The code I used to generate each of the plots is provided below
plt.clf()
plt.imshow(my_matrix)
plt.savefig("mymatrix.png")
plt.clf()
plt.plot(x,y,'o-')
plt.savefig("myplot.png")
The matrix used in img1 is 64x64, and img2's x-axis covers the same range (x=range(64)). Ideally, the axes of the two img2 plots align with the axes of img1.
It is important to note that the final image is reminiscent of seaborn's jointplot(), but the marginal subplots (img2) in the image below do not show distribution plots.

You can use the make_axes_locatable functionality of the mpl_toolkits.axes_grid1 to create shared axes along both directions of the central imshow plot.
Here is an example:
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np; np.random.seed(0)
Z = np.random.poisson(lam=6, size=(64,64))
x = np.mean(Z, axis=0)
y = np.mean(Z, axis=1)
fig, ax = plt.subplots()
ax.imshow(Z)
# create new axes on the right and on the top of the current axes.
divider = make_axes_locatable(ax)
axtop = divider.append_axes("top", size=1.2, pad=0.3, sharex=ax)
axright = divider.append_axes("right", size=1.2, pad=0.4, sharey=ax)
#plot to the new axes
axtop.plot(np.arange(len(x)), x, marker="o", ms=1, mfc="k", mec="k")
axright.plot(y, np.arange(len(y)), marker="o", ms=1, mfc="k", mec="k")
#adjust margins
axright.margins(y=0)
axtop.margins(x=0)
plt.tight_layout()
plt.show()
