I would like to guide the reader's attention to just some columns (or rows and columns) in a heatmap, while still retaining the full context.
I can use alt.condition to alter color and opacity. Both work to some extent. But changes in opacity visualize in a similar way as changes in value. And using a different color changes the perception of values. What I would like to do instead is to put yellow or red borders around the consecutive columns I want to highlight.
This is what I have now. Any other ideas?
import altair as alt
alt.data_transformers.disable_max_rows()
def create_att_chart(df, keys_to_highlight=[], width=150, height=150, title=None, labels_x=True, labels_y=True):
    properties = {}
    if title:
        properties['title'] = title
    if width: properties['width'] = width
    if height: properties['height'] = height
    chart = alt.Chart(df).mark_rect().encode(
        x=alt.X('k:N', sort=None, axis=alt.Axis(labels=labels_x, title=None, ticks=False), title=None),
        y=alt.Y('q:N', sort=None, axis=alt.Axis(labels=labels_y, title=None, ticks=False), title=None),
        opacity=alt.Opacity('a:Q', legend=None),
        column=alt.Column('h:N', title=None, header=alt.Header(labels=False), spacing=0.),
        row=alt.Row('l:N', title=None, header=alt.Header(labels=False), spacing=5.))
    if keys_to_highlight:
        chart = chart.encode(
            color=alt.condition(
                alt.Predicate(alt.FieldOneOfPredicate(field='k', oneOf=keys_to_highlight)),
                alt.value('orange'),
                alt.value('blue')))
    else:
        chart = chart.encode(color=alt.value('blue'))
    return chart.properties(**properties)
[..]
((create_att_chart(df_pt, ['sage', '##maker'], title='Pre-Trained') | create_att_chart(df_ft, ['sage', '##maker'], title='Fine-Tuned', labels_y=False)).properties(padding=0))
You could try using the condition for the stroke encoding instead of color, but I think that would give you strokes around each box, which is probably not what you want. Instead, you could layer mark_rule or mark_rect on top of this heatmap example from the docs:
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'z': z.ravel()})
heatmap = alt.Chart(source).mark_rect().encode(
x='x:O',
y='y:O',
color=alt.Color('z:Q', scale=alt.Scale(scheme='blues')))
Now add the rules:
rule1 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(20))
rule2 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(60))
heatmap + rule1 + rule2
A top rule might be more appealing/elegant and you could add text above it with mark_text if needed:
rule1 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=3).encode(
y=alt.value(-5),
x=alt.value(20),
x2=alt.value(60))
heatmap + rule1
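The offsets 20 and 60 passed to alt.value are pixel positions, not data values. Here is a minimal sketch of how they could be derived from the column labels, assuming Altair's default step of 20 px per ordinal category and no padding between rect bands (both are assumptions about the chart configuration, and band_edges is a hypothetical helper, not part of Altair):

```python
def band_edges(categories, highlighted, step=20):
    """Return (left, right) pixel offsets enclosing the highlighted categories.

    Assumes the categories are laid out left to right in the given order,
    each occupying `step` pixels, with no padding between bands.
    """
    positions = [categories.index(c) for c in highlighted]
    left = min(positions) * step
    right = (max(positions) + 1) * step
    return left, right


# The x values of the example heatmap run from -5 to 4.
columns = list(range(-5, 5))
left, right = band_edges(columns, [-4, -3])
print(left, right)  # 20 60
```

The two numbers can then be used as x=alt.value(left) and x2=alt.value(right). If the chart sets an explicit width, step would be width / len(columns) under the same no-padding assumption.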
mark_rect works but adds the lines in the middle of the squares, since the scale is ordinal and a quantitative mark_rect messes up the axis:
df = pd.DataFrame({'x': [0], 'x2': [3]})
box = alt.Chart(df).mark_rect(color='', stroke='orange', strokeWidth=2).encode(
x='x:O',
x2=alt.X2('x2:O', title='x'))
heatmap + box
If you try to add the lines in between, new ordinal axis marks will be created. You could abuse this and make the lines white to highlight by separation, but the ticks on the axis are still there, so you would have to remove them with labelExpr or similar.
df = pd.DataFrame({'x': [0.5], 'x2': [3.5]})
box = alt.Chart(df).mark_rect(color='', stroke='white').encode(
x='x:O',
x2=alt.X2('x2:O', title='x'))
(heatmap + box).configure_view(stroke=None)
Related
When layered above a heatmap, the Altair scatterplot only seems to work if the point values are also on the axis of the heatmap. In any other case, white lines along the x and y values are added. Here's a minimal example:
import streamlit as st
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
'y': y.ravel(),
'z': z.ravel()})
c = alt.Chart(source).mark_rect().encode(
x='x:O',
y='y:O',
color='z:Q'
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:O',
y='y:O')
st.altair_chart(c + s)
Is there any way to prevent this behavior? I'd like to animate the points later on, so adding values to the heatmap axis is not an option.
Ordinal encodings (marked by :O) will always create a discrete axis with one bin per unique value. It sounds like you would like to visualize your data with a quantitative encoding (marked by :Q), which creates a continuous, real-valued axis.
In the case of the heatmap, though, this complicates things: if you're no longer treating the data as ordered categories, you must specify the starting and ending point for each bin along each axis. This requires some thought about what your bins represent: does the value "2" represent numbers spanning from 2 to 3? from 1 to 2? from 1.5 to 2.5? The answer will depend on context.
Here is an example of computing these bin boundaries using a calculate transform, assuming the values represent the center of unit bins:
c = alt.Chart(source).transform_calculate(
x1=alt.datum.x - 0.5,
x2=alt.datum.x + 0.5,
y1=alt.datum.y - 0.5,
y2=alt.datum.y + 0.5,
).mark_rect().encode(
x='x1:Q', x2='x2:Q',
y='y1:Q', y2='y2:Q',
color='z:Q'
).properties(
width=400, height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:Q',
y='y:Q'
)
st.altair_chart(c + s)
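The three readings of the value "2" mentioned above (2 to 3, 1 to 2, 1.5 to 2.5) can be made concrete with a small helper. This is only a sketch: bin_edges and its anchor parameter are hypothetical names, not part of Altair.

```python
def bin_edges(value, width=1.0, anchor="center"):
    """Return (start, end) of the bin that `value` labels.

    anchor="left":   value marks the bin's lower edge
    anchor="right":  value marks the bin's upper edge
    anchor="center": value marks the bin's midpoint
    """
    if anchor == "left":
        return value, value + width
    if anchor == "right":
        return value - width, value
    if anchor == "center":
        return value - width / 2, value + width / 2
    raise ValueError(f"unknown anchor: {anchor!r}")


for anchor in ("left", "right", "center"):
    print(anchor, bin_edges(2, anchor=anchor))
```

The calculate transform above corresponds to anchor="center" with a bin width of 1; the other two conventions would only change the offsets added to alt.datum.x and alt.datum.y.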
Alternatively, if you would like this binning to happen more automatically, you can use a bin transform on each axis:
c = alt.Chart(source).mark_rect().encode(
x=alt.X('x:Q', bin=True),
y=alt.Y('y:Q', bin=True),
color='z:Q'
).properties(
width=400,
height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:Q',
y='y:Q'
)
I'm trying to create an overlay plot of a binary grid map and a simple line plot. However, when creating a layered plot, the axes are not aligned and the plot becomes unreadable. Ideally, I'd like to have both plots share a single set of axes so that the line coordinates match the map coordinates.
Here's a basic snippet of my attempt:
import torch as th
import altair as alt
import pandas as pd
xv, yv = th.meshgrid(th.linspace(-10, 10, 100), th.linspace(-10, 10, 100))
o_map = th.zeros_like(xv)
o_map[40:60, 40:60] = 1 # add obstacle centred on origin
map_df = pd.DataFrame(
{"x": xv.flatten(), "y": yv.flatten(), "z": o_map.flatten()}
)
map_chart = (
alt.Chart(map_df)
.mark_rect()
.encode(
x=alt.X("x:O", axis=alt.Axis(format=".2")),
y=alt.Y("y:O", axis=alt.Axis(format=".2")),
color="z:N",
)
.properties(width=500, height=500)
)
x = th.linspace(-5, 10, 100)
line_df = pd.DataFrame({"x": x, "y": 0.2 * x ** 2 - 3})
line_chart = alt.Chart(line_df).mark_line(color="red").encode(x="x:Q", y="y:Q")
layer_chart = map_chart + line_chart
The resulting plots are as follows (images not preserved): line plot, binary map, layered plot.
If you change the x and y channel data type in map_chart from 'O' to 'Q', the axes should be aligned automatically.
...
map_chart = (
alt.Chart(map_df)
.mark_rect()
.encode(
x=alt.X("x:Q"),
y=alt.Y("y:Q"),
color="z:N",
)
.properties(width=500, height=500)
)
...
Update: use mark_square instead of mark_rect
It seems that the quantitative data type doesn't play very well with the rect mark: in the official heatmap example, for instance, changing the type of x and y to quantitative makes the heatmap look wrong.
So if rect is not a must-have mark, I would suggest you choose square. As long as your grid is dense enough and the marker size is large enough, there won't be empty gaps left between markers, which is effectively what you want.
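How large is "large enough"? In Vega-Lite the size of a square mark is its area in square pixels, so a rough estimate follows from the pixel distance between neighbouring grid points. The sketch below assumes the grid points are evenly spaced and that the first and last points sit on the two ends of the axis; square_size is a hypothetical helper.

```python
def square_size(chart_px, n_points):
    """Pixel area for a square mark that just touches its neighbours.

    Assumes n_points evenly spaced values whose first and last values
    sit on the two ends of an axis that is chart_px pixels long.
    """
    step = chart_px / (n_points - 1)  # pixel distance between adjacent grid points
    return step ** 2


size = square_size(500, 100)
print(round(size, 1))  # 25.5
```

For the 100 x 100 grid on a 500 px chart, something like .mark_square(size=26) or slightly larger would be a starting point; if the axis domain is padded beyond the data, the required size shrinks accordingly.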
Using matplotlib, we can "trivially" fill the area between two vertical lines using fill_between() as in the example:
https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/fill_between_demo.html#selectively-marking-horizontal-regions-across-the-whole-axes
Using matplotlib, I can make what I need:
We have two signals, and I'm computing the rolling/moving Pearson's and Spearman's correlation. When the correlations go either below -0.5 or above 0.5, I want to shade the period (blue for Pearson's and orange for Spearman's). I also darken the weekends in gray in all plots.
However, I'm having a hard time accomplishing the same using Plotly. It would also be helpful to know how to do it between two horizontal lines.
Note that I'm using Plotly and Dash to speed up the visualization of several plots. Users asked for a more "dynamic type of thing." However, I'm not a GUI guy and cannot spend time on this, although I need to feed them with initial results.
BTW, I tried Bokeh in the past, and I gave up for some reason I cannot remember. Plotly looks good since I can use either from Python or R, which are my main development tools.
Thanks,
Carlos
I don't think there is any built-in Plotly method that is equivalent to matplotlib's fill_between() method. However, you can draw shapes, so a possible workaround is to draw a grey rectangle and set the parameter layer="below" so that the signal is still visible. You can also set the coordinates of the rectangle outside of your axis range to ensure the rectangle extends to the edges of the plot.
You can fill the area in between horizontal lines by drawing a rectangle and setting the axes ranges in a similar manner.
import numpy as np
import plotly.graph_objects as go
x = np.arange(0, 4 * np.pi, 0.01)
y = np.sin(x)
fig = go.Figure()
fig.add_trace(go.Scatter(
x=x,
y=y
))
# hard-code the axes
fig.update_xaxes(range=[0, 4 * np.pi])
fig.update_yaxes(range=[-1.2, 1.2])
# specify the corners of the rectangles
fig.update_layout(
shapes=[
dict(
type="rect",
xref="x",
yref="y",
x0=4,
y0=-1.3,
x1=5,
y1=1.3,
fillcolor="lightgray",
opacity=0.4,
line_width=0,
layer="below"
),
dict(
type="rect",
xref="x",
yref="y",
x0=9,
y0=-1.3,
x1=10,
y1=1.3,
fillcolor="lightgray",
opacity=0.4,
line_width=0,
layer="below"
),
]
)
fig.show()
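Since the two shape dictionaries differ only in x0 and x1, they could also be generated from a list of intervals. The following is a sketch with a hypothetical helper, vertical_bands; it uses yref="paper" with y0=0 and y1=1 so the band spans the full plot height without hard-coding the y range, and it only builds plain dictionaries, so it does not need Plotly to run.

```python
def vertical_bands(intervals, fillcolor="lightgray", opacity=0.4):
    """Build one Plotly-style rect shape dict per (x0, x1) interval."""
    return [
        dict(
            type="rect",
            xref="x", x0=x0, x1=x1,    # horizontal extent in data units
            yref="paper", y0=0, y1=1,  # full height of the plotting area
            fillcolor=fillcolor,
            opacity=opacity,
            line=dict(width=0),        # hide the outline
            layer="below",
        )
        for x0, x1 in intervals
    ]


shapes = vertical_bands([(4, 5), (9, 10)])
print(len(shapes), shapes[0]["x0"], shapes[0]["x1"])
```

The result would be passed as fig.update_layout(shapes=shapes). For bands between two horizontal lines, the roles swap: xref="paper" with x0=0, x1=1, and yref="y" with the two y values.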
You haven't provided a data sample, so I'm going to use a synthetic time series to show you how you can add a number of shapes with defined start and stop dates for several different categories using a custom function, bgLevels.
Two vertical lines with a fill between them very quickly turn into a rectangle. And rectangles can easily be added as shapes using fig.add_shape. The example below will show you how to find start and stop dates for periods given by certain criteria. In your case, these criteria are whether or not the value of a variable is higher or lower than a certain level.
Using shapes instead of traces with fig.add_trace() will let you define the position with regards to plot layers using layer='below'. And the shape outlines can easily be hidden using line=dict(color="rgba(0,0,0,0)").
Plot 1: Time series figure with random data:
Plot 2: Background is set to a semi-transparent grey when A > 100:
Plot 3: Background is also set to a semi-transparent red when D < -60:
Complete code:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
pd.set_option('display.max_rows', None)
# data sample
nperiods = 200
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10, 12, size=(nperiods, 4)),
columns=list('ABCD'))
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
# function to set background color for a
# specified variable and a specified level
# plotly setup
fig = px.line(df, x='dates', y=df.columns[1:])
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
def bgLevels(fig, variable, level, mode, fillcolor, layer):
    """
    Set a specified color as background for given
    levels of a specified variable using a shape.
    Keyword arguments:
    ==================
    fig -- plotly figure
    variable -- column name in a pandas dataframe
    level -- int or float
    mode -- set threshold above or below
    fillcolor -- any color type that plotly can handle
    layer -- position of shape in plotly figure, like "below"
    """
    if mode == 'above':
        m = df[variable].gt(level)
    if mode == 'below':
        m = df[variable].lt(level)
    # group consecutive rows where the mask is True and take each group's first and last date
    df1 = df[m].groupby((~m).cumsum())['dates'].agg(['first','last'])
    for index, row in df1.iterrows():
        #print(row['first'], row['last'])
        fig.add_shape(type="rect",
                      xref="x",
                      yref="paper",
                      x0=row['first'],
                      y0=0,
                      x1=row['last'],
                      y1=1,
                      line=dict(color="rgba(0,0,0,0)",width=3,),
                      fillcolor=fillcolor,
                      layer=layer)
    return fig
fig = bgLevels(fig = fig, variable = 'A', level = 100, mode = 'above',
fillcolor = 'rgba(100,100,100,0.2)', layer = 'below')
fig = bgLevels(fig = fig, variable = 'D', level = -60, mode = 'below',
fillcolor = 'rgba(255,0,0,0.2)', layer = 'below')
fig.show()
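The key step in bgLevels is the groupby on (~m).cumsum(): the counter increases on every row where the condition is false, so all consecutive rows where it is true share one group id, and the first and last dates of each group are the start and stop of a shaded period. The same idea, sketched without pandas (runs_where is a hypothetical helper and the toy data is made up for illustration):

```python
from itertools import groupby


def runs_where(dates, values, predicate):
    """Return (first_date, last_date) for each consecutive run where predicate(value) is true."""
    runs = []
    flagged = zip(dates, (predicate(v) for v in values))
    for is_hit, group in groupby(flagged, key=lambda pair: pair[1]):
        if is_hit:
            members = [d for d, _ in group]
            runs.append((members[0], members[-1]))
    return runs


days = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
series = [90, 105, 110, 95, 101, 102, 99]
print(runs_where(days, series, lambda v: v > 100))
# [('d2', 'd3'), ('d5', 'd6')]
```

Each returned pair plays the role of row['first'] and row['last'] in the fig.add_shape call.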
I think that fig.add_hrect() and fig.add_vrect() are the simplest approaches to reproducing the Matplotlib fill_between functionality in this case:
https://plotly.com/python/horizontal-vertical-shapes/
For your example, add_vrect() should do the trick.
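For the weekend shading mentioned in the question, the intervals to pass to add_vrect() could be computed with the standard library. This is a sketch under the assumption that a weekend runs from the start of Saturday to the start of Monday; weekend_spans is a hypothetical helper, and the add_vrect call in the comment assumes a Plotly version that provides it (4.12 or later).

```python
from datetime import date, timedelta


def weekend_spans(start, end):
    """Return (saturday, monday) date pairs for every weekend starting in [start, end]."""
    spans = []
    day = start
    while day <= end:
        if day.weekday() == 5:  # Monday is 0, so 5 is Saturday
            spans.append((day, day + timedelta(days=2)))
        day += timedelta(days=1)
    return spans


spans = weekend_spans(date(2020, 1, 1), date(2020, 1, 14))
print(spans)
# Each pair could then be drawn with something like:
# for x0, x1 in spans:
#     fig.add_vrect(x0=x0, x1=x1, fillcolor="gray", opacity=0.2,
#                   line_width=0, layer="below")
```

A range that begins on a Sunday would miss that first half weekend; handling it is left out to keep the sketch short.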
I am translating a set of R visualizations to Python. I have the following target R multiple plot histograms:
Using Matplotlib and Seaborn combination and with the help of a kind StackOverflow member (see the link: Python Seaborn Distplot Y value corresponding to a given X value), I was able to create the following Python plot:
I am satisfied with its appearance, except, I don't know how to put the Header information in the plots. Here is my Python code that creates the Python Charts
""" Program to draw the sampling histogram distributions """
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns
def main():
    """ Main routine for the sampling histogram program """
    sns.set_style('whitegrid')
    markers_list = ["s", "o", "*", "^", "+"]
    # create the data dataframe as df_orig
    df_orig = pd.read_csv('lab_samples.csv')
    df_orig = df_orig.loc[df_orig.hra != -9999]
    hra_list_unique = df_orig.hra.unique().tolist()
    # create and subset df_hra_colors to match the actual hra colors in df_orig
    df_hra_colors = pd.read_csv('hra_lookup.csv')
    df_hra_colors['hex'] = np.vectorize(rgb_to_hex)(df_hra_colors['red'], df_hra_colors['green'], df_hra_colors['blue'])
    df_hra_colors.drop(labels=['red', 'green', 'blue'], axis=1, inplace=True)
    df_hra_colors = df_hra_colors.loc[df_hra_colors['hra'].isin(hra_list_unique)]
    # hard coding the current_component to pc1 here, we will extend it by looping
    # through the list of components
    current_component = 'pc1'
    num_tests = 5
    df_columns = df_orig.columns.tolist()
    start_index = 5
    for test in range(num_tests):
        current_tests_list = df_columns[start_index:(start_index + num_tests)]
        # now create the sns distplots for each HRA color and overlay the tests
        i = 1
        for _, row in df_hra_colors.iterrows():
            plt.subplot(3, 3, i)
            select_columns = ['hra', current_component] + current_tests_list
            df_current_color = df_orig.loc[df_orig['hra'] == row['hra'], select_columns]
            y_data = df_current_color.loc[df_current_color[current_component] != -9999, current_component]
            axs = sns.distplot(y_data, color=row['hex'],
                               hist_kws={"ec":"k"},
                               kde_kws={"color": "k", "lw": 0.5})
            data_x, data_y = axs.lines[0].get_data()
            axs.text(0.0, 1.0, row['hra'], horizontalalignment="left", fontsize='x-small',
                     verticalalignment="top", transform=axs.transAxes)
            for current_test_index, current_test in enumerate(current_tests_list):
                # this_x defines the series of current_component(pc1,pc2,rhob) for this test
                # indicated by 1, corresponding R program calls this test_vector
                x_series = df_current_color.loc[df_current_color[current_test] == 1, current_component].tolist()
                for this_x in x_series:
                    this_y = np.interp(this_x, data_x, data_y)
                    axs.plot([this_x], [this_y - current_test_index * 0.05],
                             markers_list[current_test_index], markersize = 3, color='black')
            axs.xaxis.label.set_visible(False)
            axs.xaxis.set_tick_params(labelsize=4)
            axs.yaxis.set_tick_params(labelsize=4)
            i = i + 1
        start_index = start_index + num_tests
    # plt.show()
    pp = PdfPages('plots.pdf')
    pp.savefig()
    pp.close()

def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

if __name__ == "__main__":
    main()
The Pandas code works fine and does what it is supposed to. The bottleneck is my lack of knowledge and experience with 'PdfPages' in Matplotlib. How can I show, in Python/Matplotlib/Seaborn, the header information that I can show in the corresponding R visualization? By header information, I mean what the R visualization has at the top before the histograms, i.e., 'pc1', MRP, XRD,....
I can get their values easily from my program, e.g., current_component is 'pc1', etc. But I don't know how to format the plots with the Header. Can someone provide some guidance?
You may be looking for a figure title or super title, fig.suptitle:
fig.suptitle('this is the figure title', fontsize=12)
In your case you can easily get the figure with plt.gcf(), so try
plt.gcf().suptitle("pc1")
The rest of the information in the header would be called a legend.
For the following let's suppose all subplots have the same markers. It would then suffice to create a legend for one of the subplots.
To create legend labels, you can pass the label argument to the plot call, i.e.
axs.plot( ... , label="MRP")
When later calling axs.legend() a legend will automatically be generated with the respective labels. Ways to position the legend are detailed e.g. in this answer.
Here, you may want to place the legend in terms of figure coordinates, i.e.
axs.legend(loc="lower center",bbox_to_anchor=(0.5,0.8),bbox_transform=plt.gcf().transFigure)
I ask this question because I haven't found a working example on how to annotate grouped horizontal Pandas bar charts yet. I'm aware of the following two:
Annotate bars with values on Pandas bar plots
Pandas, Bar Chart Annotations
But they are all about vertical bar charts; they either have no solution for horizontal bar charts, or the solution is not fully working.
After several weeks of working on this issue, I am finally able to ask the question with sample code that is almost what I want, just not 100% working. I need your help to reach that 100%.
Here we go, the full code is uploaded here. The result looks like this:
You can see that it is almost working; the labels are just not placed where I want, and I can't move them to a better place myself. Besides, because the end of each bar is used for displaying the error bar, what I really want is to move the annotation text toward the y-axis, lined up nicely on either the left or the right side of the y-axis, depending on the x value. E.g., this is what my colleagues can do with MS Excel:
Is it possible to do that in Python with a Pandas chart?
I'm including the code from my above url for the annotation, one is my all-that-I-can-do, and the other is for the reference (from In [23]):
# my all-that-I-can-do
def autolabel(rects):
    # if the width is constant: vertical bars, horizontal bars otherwise
    if (np.diff([plt.getp(item, 'width') for item in rects])==0).all():
        x_pos = [rect.get_x() + rect.get_width()/2. for rect in rects]
        y_pos = [rect.get_y() + 1.05*rect.get_height() for rect in rects]
        scores = [plt.getp(item, 'height') for item in rects]
    else:
        x_pos = [rect.get_width()+.3 for rect in rects]
        y_pos = [rect.get_y()+.3*rect.get_height() for rect in rects]
        scores = [plt.getp(item, 'width') for item in rects]
    # attach some text labels
    for rect, x, y, s in zip(rects, x_pos, y_pos, scores):
        ax.text(x,
                y,
                #'%s'%s,
                str(round(s, 2)*100)+'%',
                ha='center', va='bottom')
# for the reference
ax.bar(1. + np.arange(len(xv)), xv, align='center')
# Annotate with text
ax.set_xticks(1. + np.arange(len(xv)))
for i, val in enumerate(xv):
    ax.text(i+1, val/2, str(round(val, 2)*100)+'%', va='center',
            ha='center', color='black')
Please help. Thanks.
So, I changed a bit the way you construct your data for simplicity:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set_style("white") #for aesthetic purpose only
# fake data
df = pd.DataFrame({'A': np.random.choice(['foo', 'bar'], 100),
'B': np.random.choice(['one', 'two', 'three'], 100),
'C': np.random.choice(['I1', 'I2', 'I3', 'I4'], 100),
'D': np.random.randint(-10,11,100),
'E': np.random.randn(100)})
p = pd.pivot_table(df, index=['A','B'], columns='C', values='D')
e = pd.pivot_table(df, index=['A','B'], columns='C', values='E')
ax = p.plot(kind='barh', xerr=e, width=0.85)
for r in ax.patches:
    if r.get_x() < 0: # if it's a negative bar
        ax.text(0.25, # set label on the opposite side
                r.get_y() + r.get_height()/5., # y
                "{:" ">7.1f}%".format(r.get_x()*100), # text
                bbox={"facecolor":"red",
                      "alpha":0.5,
                      "pad":1},
                fontsize=10, family="monospace", zorder=10)
    else:
        ax.text(-1.5, # set label on the opposite side
                r.get_y() + r.get_height()/5., # y
                "{:" ">6.1f}%".format(r.get_width()*100),
                bbox={"facecolor":"green",
                      "alpha":0.5,
                      "pad":1},
                fontsize=10, family="monospace", zorder=10)
plt.tight_layout()
which gives:
I plot the label depending on the mean value and put it on the other side of the 0-line so you're pretty sure that it will never overlap to something else, except an error bar sometimes. I set a box behind the text so it reflects the value of the mean.
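One caveat, hedged because it depends on the installed Matplotlib version: the r.get_x() < 0 test relies on a negative bar being stored with a negative x and a positive width. Newer Matplotlib releases appear to keep the bar anchored at 0 with a negative width instead, in which case that branch would never fire. Below is a sketch of a hypothetical helper that recovers the signed value under either convention, assuming every bar starts at zero (so it would not apply to stacked bars):

```python
def signed_bar_value(x, width):
    """Return the end of the bar farthest from zero, i.e. the plotted value.

    Works whether a negative bar is stored as (x=-3, width=3)
    or as (x=0, width=-3); assumes every bar starts at zero.
    """
    near, far = sorted((x, x + width), key=abs)
    return far


print(signed_bar_value(-3, 3))   # -3
print(signed_bar_value(0, -3))   # -3
print(signed_bar_value(0, 2.5))  # 2.5
```

In the loop, value = signed_bar_value(r.get_x(), r.get_width()) could then drive both the branch (value < 0) and the label text.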
There are some values you'll need to adjust depending on your figure size so the labels fit right, like:
width=0.85
+r.get_height()/5. # y
"pad":1
fontsize=10
"{:" ">6.1f}%".format(r.get_width()*100) : sets the minimum number of characters in the label (here 6 before the % sign; shorter numbers are right-aligned, i.e. padded with spaces on the left). It needs family="monospace" so the labels line up
Tell me if something isn't clear.
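As a quick illustration of the format specification from the list above: > right-aligns the number, the integer is the minimum field width, and .1f keeps one decimal place. The helper name percent_label is made up for this sketch.

```python
def percent_label(value, width=6):
    """Format a fraction as a right-aligned percentage with one decimal place."""
    return "{:>{w}.1f}%".format(value * 100, w=width)


for v in (0.025, -0.4, 1.234):
    print(repr(percent_label(v)))
```

Because every label then has the same number of characters, a monospace font makes the boxes line up in a column.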
HTH