I'm trying to create an overlay plot of a binary grid map and a simple line plot. However, when creating a layered plot, the axes are not aligned and the plot becomes unreadable. Ideally, I'd like both plots to share a single axis so that the line coordinates match the map coordinates.
Here's a basic snippet of my attempt:
import torch as th
import altair as alt
import pandas as pd
xv, yv = th.meshgrid(th.linspace(-10, 10, 100), th.linspace(-10, 10, 100))
o_map = th.zeros_like(xv)
o_map[40:60, 40:60] = 1 # add obstacle centred on origin
map_df = pd.DataFrame(
    {"x": xv.flatten(), "y": yv.flatten(), "z": o_map.flatten()}
)
map_chart = (
    alt.Chart(map_df)
    .mark_rect()
    .encode(
        x=alt.X("x:O", axis=alt.Axis(format=".2")),
        y=alt.Y("y:O", axis=alt.Axis(format=".2")),
        color="z:N",
    )
    .properties(width=500, height=500)
)
x = th.linspace(-5, 10, 100)
line_df = pd.DataFrame({"x": x, "y": 0.2 * x ** 2 - 3})
line_chart = alt.Chart(line_df).mark_line(color="red").encode(x="x:Q", y="y:Q")
layer_chart = map_chart + line_chart
The resulting plots are as follows:
(Screenshots: line plot, binary map, and layered plot.)
If you change the x and y channel data types in map_chart from 'O' (ordinal) to 'Q' (quantitative), the axes should be aligned automatically.
...
map_chart = (
    alt.Chart(map_df)
    .mark_rect()
    .encode(
        x=alt.X("x:Q"),
        y=alt.Y("y:Q"),
        color="z:N",
    )
    .properties(width=500, height=500)
)
...
Update: use mark_square instead of mark_rect
It seems that the quantitative data type doesn't play well with the rect mark (e.g., in the official example here, changing the type of x and y to quantitative makes the heatmap look wrong; see below).
So if rect is not a must-have mark, I would suggest square instead. As long as your grid is dense enough and the marker size is large enough, there will be no gaps left between the markers, which is effectively what you want.
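A minimal sketch of that idea, reusing map_df and line_chart from the question; the size value (mark area in square pixels) is only a guess for a 500x500 px chart over a 100x100 grid and would need tuning until the squares touch:
map_chart = (
    alt.Chart(map_df)
    .mark_square(size=30)  # ~5 px per cell on a 500 px axis, so an area of roughly 25-30
    .encode(
        x=alt.X("x:Q"),
        y=alt.Y("y:Q"),
        color="z:N",
    )
    .properties(width=500, height=500)
)
layer_chart = map_chart + line_chart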
Related
I would like to guide the reader's attention to just some columns (or rows and columns) in a heatmap, while still retaining the full context.
I can use alt.condition to alter color and opacity. Both work to some extent, but changes in opacity read much like changes in value, and using a different color changes the perception of the values. What I would like to do instead is to put yellow or red borders around the consecutive columns I want to highlight.
This is what I have now. Any other ideas?
import altair as alt
alt.data_transformers.disable_max_rows()
def create_att_chart(df, keys_to_highlight=[], width=150, height=150, title=None, labels_x=True, labels_y=True):
    properties = {}
    if title:
        properties['title'] = title
    if width: properties['width'] = width
    if height: properties['height'] = height
    chart = alt.Chart(df).mark_rect().encode(
        x=alt.X('k:N', sort=None, axis=alt.Axis(labels=labels_x, title=None, ticks=False), title=None),
        y=alt.Y('q:N', sort=None, axis=alt.Axis(labels=labels_y, title=None, ticks=False), title=None),
        opacity=alt.Opacity('a:Q', legend=None),
        column=alt.Column('h:N', title=None, header=alt.Header(labels=False), spacing=0.),
        row=alt.Row('l:N', title=None, header=alt.Header(labels=False), spacing=5.))
    if keys_to_highlight:
        chart = chart.encode(
            color=alt.condition(
                alt.Predicate(alt.FieldOneOfPredicate(field='k', oneOf=keys_to_highlight)),
                alt.value('orange'),
                alt.value('blue')))
    else:
        chart = chart.encode(color=alt.value('blue'))
    return chart.properties(**properties)
[..]
((create_att_chart(df_pt, ['sage', '##maker'], title='Pre-Trained') | create_att_chart(df_ft, ['sage', '##maker'], title='Fine-Tuned', labels_y=False)).properties(padding=0))
You could try using the condition for the stroke encoding instead of color, but I think that would give you strokes around each box, which is probably not what you want. Instead, you could use mark_rule or mark_rect together with this example from the docs:
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'z': z.ravel()})
heatmap = alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color=alt.Color('z:Q', scale=alt.Scale(scheme='blues')))
Now add the rules:
rule1 = alt.Chart(source).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(20))
rule2 = alt.Chart(source).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(60))
heatmap + rule1 + rule2
A top rule might be more appealing/elegant and you could add text above it with mark_text if needed:
rule1 = alt.Chart(source).mark_rule(stroke='orange', strokeWidth=3).encode(
    y=alt.value(-5),
    x=alt.value(20),
    x2=alt.value(60))
heatmap + rule1
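If you do want a label above that rule, a rough sketch with mark_text could look like the following; the pixel offsets and the label text are placeholders to tune:
label_df = pd.DataFrame({'label': ['highlighted']})
text = alt.Chart(label_df).mark_text(color='orange', dy=-10).encode(
    x=alt.value(40),   # roughly midway between the rule's x and x2 pixel values
    y=alt.value(-5),
    text='label:N')
heatmap + rule1 + text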
mark_rect works, but it adds the lines in the middle of the squares, since the scale is ordinal and a quantitative mark_rect messes up the axis:
df = pd.DataFrame({'x': [0], 'x2': [3]})
box = alt.Chart(df).mark_rect(color='', stroke='orange', strokeWidth=2).encode(
    x='x:O',
    x2=alt.X2('x2:O', title='x'))
heatmap + box
If you try to add the lines in between, new ordinal axis marks will be created. You could abuse this and make the lines white to highlight by separation, but the ticks on the axis are still there, so you would have to remove them with labelExpr or similar (see the sketch after the code below).
df = pd.DataFrame({'x': [0.5], 'x2': [3.5]})
box = alt.Chart(df).mark_rect(color='', stroke='white').encode(
    x='x:O',
    x2=alt.X2('x2:O', title='x'))
(heatmap + box).configure_view(stroke=None)
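A sketch of hiding those extra labels with labelExpr, assuming the unwanted categories are the '.5' values the box introduces:
heatmap = alt.Chart(source).mark_rect().encode(
    x=alt.X('x:O', axis=alt.Axis(
        # blank out labels for the half-step categories added by the box
        labelExpr="indexof(datum.label, '.5') >= 0 ? '' : datum.label")),
    y='y:O',
    color=alt.Color('z:Q', scale=alt.Scale(scheme='blues')))
(heatmap + box).configure_view(stroke=None)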
When layered above a heatmap, an Altair scatterplot only seems to work if the point values are also on the axes of the heatmap. In any other case, white lines are added along the points' x and y values. Here's a minimal example:
import streamlit as st
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
                       'y': y.ravel(),
                       'z': z.ravel()})
c = alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='z:Q'
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
    x='x:O',
    y='y:O')
st.altair_chart(c + s)
Is there any way to prevent this behavior? I'd like to animate the points later on, so adding values to the heatmap axis is not an option.
Ordinal encodings (marked by :O) will always create a discrete axis with one bin per unique value. It sounds like you would like to visualize your data with a quantitative encoding (marked by :Q), which creates a continuous, real-valued axis.
In the case of the heatmap, though, this complicates things: if you're no longer treating the data as ordered categories, you must specify the starting and ending point for each bin along each axis. This requires some thought about what your bins represent: does the value "2" represent numbers spanning from 2 to 3? from 1 to 2? from 1.5 to 2.5? The answer will depend on context.
Here is an example of computing these bin boundaries using a calculate transform, assuming the values represent the center of unit bins:
c = alt.Chart(source).transform_calculate(
    x1=alt.datum.x - 0.5,
    x2=alt.datum.x + 0.5,
    y1=alt.datum.y - 0.5,
    y2=alt.datum.y + 0.5,
).mark_rect().encode(
    x='x1:Q', x2='x2:Q',
    y='y1:Q', y2='y2:Q',
    color='z:Q'
).properties(
    width=400, height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
    x='x:Q',
    y='y:Q'
)
st.altair_chart(c + s)
Alternatively, if you would like this binning to happen more automatically, you can use a bin transform on each axis:
c = alt.Chart(source).mark_rect().encode(
    x=alt.X('x:Q', bin=True),
    y=alt.Y('y:Q', bin=True),
    color='z:Q'
).properties(
    width=400,
    height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
    x='x:Q',
    y='y:Q'
)
st.altair_chart(c + s)
I'm trying to plot a graph with four different series of values on the "y" axis. I have 6 arrays: 2 contain the time values for the "x" axis, and the other 4 contain the corresponding elements (in the same positions) for the "y" axis.
Example:
LT_TIME = ['18:14:17.566', '18:14:17.570']
LT_RP = [-110, -113]
LT_RQ = [-3, -5]
GNR_TIME = ['18:15:42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
The coordinates of the "LT" graph are:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5)
And with these coordinates, I can generate a graph with two "y" axes, which contains the points (-110,-113,-3,-5) and an "x" axis with the points ('18:14:17.566', '18:14:17.570').
Similarly, it is possible to do the same with the "GNR" arrays. So how can I get all the points from both the "LT" and "GNR" arrays onto the same graph? I mean, how do I plot so that I have the following coordinates on the same graph:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5),
('18:15:42.489',-94), ('18:32:39.489',-94), ('18:15:42.489',-3), ('18:32:39.489',-7)
It sounds like your problem has two parts: formatting the data in a way that visualisation libraries would understand and actually visualising it using a dual axis.
Your example screenshot includes some interactive controls, so I suggest you use bokeh, which gives you zoom and pan for "free", rather than matplotlib. Besides, I find bokeh's way of adding a dual axis more straightforward. If matplotlib is a must, here's another answer that should point you in the right direction.
For the first part, you can merge the data you have into a single dataframe, like so:
import pandas as pd
from bokeh.models import LinearAxis, Range1d, ColumnDataSource
from bokeh.plotting import figure, output_notebook, show
output_notebook() #if working in Jupyter Notebook, output_file() if not
LT_TIME = ['18:14:17.566', '18:14:17.570']
LT_RP = [-110, -113]
LT_RQ = [-3, -5]
GNR_TIME = ['18:15:42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
s1 = list(zip(LT_TIME, LT_RP)) + list(zip(GNR_TIME, GNR_RP))
s2 = list(zip(LT_TIME, LT_RQ)) + list(zip(GNR_TIME, GNR_RQ))
df1 = pd.DataFrame(s1, columns=["Date", "RP"])
df2 = pd.DataFrame(s2, columns=["Date", "RQ"])
df = df1.merge(df2, on="Date")
source = ColumnDataSource(df)
To visualise the data as a dual axis line chart, we just need to specify the extra y-axis and position it in the layout:
p = figure(x_range=df["Date"], y_range=(-90, -120))
p.line(x="Date", y="RP", color="cadetblue", line_width=2, source=source)
p.extra_y_ranges = {"RQ": Range1d(start=0, end=-10)}
p.line(x="Date", y="RQ", color="firebrick", line_width=2, y_range_name="RQ", source=source)
p.add_layout(LinearAxis(y_range_name="RQ"), 'right')
show(p)
Using matplotlib, we can "trivially" fill the area between two vertical lines using fill_between() as in the example:
https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/fill_between_demo.html#selectively-marking-horizontal-regions-across-the-whole-axes
Using matplotlib, I can make what I need:
We have two signals, and I'm computing the rolling/moving Pearson's and Spearman's correlations. When the correlations go either below -0.5 or above 0.5, I want to shade the period (blue for Pearson's and orange for Spearman's). I also darken the weekends in gray in all plots.
However, I'm having a hard time accomplishing the same thing using Plotly. It would also be helpful to know how to do it between two horizontal lines.
Note that I'm using Plotly and Dash to speed up the visualization of several plots. Users asked for a more "dynamic type of thing." However, I'm not a GUI guy and cannot spend much time on this, although I need to give them some initial results.
BTW, I tried Bokeh in the past and gave up for some reason I cannot remember. Plotly looks good since I can use it from either Python or R, which are my main development tools.
Thanks,
Carlos
I don't think there is any built-in Plotly method that is equivalent to matplotlib's fill_between(). However, you can draw shapes, so a possible workaround is to draw a grey rectangle and set the parameter layer="below" so that the signal is still visible. You can also set the coordinates of the rectangle outside of your axis range to ensure the rectangle extends to the edges of the plot.
You can fill the area between two horizontal lines by drawing a rectangle and setting the axis ranges in a similar manner (see the sketch after the code below).
import numpy as np
import plotly.graph_objects as go
x = np.arange(0, 4 * np.pi, 0.01)
y = np.sin(x)
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x,
    y=y
))
# hard-code the axes
fig.update_xaxes(range=[0, 4 * np.pi])
fig.update_yaxes(range=[-1.2, 1.2])
# specify the corners of the rectangles
fig.update_layout(
    shapes=[
        dict(
            type="rect",
            xref="x",
            yref="y",
            x0="4",
            y0="-1.3",
            x1="5",
            y1="1.3",
            fillcolor="lightgray",
            opacity=0.4,
            line_width=0,
            layer="below"
        ),
        dict(
            type="rect",
            xref="x",
            yref="y",
            x0="9",
            y0="-1.3",
            x1="10",
            y1="1.3",
            fillcolor="lightgray",
            opacity=0.4,
            line_width=0,
            layer="below"
        ),
    ]
)
fig.show()
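For the horizontal case, the same idea applies with the rectangle spanning the full x-range and fixed y-bounds. A sketch building on the figure above (fig.add_shape is just a shorthand for appending to layout.shapes; the band limits are placeholders):
# shade the horizontal band between y = 0.5 and y = 1.0, behind the trace
fig.add_shape(
    type="rect",
    xref="x",
    yref="y",
    x0=0,
    x1=4 * np.pi,   # matches the hard-coded x-axis range above
    y0=0.5,
    y1=1.0,
    fillcolor="lightgray",
    opacity=0.4,
    line_width=0,
    layer="below"
)
fig.show()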
You haven't provided a data sample, so I'm going to use a synthetic time series to show how you can add a number of shapes with defined start and stop dates for several different categories, using a custom function bgLevels().
Two vertical lines with a fill between them very quickly turn into a rectangle, and rectangles can easily be added as shapes using fig.add_shape. The example below shows how to find start and stop dates for periods that meet a certain criterion; in your case, the criterion is whether the value of a variable is higher or lower than a certain level.
Using shapes instead of traces with fig.add_trace() lets you position them relative to the plot layers using layer='below', and the shape outlines can easily be hidden using line=dict(color="rgba(0,0,0,0)").
Plot 1: Time series figure with random data:
Plot 2: Background is set to an opaque grey when A > 100:
Plot 3: Background is also set to an opaque red when D < -60:
Complete code:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
pd.set_option('display.max_rows', None)
# data sample
nperiods = 200
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10, 12, size=(nperiods, 4)),
                  columns=list('ABCD'))
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
# plotly setup
fig = px.line(df, x='dates', y=df.columns[1:])
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
# function to set background color for a
# specified variable and a specified level
def bgLevels(fig, variable, level, mode, fillcolor, layer):
    """
    Set a specified color as background for given
    levels of a specified variable using a shape.

    Keyword arguments:
    ==================
    fig -- plotly figure
    variable -- column name in a pandas dataframe
    level -- int or float
    mode -- set threshold above or below
    fillcolor -- any color type that plotly can handle
    layer -- position of shape in plotly figure, like "below"
    """
    if mode == 'above':
        m = df[variable].gt(level)
    if mode == 'below':
        m = df[variable].lt(level)
    # group consecutive True values and take the first/last date of each run
    df1 = df[m].groupby((~m).cumsum())['dates'].agg(['first', 'last'])
    for index, row in df1.iterrows():
        fig.add_shape(type="rect",
                      xref="x",
                      yref="paper",
                      x0=row['first'],
                      y0=0,
                      x1=row['last'],
                      y1=1,
                      line=dict(color="rgba(0,0,0,0)", width=3),
                      fillcolor=fillcolor,
                      layer=layer)
    return fig
fig = bgLevels(fig=fig, variable='A', level=100, mode='above',
               fillcolor='rgba(100,100,100,0.2)', layer='below')
fig = bgLevels(fig=fig, variable='D', level=-60, mode='below',
               fillcolor='rgba(255,0,0,0.2)', layer='below')
fig.show()
I think that fig.add_hrect() and fig.add_vrect() are the simplest approaches to reproducing the matplotlib fill_between() functionality in this case:
https://plotly.com/python/horizontal-vertical-shapes/
For your example, add_vrect() should do the trick.
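A minimal sketch of that approach, reusing the df from the answer above; the dates and y-bounds are just placeholders:
import plotly.express as px

fig = px.line(df, x='dates', y=df.columns[1:])
# vertical band between two dates, drawn behind the traces
fig.add_vrect(x0="2020-03-01", x1="2020-03-15",
              fillcolor="lightgray", opacity=0.4, line_width=0, layer="below")
# horizontal band between two y-values
fig.add_hrect(y0=-20, y1=20,
              fillcolor="orange", opacity=0.2, line_width=0, layer="below")
fig.show()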
I want to add frequency labels to the histogram generated using plt.hist.
Here is the data:
import numpy as np

np.random.seed(30)
d = np.random.randint(1, 101, size = 25)
print(sorted(d))
I looked up other questions on Stack Overflow, like:
Adding value labels on a matplotlib bar chart
and their answers, but apparently the objects returned by plt.plot(kind='bar') are different from those returned by plt.hist, and I got errors when using the get_height or get_width functions suggested in some of the answers for bar plots.
Similarly, I couldn't find a solution by going through the matplotlib documentation on histograms.
Here is how I managed it. If anyone has suggestions to improve my answer (specifically the for loop and the use of n = 0 and n = n + 1; I think there must be a better way to write the loop without a manual counter), I'd welcome them.
# import base packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate data
np.random.seed(30)
d = np.random.randint(1, 101, size = 25)
print(sorted(d))
# generate histogram
# plt.hist returns 3 objects: n (i.e. the frequencies), bins, patches
freq, bins, patches = plt.hist(d, edgecolor='white', label='d', bins=range(1, 101, 10))
# x coordinate for labels
bin_centers = np.diff(bins)*0.5 + bins[:-1]
n = 0
for fr, x, patch in zip(freq, bin_centers, patches):
    height = int(freq[n])
    plt.annotate("{}".format(height),
                 xy=(x, height),             # top centre of the histogram bar
                 xytext=(0, 0.2),            # offset the label slightly above its bar
                 textcoords="offset points", # offset (in points) from the *xy* value
                 ha='center', va='bottom')
    n = n + 1
plt.legend()
plt.show()
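One way to drop the manual counter (a sketch; zip already yields the frequency, so n isn't needed):
for fr, x in zip(freq, bin_centers):
    plt.annotate(str(int(fr)),
                 xy=(x, fr),                 # top centre of the bar
                 xytext=(0, 0.2),
                 textcoords="offset points",
                 ha='center', va='bottom')
On recent matplotlib versions (roughly 3.4+), plt.bar_label(patches) may achieve the same in a single call, since patches is returned as a bar container.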