When layered above a heatmap, the Altair scatterplot only seems to work if the point values are also on the axis of the heatmap. In any other case, white lines are added along the x and y values of the points. Here's a minimal example:
import streamlit as st
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
'y': y.ravel(),
'z': z.ravel()})
c = alt.Chart(source).mark_rect().encode(
x='x:O',
y='y:O',
color='z:Q'
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:O',
y='y:O')
st.altair_chart(c + s)
Is there any way to prevent this behavior? I'd like to animate the points later on, so adding values to the heatmap axis is not an option.
Ordinal encodings (marked by :O) will always create a discrete axis with one bin per unique value. It sounds like you would like to visualize your data with a quantitative encoding (marked by :Q), which creates a continuous, real-valued axis.
In the case of the heatmap, though, this complicates things: if you're no longer treating the data as ordered categories, you must specify the starting and ending point for each bin along each axis. This requires some thought about what your bins represent: does the value "2" represent numbers spanning from 2 to 3? from 1 to 2? from 1.5 to 2.5? The answer will depend on context.
Here is an example of computing these bin boundaries using a calculate transform, assuming the values represent the center of unit bins:
c = alt.Chart(source).transform_calculate(
x1=alt.datum.x - 0.5,
x2=alt.datum.x + 0.5,
y1=alt.datum.y - 0.5,
y2=alt.datum.y + 0.5,
).mark_rect().encode(
x='x1:Q', x2='x2:Q',
y='y1:Q', y2='y2:Q',
color='z:Q'
).properties(
width=400, height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:Q',
y='y:Q'
)
st.altair_chart(c + s)
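The same edges can also be precomputed in pandas before the data reaches Altair, which makes the choice of bin convention explicit. This is only a minimal sketch, assuming unit-width bins; the helper name add_bin_edges and its align parameter are made up for illustration and are not part of Altair or pandas:

```python
import numpy as np
import pandas as pd


def add_bin_edges(df, col, width=1.0, align="center"):
    """Return a copy of df with <col>1 / <col>2 columns holding bin edges.

    align="center": the value is the middle of its bin (2 -> 1.5..2.5)
    align="left":   the value is the lower edge        (2 -> 2..3)
    align="right":  the value is the upper edge        (2 -> 1..2)
    """
    offsets = {"center": width / 2, "left": 0.0, "right": width}
    lower = df[col] - offsets[align]
    return df.assign(**{f"{col}1": lower, f"{col}2": lower + width})


# Same grid as in the question
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
source = pd.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": (x ** 2 + y ** 2).ravel()})

# Add x1/x2 and y1/y2, treating each value as the centre of a unit bin
binned = add_bin_edges(add_bin_edges(source, "x"), "y")
print(binned.head())
```

The resulting x1, x2, y1 and y2 columns can be encoded the same way as in the chart above, without the calculate transform.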
Alternatively, if you would like this binning to happen more automatically, you can use a bin transform on each axis:
c = alt.Chart(source).mark_rect().encode(
x=alt.X('x:Q', bin=True),
y=alt.Y('y:Q', bin=True),
color='z:Q'
).properties(
width=400,
height=400
)
scatter_source = pd.DataFrame({'x': [-1.001,-3], 'y': [0,1]})
s = alt.Chart(scatter_source).mark_circle(size=100).encode(
x='x:Q',
y='y:Q'
)
st.altair_chart(c + s)
I would like to guide the reader's attention to just some columns (or rows and columns) in a heatmap, while still retaining the full context.
I can use alt.condition to alter color and opacity. Both work to some extent. But changes in opacity look much like changes in value, and using a different color changes how the values are perceived. What I would like to do instead is to put yellow or red borders around the consecutive columns I want to highlight.
This is what I have now. Any other ideas?
import altair as alt
alt.data_transformers.disable_max_rows()
def create_att_chart(df, keys_to_highlight=[], width=150, height=150, title=None, labels_x=True, labels_y=True):
    properties = {}
    if title:
        properties['title'] = title
    if width:
        properties['width'] = width
    if height:
        properties['height'] = height
    chart = alt.Chart(df).mark_rect().encode(
        x=alt.X('k:N', sort=None, axis=alt.Axis(labels=labels_x, title=None, ticks=False), title=None),
        y=alt.Y('q:N', sort=None, axis=alt.Axis(labels=labels_y, title=None, ticks=False), title=None),
        opacity=alt.Opacity('a:Q', legend=None),
        column=alt.Column('h:N', title=None, header=alt.Header(labels=False), spacing=0.),
        row=alt.Row('l:N', title=None, header=alt.Header(labels=False), spacing=5.))
    if keys_to_highlight:
        chart = chart.encode(
            color=alt.condition(
                alt.Predicate(alt.FieldOneOfPredicate(field='k', oneOf=keys_to_highlight)),
                alt.value('orange'),
                alt.value('blue')))
    else:
        chart = chart.encode(color=alt.value('blue'))
    return chart.properties(**properties)
[..]
((create_att_chart(df_pt, ['sage', '##maker'], title='Pre-Trained') | create_att_chart(df_ft, ['sage', '##maker'], title='Fine-Tuned', labels_y=False)).properties(padding=0))
You could try using the condition for the stroke encoding instead of color, but I think that would give you strokes around each box, which is probably not what you want. Instead, you could use mark_rule or mark_rect on top of this heatmap example from the docs:
import altair as alt
import numpy as np
import pandas as pd
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2
# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'z': z.ravel()})
heatmap = alt.Chart(source).mark_rect().encode(
x='x:O',
y='y:O',
color=alt.Color('z:Q', scale=alt.Scale(scheme='blues')))
Now add the rules:
rule1 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(20))
rule2 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=2).encode(x=alt.value(60))
heatmap + rule1 + rule2
A top rule might be more appealing/elegant and you could add text above it with mark_text if needed:
rule1 = alt.Chart(df).mark_rule(stroke='orange', strokeWidth=3).encode(
y=alt.value(-5),
x=alt.value(20),
x2=alt.value(60))
heatmap + rule1
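The pixel positions passed to alt.value can be derived from the column order instead of being hard-coded. A rough sketch, assuming a discrete step of 20 px per category (the default when no width is set, as far as I know) and no band padding; column_span_px is a hypothetical helper, not part of Altair:

```python
def column_span_px(categories, first, last, step=20):
    """Pixel range covering categories first..last (inclusive) on a discrete axis.

    Assumes every category occupies exactly `step` pixels, in list order.
    """
    start = categories.index(first)
    stop = categories.index(last)
    if stop < start:
        raise ValueError("`last` must not come before `first`")
    return start * step, (stop + 1) * step


categories = list(range(-5, 5))   # x values of the heatmap, in axis order
left, right = column_span_px(categories, -4, -3)
print(left, right)                # 20 60
```

Here left and right reproduce the 20 and 60 used above and can be passed on as alt.value(left) and alt.value(right).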
mark_rect works but adds the lines in the middle of the squares, since the scale is ordinal, and a quantitative mark_rect messes up the axis:
df = pd.DataFrame({'x': [0], 'x2': [3]})
box = alt.Chart(df).mark_rect(color='', stroke='orange', strokeWidth=2).encode(
x='x:O',
x2=alt.X2('x2:O', title='x'))
heatmap + box
If you try to add the lines in between, new ordinal axis marks will be created. You could abuse this and make the lines white to highlight by separation, but the ticks on the axis are still there, so you would have to remove them with labelExpr or similar.
df = pd.DataFrame({'x': [0.5], 'x2': [3.5]})
box = alt.Chart(df).mark_rect(color='', stroke='white').encode(
x='x:O',
x2=alt.X2('x2:O', title='x'))
(heatmap + box).configure_view(stroke=None)
How do I draw quantile lines and connect two violin plots in plotly in Python?
For example, there is a library to do this in R (https://github.com/GRousselet/rogme). The library provided does not necessarily work when there are more than two groups.
There is definitely no built-in method to do something this specific in Plotly. The best you can do is probably draw some lines, and consider writing a function or some loops if you need to do this for multiple groups of data for different quantile values.
Here is how I would get started. You can create a list or array to store the coordinates of the lines if you want to connect the same quantiles across the grouped violin plots. I acknowledge that what I have at the moment is hacky, as it relies on groups in Plotly having y-coordinates starting at 0 and increasing by 1. There might be a way to access the y-coordinates of grouped violin plots; I'd recommend looking into the documentation.
Some more work will need to be done if you want to add text boxes to indicate the values of quantiles.
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# generate some random data that is normally distributed
np.random.seed(42)
y1 = np.random.normal(0, 1, 1000) * 1.5 + 6
y2 = np.random.normal(0, 5, 1000) + 6
# group the data together and combine into one dataframe
df1 = pd.DataFrame({'Group': 'Group1', 'Values': y1})
df2 = pd.DataFrame({'Group': 'Group2', 'Values': y2})
df_final = pd.concat([df1, df2])
fig = px.strip(df_final, x='Values', y='Group', color_discrete_sequence=['grey'])
quantiles_list = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
## this is a bit hacky and relies on y coordinates for groups starting from 0 and increasing by 1
y_diff = 0
## these store the coordinates in order to connect the quantile lines
lower_coordinates, upper_coordinates = [], []
for group_name in df_final.Group.unique():
    for quantile in quantiles_list:
        quantile_value = np.quantile(df_final[df_final['Group'] == group_name].Values, quantile)
        if group_name == 'Group1':
            lower_coordinates.append((quantile_value, 0.2+1*y_diff))
        if group_name == 'Group2':
            upper_coordinates.append((quantile_value, -0.2+1*y_diff))
        fig.add_shape(
            # Vertical Line for Group1
            dict(
                type="line",
                x0=quantile_value,
                y0=-0.2+1*y_diff,
                x1=quantile_value,
                y1=0.2+1*y_diff,
                line=dict(
                    color="black",
                    width=4
                )
            ),
        )
    y_diff += 1
## draw connecting lines
for idx in range(len(upper_coordinates)):
    fig.add_shape(
        dict(
            type="line",
            x0=lower_coordinates[idx][0],
            y0=lower_coordinates[idx][1],
            x1=upper_coordinates[idx][0],
            y1=upper_coordinates[idx][1],
            line=dict(
                color="chocolate",
                width=4
            )
        ),
    )
fig.show()
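To go beyond two groups without hard-coding group names, the quantiles can be computed in one pass with a pandas groupby and the connecting segments built between adjacent groups. This is only a sketch under the same assumption as above (group i sits at y = i); the third group and the segments list are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df_final = pd.concat([
    pd.DataFrame({"Group": "Group1", "Values": rng.normal(6, 1.5, 1000)}),
    pd.DataFrame({"Group": "Group2", "Values": rng.normal(6, 5, 1000)}),
    pd.DataFrame({"Group": "Group3", "Values": rng.normal(8, 2, 1000)}),
])

quantiles_list = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]

# One row per group, one column per quantile, rows in order of appearance
q = (
    df_final.groupby("Group")["Values"]
    .quantile(quantiles_list)
    .unstack()
    .reindex(df_final["Group"].unique())
)

# Segments joining the same quantile in adjacent groups.
# Assumption: group i is drawn at y = i, with tick marks spanning +/- 0.2.
segments = []
for i in range(len(q) - 1):
    for quantile in quantiles_list:
        segments.append(dict(
            x0=q.iloc[i][quantile], y0=i + 0.2,
            x1=q.iloc[i + 1][quantile], y1=i + 1 - 0.2,
        ))

print(q.round(2))
```

Each entry of segments can then be drawn with fig.add_shape(type="line", line=dict(color="chocolate", width=4), **segment).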
I'm trying to create an overlay plot of a binary grid map and a simple line plot. However, when creating a layered plot, the axes are not aligned and the plot becomes unreadable. Ideally, I'd like to have both plots share a single axis so that the line coordinates match the map coordinates.
Here's a basic snippet of my attempt:
import torch as th
import altair as alt
import pandas as pd
xv, yv = th.meshgrid(th.linspace(-10, 10, 100), th.linspace(-10, 10, 100))
o_map = th.zeros_like(xv)
o_map[40:60, 40:60] = 1 # add obstacle centred on origin
map_df = pd.DataFrame(
{"x": xv.flatten(), "y": yv.flatten(), "z": o_map.flatten()}
)
map_chart = (
alt.Chart(map_df)
.mark_rect()
.encode(
x=alt.X("x:O", axis=alt.Axis(format=".2")),
y=alt.Y("y:O", axis=alt.Axis(format=".2")),
color="z:N",
)
.properties(width=500, height=500)
)
x = th.linspace(-5, 10, 100)
line_df = pd.DataFrame({"x": x, "y": 0.2 * x ** 2 - 3})
line_chart = alt.Chart(line_df).mark_line(color="red").encode(x="x:Q", y="y:Q")
layer_chart = map_chart + line_chart
The resulting plots are as follows:
[Figures: line plot, binary map, layered plot]
If you change the x and y channel data type in map_chart from 'O' to 'Q', the axes should be aligned automatically.
...
map_chart = (
alt.Chart(map_df)
.mark_rect()
.encode(
x=alt.X("x:Q"),
y=alt.Y("y:Q"),
color="z:N",
)
.properties(width=500, height=500)
)
...
Update: use mark_square instead of mark_rect
It seems that the quantitative data type doesn't play very well with the rect mark (e.g. in the official heatmap example, if you change the type of x and y to quantitative, the heatmap doesn't look right).
So if rect is not a must-have marker, I would suggest you choose square. As long as your grid is dense enough and the marker size is large enough there won't be empty gaps left between markers, effectively what you want.
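How large is "large enough" can be estimated from the grid density and the chart size. A back-of-the-envelope sketch, assuming the points are evenly spaced across the full chart width and that the mark size is given as an area in square pixels; min_square_size is a hypothetical helper:

```python
import math


def min_square_size(n_points, chart_px):
    """Smallest mark size (area in px^2) so that adjacent squares touch.

    Assumes n_points evenly spaced values span chart_px pixels.
    """
    spacing = chart_px / (n_points - 1)   # pixel distance between neighbouring centres
    side = math.ceil(spacing)             # round up so no gap is left
    return side ** 2


# 100 grid points per axis on a 500 px wide chart, as in the question
size = min_square_size(n_points=100, chart_px=500)
print(size)  # 36
```

The value could then be passed as mark_square(size=size) in place of mark_rect().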
I'm trying to plot a graph with four different values on the "y" axis. So, I have 6 arrays, 2 of which have elements that represent the time values of the "x" axis and the other 4 represent the corresponding elements (in the same position) in relation to the "y" axis.
Example:
LT_TIME = ['18:14:17.566 ', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
The coordinates of the "LT" graph are:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5)
And with these coordinates, I can generate a graph with two "y" axes, which contains the points (-110,-113,-3,-5) and an "x" axis with the points ('18:14:17.566', '18:14:17.570').
Similarly, it is possible to do the same for the "GNR" arrays. So, how can I have all the Cartesian points from both the "LT" and "GNR" arrays on the same graph? I mean, how do I plot so that I have the following coordinates on the same graph:
('18:14:17.566',-110), ('18:14:17.570 ',-113), ('18:14:17.566',-3), ('18:14:17.570',-5),
('18:15:42.489',-94), ('18:32:39.489',-94), ('18:15:42.489',-3), ('18:32:39.489',-7)
It sounds like your problem has two parts: formatting the data in a way that visualisation libraries would understand and actually visualising it using a dual axis.
Your example screenshot includes some interactive controls, so I suggest you use bokeh, which gives you zoom and pan for "free", rather than matplotlib. Besides, I find that bokeh's way of adding a dual axis is more straightforward. If matplotlib is a must, here's another answer that should point you in the right direction.
For the first part, you can merge the data you have into a single dataframe, like so:
import pandas as pd
from bokeh.models import LinearAxis, Range1d, ColumnDataSource
from bokeh.plotting import figure, output_notebook, show
output_notebook() #if working in Jupyter Notebook, output_file() if not
LT_TIME = ['18:14:17.566 ', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
s1 = list(zip(LT_TIME, LT_RP)) + list(zip(GNR_TIME, GNR_RP))
s2 = list(zip(LT_TIME, LT_RQ)) + list(zip(GNR_TIME, GNR_RQ))
df1 = pd.DataFrame(s1, columns=["Date", "RP"])
df2 = pd.DataFrame(s2, columns=["Date", "RQ"])
df = df1.merge(df2, on="Date")
source = ColumnDataSource(df)
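The time strings in the question contain stray spaces ('18:14:17.566 ' and '18: 15: 42.489'), which would otherwise show up verbatim as axis labels. A small sketch of normalising them before the dataframes are built; clean_time is a hypothetical helper, and the HH:MM:SS.mmm format is an assumption based on the sample values:

```python
from datetime import datetime


def clean_time(raw):
    """Strip all whitespace and validate an HH:MM:SS.mmm string."""
    compact = "".join(raw.split())
    datetime.strptime(compact, "%H:%M:%S.%f")   # raises ValueError if malformed
    return compact


LT_TIME = ['18:14:17.566 ', '18:14:17.570']
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']

LT_TIME = [clean_time(t) for t in LT_TIME]
GNR_TIME = [clean_time(t) for t in GNR_TIME]
print(LT_TIME + GNR_TIME)
```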
To visualise the data as a dual axis line chart, we just need to specify the extra y-axis and position it in the layout:
p = figure(x_range=df["Date"], y_range=(-90, -120))
p.line(x="Date", y="RP", color="cadetblue", line_width=2, source=source)
p.extra_y_ranges = {"RQ": Range1d(start=0, end=-10)}
p.line(x="Date", y="RQ", color="firebrick", line_width=2, y_range_name="RQ", source=source)
p.add_layout(LinearAxis(y_range_name="RQ"), 'right')
show(p)
I am preparing a graph of latency percentile results. This is what my pd.DataFrame looks like:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
result = pd.DataFrame(np.random.randint(133000, size=(5,3)), columns=list('ABC'), index=[99.0, 99.9, 99.99, 99.999, 99.9999])
I am using this function (commented lines are different pyplot methods I have already tried to achieve my goal):
def plot_latency_time_bar(result):
    ind = np.arange(4)
    means = []
    stds = []
    for index, row in result.iterrows():
        means.append(np.mean([row[0]//1000, row[1]//1000, row[2]//1000]))
        stds.append(np.std([row[0]//1000, row[1]//1000, row[2]//1000]))
    plt.bar(result.index.values, means, 0.2, yerr=stds, align='center')
    plt.xlabel('Percentile')
    plt.ylabel('Latency')
    plt.xticks(result.index.values)
    # plt.xticks(ind, ('99.0', '99.9', '99.99', '99.999', '99.99999'))
    # plt.autoscale(enable=False, axis='x', tight=False)
    # plt.axis('auto')
    # plt.margins(0.8, 0)
    # plt.semilogx(basex=5)
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    fig = plt.gcf()
    fig.set_size_inches(15.5, 10.5)
And here is the figure:
As you can see, the bars for all percentiles above 99.0 overlap and are completely unreadable. I would like to set a fixed distance between ticks so that the bars are evenly spaced.
Since you're using pandas, you can do all this from within that library:
# `result` is the DataFrame from the question
means = result.mean(axis=1)/1000
stds = result.std(axis=1)/1000
means.plot.bar(yerr=stds, fc='b')
# Make some room for the x-axis tick labels
plt.subplots_adjust(bottom=0.2)
plt.show()
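One detail worth checking when switching from the loop to pandas: DataFrame.std defaults to the sample standard deviation (ddof=1), while np.std uses the population form (ddof=0), so the error bars differ slightly unless ddof is passed explicitly. A quick sketch with made-up random data shaped like the question's DataFrame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
result = pd.DataFrame(
    rng.integers(0, 133000, size=(5, 3)),
    columns=list("ABC"),
    index=[99.0, 99.9, 99.99, 99.999, 99.9999],
)

means = result.mean(axis=1) / 1000
stds_sample = result.std(axis=1) / 1000               # ddof=1, the pandas default
stds_population = result.std(axis=1, ddof=0) / 1000   # same convention as np.std

print(pd.DataFrame({
    "mean": means,
    "std (ddof=1)": stds_sample,
    "std (ddof=0)": stds_population,
}))
```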
Not wishing to take anything away from xnx's answer (which is the most elegant way to do things given that you're working in pandas, and therefore likely the best answer for you) but the key insight you're missing is that, in matplotlib, the x positions of the data you're plotting and the x tick labels are independent things. If you say:
nominalX = np.arange( 1, 6 ) ** 2
y = np.arange( 1, 6 ) ** 4
positionalX = np.arange(len(y))
plt.bar( positionalX, y ) # graph y against the numbers 1..n
plt.gca().set(xticks=positionalX + 0.4, xticklabels=nominalX) # ...but superficially label the X values as something else
then that's different from tying positions to your nominal X values:
plt.bar( nominalX, y )
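Note that I added 0.4 to the x position of the ticks because that is half the default width of the bars (bar( ..., width=0.8 )), so the ticks end up in the middle of each bar. This applies when bars are aligned by their left edge; in Matplotlib 2.0 and later, bars are centred on their x position by default, so the offset is not needed there.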
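Applied to the percentile data from the question, the positional approach could look like the sketch below. The latency numbers are made up, and align="center" is passed explicitly so the ticks need no offset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

percentiles = [99.0, 99.9, 99.99, 99.999, 99.9999]
means = np.array([12.0, 35.0, 61.0, 88.0, 120.0])   # made-up latencies
stds = np.array([2.0, 4.0, 5.0, 9.0, 15.0])         # made-up error bars

# Evenly spaced positions 0, 1, 2, ... instead of the percentile values
positions = np.arange(len(percentiles))

fig, ax = plt.subplots()
ax.bar(positions, means, width=0.6, yerr=stds, align="center")

# Label each evenly spaced position with its nominal percentile
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in percentiles])
ax.set_xlabel("Percentile")
ax.set_ylabel("Latency")
```

Because the bars are placed at 0..4 rather than at 99.0..99.9999, they are evenly spaced regardless of how close the percentile values are to each other.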