Tooltips in Altair line charts - python

When specifying a tooltip for a line chart, the tooltip only appears when hovering over points along a line, but not when hovering anywhere else along a line. This is especially problematic when using a non-linear interpolation... Is there a way to explicitly set tooltips on the lines themselves?
import altair as alt
from vega_datasets import data
source = data.jobs.url
alt.Chart(source).mark_line(interpolate="basis").encode(
    alt.X('year:O'),
    alt.Y('perc:Q', axis=alt.Axis(format='%')),
    color='sex:N',
    tooltip='sex:N'
).properties(
    title='Percent of work-force working as Welders'
).transform_filter(
    alt.datum.job == 'Welder'
)

Extending from @Philipp_Kats's answer and @dominik's comment (and for anyone else who stumbles upon this thread and wishes to see an Altair code example), the current way of achieving a "tooltip" effect along the lines is to:
Create the line (mark_line())
Create a selection that chooses the nearest point & selects based on x-value
Snap some transparent selectors across the line, informing the x-value across different positions of the line
Layer (mark_text()) on top of 1 - 3 above
A real example is this line chart on a simple Flask app I made. Only difference was that I didn't make the selectors transparent (opacity=alt.value(0)) but otherwise it's a line chart with tooltips snapped on it.
Here's a reproducible example using OP's original dataset:
# Step 1: create the line
line = alt.Chart().mark_line(interpolate="basis").encode(
    x=alt.X("year:O"),
    y=alt.Y("perc:Q", axis=alt.Axis(format='%')),
    color='sex:N'
).transform_filter(
    alt.datum.job == 'Welder'
)
# Step 2: Selection that chooses nearest point based on value on x-axis
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['year'])
# Step 3: Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart().mark_point().encode(
    x="year:O",
    opacity=alt.value(0),
).add_selection(
    nearest
)
# Step 4: Add text, show values in Sex column when it's the nearest point to
# mouseover, else show blank
text = line.mark_text(align='left', dx=3, dy=-3).encode(
    text=alt.condition(nearest, 'sex:N', alt.value(' '))
)
# Layer them all together
chart = alt.layer(line, selectors, text, data=source, width=300)
chart
Resulting plot:

I doubt that there is a direct technical solution at the moment :-(
One workaround is to explicitly add points on top of the lines so they are easier to hover over. I usually make them relatively large, but hide them until the hover event, like here. As a cherry on top, one could use Voronoi to show the closest point at any given cursor position, as they do in this tutorial.
Let me know if you need an Altair code example. I used raw Vega, but implementing an Altair version should be relatively trivial.
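The "closest point" idea behind a Voronoi overlay can be sketched without any plotting library: every cursor position is resolved to the data point at the smallest distance, which is the same assignment a Voronoi diagram encodes. The function name `nearest_point` and the sample coordinates are made up for illustration; in a real chart the renderer would measure these distances in pixel space.

```python
import math

def nearest_point(cursor, points):
    """Return the data point closest to the cursor (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(cursor, p))

# Hypothetical (x, y) positions of the points drawn along a line.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0)]

# A cursor between two points snaps to whichever one is nearer.
print(nearest_point((0.4, 1.5), points))
print(nearest_point((2.6, 4.0), points))
```

With this lookup the tooltip never needs the cursor to be exactly on a point, which is what makes the overlay approach comfortable on interpolated lines.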

As of March 2022, a workaround for this without complicating your spec too much with selectors and Voronoi tessellation: Use a thick transparent line in the background (opacity should not be exactly 0, because then it is not rendered) and create a layer chart.
import altair as alt
import pandas as pd
base = (
    alt.Chart(
        pd.DataFrame(
            [{"x": 1, "y": 1}, {"x": 2, "y": 2}, {"x": 3, "y": 1}, {"x": 4, "y": 4}]
        )
    )
    .mark_line()
    .encode(x="x:Q", y="y:Q", tooltip="tt:N")
    .transform_calculate(tt="datum.x+' value'")
)
# Thick, nearly transparent copy of the line that enlarges the hover target.
tt = base.mark_line(strokeWidth=30, opacity=0.01)
base + tt

Related

Understanding the interaction between mark_line point overlay and legend

I have found some unintuitive behavior in the interaction between the point property of mark_line and the appearance of the color legend for Altair/Vega-Lite. I ran into this when attempting to create a line with very large and mostly-transparent points in order to increase the area that would trigger the line's tooltip, but was unable to preserve a visible type=gradient legend.
The following code is an MRE for this problem, showing 6 cases: the use of [False, True, and a custom OverlayMarkDef] for the point property and the use of plain and customized color encoding.
import pandas as pd
import altair as alt
# create data
df = pd.DataFrame()
df['x_data'] = [0, 1, 2] * 3
df['y2'] = [0] * 3 + [1] * 3 + [2] * 3
# initialize
base = alt.Chart(df)
markdef = alt.OverlayMarkDef(size=1000, opacity=.001)
color_encode = alt.Color(shorthand='y2', legend=alt.Legend(title='custom legend', type='gradient'))
marks = [False, True, markdef]
encodes = ['y2', color_encode]
plots = []
for i, m in enumerate(marks):
    for j, c in enumerate(encodes):
        plot = base.mark_line(point=m).\
            encode(x='x_data', y='y2', color=c, tooltip=['x_data','y2']).\
            properties(title=', '.join([['False', 'True', 'markdef'][i], ['plain encoding', 'custom encoding'][j]]))
        plots.append(plot)
combined = alt.vconcat(
alt.hconcat(*plots[:2]).resolve_scale(color='independent'),
alt.hconcat(*plots[2:4]).resolve_scale(color='independent'),
alt.hconcat(*plots[4:]).resolve_scale(color='independent')
).resolve_scale(color='independent')
The resulting plot (the interactive tooltips work as expected):
The color data is the same for each of these plots, and yet the color legend is all over the place. In my real case, the gradient is preferred (the data is quantitative and continuous).
With no point on the mark_line, the legend is correct.
Adding point=True converts the legend to a symbol type - I'm not sure why this is the case since the default legend type is gradient for quantitative data (as seen in the first row) and this is the same data - but can be forced back to gradient by the custom encoding.
Attempting to make a custom point via OverlayMarkDef however renders the forced gradient colorbar invisible - matching the opacity of the OverlayMarkDef. But it is not simply a matter of the legend always inheriting the properties of the point, because the symbol legend does not attempt to reflect the opacity.
I would like to have the normal gradient colorbar available for the custom OverlayMarkDef, but I would also love to build up some intuition for what is going on here.
The transparency issue with the bottom-right plot has been fixed since Altair 4.2.0, so now every case that includes a point on the line changes the legend to 'Ordinal' instead of 'Quantitative'.
I believe the reason the legend is converted to a symbol instead of a gradient is that you are adding filled points and the fill channel is not set to a quantitative field, so it defaults to either ordinal or nominal with a sort:
plot = base.mark_line().encode(
x='x_data',
y='y2',
color='y2',
)
plot + plot.mark_circle(opacity=1)
mark_point gives a gradient legend since it has no fill, and if we set the fill for mark_circle explicitly we also get a gradient legend (one for fill and one for color).
plot = base.mark_line().encode(
x='x_data',
y='y2',
color='y2',
fill='y2'
)
plot + plot.mark_circle(opacity=1)
I agree with you that this is a bit unexpected and it would be more convenient if the encoding type of point=True was set to the same as that used for the lines. You might suggest this as an enhancement in VegaLite together with reporting the apparent bug that you can't override the legend type via type='gradient'.

Holoviews DynamicMap Area or Curve with two streams is showing wrong chart

I would like to use HoloViews DynamicMap with a widget to select data for two curves, and a widget to control whether the curves are shown separately or as a filled area. It almost works, but sometimes shows the wrong data, depending on the order in which the widgets are manipulated.
The code snippet below demonstrates the issue, if run in a Jupyter notebook. It creates two identical DynamicMaps to show how they get out of sync with the widgets.
For this demo, if 'fill', an Area chart is shown. Otherwise, two Curve elements show the top and bottom bounds of the same area.
If 'higher', the area or curves are shifted upwards along the vertical axis (higher y values).
First, one DynamicMap is displayed. The code snippet then toggles the widget for 'fill' followed by 'higher', in that order (alternatively, the user could manually toggle the widgets). The DynamicMap should show a filled area in the higher position, but actually shows a filled area in the lower position. The image below the code snippet shows this incorrect DynamicMap on the left.
The second DynamicMap (shown on the right) is added to the display after the widgets are toggled. It correctly displays a chart corresponding to the state of the widgets at that point.
Code snippet
import holoviews as hv
import numpy as np
import panel as pn
pn.extension()
# Make two binary widgets to control whether chart
# data is high or low, and whether chart shows
# an area fill or just a pair of lines.
check_boxes = {name: pn.widgets.Checkbox(value=False, name=name) \
for name in ["higher", "fill"]}
# Data for charts.
xvals = [0.10, 0.90]
yvals_high = [1, 1.25]
yvals_low = [0.25, 0.40]
# Declare horizontal and vertical dimensions to go on charts.
xdim = hv.Dimension("x", range=(-0.5, 1.5), label="xdim")
ydim = hv.Dimension("y", range=(0, 2), label="ydim")
def make_plot(higher, fill):
    """Make high or low, filled area or line plot"""
    yvals_line1 = np.array(yvals_high if higher else yvals_low)
    yvals_line2 = 1.2*yvals_line1
    if fill:
        # Make filled area plot with x series and two y series.
        area_data = (xvals, yvals_line1, yvals_line2)
        plot = hv.Area(area_data,
                       kdims=xdim,
                       vdims=[ydim, ydim.clone("y.2")])
        plot = hv.Overlay([plot]) # DMap will want an overlay.
    else:
        # Make line plot with x series and y series.
        line_data_low = (xvals, yvals_line1)
        line_data_high = (xvals, yvals_line2)
        plot = hv.Curve(line_data_low,
                        kdims=xdim,
                        vdims=ydim) \
               * hv.Curve(line_data_high,
                          kdims=xdim,
                          vdims=ydim)
    return plot
# Map combinations of higher and fill to corresponding charts.
chart_dict = {(higher, fill): make_plot(higher, fill) \
              for higher in [False,True] for fill in [False,True]}
def chart_func(higher, fill):
    """Return chart from chart_dict lookup"""
    return chart_dict[higher, fill]
# Make two DynamicMaps linked to the check boxes.
dmap1 = hv.DynamicMap(chart_func, kdims=["higher", "fill"], streams=check_boxes)
dmap2 = hv.DynamicMap(chart_func, kdims=["higher", "fill"], streams=check_boxes)
# Show the check boxes, and one of the DMaps.
widget_row = pn.Row(*check_boxes.values(), width=150)
dmap_row = pn.Row(dmap1, align='start')
layout = pn.Column(widget_row,
dmap_row)
display(layout)
## Optionally use following line to launch a server, then toggle widgets.
#layout.show()
# Toggle 'fill' and then 'higher', in that order.
# Both DMaps should track widgets...
check_boxes["fill"].value = True
check_boxes["higher"].value = True
# Show the other DMap, which displays correctly given the current widgets.
dmap_row.append(dmap2)
# But first dmap (left) is now showing an area in wrong location.
Notebook display
Further widget toggles
The code snippet below can be run immediately afterwards in another cell. The resulting notebook display is shown in an image below the code snippet.
The code here toggles the widgets again, 'fill' and 'higher', in that order (alternatively, the user could manually toggle the widgets).
The left DynamicMap correctly displays a chart corresponding to the state of the widgets at that point, that is, two lines in the lower position.
The right DynamicMap incorrectly shows the two lines in the higher position.
# Toggle 'fill' and then 'higher' again, in that order.
# Both DMaps should track widgets...
check_boxes["fill"].value = False
check_boxes["higher"].value = False
# But now the second DMap shows lines in wrong location.
Am I just going about this the wrong way?
Thanks for the detailed, reproducible report!
After running your example, I noticed two things:
Switching from pn.extension to hv.extension at the start seems to fix the strange behavior that I also observed when using the panel extension. Could you confirm that things work as expected when using the holoviews extension?
I was wondering why your DynamicMaps work via chart_dict and chart_func when you can just use your make_plot callback in the DynamicMaps directly, without modification.
If you can confirm that the extension used changes the behavior, could you file an issue about this? Thanks!

Python: Add calculated lines to a scatter plot with a nested categorical x-axis

Cross-post: https://discourse.bokeh.org/t/add-calculated-horizontal-lines-corresponding-to-categories-on-the-x-axis/5544
I would like to duplicate this plot in Python:
Here is my attempt, using pandas and bokeh:
Imports:
import pandas as pd
from bokeh.io import output_notebook, show, reset_output
from bokeh.palettes import Spectral5, Turbo256
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from bokeh.models import Band, Span, FactorRange, ColumnDataSource
Create data:
fruits = ['Apples', 'Pears']
years = ['2015', '2016']
data = {'fruit' : fruits,
'2015' : [2, 1],
'2016' : [5, 3]}
fruit_df = pd.DataFrame(data).set_index("fruit")
tidy_df = (pd.DataFrame(data)
.melt(id_vars=["fruit"], var_name="year")
.assign(fruit_year=lambda df: list(zip(df['fruit'], df['year'])))
.set_index('fruit_year'))
Create bokeh plot:
p = figure(x_range=FactorRange(factors=tidy_df.index.unique()),
           plot_height=400,
           plot_width=400,
           tooltips=[('Fruit', '@fruit'), # first string is user-defined; second string must refer to a column
                     ('Year', '@year'),
                     ('Value', '@value')])
cds = ColumnDataSource(tidy_df)
index_cmap = factor_cmap("fruit",
Spectral5[:2],
factors=sorted(tidy_df["fruit"].unique())) # this is a reference back to the dataframe
p.circle(x='fruit_year',
y='value',
size=20,
source=cds,
fill_color=index_cmap,
line_color=None,
)
# how do I add a median just to one categorical section?
median = Span(location=tidy_df.loc[tidy_df["fruit"] == "Apples", "value"].median(), # median value for Apples
#dimension='height',
line_color='red',
line_dash='dashed',
line_width=1.0
)
p.add_layout(median)
# how do I add this standard deviation(ish) band to just the Apples or Pears section?
band = Band(
base='fruit_year',
lower=2,
upper=4,
source=cds,
)
p.add_layout(band)
show(p)
Output:
Am I up against this issue? https://github.com/bokeh/bokeh/issues/8592
Is there any other data visualization library for Python that can accomplish this? Altair, Holoviews, Matplotlib, Plotly... ?
Band is a connected area, but your image of the desired output has two disconnected areas. Meaning, you actually need two bands. Take a look at the example here to better understand bands: https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#bands
By using Band(base='fruit_year', lower=2, upper=4, source=cds) you ask Bokeh to plot a band where for each value of fruit_year, the lower coordinate will be 2 and the upper coordinate will be 4. Which is exactly what you see on your Bokeh plot.
A bit unrelated but still a mistake - notice how your X axis is different from what you wanted. You have to specify the major category first, so replace list(zip(df['fruit'], df['year'])) with list(zip(df['year'], df['fruit'])).
Now, to the "how to" part. Since you need two separate bands, you cannot provide them with the same data source. The way to do it would be to have two extra data sources - one for each band. It ends up being something like this:
for year, sd in [('2015', 0.3), ('2016', 0.5)]:
    b_df = (tidy_df[tidy_df['year'] == year]
            .drop(columns=['year', 'fruit'])
            .assign(lower=lambda df: df['value'].min() - sd,
                    upper=lambda df: df['value'].max() + sd)
            .drop(columns='value'))
    p.add_layout(Band(base='fruit_year', lower='lower', upper='upper',
                      source=ColumnDataSource(b_df)))
There are two issues left however. The first one is a trivial one - the automatic Y range (an instance of DataRange1d class by default) will not take the bands' heights into account. So the bands can easily go out of bounds and be cropped by the plot. The solution here is to use manual ranging that takes the SD values into account.
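As a rough sketch of that manual ranging, the band limits can be computed up front and turned into an explicit range. The helper name `band_y_range` and the `margin` padding are assumptions for illustration, not part of Bokeh; the resulting tuple could then be passed as `y_range` when creating the figure.

```python
def band_y_range(values_by_year, sd_by_year, margin=0.1):
    """Return (start, end) covering every band: min - sd and max + sd per year."""
    lowers = [min(vals) - sd_by_year[year] for year, vals in values_by_year.items()]
    uppers = [max(vals) + sd_by_year[year] for year, vals in values_by_year.items()]
    return min(lowers) - margin, max(uppers) + margin

# Values from the question, grouped by year, with the SD-like offsets used above.
values_by_year = {"2015": [2, 1], "2016": [5, 3]}
sd_by_year = {"2015": 0.3, "2016": 0.5}

y_start, y_end = band_y_range(values_by_year, sd_by_year)
print(y_start, y_end)  # e.g. figure(..., y_range=(y_start, y_end))
```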
The second issue is that the width of band is limited to the X range factors, meaning that the circles will be partially outside of the band. This one is not that easy to fix. Usually a solution would be to use a transform to just shift the coordinates a bit at the edges. But since this is a categorical axis, we cannot do it. One possible solution here is to create a custom Band model that adds an offset:
class MyBand(Band):
    # language=TypeScript
    __implementation__ = """
import {Band, BandView} from "models/annotations/band"
export class MyBandView extends BandView {
  protected _map_data(): void {
    super._map_data()
    const base_sx = this.model.dimension == 'height' ? this._lower_sx : this._lower_sy
    if (base_sx.length > 1) {
      const offset = (base_sx[1] - base_sx[0]) / 2
      base_sx[0] -= offset
      base_sx[base_sx.length - 1] += offset
    }
  }
}
export class MyBand extends Band {
  __view_type__: MyBandView
  static init_MyBand(): void {
    this.prototype.default_view = MyBandView
  }
}
"""
Just replace Band with MyBand in the code above and it should work. One caveat: you will need to have Node.js installed, and the startup time will be a second or two longer because the custom model code needs compilation. Another caveat: the custom model code relies on internals of BokehJS, meaning that while it works with Bokeh 2.0.2, I can't guarantee that it will work with any other Bokeh version.

Adding X-Y offsets to data points

I'm looking for a way to specify an X-Y offset to plotted data points. I'm just getting into Altair, so please bear with me.
The situation: I have a dataset recording daily measurements for 30 people. Every person can register several different types of measurements every day.
Example dataset & plot, with 2 people and 2 measurement types:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({"date": pd.to_datetime(pd.date_range("2019-12-01", periods=5).repeat(4)),
                             "person": np.tile(["Bob", "Amy"], 10),
                             "measurement_type": np.tile(["score_a", "score_a", "score_b", "score_b"], 5),
                             "value": 20.0*np.random.random(size=20)})
import altair as alt
alt.Chart(df, width=600, height=100) \
.mark_circle(size=150) \
.encode(x = "date",
y = "person",
color = alt.Color("value"))
This gives me this graph:
In the example above, the 2 measurement types are plotted on top of each other. I would like to add an offset to the circles depending on the "measurement_type" column, so that they can all be made visible around the date-person location in the graph.
Here's a mockup of what I want to achieve:
I've been searching the docs but haven't figured out how to do this - been experimenting with the "stack" option, with the dx and dy options, ...
I have a feeling this should just be another encoding channel (offset or alike), but that doesn't exist.
Can anyone point me in the right direction?
There is currently no concept of an offset encoding in Altair, so the best approach to this will be to combine a row encoding with a y encoding, similar to the Grouped Bar Chart example in Altair's documentation:
alt.Chart(df,
width=600, height=100
).mark_circle(
size=150
).encode(
x = "date",
row='person',
y = "measurement_type",
color = alt.Color("value")
)
You can then fine-tune the look of the result using standard chart configuration settings:
alt.Chart(df,
width=600, height=alt.Step(25)
).mark_circle(
size=150
).encode(
x = "date",
row='person',
y = alt.Y("measurement_type", title=None),
color = alt.Color("value")
).configure_facet(
spacing=10
).configure_view(
strokeOpacity=0
)
Well, I don't know what result you are getting up until now, but maybe write a function with parameters like def chart(DotsOnXAxis, FirstDotsOnYAxis, SecondDotsOnYAxis, OffsetAmount)
and then put those variables in the right place.
If you want an offset between the dots, maybe put in a rule like: SecondDotsOnYAxis = FirstDotsOnYAxis + OffsetAmount
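One way to read that suggestion is to turn the categorical y position into a number and add a small per-measurement offset before plotting. The sketch below uses plain Python; the function name `offset_positions` and the offset size are hypothetical, and the resulting `y` values would still have to be drawn on a quantitative axis with custom tick labels.

```python
def offset_positions(rows, offset_amount=0.2):
    """Attach a numeric y to each row: person index plus a per-type offset."""
    people = sorted({r["person"] for r in rows})
    types = sorted({r["measurement_type"] for r in rows})
    # Centre the offsets around the person's base position.
    centre = (len(types) - 1) / 2
    out = []
    for r in rows:
        base = people.index(r["person"])
        shift = (types.index(r["measurement_type"]) - centre) * offset_amount
        out.append({**r, "y": base + shift})
    return out

rows = [
    {"person": "Amy", "measurement_type": "score_a"},
    {"person": "Amy", "measurement_type": "score_b"},
    {"person": "Bob", "measurement_type": "score_a"},
    {"person": "Bob", "measurement_type": "score_b"},
]
for r in offset_positions(rows):
    print(r["person"], r["measurement_type"], r["y"])
```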

plotly - multiple traces using a shared slider variable

As the title hints, I'm struggling to create a plotly chart that has multiple lines that are functions of the same slider variable.
I hacked something together using bits and pieces from the documentation: https://pastebin.com/eBixANqA. This works for one line.
Now I want to add more lines to the same chart, but this is where I'm struggling. https://pastebin.com/qZCMGeAa.
I'm getting a PlotlyListEntryError: Invalid entry found in 'data' at index, '0'
Path To Error: ['data'][0]
Can someone please help?
It looks like you were using https://plot.ly/python/sliders/ as a reference. Unfortunately I don't have time to test with your code, but this should be easily adaptable. Create each trace you want to plot in the same way that you have been:
trace1 = [dict(
type='scatter',
visible = False,
name = "trace title",
mode = 'markers+lines',
x = x[0:step],
y = y[0:step]) for step in range(len(x))]
Note that in my example the data comes from pre-defined lists, whereas you are using a function; that's probably the only change you'll really need to make, besides your own step size etc.
If you create a second trace in the same way, for example
trace2 = [dict(
type='scatter',
visible = False,
name = "trace title",
mode = 'markers+lines',
x = x2[0:step],
y = y2[0:step]) for step in range(len(x2))]
Then you can put all your data together with the following
all_traces = trace1 + trace2
then you can just go ahead and plot it provided you have your layout set up correctly (it should remain unchanged from your single trace example):
fig = py.graph_objs.Figure(data=all_traces, layout=layout)
py.offline.iplot(fig)
Your slider should control both traces provided you were following https://plot.ly/python/sliders/ to get the slider working. You can combine multiple data dictionaries this way in order to have multiple plots controlled by the same slider.
I do note that if your lists of dictionaries containing data are of different length, that this gets topsy-turvy.
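To make the shared slider explicit: with the restyle/visible pattern from the linked sliders page, each slider step carries one boolean per trace in all_traces, so step i has to switch on the i-th entry of every series. A minimal sketch with plain dicts follows; `build_steps` is a hypothetical helper, and it assumes both series have the same number of steps (the case that avoids the problem noted above).

```python
def build_steps(n_steps, n_series):
    """Build slider steps that show the i-th trace of every series at step i."""
    steps = []
    for i in range(n_steps):
        visible = [False] * (n_steps * n_series)
        for k in range(n_series):
            # Series k occupies the slice [k * n_steps, (k + 1) * n_steps).
            visible[k * n_steps + i] = True
        steps.append(dict(method="restyle", args=["visible", visible]))
    return steps

steps = build_steps(n_steps=3, n_series=2)
sliders = [dict(active=0, steps=steps)]  # would go into layout["sliders"]
print(steps[1]["args"][1])
```

If the series had different lengths, the index arithmetic above would no longer line up, which is one way to see why mismatched lists misbehave.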
