Altair Layer Chart Y Axis Not Resolving to Same Scale - python

I'm trying to plot data and compare it to a threshold set at a fixed value. When creating a layered chart, the y axis does not appear to hold for both layers. The same goes for hconcat.
I found this issue which mentions .resolve_scale(y='shared'), but that doesn't seem to work. When I specify the rule to be at 5, it appears above 15.
import altair as alt
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    'x': np.linspace(0, 10, 500),
    'y': np.random.randn(500).cumsum()
})
base = alt.Chart(df)
line = base.mark_line().encode(x='x', y='y')
rule = base.mark_rule().encode(y=alt.value(5))
alt.layer(line, rule).resolve_scale(y='shared')
To get the rule to appear at the value 5, I have to set it at 110.
rule = base.mark_rule().encode(y=alt.value(110))
alt.layer(line, rule).resolve_scale(y='shared')
How can I edit the chart so that the rule shows at the y-value specified?

Altair scales map a domain to a range. The domain describes the extent of the data values, while the range describes the extent of the visual features to which those values are mapped. For color encodings, the range might be "red", "blue", "green", etc. For positional encodings like x and y, the range is the pixel position of the mark on the chart.
When you use alt.value, you are specifying the range value, not the domain value. This is why you can use an encoding like color=alt.value('red'), to specify that you want the mark to appear as the color red. When you do y=alt.value(5), you are saying you want the mark to appear 5 pixels from the top of the y-axis.
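To make the domain/range distinction concrete, here is a minimal pure-Python sketch of a linear scale. The function name linear_scale, the domain of -10 to 20, and the 300-pixel chart height are illustrative assumptions; they are not taken from the chart in the question or from Altair's internals.

```python
def linear_scale(value, domain, pixel_range):
    """Map a data value from `domain` onto `pixel_range` by linear interpolation."""
    d0, d1 = domain
    r0, r1 = pixel_range
    fraction = (value - d0) / (d1 - d0)
    return r0 + fraction * (r1 - r0)


# Hypothetical y scale: data from -10 to 20 drawn on a 300-pixel-tall chart.
# Pixel 0 is the top of the chart, so the range runs from 300 (bottom) to 0 (top).
domain = (-10, 20)
pixel_range = (300, 0)

# A data value of 5 sits halfway through this domain, so it lands mid-chart.
pixel_for_data_5 = linear_scale(5, domain, pixel_range)

# alt.value(5) bypasses the scale: it is already a range (pixel) value.
pixel_for_value_5 = 5
```

With these made-up numbers the data value 5 and the pixel value 5 end up far apart, which is the same kind of mismatch the question describes.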
Recent versions of Vega-Lite added the ability to specify the domain value via datum rather than value. Altair did not support this when this answer was written (alt.datum arrived in Altair 4.2), so the approach that works regardless of version is to have a data field with the desired value. For example:
line = base.mark_line().encode(x='x', y='y')
rule = alt.Chart(pd.DataFrame({'y': [5]})).mark_rule().encode(y='y')
alt.layer(line, rule).resolve_scale(y='shared')


Understanding the interaction between mark_line point overlay and legend

I have found some unintuitive behavior in the interaction between the point property of mark_line and the appearance of the color legend for Altair/Vega-Lite. I ran into this when attempting to create a line with very large and mostly-transparent points in order to increase the area that would trigger the line's tooltip, but was unable to preserve a visible type=gradient legend.
The following code is an MRE for this problem, showing 6 cases: the use of [False, True, and a custom OverlayMarkDef] for the point property and the use of plain and customized color encoding.
import pandas as pd
import altair as alt
# create data
df = pd.DataFrame()
df['x_data'] = [0, 1, 2] * 3
df['y2'] = [0] * 3 + [1] * 3 + [2] * 3
# initialize
base = alt.Chart(df)
markdef = alt.OverlayMarkDef(size=1000, opacity=.001)
color_encode = alt.Color(shorthand='y2', legend=alt.Legend(title='custom legend', type='gradient'))
marks = [False, True, markdef]
encodes = ['y2', color_encode]
plots = []
for i, m in enumerate(marks):
    for j, c in enumerate(encodes):
        plot = base.mark_line(point=m).\
            encode(x='x_data', y='y2', color=c, tooltip=['x_data','y2']).\
            properties(title=', '.join([['False', 'True', 'markdef'][i], ['plain encoding', 'custom encoding'][j]]))
        plots.append(plot)
combined = alt.vconcat(
    alt.hconcat(*plots[:2]).resolve_scale(color='independent'),
    alt.hconcat(*plots[2:4]).resolve_scale(color='independent'),
    alt.hconcat(*plots[4:]).resolve_scale(color='independent')
).resolve_scale(color='independent')
The resulting plot (the interactive tooltips work as expected):
The color data is the same for each of these plots, and yet the color legend is all over the place. In my real case, the gradient is preferred (the data is quantitative and continuous).
With no point on the mark_line, the legend is correct.
Adding point=True converts the legend to a symbol type - I'm not sure why this is the case since the default legend type is gradient for quantitative data (as seen in the first row) and this is the same data - but can be forced back to gradient by the custom encoding.
Attempting to make a custom point via OverlayMarkDef however renders the forced gradient colorbar invisible - matching the opacity of the OverlayMarkDef. But it is not simply a matter of the legend always inheriting the properties of the point, because the symbol legend does not attempt to reflect the opacity.
I would like to have the normal gradient colorbar available for the custom OverlayMarkDef, but I would also love to build up some intuition for what is going on here.
The transparency issue with the bottom-right plot has been fixed since Altair 4.2.0, so now every case that includes a point on the line changes the legend to 'Ordinal' instead of 'Quantitative'.
I believe the reason the legend is converted to a symbol instead of a gradient is that you are adding filled points and the fill channel is not set to a quantitative field, so it defaults to either ordinal or nominal with a sort:
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
)
plot + plot.mark_circle(opacity=1)
mark_point gives a gradient legend since it has no fill, and if we set the fill for mark_circle explicitly we also get a gradient legend (one for fill and one for color):
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
    fill='y2'
)
plot + plot.mark_circle(opacity=1)
I agree with you that this is a bit unexpected, and it would be more convenient if the encoding type of point=True were set to the same type as that used for the lines. You might suggest this as an enhancement in Vega-Lite, together with reporting the apparent bug that you can't override the legend type via type='gradient'.

How to Convert Color-Code Legends from Logarithmic Scale to Actual Values?

What is the best way to display actual values in the color-coded legend when using logarithmic-scale color coding in plotly.figure_factory.create_choropleth?
Here is the sample code:
import numpy as np
import plotly.figure_factory as ff

# df and colorscale are defined elsewhere (not shown in the question)
fips = df['fips']
values = np.log10(df['values'])
endpts = list(np.linspace(0, 4, len(colorscale) - 1))
fig = ff.create_choropleth(fips=fips, values=values, scope=['usa'], binning_endpoints=endpts)
Here is what I have currently:
Here is what I wish to have:
Exactly the same map as above, except that the legend displays actual numbers instead of log10(values). For example, instead of 0.0-0.5 and 0.5-1.0 (meaning 10^0 to 10^(1/2), and 10^(1/2) to 10^1), I would like to see 1-3, 4-10, and so forth.
I am not familiar with the Plotly API, and since you did not provide a minimal working example it is hard for me to test, but I am quite confident that you could specify a colormap. If so, you could convert the colormap to a logarithmic scale while feeding in the numbers on a linear scale.
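One way to read this suggestion is to keep the bins logarithmic but express their endpoints in raw units. The sketch below is plain Python and only does the arithmetic. The helper names are hypothetical, and whether passing the raw values together with these linear endpoints as binning_endpoints produces the desired legend in create_choropleth is an untested assumption.

```python
def log_endpoints_to_linear(log_endpoints):
    """Convert bin endpoints given as log10(value) back to raw values."""
    return [10 ** e for e in log_endpoints]


def range_labels(linear_endpoints):
    """Build labels such as '1-3' for each pair of consecutive endpoints."""
    rounded = [round(v) for v in linear_endpoints]
    return [f"{lo}-{hi}" for lo, hi in zip(rounded, rounded[1:])]


# Endpoints spaced evenly in log10 space, as in the question's legend.
log_endpts = [0.0, 0.5, 1.0, 1.5, 2.0]

# The same bin boundaries in the units of the original data.
linear_endpts = log_endpoints_to_linear(log_endpts)

# Human-readable bin labels in raw units.
labels = range_labels(linear_endpts)
```

Because the bin boundaries are identical in both spaces, binning the raw values with linear_endpts should group the data the same way as binning log10(values) with log_endpts.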

How to hack this Bokeh HexTile plot to fix the coords, label placement and axes?

Below is Bokeh 1.4.0 code that tries to draw a HexTile map of the input dataframe, with axes, and tries to place labels on each hex.
I've been stuck on this for two days solid, reading bokeh doc, examples and github known issues, SO, Bokeh Discourse and Red Blob Games's superb tutorial on Hexagonal Grids, and trying code. (I'm less interested in raising Bokeh issues for the future, and far more interested in pragmatic workarounds to known limitations to just get my map code working today.) Plot is below, and code at bottom.
Here are the issues, in rough decreasing order of importance (it's impossible to separate the root-cause and tell which causes which, due to the way Bokeh handles glyphs. If I apply one scale factor or coord transform it fixes one set of issues, but breaks another, 'whack-a-mole' effect):
1. The label placement is obviously wrong, but I can't seem to hack up any variant of either (x,y) coords or (q,r) coords to work. (I tried combinations of figure(..., match_aspect=True), I tried scaling the (x,y)-coords by 1/sqrt(2), and I tried HexTile(..., size, scale) params as per redblobgames, e.g. size = 1/sqrt(3) ~ 0.57735.)
2. Bokeh forces the origin to be top left, and y-coords to increase as you go down, however the default axis labels show y or r as being negative. I found I still had to use p.text(q, -r, ...). I suppose I have to manually patch the auto-supplied yaxis labels or TickFormatter to be positive.
3. I use np.mgrid to generate the coord grid, but I still seem to have to assign q-coords right-to-left: np.mgrid[0:8, (4+1):0:-1]. Still, no matter what I do, the hexes are flipped L-to-R.
   (Note: empty '' counties are placeholders to get the desired shape, hence the boolean mask [counties!=''] on grid coords. This works fine and I want to leave it as-is.)
4. The source (q,r) coords for the hexes are integers, and I use 'odd-r' offset coords (not axial or hexagonal coords). No matter what HexTile(..., size, scale) args I use, and whether or not I include the 1/sqrt(2) factor in the coord transform, one or both dimensions in the plot are wrong or squashed.
   My +q-axis is east and my +r-axis should be 120° SSE.
5. Ideally I'd like to have my origin at bottom-left (math plot style, not computer graphics). But Bokeh apparently doesn't support that; I can live without it. However, defaulting the y-axis labels to negative, while requiring a mix of positive and negative coords, is confusing. Anyway, how to hack an automatic fix to that with minimum grief? (manual p.yrange = Range1d(?, ?)?)
6. Bokeh's approach to attaching (hex) glyphs to plots is a hard idiom to use. Ideally I simply want to reference (q,r)-coords everywhere for hexes, labels, axes. I never want to see (x,y)-coords appearing on axes, label coords, tick-marks, etc., but it seems Bokeh won't allow that. I guess you have to manually hack the axes and ticks later. Also, the plot<->glyph interface doesn't allow you to expose a (q,r) <-> (x,y) coord transform function, certainly not a bidirectional one.
7. The default axes don't seem to have any accessors to automatically find their current extent/limits; p.yaxis.start/end are empty unless you specified them. The result from p.yaxis.major_tick_in, p.yaxis.major_tick_out is also wrong: for this plot it gives (2,6) for both x and y, and seems to be clipping those to the interior multiples of 2(?). How to automatically get the axes' extent?
My current plot:
My code:
import pandas as pd
import numpy as np
from math import sqrt
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.models.glyphs import HexTile
from bokeh.io import show
# Data source is a list of county abbreviations, in (q,r) coords...
counties = np.array([
    ['TE','DY','AM','DN', ''],
    ['DL','FM','MN','AH', ''],
    ['SO','LM','CN','LH', ''],
    ['MO','RN','LD','WH','MH'],
    ['GA','OY','KE','D', ''],
    ['', 'CE','LS','WW', ''],
    ['LC','TA','KK','CW', ''],
    ['KY','CR','WF','WX', ''],
])
#counties = counties[::-1] # UNUSED: flip so origin is at bottom-left
# (q,r) Coordinate system is “odd/even-r” horizontal Offset coords
r, q = np.mgrid[0:8, (4+1):0:-1]
q = q[counties!='']
r = r[counties!='']
sqrt3 = sqrt(3)
# Try to transform odd-r (q,r) offset coords -> (x,y). Per Red Blob Games' tutorial.
x = q - (r//2) # this may be slightly dubious
y = r
counties_df = pd.DataFrame({'q': q, 'r': r, 'abbrev': counties[counties!=''], 'x': x, 'y': y })
counties_ds = ColumnDataSource(ColumnDataSource.from_df(counties_df)) # ({'q': q, 'r': r, 'abbrev': counties[counties != '']})
p = figure(tools='save,crosshair') # match_aspect=True?
glyph = HexTile(orientation='pointytop', q='x', r='y', size=0.76, fill_color='#f6f699', line_color='black') # q,r,size,scale=??!?!!? size=0.76 is an empirical hack.
p.add_glyph(counties_ds, glyph)
p.xaxis.minor_tick_line_color = None
p.yaxis.minor_tick_line_color = None
print(f'Axes: x={p.xaxis.major_tick_in}:{p.xaxis.major_tick_out} y={p.yaxis.major_tick_in}:{p.yaxis.major_tick_out}')
# Now can't manage to get the right coords for text labels
p.text(q, -r, text=["(%d, %d)" % (q,r) for (q, r) in zip(q, r)], text_baseline="middle", text_align="center")
# Ideally I ultimately want to fix this and plot `abbrev` column as the text label
show(p)
1. There is an axial_to_cartesian function that will just compute the hex centers for you. You can then attach the labels in a variety of orientations and anchoring from these.
2. Bokeh does not force the origin to be anywhere. There is one axial to cartesian mapping Bokeh uses, exactly what is given by axial_to_cartesian. The position of the Hex tiles (and hence the cartesian coordinates that the axes display) follows from this. If you want different ticks, Bokeh affords lots of control points over both tick location and tick labelling.
3. There is more than one convention for Axial coords. Bokeh picked the one that has the r-axis tile "up and to the left", i.e. the one explicitly shown here:
   https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html#hex-tiles
4. Bokeh expects up-and-to-the-left axial coords. You will need to convert whatever coordinate system you have to that. For "squishing" you will need to set match_aspect=True to ensure the "data space" aspect ratio matches the "pixel space" aspect ratio 1-1.
   Alternatively, if you don't or can't use auto-ranging you will need to set the plot size carefully and also control the border sizes with min_border_left etc. to make sure the borders are always big enough to accommodate any tick labels you have (so that the inner region will not be resized).
5. I don't really understand this question, but you have absolute control over what ticks visually appear, regardless of the underlying tick data. Besides the built-in formatters, there is FuncTickFormatter that lets you format ticks any way you want with a snippet of JS code. [1] (And you also have control of where ticks are located, if you want that.)
   [1] Please note the CoffeeScript and from_py_func options are both deprecated and being removed in the next 2.0 release.
6. Again, you'll want to use axial_to_cartesian to position anything other than Hex tiles. No other glyphs in Bokeh understand axial coordinates (which is why we provide the conversion function).
7. You misunderstood what major_tick_in and major_tick_out are for. They are literally how far the ticks visually extend inside and outside the plot frame, in pixels.
   Auto-ranging (with DataRange1d) is only computed in the browser, in JavaScript, which is why the start/end are not available on the "Python" side. If you need to know the start/end, you will need to explicitly set the start/end yourself. Note, however, that match_aspect=True only functions with DataRange1d. If you explicitly set start/end manually, Bokeh will assume you know what you want, and will honor what you ask for, regardless of what it does to aspect.
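The hex-center computation that this answer relies on can be sketched in plain Python. This is the standard pointy-top axial-to-cartesian formula with y negated, so that increasing r moves a tile down the plot. It reflects my reading of what bokeh.util.hex.axial_to_cartesian computes and leaves out its aspect_scale argument, so treat it as an assumption and prefer the library function in real code. The name pointytop_center is hypothetical.

```python
from math import sqrt


def pointytop_center(q, r, size):
    """Return the (x, y) center of a pointy-top hex at axial coords (q, r).

    Assumption: mirrors the formula Bokeh is believed to use, without aspect_scale.
    """
    x = size * sqrt(3) * (q + r / 2)
    y = -size * 1.5 * r
    return x, y


# Centers for a few tiles, computed from the same (q, r) used for the hexes.
tiles = [(0, 0), (1, 0), (0, 1)]
centers = [pointytop_center(q, r, size=1.0) for q, r in tiles]
```

Centers computed this way are in the same cartesian space as the axes, so they are the kind of coordinates a text glyph needs for its x and y.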
Below are my solution and plot. They mainly follow @bigreddot's advice, but there's still some coordinate hacking needed:
Expecting users to pass input coords as axial instead of offset coords is a major limitation. I work around this. There's no point in creating an offset_to_cartesian() because we need to negate r in two out of three places:
My input is even-r offset coords. I still need to manually apply the offset: q = q + (r+1)//2
I need to manually negate r in both the axial_to_cartesian() call and the datasource creation for the glyph. (But not in the text() call).
The call needs to be: axial_to_cartesian(q, -r, size=2/3, orientation='pointytop')
Need p = figure(match_aspect=True ...) to prevent squishing
I need to manually create my x,y axes to get the range right
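The offset handling described in the points above can be checked without Bokeh. The sketch below applies the same shift, q + (r + 1)//2, and then negates r, as the solution does. The helper names offset_to_axial and column_position are hypothetical, and the horizontal position is expressed in units of one hex width, using the pointy-top relation in which x is proportional to q + r/2.

```python
def offset_to_axial(col, row):
    """Convert a (col, row) grid index to axial (q, r).

    Follows the solution's arithmetic: shift q by (row + 1)//2, then negate r.
    """
    q = col + (row + 1) // 2
    r = -row
    return q, r


def column_position(q, r):
    """Horizontal center, in hex widths, for a pointy-top layout."""
    return q + r / 2


# Grid column 2 in rows 0..3: odd rows end up shifted right by half a hex.
positions = [column_position(*offset_to_axial(2, row)) for row in range(4)]
```

The alternating half-hex shift between even and odd rows is what makes the tiles interlock, which suggests the manual q adjustment is doing the job of an offset-to-axial conversion.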
Solution:
import pandas as pd
import numpy as np
from math import sqrt
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Range1d
from bokeh.models.glyphs import HexTile
from bokeh.io import curdoc, show
from bokeh.util.hex import cartesian_to_axial, axial_to_cartesian
counties = np.array([
    ['DL','DY','AM','', ''],
    ['FM','TE','AH','DN', ''],
    ['SO','LM','CN','MN', ''],
    ['MO','RN','LD','MH','LH'],
    ['GA','OY','WH','D' ,'' ],
    ['' ,'CE','LS','KE','WW'],
    ['LC','TA','KK','CW','' ],
    ['KY','CR','WF','WX','' ]
])
counties = np.flip(counties, (0)) # Flip UD for bokeh
# (q,r) Coordinate system is “odd/even-r” horizontal Offset coords
r, q = np.mgrid[0:8, 0:(4+1)]
q = q[counties!='']
r = r[counties!='']
# Transform for odd-r offset coords; +r-axis goes up
q = q + (r+1)//2
#r = -r # cannot globally negate 'r', see comments
# Transform odd-r offset coords (q,r) -> (x,y)
x, y = axial_to_cartesian(q, -r, size=2/3, orientation='pointytop')
counties_df = pd.DataFrame({'q': q, 'r': -r, 'abbrev': counties[counties!=''], 'x': x, 'y': y })
counties_ds = ColumnDataSource(ColumnDataSource.from_df(counties_df)) # ({'q': q, 'r': r, 'abbrev': counties[counties != '']})
p = figure(match_aspect=True, tools='save,crosshair')
glyph = HexTile(orientation='pointytop', q='q', r='r', size=2/3, fill_color='#f6f699', line_color='black') # q,r,size,scale=??!?!!?
p.add_glyph(counties_ds, glyph)
p.x_range = Range1d(-2,6)
p.y_range = Range1d(-1,8)
p.xaxis.minor_tick_line_color = None
p.yaxis.minor_tick_line_color = None
p.text(x, y, text=["(%d, %d)" % (q,r) for (q, r) in zip(q, r)],
       text_baseline="middle", text_align="center")
show(p)

How to change color of drawn points on pyGal line chart

I am creating a line graph using pygal by passing in an array of numbers to be graphed. I am wishing for the points marked on the graph to change color when they are in/outside of a certain range. I.e. If there is a point logged over 40, color it red, if there is a point logged under 20, color it blue.
There does not seem to be an easy way to loop through the array and draw a single point.
The graph is being made with the following code:
import pygal
from pygal.style import Style

customStyle = Style(colors=["#000000"])
chart = pygal.Line(style=customStyle)
chart.title = 'Browser usage evolution (in %)'
chart.x_labels = recordedDates
chart.add('Humidity', recordedHumidity)
chart.render_to_png("out.png")
I would like to have all points above 40 red and below 20 blue.
You can replace a number in the array with a dict that tells Pygal how to render the data point. This dict must contain the key value, which is the number you would have passed, alongside any customisation options you want to use. The list of available options is provided on the value configuration page of the docs, but the one you need here is color.
You can simply iterate over your existing array, creating a dictionary where color is set appropriately for the value:
data = []
for v in recordedHumidity:
    if v > 40:
        data.append({"value": v, "color": "red"})
    elif v < 20:
        data.append({"value": v, "color": "blue"})
    else:
        data.append(v)
You can then pass the newly created array when adding the series:
import pygal
from pygal.style import Style

customStyle = Style(colors=["#000000"])
chart = pygal.Line(style=customStyle)
chart.x_labels = recordedDates
chart.add('Humidity', data)
chart.render_to_png("out.png")
You might also want to look at the chart configuration and series configuration pages in the docs to see how to customise other aspects of the chart, such as the size of the markers.

How to make the confidence interval (error bands) show on seaborn lineplot

I'm trying to create a plot of classification accuracy for three ML models, depending on the number of features used from the data (the number of features used is from 1 to 75, ranked according to a feature selection method). I did 100 iterations of calculating the accuracy output for each model and for each "# of features used". Below is what my data looks like (clsf from 0 to 2, timepoint from 1 to 75):
data
I am then calling the seaborn function as shown in documentation files.
sns.lineplot(x= "timepoint", y="acc", hue="clsf", data=ttest_df, ci= "sd", err_style = "band")
The plot comes out like this:
plot
I wanted there to be confidence intervals for each point on the x-axis, and don't know why it is not working. I have 100 y values for each x value, so I don't see why it cannot calculate/show it.
You could try your data set using Seaborn's pointplot function instead. It's specifically for showing an indication of uncertainty around a scatter plot of points. By default pointplot will connect values by a line. This is fine if the categorical variable is ordinal in nature, but it can be a good idea to remove the line via linestyles = "" for nominal data. (I used join = False in my example)
I tried to recreate your notebook to give a visual, but wasn't able to get the confidence interval in my plot exactly as you describe. I hope this is helpful for you.
import seaborn as sb

sb.set(style="darkgrid")
sb.pointplot(x='timepoint', y='acc', hue='clsf',
             data=ttest_df, ci='sd', palette='magma',
             join=False)
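As a reference point for what ci='sd' is expected to draw, the band at each x value is the mean plus and minus a standard deviation of the y values at that x. The sketch below computes those numbers by hand in plain Python. The rows are made up, and using the sample standard deviation is an assumption about the convention, so compare the output against your own data before drawing conclusions.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up (timepoint, acc) rows standing in for one classifier's repeated runs.
rows = [(1, 0.60), (1, 0.70), (1, 0.80), (2, 0.75), (2, 0.80), (2, 0.85)]

# Group the accuracy values by their x value.
by_timepoint = defaultdict(list)
for timepoint, acc in rows:
    by_timepoint[timepoint].append(acc)

# For each x value: lower band edge, center line (mean), upper band edge.
bands = {}
for timepoint, accs in sorted(by_timepoint.items()):
    center = mean(accs)
    spread = stdev(accs)
    bands[timepoint] = (center - spread, center, center + spread)
```

If the hand-computed band turns out very narrow compared with the y-axis range, it may simply be hard to see in the plot.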
