I am attempting to create two layered histograms in Altair, each with a vertical rule marking its mean. I would like a legend that labels each of these four marks.
I am using the first 'Birth weight I' dataset, which can be found here.
My code (rather long, apologies) looks something like this:
import altair as alt
from altair import datum
# This histogram for baby weights of mothers who don't smoke
dont = alt.Chart(babyData).mark_bar().encode(
alt.X("bwt-oz:Q", axis=alt.Axis(title='Birth Weight (Ounces)'), bin=True),
alt.Y('count()', axis=alt.Axis(title='Count'), scale=alt.Scale(domain=[0, 350]))
).properties(
width=400,
height=400
).transform_filter(
datum.smoke == 0,
)
mean = alt.Chart(babyData).mark_rule(color='red').encode(
x='mean(bwt-oz):Q',
size=alt.value(4)
).transform_filter(
datum.smoke == 0
)
dontSmokeChart = dont + mean
# This histogram for baby weights of mothers who smoke
do = alt.Chart(babyData).mark_bar().encode(
alt.X("bwt-oz:Q", axis=alt.Axis(title='Birth Weight (Ounces)'), bin=True),
alt.Y('count()', axis=alt.Axis(title='Count'), scale=alt.Scale(domain=[0, 350]))
).transform_filter(
datum.smoke == 1
).properties(
width=400,
height=400
)
mean2 = alt.Chart(babyData).mark_rule(color='red').encode(
x='mean(bwt-oz):Q',
size=alt.value(4)
).transform_filter(
datum.smoke == 1
)
doSmokeChart = do + mean2
# This layers, and puts them all together
layer = alt.layer(
dont,
mean,
do,
mean2
).properties(
title="Layered Histogram of Baby Weights of Mothers Who smoke Vs. Who Don't",
).configure_mark(
opacity=0.5,
color='blue',
)
layer
The final layered chart looks something like this:
I would simply like a legend specifying which histogram/mean belongs to what.
If I could color them too, and perhaps add a legend that way, that would be nice as well, but I am unsure how to do so.
Thanks for any insight!
Rather than manually creating layers with filtered data, you should use a color encoding on your full dataset: then a legend will be generated automatically.
For example:
import altair as alt
import pandas as pd
babyData = pd.read_csv('https://www.stat.berkeley.edu/users/statlabs/data/babiesI.data', sep=r'\s+')
base = alt.Chart(babyData).transform_filter(
'datum.smoke != 9'
)
hist = base.mark_bar(opacity=0.5).encode(
alt.X("bwt:Q",title='Birth Weight (Ounces)', bin=True),
alt.Y('count()', title='Count'),
color='smoke:N'
).properties(
width=400,
height=400
)
mean = base.mark_rule().encode(
x='mean(bwt):Q',
size=alt.value(4),
color='smoke:N'
)
hist + mean
From there you could use standard approaches to customize the color schemes used for each mark.
@jakevdp just beat me to it! I was going to say the same thing. Here is a full example for you to work with.
import pandas as pd
import altair as alt
# Link to data source
URL = 'https://www.stat.berkeley.edu/users/statlabs/data/babiesI.data'
# Read data into a pandas dataframe
df = pd.read_table(URL, sep=r'\s+')
hist = alt.Chart(df).mark_area(
opacity=0.7,
interpolate='step'
).encode(
alt.X("bwt:Q", axis=alt.Axis(title='Birth Weight (Ounces)'), bin=True),
alt.Y('count()', axis=alt.Axis(title='Count'), stack=None),
alt.Color('smoke:N')
).properties(
width=400,
height=400
).transform_filter(alt.datum.smoke != 9)
rule = alt.Chart(df).mark_rule(color='red').encode(
alt.Detail('smoke:N'),
alt.Color('smoke:N'),
alt.X('mean(bwt):Q'),
size=alt.value(4),
).transform_filter(alt.datum.smoke != 9)
hist + rule
Here is the current code for my visualization and the chart it produces:
base = alt.Chart(cs_data).mark_bar().encode(x=alt.X("PROGRAM:N", axis=alt.Axis(title='University/Credit Level', labels=False)),
y=alt.Y('MD_EARN_WNE:Q', axis=alt.Axis(title='Median Graduate Salary')),
).properties(
width=480,
height=320
)
credit_labels = base.mark_text(align='left', baseline='middle', angle=270, dx=3, color='black').encode(
text='CREDDESC:O'
)
chart = base.mark_bar().encode(
color=alt.Color("INSTNM", title="University")
)
final = alt.layer(chart, credit_labels, data=cs_data)
final
https://i.stack.imgur.com/pvFcC.png
As you can see, the text seems to be misaligned for the two orange bars. They are actually aligned to where the bar should end, but an extra bit gets added to the bar.
If I remove the color encoding this goes away:
base = alt.Chart(cs_data).mark_bar().encode(x=alt.X("PROGRAM:N", axis=alt.Axis(title='University/Credit Level', labels=False)),
y=alt.Y('MD_EARN_WNE:Q', axis=alt.Axis(title='Median Graduate Salary')),
).properties(
width=480,
height=320
)
credit_labels = base.mark_text(align='left', baseline='middle', angle=270, dx=3, color='black').encode(
text='CREDDESC:O'
)
chart = base.mark_bar().encode(
)
final = alt.layer(chart, credit_labels, data=cs_data)
final
https://i.stack.imgur.com/MNVBY.png
What's going on here?
It seems you are overplotting these two bars: more than one row shares the same x value. With the color encoding the bars get stacked; without it they are simply overlaid on each other. You can also see that the text is bolder there, because two labels overlap.
A fix would be to use y=alt.Y('sum(MD_EARN_WNE):Q') to sum the overlapping bars.
It would be good to look at the data and find which dataframe column is causing this grouping, so you understand exactly what you are plotting.
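A quick way to check for this in pandas is to look for rows that share the same x value. The sketch below uses invented rows; only the column names come from the question, and the assumption is that PROGRAM is the field where the duplicates occur.

```python
import pandas as pd

# Made-up rows standing in for cs_data; column names follow the question,
# the values are invented for illustration.
cs_data = pd.DataFrame({
    "INSTNM": ["Univ A", "Univ A", "Univ B", "Univ B"],
    "PROGRAM": ["A-Bachelor", "A-Master", "B-Bachelor", "B-Bachelor"],
    "CREDDESC": ["Bachelor's Degree", "Master's Degree",
                 "Bachelor's Degree", "Bachelor's Degree"],
    "MD_EARN_WNE": [60000, 80000, 55000, 58000],
})

# Rows that share an x value are drawn at the same bar position.
dupes = cs_data[cs_data.duplicated("PROGRAM", keep=False)]
print(dupes)

# Number of rows per PROGRAM value; anything above 1 is overplotted.
rows_per_program = cs_data.groupby("PROGRAM").size()
print(rows_per_program[rows_per_program > 1])
```

If the duplicated rows turn out to differ in some other column, that column is probably what should be added to the encoding or aggregated over.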
I want to make a parallel coordinates plot with multiple y-axes. I've found how to do it in Vega-Lite here, but I haven't found a way to do it with Altair; there's only a very simple example where all the y-axes are the same.
Is there any way to do this plot in altair?
Note that this kind of chart is not "built-in" to Altair or Vega-Lite, so the only way to create it is with a manual sequence of transforms, and manually constructing your axes from tick and text marks.
Here is an Altair version of the chart in the answer you linked to:
import altair as alt
from vega_datasets import data
base = alt.Chart(
data.iris.url
).transform_window(
index="count()"
).transform_fold(
["petalLength", "petalWidth", "sepalLength", "sepalWidth"]
).transform_joinaggregate(
min="min(value)",
max="max(value)",
groupby=["key"]
).transform_calculate(
norm_val="(datum.value - datum.min) / (datum.max - datum.min)",
mid="(datum.min + datum.max) / 2"
).properties(width=600, height=300)
lines = base.mark_line(opacity=0.3).encode(
x='key:N',
y=alt.Y('norm_val:Q', axis=None),
color="species:N",
detail="index:N",
tooltip=["petalLength:N", "petalWidth:N", "sepalLength:N", "sepalWidth:N"]
)
rules = base.mark_rule(
color="#ccc", tooltip=None
).encode(
x="key:N",
detail="count():Q",
)
def ytick(yvalue, field):
scale = base.encode(x='key:N', y=alt.value(yvalue), text=f"min({field}):Q")
return alt.layer(
scale.mark_text(baseline="middle", align="right", dx=-5, tooltip=None),
scale.mark_tick(size=8, color="#ccc", orient="horizontal", tooltip=None)
)
alt.layer(
lines, rules, ytick(0, "max"), ytick(150, "mid"), ytick(300, "min")
).configure_axisX(
domain=False, labelAngle=0, tickColor="#ccc", title=None
).configure_view(
stroke=None
)
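To make the transform chain above easier to follow, here is a rough pandas equivalent of what the window, fold, joinaggregate, and calculate steps compute. It is only a sketch on a three-row invented sample with two of the four columns; the variable name folded is hypothetical, and the chart itself does this work inside Vega-Lite rather than in pandas.

```python
import pandas as pd

# Tiny invented sample with iris-like columns (assumption).
df = pd.DataFrame({
    "petalLength": [1.4, 4.7, 6.0],
    "petalWidth": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

# transform_window(index="count()") -> running row number, one per flower
df["index"] = range(1, len(df) + 1)

# transform_fold([...]) -> long format with 'key' and 'value' columns
folded = df.melt(id_vars=["index", "species"], var_name="key", value_name="value")

# transform_joinaggregate(..., groupby=["key"]) -> per-key min and max on every row
grouped = folded.groupby("key")["value"]
folded["min"] = grouped.transform("min")
folded["max"] = grouped.transform("max")

# transform_calculate -> rescale each axis to [0, 1] and compute its midpoint
folded["norm_val"] = (folded["value"] - folded["min"]) / (folded["max"] - folded["min"])
folded["mid"] = (folded["min"] + folded["max"]) / 2

print(folded)
```

Each measurement ends up on a common 0 to 1 scale, which is why the lines can share one hidden y-axis while the tick labels show the per-column min, mid, and max.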
I have a visualization I made in Altair and I want to place borders around each label on the y axis (sorry if I am explaining this incorrectly) to separate them. This is the code I have so far:
alt.Chart(q4df).transform_fold(
rosspaints,
as_=['column', 'value']
).mark_circle().encode(
x = alt.X('column:N', axis=None),
y = alt.Y('TITLE', title=None),
size = alt.Size('value:Q', legend=None),
color=alt.Color('column:N', legend=None,
scale=alt.Scale(
domain=['alizarin crimson','bright red','burnt umber','cadmium yellow','dark sienna',
'indian yellow','indian red','liquid black','liquid clear','black gesso',
'midnight black','phthalo blue','phthalo green','prussian blue','sap green',
'titanium white','van dyke brown','yellow ochre'],
range=['#94261f','#c06341','#614f4b','#f8ed57','#5c2f08','#e6ba25','#cd5c5c',
'#000000','#ffffff','#000000','#36373c','#2a64ad','#215c2c','#325fa3',
'#364e00','#f9f7eb','#2d1a0c','#b28426']))
).properties(
width=400,
height=700
).configure_axis(grid=False, labelFontWeight= 'bold', labelColor='black')
This is my current output:
This is my desired output:
You could either set a gridline for each y-axis tick like this:
import altair as alt
from vega_datasets import data
source = data.barley()
alt.Chart(source).mark_point().encode(
alt.X('yield:Q', axis=alt.Axis(grid=False)),
alt.Y('variety:N', axis=alt.Axis(grid=True)),
color='year:N'
).configure_view(
stroke=None
)
Or use facet according to the same variable you have encoded on the y-axis while resolving the y-scale so that only the y-axis entry with data points shows up in each plot:
alt.Chart(source).mark_point().encode(
alt.X('yield:Q', axis=alt.Axis(grid=False)),
alt.Y('variety:N', title=''),
color='year:N',
).facet(
row=alt.Facet('variety:N', title='', header=alt.Header(labels=False))
).resolve_scale(
y='independent'
)
I am adding a state outline overlay to my choropleth plot in Altair.
My choropleth plot has a tooltip.
When I layer the state outline on top of the choropleth, I lose the tooltip.
Does anyone have ideas on how to handle this?
Any help would be appreciated.
import altair as alt
# saving data into a file rather than embedding into the chart
alt.data_transformers.enable('json')
alt.renderers.enable('notebook')
# alt.renderers.enable('jupyterlab')
from vega_datasets import data
import pandas as pd
from altair import Scale,Color
states = alt.topo_feature(data.us_10m.url, 'states')
counties = alt.topo_feature(data.us_10m.url+'#', 'counties')
dummy='#dbe9f6'
scheme='blues'
type1='linear'
fg = alt.Chart(counties).mark_geoshape(
stroke='black',
strokeWidth=0.05
).project(
type='albersUsa'
).transform_lookup(
lookup='id',
from_=alt.LookupData(fdf, 'fips', ['year','Pill_per_pop','BUYER_COUNTY', 'state'])
).transform_calculate(
Pill_per_pop='isValid(datum.Pill_per_pop) ? datum.Pill_per_pop : -1'
).encode(
color = alt.condition(
'datum.Pill_per_pop > 0',
alt.Color('Pill_per_pop:Q', scale=Scale(scheme=scheme,type=type1)),
alt.value(dummy)
),
tooltip=['BUYER_COUNTY:N', 'state:N','Pill_per_pop:Q','year:Q']
).properties(
width=700,
height=400,
title='Pills per 100k people'
)
outline = alt.Chart(states).mark_geoshape(stroke='black',strokeWidth=0.2).project(
type='albersUsa'
)
fg+outline
My output
However, I am unable to find a way to retain the tooltip of the previous layer, i.e. the county-level map.
I figured it out.
Define a new chart as follows:
fg1 = alt.Chart(counties).mark_geoshape(
stroke='black',
strokeWidth=0.05
).project(
type='albersUsa'
).transform_lookup(
lookup='id',
from_=alt.LookupData(fdf, 'fips', ['year','Pill_per_pop','BUYER_COUNTY', 'state'])
).transform_calculate(
Pill_per_pop='isValid(datum.Pill_per_pop) ? datum.Pill_per_pop : -1'
).encode(
tooltip=['BUYER_COUNTY:N', 'state:N','Pill_per_pop:Q','year:Q']
).properties(
width=700,
height=400,
title='Pills per 100k people'
)
fg+outline+fg1
Just define the chart again with an encoding for the tooltip but not for color, and layer it on top of the chart above.
For some reason, the y-axis of my Altair plot seems to be inverted (I would expect values to go from lower at the bottom to higher at the top of the plot). Also, I would like to be able to change the tick frequency. With older versions I could use ticks=n_ticks, but it seems this argument now accepts only a boolean.
import altair as alt
import pandas as pd
alt.renderers.enable('notebook')
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1)),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
And the figure:
I don't have your data, so it is difficult to write working code.
But here's an example of an inverted scale with additional ticks, based on the scatter-with-tooltips example. See here for it in the Vega editor.
import altair as alt
from vega_datasets import data
iris = data.iris()
alt.Chart(iris).mark_point().encode(
x='petalWidth',
y=alt.Y('petalLength', scale=alt.Scale(domain=[7,0]), axis=alt.Axis(tickCount=100)),
color='species'
).interactive()
This might work with your data:
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1, domain=[17,1])),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
However, I think it's possible that you just have the wrong type on your efficiency variable. You could try replacing 'Efficiency:N' with 'Efficiency:Q', and that might do it.
While it's possible to reverse the domain manually, that requires hardcoding the bounds.
Instead we can just pass Scale(reverse=True) to the axis encoding, e.g.:
from vega_datasets import data
alt.Chart(data.wheat().head()).mark_bar().encode(
x='wheat:Q',
y=alt.Y('year:O', scale=alt.Scale(reverse=True)),
)
Here it's been passed to alt.Y, so the years are inverted (left) vs the default y='year:O' (right):