Here is the current code for my visualization and the chart it produces:
base = alt.Chart(cs_data).mark_bar().encode(x=alt.X("PROGRAM:N", axis=alt.Axis(title='University/Credit Level', labels=False)),
y=alt.Y('MD_EARN_WNE:Q', axis=alt.Axis(title='Median Graduate Salary')),
).properties(
width=480,
height=320
)
credit_labels = base.mark_text(align='left', baseline='middle', angle=270, dx=3, color='black').encode(
text='CREDDESC:O'
)
chart = base.mark_bar().encode(
color=alt.Color("INSTNM", title="University")
)
final = alt.layer(chart, credit_labels, data=cs_data)
final
https://i.stack.imgur.com/pvFcC.png
As you can see, the text seems to be misaligned for the two orange bars. They are actually aligned to where the bar should end, but an extra bit gets added to the bar.
If I remove the color encoding this goes away:
base = alt.Chart(cs_data).mark_bar().encode(x=alt.X("PROGRAM:N", axis=alt.Axis(title='University/Credit Level', labels=False)),
y=alt.Y('MD_EARN_WNE:Q', axis=alt.Axis(title='Median Graduate Salary')),
).properties(
width=480,
height=320
)
credit_labels = base.mark_text(align='left', baseline='middle', angle=270, dx=3, color='black').encode(
text='CREDDESC:O'
)
chart = base.mark_bar().encode(
)
final = alt.layer(chart, credit_labels, data=cs_data)
final
https://i.stack.imgur.com/MNVBY.png
What's going on here?
It seems you are overplotting these two bars: more than one row shares the same x-value. With the color encoding the bars get stacked; without it they are simply overlaid on each other. You can also see that the text is bolder, because two labels are overlapping.
A fix would be to use y=alt.Y('sum(MD_EARN_WNE):Q') to sum the overlapping bars.
It would be good to look at the data and find the dataframe column that causes this grouping, so you understand exactly what you are plotting.
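One way to look for the rows behind the overplotting is to check which x-values occur more than once. The sketch below uses a small made-up frame that only borrows the column names from the question; the values are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for cs_data; only the column names come from the question.
cs_data = pd.DataFrame({
    "INSTNM": ["Univ A", "Univ A", "Univ B", "Univ B"],
    "PROGRAM": ["A-Bachelors", "A-Masters", "B-Bachelors", "B-Bachelors"],
    "CREDDESC": ["Bachelors", "Masters", "Bachelors", "Bachelors"],
    "MD_EARN_WNE": [60000, 80000, 55000, 5000],
})

# Rows that share an x-value are drawn at the same bar position.
dupes = cs_data[cs_data.duplicated("PROGRAM", keep=False)]
print(dupes)

# What a sum() aggregate would plot for each bar position.
totals = cs_data.groupby("PROGRAM")["MD_EARN_WNE"].sum()
print(totals)
```

If dupes is non-empty, those rows are the ones being stacked or overlaid, and totals shows the bar heights that the sum() aggregate would produce.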
I am new to Altair, and am attempting to plot a monthly time-series variable, and have a vertical line tooltip display the date and corresponding y-value.
The code I have (warning, probably a bit ugly) gets me most of the way there:
import altair as alt
import datetime as dt
import numpy as np
import pandas as pd
# create DataFrame
monthly_dates = pd.date_range('1997-09-01', '2022-08-01', freq = 'M')
monthly_data = pd.DataFrame(
index=['Date', 'y_var'],
data=[monthly_dates, np.random.normal(size = len(monthly_dates))]
).T
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['Date'], empty='none')
# The basic line
line = alt.Chart(monthly_data).mark_line().encode(
x='Date:T',
y=alt.Y('y_var', title='Y variable')
)
# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(monthly_data).mark_point().encode(
x='Date',
opacity=alt.value(0),
).add_selection(
nearest
)
# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)
# Draw text labels near the points, and highlight based on selection
text_x = line.mark_text(align='left', dx=5, dy=-10).encode(
text=alt.condition(nearest, 'Date', alt.value(' '))
)
# Draw text labels near the points, and highlight based on selection
text_y = line.mark_text(align='left', dx=5, dy=5).encode(
text=alt.condition(nearest, 'y_var', alt.value(' '))
).transform_calculate(label='datum.y_var + "%"')
# Draw a rule at the location of the selection
rules = alt.Chart(monthly_data).mark_rule(color='gray').encode(
x='Date',
).transform_filter(
nearest
)
# Put the six layers into a chart and bind the data
chart = alt.layer(
line, selectors, points, rules, text_x, text_y
).properties(
width=600, height=300
).interactive()
chart.show()
yields the following interactive chart:
There are two things I need to do, though:
1. Add a box around the tooltip labels (and a plain background to this box), so that they are easy to read.
2. Format the labels independently: since we have monthly data, it would be great to drop the day and just have Oct 2008 or 2008-10 or something along those lines. For the value, rounding to one or two digits and adding '%' afterwards would be great. I tried using the example found here (as you can see for creating text_y) but to no avail.
Any and all help would be greatly appreciated. Apologies in advance for any dumb mistakes or poor coding practices; again, I am still learning the basics of Altair.
Update: I figured both out.
The solutions to both 1 and 2 are in the code below.
For 1: instead of trying to add a box around the text manually, I instead added tooltips to the selectors object and dropped the text_x and text_y entirely.
For 2: I used transform_calculate to create new fields for x_label and y_label that are exactly what I want to display, then feed these into the tooltip objects. This page has tons of ways to transform data.
selectors = alt.Chart(monthly_data).mark_point().transform_calculate(
x_label='timeFormat(datum.Date, "%b %Y")',
y_label='format(datum.y_var, ".1f") + "%"'
).encode(
x='Date',
opacity=alt.value(0),
tooltip=[
alt.Tooltip('x_label:N', title='Date'),
alt.Tooltip('y_label:N', title='Pct. Change')
]
).add_selection(
nearest
)
The finished product:
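If the labels do not need to be computed in the browser, the same two fields could also be prepared in pandas before charting and then passed to alt.Tooltip unchanged. This is a minimal sketch, assuming a datetime Date column and a numeric y_var column; the sample values are invented.

```python
import pandas as pd

# Month-end dates as in the question; the y values here are made up.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-09-30", "2008-10-31"]),
    "y_var": [1.234, -0.56],
})

# Drop the day from the date and round the value to one decimal with a '%' suffix.
df["x_label"] = df["Date"].dt.strftime("%b %Y")
df["y_label"] = df["y_var"].map("{:.1f}%".format)
print(df[["x_label", "y_label"]])
```

The trade-off is that the label columns travel with the data, whereas transform_calculate keeps the formatting inside the chart specification.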
I am visualizing some data and need to remove the white gaps between the red and blue cells (they can be seen clearly in the picture).
The red represents 'no', while the blue, 'yes'. The code snippet is:
fig = px.bar(
data,
x = policy_type,
y = 'State',
color = policy_type,
title = title
)
This is mentioned in the plotly documentation. You can use histogram to avoid the striped look. It has a similar call signature to bar:
fig = px.histogram(
data,
x = policy_type,
y = 'State',
color = policy_type,
barmode = 'group',
title = title
)
I am generating a waffle plot (github-like activity heatmap) in the following way:
import altair as alt
import pandas as pd
# Import data
df = pd.read_csv("https://pastebin.com/raw/AzwJ0va4")
# Year interactive dropdown
years = list(df["year"].unique())
year_dropdown = alt.binding_select(options=years)
selection = alt.selection_single(
fields=["year"], bind=year_dropdown, name="Year", init={"year": 2020}
)
# Plot
(
alt.Chart(df)
.mark_rect()
.encode(
x=alt.X("week:O", title="Week"),
y=alt.Y("day(committed_on):O", title=""),
color=alt.Color(
"hash:Q", scale=alt.Scale(range=["transparent", "green"]), title="Commits"
),
tooltip=[
alt.Tooltip("committed_on", title="Date"),
alt.Tooltip("day(committed_on)", title="Day"),
alt.Tooltip("hash", title="Commits"),
],
)
.add_selection(selection)
.transform_filter(selection)
.properties(width=1000, height=200)
)
The resulting plot behaves 99% as I would expect, but when I select a year with no activity (hash column is all 0), such as 2017, the plot is filled with green squares, because 0 is anchored exactly in the middle of the scale.
How can I make sure that 0 is always placed at the bottom of the scale? (transparent color)
You can set the domain of the color scale the same way you would for an axis: scale=alt.Scale(range=["transparent", "green"], domain=[0, 16]). Newer versions of Vega-Lite allow setting just domainMin, but Altair does not support this yet. In your case it is probably a good idea to set both min and max anyway, so that colors are interpreted the same way for all years.
I'm trying to use mark_text to create a stacked text in a stacked bar chart. I would like to label each bar with the value of 'Time'. Is it possible to have text marks in the corresponding stack of a stacked area chart?
Here's how I create bar & text chart:
bar = alt.Chart(df_pivot, title = {'text' :'How do people spend their time?', 'subtitle' : 'Average of minutes per day from time-use diaries for people between 15 and 64'}).mark_bar().transform_calculate(
filtered="datum.Category == 'Paid work'"
).transform_joinaggregate(sort_val="sum(filtered)", groupby=["Country"]
).encode(
x=alt.X('Time', stack='zero'),
y=alt.Y('Country', sort=alt.SortField('sort_val', order='descending')),
color=alt.Color('Category:N', sort=CatOrder),
order=alt.Order('color_Category_sort_index:Q'),
tooltip=['Country', 'Category', 'Time']
).interactive()
bar
text = alt.Chart(df_pivot).mark_text(align='center', baseline='middle', color='black').transform_calculate(
filtered="datum.Category == 'Paid work'"
).transform_joinaggregate(sort_val="sum(filtered)", groupby=["Country"]
).encode(
x=alt.X('Time:Q', stack='zero'),
y=alt.Y('Country', sort=alt.SortField('sort_val', order='descending')),
detail='Category:N',
text=alt.Text('Time:Q', format='.0f')
)
bar + text
Issue:
The text is not in its proper stack, and the order of the text is also wrong.
The Y sorting is reset, so the countries are no longer sorted as expected.
I don't understand why I have these issues. I'm new to this platform; the source code is in my notebook: https://www.kaggle.com/interphuoc0101/times-use. Thanks a lot.
Your bar chart specifies a stack order:
order=alt.Order('color_Category_sort_index:Q'),
You should add a matching order encoding to your text layer to ensure the text appears in the same order.
Here is an example of how you can use order in both charts:
import altair as alt
from vega_datasets import data
source = data.barley()
bars = alt.Chart(source).mark_bar().encode(
x=alt.X('sum(yield):Q', stack='zero'),
y=alt.Y('variety:N'),
color=alt.Color('site'),
# barley has no 'Category' field, so order by the color field itself
order=alt.Order('site:N'),
)
text = alt.Chart(source).mark_text(dx=-15, dy=3, color='white').encode(
x=alt.X('sum(yield):Q', stack='zero'),
y=alt.Y('variety:N'),
detail='site:N',
text=alt.Text('sum(yield):Q', format='.1f'),
# same order encoding as the bars, so the labels follow the segments
order=alt.Order('site:N')
)
bars + text
For some reason, the Y-axis seems to be inverted when plotting with Altair (I would expect values to go from lower at the bottom to higher at the top of the plot). Also, I would like to be able to change the tick frequency. With older versions I could use ticks=n_ticks, but it seems this argument now only accepts a boolean.
import altair as alt
import pandas as pd
alt.renderers.enable('notebook')
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1)),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
And the figure:
I don't have your data, so it is difficult to write working code.
But here's an example of an inverted scale with additional ticks, based on the scatter plot with tooltips example. See here for it in the vega editor.
import altair as alt
from vega_datasets import data
iris = data.iris()
alt.Chart(iris).mark_point().encode(
x='petalWidth',
y=alt.Y('petalLength', scale=alt.Scale(domain=[7,0]), axis=alt.Axis(tickCount=100)),
color='species'
).interactive()
This might work with your data:
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1, domain=[17,1])),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
However, I think it's possible that you might just have the wrong type on your efficiency variable. You could try replacing 'Efficiency:N' with 'Efficiency:Q', and that might do it.
While it's possible to reverse the domain manually, that requires hardcoding the bounds.
Instead we can just pass Scale(reverse=True) to the axis encoding, e.g.:
import altair as alt
from vega_datasets import data
alt.Chart(data.wheat().head()).mark_bar().encode(
x='wheat:Q',
y=alt.Y('year:O', scale=alt.Scale(reverse=True)),
)
Here it's been passed to alt.Y, so the years are inverted (left) vs the default y='year:O' (right):