Altair slider transform data - python

I've been enjoying using Altair for a couple of weeks now, but I'm stuck on a problem. I've been trying to make a simple plot of average temperature vs. month, using a slider widget to filter through the years. I can get the plot to work, but as soon as I add the slider it doesn't show any data. I tried using just the selection, but that didn't work either. I just don't know how to handle the transform step. I used the US Population Over Time example as a guide.
import altair as alt
from altair.expr import datum, if_
alt.renderers.enable('notebook')
path = 'https://raw.githubusercontent.com/SpiritR/datpr6754/master/prtas_1901_2015.csv'
slider = alt.binding_range(min=1900, max=2020, step=10)
year = alt.selection_single(name="year", fields=['Year'], bind=slider)
alt.Chart(path).mark_bar().encode(
alt.X('Month_Name:O'),
alt.Y('tas:Q', scale=alt.Scale(domain=(20, 28))),
).properties(
width=900,
height=300,
).add_selection(
year
).transform_calculate(
????
).transform_filter(
year.ref()
)

The CSV data are being parsed as strings rather than numbers. When you use the slider to select a year (say 1959), it filters the data to the rows whose value equals that number, and since the data are strings, "1959" != 1959 and the resulting subset is empty.
You can force the column to be parsed as a number, and then the slider will work correctly. For example:
import altair as alt
alt.renderers.enable('notebook')
path = 'https://raw.githubusercontent.com/SpiritR/datpr6754/master/prtas_1901_2015.csv'
data = alt.UrlData(url=path, format=alt.CsvDataFormat(parse={'Year': 'number'}))
slider = alt.binding_range(min=1901, max=2015, step=1)
year = alt.selection_single(name="year", fields=['Year'], bind=slider)
alt.Chart(data).mark_bar().encode(
alt.X('Month_Name:O'),
alt.Y('tas:Q', scale=alt.Scale(domain=(20, 28))),
).properties(
width=900,
height=300,
).add_selection(
year
).transform_filter(
year
)
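To illustrate why the unparsed column never matches the slider value, here is a small standard-library sketch (not Altair code). The sample rows are made up and only mimic the CSV's column names; Python's csv module also yields every field as a string, so it reproduces the same kind of mismatch.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV (values are made up).
sample = "Year,Month_Name,tas\n1959,Jan,21.5\n1960,Jan,22.0\n"

rows = list(csv.DictReader(io.StringIO(sample)))

slider_value = 1959  # a numeric slider emits a number, not a string

# Without parsing, every field is a string, so the equality filter matches nothing.
unparsed_matches = [r for r in rows if r["Year"] == slider_value]

# Converting the column to a number first makes the comparison succeed.
parsed_matches = [r for r in rows if int(r["Year"]) == slider_value]

print(len(unparsed_matches), len(parsed_matches))
```

The parse={'Year': 'number'} option in the answer plays the role of the int() conversion here.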

Related

Add formatting, surrounding box to Altair vertical line tooltip label?

I am new to Altair, and am attempting to plot a monthly time-series variable, and have a vertical line tooltip display the date and corresponding y-value.
The code I have (warning, probably a bit ugly) gets me most of the way there:
import altair as alt
import datetime as dt
import numpy as np
import pandas as pd
# create DataFrame
monthly_dates = pd.date_range('1997-09-01', '2022-08-01', freq = 'M')
monthly_data = pd.DataFrame(
index=['Date', 'y_var'],
data=[monthly_dates, np.random.normal(size = len(monthly_dates))]
).T
# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=['Date'], empty='none')
# The basic line
line = alt.Chart(monthly_data).mark_line().encode(
x='Date:T',
y=alt.Y('y_var', title='Y variable')
)
# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(monthly_data).mark_point().encode(
x='Date',
opacity=alt.value(0),
).add_selection(
nearest
)
# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)
# Draw text labels near the points, and highlight based on selection
text_x = line.mark_text(align='left', dx=5, dy=-10).encode(
text=alt.condition(nearest, 'Date', alt.value(' '))
)
# Draw text labels near the points, and highlight based on selection
text_y = line.mark_text(align='left', dx=5, dy=5).encode(
text=alt.condition(nearest, 'y_var', alt.value(' '))
).transform_calculate(label='datum.y_var + "%"')
# Draw a rule at the location of the selection
rules = alt.Chart(monthly_data).mark_rule(color='gray').encode(
x='Date',
).transform_filter(
nearest
)
# Put the six layers into a chart and bind the data
chart = alt.layer(
line, selectors, points, rules, text_x, text_y
).properties(
width=600, height=300
).interactive()
chart.show()
yields the following interactive chart:
There are two things I need to do, though:
1. Add a box around the tooltip labels (and a plain background to this box), so that they are easy to read.
2. Format the labels independently: since we have monthly data, it would be great to drop the day and just have Oct 2008 or 2008-10 or something along those lines. For the value, rounding to one or two digits and adding '%' afterwards would be great. I tried using the example found here (as you can see for creating text_y) but to no avail.
Any and all help would be greatly appreciated. Apologies in advance for any dumb mistakes or poor coding practices; again, I am still learning the basics of Altair.
Update: I figured both out.
The solutions to both 1 and 2 are in the code below.
For 1: instead of trying to add a box around the text manually, I instead added tooltips to the selectors object and dropped the text_x and text_y entirely.
For 2: I used transform_calculate to create new fields for x_label and y_label that are exactly what I want to display, then feed these into the tooltip objects. This page has tons of ways to transform data.
selectors = alt.Chart(monthly_data).mark_point().transform_calculate(
x_label='timeFormat(datum.Date, "%b %Y")',
y_label='format(datum.y_var, ".1f") + "%"'
).encode(
x='Date',
opacity=alt.value(0),
tooltip=[
alt.Tooltip('x_label:N', title='Date'),
alt.Tooltip('y_label:N', title='Pct. Change')
]
).add_selection(
nearest
)
The finished product:
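As a side check on the two format strings used in transform_calculate: Vega expressions follow the d3-format and d3-time-format conventions, which closely resemble Python's format() and strftime(). A rough Python analogue, with made-up sample values rather than the real monthly_data, shows what the labels should look like:

```python
from datetime import datetime

# Hypothetical sample values; the real chart reads these from monthly_data.
date = datetime(2008, 10, 31)
y_var = 1.2345

# Analogous to timeFormat(datum.Date, "%b %Y") in the Vega expression.
x_label = date.strftime("%b %Y")

# Analogous to format(datum.y_var, ".1f") + "%" in the Vega expression.
y_label = format(y_var, ".1f") + "%"

print(x_label, y_label)
```

This is only an approximation of the Vega behaviour; month abbreviations in particular depend on the locale in both environments.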

Fix scale bottom colour on 0 in altair

I am generating a waffle plot (github-like activity heatmap) in the following way:
import altair as alt
import pandas as pd
# Import data
df = pd.read_csv("https://pastebin.com/raw/AzwJ0va4")
# Year interactive dropdown
years = list(df["year"].unique())
year_dropdown = alt.binding_select(options=years)
selection = alt.selection_single(
fields=["year"], bind=year_dropdown, name="Year", init={"year": 2020}
)
# Plot
(
alt.Chart(df)
.mark_rect()
.encode(
x=alt.X("week:O", title="Week"),
y=alt.Y("day(committed_on):O", title=""),
color=alt.Color(
"hash:Q", scale=alt.Scale(range=["transparent", "green"]), title="Commits"
),
tooltip=[
alt.Tooltip("committed_on", title="Date"),
alt.Tooltip("day(committed_on)", title="Day"),
alt.Tooltip("hash", title="Commits"),
],
)
.add_selection(selection)
.transform_filter(selection)
.properties(width=1000, height=200)
)
The resulting plot behaves 99% as I would expect, but when I select a year with no activity (hash column populated with 0), such as 2017, the plot is filled with green squares, because 0 is anchored exactly in the middle of the scale.
How can I make sure that 0 is always placed at the bottom of the scale? (transparent color)
You can set the domain of the color scale the same way you set it for an axis: scale=alt.Scale(range=["transparent", "green"], domain=[0, 16]). It is possible to set just domainMin in newer versions of Vega-Lite, but not yet in Altair. In your case it is probably a good idea to set both min and max anyway, so that colors are interpreted the same way for all years.
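If you would rather not hardcode the upper bound, one option is to compute a single domain over the whole dataset before filtering. A minimal plain-Python sketch, assuming made-up records that only mimic the year and hash columns:

```python
# Hypothetical per-day commit counts across several years (values are made up).
records = [
    {"year": 2017, "hash": 0},
    {"year": 2017, "hash": 0},
    {"year": 2019, "hash": 4},
    {"year": 2020, "hash": 16},
]

# One domain computed over all years, so filtering by year cannot shift it.
domain = [0, max(r["hash"] for r in records)]

print(domain)
```

The resulting list could then be passed as domain=domain in the alt.Scale(...) call shown above, so that 0 always sits at the transparent end of the range.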

Python Bokeh: Issues with fill_color. Choropleth map fills with grey except values of zero. Possibly a range issue?

I am very new to using Python and especially new to using the Bokeh library. I am trying to plot a choropleth map of the United States with the fill color of each state corresponding to its bee population for a given year.
It shows the value when you hover over it, but only the states with a value of zero have color.
Link to an image of the output plot is here.
I know there is a big difference in the range (minimum:0, maximum: 310,000) which I believe is causing the problem. How can I change the range of the color map to not fill all of the higher values with grey?
Code for reference below:
from bokeh.models import LogColorMapper
from bokeh.plotting import figure, show
from bokeh.palettes import YlGnBu9 as YlGnBu
from bokeh.sampledata.us_states import data as us_states
import pandas as pd
import numpy as np
bee_pop = pd.read_csv('./BeePopulation.csv')
us_states_df = pd.DataFrame(us_states).T
us_states_df = us_states_df[~us_states_df["name"].isin(['Alaska', "Hawaii", "District of Columbia"])]
us_states_df["lons"] = us_states_df.lons.values.tolist()
us_states_df["lats"] = us_states_df.lats.values.tolist()
us_states_df = us_states_df.reset_index()
bee_2016 = bee_pop[bee_pop['Year']==2016]
us_states_df = us_states_df.merge(bee_2016[["State", "Pop"]], how="left", left_on="index",
right_on="State")
us_states_df.head()
us_states_datasource = {}
us_states_datasource["lons"] = us_states_df.lons.values.tolist()
us_states_datasource["lats"] = us_states_df.lats.values.tolist()
us_states_datasource["name"] = us_states_df.name.values.tolist()
us_states_datasource["BeePop"] = us_states_df.Pop.values.tolist()
fig = figure(plot_width=900, plot_height=600,
title="United Bee Population Per State Choropleth Map",
x_axis_location=None, y_axis_location=None,
tooltips=[
("Name", "@name"), ("Bee Population", "@BeePop")
])
fig.grid.grid_line_color = None
fig.patches("lons", "lats", source=us_states_datasource,
fill_color={'field': 'BeePop', 'transform': LogColorMapper(palette=YlGnBu[::-1])},
fill_alpha=0.7, line_color="white", line_width=0.5)
show(fig)
Thank you in advance!
The LogColorMapper has configurable high and low properties. Another option, of course, is to use a different color mapper, e.g. LinearColorMapper or CategoricalColorMapper in conjunction with some categorical binning.
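For the categorical-binning route, a minimal sketch of the binning step using only the standard library might look like this. The bin edges, labels, and population values are all hypothetical and would need to be chosen to suit the real data.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; pick ones that suit the real data.
edges = [1, 1_000, 10_000, 100_000]
labels = ["0", "1-999", "1k-10k", "10k-100k", "100k+"]

def bin_label(population):
    """Return the label of the bin that a population value falls into."""
    return labels[bisect_right(edges, population)]

populations = [0, 850, 42_000, 310_000]  # made-up values
binned = [bin_label(p) for p in populations]

print(binned)
```

The binned labels could then be stored as an extra column in the data source and mapped with a CategoricalColorMapper whose factors are the labels and whose palette has one color per label.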

Adding legend to layered chart in altair

Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend showing that the blue line is the maximum temperature and the red line is the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that Altair only adds a legend if you set color, size, or a similar channel in the encoding, usually with a categorical column. That is not possible here because I'm plotting whole columns, and the label should be the column name (which is currently shown in the y-axis label).
I would do a fold transform such that the variables could be encoded correctly.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
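To make it clearer what the fold transform does to the data, here is a rough plain-Python imitation. The fold helper and the sample rows are hypothetical and only approximate the behaviour: each input row is turned into one output row per folded column, with the column name in 'variable' and its value in 'value'.

```python
# Hypothetical wide-form rows resembling the weather data (values are made up).
wide = [
    {"date": "2012-01-01", "temp_max": 12.8, "temp_min": 5.0},
    {"date": "2012-01-02", "temp_max": 10.6, "temp_min": 2.8},
]

def fold(rows, fields, as_=("key", "value")):
    """Emit one output row per (input row, folded field) pair."""
    key_name, value_name = as_
    return [
        {**row, key_name: field, value_name: row[field]}
        for row in rows
        for field in fields
    ]

long_rows = fold(wide, ["temp_max", "temp_min"], as_=("variable", "value"))

for row in long_rows:
    print(row["date"], row["variable"], row["value"])
```

Once the data is in this long form, 'variable' is an ordinary categorical column, which is why color='variable:N' produces a legend.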
If you layer two charts with the same columns and color both by the same column, the legend will appear. I don't know if this helps, but here goes.
For example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
and I charted both with color='Type' and called alt.layer(chart1, chart2), and it showed me a proper legend.

Invert axis direction Altair

For some reason, the Y-axis seems to be inverted when plotting with Altair (I would expect values to go from lower at the bottom to higher at the top of the plot). Also, I would like to be able to change the tick frequency. With older versions I could use ticks=n_ticks, but it seems this argument can now only take a boolean.
import altair as alt
import pandas as pd
alt.renderers.enable('notebook')
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1)),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
And the figure:
I don't have your data, so difficult to write working code.
But here's an example of an inverted scale with additional ticks, similar to the scatter-with-tooltips gallery example. See here for it in the Vega editor.
import altair as alt
from vega_datasets import data
iris = data.iris()
alt.Chart(iris).mark_point().encode(
x='petalWidth',
y=alt.Y('petalLength', scale=alt.Scale(domain=[7,0]), axis=alt.Axis(tickCount=100)),
color='species'
).interactive()
This might work with your data:
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1, domain=[17,1])),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
However, I think it's possible that you might just have the wrong type on your efficiency variable. You could try replacing 'Efficiency:N' with 'Efficiency:Q', and that might do it.
While it's possible to reverse the domain manually, that requires hardcoding the bounds.
Instead we can just pass Scale(reverse=True) to the axis encoding, e.g.:
import altair as alt
from vega_datasets import data
alt.Chart(data.wheat().head()).mark_bar().encode(
x='wheat:Q',
y=alt.Y('year:O', scale=alt.Scale(reverse=True)),
)
Here it's been passed to alt.Y, so the years are inverted (left) vs the default y='year:O' (right):
