Direct labeling a line plot with Altair - python

I'm plotting a line graph in Altair (4.1.0) and would like to use direct labeling (annotations) instead of a regular legend.
As such, the text mark for each line (say, time series) should appear only once and at the right-most point of the x-axis (as opposed to this scatter plot example labeling every data point).
While I'm able to use pandas to manipulate the data to get the desired results, I think it would be more elegant to have a pure-Altair implementation, but I can't seem to get it right.
For example, given the following data:
import numpy as np
import pandas as pd
import altair as alt
np.random.seed(10)
n = 5  # number of time points; not defined in the original snippet, so this value is an assumption
time = pd.date_range(start="10/21/2020", end="10/22/2020", periods=n)
data = pd.concat([
pd.DataFrame({
"time": time,
"group": "One",
"value": np.random.normal(10, 2, n)}),
pd.DataFrame({
"time": time,
"group": "Two",
"value": np.random.normal(5, 2, n)}).iloc[:-1]
], ignore_index=True)
I can generate a satisfactory result using pandas to create a subset that includes the last time-point for each group:
lines = alt.Chart(data).mark_line(
point=True
).encode(
x="time:T",
y="value:Q",
color=alt.Color("group:N", legend=None), # Remove legend
)
text_data = data.loc[data.groupby('group')['time'].idxmax()] # Subset the data for text positions
labels = alt.Chart(text_data).mark_text(
# some adjustments
).encode(
x="time:T",
y="value:Q",
color="group:N",
text="group:N"
)
chart = lines + labels
However, if I try to use the main data and add Altair aggregations, for example using x=max(time) or explicit transform_aggregate(), I either get text annotations on all points or none at all (respectively).
Is there a better way to obtain the above result?

You can do this using an argmax aggregate in the y encoding. For example, your labels layer might look like this:
labels = alt.Chart(data).mark_text(
align='left', dx=5
).encode(
x='max(time):T',
y=alt.Y('value:Q', aggregate={'argmax': 'time'}),
text='group:N',
color='group:N',
)
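To see why this yields one label per line: fields that appear in the encoding without an aggregate (here, group) act as grouping keys, and argmax picks, within each group, the row where time is largest and reports its value. The plain-Python sketch below mimics that computation. The rows and the helper name label_positions are illustrative assumptions, not part of Altair.

```python
from datetime import datetime

# Made-up rows in the same long format as the question's data (assumption).
# Group "Two" ends one time step earlier than group "One".
rows = [
    {"time": datetime(2020, 10, 21, 0), "group": "One", "value": 12.7},
    {"time": datetime(2020, 10, 21, 12), "group": "One", "value": 11.4},
    {"time": datetime(2020, 10, 22, 0), "group": "One", "value": 6.9},
    {"time": datetime(2020, 10, 21, 0), "group": "Two", "value": 5.0},
    {"time": datetime(2020, 10, 21, 12), "group": "Two", "value": 4.2},
]


def label_positions(rows):
    """For each group, keep the row with the latest time (argmax over time)."""
    latest = {}
    for row in rows:
        best = latest.get(row["group"])
        if best is None or row["time"] > best["time"]:
            latest[row["group"]] = row
    # x is the group's maximum time, y is the value observed at that time
    return {g: (r["time"], r["value"]) for g, r in latest.items()}


positions = label_positions(rows)
for group, (x, y) in positions.items():
    print(group, x.isoformat(), y)
```

Each group gets its own right-most point, so a line that ends early is still labeled at its own last observation.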

Related

How do I add a secondary legend that explains what symbols mean with plotly express?

I have a plot which uses US states to map symbols. I currently assign symbols using the "state" column in my dataframe so that I can select particular states of interest by clicking or double clicking on the Plotly Express legend. This part is working fine. However, the symbol mapping I'm using also communicates information about territory, e.g. triangle-down means poor coverage in that state and many states will share this symbol. I would like to add another legend that shows what each shape means. How can I do this in Plotly Express? Alternatively, is there a way to display symbols in a footnote? I could also give the symbol definitions there.
The goal is to display that circle=Medium coverage, triangle-down=poor coverage, etc. in addition to the individual state legend I already have. If the legend is clickable such that I can select entire groups based on the symbol shape that would be the best possible outcome.
Thank you for any tips!
I tried using html and footnotes to display the symbols but it did not work.
As noted in the comments, this can be achieved with additional traces on separate axes:
I have simulated some data that matches what is implied by the image and comments.
From the scatter figure, extract how symbols and colors have been assigned to states.
Build another scatter that is effectively a legend.
import pandas as pd
import numpy as np
import plotly.express as px
df_s = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States"
)[1].iloc[:, 0:2]
df_s.columns = ["name", "state"]
# generate a dataframe that matches structure in image and question
df = pd.DataFrame(
{"activity_month": pd.date_range("1-jan-2020", "today", freq="W")}
).assign(
value=lambda d: np.random.uniform(0, 1, len(d)),
state=lambda d: np.random.choice(df_s["state"], len(d)),
)
# straightforward scatter
fig = px.scatter(df, x="activity_month", y="value", symbol="state", color="state")
# extract out how symbols and colors have been assigned to states
df_symbol = pd.DataFrame(
[
{"symbol": t.marker.symbol, "state": t.name, "color": t.marker.color}
for t in fig.data
]
).assign(y=lambda d: d.index//20, x=lambda d: d.index%20)
# build a figure that is effectively the legend
fig_legend = px.scatter(
df_symbol,
x="x",
y="y",
symbol="symbol",
color="state",
text="state",
color_discrete_sequence=df_symbol["color"]
).update_traces(textposition="middle right", showlegend=False, xaxis="x2", yaxis="y2")
# insert legend into scatter and format axes
fig.add_traces(fig_legend.data).update_layout(
yaxis_domain=[.15, 1],
yaxis2={"domain": [0, .15], "matches": None, "visible": False},
xaxis2={"visible":False},
xaxis={"position":0, "anchor":"free"},
showlegend=False
)
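The index//20 and index%20 expressions in the code above lay the legend entries out on a grid that is 20 columns wide: entry i goes to row i // 20 and column i % 20. Here is a minimal plain-Python sketch of that arithmetic. The helper name grid_positions is hypothetical, and a narrower grid is used so the result is easy to read.

```python
def grid_positions(n_items, n_cols):
    """Return (x, y) grid coordinates for n_items laid out row by row."""
    return [(i % n_cols, i // n_cols) for i in range(n_items)]


# Seven legend entries on a grid that is three columns wide.
positions = grid_positions(7, 3)
for i, (x, y) in enumerate(positions):
    print(f"entry {i}: column {x}, row {y}")
```

Changing the column count trades legend width against height, which in the answer would go together with adjusting the share of the figure reserved by yaxis2's domain.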

Plotting multiple graphs by grouping values from a data frame column?

I am rather new to visualizations and appreciate any help!
I am looking at how I could make multiple plots grouping a data frame by one column and then making a plot for each unique value. E.g. for the dataset below, I would like three different plots, one for each location, and I would like to be able to label them individually. I am not quite sure how to do that. The sample data frame:
data = {
"location": ["USA", "USA", "USA", "UK", "UK",
"UK", "World", "World", "World"],
"date": ["21-06-2021", "22-06-2021", "23-06-2021",
"21-06-2021", "22-06-2021", "23-06-2021",
"21-06-2021", "22-06-2021", "23-06-2021"],
"number": [456, 543, 675, 543, 765, 345, 9543, 9543, 9234]
}
import pandas as pd
df = pd.DataFrame (data, columns = ['location','date','number'])
df["date"] = pd.to_datetime(df["date"])
I tried doing this. It gives me three plots, but I don't know how to label and alter the descriptions of the graph individually.
df.groupby("location").plot(x="date", y="number", subplots=True)
And in general, it would be nice if the plot looked a bit nicer, like the one below (I am referring to the line and the dot):
import seaborn as sns
p = sns.catplot(data=df, x='date', y="number", hue='location', kind='point');
p.fig.set_figwidth(16)
p.fig.set_figheight(6)
Try:
import matplotlib.pyplot as plt
labels = []
# draw one line per location and remember its label for the legend
for k, v in df.groupby("location"):
    plt.plot(v["date"], v["number"], marker='o')
    labels.append(k)
plt.legend(labels, title='location')
plt.xticks(v["date"].unique())
Note: If you want to remove the top and right border then add this line at the end inside the for loop:
plt.gca().spines[['top','right']].set_visible(False)
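Iterating over df.groupby("location") yields one (key, sub-frame) pair per unique location, which is why each pass of the loop draws one line and can carry its own label. The same grouping idea in plain Python, using the question's data as a list of records (the helper name series_by_location is hypothetical):

```python
from collections import defaultdict

# The question's data as (location, date, number) records.
records = [
    ("USA", "21-06-2021", 456), ("USA", "22-06-2021", 543), ("USA", "23-06-2021", 675),
    ("UK", "21-06-2021", 543), ("UK", "22-06-2021", 765), ("UK", "23-06-2021", 345),
    ("World", "21-06-2021", 9543), ("World", "22-06-2021", 9543), ("World", "23-06-2021", 9234),
]


def series_by_location(records):
    """Collect the x (date) and y (number) values of each location."""
    series = defaultdict(lambda: {"date": [], "number": []})
    for location, date, number in records:
        series[location]["date"].append(date)
        series[location]["number"].append(number)
    return dict(series)


grouped = series_by_location(records)
for location, s in grouped.items():
    # one label plus one x/y series per plot
    print(location, s["date"], s["number"])
```

The dictionary keeps the locations in order of first appearance, whereas pandas sorts the group keys by default.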

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count, but I don't get smooth bars, and the count appears on each segment composing a bar rather than once for the whole bar.
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do any binning before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds data and annotations in the order in which the source is organized, we can instead set the text on the last subcategory only, using fig.data[-1].text = sums. The only remaining challenge is some data munging to retrieve the correct sums.
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# one total per x category: column sums of the transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
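The loop over df.T computes one total per x category (the sum across y1, y2 and y3), and assigning those totals to fig.data[-1] attaches them to the last trace only. Here is a plain-Python check of the totals for the example data, written without pandas as an illustrative sketch:

```python
# The example data in wide format, one list per column.
wide = {
    "x": ["a", "b", "c", "d"],
    "y1": [1, 4, 9, 16],
    "y2": [1, 4, 9, 16],
    "y3": [6, 8, 4.5, 8],
}

# Sum across the value columns for each x category.
value_columns = [name for name in wide if name != "x"]
sums = [
    sum(wide[name][i] for name in value_columns)
    for i in range(len(wide["x"]))
]
totals = dict(zip(wide["x"], sums))
print(totals)
```

The order of sums follows the order of the x categories, which is what lets it line up with the bars when assigned as the trace text.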
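If your first graph is built with the graph objects library, you can try:
# Use textposition='auto' for direct text.
# go.Bar takes neither color nor barmode: add one trace per gender instead
# and set barmode on the layout.
counts = val.iloc[:, 0]
fig = go.Figure()
for gender, sub in counts.groupby(level="gender"):
    fig.add_trace(go.Bar(x=sub.index.get_level_values("program"), y=sub,
                         name=str(gender), text=sub, textposition='auto'))
fig.update_layout(barmode="group")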

Adding legend to layered chart in altair

Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend indicating that the blue line shows the maximum temperature and the red line the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that altair only adds a legend if in the encoding, you set the color or size or so, usually with a categorical column, but that is not possible here because I'm plotting the whole column and the label should be the column name (which is now shown in the y-axis label).
I would do a fold transform such that the variables could be encoded correctly.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
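The fold transform reshapes the data from wide to long: each input row becomes one output row per folded column, with the column name stored in the first as_ field and its value in the second. That gives the color encoding a single categorical field to work with. Here is a plain-Python sketch of the reshaping on two made-up rows (the fold helper below is illustrative, not Altair's implementation):

```python
def fold(rows, columns, as_=("key", "value")):
    """Emit one output row per (input row, folded column) pair."""
    key_field, value_field = as_
    out = []
    for row in rows:
        for column in columns:
            # keep the original fields and add the key/value pair
            out.append({**row, key_field: column, value_field: row[column]})
    return out


# Made-up rows with the same column names as the weather data (assumption).
rows = [
    {"date": "2012-01-01", "temp_max": 12.8, "temp_min": 5.0},
    {"date": "2012-01-02", "temp_max": 10.6, "temp_min": 2.8},
]
long_rows = fold(rows, ["temp_max", "temp_min"], as_=("variable", "value"))
for row in long_rows:
    print(row["date"], row["variable"], row["value"])
```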
If you layer two charts with the same columns and color both by the same column, the legend will appear. I don't know if this helps, but
for example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
and I charted both with color='Type' and called alt.layer(chart1, chart2), which showed a proper legend.

Change the order of bars in a grouped barplot with hvplot/holoviews

I try to create a grouped bar plot but can't figure out how to influence the order of the barplot.
Given these example data:
import pandas as pd
import hvplot.pandas
df = pd.DataFrame({
"lu": [200, 100, 10],
"le": [220, 80, 130],
"la": [60, 20, 15],
"group": [1, 2, 2],
})
df = df.groupby("group").sum()
I'd like to create a horizontal grouped bar plot showing the two groups 1 and 2 with all three columns. The columns should appear in the order of "le", "la" and "lu".
Naturally I'd try this with Hvplot:
df.hvplot.barh(x = "group", y = ["le", "la", "lu"])
With that I get the result below:
Hvplot does not seem to care about the order in which I add the columns (calling df.hvplot.barh(x = "group", y = ["lu", "le", "la"]) doesn't change anything). Nor does Hvplot seem to care about the original order in the dataframe.
Are there any options to influence the order of the bars?
For normal bar charts, you can just order your data in the way you want it to be plotted.
However, for grouped bar charts you can't set the order yet.
But development of this feature is on the way and probably available in one of the next releases: https://github.com/holoviz/holoviews/issues/3799
Current solutions with Hvplot 0.5.2 and Holoviews 1.12:
1) If you're using a Bokeh backend, you can use keyword hooks:
from itertools import product
# define hook function to set order on bokeh plot
def set_grouped_barplot_order(plot, element):
    # define your categorical ordering in a list of tuples
    factors = product(['2', '1'], ['le', 'la', 'lu'])
    # since you're using a horizontal bar plot, set the order on y_range.factors
    # if you had a normal (vertical) barplot you would use x_range.factors
    plot.state.y_range.factors = [*factors]
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# apply your special ordering function
group_plot.opts(hooks=[set_grouped_barplot_order], backend='bokeh')
Hooks allow you to apply specific bokeh settings to your plots. You don't need hooks very often, but they are very handy in this case.
Documentation:
http://holoviews.org/user_guide/Customizing_Plots.html#Plot-hooks
https://holoviews.org/FAQ.html
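The hook above builds the axis ordering as (group, variable) tuples with itertools.product, which varies the last sequence fastest. A quick look at what that call produces, using only the standard library:

```python
from itertools import product

# Same arguments as in the hook: group order first, then column order.
factors = list(product(['2', '1'], ['le', 'la', 'lu']))
for factor in factors:
    print(factor)
```

Because product returns an iterator that can be consumed only once, the hook unpacks it into a list with [*factors] before assigning it.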
2) Another solution would be to convert your Holoviews plot to an actual Bokeh plot and then set the ordering:
from itertools import product
import holoviews as hv
from bokeh.plotting import show
# create plot
group = df.groupby("group").sum()
group_plot = group.hvplot.barh(
x="group",
y=["le", "la", "lu"],
padding=0.05,
)
# render your holoviews plot as a bokeh plot
my_bokeh_plot = hv.render(group_plot, backend='bokeh')
# set the custom ordering on your bokeh plot
factors = product(['2', '1'], ['le', 'la', 'lu'])
my_bokeh_plot.y_range.factors = [*factors]
show(my_bokeh_plot)
Personally I prefer the first solution because it stays within Holoviews.
This has just been fixed in HoloViews 1.13.
You can sort your barplot just like you wanted:
df.hvplot.barh(x="group", y=["lu", "la", "le"])
As I write this, HoloViews 1.13 is not officially available yet, but you can install it through:
pip install git+https://github.com/holoviz/holoviews.git
If you want even more control over the order, you can use .redim.values() on your grouped_barplot:
group_specific_order = [2, 1]
variable_specific_order = ['lu', 'la', 'le']
# Note that group and Variable are the variable names of your dimensions here
# when you use this on a different grouped barchart, then please change to the
# names of your own dimensions.
your_grouped_barplot.redim.values(
group=group_specific_order,
Variable=variable_specific_order,
)
