Adding legend to layered chart in altair - python

Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend showing that the blue line is the maximum temperature and the red line the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that Altair only adds a legend if you set color, size, or a similar channel in the encoding, usually with a categorical column. That is not possible here because I'm plotting whole columns, and the label should be the column name (which is currently shown in the y-axis label).

I would do a fold transform such that the variables could be encoded correctly.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
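To see why the fold makes a legend possible, it can help to look at what it does to the data. The following is a plain-Python sketch of the reshaping, not Altair's implementation; the fold helper and the sample rows are made up for illustration.

```python
def fold(rows, fields, as_=("variable", "value")):
    """Turn each wide row into one long row per folded field.

    Illustrative only: mimics the shape of the fold output, which keeps
    the original columns and adds a key column and a value column.
    """
    key_name, value_name = as_
    long_rows = []
    for row in rows:
        for field in fields:
            new_row = dict(row)               # original columns are kept
            new_row[key_name] = field         # which column the value came from
            new_row[value_name] = row[field]  # the value itself
            long_rows.append(new_row)
    return long_rows


# Hypothetical sample rows shaped like the weather data
wide = [
    {"date": "2012-01-01", "temp_max": 12.8, "temp_min": 5.0},
    {"date": "2012-01-02", "temp_max": 10.6, "temp_min": 2.8},
]
long_rows = fold(wide, ["temp_max", "temp_min"])
for row in long_rows:
    print(row["date"], row["variable"], row["value"])
```

After the fold, the column name lives in an ordinary categorical field ('variable'), which is exactly what the color encoding needs in order to build a legend.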

If you layer two charts that have the same columns and tell both to color by the same column, the legend will appear. I don't know if this helps, but...
For example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
I charted both with color='Type' and called alt.layer(chart1, chart2), and it showed me a proper legend.

Related

Hiding facet row or column headers

How can I hide the row (or column) header labels in a facet chart?
I rotated the labels by 45 degrees in the following example (copied from this post) to highlight which ones I mean, the year numbers:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_rect().encode(
alt.Y('month(date):O', title='month'),
alt.X('date(date):O', title='day'),
color='temp_max:Q'
).facet(
row=alt.Row(
'year(date):N',
header=alt.Header(labelAngle=45)
)
)
One way to do this is to turn off the labels by setting labels=False:
header=alt.Header(labels=False)
Another way is to use a labelExpr containing an empty string:
header=alt.Header(labelExpr="''")

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to split the plot by color (and maybe even facet it by column). The problem is that I want the count to be displayed on each bar. With a histogram I get smooth bars, but I can't find how to display the count.
With a bar plot I can display the count, but the bars are not smooth, and the count appears for each segment that makes up a bar rather than once for the whole bar.
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do the binning yourself before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds data and annotations in the order in which the source is organized, we can simply set the text on the last subcategory only, using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
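If your first graph is built with the graph objects library, you can try:
# Use textposition='auto' for direct text.
# go.Bar has no color or barmode arguments, so add one trace per gender
# and set barmode on the layout instead.
counts = val.iloc[:, 0]
fig = go.Figure()
for gender, grp in counts.groupby(level="gender"):
    fig.add_trace(go.Bar(x=grp.index.get_level_values(0), y=grp,
                         name=str(gender), text=grp, textposition='auto'))
fig.update_layout(barmode="group")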

Plotly subplot represent same y-axis name with same color and single legend

I am trying to create a plot for two categories in a subplot. The 1st column represents category FF and the 2nd column represents category RF.
The x-axis is always time and the y-axis is each of the remaining columns. In other words, it is a plot of one column vs. the rest.
The 1st and 2nd categories always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but the problem is that plotly treats each trace as distinct, so it draws lines with the same y-axis name in different colors. As a consequence, a separate legend entry is also created for each one.
For example, in the first row (Time vs Price2010) I want both subplots, FF and RF, to use the same color (say blue) and share a single legend entry.
I tried adding legendgroup in go.Scatter but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
'Price2010': [22000,26000,27000,35000],
'Price2011': [23000,27000,28000,36000],
'Price2012': [24000,28000,29000,37000],
'Price2013': [25000,29000,30000,38000],
'Price2014': [26000,30000,31000,39000],
'Price2015': [27000,31000,32000,40000],
'Price2016': [28000,32000,33000,41000]
}
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
'Price2010': [19000,20000,21000,22000],
'Price2011': [20000,21000,22000,23000],
'Price2012': [21000,22000,23000,24000],
'Price2013': [22000,23000,24000,25000],
'Price2014': [23000,24000,25000,26000],
'Price2015': [24000,25000,26000,27000],
'Price2016': [25000,26000,27000,28000]
}
rf = DataFrame(CarB)
Type = {
'FF' : ff,
'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is all I have been able to come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params, showlegend=legend, marker=dict(
            color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
    fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
    # only the first category contributes legend entries
    legend = False
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col

Invert axis direction Altair

For some reason, the y-axis seems to be inverted when plotting with Altair (I would expect values to go from lower at the bottom to higher at the top of the plot). Also, I would like to be able to change the tick frequency. With older versions I could use ticks=n_ticks, but it seems this argument now only accepts a boolean.
import altair as alt
import pandas as pd
alt.renderers.enable('notebook')
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1)),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
I don't have your data, so it's difficult to write working code.
But here's an example of an inverted scale with additional ticks, similar to the scatter-with-tooltips example.
import altair as alt
from vega_datasets import data
iris = data.iris()
alt.Chart(iris).mark_point().encode(
x='petalWidth',
y=alt.Y('petalLength', scale=alt.Scale(domain=[7,0]), axis=alt.Axis(tickCount=100)),
color='species'
).interactive()
This might work with your data:
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1, domain=[17,1])),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
However, I think it's possible that you just have the wrong type on your efficiency variable. You could try replacing 'Efficiency:N' with 'Efficiency:Q', and that might do it.
While it's possible to reverse the domain manually, that requires hardcoding the bounds.
Instead we can just pass Scale(reverse=True) to the axis encoding, e.g.:
import altair as alt
from vega_datasets import data
alt.Chart(data.wheat().head()).mark_bar().encode(
x='wheat:Q',
y=alt.Y('year:O', scale=alt.Scale(reverse=True)),
)
Here it's been passed to alt.Y, so the years are inverted (left) vs the default y='year:O' (right):

Python Bokeh - blending

Note from maintainers: This question concerns the obsolete bokeh.charts API, removed years ago. For information on creating all kinds of Bar charts with modern Bokeh, see:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
OBSOLETE:
I am trying to create a bar chart from a dataframe df in Python Bokeh library. The data I have simply looks like:
value datetime
5 01-01-2015
7 02-01-2015
6 03-01-2015
... ... (for 3 years)
I would like to have a bar chart that shows 3 bars per month:
one bar for the MEAN of 'value' for the month
one bar for the MAX of 'value' for the month
one bar for the MIN of 'value' for the month
I am able to create a bar chart for any one of MEAN/MAX/MIN with:
from bokeh.charts import Bar, output_file, show
p = Bar(df, 'datetime', values='value', title='mybargraph',
agg='mean', legend=None)
output_file('test.html')
show(p)
How could I have the 3 bars (mean, max, min) on the same plot? And, if possible, stacked above each other.
It looks like blend could help me, as in this example:
http://docs.bokeh.org/en/latest/docs/gallery/stacked_bar_chart.html
but I cannot find detailed explanations of how it works. The bokeh website is amazing but for this particular item it is not really detailed.
That blend example put me on the right track.
import pandas as pd
from pandas import Series
from dateutil.parser import parse
from bokeh.plotting import figure
from bokeh.layouts import row
from bokeh.charts import Bar, output_file, show
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
output_file("datestats.html")
Just some sample data, feel free to alter it as you see fit.
First I had to wrangle the data into a proper format.
# Sample data
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
"01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
"01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"
]
It looked like your date format was "day-month-year" - I used the dateutil.parser so pandas would recognize it properly.
# Parse the dates as datetime objects, reading them day-first
days = [parse(x, dayfirst=True) for x in dates]
You also needed it grouped by month - I used pandas resample to downsample the dates, get the appropriate values for each month, and merge into a dataframe.
# Put data into dataframe broken into min, mean, and max values for each month
ts = Series(vals, index=days)
firstmerge = pd.merge(ts.resample('M').min().to_frame(name="min"),
                      ts.resample('M').mean().to_frame(name="mean"),
                      left_index=True, right_index=True)
frame = pd.merge(firstmerge, ts.resample('M').max().to_frame(name="max"),
                 left_index=True, right_index=True)
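If you only need the monthly table and not the intermediate frames, the resample-and-merge step can probably be condensed with agg. This is a sketch with the same sample data; note that it uses the 'MS' frequency, which labels each month by its first day rather than its last, and pd.to_datetime with an explicit format instead of dateutil. Both are my substitutions, not part of the code above.

```python
import pandas as pd

# Same sample data as above
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
         "01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
         "01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"]

# Day-first dates, parsed with an explicit format
days = pd.to_datetime(dates, format="%d-%m-%Y")
ts = pd.Series(vals, index=days)

# One row per month, one column per statistic
frame = ts.resample("MS").agg(["min", "mean", "max"])
print(frame)
```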
Bokeh allows you to use the pandas dataframe's index as the chart's x values, but it didn't like the datetime values, so I added a new column for date labels. See the timeseries comment below (***).
# You can use DataFrame index for bokeh x values but it doesn't like timestamp
frame['Month'] = frame.index.strftime('%m-%Y')
Finally we get to the charting part. Just like the Olympic medal example, we pass some arguments to Bar.
Play with these however you like, but note that I added the legend by building it outside of the chart altogether. If you have a lot of data points it gets very messy on the chart the way it's built here.
# Main object to render with stacking
bar = Bar(frame,
values=blend('min', 'mean', 'max',
name='values', labels_name='stats'),
label=cat(columns='Month', sort=False),
stack=cat(columns='values', sort=False),
color=color(columns='values',
palette=['SaddleBrown', 'Silver', 'Goldenrod'],
sort=True),
legend=None,
title="Statistical Values Grouped by Month",
tooltips=[('Value', '#values')]
)
# Legend info (displayed as separate chart using bokeh.layouts' row)
factors = ["min", "mean", "max"]
x = [0] * len(factors)
y = factors
pal = ['SaddleBrown', 'Silver', 'Goldenrod']
p = figure(width=100, toolbar_location=None, y_range=factors)
p.rect(x, y, color=pal, width=10, height=1)
p.xaxis.major_label_text_color = None
p.xaxis.major_tick_line_color = None
p.xaxis.minor_tick_line_color = None
# Display chart
show(row(bar, p))
If you copy and paste this code, you get the stacked bar chart with the separately built legend next to it.
If you render it yourself or serve it, hover over each block to see the tooltips (values).
I didn't abstract everything I could (colors come to mind).
This is the type of chart you wanted to build, but it seems like a different chart style would display the data more informatively since stacked totals (min + mean + max) don't provide meaningful information. But I don't know what your data really are.
***You might consider a timeseries chart. This could remove some of the data wrangling done before plotting.
You might also consider grouping your bars instead of stacking them. That way you could easily visualize each month's numbers.
