Grouped bar chart in newer versions of altair (>= 4.2.0) - python

I am trying to create a grouped bar chart in altair like in the answer to this question here.
The particularly interesting part is the "beautification":
chart = Chart(df).mark_bar().encode(
column=Column('Genre',
axis=Axis(axisWidth=1.0, offset=-8.0, orient='bottom'),
scale=Scale(padding=4.0)),
x=X('Gender', axis=False),
y=Y('Rating', axis=Axis(grid=False)),
color=Color('Gender', scale=Scale(range=['#EA98D2', '#659CCA']))
).configure_facet_cell(
strokeWidth=0.0,
)
chart.display()
The issue is, however, that none of the stuff in the columns (alt.Column) works in the current version of Altair (I am using 4.2).
In particular, I am getting:
SchemaValidationError: Invalid specification
altair.vegalite.v4.schema.channels.Column, validating
'additionalProperties' Additional properties are not allowed ('axis'
was unexpected)
Can something similar still be done?

In Altair 4.2.0 you can achieve a similar result like this (I am not sure whether you can connect the facets with the x-axis line):
import altair as alt
import pandas as pd
# create dataframe
df = pd.DataFrame([['Action', 5, 'F'],
['Crime', 10, 'F'],
['Action', 3, 'M'],
['Crime', 9, 'M']],
columns=['Genre', 'Rating', 'Gender'])
chart = alt.Chart(df).mark_bar().encode(
column=alt.Column(
'Genre',
header=alt.Header(orient='bottom')
),
x=alt.X('Gender', axis=alt.Axis(ticks=False, labels=False, title='')),
y=alt.Y('Rating', axis=alt.Axis(grid=False)),
color='Gender'
).configure_view(
stroke=None,
)
chart
In the current development version of Altair (which will probably be released as 5.0), you can use the new offset channels to achieve the same result without faceting:
chart = alt.Chart(df).mark_bar().encode(
x=alt.X('Genre', axis=alt.Axis(labelAngle=0)),
xOffset='Gender',
y=alt.Y('Rating', axis=alt.Axis(grid=False)),
color='Gender'
).configure_view(
stroke=None,
)
chart
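If the same code has to run on both Altair 4.2 and a release with offset channels, one option is to pick the encoding from the installed version. The following is only a sketch: the helper name grouped_bar_encoding is made up for illustration, and it assumes that offset channels arrive with major version 5, as the answer expects.

```python
def grouped_bar_encoding(altair_version: str) -> dict:
    """Return encode() keyword arguments for a grouped bar chart.

    Hypothetical helper: assumes xOffset is available from major version 5 on.
    """
    major = int(altair_version.split(".")[0])
    if major >= 5:
        # Offset channel: a single chart, bars grouped within each Genre
        return {"x": "Genre", "xOffset": "Gender", "y": "Rating", "color": "Gender"}
    # Older releases: facet by Genre and put Gender on the x axis
    return {"column": "Genre", "x": "Gender", "y": "Rating", "color": "Gender"}
```

Usage would look like alt.Chart(df).mark_bar().encode(**grouped_bar_encoding(alt.__version__)); the axis and header tweaks from the two snippets above still have to be added separately.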

Related

Altair: Setting Raw Color Values Messes Up SortField

I'm trying to create an Altair barplot with the bars sorted by 'count' and then apply raw color values to the bars.
import pandas as pd
import altair as alt
# Dummy data
df = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry', 'pear', 'grape', 'kiwi', 'strawberry', 'lychee'],
'count': [2, 4, 1, 7, 9, 12, 16, 35],
'label': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
'colors': ['#4c78a8', '#f58518', '#e45756', '#72b7b2', '#54a24b', '#eeca3b', '#b279a2',
'#9d755d']})
def make_alt_chart(df):
    g = alt.Chart(df).encode(
        x=alt.X('count'),
        y=alt.Y('fruit', sort=alt.SortField(field='count', order='descending'))
    ).properties(
        width=700,
        height=650,
    )
    bars = g.mark_bar(
        size=60,
    ).encode(
        color=alt.Color('colors', sort=alt.SortField('count', order='descending'),
                        scale=None, legend=None)
    ).properties(height=alt.Step(75))
    text = g.mark_text(
        align='center',
        baseline='middle',
        dx=28
    ).encode(
        text='label'
    ).interactive()
    return (bars + text)
fruits = make_alt_chart(df)
fruits
Adding sort=alt.SortField(field='count', order='descending') to y= gets the chart sorted how I want, but when I add color=alt.Color('colors', sort=alt.SortField('count', order='descending'), scale=None, legend=None) to bars, the order on the y axis is no longer sorted by 'count'.
This is what the fruits chart looks like after running the above code:
This is what my desired output would look like, but with the custom colors applied:
If there's an easier way to set custom colors in Altair please let me know.
Note: The color hex values are the tableau10 scheme but dropping the pink shade.
I've reviewed these resources but haven't been able to figure it out:
altair.Color Documentation
Altair Customizing Visualizations Docs
Vega Color Schemes Docs
Altair issues: sort not working on alt.Y
SO: Setting constant label color for bar chart
SO: Sorting based on alt.Color
If you look in the JavaScript console, you will see that the renderer is outputting this warning:
WARN Domains that should be unioned has conflicting sort properties. Sort will be set to true.
The relevant Vega-Lite issue suggests a workaround: replace
y=alt.Y('fruit', sort=alt.SortField(field='count', order='descending'))
with
y=alt.Y('fruit', sort=alt.EncodingSortField(field='count', order='descending', op='sum'))
With this change, the bars are sorted by 'count' again.
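Another way to sidestep conflicting sort definitions, which is not part of the answer above, may be to compute the order once in Python and pass the same explicit list of category names as sort. A minimal sketch using the question's data (the variable name order is just illustrative):

```python
fruit = ['apple', 'orange', 'blueberry', 'pear', 'grape', 'kiwi', 'strawberry', 'lychee']
count = [2, 4, 1, 7, 9, 12, 16, 35]

# Pair each fruit with its count and sort by count, largest first
order = [name for _, name in sorted(zip(count, fruit), reverse=True)]
print(order)
```

The list could then be used as y=alt.Y('fruit', sort=order), since sort also accepts an explicit array of values.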

Altair - link y-axis with x-axis of a different chart

I need to visually compare two signals and I'm using Altair to plot some interactive charts like the example below.
import altair as alt
import pandas as pd
import numpy as np
np.random.seed(42)
df_comparison = pd.DataFrame({'x1': np.arange(20), 'x2': np.arange(20)}) #just for example purposes, actual data will be more interesting
df_signal_1 = pd.DataFrame({'x1': np.arange(20), 'data_1': np.random.random(20)})
df_signal_2 = pd.DataFrame({'x2': np.arange(20), 'data_2': np.random.random(20)})
comparison = alt.Chart(df_comparison, title='Comparison').mark_point(filled=True).encode(
alt.X('x1'),
alt.Y('x2')
).interactive()
signal_1 = alt.Chart(df_signal_1,title='Signal 1').mark_line().encode(
alt.X('x1'),
alt.Y('data_1'),
)
signal_2 = alt.Chart(df_signal_2, title='Signal 2').mark_line().encode(
alt.X('x2'),
alt.Y('data_2'),
)
(signal_1 & (comparison | signal_2).resolve_scale(x='shared')).resolve_scale(x='shared')
By zooming the Comparison chart you can see that its "x1" axis is linked to the "x1" axis of Signal 1, which is fine. However, it is also linked to "x2" axis of Signal 2 and that is not good. How can I link the "x2" axes of the Comparison and Signal 2 charts without breaking the link between the "x1" axes?
You can do this by creating the interaction manually, and then linking domains to the selection's encodings; something like this:
x12_zoom = alt.selection_interval(encodings=['x', 'y'], bind='scales')
comparison = alt.Chart(df_comparison, title='Comparison').mark_point(filled=True).encode(
alt.X('x1'),
alt.Y('x2'),
).add_selection(x12_zoom)
signal_1 = alt.Chart(df_signal_1,title='Signal 1').mark_line().encode(
alt.X('x1', scale=alt.Scale(domain={'selection': x12_zoom.name, 'encoding': 'x'})),
alt.Y('data_1'),
)
signal_2 = alt.Chart(df_signal_2, title='Signal 2').mark_line().encode(
alt.X('x2', scale=alt.Scale(domain={'selection': x12_zoom.name, 'encoding': 'y'})),
alt.Y('data_2'),
)
(signal_1 & (comparison | signal_2))

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With a histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count, but I don't get smooth bars, and the count does not appear for the whole bar but for each case composing that bar.
Here is my code for the bar plot:
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do the binning yourself before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds data and annotations in the order in which the source is organized, we can simply set the text on the last subcategory applied using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
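The "data munging" step boils down to one total per x category. As a library-free sketch of that calculation on the same numbers (the rows list is a hand-written long-format copy of the example data, not something plotly produces):

```python
from collections import defaultdict

# Long-format records: (x category, variable, value)
rows = [
    ('a', 'y1', 1), ('b', 'y1', 4), ('c', 'y1', 9), ('d', 'y1', 16),
    ('a', 'y2', 1), ('b', 'y2', 4), ('c', 'y2', 9), ('d', 'y2', 16),
    ('a', 'y3', 6), ('b', 'y3', 8), ('c', 'y3', 4.5), ('d', 'y3', 8),
]

totals = defaultdict(float)
for x, _variable, value in rows:
    totals[x] += value  # accumulate the stacked height per category

# dicts keep insertion order, so this follows the order of first appearance
sums = list(totals.values())
print(sums)  # [8.0, 16.0, 22.5, 40.0]
```

With pandas, calling sum(axis=1) on the wide frame indexed by x gives the same numbers.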
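If your first graph was made with the graph objects library, you can try:
# Use textposition='auto' for direct text.
# go.Bar takes no color or barmode arguments; barmode is set on the layout.
# To color by gender, add one go.Bar trace per gender instead of a single trace.
fig = go.Figure(data=[go.Bar(x=val.index.get_level_values(0),
                             y=val.iloc[:, 0],
                             text=val.iloc[:, 0], textposition='auto',
                             )])
fig.update_layout(barmode="group")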

Bar chart showing count of each category month-wise using Bokeh

I have data as shown below:
So, from this, I need to display the count in each category by year_month_id. Since I have 12 months, there will be 12 subdivisions, and under each one the count of IDs within each class.
Something like the image below is what I am looking for.
Now the examples in Bokeh use ColumnDataSource and dictionary mapping, but how do I do this for my dataset?
Can someone please help me with this?
Below is the expected output in tabular and chart format.
I believe the pandas Python package would come in handy for preparing your data for plotting. It's useful for manipulating table-like data structures.
Here is how I went about your problem:
from pandas import DataFrame
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis5
# Your sample data
df = DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1],
'year_month_id': [201612, 201612, 201612, 201612, 201612, 201612, 201612, 201612, 201612, 201701],
'class': ['A', 'D', 'B', 'other', 'other', 'other', 'A', 'other', 'A', 'B']
})
# Get counts of groups of 'class' and fill in 'year_month_id' column
df2 = DataFrame({'count': df.groupby(["year_month_id", "class"]).size()}).reset_index()
df2 now looks like this:
# Create new column to make plotting easier
df2['class-date'] = df2['class'] + "-" + df2['year_month_id'].map(str)
# x and y axes
class_date = df2['class-date'].tolist()
count = df2['count'].tolist()
# Bokeh's mapping of column names and data lists
source = ColumnDataSource(data=dict(class_date=class_date, count=count, color=Viridis5))
# Bokeh's convenience function for creating a Figure object
p = figure(x_range=class_date, y_range=(0, 5), plot_height=350, title="Counts",
toolbar_location=None, tools="")
# Render and show the vbar plot
p.vbar(x='class_date', top='count', width=0.9, color='color', source=source)
show(p)
So the Bokeh plot looks like this:
Of course you can alter it to suit your needs. The first thing I thought of was making the top of the y_range a variable so that it can accommodate the data better, though I have not tried it myself.
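One way to make the top of the y_range follow the data is to derive it from the largest count plus some headroom. This is a small sketch; the helper name padded_top and the 10% headroom are arbitrary choices for illustration:

```python
def padded_top(values, headroom=0.1):
    """Return the largest value plus a fractional headroom."""
    return max(values) * (1 + headroom)

count = [3, 1, 1, 4, 1]  # group counts from the sample data above
y_range = (0, padded_top(count))
```

The tuple can then replace the fixed y_range=(0, 5) in the figure() call.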

How to create a grouped bar chart in Altair?

How does one create a grouped bar chart in Altair? I'm trying the following but it is just producing two graphs side by side.
Chart(data).mark_bar().encode(
column='Gender',
x='Genre',
y='Rating',
color='Gender'
)
Example of group bar chart
Here is a simplified example of the grouped bar chart from Altair's documentation. You can also see the full documentation here.
Basically, you have to put Gender on the x-axis (F or M in each subplot), Rating on the y-axis, and Genre on the Column.
from altair import *
import pandas as pd
# create dataframe
df = pd.DataFrame([['Action', 5, 'F'],
['Crime', 10, 'F'],
['Action', 3, 'M'],
['Crime', 9, 'M']],
columns=['Genre', 'Rating', 'Gender'])
chart = Chart(df).mark_bar().encode(
column=Column('Genre'),
x=X('Gender'),
y=Y('Rating'),
color=Color('Gender', scale=Scale(range=['#EA98D2', '#659CCA']))
).configure_facet_cell(
strokeWidth=0.0,
)
chart.display() # will show the plot
The bar chart will look like the following:
Adding Axis parameters
You only have to follow the Axis parameters in the documentation to make the plot look prettier:
chart = Chart(df).mark_bar().encode(
column=Column('Genre',
axis=Axis(axisWidth=1.0, offset=-8.0, orient='bottom'),
scale=Scale(padding=4.0)),
x=X('Gender', axis=False),
y=Y('Rating', axis=Axis(grid=False)),
color=Color('Gender', scale=Scale(range=['#EA98D2', '#659CCA']))
).configure_facet_cell(
strokeWidth=0.0,
)
chart.display()
If you try the accepted answer on a newer version of Altair (4.2.0 or later), you will notice that it doesn't work. Some of the API has changed, so to get the same results in Altair 4.2.0 you can use the approach posted in my answer to Grouped bar chart in newer versions of altair (>= 4.2.0). For the development version of Altair (which will probably be released as 5.0), this has become easier to achieve, since you can use the xOffset encoding like this without the need to facet your charts:
import altair as alt
import pandas as pd
df = pd.DataFrame([['Action', 5, 'F'],
['Crime', 10, 'F'],
['Action', 3, 'M'],
['Crime', 9, 'M']],
columns=['Genre', 'Rating', 'Gender'])
chart = alt.Chart(df).mark_bar().encode(
x=alt.X('Genre', axis=alt.Axis(labelAngle=0)),
xOffset='Gender',
y=alt.Y('Rating', axis=alt.Axis(grid=False)),
color='Gender'
).configure_view(
stroke=None,
)
chart
