Altair - link y-axis with x-axis of a different chart - python

I need to visually compare two signals and I'm using Altair to plot some interactive charts like the example below.
import altair as alt
import pandas as pd
import numpy as np
np.random.seed(42)
df_comparison = pd.DataFrame({'x1': np.arange(20), 'x2': np.arange(20)}) #just for example purposes, actual data will be more interesting
df_signal_1 = pd.DataFrame({'x1': np.arange(20), 'data_1': np.random.random(20)})
df_signal_2 = pd.DataFrame({'x2': np.arange(20), 'data_2': np.random.random(20)})
comparison = alt.Chart(df_comparison, title='Comparison').mark_point(filled=True).encode(
alt.X('x1'),
alt.Y('x2')
).interactive()
signal_1 = alt.Chart(df_signal_1,title='Signal 1').mark_line().encode(
alt.X('x1'),
alt.Y('data_1'),
)
signal_2 = alt.Chart(df_signal_2, title='Signal 2').mark_line().encode(
alt.X('x2'),
alt.Y('data_2'),
)
(signal_1 & (comparison | signal_2).resolve_scale(x='shared')).resolve_scale(x='shared')
Zooming the Comparison chart shows that its "x1" axis is linked to the "x1" axis of Signal 1, which is what I want. However, it is also linked to the "x2" axis of Signal 2, which is wrong. How can I link the "x2" axes of the Comparison and Signal 2 charts without breaking the link between the "x1" axes?

You can do this by creating the interaction manually, and then linking domains to the selection's encodings; something like this:
x12_zoom = alt.selection_interval(encodings=['x', 'y'], bind='scales')
comparison = alt.Chart(df_comparison, title='Comparison').mark_point(filled=True).encode(
alt.X('x1'),
alt.Y('x2'),
).add_selection(x12_zoom)
signal_1 = alt.Chart(df_signal_1,title='Signal 1').mark_line().encode(
alt.X('x1', scale=alt.Scale(domain={'selection': x12_zoom.name, 'encoding': 'x'})),
alt.Y('data_1'),
)
signal_2 = alt.Chart(df_signal_2, title='Signal 2').mark_line().encode(
alt.X('x2', scale=alt.Scale(domain={'selection': x12_zoom.name, 'encoding': 'y'})),
alt.Y('data_2'),
)
(signal_1 & (comparison | signal_2))

Related

How to avoid None when plotting sunburst chart using Plotly?

I am trying to create a sunburst chart using Plotly. My data consists of several types of journeys with varying numbers of steps: some journeys have 10 steps, others have 100. For simplicity, let us consider only 3 steps.
Here is the data -
import pandas as pd
import plotly.express as px
import numpy as np
data = {
'step0' :['home', 'home','product2','product1','home'],
'step1' : ['product1','product1', None, 'product2',None] ,
'step2' : ['product2','checkout', None, None,None] ,
'total_sales' : [50,20,10,0,7]
}
data_df = pd.DataFrame(data)
data_df.head()
I now try to plot these steps in a sunburst chart. Because some journeys are short, their remaining steps are marked as None.
data_df = data_df.fillna('end')
plotting code -
fig = px.sunburst(data_df, path=['step0','step1','step2'], values='total_sales', height = 400)
fig.show()
As you can see above, the None values have been filled with "end" because Plotly does not accept NAs. However, I do not want "end" to appear in the sunburst chart.
I want to re-create something like this -
https://bl.ocks.org/kerryrodden/7090426
How can I make this work in Plotly?
One workaround that uses what you already have would be to instead fillna with an empty string like " " so the word "end" doesn't show on the chart. Then you can loop through the marker colors and marker labels in the fig.data[0] object, changing the marker color to transparent "rgba(0,0,0,0)" for every label that matches the empty string.
The one caveat is that the hovertemplate will still show information for the sectors hidden by this workaround, but the static image will look correct.
For example:
import pandas as pd
import plotly.express as px
import numpy as np
data = {
'step0' :['home', 'home','product2','product1','home'],
'step1' : ['product1','product1', None, 'product2',None] ,
'step2' : ['product2','checkout', None, None,None] ,
'total_sales' : [50,20,10,0,7]
}
data_df = pd.DataFrame(data)
# data_df.head()
data_df = data_df.fillna(" ")
fig = px.sunburst(
data_df,
path=['step0','step1','step2'],
values='total_sales',
color=["red","orange","yellow","green","blue"],
height = 400
)
## set marker colors whose labels are " " to transparent
marker_colors = list(fig.data[0].marker['colors'])
marker_labels = list(fig.data[0]['labels'])
new_marker_colors = ["rgba(0,0,0,0)" if label==" " else color for (color, label) in zip(marker_colors, marker_labels)]
marker_colors = new_marker_colors
fig.data[0].marker['colors'] = marker_colors
fig.show()

Plotting multiple curves (x, y1, y2, x, y3, y4) in the same plot

I'm trying to plot a graph with four different sets of values on the "y" axis. I have 6 arrays: 2 of them hold the time values for the "x" axis, and the other 4 hold the corresponding "y" values (element by element, in the same positions).
Example:
LT_TIME = ['18:14:17.566 ', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
The coordinates of the "LT" graph are:
('18:14:17.566',-110), ('18:14:17.570',-113), ('18:14:17.566',-3), ('18:14:17.570',-5)
And with these coordinates, I can generate a graph with two "y" axes, which contains the points (-110,-113,-3,-5) and an "x" axis with the points ('18:14:17.566', '18:14:17.570').
Similarly, it is possible to do the same with the "GNR" arrays. So how can I show all the Cartesian points from both the "LT" and "GNR" arrays on the same graph? That is, how can I plot the following coordinates together:
('18:14:17.566',-110), ('18:14:17.570 ',-113), ('18:14:17.566',-3), ('18:14:17.570',-5),
('18:15:42.489',-94), ('18:32:39.489',-94), ('18:15:42.489',-3), ('18:32:39.489',-7)
It sounds like your problem has two parts: formatting the data in a way that visualisation libraries would understand and actually visualising it using a dual axis.
Your example screenshot includes some interactive controls, so I suggest you use bokeh, which gives you zoom and pan for "free", rather than matplotlib. Besides, I find bokeh's way of adding a dual axis more straightforward. If matplotlib is a must, here's another answer that should point you in the right direction.
For the first part, you can merge the data you have into a single dataframe, like so:
import pandas as pd
from bokeh.models import LinearAxis, Range1d, ColumnDataSource
from bokeh.plotting import figure, output_notebook, show
output_notebook() #if working in Jupyter Notebook, output_file() if not
LT_TIME = ['18:14:17.566 ', '18:14:17.570']
LT_RP = [-110,-113]
LT_RQ = [-3,-5]
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]
s1 = list(zip(LT_TIME, LT_RP)) + list(zip(GNR_TIME, GNR_RP))
s2 = list(zip(LT_TIME, LT_RQ)) + list(zip(GNR_TIME, GNR_RQ))
df1 = pd.DataFrame(s1, columns=["Date", "RP"])
df2 = pd.DataFrame(s2, columns=["Date", "RQ"])
df = df1.merge(df2, on="Date")
source = ColumnDataSource(df)
To visualise the data as a dual axis line chart, we just need to specify the extra y-axis and position it in the layout:
p = figure(x_range=df["Date"], y_range=(-90, -120))
p.line(x="Date", y="RP", color="cadetblue", line_width=2, source=source)
p.extra_y_ranges = {"RQ": Range1d(start=0, end=-10)}
p.line(x="Date", y="RQ", color="firebrick", line_width=2, y_range_name="RQ", source=source)
p.add_layout(LinearAxis(y_range_name="RQ"), 'right')
show(p)
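The example above treats the time strings as categories, so the stray spaces in the raw values do not matter. If a real time axis is wanted, one possible way to prepare the data is sketched below using pandas only. It assumes the extra whitespace in the strings is accidental, and the `clean_times` helper is a hypothetical name introduced here.

```python
import pandas as pd

LT_TIME = ['18:14:17.566 ', '18:14:17.570']
LT_RP = [-110, -113]
LT_RQ = [-3, -5]
GNR_TIME = ['18: 15: 42.489', '18:32:39.489']
GNR_RP = [-94, -94]
GNR_RQ = [-3, -7]


def clean_times(raw):
    """Hypothetical helper: drop all whitespace, then parse HH:MM:SS.fff strings."""
    stripped = pd.Series(raw).str.replace(r"\s+", "", regex=True)
    return pd.to_datetime(stripped, format="%H:%M:%S.%f")


# One row per timestamp, with both measurements side by side.
df = pd.DataFrame({
    "Date": pd.concat([clean_times(LT_TIME), clean_times(GNR_TIME)], ignore_index=True),
    "RP": LT_RP + GNR_RP,
    "RQ": LT_RQ + GNR_RQ,
}).sort_values("Date").reset_index(drop=True)
```

With a datetime column like this, the figure could be created with `x_axis_type="datetime"` instead of a categorical `x_range`, so the spacing between points reflects the actual time gaps.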

How to hide axis lines but show ticks in a chart in Altair, while actively using "axis" parameter?

I am aware of using axis=None to hide axis lines. But when you have actively used axis to modify the graph, is it possible to keep just the ticks and hide the axis lines for both the X and Y axes?
For example, here is a graph I have where I'd like it to happen -
import pandas as pd
import altair as alt
df = pd.DataFrame({'a': [1,2,3,4], 'b':[2000,4000,6000,8000]})
alt.Chart(df).mark_trail().encode(
x=alt.X('a:Q', axis=alt.Axis(titleFontSize=12, title='Time →', labelColor='#999999', titleColor='#999999', titleAlign='right', titleAnchor='end', titleY=-30)),
y=alt.Y('b:Q', axis=alt.Axis(format="$s", tickCount=3, titleFontSize=12, title='Cost →', labelColor='#999999', titleColor='#999999', titleAnchor='end')),
size=alt.Size('b:Q', legend=None)
).configure_view(strokeWidth=0).configure_axis(grid=False)
The output should look like the ticks in this SO post.
Note: The plot in that post has nothing to do with the demo provided here. It's just for illustration.
Vega-Lite calls the axis line the domain. You can hide it by passing domain=False to the axis configuration:
import pandas as pd
import altair as alt
df = pd.DataFrame({'a': [1,2,3,4], 'b':[2000,4000,6000,8000]})
alt.Chart(df).mark_trail().encode(
x=alt.X('a:Q', axis=alt.Axis(titleFontSize=12, title='Time →', labelColor='#999999', titleColor='#999999', titleAlign='right', titleAnchor='end', titleY=-30)),
y=alt.Y('b:Q', axis=alt.Axis(format="$s", tickCount=3, titleFontSize=12, title='Cost →', labelColor='#999999', titleColor='#999999', titleAnchor='end')),
size=alt.Size('b:Q', legend=None)
).configure_view(strokeWidth=0).configure_axis(grid=False, domain=False)

Altair histogram, is it possible to give a name to the mark_rule?

I have a dataframe and I plot the following plot.
The code is this:
import altair as alt
alt.renderers.enable('default')
base = alt.Chart(df_800).properties()
bar = base.mark_bar().encode(
x=alt.X('volumechange', bin=True, title='Volume Change'),
y='count()'
)
rule = base.mark_rule(color='red').encode(
x='mean(volumechange)',
size=alt.value(5)
)
rule2 = base.mark_rule(color='orange').encode(
x='median(volumechange)',
size=alt.value(5),
)
bar + rule + rule2
I want to add a legend or something like that to show that red rule is the mean of the volume change and orange is the median. This is the first time I am using altair, any help is appreciated.
Legends in Altair are auto-generated from color encodings, so the trick is to get your aggregates into a format where their label is a column encoded by color. Here's one way you can do it:
import altair as alt
import pandas as pd
import numpy as np
df_800 = pd.DataFrame({'volumechange': np.random.randn(100)})
base = alt.Chart(df_800)
bar = base.mark_bar(color='lightgray').encode(
x=alt.X('volumechange', bin=True, title='Volume Change'),
y='count()'
)
aggregates = base.transform_aggregate(
mean='mean(volumechange)',
median='median(volumechange)',
).transform_fold(
['mean', 'median']
).mark_rule().encode(
x='value:Q',
color='key:N'
)
bar + aggregates

Plotly subplot represent same y-axis name with same color and single legend

I am trying to create a plot for two categories in a subplot. The 1st column represents category FF and the 2nd column represents category RF.
The x-axis is always time and the y-axis is one of the remaining columns. In other words, it is a plot of one column vs. the rest.
The 1st and 2nd categories always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but the problem is that Plotly treats each trace as distinct, so lines for the same column name are drawn in different colors in the two subplots. As a consequence, a separate legend entry is also created for each of them.
For example, in the first row (Time vs. Price2010) I want both the FF and RF subplots to use the same color (say blue) and share a single legend entry.
I tried adding legendgroup in go.Scatter, but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
'Price2010': [22000,26000,27000,35000],
'Price2011': [23000,27000,28000,36000],
'Price2012': [24000,28000,29000,37000],
'Price2013': [25000,29000,30000,38000],
'Price2014': [26000,30000,31000,39000],
'Price2015': [27000,31000,32000,40000],
'Price2016': [28000,32000,33000,41000]
}
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
'Price2010': [19000,20000,21000,22000],
'Price2011': [20000,21000,22000,23000],
'Price2012': [21000,22000,23000,24000],
'Price2013': [22000,23000,24000,25000],
'Price2014': [23000,24000,25000,26000],
'Price2015': [24000,25000,26000,27000],
'Price2016': [25000,26000,27000,28000]
}
rf = DataFrame(CarB)
Type = {
'FF' : ff,
'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is all I have been able to come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params, showlegend=legend, marker=dict(
            color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
    fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
    legend = False
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col
