plotly - multiple traces using a shared slider variable - python

As the title hints, I'm struggling to create a plotly chart that has multiple lines that are functions of the same slider variable.
I hacked something together using bits and pieces from the documentation: https://pastebin.com/eBixANqA. This works for one line.
Now I want to add more lines to the same chart, but this is where I'm struggling. https://pastebin.com/qZCMGeAa.
I'm getting a PlotlyListEntryError: Invalid entry found in 'data' at index, '0'
Path To Error: ['data'][0]
Can someone please help?

It looks like you were using https://plot.ly/python/sliders/ as a reference. Unfortunately I don't have time to test with your code, but this should be easily adaptable. Create each trace you want to plot in the same way that you have been:
trace1 = [dict(
    type='scatter',
    visible = False,
    name = "trace title",
    mode = 'markers+lines',
    x = x[0:step],
    y = y[0:step]) for step in range(len(x))]
Note that in my example the data comes from pre-defined lists, whereas you are using a function. That is probably the only change you'll really need to make, besides your own step size etc.
If you create a second trace in the same way, for example
trace2 = [dict(
    type='scatter',
    visible = False,
    name = "trace title",
    mode = 'markers+lines',
    x = x2[0:step],
    y = y2[0:step]) for step in range(len(x2))]
Then you can put all your data together with the following
all_traces = trace1 + trace2
Then you can just go ahead and plot it, provided you have your layout set up correctly (it should remain unchanged from your single trace example):
fig = py.graph_objs.Figure(data=all_traces, layout=layout)
py.offline.iplot(fig)
Your slider should control both traces provided you were following https://plot.ly/python/sliders/ to get the slider working. You can combine multiple data dictionaries this way in order to have multiple plots controlled by the same slider.
I do note that if your lists of data dictionaries are of different lengths, this gets topsy-turvy.
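In the linked sliders example, each slider step switches on a single entry of the data list. With two concatenated lists, each step presumably has to switch on one entry from each list. Below is a minimal sketch of how such steps might be built, assuming both lists have the same length. The sample data and the make_traces helper are hypothetical and only there for illustration. Everything is written as plain dicts, which plotly accepts for traces and layout.

```python
# Hypothetical sample data; replace with your own lists or function output.
x = list(range(5))
y = [v ** 2 for v in x]
x2 = list(range(5))
y2 = [2 * v for v in x2]


def make_traces(xs, ys, name):
    """Return one hidden trace per slider position (first `step` points)."""
    return [dict(type='scatter', visible=False, name=name,
                 mode='markers+lines', x=xs[0:step], y=ys[0:step])
            for step in range(len(xs))]


trace1 = make_traces(x, y, "first line")
trace2 = make_traces(x2, y2, "second line")
all_traces = trace1 + trace2

n = len(trace1)  # assumption: len(trace1) == len(trace2)

steps = []
for i in range(n):
    visible = [False] * len(all_traces)
    visible[i] = True        # i-th version of the first line
    visible[n + i] = True    # i-th version of the second line
    steps.append(dict(method='restyle', args=['visible', visible]))

# Show one version of each line before the slider is first moved.
all_traces[0]['visible'] = True
all_traces[n]['visible'] = True

layout = dict(sliders=[dict(active=0, steps=steps)])
# The figure would then be built as in the answer:
# fig = py.graph_objs.Figure(data=all_traces, layout=layout)
```

If the lists differ in length, the offset n no longer pairs up matching versions, so the shorter list would need padding or its own index mapping.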

Related

Plotly express animations: Plotting multiple changing traces and background image for each frame

I am currently trying to create a debugging tool for a simulation. For this I am working with the animations of plotly express. What I want to achieve:
An animation with a different background image with different traces (also differing in number) for each frame. Any help would be very much appreciated! (Also if someone knows how to do this using any other library it'd be very much appreciated!)
Thanks!
Adding the background images works perfectly well like this:
fig = px.imshow(img,origin="upper", animation_frame=0, binary_string=True, labels=dict(animation_frame="slice"))
So I only need to figure out how to display the traces.
I tried the following first:
fig = px.imshow(img,origin="upper", animation_frame=0, binary_string=True, labels=dict(animation_frame="slice")) # img consists of |time_step| images (np.array)
## now I'm trying to add the traces
for i in range(len(img)):
    time_step = df.iloc[i]
    for j in range(len(time_step)):
        current = time_step.iloc[j]
        x, y = current["x"], current["y"]
        if (not current["valid"]):
            fig.add_trace(go.Scatter(x=x, y=y, name="{}".format(current["unique_id"]), text="{}".format(current["costs"]), hoverinfo='text+name', line=dict(color="black", dash="dot"), line_shape="spline"))
        else:
            fig.add_traces(go.Scatter(x=x_list, y=y_list, line=dict(color="black"), line_shape="spline"))
fig.show()
However, I realized that fig.add_trace always adds the trace to all frames of the animation. (When only displaying the first time step it worked though :) )
However, I'd like to add the traces for one time_step only and then the ones for the next time_step.
So I started looking into this approach:
# saving in a list instead of printing immediately
if (not current["feasible"]):
    info.append([x_list, y_list, "{}".format(current["unique_id"]),
                 "{}".format(current["costs"]), dict(color=color, dash="dot")])
else:
    info.append([x_list, y_list, "{}".format(current["unique_id"]),
                 "{}".format(current["costs"]),dict(color=color)])
and added the information to the single frames:
frames.append(
    go.Frame(
        data=[go.Scatter(
                  x=information[0],
                  y=information[1],
                  name=information[2],
                  text=information[3],
                  hoverinfo='text+name',
                  line=information[4],
                  line_shape="spline"
              )
              for information in info],
    )
)
figa = go.Figure(data=fig.data, frames=frames, layout=fig.layout)
figa.show()
But it only displays the first background image and once I start the animation it disappears and only shows the first saved Scatter for each time_step.
I'm a little lost: since I don't have a set number of traces, I can't "hardcode" the Scatters.

Memory efficient way of plotting several line segments in Plotly?

I have a collection of timestamps that correspond roughly to the time these modules called "metal cells" are being utilized. At present, I am grabbing these by pairs and plotting them by creating a small line segment using go.Scatter and adding it to a list which later functions as the data argument for a go.Figure object.
for mcell in metal_cells:
    mdf = cell_dep_grouped.get_group(mcell).reset_index(drop=True)
    for i in range(0, mdf.shape[0], 2):
        start_row = mdf.iloc[i]
        finish_row = mdf.iloc[i+1]
        buffer_dat = go.Scatter(x = [start_row.Time, finish_row.Time],
                                y = [mcell, mcell],
                                line = {'color': chem_palette[start_row.Recipe]},
                                legendgroup = start_row.Recipe,
                                name = start_row.Recipe,
                                showlegend = not plotted_chem[start_row.Recipe])
        plot_data.append(buffer_dat)
The output looks like the picture I have attached, and I would like it to stay as close to that as possible, because it clearly highlights the time the module was being utilized (the scatter segments) and the downtime (the empty space between segments). The main issue is that I need this plot to be interactive. The main interactive feature is that, upon performing a selection, some calculations are made and some of the traces change. But because the plot consists of 80 or so go.Scatter objects, it is quite slow. Is there a more efficient way of plotting segments like these that makes the plot lighter and faster?
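One commonly used way to cut the number of traces is to merge all segments that share a recipe into a single trace and put a None after each segment, since Plotly line traces leave a gap at missing values by default. The following is only a sketch of that idea under stated assumptions: the segments records and the merge_segments helper are hypothetical stand-ins for the paired DataFrame rows above, and it has not been tested against the real data.

```python
from collections import defaultdict

# Hypothetical (cell, start, finish, recipe) records standing in for the
# paired start/finish rows of the grouped DataFrame.
segments = [
    ("cell_1", 0, 3, "recipe_a"),
    ("cell_1", 5, 9, "recipe_b"),
    ("cell_2", 1, 4, "recipe_a"),
]


def merge_segments(segments):
    """Group segments by recipe into one x/y list pair per recipe.

    A None entry after each segment marks a gap, so consecutive
    segments in the same trace are not joined by a line.
    """
    merged = defaultdict(lambda: {"x": [], "y": []})
    for cell, start, finish, recipe in segments:
        merged[recipe]["x"] += [start, finish, None]
        merged[recipe]["y"] += [cell, cell, None]
    return dict(merged)


merged = merge_segments(segments)
# Each recipe could then become a single trace, for example:
# go.Scatter(x=merged[r]["x"], y=merged[r]["y"], mode="lines", name=r)
```

This gives one trace per recipe instead of one per segment, which also makes the legendgroup and showlegend bookkeeping unnecessary.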

Understanding the interaction between mark_line point overlay and legend

I have found some unintuitive behavior in the interaction between the point property of mark_line and the appearance of the color legend for Altair/Vega-Lite. I ran into this when attempting to create a line with very large and mostly-transparent points in order to increase the area that would trigger the line's tooltip, but was unable to preserve a visible type=gradient legend.
The following code is an MRE for this problem, showing 6 cases: the use of [False, True, and a custom OverlayMarkDef] for the point property and the use of plain and customized color encoding.
import pandas as pd
import altair as alt
# create data
df = pd.DataFrame()
df['x_data'] = [0, 1, 2] * 3
df['y2'] = [0] * 3 + [1] * 3 + [2] * 3
# initialize
base = alt.Chart(df)
markdef = alt.OverlayMarkDef(size=1000, opacity=.001)
color_encode = alt.Color(shorthand='y2', legend=alt.Legend(title='custom legend', type='gradient'))
marks = [False, True, markdef]
encodes = ['y2', color_encode]
plots = []
for i, m in enumerate(marks):
    for j, c in enumerate(encodes):
        plot = base.mark_line(point=m).\
            encode(x='x_data', y='y2', color=c, tooltip=['x_data','y2']).\
            properties(title=', '.join([['False', 'True', 'markdef'][i], ['plain encoding', 'custom encoding'][j]]))
        plots.append(plot)
combined = alt.vconcat(
    alt.hconcat(*plots[:2]).resolve_scale(color='independent'),
    alt.hconcat(*plots[2:4]).resolve_scale(color='independent'),
    alt.hconcat(*plots[4:]).resolve_scale(color='independent')
).resolve_scale(color='independent')
The resulting plot (the interactive tooltips work as expected):
The color data is the same for each of these plots, and yet the color legend is all over the place. In my real case, the gradient is preferred (the data is quantitative and continuous).
With no point on the mark_line, the legend is correct.
Adding point=True converts the legend to a symbol type - I'm not sure why this is the case since the default legend type is gradient for quantitative data (as seen in the first row) and this is the same data - but can be forced back to gradient by the custom encoding.
Attempting to make a custom point via OverlayMarkDef however renders the forced gradient colorbar invisible - matching the opacity of the OverlayMarkDef. But it is not simply a matter of the legend always inheriting the properties of the point, because the symbol legend does not attempt to reflect the opacity.
I would like to have the normal gradient colorbar available for the custom OverlayMarkDef, but I would also love to build up some intuition for what is going on here.
The transparency issue with the bottom right plot has been fixed since Altair 4.2.0, so now every case that includes a point on the line changes the legend to 'Ordinal' instead of 'Quantitative'.
I believe the reason the legend is converted to a symbol instead of a gradient is that you are adding filled points and the fill channel is not set to a quantitative field, so it defaults to either ordinal or nominal with a sort:
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
)
plot + plot.mark_circle(opacity=1)
mark_point gives a gradient legend since it has no fill, and if we set the fill for mark_circle explicitly we also get a gradient legend (one for fill and one for color):
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
    fill='y2'
)
plot + plot.mark_circle(opacity=1)
I agree with you that this is a bit unexpected and it would be more convenient if the encoding type of point=True was set to the same as that used for the lines. You might suggest this as an enhancement in VegaLite together with reporting the apparent bug that you can't override the legend type via type='gradient'.

Setting group order on pySankey sankey chart

I'm trying to use a sankey chart to show some user segmentation change using PySankey but the class order is the opposite to what I want. Is there a way for me to specify the order in which each class is posted?
Here is the code I'm using (a dummy version):
test_df = pd.DataFrame({
    'curr_seg':np.repeat(['A','B','C','D'],4),
    'new_seg':['A','B','C','D']*4,
    'num_users':np.random.randint(low=10, high=20, size=16)
})
sankey(
    left=test_df["curr_seg"], right=test_df["new_seg"],
    leftWeight= test_df["num_users"], rightWeight=test_df["num_users"],
    aspect=20, fontsize=20
)
Which produces this chart:
I want to have the A class first and the D class last on both the left and right axes. Does anybody know how I can set this up? Thank you very much.
There is a bug in the first line of the check_data_matches_labels function. You need to change it to the following:
if len(labels) > 0:
Then you can use leftLabels and rightLabels to control order.

Bokeh is behaving in mysterious way

import numpy as np
from bokeh.plotting import *
from bokeh.models import ColumnDataSource
# prepare data
N = 300
x = np.linspace(0,4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)
output_notebook()
#create a column data source for the plots to share
source = ColumnDataSource(data = dict(x = x, y0 = y0, y1 = y1))
Tools = "pan, wheel_zoom, box_zoom, reset, save, box_select, lasso_select"
# create a new plot and add a renderer
left = figure(tools = Tools, plot_width = 350, plot_height = 350, title = 'sinx')
left.circle(x, y0,source = source )
# create another plot and add a renderer
right = figure(tools = Tools, plot_width = 350, plot_height = 350 , title = 'cosx')
right.circle(x, y1, source = source)
# put the subplot in gridplot and show the plot
p = gridplot([[left, right]])
show(p)
Something is wrong with the sin graph, and I don't know why Bokeh is behaving like this. But if I put the y names in double or single quotation marks, things work fine:
left.circle(x, 'y0',source = source )
right.circle(x, 'y1', source = source)
# put the subplot in gridplot and show the plot
p = gridplot([[left, right]])
show(p)
Things I tried to resolve the problem
1) Restarted my notebook. (Easiest way to solve a problem)
2) Generated the output into a new window.
3) Generated the plots separately instead of in a grid plot.
Please help me find out the reason behind this.
Am I doing something wrong?
Is it a bug?
If you want to configure multiple glyphs to share data from a single ColumnDataSource, then you always need to configure the glyph properties with the names of the columns, and not with the actual data literals, as you have done. In other words:
left.circle('x', 'y0',source = source )
right.circle('x', 'y1', source = source)
Note that I have quoted 'x' as well. This is the correct way to do things when sharing a source. When you pass a literal value (i.e., a real list or array), glyph functions like .circle automatically synthesize a column for you as a convenience. But they use defined names based on the property, so if you share a source between two renderers, then the second call to .circle will overwrite the 'y' column that the first call to .circle made. Which is exactly what you are seeing.
As you can imagine, this behavior is confusing. Accordingly, there is an open GitHub issue to specifically and completely disallow passing in data literals whenever the source argument is provided explicitly. I can guarantee this will happen in the near future, so if you are sharing a source, you should always and only pass in column names (i.e. strings).
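To make the overwrite easier to picture, here is a toy model of the behavior described above. It is not Bokeh code and does not reflect Bokeh's actual internals: ToySource and toy_circle are hypothetical names, and the only assumption carried over is that a literal passed for a property gets stored in the shared source under a column named after that property.

```python
class ToySource:
    """Toy stand-in for a shared data source (not the Bokeh class)."""

    def __init__(self, data):
        self.data = dict(data)


def toy_circle(source, x, y):
    """Mimic a glyph call on a shared source.

    Strings are treated as names of existing columns. Literal sequences
    are written into the source under the property name ('x' or 'y').
    Returns the column name each property ends up pointing at.
    """
    spec = {}
    for prop, value in (("x", x), ("y", y)):
        if isinstance(value, str):
            spec[prop] = value               # column name: nothing is written
        else:
            source.data[prop] = list(value)  # literal: synthesized column
            spec[prop] = prop
    return spec


xs = [0, 1, 2]
sin_like = [0, 1, 0]
cos_like = [1, 0, -1]

shared = ToySource({"x": xs, "y0": sin_like, "y1": cos_like})

# Passing literals: both glyphs end up pointing at the same 'y' column,
# and the second call replaces what the first call stored there.
left = toy_circle(shared, xs, sin_like)
right = toy_circle(shared, xs, cos_like)

# Passing column names: each glyph points at its own column.
left_by_name = toy_circle(shared, "x", "y0")
right_by_name = toy_circle(shared, "x", "y1")
```

In the toy model, the left glyph reads cos_like after the second literal call, which mirrors the wrong sin plot in the question. With column names, the two glyphs stay independent.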
