How to plot points colored by a categorical variable in Plotly - Python

I am using Plotly for visualization. I want to make a plot and color the points based on a categorical variable.
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.Predicted, y=df.Predicted,colors='Category',mode='markers',
))
fig.add_trace(go.Scatter(x=df.Predicted, y=df.real , colors='Category'
))
fig.show()
where Category is a column in my dataframe. How can I make this kind of graph?

Your question implies a data frame structure, which I have simulated.
It is simpler to use the higher-level Plotly Express API than graph objects.
I have used two calls, px.scatter() and px.line(), to generate the traces defined in your question. I have also renamed the traces from the second call so the legend is clear, and drawn them as lines.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame(
{
"Predicted": np.sort(np.random.uniform(3, 15, 100)),
"real": np.sort(np.random.uniform(3, 15, 100)),
"Category": np.random.choice(list("ABCD"), 100),
}
)
px.scatter(df, x="Predicted", y="Predicted", color="Category").add_traces(
px.line(df, x="Predicted", y="real", color="Category")
.for_each_trace(
lambda t: t.update(name="real " + t.name)
) # make it clear in legend this is second set of traces
.data
)

Related

How do I add a secondary legend that explains what symbols mean with plotly express?

I have a plot which uses US states to map symbols. I currently assign symbols using the "state" column in my dataframe so that I can select particular states of interest by clicking or double clicking on the Plotly Express legend. This part is working fine. However, the symbol mapping I'm using also communicates information about territory, e.g. triangle-down means poor coverage in that state and many states will share this symbol. I would like to add another legend that shows what each shape means. How can I do this in Plotly Express? Alternatively, is there a way to display symbols in a footnote? I could also give the symbol definitions there.
The goal is to display that circle=Medium coverage, triangle-down=poor coverage, etc. in addition to the individual state legend I already have. If the legend is clickable such that I can select entire groups based on the symbol shape that would be the best possible outcome.
Thank you for any tips!
I tried using html and footnotes to display the symbols but it did not work.
As noted in the comments, this can be achieved with additional traces on separate axes.
I have simulated some data that matches what the image and comments imply.
From the scatter figure, extract how symbols and colors have been assigned to states.
Build another scatter that is effectively a legend.
import pandas as pd
import numpy as np
import plotly.express as px
df_s = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States"
)[1].iloc[:, 0:2]
df_s.columns = ["name", "state"]
# generate a dataframe that matches structure in image and question
df = pd.DataFrame(
{"activity_month": pd.date_range("1-jan-2020", "today", freq="W")}
).assign(
value=lambda d: np.random.uniform(0, 1, len(d)),
state=lambda d: np.random.choice(df_s["state"], len(d)),
)
# straight forward scatter
fig = px.scatter(df, x="activity_month", y="value", symbol="state", color="state")
# extract out how symbols and colors have been assigned to states
df_symbol = pd.DataFrame(
[
{"symbol": t.marker.symbol, "state": t.name, "color": t.marker.color}
for t in fig.data
]
).assign(y=lambda d: d.index//20, x=lambda d: d.index%20)
# build a figure that is effectively the legend
fig_legend = px.scatter(
df_symbol,
x="x",
y="y",
symbol="symbol",
color="state",
text="state",
color_discrete_sequence=df_symbol["color"]
).update_traces(textposition="middle right", showlegend=False, xaxis="x2", yaxis="y2")
# insert legend into scatter and format axes
fig.add_traces(fig_legend.data).update_layout(
yaxis_domain=[.15, 1],
yaxis2={"domain": [0, .15], "matches": None, "visible": False},
xaxis2={"visible":False},
xaxis={"position":0, "anchor":"free"},
showlegend=False
)

How to format plotly legend when using marker color?

I want to follow up on this post: Plotly: How to colorcode plotly graph objects bar chart using Python?.
When using plotly express, and specifying 'color', the legend is correctly produced as seen in the post by vestland.
This is my plotly express code:
data = {'x_data': np.random.random_sample((5,)),
'y_data': ['A', 'B', 'C', 'D', 'E'],
'c_data': np.random.randint(1, 100, size=5)
}
df = pd.DataFrame(data=data)
fig = px.bar(df,
x='x_data',
y='y_data',
orientation='h',
color='c_data',
color_continuous_scale='YlOrRd'
)
fig.show()
But when using go.Bar, the legend is incorrectly displayed as illustrated here:
This is my code using graph objects:
bar_trace = go.Bar(name='bar_trace',
x=df['x_data'],
y=df['y_data'],
marker={'color': df['c_data'], 'colorscale': 'YlOrRd'},
orientation='h'
)
layout = go.Layout(showlegend=True)
fig = go.FigureWidget(data=[bar_trace], layout=layout)
fig.show()
I'm learning how to use FigureWidget and it seems it can't use Plotly Express, so I have to learn how to use graph objects to plot. How do I link the legend to the data so that it works like the Plotly Express example in vestland's post?
This really comes down to understanding what a high-level API (Plotly Express) does. When you specify color in px and the column is categorical, it creates one trace per category value. Hence the two ways of creating a figure below are mostly equivalent. The legend shows an item for each trace, not for each color.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
df = pd.DataFrame({"x":np.linspace(0,10,10), "y":np.linspace(5,15,10), "color":np.random.choice(list("ABCD"),10)})
px.bar(df, x="x", y="y", color="color", orientation="h").show()
fig = go.Figure()
# one trace per category value, mirroring what px.bar() does with color=
for name, group in df.groupby("color"):
    fig.add_trace(go.Bar(x=group["x"], y=group["y"], name=name, orientation="h"))
fig
Supplementary, based on comments:
You do not have to use graph objects to use FigureWidget(). As the second figure demonstrates, you can create the figure with Plotly Express and then build a FigureWidget() from it.
For continuous data the normal pattern is a single trace and a colorbar (also demonstrated in the second figure). However, if you want a discrete legend, create a trace per value in c_data and pick the colors with sample_colorscale() from https://plotly.com/python-api-reference/generated/plotly.colors.html
import plotly.express as px
import plotly.colors
import plotly.graph_objects as go
import numpy as np
import pandas as pd
# simulate data frame...
df = pd.DataFrame(
{
"x_data": np.linspace(0, 10, 10),
"y_data": np.linspace(5, 15, 10),
"c_data": np.random.randint(0, 4, 10),
}
)
# build a trace per value in c_data using graph objects, so the legend gets one entry per value
bar_traces = [
    go.Bar(
        name=str(c),  # name each trace after its c_data value so legend entries differ
        x=d["x_data"],
        y=d["y_data"],
        marker={
            "color": plotly.colors.sample_colorscale(
                "YlOrRd",
                d["c_data"] / df["c_data"].max(),
            )
        },
        orientation="h",
    )
    for c, d in df.groupby("c_data")
]
layout = go.Layout(showlegend=True)
fig = go.FigureWidget(data=bar_traces, layout=layout)
fig.show()
fig = px.bar(
df,
x="x_data",
y="y_data",
color="c_data",
orientation="h",
color_continuous_scale="YlOrRd",
)
fig = go.FigureWidget(data=fig.data, layout=fig.layout)
fig.show()

Plotly Set Trace Position in a Figure

I'm a newbie in Plotly and I was wondering if there is a way to specify where a new trace needs to be centered within the Figure object.
Just to be more clear, this is an example:
import plotly.express as px
import plotly.graph_objects as go
df = pd.DataFrame(something)
fig = go.Figure()
for i in [40,45,50]:
    fig.add_shape(
        go.layout.Shape(
            type='line',
            xref='x',
            yref='y',
            x0=line_data[i]["min"],
            y0=i,
            x1=line_data[i]["max"],
            y1=i,
        ),
    )
fig.add_trace(
go.Scatter(
x=df.ColA.values,
y=df.ColB.values,
mode='markers',
)
)
This is the result
My goal is to build a histogram of the points on each horizontal line.
I don't know if there is a better and faster way, but my idea was to add more traces, each one with a histogram, and then center those traces on each line. Is there a way to do it? Maybe some position parameter for a trace, like (xcenter=7.5, ycenter=50)?
My ideal result should be:
What you describe is a histogram / frequency count of multiple observed items.
I have mapped these onto the y-axis using base.
import numpy as np
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({40:np.random.normal(5,2, 200).astype(int),50:np.random.normal(6,2, 200).astype(int),60:np.random.normal(6.5,2, 200).astype(int)})
# change to frequency of observed values: one column per original column, indexed by observed value
# (apply + value_counts keeps the column names, also on pandas 2.x where value_counts() names its result "count")
df2 = df.apply(pd.Series.value_counts)
# plot bar of frequency, setting base based on observation
fig = go.Figure([go.Bar(x=df2.index, y=df2[c]/len(df2), base=c, name=str(c)) for c in df2.columns])
fig.update_layout(barmode="overlay")

Plotly Express: Plotting individual columns of a dataframe as multiple plots (scrollable) using plotly express

I posed a question at Plotly: How to add a horizontal scrollbar to a plotly express figure? asking how to add a horizontal scrollbar to a plotly express figure for purposes of visualizing a long multivariate time series. A solution for a simple example consisting of three series having 100K points each was given as follows:
import plotly.express as px
import numpy as np
import pandas as pd
np.random.seed(123)
e = np.random.randn(100000,3)
df=pd.DataFrame(e, columns=['a','b','c'])
df['x'] = df.index
df_melt = pd.melt(df, id_vars="x", value_vars=df.columns[:-1])
fig=px.line(df_melt, x="x", y="value",color="variable")
# Add range slider
fig.update_layout(xaxis=dict(rangeslider=dict(visible=True),
type="linear"))
fig.show()
This code is nice, but I'd like to have the plots not superimposed on a single set of axes--instead one above the other as would be done with subplot. For example, signal 'a' would appear above signal 'b', which would appear above signal 'c'.
Because my actual time series have at least 50 channels, a vertical scrollbar will likely be necessary.
As far as I know, this may be possible in Dash, but Plotly itself has no vertical scrollbar. The question you quoted also suggests a range slider as a substitute for scrolling. The range slider is part of the graph, though, so unless the slider is made independent it will scroll out of view together with the plot, which is not ideal. I think the solution at the moment is to stack the channels on separate y-axes that share one x-axis and add a range slider.
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(123)
e = np.random.randn(100000,3)
df=pd.DataFrame(e, columns=['a','b','c'])
df['x'] = df.index
df_melt = pd.melt(df, id_vars="x", value_vars=df.columns[:-1])
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_melt.query('variable == "a"')['x'],
y=df_melt.query('variable == "a"')['value'], yaxis='y'))
fig.add_trace(go.Scatter(x=df_melt.query('variable == "b"')['x'],
y=df_melt.query('variable == "b"')['value'], yaxis='y2'))
fig.add_trace(go.Scatter(x=df_melt.query('variable == "c"')['x'],
y=df_melt.query('variable == "c"')['value'], yaxis='y3'))
# Add range slider
fig.update_layout(
xaxis=dict(
rangeslider=dict(visible=True),
type="linear"),
yaxis=dict(
anchor='x',
domain=[0, 0.33],
linecolor='blue',
type='linear',
zeroline=False
),
yaxis2=dict(
anchor='x',
domain=[0.33, 0.66],
linecolor='red',
type='linear',
zeroline=False
),
yaxis3=dict(
anchor='x',
domain=[0.66, 1.0],
linecolor='green',
type='linear',
zeroline=False
),
)
fig.show()

Plotly subplot represent same y-axis name with same color and single legend

I am trying to create a plot for two categories in a subplot. The 1st column represents category FF and the 2nd column represents category RF.
The x-axis is always time and the y-axis is each of the remaining columns. In other words, it is a plot of one column vs. the rest.
The 1st and 2nd categories always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but the problem is that Plotly treats each column name as distinct and therefore draws the lines in different colors for y-axes with the same name. As a consequence, a separate legend entry is also created for each.
For example, in the first row (Time vs Price2010) I want both subplots, FF and RF, to be drawn in the same color (say blue) with a single entry in the legend.
I tried adding legendgroup in go.Scatter but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
'Price2010': [22000,26000,27000,35000],
'Price2011': [23000,27000,28000,36000],
'Price2012': [24000,28000,29000,37000],
'Price2013': [25000,29000,30000,38000],
'Price2014': [26000,30000,31000,39000],
'Price2015': [27000,31000,32000,40000],
'Price2016': [28000,32000,33000,41000]
}
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
'Price2010': [19000,20000,21000,22000],
'Price2011': [20000,21000,22000,23000],
'Price2012': [21000,22000,23000,24000],
'Price2013': [22000,23000,24000,25000],
'Price2014': [23000,24000,25000,26000],
'Price2015': [24000,25000,26000,27000],
'Price2016': [25000,26000,27000,28000]
}
rf = DataFrame(CarB)
Type = {
'FF' : ff,
'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is all I have been able to come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        # same color per column name in both categories
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params, showlegend=legend, marker=dict(
            color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
    fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
    # show legend entries only for the first category
    legend = False
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col
