how to make countplot in plotly [closed] - python

I am new to plotly. I am trying to create a countplot in plotly. I am reading a dataframe and here are my columns and values in the dataframe.
Name   Defect severity
User1  Medium
User1  Medium
User1  High
User2  High
Here's how I would like the final graph to be shown
Can anyone suggest how to code this in Plotly?

You can do it in two lines of code with the help of pandas groupby and Plotly's barmode attribute.
Plotly's bar chart has an attribute that controls how the bars are shown. It is called barmode. Quoting the API documentation:
barmode: str (default 'relative')
One of 'group', 'overlay' or 'relative' In 'relative' mode,
bars are stacked above zero for positive values and below zero for
negative values. In 'overlay' mode, bars are drawn on top of one
another. In 'group' mode, bars are placed beside each other.
See the bar chart documentation for examples.
Now, for your example:
# import needed libraries
import pandas as pd
import plotly.express as px
# some dummy dataset
df = pd.DataFrame(
{
"Name": ["User1", "User1", "User1", "User2"],
"Defect severity": ["Medium", "Medium", "High", "High"],
}
)
You need to group by both the Name and Defect severity columns, and then count the rows in each group with size() (I recommend you take a look at this question)
df = df.groupby(by=["Name", "Defect severity"]).size().reset_index(name="counts")
The data now will look like the following:
   Name   Defect severity  counts
0  User1  High             1
1  User1  Medium           2
2  User2  High             1
Finally, you can use plotly bar chart:
px.bar(data_frame=df, x="Name", y="counts", color="Defect severity", barmode="group")
The chart would be:
There you go! With only two lines of code, you get a nice grouped bar chart.

I created almost everything you want. Unfortunately, I did not find a way to set the legend title correctly (annotations are not a good way to set a legend title). To display the numbers (1.0, 2.0), it is necessary to create an additional column with the values (the df["Severity numbers"] column).
Code:
# Import the necessary libraries
import pandas as pd
import plotly
import plotly.graph_objs as go
# Create DataFrame
df = pd.DataFrame({"Name":["User1","User1", "User1","User2"],
"Defect severity":["Medium","Medium","High","High"],
"Severity numbers":[1,1,2,2]})
# Create two additional DataFrames to traces
df1 = df[df["Defect severity"] == "Medium"]
df2 = df[df["Defect severity"] == "High"]
# Create two traces, first "Medium" and second "High"
trace1 = go.Bar(x=df1["Name"], y=df1["Severity numbers"], name="Medium")
trace2 = go.Bar(x=df2["Name"], y=df2["Severity numbers"], name="High")
# Fill out data with our traces
data = [trace1, trace2]
# Create layout and specify title, legend and so on
layout = go.Layout(title="Severity",
xaxis=dict(title="Name"),
yaxis=dict(title="Count of defect severity"),
legend=dict(x=1.0, y=0.5),
# Here annotations need to create legend title
annotations=[
dict(
x=1.05,
y=0.55,
xref="paper",
yref="paper",
text=" Defect severity",
showarrow=False
)],
barmode="group")
# Create figure with all prepared data for plot
fig = go.Figure(data=data, layout=layout)
# Create a plot in your Python script directory with name "bar-chart.html"
plotly.offline.plot(fig, filename="bar-chart.html")
Output:

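# to_dense() is no longer available in current pandas; the index of
# value_counts() already holds the category labels
data = [
go.Bar(
y=coach_sectors['Sectors'].value_counts().index,
x=coach_sectors['Sectors'].value_counts(),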
orientation='h',
text="d",
)]
layout = go.Layout(
height=500,
title='Sector/ Area of Coaches - Combined',
hovermode='closest',
xaxis=dict(title='Votes', ticklen=5, zeroline=False, gridwidth=2, domain=[0.1, 1]),
yaxis=dict(title='', ticklen=5, gridwidth=2),
showlegend=False
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Sector/ Area of Coaches - Combined')

Related

Plotly two mapbox figures in a single map with different color

I want to plot two mapbox figures in a single map. This is what I have right now:
fig = px.choropleth_mapbox(geo_df,
geojson=geo_df.geometry,
locations=geo_df.index,
color="TOTAL_POPULATION", color_continuous_scale=px.colors.sequential.Greens,
center={"lat": 40.7, "lon": -73.95},
mapbox_style="open-street-map",
zoom=10)
fig2 = px.scatter_mapbox(geo_df, lat="INTPTLAT", lon="INTPTLON",
size="MEDIAN_VALUE", color="MEDIAN_VALUE",
color_continuous_scale=px.colors.sequential.Blues,
mapbox_style="open-street-map")
fig.add_trace(fig2.data[0])
fig.update_layout(
autosize=False,
width=1400,
height=1000,
)
Here, I have specified different colors for the two mapbox figures, but it's only picking the first one and applying it to both. How can I plot them with different colors to improve visibility?
Since your question does not include any data, I combined the reference example with another example to reproduce the behavior.
I searched the Plotly community for a solution and found examples that address the issue.
The approach is to create a graph_objects figure, add a graph_objects choropleth map to it, and then add the traces from an express figure.
One remaining issue is that the color scale specified for the scatter trace is not applied. I am still investigating and may not reach a solution, but I believe the rest of the approach answers your question.
import plotly.express as px
import plotly.graph_objects as go
px.set_mapbox_access_token(open("mapbox_api_key.txt").read())
# fig for data
df_election = px.data.election()
geojson = px.data.election_geojson()
# fig2 for data
df_car = px.data.carshare()
df_car['peak_hour2'] = df_car['peak_hour']*20
fig = go.Figure()
fig.add_trace(go.Choroplethmapbox(geojson=geojson,
z=df_election["Bergeron"],
colorscale='greens',
locations=df_election["district"],
featureidkey="properties.district",
colorbar_x=1.12,
colorbar_title='election'
))
fig.update_layout(mapbox_style="open-street-map",
mapbox_center={"lat": 45.5517, "lon": -73.7073},
mapbox_zoom=10)
map_scatter = px.scatter_mapbox(df_car,
lat="centroid_lat",
lon="centroid_lon",
color="peak_hour",
size="car_hours",
color_continuous_scale=px.colors.sequential.Blues,
size_max=15,
zoom=9)
fig.add_traces(list(map_scatter.select_traces()))
fig.update_layout(coloraxis={'colorbar': {'title': {'text': 'peak_hour'}}})
fig.update_layout(autosize=True, height=600, margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

How do I add a secondary legend that explains what symbols mean with plotly express?

I have a plot which uses US states to map symbols. I currently assign symbols using the "state" column in my dataframe so that I can select particular states of interest by clicking or double clicking on the Plotly Express legend. This part is working fine. However, the symbol mapping I'm using also communicates information about territory, e.g. triangle-down means poor coverage in that state and many states will share this symbol. I would like to add another legend that shows what each shape means. How can I do this in Plotly Express? Alternatively, is there a way to display symbols in a footnote? I could also give the symbol definitions there.
The goal is to display that circle=Medium coverage, triangle-down=poor coverage, etc. in addition to the individual state legend I already have. If the legend is clickable such that I can select entire groups based on the symbol shape that would be the best possible outcome.
Thank you for any tips!
I tried using html and footnotes to display the symbols but it did not work.
As noted in the comments, this can be achieved with additional traces on different axes.
I have simulated some data that matches what is implied by the image and comments.
From the scatter figure, extract how symbols and colors have been assigned to states.
Then build another scatter that is effectively a legend.
import pandas as pd
import numpy as np
import plotly.express as px
df_s = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States"
)[1].iloc[:, 0:2]
df_s.columns = ["name", "state"]
# generate a dataframe that matches structure in image and question
df = pd.DataFrame(
{"activity_month": pd.date_range("1-jan-2020", "today", freq="W")}
).assign(
value=lambda d: np.random.uniform(0, 1, len(d)),
state=lambda d: np.random.choice(df_s["state"], len(d)),
)
# straight forward scatter
fig = px.scatter(df, x="activity_month", y="value", symbol="state", color="state")
# extract out how symbols and colors have been assigned to states
df_symbol = pd.DataFrame(
[
{"symbol": t.marker.symbol, "state": t.name, "color": t.marker.color}
for t in fig.data
]
).assign(y=lambda d: d.index//20, x=lambda d: d.index%20)
# build a figure that effectively the legend
fig_legend = px.scatter(
df_symbol,
x="x",
y="y",
symbol="symbol",
color="state",
text="state",
color_discrete_sequence=df_symbol["color"]
).update_traces(textposition="middle right", showlegend=False, xaxis="x2", yaxis="y2")
# insert legend into scatter and format axes
fig.add_traces(fig_legend.data).update_layout(
yaxis_domain=[.15, 1],
yaxis2={"domain": [0, .15], "matches": None, "visible": False},
xaxis2={"visible":False},
xaxis={"position":0, "anchor":"free"},
showlegend=False
)

Adding multiple vertical lines for each frame in animated plot using plotly

Summary: I am looking to add multiple vertical lines based on date values within each (animation) frame of an animated horizontal bar plot in Plotly. I want to compare the existing date and the new date for each task within each location (see the dataset below), and also point out the unique "new" date values in the plot. So I slide across each location frame and display grouped bar plots across tasks comparing dates (existing vs new). This is much easier to understand from the fully reproducible snippet I put together below, along with the charts.
Next:
I want to include a dotted line for each of the unique new dates at the end of the horizontal bars for each selected location (animated frame) in the slider (possibly using add_shape in a loop over frames?). The expected chart output is shown below (first one), with red dotted lines passing through the edges of each date bar of unique height. Any other thoughts on showing unique "new" dates are welcome!
Is there any way to dynamically control the data labels (dates) so that the dates are visible within the chart? Currently my chart output doesn't consistently show the labels fully within the chart area. For example, see the second chart below for the selected location slider "S1". The constraint is that I don't want to leave a lot of empty space by arbitrarily extending the x-axis range to a far future date.
Any help would be greatly appreciated. Thanks in advance!
import pandas as pd
from datetime import datetime
from plotly import graph_objects as go
#Read in sample data
inputDF = pd.read_csv('https://raw.githubusercontent.com/SauvikDe/Demo/main/Test_Dummy.csv')
#Format to datetime
inputDF['Date_Existing'] = pd.to_datetime(inputDF['Date_Existing'])
inputDF['Date_New'] = pd.to_datetime(inputDF['Date_New'])
#Sort required columns to plot better
inputDF.sort_values(by=['Location', 'Date_New', 'Date_Existing'], inplace=True)
The data looks like this (first few records):
Location,Task,TaskTimeMins,Date_Existing,Date_New
S1,E10,60,7/14/2022,7/14/2022
S1,E12,75,7/21/2022,7/14/2022
S1,SM0,40,7/21/2022,7/28/2022
S1,E16,60,7/28/2022,7/28/2022
S1,SM1,120,7/28/2022,7/28/2022
S1,E18,120,8/4/2022,7/28/2022
And, my working code to plot dynamic chart using plotly is as below:
# Create frames for each location
frames = []
for s in inputDF['Location'].unique():
    t = inputDF.loc[inputDF['Location'] == s]
    frames.append(
        go.Frame(
            name=str(s),
            data=[
                go.Bar(x=t['Date_Existing'],
                       y=t['Task']+":["+ t['TaskTimeMins'].astype(str)+" Minutes]",
                       xaxis="x",
                       name="Existing Date",
                       offsetgroup=1,
                       orientation='h',
                       text=t['Date_Existing'].dt.strftime("%d-%m-%y"),
                       textposition='outside',
                       marker_color='#f4af36'#'#f9cb9c'#'#9fc5e8'#'#fff2cc'
                       ),
                go.Bar(x=t['Date_New'],
                       y=t['Task']+":["+ t['TaskTimeMins'].astype(str)+" Minutes]",
                       xaxis="x2",
                       name='New Date',
                       offsetgroup=2,
                       orientation='h',
                       text=t['Date_New'].dt.strftime("%d-%m-%y"),
                       textposition='outside',
                       marker_color='#38761d'#'#6aa84f'#'#08ce00'
                       ),
            ],
            layout={"xaxis":{"range":[t['Date_Existing'].min() - pd.offsets.DateOffset(1),
                                      t['Date_Existing'].max() + pd.offsets.DateOffset(30)],
                             #"dtick":"M3",
                             #"tickformat":"%d-%b-%y",
                             #"ticklabelmode":"period",
                             },
                    "xaxis2":{"range":[t['Date_Existing'].min() - pd.offsets.DateOffset(1),
                                       t['Date_Existing'].max() + pd.offsets.DateOffset(30)],
                              #"dtick":"M3",
                              #"tickformat":"%d-%b-%y",
                              #"ticklabelmode":"period",
                              },
                    },
        )
    )
# Create the figure, also set axis layout
fig = go.Figure(
data=frames[0].data,
frames=frames,
layout={
"xaxis": {
"showticklabels":True,
},
"xaxis2": {
"showticklabels":True,
"overlaying": "x",
"side": "top",
},
},
)
# Update dynamic chart title for each frame
for k in range(len(fig.frames)):
    fig.frames[k]['layout'].update(title_text=
        f"<b>Location: {fig.frames[k].name}</b> >> Date comparison:Existing vs New",
    )
# Finally create the slider
fig.update_layout(
barmode="group",
updatemenus=[{"buttons": [{"args": [None, {"frame": {"duration": 2000, "redraw": True}}],
"label": "▶",
"method": "animate",},],
"type": "buttons",}],
sliders=[{"steps": [{"args": [[f.name],{"frame": {"duration": 0, "redraw": True},
"mode": "immediate",},],
"label": f.name,
"method": "animate",}
for f in frames ],
}],
legend=dict(orientation="h",
yanchor="bottom",
y=-0.5,
xanchor="right",
x=0.7,
),
font_size=12,
font_color="blue",
title_font_family="Calibri",
title_font_color="purple",
title_x=0.5,
width=950,
height=600,
)
# Position play button
fig['layout']['updatemenus'][0]['pad']=dict(r=10, t=395)
# Position slider
fig['layout']['sliders'][0]['pad']=dict(r=10, t=50,)
# Print chart
fig.show()

Plotly pie graph not showing all data

I have noticed that my go.Pie graph only shows 2 of the 3 values held in the dataframe column. I noticed this when creating a px.treemap referencing the exact same column in the dataframe and it shows all 3 values.
Below is my code for the pie chart and then the treemap
#docCategory count pie graph
valuesDocCat = df['docCategory'].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
figDocCat.update_traces(textposition = 'inside')
figDocCat.update_layout(uniformtext_minsize=14, uniformtext_mode='hide', title='Document Category breakdown')
#treeMap test graph
valuesTreemap = df['Kind'].value_counts()
figTreemap = px.treemap(df, path = ['docCategory', 'Kind'], color='docCategory')
figTreemap.update_traces(root_color='lightgrey')
figTreemap.update_layout(margin = dict(t=50, l=25, r=25, b=25))
You can see that my code above references df['docCategory'] in both instances, but as the images below show, the pie chart doesn't have the 'Unknown' field whereas the treemap does.
Any ideas why? I have other pie charts that reference more than 2 fields with no issues; it is only happening on this one.
Regarding your question title, "Plotly pie graph not showing all data": it is showing everything.
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
You are passing arrays of different lengths for labels and values. Plotly is taking the first 3 items from labels, some of which are the same.
To be consistent, this line would be figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)]), i.e. both labels and values come from the same pandas Series.
I have simulated a data frame to demonstrate.
Full solution:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import numpy as np
cats = {
"Structured": ["Spreadsheet"],
"Unknown": ["System File", "Unrecognised"],
"Unstructured": ["Document", "Email", "Image", "Calendar Entry"],
}
df = pd.DataFrame(
[
{"docCategory": c, "Kind": np.random.choice(cats[c], 2)[0]}
for c in np.random.choice(list(cats.keys()), 25)
]
)
# docCategory count pie graph
valuesDocCat = df["docCategory"].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)])
figDocCat.update_traces(textposition="inside")
figDocCat.update_layout(
uniformtext_minsize=14, uniformtext_mode="hide", title="Document Category breakdown"
)
figDocCat.show()
# treeMap test graph
valuesTreemap = df["Kind"].value_counts()
figTreemap = px.treemap(df, path=["docCategory", "Kind"], color="docCategory")
figTreemap.update_traces(root_color="lightgrey")
figTreemap.update_layout(margin=dict(t=50, l=25, r=25, b=25))

Plotly Express: Plotting individual columns of a dataframe as multiple plots (scrollable) using plotly express

I posed a question at Plotly: How to add a horizontal scrollbar to a plotly express figure? asking how to add a horizontal scrollbar to a plotly express figure for purposes of visualizing a long multivariate time series. A solution for a simple example consisting of three series having 100K points each was given as follows:
import plotly.express as px
import numpy as np
import pandas as pd
np.random.seed(123)
e = np.random.randn(100000,3)
df=pd.DataFrame(e, columns=['a','b','c'])
df['x'] = df.index
df_melt = pd.melt(df, id_vars="x", value_vars=df.columns[:-1])
fig=px.line(df_melt, x="x", y="value",color="variable")
# Add range slider
fig.update_layout(xaxis=dict(rangeslider=dict(visible=True),
type="linear"))
fig.show()
This code is nice, but I'd like to have the plots not superimposed on a single set of axes--instead one above the other as would be done with subplot. For example, signal 'a' would appear above signal 'b', which would appear above signal 'c'.
Because my actual time series have at least 50 channels, a vertical scrollbar will likely be necessary.
As far as I know, a vertical scrollbar may be possible in Dash, but it does not exist in Plotly itself. The question you quoted also suggests a range slider as a substitute for scrolling. However, the range slider is part of the graph, so unless you make the slider independent of it, the slider will scroll out of view along with the plot, which is not ideal. I think the best option at the moment is to give each channel its own y-axis, stack the channels on a shared x-axis, and add a range slider.
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(123)
e = np.random.randn(100000,3)
df=pd.DataFrame(e, columns=['a','b','c'])
df['x'] = df.index
df_melt = pd.melt(df, id_vars="x", value_vars=df.columns[:-1])
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_melt.query('variable == "a"')['x'],
y=df_melt.query('variable == "a"')['value'], yaxis='y'))
fig.add_trace(go.Scatter(x=df_melt.query('variable == "b"')['x'],
y=df_melt.query('variable == "b"')['value'], yaxis='y2'))
fig.add_trace(go.Scatter(x=df_melt.query('variable == "c"')['x'],
y=df_melt.query('variable == "c"')['value'], yaxis='y3'))
# Add range slider
fig.update_layout(
xaxis=dict(
rangeslider=dict(visible=True),
type="linear"),
yaxis=dict(
anchor='x',
domain=[0, 0.33],
linecolor='blue',
type='linear',
zeroline=False
),
yaxis2=dict(
anchor='x',
domain=[0.33, 0.66],
linecolor='red',
type='linear',
zeroline=False
),
yaxis3=dict(
anchor='x',
domain=[0.66, 1.0],
linecolor='green',
type='linear',
zeroline=False
),
)
fig.show()
