Plotting multiple graphs by grouping values from a data frame column? - python

I am rather new to visualizations and would appreciate any help!
I am looking for a way to make multiple plots by grouping a data frame by one column and then making one plot for each unique value. For example, for the dataset below I would like three different plots, one for each location, and I would like to be able to label each of them individually. I am not quite sure how to do that. The sample data frame:
import pandas as pd
data = {
    "location": ["USA", "USA", "USA", "UK", "UK",
                 "UK", "World", "World", "World"],
    "date": ["21-06-2021", "22-06-2021", "23-06-2021",
             "21-06-2021", "22-06-2021", "23-06-2021",
             "21-06-2021", "22-06-2021", "23-06-2021"],
    "number": [456, 543, 675, 543, 765, 345, 9543, 9543, 9234]
}
df = pd.DataFrame(data, columns=['location', 'date', 'number'])
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")  # dates are day-first
I tried doing this. It gives me three plots, but I don't know how to label them or change the description of each graph individually.
df.groupby("location").plot(x="date", y="number", subplots=True)
In general, it would also be nice if the plots looked a bit nicer, like the one below (I mean the lines with dot markers):
import seaborn as sns
p = sns.catplot(data=df, x='date', y="number", hue='location', kind='point');
p.fig.set_figwidth(16)
p.fig.set_figheight(6)

Try:
import matplotlib.pyplot as plt
labels = []
for k, v in df.groupby("location"):
    plt.plot(v["date"], v["number"], marker='o')
    labels.append(k)
plt.legend(labels, title='location')
plt.xticks(df["date"].unique())  # all dates, not just those of the last group
Note: If you want to remove the top and right borders, add this line at the end (after the loop; it only needs to run once):
plt.gca().spines[['top', 'right']].set_visible(False)
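If the goal is one separate, individually labelled plot per location (rather than one shared set of axes with a legend), a minimal sketch is to create the subplots up front and draw each group onto its own axes. The `titles` mapping below is a hypothetical example of per-location labels, and the `Agg` backend is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "location": ["USA", "USA", "USA", "UK", "UK", "UK", "World", "World", "World"],
    "date": ["21-06-2021", "22-06-2021", "23-06-2021"] * 3,
    "number": [456, 543, 675, 543, 765, 345, 9543, 9543, 9234],
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Hypothetical per-location titles; replace them with your own labels
titles = {"UK": "United Kingdom", "USA": "United States", "World": "Worldwide total"}

groups = df.groupby("location")
# squeeze=False always returns a 2-D array, even if there is only one group
fig, axes = plt.subplots(nrows=groups.ngroups, ncols=1, sharex=True,
                         figsize=(8, 3 * groups.ngroups), squeeze=False)
axes = axes[:, 0]

for ax, (location, g) in zip(axes, groups):
    ax.plot(g["date"], g["number"], marker="o")  # line plus dot markers
    ax.set_title(titles.get(location, location))  # individual label per plot
    ax.set_ylabel("number")
    ax.spines[["top", "right"]].set_visible(False)

axes[-1].set_xticks(df["date"].unique())  # one tick per date
axes[-1].set_xlabel("date")
fig.tight_layout()
```

Because each location gets its own `Axes`, anything else (y-limits, annotations, colours) can also be set per plot inside the loop.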


How do I add a secondary legend that explains what symbols mean with plotly express?

I have a plot that uses US states to map symbols. I currently assign symbols using the "state" column of my dataframe, so that I can select particular states of interest by clicking or double-clicking on the Plotly Express legend. This part is working fine. However, the symbol mapping I'm using also communicates information about coverage, e.g. triangle-down means poor coverage in that state, and many states share this symbol. I would like to add another legend that shows what each shape means. How can I do this in Plotly Express? Alternatively, is there a way to display symbols in a footnote? I could give the symbol definitions there instead.
The goal is to display that circle = medium coverage, triangle-down = poor coverage, etc., in addition to the individual state legend I already have. The best possible outcome would be a clickable legend that lets me select entire groups of states based on the symbol shape.
Thank you for any tips!
I tried using HTML and footnotes to display the symbols, but it did not work.
- As noted in the comments, this can be achieved with additional traces on different axes.
- I have simulated some data that matches what is implied by the image and the comments.
- From the scatter figure, extract how symbols and colors have been assigned to states.
- Build another scatter that is effectively a legend.
import pandas as pd
import numpy as np
import plotly.express as px

df_s = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States"
)[1].iloc[:, 0:2]
df_s.columns = ["name", "state"]

# generate a dataframe that matches the structure in the image and question
df = pd.DataFrame(
    {"activity_month": pd.date_range("1-jan-2020", "today", freq="W")}
).assign(
    value=lambda d: np.random.uniform(0, 1, len(d)),
    state=lambda d: np.random.choice(df_s["state"], len(d)),
)

# straightforward scatter
fig = px.scatter(df, x="activity_month", y="value", symbol="state", color="state")

# extract how symbols and colors have been assigned to states
df_symbol = pd.DataFrame(
    [
        {"symbol": t.marker.symbol, "state": t.name, "color": t.marker.color}
        for t in fig.data
    ]
).assign(y=lambda d: d.index // 20, x=lambda d: d.index % 20)

# build a figure that is effectively the legend
fig_legend = px.scatter(
    df_symbol,
    x="x",
    y="y",
    symbol="symbol",
    color="state",
    text="state",
    color_discrete_sequence=df_symbol["color"],
).update_traces(textposition="middle right", showlegend=False, xaxis="x2", yaxis="y2")

# insert legend into scatter and format axes
fig.add_traces(fig_legend.data).update_layout(
    yaxis_domain=[.15, 1],
    yaxis2={"domain": [0, .15], "matches": None, "visible": False},
    xaxis2={"visible": False},
    xaxis={"position": 0, "anchor": "free"},
    showlegend=False,
)
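If the aim is only to explain what each symbol means (circle = medium coverage, triangle-down = poor coverage, etc.), a different, commonly used trick is to add legend-only traces: scatter traces whose single point is `None`, so nothing is drawn in the plot area but plotly still creates a legend entry showing the marker. The sketch below builds them as plain trace dicts; the `coverage_symbols` mapping is hypothetical. Note that these entries only explain the symbols; they do not by themselves toggle groups of states.

```python
# Hypothetical mapping from coverage level to the marker symbol used for it
coverage_symbols = {
    "Good coverage": "circle-open",
    "Medium coverage": "circle",
    "Poor coverage": "triangle-down",
}


def symbol_key_traces(symbol_map, color="gray"):
    """Return legend-only scatter traces, one per symbol meaning.

    Each trace holds a single None point, so no marker appears in the plot
    area, but a legend entry with the given marker symbol is still created.
    """
    return [
        {
            "type": "scatter",
            "x": [None],
            "y": [None],
            "mode": "markers",
            "marker": {"symbol": symbol, "color": color, "size": 10},
            "name": label,
            "showlegend": True,
        }
        for label, symbol in symbol_map.items()
    ]


key_traces = symbol_key_traces(coverage_symbols)
```

With an existing figure, `fig.add_traces(key_traces)` appends them, since plotly accepts trace dicts with a `type` key as well as trace objects. The answer above sets `showlegend=False` on the layout, so this approach only makes sense with the regular legend left on.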

Adding multiple vertical lines for each frame in animated plot using plotly

Summary: I am looking to add multiple vertical lines based on date values within each (animation) frame of an animated horizontal bar plot in plotly. What I want is to compare the existing date and the new date for each task within each location (please refer to the dataset below), and further point out the unique "new" date values in the plot. So I slide across the location frames and display grouped bar plots across tasks comparing the dates (existing vs new). This is much easier to understand with the (fully reproducible) snippet I put together and the charts below.
Next:
I want to include a dotted line for each of the unique new dates at the end of the horizontal bars for each location (animation frame) selected with the slider (possibly using add_shape in a loop over frames?). The expected chart output is shown below (first one), with red dotted lines passing through the ends of the bars for each unique date. Any other ideas for showing the unique "new" dates are welcome!
Is there any way to dynamically control the data labels (dates) so that they stay visible within the chart? Currently my chart doesn't consistently show the labels fully within the chart area; for example, see the second chart below for location "S1" selected on the slider. The constraint is that I don't want to leave a lot of empty space by arbitrarily extending the x-axis range to a far-future date.
Any help would be greatly appreciated. Thanks much in advance!
import pandas as pd
from datetime import datetime
from plotly import graph_objects as go
#Read in sample data
inputDF = pd.read_csv('https://raw.githubusercontent.com/SauvikDe/Demo/main/Test_Dummy.csv')
#Format to datetime
inputDF['Date_Existing'] = pd.to_datetime(inputDF['Date_Existing'])
inputDF['Date_New'] = pd.to_datetime(inputDF['Date_New'])
#Sort required columns to plot better
inputDF.sort_values(by=['Location', 'Date_New', 'Date_Existing'], inplace=True)
The data looks like this (first few records):
Location,Task,TaskTimeMins,Date_Existing,Date_New
S1,E10,60,7/14/2022,7/14/2022
S1,E12,75,7/21/2022,7/14/2022
S1,SM0,40,7/21/2022,7/28/2022
S1,E16,60,7/28/2022,7/28/2022
S1,SM1,120,7/28/2022,7/28/2022
S1,E18,120,8/4/2022,7/28/2022
And my working code to plot the dynamic chart using plotly is below:
# Create frames for each location
frames = []
for s in inputDF['Location'].unique():
    t = inputDF.loc[inputDF['Location'] == s]
    frames.append(
        go.Frame(
            name=str(s),
            data=[
                go.Bar(x=t['Date_Existing'],
                       y=t['Task']+":["+ t['TaskTimeMins'].astype(str)+" Minutes]",
                       xaxis="x",
                       name="Existing Date",
                       offsetgroup=1,
                       orientation='h',
                       text=t['Date_Existing'].dt.strftime("%d-%m-%y"),
                       textposition='outside',
                       marker_color='#f4af36'#'#f9cb9c'#'#9fc5e8'#'#fff2cc'
                       ),
                go.Bar(x=t['Date_New'],
                       y=t['Task']+":["+ t['TaskTimeMins'].astype(str)+" Minutes]",
                       xaxis="x2",
                       name='New Date',
                       offsetgroup=2,
                       orientation='h',
                       text=t['Date_New'].dt.strftime("%d-%m-%y"),
                       textposition='outside',
                       marker_color='#38761d'#'#6aa84f'#'#08ce00'
                       ),
            ],
            layout={"xaxis": {"range": [t['Date_Existing'].min() - pd.offsets.DateOffset(1),
                                        t['Date_Existing'].max() + pd.offsets.DateOffset(30)],
                              #"dtick":"M3",
                              #"tickformat":"%d-%b-%y",
                              #"ticklabelmode":"period",
                              },
                    "xaxis2": {"range": [t['Date_Existing'].min() - pd.offsets.DateOffset(1),
                                         t['Date_Existing'].max() + pd.offsets.DateOffset(30)],
                               #"dtick":"M3",
                               #"tickformat":"%d-%b-%y",
                               #"ticklabelmode":"period",
                               },
                    },
        )
    )

# Create the figure, also set axis layout
fig = go.Figure(
    data=frames[0].data,
    frames=frames,
    layout={
        "xaxis": {
            "showticklabels": True,
        },
        "xaxis2": {
            "showticklabels": True,
            "overlaying": "x",
            "side": "top",
        },
    },
)

# Update dynamic chart title for each frame
for k in range(len(fig.frames)):
    fig.frames[k]['layout'].update(title_text=
        f"<b>Location: {fig.frames[k].name}</b> >> Date comparison:Existing vs New",
    )

# Finally create the slider
fig.update_layout(
    barmode="group",
    updatemenus=[{"buttons": [{"args": [None, {"frame": {"duration": 2000, "redraw": True}}],
                               "label": "▶",
                               "method": "animate",},],
                  "type": "buttons",}],
    sliders=[{"steps": [{"args": [[f.name], {"frame": {"duration": 0, "redraw": True},
                                            "mode": "immediate",},],
                         "label": f.name,
                         "method": "animate",}
                        for f in frames],
              }],
    legend=dict(orientation="h",
                yanchor="bottom",
                y=-0.5,
                xanchor="right",
                x=0.7,
                ),
    font_size=12,
    font_color="blue",
    title_font_family="Calibri",
    title_font_color="purple",
    title_x=0.5,
    width=950,
    height=600,
)

# Position play button
fig['layout']['updatemenus'][0]['pad'] = dict(r=10, t=395)
# Position slider
fig['layout']['sliders'][0]['pad'] = dict(r=10, t=50,)
# Print chart
fig.show()
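One possible approach, sketched here under assumptions rather than tested against this exact figure, is to compute each frame's layout from its own data: put one dotted `line` shape per unique `Date_New` into the frame's `layout["shapes"]`, and derive the x-range from both date columns (the code above uses only `Date_Existing` for both axes, so later `Date_New` values and their outside labels can fall off the chart). The padding is proportional to each location's date span instead of a fixed 30 days. The helper name `frame_layout` and the `pad_fraction` value are hypothetical.

```python
import pandas as pd


def frame_layout(t, pad_fraction=0.15):
    """Per-frame layout: padded x-ranges plus a dotted line per unique new date.

    t is the subset of inputDF for one Location, with Date_Existing and
    Date_New already converted to datetime.
    """
    lo = min(t["Date_Existing"].min(), t["Date_New"].min())
    hi = max(t["Date_Existing"].max(), t["Date_New"].max())
    span = max(hi - lo, pd.Timedelta(days=1))  # avoid a zero-width range
    # Pad relative to the data span, leaving room on the right for outside labels
    x_range = [lo - span * 0.02, hi + span * pad_fraction]
    shapes = [
        {
            "type": "line",
            "xref": "x2",     # the New Date bars are drawn on xaxis2
            "yref": "paper",  # run the line over the full plot height
            "x0": d, "x1": d,
            "y0": 0, "y1": 1,
            "line": {"color": "red", "dash": "dot", "width": 1.5},
        }
        for d in t["Date_New"].drop_duplicates().sort_values()
    ]
    return {"xaxis": {"range": x_range}, "xaxis2": {"range": x_range}, "shapes": shapes}


# Usage on the first rows of the sample data shown above
fmt = "%m/%d/%Y"
sample = pd.DataFrame({
    "Location": ["S1"] * 6,
    "Date_Existing": pd.to_datetime(["7/14/2022", "7/21/2022", "7/21/2022",
                                     "7/28/2022", "7/28/2022", "8/4/2022"], format=fmt),
    "Date_New": pd.to_datetime(["7/14/2022", "7/14/2022", "7/28/2022",
                                "7/28/2022", "7/28/2022", "7/28/2022"], format=fmt),
})
layout_s1 = frame_layout(sample)
```

Inside the frame loop, `layout=frame_layout(t)` would replace the hand-written `layout` dict. Since the figure is created from `frames[0].data` only, the first location's `frame_layout(...)` would also need to be applied with `fig.update_layout(...)` so the lines show before the slider is moved. That frame `shapes` are applied on each step (the slider uses `"redraw": True`) is an assumption worth checking with your plotly version.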

plotly: how to add different horizontal lines in a strip plot with a categorical x-axis

I have a simple strip plot with a categorical x-axis. I want to add a short horizontal line (at a different y-value) for each category.
I am able to create one horizontal line across the whole plot like this:
import plotly.express as px
fig = px.strip(df, x="category", y="value")
fig.add_hline(y=191)
However, I am unable to plot a separate line for each category.
I tried adding a shape to the layout, but it did not affect the output:
fig.update_layout(shapes=[
    dict(type='line',
         yref='paper', y0=180, y1=180,
         xref='x', x0="cat1", x1="cat1")])
Something like this would probably work if the x-axis were numeric, but I am not sure how to specify the category here. If this is the way to go, then I am probably doing it wrong.
How can I add a single horizontal line, as depicted above, with a different y-value for each category?
Data to reproduce plot:
import pandas as pd
df = pd.DataFrame(data={
"category": ["cat1", "cat1", "cat2", "cat2"],
"value": [150, 160, 180, 190]
})
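Add the lines as shapes. Shape coordinates can be given in two systems: the data coordinates of the x and y axes (xref='x', yref='y'), or paper coordinates, where 0 and 1 are the edges of the plotting area (xref='paper', yref='paper'). Judging from your desired output, you can specify x in paper coordinates and y as a value on the y-axis.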
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'category': ['cat1', 'cat1', 'cat2', 'cat2'], 'value': [150, 160, 180, 190]})
fig = px.strip(df, x="category", y="value")
fig.update_layout(shapes=[dict(type='line', x0=0.2, y0=180, x1=0.3, y1=180,
                               xref='paper', yref='y',
                               line_width=3, line_color='red'),
                          dict(type='line', x0=0.7, y0=192, x1=0.8, y1=192,
                               xref='paper', yref='y',
                               line_width=3, line_color='red'),
                          ])
fig.show()
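Instead of estimating paper coordinates by eye, the lines can also be placed in data coordinates on the category axis. As far as I know, plotly's shape reference states that on an axis of type "category", shape coordinates are numbers, with each category assigned a serial number from zero in the order it appears; a line centred on the i-th category can then run from i - 0.3 to i + 0.3 and stays centred if the figure is resized. (The attempt in the question most likely showed nothing because `yref='paper'` with y0=180 lies far outside the 0 to 1 paper range, and x0 == x1 gives the line no length.) A minimal sketch; the `thresholds` mapping and `half_width` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat1", "cat1", "cat2", "cat2"],
                   "value": [150, 160, 180, 190]})

# Hypothetical y-value of the short horizontal line for each category
thresholds = {"cat1": 180, "cat2": 192}


def category_hlines(categories, y_values, half_width=0.3):
    """One short horizontal line per category on a categorical x-axis.

    categories must be in the same order as they appear on the axis; the
    i-th category sits at x = i in shape coordinates (xref='x').
    """
    shapes = []
    for i, cat in enumerate(categories):
        if cat not in y_values:
            continue  # no line requested for this category
        y = y_values[cat]
        shapes.append({
            "type": "line", "xref": "x", "yref": "y",
            "x0": i - half_width, "x1": i + half_width,
            "y0": y, "y1": y,
            "line": {"color": "red", "width": 3},
        })
    return shapes


# unique() keeps the order of first appearance, matching the default axis order
shapes = category_hlines(df["category"].unique(), thresholds)
```

The result is passed as `fig.update_layout(shapes=shapes)`. If `category_orders` is set in `px.strip`, the categories should be passed to `category_hlines` in that same order.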

Bokeh Server: Tap on Scatter to Show additional Data in Bar Plot

I have a DataFrame which holds the data I want to plot in a scatter plot.
The DataFrame has far more information than only the columns needed for the scatter x and y data.
I want to show the additional data on hover (which is not the problem), but also, when I tap-select one data point in the scatter, the additional data in the other columns of the ColumnDataSource should be plotted in a bar plot.
My main problem is getting the bar plot to accept the data stored in the one selected row of the ColumnDataSource.
Everything I have seen feeds column-based data to the bar plot.
I was halfway through a workaround where I take the selected row of the ColumnDataSource, transform it back to a DataFrame, transpose it (so it is column-based) and then convert it back to a ColumnDataSource, but this cannot be what the creators of Bokeh intended, right?
I stripped my problem down to a minimal code snippet:
import pandas as pd
from bokeh.events import Tap
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, TapTool
from bokeh.plotting import figure

df = pd.DataFrame({"x": [1,2,3,4,5,6],
                   "y": [6,5,4,3,2,1],
                   "cat1": [11,12,13,14,15,16],
                   "cat2": [100,99,98,97,96,95]})
SRC = ColumnDataSource(df)

def Plot(doc):
    def callback(event):
        SELECTED = SRC.selected.indices
        bplot = make_bPlot(SELECTED)

    def make_bPlot(selected):
        # Here is my question:
        # How to feed the row-wise data of the SRC to the barplot?
        b = figure(x_range=["cat1", "cat2"])
        b.vbar(x=["cat1", "cat2"], top=["cat1", "cat2"], source=SRC)
        return b

    TOOLTIPS = [
        ("x", "@x"),
        ("y", "@y"),
        ("Category 1", "@cat1"),
        ("Category 2", "@cat2")]
    TOOLS = "pan,wheel_zoom,zoom_in,zoom_out,box_zoom,reset,tap"
    cplot = figure(tools=TOOLS, tooltips=TOOLTIPS)
    cplot.circle("x", "y", source=SRC)
    bplot = make_bPlot(None)  # init
    taptool = cplot.select(type=TapTool)
    cplot.on_event(Tap, callback)
    layout = column(cplot, bplot)
    doc.add_root(layout)
Thanks in advance.
I got my answer from the Bokeh Discourse Forum:
https://discourse.bokeh.org/t/tap-on-scatter-to-show-additional-data-in-bar-plot/6939
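The forum answer is not reproduced here. One common pattern, sketched under that caveat, is to keep a second, column-oriented `ColumnDataSource` for the bar plot and overwrite its `.data` whenever the scatter selection changes, which avoids the DataFrame transpose round-trip. The helper below covers only the row-to-columns step in plain pandas; the names `bar_data_for` and `BAR_CATS` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                   "y": [6, 5, 4, 3, 2, 1],
                   "cat1": [11, 12, 13, 14, 15, 16],
                   "cat2": [100, 99, 98, 97, 96, 95]})
BAR_CATS = ["cat1", "cat2"]  # columns shown as bars (hypothetical choice)


def bar_data_for(indices, frame=df, cats=BAR_CATS):
    """Column-oriented bar data built from the selected row.

    indices is what SRC.selected.indices holds: a list of row positions.
    With no selection the bars are returned with zero height.
    """
    if indices:
        row = frame.iloc[indices[0]]  # a tap selects one row; use the first
        tops = [float(row[c]) for c in cats]
    else:
        tops = [0.0] * len(cats)
    return {"cat": list(cats), "top": tops}
```

In the Bokeh app, the bar plot would be created once from `bar_src = ColumnDataSource(bar_data_for([]))` with `b.vbar(x="cat", top="top", width=0.8, source=bar_src)`, and a callback `def update(attr, old, new): bar_src.data = bar_data_for(new)` registered via `SRC.selected.on_change("indices", update)` would refresh the bars on every tap.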

Direct labeling a line plot with Altair

I'm plotting a line graph in Altair (4.1.0) and would like to use direct labeling (annotations) instead of a regular legend.
As such, the text mark for each line (say, time series) should appear only once, at the right-most point of the x-axis (as opposed to this scatter plot example, which labels every data point).
While I'm able to use pandas to manipulate the data to get the desired results, I think it would be more elegant to have a pure-Altair implementation, but I can't seem to get it right.
For example, given the following data:
import numpy as np
import pandas as pd
import altair as alt
np.random.seed(10)
n = 50  # number of time points; not defined in the original snippet, arbitrary value
time = pd.date_range(start="10/21/2020", end="10/22/2020", periods=n)
data = pd.concat([
    pd.DataFrame({
        "time": time,
        "group": "One",
        "value": np.random.normal(10, 2, n)}),
    pd.DataFrame({
        "time": time,
        "group": "Two",
        "value": np.random.normal(5, 2, n)}).iloc[:-1]
], ignore_index=True)
I can generate a satisfactory result using pandas to create a subset that includes the last time-point for each group:
lines = alt.Chart(data).mark_line(
    point=True
).encode(
    x="time:T",
    y="value:Q",
    color=alt.Color("group:N", legend=None),  # Remove legend
)
text_data = data.loc[data.groupby('group')['time'].idxmax()]  # Subset the data for text positions
labels = alt.Chart(text_data).mark_text(
    # some adjustments
).encode(
    x="time:T",
    y="value:Q",
    color="group:N",
    text="group:N"
)
chart = lines + labels
However, if I try to use the main data and add Altair aggregations, for example using x=max(time) or explicit transform_aggregate(), I either get text annotations on all points or none at all (respectively).
Is there a better way to obtain the above result?
You can do this using an argmax aggregate in the y encoding. For example, your labels layer might look like this:
labels = alt.Chart(data).mark_text(
    align='left', dx=5
).encode(
    x='max(time):T',
    y=alt.Y('value:Q', aggregate={'argmax': 'time'}),
    text='group:N',
    color='group:N',
)
