I currently have the following code:
import pandas as pd
df = pd.DataFrame({'x1': [1,7,15], 'x2': [5,10,20]})
df
import plotly.graph_objects as go
fig = go.Figure()
for row in df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
This produces the required result. However, with 30k traces things get pretty slow, both when rendering and when working with the plot (zooming, panning), so I'm trying to figure out a better way to do it. I thought of using shapes, but then I lose some functionality that only traces have (e.g. hover information), and I'm also not sure it would be faster. Is there some other way to produce fragmented (non-overlapping) lines within one trace?
Thanks!
Update:
Based on the accepted answer by @Mangochutney, here is how I was able to produce the same plot using a single trace:
import numpy as np
import plotly.graph_objects as go
x = [1, 5, np.nan, 7, 10, np.nan, 15, 20]
y = [0, 0, np.nan, 0, 0, np.nan, 0, 0]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines'))
fig.update_layout(showlegend=True)
fig.show()
By default you can introduce gaps in your go.Scatter line plot by adding np.nan entries where you need them. This behavior is controlled by the connectgaps parameter (see the docs).
E.g.: go.Scatter(x=[0,1,np.nan, 2, 3], y=[0,0,np.nan,0,0], mode='lines')
should create one line segment between 0 and 1 and another between 2 and 3.
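To go from a list of (x1, x2) pairs, like the rows of df in the question, to NaN-separated arrays, a small helper can do the interleaving. This is only a minimal sketch in plain Python: the name segments_to_trace_arrays is made up for illustration and is not part of plotly. The two returned lists are meant to be passed as x and y to a single go.Scatter(..., mode='lines').

```python
import math

def segments_to_trace_arrays(segments, y_value=0):
    """Flatten (x1, x2) pairs into x/y lists with a NaN gap after each segment.

    Hypothetical helper for illustration, not a plotly function.
    """
    xs, ys = [], []
    for x1, x2 in segments:
        xs.extend([x1, x2, math.nan])
        ys.extend([y_value, y_value, math.nan])
    # Drop the trailing NaN pair; it is harmless but unnecessary.
    return xs[:-1], ys[:-1]

x, y = segments_to_trace_arrays([(1, 5), (7, 10), (15, 20)])
```

Because all segments live in one trace, they share one line style and one legend entry, which is the trade-off for the much smaller number of traces.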
You first need to find the overlapping lines; then you can reduce the size of the data frame drastically. First, let us define a sample data frame like yours:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
x_upperbound = 100_000
data = {'x1': [], 'x2': []}
for i in range(30_000):
    start = np.random.randint(1, x_upperbound-10)
    end = np.random.randint(start, start+4)
    data['x1'].append(start)
    data['x2'].append(end)
df = pd.DataFrame(data)
Then, using the following code, we can find a reduced (by one third) but equivalent version of the original data frame introduced above:
l = np.zeros(x_upperbound+2)
for i, row in enumerate(df.iterrows()):
    l[row[1]['x1']] += 1
    l[row[1]['x2']+1] -= 1
cumsum = np.cumsum(l)
new_data = {'x1': [], 'x2': []}
flag = False
for i in range(len(cumsum)):
    if cumsum[i]:
        if flag:
            continue
        new_data['x1'].append(i)
        flag = True
    else:
        if flag:
            new_data['x2'].append(i-1)
            flag = False
optimized_df = pd.DataFrame(new_data)
And now it is show time. Using this code, you can show the exact result you would have gotten if you had graphed the original data frame:
fig = go.Figure()
for row in optimized_df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
It takes more time if either the distance between any x1 and its respective x2 decreases or their domain expands further.
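For comparison, here is a sketch of a different way to merge the segments: sort them and sweep once, instead of building a cumulative-sum array over the whole x range. This is a swapped-in technique, not the code from the answer above, and it behaves slightly differently: it merges segments that overlap or share an endpoint, whereas the cumulative-sum version also joins integer segments that are merely adjacent (such as 1 to 2 and 3 to 4). The function name merge_intervals is an assumption for illustration.

```python
def merge_intervals(intervals):
    """Merge closed intervals that overlap or touch; returns a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]

print(merge_intervals([(1, 5), (4, 8), (10, 12), (12, 15), (20, 21)]))
# [(1, 8), (10, 15), (20, 21)]
```

The sweep does not need to know an upper bound for x in advance and also works for non-integer coordinates.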
Related
I am trying to distinguish weekends from weekdays by either 1) shading the region, 2) coloring points with different colors, or 3) marking the x-axis labels differently for weekends.
Here I am trying the 2nd option: coloring data points for weekends differently. I first created an additional column (IsWeekend) to distinguish weekends from weekdays. However, the data is not drawn as a single line; instead, two lines with different colors are drawn. I would like them to be one line, but with a different color for values on weekends.
Here’s my code for reproducible data:
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({ 'Date': rng, 'Val' : np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')
weekend_list = []
for i in range(len(practice_df)):
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list
fig = px.line(practice_df,
              x=practice_df.index, y='Val',
              color='IsWeekend',
              markers=True)
fig.show()
What I want to do would look something like this but coloring data points/line for weekends differently.
Edit:
Thanks so much to @Derek_O, I was able to color weekends with my original dataset. But I'd also like the Friday-Saturday line to be colored as weekend, so I set practice_df.index[i].weekday() >= 4 instead of practice_df.index[i].weekday() > 4.
But would it be possible for the Friday point itself to have the same color as the weekdays?
Also, is it possible to have a straight line connecting the points, rather than stairs?
Otherwise, it would also work if we could shade the weekend region like in the image at the bottom.
Borrowing from @Rob Raymond's answer here, we can loop through the practice_df two elements at a time, adding a trace to the fig for each iteration of the loop.
We also only want to show the legend category the first time it occurs (so that the legend entries only show each category like True or False once), which is why I've created a new column called "showlegend" that determines whether the legend is shown or not.
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({ 'Date': rng, 'Val' : np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')
weekend_list = []
for i in range(len(practice_df)):
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list
weekend_color_map = {True:0, False:1}
weekend_name_map = {True:"True", False:"False"}
practice_df['color'] = practice_df['IsWeekend'].map(weekend_color_map)
practice_df['name'] = practice_df['IsWeekend'].map(weekend_name_map)
## use the color column since weekend corresponds to 0, nonweekend corresponds to 1
## argmin/argmax give the integer position of the first occurrence, which is what .iat expects
first_weekend_idx = practice_df['color'].to_numpy().argmin()
first_nonweekend_idx = practice_df['color'].to_numpy().argmax()
practice_df["showlegend"] = False
showlegendIdx = practice_df.columns.get_indexer(["showlegend"])[0]
practice_df.iat[first_weekend_idx, showlegendIdx] = True
practice_df.iat[first_nonweekend_idx, showlegendIdx] = True
practice_df["showlegend"] = practice_df["showlegend"].astype(object)
fig = go.Figure(
    [
        go.Scatter(
            x=practice_df.index[tn : tn + 2],
            y=practice_df['Val'].iloc[tn : tn + 2],
            mode='lines+markers',
            # line_shape="hv",
            line_color=px.colors.qualitative.Plotly[practice_df['color'].iloc[tn]],
            name=practice_df['name'].iloc[tn],
            legendgroup=practice_df['name'].iloc[tn],
            showlegend=practice_df['showlegend'].iloc[tn],
        )
        for tn in range(len(practice_df))
    ]
)
fig.update_layout(legend_title_text='Is Weekend')
fig.show()
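The question also mentions shading the weekend region as an acceptable alternative. A possible first step is to group the weekend dates into contiguous spans. The sketch below uses only the standard library and the same 21-day range as the sample data; the helper name weekend_spans is hypothetical.

```python
from datetime import date, timedelta

def weekend_spans(days):
    """Group consecutive weekend dates into (first_day, last_day) spans."""
    spans = []
    for day in days:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            continue
        if spans and day - spans[-1][1] == timedelta(days=1):
            spans[-1][1] = day        # extend the current weekend
        else:
            spans.append([day, day])  # start a new weekend
    return [tuple(span) for span in spans]

days = [date(2022, 4, 10) + timedelta(days=n) for n in range(21)]
spans = weekend_spans(days)
```

Each (start, end) pair could then be drawn as a shaded rectangle, for example with fig.add_vrect(x0=start, x1=end, ...), assuming a plotly version that provides add_vrect. The range starts on a Sunday and ends on a Saturday, so the first and last spans cover a single day each.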
I have two dictionaries:
days = {'a':[1,2,3], 'b':[3,4,5]}
vals = {'a':[10,20,30], 'b':[9,16,25]}
Using plotly (ideally plotly express) I would like one line plot with two lines: the first line being days['a'] vs vals['a'] and the second line being days['b'] vs vals['b']. Of course in practice I may have many more potential lines. I am not sure how to pull this off. I'm happy to make a dataframe out of this data but not sure what the best structure is.
Thanks! Apologies for a noob question.
You can try the following:
import plotly.graph_objects as go
# your data
days = {'a':[1,2,3], 'b':[3,4,5]}
vals = {'a':[10,20,30], 'b':[9,16,25]}
# generate a plot for each dictionary key
data = []
for k in days.keys():
    plot = go.Scatter(x=days[k],
                      y=vals[k],
                      mode="lines",
                      name=k # label for the plot legend
                      )
    data.append(plot)
# create a figure with all plots and display it
fig = go.Figure(data=data)
fig.show()
This gives:
With Plotly Express:
import plotly.express as px
import pandas as pd
days = {'a': [1, 2, 3], 'b': [3, 4, 5]}
vals = {'a': [10, 20, 30], 'b': [9, 16, 25]}
# build a long-format DataFrame (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat(
    [pd.DataFrame({"days": days[k], "vals": vals[k], "label": k}) for k in days],
    ignore_index=True,
)
fig = px.line(df, x="days", y="vals", color="label")
fig.show()
The result is the same as above.
Currently I am trying to create a Barplot that shows the amount of reviews for an app per week. The bar should however be colored according to a third variable which contains the average rating of the reviews in each week (range: 1 to 5).
I followed the instructions of the following post to create the graph: Python: Barplot with colorbar
The code works fine:
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "count", "score"])
# Convert to lists
data_x = list(df["week"])
data_hight = list(df["count"])
data_color = list(df["score"])
#Create Barplot:
data_color = [x / max(data_color) for x in data_color]
fig, ax = plt.subplots(figsize=(15, 4))
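my_cmap = plt.get_cmap('RdYlGn')  # plt.cm.get_cmap is no longer available in recent matplotlib
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(1,5))
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax)  # recent matplotlib needs the target Axes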
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Now to the issue: As you might notice the value of the average score in week 4 is "1.2". The Barplot does however indicate that the value lies around "2.5". I understand that this stems from the following code line, which standardizes the values by dividing it with the max value:
data_color = [x / max(data_color) for x in data_color]
Unfortunately, I am not able to change this command so that the colors reflect the absolute values of the scores; e.g. with an average score of 1.2 the last bar should be colored deep red, not light orange. I tried plugging in the regular (not standardized) score values to solve the issue, but doing so colors all bars the same green... Since this is only my second Python project, I have a hard time comprehending the process behind this and would be very thankful for any advice or solution.
Cheers Neil
You correctly identified that the normalization is the problem here. In the linked code by valued SO user @ImportanceOfBeingEarnest, it is defined for the interval [0, 1]. If you want another normalization range [normmin, normmax], you have to take this into account during the normalization:
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "mycount", "score"])
# Not necessary to convert to lists, pandas series or numpy array is also fine
data_x = df.week
data_hight = df.mycount
data_color = df.score
#Create Barplot:
normmin=1
normmax=5
data_color = [(x-normmin) / (normmax-normmin) for x in data_color] #see the difference here
fig, ax = plt.subplots(figsize=(15, 4))
my_cmap = plt.get_cmap('RdYlGn')  # plt.cm.get_cmap is no longer available in recent matplotlib
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(normmin,normmax))
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax)  # recent matplotlib needs the target Axes
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Sample output:
Obviously, this does not check that all values are indeed within the range [normmin, normmax], so a better script would make sure that all values adhere to this specification. We could, alternatively, address this problem by clipping the values that are outside the normalization range:
#...
import numpy as np
#.....
#Create Barplot:
normmin=1
normmax=3.5
data_color = [(x-normmin) / (normmax-normmin) for x in np.clip(data_color, normmin, normmax)]
#....
You may also have noticed another change that I introduced. You don't have to provide lists; pandas Series or NumPy arrays are fine, too. And if your column names do not clash with pandas methods such as count, you can access them as df.ABC instead of df["ABC"].
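To see the effect of the two normalizations in numbers, the sketch below applies both to the scores from the question, using plain Python only. The helper name normalize is made up for illustration; it combines the range-based formula with the clipping idea.

```python
def normalize(values, vmin, vmax):
    """Map values linearly from [vmin, vmax] to [0, 1], clipping outliers."""
    span = vmax - vmin
    return [(min(max(v, vmin), vmax) - vmin) / span for v in values]

scores = [3.4, 3.9, 3.6, 1.2]
by_max = [s / max(scores) for s in scores]   # the question's approach
by_range = normalize(scores, 1, 5)           # the answer's approach

print(round(by_max[3], 3), round(by_range[3], 3))
```

With division by the maximum, the score 1.2 lands at about 0.31 of the colormap, which on a colorbar labelled 1 to 5 sits at roughly 1 + 0.31 * 4 = 2.2. With the range-based version it lands at 0.05, close to the red end, as expected.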
How can I use Plotly to produce a line plot with a shaded standard deviation? I am trying to achieve something similar to seaborn.tsplot. Any help is appreciated.
The following approach is fully flexible with regard to the number of columns in a pandas dataframe and uses the default color cycle of plotly. If the number of lines exceeds the number of colors, the colors will be re-used from the start. As of now px.colors.qualitative.Plotly can be replaced with any hex color sequence that you can find using px.colors.qualitative:
Alphabet = ['#AA0DFE', '#3283FE', '#85660D', '#782AB6', '#565656', '#1...
Alphabet_r = ['#FA0087', '#FBE426', '#B00068', '#FC1CBF', '#C075A6', '...
[...]
Complete code:
# imports
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
# sample data in a pandas dataframe
np.random.seed(1)
df=pd.DataFrame(dict(A=np.random.uniform(low=-1, high=2, size=25).tolist(),
B=np.random.uniform(low=-4, high=3, size=25).tolist(),
C=np.random.uniform(low=-1, high=3, size=25).tolist(),
))
df = df.cumsum()
# define colors as a list
colors = px.colors.qualitative.Plotly
# convert plotly hex colors to rgba to enable transparency adjustments
def hex_rgba(hex, transparency):
    col_hex = hex.lstrip('#')
    col_rgb = list(int(col_hex[i:i+2], 16) for i in (0, 2, 4))
    col_rgb.extend([transparency])
    areacol = tuple(col_rgb)
    return areacol
rgba = [hex_rgba(c, transparency=0.2) for c in colors]
colCycle = ['rgba'+str(elem) for elem in rgba]
# Make sure the colors run in cycles if there are more lines than colors
def next_col(cols):
    while True:
        for col in cols:
            yield col
line_color=next_col(cols=colCycle)
# plotly figure
fig = go.Figure()
# add line and shaded area for each series and standard deviation
for i, col in enumerate(df):
    new_col = next(line_color)
    x = list(df.index.values+1)
    y1 = df[col]
    y1_upper = [(y + np.std(df[col])) for y in df[col]]
    y1_lower = [(y - np.std(df[col])) for y in df[col]]
    y1_lower = y1_lower[::-1]
    # standard deviation area
    fig.add_traces(go.Scatter(x=x+x[::-1],
                              y=y1_upper+y1_lower,
                              fill='tozerox',
                              fillcolor=new_col,
                              line=dict(color='rgba(255,255,255,0)'),
                              showlegend=False,
                              name=col))
    # line trace
    fig.add_traces(go.Scatter(x=x,
                              y=y1,
                              line=dict(color=new_col, width=2.5),
                              mode='lines',
                              name=col)
                   )
# set x-axis
fig.update_layout(xaxis=dict(range=[1,len(df)]))
fig.show()
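The color handling can also be written more compactly. The following is a sketch, not the answer's code: it builds the 'rgba(...)' string directly and uses itertools.cycle from the standard library in place of the hand-written generator. The function name hex_to_rgba is an assumption, and the three hex values are the first entries of plotly's default qualitative sequence.

```python
from itertools import cycle

def hex_to_rgba(hex_color, alpha):
    """Convert '#RRGGBB' to an 'rgba(r,g,b,a)' string."""
    digits = hex_color.lstrip('#')
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r},{g},{b},{alpha})"

palette = ['#636EFA', '#EF553B', '#00CC96']
fills = cycle(hex_to_rgba(c, 0.2) for c in palette)

# Asking for more colors than the palette holds wraps around to the start.
first_four = [next(fills) for _ in range(4)]
print(first_four[0])  # rgba(99,110,250,0.2)
```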
I was able to come up with something similar. I post the code here to be used by someone else or for any suggestions for improvements.
import matplotlib
import random
import plotly.graph_objects as go
import numpy as np
#random color generation in plotly
hex_colors_dic = {}
rgb_colors_dic = {}
hex_colors_only = []
for name, hex in matplotlib.colors.cnames.items():
    hex_colors_only.append(hex)
    hex_colors_dic[name] = hex
    rgb_colors_dic[name] = matplotlib.colors.to_rgb(hex)
data = [[1, 3, 5, 4],
[2, 3, 5, 4],
[1, 1, 4, 5],
[2, 3, 5, 4]]
#calculating mean and standard deviation
mean=np.mean(data,axis=0)
std=np.std(data,axis=0)
#draw figure
fig = go.Figure()
c = random.choice(hex_colors_only)
fig.add_trace(go.Scatter(x=np.arange(4), y=mean+std,
mode='lines',
line=dict(color=c,width =0.1),
name='upper bound'))
fig.add_trace(go.Scatter(x=np.arange(4), y=mean,
mode='lines',
line=dict(color=c),
fill='tonexty',
name='mean'))
fig.add_trace(go.Scatter(x=np.arange(4), y=mean-std,
mode='lines',
line=dict(color=c, width =0.1),
fill='tonexty',
name='lower bound'))
fig.show()
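As a rough alternative to stacking three traces with fill='tonexty', the band can be described as one closed outline: along the upper bound from left to right, then back along the lower bound. The sketch below computes that outline for the same data using only the standard library (statistics.pstdev is the population standard deviation, which matches the default of np.std). The variable names band_x and band_y are illustrative.

```python
from statistics import mean, pstdev

data = [[1, 3, 5, 4],
        [2, 3, 5, 4],
        [1, 1, 4, 5],
        [2, 3, 5, 4]]

columns = list(zip(*data))                # one tuple per x position
means = [mean(col) for col in columns]
stds = [pstdev(col) for col in columns]   # population std, like np.std's default

x = list(range(len(columns)))
upper = [m + s for m, s in zip(means, stds)]
lower = [m - s for m, s in zip(means, stds)]

# Closed outline: left to right along the top, then right to left along the bottom.
band_x = x + x[::-1]
band_y = upper + lower[::-1]
```

band_x and band_y could then be given to a single go.Scatter with fill='toself', with the mean drawn as a separate line trace on top.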
Great custom responses posted by others. In case someone is interested in code from the official plotly website, see here: https://plotly.com/python/continuous-error-bars/
I wrote a function to extend plotly.express.line with the same high-level interface as Plotly Express. The line function (source code below) is used in exactly the same way as plotly.express.line but allows for continuous error bands with the flag argument error_y_mode, which can be either 'band' or 'bar'. In the second case it produces the same result as the original plotly.express.line. Here is a usage example:
import plotly.express as px
df = px.data.gapminder().query('continent=="Americas"')
df = df[df['country'].isin({'Argentina','Brazil','Colombia'})]
df['lifeExp std'] = df['lifeExp']*.1 # Invent some error data...
for error_y_mode in {'band', 'bar'}:
    fig = line(
        data_frame = df,
        x = 'year',
        y = 'lifeExp',
        error_y = 'lifeExp std',
        error_y_mode = error_y_mode, # Here you say `band` or `bar`.
        color = 'country',
        title = f'Using error {error_y_mode}',
        markers = '.',
    )
    fig.show()
which produces the following two plots:
The source code of the line function that extends plotly.express.line is this:
import plotly.express as px
import plotly.graph_objs as go
def line(error_y_mode=None, **kwargs):
    """Extension of `plotly.express.line` to use error bands."""
    ERROR_MODES = {'bar','band','bars','bands',None}
    if error_y_mode not in ERROR_MODES:
        raise ValueError(f"'error_y_mode' must be one of {ERROR_MODES}, received {repr(error_y_mode)}.")
    if error_y_mode in {'bar','bars',None}:
        fig = px.line(**kwargs)
    elif error_y_mode in {'band','bands'}:
        if 'error_y' not in kwargs:
            raise ValueError(f"If you provide argument 'error_y_mode' you must also provide 'error_y'.")
        figure_with_error_bars = px.line(**kwargs)
        fig = px.line(**{arg: val for arg,val in kwargs.items() if arg != 'error_y'})
        for data in figure_with_error_bars.data:
            x = list(data['x'])
            y_upper = list(data['y'] + data['error_y']['array'])
            y_lower = list(data['y'] - data['error_y']['array'] if data['error_y']['arrayminus'] is None else data['y'] - data['error_y']['arrayminus'])
            color = f"rgba({tuple(int(data['line']['color'].lstrip('#')[i:i+2], 16) for i in (0, 2, 4))},.3)".replace('((','(').replace('),',',').replace(' ','')
            fig.add_trace(
                go.Scatter(
                    x = x+x[::-1],
                    y = y_upper+y_lower[::-1],
                    fill = 'toself',
                    fillcolor = color,
                    line = dict(
                        color = 'rgba(255,255,255,0)'
                    ),
                    hoverinfo = "skip",
                    showlegend = False,
                    legendgroup = data['legendgroup'],
                    xaxis = data['xaxis'],
                    yaxis = data['yaxis'],
                )
            )
        # Reorder data as said here: https://stackoverflow.com/a/66854398/8849755
        reordered_data = []
        for i in range(int(len(fig.data)/2)):
            reordered_data.append(fig.data[i+int(len(fig.data)/2)])
            reordered_data.append(fig.data[i])
        fig.data = tuple(reordered_data)
    return fig
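The reordering step at the end of the function may be easier to follow in isolation. After the loop, the figure holds all line traces first and all band traces second; the reorder interleaves them so that each band comes right before its line and the line is drawn on top. The sketch below shows the same index arithmetic on plain strings; the function name interleave_bands_first is hypothetical.

```python
def interleave_bands_first(traces):
    """Reorder [line_1..line_n, band_1..band_n] to [band_1, line_1, ..., band_n, line_n]."""
    half = len(traces) // 2
    reordered = []
    for i in range(half):
        reordered.append(traces[i + half])  # the band belonging to line i
        reordered.append(traces[i])         # the line itself, drawn after its band
    return tuple(reordered)

order = interleave_bands_first(['line A', 'line B', 'band A', 'band B'])
print(order)  # ('band A', 'line A', 'band B', 'line B')
```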
My goal is to have two rows and three columns of plots using matplotlib. Each graph in the top row will contain two data series, and two y-axes. I want to make the scales on each axis line up so that the corresponding data series are directly comparable. Right now I have it so that the primary y-axis on each graph is aligned, but I can't get the secondary y-axes to align. Here is my current code:
import matplotlib.pyplot as plt
import pandas as pd
excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row')
    secondary_ax = []
    for i, sim in enumerate(sims):
        df = data[sim]
        modern = df.loc['Modern']
        traditional = df.loc['Traditional']
        axes[0][i].plot(modern.index, modern['Idle'])
        secondary_ax.append(axes[0][i].twinx())
        secondary_ax[i].plot(modern.index, modern['Work'])
        axes[1][i].bar(modern.index, modern['Result'])
        axes[0][i].set_xlim(12, 6)
        if i > 0:
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
    # secondary_ax[0].get_shared_y_axes().join(x for x in secondary_ax)
    plt.show()
The solutions I tried (both the line in the if statement and the commented-out line before plt.show()) came from answers to similar questions, but they didn't resolve my issue. Nothing breaks; the secondary axes just aren't aligned.
I also tried adding an extra row in the plt.subplots call and using twinx() to combine the first two rows, but it created an empty second row of plots nonetheless.
As a fallback, I think I could manually check each axes for its max and min and loop through them to update the limits manually, but I'd love to find a cleaner solution if one is out there. Thanks.
You just need to share the y axes before plotting your data:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    # data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    modern = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    traditional = pd.DataFrame(np.random.randint(10, 30, (100, 3)), columns=sims)
    traditional[sims[1]] = traditional[sims[1]] + 40
    traditional[sims[2]] = traditional[sims[2]] - 40
    data3 = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row', figsize=(30, 10))
    secondary_ax = []
    for i, sim in enumerate(sims):
        # df = data[sim]  # `data` only exists when the Excel file is read, so this stays disabled
        modern_series = modern[sim]
        traditional_series = traditional[sim]
        idle = data3
        axes[0][i].plot(modern_series.index, modern_series)
        secondary_ax.append(axes[0][i].twinx())
        if i > 0:
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
        secondary_ax[i].plot(traditional_series.index, traditional_series)
        # axes[1][i].bar(data3.index, data3)
        axes[0][i].set_xlim(12, 6)
    plt.show()
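The manual fallback mentioned in the question (collect the minima and maxima, then apply one common range to every secondary axis) can be sketched as follows. This is plain Python with a made-up helper name, common_limits, and made-up sample numbers that mimic the three shifted "traditional" series; it is not part of matplotlib.

```python
def common_limits(series_list, pad=0.05):
    """Return (low, high) covering every series, widened by a relative margin."""
    low = min(min(s) for s in series_list)
    high = max(max(s) for s in series_list)
    margin = (high - low) * pad
    return low - margin, high + margin

# Three series with different offsets, similar in spirit to the sample data above.
low, high = common_limits([[10, 29], [50, 69], [-30, -11]], pad=0.0)
print(low, high)
```

Each secondary axis could then be given the same range with ax.set_ylim(low, high). This may be useful on matplotlib versions where joining shared-axes groups after creation is not available.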