I have a straightforward for loop that iterates over a collection of datasets and draws a scatterplot for each one, using the code below:
for i in dataframes:
    x = i['cycleNumber']
    y = i['QCharge_mA_h']
    plt.figure()
    sns.scatterplot(x=x, y=y).set(title=i.name)
This plots the graphs out as expected, one on top of the other. Is there a simple way to get them all to plot onto a grid for better readability?
As an example, let's say we have the following datasets and code:
data1 = {'X':[12, 10, 20, 17], 'Y':[9, 8, 5, 3]}
data2 = {'X':[2, 13, 7, 21], 'Y':[17, 18, 4, 6]}
data3 = {'X':[9, 19, 20, 3], 'Y':[6, 12, 4, 1]}
data4 = {'X':[10, 13, 15, 1], 'Y':[6, 12, 5,16]}
data5 = {'X':[12, 10, 5, 3], 'Y':[18, 7, 21, 7]}
data6 = {'X':[5, 10, 8, 17], 'Y':[9, 12, 5, 18]}
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
df3=pd.DataFrame(data3)
df4=pd.DataFrame(data4)
df5=pd.DataFrame(data5)
df6=pd.DataFrame(data6)
lst = [df1, df2, df3, df4, df5, df6]
for i in lst:
    plt.figure()
    sns.scatterplot(x=i['X'], y=i['Y'])
This draws each scatterplot in its own figure, one below the other, i.e. stacked. I can't upload a screenshot of that output because it runs across multiple pages; a tidy output that I can capture and display in one piece is exactly what I'm trying to achieve.
I want the plots in a grid, let's say a 2x3 grid given that there are 6 plots. How do I achieve this?
There are a few ways you could do this.
The Original
import matplotlib # 3.6.0
from matplotlib import pyplot as plt
import numpy as np # 1.23.3
import pandas as pd # 1.5.1
import seaborn as sns # 0.12.1
# make fake data
df = pd.DataFrame({
    "cycleNumber": np.random.random(size=(100,)),
    "QCharge_mA_h": np.random.random(size=(100,)),
})
# single plot
fig, ax = plt.subplots()
sns.scatterplot(df, x="cycleNumber", y="QCharge_mA_h", ax=ax)
plt.show()
With matplotlib
# make 5 random data frames
dataframes = []
for i in range(5):
    np.random.seed(i)
    random_df = pd.DataFrame({
        "cycleNumber": np.random.random(size=(100,)),
        "QCharge_mA_h": np.random.random(size=(100,)),
    })
    dataframes.append(random_df)
# make len(dataframes) rows using matplotlib
fig, axs = plt.subplots(nrows=len(dataframes))
for df, ax in zip(dataframes, axs):
sns.scatterplot(df, x="cycleNumber", y="QCharge_mA_h", ax=ax)
plt.show()
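The snippet above stacks the plots in a single column, whereas the question asks for a 2x3 grid. A minimal sketch of that layout, assuming the six small frames from the question's example (the `ncols` variable and the `df1` to `df6` titles are illustrative choices, not part of the original code):

```python
import math

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# the six example frames from the question
frames = [
    pd.DataFrame({"X": [12, 10, 20, 17], "Y": [9, 8, 5, 3]}),
    pd.DataFrame({"X": [2, 13, 7, 21], "Y": [17, 18, 4, 6]}),
    pd.DataFrame({"X": [9, 19, 20, 3], "Y": [6, 12, 4, 1]}),
    pd.DataFrame({"X": [10, 13, 15, 1], "Y": [6, 12, 5, 16]}),
    pd.DataFrame({"X": [12, 10, 5, 3], "Y": [18, 7, 21, 7]}),
    pd.DataFrame({"X": [5, 10, 8, 17], "Y": [9, 12, 5, 18]}),
]

ncols = 3  # assumed grid width; the row count follows from it
nrows = math.ceil(len(frames) / ncols)

# squeeze=False keeps axs two-dimensional even for a single row or column
fig, axs = plt.subplots(nrows=nrows, ncols=ncols,
                        figsize=(4 * ncols, 3 * nrows), squeeze=False)

# axs.ravel() walks the grid row by row, so frame k lands in cell k
for idx, (frame, ax) in enumerate(zip(frames, axs.ravel()), start=1):
    sns.scatterplot(data=frame, x="X", y="Y", ax=ax)
    ax.set_title(f"df{idx}")

# hide leftover cells when len(frames) is not a multiple of ncols
for ax in axs.ravel()[len(frames):]:
    ax.set_visible(False)

fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or store the figure.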
With seaborn
# make 5 random data frames
dataframes = []
for i in range(5):
    np.random.seed(i)
    random_df = pd.DataFrame({
        "cycleNumber": np.random.random(size=(100,)),
        "QCharge_mA_h": np.random.random(size=(100,)),
    })
    dataframes.append(random_df)
# concat dataframes into one long frame, keyed by their position in the list
dfs = pd.concat(dataframes, keys=range(len(dataframes)), names=["keys"])
# move keys to columns
dfs = dfs.reset_index(level="keys")
# make grid and map scatterplot to each row
grid = sns.FacetGrid(data=dfs, row="keys")
grid.map(sns.scatterplot, "cycleNumber", "QCharge_mA_h")
plt.show()
With col_wrap=3
# make 5 random data frames
dataframes = []
for i in range(5):
    np.random.seed(i)
    random_df = pd.DataFrame({
        "cycleNumber": np.random.random(size=(100,)),
        "QCharge_mA_h": np.random.random(size=(100,)),
    })
    dataframes.append(random_df)
# concat dataframes into one long frame, keyed by their position in the list
dfs = pd.concat(dataframes, keys=range(len(dataframes)), names=["keys"])
# move keys to columns
dfs = dfs.reset_index(level="keys")
# make grid and map scatterplot to each column, wrapping after 3
grid = sns.FacetGrid(data=dfs, col="keys", col_wrap=3)
grid.map(sns.scatterplot, "cycleNumber", "QCharge_mA_h")
plt.show()
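If a figure-level shortcut is acceptable, `sns.relplot` builds the same kind of wrapped grid in a single call. This is only a sketch under the same assumptions as the snippets above (five random frames, with the `keys` column name carried over from them); the `"dataset {col_name}"` title template is an illustrative choice:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# five random data frames, one per future panel
dataframes = []
for i in range(5):
    rng = np.random.default_rng(i)
    dataframes.append(pd.DataFrame({
        "cycleNumber": rng.random(100),
        "QCharge_mA_h": rng.random(100),
    }))

# stack the frames and turn the list position into a regular "keys" column
dfs = pd.concat(dataframes, keys=range(len(dataframes)), names=["keys"])
dfs = dfs.reset_index(level="keys")

# relplot returns a FacetGrid; col_wrap=3 starts a new row after three panels
grid = sns.relplot(data=dfs, x="cycleNumber", y="QCharge_mA_h",
                   col="keys", col_wrap=3, kind="scatter", height=3)
grid.set_titles("dataset {col_name}")
```

Because the result is a `FacetGrid`, the usual grid methods (for example `set_axis_labels`) remain available on `grid`.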
As per the Plotly website, in a simple line chart one can change the legend entry from the column name to a manually specified string of text. For example, this code results in the following chart:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
    x = [1, 2, 3, 4],
    y = [2, 3, 4, 3]
))
fig = px.line(
    df,
    x="x",
    y="y",
    width=800, height=600,
    labels={
        "y": "Series"
    },
)
fig.show()
label changed:
However, when one plots multiple columns to the line chart, this label specification no longer works. There is no error message, but the legend entries are simply not changed. See this example and output:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
    x = [1, 2, 3, 4],
    y1 = [2, 3, 4, 3],
    y2 = [2, 4, 6, 8]
))
fig = px.line(
    df,
    x="x",
    y=["y1", "y2"],
    width=800, height=600,
    labels={
        "y1": "Series 1",
        "y2": "Series 2"
    },
)
fig.show()
legend entries not changed:
Is this a bug, or am I missing something? Any idea how this can be fixed?
In case anybody read my previous post, I did some more digging and found the solution to this issue. At heart, the entries shown on the right in the legend are trace attributes called "names", not "labels". Searching for how to change those names, I came across another post about this issue with a solution ("Legend Label Update"). Using that information, here is a revised version of your program.
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
    x = [1, 2, 3, 4],
    y1 = [2, 3, 4, 3],
    y2 = [2, 4, 6, 8]
))
fig = px.line(df, x="x", y=["y1", "y2"], width=800, height=600)
fig.update_layout(legend_title_text='Variable', xaxis_title="X", yaxis_title="Series")
newnames = {'y1':'Series 1', 'y2': 'Series 2'} # From the other post
fig.for_each_trace(lambda t: t.update(name = newnames[t.name]))
fig.show()
Following is a sample graph.
Try that out to see if that addresses your situation.
Regards.
I currently have the following code:
import pandas as pd
df = pd.DataFrame({'x1': [1,7,15], 'x2': [5,10,20]})
df
import plotly.graph_objects as go
fig = go.Figure()
for row in df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
This produces the required result. However, with 30k traces things get pretty slow, both when rendering and when working with the plot (zooming, panning), so I'm trying to figure out a better way to do it. I thought of using shapes, but then I lose some functionality that only traces have (e.g. hover information), and I'm also not sure it would be faster. Is there some other way to produce fragmented (non-overlapping) lines within one trace?
Thanks!
Update:
Based on the accepted answer by @Mangochutney, here is how I was able to produce the same plot using a single trace:
import numpy as np
import plotly.graph_objects as go
x = [1, 5, np.nan, 7, 10, np.nan, 15, 20]
y = [0, 0, np.nan, 0, 0, np.nan, 0, 0]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines'))
fig.update_layout(showlegend=True)
fig.show()
By default you can introduce gaps in your go.Scatter line plot by adding np.nan entries where you need them. This behavior is controlled by the connectgaps parameter (see the docs).
E.g.: go.Scatter(x=[0, 1, np.nan, 2, 3], y=[0, 0, np.nan, 0, 0], mode='lines')
should create one line segment from 0 to 1 and another from 2 to 3.
You need first to find the overlapping lines. Then you can reduce the size of the data frame drastically. First, let us define a sample data frame like yours:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
x_upperbound = 100_000
data = {'x1': [], 'x2': []}
for i in range(30_000):
    start = np.random.randint(1, x_upperbound-10)
    end = np.random.randint(start, start+4)
    data['x1'].append(start)
    data['x2'].append(end)
df = pd.DataFrame(data)
Then, using the following code, we can build a reduced (by one third) but equivalent version of the original data frame introduced above:
l = np.zeros(x_upperbound+2)
for i, row in enumerate(df.iterrows()):
    l[row[1]['x1']] += 1
    l[row[1]['x2']+1] -= 1
cumsum = np.cumsum(l)
new_data = {'x1': [], 'x2': []}
flag = False
for i in range(len(cumsum)):
    if cumsum[i]:
        if flag:
            continue
        new_data['x1'].append(i)
        flag = True
    else:
        if flag:
            new_data['x2'].append(i-1)
            flag = False
optimized_df = pd.DataFrame(new_data)
And now it is show time. Using this code, you can show the exact result you would have gotten if you had graphed the original data frame:
fig = go.Figure()
for row in optimized_df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
Plotting takes longer if the distance between each x1 and its respective x2 decreases or if their domain expands further, because fewer segments then overlap and fewer of them can be merged.
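The merging step can also be written without Python-level loops. The following is a sketch of the same difference-array idea in vectorised NumPy, not the answer's original code; the function name `merge_intervals` and its `upper` parameter are hypothetical. Like the loop version, it treats each interval as a set of integer positions, so touching intervals such as [1, 2] and [3, 4] are merged into [1, 4]:

```python
import numpy as np
import pandas as pd

def merge_intervals(df, upper):
    """Merge overlapping or touching integer intervals [x1, x2].

    Assumes non-negative integer endpoints with x2 <= upper.
    """
    delta = np.zeros(upper + 2, dtype=np.int64)
    # +1 where an interval starts, -1 just past where it ends;
    # np.add.at accumulates correctly even when indices repeat
    np.add.at(delta, df["x1"].to_numpy(), 1)
    np.add.at(delta, df["x2"].to_numpy() + 1, -1)
    covered = np.cumsum(delta) > 0
    # pad with False so every covered run has a rising and a falling edge
    padded = np.concatenate(([False], covered, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1
    return pd.DataFrame({"x1": starts, "x2": ends})

example = pd.DataFrame({"x1": [1, 3, 10], "x2": [4, 5, 12]})
merged = merge_intervals(example, upper=20)
print(merged)
```

The merged frame can then be plotted exactly like `optimized_df` above.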
I have data in a dataframe that I want to plot with a stacked bar plot:
test_df = pd.DataFrame([[1, 5, 1, 'A'], [2, 10, 1, 'B'], [3, 3, 1, 'A']], columns = ('ID', 'Value', 'Bucket', 'Type'))
If I do the plot with Plotly Express I get bars stacked on each other and correctly ordered (based on the index):
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack')
However, I want to color the data based on Type, hence I go for
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack', color='Type')
This works, except now the ordering is messed up, because all bars are now grouped by Type. I looked through the docs of Plotly Express and couldn't find a way to specify the ordering of the bars independently. Any tips on how to do this?
I found this one here, but the scenario is a bit different and the options mentioned there don't seem to help me:
How to disable plotly express from grouping bars based on color?
Edit: This goes in the right direction, although it uses Plotly graph_objects rather than Plotly Express:
import plotly.graph_objects as go
test_df = pd.DataFrame([[1, 5, 1, 'A', 'red'], [2, 10, 1, 'B', 'blue'], [3, 3, 1, 'A', 'red']], columns = ('ID', 'Value', 'Bucket', 'Type', 'Color'))
fig = go.Figure()
fig.add_trace(go.Bar(x=test_df["Bucket"], y=test_df["Value"], marker_color=test_df["Color"]))
Output:
Still, I'd prefer the Express version, because so many things are easier to handle there (Legend, Hover properties etc.).
The only way I can understand your question is that you don't want B to be stacked on top of A, but rather the opposite. If that's the case, then you can get what you want through:
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
Some details:
fig.data = fig.data[::-1] simply reverses the order that the traces appear in fig.data and ultimately in the plotted figure itself. This will however reverse the order of the legend as well. So without fig.layout.legend.traceorder = 'reversed' the result would be:
And so it follows that the complete work-around looks like this:
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
Complete code:
import pandas as pd
import plotly.express as px
test_df = pd.DataFrame([[1, 5, 1, 'A'], [2, 10, 1, 'B'], [3, 3, 1, 'A']], columns = ('ID', 'Value', 'Bucket', 'Type'))
fig = px.bar(test_df, x='Bucket', y='Value', barmode='stack', color='Type')
fig.data = fig.data[::-1]
fig.layout.legend.traceorder = 'reversed'
fig.show()
Ok, sorry for the long delay on this, but I finally got around to solving this.
My solution is possibly not the most straightforward one, but it does work.
The basic idea is to use graph_objects instead of express and then iterate over the dataframe and add each bar as a separate trace. This way, each trace can get a name that can be grouped in a certain way (which is not possible if adding all bars in a single trace, or at least I could not find a way).
Unfortunately, the ordering of the legend is messed up (if you have more than 2 buckets) and there is currently no way in plotly to sort it. But that's a minor thing.
The main thing that bothers me is that this could've been so much easier if plotly.express allowed for manual ordering of the bars by a certain column.
Maybe I'll submit that as a suggestion.
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = "browser"
test_df = pd.DataFrame(
[[1, 5, 1, 'B'], [3, 3, 1, 'A'], [5, 10, 1, 'B'],
[2, 8, 2, 'B'], [4, 5, 2, 'A'], [6, 3, 2, 'A']],
columns = ('ID', 'Value', 'Bucket', 'Type'))
# add named colors to the dataframe based on type
test_df.loc[test_df['Type'] == 'A', 'Color'] = 'Crimson'
test_df.loc[test_df['Type'] == 'B', 'Color'] = 'ForestGreen'
# ensure that the dataframe is sorted by the values
test_df.sort_values('ID', inplace=True)
fig = go.Figure()
# it's tedious to iterate over each item, but only this way we can ensure that everything is correctly ordered and labelled
# Set up legend_show_dict to check if an item should be shown or not. This should be only done for the first occurrence to avoid duplication.
legend_show_dict = {}
for i, row in test_df.iterrows():
    if row['Type'] in legend_show_dict:
        legend_show = legend_show_dict[row['Type']]
    else:
        legend_show = True
        legend_show_dict[row['Type']] = False
    fig.add_trace(
        go.Bar(
            x=[row['Bucket']],
            y=[row['Value']],
            marker_color=row['Color'],
            name=row['Type'],
            legendgroup=row['Type'],
            showlegend=legend_show,
            hovertemplate="<br>".join([
                'ID: ' + str(row['ID']),
                'Value: ' + str(row['Value']),
                'Bucket: ' + str(row['Bucket']),
                'Type: ' + row['Type'],
            ])
        ))
fig.update_layout(
    xaxis={'categoryorder': 'category ascending', 'title': 'Bucket'},
    yaxis={'title': 'Value'},
    legend={'traceorder': 'normal'}
)
fig.update_layout(barmode='stack', font_size=20)
fig.show()
This is what it should look like then:
Here is part of my data.
I count my data:
count_interests = interests.count()
Then I made a graph:
count_interests.iplot(kind = 'bar', xTitle='Interests', yTitle='Number of Person', colors='Red')
I have tried many times to find a function that changes the color of the columns according to their values, so that bigger and smaller columns get different colors.
I know there are colorscale and color options, and I tried them many times, but I couldn't find a way. Does anyone know a function for this?
You could define a function which returns a color for each value and then pass the colors for each bar in a list.
import pandas as pd
import plotly
def color(val, median, std):
    if val > median + std:
        return 'darkgreen'
    if val < median - std:
        return 'darkred'
    return 'orange'
df = pd.DataFrame({'cinema': [1, 2, 5, 3, 3, None],
                   'theatre': [3, 0, 8, 4, 0, 4],
                   'wine': [3, 2, 5, None, 1, None],
                   'beer': [4, 8, 2, None, None, None]})
med = df.count().median()
std = df.count().std()
colors = [color(i, med, std) for i in df.count()]
fig = plotly.graph_objs.Bar(x=df.columns,
                            y=df.count(),
                            marker=dict(color=colors))
plotly.offline.plot([fig])
The bars could also be colored either by using pd.pivot_table() to turn the rows into columns or by creating a separate trace for each bar. Here, each column was aggregated by taking a sum() as an example. Code below:
# Import libraries
import datetime
from datetime import date
import pandas as pd
import numpy as np
from plotly import __version__
%matplotlib inline
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()
import plotly.graph_objs as go
import plotly.offline as pyo
# Create dataframe
INT_M_PUB = [0,0,0,0,0,1,0,0,0,0]
INT_M_CINEMA = [1,1,1,0,0,0,0,0,0,1]
INT_M_THEATRE = [1,0,1,0,0,1,0,1,0,1]
INT_M_GYM = [0,0,0,0,0,1,0,0,0,1]
INT_M_ENTERTAIN = [0,0,1,1,0,1,0,1,0,1]
INT_M_EATOUT = [0,1,1,0,0,1,0,0,1,1]
INT_M_WINE = [0,0,0,0,0,1,0,0,0,1]
interests = pd.DataFrame({'INT_M_PUB':INT_M_PUB, 'INT_M_CINEMA':INT_M_CINEMA, 'INT_M_THEATRE':INT_M_THEATRE,
'INT_M_GYM':INT_M_GYM, 'INT_M_ENTERTAIN':INT_M_ENTERTAIN, 'INT_M_EATOUT':INT_M_EATOUT,
'INT_M_WINE':INT_M_WINE
})
interests.head(2)
dfm = interests.sum().reset_index().rename(columns={'index':'interests', 0:'value'})
dfm
# Re-creating the plot similar to that in question (note: y-axis scales are different)
df = dfm.copy()
col_list = df.columns
df.iplot(kind = 'bar', x='interests', y='value', xTitle='Interests', yTitle='Number of Person', title='These bars need to be colored', color='red')
# Color plots by creating traces
# Initialize empty list named data to collect one trace per bar
data = []
for name, value in zip(df['interests'], df['value']):
    trace = go.Bar(
        x=[name],
        y=[value],
        name=name
    )
    data.append(trace)
layout = go.Layout(
barmode='group',
title='Interests',
xaxis=dict(title='Interests'),
yaxis=dict(title='Number of Person')
)
fig = go.Figure(data=data, layout=layout)
pyo.iplot(fig, filename='grouped-bar')
# Creating plot by pivoting the table
df = pd.pivot_table(dfm, values='value', columns='interests')
df.iplot(kind = 'bar',xTitle='Interests', yTitle='Number of Person')