How to create a horizontal histogram in Python's plotnine?

I've been using plotnine recently and want to plot a horizontal histogram (i.e., a histogram with horizontal bars).
The following example illustrates the vertical histogram:
from plotnine import *
import numpy as np
import pandas as pd
df = pd.DataFrame({'values': np.random.normal(0,10,1000), 'group': ['a']*500 + ['b']*500})
#
(
ggplot(df, aes(x = 'values', y = after_stat('count'), fill = 'group'))
+ geom_histogram(binwidth = 5)
)
And the result:
Simply changing the axes in aes doesn't work:
(
ggplot(df, aes(y = 'values', x = after_stat('count'), fill = 'group'))
+ geom_histogram(binwidth = 5)
)
#PlotnineError: 'stat_bin() must not be used with a y aesthetic.'
How can I achieve the desired result?

Use coord_flip to achieve the desired result:
from plotnine import *
import numpy as np
import pandas as pd
df = pd.DataFrame({'values': np.random.normal(0,10,1000), 'group': ['a']*500 + ['b']*500})
#
(
ggplot(df, aes(x = 'values', y = after_stat('count'), fill = 'group'))
+ geom_histogram(binwidth = 5)
+ coord_flip()
)

Related

Plotly - "grouped" scatter plot

Suppose I have the following pandas data frame:
import pandas as pd
d = {'Person': ['Bob']*9 + ['Alice']*9,
'Time': ['Morning']*3 + ['Noon']*3 + ['Evening']*3 + ['Morning']*3 + ['Noon']*3 + ['Evening']*3,
'Color': ['Red','Blue','Green']*6,
'Energy': [1,5,4,7,3,6,8,4,2,9,8,5,2,6,7,3,8,1]}
df = pd.DataFrame(d)
How can I create a plot like this?
(Excuse the crude plotting)
I've tried tricking scatter, strip and box plots into this, but with no success.
Thank you!
Generate one scatter trace per Person.
Map each Time to a numeric x position and offset it slightly per person so the markers do not overlap; the hover text and the x-axis ticks then restore the Time labels.
import plotly.graph_objects as go
xbase = pd.Series(df["Time"].unique()).reset_index().rename(columns={"index":"x",0:"Time"})
dfp = df.merge(xbase, on="Time").set_index("Person")
go.Figure(
[
go.Scatter(
name=p,
x=dfp.loc[p, "x"] + i/10,
y=dfp.loc[p, "Energy"],
text=dfp.loc[p, "Time"],
mode="markers",
marker={"color": dfp.loc[p, "Color"], "symbol":i, "size":10},
hovertemplate="(%{text},%{y})"
)
for i, p in enumerate(dfp.index.get_level_values("Person").unique())
]
).update_layout(xaxis={"tickmode":"array", "tickvals":xbase["x"], "ticktext":xbase["Time"]})
You've already received some great suggestions, but since you're still wondering about:
What if I also want the colors to show in the legend?
I'd just like to chip in that px.scatter comes really close to being an optimal approach right out of the box. The only thing that's missing is jitter. Still, the plot below can be produced by these few lines of code:
fig = px.scatter(df, x = 'Time', y = 'Energy', color = 'Color', symbol = 'Person')
fig.for_each_trace(lambda t: t.update(marker_color = t.name.split(',')[0],
name = t.name.split(',')[1], x = [1,2,3]))
fig.for_each_trace(lambda t: t.update(x=tuple([x + 0.2 for x in list(t.x)])) if t.name == ' Alice' else ())
Complete code:
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
# data
d = {'Person': ['Bob']*9 + ['Alice']*9,
'Time': ['Morning']*3 + ['Noon']*3 + ['Evening']*3 + ['Morning']*3 + ['Noon']*3 + ['Evening']*3,
'Color': ['Red','Blue','Green']*6,
'Energy': [1,5,4,7,3,6,8,4,2,9,8,5,2,6,7,3,8,1]}
df = pd.DataFrame(d)
# figure setup
fig = px.scatter(df, x = 'Time', y = 'Energy', color = 'Color', symbol = 'Person')
# some customizations in order to get to the desired result:
fig.for_each_trace(lambda t: t.update(marker_color = t.name.split(',')[0],
name = t.name.split(',')[1],
x = [1,2,3]))
# jitter
fig.for_each_trace(lambda t: t.update(x=tuple([x + 0.2 for x in list(t.x)])) if t.name == ' Alice' else ())
# layout
fig.update_layout(xaxis={"tickmode":"array","tickvals":[1,2,3],"ticktext":df.Time.unique()})
fig.show()
Room for improvement:
Some elements of the snippet above could undoubtedly be made more dynamic, like x = [1,2,3] which should take into account a varying number of elements on the x-axis. The same goes for the number of people and the arguments used for jitter. But I can look into that too if this is something you can use.
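One way the hard-coded x = [1,2,3] and the Alice-only jitter might be generalised is sketched below in plain Python. The helper names category_positions and person_offsets and the spread parameter are hypothetical (they are not part of plotly), and the resulting numbers would still have to be written into each trace's x values as in the snippet above.

```python
def category_positions(categories):
    """Map each distinct category to 1, 2, 3, ... in order of first appearance."""
    positions = {}
    for category in categories:
        if category not in positions:
            positions[category] = len(positions) + 1
    return positions


def person_offsets(people, spread=0.2):
    """Spread the distinct people evenly over [-spread/2, +spread/2]."""
    unique = list(dict.fromkeys(people))  # keeps first-appearance order
    if len(unique) == 1:
        return {unique[0]: 0.0}
    step = spread / (len(unique) - 1)
    return {p: -spread / 2 + i * step for i, p in enumerate(unique)}


times = ['Morning', 'Noon', 'Evening'] * 2
people = ['Bob'] * 3 + ['Alice'] * 3

pos = category_positions(times)
off = person_offsets(people)

# x coordinate for every (time, person) row
xs = [pos[t] + off[p] for t, p in zip(times, people)]
print(pos)
print(off)
print(xs)
```

The tick values are then list(pos.values()) and the tick labels list(pos.keys()), whatever the number of categories.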
You can go through each row of the DataFrame using itertuples (better performance than iterrows), map the 'Morning', 'Noon', and 'Evening' values to 1, 2, and 3, respectively, and then jitter the x-values by mapping 'Bob' to -0.05 and 'Alice' to 0.05 and adding these offsets to each x-value. You can also pass the 'Color' information to the marker_color argument.
Then map the tick values 1, 2, 3 back to 'Morning', 'Noon', and 'Evening', and use a legendgroup so that only one Bob and one Alice marker appear in the legend (rather than one legend entry per trace).
import pandas as pd
import plotly.graph_objects as go
d = {'Person': ['Bob']*9 + ['Alice']*9,
'Time': ['Morning']*3 + ['Noon']*3 + ['Evening']*3 + ['Morning']*3 + ['Noon']*3 + ['Evening']*3,
'Color': ['Red','Blue','Green']*6,
'Energy': [1,5,4,7,3,6,8,4,2,9,8,5,2,6,7,3,8,1]}
df = pd.DataFrame(d)
shapes = {'Bob': 'circle', 'Alice': 'diamond'}
time = {'Morning':1, 'Noon':2, 'Evening':3}
jitter = {'Bob': -0.05, 'Alice': 0.05}
fig = go.Figure()
## position 1 of each row is Person... position 4 is the Energy value
s = df.Person.shift() != df.Person
name_changes = s[s].index.values
for row in df.itertuples():
    # Show a legend entry only for the first row of each person
    fig.add_trace(go.Scatter(
        x=[time[row[2]] + jitter[row[1]]],
        y=[row[4]],
        legendgroup=row[1],
        name=row[1],
        mode='markers',
        marker_symbol=shapes[row[1]],
        marker_color=row[3],
        showlegend=bool(row[0] in name_changes)
    ))
fig.update_traces(marker=dict(size=12,line=dict(width=2,color='DarkSlateGrey')))
fig.update_layout(
xaxis=dict(
tickmode='array',
tickvals=list(time.values()),
ticktext=list(time.keys())
)
)
fig.show()
In case you want to use only matplotlib and avoid extra dependencies, here is some sample code. (The pandas operations, such as the groupbys, are left for you to optimize.)
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
from matplotlib.lines import Line2D
df = pd.DataFrame(
{
'Person': ['Bob'] * 9 + ['Alice'] * 9,
'Time': ['Morning'] * 3
+ ['Noon'] * 3
+ ['Evening'] * 3
+ ['Morning'] * 3
+ ['Noon'] * 3
+ ['Evening'] * 3,
'Color': ['Red', 'Blue', 'Green'] * 6,
'Energy': [1, 5, 4, 7, 3, 6, 8, 4, 2, 9, 8, 5, 2, 6, 7, 3, 8, 1],
}
)
plt.figure()
x = ['Morning', 'Noon', 'Evening']
# Transform function
offset = lambda p: transforms.ScaledTranslation(
p / 72.0, 0, plt.gcf().dpi_scale_trans
)
trans = plt.gca().transData
# Use this to center transformation
start_offset = -len(df['Person'].unique()) // 2
# Define as many markers as people you have
markers = ['o', '^']
# Use this for custom legend
custom_legend = []
# Do this if you need to aggregate
df = df.groupby(['Person', 'Time', 'Color'])['Energy'].sum().reset_index()
df = df.set_index('Time')
for i, [person, pgroup] in enumerate(df.groupby('Person')):
    pts = (i + start_offset) * 10
    marker = markers[i]
    transform = trans + offset(pts)
    # This is for legend, not plotted
    custom_legend.append(
        Line2D(
            [0],
            [0],
            color='w',
            markerfacecolor='black',
            marker=marker,
            markersize=10,
            label=person,
        )
    )
    for color, cgroup in pgroup.groupby('Color'):
        mornings = cgroup.loc[cgroup.index == 'Morning', 'Energy'].values[0]
        noons = cgroup.loc[cgroup.index == 'Noon', 'Energy'].values[0]
        evenings = cgroup.loc[cgroup.index == 'Evening', 'Energy'].values[0]
        # This stupid if is because you need to define at least one non
        # transformation scatter be it first or whatever.
        if pts == 0:
            plt.scatter(
                x,
                [mornings, noons, evenings],
                c=color.lower(),
                s=25,
                marker=marker,
            )
        else:
            plt.scatter(
                x,
                [mornings, noons, evenings],
                c=color.lower(),
                s=25,
                marker=marker,
                transform=transform,
            )
plt.ylabel('Energy')
plt.xlabel('Time')
plt.legend(handles=custom_legend)
plt.margins(x=0.5)
plt.show()

Toggle points on and off in altair

I'd like to be able to toggle the display of the points on and off in the below chart. The 2 lines are the means of the points in groups 1 and 2. I thought there would be a way to do this using interactive but cannot find any examples. Any help is much appreciated.
import math
import numpy as np
import pandas as pd
import altair as alt
x = np.arange(0,math.pi,0.1);
y = np.sin(x);
a, b = -0.2, 0.2
summary_df=[]
for i in range(0,2):
    for j in range(0,5):
        rand_y = (b - a)*np.random.rand(len(y)) + a
        df = pd.DataFrame({
            'group': i,
            'batch': j,
            'x': x,
            'y': y+rand_y
        })
        summary_df.append(df)
summary_df = pd.concat(summary_df)
base = alt.Chart(
summary_df
).properties(
width=200,
height=400
)
mean_selection = alt.selection_multi(fields=['group'], bind='legend')
mean_line = base.mark_line(size=2).encode(
x=alt.X('x:Q'),
y=alt.Y('y:Q', aggregate='mean', axis=alt.Axis(title='y')),
color='group:N',
opacity=alt.condition(mean_selection, alt.value(1), alt.value(0.2))
).add_selection(
mean_selection
).interactive()
all_selection = alt.selection_multi(fields=['group'], bind='legend')
all_points = base.mark_square(size=10).encode(
y=alt.Y('y:Q', axis=alt.Axis(title='y')),
x=alt.X('x:Q',),
color='group:N',
tooltip='batch:N',
opacity=alt.condition(all_selection, alt.value(1), alt.value(0.2))
).add_selection(
all_selection
).interactive()
(mean_line+all_points)

how to plot a range with a line in the center with Plotly, in Python [duplicate]

How can I use Plotly to produce a line plot with a shaded standard deviation? I am trying to achieve something similar to seaborn.tsplot. Any help is appreciated.
The following approach is fully flexible with regard to the number of columns in a pandas dataframe and uses plotly's default color cycle. If the number of lines exceeds the number of colors, the colors are reused from the start. px.colors.qualitative.Plotly can be replaced with any hex color sequence available under px.colors.qualitative:
Alphabet = ['#AA0DFE', '#3283FE', '#85660D', '#782AB6', '#565656', '#1...
Alphabet_r = ['#FA0087', '#FBE426', '#B00068', '#FC1CBF', '#C075A6', '...
[...]
Complete code:
# imports
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
# sample data in a pandas dataframe
np.random.seed(1)
df=pd.DataFrame(dict(A=np.random.uniform(low=-1, high=2, size=25).tolist(),
B=np.random.uniform(low=-4, high=3, size=25).tolist(),
C=np.random.uniform(low=-1, high=3, size=25).tolist(),
))
df = df.cumsum()
# define colors as a list
colors = px.colors.qualitative.Plotly
# convert plotly hex colors to rgba to enable transparency adjustments
def hex_rgba(hex, transparency):
    col_hex = hex.lstrip('#')
    col_rgb = list(int(col_hex[i:i+2], 16) for i in (0, 2, 4))
    col_rgb.extend([transparency])
    areacol = tuple(col_rgb)
    return areacol
rgba = [hex_rgba(c, transparency=0.2) for c in colors]
colCycle = ['rgba'+str(elem) for elem in rgba]
# Make sure the colors run in cycles if there are more lines than colors
def next_col(cols):
    while True:
        for col in cols:
            yield col
line_color=next_col(cols=colCycle)
# plotly figure
fig = go.Figure()
# add line and shaded area for each series and its standard deviation
for i, col in enumerate(df):
    new_col = next(line_color)
    x = list(df.index.values+1)
    y1 = df[col]
    y1_upper = [(y + np.std(df[col])) for y in df[col]]
    y1_lower = [(y - np.std(df[col])) for y in df[col]]
    y1_lower = y1_lower[::-1]
    # standard deviation area
    fig.add_traces(go.Scatter(x=x+x[::-1],
                              y=y1_upper+y1_lower,
                              fill='tozerox',
                              fillcolor=new_col,
                              line=dict(color='rgba(255,255,255,0)'),
                              showlegend=False,
                              name=col))
    # line trace
    fig.add_traces(go.Scatter(x=x,
                              y=y1,
                              line=dict(color=new_col, width=2.5),
                              mode='lines',
                              name=col)
                   )
# set x-axis
fig.update_layout(xaxis=dict(range=[1,len(df)]))
fig.show()
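As a possible simplification of the color handling above, the hex-to-rgba conversion and the repeating color generator can be sketched with the standard library only. The helper name hex_to_rgba is hypothetical, and the three-color palette is just an illustrative stand-in for whichever hex sequence you pick from px.colors.qualitative.

```python
from itertools import cycle


def hex_to_rgba(hex_color, alpha):
    """Turn '#RRGGBB' into an 'rgba(r,g,b,a)' string usable as a fill color."""
    h = hex_color.lstrip('#')
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r},{g},{b},{alpha})"


# Illustrative palette; any list of hex color strings works here
palette = ['#636EFA', '#EF553B', '#00CC96']

# cycle() restarts from the first color once the palette is exhausted
fill_colors = cycle([hex_to_rgba(c, 0.2) for c in palette])

first_four = [next(fill_colors) for _ in range(4)]
print(first_four)
```

With three colors and four series, the fourth series receives the same fill as the first.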
I was able to come up with something similar. I post the code here to be used by someone else or for any suggestions for improvements.
import matplotlib
import random
import plotly.graph_objects as go
import numpy as np
#random color generation in plotly
hex_colors_dic = {}
rgb_colors_dic = {}
hex_colors_only = []
for name, hex in matplotlib.colors.cnames.items():
    hex_colors_only.append(hex)
    hex_colors_dic[name] = hex
    rgb_colors_dic[name] = matplotlib.colors.to_rgb(hex)
data = [[1, 3, 5, 4],
[2, 3, 5, 4],
[1, 1, 4, 5],
[2, 3, 5, 4]]
#calculating mean and standard deviation
mean=np.mean(data,axis=0)
std=np.std(data,axis=0)
#draw figure
fig = go.Figure()
c = random.choice(hex_colors_only)
fig.add_trace(go.Scatter(x=np.arange(4), y=mean+std,
mode='lines',
line=dict(color=c,width =0.1),
name='upper bound'))
fig.add_trace(go.Scatter(x=np.arange(4), y=mean,
mode='lines',
line=dict(color=c),
fill='tonexty',
name='mean'))
fig.add_trace(go.Scatter(x=np.arange(4), y=mean-std,
mode='lines',
line=dict(color=c, width =0.1),
fill='tonexty',
name='lower bound'))
fig.show()
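For reference, the band in the code above is the column-wise mean plus and minus the column-wise standard deviation. np.std defaults to the population standard deviation (ddof=0), so a dependency-free check of the same numbers could look like the sketch below; the variable names upper and lower are only illustrative.

```python
from statistics import fmean, pstdev

data = [[1, 3, 5, 4],
        [2, 3, 5, 4],
        [1, 1, 4, 5],
        [2, 3, 5, 4]]

# zip(*data) transposes the rows into columns, matching axis=0 in numpy
columns = list(zip(*data))

mean = [fmean(col) for col in columns]
std = [pstdev(col) for col in columns]  # population std, like np.std's default

upper = [m + s for m, s in zip(mean, std)]
lower = [m - s for m, s in zip(mean, std)]

print(mean)
print(upper)
print(lower)
```

If a sample standard deviation is wanted instead, statistics.stdev (or np.std with ddof=1) would give a slightly wider band.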
Great custom responses posted by others. In case someone is interested in code from the official plotly website, see here: https://plotly.com/python/continuous-error-bars/
I wrote a function to extend plotly.express.line with the same high-level interface as Plotly Express. The line function (source code below) is used in exactly the same way as plotly.express.line but allows for continuous error bands through the flag argument error_y_mode, which can be either 'band' or 'bar'. In the second case it produces the same result as the original plotly.express.line. Here is a usage example:
import plotly.express as px
df = px.data.gapminder().query('continent=="Americas"')
df = df[df['country'].isin({'Argentina','Brazil','Colombia'})]
df['lifeExp std'] = df['lifeExp']*.1 # Invent some error data...
for error_y_mode in {'band', 'bar'}:
    fig = line(
        data_frame = df,
        x = 'year',
        y = 'lifeExp',
        error_y = 'lifeExp std',
        error_y_mode = error_y_mode, # Here you say `band` or `bar`.
        color = 'country',
        title = f'Using error {error_y_mode}',
        markers = '.',
    )
    fig.show()
which produces the following two plots:
The source code of the line function that extends plotly.express.line is this:
import plotly.express as px
import plotly.graph_objs as go
def line(error_y_mode=None, **kwargs):
    """Extension of `plotly.express.line` to use error bands."""
    ERROR_MODES = {'bar','band','bars','bands',None}
    if error_y_mode not in ERROR_MODES:
        raise ValueError(f"'error_y_mode' must be one of {ERROR_MODES}, received {repr(error_y_mode)}.")
    if error_y_mode in {'bar','bars',None}:
        fig = px.line(**kwargs)
    elif error_y_mode in {'band','bands'}:
        if 'error_y' not in kwargs:
            raise ValueError(f"If you provide argument 'error_y_mode' you must also provide 'error_y'.")
        figure_with_error_bars = px.line(**kwargs)
        fig = px.line(**{arg: val for arg,val in kwargs.items() if arg != 'error_y'})
        for data in figure_with_error_bars.data:
            x = list(data['x'])
            y_upper = list(data['y'] + data['error_y']['array'])
            y_lower = list(data['y'] - data['error_y']['array'] if data['error_y']['arrayminus'] is None else data['y'] - data['error_y']['arrayminus'])
            color = f"rgba({tuple(int(data['line']['color'].lstrip('#')[i:i+2], 16) for i in (0, 2, 4))},.3)".replace('((','(').replace('),',',').replace(' ','')
            fig.add_trace(
                go.Scatter(
                    x = x+x[::-1],
                    y = y_upper+y_lower[::-1],
                    fill = 'toself',
                    fillcolor = color,
                    line = dict(
                        color = 'rgba(255,255,255,0)'
                    ),
                    hoverinfo = "skip",
                    showlegend = False,
                    legendgroup = data['legendgroup'],
                    xaxis = data['xaxis'],
                    yaxis = data['yaxis'],
                )
            )
        # Reorder data as said here: https://stackoverflow.com/a/66854398/8849755
        reordered_data = []
        for i in range(int(len(fig.data)/2)):
            reordered_data.append(fig.data[i+int(len(fig.data)/2)])
            reordered_data.append(fig.data[i])
        fig.data = tuple(reordered_data)
    return fig
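The reordering step at the end of the function is easier to follow on plain lists. After the loop, fig.data holds all line traces first and all band traces after them; the reorder interleaves them so that each band comes immediately before its own line and is therefore drawn underneath it. The sketch below uses strings as stand-ins for traces, and the function name bands_before_lines is hypothetical.

```python
def bands_before_lines(traces):
    """Given [line_1..line_n, band_1..band_n], return [band_1, line_1, ...]."""
    half = len(traces) // 2
    lines, bands = traces[:half], traces[half:]
    reordered = []
    for band, line in zip(bands, lines):
        reordered.append(band)  # drawn first, so it sits below
        reordered.append(line)  # drawn second, so it stays visible on top
    return reordered


traces = ['line Argentina', 'line Brazil', 'band Argentina', 'band Brazil']
print(bands_before_lines(traces))
```

This assumes exactly one band per line, which is why the function above only reorders in the 'band' branch.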

How to grid plot 2D categorical data

I have data that looks like:
Name X Y
A HIGH MID
B LOW LOW
C MID LOW
D HIGH MID
How can I plot this data in a 2-D diagram with a 3x3 grid, adding a random variation to the position of each data point (labelled with its name) so that the points have enough spacing between them?
So it should look somewhat like this:
I tried the following, but I don't know how to plot the values in between the grid lines rather than exactly on them, so that they do not overlap.
import pandas as pd
import matplotlib.pyplot as plt
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
### Plotting ###
fig,axs = plt.subplots()
axs.scatter(df.X,df.Y,label=df.name)
axs.set_xlabel('X')
axs.set_ylabel('Y')
for i,p in enumerate(df.name):
    axs.annotate(p, (df.X[i],df.Y[i]))
axs.grid()
axs.set_axisbelow(True)
fig.tight_layout()
plt.show()
resulting:
You can control the positions directly and change the labels on the axes. There are a few problems with your drawing because it does not take into account issues such as "what label will you have if more than one point falls at the same location?".
In any case here is a possible solution:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
pos = [0, 1, 2]
lbls = ["LOW", "MID", "HIGH"]
trans = {lbls[i]:pos[i] for i in range(len(pos))}
mat = np.zeros((3, 3), dtype="U10") # This is limited to 10 characters
xxs = []
yys = []
offset = 0.05
for i in range(df.shape[0]):
    xc, yc = trans[df.X[i]], trans[df.Y[i]]
    if mat[xc, yc]=="":
        mat[xc, yc] = df.name[i]
    else:
        mat[xc, yc] = mat[xc, yc] + ";" + df.name[i]
    xxs.append(xc)
    yys.append(yc)
fig,axs = plt.subplots()
axs.scatter(xxs, yys)
for i in range(df.shape[0]):
    name = mat[xxs[i], yys[i]]
    axs.text(xxs[i]+offset, yys[i]+offset, name)
axs.set_xticks(pos)
axs.set_xticklabels(lbls)
axs.set_yticks(pos)
axs.set_yticklabels(lbls)
for i in pos:
    axs.axhline(pos[i]-0.5, color="black")
    axs.axvline(pos[i]-0.5, color="black")
axs.set_xlim(-0.5, 2.5)
axs.set_ylim(-0.5, 2.5)
plt.show()
This results in the following image:
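The solution above merges the names of points that share a cell instead of moving them apart. If the random variation asked for in the question is preferred, a minimal sketch is to add a bounded random shift to each cell centre. The names jittered_points, max_shift and seed are hypothetical, and plain random jitter only makes overlaps less likely; it does not guarantee a minimum spacing.

```python
import random

LEVELS = {"LOW": 0, "MID": 1, "HIGH": 2}


def jittered_points(rows, max_shift=0.3, seed=0):
    """Return (name, x, y) with x and y shifted randomly inside their grid cell."""
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    points = []
    for name, x_level, y_level in rows:
        x = LEVELS[x_level] + rng.uniform(-max_shift, max_shift)
        y = LEVELS[y_level] + rng.uniform(-max_shift, max_shift)
        points.append((name, x, y))
    return points


rows = [("A0", "LOW", "LOW"), ("A", "MID", "MID"), ("B", "LOW", "MID"),
        ("C", "MID", "HIGH"), ("D", "LOW", "MID"), ("E", "HIGH", "HIGH")]

points = jittered_points(rows)
for name, x, y in points:
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The x and y values can then be passed to axs.scatter and axs.text in place of xxs and yys. Keeping max_shift below 0.5 keeps every point inside its own cell, given the grid lines at -0.5, 0.5, 1.5 and 2.5.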

Unsure how to use colormap with Folium marker plot

I have a dataframe with latitude, longitude, and power percentage. I want to do something very simple but not sure how: apply a colormap to color the data points based on their percentage. So 90% is red and 100% is blue. I have created both a successful map and colormap, but not sure how to proceed next.
import folium
import pandas as pd
import folium.plugins
import branca
import branca.colormap as cm
data = [
[33.823400, -118.12194, 99.23],
[33.823500, -118.12294, 95.23],
[33.823600, -118.12394, 91.23],
[33.823700, -118.12494, 90.00]
]
df = pd.DataFrame(data, columns=['latitude','longitude','power'])
x_start = (df['latitude'].max() + df['latitude'].min()) / 2
y_start = (df['longitude'].max() + df['longitude'].min()) / 2
start_coord = (x_start, y_start)
map = folium.Map(location=start_coord, zoom_start=12)
lat = list(df.latitude)
lon = list(df.longitude)
for loc in zip(lat, lon):
    folium.Circle(
        location=loc,
        radius=10,
        #fill=True,
        #color='blue',
        #fill_opacity=0.7
    ).add_to(map)
display(map)
colormap = cm.LinearColormap(colors=['red','lightblue'], index=[90,100],vmin=90,vmax=100)
colormap
I'm in a rush, but this is how I've done it in the past: create the colormap, then call it with a value to get the corresponding color, e.g. colormap(95).
import folium
import pandas as pd
import folium.plugins
import branca
import branca.colormap as cm
data = [
[33.823400, -118.12194, 99.23],
[33.823500, -118.12294, 95.23],
[33.823600, -118.12394, 91.23],
[33.823700, -118.12494, 90.00]
]
df = pd.DataFrame(data, columns=['latitude','longitude','power'])
x_start = (df['latitude'].max() + df['latitude'].min()) / 2
y_start = (df['longitude'].max() + df['longitude'].min()) / 2
start_coord = (x_start, y_start)
colormap = cm.LinearColormap(colors=['red','lightblue'], index=[90,100],vmin=90,vmax=100)
map = folium.Map(location=start_coord, zoom_start=12)
lat = list(df.latitude)
lon = list(df.longitude)
pow = list(df.power)
for loc, p in zip(zip(lat, lon), pow):
    folium.Circle(
        location=loc,
        radius=10,
        fill=True,
        color=colormap(p),
        #fill_opacity=0.7
    ).add_to(map)
map.add_child(colormap)
display(map)
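To see what calling the colormap with a power value amounts to, here is a dependency-free sketch of linear interpolation between two colors. It is not branca's implementation, only the underlying idea; the function name lerp_color is hypothetical, and the RGB triple for lightblue is the standard CSS value.

```python
def lerp_color(value, vmin, vmax, start_rgb, end_rgb):
    """Blend two RGB colors linearly; values outside [vmin, vmax] are clamped."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    channels = [round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


RED = (255, 0, 0)
LIGHTBLUE = (173, 216, 230)  # CSS "lightblue"

for power in (90.0, 95.0, 100.0):
    print(power, lerp_color(power, 90, 100, RED, LIGHTBLUE))
```

A power of 90 maps to pure red, 100 to light blue, and values in between to a proportional blend, which is the behavior the question asks for.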
