Problems with long lists as input of set_xticklabels (Misaligned) - python

I am following the NMT with attention tutorial (https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb) and applying it to my own use case. Unfortunately, when I try to plot the attention weights, the x-axis labels become misaligned if the input is too long (e.g. 14 tokens instead of 7).
In this code block, the plotting works as expected:
import numpy as np
from matplotlib import pyplot as plt
def plot_attention():
    attention = np.array([[7.78877574e-10, 4.04739769e-10, 6.65854022e-05, 1.63362725e-04,
                           2.85054208e-04, 8.50252633e-04, 4.58042100e-02],
                          [9.23501700e-02, 5.69618285e-01, 1.80586591e-01, 9.78111699e-02,
                           2.71992851e-02, 9.59911197e-03, 2.54837354e-03]])
    sentence = ['<start>', 'hace', 'mucho', 'frio', 'aqui', '.', '<end>']
    predicted_sentence = ['it', 's']
    fig = plt.figure(figsize=(10,10))
    ax = fig.add_subplot(1, 1, 1)
    ax.matshow(attention, cmap='viridis')
    fontdict = {'fontsize': 14}
    ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)
    plt.show()
plot_attention()
but with more elements in the list "sentence", it seems to misalign:
def plot_attention():
    attention = np.array([[7.78877574e-10, 4.04739769e-10, 6.65854022e-05, 1.63362725e-04,
                           2.85054208e-04, 8.50252633e-04, 4.58042100e-02, 7.78877574e-10, 4.04739769e-10, 6.65854022e-05, 1.63362725e-04,
                           2.85054208e-04, 8.50252633e-04, 4.58042100e-02],
                          [9.23501700e-02, 5.69618285e-01, 1.80586591e-01, 9.78111699e-02,
                           2.71992851e-02, 9.59911197e-03, 2.54837354e-03, 7.78877574e-10, 4.04739769e-10, 6.65854022e-05, 1.63362725e-04,
                           2.85054208e-04, 8.50252633e-04, 4.58042100e-02]])
    sentence = ['<start>', 'hace', 'mucho', 'frio', 'aqui', '.', '<end>', '<start>', 'hace', 'mucho', 'frio', 'aqui', '.', '<end>']
    predicted_sentence = ['it', 's']
    fig = plt.figure(figsize=(20,10))
    ax = fig.add_subplot(1, 1, 1)
    ax.matshow(attention, cmap='viridis')
    fontdict = {'fontsize': 14}
    ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)
    plt.show()
plot_attention()
I expect the x-axis labels to be perfectly aligned and all of them to be shown (not just every second one, as is the case right now).

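The problem is that you are only setting the tick labels without specifying the positions of the ticks. With 14 columns, matplotlib's automatic tick placement only puts a tick on every second column, so the labels are assigned to the wrong positions. Whenever you modify the tick labels, you should first set the tick positions. So, do the following in your code: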
ax.set_xticks(range(len(sentence)))
ax.set_yticks(range(len(predicted_sentence)))
ax.set_xticklabels(sentence, fontdict=fontdict, rotation=90)
ax.set_yticklabels(predicted_sentence, fontdict=fontdict)
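For reference, here is a minimal self-contained sketch of the same fix. It assumes random placeholder weights instead of real attention values, and the function signature (taking the arrays as arguments and returning the figure) is only chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import numpy as np
from matplotlib import pyplot as plt

def plot_attention(attention, sentence, predicted_sentence):
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.matshow(attention, cmap='viridis')
    fontdict = {'fontsize': 14}
    # Pin one tick to every column and every row first ...
    ax.set_xticks(range(len(sentence)))
    ax.set_yticks(range(len(predicted_sentence)))
    # ... then label exactly those ticks (no leading '' placeholder needed)
    ax.set_xticklabels(sentence, fontdict=fontdict, rotation=90)
    ax.set_yticklabels(predicted_sentence, fontdict=fontdict)
    return fig, ax

sentence = ['<start>', 'hace', 'mucho', 'frio', 'aqui', '.', '<end>'] * 2
predicted_sentence = ['it', 's']
# Placeholder weights with one row per predicted word, one column per input token
attention = np.random.rand(len(predicted_sentence), len(sentence))
fig, ax = plot_attention(attention, sentence, predicted_sentence)
```

Because the tick positions are fixed explicitly, the number of labels always matches the number of ticks, regardless of how long the input sentence is.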

Related

Matplotlib: How to plot errorbar plots based on a color map of third category column(Not X and Y)

I'm working on an experimentation personal project. I have the following dataframes:
treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
,'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421]
,'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763]
,'p': [0.287025, 0.172464, 0.008328, 0.368466, 0.527718]
,'significance': ['insignificant', 'insignificant', 'significant', 'insignificant', 'insignificant']})
pre_treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
,'diff_pct': [0.137174, 0.111005, 0.169490, -0.152929, -0.450667]
,'me_pct': [1.419080, 0.207081, 0.202014, 1.494588, 1.901672]
,'p': [0.849734, 0.293427, 0.100091, 0.841053, 0.642303]
,'significance': ['insignificant', 'insignificant', 'insignificant', 'insignificant', 'insignificant']})
I have used the below code to construct errorbar plot, which works fine:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))
    plt.errorbar(df[df['significance'] == 'significant']["diff_pct"], df[df['significance'] == 'significant']["kpi"], xerr = df[df['significance'] == 'significant']["me_pct"], color = '#d62828', fmt = 'o', capsize = 10)
    plt.errorbar(df[df['significance'] == 'insignificant']["diff_pct"], df[df['significance'] == 'insignificant']["kpi"], xerr = df[df['significance'] == 'insignificant']["me_pct"], color = '#2a9d8f', fmt = 'o', capsize = 10)
    plt.legend(['significant', 'insignificant'], loc = 'best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle = '--', ymin=0.0, ymax=1)
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
for which the output of confint_plot(treat_repr) looks like this:
Now if I run the same plot function on a pre-treatment dataframe confint_plot(pre_treat_repr), the plot looks like this:
We can see from the two plots that the order of the variables changes from the first plot to the second depending on whether a KPI is significant (that's what I figured out after many attempts).
Questions:
How do I make a change to the code to dynamically allocate color maps without changing the order of the kpis on y axis?
Currently I have manually typed in the legends. Is there a way to dynamically populate legends?
Appreciate the help!
Because you plot the significant KPIs first, they will always appear at the bottom of the chart: matplotlib assigns positions on a categorical axis in the order in which the labels are first plotted. How you solve this and keep the desired colors depends on the kind of chart you are making with matplotlib. With scatter charts, you can pass an array of colors in the c parameter. Error bar charts do not offer that functionality.
One way to work around that is to sort your KPIs, give them numeric positions (0, 1, 2, 3, ...), plot them twice (once for the significant ones, once for the insignificant ones) and re-tick them:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))
    # Sort the KPIs alphabetically. You can change the order to anything
    # that fits your purpose
    df_plot = df.sort_values('kpi').assign(y=range(len(df)))
    for significance in ['significant', 'insignificant']:
        cond = df_plot['significance'] == significance
        color = '#d62828' if significance == 'significant' else '#2a9d8f'
        # Plot them in their numeric positions first
        plt.errorbar(
            df_plot.loc[cond, 'diff_pct'], df_plot.loc[cond, 'y'],
            xerr=df_plot.loc[cond, 'me_pct'], label=significance,
            fmt='o', capsize=10, c=color
        )
    plt.legend(loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle = '--', ymin=0.0, ymax=1)
    # Re-tick to show the KPIs
    plt.yticks(df_plot['y'], df_plot['kpi'])
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
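As a variation on the function above, the legend entries can be taken from the data itself by grouping on the significance column, so only groups that actually occur are drawn and labeled. This is a sketch under assumptions: the `palette` argument is a hypothetical helper that is not part of the original code, and the function returns the Axes instead of calling plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

def confint_plot(df, palette=None):
    # `palette` (hypothetical) maps each significance label to a color
    palette = palette or {'significant': '#d62828', 'insignificant': '#2a9d8f'}
    fig, ax = plt.subplots(figsize=(9, 5))
    # Fixed numeric y positions keep the KPI order independent of the grouping
    df_plot = df.sort_values('kpi').assign(y=range(len(df)))
    # One errorbar call per group present in the data; its label feeds the legend
    for label, group in df_plot.groupby('significance', sort=False):
        ax.errorbar(group['diff_pct'], group['y'], xerr=group['me_pct'],
                    label=label, fmt='o', capsize=10, color=palette[label])
    ax.axvline(0, c='red', alpha=0.5, linestyle='--')
    ax.set_yticks(df_plot['y'])
    ax.set_yticklabels(df_plot['kpi'])
    ax.legend(loc='best')
    return ax

treat_repr = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763],
    'significance': ['insignificant', 'insignificant', 'significant',
                     'insignificant', 'insignificant']})
ax = confint_plot(treat_repr)
```

With a dataframe that contains only insignificant KPIs, the loop runs once and the legend shows a single entry, while the y-axis order stays the same.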

Seaborn violin plots don't align with x-axis labels

I am attempting to build a violin plot to illustrate depth on the y-axis and distance away from a known point on the x-axis. I am able to get the x-axis labels spaced appropriately based on the variable distances, but I am unable to get the violin plots to align with them. The plots appear to be shifted toward the y-axis. Any help would be appreciated. My code is below:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
path = r'O:\info1.csv'
df = pd.read_csv(path)
item = ['a', 'b', 'c', 'd', 'e', 'f']
dist = [450, 1400, 2620, 3100, 3830, 4940]
plt.rcParams.update({'font.size': 15})
fig, axes1 = plt.subplots(figsize=(20,10))
axes1 = sns.violinplot(x='item', y='surface', data=df, hue = 'item', order = (item))
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth')
axes1.set_xticks(dist)
plt.xticks(rotation=20)
plt.show()
Example dataset:
You cannot use seaborn's violinplot for this because, according to its documentation:
This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, … n) on the relevant axis, even
when the data has a numeric or date type.
So if you draw it directly with seaborn, it is categorical:
sns.violinplot(x='dist', y='surface', data=df, hue = 'item',dodge=False,cut=0)
To place the violins according to distance, you need to use matplotlib directly. First we get the data out in the required format and define a color palette:
surface_values = list([np.array(value) for name,value in df.groupby('item')['surface']])
dist_values = df.groupby('item')['dist'].agg("mean")
pal = ["crimson","darkblue","rebeccapurple"]
You need to set the width, provide the distance, and for the inner "box", we modify the code from here:
fig, ax = plt.subplots(1, 1,figsize=(8,4))
parts = ax.violinplot(surface_values,widths=200,positions=dist_values,
                      showmeans=False, showmedians=False,showextrema=False)
for i,pc in enumerate(parts['bodies']):
    pc.set_facecolor(pal[i])
    pc.set_edgecolor('black')
    pc.set_alpha(1)
quartile1, medians, quartile3 = np.percentile(surface_values, [25, 50, 75], axis=1)
whiskers = np.array([
    adjacent_values(sorted_array, q1, q3)
    for sorted_array, q1, q3 in zip(surface_values, quartile1, quartile3)])
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]
inds = dist_values
ax.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)
If you don't need the inner box, you can just call plt.violin ...
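The snippet above calls adjacent_values without defining it. A minimal sketch of such a helper is given below, modeled on the one in matplotlib's "Violin plot customization" gallery example (it is an assumption that this is the code the answer refers to). It expects each array to be sorted in ascending order, so the groups would need to be passed through np.sort first. The sample values are made up.

```python
import numpy as np

def adjacent_values(vals, q1, q3):
    """Return whisker ends at most 1.5 * IQR beyond the quartiles.

    `vals` must be sorted in ascending order.
    """
    iqr = q3 - q1
    # Upper whisker: no lower than q3, no higher than the largest value
    upper = np.clip(q3 + 1.5 * iqr, q3, vals[-1])
    # Lower whisker: no lower than the smallest value, no higher than q1
    lower = np.clip(q1 - 1.5 * iqr, vals[0], q1)
    return lower, upper

# Made-up sample just to exercise the helper
sample = np.sort(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 50.0]))
q1, median, q3 = np.percentile(sample, [25, 50, 75])
low, high = adjacent_values(sample, q1, q3)
```

Note that np.percentile with axis=1, as used in the answer, only works when every group has the same number of observations; for groups of different sizes the quartiles would have to be computed per group, as in the sample above.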
Thanks for including a bit of data.
To change your plot, remove the item = [a,b...] and dist = [] arrays from your code and adjust how the item and dist variables are used. The x-axis ticks set with axes1.set_xticks also need a bit of tweaking to get what you're looking for.
Example 1:
Removed the two arrays that were creating the plot you were seeing before; the violinplot call is otherwise unchanged.
# item = ['a', 'b', 'c', 'd', 'e', 'f'] * Removed
# dist = [450, 1400, 2620, 3100, 3830, 4940] * Removed
plt.rcParams.update({'font.size': 15})
fig, axes1 = plt.subplots(figsize=(20,10))
axes1 = sns.violinplot(x='item', y='surface', data=df, hue = 'item', inner = 'box')
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth')
#axes1.set_xticks(dist) * Removed
plt.xticks(rotation=20)
plt.show()
Inside each curve, there is a black shape with a white dot inside. This is the miniature box plot mentioned above. If you'd like to remove the box plot, you can set the inner = None parameter in the violinplot call to simplify the look of the final visualization.
Example 2:
Put dist on your x-axis in place of the xticks.
plt.rcParams.update({'font.size': 15})
plt.subplots(figsize=(20,10))
# Put 'dist' as your x input, keep your categorical variable (hue) equal to 'item'
axes1 = sns.violinplot(data = df, x = 'dist', y = 'surface', hue = 'item', inner = 'box');
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth');
I'm not sure whether the items and the distances you are working with have a relationship you want to show on the x-axis, or whether you just want to use those integers as tick marks for that axis. If there is an important relationship between the item and the dist, you could use a dictionary new_dict = {450: 'a', 1400: 'b', 2620: 'c' ...
Hope you find this helpful.

Inset plots with simple plt plots

I am trying to add an inset plot for the Holding_cost and Backlogging_cost lines of the graph below. When I searched for inset plots, I saw that many of the examples use subplots with (ax, figure) definitions. Since I have some code written specifically against the plt. interface, I do not want to migrate this figure to a subplot version. Is there a way to embed an inset plot under these circumstances?
from matplotlib import rc
import matplotlib.pyplot as plt
from labellines import labelLine, labelLines
rc('text', usetex=True)
plt.rcParams['text.latex.preamble']=[r"\usepackage{amsmath}",r"\usepackage[mathscr]{euscript}"]
plt.figure(figsize=(8, 8))
plt.plot(range(ksize),Cost_sum, '-k', label = 'Total',linewidth=3)
plt.plot(range(ksize),Build_cost, '-',c='seagreen', label = 'Build',linewidth=3)
plt.plot(range(ksize),Downtime_cost, '-', c='indigo', label = 'Downtime',linewidth=3)
plt.plot(range(ksize),Purchasing_cost, '-', c='crimson', label = 'Purchasing',linewidth=3)
plt.plot(range(ksize),Holding_cost, '-', c='orange', label = 'Holding',linewidth=3)
plt.plot(range(ksize),Backlogging_cost, '-', c='turquoise', label = 'Backlogging',linewidth=3)
plt.plot([k],[min(Cost_sum)], 'k', marker='*', mfc='none', markersize=14)
plt.text(k*0.98, min(Cost_sum)*1.04, r'$k^*$')
plt.xticks(list(plt.xticks()[0]) + [k])
lines = plt.gca().get_lines()
labelLine(lines[4],11,label='{}'.format(lines[4].get_label()),align = False,fontsize=16,zorder=2.5,va='bottom')
labelLine(lines[0],15,label='{}'.format(lines[0].get_label()),align = False,fontsize=16,zorder=2.5)
labelLine(lines[1],15,label='{}'.format(lines[1].get_label()),align = False,fontsize=16,zorder=2.5)
labelLine(lines[2],15,label='{}'.format(lines[2].get_label()),align = False,fontsize=16,zorder=2.5)
labelLine(lines[3],13,label='{}'.format(lines[3].get_label()),align = False,fontsize=16,zorder=2.5)
labelLines([lines[i] for i in range(len(lines))
if i not in [0,1,2,3,4]],zorder=2.5,align=False,fontsize=16,xvals=(15, 20),va='top')
plt.plot([k,k],[-max(Cost_sum)*0.05,min(Cost_sum)], '--k', dashes=(8, 6))
plt.ticklabel_format(style='sci', axis='y', scilimits=(0,0))
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.rc('font', size=20)
plt.xlim(left=-1,right=ksize*1.01)
plt.ylim(bottom = -max(Cost_sum)*0.05, top = max(Cost_sum)*1.05)
plt.xlabel(r'$k$', fontsize=20)
plt.ylabel(r'Annual Sourcing Cost (\$)', fontsize=20)
plt.show()
Here is the data to reproduce the attached figure.
(k,ksize,Cost_sum,Build_cost,Downtime_cost,Purchasing_cost,Holding_cost,Backlogging_cost) = (5, 21, [4566185.84958104, 3851831.0572200846, 3565535.616370524, 3395959.546010131, 3335310.1252832105, 3329129.801671144, 3348670.023232867, 3396492.9660517015, 3466368.5698340936, 3554060.0211574556, 3651792.419066521, 3758257.621291115, 3869866.130740931, 3985036.500754009, 4102434.354938996, 4221042.0030950345, 4340432.879293293, 4460167.615843281, 4580058.451828075, 4700015.099763582, 4819998.516231451], [0, 177222.2678574566, 358430.74126313033, 498693.32509609073, 625885.403095768, 746604.6981917183, 873034.7125363777, 975063.4720472957, 1049769.8471082426, 1145978.9429791428, 1199143.42108458, 1240852.8140335542, 1269882.4509229255, 1287853.138858912, 1308162.0194890695, 1311076.0940421396, 1312594.5371365794, 1312594.537136579, 1312594.537136579, 1312594.5371365792, 1312594.537136579], [0, 141129.06486431422, 198208.5540829755, 222573.40844582563, 259146.4212700642, 287560.0302637707, 320215.10890425567, 349656.43658995605, 368483.16599147196, 395395.4583382164, 410688.11784972803, 419781.9418066965, 432130.2205095227, 444005.25280786975, 451695.2201472967, 454782.15603979095, 456688.3693602677, 456423.1059102573, 456313.94189504976, 456270.5898305575, 456254.00629842596], [4541171.853867372, 3388943.963490517, 2745797.4942291463, 2296277.8906481406, 1953109.7381211254, 1678321.807949813, 1420580.3170326485, 1217326.2795553817, 1074452.1288599337, 919861.1429883543, 829741.2108432015, 765975.1717142115, 716516.4480671097, 682606.0612729619, 652138.545832702, 644789.7371737638, 640812.2765522903, 640812.2765522903, 640812.2765522903, 640812.2765522903, 640812.2765522903], [22393.016948892902, 21960.667293075123, 20795.12848139708, 16300.577174067963, 15257.966173544293, 14798.161857120422, 13090.785835092029, 12744.159873509096, 12026.756694177322, 11342.598968663648, 10753.439568233127, 10274.278741187029, 10018.226634046483, 9465.36734027374, 9350.039444208951, 
9312.956511300557, 9259.41257276402, 9259.41257276402, 9259.41257276402, 9259.41257276402, 9259.41257276402], [2620.9787647758053, 2575.0937147202317, 2303.69831387141, 2114.3446460037326, 1910.5966227084793, 1845.1034087196194, 1749.098924491782, 1702.6179855582488, 1636.671180267143, 1481.8778830783633, 1466.2297207771976, 1373.4149954678835, 1318.7846073251453, 1106.6804739931142, 1088.530025718579, 1081.0593280402318, 1078.2836713916126, 1078.2836713916126, 1078.2836713916126, 1078.2836713916126, 1078.2836713916126])
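One possible approach that stays within the pyplot interface is sketched below. It relies on plt.axes with a (left, bottom, width, height) rectangle in figure coordinates, which adds a second Axes on top of the current figure. The cost series here are made-up stand-ins rather than the data above, and the inset position is an arbitrary choice that would need adjusting for the real figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Made-up stand-ins for Cost_sum, Holding_cost and Backlogging_cost
ksize = 21
x = range(ksize)
total_cost = [4.5e6 - 1.2e5 * i + 9e3 * i * i for i in x]
holding_cost = [22000 - 600 * i for i in x]
backlogging_cost = [2600 - 70 * i for i in x]

plt.figure(figsize=(8, 8))
plt.plot(x, total_cost, '-k', label='Total', linewidth=3)
plt.plot(x, holding_cost, '-', c='orange', label='Holding', linewidth=3)
plt.plot(x, backlogging_cost, '-', c='turquoise', label='Backlogging', linewidth=3)
plt.xlabel('k')
main_ax = plt.gca()

# Add a new Axes given as (left, bottom, width, height) in figure
# coordinates (0 to 1); it becomes the current Axes for plt.* calls
inset_ax = plt.axes((0.45, 0.35, 0.4, 0.3))
plt.plot(x, holding_cost, '-', c='orange', linewidth=2)
plt.plot(x, backlogging_cost, '-', c='turquoise', linewidth=2)
plt.title('Holding / Backlogging', fontsize=10)

# Hand control back to the main Axes for any further plt.* calls
plt.sca(main_ax)
```

After plt.sca(main_ax), existing calls such as plt.xlim, plt.xticks or the labelLines code would again act on the main plot rather than the inset.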

Python: Highlighting, marking or indicating point in (scatter) plot

UPDATE
Trying some more, I managed to run this code without error:
from matplotlib.pyplot import figure
dict = pd.DataFrame({"Return": mkw_returns, "Standard Deviation": mkw_stds})
dict.head()
#plt.annotate("Sharpe Ratio", xytext=(0.5,0.5), xy=(0.03,0.03) , arrowprops=dict(facecolor='blue', shrink=0.01, width=220)) # arrowprops={width = 3, "facecolor":
#dict.plot(x="Standard Deviation", y = "Return", kind="scatter", figsize=(10,6))
#plt.xlabel("Standard Deviations")
#plt.ylabel("log_Return YoY")
figure(num=None, figsize=(15, 10), dpi=100, facecolor='w', edgecolor='k')
plt.plot( 'Standard Deviation', 'Return', data=dict, linestyle='none', marker='o')
plt.xlabel("Standard Deviations")
plt.ylabel("log_Return YoY")
# Annotate with text + Arrow
plt.annotate(
# Label and coordinate
'This is a Test', xy=(0.01, 1), xytext=(0.01, 1), color= "r", arrowprops={"facecolor": 'black', "shrink": 0.05}
)
This now works, yay! Can anybody shed some light on this issue? I'm not sure why it suddenly started working. Thank you :)
Also, how would I simply mark a point, instead of using the arrow?
Problem: Cannot figure out how to mark/select/highlight a specific point in my scatter graph
(Python 3 Beginner)
So my goal is to highlight one or more points in a scatter graph with some text by it or supplied by a legend.
https://imgur.com/a/VWeO1EH
(not enough reputation to post images, sorry)
dict = pd.DataFrame({"Return": mkw_returns, "Standard Deviation": mkw_stds})
dict.head()
#plt.annotate("Sharpe Ratio", xytext=(0.5,0.5), xy=(0.03,0.03) , arrowprops=dict(facecolor='blue', shrink=0.01, width=220)) # arrowprops={width = 3, "facecolor":
dict.plot(x="Standard Deviation", y = "Return", kind="scatter", figsize=(10,6))
plt.xlabel("Standard Deviations")
plt.ylabel("log_Return YoY")
The commented-out plt.annotate call gives the error specified below.
Specifically, I would like to select the Sharpe ratio, but for now I'm happy if I manage to select any point in the scatter graph.
I'm truly confused about how to work with matplotlib, so any help is welcome.
I tried the following solutions I found online:
I)
This shows a simple way to use annotate in a plot, to mark a specific point by an arrow.
https://www.youtube.com/watch?v=ItHDZEE5wSk
However, the pd.DataFrame environment does not like annotate and I get the error:
TypeError: 'DataFrame' object is not callable
II)
Since I'm running into issues with annotate in a DataFrame environment, I looked at the following solution:
Annotate data points while plotting from Pandas DataFrame
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)},
index=list(string.ascii_lowercase[:10]))
fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax, figsize=(10,6))
for k, v in df.iterrows():
    ax.annotate(k, v)
However, the resulting plot does not show any annotation whatsoever when applied to my problem, only this very long horizontal scroll bar:
https://imgur.com/a/O8ykmeg
III)
Further, I stumbled upon this solution, to use a marker instead of an arrow,
Matplotlib annotate with marker instead of arrow
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,1,1,2,10,2,1,1,1,1]
line, = ax.plot(x, y)
ymax = max(y)
xpos = y.index(ymax)
xmax = x[xpos]
# Add dot and corresponding text
ax.plot(xmax, ymax, 'ro')
ax.text(xmax, ymax+2, 'local max:' + str(ymax))
ax.set_ylim(0,20)
plt.show()
However, the code does absolutely nothing when applied to my situation like so:
dict = pd.DataFrame({"Return": mkw_returns, "Standard Deviation": mkw_stds})
dict.head()
plt.annotate("Sharpe Ratio", xytext=(0.5,0.5), xy=(0.03,0.03) , arrowprops=dict(facecolor='blue', shrink=0.01, width=220)) # arrowprops={width = 3, "facecolor":
dict.plot(x="Standard Deviation", y = "Return", kind="scatter", figsize=(10,6))
plt.xlabel("Standard Deviations")
plt.ylabel("log_Return YoY")
ymax = max(y)
xpos = y.index(ymax)
xmax = x[xpos]
# Add dot and corresponding text
ax.plot(xmax, ymax, 'ro')
ax.text(xmax, ymax+2, 'local max:' + str(ymax))
ax.set_ylim(0,20)
plt.show()
IV)
Lastly, I tried a solution that apparently works flawlessly with an arrow in a pd.dataframe,
https://python-graph-gallery.com/193-annotate-matplotlib-chart/
# Library
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Basic chart
df=pd.DataFrame({'x': range(1,101), 'y': np.random.randn(100)*15+range(1,101) })
plt.plot( 'x', 'y', data=df, linestyle='none', marker='o')
# Annotate with text + Arrow
plt.annotate(
# Label and coordinate
'This point is interesting!', xy=(25, 50), xytext=(0, 80),
# Custom arrow
arrowprops=dict(facecolor='black', shrink=0.05)
)
however running this code yields me the same error as above:
TypeError: 'DataFrame' object is not callable
Version:
import sys; print(sys.version)
3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
Sorry for the wall of text, but I thought it's best to have everything I tried together in one post.
Any help is appreciated, thank you!
I think one solution is the following, as posted above as the "UPDATE":
UPDATE
Trying some more, I managed to run this code without error:
from matplotlib.pyplot import figure
dict = pd.DataFrame({"Return": mkw_returns, "Standard Deviation": mkw_stds})
dict.head()
#plt.annotate("Sharpe Ratio", xytext=(0.5,0.5), xy=(0.03,0.03) , arrowprops=dict(facecolor='blue', shrink=0.01, width=220)) # arrowprops={width = 3, "facecolor":
#dict.plot(x="Standard Deviation", y = "Return", kind="scatter", figsize=(10,6))
#plt.xlabel("Standard Deviations")
#plt.ylabel("log_Return YoY")
figure(num=None, figsize=(15, 10), dpi=100, facecolor='w', edgecolor='k')
plt.plot( 'Standard Deviation', 'Return', data=dict, linestyle='none', marker='o')
plt.xlabel("Standard Deviations")
plt.ylabel("log_Return YoY")
# Annotate with text + Arrow
plt.annotate(
# Label and coordinate
'This is a Test', xy=(0.01, 1), xytext=(0.01, 1), color= "r", arrowprops={"facecolor": 'black', "shrink": 0.05}
)
One question remains: how can I use a different marker or color and describe it in the legend instead?
Thanks in advance :)
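A note on the error and the remaining question. The TypeError: 'DataFrame' object is not callable most likely comes from naming the DataFrame dict, which shadows Python's builtin dict, so that arrowprops=dict(...) ends up calling the DataFrame. The "UPDATE" version avoids this because it passes a literal {...} instead. The sketch below renames the variable to df and marks one point with its own marker and legend entry. The numbers are made up in place of mkw_returns and mkw_stds, and picking the point with the highest return-to-deviation ratio is only an illustrative stand-in for the Sharpe ratio.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers standing in for mkw_returns / mkw_stds
df = pd.DataFrame({"Return": [0.02, 0.05, 0.03, 0.08],
                   "Standard Deviation": [0.10, 0.20, 0.15, 0.30]})

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df["Standard Deviation"], df["Return"], label="Portfolios")

# Pick the row to highlight; here simply the highest return per unit of deviation
best = (df["Return"] / df["Standard Deviation"]).idxmax()
best_x = df.loc[best, "Standard Deviation"]
best_y = df.loc[best, "Return"]

# Draw the highlighted point again with its own marker, color and legend label
ax.scatter(best_x, best_y, color="red", marker="*", s=200,
           label="Highest return/risk")

# The builtin dict() works here because no variable named dict shadows it
ax.annotate("highlighted", xy=(best_x, best_y), xytext=(10, 10),
            textcoords="offset points", arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("Standard Deviations")
ax.set_ylabel("log_Return YoY")
ax.legend()
```

If the name dict has already been overwritten in a running session, restarting the interpreter or running del dict restores the builtin.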

(python matplotlib ) How to change color of each lollipop in a lollipop plot (ax.stem)

I am using ax.stem to draw a lollipop plot in Python. However, I find it difficult to assign a different color to each lollipop
as shown here
As you can see I have 2 categories "GWP" & "FDP".
In my project, each category should be divided into 4 subcategories "ingredient", "Waste", "energy" and "infrastructure". Therefore, I want to assign them different colors to indicate the subcategory.
There is a solution proposed here: https://python-graph-gallery.com/181-custom-lollipop-plot/
But this only teaches you how to change color for all lollipops.
And there is another solution: https://python-graph-gallery.com/183-highlight-a-group-in-lollipop/
But this one doesn't really use ax.stem.
Please let me know how to assign different colors to each lollipop.
(Also, I don't know why my plot is displayed upside down here. The y-axis values are not in order, and there is one dot not connected by a line. It displays correctly in my original plot, though.)
Here is my code:
#%%
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# my dataset
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)']).reshape(5,1)
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02]).reshape(5,1)
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02]).reshape(5,1)
original_data = np.concatenate((types, gwp, fdp), axis = 1)
# produce dataframe
data = pd.DataFrame(original_data, columns = columns)
# types GWP100 (year) FDP (year)
#0 Total (ingredient) per kg 286.982617 135.455867
#1 Total (waste) per kg 216.824983 7.02868322
#2 energy (whole process) per kg 43.892076 12.662256
#3 Infrastructure 0.06024 0.0164568
#4 Total (Total) 547.759916 155.163263
#%% graph
fig = plt.figure(1, figsize =(8,6))
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
ax1.stem(gwp)
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
ax2.stem(fdp)
ax2.set_ylabel(r'kg oil-Eq', fontsize = 10)
ax2.set_xlabel('FDP', fontsize=10)
The stem currently consists of a couple of lines and a "line" consisting of dots on top. It has no option to colorize the lines separately within its interface.
You may replicate the stem plot to draw the lines manually with the color you like.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)'])
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02])
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02])
# produce dataframe
data = pd.DataFrame([types,gwp,fdp], index = columns).transpose()
colors = list("bgryk")
fig, (ax, ax2) = plt.subplots(ncols=2)
for t, y, c in zip(data["types"], data["GWP100 (year)"],colors):
    ax.plot([t,t], [0,y], color=c, marker="o", markevery=(1,2))
ax.set_ylim(0,None)
plt.setp(ax.get_xticklabels(), rotation=90)
fig.tight_layout()
plt.show()
A more efficient solution is of course to use a LineCollection in combination with a scatter plot for the dots.
from matplotlib.collections import LineCollection
fig, (ax, ax2) = plt.subplots(ncols=2)
segs = np.zeros((len(data), 2, 2))
segs[:,:,0] = np.repeat(np.arange(len(data)),2).reshape(len(data),2)
segs[:,1,1] = data["GWP100 (year)"].values
lc = LineCollection(segs, colors=colors)
ax.add_collection(lc)
ax.scatter(np.arange(len(data)), data["GWP100 (year)"].values, c=colors)
ax.set_xticks(np.arange(len(data)))
ax.set_xticklabels(data["types"], rotation=90)
ax.autoscale()
ax.set_ylim(0,None)
fig.tight_layout()
plt.show()
I will answer one of your main questions: how to give each stem line and its marker the same, category-specific color. According to the official docs, there seems to be no direct option to pass a list of colors when calling ax1.stem(). In fact, they say that the resulting plot might not be reasonable if one does so. Nevertheless, below is one trick to get things done your way.
The idea is the following:
Get the objects (stemlines) displayed on the subplot
Get the x-y data of the markers
Loop over the data and change the color of each stemline. Plot each marker individually with the same color as its stemline. colors is a list specifying the colors of your choice.
Following is the relevant part of the code:
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
colors = ['r', 'g', 'b', 'y', 'k']
_, stemlines, _ = ax1.stem(gwp)
line = ax1.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
# mec and mfc stands for markeredgecolor and markerfacecolor
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
_, stemlines, _ = ax2.stem(fdp)
line = ax2.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])
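The loop above assumes that stemlines is a list of individual line objects. In recent Matplotlib releases, ax.stem returns the stem lines as a single LineCollection instead, in which case indexing it does not work. A sketch for that case, assuming such a Matplotlib version and using the GWP values from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

gwp = np.array([2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
                6.02400000e-02, 5.47759916e+02])
colors = ['r', 'g', 'b', 'y', 'k']
x = np.arange(len(gwp))

fig, ax1 = plt.subplots()
markerline, stemlines, baseline = ax1.stem(x, gwp)
# Assumption: stemlines is one LineCollection, which accepts
# one color per segment
stemlines.set_color(colors)
# Hide the single-colored default markers and draw colored ones instead
markerline.set_visible(False)
ax1.scatter(x, gwp, c=colors, zorder=3)
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
```

The same pattern can be repeated on the second subplot with the FDP values.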
