How to position 3 stacked bar graphs using Pandas - python

I'm trying to create a graphic with three stacked bar graphs.
I actually have two questions:
1) I'm trying to use the 'position' parameter of pandas' DataFrame.plot, but the bars still overlap. Is there an alternative other than reducing the width of the bars?
2) The three bars have three categories in common (B, C, D). How can I get a legend that contains only the six distinct categories?
My DataFrame is:
          A         B         C         D         E         F
0  0.108858  0.265929  0.537369  2.183963  1.353575  2.938775
1  0.375641  0.198720  0.266806  0.409179  0.286645  0.636405
2  1.179256  0.808986  0.171202  0.946194  0.506783  2.121366
3  1.510399  1.218619  0.307752  0.819865  1.283067  0.213556
And my test code is:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(abs(np.random.randn(4, 6)), columns=list('ABCDEF'))
print(df)
colors1 = ['#a50f15', '#9ecae1', '#6baed6', '#3182bd']
colors2 = ['#fb6a4a', '#9ecae1', '#6baed6', '#3182bd']
colors3 = ['#fcbba1', '#9ecae1', '#6baed6', '#3182bd']
fig, ax = plt.subplots(figsize=(8,6))
df.plot(ax=ax, y=['A', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors1, position=0)
df.plot(ax=ax, y=['E', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors2, position=0.5)
df.plot(ax=ax, y=['F', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors3, position=1)
ax.legend(ncol=3)
plt.tight_layout()
plt.show()
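One possible approach (a sketch, not taken from the original thread): skip the position parameter and draw the bars with matplotlib's ax.bar, shifting each stack by an explicit x offset and tracking the stack height in bottom. The legend is then de-duplicated by label. The width value, the colors dictionary and the stacks list are illustrative choices, and the random data is seeded only to make the sketch repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(np.abs(rng.standard_normal((4, 6))), columns=list("ABCDEF"))

# One colour per category, so B, C and D look the same in every stack.
colors = {"A": "#a50f15", "E": "#fb6a4a", "F": "#fcbba1",
          "B": "#9ecae1", "C": "#6baed6", "D": "#3182bd"}
stacks = [["A", "B", "C", "D"], ["E", "B", "C", "D"], ["F", "B", "C", "D"]]

width = 0.25                      # three stacks of this width fit inside one x unit
x = np.arange(len(df))
fig, ax = plt.subplots(figsize=(8, 6))
for k, cols in enumerate(stacks):
    offset = (k - (len(stacks) - 1) / 2) * width   # -width, 0, +width
    bottom = np.zeros(len(df))
    for col in cols:
        ax.bar(x + offset, df[col].to_numpy(), width=width,
               bottom=bottom, color=colors[col], label=col)
        bottom = bottom + df[col].to_numpy()        # next segment starts on top

ax.set_xticks(x)
ax.set_xticklabels(df.index)

# B, C and D were labelled three times each; keep one handle per label.
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(list(unique.values()), list(unique.keys()), ncol=3)
fig.tight_layout()
```

Because each stack gets its own offset, the bars cannot overlap as long as the number of stacks times width stays at or below 1.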

Related

How to add x-axis tick labels in python bar chart

I am trying to plot a python bar chart. Here is my code and an image of my bar chart. The problems I am facing are:
I want to write name of each category of bar chart on the x-axis as CAT1, CAT2, CAT3, CAT4. Right now it's printing 0, 1, 2 on the x-axis.
I want to change the purple color of the bar chart.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([['CAT1', 9, 3, 24, 46, 76], ['CAT2', 48, 90, 42, 56, 68], ['CAT3', 31, 24, 28, 11, 90],
                   ['CAT4', 76, 85, 16, 65, 91]],
                  columns=['metric', 'A', 'B', 'C', 'D', 'E'])
df.plot(
    kind='bar',
    stacked=False
)
plt.legend(labels=['A', 'B', 'C', 'D', 'E'], ncol=4, loc='center', fontsize=15, bbox_to_anchor=(0.5, 1.06))
plt.show()
By default, the index of your dataframe is used for the x-axis labels.
I suggest making the metric column the index, so that the category names are added as tick labels automatically:
df = df.set_index('metric')
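Putting it together, here is a minimal sketch that also covers the colour part of the question through the color argument of df.plot. The hex values in bar_colors are arbitrary illustrative choices, and rot=0 is an optional extra that keeps the category names horizontal.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [["CAT1", 9, 3, 24, 46, 76],
     ["CAT2", 48, 90, 42, 56, 68],
     ["CAT3", 31, 24, 28, 11, 90],
     ["CAT4", 76, 85, 16, 65, 91]],
    columns=["metric", "A", "B", "C", "D", "E"],
)

# Illustrative colours, one per column A..E (any valid matplotlib colour works).
bar_colors = ["#1b9e77", "#d95f02", "#377eb8", "#e6ab02", "#666666"]

# With 'metric' as the index, the category names become the x tick labels.
ax = df.set_index("metric").plot(kind="bar", stacked=False, color=bar_colors, rot=0)
ax.legend(ncol=5, loc="center", bbox_to_anchor=(0.5, 1.06))
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```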

How to plot a line plot with confidence intervals and legend changing over x-axis in python

I have a dataframe that looks like this:
import pandas as pd
import seaborn as sns
foo = pd.DataFrame({'time': [1, 2, 3, 4], 'value': [2, 4, 6, 8], 'group': ['a', 'a', 'b', 'b'],
                    'top_ci': [3, 5, 7, 9], 'bottom_ci': [1, 3, 5, 7]})
I would like to create a line plot, so I am using the following code:
ax = sns.lineplot(x="time", y="value", hue="group", data=foo)
ax.figure.savefig('test.png', bbox_inches='tight')
I would like to add a shaded area for the confidence interval, as defined by the top_ci and bottom_ci columns of the foo dataframe.
Any ideas on how I could do that?
The easiest way would be to provide the individual datapoints and then let sns.lineplot compute the confidence interval for you. If you want/need to do it yourself, you can use ax.fill_between:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

foo = pd.DataFrame({'time': [1, 2, 3, 4], 'value': [2, 4, 6, 8], 'group': ['a', 'a', 'b', 'b'],
                    'top_ci': [3, 5, 7, 9], 'bottom_ci': [1, 3, 5, 7]})
# ordered list of group levels, so the lines and the bands use the same hue order
groups = list(foo["group"].unique())
f, ax = plt.subplots()
sns.lineplot(x="time", y="value", hue="group", data=foo, ax=ax, hue_order=groups)
for group in groups:
    # shade between the lower and upper bound of this group
    ax.fill_between(x=foo.loc[foo["group"] == group, "time"],
                    y1=foo.loc[foo["group"] == group, "bottom_ci"],
                    y2=foo.loc[foo["group"] == group, "top_ci"], alpha=0.2)
f.savefig('test15.png', bbox_inches='tight')

Can't set different colors for each bar when I put it on top of a clustergram

Here is my example. I can't get different bar colors to apply; for some reason all bars are red.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# initialize a dataframe with index and column names
idf = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6]), ('C', [10, 20, 30]),
                               ('D', [14, 15, 16])], orient='index', columns=['x', 'y', 'z'])
# Plot the clustermap which will be a figure by itself
cax = sns.clustermap(idf, col_cluster=False, row_cluster=True)
# Get the column dendrogram axis
cax_col_dend_ax = cax.ax_col_dendrogram.axes
# Plot the bar chart on the column dendrogram axis
idf.iloc[0,:].plot(kind='bar', ax=cax_col_dend_ax, color = ['r', 'g', 'b'])
# Show the plot
plt.show()
Your code works fine for me. It seems you are using an older setup, because I got a FutureWarning: from_items is deprecated. The warning comes from pandas rather than from Python itself, but you might want to upgrade anyway. Nevertheless, you can still change the colors as follows:
import matplotlib as mpl
# Your code here
ax1 = idf.iloc[0,:].plot.bar(ax=cax_col_dend_ax)
colors = ['r', 'g', 'b']
bars = [r for r in ax1.get_children() if isinstance(r, mpl.patches.Rectangle)]
for i, bar in enumerate(bars[0:3]):
    bar.set_color(colors[i])
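Regarding the from_items warning: DataFrame.from_items was removed in pandas 1.0, so on a current installation the first line of the question's code fails outright. A minimal sketch of an equivalent construction with from_dict (the items variable name is just for illustration):

```python
import pandas as pd

# Same (label, values) pairs as in the question.
items = [("A", [1, 2, 3]), ("B", [4, 5, 6]), ("C", [10, 20, 30]), ("D", [14, 15, 16])]

# orient="index" turns each key into a row label; columns names the values.
idf = pd.DataFrame.from_dict(dict(items), orient="index", columns=["x", "y", "z"])
print(idf)
```

The rest of the clustermap code can then be used unchanged.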

Pandas and Matplotlib plotting df as subplots with 2 y-axes

I'm trying to plot a dataframe to a few subplots using pandas and matplotlib.pyplot. But I want to have the two columns use different y axes and have those shared between all subplots.
Currently my code is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Area':['A', 'A', 'A', 'B', 'B', 'C','C','C','D','D','D','D'],
'Rank':[1,2,3,1,2,1,2,3,1,2,3,4],
'Count':[156,65,152,70,114,110,195,92,44,179,129,76],
'Value':[630,426,312,191,374,109,194,708,236,806,168,812]}
)
df = df.set_index(['Area', 'Rank'])
fig = plt.figure(figsize=(6,4))
for i, l in enumerate(['A','B','C','D']):
    if i == 0:
        sub1 = fig.add_subplot(141+i)
    else:
        sub1 = fig.add_subplot(141+i, sharey=sub1)
    df.loc[l].plot(kind='bar', ax=sub1)
This plots the four graphs side by side, which is what I want, but both columns use the same y-axis. I'd like the 'Count' column to use a common y-axis on the left and the 'Value' column to use a common secondary y-axis on the right.
Can anybody suggest a way to do this? My attempts thus far have led to each graph having its own independent y-axis.
To create a secondary y-axis, you can use twinax = ax.twinx(). One can then join those twin axes via the join method of an axes Grouper, twinax.get_shared_y_axes().join(twinax1, twinax2). See this question for more details.
The next problem is then to get the two different bar plots next to each other. Since I don't think there is a way to do this using the pandas plotting wrappers, one can use a matplotlib bar plot, which lets you specify the bar positions quantitatively. The left bars are then shifted by the bar width.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Area':['A', 'A', 'A', 'B', 'B', 'C','C','C','D','D','D','D'],
'Rank':[1,2,3,1,2,1,2,3,1,2,3,4],
'Count':[156,65,152,70,114,110,195,92,44,179,129,76],
'Value':[630,426,312,191,374,109,194,708,236,806,168,812]}
)
df = df.set_index(['Area', 'Rank'])
fig, axes = plt.subplots(ncols=len(df.index.levels[0]), figsize=(6,4), sharey=True)
twinaxes = []
for i, l in enumerate(df.index.levels[0]):
    axes[i].bar(df["Count"].loc[l].index.values-0.4, df["Count"].loc[l], width=0.4, align="edge")
    ax2 = axes[i].twinx()
    twinaxes.append(ax2)
    ax2.bar(df["Value"].loc[l].index.values, df["Value"].loc[l], width=0.4, align="edge", color="C3")
    ax2.set_xticks(df["Value"].loc[l].index.values)
    ax2.set_xlabel("Rank")
[twinaxes[0].get_shared_y_axes().join(twinaxes[0], ax) for ax in twinaxes[1:]]
[ax.tick_params(labelright=False) for ax in twinaxes[:-1]]
axes[0].set_ylabel("Count")
axes[0].yaxis.label.set_color('C0')
axes[0].tick_params(axis='y', colors='C0')
twinaxes[-1].set_ylabel("Value")
twinaxes[-1].yaxis.label.set_color('C3')
twinaxes[-1].tick_params(axis='y', colors='C3')
twinaxes[0].relim()
twinaxes[0].autoscale_view()
plt.show()
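A note for newer Matplotlib versions: join() on the object returned by get_shared_y_axes() was deprecated in Matplotlib 3.6 and is no longer available in recent releases, so the joining line above may fail there. A minimal sketch of the same idea using Axes.sharey instead, with toy bar heights that are purely illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to use plt.show()
import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=3, sharey=True, figsize=(6, 4))
twinaxes = [ax.twinx() for ax in axes]

# Toy data: each panel gets taller bars on its secondary axis.
for k, tw in enumerate(twinaxes):
    tw.bar([1, 2, 3], [10 * (k + 1), 20 * (k + 1), 30 * (k + 1)], color="C3")

# Link every secondary axis to the first one.
for tw in twinaxes[1:]:
    tw.sharey(twinaxes[0])

# Recompute the shared limits so they cover the data of all linked axes.
twinaxes[0].relim()
twinaxes[0].autoscale_view()

# Only the right-most secondary axis keeps its tick labels.
for tw in twinaxes[:-1]:
    tw.tick_params(labelright=False)
```

In the answer's code this would replace the list comprehension that calls get_shared_y_axes().join; everything else stays the same.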

pandas - boxplot median color settings issues

I'm running Pandas 0.16.2 and Matplotlib 1.4.3. I have this issue coloring the median of the boxplot generated by the following code:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
fig, ax = plt.subplots()
medianprops = dict(linestyle='-', linewidth=2, color='blue')
bp = df.boxplot(medianprops=medianprops)
plt.show()
It appears that the color setting is ignored. When I change only the linestyle and linewidth settings, the plot reacts correctly:
medianprops = dict(linestyle='-.', linewidth=5, color='blue')
Can anyone reproduce it?
Looking at the code for DataFrame.boxplot(), there is some special code to handle the colors of the different elements that supersedes the keywords passed to matplotlib's boxplot. In theory, there seems to be a way to pass a color= argument containing a dictionary with the keys 'boxes', 'whiskers', 'medians' and 'caps', but I can't seem to get it to work when calling boxplot() directly.
However, this seems to work:
df.plot(kind='box', color={'medians': 'blue'},
medianprops={'linestyle': '--', 'linewidth': 5})
see Pandas Boxplot Examples
Actually the following workaround works well, returning a dict from the boxplot command:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
fig, ax = plt.subplots()
bp = df.boxplot(return_type='dict')
and then assign directly colors and linewidth to the medians with:
[[item.set_color('r') for item in bp[key]['medians']] for key in bp.keys()]
[[item.set_linewidth(0.8) for item in bp[key]['medians']] for key in bp.keys()]
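As a rough sketch of the same workaround for an ungrouped boxplot (no by= argument): in recent pandas versions, return_type='dict' then gives back a single dictionary of matplotlib line lists, so the medians can be reached directly under the 'medians' key. The colour and line width below are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
fig, ax = plt.subplots()

# For an ungrouped boxplot this returns one dict with keys such as
# 'boxes', 'whiskers', 'caps' and 'medians'.
bp = df.boxplot(ax=ax, return_type="dict")

# Style each median line after the plot has been drawn.
for line in bp["medians"]:
    line.set_color("blue")
    line.set_linewidth(2)
```

A plain for loop is used here instead of nested list comprehensions, since the calls are made only for their side effects.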
