pandas - boxplot median color settings issues - python

I'm running Pandas 0.16.2 and Matplotlib 1.4.3. I have this issue coloring the median of the boxplot generated by the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
fig, ax = plt.subplots()
medianprops = dict(linestyle='-', linewidth=2, color='blue')
bp = df.boxplot(medianprops=medianprops)
plt.show()
In the resulting plot, the color setting does not appear to be read. If I change only linestyle and linewidth, the plot reacts correctly:
medianprops = dict(linestyle='-.', linewidth=5, color='blue')
Can anyone reproduce it?

Looking at the code for DataFrame.boxplot(), there is some special code to handle the colors of the different elements, and it supersedes the kwargs passed on to matplotlib's boxplot. In theory, there seems to be a way to pass a color= argument containing a dictionary with the keys 'boxes', 'whiskers', 'medians' and 'caps', but I can't get it to work when calling boxplot() directly.
However, this seems to work:
df.plot(kind='box', color={'medians': 'blue'},
        medianprops={'linestyle': '--', 'linewidth': 5})
See the pandas boxplot examples.
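For anyone who wants to check the color= dictionary on their own install, here is a minimal sketch. It assumes a reasonably recent pandas and uses made-up random data. I left out medianprops on purpose, since how it interacts with color= may differ between pandas versions. return_type='dict' is only there so the median artists can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # made-up data, seeded for repeatability
df = pd.DataFrame(rng.random((10, 5)), columns=list("ABCDE"))

# color= accepts a dict keyed by 'boxes', 'whiskers', 'medians', 'caps';
# return_type='dict' hands back the matplotlib artists instead of the Axes.
bp = df.plot(kind="box", color={"medians": "blue"}, return_type="dict")

# Collect the colors actually applied to the median lines.
median_colors = [to_rgba(line.get_color()) for line in bp["medians"]]
print(median_colors[0])
plt.close("all")
```

If the printed color is not blue on your version, the workaround of recoloring the returned artists by hand is probably the safer route.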

Actually the following workaround works well, returning a dict from the boxplot command:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
fig, ax = plt.subplots()
bp = df.boxplot(return_type='dict')
and then assign colors and line width directly to the medians with:
# Without by=..., return_type='dict' gives a single dict keyed by artist type
for item in bp['medians']:
    item.set_color('r')
    item.set_linewidth(0.8)
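A small sketch of what return_type='dict' hands back may help here, assuming current pandas behavior. For a plain call you get one dict of artist lists. When grouping with by=..., you get one such dict per column, which is where a nested loop over the keys is needed. The data and the "group" column are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)), columns=list("ABCDE"))
df["group"] = ["x", "y"] * 5  # hypothetical grouping column

# Plain call: a single dict with keys such as 'boxes', 'medians', 'whiskers'.
plain = df[list("ABCDE")].boxplot(return_type="dict")
for line in plain["medians"]:
    line.set_color("r")
    line.set_linewidth(0.8)

# Grouped call: one dict per plotted column, indexed by column name.
grouped = df.boxplot(column=["A", "B"], by="group", return_type="dict")
for key in grouped.keys():
    for line in grouped[key]["medians"]:
        line.set_color("r")
        line.set_linewidth(0.8)

plt.close("all")
```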

Related

How to add x-axis tick labels in python bar chart

I am trying to plot a bar chart in Python. Here is my code and an image of my bar chart. The problems I am facing are:
1) I want to write the name of each category on the x-axis as CAT1, CAT2, CAT3, CAT4. Right now it prints 0, 1, 2 on the x-axis.
2) I want to change the purple color of the bar chart.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([['CAT1', 9, 3, 24, 46, 76], ['CAT2', 48, 90, 42, 56, 68], ['CAT3', 31, 24, 28, 11, 90],
                   ['CAT4', 76, 85, 16, 65, 91]],
                  columns=['metric', 'A', 'B', 'C', 'D', 'E'])
df.plot(
    kind='bar',
    stacked=False
)
plt.legend(labels=['A', 'B', 'C', 'D', 'E'], ncol=4, loc='center', fontsize=15, bbox_to_anchor=(0.5, 1.06))
plt.show()
By default, pandas uses the index of your DataFrame for the x-axis tick labels.
I suggest adding the following to make the metric column the index, so that the category labels are added automatically:
df = df.set_index('metric')
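Put together with the data from the question, a minimal sketch could look like this. The color list is my own arbitrary choice, not something from the original post. Passing color= to df.plot is one way to get rid of the default colors, with one entry per column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [["CAT1", 9, 3, 24, 46, 76],
     ["CAT2", 48, 90, 42, 56, 68],
     ["CAT3", 31, 24, 28, 11, 90],
     ["CAT4", 76, 85, 16, 65, 91]],
    columns=["metric", "A", "B", "C", "D", "E"],
)
df = df.set_index("metric")  # index values become the x tick labels

colors = ["red", "green", "blue", "orange", "gray"]  # arbitrary, one per column
ax = df.plot(kind="bar", stacked=False, color=colors)
ax.legend(ncol=5, loc="center", bbox_to_anchor=(0.5, 1.06))

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```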

How to plot a line plot with confidence intervals and legend changing over x-axis in python

I have a dataframe that looks like this:
import pandas as pd
foo = pd.DataFrame({'time':[1,2,3,4], 'value':[2,4,6,8], 'group':['a', 'a', 'b', 'b'],
'top_ci':[3,5,7,9], 'bottom_ci': [1,3,5,7]})
I would like to create a lineplot, so I am using the following code:
ax = sns.lineplot(x="time", y="value", hue="group", data=foo)
ax.figure.savefig('test.png', bbox_inches='tight')
I would like to add a shaded area with the confidence interval, as it is defined from the top_ci and the bottom_ci columns in the foo dataframe.
Any ideas how I could do that?
The easiest way would be to provide the individual datapoints and then let sns.lineplot compute the confidence interval for you. If you want/need to do it yourself, you can use ax.fill_between:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
foo = pd.DataFrame({'time':[1,2,3,4], 'value':[2,4,6,8], 'group':['a', 'a', 'b', 'b'],
                    'top_ci':[3,5,7,9], 'bottom_ci': [1,3,5,7]})
groups = sorted(set(foo["group"])) # fixed group order, so both plots use the same hue order
f, ax = plt.subplots()
sns.lineplot(x="time", y="value", hue="group", data=foo, ax=ax, hue_order=groups)
for group in groups:
    ax.fill_between(x=foo.loc[foo["group"] == group, "time"],
                    y1=foo.loc[foo["group"] == group, "bottom_ci"],
                    y2=foo.loc[foo["group"] == group, "top_ci"], alpha=0.2)
f.savefig('test15.png', bbox_inches='tight')
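If you would rather not rely on the band colors happening to follow the line colors, here is a sketch using plain matplotlib instead of seaborn (my substitution, not what the answer above does). It reuses each line's color for its band, on the same foo data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

foo = pd.DataFrame({"time": [1, 2, 3, 4], "value": [2, 4, 6, 8],
                    "group": ["a", "a", "b", "b"],
                    "top_ci": [3, 5, 7, 9], "bottom_ci": [1, 3, 5, 7]})

fig, ax = plt.subplots()
for name, sub in foo.groupby("group"):
    # Draw the line first, then shade the interval in the same color.
    (line,) = ax.plot(sub["time"], sub["value"], label=name)
    ax.fill_between(sub["time"], sub["bottom_ci"], sub["top_ci"],
                    color=line.get_color(), alpha=0.2)

ax.set_xlabel("time")
ax.set_ylabel("value")
ax.legend(title="group")
```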

Seaborn violin plots don't align with x-axis labels

I am attempting to build a violin plot to illustrate depth on the y-axis and distance away from a known point on the x-axis. I am able to get the x-axis labels spaced appropriately based on the variable distances, but I am unable to get the violin plots to align with them. The plots appear to be shifted toward the y-axis. Any help would be appreciated. My code is below:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
path = r'O:\info1.csv'  # raw string, so the backslash is not read as an escape
df = pd.read_csv(path)
item = ['a', 'b', 'c', 'd', 'e', 'f']
dist = [450, 1400, 2620, 3100, 3830, 4940]
plt.rcParams.update({'font.size': 15})
fig, axes1 = plt.subplots(figsize=(20,10))
axes1 = sns.violinplot(x='item', y='surface', data=df, hue = 'item', order = (item))
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth')
axes1.set_xticks(dist)
plt.xticks(rotation=20)
plt.show()
Example dataset:
You cannot use seaborn's violinplot for this because, according to its documentation:
This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, … n) on the relevant axis, even
when the data has a numeric or date type.
So if you draw it directly with seaborn, it is categorical:
sns.violinplot(x='dist', y='surface', data=df, hue = 'item',dodge=False,cut=0)
To place the violins according to distance, you need to use matplotlib directly. First we get the data into the required format and define a color palette:
surface_values = list([np.array(value) for name,value in df.groupby('item')['surface']])
dist_values = df.groupby('item')['dist'].agg("mean")
pal = ["crimson","darkblue","rebeccapurple"]
You need to set the width, provide the distance, and for the inner "box", we modify the code from here:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
parts = ax.violinplot(surface_values, widths=200, positions=dist_values,
                      showmeans=False, showmedians=False, showextrema=False)
for i, pc in enumerate(parts['bodies']):
    pc.set_facecolor(pal[i])
    pc.set_edgecolor('black')
    pc.set_alpha(1)
quartile1, medians, quartile3 = np.percentile(surface_values, [25, 50, 75], axis=1)
# adjacent_values() comes from the code this was adapted from; it is not defined in this snippet
whiskers = np.array([
    adjacent_values(sorted_array, q1, q3)
    for sorted_array, q1, q3 in zip(surface_values, quartile1, quartile3)])
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]
inds = dist_values
ax.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)
If you don't need the inner box, you can just call plt.violin ...
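Since the original data is not available, here is a self-contained sketch of the same idea on made-up samples: violins placed at numeric x positions with ax.violinplot, plus a hand-drawn inner box. The adjacent_values helper below is my reconstruction of the usual 1.5 * IQR whisker rule. The positions, sample values and width are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def adjacent_values(vals, q1, q3):
    """Whisker ends at 1.5 * IQR, clipped to the data range (vals must be sorted)."""
    upper = np.clip(q3 + 1.5 * (q3 - q1), q3, vals[-1])
    lower = np.clip(q1 - 1.5 * (q3 - q1), vals[0], q1)
    return lower, upper

rng = np.random.default_rng(0)
positions = [450, 1400, 2620]  # made-up distances along the x-axis
# Made-up depth samples, sorted so the helper can read min/max from the ends.
samples = [np.sort(rng.normal(loc, 5, 100)) for loc in (20, 35, 50)]
width = 200  # violin width in x-axis data units

fig, ax = plt.subplots(figsize=(8, 4))
parts = ax.violinplot(samples, positions=positions, widths=width,
                      showmeans=False, showmedians=False, showextrema=False)
for body in parts["bodies"]:
    body.set_edgecolor("black")
    body.set_alpha(0.8)

# Inner "box": median dot, thick quartile bar, thin whisker line.
q1, med, q3 = np.percentile(samples, [25, 50, 75], axis=1)
whiskers = np.array([adjacent_values(s, a, b) for s, a, b in zip(samples, q1, q3)])
ax.scatter(positions, med, marker="o", color="white", s=30, zorder=3)
ax.vlines(positions, q1, q3, color="k", lw=5)
ax.vlines(positions, whiskers[:, 0], whiskers[:, 1], color="k", lw=1)

ax.invert_yaxis()        # depth increases downward
ax.set_xticks(positions)
```

The widths value is in data units, so it has to be chosen relative to the spacing of the distances or the violins will look like thin lines.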
Thanks for including a bit of data.
To change your plot, the item and dist variables in your code need to be adjusted: remove the item = [a,b...] and dist = [] arrays. The x-axis ticks set with axes1.set_xticks also need a bit of tweaking to get what you're looking for.
Example 1:
Removed the two arrays that were creating the plot you were seeing before; violinplot function unchanged.
# item = ['a', 'b', 'c', 'd', 'e', 'f'] * Removed
# dist = [450, 1400, 2620, 3100, 3830, 4940] * Removed
plt.rcParams.update({'font.size': 15})
fig, axes1 = plt.subplots(figsize=(20,10))
axes1 = sns.violinplot(x='item', y='surface', data=df, hue = 'item', inner = 'box')
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth')
#axes1.set_xticks(dist) * Removed
plt.xticks(rotation=20)
plt.show()
Inside each curve, there is a black shape with a white dot inside. This is the miniature box plot mentioned above. If you'd like to remove the box plot, you can set the inner = None parameter in the violinplot call to simplify the look of the final visualization.
Example 2:
Put dist on your x-axis in place of the xticks.
plt.rcParams.update({'font.size': 15})
plt.subplots(figsize=(20,10))
# Put 'dist' as your x input, keep your categorical variable (hue) equal to 'item'
axes1 = sns.violinplot(data = df, x = 'dist', y = 'surface', hue = 'item', inner = 'box');
axes1.invert_yaxis()
axes1.set_xlabel('Item')
axes1.set_ylabel('Depth');
I'm not sure whether the items and the distances you are working with have a relationship you want to show on the x-axis, or whether you just want to use those integers as tick marks for that axis. If there is an important relationship between the item and the dist, you could use a dictionary new_dict = {450: 'a', 1400: 'b', 2620: 'c' ...
Hope you find this helpful.

Can't set different colors for each bar when I put it on top of a clustergram

Here is my example, I can't get different bar colors defined.... for some reason all are red.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Initialize a DataFrame with index and column names
idf = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6]), ('C', [10, 20, 30]),
                               ('D', [14, 15, 16])], orient='index', columns=['x', 'y', 'z'])
# Plot the clustermap which will be a figure by itself
cax = sns.clustermap(idf, col_cluster=False, row_cluster=True)
# Get the column dendrogram axis
cax_col_dend_ax = cax.ax_col_dendrogram.axes
# Plot the bar chart on the column dendrogram axis
idf.iloc[0,:].plot(kind='bar', ax=cax_col_dend_ax, color = ['r', 'g', 'b'])
# Show the plot
plt.show()
Your code works fine for me. It seems you are using an old version, because I got FutureWarning: from_items is deprecated. The warning comes from pandas rather than from Python itself, but you might still want to upgrade. Nevertheless, you can change the colors as follows:
import matplotlib as mpl
# Your code here
ax1 = idf.iloc[0,:].plot.bar(ax=cax_col_dend_ax)
colors = ['r', 'g', 'b']
bars = [r for r in ax1.get_children() if isinstance(r, mpl.patches.Rectangle)]
for i, bar in enumerate(bars[0:3]):
    bar.set_color(colors[i])
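A variation on the same idea, as a sketch on a plain Axes without the clustermap: instead of filtering all children for Rectangle objects, take the bars from the last BarContainer on the axes. This assumes the bar plot was the last thing drawn on that axes. The Series stands in for idf.iloc[0, :].

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import pandas as pd

row = pd.Series([1, 2, 3], index=["x", "y", "z"], name="A")  # stand-in for idf.iloc[0, :]

fig, ax = plt.subplots()
row.plot(kind="bar", ax=ax)

colors = ["r", "g", "b"]
# ax.containers[-1] is the BarContainer added by the most recent bar call;
# iterating over it yields the individual Rectangle patches.
bars = list(ax.containers[-1])
for bar, color in zip(bars, colors):
    bar.set_color(color)
```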

How to position 3 stacked bar graphs using Pandas

I'm trying to create a graphic with three stacked bar graphs, like so:
I actually have two questions:
1) I'm trying to use the 'position' parameter of the pandas DataFrame plot, but the bars still overlap. Is there an alternative other than reducing the width of the bars?
2) The three bars have three categories in common (B, C, D). How can I have a legend that only contains the actual six different categories?
My DataFrame is:
          A         B         C         D         E         F
0  0.108858  0.265929  0.537369  2.183963  1.353575  2.938775
1  0.375641  0.198720  0.266806  0.409179  0.286645  0.636405
2  1.179256  0.808986  0.171202  0.946194  0.506783  2.121366
3  1.510399  1.218619  0.307752  0.819865  1.283067  0.213556
And my test code is:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(abs(np.random.randn(4, 6)), columns=list('ABCDEF'))
print(df)
colors1 = ['#a50f15', '#9ecae1', '#6baed6', '#3182bd']
colors2 = ['#fb6a4a', '#9ecae1', '#6baed6', '#3182bd']
colors3 = ['#fcbba1', '#9ecae1', '#6baed6', '#3182bd']
fig, ax = plt.subplots(figsize=(8,6))
df.plot(ax=ax, y=['A', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors1, position=0)
df.plot(ax=ax, y=['E', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors2, position=0.5)
df.plot(ax=ax, y=['F', 'B', 'C', 'D'], kind='bar', stacked=True, width=0.15, color=colors3, position=1)
ax.legend(ncol=3)
plt.tight_layout()
plt.show()
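One possible approach, not taken from the original thread, is to skip the position parameter and draw the stacks with ax.bar directly. Each stack is shifted by an explicit x offset, and the legend is de-duplicated by label. The offset spacing and the color_of mapping (derived from the color lists above) are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(np.abs(rng.standard_normal((4, 6))), columns=list("ABCDEF"))

# One fixed color per category, so shared categories look identical in every stack.
color_of = {"A": "#a50f15", "E": "#fb6a4a", "F": "#fcbba1",
            "B": "#9ecae1", "C": "#6baed6", "D": "#3182bd"}
stacks = [["A", "B", "C", "D"], ["E", "B", "C", "D"], ["F", "B", "C", "D"]]
width = 0.15
x = np.arange(len(df))

fig, ax = plt.subplots(figsize=(8, 6))
for i, cols in enumerate(stacks):
    offset = (i - 1) * (width + 0.05)  # left, centre, right of each tick
    bottom = np.zeros(len(df))
    for col in cols:
        ax.bar(x + offset, df[col], width=width, bottom=bottom,
               color=color_of[col], label=col)
        bottom += df[col].to_numpy()  # next segment starts on top of this one

# Keep a single legend entry per label (the first handle seen for each wins).
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(list(unique.values()), list(unique.keys()), ncol=3)
ax.set_xticks(x)
ax.set_xticklabels(df.index)
```

Because the offsets are explicit, the gap between stacks can be tuned independently of the bar width.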
