Incorrect legend labels in python seaborn plots

The above plot was made using seaborn in Python. However, I am not sure why some of the legend circles are filled in with color and others are not. This is the colormap and plotting code I am using:
sns.color_palette("Set2", 10)
g = sns.factorplot(x='month', y='vae_factor', hue='ad_name', col='crop', data=df_sub_panel,
col_wrap=3, size=5, lw=0.5, ci=None, capsize=.2, palette=sns.color_palette("Set2", 10),
sharex=False, aspect=.9, legend_out=False)
g.axes[0].legend(fancybox=None)
--EDIT:
Is there a way to fill the circles? They are probably unfilled because those categories have no data in this particular plot.

The circles are not filled in when there is no data, as I think you've already deduced. But it can be forced by manipulating the legend object.
Full example:
import pandas as pd
import seaborn as sns
df_sub_panel = pd.DataFrame([
{'month':'jan', 'vae_factor':50, 'ad_name':'China', 'crop':False},
{'month':'feb', 'vae_factor':60, 'ad_name':'China', 'crop':False},
{'month':'feb', 'vae_factor':None, 'ad_name':'Mexico', 'crop':False},
])
sns.color_palette("Set2", 10)
g = sns.factorplot(x='month', y='vae_factor', hue='ad_name', col='crop', data=df_sub_panel,
col_wrap=3, size=5, lw=0.5, ci=None, capsize=.2, palette=sns.color_palette("Set2", 10),
sharex=False, aspect=.9, legend_out=False)
# fill in empty legend handles (handles are empty when vae_factor is NaN)
for handle in g.axes[0].get_legend_handles_labels()[0]:
    if not handle.get_facecolors().any():
        handle.set_facecolor(handle.get_edgecolors())
legend = g.axes[0].legend(fancybox=None)
sns.plt.show()
The important part is the manipulation of the handle objects in legend at the end (in the for loop).
This will generate:
Compared to the original (without the for loop):
EDIT: Now less hacky thanks to suggestions from comments!
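For comparison, here is a minimal sketch of a different route to the same result, not the method used in the answer above: build the legend handles by hand so that every entry gets a filled marker whether or not its category has data in the panel. It uses plain matplotlib instead of factorplot; the hue_levels list, the sample values, and reading the Set2 colors through plt.get_cmap are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Hypothetical hue levels; only the first one has data in this panel.
hue_levels = ["China", "Mexico"]
cmap = plt.get_cmap("Set2")
colors = [cmap(i) for i in range(len(hue_levels))]

fig, ax = plt.subplots()
ax.plot(["jan", "feb"], [50, 60], marker="o", color=colors[0])

# One proxy handle per hue level, with the marker face set explicitly,
# so the legend entry is filled even when nothing was drawn for that level.
handles = [
    Line2D([], [], marker="o", linestyle="-", color=c,
           markerfacecolor=c, markeredgecolor=c, label=name)
    for name, c in zip(hue_levels, colors)
]
legend = ax.legend(handles=handles)
```

Because the handles are created independently of the plotted artists, the legend no longer depends on which categories happen to appear in a given facet.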

Related

Marker style of legend and the graph are not matching?

Below is the code I am using to generate the plot. The issue is that the marker style in the legend is different from the marker style in the graph.
sns.set_style(rc={'boxplot.flierprops.markeredgecolor':'black' ,'boxplot.flierprops.markeredgewidth':1.25,'boxplot.flierprops.markerfacecolor':'white'})
fig, scatter = plt.subplots(figsize = (6,4), dpi = 100)
scatter = sns.lineplot(data=df_whole,x='shortest_distance',y='similarity',style ='Metric',hue='Metric'
,markers=True,lw=1,markeredgewidth=1.25,markeredgecolor='black',markersize=7,dashes= False,errorbar=None,markerfacecolor='white')
scatter.set(title='TF-IDF')
scatter.legend(title = "Similarity Methods",prop={'size': 12})
As seaborn uses complex combinations of matplotlib elements to create its plots, and tries to make the legend as compact as possible, the legend is often custom-made. As such, seaborn unfortunately does not always take into account all matplotlib-level parameters.
In this case, the problem can be worked around via assigning these parameters again to the legend handles. Here is an example using one of seaborn's test datasets:
import matplotlib.pyplot as plt
import seaborn as sns
flights = sns.load_dataset('flights')
markerprops = dict(markeredgewidth=1.25, markeredgecolor='black', markersize=7, markerfacecolor='none')
ax = sns.lineplot(data=flights, x='year', y='passengers', style='month', hue='month',
markers=True, lw=1, dashes=False, errorbar=None, **markerprops)
ax.set(title='TF-IDF')
handles, labels = ax.get_legend_handles_labels()
for h in handles:
    h.set(**markerprops)
ax.legend(handles=handles, title="Months", prop={'size': 12}, ncol=3)
plt.tight_layout()
plt.show()
PS: Matplotlib functions usually return the graphical elements they created (e.g. scatter dots or lines), while seaborn (and pandas) usually returns the subplot (ax) or grid of subplots. As such, giving the name scatter to the return value of sns.lineplot might be confusing when comparing code with other matplotlib and seaborn examples.
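The difference in return values mentioned in the PS can be checked with a short sketch. It uses matplotlib and pandas only; the toy data are made up, and seaborn's axes-level functions such as sns.lineplot are left out here on the assumption that they behave like the pandas call, as the PS states.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes
from matplotlib.lines import Line2D

fig, ax = plt.subplots()

# matplotlib returns the artists it created: here a list of Line2D objects.
lines = ax.plot([0, 1, 2], [3, 1, 2])

# pandas returns the Axes it drew on, not the line itself.
returned = pd.DataFrame({"y": [3, 1, 2]}).plot(ax=ax)

print(type(lines[0]).__name__, type(returned).__name__)
```

This is why a name such as ax fits the return value of sns.lineplot better than scatter.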

Changing x-labels and width while using catplot in seaborn

I have a sample dataset as follows;
pd.DataFrame({'Day_Duration':['Evening','Evening','Evening','Evening','Evening','Morning','Morning','Morning',
'Morning','Morning','Night','Night','Night','Night','Night','Noon','Noon','Noon',
'Noon','Noon'],'place_category':['Other','Italian','Japanese','Chinese','Burger',
'Other','Juice Bar','Donut','Bakery','American','Other','Italian','Japanese','Burger',\
'American','Other','Italian','Burger','American','Salad'],'Percent_delivery':[14.03,10.61,9.25,8.19,6.89,19.58,10.18,9.14,8.36,6.53,13.60,8.42,\
8.22,7.66,6.67,17.71,10.62,8.44,8.33,7.50]})
The goal is to draw a faceted barplot with Day_Duration serving as facets, hence 4 facets in total. I used the following code to achieve this:
import seaborn as sns
#g = sns.FacetGrid(top5_places, col="Day_Duration")
g=sns.catplot(x="place_category", y="Percent_delivery", hue='place_category',col='Day_Duration',\
data=top5_places,ci=None,kind='bar',height=4, aspect=.7)
g.set_xticklabels(rotation=90)
Attached is the figure I got;
Could I get help with two things? First, is it possible to show only 5 values on the x-axis of each facet, rather than all the values in every facet? Second, is there a way to make the bars a bit wider? Help is appreciated.
Because you're using hue, the API applies a unique color to each value of place_category, but it also expects each category to be in every plot, as shown in your image.
The final figure is a FacetGrid. Using subplot is the manual way of creating one.
In order to plot only the top n categories for each Day_Duration, each plot will need to be done individually, with a custom color map.
cmap is a dictionary with place categories as keys and colors as values. It's used so there will be one legend and each category will be colored the same for each plot.
Because we're not using the legend automatically generated by the plot, one needs to be created manually.
patches uses Patch to create each item in the legend. (e.g. the rectangle, associated with color and name).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
# create a color map for the unique values of place_category
place_cat = df.place_category.unique()
colors = sns.color_palette('husl', n_colors=10)
cmap = dict(zip(place_cat, colors))
# plot a subplot for each Day_Duration
plt.figure(figsize=(16, 6))
for i, tod in enumerate(df.Day_Duration.unique(), 1):
    data = df[df.Day_Duration == tod].sort_values(['Percent_delivery'], ascending=False)
    plt.subplot(1, 4, i)
    p = sns.barplot(x='place_category', y='Percent_delivery', data=data, hue='place_category', palette=cmap)
    p.legend_.remove()
    plt.xticks(rotation=90)
    plt.title(f'Day Duration: {tod}')
plt.tight_layout()
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
plt.legend(handles=patches, bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)
plt.show()
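The sample data already contain exactly five rows per Day_Duration, so the loop above needs no extra filtering. If the dataframe held more rows per group, a "top n per group" step could come first. The following is a minimal sketch of that step with pandas, assuming a shortened copy of the question's data and a hypothetical n of 2:

```python
import pandas as pd

# Shortened stand-in for the question's dataframe (values copied from it).
df = pd.DataFrame({
    "Day_Duration": ["Evening", "Evening", "Evening", "Morning", "Morning", "Morning"],
    "place_category": ["Other", "Italian", "Japanese", "Other", "Juice Bar", "Donut"],
    "Percent_delivery": [14.03, 10.61, 9.25, 19.58, 10.18, 9.14],
})

n = 2  # hypothetical number of categories to keep per facet

# Sort once by value, then keep the first n rows of every Day_Duration group.
top_n = (
    df.sort_values("Percent_delivery", ascending=False)
      .groupby("Day_Duration", sort=False)
      .head(n)
)
print(top_n)
```

The resulting top_n frame could then be passed to the plotting loop in place of df.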

multiple boxplots, side by side, using matplotlib from a dataframe

I'm trying to plot 60+ boxplots side by side from a dataframe and I was wondering if someone could suggest some possible solutions.
At the moment I have df_new, a dataframe with 66 columns, which I'm using to plot boxplots. The easiest way I found to plot the boxplots was to use the boxplot package inside pandas:
boxplot = df_new.boxplot(column=x, figsize = (100,50))
This gives me a very very tiny chart with illegible axis which I cannot seem to change the font size for, so I'm trying to do this natively in matplotlib but I cannot think of an efficient way of doing it. I'm trying to avoid creating 66 separate boxplots using something like:
fig, ax = plt.subplots(nrows = 1,
ncols = 66,
figsize = (10,5),
sharex = True)
ax[0,0].boxplot(#insert parameters here)
I actually do not know how to get the data from df_new.describe() into the boxplot function, so any tips on this would be greatly appreciated! The documentation is confusing, and I am not sure what the x vectors should be.
Ideally I'd like to just give the boxplot function the dataframe and for it to automatically create all the boxplots by working out all the quartiles, column separations etc on the fly - is this even possible?
Thanks!
I tried to replace the boxplot with a ridge plot, which takes up less space because:
it requires half of the width
you can partially overlap the ridges
it develops vertically, so you can scroll down all the plot
I took the code from the seaborn documentation and adapted it a little in order to have 60 different ridges, normally distributed; here is the code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
n = 20
x = list(np.random.randn(1, 60)[0])
g = [item[0] + item[1] for item in list(itertools.product(list('ABCDEFGHIJ'), list('123456')))]
df = pd.DataFrame({'x': n*x,
'g': n*g})
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
This is the result I get:
I don't know if it will suit your needs. In any case, keep in mind that placing so many distributions next to each other will always require a lot of space (and a very big screen).
Maybe you could try dividing the distributions into smaller groups and plotting them a few at a time?
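The idea of plotting the distributions in smaller groups could be sketched as follows, using boxplots as in the question rather than ridges. The random df_new stand-in, the chunk size of 22 columns per row, and the font size are assumptions chosen for illustration. Passing the raw values to ax.boxplot lets matplotlib compute the quartiles itself, so df_new.describe() is not needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the question's 66-column dataframe (random data, hypothetical names).
rng = np.random.default_rng(0)
df_new = pd.DataFrame(rng.normal(size=(50, 66)),
                      columns=[f"c{i}" for i in range(66)])

chunk = 22  # columns per row of boxplots; an arbitrary choice
groups = [df_new.columns[i:i + chunk] for i in range(0, df_new.shape[1], chunk)]

fig, axes = plt.subplots(nrows=len(groups), figsize=(12, 3 * len(groups)),
                         squeeze=False)
for ax, cols in zip(axes[:, 0], groups):
    # A 2D array gives one box per column; quartiles are computed from the raw data.
    ax.boxplot(df_new[cols].to_numpy())
    ax.set_xticks(range(1, len(cols) + 1))
    ax.set_xticklabels(cols, rotation=90, fontsize=8)
fig.tight_layout()
```

Each row then holds a readable number of boxes, and the tick font size is set directly on the axes.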

Matplotlib: Plot on double y-axis plot misaligned

I'm trying to plot two datasets into one plot with matplotlib. One of the two plots is misaligned by 1 on the x-axis.
This MWE pretty much sums up the problem. What do I have to adjust to bring the box-plot further to the left?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue',
'caps': 'Gray'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
plotData.plot.box(ax=ax1, color=color, sym='+')
failureRates.plot(ax=ax2, color='b', legend=False)
ax1.set_ylabel('Seconds')
ax2.set_ylabel('Failure Rate in %')
plt.xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.show()
Actual result. Note that it's only 8 box-plots instead of 9 and that they're starting at index 1.
The issue is a mismatch between how box() and plot() work: box() starts at x-position 1, while plot() depends on the index of the dataframe (which defaults to starting at 0). There are only 8 plots because the 9th is cut off by plt.xlim(-0.7, 8.7). There are several easy ways to fix this. As @Sheldore's answer indicates, you can explicitly set the positions for the boxplot. Another way is to change the index of the failureRates dataframe to start at 1 when constructing it, i.e.
failureRates = pd.DataFrame(np.random.rand(9, 1), index=range(1, len(titles)+1))
Note that you need not specify the xticks or the xlim for the MCVE in the question, but you may need to for your complete code.
You can specify the positions on the x-axis where you want to have the box plots. Since you have 9 boxes, use the following, which generates the figure below:
plotData.plot.box(ax=ax1, color=color, sym='+', positions=range(9))
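The offset described in the answers above can be checked directly. This is a small sketch with matplotlib's own boxplot call on made-up data with 3 columns instead of 9; the centers helper is a hypothetical name introduced only for this check.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(25, 3)  # made-up data, one box per column

fig, (ax_default, ax_shifted) = plt.subplots(ncols=2)
default = ax_default.boxplot(data)                                  # default positions
shifted = ax_shifted.boxplot(data, positions=range(data.shape[1]))  # explicit positions

def centers(bp):
    """Return the x-center of each box, read from its median line."""
    return [float(np.mean(m.get_xdata())) for m in bp["medians"]]

default_centers = centers(default)  # boxes sit at 1, 2, 3
shifted_centers = centers(shifted)  # boxes sit at 0, 1, 2
print(default_centers, shifted_centers)
```

With the boxes moved to positions 0 through N-1, a line drawn against a zero-based index lines up with them.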

How to change boxplot size in seaborn FacetGrid object

I have a row of boxplots I produce using the following code:
import seaborn as sns
g = sns.FacetGrid(df, col="Column0", sharex=False)
g.map(sns.boxplot, 'column1', 'Column2')
It works well with the exception that the plots are super tiny. I have looked at How can I change the font size using seaborn FacetGrid? and How to change figuresize using seaborn factorplot as well as the seaborn manual, but I do not find the right way to include 'size' and 'aspect' into the code. What would be the proper way to change the plot size?
EDIT
If I try it like this:
g = sns.FacetGrid(df, col="Column0", sharex=False, size=20, aspect=3)
g.map(sns.boxplot, 'Column1', 'Column2')
I get the error: ValueError: width and height must each be below 32768. Is there a restriction in size for plots that are produced the way I do it?
Maybe you can try limiting the maximum x and y values so that your plot automatically adjusts to the values that are important.
g.set(xlim=(0, 60), ylim=(0, 14));
If the plots look super tiny, that suggests some elements have very high values.
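The answer above does not explain the ValueError. One likely reading, offered as an assumption rather than a confirmed diagnosis: FacetGrid sizes the figure as (number of columns × height × aspect) by (number of rows × height) inches, so size=20 with aspect=3 gives a very wide canvas once multiplied by the dpi. The facet count of 6 and the dpi of 100 below are hypothetical, the facetgrid_pixels helper is introduced only for this arithmetic, and the 32768 limit is taken from the error message in the question.

```python
def facetgrid_pixels(n_cols, n_rows, height, aspect, dpi=100):
    """Estimate the canvas size in pixels for a FacetGrid-style layout."""
    width_in = n_cols * height * aspect   # total figure width in inches
    height_in = n_rows * height           # total figure height in inches
    return width_in * dpi, height_in * dpi

LIMIT = 32768  # per-dimension limit quoted in the question's error message

# Hypothetical case: 6 facets in one row, using the question's size=20, aspect=3.
too_big = facetgrid_pixels(n_cols=6, n_rows=1, height=20, aspect=3)
# A more modest setting for the same layout.
modest = facetgrid_pixels(n_cols=6, n_rows=1, height=4, aspect=1)

print(too_big, too_big[0] >= LIMIT)
print(modest, modest[0] >= LIMIT)
```

Under these assumptions, smaller values for the per-facet height and aspect keep the total width under the limit.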
