Matplotlib box plot fliers not showing - python

I was wondering if anyone had an issue with Matplotlib's box plot fliers not showing?
I literally copy-pasted this example here into a python script:
http://blog.bharatbhole.com/creating-boxplots-with-matplotlib/
...but the box plot fliers (outliers) are not showing. Does anyone know why I might not be seeing them? Sorry if this is a silly question, but I cannot for the life of me figure out why it doesn't work.
## Create data
np.random.seed(10)
collectn_1 = np.random.normal(100, 10, 200)
collectn_2 = np.random.normal(80, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
## combine these different collections into a list
data_to_plot = [collectn_1, collectn_2, collectn_3, collectn_4]
# Create a figure instance
fig = plt.figure(1, figsize=(9, 6))
# Create an axes instance
ax = fig.add_subplot(111)
# Create the boxplot
bp = ax.boxplot(data_to_plot)
I also tried adding showfliers=True to the last line of that script, but it's still not working.
This is what I get as an output:

From the look of your plot, it seems you have imported the seaborn module. There is an issue with matplotlib boxplot fliers not showing up when seaborn is imported, even when fliers are explicitly enabled. Your code seems to work fine when seaborn is not imported:
When seaborn is imported, you could do the following:
Solution 1:
Assuming you have imported seaborn like this:
import seaborn as sns
you can use the seaborn boxplot function:
sns.boxplot(data=data_to_plot, ax=ax)
resulting in:
Solution 2:
In case you want to keep using the matplotlib boxplot function (from Automatic (whisker-sensitive) ylim in boxplots):
ax.boxplot(data_to_plot, sym='k.')
resulting in:

You might not see fliers if the flier marker was set to None. The page you linked to has a for flier in bp['fliers']: loop, which sets the flier marker style, color and alpha:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
collectn_1 = np.random.normal(100, 10, 200)
collectn_2 = np.random.normal(80, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
## combine these different collections into a list
data_to_plot = [collectn_1, collectn_2, collectn_3, collectn_4]
# Create a figure instance
fig = plt.figure(1, figsize=(9, 6))
# Create an axes instance
ax = fig.add_subplot(111)
# Create the boxplot
bp = ax.boxplot(data_to_plot, showfliers=True)
for flier in bp['fliers']:
    flier.set(marker='o', color='#e7298a', alpha=0.5)
plt.show()
yields
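As a variation on the snippets above, the flier appearance can also be passed at creation time through the flierprops argument of boxplot; settings passed explicitly this way take precedence over the rcParams defaults. This is only a minimal sketch and is not taken from the linked blog post: the sample data (including the two deliberately extreme values) and the marker settings are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(10)
# Made-up sample with two deliberately extreme values so outliers exist
sample = np.concatenate([rng.normal(100, 10, 200), [10.0, 190.0]])

fig, ax = plt.subplots(figsize=(9, 6))
# Hypothetical styling; any visible marker with a face color would do
flier_style = dict(marker="o", markerfacecolor="#e7298a", markersize=5, alpha=0.5)
bp = ax.boxplot([sample], showfliers=True, flierprops=flier_style)

# Each entry in bp["fliers"] is a Line2D holding the outlier points of one box
n_fliers = len(bp["fliers"][0].get_ydata())
print("number of fliers drawn:", n_fliers)
```

Checking the length of the flier data is a quick way to tell "no outliers were computed" apart from "outliers exist but the marker is invisible".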

Related

How to change the number of size categories in seaborn scatterplot

I tried hard to look through all the documentation and examples but I am not able to figure it out. How do I change the number of categories = the number of size bubbles, and their boundaries in seaborn scatterplot? The sizes parameter doesn't help here.
It always gives me 6 of them regardless of what I try (here 8, 16, ..., 48):
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill")
or
penguins = sns.load_dataset("penguins")
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g")
And how do I change their boundaries? I.e., what if I want to have 10, 20, 30, 40, 50 in the first case or 3000, 4000, 5000, 6000 in the second?
I know that going around and creating another column in the dataframe works but that is not wanted (adds unnecessary columns and even if I do it on the fly, it's just not what I am looking for).
Workaround:
def myfunc(mass):
    if mass < 3500:
        return 3000
    elif mass < 4500:
        return 4000
    elif mass < 5500:
        return 5000
    return 6000
penguins["mass"] = penguins.apply(lambda x: myfunc(x['body_mass_g']), axis=1)
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="mass")
I don't think seaborn has a fine-grained control, it just tries to come up with something that works a bit intuitively for many situations, but not for all. The legend='full' parameter shows all values of the size column, but that can be too overwhelming.
The suggestion to create a new column with binned sizes has the drawback that this will also change the sizes used in the scatterplot.
An approach could be to create your own custom legend. Note that when the legend also contains other elements, this approach needs to be adapted a bit.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill", legend='full')
handles, labels = ax.get_legend_handles_labels()
labels = np.array([float(l) for l in labels])
desired_labels = [10, 20, 30, 40, 50]
desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_labels]
ax.legend(handles=desired_handles, labels=desired_labels, title=ax.legend_.get_title().get_text())
plt.show()
The code can be wrapped into a function, and e.g. applied to the penguins:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
def sizes_legend(desired_sizes, ax=None):
    ax = ax or plt.gca()
    handles, labels = ax.get_legend_handles_labels()
    labels = np.array([float(l) for l in labels])
    desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_sizes]
    ax.legend(handles=desired_handles, labels=desired_sizes, title=ax.legend_.get_title().get_text())
penguins = sns.load_dataset("penguins")
ax = sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g", legend='full')
sizes_legend([3000, 4000, 5000, 6000], ax)
plt.show()
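If plain matplotlib is acceptable instead of seaborn, a comparable size legend can be sketched with the legend_elements method of the scatter collection, which accepts an explicit list of values for the entries. This is a different technique from the handle-picking approach above, not a seaborn feature. The data, the SCALE factor and the legend title below are assumptions made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

SCALE = 4.0  # hypothetical factor converting data values to marker areas

values = np.linspace(5, 55, 51)  # made-up data that drives the marker size
x = values
y = 0.15 * values + 1.0

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=SCALE * values, alpha=0.5)

desired = [10, 20, 30, 40, 50]
# func maps marker areas back to data values so the labels are in data units;
# requested values outside the data range are dropped by matplotlib
handles, labels = sc.legend_elements(
    prop="sizes", num=desired, func=lambda s: s / SCALE
)
ax.legend(handles, labels, title="total_bill", loc="upper left")
```

Because the legend entries are generated from the requested values rather than picked from existing handles, the marker sizes in the plot itself stay continuous.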

Improving time series subplots with Matplotlib Python

I am trying to make subplots from multiple columns of a pandas dataframe. The following code more or less works, but I would like to improve it by moving all the legends outside the plots (to the right) and adding the est_fmc variable to each plot.
L = new_df_honeysuckle[["Avg_1h_srf_mc", "Avg_1h_prof_mc", "Avg_10h_fuel_stick", "Avg_100h_debri_mc", "Avg_Daviesia_mc",
"Avg_Euclaypt_mc", "obs_fmc_average", "obs_fmc_max", "est_fmc"]].resample("1M").mean().interpolate().plot(figsize=(10,15),
subplots=True, linewidth = 3, yticks = (0, 50, 100, 150, 200))
plt.legend(loc='center left', markerscale=6, bbox_to_anchor=(1, 0.4))
Any help highly appreciated.
Since the plotting function of pandas does not allow for fine control, it is easiest to use matplotlib's subplots function and handle each column in a loop. It was unclear whether you wanted to add the 'est_fmc' line or annotate it, so I added the line. For annotations, see this.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import numpy as np
import itertools
columns = ["Avg_1h_srf_mc", "Avg_1h_prof_mc", "Avg_10h_fuel_stick", "Avg_100h_debri_mc", "Avg_Daviesia_mc", "Avg_Euclaypt_mc", "obs_fmc_average", "obs_fmc_max",'est_fmc']
# 'MS' (month start) is used because the lowercase '1m' alias is not accepted by all pandas versions
date_rng = pd.date_range('2017-01-01', '2020-02-01', freq='MS')
df = pd.DataFrame({'date': pd.to_datetime(date_rng)})
for col in columns:
    # random stand-in data, one value per date
    tmp = np.random.randint(0, 200, len(df))
    df = pd.concat([df, pd.Series(tmp, name=col, index=df.index)], axis=1)
# one subplot per column, excluding 'est_fmc', which is drawn on every subplot
fig, axs = plt.subplots(len(columns[:-1]), 1, figsize=(10,15), sharex=True)
fig.subplots_adjust(hspace=0.5)
colors = mcolors.TABLEAU_COLORS
for i,(col,cname) in enumerate(zip(columns[:-1], itertools.islice(colors.keys(),9))):
    axs[i].plot(df['date'], df[col], label=col, color=cname)
    axs[i].plot(df['date'], df['est_fmc'], label='est_fmc', color='tab:olive')
    axs[i].set_yticks([0, 50, 100, 150, 200])
    axs[i].grid()
    # place the legend just outside the right edge of each subplot
    axs[i].legend(loc='upper left', bbox_to_anchor=(1.02, 1.0))
plt.show()
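As an alternative sketch that keeps the pandas call from the question: DataFrame.plot(subplots=True) returns an array of Axes, so the legends can be repositioned afterwards in a loop and the est_fmc series drawn on each one. The dataframe below is made-up stand-in data with only a few of the question's columns and a daily index, so treat it as an illustration rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2017-01-01", periods=36, freq="D")
# Made-up stand-in for the question's dataframe
df = pd.DataFrame(
    rng.integers(0, 200, size=(36, 4)),
    index=index,
    columns=["Avg_1h_srf_mc", "Avg_1h_prof_mc", "obs_fmc_max", "est_fmc"],
)

obs_cols = [c for c in df.columns if c != "est_fmc"]
# pandas returns one Axes per plotted column
axes = df[obs_cols].plot(subplots=True, figsize=(10, 8), linewidth=3,
                         yticks=(0, 50, 100, 150, 200))
for ax in axes:
    # overlay est_fmc on every subplot, then move the legend outside the axes
    df["est_fmc"].plot(ax=ax, color="black", linestyle="--", label="est_fmc")
    ax.legend(loc="center left", bbox_to_anchor=(1.02, 0.5))

fig = axes[0].get_figure()
fig.subplots_adjust(right=0.75)  # leave room on the right for the legends
```

Calling plt.legend once after the pandas call only affects the current (last) Axes, which is why the loop over the returned array is needed.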

Y axis label in scientific notation when multiple bar charts are plotted

I was trying to plot multiple bar charts as subplots, but the y axis keeps showing values in scientific notation. The initial code I ran was:
from matplotlib import pyplot as plt
fig, axes = plt.subplots(7,3,figsize = (25, 40)) # axes is a numpy array of pyplot Axes
axes = iter(axes.ravel())
cat_columns=['Source','Side','State','Timezone',
'Amenity', 'Bump', 'Crossing', 'Give_Way',
'Junction', 'No_Exit', 'Railway', 'Roundabout', 'Station', 'Stop',
'Traffic_Calming', 'Traffic_Signal', 'Turning_Loop', 'Sunrise_Sunset',
'Civil_Twilight', 'Nautical_Twilight', 'Astronomical_Twilight']
for col in cat_columns:
    ax = df[col].value_counts().plot(kind='bar', label=col, ax=next(axes))
And the output looks like this:
fig, axes = plt.subplots(7,3,figsize = (25, 40)) # axes is a numpy array of pyplot Axes
axes = iter(axes.ravel())
cat_columns=['Source','Side','State','Timezone',
'Amenity', 'Bump', 'Crossing', 'Give_Way',
'Junction', 'No_Exit', 'Railway', 'Roundabout', 'Station', 'Stop',
'Traffic_Calming', 'Traffic_Signal', 'Turning_Loop', 'Sunrise_Sunset',
'Civil_Twilight', 'Nautical_Twilight', 'Astronomical_Twilight']
for col in cat_columns:
    ax = df[col].value_counts().plot(kind='bar', label=col, ax=next(axes))
    ax.ticklabel_format(useOffset=False, style='plain')
After using this line ax.ticklabel_format(useOffset=False, style='plain') I am getting an error like:
Please guide me on this error.
You can turn this off by creating a custom ScalarFormatter object and turning scientific notation off. For more details, see the matplotlib documentation pages on tick formatters and on ScalarFormatter.
# additional import statement at the top
from matplotlib import pyplot as plt
from matplotlib import ticker
# additional code for every axis
formatter = ticker.ScalarFormatter()
formatter.set_scientific(False)
ax.yaxis.set_major_formatter(formatter)
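The error screenshot is missing, so the following is only a guess at the cause: ticklabel_format acts on both axes by default, and the categorical x axis of a bar chart does not use a ScalarFormatter, which makes the call fail. Under that assumption, restricting the call to the y axis may be enough. The category names and counts below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

categories = ["Day", "Night"]    # made-up category labels
counts = [2_500_000, 1_200_000]  # made-up counts, large enough for 1e6 notation

fig, ax = plt.subplots()
ax.bar(categories, counts)
# Restrict the call to the y axis; the categorical x axis has no ScalarFormatter
ax.ticklabel_format(axis="y", style="plain", useOffset=False)

fig.canvas.draw()  # the offset text is only computed at draw time
offset_text = ax.yaxis.get_offset_text().get_text()
y_formatter = ax.yaxis.get_major_formatter()
```

An empty offset text after drawing indicates that no "1e6" multiplier is shown above the axis.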

Matplotlib Axes legend shows only one label in barh

I have 15 barh subplots that look like this:
I can't seem to get the legend working so that I see [2,3,4] as separate labels in the graph and in the legend.
I'm having trouble with making this work for subgraphs. My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def plot_bars_by_data(data, title):
    fig, axs = plt.subplots(8,2, figsize=(20,40))
    fig.suptitle(title, fontsize=20)
    fig.subplots_adjust(top=0.95)
    plt.rcParams.update({'font.size': 13})
    axs[7,1].remove()
    column_index = 0
    for ax_line in axs:
        for ax in ax_line:
            if column_index < len(data.columns):
                column_name = data.columns[column_index]
                current_column_values = data[column_name].value_counts().sort_index()
                ax.barh([str(i) for i in current_column_values.index], current_column_values.values)
                ax.legend([str(i) for i in current_column_values.index])
                ax.set_title(column_name)
                column_index += 1
    plt.show()
# random data
df_test = pd.DataFrame([np.random.randint(2,5,size=15) for i in range(15)], columns=list('abcdefghijlmnop'))
plot_bars_by_data(df_test, "testing")
I just get an 8x2 grid of bar charts that look like the graph above. How can I fix this?
I'm using Python 3.6 and Jupyter Python notebook.
Use the following lines in your code. I can't show the whole output here because it is a large figure with many subplots, so I am showing one particular subplot. It turns out that you first have to create a handle for the bars and then pass both the handle and the legend labels to ax.legend to produce the desired legend.
colors = ['r', 'g', 'b']
axx = ax.barh([str(i) for i in current_column_values.index], current_column_values.values, color=colors)
ax.legend(axx, [str(i) for i in current_column_values.index])
Sample Output
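Another way to get one legend entry per bar, sketched here under the assumption that a separate barh call per bar is acceptable, is to give each bar its own label and let ax.legend collect them. The column data and the color list below are made up to mimic the question's values 2, 3 and 4.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up column with values 2, 3 and 4, as in the question's random data
column = pd.Series([2, 3, 3, 4, 4, 4, 2, 3, 4])
counts = column.value_counts().sort_index()
colors = ["r", "g", "b"]

fig, ax = plt.subplots()
for value, count, color in zip(counts.index, counts.values, colors):
    # A separate barh call per bar gives each bar its own labelled handle
    ax.barh(str(value), count, color=color, label=str(value))
ax.legend(title="value")

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

The trade-off is a few more draw calls per subplot in exchange for not having to pass handles around by hand.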

Using Colormaps to set color of line in matplotlib

How does one set the color of a line in matplotlib with scalar values provided at run time using a colormap (say jet)? I tried a couple of different approaches here and I think I'm stumped. values[] is a sorted array of scalars, curves is a set of 1-d arrays, and labels is an array of text strings. All of the arrays have the same length.
fig = plt.figure()
ax = fig.add_subplot(111)
jet = colors.Colormap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    retLine, = ax.plot(line, color=colorVal)
    #retLine.set_color()
    lines.append(retLine)
ax.legend(lines, labels, loc='upper right')
ax.grid()
plt.show()
The error you are receiving is due to how you define jet. You are creating the base class Colormap with the name 'jet', but this is very different from getting the default definition of the 'jet' colormap. This base class should never be created directly, and only the subclasses should be instantiated.
What you've found with your example is a buggy behavior in Matplotlib. There should be a clearer error message generated when this code is run.
This is an updated version of your example:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
# define some random data that emulates your intended code:
NCURVES = 10
np.random.seed(101)
curves = [np.random.random(20) for i in range(NCURVES)]
values = range(NCURVES)
fig = plt.figure()
ax = fig.add_subplot(111)
# replace the next line
#jet = colors.Colormap('jet')
# with
jet = cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
print(scalarMap.get_clim())
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    colorText = (
        'color: (%4.2f,%4.2f,%4.2f)'%(colorVal[0],colorVal[1],colorVal[2])
    )
    retLine, = ax.plot(line,
                       color=colorVal,
                       label=colorText)
    lines.append(retLine)
#added this to get the legend to work
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
plt.show()
Resulting in:
Using a ScalarMappable is an improvement over the approach presented in my related answer:
creating over 20 unique legend colors using matplotlib
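When there are many curves, a colorbar built from the same ScalarMappable can stand in for a long legend. The sketch below assumes made-up values and curves in place of the question's data; the set_array call is only there for older matplotlib releases and is harmless on newer ones.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np

rng = np.random.default_rng(101)
values = np.arange(10)                     # made-up scalar value per curve
curves = [rng.random(20) for _ in values]  # made-up curves

cmap = plt.get_cmap("jet")
norm = colors.Normalize(vmin=values.min(), vmax=values.max())
scalar_map = cmx.ScalarMappable(norm=norm, cmap=cmap)
scalar_map.set_array([])  # only needed on older matplotlib versions

fig, ax = plt.subplots()
lines = []
for value, curve in zip(values, curves):
    # each curve gets the colormap color that corresponds to its scalar value
    line, = ax.plot(curve, color=scalar_map.to_rgba(value))
    lines.append(line)

# The colorbar shows the value-to-color mapping for all curves at once
cbar = fig.colorbar(scalar_map, ax=ax, label="value")
```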
I thought it would be beneficial to include what I consider to be a simpler method, using numpy's linspace coupled with matplotlib's cm-type object. It's possible that the above solution is for an older version. I am using Python 3.4.3, matplotlib 1.4.3, and numpy 1.9.3, and my solution is as follows.
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import linspace
start = 0.0
stop = 1.0
number_of_lines= 1000
cm_subsection = linspace(start, stop, number_of_lines)
colors = [ cm.jet(x) for x in cm_subsection ]
for i, color in enumerate(colors):
    plt.axhline(i, color=color)
plt.ylabel('Line Number')
plt.show()
This results in 1000 uniquely-colored lines that span the entire cm.jet colormap as pictured below. If you run this script you'll find that you can zoom in on the individual lines.
Now say I want my 1000 line colors to just span the greenish portion between lines 400 to 600. I simply change my start and stop values to 0.4 and 0.6 and this results in using only 20% of the cm.jet color map between 0.4 and 0.6.
So in a one line summary you can create a list of rgba colors from a matplotlib.cm colormap accordingly:
colors = [ cm.jet(x) for x in linspace(start, stop, number_of_lines) ]
In this case I use the commonly invoked map named jet but you can find the complete list of colormaps available in your matplotlib version by invoking:
>>> from matplotlib import cm
>>> dir(cm)
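The sub-range idea can also be packaged as a colormap of its own, which may be handy when the restricted map has to be passed to functions that expect a cmap argument. This is a minimal sketch, assuming the same 0.4 to 0.6 slice of jet; the name "jet_sub" and the line count are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

base = plt.get_cmap("jet")
start, stop = 0.4, 0.6
# Sample the chosen fraction of the base map and wrap it as a new colormap
sub_cmap = ListedColormap(base(np.linspace(start, stop, 256)), name="jet_sub")

number_of_lines = 50
fig, ax = plt.subplots()
for i, frac in enumerate(np.linspace(0.0, 1.0, number_of_lines)):
    # 0..1 on the new map spans only the 0.4..0.6 portion of jet
    ax.axhline(i, color=sub_cmap(frac))
ax.set_ylabel("Line Number")
```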
A combination of line styles, markers, and qualitative colors from matplotlib:
import itertools
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 8*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, colormap)):
    plt.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=4);
UPDATE: Supporting not only ListedColormap, but also LinearSegmentedColormap
import itertools
import matplotlib.pyplot as plt
Ncolors = 8
#colormap = plt.cm.Dark2# ListedColormap
colormap = plt.cm.viridis# LinearSegmentedColormap
Ncolors = min(colormap.N,Ncolors)
mapcolors = [colormap(int(x*colormap.N/Ncolors)) for x in range(Ncolors)]
N = Ncolors*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
fig,ax = plt.subplots(gridspec_kw=dict(right=0.6))
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, mapcolors)):
    ax.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3,prop={'size': 8})
You can do what I described from my deleted account (it was banned from posting new answers). It's rather simple and looks nice.
I usually use the third of these three versions; I have not checked versions 1 and 2.
from matplotlib.pyplot import cm
import numpy as np
# n should be the number of curves to plot: n = len(array_of_curves_to_plot)
# (ax1, x and y are assumed to be defined already)
#version 1:
color=cm.rainbow(np.linspace(0,1,n))
for i,c in zip(range(n),color):
    ax1.plot(x, y,c=c)
#or version 2: - faster and better:
color=iter(cm.rainbow(np.linspace(0,1,n)))
c=next(color)
plt.plot(x,y,c=c)
#or version 3:
color=iter(cm.rainbow(np.linspace(0,1,n)))
for i in range(n):
    c=next(color)
    ax1.plot(x, y,c=c)
example of 3:
Ship RAO of Roll vs Ikeda damping in function of Roll amplitude A44
