Controlling tick labels alignment in pandas Boxplot within subplots

For a single boxplot, the tick labels alignment can be controlled like so:
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
fig,ax = plt.subplots()
df.boxplot(column='col1',by='col2',rot=45,ax=ax)
plt.xticks(ha='right')
This is necessary because when the tick labels are long, it is impossible to read the plot if the tick labels are centered (the default behavior).
Now on to the case of multiple subplots. (I am sorry I am not posting a complete code example). I build the main figure first:
fig,axarr = plt.subplots(ny,nx,sharex=True,sharey=True,figsize=(12,6),squeeze=False)
then comes a for loop that iterates over all the subplot axes and calls a function that draws a boxplot in each of the axes objects:
for i,(key,gr) in enumerate(grouped):
    ix = i//ny # integer division (plain i/ny only worked in Python 2)
    iy = i%ny
    add_box_plot(gr,xcol,axarr[iy,ix])
where
def add_box_plot(gs,xcol,ax):
    gs.boxplot(column=xcol,by=keyCol,rot=45,ax=ax)
I have not found a way to get properly aligned tick labels.
If I add
plt.xticks(ha='right')
after the boxplot command in the function, only the last subplot gets the ticks aligned correctly (why?).

If I add plt.xticks(ha='right') after the boxplot command in the function, only the last subplot gets the ticks aligned correctly (why?).
This happens because plt.xticks acts on the currently active axes. When you create subplots, the one created last is active. You then draw on the axes objects directly (by passing ax=ax to boxplot). However, this does not change the active axes.
Solution 1:
Use plt.sca() to set the current axis:
def add_box_plot(gs, xcol, ax):
    gs.boxplot(column=xcol, by=keyCol, rot=45, ax=ax)
    plt.sca(ax)
    plt.xticks(ha='right')
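The "current axes" behaviour can be seen in isolation with a minimal sketch like the one below. The 2x2 grid, the tick positions and the label strings are made up for illustration and are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

fig, axarr = plt.subplots(2, 2, squeeze=False)

# Right after creation, the most recently added axes is the "current" one.
current_after_creation = plt.gca()

# plt.sca() switches the current axes, so plt.xticks() now targets that panel.
plt.sca(axarr[0, 0])
plt.xticks([0, 1, 2], ["first", "second", "third"], rotation=45, ha="right")

first_panel_alignment = [t.get_ha() for t in axarr[0, 0].get_xticklabels()]
last_panel_labels = [t.get_text() for t in axarr[1, 1].get_xticklabels()]
```

Only the top-left panel receives the new labels and alignment; the other panels are left untouched.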
Solution 2:
Use Axes.set_xticklabels() instead:
def add_box_plot(gs, xcol, ax):
    gs.boxplot(column=xcol,by=keyCol,rot=45,ax=ax)
    plt.draw() # This may be required to update the labels
    labels = [l.get_text() for l in ax.get_xticklabels()]
    ax.set_xticklabels(labels, ha='right')
I'm not sure if the call to plt.draw() is always required, but if I leave it out I only get empty labels.

Since you are using the mpl object oriented interface, you can set the tick parameters for each axis individually.
Add a line to set the xticklabels within your add_box_plot function (after gs.boxplot). Unlike plt.xticks, set_xticklabels does not accept the ha keyword on its own; it also requires a list of tick labels. Here, we can just grab the existing labels with get_xticklabels:
def add_box_plot(gs,xcol,ax):
    gs.boxplot(column=xcol,by=keyCol,rot=45,ax=ax)
    ax.set_xticklabels(ax.get_xticklabels(),ha='right')
Here's a minimal example to show this working:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# We'll create two subplots, to test out different alignments
fig,(ax1,ax2) = plt.subplots(2)
# A sample dataframe
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
# Boxplot on first subplot
df.boxplot(ax=ax1)
# Boxplot on second subplot
df.boxplot(ax=ax2)
# Set xticklabels to right alignment
ax1.set_xticklabels(ax1.get_xticklabels(),ha='right')
# Set xticklabels to left alignment
ax2.set_xticklabels(ax2.get_xticklabels(),ha='left')
plt.show()
Notice the xticklabels are right-aligned on the top subplot, and left-aligned on the bottom.
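A possible variation, sketched here under assumptions, is to leave the label texts alone and only change their alignment with plt.setp, applied per axes inside the loop. The data frame, its column names ("value", "category", "panel") and the 1x2 layout are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two panels, three categories with long names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.random(60),
    "category": np.tile(["a fairly long label", "another long label", "short"], 20),
    "panel": np.repeat(["left", "right"], 30),
})

fig, axarr = plt.subplots(1, 2, sharey=True, figsize=(8, 4), squeeze=False)

for ax, (key, group) in zip(axarr.flat, df.groupby("panel")):
    group.boxplot(column="value", by="category", rot=45, ax=ax)
    ax.set_title(key)
    # Change only the alignment of the labels that boxplot already created.
    plt.setp(ax.get_xticklabels(), ha="right")

alignments = [[t.get_ha() for t in ax.get_xticklabels()] for ax in axarr.flat]
```

Because plt.setp receives the Text objects of one specific axes, it does not depend on which axes is currently active.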

Related

How can I rotate axis labels in a faceted seaborn.objects plot?

I am working with the excellent seaborn.objects module in the most recent version of seaborn.
I would like to produce a plot:
With rotated x-axis labels
With facets
Rotating x-axis labels is not directly supported within seaborn.objects (and standard stuff like plt.xticks() after making the graph doesn't work), but the documentation suggests doing it using the .on() method. .on() takes a matplotlib figure/subfigure or axis object and builds on top of it. As I pointed out in my answer to this question, the following code works to rotate axis labels:
import seaborn.objects as so
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4],
                   'b':[5,6,7,8],
                   'c':['a','a','b','b']})
fig, ax = plt.subplots()
ax.xaxis.set_tick_params(rotation=90)
(so.Plot(df, x = 'a', y = 'b')
    .add(so.Line())
    .on(ax))
However, if I add facets by changing the graph code to
(so.Plot(df, x = 'a', y = 'b')
    .add(so.Line())
    .on(ax)
    .facet('c'))
I get the error Cannot create multiple subplots after calling Plot.on with a <class 'matplotlib.axes._axes.Axes'> object. You may want to use a <class 'matplotlib.figure.SubFigure'> instead.
However, if I follow the instruction, and instead use the fig object, rotating the axes in that, I get a strange dual x-axis, one with rotated labels unrelated to the data, and the actual graph's labels, unrotated:
fig, ax = plt.subplots()
plt.xticks(rotation = 90)
(so.Plot(df, x = 'a', y = 'b')
    .add(so.Line())
    .on(fig)
    .facet('c'))
How can I incorporate rotated axis labels with facets?

You're ending up with multiple axes plotted on top of each other. Note the parameter description for Plot.on:
Passing matplotlib.axes.Axes will add artists without otherwise modifying the figure. Otherwise, subplots will be created within the space of the given matplotlib.figure.Figure or matplotlib.figure.SubFigure.
Additionally, the pyplot functions (i.e. plt.xticks) only operate on the "current" axes, not all axes in the current figure.
So, two step solution:
Only initialize the figure externally, and delegate subplot creation to Plot
Use matplotlib's object-oriented interface to modify the tick label parameters
Example:
fig = plt.figure()
(so.Plot(df, x = 'a', y = 'b')
.add(so.Line())
.on(fig)
.facet('c')
.plot()
)
for ax in fig.axes:
ax.tick_params("x", rotation=90)
Note that it will likely become possible to control tick label rotation directly through the Plot API, although rotating tick labels (especially to 90 degrees) is often not the best way to make overlapping labels readable.
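The per-axes loop does not depend on seaborn. As a rough sketch in plain matplotlib (the two-panel layout and the plotted values are placeholders standing in for the facets), it looks like this:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
axes = fig.subplots(1, 2, sharey=True)  # stand-in for the facets created by Plot
for ax, label in zip(axes, ["a", "b"]):
    ax.plot([1, 2], [5, 6])
    ax.set_title(label)

# Rotate the x tick labels on every axes in the figure, not just the current one.
for ax in fig.axes:
    ax.tick_params("x", rotation=90)

rotations = [[t.get_rotation() for t in ax.get_xticklabels()] for ax in fig.axes]
```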

Plot only specific subplots of a grid of subplots

I have created a grid of subplots to my liking.
I initiated the plotting by defining fig,ax = plt.subplots(2,6,figsize=(24,8))
So far so good. I filled those subplots with their respective content. Now I want to plot one or two particular subplots in isolation. I tried:
ax[idx][idx].plot()
This does not work and returns an empty list
I have tried:
fig_single,ax_single = plt.subplots(2,1)
ax_single[0]=ax[idx][0]
ax_single[1]=ax[idx][1]
This returns:
TypeError: 'AxesSubplot' object does not support item assignment
How do I proceed without plotting those subplots again by calling the respective plot functions?

You're close.
fig,ax = plt.subplots(nrows=2,ncols=6,sharex=False,sharey=False,figsize=(24,8))
#set either sharex=True or sharey=True if you wish axis limits to be shared
#=> very handy for interactive exploration of timeseries data, ...
r=0 #first row
c=0 #first column
ax[r,c].plot() #plot your data, instead of ax[r][c].plot()
ax[r,c].set_title('Subplot title') #name title for a subplot
ax[r,c].set_ylabel('Ylabel ') #ylabel for a subplot
ax[r,c].set_xlabel('X axis label') #xlabel for a subplot
A more complete/flexible method is to assign r,c:
for i in range(nrows*ncols):
    r,c = np.divmod(i,ncols)
    ax[r,c].plot() #....
You can afterwards still make modifications, e.g. set_ylim, set_title, ...
So if you want to set the y-label of the 11th subplot (row index 1, column index 4 in a 2x6 grid):
ax[1,4].set_ylabel('11th subplot ylabel')
You will often want to make use of fig.tight_layout() at the end, so that the figure uses the available area correctly.
Complete example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,180,180)
nrows = 2
ncols = 6
fig,ax = plt.subplots(nrows=nrows,ncols=ncols,sharex=False,sharey=False,figsize=(24,8))
for i in range(nrows*ncols):
    r,c = np.divmod(i,ncols)
    y = np.sin(np.deg2rad(x)*(i+1)) #x is in degrees; convert to radians
    ax[r,c].plot(x,y)
    ax[r,c].set_title('%s'%i)
fig.suptitle('Overall figure title')
fig.tight_layout()
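The mapping from a flat subplot number to a (row, column) pair can be checked on its own. This is a small sketch for the 2x6 grid used above; the variable names are chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

nrows, ncols = 2, 6
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(24, 8))

i = 10                   # 11th subplot, counting from zero in row-major order
r, c = divmod(i, ncols)  # row and column index in the grid
ax[r, c].set_ylabel("11th subplot ylabel")

# ax.flat walks the same grid in row-major order, so both index the same axes.
same_axes = ax.flat[i] is ax[r, c]
```

With 2 rows, valid row indices are 0 and 1, so the 11th subplot sits at ax[1, 4].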

Two seaborn plots with different scales displayed on same plot but bars overlap

I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False doesn't work, as the bars then appear on top of each other.

The main problem with the approach in the question is that the first countplot doesn't take hue into account. The second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which presumably already has a column 'is_weekend'. Creating the twinx ax beforehand allows writing a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
                     'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
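For comparison, the same side-by-side layout can be sketched without seaborn by counting per hour and shifting the two bar groups by hand. This is a different technique from the hue-based dodge above, not a description of what seaborn does internally; the demo data, the bar width and the colors are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical demo data, similar in spirit to the example above.
rng = np.random.default_rng(0)
ride_hod = rng.normal(13, 3, 1000).astype(int) % 24
is_weekend = rng.random(1000) < 2 / 7

hours = np.arange(24)
weekday_counts = np.bincount(ride_hod[~is_weekend], minlength=24)
weekend_counts = np.bincount(ride_hod[is_weekend], minlength=24)

width = 0.4  # each group takes half of the 0.8-wide slot around an hour
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
# Shift weekday bars left and weekend bars right so they sit next to each other.
ax1.bar(hours - width / 2, weekday_counts, width=width, color="tab:blue")
ax2.bar(hours + width / 2, weekend_counts, width=width, color="tab:orange")
ax1.set_xticks(hours)
ax1.set_xlabel("Hour of Day")
ax1.set_ylabel("Count (weekday)")
ax2.set_ylabel("Count (weekend)")
fig.tight_layout()
```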

Alternatively, set the tick labels by hand with plt.xticks(ticks, labels).

Customizing legend in Seaborn histplot subplots

I am trying to generate a figure with 4 subplots, each of which is a Seaborn histplot. The figure definition lines are:
fig,axes=plt.subplots(2,2,figsize=(6.3,7),sharex=True,sharey=True)
(ax1,ax2),(ax3,ax4)=axes
fig.subplots_adjust(wspace=0.1,hspace=0.2)
I would like to define strings for legend entries in each of the subplots. As an example, I am using the following code for the first subplot:
sp1=sns.histplot(df_dn,x="ktau",hue="statind",element="step", stat="density",common_norm=True,fill=False,palette=colvec,ax=ax1)
ax1.set_title(r'$d_n$')
ax1.set_xlabel(r'max($F_{a,max}$)')
ax1.set_ylabel(r'$\tau_{ken}$')
legend_labels,_=ax1.get_legend_handles_labels()
ax1.legend(legend_labels,['dep-','ind-','ind+','dep+'],title='Stat.ind.')
The legend is not showing correctly: the legend entries are not plotted, and the legend title is the name of the hue variable ("statind"). Please note I have successfully used the same code for other figures in which I used Seaborn relplots instead of histplots.

The main problem is that ax1.get_legend_handles_labels() returns empty lists, at least for the current (0.11.1) version of seaborn's histplot(). (Note that the first return value is the handles; the second is the labels.)
To get the handles, you can do legend = ax1.get_legend(); handles = legend.legendHandles. (Matplotlib 3.7 and later name this attribute legend.legend_handles.)
To recreate the legend, first the existing legend needs to be removed. Then, the new legend can be created starting from some handles.
Also note that to be sure of the order of the labels, it helps to set hue_order. Here is some example code to show the ideas:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df_dn = pd.DataFrame({'ktau': np.random.randn(4000).cumsum(),
                      'statind': np.repeat([*'abcd'], 1000)})
fig, ax1 = plt.subplots()
sp1 = sns.histplot(df_dn, x="ktau", hue="statind", hue_order=['a', 'b', 'c', 'd'],
                   element="step", stat="density", common_norm=True, fill=False, ax=ax1)
ax1.set_title(r'$d_n$')
ax1.set_xlabel(r'max($F_{a,max}$)')
ax1.set_ylabel(r'$\tau_{ken}$')
legend = ax1.get_legend()
handles = legend.legendHandles  # named legend.legend_handles in Matplotlib 3.7+
legend.remove()
ax1.legend(handles, ['dep-', 'ind-', 'ind+', 'dep+'], title='Stat.ind.')
plt.show()
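Since the handle attribute was renamed between Matplotlib versions, a version-tolerant lookup may help. The sketch below uses plain matplotlib lines as a stand-in for the histplot, and two placeholder labels, so it runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="a")
ax.plot([0, 1], [1, 0], label="b")
legend = ax.legend(title="statind")

# Newer Matplotlib exposes legend_handles; older versions use legendHandles.
handles = getattr(legend, "legend_handles", None)
if handles is None:
    handles = legend.legendHandles

# Remove the automatic legend and rebuild it with custom labels and title.
legend.remove()
ax.legend(handles, ["dep-", "ind-"], title="Stat.ind.")
new_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```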

Adding second legend to scatter plot

Is there a way to add a secondary legend to a scatterplot, where the size of the scatter is proportional to some data?
I have written the following code that generates a scatterplot. The color of the scatter represents the year (and is taken from a user-defined df) while the size of the scatter represents variable 3 (also taken from a df but is raw data):
import pandas as pd
colors = pd.DataFrame({'1985':'red','1990':'b','1995':'k','2000':'g','2005':'m','2010':'y'}, index=[0,1,2,3,4,5])
fig = plt.figure()
ax = fig.add_subplot(111)
for i in df.keys():
    df[i].plot(kind='scatter',x='variable1',y='variable2',ax=ax,label=i,s=df[i]['variable3']/100, c=colors[i])
ax.legend(loc='upper right')
ax.set_xlabel("Variable 1")
ax.set_ylabel("Variable 2")
This code (with my data) produces the following graph:
So while the colors/years are well and clearly defined, the size of the scatter is not.
How can I add a secondary or additional legend that defines what the size of the scatter means?

You will need to create the second legend yourself, i.e. you need to create some artists to populate the legend with. In the case of a scatter we can use a normal plot and set the marker accordingly.
This is shown in the example below. To actually add a second legend, we need to add the first legend to the axes explicitly (with ax.add_artist), so that the new legend does not replace the first one.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np; np.random.seed(1)
import pandas as pd
plt.rcParams["figure.subplot.right"] = 0.8
v = np.random.rand(30,4)
v[:,2] = np.random.choice(np.arange(1980,2015,5), size=30)
v[:,3] = np.random.randint(5,13,size=30)
df= pd.DataFrame(v, columns=["x","y","year","quality"])
df.year = df.year.values.astype(int)
fig, ax = plt.subplots()
for i, (name, dff) in enumerate(df.groupby("year")):
    c = matplotlib.colors.to_hex(plt.cm.jet(i/7.))
    dff.plot(kind='scatter',x='x',y='y', label=name, c=c,
             s=dff.quality**2, ax=ax)
leg = plt.legend(loc=(1.03,0), title="Year")
ax.add_artist(leg)
h = [plt.plot([],[], color="gray", marker="o", ms=i, ls="")[0] for i in range(5,13)]
plt.legend(handles=h, labels=range(5,13),loc=(1.03,0.5), title="Quality")
plt.show()
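When all points are drawn with a single ax.scatter call, reasonably recent Matplotlib versions can build both sets of legend entries through PathCollection.legend_elements. This sketch takes a different route from the hand-made proxy artists above; the random data, the colormap and the legend positions are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: year drives the color, quality drives the marker size.
rng = np.random.default_rng(1)
x, y = rng.random(30), rng.random(30)
year = rng.choice(np.arange(1985, 2015, 5), size=30)
quality = rng.integers(5, 13, size=30)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=year, s=quality**2, cmap="viridis")

# First legend: one entry per color (year). Keep it alive with add_artist.
color_handles, color_labels = sc.legend_elements(prop="colors")
leg_year = ax.legend(color_handles, color_labels, loc="upper left", title="Year")
ax.add_artist(leg_year)

# Second legend: marker sizes; func inverts s = quality**2 so labels show quality.
size_handles, size_labels = sc.legend_elements(prop="sizes", num=4, func=np.sqrt, alpha=0.6)
leg_size = ax.legend(size_handles, size_labels, loc="lower right", title="Quality")
```

Without the ax.add_artist call, the second ax.legend call would replace the first legend.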

Have a look at http://matplotlib.org/users/legend_guide.html.
It shows how to have multiple legends (about halfway down) and there is another example that shows how to set the marker size.
If that doesn't work, then you can also create a custom legend (last example).
