I am always bothered when I make a bar plot with pandas and I want to change the names of the labels in the legend. Consider for instance the output of this code:
import pandas as pd
from matplotlib.pyplot import *
df = pd.DataFrame({'A':26, 'B':20}, index=['N'])
df.plot(kind='bar')
Now, if I want to change the name in the legend, I would usually try to do:
legend(['AAA', 'BBB'])
But I end up with this:
In fact, the first dashed line seems to correspond to an additional patch.
So I wonder: is there a simple trick to change the labels, or do I need to plot each of the columns independently with matplotlib and set the labels myself? Thanks.
To change the labels for Pandas df.plot() use ax.legend([...]):
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'A':26, 'B':20}, index=['N'])
df.plot(kind='bar', ax=ax)
#ax = df.plot(kind='bar') # "same" as above
ax.legend(["AAA", "BBB"]);
Another approach is to do the same with plt.legend([...]):
import matplotlib.pyplot as plt
df.plot(kind='bar')
plt.legend(["AAA", "BBB"]);
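A different technique, not taken from the answers above, is to rename the columns before plotting, so that pandas builds the legend from the new names and no handles have to be matched up by hand. This is a minimal sketch assuming the same toy DataFrame; the mapping `new_names` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': 26, 'B': 20}, index=['N'])

# Hypothetical mapping from column names to the legend labels you want
new_names = {'A': 'AAA', 'B': 'BBB'}

# rename() returns a new DataFrame, so df keeps its original column names
ax = df.rename(columns=new_names).plot(kind='bar')

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```

Because the renamed frame is only used for plotting, the rest of your code can keep working with the original column names.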
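If you need to call plot multiple times, you can also use the "label" argument, passing the first axes to the later calls:
ax = df1.plot(label='df1', y='y_var')
df2.plot(label='df2', y='y_var', ax=ax)  # ax=ax draws on the same axes instead of a new figure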
While this is not the case in the OP's question, this can be helpful if the DataFrame is in long format and you use groupby before plotting.
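Here is a small runnable version of the "label" approach. It is a sketch assuming two hypothetical DataFrames `df1` and `df2` that share a column named `y_var`; because both calls draw on the same axes, pandas extends the existing legend instead of replacing it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two frames with the same column name
df1 = pd.DataFrame({'y_var': np.arange(5)})
df2 = pd.DataFrame({'y_var': np.arange(5) ** 2})

ax = df1.plot(label='df1', y='y_var')
# Reuse the axes so both lines (and both legend entries) end up together
df2.plot(label='df2', y='y_var', ax=ax)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```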
This is a bit of an edge case, but I think it adds some value to the other answers.
If you add more details to the graph (say an annotation or a line), you'll soon discover that it matters when you call legend on the axes: if you call it at the bottom of the script, it will capture different handles for the legend elements, messing everything up.
For instance the following script:
df = pd.DataFrame({'A':26, 'B':20}, index=['N'])
ax = df.plot(kind='bar')
ax.hlines(23, -.5,.5, linestyles='dashed')
ax.annotate('average',(-0.4,23.5))
ax.legend(["AAA", "BBB"]);  # quick fix: move this line right after df.plot(), before hlines()
This will give you the following figure, which is wrong:
While this is a toy example that can easily be fixed by changing the order of the commands, sometimes you'll need to modify the legend after several operations, and the next method gives you more flexibility. Here, for instance, I've also changed the font size and position of the legend:
df = pd.DataFrame({'A':26, 'B':20}, index=['N'])
ax = df.plot(kind='bar')
ax.hlines(23, -.5,.5, linestyles='dashed')
ax.annotate('average',(-0.4,23.5))
ax.legend(["AAA", "BBB"]);
# do potentially more stuff here
h, l = ax.get_legend_handles_labels()
# keep only the first two handles (the bars) and relabel them; loc=3 means 'lower left'
ax.legend(h[:2], ["AAA", "BBB"], loc=3, fontsize=12)
This is what you'll get:
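A different way to make the legend independent of the order in which artists were added is to pass the bar containers explicitly as handles. This sketch assumes, as holds for the pandas bar plots here, that each column is drawn with one `ax.bar()` call, which stores one `BarContainer` in `ax.containers`; the dashed line and the annotation are not containers, so they cannot end up in the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': 26, 'B': 20}, index=['N'])
ax = df.plot(kind='bar')
ax.hlines(23, -.5, .5, linestyles='dashed')
ax.annotate('average', (-0.4, 23.5))

# One BarContainer per column, in column order: use them as explicit handles
ax.legend(list(ax.containers), ["AAA", "BBB"], loc='lower left', fontsize=12)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```

Since the handles are given explicitly, this call can be placed anywhere after the bars are drawn.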
Consider the following snippet:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
data = np.random.rand(10,5)
cols = ["a","b","c","d","e"]
df = pd.DataFrame(data=data, columns = cols)
df.index.name="Time (s)"
fig,axes = plt.subplots(3,2,sharex=True, squeeze=False)
axes = axes.T.flat
axes[5].remove()
df.plot(subplots=True,grid=True,legend=True,ax = axes[0:5])
which produces the following plot:
I want to show the x ticks in the subplots where they are missing, as marked in red in the picture above.
I want to show only the x ticks where I marked them in red, not the labels. The labels are fine where they currently are and should stay there.
After some search, I tried with
for ax in axes:
ax.tick_params(axis="x")
and
for ax in axes:
ax.spines.set(visible=True)
but with no success.
Any hints?
EDIT: As someone kindly suggested, I could set sharex=False, but then when I zoom horizontally on one axes, the other axes are not zoomed by the same amount, and this is not what I want.
What I want is to: a) show the x ticks in all axes, b) when I zoom horizontally on one axes, have all the other axes zoomed horizontally by the same amount.
You need to turn off x-axis sharing by setting sharex=False (which, by the way, is the default in matplotlib.pyplot.subplots):
Replace this:
fig,axes = plt.subplots(3,2,sharex=True, squeeze=False)
By this:
fig,axes = plt.subplots(3,2, squeeze=False)
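Setting sharex=False also unlinks the zoom, which the question's edit rules out. One option that keeps the zoom linked, not part of the answer above, is to keep sharex=True and switch the bottom tick labels back on only for the subplot that lost them. This sketch assumes, from the question's layout, that the missing tick labels belong to axes[4], the subplot directly above the removed one; `tick_params(labelbottom=True)` only affects that axes, so the shared limits stay in place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = np.random.rand(10, 5)
cols = ["a", "b", "c", "d", "e"]
df = pd.DataFrame(data=data, columns=cols)
df.index.name = "Time (s)"

# Keep sharex=True so zooming one subplot still zooms all the others
fig, axes = plt.subplots(3, 2, sharex=True, squeeze=False)
axes = axes.T.ravel()  # column-major order, as in the question
axes[5].remove()
df.plot(subplots=True, grid=True, legend=True, ax=axes[0:5])

# sharex=True hides the tick labels of subplots outside the bottom row;
# axes[4] now has no subplot below it, so turn its tick labels back on
axes[4].tick_params(axis="x", labelbottom=True)

fig.canvas.draw()  # create the tick labels so they can be inspected
```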
I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False doesn't work, as the bars then appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account, and the second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which presumably already has an 'is_weekend' column. Creating the twinx ax beforehand allows writing a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
Use plt.xticks(ticks, labels) to put the labels on the x axis by hand, e.g. plt.xticks([0, 1], ['first label', 'second label']).
I am trying to generate a figure with 4 subplots, each of which is a Seaborn histplot. The figure definition lines are:
fig,axes=plt.subplots(2,2,figsize=(6.3,7),sharex=True,sharey=True)
(ax1,ax2),(ax3,ax4)=axes
fig.subplots_adjust(wspace=0.1,hspace=0.2)
I would like to define strings for legend entries in each of the subplots. As an example, I am using the following code for the first subplot:
sp1=sns.histplot(df_dn,x="ktau",hue="statind",element="step", stat="density",common_norm=True,fill=False,palette=colvec,ax=ax1)
ax1.set_title(r'$d_n$')
ax1.set_xlabel(r'max($F_{a,max}$)')
ax1.set_ylabel(r'$\tau_{ken}$')
legend_labels,_=ax1.get_legend_handles_labels()
ax1.legend(legend_labels,['dep-','ind-','ind+','dep+'],title='Stat.ind.')
The legend is not showing correctly: the legend entries are not plotted, and the legend title is the name of the hue variable ("statind"). Please note that I have successfully used the same code for other figures in which I used Seaborn relplots instead of histplots.
The main problem is that ax1.get_legend_handles_labels() returns empty lists (note that the first return value is the handles; the second would be the labels). This is the case at least for the current (0.11.1) version of seaborn's histplot().
To get the handles, you can do legend = ax1.get_legend(); handles = legend.legendHandles. From matplotlib 3.7 on, this attribute is called legend.legend_handles; legendHandles was deprecated then and removed in 3.9.
To recreate the legend, first the existing legend needs to be removed. Then, the new legend can be created starting from some handles.
Also note that to be sure of the order of the labels, it helps to set hue_order. Here is some example code to show the ideas:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df_dn = pd.DataFrame({'ktau': np.random.randn(4000).cumsum(),
'statind': np.repeat([*'abcd'], 1000)})
fig, ax1 = plt.subplots()
sp1 = sns.histplot(df_dn, x="ktau", hue="statind", hue_order=['a', 'b', 'c', 'd'],
element="step", stat="density", common_norm=True, fill=False, ax=ax1)
ax1.set_title(r'$d_n$')
ax1.set_xlabel(r'max($F_{a,max}$)')
ax1.set_ylabel(r'$\tau_{ken}$')
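legend = ax1.get_legend()
# matplotlib >= 3.7 names this attribute legend_handles (legendHandles was removed in 3.9)
handles = legend.legend_handles if hasattr(legend, 'legend_handles') else legend.legendHandles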
legend.remove()
ax1.legend(handles, ['dep-', 'ind-', 'ind+', 'dep+'], title='Stat.ind.')
plt.show()
I want to make a legend for all bars in my bar plot. I have already extracted the labels for all bars, but somehow legend() only creates an entry for the first one and not the second one.
How should I proceed? I was thinking that maybe I have to extract the colors of the bars manually as well, but I'm not sure. I was hoping there would be an easier way.
df.Completeness.value_counts().plot(kind='bar')
_, labels = plt.xticks()
label_names = list(map(lambda p: p.get_text(), labels))
print(label_names)
plt.legend(label_names)
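Set the colors by hand and build the legend handles with matplotlib.patches (mpatches):
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# color the bars explicitly so the legend patches can match them;
# assumes 'Complete' is the most frequent value (value_counts sorts by count)
df.Completeness.value_counts().plot(kind='bar', color=['red', 'blue'])
complete = mpatches.Patch(color='red', label='Complete')
partial = mpatches.Patch(color='blue', label='Partial')
plt.legend(handles=[complete, partial])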
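Instead of building patches by hand, the bars themselves can serve as legend handles, so each entry picks up the color of its bar automatically. This is a sketch with a hypothetical `df`; it assumes that the Rectangles in `ax.patches` are drawn in the same order as `counts.index`, which is how pandas draws a single Series as a bar plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the question's df
df = pd.DataFrame({'Completeness': ['Complete', 'Partial', 'Complete', 'Complete']})
counts = df.Completeness.value_counts()

ax = counts.plot(kind='bar', color=['red', 'blue'])

# One Rectangle per bar, in index order: pair each bar with its label
bars = list(ax.patches)
ax.legend(bars, [str(label) for label in counts.index])

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```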
If you run this dummy example, do you get the layout that you want?
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':np.random.rand(2)-1,'B':np.random.rand(2)},index=['val1','val2'] )
ax = df.plot(kind='bar', color=['r','b'])
I came across this different behaviour in the third example plot below. Why am I able to correctly edit the x-axis ticks with pandas line() and area() plots, but not with bar()? What's the best way to fix the (general) third example?
import numpy as np
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
x = np.arange(73,145,1)
y = np.cos(x)
df = pd.Series(y,x)
ax1 = df.plot.line()
ax1.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax1.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
ax2 = df.plot.area(stacked=False)
ax2.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax2.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
plt.show()
Problem:
The bar plot is meant to be used with categorical data. Therefore the bars are not actually at the positions of x but at positions 0, 1, 2, ..., N-1. The bar labels are then adjusted to the values of x.
pandas sets these labels with a FixedFormatter, which hands them out by tick position in sequence rather than by value. So if you then put a tick only on every tenth bar, the second label will be placed at the tenth bar, and so on. The result is:
You can see that the bars are actually positioned at integer values starting at 0 by using a normal ScalarFormatter on the axis:
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
ax3.xaxis.set_major_formatter(ticker.ScalarFormatter())
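The same fact can also be checked without looking at the plot, by reading the bar rectangles directly. This minimal sketch assumes that all patches on the axes are the bars created by `df.plot.bar()`, which is the case for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.arange(73, 145, 1)
df = pd.Series(np.cos(x), x)
ax3 = df.plot.bar()

# Centre of each bar = left edge + half the width
centres = np.array([p.get_x() + p.get_width() / 2 for p in ax3.patches])
print(centres[:3], df.index[:3].tolist())
```

The centres are 0, 1, 2, ..., while the index values drawn under them are 73, 74, 75, ...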
Now you can, of course, define your own fixed formatter like this:
n = 10
ax3 = df.plot.bar()
ax3.xaxis.set_major_locator(ticker.MultipleLocator(n))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(n/4.))
seq = ax3.xaxis.get_major_formatter().seq
ax3.xaxis.set_major_formatter(ticker.FixedFormatter([""]+seq[::n]))
which has the drawback that it starts at some arbitrary value.
Solution:
I would guess the best general solution is not to use the pandas plotting function at all (which is only a wrapper anyway), but to use the matplotlib bar function directly:
fig, ax3 = plt.subplots()
ax3.bar(df.index, df.values, width=0.72)
ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
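If you would rather keep pandas' bar(), one possible alternative (a different technique from the solution above) is a FuncFormatter that maps each bar position back to the index value drawn there, so that every tick is labelled by value instead of by its order. This is a sketch assuming the bars sit at 0, 1, ..., N-1 as described above; `bar_position_to_label` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd

x = np.arange(73, 145, 1)
df = pd.Series(np.cos(x), x)
ax3 = df.plot.bar()

def bar_position_to_label(pos, _tick_index):
    """Map a bar position (0, 1, 2, ...) back to the index value drawn there."""
    i = int(round(pos))
    if abs(pos - i) < 1e-9 and 0 <= i < len(df.index):
        return str(df.index[i])
    return ""  # ticks between bars or outside the data get no label

ax3.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax3.xaxis.set_minor_locator(ticker.MultipleLocator(2.5))
ax3.xaxis.set_major_formatter(ticker.FuncFormatter(bar_position_to_label))
ax3.figure.canvas.draw()

labels = [t.get_text() for t in ax3.get_xticklabels() if t.get_text()]
print(labels)
```

The ticks still sit on every tenth bar, but the first labelled tick now reads 73, the next 83, and so on, regardless of where the locator starts.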