I'm plotting 2 dataframes with this method:
df.plot(ax=ax, x='x', y='y', label = "first_df")
df2.plot(ax=ax, x='x', y='y', label = "second_df")
And I add some axvspan calls:
plt.axvspan(x, y, label = value)
Since I have multiple axvspan calls and some of their labels are duplicated, I use this code to show each label only once:
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(),loc='upper center', bbox_to_anchor=(1.1, 0.8))
But when I display the legend, I get one legend for the DataFrames and another one for the axvspan calls. I think this is because I use DataFrame.plot for the DataFrames and plt for axvspan, so I don't know how to merge the legends.
EDIT:
I tried this with ax1 and ax2 for my DataFrames:
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend((h1+h2), l1+l2, loc='upper center', bbox_to_anchor=(1.1, 0.8))
It works, but I get duplicates in the legend. How can I remove them?
Instead of using df.plot, which creates a legend whenever it's called, you can use ax.plot:
ax.plot(df['x'], df['y'], label='first df')
ax.plot(df2['x'], df2['y'], label='second df')
ax.legend()
In the following example, the legend is combined by plotting everything onto the same Axes, including the span drawn with ax.axvspan, and then calling ax.legend, which adds the axvspan entry to the existing legend:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample dataset
rng = np.random.default_rng(seed=1)
size = 30
df = pd.DataFrame(dict(x=range(size), y=rng.integers(0, 100, size=size)))
df2 = pd.DataFrame(dict(x=range(size), y=rng.integers(10, 50, size=size)))
# Plot data onto a single Axes with a combined legend using multiple plotting functions
ax = df.plot(x='x', y='y', label='first_df', figsize=(8,4))
df2.plot(ax=ax, x='x', y='y', label = 'second_df')
ax.axvspan(10, 15, label='span', facecolor='black', edgecolor=None, alpha=0.2)
ax.legend(loc='upper right');
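For the duplicates mentioned in the EDIT, one option is to collect the entries from every Axes into a dict keyed by label before calling legend. A minimal sketch, assuming a twinx pair and repeated axvspan labels; the helper name unique_legend and the data are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs headless
import matplotlib.pyplot as plt


def unique_legend(target_ax, *axes, **legend_kwargs):
    """Hypothetical helper: draw one legend on target_ax from the entries of
    all given Axes, keeping only the first handle seen for each label."""
    by_label = {}
    for a in axes:
        handles, labels = a.get_legend_handles_labels()
        for handle, label in zip(handles, labels):
            by_label.setdefault(label, handle)  # later duplicates are ignored
    return target_ax.legend(list(by_label.values()), list(by_label.keys()),
                            **legend_kwargs)


fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot([0, 1, 2], [1, 3, 2], label="first_df")
ax2.plot([0, 1, 2], [10, 5, 8], color="tab:orange", label="second_df")
# two spans that share the same label, as with repeated axvspan calls
ax1.axvspan(0.2, 0.4, alpha=0.2, label="event")
ax1.axvspan(1.2, 1.6, alpha=0.2, label="event")

leg = unique_legend(ax1, ax1, ax2, loc="upper center")
legend_labels = [t.get_text() for t in leg.get_texts()]
print(legend_labels)
```

Passing ax2 as the first argument instead draws the legend on the twin Axes, which is drawn on top of ax1.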
Related
I'd like to represent two datasets on the same plot, one as a line and one as a binned bar plot. I can do each individually:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I show both of these on one plot?
This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.plot.bar.
In short, with Axes.bar the xtick locations correspond to the actual numeric value of the label, whereas seaborn and pandas place categorical bars at 0-indexed tick positions that don't correspond to the numeric value.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed.
This works the same way if the line plot uses different data.
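A small sketch of the tick-position difference described above, using a hypothetical Series whose index holds numeric x values; the helper bar_centers is illustrative only. The pandas bars end up centred on 0, 1 and 2, while Axes.bar places them at 10, 20 and 30:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data whose index holds numeric x values
s = pd.Series([3, 5, 2], index=[10, 20, 30])

fig, (ax1, ax2) = plt.subplots(ncols=2)
s.plot.bar(ax=ax1)                             # pandas: categorical positions
ax2.bar(list(s.index), s.to_numpy(), width=4)  # Axes.bar: numeric positions


def bar_centers(ax):
    """Illustrative helper: x coordinate of the center of each bar."""
    return [float(p.get_x() + p.get_width() / 2) for p in ax.patches]


pandas_centers = bar_centers(ax1)
numeric_centers = bar_centers(ax2)
print(pandas_centers)
print(numeric_centers)
```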
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
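To see what the binning step produces, here is the same qcut / midpoint / pivot_table sequence on hypothetical toy data (ten values split into two bins). It computes the midpoints with a list comprehension over the Interval objects instead of apply:

```python
import numpy as np
import pandas as pd

# hypothetical toy data: ten values 0.0 .. 9.0
df = pd.DataFrame({"value": np.arange(10.0)})
# two equal-count bins over the row positions, as in the answer
df["bins"] = pd.qcut(df.index, 2)
# integer midpoint of each interval
df["mid"] = [int(round(iv.mid)) for iv in df["bins"]]
# one row per bin holding the mean value of that bin
pt = df.pivot_table(index="mid", values="value", aggfunc="mean").reset_index()
print(pt)
```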
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width equals the width of each interval (1000 rows / 20 bins = 50).
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
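Plot 3 hard-codes width=50. A sketch of one way to derive each bar's width from its own interval instead, using Interval.mid and Interval.length; it swaps pivot_table for a groupby on the bins and uses hypothetical random data with the same shape:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical data with the same shape as the answer's test data
rng = np.random.default_rng(2022)
df = pd.DataFrame({"value": rng.standard_normal(1000).cumsum()})
df["bins"] = pd.qcut(df.index, 20)

# one row per interval: mean value, interval midpoint and interval length
pt = df.groupby("bins", observed=True)["value"].mean().reset_index()
pt["mid"] = [iv.mid for iv in pt["bins"]]
pt["width"] = [iv.length for iv in pt["bins"]]

fig, ax = plt.subplots(figsize=(12, 4))
ax.axhline(0, color="black")
# each bar spans exactly its own interval
ax.bar(pt["mid"].to_numpy(), pt["value"].to_numpy(),
       width=pt["width"].to_numpy(), alpha=0.5, ec="k")
ax.plot(df.index, df["value"], color="tab:orange")
```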
I have an issue setting the x labels when using the twinx function. My data is a pandas DataFrame, df, with three columns: "name" (product name), "sold" (number of items sold) and "revenue". The name column is a pandas Series (with values like "2 shampoo"), but I can't set it as the x tick labels (see pic below). How can I set the x labels to display the product names?
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
width = 0.4
df.sold.plot(kind='bar', color='red', ax=ax, width=width, position=1, rot=90)
df.revenue.plot(kind='bar', color='blue', ax=ax2, width=width, position=0, rot=90)
# print(type(df['name']), "\n", df['name'])
ax.set_ylabel('Sold')
ax2.set_ylabel('Revenue')
ax.legend(['Sold'], loc='upper left')
ax2.legend(['Revenue'], loc='upper right')
plt.show()
You need to set the x tick labels with set_xticklabels() to show the product names. Add this line after plotting the graph:
ax.set_xticklabels(df['name'])
and you will get the plot below.
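An alternative, assuming the column is called name as described in the question: make it the index before plotting, so pandas uses it for the tick labels on its own. A minimal sketch with hypothetical data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data with the column names from the question
df = pd.DataFrame({
    "name": ["2 shampoo", "soap", "toothpaste"],
    "sold": [12, 30, 7],
    "revenue": [48.0, 45.0, 21.0],
})
plot_df = df.set_index("name")  # pandas labels bar ticks with the index

fig, ax = plt.subplots()
ax2 = ax.twinx()  # shares the x-axis (and its tick labels) with ax
width = 0.4
plot_df["sold"].plot(kind="bar", color="red", ax=ax, width=width, position=1, rot=90)
plot_df["revenue"].plot(kind="bar", color="blue", ax=ax2, width=width, position=0, rot=90)
ax.set_ylabel("Sold")
ax2.set_ylabel("Revenue")

fig.canvas.draw()  # make sure the tick labels are populated
tick_names = [t.get_text() for t in ax.get_xticklabels()]
print(tick_names)
```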
In version 3.4, matplotlib added automatic bar labels:
https://matplotlib.org/stable/users/whats_new.html#new-automatic-labeling-for-bar-charts
I'm trying to use this on a bar plot generated by Seaborn.
fig, axs = plt.subplots(
    nrows=2,
)
for i, col in enumerate(['col_1', 'col_2']):
    ax = axs[i]
    sns.barplot(
        x="class",
        y=col,
        hue="hue_col",
        data=data_df,
        edgecolor=".3",
        linewidth=0.5,
        ax=ax
    )
    ax.bar_label(ax.containers[i])  # Doesn't work
What do I need to do to make this work?
You can loop through the containers and call ax.bar_label(...) for each of them. Note that seaborn creates one set of bars for each hue value.
The following example uses the titanic dataset and sets ci=None to avoid the error bars overlapping with the text (in seaborn 0.12 and later, errorbar=None does the same). If error bars are needed, one could give them a lighter color, e.g. errcolor='gold'.
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset('titanic')
fig, axs = plt.subplots(ncols=2, figsize=(12, 4))
for ax, col in zip(axs, ['age', 'fare']):
    sns.barplot(
        x='sex',
        y=col,
        hue="class",
        data=titanic,
        edgecolor=".3",
        linewidth=0.5,
        ci=None,
        ax=ax
    )
    ax.set_title('mean ' + col)
    ax.margins(y=0.1)  # make room for the labels
    for bars in ax.containers:
        ax.bar_label(bars, fmt='%.1f')
plt.tight_layout()
plt.show()
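The same container loop works for any grouped bar chart, since every ax.bar call adds one BarContainer to ax.containers. A plain-matplotlib sketch with hypothetical values standing in for three hue groups:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

# hypothetical grouped data: 2 x categories, 3 hue groups
categories = ["female", "male"]
groups = {"First": [3.0, 4.5], "Second": [2.0, 1.5], "Third": [5.2, 0.8]}

x = np.arange(len(categories))
width = 0.25
fig, ax = plt.subplots()
for i, (name, heights) in enumerate(groups.items()):
    # each ax.bar call adds one BarContainer to ax.containers
    ax.bar(x + (i - 1) * width, heights, width, label=name)
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.margins(y=0.1)  # make room for the labels

all_labels = []
for bars in ax.containers:  # one container per hue group
    texts = ax.bar_label(bars, fmt="%.1f")
    all_labels.append([t.get_text() for t in texts])
print(all_labels)
```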
I'm desperately trying to create nice graphics with Matplotlib, but it's no easy task. To give some context, I have two series (serie1, serie2). For each series I have 3 groups (Group1, Group2 and Group3). For each group, I have several themes and values. Each series describes the behaviour of several individuals (G1, G2, G3) across different variables (Theme). The code is:
import pandas as pd
d = {"ThemeA": [25,34,75], "ThemeB": [0,71,18], "ThemeC": [2,0,0], "ThemeD":[1,14,0] }
serie1 = pd.DataFrame(data = d, index=["Groupe 1", "Groupe 2", "Groupe 3"] )
serie1= serie1.loc[:,:].div(serie1.sum(1), axis=0) * 100
d = {"ThemeA": [145,10,3], "ThemeB": [10,1,70], "ThemeC": [34,1,2], "ThemeD":[3,17,27]}
serie2= pd.DataFrame(data = d, index=["Groupe 1", "Groupe 2", "Groupe 3"])
serie2= serie2.loc[:,:].div(serie2.sum(1), axis=0) * 100
Now I would like to make a graph to display the data:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax = serie1.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=0, sharex=True,
sharey=True, legend=True, figsize = (6,2))
serie2.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=1.6,
sharex=True, sharey=True, legend=False)
ax.grid(False)
plt.ylim([-0.5, 2.5])
I was able to get the following graph:
But I would like to move the legend to the bottom. If I try to do this,
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
fancybox=True, shadow=True, ncol=5)
I get the following output, which has too many labels.
Of course I would like to see each label exactly once in the legend.
If anyone has a miracle solution, I'd be glad to hear it! Thanks in advance.
You can make the x-axis longer than needed to leave empty space for the legend:
# calculate the length of the longest bar (max of the row sums)
max_col = serie2.sum(axis=1).max()
# extend the x-axis by a factor of 1.4
plt.xlim(0, max_col * 1.4)
If you want the legend at the bottom: when you call legend, you collect the labels from both plots, so each label appears twice. You need to remove the duplicate labels, and a dictionary keyed by label does this.
from collections import OrderedDict
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6, 2))
ax = fig.add_subplot(111)
serie1.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=0,
            sharex=True, sharey=True)
serie2.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=1.6,
            sharex=True, sharey=True)
handles, labels = ax.get_legend_handles_labels()
# dict keys are unique, so each label is kept once (in order of first appearance)
my_labels = OrderedDict(zip(labels, handles))
ax.legend(my_labels.values(), my_labels.keys(), loc='upper center',
          bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=5)
ax.grid(False)
ax.set_ylim([-0.5, 2.5])
Then you get:
A one-line hack which works in this case is to add the line
serie2.columns= ["_" + col for col in serie2.columns]
before you plot the second DataFrame. This prefixes every column name with an underscore. Since matplotlib does not show labels starting with an underscore ("_") in the legend, only the legend entries of the first DataFrame remain.
This solution requires both DataFrames to have their columns in the same order, so that the bar colors match the remaining legend entries.
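A sketch of the underscore hack on hypothetical data. It uses DataFrame.add_prefix, which produces the same renamed columns as the list comprehension but returns a copy, so the original frame stays unchanged; legend=False on the second call keeps pandas from building its own legend from the hidden labels:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical percentages; both frames use the same column order
idx = ["Groupe 1", "Groupe 2"]
d1 = pd.DataFrame({"ThemeA": [60, 40], "ThemeB": [40, 60]}, index=idx)
d2 = pd.DataFrame({"ThemeA": [30, 70], "ThemeB": [70, 30]}, index=idx)

fig, ax = plt.subplots()
d1.plot(kind="barh", ax=ax, width=0.2, stacked=True, position=0)
# add_prefix returns a renamed copy, so d2 keeps its original columns
d2.add_prefix("_").plot(kind="barh", ax=ax, width=0.2, stacked=True,
                        position=1.6, legend=False)
leg = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=5)
legend_labels = [t.get_text() for t in leg.get_texts()]
print(legend_labels)
```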
I'd like to change the color of the axis, as well as the ticks and value labels, for a plot I made using matplotlib and PyQt.
Any ideas?
As a quick example (using a slightly cleaner method than the potentially duplicate question):
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10))
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.spines['bottom'].set_color('red')
ax.spines['top'].set_color('red')
ax.xaxis.label.set_color('red')
ax.tick_params(axis='x', colors='red')
plt.show()
Alternatively:
for t in ax.xaxis.get_ticklines():
    t.set_color('red')
for t in ax.xaxis.get_ticklabels():
    t.set_color('red')
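The calls above can be bundled into a small helper. A sketch, where the name color_axis is hypothetical; it colors both spines that belong to the chosen axis plus its label, tick marks and tick labels:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.colors import same_color


def color_axis(ax, which, color):
    """Hypothetical helper: color one axis ('x' or 'y') of ax, i.e. its two
    spines, its axis label, its tick marks and its tick labels."""
    spines = ("bottom", "top") if which == "x" else ("left", "right")
    for name in spines:
        ax.spines[name].set_color(color)
    axis = ax.xaxis if which == "x" else ax.yaxis
    axis.label.set_color(color)
    ax.tick_params(axis=which, colors=color)  # tick marks and labels together


fig, ax = plt.subplots()
ax.plot(range(10))
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
color_axis(ax, "x", "red")
fig.canvas.draw()  # create the tick labels with the new colors
```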
If you have several figures or subplots that you want to modify, it can be helpful to use matplotlib's rc_context context manager to change the colors instead of changing each one individually. The context manager temporarily changes the rc parameters only for the indented code inside the with block and does not affect the global rc parameters.
This snippet yields two figures, the first one with modified colors for the axis, ticks and ticklabels, and the second one with the default rc parameters.
import matplotlib.pyplot as plt
with plt.rc_context({'axes.edgecolor':'orange', 'xtick.color':'red', 'ytick.color':'green', 'figure.facecolor':'white'}):
    # Temporary rc parameters in effect
    fig, (ax1, ax2) = plt.subplots(1,2)
    ax1.plot(range(10))
    ax2.plot(range(10))
# Back to default rc parameters
fig, ax = plt.subplots()
ax.plot(range(10))
You can type plt.rcParams to view all available rc parameters, and use a list comprehension to search for keywords:
# Search for all parameters containing the word 'color'
[(param, value) for param, value in plt.rcParams.items() if 'color' in param]
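A short sketch that checks the temporary behaviour of rc_context and the keyword search; the parameter values are arbitrary examples:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

default_xtick = plt.rcParams["xtick.color"]

with plt.rc_context({"axes.edgecolor": "orange", "xtick.color": "red"}):
    # the temporary values apply to everything created inside the block
    inside_xtick = plt.rcParams["xtick.color"]
    fig, ax = plt.subplots()
    ax.plot(range(10))
    spine_is_orange = same_color(ax.spines["bottom"].get_edgecolor(), "orange")

# leaving the block restores the previous values
after_xtick = plt.rcParams["xtick.color"]

# search the parameter names for a keyword
tick_color_params = sorted(p for p in plt.rcParams if "tick.color" in p)
print(tick_color_params)
```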
For those using pandas.DataFrame.plot(): it returns a matplotlib.axes.Axes object, so the plot can be assigned to a variable, ax, which gives access to the Axes formatting methods.
The default plotting backend for pandas is matplotlib.
See matplotlib.spines
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
import pandas as pd
import matplotlib.pyplot as plt
# test dataframe
data = {'a': range(20), 'date': pd.bdate_range('2021-01-09', freq='D', periods=20)}
df = pd.DataFrame(data)
# plot the dataframe and assign the returned axes
ax = df.plot(x='date', color='green', ylabel='values', xlabel='date', figsize=(8, 6))
# set various colors
ax.spines['bottom'].set_color('blue')
ax.spines['top'].set_color('red')
ax.spines['right'].set_color('magenta')
ax.spines['right'].set_linewidth(3)
ax.spines['left'].set_color('orange')
ax.spines['left'].set_lw(3)
ax.xaxis.label.set_color('purple')
ax.yaxis.label.set_color('silver')
ax.tick_params(colors='red', which='both')  # 'both' applies to the major and minor ticks
seaborn axes-level plot
import seaborn as sns
# plot the dataframe and assign the returned axes
fig, ax = plt.subplots(figsize=(12, 5))
g = sns.lineplot(data=df, x='date', y='a', color='g', label='a', ax=ax)
# set the margins to 0
ax.margins(x=0, y=0)
# set various colors
ax.spines['bottom'].set_color('blue')
ax.spines['top'].set_color('red')
ax.spines['right'].set_color('magenta')
ax.spines['right'].set_linewidth(3)
ax.spines['left'].set_color('orange')
ax.spines['left'].set_lw(3)
ax.xaxis.label.set_color('purple')
ax.yaxis.label.set_color('silver')
ax.tick_params(colors='red', which='both')  # 'both' applies to the major and minor ticks
seaborn figure-level plot
# plot the dataframe and assign the returned axes
g = sns.relplot(kind='line', data=df, x='date', y='a', color='g', aspect=2)
# iterate through each axes
for ax in g.axes.flat:
    # set the margins to 0
    ax.margins(x=0, y=0)
    # make the top and right spines visible
    ax.spines[['top', 'right']].set_visible(True)
    # set various colors
    ax.spines['bottom'].set_color('blue')
    ax.spines['top'].set_color('red')
    ax.spines['right'].set_color('magenta')
    ax.spines['right'].set_linewidth(3)
    ax.spines['left'].set_color('orange')
    ax.spines['left'].set_lw(3)
    ax.xaxis.label.set_color('purple')
    ax.yaxis.label.set_color('silver')
    ax.tick_params(colors='red', which='both')  # 'both' applies to the major and minor ticks
Building on the previous answers, here is an example with three Axes, each with its own color.
import matplotlib.pyplot as plt
x_values1=[1,2,3,4,5]
y_values1=[1,2,2,4,1]
x_values2=[-1000,-800,-600,-400,-200]
y_values2=[10,20,39,40,50]
x_values3=[150,200,250,300,350]
y_values3=[-10,-20,-30,-40,-50]
fig=plt.figure()
ax=fig.add_subplot(111, label="1")
ax2=fig.add_subplot(111, label="2", frame_on=False)
ax3=fig.add_subplot(111, label="3", frame_on=False)
ax.plot(x_values1, y_values1, color="C0")
ax.set_xlabel("x label 1", color="C0")
ax.set_ylabel("y label 1", color="C0")
ax.tick_params(axis='x', colors="C0")
ax.tick_params(axis='y', colors="C0")
ax2.scatter(x_values2, y_values2, color="C1")
ax2.set_xlabel('x label 2', color="C1")
ax2.xaxis.set_label_position('bottom') # set the position of the second x-axis to bottom
ax2.spines['bottom'].set_position(('outward', 36))
ax2.tick_params(axis='x', colors="C1")
ax2.set_ylabel('y label 2', color="C1")
ax2.yaxis.tick_right()
ax2.yaxis.set_label_position('right')
ax2.tick_params(axis='y', colors="C1")
ax3.plot(x_values3, y_values3, color="C2")
ax3.set_xlabel('x label 3', color='C2')
ax3.xaxis.set_label_position('bottom')
ax3.spines['bottom'].set_position(('outward', 72))
ax3.tick_params(axis='x', colors='C2')
ax3.set_ylabel('y label 3', color='C2')
ax3.yaxis.tick_right()
ax3.yaxis.set_label_position('right')
ax3.spines['right'].set_position(('outward', 36))
ax3.tick_params(axis='y', colors='C2')
plt.show()
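When all curves share the same x data, a common alternative is to stack twinx() Axes and push the extra spine outward. A sketch of that swapped-in approach, reusing the y values above with a hypothetical shared x range; unlike the example above, it does not give each dataset its own x-axis:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]  # hypothetical shared x values

fig, ax = plt.subplots()
ax_b = ax.twinx()  # second y-axis on the right, shares x with ax
ax_c = ax.twinx()  # third y-axis, also shares x with ax
ax_c.spines["right"].set_position(("outward", 36))  # move it 36 points out

ax.plot(x, [1, 2, 2, 4, 1], color="C0")
ax_b.plot(x, [10, 20, 39, 40, 50], color="C1")
ax_c.plot(x, [-10, -20, -30, -40, -50], color="C2")

for a, color, label in [(ax, "C0", "y label 1"),
                        (ax_b, "C1", "y label 2"),
                        (ax_c, "C2", "y label 3")]:
    a.set_ylabel(label, color=color)
    a.tick_params(axis="y", colors=color)
```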
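You can also use this approach to draw multiple plots in the same figure and style them with the same color palette.
An example is given below (fpr1, tpr1, fpr2, tpr2, p_fpr and p_tpr are ROC curve arrays computed beforehand, and plotfigure is the helper defined further down):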
fig = plt.figure()
# Plot ROC curves
plotfigure(lambda: plt.plot(fpr1, tpr1, linestyle='--',color='orange', label='Logistic Regression'), fig)
plotfigure(lambda: plt.plot(fpr2, tpr2, linestyle='--',color='green', label='KNN'), fig)
plotfigure(lambda: plt.plot(p_fpr, p_tpr, linestyle='-', color='blue'), fig)
# Title
plt.title('ROC curve')
# X label
plt.xlabel('False Positive Rate')
# Y label
plt.ylabel('True Positive rate')
plt.legend(loc='best',labelcolor='white')
plt.savefig('ROC',dpi=300)
plt.show();
Output:
Here is a utility function that takes a plotting call (with its arguments, wrapped in a lambda) and draws it with the required background and face colors. You can add more arguments as needed.
def plotfigure(plot_fn, fig, background_col='xkcd:black', face_col=(0.06, 0.06, 0.06)):
    """
    Plot a figure using plt plotting functions.
    Customize the background and face colors of the plot.
    Parameters:
        plot_fn (func): The plotting call with its arguments, wrapped in a lambda.
        fig: The Figure object returned by plt.figure().
        background_col: The background color of the figure. Supports matplotlib colors.
        face_col: The face color of the Axes. Supports matplotlib colors.
    Returns:
        None
    """
    fig.patch.set_facecolor(background_col)
    plot_fn()
    ax = plt.gca()
    ax.set_facecolor(face_col)
    ax.spines['bottom'].set_color('white')
    ax.spines['top'].set_color('white')
    ax.spines['left'].set_color('white')
    ax.spines['right'].set_color('white')
    ax.xaxis.label.set_color('white')
    ax.yaxis.label.set_color('white')
    ax.grid(alpha=0.1)
    ax.title.set_color('white')
    ax.tick_params(axis='x', colors='white')
    ax.tick_params(axis='y', colors='white')
A usage example:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=50, n_classes=2, n_features=5, random_state=27)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=27)
fig=plt.figure()
plotfigure(lambda: plt.scatter(range(0,len(y)), y, marker=".",c="orange"), fig)
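A variant of plotfigure that takes the Axes directly, so it needs neither the lambda nor plt.gca(). A sketch with the hypothetical name style_dark and the same default colors:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.colors import same_color


def style_dark(ax, background_col="xkcd:black", face_col=(0.06, 0.06, 0.06), fg="white"):
    """Hypothetical variant of plotfigure that styles an existing Axes."""
    ax.figure.patch.set_facecolor(background_col)  # figure background
    ax.set_facecolor(face_col)                     # plotting area
    for spine in ax.spines.values():
        spine.set_color(fg)
    ax.xaxis.label.set_color(fg)
    ax.yaxis.label.set_color(fg)
    ax.title.set_color(fg)
    ax.tick_params(axis="both", colors=fg)
    ax.grid(alpha=0.1)  # passing a keyword turns the grid on
    return ax


fig, ax = plt.subplots()
ax.scatter(range(5), [0, 1, 1, 0, 1], marker=".", c="orange")
ax.set_title("scatter")
style_dark(ax)
```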