Plotting 2 stacked series with Pandas and Matplotlib - python

I'm trying desperately to create nice graphics with Matplotlib, but it's no easy task. To give some context, I have two series (serie1, serie2). For each one I have 3 groups (Groupe 1, Groupe 2 and Groupe 3), and for each group I have values for several themes. Each series describes the behaviour of several individuals (the groups) through different variables (the themes). The code is:
import pandas as pd
d = {"ThemeA": [25, 34, 75], "ThemeB": [0, 71, 18], "ThemeC": [2, 0, 0], "ThemeD": [1, 14, 0]}
serie1 = pd.DataFrame(data=d, index=["Groupe 1", "Groupe 2", "Groupe 3"])
# Convert each row to percentages of its row total
serie1 = serie1.div(serie1.sum(axis=1), axis=0) * 100
d = {"ThemeA": [145, 10, 3], "ThemeB": [10, 1, 70], "ThemeC": [34, 1, 2], "ThemeD": [3, 17, 27]}
serie2 = pd.DataFrame(data=d, index=["Groupe 1", "Groupe 2", "Groupe 3"])
serie2 = serie2.div(serie2.sum(axis=1), axis=0) * 100
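The `.div(serie1.sum(1), axis=0) * 100` step rescales each row to percentages of that group's total, which is why every stacked bar ends at 100. A quick check on the serie1 counts (the names `counts` and `percent` are only illustrative):

```python
import pandas as pd

# Raw counts for Groupe 1..3, copied from serie1 above
d = {"ThemeA": [25, 34, 75], "ThemeB": [0, 71, 18], "ThemeC": [2, 0, 0], "ThemeD": [1, 14, 0]}
counts = pd.DataFrame(data=d, index=["Groupe 1", "Groupe 2", "Groupe 3"])

# Divide each row by its row total; axis=0 aligns the row-sum Series on the index
percent = counts.div(counts.sum(axis=1), axis=0) * 100

print(percent.round(1))
print(percent.sum(axis=1))  # every group now sums to 100
```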
Now I would like to make a graph to display the data:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax = serie1.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=0, sharex=True,
                 sharey=True, legend=True, figsize=(6, 2))
serie2.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=1.6,
            sharex=True, sharey=True, legend=False)
ax.grid(False)
plt.ylim([-0.5, 2.5])
I was able to get the following graph:
But I would like to move the legend to the bottom. If I try to do this,
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
fancybox=True, shadow=True, ncol=5)
I get the following output, which has too many labels.
Of course I would like to see each label exactly once in the legend.
If anyone has a miracle solution, I'll gladly take it! Thanks in advance.

You can make the x-axis longer than needed, to leave empty space for the legend:
# calculate the length of the longest bar (max of the row sums)
max_col = serie2.sum(axis=1).max()
# extend the x-axis by a factor of 1.4
plt.xlim(0, max_col * 1.4)
If you want the legend at the bottom, note that when you call legend you are collecting the labels from both plots, so every theme appears twice. You need to remove the duplicate labels, and a dictionary keyed by label does this.
from collections import OrderedDict
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6, 2))
ax = fig.add_subplot(111)
serie1.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=0,
            sharex=True, sharey=True)
serie2.plot(kind='barh', ax=ax, width=0.2, stacked=True, position=1.6,
            sharex=True, sharey=True)
# Both plots add one entry per theme; keying a dict by label keeps each label once
handles, labels = ax.get_legend_handles_labels()
my_labels = OrderedDict(zip(labels, handles))
ax.legend(my_labels.values(), my_labels.keys(), loc='upper center',
          bbox_to_anchor=(0.5, -0.1), fancybox=True, shadow=True, ncol=5)
ax.grid(False)
plt.ylim([-0.5, 2.5])
Then you get:

A one-line hack which works in this case is to add the line
serie2.columns= ["_" + col for col in serie2.columns]
before you plot the second dataframe. This prefixes every column name with an underscore. Since labels starting with an underscore ("_") are not shown in the legend, this leaves only the legend entries of the first dataframe.
This solution requires both dataframes to have the same column order, because the remaining legend entries take their colours from the first dataframe's columns.
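For reference, here is a self-contained sketch of the underscore trick applied to the question's data, assuming a recent pandas and Matplotlib. The Agg backend is only there so it runs without a display, and `ncol=4` is chosen because there are four themes:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

idx = ["Groupe 1", "Groupe 2", "Groupe 3"]
serie1 = pd.DataFrame({"ThemeA": [25, 34, 75], "ThemeB": [0, 71, 18],
                       "ThemeC": [2, 0, 0], "ThemeD": [1, 14, 0]}, index=idx)
serie2 = pd.DataFrame({"ThemeA": [145, 10, 3], "ThemeB": [10, 1, 70],
                       "ThemeC": [34, 1, 2], "ThemeD": [3, 17, 27]}, index=idx)
# Convert both to row percentages, as in the question
serie1 = serie1.div(serie1.sum(axis=1), axis=0) * 100
serie2 = serie2.div(serie2.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots(figsize=(6, 2))
serie1.plot(kind="barh", ax=ax, width=0.2, stacked=True, position=0, legend=False)
# Prefix with "_" so Matplotlib leaves the second set of bars out of the legend
serie2.columns = ["_" + col for col in serie2.columns]
serie2.plot(kind="barh", ax=ax, width=0.2, stacked=True, position=1.6, legend=False)
ax.set_ylim(-0.5, 2.5)

# Legend below the axes, one entry per theme
leg = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1),
                fancybox=True, shadow=True, ncol=4)
labels = [t.get_text() for t in leg.get_texts()]
print(labels)
```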

Related

Twinx makes labels disappear

I have an issue setting the x labels while using the twinx function. My data is a pandas DataFrame, df, with 3 columns: "name" (product name), "sold" (number of items sold) and "revenue". The name column is a pandas Series (with values like "2 shampoo"), but I can't get it to show as the x tick labels (see the picture below). How can I set the x labels to display the product names?
import matplotlib.pyplot as plt
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
width = 0.4
df.sold.plot(kind='bar', color='red', ax=ax, width=width, position=1, rot=90)
df.revenue.plot(kind='bar', color='blue', ax=ax2, width=width, position=0, rot=90)
# print(type(df['name']), "\n", df['name'])
ax.set_ylabel('Sold')
ax2.set_ylabel('Revenue')
ax.legend(['Sold'], loc='upper left')
ax2.legend(['Revenue'], loc='upper right')
plt.show()
You will need to set the x-axis labels with set_xticklabels() to show the product names. Add this line after plotting the graph:
ax.set_xticklabels(df['name'])
and you will get the below plot.
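A minimal runnable sketch of this fix, using a small made-up df with the name, sold and revenue columns described in the question (the product names and numbers are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data with the columns described in the question
df = pd.DataFrame({"name": ["2 shampoo", "1 soap", "3 towel"],
                   "sold": [20, 35, 12],
                   "revenue": [80.0, 70.0, 96.0]})

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the x-axis with ax
width = 0.4
df.sold.plot(kind="bar", color="red", ax=ax, width=width, position=1, rot=90)
df.revenue.plot(kind="bar", color="blue", ax=ax2, width=width, position=0, rot=90)

# Replace the default 0..n-1 index labels with the product names
ax.set_xticklabels(df["name"])
ax.set_ylabel("Sold")
ax2.set_ylabel("Revenue")

fig.canvas.draw()  # make sure the tick label texts are up to date
tick_text = [t.get_text() for t in ax.get_xticklabels()]
print(tick_text)
```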

Is there a way to make the y-label appear only once in seaborn's catplot?

The title is quite self-explanatory.
I have a dataset, and seaborn's catplot function is the best way to combine the data and see variations over time. This is the code:
import matplotlib.pyplot as plt
import seaborn as sns
x = 'data'
y = 'BD'
df['data'] = df['data'].dt.strftime('%d-%m-%Y')
# hue=df['loc'].astype(str)+', '+df['prof'].astype(str)
palette = ['green', 'lightgreen', 'red', 'firebrick']
ax = sns.catplot(data=df,
                 x=x, y=y,
                 hue='ID',
                 col='depth(cm)',
                 kind='bar',
                 hue_order=hue_order,
                 ci="sd", capsize=0.03,
                 palette=palette,
                 legend=False,
                 sharey=True,
                 height=6, aspect=10/8)
plt.legend(title=None,
           fontsize=14,
           ncol=4,
           frameon=False,
           bbox_to_anchor=(.5, 0), borderaxespad=2)
ax.fig.subplots_adjust(top=0.9)
ax.fig.suptitle('Title', fontsize=20)
ax.set(xlabel=None, ylabel='Bulk Density (g cm$^{-3}$)')
sns.set(font_scale=0.9)
sns.set_style("ticks", {'axes.grid': True})
And this is what I get:
Even though I've included sharey=True, the second plot still shows the y-label.
Is there a way to remove the y-label from the second plot?
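No answer was captured here, but one likely cause is visible in the code: sns.catplot returns a FacetGrid (stored in `ax` above), and `FacetGrid.set(ylabel=...)` calls `set_ylabel` on every facet, overriding catplot's default of labelling only the left column. A sketch under that assumption, with a small invented dataset standing in for df, uses `FacetGrid.set_ylabels`, which labels the left column and, with its default `clear_inner=True`, blanks the inner facets:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the question's data: two depths give two facets
df = pd.DataFrame({
    "data": ["01-05-2021", "01-06-2021"] * 4,
    "BD": [1.20, 1.30, 1.25, 1.35, 1.10, 1.40, 1.15, 1.45],
    "depth(cm)": ["0-10"] * 4 + ["10-20"] * 4,
})

g = sns.catplot(data=df, x="data", y="BD", col="depth(cm)", kind="bar",
                sharey=True, height=3, aspect=10 / 8)
# g.set(ylabel=...) would set the label on every facet; set_ylabels only labels
# the left column and clears the y-labels of the other facets
g.set_ylabels("Bulk Density (g cm$^{-3}$)")
g.set_xlabels("")

ylabels = [ax.get_ylabel() for ax in g.axes.flat]
print(ylabels)
```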

Why do I have two legends? How do I merge the legends? Python

I'm plotting 2 dataframes with this method:
df.plot(ax=ax, x='x', y='y', label = "first_df")
df2.plot(ax=ax, x='x', y='y', label = "second_df")
And I add some axvspan calls:
plt.axvspan(x, y, label = value)
Since I have multiple axvspan calls and some of the values are duplicated, I use this code to display each value only once:
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(),loc='upper center', bbox_to_anchor=(1.1, 0.8))
But when I display the legend, I get one legend for the dataframes and another for the axvspan calls. I think this is because I use df.plot for the dataframes and plt for axvspan, so I don't know how to merge the legends.
EDIT:
I tried this with ax1 and ax2 for my dataframes:
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend((h1+h2), l1+l2, loc='upper center', bbox_to_anchor=(1.1, 0.8))
It works, but I have duplicates in the legend. How can I remove them?
Instead of using df.plot, which creates a legend whenever it's called, you can use ax.plot:
ax.plot(df['x'], df['y'], label='first df')
ax.plot(df2['x'], df2['y'], label='second df')
ax.legend()
In the following example, the legend is combined by plotting everything onto the same Axes, including the span drawn with ax.axvspan, and then calling ax.legend, which adds the axvspan entry to the existing legend:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample dataset
rng = np.random.default_rng(seed=1)
size = 30
df = pd.DataFrame(dict(x=range(size), y=rng.integers(0, 100, size=size)))
df2 = pd.DataFrame(dict(x=range(size), y=rng.integers(10, 50, size=size)))
# Plot data unto single Axes with combined legend using multiple plotting functions
ax = df.plot(x='x', y='y', label='first_df', figsize=(8,4))
df2.plot(ax=ax, x='x', y='y', label = 'second_df')
ax.axvspan(10, 15, label='span', facecolor='black', edgecolor=None, alpha=0.2)
ax.legend(loc='upper right');
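For the EDIT's twin-axes case, the duplicates can be removed the same way as in the question's first snippet: concatenate the handles and labels of both Axes, then pass them through a dict so each label keeps only one entry. A sketch assuming ax2 is a twinx() of ax1 and reusing the random sample data above (the span positions and the "low"/"high" labels are invented):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
size = 30
df = pd.DataFrame(dict(x=range(size), y=rng.integers(0, 100, size=size)))
df2 = pd.DataFrame(dict(x=range(size), y=rng.integers(10, 50, size=size)))

fig, ax1 = plt.subplots(figsize=(8, 4))
ax2 = ax1.twinx()  # second Axes, as in the question's EDIT
df.plot(ax=ax1, x="x", y="y", label="first_df", legend=False)
df2.plot(ax=ax2, x="x", y="y", label="second_df", color="C1", legend=False)

# Two spans share the label "low" on purpose, to create a duplicate
for start, end, value in [(2, 4, "low"), (10, 15, "high"), (20, 22, "low")]:
    ax1.axvspan(start, end, label=value, alpha=0.2,
                facecolor="green" if value == "low" else "red")

h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
# A dict keeps one handle per label, in first-seen order (Python 3.7+)
by_label = dict(zip(l1 + l2, h1 + h2))
leg = ax1.legend(by_label.values(), by_label.keys(), loc="upper right")
print(list(by_label))
```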

Plotting three categories with two axes in matplotlib

I am using a combination of pandas and matplotlib to plot three values for several categories. I would like one plot to have its own axis, and the other two to share an axis.
This is close, but it illustrates why I need dual axes:
import pandas as pd
pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T.plot(kind='bar')
Instead, I would like a second axis for the a bars. My attempt:
import matplotlib.pyplot as plt
import pandas as pd
smol = pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T
fig = plt.figure(figsize=(10,5)) # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
smol['a'].plot(kind='bar', color='red', ax=ax, width=0.3,
position=1, edgecolor='black')
smol['b'].plot(kind='bar', color='blue', ax=ax2, width=0.3,
position=0, edgecolor='black')
ax.set_ylabel('Small scale')
ax2.set_ylabel('Big scale')
plt.show()
Unfortunately, adding
smol['c'].plot(kind='bar', color='green', ax=ax2, width=0.3,
position=0, edgecolor='black')
produces:
How can I have b and c share an axis, but appear next to each other, as in the first attempt?
I've used the secondary_y keyword, and the code is also considerably shorter:
import matplotlib.pyplot as plt
import pandas as pd
smol = pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T
ax = smol.plot(kind="bar", secondary_y=['b', 'c'])
ax.set_ylabel('Small scale')
ax.right_ax.set_ylabel('Big scale')
plt.show()
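As a plain-Matplotlib alternative to secondary_y (a different technique from the answer above, not what it uses), the bars can be drawn with explicit x-offsets so that the a bars sit on ax while b and c share ax2 side by side. A sketch assuming a bar width of 0.25:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

smol = pd.DataFrame([[1, 2, 3], [500, 600, 700], [500, 700, 650]],
                    columns=["foo", "bar", "baz"], index=["a", "b", "c"]).T

x = np.arange(len(smol.index))  # one slot per category: foo, bar, baz
w = 0.25                        # width of each bar

fig, ax = plt.subplots(figsize=(10, 5))
ax2 = ax.twinx()
# Offset the three series by -w, 0 and +w so they sit next to each other
ax.bar(x - w, smol["a"], w, color="red", edgecolor="black", label="a")
ax2.bar(x, smol["b"], w, color="blue", edgecolor="black", label="b")
ax2.bar(x + w, smol["c"], w, color="green", edgecolor="black", label="c")
ax.set_xticks(x)
ax.set_xticklabels(smol.index)
ax.set_ylabel("Small scale")
ax2.set_ylabel("Big scale")
```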

How to change the line colour for different lines in a subplot

I'd like to change the colours of the lines on a matplotlib plot to colours that I specify.
Below is the relevant code (it's part of a function) that I'm using. The variable avg_rel_track is a '2D array' and each column corresponds to a 'blade'. Each blade is plotted as a separate line.
I would like each blade/line to be a colour I specify. I'm struggling to find the relevant documentation, and I'm sorry if it's obvious.
import matplotlib.pyplot as plt
from matplotlib import gridspec

def plot_data(avg_rel_track, sd_rel_track_sum, shade):
    fig = plt.figure(figsize=(18, 10))
    # height_ratios needs one entry per row (5 rows)
    gs = gridspec.GridSpec(5, 1, height_ratios=[1.75, 1, 1, 1, 1])
    ax0 = plt.subplot(gs[0])
    ax1 = plt.subplot(gs[1])
    ax2 = plt.subplot(gs[2])
    ax3 = plt.subplot(gs[3])
    ax4 = plt.subplot(gs[4])
    fig.subplots_adjust(top=0.93)
    # The following plot has 5 lines within it (one per blade).
    lineObjects = ax0.plot(avg_rel_track_nan)
    ax0.set_title('Averaged Relative Track', fontsize=11)
    ax0.legend(lineObjects, (1, 2, 3, 4, 5), loc='lower center', bbox_to_anchor=(0.82, 1),
               fancybox=True, shadow=True, ncol=5)
Below is a (very rough) example of the plot; I would like to change the line colours to ones I specify. It is cropped from the 5 subplots.
The problem is that the plot function accepts only one color per call, so you'll have to iterate over the columns and plot each one separately.
import matplotlib.pyplot as plt
columns = [[1,2,3],[1,4,5],[6,4,2]]
colors = ['green', 'pink', 'blue']
labels = ['foo', 'bar', 'baz']
fig, ax = plt.subplots(1, 1)
# One plot call per column, each with its own colour and label
for column, color, label in zip(columns, colors, labels):
    ax.plot(column, color=color, label=label)
ax.legend(loc='lower center', bbox_to_anchor=(0.82, 1),
          fancybox=True, shadow=True, ncol=3)
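An alternative to the loop, not part of the original answer, is to set the Axes' property cycle before plotting: a single ax.plot call on a 2D array then draws one line per column and takes the colours in order. A sketch using the same three columns arranged as a 2D array:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex

# The three columns from the answer, stacked side by side (one column per line)
data = np.array([[1, 1, 6],
                 [2, 4, 4],
                 [3, 5, 2]])
colors = ["green", "pink", "blue"]
labels = ["foo", "bar", "baz"]

fig, ax = plt.subplots()
# Lines drawn on this Axes take their colours from this list, in order
ax.set_prop_cycle(color=colors)
lines = ax.plot(data)  # one Line2D per column of the 2D array
ax.legend(lines, labels, loc="lower center", bbox_to_anchor=(0.82, 1),
          fancybox=True, shadow=True, ncol=3)
print([to_hex(line.get_color()) for line in lines])
```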
