I'm creating a bar chart, and I can't figure out how to add value labels on the bars (in the center of the bar, or just above it).
I believe the solution is either with 'text' or 'annotate', but I:
a) don't know which one to use (and generally speaking, haven't figured out when to use which).
b) can't seem to get either to present the value labels.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.mpl_style', 'default')
%matplotlib inline
# Bring some raw data.
frequencies = [6, 16, 75, 160, 244, 260, 145, 73, 16, 4, 1]
# In my original code I create a series and run on that,
# so for consistency I create a series from the list.
freq_series = pd.Series(frequencies)
x_labels = [108300.0, 110540.0, 112780.0, 115020.0, 117260.0, 119500.0,
121740.0, 123980.0, 126220.0, 128460.0, 130700.0]
# Plot the figure.
plt.figure(figsize=(12, 8))
fig = freq_series.plot(kind='bar')
fig.set_title('Amount Frequency')
fig.set_xlabel('Amount ($)')
fig.set_ylabel('Frequency')
fig.set_xticklabels(x_labels)
How can I add value labels on the bars (in the center of the bar, or just above it)?
First, freq_series.plot returns an Axes, not a Figure. To make my answer a little clearer, I've changed your code to refer to it as ax rather than fig, which is also more consistent with other code examples.
You can get the list of the bars produced in the plot from the ax.patches member. Then you can use the technique demonstrated in this matplotlib gallery example to add the labels using the ax.text method.
import pandas as pd
import matplotlib.pyplot as plt
# Bring some raw data.
frequencies = [6, 16, 75, 160, 244, 260, 145, 73, 16, 4, 1]
# In my original code I create a series and run on that,
# so for consistency I create a series from the list.
freq_series = pd.Series(frequencies)
x_labels = [
108300.0,
110540.0,
112780.0,
115020.0,
117260.0,
119500.0,
121740.0,
123980.0,
126220.0,
128460.0,
130700.0,
]
# Plot the figure.
plt.figure(figsize=(12, 8))
ax = freq_series.plot(kind="bar")
ax.set_title("Amount Frequency")
ax.set_xlabel("Amount ($)")
ax.set_ylabel("Frequency")
ax.set_xticklabels(x_labels)
rects = ax.patches
# Make some labels.
labels = [f"label{i}" for i in range(len(rects))]
for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(
        rect.get_x() + rect.get_width() / 2, height + 5, label, ha="center", va="bottom"
    )
plt.show()
This produces a labeled plot that looks like:
Based on a feature mentioned in this answer to another question I have found a very generally applicable solution for placing labels on a bar chart.
Other solutions unfortunately do not work in many cases, because the spacing between label and bar is either given in absolute units of the bars or is scaled by the height of the bar. The former only works for a narrow range of values and the latter gives inconsistent spacing within one plot. Neither works well with logarithmic axes.
The solution I propose works independent of scale (i.e. for small and large numbers) and even correctly places labels for negative values and with logarithmic scales because it uses the visual unit points for offsets.
I have added a negative number to showcase the correct placement of labels in such a case.
The value of the height of each bar is used as a label for it. Other labels can easily be used with Simon's for rect, label in zip(rects, labels) snippet.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Bring some raw data.
frequencies = [6, -16, 75, 160, 244, 260, 145, 73, 16, 4, 1]
# In my original code I create a series and run on that,
# so for consistency I create a series from the list.
freq_series = pd.Series(frequencies)
x_labels = [108300.0, 110540.0, 112780.0, 115020.0, 117260.0, 119500.0,
121740.0, 123980.0, 126220.0, 128460.0, 130700.0]
# Plot the figure.
plt.figure(figsize=(12, 8))
ax = freq_series.plot(kind='bar')
ax.set_title('Amount Frequency')
ax.set_xlabel('Amount ($)')
ax.set_ylabel('Frequency')
ax.set_xticklabels(x_labels)
def add_value_labels(ax, spacing=5):
    """Add labels to the end of each bar in a bar chart.
    Arguments:
        ax (matplotlib.axes.Axes): The matplotlib object containing the axes
            of the plot to annotate.
        spacing (int): The distance between the labels and the bars.
    """
    # For each bar: Place a label
    for rect in ax.patches:
        # Get X and Y placement of label from rect.
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2
        # Number of points between bar and label. Change to your liking.
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom'
        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'
        # Use Y value as label and format number with one decimal place
        label = "{:.1f}".format(y_value)
        # Create annotation
        ax.annotate(
            label,                      # Use `label` as label
            (x_value, y_value),         # Place label at end of the bar
            xytext=(0, space),          # Vertically shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            ha='center',                # Horizontally center label
            va=va)                      # Vertically align label differently for
                                        # positive and negative values.
# Call the function above. All the magic happens there.
add_value_labels(ax)
plt.savefig("image.png")
Edit: I have extracted the relevant functionality in a function, as suggested by barnhillec.
This produces the following output:
And with logarithmic scale (and some adjustment to the input data to showcase logarithmic scaling), this is the result:
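The log-scale figure is not reproduced here. As a rough sketch of how such a plot might be generated, the snippet below applies the same offset-in-points idea to a bar chart with a logarithmic y-axis. The data values and the helper name `label_bars` are assumptions made for illustration, not the inputs used for the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Illustrative values spanning several orders of magnitude (assumed data).
values = [3, 40, 500, 6000, 70000]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(range(len(values)), values)
ax.set_yscale("log")


def label_bars(ax, spacing=5):
    """Annotate every bar with its height, shifted up by `spacing` points."""
    texts = []
    for rect in ax.patches:
        x = rect.get_x() + rect.get_width() / 2
        y = rect.get_height()
        texts.append(ax.annotate(
            f"{y:.0f}", (x, y),
            xytext=(0, spacing), textcoords="offset points",
            ha="center", va="bottom"))
    return texts


annotations = label_bars(ax)
fig.canvas.draw()  # render once; use fig.savefig(...) to write a file
```

Because the offset is given in points rather than data units, the gap between bar and label should look the same for the smallest and the largest bar, which is the property the answer relies on.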
As of matplotlib v3.4.0
Use matplotlib.pyplot.bar_label
The default label position, set with the parameter label_type, is 'edge'. To center the labels in the middle of the bar, use 'center'
Additional kwargs are passed to Axes.annotate, which accepts Text kwargs.
Properties like color, rotation, fontsize, etc., can be used.
See the matplotlib: Bar Label Demo page for additional formatting options.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
ax.containers is a list of BarContainer artists
With a single level bar plot, it's a list of len 1, hence [0] is used.
For grouped and stacked bar plots there will be more objects in the list
Stacked
Grouped
How to annotate each segment of a stacked bar chart
How to plot and annotate grouped bars in seaborn
Stacked Bar Chart with Centered Labels
How to plot and annotate a grouped bar chart
Simple label formatting can be done with the fmt parameter, as shown in the Demo examples and at How to annotate a seaborn barplot with the aggregated value.
More sophisticated label formatting should use the labels parameter, as shown in the Demo examples and the following:
Examples with labels=
stack bar plot in matplotlib and add label to each section
How to annotate a stacked bar plot and add legend labels
How to add multiple annotations to a barplot
How to customize bar annotations to not show selected values
How to plot a horizontal stacked bar with annotations
How to annotate bar plots when adding error bars
How to align annotations at the end of a horizontal bar plot
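To make the difference between `fmt` and `labels` concrete, here is a minimal sketch, assuming matplotlib 3.4 or later. The shortened `frequencies` list and the percentage labels are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

frequencies = [6, 16, 75, 160, 244]
positions = range(len(frequencies))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# fmt: a printf-style format string applied to each bar's value.
bars1 = ax1.bar(positions, frequencies)
fmt_texts = ax1.bar_label(bars1, fmt="%.1f")

# labels: one explicit string per bar, here each bar's share of the total.
total = sum(frequencies)
custom = [f"{100 * f / total:.0f}%" for f in frequencies]
bars2 = ax2.bar(positions, frequencies)
label_texts = ax2.bar_label(bars2, labels=custom, padding=3)
```

`fmt` is enough when every label is just the bar value in a different number format; `labels` is the route to take when the text is computed from something else.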
import pandas as pd
# dataframe using frequencies and x_labels from the OP
df = pd.DataFrame({'Frequency': frequencies}, index=x_labels)
# display(df)
          Frequency
108300.0          6
110540.0         16
112780.0         75
115020.0        160
117260.0        244
# plot
ax = df.plot(kind='bar', figsize=(12, 8), title='Amount Frequency',
xlabel='Amount ($)', ylabel='Frequency', legend=False)
# annotate
ax.bar_label(ax.containers[0], label_type='edge')
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
Specify additional kwargs for additional customization
Accepts parameters from matplotlib.axes.Axes.text
ax.bar_label(ax.containers[0], label_type='edge', color='red', rotation=90, fontsize=7, padding=3)
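For stacked or grouped bars, where `ax.containers` holds more than one `BarContainer`, a loop over the containers might look like the sketch below. It assumes matplotlib 3.4 or later and uses made-up data; `label_type='center'` is chosen so that each segment shows its own height.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Made-up data for a two-level stacked bar chart.
x = range(3)
lower = [1, 2, 3]
upper = [4, 5, 6]

fig, ax = plt.subplots()
ax.bar(x, lower, label="lower")
ax.bar(x, upper, bottom=lower, label="upper")

# One BarContainer per ax.bar call; label each of them in turn.
all_texts = []
for container in ax.containers:
    all_texts.extend(ax.bar_label(container, label_type="center"))
```

With `label_type='edge'` the labels of the upper segments would instead show the position of the bar's end, i.e. the running total of the stack.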
Seaborn axes-level plot
As can be seen, this is exactly the same as with ax.bar(...), plt.bar(...), and df.plot(kind='bar',...)
import seaborn as sns
# plot data
fig, ax = plt.subplots(figsize=(12, 8))
sns.barplot(x=x_labels, y=frequencies, ax=ax)
# annotate
ax.bar_label(ax.containers[0], label_type='edge')
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
Seaborn figure-level plot
seaborn.catplot accepts a dataframe for data.
Since .catplot returns a FacetGrid (subplots), the only difference is to iterate through each axes of the figure to use .bar_label.
import pandas as pd
import seaborn as sns
# load the data into a dataframe
df = pd.DataFrame({'Frequency': frequencies, 'amount': x_labels})
# plot
g = sns.catplot(kind='bar', data=df, x='amount', y='Frequency', height=6, aspect=1.5)
# iterate through the axes
for ax in g.axes.flat:
    # annotate
    ax.bar_label(ax.containers[0], label_type='edge')
    # pad the spacing between the number and the edge of the figure; should be in the loop, otherwise only the last subplot would be adjusted
    ax.margins(y=0.1)
matplotlib.axes.Axes.bar
It will be similar if just using matplotlib.pyplot.bar
import matplotlib.pyplot as plt
# create the xticks beginning at index 0
xticks = range(len(frequencies))
# plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.bar(x=xticks, height=frequencies)
# label the xticks
ax.set_xticks(xticks, x_labels)
# annotate
ax.bar_label(ax.containers[0], label_type='edge')
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
Other examples using bar_label
Linked SO Answers
How to create and annotate a stacked proportional bar chart
How to wrap long tick labels in a seaborn figure-level plot
How to calculate percent by row and annotate 100 percent stacked bars
How to annotate barplot with percent by hue/legend group
Stacked bars are unexpectedly annotated with the sum of bar heights
How to add percentages on top of bars in seaborn
How to plot and annotate grouped bars
How to plot percentage with seaborn distplot / histplot / displot
How to annotate bar chart with values different to those from get_height()
How to plot grouped bars in the correct order
Pandas bar how to label desired values
Problem with plotting two lists with different sizes using matplotlib
How to display percentage above grouped bar chart
How to annotate only one category of a stacked bar plot
How to set ticklabel rotation and add bar annotations
How to Increase subplot text size and add custom bar plot annotations
How to aggregate group metrics and plot data with pandas
How to get a grouped bar plot of categorical data
How to plot a stacked bar with annotations for multiple groups
How to create grouped bar plots in a single figure from a wide dataframe
How to annotate a stackplot or area plot
How to determine if the last value in all columns is greater than n
How to plot grouped bars
How to plot element count and add annotations
How to add multiple data labels in a bar chart in matplotlib
Seaborn Catplot set values over the bars
Python matplotlib multiple bars
Matplotlib pie chart label does not match value
plt grid ALPHA parameter not working in matplotlib
How to horizontally center a bar plot annotation
Building off the above (great!) answer, we can also make a horizontal bar plot with just a few adjustments:
# Bring some raw data.
frequencies = [6, -16, 75, 160, 244, 260, 145, 73, 16, 4, 1]
freq_series = pd.Series(frequencies)
y_labels = [108300.0, 110540.0, 112780.0, 115020.0, 117260.0, 119500.0,
121740.0, 123980.0, 126220.0, 128460.0, 130700.0]
# Plot the figure.
plt.figure(figsize=(12, 8))
ax = freq_series.plot(kind='barh')
ax.set_title('Amount Frequency')
ax.set_xlabel('Frequency')
ax.set_ylabel('Amount ($)')
ax.set_yticklabels(y_labels)
ax.set_xlim(-40, 300) # expand xlim to make labels easier to read
rects = ax.patches
# For each bar: Place a label
for rect in rects:
    # Get X and Y placement of label from rect.
    x_value = rect.get_width()
    y_value = rect.get_y() + rect.get_height() / 2
    # Number of points between bar and label. Change to your liking.
    space = 5
    # Horizontal alignment for positive values
    ha = 'left'
    # If value of bar is negative: Place label left of bar
    if x_value < 0:
        # Invert space to place label to the left
        space *= -1
        # Horizontally align label at right
        ha = 'right'
    # Use X value as label and format number with one decimal place
    label = "{:.1f}".format(x_value)
    # Create annotation
    plt.annotate(
        label,                      # Use `label` as label
        (x_value, y_value),         # Place label at end of the bar
        xytext=(space, 0),          # Horizontally shift label by `space`
        textcoords="offset points", # Interpret `xytext` as offset in points
        va='center',                # Vertically center label
        ha=ha)                      # Horizontally align label differently for
                                    # positive and negative values.
plt.savefig("image.png")
If you want to just label the data points above the bar, you could use plt.annotate()
My code:
import numpy as np
import matplotlib.pyplot as plt
n = [1,2,3,4,5,]
s = [i**2 for i in n]
line = plt.bar(n,s)
plt.xlabel('Number')
plt.ylabel("Square")
for i in range(len(s)):
    plt.annotate(str(s[i]), xy=(n[i],s[i]), ha='center', va='bottom')
plt.show()
By specifying a horizontal and vertical alignment of 'center' and 'bottom' respectively one can get centered annotations.
I needed bar labels too. Note that my y-axis shows a zoomed view, set using limits on the y-axis. The default calculation for putting the labels on top of the bars still works using the height (use_global_coordinate=False in the example). But I wanted to show that, in a zoomed view, the labels can also be put at the bottom of the graph using global (axes) coordinates, in matplotlib 3.0.2. Hope it helps someone.
def autolabel(rects,data):
    """
    Attach a text label above each bar displaying its height
    """
    c = 0
    initial = 0.091
    offset = 0.205
    use_global_coordinate = True
    if use_global_coordinate:
        for i in data:
            ax.text(initial+offset*c, 0.05, str(i), horizontalalignment='center',
                    verticalalignment='center', transform=ax.transAxes,fontsize=8)
            c=c+1
    else:
        for rect,i in zip(rects,data):
            height = rect.get_height()
            ax.text(rect.get_x() + rect.get_width()/2., height,str(i),ha='center', va='bottom')
If you only want to add Datapoints above the bars, you could easily do it with:
for i in range(len(frequencies)): # your number of bars
    plt.text(x = x_values[i]-0.25, #takes your x values as horizontal positioning argument
             y = y_values[i]+1, #takes your y values as vertical positioning argument
             s = data_labels[i], # the labels you want to add to the data
             size = 9) # font size of datalabels
Related
I am trying to create a stacked bar chart using PyCharm.
I am using matplotlib to explore at fullest its potentialities for simple data visualization.
My original code is for a group chart bar that displays cycle time for different teams. Such information come from a dataframe. The chart also includes autolabeling function (i.e. the height of each bar = continuous variable).
I am trying to convert such group bar chart in a stacked bar chart. The code below needs to be improved because of two factors:
labels for variables have too many decimals: this issue did not occur for the grouped bar chart. The csv file and the derived data frame weren't altered. I am struggling to understand if and where to use round command. I guess the issue is either related to the autolabeling function, because datatype used is float (I need to see at least 1 decimal).
data labels are displaced: as the auto labeling function was created for separated bars, the labels always matched the distance I wanted (based on the vertical offset). Unfortunately I did not figure out how to make sure that this distance is rather centered (see my example, the value for funnel time is at the height of squad time instead, and vice-versa). By logic, the issue should be that the height of each variable is defined ahead (see rects3 in the code, value of bottom) but I don't know how to reflect this in my auto-labeling variable.
The question is what exactly in the code must be altered in order to have the values of cycle time centered?
The code (notes for you are marked in bold):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
'''PART 1 - Preprocess data -----------------------------------------------'''
#Directory or link of my CSV. This can be used also if you want to use API.
csv1 = r"C:\Users\AndreaPaviglianiti\Downloads\CT_Plot_DF.csv"
#Create and read dataframe. This is just to check the DF before plotting
df = pd.read_csv(csv1, sep=',', engine= 'python')
print(df, '\n')
#Extract columns as lists
squads = df['Squad_Name'].astype('str') #for our horizontal axis
funnel = df['Funnel_Time'].astype('float')
squadt = df['Squad_Time'].astype('float')
wait = df['Waiting_Time'].astype('float')
Here I tried to set the rounding but without success
'''PART 2 - Create the Bar Plot / Chart ----------------------------------'''
x = np.arange(len(squads)) #our labels on x will be the squads' names
width = 0.2 # the width of the bars. The bigger value, the larger bars
distance = 0.2
#Create objects that will be used as subplots (fig and ax).
#Each "rects" is the visualization of a yn value. first witdth is distance between X values,
# the second is the real width of bars.
fig, ax = plt.subplots()
rects1 = ax.bar(x, funnel, width, color='red', label='Funnel Time')
rects2 = ax.bar(x, squadt, width, color='green', bottom=funnel, label='Squad Time')
rects3 = ax.bar(x, wait, width, bottom=funnel+squadt, color='purple', label='Waiting Time')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Mean Cycle Time (h)')
ax.set_xlabel('\n Squads')
ax.set_title("Squad's Cycle Time Comparison in Dec-2020 \n (in mean Hours)")
ax.set_xticks(x)
ax.set_xticklabels(squads)
ax.legend()
fig.align_xlabels() #align labels to columns
# The function to display values above the bars
def autolabel(rects):
    """Attach a text label above each bar in *rects*, displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width()/2, height),
                    xytext=(0, 3), # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')
Here I tried to change xytext="center" but I get error, I am supposed to use coordinates only or is there an alternative to change the position from the height to the center?
#We will label only the most recent information. To label both add to the code "autolabel(rects1)"
autolabel(rects1)
autolabel(rects2)
autolabel(rects3)
fig.tight_layout()
'''PART 3 - Execute -------------------------------------------------------'''
plt.show()
Thank you for the help!
I am trying to display a count plot using seaborn, but the width of the bars is very high and the plot doesn't look nice. To counter it I change the width of the plot using the following code snippet:
sns.set()
fig,ax = plt.subplots(figsize=(10,4))
sns.countplot(x=imdb_data["label"],ax=ax)
for patch in ax.patches:
    height = patch.get_height()
    width = patch.get_width()
    new_height = height*0.8
    new_width = width*0.4
    patch.set_height(new_height)
    patch.set_width(new_width)
    x = patch.get_x()
    ax.text(x = x+new_width/2.,y= new_height+4,s = height,ha="center")
ax.legend(labels=("Negative","Positive"),loc='lower right')
plt.show()
But upon doing so the x-tick labels get shifted and the plot looks something like as shown in the attached image.
How should I change the width so that the x-tick locations also change automatically to match the new width of the bars? Also, the legend is not being displayed properly. I used the snippet below to add the legend:
plt.legend(labels=['Positive','Negative'],loc='lower right')
Please help me out.
To keep the bar centered, you also need to change the x position with half the difference of the old and new width. Changing the height doesn't seem to be a good idea, as then the labels on the y-axis get mismatched. If the main reason to change the height is to make space for the text, it would be easier to change the y limits, e.g. via ax.margins(). Aligning the text vertically with 'bottom' allows leaving out the arbitrary offset for the y position.
The labels for the legend can be set via looping through the patches and setting the labels one by one. As the x-axis already has different positions for each bar, it might be better to leave out the legend and change the x tick labels?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
imdb_data = pd.DataFrame({"label": np.random.randint(0, 2, 7500)})
fig, ax = plt.subplots(figsize=(10, 4))
sns.countplot(x=imdb_data["label"], ax=ax)
for patch, label in zip(ax.patches, ["Negative", "Positive"]):
    height = patch.get_height()
    width = patch.get_width()
    new_width = width * 0.4
    patch.set_width(new_width)
    patch.set_label(label)
    x = patch.get_x()
    patch.set_x(x + (width - new_width) / 2)
    ax.text(x=x + width/2, y=height, s=height, ha='center', va='bottom')
ax.legend(loc='lower right')
ax.margins(y=0.1)
plt.tight_layout()
plt.show()
PS: To change the x tick labels, so they can be used instead of the legend, add
ax.set_xticklabels(['negative', 'positive'])
and leave out the ax.legend() and patch.set_label(label) lines.
I am trying to make a series of matplotlib plots that plot timespans for different classes of objects. Each plot has an identical x-axis and plot elements like a title and a legend. However, which classes appear in each plot differs; each plot represents a different sampling unit, each of which contains only a subset of all the possible classes.
I am having a lot of trouble determining how to set the figure and axis dimensions. The horizontal size should always remain the same, but the vertical dimensions need to be scaled to the number of classes represented in that sampling unit. The distance between each entry on the y-axis should be equal for every plot.
It seems that my difficulties lie in the fact that I can set the absolute size (in inches) of the figure with plt.figure(figsize=(w,h)), but I can only set the size of the axis with relative dimensions (e.g., fig.add_axes([0.3,0.05,0.6,0.85])), which leads to my x-axis labels getting cut off when the number of classes is small.
Here is an MSPaint version of what I'd like to get vs. what I'm getting.
Here is a simplified version of the code I have used. Hopefully it is enough to identify the problem/solution.
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib import collections as mc
from matplotlib.lines import Line2D
import seaborn as sns
# elements for x-axis
start = 1
end = 6
interval = 1 # x-axis tick interval
xticks = [x for x in range(start, end, interval)] # create x ticks
# items needed for legend construction
lw_bins = [0,10,25,50,75,90,100] # bins for line width
lw_labels = [3,6,9,12,15,18] # line widths
def make_proxy(zvalue, scalar_mappable, **kwargs):
    color = 'black'
    return Line2D([0, 1], [0, 1], color=color, solid_capstyle='butt', **kwargs)
for line_subset in data:
    # create line collection for this run through loop
    lc = mc.LineCollection(line_subset)
    # create plot and set properties
    sns.set(style="ticks")
    sns.set_context("notebook")
    ############################################################
    # I think the problem lies here
    fig = plt.figure(figsize=(11, len(line_subset.index)*0.25))
    ax = fig.add_axes([0.3,0.05,0.6,0.85])
    ############################################################
    ax.add_collection(lc)
    ax.set_xlim(left=start, right=end)
    ax.set_xticks(xticks)
    ax.xaxis.set_ticks_position('bottom')
    ax.margins(0.05)
    sns.despine(left=True)
    ax.set_yticks(line_subset['order_y'])
    ax.set(yticklabels=line_subset['ylabel'])
    ax.tick_params(axis='y', length=0)
    # legend
    proxies = [make_proxy(item, lc, linewidth=item) for item in lw_labels]
    leg = ax.legend(proxies, ['0-10%', '10-25%', '25-50%', '50-75%', '75-90%', '90-100%'], bbox_to_anchor=(1.0, 0.9),
                    loc='best', ncol=1, labelspacing=3.0, handlelength=4.0, handletextpad=0.5, markerfirst=True,
                    columnspacing=1.0)
    for txt in leg.get_texts():
        txt.set_ha("center") # horizontal alignment of text item
        txt.set_x(-23) # x-position
        txt.set_y(15) # y-position
You can start by defining the top and bottom margins in inches. Fixing the size of one data unit in inches then lets you calculate how large the final figure should be.
Dividing each margin in inches by the figure height gives the relative margin in units of figure size. This can be supplied to the figure using subplots_adjust, given that the subplot has been added with add_subplot.
A minimal example:
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.rand(i,2) for i in [2,5,8,4,3]]
height_unit = 0.25 #inch
t = 0.15; b = 0.4 #inch
for d in data:
    height = height_unit*(len(d)+1)+t+b
    fig = plt.figure(figsize=(5, height))
    ax = fig.add_subplot(111)
    ax.set_ylim(-1, len(d))
    fig.subplots_adjust(bottom=b/height, top=1-t/height, left=0.2, right=0.9)
    ax.barh(range(len(d)),d[:,1], left=d[:,0], ec="k")
    ax.set_yticks(range(len(d)))
plt.show()
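The arithmetic behind this approach can be isolated in a small helper. The sketch below is only an illustration: the function name `figure_layout` and its default values are assumptions that mirror the numbers used in the example above.

```python
def figure_layout(n_rows, row_height=0.25, top=0.15, bottom=0.4):
    """Return (figure_height, bottom_frac, top_frac) for `n_rows` rows.

    All lengths are in inches. The two fractions are the values that
    fig.subplots_adjust(bottom=..., top=...) expects.
    """
    # One extra row of space because the y-limits run from -1 to n_rows.
    height = row_height * (n_rows + 1) + top + bottom
    return height, bottom / height, 1 - top / height


# The axes height per row stays constant regardless of the row count.
for n in (2, 5, 8):
    height, bottom_frac, top_frac = figure_layout(n)
    axes_inches = (top_frac - bottom_frac) * height
    print(n, round(axes_inches / (n + 1), 3))
```

The figure height grows with the number of rows while the margins stay fixed in inches, so the spacing between y-axis entries is the same in every plot.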
I am trying to plot a polar plot using Seaborn's facetGrid, similar to what is detailed on seaborn's gallery
I am using the following code:
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.25)
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df_total, col="Construct", hue="Run", col_wrap=5, subplot_kws=dict(projection='polar'), size=5, sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.plot, 'Rad', 'y axis label', marker=".", ms=3, ls='None').set_titles("{col_name}")
plt.savefig('./image.pdf')
Which with my data gives the following:
I want to keep this organisation of 5 plots per line.
The problem is that the title of each subplot overlaps with the tick values, and the same goes for the y axis label.
Is there a way to prevent this behaviour? Can I somehow shift the titles slightly above their current position and can I shift the y axis labels slightly on the left of their current position?
Many thanks in advance!
EDIT:
This is not a duplicate of this SO as the problem was that the title of one subplot overlapped with the axis label of another subplot.
Here my problem is that the title of one subplot overlaps with the ticks label of the same subplot and similarly the axis label overlaps with the ticks label of the same subplot.
I would also like to add that I do not care that they overlap in my Jupyter notebook (where the plot was created); however, I want the final saved image to have no overlap. Perhaps there is something I need to do to save the image in a slightly different format to avoid that, but I don't know what (I am only using plt.savefig to save it).
EDIT 2: If someone would like to reproduce the problem here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.5)
# Generate an example radial dataset
r = np.linspace(0, 10000, num=100)
df = pd.DataFrame({'label': r, 'slow': r, 'medium-slow': 1 * r, 'medium': 2 * r, 'medium-fast': 3 * r, 'fast': 4 * r})
# Convert the dataframe to long-form or "tidy" format
df = pd.melt(df, id_vars=['label'], var_name='speed', value_name='theta')
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df, col="speed", hue="speed",
subplot_kws=dict(projection='polar'), size=4.5, col_wrap=5,
sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.scatter, "theta", "label")
plt.savefig('./image.png')
plt.show()
Which gives the following image in which the titles are not as bad as in my original problem (but still some overlap) and the label on the left hand side overlap completely.
In order to move the title a bit higher, you can set a new position,
ax.title.set_position([.5, 1.1])
In order to move the ylabel a little further left, you can add some padding
ax.yaxis.labelpad = 25
To do this for the axes of the facetgrid, you'd do:
for ax in g.axes:
    ax.title.set_position([.5, 1.1])
    ax.yaxis.labelpad = 25
The answer provided by ImportanceOfBeingErnest in this SO question may help.
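The same adjustments can be sketched on a plain matplotlib grid of polar axes, without seaborn. Here the title is lifted with the `pad` argument of `set_title` (a distance in points), as an alternative to `set_position`. The data, panel names, and the value 25 are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 100)
names = ["slow", "medium", "fast"]

fig, axes = plt.subplots(1, 3, figsize=(12, 4.5),
                         subplot_kw=dict(projection="polar"))
for ax, name in zip(axes.flat, names):
    ax.plot(theta, theta)
    ax.set_title(name, pad=25)   # gap between the axes and its title, in points
    ax.set_ylabel("label")
    ax.yaxis.labelpad = 25       # gap between tick labels and the y label, in points

fig.canvas.draw()  # render once; use fig.savefig(...) to write a file
```

When saving, passing `bbox_inches="tight"` to `savefig` may also help keep the shifted titles and labels inside the output image.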
I have a pie chart drawing the values extracted from a CSV file. The proportion of the values are currently displayed with the percentage displayed "autopct='%1.1f%%'". Is there a way to display the actual values which are represented in the dataset for each slice.
#Pie for Life Expectancy in Boroughs
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# show plots inline
%matplotlib inline
# use ggplot style
matplotlib.style.use('ggplot')
#read data
lifeEx = pd.read_csv('LEpie.csv')
#Select columns
df = pd.DataFrame()
df['LB'] = lifeEx[['Regions']]
df['LifeEx'] = lifeEx[['MinLF']]
colorz = ['#B5DF00','#AD1FFF', '#BF1B00','#5FB1FF','#FFC93F']
exploda = (0, 0, 0, 0.1, 0)
#plotting
plt.pie(df['LifeEx'], labels=df['LB'], colors=colorz, autopct='%1.1f%%', explode = exploda, shadow = True,startangle=90)
#labeling
plt.title('Min Life expectancy across London Regions', fontsize=12)
Using the autopct keyword
As we know that the percentage shown times the sum of all actual values must be the actual value, we can define this as a function and supply this function to plt.pie using the autopct keyword.
import matplotlib.pyplot as plt
import numpy
labels = 'Frogs', 'Hogs', 'Dogs'
sizes = numpy.array([5860, 677, 3200])
colors = ['yellowgreen', 'gold', 'lightskyblue']
def absolute_value(val):
    a = numpy.round(val/100.*sizes.sum(), 0)
    return a
plt.pie(sizes, labels=labels, colors=colors,
autopct=absolute_value, shadow=True)
plt.axis('equal')
plt.show()
Care must be taken since the calculation involves some error, so the supplied value is only accurate to some decimal places.
A little bit more advanced may be the following function, that tries to get the original value from the input array back by comparing the difference between the calculated value and the input array. This method does not have the problem of inaccuracy but relies on input values which are sufficiently distinct from one another.
def absolute_value2(val):
    a = sizes[ numpy.abs(sizes - val/100.*sizes.sum()).argmin() ]
    return a
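If the same conversion is needed for several pies, the function can be built by a small factory instead of relying on a global `sizes` array. This is only a sketch; the name `make_autopct` and the `fmt` argument are assumptions, not part of matplotlib.

```python
def make_autopct(values, fmt="{:.0f}"):
    """Build an autopct callable that shows absolute values.

    `values` are the numbers passed to plt.pie. The returned function
    receives a percentage and converts it back using their sum.
    """
    total = float(sum(values))

    def autopct(pct):
        return fmt.format(pct / 100.0 * total)

    return autopct


sizes = [5860, 677, 3200]
formatter = make_autopct(sizes)

# Simulate the percentages that a pie chart would hand to autopct.
percentages = [100.0 * s / sum(sizes) for s in sizes]
recovered = [formatter(p) for p in percentages]
```

It would be passed as `plt.pie(sizes, autopct=make_autopct(sizes))`. The same caveat about rounding error applies, since the value is still reconstructed from a percentage.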
Changing text after pie creation
The other option is to first let the pie be drawn with the percentage values and replace them afterwards. To this end, store the autopct labels returned by plt.pie() and loop over them to replace the text with the values from the original array. Note that plt.pie() only returns three values, the last one being the labels of interest, when the autopct keyword is provided, so we set it to an empty string here.
labels = 'Frogs', 'Hogs', 'Dogs'
sizes = numpy.array([5860, 677, 3200])
colors = ['yellowgreen', 'gold', 'lightskyblue']
p, tx, autotexts = plt.pie(sizes, labels=labels, colors=colors,
autopct="", shadow=True)
for i, a in enumerate(autotexts):
    a.set_text("{}".format(sizes[i]))
plt.axis('equal')
plt.show()
If you're looking to plot a piechart from a DataFrame, and want to display the actual values instead of percentages, you could reformat autopct like so:
values=df['your_column'].value_counts(dropna=True)
plt.pie(<actual_values>, colors = colors, autopct= lambda x: '{:.0f}'.format(x*values.sum()/100), startangle=90)
The example below creates a Donut, but you could play around:
(Credit to Kevin Amipara @ https://medium.com/@kvnamipara/a-better-visualisation-of-pie-charts-by-matplotlib-935b7667d77f)
import matplotlib.pyplot as plt
# Pie chart (plots value counts in this case)
labels = df['your_column'].dropna().unique()
actual_values = df['your_column'].value_counts(dropna=True)
#choose your colors
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99','#fffd55']
fig1, ax1 = plt.subplots()
# To denote actual values instead of percentages as labels in the pie chart, reformat autopct
values=df['your_column'].value_counts(dropna=True)
plt.pie(actual_values, colors = colors, autopct= lambda x: '{:.0f}'.format(x*values.sum()/100), startangle=90)
#draw circle (this example creates a donut)
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
# A separate legend with labels (drawn to the bottom left of the pie in this case)
plt.legend(labels, bbox_to_anchor = (0.1, .3))
plt.tight_layout()
plt.show()