Matplotlib: Overlapping labels in pie chart - python

I have to make a piechart for the following data:
However, because the larger numbers are in the hundreds while the smaller numbers are less than 1, the labels for the graph end up illegible due to overlapping. For example, this is the graph for Singapore:
I have tried decreasing the font size and increasing the graph size, but because the labels overlap so much, doing so doesn't really help. Here is the relevant code for my graph:
import matplotlib.pyplot as plt
plt.pie(consumption["Singapore"], labels = consumption.index)
fig = plt.gcf()
fig.set_size_inches(8,8)
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0], reverse=True))
plt.show()
Is there any way to solve this issue?

The problem of overlapping labels cannot be completely solved programmatically. For your data specifically, first group the entities that share the same value so that there are fewer labels, and draw the pie chart from the grouped data frame. Some labels still overlap after that, so take the text objects returned by ax.pie and move the overlapping ones by hand.
new_df = consumption.groupby('Singapore')['Entity'].apply(list).reset_index()
new_df['Entity'] = new_df['Entity'].apply(lambda x: ','.join(x))
new_df
    Singapore                       Entity
0    0.000000  Biofuels,Wind,Hydro,Nuclear
1    0.679398                        Other
2    0.728067                        Solar
3    5.463305                         Coal
4  125.983605                          Gas
5  815.027694                          Oil
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
wedges, texts = ax.pie(new_df["Singapore"], wedgeprops=dict(width=0.5), startangle=0, labels=new_df.Entity)
# print(wedges, texts)
texts[0].set_position((1.1,0.0))
texts[1].set_position((1.95,0.0))
texts[2].set_position((2.15,0.0))
plt.legend()
plt.show()
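For readers without the original data, here is a self-contained sketch of the grouping step. The `consumption` frame is an assumption reconstructed from the grouped output shown above (an `Entity` column and a `Singapore` column); the real data frame may be laid out differently. The sketch also puts the labels into a legend instead of around the wedges, which is a swapped-in alternative to moving the texts by hand and not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Assumed layout; the values are copied from the grouped output shown above.
consumption = pd.DataFrame({
    "Entity": ["Biofuels", "Wind", "Hydro", "Nuclear", "Other",
               "Solar", "Coal", "Gas", "Oil"],
    "Singapore": [0.0, 0.0, 0.0, 0.0, 0.679398,
                  0.728067, 5.463305, 125.983605, 815.027694],
})

# Entities with identical values collapse into one comma-separated label.
new_df = consumption.groupby("Singapore")["Entity"].apply(list).reset_index()
new_df["Entity"] = new_df["Entity"].apply(",".join)

fig, ax = plt.subplots(figsize=(8, 8))
wedges, texts = ax.pie(new_df["Singapore"], wedgeprops=dict(width=0.5), startangle=0)
# The labels go into the legend, so no text is drawn around the wedges.
ax.legend(wedges, new_df["Entity"].tolist(), loc="center left", bbox_to_anchor=(1, 0.5))
fig.savefig("singapore_pie.png", bbox_inches="tight")
```

With a legend there is nothing left to overlap around the small wedges, at the cost of the reader having to match colors to names.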

Related

Make a stacked bar plot from seaborn to matplotlib

I need some help making a set of stacked bar charts in Python with matplotlib.
Formally, my dataframe looks like this
plt.figure(figsize=(10, 14))
fig= plt.figure()
ax = sns.countplot(x="airlines",hue='typecode', data=trafic,
order=trafic.airlines.value_counts(ascending=False).iloc[:5].index,
hue_order=trafic.typecode.value_counts(ascending=False).iloc[:5].index,
)
ax.set(xlabel="Airlines code", ylabel='Count')
As written in order and hue_order, I want to isolate the 5 most frequent airlines and aircraft types in my database.
I was advised to use a stacked bar plot to make the graph more presentable, but I don't see any functionality in Seaborn for that, and I can't manage to plot it with matplotlib while still isolating the 5 most frequent airlines and aircraft types.
Thanks for your help!
The following code uses seaborn's countplot with dodge=False. This places all bars belonging to the same airline on top of each other, starting from zero. In the next step, the bars are moved up so that they stack:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set()
np.random.seed(123)
trafic = pd.DataFrame({'airlines': np.random.choice([*'abcdefghij'], 500),
'typecode': np.random.choice([*'qrstuvwxyz'], 500)})
fig = plt.figure(figsize=(10, 5))
ax = sns.countplot(x="airlines", hue='typecode', palette='rocket', dodge=False, data=trafic,
order=trafic.airlines.value_counts(ascending=False).iloc[:5].index,
hue_order=trafic.typecode.value_counts(ascending=False).iloc[:5].index)
ax.set(xlabel="Airlines code", ylabel='Count')
bottoms = {}
for bars in ax.containers:
    for bar in bars:
        x, y = bar.get_xy()
        h = bar.get_height()
        if x in bottoms:
            bar.set_y(bottoms[x])
            bottoms[x] += h
        else:
            bottoms[x] = h
ax.relim() # the plot limits need to be updated with the moved bars
ax.autoscale()
plt.show()
Note that the airlines are sorted by their total number of airplanes, not by their total for the 5 overall most frequent airplane types.
PS: In the question's code, plt.figure() is called twice. That first creates an empty figure with the given figsize, and then a new figure with a default figsize.
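As an alternative sketch (not the method used in the answer above), the same chart can be built from a count table with pandas: `pd.crosstab` counts each airline/type pair, and `DataFrame.plot` with `stacked=True` does the stacking. The synthetic `trafic` data is the same as in the answer; the names `top_airlines`, `top_types` and `counts` are introduced here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123)
trafic = pd.DataFrame({"airlines": np.random.choice([*"abcdefghij"], 500),
                       "typecode": np.random.choice([*"qrstuvwxyz"], 500)})

# The 5 most frequent airlines and aircraft types, most frequent first.
top_airlines = trafic.airlines.value_counts().iloc[:5].index
top_types = trafic.typecode.value_counts().iloc[:5].index

# Count table: one row per airline, one column per aircraft type,
# restricted to the top 5 of each.
counts = pd.crosstab(trafic.airlines, trafic.typecode).loc[top_airlines, top_types]

ax = counts.plot(kind="bar", stacked=True, figsize=(10, 5))
ax.set(xlabel="Airlines code", ylabel="Count")
ax.figure.savefig("stacked_airlines.png", bbox_inches="tight")
```

Because the counts live in a regular data frame, the rows could also be reordered by `counts.sum(axis=1)` if the bars should be sorted by their totals for the 5 selected types only.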

Garbled x-axis labels in matplotlib subplots

I am querying COVID-19 data and building a dataframe of day-over-day changes for one of the data points (positive test results) where each row is a day, each column is a state or territory (there are 56 altogether). I can then generate a chart for every one of the states, but I can't get my x-axis labels (the dates) to behave like I want. There are two problems which I suspect are related. First, there are too many labels -- usually matplotlib tidily reduces the label count for readability, but I think the subplots are confusing it. Second, I would like the labels to read vertically; but this only happens on the last of the plots. (I tried moving the rotation='vertical' inside the for block, to no avail.)
The dates are the same for all the subplots, so -- this part works -- the x-axis labels only need to appear on the bottom row of the subplots. Matplotlib is doing this automatically. But I need fewer of the labels, and for all of them to align vertically. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# get current data
all_states = pd.read_json("https://covidtracking.com/api/v1/states/daily.json")
# convert the YYYYMMDD date to a datetime object
all_states[['gooddate']] = all_states[['date']].applymap(lambda s: pd.to_datetime(str(s), format = '%Y%m%d'))
# 'positive' is the cumulative total of COVID-19 test results that are positive
all_states_new_positives = all_states.pivot_table(index = 'gooddate', columns = 'state', values = 'positive', aggfunc='sum')
all_states_new_positives_diff = all_states_new_positives.diff()
fig, axes = plt.subplots(14, 4, figsize = (12,8), sharex = True )
plt.tight_layout
for i, ax in enumerate(axes.ravel()):
    # get the numbers for the last 28 days
    x = all_states_new_positives_diff.iloc[-28 :].index
    y = all_states_new_positives_diff.iloc[-28 : , i]
    ax.set_title(y.name, loc='left', fontsize=12, fontweight=0)
    ax.plot(x,y)
plt.xticks(rotation='vertical')
plt.subplots_adjust(left=0.5, bottom=1, right=1, top=4, wspace=2, hspace=2)
plt.show();
Suggestions:
Increase the height of the figure.
fig, axes = plt.subplots(14, 4, figsize = (12,20), sharex = True)
Rotate all the labels:
fig.autofmt_xdate(rotation=90)
Use tight_layout at the end instead of subplots_adjust:
fig.tight_layout()
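Put together, the three suggestions might look like the sketch below. It uses synthetic data in place of the COVID query, so the `diff` frame, the state names `S0`..`S7` and the smaller 4 x 2 grid are assumptions made only to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the day-over-day frame: 28 days x 8 hypothetical states.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-06-01", periods=28, freq="D")
states = [f"S{i}" for i in range(8)]
diff = pd.DataFrame(rng.integers(0, 500, size=(28, 8)), index=dates, columns=states)

# A taller figure gives every row of subplots some room.
fig, axes = plt.subplots(4, 2, figsize=(12, 10), sharex=True)
for ax, state in zip(axes.ravel(), diff.columns):
    ax.set_title(state, loc="left", fontsize=12)
    ax.plot(diff.index, diff[state])

# Rotate the date labels on the bottom row of subplots.
fig.autofmt_xdate(rotation=90)
# Let matplotlib compute the spacing instead of calling subplots_adjust.
fig.tight_layout()
fig.savefig("states.png")
```

`plt.xticks` only acts on the current axes, which is why the rotation in the question reached a single subplot; `fig.autofmt_xdate` works on the whole figure.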

Matplotlib: Plot on double y-axis plot misaligned

I'm trying to plot two datasets into one plot with matplotlib. One of the two plots is misaligned by 1 on the x-axis.
This MWE pretty much sums up the problem. What do I have to adjust to bring the box-plot further to the left?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue',
'caps': 'Gray'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
plotData.plot.box(ax=ax1, color=color, sym='+')
failureRates.plot(ax=ax2, color='b', legend=False)
ax1.set_ylabel('Seconds')
ax2.set_ylabel('Failure Rate in %')
plt.xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.show()
Actual result. Note that it's only 8 box-plots instead of 9 and that they're starting at index 1.
The issue is a mismatch between how box() and plot() work: box() starts at x-position 1, while plot() depends on the index of the dataframe (which defaults to starting at 0). There are only 8 boxes because the 9th is cut off by plt.xlim(-0.7, 8.7). There are several easy ways to fix this. As @Sheldore's answer indicates, you can explicitly set the positions for the boxplot. Another way is to change the index of the failureRates dataframe so that it starts at 1 when the dataframe is constructed, i.e.
failureRates = pd.DataFrame(np.random.rand(9, 1), index=range(1, len(titles)+1))
Note that you need not specify the xticks or the xlim for the question's MCVE, but you may need to for your complete code.
You can specify the positions on the x-axis where you want to have the box plots. Since you have 9 boxes, use the following:
plotData.plot.box(ax=ax1, color=color, sym='+', positions=range(9))
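For completeness, here is the question's MWE with the `positions` fix applied, as a runnable sketch. The fixed random seed and the saved file name are additions for reproducibility and are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
np.random.seed(0)  # assumption: fixed seed only so the sketch is reproducible
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# Put the boxes at x = 0..8, the same positions the string-indexed line gets.
plotData.plot.box(ax=ax1, sym="+", positions=range(len(titles)))
failureRates.plot(ax=ax2, color="b", legend=False)
ax1.set_ylabel("Seconds")
ax2.set_ylabel("Failure Rate in %")
ax1.set_xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.savefig("box_and_line.png")
```

With both plots on positions 0 to 8, the original `xlim` of (-0.7, 8.7) no longer cuts off the ninth box.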

How to fix overlapping matplotlib y-axis tick labels or autoscale the plot? [duplicate]

I am trying to make a series of matplotlib plots that plot timespans for different classes of objects. Each plot has an identical x-axis and plot elements like a title and a legend. However, which classes appear in each plot differs; each plot represents a different sampling unit, each of which contains only a subset of all the possible classes.
I am having a lot of trouble determining how to set the figure and axis dimensions. The horizontal size should always remain the same, but the vertical dimensions need to be scaled to the number of classes represented in that sampling unit. The distance between each entry on the y-axis should be equal for every plot.
It seems that my difficulties lie in the fact that I can set the absolute size (in inches) of the figure with plt.figure(figsize=(w,h)), but I can only set the size of the axis with relative dimensions (e.g., fig.add_axes([0.3,0.05,0.6,0.85])), which leads to my x-axis labels getting cut off when the number of classes is small.
Here is an MSPaint version of what I'd like to get vs. what I'm getting.
Here is a simplified version of the code I have used. Hopefully it is enough to identify the problem/solution.
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib import collections as mc
from matplotlib.lines import Line2D
import seaborn as sns
# elements for x-axis
start = 1
end = 6
interval = 1 # x-axis tick interval
xticks = [x for x in range(start, end, interval)] # create x ticks
# items needed for legend construction
lw_bins = [0,10,25,50,75,90,100] # bins for line width
lw_labels = [3,6,9,12,15,18] # line widths
def make_proxy(zvalue, scalar_mappable, **kwargs):
    color = 'black'
    return Line2D([0, 1], [0, 1], color=color, solid_capstyle='butt', **kwargs)
for line_subset in data:
    # create line collection for this run through loop
    lc = mc.LineCollection(line_subset)
    # create plot and set properties
    sns.set(style="ticks")
    sns.set_context("notebook")
    ############################################################
    # I think the problem lies here
    fig = plt.figure(figsize=(11, len(line_subset.index)*0.25))
    ax = fig.add_axes([0.3,0.05,0.6,0.85])
    ############################################################
    ax.add_collection(lc)
    ax.set_xlim(left=start, right=end)
    ax.set_xticks(xticks)
    ax.xaxis.set_ticks_position('bottom')
    ax.margins(0.05)
    sns.despine(left=True)
    ax.set_yticks(line_subset['order_y'])
    ax.set(yticklabels=line_subset['ylabel'])
    ax.tick_params(axis='y', length=0)
    # legend
    proxies = [make_proxy(item, lc, linewidth=item) for item in lw_labels]
    leg = ax.legend(proxies, ['0-10%', '10-25%', '25-50%', '50-75%', '75-90%', '90-100%'], bbox_to_anchor=(1.0, 0.9),
                    loc='best', ncol=1, labelspacing=3.0, handlelength=4.0, handletextpad=0.5, markerfirst=True,
                    columnspacing=1.0)
    for txt in leg.get_texts():
        txt.set_ha("center") # horizontal alignment of text item
        txt.set_x(-23) # x-position
        txt.set_y(15) # y-position
You can start by defining the top and bottom margins in inches. Fixing the size of one data unit in inches as well lets you calculate how tall the final figure has to be.
Dividing a margin in inches by the figure height then gives that margin as a fraction of the figure height. These fractions can be passed to fig.subplots_adjust, provided the axes were added with add_subplot rather than add_axes.
A minimal example:
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.rand(i,2) for i in [2,5,8,4,3]]
height_unit = 0.25 #inch
t = 0.15; b = 0.4 #inch
for d in data:
    height = height_unit*(len(d)+1)+t+b
    fig = plt.figure(figsize=(5, height))
    ax = fig.add_subplot(111)
    ax.set_ylim(-1, len(d))
    fig.subplots_adjust(bottom=b/height, top=1-t/height, left=0.2, right=0.9)
    ax.barh(range(len(d)),d[:,1], left=d[:,0], ec="k")
    ax.set_yticks(range(len(d)))
plt.show()
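The arithmetic behind the example can be checked on its own. The helper below is a hypothetical function (its name `figure_geometry` and its parameter names are not from the answer) that reproduces the height and margin calculation and shows that one data unit comes out at the same size in inches for any number of rows.

```python
def figure_geometry(n_rows, height_unit=0.25, top=0.15, bottom=0.4):
    """Return (figure height in inches, relative bottom, relative top)."""
    # n_rows + 1 data units, because the y-limits run from -1 to n_rows.
    height = height_unit * (n_rows + 1) + top + bottom
    return height, bottom / height, 1 - top / height


for n in (2, 5, 8):
    height, rel_bottom, rel_top = figure_geometry(n)
    # Axes height in inches, recovered from the relative margins.
    axes_height = (rel_top - rel_bottom) * height
    print(n, round(height, 2), round(axes_height / (n + 1), 3))
```

The last printed column is the size of one data unit in inches, which should stay at `height_unit` regardless of `n`.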

How to show filtered legend labels in pyplot pie chart based on the values of contributions?

I would like to plot a pie chart that shows contributions that are more than 1%, and their corresponding legend label.
I have managed showing the percentage values I wanted on the pie (see script below), but not the legend labels. In the following example, I want to show legend labels ABCD, but not EF.
I have tried several things, but I am only able to show either a full legend or a filtered legend with mismatched (wrong) colors.
How can I do this? Can someone help? Thanks.
sizes = pd.DataFrame([80,10,5,4,0.1,0.9],index=list("ABCDEF"))
fig1, ax2 = plt.subplots()
def autopct_more_than_1(pct):
    return ('%1.f%%' % pct) if pct > 1 else ''
ax2.pie(sizes.values, autopct=autopct_more_than_1)
ax2.axis('equal')
plt.legend(sizes.index, loc="best", bbox_to_anchor=(1,1))
plt.show()
You can loop over the dataframe values (possibly normalized if they aren't already) and only take the legend handles and labels for those which are bigger than 1.
import matplotlib.pyplot as plt
import pandas as pd
sizes = pd.DataFrame([80,10,5,4,0.1,0.9],index=list("ABCDEF"))
fig1, ax = plt.subplots()
def autopct_more_than_1(pct):
    return ('%1.f%%' % pct) if pct > 1 else ''
p,t,a = ax.pie(sizes.values, autopct=autopct_more_than_1)
ax.axis('equal')
# normalize dataframe (not actually needed here, but for general case)
normsizes = sizes/sizes.sum()*100
# create handles and labels for legend, take only those where value is > 1
h,l = zip(*[(h,lab) for h,lab,i in zip(p,sizes.index.values,normsizes.values) if i > 1])
ax.legend(h, l,loc="best", bbox_to_anchor=(1,1))
plt.show()
