How do I remove negative X axis labels in a population pyramid? - python

I have a population pyramid, but the x axis on one side is negative. Is there a way to relabel just the negative side of the x axis so that it shows positive values?
import matplotlib.pyplot as plt
import seaborn as sns
# Draw Plot
plt.figure(figsize=(13,10), dpi=80)
group_col = 't1_gender'
order_of_bars = df.agegroup.unique()[::-1]
# one color per group, sampled from a colormap (Spectral is an assumed choice)
colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]
for c, group in zip(colors, df[group_col].unique()):
    sns.barplot(x='count', y='agegroup', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)
# Decorations
plt.xlabel("Number of calls")
plt.ylabel("Age group")
plt.title("", fontsize=22)
ax = plt.gca()  # ax is needed for get_facecolor() below
plt.savefig('images/figure2.png', dpi=300, facecolor=ax.get_facecolor(), transparent=True, pad_inches=0.0)

A possible solution is to get the current ticks using get_xticks() and then use the np.abs function to force the tick labels to be positive:
import numpy as np
xticks = plt.gca().get_xticks().astype(int)
plt.xticks(xticks, labels=np.abs(xticks));
(Add this code to the # Decorations section, before plt.savefig.)
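A variation on the same idea, sketched under assumptions: instead of freezing the current tick positions, a matplotlib.ticker.FuncFormatter can print the absolute value of every tick, so the labels stay positive even if the axis limits change later. The data (age_groups, male, female) is hypothetical, and plain barh is used instead of seaborn to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Hypothetical data: counts per age group, one side drawn as negative bars
age_groups = ["0-9", "10-19", "20-29", "30-39", "40-49"]
male = np.array([120, 150, 180, 140, 90])
female = np.array([110, 160, 170, 150, 100])

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(age_groups, -male, color="tab:blue", label="male")       # left side (negative x)
ax.barh(age_groups, female, color="tab:orange", label="female")  # right side (positive x)

# Show |x| as the tick text; the tick positions on the left stay negative
abs_formatter = FuncFormatter(lambda x, pos: f"{abs(x):g}")
ax.xaxis.set_major_formatter(abs_formatter)

ax.set_xlabel("Number of calls")
ax.set_ylabel("Age group")
ax.legend()
fig.canvas.draw()  # render once so the tick label texts are filled in
tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

Because the formatter only changes the text, the bars and tick positions remain negative on the left, which is what keeps the pyramid shape.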


Adjusting a horizontal bar chart in matplotlib to accommodate the bars

I am making a horizontal bar chart but am struggling to adjust ylim, or perhaps another parameter, so that my labels are clearer and all of them fit on the y axis. I played around with ylim, and the text size can be made bigger or smaller, but the bars do not fit the y axis. Any idea about the right approach?
My code:
import matplotlib.pyplot as plt #we load the library that contains the plotting capabilities
import numpy as np
from operator import itemgetter
D = []  # list of (attribute, 2005_2011 value, 2012_2015 value) tuples
for att, befor, after in zip(df_portion['attributes'], df_portion['2005_2011 (%)'], df_portion['2012_2015 (%)']):
    i=(att, befor, after)
    D.append(i)
Dsort = sorted(D, key=itemgetter(1), reverse=False) #sort the list in order of usage
attri = [x[0] for x in Dsort]
aft = [x[2] for x in Dsort]  # x[2] is the 2012_2015 value
bef = [x[1] for x in Dsort]  # x[1] is the 2005_2011 value
ind = np.arange(len(attri))
ax = plt.subplot(111)
# width (the bar thickness) is defined elsewhere in the original code; its value is not shown
ax.barh(ind, aft, width,align='center',alpha=1, color='r', label='from 2012 to 2015') #a horizontal bar chart (use .bar instead of .barh for vertical)
ax.barh(ind - width, bef, width, align='center', alpha=1, color='b', label='from 2005 to 2008') #a horizontal bar chart (use .bar instead of .barh for vertical)
ax.set(yticks=ind, yticklabels=attri,ylim=[1, len(attri)/2])
plt.xlabel('Frequency distribution (%)')
plt.title('Frequency distribution (%) of common attributes between 2005_2008 and between 2012_2015')
This is the plot produced by the code above:
To make the labels fit, you need a smaller fontsize or a larger figsize. Changing ylim will either show only a subset of the bars (if ylim is too narrow) or add more whitespace (if ylim is too wide).
The biggest problem in the code is that width is too large. Twice the width needs to fit within a distance of 1.0 (the ticks are placed via ind, which is the array 0, 1, 2, ...). Because matplotlib calls the thickness of a horizontal bar its "height", that name is used in the example code below. Using align='edge' lets you position the bars directly (align='center' shifts them by half their "height").
Pandas has simple functions to sort dataframes by one or more columns.
Code to illustrate the ideas:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'attributes': ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", "iota",
                                  "kappa", "lambda", "mu", "nu", "xi", "omikron", "pi", "rho", "sigma", "tau",
                                  "upsilon", "phi", "chi", "psi", "omega"]})
totals_2005_2011 = np.random.uniform(100, 10000, len(df))
totals_2012_2015 = totals_2005_2011 * np.random.uniform(0.70, 2, len(df))
df['2005_2011 (%)'] = totals_2005_2011 / totals_2005_2011.sum() * 100
df['2012_2015 (%)'] = totals_2012_2015 / totals_2012_2015.sum() * 100
# sort all rows via the '2005_2011 (%)' column, sort from large to small
df = df.sort_values('2005_2011 (%)', ascending=False)
ind = np.arange(len(df))
height = 0.3 # two times height needs to be at most 1
fig, ax = plt.subplots(figsize=(12, 6))
ax.barh(ind, df['2012_2015 (%)'], height, align='edge', alpha=1, color='crimson', label='from 2012 to 2015')
ax.barh(ind - height, df['2005_2011 (%)'], height, align='edge', alpha=1, color='dodgerblue', label='from 2005 to 2011')
ax.set_yticks(ind)  # one tick per attribute, so each label lines up with its pair of bars
ax.set_yticklabels(df['attributes'], fontsize=10)
ax.set_xlabel('Frequency distribution (%)')
ax.set_title('Frequency distribution (%) of common attributes between 2005_2011 and between 2012_2015')
ax.legend()
ax.margins(y=0.01) # use smaller margins in the y-direction
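The rule that twice the height must fit within 1.0 generalises to any number of bars per attribute. The helper below, grouped_bar_offsets (a hypothetical name, not a matplotlib function), is a minimal sketch that splits a chosen group width among n bars and centres them on each tick; the attribute names and values are made-up test data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

def grouped_bar_offsets(n_series, group_width=0.8):
    """Return (offsets, height) for n_series bars sharing one tick.

    All bars together span group_width (<= 1.0), so groups placed at
    0, 1, 2, ... never overlap. Offsets are bar centres relative to the
    tick, symmetric around 0, for use with align='center'.
    """
    if not 0 < group_width <= 1.0:
        raise ValueError("group_width must be in (0, 1]")
    height = group_width / n_series
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * height
    return offsets, height

# Hypothetical data: two periods per attribute
labels = ["alpha", "beta", "gamma"]
series = {"2005_2011 (%)": [30, 45, 25], "2012_2015 (%)": [35, 40, 25]}

ind = np.arange(len(labels))
offsets, height = grouped_bar_offsets(len(series), group_width=0.6)
fig, ax = plt.subplots()
for offset, (name, values) in zip(offsets, series.items()):
    ax.barh(ind + offset, values, height, align="center", label=name)
ax.set_yticks(ind)
ax.set_yticklabels(labels)
ax.legend()
```

With two series and group_width=0.6 this reproduces the layout above (height 0.3, bars touching at the tick); adding a third series only shrinks each bar instead of pushing bars into the neighbouring group.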
The seaborn library has some functions to create barplots with multiple bars per attribute, without the need to manually fiddle with bar positions. Seaborn prefers its data in "long form", which can be created via pandas' melt().
Example code:
import seaborn as sns
df = df.sort_values('2005_2011 (%)', ascending=True)
df_long = df.melt(id_vars='attributes', value_vars=['2005_2011 (%)', '2012_2015 (%)'],
var_name='period', value_name='distribution')
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=df_long, y='attributes', x='distribution', hue='period', palette='turbo', ax=ax)
ax.set_xlabel('Frequency distribution (%)')
ax.set_title('Frequency distribution (%) of common attributes between 2005_2011 and between 2012_2015')
ax.tick_params(axis='y', labelsize=12)

Add Right Y Ticks to a Plot

I am trying to make a plot of sorts; this is my code and the output:
ticks = [3500, 5000]
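labels = ["P\u0332", "P\u0305"]  # combining marks must follow the base character P
plt.title("Nilai Premi Optimal \n dengan Batasan")
plt.xlabel("$\\it{Bargaining\\ Power}$ \u03BB")  # \\ avoids an invalid "\i" escape; "\\ " keeps the space in mathtext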
plt.plot(xlamda, PsiBLamda, color = "red",linestyle='dashed',label = "$\u03C8_{B} (I^*(X))$")
plt.plot(xlamda, PsiSLamda, color = "blue",linestyle='dashed', label = "$\u03C8_{S} (I^*(X))$")
plt.legend(loc="upper left")
plt.plot(xlamda, PLamda, color = "black")
plt.xlim([0, 1])
plt.ylim([3500, 7000])
The plot output is correct; however, I want to add a tick on the right y axis at the 5000 point with the label P. Here is an example:
How do I code that? Thank you
Check out secondary axes:
import matplotlib.pyplot as plt
ticks = [3500, 5000]
labels = ["P\u0332", "P\u0305"]
fig, ax = plt.subplots() # need the axis object
plt.title("Nilai Premi Optimal \n dengan Batasan")
plt.xlabel("$\\it{Bargaining\\ Power}$ \u03BB")
plt.plot(xlamda, PsiBLamda, color = "red",linestyle='dashed',label = "$\u03C8_{B} (I^*(X))$")
plt.plot(xlamda, PsiSLamda, color = "blue",linestyle='dashed', label = "$\u03C8_{S} (I^*(X))$")
plt.legend(loc="upper left")
plt.plot(xlamda, PLamda, color = "black")
plt.xlim([0, 1])
plt.ylim([3500, 7000])
rightax = ax.secondary_yaxis('right') # create secondary axis on the right
rightax.set_ticks(ticks) # set tick locations (SecondaryAxis.set_ticks)
rightax.set_yticklabels(labels) # set tick labels
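A minimal self-contained sketch of the secondary-axis approach with only the single tick the question asks for, at 5000. xlamda and PLamda are hypothetical stand-ins for the question's arrays, and the label uses mathtext ($\bar{P}$) for the bar over P instead of a Unicode combining character; that label choice is an assumption, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the question's xlamda and PLamda arrays
xlamda = np.linspace(0, 1, 50)
PLamda = 3500 + 3000 * xlamda

fig, ax = plt.subplots()
ax.plot(xlamda, PLamda, color="black")
ax.set_xlim(0, 1)
ax.set_ylim(3500, 7000)

# Right-hand axis; without conversion functions it shares the left axis scale
rightax = ax.secondary_yaxis("right")
rightax.set_ticks([5000])                 # a single tick at y = 5000
rightax.set_yticklabels([r"$\bar{P}$"])   # mathtext P with a bar over it
fig.canvas.draw()
```

Because the secondary axis mirrors the left axis one to one, the tick at 5000 lines up with 5000 on the left scale without any manual conversion.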

How to label the vertical lines independent of the scale of the plot?

My program takes n sets of data and plots their histograms.
I. How to label the vertical lines independent of the height of the plot?
A vertical line indicates the most frequent value in a dataset. I want to add a label showing that value, say 20% from the top. With matplotlib.pyplot.text() I had to assign the x and y values manually, and depending on the dataset the text ends up far too high or too low, which I don't want.
matplot.axvline(most_common_number, linewidth=0.5, color='black')
matplot.text(most_common_number + 3, 10, str(most_common_number),
horizontalalignment='center', fontweight='bold', color='black')
I also tried setting the label parameter of matplotlib.pyplot.axvline() but it only adds to the legend of the plot.
matplot.axvline(most_common_number, linewidth=0.5, color='black', label=str(most_common_number))
I wonder if there is a way to use percentages so the text appears n% from the top or use a different method to label the vertical lines. Or am I doing this all wrong?
II. How to make the ticks on x-axis to be spaced out better on resulting image?
I want the x-axis ticks to be multiples of 16, so I had to override the defaults. This is where the trouble began: when I save the plot to a PNG file, the x axis looks really messed up.
But when I use show() it works fine:
Program Snippet
import matplotlib.pyplot as matplot
import matplotlib.ticker as matplotticker
kwargs = dict(alpha=0.5, bins=37, range=(0, 304), density=False, stacked=True)
fig, ax1 = matplot.subplots()
colors = ['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan']
count = 0
datasets = [('dataset name', ['data'])]
for item in datasets:
    dataset = item[1]
    most_common_number = most_common(dataset)  # most_common is the asker's own helper (not shown)
    ax1.hist(dataset, **kwargs, label=item[0], color=colors[count])
    matplot.axvline(most_common_number, linewidth=0.5, color='black')
    matplot.text(most_common_number + 3, 10, str(most_common_number),
                 horizontalalignment='center', fontweight='bold', color='black')
    count += 1
#for x-axis
loc = matplotticker.MultipleLocator(base=16) # this locator puts ticks at regular intervals
ax1.xaxis.set_major_locator(loc)  # the locator only takes effect once it is set on the axis
#for y-axis
y_vals = ax1.get_yticks()
ax1.set_yticks(y_vals)  # fix the tick positions before relabelling them
ax1.set_yticklabels(['{:3.1f}%'.format(x / len(datasets[0][1]) * 100) for x in y_vals])
#set title
matplot.gca().set(title='1 vs 2 vs 3')
#set subtitle
matplot.suptitle("This is a cool subtitle.", va="bottom", family="overpass")
fig = matplot.gcf()
fig.set_size_inches(16, 9)
matplot.savefig('out.png', format = 'png', dpi=120)
I. How to label the vertical lines independent of the height of the plot?
It can be done in two ways:
Axes limits
matplotlib.pyplot.xlim and matplotlib.pyplot.ylim
ylim() returns the (min, max) limits of the axis, e.g. (0.0, 1707.3).
matplot.text(x + matplot.xlim()[1] * 0.02, matplot.ylim()[1] * 0.8, str(x),
             horizontalalignment='center', fontweight='bold', color='black')
x + matplot.xlim()[1] * 0.02 places the text at x, shifted right by 2% of the upper x limit, so that it does not sit on top of the vertical line it labels.
matplot.ylim()[1] * 0.8 places it at 80% of the upper y limit, which is 80% of the plot height when the y axis starts at 0, as it does for a histogram.
Alternatively, pass the transform parameter: with transform=ax1.get_xaxis_transform(), x is read in data coordinates and y as a fraction of the axes height (0 at the bottom, 1 at the top):
matplot.text(most_common_number, 0.8,
' ' + str(most_common_number), transform=ax1.get_xaxis_transform(),
horizontalalignment='center', fontweight='bold', color='black')
Here y = 0.8 means 80% of the axes height, regardless of the data range.
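The transform approach can be checked in isolation. This sketch uses hypothetical random data in place of the question's datasets and collections.Counter in place of the question's most_common helper; it places the label at 80% of the axes height and then changes the y limits to show that the label keeps that relative position.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter

# Hypothetical dataset standing in for one of the question's datasets
rng = np.random.default_rng(0)
data = rng.integers(0, 304, size=500)
mode_value = Counter(data.tolist()).most_common(1)[0][0]  # most frequent value

fig, ax = plt.subplots()
ax.hist(data, bins=37, range=(0, 304), alpha=0.5)
ax.axvline(mode_value, linewidth=0.5, color="black")

# Blended transform: x in data units (the line's position), y as a fraction of the axes height
label = ax.text(mode_value, 0.8, f" {mode_value}", transform=ax.get_xaxis_transform(),
                horizontalalignment="left", fontweight="bold", color="black")

# Changing the y range afterwards does not move the label off its 80% height
ax.set_ylim(0, 1000)
fig.canvas.draw()
label_y_px = label.get_transform().transform((mode_value, 0.8))[1]
axes_80_px = ax.transAxes.transform((0, 0.8))[1]
```

Using horizontalalignment="left" together with the leading space keeps the text just to the right of the line without computing a data-space offset.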
II. How to make the ticks on x-axis to be spaced out better on resulting image?
Use matplotlib.pyplot.gcf() to change the dimensions and use a custom dpi (otherwise the text will not scale properly) when saving the figure.
gcf() means "get current figure".
fig = matplot.gcf()
fig.set_size_inches(16, 9)
matplot.savefig('out.png', format = 'png', dpi=120)
So the resulting image will be (16*120, 9*120) or (1920, 1080) px.
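To confirm the arithmetic above, this sketch (hypothetical data; MultipleLocator as in the question, applied with set_major_locator) saves a 16 x 9 inch figure at dpi=120 to an in-memory buffer and reads the pixel size back from the PNG header.

```python
import io
import struct
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

fig, ax = plt.subplots()
ax.hist(np.arange(0, 304), bins=37, range=(0, 304), alpha=0.5)  # hypothetical data
ax.xaxis.set_major_locator(mticker.MultipleLocator(base=16))   # ticks at multiples of 16
ax.set_xlim(0, 304)

fig.set_size_inches(16, 9)               # 16 x 9 inches
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=120)  # 16*120 x 9*120 pixels

# PNG stores width and height as big-endian 32-bit ints at bytes 16..23 (IHDR chunk)
width_px, height_px = struct.unpack(">II", buf.getvalue()[16:24])
ticks = ax.get_xticks()
```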

Increasing the space between the plot and the title with matplotlib

I am using the following script to generate some plots. The problem is that the scientific notation sometimes overlaps with the title.
Is there a way to fix this, for example by moving the plot down a little?
import numpy as np
import matplotlib.pyplot as plt
# i, site, name, urbs_values and oemof_values come from the rest of the script (not shown)
# init
u = {}
o = {}
# create figure
fig = plt.figure()
# x-Axis (timesteps)
i = np.array(i)
for key in urbs_values.keys():
    # y-Axis (values)
    u[key] = np.array(urbs_values[key])
    o[key] = np.array(oemof_values[key])
    # draw plots
    plt.plot(i, u[key], label='urbs_'+str(key), linestyle='None', marker='x')
    plt.ticklabel_format(axis='y', style='sci', scilimits=(0, 0))
    plt.plot(i, o[key], label='oemof_'+str(key), linestyle='None', marker='.')
    plt.ticklabel_format(axis='y', style='sci', scilimits=(0, 0))
# plot specs
plt.xlabel('Timesteps [h]')
plt.ylabel('Flow [MWh]')
plt.title(site+' '+name)
plt.legend(bbox_to_anchor=(1.025, 1), loc=2, borderaxespad=0)
You can change the position of the title by providing a value for the y parameter in plt.title(...), e.g., plt.title(site+' '+name, y=1.1).
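A self-contained sketch of the y parameter suggestion, with made-up data large enough to trigger the 1e5 offset text; the title text and the value 1.08 are arbitrary choices. y is in axes coordinates, where 1.0 is the top edge of the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

steps = np.arange(24)              # hypothetical timesteps
flow = np.linspace(1e5, 5e5, 24)   # hypothetical values, large enough for a "1e5" offset text

fig, ax = plt.subplots()
ax.plot(steps, flow, linestyle="None", marker="x", label="urbs_demo")
ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
ax.set_xlabel("Timesteps [h]")
ax.set_ylabel("Flow [MWh]")
# 1.0 is the top edge of the axes; 1.08 lifts the title above the offset text
title = ax.set_title("site name", y=1.08)

fig.canvas.draw()
renderer = fig.canvas.get_renderer()
title_box = title.get_window_extent(renderer)
offset_box = ax.yaxis.offsetText.get_window_extent(renderer)
```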
You can edit the title position this way:
# plot specs
plt.xlabel('Timesteps [h]')
plt.ylabel('Flow [MWh]')
plt.title(site+' '+name)
ttl = plt.gca().title  # the title's Text object (plt.title itself is a function)
ttl.set_position([.5, 1.02])
plt.legend(bbox_to_anchor=(1.025, 1), loc=2, borderaxespad=0)
Tuning the 1.02 should do the trick.

Why is there extra space at the bottom of this plot?

I just created a horizontal stacked bar chart using matplotlib, and I can't figure out why there is extra space between the x axis and the first bar (code and picture below). Any suggestions or questions? Thanks!
import numpy as np
import matplotlib.pyplot as plt
# source, source2, ind, N, namelist and pname come from the rest of the script (not shown)
fig = plt.figure(facecolor="white")
ax1 = fig.add_subplot(111, facecolor="white")  # facecolor replaces the old axisbg argument
heights = .43
data = np.array([source['loan1'],source['loan2'],source['loan3']])
dat2 = np.array(source2)
left = np.vstack((np.zeros((data.shape[1],), dtype=data.dtype), np.cumsum(data, axis=0) [:-1]))
colors = ( '#27A545', '#7D3CBD', '#C72121')
for dat, col, lefts, pname2 in zip(data, colors, left, pname):
    ax1.barh(ind+(heights/2), dat, color=col, left=lefts, height = heights, align='center', alpha = .5)
p4 = ax1.barh(ind-(heights/2), dat2, height=heights, color = "#C6C6C6", align='center', alpha = .7)
plt.yticks([z for z in range(N)], namelist)
#mostly for the legend
params = {'legend.fontsize': 8}
plt.rcParams.update(params)  # the dict has no effect until it is applied
box = ax1.get_position()
ax1.set_position([box.x0, box.y0 + box.height * 0.1, box.width, box.height * 0.9])
l = ax1.legend(loc = 'upper center', bbox_to_anchor=(0.5,-0.05), fancybox=True, shadow = True, ncol = 4)
This is because matplotlib tries to intelligently choose minimum and maximum limits for the plot (i.e. "round-ish" numbers) by default.
This makes a lot of sense for some plots, but not for others.
To disable it, use ax.axis('tight'), which snaps the axis limits to the strict extents of the data.
If you want a bit of padding despite the "tight" limits, use ax.margins.
In your case, you'd probably want something like:
# 5% padding on the y-axis and none on the x-axis
ax.margins(0, 0.05)
# Snap to data limits (with padding specified above)
ax.axis('tight')
Also, if you want to set the extents manually, you can just do
ax.axis([xmin, xmax, ymin, ymax])
or use set_xlim, set_ylim, or even
ax.set(xlim=[xmin, xmax], ylim=[ymin, ymax], title='blah', xlabel='etc')
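A minimal sketch of the margins approach on a small stacked horizontal bar chart. The loan data is hypothetical; the point is that after ax.margins(0, 0.05) the y limits equal the bars' extent plus 5% of that extent on each side, and the x axis starts exactly at 0.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stacked data: 3 loan types (rows) for 4 borrowers (columns)
data = np.array([[3, 5, 2, 4],
                 [1, 2, 3, 1],
                 [2, 1, 1, 3]])
ind = np.arange(data.shape[1])
height = 0.43
# left edge of each segment = running total of the segments before it
lefts = np.vstack([np.zeros(data.shape[1]), np.cumsum(data, axis=0)[:-1]])

fig, ax = plt.subplots()
for row, left, color in zip(data, lefts, ("#27A545", "#7D3CBD", "#C72121")):
    ax.barh(ind, row, height=height, left=left, color=color, alpha=0.5)

# 0% padding on x, 5% on y, then recompute the limits from the data
ax.margins(0, 0.05)
ax.autoscale_view()
ymin, ymax = ax.get_ylim()
xmin, xmax = ax.get_xlim()
```

Setting the y margin to 0 instead would make the lowest bar sit directly on the x axis.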
