How to stack only selected columns in pandas barh plot - python

I am trying to plot a bar chart with two bars per row: one stacked bar and, beside it, a second bar that is not stacked.
I have the first plot, which is a stacked plot:
And another plot, with the same rows and columns:
I want to plot the second one side by side with the bars of the first plot, without stacking it:
This is a code snippet to replicate my problem:
import pandas as pd
import matplotlib.pyplot as plt
d = pd.DataFrame({'DC': {'col0': 257334.0,
'col1': 0.0,
'col2': 0.0,
'col3': 186146.0,
'col4': 0.0,
'col5': 366431.0,
'col6': 461.0,
'col7': 0.0,
'col8': 0.0},
'DC - IDC': {'col0': 32665.0,
'col1': 0.0,
'col2': 156598.0,
'col3': 0.0,
'col4': 176170.0,
'col5': 0.0,
'col6': 0.0,
'col7': 0.0,
'col8': 0.0},
'No Address': {'col0': 292442.0,
'col1': 227.0,
'col2': 298513.0,
'col3': 117167.0,
'col4': 249.0,
'col5': 747753.0,
'col6': 271976.0,
'col7': 9640.0,
'col8': 211410.0}})
d[['DC', 'DC - IDC']].plot.barh(stacked=True)
d[['No Address']].plot.barh( stacked=False, color='red')

Use the position parameter to draw both sets of bars on the same index:
fig, ax = plt.subplots()
d[['DC', 'DC - IDC']].plot.barh(width=0.4, position=0, stacked=True, ax=ax)
d[['No Address']].plot.barh(width=0.4, position=1, stacked=True, ax=ax, color='red')
plt.show()

You can also achieve this using only the matplotlib.pyplot library. First, import NumPy and matplotlib:
import matplotlib.pyplot as plt
import numpy as np
Then,
plt.figure(figsize=(15,8))
plt.barh(d.index, d['DC'], 0.4, label='DC', align='edge')
# left=d['DC'] starts each 'DC - IDC' bar where the 'DC' bar ends, so the two stack
plt.barh(d.index, d['DC - IDC'], 0.4, left=d['DC'], label='DC - IDC', align='edge')
plt.barh(np.arange(len(d.index))-0.4, d['No Address'], 0.4, color='red', label='No Address', align='edge')
plt.legend();
Here is what I did:
Increase the figure size (optional)
Create a BarContainer for each column
Decrease the height (thickness) of each bar to 0.4 to make them fit
Align the bottom edges of the bars with the y positions
Stack 'DC - IDC' on 'DC' by passing left=d['DC']; without it, the two bars would overlap, both starting from zero
To put the red bars to the side, subtract the bar height (0.4) from each y coordinate: np.arange(len(d.index))-0.4
Finally, add a legend
It should look like this:
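With plain matplotlib, stacking is a matter of giving each segment a starting offset equal to the total of the segments before it. The following is a minimal sketch of that bookkeeping in plain Python; the helper name stack_offsets is hypothetical (it is not part of pandas or matplotlib), and its result is what one would pass as left= to plt.barh for each column.

```python
def stack_offsets(columns):
    """Return, for each column, the starting offset of each of its bars.

    `columns` is a list of equal-length lists of bar lengths, given in
    stacking order. The first column starts at 0; every later column
    starts at the running total of the columns before it.
    """
    if not columns:
        return []
    n_rows = len(columns[0])
    offsets = [[0.0] * n_rows]
    for previous in columns[:-1]:
        last = offsets[-1]
        offsets.append([start + length for start, length in zip(last, previous)])
    return offsets


# First three rows (col0..col2) of the question's data.
dc = [257334.0, 0.0, 0.0]
dc_idc = [32665.0, 0.0, 156598.0]

lefts = stack_offsets([dc, dc_idc])
# lefts[0] is all zeros; lefts[1] equals dc, i.e. where the 'DC' bars end.
```

For two columns this reduces to left=d['DC'] for the second one; the helper only matters when three or more columns are stacked.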

Related

Use slider to plot dataframe columns as scatter plots

I'm trying to use a matplotlib slider to iterate through a list of dataframes and plot the columns as scatter plots.
As the slider progresses, I'd like it to plot the first column of the first dataframe as the x-axis, then, keeping the first column as the x-axis, progress through by individually plotting the remaining columns as the y-axes.
After completing the first dataframe, I'd like it to move on to the next one in the list of dataframes and follow the same general operation.
I am having trouble creating my update function to properly reach into the nested dataframes.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
data1 = {'col1': ['a1', 'b1', 'c1', 'd1', 'e1'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data2 = {'col1': ['a2', 'b2', 'c2', 'd2', 'e2'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data3 = {'col1': ['a3', 'b3', 'c3', 'd3', 'e3'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data4 = {'col1': ['a4', 'b4', 'c4', 'd4', 'e4'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
dataframes = [df1, df2, df3, df4]
fig, ax = plt.subplots(figsize=(9, 6))
fig.subplots_adjust(bottom=0.25)
scat = ax.scatter(dataframes[0]['col1'], dataframes[0]['col2'])
ax_pos = fig.add_axes([0.25, 0.1, 0.65, 0.03])
slider = Slider(
    ax=ax_pos,
    label='Slider',
    valmin=1,
    valmax=3,
    valinit=1,
    valstep=1
)
def update(val):
    # I have no idea what to do here
    columns = df1.columns
    columns = columns[1:]
    for x in columns:
        slider.val = x
        col = slider.val
        for idx, df in enumerate(dataframes):
            ax.scatter(dataframes[idx]['col1'], dataframes[idx][col])
    fig.canvas.draw_idle()
slider.on_changed(update)
plt.show()
Running the code as-is generates the following plot, which is something, but the first adjustment of the slider plots everything (and all on the same graph):
I think my issue stems from a very nascent understanding of the relationship between the original scatter function (scat) and its counterpart inside the "update" function.
How can I create an update function that allows me to iterate through a list of nested dataframes?
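One possible way to structure the update function, sketched under assumptions: every dataframe uses its first column as x and has the same number of remaining columns, and the slider runs over all (dataframe, y-column) pairs in order. The names locate and build_slider_figure are hypothetical. Instead of updating scat in place, this sketch simply clears the axes and redraws on each change.

```python
def locate(step, n_y_columns):
    """Map a slider step to (dataframe index, y-column position).

    Step 0 is the first y-column of the first dataframe; after the last
    y-column the count moves on to the next dataframe.
    """
    frame_idx, col_idx = divmod(int(step), n_y_columns)
    return frame_idx, col_idx + 1  # +1 skips the x column at position 0


def build_slider_figure(dataframes):
    """Create a figure whose slider steps through every y-column of every frame."""
    # Imported here so locate() above stays usable without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.widgets import Slider

    n_y_columns = len(dataframes[0].columns) - 1
    fig, ax = plt.subplots(figsize=(9, 6))
    fig.subplots_adjust(bottom=0.25)
    slider = Slider(
        ax=fig.add_axes([0.25, 0.1, 0.65, 0.03]),
        label='Step',
        valmin=0,
        valmax=len(dataframes) * n_y_columns - 1,
        valinit=0,
        valstep=1,
    )

    def update(val):
        frame_idx, col_pos = locate(val, n_y_columns)
        df = dataframes[frame_idx]
        ax.clear()  # drop the previous scatter instead of drawing on top of it
        ax.scatter(df.iloc[:, 0], df.iloc[:, col_pos])
        ax.set_title(f'dataframe {frame_idx}, column {df.columns[col_pos]}')
        fig.canvas.draw_idle()

    slider.on_changed(update)
    update(0)  # draw the initial state
    return fig, slider  # keep a reference so the slider stays responsive
```

With the list from the question, fig, slider = build_slider_figure(dataframes) followed by plt.show() should display the figure. The slider value arrives as a float, which is why locate converts it with int().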

histogram: setting y-axis label for pandas

I have a dataframe:
d = {'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D'],
'value': [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 1.0],
'count': [4, 3, 7, 3, 12, 14, 5, 10, 3, 8, 7, 15, 4]}
df = pd.DataFrame(data=d)
df
I want to plot multiple histograms in one figure: a histogram for group A, one for group B, and one for group D. The labels come from the group column, the y-axis is count, and the x-axis is value.
I tried the following, but it shows incorrect values on the y-axis and builds several separate plots.
axarr = df.hist(column='value', by='group', bins=20)
for ax in axarr.flatten():
    ax.set_xlabel("value")
    ax.set_ylabel("count")
Assuming that you are looking for a grouped bar plot (pending the clarification in the comments):
Plot:
Code:
import pandas as pd
d = {'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D'],
'value': [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 1.0],
'count': [4, 3, 7, 3, 12, 14, 5, 10, 3, 8, 7, 15, 4]}
df = pd.DataFrame(data=d)
df_pivot = pd.pivot_table(
    df,
    values="count",
    index="value",
    columns="group",
)
ax = df_pivot.plot(kind="bar")
fig = ax.get_figure()
fig.set_size_inches(7, 6)
ax.set_xlabel("value")
ax.set_ylabel("count")
Brief plot option:
pd.pivot(df, index="value", columns="group").plot(kind="bar", y='count')
Brief explanation:
Pivot table for preparation:
x-axis 'value' as index
different bars 'group' as columns
(pivot_table aggregates with the mean by default; here every value/group pair occurs only once, so the counts are unchanged, and missing pairs become NaN)
group     A     B     D
value
0.2     4.0  12.0   3.0
0.4     3.0  14.0   8.0
0.6     7.0   5.0   7.0
0.8     3.0  10.0  15.0
1.0     NaN   NaN   4.0
Pandas .plot() can draw the grouped bar plot directly from df_pivot.
Its default backend is matplotlib, so the usual commands apply (like fig.savefig).
Add-on appearance:
You can make it look more like a histogram along the x-axis by reducing the spacing between the groups:
Just add width=0.95 (or width=1.0) as an argument to the .plot() calls.
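To make the reshaping step concrete, here is a plain-Python stand-in for what the pivot does to the long table. It is not pandas code: the function name pivot_counts is hypothetical, it assumes each (value, group) pair occurs at most once, and it uses None where pandas would show NaN.

```python
def pivot_counts(groups, values, counts):
    """Reshape long rows into {value: {group: count}}; missing pairs stay None."""
    group_names = sorted(set(groups))
    table = {v: {g: None for g in group_names} for v in sorted(set(values))}
    for g, v, c in zip(groups, values, counts):
        table[v][g] = c
    return table


groups = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D']
values = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8, 1.0]
counts = [4, 3, 7, 3, 12, 14, 5, 10, 3, 8, 7, 15, 4]

table = pivot_counts(groups, values, counts)
# Each outer key is one x-axis position; each inner key is one bar in that group.
```

Only group D has a row for value 1.0, which is why the other two cells in that row are empty and no bars are drawn for A and B there.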

How to customize bar annotations to not show selected values

I have the following data set:
data = [6.92, 1.78, 0.0, 0.0, 3.5, 8.82, 3.06, 0.0, 0.0, 5.54, -10.8, -6.03, 0.0, 0.0, -6.8, 13.69, 8.61, 9.98, 0.0, 9.42, 4.91, 3.54, 2.62, 5.65, 1.95, 8.91, 11.46, 5.31, 6.93, 6.42]
Is there a way to remove the 0.0 labels from the bar plot?
I tried df = df.replace(0, "") but then I get a list index out of range error code.
My code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = [6.92, 1.78, 0.0, 0.0, 3.5, 8.82, 3.06, 0.0, 0.0, 5.54, -10.8, -6.03, 0.0, 0.0, -6.8, 13.69, 8.61, 9.98, 0.0, 9.42, 4.91, 3.54, 2.62, 5.65, 1.95, 8.91, 11.46, 5.31, 6.93, 6.42]
df = pd.DataFrame(np.array(data).reshape(6,5), columns=['Bank1', 'Bank2', 'Bank3', 'Bank4', 'Bank5'], index =['2016', '2017', '2018', '2019', '2020', '2021'])
print(df)
ax = df.plot(kind='bar', rot=0, xlabel='Year', ylabel='Total Return %', title='Overall Performance', figsize=(15, 10))
ax.bar_label(ax.containers[0], fmt='%.1f', fontsize=8, padding=3)
ax.bar_label(ax.containers[1], fmt='%.1f', fontsize=8, padding=3)
ax.bar_label(ax.containers[2], fmt='%.1f', fontsize=8, padding=3)
ax.bar_label(ax.containers[3], fmt='%.1f', fontsize=8, padding=3)
ax.bar_label(ax.containers[4], fmt='%.1f', fontsize=8, padding=3)
ax.legend(title='Columns', bbox_to_anchor=(1, 1.02), loc='upper left')
plt.show()
The labels passed to matplotlib.pyplot.bar_label must be customized.
Adjust the comparison (!= 0) value or range as needed.
Without the assignment expression (:=), the same filter can be written as labels = [f'{v.get_height():0.0f}' if v.get_height() != 0 else '' for v in c] (this variant formats with no decimals; use 0.1f to match the code below).
See this answer for additional details and examples using .bar_label
Tested in pandas 1.3.4, python 3.8.12, and matplotlib 3.4.3.
The minimum required versions are python 3.8 and matplotlib 3.4.2, respectively.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = [6.92, 1.78, 0.0, 0.0, 3.5, 8.82, 3.06, 0.0, 0.0, 5.54, -10.8, -6.03, 0.0, 0.0, -6.8, 13.69, 8.61, 9.98, 0.0, 9.42, 4.91, 3.54, 2.62, 5.65, 1.95, 8.91, 11.46, 5.31, 6.93, 6.42]
df = pd.DataFrame(np.array(data).reshape(6,5), columns=['Bank1', 'Bank2', 'Bank3', 'Bank4', 'Bank5'], index =['2016', '2017', '2018', '2019', '2020', '2021'])
ax = df.plot(kind='bar', rot=0, xlabel='Year', ylabel='Total Return %', title='Overall Performance', figsize=(15, 10))
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{h:0.1f}' if (h := v.get_height()) != 0 else '' for v in c]
    # set the bar label
    ax.bar_label(c, labels=labels, fontsize=8, padding=3)
ax.legend(title='Columns', bbox_to_anchor=(1, 1.02), loc='upper left')
plt.show()
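The label filter can also be pulled out into a small function, which makes it easy to check without drawing anything and works on Python versions older than 3.8. This is a sketch; the name blank_zero_labels and its fmt parameter are hypothetical, not part of matplotlib.

```python
def blank_zero_labels(heights, fmt='{:0.1f}'):
    """Format each bar height, returning '' for bars of height 0."""
    return [fmt.format(h) if h != 0 else '' for h in heights]


# First five values of the question's data.
heights = [6.92, 1.78, 0.0, 0.0, 3.5]
labels = blank_zero_labels(heights)
```

Inside the loop over ax.containers it would be used as ax.bar_label(c, labels=blank_zero_labels([v.get_height() for v in c]), fontsize=8, padding=3).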

How to create equally spaced interval in xaxis in matplotlib histogram when the bins are not equally spaced? [duplicate]

I would like the matplotlib histogram to show the data on an equally spaced x-axis even though its bins are not equally spaced. How do I do so? Presently the bars for the age groups '0-6', '7-12', '13-16', and '17-20' look thinner than the rest of my data, and the bars of '17-20' overlap with those of '21-30'. The xticklabels are also overlapping. How do I resolve these issues?
#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
male_ages = [66.0, 37.0, 2.0, 56.0, 8.0, 56.0, 56.0, 31.0, 15.0, 41.0, 17.0, 40.0, 45.0, 0.5, 41.0, 27.0, 53.0, 64.0, 53.0,]
female_ages = [53.0, 56.0, 3.0, 31.0, 9.0, 73.0, 47.0, 18.0, 31.0, 28.0, 48.0, 44.0, 32.0, 42.0, 42.0, 39.0, 40.0, 38.0, 2.0 ]
age_bins_label = [ '0-6', '7-12', '13-16', '17-20', '21-30',
'31-40', '41-50', '51-60', '61-70', '71-80',
'81-90', '91-100', '101-110', '111-120' ]
age_bins = [0, 6, 12, 16, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 ]
xmax = max( male_ages, female_ages)
data = [ male_ages, female_ages ]
colors = [ 'orange', 'pink']
labels = [ 'male', 'female' ]
fig, axs = plt.subplots(2, 2, 'all', tight_layout=True, sharey=True )
axs[0, 0].hist( data, bins=age_bins, color=colors, rwidth=0.9, align='left',
stacked=False, label=labels )
axs[0, 0].legend(prop={'size': 10})
axs[0, 0].set_title('bars with legend')
axs[0, 0].get_xaxis().set_label_text( label='Age Groups', fontweight='bold' )
axs[0, 0].get_yaxis().set_label_text( label='Confirmed Cases', fontweight='bold' )
for ax in axs.flat:
    ax.label_outer()
# Set x-axis
#xlabels = [ str(i) for i in age_bins[1:] ]
xlabels = age_bins_label
N_labels = len(xlabels)
plt.xticks( age_bins, xlabels )
plt.show()
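I would suggest computing the counts with np.histogram and drawing them with ax.bar against the bin labels, so that every bar gets the same width: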
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
male_ages = [66.0, 37.0, 2.0, 56.0, 8.0, 56.0, 56.0, 31.0, 15.0, 41.0, 17.0, 40.0, 45.0, 0.5, 41.0, 27.0, 53.0, 64.0, 53.0,]
female_ages = [53.0, 56.0, 3.0, 31.0, 9.0, 73.0, 47.0, 18.0, 31.0, 28.0, 48.0, 44.0, 32.0, 42.0, 42.0, 39.0, 40.0, 38.0, 2.0 ]
age_bins_label = [ '0-6', '7-12', '13-16', '17-20', '21-30',
'31-40', '41-50', '51-60', '61-70', '71-80',
'81-90', '91-100', '101-110', '111-120' ]
age_bins = [0, 6, 12, 16, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 ]
fig, ax = plt.subplots()
ax.bar(x = age_bins_label,
height = np.histogram(male_ages, bins = age_bins)[0],
alpha=0.5,
label='male')
ax.bar(x = age_bins_label,
height = np.histogram(female_ages, bins = age_bins)[0],
alpha=0.5,
label='female')
plt.legend(loc='upper right')
Result
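One detail worth keeping in mind: np.histogram treats every bin as half-open, [left, right), except the last one, which also includes its right edge. With edges such as 0, 6, 12, an age of exactly 6 is therefore counted in the bar labelled '7-12', not '0-6'. The sketch below mirrors that rule in plain Python with bisect so the behaviour is easy to inspect; count_per_bin is a hypothetical name, not a NumPy function.

```python
from bisect import bisect_right


def count_per_bin(ages, edges):
    """Count ages per bin: [left, right) for each bin, last bin closed on the right."""
    counts = [0] * (len(edges) - 1)
    for age in ages:
        if age < edges[0] or age > edges[-1]:
            continue  # outside the overall range: ignored
        # bisect_right puts a value equal to an edge into the bin that starts there;
        # min() folds the final right edge back into the last bin.
        idx = min(bisect_right(edges, age) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


edges = [0, 6, 12, 16, 20]
counts = count_per_bin([0.5, 2.0, 6.0, 8.0, 15.0, 17.0, 20.0], edges)
```

If the labels are meant literally (6 belongs to '0-6'), one option is to shift the edges, for example to 0, 7, 13, 17, 21 and so on, before counting.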

bar plot with vertical lines for each bar

%matplotlib inline
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'index' : ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08], 'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, 'NaN']}).set_index('index')
I want to plot a horizontal bar chart with first and second as bars.
I want to use the max column to display a vertical line at the corresponding value for each row of the other columns.
So far I have only managed the bar plot.
Like this:
Any hints on how to achieve this?
Thanks
I have replaced the NaN with a finite value; you can then use the following code:
df = pd.DataFrame({'index' : ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08],
'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, 2.5]}).set_index('index')
plt.barh(range(4), df['first'], height=-0.25, align='edge')
plt.barh(range(4), df['second'], height=0.25, align='edge', color='red')
plt.yticks(range(4), df.index);
for i, val in enumerate(df['max']):
    plt.vlines(val, i-0.25, i+0.25, color='limegreen')
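If the missing value should stay missing rather than be replaced, the rows without a max can simply be skipped. This is a sketch that assumes the gap is a real NaN (float('nan') or np.nan) or None, not the string 'NaN' used in the question's dataframe; max_line_segments is a hypothetical helper name.

```python
import math


def max_line_segments(max_values, half_height=0.25):
    """Return (x, ymin, ymax) per row that has a usable max; NaN and None rows are skipped."""
    segments = []
    for row, value in enumerate(max_values):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # nothing to draw for this row
        segments.append((value, row - half_height, row + half_height))
    return segments


segments = max_line_segments([1.5, 3, 0.9, float('nan')])
```

Each tuple can then be drawn with plt.vlines(x, ymin, ymax, color='limegreen'), which leaves row D without a line.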
