Mask the data values inside a pie plot matplotlib - python

I am using the following code to produce a pie plot.
My question is, how do I mask/hide the numbers inside the pie chart?
I do not want the numbers 0.62, 0.31 and 0.02 inside the pie chart to be visible.
Thanks in advance.
import pandas as pd
import matplotlib.pyplot as plt
df99 = pd.DataFrame({
'Data': ['A', 'B', 'C'],
'Perc': [0.62, 0.31, 0.02]})
plt.pie(df99['Perc']*100, colors=['#002c4b','#392e2c','#92847a','#ccc2bb','#6b879d','#7FBAA4','#8E654C','#006CB8','#CBBBE9','#9778D3'],counterclock=False,startangle=-270,pctdistance=1.2,labeldistance=1.2,labels=df99['Data'],
autopct=lambda p: f"{p*df99['Perc'].sum()/100:.2f}")
plt.show()

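IIUC, leave out the autopct argument (or pass autopct=None, which is the default) so that no value labels are drawn: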
import pandas as pd
import matplotlib.pyplot as plt
df99 = pd.DataFrame({
'Data': ['A', 'B', 'C'],
'Perc': [0.62, 0.31, 0.02]})
plt.pie(df99['Perc']*100,
colors=['#002c4b','#392e2c','#92847a','#ccc2bb','#6b879d','#7FBAA4','#8E654C','#006CB8','#CBBBE9','#9778D3'],counterclock=False,startangle=-270,pctdistance=1.2,labeldistance=1.2,
labels=df99['Data'],
autopct=None)
plt.show()
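If you still want autopct to compute the values (for example to reuse them elsewhere) but only want to hide them, here is a minimal sketch. It relies on Axes.pie returning a third list of value texts whenever autopct is set, and hides those. The Agg backend is an assumption so that the sketch runs without a display; the colors list is omitted for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df99 = pd.DataFrame({'Data': ['A', 'B', 'C'], 'Perc': [0.62, 0.31, 0.02]})

fig, ax = plt.subplots()
# With autopct set, pie() returns wedges, category label texts and value texts
wedges, texts, autotexts = ax.pie(
    df99['Perc'] * 100,
    labels=df99['Data'],
    counterclock=False,
    startangle=-270,
    autopct=lambda p: f"{p * df99['Perc'].sum() / 100:.2f}",
)
# Hide only the value labels; wedges and the A/B/C labels stay visible
for t in autotexts:
    t.set_visible(False)
```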
You can also use the pandas plotting API, which likewise draws no value labels unless autopct is passed:
df99.set_index('Data').mul(100).plot.pie(y='Perc',colors=['#002c4b','#392e2c','#92847a','#ccc2bb','#6b879d','#7FBAA4','#8E654C','#006CB8','#CBBBE9','#9778D3'],counterclock=False,startangle=-270)


Categorize and order bar chart by Hue

I have a problem. I want to show the two countries with the highest count in each category (part). Unfortunately I only get the output below; instead, I would like each part to be listed as a separate category on the x-axis.
Is there an option?
import pandas as pd
import seaborn as sns
d = {'count': [50, 20, 30, 100, 3, 40, 5],
'country': ['DE', 'CN', 'CN', 'BG', 'PL', 'BG', 'RU'],
'part': ['b', 'b', 's', 's', 'b', 's', 's']
}
df = pd.DataFrame(data=d)
print(df)
#print(df.sort_values('count', ascending=False).groupby('part').head(2))
ax = sns.barplot(x="country", y="count", hue='part',
data=df.sort_values('count', ascending=False).groupby('part').head(2), palette='GnBu')
[Image: what I got]
[Image: what I want]
You can always skip seaborn and plot everything directly in matplotlib.
from matplotlib import pyplot as plt
import pandas as pd
# matplotlib >= 3.6 renamed the 'seaborn' style to 'seaborn-v0_8'; use 'seaborn' on older versions
plt.style.use('seaborn-v0_8')
df = pd.DataFrame({
'count': [50, 20, 30, 100, 3, 40, 5],
'country': ['DE', 'CN', 'CN', 'BG', 'PL', 'BG', 'RU'],
'part': ['b', 'b', 's', 's', 'b', 'b', 's']
})
fig, ax = plt.subplots()
offset = .2
xticks, xlabels = [], []
handles, labels = [], []
for i, (idx, group) in enumerate(df.groupby('part')):
    # Two largest counts within this part
    plot_data = group.nlargest(2, 'count')
    # Place the two bars side by side around position i
    x = [i - offset, i + offset]
    barcontainer = ax.bar(x=x, height=plot_data['count'], width=.35)
    xticks += x
    xlabels += plot_data['country'].tolist()
    # One legend entry per part
    handles.append(barcontainer[0])
    labels.append(idx)
ax.set_xticks(xticks)
ax.set_xticklabels(xlabels)
ax.legend(handles=handles, labels=labels, title='Part')
plt.show()
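A shorter variant of the same plain-matplotlib idea, sketched under the assumption that a combined "part / country" tick label is acceptable: select the two largest counts per part, build one x label per (part, country) pair so the same country can appear under both parts, and color the bars by part. The label format and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd

df = pd.DataFrame({
    'count': [50, 20, 30, 100, 3, 40, 5],
    'country': ['DE', 'CN', 'CN', 'BG', 'PL', 'BG', 'RU'],
    'part': ['b', 'b', 's', 's', 'b', 'b', 's'],
})

# Sort by part, then by count descending, and keep the first two rows of each part
top2 = (df.sort_values(['part', 'count'], ascending=[True, False])
          .groupby('part')
          .head(2)
          .reset_index(drop=True))
# Hypothetical combined label: one x position per (part, country) pair
top2['label'] = top2['part'] + ' / ' + top2['country']

# One color per part, taken from the active property cycle
cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
part_colors = {p: cycle[i] for i, p in enumerate(top2['part'].unique())}

fig, ax = plt.subplots()
ax.bar(top2['label'].tolist(), top2['count'].tolist(),
       color=top2['part'].map(part_colors).tolist())
ax.set_xlabel('part / country')
ax.set_ylabel('count')
ax.legend(handles=[Patch(color=c, label=p) for p, c in part_colors.items()], title='Part')
```

Keeping the bars in one axes means a single shared y scale; the FacetGrid answer instead gives each part its own subplot.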
The following approach creates a FacetGrid for your data. Seaborn 0.11 introduced the helpful g.axes_dict. (In the example data I changed the second entry for 'BG' to 'b', supposing that each country/part combination only occurs once, as in the example plots.)
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
d = {'count': [50, 20, 30, 100, 3, 40, 5],
'country': ['DE', 'CN', 'CN', 'BG', 'PL', 'BG', 'RU'],
'part': ['b', 'b', 's', 's', 'b', 'b', 's']
}
df = pd.DataFrame(data=d)
sns.set()
g = sns.FacetGrid(data=df, col='part', col_wrap=2, sharey=True, sharex=False)
for part, df_part in df.groupby('part'):
    # x order: the two largest counts within this part
    order = df_part.nlargest(2, 'count')['country']
    ax = sns.barplot(data=df_part, x='country', y='count', order=order, palette='summer', ax=g.axes_dict[part])
    ax.set(xlabel=f'part = {part}')
g.set_ylabels('count')
plt.tight_layout()
plt.show()

Stacked barplot over multiindex pandas dataframe

import pandas as pd
import numpy as np
np.random.seed(365)
rows = 100
data = {'Month': np.random.choice(['2014-01', '2014-02', '2014-03', '2014-04'], size=rows),
'Code': np.random.choice(['A', 'B', 'C'], size=rows),
'ColA': np.random.randint(5, 125, size=rows),
'ColB': np.random.randint(0, 51, size=rows),}
df = pd.DataFrame(data)
df = df[((~((df.Code=='A')&(df.Month=='2014-04')))&(~((df.Code=='C')&(df.Month=='2014-03'))))]
dfg = df.groupby(['Code', 'Month']).sum()
For the above, I wish to plot a stacked bar plot:
dfg.unstack(level=0).plot(kind='bar', stacked =True)
I wish to stack over the 'Code' column, but the above is stacking over 'Month'. Why? How can I better plot a stacked chart with this?
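DataFrame.plot.bar uses the index of the frame as the x values and draws each column as one (stacked) segment. unstack(level=0) moves 'Code' into the columns, so 'Month' ends up on the x-axis. IIUC, you need to unstack 'Month' (level=1) instead: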
dfg.unstack(level=1).plot(kind='bar', stacked=True)
To place the legend outside the plot:
ax = dfg.unstack(level=1).plot(kind='bar', stacked=True, legend=False)
ax.figure.legend(loc='center left', bbox_to_anchor=(1, 0.5))
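A small sketch that makes the index/columns split visible. The four-row frame below is hypothetical data with the same structure as dfg, chosen so that every Code/Month combination exists. After unstack(level=1), 'Code' is the index (x-axis) and each (value column, Month) pair becomes one stacked segment. Since a figure legend anchored at x=1 sits outside the figure, saving with bbox_inches='tight' keeps it in the image.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, complete data with the same MultiIndex shape as dfg
dfg = pd.DataFrame({
    'Code': ['A', 'A', 'B', 'B'],
    'Month': ['2014-01', '2014-02', '2014-01', '2014-02'],
    'ColA': [1, 2, 3, 4],
    'ColB': [5, 6, 7, 8],
}).groupby(['Code', 'Month']).sum()

# level=1 is 'Month': it moves into the columns, so 'Code' stays in the index
wide = dfg.unstack(level=1)
# The index becomes the x-axis; every column of `wide` is one stacked segment
ax = wide.plot(kind='bar', stacked=True, legend=False, rot=0)
ax.figure.legend(loc='center left', bbox_to_anchor=(1, 0.5))

# bbox_inches='tight' expands the saved area to include the outside legend
buf = io.BytesIO()
ax.figure.savefig(buf, format='png', bbox_inches='tight')
```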

how to remove count from a plotly express bar chart hover data?

Given the following code:
import pandas as pd
import plotly.express as px
d = {'col1': ['a', 'a', 'b', 'b', 'b'], 'col2': [5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)
fig = px.bar(df, y='col1', color='col1')
fig.show()
that generates the following bar plot:
How do I remove count from the hover data?
plotly==5.1.0
You can remove it by overriding the hovertemplate of the traces:
import pandas as pd
import plotly.express as px
d = {'col1': ['a', 'a', 'b', 'b', 'b'], 'col2': [5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)
fig = px.bar(df, y='col1', color='col1').update_traces(hovertemplate='col1=%{y}<br><extra></extra>')
fig.show()

Pandas: Bar-Plot with two bars from repetitive x-column in dataframe

I have a slightly odd CSV file in which each month is repeated, once per condition. My goal is to create a bar graph where each month has two bars of y (one for a and one for b). I have tried to approach this by splitting the data frame into two (a only and b only), but the repeated month column gets in the way. I am fairly new to Python and pandas, so perhaps there is a function I'm not aware of? Any help is appreciated.
month  cond.  y
Jan    a      4
Jan    b      8
Feb    a      2
Feb    b      9
March  a      3
March  b      7
Perhaps the most common way to approach this problem is to reshape the long-form data to wide-form via pivot and then DataFrame.plot:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({
'month': ['Jan', 'Jan', 'Feb', 'Feb', 'March', 'March'],
'cond.': ['a', 'b', 'a', 'b', 'a', 'b'],
'y': [4, 8, 2, 9, 3, 7]
})
df.pivot(index='month', columns='cond.', values='y').plot(kind='bar', rot=0)
plt.tight_layout()
plt.show()
There is a noticeable issue: the x-axis categories appear in alphabetical order (Feb, Jan, March) rather than chronologically, because pivot sorts the index. One option is to reindex before plotting. There would be more options if the month column were regular, but since it mixes abbreviations and full month names, reindexing manually is likely the best option.
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({
'month': ['Jan', 'Jan', 'Feb', 'Feb', 'March', 'March'],
'cond.': ['a', 'b', 'a', 'b', 'a', 'b'],
'y': [4, 8, 2, 9, 3, 7]
})
(
df.pivot(index='month', columns='cond.', values='y')
.reindex(['Jan', 'Feb', 'March']) # Re-order so they appear correctly on x-axis
.plot(kind='bar', rot=0)
)
plt.tight_layout()
plt.show()
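Another option, sketched here with the explicit month order as the only assumption: convert month to an ordered pandas Categorical before pivoting. The pivoted index is then sorted by category order instead of alphabetically, so no reindex call is needed, and this works even though the names mix abbreviations and full month names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'March', 'March'],
    'cond.': ['a', 'b', 'a', 'b', 'a', 'b'],
    'y': [4, 8, 2, 9, 3, 7]
})
# Ordered categorical: the category list, not the alphabet, defines the sort order
df['month'] = pd.Categorical(df['month'], categories=['Jan', 'Feb', 'March'], ordered=True)

wide = df.pivot(index='month', columns='cond.', values='y')
ax = wide.plot(kind='bar', rot=0)
plt.tight_layout()
```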
Seaborn is a popular choice for this kind of problem because the hue argument avoids the reshaping step. Additionally, the x categories are drawn in order of appearance in the frame, so reindex is also unnecessary (assuming the data already appear in the correct order in the source DataFrame).
sns.barplot:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
sns.set_theme() # (optional) Use seaborn theme
df = pd.DataFrame({
'month': ['Jan', 'Jan', 'Feb', 'Feb', 'March', 'March'],
'cond.': ['a', 'b', 'a', 'b', 'a', 'b'],
'y': [4, 8, 2, 9, 3, 7]
})
sns.barplot(data=df, x='month', y='y', hue='cond.')
plt.tight_layout()
plt.show()
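Using the hue argument to split the bars by condition also works (with df holding the data from the question; note the column names are 'month' and 'cond.'):
import seaborn as sns
import matplotlib.pyplot as plt
sns.barplot(data=df, x='month', y='y', hue='cond.')
plt.show()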

bar plot with vertical lines for each bar

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'index' : ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08], 'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, np.nan]}).set_index('index')
I want to plot a horizontal bar chart with first and second as bars.
I want to use the max column to display a vertical line at the corresponding value across the bars of each row.
So far I have only managed the bar plot.
[Image: desired output]
Any hints on how to achieve this?
Thanks
I replaced the NaN with a finite value (2.5); then you can use the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'index' : ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08],
'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, 2.5]}).set_index('index')
# A negative height with align='edge' draws the 'first' bars just below each tick
plt.barh(range(4), df['first'], height=-0.25, align='edge')
# ...and the 'second' bars just above it
plt.barh(range(4), df['second'], height=0.25, align='edge', color='red')
plt.yticks(range(4), df.index)
for i, val in enumerate(df['max']):
    # One vertical segment per row, spanning both of its bars
    plt.vlines(val, i-0.25, i+0.25, color='limegreen')
plt.show()
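If you would rather keep the NaN than replace it, a sketch of one way to do that: draw the bars centered around each tick and pass only the rows that have a max value to a single vectorized ax.vlines call, so row D simply gets no line. The bar height h = 0.25 and the legend labels are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'index': ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08],
                   'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, np.nan]}).set_index('index')

y = np.arange(len(df))
h = 0.25  # assumed bar height
fig, ax = plt.subplots()
# Centered bars: 'first' spans [y - h, y], 'second' spans [y, y + h]
ax.barh(y - h / 2, df['first'].to_numpy(), height=h, label='first')
ax.barh(y + h / 2, df['second'].to_numpy(), height=h, color='red', label='second')
ax.set_yticks(y)
ax.set_yticklabels(df.index)

# Keep only rows with a max value; one vertical segment per row covers both bars
has_max = df['max'].notna().to_numpy()
lines = ax.vlines(df['max'].to_numpy()[has_max], y[has_max] - h, y[has_max] + h,
                  color='limegreen', label='max')
ax.legend()
```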
