How to plot a grouped bar plot of count from pandas - python

I have a dataframe with the following columns:
gender class
male A
female A
male B
female B
male B
female A
I want to plot a double bar graph with the columns as each gender and the values as the count of how many of each gender are in class A vs B respectively.
So the bars should be grouped by gender and there should be 2 bars - one for each class.
How do I visualize this? I see this example but I'm really confused
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
index = ['snail', 'pig', 'elephant',
'rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,
'lifespan': lifespan}, index=index)
speed lifespan
snail 0.1 2.0
pig 17.5 8.0
elephant 40.0 70.0
rabbit 48.0 1.5
giraffe 52.0 25.0
coyote 69.0 12.0
horse 88.0 28.0
ax = df.plot.bar(rot=0)
My index is just row 0 to the # of rows, so I'm confused how I can configure df.plot.bar to work with my use case. Any help would be appreciated!

Use pandas.DataFrame.pivot_table to reshape the dataframe from a long to wide format. The index will be the x-axis, and the columns will be the groups when plotted with pandas.DataFrame.plot
pd.crosstab(df['gender'], df['class']) can also be used to reshape with an aggregation.
Alternatively, use seaborn.countplot and hue='class', or the figure level version seaborn.catplot with kind='count', both of which can create the desired plot without reshaping the dataframe.
If one of the desired columns is in the index, either specify df.index or reset the index with df = df.reset_index()
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'gender': ['male', 'female', 'male', 'female', 'male', 'female'], 'class': ['A', 'A', 'B', 'B', 'B', 'A']}
df = pd.DataFrame(data)
# pivot the data and aggregate
dfp = df.pivot_table(index='gender', columns='class', values='class', aggfunc='size')
# plot
dfp.plot(kind='bar', figsize=(5, 3), rot=0)
plt.show()
plt.figure(figsize=(5, 3))
sns.countplot(data=df, x='gender', hue='class')
plt.show()
sns.catplot(kind='count', data=df, x='gender', hue='class', height=3, aspect=1.4)
plt.show()
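The pd.crosstab route mentioned above can be sketched as follows. This is a minimal example on the same sample data; the variable name `counts` is only illustrative.

```python
import pandas as pd

# same sample data as in the answer above
data = {'gender': ['male', 'female', 'male', 'female', 'male', 'female'],
        'class': ['A', 'A', 'B', 'B', 'B', 'A']}
df = pd.DataFrame(data)

# rows (gender) become the x-axis groups, columns (class) become the bars in each group
counts = pd.crosstab(df['gender'], df['class'])
print(counts)

# one group per gender, one bar per class
ax = counts.plot(kind='bar', figsize=(5, 3), rot=0)
```

Because crosstab counts occurrences by default, no values or aggfunc argument is needed here.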

Related

set x axis as column names on barplot

I have a dataframe such as this:
data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
'100m': [19, 14, 12, 11],
'200m': [36, 25, 24, 24],
'400m': [67, 64, 58, 57],
'800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)
name 100m 200m 400m 800m
1 Bob 19 36 67 117
2 Chuck 14 25 64 120
3 Daren 12 24 58 123
4 Elisa 11 24 57 121
My task is simple: Plot the times (along the y-axis), with the name of the event (100m, 200m, etc. along the x-axis). The hue of each bar should be determined by the 'name' column, and look something like this.
Furthermore, I would like to overlay the results (not stack them). However, neither seaborn nor matplotlib seems to have functionality for this.
Instead of using seaborn, which is an API for matplotlib, plot df directly with pandas.DataFrame.plot. matplotlib is the default plotting backend for pandas.
Tested in python 3.11, pandas 1.5.1, matplotlib 3.6.2, seaborn 0.12.1
ax = df.set_index('name').T.plot.bar(alpha=.7, rot=0, stacked=True)
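To see why the one-liner above puts the events on the x-axis, it may help to look at the intermediate frame. A small sketch, assuming the dataframe from the question; the name `wide` is only illustrative.

```python
import pandas as pd

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# after set_index + transpose, the events are the index (x-axis)
# and each runner is a column (one color per runner)
wide = df.set_index('name').T
print(wide)
```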
seaborn.barplot does not have an option for stacked bars, however, this can be implemented with seaborn.histplot, as shown in Stacked Bar Chart with Centered Labels.
df must be converted from a wide format to a long format with df.melt
# melt the dataframe
dfm = df.melt(id_vars='name')
# plot
ax = sns.histplot(data=dfm, x='variable', weights='value', hue='name', discrete=True, multiple='stack')
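For reference, the melt step can be inspected on its own. This sketch assumes the dataframe from the question and only shows the reshaped result; 'variable' and 'value' are the default column names produced by df.melt.

```python
import pandas as pd

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# wide -> long: one row per (name, event) pair
dfm = df.melt(id_vars='name')
print(dfm.head())
```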

How to color bars based on a separate pandas column

I need to plot a bar chart and color the bars according to the "Attribute" column of my dataframe
x axis = Shares
y axis = Price
fig, ax = plt.subplots()
ax.barh(df['Share'],df['Price'], align='center')
ax.set_xlabel('Shares')
ax.set_ylabel('Price')
ax.set_title('Bar Chart & Colors')
plt.show()
Thanks for your help !
There are two easy ways to plot the bars with separate colors for 'Attribute'
Transform the dataframe with .pivot and then plot with pandas.DataFrame.plot and specify kind='barh' for a horizontal bar plot
The index will be the x-axis if using kind='bar', and will be the y-axis if using kind='barh'
The columns of the transformed dataframe will each be plotted with a separate color.
pandas uses matplotlib as the default plotting backend.
Use seaborn.barplot with hue='Attribute' and orient='h'. This option works with the dataframe in a long format, as shown in the OP.
seaborn is a high-level API for matplotlib
Tested with pandas 1.3.0, seaborn 0.11.1, and matplotlib 3.4.2
Imports and DataFrame
import pandas as pd
import seaborn as sns
# test dataframe
data = {'Price': [110, 105, 119, 102, 111, 117, 110, 110], 'Share': [110, -50, 22, 79, 29, -2, 130, 140], 'Attribute': ['A', 'B', 'C', 'D', 'A', 'B', 'B', 'C']}
df = pd.DataFrame(data)
Price Share Attribute
0 110 110 A
1 105 -50 B
2 119 22 C
3 102 79 D
4 111 29 A
5 117 -2 B
6 110 130 B
7 110 140 C
pandas.DataFrame.plot
# transform the dataframe with .pivot
dfp = df.pivot(index='Price', columns='Attribute', values='Share')
Attribute A B C D
Price
102 NaN NaN NaN 79.0
105 NaN -50.0 NaN NaN
110 110.0 130.0 140.0 NaN
111 29.0 NaN NaN NaN
117 NaN -2.0 NaN NaN
119 NaN NaN 22.0 NaN
# plot
ax = dfp.plot(kind='barh', title='Bar Chart of Colors', figsize=(6, 4))
ax.set(xlabel='Shares')
ax.legend(title='Attribute', bbox_to_anchor=(1, 1), loc='upper left')
ax.grid(axis='x')
with stacked=True
ax = dfp.plot(kind='barh', stacked=True, title='Bar Chart of Colors', figsize=(6, 4))
seaborn.barplot
Note that the order of the y-axis values is reversed compared to the previous plot
ax = sns.barplot(data=df, x='Share', y='Price', hue='Attribute', orient='h')
ax.set(xlabel='Shares', title='Bar Chart of Colors')
ax.legend(title='Attribute', bbox_to_anchor=(1, 1), loc='upper left')
ax.grid(axis='x')
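One caveat with the .pivot step: it only works when each (index, columns) pair occurs once. The sketch below uses hypothetical data (not from the question) in which Price 110 appears twice for Attribute 'A', and falls back to pivot_table; summing the duplicates is an assumption, so choose whichever aggregation suits the data.

```python
import pandas as pd

# hypothetical data: the pair (Price=110, Attribute='A') occurs twice
df_dup = pd.DataFrame({'Price': [110, 110, 105],
                       'Share': [110, 29, -50],
                       'Attribute': ['A', 'A', 'B']})

try:
    df_dup.pivot(index='Price', columns='Attribute', values='Share')
    pivot_failed = False
except ValueError:
    # duplicate (Price, Attribute) pairs cannot be reshaped without aggregating
    pivot_failed = True

# pivot_table aggregates the duplicates instead of raising
dfp = df_dup.pivot_table(index='Price', columns='Attribute', values='Share', aggfunc='sum')
print(dfp)
```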

How to merge two plots in Pandas?

I want to merge two plots, that is my dataframe:
df_inc.head()
id date real_exe_time mean mean+30% mean-30%
0 Jan 31 33.14 43.0 23.0
1 Jan 30 33.14 43.0 23.0
2 Jan 33 33.14 43.0 23.0
3 Jan 38 33.14 43.0 23.0
4 Jan 36 33.14 43.0 23.0
My first plot:
df_inc.plot.scatter(x = 'date', y = 'real_exe_time')
Then
My second plot:
df_inc.plot(x='date', y=['mean','mean+30%','mean-30%'])
When I try to merge with:
fig=plt.figure()
ax = df_inc.plot(x='date', y=['mean','mean+30%','mean-30%']);
df_inc.plot.scatter(x = 'date', y = 'real_exe_time', ax=ax)
plt.show()
I got the following:
How can I merge them the right way?
You should not repeat your mean values as an extra column. df.plot() plots categorical data against the index, so the original scatter plot (also plotted against the index) appears squeezed into the left corner.
You could create instead an additional aggregation dataframe that you can plot then into the same graph:
import matplotlib.pyplot as plt
import pandas as pd
#test data generation
import numpy as np
n=30
np.random.seed(123)
df = pd.DataFrame({"date": np.random.choice(list("ABCDEF"), n), "real_exe_time": np.random.randint(1, 100, n)})
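# sort by category and renumber the rows (a bare .reindex() would leave the frame unchanged)
df = df.sort_values(by="date").reset_index(drop=True)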
#aggregate data for plotting
df_agg = df.groupby("date")["real_exe_time"].agg(mean="mean").reset_index()
df_agg["mean+30%"] = df_agg["mean"] * 1.3
df_agg["mean-30%"] = df_agg["mean"] * 0.7
#plot both into the same subplot
ax = df.plot.scatter(x = 'date', y = 'real_exe_time')
df_agg.plot(x='date', y=['mean','mean+30%','mean-30%'], ax=ax)
plt.show()
Sample output:
You could also consider using seaborn that has, for instance, pointplots for categorical data aggregation.
I'm guessing that you haven't transformed the date column to datetime objects, so that is the first thing you should do:
#Transform the date to datetime object
df_inc['date']=pd.to_datetime(df_inc['date'],format='%b')
fig=plt.figure()
ax = df_inc.plot(x='date', y=['mean','mean+30%','mean-30%']);
df_inc.plot.scatter(x = 'date', y = 'real_exe_time', ax=ax)
plt.show()
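It is worth checking what the '%b' format actually produces before plotting. A minimal sketch with made-up month strings: since the strings carry no year or day, the parsed values fall back to 1900 and the first day of the month, so every 'Jan' row ends up on the same timestamp.

```python
import pandas as pd

# made-up month abbreviations, used only for illustration
months = pd.Series(['Jan', 'Jan', 'Feb', 'Mar'])

# '%b' parses the abbreviated month name; year and day take default values
parsed = pd.to_datetime(months, format='%b')
print(parsed)
```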

generate series of plots with pandas dataframe

I have to generate a series of scatter plots (roughly 100 in total).
I have created an example to illustrate the problem.
First do an import.
import pandas as pd
Create a pandas dataframe.
# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
'report_value': [4, 24, 31, 2, 3, 5, 10],
'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)
Output:
coverage_id name report_value
0 m1 Jason 4
1 m2 Jason 24
2 m3 Tina 31
3 m4 Tina 2
4 m5 Tina 3
5 m6 Jason 5
6 m7 Tina 10
The goal is to generate two scatter plots without using a for-loop. The name of the person, Jason or Tina, should be displayed in the title. The report_value should be on the y-axis in both plots and the coverage_id (which is a string) on the x-axis.
I thought I should start with:
df.groupby('name')
Then I need to apply the operation to every group.
This way I have the dataframe grouped by their names. I don't know how to proceed and get Python to make the two plots for me.
Thanks a lot for any help.
I think you can use this solution, but first it is necessary to convert the string column to numeric, then plot, and finally set the x tick labels:
import matplotlib.pyplot as plt
import numpy as np
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
groups = df.groupby('name')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.coverage_id,
            group.report_value,
            marker='o',
            linestyle='',
            ms=12,
            label=name)
# one tick per unique coverage_id, labelled with the original strings
ax.set(xticks=range(len(u)), xticklabels=u)
ax.legend()
plt.show()
Another seaborn solution with seaborn.pairplot:
import numpy as np
import seaborn as sns
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
# `height` replaces the older `size` parameter in current seaborn releases
g=sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
g.set(xticklabels=u, xlim=(0, None))
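Both answers above draw the two people into a single plot. If one plot per person with the name in the title is wanted, as the question describes, a sketch along these lines might work. Note that it does use a short loop over the groups, which the question hoped to avoid, and that it relies on matplotlib accepting string values directly on the x-axis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

groups = df.groupby('name')

# one subplot per group; atleast_1d keeps the loop working for a single group
fig, axes = plt.subplots(1, groups.ngroups, figsize=(8, 3), sharey=True)
axes = np.atleast_1d(axes)

for ax, (name, group) in zip(axes, groups):
    # string coverage_id values are treated as categories on the x-axis
    ax.scatter(group['coverage_id'], group['report_value'])
    ax.set_title(name)
    ax.set_xlabel('coverage_id')

axes[0].set_ylabel('report_value')
```

For roughly 100 plots, the same loop could create and save one figure per group instead of placing them side by side.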

How to plot multiple pandas columns

I have dataframe total_year, which contains three columns (year, action, comedy).
How can I plot two columns (action and comedy) on y-axis?
My code plots only one:
total_year[-15:].plot(x='year', y='action', figsize=(10,5), grid=True)
Several column names may be provided to the y argument of the pandas plotting function. Those should be specified in a list, as follows.
df.plot(x="year", y=["action", "comedy"])
Complete example:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"year": [1914,1915,1916,1919,1920],
"action" : [2.6,3.4,3.25,2.8,1.75],
"comedy" : [2.5,2.9,3.0,3.3,3.4] })
df.plot(x="year", y=["action", "comedy"])
plt.show()
By default, pandas.DataFrame.plot() uses the index for the x-axis, and all other numeric columns are used as y values.
So setting year column as index will do the trick:
total_year.set_index('year').plot(figsize=(10,5), grid=True)
When using pandas.DataFrame.plot, it's only necessary to specify a column to the x parameter.
The caveat is, the rest of the columns with numeric values will be used for y.
The following code contains extra columns to demonstrate. Note, 'date' is left as a string. However, if 'date' is converted to a datetime dtype, the plot API will also plot the 'date' column on the y-axis.
If the dataframe includes many columns, some of which should not be plotted, then specify the y parameter as shown in this answer, but if the dataframe contains only columns to be plotted, then specify only the x parameter.
In cases where the index is to be used as the x-axis, then it is not necessary to specify x=.
import pandas as pd
# test data
data = {'year': [1914, 1915, 1916, 1919, 1920],
'action': [2.67, 3.43, 3.26, 2.82, 1.75],
'comedy': [2.53, 2.93, 3.02, 3.37, 3.45],
'test1': ['a', 'b', 'c', 'd', 'e'],
'date': ['1914-01-01', '1915-01-01', '1916-01-01', '1919-01-01', '1920-01-01']}
# create the dataframe
df = pd.DataFrame(data)
# display(df)
year action comedy test1 date
0 1914 2.67 2.53 a 1914-01-01
1 1915 3.43 2.93 b 1915-01-01
2 1916 3.26 3.02 c 1916-01-01
3 1919 2.82 3.37 d 1919-01-01
4 1920 1.75 3.45 e 1920-01-01
# plot the dataframe
df.plot(x='year', figsize=(10, 5), grid=True)
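To check which columns actually end up on the y-axis, the labels of the plotted lines can be inspected. This is a small sketch on the same test data; the name `plotted` is only illustrative.

```python
import pandas as pd

data = {'year': [1914, 1915, 1916, 1919, 1920],
        'action': [2.67, 3.43, 3.26, 2.82, 1.75],
        'comedy': [2.53, 2.93, 3.02, 3.37, 3.45],
        'test1': ['a', 'b', 'c', 'd', 'e'],
        'date': ['1914-01-01', '1915-01-01', '1916-01-01', '1919-01-01', '1920-01-01']}
df = pd.DataFrame(data)

# only x is given; the string columns 'test1' and 'date' are skipped
ax = df.plot(x='year', figsize=(10, 5), grid=True)

# each plotted line is labelled with its column name
plotted = [line.get_label() for line in ax.get_lines()]
print(plotted)
```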
