Plot Multiple bar chart with condition - python

I want to create a multiple bar chart that shows the distribution of income according to the 'Marital_Status'
     ID  Year_of_Birth Highest_Qualification Marital_Status   Income
0  5524           1957            Graduation         Single  58138.0
1  2174           1954            Graduation         Single  46344.0
2  4141           1965            Graduation   Relationship  71613.0
3  6182           1984            Graduation   Relationship  26646.0
4  5324           1981                   PhD   Relationship  58293.0
...
I tried it with seaborn
sns.displot(df['Income'], bins = 20, x=bin, hue = "Marital_Status", kde = False, multiple="stack")
plt.show
But it didn't work.
Could not interpret value `Marital_Status` for parameter `hue`
This was the most promising idea I had, but I don't get it to work...

The problem is most likely that df['Income'] is passed as the data argument of sns.displot. That is a single Series, so seaborn cannot find a Marital_Status column to use for hue. Pass the whole DataFrame as data and name the column to plot with x='Income'. Please check out the seaborn documentation for more details.
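To make the point about the data argument concrete, here is a minimal pandas-only sketch. It reuses a few rows from the question and the variable name income_only is only illustrative; it shows that selecting df['Income'] leaves seaborn with nothing to look up 'Marital_Status' in.

```python
import pandas as pd

# A few rows from the question (other columns omitted for brevity)
df = pd.DataFrame({
    "Marital_Status": ["Single", "Single", "Relationship", "Relationship", "Relationship"],
    "Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
})

# Selecting one column returns a Series; the other columns are not part of it
income_only = df["Income"]

print(type(income_only).__name__)       # the object that was passed to displot
print("Marital_Status" in df.columns)   # the full DataFrame still has the column
print(hasattr(income_only, "columns"))  # a Series has no columns to search
```

This is why passing the full DataFrame together with x='Income' lets hue resolve the column name.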
Your question can be solved using seaborn.histplot:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
input_data = {"ID": [5524, 2174, 4141, 6182, 5324],
"Year_of_Birth":[1957, 1954, 1965, 1984, 1981],
"Highest_Qualification": ['Graduation'] * 4 + ['PhD'],
"Marital_Status":["Single"]*2 + ["Relationship"] * 3,
"Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0]}
df = pd.DataFrame(data=input_data)
sns.histplot(data=df, bins = 20, x='Income', hue = "Marital_Status", multiple="stack")
plt.show()
It can also be solved using seaborn.displot (call displot instead of histplot in the above code):
sns.displot(df, bins = 20, x='Income', hue = "Marital_Status", multiple="stack")

Related

How to plot two bar diagrams for two columns of a given table using seaborn or matplotlib

This might be a simple task, but I am new to plotting in Python and am struggling to convert logic into code. I have 3 columns like below that consist of Countries, Quantities and Revenues:
Country         Quantities  Revenues
United Kingdom  2915836     8125479.97
EIRE            87390       253026.10
Netherlands     127083      245279.99
Germany         72068       202050.01
France          68439       184024.28
Australia       52611       122974.01
Spain           18947       56444.29
Switzerland     18769       50671.57
Belgium         12068       34926.92
Norway          10965       32184.10
Japan           14207       31914.79
Portugal        10430       30247.57
Sweden          10720       24456.55
All I want to do is create side-by-side bars for each country that represent the revenue and quantity for each region.
So far, I have tried this:
sns.catplot(kind = 'bar', data = dj, y = 'Quantities,Revenues', x = 'Country', hue = 'Details')
plt.show()
But this cannot interpret the input "Country".
I hope I am making sense.
With pandas, you can simply use pandas.DataFrame.plot.bar :
dj.plot.bar(x="Country", figsize=(10, 5))
#dj[dj["Country"].ne("United Kingdom")].plot.bar(x="Country", figsize=(10, 5)) #to exclude UK
With seaborn, you can use seaborn.barplot after reshaping the original df to long format with pandas.DataFrame.melt.
fig, ax = plt.subplots(figsize=(10, 5))
ax.tick_params(axis='x', rotation=90)
dj_m = dj.melt(id_vars="Country", value_name="Values", var_name="Variables")
sns.barplot(data=dj_m, x='Country', y="Values", hue="Variables", ax=ax)
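To see what the melt step hands to seaborn, here is a small sketch using only the first three rows of the table from the question. The long frame has one row per country and variable, which is the shape that hue="Variables" relies on.

```python
import pandas as pd

# First three rows of the table from the question
dj = pd.DataFrame({
    "Country": ["United Kingdom", "EIRE", "Netherlands"],
    "Quantities": [2915836, 87390, 127083],
    "Revenues": [8125479.97, 253026.10, 245279.99],
})

# Wide to long: the former column names move into "Variables",
# and their numbers move into "Values"
dj_m = dj.melt(id_vars="Country", value_name="Values", var_name="Variables")

print(dj_m)
```

Each country now appears twice, once with Variables equal to "Quantities" and once with "Revenues", so seaborn can draw the two bars next to each other.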
pandas already has a built-in plotting interface, .plot. You choose the chart type either with an accessor method such as .plot.bar() or .plot.scatter(), or with the kind argument, for example kind='bar' or kind='scatter'. In this situation you want a bar chart.
import matplotlib.pyplot as plt  # needed to show the plot
df.plot.bar(x="Country")  # plot the bars; extra keyword arguments are passed on to matplotlib
plt.show()  # show it

Grouping variables to plot multilevel bar chart

Can you please help me with the following. I have a dataset with a variable - number of products (Prod) that takes discrete values from 1 to 3 (included). Then I have a variable (Gender) 1 for males, 0 for females. I want to plot a multilevel bar chart where on the x-axis I have number of products (Prod) and on the y-axis I have total value of these products that are grouped by the Gender. I need to create a 'count' variable that counts how many observations of each 'Prod' are in each 'Gender' category. To group and plot the variables I use the following code (which does not work):
#Group the variables
grouped_gender['count'] = main_data.groupby(['Prod', 'Gender'])[['Prod']].count()
grouped_gender = pd.DataFrame(grouped_gender)
#Plot
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 7))
barplot2 = sns.barplot(
data=grouped_gender,
x='Prod',
y='count',
hue='Gender',
orient='v',
ax = axes,
ci=None,
dodge=False
)
Can you please help me to identify the problem?
Assuming you can put your DataFrame in a similar state to mine:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
grouped_gender = pd.DataFrame(
{
"Man": [50, 70],
"Woman": [90, 30]
},
index=["Product1", "Product2"]
)
grouped_gender.plot(kind="bar", stacked=True)
plt.title("Products sales")
plt.xlabel("Products")
plt.ylabel("Sales")
plt.show()
Use countplot on the original dataset:
# sample dataset
df = sns.load_dataset('tips')
# `day` plays `Prod`, `sex` plays `Gender`
sns.countplot(x='day', hue='sex', data=df)
Note: if you want the data, not just the plot, use:
counts = pd.crosstab(df['day'], df['sex'])
# then to plot bar chart
# counts.plot.bar()
which gives you:
sex   Male  Female
day
Thur    30      32
Fri     10       9
Sat     59      28
Sun     58      18
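If you still want the 'count' column from the question, for example to pass it to sns.barplot, one possible sketch is below. The sample values in main_data are made up for illustration. In the question's code, grouped_gender is assigned into before it exists, and the groupby result keeps Prod and Gender as index levels rather than columns, so x='Prod' and hue='Gender' cannot be found.

```python
import pandas as pd

# Hypothetical sample standing in for `main_data` from the question
main_data = pd.DataFrame({
    "Prod":   [1, 1, 2, 2, 2, 3, 3, 1],
    "Gender": [1, 0, 1, 1, 0, 0, 0, 1],
})

# Count observations per (Prod, Gender) pair and turn the group keys
# back into ordinary columns, with the counts in a column named "count"
grouped_gender = (
    main_data.groupby(["Prod", "Gender"])
    .size()
    .reset_index(name="count")
)

print(grouped_gender)
```

The resulting frame has plain Prod, Gender and count columns, so it should work with sns.barplot(data=grouped_gender, x='Prod', y='count', hue='Gender').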

Plot HIST of a pandas DataframeGroupbySeries

I'm working with a small dataframe from this Codecademy data.
I'm trying to print data to make a small analysis with the following code:
data = pd.read_csv('insurance.csv')
age sex bmi children smoker region charges
0 19 female 27.900 0 yes southwest 16884.92400
1 18 male 33.770 1 no southeast 1725.55230
2 28 male 33.000 3 no southeast 4449.46200
3 33 male 22.705 0 no northwest 21984.47061
4 32 male 28.880 0 no northwest 3866.85520
data.groupby('sex').region.hist()
The code returns a pandas series where the first element is subplot1 and the second subplot2.
The code plots them on the same figure, and I'm unable to plot them separately.
To produce a histogram for each column based on gender:
'children' and 'smoker' look different because the number is discrete with only 6 and 2 unique values, respectively.
data.groupby('sex').hist(layout=(1, 4), figsize=(12, 4), ec='k', grid=False) alone will produce the graph, but without an easy way to add a title.
Producing the correct visualization often involves reshaping the data for the plotting API.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2
import pandas as pd
# load data
data = pd.read_csv('insurance.csv')
# convert smoker from a string to int value; hist doesn't work on object type columns
data.smoker = data.smoker.map({'no': 0, 'yes': 1})
# group each column by sex; data.groupby(['sex', 'region']) is also an option
for gender, df in data.groupby('sex'):
    # plot a hist for each column
    axes = df.hist(layout=(1, 5), figsize=(15, 4), ec='k', grid=False)
    # extract the figure object from the array of axes
    fig = axes[0][0].get_figure()
    # add the gender as the title
    fig.suptitle(gender)
In regards to data.groupby('sex').region.hist() in the OP, this is a count plot, which shows counts of gender for each region; it is not a histogram.
pandas.crosstab by default computes a frequency table of the factors
ax = pd.crosstab(data.region, data.sex).plot(kind='bar', rot=0)
ax.legend(title='gender', bbox_to_anchor=(1, 1.02), loc='upper left')
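As a quick illustration of the frequency table that pandas.crosstab builds before plotting, here is a sketch on a tiny made-up sample (six rows invented for this example, not taken from insurance.csv). It also shows a groupby-based way to get the same counts.

```python
import pandas as pd

# Tiny made-up sample with the two categorical columns of interest
data = pd.DataFrame({
    "sex":    ["female", "male", "male", "male", "female", "female"],
    "region": ["southwest", "southeast", "southeast",
               "northwest", "northwest", "southwest"],
})

# Rows are regions, columns are sexes, cells are observation counts
counts = pd.crosstab(data["region"], data["sex"])
print(counts)

# The same table via groupby: count each pair, then pivot `sex` into columns
same_counts = data.groupby(["region", "sex"]).size().unstack(fill_value=0)
print(same_counts)
```

Calling counts.plot(kind='bar', rot=0) on either table draws one group of bars per region, which is the count plot described above.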
Use seaborn.displot
This requires converting the data from a wide to long format, which is done with pandas.DataFrame.melt
import pandas as pd
import seaborn as sns
data = pd.read_csv('insurance.csv')
data.smoker = data.smoker.map({'no': 0, 'yes': 1})
# convert the dataframe from a wide to long form
df = data.melt(id_vars=['sex', 'region'])
# plot
p = sns.displot(data=df, kind='hist', x='value', col='variable', row='region', hue='sex',
                multiple='dodge', common_bins=False, facet_kws={'sharey': False, 'sharex': False})

How to put Seaborn FacetGrid on plt.subplots grid and change title or xlabel

I am trying to create a distplot/kdeplot with hue as the target for multiple columns in my dataset. I am doing this by using kdeplot with a FacetGrid and using the ax parameter to make it plot on my plt.subplots figure. However, I am unable to change the title of the individual plots. I have tried g.fig.suptitle and ax[0,0].set_title, but neither is working.
df = sns.load_dataset('titanic')
fig, ax = plt.subplots(2,2,figsize=(10,5))
g = sns.FacetGrid(df, hue="survived")
g = g.map(sns.kdeplot, "age",ax=ax[0,0])
g.fig.suptitle('age')
h = g.map(sns.kdeplot, "fare",ax=ax[0,1])
h.fig.suptitle('fare')
ax[0,0].set_title ='age'
ax[1,0].set_xlabel='fare'
I am getting this image. It seems to be changing a plot at the bottom which I don't want
By the way, I don't have to use Seaborn, matplotlib is also fine
If you want two density plots, one for age and one for fare, you can reshape the data to long format and then apply FacetGrid:
import seaborn as sns
df = sns.load_dataset('titanic')
So the long format looks like this, it allows you to split on the variable column:
df[['survived','age','fare']].melt(id_vars='survived')
survived variable value
0 0 age 22.00
1 1 age 38.00
2 1 age 26.00
3 1 age 35.00
4 0 age 35.00
And we plot:
g = sns.FacetGrid(df[['survived','age','fare']].melt(id_vars='survived'),
col="variable",sharex=False,sharey=False,hue="survived")
g.map(sns.kdeplot,"value")

Plot multiple distplot in seaborn Facetgrid

I have a dataframe which looks like below:
df:
RY MAJ_CAT Value
2016 Cause Unknown 0.00227
2016 Vegetation 0.04217
2016 Vegetation 0.04393
2016 Vegetation 0.07878
2016 Defective Equip 0.00137
2018 Cause Unknown 0.00484
2018 Defective Equip 0.01546
2020 Defective Equip 0.05169
2020 Defective Equip 0.00515
2020 Cause Unknown 0.00050
I want to plot the distribution of the value over the given years. So I used distplot of seaborn by using following code:
year_2016 = df[df['RY']==2016]
year_2018 = df[df['RY']==2018]
year_2020 = df[df['RY']==2020]
sns.distplot(year_2016['value'].values, hist=False,rug=True)
sns.distplot(year_2018['value'].values, hist=False,rug=True)
sns.distplot(year_2020['value'].values, hist=False,rug=True)
In the next step I want to plot the same value distribution over the given year w.r.t MAJ_CAT. So I decided to use Facetgrid of seaborn, below is the code :
g = sns.FacetGrid(df,col='MAJ_CAT')
g = g.map(sns.distplot,df[df['RY']==2016]['value'].values, hist=False,rug=True))
g = g.map(sns.distplot,df[df['RY']==2018]['value'].values, hist=False,rug=True))
g = g.map(sns.distplot,df[df['RY']==2020]['value'].values, hist=False,rug=True))
However, when I ran the above command, it threw the following error:
KeyError: "None of [Index([(0.00227, 0.04217, 0.043930000000000004, 0.07877999999999999, 0.00137, 0.0018800000000000002, 0.00202, 0.00627, 0.00101, 0.07167000000000001, 0.01965, 0.02775, 0.00298, 0.00337, 0.00088, 0.04049, 0.01957, 0.01012, 0.12065, 0.23699, 0.03639, 0.00137, 0.03244, 0.00441, 0.06748, 0.00035, 0.0066099999999999996, 0.00302, 0.015619999999999998, 0.01571, 0.0018399999999999998, 0.03425, 0.08046, 0.01695, 0.02416, 0.08975, 0.0018800000000000002, 0.14743, 0.06366000000000001, 0.04378, 0.043, 0.02997, 0.0001, 0.22799, 0.00611, 0.13960999999999998, 0.38871, 0.018430000000000002, 0.053239999999999996, 0.06702999999999999, 0.14103, 0.022719999999999997, 0.011890000000000001, 0.00186, 0.00049, 0.13947, 0.0067, 0.00503, 0.00242, 0.00137, 0.00266, 0.38638, 0.24068, 0.0165, 0.54847, 1.02545, 0.01889, 0.32750999999999997, 0.22526, 0.24516, 0.12791, 0.00063, 0.0005200000000000001, 0.00921, 0.07665, 0.00116, 0.01042, 0.27046, 0.03501, 0.03159, 0.46748999999999996, 0.022090000000000002, 2.2972799999999998, 0.69021, 0.22529000000000002, 0.00147, 0.1102, 0.03234, 0.05799, 0.11744, 0.00896, 0.09556, 0.03202, 0.01347, 0.00923, 0.0034200000000000003, 0.041530000000000004, 0.04848, 0.00062, 0.0031100000000000004, ...)], dtype='object')] are in the [columns]"
I am not sure where I am making the mistake. Could anyone please help me fix the issue?
Set up the dataframe
import pandas as pd
import numpy as np
import seaborn as sns
# setup dataframe of synthetic data
np.random.seed(365)
data = {'RY': np.random.choice([2016, 2018, 2020], size=400),
'MAJ_CAT': np.random.choice(['Cause Unknown', 'Vegetation', 'Defective Equip'], size=400),
'Value': np.random.random(size=400) }
df = pd.DataFrame(data)
Updated Answer
From seaborn v0.11, use sns.displot with kind='kde' and rug=True.
sns.displot is a figure-level interface for drawing distribution plots onto a FacetGrid.
Plotting all 'MAJ_CAT' together
sns.displot(data=df, x='Value', hue='RY', kind='kde', palette='tab10', rug=True)
Plotting 'MAJ_CAT' separately
sns.displot(data=df, col='MAJ_CAT', x='Value', hue='RY', kind='kde', palette='tab10', rug=True)
Original Answer
In seaborn v0.11, distplot is deprecated
distplot
Consolidate the original code to generate the distplot
for year in df.RY.unique():
    values = df.Value[df.RY == year]
    sns.distplot(values, hist=False, rug=True)
facetgrid
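Properly configure the mapping: g.map expects the name of a column in df ('Value'), not an array of values, which is what caused the KeyError. Add hue to FacetGrid to separate the years.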
g = sns.FacetGrid(df, col='MAJ_CAT', hue='RY')
p1 = g.map(sns.distplot, 'Value', hist=False, rug=True).add_legend()
