I have a dataframe, for example the following, where Material, A and B are all column headings:
Material A B
0 Iron 20.30000 5.040409
1 Antimony 0.09200 0.019933
2 Chromium 1.70000 0.237762
3 Copper 8.10000 2.522951
I want a 2x2 grid of subplots, one bar chart per row of the dataframe. The title of each of the 4 subplots should be the material. Each subplot should have two bars, one for A and one for B, each with its own colour. Finally, it would be nice to have a legend showing which colour represents A and which represents B.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib.style as style
fig, ax = plt.subplots(2,2)
#Enter for loop
I think a for loop would be the best way to do it, but I simply cannot figure out the loop. Thanks.
fig, axs = plt.subplots(2,2, constrained_layout=True)
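for ax,(idx,row) in zip(axs.flat, df.iterrows()):
    row[['A','B']].plot.bar(ax=ax, color=['C0','C1'])
    ax.set_title(row['Material'])  # subplot heading = material name
# zero-height proxy bars, used only to build the figure legend
proxy = ax.bar([0,0],[0,0], color=['C0','C1'])
fig.legend(proxy,['A','B'], bbox_to_anchor=(1,1), loc='upper right')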
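For readers who want to run the loop above end to end, here is a minimal self-contained sketch. It assumes the dataframe from the question is typed in by hand, and it draws the bars with plain `ax.bar` rather than the pandas plotting call, so the bar handles can be reused for the legend. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# data copied from the question
df = pd.DataFrame({
    "Material": ["Iron", "Antimony", "Chromium", "Copper"],
    "A": [20.3, 0.092, 1.7, 8.1],
    "B": [5.040409, 0.019933, 0.237762, 2.522951],
})

fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax, (_, row) in zip(axs.flat, df.iterrows()):
    # one bar for A and one for B, each with a fixed colour
    bars = ax.bar(["A", "B"], [row["A"], row["B"]], color=["C0", "C1"])
    ax.set_title(row["Material"])

# a single legend for the whole figure, built from the last pair of bars
fig.legend(handles=list(bars), labels=["A", "B"], loc="upper right")

titles = [ax.get_title() for ax in axs.flat]
```

Because the colours are the same in every subplot, one figure-level legend is enough.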
Note that the same result can be achieved using pandas only, but first you need to reshape your data:
df2 = df.set_index('Material').T
Material Iron Antimony Chromium Copper
A 20.300000 0.092000 1.700000 8.100000
B 5.040409 0.019933 0.237762 2.522951
df2.plot(kind='bar', subplots=True, layout=(2,2), legend=False, color=[['C0','C1']])
You can do it this way:
df = df.set_index('Material')
fig = plt.figure(figsize=(10,8))
for i, (name, row) in enumerate(df.iterrows()):
    ax = plt.subplot(2,2, i+1)
    ax.set_title(name)  # name is the Material index value
    df.iloc[i].plot.bar(color=['C0', 'C1'])
fig.legend(ax.bar([0,0],[0,0], color=['C0','C1']),['A','B'], loc=5)
Hi, I'm trying to plot a pointplot and a scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
When I plot, for some reason the points from the scatterplot are offset one ID position to the right on the x-axis. When I plot the scatterplot or the pointplot separately, each is in the correct ID position. Why would plotting them on the same axes cause the scatterplot to shift one position to the right?
Edit: Tried to make the ID column categorical, but that didn't work either.
Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
                   'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
              capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
I have a data frame that contains 4 columns of data. Each of these columns is a character variable containing 5 different values (i.e. column1 contains the values A, B, C, D or E; column2 contains the values EXCELLENT, VERY GOOD, GOOD, AVERAGE and POOR; columns 3 and 4 are similar).
I'm trying to get a separate bar chart for each of the columns by using the below for loop. Unfortunately, it only provides me with the bar chart for column 4. It does not provide the bar chart for the previous 3 columns. Not sure what I am doing wrong.
for i in categorical_attribs:
    CharacterVarDF[i].value_counts().plot(kind='bar')
Simply set up matplotlib subplots with the required number of rows and columns. Then, in a loop, assign each column's bar plot to its own ax:
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(figsize=(8,6), ncols=1, nrows=CharacterVarDF.shape[1])
for col, ax in zip(CharacterVarDF.columns, np.ravel(axes)):
    CharacterVarDF[col].value_counts().plot(kind='bar', ax=ax, rot=0, title=col)
To demonstrate with random data:
import numpy as np
import pandas as pd
from matplotlib import rc
import matplotlib.pyplot as plt
env_df = pd.DataFrame({
"planetary_boundaries": np.random.choice(
["ocean", "land", "biosphere", "atmosphere",
"climate", "soil", "ozone", "freshwater"], 50),
"species": np.random.choice(
["invertebrates", "vertebrates", "plants", "fungi & protists"], 50),
"tipping_points": np.random.choice(
["Arctic Sea Ice", "Greenland ice sheet", "West Antarctica ice sheet",
"Amazon Rainforest", "Boreal forest", "Indian Monsoon",
"Atlantic meridional overturning circulation",
"West African Monsoon", "Coral reef"], 50)
})
rc('font', **{'family' : 'Arial'})
fig, axes = plt.subplots(ncols=1, nrows=env_df.shape[1], figsize=(7,7))
for col, ax in zip(env_df.columns, np.ravel(axes)):
    env_df[col] = env_df[col].str.replace(" ", "\n")
    env_df[col].value_counts().plot(
        kind='bar', ax=ax, color='g', rot=0,
        title=col.replace("_", " ").title(),
    )
plt.tight_layout()
Hi all, I have the following groups of data:
sumcosts = df.groupby('AgeGroup').Costs.sum()
18-25 536295.37
25-35 1784085.88
35-45 2395250.62
45-55 5483060.33
55-65 11652094.30
65-75 9633490.63
75+ 5186867.32
Name: Costs, dtype: float64
countoftrips = df.groupby('AgeGroup').Booking.nunique()
18-25 139
25-35 398
35-45 379
45-55 738
55-65 1417
65-75 995
75+ 545
Name: Booking, dtype: int64
When trying to plot these, I have used the following:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
fig, ax1 = plt.subplots()
sns.barplot(data=sumcosts, palette="rocket", ax=ax1)
ax2 = ax1.twinx()
sns.lineplot(data=countoftrips, palette="rocket", ax=ax2)
The output is this:
The line section looks correct, but the bar chart has obviously stopped at the first age bracket. Any ideas on how to correct this? I tried to define x='Agegroup' and y='Costs' but then got errors, and this is the most progress I have made. Thanks very much!
Your barplot appears to be showing the sum of all costs, not just those of the 18-25 age group. This bar appears under the x-axis label for the 18-25 group only because of how the axis for the line plot is positioned, which makes it confusing.
I created a dummy data set of 1000 rows in a .csv to graph this example, but my values are different, so the plots will look visually different. Everything else will work the same for you.
Jupyter Notebook Setup:
(images added to reflect outputs)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
# Read in dataset 'df', showing the header
df = pd.read_csv('./data-raw.csv')
Assuming you have no NaN values in your data ... otherwise you can use dropna() to remove them.
# Check if there are any NaN values in the DataFrame
print('Number of NaN values in the columns of our DataFrame:\n', df.isnull().sum())
# Remove any rows that contain NaN values using dropna (as applicable)
df.dropna(axis=0, inplace=True)
Your sumcosts and countoftrips are not required for creating your plots, and I believe they are the cause of your plotting error for the bar graph. I've included them here, but I am not using them when creating the plot.
Plot Type:
It is also important to keep in mind that a bar plot shows only the mean (or another estimator, e.g. std) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
This is assuming you want to have the line and bar plot layered over each other, as in your example:
# This plot has both graphs on the axis you outlined in your code,
# I used the ci = None parameter to remove the confidence intervals to
# make the combined plot easier to read (optional)
fig, ax1 = plt.subplots()
sb.barplot(data = df, x = 'AgeGroup', y = 'Costs', ci = None,
ax = ax1, palette = 'rocket', order = ['18-25',
'25-35','35-45','45-55','55-65', '65-75', '75+']);
ax2 = ax1.twinx()
sb.lineplot(data = df, x = 'AgeGroup', y = 'Booking', ax = ax2, ci = None);
plt.xlabel('Age Group Ranges');
Here is an alternative you could try, also using subplot, but separating the two plots.
# Adjusting the plot size just to make it easier to read here:
plt.figure(figsize = [14, 4])
#Bar Chart on Left
plt.subplot(1, 2, 1) # 1 row, 2 cols, subplot 1
sb.barplot(data = df, x = 'AgeGroup', y = 'Costs', palette = 'rocket',
ci = 'sd', order = ['18-25', '25-35', '35-45',
'45-55','55-65', '65-75', '75+']);
plt.xlabel('Age Group Ranges')
# Line Chart on Right
plt.subplot(1, 2, 2) # 1 row, 2 cols, subplot 2
sb.lineplot(data = df, x = 'AgeGroup', y = 'Booking', ci = None)
plt.xlabel('Age Group Ranges')
Hope you find this helpful!
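If only the two aggregated Series from the question are available, one option is to plot them directly with matplotlib instead of going back to the raw rows. The sketch below is an assumption-laden illustration, not the answerer's method: it rebuilds sumcosts and countoftrips by hand from the numbers printed in the question and uses plain `ax.bar` and `ax.plot` on twin axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

age_groups = ["18-25", "25-35", "35-45", "45-55", "55-65", "65-75", "75+"]
# values copied from the groupby output shown in the question
sumcosts = pd.Series(
    [536295.37, 1784085.88, 2395250.62, 5483060.33,
     11652094.30, 9633490.63, 5186867.32],
    index=age_groups, name="Costs")
countoftrips = pd.Series(
    [139, 398, 379, 738, 1417, 995, 545],
    index=age_groups, name="Booking")

fig, ax1 = plt.subplots()
# bars: one per age group, heights are the summed costs
ax1.bar(sumcosts.index, sumcosts.values, color="C0")
ax1.set_xlabel("Age Group")
ax1.set_ylabel("Total costs")

# line on a second y-axis that shares the same x positions
ax2 = ax1.twinx()
ax2.plot(countoftrips.index, countoftrips.values, color="C1", marker="o")
ax2.set_ylabel("Unique bookings")
```

Passing the index as x and the values as y explicitly gives one bar per age group, which is what the original call with only `data=sumcosts` did not do.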
I have a dataframe which has a number of values per date (datetime field). These values are classified as U (users) and S (session) using a column Group. Seaborn is used to visualize two boxplots per date, where the hue is set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The S boxplot time series:
The combined boxplot time series using hue. Obviously it's not possible to see any information about the S group because of the scale of the Y axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig,ax= plt.subplots(figsize=(50,10))
# groups is the dataframe from the question (columns Day, n_data, Group)
tmpU = groups.copy()
tmpU.loc[tmpU['Group']!='U','n_data'] = np.nan
tmpS = groups.copy()
tmpS.loc[tmpS['Group']!='S','n_data'] = np.nan
ax=seaborn.boxplot(ax=ax,x='Day', y = 'n_data', hue='Group', data=tmpU, palette = 'PRGn', showfliers=False)
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2,x='Day', y = 'n_data', hue='Group', data=tmpS, palette = 'PRGn', showfliers=False)
handles,labels = ax.get_legend_handles_labels()
l= plt.legend(handles[0:2],labels[0:2],loc=1)
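for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)  # loop body is missing in the source; hiding every other tick label is an assumption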
The code above generates the following figure:
Which in this case turns out to be too dense to be published. Therefore I would adopt a visualization based on subplots, as Parfait suggested in his/her answer.
It wasn't an obvious solution to me so I would like to thank Parfait for his/her answer.
Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
    plt.subplot(2, 1, i+1)  # create the subplot first so the title lands on it
    plt.title('N_data of {}'.format(g[0]))
    seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
    plt.subplot(2, 1, i+1)
    tmp = df.copy()
    tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
    seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
                    palette="PRGn", showfliers=False)
So one option for a grouped box plot with two separate axes is to use hue_order=['value', np.nan] in your arguments to sns.boxplot:
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5 ,palette = customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2,x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
width=0.75,showmeans=True,meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"},linewidth=0.5, palette = customPalette)
ax.grid(alpha=0.5, which = 'major')
# requires: import matplotlib.patches as mpatches
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
ax2.legend(handles=[GW,WW], loc='upper right',prop={'size': 14}, fontsize=12)
ax.set_title("$XCO_2$ vs. $CO_2$",fontsize=18)
ax.set_xlabel('Longitude [\u00b0]',fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]',fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]',fontsize=14)
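The twin-axis idea from the answers above can also be sketched in plain matplotlib, without seaborn's hue handling. This is a swapped-in technique, not what the answers use: each group is drawn with `Axes.boxplot` at slightly offset positions, one group on the left y-axis and the other on a `twinx()` axis. The day labels and random data are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
days = ["2016-10-01", "2016-10-02", "2016-10-03"]  # hypothetical dates
u = [rng.uniform(0, 800, 20) for _ in days]  # U values, large scale
s = [rng.uniform(0, 200, 20) for _ in days]  # S values, small scale
pos = np.arange(len(days))

fig, ax = plt.subplots(figsize=(8, 4))
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

# shift the two groups left and right of each day so they sit side by side
bp_u = ax.boxplot(u, positions=pos - 0.2, widths=0.3, patch_artist=True,
                  showfliers=False, manage_ticks=False)
bp_s = ax2.boxplot(s, positions=pos + 0.2, widths=0.3, patch_artist=True,
                   showfliers=False, manage_ticks=False)
for box in bp_u["boxes"]:
    box.set_facecolor("C0")
for box in bp_s["boxes"]:
    box.set_facecolor("C1")

ax.set_xticks(pos)
ax.set_xticklabels(days)
ax.set_ylabel("n_data (U)")
ax2.set_ylabel("n_data (S)")
ax.legend([bp_u["boxes"][0], bp_s["boxes"][0]], ["U", "S"], loc="upper left")
```

Each group gets its own y-scale, so the smaller S values stay readable next to U.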
I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. What I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
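A minimal sketch of the indexing described above, assuming four small random dataframes (the names and data are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# four hypothetical dataframes sharing the same value scale
dfs = [pd.DataFrame(rng.random((10, 2)), columns=["A", "B"]) for _ in range(4)]

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True)
# axes is a 2x2 array; index it with [row, column] to pick a subplot
dfs[0].plot(ax=axes[0, 0])
dfs[1].plot(ax=axes[0, 1])
dfs[2].plot(ax=axes[1, 0])
dfs[3].plot(ax=axes[1, 1])
```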
You can see examples in the documentation demonstrating joris's answer. Also from the documentation: you can set subplots=True and layout=(rows, cols) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
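As a rough illustration of that call, assuming a made-up two-column dataframe: with subplots=True pandas draws each column on its own axes and returns the axes it created.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # hypothetical data

# one subplot per column, arranged in 1 row and 2 columns
axes = df.plot(subplots=True, layout=(1, 2), figsize=(8, 3))
n_axes = np.asarray(axes).size
```

Note that this splits one dataframe by column; it does not combine several dataframes into one figure.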
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
You can plot multiple subplots of multiple pandas data frames using matplotlib with the simple trick of making a list of all the data frames, then using a for loop to plot the subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
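# define number of rows and columns for subplots
nrow = 3  # assumed 3 x 2 grid; any nrow * ncol == len(df_list) works
ncol = 2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count = 0
for r in range(nrow):
    for c in range(ncol):
        df_list[count].plot(ax=axes[r, c])
        count += 1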
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol. You also need to make a list, df_list, of the data frames you want to plot.
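As an alternative to tracking a plot counter in nested loops, the axes array can be flattened and paired with the list of dataframes. A short sketch, assuming six random dataframes like the ones above and a 3 x 2 grid:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_list = [pd.DataFrame(rng.random((10, 2)) * 100, columns=["A", "B"])
           for _ in range(6)]

nrow, ncol = 3, 2  # assumed grid; nrow * ncol should cover len(df_list)
fig, axes = plt.subplots(nrow, ncol)

# axes.flat walks the grid row by row, so zip pairs each frame with one subplot
for frame, ax in zip(df_list, axes.flat):
    frame.plot(ax=ax)
```

zip stops at the shorter of the two sequences, so any spare subplots are simply left empty.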
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
df.A.plot() #no need to specify for first axis
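A slightly fuller sketch of the plt.gca() pattern, assuming a made-up dataframe with columns A and B:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # hypothetical data

fig = plt.figure()
plt.subplot(2, 1, 1)
df.A.plot()              # a Series plot draws on the current axes
plt.subplot(2, 1, 2)     # makes the second subplot the current axes
df.B.plot(ax=plt.gca())  # pass the current axes explicitly
```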
You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
ax = fig.add_subplot(222)
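The snippet above only creates the axes. A possible way to finish it, assuming two hypothetical dataframes df1 and df2, is to keep a separate name for each axes and hand it to the pandas plot call:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # hypothetical data
df2 = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # hypothetical data

fig = plt.figure()
ax1 = fig.add_subplot(221)  # 2x2 grid, position 1 (top left)
df1.plot(ax=ax1)
ax2 = fig.add_subplot(222)  # 2x2 grid, position 2 (top right)
df2.plot(ax=ax2)
```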
You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for ax in axes:
    ax.plot(x, y)
Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
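For the wide-to-long conversion mentioned in the notes above, here is a small sketch of pandas.DataFrame.melt. The wide dataframe and its column names are made up for illustration: each of the columns A and B becomes a value of 'cat', and their numbers go into 'y'.

```python
import pandas as pd

# hypothetical wide dataframe: one column per category
wide = pd.DataFrame({"x": [0.1, 0.2, 0.3],
                     "A": [1, 2, 3],
                     "B": [4, 5, 6]})

# keep x as the identifier; stack A and B into a (cat, y) pair of columns
long_df = wide.melt(id_vars="x", var_name="cat", value_name="y")
```

The result has one row per (x, cat) combination, which is the long form the plotting code expects.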
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math # math.ceil determines the correct number of subplot rows
# synthetic data
df_dict = dict()
for i in range(1, 7):
    np.random.seed(i) # for repeatable sample data
    data_length = 100
    data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
            'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
    df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figure and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
    sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
    sns.despine(top=True, right=True)
    ax.legend_.remove() # remove the individual plot legends
    ax.set_title(f'dataset = {k}', fontsize=11)
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframe from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result, however, it's less complicated to combine all the dataframes, and plot a figure-level plot with sns.relplot.
Building on @joris's response above, if you have already established a reference to the subplot, you can use that reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
df.plot.barh(ax=ax1, stacked=True)
Here is a working pandas subplot example, where modes is the column names of the dataframe.
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
    ax[i] = pivot_df.loc[:, modes[i]].plot.bar(figsize=(figure_size[0], figure_size[1]*len(modes)),
                                               ax=ax[i], title=modes[i], color=my_colors[i])
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
                   'B': np.random.randint(100,1000,10)})
# draw the same dataframe on each of the four subplots
for ax in axes.flatten():
    df.plot(ax=ax)