I created a hypothetical DataFrame containing 3 measurements for 20 experiments. Each experiment is associated with a Subject (3 possibilities).
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
random.seed(42) #set seed
tuples = list(zip(*[list(range(20)),random.choices(['Jean','Marc','Paul'], k = 20)]))#index labels
index=pd.MultiIndex.from_tuples(tuples, names=['num_exp','Subject'])#index
test= pd.DataFrame(np.random.randint(0,100,size=(20, 3)),index=index,columns=['var1','var2','var3']) #DataFrame
test.head() #first lines
I succeeded in constructing stacked bar plots with the 3 measurements (each bar is an experiment) for each subject:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False) #plots
Now, I would like to put each plot (for each subject) in a subplot. If I use the "subplots" argument, it gives me the following :
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False,subplots= True) #plot with subplot
It created a subplot for each measurement because they correspond to columns in my DataFrame.
I don't know how I could do otherwise because I need them as columns to create stacked bars.
So here is my question:
Is it possible to construct this kind of figure with stacked bar plots in subplots (ideally in an elegant way, without iterating)?
Thanks in advance!
I solved my problem with a simple loop, using nothing other than pandas .plot().
Pandas .plot() has an ax parameter that accepts a matplotlib axes object.
So, starting from the list of distinct subjects :
subj= list(dict.fromkeys(test.index.get_level_values('Subject')))
I define my subplots :
fig, axs = plt.subplots(1, len(subj))
Then, I have to iterate for each subplot :
for a in range(len(subj)):
    test.loc[test.index.get_level_values('Subject') == subj[a]].unstack(level=1).plot(
        ax=axs[a], kind='bar', stacked=True, legend=False, xlabel='', fontsize=10)  # plot
    axs[a].set_title(subj[a], pad=0, fontsize=15)  # title
    axs[a].tick_params(axis='y', pad=0, size=1)  # yticks
And it works well!
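For reference, the same idea can be written by iterating over the groupby object and the axes together. This is only a sketch under assumptions: it rebuilds demo data shaped like `test`, selects the non-interactive Agg backend so it runs in a script, and drops the Subject level with `droplevel` instead of `unstack`. The `sharey=True` and `figsize` settings are optional additions and not part of the answer above.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for scripted runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

random.seed(42)
np.random.seed(42)

# demo data shaped like `test` in the question
subjects = random.choices(['Jean', 'Marc', 'Paul'], k=20)
index = pd.MultiIndex.from_arrays([list(range(20)), subjects],
                                  names=['num_exp', 'Subject'])
test = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)),
                    index=index, columns=['var1', 'var2', 'var3'])

groups = test.groupby(level='Subject')
fig, axs = plt.subplots(1, groups.ngroups, figsize=(12, 4),
                        sharey=True, squeeze=False)

# one stacked bar chart per subject, each drawn on its own axes
for ax, (name, group) in zip(axs.ravel(), groups):
    group.droplevel('Subject').plot(kind='bar', stacked=True, legend=False, ax=ax)
    ax.set_title(name)
    ax.set_xlabel('')

fig.tight_layout()
```

A loop is still needed here, but each iteration only has to pick an axes and a group.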
I have two data frames (df1 and df2). Each has the same 10 variables with different values.
I created box plots of the variables in the data frames like so:
df1.boxplot()
df2.boxplot()
I get two graphs of 10 box plots next to each other for each variable. The actual output is the second graph, however, as obviously Python just runs the code in order.
Instead, I would either like these box plots to appear side by side OR, ideally, I would like 10 graphs (one for each variable) comparing each variable by data frame (e.g. one graph for the first variable with two box plots in it, one for each data frame). Is that possible using only the standard Python library, or do I have to use Matplotlib?
Thanks!
To get graphs, standard Python isn't enough. You'd need a graphical library such as matplotlib. Seaborn extends matplotlib to ease the creation of complex statistical plots. To work with Seaborn, the dataframes should be converted to long form (e.g. via pandas' melt) and then combined into one large dataframe.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# suppose df1 and df2 are dataframes, each with the same 10 columns
df1 = pd.DataFrame({i: np.random.randn(100).cumsum() for i in 'abcdefghij'})
df2 = pd.DataFrame({i: np.random.randn(150).cumsum() for i in 'abcdefghij'})
# pd.melt converts the dataframe to long form, pd.concat combines them
df = pd.concat({'df1': df1.melt(), 'df2': df2.melt()}, names=['source', 'old_index'])
# convert the source index to a column, and reset the old index
df = df.reset_index(level=0).reset_index(drop=True)
sns.boxplot(data=df, x='variable', y='value', hue='source', palette='turbo')
This creates boxes for each of the original columns, comparing the two dataframes:
Optionally, you could create multiple subplots with that same information:
sns.catplot(data=df, kind='box', col='variable', y='value', x='source',
            palette='turbo', height=3, aspect=0.5, col_wrap=5)
By default, the y-axes are shared. You can disable the sharing via sharey=False. Here is an example, which also removes the repeated x axes and creates a common legend:
g = sns.catplot(data=df, kind='box', col='variable', y='value', x='source', hue='source', dodge=False,
                palette='Reds', height=3, aspect=0.5, col_wrap=5, sharey=False)
g.set(xlabel='', xticks=[]) # remove x labels and ticks
g.add_legend()
PS: If you simply want to put two pandas boxplots next to each other, you can create a figure with two subplots, and pass the axes to pandas. (Note that pandas plotting is just an interface towards matplotlib.)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))
df1.boxplot(ax=ax1)
ax1.set_title('df1')
df2.boxplot(ax=ax2)
ax2.set_title('df2')
plt.tight_layout()
plt.show()
I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False, doesn't work as the bars appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account. The second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which supposedly already has a column 'is_weekend'. Creating the twinx ax beforehand makes it possible to write a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
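The remark about making the column explicitly categorical can be checked with pandas alone. The sketch below uses a small made-up series (not the ride data). It shows that a filtered subset of a categorical column still knows both categories, which, as far as I understand, is what lets seaborn reserve a bar position for each hue level even when only one level is present.

```python
import pandas as pd

# small made-up column, converted to a categorical with two known values
days = pd.Series(['weekday', 'weekend', 'weekday', 'weekend', 'weekday'])
days = days.astype(pd.CategoricalDtype(['weekday', 'weekend']))

# keep only the 'weekend' rows, as the loop above does for one of the axes
weekend_only = days[days == 'weekend']

# the subset contains a single value, but both categories are still registered
categories = list(weekend_only.cat.categories)
counts = weekend_only.value_counts()

print(categories)
print(counts)
```

With plain strings instead of a categorical dtype, the subset would carry no trace of the missing 'weekday' level.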
You can also set the x tick labels by hand with plt.xticks(ticks, labels).
I am trying to do EDA along with exploring the Matplotlib and Seaborn libraries.
The data_cat DataFrame has 4 columns and I want to create plots in a single row with 4 columns.
For that, I created a figure object with 4 axes objects.
fig, ax = plt.subplots(1,4, figsize = (16,4))
for i in range(len(data_cat.columns)):
    sns.catplot(x = data_cat.columns[i], kind = 'count', data = data_cat, ax= ax[i])
The output for it is a figure with the 4 plots (as required) but it is followed by 4 blank plots that I think are the extra figure objects generated by the sns.catplot function.
Your code does not work as intended because sns.catplot() is a figure level function, that is designed to create its own grid of subplots if desired. So if you want to set up the subplot grid directly in matplotlib, as you do with your first line, you should use the appropriate axes level function instead, in this case sns.countplot():
fig, ax = plt.subplots(1, 4, figsize = (16,4))
for i in range(4):
    sns.countplot(x = data_cat.columns[i], data = data_cat, ax= ax[i])
Alternatively, you could use pandas' df.melt() method to tidy up your dataset so that all the values from your four columns are in one column (say 'col_all'), and you have another column (say 'subplot') that identifies from which original column each value is. Then you can get all the subplots with one call:
sns.catplot(x='col_all', kind='count', col='subplot',
            data=data_cat.melt(var_name='subplot', value_name='col_all'))
I answered a related question here.
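The melt approach described above can be sketched end to end as follows. The four-column `data_cat` here is a hypothetical stand-in, the Agg backend is an assumption for running in a script, and `sharex=False` is an optional choice so that each subplot shows only its own categories.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for scripted runs
import pandas as pd
import seaborn as sns

# hypothetical stand-in for data_cat: four categorical columns
data_cat = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'F'],
    'smoker': ['no', 'no', 'yes', 'no'],
    'region': ['N', 'S', 'S', 'E'],
    'plan': ['a', 'b', 'a', 'a'],
})

# long form: 'subplot' holds the original column name, 'col_all' its value
tidy = data_cat.melt(var_name='subplot', value_name='col_all')

# one count plot per original column, created by a single figure-level call
g = sns.catplot(data=tidy, x='col_all', kind='count', col='subplot',
                sharex=False, height=3)
```

Because catplot builds its own figure, no plt.subplots() call is needed and no extra blank figures are created.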
I've been struggling to generate the frequency plot of 2 columns named "Country" and "Company" in my DataFrame and show them as 2 subplots. Here's what I've got.
Figure1 = plt.figure(1)
Subplot1 = Figure1.add_subplot(2,1,1)
and here I want to use the bar chart pd.value_counts(DataFrame['Country']).plot('barh')
to show as the first subplot.
The problem is, I can't just go: Subplot1.pd.value_counts(DataFrame['Country']).plot('barh') as Subplot1 has no attribute pd. Could anybody shed some light on this?
Thanks a million in advance for your tips,
R.
You don't have to create Figure and Axes objects separately, and you should probably avoid initial caps in variable names, to differentiate them from classes.
Here, you can use plt.subplots, which creates a Figure and a number of Axes and binds them together. Then, you can just pass the Axes objects to the plot method of pandas:
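from matplotlib import pyplot as plt
import pandas as pd
# df is assumed to be your DataFrame
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
# recent pandas versions require kind= as a keyword argument for Series.plot
df['Country'].value_counts().plot(kind='barh', ax=ax1)
df['Company'].value_counts().plot(kind='barh', ax=ax2)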
Pandas' plot method can take in a Matplotlib axes object and direct the resulting plot into that subplot.
# If you want two plots, one above the other.
nrows = 2
ncols = 1
# Here axes contains 2 objects representing the two subplots
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 4))
# Below, "my_data_frame" is the name of your Pandas dataframe.
# Change it accordingly for the code to work.
# Plot first subplot
# This counts the number of times each country appears and plots
# that as a bar chart in the first subplot, represented by axes[0].
my_data_frame['Country'].value_counts().plot(kind='barh', ax=axes[0])
# Plot second subplot
my_data_frame['Company'].value_counts().plot(kind='barh', ax=axes[1])
I suppose this is fairly easy, but I tried for a while to get an answer without much success. I want to produce a stacked bar plot for two categories, but I have this information in two separate data frames:
This is the code:
first_babies = live[live.birthord == 1] # first dataframe
others = live[live.birthord != 1] # second dataframe
fig = figure()
ax1 = fig.add_subplot(1,1,1)
first_babies.groupby(by=['prglength']).size().plot(
kind='bar', ax=ax1, label='first babies') # first plot
others.groupby(by=['prglength']).size().plot(kind='bar', ax=ax1, color='r',
label='others') #second plot
ax1.legend(loc='best')
ax1.set_xlabel('weeks')
ax1.set_ylabel('frequency')
ax1.set_title('Histogram')
But I want something like this or as I said, a stacked bar plot in order to better distinguish between categories:
I can't use stacked=True because it doesn't work with two different plots, and I can't create a new dataframe because first_babies and others don't have the same number of elements.
Thanks
First create a new column to distinguish 'first_babies':
live['first_babies'] = live['birthord'].apply(lambda x: 'first_babies' if x == 1 else 'others')
You can unstack the groupby:
grouped = live.groupby(by=['prglength', 'first_babies']).size()
unstacked_count = grouped.unstack()
Now you can plot a stacked bar-plot directly:
unstacked_count.plot(kind='bar', stacked=True)
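As a small worked example of these steps, here is a sketch on a hypothetical miniature of `live` (the values are made up, and the Agg backend is an assumption for running in a script). Passing `fill_value=0` to `unstack` is an optional addition, so that pregnancy lengths seen in only one category get a zero count instead of NaN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for scripted runs
import pandas as pd

# hypothetical miniature of `live`; the real data has many more rows
live = pd.DataFrame({
    'prglength': [38, 39, 39, 40, 40, 40, 41],
    'birthord': [1, 1, 2, 1, 2, 3, 2],
})

# label each row, then count rows per (prglength, label) pair
live['first_babies'] = live['birthord'].apply(
    lambda x: 'first_babies' if x == 1 else 'others')
counts = live.groupby(['prglength', 'first_babies']).size().unstack(fill_value=0)

# one bar per pregnancy length, with the two categories stacked
ax = counts.plot(kind='bar', stacked=True)
ax.set_xlabel('weeks')
ax.set_ylabel('frequency')
```

Since both categories come from the same dataframe, it does not matter that they have different numbers of rows.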