I suppose this is fairly easy, but I tried for a while to get an answer without much success. I want to produce a stacked bar plot for two categories, but the information is split across two separate data frames.
This is the code:
import matplotlib.pyplot as plt

first_babies = live[live.birthord == 1]  # first dataframe
others = live[live.birthord != 1]  # second dataframe
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
first_babies.groupby(by=['prglength']).size().plot(
    kind='bar', ax=ax1, label='first babies')  # first plot
others.groupby(by=['prglength']).size().plot(
    kind='bar', ax=ax1, color='r', label='others')  # second plot, drawn over the first
ax1.legend(loc='best')
ax1.set_xlabel('weeks')
ax1.set_ylabel('frequency')
ax1.set_title('Histogram')
What I want instead is a stacked bar plot, so that the two categories are easier to tell apart.
I can't use stacked=True because it doesn't work across two separate plot calls, and I can't simply build a new dataframe because first_babies and others don't have the same number of elements.
Thanks
First create a new column to distinguish 'first_babies':
live['first_babies'] = live['birthord'].apply(lambda x: 'first_babies' if x == 1 else 'others')
You can unstack the groupby:
grouped = live.groupby(by=['prglength', 'first_babies']).size()  # counts per (week, group)
unstacked_count = grouped.unstack(fill_value=0)  # one column per group; missing combinations become 0
Now you can plot a stacked bar-plot directly:
unstacked_count.plot(kind='bar', stacked=True)
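A self-contained sketch of the same idea, using a small hypothetical `live` frame in place of the NSFG data (the column names `birthord` and `prglength` come from the question; the values are made up). It labels the rows with `numpy.where` instead of `apply`, which is equivalent here, and uses `unstack(fill_value=0)` so a week that occurs in only one group gets a zero-height segment rather than NaN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NSFG `live` table: birth order and pregnancy length
live = pd.DataFrame({
    "birthord":  [1, 2, 1, 3, 1, 2, 2, 1],
    "prglength": [39, 39, 40, 40, 41, 38, 39, 39],
})

# Label each row as 'first_babies' or 'others' in a single column
live["first_babies"] = np.where(live["birthord"] == 1, "first_babies", "others")

# Count rows per (week, group), then move the group level into the columns
unstacked_count = (
    live.groupby(["prglength", "first_babies"]).size()
        .unstack(fill_value=0)
)

# Each column becomes one stacked segment per week
ax = unstacked_count.plot(kind="bar", stacked=True)
ax.set_xlabel("weeks")
ax.set_ylabel("frequency")
```

Because both groups now live in one frame with a shared index, the bars line up week by week, which is what the two separate plot calls in the question could not guarantee.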
I created a hypothetical DataFrame containing 3 measurements for 20 experiments. Each experiment is associated with a Subject (3 possibilities).
import random
import numpy as np
import pandas as pd

random.seed(42)  # set seed
tuples = list(zip(range(20), random.choices(['Jean', 'Marc', 'Paul'], k=20)))  # index labels
index = pd.MultiIndex.from_tuples(tuples, names=['num_exp', 'Subject'])  # index
test = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), index=index, columns=['var1', 'var2', 'var3'])  # DataFrame
test.head()  # first lines
I succeeded in constructing stacked bar plots with the 3 measurements (each bar is an experiment) for each subject:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False) #plots
Now, I would like to put each plot (one per subject) in its own subplot. If I use the "subplots" argument, I get the following:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False,subplots= True) #plot with subplot
It created a subplot for each measurement, because the measurements are columns in my DataFrame.
I don't know how I could do otherwise because I need them as columns to create stacked bars.
So here is my question:
Is it possible to construct this kind of figure, with stacked bar plots in subplots (ideally in an elegant way, without iterating)?
Thanks in advance!
I solved my problem with a simple loop, using nothing other than pandas .plot().
Pandas .plot() has an ax parameter that takes a matplotlib Axes object.
So, starting from the list of distinct subjects :
subj= list(dict.fromkeys(test.index.get_level_values('Subject')))
I define my subplots:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, len(subj))
Then I iterate over the subplots:
for a in range(len(subj)):
    test.loc[test.index.get_level_values('Subject') == subj[a]].unstack(level=1).plot(ax=axs[a], kind='bar', stacked=True, legend=False, xlabel='', fontsize=10)  # plot
    axs[a].set_title(subj[a], pad=0, fontsize=15)  # title
    axs[a].tick_params(axis='y', pad=0, size=1)  # y ticks
And it works well!
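The same loop can also be driven by `groupby(level='Subject')`, which hands over each subject's rows directly and removes the need for the `subj` list and the boolean mask. A sketch, rebuilding the `test` frame from the question so it runs on its own; `droplevel('Subject')` leaves `num_exp` as the x-axis, so each bar is one experiment with its three measurements stacked. This is an alternative to the loop above, not the poster's code.

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
random.seed(42)
np.random.seed(42)
tuples = list(zip(range(20), random.choices(["Jean", "Marc", "Paul"], k=20)))
index = pd.MultiIndex.from_tuples(tuples, names=["num_exp", "Subject"])
test = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)),
                    index=index, columns=["var1", "var2", "var3"])

groups = test.groupby(level="Subject")
fig, axs = plt.subplots(1, groups.ngroups, figsize=(12, 4))

# One subplot per subject; `group` holds only that subject's experiments
for ax, (name, group) in zip(axs, groups):
    group.droplevel("Subject").plot(ax=ax, kind="bar", stacked=True,
                                    legend=False, fontsize=10)
    ax.set_title(name, pad=0, fontsize=15)
    ax.set_xlabel("")
```

The iteration is still there, but it runs over the groups pandas already knows about, so the number of subplots and their titles always match the data.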
I'm trying to visualize a data frame with a stacked bar chart, where the x-axis is websites, the y-axis is frequency, and the segments of each bar are the different groups using those websites.
This is the dataframe:
This is the plot created just by doing this:
web_data_roles.plot(kind='barh', stacked=True, figsize=(20,10))
As you can see, it's not what I want. I've tried changing the plot so that the axes match the different columns of the dataframe, but it just says "no numerical data to plot". I'm not sure how to go about this anymore, so all help is appreciated.
You need to organise your dataframe so that role is a column.
set_index(): initial preparation
unstack(): moves role out of the index and makes it a column
droplevel(): cleans up the multi-index columns
import io
import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(1, 1, figsize=[10, 5],
                       sharey=False, sharex=False, gridspec_kw={"hspace": 0.3})
df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))
# unstack(1) moves role into the columns; droplevel(0, axis=1) drops the leftover 'freq' level
df.set_index(["website", "role"]).unstack(1).droplevel(0, axis=1).plot(ax=ax, kind="barh", stacked=True)
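An alternative to the `set_index`/`unstack`/`droplevel` chain is `DataFrame.pivot`, which takes the row index, the column source, and the values in one call and produces flat columns directly. A sketch with the same example data:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd

df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))

# Websites become the rows (one bar each); roles become the stacked segments
wide = df.pivot(index="website", columns="role", values="freq")
ax = wide.plot(kind="barh", stacked=True)
```

`pivot` requires each (website, role) pair to appear only once; if the real data has repeated pairs, `pivot_table(..., aggfunc="sum")` aggregates them instead of raising.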
I have a pandas dataframe from which I'm trying to create two separate plots in the same function: one is an ordinary boxplot with jitter and the other is a violin plot.
I've tried saving them to two separate variables and then saving each of those to its own image file, but each file seems to contain an overlay of both plots rather than its own separate plot. Here's what the code looks like:
final_boxplot = sns.boxplot(data = df)
final_violin = sns.violinplot(data = df)
final_boxplot.figure.savefig('boxplot.png')
final_violin.figure.savefig('violin.png')
Any ideas on what I might be doing wrong, or any alternatives?
You should create a separate figure instance for each plot and save each one:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.boxplot(data=df, ax=ax)
fig.savefig('boxplot.png')
fig, ax = plt.subplots()
sns.violinplot(data=df, ax=ax)
fig.savefig('violin.png')
I have a pandas dataframe in which 4 categories (50, 60, 70, 80) are nested within two categories (positive, negative), and I would like to use seaborn's kdeplot to plot a column (e.g., A_mean...) grouped by both. What I want to achieve is a figure like the one I made by splitting the dataframe into a list. I went over several posts; this code (Multiple single plots in seaborn with pandas groupby data) works for one level, but not for two, e.g. if I want a separate curve for each Game_RS:
for i, group in df_hb_SLR.groupby('Condition'):
    sns.kdeplot(data=group['A_mean_per_subject'], shade=True, color='blue', label='label name')
I tried to use this one (Seaborn groupby pandas Series) but the first answer did not work for me:
sns.kdeplot(df_hb_SLR.A_mean_per_subject, groupby=df_hb_SLR.Game_RS)
AttributeError: 'Line2D' object has no property 'groupby'
and I was not able to make the pivot answer work.
Is there a direct way to do this in seaborn, or a better way directly from the pandas DataFrame?
My data are accessible in CSV format under this link: data. I load them as usual:
df_hb_SLR = pd.read_csv('data.csv')
Thank you for help.
Here is a solution using seaborn's FacetGrid, which makes this kind of thing really easy:
import seaborn as sns

g = sns.FacetGrid(data=df_hb_SLR, col="Condition", hue='Game_RS', height=5, aspect=0.5)
g = g.map(sns.kdeplot, 'A_mean_per_subject', fill=True)  # fill= replaces shade= in seaborn >= 0.11
g.add_legend()
The downside of FacetGrid is that it creates a new figure, so if you'd like to integrate those plots into a larger set of subplots, you can achieve the same result using groupby() and some looping:
import matplotlib.pyplot as plt
import seaborn as sns

group1 = "Condition"
N1 = df_hb_SLR[group1].nunique()
group2 = 'Game_RS'
target = 'A_mean_per_subject'
height = 5
aspect = 0.5
colour = ['gray', 'blue', 'green', 'darkorange']
# Match the FacetGrid layout: each panel is height*aspect wide and height tall
fig, axs = plt.subplots(1, N1, figsize=(N1*height*aspect, height), sharey=True)
for (group1Name, df1), ax in zip(df_hb_SLR.groupby(group1), axs):
    ax.set_title(group1Name)
    for (group2Name, df2), c in zip(df1.groupby(group2), colour):
        # fill=True is the seaborn >= 0.11 name for the older shade=True
        sns.kdeplot(df2[target], fill=True, label=group2Name, ax=ax, color=c)
    ax.legend(title=group2)
I have a list of many aggregated data frames with identical structure.
I would like to plot two columns from each dataframe on the same graph.
I used this code snippet but it gives me a separate plot for each dataframe:
# iterate through a list
for df in frames:
    df.plot(x='Time', y='G1', figsize=(16, 10))
    plt.hold(True)
plt.show()
If each frame is already labelled with an identifier (a sample column below), you can concatenate them all and plot everything at once without iterating.
import pandas as pd

# If not indexed, tag each frame with its position in the list:
# frames = [df.assign(sample=i) for i, df in enumerate(frames)]
df = pd.concat(frames).pivot(index='Time', columns='sample', values='G1')
df.plot(figsize=(16, 10));
Concatenating also keeps the data aligned on Time. Note that plt.hold was deprecated in matplotlib 2.0 and removed in 3.0, so the loop in the question no longer runs on current versions.
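A self-contained sketch of the concatenate-and-pivot approach, using three small hypothetical frames that share the question's `Time` and `G1` columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd

# Hypothetical frames with identical structure
frames = [
    pd.DataFrame({"Time": [0, 1, 2], "G1": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"Time": [0, 1, 2], "G1": [2.0, 1.5, 0.5]}),
    pd.DataFrame({"Time": [0, 1, 2], "G1": [0.5, 0.5, 1.0]}),
]

# Tag each frame with its position so rows stay distinguishable after concat
frames = [df.assign(sample=i) for i, df in enumerate(frames)]

# Long -> wide: one row per Time, one column (and so one line) per sample
wide = pd.concat(frames).pivot(index="Time", columns="sample", values="G1")
ax = wide.plot(figsize=(16, 10))
```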
As you noticed, pandas.DataFrame.plot is not affected by matplotlib's hold parameter because it creates a new figure every time. The way to get around this is to pass in the ax parameter explicitly. If ax is not None, it tells the DataFrame to plot on a specific set of axes instead of making a new figure on its own.
You can prepare a set of axes ahead of time, or use the return value of the first call to df.plot. I show the latter approach here:
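import matplotlib.pyplot as plt

ax = None
for df in frames:
    # ax=None on the first pass creates a figure; later passes reuse the returned Axes
    ax = df.plot(x='Time', y='G1', figsize=(16, 10), ax=ax)
plt.show()  # plt.hold is no longer needed (and was removed in matplotlib 3.0)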
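The other option mentioned above, preparing the axes ahead of time, looks like this in a minimal sketch with hypothetical frames; the `label` values are made up so the lines can be told apart in the legend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames with the question's Time and G1 columns
frames = [
    pd.DataFrame({"Time": [0, 1, 2], "G1": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"Time": [0, 1, 2], "G1": [2.0, 1.5, 0.5]}),
    pd.DataFrame({"Time": [0, 1, 2], "G1": [0.5, 0.5, 1.0]}),
]

# Create the figure once, then let every df.plot call draw into the same Axes
fig, ax = plt.subplots(figsize=(16, 10))
for i, df in enumerate(frames):
    df.plot(x="Time", y="G1", ax=ax, label=f"frame {i}")
```

Creating the Axes up front also makes it easy to put this plot into a larger subplot grid later.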