Python Stacked barchart with dataframe - python

I'm trying to visualize a data frame I have with a stacked barchart, where the x is websites, the y is frequency and then the groups on the barchart are different groups using them.
This is the dataframe:
This is the plot created just by doing this:
web_data_roles.plot(kind='barh', stacked=True, figsize=(20,10))
As you can see, it's not what I want. I've tried changing the plot so the axes match up to the different columns of the dataframe, but it just says "no numerical data to plot". I'm not sure how to go about this anymore, so all help is appreciated.

You need to reshape your dataframe so that each role becomes its own column:
set_index() for the initial preparation
unstack() to move role out of the index and into the columns
droplevel() to clean up the multi-index columns
import io
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(1, 1, figsize=[10, 5])
df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))
df.set_index(["website","role"]).unstack(1).droplevel(0,axis=1).plot(ax=ax, kind="barh", stacked=True)
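To check the reshaping step without plotting, the wide table can be printed directly. This is a minimal sketch that assumes the same illustrative data; using `pivot` as a shorthand for the set_index/unstack/droplevel chain is my own substitution, not part of the original answer.

```python
import io
import pandas as pd

# Same illustrative data as in the answer above
df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))

# One row per website, one column per role: the shape a stacked bar plot needs
wide = df.pivot(index="website", columns="role", values="freq")
print(wide)

# The table produced by the set_index/unstack/droplevel chain, for comparison
chained = df.set_index(["website", "role"]).unstack(1).droplevel(0, axis=1)
```

Either `wide` or `chained` can then be passed to `.plot(kind="barh", stacked=True)`.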

Related

How to put two Pandas box plots next to each other? Or group them by variable?

I have two data frames (df1 and df2). Each have the same 10 variables with different values.
I created box plots of the variables in the data frames like so:
df1.boxplot()
df2.boxplot()
Each call produces a graph of 10 box plots, one per variable. Only the second graph is actually shown, however, since Python simply runs the code in order.
Instead, I would either like these box plots to appear side by side OR ideally, I would like 10 graphs (one for each variable) comparing each variable by data frame (e.g. one graph for the first variable with two box plots in it, one for each data frame). Is that possible just using python library or do I have to use Matplotlib?
Thanks!
To get graphs, standard Python isn't enough. You'd need a graphical library such as matplotlib. Seaborn extends matplotlib to ease the creation of complex statistical plots. To work with Seaborn, the dataframes should be converted to long form (e.g. via pandas' melt) and then combined into one large dataframe.
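To make "long form" concrete, here is a minimal sketch on two tiny, made-up dataframes (the values and the name `long_df` are illustrative assumptions). It only shows what melt and concat produce and does not plot anything.

```python
import pandas as pd

# Two tiny, made-up frames with the same columns (illustrative values only)
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# melt() turns each frame into (variable, value) rows;
# the dict keys passed to concat() become the 'source' index level
long_df = pd.concat({"df1": df1.melt(), "df2": df2.melt()},
                    names=["source", "old_index"])

# move 'source' from the index into a column and discard the old row index
long_df = long_df.reset_index(level=0).reset_index(drop=True)
print(long_df)
```

Each row of `long_df` now records which dataframe and which original column a value came from, which is the layout seaborn expects for `x`, `y` and `hue`.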
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# suppose df1 and df2 are dataframes, each with the same 10 columns
df1 = pd.DataFrame({i: np.random.randn(100).cumsum() for i in 'abcdefghij'})
df2 = pd.DataFrame({i: np.random.randn(150).cumsum() for i in 'abcdefghij'})
# pd.melt converts the dataframe to long form, pd.concat combines them
df = pd.concat({'df1': df1.melt(), 'df2': df2.melt()}, names=['source', 'old_index'])
# convert the source index to a column, and reset the old index
df = df.reset_index(level=0).reset_index(drop=True)
sns.boxplot(data=df, x='variable', y='value', hue='source', palette='turbo')
This creates boxes for each of the original columns, comparing the two dataframes:
Optionally, you could create multiple subplots with that same information:
sns.catplot(data=df, kind='box', col='variable', y='value', x='source',
palette='turbo', height=3, aspect=0.5, col_wrap=5)
By default, the y-axes are shared. You can disable the sharing via sharey=False. Here is an example, which also removes the repeated x axes and creates a common legend:
g = sns.catplot(data=df, kind='box', col='variable', y='value', x='source', hue='source', dodge=False,
palette='Reds', height=3, aspect=0.5, col_wrap=5, sharey=False)
g.set(xlabel='', xticks=[]) # remove x labels and ticks
g.add_legend()
PS: If you simply want to put two pandas boxplots next to each other, you can create a figure with two subplots, and pass the axes to pandas. (Note that pandas plotting is just an interface towards matplotlib.)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))
df1.boxplot(ax=ax1)
ax1.set_title('df1')
df2.boxplot(ax=ax2)
ax2.set_title('df2')
plt.tight_layout()
plt.show()
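If seaborn is not available, the "one graph per variable" layout can also be sketched with matplotlib alone. The following is a rough sketch under the same assumption as above (two dataframes with the same 10 columns, here filled with random data); the 2x5 grid and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-ins for df1 and df2, each with the same 10 columns
rng = np.random.default_rng(0)
cols = list("abcdefghij")
df1 = pd.DataFrame(rng.standard_normal((100, 10)).cumsum(axis=0), columns=cols)
df2 = pd.DataFrame(rng.standard_normal((150, 10)).cumsum(axis=0), columns=cols)

fig, axs = plt.subplots(2, 5, figsize=(15, 6))
for ax, col in zip(axs.flat, cols):
    # two boxes per subplot: the same variable taken from each dataframe
    ax.boxplot([df1[col].to_numpy(), df2[col].to_numpy()])
    ax.set_xticklabels(["df1", "df2"])
    ax.set_title(col)
fig.tight_layout()
# call plt.show() to display the figure in an interactive session
```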

Stacked bar plot in subplots using pandas .plot()

I created a hypothetical DataFrame containing 3 measurements for 20 experiments. Each experiment is associated with a Subject (3 possibilities).
import random
import numpy as np
import pandas as pd
random.seed(42) #set seed
tuples = list(zip(*[list(range(20)),random.choices(['Jean','Marc','Paul'], k = 20)]))#index labels
index=pd.MultiIndex.from_tuples(tuples, names=['num_exp','Subject'])#index
test= pd.DataFrame(np.random.randint(0,100,size=(20, 3)),index=index,columns=['var1','var2','var3']) #DataFrame
test.head() #first lines
I succeeded in constructing stacked bar plots with the 3 measurements (each bar is an experiment) for each subject:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False) #plots
Now, I would like to put each plot (for each subject) in a subplot. If I use the "subplots" argument, it gives me the following :
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False,subplots= True) #plot with subplot
It created a subplot for each measurement because they correspond to columns in my DataFrame.
I don't know how I could do otherwise because I need them as columns to create stacked bars.
So here is my question :
Is it possible to construct this kind of figure with stacked bar plots in subplots (ideally in an elegant way, without iterating) ?
Thanks in advance !
I solved my problem with a simple loop without using anything else than pandas .plot()
Pandas .plot() has an ax parameter that accepts a matplotlib axes object.
So, starting from the list of distinct subjects :
subj= list(dict.fromkeys(test.index.get_level_values('Subject')))
I define my subplots :
fig, axs = plt.subplots(1, len(subj))
Then, I have to iterate for each subplot :
for a in range(len(subj)):
    test.loc[test.index.get_level_values('Subject') == subj[a]].unstack(level=1).plot(ax= axs[a], kind='bar', stacked=True,legend=False,xlabel='',fontsize=10) #Plot
    axs[a].set_title(subj[a],pad=0,fontsize=15) #title
    axs[a].tick_params(axis='y', pad=0,size=1) #yticks
And it works well!
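A slightly more compact variant of the same idea is to iterate over the groupby object and the axes together. This is only a sketch: `demo` is a small made-up frame with the same (num_exp, Subject) index layout as `test`, and dropping the Subject level before plotting is my own choice to keep the x tick labels short.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small made-up frame with the same (num_exp, Subject) index layout as `test`
rng = np.random.default_rng(42)
subjects = ["Jean", "Marc", "Paul", "Jean", "Paul", "Marc"]
index = pd.MultiIndex.from_arrays([list(range(6)), subjects],
                                  names=["num_exp", "Subject"])
demo = pd.DataFrame(rng.integers(0, 100, size=(6, 3)), index=index,
                    columns=["var1", "var2", "var3"])

groups = demo.groupby(level="Subject")
fig, axs = plt.subplots(1, groups.ngroups, sharey=True, figsize=(9, 3))
for ax, (name, sub) in zip(axs, groups):
    # drop the Subject level so the x axis shows only the experiment number
    sub.droplevel("Subject").plot(ax=ax, kind="bar", stacked=True, legend=False)
    ax.set_title(name)
```

This still loops, but it avoids building the list of subjects and the boolean mask by hand.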

plotting multiple columns of a pandas dataframe

I am new to Python and I have two columns in a dataframe that I want to plot against date:
plt.scatter(thing.date,thing.loc[:,['numbers','more_numbers']])
My intuition is that the above should work (because MATLAB allows this kind of thing), but it doesn't, and I'm not sure why.
Is there a way around this?
I'm hoping to plot these columns for a sequence of 4 dataframes on the same axes, so I'd like to use a command like the above so that I can colour the columns from each dataframe to make them distinctive.
Easiest is to do a loop:
fig, ax = plt.subplots()
for col in ['numbers', 'more_numbers']:
    ax.scatter(thing.date, thing[col], label=col)
    # or, via pandas (pass color=... to tell the columns apart):
    # thing.plot.scatter(x='date', y=col, label=col, ax=ax)
ax.legend()
plt.show()
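The same loop extends to several dataframes on one set of axes. The sketch below uses two small made-up frames (the names `frames` and `markers` and all values are illustrative assumptions) and gives each dataframe its own colour and each column its own marker.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the dataframes in the question (made-up values)
dates = pd.date_range("2021-01-01", periods=5, freq="D")
frames = {
    "first": pd.DataFrame({"date": dates, "numbers": [1, 2, 3, 4, 5],
                           "more_numbers": [2, 4, 6, 8, 10]}),
    "second": pd.DataFrame({"date": dates, "numbers": [5, 4, 3, 2, 1],
                            "more_numbers": [10, 8, 6, 4, 2]}),
}
markers = {"numbers": "o", "more_numbers": "x"}

fig, ax = plt.subplots()
for color, (name, frame) in zip(["tab:blue", "tab:orange"], frames.items()):
    for col, marker in markers.items():
        # one colour per dataframe, one marker per column
        ax.scatter(frame["date"], frame[col], color=color, marker=marker,
                   label=f"{name}: {col}")
ax.legend()
# call plt.show() to display the figure in an interactive session
```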

Seaborn plot pandas dataframe by multiple groupby

I have a pandas dataframe with four categories (50, 60, 70, 80) nested within two categories (positive, negative), and I would like to draw a seaborn kdeplot of a column (e.g. A_mean...) for each group. What I want to achieve is this (this was done by splitting the dataframe into a list). I went over several posts; this code (Multiple single plots in seaborn with pandas groupby data) works for one level of grouping but not for two, which I need if I want to plot this for each Game_RS:
for i, group in df_hb_SLR.groupby('Condition'):
    sns.kdeplot(data=group['A_mean_per_subject'], shade=True, color='blue', label = 'label name')
I tried to use this one (Seaborn groupby pandas Series) but the first answer did not work for me:
sns.kdeplot(df_hb_SLR.A_mean_per_subject, groupby=df_hb_SLR.Game_RS)
AttributeError: 'Line2D' object has no property 'groupby'
and the pivot answer I was not able to make work.
Is there a direct way from seaborn or any better way directly from pandas Dataframe?
My data are accessible in csv format under this link -- data and I load them as usual:
df_hb_SLR = pd.read_csv('data.csv')
Thank you for help.
Here is a solution using seaborn's FacetGrid, which makes this kind of thing really easy:
g = sns.FacetGrid(data=df_hb_SLR, col="Condition", hue='Game_RS', height=5, aspect=0.5)
g = g.map(sns.kdeplot, 'A_mean_per_subject', fill=True)  # fill= replaces the deprecated shade= (seaborn >= 0.11)
g.add_legend()
The downside of FacetGrid is that it creates a new figure, so if you'd like to integrate those plots into a larger ensemble of subplots, you could achieve the same result using groupby() and some looping:
group1 = "Condition"
N1 = len(df_hb_SLR[group1].unique())
group2 = 'Game_RS'
target = 'A_mean_per_subject'
height = 5
aspect = 0.5
colour = ['gray', 'blue', 'green', 'darkorange']
fig, axs = plt.subplots(1, N1, figsize=(N1*height*aspect, height), sharey=True)
for (group1Name, df1), ax in zip(df_hb_SLR.groupby(group1), axs):
    ax.set_title(group1Name)
    for (group2Name, df2), c in zip(df1.groupby(group2), colour):
        sns.kdeplot(df2[target], fill=True, label=group2Name, ax=ax, color=c)
    ax.legend()  # show the Game_RS labels on each subplot

Stacked bar plots from two different sources in Pandas

I suppose this is fairly easy, but I tried for a while to get an answer without much success. I want to produce a stacked bar plot for two categories, but I have this information in two separate data frames:
This is the code:
first_babies = live[live.birthord == 1] # first dataframe
others = live[live.birthord != 1] # second dataframe
fig = plt.figure()  # assumes: import matplotlib.pyplot as plt
ax1 = fig.add_subplot(1,1,1)
first_babies.groupby(by=['prglength']).size().plot(
kind='bar', ax=ax1, label='first babies') # first plot
others.groupby(by=['prglength']).size().plot(kind='bar', ax=ax1, color='r',
label='others') #second plot
ax1.legend(loc='best')
ax1.set_xlabel('weeks')
ax1.set_ylabel('frequency')
ax1.set_title('Histogram')
But I want something like this or as I said, a stacked bar plot in order to better distinguish between categories:
I can't use stacked=True because it doesn't work across two different plots, and I can't create a new dataframe because first_babies and others don't have the same number of elements.
Thanks
First create a new column to distinguish 'first_babies':
live['first_babies'] = live['birthord'].apply(lambda x: 'first_babies' if x==1 else 'others')
Then count the rows in each group and unstack the result:
grouped = live.groupby(by=['prglength', 'first_babies']).size()
unstacked_count = grouped.unstack(fill_value=0)
Now you can plot a stacked bar-plot directly:
unstacked_count.plot(kind='bar', stacked=True)
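The counting and unstacking steps can also be collapsed into a single call with `pd.crosstab`. This is a minimal sketch on a made-up stand-in for `live` (the values are illustrative only); crosstab is my own substitution for the groupby/unstack pair, not part of the original answer.

```python
import pandas as pd

# Made-up stand-in for `live` (illustrative values only)
live = pd.DataFrame({
    "prglength": [38, 39, 39, 40, 40, 40],
    "birthord":  [1, 1, 2, 1, 3, 2],
})
live["first_babies"] = live["birthord"].apply(
    lambda x: "first_babies" if x == 1 else "others")

# crosstab counts rows per (prglength, first_babies) pair; missing pairs become 0
counts = pd.crosstab(live["prglength"], live["first_babies"])
print(counts)
# counts.plot(kind="bar", stacked=True) would then draw the stacked bars
```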
