Plotting multiple stacked bar graph given a pandas dataframe in Python - python

There are a few things I would like to show in a bar chart, and I have no idea how to do it with basic matplotlib plotting. I have a dataframe, shown below, and would like to obtain the bar chart described below. The x-axis is based on the Type column of the dataframe; within a single bar, the different colors are based on the Name column, and the size of each segment is defined by the Count value. The colors of the different names need not be the same across different types, as long as the colors within a single bar are different.

You can use pivot and then plot. Note that pandas 2.0 and later require keyword arguments for pivot:
df.pivot(index='Type', columns='Name', values='Count').plot(kind='bar', stacked=True, color=['b', 'g', 'orange', 'm', 'r'])
Edit: To sort the columns by their values in row 'A':
df.pivot(index='Type', columns='Name', values='Count').sort_values(by='A', ascending=False, axis=1)\
.plot(kind='bar', stacked=True, color=['g', 'r', 'b', 'orange', 'm'])

If you want to change the size of the plot, use the figsize argument, e.g. figsize=(15, 5):
df.pivot(index='Type', columns='Name', values='Count').sort_values(by='A', ascending=False, axis=1)\
.plot(kind='bar', stacked=True, color=['g', 'r', 'b', 'orange', 'm'], figsize=(15, 5))
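
For readers who want to see what the pivot step produces, here is a minimal sketch. The question's dataframe is not reproduced on this page, so the values below are made up and only mimic the described Type/Name/Count layout; `plot_stacked` is a hypothetical helper name. Note that pivot raises a ValueError if a (Type, Name) pair occurs more than once; in that case `pivot_table` with `aggfunc="sum"` is a possible alternative.

```python
import pandas as pd

# Hypothetical data: the question's dataframe is not shown, so these
# values are invented to match the described Type/Name/Count layout.
df = pd.DataFrame({
    "Type":  ["A", "A", "A", "B", "B", "C"],
    "Name":  ["n1", "n2", "n3", "n1", "n3", "n2"],
    "Count": [5, 3, 2, 4, 6, 7],
})

# One row per Type, one column per Name; missing pairs become NaN.
table = df.pivot(index="Type", columns="Name", values="Count")
print(table)


def plot_stacked(table):
    """Draw one stacked bar per Type (requires matplotlib)."""
    return table.plot(kind="bar", stacked=True, figsize=(8, 4))
```

Calling `plot_stacked(table)` returns the matplotlib Axes; each row of the table becomes one bar and each column one colored segment, with NaN cells simply left out of the stack.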

Related

plotting two DataFrame.value_counts() in a single histogram

I want to plot in a single histogram two different dataframes (only one column from each).
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'Size': ['Big', 'Big', 'Normal','Big']}
df1 = pd.DataFrame(data=d1)
d2 = {'Size': ['Small','Normal','Normal','Normal', 'Small', 'Big', 'Big', 'Normal','Big']}
df2 = pd.DataFrame(data=d2)
#Plotting in one histogram
df1['Size'].value_counts().plot.bar(label = "df1")
df2['Size'].value_counts().plot.bar(label = "df2", alpha = 0.2,color='purple')
plt.legend(loc='upper right')
plt.show()
The issue is that the x-axis of the histogram is only correct for df2. For df1 there should be 3 values of 'Big' and 1 value of 'Normal':
histogram of df1 and df2.
I have tried multiple ways of generating the plot and this is the closest I got to what I want, which is both dataframes in the same histogram, with different colors.
Ideally they would be side by side, but I couldn't find out how, and 'stacked = False' doesn't work here.
Any help is welcome. Thanks!
You can reindex on explicit X-values:
x = ['Small', 'Normal', 'Big']
df1['Size'].value_counts().reindex(x).plot.bar(label = "df1")
df2['Size'].value_counts().reindex(x).plot.bar(label = "df2", alpha = 0.2,color='purple')
Output:
Another option:
(pd.concat({'df1': df1, 'df2': df2})['Size']
.groupby(level=0).value_counts()
.unstack(0)
.plot.bar()
)
Output:
You can also try plotly, which produces interactive graphs: you can hover over the bars to see exact data values and other information.
import plotly.graph_objects as go
classes = ['Small', 'Normal', 'Big']
# or classes = df2['Size'].unique() to derive the labels from the data
# reindex so that each count lines up with its label in classes
fig = go.Figure(data=[
go.Bar(name='df1', x=classes, y=df1['Size'].value_counts().reindex(classes)),
go.Bar(name='df2', x=classes, y=df2['Size'].value_counts().reindex(classes))
])
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()
Output:
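
To see why the two overlaid bar plots in the question end up mislabeled, and why reindexing helps, here is a small sketch using the question's own data. A pandas bar plot places bars at positions 0, 1, 2, ... in the order of the index, and value_counts orders that index by frequency, so the two series put different labels at the same position. The names `labels` and `counts` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Size": ["Big", "Big", "Normal", "Big"]})
df2 = pd.DataFrame({"Size": ["Small", "Normal", "Normal", "Normal",
                             "Small", "Big", "Big", "Normal", "Big"]})

# value_counts sorts labels by frequency, so the two results disagree
# on which label sits at bar position 0, 1, 2.
print(df1["Size"].value_counts().index.tolist())
print(df2["Size"].value_counts().index.tolist())

# Reindexing both onto one explicit label list lines the counts up;
# fill_value=0 covers labels that a frame does not contain at all.
labels = ["Small", "Normal", "Big"]
counts = pd.DataFrame({
    "df1": df1["Size"].value_counts().reindex(labels, fill_value=0),
    "df2": df2["Size"].value_counts().reindex(labels, fill_value=0),
})
print(counts)
```

With the counts in a single frame like this, `counts.plot.bar()` draws the two columns as side-by-side bars for each label.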

Seaborn distplot remove empty plots

I have a dataframe with 4 columns. I want to plot it with sns.displot as follows:
g = sns.displot(dataframe, height = 25, kind = "kde", x = "value", fill = True, hue = "Testset", col = "Session", row = "Timepoint")
It produces the following plot with empty subplots, because I don't have all the combinations of values. Is there a way to remove the empty plots and stack the remaining ones under one another?
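
One possible approach, offered only as a sketch: combine Session and Timepoint into a single facet key, so that a panel exists only for combinations that actually occur in the data. The question's dataframe is not shown, so the data below is hypothetical, and the column name `facet` is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's dataframe, which is not shown.
dataframe = pd.DataFrame({
    "value":     [0.1, 0.4, 0.3, 0.9, 0.7, 0.2],
    "Testset":   ["t1", "t2", "t1", "t2", "t1", "t2"],
    "Session":   ["s1", "s1", "s1", "s1", "s2", "s2"],
    "Timepoint": ["pre", "pre", "post", "post", "post", "post"],
})

# A single facet key only takes values for combinations present in the
# data, so nothing is generated for the missing (s2, pre) combination.
dataframe["facet"] = dataframe["Session"] + " / " + dataframe["Timepoint"]
print(dataframe["facet"].unique())
```

Passing `col="facet", col_wrap=1` to sns.displot in place of the separate `col` and `row` arguments should then lay the non-empty panels out in a single column.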

Facetgrid to plot stacked normalised counts - Seaborn

I'm aiming to use a Seaborn FacetGrid to plot counts of values, but normalised rather than pure counts. Using the code below, each row should display one unique value of Item. The x-axis should display Num and the values come from Label.
However, each row isn't being partitioned. The same data is displayed for each Item.
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
'Num' : [1,2,1,2,3,2,1,3,2],
'Label' : ['A','B','C','B','A','C','C','A','B'],
'Item' : ['Up','Left','Up','Left','Down','Right','Up','Down','Right'],
})
g = sns.FacetGrid(df,
row = 'Item',
row_order = ['Up','Right','Down','Left'],
aspect = 2,
height = 4,
sharex = True,
legend_out = True
)
g.map(sns.histplot, x = 'Num', hue = 'Label', data = df, multiple = 'fill', shrink=.8)
g.add_legend()
Maybe you can try g.map_dataframe(sns.histplot, x='Num', hue='Label', multiple='fill', shrink=.8). I'm no seaborn expert; I just looked it up at https://seaborn.pydata.org/generated/seaborn.FacetGrid.html, and map_dataframe seems to work better than map here.
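
The likely reason for the difference: in the question, data=df is passed as a keyword to g.map, so every facet receives the full dataframe, whereas map_dataframe hands each facet only its own subset. As a rough cross-check of what the facets should then show, the normalised shares can be computed directly in pandas. This is a sketch, not part of the original answer, and `shares` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Num":   [1, 2, 1, 2, 3, 2, 1, 3, 2],
    "Label": ["A", "B", "C", "B", "A", "C", "C", "A", "B"],
    "Item":  ["Up", "Left", "Up", "Left", "Down", "Right", "Up", "Down", "Right"],
})

# Share of each Label within every (Item, Num) group; each row sums to 1.
shares = pd.crosstab([df["Item"], df["Num"]], df["Label"], normalize="index")
print(shares)
```

Each Item corresponds to one facet row, and within it each Num value corresponds to one filled bar split according to these shares.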

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count, but I don't get smooth bars, and the count appears not for the whole bar but for each case composing that bar.
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do the binning before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds traces and annotations in the order in which the source is organized, we can simply set the text on the last subcategory applied using fig.data[-1].text = sums. The only remaining challenge is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# row sums: one total per x value
sums = df.sum(axis=1).tolist()
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
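
As a cross-check on the totals used for the labels, they can also be computed from the long-format frame after the melt. This is only a sketch of the data-munging step; `long_df` and `totals` are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "b", "c", "d"],
                   "y1": [1, 4, 9, 16],
                   "y2": [1, 4, 9, 16],
                   "y3": [6, 8, 4.5, 8]})

# Wide to long, as px.bar expects: one row per (x, variable) pair.
long_df = pd.melt(df, id_vars=["x"], value_vars=["y1", "y2", "y3"])

# Total height of each stacked bar, in the order the x values first appear.
totals = long_df.groupby("x", sort=False)["value"].sum()
print(totals.tolist())
```

Because the last trace in fig.data is drawn on top of the stack, attaching these totals to it places a single label per bar.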
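If your first graph was built with the graph objects library, you can try:
# Use textposition='auto' for direct text
# go.Bar accepts neither color nor barmode: use one trace per gender
# and set barmode on the layout instead
counts = val.iloc[:, 0]
fig = go.Figure(data=[
go.Bar(name=str(g), x=s.index.get_level_values(0), y=s, text=s, textposition='auto')
for g, s in counts.groupby(level=1)
])
fig.update_layout(barmode="group")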

How to set y-axes limits in countplot?

df in my program is a dataframe with these columns:
df.columns
'''output : Index(['lat', 'lng', 'desc', 'zip', 'title', 'timeStamp', 'twp', 'addr', 'e',
'reason'],
dtype='object')'''
When I execute this piece of code:
sns.countplot(x = df['reason'], data=df)
# output is the plot below
But if I slightly tweak my code like this:
p = df['reason'].value_counts()
k = pd.DataFrame({'causes':p.index,'freq':p.values})
sns.countplot(x = k['causes'], data = k)
So essentially I stored the 'reason' column's values and their frequencies as a series in p and then converted them to another dataframe k, but this new countplot doesn't have the right Y-axis range for the given values.
My questions are:
Can we set the Y-axis of the second countplot to appropriate limits?
Why does the second countplot differ from the first one when I just separated the specific column I wanted to graph and plotted it separately?
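
A short sketch of what is probably going on: countplot counts the rows belonging to each category, and in k every cause occupies exactly one row, so every bar gets height 1 whatever the freq column holds. The question's data is not shown, so the 'reason' values below are hypothetical.

```python
import pandas as pd

# Hypothetical 'reason' values; the question's data is not shown.
df = pd.DataFrame({"reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "Traffic"]})

p = df["reason"].value_counts()
k = pd.DataFrame({"causes": p.index, "freq": p.values})

# countplot counts rows per category. In k each cause appears in exactly
# one row, so each bar would have height 1 regardless of freq.
rows_per_cause = k["causes"].value_counts()
print(rows_per_cause.tolist())
print(k)
```

For counts that are already aggregated, a bar plot of the stored values, such as sns.barplot(x="causes", y="freq", data=k), is the closer match. The Y-axis range can be set on the returned Axes with set_ylim, for example ax.set_ylim(0, k["freq"].max()).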
