plotting two DataFrame.value_counts() in a single histogram - python

I want to plot in a single histogram two different dataframes (only one column from each).
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'Size': ['Big', 'Big', 'Normal','Big']}
df1 = pd.DataFrame(data=d1)
d2 = {'Size': ['Small','Normal','Normal','Normal', 'Small', 'Big', 'Big', 'Normal','Big']}
df2 = pd.DataFrame(data=d2)
#Plotting in one histogram
df1['Size'].value_counts().plot.bar(label = "df1")
df2['Size'].value_counts().plot.bar(label = "df2", alpha = 0.2,color='purple')
plt.legend(loc='upper right')
plt.show()
The issue is that the x-axis of the histogram is only correct for df2. For df1 there should be 3 values of 'Big' and 1 value of 'Normal':
histogram of df1 and df2.
I have tried multiple ways of generating the plot and this is the closest I got to what I want, which is both dataframes in the same histogram, with different colors.
Ideally they would be side by side, but I didn't manage to find out how, and 'stacked = False' doesn't work here.
Any help is welcome. Thanks!

You can reindex on explicit X-values:
x = ['Small', 'Normal', 'Big']
df1['Size'].value_counts().reindex(x).plot.bar(label = "df1")
df2['Size'].value_counts().reindex(x).plot.bar(label = "df2", alpha = 0.2,color='purple')
Output:
Another option:
(pd.concat({'df1': df1, 'df2': df2})['Size']
.groupby(level=0).value_counts()
.unstack(0)
.plot.bar()
)
Output:
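As a minimal sketch of why both answers line up the bars, the counts can be collected into one table with a shared, explicit index. It reuses the df1, df2 and x from the question and answer; the name counts and the use of fill_value=0 (so that a size missing from one frame shows as 0 rather than NaN) are my own assumptions, not part of the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({'Size': ['Big', 'Big', 'Normal', 'Big']})
df2 = pd.DataFrame({'Size': ['Small', 'Normal', 'Normal', 'Normal', 'Small',
                             'Big', 'Big', 'Normal', 'Big']})

x = ['Small', 'Normal', 'Big']  # explicit category order for the x-axis

# One column of counts per DataFrame, aligned on the same index.
# Categories missing from a frame become 0 instead of NaN.
counts = pd.DataFrame({
    'df1': df1['Size'].value_counts().reindex(x, fill_value=0),
    'df2': df2['Size'].value_counts().reindex(x, fill_value=0),
})
print(counts)

# counts.plot.bar(rot=0) would then draw the two columns as
# side-by-side bars per size (this step needs matplotlib).
```

Because both columns share one index, pandas draws one group of bars per size, which is the side-by-side layout asked for.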

You can also try Plotly, which produces interactive graphs: you can hover over the bars to see exact values and other information.
import plotly.graph_objects as go
classes = ['Small', 'Normal', 'Big']
# classes = df2['Size'].unique()  # better: derive the categories from the data
fig = go.Figure(data=[
    # reindex so each count lines up with its class; missing classes become 0
    go.Bar(name='df1', x=classes, y=df1['Size'].value_counts().reindex(classes, fill_value=0)),
    go.Bar(name='df2', x=classes, y=df2['Size'].value_counts().reindex(classes, fill_value=0))
])
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()
Output:

Related

Stacked Barplot with 3 categories in Plotly

I need help creating a barplot in Plotly; any help or suggestions will be much appreciated.
I have a dataframe with columns 'partner', 'weekday', 'time_of_day', 'country', and 'count'. I want to create a stacked bar plot where the x-axis is 'weekday'(0-6), the y-axis is 'count', and each bar is stacked by 'country'. Additionally, I want each 'weekday' to have 4 bins representing the 4 different values of 'time_of_day'(first_6hours,second_6hours,third_6hours,last_6hours). How can I achieve this using matplotlib, seaborn or plotly in Python?
Here are some rows of the data
Data
Here is current code of the barplot
import plotly.express as px
x_col = 'weekday'
y_col = 'count'
color_col = 'country'
hover_col = 'country'
title_col = 'partner'
for title in dataframe[title_col].unique():
    fig = px.bar(data_frame=dataframe[dataframe[title_col]==title], x=x_col, y=y_col, color=color_col, hover_name=hover_col,
                 labels={y_col: 'Count'}, facet_col='time_of_day', height=600)
    fig.update_layout(title=f'Distribution by {title_col} {title}')
    fig.show()
and one of the output is the following.
Barplot
Unfortunately it is not the result that I want: I want each weekday to have 4 bins (representing time_of_day).
If you have some idea please let me know, it can be in other libraries, as well.
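One possible approach, sketched here with matplotlib rather than Plotly, is to pivot the data to one row per (weekday, time_of_day) and draw four narrow stacked bars inside each weekday slot. The data below is invented for illustration (the country codes and counts are hypothetical, and only one partner is assumed); the column names follow the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the columns named in the question (single partner).
slots = ["first_6hours", "second_6hours", "third_6hours", "last_6hours"]
countries = ["DE", "FR", "US"]
rng = np.random.default_rng(0)
rows = [(wd, slot, c, int(rng.integers(1, 20)))
        for wd in range(7) for slot in slots for c in countries]
dataframe = pd.DataFrame(rows, columns=["weekday", "time_of_day", "country", "count"])

# One row per (weekday, time_of_day), one column per country.
table = dataframe.pivot_table(index=["weekday", "time_of_day"], columns="country",
                              values="count", aggfunc="sum", fill_value=0)

weekdays = sorted(dataframe["weekday"].unique())
base = np.arange(len(weekdays))
width = 0.8 / len(slots)  # four bars share one weekday slot

fig, ax = plt.subplots(figsize=(12, 5))
for i, slot in enumerate(slots):
    part = table.xs(slot, level="time_of_day").reindex(weekdays, fill_value=0)
    bottom = np.zeros(len(weekdays))
    for j, country in enumerate(table.columns):
        # Stack countries on top of each other within this time-of-day bar.
        ax.bar(base + i * width, part[country], width=width * 0.9, bottom=bottom,
               color=f"C{j}", label=country if i == 0 else "_nolegend_")
        bottom += part[country].to_numpy()

# Centre each weekday tick under its group of four bars.
ax.set_xticks(base + width * (len(slots) - 1) / 2)
ax.set_xticklabels(weekdays)
ax.set_xlabel("weekday")
ax.set_ylabel("count")
ax.legend(title="country")
```

Within each weekday the four bars appear in the order of the slots list. With several partners, the same drawing code could be run once per partner on a filtered frame, as the loop in the question does.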

Plot Bar Graph with different Parameters in X Axis

I have a DataFrame like the one below, with Actual and Predicted columns. I want to compare Actual vs. Predicted one-to-one in a bar plot. Each prediction has a confidence value, and the confidence for Actual defaults to 1. Each row should form a single bar group: the Actual and Predicted values go on the x-axis and the corresponding confidence score is the y value.
I am unable to get the expected plot because the x-axis values are not aligned or grouped per row; bars are grouped by their value instead.
  Actual Predicted  Confidence
0      A         A        0.90
1      B         C        0.30
2      C         C        0.60
3      D         D        0.75
Expected Bar plot.
Any hint would be appreciated. Please let me know if further details are required.
What I have tried so far.
df_actual = pd.DataFrame()
df_actual['Key']= df['Actual'].copy()
df_actual['Confidence'] = 1
df_actual['Identifier'] = 'Actual'
df_predicted=pd.DataFrame()
df_predicted = df[['Predicted', 'Confidence']]
df_predicted = df_predicted.rename(columns={'Predicted': 'Key'})
df_predicted['Identifier'] = 'Predicted'
df_combined = pd.concat([df_actual,df_predicted], ignore_index=True)
df_combined
fig = px.bar(df_combined, x="Key", y="Confidence", color='Identifier',
barmode='group', height=400)
fig.show()
I have found that adjusting the data first makes it easier to get the plot I want. I have used Seaborn, hope that is ok. Please see if this code works for you. I have considered that the df mentioned above is already available. I created df2 so that it aligns to what you had shown in the expected figure. Also, I used index as the X-axis column so that the order is maintained... Some adjustments to ensure xtick names align and the legend is outside as you wanted it.
Code
vals= []
conf = []
for x, y, z in zip(df.Actual, df.Predicted, df.Confidence):
    vals += [x, y]
    conf += [1, z]
df2 = pd.DataFrame({'Values': vals, 'Confidence':conf}).reset_index()
ax=sns.barplot(data = df2, x='index', y='Confidence', hue='Values',dodge=False)
ax.set_xticklabels(['Actual', 'Predicted']*4)
plt.legend(bbox_to_anchor=(1.0,1))
plt.show()
Plot
Update - grouping Actual and Predicted bars
Hi @Mohammed - As we have already used up hue, I don't think there is a way to do this easily with Seaborn. You would need to use matplotlib and adjust the bar positions, xtick positions, etc. Below is the code that will do this. You can change Set1 to another color map to change the colors. I have also added a black outline, as bars with the same color were blending into one another. Further, I had to rotate the x labels, as they were on top of one another. You can change it as per your requirements. Hope this helps...
vals = df[['Actual','Predicted']].melt(value_name='texts')['texts']
conf = [1]*4 + list(df.Confidence)
ident = ['Actual', 'Predicted']*4
df2 = pd.DataFrame({'Values': vals, 'Confidence':conf, 'Identifier':ident}).reset_index()
uvals, uind = np.unique(df2["Values"], return_inverse=True)
cmap = plt.get_cmap("Set1")  # plt.cm.get_cmap was removed in recent matplotlib releases
fig, ax=plt.subplots()
l = len(df2)
pos = np.arange(0,l) % (l//2) + (np.arange(0,l)//(l//2)-1)*0.4
ax.bar(pos, df2["Confidence"], width=0.4, align="edge", ec="k",color=cmap(uind) )
handles=[plt.Rectangle((0,0),1,1, color=cmap(i), ec="k") for i in range(len(uvals))]
ax.legend(handles=handles, labels=list(uvals), prop ={'size':10}, loc=9, ncol=8)
pos=pos+0.2
pos.sort()
ax.set_xticks(pos)
ax.set_xticklabels(df2["Identifier"][:l], rotation=45,ha='right', rotation_mode="anchor")
ax.set_ylim(0, 1.2)
plt.show()
Output plot
I updated @Redox's answer to get the exact output.
df_ = pd.DataFrame({'Labels': df.reset_index()[['Actual', 'Predicted', 'index']].values.ravel(),
'Confidence': np.array(list(zip(np.repeat(1, len(df)), df['Confidence'].values, np.repeat(0, len(df))))).ravel()})
df_.loc[df_['Labels'].astype(str).str.isdigit(), 'Labels'] = ''
plt.figure(figsize=(15, 6))
ax=sns.barplot(data = df_, x=df_.index, y='Confidence', hue='Labels',dodge=False, ci=None)
ax.set_xticklabels(['Actual', 'Predicted', '']*len(df))
plt.setp(ax.get_xticklabels(), rotation=90)
ax.tick_params(labelsize=14)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Output:
Removed the loop to improve performance.
Added blank bars so that it looks like a grouped chart.
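The answers above all work by reshaping the data so that every bar remembers which source row it came from. A minimal pandas-only sketch of that reshaping is shown here; the column name Row and the variable name long are my own choices for illustration, and df is rebuilt from the table in the question.

```python
import pandas as pd

df = pd.DataFrame({'Actual': list('ABCD'),
                   'Predicted': list('ACCD'),
                   'Confidence': [0.90, 0.30, 0.60, 0.75]})

# Long format: one record per bar, keeping the source row number in 'Row'.
long = (df.rename_axis('Row').reset_index()
          .melt(id_vars=['Row', 'Confidence'], value_vars=['Actual', 'Predicted'],
                var_name='Identifier', value_name='Key'))

# Actual bars always get confidence 1; Predicted bars keep their score.
long.loc[long['Identifier'] == 'Actual', 'Confidence'] = 1.0

long = long.sort_values(['Row', 'Identifier']).reset_index(drop=True)
print(long)
```

With a frame like this, grouping on Row instead of Key keeps each Actual/Predicted pair together. For example, passing x="Row", color="Identifier", barmode="group" and text="Key" to px.bar should give one pair of bars per row, although the bar colors then encode Actual/Predicted rather than the label.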

How to remove one xticks in a subplot (seaborn FacetGrid) while still setting order in the other?

I'm trying to remove one xtick (one column on a categorical axis) from a Seaborn FacetGrid plot. I'm also not sure how to keep the spacing between ticks the same across subplots.
A picture is worth a thousand words, see pictures below:
The ideal output would be:
# Testing code
df = pd.DataFrame()
df['Value'] = np.random.rand(100)
df['Cat1'] = np.random.choice(['yes', 'no', 'maybe'], 100)
df['Cat2'] = np.random.choice(['Blue', 'Red'], 100)
df = df[~((df.Cat1 == 'maybe') & (df.Cat2 == 'Red'))]
g = sns.catplot(
data=df,
x='Cat1',
y='Value',
col='Cat2',
col_order=['Red', 'Blue'],
order=['yes', 'no', 'maybe'],
sharex=False,
color='black')
If we don't set the order, it creates a plot like this:
which has two issues: the order is not set (we want that), and the spacing is awkward.
Any tips? Thanks!
The approach I came up with is to replace the 'maybe' in the tick label of the first subplot with a blank, and move the position of the second subplot to the left, taking the coordinates of the second and subtracting the offset value. For some reason, the height of the second subplot has changed, so I have manually added an adjustment value.
g = sns.catplot(
data=df,
x='Cat1',
y='Value',
col='Cat2',
col_order=['Red', 'Blue'],
order=['yes', 'no', 'maybe'],
sharex=False,
color='black')
labels = g.fig.axes[0].get_xticklabels()
print(labels)
g.fig.axes[0].set_xticklabels(['yes','no',''])
g_pos2 = g.fig.axes[1].get_position()
print(g_pos2)
offset = 0.095
g.fig.axes[1].set_position([g_pos2.x0 - offset, g_pos2.y0, g_pos2.x1 - g_pos2.x0, g_pos2.y1 - 0.11611])
print(g.fig.axes[1].get_position())
Actually I found a workaround that is more generic than the previous answer.
df = pd.DataFrame()
df['Value'] = np.random.rand(100)
df['Cat1'] = np.random.choice(['yes', 'no', 'maybe'], 100)
df['Cat2'] = np.random.choice(['Blue', 'Red'], 100)
cat_ordered = ['yes', 'no', 'maybe']
# Make Cat1 an ordered categorical (assign the whole column: df.loc[:, 'Cat1'] = ... may keep the object dtype in recent pandas)
df['Cat1'] = pd.Categorical(df['Cat1'], categories=cat_ordered, ordered=True)
# Reorder dataframe based on the categories (ordered)
df = df.sort_values(by='Cat1')
# Unset the categorical type so that the empty categories don't interfere in plot
df['Cat1'] = df['Cat1'].astype(str)
df = df[~((df.Cat1 == 'maybe') & (df.Cat2 == 'Red'))]
g = sns.FacetGrid(
data=df,
col='Cat2',
col_order=['Red', 'Blue'],
sharex=False, height=4, aspect=0.8,
# use this to squeeze plot to right dimensions
gridspec_kws={'width_ratios': [2, 3]} # easy to parameterize also
)
g.map_dataframe(
sns.swarmplot,
x='Cat1',
y='Value',
color='black', s=4
)
g.fig.subplots_adjust(wspace=0, hspace=0)
Result:
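The comment in the answer notes that width_ratios is easy to parameterize. One way to do that, sketched under the assumption that each facet should be as wide as the number of categories it actually contains, is to count the distinct Cat1 values per Cat2 group. The small frame below is a deterministic stand-in for the random test data.

```python
import pandas as pd

# Small deterministic stand-in for the random test data in the question:
# 'maybe' never occurs together with 'Red'.
df = pd.DataFrame({
    'Cat1': ['yes', 'no', 'maybe', 'yes', 'no', 'yes'],
    'Cat2': ['Blue', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
})
col_order = ['Red', 'Blue']

# Number of distinct x categories present in each facet, in column order.
width_ratios = df.groupby('Cat2')['Cat1'].nunique().reindex(col_order).tolist()
print(width_ratios)  # [2, 3]
```

The resulting list can be passed as gridspec_kws={'width_ratios': width_ratios} in place of the hard-coded [2, 3].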

Plotting Dataframe as a bar chart with each column on a separate y axis

I have a Dataframe like this:
I would like to find a simple way of presenting this in a matplotlib bar chart, with each column on a separate y-axis, as in the sketch below:
Use seaborn and use hue parameter:
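from matplotlib import pyplot
import seaborn
fig, ax = pyplot.subplots(figsize=(10, 10))
# assumes df has an 'index' column; call df.reset_index() first if it is the index
seaborn.barplot(
    data=df.melt(id_vars='index'),  # long format with columns 'index', 'variable', 'value'
    x='index',
    y='value',
    hue='variable',
    ax=ax,
)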
Let's try:
# sample data
df = pd.DataFrame({'sum_prices':[100,200,300],
'properties':[10,5, 4]}, index=list('abc'))
df.plot.bar(secondary_y=['properties'])
output:
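For more control than secondary_y gives, the same idea can be written directly in matplotlib with a twin axis. This is only a sketch using the sample data from the answer above; the bar width, colors and legend placement are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# sample data from the answer above
df = pd.DataFrame({'sum_prices': [100, 200, 300],
                   'properties': [10, 5, 4]}, index=list('abc'))

x = np.arange(len(df))
width = 0.4

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis

# Shift the two series left and right of each tick so the bars do not overlap.
ax_left.bar(x - width / 2, df['sum_prices'], width=width, color='C0', label='sum_prices')
ax_right.bar(x + width / 2, df['properties'], width=width, color='C1', label='properties')

ax_left.set_xticks(x)
ax_left.set_xticklabels(df.index)
ax_left.set_ylabel('sum_prices')
ax_right.set_ylabel('properties')

# Merge the legend entries of both axes into one legend.
h1, l1 = ax_left.get_legend_handles_labels()
h2, l2 = ax_right.get_legend_handles_labels()
ax_left.legend(h1 + h2, l1 + l2, loc='upper center')
```

Each column keeps its own scale, so the small properties values remain readable next to the much larger sum_prices values.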

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count but I don't get smooth bar and the count does not appear for the whole bar but for each case composing that bar
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do the binning yourself before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds data and annotations in the order in which the source is organized, we can simply attach the text to the last subcategory trace using fig.data[-1].text = sums. The only remaining challenge is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
'y1':[1, 4, 9, 16],
'y2':[1, 4, 9, 16],
'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# row sums: one total per x category
sums = df.sum(axis=1).tolist()
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
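For the data in the question, the "data munging" step may be as simple as aggregating to one row per bar before plotting, so that each bar carries exactly one count. The sketch below uses an invented stand-in for data2 (the values are hypothetical); only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for data2, using the column names from the question.
data2 = pd.DataFrame({
    'program':    ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'gender':     ['F', 'F', 'M', 'F', 'M', 'M', 'M'],
    'experience': [1, 2, 1, 3, 1, 2, 2],
})

# One row per (program, gender) pair, i.e. one row per bar in a grouped chart.
counts = data2.groupby(['program', 'gender']).size().reset_index(name='count')
print(counts)
```

Passing this frame to px.bar with x="program", y="count", color="gender", barmode="group" and text="count" should then show a single label per bar, since each bar is built from one row rather than from several stacked pieces.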
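If your first graph was made with the graph objects library, you can try:
# Use textposition='auto' for direct text
# go.Bar has no color or barmode arguments: add one trace per gender and set barmode on the layout
counts = val.iloc[:, 0]
fig = go.Figure(data=[
    go.Bar(name=str(gender), x=grp.index.get_level_values(0), y=grp.values,
           text=grp.values, textposition='auto')
    for gender, grp in counts.groupby(level=1)
])
fig.update_layout(barmode="group")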
