Stacked Barplot with 3 categories in Plotly - python

I need help creating a bar plot in Plotly; any help or suggestions will be much appreciated.
I have a dataframe with columns 'partner', 'weekday', 'time_of_day', 'country', and 'count'. I want to create a stacked bar plot where the x-axis is 'weekday' (0-6), the y-axis is 'count', and each bar is stacked by 'country'. Additionally, I want each 'weekday' to have 4 bins representing the 4 different values of 'time_of_day' (first_6hours, second_6hours, third_6hours, last_6hours). How can I achieve this using matplotlib, seaborn or plotly in Python?
Here are some rows of the data
Data
Here is the current code for the bar plot
import plotly.express as px
x_col = 'weekday'
y_col = 'count'
color_col = 'country'
hover_col = 'country'
title_col = 'partner'
for title in dataframe[title_col].unique():
    fig = px.bar(data_frame=dataframe[dataframe[title_col]==title], x=x_col, y=y_col, color=color_col, hover_name=hover_col,
                 labels={y_col: 'Count'}, facet_col='time_of_day', height=600)
    fig.update_layout(title=f'Distribution by {title_col} {title}')
    fig.show()
and one of the outputs is the following.
Barplot
Unfortunately, this is not the result I want: I want each weekday to have 4 bins (one per time_of_day value).
If you have any ideas, please let me know; solutions in other libraries are welcome as well.

Related

Manually set color legend for Plotly line plot [duplicate]

This question already has an answer here:
Plotly: How to define colors in a figure using Plotly Graph Objects and Plotly Express?
(1 answer)
Closed 15 days ago.
I would like to fix colors for my legend in a plotly line graph. Basically, I have a dataframe df in which there is a column Sex. I already know that my dataframe will hold only 3 possible values in that column.
So I would like to fix the colors, i.e. green for Male, blue for Female and yellow for Other. This should also hold when one category (e.g. Male) has no occurrences in the df.
Currently, my code auto-assigns colors. For example, if df contains all categories, it sets blue for Male, yellow for Female and green for Other. But when df only holds rows for Male and Female, the color assignment changes, so consistency is lost.
Code is as follows:
df = pd.DataFrame(...)
lineplot = px.line(
    df,
    x="Date",
    y='Ct',
    color='Sex',
    title=f"Lineplot to keep track of people across time"
)
lineplot.show()
You can define the colours individually using the color_discrete_map argument (Ref). Each series then keeps the colour you set manually, even when another series is missing:
import plotly.express as px
import pandas as pd
df = pd.DataFrame(dict(
    Date=[1,2,3],
    Male = [1,2,3],
    Female = [2,3,1],
    Others = [7,5,2]
))
fig = px.line(df, x="Date", y=["Male",
                               # "Female",  # commented out to test that the other colours stay fixed
                               "Others"],
              color_discrete_map={
                  "Male": "#456987",
                  "Female": "#147852",
                  "Others": "#00D",
              })
fig.show()
output:

plotting two DataFrame.value_counts() in a single histogram

I want to plot in a single histogram two different dataframes (only one column from each).
import pandas as pd
import matplotlib.pyplot as plt
d1 = {'Size': ['Big', 'Big', 'Normal','Big']}
df1 = pd.DataFrame(data=d1)
d2 = {'Size': ['Small','Normal','Normal','Normal', 'Small', 'Big', 'Big', 'Normal','Big']}
df2 = pd.DataFrame(data=d2)
#Plotting in one histogram
df1['Size'].value_counts().plot.bar(label = "df1")
df2['Size'].value_counts().plot.bar(label = "df2", alpha = 0.2,color='purple')
plt.legend(loc='upper right')
plt.show()
The issue is that the x-axis of the histogram is only correct for df2. For df1 there should be 3 values of 'Big' and 1 value of 'Normal':
histogram of df1 and df2.
I have tried multiple ways of generating the plot and this is the closest I got to what I want, which is both dataframes in the same histogram, with different colors.
Ideally they would be side by side, but I couldn't find out how, and 'stacked = False' doesn't work here.
Any help is welcome. Thanks!
You can reindex on explicit X-values:
x = ['Small', 'Normal', 'Big']
df1['Size'].value_counts().reindex(x).plot.bar(label = "df1")
df2['Size'].value_counts().reindex(x).plot.bar(label = "df2", alpha = 0.2,color='purple')
Output:
Another option:
(pd.concat({'df1': df1, 'df2': df2})['Size']
    .groupby(level=0).value_counts()
    .unstack(0)
    .plot.bar()
)
Output:
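To see why the explicit x-values matter: value_counts() sorts each result by frequency, so the two Series list the categories in different orders (and df1 has no 'Small' at all), and bars drawn by position then end up under the wrong labels. A minimal sketch that aligns both counts on one shared index, using the question's data; fill_value=0 is my addition so that missing categories show up as 0 rather than NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"Size": ["Big", "Big", "Normal", "Big"]})
df2 = pd.DataFrame({"Size": ["Small", "Normal", "Normal", "Normal", "Small",
                             "Big", "Big", "Normal", "Big"]})

sizes = ["Small", "Normal", "Big"]

# Align both counts on the same ordered index; missing categories become 0.
counts = pd.DataFrame({
    "df1": df1["Size"].value_counts().reindex(sizes, fill_value=0),
    "df2": df2["Size"].value_counts().reindex(sizes, fill_value=0),
})
print(counts)
# counts.plot.bar() would then draw df1 and df2 as side-by-side bars per size.
```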
You can also try plotly, which produces interactive graphs: you can hover over the plot and see the exact data values and other information.
import plotly.graph_objects as go
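classes=['Small', 'Normal', 'Big']
#classes=df2.Size.unique() (better to use this)
# reindex so that each count lines up with its label in classes
fig = go.Figure(data=[
    go.Bar(name='df1', x=classes, y=df1['Size'].value_counts().reindex(classes, fill_value=0)),
    go.Bar(name='df2', x=classes, y=df2['Size'].value_counts().reindex(classes, fill_value=0))
])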
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()
Output:

how to make stacked plots for dataframe with multiple index in python?

I have trade export data which is collected weekly. I want to make a stacked bar plot with matplotlib, but I am having difficulty handling a pandas dataframe with a multi-level index. I looked into this post but could not get what I expected. Can anyone suggest a way of doing this in Python? I suspect my data aggregation is wrong, and I think I could use a for loop to iterate over the years and make a stacked bar plot on a weekly basis. Does anyone know an easier way to do this in matplotlib?
reproducible data and my attempt
import pandas as pd
import matplotlib.pyplot as plt
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/0eb9d60374c8a0c17449eef4583705d7/raw/edea1777466284f2958ffac6cafb86683e08a65e/mydata.csv'
df = pd.read_csv(url, parse_dates=['weekly'])
df.drop('Unnamed: 0', axis=1, inplace=True)
nn = df.set_index(['year','week'])
nn.drop("weekly", axis=1, inplace=True)
f, a = plt.subplots(3,1)
nn.xs('2018').plot(kind='bar',ax=a[0])
nn.xs('2019').plot(kind='bar',ax=a[1])
nn.xs('2020').plot(kind='bar',ax=a[2])
plt.show()
plt.close()
This attempt didn't work for me. Instead of explicitly selecting years like 2018, 2019, ..., is there a more efficient way to make stacked bar plots for a dataframe with a multi-level index? Any thoughts?
desired output
this is the desired stacked bar plot for year of 2018 as an example
how should I get my desired stacked bar plot? Any better ideas?
Try this:
nn.groupby(level=0).plot.bar(stacked=True)
or to prevent year as tuple in x axis:
for n, g in nn.groupby(level=0):
    g.loc[n].plot.bar(stacked=True)
Update per request in comments
for n, g in nn.groupby(level=0):
    ax = g.loc[n].plot.bar(stacked=True, title=f'{n} Year', figsize=(8,5))
    ax.legend(loc='lower center')
Change layout position
fig, ax = plt.subplots(1,3)
axi = iter(ax)
for n, g in nn.groupby(level=0):
    axs = next(axi)
    g.loc[n].plot.bar(stacked=True, title=f'{n}', figsize=(15,8), ax=axs)
    axs.legend(loc='lower center')
Try using loc instead of xs:
f, a = plt.subplots(3,1)
for x, ax in zip(nn.index.unique('year'),a.ravel()):
    nn.loc[x].plot.bar(stacked=True, ax=ax)
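The answers above depend on the CSV from the question, so here is a self-contained sketch of the same loop on made-up data. The beef and pork columns and all values are hypothetical stand-ins for the real export columns, and the Agg backend is only chosen so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the question's data: two years, three weeks, two products.
df = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019, 2019],
    "week": [1, 2, 3, 1, 2, 3],
    "beef": [10, 12, 9, 11, 14, 8],
    "pork": [5, 7, 6, 4, 9, 7],
})
nn = df.set_index(["year", "week"])

# One subplot per year, taken from the index instead of hard-coded.
years = nn.index.unique("year")
fig, axes = plt.subplots(len(years), 1, figsize=(8, 3 * len(years)), squeeze=False)
for year, ax in zip(years, axes.ravel()):
    # nn.loc[year] drops the year level, leaving week on the x-axis.
    nn.loc[year].plot.bar(stacked=True, ax=ax, title=str(year))
fig.tight_layout()
# plt.show()  # uncomment when using an interactive backend
```

Sizing the subplot grid from nn.index.unique("year") keeps the code unchanged when another year is added to the data.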

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With histogram, I get smooth bars but I can't find how to display the count:
With a bar plot I can display the count but I don't get smooth bar and the count does not appear for the whole bar but for each case composing that bar
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do any binning before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>). But this gives the result you've described, with text for every subcategory of your data. Since px.bar adds data and annotations in the order in which the source is organized, we can simply set the text on the last subcategory applied, using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
                   'y1':[1, 4, 9, 16],
                   'y2':[1, 4, 9, 16],
                   'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe (i.e. the total for each x)
sums= []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
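If your first graph is made with the graph objects library, you can try:
# Use textposition='auto' for direct text
# go.Bar has no color or barmode argument, so draw one trace per gender
# and set the bar mode on the layout instead
counts = val.iloc[:, 0]  # val is a one-column DataFrame of counts
fig = go.Figure(data=[
    go.Bar(name=gender, x=grp.index.get_level_values(0), y=grp.values,
           text=grp.values, textposition='auto')
    for gender, grp in counts.groupby(level=1)
])
fig.update_layout(barmode="group")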

Plotly subplot represent same y-axis name with same color and single legend

I am trying to create a plot for two categories in a subplot. 1st column represent category FF and 2nd column represent category RF in the subplot.
The x-axis is always time and y-axis is remaining columns. In other words, it is a plot with one column vs rest.
The 1st and 2nd category always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but the problem is that plotly treats each trace as distinct, so lines whose y-axis has the same name are drawn in different colors. As a consequence, a separate legend entry is also created for each of them.
For example, in first row Time vs price2010 I want both subplot FF and RF to be represented in same color (say blue) and a single entry in legend.
I tried adding legendgroup in go.Scatter but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
        'Price2010': [22000,26000,27000,35000],
        'Price2011': [23000,27000,28000,36000],
        'Price2012': [24000,28000,29000,37000],
        'Price2013': [25000,29000,30000,38000],
        'Price2014': [26000,30000,31000,39000],
        'Price2015': [27000,31000,32000,40000],
        'Price2016': [28000,32000,33000,41000]
        }
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
        'Price2010': [19000,20000,21000,22000],
        'Price2011': [20000,21000,22000,23000],
        'Price2012': [21000,22000,23000,24000],
        'Price2013': [22000,23000,24000,25000],
        'Price2014': [23000,24000,25000,26000],
        'Price2015': [24000,25000,26000,27000],
        'Price2016': [25000,26000,27000,28000]
        }
rf = DataFrame(CarB)
Type = {
    'FF' : ff,
    'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is all I have been able to come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params, showlegend=legend, marker=dict(
            color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
        fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
    # only the first category (FF) contributes legend entries
    legend = False
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col
