Grouped bar chart for categories by month/year


I'm trying to use Plotly to create a stacked or grouped bar chart that has month/year on the x-axis and values on the y-axis. The data frame looks like this:
category value date
apple 4 10/2020
banana 3 10/2020
apple 2 10/2020
strawberry 1 11/2020
banana 4 11/2020
apple 9 11/2020
banana 4 12/2020
apple 7 12/2020
strawberry 4 12/2020
banana 8 12/2020
...
Since newer dates will keep coming in and more categories may be added, I'm trying to create a grouped bar chart that is also scrollable along the x-axis (date).
I tried this to create the grouped bar chart but it ends up being a stacked bar chart instead:
import plotly.graph_objects as go
fig_3_a = go.Figure(
    data=[go.Bar(
        x=df['date'],
        y=df['value'],
        text=df['category'],
        textposition='auto',
        orientation='v',
    )],
    layout=go.Layout(barmode='group'),
)
I would like something like this instead, where each category can be assigned a different color, with month/year on the x-axis and the value on the y-axis. Here, gender corresponds to category and the x-axis to month/year. I would also need to add scrolling on the x-axis to see all the month/year values:

You can do it simply with plotly.express.
import plotly.express as px
fig = px.bar(df, x='date', y='value', color='category', barmode='group')
fig.show()
If you want to do it with the go.Bar class, you need to add one trace for each category.

Related

Plot count of one column versus time grouped by two columns

I want to create a bar graph showing the count of the top three Products per month, grouped by Company. Please suggest a way to achieve this, preferably using the Plotly Express library.
import pandas as pd

data = [['2017-03', 'Car', 'Mercerdes'], ['2017-03', 'Car', 'Mercerdes'], ['2017-03', 'Car', 'BMW'], ['2017-03', 'Car', 'Audi'],
        ['2017-03', 'Burger', 'KFC'], ['2022-09', 'Burger', 'Subway'], ['2022-09', 'Coffee', 'Subway'],
        ['2017-03', 'Clothes', 'Only'], ['2022-09', 'Coffee', 'KFC'], ['2022-09', 'Coffee', 'McD']]
df = pd.DataFrame(data, columns=['Month', 'Product', 'Company'])
df
Month Product Company
0 2017-03 Car Mercerdes
1 2017-03 Car Mercerdes
2 2017-03 Car BMW
3 2017-03 Car Audi
4 2017-03 Burger KFC
5 2022-09 Burger Subway
6 2022-09 Coffee Subway
7 2017-03 Clothes Only
8 2022-09 Coffee KFC
9 2022-09 Coffee McD

The groupby approach by kjanker is perfect for getting the counts. After that, in Plotly you can use barmode='group' and add the Product as text on the bars, which gives something similar to what you are looking for.
Otherwise, I don't think Plotly Express can split the bars by both Company and Product.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
px.bar(data, x="Month", y="Count", color="Company",
barmode='group', text="Product")
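The question asks for the top three products per month, which the snippets here do not filter for. A minimal pandas sketch, assuming "top three" means the three most frequent Product values within each Month (ties resolved by sort order), could look like this. The names `product_counts`, `top3` and `data_top3` are illustrative.

```python
import pandas as pd

# Sample data from the question.
data = [['2017-03', 'Car', 'Mercerdes'], ['2017-03', 'Car', 'Mercerdes'], ['2017-03', 'Car', 'BMW'],
        ['2017-03', 'Car', 'Audi'], ['2017-03', 'Burger', 'KFC'], ['2022-09', 'Burger', 'Subway'],
        ['2022-09', 'Coffee', 'Subway'], ['2017-03', 'Clothes', 'Only'], ['2022-09', 'Coffee', 'KFC'],
        ['2022-09', 'Coffee', 'McD']]
df = pd.DataFrame(data, columns=['Month', 'Product', 'Company'])

# Rows per (Month, Product), most frequent product first within each month.
product_counts = (df.groupby(['Month', 'Product']).size().rename('Count').reset_index()
                  .sort_values(['Month', 'Count'], ascending=[True, False]))

# Keep at most three products per month.
top3 = product_counts.groupby('Month').head(3)

# Restrict the raw rows to those products, then count per company.
filtered = df.merge(top3[['Month', 'Product']], on=['Month', 'Product'])
data_top3 = filtered.groupby(['Month', 'Company', 'Product']).size().rename('Count').reset_index()
```

`data_top3` can then replace `data` in the `px.bar(..., barmode='group', text="Product")` call. The sample data has at most three products per month, so nothing is dropped here.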

You can do this with pandas groupby and size.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
month = "2017-03"
px.bar(data[data["Month"]==month], x="Product", y="Count", color="Company", title=month)
Note that you have to reset the index to remove the MultiIndex (currently not supported by Plotly Express).

This is another way to do it. Faceting is useful for plotting aggregated data (count, sum, mean, etc.) when a DataFrame is grouped by two columns and both columns should appear in the plot. The 'facet_col' and 'color' arguments can be swapped depending on the desired output.
The plot doesn't look correct with dummy data but it worked with my actual data.
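import plotly.express as px
# df_grouped holds counts per Month/Company/Product, built as in the answers above
df_grouped = df.groupby(['Month', 'Company', 'Product']).size().rename('Count').reset_index()
px.bar(df_grouped, x='Month', y='Count', color='Company', facet_col='Product')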

How can I plot a pandas dataframe where x = month and y = frequency of text?

I have the following dataset:
Date      ID  Fruit
2021-2-2  1   Apple
2021-2-2  1   Pear
2021-2-2  1   Apple
2021-2-2  2   Pear
2021-2-2  2   Pear
2021-2-2  2   Apple
2021-3-2  3   Apple
2021-3-2  3   Apple
I have removed duplicate "Fruit" entries per ID (there can be only one apple per ID number, but there can be multiple apples in a single month). Now I would like to generate multiple scatter/line plots (one per "Fruit" type) with the month on the x-axis (i.e. Jan 2021, Feb 2021, Mar 2021, etc.) and the frequency, or count, of each "Fruit" in that month on the y-axis.
If I could generate new columns in a new sheet in Excel that I could then plot as x and y, that would be great too. Something like this for Apples specifically:
Month     Number of Apples
Jan 2021  0
Feb 2021  2
Mar 2021  1
I've tried the following, which lets me remove duplicates, but I can't figure out how to count the number of Apples in the Fruit column within a given timeframe (a month, for now) and use that as the y-axis.
import numpy as np
import pandas as pd
import re
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_excel('FruitExample.xlsx',
usecols=("A:E"), sheet_name=('Data'))
df_Example = df.drop_duplicates(subset=["ID Number", "Fruit"], keep="first")
df_Example.plot(x="Date", y=count("Fruit"), style="o")
plt.show()
I've tried to use groupby and categorical but can't seem to count this up properly and plot it. Here is an example of a plot that would be great.

Make sure the dates are in datetime format
df['Date']=pd.to_datetime(df['Date'])
Then create a column for month-year:
df['Month-Year']=df['Date'].dt.to_period('M') #M for month
new_df=pd.DataFrame(df.groupby(['Month-Year','Fruit'])['ID'].count())
new_df.reset_index(inplace=True)
Convert back to timestamps, since seaborn can't handle the Period type:
new_df['Month-Year']=new_df['Month-Year'].apply(lambda x: x.to_timestamp())
Then plot:
import seaborn as sns
sns.lineplot(x='Month-Year',y='ID',data=new_df,hue='Fruit')
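The answer above counts rows without first dropping repeated ID/Fruit pairs. A minimal pandas sketch that combines both steps from the question, deduplicating on ID and Fruit and then counting per calendar month, might look like this. It uses the sample rows from the question with the dates zero-padded, and assumes the column is named `ID` as in the table rather than `ID Number` as in the question's code.

```python
import pandas as pd

# Sample rows from the question (same dates, written zero-padded).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-02-02"] * 6 + ["2021-03-02"] * 2),
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "Fruit": ["Apple", "Pear", "Apple", "Pear", "Pear", "Apple", "Apple", "Apple"],
})

# At most one row per (ID, Fruit), as described in the question.
dedup = df.drop_duplicates(subset=["ID", "Fruit"], keep="first")

# Count each fruit per calendar month: one column per fruit,
# and month/fruit combinations with no rows become 0.
counts = (dedup.groupby([dedup["Date"].dt.to_period("M"), "Fruit"])
          .size()
          .unstack(fill_value=0))

# Add zero rows for months between the first and last month that have no data at all.
counts = counts.reindex(pd.period_range(counts.index.min(), counts.index.max(), freq="M"),
                        fill_value=0)
```

`counts.plot(subplots=True, marker="o")` then draws one line plot per fruit, and `counts.to_timestamp().to_excel("FruitCounts.xlsx")` (hypothetical file name; requires openpyxl) writes a month-by-fruit sheet similar to the table in the question.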

how to plot in pandas categorical data

I have this kind of dataframe:
animal age where
0 dog 1 indoor
1 cat 4 indoor
2 horse 3 outdoor
I would like to present a bar plot in which:
y axis is age, x axis is animal, and the animals are grouped in adjacent bars with different colors.
Thanks

This should do the trick
import pandas as pd

df = pd.DataFrame({"animal": ["dog", "cat", "horse"], "age": [1, 4, 3], "where": ["indoor", "indoor", "outdoor"]})
df
animal age where
0 dog 1 indoor
1 cat 4 indoor
2 horse 3 outdoor
# legend=False hides the single-entry "age" legend
ax = df.plot.bar(x="animal", y="age", color=["b", "r", "g"], legend=False)
ax.set_ylabel("Age")

Another easy way: set the column you want on the x-axis as the index and plot. By default, float/integer columns end up on the y-axis.
import matplotlib.pyplot as plt
df.set_index('animal').plot(kind='bar')
plt.ylabel('age')
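The pandas documentation describes a colour sequence passed to `DataFrame.plot.bar` as being applied per column, so with a single `age` column the bars may all end up the same colour. A sketch that calls Matplotlib's `Axes.bar` directly gives each bar its own colour. The colour names are arbitrary choices, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"animal": ["dog", "cat", "horse"],
                   "age": [1, 4, 3],
                   "where": ["indoor", "indoor", "outdoor"]})

fig, ax = plt.subplots()
# Passing one colour per row to Axes.bar colours each bar individually.
ax.bar(df["animal"], df["age"], color=["tab:blue", "tab:orange", "tab:green"])
ax.set_xlabel("animal")
ax.set_ylabel("age")
```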

Stacked area chart with datetime axis

I am attempting to create a Bokeh stacked area chart from the following Pandas DataFrame.
An example of the DataFrame (df) is as follows:
date tom jerry bill
2014-12-07 25 12 25
2014-12-14 15 16 30
2014-12-21 10 23 32
2014-12-28 12 13 55
2015-01-04 5 15 20
2015-01-11 0 15 18
2015-01-18 8 9 17
2015-01-25 11 5 16
The above DataFrame is a snippet of the full df, which spans a number of years and contains additional names besides the ones shown.
I am attempting to use the datetime column date as the x-axis, with the count information for each name as the y-axis.
Any assistance that anyone could provide would be greatly appreciated.

You can create a stacked area chart by using the patch glyph. I first used df.cumsum to stack the values in the DataFrame row-wise. After that, I added two rows to the DataFrame with the max and min dates and a y value of 0, which close each polygon along the x-axis. I plot the patches in reverse order of the column list (excluding the date column), so the person with the highest cumulative values is plotted first and the people with lower values are drawn on top.
Another implementation of a stacked area chart can be found here.
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.palettes import inferno
from bokeh.models.formatters import DatetimeTickFormatter

df = pd.read_csv('stackData.csv')
names = list(df)[1:]  # every column except the first one ('date')

# Cumulative sum across each row stacks the series on top of each other
df_stack = df[names].cumsum(axis=1)
df_stack['date'] = pd.to_datetime(df['date'])

# Close each polygon with two zero-height points: the last date, then the first date
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
closing = pd.DataFrame({'date': [df_stack['date'].max(), df_stack['date'].min()],
                        **{name: [0, 0] for name in names}})
df_stack = pd.concat([df_stack, closing], ignore_index=True)

p = figure(x_axis_type='datetime')
p.xaxis.formatter = DatetimeTickFormatter(days="%d/%m/%Y")  # Bokeh 3.x takes a single format string
p.xaxis.major_label_orientation = 45

# Draw the tallest (last cumulative) series first so the smaller ones stay visible on top
for person, color in zip(reversed(names), inferno(len(names))):
    p.patch(x=df_stack['date'], y=df_stack[person], color=color, legend_label=person)

p.legend.click_policy = "hide"
show(p)
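Bokeh 2.x and later also provide `figure.varea_stack`, which performs the cumulative stacking itself, so neither the manual cumsum nor the closing rows are needed. A minimal sketch using the sample rows from the question, assuming a Bokeh version that has `varea_stack` and `legend_label`:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import inferno
from bokeh.plotting import figure, show

# Sample rows from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-07", "2014-12-14", "2014-12-21", "2014-12-28",
                            "2015-01-04", "2015-01-11", "2015-01-18", "2015-01-25"]),
    "tom": [25, 15, 10, 12, 5, 0, 8, 11],
    "jerry": [12, 16, 23, 13, 15, 15, 9, 5],
    "bill": [25, 30, 32, 55, 20, 18, 17, 16],
})
names = [c for c in df.columns if c != "date"]

source = ColumnDataSource(df)
p = figure(x_axis_type="datetime")
# varea_stack stacks each column on top of the previous ones and returns one renderer per name.
renderers = p.varea_stack(stackers=names, x="date", color=inferno(len(names)),
                          legend_label=names, source=source)
p.legend.click_policy = "hide"
# show(p)  # opens the chart in a browser
```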

Grouping by multiple years in a single column and plotting the result stacked

I have a dataframe that looks like this, with the default pandas index starting at 0:
index Year Count Name
0 2005 70000 Apple
1 2005 60000 Banana
2 2006 20000 Pineapple
3 2007 70000 Cherry
4 2007 60000 Coconut
5 2007 40000 Pear
6 2008 90000 Grape
7 2008 10000 Apricot
I would like to create a stacked bar plot of this data.
However, using the df.groupby() function will only allow me to call a function such as .mean() or .count() on this data in order to plot the data by year. I am getting the following result which separates each data point and does not group them by the shared year.
I have seen the matplotlib example for stacked bar charts, but those are grouped by a common index; in this case I do not have a common index to plot by. Is there a way to group and plot this data without rearranging the entire DataFrame?

If I understood you correctly, you could do this using pivot first:
df1 = pd.pivot_table(df, values='Count', index='Year', columns='Name')
df1.plot(kind='bar')
Or with the argument stacked=True:
df1.plot(kind='bar', stacked=True)
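The pivot above uses `pivot_table`'s default `aggfunc`, which averages repeated Year/Name rows and leaves NaN where a name has no count in a given year. Here is a sketch with the question's sample data that sums instead and fills the gaps with 0. Both choices are assumptions about what the stacked totals should mean.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2005, 2005, 2006, 2007, 2007, 2007, 2008, 2008],
    "Count": [70000, 60000, 20000, 70000, 60000, 40000, 90000, 10000],
    "Name": ["Apple", "Banana", "Pineapple", "Cherry", "Coconut", "Pear", "Grape", "Apricot"],
})

# One row per Year, one column per Name; absent Year/Name pairs become 0 instead of NaN.
# aggfunc="sum" adds up repeated Year/Name rows instead of averaging them.
df1 = pd.pivot_table(df, values="Count", index="Year", columns="Name",
                     aggfunc="sum", fill_value=0)
```

`df1.plot(kind='bar', stacked=True)` on this table gives one bar per year whose total height is that year's summed Count (170000 for 2007 in the sample).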
