Plot count of one column versus time grouped by two columns - python

I want to create a bar graph showing count of top three Products per month grouped by Company. Please suggest a way to achieve this, preferably using plotly-express library.
import pandas as pd

data = [['2017-03', 'Car','Mercerdes'], ['2017-03', 'Car','Mercerdes'], ['2017-03', 'Car','BMW'],['2017-03', 'Car','Audi'],
['2017-03', 'Burger','KFC'], ['2022-09', 'Burger','Subway'], ['2022-09', 'Coffee','Subway'],
['2017-03', 'Clothes','Only'], ['2022-09', 'Coffee','KFC'], ['2022-09', 'Coffee','McD']
]
df = pd.DataFrame(data, columns=['Month', 'Product','Company'])
df
Month Product Company
0 2017-03 Car Mercerdes
1 2017-03 Car Mercerdes
2 2017-03 Car BMW
3 2017-03 Car Audi
4 2017-03 Burger KFC
5 2022-09 Burger Subway
6 2022-09 Coffee Subway
7 2017-03 Clothes Only
8 2022-09 Coffee KFC
9 2022-09 Coffee McD

The groupby approach by kjanker is a good way to get the counts. In plotly you can then use barmode='group' and add the Product as text on the bars, which yields something close to what you are looking for.
Otherwise, I don't think plotly express can split the bars by both Company and Product.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
px.bar(data, x="Month", y="Count", color="Company",
barmode='group', text="Product")

You can do this with pandas groupby and size.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
month = "2017-03"
px.bar(data[data["Month"]==month], x="Product", y="Count", color="Company", title=month)
Note that you have to reset the index to remove the multi index (currently not supported by plotly express).
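The snippet above does not yet restrict the data to the top three products per month, which the question asks for. Here is a minimal sketch of one way to add that filter, assuming "top" means the highest total count per month across all companies. The names product_totals, top3 and top_counts are made up for illustration, and ties at the cut-off are broken arbitrarily by head(3).

```python
import pandas as pd

data = [['2017-03', 'Car', 'Mercerdes'], ['2017-03', 'Car', 'Mercerdes'],
        ['2017-03', 'Car', 'BMW'], ['2017-03', 'Car', 'Audi'],
        ['2017-03', 'Burger', 'KFC'], ['2022-09', 'Burger', 'Subway'],
        ['2022-09', 'Coffee', 'Subway'], ['2017-03', 'Clothes', 'Only'],
        ['2022-09', 'Coffee', 'KFC'], ['2022-09', 'Coffee', 'McD']]
df = pd.DataFrame(data, columns=['Month', 'Product', 'Company'])

# Count per (Month, Company, Product), as in the answer above
counts = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()

# Total count per (Month, Product), ignoring Company (assumed meaning of "top")
product_totals = df.groupby(["Month", "Product"]).size().rename("Total").reset_index()

# Keep at most the three products with the highest total in each month
top3 = (
    product_totals.sort_values(["Month", "Total"], ascending=[True, False])
    .groupby("Month")
    .head(3)
)

# Keep only the per-company rows that belong to a top-three product
top_counts = counts.merge(top3[["Month", "Product"]], on=["Month", "Product"])
```

top_counts can then be passed to px.bar in place of data. With the sample data no month has more than three products, so nothing is filtered out here.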

This is another way to do it. It is useful for plotting aggregated data (count, sum, mean, etc.) when a dataframe is grouped by two columns and we want to represent both columns in the plot. 'facet_col' and 'color' can be swapped depending on the desired output.
The plot doesn't look right with the dummy data, but it worked with my actual data.
import plotly.express as px
# Assumption: df_grouped holds one row per (Month, Company, Product) with its count
df_grouped = df.groupby(['Month', 'Company', 'Product']).size().reset_index(name='count')
px.bar(df_grouped, x='Month', y='count', color='Company', facet_col='Product')

Related

How can I plot a pandas dataframe where x = month and y = frequency of text?

I have the following dataset:
Date ID Fruit
2021-2-2 1 Apple
2021-2-2 1 Pear
2021-2-2 1 Apple
2021-2-2 2 Pear
2021-2-2 2 Pear
2021-2-2 2 Apple
2021-3-2 3 Apple
2021-3-2 3 Apple
I have removed duplicate "Fruit" based on ID (There can only be 1 apple per ID number but multiple apples per a single month). And now I would like to generate multiple scatter/line plots (one per "Fruit" type) with the x-axis as month (i.e. Jan. 2021, Feb. 2021, Mar. 2021, etc) and the y-axis as frequency or counts of "Fruit" that occur in that month.
If I could generate new columns in a new sheet in Excel that I could then plot as x and y that would be great too. Something like this for Apples specifically:
Month | Number of Apples
Jan 2021 | 0
Feb 2021 | 2
Mar 2021 | 1
I've tried the following which let me remove duplicates but I can't figure out how to count the number of Apples in the Fruit column that occur within a given timeframe (month is what I'm looking for now) and set that to the y-axis.
import numpy as np
import pandas as pd
import re
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_excel('FruitExample.xlsx',
usecols=("A:E"), sheet_name=('Data'))
df_Example = df.drop_duplicates(subset=["ID Number", "Fruit"], keep="first")
df_Example.plot(x="Date", y=count("Fruit"), style="o")
plt.show()
I've tried to use groupby and categorical but can't seem to count this up properly and plot it. Here is an example of a plot that would be great.
Make sure the dates are in datetime format:
df['Date']=pd.to_datetime(df['Date'])
Then create a month-year column and count the rows per month and fruit:
df['Month-Year']=df['Date'].dt.to_period('M') #M for month
new_df=pd.DataFrame(df.groupby(['Month-Year','Fruit'])['ID'].count())
new_df.reset_index(inplace=True)
Change the column back to datetime, as seaborn can't handle the 'period' type:
new_df['Month-Year']=new_df['Month-Year'].apply(lambda x: x.to_timestamp())
Then plot:
import seaborn as sns
sns.lineplot(x='Month-Year',y='ID',data=new_df,hue='Fruit')
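A groupby only returns months that actually occur in the data, so a month with no fruit (Jan 2021 in the desired table) is missing rather than 0. Here is a rough sketch of one way to fill such months in, assuming the sample data from the question (dates written zero-padded here), the column name "ID", and a hand-picked month range. The names deduped, counts and all_months are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Date": ["2021-02-02"] * 6 + ["2021-03-02"] * 2,
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "Fruit": ["Apple", "Pear", "Apple", "Pear", "Pear", "Apple", "Apple", "Apple"],
})
df["Date"] = pd.to_datetime(df["Date"])

# One row per (ID, Fruit), as described in the question
deduped = df.drop_duplicates(subset=["ID", "Fruit"], keep="first")

# Rows = month, columns = fruit, values = number of rows in that month
counts = (
    deduped.groupby([deduped["Date"].dt.to_period("M"), "Fruit"])
    .size()
    .unstack(fill_value=0)
)

# Add months without any rows as zeros (the range is an assumption)
all_months = pd.period_range("2021-01", "2021-03", freq="M")
counts = counts.reindex(all_months, fill_value=0)
```

For the sample data, counts["Apple"] holds 0, 2 and 1 for January to March, matching the table in the question. Each column of counts can then be plotted against the index, one line per fruit.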

Grouped bar chart for categories by month/year

I'm trying to use Plotly to create a stacked or grouped bar chart that has month/year on the x-axis and values on the y-axis. The data frame looks like this:
category value date
apple 4 10/2020
banana 3 10/2020
apple 2 10/2020
strawberry 1 11/2020
banana 4 11/2020
apple 9 11/2020
banana 4 12/2020
apple 7 12/2020
strawberry 4 12/2020
banana 8 12/2020
.
.
.
Assuming that newer dates will come through, and also more categories can be added, I'm trying to create a grouped bar chart that is also scrollable on the x-axis(date).
I tried this to create the grouped bar chart but it ends up being a stacked bar chart instead:
import plotly.graph_objects as go
fig_3_a = go.Figure(data=[go.Bar(
x=df['date'],
y=df['value'],
text=df['category'],
textposition='auto',
orientation ='v',
)],
layout=go.Layout(barmode='group'))
I would like something like this instead, where the different categories can possibly be assigned a different color, and the x-axis being the month/day and the y-axis being the value. Here, gender==category and x-axis==month/year. Also would need to add the scrolling for the x-axis to see all the month/year:
You can do it simply with plotly.express.
import plotly.express as px
fig = px.bar(df, x='date', y='value', color='category', barmode='group')
fig.show()
If you want to do it with go.Bar class, you need to add traces for each category.

Plotting several lines in one plot

My dataset has three columns, namely date, sold, and item.
I would like to investigate where a change in trends (like a peak or a drop) in market sales happens.
Date Sold Item
01/02/2018 1 socks
01/03/2018 4 t-shirts
01/04/2018 3 pants
01/04/2018 2 shirts
01/05/2018 1 socks
...
12/12/2018 21 watches
12/12/2018 35 toys
...
12/22/2018 43 flowers
12/22/2018 25 toys
12/22/2018 32 shirts
12/22/2018 70 pijamas
...
12/31/2018 12 toys
12/31/2018 2 skirts
To do this, I have been considering two things:
number of total sales per date (e.g. 1 on Jan 2, 2018; 4 on Jan 3,2018; 5 on Jan 4, 2018; and so on);
number of sales per item through time (i.e. looking at each item trend through time separately)
The first key point should be easily assessed by using groupby; the second key point should be also doable by using groupby.
However, my difficulty is in plotting all the items in the same plot (preferably a line plot).
What I have done is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
df = pd.read_csv("./MarketSales.csv")
sales_plot = df[Item].groupby("Sold").sum().sort_values("Sold",ascending=False).plot()
sales_plot.set_xlabel("Date")
sales_plot.set_ylabel("Frequency")
Unfortunately, the code above does not generate the expected results.
The most challenging part for me is combining groupby and plot.
I hope you can help me to understand the approach.
I'm not sure why you group by 'Sold', since you seem to be interested in the number sold per date. Here are two lines of code that address your two points:
df.groupby(['Date'])['Sold'].sum().plot()
#and
df.groupby(['Date','Item'])['Sold'].sum().unstack().plot()
Also, you may want to convert your dates first with df['Date'] = pd.to_datetime(df['Date']) so that the time axis is displayed properly.
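To see what the two groupby lines produce before plotting, here is a small sketch on the first five rows of the sample data. The month/day/year date format is an assumption based on values such as 12/22/2018, fill_value=0 is an addition of this sketch, and the variable names are illustrative.

```python
import pandas as pd

# First five rows of the sample data from the question
df = pd.DataFrame({
    "Date": ["01/02/2018", "01/03/2018", "01/04/2018", "01/04/2018", "01/05/2018"],
    "Sold": [1, 4, 3, 2, 1],
    "Item": ["socks", "t-shirts", "pants", "shirts", "socks"],
})
# Assumption: dates are month/day/year
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Point 1: total number sold per date (a Series indexed by date)
total_per_date = df.groupby("Date")["Sold"].sum()

# Point 2: one column per item; unstack moves 'Item' from the index into
# the columns, and fill_value=0 marks dates without sales of that item
per_item = df.groupby(["Date", "Item"])["Sold"].sum().unstack(fill_value=0)
```

Calling .plot() on per_item draws one line per column, which is why unstack puts all items in the same plot.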

Grouping by multiple years in a single column and plotting the result stacked

I have a dataframe that looks like this, with the default pandas index starting at 0:
index Year Count Name
0 2005 70000 Apple
1 2005 60000 Banana
2 2006 20000 Pineapple
3 2007 70000 Cherry
4 2007 60000 Coconut
5 2007 40000 Pear
6 2008 90000 Grape
7 2008 10000 Apricot
I would like to create a stacked bar plot of this data.
However, using the df.groupby() function will only allow me to call a function such as .mean() or .count() on this data in order to plot the data by year. I am getting the following result which separates each data point and does not group them by the shared year.
I have seen the matplotlib example for stacked bar charts, but they are grouped by a common index, in this case I do not have a common index I want to plot by. Is there a way to group and plot this data without rearranging the entire dataframe?
If I understood you correctly, you could do this using pivot first:
df1 = pd.pivot_table(df, values='Count', index='Year', columns='Name')
df1.plot(kind='bar')
Output:
Or with the argument stacked=True:
df1.plot(kind='bar', stacked=True)
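For reference, a short sketch of the table that the pivot produces for the sample data. Note that pivot_table aggregates with the mean by default; aggfunc='sum' and fill_value=0 are additions of this sketch, and they only matter if a (Year, Name) pair occurs more than once or is missing.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Year": [2005, 2005, 2006, 2007, 2007, 2007, 2008, 2008],
    "Count": [70000, 60000, 20000, 70000, 60000, 40000, 90000, 10000],
    "Name": ["Apple", "Banana", "Pineapple", "Cherry",
             "Coconut", "Pear", "Grape", "Apricot"],
})

# Rows = Year, columns = Name; combinations that do not occur become 0
wide = df.pivot_table(values="Count", index="Year", columns="Name",
                      aggfunc="sum", fill_value=0)

# Height of each stacked bar = row sum
bar_heights = wide.sum(axis=1)
```

The Year index is the common index that the stacked bar chart needs: wide.plot(kind='bar', stacked=True) draws one bar per row and one segment per column.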

A convenient way to plot bar-plot in Python pandas

I have a DataFrame with the following content, where the first row holds the column names:
id,year,type,sale
1,1998,a,5
2,2000,b,10
3,1999,c,20
4,2001,b,15
5,2001,a,25
6,1998,b,5
...
I want to draw two figures, the first one is like
The second one is like
The figures in my draft might not be to scale. I am a newbie to Python and I understand that its plotting functionality is powerful, so I believe there must be an easy way to plot such figures.
The pandas library provides simple and efficient tools to analyze and plot DataFrames.
The following assumes that pandas is installed and that the data are in a .csv file (matching the example you provided).
1. import the pandas library and load the data
import pandas as pd
data = pd.read_csv('filename.csv')
You now have a pandas DataFrame as follows:
id year type sale
0 1 1998 a 5
1 2 2000 b 10
2 3 1999 c 20
3 4 2001 b 15
4 5 2001 a 25
5 6 1998 b 5
2. Plot the "sale" vs "type"
This is easily achieved by:
data.plot('type', 'sale', kind='bar')
which results in
If you want the sale for each type to be summed, data.groupby('type').sum().plot(y='sale', kind='bar') will do the trick (see #3 for explanation)
3. Plot the "sale" vs "year"
This is basically the same command, except that you have to first sum all the sale in the same year using the groupby pandas function.
data.groupby('year').sum().plot(y='sale', kind='bar')
This will result in
Edit:
4. Unstack the different types per year
You can also unstack the different 'type' values per year within each bar by using groupby on two variables
data.groupby(['year', 'type']).sum().unstack().plot(y='sale', kind='bar', stacked=True)
Note:
See the Pandas Documentation on visualization for more information about achieving the layout you want.
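To make the grouping in steps 3 and 4 more concrete, here is a small sketch of the intermediate tables for the six sample rows. Selecting the 'sale' column before summing and passing fill_value=0 are choices of this sketch rather than part of the answer above, and the variable names are illustrative.

```python
import pandas as pd

# The six sample rows from the question
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "year": [1998, 2000, 1999, 2001, 2001, 1998],
    "type": ["a", "b", "c", "b", "a", "b"],
    "sale": [5, 10, 20, 15, 25, 5],
})

# Step 3: total sale per year, one value per bar
sale_by_year = data.groupby("year")["sale"].sum()

# Step 4: rows = year, columns = type, one stacked segment per column
sale_by_year_type = data.groupby(["year", "type"])["sale"].sum().unstack(fill_value=0)
```

sale_by_year.plot(kind='bar') and sale_by_year_type.plot(kind='bar', stacked=True) would then draw the two charts; selecting 'sale' first keeps the other columns (id, type) out of the sum.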
