A convenient way to draw bar plots in Python pandas - python

I have a DataFrame with the following contents, where the first row holds the column names:
id,year,type,sale
1,1998,a,5
2,2000,b,10
3,1999,c,20
4,2001,b,15
5,2001,a,25
6,1998,b,5
...
I want to draw two figures. The first one looks like this:
The second one looks like this:
The figures in my draft might not be to scale. I am new to Python, and I understand that its plotting functionality is powerful, so I believe there must be a very easy way to plot such figures.

The pandas library provides simple and efficient tools to analyze and plot DataFrames.
The steps below assume that pandas is installed and that the data are in a .csv file matching the example you provided.
1. import the pandas library and load the data
import pandas as pd
data = pd.read_csv('filename.csv')
You now have a pandas DataFrame as follows:
id year type sale
0 1 1998 a 5
1 2 2000 b 10
2 3 1999 c 20
3 4 2001 b 15
4 5 2001 a 25
5 6 1998 b 5
2. Plot the "sale" vs "type"
This is easily achieved by:
data.plot('type', 'sale', kind='bar')
which results in
If you want the sales for each type to be summed, data.groupby('type').sum().plot(y='sale', kind='bar') will do the trick (see step 3 for an explanation).
3. Plot the "sale" vs "year"
This is basically the same command, except that you first have to sum all the sales in the same year using the pandas groupby function.
data.groupby('year').sum().plot(y='sale', kind='bar')
This will result in
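For readers without the CSV file at hand, here is a minimal self-contained sketch of steps 2 and 3. It assumes the six sample rows from the question, read from an in-memory string rather than from 'filename.csv'. Selecting the 'sale' column before summing is a variation of my own, so that 'id' and the text column 'type' are not aggregated along the way; the names csv_text, sale_by_type and sale_by_year are illustrative only.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The six sample rows from the question, read from a string instead of a file.
csv_text = """id,year,type,sale
1,1998,a,5
2,2000,b,10
3,1999,c,20
4,2001,b,15
5,2001,a,25
6,1998,b,5
"""
data = pd.read_csv(io.StringIO(csv_text))

# Sum only the 'sale' column, once per type and once per year.
sale_by_type = data.groupby("type")["sale"].sum()
sale_by_year = data.groupby("year")["sale"].sum()

# Draw both bar charts side by side on explicit axes.
fig, (ax_type, ax_year) = plt.subplots(1, 2, figsize=(8, 3))
sale_by_type.plot(kind="bar", ax=ax_type, title="sale per type")
sale_by_year.plot(kind="bar", ax=ax_year, title="sale per year")
fig.tight_layout()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show(), or save the figure with fig.savefig(...).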
Edit:
4. Unstack the different types per year
You can also show the different 'type' values per year as segments of each bar by using groupby on two variables and unstacking the result:
data.groupby(['year', 'type']).sum().unstack().plot(y='sale', kind='bar', stacked=True)
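As a possible alternative to groupby followed by unstack, the same year-by-type table can be built with pivot_table. This is only a sketch on the six sample rows from the question; aggfunc="sum" and fill_value=0 are my own choices so that year/type combinations without any sale show up as 0 instead of NaN.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# The six sample rows from the question.
data = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "year": [1998, 2000, 1999, 2001, 2001, 1998],
        "type": ["a", "b", "c", "b", "a", "b"],
        "sale": [5, 10, 20, 15, 25, 5],
    }
)

# One row per year, one column per type, cells hold the summed sale.
sale_table = data.pivot_table(
    index="year", columns="type", values="sale", aggfunc="sum", fill_value=0
)

# Each column becomes one colored segment of the stacked bars.
ax = sale_table.plot(kind="bar", stacked=True)
ax.set_ylabel("sale")
```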
Note:
See the Pandas Documentation on visualization for more information about achieving the layout you want.

Related

Plotting a pandas DataFrame column with only year on the x-axis

I have a pandas DataFrame named 'epu' with three columns 'year', 'month' and 'score'. It looks like this:
year month score
0 1970 1 125.224739
1 1970 1 99.020813
2 1970 1 112.190506
.
.
.
447 2022 4 154.661957
448 2022 5 168.034912
449 2022 6 143.154816
As can be seen, the year range is from 1970 to 2022. I wanted to plot 'score' on the y-axis and 'year' on the x-axis.
epu['score'].plot()
would give me the following graph. The score plot seems to be alright, but I don't get year labels on the x-axis.
But, plotting with
epu.plot(x='year',y='score')
would give me a strange-looking graph like below.
So, my question is how can I generate the graph in the first picture with the year labels on the x-axis of the second picture? Do I need some advanced features of matplotlib? I tried to search for answers here, but I might be missing something and couldn't find an answer for my problem.
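One possible approach, sketched here under assumptions: the second plot probably looks strange because many rows share the same 'year' value, so the line runs vertically within each year. Keeping the row position on the x-axis (as epu['score'].plot() does) and relabeling a few ticks with the year they belong to avoids that. The data below is a synthetic stand-in, since the real 'epu' values are not available, and labeling every second year is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for 'epu' (assumption): 10 years with 12 monthly rows each.
rng = np.random.default_rng(0)
epu = pd.DataFrame(
    {
        "year": np.repeat(np.arange(1970, 1980), 12),
        "month": np.tile(np.arange(1, 13), 10),
        "score": rng.normal(120, 20, size=120),
    }
)

# Plot the score against the row position, like epu['score'].plot().
fig, ax = plt.subplots()
ax.plot(epu.index, epu["score"])

# Find the first row of each year, keep every second year, and use those
# row positions as tick locations labeled with the corresponding year.
first_rows = epu.drop_duplicates("year").iloc[::2]
ax.set_xticks(first_rows.index.to_list())
ax.set_xticklabels(first_rows["year"].astype(str).to_list())
ax.set_xlabel("year")
```

With the real data, the step in iloc[::2] can be increased (for example to 5 or 10) so that the labels for 1970 to 2022 do not overlap.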

Plot count of one column versus time grouped by two columns

I want to create a bar graph showing count of top three Products per month grouped by Company. Please suggest a way to achieve this, preferably using plotly-express library.
data = [['2017-03', 'Car','Mercerdes'], ['2017-03', 'Car','Mercerdes'], ['2017-03', 'Car','BMW'],['2017-03', 'Car','Audi'],
['2017-03', 'Burger','KFC'], ['2022-09', 'Burger','Subway'], ['2022-09', 'Coffee','Subway'],
['2017-03', 'Clothes','Only'], ['2022-09', 'Coffee','KFC'], ['2022-09', 'Coffee','McD']
]
df = pd.DataFrame(data, columns=['Month', 'Product','Company'])
df
Month Product Company
0 2017-03 Car Mercerdes
1 2017-03 Car Mercerdes
2 2017-03 Car BMW
3 2017-03 Car Audi
4 2017-03 Burger KFC
5 2022-09 Burger Subway
6 2022-09 Coffee Subway
7 2017-03 Clothes Only
8 2022-09 Coffee KFC
9 2022-09 Coffee McD
The groupby approach by @kjanker is perfect for getting the counts. Building on it, I think you can use barmode='group' in plotly and add the product as text on the bars, which yields something similar to what you are looking for.
Otherwise, I don't think plotly express lets you split the bars by both Company and Product.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
px.bar(data, x="Month", y="Count", color="Company",
barmode='group', text="Product")
You can do this with pandas groupby and size.
import plotly.express as px
data = df.groupby(["Month", "Company", "Product"]).size().rename("Count").reset_index()
month = "2017-03"
px.bar(data[data["Month"]==month], x="Product", y="Count", color="Company", title=month)
Note that you have to reset the index to remove the multi index (currently not supported by plotly express).
This could also be one way to do it. It is useful for plotting aggregated data (count, sum, mean, etc.) when a dataframe is grouped by two columns and both columns should be represented in the plot. Also, 'facet_col' and 'color' can be interchanged depending on the desired output.
The plot doesn't look correct with dummy data but it worked with my actual data.
import plotly.express as px
px.bar(df_grouped, x='month', y='count', color='Company',facet_col='Product')
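None of the snippets above restricts the chart to the top three products per month. A minimal pandas-only sketch of that filtering step follows. It assumes that "top" means the products with the most rows in a month across all companies, which is only one reading of the question; the helper name top_product_counts and its parameter n are hypothetical.

```python
import pandas as pd

# Sample rows from the question.
rows = [
    ["2017-03", "Car", "Mercerdes"], ["2017-03", "Car", "Mercerdes"],
    ["2017-03", "Car", "BMW"], ["2017-03", "Car", "Audi"],
    ["2017-03", "Burger", "KFC"], ["2022-09", "Burger", "Subway"],
    ["2022-09", "Coffee", "Subway"], ["2017-03", "Clothes", "Only"],
    ["2022-09", "Coffee", "KFC"], ["2022-09", "Coffee", "McD"],
]
df = pd.DataFrame(rows, columns=["Month", "Product", "Company"])


def top_product_counts(frame, n=3):
    """Count rows per Month/Company/Product, keeping each month's n most frequent products."""
    counts = (
        frame.groupby(["Month", "Company", "Product"])
        .size()
        .rename("Count")
        .reset_index()
    )
    # Total number of rows per product within each month, across all companies.
    totals = counts.groupby(["Month", "Product"], as_index=False)["Count"].sum()
    # Keep the n products with the highest totals in each month.
    top = (
        totals.sort_values(["Month", "Count"], ascending=[True, False])
        .groupby("Month")
        .head(n)
    )
    # Restrict the per-company counts to those products.
    return counts.merge(top[["Month", "Product"]], on=["Month", "Product"])


top_counts = top_product_counts(df, n=3)
```

The result has the same columns as the grouped frame used in the answers above, so it can be passed to px.bar in the same way. With the dummy data no month has more than three products, so n=3 drops nothing; products tied at the cutoff are kept or dropped according to the sort order.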

how to compare columns in data frame

I'm trying to visually compare two columns in a data frame, but what I get is a weird plot with 'frequency' in place of one of the columns.
I tried these options:
ct1=pd.crosstab(df['releaseyear'],df['score'],normalize=True)
ct1.plot()
df.plot( x='releaseyear', y='score', kind='hist')
I also tried a scatter plot, which gets the x and y right, but I don't know how to normalize it so that it shows only the average for each year rather than all the data:
plt.scatter(df['releaseyear'], df['score'])
plt.show()
The question includes no data that could be used to reproduce the dataframe, nor any hint about what the dataframe looks like.
This answer is based on what I understood, assuming the data looks like this:
year score
2001 20
2001 18
2002 12
2002 16
Then first use groupby to group the data by year and apply the required aggregate function.
df=df.groupby('year').mean().reset_index()
output
year score
0 2001 19.0
1 2002 14.0
You can then plot the aggregated data accordingly.
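Put together, the aggregation and the plot might look like the sketch below. It uses the four assumed rows from this answer (the real dataframe and its 'releaseyear' column are not available), and the scatter plot is just one possible choice of chart.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# The assumed sample data from this answer.
df = pd.DataFrame({"year": [2001, 2001, 2002, 2002], "score": [20, 18, 12, 16]})

# One row per year holding the mean score of that year.
yearly = df.groupby("year", as_index=False)["score"].mean()

# Scatter plot with one point per year instead of one point per row.
ax = yearly.plot(x="year", y="score", kind="scatter")
ax.set_ylabel("mean score")
```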

Stacked bar plot of large data in python

I would like to plot a stacked bar plot from a csv file in python. I have three columns of data
year word frequency
2018 xyz 12
2017 gfh 14
2018 sdd 10
2015 fdh 1
2014 sss 3
2014 gfh 12
2013 gfh 2
2012 gfh 4
2011 wer 5
2010 krj 4
2009 krj 4
2019 bfg 4
... 300+ rows of data.
I need to go through all the data and plot a stacked bar plot categorized by year: the x axis is the word, the y axis is the frequency, and the legend colors should indicate the year. I want to see how each word evolved from year to year. Some of the technology words are used in every year, so the stacked bar graph should add those values on top of each other. For example, the word gfh initially plots 14 for the year 2017, and for the year 2014 I want gfh to plot a value of 12 (in a different color) on top of the 2017 segment.
How do I do this? So far I have loaded the csv file in my code, but I don't understand how to go over all the rows and stack the words appropriately (as some words repeat through all the years). The years appear in random order in the csv, but I sorted them by year to make it easier.
I am just learning Python and trying to understand this plotting routine, since I have 40 years of data and ~20 words, so I thought a stacked bar plot would be the best way to represent them. Any other visualisation method is also welcome, and any help is highly appreciated.
This can be done using pandas:
import pandas as pd
df = pd.read_csv("file.csv")
# Aggregate data
df = df.groupby(["word", "year"], as_index=False).agg({"frequency": "sum"})
# Create list to sort by
sorter = (
df.groupby(["word"], as_index=False)
.agg({"frequency": "sum"})
.sort_values("frequency")["word"]
.values
)
# Pivot, reindex, and plot
df = df.pivot(index="word", columns="year", values="frequency")
df = df.reindex(sorter)
df.plot.bar(stacked=True)
Which outputs:
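To try the idea without a csv file, here is a shorter self-contained variant, offered as a sketch rather than a replacement for the code above. It uses only the twelve sample rows shown in the question and builds the word-by-year table in one step with pivot_table; aggfunc="sum" and fill_value=0 are my own choices.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# The twelve sample rows from the question, whitespace separated.
text = """year word frequency
2018 xyz 12
2017 gfh 14
2018 sdd 10
2015 fdh 1
2014 sss 3
2014 gfh 12
2013 gfh 2
2012 gfh 4
2011 wer 5
2010 krj 4
2009 krj 4
2019 bfg 4
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# One row per word, one column per year, cells hold the summed frequency.
table = df.pivot_table(
    index="word", columns="year", values="frequency", aggfunc="sum", fill_value=0
)

# Order the words by their total frequency, smallest first.
table = table.loc[table.sum(axis=1).sort_values().index]

# Each year becomes one colored segment of a word's bar.
ax = table.plot.bar(stacked=True)
ax.set_ylabel("frequency")
```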

Grouping by multiple years in a single column and plotting the result stacked

I have a dataframe that looks like this, with the default pandas index starting at 0:
index Year Count Name
0 2005 70000 Apple
1 2005 60000 Banana
2 2006 20000 Pineapple
3 2007 70000 Cherry
4 2007 60000 Coconut
5 2007 40000 Pear
6 2008 90000 Grape
7 2008 10000 Apricot
I would like to create a stacked bar plot of this data.
However, using the df.groupby() function will only allow me to call a function such as .mean() or .count() on this data in order to plot the data by year. I am getting the following result which separates each data point and does not group them by the shared year.
I have seen the matplotlib example for stacked bar charts, but those are grouped by a common index, and in this case I do not have a common index I want to plot by. Is there a way to group and plot this data without rearranging the entire dataframe?
If I understood you correctly, you could do this using pivot first:
df1 = pd.pivot_table(df, values='Count', index='Year', columns='Name')
df1.plot(kind='bar')
Output:
Or with the argument stacked=True:
df1.plot(kind='bar', stacked=True)
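A runnable version on the sample data from the question is sketched below. Note that pd.pivot_table aggregates with the mean by default; here every Year/Name pair occurs only once, so the result is the same, but passing aggfunc="sum" and fill_value=0 (both my additions) makes the intent explicit and replaces the missing combinations with 0.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# The sample data from the question.
df = pd.DataFrame(
    {
        "Year": [2005, 2005, 2006, 2007, 2007, 2007, 2008, 2008],
        "Count": [70000, 60000, 20000, 70000, 60000, 40000, 90000, 10000],
        "Name": ["Apple", "Banana", "Pineapple", "Cherry",
                 "Coconut", "Pear", "Grape", "Apricot"],
    }
)

# One row per year, one column per name; absent combinations become 0.
df1 = df.pivot_table(
    values="Count", index="Year", columns="Name", aggfunc="sum", fill_value=0
)

# One stacked bar per year, one colored segment per name.
ax = df1.plot(kind="bar", stacked=True)
ax.set_ylabel("Count")
```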
