Plotting a pandas DataFrame column with only the year on the x-axis - python

I have a pandas DataFrame named 'epu' with three columns: 'year', 'month' and 'score'. It looks like this:
     year  month       score
0    1970      1  125.224739
1    1970      1   99.020813
2    1970      1  112.190506
..    ...    ...         ...
447  2022      4  154.661957
448  2022      5  168.034912
449  2022      6  143.154816
As can be seen, the years range from 1970 to 2022. I want to plot 'score' on the y-axis and 'year' on the x-axis.
epu['score'].plot()
gives me the following graph. The score line looks right, but I don't get year labels on the x-axis.
However, plotting with
epu.plot(x='year',y='score')
gives me a strange-looking graph like the one below.
So my question is: how can I generate the graph in the first picture with the year labels from the x-axis of the second picture? Do I need some advanced matplotlib features? I searched for answers here, but I might be missing something and couldn't find one for my problem.
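One possible approach, sketched under assumptions: the second plot probably looks strange because each 'year' value repeats for every month, so many points share the same x position. Combining 'year' and 'month' into a single datetime gives every row its own x position, and the tick labels are then derived from the dates. The sample rows below are made up for illustration, and the 'date' column name is an assumption, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the 'epu' frame: one row per month
epu = pd.DataFrame({
    "year": [1970] * 12 + [1971] * 12,
    "month": list(range(1, 13)) * 2,
    "score": [100.0 + i for i in range(24)],
})

# Combine year and month into one datetime column (day fixed to 1)
epu["date"] = pd.to_datetime(epu[["year", "month"]].assign(day=1))

# Plot score against the datetime index instead of the repeated year values
ax = epu.set_index("date")["score"].plot()
ax.set_xlabel("year")
```

With the real data, only the two lines that build 'date' and call plot() would be needed.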

Related

How to compare columns in a data frame

I'm trying to visually compare two columns in a data frame, but the result comes out as a weird chart with 'frequency' in place of one of the columns.
I tried these options:
ct1=pd.crosstab(df['releaseyear'],df['score'],normalize=True)
ct1.plot()
df.plot( x='releaseyear', y='score', kind='hist')
I also tried a scatter plot, which gets the x and y right, but I don't know how to normalize it so that it shows only the average for each year rather than all the data:
plt.scatter(df['releaseyear'], df['score'])
plt.show()
The question includes no data that could be used to reproduce the dataframe, nor any clue about what the dataframe looks like.
This answer is based on what I understood. If the data looks like this:
year score
2001 20
2001 18
2002 12
2002 16
then first use groupby to group the data by year and apply the required aggregate function:
df=df.groupby('year').mean().reset_index()
Output:
year score
0 2001 19.0
1 2002 14.0
You can then plot the data accordingly.
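A minimal sketch of the whole sequence, using the four sample rows above. The variable name yearly and the choice of a line plot with markers are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Sample rows from the answer above
df = pd.DataFrame({"year": [2001, 2001, 2002, 2002],
                   "score": [20, 18, 12, 16]})

# One row per year holding the mean score
yearly = df.groupby("year", as_index=False)["score"].mean()

# Plot the yearly averages rather than every raw data point
ax = yearly.plot(x="year", y="score", marker="o")
```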

Stacked bar plot of large data in python

I would like to plot a stacked bar plot from a csv file in python. I have three columns of data:
year word frequency
2018 xyz 12
2017 gfh 14
2018 sdd 10
2015 fdh 1
2014 sss 3
2014 gfh 12
2013 gfh 2
2012 gfh 4
2011 wer 5
2010 krj 4
2009 krj 4
2019 bfg 4
... 300+ rows of data.
I need to go through all the data and plot a stacked bar plot categorized by year: the x-axis is the word, the y-axis is the frequency, and the legend color shows the year. I want to see how each word evolved from year to year. Some of the technology words are used in every year, so the stacked bar graph should add the values on top of each other. For example, the word gfh first plots 14 for the year 2017, and then for the year 2014 I want gfh to plot (in a different color) a value of 12 on top of the 2017 segment.
How do I do this? So far I have loaded the csv file in my code, but I don't understand how to go over all the rows and stack the words appropriately (as some words repeat through all the years). The years were in random order in the csv, but I sorted them by year to make it easier.
I am just learning python and trying to understand this plotting routine, since I have 40 years of data and ~20 words. I thought a stacked bar plot would be the best way to represent them, but any other visualisation method is also welcome. Any help is highly appreciated.
This can be done using pandas:
import pandas as pd
df = pd.read_csv("file.csv")
# Aggregate data
df = df.groupby(["word", "year"], as_index=False).agg({"frequency": "sum"})
# Create list to sort by
sorter = (
df.groupby(["word"], as_index=False)
.agg({"frequency": "sum"})
.sort_values("frequency")["word"]
.values
)
# Pivot, reindex, and plot
df = df.pivot(index="word", columns="year", values="frequency")
df = df.reindex(sorter)
df.plot.bar(stacked=True)
Which outputs:
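For comparison, a more compact sketch that is not part of the original answer: pivot_table can aggregate and pivot in one call, and fill_value=0 replaces the gaps for words that do not occur in a given year. The rows below are a subset of the sample data from the question, and the variable name table is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# A few rows taken from the question's sample data
df = pd.DataFrame({
    "year": [2018, 2017, 2018, 2014, 2014, 2013],
    "word": ["xyz", "gfh", "sdd", "sss", "gfh", "gfh"],
    "frequency": [12, 14, 10, 3, 12, 2],
})

# One row per word, one column per year, summed frequencies, 0 where missing
table = df.pivot_table(index="word", columns="year",
                       values="frequency", aggfunc="sum", fill_value=0)

# Order the words by their total frequency across all years
table = table.loc[table.sum(axis=1).sort_values().index]

# Each bar is a word; each colored segment is one year
ax = table.plot.bar(stacked=True)
```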

How to handle Datetime data with Pandas when grouping by

I have a question. I am dealing with a Datetime DataFrame in Pandas. I want to perform a count on a particular column and group by the month.
For example:
df.groupby(df.index.month)["count_interest"].count()
Assuming that I am analyzing data starting from December 2019, I get a result like this:
date
1 246
2 360
3 27
12 170
In reality, December 2019 is supposed to come first. What can I do? When I plot the frame grouped by month, December 2019 shows up last, which is incorrect.
See the plot below:
You can try reindex:
df.groupby(df.index.month)["count_interest"].count().reindex([12,1,2,3])
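An alternative sketch that avoids hard-coding the month order: grouping by year-month periods instead of the bare month number keeps the groups in chronological order, so December 2019 sorts before January 2020. The sample dates and counts below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data spanning December 2019 to March 2020
idx = pd.to_datetime(["2019-12-05", "2019-12-20", "2020-01-03",
                      "2020-02-10", "2020-02-11", "2020-03-01"])
df = pd.DataFrame({"count_interest": [1, 1, 1, 1, 1, 1]}, index=idx)

# Group by year-month period so the year is part of the sort key
monthly = df.groupby(df.index.to_period("M"))["count_interest"].count()

# Bars now appear in chronological order
ax = monthly.plot(kind="bar")
```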

LinePlot with Seaborn: View Limit Minimum Error

I have the following dataframe:
Month Year Location Revenue
0 2015-01 Location 1 0.00
1 2015-03 Location 1 1105.50
2 2015-04 Location 1 44034.28
3 2015-05 Location 1 56756.39
4 2015-06 Location 1 51502.22
There are about two years' worth of data and 5 different locations. I want to create a lineplot with seaborn that shows 5 different lines (one for each location), with Revenue on the y-axis and Month Year on the x-axis.
sns.lineplot(x="Month Year",
y="Revenue",
hue="Location",
data=rev_by_month,
palette="tab10")
When I run the code above, however, I receive the following error:
view limit minimum -0.05500000000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
For the record, the Month Year column was created using the pandas .to_datetime() function.
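The question does not show enough to pin down the cause of the error, so the following is only a possible workaround and not a confirmed fix: pivot the frame so that each location becomes its own column, then let pandas draw one line per column. The sample values are invented, and the variable name wide is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for rev_by_month with two locations
rev_by_month = pd.DataFrame({
    "Month Year": pd.to_datetime(["2015-01-01", "2015-03-01",
                                  "2015-01-01", "2015-03-01"]),
    "Location": ["Location 1", "Location 1", "Location 2", "Location 2"],
    "Revenue": [0.00, 1105.50, 250.00, 980.25],
})

# One column per location, indexed by month
wide = rev_by_month.pivot(index="Month Year", columns="Location", values="Revenue")

# pandas draws one line per column on a shared date axis
ax = wide.plot()
ax.set_ylabel("Revenue")
```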

A convenient way to plot bar-plot in Python pandas

I have a DataFrame with the following contents, where the first row holds the column names:
id,year,type,sale
1,1998,a,5
2,2000,b,10
3,1999,c,20
4,2001,b,15
5,2001,a,25
6,1998,b,5
...
I want to draw two figures. The first one is like
The second one is like
The figures in my draft might not be to scale. I am a newbie to Python and I understand that its plotting functionality is powerful. I believe there must be a very easy way to plot such figures.
The Pandas library provides simple and efficient tools to analyze and plot DataFrames.
The steps below assume that the pandas library is installed and that the data are in a .csv file (matching the example you provided).
1. Import the pandas library and load the data
import pandas as pd
data = pd.read_csv('filename.csv')
You now have a pandas DataFrame as follows:
id year type sale
0 1 1998 a 5
1 2 2000 b 10
2 3 1999 c 20
3 4 2001 b 15
4 5 2001 a 25
5 6 1998 b 5
2. Plot the "sale" vs "type"
This is easily achieved by:
data.plot('type', 'sale', kind='bar')
which results in
If you want the sale for each type to be summed, data.groupby('type').sum().plot(y='sale', kind='bar') will do the trick (see #3 for explanation)
3. Plot the "sale" vs "year"
This is basically the same command, except that you first have to sum all the sales in the same year using the pandas groupby function.
data.groupby('year').sum().plot(y='sale', kind='bar')
This will result in
Edit:
4. Unstack the different types per year
You can also split each year's bar into its different 'type' values by using groupby on two variables and then unstacking:
data.groupby(['year', 'type']).sum().unstack().plot(y='sale', kind='bar', stacked=True)
Note:
See the Pandas Documentation on visualization for more information about achieving the layout you want.
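As a self-contained check of steps 3 and 4, here is a sketch that builds the six sample rows inline instead of reading a .csv file. Selecting the 'sale' column before summing is a small variation on the commands above, and the variable names per_year and per_year_type are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# The six sample rows from the question
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "year": [1998, 2000, 1999, 2001, 2001, 1998],
    "type": ["a", "b", "c", "b", "a", "b"],
    "sale": [5, 10, 20, 15, 25, 5],
})

# Step 3: total sale per year
per_year = data.groupby("year")["sale"].sum()

# Step 4: total sale per year and type, with one column per type
per_year_type = data.groupby(["year", "type"])["sale"].sum().unstack(fill_value=0)

# Stacked bars: one bar per year, one colored segment per type
ax = per_year_type.plot(kind="bar", stacked=True)
```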
