Multiple column plotting Python - python

I've got data in the form:
Year Month State Value
2001 Jan AK 80
2001 Feb AK 40
2001 Mar AK 60
2001 Jan LA 70
2001 Feb LA 79
2001 Mar LA 69
2001 Jan KS 65
.
.
This data is only for Year 2001 and Months repeat on each State.
I want a basic graph that shows all of this data together, with one line per State, Month on the X-axis and Value on the Y-axis.
When I plot with:
g = df.groupby('State')
for state, data in g:
    plt.plot(df['Month'], df['Value'], label=state)
plt.show()
I get a very wonky looking graph.
From plotting the states individually, I know their behaviour isn't extremely different, but the lines should not overlap anywhere near this much.
Is there a way of building more of a continuous plot?

Your problem is that inside your for loop you're referencing df, which still has the data for all the states, instead of data, which holds only the rows for the current state. Try:
for state, data in g:
    plt.plot(data['Month'], data['Value'], label=state)
plt.legend()
plt.show()
Hopefully this helps!
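For anyone who wants to try the fix end to end, here is a minimal self-contained sketch. It assumes only the complete AK and LA rows from the question, and the Agg backend is my own choice so that it runs without a display; drop that line and call plt.show() at the end in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (only the complete AK and LA months)
df = pd.DataFrame({
    "Year": [2001] * 6,
    "Month": ["Jan", "Feb", "Mar"] * 2,
    "State": ["AK"] * 3 + ["LA"] * 3,
    "Value": [80, 40, 60, 70, 79, 69],
})

fig, ax = plt.subplots()
for state, data in df.groupby("State"):
    # Plot only this state's rows, not the whole DataFrame
    ax.plot(data["Month"], data["Value"], marker="o", label=state)
ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.legend()
```

Each pass through the loop adds one line to the same axes, so the result should be one line per state.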

Related

Change X-axis for timeseries plot in Python

So I have a dataframe like this:
YM       YQMD          YQM        Year  Quarter  Month  Day  srch_id
2012-11  2012-4-11-01  2012-4-11  2012  4        11     01   3033780585
2012-11  2012-4-11-02  2012-4-11  2012  4        11     02   2812558229
..
2013-06  2013-2-06-26  2013-2-06  2013  2        06     26   5000321400
2013-06  2013-2-06-27  2013-2-06  2013  2        06     27   3953504722
Now I want to plot a lineplot. I did it with this code:
#plot lineplot
sns.set_style('darkgrid')
sns.set(rc={'figure.figsize':(14,8)})
ax = sns.lineplot(data=df_monthly, x ='YQMD', y = 'click_bool')
plt.ylabel('Number of Search Queries')
plt.xlabel('Year-Month')
plt.show()
This is the plot that I got. As you can see, the dates on the x-axis are unreadable because there are too many of them.
The pattern of the plot is exactly what I want, but is it possible to get a less crowded x-axis, for example one label per Year-Month?
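One way this is often handled, offered as a sketch rather than a confirmed fix for this exact DataFrame: convert the date parts to a real datetime column and let matplotlib place one tick per month. The frame df_demo below is a hypothetical stand-in built from the four rows shown above, srch_id is used as the y value only because click_bool is not in the sample, and the Agg backend is my own choice so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The four rows visible in the question; df_demo is a hypothetical stand-in
df_demo = pd.DataFrame({
    "Year": ["2012", "2012", "2013", "2013"],
    "Month": ["11", "11", "06", "06"],
    "Day": ["01", "02", "26", "27"],
    "srch_id": [3033780585, 2812558229, 5000321400, 3953504722],
})

# Build a real datetime column so ticks can be placed by calendar month
df_demo["date"] = pd.to_datetime(
    df_demo["Year"] + "-" + df_demo["Month"] + "-" + df_demo["Day"],
    format="%Y-%m-%d",
)

fig, ax = plt.subplots(figsize=(14, 8))
ax.plot(df_demo["date"], df_demo["srch_id"])
ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))  # label as Year-Month
ax.set_xlabel("Year-Month")
ax.set_ylabel("Number of Search Queries")
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

With string labels every row gets its own tick, which is what crowds the axis; with datetimes the locator decides how many ticks to draw.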

How to produce a new data frame of mean monthly data, given a data frame consisting of daily data?

I have a data frame containing the daily CO2 data since 2015, and I would like to produce the monthly mean data for each year, then put this into a new data frame. A sample of the data frame I'm using is shown below.
month day cycle trend
year
2011 1 1 391.25 389.76
2011 1 2 391.29 389.77
2011 1 3 391.32 389.77
2011 1 4 391.36 389.78
2011 1 5 391.39 389.79
... ... ... ... ...
2021 3 13 416.15 414.37
2021 3 14 416.17 414.38
2021 3 15 416.18 414.39
2021 3 16 416.19 414.39
2021 3 17 416.21 414.40
I plan on using something like the code below to create the new monthly mean data frame, but the main problem I'm having is indicating the specific subset for each month of each year so that the mean can then be taken for this. If I could highlight all of the year "2015" for the month "1" and then average this etc. that might work?
Any suggestions would be hugely appreciated and if I need to make any edits please let me know, thanks so much!
dfs = list()
for l in L:
    dfs.append(refined_data[index = 2015, "month" = 1. day <=31].iloc[l].mean(axis=0))
mean_matrix = pd.concat(dfs, axis=1).T
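A possible alternative to selecting each month by hand is to group by year and month and take the mean of every group in one step. The sketch below assumes the layout shown in the sample (year as the index, then month, day, cycle, trend); the first three rows are from the question and the rest are made up so that there is more than one group.

```python
import pandas as pd

# First three rows are from the question; the remaining rows are made up
refined_data = pd.DataFrame(
    {
        "month": [1, 1, 1, 2, 2, 1],
        "day": [1, 2, 3, 1, 2, 1],
        "cycle": [391.25, 391.29, 391.32, 392.00, 392.10, 393.50],
        "trend": [389.76, 389.77, 389.77, 389.90, 389.92, 391.80],
    },
    index=pd.Index([2011, 2011, 2011, 2011, 2011, 2012], name="year"),
)

# Move 'year' out of the index, then average cycle and trend
# within every (year, month) pair
monthly_mean = (
    refined_data
    .reset_index()
    .groupby(["year", "month"], as_index=False)[["cycle", "trend"]]
    .mean()
)
print(monthly_mean)
```

The result is a new data frame with one row per year and month, so no explicit loop or per-month subset is needed.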

Plotting multiple lines in one graph with pandas and matplotlib, using climate data

I'm trying to create a graph that shows whether or not average temperatures in my city are increasing. I'm using data provided by NOAA and have a DataFrame that looks like this:
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1939-11 53.1 11 1939
5 1939-12 52.5 12 1939
This is saved in a variable called "avgs", and I then use groupby and plot functions like so:
avgs.groupby(["YEAR"]).plot(kind='line',x='MONTH', y='TAVG')
This produces a line graph (see below for example) for each year that shows the average temperature for each month. That's great stuff, but I'd like to be able to put all of the yearly line graphs into one graph, for the purposes of visual comparison (to see if the monthly averages are increasing).
Example output
I'm a total noob with matplotlib and pandas, so I don't know the best way to do this. Am I going wrong somewhere and just don't realize it? And if I'm on the right track, where should I go from here?
This is very similar to the other answer (by Anake), but here you get control over the legend; in the other answer, the legend entry for every year will be "TAVG". I added entries for a new year to your data just to show this.
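import io
import pandas as pd
import matplotlib.pyplot as plt

raw = '''
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1940-11 53.1 11 1940
5 1940-12 52.5 12 1940
'''
# Parse the whitespace-separated sample; the unnamed first column becomes the index
avgs = pd.read_csv(io.StringIO(raw), sep=r'\s+')

ax = plt.subplot()
for key, group in avgs.groupby("YEAR"):
    # One line per year, labelled with the year itself
    ax.plot(group.MONTH, group.TAVG, label=key)
ax.set_xlabel('Month')
ax.set_ylabel('TAVG')
plt.legend()
plt.show()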
will result in a single plot with one labelled line per year.
You can do:
ax = None
for group in df.groupby("YEAR"):
    ax = group[1].plot(x="MONTH", y="TAVG", ax=ax)
plt.show()
Each plot() returns the matplotlib Axes instance where it drew the plot. So by feeding that back in each time, you can repeatedly draw on the same set of axes.
I don't think you can do that directly in the functional style as you have tried unfortunately.
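For what it's worth, a loop-free variant that may suit the same goal is to reshape the data so each year becomes its own column and then call plot() once. This is a sketch of a different technique (pivoting), not what either answer above does; the 1939 rows are from the question, the 1940 values are made up so the two years share months, and the Agg backend is my own choice so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# 1939 rows are from the question; the 1940 values are made up
avgs = pd.DataFrame({
    "MONTH": [7, 8, 9, 7, 8, 9],
    "YEAR": [1939, 1939, 1939, 1940, 1940, 1940],
    "TAVG": [86.0, 84.8, 82.2, 85.1, 83.9, 81.0],
})

# One row per month, one column per year
wide = avgs.pivot(index="MONTH", columns="YEAR", values="TAVG")

# DataFrame.plot draws one line per column on a single set of axes
ax = wide.plot(kind="line")
ax.set_ylabel("TAVG")
```

Because the columns are the years, the legend entries come out as the years rather than "TAVG".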

Matplotlib Plot confusion

I'm not sure what is going on here but I have two seemingly similar bits of code designed to produce graphs in the same format:
apple_fcount = apple_f1.groupby("Year")["Domain Category"].nunique("count")
plt.figure(1); apple_fcount.plot(figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Number of Fungicides used')
plt.title('Number of fungicides used on apples in the US')
plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/Apple/apple fcount')
This one produces the graph how I would like it to be seen; y axis shows number of fungicides, x axis shows the respective years. However, the following code, on a different dataset prints a usable graph but the x axis shows years as '1, 2, 3, ...' instead of the actual years.
apple_yplot = apple_y1.groupby('Year')['Value'].sum()
plt.figure(3); apple_yplot.plot(figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Yield / lb per acre')
plt.title('Graph of apple yield in the US over time')
plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/Apple/Yield.png')
The only discernible difference I see in the code is that the first counts datapoints with .nunique(), whilst the second is a .sum() of all data in the year. I can't imagine that's the reason behind this issue though. Both .groupby() lines print in the same format, with the year being correctly displayed there.
apple_fcount:
Year
1991 19
1993 19
1995 21
1997 26
1999 28
2001 27
2003 31
2005 37
2007 30
2009 35
2011 32
Name: Domain Category, dtype: int64
apple_yplot:
Year
2007 405399
2008 541180
2009 483130
2010 473150
2011 468120
2012 417710
2013 529470
2014 510700
Name: Value, dtype: float64
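One possible cause, and this is a guess because the plot itself is not shown, is matplotlib's offset notation: for closely spaced values such as 2007 to 2014 the axis can show small numbers plus an offset like "+2.007e3" in the corner. A sketch of a workaround under that assumption is to pin the ticks to the actual years and switch the offset off. The values are copied from the apple_yplot printout above, and the Agg backend is my own choice so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the apple_yplot printout in the question
apple_yplot = pd.Series(
    [405399, 541180, 483130, 473150, 468120, 417710, 529470, 510700],
    index=pd.Index(range(2007, 2015), name="Year"),
    name="Value",
    dtype="float64",
)

ax = apple_yplot.plot(figsize=(12, 6))
ax.set_xticks(list(apple_yplot.index))  # one tick per year in the data
# Show full tick values instead of small numbers plus an offset
ax.ticklabel_format(axis="x", style="plain", useOffset=False)
ax.set_xlabel("Year")
ax.set_ylabel("Yield / lb per acre")
```

If the first dataset plots correctly without this, that would be consistent with its years being two apart and spread over a wider range.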

pandas complicated stacked barplot

I have the following data:
Year LandUse Region Area
0 2005 Corn LP 2078875
1 2005 Corn UP 149102.4
2 2005 Open Lands LP 271715
3 2005 Open Lands UP 232290.1
4 2005 Soybeans LP 1791342
5 2005 Soybeans UP 50799.12
6 2005 Other Ag LP 638010.4
7 2005 Other Ag UP 125527.2
8 2005 Forests/Wetlands LP 69629.86
9 2005 Forests/Wetlands UP 26511.43
10 2005 Developed LP 10225.56
11 2005 Developed UP 1248.442
12 2010 Corn LP 2303999
13 2010 Corn UP 201977.2
14 2010 Open Lands LP 131696.3
15 2010 Open Lands UP 45845.81
16 2010 Soybeans LP 1811186
17 2010 Soybeans UP 66271.21
18 2010 Other Ag LP 635332.9
19 2010 Other Ag UP 257439.9
20 2010 Forests/Wetlands LP 48124.43
21 2010 Forests/Wetlands UP 23433.76
22 2010 Developed LP 7619.853
23 2010 Developed UP 707.4816
How do I use pandas to make a stacked bar plot that shows Area on the y-axis, uses Region to construct the stacks, and puts Year and LandUse on the x-axis?
The main thing with pandas plots is figuring out which shape pandas expects the data to be in. If we reshape so that Year is in the index and different regions are in different columns:
# Assuming that we want to sum the areas for different
# LandUse's within each region
plot_table = df.pivot_table(index='Year', columns='Region',
                            values='Area', aggfunc='sum')
plot_table
Out[39]:
Region LP UP
Year
2005 4859797.820 585478.6920
2010 4937958.483 595675.3616
The plotting happens pretty straightforwardly:
plot_table.plot(kind='bar', stacked=True)
Having both Year and LandUse on the x-axis doesn't require much extra work: you can put both in the index when creating the table for plotting:
plot_table = df.pivot_table(index=['Year', 'LandUse'],
                            columns='Region',
                            values='Area', aggfunc='sum')
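To see the two-level index version all the way through to a plot, here is a small runnable sketch. It uses only the Corn and Soybeans rows from the question to stay short, and the Agg backend is my own choice so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the rows from the question (Corn and Soybeans only)
df = pd.DataFrame({
    "Year": [2005, 2005, 2005, 2005, 2010, 2010, 2010, 2010],
    "LandUse": ["Corn", "Corn", "Soybeans", "Soybeans"] * 2,
    "Region": ["LP", "UP"] * 4,
    "Area": [2078875, 149102.4, 1791342, 50799.12,
             2303999, 201977.2, 1811186, 66271.21],
})

# Rows: (Year, LandUse) pairs; columns: one per Region
plot_table = df.pivot_table(index=["Year", "LandUse"], columns="Region",
                            values="Area", aggfunc="sum")

# Each (Year, LandUse) pair becomes one bar, stacked by Region
ax = plot_table.plot(kind="bar", stacked=True)
ax.set_ylabel("Area")
```

Each x-axis label is then a (Year, LandUse) tuple, with the LP and UP areas stacked in the same bar.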
