I have a temperature file with many years of temperature records, in the format below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Each year has a different number of records, taken at different times, so the pandas DatetimeIndex differs from year to year.
I want to plot each year's data in the same figure for comparison. The x-axis should run from Jan to Dec and the y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run the commands inside the same cell! For me at least, it won't work when they are split across sequential cells.
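A minimal sketch of the shared-axes approach, assuming two small made-up DataFrames (df1, df2 and their column names are illustrative and not taken from the question). Creating the Axes explicitly with plt.subplots() and passing it to both plot calls keeps everything on one figure; in a notebook the whole snippet would go in a single cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Two hypothetical DataFrames with different indices
df1 = pd.DataFrame({"temp_a": [20.6, 20.9, 20.6]}, index=[1, 2, 3])
df2 = pd.DataFrame({"temp_b": [5.4, 20.6, 17.5]}, index=[2, 4, 6])

fig, ax = plt.subplots()
df1.plot(ax=ax)  # draws on the shared Axes
df2.plot(ax=ax)  # adds a second line to the same Axes
ax.set_ylabel("temperature")
```

Call plt.show() (or fig.savefig(...)) afterwards if the figure is not displayed automatically.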
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year, then aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year        2005  2006       2007  2012
Month
4            NaN  20.6        NaN  20.7
5            NaN   NaN  15.533333   NaN
8      19.566667   NaN        NaN   NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
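As a self-contained sketch of the steps above, the snippet below reads the sample rows from the question out of an in-memory string (io.StringIO stands in for the file, which is an assumption), builds the month-by-year table both ways, and then reindexes to months 1 to 12 so the x-axis spans Jan to Dec. The reindex and the month-name tick labels are my additions, not part of the original answer.

```python
import calendar
import io

import pandas as pd

raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""
df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"],
                 parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Two equivalent ways to get months as rows and years as columns
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()
dfp = df.pivot_table(index="Month", columns="Year", values="value",
                     aggfunc="mean")

# Reindex to all 12 months so the x-axis always runs Jan..Dec
full = dfg.reindex(range(1, 13))
ax = full.plot(marker=".", xticks=list(range(1, 13)))
ax.set_xticklabels(list(calendar.month_abbr)[1:])  # month names as labels
ax.set_ylabel("Temperature")
```

Months without data stay NaN after the reindex, so they simply leave gaps in the corresponding line.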
To do this for multiple dataframes, you can do a for loop over them:
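import matplotlib.pyplot as plt
fig = plt.figure(num=None, figsize=(10, 8))
# plot the first DataFrame to create the Axes, then reuse it for the rest
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)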
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
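A runnable sketch of the loop, assuming a hypothetical dict_of_dfs whose keys are year labels and whose frames share a "value" column (both the keys and the numbers are made up). Passing label= gives each line its own legend entry.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dictionary of DataFrames, one per year
dict_of_dfs = {
    "2005": pd.DataFrame({"value": [19.5, 20.1, 18.7]}, index=[1, 2, 3]),
    "2006": pd.DataFrame({"value": [20.6, 21.0, 19.9]}, index=[1, 2, 3]),
}

fig, ax = plt.subplots()
for name, frame in dict_of_dfs.items():
    # label= sets the legend entry so each line is identifiable
    frame["value"].plot(ax=ax, label=name)
ax.legend()
```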
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)
I'm trying to learn how to plot dataframes. I read in a csv and have the following columns:
cost, model, origin, year
--------------------------
200 x1 usa 2020
145 x1 chn 2020
233 x1 usa 2020
122 x2 chn 2020
583 x2 usa 2020
233 x3 chn 2020
201 x3 chn 2020
I'm trying to create a bar plot and only want to plot the average cost per model.
Here's my attempt, but I don't think I'm on the right track:
df = df.groupby('cost').mean()
plt.bar(df.index, df['model'])
plt.show()
You can groupby model, then calculate the mean of cost and plot it:
df.groupby('model')['cost'].mean().plot.bar()
Output:
Or with seaborn:
sns.barplot(data=df, x='model', y='cost', ci=None)
Output:
You can use the pandas plot function like so (note that this draws one bar per row, so aggregate first if you want the per-model average):
df.plot.bar(x='model', y='cost')
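For reference, a self-contained sketch of the groupby approach using the rows from the question. The attempt in the question groups by 'cost', which makes each distinct cost value its own group; grouping by 'model' and averaging 'cost' gives one bar per model. The variable name mean_cost is just illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "cost": [200, 145, 233, 122, 583, 233, 201],
    "model": ["x1", "x1", "x1", "x2", "x2", "x3", "x3"],
    "origin": ["usa", "chn", "usa", "chn", "usa", "chn", "chn"],
    "year": [2020] * 7,
})

# Group by the category, then average the numeric column
mean_cost = df.groupby("model")["cost"].mean()

fig, ax = plt.subplots()
ax.bar(mean_cost.index, mean_cost.values)  # one bar per model
ax.set_xlabel("model")
ax.set_ylabel("average cost")
```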
Below is script for a simplified version of the df in question:
import pandas as pd
df = pd.DataFrame({
    'week': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
    'month' : ['JAN','JAN','JAN','JAN','FEB','FEB','FEB','FEB','MAR','MAR',
               'MAR','MAR','APR','APR','APR','APR','APR'],
    'weekly_stock' : [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
})
df
week month weekly_stock
0 1 JAN 4
1 2 JAN 2
2 3 JAN 5
3 4 JAN 6
4 5 FEB 2
5 6 FEB 3
6 7 FEB 6
7 8 FEB 8
8 9 MAR 7
9 10 MAR 9
10 11 MAR 5
11 12 MAR 3
12 13 APR 5
13 14 APR 4
14 15 APR 5
15 16 APR 8
16 17 APR 9
As it currently stands, the script below produces a line chart with the week as x-labels:
import matplotlib.pyplot as plt
# plot chart
labels=df.week
line=df['weekly_stock']
x = range(len(labels))  # tick positions, one per row
fig, ax = plt.subplots(figsize=(20,8))
line1=plt.plot(line, label = '2019')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=0)
ax.set_ylabel('Stock')
ax.set_xlabel('week')
plt.title('weekly stock')
However, I would like to have the month as the x-label.
INTENDED PLOT:
Any help would be greatly appreciated.
My recommendation is to use a column of valid datetime values instead of separate 'month' and 'week' columns. Matplotlib handles valid datetime values well, so I'd structure the dates like this first:
import pandas as pd
import matplotlib.pyplot as plt
# valid datetime values in a range
dates = pd.date_range(
    start='2019-01-01',
    end='2019-04-30',
    freq='W', # weekly increments
    name='dates',
    inclusive='left' # pandas >= 1.4; older versions used closed='left'
)
weekly_stocks = [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
df = pd.DataFrame(
    {'weekly_stocks': weekly_stocks},
    index=dates # set dates column as index
)
df.plot(
    figsize=(20,8),
    kind='line',
    title='Weekly Stocks',
    legend=False,
    xlabel='Week',
    ylabel='Stock'
)
plt.grid(which='both', linestyle='--', linewidth=0.5)
This is a fairly simple solution. Notice that the ticks appear exactly where the weeks are; Matplotlib did all the work for us!
You can either:
(easier) Lay the "data foundation" correctly before plotting, i.e. format the data so that Matplotlib does all the work, as above (think of the ticks as the actual date points created by pd.date_range()).
(harder) Use tick locators/formatters, as described in the Matplotlib documentation.
Hope this was helpful.
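For the "harder" route, here is a rough sketch using matplotlib.dates, assuming the same weekly dates and stock values as above. It plots through Matplotlib's own ax.plot rather than DataFrame.plot, on the assumption that this keeps the axis in Matplotlib's date units so the locators behave as expected; treat it as one possible arrangement rather than the only way.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range(start="2019-01-01", end="2019-04-30", freq="W")
weekly_stocks = [4, 2, 5, 6, 2, 3, 6, 8, 7, 9, 5, 3, 5, 4, 5, 8, 9]

fig, ax = plt.subplots(figsize=(20, 8))
# Plot with Matplotlib directly so its date units are used on the x-axis
ax.plot(dates, weekly_stocks, label="2019")

# One labelled major tick per month (Jan, Feb, ...)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
# Unlabelled minor ticks on each Sunday, i.e. at every weekly data point
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))

ax.set_xlabel("Month")
ax.set_ylabel("Stock")
ax.set_title("Weekly stock")
```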
I have the following data frame:
Parameters Year 2016 Year 2017 Year 2018....
0) X 10 12 13
1) Y 12 12 45
2) Z 89 23 97
3
.
.
.
I want to make a line chart with the column headers starting from Year 2016 to be on the x-axis and each line on the chart to represent each of the parameters - X, Y, Z
I am using the matplotlib library to make the plot but it is throwing errors.
Given this dataframe:
df = pd.DataFrame({'Parameters':['X','Y','Z'],
'Year 2016':[10,12,13],
'Year 2017':[12,12,45],
'Year 2018':[89,23,97]})
Input Dataframe:
Parameters Year 2016 Year 2017 Year 2018
0 X 10 12 89
1 Y 12 12 23
2 Z 13 45 97
You can use some dataframe shaping and pandas plot:
df_out = df.set_index('Parameters').T
# set_axis returns a new object; the inplace argument was removed in pandas 2.0
df_out.set_axis(pd.to_datetime(df_out.index, format='Year %Y'), axis=0)\
    .plot()
Output Graph:
If you have a pandas DataFrame, let's call it df, for which your columns are X, Y, Z, etc. and your rows are the years in order, you can simply call df.plot() to plot each column as a line with the y axis being the values and the row name giving the x-axis.
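To make that concrete, a small sketch using the example frame from this thread: transposing puts the years in the rows and the parameters in the columns, after which a plain plot() draws one line per parameter. Converting the "Year 2016" labels to integers is my own simplification, offered as an alternative to parsing them as datetimes.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Parameters": ["X", "Y", "Z"],
    "Year 2016": [10, 12, 13],
    "Year 2017": [12, 12, 45],
    "Year 2018": [89, 23, 97],
})

# Rows become years, columns become parameters
df_out = df.set_index("Parameters").T
# Strip the "Year " prefix so the index holds plain integers
df_out.index = df_out.index.str.replace("Year ", "", regex=False).astype(int)

ax = df_out.plot(marker="o", xticks=df_out.index)
ax.set_xlabel("Year")
```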
I'm trying to create a graph that shows whether or not average temperatures in my city are increasing. I'm using data provided by NOAA and have a DataFrame that looks like this:
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1939-11 53.1 11 1939
5 1939-12 52.5 12 1939
This is saved in a variable called "avgs", and I then use groupby and plot functions like so:
avgs.groupby(["YEAR"]).plot(kind='line',x='MONTH', y='TAVG')
This produces a line graph (see below for example) for each year that shows the average temperature for each month. That's great stuff, but I'd like to be able to put all of the yearly line graphs into one graph, for the purposes of visual comparison (to see if the monthly averages are increasing).
Example output
I'm a total noob with matplotlib and pandas, so I don't know the best way to do this. Am I going wrong somewhere and just don't realize it? And if I'm on the right track, where should I go from here?
Very similar to the other answer (by Anake), but here you get control over the legend (in the other answer, the legend entry for every year will be "TAVG"). I added entries for a new year to your data just to show this.
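import io
import pandas as pd
import matplotlib.pyplot as plt
avgs = '''
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1940-11 53.1 11 1940
5 1940-12 52.5 12 1940
'''
# parse the text into a DataFrame; the unnamed first column becomes the index
avgs = pd.read_csv(io.StringIO(avgs), sep=r'\s+')
ax = plt.subplot()
for key, group in avgs.groupby("YEAR"):
    ax.plot(group.MONTH, group.TAVG, label = key)
ax.set_xlabel('Month')
ax.set_ylabel('TAVG')
plt.legend()
plt.show()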
will result in
You can do:
ax = None
for group in df.groupby("YEAR"):
    ax = group[1].plot(x="MONTH", y="TAVG", ax=ax)
plt.show()
Each plot() returns the matplotlib Axes instance where it drew the plot. So by feeding that back in each time, you can repeatedly draw on the same set of axes.
Unfortunately, I don't think you can do that directly in the functional style you tried.
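Another option, sketched here as an alternative to the loop, is to pivot the data so that each year becomes a column and then call plot() once; the legend entries are then the years. The frame below is a small made-up sample in the same layout as the question's data (the 1940 values in particular are invented for illustration).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample in the same layout as the question's data
avgs = pd.DataFrame({
    "MONTH": [7, 8, 9, 7, 8, 9],
    "YEAR": [1939, 1939, 1939, 1940, 1940, 1940],
    "TAVG": [86.0, 84.8, 82.2, 85.1, 83.9, 80.4],
})

# One row per month, one column per year
wide = avgs.pivot(index="MONTH", columns="YEAR", values="TAVG")
ax = wide.plot(marker="o")  # one line per year on a single Axes
ax.set_ylabel("TAVG")
```

pivot requires each (MONTH, YEAR) pair to appear only once; with duplicates, pivot_table with an aggregation function would be the closer fit.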
My data is organized like this:
The country code is the index of the data frame and the columns are the years for the data. First, is it possible to plot line graphs (using matplotlib.pyplot) over time for each country without transforming the data any further?
Second, if the above is not possible, how can I make the columns the index of the table so I can plot time series line graphs?
Trying df.t gives me this:
How can I make the dates the index now?
Transpose using df.T.
Plot as usual.
Sample:
import pandas as pd
df = pd.DataFrame({1990:[344,23,43], 1991:[234,64,23], 1992:[43,2,43]}, index = ['AFG', 'ALB', 'DZA'])
df = df.T
df
AFG ALB DZA
1990 344 23 43
1991 234 64 23
1992 43 2 43
# transform index to dates
import datetime as dt
df.index = [dt.date(year, 1, 1) for year in df.index]
import matplotlib.pyplot as plt
df.plot()
plt.savefig('test.png')
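A variation on the sample above, in case a proper DatetimeIndex is preferred over a list of datetime.date objects: pd.to_datetime can parse the years once they are converted to strings. The name df_t is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {1990: [344, 23, 43], 1991: [234, 64, 23], 1992: [43, 2, 43]},
    index=["AFG", "ALB", "DZA"],
)

df_t = df.T
# Convert the integer years to timestamps (1 January of each year)
df_t.index = pd.to_datetime(df_t.index.astype(str), format="%Y")
ax = df_t.plot()  # one line per country, dates on the x-axis
```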