Manipulate Python Data frame to plot line charts - python

I have the following data frame:
Parameters Year 2016 Year 2017 Year 2018....
0) X 10 12 13
1) Y 12 12 45
2) Z 89 23 97
3
.
.
.
I want to make a line chart with the column headers, starting from Year 2016, on the x-axis and one line on the chart for each of the parameters X, Y, Z.
I am using the matplotlib library to make the plot, but it is throwing errors.

Where given this dataframe:
import pandas as pd
df = pd.DataFrame({'Parameters':['X','Y','Z'],
                   'Year 2016':[10,12,13],
                   'Year 2017':[12,12,45],
                   'Year 2018':[89,23,97]})
Input Dataframe:
Parameters Year 2016 Year 2017 Year 2018
0 X 10 12 89
1 Y 12 12 23
2 Z 13 45 97
You can use some dataframe shaping and pandas plot:
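df_out = df.set_index('Parameters').T
# parse the 'Year 2016' style labels into dates; set_axis returns a new frame
# (the inplace argument was removed in pandas 2.0, so it is not passed here)
df_out.set_axis(pd.to_datetime(df_out.index, format='Year %Y'), axis=0).plot()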
Output Graph:

If you have a pandas DataFrame, let's call it df, for which your columns are X, Y, Z, etc. and your rows are the years in order, you can simply call df.plot() to plot each column as a line with the y axis being the values and the row name giving the x-axis.
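As a minimal sketch of that last suggestion, assuming the values from the answer above: transpose the frame so that the rows are the years, then call plot(). The names wide and by_year are illustrative, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Values copied from the answer above; the variable names are illustrative.
wide = pd.DataFrame({'Parameters': ['X', 'Y', 'Z'],
                     'Year 2016': [10, 12, 13],
                     'Year 2017': [12, 12, 45],
                     'Year 2018': [89, 23, 97]})

# Rows become the years, columns become the parameters.
by_year = wide.set_index('Parameters').T

# One line per column (X, Y, Z); the row labels supply the x-axis.
ax = by_year.plot(marker='o')
ax.set_xlabel('Year')
```

Here the x-axis labels stay as the strings 'Year 2016' and so on; converting them with pd.to_datetime, as in the first answer, gives a true date axis instead.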

Related

Plotting barplot category-wise in pandas

I have a dataframe containing the columns Code, Year and No_of_dues. I want to draw a bar plot with the year on the x-axis and the number of claims for each year on the y-axis, for each code, one subplot after another. Please help me.
Sample data is given below.
Code Year No_of_dues
1 2016 100
1 2017 200
1 2018 300
2 2016 200
2 2017 300
2 2018 500
3 2016 600
3 2017 800
3 2018
Try this one:
df.groupby(['Code', 'Year'])['No_of_dues'].sum().to_frame().plot.bar()
Just use seaborn.
Set your x and y axes, and set hue to the column you want to group by.
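A minimal sketch of the seaborn suggestion, assuming the sample data from the question. The last No_of_dues value is missing in the question, so it is left as NaN here rather than guessed. The Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2016, 2017, 2018] * 3,
    # the final value is absent from the question, so it stays NaN
    'No_of_dues': [100, 200, 300, 200, 300, 500, 600, 800, np.nan],
})

# Year on the x-axis, one bar colour per Code within each year.
ax = sns.barplot(data=df, x='Year', y='No_of_dues', hue='Code')
```

This draws all codes on one set of axes. If one subplot per code is wanted, as the question asks, sns.catplot(data=df, x='Year', y='No_of_dues', col='Code', kind='bar') is one possible variation.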

Why doesn't this grouped data frame show the expected plot?

I have this pandas data frame, from which I want to make a line plot with one line per year:
year month canasta
0 2011 1 239.816531
1 2011 2 239.092353
2 2011 3 239.332308
3 2011 4 237.591538
4 2011 5 238.384231
... ... ... ...
59 2015 12 295.578605
60 2016 1 296.918861
61 2016 2 296.398701
62 2016 3 296.488780
63 2016 4 300.922927
And I tried this code:
dca.groupby(['year', 'month'])['canasta'].mean().reset_index().plot()
But I get this result:
I must be doing something wrong. Could you please help me with this plot? The x-axis should be the months, and there should be one line per year.
Why: After reset_index, year and month become ordinary columns, and some_df.plot() simply plots every column of the dataframe on one plot, which results in what you posted.
Fix: Try unstack instead of reset_index:
(dca.groupby(['year', 'month'])
    ['canasta'].mean()
    .unstack('year').plot()
)
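To see what unstack changes, here is a small self-contained sketch. The dca frame below is a synthetic stand-in with made-up values (the real data is not available), and the names monthly and wide are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for `dca`; the numbers are invented for illustration only.
rng = np.random.default_rng(0)
dca = pd.DataFrame({
    'year': np.repeat([2011, 2012, 2013], 12),
    'month': np.tile(np.arange(1, 13), 3),
    'canasta': 240 + rng.normal(0, 2, 36).cumsum(),
})

# The grouped mean is a Series indexed by (year, month).
monthly = dca.groupby(['year', 'month'])['canasta'].mean()

# unstack('year') moves the year level into the columns:
# the index is now the month and there is one column per year.
wide = monthly.unstack('year')

ax = wide.plot()  # one line per year, months on the x-axis
```

Because plot() draws one line per column against the index, the reshaped frame gives exactly the layout the question asks for.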

line chart with months for x-labels but using weekly data

Below is script for a simplified version of the df in question:
import pandas as pd
df = pd.DataFrame({
    'week': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
    'month' : ['JAN','JAN','JAN','JAN','FEB','FEB','FEB','FEB','MAR','MAR',
               'MAR','MAR','APR','APR','APR','APR','APR'],
    'weekly_stock' : [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
})
df
week month weekly_stock
0 1 JAN 4
1 2 JAN 2
2 3 JAN 5
3 4 JAN 6
4 5 FEB 2
5 6 FEB 3
6 7 FEB 6
7 8 FEB 8
8 9 MAR 7
9 10 MAR 9
10 11 MAR 5
11 12 MAR 3
12 13 APR 5
13 14 APR 4
14 15 APR 5
15 16 APR 8
16 17 APR 9
As it currently stands, the script below produces a line chart with week for x-labels
import matplotlib.pyplot as plt
# plot chart
labels = df.week
line = df['weekly_stock']
x = range(len(labels))  # tick positions; x was undefined in the original, row positions are assumed
fig, ax = plt.subplots(figsize=(20,8))
line1 = plt.plot(line, label='2019')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=0)
ax.set_ylabel('Stock')
ax.set_xlabel('week')
plt.title('weekly stock')
However, I would like to have the month as the x-label.
INTENDED PLOT:
Any help would be greatly appreciated.
My recommendation is to use a single column of valid datetime values instead of the separate 'month' and 'week' columns you have now. Matplotlib is pretty smart when working with valid datetime values, so I'd structure the dates like this first:
import pandas as pd
import matplotlib.pyplot as plt
# valid datetime values in a range
dates = pd.date_range(
    start='2019-01-01',
    end='2019-04-30',
    freq='W',  # weekly increments
    name='dates',
    inclusive='left'  # pandas >= 1.4; older versions spell this closed='left'
)
weekly_stocks = [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
df = pd.DataFrame(
    {'weekly_stocks': weekly_stocks},
    index=dates  # set dates column as index
)
df.plot(
    figsize=(20,8),
    kind='line',
    title='Weekly Stocks',
    legend=False,
    xlabel='Week',
    ylabel='Stock'
)
plt.grid(which='both', linestyle='--', linewidth=0.5)
Now this is a fairly simple solution. Take notice that the ticks appear exactly where the weeks are; Matplotlib did all the work for us!
(easier) Lay the "data foundation" correctly before plotting, i.e., format the data so that Matplotlib does all the work, as we did above (think of the ticks as the actual date points created by pd.date_range()).
(harder) Use tick locators/formatters, as described in the Matplotlib docs.
Hope this was helpful.
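For the "harder" route, a minimal sketch using Matplotlib's own date locators and formatters might look like this. It plots with ax.plot rather than df.plot, and the choice of MonthLocator, WeekdayLocator and the '%b' format is an assumption about what the intended plot should look like, not something stated in the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Same weekly dates and values as in the answer above.
dates = pd.date_range(start='2019-01-01', end='2019-04-30', freq='W')
weekly_stocks = [4, 2, 5, 6, 2, 3, 6, 8, 7, 9, 5, 3, 5, 4, 5, 8, 9]

fig, ax = plt.subplots(figsize=(20, 8))
ax.plot(dates, weekly_stocks)

# Major ticks: one per month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
# Minor ticks: one per week (Sundays), left unlabelled.
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))

ax.set_xlabel('Month')
ax.set_ylabel('Stock')
ax.set_title('Weekly Stocks')
```

The data stays weekly, but only the month boundaries get labels, which is what the question's intended plot describes.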

Stacked area chart with datetime axis

I am attempting to create a Bokeh stacked area chart from the following Pandas DataFrame.
An example of the DataFrame (df) is as follows:
date tom jerry bill
2014-12-07 25 12 25
2014-12-14 15 16 30
2014-12-21 10 23 32
2014-12-28 12 13 55
2015-01-04 5 15 20
2015-01-11 0 15 18
2015-01-18 8 9 17
2015-01-25 11 5 16
The above DataFrame is a snippet of the full df, which spans a number of years and contains more names than the ones shown.
I am attempting to use the datetime column date as the x-axis, with the count information for each name as the y-axis.
Any assistance that anyone could provide would be greatly appreciated.
You can create a stacked area chart by using the patch glyph. I first used df.cumsum to stack the values in the dataframe by row. After that I append two rows to the dataframe with the max and min date and Y value 0. I plot the patches in a reverse order of the column list (excluding the date column) so the person with the highest values is getting plotted first and the persons with lower values are plotted after.
Another implementation of a stacked area chart can be found here.
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.palettes import inferno
from bokeh.models.formatters import DatetimeTickFormatter
df = pd.read_csv('stackData.csv')
df_stack = df[list(df)[1:]].cumsum(axis=1)
df_stack['date'] = df['date'].astype('datetime64[ns]')
names = list(df)[1:]  # every column except 'date'
# two closing rows at the max and min date with y = 0, so each patch
# returns to the baseline (DataFrame.append was removed in pandas 2.0)
closing = pd.DataFrame([
    {'date': df_stack['date'].max(), **dict.fromkeys(names, 0)},
    {'date': df_stack['date'].min(), **dict.fromkeys(names, 0)},
])
df_stack = pd.concat([df_stack, closing], ignore_index=True)
p = figure(x_axis_type='datetime')
p.xaxis.formatter=DatetimeTickFormatter(days=["%d/%m/%Y"])
p.xaxis.major_label_orientation = 45
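people = list(df)[1:]  # every column except the date, so extra names also work
# draw the largest cumulative totals first; legend_label replaces legend= (deprecated in Bokeh 1.4)
for person, color in zip(people[::-1], inferno(len(people))):
    p.patch(x=df_stack['date'], y=df_stack[person], color=color, legend_label=person)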
p.legend.click_policy="hide"
show(p)
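The stacking step in the answer above relies on a row-wise cumulative sum. A minimal sketch on the first two rows of the sample data shows what that produces; the name stacked is illustrative.

```python
import pandas as pd

# First two rows of the sample data from the question.
df = pd.DataFrame({
    'date': ['2014-12-07', '2014-12-14'],
    'tom': [25, 15],
    'jerry': [12, 16],
    'bill': [25, 30],
})

# axis=1 accumulates across the columns of each row:
# tom stays as is, jerry becomes tom + jerry, bill becomes tom + jerry + bill.
stacked = df[['tom', 'jerry', 'bill']].cumsum(axis=1)
print(stacked)
#    tom  jerry  bill
# 0   25     37    62
# 1   15     31    61
```

Each column of stacked is the upper edge of that person's band, which is why the patches are drawn from the last column back to the first: the tallest shape goes down first and the shorter ones are painted over it.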

Python matplotlib - add trend line, make subplot and save to .pdf [duplicate]

I have a temperature file with many years of temperature records, in the format below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Each year has a different number of records, taken at different times, so the pandas datetime indices are all different.
I want to plot the different years' data in the same figure for comparison. The x-axis is Jan to Dec, the y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run both commands inside the same cell! For me at least, it does not work when they are split across sequential cells, most likely because the inline backend renders the figure when the first cell finishes.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
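For reference, the pivot_table alternative mentioned above can be checked against the groupby result using the ten sample rows from the question. This sketch reads them from an in-memory string instead of temp.csv so that it is self-contained; the name raw is illustrative.

```python
import io
import pandas as pd

# The ten sample rows from the question, read from a string instead of a file.
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""
df = pd.read_csv(io.StringIO(raw), names=['date', 'time', 'value'], parse_dates=['date'])
df['Year'] = df['date'].dt.year
df['Month'] = df['date'].dt.month

# Month as the index, one column per year, mean of 'value' in each cell.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')

# The groupby/unstack route from the answer, for comparison.
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
print(dfp)
```

Both frames have months 4, 5 and 8 as rows and the years 2005, 2006, 2007 and 2012 as columns, so either one can be passed to plot() in the same way.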
To do this for multiple dataframes, you can do a for loop over them:
import matplotlib.pyplot as plt
fig = plt.figure(num=None, figsize=(10, 8))
# plot the first frame to create the axes, then reuse them for the others
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR != 'FOO':
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
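A runnable version of the shared-axes loop, as a sketch only: the two frames, their 'value' column and the year keys are hypothetical placeholders for whatever dict_of_dfs actually holds.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-year frames indexed by month number; the values are invented.
dict_of_dfs = {
    '2011': pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=[1, 2, 3]),
    '2012': pd.DataFrame({'value': [2.0, 2.5, 4.0]}, index=[1, 2, 3]),
}

# Create the axes once, then draw every frame onto them.
fig, ax = plt.subplots()
for name, frame in dict_of_dfs.items():
    frame['value'].plot(ax=ax, label=name)
ax.legend()
```

Passing label=name gives each line its dictionary key in the legend, which is easier to read than several lines that are all called 'value'.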
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)
