Pandas mapping with multiple conditions - python

I have a dataframe consisting of Year, Month, Temperature. Now, I need to create seasonal means, such as DJF (Dec, Jan, Feb), MAM (Mar, Apr, May), JJA (Jun, Jul, Aug), SON (Sep, Oct, Nov).
But how can I take into account the fact that a DJF mean should combine December of one year with January and February of the following year?
This is the code I have so far:
z = {1: 'DJF', 2: 'DJF', 3: 'MAM', 4: 'MAM', 5: 'MAM', 6: 'JJA', 7: 'JJA', 8: 'JJA',
     9: 'SON', 10: 'SON', 11: 'SON', 12: 'DJF'}
df['season'] = df['Month'].map(z)
The problem with the above code is that when I group by year and season to calculate the means, the DJF values are incorrect, since they take Dec, Jan, and Feb of the same year:
df.groupby(['Year','season']).mean()

I think you can create a PeriodIndex with to_datetime and to_period, then shift by one month and convert to quarters with asfreq. Last, group by the index and aggregate the mean:
df['Day'] = 1
df.index = pd.to_datetime(df[['Year','Month','Day']]).dt.to_period('M')
df = df.shift(1, freq='M').asfreq('Q')
print (df.groupby(level=0)['Temperature'].mean())
Sample:
rng = pd.date_range('2017-04-03', periods=20, freq='M')
df = pd.DataFrame({'Date': rng, 'Temperature': range(20)})
df['Year'] = df.Date.dt.year
df['Month'] = df.Date.dt.month
df = df.drop('Date', axis=1)
print (df)
Temperature Year Month
0 0 2017 4
1 1 2017 5
2 2 2017 6
3 3 2017 7
4 4 2017 8
5 5 2017 9
6 6 2017 10
7 7 2017 11
8 8 2017 12
9 9 2018 1
10 10 2018 2
11 11 2018 3
12 12 2018 4
13 13 2018 5
14 14 2018 6
15 15 2018 7
16 16 2018 8
17 17 2018 9
18 18 2018 10
19 19 2018 11
df['Day'] = 1
df.index = pd.to_datetime(df[['Year','Month','Day']]).dt.to_period('M')
df = df.shift(1, freq='M').asfreq('Q')
print (df)
Temperature Year Month Day
2017Q2 0 2017 4 1
2017Q2 1 2017 5 1
2017Q3 2 2017 6 1
2017Q3 3 2017 7 1
2017Q3 4 2017 8 1
2017Q4 5 2017 9 1
2017Q4 6 2017 10 1
2017Q4 7 2017 11 1
2018Q1 8 2017 12 1
2018Q1 9 2018 1 1
2018Q1 10 2018 2 1
2018Q2 11 2018 3 1
2018Q2 12 2018 4 1
2018Q2 13 2018 5 1
2018Q3 14 2018 6 1
2018Q3 15 2018 7 1
2018Q3 16 2018 8 1
2018Q4 17 2018 9 1
2018Q4 18 2018 10 1
2018Q4 19 2018 11 1
print (df.groupby(level=0)['Temperature'].mean())
2017Q2 0.5
2017Q3 3.0
2017Q4 6.0
2018Q1 9.0
2018Q2 12.0
2018Q3 15.0
2018Q4 18.0
Freq: Q-DEC, Name: Temperature, dtype: float64
And last, if you need a season column:
df1 = df.groupby(level=0)['Temperature'].mean().rename_axis('per').reset_index()
z = {1: 'DJF',2: 'MAM', 3: 'JJA', 4: 'SON'}
df1['season'] = df1['per'].dt.quarter.map(z)
df1['year'] = df1['per'].dt.year
print (df1)
per Temperature season year
0 2017Q2 0.5 MAM 2017
1 2017Q3 3.0 JJA 2017
2 2017Q4 6.0 SON 2017
3 2018Q1 9.0 DJF 2018
4 2018Q2 12.0 MAM 2018
5 2018Q3 15.0 JJA 2018
6 2018Q4 18.0 SON 2018
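For completeness, an anchored quarterly frequency should give the same grouping without the one-month shift: with freq 'Q-NOV' the quarterly year ends in November, so December-February fall into a single Q1 period. A minimal sketch, assuming the same Year/Month/Temperature columns as in the sample:
df['Day'] = 1
# Q-NOV quarters: Q1 = DJF, Q2 = MAM, Q3 = JJA, Q4 = SON, matching the z map above
per = pd.to_datetime(df[['Year', 'Month', 'Day']]).dt.to_period('Q-NOV')
print (df.groupby(per)['Temperature'].mean())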

Related

Convert a Python data frame with different 'year' columns into a continuous time series

Is it possible to convert a Pandas dataframe like this into a time series where each year follows the last one?
This is likely what df.unstack() is meant for.
import numpy as np
import pandas as pd

np.random.seed(111)  # reproducibility
df = pd.DataFrame(
    data={
        "2009": np.random.randn(12),
        "2010": np.random.randn(12),
        "2011": np.random.randn(12),
    },
    index=range(1, 13)
)
print(df)
2009 2010 2011
1 -1.133838 -1.440585 0.570594
2 0.384319 0.773703 0.915420
3 1.496554 -1.027967 -1.669341
4 -0.355382 -0.090986 0.482714
5 -0.787534 0.492003 -0.310473
6 -0.459439 0.424672 2.394690
7 -0.059169 1.283049 1.550931
8 -0.354174 0.315986 -0.646465
9 -0.735523 -0.408082 -0.928937
10 -1.183940 -0.067948 -1.654976
11 0.238894 -0.952427 0.350193
12 -0.589920 -0.110677 -0.141757
df_out = df.unstack().reset_index()
df_out.columns = ["year", "month", "value"]
print(df_out)
year month value
0 2009 1 -1.133838
1 2009 2 0.384319
2 2009 3 1.496554
3 2009 4 -0.355382
4 2009 5 -0.787534
5 2009 6 -0.459439
6 2009 7 -0.059169
7 2009 8 -0.354174
8 2009 9 -0.735523
9 2009 10 -1.183940
10 2009 11 0.238894
11 2009 12 -0.589920
12 2010 1 -1.440585
13 2010 2 0.773703
14 2010 3 -1.027967
15 2010 4 -0.090986
16 2010 5 0.492003
17 2010 6 0.424672
18 2010 7 1.283049
19 2010 8 0.315986
20 2010 9 -0.408082
21 2010 10 -0.067948
22 2010 11 -0.952427
23 2010 12 -0.110677
24 2011 1 0.570594
25 2011 2 0.915420
26 2011 3 -1.669341
27 2011 4 0.482714
28 2011 5 -0.310473
29 2011 6 2.394690
30 2011 7 1.550931
31 2011 8 -0.646465
32 2011 9 -0.928937
33 2011 10 -1.654976
34 2011 11 0.350193
35 2011 12 -0.141757
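If the end goal is a genuinely continuous series rather than year/month columns, the two columns can be turned into a monthly PeriodIndex; a short sketch continuing from df_out (the year labels are strings here, hence the astype; the tmp name is my own):
# Assemble a datetime from year/month plus a dummy day, then convert to monthly periods.
tmp = df_out.assign(year=df_out['year'].astype(int), day=1)
df_out.index = pd.to_datetime(tmp[['year', 'month', 'day']]).dt.to_period('M')
ts = df_out['value']  # one continuous monthly series, each year after the last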

How to calculate a cumulative sum in Python using pandas over all the columns except the first one, which contains names?

Here's the data in csv format:
Name 2012 2013 2014 2015 2016 2017 2018 2019 2020
Jack 1 15 25 3 5 11 5 8 3
Jill 5 10 32 5 5 14 6 8 7
I don't want the Name column to be included, as it gives an error.
I tried
df.cumsum()
Try set_index and reset_index to keep the Name column:
df.set_index('Name').cumsum().reset_index()
Output:
Name 2012 2013 2014 2015 2016 2017 2018 2019 2020
0 Jack 1 15 25 3 5 11 5 8 3
1 Jill 6 25 57 8 10 25 11 16 10
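An alternative that skips the index round-trip is to restrict cumsum to the numeric columns, for example with select_dtypes (a sketch; num_cols is my own name):
# Cumulative-sum only the numeric columns, leaving 'Name' untouched.
num_cols = df.select_dtypes('number').columns
df[num_cols] = df[num_cols].cumsum()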

Fill column with value from previous year from the same month

How can I use the value from the same month in the previous year to fill values in the following table for 2020:
Category Month Year Value
A 1 2019 15
B 2 2019 20
A 2 2019 90
A 3 2019 50
B 4 2019 40
A 5 2019 20
A 6 2019 15
A 7 2019 17
A 8 2019 18
A 9 2019 12
A 10 2019 11
A 11 2019 19
A 12 2019 15
A 1 2020 18
A 2 2020 53
A 3 2020 80
The final desired result is the following:
Category Month Year Value
A 1 2019 15
B 2 2019 20
A 2 2019 90
A 3 2019 50
B 4 2019 40
A 4 2019 40
A 5 2019 20
A 6 2019 15
A 7 2019 17
A 8 2019 18
A 9 2019 12
A 10 2019 11
A 11 2019 19
A 12 2019 15
A 1 2020 18
A 2 2020 53
A 3 2020 80
B 4 2020 40
A 4 2020 40
A 5 2020 20
A 6 2020 15
A 7 2020 17
A 8 2020 18
A 9 2020 12
A 10 2020 11
A 11 2020 19
A 12 2020 15
I tried using pandas groupby, but I am not sure if that is the right approach.
IIUC, use pivot_table, then ffill within each Category group, then stack:
s = (df.pivot_table(index=['Category', 'Year'], columns='Month', values='Value')
       .groupby(level=0)
       .ffill()
       .stack()
       .reset_index())
Category Year level_2 0
0 A 2019 1 15.0
1 A 2019 2 90.0
2 A 2019 3 50.0
3 A 2019 5 20.0
4 A 2019 6 15.0
5 A 2019 7 17.0
6 A 2019 8 18.0
7 A 2019 9 12.0
8 A 2019 10 11.0
9 A 2019 11 19.0
10 A 2019 12 15.0
11 A 2020 1 18.0
12 A 2020 2 53.0
13 A 2020 3 80.0
14 A 2020 5 20.0
15 A 2020 6 15.0
16 A 2020 7 17.0
17 A 2020 8 18.0
18 A 2020 9 12.0
19 A 2020 10 11.0
20 A 2020 11 19.0
21 A 2020 12 15.0
22 B 2019 2 20.0
23 B 2019 4 40.0
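If you want the original column names back on s (stack leaves the value column unnamed, hence the 0 above), a small follow-up sketch:
s.columns = ['Category', 'Year', 'Month', 'Value']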
You can accomplish this with a combination of loc, concat, and drop_duplicates.
The idea here is to concatenate the dataframe with a copy of the 2019 data where year is changed to 2020, and then only keeping the first value for Category, Month, Year.
df2 = df.loc[df['Year'] == 2019, :].copy()  # copy so the assignment below does not warn
df2['Year'] = 2020
pd.concat([df, df2]).drop_duplicates(subset=['Category', 'Month', 'Year'], keep='first')
Output
Category Month Year Value
0 A 1 2019 15
1 B 2 2019 20
2 A 2 2019 90
3 A 3 2019 50
4 B 4 2019 40
5 A 5 2019 20
6 A 6 2019 15
7 A 7 2019 17
8 A 8 2019 18
9 A 9 2019 12
10 A 10 2019 11
11 A 11 2019 19
12 A 12 2019 15
13 A 1 2020 18
14 A 2 2020 53
15 A 3 2020 80
1 B 2 2020 20
4 B 4 2020 40
5 A 5 2020 20
6 A 6 2020 15
7 A 7 2020 17
8 A 8 2020 18
9 A 9 2020 12
10 A 10 2020 11
11 A 11 2020 19
12 A 12 2020 15
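The same idea can be written without hardcoding the two years; a sketch, assuming each (Category, Month, Year) triple should be unique and only a one-year step forward is needed:
# Shift every year forward by one; on conflicts the original rows win,
# because keep='first' prefers the rows from df.
prev = df.copy()
prev['Year'] = prev['Year'] + 1
out = (pd.concat([df, prev])
         .drop_duplicates(subset=['Category', 'Month', 'Year'], keep='first')
         .sort_values(['Year', 'Month'])
         .reset_index(drop=True))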

How to use groupby and grouper properly for accumulating column 'A' and averaging column 'B', month by month

I have a pandas data with 3 columns:
date: from 1/1/2018 up until 8/23/2019, column A and column B.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, size=(600, 2)), columns=list('AB'))
df['date'] = pd.date_range(start='1/1/2018', end='8/23/2019')
df = df.set_index('date')
df is as follows:
date A B
2018-01-01 7 4
2018-01-02 5 4
2018-01-03 3 1
2018-01-04 9 3
2018-01-05 7 8
2018-01-06 0 0
2018-01-07 6 8
2018-01-08 3 7
...
...
...
2019-08-18 1 0
2019-08-19 8 1
2019-08-20 5 9
2019-08-21 0 7
2019-08-22 3 6
2019-08-23 8 6
I want monthly accumulated values of column A and monthly averaged values of column B. The final output will be a df with 20 rows (12 months of 2018 and 8 months of 2019) and 4 columns, holding the monthly accumulated values of column A, the monthly averaged values of column B, the month number, and the year number, just like below:
month year monthly_accumulated_of_A monthly_averaged_of_B
0 1 2018 176 1.747947
1 2 2018 110 2.399476
2 3 2018 131 3.976747
3 4 2018 227 2.314923
4 5 2018 234 0.464097
5 6 2018 249 1.662753
6 7 2018 121 1.588865
7 8 2018 165 2.318268
8 9 2018 219 1.060595
9 10 2018 131 0.577268
10 11 2018 179 3.948414
11 12 2018 115 1.750346
12 1 2019 190 3.364003
13 2 2019 215 0.864792
14 3 2019 231 3.219739
15 4 2019 186 2.904413
16 5 2019 232 0.324695
17 6 2019 163 1.334139
18 7 2019 238 1.670644
19 8 2019 112 1.316442
How can I achieve this in pandas?
Use DataFrameGroupBy.agg with DatetimeIndex.month and DatetimeIndex.year. For ordering, add sort_index, and last use reset_index to turn the MultiIndex into columns:
import pandas as pd
import numpy as np

np.random.seed(2018)
# changed 300 to 600
df = pd.DataFrame(np.random.randint(0, 10, size=(600, 2)), columns=list('AB'))
df['date'] = pd.date_range(start='1/1/2018', end='8/23/2019')
df = df.set_index('date')

df1 = (df.groupby([df.index.month.rename('month'),
                   df.index.year.rename('year')])
         .agg({'A': 'sum', 'B': 'mean'})
         .sort_index(level=['year', 'month'])
         .reset_index())
print (df1)
month year A B
0 1 2018 147 4.838710
1 2 2018 120 3.678571
2 3 2018 114 4.387097
3 4 2018 143 3.800000
4 5 2018 124 3.870968
5 6 2018 129 4.700000
6 7 2018 143 3.935484
7 8 2018 118 5.483871
8 9 2018 150 5.500000
9 10 2018 139 4.225806
10 11 2018 136 4.933333
11 12 2018 141 4.548387
12 1 2019 137 4.709677
13 2 2019 120 4.964286
14 3 2019 167 4.935484
15 4 2019 121 4.200000
16 5 2019 133 4.129032
17 6 2019 140 5.066667
18 7 2019 189 4.677419
19 8 2019 100 3.695652
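Since the question title mentions Grouper, an equivalent formulation groups on the DatetimeIndex with pd.Grouper(freq='M') and derives month/year afterwards. A sketch, assuming df is indexed by 'date' as above:
df2 = (df.groupby(pd.Grouper(freq='M'))
         .agg({'A': 'sum', 'B': 'mean'})
         .reset_index())
# group labels are month-end timestamps; split them into month/year columns
df2['month'] = df2['date'].dt.month
df2['year'] = df2['date'].dt.year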

Grouping data series by day intervals with Pandas

I have to perform some data analysis on a seasonal basis.
I have circa one and a half years' worth of hourly measurements, from the end of 2015 to the second half of 2017. What I want to do is sort this data into seasons.
Here's an example of the data I am working with:
Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
04/12/2015,2015,12,4,6,18,0,6,2968,1781,16.2
04/12/2015,2015,12,4,6,19,0,6,2437,1462,16.2
19/04/2016,2016,4,19,3,3,0,3,1348,809,14.4
19/04/2016,2016,4,19,3,4,0,3,1353,812,14.1
11/06/2016,2016,6,11,7,19,0,7,1395,837,18.8
11/06/2016,2016,6,11,7,20,0,7,1370,822,17.4
11/06/2016,2016,6,11,7,21,0,7,1364,818,17
11/06/2016,2016,6,11,7,22,0,7,1433,860,17.5
04/12/2016,2016,12,4,1,17,0,1,1425,855,14.6
04/12/2016,2016,12,4,1,18,0,1,1466,880,14.4
07/03/2017,2017,3,7,3,14,0,3,3668,2201,14.2
07/03/2017,2017,3,7,3,15,0,3,3666,2200,14
24/04/2017,2017,4,24,2,5,0,2,1347,808,11.4
24/04/2017,2017,4,24,2,6,0,2,1816,1090,11.5
24/04/2017,2017,4,24,2,7,0,2,2918,1751,12.4
15/06/2017,2017,6,15,5,13,1,1,2590,1554,22.5
15/06/2017,2017,6,15,5,14,1,1,2629,1577,22.5
15/06/2017,2017,6,15,5,15,1,1,2656,1594,22.1
15/11/2017,2017,11,15,4,13,0,4,3765,2259,15.6
15/11/2017,2017,11,15,4,14,0,4,3873,2324,15.9
15/11/2017,2017,11,15,4,15,0,4,3905,2343,15.8
15/11/2017,2017,11,15,4,16,0,4,3861,2317,15.3
As you can see I have data on three different years.
What I was thinking of doing is converting the first column with the pd.to_datetime() command, and then grouping the rows according to day/month in dd/mm intervals, regardless of the year (if winter goes from 21/12 to 21/03, create a new dataframe with all of the rows whose date falls in this interval, whatever the year). But I couldn't do it while ignoring the year, which makes things more complicated.
EDIT:
A desired output would be:
df_spring
Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
19/04/2016,2016,4,19,3,3,0,3,1348,809,14.4
19/04/2016,2016,4,19,3,4,0,3,1353,812,14.1
07/03/2017,2017,3,7,3,14,0,3,3668,2201,14.2
07/03/2017,2017,3,7,3,15,0,3,3666,2200,14
24/04/2017,2017,4,24,2,5,0,2,1347,808,11.4
24/04/2017,2017,4,24,2,6,0,2,1816,1090,11.5
24/04/2017,2017,4,24,2,7,0,2,2918,1751,12.4
df_autumn
Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
04/12/2015,2015,12,4,6,18,0,6,2968,1781,16.2
04/12/2015,2015,12,4,6,19,0,6,2437,1462,16.2
04/12/2016,2016,12,4,1,17,0,1,1425,855,14.6
04/12/2016,2016,12,4,1,18,0,1,1466,880,14.4
15/11/2017,2017,11,15,4,13,0,4,3765,2259,15.6
15/11/2017,2017,11,15,4,14,0,4,3873,2324,15.9
15/11/2017,2017,11,15,4,15,0,4,3905,2343,15.8
15/11/2017,2017,11,15,4,16,0,4,3861,2317,15.3
And so on for the remaining seasons.
Define each season by filtering the relevant rows using Day and Month columns as presented for winter:
df_winter = df.loc[((df['Day'] >= 21) & (df['Month'] == 12)) |
                   (df['Month'] == 1) |
                   (df['Month'] == 2) |
                   ((df['Day'] <= 21) & (df['Month'] == 3))]
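The same mask pattern can be written once for all four seasons; a sketch using the 21st-of-the-month boundaries described in the question (the masks dict and frame names are my own):
m, d = df['Month'], df['Day']
masks = {
    'winter': ((m == 12) & (d >= 21)) | (m == 1) | (m == 2) | ((m == 3) & (d <= 21)),
    'spring': ((m == 3) & (d > 21)) | (m == 4) | (m == 5) | ((m == 6) & (d <= 21)),
    'summer': ((m == 6) & (d > 21)) | (m == 7) | (m == 8) | ((m == 9) & (d <= 21)),
    'autumn': ((m == 9) & (d > 21)) | (m == 10) | (m == 11) | ((m == 12) & (d < 21)),
}
season_frames = {name: df[mask] for name, mask in masks.items()}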
You can simply filter your dataframe with Month.isin():
# spring
df[df['Month'].isin([3,4])]
Date Year Month Day Day week Hour Holiday Week Day Impulse Power (kW) Temperature (C)
2 19/04/2016 2016 4 19 3 3 0 3 1348 809 14.4
3 19/04/2016 2016 4 19 3 4 0 3 1353 812 14.1
10 07/03/2017 2017 3 7 3 14 0 3 3668 2201 14.2
11 07/03/2017 2017 3 7 3 15 0 3 3666 2200 14.0
12 24/04/2017 2017 4 24 2 5 0 2 1347 808 11.4
13 24/04/2017 2017 4 24 2 6 0 2 1816 1090 11.5
14 24/04/2017 2017 4 24 2 7 0 2 2918 1751 12.4
# autumn
df[df['Month'].isin([11,12])]
Date Year Month Day Day week Hour Holiday Week Day Impulse Power (kW) Temperature (C)
0 04/12/2015 2015 12 4 6 18 0 6 2968 1781 16.2
1 04/12/2015 2015 12 4 6 19 0 6 2437 1462 16.2
8 04/12/2016 2016 12 4 1 17 0 1 1425 855 14.6
9 04/12/2016 2016 12 4 1 18 0 1 1466 880 14.4
18 15/11/2017 2017 11 15 4 13 0 4 3765 2259 15.6
19 15/11/2017 2017 11 15 4 14 0 4 3873 2324 15.9
20 15/11/2017 2017 11 15 4 15 0 4 3905 2343 15.8
21 15/11/2017 2017 11 15 4 16 0 4 3861 2317 15.3
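If separate dataframes are not required, a single season label column lets one groupby cover all four seasons at once; a whole-month sketch that ignores the 21st boundaries (the season_map and the aggregated column are my own choices):
season_map = {12: 'winter', 1: 'winter', 2: 'winter',
              3: 'spring', 4: 'spring', 5: 'spring',
              6: 'summer', 7: 'summer', 8: 'summer',
              9: 'autumn', 10: 'autumn', 11: 'autumn'}
df['Season'] = df['Month'].map(season_map)
print(df.groupby('Season')['Power (kW)'].mean())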
