Flip and shift multi-column data to the left in Pandas - python
Here's an Employee-Supervisor mapping dataset.
I'd like to flip the supervisor columns and then shift the data to the left, so that only the data moves (the NaNs end up on the right) while the column labels stay fixed. Could you tell me how I can do this?
Input: Bottom-Up approach

Emp_ID  Sup_1 ID  Sup_2 ID  Sup_3 ID  Sup_4 ID
   123       234       456       678       789
   234       456       678       789       NaN
   456       678       789       NaN       NaN
   678       789       NaN       NaN       NaN
   789       NaN       NaN       NaN       NaN
Output: Top-Down approach

Emp_ID  Sup_1 ID  Sup_2 ID  Sup_3 ID  Sup_4 ID
   123       789       678       456       234
   234       789       678       456       NaN
   456       789       678       NaN       NaN
   678       789       NaN       NaN       NaN
   789       NaN       NaN       NaN       NaN
I'd appreciate any kind of assistance.
Try np.fliplr:

# Get the supervisor columns as a NumPy array
x = df.loc[:, 'Sup_1 ID':].to_numpy()
# Flip left to right
a = np.fliplr(x)
# Overwrite the non-NaN values in x with the non-NaN values of a
x[~np.isnan(x)] = a[~np.isnan(a)]
# Update the DataFrame
df.loc[:, 'Sup_1 ID':] = x
df:
Emp_ID Sup_1 ID Sup_2 ID Sup_3 ID Sup_4 ID
0 123 789.0 678.0 456.0 234.0
1 234 789.0 678.0 456.0 NaN
2 456 789.0 678.0 NaN NaN
3 678 789.0 NaN NaN NaN
4 789 NaN NaN NaN NaN
DataFrame Constructor and imports:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Emp_ID': [123, 234, 456, 678, 789],
    'Sup_1 ID': [234.0, 456.0, 678.0, 789.0, np.nan],
    'Sup_2 ID': [456.0, 678.0, 789.0, np.nan, np.nan],
    'Sup_3 ID': [678.0, 789.0, np.nan, np.nan, np.nan],
    'Sup_4 ID': [789.0, np.nan, np.nan, np.nan, np.nan]
})
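If you'd rather stay entirely in pandas, the same flip can be sketched row-wise with apply. This is slower than the vectorized np.fliplr version above, but it makes the per-row intent explicit (a sketch, reusing the same sample frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Emp_ID': [123, 234, 456, 678, 789],
    'Sup_1 ID': [234.0, 456.0, 678.0, 789.0, np.nan],
    'Sup_2 ID': [456.0, 678.0, 789.0, np.nan, np.nan],
    'Sup_3 ID': [678.0, 789.0, np.nan, np.nan, np.nan],
    'Sup_4 ID': [789.0, np.nan, np.nan, np.nan, np.nan],
})

sup = df.loc[:, 'Sup_1 ID':]
# Reverse the non-NaN values of each row; the NaNs stay on the right
flipped = sup.apply(lambda row: pd.Series(row.dropna().iloc[::-1].to_numpy()), axis=1)
flipped = flipped.reindex(columns=range(sup.shape[1]))  # pad short rows with NaN
flipped.columns = sup.columns
df.loc[:, 'Sup_1 ID':] = flipped.to_numpy()
print(df)
```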
In your case, try np.roll:

df = df.set_index('Emp_ID')
out = df.apply(lambda x: np.roll(x[x.notnull()].values, 1)).apply(pd.Series)

              0      1      2      3
Sup_1 ID  789.0  234.0  456.0  678.0
Sup_2 ID  789.0  456.0  678.0    NaN
Sup_3 ID  789.0  678.0    NaN    NaN
Sup_4 ID  789.0    NaN    NaN    NaN

out.columns = df.columns
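Note that np.roll rotates the array (the last element wraps around to the front) rather than reversing it; np.flip gives the full reversal shown in the question's expected output. A quick sketch of the difference:

```python
import numpy as np

row = np.array([234., 456., 678., 789.])
rolled = np.roll(row, 1)    # rotate: last element moves to the front
flipped = np.flip(row)      # full reversal
print(rolled)   # [789. 234. 456. 678.]
print(flipped)  # [789. 678. 456. 234.]
```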
Related
Transpose and Compare
I'm attempting to compare two data frames. The Item and Summary variables correspond to various dates and quantities. I'd like to transpose the dates into one column of data along with the associated quantities. I'd then like to compare the two data frames and see what changed from PreviousData to CurrentData.

Previous Data:

PreviousData = {
    'Item': ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yza', 'uaza', 'fupa'],
    'Summary': ['party', 'weekend', 'food', 'school', 'tv', 'photo', 'camera', 'python', 'r', 'rstudio', 'spyder'],
    '2022-01-01': [1, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan, np.nan, 2],
    '2022-02-01': [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-03-01': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    '2022-04-01': [np.nan, np.nan, 3, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-05-01': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, 2, np.nan, np.nan, 3, np.nan],
    '2022-06-01': [np.nan, np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-07-01': [np.nan, 1, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    '2022-08-01': [np.nan, np.nan, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-09-01': [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan],
    '2022-10-01': [np.nan, np.nan, 1, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-11-01': [np.nan, 2, np.nan, np.nan, 1, 1, 1, np.nan, np.nan, np.nan, np.nan],
    '2022-12-01': [np.nan, np.nan, np.nan, np.nan, 3, np.nan, np.nan, 2, np.nan, np.nan, np.nan],
    '2023-01-01': [np.nan, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
    '2023-02-01': [np.nan, np.nan, np.nan, 2, np.nan, 2, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2023-03-01': [np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2023-04-01': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    '2023-05-01': [np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
    '2023-06-01': [1, 1, np.nan, np.nan, 9, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2023-07-01': [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2023-08-01': [np.nan, 1, np.nan, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    '2023-09-01': [np.nan, 1, 1, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
}
PreviousData = pd.DataFrame(PreviousData)

Current Data:

CurrentData = {
    'Item': ['ghi', 'stu', 'abc', 'mno', 'jkl', 'pqr', 'def', 'vwx', 'yza'],
    'Summary': ['food', 'camera', 'party', 'tv', 'school', 'photo', 'weekend', 'python', 'r'],
    '2022-01-01': [3, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan],
    '2022-02-01': [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-03-01': [np.nan, 1, 1, 1, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-04-01': [np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-05-01': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-06-01': [2, np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, np.nan],
    '2022-07-01': [np.nan, np.nan, np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan],
    '2022-08-01': [np.nan, np.nan, 3, np.nan, 4, np.nan, np.nan, np.nan, np.nan],
    '2022-09-01': [np.nan, np.nan, 3, 3, 3, np.nan, np.nan, 5, 5],
    '2022-10-01': [np.nan, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan, np.nan],
    '2022-11-01': [np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan],
    '2022-12-01': [np.nan, 4, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    '2023-01-01': [np.nan, np.nan, np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
    '2023-02-01': [np.nan, np.nan, np.nan, 2, 1, np.nan, np.nan, np.nan, np.nan],
    '2023-03-01': [np.nan, np.nan, np.nan, np.nan, 2, np.nan, 2, np.nan, 2],
    '2023-04-01': [np.nan, np.nan, np.nan, np.nan, np.nan, 2, np.nan, np.nan, 2],
}
CurrentData = pd.DataFrame(CurrentData)

How do I transpose and compare these two sets?
One way of doing this is the following. Transpose (melt) both dataframes:

PreviousData_t = PreviousData.melt(id_vars=["Item", "Summary"], var_name="Date", value_name="value1")

which is

     Item  Summary        Date  value1
0     abc    party  2022-01-01     1.0
1     def  weekend  2022-01-01     NaN
2     ghi     food  2022-01-01     NaN
3     jkl   school  2022-01-01     1.0
4     mno       tv  2022-01-01     NaN
..    ...      ...         ...     ...
226   stu   camera  2023-09-01     NaN
227   vwx   python  2023-09-01     1.0
228   yza        r  2023-09-01     NaN
229  uaza  rstudio  2023-09-01     NaN
230  fupa   spyder  2023-09-01     NaN

and

CurrentData_t = CurrentData.melt(id_vars=["Item", "Summary"], var_name="Date", value_name="value2")

    Item  Summary        Date  value2
0    ghi     food  2022-01-01     3.0
1    stu   camera  2022-01-01     NaN
2    abc    party  2022-01-01     NaN
3    mno       tv  2022-01-01     1.0
4    jkl   school  2022-01-01     NaN
..   ...      ...         ...     ...
139  jkl   school  2023-04-01     NaN
140  pqr    photo  2023-04-01     2.0
141  def  weekend  2023-04-01     NaN
142  vwx   python  2023-04-01     NaN
143  yza        r  2023-04-01     2.0

[144 rows x 4 columns]

Then merge:

Compare = PreviousData_t.merge(CurrentData_t, on=['Date', 'Item', 'Summary'], how='left')

     Item  Summary        Date  value1  value2
0     abc    party  2022-01-01     1.0     NaN
1     def  weekend  2022-01-01     NaN     NaN
2     ghi     food  2022-01-01     NaN     3.0
3     jkl   school  2022-01-01     1.0     NaN
4     mno       tv  2022-01-01     NaN     1.0
..    ...      ...         ...     ...     ...
226   stu   camera  2023-09-01     NaN     NaN
227   vwx   python  2023-09-01     1.0     NaN
228   yza        r  2023-09-01     NaN     NaN
229  uaza  rstudio  2023-09-01     NaN     NaN
230  fupa   spyder  2023-09-01     NaN     NaN

[231 rows x 5 columns]

and compare by creating a column marking differences:

Compare['diff'] = np.where(Compare['value1'] != Compare['value2'], 1, 0)

     Item  Summary        Date  value1  value2  diff
0     abc    party  2022-01-01     1.0     NaN     1
1     def  weekend  2022-01-01     NaN     NaN     1
2     ghi     food  2022-01-01     NaN     3.0     1
3     jkl   school  2022-01-01     1.0     NaN     1
4     mno       tv  2022-01-01     NaN     1.0     1
..    ...      ...         ...     ...     ...   ...
226   stu   camera  2023-09-01     NaN     NaN     1
227   vwx   python  2023-09-01     1.0     NaN     1
228   yza        r  2023-09-01     NaN     NaN     1
229  uaza  rstudio  2023-09-01     NaN     NaN     1
230  fupa   spyder  2023-09-01     NaN     NaN     1

[231 rows x 6 columns]

If you only want to compare the entries that are common to both, use an inner merge instead:

Compare = PreviousData_t.merge(CurrentData_t, on=['Date', 'Item', 'Summary'])
Compare['diff'] = np.where(Compare['value1'] != Compare['value2'], 1, 0)

    Item  Summary        Date  value1  value2  diff
0    abc    party  2022-01-01     1.0     NaN     1
1    def  weekend  2022-01-01     NaN     NaN     1
2    ghi     food  2022-01-01     NaN     3.0     1
3    jkl   school  2022-01-01     1.0     NaN     1
4    mno       tv  2022-01-01     NaN     1.0     1
..   ...      ...         ...     ...     ...   ...
139  mno       tv  2023-04-01     NaN     NaN     1
140  pqr    photo  2023-04-01     NaN     2.0     1
141  stu   camera  2023-04-01     NaN     NaN     1
142  vwx   python  2023-04-01     1.0     NaN     1
143  yza        r  2023-04-01     NaN     2.0     1

[144 rows x 6 columns]
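One caveat with the diff column in the answer above: NaN compares unequal to NaN, so rows that are missing on both sides get flagged as changed (visible in the sample output, where all-NaN rows have diff 1). A small sketch of a NaN-safe variant, using toy values in the same value1/value2 layout:

```python
import numpy as np
import pandas as pd

# Toy value1/value2 columns in the same layout as the merged Compare frame
cmp_df = pd.DataFrame({'value1': [1.0, np.nan, np.nan],
                       'value2': [np.nan, np.nan, 3.0]})

# Naive inequality marks rows where both sides are NaN as different (NaN != NaN)
naive = np.where(cmp_df['value1'] != cmp_df['value2'], 1, 0)
# NaN-safe version: different only when values differ AND not both missing
both_nan = cmp_df['value1'].isna() & cmp_df['value2'].isna()
safe = np.where((cmp_df['value1'] != cmp_df['value2']) & ~both_nan, 1, 0)
print(naive.tolist())  # [1, 1, 1]
print(safe.tolist())   # [1, 0, 1]
```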
More efficient way to do the same merge on multiple columns in a dataframe?
Input:

df1
    OFF_P1   OFF_P2   OFF_P3   OFF_P4   OFF_P5   GAME_ID
0  1629675  1627736  1630162   201976  1629020  22101224
1   201599  1630178   202699  1629680   201980  22101228
2  1630191  1630180  1630587  1630240  1628402  22101228
3  1627759   201143  1628464  1628369   203935  22101223
4  1630573  1630271  1630238  1628436  1630346  22101223

df2
   PLAYER_ID   GAME_ID  PTS
0     201980  21900002   28
1     201586  21900001   13
2    1628366  21900001    8
3     200755  21900001   16
4     202324  21900001    6

Desired Output:

    OFF_P1   OFF_P2   OFF_P3   OFF_P4   OFF_P5   GAME_ID  OFF_P1_PTS  OFF_P2_PTS  etc...
0  1629675  1627736  1630162   201976  1629020  22101224          28          13  ...
1   201599  1630178   202699  1629680   201980  22101228          12
2  1630191  1630180  1630587  1630240  1628402  22101228          14
3  1627759   201143  1628464  1628369   203935  22101223           8
4  1630573  1630271  1630238  1628436  1630346  22101223          19

I would like to merge the PTS column from df2 into df1, once for each of the OFF_P1, OFF_P2, etc. columns. Is there a more efficient way to do this than something like the below?

df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P1'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P2'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P3'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P4'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P5'], right_on=['GAME_ID', 'PLAYER_ID'])
I would prefer the MultiIndex.map approach:

d = df2.set_index(['GAME_ID', 'PLAYER_ID'])['PTS']
for c in df1.filter(like='OFF_P'):
    df1[f'{c}_PTS'] = df1.set_index(['GAME_ID', c]).index.map(d)
print(df1)

    OFF_P1   OFF_P2   OFF_P3   OFF_P4   OFF_P5   GAME_ID  OFF_P1_PTS  OFF_P2_PTS  OFF_P3_PTS  OFF_P4_PTS  OFF_P5_PTS
0  1629675  1627736  1630162   201976  1629020  22101224        28.0         NaN         NaN         NaN         NaN
1   201599  1630178   202699  1629680   201980  22101228         NaN        13.0         NaN         NaN         NaN
2  1630191  1630180  1630587  1630240  1628402  22101228         NaN         NaN         NaN         NaN         NaN
3  1627759   201143  1628464  1628369   203935  22101223        16.0         NaN         NaN         NaN         NaN
4  1630573  1630271  1630238  1628436  1630346  22101223         NaN         NaN         NaN         NaN         NaN
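As a minimal self-contained illustration of the index.map lookup (toy IDs, not the question's data): each row's (GAME_ID, player) pair is looked up in a Series keyed by (GAME_ID, PLAYER_ID), and pairs with no match come back as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'OFF_P1': [10, 20], 'OFF_P2': [30, 40], 'GAME_ID': [1, 2]})
df2 = pd.DataFrame({'PLAYER_ID': [10, 40], 'GAME_ID': [1, 2], 'PTS': [7, 9]})

# Series keyed by (GAME_ID, PLAYER_ID); unmatched pairs map to NaN
d = df2.set_index(['GAME_ID', 'PLAYER_ID'])['PTS']
for c in ['OFF_P1', 'OFF_P2']:
    df1[f'{c}_PTS'] = df1.set_index(['GAME_ID', c]).index.map(d)
print(df1)
```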
transpose/stack/unstack pandas dataframe whilst concatenating the field name with existing columns
I have a dataframe that looks something like:

Component      Date   MTD  YTD    QTD   FC
ABC        Jan 2017    56  NaN    NaN  NaN
DEF        Jan 2017   453  NaN    NaN  NaN
XYZ        Jan 2017   657
PQR        Jan 2017   123
ABC        Feb 2017    56  NaN    NaN  NaN
DEF        Feb 2017   456  NaN    NaN  NaN
XYZ        Feb 2017  6234   57
PQR        Feb 2017   123  346
ABC        Dec 2017    56  NaN    NaN  NaN
DEF        Dec 2017   NaN  NaN    345  324
XYZ        Dec 2017  6234   57
PQR        Dec 2017   NaN  346  54654  546

and I would like to transpose this dataframe in such a way that the Component becomes the prefix of the existing MTD, QTD, etc. columns, so the expected output would be:

Date      ABC_MTD  DEF_MTD  XYZ_MTD  PQR_MTD  ABC_YTD  DEF_YTD  XYZ_YTD  PQR_YTD  etc.
Jan 2017       56      453      657      123      NaN      NaN      NaN      NaN
Feb 2017       56      456     6234      123      NaN      NaN       57      346
Dec 2017       56      NaN     6234      NaN                         57      346

I am not sure whether a pivot or stack/unstack would be efficient here. Thanks in advance.
You could try this:

newdf = df.pivot(values=df.columns[2:], index='Date', columns='Component')
# join the MultiIndex column names
newdf.columns = ['%s%s' % (b, '_%s' % a if b else '') for a, b in newdf.columns]
print(newdf)

Output:

df
   Component        Date     MTD  YTD    QTD   FC
0        ABC  2017-01-01    56.0  NaN    NaN  NaN
1        DEF  2017-01-01   453.0  NaN    NaN  NaN
2        XYZ  2017-01-01   657.0
3        PQR  2017-01-01   123.0
4        ABC  2017-02-01    56.0  NaN    NaN  NaN
5        DEF  2017-02-01   456.0  NaN    NaN  NaN
6        XYZ  2017-02-01  6234.0   57
7        PQR  2017-02-01   123.0  346
8        ABC  2017-12-01    56.0  NaN    NaN  NaN
9        DEF  2017-12-01     NaN  NaN    345  324
10       XYZ  2017-12-01  6234.0   57
11       PQR  2017-12-01     NaN  346  54654  546

newdf
            ABC_MTD  DEF_MTD  PQR_MTD  XYZ_MTD  ABC_YTD  DEF_YTD  PQR_YTD  XYZ_YTD  ABC_QTD  DEF_QTD  PQR_QTD  XYZ_QTD  ABC_FC  DEF_FC  PQR_FC  XYZ_FC
Date
2017-01-01       56      453      123      657      NaN      NaN      NaN      NaN      NaN      NaN
2017-02-01       56      456      123     6234      NaN      NaN      346       57      NaN      NaN      NaN      NaN
2017-12-01       56      NaN      NaN     6234      NaN      NaN      346       57      NaN      345    54654      NaN     324     546
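To see the pivot-and-flatten step in isolation, here is a small sketch with made-up numbers; the f-string join is equivalent to the % formatting in the answer whenever the component name is non-empty:

```python
import pandas as pd

df = pd.DataFrame({
    'Component': ['ABC', 'DEF', 'ABC', 'DEF'],
    'Date': ['Jan 2017', 'Jan 2017', 'Feb 2017', 'Feb 2017'],
    'MTD': [56, 453, 56, 456],
    'YTD': [1.0, 2.0, 3.0, 4.0],
})

newdf = df.pivot(values=['MTD', 'YTD'], index='Date', columns='Component')
# Columns are a MultiIndex like ('MTD', 'ABC'); flatten to 'ABC_MTD'
newdf.columns = [f'{comp}_{metric}' for metric, comp in newdf.columns]
print(newdf)
```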
Python: Expand a dataframe row-wise based on datetime
I have a dataframe like this: ID Date Value 783 C 2018-02-23 0.704 580 B 2018-08-04 -1.189 221 A 2018-08-10 -0.788 228 A 2018-08-17 0.038 578 B 2018-08-02 1.188 What I want is expanding the dataframe based on Date column to 1-month earlier, and fill ID with the same person, and fill Value with nan until the last observation. The expected result is similar to this: ID Date Value 0 C 2018/01/24 nan 1 C 2018/01/25 nan 2 C 2018/01/26 nan 3 C 2018/01/27 nan 4 C 2018/01/28 nan 5 C 2018/01/29 nan 6 C 2018/01/30 nan 7 C 2018/01/31 nan 8 C 2018/02/01 nan 9 C 2018/02/02 nan 10 C 2018/02/03 nan 11 C 2018/02/04 nan 12 C 2018/02/05 nan 13 C 2018/02/06 nan 14 C 2018/02/07 nan 15 C 2018/02/08 nan 16 C 2018/02/09 nan 17 C 2018/02/10 nan 18 C 2018/02/11 nan 19 C 2018/02/12 nan 20 C 2018/02/13 nan 21 C 2018/02/14 nan 22 C 2018/02/15 nan 23 C 2018/02/16 nan 24 C 2018/02/17 nan 25 C 2018/02/18 nan 26 C 2018/02/19 nan 27 C 2018/02/20 nan 28 C 2018/02/21 nan 29 C 2018/02/22 nan 30 C 2018/02/23 1.093 31 B 2018/07/05 nan 32 B 2018/07/06 nan 33 B 2018/07/07 nan 34 B 2018/07/08 nan 35 B 2018/07/09 nan 36 B 2018/07/10 nan 37 B 2018/07/11 nan 38 B 2018/07/12 nan 39 B 2018/07/13 nan 40 B 2018/07/14 nan 41 B 2018/07/15 nan 42 B 2018/07/16 nan 43 B 2018/07/17 nan 44 B 2018/07/18 nan 45 B 2018/07/19 nan 46 B 2018/07/20 nan 47 B 2018/07/21 nan 48 B 2018/07/22 nan 49 B 2018/07/23 nan 50 B 2018/07/24 nan 51 B 2018/07/25 nan 52 B 2018/07/26 nan 53 B 2018/07/27 nan 54 B 2018/07/28 nan 55 B 2018/07/29 nan 56 B 2018/07/30 nan 57 B 2018/07/31 nan 58 B 2018/08/01 nan 59 B 2018/08/02 nan 60 B 2018/08/03 nan 61 B 2018/08/04 0.764 62 A 2018/07/11 nan 63 A 2018/07/12 nan 64 A 2018/07/13 nan 65 A 2018/07/14 nan 66 A 2018/07/15 nan 67 A 2018/07/16 nan 68 A 2018/07/17 nan 69 A 2018/07/18 nan 70 A 2018/07/19 nan 71 A 2018/07/20 nan 72 A 2018/07/21 nan 73 A 2018/07/22 nan 74 A 2018/07/23 nan 75 A 2018/07/24 nan 76 A 2018/07/25 nan 77 A 2018/07/26 nan 78 A 2018/07/27 nan 79 A 2018/07/28 nan 80 A 2018/07/29 nan 81 A 
2018/07/30 nan 82 A 2018/07/31 nan 83 A 2018/08/01 nan 84 A 2018/08/02 nan 85 A 2018/08/03 nan 86 A 2018/08/04 nan 87 A 2018/08/05 nan 88 A 2018/08/06 nan 89 A 2018/08/07 nan 90 A 2018/08/08 nan 91 A 2018/08/09 nan 92 A 2018/08/10 2.144 93 A 2018/07/18 nan 94 A 2018/07/19 nan 95 A 2018/07/20 nan 96 A 2018/07/21 nan 97 A 2018/07/22 nan 98 A 2018/07/23 nan 99 A 2018/07/24 nan 100 A 2018/07/25 nan 101 A 2018/07/26 nan 102 A 2018/07/27 nan 103 A 2018/07/28 nan 104 A 2018/07/29 nan 105 A 2018/07/30 nan 106 A 2018/07/31 nan 107 A 2018/08/01 nan 108 A 2018/08/02 nan 109 A 2018/08/03 nan 110 A 2018/08/04 nan 111 A 2018/08/05 nan 112 A 2018/08/06 nan 113 A 2018/08/07 nan 114 A 2018/08/08 nan 115 A 2018/08/09 nan 116 A 2018/08/10 nan 117 A 2018/08/11 nan 118 A 2018/08/12 nan 119 A 2018/08/13 nan 120 A 2018/08/14 nan 121 A 2018/08/15 nan 122 A 2018/08/16 nan 123 A 2018/08/17 0.644 124 B 2018/07/03 nan 125 B 2018/07/04 nan 126 B 2018/07/05 nan 127 B 2018/07/06 nan 128 B 2018/07/07 nan 129 B 2018/07/08 nan 130 B 2018/07/09 nan 131 B 2018/07/10 nan 132 B 2018/07/11 nan 133 B 2018/07/12 nan 134 B 2018/07/13 nan 135 B 2018/07/14 nan 136 B 2018/07/15 nan 137 B 2018/07/16 nan 138 B 2018/07/17 nan 139 B 2018/07/18 nan 140 B 2018/07/19 nan 141 B 2018/07/20 nan 142 B 2018/07/21 nan 143 B 2018/07/22 nan 144 B 2018/07/23 nan 145 B 2018/07/24 nan 146 B 2018/07/25 nan 147 B 2018/07/26 nan 148 B 2018/07/27 nan 149 B 2018/07/28 nan 150 B 2018/07/29 nan 151 B 2018/07/30 nan 152 B 2018/07/31 nan 153 B 2018/08/01 nan 154 B 2018/08/02 -0.767 The source data can be created as below: import pandas as pd from itertools import chain import numpy as np df_1 = pd.DataFrame({ 'ID' : list(chain.from_iterable([['A'] * 365, ['B'] * 365, ['C'] * 365])), 'Date' : pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist() + pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist() + pd.date_range(start = '2018-01-01', end = '2018-12-31').tolist(), 'Value' : np.random.randn(365 * 3) }) df_1 = 
df_1.sample(5, random_state=123)

Thanks for the advice!
You can create another DataFrame shifted to the previous month, join the two together with concat, and set a DatetimeIndex, which makes it possible to use groupby with a daily resample ('D') to add all the dates in between:

df_2 = df_1.assign(Date=df_1['Date'] - pd.DateOffset(months=1) + pd.DateOffset(days=1), Value=np.nan)
df = (pd.concat([df_2, df_1], sort=False)
        .reset_index()
        .set_index('Date')
        .groupby('index', sort=False)
        .resample('D')
        .ffill()
        .reset_index(level=1)
        .drop('index', axis=1)
        .rename_axis(None))
print(df)

          Date ID     Value
783 2018-01-24  C       NaN
783 2018-01-25  C       NaN
783 2018-01-26  C       NaN
783 2018-01-27  C       NaN
783 2018-01-28  C       NaN
..         ... ..       ...
578 2018-07-29  B       NaN
578 2018-07-30  B       NaN
578 2018-07-31  B       NaN
578 2018-08-01  B       NaN
578 2018-08-02  B  0.562684

[155 rows x 3 columns]

Another solution uses a list comprehension with concat; here the index and ID columns have to be back-filled at the end, and it works only if there are no missing values in the original ID column:

offset = pd.DateOffset(months=1) + pd.DateOffset(days=1)
df = pd.concat([df_1.iloc[[i]].reset_index().set_index('Date')
                     .reindex(pd.date_range(d - offset, d))
                for i, d in enumerate(df_1['Date'])], sort=False)
df = (df.assign(index=df['index'].bfill().astype(int), ID=df['ID'].bfill())
        .rename_axis('Date')
        .reset_index()
        .set_index('index')
        .rename_axis(None))
print(df)

          Date ID     Value
783 2018-01-24  C       NaN
783 2018-01-25  C       NaN
783 2018-01-26  C       NaN
783 2018-01-27  C       NaN
783 2018-01-28  C       NaN
..         ... ..       ...
578 2018-07-29  B       NaN
578 2018-07-30  B       NaN
578 2018-07-31  B       NaN
578 2018-08-01  B       NaN
578 2018-08-02  B  1.224345

[155 rows x 3 columns]
We can create a date range in the "Date" column, then explode it. Then group the "Value" column by the index and set all values but the last to NaN. Finally, reset the index.

def drange(t):
    return pd.date_range(t - pd.DateOffset(months=1) + pd.DateOffset(days=1), t, freq="D", normalize=True)

df["Date"] = df["Date"].transform(drange)

    ID                                               Date  Value
783  C  DatetimeIndex(['2018-01-24', '2018-01-25', '20...  0.704
580  B  DatetimeIndex(['2018-07-05', '2018-07-06', '20... -1.189
221  A  DatetimeIndex(['2018-07-11', '2018-07-12', '20... -0.788
228  A  DatetimeIndex(['2018-07-18', '2018-07-19', '20...  0.038
578  B  DatetimeIndex(['2018-07-03', '2018-07-04', '20...  1.188

df = df.reset_index(drop=True).explode(column="Date")

   ID       Date  Value
0   C 2018-01-24  0.704
0   C 2018-01-25  0.704
0   C 2018-01-26  0.704
0   C 2018-01-27  0.704
0   C 2018-01-28  0.704
.. ..        ...    ...
4   B 2018-07-29  1.188
4   B 2018-07-30  1.188
4   B 2018-07-31  1.188
4   B 2018-08-01  1.188
4   B 2018-08-02  1.188

df["Value"] = df.groupby(level=0)["Value"].transform(lambda v: [np.nan]*(len(v)-1) + [v.iloc[0]])
df = df.reset_index(drop=True)

     ID       Date  Value
0     C 2018-01-24    NaN
1     C 2018-01-25    NaN
2     C 2018-01-26    NaN
3     C 2018-01-27    NaN
4     C 2018-01-28    NaN
..   ..        ...    ...
150   B 2018-07-29    NaN
151   B 2018-07-30    NaN
152   B 2018-07-31    NaN
153   B 2018-08-01    NaN
154   B 2018-08-02  1.188
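A compact, self-contained version of the same explode idea, run on a single toy observation (hypothetical data in the question's shape; Series.apply is used here for the unambiguously element-wise step):

```python
import numpy as np
import pandas as pd

# Toy frame with one observation, in the same shape as the question's data
df = pd.DataFrame({'ID': ['C'], 'Date': [pd.Timestamp('2018-02-23')], 'Value': [0.704]})

def drange(t):
    # All days from one month before (plus one day) up to the observation date
    return pd.date_range(t - pd.DateOffset(months=1) + pd.DateOffset(days=1), t, freq='D')

df['Date'] = df['Date'].apply(drange)   # each cell becomes a DatetimeIndex
df = df.explode(column='Date')          # one row per day, original index repeated
# Keep the value only on the last day of each original row
df['Value'] = df.groupby(level=0)['Value'].transform(
    lambda v: [np.nan] * (len(v) - 1) + [v.iloc[0]])
df = df.reset_index(drop=True)
print(df)
```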
Pivoting DataFrame with multiple columns for the index
I have a dataframe and I want to pivot a few of its rows into columns. This is what I have now:

   Entity   Name        Date  Value
0     111  Name1  2018-03-31    100
1     111  Name2  2018-02-28    200
2     222  Name3  2018-02-28   1000
3     333  Name1  2018-01-31   2000

I want to make Date the columns and fill them with Value. Something like this:

   Entity   Name  2018-01-31  2018-02-28  2018-03-31
0     111  Name1         NaN         NaN       100.0
1     111  Name2         NaN       200.0         NaN
2     222  Name3         NaN      1000.0         NaN
3     333  Name1      2000.0         NaN         NaN

I can have identical Names for two different Entitys. Here is an updated dataset. Code:

import pandas as pd
import datetime

data1 = {
    'Entity': [111, 111, 222, 333],
    'Name': ['Name1', 'Name2', 'Name3', 'Name1'],
    'Date': [datetime.date(2018, 3, 31), datetime.date(2018, 2, 28),
             datetime.date(2018, 2, 28), datetime.date(2018, 1, 31)],
    'Value': [100, 200, 1000, 2000]
}
df1 = pd.DataFrame(data1, columns=['Entity', 'Name', 'Date', 'Value'])

How do I achieve this? Any pointers? Thanks all.
Based on your update, you'd need pivot_table with two index columns:

v = df1.pivot_table(index=['Entity', 'Name'], columns='Date', values='Value').reset_index()
v.index.name = v.columns.name = None

v
   Entity   Name  2018-01-31  2018-02-28  2018-03-31
0     111  Name1         NaN         NaN       100.0
1     111  Name2         NaN       200.0         NaN
2     222  Name3         NaN      1000.0         NaN
3     333  Name1      2000.0         NaN         NaN
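One behavioral detail worth knowing when choosing pivot_table over pivot: pivot_table silently aggregates duplicate (index, column) pairs, taking the mean by default, where pivot would raise. A toy illustration with invented values:

```python
import pandas as pd

df = pd.DataFrame({'Entity': [111, 111], 'Name': ['Name1', 'Name1'],
                   'Date': ['2018-03-31', '2018-03-31'], 'Value': [100, 300]})

# pivot_table aggregates the duplicate (Entity, Name) x Date pair -- mean by default
v = df.pivot_table(index=['Entity', 'Name'], columns='Date', values='Value')
print(v.loc[(111, 'Name1'), '2018-03-31'])  # 200.0 (mean of 100 and 300)
```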
From unstack:

df1.set_index(['Entity', 'Name', 'Date']).Value.unstack().reset_index()

Date  Entity   Name  2018-01-31 00:00:00  2018-02-28 00:00:00  2018-03-31 00:00:00
0        111  Name1                  NaN                  NaN                100.0
1        111  Name2                  NaN                200.0                  NaN
2        222  Name3                  NaN               1000.0                  NaN
3        333  Name1               2000.0                  NaN                  NaN