Flip and shift multi-column data to the left in Pandas - python

Here's Employee-Supervisor mapping data.
I'd like to flip each row and then shift the data one position to the left. Only the data should move; the columns themselves stay fixed. Could you tell me how I can do this?
Input: Bottom-Up approach

Emp_ID  Sup_1 ID  Sup_2 ID  Sup_3 ID  Sup_4 ID
123     234       456       678       789
234     456       678       789       NaN
456     678       789       NaN       NaN
678     789       NaN       NaN       NaN
789     NaN       NaN       NaN       NaN
Output: Top-Down approach

Emp_ID  Sup_1 ID  Sup_2 ID  Sup_3 ID  Sup_4 ID
123     789       678       456       234
234     789       678       456       NaN
456     789       678       NaN       NaN
678     789       NaN       NaN       NaN
789     NaN       NaN       NaN       NaN
Appreciate any kind of assistance.

Try with np.fliplr:
# Get the supervisor columns as a NumPy array
x = df.loc[:, 'Sup_1 ID':].to_numpy()
# Flip left to right
a = np.fliplr(x)
# Overwrite the non-NaN positions in x with the non-NaN values from a
x[~np.isnan(x)] = a[~np.isnan(a)]
# Write the result back into the DataFrame
df.loc[:, 'Sup_1 ID':] = x
df:
Emp_ID Sup_1 ID Sup_2 ID Sup_3 ID Sup_4 ID
0 123 789.0 678.0 456.0 234.0
1 234 789.0 678.0 456.0 NaN
2 456 789.0 678.0 NaN NaN
3 678 789.0 NaN NaN NaN
4 789 NaN NaN NaN NaN
DataFrame constructor and imports:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Emp_ID': [123, 234, 456, 678, 789],
    'Sup_1 ID': [234.0, 456.0, 678.0, 789.0, np.nan],
    'Sup_2 ID': [456.0, 678.0, 789.0, np.nan, np.nan],
    'Sup_3 ID': [678.0, 789.0, np.nan, np.nan, np.nan],
    'Sup_4 ID': [789.0, np.nan, np.nan, np.nan, np.nan]
})
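For comparison, the same row-wise flip can also be done without dropping to NumPy indexing. This is a hedged, pure-pandas sketch of the same idea (reverse each row's non-NaN values and keep them left-aligned); the helper name flip_row is mine, and this will be slower than the np.fliplr version on large frames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Emp_ID': [123, 234, 456, 678, 789],
    'Sup_1 ID': [234.0, 456.0, 678.0, 789.0, np.nan],
    'Sup_2 ID': [456.0, 678.0, 789.0, np.nan, np.nan],
    'Sup_3 ID': [678.0, 789.0, np.nan, np.nan, np.nan],
    'Sup_4 ID': [789.0, np.nan, np.nan, np.nan, np.nan]
})

def flip_row(row):
    # Reverse only the non-NaN values and keep them left-aligned
    vals = row.dropna().tolist()[::-1]
    out = pd.Series(np.nan, index=row.index)
    out.iloc[:len(vals)] = vals
    return out

df.loc[:, 'Sup_1 ID':] = df.loc[:, 'Sup_1 ID':].apply(flip_row, axis=1)
print(df)
```

The apply(axis=1) call runs once per row, so on large frames the vectorized np.fliplr masking above remains the better choice.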

In your case, try np.roll:
df = df.set_index('Emp_ID')
out = df.apply(lambda x: np.roll(x[x.notnull()].values, 1)).apply(pd.Series)
which gives
              0      1      2      3
Sup_1 ID  789.0  234.0  456.0  678.0
Sup_2 ID  789.0  456.0  678.0    NaN
Sup_3 ID  789.0  678.0    NaN    NaN
Sup_4 ID  789.0    NaN    NaN    NaN
out.columns = df.columns


Transpose and Compare

I'm attempting to compare two data frames. Item and Summary variables correspond to various dates and quantities. I'd like to transpose the dates into one column of data along with the associated quantities. I'd then like to compare the two data frames and see what changed from PreviousData to CurrentData.
Previous Data:
PreviousData = { 'Item' : ['abc','def','ghi','jkl','mno','pqr','stu','vwx','yza','uaza','fupa'],
'Summary' : ['party','weekend','food','school','tv','photo','camera','python','r','rstudio','spyder'],
'2022-01-01' : [1, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan,np.nan,2],
'2022-02-01' : [1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-03-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan],
'2022-04-01' : [np.nan,np.nan,3,np.nan,np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-05-01' : [np.nan,np.nan,np.nan,3,np.nan,np.nan,2,np.nan,np.nan,3,np.nan],
'2022-06-01' : [np.nan,np.nan,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-07-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan],
'2022-08-01' : [np.nan,np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-09-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,1,np.nan],
'2022-10-01' : [np.nan,np.nan,1,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-11-01' : [np.nan,2,np.nan,np.nan,1,1,1,np.nan,np.nan,np.nan,np.nan],
'2022-12-01' : [np.nan,np.nan,np.nan,np.nan,3,np.nan,np.nan,2,np.nan,np.nan,np.nan],
'2023-01-01' : [np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,2,np.nan,np.nan],
'2023-02-01' : [np.nan,np.nan,np.nan,2,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan],
'2023-03-01' : [np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2023-04-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
'2023-05-01' : [np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,2,np.nan],
'2023-06-01' : [1,1,np.nan,np.nan,9,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2023-07-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2023-08-01' : [np.nan,1,np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,np.nan],
'2023-09-01' : [np.nan,1,1,np.nan,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
}
PreviousData = pd.DataFrame(PreviousData)
PreviousData
Current Data:
CurrentData = { 'Item' : ['ghi','stu','abc','mno','jkl','pqr','def','vwx','yza'],
'Summary' : ['food','camera','party','tv','school','photo','weekend','python','r'],
'2022-01-01' : [3, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan],
'2022-02-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-03-01' : [np.nan,1,1,1,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-04-01' : [np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-05-01' : [np.nan,np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-06-01' : [2,np.nan,np.nan,np.nan,4,np.nan,np.nan,np.nan,np.nan],
'2022-07-01' : [np.nan,np.nan,np.nan,np.nan,np.nan,4,np.nan,np.nan,np.nan],
'2022-08-01' : [np.nan,np.nan,3,np.nan,4,np.nan,np.nan,np.nan,np.nan],
'2022-09-01' : [np.nan,np.nan,3,3,3,np.nan,np.nan,5,5],
'2022-10-01' : [np.nan,np.nan,np.nan,np.nan,5,np.nan,np.nan,np.nan,np.nan],
'2022-11-01' : [np.nan,np.nan,np.nan,5,np.nan,np.nan,np.nan,np.nan,np.nan],
'2022-12-01' : [np.nan,4,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
'2023-01-01' : [np.nan,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,np.nan],
'2023-02-01' : [np.nan,np.nan,np.nan,2,1,np.nan,np.nan,np.nan,np.nan],
'2023-03-01' : [np.nan,np.nan,np.nan,np.nan,2,np.nan,2,np.nan,2],
'2023-04-01' : [np.nan,np.nan,np.nan,np.nan,np.nan,2,np.nan,np.nan,2],
}
CurrentData = pd.DataFrame(CurrentData)
CurrentData
How to transpose and compare these two sets?
One way of doing this is the following. Transpose both dataframes to long format with melt:
PreviousData_t = PreviousData.melt(id_vars=["Item", "Summary"],
                                   var_name="Date",
                                   value_name="value1")
which is
Item Summary Date value1
0 abc party 2022-01-01 1.0
1 def weekend 2022-01-01 NaN
2 ghi food 2022-01-01 NaN
3 jkl school 2022-01-01 1.0
4 mno tv 2022-01-01 NaN
.. ... ... ... ...
226 stu camera 2023-09-01 NaN
227 vwx python 2023-09-01 1.0
228 yza r 2023-09-01 NaN
229 uaza rstudio 2023-09-01 NaN
230 fupa spyder 2023-09-01 NaN
and
CurrentData_t = CurrentData.melt(id_vars=["Item", "Summary"],
                                 var_name="Date",
                                 value_name="value2")
Item Summary Date value2
0 ghi food 2022-01-01 3.0
1 stu camera 2022-01-01 NaN
2 abc party 2022-01-01 NaN
3 mno tv 2022-01-01 1.0
4 jkl school 2022-01-01 NaN
.. ... ... ... ...
139 jkl school 2023-04-01 NaN
140 pqr photo 2023-04-01 2.0
141 def weekend 2023-04-01 NaN
142 vwx python 2023-04-01 NaN
143 yza r 2023-04-01 2.0
[144 rows x 4 columns]
Then merge:
Compare = PreviousData_t.merge(CurrentData_t, on=['Date', 'Item', 'Summary'], how='left')
Item Summary Date value1 value2
0 abc party 2022-01-01 1.0 NaN
1 def weekend 2022-01-01 NaN NaN
2 ghi food 2022-01-01 NaN 3.0
3 jkl school 2022-01-01 1.0 NaN
4 mno tv 2022-01-01 NaN 1.0
.. ... ... ... ... ...
226 stu camera 2023-09-01 NaN NaN
227 vwx python 2023-09-01 1.0 NaN
228 yza r 2023-09-01 NaN NaN
229 uaza rstudio 2023-09-01 NaN NaN
230 fupa spyder 2023-09-01 NaN NaN
[231 rows x 5 columns]
and compare by creating a column marking differences:
Compare['diff'] = np.where(Compare['value1'] != Compare['value2'], 1, 0)
Item Summary Date value1 value2 diff
0 abc party 2022-01-01 1.0 NaN 1
1 def weekend 2022-01-01 NaN NaN 1
2 ghi food 2022-01-01 NaN 3.0 1
3 jkl school 2022-01-01 1.0 NaN 1
4 mno tv 2022-01-01 NaN 1.0 1
.. ... ... ... ... ... ...
226 stu camera 2023-09-01 NaN NaN 1
227 vwx python 2023-09-01 1.0 NaN 1
228 yza r 2023-09-01 NaN NaN 1
229 uaza rstudio 2023-09-01 NaN NaN 1
230 fupa spyder 2023-09-01 NaN NaN 1
[231 rows x 6 columns]
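One caveat with the diff column above: NaN != NaN evaluates to True, so rows that are missing on both sides (e.g. def weekend or fupa spyder) also get diff = 1. If two NaNs should count as equal, a NaN-aware variant might look like this sketch (illustrated on a tiny made-up frame, not the full data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the merged Compare frame
compare = pd.DataFrame({'value1': [1.0, np.nan, np.nan, 2.0],
                        'value2': [1.0, np.nan, 3.0, 5.0]})

# Treat a pair of NaNs as "no difference"
both_nan = compare['value1'].isna() & compare['value2'].isna()
compare['diff'] = np.where((compare['value1'] != compare['value2']) & ~both_nan, 1, 0)
print(compare)
```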
If you only want to compare those entries that are common to both, do this:
Compare = PreviousData_t.merge(CurrentData_t, on=['Date', 'Item', 'Summary'])
Compare['diff'] = np.where(Compare['value1'] != Compare['value2'], 1, 0)
Item Summary Date value1 value2 diff
0 abc party 2022-01-01 1.0 NaN 1
1 def weekend 2022-01-01 NaN NaN 1
2 ghi food 2022-01-01 NaN 3.0 1
3 jkl school 2022-01-01 1.0 NaN 1
4 mno tv 2022-01-01 NaN 1.0 1
.. ... ... ... ... ... ...
139 mno tv 2023-04-01 NaN NaN 1
140 pqr photo 2023-04-01 NaN 2.0 1
141 stu camera 2023-04-01 NaN NaN 1
142 vwx python 2023-04-01 1.0 NaN 1
143 yza r 2023-04-01 NaN 2.0 1
[144 rows x 6 columns]
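To see which (Item, Summary, Date) rows exist in only one of the two frames, which is part of "what changed", merge's built-in indicator=True option can help. A small sketch with made-up miniature frames standing in for the melted data:

```python
import pandas as pd

# Tiny stand-ins for PreviousData_t / CurrentData_t (hypothetical sample rows)
prev = pd.DataFrame({'Item': ['abc', 'def', 'uaza'],
                     'Summary': ['party', 'weekend', 'rstudio'],
                     'Date': ['2022-01-01'] * 3,
                     'value1': [1.0, None, 2.0]})
curr = pd.DataFrame({'Item': ['abc', 'def'],
                     'Summary': ['party', 'weekend'],
                     'Date': ['2022-01-01'] * 2,
                     'value2': [3.0, None]})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
cmp_df = prev.merge(curr, on=['Date', 'Item', 'Summary'], how='outer', indicator=True)
print(cmp_df)
```

Here uaza/rstudio comes out as 'left_only', i.e. an Item dropped between PreviousData and CurrentData.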

More efficient way to do the same merge on multiple columns in a dataframe?

Input:
df1
OFF_P1 OFF_P2 OFF_P3 OFF_P4 OFF_P5 GAME_ID
0 1629675 1627736 1630162 201976 1629020 22101224
1 201599 1630178 202699 1629680 201980 22101228
2 1630191 1630180 1630587 1630240 1628402 22101228
3 1627759 201143 1628464 1628369 203935 22101223
4 1630573 1630271 1630238 1628436 1630346 22101223
df2
PLAYER_ID GAME_ID PTS
0 201980 21900002 28
1 201586 21900001 13
2 1628366 21900001 8
3 200755 21900001 16
4 202324 21900001 6
Desired Output:
OFF_P1 OFF_P2 OFF_P3 OFF_P4 OFF_P5 GAME_ID OFF_P1_PTS OFF_P2_PTS etc...
0 1629675 1627736 1630162 201976 1629020 22101224 28 13 ...
1 201599 1630178 202699 1629680 201980 22101228 12
2 1630191 1630180 1630587 1630240 1628402 22101228 14
3 1627759 201143 1628464 1628369 203935 22101223 8
4 1630573 1630271 1630238 1628436 1630346 22101223 19
I would like to merge the PTS column from df2 into df1, once for each of the columns OFF_P1, OFF_P2, etc.
Is there a more efficient way to do this other than something like the below?
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P1'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P2'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P3'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P4'], right_on=['GAME_ID', 'PLAYER_ID'])
df1 = df1.merge(df2, left_on=['GAME_ID', 'OFF_P5'], right_on=['GAME_ID', 'PLAYER_ID'])
I would prefer the MultiIndex.map approach:
d = df2.set_index(['GAME_ID', 'PLAYER_ID'])['PTS']
for c in df1.filter(like='OFF_P'):
    df1[f'{c}_PTS'] = df1.set_index(['GAME_ID', c]).index.map(d)
print(df1)
    OFF_P1   OFF_P2   OFF_P3   OFF_P4   OFF_P5   GAME_ID  OFF_P1_PTS  OFF_P2_PTS  OFF_P3_PTS  OFF_P4_PTS  OFF_P5_PTS
0  1629675  1627736  1630162   201976  1629020  22101224         NaN         NaN         NaN         NaN         NaN
1   201599  1630178   202699  1629680   201980  22101228         NaN         NaN         NaN         NaN         NaN
2  1630191  1630180  1630587  1630240  1628402  22101228         NaN         NaN         NaN         NaN         NaN
3  1627759   201143  1628464  1628369   203935  22101223         NaN         NaN         NaN         NaN         NaN
4  1630573  1630271  1630238  1628436  1630346  22101223         NaN         NaN         NaN         NaN         NaN
(With the sample data shown, every mapped PTS is NaN because no (GAME_ID, PLAYER_ID) pair from df1 appears in df2.)
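An alternative to the repeated merges (my own sketch, not from the answer above) is a single merge in long format: melt the OFF_P columns, merge once on (GAME_ID, PLAYER_ID), then pivot the points back. Shown with small made-up frames:

```python
import pandas as pd

# Hypothetical miniature versions of df1 / df2
df1 = pd.DataFrame({'OFF_P1': [1629675, 201599],
                    'OFF_P2': [1627736, 1630178],
                    'GAME_ID': [22101224, 22101228]})
df2 = pd.DataFrame({'PLAYER_ID': [1629675, 1630178],
                    'GAME_ID': [22101224, 22101228],
                    'PTS': [28, 13]})

slots = [c for c in df1.columns if c.startswith('OFF_P')]
# Melt to one row per (game, slot, player), merge PTS once
long = (df1.reset_index()
           .melt(id_vars=['index', 'GAME_ID'], value_vars=slots,
                 var_name='slot', value_name='PLAYER_ID')
           .merge(df2, on=['GAME_ID', 'PLAYER_ID'], how='left'))
# Pivot the points back to one column per slot
pts = long.pivot(index='index', columns='slot', values='PTS').add_suffix('_PTS')
result = df1.join(pts)
print(result)
```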

transpose/stack/unstack pandas dataframe whilst concatenating the field name with existing columns

I have a dataframe that looks something like:
Component Date MTD YTD QTD FC
ABC Jan 2017 56 nan nan nan
DEF Jan 2017 453 nan nan nan
XYZ Jan 2017 657
PQR Jan 2017 123
ABC Feb 2017 56 nan nan nan
DEF Feb 2017 456 nan nan nan
XYZ Feb 2017 6234 57
PQR Feb 2017 123 346
ABC Dec 2017 56 nan nan nan
DEF Dec 2017 nan nan 345 324
XYZ Dec 2017 6234 57
PQR Dec 2017 nan 346 54654 546
I would like to transpose this dataframe so that the Component becomes the prefix of the existing MTD, QTD, etc. columns,
so the expected output would be:
Date ABC_MTD DEF_MTD XYZ_MTD PQR_MTD ABC_YTD DEF_YTD XYZ_YTD PQR_YTD etcetc
Jan 2017 56 453 657 123 nan nan nan nan
Feb 2017 56 456 6234 123 nan nan 57 346
Dec 2017 56 nan 6234 nan 57 346
I am not sure whether a pivot or stack/unstack would be efficient here.
Thanks in advance.
You could try this:
newdf = df.pivot(values=df.columns[2:], index='Date', columns='Component')
# Join the MultiIndex column names into single strings
newdf.columns = ['%s%s' % (b, '_%s' % a if b else '') for a, b in newdf.columns]
print(newdf)
Output:
df
   Component       Date     MTD    YTD      QTD     FC
0        ABC 2017-01-01    56.0    NaN      NaN    NaN
1        DEF 2017-01-01   453.0    NaN      NaN    NaN
2        XYZ 2017-01-01   657.0    NaN      NaN    NaN
3        PQR 2017-01-01   123.0    NaN      NaN    NaN
4        ABC 2017-02-01    56.0    NaN      NaN    NaN
5        DEF 2017-02-01   456.0    NaN      NaN    NaN
6        XYZ 2017-02-01  6234.0   57.0      NaN    NaN
7        PQR 2017-02-01   123.0  346.0      NaN    NaN
8        ABC 2017-12-01    56.0    NaN      NaN    NaN
9        DEF 2017-12-01     NaN    NaN    345.0  324.0
10       XYZ 2017-12-01  6234.0   57.0      NaN    NaN
11       PQR 2017-12-01     NaN  346.0  54654.0  546.0
newdf
            ABC_MTD  DEF_MTD  PQR_MTD  XYZ_MTD  ABC_YTD  DEF_YTD  PQR_YTD  XYZ_YTD  ABC_QTD  DEF_QTD  PQR_QTD  XYZ_QTD  ABC_FC  DEF_FC  PQR_FC  XYZ_FC
Date
2017-01-01     56.0    453.0    123.0    657.0      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN     NaN     NaN     NaN     NaN
2017-02-01     56.0    456.0    123.0   6234.0      NaN      NaN    346.0     57.0      NaN      NaN      NaN      NaN     NaN     NaN     NaN     NaN
2017-12-01     56.0      NaN      NaN   6234.0      NaN      NaN    346.0     57.0      NaN    345.0  54654.0      NaN     NaN   324.0   546.0     NaN
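The same reshape can also be sketched with set_index + unstack, where the column join reads naturally as an f-string. This is an equivalent alternative spelling, shown on a reduced made-up frame:

```python
import numpy as np
import pandas as pd

# Reduced hypothetical sample (two components, two dates, two metrics)
df = pd.DataFrame({'Component': ['ABC', 'DEF', 'ABC', 'DEF'],
                   'Date': ['2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01'],
                   'MTD': [56, 453, 56, 456],
                   'YTD': [np.nan, np.nan, np.nan, np.nan]})

out = df.set_index(['Date', 'Component']).unstack('Component')
# Columns are (metric, component) pairs; join them as COMPONENT_METRIC
out.columns = [f'{comp}_{metric}' for metric, comp in out.columns]
print(out)
```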

Python: Expand a dataframe row-wise based on datetime

I have a dataframe like this:
ID Date Value
783 C 2018-02-23 0.704
580 B 2018-08-04 -1.189
221 A 2018-08-10 -0.788
228 A 2018-08-17 0.038
578 B 2018-08-02 1.188
What I want is to expand the dataframe, based on the Date column, back to one month earlier, filling ID with the same person and filling Value with NaN until the last observation.
The expected result is similar to this:
ID Date Value
0 C 2018/01/24 nan
1 C 2018/01/25 nan
2 C 2018/01/26 nan
3 C 2018/01/27 nan
4 C 2018/01/28 nan
5 C 2018/01/29 nan
6 C 2018/01/30 nan
7 C 2018/01/31 nan
8 C 2018/02/01 nan
9 C 2018/02/02 nan
10 C 2018/02/03 nan
11 C 2018/02/04 nan
12 C 2018/02/05 nan
13 C 2018/02/06 nan
14 C 2018/02/07 nan
15 C 2018/02/08 nan
16 C 2018/02/09 nan
17 C 2018/02/10 nan
18 C 2018/02/11 nan
19 C 2018/02/12 nan
20 C 2018/02/13 nan
21 C 2018/02/14 nan
22 C 2018/02/15 nan
23 C 2018/02/16 nan
24 C 2018/02/17 nan
25 C 2018/02/18 nan
26 C 2018/02/19 nan
27 C 2018/02/20 nan
28 C 2018/02/21 nan
29 C 2018/02/22 nan
30 C 2018/02/23 1.093
31 B 2018/07/05 nan
32 B 2018/07/06 nan
33 B 2018/07/07 nan
34 B 2018/07/08 nan
35 B 2018/07/09 nan
36 B 2018/07/10 nan
37 B 2018/07/11 nan
38 B 2018/07/12 nan
39 B 2018/07/13 nan
40 B 2018/07/14 nan
41 B 2018/07/15 nan
42 B 2018/07/16 nan
43 B 2018/07/17 nan
44 B 2018/07/18 nan
45 B 2018/07/19 nan
46 B 2018/07/20 nan
47 B 2018/07/21 nan
48 B 2018/07/22 nan
49 B 2018/07/23 nan
50 B 2018/07/24 nan
51 B 2018/07/25 nan
52 B 2018/07/26 nan
53 B 2018/07/27 nan
54 B 2018/07/28 nan
55 B 2018/07/29 nan
56 B 2018/07/30 nan
57 B 2018/07/31 nan
58 B 2018/08/01 nan
59 B 2018/08/02 nan
60 B 2018/08/03 nan
61 B 2018/08/04 0.764
62 A 2018/07/11 nan
63 A 2018/07/12 nan
64 A 2018/07/13 nan
65 A 2018/07/14 nan
66 A 2018/07/15 nan
67 A 2018/07/16 nan
68 A 2018/07/17 nan
69 A 2018/07/18 nan
70 A 2018/07/19 nan
71 A 2018/07/20 nan
72 A 2018/07/21 nan
73 A 2018/07/22 nan
74 A 2018/07/23 nan
75 A 2018/07/24 nan
76 A 2018/07/25 nan
77 A 2018/07/26 nan
78 A 2018/07/27 nan
79 A 2018/07/28 nan
80 A 2018/07/29 nan
81 A 2018/07/30 nan
82 A 2018/07/31 nan
83 A 2018/08/01 nan
84 A 2018/08/02 nan
85 A 2018/08/03 nan
86 A 2018/08/04 nan
87 A 2018/08/05 nan
88 A 2018/08/06 nan
89 A 2018/08/07 nan
90 A 2018/08/08 nan
91 A 2018/08/09 nan
92 A 2018/08/10 2.144
93 A 2018/07/18 nan
94 A 2018/07/19 nan
95 A 2018/07/20 nan
96 A 2018/07/21 nan
97 A 2018/07/22 nan
98 A 2018/07/23 nan
99 A 2018/07/24 nan
100 A 2018/07/25 nan
101 A 2018/07/26 nan
102 A 2018/07/27 nan
103 A 2018/07/28 nan
104 A 2018/07/29 nan
105 A 2018/07/30 nan
106 A 2018/07/31 nan
107 A 2018/08/01 nan
108 A 2018/08/02 nan
109 A 2018/08/03 nan
110 A 2018/08/04 nan
111 A 2018/08/05 nan
112 A 2018/08/06 nan
113 A 2018/08/07 nan
114 A 2018/08/08 nan
115 A 2018/08/09 nan
116 A 2018/08/10 nan
117 A 2018/08/11 nan
118 A 2018/08/12 nan
119 A 2018/08/13 nan
120 A 2018/08/14 nan
121 A 2018/08/15 nan
122 A 2018/08/16 nan
123 A 2018/08/17 0.644
124 B 2018/07/03 nan
125 B 2018/07/04 nan
126 B 2018/07/05 nan
127 B 2018/07/06 nan
128 B 2018/07/07 nan
129 B 2018/07/08 nan
130 B 2018/07/09 nan
131 B 2018/07/10 nan
132 B 2018/07/11 nan
133 B 2018/07/12 nan
134 B 2018/07/13 nan
135 B 2018/07/14 nan
136 B 2018/07/15 nan
137 B 2018/07/16 nan
138 B 2018/07/17 nan
139 B 2018/07/18 nan
140 B 2018/07/19 nan
141 B 2018/07/20 nan
142 B 2018/07/21 nan
143 B 2018/07/22 nan
144 B 2018/07/23 nan
145 B 2018/07/24 nan
146 B 2018/07/25 nan
147 B 2018/07/26 nan
148 B 2018/07/27 nan
149 B 2018/07/28 nan
150 B 2018/07/29 nan
151 B 2018/07/30 nan
152 B 2018/07/31 nan
153 B 2018/08/01 nan
154 B 2018/08/02 -0.767
The source data can be created as below:
import pandas as pd
from itertools import chain
import numpy as np
df_1 = pd.DataFrame({
    'ID': list(chain.from_iterable([['A'] * 365, ['B'] * 365, ['C'] * 365])),
    'Date': pd.date_range(start='2018-01-01', end='2018-12-31').tolist() * 3,
    'Value': np.random.randn(365 * 3)
})
df_1 = df_1.sample(5, random_state=123)
Thanks for the advice!
You can create another DataFrame with the dates shifted one month earlier, join the two together with concat, create a DatetimeIndex, and then use groupby with resample('D') to add rows for all the days in between:
df_2 = df_1.assign(Date=df_1['Date'] - pd.DateOffset(months=1) + pd.DateOffset(days=1),
                   Value=np.nan)
df = (pd.concat([df_2, df_1], sort=False)
        .reset_index()
        .set_index('Date')
        .groupby('index', sort=False)
        .resample('D')
        .ffill()
        .reset_index(level=1)
        .drop(columns='index')   # .drop('index', 1) no longer works in pandas 2.x
        .rename_axis(None))
print(df)
Date ID Value
783 2018-01-24 C NaN
783 2018-01-25 C NaN
783 2018-01-26 C NaN
783 2018-01-27 C NaN
783 2018-01-28 C NaN
.. ... .. ...
578 2018-07-29 B NaN
578 2018-07-30 B NaN
578 2018-07-31 B NaN
578 2018-08-01 B NaN
578 2018-08-02 B 0.562684
[155 rows x 3 columns]
Another solution uses a list comprehension and concat, but the index and ID columns then need back filling; this works only if the original ID column has no missing values:
offset = pd.DateOffset(months=1) + pd.DateOffset(days=1)
df = pd.concat([df_1.iloc[[i]].reset_index().set_index('Date').reindex(pd.date_range(d - offset, d))
                for i, d in enumerate(df_1['Date'])], sort=False)
df = (df.assign(index=df['index'].bfill().astype(int), ID=df['ID'].bfill())
        .rename_axis('Date')
        .reset_index()
        .set_index('index')
        .rename_axis(None))
print(df)
Date ID Value
783 2018-01-24 C NaN
783 2018-01-25 C NaN
783 2018-01-26 C NaN
783 2018-01-27 C NaN
783 2018-01-28 C NaN
.. ... .. ...
578 2018-07-29 B NaN
578 2018-07-30 B NaN
578 2018-07-31 B NaN
578 2018-08-01 B NaN
578 2018-08-02 B 1.224345
[155 rows x 3 columns]
We can create a date range in the "Date" column and then explode it.
Then group the "Value" column by the index and set every value to NaN except the last.
Finally, reset the index.
def drange(t):
    return pd.date_range(t - pd.DateOffset(months=1) + pd.DateOffset(days=1), t, freq="D", normalize=True)

df["Date"] = df["Date"].transform(drange)
ID Date Value
index
783 C DatetimeIndex(['2018-01-24', '2018-01-25', '20... 0.704
580 B DatetimeIndex(['2018-07-05', '2018-07-06', '20... -1.189
221 A DatetimeIndex(['2018-07-11', '2018-07-12', '20... -0.788
228 A DatetimeIndex(['2018-07-18', '2018-07-19', '20... 0.038
578 B DatetimeIndex(['2018-07-03', '2018-07-04', '20... 1.188
df = df.reset_index(drop=True).explode(column="Date")
ID Date Value
0 C 2018-01-24 0.704
0 C 2018-01-25 0.704
0 C 2018-01-26 0.704
0 C 2018-01-27 0.704
0 C 2018-01-28 0.704
.. .. ... ...
4 B 2018-07-29 1.188
4 B 2018-07-30 1.188
4 B 2018-07-31 1.188
4 B 2018-08-01 1.188
4 B 2018-08-02 1.188
df["Value"] = df.groupby(level=0)["Value"].transform(lambda v: [np.nan] * (len(v) - 1) + [v.iloc[0]])
df = df.reset_index(drop=True)
ID Date Value
0 C 2018-01-24 NaN
1 C 2018-01-25 NaN
2 C 2018-01-26 NaN
3 C 2018-01-27 NaN
4 C 2018-01-28 NaN
.. .. ... ...
150 B 2018-07-29 NaN
151 B 2018-07-30 NaN
152 B 2018-07-31 NaN
153 B 2018-08-01 NaN
154 B 2018-08-02 1.188

Pivoting DataFrame with multiple columns for the index

I have a dataframe and I want to transpose only a few rows to columns.
This is what I have now:
Entity Name Date Value
0 111 Name1 2018-03-31 100
1 111 Name2 2018-02-28 200
2 222 Name3 2018-02-28 1000
3 333 Name1 2018-01-31 2000
I want the dates to become columns, filled with the values. Something like this:
Entity Name 2018-01-31 2018-02-28 2018-03-31
0 111 Name1 NaN NaN 100.0
1 111 Name2 NaN 200.0 NaN
2 222 Name3 NaN 1000.0 NaN
3 333 Name1 2000.0 NaN NaN
I can have identical Names for two different Entities. Here is an updated dataset.
Code:
import pandas as pd
import datetime
data1 = {
    'Entity': [111, 111, 222, 333],
    'Name': ['Name1', 'Name2', 'Name3', 'Name1'],
    'Date': [datetime.date(2018, 3, 31), datetime.date(2018, 2, 28),
             datetime.date(2018, 2, 28), datetime.date(2018, 1, 31)],
    'Value': [100, 200, 1000, 2000]
}
df1 = pd.DataFrame(data1, columns=['Entity', 'Name', 'Date', 'Value'])
How do I achieve this? Any pointers? Thanks all.
Based on your update, you'd need pivot_table with two index columns:
v = df1.pivot_table(index=['Entity', 'Name'],
                    columns='Date',
                    values='Value').reset_index()
v.index.name = v.columns.name = None
v
Entity Name 2018-01-31 2018-02-28 2018-03-31
0 111 Name1 NaN NaN 100.0
1 111 Name2 NaN 200.0 NaN
2 222 Name3 NaN 1000.0 NaN
3 333 Name1 2000.0 NaN NaN
Or with unstack:
df1.set_index(['Entity', 'Name', 'Date']).Value.unstack().reset_index()
Date Entity Name 2018-01-31 00:00:00 2018-02-28 00:00:00 \
0 111 Name1 NaN NaN
1 111 Name2 NaN 200.0
2 222 Name3 NaN 1000.0
3 333 Name1 2000.0 NaN
Date 2018-03-31 00:00:00
0 100.0
1 NaN
2 NaN
3 NaN
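Note that in the unstack output the Date column labels carry a 00:00:00 time part once they become Timestamps. If plain date strings are wanted as column names, one possible clean-up sketch (the formatting choice here is an assumption):

```python
import datetime
import pandas as pd

data1 = {'Entity': [111, 111, 222, 333],
         'Name': ['Name1', 'Name2', 'Name3', 'Name1'],
         'Date': [datetime.date(2018, 3, 31), datetime.date(2018, 2, 28),
                  datetime.date(2018, 2, 28), datetime.date(2018, 1, 31)],
         'Value': [100, 200, 1000, 2000]}
df1 = pd.DataFrame(data1)

r = df1.set_index(['Entity', 'Name', 'Date'])['Value'].unstack().reset_index()
# Render date-like column labels as plain YYYY-MM-DD strings
r.columns = [c.strftime('%Y-%m-%d') if hasattr(c, 'strftime') else c for c in r.columns]
print(r)
```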
