Python conditional lookup

I have a transactional table and a lookup table, shown below. I need to add the val field from df_lkp to df_txn via a lookup.
For each record of df_txn, I need to loop through df_lkp. If the grp value is a, compare only field a in both tables to find a match. If the grp value is ab, compare fields a and b in both tables. If it is abc, compare fields a, b and c to fetch the val field, and so on. Is there a way to do this in pandas without a for loop?
import pandas as pd
df_txn = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'}})
df_lkp = pd.DataFrame({'a': {0: '226', 1: '227', 2: '236', 3: '237', 4: '248'},
'b': {0: '0E31', 1: '0E32', 2: '0E33', 3: '0E35', 4: '0E40'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3018', 4: '3019'},
'grp': {0: 'a', 1: 'ab', 2: 'abc', 3: 'b', 4: 'bc'},
'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: 'KE67593065', 3: 'KE67593262', 4: 'KE00CH0003'}})
The expected output:
df_tx2 = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'},
'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: '', 3: '', 4: '', 5: '', 6: 'KE67593065'}
})
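One way to avoid looping over records is sketched below, under a few assumptions: pandas 1.2 or later (for how='cross'), grp values that only ever name the columns a, b and c, and "first match wins" when several lookup rows qualify. It cross joins the two tables and keeps the pairs in which every column named in grp agrees; the only loop is over the three column names. The cross join materialises len(df_txn) * len(df_lkp) rows, so it suits small lookup tables. Under this literal reading, id 4 (b = '0E35') matches the lookup row with grp 'b' and gets KE67593262, whereas the expected output above leaves it empty; if grp values that do not contain a are meant to behave differently, the mask needs adjusting.

```python
import pandas as pd

df_txn = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7'],
                       'amt': [100, 200, 300, 400, 500, 600, 700],
                       'a': ['226', '227', '248', '236', '248', '236', '236'],
                       'b': ['0E31', '0E32', '0E40', '0E35', '0E40', '0E40', '0E33'],
                       'c': ['3014', '3015', '3016', '3016', '3016', '3016', '3016']})
df_lkp = pd.DataFrame({'a': ['226', '227', '236', '237', '248'],
                       'b': ['0E31', '0E32', '0E33', '0E35', '0E40'],
                       'c': ['3014', '3015', '3016', '3018', '3019'],
                       'grp': ['a', 'ab', 'abc', 'b', 'bc'],
                       'val': ['KE00CH0004', 'KE00CH0003', 'KE67593065', 'KE67593262', 'KE00CH0003']})

# Pair every transaction with every lookup row (how='cross' needs pandas >= 1.2);
# the overlapping lookup columns a, b, c become a_lkp, b_lkp, c_lkp.
pairs = df_txn.merge(df_lkp, how='cross', suffixes=('', '_lkp'))

# A pair matches when every column named in grp agrees; other columns are ignored.
match = pd.Series(True, index=pairs.index)
for col in ['a', 'b', 'c']:
    in_grp = pairs['grp'].str.contains(col, regex=False)
    match &= ~in_grp | (pairs[col] == pairs[col + '_lkp'])

# First matching val per transaction id; transactions without a match get ''.
first_val = pairs[match].groupby('id')['val'].first()
df_tx2 = df_txn.assign(val=df_txn['id'].map(first_val).fillna(''))
print(df_tx2)
```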

Pandas - Group by multiple columns and datetime

I have a df of tennis results and I would like to see how many days it has been since each player last won a game.
This is what my df looks like:
Player 1  Player 2  Date        p1_win  p2_win
Murray    Nadal     2022-05-16  1       0
Nadal     Murray    2022-05-25  1       0
And this is what I want it to look like:
Player 1  Player 2  Date        p1_win  p2_win  p1_lastwin  p2_lastwin
Murray    Nadal     2022-05-16  1       0       na          na
Nadal     Murray    2022-05-25  1       0       na          9
The result has to include the days since the last win regardless of whether the player was Player 1 or Player 2, probably using groupby. If possible, it would also be good to have a win percentage for the year.
Any help is much appreciated.
Edit: here is the dict:
{'Player 1': {0: 'Murray',
1: 'Nadal',
2: 'Murray',
3: 'Nadal',
4: 'Murray',
5: 'Nadal',
6: 'Murray',
7: 'Nadal',
8: 'Murray',
9: 'Nadal',
10: 'Murray'},
'Player 2': {0: 'Nadal',
1: 'Murray',
2: 'Nadal',
3: 'Murray',
4: 'Nadal',
5: 'Murray',
6: 'Nadal',
7: 'Murray',
8: 'Nadal',
9: 'Murray',
10: 'Nadal'},
'Date': {0: '2022-05-16',
1: '2022-05-26',
2: '2022-05-27',
3: '2022-05-28',
4: '2022-05-29',
5: '2022-06-01',
6: '2022-06-02',
7: '2022-06-05',
8: '2022-06-09',
9: '2022-06-13',
10: '2022-06-17'},
'p1_win': {0: '1',
1: '1',
2: '0',
3: '1',
4: '0',
5: '0',
6: '1',
7: '0',
8: '1',
9: '0',
10: '1'},
'p2_win': {0: '0',
1: '0',
2: '1',
3: '0',
4: '1',
5: '1',
6: '0',
7: '1',
8: '0',
9: '1',
10: '0'}}
Thanks :)
I used pd.merge_asof to find the most recent earlier win, and then merged on the relevant row index to retrieve the date of that win.
import pandas as pd
df = pd.DataFrame({'Player 1': {0: 'Murray', 1: 'Nadal', 2: 'Murray', 3: 'Nadal', 4: 'Murray', 5: 'Nadal', 6: 'Murray'}, 'Player 2': {0: 'Nadal', 1: 'Murray', 2: 'Nadal', 3: 'Murray', 4: 'Nadal', 5: 'Murray', 6: 'Nadal'}, 'Date': {0: '2022-05-16', 1: '2022-05-26', 2: '2022-05-27', 3: '2022-05-28', 4: '2022-05-29', 5: '2022-06-01', 6: '2022-06-02'}, 'p1_win': {0: '1', 1: '1', 2: '0', 3: '1', 4: '0', 5: '0', 6: '1'}, 'p2_win': {0: '0', 1: '0', 2: '1', 3: '0', 4: '1', 5: '1', 6: '0'}})
df['p1_win'] = df.p1_win.astype(int)
df['p2_win'] = df.p2_win.astype(int)
df['Date'] = pd.to_datetime(df['Date'])
# Order-independent key for the pairing, e.g. 'Nadal_Murray' whoever is listed first
df['match'] = [x+'_'+y if x>y else y+'_'+x for x,y in zip(df['Player 1'],df['Player 2'])]
# df['winner'] = np.where(df.p1_win==1,df['Player 1'],df['Player 2'])
# df['loser'] = np.where(df.p1_win==0,df['Player 1'],df['Player 2'])
df = df.reset_index()  # keep the original row number in an 'index' column
df = df.sort_values(by='Date')  # merge_asof needs the 'on' key sorted
# Row number of the latest earlier row of the same pairing in which Player 1 (resp. Player 2) won
df = pd.merge_asof(df,df[df.p1_win==1][['match','Date','index']],by=['match'],on='Date',suffixes=('','_latest_win_p1'),allow_exact_matches=False,direction='backward')
df = pd.merge_asof(df,df[df.p2_win==1][['match','Date','index']],by=['match'],on='Date',suffixes=('','_latest_win_p2'),allow_exact_matches=False,direction='backward')
# df = df[['index','Date','Player 1','Player 2','p1_win','p2_win','match','winner','loser','index_latest_win_p2','index_latest_win_p1']]
# Look up the dates of those earlier rows through their original row numbers
df = df.merge(df[['Date','index','match']],how='left',left_on=['index_latest_win_p1','match'],right_on=['index','match'],suffixes=('','_latest_win_winner'))
df = df.merge(df[['Date','index','match']],how='left',left_on=['index_latest_win_p2','match'],right_on=['index','match'],suffixes=('','_latest_win_loser'))
df['days_since_last_win_winner'] = (df['Date']-df.Date_latest_win_winner).dt.days
df['days_since_last_win_loser'] = (df['Date']-df.Date_latest_win_loser).dt.days
Not sure this is exactly what you meant, but let me know if you need anything else.
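A different sketch, not the method of the answer above: reshape to one row per player per match so each player is tracked whether listed as Player 1 or Player 2, then shift and forward-fill the last win date within each player's history. It rebuilds the dict from the question with lists. With those dates the second match is on 2022-05-26, so Murray's gap there comes out as 10 rather than the 9 in the mock-up table. The yearly win percentage is taken as the mean of the won flag per player and calendar year, which is one possible reading of "win percentage for the year".

```python
import pandas as pd

df = pd.DataFrame({
    'Player 1': ['Murray', 'Nadal'] * 5 + ['Murray'],
    'Player 2': ['Nadal', 'Murray'] * 5 + ['Nadal'],
    'Date': ['2022-05-16', '2022-05-26', '2022-05-27', '2022-05-28', '2022-05-29', '2022-06-01',
             '2022-06-02', '2022-06-05', '2022-06-09', '2022-06-13', '2022-06-17'],
    'p1_win': ['1', '1', '0', '1', '0', '0', '1', '0', '1', '0', '1'],
    'p2_win': ['0', '0', '1', '0', '1', '1', '0', '1', '0', '1', '0'],
})
df['Date'] = pd.to_datetime(df['Date'])
df[['p1_win', 'p2_win']] = df[['p1_win', 'p2_win']].astype(int)

# Long format: one row per (match, slot), so a player is followed in either slot.
long = pd.concat([
    pd.DataFrame({'match': df.index, 'slot': 1, 'player': df['Player 1'],
                  'Date': df['Date'], 'won': df['p1_win']}),
    pd.DataFrame({'match': df.index, 'slot': 2, 'player': df['Player 2'],
                  'Date': df['Date'], 'won': df['p2_win']}),
], ignore_index=True).sort_values(['player', 'Date'])

# Date of each win (NaT for losses), shifted so the current match is excluded,
# then carried forward within each player's own history.
long['win_date'] = long['Date'].where(long['won'] == 1)
long['prev_win'] = long.groupby('player')['win_date'].shift()
long['prev_win'] = long.groupby('player')['prev_win'].ffill()
long['days_since_last_win'] = (long['Date'] - long['prev_win']).dt.days

# Back to one row per match: slot 1 -> p1_lastwin, slot 2 -> p2_lastwin.
wide = long.pivot(index='match', columns='slot', values='days_since_last_win')
df['p1_lastwin'] = wide[1]
df['p2_lastwin'] = wide[2]

# Win percentage per player and calendar year.
win_pct = long.groupby(['player', long['Date'].dt.year])['won'].mean() * 100
print(df)
print(win_pct)
```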

How to subtract two dates based on a filter on two other columns

I am new to Python and I am struggling to reshape my DataFrame.
For a particular client (contact_id), I want to add a new date column holding the DTHR_OPERATION date of a 'TYPE_OPER_VALIDATION = 3' row minus the DTHR_OPERATION date of a 'TYPE_OPER_VALIDATION = 1' row.
If 'TYPE_OPER_VALIDATION' is equal to 3 and there is less than an hour between those two dates, I want to put a string such as 'connection' in the new column.
I get the error "'Series' object has no attribute 'total_seconds'" when I try to check whether the time difference is less than or equal to an hour. I tried many solutions I found online, but I always seem to run into a data type issue.
Here is my code snippet:
df_oper_one = merged_table.loc[(merged_table['TYPE_OPER_VALIDATION']==1),['contact_id','TYPE_OPER_VALIDATION','DTHR_OPERATION']]
df_oper_three = merged_table.loc[(merged_table['TYPE_OPER_VALIDATION']==3),['contact_id','TYPE_OPER_VALIDATION','DTHR_OPERATION']]
connection = []
for row in merged_table['contact_id']:
    if (df_validation.loc[(df_validation['TYPE_OPER_VALIDATION']==3)]) & ((pd.to_datetime(df_oper_three['DTHR_OPERATION'],format='%Y-%m-%d %H:%M:%S') - pd.to_datetime(df_oper_one['DTHR_OPERATION'],format='%Y-%m-%d %H:%M:%S').total_seconds()) <= 3600): connection.append('connection')
    # if diff_date.total_seconds() <= 3600: connection.append('connection')
    else: connection.append('null')
merged_table['connection'] = pd.Series(connection)
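The AttributeError comes from calling .total_seconds() directly on a Series: for a Series of timedeltas the method lives under the .dt accessor. In the snippet, .total_seconds() also binds to the second pd.to_datetime(...) result before the subtraction happens, because a method call binds tighter than the minus sign. A minimal illustration with two illustrative pairs of timestamps:

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2022-10-11 07:07', '2022-10-11 15:47']))
end = pd.Series(pd.to_datetime(['2022-10-11 07:29', '2022-10-11 18:06']))

# Subtract first (note the parentheses), then use the .dt accessor on the timedelta Series.
seconds = (end - start).dt.total_seconds()
within_hour = seconds <= 3600
print(seconds.tolist(), within_hour.tolist())
```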
Hello Nicolas, and welcome to Stack Overflow. Please remember to always include sample data that reproduces your issue. Here is sample data reproducing part of your dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id contact':['cf2e79bc-8cac-ec11-9840-000d3ab078e6']*12+['865c5edf-c7ac-ec11-9840-000d3ab078e6']*10,
'DTHR OPERATION':['11/10/2022 07:07', '11/10/2022 07:29', '11/10/2022 15:47', '11/10/2022 16:22', '11/10/2022 16:44', '11/10/2022 18:06', '12/10/2022 07:11', '12/10/2022 07:25', '12/10/2022 17:21', '12/10/2022 18:04', '13/10/2022 07:09', '13/10/2022 18:36', '14/09/2022 17:59', '15/09/2022 09:34', '15/09/2022 19:17', '16/09/2022 08:31', '16/09/2022 19:18', '17/09/2022 06:41', '17/09/2022 11:19', '17/09/2022 15:48', '17/09/2022 16:13', '17/09/2022 17:07'],
'lastname':['BOUALAMI']*12+['VERVOORT']*10,
'TYPE_OPER_VALIDATION':[1, 3, 1, 3, 3, 3, 1, 3, 1, 3, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 3, 3]})
df['DTHR OPERATION'] = pd.to_datetime(df['DTHR OPERATION'], format='%d/%m/%Y %H:%M')  # dates are day-first, e.g. 13/10/2022
I would recommend creating a new table to more easily accomplish your task:
df2 = pd.merge(df[['Id contact', 'DTHR OPERATION']][df['TYPE_OPER_VALIDATION']==3], df[['Id contact', 'DTHR OPERATION']][df['TYPE_OPER_VALIDATION']==1], on='Id contact', suffixes=('_type3','_type1'))
Then find the time difference:
df2['seconds'] = (df2['DTHR OPERATION_type3']-df2['DTHR OPERATION_type1']).dt.total_seconds()
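Finally, flag connections where the type 3 operation happened at most an hour after the type 1 operation (a negative difference means the type 1 operation came later, so it is excluded):
df2['connection'] = np.where(df2['seconds'].between(0, 3600), 'yes', 'no')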
Hope this helps!
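If each type 3 operation should only be compared with the most recent type 1 operation of the same contact (rather than with every type 1 row, as the merge above does), pd.merge_asof can do the pairing. This is a sketch on the sample data from the answer; the column names 'Id contact' and 'DTHR OPERATION' follow that sample rather than the asker's 'contact_id' and 'DTHR_OPERATION', and the 'connection'/'null' labels follow the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id contact': ['cf2e79bc-8cac-ec11-9840-000d3ab078e6'] * 12 + ['865c5edf-c7ac-ec11-9840-000d3ab078e6'] * 10,
    'DTHR OPERATION': ['11/10/2022 07:07', '11/10/2022 07:29', '11/10/2022 15:47', '11/10/2022 16:22',
                       '11/10/2022 16:44', '11/10/2022 18:06', '12/10/2022 07:11', '12/10/2022 07:25',
                       '12/10/2022 17:21', '12/10/2022 18:04', '13/10/2022 07:09', '13/10/2022 18:36',
                       '14/09/2022 17:59', '15/09/2022 09:34', '15/09/2022 19:17', '16/09/2022 08:31',
                       '16/09/2022 19:18', '17/09/2022 06:41', '17/09/2022 11:19', '17/09/2022 15:48',
                       '17/09/2022 16:13', '17/09/2022 17:07'],
    'TYPE_OPER_VALIDATION': [1, 3, 1, 3, 3, 3, 1, 3, 1, 3, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 3, 3]})
# The sample dates are day-first (e.g. 13/10/2022), so the format is given explicitly.
df['DTHR OPERATION'] = pd.to_datetime(df['DTHR OPERATION'], format='%d/%m/%Y %H:%M')

# Keep the original row label of each type 3 row (as an 'index' column) to write results back.
type3 = (df.loc[df['TYPE_OPER_VALIDATION'] == 3, ['Id contact', 'DTHR OPERATION']]
         .reset_index().sort_values('DTHR OPERATION'))
type1 = (df.loc[df['TYPE_OPER_VALIDATION'] == 1, ['Id contact', 'DTHR OPERATION']]
         .rename(columns={'DTHR OPERATION': 'last_type1'}).sort_values('last_type1'))

# For each type 3 row, the latest type 1 row of the same contact at or before it.
paired = pd.merge_asof(type3, type1, left_on='DTHR OPERATION', right_on='last_type1',
                       by='Id contact', direction='backward')
paired['seconds'] = (paired['DTHR OPERATION'] - paired['last_type1']).dt.total_seconds()
# NaN (no earlier type 1 row) compares as False, so those rows get 'null' too.
paired['connection'] = np.where(paired['seconds'] <= 3600, 'connection', 'null')

# Write back by original row label; type 1 rows are filled with 'null'.
df['connection'] = paired.set_index('index')['connection'].reindex(df.index, fill_value='null')
print(df)
```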
Sure, here is the information you are looking for:
import pandas as pd
from pandas import Timestamp
df_contact = pd.DataFrame({'contact_id': {0: '865C5EDF-C7AC-EC11-9840', 1: '9C9690B1-F8AC-EC11', 2: '4DD27359-14AF-EC11-9840', 3: '0091373E-E7F4-4170-BCAC'}, 'birthdate': {0: Timestamp('2005-05-19 00:00:00'), 1: Timestamp('1982-01-28 00:00:00'), 2: Timestamp('1997-05-15 00:00:00'), 3: Timestamp('2005-03-22 00:00:00')}, 'fullname': {0: 'Laura VERVO', 1: 'Mélanie ALBE', 2: 'Eric VANO', 3: 'Jean Docq'}, 'lastname': {0: 'VERVO', 1: 'ALBE', 2: 'VANO', 3: 'Docq'}, 'age': {0: 17, 1: 40, 2: 25, 3: 17}})
df_validation = pd.DataFrame({'validation_id': {0: 8263835881, 1: 8263841517, 2: 8263843376, 3: 8263843377, 4: 8263843381, 5: 8263843382, 6: 8263863088, 7: 8263863124, 8: 8263868113, 9: 8263868123}, 'LIBEL_LONG_PRODUIT_TITRE': {0: 'Mens NEXT 12-17', 1: 'Ann NEXT 25-64%B', 2: 'Ann EXPRESS CBLANCHE', 3: 'Multi 8 NEXT', 4: 'Ann EXPRESS 18-24', 5: 'SNCB+TEC NEXT ABO', 6: 'Ann EXPRESS 18-24', 7: 'Ann EXPRESS 12-17%B', 8: '1 jour EX Réfugié', 9: 'Ann EXPRESS 2564%B'}, 'DTHR_OPERATION': {0: Timestamp('2022-10-01 00:02:02'), 1: Timestamp('2022-10-01 00:22:45'), 2: Timestamp('2022-10-01 00:02:45'), 3: Timestamp('2022-10-01 00:02:49'), 4: Timestamp('2022-10-01 00:07:03'), 5: Timestamp('2022-10-01 00:07:06'), 6: Timestamp('2022-10-01 00:07:40'), 7: Timestamp('2022-10-01 00:31:51'), 8: Timestamp('2022-10-01 00:03:33'), 9: Timestamp('2022-10-01 00:07:40')}, 'TYPE_OPER_VALIDATION': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 1}, 'NUM_SERIE_SUPPORT': {0: '2040121921', 1: '2035998914', 2: '2034456458', 3: '14988572652829627697', 4: '2035956003', 5: '2033613155', 6: '2040119429', 7: '2036114867', 8: '14988572650230713650', 9: '2040146199'}})
{'support_id': {0: '8D3A331D-3E86-EC11-93B0', 1: '44863926-3E86-EC11-93B0', 2: '45863926-3E86-EC11-93B0', 3: '46863926-3E86-EC11-93B0', 4: '47863926-3E86-EC11-93B0', 5: 'E3863926-3E86-EC11-93B0', 6: '56873926-3E86-EC11', 7: 'E3CE312C-3E86-EC11-93B0', 8: 'F3CE312C-3E86-EC11-93B0', 9: '3CCF312C-3E86-EC11-93B0'}, 'bd_linkedcustomer': {0: '15CCC384-C4AD-EC11', 1: '9D27061D-14AE-EC11-9840', 2: '74CAE68F-D4AC-EC11-9840', 3: '18F5FE1A-58AC-EC11-983F', 4: None, 5: '9FBDA103-2FAD-EC11-9840', 6: 'EEA1FB63-75AC-EC11-9840', 7: 'F150EC3D-0DAD-EC11-9840', 8: '111DE8C4-CAAC-EC11-9840', 9: None}, 'bd_supportserialnumber': {0: '44884259', 1: '2036010559', 2: '62863150', 3: '2034498160', 4: '62989611', 5: '2036094315', 6: '2033192919', 7: '2036051529', 8: '2036062236', 9: '2033889172'}}
df_support = pd.DataFrame({'support_id': {0: '8D3A331D-3E86-EC11-93B0', 1: '44863926-3E86-EC11', 2: '45863926-3E86-EC11-93B0', 3: '46863926-3E86-EC11-93B0', 4: '47863926-3E86-EC11-93B0', 5: 'E3863926-3E86-EC11-93B0', 6: '56873926-3E86-EC11-93B0', 7: 'E3CE312C-3E86-EC11-93B0', 8: 'F3CE312C-3E86-EC11-93B0', 9: '3CCF312C-3E86-EC11-93B0'}, 'bd_linkedcustomer': {0: '15CCC384-C4AD-EC11-9840', 1: '9D27061D-14AE-EC11-9840', 2: '74CAE68F-D4AC-EC11-9840', 3: '18F5FE1A-58AC-EC11-983F', 4: None, 5: '9FBDA103-2FAD-EC11', 6: 'EEA1FB63-75AC-EC11-9840', 7: 'F150EC3D-0DAD-EC11-9840', 8: '111DE8C4-CAAC-EC11-9840', 9: None}, 'bd_supportserialnumber': {0: '44884259', 1: '2036010559', 2: '62863150', 3: '2034498160', 4: '62989611', 5: '2036094315', 6: '2033192919', 7: '2036051529', 8: '2036062236', 9: '2033889172'}})
df2 = pd.DataFrame({'support_id': {0: '4BE73E8C-B8F9-EC11-BB3D', 1: '4BE73E8C-B8F9-EC11-BB3D', 2: '4BE73E8C-B8F9-EC11-BB3D', 3: '4BE73E8C-B8F9-EC11-BB3D', 4: '4BE73E8C-B8F9-EC11-BB3D', 5: '4BE73E8C-B8F9-EC11-BB3D', 6: '4BE73E8C-B8F9-EC11', 7: '4BE73E8C-B8F9-EC11-BB3D', 8: '4BE73E8C-B8F9-EC11-BB3D', 9: '4BE73E8C-B8F9-EC11-BB3D'}, 'bd_linkedcustomer': {0: '9C9690B1-F8AC-EC11-9840', 1: '9C9690B1-F8AC-EC11-9840', 2: '9C9690B1-F8AC-EC11-9840', 3: '9C9690B1-F8AC-EC11-9840', 4: '9C9690B1-F8AC-EC11-9840',
5: '9C9690B1-F8AC-EC11-9840', 6: '9C9690B1-F8AC-EC11-9840', 7: '9C9690B1-F8AC-EC11-9840', 8: '9C9690B1-F8AC-EC11-9840', 9: '9C9690B1-F8AC-EC11-9840'}, 'bd_supportserialnumber': {0: '2036002771', 1: '2036002771', 2: '2036002771', 3: '2036002771', 4: '2036002771', 5: '2036002771', 6: '2036002771', 7: '2036002771', 8: '2036002771', 9: '2036002771'}, 'contact_id': {0: '9C9690B1-F8AC-EC11-9840', 1: '9C9690B1-F8AC-EC11-9840', 2: '9C9690B1-F8AC-EC11-9840', 3: '9C9690B1-F8AC-EC11-9840', 4: '9C9690B1-F8AC-EC11-9840', 5: '9C9690B1-F8AC-EC11-9840', 6: '9C9690B1-F8AC-EC11-9840', 7: '9C9690B1-F8AC-EC11-9840', 8: '9C9690B1-F8AC-EC11-9840', 9: '9C9690B1-F8AC-EC11-9840'}, 'birthdate': {0: Timestamp('1982-01-28 00:00:00'), 1: Timestamp('1982-01-28 00:00:00'), 2: Timestamp('1982-01-28 00:00:00'), 3: Timestamp('1982-01-28 00:00:00'), 4: Timestamp('1982-01-28 00:00:00'), 5: Timestamp('1982-01-28 00:00:00'), 6: Timestamp('1982-01-28 00:00:00'), 7: Timestamp('1982-01-28 00:00:00'), 8: Timestamp('1982-01-28 00:00:00'), 9: Timestamp('1982-01-28 00:00:00')}, 'fullname': {0: 'Mélanie ALBE', 1: 'Mélanie ALBE', 2: 'Mélanie ALBE', 3: 'Mélanie ALBE', 4: 'Mélanie ALBE', 5: 'Mélanie ALBE', 6: 'Mélanie ALBE', 7: 'Mélanie ALBE', 8: 'Mélanie ALBE', 9: 'Mélanie ALBE'}, 'lastname': {0: 'ALBE', 1: 'ALBE', 2: 'ALBE', 3: 'ALBE', 4: 'ALBE', 5: 'ALBE', 6: 'ALBE', 7: 'ALBE', 8: 'ALBE', 9: 'ALBE'}, 'age': {0: 40, 1: 40, 2: 40, 3: 40, 4: 40, 5: 40, 6: 40, 7: 40, 8: 40, 9: 40}, 'validation_id': {0: 8264573419, 1: 8264574166, 2: 8264574345, 3: 8264676975, 4: 8265441741, 5: 8272463799, 6: 8272471694, 7: 8274368291, 8: 8274397366, 9: 8277077728}, 'LIBEL_LONG_PRODUIT_TITRE': {0: 'Ann NEXT 25-64', 1: 'Ann NEXT 25-64', 2: 'Ann NEXT 25-64', 3: 'Ann NEXT 25-64', 4: 'Ann NEXT 25-64', 5: 'Ann NEXT 25-64', 6: 'Ann NEXT 25-64', 7: 'Ann NEXT 25-64', 8: 'Ann NEXT 25-64', 9: 'Ann NEXT 25-64'}, 'DTHR_OPERATION': {0: Timestamp('2022-10-01 08:30:18'), 1: Timestamp('2022-10-01 12:23:34'), 2: Timestamp('2022-10-01 07:47:46'), 3: Timestamp('2022-10-01 13:11:54'), 4: Timestamp('2022-10-01 12:35:02'), 5: Timestamp('2022-10-04 08:34:23'), 6: Timestamp('2022-10-04 08:04:50'), 7: Timestamp('2022-10-04 17:17:47'), 8: Timestamp('2022-10-04 15:20:29'), 9: Timestamp('2022-10-05 07:54:14')}, 'TYPE_OPER_VALIDATION': {0: 3, 1: 1, 2: 1, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1, 8: 1, 9: 1}, 'NUM_SERIE_SUPPORT': {0: '2036002771', 1: '2036002771', 2: '2036002771', 3: '2036002771', 4: '2036002771', 5: '2036002771', 6: '2036002771', 7: '2036002771', 8: '2036002771', 9: '2036002771'}})
df3 = pd.DataFrame({'contact_id': {0: '9C9690B1-F8AC-EC11-9840', 1: '9C9690B1-F8AC-EC11-9840', 2: '9C9690B1-F8AC-EC11-9840', 3: '9C9690B1-F8AC-EC11-9840', 4: '9C9690B1-F8AC-EC11-9840', 5: '9C9690B1-F8AC-EC11-9840', 6: '9C9690B1-F8AC-EC11-9840', 7: '9C9690B1-F8AC-EC11-9840', 8: '9C9690B1-F8AC-EC11-9840', 9: '9C9690B1-F8AC-EC11-9840'}, 'DTHR_OPERATION_type3': {0: Timestamp('2022-10-01 08:30:18'), 1: Timestamp('2022-10-01 08:30:18'), 2: Timestamp('2022-10-01 08:30:18'), 3: Timestamp('2022-10-01 08:30:18'), 4: Timestamp('2022-10-01 08:30:18'), 5: Timestamp('2022-10-01 08:30:18'), 6: Timestamp('2022-10-01 08:30:18'), 7: Timestamp('2022-10-01 08:30:18'), 8: Timestamp('2022-10-01 08:30:18'), 9: Timestamp('2022-10-01 08:30:18')}, 'DTHR_OPERATION_type1': {0: Timestamp('2022-10-01 12:23:34'), 1: Timestamp('2022-10-01 07:47:46'), 2: Timestamp('2022-10-04 08:04:50'), 3: Timestamp('2022-10-04 17:17:47'), 4: Timestamp('2022-10-04 15:20:29'), 5: Timestamp('2022-10-05 07:54:14'), 6: Timestamp('2022-10-05 18:22:42'), 7: Timestamp('2022-10-06 08:14:28'), 8: Timestamp('2022-10-06 18:19:33'), 9: Timestamp('2022-10-08 07:46:45')}, 'seconds': {0: -13996.0, 1: 2552.0, 2: -257672.00000000003, 3: -290849.0, 4: -283811.0, 5: -343436.0, 6: -381144.0, 7: -431050.0, 8: -467355.00000000006, 9: -602187.0}, 'first_connection': {0: 'no', 1: 'yes', 2: 'no', 3: 'no', 4: 'no', 5: 'no', 6: 'no', 7: 'no', 8: 'no', 9: 'no'}})
df4 = pd.DataFrame({'contact_id': {0: '9C9690B1-F8AC-EC11-9840', 1: '9C9690B1-F8AC-EC11-9840', 2: '9C9690B1-F8AC-EC11-9840', 3: '9C9690B1-F8AC-EC11-9840', 4: '9C9690B1-F8AC-EC11-9840', 5: '9C9690B1-F8AC-EC11-9840', 6: '9C9690B1-F8AC-EC11-9840', 7: '9C9690B1-F8AC-EC11-9840', 8: '9C9690B1-F8AC-EC11-9840', 9: '9C9690B1-F8AC-EC11-9840'}, 'DTHR_OPERATION_type3': {0: Timestamp('2022-10-01 08:30:18'), 1: Timestamp('2022-10-01 08:30:18'), 2: Timestamp('2022-10-01 08:30:18'), 3: Timestamp('2022-10-01 08:30:18'), 4: Timestamp('2022-10-01 08:30:18'), 5: Timestamp('2022-10-01 08:30:18'), 6: Timestamp('2022-10-01 08:30:18'), 7: Timestamp('2022-10-01 08:30:18'), 8: Timestamp('2022-10-01 08:30:18'), 9: Timestamp('2022-10-01 08:30:18')}, 'DTHR_OPERATION_type3bis': {0: Timestamp('2022-10-01 08:30:18'), 1: Timestamp('2022-10-01 13:11:54'), 2: Timestamp('2022-10-01 12:35:02'), 3: Timestamp('2022-10-04 08:34:23'), 4: Timestamp('2022-10-05 08:27:04'), 5: Timestamp('2022-10-05 19:05:29'), 6: Timestamp('2022-10-06 08:34:21'), 7: Timestamp('2022-10-06 18:37:56'), 8: Timestamp('2022-10-06 19:08:30'), 9: Timestamp('2022-10-08 13:01:13')}, 'seconds_type3': {0: 0.0, 1: -16896.0, 2: -14684.000000000002, 3: -259445.00000000003, 4: -345406.0, 5: -383711.0, 6: -432243.0, 7: -468458.00000000006, 8: -470292.00000000006, 9: -621055.0}, 'second_or_more_connection': {0: 'no', 1: 'no', 2: 'no', 3: 'no', 4: 'no', 5: 'no', 6: 'no', 7: 'no', 8: 'no', 9: 'no'}})
The desired result is a dF5 with the columns [['contact_id', 'fullname', 'validation_id', 'LIBEL_LONG_PRODUIT_TITRE', 'TYPE_OPER_VALIDATION']] plus the new column dF5['connection']. Don't hesitate to reach out if you need further information or clarification. Many thanks for your support :)

Custom function to replace missing values in dataframe with median located in pivot table

I am attempting to write a function that replaces missing values in the 'total_income' column with the median 'total_income' from a pivot table, using the row's 'education' and 'income_type' to index the pivot table. I want to fill with these medians so that the imputed values are as accurate as possible. Here is what I am testing:
These are the first 5 rows of the dataframe as a dictionary:
{'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
'children': {0: 1, 1: 1, 2: 0, 3: 3, 4: 0},
'days_employed': {0: 8437.673027760233,
1: 4024.803753850451,
2: 5623.422610230956,
3: 4124.747206540018,
4: 340266.07204682194},
'dob_years': {0: 42, 1: 36, 2: 33, 3: 32, 4: 53},
'education': {0: "bachelor's degree",
1: 'secondary education',
2: 'secondary education',
3: 'secondary education',
4: 'secondary education'},
'education_id': {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
'family_status': {0: 'married',
1: 'married',
2: 'married',
3: 'married',
4: 'civil partnership'},
'family_status_id': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
'gender': {0: 'F', 1: 'F', 2: 'M', 3: 'M', 4: 'F'},
'income_type': {0: 'employee',
1: 'employee',
2: 'employee',
3: 'employee',
4: 'retiree'},
'debt': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'total_income': {0: 40620.102,
1: 17932.802,
2: 23341.752,
3: 42820.568,
4: 25378.572},
'purpose': {0: 'purchase of the house',
1: 'car purchase',
2: 'purchase of the house',
3: 'supplementary education',
4: 'to have a wedding'},
'age_group': {0: 'adult',
1: 'adult',
2: 'adult',
3: 'adult',
4: 'older adult'}}
def fill_income(row):
    total_income = row['total_income']
    age_group = row['age_group']
    income_type = row['income_type']
    education = row['education']
    table = df.pivot_table(index=['age_group','income_type' ], columns='education', values='total_income', aggfunc='median')
    if total_income == 'NaN':
        if age_group =='adult':
            return table.loc[education, income_type]
My desired output is the pivot table value (the median total_income) for the dataframe row's given education and income_type. When I test it, it returns 'None'.
Thanks in advance for your time helping me with this problem!
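Two things in the function explain the None. First, total_income == 'NaN' compares a float NaN with the string 'NaN', which is never true, so the function falls off the end and returns None implicitly; pd.isna is the usual test. Second, the pivot table is indexed by (age_group, income_type) with education as the columns, so the lookup would be table.loc[(age_group, income_type), education]. The sketch below uses the first rows from the question plus one hypothetical row with a missing income (row 5, made up for illustration), builds the pivot table once instead of once per row, drops the adult-only condition (add it back if that restriction is intended), and shows a groupby/transform alternative that avoids apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'education': ["bachelor's degree", 'secondary education', 'secondary education',
                  'secondary education', 'secondary education', 'secondary education'],
    'income_type': ['employee', 'employee', 'employee', 'employee', 'retiree', 'employee'],
    'age_group': ['adult', 'adult', 'adult', 'adult', 'older adult', 'adult'],
    # Row 5 is a hypothetical row with a missing income.
    'total_income': [40620.102, 17932.802, 23341.752, 42820.568, 25378.572, np.nan],
})

# Build the pivot table once; the median skips NaN values.
table = df.pivot_table(index=['age_group', 'income_type'], columns='education',
                       values='total_income', aggfunc='median')

def fill_income(row):
    # NaN never equals the string 'NaN'; pd.isna is the reliable test.
    if pd.isna(row['total_income']):
        # Row key is (age_group, income_type); column key is education.
        return table.loc[(row['age_group'], row['income_type']), row['education']]
    return row['total_income']

df['filled_apply'] = df.apply(fill_income, axis=1)

# Vectorised alternative: the median of the same group, used only where the value is missing.
group_median = df.groupby(['age_group', 'income_type', 'education'])['total_income'].transform('median')
df['filled_transform'] = df['total_income'].fillna(group_median)
print(df)
```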

How to convert Monthly data into Yearly data in pandas dataframe?

All,
My dataframe looks like the following. I am trying to convert my monthly data into yearly data: I want to aggregate the dataframe so that the monthly data points for 1997 are added up and displayed as a sum column. I would like to do this for every year from 1997 to 2018. I have also included a dput of my dataset for reference.
Note: the snapshot below only shows a few months of data for 1997 and 1998; however, I have the full monthly data for the years 1997 through 2018.
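A sketch of one way to do this, assuming every monthly column is named 'YYYY-MM' as in the dput: select those columns, group them by their first four characters (the year), and sum. The frame below uses only two rows and a few columns from the dput for brevity. Grouping columns with groupby(axis=1) is deprecated in recent pandas, so the sketch transposes instead. Swapping .sum() for .mean() would give an annual average instead of a total.

```python
import pandas as pd

df = pd.DataFrame({
    'RegionID': [84654, 91982],
    'City': ['Chicago', 'Katy'],
    '1997-01': [344400.0, 197300.0],
    '1997-02': [345700.0, 195400.0],
    '1998-01': [370200.0, 227500.0],
    '1998-02': [374700.0, 231800.0],
})

# Monthly columns look like 'YYYY-MM'; everything else is an identifier column.
is_month = df.columns.str.match(r'^\d{4}-\d{2}$')
month_cols = df.columns[is_month]
id_cols = df.columns[~is_month]

# Transpose so months become rows, group the rows by year prefix, sum, transpose back.
yearly = df[month_cols].T.groupby(lambda month: month[:4]).sum().T

result = pd.concat([df[id_cols], yearly], axis=1)
print(result)
```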
Dput of the dataframe:
{'RegionID': {0: 84654, 1: 91982, 2: 84616, 3: 93144, 4: 91940}, 'RegionName': {0: 60657, 1: 77494, 2: 60614, 3: 79936, 4: 77449}, 'City': {0: 'Chicago', 1: 'Katy', 2: 'Chicago', 3: 'El Paso', 4: 'Katy'}, 'State': {0: 'IL', 1: 'TX', 2: 'IL', 3: 'TX', 4: 'TX'}, 'Metro': {0: 'Chicago-Naperville-Elgin', 1: 'Houston-The Woodlands-Sugar Land', 2: 'Chicago-Naperville-Elgin', 3: 'El Paso', 4: 'Houston-The Woodlands-Sugar Land'}, 'CountyName': {0: 'Cook County', 1: 'Harris County', 2: 'Cook County', 3: 'El Paso County', 4: 'Harris County'}, 'SizeRank': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, '1997-01': {0: 344400.0, 1: 197300.0, 2: 503400.0, 3: 77800.0, 4: 96600.0}, '1997-02': {0: 345700.0, 1: 195400.0, 2: 502200.0, 3: 77900.0, 4: 96400.0}, '1997-03': {0: 346700.0, 1: 193000.0, 2: 500000.0, 3: 77900.0, 4: 96200.0}, '1997-04': {0: 347800.0, 1: 191800.0, 2: 497900.0, 3: 77800.0, 4: 96100.0}, '1997-05': {0: 349000.0, 1: 191800.0, 2: 496300.0, 3: 77800.0, 4: 96200.0}, '1997-06': {0: 350400.0, 1: 193000.0, 2: 495200.0, 3: 77800.0, 4: 96300.0}, '1997-07': {0: 352000.0, 1: 195200.0, 2: 494700.0, 3: 77800.0, 4: 96600.0}, '1997-08': {0: 353900.0, 1: 198400.0, 2: 494900.0, 3: 77800.0, 4: 97000.0}, '1997-09': {0: 356200.0, 1: 202800.0, 2: 496200.0, 3: 77900.0, 4: 97500.0}, '1997-10': {0: 358800.0, 1: 208000.0, 2: 498600.0, 3: 78100.0, 4: 98000.0}, '1997-11': {0: 361800.0, 1: 213800.0, 2: 502000.0, 3: 78200.0, 4: 98400.0}, '1997-12': {0: 365700.0, 1: 220700.0, 2: 507600.0, 3: 78400.0, 4: 98800.0}, '1998-01': {0: 370200.0, 1: 227500.0, 2: 514900.0, 3: 78600.0, 4: 99200.0}, '1998-02': {0: 374700.0, 1: 231800.0, 2: 522200.0, 3: 78800.0, 4: 99500.0}, '1998-03': {0: 378900.0, 1: 233400.0, 2: 529500.0, 3: 79000.0, 4: 99700.0}, '1998-04': {0: 383500.0, 1: 233900.0, 2: 537900.0, 3: 79100.0, 4: 100000.0}, '1998-05': {0: 388300.0, 1: 233500.0, 2: 546900.0, 3: 79200.0, 4: 100200.0}, '1998-06': {0: 393300.0, 1: 233300.0, 2: 556400.0, 3: 79300.0, 4: 100400.0}, '1998-07': {0: 398500.0, 1: 234300.0, 2: 
566100.0, 3: 79300.0, 4: 100700.0}, '1998-08': {0: 403800.0, 1: 237400.0, 2: 575600.0, 3: 79300.0, 4: 101100.0}, '1998-09': {0: 409100.0, 1: 242800.0, 2: 584800.0, 3: 79400.0, 4: 101800.0}, '1998-10': {0: 414600.0, 1: 250200.0, 2: 593500.0, 3: 79500.0, 4: 102900.0}, '1998-11': {0: 420100.0, 1: 258600.0, 2: 601600.0, 3: 79500.0, 4: 104300.0}, '1998-12': {0: 426200.0, 1: 268000.0, 2: 610100.0, 3: 79600.0, 4: 106200.0}, '1999-01': {0: 432600.0, 1: 277000.0, 2: 618600.0, 3: 79700.0, 4: 108400.0}, '1999-02': {0: 438600.0, 1: 283600.0, 2: 625600.0, 3: 79900.0, 4: 110400.0}, '1999-03': {0: 444200.0, 1: 288500.0, 2: 631100.0, 3: 80100.0, 4: 112100.0}, '1999-04': {0: 450000.0, 1: 293900.0, 2: 636600.0, 3: 80300.0, 4: 113200.0}, '1999-05': {0: 455900.0, 1: 299200.0, 2: 642100.0, 3: 80600.0, 4: 113600.0}, '1999-06': {0: 462100.0, 1: 304300.0, 2: 647600.0, 3: 80900.0, 4: 113500.0}, '1999-07': {0: 468500.0, 1: 308600.0, 2: 653300.0, 3: 81200.0, 4: 113000.0}, '1999-08': {0: 475300.0, 1: 311400.0, 2: 659300.0, 3: 81400.0, 4: 112500.0}, '1999-09': {0: 482500.0, 1: 312300.0, 2: 665800.0, 3: 81700.0, 4: 112200.0}, '1999-10': {0: 490200.0, 1: 311900.0, 2: 672900.0, 3: 82100.0, 4: 112100.0}, '1999-11': {0: 498200.0, 1: 311100.0, 2: 680500.0, 3: 82400.0, 4: 112400.0}, '1999-12': {0: 507200.0, 1: 311700.0, 2: 689600.0, 3: 82600.0, 4: 113100.0}, '2000-01': {0: 516800.0, 1: 313500.0, 2: 699700.0, 3: 82800.0, 4: 114200.0}, '2000-02': {0: 526300.0, 1: 315000.0, 2: 709300.0, 3: 82900.0, 4: 115700.0}, '2000-03': {0: 535300.0, 1: 316700.0, 2: 718300.0, 3: 83000.0, 4: 117800.0}, '2000-04': {0: 544500.0, 1: 319800.0, 2: 727600.0, 3: 83000.0, 4: 120300.0}, '2000-05': {0: 553500.0, 1: 323700.0, 2: 737100.0, 3: 82900.0, 4: 122900.0}, '2000-06': {0: 562400.0, 1: 327500.0, 2: 746600.0, 3: 82800.0, 4: 125600.0}, '2000-07': {0: 571200.0, 1: 329900.0, 2: 756200.0, 3: 82700.0, 4: 128000.0}, '2000-08': {0: 579800.0, 1: 329800.0, 2: 765800.0, 3: 82400.0, 4: 129800.0}, '2000-09': {0: 588100.0, 1: 326400.0, 
2: 775100.0, 3: 82100.0, 4: 130800.0}, '2000-10': {0: 596300.0, 1: 320100.0, 2: 784400.0, 3: 81900.0, 4: 130900.0}, '2000-11': {0: 604200.0, 1: 312200.0, 2: 793500.0, 3: 81600.0, 4: 129900.0}, '2000-12': {0: 612200.0, 1: 304700.0, 2: 803000.0, 3: 81300.0, 4: 128000.0}, '2001-01': {0: 620200.0, 1: 298700.0, 2: 812500.0, 3: 81000.0, 4: 125600.0}, '2001-02': {0: 627700.0, 1: 294300.0, 2: 821200.0, 3: 80800.0, 4: 123000.0}, '2001-03': {0: 634500.0, 1: 291400.0, 2: 829200.0, 3: 80600.0, 4: 120500.0}, '2001-04': {0: 641000.0, 1: 290800.0, 2: 837000.0, 3: 80300.0, 4: 118300.0}, '2001-05': {0: 647000.0, 1: 291700.0, 2: 844400.0, 3: 80000.0, 4: 116600.0}, '2001-06': {0: 652700.0, 1: 293000.0, 2: 851600.0, 3: 79800.0, 4: 115200.0}, '2001-07': {0: 658100.0, 1: 293600.0, 2: 858600.0, 3: 79500.0, 4: 114200.0}, '2001-08': {0: 663300.0, 1: 292900.0, 2: 865300.0, 3: 79200.0, 4: 113500.0}, '2001-09': {0: 668400.0, 1: 290500.0, 2: 871800.0, 3: 78900.0, 4: 113200.0}, '2001-10': {0: 673400.0, 1: 286700.0, 2: 878200.0, 3: 78600.0, 4: 113100.0}, '2001-11': {0: 678300.0, 1: 282200.0, 2: 884700.0, 3: 78400.0, 4: 113200.0}, '2001-12': {0: 683200.0, 1: 276900.0, 2: 891300.0, 3: 78200.0, 4: 113400.0}, '2002-01': {0: 688300.0, 1: 271000.0, 2: 898000.0, 3: 78200.0, 4: 113700.0}, '2002-02': {0: 693300.0, 1: 264200.0, 2: 904700.0, 3: 78200.0, 4: 114000.0}, '2002-03': {0: 698000.0, 1: 257000.0, 2: 911200.0, 3: 78300.0, 4: 114300.0}, '2002-04': {0: 702400.0, 1: 249700.0, 2: 917600.0, 3: 78400.0, 4: 114700.0}, '2002-05': {0: 706400.0, 1: 243100.0, 2: 923800.0, 3: 78600.0, 4: 115100.0}, '2002-06': {0: 710200.0, 1: 237000.0, 2: 929800.0, 3: 78900.0, 4: 115500.0}, '2002-07': {0: 714000.0, 1: 231700.0, 2: 935700.0, 3: 79200.0, 4: 116100.0}, '2002-08': {0: 717800.0, 1: 227100.0, 2: 941400.0, 3: 79500.0, 4: 116700.0}, '2002-09': {0: 721700.0, 1: 223300.0, 2: 947100.0, 3: 79900.0, 4: 117200.0}, '2002-10': {0: 725700.0, 1: 220300.0, 2: 952800.0, 3: 80300.0, 4: 117800.0}, '2002-11': {0: 729900.0, 1: 
217300.0, 2: 958900.0, 3: 80700.0, 4: 118200.0}, '2002-12': {0: 733400.0, 1: 214700.0, 2: 965100.0, 3: 81000.0, 4: 118500.0}, '2003-01': {0: 735600.0, 1: 213800.0, 2: 971000.0, 3: 81200.0, 4: 118800.0}, '2003-02': {0: 737200.0, 1: 215100.0, 2: 976400.0, 3: 81400.0, 4: 119100.0}, '2003-03': {0: 739000.0, 1: 217300.0, 2: 981400.0, 3: 81500.0, 4: 119300.0}, '2003-04': {0: 740900.0, 1: 219600.0, 2: 985700.0, 3: 81500.0, 4: 119500.0}, '2003-05': {0: 742600.0, 1: 221400.0, 2: 989400.0, 3: 81600.0, 4: 119600.0}, '2003-06': {0: 744400.0, 1: 222300.0, 2: 992900.0, 3: 81700.0, 4: 119700.0}, '2003-07': {0: 746000.0, 1: 222700.0, 2: 996800.0, 3: 81900.0, 4: 119900.0}, '2003-08': {0: 747200.0, 1: 223000.0, 2: 1000800.0, 3: 82000.0, 4: 120200.0}, '2003-09': {0: 748000.0, 1: 223700.0, 2: 1004600.0, 3: 82200.0, 4: 120500.0}, '2003-10': {0: 749000.0, 1: 225100.0, 2: 1008000.0, 3: 82500.0, 4: 120900.0}, '2003-11': {0: 750200.0, 1: 227200.0, 2: 1010600.0, 3: 82900.0, 4: 121500.0}, '2003-12': {0: 752300.0, 1: 229600.0, 2: 1012600.0, 3: 83400.0, 4: 122500.0}, '2004-01': {0: 755300.0, 1: 231800.0, 2: 1014500.0, 3: 84000.0, 4: 123900.0}, '2004-02': {0: 759200.0, 1: 233100.0, 2: 1017000.0, 3: 84700.0, 4: 125300.0}, '2004-03': {0: 764000.0, 1: 233500.0, 2: 1020500.0, 3: 85500.0, 4: 126600.0}, '2004-04': {0: 769600.0, 1: 233000.0, 2: 1024900.0, 3: 86400.0, 4: 127500.0}, '2004-05': {0: 775600.0, 1: 232100.0, 2: 1029800.0, 3: 87200.0, 4: 128100.0}, '2004-06': {0: 781900.0, 1: 231300.0, 2: 1035100.0, 3: 88000.0, 4: 128500.0}, '2004-07': {0: 787900.0, 1: 230700.0, 2: 1040500.0, 3: 88900.0, 4: 128800.0}, '2004-08': {0: 793200.0, 1: 230800.0, 2: 1046000.0, 3: 89700.0, 4: 128900.0}, '2004-09': {0: 798200.0, 1: 231500.0, 2: 1052100.0, 3: 90400.0, 4: 129000.0}, '2004-10': {0: 803100.0, 1: 232700.0, 2: 1058600.0, 3: 91100.0, 4: 129200.0}, '2004-11': {0: 807900.0, 1: 234000.0, 2: 1065000.0, 3: 91900.0, 4: 129400.0}, '2004-12': {0: 812900.0, 1: 235500.0, 2: 1071900.0, 3: 92700.0, 4: 129800.0}, 
'2005-01': {0: 818100.0, 1: 237000.0, 2: 1079000.0, 3: 93600.0, 4: 130100.0}, '2005-02': {0: 823200.0, 1: 238700.0, 2: 1086000.0, 3: 94400.0, 4: 130200.0}, '2005-03': {0: 828300.0, 1: 240600.0, 2: 1093100.0, 3: 95200.0, 4: 130300.0}, '2005-04': {0: 834000.0, 1: 241800.0, 2: 1100500.0, 3: 95800.0, 4: 130400.0}, '2005-05': {0: 839800.0, 1: 241700.0, 2: 1107400.0, 3: 96300.0, 4: 130400.0}, '2005-06': {0: 845600.0, 1: 240700.0, 2: 1113500.0, 3: 96700.0, 4: 130300.0}, '2005-07': {0: 851700.0, 1: 239300.0, 2: 1118800.0, 3: 97200.0, 4: 130100.0}, '2005-08': {0: 858000.0, 1: 238000.0, 2: 1123700.0, 3: 97700.0, 4: 129800.0}, '2005-09': {0: 864300.0, 1: 236900.0, 2: 1129200.0, 3: 98400.0, 4: 129400.0}, '2005-10': {0: 870600.0, 1: 235700.0, 2: 1135400.0, 3: 99000.0, 4: 129000.0}, '2005-11': {0: 876200.0, 1: 234700.0, 2: 1141900.0, 3: 99600.0, 4: 128800.0}, '2005-12': {0: 880600.0, 1: 233400.0, 2: 1148000.0, 3: 100200.0, 4: 128800.0}, '2006-01': {0: 884500.0, 1: 231700.0, 2: 1152800.0, 3: 101000.0, 4: 129000.0}, '2006-02': {0: 887800.0, 1: 230100.0, 2: 1155900.0, 3: 102000.0, 4: 129200.0}, '2006-03': {0: 890600.0, 1: 229000.0, 2: 1157900.0, 3: 103000.0, 4: 129400.0}, '2006-04': {0: 893200.0, 1: 228500.0, 2: 1159500.0, 3: 104300.0, 4: 129500.0}, '2006-05': {0: 895500.0, 1: 228700.0, 2: 1161000.0, 3: 105800.0, 4: 129700.0}, '2006-06': {0: 897300.0, 1: 229400.0, 2: 1162800.0, 3: 107400.0, 4: 130000.0}, '2006-07': {0: 898900.0, 1: 230400.0, 2: 1165300.0, 3: 109100.0, 4: 130300.0}, '2006-08': {0: 900300.0, 1: 231600.0, 2: 1168100.0, 3: 111000.0, 4: 130700.0}, '2006-09': {0: 902000.0, 1: 233000.0, 2: 1171300.0, 3: 113000.0, 4: 131200.0}, '2006-10': {0: 904300.0, 1: 234700.0, 2: 1174400.0, 3: 115000.0, 4: 131800.0}, '2006-11': {0: 907000.0, 1: 237100.0, 2: 1176700.0, 3: 117000.0, 4: 132300.0}, '2006-12': {0: 909500.0, 1: 240200.0, 2: 1178400.0, 3: 118800.0, 4: 132700.0}, '2007-01': {0: 912000.0, 1: 242900.0, 2: 1179900.0, 3: 120600.0, 4: 133000.0}, '2007-02': {0: 913400.0, 1: 
244600.0, 2: 1181100.0, 3: 122200.0, 4: 133200.0}, '2007-03': {0: 913200.0, 1: 245200.0, 2: 1182800.0, 3: 124000.0, 4: 133600.0}, '2007-04': {0: 911800.0, 1: 245200.0, 2: 1184800.0, 3: 126000.0, 4: 134100.0}, '2007-05': {0: 909200.0, 1: 245000.0, 2: 1185300.0, 3: 128000.0, 4: 134700.0}, '2007-06': {0: 905200.0, 1: 245600.0, 2: 1183700.0, 3: 129600.0, 4: 135400.0}, '2007-07': {0: 901300.0, 1: 246900.0, 2: 1181000.0, 3: 130700.0, 4: 136000.0}, '2007-08': {0: 897900.0, 1: 248700.0, 2: 1177900.0, 3: 131400.0, 4: 136600.0}, '2007-09': {0: 895300.0, 1: 250700.0, 2: 1175400.0, 3: 132000.0, 4: 137000.0}, '2007-10': {0: 893500.0, 1: 252500.0, 2: 1173800.0, 3: 132300.0, 4: 137300.0}, '2007-11': {0: 891100.0, 1: 254000.0, 2: 1171700.0, 3: 132300.0, 4: 137400.0}, '2007-12': {0: 886700.0, 1: 254800.0, 2: 1167900.0, 3: 132000.0, 4: 137200.0}, '2008-01': {0: 881900.0, 1: 254000.0, 2: 1162900.0, 3: 131300.0, 4: 136500.0}, '2008-02': {0: 876500.0, 1: 252400.0, 2: 1157000.0, 3: 130300.0, 4: 135600.0}, '2008-03': {0: 870600.0, 1: 250900.0, 2: 1150700.0, 3: 129300.0, 4: 134700.0}, '2008-04': {0: 864900.0, 1: 249600.0, 2: 1144200.0, 3: 128300.0, 4: 133800.0}, '2008-05': {0: 859000.0, 1: 248400.0, 2: 1135900.0, 3: 127300.0, 4: 133000.0}, '2008-06': {0: 851600.0, 1: 247900.0, 2: 1125700.0, 3: 126300.0, 4: 132000.0}, '2008-07': {0: 843800.0, 1: 247700.0, 2: 1114200.0, 3: 125400.0, 4: 131200.0}, '2008-08': {0: 836400.0, 1: 247800.0, 2: 1102200.0, 3: 124600.0, 4: 130500.0}, '2008-09': {0: 830700.0, 1: 247900.0, 2: 1092100.0, 3: 123900.0, 4: 130000.0}, '2008-10': {0: 827300.0, 1: 247800.0, 2: 1085300.0, 3: 123300.0, 4: 129400.0}, '2008-11': {0: 824800.0, 1: 247600.0, 2: 1079400.0, 3: 122600.0, 4: 128700.0}, '2008-12': {0: 821400.0, 1: 247500.0, 2: 1072500.0, 3: 122100.0, 4: 128200.0}, '2009-01': {0: 818500.0, 1: 246600.0, 2: 1065400.0, 3: 121600.0, 4: 127600.0}, '2009-02': {0: 815200.0, 1: 245700.0, 2: 1057900.0, 3: 121200.0, 4: 127100.0}, '2009-03': {0: 810200.0, 1: 245600.0, 2: 1048900.0, 
3: 120800.0, 4: 126400.0}, '2009-04': {0: 803500.0, 1: 246000.0, 2: 1037900.0, 3: 120300.0, 4: 125900.0}, '2009-05': {0: 795400.0, 1: 246300.0, 2: 1024300.0, 3: 119700.0, 4: 125300.0}, '2009-06': {0: 786800.0, 1: 246800.0, 2: 1010100.0, 3: 119100.0, 4: 124700.0}, '2009-07': {0: 780500.0, 1: 247200.0, 2: 999000.0, 3: 118700.0, 4: 124300.0}, '2009-08': {0: 776800.0, 1: 247600.0, 2: 990800.0, 3: 118400.0, 4: 124100.0}, '2009-09': {0: 774600.0, 1: 247900.0, 2: 985400.0, 3: 118200.0, 4: 124100.0}, '2009-10': {0: 774200.0, 1: 248100.0, 2: 983300.0, 3: 117900.0, 4: 124200.0}, '2009-11': {0: 774500.0, 1: 248200.0, 2: 982800.0, 3: 117600.0, 4: 124400.0}, '2009-12': {0: 775800.0, 1: 248000.0, 2: 983000.0, 3: 117500.0, 4: 124500.0}, '2010-01': {0: 774600.0, 1: 249800.0, 2: 985000.0, 3: 117300.0, 4: 124700.0}, '2010-02': {0: 774500.0, 1: 250500.0, 2: 988000.0, 3: 117300.0, 4: 125000.0}, '2010-03': {0: 773800.0, 1: 250100.0, 2: 986200.0, 3: 116900.0, 4: 125100.0}, '2010-04': {0: 769500.0, 1: 250400.0, 2: 978800.0, 3: 116100.0, 4: 124600.0}, '2010-05': {0: 765800.0, 1: 251800.0, 2: 974700.0, 3: 115700.0, 4: 124200.0}, '2010-06': {0: 767300.0, 1: 251300.0, 2: 975300.0, 3: 116100.0, 4: 124100.0}, '2010-07': {0: 765500.0, 1: 251200.0, 2: 973600.0, 3: 116400.0, 4: 124100.0}, '2010-08': {0: 761300.0, 1: 250600.0, 2: 967500.0, 3: 116700.0, 4: 123700.0}, '2010-09': {0: 756700.0, 1: 250000.0, 2: 957800.0, 3: 117400.0, 4: 123400.0}, '2010-10': {0: 747800.0, 1: 250000.0, 2: 945800.0, 3: 118200.0, 4: 123000.0}, '2010-11': {0: 738600.0, 1: 249700.0, 2: 935500.0, 3: 118700.0, 4: 122400.0}, '2010-12': {0: 732000.0, 1: 248100.0, 2: 927000.0, 3: 118800.0, 4: 121400.0}, '2011-01': {0: 730800.0, 1: 247400.0, 2: 924800.0, 3: 119000.0, 4: 120800.0}, '2011-02': {0: 732200.0, 1: 248500.0, 2: 926800.0, 3: 118800.0, 4: 120200.0}, '2011-03': {0: 732500.0, 1: 249400.0, 2: 925200.0, 3: 118300.0, 4: 119900.0}, '2011-04': {0: 731300.0, 1: 249200.0, 2: 918500.0, 3: 118100.0, 4: 120100.0}, '2011-05': {0: 
731500.0, 1: 249300.0, 2: 914200.0, 3: 117600.0, 4: 120000.0}, '2011-06': {0: 731400.0, 1: 249500.0, 2: 912100.0, 3: 116800.0, 4: 119600.0}, '2011-07': {0: 732400.0, 1: 249500.0, 2: 913700.0, 3: 116500.0, 4: 119000.0}, '2011-08': {0: 735100.0, 1: 249400.0, 2: 919800.0, 3: 116100.0, 4: 118100.0}, '2011-09': {0: 736500.0, 1: 248900.0, 2: 924800.0, 3: 114800.0, 4: 117100.0}, '2011-10': {0: 736600.0, 1: 248000.0, 2: 925000.0, 3: 113500.0, 4: 116800.0}, '2011-11': {0: 735900.0, 1: 247100.0, 2: 924800.0, 3: 112800.0, 4: 116700.0}, '2011-12': {0: 739000.0, 1: 247000.0, 2: 930400.0, 3: 112700.0, 4: 116400.0}, '2012-01': {0: 739300.0, 1: 248600.0, 2: 930800.0, 3: 112400.0, 4: 116000.0}, '2012-02': {0: 735600.0, 1: 251200.0, 2: 925800.0, 3: 112200.0, 4: 115900.0}, '2012-03': {0: 735700.0, 1: 252600.0, 2: 927300.0, 3: 112400.0, 4: 115800.0}, '2012-04': {0: 741600.0, 1: 252600.0, 2: 940100.0, 3: 112800.0, 4: 115200.0}, '2012-05': {0: 746200.0, 1: 252700.0, 2: 954200.0, 3: 113200.0, 4: 114700.0}, '2012-06': {0: 752200.0, 1: 252700.0, 2: 967900.0, 3: 113400.0, 4: 114700.0}, '2012-07': {0: 762000.0, 1: 252400.0, 2: 978100.0, 3: 113100.0, 4: 115000.0}, '2012-08': {0: 772800.0, 1: 252500.0, 2: 986000.0, 3: 112800.0, 4: 115500.0}, '2012-09': {0: 781400.0, 1: 253300.0, 2: 995100.0, 3: 112900.0, 4: 115800.0}, '2012-10': {0: 788800.0, 1: 254200.0, 2: 1002400.0, 3: 112900.0, 4: 115900.0}, '2012-11': {0: 795800.0, 1: 255200.0, 2: 1005000.0, 3: 112900.0, 4: 116200.0}, '2012-12': {0: 800900.0, 1: 256600.0, 2: 1005100.0, 3: 112800.0, 4: 116700.0}, '2013-01': {0: 804200.0, 1: 257000.0, 2: 1008500.0, 3: 113000.0, 4: 117300.0}, '2013-02': {0: 808100.0, 1: 256500.0, 2: 1015700.0, 3: 113400.0, 4: 117900.0}, '2013-03': {0: 813200.0, 1: 256600.0, 2: 1027500.0, 3: 113600.0, 4: 118500.0}, '2013-04': {0: 819200.0, 1: 257300.0, 2: 1040800.0, 3: 113500.0, 4: 119300.0}, '2013-05': {0: 827900.0, 1: 258400.0, 2: 1055300.0, 3: 113300.0, 4: 120500.0}, '2013-06': {0: 838200.0, 1: 260700.0, 2: 1071300.0, 3: 
113000.0, 4: 121800.0}, '2013-07': {0: 848300.0, 1: 263900.0, 2: 1090600.0, 3: 112900.0, 4: 123000.0}, '2013-08': {0: 853800.0, 1: 266900.0, 2: 1108500.0, 3: 112900.0, 4: 124300.0}, '2013-09': {0: 856500.0, 1: 269100.0, 2: 1123600.0, 3: 112700.0, 4: 125400.0}, '2013-10': {0: 856800.0, 1: 270900.0, 2: 1135600.0, 3: 112500.0, 4: 126100.0}, '2013-11': {0: 855400.0, 1: 273100.0, 2: 1142400.0, 3: 112300.0, 4: 126800.0}, '2013-12': {0: 854500.0, 1: 275800.0, 2: 1145800.0, 3: 112000.0, 4: 127600.0}, '2014-01': {0: 858500.0, 1: 277700.0, 2: 1148400.0, 3: 111500.0, 4: 128400.0}, '2014-02': {0: 862700.0, 1: 279600.0, 2: 1150700.0, 3: 111500.0, 4: 129100.0}, '2014-03': {0: 866500.0, 1: 282100.0, 2: 1152700.0, 3: 112100.0, 4: 130100.0}, '2014-04': {0: 874900.0, 1: 284500.0, 2: 1157700.0, 3: 112600.0, 4: 131300.0}, '2014-05': {0: 885100.0, 1: 286200.0, 2: 1162400.0, 3: 112700.0, 4: 132600.0}, '2014-06': {0: 890800.0, 1: 288300.0, 2: 1165200.0, 3: 113100.0, 4: 133700.0}, '2014-07': {0: 893800.0, 1: 290700.0, 2: 1169400.0, 3: 113900.0, 4: 134500.0}, '2014-08': {0: 894100.0, 1: 293100.0, 2: 1174900.0, 3: 114300.0, 4: 135300.0}, '2014-09': {0: 891300.0, 1: 295600.0, 2: 1175700.0, 3: 114400.0, 4: 136400.0}, '2014-10': {0: 889700.0, 1: 298200.0, 2: 1174000.0, 3: 114300.0, 4: 137600.0}, '2014-11': {0: 891900.0, 1: 300200.0, 2: 1176300.0, 3: 114200.0, 4: 138800.0}, '2014-12': {0: 894300.0, 1: 301500.0, 2: 1180100.0, 3: 114300.0, 4: 140000.0}, '2015-01': {0: 895000, 1: 301800, 2: 1178600, 3: 114700, 4: 141000}, '2015-02': {0: 897300, 1: 302200, 2: 1176700, 3: 115000, 4: 142000}, '2015-03': {0: 903700, 1: 303700, 2: 1180800, 3: 115100, 4: 143300}, '2015-04': {0: 911300, 1: 306600, 2: 1187600, 3: 115300, 4: 144800}, '2015-05': {0: 915600, 1: 309300, 2: 1193500, 3: 115700, 4: 146100}, '2015-06': {0: 916200, 1: 311900, 2: 1198300, 3: 115900, 4: 147200}, '2015-07': {0: 916700, 1: 314100, 2: 1199600, 3: 115600, 4: 148500}, '2015-08': {0: 918600, 1: 316000, 2: 1198000, 3: 115300, 4: 149700}, 
'2015-09': {0: 924400, 1: 318600, 2: 1199200, 3: 115300, 4: 151100}, '2015-10': {0: 935600, 1: 321800, 2: 1206600, 3: 115400, 4: 152200}, '2015-11': {0: 947200, 1: 324400, 2: 1218000, 3: 115700, 4: 153000}, '2015-12': {0: 950900, 1: 326400, 2: 1226400, 3: 116200, 4: 154100}, '2016-01': {0: 952700, 1: 327400, 2: 1230300, 3: 116200, 4: 156000}, '2016-02': {0: 959000, 1: 326900, 2: 1234700, 3: 115700, 4: 157800}, '2016-03': {0: 966400, 1: 327300, 2: 1240300, 3: 115100, 4: 159600}, '2016-04': {0: 970300, 1: 328900, 2: 1244700, 3: 114700, 4: 161700}, '2016-05': {0: 973200, 1: 330000, 2: 1245800, 3: 114300, 4: 164200}, '2016-06': {0: 973300, 1: 330000, 2: 1245300, 3: 114000, 4: 166100}, '2016-07': {0: 970600, 1: 328900, 2: 1243700, 3: 114000, 4: 167400}, '2016-08': {0: 971800, 1: 327500, 2: 1243400, 3: 113800, 4: 168100}, '2016-09': {0: 977800, 1: 326300, 2: 1245000, 3: 114000, 4: 168400}, '2016-10': {0: 985200, 1: 325300, 2: 1250800, 3: 114800, 4: 168400}, '2016-11': {0: 992900, 1: 324700, 2: 1259300, 3: 115600, 4: 168400}, '2016-12': {0: 997600, 1: 324700, 2: 1266600, 3: 116200, 4: 168400}, '2017-01': {0: 996000, 1: 323700, 2: 1270800, 3: 116800, 4: 168200}, '2017-02': {0: 993100, 1: 322100, 2: 1274500, 3: 117400, 4: 167900}, '2017-03': {0: 991500, 1: 320800, 2: 1278900, 3: 117800, 4: 167400}, '2017-04': {0: 990000, 1: 320400, 2: 1282600, 3: 118200, 4: 167000}, '2017-05': {0: 991400, 1: 320300, 2: 1285800, 3: 118700, 4: 166900}, '2017-06': {0: 998200, 1: 320900, 2: 1288100, 3: 119000, 4: 166800}, '2017-07': {0: 1004000, 1: 320900, 2: 1288500, 3: 119100, 4: 166800}, '2017-08': {0: 1006800, 1: 320300, 2: 1287500, 3: 119400, 4: 167300}, '2017-09': {0: 1008400, 1: 319800, 2: 1289200, 3: 119900, 4: 168300}, '2017-10': {0: 1011300, 1: 320200, 2: 1295000, 3: 120200, 4: 169500}, '2017-11': {0: 1015500, 1: 320800, 2: 1301100, 3: 120200, 4: 170700}, '2017-12': {0: 1022000, 1: 321100, 2: 1304300, 3: 120100, 4: 172100}, '2018-01': {0: 1028900, 1: 322700, 2: 1310100, 3: 120300, 4: 
173500}, '2018-02': {0: 1034500, 1: 326500, 2: 1315300, 3: 120500, 4: 174600}, '2018-03': {0: 1037400, 1: 330400, 2: 1317900, 3: 120800, 4: 175500}, '2018-04': {0: 1038700, 1: 332700, 2: 1321100, 3: 121300, 4: 176400}, '2018-05': {0: 1041500, 1: 334500, 2: 1325300, 3: 122200, 4: 176900}, '2018-06': {0: 1042800, 1: 335900, 2: 1323800, 3: 123000, 4: 176900}, '2018-07': {0: 1042900, 1: 337000, 2: 1321200, 3: 123600, 4: 177300}, '2018-08': {0: 1044400, 1: 338300, 2: 1320700, 3: 124500, 4: 178000}, '2018-09': {0: 1047800, 1: 338400, 2: 1319500, 3: 125600, 4: 178500}, '2018-10': {0: 1049700, 1: 336900, 2: 1318800, 3: 126300, 4: 179300}, '2018-11': {0: 1048300, 1: 336000, 2: 1319700, 3: 126800, 4: 180200}, '2018-12': {0: 1047900, 1: 336500, 2: 1323300, 3: 127400, 4: 180700}}
I am new to Python, so please provide explanation with your code.
You can perform a groupby and sum on the columns:
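df.iloc[:, 7:].T.groupby(lambda x: x.split('-')[0]).sum().T.add_suffix('_sum')
We extract the monthly columns (everything from the 8th column onward) and aggregate them by year. To do this, I pass groupby a callback that splits each column name on '-' and returns the year part; for example, x.split('-')[0] returns '1997' whenever x is '1997-XX'. The frame is transposed first so that the month labels become the row index, because grouping columns directly with groupby(..., axis=1) is deprecated in recent pandas versions. The second .T turns the yearly sums back into columns, and add_suffix renames them to 1997_sum, 1998_sum, and so on.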
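Here is a minimal sketch of the same chain on a small made-up frame, so you can print each step and see what it does. The column name RegionName, the values, and the single leading label column (hence iloc[:, 1:] instead of iloc[:, 7:]) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy frame: one label column followed by monthly 'YYYY-MM' columns,
# mimicking the layout in the question (the real data has 7 leading columns).
df = pd.DataFrame({
    'RegionName': ['A', 'B'],
    '1997-01': [1.0, 10.0],
    '1997-02': [2.0, 20.0],
    '1998-01': [3.0, 30.0],
})

monthly = df.iloc[:, 1:]                       # keep only the monthly columns
yearly = (
    monthly.T                                  # month labels become the row index
    .groupby(lambda col: col.split('-')[0])    # '1997-01' -> '1997'
    .sum()                                     # add up the months of each year
    .T                                         # back to one row per region
    .add_suffix('_sum')                        # '1997' -> '1997_sum'
)

# Put the label column back next to the yearly totals (rows align on the index).
result = df[['RegionName']].join(yearly)
print(result)
```

Joining onto the label column keeps one row per region; if you only need the yearly totals, `yearly` on its own is enough.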

Apply function across pandas dataframe columns

This seems to have been similarly answered, but I can't get it to work.
I have a pandas DataFrame that looks like sig_vars below. It has a VAF column and a Background column. I would like to use the ztest function from statsmodels to compute a p-value for each row and store it in a new pval column.
For each row, the p-value is calculated roughly like this:
from statsmodels.stats.weightstats import ztest
p_value = ztest(sig_vars.Background,value=sig_vars.VAF)[1]
I have tried something like this, but I can't quite get it to work:
def calc(x):
    return ztest(x.Background, value=x.VAF.astype(float))[1]
sig_vars.dropna().assign(pval=lambda x: calc(x)).head()
Strangely, however, this works just fine:
def calc(x):
    return ztest([0.0001, 0.0002, 0.0001], value=x.VAF.astype(float))[1]
sig_vars.dropna().assign(pval=lambda x: calc(x)).head()
Here is my DataFrame sig_vars:
import pandas as pd
from numpy import nan
sig_vars = pd.DataFrame({'AO': {0: 4.0, 1: 16.0, 2: 12.0, 3: 19.0, 4: 2.0},
'Background': {0: nan,
1: [0.00018832391713747646, 0.0002114408734430263, 0.000247843759294141],
2: nan,
3: [0.00023965141612200435,
0.00018864365214110544,
0.00036566589684372596,
0.0005452562704471102],
4: [0.00017349063150589867]},
'Change': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
'Chrom': {0: 'chr1', 1: 'chr1', 2: 'chr1', 3: 'chr1', 4: 'chr1'},
'ConvChange': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
'DP': {0: 16945.0, 1: 16945.0, 2: 16969.0, 3: 16969.0, 4: 16969.0},
'Downstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
'Gene': {0: 'TIIIa', 1: 'TIIIa', 2: 'TIIIa', 3: 'TIIIa', 4: 'TIIIa'},
'ID': {0: '86.fastq/onlyProbedRegions.vcf',
1: '86.fastq/onlyProbedRegions.vcf',
2: '86.fastq/onlyProbedRegions.vcf',
3: '86.fastq/onlyProbedRegions.vcf',
4: '86.fastq/onlyProbedRegions.vcf'},
'Individual': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'IntEx': {0: 'TIII', 1: 'TIII', 2: 'TIII', 3: 'TIII', 4: 'TIII'},
'Loc': {0: 115227854, 1: 115227854, 2: 115227855, 3: 115227855, 4: 115227856},
'Upstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
'VAF': {0: 0.00023605783416937148,
1: 0.0009442313366774859,
2: 0.0007071719017031057,
3: 0.0011196888443632507,
4: 0.00011786198361718427},
'Var': {0: 'A', 1: 'C', 2: 'A', 3: 'C', 4: 'A'},
'WT': {0: 'T', 1: 'T', 2: 'T', 3: 'T', 4: 'C'}})
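Inside assign, a callable receives the whole DataFrame, so in your calc x.Background is the entire column of lists and x.VAF is the entire VAF column, not the values from a single row. Your second version runs because the sample is a fixed list and value=x.VAF is broadcast over the whole VAF column, giving one p-value per row against the same background. Use DataFrame.apply with axis=1 instead, so that calc is called once per row: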
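from statsmodels.stats.weightstats import ztest
def calc(x):
    # x is a single row here, so x['Background'] is that row's list of values
    return ztest(x['Background'], value=float(x['VAF']))[1]
# dropna() skips the rows whose Background is NaN; those rows get NaN in pval because assignment aligns on the index
sig_vars['pval'] = sig_vars.dropna().apply(calc, axis=1)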
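Below is a self-contained sketch of the row-wise pattern on a toy frame. So that it runs without statsmodels installed, it uses a hypothetical helper, one_sample_ztest_p, written to follow the usual one-sample two-sided z-test formula with the sample standard deviation (ddof=1). That it matches statsmodels' ztest defaults is an assumption worth checking; in real code, call ztest itself. The VAF and Background values are made up.

```python
import math
import pandas as pd

# Hypothetical stand-in for statsmodels' one-sample ztest (two-sided, ddof=1),
# used only so this sketch runs without statsmodels.
def one_sample_ztest_p(sample, value):
    """Two-sided p-value for H0: mean(sample) == value."""
    n = len(sample)
    if n < 2:
        return float('nan')  # sample variance (ddof=1) needs at least two values
    mean = sum(sample) / n
    var = sum((s - mean) ** 2 for s in sample) / (n - 1)
    if var == 0:
        return float('nan')  # z is undefined when every value is identical
    z = (mean - value) / math.sqrt(var / n)
    # 2 * P(Z > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Toy frame shaped like sig_vars: 'Background' holds a list (or NaN) per row.
df = pd.DataFrame({
    'VAF': [0.5, 0.1, 0.3],
    'Background': [[0.1, 0.2, 0.3], float('nan'), [0.2, 0.4]],
})

# apply(axis=1) calls calc once per row, so row['Background'] is a single list.
def calc(row):
    return one_sample_ztest_p(row['Background'], value=float(row['VAF']))

df['pval'] = df.dropna(subset=['Background']).apply(calc, axis=1)
print(df)
```

Row 1 has no Background list, so dropna removes it before apply, and the index-aligned assignment leaves its pval as NaN. Using subset=['Background'] limits the dropping to that column, which can matter if other columns contain real NaN values.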
