How to apply a check on the scenarios below using if/else conditions? - python

I have a dataframe like the one shown in the image.
There are four columns ('status', 'preferred_time', 'history', 'id'), and I need to check whether every column has a value. The history column holds a nested list in some cases, so I need to check specifically that every entry in the nested list has values for all the mandatory keys 'branch', 'rank', 'discharge_status', 'service_start', 'job_code', 'post_intention'. I then want to add a column named "output" to the dataframe: "completed" if all the columns have values, else "pending" if any column is blank, NaN, or [{}], or if the history column is missing any key-value pair.
In the image, only the first row should be in the completed state; the rest should fall into pending.
Please help me build a better if/else check for this scenario.
Thanks in advance.
Dict of the dataframe in the image:
{'status': {0: 'No', 1: 'No', 2: nan, 3: 'No', 4: 'No'},
'preferred_time': {0: "['Morning', 'Midday', 'Afternoon']",
1: [],
2: "['Morning'] ",
3: nan,
4: "['Morning', 'Midday'] "},
'history': {0: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1999-02-13', 'job_code': '09', 'post_intention': ['No']}]",
1: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1999-02-13', 'job_code': '09', 'post_intention': ['No']}]",
2: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1995-02-13', 'job_code': '09', 'post_intention': ['No']},{'branch': 'A', 'rank': 'E6', 'discharge_status': 'Honorable Discharge', 'service_start': '2015-02-13', 'job_code': '09'}]",
3: nan,
4: '[{}]'},
'id': {0: 1, 1: 5, 2: 2, 3: 3, 4: 4}}
I tried the lines of code below, but I don't know how to check all four columns in a single if statement:
for i in df.index:
    status = df['status'][i]
    preferred_time = df['preferred_time'][i]
    id = df['id'][i]
    history = df['history'][i]
    if status and preferred_time and id and status!='' and preferred_time!= '' and id!='':
        enroll_status = "completed"
    else:
        enroll_status = "pending"
    if history!= '' or str(history)!= '[{}]':
        for item in history:
            if 'branch' in item.keys() and 'rank' in item.keys() and 'discharge_status' in item.keys() and 'service_start' in item.keys() and 'job_code' in item.keys() and 'post_intention' in item.keys():
                enroll_status = "completed"
            else:
                enroll_status = "pending"

Consider the following:
import numpy as np
import pandas as pd
from numpy import nan
def check_list(L):
    if not isinstance(L,list):
        return False
    return all(k in d for k in keys_req for d in L)
labels = np.array(["pending","completed"])
keys_req = ['branch','rank','discharge_status','service_start','job_code','post_intention']
d = {'status': {0: 'No', 1: 'No', 2: nan, 3: 'No', 4: 'No'}, 'preferred_time': {0: "['Morning', 'Midday', 'Afternoon']", 1: nan, 2: "['Morning'] ", 3: nan, 4: "['Morning', 'Midday'] "}, 'history': {0: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1999-02-13', 'job_code': '09', 'post_intention': ['No']}]", 1: nan, 2: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1995-02-13', 'job_code': '09', 'post_intention': ['No']},{'branch': 'A', 'rank': 'E6', 'discharge_status': 'Honorable Discharge', 'service_start': '2015-02-13', 'job_code': '09'}]", 3: nan, 4: '[{}]'}, 'id': {0: 1, 1: 5, 2: 2, 3: 3, 4: 4}}
df = pd.DataFrame(d)
df['history_list'] = df['history'].apply(lambda x: eval(x) if isinstance(x,str) else x)
df['mandatory_keys'] = df['history_list'].apply(check_list)
df['no_nans'] = ~pd.isna(df).any(axis = 1)
df['output_tf'] = df['mandatory_keys'] & df['no_nans']
df['output'] = labels[df['output_tf'].to_numpy(dtype=int)]
Note that I corrected some typos from your dataframe in my copied version of the dictionary d (for example, 'rank:'E7' was replaced with 'rank':'E7'). The incremental columns added (history_list, mandatory_keys, no_nans, output_tf) are there to make understanding the process that I applied here easier; it is not actually necessary to add these to the dataframe if, for example, you want to use as little space as possible. The script above results in the following dataframe df:
   status                      preferred_time  \
0      No  ['Morning', 'Midday', 'Afternoon']
1      No                                 NaN
2     NaN                         ['Morning']
3      No                                 NaN
4      No               ['Morning', 'Midday']

                                             history  id  \
0  [{'branch': 'A', 'rank': 'E7', 'discharge_stat...   1
1                                                NaN   5
2  [{'branch': 'A', 'rank': 'E7', 'discharge_stat...   2
3                                                NaN   3
4                                               [{}]   4

                                        history_list  mandatory_keys  no_nans  \
0  [{'branch': 'A', 'rank': 'E7', 'discharge_stat...            True     True
1                                                NaN           False    False
2  [{'branch': 'A', 'rank': 'E7', 'discharge_stat...           False    False
3                                                NaN           False    False
4                                               [{}]           False     True

   output_tf     output
0       True  completed
1      False    pending
2      False    pending
3      False    pending
4      False    pending
Here's a more parsimonious version (which doesn't add the unnecessary columns or store the extra "labels" variable).
import numpy as np
import pandas as pd
from numpy import nan
def check_list(L):
    if not isinstance(L,list):
        return False
    return all(k in d for k in keys_req for d in L)
keys_req = ['branch','rank','discharge_status','service_start','job_code','post_intention']
d = {'status': {0: 'No', 1: 'No', 2: nan, 3: 'No', 4: 'No'}, 'preferred_time': {0: "['Morning', 'Midday', 'Afternoon']", 1: nan, 2: "['Morning'] ", 3: nan, 4: "['Morning', 'Midday'] "}, 'history': {0: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1999-02-13', 'job_code': '09', 'post_intention': ['No']}]", 1: nan, 2: "[{'branch': 'A', 'rank': 'E7', 'discharge_status': 'Honorable Discharge', 'service_start': '1995-02-13', 'job_code': '09', 'post_intention': ['No']},{'branch': 'A', 'rank': 'E6', 'discharge_status': 'Honorable Discharge', 'service_start': '2015-02-13', 'job_code': '09'}]", 3: nan, 4: '[{}]'}, 'id': {0: 1, 1: 5, 2: 2, 3: 3, 4: 4}}
df = pd.DataFrame(d)
df['output'] = np.array(["pending","completed"])[
(df['history'].apply(lambda x: eval(x) if isinstance(x,str) else x)
.apply(check_list)
& ~pd.isna(df).any(axis = 1)
).to_numpy(dtype=int)]
A version that addresses your latest comment:
df = pd.DataFrame(d)
display(df)
df['output'] = np.array(["pending","completed"])[
(df['history'].apply(lambda x: eval(x) if isinstance(x,str) else x)
.apply(check_list)
& ~pd.isna(df).any(axis = 1)
& (df['preferred_time']!="[]")
).to_numpy(dtype=int)]
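For comparison, here is a minimal row-by-row sketch of the same check in plain Python. It is not the method used in the answer above: it swaps `eval` for `ast.literal_eval` (which only parses Python literals), and it treats empty lists, empty strings, and missing or empty key values as blank. The names `REQUIRED_KEYS`, `is_blank`, `parse_history`, and `row_status` are made up for this illustration.

```python
import ast
import math

# Mandatory keys taken from the question.
REQUIRED_KEYS = ("branch", "rank", "discharge_status",
                 "service_start", "job_code", "post_intention")

def is_blank(value):
    """True for None, NaN, empty strings, '[]', '[{}]' and empty containers."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return value.strip() in ("", "[]", "[{}]")
    if isinstance(value, (list, dict, tuple)):
        return len(value) == 0
    return False

def parse_history(value):
    """Return a non-empty list of records, or None if the cell is unusable."""
    if isinstance(value, str):
        try:
            value = ast.literal_eval(value)  # parses literals only, unlike eval
        except (ValueError, SyntaxError):
            return None
    return value if isinstance(value, list) and value else None

def row_status(status, preferred_time, history, id_):
    """Return 'completed' only if every field and every history key is filled."""
    if any(is_blank(v) for v in (status, preferred_time, id_)):
        return "pending"
    records = parse_history(history)
    if records is None:
        return "pending"
    complete = all(
        isinstance(rec, dict)
        and all(not is_blank(rec.get(key)) for key in REQUIRED_KEYS)
        for rec in records
    )
    return "completed" if complete else "pending"
```

With the question's dataframe, the column could then be filled with something like `df['output'] = [row_status(*r) for r in zip(df['status'], df['preferred_time'], df['history'], df['id'])]`. This is slower than the vectorised version on large frames, but each rule is spelled out in one place.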

Related

Convert column to date format

I am trying to convert the date to a correct date format. I have tested some of the possibilities that I have read about in the forum, but I still don't know how to tackle this issue:
After importing:
df = pd.read_excel(r'/path/df_datetime.xlsb', sheet_name="12FEB22", engine='pyxlsb')
I get the following df:
{'Unnamed: 0': {0: 'Administrative ID',
1: '000002191',
2: '000002382',
3: '000002434',
4: '000002728',
5: '000002826',
6: '000003265',
7: '000004106',
8: '000004333'},
'Unnamed: 1': {0: 'Service',
1: 'generic',
2: 'generic',
3: 'generic',
4: 'generic',
5: 'generic',
6: 'generic',
7: 'generic',
8: 'generic'},
'Unnamed: 2': {0: 'Movement type',
1: 'New',
2: 'New',
3: 'New',
4: 'Modify',
5: 'New',
6: 'New',
7: 'New',
8: 'New'},
'Unnamed: 3': {0: 'Date',
1: 37503,
2: 37475,
3: 37453,
4: 44186,
5: 37711,
6: 37658,
7: 37770,
8: 37820},
'Unnamed: 4': {0: 'Contract Term',
1: '12',
2: '12',
3: '12',
4: '12',
5: '12',
6: '12',
7: '12',
8: '12'}}
However, although I have tried to convert the 'Date' column (or 'Unnamed: 3', because the original dataset has no header row, so I have to set the header afterwards) during the import, I have been unsuccessful.
Is there anything I can do?
Thanks!
Try this:
from xlrd import xldate_as_datetime
def trans_date(x):
    if isinstance(x, int):
        return xldate_as_datetime(x, 0).date()
    else:
        return x
print(df['Unnamed: 3'].apply(trans_date))
>>>
0          Date
1    2002-09-04
2    2002-08-07
3    2002-07-16
4    2020-12-21
5    2003-03-31
6    2003-02-06
7    2003-05-29
8    2003-07-18
Name: Unnamed: 3, dtype: object
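If installing xlrd is not an option, the same conversion can be sketched with the standard library alone. This assumes the workbook uses Excel's default 1900 date system (the same assumption as passing `0` to `xldate_as_datetime` above) and that the serials fall after February 1900, where the 1899-12-30 epoch is valid. The names `EXCEL_EPOCH` and `excel_serial_to_date` are made up for this illustration.

```python
from datetime import date, timedelta

# Day zero of Excel's 1900 date system for serials after February 1900.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(value):
    """Convert an Excel day serial to a date; pass anything else through."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    if value != value:  # NaN is the only value not equal to itself
        return value
    return EXCEL_EPOCH + timedelta(days=int(value))

print(excel_serial_to_date(37503))   # 2002-09-04
print(excel_serial_to_date("Date"))  # Date
```

It can be applied in the same way as `trans_date`, for example `df['Unnamed: 3'].apply(excel_serial_to_date)`.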

I can't get pandas to union my dataframes properly

I am trying to concat or append (neither is working) two 9-column dataframes together. But instead of doing a normal vertical stacking of them, pandas keeps adding 9 more empty columns as well. Do you know how to stop this?
The output looks like this:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,0,1,10,11,12,13,2,3,4,5,6,7,8,9
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency ,,,,,,,,,,,,,,
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.00,48,48,0,Ken,(903) 280-7878,UrTruckBroker ,,,,,,,,,,,,,,
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc. ,,,,,,,,,,,,,,
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv ,,,,,,,,,,,,,,
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency ,,,,,,,,,,,,,,
...
,,,,,,,,,,,,,,03/02/2021,Knapp,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Jackson,NE,Full,Flatbed / Step Deck,0.0,48.0,48.0
,,,,,,,,,,,,,,03/02/2021,Knapp,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Sterling,IL,Full,Flatbed / Step Deck,0.0,48.0,48.0
,,,,,,,,,,,,,,03/02/2021,Milwaukee,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Great Falls,MT,Full,Flatbed / Step Deck,0.0,45.0,48.0
,,,,,,,,,,,,,,03/02/2021,Algoma,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Pamplico,SC,Full,Flatbed / Step Deck,0.0,48.0,48.0
The code is a web request to get data, which I save to a dataframe, which is then concatenated with another dataframe that comes from a CSV. I then save all of this back to that CSV:
this_csv = 'freights_trulos.csv'
try:
    old_df = pd.read_csv(this_csv)
except BaseException as e:
    print(e)
    old_df = pd.DataFrame()
state, equip = 'DE', 'Flat'
url = "https://backend-a.trulos.com/load-table/grab_loads.php?state=%s&equipment=%s" % (state, equip)
payload = {}
headers = {
...
}
response = requests.request("GET", url, headers=headers, data=payload)
# print(response.text)
parsed = json.loads(response.content)
data = [r[0:13] + [r[-4].split('<br/>')[-2].split('>')[-1]] for r in parsed]
df = pd.DataFrame(data=data)
if not old_df.empty:
    # concatenate old and new and remove duplicates
    # df.reset_index(drop=True, inplace=True)
    # old_df.reset_index(drop=True, inplace=True)
    # df = pd.concat([old_df, df], ignore_index=True) <--- CONCAT HAS SAME ISSUES AS APPEND
    df = df.append(old_df, ignore_index=True)
    # remove duplicates on cols (drop_duplicates returns a new frame, so reassign)
    df = df.drop_duplicates()
df.to_csv(this_csv, index=False)
EDIT: the appended dataframes have had their types changed
df.dtypes
Out[2]:
0 object
1 object
2 object
3 object
4 object
5 object
6 object
7 object
8 object
9 object
10 object
11 object
12 object
13 object
dtype: object
old_df.dtypes
Out[3]:
0 object
1 object
2 object
3 object
4 object
5 object
6 object
7 float64
8 int64
9 int64
10 int64
11 object
12 object
13 object
dtype: object
old_df to csv
0,1,2,3,4,5,6,7,8,9,10,11,12,13
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.0,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.0,48,48,0,Ken,(903) 280-7878,UrTruckBroker
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.0,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc.
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.0,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.0,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
new_df to csv
0,1,2,3,4,5,6,7,8,9,10,11,12,13
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.00,48,48,0,Ken,(903) 280-7878,UrTruckBroker
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc.
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
I guess the problem could be in how you read the data. If I copy your sample data to Excel, split by comma, and then import it into pandas, all is fine. If I split on commas AND whitespace, I get 9 additional columns. So you could try debugging by replacing all whitespace before creating your dataframe.
I also used your sample data, and it worked just fine for me when I initialized it like this:
import pandas as pd
df_new = pd.DataFrame({'0': {0: '10/23/2020',
1: '10/22/2020',
2: '10/23/2020',
3: '10/23/2020',
4: '10/30/2020'},
'1': {0: 'New_Castle',
1: 'Wilmington',
2: 'WILMINGTON',
3: 'WILMINGTON',
4: 'New_Castle'},
'2': {0: 'DE', 1: 'DE', 2: 'DE', 3: 'DE', 4: 'DE'},
'3': {0: 'Gary', 1: 'METHUEN', 2: 'METHUEN', 3: 'METHUEN', 4: 'Gary'},
'4': {0: 'IN', 1: 'MA', 2: 'MA', 3: 'MA', 4: 'IN'},
'5': {0: 'Full', 1: 'Full', 2: 'Full', 3: 'Full', 4: 'Full'},
'6': {0: 'Flatbed',
1: 'Flatbed_/_Step_Deck',
2: 'Flatbed_w/Tarps',
3: 'Flatbed_w/Tarps',
4: 'Flatbed'},
'7': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 945.0},
'8': {0: 46, 1: 48, 2: 47, 3: 48, 4: 46},
'9': {0: 48, 1: 48, 2: 1, 3: 1, 4: 48},
'10': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'11': {0: 'Dispatch', 1: 'Ken', 2: 'Dispatch', 3: 'Dispatch', 4: 'Dispatch'},
'12': {0: '(800)_488-1860',
1: '(903)_280-7878',
2: '(912)_748-3801',
3: '(541)_826-4786',
4: '(800)_488-1860'},
'13': {0: 'Meadow_Lark_Agency_',
1: 'UrTruckBroker_',
2: 'DSV_Road_Inc._',
3: 'Sureway_Transportation_Co_/_Anderson_Trucking_Serv_',
4: 'Meadow_Lark_Agency_'}})
df_old = pd.DataFrame({'0': {0: '10/23/2020',
1: '10/22/2020',
2: '10/23/2020',
3: '10/23/2020',
4: '10/30/2020'},
'1': {0: 'New_Castle',
1: 'Wilmington',
2: 'WILMINGTON',
3: 'WILMINGTON',
4: 'New_Castle'},
'2': {0: 'DE', 1: 'DE', 2: 'DE', 3: 'DE', 4: 'DE'},
'3': {0: 'Gary', 1: 'METHUEN', 2: 'METHUEN', 3: 'METHUEN', 4: 'Gary'},
'4': {0: 'IN', 1: 'MA', 2: 'MA', 3: 'MA', 4: 'IN'},
'5': {0: 'Full', 1: 'Full', 2: 'Full', 3: 'Full', 4: 'Full'},
'6': {0: 'Flatbed',
1: 'Flatbed_/_Step_Deck',
2: 'Flatbed_w/Tarps',
3: 'Flatbed_w/Tarps',
4: 'Flatbed'},
'7': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 945.0},
'8': {0: 46, 1: 48, 2: 47, 3: 48, 4: 46},
'9': {0: 48, 1: 48, 2: 1, 3: 1, 4: 48},
'10': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'11': {0: 'Dispatch', 1: 'Ken', 2: 'Dispatch', 3: 'Dispatch', 4: 'Dispatch'},
'12': {0: '(800)_488-1860',
1: '(903)_280-7878',
2: '(912)_748-3801',
3: '(541)_826-4786',
4: '(800)_488-1860'},
'13': {0: 'Meadow_Lark_Agency_',
1: 'UrTruckBroker_',
2: 'DSV_Road_Inc._',
3: 'Sureway_Transportation_Co_/_Anderson_Trucking_Serv_',
4: 'Meadow_Lark_Agency_'}})
df_new.append(df_old, ignore_index=True)
#OR
pd.concat([df_new, df_old])
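Another possible cause, consistent with the header row in the question's output (a second block of labels ordered 0, 1, 10, 11, ...), is that the two frames carry different column label types: a frame built from a list of lists gets integer labels, while a frame read back with `read_csv` gets string labels, so `concat` treats `0` and `'0'` as separate columns. This is an assumption about the asker's data, not something the question confirms. The tiny frames below are hypothetical stand-ins, and `pd.concat` is used because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the question's data.
new_df = pd.DataFrame([["10/23/2020", "New Castle"]])          # labels 0, 1 (int)
old_df = pd.DataFrame({"0": ["10/22/2020"], "1": ["Wilmington"]})  # labels '0', '1' (str)

# Labels do not match, so the result has four columns padded with NaN.
misaligned = pd.concat([new_df, old_df], ignore_index=True)
print(misaligned.shape)

# Give both frames the same label type before stacking.
new_df.columns = new_df.columns.astype(str)
stacked = pd.concat([new_df, old_df], ignore_index=True)
print(stacked.shape)
```

Checking `df.columns` and `old_df.columns` before concatenating is a quick way to confirm or rule out this explanation.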

Pandas Columns to Flattened Dictionary (instead of list of dictionaries)

I have a DF that looks like this.
df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3}, 'Value': {0: 'a', 1: 'b', 2: np.nan}})
   ID Value
0   1     a
1   2     b
2   3     c
I'd like to create a dictionary out of it.
So if I run df.to_dict('records'), it gives me
[{'Visual_ID': 1, 'Customer': 'a'},
{'Visual_ID': 2, 'Customer': 'b'},
{'Visual_ID': 3, 'Customer': 'c'}]
However, what I want is the following.
{
1: 'a',
2: 'b',
3: 'c'
}
All of the rows in the DF are unique, so it shouldn't run into duplicate-key issues.
Try with
d = dict(zip(df.ID, df.Value))
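As a self-contained sketch, the `zip` approach is shown below next to an equivalent built on `set_index`. The frame uses the values from the table in the question ('a', 'b', 'c'). Bracket access (`df["ID"]`) is used here only to avoid any clash between column names and DataFrame attributes.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "Value": ["a", "b", "c"]})

# Pair the two columns element by element.
by_zip = dict(zip(df["ID"], df["Value"]))

# Alternative: make ID the index, then convert the Value series to a dict.
by_index = df.set_index("ID")["Value"].to_dict()

print(by_zip)
```

Both give the same mapping when the IDs are unique; with repeated IDs, later rows overwrite earlier ones in either version.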

How to convert if/else to np.where in pandas

My code is below.
It applies pd.to_numeric to the columns that are supposed to be int or float but are coming in as object. Can this be written in a more pandas-like way, for example with np.where?
if df.dtypes.all() == 'object':
    df=df.apply(pd.to_numeric,errors='coerce').fillna(df)
else:
    df = df
A simple one-liner is to assign with select_dtypes, which reassigns the existing columns:
df.assign(**df.select_dtypes('O').apply(pd.to_numeric,errors='coerce').fillna(df))
np.where:
df[:] = (np.where(df.dtypes=='object',
         df.apply(pd.to_numeric,errors='coerce').fillna(df),df))
Example (check Price column) :
d = {'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: '24000', 1: 'a', 2: '900'}}
df = pd.DataFrame(d)
print(df)
   CusID  Name    Shop  Price
0      1  Paul  Pascal  24000
1      2  Mark   Casio      a
2      3  Bill    Nike    900
df.to_dict()
{'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: '24000', 1: 'a', 2: '900'}}
(df.assign(**df.select_dtypes('O').apply(pd.to_numeric,errors='coerce')
.fillna(df)).to_dict())
{'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: 24000.0, 1: 'a', 2: 900.0}}
The equivalent of your if/else is df.mask:
df_out = df.mask(df.dtypes =='O', df.apply(pd.to_numeric, errors='coerce')
.fillna(df))
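The approaches above convert cell by cell, so a column like Price ends up holding both floats and strings. If the goal is instead to change the dtype only of columns that are numeric throughout, a stricter variant can be sketched as follows. This is a different rule from the one in the answer, and the `Qty` column is a hypothetical addition to show a column that does convert.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "CusID": [1, 2, 3],
    "Name": ["Paul", "Mark", "Bill"],
    "Price": ["24000", "a", "900"],  # mixed content: left as text
    "Qty": ["1", "2", "3"],          # hypothetical column: fully numeric text
})

for col in df.columns:
    if is_numeric_dtype(df[col]):
        continue  # already numeric, nothing to do
    converted = pd.to_numeric(df[col], errors="coerce")
    # Replace the column only if every value parsed as a number.
    if converted.notna().all():
        df[col] = converted

print(df.dtypes)
```

One caveat of this rule: a column that already contains missing values is never converted, because the `notna().all()` test fails for it.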

Pandas: how to unstack by group of columns while keeping columns paired

I need to unstack a contact list (id, relatives, phone numbers...) so that the columns keep a specific order.
Given an index, DataFrame.unstack operates by unstacking single columns one by one, even when applied to a pair of columns.
Data have
df_have=pd.DataFrame.from_dict({'ID': {0: '100', 1: '100', 2: '100', 3: '100', 4: '100', 5: '200', 6: '200', 7: '200', 8: '200', 9: '200'},
'ID_RELATIVE': {0: '100', 1: '100', 2: '150', 3: '150', 4: '190', 5: '200', 6: '200', 7: '250', 8: '290', 9: '290'},
'RELATIVE_ROLE': {0: 'self', 1: 'self', 2: 'father', 3: 'father', 4: 'mother', 5: 'self', 6: 'self', 7: 'father', 8: 'mother', 9: 'mother'},
'PHONE': {0: '111111', 1: '222222', 2: '333333', 3: '444444', 4: '555555', 5: '123456', 6: '456789', 7: '987654', 8: '778899', 9: '909090'}})
Data want
df_want=pd.DataFrame.from_dict({'ID': {0: '100', 1: '200'},
'ID_RELATIVE_1': {0: '100', 1: '200'},
'RELATIVE_ROLE_1': {0: 'self', 1: 'self'},
'PHONE_1_1': {0: '111111', 1: '123456'},
'PHONE_1_2': {0: '222222', 1: '456789'},
'ID_RELATIVE_2': {0: '150', 1: '250'},
'RELATIVE_ROLE_2': {0: 'father', 1: 'father'},
'PHONE_2_1': {0: '333333', 1: '987654'},
'PHONE_2_2': {0: '444444', 1: 'nan'},
'ID_RELATIVE_3': {0: '190', 1: '290'},
'RELATIVE_ROLE_3': {0: 'mother', 1: 'mother'},
'PHONE_3_1': {0: '555555', 1: '778899'},
'PHONE_3_2': {0: 'nan', 1: '909090'}})
So, in the end, I need ID to be the index, and to unstack the other columns that will hence become attributes of ID.
The usual unstack process provides a "correct" output but in the wrong shape.
df2=df_have.groupby(['ID'])[['ID_RELATIVE','RELATIVE_ROLE','PHONE']].apply(lambda x: x.reset_index(drop=True)).unstack()
This would require the re-ordering of columns and some removal of duplicates (by columns, not by row), together with a FOR loop. I'd like to avoid using this approach, since I'm looking for a more "elegant" way of achieving the desired result by means of grouping/stacking/unstacking/pivoting and so on.
Thanks a lot
The solution has two main steps: first, group by all columns except PHONE to number the phones within each (ID, relative) pair; then convert the column names to an ordered categorical for correct sorting and group by ID:
c = ['ID','ID_RELATIVE','RELATIVE_ROLE']
df = df_have.set_index(c+ [df_have.groupby(c).cumcount().add(1)])['PHONE']
df = df.unstack().add_prefix('PHONE_').reset_index()
df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)
df = df.unstack().sort_index(axis=1, level=1)
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print (df)
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_2_1 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150
1  200           200            self    123456    456789           250

  RELATIVE_ROLE_2 PHONE_1_2 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_1_3  \
0          father    333333    444444           190          mother    555555
1          father    987654       NaN           290          mother    778899

  PHONE_2_3
0       NaN
1    909090
If you need to change the order of the digits in the PHONE column names, replace the list comprehension above with:
df.columns = [f'{a.split("_")[0]}_{b}_{a.split("_")[1]}'
              if 'PHONE' in a
              else f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print (df)
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_1_2 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150
1  200           200            self    123456    456789           250

  RELATIVE_ROLE_2 PHONE_2_1 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_3_1  \
0          father    333333    444444           190          mother    555555
1          father    987654       NaN           290          mother    778899

  PHONE_3_2
0       NaN
1    909090
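To see what the first step of the solution does on its own, the sketch below runs it on the rows for ID 100 only. `groupby(...).cumcount()` numbers the repeated phone entries within each (ID, relative, role) group, and that counter becomes the column level that `unstack` spreads out. The variable names `contacts`, `keys`, `phone_no`, and `wide` are chosen for this illustration.

```python
import pandas as pd

# The ID 100 rows from df_have in the question.
contacts = pd.DataFrame({
    "ID": ["100"] * 5,
    "ID_RELATIVE": ["100", "100", "150", "150", "190"],
    "RELATIVE_ROLE": ["self", "self", "father", "father", "mother"],
    "PHONE": ["111111", "222222", "333333", "444444", "555555"],
})

keys = ["ID", "ID_RELATIVE", "RELATIVE_ROLE"]

# 1, 2, ... for each phone within the same relative.
phone_no = contacts.groupby(keys).cumcount().add(1)

# One row per relative, one PHONE_n column per phone number.
wide = (contacts.set_index(keys + [phone_no])["PHONE"]
        .unstack()
        .add_prefix("PHONE_"))

print(phone_no.tolist())
print(wide)
```

The mother has only one phone, so her `PHONE_2` cell is NaN. The second `cumcount` in the solution applies the same idea one level up, numbering the relatives within each ID.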
