Resolve complementary missing values between rows - python

I have a df that looks like this
     Day  ID             ID_2     AS   D    E    AS1  D1   E1
29   72   Participant 1  PS 6 42  NaN  NaN  NaN  NaN  NaN  NaN
35   78   Participant 1  NaN      yes  3    no   2    no   2
49   22   Participant 2  PS 1 89  NaN  NaN  NaN  NaN  NaN  NaN
85   18   Participant 2  NaN      yes  3    no   2    no   2
I'm looking for a way to add the ID_2 column value to all rows where ID matches (i.e., for Participant 1, fill in the NaN values with the values from the other row where ID=Participant 1). I've looked into using combine but that doesn't seem to work for this particular case.
Expected output:
     Day  ID             ID_2     AS   D  E   AS1  D1  E1
29   72   Participant 1  PS 6 42  yes  3  no  2    no  2
35   78   Participant 1  PS 6 42  yes  3  no  2    no  2
49   22   Participant 2  PS 1 89  yes  3  no  2    no  2
85   18   Participant 2  PS 1 89  yes  3  no  2    no  2
or
     Day  ID             ID_2     AS   D    E    AS1  D1   E1
29   72   Participant 1  PS 6 42  NaN  NaN  NaN  NaN  NaN  NaN
35   78   Participant 1  PS 6 42  yes  3    no   2    no   2
49   22   Participant 2  PS 1 89  NaN  NaN  NaN  NaN  NaN  NaN
85   18   Participant 2  PS 1 89  yes  3    no   2    no   2

I think you could try
df.ID_2 = df.groupby('ID').ID_2.ffill()
# 29 PS 6 42
# 35 PS 6 42
# 49 PS 1 89
# 85 PS 1 89
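If the remaining columns should be filled in the opposite direction too (your first expected output), the same groupby idea works with bfill; a sketch, assuming the column names from your sample:

# fill ID_2 downward and the remaining columns upward, within each ID
df['ID_2'] = df.groupby('ID')['ID_2'].ffill()
cols = ['AS', 'D', 'E', 'AS1', 'D1', 'E1']
df[cols] = df.groupby('ID')[cols].bfill()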

Not tested, but something like this should work - can't copy your df into my browser.
print(df)
Day ID ID_2 AS D E AS1 D1 E1
0 72 Participant_1 PS_6_42 NaN NaN NaN NaN NaN NaN
1 78 Participant_1 NaN yes 3.0 no 2.0 no 2.0
2 22 Participant_2 PS_1_89 NaN NaN NaN NaN NaN NaN
3 18 Participant_2 NaN yes 3.0 no 2.0 no 2.0
df2 = (df.set_index('ID')
         .groupby('ID').transform('ffill')
         .groupby('ID').transform('bfill')
         .reset_index())
print(df2)
ID Day ID_2 AS D E AS1 D1 E1
0 Participant_1 72 PS_6_42 yes 3 no 2 no 2
1 Participant_1 78 PS_6_42 yes 3 no 2 no 2
2 Participant_2 22 PS_1_89 yes 3 no 2 no 2
3 Participant_2 18 PS_1_89 yes 3 no 2 no 2
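For reference, a self-contained version of the same idea, with the frame reconstructed from the printout above:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Day': [72, 78, 22, 18],
    'ID': ['Participant_1', 'Participant_1', 'Participant_2', 'Participant_2'],
    'ID_2': ['PS_6_42', np.nan, 'PS_1_89', np.nan],
    'AS': [np.nan, 'yes', np.nan, 'yes'],
    'D': [np.nan, 3.0, np.nan, 3.0],
    'E': [np.nan, 'no', np.nan, 'no'],
    'AS1': [np.nan, 2.0, np.nan, 2.0],
    'D1': [np.nan, 'no', np.nan, 'no'],
    'E1': [np.nan, 2.0, np.nan, 2.0],
})

# forward- then backward-fill every column within each ID group
df2 = (df.set_index('ID')
         .groupby('ID').transform('ffill')
         .groupby('ID').transform('bfill')
         .reset_index())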

How to use each vector entry to fill NaNs of separate groups in a dataframe

Say I have a vector valsHR which looks like this:
valsHR=[78.8, 82.3, 91.0]
And I have a dataframe MainData
Age Patient HR
21 1 NaN
21 1 NaN
21 1 NaN
30 2 NaN
30 2 NaN
24 3 NaN
24 3 NaN
24 3 NaN
I want to fill the NaNs so that the first value in valsHR will only fill in the NaNs for patient 1, the second will fill the NaNs for patient 2 and the third will fill in for patient 3.
So far I've tried using this:
mainData['HR'] = mainData['HR'].fillna(valsHR), but it fills all the NaNs with the first value in the vector.
I've also tried to use this:
mainData['HR'] = mainData.groupby('Patient').fillna(valsHR), which fills the NaNs with values that aren't in the valsHR vector at all.
I was wondering if anyone knew a way to do this?
Create a dictionary from the Patient values that still have missing HR, map it to the original column, and replace only the missing values:
print (df)
Age Patient HR
0 21 1 NaN
1 21 1 NaN
2 21 1 NaN
3 30 2 100.0 <- value is not replaced
4 30 2 NaN
5 24 3 NaN
6 24 3 NaN
7 24 3 NaN
p = df.loc[df.HR.isna(), 'Patient'].unique()
valsHR = [78.8, 82.3, 91.0]
df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
print (df)
Age Patient HR
0 21 1 78.8
1 21 1 78.8
2 21 1 78.8
3 30 2 100.0
4 30 2 82.3
5 24 3 91.0
6 24 3 91.0
7 24 3 91.0
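The intermediate dict pairs each patient that still has NaNs with the next value from valsHR; for the frame above it is:

print(dict(zip(p, valsHR)))
# {1: 78.8, 2: 82.3, 3: 91.0}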
If some groups have no NaNs:
print (df)
Age Patient HR
0 21 1 NaN
1 21 1 NaN
2 21 1 NaN
3 30 2 100.0 <- group 2 is not replaced
4 30 2 100.0 <- group 2 is not replaced
5 24 3 NaN
6 24 3 NaN
7 24 3 NaN
p = df.loc[df.HR.isna(), 'Patient'].unique()
valsHR = [78.8, 82.3, 91.0]
df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
print (df)
Age Patient HR
0 21 1 78.8
1 21 1 78.8
2 21 1 78.8
3 30 2 100.0
4 30 2 100.0
5 24 3 82.3
6 24 3 82.3
7 24 3 82.3
If all of the NaNs should be replaced, it is simply a mapping:
import pandas as pd
from io import StringIO
valsHR=[78.8, 82.3, 91.0]
vals = {i: k for i, k in enumerate(valsHR, 1)}
df = pd.read_csv(StringIO("""Age Patient
21 1
21 1
21 1
30 2
30 2
24 3
24 3
24 3"""), sep="\s+")
df["HR"] = df["Patient"].map(vals)
>>> df
Age Patient HR
0 21 1 78.8
1 21 1 78.8
2 21 1 78.8
3 30 2 82.3
4 30 2 82.3
5 24 3 91.0
6 24 3 91.0
7 24 3 91.0
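Note this assumes the patient IDs are exactly the consecutive integers 1, 2, 3: enumerate(valsHR, 1) builds the mapping {1: 78.8, 2: 82.3, 3: 91.0}, and any patient ID missing from that dict would get NaN from map.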

Add the values of several columns when the number of columns exceeds 3 - Pandas

I have a pandas dataframe with several columns of dates, invoice numbers, and bill amounts. I would like to add the amounts of all invoices beyond the third to the 3rd one and change its invoice number to "1111".
Here is an example:
ID customer  Bill1  Date 1      ID Bill 1  Bill2  Date 2      ID Bill 2  Bill3  Date3       ID Bill 3  Bill4  Date 4      ID Bill 4  Bill5  Date 5      ID Bill 5
4            6      2000-10-04  1          45     2000-11-05  2          51     1999-12-05  3          23     2001-11-23  6          76     2011-08-19  12
6            8      2016-05-03  7          39     2017-08-09  8          38     2018-07-14  17         21     2009-05-04  9          NaN    NaN         NaN
12           14     2016-11-16  10         73     2017-05-04  15         NaN    NaN         NaN        NaN    NaN         NaN        NaN    NaN         NaN
And I would like to get this:
ID customer  Bill1  Date 1      ID Bill 1  Bill2  Date 2      ID Bill 2  Bill3  Date3       ID Bill 3
4            6      2000-10-04  1          45     2000-11-05  2          150    1999-12-05  1111
6            8      2016-05-03  7          39     2017-08-09  8          59     2018-07-14  1111
12           14     2016-11-16  10         73     2017-05-04  15         NaN    NaN         NaN
This example is a sample of my data; I may have many more than 5 columns.
Thanks for your help.
With a little data manipulation, you should be able to do it like this:
import numpy as np

df = df.replace('Nan', np.nan)

idx_col_bill3 = 7      # position of the 'Bill3' column
step = 3               # each invoice spans 3 columns (amount, date, id)
idx_col_bill3_id = 10  # number of columns to keep (up to 'ID Bill 3')
cols = df.columns

# row-wise sum of Bill3, Bill4, Bill5, ... (NaNs count as 0)
bills = df[cols[range(idx_col_bill3, len(cols), step)]].sum(axis=1)
# rows with no third bill at all summed to 0 -> restore NaN
bills = bills.replace(0, np.nan)

# keep only the columns up to and including 'ID Bill 3'
df = df[cols[range(idx_col_bill3_id)]]
df['Bill3'] = bills
# set the invoice number to 1111 wherever a merged bill exists
df.loc[df['ID Bill 3'].notna(), 'ID Bill 3'] = 1111
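If the amount columns are literally named Bill1, Bill2, ... as in the sample, a name-based selection avoids hard-coding positions; a sketch under that assumption (numpy imported as np, as above):

bill_cols = [c for c in df.columns if c.startswith('Bill')][2:]  # Bill3 onward
df['Bill3'] = df[bill_cols].sum(axis=1).replace(0, np.nan)
df.loc[df['Bill3'].notna(), 'ID Bill 3'] = 1111
df = df.loc[:, :'ID Bill 3']  # drop the merged Bill4/Bill5 blocks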

I want to change the column value for a specific index

df = pd.read_csv('test.txt',dtype=str)
print(df)
HE WE
0 aa NaN
1 181 76
2 22 13
3 NaN NaN
I want to overwrite this dataframe's values at the following indexes:
dff = pd.DataFrame({'HE' : [100,30]},index=[1,2])
print(dff)
HE
1 100
2 30
for i in dff.index:
    df._set_value(i, 'HE', dff._get_value(i, 'HE'))
print(df)
HE WE
0 aa NaN
1 100 76
2 30 13
3 NaN NaN
Is there a way to change it all at once without using 'for'?
Use DataFrame.update, which works in place:
df.update(dff)
print (df)
HE WE
0 aa NaN
1 100 76.0
2 30 13.0
3 NaN NaN
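update aligns on both the index and the column names, so only the overlapping cells (rows 1 and 2 of HE here) are touched; a minimal sketch reproducing the example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'HE': ['aa', '181', '22', np.nan],
                   'WE': [np.nan, '76', '13', np.nan]})
dff = pd.DataFrame({'HE': [100, 30]}, index=[1, 2])

df.update(dff)  # in place; NaN values in dff would be skipped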

groupby results in empty dataframe

Issue:
groupby on the data below results in an empty dataframe; not sure how to fix it, please help, thanks.
data:
business_group business_unit cost_center GL_code profit_center count
NaN a 12 12-09 09 1
NaN a 12 12-09 09 1
NaN b 23 23-87 87 1
NaN b 23 23-87 87 1
NaN b 34 34-76 76 1
groupby:
group_df = df.groupby(['business_group', 'business_unit', 'cost_center',
                       'GL_code', 'profit_center'],
                      as_index=False).count()
expected result:
business_group business_unit cost_center GL_code profit_center count
NaN a 12 12-09 09 2
NaN b 23 23-87 87 2
NaN b 34 34-76 76 1
result received:
Empty DataFrame
Columns: [business_group, business_unit, cost_center, GL_code, profit_center, count]
Index: []
That's because the NaN values in business_group are nulls, and groupby() drops NaN keys by default. You can pass dropna=False to groupby():
group_df = df.groupby(['business_group', 'business_unit', 'cost_center',
                       'GL_code', 'profit_center'],
                      dropna=False, as_index=False).count()
Output:
business_group business_unit cost_center GL_code profit_center count
0 NaN a 12 12-09 9 2
1 NaN b 23 23-87 87 2
2 NaN b 34 34-76 76 1
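Note that the dropna parameter was added to groupby in pandas 1.1; on older versions, one workaround is to fill the NaN keys with a placeholder first (the placeholder name here is arbitrary):

# pandas < 1.1 fallback: give the NaN keys a temporary label
group_df = (df.fillna({'business_group': 'missing'})
              .groupby(['business_group', 'business_unit', 'cost_center',
                        'GL_code', 'profit_center'], as_index=False)
              .count())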

How to move every element in a column by n range in a dataframe using python?

I have a dataframe df that looks like below:
No A B value
1 23 36 1
2 45 23 1
3 34 12 2
4 22 76 NaN
...
I would like to shift each value in the "value" column 2 rows further down than the previous one (i.e. leave 2 empty rows between consecutive values). The first row's "value" should stay where it is.
I have already tried a normal shift, which moves everything down by 2:
df['value']=df['value'].shift(2)
I expect the result below:
No A B value
1 23 36 1
2 45 23 NaN
3 34 12 NaN
4 22 76 1
5 10 12 NaN
6 34 2 NaN
7 21 11 2
...
In your case, build a Series whose index spaces the values 3 rows apart; assigning it back aligns on the index and leaves the gaps as NaN:
import numpy as np

df['Newvalue'] = pd.Series(df.value.values, index=np.arange(len(df)) * 3)
df
Out[41]:
No A B value Newvalue
0 1 23 36 1.0 1.0
1 2 45 23 1.0 NaN
2 3 34 12 2.0 NaN
3 4 22 76 NaN 1.0
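One caveat: the assignment aligns on the existing index, so a value that would land past the last row (the 2 at position 6 here) is silently dropped. A sketch of one way to keep it, reindexing the frame first (assumes numpy imported as np):

# space the non-NaN values 3 rows apart and grow the frame to fit
vals = df['value'].dropna().to_numpy()
spaced = pd.Series(vals, index=np.arange(len(vals)) * 3)
df = df.reindex(range(spaced.index.max() + 1))  # extend to row 6
df['value'] = spaced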
