I have the following dataframe:
Book_No Replicate Sample Smell Taste Odour Volatility Notes
0 12, 43 1 control 0.3 10.0 71 1 NaN
1 12, 43 2 control 0.4 8.0 63 3 NaN
2 12, 43 3 control 0.1 3.0 22 2 NaN
3 19, 21 1 control 1.1 2.0 80 3 NaN
4 19, 21 2 control 0.4 8.0 0 4 NaN
5 19, 21 3 control 0.9 3.0 4 6 NaN
6 19, 21 4 control 2.1 6.0 50 4 NaN
7 11, 22 1 control 3.4 3.0 23 3 NaN
8 12, 43 1 Sample A 1.1 11.2 75 7 NaN
9 12, 43 2 Sample A 1.4 3.3 87 6 Temperature was too hot
10 12, 43 3 Sample A 0.7 7.4 91 5 NaN
11 19, 21 1 Sample B 2.1 3.2 99 7 NaN
12 19, 21 2 Sample B 2.2 11.3 76 8 NaN
13 19, 21 3 Sample B 1.9 9.3 89 9 sample spilt by user
14 19, 21 1 Sample C 3.2 4.0 112 10 NaN
15 19, 21 2 Sample C 2.1 5.0 96 15 NaN
16 19, 21 3 Sample C 2.7 7.0 105 13 Was too cold
17 11, 22 1 Sample C 2.4 3.0 121 19 NaN
I'd like to do two separate things. Firstly, I'd like to calculate the mean of the 'Smell', 'Taste', 'Odour' and 'Volatility' columns for the control rows, grouped by 'Book_No'. Then I'd like to subtract those mean values from the individual Sample A, Sample B and Sample C rows whose 'Book_No' matches that of the control. The resulting dataframe should look something like this:
Book_No Replicate Sample Smell Taste Odour Volatility Notes
0 12, 43 1 control 0.300000 10.00 71.0 1.00 NaN
1 12, 43 2 control 0.400000 8.00 63.0 3.00 NaN
2 12, 43 3 control 0.100000 3.00 22.0 2.00 NaN
3 19, 21 1 control 1.100000 2.00 80.0 3.00 NaN
4 19, 21 2 control 0.400000 8.00 0.0 4.00 NaN
5 19, 21 3 control 0.900000 3.00 4.0 6.00 NaN
6 19, 21 4 control 2.100000 6.00 50.0 4.00 NaN
7 11, 22 1 control 3.400000 3.00 23.0 3.00 NaN
8 12, 43 1 Sample A 0.833333 4.20 23.0 5.00 NaN
9 12, 43 2 Sample A 1.133333 -3.70 35.0 4.00 Temperature was too hot
10 12, 43 3 Sample A 0.433333 0.40 39.0 3.00 NaN
11 19, 21 1 Sample B 0.975000 -1.55 65.5 2.75 NaN
12 19, 21 2 Sample B 1.075000 6.55 42.5 3.75 NaN
13 19, 21 3 Sample B 0.775000 4.55 55.5 4.75 sample spilt by user
14 19, 21 1 Sample C -0.200000 1.00 89.0 7.00 NaN
15 19, 21 2 Sample C -1.300000 2.00 73.0 12.00 NaN
16 19, 21 3 Sample C -0.700000 4.00 82.0 10.00 Was too cold
17 11, 22 1 Sample C -1.000000 0.00 98.0 16.00 NaN
I've tried the following code, but neither attempt gives me what I need; besides, I'd need to copy and paste the code and change the column name for every column I'd like to apply it to:
df['Smell'] = df['Smell'] - df.groupby(['Book_No', 'Sample'])['Smell'].transform('mean')
and I've tried to apply a mask:
mask = df['Book_No'].unique()
df.loc[~mask, 'Smell'] = (df['Smell'] - df['Smell'].where(mask).groupby([df['Book_No'],df['Sample']]).transform('mean'))
Then, separately, I'd like to subtract the control values from the sample values, when the Book_No and replicate values match. The resulting dataframe should look something like this:
Book_No Replicate Sample Smell Taste Odour Volatility Unnamed: 7
0 12, 43 1 control 0.3 10.0 71 1 NaN
1 12, 43 2 control 0.4 8.0 63 3 NaN
2 12, 43 3 control 0.1 3.0 22 2 NaN
3 19, 21 1 control 1.1 2.0 80 3 NaN
4 19, 21 2 control 0.4 8.0 0 4 NaN
5 19, 21 3 control 0.9 3.0 4 6 NaN
6 19, 21 4 control 2.1 6.0 50 4 NaN
7 11, 22 1 control 3.4 3.0 23 3 NaN
8 12, 43 1 Sample A 0.8 1.2 4 6 NaN
9 12, 43 2 Sample A 1.0 -4.7 24 3 Temperature was too hot
10 12, 43 3 Sample A 0.6 4.4 69 3 NaN
11 19, 21 1 Sample B 1.0 1.2 19 4 NaN
12 19, 21 2 Sample B 1.8 3.3 76 4 NaN
13 19, 21 3 Sample B 1.0 6.3 85 3 sample spilt by user
14 19, 21 1 Sample C 2.1 2.0 32 7 NaN
15 19, 21 2 Sample C 1.7 -3.0 96 11 NaN
16 19, 21 3 Sample C 1.8 4.0 101 7 Was too cold
17 11, 22 1 Sample C -1.0 0.0 98 16 NaN
Could anyone kindly offer their assistance to help with these two scenarios?
Thank you in advance for any help
Splitting into different columns and reordering:
# This may be useful to you in the future, plus, ints are better than strings:
df[['Book', 'No']] = df.Book_No.str.split(', ', expand=True).astype(int)
cols = df.columns.tolist()
df = df[cols[-2:] + cols[1:-2]]
You should only focus on one problem at a time in your questions, so I'll help with the first part.
# Set some vars so we don't have to type these over and over:
cols = ['Smell', 'Volatility', 'Taste', 'Odour']
mask = df.Sample.eq('control')
group = ['Book', 'No']
# Find your control values:
ctrl_means = df[mask].groupby(group)[cols].mean()
# Apply your desired change:
df.loc[~mask, cols] = (df[~mask].groupby(group)[cols]
.apply(lambda x: x.sub(ctrl_means.loc[x.name])))
print(df)
Output:
Book No Replicate Sample Smell Taste Odour Volatility Notes
0 12 43 1 control 0.300000 10.00 71.0 1.00 NaN
1 12 43 2 control 0.400000 8.00 63.0 3.00 NaN
2 12 43 3 control 0.100000 3.00 22.0 2.00 NaN
3 19 21 1 control 1.100000 2.00 80.0 3.00 NaN
4 19 21 2 control 0.400000 8.00 0.0 4.00 NaN
5 19 21 3 control 0.900000 3.00 4.0 6.00 NaN
6 19 21 4 control 2.100000 6.00 50.0 4.00 NaN
7 11 22 1 control 3.400000 3.00 23.0 3.00 NaN
8 12 43 1 Sample A 0.833333 4.20 23.0 5.00 NaN
9 12 43 2 Sample A 1.133333 -3.70 35.0 4.00 Temperature was too hot
10 12 43 3 Sample A 0.433333 0.40 39.0 3.00 NaN
11 19 21 1 Sample B 0.975000 -1.55 65.5 2.75 NaN
12 19 21 2 Sample B 1.075000 6.55 42.5 3.75 NaN
13 19 21 3 Sample B 0.775000 4.55 55.5 4.75 sample spilt by user
14 19 21 1 Sample C 2.075000 -0.75 78.5 5.75 NaN
15 19 21 2 Sample C 0.975000 0.25 62.5 10.75 NaN
16 19 21 3 Sample C 1.575000 2.25 71.5 8.75 Was too cold
17 11 22 1 Sample C -1.000000 0.00 98.0 16.00 NaN
First we get the mean of the control samples:
cols = ['Smell', 'Taste', 'Odour', 'Volatility']
control_means = df[df.Sample.eq('control')].groupby(['Book_No'])[cols].mean()
Then subtract it from the remaining samples to get the fixed sample data. To make use of pandas' automatic index alignment, we temporarily set the index:
new_idx = ['Book_No', df.index]
fixed_samples = (df.set_index(new_idx).loc[df.set_index(new_idx).Sample.ne('control'), cols]
- control_means).droplevel(0)
Finally simply assign them back into the dataframe:
df.loc[df.Sample.ne('control'), cols] = fixed_samples
Result:
Book_No Replicate Sample Smell Taste Odour Volatility Notes
0 12, 43 1 control 0.300000 10.00 71.0 1.00 NaN
1 12, 43 2 control 0.400000 8.00 63.0 3.00 NaN
2 12, 43 3 control 0.100000 3.00 22.0 2.00 NaN
3 19, 21 1 control 1.100000 2.00 80.0 3.00 NaN
4 19, 21 2 control 0.400000 8.00 0.0 4.00 NaN
5 19, 21 3 control 0.900000 3.00 4.0 6.00 NaN
6 19, 21 4 control 2.100000 6.00 50.0 4.00 NaN
7 11, 22 1 control 3.400000 3.00 23.0 3.00 NaN
8 12, 43 1 Sample A 0.833333 4.20 23.0 5.00 NaN
9 12, 43 2 Sample A 1.133333 -3.70 35.0 4.00 Temperature was too hot
10 12, 43 3 Sample A 0.433333 0.40 39.0 3.00 NaN
11 19, 21 1 Sample B 0.975000 -1.55 65.5 2.75 NaN
12 19, 21 2 Sample B 1.075000 6.55 42.5 3.75 NaN
13 19, 21 3 Sample B 0.775000 4.55 55.5 4.75 sample spilt by user
14 19, 21 1 Sample C 2.075000 -0.75 78.5 5.75 NaN
15 19, 21 2 Sample C 0.975000 0.25 62.5 10.75 NaN
16 19, 21 3 Sample C 1.575000 2.25 71.5 8.75 Was too cold
17 11, 22 1 Sample C -1.000000 0.00 98.0 16.00 NaN
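As a quick sanity check, for Book_No '12, 43' the control Smell mean is (0.3 + 0.4 + 0.1) / 3 ≈ 0.2667, so the first Sample A row becomes 1.1 - 0.2667 ≈ 0.8333, matching the output above.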
If you want, you can squeeze it into a one-liner, but it is hardly comprehensible:
cols = ['Smell', 'Taste', 'Odour', 'Volatility']
new_idx = ['Book_No', df.index]
df.loc[df.Sample.ne('control'), cols] = (
df.set_index(new_idx).loc[df.set_index(new_idx).Sample.ne('control'), cols]
- df[df.Sample.eq('control')].groupby(['Book_No'])[cols].mean()
).droplevel(0)
I am trying to merge 2 dataframes.
df1
Date A B C
01.01.2021 1 8 14
02.01.2021 2 9 15
03.01.2021 3 10 16
04.01.2021 4 11 17
05.01.2021 5 12 18
06.01.2021 6 13 19
07.01.2021 7 14 20
df2
Date B
07.01.2021 14
08.01.2021 27
09.01.2021 28
10.01.2021 29
11.01.2021 30
12.01.2021 31
13.01.2021 32
Both dataframes share one identical row (although there could be several overlapping rows).
So I want to get df3 that looks as follows:
df3
Date A B C
01.01.2021 1 8 14
02.01.2021 2 9 15
03.01.2021 3 10 16
04.01.2021 4 11 17
05.01.2021 5 12 18
06.01.2021 6 13 19
07.01.2021 7 14 20
08.01.2021 NaN 27 NaN
09.01.2021 NaN 28 NaN
10.01.2021 NaN 29 NaN
11.01.2021 NaN 30 NaN
12.01.2021 NaN 31 NaN
13.01.2021 NaN 32 NaN
I've tried
df3 = df1.merge(df2, on='Date', how='outer') but it gives duplicated B columns (B_x and B_y). Could you give some idea how to get df3?
Thanks a lot.
Merge with how='outer' without specifying on (by default, on is the intersection of the columns of the two DataFrames, in this case ['Date', 'B']):
df3 = df1.merge(df2, how='outer')
df3:
Date A B C
0 01.01.2021 1.0 8 14.0
1 02.01.2021 2.0 9 15.0
2 03.01.2021 3.0 10 16.0
3 04.01.2021 4.0 11 17.0
4 05.01.2021 5.0 12 18.0
5 06.01.2021 6.0 13 19.0
6 07.01.2021 7.0 14 20.0
7 08.01.2021 NaN 27 NaN
8 09.01.2021 NaN 28 NaN
9 10.01.2021 NaN 29 NaN
10 11.01.2021 NaN 30 NaN
11 12.01.2021 NaN 31 NaN
12 13.01.2021 NaN 32 NaN
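If you prefer to spell out the join keys explicitly, this call should be equivalent for this data (a small sketch that just names the columns the default would pick):
df3 = df1.merge(df2, on=['Date', 'B'], how='outer')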
Assuming you always want to keep the first full version, you can concat df2 onto the end of df1 and drop duplicates on the Date column.
pd.concat([df1,df2]).drop_duplicates(subset='Date')
Output
Date A B C
0 01.01.2021 1.0 8 14.0
1 02.01.2021 2.0 9 15.0
2 03.01.2021 3.0 10 16.0
3 04.01.2021 4.0 11 17.0
4 05.01.2021 5.0 12 18.0
5 06.01.2021 6.0 13 19.0
6 07.01.2021 7.0 14 20.0
1 08.01.2021 NaN 27 NaN
2 09.01.2021 NaN 28 NaN
3 10.01.2021 NaN 29 NaN
4 11.01.2021 NaN 30 NaN
5 12.01.2021 NaN 31 NaN
6 13.01.2021 NaN 32 NaN
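Note that the result keeps the original index labels from each frame (hence the repeated 1–6 above). The rows are already in chronological order, so if you want a clean RangeIndex you can simply add reset_index, e.g.:
df3 = pd.concat([df1, df2]).drop_duplicates(subset='Date').reset_index(drop=True)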
I have the following dataframe grouped by datafile, and I want to fillna(method='bfill') only for those 'groups' that contain more than half of the data.
df.groupby('datafile').count()
datafile column1 column2 column3 column4
datafile1 5 5 3 4
datafile2 5 5 4 5
datafile3 5 5 5 5
datafile4 5 5 0 0
datafile5 5 5 1 1
As you can see in the df above, I'd like to fill those groups that contain most of the information, but not those that have little or none. So I was thinking of a condition along the lines of: fill the groups that have more than half of the counts, and leave the rest (those with less than half) unfilled.
I'm struggling with how to set up the condition, since it involves working with the result of a groupby together with the original df.
Help is appreciated.
example df:
index datafile column1 column2 column3 column4
0 datafile1 5 5 NaN 20
1 datafile1 6 6 NaN 21
2 datafile1 7 7 9 NaN
3 datafile1 8 8 10 23
4 datafile1 9 9 11 24
5 datafile2 3 3 2 7
6 datafile2 4 4 3 8
7 datafile2 5 5 4 9
8 datafile2 6 6 NaN 10
9 datafile2 7 7 6 11
10 datafile3 10 10 24 4
11 datafile3 11 11 25 5
12 datafile3 12 12 26 6
13 datafile3 13 13 27 7
14 datafile3 14 14 28 8
15 datafile4 4 4 NaN NaN
16 datafile4 5 5 NaN NaN
17 datafile4 6 6 NaN NaN
18 datafile4 7 7 NaN NaN
19 datafile4 8 8 NaN NaN
19 datafile4 9 9 NaN NaN
20 datafile5 7 7 1 3
21 datafile5 8 8 NaN NaN
22 datafile5 9 9 NaN NaN
23 datafile5 10 10 NaN NaN
24 datafile5 11 1 NaN NaN
expected output df:
index datafile column1 column2 column3 column4
0 datafile1 5 5 9 20
1 datafile1 6 6 9 21
2 datafile1 7 7 9 23
3 datafile1 8 8 10 23
4 datafile1 9 9 11 24
5 datafile2 3 3 2 7
6 datafile2 4 4 3 8
7 datafile2 5 5 4 9
8 datafile2 6 6 6 10
9 datafile2 7 7 6 11
10 datafile3 10 10 24 4
11 datafile3 11 11 25 5
12 datafile3 12 12 26 6
13 datafile3 13 13 27 7
14 datafile3 14 14 28 8
15 datafile4 4 4 NaN NaN
16 datafile4 5 5 NaN NaN
17 datafile4 6 6 NaN NaN
18 datafile4 7 7 NaN NaN
19 datafile4 8 8 NaN NaN
19 datafile4 9 9 NaN NaN
20 datafile5 7 7 1 3
21 datafile5 8 8 NaN NaN
22 datafile5 9 9 NaN NaN
23 datafile5 10 10 NaN NaN
24 datafile5 11 1 NaN NaN
If the proportion of non-null values in a column within a group is greater than or equal to 0.5, then that column is filled with the bfill method:
rate = 0.5
not_na = df.notna()
g = not_na.groupby(df['datafile'])
df_fill = (
    df.bfill()
      .where(
          g.transform('sum')
           .div(g['datafile'].transform('size'), axis=0)
           .ge(rate) |
          not_na
      )
)
print(df_fill)
index datafile column1 column2 column3 column4
0 0 datafile1 5 5 9.0 20.0
1 1 datafile1 6 6 9.0 21.0
2 2 datafile1 7 7 9.0 23.0
3 3 datafile1 8 8 10.0 23.0
4 4 datafile1 9 9 11.0 24.0
5 5 datafile2 3 3 2.0 7.0
6 6 datafile2 4 4 3.0 8.0
7 7 datafile2 5 5 4.0 9.0
8 8 datafile2 6 6 6.0 10.0
9 9 datafile2 7 7 6.0 11.0
10 10 datafile3 10 10 24.0 4.0
11 11 datafile3 11 11 25.0 5.0
12 12 datafile3 12 12 26.0 6.0
13 13 datafile3 13 13 27.0 7.0
14 14 datafile3 14 14 28.0 8.0
15 15 datafile4 4 4 NaN NaN
16 16 datafile4 5 5 NaN NaN
17 17 datafile4 6 6 NaN NaN
18 18 datafile4 7 7 NaN NaN
19 19 datafile4 8 8 NaN NaN
20 19 datafile4 9 9 NaN NaN
21 20 datafile5 7 7 1.0 3.0
22 21 datafile5 8 8 NaN NaN
23 22 datafile5 9 9 NaN NaN
24 23 datafile5 10 10 NaN NaN
25 24 datafile5 11 1 NaN NaN
Also we can use:
m = (not_na.groupby(df['datafile'], sort=False)
.sum()
.div(df['datafile'].value_counts(), axis=0)
.ge(rate)
.reindex(df['datafile']).reset_index(drop=True))
df.bfill().where(m | not_na)
Both methods give the same result for the sample dataframe:
%%timeit
rate = 0.5
not_na = df.notna()
m = (not_na.groupby(df['datafile'], sort=False)
.sum()
.div(df['datafile'].value_counts(),axis=0)
.ge(rate)
.reindex(df['datafile']).reset_index(drop=True))
df.bfill().where(m | not_na)
11.1 ms ± 53.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
rate = 0.5
not_na = df.notna()
g = not_na.groupby(df['datafile'])
df_fill = (df.bfill()
.where(g.transform('sum').div(g['datafile'].transform('size'),
axis=0).ge(rate) |
not_na)
)
12.9 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Use pandas.groupby.filter
def most_not_null(x):
    return x.isnull().sum().sum() < (x.notnull().sum().sum() // 2)
filtered_groups = df.groupby('datafile').filter(most_not_null)
df.loc[filtered_groups.index] = filtered_groups.bfill()
Output
>>> df
index datafile column1 column2 column3 column4
0 0 datafile1 5 5 9.0 20.0
1 1 datafile1 6 6 9.0 21.0
2 2 datafile1 7 7 9.0 23.0
3 3 datafile1 8 8 10.0 23.0
4 4 datafile1 9 9 11.0 24.0
5 5 datafile2 3 3 2.0 7.0
6 6 datafile2 4 4 3.0 8.0
7 7 datafile2 5 5 4.0 9.0
8 8 datafile2 6 6 6.0 10.0
9 9 datafile2 7 7 6.0 11.0
10 10 datafile3 10 10 24.0 4.0
11 11 datafile3 11 11 25.0 5.0
12 12 datafile3 12 12 26.0 6.0
13 13 datafile3 13 13 27.0 7.0
14 14 datafile3 14 14 28.0 8.0
15 15 datafile4 4 4 NaN NaN
16 16 datafile4 5 5 NaN NaN
17 17 datafile4 6 6 NaN NaN
18 18 datafile4 7 7 NaN NaN
19 19 datafile4 8 8 NaN NaN
20 19 datafile4 9 9 NaN NaN
21 20 datafile5 7 7 1.0 3.0
22 21 datafile5 8 8 NaN NaN
23 22 datafile5 9 9 NaN NaN
24 23 datafile5 10 10 NaN NaN
25 24 datafile5 11 1 NaN NaN
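One caveat: bfill here runs over the filtered rows as a single frame, so a trailing NaN at the end of one kept group could in principle be filled from the first row of the next kept group. That doesn't happen with this sample data, but if you want to be strict about group boundaries, a hedged variant is to fill within each group:
filled = filtered_groups.groupby('datafile').bfill()
df.loc[filtered_groups.index, filled.columns] = filled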
How do I convert every numeric element of my pandas dataframe to an integer? I have not seen any documentation online for how to do so, which is surprising given Pandas is so popular...
If your data frame contains only numeric columns, simply use astype directly.
df.astype(int)
If not, use select_dtypes first to select numeric columns.
df.select_dtypes(np.number).astype(int)
df = pd.DataFrame({'col1': [1.,2.,3.,4.], 'col2': [10.,20.,30.,40.]})
col1 col2
0 1.0 10.0
1 2.0 20.0
2 3.0 30.0
3 4.0 40.0
>>> df.astype(int)
col1 col2
0 1 10
1 2 20
2 3 30
3 4 40
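select_dtypes returns a new frame containing only the numeric columns, so if the original frame also has non-numeric columns and you want to convert in place, one option (a sketch, assuming you want every numeric column cast) is to assign the converted columns back by name:
num_cols = df.select_dtypes(np.number).columns
df[num_cols] = df[num_cols].astype(int)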
You can use apply for this purpose:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':np.arange(1.0, 20.0), 'B':np.arange(101.0, 120.0)})
print(df)
A B
0 1.0 101.0
1 2.0 102.0
2 3.0 103.0
3 4.0 104.0
4 5.0 105.0
5 6.0 106.0
6 7.0 107.0
7 8.0 108.0
8 9.0 109.0
9 10.0 110.0
10 11.0 111.0
11 12.0 112.0
12 13.0 113.0
13 14.0 114.0
14 15.0 115.0
15 16.0 116.0
16 17.0 117.0
17 18.0 118.0
18 19.0 119.0
df2 = df.apply(lambda a: [int(b) for b in a])
print(df2)
A B
0 1 101
1 2 102
2 3 103
3 4 104
4 5 105
5 6 106
6 7 107
7 8 108
8 9 109
9 10 110
10 11 111
11 12 112
12 13 113
13 14 114
14 15 115
15 16 116
16 17 117
17 18 118
18 19 119
A better approach is to change the type at the level of the individual series (columns):
for col in df.columns:
    if df[col].dtype == np.float64:
        df[col] = df[col].astype('int')
print(df)
A B
0 1 101
1 2 102
2 3 103
3 4 104
4 5 105
5 6 106
6 7 107
7 8 108
8 9 109
9 10 110
10 11 111
11 12 112
12 13 113
13 14 114
14 15 115
15 16 116
16 17 117
17 18 118
18 19 119
Try this:
column_types = dict(df.dtypes)
for column in df.columns:
    if column_types[column] == 'float64':
        df[column] = df[column].astype('int')
        # equivalent alternative: df[column] = df[column].apply(lambda x: int(x))
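One thing to keep in mind with all of these approaches: casting a float column to int truncates towards zero rather than rounding, so 3.9 becomes 3. If you want to round to the nearest integer first, a small tweak like this works:
df[column] = df[column].round().astype('int')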
I'm fairly new to Pandas, so please forgive me if the answer to my question is rather obvious. I've got a dataset like this:
Data Correction
0 100 NaN
1 104 NaN
2 108 NaN
3 112 NaN
4 116 NaN
5 120 0.5
6 124 NaN
7 128 NaN
8 132 NaN
9 136 0.4
10 140 NaN
11 144 NaN
12 148 NaN
13 152 0.3
14 156 NaN
15 160 NaN
What I want to do is calculate a correction factor for the data, which accumulates upwards.
By that I mean that rows at index 13 and below should have the factor 0.3 applied, rows at index 9 and below 0.3*0.4, and rows at index 5 and below 0.3*0.4*0.5.
So the final dataframe, with the Factor column added, should look like this:
Data Correction Factor
0 100 NaN 0.06
1 104 NaN 0.06
2 108 NaN 0.06
3 112 NaN 0.06
4 116 NaN 0.06
5 120 0.5 0.06
6 124 NaN 0.12
7 128 NaN 0.12
8 132 NaN 0.12
9 136 0.4 0.12
10 140 NaN 0.3
11 144 NaN 0.3
12 148 NaN 0.3
13 152 0.3 0.3
14 156 NaN 1
15 160 NaN 1
How can I do this?
I think you are looking for cumprod() after reversing the Correction column:
df=df.assign(Factor=df.Correction[::-1].cumprod().ffill().fillna(1))
Data Correction Factor
0 100 NaN 0.06
1 104 NaN 0.06
2 108 NaN 0.06
3 112 NaN 0.06
4 116 NaN 0.06
5 120 0.5 0.06
6 124 NaN 0.12
7 128 NaN 0.12
8 132 NaN 0.12
9 136 0.4 0.12
10 140 NaN 0.30
11 144 NaN 0.30
12 148 NaN 0.30
13 152 0.3 0.30
14 156 NaN 1.00
15 160 NaN 1.00
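Broken into steps for readability (the same operations as the one-liner above, applied to the same df):
rev = df['Correction'][::-1]       # walk the column from the bottom row up
factor = rev.cumprod()             # running product of the correction values (NaNs are skipped)
factor = factor.ffill()            # carry each product up to the rows above it
df['Factor'] = factor.fillna(1)    # rows below the last correction value get factor 1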
I can't think of a pandas function that does this directly; however, you can use a for loop to multiply an array by the correction values and then assign it as a column.
import numpy as np
import pandas as pd
lst = [np.nan,np.nan,np.nan,np.nan,np.nan,0.5,np.nan,np.nan,np.nan,np.nan,0.4,np.nan,np.nan,np.nan,0.3,np.nan,np.nan]
lst1 = [i + 100 for i in range(len(lst))]
newcol= [1.0 for i in range(len(lst))]
newcol = np.asarray(newcol)
df = pd.DataFrame({'Data' : lst1,'Correction' : lst})
for i in range(len(df['Correction'])):
    if ~np.isnan(df.Correction[i]):
        print(df.Correction[i])
        newcol[0:i+1] = newcol[0:i+1] * df.Correction[i]
df['Factor'] = newcol
print(df)
This code prints
Data Correction Factor
0 100 NaN 0.06
1 101 NaN 0.06
2 102 NaN 0.06
3 103 NaN 0.06
4 104 NaN 0.06
5 105 0.5 0.06
6 106 NaN 0.12
7 107 NaN 0.12
8 108 NaN 0.12
9 109 NaN 0.12
10 110 0.4 0.12
11 111 NaN 0.30
12 112 NaN 0.30
13 113 NaN 0.30
14 114 0.3 0.30
15 115 NaN 1.00
16 116 NaN 1.00