My dataframe has these 2 columns: "reach" and "height". The "reach" column has a lot of missing values, but the "height" column has all the values needed. What I see is that reach is often a function of height. Therefore, for the rows with NaN, I want to look at the height, find another row with the same "height" that has "reach" available, and copy that value to the one with the missing value.
name                SApM  SLpM  height  reach  record          stance       strAcc  strDef  subAvg  tdAcc  tdAvg  tdDef  weight  born_year  win  lose  draw  nc
Justin Frazier      6.11  1.11  6' 0"   75     10-3-0          Southpaw     0.66    0.04    0       0      0      0      265     1989       10   3     0
Gleidson Cutis      8.28  2.99  5' 9"   nan    7-4-0           Orthodox     0.52    0.59    0       0      0      0      155     1989       7    4     0
Xavier Foupa-Pokam  2.5   1.47  6' 1"   nan    32-22-0         Open Stance  0.43    0.49    0       0      0      0.16   185     1982       32   22    0
Mirko Filipovic     1.89  2.11  6' 2"   73     35-11-2-(1 NC)  Southpaw     0.5     0.63    0.3     0.4    0.19   0.78   230     1974       35   11    2     1
Jordan Johnson      2.64  3.45  6' 2"   79     10-0-0          Orthodox     0.47    0.53    1.2     0.42   3.25   1      205     1988       10   0     0
Martin Kampmann     3.28  3.22  6' 0"   72     20-7-0          Orthodox     0.42    0.62    2       0.41   1.86   0.78   170     1982       20   7     0
Darren Elkins       3.05  3.46  5' 10"  71     27-9-0          Orthodox     0.38    0.52    1.1     0.33   2.67   0.56   145     1984       27   9     0
Austen Lane         6.32  5.26  6' 6"   nan    2-1-0           Orthodox     0.35    0.6     0       0      0      0      245     1987       2    1     0
Rachael Ostovich    3.97  2.54  5' 3"   62     4-6-0           Orthodox     0.43    0.57    0.8     0.83   2.03   0.66   125     1991       4    6     0
Travis Lutter       2.42  0.41  5' 11"  75     10-6-0          Orthodox     0.32    0.42    0.7     0.24   1.95   0.3    185     1973       10   6     0
Tom Murphy          0.17  2.5   6' 2"   nan    8-0-0           Southpaw     0.71    0.84    2.5     0.85   7.51   0      227     1974       8    0     0
Darrell Montague    5.38  1.92  5' 6"   67     13-5-0          Southpaw     0.25    0.52    1.4     0.25   0.72   0.33   125     1987       13   5     0
Lauren Murphy       4.25  3.95  5' 5"   67     15-4-0          Orthodox     0.4     0.61    0.1     0.34   1.16   0.7    125     1983       15   4     0
Bill Mahood         3.59  1.54  6' 3"   nan    20-7-1-(1 NC)   Orthodox     0.85    0.17    3.9     0      0      0      200     1967       20   7     1     1
Nate Marquardt      2.32  2.71  6' 0"   74     35-19-2         Orthodox     0.49    0.55    0.8     0.51   1.87   0.7    185     1979       35   19    2
Mike Polchlopek     1.33  2     6' 4"   nan    1-1-0           Orthodox     0.38    0.57    0       0      0      0      285     1965       1    1     0
Harvey Park         7.21  3.77  6' 0"   70     12-3-0          Orthodox     0.5     0.33    0       0      0      0      155     1986       12   3     0
Junyong Park        3.17  4.37  5' 10"  73     13-4-0          Orthodox     0.47    0.58    0.6     0.57   3.02   0.46   185     1991       13   4     0
Ricco Rodriguez     1.15  1.85  6' 4"   nan    53-25-0-(1 NC)  Orthodox     0.51    0.61    1       0.39   2.3    0.4    265     1977       53   25    0     1
Aaron Riley         3.78  3.45  5' 8"   69     30-14-1         Southpaw     0.34    0.61    0.1     0.34   1.18   0.6    155     1980       30   14    1
You can create a height reference Series with .groupby() and fetch the first non-NaN reach for each height (if any) with .first(), as follows:
height_ref = df.groupby('height')['reach'].first()
height
5' 10" 71.0
5' 11" 75.0
5' 3" 62.0
5' 5" 67.0
5' 6" 67.0
5' 8" 69.0
5' 9" NaN
6' 0" 75.0
6' 1" NaN
6' 2" 73.0
6' 3" NaN
6' 4" NaN
6' 6" NaN
Name: reach, dtype: float64
Then, you can fill in the NaN values of the reach column by looking up the height reference with .map() and passing the result to .fillna(), as follows:
df['reach2'] = df['reach'].fillna(df['height'].map(height_ref))
For demonstration purposes, I write the result to a new column reach2. You can overwrite the original reach column as appropriate.
Result:
print(df[['height', 'reach', 'reach2']])
height reach reach2
0 6' 0" 75.0 75.0
1 5' 9" NaN NaN
2 6' 1" NaN NaN
3 6' 2" 73.0 73.0
4 6' 2" 79.0 79.0
5 6' 0" 72.0 72.0
6 5' 10" 71.0 71.0
7 6' 6" NaN NaN
8 5' 3" 62.0 62.0
9 5' 11" 75.0 75.0
10 6' 2" NaN 73.0 <======= filled up with referenced height from other row
11 5' 6" 67.0 67.0
12 5' 5" 67.0 67.0
13 6' 3" NaN NaN
14 6' 0" 74.0 74.0
15 6' 4" NaN NaN
16 6' 0" 70.0 70.0
17 5' 10" 73.0 73.0
18 6' 4" NaN NaN
19 5' 8" 69.0 69.0
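If you prefer to overwrite the original reach column in place rather than adding reach2, the same lookup can simply be assigned back:
df['reach'] = df['reach'].fillna(df['height'].map(height_ref))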
I don't think a method exists to do that in a single step.
If I were in your shoes I would:
Create a support dataframe made up of height|reach, fully populated, in which I would store my best-guess values
Join the support dataframe with the existing one, using height as the key
Coalesce the values where NaN appears: df.reach = df.reach.fillna(df.from_support_dataset_height), as sketched below
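A minimal sketch of that approach, assuming the support dataframe is built from the rows where reach is already known (support and reach_ref are illustrative names, not from the question):
# Support dataframe: one best-guess reach per height, taken from rows where reach is present
support = (df.dropna(subset=['reach'])
             .groupby('height', as_index=False)['reach']
             .first()
             .rename(columns={'reach': 'reach_ref'}))

# Join on height, then coalesce: keep the original reach where it exists
df = df.merge(support, on='height', how='left')
df['reach'] = df['reach'].fillna(df['reach_ref'])
df = df.drop(columns='reach_ref')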
Related
I have a dataframe such as the following:
ID 2016 2017 2018 2019 2020
0 1 1.64 NaN NaN NaN NaN
1 2 NaN NaN NaN 0.78 NaN
2 3 1.11 0.97 1.73 1.23 0.87
3 4 0.84 0.74 1.64 1.47 0.41
4 5 0.75 1.05 NaN NaN NaN
I want to get the values from the last non-null column such that:
ID 2016 2017 2018 2019 2020 LastValue
0 1 1.64 NaN NaN NaN NaN 1.64
1 2 NaN NaN NaN 0.78 NaN 0.78
2 3 1.11 0.97 1.73 1.23 0.87 0.87
3 4 0.84 0.74 1.64 1.47 0.41 0.41
4 5 0.75 1.05 NaN NaN NaN 1.05
I tried to loop through the year columns in reverse as follows but couldn't fully achieve what I want.
for i in reversed(df.columns[1:]):
    if df[i] is not None:
        val = df[i]
Could you help about this issue? Thanks.
The idea is to select all columns except the first with DataFrame.iloc, then forward fill missing values along each row, and finally select the last column:
df['LastValue'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
print (df)
ID 2016 2017 2018 2019 2020 LastValue
0 1 1.64 NaN NaN NaN NaN 1.64
1 2 NaN NaN NaN 0.78 NaN 0.78
2 3 1.11 0.97 1.73 1.23 0.87 0.87
3 4 0.84 0.74 1.64 1.47 0.41 0.41
4 5 0.75 1.05 NaN NaN NaN 1.05
Detail:
print (df.iloc[:, 1:].ffill(axis=1))
2016 2017 2018 2019 2020
0 1.64 1.64 1.64 1.64 1.64
1 NaN NaN NaN 0.78 0.78
2 1.11 0.97 1.73 1.23 0.87
3 0.84 0.74 1.64 1.47 0.41
4 0.75 1.05 1.05 1.05 1.05
I have a pandas dataframe looks like this:
Year Ship Age Surviving UEC
2018 12.88 13 0.00 17.2
2019 12.57 12 0.02 17.2
2020 12.24 11 0.06 17.2
2021 11.95 10 0.18 17.2
2022 11.77 9 0.37 17.2
2023 11.70 8 0.60 17.2
2024 11.75 7 0.81 17.2
2025 11.93 6 0.94 17.2
2026 12.12 5 0.99 0.3
2027 12.34 4 1.00 0.3
2028 12.56 3 NaN 0.3
2029 12.76 2 NaN 0.3
2030 12.93 1 NaN 0.3
I want to multiply the Ship, Surviving, and UEC columns, shifting all the columns down by 1 row each time, so the output df2 should look like this:
df2
Stock_uec
0 df1.iloc[:10,1]*df1.iloc[:10,3]*df1.iloc[:10,4]
1 df1.iloc[1:11,1]*df1.iloc[1:11,3]*df1.iloc[1:11,4]
3 df1.iloc[2:12,1]*df1.iloc[2:12,3]*df1.iloc[2:12,4]
Below is my code, but I didn't get the results I expected.
for i, row in df1.iterrows():
    out = (df1.iloc[i:i+10, 1].shift(1, axis=0)
           * df1.iloc[i:i+10, 3].shift(1, axis=0)
           * df1.iloc[i:i+10, 4].shift(1, axis=0))
    print(out)
Thank you for your help.
IIUC, I think you want:
df.loc[:, 'shipping_utc'] = 0
for i in range(df.shape[0]):
    df.loc[i:, 'shipping_utc'] = df.iloc[i:][['Ship', 'Surviving', 'UEC']].prod(axis=1) + df.loc[i:, 'shipping_utc']
output
df
Out[25]:
Year Ship Age Surviving UEC shipping_utc
0 2018 12.88 13 0.00 17.2 0.00000
1 2019 12.57 12 0.02 17.2 8.64816
2 2020 12.24 11 0.06 17.2 37.89504
3 2021 11.95 10 0.18 17.2 147.98880
4 2022 11.77 9 0.37 17.2 374.52140
5 2023 11.70 8 0.60 17.2 724.46400
6 2024 11.75 7 0.81 17.2 1145.90700
7 2025 11.93 6 0.94 17.2 1543.07392
8 2026 12.12 5 0.99 0.3 32.39676
9 2027 12.34 4 1.00 0.3 37.02000
10 2028 12.56 3 NaN 0.3 41.44800
11 2029 12.76 2 NaN 0.3 45.93600
12 2030 12.93 1 NaN 0.3 50.42700
I'm fairly new to Pandas so please forgive me if the answer to my question is rather obvious. I've got a dataset like this
Data Correction
0 100 Nan
1 104 Nan
2 108 Nan
3 112 Nan
4 116 Nan
5 120 0.5
6 124 Nan
7 128 Nan
8 132 Nan
9 136 0.4
10 140 Nan
11 144 Nan
12 148 Nan
13 152 0.3
14 156 Nan
15 160 Nan
What I want is to calculate a correction factor for the data which accumulates upwards.
By that I mean that elements from 13 and below should have the factor 0.3 applied, with 9 and below applying 0.3*0.4 = 0.12 and 5 and below 0.3*0.4*0.5 = 0.06.
So the final correction column should look like this
Data Correction Factor
0 100 Nan 0.06
1 104 Nan 0.06
2 108 Nan 0.06
3 112 Nan 0.06
4 116 Nan 0.06
5 120 0.5 0.06
6 124 Nan 0.12
7 128 Nan 0.12
8 132 Nan 0.12
9 136 0.4 0.12
10 140 Nan 0.3
11 144 Nan 0.3
12 148 Nan 0.3
13 152 0.3 0.3
14 156 Nan 1
15 160 Nan 1
How can I do this?
I think you are looking for cumprod() after reversing the Correction column:
df=df.assign(Factor=df.Correction[::-1].cumprod().ffill().fillna(1))
Data Correction Factor
0 100 NaN 0.06
1 104 NaN 0.06
2 108 NaN 0.06
3 112 NaN 0.06
4 116 NaN 0.06
5 120 0.5 0.06
6 124 NaN 0.12
7 128 NaN 0.12
8 132 NaN 0.12
9 136 0.4 0.12
10 140 NaN 0.30
11 144 NaN 0.30
12 148 NaN 0.30
13 152 0.3 0.30
14 156 NaN 1.00
15 160 NaN 1.00
I can't think of a single pandas function that does this; however, you can use a for loop to multiply an array by the values and then add it as a column.
import numpy as np
import pandas as pd
lst = [np.nan,np.nan,np.nan,np.nan,np.nan,0.5,np.nan,np.nan,np.nan,np.nan,0.4,np.nan,np.nan,np.nan,0.3,np.nan,np.nan]
lst1 = [i + 100 for i in range(len(lst))]
newcol= [1.0 for i in range(len(lst))]
newcol = np.asarray(newcol)
df = pd.DataFrame({'Data' : lst1,'Correction' : lst})
for i in range(len(df['Correction'])):
    if ~np.isnan(df.Correction[i]):
        print(df.Correction[i])
        newcol[0:i+1] = newcol[0:i+1] * df.Correction[i]
df['Factor'] = newcol
print(df)
This code prints
Data Correction Factor
0 100 NaN 0.06
1 101 NaN 0.06
2 102 NaN 0.06
3 103 NaN 0.06
4 104 NaN 0.06
5 105 0.5 0.06
6 106 NaN 0.12
7 107 NaN 0.12
8 108 NaN 0.12
9 109 NaN 0.12
10 110 0.4 0.12
11 111 NaN 0.30
12 112 NaN 0.30
13 113 NaN 0.30
14 114 0.3 0.30
15 115 NaN 1.00
16 116 NaN 1.00
I have written the following:
ax = df.pivot_table(index=['month'], columns='year', values='sale_amount_usd', margins=True,fill_value=0).round(2).plot(kind='bar',colormap=('Blues'),figsize=(18,15))
plt.legend(loc='best')
plt.ylabel('Average Sales Amount in USD')
plt.xlabel('Month')
plt.xticks(rotation=0)
plt.title('Average Sales Amount in USD by Month/Year')
for p in ax.patches:
ax.annotate(str(p.get_height()), (p.get_x() * 1.001, p.get_height() * 1.005))
plt.show();
Which returns a nice bar chart:
I'd now like to be able to tell whether the differences in means within each month, between years, are significant. In other words, is the jump from $321 in March 2013 to $365 in March 2014 a significant increase in average sales amount?
How would I do this? Is there a way to overlay a marker on the pivot table that tells me, visually, when a difference is significant?
edited to add sample data:
event_id event_date week_number week_of_month holiday month day year pub_organization_id clicks sales click_to_sale_conversion_rate sale_amount_usd per_sale_amount_usd per_click_sale_amount pub_commission_usd per_sale_pub_commission_usd per_click_pub_commission_usd
0 3365 1/11/13 2 2 NaN 1. January 11 2013 214 11945 754 0.06 40311.75 53.46 3.37 2418.71 3.21 0.20
1 13793 2/12/13 7 3 NaN 2. February 12 2013 214 11711 1183 0.10 73768.54 62.36 6.30 4426.12 3.74 0.38
2 4626 1/15/13 3 3 NaN 1. January 15 2013 214 11561 1029 0.09 70356.46 68.37 6.09 4221.39 4.10 0.37
3 10917 2/3/13 6 1 NaN 2. February 3 2013 167 11481 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 14653 2/15/13 7 3 NaN 2. February 15 2013 214 11268 795 0.07 37262.56 46.87 3.31 2235.77 2.81 0.20
5 18448 2/27/13 9 5 NaN 2. February 27 2013 214 11205 504 0.04 48773.71 96.77 4.35 2926.43 5.81 0.26
6 11382 2/5/13 6 2 NaN 2. February 5 2013 214 11166 1324 0.12 93322.84 70.49 8.36 5599.38 4.23 0.50
7 14764 2/16/13 7 3 NaN 2. February 16 2013 214 11042 451 0.04 22235.51 49.30 2.01 1334.14 2.96 0.12
8 17080 2/23/13 8 4 NaN 2. February 23 2013 214 10991 248 0.02 14558.85 58.71 1.32 873.53 3.52 0.08
9 21171 3/8/13 10 2 NaN 3. March 8 2013 214 10910 1081 0.10 52005.12 48.11 4.77 3631.28 3.36 0.33
10 16417 2/21/13 8 4 NaN 2. February 21 2013 214 10826 507 0.05 44907.20 88.57 4.15 2694.43 5.31 0.25
11 13399 2/11/13 7 3 NaN 2. February 11 2013 214 10772 1142 0.11 38549.55 33.76 3.58 2312.97 2.03 0.21
12 1532 1/5/13 1 1 NaN 1. January 5 2013 214 10750 610 0.06 29838.49 48.92 2.78 1790.31 2.93 0.17
13 22500 3/13/13 11 3 NaN 3. March 13 2013 214 10743 821 0.08 47310.71 57.63 4.40 3688.83 4.49 0.34
14 5840 1/19/13 3 3 NaN 1. January 19 2013 214 10693 487 0.05 28427.35 58.37 2.66 1705.64 3.50 0.16
15 19566 3/3/13 10 1 NaN 3. March 3 2013 214 10672 412 0.04 15722.29 38.16 1.47 1163.16 2.82 0.11
16 26313 3/25/13 13 5 NaN 3. March 25 2013 214 10629 529 0.05 21946.51 41.49 2.06 1589.84 3.01 0.15
17 19732 3/4/13 10 2 NaN 3. March 4 2013 214 10619 1034 0.10 37257.20 36.03 3.51 2713.71 2.62 0.26
18 18569 2/28/13 9 5 NaN 2. February 28 2013 214 10603 414 0.04 40920.28 98.84 3.86 2455.22 5.93 0.23
19 8704 1/28/13 5 5 NaN 1. January 28 2013 214 10548 738 0.07 29041.87 39.35 2.75 1742.52 2.36 0.17
Although not conclusive, you could use error bars (through the yerr argument of the plot call) that represent one standard deviation of uncertainty, and then just eyeball the overlap of the intervals. Something like (not tested)...
stds = df.groupby(['month', 'year'])['sale_amount_usd'].std().to_frame()
stds.columns = ['std_sales']
df_stds = df.pivot_table(index=['month'], columns='year',
                         values='sale_amount_usd',
                         margins=True, fill_value=0).round(2).join(stds)
ax = df_stds.plot(kind='bar', yerr = 'std_sales', colormap=('Blues'),figsize=(18,15))
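If you want an actual numeric check rather than eyeballing overlap, one option beyond the error bars (a sketch, assuming scipy is installed and the frame contains daily rows for both years) is a two-sample t-test on the underlying observations for a given month, e.g. March 2013 vs March 2014:
from scipy import stats

march_2013 = df.loc[(df['month'] == '3. March') & (df['year'] == 2013), 'sale_amount_usd']
march_2014 = df.loc[(df['month'] == '3. March') & (df['year'] == 2014), 'sale_amount_usd']

# Welch's t-test (unequal variances); a small p-value suggests the difference in means is significant
t_stat, p_value = stats.ttest_ind(march_2013, march_2014, equal_var=False)
print(t_stat, p_value)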
I want to read the CSV file as a pandas dataframe. The CSV file is here: https://www.dropbox.com/s/o3xc74f8v4winaj/aaaa.csv?dl=0
In particular,
I want to skip the first row
The column headers are in row 2. In this case, they are: 1, 1, 2 and TOT. I do not want to hardcode them though. It is ok if the only column that gets extracted is TOT
I do not want to use a non-pandas approach if possible.
Here is what I am doing:
df = pandas.read_csv('https://www.dropbox.com/s/o3xc74f8v4winaj/aaaa.csv?dl=0', skiprows=1, skipinitialspace=True, sep=' ')
But this gives the error:
*** CParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6
The output should look something like this:
1 1 2 TOT
0 DEPTH(m) 0.01 1.24 1.52
1 BD 33kpa(t/m3) 1.6 1.6 1.6
2 SAND(%) 42.1 42.1 65.1
3 SILT(%) 37.9 37.9 16.9
4 CLAY(%) 20 20 18
5 ROCK(%) 12 12 12
6 WLS(kg/ha) 0 5 0.1 5.1
7 WLM(kg/ha) 0 5 0.1 5.1
8 WLSL(kg/ha) 0 4 0.1 4.1
9 WLSC(kg/ha) 0 2.1 0 2.1
10 WLMC(kg/ha) 0 2.1 0 2.1
11 WLSLC(kg/ha) 0 1.7 0 1.7
12 WLSLNC(kg/ha) 0 0.4 0 0.4
13 WBMC(kg/ha) 9 1102.1 250.9 1361.9
14 WHSC(kg/ha) 69 8432 1920 10420
15 WHPC(kg/ha) 146 18018 4102 22266
16 WOC(kg/ha) 224 27556 6272 34
17 WLSN(kg/ha) 0 0 0 0
18 WLMN(kg/ha) 0 0.2 0 0.2
19 WBMN(kg/ha) 0.9 110.2 25.1 136.2
20 WHSN(kg/ha) 7 843 192 1042
21 WHPN(kg/ha) 15 1802 410 2227
22 WON(kg/ha) 22 2755 627 3405
23 CFEM(kg/ha) 0
You can specify a regular expression to be used as your delimiter; in your case it will work with [\s,]{2,20}, i.e. between 2 and 20 consecutive spaces or commas:
In [180]: pd.read_csv('aaaa.csv',
                      skiprows=1,
                      sep='[\s,]{2,20}',
                      index_col=0)
Out[180]:
Unnamed: 1 1 1.1 2 TOT
0
1 DEPTH(m) 0.01 1.24 1.52 NaN
2 BD 33kpa(t/m3) 1.60 1.60 1.60 NaN
3 SAND(%) 42.10 42.10 65.10 NaN
4 SILT(%) 37.90 37.90 16.90 NaN
5 CLAY(%) 20.00 20.00 18.00 NaN
6 ROCK(%) 12.00 12.00 12.00 NaN
7 WLS(kg/ha) 0.00 5.00 0.10 5.1
8 WLM(kg/ha) 0.00 5.00 0.10 5.1
9 WLSL(kg/ha) 0.00 4.00 0.10 4.1
10 WLSC(kg/ha) 0.00 2.10 0.00 2.1
11 WLMC(kg/ha) 0.00 2.10 0.00 2.1
12 WLSLC(kg/ha) 0.00 1.70 0.00 1.7
13 WLSLNC(kg/ha) 0.00 0.40 0.00 0.4
14 WBMC(kg/ha) 9.00 1102.10 250.90 1361.9
15 WHSC(kg/ha) 69.00 8432.00 1920.00 10420.0
16 WHPC(kg/ha) 146.00 18018.00 4102.00 22266.0
17 WOC(kg/ha) 224.00 27556.00 6272.00 34.0
18 WLSN(kg/ha) 0.00 0.00 0.00 0.0
19 WLMN(kg/ha) 0.00 0.20 0.00 0.2
20 WBMN(kg/ha) 0.90 110.20 25.10 136.2
21 WHSN(kg/ha) 7.00 843.00 192.00 1042.0
22 WHPN(kg/ha) 15.00 1802.00 410.00 2227.0
23 WON(kg/ha) 22.00 2755.00 627.00 3405.0
24 CFEM(kg/ha) 0.00 NaN NaN NaN
25, None NaN NaN NaN NaN
26, None NaN NaN NaN NaN
You need to specify the names of the columns. Notice the trick I used to get two columns called 1 (one is an integer name and the other is text).
Given how badly the data is structured, this is not perfect (note row 2 where BD and 33kpa got split because of the space between them).
pd.read_csv('/Downloads/aaaa.csv',
            skiprows=2,
            skipinitialspace=True,
            sep=' ',
            names=['Index', 'Description', 1, "1", 2, 'TOT'],
            index_col=0)
Description 1 1 2 TOT
Index
1, DEPTH(m) 0.01 1.24 1.52 NaN
2, BD 33kpa(t/m3) 1.60 1.60 1.6
3, SAND(%) 42.1 42.10 65.10 NaN
4, SILT(%) 37.9 37.90 16.90 NaN
5, CLAY(%) 20.0 20.00 18.00 NaN
6, ROCK(%) 12.0 12.00 12.00 NaN
7, WLS(kg/ha) 0.0 5.00 0.10 5.1
8, WLM(kg/ha) 0.0 5.00 0.10 5.1
9, WLSL(kg/ha) 0.0 4.00 0.10 4.1
10, WLSC(kg/ha) 0.0 2.10 0.00 2.1
11, WLMC(kg/ha) 0.0 2.10 0.00 2.1
12, WLSLC(kg/ha) 0.0 1.70 0.00 1.7
13, WLSLNC(kg/ha) 0.0 0.40 0.00 0.4
14, WBMC(kg/ha) 9.0 1102.10 250.90 1361.9
15, WHSC(kg/ha) 69. 8432.00 1920.00 10420.0
16, WHPC(kg/ha) 146. 18018.00 4102.00 22266.0
17, WOC(kg/ha) 224. 27556.00 6272.00 34.0
18, WLSN(kg/ha) 0.0 0.00 0.00 0.0
19, WLMN(kg/ha) 0.0 0.20 0.00 0.2
20, WBMN(kg/ha) 0.9 110.20 25.10 136.2
21, WHSN(kg/ha) 7. 843.00 192.00 1042.0
22, WHPN(kg/ha) 15. 1802.00 410.00 2227.0
23, WON(kg/ha) 22. 2755.00 627.00 3405.0
24, CFEM(kg/ha) 0. NaN NaN NaN
25, NaN NaN NaN NaN NaN
26, NaN NaN NaN NaN NaN
Or you can reset the index.
>>> (pd.read_csv('/Downloads/aaaa.csv',
                 skiprows=2,
                 skipinitialspace=True,
                 sep=' ',
                 names=['Index', 'Description', 1, "1", 2, 'TOT'],
                 index_col=0)
     .reset_index(drop=True)
     .dropna(axis=0, how='all'))
Description 1 1 2 TOT
0 DEPTH(m) 0.01 1.24 1.52 NaN
1 BD 33kpa(t/m3) 1.60 1.60 1.6
2 SAND(%) 42.1 42.10 65.10 NaN
3 SILT(%) 37.9 37.90 16.90 NaN
4 CLAY(%) 20.0 20.00 18.00 NaN
5 ROCK(%) 12.0 12.00 12.00 NaN
6 WLS(kg/ha) 0.0 5.00 0.10 5.1
7 WLM(kg/ha) 0.0 5.00 0.10 5.1
8 WLSL(kg/ha) 0.0 4.00 0.10 4.1
9 WLSC(kg/ha) 0.0 2.10 0.00 2.1
10 WLMC(kg/ha) 0.0 2.10 0.00 2.1
11 WLSLC(kg/ha) 0.0 1.70 0.00 1.7
12 WLSLNC(kg/ha) 0.0 0.40 0.00 0.4
13 WBMC(kg/ha) 9.0 1102.10 250.90 1361.9
14 WHSC(kg/ha) 69. 8432.00 1920.00 10420.0
15 WHPC(kg/ha) 146. 18018.00 4102.00 22266.0
16 WOC(kg/ha) 224. 27556.00 6272.00 34.0
17 WLSN(kg/ha) 0.0 0.00 0.00 0.0
18 WLMN(kg/ha) 0.0 0.20 0.00 0.2
19 WBMN(kg/ha) 0.9 110.20 25.10 136.2
20 WHSN(kg/ha) 7. 843.00 192.00 1042.0
21 WHPN(kg/ha) 15. 1802.00 410.00 2227.0
22 WON(kg/ha) 22. 2755.00 627.00 3405.0
23 CFEM(kg/ha) 0. NaN NaN NaN