Shift and multiply in Pandas - python

I have a pandas dataframe that looks like this:
Year Ship Age Surviving UEC
2018 12.88 13 0.00 17.2
2019 12.57 12 0.02 17.2
2020 12.24 11 0.06 17.2
2021 11.95 10 0.18 17.2
2022 11.77 9 0.37 17.2
2023 11.70 8 0.60 17.2
2024 11.75 7 0.81 17.2
2025 11.93 6 0.94 17.2
2026 12.12 5 0.99 0.3
2027 12.34 4 1.00 0.3
2028 12.56 3 NaN 0.3
2029 12.76 2 NaN 0.3
2030 12.93 1 NaN 0.3
I want to multiply the Ship, Surviving, and UEC columns over a window that shifts down by one row at a time, so the output df2 should look like this:
df2
Stock_uec
0 df1.iloc[:10,1]*df1.iloc[:10,3]*df1.iloc[:10,4]
1 df1.iloc[1:11,1]*df1.iloc[1:11,3]*df1.iloc[1:11,4]
2 df1.iloc[2:12,1]*df1.iloc[2:12,3]*df1.iloc[2:12,4]
Below is my code, but I didn't get the results I expected.
for i, row in df1.iterrows():
    out = (df1.iloc[i:i+10, 1].shift(1, axis=0)
           * df1.iloc[i:i+10, 3].shift(1, axis=0)
           * df1.iloc[i:i+10, 4].shift(1, axis=0))
    print(out)
Thank you for your help.

IIUC, I think you want:
df.loc[:, 'shipping_utc'] = 0
for i in range(df.shape[0]):
    df.loc[i:, 'shipping_utc'] = df.iloc[i:][['Ship', 'Surviving', 'UEC']].prod(axis=1) + df.loc[i:, 'shipping_utc']
output
df
Out[25]:
Year Ship Age Surviving UEC shipping_utc
0 2018 12.88 13 0.00 17.2 0.00000
1 2019 12.57 12 0.02 17.2 8.64816
2 2020 12.24 11 0.06 17.2 37.89504
3 2021 11.95 10 0.18 17.2 147.98880
4 2022 11.77 9 0.37 17.2 374.52140
5 2023 11.70 8 0.60 17.2 724.46400
6 2024 11.75 7 0.81 17.2 1145.90700
7 2025 11.93 6 0.94 17.2 1543.07392
8 2026 12.12 5 0.99 0.3 32.39676
9 2027 12.34 4 1.00 0.3 37.02000
10 2028 12.56 3 NaN 0.3 41.44800
11 2029 12.76 2 NaN 0.3 45.93600
12 2030 12.93 1 NaN 0.3 50.42700
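Since every pass of the loop adds the full row product to all remaining rows, each row ends up holding its Ship*Surviving*UEC product multiplied by its 1-based position. Assuming the default RangeIndex, a loop-free sketch of the same result:
import numpy as np

# row-wise product (NaN is skipped by prod's default skipna=True), scaled by the 1-based row position
df['shipping_utc'] = df[['Ship', 'Surviving', 'UEC']].prod(axis=1) * (np.arange(len(df)) + 1)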

Related

fill NaN with values of a similar row

My dataframe has these two columns: "reach" and "height". The "reach" column has a lot of missing values, but "height" is fully populated. Reach is usually a function of height, so for every row where reach is NaN I want to look at its height, find another row with the same height that does have a reach value, and copy that value into the missing one.
name                SApM  SLpM  height  reach  record          stance       strAcc  strDef  subAvg  tdAcc  tdAvg  tdDef  weight  born_year  win  lose  draw  nc
Justin Frazier      6.11  1.11  6' 0"   75     10-3-0          Southpaw     0.66    0.04    0       0      0      0      265     1989       10   3     0
Gleidson Cutis      8.28  2.99  5' 9"   nan    7-4-0           Orthodox     0.52    0.59    0       0      0      0      155     1989       7    4     0
Xavier Foupa-Pokam  2.5   1.47  6' 1"   nan    32-22-0         Open Stance  0.43    0.49    0       0      0      0.16   185     1982       32   22    0
Mirko Filipovic     1.89  2.11  6' 2"   73     35-11-2-(1 NC)  Southpaw     0.5     0.63    0.3     0.4    0.19   0.78   230     1974       35   11    2     1
Jordan Johnson      2.64  3.45  6' 2"   79     10-0-0          Orthodox     0.47    0.53    1.2     0.42   3.25   1      205     1988       10   0     0
Martin Kampmann     3.28  3.22  6' 0"   72     20-7-0          Orthodox     0.42    0.62    2       0.41   1.86   0.78   170     1982       20   7     0
Darren Elkins       3.05  3.46  5' 10"  71     27-9-0          Orthodox     0.38    0.52    1.1     0.33   2.67   0.56   145     1984       27   9     0
Austen Lane         6.32  5.26  6' 6"   nan    2-1-0           Orthodox     0.35    0.6     0       0      0      0      245     1987       2    1     0
Rachael Ostovich    3.97  2.54  5' 3"   62     4-6-0           Orthodox     0.43    0.57    0.8     0.83   2.03   0.66   125     1991       4    6     0
Travis Lutter       2.42  0.41  5' 11"  75     10-6-0          Orthodox     0.32    0.42    0.7     0.24   1.95   0.3    185     1973       10   6     0
Tom Murphy          0.17  2.5   6' 2"   nan    8-0-0           Southpaw     0.71    0.84    2.5     0.85   7.51   0      227     1974       8    0     0
Darrell Montague    5.38  1.92  5' 6"   67     13-5-0          Southpaw     0.25    0.52    1.4     0.25   0.72   0.33   125     1987       13   5     0
Lauren Murphy       4.25  3.95  5' 5"   67     15-4-0          Orthodox     0.4     0.61    0.1     0.34   1.16   0.7    125     1983       15   4     0
Bill Mahood         3.59  1.54  6' 3"   nan    20-7-1-(1 NC)   Orthodox     0.85    0.17    3.9     0      0      0      200     1967       20   7     1     1
Nate Marquardt      2.32  2.71  6' 0"   74     35-19-2         Orthodox     0.49    0.55    0.8     0.51   1.87   0.7    185     1979       35   19    2
Mike Polchlopek     1.33  2     6' 4"   nan    1-1-0           Orthodox     0.38    0.57    0       0      0      0      285     1965       1    1     0
Harvey Park         7.21  3.77  6' 0"   70     12-3-0          Orthodox     0.5     0.33    0       0      0      0      155     1986       12   3     0
Junyong Park        3.17  4.37  5' 10"  73     13-4-0          Orthodox     0.47    0.58    0.6     0.57   3.02   0.46   185     1991       13   4     0
Ricco Rodriguez     1.15  1.85  6' 4"   nan    53-25-0-(1 NC)  Orthodox     0.51    0.61    1       0.39   2.3    0.4    265     1977       53   25    0     1
Aaron Riley         3.78  3.45  5' 8"   69     30-14-1         Southpaw     0.34    0.61    0.1     0.34   1.18   0.6    155     1980       30   14    1
You can create a height-to-reach reference Series with .groupby() and fetch the first non-NaN reach for each height (if any) with .first(), as follows:
height_ref = df.groupby('height')['reach'].first()
height
5' 10" 71.0
5' 11" 75.0
5' 3" 62.0
5' 5" 67.0
5' 6" 67.0
5' 8" 69.0
5' 9" NaN
6' 0" 75.0
6' 1" NaN
6' 2" 73.0
6' 3" NaN
6' 4" NaN
6' 6" NaN
Name: reach, dtype: float64
Then fill the NaN values of column reach by looking up the height reference with .map() and passing the result to .fillna(), as follows:
df['reach2'] = df['reach'].fillna(df['height'].map(height_ref))
For demo purposes the result goes into a new column reach2; you can overwrite the original reach column as appropriate.
Result:
print(df[['height', 'reach', 'reach2']])
height reach reach2
0 6' 0" 75.0 75.0
1 5' 9" NaN NaN
2 6' 1" NaN NaN
3 6' 2" 73.0 73.0
4 6' 2" 79.0 79.0
5 6' 0" 72.0 72.0
6 5' 10" 71.0 71.0
7 6' 6" NaN NaN
8 5' 3" 62.0 62.0
9 5' 11" 75.0 75.0
10 6' 2" NaN 73.0 <======= filled up with referenced height from other row
11 5' 6" 67.0 67.0
12 5' 5" 67.0 67.0
13 6' 3" NaN NaN
14 6' 0" 74.0 74.0
15 6' 4" NaN NaN
16 6' 0" 70.0 70.0
17 5' 10" 73.0 73.0
18 6' 4" NaN NaN
19 5' 8" 69.0 69.0
I don't think a method exists to do that in a single step.
If I were in your shoes I would (a sketch of these steps follows below):
Create a support dataset made up of height|reach pairs, fully populated with my best-guess values
Join the support dataframe with the existing one, using height as the key
Coalesce the values where NaN appears: df.reach = df.reach.fillna(df.from_support_dataset_height)
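A minimal sketch of those three steps, assuming the frame is named df and taking the per-height mean of reach as the best guess (reach_ref is an illustrative helper column name):
# 1. support dataset: best-guess reach per height
support = df.groupby('height', as_index=False)['reach'].mean().rename(columns={'reach': 'reach_ref'})
# 2. join on height
df = df.merge(support, on='height', how='left')
# 3. coalesce the NaN values and drop the helper column
df['reach'] = df['reach'].fillna(df['reach_ref'])
df = df.drop(columns='reach_ref')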

Python Dataframe Get Value of Last Non Null Column for Each Row

I have a dataframe such as the following:
ID 2016 2017 2018 2019 2020
0 1 1.64 NaN NaN NaN NaN
1 2 NaN NaN NaN 0.78 NaN
2 3 1.11 0.97 1.73 1.23 0.87
3 4 0.84 0.74 1.64 1.47 0.41
4 5 0.75 1.05 NaN NaN NaN
I want to get the values from the last non-null column such that:
ID 2016 2017 2018 2019 2020 LastValue
0 1 1.64 NaN NaN NaN NaN 1.64
1 2 NaN NaN NaN 0.78 NaN 0.78
2 3 1.11 0.97 1.73 1.23 0.87 0.87
3 4 0.84 0.74 1.64 1.47 0.41 0.41
4 5 0.75 1.05 NaN NaN NaN 1.05
I tried to loop through the year columns in reverse as follows but couldn't fully achieve what I want.
for i in reversed(df.columns[1:]):
    if df[i] is not None:
        val = df[i]
Could you help about this issue? Thanks.
The idea is to select all columns except the first with DataFrame.iloc, forward fill the missing values along each row, and then take the last column:
df['LastValue'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
print (df)
ID 2016 2017 2018 2019 2020 LastValue
0 1 1.64 NaN NaN NaN NaN 1.64
1 2 NaN NaN NaN 0.78 NaN 0.78
2 3 1.11 0.97 1.73 1.23 0.87 0.87
3 4 0.84 0.74 1.64 1.47 0.41 0.41
4 5 0.75 1.05 NaN NaN NaN 1.05
Detail:
print (df.iloc[:, 1:].ffill(axis=1))
2016 2017 2018 2019 2020
0 1.64 1.64 1.64 1.64 1.64
1 NaN NaN NaN 0.78 0.78
2 1.11 0.97 1.73 1.23 0.87
3 0.84 0.74 1.64 1.47 0.41
4 0.75 1.05 1.05 1.05 1.05
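An alternative sketch (not from the original answer) that looks up the last non-null entry per row via last_valid_index:
year_cols = df.columns[1:]

def last_value(row):
    # label of the last non-null cell in this row, if any
    idx = row.last_valid_index()
    return row[idx] if idx is not None else float('nan')

df['LastValue'] = df[year_cols].apply(last_value, axis=1)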

Python Pandas pivot table - how to tell if differences between means are significant within a pivot table?

I have written the following:
ax = df.pivot_table(index=['month'], columns='year', values='sale_amount_usd', margins=True,fill_value=0).round(2).plot(kind='bar',colormap=('Blues'),figsize=(18,15))
plt.legend(loc='best')
plt.ylabel('Average Sales Amount in USD')
plt.xlabel('Month')
plt.xticks(rotation=0)
plt.title('Average Sales Amount in USD by Month/Year')
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() * 1.001, p.get_height() * 1.005))
plt.show();
Which returns a nice bar chart:
I'd now like to be able to tell whether the differences in means within each month, between years, are significant. In other words, is the jump from $321 in March 2013 to $365 in March 2014 a significant increase in average sales amount?
How would I do this? Is there a way to overlay a marker on the pivot table that tells me, visually, when a difference is significant?
edited to add sample data:
event_id event_date week_number week_of_month holiday month day year pub_organization_id clicks sales click_to_sale_conversion_rate sale_amount_usd per_sale_amount_usd per_click_sale_amount pub_commission_usd per_sale_pub_commission_usd per_click_pub_commission_usd
0 3365 1/11/13 2 2 NaN 1. January 11 2013 214 11945 754 0.06 40311.75 53.46 3.37 2418.71 3.21 0.20
1 13793 2/12/13 7 3 NaN 2. February 12 2013 214 11711 1183 0.10 73768.54 62.36 6.30 4426.12 3.74 0.38
2 4626 1/15/13 3 3 NaN 1. January 15 2013 214 11561 1029 0.09 70356.46 68.37 6.09 4221.39 4.10 0.37
3 10917 2/3/13 6 1 NaN 2. February 3 2013 167 11481 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 14653 2/15/13 7 3 NaN 2. February 15 2013 214 11268 795 0.07 37262.56 46.87 3.31 2235.77 2.81 0.20
5 18448 2/27/13 9 5 NaN 2. February 27 2013 214 11205 504 0.04 48773.71 96.77 4.35 2926.43 5.81 0.26
6 11382 2/5/13 6 2 NaN 2. February 5 2013 214 11166 1324 0.12 93322.84 70.49 8.36 5599.38 4.23 0.50
7 14764 2/16/13 7 3 NaN 2. February 16 2013 214 11042 451 0.04 22235.51 49.30 2.01 1334.14 2.96 0.12
8 17080 2/23/13 8 4 NaN 2. February 23 2013 214 10991 248 0.02 14558.85 58.71 1.32 873.53 3.52 0.08
9 21171 3/8/13 10 2 NaN 3. March 8 2013 214 10910 1081 0.10 52005.12 48.11 4.77 3631.28 3.36 0.33
10 16417 2/21/13 8 4 NaN 2. February 21 2013 214 10826 507 0.05 44907.20 88.57 4.15 2694.43 5.31 0.25
11 13399 2/11/13 7 3 NaN 2. February 11 2013 214 10772 1142 0.11 38549.55 33.76 3.58 2312.97 2.03 0.21
12 1532 1/5/13 1 1 NaN 1. January 5 2013 214 10750 610 0.06 29838.49 48.92 2.78 1790.31 2.93 0.17
13 22500 3/13/13 11 3 NaN 3. March 13 2013 214 10743 821 0.08 47310.71 57.63 4.40 3688.83 4.49 0.34
14 5840 1/19/13 3 3 NaN 1. January 19 2013 214 10693 487 0.05 28427.35 58.37 2.66 1705.64 3.50 0.16
15 19566 3/3/13 10 1 NaN 3. March 3 2013 214 10672 412 0.04 15722.29 38.16 1.47 1163.16 2.82 0.11
16 26313 3/25/13 13 5 NaN 3. March 25 2013 214 10629 529 0.05 21946.51 41.49 2.06 1589.84 3.01 0.15
17 19732 3/4/13 10 2 NaN 3. March 4 2013 214 10619 1034 0.10 37257.20 36.03 3.51 2713.71 2.62 0.26
18 18569 2/28/13 9 5 NaN 2. February 28 2013 214 10603 414 0.04 40920.28 98.84 3.86 2455.22 5.93 0.23
19 8704 1/28/13 5 5 NaN 1. January 28 2013 214 10548 738 0.07 29041.87 39.35 2.75 1742.52 2.36 0.17
Although not conclusive, you could use error bars (through the yerr argument in plt.plot) that represent one standard deviation of uncertainty, and then just eyeball the overlap of the intervals. Something like (not tested)...
stds = df.groupby(['month', 'year'])['sale_amount_usd'].std().to_frame()
stds.columns = ['std_sales']
df_stds = df.pivot_table(index=['month'], columns='year',
                         values='sale_amount_usd',
                         margins=True, fill_value=0).round(2).join(stds)
ax = df_stds.plot(kind='bar', yerr='std_sales', colormap='Blues', figsize=(18, 15))
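If you want a number rather than an eyeball comparison, a sketch of a formal check for one month across two years using Welch's t-test (an addition assuming scipy is available and each month/year group has enough sales observations; note the sample rows above only contain 2013):
from scipy import stats

# sale amounts for the same month in two different years (month labels follow the sample data's format)
march_2013 = df.loc[(df['month'] == '3. March') & (df['year'] == 2013), 'sale_amount_usd']
march_2014 = df.loc[(df['month'] == '3. March') & (df['year'] == 2014), 'sale_amount_usd']

t_stat, p_value = stats.ttest_ind(march_2013, march_2014, equal_var=False)  # Welch's t-test
print(p_value)  # a small p-value (e.g. < 0.05) suggests the difference in means is significant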

pandas dataframe append values to one column based on the values in another dataframe

I have two dataframes, and I want to add a column to the first one based on matching values in the second.
First df = df1
OCC_1988 DIST YEAR COW EST_LEV
0 0 100.00 1988 NaN 4
1 10000 5.83 1988 3.0 4
2 13002 0.28 1988 3.0 4
3 13005 0.16 1988 3.0 4
4 13008 0.06 1988 3.0 4
5 13011 0.38 1988 3.0 4
6 13014 0.39 1988 3.0 4
7 13017 0.16 1988 3.0 4
8 15017 0.22 1988 3.0 4
9 15023 1.96 1988 3.0 4
10 19005 1.30 1988 3.0 4
Second df = df2
soccode oescode oes99code
0 11-1011 19002 11-1011
1 11-1011 19005 11-1011
2 11-1021 19002 11-1021
3 11-1021 19005 11-1021
4 11-1031 15023 11-1031
5 11-2011 13011 11-2011
6 11-2021 13014 11-2021
7 11-2022 13002 11-2022
8 11-2031 13005 11-2031
9 11-3011 13008 11-3011
10 11-3021 13017 11-3021
11 11-3031 15017 11-3031
12 11-3041 10000 11-3040
I want to match df1['OCC_1988'] against df2['oescode'] and put the corresponding df2['soccode'] values into a newly added column 'new_occ_2000'.
So the final df will be like this:
OCC_1988 DIST YEAR COW EST_LEV new_occ_2000
0 0 100.00 1988 NaN 4 NaN
1 10000 5.83 1988 3.0 4 11-3041
2 13002 0.28 1988 3.0 4 11-2022
3 13005 0.16 1988 3.0 4 11-2031
4 13008 0.06 1988 3.0 4 11-3011
5 13011 0.38 1988 3.0 4 11-2011
6 13014 0.39 1988 3.0 4 11-2021
7 13017 0.16 1988 3.0 4 11-3021
8 15017 0.22 1988 3.0 4 11-3031
9 15023 1.96 1988 3.0 4 11-1031
10 19005 1.30 1988 3.0 4 11-1021
Is there any elegant way to do this?
Use merge:
(df1.merge(df2[['oescode', 'soccode']], left_on='OCC_1988', right_on='oescode', how='left')
    .drop('oescode', axis=1)
    .rename(columns={'soccode': 'new_occ_2000'}))
Output:
OCC_1988 DIST YEAR COW EST_LEV new_occ_2000
0 0 100.00 1988 NaN 4 NaN
1 10000 5.83 1988 3.0 4 11-3041
2 13002 0.28 1988 3.0 4 11-2022
3 13005 0.16 1988 3.0 4 11-2031
4 13008 0.06 1988 3.0 4 11-3011
5 13011 0.38 1988 3.0 4 11-2011
6 13014 0.39 1988 3.0 4 11-2021
7 13017 0.16 1988 3.0 4 11-3021
8 15017 0.22 1988 3.0 4 11-3031
9 15023 1.96 1988 3.0 4 11-1031
10 19005 1.30 1988 3.0 4 11-1011
11 19005 1.30 1988 3.0 4 11-1021
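Note that oescode 19005 maps to two soccode values in df2, so the merge produces two rows for OCC_1988 = 19005 (indexes 10 and 11 above). If you only want one match per row, a sketch of an alternative using a Series lookup that keeps the first match (an assumption about the desired behaviour):
mapping = df2.drop_duplicates('oescode').set_index('oescode')['soccode']  # first soccode per oescode
df1['new_occ_2000'] = df1['OCC_1988'].map(mapping)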

Reading csv file using pandas where columns are separated by varying amounts of whitespace and commas

I want to read the csv file as a pandas dataframe. CSV file is here: https://www.dropbox.com/s/o3xc74f8v4winaj/aaaa.csv?dl=0
In particular,
I want to skip the first row
The column headers are in row 2. In this case, they are: 1, 1, 2 and TOT. I do not want to hardcode them though. It is ok if the only column that gets extracted is TOT
I do not want to use a non-pandas approach if possible.
Here is what I am doing:
df = pandas.read_csv('https://www.dropbox.com/s/o3xc74f8v4winaj/aaaa.csv?dl=0', skiprows=1, skipinitialspace=True, sep=' ')
But this gives the error:
*** CParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6
The output should look something like this:
1 1 2 TOT
0 DEPTH(m) 0.01 1.24 1.52
1 BD 33kpa(t/m3) 1.6 1.6 1.6
2 SAND(%) 42.1 42.1 65.1
3 SILT(%) 37.9 37.9 16.9
4 CLAY(%) 20 20 18
5 ROCK(%) 12 12 12
6 WLS(kg/ha) 0 5 0.1 5.1
7 WLM(kg/ha) 0 5 0.1 5.1
8 WLSL(kg/ha) 0 4 0.1 4.1
9 WLSC(kg/ha) 0 2.1 0 2.1
10 WLMC(kg/ha) 0 2.1 0 2.1
11 WLSLC(kg/ha) 0 1.7 0 1.7
12 WLSLNC(kg/ha) 0 0.4 0 0.4
13 WBMC(kg/ha) 9 1102.1 250.9 1361.9
14 WHSC(kg/ha) 69 8432 1920 10420
15 WHPC(kg/ha) 146 18018 4102 22266
16 WOC(kg/ha) 224 27556 6272 34
17 WLSN(kg/ha) 0 0 0 0
18 WLMN(kg/ha) 0 0.2 0 0.2
19 WBMN(kg/ha) 0.9 110.2 25.1 136.2
20 WHSN(kg/ha) 7 843 192 1042
21 WHPN(kg/ha) 15 1802 410 2227
22 WON(kg/ha) 22 2755 627 3405
23 CFEM(kg/ha) 0
You can specify a regular expression to be used as your delimiter; in your case [\s,]{2,20}, i.e. a run of 2 to 20 whitespace characters or commas, will work:
In [180]: pd.read_csv('aaaa.csv',
                      skiprows=1,
                      sep='[\s,]{2,20}',
                      index_col=0)
Out[180]:
Unnamed: 1 1 1.1 2 TOT
0
1 DEPTH(m) 0.01 1.24 1.52 NaN
2 BD 33kpa(t/m3) 1.60 1.60 1.60 NaN
3 SAND(%) 42.10 42.10 65.10 NaN
4 SILT(%) 37.90 37.90 16.90 NaN
5 CLAY(%) 20.00 20.00 18.00 NaN
6 ROCK(%) 12.00 12.00 12.00 NaN
7 WLS(kg/ha) 0.00 5.00 0.10 5.1
8 WLM(kg/ha) 0.00 5.00 0.10 5.1
9 WLSL(kg/ha) 0.00 4.00 0.10 4.1
10 WLSC(kg/ha) 0.00 2.10 0.00 2.1
11 WLMC(kg/ha) 0.00 2.10 0.00 2.1
12 WLSLC(kg/ha) 0.00 1.70 0.00 1.7
13 WLSLNC(kg/ha) 0.00 0.40 0.00 0.4
14 WBMC(kg/ha) 9.00 1102.10 250.90 1361.9
15 WHSC(kg/ha) 69.00 8432.00 1920.00 10420.0
16 WHPC(kg/ha) 146.00 18018.00 4102.00 22266.0
17 WOC(kg/ha) 224.00 27556.00 6272.00 34.0
18 WLSN(kg/ha) 0.00 0.00 0.00 0.0
19 WLMN(kg/ha) 0.00 0.20 0.00 0.2
20 WBMN(kg/ha) 0.90 110.20 25.10 136.2
21 WHSN(kg/ha) 7.00 843.00 192.00 1042.0
22 WHPN(kg/ha) 15.00 1802.00 410.00 2227.0
23 WON(kg/ha) 22.00 2755.00 627.00 3405.0
24 CFEM(kg/ha) 0.00 NaN NaN NaN
25, None NaN NaN NaN NaN
26, None NaN NaN NaN NaN
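As an aside (an addition, not part of the original answer): a regex separator like this is handled by the python parser engine, and a raw string avoids invalid-escape warnings on newer Python versions, so being explicit about both keeps the call quiet:
import pandas as pd

df = pd.read_csv('aaaa.csv',
                 skiprows=1,
                 sep=r'[\s,]{2,20}',   # raw string for the regex separator
                 engine='python',      # regex separators are parsed by the python engine
                 index_col=0)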
You need to specify the names of the columns. Notice the trick I used to get two columns called 1 (one is an integer name and the other is text).
Given how badly the data is structured, this is not perfect (note row 2 where BD and 33kpa got split because of the space between them).
pd.read_csv('/Downloads/aaaa.csv',
            skiprows=2,
            skipinitialspace=True,
            sep=' ',
            names=['Index', 'Description', 1, "1", 2, 'TOT'],
            index_col=0)
Description 1 1 2 TOT
Index
1, DEPTH(m) 0.01 1.24 1.52 NaN
2, BD 33kpa(t/m3) 1.60 1.60 1.6
3, SAND(%) 42.1 42.10 65.10 NaN
4, SILT(%) 37.9 37.90 16.90 NaN
5, CLAY(%) 20.0 20.00 18.00 NaN
6, ROCK(%) 12.0 12.00 12.00 NaN
7, WLS(kg/ha) 0.0 5.00 0.10 5.1
8, WLM(kg/ha) 0.0 5.00 0.10 5.1
9, WLSL(kg/ha) 0.0 4.00 0.10 4.1
10, WLSC(kg/ha) 0.0 2.10 0.00 2.1
11, WLMC(kg/ha) 0.0 2.10 0.00 2.1
12, WLSLC(kg/ha) 0.0 1.70 0.00 1.7
13, WLSLNC(kg/ha) 0.0 0.40 0.00 0.4
14, WBMC(kg/ha) 9.0 1102.10 250.90 1361.9
15, WHSC(kg/ha) 69. 8432.00 1920.00 10420.0
16, WHPC(kg/ha) 146. 18018.00 4102.00 22266.0
17, WOC(kg/ha) 224. 27556.00 6272.00 34.0
18, WLSN(kg/ha) 0.0 0.00 0.00 0.0
19, WLMN(kg/ha) 0.0 0.20 0.00 0.2
20, WBMN(kg/ha) 0.9 110.20 25.10 136.2
21, WHSN(kg/ha) 7. 843.00 192.00 1042.0
22, WHPN(kg/ha) 15. 1802.00 410.00 2227.0
23, WON(kg/ha) 22. 2755.00 627.00 3405.0
24, CFEM(kg/ha) 0. NaN NaN NaN
25, NaN NaN NaN NaN NaN
26, NaN NaN NaN NaN NaN
Or you can reset the index.
>>> (pd.read_csv('/Downloads/aaaa.csv',
                 skiprows=2,
                 skipinitialspace=True,
                 sep=' ',
                 names=['Index', 'Description', 1, "1", 2, 'TOT'],
                 index_col=0)
     .reset_index(drop=True)
     .dropna(axis=0, how='all'))
Description 1 1 2 TOT
0 DEPTH(m) 0.01 1.24 1.52 NaN
1 BD 33kpa(t/m3) 1.60 1.60 1.6
2 SAND(%) 42.1 42.10 65.10 NaN
3 SILT(%) 37.9 37.90 16.90 NaN
4 CLAY(%) 20.0 20.00 18.00 NaN
5 ROCK(%) 12.0 12.00 12.00 NaN
6 WLS(kg/ha) 0.0 5.00 0.10 5.1
7 WLM(kg/ha) 0.0 5.00 0.10 5.1
8 WLSL(kg/ha) 0.0 4.00 0.10 4.1
9 WLSC(kg/ha) 0.0 2.10 0.00 2.1
10 WLMC(kg/ha) 0.0 2.10 0.00 2.1
11 WLSLC(kg/ha) 0.0 1.70 0.00 1.7
12 WLSLNC(kg/ha) 0.0 0.40 0.00 0.4
13 WBMC(kg/ha) 9.0 1102.10 250.90 1361.9
14 WHSC(kg/ha) 69. 8432.00 1920.00 10420.0
15 WHPC(kg/ha) 146. 18018.00 4102.00 22266.0
16 WOC(kg/ha) 224. 27556.00 6272.00 34.0
17 WLSN(kg/ha) 0.0 0.00 0.00 0.0
18 WLMN(kg/ha) 0.0 0.20 0.00 0.2
19 WBMN(kg/ha) 0.9 110.20 25.10 136.2
20 WHSN(kg/ha) 7. 843.00 192.00 1042.0
21 WHPN(kg/ha) 15. 1802.00 410.00 2227.0
22 WON(kg/ha) 22. 2755.00 627.00 3405.0
23 CFEM(kg/ha) 0. NaN NaN NaN
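Since you mentioned it is fine if only the TOT column gets extracted, a minimal follow-up on either parsed frame (assuming it is named df):
tot = df['TOT'].dropna()  # keep only the rows where TOT has a value
print(tot)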
