From the following dataframe:
dim_0 dim_1
0 0 40.54 23.40 6.70 1.70 1.82 0.96 1.62
1 175.89 20.24 7.78 1.55 1.45 0.80 1.44
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1 0 21.38 24.00 5.90 1.60 2.55 1.50 2.36
1 130.29 18.40 8.49 1.52 1.45 0.80 1.47
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 0 6.30 25.70 5.60 1.70 2.16 1.16 1.87
1 73.45 21.49 6.88 1.61 1.61 0.94 1.63
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3 0 16.64 25.70 5.70 1.60 2.17 1.12 1.76
1 125.89 19.10 7.52 1.43 1.44 0.78 1.40
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 0 41.38 24.70 5.60 1.50 2.08 1.16 1.85
1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 0 180.59 16.40 3.80 1.10 4.63 3.86 5.71
1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
6 0 13.59 24.40 6.10 1.70 2.62 1.51 2.36
1 103.19 19.02 8.70 1.53 1.48 0.76 1.38
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
7 0 3.15 24.70 5.60 1.50 2.14 1.22 2.00
1 55.90 23.10 6.07 1.50 1.86 1.12 1.87
2 208.04 20.39 6.82 1.35 1.47 0.95 1.67
How can I get, for each dim_0 group, only the row whose dim_1 value matches the corresponding entry of the array [1 0 0 1 2 0 1 2]?
Desired result is:
0 175.89 20.24 7.78 1.55 1.45 0.80 1.44
1 21.38 24.00 5.90 1.60 2.55 1.50 2.36
2 6.30 25.70 5.60 1.70 2.16 1.16 1.87
3 125.89 19.10 7.52 1.43 1.44 0.78 1.40
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 180.59 16.40 3.80 1.10 4.63 3.86 5.71
6 103.19 19.02 8.70 1.53 1.48 0.76 1.38
7 208.04 20.39 6.82 1.35 1.47 0.95 1.67
I've tried slicing, cross-sections, etc., but with no success.
Thanks in advance for the help.
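For reference, a minimal sketch to rebuild this frame (the integer column labels 2 to 8 are an assumption, taken from the answer output below):
import numpy as np
import pandas as pd

data = [
    [40.54, 23.40, 6.70, 1.70, 1.82, 0.96, 1.62],
    [175.89, 20.24, 7.78, 1.55, 1.45, 0.80, 1.44],
    [0.00] * 7,
    [21.38, 24.00, 5.90, 1.60, 2.55, 1.50, 2.36],
    [130.29, 18.40, 8.49, 1.52, 1.45, 0.80, 1.47],
    [0.00] * 7,
    [6.30, 25.70, 5.60, 1.70, 2.16, 1.16, 1.87],
    [73.45, 21.49, 6.88, 1.61, 1.61, 0.94, 1.63],
    [0.00] * 7,
    [16.64, 25.70, 5.70, 1.60, 2.17, 1.12, 1.76],
    [125.89, 19.10, 7.52, 1.43, 1.44, 0.78, 1.40],
    [0.00] * 7,
    [41.38, 24.70, 5.60, 1.50, 2.08, 1.16, 1.85],
    [0.00] * 7,
    [0.00] * 7,
    [180.59, 16.40, 3.80, 1.10, 4.63, 3.86, 5.71],
    [0.00] * 7,
    [0.00] * 7,
    [13.59, 24.40, 6.10, 1.70, 2.62, 1.51, 2.36],
    [103.19, 19.02, 8.70, 1.53, 1.48, 0.76, 1.38],
    [0.00] * 7,
    [3.15, 24.70, 5.60, 1.50, 2.14, 1.22, 2.00],
    [55.90, 23.10, 6.07, 1.50, 1.86, 1.12, 1.87],
    [208.04, 20.39, 6.82, 1.35, 1.47, 0.95, 1.67],
]
# 8 dim_0 groups of 3 dim_1 rows each
idx = pd.MultiIndex.from_product([range(8), range(3)], names=['dim_0', 'dim_1'])
df = pd.DataFrame(data, index=idx, columns=range(2, 9))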
Use MultiIndex.from_arrays and select by DataFrame.loc:
arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])
df = df.loc[pd.MultiIndex.from_arrays([df.index.levels[0], arr])]
print(df)
2 3 4 5 6 7 8
0
0 1 175.89 20.24 7.78 1.55 1.45 0.80 1.44
1 0 21.38 24.00 5.90 1.60 2.55 1.50 2.36
2 0 6.30 25.70 5.60 1.70 2.16 1.16 1.87
3 1 125.89 19.10 7.52 1.43 1.44 0.78 1.40
4 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 0 180.59 16.40 3.80 1.10 4.63 3.86 5.71
6 1 103.19 19.02 8.70 1.53 1.48 0.76 1.38
7 2 208.04 20.39 6.82 1.35 1.47 0.95 1.67
To drop the helper level and get the desired flat index, add droplevel:
arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])
df = df.loc[pd.MultiIndex.from_arrays([df.index.levels[0], arr])].droplevel(1)
print(df)
2 3 4 5 6 7 8
0
0 175.89 20.24 7.78 1.55 1.45 0.80 1.44
1 21.38 24.00 5.90 1.60 2.55 1.50 2.36
2 6.30 25.70 5.60 1.70 2.16 1.16 1.87
3 125.89 19.10 7.52 1.43 1.44 0.78 1.40
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 180.59 16.40 3.80 1.10 4.63 3.86 5.71
6 103.19 19.02 8.70 1.53 1.48 0.76 1.38
7 208.04 20.39 6.82 1.35 1.47 0.95 1.67
I'd go with advanced indexing using NumPy. It works here because the MultiIndex is the complete, sorted product of its levels, so row (g, k) sits at flat position g * 3 + k:
l = [1, 0, 0, 1, 2, 0, 1, 2]
i, j = df.index.levels
# flat positions: group number * rows-per-group + within-group label
ix = np.array(l) + np.arange(i.max() + 1) * (j.max() + 1)
pd.DataFrame(df.to_numpy()[ix])
0 1 2 3 4 5 6
0 175.89 20.24 7.78 1.55 1.45 0.80 1.44
1 21.38 24.00 5.90 1.60 2.55 1.50 2.36
2 6.30 25.70 5.60 1.70 2.16 1.16 1.87
3 125.89 19.10 7.52 1.43 1.44 0.78 1.40
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 180.59 16.40 3.80 1.10 4.63 3.86 5.71
6 103.19 19.02 8.70 1.53 1.48 0.76 1.38
7 208.04 20.39 6.82 1.35 1.47 0.95 1.67
Try the following code (note that isin keeps every row whose dim_1 label appears anywhere in the mask, so it filters by membership rather than per group):
mask_array = [1, 0, 0, 1, 2, 0, 1, 2]
df_first = df  # your first dataframe
new_array = df_first[df_first.index.get_level_values('dim_1').isin(mask_array)]
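For a true per-group positional match, a boolean mask built from the index levels also works; a sketch, assuming dim_0 runs 0..7 as above:
arr = np.array([1, 0, 0, 1, 2, 0, 1, 2])
# keep a row when its dim_1 label equals the array entry for its dim_0 group
mask = df.index.get_level_values('dim_1') == arr[df.index.get_level_values('dim_0')]
out = df[mask].droplevel(1)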
I want to locate all values greater than 0 within columns n_1 to n_3 inclusive and populate them into columns new_1 to new_3, ordered from smallest to largest, so that new_1 holds the smallest value and new_3 the largest. If there are not enough values to populate a column, fill it with 0.
EVENT_ID n_1 n_2 n_3
143419013 0.00 7.80 12.83
143419017 1.72 20.16 16.08
143419021 3.03 12.00 17.14
143419025 2.63 0.00 2.51
143419028 2.38 22.00 2.96
143419030 0.00 40.00 0.00
Expected Output:
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
143419013 0.00 7.80 12.83 7.80 12.83 0.00
143419017 1.72 20.16 16.08 1.72 16.08 20.16
143419021 3.03 12.00 17.14 3.03 12.00 17.14
143419025 2.63 0.00 2.51 2.51 2.63 0.00
143419028 2.38 22.00 2.96 2.38 2.96 22.00
143419030 0.00 40.00 0.00 40.00 0.00 0.00
I tried using the apply function to create this new column but I got an error down the line.
df[['new_1','new_2','new_3']] = pivot_df.apply(lambda a,b,c: a.n_1, b.n_2, c.n_3 axis=1)
Let's subset the DataFrame, remove values that do not meet the condition with where, then use np.sort to sort across rows and fillna to replace any missing values with 0:
cols = ['n_1', 'n_2', 'n_3']
df[[f'new_{i}' for i in range(1, len(cols) + 1)]] = pd.DataFrame(
np.sort(df[cols].where(df[cols] > 0), axis=1)
).fillna(0)
df:
EVENT_ID n_1 n_2 n_3 new_1 new_2 new_3
0 143419013 0.00 7.80 12.83 7.80 12.83 0.00
1 143419017 1.72 20.16 16.08 1.72 16.08 20.16
2 143419021 3.03 12.00 17.14 3.03 12.00 17.14
3 143419025 2.63 0.00 2.51 2.51 2.63 0.00
4 143419028 2.38 22.00 2.96 2.38 2.96 22.00
5 143419030 0.00 40.00 0.00 40.00 0.00 0.00
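One caveat: the assignment above relies on index alignment between df and the new pd.DataFrame, so it assumes df has a default RangeIndex. With an arbitrary index, assigning the raw array is safer; a sketch:
# sort positive values row-wise; NaNs (the non-positive entries) sort last
out = np.sort(df[cols].where(df[cols] > 0).to_numpy(), axis=1)
df[[f'new_{i}' for i in range(1, len(cols) + 1)]] = np.nan_to_num(out)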
Setup used:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'EVENT_ID': [143419013, 143419017, 143419021, 143419025, 143419028,
143419030],
'n_1': [0.0, 1.72, 3.03, 2.63, 2.38, 0.0],
'n_2': [7.8, 20.16, 12.0, 0.0, 22.0, 40.0],
'n_3': [12.83, 16.08, 17.14, 2.51, 2.96, 0.0]
})
df:
EVENT_ID n_1 n_2 n_3
0 143419013 0.00 7.80 12.83
1 143419017 1.72 20.16 16.08
2 143419021 3.03 12.00 17.14
3 143419025 2.63 0.00 2.51
4 143419028 2.38 22.00 2.96
5 143419030 0.00 40.00 0.00
I have this dataframe:
Region 2021 2022 2023
0 Europe 0.00 0.00 0.00
1 N.Amerca 0.50 0.50 0.50
2 N.Amerca 4.40 4.40 4.40
3 N.Amerca 0.00 8.00 8.00
4 Asia 0.00 0.00 1.75
5 Asia 0.00 0.00 0.00
6 Asia 0.00 0.00 2.00
7 N.Amerca 0.00 0.00 0.50
8 Eurpoe 6.00 6.00 6.00
9 Asia 7.50 7.50 7.50
10 Asia 3.75 3.75 3.75
11 Asia 3.50 3.50 3.50
12 Asia 3.80 3.80 3.80
13 Asia 0.00 0.00 0.00
14 Europe 6.52 6.52 6.52
Once a value is found in 2021, it should carry a 0 to the rest (2022 and 2023), and if a value is found in 2022, it should carry a 0 to the rest. In other words, once a value is found in column 2021 or later, everything to its right should be zeroed.
The expected result would be:
Region 2021 2022 2023
0 Europe 0.00 0.00 0.00
1 N.Amerca 0.50 0.00 0.00
2 N.Amerca 4.40 0.00 0.00
3 N.Amerca 0.00 8.00 0.00
4 Asia 0.00 0.00 1.75
5 Asia 0.00 0.00 0.00
6 Asia 0.00 0.00 2.00
7 N.Amerca 0.00 0.00 0.50
8 Eurpoe 6.00 0.00 0.00
9 Asia 7.50 0.00 0.00
10 Asia 3.75 0.00 0.00
11 Asia 3.50 0.00 0.00
12 Asia 3.80 0.00 0.00
13 Asia 0.00 0.00 0.00
14 Europe 6.52 0.00 0.00
I have tried to apply a lambda:
def foo(r):
    # if r['2021'] > 0: then 2022 and onward should be zero
df = df.apply(lambda x: foo(x), axis=1)
but the challenge is that the columns run from 2021 to 2030, and foo becomes a mess.
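For reference, the sample frame can be rebuilt like this (keeping the 'N.Amerca'/'Eurpoe' spellings as they appear in the question):
import pandas as pd

df = pd.DataFrame({
    'Region': ['Europe', 'N.Amerca', 'N.Amerca', 'N.Amerca', 'Asia', 'Asia', 'Asia',
               'N.Amerca', 'Eurpoe', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia', 'Europe'],
    '2021': [0.00, 0.50, 4.40, 0.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2022': [0.00, 0.50, 4.40, 8.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2023': [0.00, 0.50, 4.40, 8.00, 1.75, 0.00, 2.00, 0.50, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
})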
Let us try duplicated: within each row, any value that repeats an earlier one is masked to 0, which zeroes exactly the carried-forward copies:
df = df.mask(df.T.apply(pd.Series.duplicated).T, 0)
Out[57]:
Region 2021 2022 2023
0 Europe 0.00 0.0 0.00
1 N.Amerca 0.50 0.0 0.00
2 N.Amerca 4.40 0.0 0.00
3 N.Amerca 0.00 8.0 0.00
4 Asia 0.00 0.0 1.75
5 Asia 0.00 0.0 0.00
6 Asia 0.00 0.0 2.00
7 N.Amerca 0.00 0.0 0.50
8 Eurpoe 6.00 0.0 0.00
9 Asia 7.50 0.0 0.00
10 Asia 3.75 0.0 0.00
11 Asia 3.50 0.0 0.00
12 Asia 3.80 0.0 0.00
13 Asia 0.00 0.0 0.00
14 Europe 6.52 0.0 0.00
This is another way: diff along the columns turns each carried-forward duplicate into 0, then the original 2021 column is restored:
df2 = df.set_index('Region').diff(axis=1).reset_index()
df2['2021'] = df['2021']
Or keep only the first non-zero value in each row:
df.iloc[:, 1:].where(df.iloc[:, 1:].ne(0).cumsum(axis=1).eq(1), 0)
Output:
2021 2022 2023
0 0.00 0.0 0.00
1 0.50 0.0 0.00
2 4.40 0.0 0.00
3 0.00 8.0 0.00
4 0.00 0.0 1.75
5 0.00 0.0 0.00
6 0.00 0.0 2.00
7 0.00 0.0 0.50
8 6.00 0.0 0.00
9 7.50 0.0 0.00
10 3.75 0.0 0.00
11 3.50 0.0 0.00
12 3.80 0.0 0.00
13 0.00 0.0 0.00
14 6.52 0.0 0.00
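Since the real columns run 2021 through 2030, note that nothing above is hard-coded to three columns; a small sketch that picks up every year column:
year_cols = [c for c in df.columns if c != 'Region']
vals = df[year_cols]
df[year_cols] = vals.where(vals.ne(0).cumsum(axis=1).eq(1), 0)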
Python code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as wb
stock = '3988.HK'
df = wb.DataReader(stock, data_source='yahoo', start='2018-07-01')
rsi_period = 14
chg = df['Close'].diff(1)
gain = chg.mask(chg < 0, 0)
df['Gain'] = gain
loss = chg.mask(chg > 0, 0)
df['Loss'] = loss
avg_gain = gain.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
avg_loss = loss.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
df['Avg Gain'] = avg_gain
df['Avg Loss'] = avg_loss
rs = abs(avg_gain / avg_loss)
rsi = 100 - (100 / (1 + rs))
df['RSI'] = rsi
df.reset_index(inplace=True)
df
Output:
Date High Low Open Close Volume Adj Close Gain Loss Avg Gain Avg Loss RSI
0 2018-07-03 3.87 3.76 3.83 3.84 684899302.0 3.629538 NaN NaN NaN NaN NaN
1 2018-07-04 3.91 3.84 3.86 3.86 460325574.0 3.648442 0.02 0.00 NaN NaN NaN
2 2018-07-05 3.70 3.62 3.68 3.68 292810499.0 3.680000 0.00 -0.18 NaN NaN NaN
3 2018-07-06 3.72 3.61 3.69 3.67 343653088.0 3.670000 0.00 -0.01 NaN NaN NaN
4 2018-07-09 3.75 3.68 3.70 3.69 424596186.0 3.690000 0.02 0.00 NaN NaN NaN
5 2018-07-10 3.74 3.70 3.71 3.71 327048051.0 3.710000 0.02 0.00 NaN NaN NaN
6 2018-07-11 3.65 3.61 3.63 3.64 371355401.0 3.640000 0.00 -0.07 NaN NaN NaN
7 2018-07-12 3.69 3.63 3.66 3.66 309888328.0 3.660000 0.02 0.00 NaN NaN NaN
8 2018-07-13 3.69 3.62 3.69 3.63 261928758.0 3.630000 0.00 -0.03 NaN NaN NaN
9 2018-07-16 3.63 3.57 3.61 3.62 306970074.0 3.620000 0.00 -0.01 NaN NaN NaN
10 2018-07-17 3.62 3.56 3.62 3.58 310294921.0 3.580000 0.00 -0.04 NaN NaN NaN
11 2018-07-18 3.61 3.55 3.58 3.58 334592695.0 3.580000 0.00 0.00 NaN NaN NaN
12 2018-07-19 3.61 3.56 3.61 3.56 211984563.0 3.560000 0.00 -0.02 NaN NaN NaN
13 2018-07-20 3.64 3.52 3.57 3.61 347506394.0 3.610000 0.05 0.00 NaN NaN NaN
14 2018-07-23 3.65 3.57 3.59 3.62 313125328.0 3.620000 0.01 0.00 0.010594 -0.021042 33.487100
15 2018-07-24 3.71 3.60 3.60 3.68 367627204.0 3.680000 0.06 0.00 0.015854 -0.018802 45.745967
16 2018-07-25 3.73 3.68 3.72 3.69 270460990.0 3.690000 0.01 0.00 0.015252 -0.016868 47.483263
17 2018-07-26 3.73 3.66 3.72 3.69 234388072.0 3.690000 0.00 0.00 0.013731 -0.015186 47.483263
18 2018-07-27 3.70 3.66 3.68 3.69 190039532.0 3.690000 0.00 0.00 0.012399 -0.013713 47.483263
19 2018-07-30 3.72 3.67 3.68 3.70 163971848.0 3.700000 0.01 0.00 0.012172 -0.012417 49.502851
20 2018-07-31 3.70 3.66 3.67 3.68 168486023.0 3.680000 0.00 -0.02 0.011047 -0.013118 45.716244
21 2018-08-01 3.72 3.66 3.71 3.68 199801191.0 3.680000 0.00 0.00 0.010047 -0.011930 45.716244
22 2018-08-02 3.68 3.59 3.66 3.61 307920738.0 3.610000 0.00 -0.07 0.009155 -0.017088 34.884632
23 2018-08-03 3.62 3.57 3.59 3.61 184816985.0 3.610000 0.00 0.00 0.008356 -0.015596 34.884632
24 2018-08-06 3.66 3.60 3.62 3.61 189696153.0 3.610000 0.00 0.00 0.007637 -0.014256 34.884632
25 2018-08-07 3.66 3.61 3.63 3.65 216157642.0 3.650000 0.04 0.00 0.010379 -0.013048 44.302922
26 2018-08-08 3.66 3.61 3.65 3.63 215365540.0 3.630000 0.00 -0.02 0.009511 -0.013629 41.101805
27 2018-08-09 3.66 3.59 3.59 3.65 230275455.0 3.650000 0.02 0.00 0.010378 -0.012504 45.353992
28 2018-08-10 3.66 3.60 3.65 3.62 219157328.0 3.620000 0.00 -0.03 0.009530 -0.013933 40.617049
29 2018-08-13 3.59 3.54 3.58 3.56 270620120.0 3.560000 0.00 -0.06 0.008759 -0.017658 33.158019
In this case, I want to create a new column 'max close within 14 trade days', i.e. the maximum 'Close' within the next 14 trading days. For example, for row 0 the data range should be rows 1 to 14, so 'max close within 14 trade days' = 3.86.
You can do the following (note that Grouper bins by fixed 14-calendar-day windows, so this approximates rather than exactly matches a look-ahead over the next 14 trading rows):
# convert to date time
df['Date'] = pd.to_datetime(df['Date'])
# calculate max for 14 days
df['max_close_within_14_trade_days'] = df['Date'].map(df.groupby([pd.Grouper(key='Date', freq='14D')])['Close'].max())
# fill missing values by previous value
df['max_close_within_14_trade_days'].fillna(method='ffill', inplace=True)
        Date  High   Low  Open  Close  max_close_within_14_trade_days
0 2018-07-03  3.87  3.76  3.83   3.84                            3.86
1 2018-07-04  3.91  3.84  3.86   3.86                            3.86
2 2018-07-05  3.70  3.62  3.68   3.68                            3.86
3 2018-07-06  3.72  3.61  3.69   3.67                            3.86
4 2018-07-09  3.75  3.68  3.70   3.69                            3.86
Another solution, taking the max over the next 14 rows directly:
df['max_close_within_14_trade_days'] = [df.loc[x+1:x+14, 'Close'].max() for x in range(df.shape[0])]
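If a loop is undesirable, a vectorized variant is possible with a forward-looking window; a sketch, assuming pandas >= 1.1 for FixedForwardWindowIndexer:
from pandas.api.indexers import FixedForwardWindowIndexer

# shift(-1) so the window starts at the next row, then take a 14-row forward max
indexer = FixedForwardWindowIndexer(window_size=14)
df['max_close_within_14_trade_days'] = (
    df['Close'].shift(-1).rolling(indexer, min_periods=1).max()
)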
If I have a data frame like this:
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 -0.30 -0.38 -0.24 -0.10
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
And I want to keep the values that are above 0 and change them to zero when they are lower.
I'm not sure how I should use iterrows() and loc() to do this.
You can try:
df1 = df[df > 0].fillna(0)
As a result:
In [24]: df
Out[24]:
EG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24 \
0 2017-10-12 21.69 15.36 0.87 1.42 0.76
1 2017-10-13 11.85 8.08 1.39 2.86 1.02
2 2017-10-14 7.83 5.88 1.87 2.04 2.29
3 2017-10-15 14.64 11.28 2.62 3.35 2.13
4 2017-10-16 5.11 5.82 -0.30 -0.38 -0.24
5 2017-10-17 12.09 9.61 0.20 1.09 0.39
datum_von
0 0.65
1 0.55
2 2.18
3 1.25
4 -0.10
5 0.57
In [25]: df1 = df[df > 0].fillna(0)
In [26]: df1
Out[26]:
EG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24 \
0 2017-10-12 21.69 15.36 0.87 1.42 0.76
1 2017-10-13 11.85 8.08 1.39 2.86 1.02
2 2017-10-14 7.83 5.88 1.87 2.04 2.29
3 2017-10-15 14.64 11.28 2.62 3.35 2.13
4 2017-10-16 5.11 5.82 0.00 0.00 0.00
5 2017-10-17 12.09 9.61 0.20 1.09 0.39
datum_von
0 0.65
1 0.55
2 2.18
3 1.25
4 0.00
5 0.57
clip_lower and mask solutions are good.
Here is another one with applymap:
df.applymap(lambda x: max(0.0, x))
Use clip_lower:
df = df.clip_lower(0)
print(df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
If the first column is not the index:
df = df.set_index('datum_von').clip_lower(0)
print(df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
Alternative solution:
df = df.mask(df < 0, 0)
print(df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
Timings:
df = pd.concat([df]*10000).reset_index(drop=True)
In [240]: %timeit (df.applymap(lambda x: max(0.0, x)))
10 loops, best of 3: 164 ms per loop
In [241]: %timeit (df[df > 0].fillna(0))
100 loops, best of 3: 7.05 ms per loop
In [242]: %timeit (df.clip_lower(0))
1000 loops, best of 3: 1.96 ms per loop
In [243]: %timeit df.mask(df < 0, 0)
100 loops, best of 3: 5.18 ms per loop
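A note for newer pandas: clip_lower was deprecated in 0.24 and removed in 1.0, so on current versions use clip with a lower bound instead:
df = df.clip(lower=0)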
I want to extract a row by name from the following dataframe:
Unnamed: 1 1 1.1 2 TOT
0
1 DEPTH(m) 0.01 1.24 1.52 NaN
2 BD 33kpa(t/m3) 1.60 1.60 1.60 NaN
3 SAND(%) 42.10 42.10 65.10 NaN
4 SILT(%) 37.90 37.90 16.90 NaN
5 CLAY(%) 20.00 20.00 18.00 NaN
6 ROCK(%) 12.00 12.00 12.00 NaN
7 WLS(kg/ha) 2.60 8.20 0.10 10.9
8 WLM(kg/ha) 5.00 8.30 0.00 13.4
9 WLSL(kg/ha) 0.00 3.80 0.10 3.9
10 WLSC(kg/ha) 1.10 3.50 0.00 4.6
11 WLMC(kg/ha) 2.10 3.50 0.00 5.6
12 WLSLC(kg/ha) 0.00 1.60 0.00 1.6
13 WLSLNC(kg/ha) 1.10 1.80 0.00 2.9
14 WBMC(kg/ha) 3.40 835.10 195.20 1033.7
15 WHSC(kg/ha) 66.00 8462.00 1924.00 10451.0
16 WHPC(kg/ha) 146.00 18020.00 4102.00 22269.0
17 WOC(kg/ha) 219.00 27324.00 6221.00 34.0
18 WLSN(kg/ha) 0.00 0.00 0.00 0.0
19 WLMN(kg/ha) 0.00 0.10 0.00 0.1
20 WBMN(kg/ha) 0.50 92.60 19.30 112.5
21 WHSN(kg/ha) 7.00 843.00 191.00 1041.0
22 WHPN(kg/ha) 15.00 1802.00 410.00 2227.0
23 WON(kg/ha) 22.00 2738.00 621.00 3381.0
I want to extract the row containing info on WOC(kg/ha). Here is what I am doing:
df.loc['WOC(kg/ha)']
but I get the error:
*** KeyError: 'the label [WOC(kg/ha)] is not in the [index]'
You don't have that label in your index; it's in your first column. The following should work:
df.loc[df['Unnamed: 1'] == 'WOC(kg/ha)']
Otherwise, set the index to that column and your code will work fine:
df.set_index('Unnamed: 1', inplace=True)
Also, this can be used to set the index without explicitly specifying the column name: df.set_index(df.columns[0], inplace=True)
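A quick end-to-end sketch (the printed values are taken from the frame above):
df = df.set_index('Unnamed: 1')
print(df.loc['WOC(kg/ha)'])
# 1        219.0
# 1.1    27324.0
# 2       6221.0
# TOT       34.0
# Name: WOC(kg/ha), dtype: float64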