Python code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as wb

stock = '3988.HK'
df = wb.DataReader(stock, data_source='yahoo', start='2018-07-01')

rsi_period = 14
chg = df['Close'].diff(1)
# keep gains and zero out losses (and vice versa); losses stay negative
gain = chg.mask(chg < 0, 0)
df['Gain'] = gain
loss = chg.mask(chg > 0, 0)
df['Loss'] = loss
# Wilder-style smoothing: an exponentially weighted mean with com = period - 1
avg_gain = gain.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
avg_loss = loss.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
df['Avg Gain'] = avg_gain
df['Avg Loss'] = avg_loss
# abs() because losses were kept negative above
rs = abs(avg_gain / avg_loss)
rsi = 100 - (100 / (1 + rs))
df['RSI'] = rsi
df.reset_index(inplace=True)
df
Output:
Date High Low Open Close Volume Adj Close Gain Loss Avg Gain Avg Loss RSI
0 2018-07-03 3.87 3.76 3.83 3.84 684899302.0 3.629538 NaN NaN NaN NaN NaN
1 2018-07-04 3.91 3.84 3.86 3.86 460325574.0 3.648442 0.02 0.00 NaN NaN NaN
2 2018-07-05 3.70 3.62 3.68 3.68 292810499.0 3.680000 0.00 -0.18 NaN NaN NaN
3 2018-07-06 3.72 3.61 3.69 3.67 343653088.0 3.670000 0.00 -0.01 NaN NaN NaN
4 2018-07-09 3.75 3.68 3.70 3.69 424596186.0 3.690000 0.02 0.00 NaN NaN NaN
5 2018-07-10 3.74 3.70 3.71 3.71 327048051.0 3.710000 0.02 0.00 NaN NaN NaN
6 2018-07-11 3.65 3.61 3.63 3.64 371355401.0 3.640000 0.00 -0.07 NaN NaN NaN
7 2018-07-12 3.69 3.63 3.66 3.66 309888328.0 3.660000 0.02 0.00 NaN NaN NaN
8 2018-07-13 3.69 3.62 3.69 3.63 261928758.0 3.630000 0.00 -0.03 NaN NaN NaN
9 2018-07-16 3.63 3.57 3.61 3.62 306970074.0 3.620000 0.00 -0.01 NaN NaN NaN
10 2018-07-17 3.62 3.56 3.62 3.58 310294921.0 3.580000 0.00 -0.04 NaN NaN NaN
11 2018-07-18 3.61 3.55 3.58 3.58 334592695.0 3.580000 0.00 0.00 NaN NaN NaN
12 2018-07-19 3.61 3.56 3.61 3.56 211984563.0 3.560000 0.00 -0.02 NaN NaN NaN
13 2018-07-20 3.64 3.52 3.57 3.61 347506394.0 3.610000 0.05 0.00 NaN NaN NaN
14 2018-07-23 3.65 3.57 3.59 3.62 313125328.0 3.620000 0.01 0.00 0.010594 -0.021042 33.487100
15 2018-07-24 3.71 3.60 3.60 3.68 367627204.0 3.680000 0.06 0.00 0.015854 -0.018802 45.745967
16 2018-07-25 3.73 3.68 3.72 3.69 270460990.0 3.690000 0.01 0.00 0.015252 -0.016868 47.483263
17 2018-07-26 3.73 3.66 3.72 3.69 234388072.0 3.690000 0.00 0.00 0.013731 -0.015186 47.483263
18 2018-07-27 3.70 3.66 3.68 3.69 190039532.0 3.690000 0.00 0.00 0.012399 -0.013713 47.483263
19 2018-07-30 3.72 3.67 3.68 3.70 163971848.0 3.700000 0.01 0.00 0.012172 -0.012417 49.502851
20 2018-07-31 3.70 3.66 3.67 3.68 168486023.0 3.680000 0.00 -0.02 0.011047 -0.013118 45.716244
21 2018-08-01 3.72 3.66 3.71 3.68 199801191.0 3.680000 0.00 0.00 0.010047 -0.011930 45.716244
22 2018-08-02 3.68 3.59 3.66 3.61 307920738.0 3.610000 0.00 -0.07 0.009155 -0.017088 34.884632
23 2018-08-03 3.62 3.57 3.59 3.61 184816985.0 3.610000 0.00 0.00 0.008356 -0.015596 34.884632
24 2018-08-06 3.66 3.60 3.62 3.61 189696153.0 3.610000 0.00 0.00 0.007637 -0.014256 34.884632
25 2018-08-07 3.66 3.61 3.63 3.65 216157642.0 3.650000 0.04 0.00 0.010379 -0.013048 44.302922
26 2018-08-08 3.66 3.61 3.65 3.63 215365540.0 3.630000 0.00 -0.02 0.009511 -0.013629 41.101805
27 2018-08-09 3.66 3.59 3.59 3.65 230275455.0 3.650000 0.02 0.00 0.010378 -0.012504 45.353992
28 2018-08-10 3.66 3.60 3.65 3.62 219157328.0 3.620000 0.00 -0.03 0.009530 -0.013933 40.617049
29 2018-08-13 3.59 3.54 3.58 3.56 270620120.0 3.560000 0.00 -0.06 0.008759 -0.017658 33.158019
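Since the Yahoo endpoint behind pandas_datareader has been unreliable, here is a self-contained sketch of the same RSI computation on synthetic prices (the rsi helper and the synthetic series are illustrative, not part of the original code):

```python
import numpy as np
import pandas as pd

def rsi(close, period=14):
    """RSI via Wilder-style smoothing (ewm with com = period - 1)."""
    chg = close.diff()
    gain = chg.mask(chg < 0, 0.0)
    loss = chg.mask(chg > 0, 0.0)  # losses kept negative, as in the code above
    avg_gain = gain.ewm(com=period - 1, min_periods=period).mean()
    avg_loss = loss.ewm(com=period - 1, min_periods=period).mean()
    rs = abs(avg_gain / avg_loss)
    return 100 - (100 / (1 + rs))

# synthetic close prices so the example runs without a data source
close = pd.Series(3.6 + 0.1 * np.sin(np.arange(60) / 3.0))
out = rsi(close)
print(out.tail(3))
```

The first `period` rows come out as NaN because of `min_periods`, matching the table above.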
In this case, I want to create a new column 'max close within 14 trade days':
'max close within 14 trade days' = the maximum 'Close' over the next 14 trading days.
For example, for row 0 the window runs from row 1 to row 14, so
'max close within 14 trade days' = 3.86
You can do the following:
# convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
# max Close per 14-calendar-day bin (note: this bins by calendar date, not a rolling window of the next 14 trading days)
df['max_close_within_14_trade_days'] = df['Date'].map(df.groupby([pd.Grouper(key='Date', freq='14D')])['Close'].max())
# forward-fill the gaps ('fillna(method=...)' is deprecated in newer pandas)
df['max_close_within_14_trade_days'] = df['max_close_within_14_trade_days'].ffill()
Date High Low Open Close max_close_within_14_trade_days
0 0 2018-07-03 3.87 3.76 3.83 3.86
1 1 2018-07-04 3.91 3.84 3.86 3.86
2 2 2018-07-05 3.70 3.62 3.68 3.86
3 3 2018-07-06 3.72 3.61 3.69 3.86
4 4 2018-07-09 3.75 3.68 3.70 3.86
Other solution:
df['max_close_within_14_trade_days'] = [df.loc[x+1:x+14,'Close'].max() for x in range(0, df.shape[0])]
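A vectorized alternative (a sketch using pandas' FixedForwardWindowIndexer, available since pandas 1.0) avoids the Python-level loop; the shift(-1) excludes the current row so the window covers the next 14 rows, exactly like the list comprehension:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# a short stand-in for the Close column above
df = pd.DataFrame({"Close": [3.84, 3.86, 3.68, 3.67, 3.69, 3.71, 3.64]})

# forward-looking window of 14 rows, starting at the row *after* the current one
indexer = FixedForwardWindowIndexer(window_size=14)
df["max_close_within_14_trade_days"] = (
    df["Close"].shift(-1).rolling(indexer, min_periods=1).max()
)
print(df)
```

The last row has no following rows, so it stays NaN, again matching the loop version.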
Related
I'm trying to merge two dfs (basically the same df at different times) using pd.concat.
here is my code:
Aujourdhui = datetime.datetime.now()
Aujourdhui = (Aujourdhui.strftime("%X"))
PerfsL1 = pd.read_html('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard', header=1)[0]
PerfsL1.columns = ['Équipe', 'Used_players', 'age', 'Possesion', "nb_matchs", "Starts", "Min",
'90s','Buts','Assists', 'No_penaltis', 'Penaltis', 'Penaltis_tentes',
'Cartons_jaunes', 'Cartons_rouges', 'Buts/90mn','Assists/90mn', 'B+A /90mn',
'NoPenaltis/90mn', 'B+A+P/90mn','Exp_buts','Exp_NoPenaltis', 'Exp_Assists', 'Exp_NP+A',
'Exp_buts/90mn', 'Exp_Assists/90mn','Exp_B+A/90mn','Exp_NoPenaltis/90mn', 'Exp_NP+A/90mn']
PerfsL1.insert(0, "Date", Aujourdhui)
print(PerfsL1)
PerfsL12 = pd.read_csv('Ligue_1_Perfs.csv', index_col=0)
print(PerfsL12)
PerfsL1 = pd.concat([PerfsL1, PerfsL12], ignore_index = True)
print (PerfsL1)
I successfully managed to get both df individually which are sharing the same columns, but I can't merge them, getting
ValueError: no types given.
Do you have an idea where it could be coming from ?
EDIT
Here are both dataframes:
'Ligue_1.csv'
Date Équipe Used_players age Possesion nb_matchs ... Exp_NP+A Exp_buts/90mn Exp_Assists/90mn Exp_B+A/90mn Exp_NoPenaltis/90mn Exp_NP+A/90mn
0 00:37:48 Ajaccio 18 29.1 34.5 2 ... 1.6 0.97 0.24 1.20 0.57 0.81
1 00:37:48 Angers 18 26.8 55.0 2 ... 5.9 1.78 1.18 2.96 1.78 2.96
2 00:37:48 Auxerre 15 29.4 39.5 2 ... 3.3 0.83 0.80 1.63 0.83 1.63
3 00:37:48 Brest 18 26.8 42.5 2 ... 5.0 1.67 1.23 2.90 1.28 2.51
4 00:37:48 Clermont Foot 18 27.8 48.5 2 ... 1.8 0.89 0.38 1.27 0.50 0.88
5 00:37:48 Lens 16 26.2 63.0 2 ... 5.6 1.92 1.29 3.21 1.53 2.82
6 00:37:48 Lille 18 27.2 65.0 2 ... 7.3 2.02 1.65 3.66 2.02 3.66
7 00:37:48 Lorient 14 25.8 36.0 1 ... 0.6 0.37 0.26 0.63 0.37 0.63
8 00:37:48 Lyon 15 26.0 68.0 1 ... 1.2 1.52 0.49 2.00 0.73 1.22
9 00:37:48 Marseille 17 26.9 55.0 2 ... 4.9 1.40 1.03 2.43 1.40 2.43
10 00:37:48 Monaco 19 24.8 40.5 2 ... 7.1 2.74 1.19 3.93 2.35 3.54
11 00:37:48 Montpellier 19 25.5 47.5 2 ... 3.2 0.93 0.66 1.59 0.93 1.59
12 00:37:48 Nantes 16 26.9 40.5 2 ... 3.9 1.37 0.60 1.97 1.37 1.97
13 00:37:48 Nice 18 25.9 54.0 2 ... 3.1 1.25 0.69 1.94 0.86 1.55
14 00:37:48 Paris S-G 18 27.6 60.0 2 ... 8.1 3.05 1.76 4.81 2.27 4.03
PerfsL1 = pd.read_html('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard', header=1)[0]
print(PerfsL1)
Date Équipe Used_players age Possesion nb_matchs ... Exp_NP+A Exp_buts/90mn Exp_Assists/90mn Exp_B+A/90mn Exp_NoPenaltis/90mn Exp_NP+A/90mn
0 09:56:18 Ajaccio 18 29.1 34.5 2 ... 1.6 0.97 0.24 1.20 0.57 0.81
1 09:56:18 Angers 18 26.8 55.0 2 ... 5.9 1.78 1.18 2.96 1.78 2.96
2 09:56:18 Auxerre 15 29.4 39.5 2 ... 3.3 0.83 0.80 1.63 0.83 1.63
3 09:56:18 Brest 18 26.8 42.5 2 ... 5.0 1.67 1.23 2.90 1.28 2.51
4 09:56:18 Clermont Foot 18 27.8 48.5 2 ... 1.8 0.89 0.38 1.27 0.50 0.88
5 09:56:18 Lens 16 26.2 63.0 2 ... 5.6 1.92 1.29 3.21 1.53 2.82
6 09:56:18 Lille 18 27.2 65.0 2 ... 7.3 2.02 1.65 3.66 2.02 3.66
7 09:56:18 Lorient 14 25.8 36.0 1 ... 0.6 0.37 0.26 0.63 0.37 0.63
8 09:56:18 Lyon 15 26.0 68.0 1 ... 1.2 1.52 0.49 2.00 0.73 1.22
9 09:56:18 Marseille 17 26.9 55.0 2 ... 4.9 1.40 1.03 2.43 1.40 2.43
10 09:56:18 Monaco 19 24.8 40.5 2 ... 7.1 2.74 1.19 3.93 2.35 3.54
11 09:56:18 Montpellier 19 25.5 47.5 2 ... 3.2 0.93 0.66 1.59 0.93 1.59
12 09:56:18 Nantes 16 26.9 40.5 2 ... 3.9 1.37 0.60 1.97 1.37 1.97
13 09:56:18 Nice 18 25.9 54.0 2 ... 3.1 1.25 0.69 1.94 0.86 1.55
Thank you for your support and have a great day!
Your code should work.
Nevertheless, try this before the concat:
PerfsL1["Date"] = pd.to_datetime(PerfsL1["Date"], format="%X", errors='coerce')
I finally managed to concat both tables.
The solution was to write both tables to CSV first (note that pd.read_html returns a list of DataFrames, hence the [0]):
table1 = pd.read_html('http://.......1........com')[0]
table1.to_csv('C://.....1........')
table1 = pd.read_csv('C://.....1........')
table2 = pd.read_html('http://.......2........com')[0]
table2.to_csv('C://.....2........')
table2 = pd.read_csv('C://.....2........')
x = pd.concat([table2, table1])
And now it works perfectly !
Thanks for your help !
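A likely root cause (a guess on my part, not confirmed by the traceback): pd.read_html returns a *list* of DataFrames while pd.read_csv returns a DataFrame, which is why the CSV round-trip "fixed" it. With the [0] indexing in place, the direct concat should work without the round-trip; a minimal sketch with stand-in frames:

```python
import pandas as pd

# stand-ins for the two scraped tables; real code would use pd.read_html(url)[0]
table1 = pd.DataFrame({"Équipe": ["Ajaccio"], "age": [29.1]})
table2 = pd.DataFrame({"Équipe": ["Angers"], "age": [26.8]})

# both frames share the same columns, so concat stacks them cleanly
x = pd.concat([table2, table1], ignore_index=True)
print(x)
```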
This is the original data, pbr2 (a DataFrame of PBR values for a list of companies):
KIS Stock 20100129 20100226 20100331 20100430 20100531
0 001402 036460 1.01 0.98 0.73 0.69 0.65
1 001471 033780 2.66 2.52 2.34 2.09 2.25
3 041800 035720 5.58 5.55 4.51 4.46 5.26
4 064847 032640 1.28 1.10 1.06 1.15 1.07
5 086512 069960 1.54 1.51 1.37 1.32 1.43
I have created a sorting function to sort multiple columns easily.
I have also created a for loop to repeat the sort_it function to iterate on multiple columns.
def sort_it(num):
    df = pd.DataFrame(pbr2.iloc[:, [1, num]])
    df = df.sort_values(df.columns[1])
    return df

for i in range(4, 10):
    df = sort_it(i)
    if i == 4:
        total_df = df
    else:
        total_df = pd.concat([total_df, df])
The problem is that when I use sort_it on a single column, it outputs correctly without any NaNs in the data. However, when I use the for loop, almost all of the data turns into NaN, like this:
Stock 20100331 20100430 20100531 20100630 20100730 20100831
87 016380 0.38 NaN NaN NaN NaN NaN
18 049770 0.43 NaN NaN NaN NaN NaN
47 003240 0.47 NaN NaN NaN NaN NaN
What I need is a sort_values (ascending order) of each column via the for loop, like the single-column sort_it(4) example, applied to every column.
KIS Stock 20100129 20100226 20100331 20100430 20100531
0 001402 036460 1.01 0.98 0.73 0.69 0.65
1 001471 033780 2.66 2.52 2.34 2.09 2.25
3 041800 035720 5.58 5.55 4.51 4.46 5.26
4 064847 032640 1.28 1.10 1.06 1.15 1.07
5 086512 069960 1.54 1.51 1.37 1.32 1.43
sort_it(4)
Stock 20100331
87 016380 0.38
18 049770 0.43
47 003240 0.47
52 007700 0.47
... ... ...
131 051600 5.08
22 051900 8.81
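For context on why the loop produces NaNs (an explanation sketched from the code above, not from the original thread): sort_it(i) returns a frame whose second column name differs for each i, and pd.concat with the default axis=0 aligns on column names, filling every mismatch with NaN. Concatenating along axis=1 keeps each sorted column side by side instead:

```python
import pandas as pd

# two frames with one shared and one differing column, as sort_it produces
a = pd.DataFrame({"Stock": ["016380", "049770"], "20100331": [0.38, 0.43]})
b = pd.DataFrame({"Stock": ["049770", "016380"], "20100430": [0.40, 0.50]})

# axis=0 stacks rows: the non-shared columns are filled with NaN
stacked = pd.concat([a, b])
print(stacked.isna().sum().sum())

# axis=1 places the frames side by side instead: no NaNs
wide = pd.concat([a.reset_index(drop=True), b.reset_index(drop=True)], axis=1)
print(wide.isna().sum().sum())
```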
I have an extract of a dataframe below:
ticker date open high low close
0 A2M 2020-08-28 18.45 18.71 17.39 17.47
1 A2M 2020-09-04 17.47 17.52 16.53 16.70
2 A2M 2020-09-11 16.70 16.97 16.13 16.45
3 A2M 2020-09-18 16.54 16.77 16.25 16.39
4 A2M 2020-09-25 16.36 17.13 16.32 17.02
5 AAN 2007-06-08 15.29 15.33 14.93 15.07
6 AAN 2007-06-15 15.10 15.23 14.95 15.18
7 AAN 2007-06-22 15.18 15.25 15.12 15.16
8 AAN 2007-06-29 15.14 15.25 15.11 15.22
9 AAN 2007-07-06 15.11 15.33 15.07 15.33
10 AAN 2007-07-13 15.29 15.35 15.12 15.26
11 AAN 2007-07-20 15.25 15.27 15.02 15.10
12 AAN 2007-07-27 15.05 15.15 14.00 14.82
13 AAN 2007-08-03 14.72 14.85 14.47 14.69
14 AAN 2007-08-10 14.56 14.90 14.22 14.54
15 AAN 2007-08-17 14.55 14.79 13.71 14.42
16 AAP 2000-10-06 7.11 7.14 7.10 7.12
17 AAP 2000-10-13 7.13 7.17 7.12 7.17
18 AAP 2000-10-20 7.16 7.25 7.16 7.23
19 AAP 2000-10-27 7.23 7.24 7.22 7.23
20 AAP 2000-11-03 7.16 7.25 7.12 7.25
21 AAP 2000-11-10 7.24 7.24 7.12 7.12
22 ABB 2002-07-26 2.70 3.05 2.60 2.95
23 ABB 2002-08-02 2.92 2.95 2.75 2.80
24 ABB 2002-08-09 2.80 2.84 2.70 2.70
25 ABB 2002-08-16 2.72 2.75 2.70 2.75
26 ABB 2002-08-23 2.71 2.85 2.71 2.75
27 ABB 2002-08-30 2.75 2.75 2.75 2.75
I've created the following code to find upPrices vs. downPrices:
i = 0
upPrices = []
downPrices = []
while i < len(df['close']):
    if i == 0:
        upPrices.append(0)
        downPrices.append(0)
    else:
        if (df['close'][i] - df['close'][i-1]) > 0:
            upPrices.append(df['close'][i] - df['close'][i-1])
            downPrices.append(0)
        else:
            downPrices.append(df['close'][i] - df['close'][i-1])
            upPrices.append(0)
    i += 1
df['upPrices'] = upPrices
df['downPrices'] = downPrices
The result is the following dataframe:
ticker date open high low close upPrices downPrices
0 A2M 2020-08-28 18.45 18.71 17.39 17.47 0.00 0.00
1 A2M 2020-09-04 17.47 17.52 16.53 16.70 0.00 -0.77
2 A2M 2020-09-11 16.70 16.97 16.13 16.45 0.00 -0.25
3 A2M 2020-09-18 16.54 16.77 16.25 16.39 0.00 -0.06
4 A2M 2020-09-25 16.36 17.13 16.32 17.02 0.63 0.00
5 AAN 2007-06-08 15.29 15.33 14.93 15.07 0.00 -1.95
6 AAN 2007-06-15 15.10 15.23 14.95 15.18 0.11 0.00
7 AAN 2007-06-22 15.18 15.25 15.12 15.16 0.00 -0.02
8 AAN 2007-06-29 15.14 15.25 15.11 15.22 0.06 0.00
9 AAN 2007-07-06 15.11 15.33 15.07 15.33 0.11 0.00
10 AAN 2007-07-13 15.29 15.35 15.12 15.26 0.00 -0.07
11 AAN 2007-07-20 15.25 15.27 15.02 15.10 0.00 -0.16
12 AAN 2007-07-27 15.05 15.15 14.00 14.82 0.00 -0.28
13 AAN 2007-08-03 14.72 14.85 14.47 14.69 0.00 -0.13
14 AAN 2007-08-10 14.56 14.90 14.22 14.54 0.00 -0.15
15 AAN 2007-08-17 14.55 14.79 13.71 14.42 0.00 -0.12
16 AAP 2000-10-06 7.11 7.14 7.10 7.12 0.00 -7.30
17 AAP 2000-10-13 7.13 7.17 7.12 7.17 0.05 0.00
18 AAP 2000-10-20 7.16 7.25 7.16 7.23 0.06 0.00
19 AAP 2000-10-27 7.23 7.24 7.22 7.23 0.00 0.00
20 AAP 2000-11-03 7.16 7.25 7.12 7.25 0.02 0.00
21 AAP 2000-11-10 7.24 7.24 7.12 7.12 0.00 -0.13
22 ABB 2002-07-26 2.70 3.05 2.60 2.95 0.00 -4.17
23 ABB 2002-08-02 2.92 2.95 2.75 2.80 0.00 -0.15
24 ABB 2002-08-09 2.80 2.84 2.70 2.70 0.00 -0.10
25 ABB 2002-08-16 2.72 2.75 2.70 2.75 0.05 0.00
26 ABB 2002-08-23 2.71 2.85 2.71 2.75 0.00 0.00
27 ABB 2002-08-30 2.75 2.75 2.75 2.75 0.00 0.00
Unfortunately the logic is not correct. The upPrices and downPrices need to be for each ticker. At the moment, you can see that in rows 5, 16 and 22 it compares the previous close from another ticker. Essentially, I need this formula to groupby or some other means to restart at each ticker. However, when I try add in groupby it returns index length mismatch errors.
Please help!
Your intuition about groupby is correct: group by ticker, then diff the closing prices. You can use where to split the result into the up and down columns you wanted. Plus, no more loop! For something that only requires basic math operations, a vectorized approach is much better.
import pandas as pd

data = {"ticker": ["A2M","A2M","A2M","A2M","A2M","AAN","AAN","AAN","AAN"],
        "close": [17.47, 16.7, 16.45, 16.39, 17.02, 15.07, 15.18, 15.16, 15.22]}
df = pd.DataFrame(data)

# diff restarts at each ticker, so the first row per ticker is NaN
df["diff"] = df.groupby("ticker")["close"].diff()
# keep positive moves in upPrice and negative moves in downPrice; everything else is 0
df["upPrice"] = df["diff"].where(df["diff"] > 0, 0)
df["downPrice"] = df["diff"].where(df["diff"] < 0, 0)
del df["diff"]
print(df)
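One detail worth noting (a property of where, not something stated in the original answer): diff leaves NaN at each ticker's first row, and NaN > 0 evaluates to False, so where replaces those NaNs with 0 as well. No separate fillna is needed:

```python
import pandas as pd

df = pd.DataFrame({"ticker": ["A2M", "A2M", "AAN", "AAN"],
                   "close": [17.47, 16.70, 15.07, 15.18]})

diff = df.groupby("ticker")["close"].diff()  # NaN at each ticker's first row
up = diff.where(diff > 0, 0)                 # NaN -> 0, negatives -> 0
print(up.tolist())
```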
If I have a data frame like this:
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 -0.30 -0.38 -0.24 -0.10
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
I want to keep the values that are above 0 and change them to zero when they are lower.
I'm not sure how to use iterrows() and loc() to do so.
You can try:
df1 = df[df > 0].fillna(0)
as result:
In [24]: df
Out[24]:
            NEG_00_04  NEG_04_08  NEG_08_12  NEG_12_16  NEG_16_20  NEG_20_24
datum_von
2017-10-12      21.69      15.36       0.87       1.42       0.76       0.65
2017-10-13      11.85       8.08       1.39       2.86       1.02       0.55
2017-10-14       7.83       5.88       1.87       2.04       2.29       2.18
2017-10-15      14.64      11.28       2.62       3.35       2.13       1.25
2017-10-16       5.11       5.82      -0.30      -0.38      -0.24      -0.10
2017-10-17      12.09       9.61       0.20       1.09       0.39       0.57

In [25]: df1 = df[df > 0].fillna(0)

In [26]: df1
Out[26]:
            NEG_00_04  NEG_04_08  NEG_08_12  NEG_12_16  NEG_16_20  NEG_20_24
datum_von
2017-10-12      21.69      15.36       0.87       1.42       0.76       0.65
2017-10-13      11.85       8.08       1.39       2.86       1.02       0.55
2017-10-14       7.83       5.88       1.87       2.04       2.29       2.18
2017-10-15      14.64      11.28       2.62       3.35       2.13       1.25
2017-10-16       5.11       5.82       0.00       0.00       0.00       0.00
2017-10-17      12.09       9.61       0.20       1.09       0.39       0.57
clip_lower and mask solutions are good.
Here is another one with applymap:
df.applymap(lambda x: max(0.0, x))
Use clip_lower (note: clip_lower was removed in pandas 1.0; in current versions use the equivalent df.clip(lower=0)):
df = df.clip_lower(0)
print (df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
If first column is not index:
df = df.set_index('datum_von').clip_lower(0)
print (df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
Alternative solution:
df = df.mask(df < 0, 0)
print (df)
NEG_00_04 NEG_04_08 NEG_08_12 NEG_12_16 NEG_16_20 NEG_20_24
datum_von
2017-10-12 21.69 15.36 0.87 1.42 0.76 0.65
2017-10-13 11.85 8.08 1.39 2.86 1.02 0.55
2017-10-14 7.83 5.88 1.87 2.04 2.29 2.18
2017-10-15 14.64 11.28 2.62 3.35 2.13 1.25
2017-10-16 5.11 5.82 0.00 0.00 0.00 0.00
2017-10-17 12.09 9.61 0.20 1.09 0.39 0.57
Timings:
df = pd.concat([df]*10000).reset_index(drop=True)
In [240]: %timeit (df.applymap(lambda x: max(0.0, x)))
10 loops, best of 3: 164 ms per loop
In [241]: %timeit (df[df > 0].fillna(0))
100 loops, best of 3: 7.05 ms per loop
In [242]: %timeit (df.clip_lower(0))
1000 loops, best of 3: 1.96 ms per loop
In [243]: %timeit df.mask(df < 0, 0)
100 loops, best of 3: 5.18 ms per loop
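For newer pandas (1.0+), where clip_lower no longer exists, the modern equivalents are sketched below (timings above were not re-run; the toy frame stands in for the data above):

```python
import pandas as pd

# two of the question's columns, enough to show the clipping behavior
df = pd.DataFrame({"NEG_08_12": [0.87, -0.30], "NEG_12_16": [1.42, -0.38]})

print(df.clip(lower=0))    # replacement for the removed clip_lower(0)
print(df.mask(df < 0, 0))  # mask still works unchanged
```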
I want to extract a row by name from the following dataframe:
Unnamed: 1 1 1.1 2 TOT
0
1 DEPTH(m) 0.01 1.24 1.52 NaN
2 BD 33kpa(t/m3) 1.60 1.60 1.60 NaN
3 SAND(%) 42.10 42.10 65.10 NaN
4 SILT(%) 37.90 37.90 16.90 NaN
5 CLAY(%) 20.00 20.00 18.00 NaN
6 ROCK(%) 12.00 12.00 12.00 NaN
7 WLS(kg/ha) 2.60 8.20 0.10 10.9
8 WLM(kg/ha) 5.00 8.30 0.00 13.4
9 WLSL(kg/ha) 0.00 3.80 0.10 3.9
10 WLSC(kg/ha) 1.10 3.50 0.00 4.6
11 WLMC(kg/ha) 2.10 3.50 0.00 5.6
12 WLSLC(kg/ha) 0.00 1.60 0.00 1.6
13 WLSLNC(kg/ha) 1.10 1.80 0.00 2.9
14 WBMC(kg/ha) 3.40 835.10 195.20 1033.7
15 WHSC(kg/ha) 66.00 8462.00 1924.00 10451.0
16 WHPC(kg/ha) 146.00 18020.00 4102.00 22269.0
17 WOC(kg/ha) 219.00 27324.00 6221.00 34.0
18 WLSN(kg/ha) 0.00 0.00 0.00 0.0
19 WLMN(kg/ha) 0.00 0.10 0.00 0.1
20 WBMN(kg/ha) 0.50 92.60 19.30 112.5
21 WHSN(kg/ha) 7.00 843.00 191.00 1041.0
22 WHPN(kg/ha) 15.00 1802.00 410.00 2227.0
23 WON(kg/ha) 22.00 2738.00 621.00 3381.0
I want to extract the row containing info on WOC(kg/ha). here is what I am doing:
df.loc['WOC(kg/ha)']
but I get the error:
*** KeyError: 'the label [WOC(kg/ha)] is not in the [index]'
You don't have that label in your index; it's in your first column. The following should work:
df.loc[df['Unnamed: 1'] == 'WOC(kg/ha)']
Otherwise, set the index to that column and your code will work fine:
df.set_index('Unnamed: 1', inplace=True)
Also, you can set the index without explicitly specifying the column name: df.set_index(df.columns[0], inplace=True)
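A minimal sketch of both lookups (toy values standing in for the table above):

```python
import pandas as pd

# two rows from the question's table, with its auto-generated column names
df = pd.DataFrame({"Unnamed: 1": ["SAND(%)", "WOC(kg/ha)"],
                   "1": [42.10, 219.00],
                   "TOT": [float("nan"), 34.0]})

# boolean-mask lookup: no index change needed
row = df.loc[df["Unnamed: 1"] == "WOC(kg/ha)"]

# label lookup after promoting the first column to the index
df2 = df.set_index("Unnamed: 1")
print(df2.loc["WOC(kg/ha)"])
```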