I have a dataframe as below:
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
10000 .90 1.10 1.30 1.50 2.10 3.10 5.60 8.40 15.80
15000 1.35 1.65 1.95 2.25 3.15 4.65 8.40 12.60 23.70
20000 1.80 2.20 2.60 3.00 4.20 6.20 11.20 16.80 31.60
25000 2.25 2.75 3.25 3.75 5.25 7.75 14.00 21.00 39.50
30000 2.70 3.30 3.90 4.50 6.30 9.30 16.80 25.20 47.40
35000 3.15 3.85 4.55 5.25 7.35 10.85 19.60 29.40 55.30
40000 3.60 4.40 5.20 6.00 8.40 12.40 22.40 33.60 63.20
45000 4.05 4.95 5.85 6.75 9.45 13.95 25.20 37.80 71.10
50000 4.50 5.50 6.50 7.50 10.50 15.50 28.00 42.00 79.00
10000 .60 .80 1.00 1.20 1.80 2.80 5.30 8.10 15.50
15000 .90 1.20 1.50 1.80 2.70 4.20 7.95 12.15 23.25
20000 1.20 1.60 2.00 2.40 3.60 5.60 10.60 16.20 31.00
25000 1.50 2.00 2.50 3.00 4.50 7.00 13.25 20.25 38.75
30000 1.80 2.40 3.00 3.60 5.40 8.40 15.90 24.30 46.50
35000 2.10 2.80 3.50 4.20 6.30 9.80 18.55 28.35 54.25
40000 2.40 3.20 4.00 4.80 7.20 11.20 21.20 32.40 62.00
45000 2.70 3.60 4.50 5.40 8.10 12.60 23.85 36.45 69.75
50000 3.00 4.00 5.00 6.00 9.00 14.00 26.50 40.50 77.50
1000 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20
2000 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39
3000 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59
4000 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78
5000 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98
6000 1.17 1.17 1.17 1.17 1.17 1.17 1.17 1.17 1.17
7000 1.37 1.37 1.37 1.37 1.37 1.37 1.37 1.37 1.37
8000 1.56 1.56 1.56 1.56 1.56 1.56 1.56 1.56 1.56
9000 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76
10000 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95
Now I would like to split them into 3 dataframes based on 'Size':
df1: From the first 10000 - before the next occurrence of 10000
df2: Second 10000 - before 1000
df3: From 1000 to the end
Alternatively, it would be fine to have a temporary variable (temp column) in the same dataframe specifying categories like S1, S2 and S3 respectively for the above ranges.
Could anyone guide me on how to go about this?
Regards
Assuming that you want to break on the decreases, you could use the compare-cumsum-groupby pattern:
parts = list(df.groupby((df["Size"].diff() < 0).cumsum()))
which gives me (suppressing boring rows in the middle)
>>> for key, group in parts:
...     print(key)
...     print(group)
...     print("----")
...
0
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
0 10000 0.90 1.10 1.30 1.50 2.10 3.10 5.6 8.4 15.8
1 15000 1.35 1.65 1.95 2.25 3.15 4.65 8.4 12.6 23.7
2 20000 1.80 2.20 2.60 3.00 4.20 6.20 11.2 16.8 31.6
[...]
7 45000 4.05 4.95 5.85 6.75 9.45 13.95 25.2 37.8 71.1
8 50000 4.50 5.50 6.50 7.50 10.50 15.50 28.0 42.0 79.0
----
1
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
9 10000 0.6 0.8 1.0 1.2 1.8 2.8 5.30 8.10 15.50
10 15000 0.9 1.2 1.5 1.8 2.7 4.2 7.95 12.15 23.25
11 20000 1.2 1.6 2.0 2.4 3.6 5.6 10.60 16.20 31.00
[...]
16 45000 2.7 3.6 4.5 5.4 8.1 12.6 23.85 36.45 69.75
17 50000 3.0 4.0 5.0 6.0 9.0 14.0 26.50 40.50 77.50
----
2
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
18 1000 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20
19 2000 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39
20 3000 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59
[...]
26 9000 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76
27 10000 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95
----
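If you would rather keep everything in one dataframe with the temporary S1/S2/S3 column the question mentions, the same break-on-decrease marker can label the rows in place. A minimal sketch (the category names follow the question):

import pandas as pd

# Same marker as above: a new group starts wherever Size drops.
group_id = (df["Size"].diff() < 0).cumsum()
df["temp"] = group_id.map({0: "S1", 1: "S2", 2: "S3"})

# The three dataframes can then be recovered with boolean masks, e.g.:
df1 = df[df["temp"] == "S1"]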
Not so elegant but this works:
In [259]:
ranges=[]
first = df.index[0]
criteria = df.index[df['Size'].diff() < 0]
for idx in criteria:
    ranges.append((first, idx))
    first = idx
ranges
Out[259]:
[(0, 9), (9, 18)]
In [261]:
splits = []
for r in ranges:
    splits.append(df.iloc[r[0]:r[1]])
splits.append(df.iloc[ranges[-1][1]:])
splits
Out[261]:
[ Size C1 C2 C3 C4 C5 C6 C7 C8 C9
0 10000 0.90 1.10 1.30 1.50 2.10 3.10 5.6 8.4 15.8
1 15000 1.35 1.65 1.95 2.25 3.15 4.65 8.4 12.6 23.7
2 20000 1.80 2.20 2.60 3.00 4.20 6.20 11.2 16.8 31.6
3 25000 2.25 2.75 3.25 3.75 5.25 7.75 14.0 21.0 39.5
4 30000 2.70 3.30 3.90 4.50 6.30 9.30 16.8 25.2 47.4
5 35000 3.15 3.85 4.55 5.25 7.35 10.85 19.6 29.4 55.3
6 40000 3.60 4.40 5.20 6.00 8.40 12.40 22.4 33.6 63.2
7 45000 4.05 4.95 5.85 6.75 9.45 13.95 25.2 37.8 71.1
8 50000 4.50 5.50 6.50 7.50 10.50 15.50 28.0 42.0 79.0,
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
9 10000 0.6 0.8 1.0 1.2 1.8 2.8 5.30 8.10 15.50
10 15000 0.9 1.2 1.5 1.8 2.7 4.2 7.95 12.15 23.25
11 20000 1.2 1.6 2.0 2.4 3.6 5.6 10.60 16.20 31.00
12 25000 1.5 2.0 2.5 3.0 4.5 7.0 13.25 20.25 38.75
13 30000 1.8 2.4 3.0 3.6 5.4 8.4 15.90 24.30 46.50
14 35000 2.1 2.8 3.5 4.2 6.3 9.8 18.55 28.35 54.25
15 40000 2.4 3.2 4.0 4.8 7.2 11.2 21.20 32.40 62.00
16 45000 2.7 3.6 4.5 5.4 8.1 12.6 23.85 36.45 69.75
17 50000 3.0 4.0 5.0 6.0 9.0 14.0 26.50 40.50 77.50,
Size C1 C2 C3 C4 C5 C6 C7 C8 C9
18 1000 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.20
19 2000 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39 0.39
20 3000 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59 0.59
21 4000 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78
22 5000 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98
23 6000 1.17 1.17 1.17 1.17 1.17 1.17 1.17 1.17 1.17
24 7000 1.37 1.37 1.37 1.37 1.37 1.37 1.37 1.37 1.37
25 8000 1.56 1.56 1.56 1.56 1.56 1.56 1.56 1.56 1.56
26 9000 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76
27 10000 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95 1.95]
So firstly this looks to see where the size stops increasing:
df['Size'].diff() < 0
We use this to mask the index, which gives the positions where each new block starts; iterating over those positions builds the list of (start, end) tuples, and in the last step we use the tuples to slice the df.
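As an aside, the same break positions can be fed to np.split, which cuts the frame at those locations in one call. A sketch, assuming the default RangeIndex so labels and positions coincide (newer pandas versions may emit a warning for this idiom):

import numpy as np

criteria = df.index[df['Size'].diff() < 0]
# np.split slices before each listed position: [0:9], [9:18], [18:]
df1, df2, df3 = np.split(df, criteria)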
I'm trying to merge two dfs (basically the same df at two different times) using pd.concat.
Here is my code:
import datetime
import pandas as pd

Aujourdhui = datetime.datetime.now()
Aujourdhui = Aujourdhui.strftime("%X")
PerfsL1 = pd.read_html('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard', header=1)[0]
PerfsL1.columns = ['Équipe', 'Used_players', 'age', 'Possesion', "nb_matchs", "Starts", "Min",
'90s','Buts','Assists', 'No_penaltis', 'Penaltis', 'Penaltis_tentes',
'Cartons_jaunes', 'Cartons_rouges', 'Buts/90mn','Assists/90mn', 'B+A /90mn',
'NoPenaltis/90mn', 'B+A+P/90mn','Exp_buts','Exp_NoPenaltis', 'Exp_Assists', 'Exp_NP+A',
'Exp_buts/90mn', 'Exp_Assists/90mn','Exp_B+A/90mn','Exp_NoPenaltis/90mn', 'Exp_NP+A/90mn']
PerfsL1.insert(0, "Date", Aujourdhui)
print(PerfsL1)
PerfsL12 = pd.read_csv('Ligue_1_Perfs.csv', index_col=0)
print(PerfsL12)
PerfsL1 = pd.concat([PerfsL1, PerfsL12], ignore_index = True)
print (PerfsL1)
I successfully managed to get both dataframes individually, and they share the same columns, but I can't merge them; I get
ValueError: no types given.
Do you have an idea where this could be coming from?
EDIT
Here are both dataframes:
'Ligue_1.csv'
Date Équipe Used_players age Possesion nb_matchs ... Exp_NP+A Exp_buts/90mn Exp_Assists/90mn Exp_B+A/90mn Exp_NoPenaltis/90mn Exp_NP+A/90mn
0 00:37:48 Ajaccio 18 29.1 34.5 2 ... 1.6 0.97 0.24 1.20 0.57 0.81
1 00:37:48 Angers 18 26.8 55.0 2 ... 5.9 1.78 1.18 2.96 1.78 2.96
2 00:37:48 Auxerre 15 29.4 39.5 2 ... 3.3 0.83 0.80 1.63 0.83 1.63
3 00:37:48 Brest 18 26.8 42.5 2 ... 5.0 1.67 1.23 2.90 1.28 2.51
4 00:37:48 Clermont Foot 18 27.8 48.5 2 ... 1.8 0.89 0.38 1.27 0.50 0.88
5 00:37:48 Lens 16 26.2 63.0 2 ... 5.6 1.92 1.29 3.21 1.53 2.82
6 00:37:48 Lille 18 27.2 65.0 2 ... 7.3 2.02 1.65 3.66 2.02 3.66
7 00:37:48 Lorient 14 25.8 36.0 1 ... 0.6 0.37 0.26 0.63 0.37 0.63
8 00:37:48 Lyon 15 26.0 68.0 1 ... 1.2 1.52 0.49 2.00 0.73 1.22
9 00:37:48 Marseille 17 26.9 55.0 2 ... 4.9 1.40 1.03 2.43 1.40 2.43
10 00:37:48 Monaco 19 24.8 40.5 2 ... 7.1 2.74 1.19 3.93 2.35 3.54
11 00:37:48 Montpellier 19 25.5 47.5 2 ... 3.2 0.93 0.66 1.59 0.93 1.59
12 00:37:48 Nantes 16 26.9 40.5 2 ... 3.9 1.37 0.60 1.97 1.37 1.97
13 00:37:48 Nice 18 25.9 54.0 2 ... 3.1 1.25 0.69 1.94 0.86 1.55
14 00:37:48 Paris S-G 18 27.6 60.0 2 ... 8.1 3.05 1.76 4.81 2.27 4.03
PerfsL1, freshly scraped with pd.read_html('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard', header=1)[0]:
Date Équipe Used_players age Possesion nb_matchs ... Exp_NP+A Exp_buts/90mn Exp_Assists/90mn Exp_B+A/90mn Exp_NoPenaltis/90mn Exp_NP+A/90mn
0 09:56:18 Ajaccio 18 29.1 34.5 2 ... 1.6 0.97 0.24 1.20 0.57 0.81
1 09:56:18 Angers 18 26.8 55.0 2 ... 5.9 1.78 1.18 2.96 1.78 2.96
2 09:56:18 Auxerre 15 29.4 39.5 2 ... 3.3 0.83 0.80 1.63 0.83 1.63
3 09:56:18 Brest 18 26.8 42.5 2 ... 5.0 1.67 1.23 2.90 1.28 2.51
4 09:56:18 Clermont Foot 18 27.8 48.5 2 ... 1.8 0.89 0.38 1.27 0.50 0.88
5 09:56:18 Lens 16 26.2 63.0 2 ... 5.6 1.92 1.29 3.21 1.53 2.82
6 09:56:18 Lille 18 27.2 65.0 2 ... 7.3 2.02 1.65 3.66 2.02 3.66
7 09:56:18 Lorient 14 25.8 36.0 1 ... 0.6 0.37 0.26 0.63 0.37 0.63
8 09:56:18 Lyon 15 26.0 68.0 1 ... 1.2 1.52 0.49 2.00 0.73 1.22
9 09:56:18 Marseille 17 26.9 55.0 2 ... 4.9 1.40 1.03 2.43 1.40 2.43
10 09:56:18 Monaco 19 24.8 40.5 2 ... 7.1 2.74 1.19 3.93 2.35 3.54
11 09:56:18 Montpellier 19 25.5 47.5 2 ... 3.2 0.93 0.66 1.59 0.93 1.59
12 09:56:18 Nantes 16 26.9 40.5 2 ... 3.9 1.37 0.60 1.97 1.37 1.97
13 09:56:18 Nice 18 25.9 54.0 2 ... 3.1 1.25 0.69 1.94 0.86 1.55
Thank you for your support and have a great day!
Your code should work.
Nevertheless, try this before the concat:
PerfsL1["Date"] = pd.to_datetime(PerfsL1["Date"], format="%X", errors='coerce')
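If the error persists, it can also help to verify that the two frames really line up before concatenating. A small check sketch, using the variable names from the question:

print(PerfsL1.columns.equals(PerfsL12.columns))  # True if names and order match
print(PerfsL1.dtypes)                            # compare dtypes side by side
print(PerfsL12.dtypes)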
I finally managed to concat both tables.
The solution was to save both as CSV first:
table1 = pd.read_html('http://.......1........com')[0]   # read_html returns a list of tables
table1.to_csv('C://.....1........')
table1 = pd.read_csv('C://.....1........')
table2 = pd.read_html('http://.......2........com')[0]
table2.to_csv('C://.....2........')
table2 = pd.read_csv('C://.....2........')
x = pd.concat([table2, table1])
And now it works perfectly!
Thanks for your help!
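For what it's worth, the CSV round-trip can be skipped: pd.read_html returns a list of dataframes, so taking the first table from each page and concatenating directly is equivalent. A sketch, with placeholder URLs standing in for the elided ones above:

import pandas as pd

table1 = pd.read_html('http://example.com/page1')[0]   # placeholder URL
table2 = pd.read_html('http://example.com/page2')[0]   # placeholder URL
x = pd.concat([table2, table1], ignore_index=True)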
I have a dataframe where I am trying to filter rows using boolean constraints with numeric thresholds.
hrw_hotdry=combined_hrw[(combined_hrw['June_anom']<0) & (combined_hrw['June_anom_t'])>0]
hrw_hotdry.head()
Year June_val June_anom July_val July_anom June_val_t June_anom_t July_val_t July_anom_t
0 1980 2.14 -1.40 0.99 -2.11 76.7 2.6 83.7 5.0
1 1981 2.85 -0.69 4.01 0.91 75.5 1.4 79.1 0.4
8 1988 2.08 -1.46 3.22 0.12 76.2 2.1 77.5 -1.2
10 1990 1.88 -1.66 3.16 0.06 77.3 3.2 76.7 -2.0
11 1991 3.13 -0.41 2.69 -0.41 75.1 1.0 78.4 -0.3
However, when I change the second constraint to 1 like this:
hrw_hotdry=combined_hrw[(combined_hrw['June_anom']<0) & (combined_hrw['June_anom_t'])>1]
hrw_hotdry.head()
Year June_val June_anom July_val July_anom June_val_t June_anom_t July_val_t July_anom_t
There is no output. How does this make sense?
The parentheses were incorrect: the closing parenthesis after combined_hrw['June_anom_t'] left the >1 outside, so the comparison was applied to the result of the whole & expression rather than to the column. With the comparison moved inside the parentheses:
hrw_hotdry = combined_hrw[(combined_hrw['June_anom']<0) & (combined_hrw['June_anom_t']>1.0)]
Year June_val June_anom July_val July_anom June_val_t June_anom_t July_val_t July_anom_t
0 1980 2.14 -1.40 0.99 -2.11 76.7 2.6 83.7 5.0
1 1981 2.85 -0.69 4.01 0.91 75.5 1.4 79.1 0.4
8 1988 2.08 -1.46 3.22 0.12 76.2 2.1 77.5 -1.2
10 1990 1.88 -1.66 3.16 0.06 77.3 3.2 76.7 -2.0
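The underlying pitfall is operator precedence: in Python, & binds tighter than >, so a & b > 1 parses as (a & b) > 1. A minimal illustration with made-up series:

import pandas as pd

anom = pd.Series([-1.40, -0.69, 0.50])
anom_t = pd.Series([2.6, 1.4, 0.4])

# Correct: wrap each comparison before combining with &.
mask = (anom < 0) & (anom_t > 1.0)
print(anom[mask])   # keeps the first two rows

# With the stray parenthesis, (anom < 0) & anom_t is evaluated first,
# and the > 1.0 is then applied to that intermediate result.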
I have DataFrame that looks like this
exec ms tp lu ru
0 exec1 16.0 240.87 2.30 0.85
1 exec1 16.0 243.72 2.35 0.84
2 exec1 16.0 234.16 2.38 0.92
3 exec1 16.0 244.71 2.35 0.84
4 exec1 16.0 240.74 2.39 0.90
5 exec1 128.0 1686.78 2.09 0.69
6 exec1 128.0 1704.36 2.00 0.44
7 exec1 128.0 1686.45 2.07 0.60
8 exec1 128.0 1722.61 2.07 0.45
9 exec1 128.0 1726.15 2.08 0.50
10 exec1 1024.0 5754.92 2.23 0.93
11 exec1 1024.0 5740.71 2.24 0.93
12 exec1 1024.0 5751.58 2.24 0.96
13 exec1 1024.0 5819.63 2.23 0.92
14 exec1 1024.0 5797.03 2.22 0.96
15 exec1 8192.0 37833.45 1.91 3.87
16 exec1 8192.0 38154.95 2.00 3.87
17 exec1 8192.0 38178.19 2.02 3.85
18 exec1 8192.0 38152.86 1.95 3.84
19 exec1 8192.0 35209.98 1.80 3.65
20 exec1 16384.0 38109.76 1.81 3.84
21 exec1 16384.0 38059.07 1.76 3.90
22 exec1 16384.0 36683.24 1.54 3.71
23 exec1 16384.0 37908.00 1.73 3.85
24 exec1 16384.0 37014.79 1.71 3.75
and I would like to turn the ms values into columns for the tp, lu and ru data, have them as hierarchical columns, and use exec as the index, like this:
lu ru tp
exec 16.0 128.0 1024.0 8192.0 16384.0 16.0 128.0 1024.0 8192.0 16384.0 16.0 128.0 1024.0 8192.0 16384.0
exec1 2.30 2.09 2.23 1.91 1.81 0.85 0.69 0.93 3.87 3.84 240.87 1686.78 5754.92 37833.45 38109.76
exec1 2.35 2.00 2.24 2.00 1.76 0.84 0.44 0.93 3.87 3.90 243.72 1704.36 5740.71 38154.95 38059.07
exec1 2.38 2.07 2.24 2.02 1.54 0.92 0.60 0.96 3.85 3.71 234.16 1686.45 5751.58 38178.19 36683.24
exec1 2.35 2.07 2.23 1.95 1.73 0.84 0.45 0.92 3.84 3.85 244.71 1722.61 5819.63 38152.86 37908.00
exec1 2.39 2.08 2.22 1.80 1.71 0.90 0.50 0.96 3.65 3.75 240.74 1726.15 5797.03 35209.98 37014.79
I tried using pd.pivot_table but it creates unwanted NaNs.
You may need groupby + cumcount to create an additional key, then do the pivot transform; here I am using unstack (a pivot_table alternative is sketched after the one-liner):
(df.assign(key=df.groupby(['exec', 'ms']).cumcount())   # number the repeats within each (exec, ms) pair
   .set_index(['exec', 'ms', 'key'])
   .unstack(1))                                          # move 'ms' up into the columns
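The same key also makes pivot_table behave, since each (exec, key, ms) cell then holds exactly one value and no unwanted NaNs are introduced. A sketch of that variant:

out = (df.assign(key=df.groupby(['exec', 'ms']).cumcount())
         .pivot_table(index=['exec', 'key'], columns='ms', values=['lu', 'ru', 'tp']))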
I'm trying to reshape this csv file in Python to have 1,000 rows and all 12 columns, but it shows up as only one column.
I've tried using df.iloc[:][:1000]; it shortens it to 1000 rows, but it still only gives me 1 column.
Dataframe for Wine
df_w= pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv')
df_w= df_w.iloc[:12][:1000]
df_w
The dataframe that shows up is 1000 rows with only 1 column, and I'm wondering how to get the data into its respective column titles.
Use sep=';' for the semicolon delimiter:
df_w= pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
df_w= df_w.iloc[:12][:1000]
df_w
Output:
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5
5 7.4 0.66 0.00 1.8 0.075 13.0 40.0 0.9978 3.51 0.56 9.4 5
6 7.9 0.60 0.06 1.6 0.069 15.0 59.0 0.9964 3.30 0.46 9.4 5
7 7.3 0.65 0.00 1.2 0.065 15.0 21.0 0.9946 3.39 0.47 10.0 7
8 7.8 0.58 0.02 2.0 0.073 9.0 18.0 0.9968 3.36 0.57 9.5 7
9 7.5 0.50 0.36 6.1 0.071 17.0 102.0 0.9978 3.35 0.80 10.5 5
10 6.7 0.58 0.08 1.8 0.097 15.0 65.0 0.9959 3.28 0.54 9.2 5
11 7.5 0.50 0.36 6.1 0.071 17.0 102.0 0.9978 3.35 0.80 10.5 5
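Note that chained indexing like df_w.iloc[:12][:1000] takes the first 12 rows and then slices those rows again, which is why only 12 rows appear above. To take the first 1000 rows and all 12 columns in one step, pass both axes to a single .iloc call:

df_w = df_w.iloc[:1000, :12]   # rows 0-999, all 12 columns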
Python code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as wb
stock='3988.HK'
df = wb.DataReader(stock,data_source='yahoo',start='2018-07-01')
rsi_period = 14
chg = df['Close'].diff(1)
gain = chg.mask(chg < 0, 0)    # keep positive changes, zero out the rest
df['Gain'] = gain
loss = chg.mask(chg > 0, 0)    # keep negative changes (losses stay negative)
df['Loss'] = loss
avg_gain = gain.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
avg_loss = loss.ewm(com=rsi_period - 1, min_periods=rsi_period).mean()
df['Avg Gain'] = avg_gain
df['Avg Loss'] = avg_loss
rs = abs(avg_gain / avg_loss)  # abs() because the losses were kept negative
rsi = 100 - (100 / (1 + rs))
df['RSI'] = rsi
df.reset_index(inplace=True)
df
Output:
Date High Low Open Close Volume Adj Close Gain Loss Avg Gain Avg Loss RSI
0 2018-07-03 3.87 3.76 3.83 3.84 684899302.0 3.629538 NaN NaN NaN NaN NaN
1 2018-07-04 3.91 3.84 3.86 3.86 460325574.0 3.648442 0.02 0.00 NaN NaN NaN
2 2018-07-05 3.70 3.62 3.68 3.68 292810499.0 3.680000 0.00 -0.18 NaN NaN NaN
3 2018-07-06 3.72 3.61 3.69 3.67 343653088.0 3.670000 0.00 -0.01 NaN NaN NaN
4 2018-07-09 3.75 3.68 3.70 3.69 424596186.0 3.690000 0.02 0.00 NaN NaN NaN
5 2018-07-10 3.74 3.70 3.71 3.71 327048051.0 3.710000 0.02 0.00 NaN NaN NaN
6 2018-07-11 3.65 3.61 3.63 3.64 371355401.0 3.640000 0.00 -0.07 NaN NaN NaN
7 2018-07-12 3.69 3.63 3.66 3.66 309888328.0 3.660000 0.02 0.00 NaN NaN NaN
8 2018-07-13 3.69 3.62 3.69 3.63 261928758.0 3.630000 0.00 -0.03 NaN NaN NaN
9 2018-07-16 3.63 3.57 3.61 3.62 306970074.0 3.620000 0.00 -0.01 NaN NaN NaN
10 2018-07-17 3.62 3.56 3.62 3.58 310294921.0 3.580000 0.00 -0.04 NaN NaN NaN
11 2018-07-18 3.61 3.55 3.58 3.58 334592695.0 3.580000 0.00 0.00 NaN NaN NaN
12 2018-07-19 3.61 3.56 3.61 3.56 211984563.0 3.560000 0.00 -0.02 NaN NaN NaN
13 2018-07-20 3.64 3.52 3.57 3.61 347506394.0 3.610000 0.05 0.00 NaN NaN NaN
14 2018-07-23 3.65 3.57 3.59 3.62 313125328.0 3.620000 0.01 0.00 0.010594 -0.021042 33.487100
15 2018-07-24 3.71 3.60 3.60 3.68 367627204.0 3.680000 0.06 0.00 0.015854 -0.018802 45.745967
16 2018-07-25 3.73 3.68 3.72 3.69 270460990.0 3.690000 0.01 0.00 0.015252 -0.016868 47.483263
17 2018-07-26 3.73 3.66 3.72 3.69 234388072.0 3.690000 0.00 0.00 0.013731 -0.015186 47.483263
18 2018-07-27 3.70 3.66 3.68 3.69 190039532.0 3.690000 0.00 0.00 0.012399 -0.013713 47.483263
19 2018-07-30 3.72 3.67 3.68 3.70 163971848.0 3.700000 0.01 0.00 0.012172 -0.012417 49.502851
20 2018-07-31 3.70 3.66 3.67 3.68 168486023.0 3.680000 0.00 -0.02 0.011047 -0.013118 45.716244
21 2018-08-01 3.72 3.66 3.71 3.68 199801191.0 3.680000 0.00 0.00 0.010047 -0.011930 45.716244
22 2018-08-02 3.68 3.59 3.66 3.61 307920738.0 3.610000 0.00 -0.07 0.009155 -0.017088 34.884632
23 2018-08-03 3.62 3.57 3.59 3.61 184816985.0 3.610000 0.00 0.00 0.008356 -0.015596 34.884632
24 2018-08-06 3.66 3.60 3.62 3.61 189696153.0 3.610000 0.00 0.00 0.007637 -0.014256 34.884632
25 2018-08-07 3.66 3.61 3.63 3.65 216157642.0 3.650000 0.04 0.00 0.010379 -0.013048 44.302922
26 2018-08-08 3.66 3.61 3.65 3.63 215365540.0 3.630000 0.00 -0.02 0.009511 -0.013629 41.101805
27 2018-08-09 3.66 3.59 3.59 3.65 230275455.0 3.650000 0.02 0.00 0.010378 -0.012504 45.353992
28 2018-08-10 3.66 3.60 3.65 3.62 219157328.0 3.620000 0.00 -0.03 0.009530 -0.013933 40.617049
29 2018-08-13 3.59 3.54 3.58 3.56 270620120.0 3.560000 0.00 -0.06 0.008759 -0.017658 33.158019
In this case, I want to create a new column 'max close within 14 trade days', where
'max close within 14 trade days' = the maximum 'Close' within the next 14 trade days.
For example, in row 0 the data range should be from row 1 to row 15, so
'max close within 14 trade days' = 3.86.
You can do the following:
# convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
# max of Close within each 14-calendar-day bucket
df['max_close_within_14_trade_days'] = df['Date'].map(df.groupby([pd.Grouper(key='Date', freq='14D')])['Close'].max())
# fill missing values with the previous value
df['max_close_within_14_trade_days'].fillna(method='ffill', inplace=True)
index Date High Low Open Close max_close_within_14_trade_days
0 0 2018-07-03 3.87 3.76 3.83 3.84 3.86
1 1 2018-07-04 3.91 3.84 3.86 3.86 3.86
2 2 2018-07-05 3.70 3.62 3.68 3.68 3.86
3 3 2018-07-06 3.72 3.61 3.69 3.67 3.86
4 4 2018-07-09 3.75 3.68 3.70 3.69 3.86
Another solution:
# forward-looking max over the next 14 rows (df.loc slicing is end-inclusive)
df['max_close_within_14_trade_days'] = [df.loc[x+1:x+14, 'Close'].max() for x in range(df.shape[0])]
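A vectorized sketch of the same forward-looking window, assuming the default RangeIndex: reversing the series lets rolling look ahead instead of back.

# max of rows i+1 .. i+14 for each row i; the last row gets NaN
df['max_close_within_14_trade_days'] = (
    df['Close'][::-1].rolling(14, min_periods=1).max()[::-1].shift(-1)
)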