Python - faster way to run a for loop in a dataframe

I am running the following code to calculate, for every dataframe row, the number of positive days among the previous rows and the number of days on which the stock beat the S&P 500 index:
for offset in [1, 5, 15, 30, 45, 60, 75, 90, 120, 150,
               200, 250, 500, 750, 1000, 1250, 1500]:
    asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
    merged_data = pd.merge(asset, sp_500, on='Date')
    total_positive_days = 0
    total_beating_sp_days = 0
    for index, row in merged_data.iterrows():
        print(offset, index)
        for i in range(0, offset):
            if index - i - 1 > 0:
                if merged_data.loc[index-i, 'Close_x'] > merged_data.loc[index-i-1, 'Close_x']:
                    total_positive_days += 1
                if merged_data.loc[index-i, 'return_stock'] > merged_data.loc[index-i-1, 'return_sp']:
                    total_beating_sp_days += 1
This works, but it is quite slow. Is there a way to speed it up (possibly by getting rid of the for loops)?
My dataset looks like this (merged_data follows):
Date Open_x High_x Low_x Close_x Adj Close_x Volume_x return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5 Pct_positive_15 Pct_beating_15 Pct_change_15 Pct_change_plus_15 Pct_positive_30 Pct_beating_30 Pct_change_30 Pct_change_plus_30 Open_y High_y Low_y Close_y Adj Close_y Volume_y return_sp
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562 1307.2 1257.6 NaN 0.039186 1302.066667 1252.633333 NaN 0.056579 1116.560059 1133.869995 1116.560059 1132.989990 1132.989990 3991400000 0.016043
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062 1307.2 1257.6 NaN 0.031268 1302.066667 1252.633333 NaN 0.056423 1132.660034 1136.630005 1129.660034 1136.520020 1136.520020 2491020000 0.003116
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519 1307.2 1257.6 NaN 0.058608 1302.066667 1252.633333 NaN 0.046115 1135.709961 1139.189941 1133.949951 1137.140015 1137.140015 4972660000 0.000546
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491 1307.2 1257.6 NaN 0.096428 1302.066667 1252.633333 NaN 0.050694 1136.270020 1142.459961 1131.319946 1141.689941 1141.689941 5270680000 0.004001
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379 1307.2 1257.6 NaN 0.088584 1302.066667 1252.633333 NaN 0.075713 1140.520020 1145.390015 1136.219971 1144.979980 1144.979980 4389590000 0.002882
asset follows:
Date Open High Low Close Adj Close Volume return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379
sp_500 follows:
Date Open High Low Close Adj Close Volume return_sp
0 1999-12-31 1464.469971 1472.420044 1458.189941 1469.250000 1469.250000 374050000 NaN
1 2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 1455.219971 931800000 -0.009549
2 2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1399.420044 1009000000 -0.038345
3 2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1402.109985 1085500000 0.001922
4 2000-01-06 1402.109985 1411.900024 1392.099976 1403.449951 1403.449951 1092300000 0.000956

This is a partial answer.
I think the way you do
asset.Close - asset.Close.shift(1)
at the top is key to how you might do this. Instead of
if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']
create a column with the change in Close_x:
merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)
Similarly,
if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']
becomes
merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)
Then you can iterate over the offsets and count rows in boolean subsets like
merged_data[(merged_data['Delta_Close_x'] > 0) & (merged_data['vs_sp'] > 0)]
(note the & and the parentheses: Python's and does not work element-wise on pandas Series).
There are a lot of additional details to work out, but I hope this gets you started.
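For instance, a minimal vectorized sketch along these lines (assuming merged_data as shown above; the positive_*/beating_* result column names are made up here) could replace both inner loops with rolling sums of boolean flags:
import pandas as pd

# Sketch, assuming merged_data already holds Close_x, return_stock and return_sp.
merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)
merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)

for offset in [1, 5, 15, 30, 45, 60, 75, 90, 120, 150,
               200, 250, 500, 750, 1000, 1250, 1500]:
    # Count positive days / days beating the S&P over the previous `offset` rows.
    merged_data[f'positive_{offset}'] = (merged_data['Delta_Close_x'] > 0).rolling(offset).sum()
    merged_data[f'beating_{offset}'] = (merged_data['vs_sp'] > 0).rolling(offset).sum()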

Related

How to iterate through columns of the dataframe?

I want to go through all the columns of the dataframe, pull particular data from each column, and use those data to calculate values for another dataframe.
Here is what I have:
DP1 DP2 DP3 DP4 DP5 DP6 DP7 DP8 DP9 DP10 Total
OP1 357848.0 1124788.0 1735330.0 2218270.0 2745596.0 3319994.0 3466336.0 3606286.0 3833515.0 3901463.0 3901463.0
OP2 352118.0 1236139.0 2170033.0 3353322.0 3799067.0 4120063.0 4647867.0 4914039.0 5339085.0 NaN 5339085.0
OP3 290507.0 1292306.0 2218525.0 3235179.0 3985995.0 4132918.0 4628910.0 4909315.0 NaN NaN 4909315.0
OP4 310608.0 1418858.0 2195047.0 3757447.0 4029929.0 4381982.0 4588268.0 NaN NaN NaN 4588268.0
OP5 443160.0 1136350.0 2128333.0 2897821.0 3402672.0 3873311.0 NaN NaN NaN NaN 3873311.0
OP6 396132.0 1333217.0 2180715.0 2985752.0 3691712.0 NaN NaN NaN NaN NaN 3691712.0
OP7 440832.0 1288463.0 2419861.0 3483130.0 NaN NaN NaN NaN NaN NaN 3483130.0
OP8 359480.0 1421128.0 2864498.0 NaN NaN NaN NaN NaN NaN NaN 2864498.0
OP9 376686.0 1363294.0 NaN NaN NaN NaN NaN NaN NaN NaN 1363294.0
OP10 344014.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 344014.0
Total 3671385.0 11614543.0 17912342.0 21930921.0 21654971.0 19828268.0 17331381.0 13429640.0 9172600.0 3901463.0 34358090.0
Latest Observation 344014.0 1363294.0 2864498.0 3483130.0 3691712.0 3873311.0 4588268.0 4909315.0 5339085.0 3901463.0 NaN
From this table I would like to calculate the following formula: for column DP1, take the Total of column DP2 and divide it by (DP1's Total minus DP1's Latest Observation). We have to calculate this for all the columns and save the results in another dataframe.
We need a row like this:
Weighted Average 3.491 1.747 1.457 1.174 1.104 1.086 1.054 1.077 1.018
This is the code we tried:
LDFTriangledf['Weighted Average'] =CumulativePaidTriangledf.loc['Total','DP2']/(CumulativePaidTriangledf.loc['Total','DP1'] - CumulativePaidTriangledf.loc['Latest Observation','DP1'])
You can remove the column names from .loc and just shift(-1, axis=1) to get the next column's Total. This lets you apply the formula to all columns in a single operation:
CumulativePaidTriangledf.shift(-1, axis=1).loc['Total'] / (CumulativePaidTriangledf.loc['Total'] - CumulativePaidTriangledf.loc['Latest Observation'])
# DP1 3.490607
# DP2 1.747333
# DP3 1.457413
# DP4 1.173852
# DP5 1.103824
# DP6 1.086269
# DP7 1.053874
# DP8 1.076555
# DP9 1.017725
# DP10 inf
# Total NaN
# dtype: float64
Here is a breakdown of what the three components are doing:
A: .shift(-1, axis=1).loc['Total'] -- We are shifting the whole Total row to the left, so every column now has the next column's Total value.
B: .loc['Total'] -- This is the normal Total row.
C: .loc['Latest Observation'] -- This is the normal Latest Observation row.
A / (B - C) -- This is what the code above does: it takes the shifted Total row (A) and divides it by the difference of the current Total row (B) and the current Latest Observation row (C).

        A: next Total   B: Total        C: Latest Obs.   A / (B - C)
DP1     1.161454e+07    3.671385e+06    3.440140e+05     3.490607
DP2     1.791234e+07    1.161454e+07    1.363294e+06     1.747333
DP3     2.193092e+07    1.791234e+07    2.864498e+06     1.457413
DP4     2.165497e+07    2.193092e+07    3.483130e+06     1.173852
DP5     1.982827e+07    2.165497e+07    3.691712e+06     1.103824
DP6     1.733138e+07    1.982827e+07    3.873311e+06     1.086269
DP7     1.342964e+07    1.733138e+07    4.588268e+06     1.053874
DP8     9.172600e+06    1.342964e+07    4.909315e+06     1.076555
DP9     3.901463e+06    9.172600e+06    5.339085e+06     1.017725
DP10    34358090.0      3901463.0       3901463.0        inf
Total   NaN             34358090.0      NaN              NaN
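If the goal is then to store this as the Weighted Average row of another dataframe, here is a minimal sketch (assuming the row and column labels shown above; the LDFTriangledf construction is illustrative, not the asker's exact code):
# Compute the ratio for all columns, then keep DP1..DP9
# (DP10 divides by zero and Total by NaN) and store as a single row.
ldf = (CumulativePaidTriangledf.shift(-1, axis=1).loc['Total']
       / (CumulativePaidTriangledf.loc['Total']
          - CumulativePaidTriangledf.loc['Latest Observation']))
LDFTriangledf = ldf.drop(['DP10', 'Total']).to_frame('Weighted Average').T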

Selecting Rows in pandas multiindex using query

I have the following dataframe:
Attributes Adj Close
Symbols ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS BAJAJ-AUTO.NS BAJFINANCE.NS BAJAJFINSV.NS BHARTIARTL.NS INFRATEL.NS BPCL.NS BRITANNIA.NS ... TCS.NS TATAMOTORS.NS TATASTEEL.NS TECHM.NS TITAN.NS ULTRACEMCO.NS UPL.NS VEDL.NS WIPRO.NS ZEEL.NS
month day
1 1 279.239893 676.232860 290.424052 2324.556588 974.134152 3710.866499 290.157978 243.696764 146.170036 950.108271 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 240.371331 507.737111 236.844831 2340.821987 718.111446 3042.034076 277.125503 236.177303 122.136606 733.759396 ... -2.714824 2.830603 109.334502 -17.856865 13.293902 18.980020 0.689529 -0.006994 -3.862265 -10.423989
3 241.700116 498.997079 213.632179 2368.956136 746.050460 3292.162304 279.075750 231.213816 114.698633 686.986466 ... 0.075497 -0.629591 -0.241416 -0.260787 1.392858 -1.196444 -0.660421 -0.161608 -0.243293 -1.687734
4 223.532480 439.849441 201.245454 2391.910913 499.554044 2313.025635 287.582485 276.568762 104.650728 603.446742 ... -1.270405 0.178012 0.109399 -0.224380 -0.415277 -5.050810 -0.084462 -0.075032 3.924894 0.959136
5 213.588413 359.632790 187.594303 2442.596619 309.180993 1587.324934 260.401816 305.384079 95.571235 475.708696 ... -0.995601 -1.093621 0.214684 -1.189623 -2.503186 -0.511994 -0.512211 0.693024 -1.025715 -1.516946
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12 27 238.901700 500.376711 227.057510 2413.230611 748.599821 3299.320564 276.806537 242.597250 124.235449 727.263012 ... 2.770155 -4.410527 -0.031403 -5.315438 -1.792164 1.038870 -0.860125 -1.258880 -0.933370 -1.487581
28 236.105050 461.535601 218.893424 2375.671582 542.521903 2613.480190 284.374906 264.309625 117.807956 681.625725 ... -0.614677 -1.045941 0.688749 -0.375988 1.848569 -1.362454 37.301528 4.794349 -21.079648 -2.224608
29 215.606034 372.030459 203.876520 2450.112244 324.772498 1765.010912 257.278008 300.096024 108.679112 543.112336 ... 3.220893 -28.873421 0.197491 0.649738 0.737047 -6.121189 -1.165286 0.197648 0.250269 -0.064486
30 205.715512 432.342895 235.872734 2279.715479 515.535031 2164.257183 237.584375 253.401642 116.322402 634.503822 ... -1.190093 0.111826 -1.100066 -0.274475 -1.107278 -0.638013 -7.148901 -0.594369 -0.622608 0.368726
31 222.971462 490.784491 246.348255 2211.909688 670.891505 2671.694809 260.623987 230.032092 108.617400 719.389436 ... -1.950700 0.994181 -11.328524 -1.575859 -8.297147 1.151578 -0.059656 -0.650074 -0.648105 -0.749307
366 rows × 601 columns
To select the row for month 1 and day 1, I have used the following code:
df.query('month ==1' and 'day ==1')
But this produced the following dataframe:
Attributes Adj Close
Symbols ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS BAJAJ-AUTO.NS BAJFINANCE.NS BAJAJFINSV.NS BHARTIARTL.NS INFRATEL.NS BPCL.NS BRITANNIA.NS ... TCS.NS TATAMOTORS.NS TATASTEEL.NS TECHM.NS TITAN.NS ULTRACEMCO.NS UPL.NS VEDL.NS WIPRO.NS ZEEL.NS
month day
1 1 279.239893 676.232860 290.424052 2324.556588 974.134152 3710.866499 290.157978 243.696764 146.170036 950.108271 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1 215.752040 453.336287 213.741552 2373.224390 517.295897 2289.618629 280.212598 253.640594 104.505893 620.435294 ... -2.526060 -1.059128 -2.052233 3.941005 25.233763 -41.377432 1.032536 7.398859 -4.622867 -1.506376
3 1 233.534958 472.889636 204.900776 2318.030298 561.193189 2697.357413 254.006857 250.426263 106.528327 649.475321 ... -2.269081 -1.375370 -1.734496 27.675276 -1.944131 0.401074 -0.852499 -0.119033 -1.723600 -1.930760
4 1 192.280787 467.604906 227.369618 1982.318034 506.188324 1931.920305 252.626459 226.062386 98.663596 637.086713 ... -0.044923 -0.111909 -0.181328 -1.943672 1.983368 -1.677000 -0.531217 0.032385 -0.956535 -2.015332
5 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
6 1 230.836429 509.991614 218.370072 2463.180957 526.564244 2231.603166 289.425584 298.146594 118.566019 754.736115 ... -0.807933 -1.509616 1.792957 10.396550 -1.060003 2.008286 1.029651 6.690478 -3.114476 0.766063
7 1 197.943186 355.930544 242.388461 2168.834937 412.196744 1753.647647 233.189894 241.823186 90.870574 512.000742 ... -1.630295 11.019253 -0.244958 2.188104 -0.505939 -0.564639 -1.747775 -0.394980 -2.736355 -0.140087
8 1 236.361903 491.867703 218.289537 2102.183175 657.764627 2792.688073 264.695685 249.063224 108.213277 662.192035 ... -1.655988 -1.555488 -1.199192 -0.565774 -1.831832 -4.770262 -0.442534 -6.168488 -0.267261 -3.324977
9 1 229.131335 372.101859 225.172708 2322.747894 333.243305 1800.901049 246.923254 287.262203 114.754666 562.854895 ... -2.419973 0.205031 -1.096847 -0.840121 -2.932670 1.719342 6.196965 -2.674245 -6.542936 -2.526353
10 1 208.748352 429.829772 222.081509 2095.421448 553.005620 2204.335371 259.718945 229.177512 102.475334 641.439810 ... 0.752312 -1.371583 -1.367145 -5.607321 3.259092 26.787332 -1.023199 -0.589042 0.507405 2.428903
11 1 248.233805 545.774276 241.743095 2390.945333 803.738236 3088.686081 277.757322 243.703551 131.933623 789.243830 ... -1.882445 -0.660089 -0.476966 -1.097497 -0.525270 -0.857579 -0.702017 0.016806 -0.792296 -0.368364
12 1 200.472858 353.177721 200.870312 2451.274841 295.858735 1556.379498 255.714673 301.000198 103.908244 514.528562 ... -0.789445 -14.382776 0.196276 -0.394203 7.600042 48.345830 -0.276618 -0.411825 2.271997 42.734886
12 rows × 601 columns
But it has produced day 1 of every month instead of the single row for month 1 and day 1. What can I do to resolve this issue?
Use one string for the whole expression. In 'month ==1' and 'day ==1', Python's and operates on two non-empty strings before pandas ever sees the query, and the result is just the second string, so only day == 1 was actually queried. Instead:
df.query('month == 1 and day == 1')
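As an aside, the same row can also be pulled straight from the MultiIndex with .loc; a small sketch, assuming month and day are the two index levels as shown:
df.loc[(1, 1)]   # the month == 1, day == 1 row as a Series
df.xs((1, 1))    # the same row via a cross-section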

Why am I getting NaN values?

I have the following code:
data1 = data1.set_index('Date')
data1['ret'] = 1 - data1['Low'].div(data1['Close'].shift(freq='1d'))
data1['ret'] = data1['ret'].astype(float)*100
For some reason I am getting NaN values in the ret column:
High Low Open Close Volume Adj Close ret
Date
2020-01-24 3333.179932 3281.530029 3333.100098 3295.469971 3707130000 3295.469971 1.323394
2020-01-27 3258.850098 3234.500000 3247.159912 3243.629883 3823100000 3243.629883 NaN
2020-01-28 3285.780029 3253.219971 3255.350098 3276.239990 3526720000 3276.239990 -0.295659
2020-01-29 3293.469971 3271.889893 3289.459961 3273.399902 3584500000 3273.399902 0.132777
2020-01-30 3285.909912 3242.800049 3256.449951 3283.659912 3787250000 3283.659912 0.934803
2020-01-31 3282.330078 3214.679932 3282.330078 3225.520020 4527830000 3225.520020 2.100704
2020-02-03 3268.439941 3235.659912 3235.659912 3248.919922 3757910000 3248.919922 NaN
2020-02-04 3306.919922 3280.610107 3280.610107 3297.590088 3995320000 3297.590088 -0.975407
2020-02-05 3337.580078 3313.750000 3324.909912 3334.689941 4117730000 3334.689941 -0.490052
2020-02-06 3347.959961 3334.389893 3344.919922 3345.780029 3868370000 3345.780029 0.008998
2020-02-07 3341.419922 3322.120117 3335.540039 3327.709961 3730650000 3327.709961 0.707157
2020-02-10 3352.260010 3317.770020 3318.280029 3352.090088 3450350000 3352.090088 NaN
2020-02-11 3375.629883 3352.719971 3365.870117 3357.750000 3760550000 3357.750000 -0.018791
2020-02-12 3381.469971 3369.719971 3370.500000 3379.449951 3926380000 3379.449951 -0.356488
2020-02-13 3385.090088 3360.520020 3365.899902 3373.939941 3498240000 3373.939941 0.560148
2020-02-14 3380.689941 3366.149902 3378.080078 3380.159912 3398040000 3380.159912 0.230888
2020-02-18 3375.010010 3355.610107 3369.040039 3370.290039 3746720000 3370.290039 NaN
2020-02-19 3393.520020 3378.830078 3380.389893 3386.149902 3600150000 3386.149902 -0.253392
2020-02-20 3389.149902 3341.020020 3380.449951 3373.229980 4007320000 3373.229980 1.332779
2020-02-21 3360.760010 3328.449951 3360.500000 3337.750000 3899270000 3337.750000 1.327512
2020-02-24 3259.810059 3214.649902 3257.610107 3225.889893 4842960000 3225.889893 NaN
2020-02-25 3246.989990 3118.770020 3238.939941 3128.209961 5591510000 3128.209961 3.320630
2020-02-26 3182.510010 3108.989990 3139.899902 3116.389893 5478110000 3116.389893 0.614408
2020-02-27 3097.070068 2977.389893 3062.540039 2978.760010 7058840000 2978.760010 4.460289
2020-02-28 2959.719971 2855.840088 2916.899902 2954.219971 8563850000 2954.219971 4.126547
2020-03-02 3090.959961 2945.189941 2974.280029 3090.229980 6376400000 3090.229980 NaN
2020-03-03 3136.719971 2976.629883 3096.459961 3003.370117 6355940000 3003.370117 3.676105
2020-03-04 3130.969971 3034.379883 3045.750000 3130.120117 5035480000 3130.120117 -1.032499
2020-03-05 3083.040039 2999.830078 3075.699951 3023.939941 5575550000 3023.939941 4.162461
2020-03-06 2985.929932 2901.540039 2954.199951 2972.370117 6552140000 2972.370117 4.047696
2020-03-09 2863.889893 2734.429932 2863.889893 2746.560059 8423050000 2746.560059 NaN
2020-03-10 2882.590088 2734.000000 2813.479980 2882.229980 7635960000 2882.229980 0.457301
2020-03-11 2825.600098 2707.219971 2825.600098 2741.379883 7374110000 2741.379883 6.072035
2020-03-12 2660.949951 2478.860107 2630.860107 2480.639893 8829380000 2480.639893 9.576191
2020-03-13 2711.330078 2492.370117 2569.989990 2711.020020 8258670000 2711.020020 -0.472871
2020-03-16 2562.979980 2380.939941 2508.590088 2386.129883 7781540000 2386.129883 NaN
2020-03-17 2553.929932 2367.040039 2425.659912 2529.189941 8358500000 2529.189941 0.800034
2020-03-18 2453.570068 2280.520020 2436.500000 2398.100098 8755780000 2398.100098 9.831999
2020-03-19 2466.969971 2319.780029 2393.479980 2409.389893 7946710000 2409.389893 3.265922
2020-03-20 2453.010010 2295.560059 2431.939941 2304.919922 9044690000 2304.919922 4.724426
2020-03-23 2300.729980 2191.860107 2290.709961 2237.399902 7402180000 2237.399902 NaN
2020-03-24 2449.709961 2344.439941 2344.439941 2447.330078 7547350000 2447.330078 -4.784126
2020-03-25 2571.419922 2407.530029 2457.770020 2475.560059 8285670000 2475.560059 1.626264
2020-03-26 2637.010010 2500.719971 2501.290039 2630.070068 7753160000 2630.070068 -1.016332
2020-03-27 2615.909912 2520.020020 2555.870117 2541.469971 6194330000 2541.469971 4.184301
2020-03-30 2631.800049 2545.280029 2558.979980 2626.649902 5746220000 2626.649902 NaN
Why am I getting NaN?
The reason for the missing values is that Series.shift with freq='1d' shifts by calendar days.
The DatetimeIndex is missing some dates because weekend rows were removed, so each Monday is computed from a non-existent Sunday and the output is NaN.
The solution is to remove the freq argument:
data1 = data1.set_index('Date')
data1['ret'] = 1 - data1['Low'].div(data1['Close'].shift())
data1['ret'] = data1['ret'].astype(float)*100
Each Monday then uses the value from the previous Friday, i.e. the preceding row rather than the preceding calendar day.
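Here is a minimal sketch of the difference, with made-up closing prices on three business days:
import pandas as pd

# Business days only: Fri 2020-01-24, Mon 2020-01-27, Tue 2020-01-28.
idx = pd.to_datetime(['2020-01-24', '2020-01-27', '2020-01-28'])
close = pd.Series([3295.47, 3243.63, 3276.24], index=idx)

# shift(freq='1d') moves each value to the next *calendar* day:
# Sat 01-25, Tue 01-28, Wed 01-29. Monday 01-27 gets no entry, because
# that would require a value on Sunday 01-26, which does not exist.
print(close / close.shift(freq='1d'))   # NaN on Monday

# Positional shift() pairs each row with the previous *row*,
# so Monday divides by Friday's close as intended.
print(close / close.shift())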

Rolling Mean not being calculated on a new column

I have an issue calculating the rolling mean for a column I added in the code. For some reason, it doesn't work on the column I added, but it works on a column from the original csv.
Original dataframe from the csv as follow:
Open High Low Last Change Volume Open Int
Time
09/20/19 98.50 99.00 98.35 98.95 0.60 3305.0 0.0
09/19/19 100.35 100.75 98.10 98.35 -2.00 17599.0 0.0
09/18/19 100.65 101.90 100.10 100.35 0.00 18258.0 121267.0
09/17/19 103.75 104.00 100.00 100.35 -3.95 34025.0 122453.0
09/16/19 102.30 104.95 101.60 104.30 1.55 21403.0 127447.0
Ticker = pd.read_csv('\\......\Historical data\kcz19 daily.csv',
                     index_col=0, parse_dates=True)
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1)).fillna('')
Ticker['ret20'] = Ticker['Return'].rolling(window=20, win_type='triang').mean()
print(Ticker.head())
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 -0.00608213
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 0.0201315
09/17/19 103.75 104.00 100.00 ... 122453.0 0 0
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 0.0386073
The ret20 column should hold the rolling mean of the Return column, so it should only show data starting from row 21, whereas here it is just a copy of the Return column.
If I use the Last column instead, it works.
Below is the result using column Last:
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0 NaN
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 NaN
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 NaN
09/17/19 103.75 104.00 100.00 ... 122453.0 0 NaN
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 NaN
09/13/19 103.25 103.60 102.05 ... 128707.0 -0.0149725 NaN
09/12/19 102.80 103.85 101.15 ... 128904.0 0.00823848 NaN
09/11/19 102.00 104.70 101.40 ... 132067.0 -0.00193237 NaN
09/10/19 98.50 102.25 98.00 ... 135349.0 -0.0175614 NaN
09/09/19 97.00 99.25 95.30 ... 137347.0 -0.0335283 NaN
09/06/19 95.35 97.30 95.00 ... 135399.0 -0.0122889 NaN
09/05/19 96.80 97.45 95.05 ... 136142.0 -0.0171477 NaN
09/04/19 95.65 96.95 95.50 ... 134864.0 0.0125002 NaN
09/03/19 96.00 96.60 94.20 ... 134685.0 -0.0109291 NaN
08/30/19 95.40 97.20 95.10 ... 134061.0 0.0135137 NaN
08/29/19 97.05 97.50 94.75 ... 132639.0 -0.0166584 NaN
08/28/19 97.40 98.15 95.95 ... 130573.0 0.0238601 NaN
08/27/19 97.35 98.00 96.40 ... 129921.0 -0.00410889 NaN
08/26/19 95.55 98.50 95.25 ... 129003.0 0.0035962 NaN
08/23/19 96.90 97.40 95.05 ... 130268.0 -0.0149835 98.97775
Appreciate any help
The .fillna('') puts a string in the first row, which breaks the rolling calculation in Ticker['ret20'].
Delete it and the code will run fine:
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1))
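The underlying issue is the dtype: filling a float column with the empty string makes it object, and rolling aggregations need numeric data. A quick sketch of the effect, with made-up values:
import numpy as np
import pandas as pd

# Made-up returns with a leading NaN, as np.log(x / x.shift(1)) produces.
ret = pd.Series([np.nan, -0.006, 0.020, 0.0, 0.038])
print(ret.dtype)              # float64 -- .rolling(...).mean() works
print(ret.fillna('').dtype)   # object  -- the '' string poisons the column,
                              # so the rolling mean can no longer aggregate it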

How to join a dataframe to a Series with different indices

I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    TR = max(df['High'][i+1], df['Close'][i]) - min(df['Low'][i+1], df['Close'][i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With a 14-period rolling window ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN, because the indexes differ between the data frame and ATR. Is there any way to add the ATR column into the data frame?
Consider shift to avoid the while loop across rows and list building. Below uses Union Pacific (UNP) railroad stock data to demonstrate:
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT DATA ONE DAY BACK AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735
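If you only need the original join to work, it is also enough to align the Series with the frame's index before joining; a small sketch, assuming ATR has one value per row of df as built in the question:
# TR_l starts with a 0 and gains one entry per remaining row, so ATR has
# exactly len(df) values; give it df's DatetimeIndex, then join.
ATR.index = df.index
df = df.join(ATR)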
