I have the following code:
data1 = data1.set_index('Date')
data1['ret'] = 1 - data1['Low'].div(data1['Close'].shift(freq='1d'))
data1['ret'] = data1['ret'].astype(float)*100
For some reason I am getting NaN values in the ret column:
High Low Open Close Volume Adj Close ret
Date
2020-01-24 3333.179932 3281.530029 3333.100098 3295.469971 3707130000 3295.469971 1.323394
2020-01-27 3258.850098 3234.500000 3247.159912 3243.629883 3823100000 3243.629883 NaN
2020-01-28 3285.780029 3253.219971 3255.350098 3276.239990 3526720000 3276.239990 -0.295659
2020-01-29 3293.469971 3271.889893 3289.459961 3273.399902 3584500000 3273.399902 0.132777
2020-01-30 3285.909912 3242.800049 3256.449951 3283.659912 3787250000 3283.659912 0.934803
2020-01-31 3282.330078 3214.679932 3282.330078 3225.520020 4527830000 3225.520020 2.100704
2020-02-03 3268.439941 3235.659912 3235.659912 3248.919922 3757910000 3248.919922 NaN
2020-02-04 3306.919922 3280.610107 3280.610107 3297.590088 3995320000 3297.590088 -0.975407
2020-02-05 3337.580078 3313.750000 3324.909912 3334.689941 4117730000 3334.689941 -0.490052
2020-02-06 3347.959961 3334.389893 3344.919922 3345.780029 3868370000 3345.780029 0.008998
2020-02-07 3341.419922 3322.120117 3335.540039 3327.709961 3730650000 3327.709961 0.707157
2020-02-10 3352.260010 3317.770020 3318.280029 3352.090088 3450350000 3352.090088 NaN
2020-02-11 3375.629883 3352.719971 3365.870117 3357.750000 3760550000 3357.750000 -0.018791
2020-02-12 3381.469971 3369.719971 3370.500000 3379.449951 3926380000 3379.449951 -0.356488
2020-02-13 3385.090088 3360.520020 3365.899902 3373.939941 3498240000 3373.939941 0.560148
2020-02-14 3380.689941 3366.149902 3378.080078 3380.159912 3398040000 3380.159912 0.230888
2020-02-18 3375.010010 3355.610107 3369.040039 3370.290039 3746720000 3370.290039 NaN
2020-02-19 3393.520020 3378.830078 3380.389893 3386.149902 3600150000 3386.149902 -0.253392
2020-02-20 3389.149902 3341.020020 3380.449951 3373.229980 4007320000 3373.229980 1.332779
2020-02-21 3360.760010 3328.449951 3360.500000 3337.750000 3899270000 3337.750000 1.327512
2020-02-24 3259.810059 3214.649902 3257.610107 3225.889893 4842960000 3225.889893 NaN
2020-02-25 3246.989990 3118.770020 3238.939941 3128.209961 5591510000 3128.209961 3.320630
2020-02-26 3182.510010 3108.989990 3139.899902 3116.389893 5478110000 3116.389893 0.614408
2020-02-27 3097.070068 2977.389893 3062.540039 2978.760010 7058840000 2978.760010 4.460289
2020-02-28 2959.719971 2855.840088 2916.899902 2954.219971 8563850000 2954.219971 4.126547
2020-03-02 3090.959961 2945.189941 2974.280029 3090.229980 6376400000 3090.229980 NaN
2020-03-03 3136.719971 2976.629883 3096.459961 3003.370117 6355940000 3003.370117 3.676105
2020-03-04 3130.969971 3034.379883 3045.750000 3130.120117 5035480000 3130.120117 -1.032499
2020-03-05 3083.040039 2999.830078 3075.699951 3023.939941 5575550000 3023.939941 4.162461
2020-03-06 2985.929932 2901.540039 2954.199951 2972.370117 6552140000 2972.370117 4.047696
2020-03-09 2863.889893 2734.429932 2863.889893 2746.560059 8423050000 2746.560059 NaN
2020-03-10 2882.590088 2734.000000 2813.479980 2882.229980 7635960000 2882.229980 0.457301
2020-03-11 2825.600098 2707.219971 2825.600098 2741.379883 7374110000 2741.379883 6.072035
2020-03-12 2660.949951 2478.860107 2630.860107 2480.639893 8829380000 2480.639893 9.576191
2020-03-13 2711.330078 2492.370117 2569.989990 2711.020020 8258670000 2711.020020 -0.472871
2020-03-16 2562.979980 2380.939941 2508.590088 2386.129883 7781540000 2386.129883 NaN
2020-03-17 2553.929932 2367.040039 2425.659912 2529.189941 8358500000 2529.189941 0.800034
2020-03-18 2453.570068 2280.520020 2436.500000 2398.100098 8755780000 2398.100098 9.831999
2020-03-19 2466.969971 2319.780029 2393.479980 2409.389893 7946710000 2409.389893 3.265922
2020-03-20 2453.010010 2295.560059 2431.939941 2304.919922 9044690000 2304.919922 4.724426
2020-03-23 2300.729980 2191.860107 2290.709961 2237.399902 7402180000 2237.399902 NaN
2020-03-24 2449.709961 2344.439941 2344.439941 2447.330078 7547350000 2447.330078 -4.784126
2020-03-25 2571.419922 2407.530029 2457.770020 2475.560059 8285670000 2475.560059 1.626264
2020-03-26 2637.010010 2500.719971 2501.290039 2630.070068 7753160000 2630.070068 -1.016332
2020-03-27 2615.909912 2520.020020 2555.870117 2541.469971 6194330000 2541.469971 4.184301
2020-03-30 2631.800049 2545.280029 2558.979980 2626.649902 5746220000 2626.649902 NaN
Why am I getting NaN?
The values are missing because Series.shift with freq='1d' shifts the index by calendar days rather than by rows.
Your DatetimeIndex has no weekend dates, so each Monday would need a value shifted from a Sunday that does not exist in the data, and the result is NaN.
The solution is to drop the freq argument:
data1 = data1.set_index('Date')
data1['ret'] = 1 - data1['Low'].div(data1['Close'].shift())
data1['ret'] = data1['ret'].astype(float)*100
Then each Monday uses the Close value from the previous Friday.
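To make the difference concrete, here is a minimal sketch on three made-up prices (Friday, Monday, Tuesday). The values and variable names are illustrative assumptions, not taken from the question's data.

```python
import pandas as pd

# Hypothetical prices on business days only (Fri, Mon, Tue); the weekend is absent.
idx = pd.to_datetime(['2020-01-24', '2020-01-27', '2020-01-28'])
close = pd.Series([100.0, 102.0, 101.0], index=idx)

# shift(freq='1D') moves the index labels forward one calendar day, so Friday's
# value is relabelled Saturday and nothing lands on Monday.
by_calendar_day = close.shift(freq='1D')

# shift() moves the values down one row, so Monday is paired with Friday.
by_row = close.shift()

# Division aligns on the index: Monday has no partner in by_calendar_day.
ratio_calendar = close.div(by_calendar_day)
ratio_row = close.div(by_row)
print(ratio_calendar)
print(ratio_row)
```

In ratio_calendar the Monday entry is NaN, while in ratio_row it is Monday's price divided by Friday's.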
I am downloading an Eurostat dataset in Python using the eurostat package and the Dataframe format is tricky to work with. I have been trying to turn the panel data into time-series, but I have not been successful.
I have filtered and cleaned the data a little bit, but I've failed to turn the table into time-series (I am fairly new to Python). Below my code:
#pip install eurostat
import pandas as pd
import eurostat
# Commercial flights by reporting country – monthly data (source: Eurocontrol)
df_eurostat = eurostat.get_data_df('avia_tf_cm')
df_eurostat = df_eurostat.rename(columns={'geo\\time':'Region'})
# To exclude: 'EU27_2020', 'EU28'
# df_eurostat = df_eurostat.drop(columns='unit').T
country_list = ['AL', 'AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'RS', 'SE', 'SI',
'SK', 'TR', 'UK']
df_eurostat = df_eurostat[df_eurostat['Region'].isin(country_list)]
df_eurostat = df_eurostat.loc[(df_eurostat['unit']=='NR')]
Before: [image]
After - what I want to achieve: [image]
Would highly appreciate it if anyone could help. Thank you in advance!
One more step:
to_date = lambda x: pd.to_datetime(x['Date'], format='%YM%m')
df_eurostat = df_eurostat.drop(columns='unit').set_index('Region').T \
.rename_axis(index='Date', columns=None) \
.reset_index().assign(Date=to_date)
Output:
>>> df_eurostat
Date AL AT BE BG CH CY CZ DE DK ... NO PL PT RO RS SE SI SK TR UK
0 2021-12-01 2265.0 15224.0 20055.0 4188.0 24102.0 3851.0 6690.0 94592.0 17277.0 ... 32284.0 23299.0 23977.0 10653.0 3804.0 19148.0 1038.0 1224.0 55338.0 96922.0
1 2021-11-01 1953.0 15513.0 20445.0 3694.0 21180.0 4452.0 6549.0 96853.0 17630.0 ... 33727.0 22105.0 23334.0 9294.0 3578.0 19088.0 993.0 1040.0 57975.0 90265.0
2 2021-10-01 2358.0 18314.0 21520.0 4945.0 26289.0 7118.0 7019.0 115037.0 18805.0 ... 33051.0 23325.0 27620.0 11708.0 4017.0 19070.0 1137.0 1178.0 81820.0 103358.0
3 2021-09-01 2998.0 18856.0 21834.0 6853.0 24979.0 6488.0 7785.0 107754.0 17609.0 ... 31901.0 25523.0 26989.0 13370.0 4691.0 18503.0 1155.0 1453.0 81744.0 98183.0
4 2021-08-01 3705.0 19579.0 22261.0 8807.0 26451.0 6873.0 7815.0 106657.0 16538.0 ... 28870.0 26381.0 29506.0 14416.0 5761.0 17061.0 1268.0 1695.0 90404.0 92697.0
5 2021-07-01 2973.0 17697.0 21617.0 7663.0 24531.0 6418.0 7291.0 99334.0 15357.0 ... 26152.0 24355.0 26176.0 13446.0 5831.0 15591.0 1210.0 1608.0 87664.0 72389.0
6 2021-06-01 2173.0 11225.0 15313.0 4441.0 15021.0 4328.0 5151.0 68482.0 8958.0 ... 21798.0 17129.0 19879.0 10222.0 3955.0 11832.0 788.0 992.0 58319.0 50648.0
7 2021-05-01 1452.0 7783.0 11247.0 2796.0 11619.0 3016.0 3051.0 51870.0 5993.0 ... 19007.0 8933.0 13758.0 6936.0 2736.0 8661.0 592.0 436.0 36572.0 35027.0
8 2021-04-01 1039.0 6632.0 9537.0 2457.0 10199.0 1872.0 2310.0 45712.0 4994.0 ... 18183.0 7256.0 10086.0 5720.0 2203.0 7683.0 455.0 280.0 39540.0 27739.0
9 2021-03-01 935.0 5327.0 8454.0 2071.0 8431.0 1334.0 2174.0 39463.0 4615.0 ... 19120.0 6120.0 6216.0 4212.0 1829.0 7502.0 479.0 377.0 38896.0 25305.0
10 2021-02-01 751.0 3976.0 7836.0 1756.0 7116.0 992.0 1889.0 30330.0 3522.0 ... 16159.0 4553.0 5134.0 3543.0 1527.0 6274.0 391.0 418.0 30167.0 20496.0
11 2021-01-01 881.0 4801.0 9481.0 2229.0 9262.0 1064.0 2208.0 36932.0 4937.0 ... 18953.0 6943.0 9227.0 4555.0 1741.0 7203.0 402.0 444.0 32167.0 28100.0
12 2020-12-01 880.0 5271.0 10360.0 2577.0 9804.0 1316.0 2572.0 39709.0 6030.0 ... 18913.0 7898.0 10387.0 4463.0 1887.0 8003.0 416.0 521.0 29614.0 38484.0
13 2020-11-01 872.0 5409.0 9787.0 2265.0 7667.0 1528.0 2248.0 40854.0 6328.0 ... 21194.0 8035.0 9738.0 3661.0 2130.0 8903.0 404.0 362.0 36441.0 34516.0
14 2020-10-01 1227.0 9237.0 11507.0 3392.0 12132.0 3185.0 3271.0 64376.0 9356.0 ... 24317.0 13245.0 15886.0 6179.0 2817.0 11103.0 577.0 653.0 49092.0 61735.0
15 2020-09-01 1513.0 11990.0 12241.0 4429.0 14364.0 3464.0 4749.0 69292.0 10604.0 ... 24939.0 15927.0 17980.0 7112.0 2845.0 10819.0 664.0 901.0 51449.0 72451.0
16 2020-08-01 2087.0 13469.0 14772.0 5396.0 18023.0 3770.0 5157.0 73205.0 10657.0 ... 24069.0 18681.0 20945.0 8059.0 2898.0 9963.0 796.0 1114.0 51758.0 79123.0
17 2020-07-01 1754.0 10377.0 13294.0 5026.0 15326.0 2914.0 4441.0 62889.0 9168.0 ... 23057.0 14361.0 14599.0 6925.0 2846.0 8154.0 703.0 783.0 36743.0 52547.0
18 2020-06-01 400.0 3901.0 6902.0 2495.0 6319.0 996.0 1715.0 31467.0 4085.0 ... 17126.0 3120.0 4340.0 2386.0 1570.0 5025.0 513.0 382.0 18020.0 21071.0
19 2020-05-01 186.0 1628.0 5626.0 1521.0 2841.0 457.0 979.0 20787.0 2245.0 ... 13377.0 1106.0 2208.0 1391.0 494.0 3716.0 340.0 191.0 4703.0 16397.0
20 2020-04-01 134.0 1297.0 4708.0 931.0 1936.0 355.0 823.0 17894.0 1974.0 ... 13114.0 1059.0 1600.0 1393.0 295.0 3422.0 369.0 207.0 3726.0 13634.0
21 2020-03-01 862.0 13690.0 16101.0 3551.0 20060.0 2749.0 5807.0 84579.0 14416.0 ... 28254.0 14506.0 17820.0 8349.0 2529.0 19940.0 903.0 811.0 40122.0 96914.0
22 2020-02-01 1667.0 24837.0 22531.0 4923.0 33073.0 4030.0 9417.0 128115.0 22684.0 ... 36181.0 27688.0 25712.0 12360.0 4278.0 27289.0 1256.0 1511.0 60161.0 137542.0
23 2020-01-01 1984.0 25526.0 23595.0 5261.0 34628.0 4422.0 10130.0 132506.0 23224.0 ... 38375.0 29776.0 26492.0 13357.0 4614.0 27758.0 1325.0 1580.0 66067.0 141097.0
24 2019-12-01 2204.0 25704.0 24205.0 5187.0 33464.0 4233.0 11243.0 134607.0 22640.0 ... 35866.0 29886.0 27860.0 13759.0 4792.0 27187.0 1409.0 1747.0 65175.0 148395.0
25 2019-11-01 1983.0 24584.0 24661.0 4931.0 30263.0 4886.0 11019.0 139360.0 24478.0 ... 39602.0 29281.0 27347.0 13367.0 4675.0 29459.0 1375.0 1641.0 68215.0 143007.0
26 2019-10-01 2173.0 28210.0 28315.0 6027.0 36833.0 7826.0 13484.0 175844.0 28961.0 ... 44260.0 33407.0 35370.0 15213.0 5655.0 34250.0 1274.0 1934.0 92012.0 179242.0
27 2019-09-01 2572.0 29329.0 29049.0 9908.0 37735.0 8426.0 15865.0 176614.0 29324.0 ... 43968.0 36534.0 37728.0 16539.0 6418.0 35217.0 2242.0 2917.0 99239.0 186990.0
28 2019-08-01 3012.0 30197.0 29686.0 12911.0 38535.0 9024.0 16373.0 174218.0 29149.0 ... 43548.0 37931.0 40481.0 17583.0 7150.0 32993.0 2726.0 3444.0 110635.0 196632.0
29 2019-07-01 2954.0 30638.0 30711.0 12911.0 39715.0 8895.0 16339.0 178418.0 28525.0 ... 42885.0 37728.0 40453.0 17591.0 7041.0 31426.0 2757.0 3535.0 108069.0 196964.0
30 2019-06-01 2479.0 29954.0 28327.0 10775.0 37872.0 8428.0 15645.0 171786.0 29099.0 ... 43533.0 35891.0 37269.0 16189.0 6103.0 33742.0 2508.0 2937.0 98885.0 189383.0
31 2019-05-01 2262.0 28262.0 28503.0 7053.0 37384.0 7597.0 13281.0 171324.0 28684.0 ... 43880.0 34017.0 36306.0 15470.0 5319.0 34604.0 2555.0 2104.0 86267.0 187445.0
32 2019-04-01 2110.0 27218.0 27080.0 5539.0 36308.0 6305.0 11985.0 158711.0 25866.0 ... 39057.0 30874.0 34229.0 14432.0 4958.0 31597.0 2426.0 1955.0 73548.0 169391.0
33 2019-03-01 1775.0 27362.0 24518.0 5108.0 37157.0 4415.0 11213.0 150008.0 26319.0 ... 41485.0 28338.0 28165.0 13041.0 4299.0 33232.0 2285.0 1848.0 67939.0 157772.0
34 2019-02-01 1625.0 23368.0 21019.0 4529.0 33206.0 3526.0 9256.0 131628.0 22559.0 ... 36782.0 25442.0 24069.0 11906.0 3824.0 28637.0 2022.0 1614.0 59619.0 139353.0
35 2019-01-01 1925.0 24110.0 23694.0 4990.0 35228.0 3751.0 10059.0 138258.0 23211.0 ... 38933.0 27756.0 26258.0 13292.0 4237.0 30192.0 2226.0 1723.0 66304.0 145002.0
[36 rows x 37 columns]
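If a proper time series is the goal, it may help to keep Date as a sorted DatetimeIndex rather than a regular column. The following is only a sketch on a hypothetical two-country extract in the same wide layout; the frame `wide` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical two-country extract in the same wide layout (one column per month).
wide = pd.DataFrame({
    'unit': ['NR', 'NR'],
    'Region': ['AT', 'BE'],
    '2021M12': [15224.0, 20055.0],
    '2021M11': [15513.0, 20445.0],
})

# Transpose so that months become rows and countries become columns.
ts = wide.drop(columns='unit').set_index('Region').T

# Parse labels such as '2021M12' and sort oldest first.
ts.index = pd.to_datetime(ts.index, format='%YM%m')
ts = ts.rename_axis(index='Date', columns=None).sort_index()
print(ts)
```

With a DatetimeIndex in ascending order, date slicing such as ts.loc['2021'] and resampling work directly.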
I'm using matplotlib to draw trend lines for stock data.
import pandas as pd
import matplotlib.pyplot as plt
A = pd.read_csv('daily/A.csv', index_col=[0])
print(A)
AAL = pd.read_csv('daily/AAL.csv', index_col=[0])
print(AAL)
A['Close'].plot()
AAL['Close'].plot()
plt.show()
The result is:
High Low Open Close Volume Adj Close
Date
1999-11-18 35.77 28.61 32.55 31.47 62546300.0 27.01
1999-11-19 30.76 28.48 30.71 28.88 15234100.0 24.79
1999-11-22 31.47 28.66 29.55 31.47 6577800.0 27.01
1999-11-23 31.21 28.61 30.40 28.61 5975600.0 24.56
1999-11-24 30.00 28.61 28.70 29.37 4843200.0 25.21
... ... ... ... ... ... ...
2020-06-24 89.08 86.32 89.08 86.56 1806600.0 86.38
2020-06-25 87.35 84.80 86.43 87.26 1350100.0 87.08
2020-06-26 87.56 85.52 87.23 85.90 2225800.0 85.72
2020-06-29 87.36 86.11 86.56 87.29 1302500.0 87.29
2020-06-30 88.88 87.24 87.33 88.37 1428931.0 88.37
[5186 rows x 6 columns]
High Low Open Close Volume Adj Close
Date
2005-09-27 21.40 19.10 21.05 19.30 961200.0 18.19
2005-09-28 20.53 19.20 19.30 20.50 5747900.0 19.33
2005-09-29 20.58 20.10 20.40 20.21 1078200.0 19.05
2005-09-30 21.05 20.18 20.26 21.01 3123300.0 19.81
2005-10-03 21.75 20.90 20.90 21.50 1057900.0 20.27
... ... ... ... ... ... ...
2020-06-24 13.90 12.83 13.59 13.04 140975500.0 13.04
2020-06-25 13.24 12.18 12.53 13.17 117383400.0 13.17
2020-06-26 13.29 12.13 13.20 12.38 108813000.0 12.38
2020-06-29 13.51 12.02 12.57 13.32 114650300.0 13.32
2020-06-30 13.48 12.88 13.10 13.07 68669742.0 13.07
[3715 rows x 6 columns]
Yes, the two stocks start on different dates, but the end date is the same.
So the plot I get looks like this:
[Image: stockplot]
This does not look normal like other charts.
Can anyone advise me on how to draw a normal trend line for the two stocks?
You can try making two separate plots with the same axis limits and then overlaying one on the other for comparison.
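Another possible cause, which the post does not confirm, is that read_csv left the Date index as plain strings, so the two series do not share a real date axis. A sketch under that assumption follows; the inline CSV snippets stand in for daily/A.csv and daily/AAL.csv.

```python
import io
import pandas as pd

# Stand-ins for daily/A.csv and daily/AAL.csv (values taken from the printed frames).
a_csv = io.StringIO("Date,Close\n2020-06-26,85.90\n2020-06-29,87.29\n2020-06-30,88.37\n")
aal_csv = io.StringIO("Date,Close\n2020-06-29,13.32\n2020-06-30,13.07\n")

# parse_dates=True turns the index into a DatetimeIndex instead of plain strings.
A = pd.read_csv(a_csv, index_col=0, parse_dates=True)
AAL = pd.read_csv(aal_csv, index_col=0, parse_dates=True)

# Put both Close columns on one shared date axis; dates missing from one stock become NaN.
closes = pd.concat({'A': A['Close'], 'AAL': AAL['Close']}, axis=1)
print(closes)

# closes.plot() (with matplotlib installed) would then draw both lines against the same dates.
```

Because the combined frame is indexed by real dates, the later-starting stock simply begins partway along the x-axis.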
I am running the following code to calculate for every dataframe row the number of positive days in the previous rows and the number of days in which the stock has beaten the S&P 500 index:
for offset in [1,5,15,30,45,60,75,90,120,150,
               200,250,500,750,1000,1250,1500]:
    asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
    merged_data = pd.merge(asset, sp_500, on='Date')
    total_positive_days=0
    total_beating_sp_days=0
    for index, row in merged_data.iterrows():
        print(offset, index)
        for i in range(0,offset):
            if index-i-1>0:
                if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']:
                    total_positive_days+=1
                if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']:
                    total_beating_sp_days+=1
but it is quite slow. Is there a way to speed it up (possibly by somehow getting rid of the for loop)?
My dataset looks like this (merged_data follows):
Date Open_x High_x Low_x Close_x Adj Close_x Volume_x return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5 Pct_positive_15 Pct_beating_15 Pct_change_15 Pct_change_plus_15 Pct_positive_30 Pct_beating_30 Pct_change_30 Pct_change_plus_30 Open_y High_y Low_y Close_y Adj Close_y Volume_y return_sp
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562 1307.2 1257.6 NaN 0.039186 1302.066667 1252.633333 NaN 0.056579 1116.560059 1133.869995 1116.560059 1132.989990 1132.989990 3991400000 0.016043
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062 1307.2 1257.6 NaN 0.031268 1302.066667 1252.633333 NaN 0.056423 1132.660034 1136.630005 1129.660034 1136.520020 1136.520020 2491020000 0.003116
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519 1307.2 1257.6 NaN 0.058608 1302.066667 1252.633333 NaN 0.046115 1135.709961 1139.189941 1133.949951 1137.140015 1137.140015 4972660000 0.000546
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491 1307.2 1257.6 NaN 0.096428 1302.066667 1252.633333 NaN 0.050694 1136.270020 1142.459961 1131.319946 1141.689941 1141.689941 5270680000 0.004001
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379 1307.2 1257.6 NaN 0.088584 1302.066667 1252.633333 NaN 0.075713 1140.520020 1145.390015 1136.219971 1144.979980 1144.979980 4389590000 0.002882
asset follows:
Date Open High Low Close Adj Close Volume return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379
sp_500 follows:
Date Open High Low Close Adj Close Volume return_sp
0 1999-12-31 1464.469971 1472.420044 1458.189941 1469.250000 1469.250000 374050000 NaN
1 2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 1455.219971 931800000 -0.009549
2 2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1399.420044 1009000000 -0.038345
3 2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1402.109985 1085500000 0.001922
4 2000-01-06 1402.109985 1411.900024 1392.099976 1403.449951 1403.449951 1092300000 0.000956
This is a partial answer.
I think the way you do
asset.Close - asset.Close.shift(1)
at the top is key to how you might do this. Instead of
if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']
create a column with the change in Close_x:
merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)
Similarly,
if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']
becomes
merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)
Then you can iterate i and use subsets like
merged_data[(merged_data['Delta_Close_x'] > 0) & (merged_data['vs_sp'] > 0)]
There are a lot of additional details to work out, but I hope this gets you started.
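One way to take this further is to replace the inner loops with rolling sums over boolean flags. The sketch below assumes the goal is a per-row count over the last `offset` rows (current row included) and that the stock and index returns are compared on the same day. The frame and its values are made up, and the output column names positive_days and beating_sp_days are hypothetical.

```python
import pandas as pd

# Hypothetical merged frame; input column names follow the question.
merged_data = pd.DataFrame({
    'Close_x':      [10.0, 11.0, 10.5, 12.0, 12.5],
    'return_stock': [float('nan'), 0.10, -0.045, 0.143, 0.042],
    'return_sp':    [0.01, 0.02, 0.01, 0.05, 0.06],
})

offset = 3  # window length in rows; the question loops over several offsets

# Boolean flags per row, computed once for the whole column.
up_day = merged_data['Close_x'].diff() > 0
beat_sp = merged_data['return_stock'] > merged_data['return_sp']

# Rolling sums count the True flags in the last `offset` rows ending at each row.
merged_data['positive_days'] = up_day.astype(int).rolling(offset, min_periods=1).sum()
merged_data['beating_sp_days'] = beat_sp.astype(int).rolling(offset, min_periods=1).sum()
print(merged_data)
```

The flags are computed once, so looping over the list of offsets only repeats the two cheap rolling calls.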
I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    TR = max(df['High'][i+1], df['Close'][i]) - min(df['Low'][i+1], df['Close'][i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With a 14-period span (n = 14), ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN. It's because the indexes are different between the data frame and ATR. Is there any way to add the ATR column into the data frame?
Consider shift to avoid the while loop across rows and list building. Below uses Union Pacific (UNP) railroad stock data to demonstrate:
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT DATA ONE DAY BACK AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735
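An alternative sketch keeps the true range on the current row, as in the question's loop, by shifting Close forward instead of joining a shifted copy of the frame. The Close values below are invented, since the question's frame elides that column, and n is set to 2 only so that the toy data produces values.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a date index, as in the question.
idx = pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05'])
df = pd.DataFrame({
    'High':  [983.25, 986.75, 985.25, 990.75],
    'Low':   [975.50, 981.00, 977.00, 984.00],
    'Close': [980.00, 985.00, 979.00, 989.00],
}, index=idx)

n = 2  # small span so the toy data produces values; the question uses 14

# True range: today's high/low compared against the previous close.
prev_close = df['Close'].shift()
tr = np.maximum(df['High'], prev_close) - np.minimum(df['Low'], prev_close)
tr = tr.fillna(0)  # mirrors the question's TR_l = [0] for the first row

# The result keeps df's index, so it can be assigned as a column directly.
df['ATR_' + str(n)] = tr.ewm(span=n, min_periods=n).mean()
print(df)
```

If the ATR Series has already been built with a 0..N-1 index, setting ATR.index = df.index before calling df.join(ATR) should have the same effect, provided both have the same length.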
I know this is how you normally get daily stock price quotes with pandas. But I'm wondering if it's possible to get monthly or weekly quotes. Is there a parameter I can pass to get monthly quotes?
from pandas.io.data import DataReader
from datetime import datetime
ibm = DataReader('IBM', 'yahoo', datetime(2000,1,1), datetime(2012,1,1))
print(ibm['Adj Close'])
Monthly closing prices from Yahoo! Finance...
import pandas_datareader.data as web
data = web.get_data_yahoo('IBM','01/01/2015',interval='m')
where you can replace the interval input as required ('d', 'w', 'm', etc).
Using the yfinance package, you can get monthly stock prices by passing the "interval" option as "1mo" instead of "m", as shown:
#Library
import yfinance as yf
from datetime import datetime
#Load Stock price
df = yf.download("IBM", start= datetime(2000,1,1), end = datetime(2012,1,1),interval='1mo')
df
The result is: [image]
The other possible interval options are:
1m,
2m,
5m,
15m,
30m,
60m,
90m,
1h,
1d,
5d,
1wk,
1mo,
3mo.
try this:
In [175]: from pandas_datareader.data import DataReader
In [176]: ibm = DataReader('IBM', 'yahoo', '2001-01-01', '2012-01-01')
UPDATE: show average for Adj Close only (month start)
In [12]: ibm.groupby(pd.TimeGrouper(freq='MS'))['Adj Close'].mean()
Out[12]:
Date
2001-01-01 79.430605
2001-02-01 86.625519
2001-03-01 75.938913
2001-04-01 81.134375
2001-05-01 90.460754
2001-06-01 89.705042
2001-07-01 83.350254
2001-08-01 82.100543
2001-09-01 74.335789
2001-10-01 79.937451
...
2011-03-01 141.628553
2011-04-01 146.530774
2011-05-01 150.298053
2011-06-01 146.844772
2011-07-01 158.716834
2011-08-01 150.690990
2011-09-01 151.627555
2011-10-01 162.365699
2011-11-01 164.596963
2011-12-01 167.924676
Freq: MS, Name: Adj Close, dtype: float64
show average for Adj Close only (month end)
In [13]: ibm.groupby(pd.TimeGrouper(freq='M'))['Adj Close'].mean()
Out[13]:
Date
2001-01-31 79.430605
2001-02-28 86.625519
2001-03-31 75.938913
2001-04-30 81.134375
2001-05-31 90.460754
2001-06-30 89.705042
2001-07-31 83.350254
2001-08-31 82.100543
2001-09-30 74.335789
2001-10-31 79.937451
...
2011-03-31 141.628553
2011-04-30 146.530774
2011-05-31 150.298053
2011-06-30 146.844772
2011-07-31 158.716834
2011-08-31 150.690990
2011-09-30 151.627555
2011-10-31 162.365699
2011-11-30 164.596963
2011-12-31 167.924676
Freq: M, Name: Adj Close, dtype: float64
monthly averages (all columns):
In [179]: ibm.groupby(pd.TimeGrouper(freq='M')).mean()
Out[179]:
Open High Low Close Volume Adj Close
Date
2001-01-31 100.767857 103.553571 99.428333 101.870357 9474409 79.430605
2001-02-28 111.193160 113.304210 108.967368 110.998422 8233626 86.625519
2001-03-31 97.366364 99.423637 95.252272 97.281364 11570454 75.938913
2001-04-30 103.990500 106.112500 102.229501 103.936999 11310545 81.134375
2001-05-31 115.781363 117.104091 114.349091 115.776364 7243463 90.460754
2001-06-30 114.689524 116.199048 113.739523 114.777618 6806176 89.705042
2001-07-31 106.717143 108.028095 105.332857 106.646666 7667447 83.350254
2001-08-31 105.093912 106.196521 103.856522 104.939999 6234847 82.100543
2001-09-30 95.138667 96.740000 93.471334 94.987333 12620833 74.335789
2001-10-31 101.400870 103.140000 100.327827 102.145217 9754413 79.937451
2001-11-30 113.449047 114.875715 112.510952 113.938095 6435061 89.256046
2001-12-31 120.651001 122.076000 119.790500 121.087999 6669690 94.878736
2002-01-31 116.483334 117.509524 114.613334 115.994762 9217280 90.887920
2002-02-28 103.194210 104.389474 101.646316 102.961579 9069526 80.764672
2002-03-31 105.246500 106.764499 104.312999 105.478499 7563425 82.756873
... ... ... ... ... ... ...
2010-10-31 138.956188 140.259048 138.427142 139.631905 6537366 122.241844
2010-11-30 144.281429 145.164762 143.385241 144.439524 4956985 126.878319
2010-12-31 145.155909 145.959545 144.567273 145.251819 4245127 127.726929
2011-01-31 152.595000 153.950499 151.861000 153.181501 5941580 134.699880
2011-02-28 163.217895 164.089474 162.510002 163.339473 4687763 144.050847
2011-03-31 160.433912 161.745652 159.154349 160.425651 5639752 141.628553
2011-04-30 165.437501 166.587500 164.760500 165.978500 5038475 146.530774
2011-05-31 169.657144 170.679046 168.442858 169.632857 5276390 150.298053
2011-06-30 165.450455 166.559093 164.691819 165.593635 4792836 146.844772
2011-07-31 178.124998 179.866502 177.574998 178.981500 5679660 158.716834
2011-08-31 169.734350 171.690435 166.749567 169.360434 8480613 150.690990
2011-09-30 169.752858 172.034761 168.109999 170.245714 6566428 151.627555
2011-10-31 181.529525 183.597145 180.172379 182.302381 6883985 162.365699
2011-11-30 184.536668 185.950952 182.780477 184.244287 4619719 164.596963
2011-12-31 188.151428 189.373809 186.421905 187.789047 4925547 167.924676
[132 rows x 6 columns]
weekly averages (all columns):
In [180]: ibm.groupby(pd.TimeGrouper(freq='W')).mean()
Out[180]:
Open High Low Close Volume Adj Close
Date
2001-01-07 89.234375 94.234375 87.890625 91.656250 11060200 71.466436
2001-01-14 93.412500 95.062500 91.662500 93.412500 7470200 72.835824
2001-01-21 100.250000 103.921875 99.218750 102.250000 13851500 79.726621
2001-01-28 109.575000 111.537500 108.675000 110.600000 8056720 86.237303
2001-02-04 113.680000 115.465999 111.734000 113.582001 6538080 88.562436
2001-02-11 113.194002 115.815999 111.639999 113.884001 7269320 88.858876
2001-02-18 113.960002 116.731999 113.238000 115.106000 7225420 89.853021
2001-02-25 109.525002 111.375000 105.424999 107.977501 10722700 84.288436
2001-03-04 103.390001 106.052002 100.386000 103.228001 11982540 80.580924
2001-03-11 105.735999 106.920000 103.364002 104.844002 9226900 81.842391
2001-03-18 95.660001 97.502002 93.185997 94.899998 13863740 74.079992
2001-03-25 90.734000 92.484000 88.598000 90.518001 11382280 70.659356
2001-04-01 95.622000 97.748000 94.274000 96.106001 10467580 75.021411
2001-04-08 95.259999 97.360001 93.132001 94.642000 12312580 73.878595
2001-04-15 98.350000 99.520000 95.327502 97.170000 10218625 75.851980
... ... ... ... ... ... ...
2011-09-25 170.678003 173.695996 169.401996 171.766000 6358100 152.981582
2011-10-02 176.290002 178.850000 174.729999 176.762000 7373680 157.431216
2011-10-09 175.920001 179.200003 174.379999 177.792001 7623560 158.348576
2011-10-16 185.366000 187.732001 184.977997 187.017999 5244180 166.565614
2011-10-23 180.926001 182.052002 178.815997 180.351999 9359200 160.628611
2011-10-30 183.094003 184.742001 181.623996 183.582001 5743800 163.505379
2011-11-06 184.508002 186.067999 183.432004 184.716003 4583780 164.515366
2011-11-13 185.350000 186.690002 183.685999 185.508005 4180620 165.750791
2011-11-20 187.600003 189.101999 185.368002 186.738000 5104420 166.984809
2011-11-27 181.067497 181.997501 178.717499 179.449997 4089350 160.467733
2011-12-04 185.246002 187.182001 184.388000 186.052002 5168720 166.371376
2011-12-11 191.841998 194.141998 191.090002 192.794000 4828580 172.400204
2011-12-18 191.085999 191.537998 187.732001 188.619998 6037220 168.667729
2011-12-25 183.810001 184.634003 181.787997 183.678000 5433360 164.248496
2012-01-01 185.140003 185.989998 183.897499 184.750000 3029925 165.207100
[574 rows x 6 columns]
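Note that pd.TimeGrouper was deprecated and later removed from pandas, so the calls above fail on recent versions. pd.Grouper and resample cover the same ground. The sketch below uses a synthetic daily series in place of the downloaded IBM frame, so the numbers are not real prices.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the downloaded IBM frame.
dates = pd.date_range('2001-01-01', '2001-03-31', freq='D')
ibm = pd.DataFrame({'Adj Close': np.arange(len(dates), dtype=float)}, index=dates)

# Month-start averages, two equivalent spellings.
monthly_grouper = ibm.groupby(pd.Grouper(freq='MS'))['Adj Close'].mean()
monthly_resample = ibm['Adj Close'].resample('MS').mean()

# Weekly averages (weeks ending on Sunday by default).
weekly = ibm.resample('W').mean()
print(monthly_resample)
print(weekly.head())
```

The month-end alias has changed spelling across pandas versions, which is why this sketch sticks to 'MS' and 'W'.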
Get it from Quandl:
import pandas as pd
import quandl
quandl.ApiConfig.api_key = 'xxxxxxxxxxxx' # Optional
quandl.ApiConfig.api_version = '2015-04-09' # Optional
ibm = quandl.get("WIKI/IBM", start_date="2000-01-01", end_date="2012-01-01", collapse="monthly", returns="pandas")