How to iterate through columns of the dataframe? - python

I want to go through all the columns of the DataFrame so that I can pick up particular data from each column; using these data I have to calculate values for another DataFrame.
Here is what I have:
DP1 DP2 DP3 DP4 DP5 DP6 DP7 DP8 DP9 DP10 Total
OP1 357848.0 1124788.0 1735330.0 2218270.0 2745596.0 3319994.0 3466336.0 3606286.0 3833515.0 3901463.0 3901463.0
OP2 352118.0 1236139.0 2170033.0 3353322.0 3799067.0 4120063.0 4647867.0 4914039.0 5339085.0 NaN 5339085.0
OP3 290507.0 1292306.0 2218525.0 3235179.0 3985995.0 4132918.0 4628910.0 4909315.0 NaN NaN 4909315.0
OP4 310608.0 1418858.0 2195047.0 3757447.0 4029929.0 4381982.0 4588268.0 NaN NaN NaN 4588268.0
OP5 443160.0 1136350.0 2128333.0 2897821.0 3402672.0 3873311.0 NaN NaN NaN NaN 3873311.0
OP6 396132.0 1333217.0 2180715.0 2985752.0 3691712.0 NaN NaN NaN NaN NaN 3691712.0
OP7 440832.0 1288463.0 2419861.0 3483130.0 NaN NaN NaN NaN NaN NaN 3483130.0
OP8 359480.0 1421128.0 2864498.0 NaN NaN NaN NaN NaN NaN NaN 2864498.0
OP9 376686.0 1363294.0 NaN NaN NaN NaN NaN NaN NaN NaN 1363294.0
OP10 344014.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 344014.0
Total 3671385.0 11614543.0 17912342.0 21930921.0 21654971.0 19828268.0 17331381.0 13429640.0 9172600.0 3901463.0 34358090.0
Latest Observation 344014.0 1363294.0 2864498.0 3483130.0 3691712.0 3873311.0 4588268.0 4909315.0 5339085.0 3901463.0 NaN
From this table I would like to apply this formula: take DP2's Total and divide it by (DP1's Total minus DP1's Latest Observation); that gives the value for DP1. We have to calculate this for every column and save the results in another DataFrame.
We need a row like this:
Weighted Average 3.491 1.747 1.457 1.174 1.104 1.086 1.054 1.077 1.018
This is the code we tried (it handles only one column at a time):
LDFTriangledf['Weighted Average'] = CumulativePaidTriangledf.loc['Total','DP2'] / (CumulativePaidTriangledf.loc['Total','DP1'] - CumulativePaidTriangledf.loc['Latest Observation','DP1'])

You can remove the column names from .loc and just shift(-1, axis=1) to get the next column's Total. This lets you apply the formula to all columns in a single operation:
CumulativePaidTriangledf.shift(-1, axis=1).loc['Total'] / (CumulativePaidTriangledf.loc['Total'] - CumulativePaidTriangledf.loc['Latest Observation'])
# DP1 3.490607
# DP2 1.747333
# DP3 1.457413
# DP4 1.173852
# DP5 1.103824
# DP6 1.086269
# DP7 1.053874
# DP8 1.076555
# DP9 1.017725
# DP10 inf
# Total NaN
# dtype: float64
Here is a breakdown of what the three components are doing:
A: .shift(-1, axis=1).loc['Total'] -- the whole Total row shifted to the left, so every column now holds the next column's Total.
B: .loc['Total'] -- the normal Total row.
C: .loc['Latest Observation'] -- the normal Latest Observation row.
A / (B-C) -- what the code above does: the shifted Total row (A) divided by the difference between the current Total row (B) and the current Latest Observation row (C).

                  DP1           DP2           DP3           DP4           DP5           DP6           DP7           DP8           DP9          DP10         Total
A        1.161454e+07  1.791234e+07  2.193092e+07  2.165497e+07  1.982827e+07  1.733138e+07  1.342964e+07  9.172600e+06  3.901463e+06  3.435809e+07           NaN
B        3.671385e+06  1.161454e+07  1.791234e+07  2.193092e+07  2.165497e+07  1.982827e+07  1.733138e+07  1.342964e+07  9.172600e+06  3.901463e+06  3.435809e+07
C        3.440140e+05  1.363294e+06  2.864498e+06  3.483130e+06  3.691712e+06  3.873311e+06  4.588268e+06  4.909315e+06  5.339085e+06  3.901463e+06           NaN
A/(B-C)      3.490607      1.747333      1.457413      1.173852      1.103824      1.086269      1.053874      1.076555      1.017725           inf           NaN
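The same mechanics can be checked end to end on a tiny made-up triangle (the numbers below are invented for illustration; only the row labels mirror the question):

```python
import numpy as np
import pandas as pd

# Toy two-column triangle with the same special rows as the question
df = pd.DataFrame(
    {'DP1': [100.0, 110.0, 210.0, 105.0],
     'DP2': [150.0, np.nan, 150.0, 150.0]},
    index=['OP1', 'OP2', 'Total', 'Latest Observation'])

# Next column's Total divided by (this column's Total minus its Latest Observation)
ldf = df.shift(-1, axis=1).loc['Total'] / (df.loc['Total'] - df.loc['Latest Observation'])
print(ldf['DP1'])  # 150 / (210 - 105)
```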

Related

How to convert data from DataFrame to form

I'm trying to make a report and then convert it to the prescribed form but I don't know how. Below is my code:
data = pd.read_csv('https://raw.githubusercontent.com/hoatranobita/reports/main/Loan_list_test.csv')
data_pivot = pd.pivot_table(data,('CLOC_CUR_XC_BL'),index=['BIZ_TYPE_SBV_CODE'],columns=['TERM_CODE','CURRENCY_CD'],aggfunc=np.sum).reset_index
print(data_pivot)
The pivot table shows as below (note that .reset_index is missing its parentheses, so the bound method is printed rather than a DataFrame):
<bound method DataFrame.reset_index of TERM_CODE Ngắn hạn Trung hạn
CURRENCY_CD 1. VND 2. USD 1. VND 2. USD
BIZ_TYPE_SBV_CODE
201 170000.00 NaN 43533.42 NaN
202 2485441.64 5188792.76 2682463.04 1497309.06
204 35999.99 NaN NaN NaN
301 1120940.65 NaN 190915.62 453608.72
401 347929.88 182908.01 239123.29 NaN
402 545532.99 NaN 506964.23 NaN
403 21735.74 NaN 1855.92 NaN
501 10346.45 NaN NaN NaN
601 881974.40 NaN 50000.00 NaN
602 377216.09 NaN 828868.61 NaN
702 9798.74 NaN 23616.39 NaN
802 155099.66 NaN 762294.95 NaN
803 23456.79 NaN 97266.84 NaN
804 151590.00 NaN 378000.00 NaN
805 182925.30 54206.52 4290216.37 NaN>
Here is the prescribed form:
form = pd.read_excel('https://github.com/hoatranobita/reports/blob/main/Form%20A00034.xlsx?raw=true')
form.head()
Mã ngành kinh tế Dư nợ tín dụng (không bao gồm mua, đầu tư trái phiếu doanh nghiệp) Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5
0 NaN Ngắn hạn NaN Trung và dài hạn NaN Tổng cộng
1 NaN Bằng VND Bằng ngoại tệ Bằng VND Bằng ngoại tệ NaN
2 101.0 NaN NaN NaN NaN NaN
3 201.0 NaN NaN NaN NaN NaN
4 202.0 NaN NaN NaN NaN NaN
As you can see, the pivot table has no 101 but the form does. What do I have to do to convert the DataFrame to the form, skipping 101?
Thank you.
First, create a worksheet using xlsxwriter:
import xlsxwriter
#start workbook
workbook = xlsxwriter.Workbook('merge1.xlsx')
#Introduce formatting
format = workbook.add_format({'border': 1,'bold': True})
#Adding a worksheet
worksheet = workbook.add_worksheet()
merge_format = workbook.add_format({
    'bold': 1,
    'border': 1,
    'align': 'center',
    'valign': 'vcenter'})
#Starting the Headers
worksheet.merge_range('A1:A3', 'Mã ngành kinh tế', merge_format)
worksheet.merge_range('B1:F1', 'Dư nợ tín dụng (không bao gồm mua, đầu tư trái phiếu doanh nghiệp)', merge_format)
worksheet.merge_range('B2:C2', 'Ngắn hạn', merge_format)
worksheet.merge_range('D2:E2', 'Trung và dài hạn', merge_format)
worksheet.merge_range('F2:F3', 'Tổng cộng', merge_format)
worksheet.write(2, 1, 'Bằng VND',format)
worksheet.write(2, 2, 'Bằng ngoại tệ',format)
worksheet.write(2, 3, 'Bằng VND',format)
worksheet.write(2, 4, 'Bằng ngoại tệ',format)
After this formatting you can start writing to the sheet, looping through your data with worksheet.write(). Below I have included a sample:
expenses = (
    ['Rent', 1000],
    ['Gas', 100],
    ['Food', 300],
    ['Gym', 50],
)
row = 3  # first data row, just below the headers
col = 0
for item, cost in expenses:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, cost)
    row += 1
In row and col you specify the cell's row and column numbers; they are zero-based numeric indices, like a matrix.
And finally, close the workbook:
workbook.close()
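As for codes such as 101 that exist in the form but not in the pivot, one option (a sketch on toy data; the column names here are invented placeholders, not the real pivot columns) is to reindex the pivot on the full list of codes the form expects, which inserts a NaN row for every missing code:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pivot: index = BIZ_TYPE_SBV_CODE, one column per (term, currency)
pivot = pd.DataFrame(
    {'short_vnd': [170000.00, 2485441.64],
     'short_usd': [np.nan, 5188792.76]},
    index=[201, 202])

# Codes required by the form, in order; 101 is missing from the pivot
form_codes = [101, 201, 202]

# reindex inserts a NaN row for every code the pivot lacks
filled = pivot.reindex(form_codes)
```

The `filled` frame then has exactly one row per form code, ready to be written out row by row with worksheet.write().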

Using DataFrame Columns as id

Does anyone know how to transform this DataFrame in a way that the column names become a query ID (keeping the df length) and the values are flattened? I am trying to learn about 'learning to rank' algorithms. Thanks for the help.
AUD=X CAD=X CHF=X ... SGD=X THB=X ZAR=X
Date ...
2004-06-30 NaN 1.33330 1.25040 ... 1.72090 40.834999 6.12260
2004-07-01 NaN 1.33160 1.24900 ... 1.71420 40.716999 6.16500
2004-07-02 NaN 1.32270 1.23320 ... 1.71160 40.638000 6.12010
2004-07-05 NaN 1.32470 1.23490 ... 1.71480 40.658001 6.15010
2004-07-06 NaN 1.32660 1.23660 ... 1.71530 40.765999 6.20990
... ... ... ... ... ... ...
2021-07-19 1.352997 1.26169 0.91853 ... 1.35630 32.810001 14.38950
2021-07-20 1.362546 1.27460 0.91850 ... 1.36360 32.840000 14.53068
2021-07-21 1.362600 1.26751 0.92123 ... 1.36621 32.820000 14.59157
2021-07-22 1.360060 1.25689 0.91757 ... 1.36383 32.849998 14.57449
2021-07-23 1.354922 1.25640 0.91912 ... 1.35935 32.879002 14.69760
In [3]: df
Out[3]:
AUD=X CAD=X CHF=X SGD=X THB=X ZAR=X
Date
2004-06-30 NaN 1.3333 1.2504 1.7209 40.834999 6.1226
2004-07-01 NaN 1.3316 1.2490 1.7142 40.716999 6.1650
2004-07-02 NaN 1.3227 1.2332 1.7116 40.638000 6.1201
2004-07-05 NaN 1.3247 1.2349 1.7148 40.658001 6.1501
2004-07-06 NaN 1.3266 1.2366 1.7153 40.765999 6.2099
In [6]: df.columns = df.columns.str.slice(0, -2)
In [8]: df.T
Out[8]:
Date 2004-06-30 2004-07-01 2004-07-02 2004-07-05 2004-07-06
AUD NaN NaN NaN NaN NaN
CAD 1.333300 1.331600 1.3227 1.324700 1.326600
CHF 1.250400 1.249000 1.2332 1.234900 1.236600
SGD 1.720900 1.714200 1.7116 1.714800 1.715300
THB 40.834999 40.716999 40.6380 40.658001 40.765999
ZAR 6.122600 6.165000 6.1201 6.150100 6.209900
I'm still not super clear on the requirements, but this transformation might help.
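If instead the goal is a long table with one row per (Date, currency) observation, melt may be closer to the input shape learning-to-rank libraries expect. A minimal sketch on made-up rates (query_id and rate are illustrative names, not from the question):

```python
import pandas as pd

# Two dates, two currency columns of invented rates
df = pd.DataFrame(
    {'CAD=X': [1.3333, 1.3316], 'CHF=X': [1.2504, 1.2490]},
    index=pd.to_datetime(['2004-06-30', '2004-07-01']))
df.index.name = 'Date'

# Drop the '=X' suffix, then flatten to one row per (Date, currency) pair
df.columns = df.columns.str.slice(0, -2)
long = df.reset_index().melt(id_vars='Date', var_name='query_id', value_name='rate')
```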

How to retrieve the columns of DataFrame within the loop in Python?

I have the below output. I have written code for this inside a while loop. Here I enter 3, and it creates 3 different DataFrames with different values.
Enter the Number to iter:3
Paid CDF Ultimate Reserves
OP1 3901463.0 NaN NaN NaN
OP2 5339085.0 1.000000 5339085.000 0.0
OP3 4909315.0 1.085767 5331516.090 422201.1
OP4 4588268.0 1.136096 5212272.448 624004.4
OP5 3873311.0 1.238680 4799032.329 925721.3
OP6 3691712.0 1.350145 4983811.200 1292099.2
OP7 3483130.0 1.602168 5579974.260 2096844.3
OP8 2864498.0 2.334476 6685738.332 3821240.3
OP9 1363294.0 4.237940 5777639.972 4414346.0
OP10 344014.0 15.204053 5230388.856 4886374.9
Total NaN NaN NaN 18482831.5
Paid CDF Ultimate Reserves
OP1 3901463.0 NaN NaN NaN
OP2 5339085.0 1.000000 5339085.000 0.0
OP3 4909315.0 1.090000 5351153.350 441838.4
OP4 4588268.0 1.137559 5221448.984 633181.0
OP5 3873311.0 1.231368 4768045.841 894734.8
OP6 3691712.0 1.331933 4917360.384 1225648.4
OP7 3483130.0 1.563703 5447615.320 1964485.3
OP8 2864498.0 2.318600 6642770.862 3778272.9
OP9 1363294.0 4.234960 5773550.090 4410256.1
OP10 344014.0 16.958969 5834133.426 5490119.4
Total NaN NaN NaN 18838536.3
Paid CDF Ultimate Reserves
OP1 3901463.0 NaN NaN NaN
OP2 5339085.0 1.000000 5339085.000 0.0
OP3 4909315.0 1.072698 5267694.995 358380.0
OP4 4588268.0 1.130229 5184742.840 596474.8
OP5 3873311.0 1.208164 4678959.688 805648.7
OP6 3691712.0 1.267187 4677399.104 985687.1
OP7 3483130.0 1.497767 5217728.740 1734598.7
OP8 2864498.0 2.229342 6384966.042 3520468.0
OP9 1363294.0 4.219405 5751737.386 4388443.4
OP10 344014.0 16.036065 5516608.504 5172594.5
Total NaN NaN NaN 17562295.2
Using the above Reserves column I have to generate the below DataFrame, with rows Simulation1, Simulation2, and so on, up to the number of Reserves columns generated from the user input.
OP1 OP2 OP3 OP4 OP5 OP6 OP7 OP8 OP9 OP10 Total
Simulation1 NaN 0.0 422201.1 624004.4 925721.3 1292099.2 2096844.3 3821240.3 4414346.0 4886374.9 18482831.5
Simulation2 NaN 0.0 441838.4 633181.0 894734.8 1225648.4 1964485.3 3778272.9 4410256.1 5490119.4 18838536.3
Simulation3 NaN 0.0 358380.0 596474.8 805648.7 985687.1 1734598.7 3520468.0 4388443.4 5172594.5 17562295.2
I have the below code:
itno = int(input("Enter the Number to iter:"))
iter_count = 0
while iter_count < itno:
    randomdf = scaledDF.copy()
    choices = randomdf.values[~pd.isnull(randomdf.values)]
    randomdf = randomdf.applymap(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)
    #print(randomdf,"\n\n")
    cumdf = CumulativePaidTriangledf.iloc[:, :-1][:-2].copy()
    ldfdf = LDFTriangledf.iloc[:, :-1].copy()
    ResampledDF = randomdf.copy()
    for colname4, col4 in ResampledDF.iteritems():
        ResampledDF[colname4] = (ResampledDF[colname4] * (Variencedf[colname4][-1] / (cumdf[colname4]**0.5))) + ldfdf[colname4][-1]
    #print(ResampledDF,"\n\n")
    #SUMPRODUCT:
    sumPro = ResampledDF.copy()
    #cumdf = cumdf.iloc[:, :-1]
    for colname5, col5 in sumPro.iteritems():
        sumPro[colname5] = (sumPro[colname5].round(2)) * cumdf[colname5]
    sumPro = sumPro.append(pd.Series(sumPro.sum(), name='SUMPRODUCT'))
    #print(sumPro)
    #SUM(OFFSET):
    sumOff = cumdf.apply(lambda x: x.iloc[:cumdf.index.get_loc(x.last_valid_index())].sum())
    #print(sumOff)
    #Weighted avg:
    Weighted_avg = sumPro.loc['SUMPRODUCT'] / sumOff
    #print(Weighted_avg)
    ResampledDF = ResampledDF.append(pd.Series(Weighted_avg, name='Weighted Avg'))
    #print(ResampledDF,"\n\n")
    '''for colname6, col6 in ResampledDF.iteritems():
        ResampledDF[colname6] = ResampledDF[colname6].replace({'0': np.nan, 0: np.nan})
    print(ResampledDF)'''
    ResampledDF.loc['Weighted Avg'] = ResampledDF.loc['Weighted Avg'].replace(0, 1)
    c = ResampledDF.iloc[-1][::-1].cumprod()[::-1]
    ResampledDF = ResampledDF.append(pd.Series(c, name='CDF'))
    #print("\n\n",ResampledDF)
    #Getting Calculation of ultimates:
    s = CumulativePaidTriangledf.iloc[:, :][:-2].copy()
    ultiCalc = pd.DataFrame()
    ultiCalc['Paid'] = s['Total']
    ultiCalc['CDF'] = np.flip(ResampledDF.loc['CDF'].values)
    ultiCalc['Ultimate'] = ultiCalc['Paid'] * (ultiCalc['CDF']).round(3)
    ultiCalc['Reserves'] = (ultiCalc['Ultimate'].round(1)) - ultiCalc['Paid']
    ultiCalc.loc['Total'] = pd.Series(ultiCalc['Reserves'].sum(), index=['Reserves']).round(2)
    print("\n\n", ultiCalc)
    iter_count += 1
#Getting Simulations:
simulationDf = pd.DataFrame(columns=['OP1','OP2','OP3','OP4','OP5','OP6','OP7','OP8','OP9','OP10','Total'])
simulationDf.loc['Simulation'] = ultiCalc['Reserves']
print("\n\n", simulationDf)
Current Output:
Simulation1 NaN
Simulation2 0.0
Simulation3 353470.7
Simulation4 559768.7
Simulation5 859875.0
Simulation6 1162889.3
Simulation7 1828643.2
Simulation8 3958736.2
Simulation9 4464787.9
Simulation10 5224196.6
Simulation11 18412367.6
Simulation12 NaN
Simulation13 0.0
Simulation14 402563.8
Simulation15 669887.1
Simulation16 883114.9
Simulation17 1185039.6
Simulation18 1859991.4
Simulation19 3511874.5
Simulation20 3875844.8
Simulation21 4481126.4
Simulation22 16869442.5
Use a list comprehension to loop over the list of DataFrames, selecting the column Reserves from each, and join them together with the DataFrame constructor; last, if necessary, set the index:
dfs = [df1, df2, df3]
df = pd.DataFrame([x['Reserves'] for x in dfs]).reset_index(drop=True)
df.index = 'Simulation' + (df.index + 1).astype(str)
If there is some loop generating the DataFrames, like this pseudocode:
#create list outside loop
dfs = []
iter_count = 0
while iter_count < itno:
    randomdf = scaledDF.copy()
    choices = randomdf.values[~pd.isnull(randomdf.values)]
    randomdf = randomdf.applymap(lambda x: np.random.choice(choices) if not pd.isnull(x) else x)
    #print(randomdf,"\n\n")
    ....
    ....
    #Getting Calculation of ultimates:
    s = CumulativePaidTriangledf.iloc[:, :][:-2].copy()
    ultiCalc = pd.DataFrame()
    ultiCalc['Paid'] = s['Total']
    ultiCalc['CDF'] = np.flip(ResampledDF.loc['CDF'].values)
    ultiCalc['Ultimate'] = ultiCalc['Paid'] * (ultiCalc['CDF']).round(3)
    ultiCalc['Reserves'] = (ultiCalc['Ultimate'].round(1)) - ultiCalc['Paid']
    ultiCalc.loc['Total'] = pd.Series(ultiCalc['Reserves'].sum(), index=['Reserves']).round(2)
    print("\n\n", ultiCalc)
    iter_count += 1
    #append in loop
    dfs.append(ultiCalc['Reserves'])
And then, outside the loop, join them together:
df = pd.DataFrame([x for x in dfs]).reset_index(drop=True)
df.index = 'Simulation' + (df.index + 1).astype(str)
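Putting the pieces together on toy data (the Reserves values below are invented; only the column labels follow the question):

```python
import numpy as np
import pandas as pd

cols = ['OP1', 'OP2', 'OP3']
# Two toy per-iteration result frames, each with a Reserves column
df1 = pd.DataFrame({'Reserves': [np.nan, 0.0, 422201.1]}, index=cols)
df2 = pd.DataFrame({'Reserves': [np.nan, 0.0, 441838.4]}, index=cols)

dfs = [d['Reserves'] for d in (df1, df2)]     # one Series collected per iteration
out = pd.DataFrame(dfs).reset_index(drop=True)  # each Series becomes one row
out.index = 'Simulation' + (out.index + 1).astype(str)
```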

Python - faster way to run a for loop in a dataframe

I am running the following code to calculate for every dataframe row the number of positive days in the previous rows and the number of days in which the stock has beaten the S&P 500 index:
for offset in [1, 5, 15, 30, 45, 60, 75, 90, 120, 150,
               200, 250, 500, 750, 1000, 1250, 1500]:
    asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
    merged_data = pd.merge(asset, sp_500, on='Date')
    total_positive_days = 0
    total_beating_sp_days = 0
    for index, row in merged_data.iterrows():
        print(offset, index)
        for i in range(0, offset):
            if index - i - 1 > 0:
                if merged_data.loc[index-i, 'Close_x'] > merged_data.loc[index-i-1, 'Close_x']:
                    total_positive_days += 1
                if merged_data.loc[index-i, 'return_stock'] > merged_data.loc[index-i-1, 'return_sp']:
                    total_beating_sp_days += 1
but it is quite slow. Is there a way to speed it up (possibly by somehow getting rid of the for loop)?
My dataset looks like this (merged_data follows):
Date Open_x High_x Low_x Close_x Adj Close_x Volume_x return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5 Pct_positive_15 Pct_beating_15 Pct_change_15 Pct_change_plus_15 Pct_positive_30 Pct_beating_30 Pct_change_30 Pct_change_plus_30 Open_y High_y Low_y Close_y Adj Close_y Volume_y return_sp
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562 1307.2 1257.6 NaN 0.039186 1302.066667 1252.633333 NaN 0.056579 1116.560059 1133.869995 1116.560059 1132.989990 1132.989990 3991400000 0.016043
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062 1307.2 1257.6 NaN 0.031268 1302.066667 1252.633333 NaN 0.056423 1132.660034 1136.630005 1129.660034 1136.520020 1136.520020 2491020000 0.003116
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519 1307.2 1257.6 NaN 0.058608 1302.066667 1252.633333 NaN 0.046115 1135.709961 1139.189941 1133.949951 1137.140015 1137.140015 4972660000 0.000546
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491 1307.2 1257.6 NaN 0.096428 1302.066667 1252.633333 NaN 0.050694 1136.270020 1142.459961 1131.319946 1141.689941 1141.689941 5270680000 0.004001
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379 1307.2 1257.6 NaN 0.088584 1302.066667 1252.633333 NaN 0.075713 1140.520020 1145.390015 1136.219971 1144.979980 1144.979980 4389590000 0.002882
asset follows:
Date Open High Low Close Adj Close Volume return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379
sp_500 follows:
Date Open High Low Close Adj Close Volume return_sp
0 1999-12-31 1464.469971 1472.420044 1458.189941 1469.250000 1469.250000 374050000 NaN
1 2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 1455.219971 931800000 -0.009549
2 2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1399.420044 1009000000 -0.038345
3 2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1402.109985 1085500000 0.001922
4 2000-01-06 1402.109985 1411.900024 1392.099976 1403.449951 1403.449951 1092300000 0.000956
This is a partial answer.
I think the way you do
asset.Close - asset.Close.shift(1)
at the top is key to how you might do this. Instead of
if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']
create a column with the change in Close_x:
merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)
Similarly,
if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']
becomes
merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)
Then you can iterate over i and use boolean masks like
merged_data[(merged_data['Delta_Close_x'] > 0) & (merged_data['vs_sp'] > 0)]
(note that element-wise conditions must be combined with & and parentheses, not and).
There are a lot of additional details to work out, but I hope this gets you started.
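Building on that idea, the per-offset counts can be computed with rolling sums over 0/1 columns instead of the nested Python loops. A sketch on made-up data (column names follow the question; the values are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Close_x':      [10.0, 11.0, 10.5, 12.0, 11.0, 13.0],
    'return_stock': [np.nan, 0.100, -0.045, 0.143, -0.083, 0.182],
    'return_sp':    [np.nan, 0.020, 0.010, -0.010, 0.030, 0.020]})

# 0/1 columns replacing the two inner-loop comparisons
positive = (df['Close_x'].diff() > 0).astype(int)
beating = (df['return_stock'] > df['return_sp'].shift(1)).astype(int)

# One rolling sum per offset instead of re-scanning the previous rows
for offset in [1, 3]:
    df[f'positive_days_{offset}'] = positive.rolling(offset, min_periods=1).sum()
    df[f'beating_days_{offset}'] = beating.rolling(offset, min_periods=1).sum()
```

Each rolling sum is O(n) per offset, versus O(n * offset) for the original nested loops.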

Rolling Mean not being calculated on a new column

I have an issue calculating the rolling mean for a column I added in the code. For some reason, it doesn't work on the column I added but works on a column from the original csv.
Original dataframe from the csv as follow:
Open High Low Last Change Volume Open Int
Time
09/20/19 98.50 99.00 98.35 98.95 0.60 3305.0 0.0
09/19/19 100.35 100.75 98.10 98.35 -2.00 17599.0 0.0
09/18/19 100.65 101.90 100.10 100.35 0.00 18258.0 121267.0
09/17/19 103.75 104.00 100.00 100.35 -3.95 34025.0 122453.0
09/16/19 102.30 104.95 101.60 104.30 1.55 21403.0 127447.0
Ticker = pd.read_csv('\\......\Historical data\kcz19 daily.csv',
index_col=0, parse_dates=True)
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1)).fillna('')
Ticker['ret20'] = Ticker['Return'].rolling(window=20, win_type='triang').mean()
print(Ticker.head())
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 -0.00608213
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 0.0201315
09/17/19 103.75 104.00 100.00 ... 122453.0 0 0
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 0.0386073
The ret20 column should hold the rolling mean of the Return column, so it should show data starting from row 21, whereas here it is only a copy of the Return column.
If I use the Last column instead, it works.
Below is the result using the column Last:
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0 NaN
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 NaN
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 NaN
09/17/19 103.75 104.00 100.00 ... 122453.0 0 NaN
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 NaN
09/13/19 103.25 103.60 102.05 ... 128707.0 -0.0149725 NaN
09/12/19 102.80 103.85 101.15 ... 128904.0 0.00823848 NaN
09/11/19 102.00 104.70 101.40 ... 132067.0 -0.00193237 NaN
09/10/19 98.50 102.25 98.00 ... 135349.0 -0.0175614 NaN
09/09/19 97.00 99.25 95.30 ... 137347.0 -0.0335283 NaN
09/06/19 95.35 97.30 95.00 ... 135399.0 -0.0122889 NaN
09/05/19 96.80 97.45 95.05 ... 136142.0 -0.0171477 NaN
09/04/19 95.65 96.95 95.50 ... 134864.0 0.0125002 NaN
09/03/19 96.00 96.60 94.20 ... 134685.0 -0.0109291 NaN
08/30/19 95.40 97.20 95.10 ... 134061.0 0.0135137 NaN
08/29/19 97.05 97.50 94.75 ... 132639.0 -0.0166584 NaN
08/28/19 97.40 98.15 95.95 ... 130573.0 0.0238601 NaN
08/27/19 97.35 98.00 96.40 ... 129921.0 -0.00410889 NaN
08/26/19 95.55 98.50 95.25 ... 129003.0 0.0035962 NaN
08/23/19 96.90 97.40 95.05 ... 130268.0 -0.0149835 98.97775
Appreciate any help
The .fillna('') is putting a string in the first row, which turns the column into object dtype and breaks the rolling calculation for Ticker['ret20'].
Delete it and the code will run fine:
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1))
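A minimal reproduction of the dtype issue, using the Last values from the table above:

```python
import numpy as np
import pandas as pd

last = pd.Series([98.95, 98.35, 100.35, 100.35, 104.30])

bad = np.log(last / last.shift(1)).fillna('')   # '' forces object dtype
good = np.log(last / last.shift(1))             # stays float64, NaN in row 0

roll = good.rolling(window=3).mean()            # rolling works on the numeric column
```

The `bad` column ends up with dtype object, which rolling cannot aggregate; the `good` column keeps NaN in the first row and rolling produces values once the window is full.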
