I am trying to loop over a DataFrame.
My code sample is the following:
for index1, long in tqdm(sim_df_ha_rsi.iterrows(), total=sim_df_ha_rsi.shape[0]):
    if index1 in validLongLimits.index:
        order_index += 1
        orders[index] = {
            'Type': 'LONG',
            'Open': open_price,
            'DateOpen': start_date,
            'Commission_open': open_price * 0.0001 * amount,
            'Commission_close': None,
            'Close': None,
            'DateClose': None,
            'Profit': None,
            'Balance': 165000,
            'IsOpen': True
        }
        for index2, short in sim_df_ha_rsi.iloc[index1:].iterrows():
            if index2 in validShortLimits.index:
                if long['Date'] < short['Date']:
                    orders[index]['IsOpen'] = False
                    orders[index]['DateClose'] = short['Date']
                    orders[index]['Close'] = short['Close']
                    orders[index]['Commission_close'] = short['Close'] * 0.0001 * amount
                    for index3, stop in sim_df_ha_rsi.iloc[index2:].iterrows():
                        if index3 in validLongLimits.index:
                            open_price = validLongLimits.loc[validLongLimits.index == index3]['Open'].values[0]
                            start_date = validLongLimits.loc[validLongLimits.index == index3]['Date'].values[0]
                            index = index3
                            break
                    break
I fail to capture the following pattern:
Let sim_df_ha_rsi be the data; validLongLimits and validShortLimits are subsets of it. For a given row in validLongLimits, find the next row that belongs to validShortLimits, then continue with the next row in validLongLimits that comes after the one that was shorted.
sim_df_ha_rsi is like:
Date Open High Low Close RSI_14_1m RSI_14_5m
0 2023-01-03 10:45:00 20.090000 20.180000 20.059999 20.115000 NaN NaN
1 2023-01-03 11:00:00 20.110001 20.100000 19.870001 19.990000 NaN NaN
2 2023-01-03 11:15:00 19.995000 19.950001 19.860001 19.897500 NaN NaN
3 2023-01-03 11:30:00 19.889999 19.889999 19.799999 19.852499 NaN NaN
4 2023-01-03 11:45:00 19.860000 19.850000 19.790001 19.817500 NaN NaN
... ... ... ... ... ... ... ...
313 2023-01-17 17:00:00 17.675000 17.750000 17.650000 17.700000 72.283634 62.390653
314 2023-01-17 17:15:00 17.700000 17.740000 17.629999 17.682499 70.135453 62.390653
315 2023-01-17 17:30:00 17.679999 17.660000 17.370001 17.547500 56.248421 55.891129
316 2023-01-17 17:45:00 17.580000 17.540001 17.410000 17.487500 51.379162 55.891129
317 2023-01-17 18:00:00 17.500000 17.520000 17.520000 17.520000 53.716387 55.891129
my output would be like:
Type Open DateOpen Commission_open Commission_close Close DateClose Profit Balance IsOpen
63 LONG 19.430000 2023-01-05 11:30:00 0.19430 None None None None 165000 True
74 LONG 19.600000 2023-01-05 14:15:00 0.19600 None None None None 165000 True
75 LONG 19.420000 2023-01-05 14:30:00 0.19420 None None None None 165000 True
76 LONG 19.330000 2023-01-05 14:45:00 0.19330 None None None None 165000 True
I want to open and close positions consecutively through the data under specific conditions, then recalculate using the price at which I closed the previous position.
My code seemingly works, but it shortcuts many gaps: it performs many iterations to return, say, 2 elements, and the break statements make it fragile. I don't want to use any break statement, but I don't know how to avoid them.
Would a while loop with an anchor be more useful here?
Paraphrasing your inner for-loop line by line (untested against real data):
short_df = sim_df_ha_rsi.iloc[index1:].filter(items=validShortLimits.index, axis=0)
short_df = short_df[short_df['Date'] > long['Date']]
if len(short_df) > 0:
    short = short_df.iloc[0]
    index2 = short.name
    orders[index]['IsOpen'] = False
    orders[index]['DateClose'] = short['Date']
    orders[index]['Close'] = short['Close']
    orders[index]['Commission_close'] = short['Close'] * 0.0001 * amount
    long_df = sim_df_ha_rsi.iloc[index2:].filter(items=validLongLimits.index, axis=0)
    if len(long_df) > 0:
        # long_df.iloc[0] is already a row (a Series), so its fields are
        # scalars - no .values[0] needed here
        open_price = long_df.iloc[0]['Open']
        start_date = long_df.iloc[0]['Date']
        index = long_df.iloc[0].name
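The while loop with an anchor from the question is a reasonable shape, too. Here is a minimal sketch on synthetic data (all names and values below are hypothetical stand-ins for sim_df_ha_rsi, validLongLimits.index and validShortLimits.index): the anchor `pos` only moves forward, so no break is needed.

```python
import pandas as pd

# Hypothetical stand-ins for sim_df_ha_rsi, validLongLimits.index and
# validShortLimits.index; the real objects come from your pipeline.
sim = pd.DataFrame({
    'Date': pd.date_range('2023-01-03 10:45', periods=10, freq='15min'),
    'Open':  [20.0, 20.1, 19.9, 19.8, 19.7, 19.9, 20.0, 19.6, 19.5, 19.8],
    'Close': [20.1, 20.0, 19.85, 19.75, 19.8, 19.95, 19.9, 19.55, 19.6, 19.85],
})
long_idx = pd.Index([1, 4, 7])   # rows where a long may be opened
short_idx = pd.Index([3, 6, 9])  # rows where the long may be closed

orders = {}
pos = 0  # anchor: first row we are still allowed to act on
while pos < len(sim):
    longs = long_idx[long_idx >= pos]          # next long signal at/after anchor
    shorts = short_idx[short_idx > longs[0]] if len(longs) else long_idx[:0]
    if len(longs) and len(shorts):             # a complete open/close pair exists
        open_i, close_i = longs[0], shorts[0]
        orders[open_i] = {
            'Open': sim.loc[open_i, 'Open'],
            'DateOpen': sim.loc[open_i, 'Date'],
            'Close': sim.loc[close_i, 'Close'],
            'DateClose': sim.loc[close_i, 'Date'],
        }
        pos = close_i + 1                      # move the anchor past the close
    else:
        pos = len(sim)                         # no more complete trades: loop ends
```

Each pass opens at the first long signal at or after the anchor, closes at the first short signal after that, and then re-anchors past the close, which is exactly the "reloop after the shorted one" behaviour described above.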
Related
I have a table of data below:
(the 1st column is date, the 2nd column is the daily return)
2020-01-02 0.022034
2020-01-03 -0.002666
2020-01-06 0.009716
2020-01-07 0.009838
2020-01-08 -0.011690
2020-01-09 0.025103
2020-01-10 0.009325
2020-01-13 0.028888
2020-01-14 -0.009183
2020-01-15 0.012292
2020-01-16 -0.005593
2020-01-17 0.020492
2020-01-20 -0.003878
2020-01-21 -0.032687
2020-01-22 0.034887
2020-01-23 -0.033485
2020-01-24 0.001934
2020-01-29 -0.026629
2020-01-30 -0.039513
2020-01-31 -0.001845
2020-02-03 0.021784
2020-02-04 0.033137
2020-02-05 0.000586
2020-02-06 0.016146
2020-02-07 0.000082
2020-02-10 -0.016997
2020-02-11 0.010172
2020-02-12 0.016836
2020-02-13 0.013530
...
2022-01-31 0.031707
2022-02-04 0.028683
2022-02-07 -0.015853
2022-02-08 -0.024170
2022-02-09 0.045076
2022-02-10 0.013623
2022-02-11 -0.012259
2022-02-14 -0.023093
2022-02-15 -0.008984
2022-02-16 0.023177
2022-02-17 0.003182
2022-02-18 -0.054995
2022-02-21 -0.033302
2022-02-22 -0.028148
2022-02-23 0.012332
2022-02-24 -0.048095
2022-02-25 -0.004944
2022-02-28 -0.002682
2022-03-01 0.006940
2022-03-02 0.002542
2022-03-03 -0.006318
2022-03-04 -0.048641
2022-03-07 -0.050231
2022-03-08 -0.015469
2022-03-09 0.011477
2022-03-10 -0.002236
2022-03-11 -0.038740
2022-03-14 -0.115421
2022-03-15 -0.089573
2022-03-16 0.243084
I want to build a frequency table like below:
I think this might involve several steps:
(1) categorise daily return data into different ranges
(2) use value_counts() on the ranges
(3) calculate the percentage on the ranges
For the first step, I think I can try pd.cut with a groupby. However, my DataFrame doesn't have a header. I tried portret_df.columns = ['Dates','Daily Return'] but could not manage to add the header. How can I add a header so that I can refer to the 1st and 2nd columns?
Your help is much appreciated.
Let's use pd.Series.value_counts with the bins parameter.
bins = [-np.inf,-.01,-.005,0,.005,.01,np.inf]
labels = ['ret < -1%',
          '-1% < ret < -.5%',
          '-.5% < ret < 0%',
          '0% < ret < .5%',
          '.5% < ret < 1%',
          'ret > 1%']
df_counts = (df['ret'].value_counts(bins=bins, sort=False)
             .rename('# of events').to_frame().set_axis(labels).T)
df_pcts = (df['ret'].value_counts(bins=bins, normalize=True, sort=False)
           .rename('% of events').to_frame()
           .set_axis(labels).T.mul(100).round(1))
pd.concat([df_counts, df_pcts])
Output:
ret < -1%  -1% < ret < -.5%  -.5% < ret < 0%  0% < ret < .5%  .5% < ret < 1%  ret > 1%
# of events 20.0 4.0 6.0 5.0 4.0 20.0
% of events 33.9 6.8 10.2 8.5 6.8 33.9
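Since df['ret'] is not defined in the snippet above, here is a self-contained sketch of the same value_counts(bins=...) approach on tiny hypothetical data (the return values below are made up, not the asker's):

```python
import numpy as np
import pandas as pd

# Hypothetical return values; the asker's real column would replace these
df = pd.DataFrame({'ret': [0.022, -0.003, 0.010, -0.012, 0.004, -0.007, 0.015]})

bins = [-np.inf, -.01, -.005, 0, .005, .01, np.inf]
labels = ['ret < -1%', '-1% < ret < -.5%', '-.5% < ret < 0%',
          '0% < ret < .5%', '.5% < ret < 1%', 'ret > 1%']

counts = df['ret'].value_counts(bins=bins, sort=False)  # in bin order, not by count
counts.index = labels        # swap the Interval index for readable labels
pcts = (counts / counts.sum() * 100).round(1)
summary = pd.DataFrame({'# of events': counts, '% of events': pcts}).T
print(summary)
```

sort=False keeps the rows in bin order rather than sorting by frequency, which is what makes the labelled output line up with the bins.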
I'm not sure why you have a problem changing the headers - maybe you read the data as a single column.
I have no problem setting the headers when I load:
df = pd.read_csv(..., names=['Date','Daily Return'])
or later
df.columns = ['Date','Daily Return']
And later I can use cut with bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val]
min_val = df['Daily Return'].min() - 1
max_val = df['Daily Return'].max() + 1
regions = pd.cut(df['Daily Return'],
                 bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val],
                 labels=['ret < -1(%)', '-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)',
                         '0 < ret < 0.5(%)', '0.5 < ret < 1(%)', 'ret > 1(%)'],
                 )
And calculate number of events
count = regions.value_counts(sort=False)
print(count)
ret < -1(%) 0
-1 < ret < -0.5(%) 0
-0.5 < ret < 0(%) 30
0 < ret < 0.5(%) 29
0.5 < ret < 1(%) 0
ret > 1(%) 0
And use it to calculate percentage
size = len(regions)
percentage = (count/size) * 100
print(percentage)
ret < -1(%) 0.000000
-1 < ret < -0.5(%) 0.000000
-0.5 < ret < 0(%) 50.847458
0 < ret < 0.5(%) 49.152542
0.5 < ret < 1(%) 0.000000
ret > 1(%) 0.000000
Now all that remains is to put everything in a DataFrame to format the table:
results = pd.DataFrame({'# of event': count, '% of event': percentage})
print(results.T.to_string())
ret < -1(%) -1 < ret < -0.5(%) -0.5 < ret < 0(%) 0 < ret < 0.5(%) 0.5 < ret < 1(%) ret > 1(%)
# of event 0.0 0.0 30.000000 29.000000 0.0 0.0
% of event 0.0 0.0 50.847458 49.152542 0.0 0.0
Full working code, with the example data read via io.StringIO - but you should use your own method to load the data:
text = '''2020-01-02 0.022034
2020-01-03 -0.002666
2020-01-06 0.009716
2020-01-07 0.009838
2020-01-08 -0.011690
2020-01-09 0.025103
2020-01-10 0.009325
2020-01-13 0.028888
2020-01-14 -0.009183
2020-01-15 0.012292
2020-01-16 -0.005593
2020-01-17 0.020492
2020-01-20 -0.003878
2020-01-21 -0.032687
2020-01-22 0.034887
2020-01-23 -0.033485
2020-01-24 0.001934
2020-01-29 -0.026629
2020-01-30 -0.039513
2020-01-31 -0.001845
2020-02-03 0.021784
2020-02-04 0.033137
2020-02-05 0.000586
2020-02-06 0.016146
2020-02-07 0.000082
2020-02-10 -0.016997
2020-02-11 0.010172
2020-02-12 0.016836
2020-02-13 0.013530
2022-01-31 0.031707
2022-02-04 0.028683
2022-02-07 -0.015853
2022-02-08 -0.024170
2022-02-09 0.045076
2022-02-10 0.013623
2022-02-11 -0.012259
2022-02-14 -0.023093
2022-02-15 -0.008984
2022-02-16 0.023177
2022-02-17 0.003182
2022-02-18 -0.054995
2022-02-21 -0.033302
2022-02-22 -0.028148
2022-02-23 0.012332
2022-02-24 -0.048095
2022-02-25 -0.004944
2022-02-28 -0.002682
2022-03-01 0.006940
2022-03-02 0.002542
2022-03-03 -0.006318
2022-03-04 -0.048641
2022-03-07 -0.050231
2022-03-08 -0.015469
2022-03-09 0.011477
2022-03-10 -0.002236
2022-03-11 -0.038740
2022-03-14 -0.115421
2022-03-15 -0.089573
2022-03-16 0.243084
'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text), sep='\s+', names=['date', 'value'])
df.columns = ['Date','Daily Return']
#print(df)
min_val = df['Daily Return'].min() - 1
max_val = df['Daily Return'].max() + 1
regions = pd.cut(df['Daily Return'],
                 bins=[min_val, -1, -0.5, 0, 0.5, 1, max_val],
                 labels=['ret < -1(%)', '-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)',
                         '0 < ret < 0.5(%)', '0.5 < ret < 1(%)', 'ret > 1(%)'],
                 )
count = regions.value_counts(sort=False)
print(count)
size = len(regions)
percentage = (count/size) * 100
print(percentage)
results = pd.DataFrame({'# of event': count, '% of event': percentage})
print(results.T.to_string())
EDIT:
As @tdy suggests in a comment, you can also use -np.inf and np.inf instead of min_val and max_val:
import numpy as np
regions = pd.cut(df['Daily Return'],
                 bins=[-np.inf, -1, -0.5, 0, 0.5, 1, np.inf],
                 labels=['ret < -1(%)', '-1 < ret < -0.5(%)', '-0.5 < ret < 0(%)',
                         '0 < ret < 0.5(%)', '0.5 < ret < 1(%)', 'ret > 1(%)'],
                 )
How can I modify the current output into the arrangement described at the bottom? I've tried stacking and unstacking, but I can't seem to hit the nail on the head. Help would be highly appreciated.
My code:
portfolio_count = 0
Equity_perportfolio = []
Portfolio_sequence = []
while portfolio_count < 1:
    # declaring list
    list = Tickers
    portfolio_count = portfolio_count + 1
    # initializing the value of n (Number of assets in portfolio)
    n = 5
    # printing n elements from list (add number while printing the potential portfolio)
    potential_portfolio = random.sample(list, n)
    print("Portfolio number", portfolio_count)
    print(potential_portfolio)
    # Pull 'relevant data' about the selected stocks. (Yahoo API?) # 1. df with Index Date and Closing
    price_data_close = web.get_data_yahoo(potential_portfolio,
                                          start='2012-01-01',
                                          end='2021-03-31')['Close']
    price_data = web.get_data_yahoo(potential_portfolio,
                                    start='2012-01-01',
                                    end='2021-03-31')
    print(price_data)
Which gives me the following structure (ignore the NaNs):
Attributes Adj Close ... Volume
Symbols D HOLX PSX ... PSX MGM PG
Date ...
2012-01-03 36.209511 17.840000 NaN ... NaN 25873300.0 11565900.0
2012-01-04 35.912926 17.910000 NaN ... NaN 14717400.0 10595400.0
2012-01-05 35.837063 18.360001 NaN ... NaN 12437500.0 10085300.0
2012-01-06 35.471519 18.570000 NaN ... NaN 9079700.0 8421200.0
2012-01-09 35.423241 18.520000 NaN ... NaN 15750100.0 7836100.0
... ... ... ... ... ... ... ...
2021-03-25 75.220001 71.050003 82.440002 ... 2613300.0 9601500.0 7517300.0
2021-03-26 75.779999 73.419998 84.309998 ... 2368900.0 7809100.0 10820100.0
2021-03-29 76.699997 74.199997 82.529999 ... 1880600.0 7809700.0 11176000.0
2021-03-30 75.529999 73.870003 82.309998 ... 1960600.0 5668500.0 8090600.0
2021-03-31 75.959999 74.379997 81.540001 ... 2665200.0 7029900.0 9202600.0
However, I wanted it to output in this format:
Date Symbols Open High Low Close Volume Adjusted
04/12/2020 MMM 172.130005 173.160004 171.539993 172.460007 2663600 171.050461
07/12/2020 MMM 171.720001 172.5 169.179993 170.149994 2526800 168.759323
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
15/12/2020 PSX 174.389999 175.059998 172.550003 174.679993 2270600 173.252304
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
23/12/2020 PG 175.300003 175.809998 173.960007 173.990005 1762600 172.567963
28/12/2020 PG 175.309998 176.399994 174.389999 174.710007 1403000 173.282074
29/12/2020 PG 175.550003 175.639999 173.149994 173.850006 1218900 172.429108
31/12/2020 PG 174.119995 174.869995 173.179993 174.789993 1841300 173.361404
05/01/2021 PG 172.009995 173.25 170.649994 171.580002 2295300 170.177643
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
08/01/2021 MMM 169.169998 169.539993 164.610001 166.619995 4808100 165.258179
13/01/2021 MMM 167.270004 167.740005 166.050003 166.279999 2098000 164.920959
15/01/2021 MMM 165.630005 166.259995 163.380005 165.550003 3550700 164.19693
19/01/2021 MMM 167.259995 169.550003 166.800003 169.119995 3903200 167.737747
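One way to get that long format - a sketch, assuming the downloaded frame has the two-level column MultiIndex (Attributes, Symbols) shown in the printout; the small frame below is a hypothetical stand-in for the yahoo download - is to stack the Symbols level into the rows:

```python
import pandas as pd

# Hypothetical stand-in for the yahoo download: a small frame whose columns
# are a two-level MultiIndex named (Attributes, Symbols), as in the printout
dates = pd.date_range('2021-03-25', periods=3, name='Date')
cols = pd.MultiIndex.from_product(
    [['Open', 'Close', 'Volume'], ['D', 'PSX']],
    names=['Attributes', 'Symbols'])
price_data = pd.DataFrame(
    [[75.0, 82.0, 75.2, 82.4, 2.6e6, 9.6e6],
     [75.5, 84.0, 75.8, 84.3, 2.4e6, 7.8e6],
     [76.5, 82.4, 76.7, 82.5, 1.9e6, 7.8e6]],
    index=dates, columns=cols)

# Move the ticker level from the columns into the rows:
# one row per (Date, Symbol), one column per attribute
long_df = (price_data
           .stack(level='Symbols')
           .reset_index()
           .sort_values(['Date', 'Symbols'])
           .reset_index(drop=True))
```

After the stack, each (Date, Symbol) pair is a row and the attributes (Open, Close, Volume, ...) are plain columns, which matches the desired layout.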
I have a list of shares that make up an ETF. I have formatted the tickers into a list and have named this variable assets
print(assets)
['AUD', 'CRWD', 'SPLK', 'OKTA', 'AVGO', 'CSCO', 'NET', 'ZS', 'AKAM', 'FTNT', 'BAH', 'CYBR', 'CHKP', 'BA/', 'VMW', 'PFPT', 'PANW', 'VRSN', 'FFIV', 'JNPR', 'LDOS', '4704', 'FEYE', 'QLYS', 'SAIC', 'RPD', 'HO', 'MIME', 'SAIL', 'VRNS', 'ITRI', 'AVST', 'MANT', 'TENB', '053800', 'ZIXI', 'OSPN', 'RDWR', 'ULE', 'MOBL', 'ATEN', 'TUFN', 'RBBN', 'NCC', 'KRW', 'EUR', 'JPY', 'GBP', 'USD']
I use the following for loop to iterate through the list and pull historical data from yahoo
for i in assets:
    try:
        df[i] = web.DataReader(i, data_source='yahoo', start=start, end=end)['Adj Close']
    except RemoteDataError:
        print(f'{i}')
        continue
I am returned with:
BA/
4704
HO
053800
KRW
JPY
Suggesting these assets cannot be found on yahoo finance. I understand this is the case and accept that.
When I look for the stocks that have theoretically been found (e.g. df['FEYE']) on yahoo finance I get the following.
0     NaN
1     NaN
2     NaN
...
47    NaN
48    NaN
Name: FEYE, dtype: float64
When I proceed normally with just one share
(e.g. CSCO = web.DataReader(assets[5], data_source='yahoo', start=start, end=end)['Adj Close'])
It is all ok.
Any help is greatly appreciated,
Thank you!
Here is a reproducible testing example of code and output.
If you have an existing DataFrame named df, the new data is incompatible with it in terms of index and possibly column names.
You need to create a new DataFrame, but outside the loop; each iteration then adds a new column with that ticker's data.
import pandas as pd
import pandas_datareader.data as web
from pandas_datareader._utils import RemoteDataError
assets=['AUD', 'CRWD', 'SPLK', 'OKTA', 'AVGO', 'CSCO', 'NET', 'ZS', 'AKAM', 'FTNT', 'BAH', 'CYBR', 'CHKP', 'BA/', 'VMW', 'PFPT', 'PANW', 'VRSN', 'FFIV', 'JNPR', 'LDOS', '4704', 'FEYE', 'QLYS', 'SAIC', 'RPD', 'HO', 'MIME', 'SAIL', 'VRNS', 'ITRI', 'AVST', 'MANT', 'TENB', '053800', 'ZIXI', 'OSPN', 'RDWR', 'ULE', 'MOBL', 'ATEN', 'TUFN', 'RBBN', 'NCC', 'KRW', 'EUR', 'JPY', 'GBP', 'USD']
df = pd.DataFrame()
for i in assets:
    try:
        print(f'Try: {i}')
        df[i] = web.DataReader(i, data_source='yahoo')['Adj Close']
    except RemoteDataError as r:
        print(f'Try: {i}: {r}')
        continue
result:
Try: AUD
Try: CRWD
Try: SPLK
Try: OKTA
Try: AVGO
Try: CSCO
Try: NET
Try: ZS
Try: AKAM
Try: FTNT
Try: BAH
Try: CYBR
Try: CHKP
Try: BA/
Try: BA/: Unable to read URL: https://finance.yahoo.com/quote/BA//history?period1=1435975200&period2=1593741599&interval=1d&frequency=1d&filter=history
Response Text:
b'<html>\n<meta charset=\'utf-8\'>\n<script>\nvar u=\'https://www.yahoo.com/?err=404&err_url=https%3a%2f%2ffinance.yahoo.com%2fquote%2fBA%2f%2fhistory%3fperiod1%3d1435975200%26period2%3d1593741599%26interval%3d1d%26frequency%3d1d%26filter%3dhistory\';\nif(window!=window.top){\n document.write(\'<p>Content is currently unavailable.</p><img src="//geo.yahoo.com/p?s=1197757039&t=\'+new Date().getTime()+\'&_R=\'+encodeURIComponent(document.referrer)+\'&err=404&err_url=\'+u+\'" width="0px" height="0px"/>\');\n}else{\n window.location.replace(u);\n}\n</script>\n<noscript><META http-equiv="refresh" content="0;URL=\'https://www.yahoo.com/?err=404&err_url=https%3a%2f%2ffinance.yahoo.com%2fquote%2fBA%2f%2fhistory%3fperiod1%3d1435975200%26period2%3d1593741599%26interval%3d1d%26frequency%3d1d%26filter%3dhistory\'"></noscript>\n</html>\n'
Try: VMW
Try: PFPT
Try: PANW
Try: VRSN
Try: FFIV
Try: JNPR
Try: LDOS
Try: 4704
Try: 4704: No data fetched for symbol 4704 using YahooDailyReader
Try: FEYE
Try: QLYS
Try: SAIC
Try: RPD
Try: HO
Try: HO: No data fetched for symbol HO using YahooDailyReader
Try: MIME
Try: SAIL
Try: VRNS
Try: ITRI
Try: AVST
Try: MANT
Try: TENB
Try: 053800
Try: 053800: No data fetched for symbol 053800 using YahooDailyReader
Try: ZIXI
Try: OSPN
Try: RDWR
Try: ULE
Try: MOBL
Try: ATEN
Try: TUFN
Try: RBBN
Try: NCC
Try: KRW
Try: KRW: No data fetched for symbol KRW using YahooDailyReader
Try: EUR
Try: JPY
Try: JPY: No data fetched for symbol JPY using YahooDailyReader
Try: GBP
Please note there are 2 types of error:
when the ticker does not exist, for example "HO"
when the resulting URL is wrong due to the "/" in "BA/"
Head of the resulting DataFrame, df.head():
AUD CRWD SPLK OKTA ... NCC EUR GBP USD
Date ...
2015-11-03 51.500000 NaN 57.139999 NaN ... 3.45 NaN 154.220001 13.608685
2015-12-22 55.189999 NaN 54.369999 NaN ... 3.48 NaN 148.279999 13.924644
2015-12-23 55.560001 NaN 56.509998 NaN ... 3.48 NaN 148.699997 14.146811
2015-12-24 55.560001 NaN 56.779999 NaN ... 3.48 NaN 149.119995 14.324224
2015-12-28 56.270000 NaN 57.660000 NaN ... 3.48 NaN 148.800003 14.057305
[5 rows x 43 columns]
Hope this helps.
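An alternative pattern - a sketch with synthetic series standing in for the web.DataReader calls - is to collect the fetched series in a dict and concatenate once at the end; pandas outer-joins the date indexes, so dates missing for one ticker simply become NaN:

```python
import pandas as pd

# Synthetic per-ticker series with partially overlapping dates, standing in
# for the web.DataReader(...)['Adj Close'] results
idx_a = pd.date_range('2020-01-01', periods=4)
idx_b = pd.date_range('2020-01-03', periods=4)
fetched = {
    'AAA': pd.Series([1.0, 1.1, 1.2, 1.3], index=idx_a),
    'BBB': pd.Series([2.0, 2.1, 2.2, 2.3], index=idx_b),
}

frames = {}
for ticker, series in fetched.items():
    frames[ticker] = series  # in real code: the DataReader call inside try/except

# A single concat; date indexes are outer-joined, so missing dates become NaN
df = pd.concat(frames, axis=1)
```

This avoids assigning into a possibly pre-existing, incompatible DataFrame, which is the failure mode described above.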
I have a dataset that looks like:
print(portfolio_all[1])
Date Open High ... Close Adj Close Volume
0 2010-01-04 4.840000 4.940000 ... 4.770000 4.513494 9837300
1 2010-01-05 4.790000 5.370000 ... 5.310000 5.024457 25212000
2 2010-01-06 5.190000 5.380000 ... 5.090000 4.816288 16597900
3 2010-01-07 5.060000 5.430000 ... 5.240000 4.958220 14033400
4 2010-01-08 5.270000 5.430000 ... 5.140000 4.863598 12760000
5 2010-01-11 5.130000 5.230000 ... 5.040000 4.768975 10952900
6 2010-01-12 5.060000 5.150000 ... 5.080000 4.806825 7870300
7 2010-01-13 5.120000 5.500000 ... 5.480000 5.185314 16400500
8 2010-01-14 5.460000 5.710000 ... 5.590000 5.289400 12767100
9 2010-01-15 5.640000 5.840000 ... 5.500000 5.204239 10985300
10 2010-01-19 5.500000 5.730000 ... 5.640000 5.336711 7807700
11 2010-01-20 5.650000 5.890000 ... 5.740000 5.431333 13289100
I want to calculate how many days had positive return (i.e. Close_day_t > Close_day_t-1)
I tried the following function:
def positive_return_days(portfolio):
    positive_returns = pd.DataFrame(
        columns=['ticker', 'name', 'total positive', 'total days'])
    for asset in portfolio:
        for index, row in asset.iterrows():
            try:
                this_day_close = asset.iloc[[index]]['Close']
                previous_day_close = asset.iloc[[index-1]]['Close']
                asset.loc[index, 'positive_days'] = np.where((this_day_close > previous_day_close))
            except IndexError:
                print("I get out of bounds")
        total_positive_days = asset['positive_days'].sum()
        new_row = {'ticker': asset.name, 'name': asset.name, 'total positive': total_positive_days, 'total days': len(asset.index)}
        positive_returns = positive_returns.append(new_row, ignore_index=True)
        print("Asset: ", "total positive days: ", total_positive_days, "total days:", len(asset.index))
    return positive_returns
but I am getting an error:
ValueError: Can only compare identically-labeled Series objects
How can I fix it?
You can just use the .shift function to shift the column by one value.
import pandas as pd
df = pd.DataFrame({'Close':[1,2,3,2,1,3]})
print(df)
print("count",(df.Close - df.Close.shift(1) > 0).sum())
Output:
Close
0 1
1 2
2 3
3 2
4 1
5 3
count 3
You can use pd.Series.diff to calculate the difference and then count the ones that are positive:
(df['Close'].diff() > 0).sum()
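For a quick sanity check of the diff approach, here it is on the same toy Close column used in the other answer:

```python
import pandas as pd

# Same toy Close column as in the other answer
df = pd.DataFrame({'Close': [1, 2, 3, 2, 1, 3]})

# diff() computes Close_t - Close_t-1; the first row is NaN, which
# compares as False, so it is not counted
positive_days = (df['Close'].diff() > 0).sum()
```

Because diff() aligns on the index instead of comparing two differently-labeled Series, it also sidesteps the "Can only compare identically-labeled Series objects" error from the question.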
I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
n = 14  # rolling window length (used below)
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    TR = max(df['High'][i+1], df['Close'][i]) - min(df['Low'][i+1], df['Close'][i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With a 14-period rolling window ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN. It's because the indexes are different between the data frame and ATR. Is there any way to add the ATR column into the data frame?
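Since the question already pinpoints the index mismatch, one minimal direct fix - a sketch on toy data, where the small frame and the TR values are hypothetical stand-ins for the futures data and the computed list - is to build the series with df's own index before joining:

```python
import pandas as pd

# Toy frame with a date index, standing in for the futures data above
df = pd.DataFrame(
    {'High': [983.25, 986.75, 985.25],
     'Low': [975.50, 981.00, 977.00],
     'Close': [980.00, 984.00, 982.00]},
    index=pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04']))

TR_l = [0.0, 6.75, 8.25]  # hypothetical true-range values, one per row

# Key step: give the series df's own index, so join aligns row for row
TR_s = pd.Series(TR_l, index=df.index, name='TR')
df = df.join(TR_s)
```

join aligns on index labels, so once the series carries the frame's DatetimeIndex instead of the default 0..N-1 RangeIndex, the joined column is no longer all NaN.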
Consider shift to avoid the while loop across rows and list building. Below uses Union Pacific (UNP) railroad stock data to demonstrate:
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT DATA ONE DAY BACK AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735