I am trying to pull stock price history at 1-hour intervals through the Yahoo Finance API using the yfinance package. I run the following code:
import yfinance as yf
msft = yf.Ticker("MSFT")
df = msft.history(period = "5d", interval = "1h")
df.reset_index(inplace = True)
print(df["Date"][0])
print(df["Date"][1])
print(df["Date"][2])
I get the output
2020-04-03 00:00:00
2020-04-03 00:00:00
2020-04-03 00:00:00
Why are the timestamps all 00:00:00? The stock prices really are at 1-hour intervals and seem correct, and the dates also change correctly after 7 rows; only the timestamps are all zeros. I can postprocess the timestamps since I know the intervals, but I am curious whether I am doing something wrong here. Is this how the package is supposed to work?
Have you tried using "60m" as the interval argument? It appears there is a related issue, which you can see here: https://github.com/ranaroussi/yfinance/issues/125
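For example, only the interval string needs to change; a minimal sketch (the timestamp column may be named Date or Datetime depending on the yfinance version, so it is selected by position here):
import yfinance as yf

msft = yf.Ticker("MSFT")
df = msft.history(period="5d", interval="60m")  # "60m" instead of "1h"
df.reset_index(inplace=True)

# The first column holds the timestamps; with "60m" they should carry intraday times
print(df.iloc[0, 0])
print(df.iloc[1, 0])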
For those who are new to yfinance, here is how to extract the data returned by the yfinance history() function in more detail.
yfinance uses a library called Pandas. The data structures returned from the yfinance API are Pandas objects.
The object returned by the history() function is a Pandas DataFrame object, which is like a two-dimensional array with extras.
For DataFrame objects there is a columns field, which contains an array of column names, and an index field, which contains an array of index objects that apply to all of the columns. The indexes are of a fixed type and can be objects themselves; in the DataFrame returned by the yfinance history() function, the indexes are Pandas Timestamp objects. (Pandas allows any type for the indexes; plain integers, strings, or other objects would also work.)
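A quick way to see this structure, as a minimal sketch that only inspects the object returned by history():
import yfinance as yf

hist = yf.Ticker("MSFT").history(period="1mo", interval="1d")

print(hist.columns)                   # column names: Open, High, Low, Close, Volume, ...
print(type(hist.index[0]))            # each index entry is a Pandas Timestamp
print(hist["Close"][hist.index[0]])   # one value, addressed by column name and index object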
There is an in-depth description of Pandas data structures in the Pandas documentation.
Each column in the DataFrame object is a Pandas Series object, which is like a one-dimensional array. The columns can be accessed by column name from the DataFrame object, and the values within each column can be accessed using the index objects; every column uses the same indexes. The Python array notation [ ] can be used to access the fields in the Pandas objects.
This is how to access the data:
import yfinance as yf

def zeroX(n):
    # zero-pad a one-digit number to two characters
    result = ""
    if (n < 10):
        result += "0"
    result += str(n)
    return result

def dump_Pandas_Timestamp(ts):
    result = ""
    result += str(ts.year) + "-" + zeroX(ts.month) + "-" + zeroX(ts.day)
    #result += " " + zeroX(ts.hour) + ":" + zeroX(ts.minute) + ":" + zeroX(ts.second)
    return result

def dump_Pandas_DataFrame(DF):
    result = ""
    for indexItem in DF.index:
        ts = dump_Pandas_Timestamp(indexItem)
        fields = ""
        first = 1
        for colname in DF.columns:
            fields += ("" if first else ", ") + colname + " = " + str(DF[colname][indexItem])
            first = 0
        result += ts + " " + fields + "\n"
    return result
msft = yf.Ticker("MSFT")
# get historical market data
hist = msft.history(period="1mo", interval="1d")
print ("hist = " + dump_Pandas_DataFrame(hist))
Output:
hist = 2020-07-08 Open = 210.07, High = 213.26, Low = 208.69, Close = 212.83, Volume = 33600000, Dividends = 0, Stock Splits = 0
2020-07-09 Open = 216.33, High = 216.38, Low = 211.47, Close = 214.32, Volume = 33121700, Dividends = 0, Stock Splits = 0
2020-07-10 Open = 213.62, High = 214.08, Low = 211.08, Close = 213.67, Volume = 26177600, Dividends = 0, Stock Splits = 0
2020-07-13 Open = 214.48, High = 215.8, Low = 206.5, Close = 207.07, Volume = 38135600, Dividends = 0, Stock Splits = 0
2020-07-14 Open = 206.13, High = 208.85, Low = 202.03, Close = 208.35, Volume = 37591800, Dividends = 0, Stock Splits = 0
2020-07-15 Open = 209.56, High = 211.33, Low = 205.03, Close = 208.04, Volume = 32179400, Dividends = 0, Stock Splits = 0
2020-07-16 Open = 205.4, High = 205.7, Low = 202.31, Close = 203.92, Volume = 29940700, Dividends = 0, Stock Splits = 0
2020-07-17 Open = 204.47, High = 205.04, Low = 201.39, Close = 202.88, Volume = 31635300, Dividends = 0, Stock Splits = 0
2020-07-20 Open = 205.0, High = 212.3, Low = 203.01, Close = 211.6, Volume = 36884800, Dividends = 0, Stock Splits = 0
2020-07-21 Open = 213.66, High = 213.94, Low = 208.03, Close = 208.75, Volume = 38105800, Dividends = 0, Stock Splits = 0
2020-07-22 Open = 209.2, High = 212.3, Low = 208.39, Close = 211.75, Volume = 49605700, Dividends = 0, Stock Splits = 0
2020-07-23 Open = 207.19, High = 210.92, Low = 202.15, Close = 202.54, Volume = 67457000, Dividends = 0, Stock Splits = 0
2020-07-24 Open = 200.42, High = 202.86, Low = 197.51, Close = 201.3, Volume = 39827000, Dividends = 0, Stock Splits = 0
2020-07-27 Open = 201.47, High = 203.97, Low = 200.86, Close = 203.85, Volume = 30160900, Dividends = 0, Stock Splits = 0
2020-07-28 Open = 203.61, High = 204.7, Low = 201.74, Close = 202.02, Volume = 23251400, Dividends = 0, Stock Splits = 0
2020-07-29 Open = 202.5, High = 204.65, Low = 202.01, Close = 204.06, Volume = 19632600, Dividends = 0, Stock Splits = 0
2020-07-30 Open = 201.0, High = 204.46, Low = 199.57, Close = 203.9, Volume = 25079600, Dividends = 0, Stock Splits = 0
2020-07-31 Open = 204.4, High = 205.1, Low = 199.01, Close = 205.01, Volume = 51248000, Dividends = 0, Stock Splits = 0
2020-08-03 Open = 211.52, High = 217.64, Low = 210.44, Close = 216.54, Volume = 78983000, Dividends = 0, Stock Splits = 0
2020-08-04 Open = 214.17, High = 214.77, Low = 210.31, Close = 213.29, Volume = 49280100, Dividends = 0, Stock Splits = 0
2020-08-05 Open = 214.9, High = 215.0, Low = 211.57, Close = 212.94, Volume = 28858600, Dividends = 0, Stock Splits = 0
2020-08-06 Open = 212.34, High = 216.37, Low = 211.55, Close = 216.35, Volume = 32656800, Dividends = 0, Stock Splits = 0
2020-08-07 Open = 214.85, High = 215.7, Low = 210.93, Close = 212.48, Volume = 27789600, Dividends = 0, Stock Splits = 0
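For comparison, Pandas can produce a similar text dump directly, without the helper functions above; a minimal sketch using only standard Pandas calls:
# Print the whole DataFrame as text
print(hist.to_string())

# Or format each row yourself, mirroring dump_Pandas_DataFrame above
for ts, row in hist.iterrows():
    print(ts.strftime("%Y-%m-%d"), ", ".join(col + " = " + str(row[col]) for col in hist.columns))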
Related
This is a modified version of a program from a tutorial that extracts data from all of the stocks in the S&P 500 and picks stocks that match the criteria you specify.
The issue is that when I run the program, "list index out of range [stock symbol]" pops up, and those stocks are skipped and aren't added to the final CSV file.
Example:
list index out of range for ABMD
list index out of range for ABT
list index out of range for ADBE
list index out of range for ADI
I'm not really sure what the issue is, and I would greatly appreciate it if someone would explain it to me! Also, I am not applying any of the specified criteria yet and am just trying to get all of the stock data into the CSV files. Make sure to create a folder named stock_data if you try the program. Thanks!
My code:
import pandas_datareader as web
import pandas as pd
from yahoo_fin import stock_info as si
import datetime as dt

dow_list = si.tickers_dow()
sp_list = si.tickers_sp500()
tickers = sp_list
'''tickers = list(set(tickers))
tickers.sort()'''
start = dt.datetime.now() - dt.timedelta(days=365)
end = dt.datetime.now()

sp500_df = web.DataReader('^GSPC', 'yahoo', start, end)
sp500_df['Pct Change'] = sp500_df['Adj Close'].pct_change()
sp500_return = (sp500_df['Pct Change'] + 1).cumprod()[-1]

return_list = []
final_df = pd.DataFrame(columns=['Ticker', 'Latest_Price', 'Score', 'PE_Ratio', 'PEG_Ratio', 'SMA_150', 'SMA_200', '52_Week_Low', '52_Week_High'])

counter = 0
for ticker in tickers:
    df = web.DataReader(ticker, 'yahoo', start, end)
    df.to_csv(f'stock_data/{ticker}.csv')
    df['Pct Change'] = df['Adj Close'].pct_change()
    stock_return = (df['Pct Change'] + 1).cumprod()[-1]
    returns_compared = round((stock_return / sp500_return), 2)
    return_list.append(returns_compared)
    counter += 1
    if counter == 100:
        break

best_performers = pd.DataFrame(list(zip(tickers, return_list)), columns=['Ticker', 'Returns Compared'])
best_performers['Score'] = best_performers['Returns Compared'].rank(pct=True) * 100
best_performers = best_performers[best_performers['Score'] >= best_performers['Score'].quantile(0)]  #picks stocks in top 25 percentile

for ticker in best_performers['Ticker']:
    try:
        df = pd.read_csv(f'stock_data/{ticker}.csv', index_col=0)
        moving_averages = [150, 200]
        for ma in moving_averages:
            df['SMA_' + str(ma)] = round(df['Adj Close'].rolling(window=ma).mean(), 2)
        latest_price = df['Adj Close'][-1]
        pe_ratio = float(si.get_quote_table(ticker)['PE Ratio (TTM)'])
        peg_ratio = float(si.get_stats_valuation(ticker)[1][4])
        moving_average_150 = df['SMA_150'][-1]
        moving_average_200 = df['SMA_200'][-1]
        low_52week = round(min(df['Low'][-(52*5):]), 2)
        high_52week = round(min(df['High'][-(52 * 5):]), 2)
        score = round(best_performers[best_performers['Ticker'] == ticker]['Score'].tolist()[0])
        condition_1 = latest_price > moving_average_150 > moving_average_200
        condition_2 = latest_price >= (1.3 * low_52week)
        condition_3 = latest_price >= (0.75 * high_52week)
        condition_4 = pe_ratio < 25
        condition_5 = peg_ratio < 2
        final_df = final_df.append({'Ticker': ticker,
                                    'Latest_Price': latest_price,
                                    'Score': score,
                                    'PE_Ratio': pe_ratio,
                                    'PEG_Ratio': peg_ratio,
                                    'SMA_150': moving_average_150,
                                    'SMA_200': moving_average_200,
                                    '52_Week_Low': low_52week,
                                    '52_Week_High': high_52week}, ignore_index=True)
    except Exception as e:
        print(f"{e} for {ticker}")

final_df.sort_values(by='Score', ascending=False)
pd.set_option('display.max_columns', 10)
print(final_df)
final_df.to_csv('final.csv')
I have done the troubleshooting on your behalf. In short, you have not checked the contents of the individual indicator data you retrieve: the values come back as indexed, named Series, and they are appended to the dictionary and to the empty data frame as they are. I believe that is the root cause of the error.
The main changes:
The last data row is selected with [-1:] and its values are retrieved explicitly, since iloc is not used.
The 52*5 lookback does not fit the roughly 253 rows of daily data, so it was shortened.
In addition, when the additional indicators are fetched for the downloaded tickers, there are cases where they can be obtained for a given ticker and cases where they cannot (the cause is unknown). It may therefore be necessary to obtain pe_ratio and peg_ratio in advance and change how they are processed.
for ticker in best_performers['Ticker']:
    #print(ticker)
    try:
        df = pd.read_csv(f'stock_data/{ticker}.csv')  # , index_col=0
        moving_averages = [150, 200]
        for ma in moving_averages:
            df['SMA_' + str(ma)] = round(df['Adj Close'].rolling(window=ma).mean(), 2)
        latest_price = df['Adj Close'][-1:].values[0]
        pe_ratio = float(si.get_quote_table(ticker)['PE Ratio (TTM)'])
        moving_average_150 = df['SMA_150'][-1:].values[0]
        moving_average_200 = df['SMA_200'][-1:].values[0]
        low_52week = round(min(df['Low'][-(52*1):]), 2)
        high_52week = round(min(df['High'][-(52*1):]), 2)
        #print(low_52week, high_52week)
        score = round(best_performers[best_performers['Ticker'] == ticker]['Score'].tolist()[0])
        #print(score)
        #print(ticker, latest_price, score, pe_ratio, moving_average_200, low_52week, high_52week)
        final_df = final_df.append({'Ticker': ticker,
                                    'Latest_Price': latest_price,
                                    'Score': score,
                                    'PE_Ratio': pe_ratio,
                                    'SMA_150': moving_average_150,
                                    'SMA_200': moving_average_200,
                                    '52_Week_Low': low_52week,
                                    '52_Week_High': high_52week}, ignore_index=True)
        #print(final_df)
    except Exception as e:
        print(f"{e} for {ticker}")

final_df
Ticker Latest_Price Score PE_Ratio SMA_150 SMA_200 52_Week_Low 52_Week_High
0 A 123.839996 40 31.42 147.26 150.31 123.06 126.75
1 AAP 218.250000 70 22.23 220.66 216.64 190.79 202.04
2 AAPL 165.070007 80 29.42 161.85 158.24 150.10 154.12
3 ABC 161.899994 90 21.91 132.94 129.33 132.00 137.79
4 ADBE 425.470001 10 42.46 552.19 571.99 407.94 422.38
Note
Some stocks are missing because additional indicators could not be obtained.
(Tested on the first 10 tickers: tickers = sp_list[:10])
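Following the suggestion above about handling pe_ratio and peg_ratio in advance, here is a minimal sketch of pre-fetching them and skipping tickers where they are unavailable (the fetch_ratios helper and pre_fetched dict are illustrative names, not part of the original code):
def fetch_ratios(ticker):
    # Try to fetch PE and PEG for one ticker; return None if either is unavailable
    try:
        pe = float(si.get_quote_table(ticker)['PE Ratio (TTM)'])
        peg = float(si.get_stats_valuation(ticker)[1][4])
        return pe, peg
    except Exception as e:
        print(f"skipping {ticker}: {e}")
        return None

# Fetch once, up front, so the main loop only processes tickers with complete data
pre_fetched = {t: fetch_ratios(t) for t in best_performers['Ticker']}
pre_fetched = {t: r for t, r in pre_fetched.items() if r is not None}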
I have seen other threads but couldn't figure it out based on them.
class DataConsolidationAlgorithm(QCAlgorithm):

    def Initialize(self):
        '''Initialise the data and resolution required, as well as the cash and start-end dates for your algorithm. All algorithms must initialized.'''
        self.SetStartDate(2017, 1, 1)   #Set Start Date
        self.SetEndDate(2020, 1, 1)     #Set End Date
        self.SetCash(100000)            #Set Strategy Cash
        self.SetBrokerageModel(BrokerageName.FxcmBrokerage)
        symbols = [self.AddForex(ticker, Resolution.Minute).Symbol
                   for ticker in ["EURUSD"]]
        self.SetBenchmark('SPY')
        self.slow = self.EMA("EURUSD", 200, Resolution.Daily)
        self.SetWarmUp(200)

    def OnData(self, data):
        # Simple buy and hold template
        self.low = self.MIN("EURUSD", 7, Resolution.Daily, Field.Low)
        self.high = self.MAX("EURUSD", 7, Resolution.Daily, Field.High)
        #fxQuoteBars = data.QuoteBars
        #QuoteBar = fxQuoteBars['EURUSD'].Close
        #self.QuoteBar = self.History("EURUSD", TimeSpan.FromDays(1), Resolution.Daily)
        self.quoteBar = data['EURUSD'] ## EURUSD QuoteBar
        #self.Log(f"Mid-point open price: {quoteBar.Open}")
        self.closeBar = (self.quoteBar.Close) ## EURUSD Bid Bar
        self.history7days = self.History(["EURUSD"], 7, Resolution.Daily)
        if self.closeBar <= self.low and self.Forex["EURUSD"].Price > self.slow.Current.Value:
            self.SetHoldings("EURUSD", 1.0)
        if self.closeBar > self.high:
            self.SetHolding("EURUSD", 0.0)
Runtime Error: TypeError : Cannot get managed object
at OnData in main.py:line 50
:: if self.closeBar <= self.low and self.Forex["EURUSD"].Price > self.slow.Current.Value:
TypeError : Cannot get managed object
I was running into a similar error and solved it by making sure that the values I was trying to compare in the if statement with <, >, =, etc. were all of the same type.
Register the indicators in Initialize, then read them into local variables in OnData like below, and all of your comparison operands will be the same data type:
def Initialize(self):
    '''Initialise the data and resolution required, as well as the cash and start-end dates for your algorithm. All algorithms must initialized.'''
    self.SetStartDate(2017, 1, 1)   #Set Start Date
    self.SetEndDate(2020, 1, 1)     #Set End Date
    self.SetCash(100000)            #Set Strategy Cash
    self.SetBrokerageModel(BrokerageName.FxcmBrokerage)
    symbols = [self.AddForex(ticker, Resolution.Minute).Symbol
               for ticker in ["EURUSD"]]
    self.SetBenchmark('SPY')
    self.slow = self.EMA("EURUSD", 200, Resolution.Daily)
    self.SetWarmUp(200)
    # Simple buy and hold template: register the MIN/MAX indicators here as well
    self.low = self.MIN("EURUSD", 7, Resolution.Daily, Field.Low)
    self.high = self.MAX("EURUSD", 7, Resolution.Daily, Field.High)
    self.history7days = self.History(["EURUSD"], 7, Resolution.Daily)

def OnData(self, data):
    quoteBar = data['EURUSD']     ## EURUSD QuoteBar for this slice
    closeBar = quoteBar.Close     ## plain float close of the current bar
    # Read every comparison operand as a plain float value
    low = self.low.Current.Value
    price = self.Forex["EURUSD"].Price
    slow = self.slow.Current.Value
    high = self.high.Current.Value
    if closeBar <= low and price > slow:
        self.SetHoldings("EURUSD", 1.0)
    if closeBar > high:
        self.SetHoldings("EURUSD", 0.0)
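A further guard along these lines may also help, since during the warm-up period set by SetWarmUp(200) the slice may not contain a EURUSD bar yet; this is a minimal sketch assuming the standard QCAlgorithm members IsWarmingUp and Slice.ContainsKey:
def OnData(self, data):
    # Skip bars until warm-up finishes and a EURUSD quote bar is present
    if self.IsWarmingUp or not data.ContainsKey("EURUSD") or data["EURUSD"] is None:
        return
    # ... comparisons as above ...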
I wrote a script to scrape Yahoo Finance stock data using the Yahoo_Fin package.
The aim of the script is to grab company financials to be able to perform some calculations. The input to the script is a txt file with a list of company ticker symbols. The output is also supposed to be a txt file with only the companies that match a certain number of established criteria.
The script does occasionally work with a small txt file (20 tickers or fewer); however, it sometimes gives me the following error (without me changing any code):
"None of ['Breakdown'] are in the columns", with Breakdown being the index column I set for the df.
I have run the script dozens of times, and sometimes it works and sometimes it doesn't. I have run it in Atom and in a Jupyter Notebook and still have no clue what is causing the problem. I have also updated pandas and all necessary packages.
This is the code:
import pandas as pd
import statistics as stat
from yahoo_fin.stock_info import *

stock_list = [line.rstrip('\n') for line in open("test.txt", "r")]
#print(stock_list)

## The balance sheet df ##
balance_sheet = {ticker: get_balance_sheet(ticker)
                 for ticker in stock_list}

## The income statement df ##
income_statement = {ticker: get_income_statement(ticker)
                    for ticker in stock_list}

bs_data = []
for i in range(0, len(stock_list)):
    one_ticker = pd.DataFrame(balance_sheet[stock_list[i]])
    one_ticker = one_ticker.set_index('Breakdown')
    bs_data.append(one_ticker)
#print(bs_data)

income_data = []
#one_ticker =[]
for i in range(0, len(stock_list)):
    one_ticker = pd.DataFrame(income_statement[stock_list[i]])
    one_ticker = one_ticker.set_index('Breakdown')
    income_data.append(one_ticker)
#print(income_data)

## These are the balance sheet variables ##
for loop_counter in range(0, len(stock_list)):
    # Total Assets
    total_assets = (bs_data[loop_counter].loc['Total Assets'].astype(int))
    avg_total_assets = stat.mean(total_assets)
    #print(avg_total_assets)

    # Total Current Liabilities
    total_current_liabilities = (bs_data[loop_counter].loc['Total Current Liabilities'].astype(int))
    avg_total_current_liabilities = stat.mean(total_current_liabilities)
    #print(avg_total_current_liabilities)

    # Total Liabilities
    total_liabilities = (bs_data[loop_counter].loc['Total Liabilities'].astype(int))
    avg_total_liabilities = stat.mean(total_liabilities)
    #print(avg_total_liabilities)

    ## These are the income statement variables ##
    # Total Revenue
    total_revenue = (income_data[loop_counter].loc['Total Revenue']).astype(int)
    avg_total_revenue = stat.mean(total_revenue)
    #print(avg_total_revenue)

    # Operating Income
    operating_income = (income_data[loop_counter].loc['Operating Income or Loss']).astype(int)
    avg_operating_income = stat.mean(operating_income)
    #print(avg_operating_income)

    # Total Operating Expenses
    total_operating_expenses = (income_data[loop_counter].loc['Total Operating Expenses'].astype(int))
    avg_total_operating_expenses = stat.mean(total_operating_expenses)
    #print(avg_total_operating_expenses)

    # EBIT
    ebit = (avg_total_revenue - avg_total_operating_expenses)
    #print(ebit)

    ## Calculations ##
    opm = (avg_operating_income) / (avg_total_revenue)
    #print(opm)
    roce = (ebit) / ((avg_total_assets) - (avg_total_current_liabilities))
    #print(roce)
    leverage = (avg_total_liabilities) / (avg_total_assets)
    #print(leverage)
    #print("Leverage: " + str(round(leverage,2)))
    #print("OPM: " + str(round(opm*100,2)) + "%")
    #print("ROCE: " + str(round(roce*100,2)) + "%")

    ## Save to file ##
    #print(leverage)
    #print(opm)
    #print(roce)
    if leverage < 1.00 and roce >= 0.2 and opm >= 0.2:
        #print("We have a match!")
        outfile = open("results.txt", "a")
        outfile.write(stock_list[loop_counter])
        outfile.write("\n")
        outfile.close()
Any clues as to what might be the problem?
Update #2 Code:
import pandas as pd
import statistics as stat
from yahooquery import *

# Ticker input here
stock_list = [line.rstrip('\n') for line in open("test.txt", "r")]
#for stock in stock_list:
tickers = Ticker(stock_list)

# Get balance sheet
for stock in stock_list:
    #print(stock)
    bs = tickers.balance_sheet()
    bs = pd.DataFrame(bs)
    bs = bs.set_index('endDate')
    #print(bs)

    ## Balance sheet variables to extract ##
    # Total Assets
    total_assets = bs['totalAssets']
    avg_total_assets = stat.mean(total_assets)

    # Total Current Liabilities
    total_current_liabilities = bs['totalCurrentLiabilities']
    avg_total_current_liabilities = stat.mean(total_current_liabilities)

    # Total Liabilities
    total_liabilities = bs['totalLiab']
    avg_total_liabilities = stat.mean(total_liabilities)

    ## Get income statement ##
    inst = tickers.income_statement()
    inst = pd.DataFrame(inst)
    inst = inst.set_index('endDate')

    ## Income statement variables to extract ##
    # Total Revenue
    total_revenue = inst['totalRevenue']
    avg_total_revenue = stat.mean(total_revenue)

    # Operating Income
    operating_income = inst['operatingIncome']
    avg_operating_income = stat.mean(operating_income)

    # Total Operating Expenses
    total_operating_expenses = inst['totalOperatingExpenses']
    avg_total_operating_expenses = stat.mean(total_operating_expenses)

    # EBIT
    ebit = (avg_total_revenue - avg_total_operating_expenses)

    ## Parameters ##
    opm = (avg_operating_income) / (avg_total_revenue)
    roce = (ebit) / ((avg_total_assets) - (avg_total_current_liabilities))
    leverage = (avg_total_liabilities) / (avg_total_assets)

    ## Save to file ##
    #print("Hello!")
    if leverage < 1.00 and roce >= 0.2 and opm >= 0.2:
        #print("Hello")
        outfile = open("yahoo_query_results.txt", "w+")
        outfile.write(stock)
        outfile.write("\n")
        outfile.close()
I am using the SpreadedLinearZeroInterpolatedTermStructure class to price a bond. I have 35 key rates, from 1M to 30Y, and a daily spot curve. I want to feed the 35 key rates extracted from the daily spot curve into the class, then change the key rates and see how the bond price reacts.
Giving credit to GB, and his article here:
http://gouthamanbalaraman.com/blog/bonds-with-spreads-quantlib-python.html
I followed his method and it worked well: the bond price changed as different values were set on the key rates.
Then I substituted my daily spot curve for his flat curve, my handles (35 of them) for his handles list, and my 35 dates for his two dates.
I set values on some of the key rates, but the NPV stayed the same (even when I applied a huge shock). I also tried giving only two key rates on a zero curve, and that worked. So I guess the problem is that 35 key rates is too many? Any help is appreciated.
import QuantLib as ql
# =============================================================================
# normal yc term structure
# =============================================================================
todaysDate = ql.Date(24,5,2019)
ql.Settings.instance().evaluationDate = todaysDate
KR1 = [0, 1, 3, 6, 9] # KR in month unit
KR2 = [x for x in range(1,31)] # KR in year unit
spotDates = [] # starting from today
for kr in KR1:
    p = ql.Period(kr, ql.Months)
    spotDates.append(todaysDate + p)
for kr in KR2:
    p = ql.Period(kr, ql.Years)
    spotDates.append(todaysDate + p)
spotRates = [0.02026,
0.021569,
0.02326,
0.025008,
0.026089,
0.026679,
0.028753,
0.029376,
0.030246,
0.031362,
0.033026,
0.034274,
0.033953,
0.033474,
0.033469,
0.033927,
0.03471,
0.035596,
0.036396,
0.036994,
0.037368,
0.037567,
0.037686,
0.037814,
0.037997,
0.038247,
0.038562,
0.038933,
0.039355,
0.039817,
0.040312,
0.040832,
0.041369,
0.041922,
0.042487] # matching points
dayCount = ql.Thirty360()
calendar = ql.China()
interpolation = ql.Linear()
compounding = ql.Compounded
compoundingFrequency = ql.Annual
spotCurve = ql.ZeroCurve(spotDates, spotRates, dayCount, calendar,
interpolation,compounding, compoundingFrequency)
spotCurveHandle = ql.YieldTermStructureHandle(spotCurve)
# =============================================================================
# bond settings
# =============================================================================
issue_date = ql.Date(24,5,2018)
maturity_date = ql.Date(24,5,2023)
tenor = ql.Period(ql.Semiannual)
calendar = ql.China()
business_convention = ql.Unadjusted
date_generation = ql.DateGeneration.Backward
month_end = False
schedule = ql.Schedule(issue_date,maturity_date,tenor,calendar,
business_convention, business_convention,
date_generation,month_end)
settlement_days = 0
day_count = ql.Thirty360()
coupon_rate = 0.03
coupons = [coupon_rate]
face_value = 100
fixed_rate_bond = ql.FixedRateBond(settlement_days,
face_value,
schedule,
coupons,
day_count)
#bond_engine = ql.DiscountingBondEngine(spotCurveHandle)
#fixed_rate_bond.setPricingEngine(bond_engine)
#print(fixed_rate_bond.NPV())
# =============================================================================
# non-parallel shift of yc
# =============================================================================
#def KRshocks(kr0=0.0, kr_1M=0.0, kr_3M=0.0, kr_6M=0.0, kr_9M=0.0,
# kr_1Y=0.0,kr_2Y=0.0, kr_3Y=0.0, kr_4Y=0.0, kr_5Y=0.0, kr_6Y=0.0,
# kr_7Y=0.0, kr_8Y=0.0, kr_9Y=0.0, kr_10Y=0.0, kr_11Y=0.0, kr_12Y=0.0,
# kr_13Y=0.0, kr_14Y=0.0, kr_15Y=0.0, kr_16Y=0.0, kr_17Y=0.0, kr_18Y=0.0,
# kr_19Y=0.0, kr_20Y=0.0, kr_21Y=0.0, kr_22Y=0.0, kr_23Y=0.0, kr_24Y=0.0,
# kr_25Y=0.0, kr_26Y=0.0, kr_27Y=0.0, kr_28Y=0.0, kr_29Y=0.0, kr_30Y=0.0):
# '''
#
# Parameters:
# Input shocks for each key rate.
# kr0 = today's spot rate shock;
# kr_1M = 0.083 year(1 month) later spot rate shock;
# kr_1Y = 1 year later spot rate shock;
# .
# .
# .
#
# '''
#
# krs = list(locals().keys())
# KRHandles = {}
# for k in krs:
# KRHandles['{}handle'.format(k)] = ql.QuoteHandle(ql.SimpleQuote(locals()[k]))
# return list(KRHandles.values())
#handles = KRshocks()
kr = ['kr0', 'kr_1M', 'kr_3M', 'kr_6M', 'kr_9M', 'kr_1Y','kr_2Y', 'kr_3Y',
'kr_4Y', 'kr_5Y', 'kr_6Y','kr_7Y', 'kr_8Y', 'kr_9Y', 'kr_10Y', 'kr_11Y',
'kr_12Y', 'kr_13Y', 'kr_14Y', 'kr_15Y', 'kr_16Y', 'kr_17Y', 'kr_18Y',
'kr_19Y', 'kr_20Y', 'kr_21Y', 'kr_22Y', 'kr_23Y', 'kr_24Y','kr_25Y',
'kr_26Y', 'kr_27Y', 'kr_28Y', 'kr_29Y', 'kr_30Y']
#KRQuotes = {}
handles = []
#for k in range(len(kr)):
# KRQuotes['{}'.format(kr[k])] = ql.SimpleQuote(spotRates[k])
# handles.append(ql.QuoteHandle(ql.SimpleQuote(spotRates[k])))
kr0 = ql.SimpleQuote(spotRates[0])
kr_1M = ql.SimpleQuote(spotRates[1])
kr_3M = ql.SimpleQuote(spotRates[2])
kr_6M = ql.SimpleQuote(spotRates[3])
kr_9M = ql.SimpleQuote(spotRates[4])
kr_1Y = ql.SimpleQuote(spotRates[5])
kr_2Y = ql.SimpleQuote(spotRates[6])
kr_3Y = ql.SimpleQuote(spotRates[7])
kr_4Y = ql.SimpleQuote(spotRates[8])
kr_5Y = ql.SimpleQuote(spotRates[9])
kr_6Y = ql.SimpleQuote(spotRates[10])
kr_7Y = ql.SimpleQuote(spotRates[11])
kr_8Y = ql.SimpleQuote(spotRates[12])
kr_9Y = ql.SimpleQuote(spotRates[13])
kr_10Y = ql.SimpleQuote(spotRates[14])
kr_11Y = ql.SimpleQuote(spotRates[15])
kr_12Y = ql.SimpleQuote(spotRates[16])
kr_13Y = ql.SimpleQuote(spotRates[17])
kr_14Y = ql.SimpleQuote(spotRates[18])
kr_15Y = ql.SimpleQuote(spotRates[19])
kr_16Y = ql.SimpleQuote(spotRates[20])
kr_17Y = ql.SimpleQuote(spotRates[21])
kr_18Y = ql.SimpleQuote(spotRates[22])
kr_19Y = ql.SimpleQuote(spotRates[23])
kr_20Y = ql.SimpleQuote(spotRates[24])
kr_21Y = ql.SimpleQuote(spotRates[25])
kr_22Y = ql.SimpleQuote(spotRates[26])
kr_23Y = ql.SimpleQuote(spotRates[27])
kr_24Y = ql.SimpleQuote(spotRates[28])
kr_25Y = ql.SimpleQuote(spotRates[29])
kr_26Y = ql.SimpleQuote(spotRates[30])
kr_27Y = ql.SimpleQuote(spotRates[31])
kr_28Y = ql.SimpleQuote(spotRates[32])
kr_29Y = ql.SimpleQuote(spotRates[33])
kr_30Y = ql.SimpleQuote(spotRates[34])
handles.append(ql.QuoteHandle(kr0))
handles.append(ql.QuoteHandle(kr_1M))
handles.append(ql.QuoteHandle(kr_3M))
handles.append(ql.QuoteHandle(kr_6M))
handles.append(ql.QuoteHandle(kr_9M))
handles.append(ql.QuoteHandle(kr_1Y))
handles.append(ql.QuoteHandle(kr_2Y))
handles.append(ql.QuoteHandle(kr_3Y))
handles.append(ql.QuoteHandle(kr_4Y))
handles.append(ql.QuoteHandle(kr_5Y))
handles.append(ql.QuoteHandle(kr_6Y))
handles.append(ql.QuoteHandle(kr_7Y))
handles.append(ql.QuoteHandle(kr_8Y))
handles.append(ql.QuoteHandle(kr_9Y))
handles.append(ql.QuoteHandle(kr_10Y))
handles.append(ql.QuoteHandle(kr_11Y))
handles.append(ql.QuoteHandle(kr_12Y))
handles.append(ql.QuoteHandle(kr_13Y))
handles.append(ql.QuoteHandle(kr_14Y))
handles.append(ql.QuoteHandle(kr_15Y))
handles.append(ql.QuoteHandle(kr_16Y))
handles.append(ql.QuoteHandle(kr_17Y))
handles.append(ql.QuoteHandle(kr_18Y))
handles.append(ql.QuoteHandle(kr_19Y))
handles.append(ql.QuoteHandle(kr_20Y))
handles.append(ql.QuoteHandle(kr_21Y))
handles.append(ql.QuoteHandle(kr_22Y))
handles.append(ql.QuoteHandle(kr_23Y))
handles.append(ql.QuoteHandle(kr_24Y))
handles.append(ql.QuoteHandle(kr_25Y))
handles.append(ql.QuoteHandle(kr_26Y))
handles.append(ql.QuoteHandle(kr_27Y))
handles.append(ql.QuoteHandle(kr_28Y))
handles.append(ql.QuoteHandle(kr_29Y))
handles.append(ql.QuoteHandle(kr_30Y))
ts_spreaded2 = ql.SpreadedLinearZeroInterpolatedTermStructure(spotCurveHandle,
handles,
spotDates)
ts_spreaded_handle2 = ql.YieldTermStructureHandle(ts_spreaded2)
bond_engine = ql.DiscountingBondEngine(ts_spreaded_handle2)
fixed_rate_bond.setPricingEngine(bond_engine)
#print(fixed_rate_bond.NPV())
kr0.setValue(0.1)
kr_10Y.setValue(0.2)
kr_12Y.setValue(0.2)
print(fixed_rate_bond.NPV())
No errors came out, but the bond price is the same as the price before the spreads were added.
I am trying to update a value in a DataFrame using a method and a for loop. I pass the DataFrame into the method and use the loop to calculate the value I want to put into the last column.
Here is the method:
def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00
    for i, row in df.iterrows():
        #Get all values from each row
        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        tpv = tp * row['VOLUME']
        sumTpv = sumTpv + tpv
        sumVolume = sumVolume + row['VOLUME']
        vwap = sumTpv / sumVolume
        #Find VWAP
        #df.assign(VWAP = vwap)
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        df.set_value(row, 'VWAP', vwap)
        df = df.reindex(row = row)
        df[row] = df[row].astype(float)
        dayVwap = dayVwap + vwap
    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df
The DataFrame already has the column in it, as I add it before I pass the df into the method, like this:
df["VWAP"] = ""
#do vwap calculation
df = vwap2(df)
But the values either are all the same, which they should not be, or are not written at all. I have tried a few things, but with no success.
Update
Here is the data that I am using; I am pulling it from Google each time:
CLOSE HIGH LOW OPEN VOLUME TP \
2018-05-10 22:30:00 97.3600 97.48 97.3000 97.460 371766 97.86375
1525991460000000000 97.2900 97.38 97.1800 97.350 116164 97.86375
1525991520000000000 97.3100 97.38 97.2700 97.270 68937 97.86375
1525991580000000000 97.3799 97.40 97.3101 97.330 46729 97.86375
1525991640000000000 97.2200 97.39 97.2200 97.365 64823 97.86375
TPV SumTPV SumVol VWAP
2018-05-10 22:30:00 3.722224e+08 1.785290e+09 18291710 97.601027
1525991460000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991520000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991580000000000 3.722224e+08 1.785290e+09 18291710 97.601027
1525991640000000000 3.722224e+08 1.785290e+09 18291710 97.601027
As you can see, all the calculated columns hold the same values.
Here is what I am using right now:
def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00
    for i, row in df.iterrows():
        #Get all values from each row
        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        df['TP'] = tp
        tpv = tp * row['VOLUME']
        df['TPV'] = tpv
        sumTpv = sumTpv + tpv
        df['SumTPV'] = sumTpv
        sumVolume = sumVolume + row['VOLUME']
        df['SumVol'] = sumVolume
        vwap = sumTpv / sumVolume
        #Find VWAP
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        #df.set_value(row, 'VWAP', vwap)
        df["VWAP"] = vwap
        dayVwap = dayVwap + vwap
    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df
IIUC, you don't need a loop, or even apply - you can use direct column assignment and cumsum() to get what you're looking for.
Some example data:
import numpy as np
import pandas as pd
N = 20
high = np.random.random(N)
low = np.random.random(N)
close = np.random.random(N)
opening = np.random.random(N)
volume = np.random.random(N)
data = {"HIGH":high, "LOW":low, "CLOSE":close, "OPEN":opening, "VOLUME":volume}
df = pd.DataFrame(data)
df.head()
CLOSE HIGH LOW OPEN VOLUME
0 0.848676 0.260967 0.004188 0.139342 0.931406
1 0.771065 0.356639 0.495715 0.652106 0.988217
2 0.288206 0.567776 0.023687 0.809410 0.134134
3 0.832711 0.508586 0.031569 0.120774 0.891948
4 0.857051 0.391618 0.155635 0.069054 0.628036
Assign the tp and tpv columns directly, then apply cumsum to get sumTpv and sumVolume:
df["tp"] = (df['HIGH'] + df['LOW'] + df['CLOSE'] + df['OPEN']) / 4
df["tpv"] = df.tp * df['VOLUME']
df["sumTpv"] = df.tpv.cumsum()
df["sumVolume"] = df.VOLUME.cumsum()
df["vwap"] = df.sumTpv.div(df.sumVolume)
df.head()
CLOSE HIGH LOW OPEN VOLUME tp tpv \
0 0.848676 0.260967 0.004188 0.139342 0.931406 0.313293 0.291803
1 0.771065 0.356639 0.495715 0.652106 0.988217 0.568881 0.562178
2 0.288206 0.567776 0.023687 0.809410 0.134134 0.422270 0.056641
3 0.832711 0.508586 0.031569 0.120774 0.891948 0.373410 0.333063
4 0.857051 0.391618 0.155635 0.069054 0.628036 0.368340 0.231331
sumTpv sumVolume vwap
0 0.291803 0.931406 0.313293
1 0.853982 1.919624 0.444869
2 0.910622 2.053758 0.443393
3 1.243685 2.945706 0.422203
4 1.475016 3.573742 0.412737
Update (per OP comment):
To get dayVwap as the sum of all vwap, use dayVwap = df.vwap.sum().
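Putting it together, a minimal sketch of a drop-in replacement for the vwap2 function in the question, using only the column operations shown above (the function name is illustrative):
def vwap2_vectorized(df):
    # Column-wise equivalent of the row loop in the question; assumes the same
    # HIGH/LOW/CLOSE/OPEN/VOLUME columns
    df = df.copy()
    df["TP"] = (df["HIGH"] + df["LOW"] + df["CLOSE"] + df["OPEN"]) / 4
    df["TPV"] = df["TP"] * df["VOLUME"]
    df["SumTPV"] = df["TPV"].cumsum()
    df["SumVol"] = df["VOLUME"].cumsum()
    df["VWAP"] = df["SumTPV"] / df["SumVol"]
    day_vwap = df["VWAP"].sum()   # matches the dayVwap accumulator in the question
    return df, day_vwap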