How to pull beta data from Yahoo Finance? - python

Yahoo Finance already calculates beta values, so I thought I could save time by pulling them rather than calculating them myself from variance and so on. The beta chart can be seen under the stock chart. I am able to extract the close price and volume for the tickers using the code below:
from datetime import date  # needed for date.today() below
import yfinance as yf
from yahoofinancials import YahooFinancials
df = yf.download('AAPL, MSFT',
                 start='2021-08-01',
                 end=date.today(),
                 progress=False)
adjusted_close = df['Adj Close'].reset_index()
volume = df['Volume'].reset_index()
But how can I get beta values the same way I get prices or volumes? I am looking to pull historical beta data with a start and end date.

You can do this in a batch, using concat instead of DataFrame.append (deprecated, and removed in pandas 2.0):
# import yfinance and pandas
import pandas as pd
import yfinance as yf
# initialise with a df with the columns
df = pd.DataFrame(columns=['Stock', 'Beta', 'Marketcap'])
# here, symbol_sgx is the list of symbols (tickers) you would like to retrieve data of
# for instance, to retrieve information for DBS, UOB, and Singtel, use the following:
symbol_sgx = ['D05.SI', 'U11.SI', 'Z74.SI']
for stock in symbol_sgx:
    ticker = yf.Ticker(stock)
    info = ticker.info
    beta = info.get('beta')
    marketcap = info.get('marketCap')
    df_temp = pd.DataFrame({'Stock': stock, 'Beta': [beta], 'Marketcap': [marketcap]})
    df = pd.concat([df, df_temp], ignore_index=True)
# this line allows you to check that you retrieved the right information
df
info.get() is a better alternative to info[]. The latter raises a KeyError when the key is missing, so if one of the tickers is errant (e.g. outdated or delisted) the script stops. This is especially annoying if you have a long list of tickers and you don't know which one is errant. info.get() returns None instead, so the loop keeps running when no information is available. For these entries, you just need to post-process with df.dropna() to remove the NaNs.
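To make the difference concrete, here is a minimal sketch that needs no network access. The `fake_info` dictionaries are hypothetical stand-ins for what `yf.Ticker(symbol).info` might return (the numbers and the `XXXX.SI` symbol are made up), and the rows are collected in a list so the DataFrame is built once at the end.

```python
import pandas as pd

# Hypothetical stand-ins for yf.Ticker(symbol).info (made-up values).
# The second entry mimics an errant ticker with no data at all.
fake_info = {
    "D05.SI": {"beta": 0.9, "marketCap": 9.0e10},
    "XXXX.SI": {},
}

rows = []
for symbol, info in fake_info.items():
    # dict.get returns None for a missing key instead of raising KeyError
    rows.append({
        "Stock": symbol,
        "Beta": info.get("beta"),
        "Marketcap": info.get("marketCap"),
    })

df = pd.DataFrame(rows)
# Drop the tickers for which no beta was available
clean = df.dropna(subset=["Beta"])
print(clean)
```

With `info["beta"]` in place of `info.get("beta")`, the loop would stop with a KeyError at the second symbol.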

yfinance exposes a dictionary of company information for each ticker through Ticker.info. It includes the current beta value:
import yfinance as yf
ticker = yf.Ticker('AAPL')
stock_info = ticker.info
stock_info['beta']
# 1.201965
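The answers above return a single current beta rather than a history. If a dated series is needed, one possible approach (not something the answers above propose) is to compute a rolling beta yourself as the covariance of stock and market returns divided by the variance of market returns. The sketch below uses synthetic returns so it runs offline; `rolling_beta` and the 20-day window are illustrative choices, not part of yfinance.

```python
import numpy as np
import pandas as pd

def rolling_beta(stock_returns, market_returns, window):
    """Rolling beta = cov(stock, market) / var(market) over `window` periods."""
    cov = stock_returns.rolling(window).cov(market_returns)
    var = market_returns.rolling(window).var()
    return cov / var

# Synthetic daily returns (assumption: real data would come from
# something like df['Adj Close'].pct_change() for a stock and an index).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-08-02", periods=60)
market = pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)
stock = 1.5 * market  # constructed so the true beta is 1.5

beta = rolling_beta(stock, market, window=20)
print(beta.dropna().head())
```

With real prices, the market series could be the returns of an index such as `^GSPC`. The result will generally not match the figure Yahoo publishes, since it depends on the window and return frequency you choose.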

Related

how to find values above threshold in pandas and store them with date

I have a DF with stock prices and I want to find stock prices for each day that are above a threshold and record the date, percent increase and stock name.
import pandas as pd
import requests
import time
import yfinance as yf
stock_tickers = ['AAPL', 'MSFT', 'LCID', 'HOOD', 'TSLA']
df = yf.download(stock_tickers,
                 start='2020-01-01',
                 end='2021-06-12',
                 progress=True,
                 )
data = df['Adj Close']
data = data.pct_change()
data.dropna(inplace=True)
top = []
for i in range(len(data)):
    if i > .01:
        top.append(data.columns[i])
I tried to do a for loop, but it saves all the ticker names.
What I want to do is find the stocks that increased by more than 1% on each day and save the name, date and percent increase in a pandas DataFrame.
Any help would be appreciated.
There might be a more efficient way, but I'd use DataFrame.items() (the older DataFrame.iteritems() was removed in pandas 2.0). An example is attached below. I kept the duplicated Date index since I was not sure how you'd like to keep the data.
data = df["Adj Close"].pct_change()
threshold = 0.01
df_above_th_list = []
# items() yields (column name, column Series) pairs
for stock, sr in data.items():
    sr_above_th = sr[sr > threshold]
    df_above_th_list.append(pd.DataFrame({"stock": stock, "pct": sr_above_th}))
df_above_th = pd.concat(df_above_th_list)
If you want to process the data by row, you can use DataFrame.iterrows() or DataFrame.itertuples().
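As an alternative to looping, the same result can be sketched with a reshape: melt the wide table of returns into long form (one row per date and stock) and filter once. The prices below are made-up numbers standing in for `df["Adj Close"]`, so the snippet runs without downloading anything.

```python
import pandas as pd

# Made-up adjusted-close prices standing in for df["Adj Close"]
prices = pd.DataFrame(
    {"AAPL": [100.0, 102.0, 102.5, 101.0],
     "MSFT": [200.0, 201.0, 205.0, 206.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"]),
)
prices.index.name = "Date"

threshold = 0.01
returns = prices.pct_change().dropna(how="all")

# Wide -> long: one row per (Date, stock) pair
long = returns.reset_index().melt(id_vars="Date", var_name="stock", value_name="pct")

# Keep only the moves above the threshold
above = long[long["pct"] > threshold].reset_index(drop=True)
print(above)
```

Each row of `above` holds the date, the stock name and the percent change, which is the shape the question asks for.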

How to make it so that all available data is being pulled instead of specifically typing out a date range for this script?

The available option dates are below. How can I write code that pulls all of those dates instead of having to type each one out on a separate line?
2022-03-11, 2022-03-18, 2022-03-25, 2022-04-01, 2022-04-08, 2022-04-14, 2022-04-22, 2022-05-20, 2022-06-17, 2022-07-15, 2022-10-21, 2023-01-20, 2024-01-19
import yfinance as yf
gme = yf.Ticker("gme")
opt = gme.option_chain('2022-03-11')
print(opt)
First of all, as these dates have no regular pattern, you should create a list of the dates.
list1=['2022-03-11', '2022-03-18', '2022-03-25', '2022-04-01', '2022-04-08', '2022-04-14', '2022-04-22', '2022-05-20', '2022-06-17', '2022-07-15', '2022-10-21', '2023-01-20', '2024-01-19']
After you have created the list, you can set up the ticker as you have already done:
import yfinance as yf
gme = yf.Ticker("gme")
Since you want everything printed out, I assume you would rather save it to files for easier viewing (I checked the output and personally prefer CSV for yfinance data). You can do this:
for date in list1:
    # option_chain returns a tuple-like object: calls first, puts second
    df = gme.option_chain(date)
    df_call = df[0]
    df_put = df[1]
    df_call.to_csv(f'call_{date}.csv')
    df_put.to_csv(f'put_{date}.csv')
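To avoid typing the dates at all, the expiry list can be read from the ticker itself through its `options` attribute and looped over. The helper below is a sketch; `collect_option_chains` is a name introduced here for illustration, and the usage at the bottom is left commented out because it needs network access and yfinance installed.

```python
def collect_option_chains(ticker):
    """Return {expiry: (calls, puts)} for every expiry the ticker lists.

    `ticker` is expected to behave like a yfinance Ticker: an `options`
    attribute with the expiry strings and an `option_chain(expiry)` method
    whose result has `calls` and `puts` attributes.
    """
    chains = {}
    for expiry in ticker.options:
        chain = ticker.option_chain(expiry)
        chains[expiry] = (chain.calls, chain.puts)
    return chains

# Usage (requires network access):
# import yfinance as yf
# chains = collect_option_chains(yf.Ticker("GME"))
# for expiry, (calls, puts) in chains.items():
#     calls.to_csv(f"call_{expiry}.csv")
#     puts.to_csv(f"put_{expiry}.csv")
```

Because the dates come from the ticker, the loop keeps working as expiries are added or dropped.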

Data appears when printed but doesn't show up in dataframe

#! /usr/lib/python3
import yfinance as yf
import pandas as pd
pd.set_option('display.max_rows', None, 'display.max_columns', None)
# Request stock data from yfinance
ticker = yf.Ticker('AAPL')
# Get all option expiration dates in the form of a list
xdates = ticker.options
# Go through the list of expiry dates one by one
for xdate in xdates:
    # Get option chain info for that xdate
    option = ticker.option_chain(xdate)
    # print out this value, get back 15 columns and 63 rows of information
    print(option)
    # Put that same data in dataframe
    df = pd.DataFrame(data = option)
    # Show dataframe
    print(df)
Expected: df will show a DataFrame containing the same information that is shown when running print(option), i.e. 15 columns and 63 rows of data, or at least some part of them
Actual:
df shows only two columns with no information
df.shape results in (2,1)
print(df.columns.tolist()) results in [0]
Since the desired info appears when you print it, I'm confused as to why it's not appearing in the dataframe.
The option_chain data for a specific expiration date is already available as a DataFrame in the calls property of the returned object (and in puts for the put side). You don't have to create a new DataFrame.
ticker = yf.Ticker('AAPL')
xdates = ticker.options
option = ticker.option_chain(xdates[0])
option.calls # DataFrame
GitHub - yfinance
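If a single table is wanted rather than two, the calls and puts frames can be stacked with a label column. This is a sketch: `chain_to_frame` is a hypothetical helper name, and it only assumes the object has `calls` and `puts` DataFrames as described above.

```python
import pandas as pd

def chain_to_frame(chain):
    """Stack a chain's calls and puts into one DataFrame with a 'type' column."""
    calls = chain.calls.assign(type="call")
    puts = chain.puts.assign(type="put")
    return pd.concat([calls, puts], ignore_index=True)

# Usage (requires network access):
# import yfinance as yf
# ticker = yf.Ticker("AAPL")
# df = chain_to_frame(ticker.option_chain(ticker.options[0]))
```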

how to extract weekends and bank holidays for stock price data

markowitz = pd.read_excel('C:/Users/jordan/Desktop/book2.xlsx')
markowitz = markowitz.set_index('Dates')
markowitz
There are some NaN values in the data; some of them are weekends and some are holidays. I have to identify the holidays and set them to the previous value.
Is there a simple way I can do this? I used:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
dr = pd.date_range(start='2013-01-01', end='2018-06-12')
df = pd.DataFrame()
df['Date'] = dr
cal = calendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())
df['Holiday'] = df['Date'].isin(holidays)
print (df)
df = df[df['Holiday'] == True]
df
But there are still a lot of dates I have to copy and paste (can I display just the "Date" column?) and then set to the previous trading day's value. Is there a simpler way to do this? Thanks a lot in advance.
There may be a simpler way, depending on what you are trying to do. DataFrames support forward filling, so if you don't want to fill weekend days but want to fill all other NAs (i.e. holidays), you can just exclude Saturdays and Sundays as follows (dt.day_name() replaces the older dt.weekday_name, and .ffill() replaces fillna(method='ffill')):
weekdays = ~df['Date'].dt.day_name().isin(['Saturday', 'Sunday'])
df.loc[weekdays] = df.loc[weekdays].ffill()
You can use this on the whole dataframe or on particular columns.
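A small worked example of this idea on a single column, using made-up prices over eight calendar days (2018-01-01 is a Monday). The `Price` column name and the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up data: NaN on Wednesday (treated as a holiday) and on the weekend
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=8, freq="D"),
    "Price": [10.0, 11.0, np.nan, 12.0, 13.0, np.nan, np.nan, 14.0],
})

# dayofweek: Monday=0 ... Sunday=6, so values below 5 are weekdays
is_weekday = df["Date"].dt.dayofweek < 5

# Forward fill only within the weekday rows; weekend rows stay NaN
df.loc[is_weekday, "Price"] = df.loc[is_weekday, "Price"].ffill()
print(df)
```

The Wednesday gap takes Tuesday's value, while Saturday and Sunday remain NaN.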
I think your best bet is to get an API key from quandl.com. It's free and it gives you access to all kinds of historical time series data. There used to be access to Yahoo Finance and Google Finance, but I think both were deprecated well over a year ago.
Here is a small sample of code that can definitely help you.
import quandl
quandl.ApiConfig.api_key = 'your_api_key_goes_here'
# get the table for daily stock prices and,
# filter the table for selected tickers, columns within a time range
# set paginate to True because Quandl limits tables API to 10,000 rows per call
data = quandl.get_table('WIKI/PRICES', ticker = ['AAPL', 'MSFT', 'WMT'],
                        qopts = { 'columns': ['ticker', 'date', 'adj_close'] },
                        date = { 'gte': '2015-12-31', 'lte': '2016-12-31' },
                        paginate=True)
print(data)
Check the link below for info about how to get the data you need.
https://blog.quandl.com/api-for-stock-data
Also, please see this for more details about using Python for quantitative finance.
https://financetrain.com/best-python-librariespackages-finance-financial-data-scientists/
Finally, and I apologize if this is a little off topic, but I think it may be helpful at some level...consider something like this...
import requests
from bs4 import BeautifulSoup
base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,7,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs = {'id':'screener-content'})
light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")
data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        row_data = []
        for cell in row.find_all('td'):
            val = cell.a.get_text()
            row_data.append(val)
        data.append(row_data)
# sort rows to maintain original order
data.sort(key=lambda x: int(x[0]))
import pandas
pandas.DataFrame(data).to_csv("AAA.csv", header=False)
It's not time series data, but rather fundamental data. I haven't spent a lot of time on that site, but maybe you can poke around and find something there that suits your needs. Just a thought.

Storing stock quotes data from this object into python panda data frame

I have a Python object imported from win32com which contains stock price quotations. num_bars is the number of bars in the stock history. quotations is the object that contains the stock quotations.
To retrieve and store the prices and dates, the Python code looks something like this:
for x in range(0, num_bars):
    date_quote[x] = quotations(x).Date.date()
    close[x] = quotations(x).Close
    open[x] = quotations(x).Open
    # high and low were swapped in the original snippet
    high[x] = quotations(x).High
    low[x] = quotations(x).Low
    volume[x] = quotations(x).Volume
    open_int[x] = quotations(x).OpenInt
I just discovered the pandas DataFrame today and thought it would be better to store the stock quotation data in a DataFrame to make use of the ecosystem built around pandas. However, when I looked at the DataFrame structure created by the pandas_datareader module, it looked highly complicated. Is there a convenient way, like an API or library, to store the stock data from the quotations object in a pandas DataFrame?
I am using python v3.6
You can do the following:
import pandas as pd
df = pd.DataFrame(columns=['Date', 'Close', 'Open', 'Low',
                           'High', 'Volume', 'OpenInt'])
# add one row per bar
for i in range(num_bars):
    df.loc[i] = [quotations(i).Date.date(),
                 quotations(i).Close,
                 quotations(i).Open,
                 quotations(i).Low,
                 quotations(i).High,
                 quotations(i).Volume,
                 quotations(i).OpenInt]
After that, you'll have a DataFrame df with the stock quotation data.
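For long histories, growing a DataFrame one row at a time can be slow. A variation, sketched below, collects plain records first and builds the DataFrame once. The `quotations` function and `_bars` list here are hypothetical stand-ins for the COM object (with made-up values) so that the snippet runs on its own.

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-in for the COM quotations object (made-up values)
_bars = [
    SimpleNamespace(Date=datetime(2018, 1, 2), Close=10.5, Open=10.0,
                    Low=9.8, High=10.7, Volume=1000, OpenInt=0),
    SimpleNamespace(Date=datetime(2018, 1, 3), Close=11.0, Open=10.6,
                    Low=10.4, High=11.2, Volume=1200, OpenInt=0),
]

def quotations(i):
    return _bars[i]

num_bars = len(_bars)
fields = ["Close", "Open", "Low", "High", "Volume", "OpenInt"]

# Build one dict per bar, then create the DataFrame in a single call
records = []
for i in range(num_bars):
    bar = quotations(i)
    record = {"Date": bar.Date.date()}
    record.update({name: getattr(bar, name) for name in fields})
    records.append(record)

df = pd.DataFrame.from_records(records).set_index("Date")
print(df)
```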
Maybe you are looking for:
from pandas_datareader import data
aapl = data.DataReader('AAPL', 'google', '1980-01-01')
