how to extract weekends and bank holidays from stock price data - python

import pandas as pd

markowitz = pd.read_excel('C:/Users/jordan/Desktop/book2.xlsx')
markowitz = markowitz.set_index('Dates')
markowitz
There are some NaN values in the data; some of them are weekends and some of them are holidays. I have to identify the holidays and set each one to the previous value.
Is there a simple way I can do this? I used:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
dr = pd.date_range(start='2013-01-01', end='2018-06-12')
df = pd.DataFrame()
df['Date'] = dr
cal = calendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())
df['Holiday'] = df['Date'].isin(holidays)
print (df)
df = df[df['Holiday'] == True]
df
but there are still a lot of dates I would have to copy and paste (can I just display the 'Date' column?) and then set to the previous trading day's value. Is there a simpler way to do this? Thanks a lot in advance.

There may be a simpler way, if I know what you are trying to do. The fillna method on dataframes lets you forward fill. So if you don't want to fill weekend days but want to fill all other NaNs (i.e. holidays), you can just exclude Saturdays and Sundays as follows:
# dt.day_name() replaces the removed dt.weekday_name attribute; .ffill() replaces fillna(method='ffill')
mask = ~df['Date'].dt.day_name().isin(['Saturday', 'Sunday'])
df.loc[mask] = df.loc[mask].ffill()
You can use this on the whole dataframe or on particular columns.
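As for the aside about displaying just the dates: df.loc[df['Holiday'], 'Date'] selects only the Date column of the holiday rows. Putting the pieces together, here is a minimal sketch of the forward-fill on a toy frame (using MLK Day, 2018-01-15, as the holiday; the frame and values are illustrative, not your data):
import pandas as pd

# toy frame: Fri 2018-01-12 has a price, Sat/Sun are NaN,
# Mon 2018-01-15 (MLK Day) is a holiday NaN, Tue/Wed have prices
idx = pd.date_range('2018-01-12', '2018-01-17')
prices = pd.DataFrame({'price': [10.0, None, None, None, 11.0, 12.0]}, index=idx)

# forward-fill only the rows that are not Saturday/Sunday, so weekends stay NaN
weekday_mask = ~prices.index.day_name().isin(['Saturday', 'Sunday'])
prices.loc[weekday_mask] = prices.loc[weekday_mask].ffill()
print(prices)  # 2018-01-15 now holds Friday's 10.0; Sat/Sun remain NaN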

I think your best bet is to get an API key from quandl.com. It's free and it gives you access to all kinds of historical time series data. There used to be access to Yahoo Finance and Google Finance, but I think both were deprecated well over a year ago.
Here is a small sample of code that can definitely help you.
import quandl
quandl.ApiConfig.api_key = 'your_api_key_goes_here'
# get the table of daily stock prices and
# filter it for selected tickers and columns within a time range;
# set paginate to True because Quandl limits the tables API to 10,000 rows per call
data = quandl.get_table('WIKI/PRICES',
                        ticker=['AAPL', 'MSFT', 'WMT'],
                        qopts={'columns': ['ticker', 'date', 'adj_close']},
                        date={'gte': '2015-12-31', 'lte': '2016-12-31'},
                        paginate=True)
print(data)
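If you then want one adjusted-close column per ticker (closer to the wide layout of your Excel sheet), you can pivot the long table; a quick sketch using the data frame returned above:
# reshape the long ticker/date/adj_close table into one column per ticker
wide = data.pivot(index='date', columns='ticker', values='adj_close').sort_index()
print(wide.head())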
Check the link below for info about how to get the data you need.
https://blog.quandl.com/api-for-stock-data
Also, please see this for more details about using Python for quantitative finance.
https://financetrain.com/best-python-librariespackages-finance-financial-data-scientists/
Finally, and I apologize if this is a little off topic, but I think it may be helpful at some level...consider something like this...
import requests
import pandas as pd
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,7,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs={'id': 'screener-content'})
light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")

data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        # each cell's text lives inside an <a> tag
        row_data = [cell.a.get_text() for cell in row.find_all('td')]
        data.append(row_data)

# sort rows by the numeric first column to restore the original table order
data.sort(key=lambda x: int(x[0]))

pd.DataFrame(data).to_csv("AAA.csv", header=False)
It's not time series data, but rather fundamental data. I haven't spent a lot of time on that site, but maybe you can poke around and find something there that suits your needs. Just a thought.

Related

how to find values above threshold in pandas and store them with date

I have a DF with stock prices and I want to find stock prices for each day that are above a threshold and record the date, percent increase and stock name.
import pandas as pd
import yfinance as yf

stock_tickers = ['AAPL', 'MSFT', 'LCID', 'HOOD', 'TSLA']
df = yf.download(stock_tickers,
                 start='2020-01-01',
                 end='2021-06-12',
                 progress=True,
                 )
data = df['Adj Close']
data = data.pct_change()
data.dropna(inplace=True)
top = []
for i in range(len(data)):
    if i > .01:
        top.append(data.columns[i])
I tried a for loop, but it saves all the ticker names.
What I want to do is find, for each day, the stocks that increased by more than 1% and save the name, date and percent increase in a pandas DataFrame.
Any help would be appreciated.
There might be a more efficient way, but I'd use DataFrame.items() (the older DataFrame.iteritems() was removed in pandas 2.0). An example is attached below. I kept the duplicated Date index since I was not sure how you'd like to keep the data.
data = df["Adj Close"].pct_change()
threshold = 0.01
df_above_th_list = []
for item in data.iteritems():
stock = item[0]
sr_above_th = item[1][item[1] > threshold]
df_above_th_list.append(pd.DataFrame({"stock": stock, "pct": sr_above_th}))
df_above_th = pd.concat(df_above_th_list)
If you want to process the data by row, you can use DataFrame.iterrows() or DataFrame.itertuples().
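If you prefer to avoid the explicit loop entirely, here is a vectorized sketch with stack (assuming df is the frame returned by yf.download above):
pct = df['Adj Close'].pct_change()
# stack the wide (date x ticker) table into a long Series, then filter once
above = pct.stack()
above = above[above > 0.01].reset_index()
above.columns = ['Date', 'stock', 'pct']
print(above)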

How can I create a DF for each iteration in a loop?

I'm attempting to create a small database that stores the schedules for each team in the NFL. I want to create a new DF for each iteration in the loop.
This means that each team would get a new DF for each season. Each new DF would be named the team abbreviation and year of the iteration. For example, IND_2021 or something of the like for the Indianapolis Colts.
Here is my code:
from bs4 import BeautifulSoup
import requests
from urllib.request import urlopen
import pandas as pd
abbreviations = ['IND','LAR','SFO','BUF','LAC','TAM','BAL','CIN','MIN','GNB','KAN','PIT','ARI','DAL','CLE',
                 'SEA','NWE','PHI','NOR','WAS','ATL','LVR','TEN','CAR','DEN','MIA','HOU','CHI','DET','JAX','NYG','NYJ']
year = '2021'
list_of_dataframes = []
for team in abbreviations:
    url = 'https://www.pro-football-reference.com/teams/{}/{}.htm'.format(team, year)
    df = pd.read_html(url)[1]
    df['team'] = team
    list_of_dataframes.append(df)
final_df = pd.concat(list_of_dataframes).reset_index(drop=True)
I've tried a bunch of different potential solutions but have been stuck on this for a while. I might be on the wrong path entirely, so I would very much appreciate any insight on how to resolve this.
First, you need to fix your url variable based on those team abbreviations. The reason you get a Not Found error is precisely that: the page you are requesting doesn't exist. Try going to https://www.pro-football-reference.com/teams/CHI/2021.htm. Those aren't the correct URLs; the team abbreviations in the url should be lower case.
Secondly, on that same point, some of the team abbreviations aren't correct. 'IND' should be 'CLT', 'LAR' should be 'RAM', etc... there are a few more to correct too.
Thirdly, you can't "name" a dataframe. What you can do, as suggested in the comments, is create a dictionary of the dataframes, where the key is the name of the team:
import pandas as pd
abbreviations = ['CLT','RAM','SFO','BUF','SDG','TAM','RAV','CIN','MIN','GNB','KAN','PIT','CRD','DAL','CLE',
                 'SEA','NWE','PHI','NOR','WAS','ATL','RAI','OTI','CAR','DEN','MIA','HTX','CHI','DET','JAX','NYG','NYJ']
year = '2021'
dict_of_dataframes = {}
for team in abbreviations:
    print(team)
    url = 'https://www.pro-football-reference.com/teams/{}/{}.htm'.format(team.lower(), year)
    df = pd.read_html(url)[1]
    df['team'] = team
    dict_of_dataframes[f'{team}_{year}'] = df
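Each season's table can then be retrieved by its key; for example, for the Colts:
colts_2021 = dict_of_dataframes['CLT_2021']
print(colts_2021.head())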

PYTHON, Pandas Dataframe: how to select and read only certain rows

To make the purpose clear, here is the code that works perfectly (of course I include only the beginning; the rest is not important here):
df = pd.read_csv(
'https://github.com/pcm-dpc/COVID-19/raw/master/dati-andamento-nazionale/'
'dpc-covid19-ita-andamento-nazionale.csv',
parse_dates=['data'], index_col='data')
df.index = df.index.normalize()
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
So basically it takes some data from the online GitHub csv that you may find here:
Link NAZIONALE. Look at "data", which is the Italian for "date": for every date it extracts the value of nuovi_positivi and puts it into the program.
Now I have to do the same thing with this json that you may find here:
Link Json
As you may see, now for every date there are 21 different values, because Italy has 21 regions (Abruzzo, Basilicata, Campania and so on), but I am interested ONLY in the values of the region "Veneto". I want to extract only the rows that contain "Veneto" under the label "denominazione_regione", to get for every day the value "nuovi_positivi".
I tried with:
df = pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
                  parse_dates=['data'], index_col='data', index_row='Veneto')
df.index = df.index.normalize()
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
but of course it doesn't work. How can I solve the problem? Thanks.
Try this (pd.read_json doesn't accept parse_dates, index_col or index_row, which is why your attempt fails; use convert_dates and set the index afterwards):
df = pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
                  convert_dates=['data'])
df.index = df['data']
df.index = df.index.normalize()
# keep only the rows for the Veneto region
df = df[df["denominazione_regione"] == 'Veneto']
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
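Equivalently, the filtering and indexing can be chained in one expression; a compact sketch of the same steps:
df = (pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
                   convert_dates=['data'])
        .query("denominazione_regione == 'Veneto'")
        .set_index('data'))
df.index = df.index.normalize()
sts = df['nuovi_positivi'].dropna()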

how to pull beta data from yahoo.finance?

Beta values are calculated on yahoo.finance, and I'm thinking I can save time rather than calculating them through variance etc. The beta chart can be seen under the stock chart. I am able to extract close price and volume for the ticker using the code below:
from datetime import date

import yfinance as yf

df = yf.download('AAPL, MSFT',
                 start='2021-08-01',
                 end=date.today(),
                 progress=False)
adjusted_close = df['Adj Close'].reset_index()
volume = df['Volume'].reset_index()
but how can I get beta values the same way we get prices or volumes? I am looking to pull historical beta data with a start and end date.
You can do this in a batch, using concat instead of the deprecated append:
import pandas as pd
import yfinance as yf

# initialise a df with the desired columns
df = pd.DataFrame(columns=['Stock', 'Beta', 'Marketcap'])
# here, symbol_sgx is the list of symbols (tickers) you would like to retrieve data for;
# for instance, to retrieve information for DBS, UOB, and Singtel, use the following:
symbol_sgx = ['D05.SI', 'U11.SI', 'Z74.SI']
for stock in symbol_sgx:
    ticker = yf.Ticker(stock)
    info = ticker.info
    beta = info.get('beta')
    marketcap = info.get('marketCap')
    df_temp = pd.DataFrame({'Stock': stock, 'Beta': [beta], 'Marketcap': [marketcap]})
    df = pd.concat([df, df_temp], ignore_index=True)
# this line allows you to check that you retrieved the right information
df
info.get() is a better alternative to info[]. The latter is a little buggy: if one of the tickers is errant (e.g. outdated or delisted), the script would stop. This is especially annoying if you have a long list of tickers and you don't know which one is the errant ticker. info.get() continues to run if no information is available; for those entries you just need to post-process with df.dropna() to remove NaNs.
Yahoo Finance has a dictionary of company information that can be retrieved in bulk, and it includes beta values:
import yfinance as yf
ticker = yf.Ticker('AAPL')
stock_info = ticker.info
stock_info['beta']
# 1.201965
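Note that ticker.info only exposes the current beta, not a history. If you need beta between a start and end date, one common approach is to compute a rolling beta from returns yourself. A sketch, assuming ^GSPC (the S&P 500) as the market proxy and a 60-day window (both are illustrative choices, not something yfinance prescribes):
import yfinance as yf

# download stock and market prices; ^GSPC is assumed as the market proxy
prices = yf.download(['AAPL', '^GSPC'], start='2021-08-01', end='2022-08-01')['Adj Close']
returns = prices.pct_change().dropna()

# rolling beta = Cov(stock, market) / Var(market) over a 60-day window
window = 60
cov = returns['AAPL'].rolling(window).cov(returns['^GSPC'])
var = returns['^GSPC'].rolling(window).var()
rolling_beta = (cov / var).dropna()
print(rolling_beta.tail())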

Filtering Pandas Dataframe by date not working

I am downloading Bitcoin price data using the Cryptowatch API. Downloading the price data works well, but I only need price data from one month ago, i.e. from 29.10.2019 until 28.11.2019. I read several answers to similar questions, but they do not seem to work for my code, as I get the same output with filtering as without.
Here is my code:
import datetime

import pandas as pd
import requests

# define 1-day period
periods = '86400'
# get price data from the Cryptowatch API
resp = requests.get('https://api.cryptowat.ch/markets/bitfinex/btcusd/ohlc', params={'periods': periods})
resp.ok
# create pandas dataframe
data = resp.json()
df = pd.DataFrame(data['result'][periods], columns=[
    'CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice', 'ClosePrice', 'Volume', 'NA'])
# make a date out of CloseTime
df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s')
# make CloseTime the index of the dataframe
df.set_index('CloseTime', inplace=True)
# filter df by date until 1 month ago
df.loc[datetime.date(year=2019, month=10, day=29):datetime.date(year=2019, month=11, day=28)]
df
There is no error or anything, but the output is always the same, so filtering does not work.
Thank you very much in advance!!
Use the string format of datetimes for filtering; check also more information in the docs. Note too that in your code the result of df.loc[...] is never assigned back to anything, so displaying df afterwards shows the unfiltered frame:
df1 = df['2019-10-29':'2019-11-28']
Or:
s = datetime.datetime(year=2019,month=10,day=29)
e = datetime.datetime(year=2019,month=11,day=28)
df1 = df[s:e]
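And if you always want the last month relative to today, rather than hardcoded dates, you can compute the bounds; a small sketch:
import pandas as pd

# compute the one-month window ending today instead of hardcoding dates
end = pd.Timestamp.today().normalize()
start = end - pd.DateOffset(months=1)
df1 = df[start:end]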
