Pandas DataReader: handling RemoteDataError from Yahoo Finance - Python

I am using the code below to pass a large list of tickers to the Yahoo reader in pandas-datareader, trying to get back a DataFrame like the one shown below. If the list is large, I often get a RemoteDataError back, but on different tickers each time. I am not sure how to handle the RemoteDataError. I am happy to drop a failing ticker and continue with the next ticker in the list, but I would first like to retry fetching its adjusted close data. I thought using a for loop and adding a time delay would help with the Yahoo requests, but I still get the error. Any ideas?
IBM MSFT ORCL TSLA YELP
Date
2014-01-02 184.52 36.88 37.61 150.10 67.92
2014-01-03 185.62 36.64 37.51 149.56 67.66
2014-01-06 184.99 35.86 37.36 147.00 71.72
2014-01-07 188.68 36.14 37.74 149.36 72.66
2014-01-08 186.95 35.49 37.61 151.28 78.42
import pandas_datareader.data as web
import datetime as dt
import pandas as pd
import time
from pandas_datareader._utils import RemoteDataError
Which_group = ['Accident & Health Insurance'] ##<<<<put in group here
df = pd.read_csv('/home/ross/Downloads/UdemyPairs/stocks1.csv')
df.set_index('categoryName', inplace = True)
df1 = df.loc[Which_group]
tickers = df1.Ticker.tolist()
print(tickers)
#tickers = ['SPY', 'AAPL', 'MSFT'] # add as many tickers
start = dt.datetime(2013, 1,1)
end = dt.datetime.today()
# Function starts here
def get_previous_close(strt, end, tick_list, this_price):
    """ arg: `this_price` can take str Open, High, Low, Close, Volume"""
    #make an empty dataframe in which we will append columns
    adj_close = pd.DataFrame([])
    # loop here.
    for idx, i in enumerate(tick_list):
        try:
            # time.sleep(0.01)
            total = web.DataReader(i, 'yahoo', strt, end)
            adj_close[i] = total[this_price]
        except RemoteDataError:
            pass
    return adj_close
#call the function
print(get_previous_close(start, end, tickers, 'Adj Close'))

Maybe you can look at this question; it proposes a solution that might work for you:
Pandas Dataframe - RemoteDataError - Python
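Building on that, here is a minimal sketch of a retry-then-skip approach. The fetch_with_retry helper and the retries/delay values are just illustrative, not part of pandas-datareader; each ticker is tried a few times and only dropped after the last attempt fails.
import time
import pandas as pd
import pandas_datareader.data as web
from pandas_datareader._utils import RemoteDataError

def fetch_with_retry(ticker, strt, end, this_price, retries=3, delay=2):
    """Try a ticker a few times before giving up on it."""
    for attempt in range(retries):
        try:
            return web.DataReader(ticker, 'yahoo', strt, end)[this_price]
        except RemoteDataError:
            time.sleep(delay)  # back off briefly before the next attempt
    return None  # give up on this ticker after the final attempt

def get_previous_close(strt, end, tick_list, this_price):
    adj_close = pd.DataFrame([])
    for ticker in tick_list:
        prices = fetch_with_retry(ticker, strt, end, this_price)
        if prices is not None:
            adj_close[ticker] = prices
    return adj_close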

Related

Getting a specific element from a Pandas Dataframe

This is probably a rookie question, but I haven't been able to find the solution.
I'm trying to collect some data from Yahoo Finance using pandas.
from pandas_datareader import data
tickers = ['EQNR.OL','BP','CL=F']
start_date = '2001-01-02'
end_date = '2021-02-26'
panel_data = data.DataReader(tickers, 'yahoo', start_date, end_date)
I want to take a look at the BP stock (I need all three, so excluding EQNR.OL and CL=F from tickers is not the right solution). I know how to get all the close prices of a single stock:
close_BP = panel_data['Close','BP']
But is there a way I can get all the BP data (open, close, high, low) from panel_data, rather than just a specific column like 'close'?
I was thinking something like BP = panel_data[:,'BP'] or BP = panel_data.loc[:,'BP'], but neither works.
A big thanks in advance.
I think what you're looking for is pandas.IndexSlice...
import pandas as pd
panel_data.loc[:, pd.IndexSlice[:, 'BP']]
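If you also want to drop the now-redundant ticker level from the columns, a small follow-up works too. This assumes the usual (Attributes, Symbols) column MultiIndex that the Yahoo reader returns; bp is just an illustrative name.
bp = panel_data.loc[:, pd.IndexSlice[:, 'BP']]
bp.columns = bp.columns.droplevel(1)   # keep only the Attributes level (Open, Close, ...)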
You can swap the column MultiIndex levels and then extract the ticker with loc.
panel_data = panel_data.swaplevel(axis=1)
panel_data.loc[:, ('BP',)]
Attributes Adj Close Close High Low Open Volume
Date
2021-01-04 20.551737 20.830000 21.129999 20.549999 21.090000 14485100.0
2021-01-05 22.081030 22.379999 22.780001 21.370001 21.430000 25447500.0
2021-01-06 23.097271 23.410000 23.860001 22.940001 23.370001 25221400.0
2021-01-07 23.590591 23.910000 24.150000 23.500000 23.719999 16470700.0
2021-01-08 24.074045 24.400000 24.490000 23.990000 24.170000 20189600.0
...

How to convert a list to a dataframe without dropping all data?

I am testing the code below.
import pandas as pd
from pandas_datareader import data as wb
tickers = ['SBUX', 'AAPL', 'MSFT']
AllData = []
for ticker in tickers:
    print('appending prices for ' + ticker)
    tickers = wb.DataReader(ticker,start='2018-7-26',data_source='yahoo')
    AllData.append(tickers)

AllData = pd.DataFrame(AllData)
print(AllData)
When I convert the list to a dataframe, everything gets dropped.
Also, I'm trying to get the ticker variable inserted into the relevant spot, so I can tell which one is which. I'd like the final result to look like this.
date ticker adj_close
0 2018-02-13 MSFT 164.34
1 2018-02-12 MSFT 162.71
...
265 2018-02-13 SBUX 81.30
266 2018-02-12 SBUX 82.11
How can I do that? TIA.
There are a couple of issues with your code.
First, you are iterating through tickers via for ticker in tickers: but you then reassign that variable in the loop via tickers = wb.DataReader(...). Never change the object over which you are iterating. Although this actually does not cause an issue in this case, it is clearly undesirable.
Second, AllData is a list containing three dataframes, none of which have a reference to their relevant ticker. You could concatenate at this stage, but you should first include the ticker as an additional column in each dataframe via .assign(ticker=ticker).
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start='2018-7-26', data_source='yahoo')[['Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])

df = pd.concat(price_data)
>>> df
ticker Adj Close
Date
2018-07-26 SBUX 50.324104
2018-07-27 SBUX 51.008789
2018-07-30 SBUX 50.764256
...
>>> df.set_index('ticker', append=True).unstack('ticker')
Adj Close
ticker AAPL MSFT SBUX
Date
2018-07-26 191.298080 107.868378 50.324104
2018-07-27 188.116501 105.959373 51.008789
2018-07-30 187.062546 103.686295 50.764256
...
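If you want the flat layout shown in the question (one row per date and ticker, with a default integer index), something like this reshapes the concatenated frame; the lower-case column names are only there to mirror the desired output, not required by pandas.
long_df = (df.reset_index()
             .rename(columns={'Date': 'date', 'Adj Close': 'adj_close'})
             .sort_values(['ticker', 'date'])
             .reset_index(drop=True))
print(long_df.head())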

How do I continuously calculate something based on the past X amount of data? (Please see info for more details)

Goal:
Calculate the 50-day moving average for each day, based on the past 50 days. I can calculate the mean for the entire dataset, but I am trying to continuously calculate the mean based on the past 50 days... with it changing each day, of course!
import numpy as np
import pandas_datareader.data as pdr
import pandas as pd
# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.
ticker = ['AAPL']
#Define the data period that you would like
start_date = '2017-07-01'
end_date = '2019-02-08'
# Use pandas_datareader.data.DataReader to load the stock prices from Yahoo Finance.
df = pdr.DataReader(ticker, 'yahoo', start_date, end_date)
# Yahoo Finance gives 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'.
#Export Close Price, Volume, and Date from yahoo finance
CloseP = df['Close']
CloseP.head()
Volm = df['Volume']
Volm.head()
Date = df["Date"] = df.index
#create a table with Date, Close Price, and Volume
Table = pd.DataFrame(np.array(Date), columns = ['Date'])
Table['Close Price'] = np.array(CloseP)
Table['Volume'] = np.array(Volm)
print (Table)
#create a column that continuously calculates the 50 day MA
#This is what I can't get to work!
MA = np.mean(df['Close'])
Table['Moving Average'] = np.array(MA)
print (Table)
First of all, please don't use CamelCase to name your variables, as they look like class names otherwise.
Next, use merge() to join your data frames instead of that np.array approach:
>>> table = CloseP.merge(Volm, left_index=True, right_index=True)
>>> table.columns = ['close', 'volume'] # give names to columns
>>> table.head(10)
close volume
Date
2017-07-03 143.500000 14277800.0
2017-07-05 144.089996 21569600.0
2017-07-06 142.729996 24128800.0
2017-07-07 144.179993 19201700.0
2017-07-10 145.059998 21090600.0
2017-07-11 145.529999 19781800.0
2017-07-12 145.740005 24884500.0
2017-07-13 147.770004 25199400.0
2017-07-14 149.039993 20132100.0
2017-07-17 149.559998 23793500.0
Finally, use a combination of rolling(), mean(), and dropna() to calculate the moving average:
>>> ma50 = table.rolling(window=50).mean().dropna()
>>> ma50.head(10)
close volume
Date
2017-09-12 155.075401 26092540.0
2017-09-13 155.398401 26705132.0
2017-09-14 155.682201 26748954.0
2017-09-15 156.025201 27248670.0
2017-09-18 156.315001 27430024.0
2017-09-19 156.588401 27424424.0
2017-09-20 156.799201 28087816.0
2017-09-21 156.952201 28340360.0
2017-09-22 157.034601 28769280.0
2017-09-25 157.064801 29254384.0
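To attach the moving average to the original table as its own column (which is what the question was after), one option is a plain rolling mean on the close column; the ma50_close name here is just illustrative.
table['ma50_close'] = table['close'].rolling(window=50).mean()
print(table.dropna().head())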
Please refer to the docs of the mentioned API calls for more info about their usage. Good luck!

Stock price Import issue for a newbie

A total newbie here who started Python this week. I have been reading Datacamp and some other online resources as well as Python Without Fear.
I wanted to test whether I could import some price data and copied code from the internet. I cannot get it to work due to an error: TypeError: string indices must be integers on line 10.
import pandas_datareader as pdr #needed to read data from yahoo
#df = pdr.get_data_yahoo('AAPL')
#print (df.Close)
stock =('AAPL')
start_date = '2017-01-01'
end_date = '2017-12-10'
closes = [c['Close'] for c in pdr.get_data_yahoo(stock, start_date, end_date)]
for c in closes:
    print (c)
The line closes = [c.......] is giving me an error.
Any advice on how to fix this? I am starting my journey and actually trying to import the close prices for past year for S&P500 and then save them to Excel. If there is a snippet which does this already and I can learn from, please let me know.
Thank you all.
The call to get_data_yahoo returns a single dataframe.
df = pdr.get_data_yahoo(stock, start_date, end_date)
df.head()
Open High Low Close Adj Close \
Date
2017-01-03 115.800003 116.330002 114.760002 116.150002 114.311760
2017-01-04 115.849998 116.510002 115.750000 116.019997 114.183815
2017-01-05 115.919998 116.860001 115.809998 116.610001 114.764473
2017-01-06 116.779999 118.160004 116.470001 117.910004 116.043915
2017-01-09 117.949997 119.430000 117.940002 118.989998 117.106812
Volume
Date
2017-01-03 28781900
2017-01-04 21118100
2017-01-05 22193600
2017-01-06 31751900
2017-01-09 33561900
type(df)
pandas.core.frame.DataFrame
Meanwhile, you're trying to iterate over this returned dataframe. By default, a for loop will iterate over the columns. For example:
for c in df:
    print(c)
Open
High
Low
Close
Adj Close
Volume
When you replicate this code in a list comp, c is given each column name in turn, and str[str] is an invalid operation.
In summary, just doing closes = df['Close'] on the returned result is sufficient to obtain the Close column.
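For completeness, a minimal corrected version of the snippet from the question might look like this (same ticker and dates; the loop is only there to mirror the original print loop):
import pandas_datareader as pdr

stock = 'AAPL'
start_date = '2017-01-01'
end_date = '2017-12-10'

df = pdr.get_data_yahoo(stock, start_date, end_date)
closes = df['Close']              # a Series of closing prices indexed by date
for date, close in closes.items():
    print(date.date(), close)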
I think you are simply looking to dump the dataframe to an Excel spreadsheet. This will get you there.
import pandas as pd
import pandas_datareader as pdr
df = pdr.get_data_yahoo('AAPL')
df.head(2)
Out[12]:
Open High Low Close Adj Close Volume
Date
2009-12-31 30.447144 30.478571 30.08 30.104286 26.986492 88102700
2010-01-04 30.490000 30.642857 30.34 30.572857 27.406532 123432400
df.to_excel('dump_aapl.xlsx')
If you just want the Close column:
df['Close'].to_excel('dump_aapl.xlsx')

Reading stocks from multiple sources using Pandas Datareader

I have a list of 6 stocks. I have set up my code to reference the stock name from the list rather than hard-coding the stock name ... first with SPY, which is in position 0. The code below the list returns yesterday's closing price of the stock.
My question is: how do I loop the code through each stock in the list so that I print out the closing price for all 6 stocks?
I think I need to use loops but I don't understand them.
Any ideas?
CODE:
#import packages
import pandas_datareader.data as web
import datetime as dt
#create list of stocks to reference later
stocks = ['SPY', 'QQQ', 'IWM', 'AAPL', 'FB', 'GDX']
#define prior day close price
start = dt.datetime(2010, 1, 1)
end = dt.datetime(2030, 1, 27)
ticker = web.DataReader(stocks[0], 'google', start, end)
prior_day = ticker.iloc[-1]
PDL = list(prior_day)
prior_close = PDL[3]
#print the name of the stock from the stocks list, and the prior close price
print(stocks[0])
print('Prior Close')
print(prior_close)
RETURNS:
SPY
Prior Close
249.08
You could use a loop, but you don't need loops for this. Pass your entire list of stocks to the DataReader. This should be cheaper than making multiple calls.
stocks = ['SPY', 'QQQ', 'IWM', 'AAPL', 'FB', 'GDX']
ticker = web.DataReader(stocks, 'google', start, end)
close = ticker.to_frame().tail()['Close'].to_frame('Prior Close')
print(close)
Prior Close
Date minor
2017-09-26 FB 164.21
GDX 23.35
IWM 144.61
QQQ 143.17
SPY 249.08
Details
ticker is a panel, but can be converted to a dataframe using to_frame:
print(ticker)
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 251 (major_axis) x 6 (minor_axis)
Items axis: Open to Volume
Major_axis axis: 2016-09-28 00:00:00 to 2017-09-26 00:00:00
Minor_axis axis: AAPL to SPY
df = ticker.to_frame()
You can view all recorded dates of stocks using df.index.get_level_values:
print(df.index.get_level_values('Date'))
DatetimeIndex(['2016-09-28', '2016-09-28', '2016-09-28', '2016-09-28',
'2016-09-28', '2016-09-28', '2016-09-29', '2016-09-29',
'2016-09-29', '2016-09-29',
...
'2017-09-25', '2017-09-25', '2017-09-25', '2017-09-25',
'2017-09-26', '2017-09-26', '2017-09-26', '2017-09-26',
'2017-09-26', '2017-09-26'],
dtype='datetime64[ns]', name='Date', length=1503, freq=None)
If you want to view all stocks for a particular date, you can use df.loc with a slice. For your case, you want to see the closing stocks on the last date, so you can use df.tail:
print(df.tail()['Close'].to_frame())
Close
Date minor
2017-09-26 FB 164.21
GDX 23.35
IWM 144.61
QQQ 143.17
SPY 249.08
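And for one specific date instead of the last one, selecting on the first index level works as well; the date below is just taken from the sample output above.
day = pd.Timestamp('2017-09-26')     # example date from the output shown earlier
print(df.loc[day, 'Close'])          # closing prices of all six stocks on that day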
You can just use a for loop
for stock in stocks:
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime(2030, 1, 27)
    ticker = web.DataReader(stock, 'google', start, end)
    prior_day = ticker.iloc[-1]
    PDL = list(prior_day)
    prior_close = PDL[3]
    print(stock)
    print('Prior Close')
    print(prior_close)
I'll make you a function to which you can always pass a list of stocks and that gives you back a time series. ;)
I use this function for numerous tickers:
tickers = ['SPY', 'QQQ', 'EEM', 'INDA', 'AAPL', 'MSFT'] # add as many tickers
start = dt.datetime(2010, 3,31)
end = dt.datetime.today()
# Function starts here
def get_previous_close(strt, end, tick_list, this_price):
    """ arg: `this_price` can take str Open, High, Low, Close, Volume"""
    #make an empty dataframe in which we will append columns
    adj_close = pd.DataFrame([])
    # loop here.
    for idx, i in enumerate(tick_list):
        total = web.DataReader(i, 'google', strt, end)
        adj_close[i] = total[this_price]
    return adj_close
#call the function
get_previous_close(start, end, tickers, 'Close')
You can use this time series in any way you like. It's always good to use a function for maintainability and re-usability. Also, this function can take 'yahoo' instead of 'google'.
