I am a total newbie who started learning Python this week. I have been reading DataCamp and some other online resources, as well as Python Without Fear.
I wanted to test whether I could import some price data, so I copied code from the internet. I cannot get it to work due to an error: TypeError: string indices must be integers on line 10
import pandas_datareader as pdr #needed to read data from yahoo
#df = pdr.get_data_yahoo('AAPL')
#print (df.Close)
stock =('AAPL')
start_date = '2017-01-01'
end_date = '2017-12-10'
closes = [c['Close'] for c in pdr.get_data_yahoo(stock, start_date,
                                                 end_date)]
for c in closes:
    print(c)
The line closes = [c.......] is giving me an error.
Any advice on how to fix this? I am starting my journey and actually trying to import the close prices for past year for S&P500 and then save them to Excel. If there is a snippet which does this already and I can learn from, please let me know.
Thank you all.
The call to get_data_yahoo returns a single dataframe.
df = pdr.get_data_yahoo(stock, start_date, end_date)
df.head()
Open High Low Close Adj Close \
Date
2017-01-03 115.800003 116.330002 114.760002 116.150002 114.311760
2017-01-04 115.849998 116.510002 115.750000 116.019997 114.183815
2017-01-05 115.919998 116.860001 115.809998 116.610001 114.764473
2017-01-06 116.779999 118.160004 116.470001 117.910004 116.043915
2017-01-09 117.949997 119.430000 117.940002 118.989998 117.106812
Volume
Date
2017-01-03 28781900
2017-01-04 21118100
2017-01-05 22193600
2017-01-06 31751900
2017-01-09 33561900
type(df)
pandas.core.frame.DataFrame
Meanwhile, you're trying to iterate over this returned dataframe. By default, a for loop over a dataframe iterates over its column names, not its rows. For example:
for c in df:
    print(c)
Open
High
Low
Close
Adj Close
Volume
When you replicate this code in a list comp, c is given each column name in turn, and str[str] is an invalid operation.
In summary, just doing closes = df['Close'] on the returned result is sufficient to obtain the Close column.
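To reproduce this without a network call, here is a minimal sketch. It assumes a small hand-built dataframe as a stand-in for what get_data_yahoo returns; only two columns are reproduced, with values copied from the df.head() output above.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by get_data_yahoo (assumption:
# only two columns are reproduced, values taken from the output above).
df = pd.DataFrame(
    {
        "Close": [116.150002, 116.019997, 116.610001],
        "Volume": [28781900, 21118100, 22193600],
    },
    index=pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"]),
)
df.index.name = "Date"

# Iterating over a dataframe yields its column labels, not its rows.
labels = [c for c in df]

# Each c is a string such as "Close", so c["Close"] indexes a string with a
# string, which is what raised the original TypeError.
try:
    [c["Close"] for c in df]
    error = None
except TypeError as exc:
    error = exc

# Selecting the column directly gives the closing prices as a Series.
closes = df["Close"]
for date, price in closes.items():
    print(date.date(), price)
```

If you really do want to loop, closes.items() gives (date, price) pairs, as in the last two lines.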
I think you are simply looking to dump the dataframe to an Excel spreadsheet. This will get you there.
import pandas as pd
import pandas_datareader as pdr
df = pdr.get_data_yahoo('AAPL')
df.head(2)
Out[12]:
Open High Low Close Adj Close Volume
Date
2009-12-31 30.447144 30.478571 30.08 30.104286 26.986492 88102700
2010-01-04 30.490000 30.642857 30.34 30.572857 27.406532 123432400
df.to_excel('dump_aapl.xlsx')
If you just want the Close column:
df['Close'].to_excel('dump_aapl.xlsx')
This is probably a rookie question, but I haven't been able to find the solution.
I'm trying to collect some data from Yahoo Finance using pandas.
from pandas_datareader import data
tickers = ['EQNR.OL','BP','CL=F']
start_date = '2001-01-02'
end_date = '2021-02-26'
panel_data = data.DataReader(tickers, 'yahoo', start_date, end_date)
I wanna take a look at the BP stock (I need all 3, so excluding EQNR.OL and CL=F from tickers is not the right solution). I know how to get all the close prices of a single stock:
close_BP = panel_data['Close','BP']
But is there a way I can get all BP data (open, close, high, low) withdrawn from 'panel_data', and not only a specific column like 'close'?
I was thinking something like BP = panel_data[:,'BP'] or BP = panel_data.loc[:,'BP'] but it doesn't work.
A big thanks in advance.
I think what you're looking for is pandas.IndexSlice...
import pandas as pd
panel_data.loc[:, pd.IndexSlice[:, 'BP']]
You can swap multi-columns and then extract them with loc.
panel_data = panel_data.swaplevel(axis=1)
panel_data.loc[:, ('BP',)]
Attributes Adj Close Close High Low Open Volume
Date
2021-01-04 20.551737 20.830000 21.129999 20.549999 21.090000 14485100.0
2021-01-05 22.081030 22.379999 22.780001 21.370001 21.430000 25447500.0
2021-01-06 23.097271 23.410000 23.860001 22.940001 23.370001 25221400.0
2021-01-07 23.590591 23.910000 24.150000 23.500000 23.719999 16470700.0
2021-01-08 24.074045 24.400000 24.490000 23.990000 24.170000 20189600.0
...
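Both approaches can be tried offline. The sketch below assumes a synthetic dataframe in place of panel_data, with the two column levels named "Attributes" and "Symbols" as in the output shown; the numbers are placeholders, not real prices. It also shows xs, a third option not mentioned in the answers, which drops the symbol level so the result has plain attribute columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for panel_data (assumption: two column levels named
# "Attributes" and "Symbols"; values are placeholders, not real prices).
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["BP", "CL=F", "EQNR.OL"]],
    names=["Attributes", "Symbols"],
)
dates = pd.date_range("2021-01-04", periods=3, freq="D", name="Date")
panel_data = pd.DataFrame(
    np.arange(18, dtype=float).reshape(3, 6), index=dates, columns=columns
)

# Option 1: IndexSlice selects every attribute for BP and keeps both levels.
bp_sliced = panel_data.loc[:, pd.IndexSlice[:, "BP"]]

# Option 2: xs selects BP and drops the "Symbols" level from the columns.
bp_flat = panel_data.xs("BP", axis=1, level="Symbols")

print(bp_sliced)
print(bp_flat)
```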
Here is what the data looks like. The header rows include:
Attributes Adj Close Close High Low Open Volume
Symbols CVX INTC CVX INTC CVX INTC CVX INTC CVX INTC CVX INTC
However, the stock symbols, the Date and the attributes cannot be referenced easily as it is. For example, CVX close, CVX low and Date are not even in the same row. I tried pivoting and indexing, but without success so far. I am trying to get the column descriptions all in one row, with the date as the index, so I can perform analysis.
I tried the following first to index the date but it did not work:
data['Df'] = pd.to_datetime(df['Date'])
And unsuccessfully tried indexing the data:
df_pivot = df.pivot('Date','Symbol','close').reset_index()
I am not sure what the desired output is, but I wrote the code on the understanding that you want the stock name as a column. Is this the answer you intended? It uses pd.wide_to_long.
import pandas as pd
import pandas_datareader.data as web
df_stock = web.DataReader(['CVX', 'INTC'], 'yahoo', '2020-01-02', '2020-01-15')
df_stock.reset_index(inplace=True)
df_stock.columns = [x[0]+'_'+x[1] for x in df_stock.columns]
df_stock = pd.wide_to_long(df_stock, ['Adj Close','Close','High','Low','Open','Volume'],
                           i='Date_', j='Stock', sep='_', suffix='\\w+')
df_stock.head()
Adj Close Close High Low Open Volume
Date_ Stock
2020-01-02 CVX 114.918457 121.430000 121.629997 120.769997 120.809998 5205000
2020-01-03 CVX 114.520988 121.010002 122.720001 120.739998 121.779999 6360900
2020-01-06 CVX 114.132973 120.599998 121.669998 120.330002 121.239998 9953000
2020-01-07 CVX 112.675552 119.059998 119.730003 117.769997 119.019997 7856900
2020-01-08 CVX 111.388489 117.699997 119.089996 117.650002 118.550003 7295900
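The same long layout can also be reached without flattening the column names first. The following is only a sketch on synthetic data, not what the answer above does: it assumes the two column levels are named "Attributes" and "Symbols", pulls out each symbol with xs, and concatenates the pieces with the symbol as an extra index level.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the DataReader result (assumption: column levels
# named "Attributes" and "Symbols"; values are placeholders).
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["CVX", "INTC"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2020-01-02", periods=2, freq="D", name="Date")
df_stock = pd.DataFrame(
    np.arange(8, dtype=float).reshape(2, 4), index=dates, columns=columns
)

# One sub-frame per symbol, each with plain attribute columns.
symbols = df_stock.columns.get_level_values("Symbols").unique()
long_df = pd.concat(
    {sym: df_stock.xs(sym, axis=1, level="Symbols") for sym in symbols},
    names=["Symbols"],
)

print(long_df)
```

The result is indexed by (Symbols, Date) with one column per attribute, so long_df.xs("CVX", level="Symbols") returns an ordinary date-indexed frame for a single stock.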
I am using the code below to pass a large list of tickers to the Yahoo datareader, and I am trying to get back a dataframe like the one below. If the list is large, I often get a RemoteDataError back, but on different tickers each time. I am not sure how to handle the RemoteDataError, and I am happy to drop the ticker and continue with the next ticker in the list. I would, however, like to try again to get the adj close data for that ticker. I thought using a for loop and adding a time delay would help with the Yahoo requests, but I am still getting a RemoteDataError. Any ideas?
IBM MSFT ORCL TSLA YELP
Date
2014-01-02 184.52 36.88 37.61 150.10 67.92
2014-01-03 185.62 36.64 37.51 149.56 67.66
2014-01-06 184.99 35.86 37.36 147.00 71.72
2014-01-07 188.68 36.14 37.74 149.36 72.66
2014-01-08 186.95 35.49 37.61 151.28 78.42
import pandas_datareader.data as web
import datetime as dt
import pandas as pd
import time
from pandas_datareader._utils import RemoteDataError
Which_group = ['Accident & Health Insurance'] ##<<<<put in group here
df = pd.read_csv('/home/ross/Downloads/UdemyPairs/stocks1.csv')
df.set_index('categoryName', inplace = True)
df1 = df.loc[Which_group]
tickers = df1.Ticker.tolist()
print(tickers)
#tickers = ['SPY', 'AAPL', 'MSFT'] # add as many tickers
start = dt.datetime(2013, 1,1)
end = dt.datetime.today()
# Function starts here
def get_previous_close(strt, end, tick_list, this_price):
    """ arg: `this_price` can take str Open, High, Low, Close, Volume"""
    # make an empty dataframe in which we will append columns
    adj_close = pd.DataFrame([])
    # loop here.
    for idx, i in enumerate(tick_list):
        try:
            # time.sleep(0.01)
            total = web.DataReader(i, 'yahoo', strt, end)
            adj_close[i] = total[this_price]
        except RemoteDataError:
            pass
    return adj_close
# call the function
print(get_previous_close(start, end, tickers, 'Adj Close'))
Maybe you can look at this question; it proposes a solution that might work for you:
Pandas Dataframe - RemoteDataError - Python
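For the "try again, then drop the ticker" behaviour asked about, one possible shape is a small retry loop. This is a sketch under assumptions, not the linked solution: fetch_with_retries and fake_fetch are hypothetical names, and the downloader is passed in as a callable so the example runs without touching Yahoo.

```python
import time

import pandas as pd


def fetch_with_retries(fetch, tickers, retries=3, delay=1.0, errors=(Exception,)):
    """Collect one column per ticker, retrying failed downloads.

    `fetch` is any callable that takes a ticker and returns a Series
    (hypothetical helper; wrap web.DataReader in it for real use).
    """
    columns = {}
    failed = []
    for ticker in tickers:
        for attempt in range(1, retries + 1):
            try:
                columns[ticker] = fetch(ticker)
                break
            except errors:
                if attempt == retries:
                    failed.append(ticker)  # give up and move on
                else:
                    time.sleep(delay * attempt)  # wait longer each time
    return pd.DataFrame(columns), failed


# Fake downloader for the demo: "MSFT" fails once, "BAD" always fails.
calls = {}


def fake_fetch(ticker):
    calls[ticker] = calls.get(ticker, 0) + 1
    if ticker == "BAD" or (ticker == "MSFT" and calls[ticker] == 1):
        raise ConnectionError(f"simulated remote error for {ticker}")
    return pd.Series([1.0, 2.0], index=pd.to_datetime(["2014-01-02", "2014-01-03"]))


prices, failed = fetch_with_retries(
    fake_fetch, ["IBM", "MSFT", "BAD"], retries=2, delay=0.0
)
print(prices)
print("dropped:", failed)
```

With the code from the question, fetch could be something like lambda t: web.DataReader(t, 'yahoo', start, end)['Adj Close'], with errors=(RemoteDataError,) so that only remote failures are retried.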
I want to display the current date (in this example 2017-11-16) at the top of the data frame. When I download the data, the newest date appears at the bottom of the data frame.
How can I change this?
Open High Low Close Adj Close
Date
2017-11-13 173.500000 174.500000 173.399994 173.970001 173.970001
2017-11-14 173.039993 173.479996 171.179993 171.339996 171.339996
2017-11-15 169.970001 170.320007 168.380005 169.080002 169.080002
2017-11-16 171.179993 171.869995 170.300003 171.100006 171.100006
My code is:
import pandas_datareader.data as web
import datetime
import pandas as pd
DayToPast=5
today = datetime.date.today()
end = datetime.date.today()
start = today-datetime.timedelta(days=DayToPast)
df=web.DataReader('AAPL', 'yahoo', start, end)
print(df)
Thanks & Have a nice day.
Assuming "today" is the maximal date in your dataframe's index, you can just sort your data by descending order of its index:
df.sort_index(ascending=False, inplace=True)
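A quick offline check, assuming a hand-built frame with the Close values from the question in place of the downloaded data:

```python
import pandas as pd

# Stand-in for the downloaded frame (assumption: only the Close column is
# reproduced, values copied from the question).
df = pd.DataFrame(
    {"Close": [173.970001, 171.339996, 169.080002, 171.100006]},
    index=pd.to_datetime(["2017-11-13", "2017-11-14", "2017-11-15", "2017-11-16"]),
)
df.index.name = "Date"

# Returns a new frame with the most recent date first; df itself is unchanged.
newest_first = df.sort_index(ascending=False)
print(newest_first)
```

Without inplace=True the original frame keeps its order, which can be handy if later code expects ascending dates.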
I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:
>>> type(data_dy)
<class 'pandas.core.frame.DataFrame'>
>>> data_dy
Open High Low Close Volume Adj Close
Date
2012-12-28 140.64 141.42 139.87 140.03 148806700 134.63
2012-12-31 139.66 142.56 139.54 142.41 243935200 136.92
2013-01-02 145.11 146.15 144.73 146.06 192059000 140.43
2013-01-03 145.99 146.37 145.34 145.73 144761800 140.11
2013-01-04 145.97 146.61 145.67 146.37 116817700 140.72
[5 rows x 6 columns]
I'm using the following dictionary and the pandas resample function to convert the dataframe to monthly data:
>>> ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
>>> data_dy.resample('M', how=ohlc_dict, closed='right', label='right')
Volume Adj Close High Low Close Open
Date
2012-12-31 392741900 136.92 142.56 139.54 142.41 140.64
2013-01-31 453638500 140.72 146.61 144.73 146.37 145.11
[2 rows x 6 columns]
This does the calculations correctly, but I'd like to use the Yahoo! date convention for monthly data of using the first trading day of the period rather than the last calendar day of the period that pandas uses.
So I'd like the answer set to be:
Volume Adj Close High Low Close Open
Date
2012-12-28 392741900 136.92 142.56 139.54 142.41 140.64
2013-01-02 453638500 140.72 146.61 144.73 146.37 145.11
I could do this by converting the daily data to a Python list, processing the data and returning it to a dataframe, but how can this be done with pandas?
Instead of M you can pass MS as the resample rule:
df =pd.DataFrame( range(72), index = pd.date_range('1/1/2011', periods=72, freq='D'))
#df.resample('MS', how = 'mean') # pandas <0.18
df.resample('MS').mean() # pandas >= 0.18
Updated to use the first business day of the month respecting US Federal Holidays:
df =pd.DataFrame( range(200), index = pd.date_range('12/1/2012', periods=200, freq='D'))
from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
df.resample(bmth_us).mean()
If you want custom month starts based on the earliest date found in the data for each month, try this. (It isn't pretty, but it should work.)
month_index = df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)
Pass custom_month_starts as the first parameter of resample.
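If the aim is only to label each month with the first date that actually appears in the data, a plain groupby may be simpler than building a custom offset. This is a different technique from the one above, sketched on the five sample rows from the question: group by calendar month, aggregate with the same ohlc_dict, then relabel each group with its earliest date.

```python
import pandas as pd

# Daily sample copied from the question (data_dy).
data_dy = pd.DataFrame(
    {
        "Open": [140.64, 139.66, 145.11, 145.99, 145.97],
        "High": [141.42, 142.56, 146.15, 146.37, 146.61],
        "Low": [139.87, 139.54, 144.73, 145.34, 145.67],
        "Close": [140.03, 142.41, 146.06, 145.73, 146.37],
        "Volume": [148806700, 243935200, 192059000, 144761800, 116817700],
        "Adj Close": [134.63, 136.92, 140.43, 140.11, 140.72],
    },
    index=pd.DatetimeIndex(
        ["2012-12-28", "2012-12-31", "2013-01-02", "2013-01-03", "2013-01-04"],
        name="Date",
    ),
)
ohlc_dict = {"Open": "first", "High": "max", "Low": "min",
             "Close": "last", "Volume": "sum", "Adj Close": "last"}

# Group rows by calendar month and aggregate each column.
month = data_dy.index.to_period("M")
monthly = data_dy.groupby(month).agg(ohlc_dict)

# Relabel each month with the earliest date present in the data.
first_days = data_dy.index.to_series().groupby(month).min()
monthly.index = pd.DatetimeIndex(first_days.to_numpy(), name="Date")

print(monthly)
```

Because the labels come from the data itself, market holidays are handled without a holiday calendar; the trade-off is that a month whose first trading days are missing from the data is labelled with whatever date comes first.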
Thank you J Bradley, your solution worked perfectly. I did have to upgrade my version of pandas from their official website though as the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets. My final code was:
#----- imports -----
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
import pandas.io.data as web  # later moved to the separate pandas_datareader package: import pandas_datareader.data as web
#----- get sample data -----
df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
#----- build custom calendar -----
month_index =df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar = min_day_in_month_index)
#----- convert daily data to monthly data -----
ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
mthly_ohlcva = df.resample(custom_month_starts, how=ohlc_dict)  # pandas >= 0.18: df.resample(custom_month_starts).agg(ohlc_dict)
This yielded the following:
>>> mthly_ohlcva
Volume Adj Close High Low Close Open
Date
2012-12-03 2889875900 136.92 145.58 139.54 142.41 142.80
2013-01-01 2587140200 143.92 150.94 144.73 149.70 145.11
2013-02-01 2581459300 145.76 153.28 148.73 151.61 150.65
2013-03-01 2330972300 151.30 156.85 150.41 156.67 151.09
2013-04-01 2907035000 154.20 159.72 153.55 159.68 156.59
2013-05-01 2781596000 157.84 169.07 158.10 163.45 159.33
2013-06-03 3533321800 155.74 165.99 155.73 160.42 163.83
2013-07-01 2330904500 163.78 169.86 160.22 168.71 161.26
2013-08-01 2283131700 158.87 170.97 163.05 163.65 169.99
2013-09-02 2226749600 163.90 173.60 163.70 168.01 165.23
2013-10-01 2901739000 171.49 177.51 164.53 175.79 168.14
2013-11-01 1930952900 176.57 181.75 174.76 181.00 176.02
2013-12-02 2232775900 181.15 184.69 177.32 184.69 181.09
I've seen that in recent versions of pandas you can use the time offset alias 'BMS', which stands for "business month start frequency", or 'BM', which stands for "business month end frequency".
The code in the first case would look like
data_dy.resample('BMS', closed='right', label='right').apply(ohlc_dict)
or, in the second case,
data_dy.resample('BM', closed='right', label='right').apply(ohlc_dict)
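One caveat worth checking before relying on 'BMS': it labels each bin with the first weekday of the month, which is not necessarily a trading day. The sketch below assumes a synthetic volume series on weekdays with two market holidays removed, and uses the default closed and label settings rather than the 'right' settings shown above.

```python
import pandas as pd

# Synthetic trading calendar (assumption): weekdays from 2012-12-03 to
# 2013-01-31, minus Christmas Day and New Year's Day.
days = pd.bdate_range("2012-12-03", "2013-01-31")
days = days.drop(pd.to_datetime(["2012-12-25", "2013-01-01"]))
volume = pd.Series(1, index=days, name="Volume")

# Sum per business-month-start bin; labels are the left bin edges.
monthly = volume.resample("BMS").sum()
print(monthly)

# Is every label an actual trading day in the data?
labels_in_data = monthly.index.isin(days)
print(labels_in_data)
```

Here the January bin is labelled with a date that never occurs in the series, so if the label must be a real trading day, a holiday-aware offset or a groupby on the data's own dates is the safer route.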