Reading stocks from multiple sources using Pandas Datareader - python

I have a list of 6 stocks. I have set up my code to reference the stock name from the list rather than hard-coding it, starting with SPY, which is in position 0. The code below the list returns yesterday's closing price for that stock.
My question is: how do I loop the code through each stock in the list so that I print out the closing price for all 6 stocks?
I think I need to use loops but I don't understand them.
Any ideas?
CODE:
#import packages
import pandas_datareader.data as web
import datetime as dt
#create list of stocks to reference later
stocks = ['SPY', 'QQQ', 'IWM', 'AAPL', 'FB', 'GDX']
#define prior day close price
start = dt.datetime(2010, 1, 1)
end = dt.datetime(2030, 1, 27)
ticker = web.DataReader(stocks[0], 'google', start, end)
prior_day = ticker.iloc[-1]
PDL = list(prior_day)
prior_close = PDL[3]
#print the name of the stock from the stocks list, and the prior close price
print(stocks[0])
print('Prior Close')
print(prior_close)
RETURNS:
SPY
Prior Close
249.08

You could use a loop, but you don't need loops for this. Pass your entire list of stocks to the DataReader. This should be cheaper than making multiple calls.
stocks = ['SPY', 'QQQ', 'IWM', 'AAPL', 'FB', 'GDX']
ticker = web.DataReader(stocks, 'google', start, end)
close = ticker.to_frame().tail()['Close'].to_frame('Prior Close')
print(close)
Prior Close
Date minor
2017-09-26 FB 164.21
GDX 23.35
IWM 144.61
QQQ 143.17
SPY 249.08
Details
ticker is a panel, but can be converted to a dataframe using to_frame:
print(ticker)
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 251 (major_axis) x 6 (minor_axis)
Items axis: Open to Volume
Major_axis axis: 2016-09-28 00:00:00 to 2017-09-26 00:00:00
Minor_axis axis: AAPL to SPY
df = ticker.to_frame()
You can view all recorded dates of stocks using df.index.get_level_values:
print(df.index.get_level_values('Date'))
DatetimeIndex(['2016-09-28', '2016-09-28', '2016-09-28', '2016-09-28',
'2016-09-28', '2016-09-28', '2016-09-29', '2016-09-29',
'2016-09-29', '2016-09-29',
...
'2017-09-25', '2017-09-25', '2017-09-25', '2017-09-25',
'2017-09-26', '2017-09-26', '2017-09-26', '2017-09-26',
'2017-09-26', '2017-09-26'],
dtype='datetime64[ns]', name='Date', length=1503, freq=None)
If you want to view all stocks for a particular date, you can use df.loc with a slice. For your case, you want to see the closing stocks on the last date, so you can use df.tail:
print(df.tail()['Close'].to_frame())
Close
Date minor
2017-09-26 FB 164.21
GDX 23.35
IWM 144.61
QQQ 143.17
SPY 249.08
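Note that Panel was removed in pandas 1.0, so the to_frame() route above no longer works on a current install (and the 'google' source has been defunct for years). A rough modern equivalent (a sketch, assuming the 'yahoo' source still responds in your environment) concatenates per-ticker frames into one MultiIndex frame:
import pandas as pd

frames = []
for stock in stocks:
    d = web.DataReader(stock, 'yahoo', start, end)
    d['Ticker'] = stock
    frames.append(d)
df = pd.concat(frames).set_index('Ticker', append=True)  # index: (Date, Ticker)

# last available close per ticker, analogous to the table above
close = df.groupby(level='Ticker')['Close'].last().to_frame('Prior Close')
print(close)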

You can just use a for loop:
for stock in stocks:
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime(2030, 1, 27)
    ticker = web.DataReader(stock, 'google', start, end)
    prior_day = ticker.iloc[-1]
    PDL = list(prior_day)
    prior_close = PDL[3]
    print(stock)
    print('Prior Close')
    print(prior_close)
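A slightly tidier variant (my own tweak, not part of the answer above) hoists the constant dates out of the loop and reads the Close column by name instead of by position:
start = dt.datetime(2010, 1, 1)
end = dt.datetime(2030, 1, 27)
for stock in stocks:
    # take the last row and read its Close value directly
    prior_close = web.DataReader(stock, 'google', start, end).iloc[-1]['Close']
    print(stock)
    print('Prior Close')
    print(prior_close)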

I'll give you a function to which you can always pass a list of stocks and which returns a time series. ;)
I use this function for numerous tickers:
import datetime as dt
import pandas as pd
import pandas_datareader.data as web

tickers = ['SPY', 'QQQ', 'EEM', 'INDA', 'AAPL', 'MSFT'] # add as many tickers
start = dt.datetime(2010, 3, 31)
end = dt.datetime.today()

# Function starts here
def get_previous_close(strt, end, tick_list, this_price):
    """ arg: `this_price` can take str Open, High, Low, Close, Volume"""
    # make an empty dataframe in which we will append columns
    adj_close = pd.DataFrame([])
    # loop here.
    for idx, i in enumerate(tick_list):
        total = web.DataReader(i, 'google', strt, end)
        adj_close[i] = total[this_price]
    return adj_close

# call the function
get_previous_close(start, end, tickers, 'Close')
You can use this time series in any way you like. It's always good to use a function for maintainability and reusability. Also, this function can take 'yahoo' instead of 'google'.
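If you switch data sources often, the source can be made a parameter too. A small variation of the function above (the source parameter name is my own; this reuses the imports from the block above):
def get_price_series(strt, end, tick_list, this_price, source='google'):
    """Like get_previous_close, but the data source ('google', 'yahoo', ...)
    is passed in rather than hard-coded."""
    out = pd.DataFrame([])
    for tick in tick_list:
        out[tick] = web.DataReader(tick, source, strt, end)[this_price]
    return out

# e.g. pull closes from Yahoo instead of Google
get_price_series(start, end, tickers, 'Close', source='yahoo')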

Related

groupby and join with pandas dataframe

Here is part of the data of scaffold_table
import pandas as pd
scaffold_table = pd.DataFrame({
    'Position': [2000]*5,
    'Company': ['Amazon', 'Amazon', 'Alphabet', 'Amazon', 'Alphabet'],
    'Date': ['2020-05-26', '2020-05-27', '2020-05-27', '2020-05-28', '2020-05-28'],
    'Ticker': ['AMZN', 'AMZN', 'GOOG', 'AMZN', 'GOOG'],
    'Open': [2458., 2404.9899, 1417.25, 2384.330078, 1396.859985],
    'Volume': [3568200, 5056900, 1685800, 3190200, 1692200],
    'Daily Return': [-0.006164, -0.004736, 0.000579, -0.003854, -0.000783],
    'Daily PnL': [-12.327054, -9.472236, 1.157283, -7.708126, -1.565741],
    'Cumulative PnL/Ticker': [-12.327054, -21.799290, 1.157283, -29.507417, -0.408459]})
I would like to create a summary table that returns the overall yield per ticker. The overall yield should be calculated as the total PnL per ticker divided by the last date's position per ticker.
# Create a summary table of your average daily PnL, total PnL, and overall yield per ticker
summary_table = pd.DataFrame(scaffold_table.groupby(['Date','Ticker'])['Daily PnL'].mean())
position_ticker = pd.DataFrame(scaffold_table.groupby(['Date','Ticker'])['Position'].sum())
# the total PnL is the sum of PnL per Ticker after two years period
totals = summary_table.droplevel('Date').groupby('Ticker').sum().rename(columns={'Daily PnL':'total PnL'})
summary_table = summary_table.join(totals, on='Ticker')
summary_table = summary_table.join(position_ticker, on = ['Date','Ticker'], how='inner')
summary_table['Yield'] = summary_table.loc['2022-04-29']['total PnL']/summary_table.loc['2022-04-29']['Position']
summary_table
But the yield is showing NaN. Could anyone take a look at my code?
I used ['2022-04-29'] because it is the last date, but I think there is a way to return the last date without explicitly typing it in.
I solved the problem with the following code
# we want the overall yield per ticker, so total PnL/Position on the last date
summary_table['Yield'] = summary_table['total PnL']/summary_table.loc['2022-04-29']['Position']
This does not specify the date for total PnL since it is the sum by ticker without regard to the date.
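As for getting the last date without hard-coding it: take the maximum of the Date values. A minimal sketch against the tables above (max() works here even while Date is stored as ISO-formatted strings):
# last trading date in the raw table
last_date = scaffold_table['Date'].max()

# or, once Date is a level of summary_table's MultiIndex
last_date = summary_table.index.get_level_values('Date').max()

summary_table['Yield'] = summary_table['total PnL'] / summary_table.loc[last_date]['Position']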
I note the comment in your code saying: "Create a summary table of your average daily PnL, total PnL, and overall yield per ticker".
If we start from this, here are a few observations:
the average daily PnL per ticker should just be the mean of Daily PnL for each ticker
the total PnL per ticker is already listed in the Cumulative PnL/Ticker column, so if we use groupby on Ticker and get the value in the Cumulative PnL/Ticker column for the most recent date (namely, for the last() row in the groupby assuming the df is sorted by date), we don't have to calculate it
for the overall yield per ticker (which you have specified "should be calculated as the total PnL per ticker divided by the last date's position per ticker") we can get the relevant Position (namely, for the most recent date per ticker) analogously to how we got the relevant Cumulative PnL/Ticker and use these two values to calculate Yield.
Here is sample code to do this:
import pandas as pd
scaffold_table = pd.DataFrame({
    'Position': [2000]*5,
    'Company': ['Amazon', 'Amazon', 'Alphabet', 'Amazon', 'Alphabet'],
    'Date': ['2020-05-26', '2020-05-27', '2020-05-27', '2020-05-28', '2020-05-28'],
    'Ticker': ['AMZN', 'AMZN', 'GOOG', 'AMZN', 'GOOG'],
    'Open': [2458., 2404.9899, 1417.25, 2384.330078, 1396.859985],
    'Volume': [3568200, 5056900, 1685800, 3190200, 1692200],
    'Daily Return': [-0.006164, -0.004736, 0.000579, -0.003854, -0.000783],
    'Daily PnL': [-12.327054, -9.472236, 1.157283, -7.708126, -1.565741],
    'Cumulative PnL/Ticker': [-12.327054, -21.799290, 1.157283, -29.507417, -0.408459]})
print(scaffold_table)
# Create a summary table of your average daily PnL, total PnL, and overall yield per ticker
gb = scaffold_table.groupby(['Ticker'])
summary_table = gb.last()[['Position', 'Cumulative PnL/Ticker']].rename(columns={'Cumulative PnL/Ticker':'Total PnL'})
summary_table['Yield'] = summary_table['Total PnL'] / summary_table['Position']
summary_table['Average Daily PnL'] = gb['Daily PnL'].mean()
summary_table = summary_table[['Average Daily PnL', 'Total PnL', 'Yield']]
print('\nsummary_table:'); print(summary_table)
Input:
Position Company Date Ticker Open Volume Daily Return Daily PnL Cumulative PnL/Ticker
0 2000 Amazon 2020-05-26 AMZN 2458.000000 3568200 -0.006164 -12.327054 -12.327054
1 2000 Amazon 2020-05-27 AMZN 2404.989900 5056900 -0.004736 -9.472236 -21.799290
2 2000 Alphabet 2020-05-27 GOOG 1417.250000 1685800 0.000579 1.157283 1.157283
3 2000 Amazon 2020-05-28 AMZN 2384.330078 3190200 -0.003854 -7.708126 -29.507417
4 2000 Alphabet 2020-05-28 GOOG 1396.859985 1692200 -0.000783 -1.565741 -0.408459
Output:
Average Daily PnL Total PnL Yield
Ticker
AMZN -9.835805 -29.507417 -0.014754
GOOG -0.204229 -0.408459 -0.000204

How to convert a list to a dataframe without dropping all data?

I am testing the code below.
import pandas as pd
from pandas_datareader import data as wb
tickers = ['SBUX', 'AAPL', 'MSFT']
AllData = []
for ticker in tickers:
    print('appending prices for ' + ticker)
    tickers = wb.DataReader(ticker, start='2018-7-26', data_source='yahoo')
    AllData.append(tickers)
AllData = pd.DataFrame(AllData)
print(AllData)
When I convert the list to a dataframe, everything gets dropped.
Also, I'm trying to get the ticker variable inserted into the relevant spot, so I can tell which one is which. I'd like the final result to look like this.
date ticker adj_close
0 2018-02-13 MSFT 164.34
1 2018-02-12 MSFT 162.71
...
265 2018-02-13 SBUX 81.30
266 2018-02-12 SBUX 82.11
How can I do that? TIA.
There are a couple of issues with your code.
First, you are iterating through tickers via for ticker in tickers: but you then reassign that variable in the loop via tickers = wb.DataReader(...). Never change the object over which you are iterating. Although this actually does not cause an issue in this case, it is clearly undesirable.
Second, AllData is a list containing three dataframes, none of which has a reference to its relevant ticker. You could concatenate at this stage, but you should first include the ticker as an additional column via .assign(ticker=ticker).
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start='2018-7-26', data_source='yahoo')[['Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])
df = pd.concat(price_data)
>>> df
ticker Adj Close
Date
2018-07-26 SBUX 50.324104
2018-07-27 SBUX 51.008789
2018-07-30 SBUX 50.764256
...
>>> df.set_index('ticker', append=True).unstack('ticker')
Adj Close
ticker AAPL MSFT SBUX
Date
2018-07-26 191.298080 107.868378 50.324104
2018-07-27 188.116501 105.959373 51.008789
2018-07-30 187.062546 103.686295 50.764256
...

How do I continuously calculate something based on the past X amount of data? (Please see info for more details)

Goal:
Calculate a 50-day moving average for each day, based on the past 50 days. I can calculate the mean for the entire dataset, but I am trying to continuously calculate the mean based on the past 50 days... with it changing each day of course!
import numpy as np
import pandas_datareader.data as pdr
import pandas as pd
# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.
ticker = ['AAPL']
#Define the data period that you would like
start_date = '2017-07-01'
end_date = '2019-02-08'
# Use pandas_datareader.data.DataReader to load the stock prices from Yahoo Finance.
df = pdr.DataReader(ticker, 'yahoo', start_date, end_date)
# Yahoo Finance gives 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'.
#Export Close PRice, Volume, and Date from yahoo finance
CloseP = df['Close']
CloseP.head()
Volm = df['Volume']
Volm.head()
Date = df["Date"] = df.index
#create a table with Date, Close Price, and Volume
Table = pd.DataFrame(np.array(Date), columns = ['Date'])
Table['Close Price'] = np.array(CloseP)
Table['Volume'] = np.array(Volm)
print (Table)
#create a column that continuously calculates the 50 day MA
#This is what I can't get to work!
MA = np.mean(df['Close'])
Table['Moving Average'] = np.array(MA)
print (Table)
First of all, please don't use CamelCase to name your variables, as they look like class names otherwise.
Next, use merge() to join your data frames instead of assembling them by hand from np.array:
>>> table = CloseP.merge(Volm, left_index=True, right_index=True)
>>> table.columns = ['close', 'volume'] # give names to columns
>>> table.head(10)
close volume
Date
2017-07-03 143.500000 14277800.0
2017-07-05 144.089996 21569600.0
2017-07-06 142.729996 24128800.0
2017-07-07 144.179993 19201700.0
2017-07-10 145.059998 21090600.0
2017-07-11 145.529999 19781800.0
2017-07-12 145.740005 24884500.0
2017-07-13 147.770004 25199400.0
2017-07-14 149.039993 20132100.0
2017-07-17 149.559998 23793500.0
Finally, use a combination of rolling(), mean() and dropna() to calculate the moving average:
>>> ma50 = table.rolling(window=50).mean().dropna()
>>> ma50.head(10)
close volume
Date
2017-09-12 155.075401 26092540.0
2017-09-13 155.398401 26705132.0
2017-09-14 155.682201 26748954.0
2017-09-15 156.025201 27248670.0
2017-09-18 156.315001 27430024.0
2017-09-19 156.588401 27424424.0
2017-09-20 156.799201 28087816.0
2017-09-21 156.952201 28340360.0
2017-09-22 157.034601 28769280.0
2017-09-25 157.064801 29254384.0
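If you would rather attach the moving average to the original table as a column, as in the question (with NaN for the first 49 rows instead of dropping them), a one-line sketch along the same lines:
table['Moving Average'] = table['close'].rolling(window=50).mean()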
Please refer to the docs of the mentioned API calls for more info about their usage. Good luck!

Pandas DataReader handle RemoteError yahoo finance

I am using the code below to pass a large list of tickers to the Yahoo DataReader, trying to get back a dataframe like the one below. If the list is large, I often get a RemoteError back, but on different tickers each time. I am not sure how to handle the RemoteError. I am happy to drop the ticker and continue with the next ticker in the list, but I would first like to try again to get the adj close data for that ticker. I thought using a for loop and adding a time delay would help with the Yahoo requests, but I am still getting a RemoteError. Any ideas?
IBM MSFT ORCL TSLA YELP
Date
2014-01-02 184.52 36.88 37.61 150.10 67.92
2014-01-03 185.62 36.64 37.51 149.56 67.66
2014-01-06 184.99 35.86 37.36 147.00 71.72
2014-01-07 188.68 36.14 37.74 149.36 72.66
2014-01-08 186.95 35.49 37.61 151.28 78.42
import pandas_datareader.data as web
import datetime as dt
import pandas as pd
import time
from pandas_datareader._utils import RemoteDataError
Which_group = ['Accident & Health Insurance'] ##<<<<put in group here
df = pd.read_csv('/home/ross/Downloads/UdemyPairs/stocks1.csv')
df.set_index('categoryName', inplace = True)
df1 = df.loc[Which_group]
tickers = df1.Ticker.tolist()
print(tickers)
#tickers = ['SPY', 'AAPL', 'MSFT'] # add as many tickers
start = dt.datetime(2013, 1,1)
end = dt.datetime.today()
# Function starts here
def get_previous_close(strt, end, tick_list, this_price):
    """ arg: `this_price` can take str Open, High, Low, Close, Volume"""
    # make an empty dataframe in which we will append columns
    adj_close = pd.DataFrame([])
    # loop here.
    for idx, i in enumerate(tick_list):
        try:
            # time.sleep(0.01)
            total = web.DataReader(i, 'yahoo', strt, end)
            adj_close[i] = total[this_price]
        except RemoteDataError:
            pass
    return adj_close

#call the function
print(get_previous_close(start, end, tickers, 'Adj Close'))
Maybe you can look at this question; it proposes a solution that might work for you:
Pandas Dataframe - RemoteDataError - Python
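If you also want to retry a ticker before dropping it, here is a minimal retry sketch (my own addition, not from the linked question): each ticker is attempted a few times with a delay before being skipped.
import time

def fetch_with_retry(ticker, strt, end, retries=3, delay=2.0):
    """Try up to `retries` times, sleeping `delay` seconds between attempts;
    return None if every attempt raises RemoteDataError."""
    for attempt in range(retries):
        try:
            return web.DataReader(ticker, 'yahoo', strt, end)
        except RemoteDataError:
            time.sleep(delay)
    return None

Inside the loop of get_previous_close, you would call fetch_with_retry instead of web.DataReader directly and skip any ticker that comes back as None.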

converting daily stock data to weekly-based via pandas in Python

I've got a DataFrame storing daily-based data which is as below:
Date Open High Low Close Volume
2010-01-04 38.660000 39.299999 38.509998 39.279999 1293400
2010-01-05 39.389999 39.520000 39.029999 39.430000 1261400
2010-01-06 39.549999 40.700001 39.020000 40.250000 1879800
2010-01-07 40.090000 40.349998 39.910000 40.090000 836400
2010-01-08 40.139999 40.310001 39.720001 40.290001 654600
2010-01-11 40.209999 40.520000 40.040001 40.290001 963600
2010-01-12 40.160000 40.340000 39.279999 39.980000 1012800
2010-01-13 39.930000 40.669998 39.709999 40.560001 1773400
2010-01-14 40.490002 40.970001 40.189999 40.520000 1240600
2010-01-15 40.570000 40.939999 40.099998 40.450001 1244200
What I intend to do is to merge it into weekly-based data. After grouping:
the Date should be every Monday (holidays should be considered here: when Monday is not a trading day, the first trading day of the current week should be used as the Date).
Open should be Monday's (or the first trading day of current week) Open.
Close should be Friday's (or the last trading day of current week) Close.
High should be the highest High of trading days in current week.
Low should be the lowest Low of trading days in current week.
Volume should be the sum of all Volumes of the trading days in the current week.
which should look like this:
Date Open High Low Close Volume
2010-01-04 38.660000 40.700001 38.509998 40.290001 5925600
2010-01-11 40.209999 40.970001 39.279999 40.450001 6234600
Currently, my code snippet is as below, which function should I use to mapping daily-based data to the expected weekly-based data? Many thanks!
import datetime
import pandas_datareader.data as web

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2016, 12, 31)
f = web.DataReader("MNST", "yahoo", start, end, session=session)
print(f)
You can resample (to weekly), offset (shift), and apply aggregation rules as follows:
logic = {'Open'  : 'first',
         'High'  : 'max',
         'Low'   : 'min',
         'Close' : 'last',
         'Volume': 'sum'}

offset = pd.offsets.timedelta(days=-6)

f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
f.resample('W', loffset=offset).apply(logic)
to get:
Open High Low Close Volume
Date
2010-01-04 38.660000 40.700001 38.509998 40.290001 5925600
2010-01-11 40.209999 40.970001 39.279999 40.450001 6234600
In general, assuming that you have the dataframe in the form you specified, you need to do the following steps:
put Date in the index
resample the index.
What you have is a case of applying different functions to different columns (see the pandas docs on aggregation).
You can resample in various ways, e.g. taking the mean of the values, the count, and so on; check pandas resample.
You can also apply custom aggregators (check the same link).
With that in mind, the code snippet for your case can be given as:
f['Date'] = pd.to_datetime(f['Date'])
f.set_index('Date', inplace=True)
f.sort_index(inplace=True)

def take_first(array_like):
    return array_like[0]

def take_last(array_like):
    return array_like[-1]

output = f.resample('W',                              # weekly resample
                    how={'Open': take_first,
                         'High': 'max',
                         'Low': 'min',
                         'Close': take_last,
                         'Volume': 'sum'},
                    loffset=pd.offsets.timedelta(days=-6))  # to put the labels to Monday
output = output[['Open', 'High', 'Low', 'Close', 'Volume']]
Here, W signifies a weekly resampling which by default spans from Monday to Sunday. To keep the labels as Monday, loffset is used.
There are several predefined day specifiers. Take a look at pandas offsets. You can even define custom offsets (see).
Coming back to the resampling method. Here for Open and Close you can specify custom methods to take the first value or so on and pass the function handle to the how argument.
This answer is based on the assumption that the data is daily, i.e. for each day you have only one entry, and that no data is present for the non-business days (Sat and Sun), so taking the last data point of the week as the one for Friday is OK. If you want, you can use a business week instead of 'W'. Also, for more complex data you may want to use groupby to group the weekly data and then work on the time indices within them.
btw a gist for the solution can be found at:
https://gist.github.com/prithwi/339f87bf9c3c37bb3188
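As a concrete example of the anchored-week idea mentioned above (my own sketch, not from the original answer): resampling with 'W-FRI' labels each weekly bin by its Friday, so no label shifting is needed if Friday labels are acceptable:
weekly = f.resample('W-FRI').agg({'Open': 'first',
                                  'High': 'max',
                                  'Low': 'min',
                                  'Close': 'last',
                                  'Volume': 'sum'})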
I had the exact same question and found a great solution here.
https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/
The weekly code is posted below.
import pandas as pd
import numpy as np
print('*** Program Started ***')
df = pd.read_csv('15-06-2016-TO-14-06-2018HDFCBANKALLN.csv')
# ensuring only equity series is considered
df = df.loc[df['Series'] == 'EQ']
# Converting date to pandas datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Getting week number
df['Week_Number'] = df['Date'].dt.week
# Getting year. Weeknum is common across years, so we need to create a unique index by using year and weeknum
df['Year'] = df['Date'].dt.year
# Grouping based on required values
df2 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum'})
# df3 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum','Average Price':'avg'})
df2.to_csv('Weekly_OHLC.csv')
print('*** Program ended ***')
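A caveat for recent pandas versions (my note, not part of the original post): Series.dt.week was deprecated in pandas 1.1 and later removed, so the week number now has to come from isocalendar():
iso = df['Date'].dt.isocalendar()  # frame with ISO year/week/day columns
df['Week_Number'] = iso.week
df['Year'] = iso.year              # ISO year pairs correctly with ISO week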
Adding to @Stefan's answer with the recent pandas API, as loffset was deprecated in version 1.1.0 and later removed.
df = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
logic = {'Open' : 'first',
'High' : 'max',
'Low' : 'min',
'Close' : 'last',
'Volume': 'sum'}
dfw = df.resample('W').apply(logic)
# set the index to the beginning of the week
dfw.index = dfw.index - pd.tseries.frequencies.to_offset("6D")
At first I used df.resample() as in the answers above, but it fills in NaN when a week is missing. Unhappy about that, after some research I used groupby() instead of resample(). Thanks for your sharing.
My original data is:
c date h l o
260 6014.78 20220321 6053.90 5984.79 6030.43
261 6052.59 20220322 6099.53 5995.22 6012.17
262 6040.86 20220323 6070.85 6008.26 6059.11
263 6003.05 20220324 6031.73 5987.40 6020.00
264 5931.33 20220325 6033.04 5928.72 6033.04
265 5946.98 20220328 5946.98 5830.93 5871.35
266 5900.04 20220329 5958.71 5894.82 5950.89
267 6003.05 20220330 6003.05 5913.08 5913.08
268 6033.04 20220331 6059.11 5978.27 5993.92
269 6126.91 20220401 6134.74 5975.66 6006.96
270 6149.08 20220406 6177.77 6106.05 6126.91
271 6134.74 20220407 6171.25 6091.71 6130.83
272 6151.69 20220408 6160.82 6096.93 6147.78
273 6095.62 20220411 6166.03 6072.15 6164.73
274 6184.28 20220412 6228.62 6049.99 6094.32
275 6119.09 20220413 6180.37 6117.79 6173.85
276 6188.20 20220414 6201.24 6132.13 6150.38
277 6173.85 20220415 6199.93 6137.35 6137.35
278 6124.31 20220418 6173.85 6108.66 6173.85
279 6065.63 20220419 6147.78 6042.16 6124.31
I don't care that the date is not Monday, so I didn't handle that. The code is:
data['Date'] = pd.to_datetime(data['date'], format="%Y%m%d")
# Refer to: https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/
# and here: https://stackoverflow.com/a/60518425/5449346
# and this: https://github.com/pandas-dev/pandas/issues/11217#issuecomment-145253671
logic = {'o': 'first',
         'h': 'max',
         'l': 'min',
         'c': 'last',
         'Date': 'first',
         }
data = data.groupby([data['Date'].dt.year, data['Date'].dt.week]).agg(logic)
data.set_index('Date', inplace=True)
And the result is below; note there is no NaN row for 2022-01-31, which resample() would have produced:
l o h c
Date
2021-11-29 6284.68 6355.09 6421.59 6382.47
2021-12-06 6365.52 6372.04 6700.62 6593.70
2021-12-13 6445.06 6593.70 6690.19 6450.28
2021-12-20 6415.07 6437.24 6531.12 6463.31
2021-12-27 6463.31 6473.75 6794.50 6649.77
2022-01-04 6625.00 6649.77 7089.18 7055.27
2022-01-10 6804.93 7055.27 7181.75 6808.84
2022-01-17 6769.73 6776.25 7098.30 6919.67
2022-01-24 6692.80 6906.63 7048.76 6754.08
2022-02-07 6737.13 6811.45 7056.58 7023.98
2022-02-14 6815.36 7073.53 7086.57 6911.85
2022-02-21 6634.12 6880.56 6904.03 6668.02
2022-02-28 6452.88 6669.33 6671.93 6493.30
2022-03-07 5953.50 6463.31 6468.53 6228.62
2022-03-14 5817.90 6154.30 6205.15 6027.82
2022-03-21 5928.72 6030.43 6099.53 5931.33
2022-03-28 5830.93 5871.35 6134.74 6126.91
2022-04-06 6091.71 6126.91 6177.77 6151.69
2022-04-11 6049.99 6164.73 6228.62 6173.85
2022-04-18 6042.16 6173.85 6173.85 6065.63
Updated solution for 2022
import pandas as pd
from pandas.tseries.frequencies import to_offset

df = pd.read_csv('your_ticker.csv')
logic = {'<Open>'  : 'first',
         '<High>'  : 'max',
         '<Low>'   : 'min',
         '<Close>' : 'last',
         '<Volume>': 'sum'}
df['<DTYYYYMMDD>'] = pd.to_datetime(df['<DTYYYYMMDD>'])
df = df.set_index('<DTYYYYMMDD>')
df = df.sort_index()
df = df.resample('W').apply(logic)
# shift the weekly (Sunday) labels back to the Monday that starts each week
df.index = df.index - to_offset("6D")
Not a direct answer, but suppose the columns are the dates (transpose of your table), without missing dates.
'''sum up daily results in df to weekly results in wdf'''
wdf = pd.DataFrame(index=df.index)
for i in range(len(df.columns)):
    if (i != 0) & (i % 7 == 0):
        wdf['week' + str(i//7)] = df[df.columns[i-7:i]].sum(axis=1)
