pandas get business days data from datetime index

pandas get business days data from datetime index - python

I have pandas dataframe as:
df.ix[1:4]
Data
DateTime
2015-05-24 02:00:00 4368.02
2015-05-24 03:00:00 4254.63
2015-05-24 04:00:00 4167.88
I have created a calendar as:
us_bd = CustomBusinessDay(calendar=myCalendar())
How do I extract the business days data and non business days data from df?
Right now I am extracting the dates from df and then checking their presence in us_bd using numpy.in1d which appears very clumsy.

I'd simply say a business day is such that adding and subtracting one business day returns to the same day.
df['is_biz'] = ((df.DateTime + us_bd) - us_bd ) == df.DateTime

Related

Calculating rate of return for multiple time frames (annualized, quarterly) with daily time series data (S&P 500 (SPX index) daily prices)

I have a CSV file with some 30 years worth of daily close prices for the S&P 500 (SPX) stock market index, and I read it as Dataframe Series with Dates set as Index.
Dataframe:
Date
Open
High
Low
Close
2023-01-13
3960.60
4003.95
3947.67
3999.09
2023-01-12
3977.57
3997.76
3937.56
3983.17
2023-01-11
3932.35
3970.07
3928.54
3969.61
2023-01-10
3888.57
3919.83
3877.29
3919.25
2023-01-09
3910.82
3950.57
3890.42
3892.09
1990-01-08
353.79
354.24
350.54
353.79
1990-01-05
352.20
355.67
351.35
352.20
1990-01-04
355.67
358.76
352.89
355.67
1990-01-03
358.76
360.59
357.89
358.76
1990-01-02
359.69
359.69
351.98
359.69
It effectively has a date (as index) column, and four columns (open, high, low, close) of daily prices. I am using close prices.
I would like a flexible function to calculate annual returns from the chosen start date to the end date using the formula:
(end_price / beginning_price - 1) * 100
So, the annual return for 2022 would be:
(SPX_Close_price_at_31_December_2022 - SPX_Close_price_at_31_December_2021 - 1)*100
It would be ideal if the same function could handle monthly or quarterly date inputs. Then, I would like these periodic returns (%) to be added to the dataframe in a separate column, and/or a new dataframe, and match the start and end dates across rows, so I can produce consecutive annual returns on a Matplotlib line chart. And I would like to do this for the whole time series of 30 years.
This is the what I would like for the final dataframe to look like (return numbers below are examples only):
Date
Annual Return (%)
m/d/2022
-18
m/d/2021
20
m/d/2020
15
m/d/2019
18
I am a beginner with Python am and still struggling working with date and datetime formats and matching those dates to data in columns across selected rows.
Below is what I got to so far, but it doesn't work properly. I will try the dateutil library, but I think that concepts of building out efficient functions is still something I need to work on. This is my first question on Stack Overflow, so thanks for having me :)
def spx_return(df, sdate, edate):
delta = dt.timedelta(days=365)
while (sdate <= edate):
df2 = df['RoR'] = (df['Close'] / df['Close'].shift(-365) - 1) * 100
sdate += delta
#print(sdate, end="\n")
return df2

To calculate annual and quarterly rates in a generic way as well, I came up with a function that takes as arguments the start date, end date, and a pattern that distinguishes between years and quarters as the type of frequency. For the data frames extracted by start and end date, we use pd.Grouper() to extract the target data rows. For the result of that extraction, we will incorporate your formula in the next line. Also, when determining the rate from the start date, we need to go back further in time, so we subtract '366 days' or '90 days' for the frequency keyword. I have not verified that this value leads to the correct result in all cases. This is due to market holidays such as the year-end and New Year holidays. Setting a larger number of days may solve this problem.
import pandas as pd
import yfinance as yf
df = yf.download("^GSPC", start="2016-01-01", end="2022-01-01")
df.index = pd.to_datetime(df.index)
df.index = df.index.tz_localize(None)
def rating(data, startdate, enddate, freq):
offset = '366 days' if freq == 'Y' else '90 days'
#dff = df.loc[(df.index >= startdate) & (df.index <= enddate)]
dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
ratio = (dfy['Close'] / dfy['Close'].shift()-1)*100
return ratio
period_rating = rating(df, '2017-01-01', '2019-12-31', freq='Y')
print(period_rating)
Date
2016-12-30 NaN
2017-12-29 19.419966
2018-12-31 -6.237260
2019-12-31 28.878070
Name: Close, dtype: float64
period_rating = rating(df, '2017-01-01', '2019-12-31', freq='Q')
print(period_rating)
Date
2016-12-30 NaN
2017-03-31 5.533689
2017-06-30 2.568647
2017-09-29 3.959305
2017-12-29 6.122586
2018-03-29 -1.224561
2018-06-29 2.934639
2018-09-28 7.195851
2018-12-31 -13.971609
2019-03-29 13.066190
2019-06-28 3.787754
2019-09-30 1.189083
2019-12-31 8.534170
Name: Close, dtype: float64

If your df has a DatetimeIndex, then you can use the .loc accessor with the date formatted as a string to retrieve the necessary values. For example, df.loc['2022-12-31'].Close should return the Close value on 2022-12-31.
In terms of efficiency, although you could use a shift operation, there isn't really a need to allocate more memory in a dataframe – you can use a loop instead:
annual_returns = []
end_dates = []
for year in range(1991,2022):
end_date = f"{year}-12-31"
start_date = f"{year-1}-12-31"
end_dates.append(end_date)
end_price, start_price = df.loc[end_date].Close, df.loc[start_date].Close
annual_returns.append((end_price / start_price - 1)*100)
Then you can build your final dataframe from your lists:
df_final = pd.DataFrame(
data=annual_returns,
index=pd.DatetimeIndex(end_dates, name='Date'),
columns=['Annual Return (%)']
)
Using some sample data from yfinance, I get the following:
>>> df_final
Annual Return (%)
Date
2008-12-31 -55.508475
2009-12-31 101.521206
2010-12-31 -4.195294
2013-12-31 58.431109
2014-12-31 -5.965609
2015-12-31 44.559938
2019-12-31 29.104585
2020-12-31 31.028712
2021-12-31 65.170561

python dataframe datetime condition

I am trying to create a new dataframe from an existing one by conditioning holiday datetime. train dataframe is existing and I want to create train_holiday from it by taking day and month values of holiday dataframe, my purpose is similar below:
date values
2015-02-01 10
2015-02-02 20
2015-02-03 30
2015-02-04 40
2015-02-05 50
2015-02-06 60
date
2012-02-02
2012-02-05
now first one is existing, and second dataframe shows holidays. I want to create a new dataframe from first one that only contains 2015 holidays similar below:
date values
2015-02-02 20
2015-02-05 50
I tried
train_holiday = train.loc[train["date"].dt.day== holidays["date"].dt.day]
but it gives error. could you please help me about this?

In your problem you care only the month and the day components, and one way to extract that is by dt.strftime() (ref). Applying that extraction on both date columns and use .isin() to keep month-day in df1 that matches that in df2.
df1[
df1['date'].dt.strftime('%m%d').isin(
df2['date'].dt.strftime('%m%d')
)
]
Make sure both date columns are in date-time format so that .dt can work. For example,
df1['date'] = pd.to_datetime(df1['date'])

How To Sum all the values of a column for a date instance in pandas

I am working on time-series data, where I have two columns date and quantity. The date is day wise. I want to add all the quantity for a month and convert it into a single date.
date is my index column
Example
quantity
date
2018-01-03 30
2018-01-05 45
2018-01-19 30
2018-02-09 10
2018-02-19 20
Output :
quantity
date
2018-01-01 105
2018-02-01 30
Thanks in advance!!

You can downsample to combine the data for each month and sum it by chaining the sum method.
df.resample("M").sum()
Check out the pandas user guide on resampling here.
You'll need to make sure your index is in datetime format for this to work. So first do: df.index = pd.to_datetime(df.index). Hat tip to sammywemmy for the same advice in the comments.

You an also use groupby to get results.
df.index = pd.to_datetime(df.index)
df.groupby(df.index.strftime('%Y-%m-01')).sum()

How to select a subset of pandas DateTimeIndex whose data are in a list?

Lets say I have a idx=pd.DatatimeIndex with one minute frequency. I also have a list of bad dates (each are of type pd.Timestamp without the time information) that I want to remove from the original idx. How do I do that in pandas?

Use normalize to remove the time part from your index so you can do a simple ~ + isin selection, i.e. find the dates not in that bad list. You can further ensure your list of dates don't have a time part with the same [x.normalize() for x in bad_dates] if you need to be extra safe.
Sample Data
import pandas as pd
df = pd.DataFrame(range(9), index=pd.date_range('2010-01-01', freq='11H', periods=9))
bad_dates = [pd.Timestamp('2010-01-02'), pd.Timestamp('2010-01-03')]
Code
df[~df.index.normalize().isin(bad_dates)]
# 0
#2010-01-01 00:00:00 0
#2010-01-01 11:00:00 1
#2010-01-01 22:00:00 2
#2010-01-04 05:00:00 7
#2010-01-04 16:00:00 8

Concatenate two dataframe columns as one timestamp

I'm working on a pandas dataframe, one of my column is a date (YYYYMMDD), another one is an hour (HH:MM), I would like to concatenate the two column as one timestamp or datetime64 column, to later use that column as an index (for a time series). Here is the situation :
Do you have any ideas? The classic pandas.to_datetime() seems to work only if the columns contain hours only, day only and year only, ... etc...

Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x.date + x.hour, '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

pandas get business days data from datetime index - python

I'd simply say a business day is such that adding and subtracting one business day returns to the same day. df['is_biz'] = ((df.DateTime + us_bd) - us_bd ) == df.DateTime

Related

Calculating rate of return for multiple time frames (annualized, quarterly) with daily time series data (S&P 500 (SPX index) daily prices)

python dataframe datetime condition

How To Sum all the values of a column for a date instance in pandas

How to select a subset of pandas DateTimeIndex whose data are in a list?

Concatenate two dataframe columns as one timestamp

Categories

Resources