I want to create a DataFrame of the last 10 business days. It should also check whether each day is a public holiday.
I have a list of public holidays:
Holiday
2021-01-26
2021-03-11
2021-03-29
2021-04-02
2021-04-14
2021-04-21
2021-05-13
2021-07-21
2021-08-19
2021-09-10
2021-10-15
2021-11-04
2021-11-05
2021-11-19
Weekends are Saturday and Sunday.
So if I run the code today, which is Saturday 27 February 2021, the output should be like this:
Business days
2021-02-15
2021-02-16
2021-02-17
2021-02-18
2021-02-19
2021-02-22
2021-02-23
2021-02-24
2021-02-25
2021-02-26
Alternative to @pi_pascal's answer:
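import pandas as pd
hols = ["2021-01-26", "2021-03-11", "2021-03-29", "2021-04-02",
"2021-04-14", "2021-04-21", "2021-05-13", "2021-07-21",
"2021-08-19", "2021-09-10", "2021-10-15", "2021-11-04",
"2021-11-05", "2021-11-19"]
hols = pd.to_datetime(hols)
# 60 calendar days ending today; inclusive="left" leaves out today itself
# (pandas < 1.4 spelled this closed="left")
bdays = pd.bdate_range(end=pd.Timestamp.today(), periods=60, freq="1D", inclusive="left")
# keep Monday-Friday, drop the holidays, then take the last 10
bdays = bdays[bdays.weekday < 5].difference(hols)[-10:]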
>>> bdays
DatetimeIndex(['2021-02-15', '2021-02-16', '2021-02-17', '2021-02-18',
'2021-02-19', '2021-02-22', '2021-02-23', '2021-02-24',
'2021-02-25', '2021-02-26'],
dtype='datetime64[ns]', freq=None)
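pandas can also skip the holidays directly: with the custom business-day frequency "C", bdate_range accepts a holidays list, so no calendar-day range or filtering is needed. A minimal sketch, assuming "the last 10 business days" means the ten business days strictly before the run date; last_n_business_days is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

HOLIDAYS = ["2021-01-26", "2021-03-11", "2021-03-29", "2021-04-02",
            "2021-04-14", "2021-04-21", "2021-05-13", "2021-07-21",
            "2021-08-19", "2021-09-10", "2021-10-15", "2021-11-04",
            "2021-11-05", "2021-11-19"]


def last_n_business_days(run_date, holidays, n=10):
    """Hypothetical helper: the n business days strictly before run_date."""
    # Start from the day before run_date; if that is a weekend or a holiday,
    # the custom business-day offset rolls it back to the previous valid day.
    end = pd.Timestamp(run_date).normalize() - pd.Timedelta(days=1)
    # freq="C" is CustomBusinessDay: Monday-Friday by default, minus `holidays`
    return pd.bdate_range(end=end, periods=n, freq="C", holidays=holidays)


print(last_n_business_days("2021-02-27", HOLIDAYS))
```

In practice you would pass pd.Timestamp.today() as run_date; the fixed date here just reproduces the example in the question.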
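I did not test this code, but it should work:
import datetime
today = datetime.date.today()
business_days = []
# your list of holidays, as datetime.date objects so they compare equal to the loop dates
holidays = {datetime.date(2021, 1, 26), datetime.date(2021, 3, 11)}  # add the rest of your holidays
i = 1  # start at 1 so that today itself is excluded
while len(business_days) < 10:
    temp_date = today - datetime.timedelta(days=i)
    # weekday() returns 0-4 for Monday-Friday
    if temp_date.weekday() < 5 and temp_date not in holidays:
        business_days.append(temp_date)
    i += 1
business_days.reverse()  # oldest first, as in the expected output
print(business_days)
Note: You need to format the dates (for example with strftime) if you want them displayed in a specific format.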
You can use the pandas built-in function date_range.
In your case, it would be:
import pandas as pd
today = '2021-02-27'
# the 14 calendar days ending today, keeping only Monday-Friday (dayofweek 0-4)
# note: this does not exclude the public holidays
businessDays = pd.date_range(end=today, periods=14, freq='D').to_series()
businessDays = businessDays[businessDays.dt.dayofweek < 5]
print(businessDays)
The output looks like this:
2021-02-15
2021-02-16
2021-02-17
2021-02-18
2021-02-19
2021-02-22
2021-02-23
2021-02-24
2021-02-25
2021-02-26
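NumPy's business-day functions also take a holiday list, which avoids generating and filtering a calendar range. A minimal sketch (previous_business_days is a hypothetical name): it rolls the run date forward to a business day, then steps back 1 to n business days, so the result is the n business days strictly before the run date, and it works even if the run date is a weekend or a holiday.

```python
import numpy as np

HOLIDAYS = np.array(["2021-01-26", "2021-03-11", "2021-03-29", "2021-04-02",
                     "2021-04-14", "2021-04-21", "2021-05-13", "2021-07-21",
                     "2021-08-19", "2021-09-10", "2021-10-15", "2021-11-04",
                     "2021-11-05", "2021-11-19"], dtype="datetime64[D]")


def previous_business_days(run_date, holidays, n=10):
    """Hypothetical helper: the n business days before run_date, oldest first."""
    # offsets -n, ..., -1, counted in business days (Mon-Fri by default)
    offsets = np.arange(-n, 0)
    # roll="forward" first moves a weekend/holiday run_date to the next business
    # day, so an offset of -1 lands on the last business day before run_date
    return np.busday_offset(np.datetime64(run_date, "D"), offsets,
                            roll="forward", holidays=holidays)


print(previous_business_days("2021-02-27", HOLIDAYS))
```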
Related
I am trying to output the days on my calendar, something like: 2021-02-02 2021-02-03 2021-02-04 2021-02-05 etc.
I copied this code from https://www.tutorialbrain.com/python-calendar/, so I don't understand why I get an error.
import calendar
year = 2021
month = 2
cal_obj = calendar.Calendar(firstweekday=1)
dates = cal_obj.itermonthdays(year, month)
for i in dates:
    i = str(i)
    if i[6] == "2":
        print(i, end="")
Error:
if i[6] == "2":
IndexError: string index out of range
Process finished with exit code 1
There is a difference between your code and their code. It's very subtle, but it's there:
Yours:
dates = cal_obj.itermonthdays(year, month)
^^^^ days
Theirs:
dates = cal_obj.itermonthdates(year, month)
^^^^^ dates
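itermonthdays returns the days of the month as ints (with 0 for the padding days that belong to the neighbouring months), while itermonthdates returns datetime.date objects. str() of a date looks like 2021-02-05, so i[6] is the last digit of the month; str() of an int is at most two characters long, hence the IndexError.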
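A minimal corrected sketch of the question's loop: it keeps the question's year, month and firstweekday=1, filters on d.month instead of i[6] (the string check would also match December, whose month digits are "12"), and separates the printed dates with spaces.

```python
import calendar

year, month = 2021, 2
cal_obj = calendar.Calendar(firstweekday=1)

# itermonthdays yields plain ints, with 0 for padding days from the neighbouring months
days = list(cal_obj.itermonthdays(year, month))
print(days[:7])  # [0, 0, 0, 0, 0, 0, 1]

# itermonthdates yields datetime.date objects; its padding days are real dates
# from the neighbouring months, so keep only the dates in the requested month
dates = [d for d in cal_obj.itermonthdates(year, month) if d.month == month]
print(" ".join(str(d) for d in dates))
```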
If your goal is to create a list of the dates in the calendar, you can also use the following:
import pandas as pd
datelist = list(pd.date_range(start="2021/01/01", end="2021/12/31").strftime("%Y-%m-%d"))
datelist
You can choose any valid start and end dates.
Output :
['2021-01-01',
'2021-01-02',
'2021-01-03',
'2021-01-04',
'2021-01-05',
'2021-01-06',
'2021-01-07',
'2021-01-08',
'2021-01-09',
'2021-01-10',
'2021-01-11',
'2021-01-12',
...
'2021-12-28',
'2021-12-29',
'2021-12-30',
'2021-12-31']
It seems you are new to Python: i[6] indexes the seventh element of a list or list-like object such as a string.
The same result can be achieved with the datetime module in the following way:
import datetime
start_date = datetime.date(2021, 2, 1)  # set the start date in the form (year, month, day)
no_of_days = 30  # number of days you want to print
day_jump = datetime.timedelta(days=1)  # number of days to jump with each iteration, default 1
for i in range(no_of_days):
    print(start_date + i * day_jump)
OUTPUT
2021-02-01
2021-02-02
2021-02-03
2021-02-04
2021-02-05
2021-02-06
2021-02-07
2021-02-08
2021-02-09
2021-02-10
2021-02-11
2021-02-12
2021-02-13
2021-02-14
2021-02-15
2021-02-16
2021-02-17
2021-02-18
2021-02-19
2021-02-20
2021-02-21
2021-02-22
2021-02-23
2021-02-24
2021-02-25
2021-02-26
2021-02-27
2021-02-28
2021-03-01
2021-03-02
I have a big data frame (the fragment is below):
start_date finish_date
2842 2019-02-16 19:35:55.125766+00:00 2019-06-23 08:10:42.867492+00:00
2844 2019-05-29 18:03:54.230822+00:00 2019-06-05 08:06:37.896891+00:00
2846 2019-03-26 10:29:14.626280+00:00 2019-03-28 03:00:12.350836+00:00
2847 2019-04-22 16:29:30.480639+00:00 2019-04-24 18:02:09.869749+00:00
2852 2019-06-28 11:32:32.104132+00:00 2019-07-07 20:15:47.000026+00:00
2853 2019-03-21 17:20:50.030024+00:00 2019-03-27 03:18:26.652882+00:00
2854 2019-07-12 13:46:24.119986+00:00 2019-09-16 14:36:16.995393+00:00
start_date and finish_date are of dtype datetime64.
I need to create a new column containing the number of months between start_date and finish_date.
For a single row I used
len(pd.date_range(start=df.loc[2844, 'start_date'], end=df.loc[2844, 'finish_date'], freq='M'))
But I don't know how to apply this to every row, row by row.
I guess some lambda must be used...
This:
df['length'] = pd.date_range(start=df['start_date'], end=df['finish_date'], freq='M')
raises an error...
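One way to apply the single-row calculation to every row is DataFrame.apply with axis=1. A minimal sketch on three rows of the sample data; month_ends_between is a hypothetical helper, and it passes pd.offsets.MonthEnd() instead of the string 'M', which newer pandas versions deprecate in favour of 'ME'. Row-wise apply is slow on a big frame, so a vectorized column calculation is preferable there.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "start_date": pd.to_datetime(["2019-02-16 19:35:55.125766+00:00",
                                      "2019-05-29 18:03:54.230822+00:00",
                                      "2019-03-26 10:29:14.626280+00:00"]),
        "finish_date": pd.to_datetime(["2019-06-23 08:10:42.867492+00:00",
                                       "2019-06-05 08:06:37.896891+00:00",
                                       "2019-03-28 03:00:12.350836+00:00"]),
    },
    index=[2842, 2844, 2846],
)


def month_ends_between(row):
    """Hypothetical helper: number of month ends between the row's two dates."""
    return len(pd.date_range(start=row["start_date"], end=row["finish_date"],
                             freq=pd.offsets.MonthEnd()))


# axis=1 passes each row (as a Series) to the function
df["length"] = df.apply(month_ends_between, axis=1)
print(df)
```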
expected result:
start_date finish_date length
2842 2019-02-16 19:35:55.125766+00:00 2019-06-23 08:10:42.867492+00:00 4
2844 2019-05-29 18:03:54.230822+00:00 2019-06-05 08:06:37.896891+00:00 1
2846 2019-03-26 10:29:14.626280+00:00 2019-03-28 03:00:12.350836+00:00 0
2847 2019-04-22 16:29:30.480639+00:00 2019-04-24 18:02:09.869749+00:00 0
2852 2019-06-28 11:32:32.104132+00:00 2019-07-07 20:15:47.000026+00:00 1
2853 2019-03-21 17:20:50.030024+00:00 2019-03-27 03:18:26.652882+00:00 0
2854 2019-07-12 13:46:24.119986+00:00 2019-09-16 14:36:16.995393+00:00 2
Finding the difference between dates in months can cause rounding issues, so I have given both results for your understanding:
import pandas as pd
data = {
"start_date" :["2019-02-16 19:35:55.125766+00:00", "2019-05-29 18:03:54.230822+00:00", "2019-03-26 10:29:14.626280+00:00", "2019-04-22 16:29:30.480639+00:00", "2019-06-28 11:32:32.104132+00:00", "2019-03-21 17:20:50.030024+00:00", "2019-07-12 13:46:24.119986+00:00"],
"finish_date" : ["2019-06-23 08:10:42.867492+00:00", "2019-06-05 08:06:37.896891+00:00", "2019-03-28 03:00:12.350836+00:00", "2019-04-24 18:02:09.869749+00:00", "2019-07-07 20:15:47.000026+00:00", "2019-03-27 03:18:26.652882+00:00", "2019-09-16 14:36:16.995393+00:00"]
}
df = pd.DataFrame(data)
# you can skip this if already in datetime format
df['start_date'] = pd.to_datetime(df['start_date'])
df['finish_date'] = pd.to_datetime(df['finish_date'])
df["months"] = df.finish_date.dt.to_period('M').astype(int) - df.start_date.dt.to_period('M').astype(int)
df["months_no_rounding"] = df.finish_date.dt.to_period('M') - df.start_date.dt.to_period('M')
print(df)
Result:
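Since both columns are of dtype datetime, you can calculate the difference in months from the Series.dt.year and Series.dt.month attributes (the year term makes differences that cross a year boundary come out right):
df['length'] = ((df['finish_date'].dt.year - df['start_date'].dt.year) * 12 + (df['finish_date'].dt.month - df['start_date'].dt.month)).abs()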
Be careful as your difference has issues with rounding! For instance, the difference between 2019-05-29 and 2019-06-05 is only 6 days, but you calculate it as 1 month. This might be wanted… or not!
Here is an alternative that approximates a real difference in months (an exact difference in months is not possible, as month lengths vary from 28 to 31 days):
df['diff'] = (df['finish_date']-df['start_date']).dt.days//30
output:
start_date finish_date diff
2842 2019-02-16 19:35:55.125766 2019-06-23 08:10:42.867492 4
2844 2019-05-29 18:03:54.230822 2019-06-05 08:06:37.896891 0
2846 2019-03-26 10:29:14.626280 2019-03-28 03:00:12.350836 0
2847 2019-04-22 16:29:30.480639 2019-04-24 18:02:09.869749 0
2852 2019-06-28 11:32:32.104132 2019-07-07 20:15:47.000026 0
2853 2019-03-21 17:20:50.030024 2019-03-27 03:18:26.652882 0
2854 2019-07-12 13:46:24.119986 2019-09-16 14:36:16.995393 2
and as float: df['diff'] = (df['finish_date']-df['start_date']).dt.days/30
start_date finish_date diff
2842 2019-02-16 19:35:55.125766 2019-06-23 08:10:42.867492 4.200000
2844 2019-05-29 18:03:54.230822 2019-06-05 08:06:37.896891 0.200000
2846 2019-03-26 10:29:14.626280 2019-03-28 03:00:12.350836 0.033333
2847 2019-04-22 16:29:30.480639 2019-04-24 18:02:09.869749 0.066667
2852 2019-06-28 11:32:32.104132 2019-07-07 20:15:47.000026 0.300000
2853 2019-03-21 17:20:50.030024 2019-03-27 03:18:26.652882 0.166667
2854 2019-07-12 13:46:24.119986 2019-09-16 14:36:16.995393 2.200000
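If you want the number of complete calendar months elapsed rather than 30-day blocks, dateutil's relativedelta (dateutil is already a pandas dependency) accounts for the varying month lengths. A sketch; full_months is a hypothetical helper name:

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta


def full_months(start, finish):
    """Hypothetical helper: whole calendar months from start to finish."""
    delta = relativedelta(finish, start)  # finish - start, split into years/months/days
    return delta.years * 12 + delta.months


print(full_months(datetime(2019, 2, 16, 19, 35), datetime(2019, 6, 23, 8, 10)))  # 4
```

Since pandas Timestamps are datetime subclasses, it can be applied row-wise, e.g. df.apply(lambda r: full_months(r['start_date'], r['finish_date']), axis=1).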
I have the following df:
time_series date sales
store_0090_item_85261507 1/2020 1,0
store_0090_item_85261501 2/2020 0,0
store_0090_item_85261500 3/2020 6,0
where 'date' is Week/Year.
So I tried the following code:
df['date'] = df['date'].apply(lambda x: datetime.strptime(x + '/0', "%U/%Y/%w"))
But it returns this df:
time_series date sales
store_0090_item_85261507 2020-01-05 1,0
store_0090_item_85261501 2020-01-12 0,0
store_0090_item_85261500 2020-01-19 6,0
However, the first day of the first week of 2020 is 2019-12-29, considering Sunday as the first day of the week. How can I get 2019-12-29 as the first day of the first week of 2020 instead of 2020-01-05?
From the datetime module's documentation:
%U: Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
Edit: My original answer doesn't work for the input 1/2023, and using ISO 8601 week values doesn't work for 1/2021, so I've edited this answer to use a custom function.
Here is a way with a custom function
import pandas as pd
from datetime import datetime, timedelta
##############################################
# to demonstrate issues with certain dates
print(datetime.strptime('0/2020/0', "%U/%Y/%w")) # 2019-12-29 00:00:00
print(datetime.strptime('1/2020/0', "%U/%Y/%w")) # 2020-01-05 00:00:00
print(datetime.strptime('0/2021/0', "%U/%Y/%w")) # 2020-12-27 00:00:00
print(datetime.strptime('1/2021/0', "%U/%Y/%w")) # 2021-01-03 00:00:00
print(datetime.strptime('0/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
print(datetime.strptime('1/2023/0', "%U/%Y/%w")) # 2023-01-01 00:00:00
#################################################
df = pd.DataFrame({'date':["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
print(df)
def get_first_day(date):
    year = date.split('/')[1]
    # Sundays of week 0 and week 1 of the year (%U weeks start on the first Sunday)
    date0 = datetime.strptime('0/' + year + '/0', "%U/%Y/%w")
    date1 = datetime.strptime('1/' + year + '/0', "%U/%Y/%w")
    date = datetime.strptime(date + '/0', "%U/%Y/%w")
    # if 1 January is a Sunday, weeks 0 and 1 coincide; otherwise step back one week
    return date if date0 == date1 else date - timedelta(weeks=1)
df['new_date'] = df['date'].apply(get_first_day)
print(df)
Input
date
0 1/2020
1 2/2020
2 3/2020
3 1/2021
4 2/2021
5 1/2023
6 2/2023
Output
date new_date
0 1/2020 2019-12-29
1 2/2020 2020-01-05
2 3/2020 2020-01-12
3 1/2021 2020-12-27
4 2/2021 2021-01-03
5 1/2023 2023-01-01
6 2/2023 2023-01-08
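The same rule can also be written without strptime: week 1 starts on the Sunday on or before 1 January, and week N starts N - 1 weeks later. A vectorized sketch, assuming the column always holds 'week/year' strings and that this is the week convention you want (it reproduces the output above); sunday_week_start is a hypothetical name.

```python
import pandas as pd


def sunday_week_start(s):
    """Hypothetical helper: first Sunday of each 'week/year' string in s."""
    parts = s.str.split("/", expand=True).astype(int)
    week, year = parts[0], parts[1]
    jan1 = pd.to_datetime(year.astype(str) + "-01-01")
    # dayofweek is Monday=0 ... Sunday=6, so this is the number of days
    # from the preceding (or same) Sunday to 1 January
    back = (jan1.dt.dayofweek + 1) % 7
    week1_start = jan1 - pd.to_timedelta(back, unit="D")
    return week1_start + pd.to_timedelta((week - 1) * 7, unit="D")


df = pd.DataFrame({'date': ["1/2020", "2/2020", "3/2020", "1/2021", "2/2021", "1/2023", "2/2023"]})
df['new_date'] = sunday_week_start(df['date'])
print(df)
```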
You'll want to use the ISO week parsing directives, e.g.:
import pandas as pd
date = pd.Series(["1/2020", "2/2020", "3/2020"])
pd.to_datetime(date+"/1", format="%V/%G/%u")
0 2019-12-30
1 2020-01-06
2 2020-01-13
dtype: datetime64[ns]
You can also shift by one day if the week should start on Sunday:
pd.to_datetime(date+"/1", format="%V/%G/%u") - pd.Timedelta('1d')
0 2019-12-29
1 2020-01-05
2 2020-01-12
dtype: datetime64[ns]
My company uses a 4-4-5 calendar for reporting purposes. Each month (aka period) is 4-weeks long, except every 3rd month is 5-weeks long.
Pandas seems to have good support for custom calendar periods. However, I'm having trouble figuring out the correct frequency string or custom business month offset to achieve months for a 4-4-5 calendar.
For example:
import numpy as np
import pandas as pd
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(
index=df_index, columns=["a"], data=np.random.randint(0, 100, size=len(df_index))
)
df.groupby(pd.Grouper(level=0, freq="4W-SUN")).mean()
Grouping by 4-weeks starting on Sunday results in the following. The first three month start dates are correct but I need every third month to be 5-weeks long. The 4th month start date should be 2020-06-28.
a
date
2020-03-29 16.000000
2020-04-26 50.250000
2020-05-24 39.071429
2020-06-21 52.464286
2020-07-19 41.535714
2020-08-16 46.178571
2020-09-13 51.857143
2020-10-11 44.250000
2020-11-08 47.714286
2020-12-06 56.892857
2021-01-03 55.821429
2021-01-31 53.464286
2021-02-28 53.607143
2021-03-28 45.037037
Essentially what I'd like to achieve is something like this:
a
date
2020-03-29 20.000000
2020-04-26 50.750000
2020-05-24 49.750000
2020-06-28 49.964286
2020-07-26 52.214286
2020-08-23 47.714286
2020-09-27 46.250000
2020-10-25 53.357143
2020-11-22 52.035714
2020-12-27 39.750000
2021-01-24 43.428571
2021-02-21 49.392857
pandas currently supports only yearly and quarterly 52-53-week fiscal calendars (the basis of a 4-4-5 calendar).
See pandas.tseries.offsets.FY5253 and pandas.tseries.offsets.FY5253Quarter.
import numpy as np
import pandas as pd
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(index=df_index)
df['a'] = np.random.randint(0, 100, df.shape[0])
So you do need some more work to get down to week level and maintain a 4-4-5 calendar. You could align to quarters using the native pandas offset and fill in the 4-4-5 week pattern manually.
def date_range(start, end, offset_array, name=None):
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    index = []
    start -= offset_array[0]
    while start < end:
        for x in offset_array:
            start += x
            if start > end:
                break
            index.append(start)
    return pd.Series(index, name=name)
This function takes a list of offsets rather than a regular frequency, so it moves from date to date following the offsets in the given list:
offset_445 = [
pd.tseries.offsets.FY5253Quarter(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
]
df_index_445 = date_range("2020-03-29", "2021-03-27", offset_445, name='date')
Out:
0 2020-05-03
1 2020-05-31
2 2020-06-28
3 2020-08-02
4 2020-08-30
5 2020-09-27
6 2020-11-01
7 2020-11-29
8 2020-12-27
9 2021-01-31
10 2021-02-28
Name: date, dtype: datetime64[ns]
Once the index is created, it's back to aggregation logic to get the data into the right row buckets. Assuming you want the mean for each 4- or 5-week period, labelled by its start date in the df_index_445 you generated, it could look like this:
# calculate the mean on reindex groups
reindex = df_index_445.searchsorted(df.index, side='right') - 1
res = df.groupby(reindex).mean()
# filter valid output
res = res[res.index>=0]
res.index = df_index_445
Out:
a
2020-05-03 47.857143
2020-05-31 53.071429
2020-06-28 49.257143
2020-08-02 40.142857
2020-08-30 47.250000
2020-09-27 52.485714
2020-11-01 48.285714
2020-11-29 56.178571
2020-12-27 51.428571
2021-01-31 50.464286
2021-02-28 53.642857
Note that since the frequency is not regular, pandas will set the datetime index frequency to None.
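If your periods simply repeat 4, 4 and 5 weeks from a known fiscal-year start, you can also build the period starts directly and label each day with the start of its period. A sketch under that assumption (period_starts_445 is a hypothetical helper, and the random data is seeded only for reproducibility); it produces the start dates you listed, from 2020-03-29 to 2021-02-21:

```python
import numpy as np
import pandas as pd


def period_starts_445(first_start, n_periods, pattern=(4, 4, 5)):
    """Hypothetical helper: start dates of consecutive 4-4-5 periods."""
    starts = [pd.Timestamp(first_start)]
    for i in range(n_periods - 1):
        starts.append(starts[-1] + pd.Timedelta(weeks=pattern[i % len(pattern)]))
    return pd.DatetimeIndex(starts, name="date")


df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
rng = np.random.default_rng(0)
df = pd.DataFrame(index=df_index, data={"a": rng.integers(0, 100, size=len(df_index))})

starts = period_starts_445("2020-03-29", 12)
# for each day, the start of the last period that begins on or before it
labels = starts[starts.searchsorted(df.index, side="right") - 1]
res = df.groupby(labels).mean()
print(res)
```

Unlike the FY5253Quarter approach, this does not handle the occasional 53-week year by itself; you would have to adjust the pattern for such years.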
I have an odd Series of date/time strings that I want to convert to datetime so that I can do some manipulation:
allSubs['Subscribed']
0 12th December, 08:08
1 11th December, 14:57
2 10th December, 21:40
3 7th December, 21:39
4 5th December, 14:51
5 30th November, 15:36
When I call pd.to_datetime(allSubs['Subscribed']) on it, I get the error 'Out of bounds nanosecond timestamp: 1-12-12 08:08:00'. I tried the parameter errors='coerce', but this just returns a Series of NaT. I want to convert the Series into pandas datetimes with format YYYY-MM-DD.
I've looked into using datetime.strptime but couldn't find an efficient way to run this against a series.
Any help much appreciated!
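Use dateutil's parser (the missing year is taken from the current date, which was 2018 at the time of writing):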
from dateutil import parser
allSubs['Subscribed'] = allSubs['Subscribed'].apply(parser.parse)
print (allSubs)
Subscribed
0 2018-12-12 08:08:00
1 2018-12-11 14:57:00
2 2018-12-10 21:40:00
3 2018-12-07 21:39:00
4 2018-12-05 14:51:00
5 2018-11-30 15:36:00
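To pin the year instead of depending on when the code runs, dateutil's parse accepts a default datetime whose fields fill in whatever the string lacks. A sketch; the year 2018 and the two sample strings are taken from the output above:

```python
from datetime import datetime

import pandas as pd
from dateutil import parser

subscribed = pd.Series(["12th December, 08:08", "30th November, 15:36"])
# fields missing from the string (here, the year) are taken from `default`
default = datetime(2018, 1, 1)
parsed = pd.to_datetime(subscribed.apply(lambda s: parser.parse(s, default=default)))
print(parsed)
```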
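Alternatively, use a regex replace to remove the ordinal suffix and insert the year (the year must be specified, because the strings don't contain one), then call to_datetime with a custom format (see http://strftime.org/):
s = allSubs['Subscribed'].str.replace(r'(\d)(st|nd|rd|th)', r'\1 2018', regex=True)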
allSubs['Subscribed'] = pd.to_datetime(s, format='%d %Y %B, %H:%M')
print (allSubs)
Subscribed
0 2018-12-12 08:08:00
1 2018-12-11 14:57:00
2 2018-12-10 21:40:00
3 2018-12-07 21:39:00
4 2018-12-05 14:51:00
5 2018-11-30 15:36:00