Creating a deadline based on criteria - python

Given the datetime data below, I would like to set a "deadline" relative to the 'Date' column. In essence, the deadline should be 2 business days after the 'Date', subject to the following criteria:
If a 'Date' falls on a holiday or a weekend, the deadline should be the second following business day at 17:00.
If a 'Date' is on Monday-Friday, not a holiday, and its hour is between 0 and 8, the deadline should be the next non-holiday business day at 17:00.
If a 'Date' is on Monday-Friday, not a holiday, and its hour is between 9 and 17, the deadline should be the second following non-holiday business day at the same time of day.
If a 'Date' is on Monday-Friday, not a holiday, and its hour is between 18 and 23, the deadline should be the second following non-holiday business day at 17:00.
Below is the data:
from datetime import date
import pandas as pd
Holidays = [date(2018, 1, 1), date(2018, 1, 15), date(2018, 2, 19), date(2018, 3, 9)]
df = pd.DataFrame({'Date': ['2018-01-01 18:47', '2018-01-08 06:11', '2018-01-12 10:05', '2018-02-10 09:22', '2018-02-20 14:14', '2018-03-08 16:17', '2018-03-25 17:35'],
                   'Weekday': [0, 0, 4, 5, 1, 3, 6],
                   'Hour': [18, 6, 10, 9, 14, 16, 17]})
df['Date'] = pd.to_datetime(df['Date'])
The result should be as follows:
df = pd.DataFrame({'Date': ['2018-01-01 18:47','2018-01-08 06:11','2018-01-12 10:05','2018-02-10 09:22','2018-02-21 14:14','2018-03-08 16:17','2018-03-25 17:35'],
'Deadline': ['2018-01-03 17:00','2018-01-09 17:00','2018-01-17 10:05','2018-02-13 17:00','2018-02-23 14:14','2018-03-13 16:17','2018-03-27 17:00']})
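The four rules can be sketched with NumPy's business-day helpers. This is one possible reading of the criteria rather than a definitive implementation: the function name compute_deadline is hypothetical, and the sketch assumes that a weekend or holiday 'Date' counts forward from the preceding business day (roll="backward"), so that "2 business days" lands on the second business day after it.

```python
from datetime import date

import numpy as np
import pandas as pd

# Holidays from the question, as NumPy day-resolution dates
holidays = np.array(
    [date(2018, 1, 1), date(2018, 1, 15), date(2018, 2, 19), date(2018, 3, 9)],
    dtype="datetime64[D]",
)


def compute_deadline(ts):
    """Hypothetical helper: apply the four rules to a single Timestamp."""
    day = np.datetime64(ts.date(), "D")
    time_of_day = ts - ts.normalize()
    five_pm = pd.Timedelta(hours=17)

    if not np.is_busday(day, holidays=holidays):
        n_days, offset = 2, five_pm        # weekend or holiday
    elif ts.hour <= 8:
        n_days, offset = 1, five_pm        # early morning on a business day
    elif ts.hour <= 17:
        n_days, offset = 2, time_of_day    # working hours: keep the time
    else:
        n_days, offset = 2, five_pm        # evening on a business day

    # roll="backward" (assumption): non-business days count from the
    # previous business day, so +2 lands on the second business day after.
    target = np.busday_offset(day, n_days, roll="backward", holidays=holidays)
    return pd.Timestamp(target) + offset


df = pd.DataFrame({"Date": pd.to_datetime([
    "2018-01-01 18:47", "2018-01-08 06:11", "2018-01-12 10:05",
    "2018-02-10 09:22", "2018-02-20 14:14", "2018-03-08 16:17",
    "2018-03-25 17:35",
])})
df["Deadline"] = df["Date"].apply(compute_deadline)
print(df)
```

Applying the function row by row keeps the rules readable; for large frames a vectorised version built on the same two NumPy calls would likely be faster.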

Related

How to convert date format (dd/mm/yyyy) to days in python csv

I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking leap years into account as well.
Example: 1st Jan 1995 - Day 1, 1st Feb 1995 - Day 32 .......and so on all the way to 31st.
If you want to filter a pandas dataframe to the rows between two dates, you can do this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
len(df) then gives the number of rows in the filtered dataframe.
If instead you want to calculate the number of days between two dates, you can do it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
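Note that datetime subtraction already takes leap years into account. The result counts the days elapsed, so add 1 if the start date itself should count as Day 1 (1st Feb 1995 then becomes Day 32).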
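The same subtraction can be applied to a whole column. The following is a minimal sketch, assuming the 'days' column holds dd/mm/yyyy strings as in the question title; the sample rows and the day_number column name are made up for illustration.

```python
import pandas as pd

# Sample data (assumption): dd/mm/yyyy strings in a 'days' column
df = pd.DataFrame({"days": ["01/01/1995", "01/02/1995", "29/02/1996", "31/12/2019"]})

# Parse the strings, subtract the start date, and add 1 so that
# 1st Jan 1995 is Day 1 rather than Day 0
dates = pd.to_datetime(df["days"], format="%d/%m/%Y")
df["day_number"] = (dates - pd.Timestamp("1995-01-01")).dt.days + 1
print(df)
```

Leap days such as 29 February 1996 are counted because the subtraction works on real calendar dates.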

Using date range in pandas for dekad

How can I use the pandas date range function with a frequency of one dekad? A dekad is a 10-day period, and each month has 3 dekads (1st - 10th, 11th - 20th, 21st - 30th).
pd.date_range(start_date, end_date, freq='D')
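I am not aware of a built-in dekad frequency alias, so one possible approach is to build the daily range and keep only the days on which a dekad starts. This is a sketch under that assumption; the start and end dates and the name dekad_starts are illustrative.

```python
import pandas as pd

start_date, end_date = "2021-01-01", "2021-02-28"  # example bounds (assumption)

# Daily range, then keep only the 1st, 11th and 21st of each month
days = pd.date_range(start_date, end_date, freq="D")
dekad_starts = days[days.day.isin([1, 11, 21])]
print(dekad_starts)
```

Filtering by day of month means the third dekad simply runs to the end of the month, whatever its length.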

Combine weekday with hours in Pandas

I have a data frame with a weekday column that contains weekday names and a time column that contains times on those days. How can I combine these 2 columns so that the result is also sortable?
I have tried combining them as strings, but the result does not sort correctly by weekday and hour.
This is what the sample table looks like:
weekday   time
Monday    12:00
Monday    13:00
Tuesday   20:00
Friday    10:00
This is what I want to get.
weekday_hours
Monday 12:00
Monday 13:00
Tuesday 20:00
Friday 10:00
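Assuming that df is your initial dataframe:
import json
import pandas as pd
# Convert the rows to a list of dicts, then join the two fields per row
datas = json.loads(df.to_json(orient="records"))
final_data = {"weekday_hours": []}
for data in datas:
    final_data["weekday_hours"].append(data['weekday'] + ' ' + data['time'])
final_df = pd.DataFrame(final_data)
final_df
Output: a dataframe with a single weekday_hours column of strings such as "Monday 12:00".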
You first need to create a datetime range covering 7 days at an hourly level to sort by. In the data warehousing world you would normally have calendar and time dimensions holding all the different representations of your dates, which you can merge and sort by; this is an adaptation of that methodology.
import pandas as pd
df1 = pd.DataFrame({'date' : pd.date_range('01 Jan 2021', '08 Jan 2021',freq='H')})
df1['str_date'] = df1['date'].dt.strftime('%A %H:%M')
print(df1.head(5))
                 date      str_date
0 2021-01-01 00:00:00  Friday 00:00
1 2021-01-01 01:00:00  Friday 01:00
2 2021-01-01 02:00:00  Friday 02:00
3 2021-01-01 03:00:00  Friday 03:00
4 2021-01-01 04:00:00  Friday 04:00
Then create your column to merge on.
df['str_date'] = df['weekday'] + ' ' + df['time']
df2 = pd.merge(df[['str_date']], df1, on=['str_date'], how='left')\
    .sort_values('date').drop(columns='date')
print(df2)
        str_date
3   Friday 10:00
0   Monday 12:00
1   Monday 13:00
2  Tuesday 20:00
Based on my understanding of the question, you want a single column, "weekday_hours," but you also want to be able to sort the data based on this column. This is a bit tricky because "Monday" doesn't provide enough information to define a valid datetime. Parsing with pd.to_datetime(df['weekday_hours'], format='%A %H:%M'), for example, will return 1900-01-01 <hour:minute:second> if given just a weekday and a time. When sorted, this only sorts by time.
One workaround is to use dateutil to parse the dates. In lieu of a date, it will return the next date corresponding to the day of the week. For example, today (9 April 2021) dateutil.parser.parse('Friday 10:00') returns datetime.datetime(2021, 4, 9, 10, 0) and dateutil.parser.parse('Monday 10:00') returns datetime.datetime(2021, 4, 12, 10, 0). Therefore, we need to set the "default" date to something corresponding to our "first" day of the week. Here is an example starting with unsorted dates:
import datetime
import dateutil.parser  # import the submodule explicitly
import pandas as pd
weekdays = ['Friday', 'Monday', 'Monday', 'Tuesday']
times = ['10:00', '13:00', '12:00', '20:00']
df = pd.DataFrame({'weekday': weekdays, 'time': times})
df2 = pd.DataFrame()
# Join weekday and time into one string per row
df2['weekday_hours'] = df[['weekday', 'time']].agg(' '.join, axis=1)
amonday = datetime.datetime(2021, 2, 1, 0, 0)  # assuming the week starts on Monday
# Sort key: parse each string relative to a known Monday
sorter = lambda t: [dateutil.parser.parse(ti, default=amonday) for ti in t]
print(df2.sort_values('weekday_hours', key=sorter))
Produces the output:
weekday_hours
2 Monday 12:00
1 Monday 13:00
3 Tuesday 20:00
0 Friday 10:00
Note that there are probably more computationally efficient ways if you are working with a lot of data, but this should illustrate the idea of a sortable weekday/time pair.
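One such alternative, not taken from the answers above, is to make the weekday an ordered categorical so that pandas sorts it in calendar order. A minimal sketch, assuming a Monday-first week and zero-padded HH:MM time strings (which sort correctly as text):

```python
import pandas as pd

# Assumed ordering: the week starts on Monday
week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

df = pd.DataFrame({
    "weekday": ["Friday", "Monday", "Monday", "Tuesday"],
    "time": ["10:00", "13:00", "12:00", "20:00"],
})

# An ordered categorical sorts by position in week_order, not alphabetically
df["weekday"] = pd.Categorical(df["weekday"], categories=week_order, ordered=True)
out = df.sort_values(["weekday", "time"]).reset_index(drop=True)

# Build the combined display column after sorting
out["weekday_hours"] = out["weekday"].astype(str) + " " + out["time"]
print(out["weekday_hours"])
```

The combined string is only for display here; the sort itself relies on the two underlying columns.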

counting months between two days in dataframe

I have a dataframe with multiple columns, one of which is a date column. I'm interested in creating a new column that contains the number of months between the date column and a preset date. For example, one of the dates in the 'start date' column is '2019-06-30 00:00:00'. I want to calculate the number of months between that date and the end of 2021 (2021-12-31), place the answer in a new column, and do this for the entire date column. I haven't been able to work out how to go about this. If the predetermined end date were 2021-12-31, the result should look like this:
df =
   start date  months
0  2019-06-30      30
1  2019-08-12      28
2  2020-01-24      23
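You can do this by dividing the time difference by the average month length (np.timedelta64(1, 'M') expresses the same idea, but recent pandas versions may reject that unit):
import pandas as pd
end_date = pd.to_datetime('2021-12-31')
df['start date'] = pd.to_datetime(df['start date'])
# 365.2425 / 12 days is the average Gregorian month; astype(int) truncates
df['month'] = ((end_date - df['start date']) / pd.Timedelta(days=365.2425 / 12)).astype(int)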
print(df)
start date month
0 2019-06-30 30
1 2019-08-12 28
2 2020-01-24 23
Assume that start date column is of datetime type (not string)
and the reference date is defined as follows:
refDate = pd.to_datetime('2021-12-31')
or any other date of your choice.
Then you can compute the number of months as:
df['months'] = (refDate.to_period('M') - df['start date']\
.dt.to_period('M')).apply(lambda x: x.n)
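The period-based calculation is equivalent to plain year and month arithmetic, which can be sketched as follows. The sample frame is taken from the question; like the period version, this counts calendar month boundaries crossed and ignores the day of the month.

```python
import pandas as pd

df = pd.DataFrame({
    "start date": pd.to_datetime(["2019-06-30", "2019-08-12", "2020-01-24"]),
})
end_date = pd.Timestamp("2021-12-31")

# Months between = 12 * (difference in years) + (difference in month numbers)
start = df["start date"]
df["months"] = (end_date.year - start.dt.year) * 12 + (end_date.month - start.dt.month)
print(df)
```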

Pandas DatetimeIndex and to_datetime discrepancies when calculate (format) the same date

I have a simple task: creating consecutive days and doing some calculations on them.
I did it using:
date = pd.DatetimeIndex(start='2019-01-01', end='2019-01-10',freq='D')
df = pd.DataFrame([date, date.week, date.dayofweek], index=['Date','Week', 'DOW']).T
df
and now I want to calculate back the date from week and day of week using:
df['Date2'] = pd.to_datetime('2019' + df['Week'].map(str) + df['DOW'].map(str), format='%Y%W%w')
The result I get is:
As I understand it, DatetimeIndex uses a different method of calculating the week number: 1 Jan 2019 should be Week=0 and DOW=2, and that is what I get when I run pd.to_datetime('201902', format='%Y%W%w'), which returns Timestamp('2019-01-01 00:00:00').
Similar questions were asked here and here, but in both of them the discrepancy came from different time zones, and I don't use time zones here.
Thanks for your help!
According to the documentation at https://github.com/d3/d3-time-format#api-reference,
%W is the Monday-based week number, whereas %w is the Sunday-based weekday.
I ran the code below to get back the expected result:
date = pd.DatetimeIndex(start='2019-01-01', end='2019-01-10',freq='D')
df = pd.DataFrame([date, date.week, date.weekday_name, date.dayofweek], index=['Date','Week', 'Weekday', 'DOW']).T
df['Week'] = df['Week'] - 1
df['Date2'] = pd.to_datetime('2019' + df['Week'].map(str) + df['Weekday'].map(str), format='%Y%W%A', box=True)
Notice that 2018-12-31 is in the first week of year 2019
                  Date Week Weekday DOW       Date2
0  2018-12-31 00:00:00    0  Monday   0  2018-12-31
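The mismatch between the two numbering schemes can be reproduced with the standard library alone. This sketch assumes that the Week values in the question are ISO week numbers, which matches the figures shown there; the variable names are illustrative.

```python
from datetime import date, datetime

d = date(2019, 1, 1)  # a Tuesday

# ISO 8601 numbering: week 1 contains the first Thursday, Monday=1..Sunday=7
iso_year, iso_week, iso_weekday = d.isocalendar()

# strftime numbering: %W is a Monday-based week starting at 0,
# %w is a weekday with Sunday=0
strf_week = int(d.strftime("%W"))
strf_weekday = int(d.strftime("%w"))

print(iso_week, iso_weekday)    # ISO view of the date
print(strf_week, strf_weekday)  # %W / %w view of the same date

# Each scheme round-trips as long as its own directives are used
from_iso = datetime.strptime(f"{iso_year} {iso_week} {iso_weekday}", "%G %V %u").date()
from_strf = datetime.strptime(f"{d.year} {strf_week} {strf_weekday}", "%Y %W %w").date()
print(from_iso, from_strf)
```

Mixing the schemes, for example feeding an ISO week number to %W, is what shifts the reconstructed dates.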
