Assume you have two pandas datetimes, from_date and to_date. I need a function that splits the range between them into folds of n months (let's say n=3). For example:
import pandas as pd
from_date = pd.to_datetime("2020-02-15")
to_date = pd.to_datetime("2020-05-20")
should be split into 2 folds:
{
"1": {"from_date": 2020-02-15, "to_date": 2020-05-15},
"2": {"from_date": 2020-05-16, "to_date": 2020-05-20}
}
Each fold needs to satisfy the condition:
from_date + pd.DateOffset(months=3) >= to_date (i.e. months=n). So it is not about the number of days between the start and end dates.
What is the most pythonic way to do this? Is there something in pandas?
My solution:
import pandas as pd
def check_and_split(from_date, to_date):
    from_date = pd.to_datetime(from_date)
    to_date = pd.to_datetime(to_date)
    done = False
    fold = 0
    result = {}
    start = from_date
    end = to_date
    while not done:
        # Last fold: a full n-month fold would reach or pass to_date
        if start + pd.DateOffset(months=3) >= to_date:
            done = True
            end = to_date
        else:
            end = start + pd.DateOffset(months=3)
        result[fold] = {"from_date": start, "to_date": end}
        if not done:
            # The next fold starts the day after this one ends
            start = end + pd.DateOffset(days=1)
            fold += 1
    return result
Isn't there a more pythonic way? Something in pandas maybe?
Replace the print statement at the end with whatever you want to do with the folds!
According to the answers to "How do I calculate the date six months from the current date using the datetime Python module?", dateutil.relativedelta correctly handles months with and without a 31st day!
import pandas as pd
from dateutil.relativedelta import relativedelta
from_date = pd.to_datetime("2020-02-15")
to_date = pd.to_datetime("2020-05-20")
fold = 0
result = {}
while from_date + relativedelta(months=+3) < to_date:
    curfrom = from_date  # retain current 'from_date'
    from_date = from_date + relativedelta(months=+3)
    result[fold] = {"from_date": curfrom, "to_date": from_date}
    fold = fold + 1
    from_date = from_date + relativedelta(days=+1)  # so that the next 'from_date' starts 1 day after
# The last (possibly shorter) fold runs from the current 'from_date' up to 'to_date'
result[fold] = {"from_date": from_date, "to_date": to_date}
print(result)
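Since the question asks whether pandas has something built in: pd.date_range accepts a pd.DateOffset as its freq, so the fold boundaries can come straight from pandas. This is a sketch assuming the fold convention of the example (the first fold starts at from_date, each later fold starts the day after the previous one ends); the name split_with_date_range and the string keys "1", "2", ... are illustrative choices, not an existing API.

```python
import pandas as pd


def split_with_date_range(from_date, to_date, n_months=3):
    """Split [from_date, to_date] into folds of n_months, keyed "1", "2", ..."""
    from_date, to_date = pd.Timestamp(from_date), pd.Timestamp(to_date)
    if from_date > to_date:
        raise ValueError("from_date must not be after to_date")
    # Fold boundaries every n_months, starting at from_date (never past to_date)
    bounds = list(pd.date_range(from_date, to_date, freq=pd.DateOffset(months=n_months)))
    # Close the last (possibly shorter) fold at to_date
    if len(bounds) == 1 or bounds[-1] != to_date:
        bounds.append(to_date)
    folds = {}
    for i, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:]), start=1):
        # Every fold after the first starts the day after the previous fold ended
        start = lo if i == 1 else lo + pd.Timedelta(days=1)
        folds[str(i)] = {"from_date": start, "to_date": hi}
    return folds


print(split_with_date_range("2020-02-15", "2020-05-20"))
```

The boundaries are generated by repeatedly applying the offset to the start date, so the folds line up with calendar months the same way the pd.DateOffset check in the question does.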
How can I get the list of intervals (as 2-tuples) of x minutes between two datetimes? It should be easy, but pandas.date_range() only returns the exact grid points and drops the end time. For example:
In:
from_ = '2023-01-05 16:01:58'
to = '2023-01-05 17:30:15'
x = 30
Expected:
[
('2023-01-05 16:01:58', '2023-01-05 16:30:00'),
('2023-01-05 16:30:00', '2023-01-05 17:00:00'),
('2023-01-05 17:00:00', '2023-01-05 17:30:00'),
('2023-01-05 17:30:00', '2023-01-05 17:30:15')
]
Here is one way to do it with pandas to_datetime, Timedelta and Timestamp.floor:
import pandas as pd
from_date = pd.to_datetime("2023-01-05 16:01:58")
to_date = pd.to_datetime("2023-01-05 17:30:15")
freq = "30min"
intervals = []
start = from_date
while True:
    # Next freq boundary strictly after start: floor(start + freq) == floor(start) + freq
    end = (start + pd.Timedelta(freq)).floor(freq)
    if end >= to_date:
        intervals.append((str(start), str(to_date)))
        break
    intervals.append((str(start), str(end)))
    start = end
Then:
print(intervals)
# Output
[
("2023-01-05 16:01:58", "2023-01-05 16:30:00"),
("2023-01-05 16:30:00", "2023-01-05 17:00:00"),
("2023-01-05 17:00:00", "2023-01-05 17:30:00"),
("2023-01-05 17:30:00", "2023-01-05 17:30:15"),
]
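A loop-free variant, as a sketch: let pd.date_range generate the grid points between start.ceil(freq) and end.floor(freq), keep only those strictly inside the range, and bracket them with the real start and end. split_into_intervals is a hypothetical helper name.

```python
import pandas as pd


def split_into_intervals(start, end, freq="30min"):
    """Return (start, end) string pairs that break [start, end] at every freq boundary."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Grid points aligned to freq that fall between the two endpoints
    grid = pd.date_range(start.ceil(freq), end.floor(freq), freq=freq)
    # Keep only boundaries strictly inside (start, end) to avoid zero-length pieces
    inner = [t for t in grid if start < t < end]
    points = [start, *inner, end]
    return [(str(a), str(b)) for a, b in zip(points[:-1], points[1:])]


for pair in split_into_intervals("2023-01-05 16:01:58", "2023-01-05 17:30:15"):
    print(pair)
```

If both endpoints fall inside the same bucket, date_range returns an empty index and the result is the single pair (start, end).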
I've been diving into list comprehensions, and I'm determined to put them into practice.
The below code takes a month and year input to determine the number of business days in the month, minus the public holidays (available at https://www.gov.uk/bank-holidays.json).
Additionally, I want to list all public holidays in that month/year, but I'm stuck on a date type conflict:
TypeError: '<' not supported between instances of 'str' and 'datetime.date'
edate and sdate are datetime.date, whereas title["date"] is a string.
I've tried things like datetime.strptime and datetime.date, to no avail.
How can I resolve the date conflict within the List Comprehension?
Any help or general feedback on the code is appreciated.
from datetime import date, timedelta, datetime
import numpy as np
import json
import requests
from calendar import month_name, monthrange
import pprint
# Ask for a month and year input (set to September for quick testing)
monthInput = "09"
yearInput = "2022"
# Request from UK GOV and filter to England and Wales
holidaysJSON = requests.get("https://www.gov.uk/bank-holidays.json")
ukHolidaysJSON = json.loads(holidaysJSON.text)['england-and-wales']['events']
# List for all England and Wales holidays
ukHolidayList = []
eventIterator = 0
for events in ukHolidaysJSON:
    ukHolidayDate = list(ukHolidaysJSON[eventIterator].values())[1]
    ukHolidayList.append(ukHolidayDate)
    eventIterator += 1
# Calculate days in the month
daysInMonth = monthrange(int(yearInput), int(monthInput))[1]  # Extract the number of days in the month
# Define start and end dates
sdate = date(int(yearInput), int(monthInput), 1)  # start date
edate = date(int(yearInput), int(monthInput), int(daysInMonth))  # end date
# Calculate delta
delta = edate - sdate
# Find all of the business days in the month
numberOfWorkingDays = 0
for i in range(delta.days + 1):  # Look through all days in the month
    day = sdate + timedelta(days=i)
    if np.is_busday([day]) and str(day) not in ukHolidayList:  # Determine if it's a business day
        print("- " + str(day))
        numberOfWorkingDays += 1
# Count all of the UK holidays
numberOfHolidays = 0
for i in range(delta.days + 1):  # Look through all days in the month
    day = sdate + timedelta(days=i)
    if str(day) in ukHolidayList:  # Determine if it's a uk holiday
        numberOfHolidays += 1
# Month name from the input (int() drops the leading 0)
monthName = month_name[int(monthInput)]
# for x in ukHolidaysJSON:
#     pprint.pprint(x["title"])
# This is where I've gotten to
hols = [ title["title"] for title in ukHolidaysJSON if title["date"] < sdate and title["date"] > edate ]
print(hols)
I got this to work. You can use the datetime module to parse the string, but you then need to convert the result into a date to compare it with the date objects you already have. Note also that the range test has to be sdate <= date <= edate: a date can never be both before sdate and after edate, so the original condition always yields an empty list.
hols = [title["title"] for title in ukHolidaysJSON if sdate <= datetime.strptime(title["date"], "%Y-%m-%d").date() <= edate]
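A possibly simpler variant, assuming Python 3.7+ (where date.fromisoformat exists) and the feed's "YYYY-MM-DD" date strings: date.fromisoformat parses straight to a date, so no strptime/.date() pair is needed, and a chained comparison keeps the holidays inside the month. The two event records below are hypothetical stand-ins for ukHolidaysJSON so the snippet runs offline.

```python
from calendar import monthrange
from datetime import date

# Hypothetical records with the same shape as the feed's "events" entries
ukHolidaysJSON = [
    {"title": "Example August holiday", "date": "2022-08-29"},
    {"title": "Example September holiday", "date": "2022-09-19"},
]

yearInput, monthInput = "2022", "09"
year, month = int(yearInput), int(monthInput)
sdate = date(year, month, 1)
edate = date(year, month, monthrange(year, month)[1])

# Parse the ISO string straight to a date, then keep holidays with sdate <= date <= edate
hols = [event["title"] for event in ukHolidaysJSON
        if sdate <= date.fromisoformat(event["date"]) <= edate]
print(hols)  # ['Example September holiday']
```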
First use strptime and then convert the datetime object to a date. I'm not sure if there's a more straightforward way, but this seems to work:
hols = [title["title"] for title in ukHolidaysJSON
        if sdate <= datetime.strptime(title["date"], "%Y-%m-%d").date() <= edate]
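On the general feedback request: NumPy can count the working days without the day-by-day loop, since np.busday_count skips weekends by default and accepts a holidays list. A sketch, assuming ukHolidayList holds "YYYY-MM-DD" strings as in the question; uk_holiday_list and its two values are hypothetical sample data.

```python
from datetime import date, timedelta

import numpy as np

# Hypothetical holiday list in the same "YYYY-MM-DD" form as ukHolidayList
uk_holiday_list = ["2022-08-29", "2022-09-19"]

sdate = date(2022, 9, 1)
edate = date(2022, 9, 30)

# busday_count counts Monday-Friday days in the half-open range [begin, end)
# that are not listed in `holidays`, so add one day to include edate itself
working_days = int(np.busday_count(sdate, edate + timedelta(days=1), holidays=uk_holiday_list))

# Holidays falling inside the month, counted directly from the string list
holidays_in_month = sum(sdate <= date.fromisoformat(d) <= edate for d in uk_holiday_list)
print(working_days, holidays_in_month)
```

With these sample values, September 2022 has 22 weekdays, one of which (the 19th) is in the list, so working_days comes out as 21.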
start = "Nov20"
end = "Jan21"
# Expected output:
["Nov20", "Dec20", "Jan21"]
What I've tried so far is the following, but I'm looking for a more elegant way.
from calendar import month_abbr
from time import strptime
def get_range(a, b):
    start = strptime(a[:3], '%b').tm_mon
    end = strptime(b[:3], '%b').tm_mon
    dates = []
    for m in month_abbr[start:]:
        dates.append(m+a[-2:])
    for mm in month_abbr[1:end + 1]:
        dates.append(mm+b[-2:])
    print(dates)
get_range('Nov20', 'Jan21')
Note: I don't want to use pandas, as it doesn't make sense to import such a library just to generate dates.
The date range may span different years, so one way is to loop from the start date to the end date, incrementing the month by 1 until the end date is reached.
Try this:
from datetime import datetime
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        if start.month == 12:
            start = start.replace(month=1, year=start.year+1)
        else:
            start = start.replace(month=start.month+1)
    return dates
dates = get_range("Nov20", "Jan21")
print(dates)
Output:
['Nov20', 'Dec20', 'Jan21']
You can use timedelta to step 31 days forward, but make sure you stay on the 1st of the month; otherwise the extra days accumulate and you eventually skip a month.
from datetime import datetime
from datetime import timedelta
def get_range(a, b):
    start = datetime.strptime(a, '%b%y')
    end = datetime.strptime(b, '%b%y')
    dates = []
    while start <= end:
        dates.append(start.strftime('%b%y'))
        start = (start + timedelta(days=31)).replace(day=1)  # go to 1st of next month
    return dates
dates = get_range("Jan20", "Jan21")
print(dates)
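Another option, still without pandas, is to turn each month into a single integer (year * 12 + month - 1), iterate over that range, and convert back with divmod, so the year rollover needs no special case. This is a sketch; month_range is a hypothetical name, and %b output depends on the current locale (English abbreviations in the default C locale).

```python
from datetime import datetime


def month_range(a, b, fmt="%b%y"):
    """Return every month label from a to b inclusive, e.g. 'Nov20' .. 'Jan21'."""
    start = datetime.strptime(a, fmt)
    end = datetime.strptime(b, fmt)
    # Encode each month as one integer so year boundaries need no special case
    first = start.year * 12 + start.month - 1
    last = end.year * 12 + end.month - 1
    labels = []
    for m in range(first, last + 1):
        year, month_index = divmod(m, 12)
        labels.append(datetime(year, month_index + 1, 1).strftime(fmt))
    return labels


print(month_range("Nov20", "Jan21"))  # ['Nov20', 'Dec20', 'Jan21']
```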
I have a dataframe (df) with a start_date column and an add_days column (=10). I want to create target_date (= start_date + add_days business days), excluding weekends and holidays (the holidays are in a separate dataframe).
I did some research and tried this:
import pandas as pd
from datetime import timedelta
df["start_date"] = pd.to_datetime(df["start_date"])
Holidays['Date_holi'] = pd.to_datetime(Holidays['Date_holi'])
def date_by_adding_business_days(from_date, add_days, holidays):
    business_days_to_add = add_days
    current_date = from_date
    while business_days_to_add > 0:
        current_date += timedelta(days=1)
        weekday = current_date.weekday()
        if weekday >= 5:  # Saturday = 5, Sunday = 6
            continue
        if current_date in holidays:
            continue
        business_days_to_add -= 1
    return current_date
#demo:
df["Target_date"] = date_by_adding_business_days(df["start_date"], 10, Holidays['Date_holi'])
but I get this error:
AttributeError: 'Series' object has no attribute 'weekday'
Thank you for your help.
The comments by ALollz are very valid; building your dates so that they only contain what counts as a business day for your problem would be optimal.
However, I assume that you cannot define the business day beforehand and that you need to solve the problem with the data frame constructed as is.
Here is one possible solution:
import pandas as pd
import numpy as np
from datetime import timedelta
# Goal is to offset a start date by N business days (weekday + not a holiday)
# Here we fake the dataset as it was not provided
num_row = 1000
df = pd.DataFrame()
df['start_date'] = pd.date_range(start='1/1/1979', periods=num_row, freq='D')
df['add_days'] = pd.Series([10]*num_row)
# Define what is a week day
week_day = [0,1,2,3,4] # Monday to Friday
# Define what is a holiday with month and day without year (you can add more)
holidays = ['10-30','12-24']
def add_days_to_business_day(df, week_day, holidays, increment=10):
    '''
    modify the dataframe to increment only the days that are part of a weekday
    and not part of a pre-defined holiday
    >>> add_days_to_business_day(df, [0,1,2,3,4], ['10-31','12-31'])
    this will increment by 10 the days from Monday to Friday excluding Halloween and New Year's Eve
    '''
    # Increment everything that is in a business day
    df.loc[df['start_date'].dt.dayofweek.isin(week_day),'target_date'] = df['start_date'] + timedelta(days=increment)
    # Remove every increment done on a holiday
    df.loc[df['start_date'].dt.strftime('%m-%d').isin(holidays), 'target_date'] = np.datetime64('NaT')
add_days_to_business_day(df, week_day, holidays)
df
Note: I'm not using the 'add_days' column since it's just a repeated value. Instead, my function takes a parameter, increment, which increments by N days (with a default of N = 10).
Hope it helps!
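For the original goal (moving each start_date forward by add_days business days, skipping weekends and holidays), NumPy's np.busday_offset does it in one vectorized call, which also avoids the 'Series' object has no attribute 'weekday' error. A sketch under stated assumptions: the small df and holidays below are made-up stand-ins for the question's frames, and roll="backward" is chosen so the result matches the question's while loop, which only counts business days strictly after the start date.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's df and Holidays['Date_holi']
df = pd.DataFrame({
    "start_date": pd.to_datetime(["2022-12-22", "2022-12-23", "2023-01-02"]),
    "add_days": [10, 10, 10],
})
holidays = pd.to_datetime(["2022-12-26", "2022-12-27", "2023-01-02"])

# busday_offset works on day-resolution datetime64 values; Saturday/Sunday are
# excluded by the default weekmask and the listed holidays are skipped as well.
# roll="backward" first moves a non-business start date back to the previous
# business day, so the result is the add_days-th business day after start_date.
df["target_date"] = np.busday_offset(
    df["start_date"].values.astype("datetime64[D]"),
    df["add_days"].values,
    roll="backward",
    holidays=holidays.values.astype("datetime64[D]"),
)
print(df)
```

The third row starts on a listed holiday, so counting begins from the next business day, just as the loop in the question would do.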
I'm looking to compare a list of dates with today's date and return the closest one. I've had various ideas, but they all seem very convoluted and involve scoring each date by how many days it differs from today and taking the smallest difference. I have no clue how to do this simply; any pointers would be appreciated.
import datetime
import re
date_list = ['2019-02-10', '2018-01-13', '2019-02-8',]
now = datetime.date.today()
for date_ in date_list:
    match = re.match(r'.*(\d{4})-(\d{1,2})-(\d{1,2}).*', date_)
    if match:
        year = match.group(1)
        month = match.group(2)
        day = match.group(3)
        delta = now - datetime.date(int(year), int(month), int(day))
        print(delta)
EDIT: While I was waiting, I solved this using the code below:
import datetime
import re
date_list = ['2019-02-10', '2018-01-13', '2019-02-8',]
now = datetime.date.today()
dates_range = []
for date_ in date_list:
    match = re.match(r'.*(\d{4})-(\d{1,2})-(\d{1,2}).*', date_)
    if match:
        year = match.group(1)
        month = match.group(2)
        day = match.group(3)
        delta = now - datetime.date(int(year), int(month), int(day))
        # Use the absolute number of days so future dates are not favoured
        dates_range.append(abs(delta.days))
days = min(dates_range)
Convert each string into a datetime.date object, then subtract and take the smallest difference:
import datetime
import re
date_list = ['2019-02-10', '2018-01-13', '2019-02-8',]
now = datetime.date.today()
date_list_converted = [datetime.datetime.strptime(each_date, "%Y-%m-%d").date() for each_date in date_list]
differences = [abs(now - each_date) for each_date in date_list_converted]
minimum = min(differences)
closest_date = date_list[differences.index(minimum)]
This converts the strings to date objects, subtracts the current date from each, and returns the date with the lowest absolute difference:
import datetime
import re
date_list = ['2019-02-10', '2018-01-13', '2019-02-8',]
numPattern = re.compile("[0-9]+")
def getclosest(dates):
    now = datetime.date.today()
    diffs = []
    for date_str in dates:
        year, month, day = [int(i) for i in re.findall(numPattern, date_str)]
        currcheck = datetime.date(year, month, day)
        diffs.append(abs(now - currcheck))
    return dates[diffs.index(min(diffs))]
It's by no means the most efficient, but it's semi-elegant and works.
Using inbuilts
Python's inbuilt datetime module has the functionality to do what you desire.
Let's first take your list of dates and convert it into a list of datetime objects:
from datetime import datetime
date_list = ['2019-02-10', '2018-01-13', '2019-02-8']
datetime_list = [datetime.strptime(date, "%Y-%m-%d") for date in date_list]
Once we have this we can find the difference between those dates and today's date.
today = datetime.today()
date_diffs = [abs(date - today) for date in datetime_list]
Excellent, date_diffs is now a list of datetime.timedelta objects. All that is left is to find the minimum and find which date this represents.
To find the minimum difference it is simple enough to use min(date_diffs), however, we then want to use this minimum to extract the corresponding closest date. This can be achieved as:
closest_date = date_list[date_diffs.index(min(date_diffs))]
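The index lookup can also be folded into min itself by passing a key function that measures each date's distance from today. A sketch; closest_date and its optional today parameter are illustrative additions (the parameter only makes the result reproducible).

```python
from datetime import date, datetime

date_list = ['2019-02-10', '2018-01-13', '2019-02-8']


def closest_date(dates, today=None):
    """Return the string in dates whose date is nearest to today."""
    if today is None:
        today = date.today()
    # strptime also accepts the non-zero-padded day in '2019-02-8'
    return min(dates, key=lambda s: abs(datetime.strptime(s, "%Y-%m-%d").date() - today))


print(closest_date(date_list, today=date(2019, 2, 11)))  # 2019-02-10
```

On ties, min keeps the first matching element in list order.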
With pandas
If performance is an issue, it may be worth investigating a pandas implementation. Using pandas, we can convert your dates to a pandas DatetimeIndex:
from datetime import datetime
import pandas as pd
date_list = ['2019-02-10', '2018-01-13', '2019-02-8']
date_df = pd.to_datetime(date_list)
Finally, as in the method using inbuilts we find the differences in the dates and use it to extract the closest date to today.
today = datetime.today()
date_diffs = abs(today - date_df)
closest_date = date_list[date_diffs.argmin()]
The advantage of this method is that the explicit Python loops are replaced by vectorized operations, so I'd expect it to be more efficient for large numbers of dates.
One fast and simple way is to use the bisect module, especially if your date_list is large:
import datetime
from bisect import bisect_left
FMT = '%Y-%m-%d'
# Dates must be zero-padded ('2019-02-08', not '2019-02-8') so string order matches date order
date_list = ['2019-02-10', '2018-01-13', '2019-02-08', '2019-02-12']
date_list.sort()
def closest_day_to_now(days):
    """
    Return the closest day from a sorted list of days
    """
    now = datetime.datetime.now()
    # Index of the first day that is >= today's date string
    i = bisect_left(days, now.strftime(FMT))
    if i == 0:
        return days[0]
    if i == len(days):
        return days[-1]
    # The closest day is either the one just before or the one at the insertion point
    left_day = datetime.datetime.strptime(days[i - 1], FMT)
    right_day = datetime.datetime.strptime(days[i], FMT)
    return days[i] if abs(right_day - now) < abs(left_day - now) else days[i - 1]
print(closest_day_to_now(date_list))