Say we have a system that runs continuously, and that we make some changes to it on a particular date start_date.
We would like to compare the effects of the changes between:
The time window of full days between start_date and today's date (shown in yellow below)
The equivalent time window (same days of the week) of full days that took place right before start_date (shown in blue below)
For example, say I started my experiments on March 25 (in red), and that today is March 29 (green). I would like to obtain the four dates that define
time_window_after (the two dates in yellow) and time_window_before (the two dates in blue).
The idea is to compare the results of the experiment started on start_date before and after the experiment started, on the longest possible number of days on a time window that is symmetric (in terms of days of the week) to the date the experiment started.
In other words, given start_date and today's date, how can I find the pairs of dates that define time_window_before and time_window_after (as datetime objects)?
Update
Since I was asked what happens if start_date and today's date don't fall on the same week, below is one such example:
Python's datetime library has all the methods you need to add and subtract the dates:
from datetime import date, timedelta
def get_time_window_after(experiment_start_date, experiment_end_date):
    # Add 1 day to start and subtract 1 day from end
    print("After Start: %s" % (experiment_start_date + timedelta(days=1)))
    print("After End: %s" % (experiment_end_date - timedelta(days=1)))
def get_time_window_before(experiment_start_date, experiment_end_date):
    # Find the total length of the experiment
    delta = experiment_end_date - experiment_start_date
    # Determine how many weeks it covers (add 1 because same week would be 0)
    # Integer division keeps the week count whole on Python 3
    delta_magnitude = 1 + (delta.days // 7)
    # Subtract 7 days per week that the experiment covered, also add/subtract 1 day
    print("Before Start: %s" % (experiment_start_date - timedelta(days=7 * delta_magnitude) + timedelta(days=1)))
    print("Before End: %s" % (experiment_end_date - timedelta(days=7 * delta_magnitude) - timedelta(days=1)))
Here are the examples I ran the code with to make sure it works:
print("\nResults for March 25 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 25), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 25), date(2014, 3, 29))
print("\nResults for March 18 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 18), date(2014, 3, 29))
print("\nResults for March 18 2014 to April 04 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 4, 4))
get_time_window_before(date(2014, 3, 18), date(2014, 4, 4))
Note: If you make these functions return values and set variables, you could use the time_window_after as input into get_time_window_before() function and forego the duplicated timedelta(days = 1) logic.
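A minimal sketch of that refactoring, assuming the same week-counting rule as the functions above. The tuple return values and the after_window parameter are my own choices for illustration, not part of the original code:

```python
from datetime import date, timedelta

def get_time_window_after(start_date, end_date):
    """Return (first, last) full day strictly between start_date and end_date."""
    return start_date + timedelta(days=1), end_date - timedelta(days=1)

def get_time_window_before(start_date, end_date, after_window):
    """Shift after_window back by the number of weeks used in the answer above."""
    # Same rule as before: 1 week plus one more per full week covered
    weeks = 1 + (end_date - start_date).days // 7
    shift = timedelta(days=7 * weeks)
    after_start, after_end = after_window
    return after_start - shift, after_end - shift

after = get_time_window_after(date(2014, 3, 25), date(2014, 3, 29))
before = get_time_window_before(date(2014, 3, 25), date(2014, 3, 29), after)
print(after, before)
```

Because the "before" window is derived from the "after" window, the one-day offsets only appear once.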
This works at least for your two samples. Would this be good enough?
import datetime
experiment_start_date = datetime.date(2014, 3, 18)
now = datetime.date(2014, 3, 29)
# "after" window: the full days strictly between the start date and today
day_after1 = experiment_start_date + datetime.timedelta(1)
day_after2 = now - datetime.timedelta(1)
# "before" window: ends on the same weekday as day_after2 and has the same length
day_before2 = experiment_start_date - datetime.timedelta(day_after2.weekday() - experiment_start_date.weekday() + 1)
day_before1 = day_before2 - (day_after2 - day_after1)
The following should do it:
from datetime import date, timedelta
def get_symmetric_time_window(ref_date, end_date):
    # "after" window: the full days strictly between ref_date and end_date
    da1 = ref_date + timedelta(1)
    da2 = end_date - timedelta(1)
    # shift back by the smallest number of whole weeks that puts the
    # "before" window entirely before ref_date
    weeks = (da2 - ref_date).days // 7 + 1
    db2 = da2 - timedelta(7 * weeks)
    db1 = db2 - (da2 - da1)
    return da1, da2, db1, db2
Test 1:
In: get_symmetric_time_window(date(2014, 3, 18), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 19),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 5),
datetime.date(2014, 3, 14))
Test 2:
In: get_symmetric_time_window(date(2014, 3, 25), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 26),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 19),
datetime.date(2014, 3, 21))
Test 3:
In: get_symmetric_time_window(date(2014, 7, 17), date(2014, 7, 23))
Out:
(datetime.date(2014, 7, 18),
datetime.date(2014, 7, 22),
datetime.date(2014, 7, 11),
datetime.date(2014, 7, 15))
Related
I have a situation where I have a code with which I am processing data for operated shifts.
In it, I have arrays for the start and end of shifts (e.g. shift_start[0] and shift_end[0] for shift #1), and for the time between them, I need to know how many weekdays, holidays, or weekend days there are.
I have already defined the holidays as an array of datetime entries, which should represent the holidays of a specific country (it is not the same as here, and I am not looking for more dynamic options yet).
So basically I have it like that:
started = [datetime.datetime(2022, 2, 1, 0, 0), datetime.datetime(2022, 2, 5, 8, 0), datetime.datetime(2022, 2, 23, 11, 19, 28)]
ended = [datetime.datetime(2022, 2, 2, 16, 0), datetime.datetime(2022, 2, 5, 17, 19, 28), datetime.datetime(2022, 4, 26, 12, 30)]
holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3), datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]
I'm looking for a way to go through each of the 3 ranges and count the days of each kind it contains (e.g. the first range should contain 2 weekdays, the second one weekend day).
So based on the suggestion by #gimix, I was able to develop what I needed:
for each_start, each_end in zip(started, ended):  # For each period
    for single_date in self.daterange(each_start, each_end):  # For each day of each period
        # Checking if holiday or weekend
        if (single_date.replace(hour=0, minute=0, second=0) in holidays) or (single_date.weekday() > 4):
            set_special_days_worked(1)
        # If not holiday or weekend, then it is regular working day
        else:
            set_regular_days_worked(1)
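For anyone who wants to try this outside the class, here is a self-contained sketch of the same idea. The daterange helper is a hypothetical stand-in for self.daterange (which is not shown above), and I am assuming every calendar date touched by a shift counts as one day. It returns the two counts instead of calling the set_* functions:

```python
import datetime

def daterange(start, end):
    """Hypothetical helper: yield every calendar date from start to end, inclusive."""
    day = start.date()
    while day <= end.date():
        yield day
        day += datetime.timedelta(days=1)

def count_days(start, end, holidays):
    """Return (regular_days, special_days) for one shift period."""
    holiday_dates = {h.date() for h in holidays}
    regular = special = 0
    for day in daterange(start, end):
        # weekday() returns 5 for Saturday and 6 for Sunday
        if day in holiday_dates or day.weekday() > 4:
            special += 1
        else:
            regular += 1
    return regular, special

holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3),
            datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]
first = count_days(datetime.datetime(2022, 2, 1, 0, 0),
                   datetime.datetime(2022, 2, 2, 16, 0), holidays)
second = count_days(datetime.datetime(2022, 2, 5, 8, 0),
                    datetime.datetime(2022, 2, 5, 17, 19, 28), holidays)
print(first, second)
```

Comparing plain dates against a set of holiday dates avoids having to zero out the time fields on each datetime.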
I would like to get the same calendar date for every year from today up to (but not including) an end date. For example, if my end date is "20251220", I would like to get the following list of dates:
"20211220", "20221220", "20231220", "20241220". However, if it were "20250220", I would only need "20220220", "20230220", "20240220", as February has already passed this year. I tried a simple loop myself (see below), after which I would check whether the first date is in the past. But I think there must be a built-in function to do this, via pandas, dateutil, etc.
I've tried this:
In [455]: import datetime as dt
In [456]: end = dt.date(2025, 12,20)
In [457]: start = dt.date(dt.datetime.today().year, end.month, end.day)
In [458]: periods = end.year-start.year
In [461]: l = [dt.date(start.year + i, 12, 20) for i in range(0,periods)]
In [462]: l
Out[462]:
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20),
datetime.date(2025, 12, 20)]
One idea with list comprehension:
import datetime as dt
end = dt.date(2025, 12,20)
today = dt.datetime.today()
l = [end.replace(year=i)
for i in range(today.year, end.year)
if end.replace(year=i) > today.date()]
print (l)
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20)]
import datetime as dt
end = dt.date(2025, 2,20)
today = dt.datetime.today()
l = [end.replace(year=i)
for i in range(today.year, end.year)
if end.replace(year=i) > today.date()]
print (l)
[datetime.date(2022, 2, 20), datetime.date(2023, 2, 20), datetime.date(2024, 2, 20)]
Solution working with 29 February:
import datetime as dt
import pandas as pd
end = dt.date(2028, 2,29)
today = dt.datetime.today()
l = [(end + pd.offsets.DateOffset(year=i)).date()
for i in range(today.year, end.year)
if (end + pd.offsets.DateOffset(year=i)) > today]
print (l)
[datetime.date(2022, 2, 28),
datetime.date(2023, 2, 28),
datetime.date(2024, 2, 29),
datetime.date(2025, 2, 28),
datetime.date(2026, 2, 28),
datetime.date(2027, 2, 28)]
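If pandas is not available, the same clamping to the last day of February can be sketched with the standard library only. The function name yearly_dates and the explicit today parameter are my own additions, so that the result does not depend on when the code is run:

```python
import calendar
import datetime as dt

def yearly_dates(end, today):
    """Same month/day as `end` for each year from today's year up to end.year - 1,
    keeping only dates that are still in the future relative to `today`."""
    result = []
    for year in range(today.year, end.year):
        # Clamp the day so that 29 February becomes 28 February in non-leap years
        last_day = calendar.monthrange(year, end.month)[1]
        candidate = dt.date(year, end.month, min(end.day, last_day))
        if candidate > today:
            result.append(candidate)
    return result

print(yearly_dates(dt.date(2028, 2, 29), dt.date(2021, 6, 1)))
```

calendar.monthrange(year, month) returns a pair whose second item is the number of days in that month, which is what makes the clamp work.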
Given today's date, what is an efficient way to retrieve the first and last date of each of the previous 3 months (i.e. '3/1/2020' and '3/31/2020'; '2/1/2020' and '2/29/2020'; '1/1/2020' and '1/31/2020')?
EDIT
For previous month's first and last, the following code is working as expected. But I am not sure how to retrieve the previous 2nd and 3rd month's first and last date.
from datetime import date, timedelta
last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)
start_day_of_prev_month = (date.today().replace(day=1)
- timedelta(days=last_day_of_prev_month.day))
# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)
You may:
get the 3 previous months
for each one, create the date with day 1, and the date with the last day of the month (here taken from monthrange)
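from calendar import monthrange
from datetime import datetime
def before_month(month):
    # lookup table: the slice v[month:month + 3] holds the 3 months before `month`
    v = [9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    return v[month:month + 3]
dd = datetime(2020, 4, 7)
# note: the year is always dd.year, so for January to March the year is not rolled back
dates = [[dd.replace(month=month, day=1), dd.replace(month=month, day=monthrange(dd.year, month)[1])]
         for month in before_month(dd.month)]
print(dates)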
# [[datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 1, 31, 0, 0)],
# [datetime.datetime(2020, 2, 1, 0, 0), datetime.datetime(2020, 2, 29, 0, 0)],
# [datetime.datetime(2020, 3, 1, 0, 0), datetime.datetime(2020, 3, 31, 0, 0)]]
I did not find another nice way to get the 3 previous months, but sometimes the easiest way is the one to use.
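One thing to watch with a month-only lookup is the year boundary. Below is a small sketch that steps back one month at a time and rolls the year back when needed. The function name previous_months and its count parameter are hypothetical, not taken from the answer above:

```python
from calendar import monthrange
from datetime import date

def previous_months(today, count=3):
    """Return (first_day, last_day) for each of the `count` months before today's month,
    most recent month first."""
    year, month = today.year, today.month
    windows = []
    for _ in range(count):
        month -= 1
        if month == 0:
            # Stepping back from January lands in December of the previous year
            year, month = year - 1, 12
        last_day = monthrange(year, month)[1]
        windows.append((date(year, month, 1), date(year, month, last_day)))
    return windows

print(previous_months(date(2020, 4, 7)))
```

Called with a date in February 2021, for example, it would return January 2021, December 2020 and November 2020.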
You can loop over the 3 previous months; just update the date to the first day of the month you have just processed at the end of every iteration:
from datetime import date, timedelta
d = date.today()
date_array = []
date_string_array = []
for month in range(1, 4):
    first_day_of_month = d.replace(day=1)
    # one day before the first of a month is the last day of the previous month
    last_day_of_previous_month = first_day_of_month - timedelta(days=1)
    first_day_of_previous_month = last_day_of_previous_month.replace(day=1)
    date_array.append((first_day_of_previous_month, last_day_of_previous_month))
    date_string_array.append((first_day_of_previous_month.strftime("%m/%d/%Y"), last_day_of_previous_month.strftime("%m/%d/%Y")))
    # continue from the month just processed
    d = first_day_of_previous_month
print(date_array)
print(date_string_array)
Results (when run in April 2020):
[(datetime.date(2020, 3, 1), datetime.date(2020, 3, 31)), (datetime.date(2020, 2, 1), datetime.date(2020, 2, 29)), (datetime.date(2020, 1, 1), datetime.date(2020, 1, 31))]
[('03/01/2020', '03/31/2020'), ('02/01/2020', '02/29/2020'), ('01/01/2020', '01/31/2020')]
Let's say I have the following data frame. I want to calculate the average number of days between all the activities for a particular account.
Below is my desired result:
Now I know how to calculate the number of days between two dates with the following code. But I don't know how to calculate what I am looking for across multiple dates.
from datetime import date
d0 = date(2016, 8, 18)
d1 = date(2016, 9, 26)
delta = d1 - d0
print(delta.days)  # 39
I would do this as follows in pandas (assuming the Date column is a datetime64):
In [11]: df
Out[11]:
Account Activity Date
0 A a 2015-10-21
1 A b 2016-07-07
2 A c 2016-07-07
3 A d 2016-09-14
4 A e 2016-10-12
5 B a 2015-11-24
6 B b 2015-12-30
In [12]: df.groupby("Account")["Date"].apply(lambda x: x.diff().mean())
Out[12]:
Account
A 89 days 06:00:00
B 36 days 00:00:00
Name: Date, dtype: timedelta64[ns]
If your dates are in a list:
>>> from datetime import date
>>> dates = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7), date(2016, 9, 14), date(2016, 10, 12), date(2016, 10, 12), date(2016, 11, 22), date(2016, 12, 21)]
>>> differences = [(dates[i]-dates[i-1]).days for i in range(1, len(dates))] #[260, 0, 69, 28, 0, 41, 29]
>>> float(sum(differences))/len(differences)
61.0
>>>
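For data grouped by account but without pandas, note that the consecutive differences telescope: their sum is just the last date minus the first, so the mean gap is (last - first) / (n - 1). A rough sketch under that observation, where the function name mean_gap_days and the (account, date) row layout are assumptions of mine:

```python
from collections import defaultdict
from datetime import date

def mean_gap_days(rows):
    """rows: iterable of (account, date). Return {account: mean days between activities}."""
    by_account = defaultdict(list)
    for account, day in rows:
        by_account[account].append(day)
    result = {}
    for account, days in by_account.items():
        days.sort()
        if len(days) > 1:  # a single activity has no gap to average
            result[account] = (days[-1] - days[0]).days / (len(days) - 1)
    return result

rows = [("A", date(2015, 10, 21)), ("A", date(2016, 7, 7)), ("A", date(2016, 7, 7)),
        ("A", date(2016, 9, 14)), ("A", date(2016, 10, 12)),
        ("B", date(2015, 11, 24)), ("B", date(2015, 12, 30))]
print(mean_gap_days(rows))
```

For the sample frame above, this gives 89.25 days for account A and 36.0 for account B, which agrees with the pandas result.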
I am trying to find the quarter-end closest to a given date: e.g. the closest quarter-end for 5/27/2014 would be 6/30/2014 and for 2/2/2013 would be 12/31/2012. I have the following but it doesn't give me the expected output for a date like 8/15/2015:
import datetime
tester = datetime.datetime(2015, 8, 15)
calendar_date = datetime.datetime(tester.year - 1, 12, 31)
for dd in [(3, 31), (6, 30), (9, 30), (12, 31)]:
    diff = abs(datetime.datetime(tester.year, dd[0], dd[1]) - tester)
    if diff.days <= 45:
        calendar_date = datetime.datetime(tester.year, dd[0], dd[1])
        break
print(tester, calendar_date)
I've simplified by just assuming each quarter is 90 days and thus taking half of that, 45 days (is there a better way?), but clearly that doesn't work for 8/15/2015, as it prints:
2015-08-15 00:00:00 2014-12-31 00:00:00
I was expecting 2015-09-30 00:00:00
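A datetime.timedelta can be negative, and diff.days <= 45 is always true for negative time deltas, so the comparison gives an incorrect result whenever the difference is not wrapped in abs(). With abs() in place, as in the code above, 8/15/2015 is 46 days from both 6/30/2015 and 9/30/2015, so the fixed 45-day threshold matches neither and the default of 12/31 of the previous year is kept.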
You already had a simple solution with the candidates in place. These are:
The last quarter-end of the year before the target date
All four quarter-ends of the target date's year
datetime.timedelta objects have relative comparison operators, i.e. they form a total order, which means there's a minimum. As noted in the comments by Padraic Cunningham, you want the candidate with the minimum absolute distance to the target date:
import datetime
def get_closest_quarter(target):
    # candidate list, nicely enough none of these
    # are in February, so the month lengths are fixed
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # take the minimum according to the absolute distance to
    # the target date.
    return min(candidates, key=lambda d: abs(target - d))
The code here uses datetime.date for simplicity, but it should be easy to generalize to datetime.datetime if necessary.
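One possible way to do that generalization, as a sketch only: convert a datetime.datetime to its date first, then reuse the same candidate list. Dropping the time of day is an assumption on my part. If the time matters, the candidates would have to be built as datetime.datetime objects instead.

```python
import datetime

def get_closest_quarter(target):
    # datetime.datetime is a subclass of datetime.date, so a datetime input
    # is reduced to its date part before comparing (assumption: time is ignored)
    if isinstance(target, datetime.datetime):
        target = target.date()
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # min() returns the first candidate with the smallest absolute distance,
    # so the earlier quarter-end wins a tie
    return min(candidates, key=lambda d: abs(target - d))

print(get_closest_quarter(datetime.datetime(2014, 5, 27, 13, 30)))
print(get_closest_quarter(datetime.date(2013, 2, 2)))
```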
You can get away with comparing just two dates, using ind = (dt.month-1) // 3 + 1 to get the index for the current quarter:
from datetime import date
def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    # index of the quarter-end that closes the current quarter
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    return curr_qr if abs(curr_qr - dt) < abs(last_qr - dt) else last_qr
If you want to return the later quarter in the case of a tie as per your example date we just need to use <=:
from datetime import date
dt = date(2015, 8, 15)
def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    # <= prefers the current (later) quarter-end when the distances are equal
    return curr_qr if abs(curr_qr - dt) <= abs(last_qr - dt) else last_qr
print(find_qrt(dt))
The first function will return 2015-06-30 because the earlier date wins the tie. With the second we get 2015-09-30, as we take the current quarter in the event of a tie.
The first function