I would like to get the same calendar date every year, starting from the next occurrence after today and stopping before a given end date. For example, if my end date is "20251220", I would like to get the following list of dates:
"20211220","20221220","20231220","20241220". However, if it was "20250220", I only need "20220220","20230220","20240220", as we have already passed February this year. I tried a simple loop myself (see below), where I would then check at the end whether the first date is in the past. But I think there must be a built-in function to do this, via pandas or dateutil etc.
I've tried this:
In [455]: import datetime as dt
In [456]: end = dt.date(2025, 12,20)
In [457]: start = dt.date(dt.datetime.today().year, end.month, end.day)
In [458]: periods = end.year-start.year
In [461]: l = [dt.date(start.year + i, end.month, end.day) for i in range(0, periods)]
In [462]: l
Out[462]:
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20)]
One idea with list comprehension:
import datetime as dt
end = dt.date(2025, 12,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20)]
import datetime as dt
end = dt.date(2025, 2,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2022, 2, 20), datetime.date(2023, 2, 20), datetime.date(2024, 2, 20)]
Solution working with 29 February:
import datetime as dt
import pandas as pd
end = dt.date(2028, 2, 29)
today = dt.datetime.today()
l = [(end + pd.offsets.DateOffset(year=i)).date()
     for i in range(today.year, end.year)
     if (end + pd.offsets.DateOffset(year=i)) > today]
print(l)
[datetime.date(2022, 2, 28),
datetime.date(2023, 2, 28),
datetime.date(2024, 2, 29),
datetime.date(2025, 2, 28),
datetime.date(2026, 2, 28),
datetime.date(2027, 2, 28)]
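If you would rather avoid pandas, the same idea can be sketched with the standard library alone. In this sketch `yearly_dates_until` is a hypothetical helper name, and `today` is passed in as a `date` (e.g. `dt.date.today()`) so the result is reproducible. Clamping 29 February to the last day of the month via `calendar.monthrange` is an assumption about the desired behavior; it matches the pandas output above.

```python
import calendar
import datetime as dt


def yearly_dates_until(end, today):
    """Dates with end's month/day in each year from today.year up to (not
    including) end.year, keeping only those strictly after `today`.

    A day that does not exist in a given year (29 February) is clamped to the
    last day of that month.
    """
    result = []
    for year in range(today.year, end.year):
        last_day = calendar.monthrange(year, end.month)[1]
        candidate = dt.date(year, end.month, min(end.day, last_day))
        if candidate > today:
            result.append(candidate)
    return result


print(yearly_dates_until(dt.date(2028, 2, 29), today=dt.date(2021, 12, 1)))
```

With `today` set to 2021-12-01, this prints the same six dates (2022-02-28 through 2027-02-28, with 2024-02-29) as the pandas version.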
I've got a dataframe describing events in a company and it looks like this:
employee_id event event_start_date event_end_date hire_date
1 "data change" 1.01.2018 1.01.2018 1.09.2005
2 "data change" 4.04.2018 4.04.2018 1.06.2007
2 "termination" 2.10.2020 NaT 1.06.2007
3 "hire" 23.05.2019 23.05.2019 23.05.2019
3 "leave" 23.07.2019 30.07.2019 23.05.2019
3 "termination" 3.11.2020 NaT 23.05.2019
Table is indexed by employee_id and event, and sorted by event_start_date.
So one employee has one or more events listed in the table. A "hire" event is not always present in the "event" column, so I assume the hiring date is only reliably available in the "hire_date" column. I would like to:
count the number of hiring events in each year
count the number of termination events in each year
count the number of active employees in each year
Build the example df:
import pandas as pd
import datetime
import numpy as np
# example df
emp = [1, 2, 2, 3, 3, 3]
event = ["data change", "data change", "termination", "hire", "leave", "termination"]
s_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), datetime.datetime(2020, 10, 2),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 23), datetime.datetime(2020, 11, 3)]
e_date = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 4, 4), np.datetime64('NaT'),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 7, 30), np.datetime64('NaT')]
h_date = [datetime.datetime(2005, 9, 1), datetime.datetime(2007, 6, 1), datetime.datetime(2007, 6, 1),
          datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23), datetime.datetime(2019, 5, 23)]
df = pd.DataFrame(emp, columns=['employee_id'])
df['event'] = event
df['event_start_date'] = s_date
df['event_end_date'] = e_date
df['hire_date'] = h_date
1st question
def calculate_hire_for_year():
    df['hire_year'] = pd.DatetimeIndex(df['hire_date']).year
    dict_years = {}
    ids = set(df['employee_id'])
    for emp_id in ids:
        result = df[df['employee_id'] == emp_id]
        year = list(result['hire_year'])[0]
        # Count one hire per employee, keyed by the hire year
        dict_years[year] = dict_years.get(year, 0) + 1
    return dict_years
print("Number of hiring events in each year:")
print(calculate_hire_for_year())
2nd question
def calculate_termination_per_year():
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    result = df[df['event'] == "termination"]
    count_series = result.groupby(["event", "year"]).size()
    return count_series
print("Number of termination events in each year:")
print(calculate_termination_per_year())
3rd question
def calculate_employee_per_year():
    dict_years = {}
    df['year'] = pd.DatetimeIndex(df['event_start_date']).year
    years = set(df['year'])
    for year in years:
        result = df[df['year'] == year]
        count_emp = len(set(result['employee_id']))
        dict_years[year] = count_emp
    return dict_years
print("Number of active employees in each year:")
print(calculate_employee_per_year())
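For larger frames, the three counts can also be computed without Python loops. This is a minimal sketch that assumes pandas. It also assumes a definition of "active" (hired in or before the year and not terminated in an earlier year), which differs from the function above: that one counts employees with any event starting in the year. `years` is a hypothetical reporting range.

```python
import pandas as pd

# Example data from the question (employee 2 hired on 2007-06-01, as in the table)
df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 3, 3],
    "event": ["data change", "data change", "termination", "hire", "leave", "termination"],
    "event_start_date": pd.to_datetime(["2018-01-01", "2018-04-04", "2020-10-02",
                                        "2019-05-23", "2019-07-23", "2020-11-03"]),
    "hire_date": pd.to_datetime(["2005-09-01", "2007-06-01", "2007-06-01",
                                 "2019-05-23", "2019-05-23", "2019-05-23"]),
})

# 1) Hires per year: take each employee's hire_date once and count by year
hires_per_year = (df.drop_duplicates("employee_id")["hire_date"]
                  .dt.year.value_counts().sort_index())

# 2) Terminations per year: count termination events by the year they start
is_term = df["event"] == "termination"
terminations_per_year = (df.loc[is_term, "event_start_date"]
                         .dt.year.value_counts().sort_index())

# 3) Active employees per year, under the ASSUMED definition above
per_emp = df.groupby("employee_id")["hire_date"].first().dt.year.to_frame("hire_year")
# Employees without a termination get NaN here, and NaN < y is False
per_emp["term_year"] = (df.loc[is_term].groupby("employee_id")["event_start_date"]
                        .min().dt.year)
years = range(2018, 2021)  # hypothetical reporting range
active_per_year = {
    y: int(((per_emp["hire_year"] <= y) & ~(per_emp["term_year"] < y)).sum())
    for y in years
}

print(hires_per_year.to_dict())
print(terminations_per_year.to_dict())
print(active_per_year)
```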
Given today's date, what is an efficient way to retrieve the first and last dates of each of the previous 3 months (i.e. '3/1/2020' and '3/31/2020'; '2/1/2020' and '2/29/2020'; '1/1/2020' and '1/31/2020')?
EDIT
For the previous month's first and last day, the following code works as expected, but I am not sure how to retrieve the first and last dates of the 2nd and 3rd previous months.
from datetime import date, timedelta
last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)
start_day_of_prev_month = (date.today().replace(day=1)
- timedelta(days=last_day_of_prev_month.day))
# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)
You may:
get the 3 previous months
create the first day with day=1, and the last day from calendar.monthrange(year, month)[1], which is the number of days in that month
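from calendar import monthrange
from datetime import datetime

def before_month(month):
    # Lookup table: slicing at index `month` yields the three preceding months
    v = [9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    return v[month:month + 3]

dd = datetime(2020, 4, 7)
dates = []
for month in before_month(dd.month):
    # Months numerically after dd.month belong to the previous year (e.g. Oct-Dec when dd is in January)
    year = dd.year if month < dd.month else dd.year - 1
    dates.append([datetime(year, month, 1), datetime(year, month, monthrange(year, month)[1])])
print(dates)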
# [[datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 1, 31, 0, 0)],
# [datetime.datetime(2020, 2, 1, 0, 0), datetime.datetime(2020, 2, 29, 0, 0)],
# [datetime.datetime(2020, 3, 1, 0, 0), datetime.datetime(2020, 3, 31, 0, 0)]]
I did not find a nicer way to get the 3 previous months, but sometimes the simplest way is the one to use.
You can loop over the 3 previous months, updating d at the end of every iteration to the first day of the month just processed:
from datetime import date, timedelta

d = date.today()
date_array = []
date_string_array = []
for _ in range(3):
    first_day_of_month = d.replace(day=1)
    last_day_of_previous_month = first_day_of_month - timedelta(days=1)
    first_day_of_previous_month = last_day_of_previous_month.replace(day=1)
    date_array.append((first_day_of_previous_month, last_day_of_previous_month))
    date_string_array.append((first_day_of_previous_month.strftime("%m/%d/%Y"),
                              last_day_of_previous_month.strftime("%m/%d/%Y")))
    # Move to the month just processed so the next iteration steps back again
    d = first_day_of_previous_month
print(date_array)
print(date_string_array)
Results (run in April 2020):
[(datetime.date(2020, 3, 1), datetime.date(2020, 3, 31)), (datetime.date(2020, 2, 1), datetime.date(2020, 2, 29)), (datetime.date(2020, 1, 1), datetime.date(2020, 1, 31))]
[('03/01/2020', '03/31/2020'), ('02/01/2020', '02/29/2020'), ('01/01/2020', '01/31/2020')]
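Both answers can be folded into one helper that works for any number of months and across a year boundary. This is a minimal sketch: `previous_month_ranges` is a hypothetical name, and `today` is passed explicitly (e.g. `dt.date.today()`) so the output is reproducible.

```python
import calendar
import datetime as dt


def previous_month_ranges(today, n=3):
    """(first_day, last_day) for each of the n months before today's month,
    most recent first."""
    ranges = []
    year, month = today.year, today.month
    for _ in range(n):
        # Step back one month, wrapping from January to December of the previous year
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((dt.date(year, month, 1), dt.date(year, month, last_day)))
    return ranges


for first, last in previous_month_ranges(dt.date(2020, 4, 7)):
    print(first.strftime("%m/%d/%Y"), last.strftime("%m/%d/%Y"))
```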
As the title says, I'm trying to generate a list of datetimes corresponding to the occurrences of a specific day of the month between two dates.
So given a start date, an end date, and a day of the month, I want to see every occurrence of that day of the month:
from datetime import datetime
end_date = datetime(2012, 9, 15, 0, 0)
start_date = datetime(2012, 6, 1, 0, 0)
day_of_month = 16
dates = "magic code goes here"
dates would then hold an array as such:
dates == [
datetime(2012, 6, 16, 0, 0),
datetime(2012, 7, 16, 0, 0),
datetime(2012, 8, 16, 0, 0)
]
The issue I'm running into is the number of checks I have to perform. First I have to check if it's the start year, if so, then I have to start at the beginning month, but if the day of the month is before the start date, then I have to skip that month. This same thing applies for the end of the period. Not to mention I have to check if the period starts and ends in the same year. All in all it's turning into quite a mess of nested if and for statements.
Here is my solution:
dates = []
for year in range(start_date.year, end_date.year + 1):
    for month in range(1, 13):
        date = datetime(year, month, day_of_month, 0, 0)
        if start_date < date < end_date:
            dates.append(date)
Is there a more Pythonic way to accomplish this?
Here's a quick and dirty (but reasonably efficient) solution:
import datetime
d = start_date
days = []
while d <= end_date:  # Change to < if you do not want the end_date
    if d.day == day_of_month:
        days.append(d)
    d += datetime.timedelta(1)
days
# [datetime.datetime(2012, 6, 16, 0, 0),
# datetime.datetime(2012, 7, 16, 0, 0),
# datetime.datetime(2012, 8, 16, 0, 0)]
Ideally, you want to use pandas for this.
This is a succinct, but not efficient, way using pandas.date_range.
from datetime import datetime
import pandas as pd
end_date = datetime(2012, 9, 15, 0, 0)
start_date = datetime(2012, 6, 1, 0, 0)
day_of_month = 16
rng = [i.to_pydatetime() for i in pd.date_range(start_date, end_date, freq='1D') if i.day == day_of_month]
# [datetime.datetime(2012, 6, 16, 0, 0),
# datetime.datetime(2012, 7, 16, 0, 0),
# datetime.datetime(2012, 8, 16, 0, 0)]
Here is a more efficient method using a generator for the date range, which does not rely on pandas:
from datetime import timedelta

def daterange(start_date, end_date):
    # Yield each day from start_date up to, but not including, end_date
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)

rng = [i for i in daterange(start_date, end_date) if i.day == day_of_month]
# [datetime.datetime(2012, 6, 16, 0, 0),
# datetime.datetime(2012, 7, 16, 0, 0),
# datetime.datetime(2012, 8, 16, 0, 0)]
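Stepping one month at a time avoids visiting every day in the range. In this sketch `monthly_occurrences` is a hypothetical helper, and both ends of the range are treated as inclusive. That is an assumption; the question's own loop uses strict comparisons.

```python
from datetime import datetime


def monthly_occurrences(start, end, day_of_month):
    """Every datetime in [start, end] whose day equals day_of_month.

    Steps one month at a time rather than one day at a time; months that are
    too short for day_of_month (e.g. day 31 in April) are skipped.
    """
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        try:
            candidate = datetime(year, month, day_of_month)
        except ValueError:
            candidate = None  # this month has no such day
        if candidate is not None and start <= candidate <= end:
            result.append(candidate)
        # Advance to the next month, rolling over into January
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return result


print(monthly_occurrences(datetime(2012, 6, 1), datetime(2012, 9, 15), 16))
```

Because the loop only visits one candidate per month, its cost grows with the number of months rather than the number of days.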
I am trying to find the quarter-end closest to a given date: e.g. the closest quarter-end for 5/27/2014 would be 6/30/2014 and for 2/2/2013 would be 12/31/2012. I have the following but it doesn't give me the expected output for a date like 8/15/2015:
import datetime
tester = datetime.datetime(2015, 8, 15)
calendar_date = datetime.datetime(tester.year - 1, 12, 31)
for dd in [(3, 31), (6, 30), (9, 30), (12, 31)]:
    diff = abs(datetime.datetime(tester.year, dd[0], dd[1]) - tester)
    if diff.days <= 45:
        calendar_date = datetime.datetime(tester.year, dd[0], dd[1])
        break
print(tester, calendar_date)
I've simplified by just assuming each quarter is 90 days and taking half of that, 45 days (is there a better way?), but clearly that doesn't work for 8/15/2015, as it prints:
2015-08-15 00:00:00 2014-12-31 00:00:00
I was expecting 2015-09-30 00:00:00
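A datetime.timedelta can be negative, and diff.days <= 45 is always true for a negative timedelta; the abs() call guards against that. Here the actual problem is that 2015-08-15 is 46 days from both 2015-06-30 and 2015-09-30, so no candidate passes the 45-day test and the fallback 2014-12-31 is kept.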
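A quick arithmetic check shows why the example date is awkward. It is exactly as far from the end of Q2 as from the end of Q3, and both distances exceed the 45-day threshold used in the question:

```python
from datetime import date

target = date(2015, 8, 15)
# Distances from the target to the two neighbouring quarter-ends
days_since_q2 = (target - date(2015, 6, 30)).days
days_until_q3 = (date(2015, 9, 30) - target).days
print(days_since_q2, days_until_q3)  # 46 46
```

So any fixed threshold below 46 days misses this date, and any "closest" rule has to decide how to break the tie.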
You already had a simple solution with the candidates in place. These are:
the last quarter-end of the year before the target date
all four quarter-ends of the current year
datetime.timedelta objects have relative comparison operators, i.e. they form a total order, which means there's a minimum. As noted in the comments by Padraic Cunningham, you want the candidate with the minimum absolute distance to the target date:
import datetime

def get_closest_quarter(target):
    # candidate list, nicely enough none of these
    # are in February, so the month lengths are fixed
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # take the minimum according to the absolute distance to
    # the target date.
    return min(candidates, key=lambda d: abs(target - d))
The code here uses datetime.date for simplicity, but it should be easy to generalize to datetime.datetime if necessary.
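One way to do that generalization is sketched below. It assumes the time of day should be ignored and that the result should have the same type as the input. Note that `datetime.datetime` is a subclass of `datetime.date`, so the check must test for `datetime` specifically.

```python
import datetime


def get_closest_quarter(target):
    """Closest quarter-end to `target`, which may be a date or a datetime.

    A datetime is reduced to its calendar date first (assumption: time of day
    is ignored), and a datetime at midnight is returned in that case.
    """
    is_datetime = isinstance(target, datetime.datetime)
    day = target.date() if is_datetime else target
    candidates = [
        datetime.date(day.year - 1, 12, 31),
        datetime.date(day.year, 3, 31),
        datetime.date(day.year, 6, 30),
        datetime.date(day.year, 9, 30),
        datetime.date(day.year, 12, 31),
    ]
    # min() returns the first of equally close candidates, i.e. the earlier one
    best = min(candidates, key=lambda d: abs(day - d))
    return datetime.datetime.combine(best, datetime.time()) if is_datetime else best


print(get_closest_quarter(datetime.datetime(2014, 5, 27, 15, 30)))
```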
You can get away with comparing just two dates, using ind = (dt.month-1) // 3 + 1 to get the index for the current quarter:
from datetime import date

def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    # index of the current quarter's end in qrts (1..4)
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    return curr_qr if abs(curr_qr - dt) < abs(last_qr - dt) else last_qr
If you want to return the later quarter-end in the case of a tie, as with your example date, just use <=:
from datetime import date

dt = date(2015, 8, 15)

def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    return curr_qr if abs(curr_qr - dt) <= abs(last_qr - dt) else last_qr

print(find_qrt(dt))
The first function returns 2015-06-30, because with < a tie goes to the earlier quarter-end. The second returns 2015-09-30, because with <= the current quarter wins a tie.
Say we have a system that runs continuously, and that we make some changes to it on a particular date start_date.
We would like to compare the effects of the changes between:
The time window of full days between start_date and today's date (shown in yellow below)
The equivalent time window (same days of the week) of full days that took place right before start_date (shown in blue below)
For example, say I started my experiments on March 25 (in red), and that today is March 29 (green). I would like to obtain the four dates that define
time_window_before (the two dates in blue) and time_window_after (the two dates in yellow).
The idea is to compare the system's behavior before and after start_date over the longest possible time window that is symmetric (in terms of days of the week) around start_date.
In other words, given start_date and today's date, how can I find the pairs of dates that define time_window_before and time_window_after (as datetime objects)?
Update
Since I was asked what happens if start_date and today's date don't fall in the same week, below is one such example:
Python's datetime library has all the methods you need to add and subtract the dates:
from datetime import date, timedelta

def get_time_window_after(experiment_start_date, experiment_end_date):
    # Add 1 day to start and subtract 1 day from end
    print("After Start: %s" % (experiment_start_date + timedelta(days=1)))
    print("After End: %s" % (experiment_end_date - timedelta(days=1)))

def get_time_window_before(experiment_start_date, experiment_end_date):
    # Find the total length of the experiment
    delta = experiment_end_date - experiment_start_date
    # Determine how many weeks it covers (add 1 because same week would be 0)
    delta_magnitude = 1 + (delta.days // 7)
    # Subtract 7 days per week that the experiment covered, also add/subtract 1 day
    print("Before Start: %s" % (experiment_start_date - timedelta(days=7 * delta_magnitude) + timedelta(days=1)))
    print("Before End: %s" % (experiment_end_date - timedelta(days=7 * delta_magnitude) - timedelta(days=1)))
Here are the examples I ran the code with to make sure it works:
print("\nResults for March 25 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 25), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 25), date(2014, 3, 29))
print("\nResults for March 18 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 18), date(2014, 3, 29))
print("\nResults for March 18 2014 to April 04 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 4, 4))
get_time_window_before(date(2014, 3, 18), date(2014, 4, 4))
Note: if you make these functions return values instead of printing them, you could use the time_window_after as input to get_time_window_before() and forgo the duplicated timedelta(days=1) logic.
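Following that note, here is a sketch where both functions return values and the before-window is derived from the after-window. The function names are hypothetical. The week count `(last - start).days // 7 + 1`, the smallest whole-week shift that moves the window's last day before `start`, is an assumption of this sketch. For the three example ranges above it gives the same dates as the printing version.

```python
from datetime import date, timedelta


def time_window_after(start, end):
    """Full days strictly between `start` and `end`."""
    return start + timedelta(days=1), end - timedelta(days=1)


def time_window_before(start, window_after):
    """Shift `window_after` back by whole weeks so it ends before `start`.

    Whole weeks keep the days of the week identical in both windows.
    """
    first, last = window_after
    weeks = (last - start).days // 7 + 1
    shift = timedelta(weeks=weeks)
    return first - shift, last - shift


for start, end in [(date(2014, 3, 25), date(2014, 3, 29)),
                   (date(2014, 3, 18), date(2014, 3, 29))]:
    after = time_window_after(start, end)
    print(after, time_window_before(start, after))
```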
This works at least for your two samples; would this be good enough?
import datetime

experiment_start_date = datetime.date(2014, 3, 18)
now = datetime.date(2014, 3, 29)
day_after1 = experiment_start_date + datetime.timedelta(1)
day_after2 = now - datetime.timedelta(1)
day_before2 = experiment_start_date - datetime.timedelta(day_after2.weekday() - experiment_start_date.weekday() + 1)
day_before1 = day_before2 - (day_after2 - day_after1)
The following should do it:
from datetime import date, timedelta

def get_symmetric_time_window(ref_date, end_date):
    # Window after ref_date: full days strictly between ref_date and end_date
    da1 = ref_date + timedelta(1)
    da2 = end_date - timedelta(1)
    # Shift back by the smallest number of whole weeks that puts da2 before ref_date,
    # so both windows cover the same days of the week
    weeks = (da2 - ref_date).days // 7 + 1
    db2 = da2 - timedelta(weeks=weeks)
    db1 = db2 - (da2 - da1)
    return da1, da2, db1, db2
Test 1:
In: get_symmetric_time_window(date(2014, 3, 18), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 19),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 5),
datetime.date(2014, 3, 14))
Test 2:
In: get_symmetric_time_window(date(2014, 3, 25), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 26),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 19),
datetime.date(2014, 3, 21))
Test 3:
In: get_symmetric_time_window(date(2014, 7, 17), date(2014, 7, 23))
Out:
(datetime.date(2014, 7, 18),
datetime.date(2014, 7, 22),
datetime.date(2014, 7, 11),
datetime.date(2014, 7, 15))