I am trying to find the quarter-end closest to a given date: e.g. the closest quarter-end for 5/27/2014 would be 6/30/2014 and for 2/2/2013 would be 12/31/2012. I have the following but it doesn't give me the expected output for a date like 8/15/2015:
import datetime
tester = datetime.datetime(2015, 8, 15)
calendar_date = datetime.datetime(tester.year - 1, 12, 31)
for dd in [(3, 31), (6, 30), (9, 30), (12, 31)]:
    diff = abs(datetime.datetime(tester.year, dd[0], dd[1]) - tester)
    if diff.days <= 45:
        calendar_date = datetime.datetime(tester.year, dd[0], dd[1])
        break
print(tester, calendar_date)
I've simplified things by assuming each quarter is 90 days long and taking half of that, 45 days, as the cut-off (is there a better way?), but clearly that doesn't work for 8/15/2015, as it prints:
2015-08-15 00:00:00 2014-12-31 00:00:00
I was expecting 2015-09-30 00:00:00
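A datetime.timedelta can be negative, and diff.days <= 45 is always true for a negative timedelta, so without the abs() call every earlier quarter-end would pass the test. With abs() in place, the problem is the fixed 45-day cut-off: 2015-08-15 is 46 days from both 2015-06-30 and 2015-09-30, so no candidate passes and the initial value, 2014-12-31, is printed.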
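A quick check of the distances makes the cut-off problem concrete. This is a minimal sketch that reuses the question's tester date and candidate quarter-ends, and also shows the signed (non-abs) timedelta for one candidate:

```python
import datetime

tester = datetime.datetime(2015, 8, 15)

# Absolute distance in days from tester to each quarter-end of the same year
distances = {
    (month, day): abs(datetime.datetime(tester.year, month, day) - tester).days
    for month, day in [(3, 31), (6, 30), (9, 30), (12, 31)]
}
print(distances)  # {(3, 31): 137, (6, 30): 46, (9, 30): 46, (12, 31): 138}

# Without abs() the timedelta is negative for earlier dates, so "<= 45" is true
signed = (datetime.datetime(tester.year, 6, 30) - tester).days
print(signed)  # -46
```

Every candidate is more than 45 days away, so the loop never breaks.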
You already had a simple solution with the candidates in place. These are:
the last quarter-end of the year before the target date
all four quarter-ends of the target date's year
datetime.timedelta objects support the comparison operators, i.e. they form a total order, which means there's a minimum. As noted in the comments by Padraic Cunningham, you want the candidate with the minimum absolute distance to the target date:
import datetime

def get_closest_quarter(target):
    # candidate list, nicely enough none of these
    # are in February, so the month lengths are fixed
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # take the minimum according to the absolute distance to
    # the target date.
    return min(candidates, key=lambda d: abs(target - d))
The code here uses datetime.date for simplicity, but it should be easy to generalize to datetime.datetime if necessary.
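For example, one way to accept either a datetime.date or a datetime.datetime is to reduce a datetime to its date before comparing. This is a sketch under that assumption (the time of day is ignored), and closest_quarter_end is a hypothetical name rather than part of the answer above:

```python
import datetime


def closest_quarter_end(target):
    """Quarter-end date closest to target, which may be a date or a datetime.

    Assumption: a datetime is reduced to its calendar date, so the time of
    day is ignored.
    """
    # datetime.datetime is a subclass of datetime.date, so test for it first
    if isinstance(target, datetime.datetime):
        target = target.date()
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    return min(candidates, key=lambda d: abs(target - d))


print(closest_quarter_end(datetime.datetime(2014, 5, 27, 13, 45)))  # 2014-06-30
print(closest_quarter_end(datetime.date(2013, 2, 2)))               # 2012-12-31
```

Because min() keeps the first of several equal keys and the candidates are listed in date order, a tie such as 2015-08-15 resolves to the earlier quarter-end, 2015-06-30.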
You can get away with comparing just two dates, using ind = (dt.month-1) // 3 + 1 to get the index for the current quarter:
from datetime import date

def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    return curr_qr if abs(curr_qr - dt) < abs(last_qr - dt) else last_qr
If you want to return the later quarter-end in the case of a tie, as with your example date, just use <= instead:
from datetime import date

def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    ind = (dt.month - 1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind - 1]
    return curr_qr if abs(curr_qr - dt) <= abs(last_qr - dt) else last_qr

dt = date(2015, 8, 15)
print(find_qrt(dt))
The first function returns 2015-06-30 because the earlier date wins a tie; the second returns 2015-09-30 because it takes the current quarter in the event of a tie.
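If you would rather choose the tie-break explicitly, the quarter-ends can also be computed arithmetically instead of taken from a fixed list. A sketch, assuming a date input; nearest_quarter_end and its prefer_later flag are hypothetical names, and calendar.monthrange supplies the month length:

```python
import calendar
from datetime import date


def quarter_end(year, quarter):
    """Return the last day of quarter 1-4 of the given year."""
    month = 3 * quarter
    # monthrange() returns (weekday of the 1st, number of days in the month)
    return date(year, month, calendar.monthrange(year, month)[1])


def nearest_quarter_end(dt, prefer_later=False):
    """Closest quarter-end to dt; prefer_later picks the later one on a tie."""
    quarter = (dt.month - 1) // 3 + 1
    curr_qr = quarter_end(dt.year, quarter)
    # The quarter-end before Q1 is 31 December of the previous year
    if quarter == 1:
        last_qr = quarter_end(dt.year - 1, 4)
    else:
        last_qr = quarter_end(dt.year, quarter - 1)
    d_curr, d_last = abs(curr_qr - dt), abs(last_qr - dt)
    if d_curr == d_last:
        return curr_qr if prefer_later else last_qr
    return curr_qr if d_curr < d_last else last_qr


print(nearest_quarter_end(date(2015, 8, 15)))                     # 2015-06-30
print(nearest_quarter_end(date(2015, 8, 15), prefer_later=True))  # 2015-09-30
```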
I have code that processes data for worked shifts.
In it, I have arrays for the start and end of each shift (e.g. shift_start[0] and shift_end[0] for shift #1), and I need to know how many weekdays, holidays, or weekend days fall between them.
I have already defined the holidays as an array of datetime entries representing the holidays of a specific country (not the ones shown here; I'm not looking for more dynamic options yet).
So basically I have this:
import datetime
started = [datetime.datetime(2022, 2, 1, 0, 0), datetime.datetime(2022, 2, 5, 8, 0), datetime.datetime(2022, 2, 23, 11, 19, 28)]
ended = [datetime.datetime(2022, 2, 2, 16, 0), datetime.datetime(2022, 2, 5, 17, 19, 28), datetime.datetime(2022, 4, 26, 12, 30)]
holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3), datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]
I'm looking for a way to go through each of the 3 ranges and count how many days of each kind it contains (e.g. the first range should contain 2 weekdays, the second one weekend day).
So, based on the suggestion by @gimix, I was able to develop what I needed:
for each_start, each_end in zip(started, ended):  # For each period
    for single_date in self.daterange(each_start, each_end):  # For each day of each period
        # Checking if holiday or weekend
        if (single_date.replace(hour=0, minute=0, second=0, microsecond=0) in holidays) or (single_date.weekday() > 4):
            set_special_days_worked(1)
        # If not holiday or weekend, then it is a regular working day
        else:
            set_regular_days_worked(1)
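Since self.daterange and the set_*_days_worked counters are not shown, here is a self-contained sketch of the same idea that returns the counts instead. The daterange helper below is a hypothetical stand-in that yields every calendar date touched by a shift (start and end days included, so partial days count as whole days), and holidays are compared by date only:

```python
import datetime


def daterange(start, end):
    """Hypothetical stand-in for self.daterange: yield every calendar date
    from start's date to end's date, both included."""
    day = start.date()
    while day <= end.date():
        yield day
        day += datetime.timedelta(days=1)


def count_days(started, ended, holidays):
    """Return a (regular_days, special_days) pair for each shift."""
    holiday_dates = {h.date() for h in holidays}
    counts = []
    for each_start, each_end in zip(started, ended):
        regular = special = 0
        for day in daterange(each_start, each_end):
            # weekday() is 5 for Saturday and 6 for Sunday
            if day in holiday_dates or day.weekday() > 4:
                special += 1
            else:
                regular += 1
        counts.append((regular, special))
    return counts


started = [datetime.datetime(2022, 2, 1, 0, 0), datetime.datetime(2022, 2, 5, 8, 0),
           datetime.datetime(2022, 2, 23, 11, 19, 28)]
ended = [datetime.datetime(2022, 2, 2, 16, 0), datetime.datetime(2022, 2, 5, 17, 19, 28),
         datetime.datetime(2022, 4, 26, 12, 30)]
holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3),
            datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]

print(count_days(started, ended, holidays))  # [(2, 0), (0, 1), (42, 21)]
```

Comparing plain dates against a set avoids having to reset hour, minute, second and microsecond before the membership test.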
I would like to get the same calendar date in every year, from its next occurrence after today up to, but not including, an end date. For example, if my end is "20251220", I would like to get the following list of dates:
"20211220","20221220","20231220","20241220". However, if it were "20250220", I only need "20220220","20230220","20240220", as we have already passed February. I tried a simple loop myself (see below), where I would then check at the end whether the first date is in the past. But I think there must be a built-in function to do this, via pandas or dateutil etc.
I've tried this:
In [455]: import datetime as dt
In [456]: end = dt.date(2025, 12,20)
In [457]: start = dt.date(dt.datetime.today().year, end.month, end.day)
In [458]: periods = end.year-start.year
In [461]: l = [dt.date(start.year + i, 12, 20) for i in range(0,periods)]
In [462]: l
Out[462]:
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20),
datetime.date(2025, 12, 20)]
One idea, using a list comprehension (the outputs below were produced in 2021, so they depend on today's date):
import datetime as dt
end = dt.date(2025, 12,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20)]
import datetime as dt
end = dt.date(2025, 2,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2022, 2, 20), datetime.date(2023, 2, 20), datetime.date(2024, 2, 20)]
A solution that also handles 29 February (it uses pandas):
import datetime as dt
import pandas as pd

end = dt.date(2028, 2, 29)
today = dt.datetime.today()
l = [(end + pd.offsets.DateOffset(year=i)).date()
     for i in range(today.year, end.year)
     if (end + pd.offsets.DateOffset(year=i)) > today]
print(l)
[datetime.date(2022, 2, 28),
datetime.date(2023, 2, 28),
datetime.date(2024, 2, 29),
datetime.date(2025, 2, 28),
datetime.date(2026, 2, 28),
datetime.date(2027, 2, 28)]
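Without pandas, the 29 February case can be handled with the standard library by clamping the day to the length of the month. A sketch, assuming the same rule as the answers above (years from the current one up to, but excluding, end.year, keeping only dates after today); same_date_each_year is a hypothetical name, and today is a parameter so the result does not depend on when the code runs:

```python
import calendar
import datetime as dt


def same_date_each_year(end, today=None):
    """end's month/day in each year from today.year up to (excluding)
    end.year, keeping only dates after today."""
    if today is None:
        today = dt.date.today()
    result = []
    for year in range(today.year, end.year):
        # Clamp the day so 29 February becomes 28 February in non-leap years
        day = min(end.day, calendar.monthrange(year, end.month)[1])
        candidate = dt.date(year, end.month, day)
        if candidate > today:
            result.append(candidate)
    return result


today = dt.date(2021, 6, 1)  # fixed "today" so the output is reproducible
print([d.isoformat() for d in same_date_each_year(dt.date(2025, 12, 20), today)])
# ['2021-12-20', '2022-12-20', '2023-12-20', '2024-12-20']
print([d.isoformat() for d in same_date_each_year(dt.date(2028, 2, 29), today)])
# ['2022-02-28', '2023-02-28', '2024-02-29', '2025-02-28', '2026-02-28', '2027-02-28']
```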
How do I convert a list of dates in the form yyyy-mm-dd HH:MM to serial numbers? For example, if I have this list of dates:
t = [1898-10-12 06:00,1898-10-12 12:00,1932-09-30 08:00,1932-09-30 00:00]
How do I convert each date to a serial number? I'm currently using the datetime toordinal() method, but dates that fall on the same day all get the same serial number. How do I make the same dates with different times map to different numbers?
The items in the list are datetime.datetime objects. I then tried:
thurser = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
But I am not getting serial numbers as floats.
datetime.toordinal() considers only the date part of the datetime object, not the time (date.toordinal() behaves the same way, since a date has only a date part). The first 2 and last 2 elements in your list fall on the same date but at different times, so .toordinal() gives the same value for each of those pairs.
In general, the solution would be to calculate the delta between your dates and a pre-determined/fixed one. I'm using datetime.datetime(1, 1, 1), the earliest possible datetime, so all the deltas are positive:
import datetime

thurser = []
# assuming t is a list of datetime objects
for d in t:
    delta = d - datetime.datetime(1, 1, 1)
    thurser.append(delta.days + delta.seconds / (24 * 3600))
>>> print(thurser)
[693149.25, 693149.5, 705555.3333333334, 705555.0]
And if you prefer ints instead of floats, then use seconds instead of days:
thurser.append(int(delta.total_seconds()))  # total_seconds() includes microseconds as the fractional part
>>> print(thurser)
[59888095200, 59888116800, 60959980800, 60959952000]
And to get back the original values in the 2nd example:
>>> [datetime.timedelta(seconds=d) + datetime.datetime(1, 1, 1) for d in thurser]
[datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
>>> _ == t # compare with original values
True
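If you want numbers on the same scale as toordinal() itself, you can add the elapsed fraction of the day to the ordinal. A minimal sketch, assuming naive datetimes (fractional_ordinal is a hypothetical helper name). Its whole-number part equals toordinal(), so each value is 1 larger than the delta.days-based value above, because datetime(1, 1, 1) has ordinal 1:

```python
import datetime


def fractional_ordinal(d):
    """Proleptic Gregorian ordinal of d plus the fraction of the day elapsed."""
    midnight = datetime.datetime.combine(d.date(), datetime.time())
    return d.toordinal() + (d - midnight).total_seconds() / 86400


t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
     datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
serials = [fractional_ordinal(d) for d in t]
print(serials)
```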
Let me know if I've misunderstood. I tried the following, and it gives a distinct number for each value in the list.
I replaced
t = ['1898-10-12 06:00','1898-10-12 12:00','1932-09-30 08:00','1932-09-30 00:00']
with
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
As mentioned in the comments, it is a list of datetime.datetime objects.
I use the total number of milliseconds from 1970-01-01 00:00:00 to the given date to generate a number.
So dates before that epoch give negative values, but the values are still distinct.
import datetime

t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
thurser = []
x = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
    # milliseconds since 1970-01-01 00:00:00 (negative for earlier dates)
    x.append((t[i] - datetime.datetime(1970, 1, 1)).total_seconds() * 1000.0)
print(thurser)
print(x)
output:
[693150, 693150, 705556, 705556]
[-2247501600000.0, -2247480000000.0, -1175616000000.0, -1175644800000.0]
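On Python 3, datetime.timestamp() gives the same epoch offset in seconds. A sketch, assuming the naive values in t are meant as UTC: attaching datetime.timezone.utc matters, because timestamp() on a naive datetime interprets it as local time.

```python
import datetime

t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
     datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]

# Treat each naive datetime as UTC, then convert seconds to milliseconds
epoch_ms = [d.replace(tzinfo=datetime.timezone.utc).timestamp() * 1000 for d in t]
print(epoch_ms)  # [-2247501600000.0, -2247480000000.0, -1175616000000.0, -1175644800000.0]
```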
Let's say I have the following data frame. I want to calculate the average number of days between all the activities for a particular account.
Below is my desired result:
Now I know how to calculate the number of days between two dates with the following code. But I don't know how to calculate what I am looking for across multiple dates.
from datetime import date
d0 = date(2016, 8, 18)
d1 = date(2016, 9, 26)
delta = d1 - d0
print(delta.days)
I would do this as follows in pandas (assuming the Date column is a datetime64):
In [11]: df
Out[11]:
Account Activity Date
0 A a 2015-10-21
1 A b 2016-07-07
2 A c 2016-07-07
3 A d 2016-09-14
4 A e 2016-10-12
5 B a 2015-11-24
6 B b 2015-12-30
In [12]: df.groupby("Account")["Date"].apply(lambda x: x.diff().mean())
Out[12]:
Account
A 89 days 06:00:00
B 36 days 00:00:00
Name: Date, dtype: timedelta64[ns]
If your dates are in a list:
>>> from datetime import date
>>> dates = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7), date(2016, 9, 14), date(2016, 10, 12), date(2016, 10, 12), date(2016, 11, 22), date(2016, 12, 21)]
>>> differences = [(dates[i]-dates[i-1]).days for i in range(1, len(dates))] #[260, 0, 69, 28, 0, 41, 29]
>>> float(sum(differences))/len(differences)
61.0
>>>
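For sorted dates, the average of the consecutive differences telescopes to (last - first) / (number of gaps), so the list of differences is not strictly needed. A sketch that sorts the dates first in case they are unordered (mean_gap_days is a hypothetical helper name):

```python
from datetime import date


def mean_gap_days(dates):
    """Average number of days between consecutive dates, after sorting.

    The consecutive gaps telescope: their sum is last - first, so the mean
    is (last - first) / (number of gaps).
    """
    ordered = sorted(dates)
    if len(ordered) < 2:
        raise ValueError("need at least two dates")
    return (ordered[-1] - ordered[0]).days / (len(ordered) - 1)


dates = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7), date(2016, 9, 14),
         date(2016, 10, 12), date(2016, 10, 12), date(2016, 11, 22), date(2016, 12, 21)]
print(mean_gap_days(dates))  # 61.0

# Account A from the pandas example: 89.25 days, i.e. 89 days 06:00:00
account_a = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7),
             date(2016, 9, 14), date(2016, 10, 12)]
print(mean_gap_days(account_a))  # 89.25
```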
Say we have a system that runs continuously, and that we make some changes to it on a particular date start_date.
We would like to compare the effects of the changes between:
The time window of full days between start_date and today's date (shown in yellow below)
The equivalent time window (same days of the week) of full days that took place right before start_date (shown in blue below)
For example, say I started my experiments on March 25 (in red) and today is March 29 (green); I would like to obtain the four dates that define
time_window_after (the two dates in yellow) and time_window_before (the two dates in blue).
The idea is to compare the system's behavior before and after the experiment started on start_date, over the longest possible time window that is symmetric (in terms of days of the week) around start_date.
In other words, given start_date and today's date, how can I find the pairs of dates that define time_window_before and time_window_after (as datetime objects)?
Update
Since I was asked what happens if start_date and today's date don't fall in the same week, below is one such example:
Python's datetime library has all the methods you need to add and subtract the dates:
from datetime import date, timedelta

def get_time_window_after(experiment_start_date, experiment_end_date):
    # Add 1 day to start and subtract 1 day from end
    print("After Start: %s" % (experiment_start_date + timedelta(days=1)))
    print("After End: %s" % (experiment_end_date - timedelta(days=1)))

def get_time_window_before(experiment_start_date, experiment_end_date):
    # Find the total length of the experiment
    delta = experiment_end_date - experiment_start_date
    # Determine how many weeks it covers (add 1 because same week would be 0)
    delta_magnitude = 1 + (delta.days // 7)
    # Subtract 7 days per week that the experiment covered, also add/subtract 1 day
    print("Before Start: %s" % (experiment_start_date - timedelta(days=7 * delta_magnitude) + timedelta(days=1)))
    print("Before End: %s" % (experiment_end_date - timedelta(days=7 * delta_magnitude) - timedelta(days=1)))
Here are the examples I ran the code with to make sure it works:
print("\nResults for March 25 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 25), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 25), date(2014, 3, 29))
print("\nResults for March 18 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 18), date(2014, 3, 29))
print("\nResults for March 18 2014 to April 04 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 4, 4))
get_time_window_before(date(2014, 3, 18), date(2014, 4, 4))
Note: If you make these functions return values and store them in variables, you could pass the time_window_after result into the get_time_window_before() function and forgo the duplicated timedelta(days = 1) logic.
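Following that note, here is a sketch of a single function that returns the dates instead of printing them and derives the "before" window from the "after" window. get_time_windows is a hypothetical name; the week arithmetic is the same as in get_time_window_before:

```python
from datetime import date, timedelta


def get_time_windows(experiment_start_date, experiment_end_date):
    """Return (after_start, after_end, before_start, before_end)."""
    # 'After' window: full days strictly between the two dates
    after_start = experiment_start_date + timedelta(days=1)
    after_end = experiment_end_date - timedelta(days=1)
    # Same number of whole weeks as get_time_window_before shifts back
    weeks = 1 + (experiment_end_date - experiment_start_date).days // 7
    shift = timedelta(days=7 * weeks)
    # Reuse the 'after' window instead of repeating the +/- 1 day logic
    return after_start, after_end, after_start - shift, after_end - shift


for start, end in [(date(2014, 3, 25), date(2014, 3, 29)),
                   (date(2014, 3, 18), date(2014, 3, 29)),
                   (date(2014, 3, 18), date(2014, 4, 4))]:
    print([d.isoformat() for d in get_time_windows(start, end)])
```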
This works at least for your two samples; would it be good enough?
import datetime

experiment_start_date = datetime.date(2014, 3, 18)
now = datetime.date(2014, 3, 29)
day_after1 = experiment_start_date + datetime.timedelta(1)
day_after2 = now - datetime.timedelta(1)
# step back to the last day before the start that has the same weekday as day_after2
days_back = (experiment_start_date.weekday() - day_after2.weekday() - 1) % 7 + 1
day_before2 = experiment_start_date - datetime.timedelta(days_back)
day_before1 = day_before2 - (day_after2 - day_after1)
The following should do it:
import math
from datetime import date, timedelta

def get_symmetric_time_window(ref_date, end_date):
    da1 = ref_date + timedelta(1)
    da2 = end_date - timedelta(1)
    # shift back by the smallest whole number of weeks that moves da2 before ref_date
    weeks = math.ceil((end_date - ref_date).days / 7)
    db2 = da2 - timedelta(7 * weeks)
    db1 = db2 - (da2 - da1)
    return da1, da2, db1, db2
Test 1:
In: get_symmetric_time_window(date(2014, 3, 18), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 19),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 5),
datetime.date(2014, 3, 14))
Test 2:
In: get_symmetric_time_window(date(2014, 3, 25), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 26),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 19),
datetime.date(2014, 3, 21))
Test 3:
In: get_symmetric_time_window(date(2014, 7, 17), date(2014, 7, 23))
Out:
(datetime.date(2014, 7, 18),
datetime.date(2014, 7, 22),
datetime.date(2014, 7, 11),
datetime.date(2014, 7, 15))
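Whichever variant you use, the windows should satisfy a few properties that are easy to check. A hypothetical helper (not from any of the answers) that verifies them; for example, a "before" window that overlaps the "after" window fails the check:

```python
from datetime import date


def is_symmetric_window(start_date, after_start, after_end, before_start, before_end):
    """True if both windows have the same length and weekdays and lie on
    opposite sides of start_date."""
    return (
        before_end < start_date < after_start
        and after_end - after_start == before_end - before_start
        and after_start.weekday() == before_start.weekday()
        and after_end.weekday() == before_end.weekday()
    )


# Output of Test 3 above
print(is_symmetric_window(date(2014, 7, 17),
                          date(2014, 7, 18), date(2014, 7, 22),
                          date(2014, 7, 11), date(2014, 7, 15)))  # True
```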