Calculate Average Number of Days Between Multiple Dates - python

Let's say I have the following data frame. I want to calculate the average number of days between all the activities for a particular account.
Below is my desired result:
Now I know how to calculate the number of days between two dates with the following code. But I don't know how to calculate what I am looking for across multiple dates.
from datetime import date
d0 = date(2016, 8, 18)
d1 = date(2016, 9, 26)
delta = d1 - d0
print(delta.days)

I would do this as follows in pandas (assuming the Date column is a datetime64):
In [11]: df
Out[11]:
  Account Activity       Date
0       A        a 2015-10-21
1       A        b 2016-07-07
2       A        c 2016-07-07
3       A        d 2016-09-14
4       A        e 2016-10-12
5       B        a 2015-11-24
6       B        b 2015-12-30
In [12]: df.groupby("Account")["Date"].apply(lambda x: x.diff().mean())
Out[12]:
Account
A 89 days 06:00:00
B 36 days 00:00:00
Name: Date, dtype: timedelta64[ns]
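If a plain number of days is more convenient than a timedelta, something along these lines might work. It is only a sketch: it rebuilds the sample frame shown above, and the sort_values call is an added assumption in case the rows are not already in date order within each account.

```python
import pandas as pd

# Sample data copied from the frame shown above
df = pd.DataFrame({
    "Account": ["A", "A", "A", "A", "A", "B", "B"],
    "Activity": ["a", "b", "c", "d", "e", "a", "b"],
    "Date": pd.to_datetime([
        "2015-10-21", "2016-07-07", "2016-07-07", "2016-09-14",
        "2016-10-12", "2015-11-24", "2015-12-30",
    ]),
})

# Sort first (assumption: input order is not guaranteed), then take the
# mean of the whole-day gaps between consecutive dates per account
avg_days = (
    df.sort_values(["Account", "Date"])
      .groupby("Account")["Date"]
      .apply(lambda s: s.diff().dt.days.mean())
)
print(avg_days)  # A: 89.25, B: 36.0
```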

If your dates are in a list:
>>> from datetime import date
>>> dates = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7), date(2016, 9, 14), date(2016, 10, 12), date(2016, 10, 12), date(2016, 11, 22), date(2016, 12, 21)]
>>> differences = [(dates[i]-dates[i-1]).days for i in range(1, len(dates))] #[260, 0, 69, 28, 0, 41, 29]
>>> float(sum(differences))/len(differences)
61.0
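The same idea can be wrapped in a small helper. This is a sketch only; the name mean_gap_days is made up for illustration, and it sorts the input on the assumption that the dates may arrive out of order. Because consecutive gaps telescope, the result equals (last date - first date) / (n - 1).

```python
from datetime import date

def mean_gap_days(dates):
    """Average number of days between consecutive dates (sorted first)."""
    ordered = sorted(dates)
    if len(ordered) < 2:
        raise ValueError("need at least two dates")
    # Pair each date with its successor and collect the gaps in days
    gaps = [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Account A from the question
dates = [date(2015, 10, 21), date(2016, 7, 7), date(2016, 7, 7),
         date(2016, 9, 14), date(2016, 10, 12)]
print(mean_gap_days(dates))  # 89.25
```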

Related

yearly spaced dates on the exact same date

I would like to get the same calendar date for every year from today up to (but not including) an end date. For example, if my end date is "20251220", I would like to get the following list of dates:
"20211220", "20221220", "20231220", "20241220". However, if it were "20250220", I would only need "20220220", "20230220", "20240220", as February has already passed this year. I tried a simple loop myself (see below), after which I would check whether the first date is in the past. But I think there must be a built-in function for this, in pandas or dateutil for example.
I've tried this:
In [455]: import datetime as dt
In [456]: end = dt.date(2025, 12,20)
In [457]: start = dt.date(dt.datetime.today().year, end.month, end.day)
In [458]: periods = end.year-start.year
In [461]: l = [dt.date(start.year + i, 12, 20) for i in range(0,periods)]
In [462]: l
Out[462]:
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20),
datetime.date(2025, 12, 20)]
One idea with list comprehension:
import datetime as dt
end = dt.date(2025, 12,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2021, 12, 20),
datetime.date(2022, 12, 20),
datetime.date(2023, 12, 20),
datetime.date(2024, 12, 20)]
import datetime as dt
end = dt.date(2025, 2,20)
today = dt.datetime.today()
l = [end.replace(year=i)
     for i in range(today.year, end.year)
     if end.replace(year=i) > today.date()]
print(l)
[datetime.date(2022, 2, 20), datetime.date(2023, 2, 20), datetime.date(2024, 2, 20)]
Solution working with 29 February:
import datetime as dt
import pandas as pd
end = dt.date(2028, 2, 29)
today = dt.datetime.today()
# DateOffset(year=i) sets the year; 29 February becomes the 28th in non-leap years
l = [(end + pd.offsets.DateOffset(year=i)).date()
     for i in range(today.year, end.year)
     if (end + pd.offsets.DateOffset(year=i)) > today]
print(l)
[datetime.date(2022, 2, 28),
datetime.date(2023, 2, 28),
datetime.date(2024, 2, 29),
datetime.date(2025, 2, 28),
datetime.date(2026, 2, 28),
datetime.date(2027, 2, 28)]
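If pandas is not available, the same leap-day handling could be sketched with the standard library alone. The function name yearly_dates_before and its explicit today parameter are hypothetical, added here so the result does not depend on the day the code is run.

```python
import calendar
import datetime as dt

def yearly_dates_before(end, today):
    """Yearly occurrences of end's month/day that fall after today
    and before end's own year (hypothetical helper)."""
    result = []
    for year in range(today.year, end.year):
        # Clamp the day to the month's length, so 29 Feb becomes 28 Feb
        # in non-leap years
        last_day = calendar.monthrange(year, end.month)[1]
        candidate = dt.date(year, end.month, min(end.day, last_day))
        if candidate > today:
            result.append(candidate)
    return result

# Assumed "today" for illustration only
today = dt.date(2021, 6, 1)
print(yearly_dates_before(dt.date(2025, 2, 20), today))
print(yearly_dates_before(dt.date(2028, 2, 29), today))
```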

composing a dataframe with an index of a datetime

I have two lists, one of which contains datetimes. How can I combine them to form a DataFrame with the datetimes as the index and the values of lista2 as the data?
lista1 = [datetime.datetime(2017, 11, 11, 0, 0), datetime.datetime(2017, 11, 12, 0, 0), datetime.datetime(2017, 11, 13, 0, 0)]
lista2 = [31488, 14335, 89]
You can use the index parameter from the constructor to specify a list as indices, and use the other one as data:
pd.DataFrame(lista2,index=lista1)
For your sample data, this gives:
>>> pd.DataFrame(lista2,index=lista1)
                0
2017-11-11  31488
2017-11-12  14335
2017-11-13     89
Alternatively, zip the two lists into a list of tuples and set the first column as the index:
pd.DataFrame(list(zip(lista1,lista2))).set_index(0)
Out[646]:
                1
0
2017-11-11  31488
2017-11-12  14335
2017-11-13     89
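Both approaches leave the column and the index labelled with integers. If named labels are preferred, a variation like the following might be used; the names "date" and "value" are illustrative assumptions, not part of the question.

```python
import datetime
import pandas as pd

lista1 = [datetime.datetime(2017, 11, 11, 0, 0),
          datetime.datetime(2017, 11, 12, 0, 0),
          datetime.datetime(2017, 11, 13, 0, 0)]
lista2 = [31488, 14335, 89]

# "value" and "date" are made-up labels for the column and the index
df = pd.DataFrame({"value": lista2},
                  index=pd.DatetimeIndex(lista1, name="date"))
print(df)
```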

Find the closest quarter-end for a date?

I am trying to find the quarter-end closest to a given date: e.g. the closest quarter-end for 5/27/2014 would be 6/30/2014 and for 2/2/2013 would be 12/31/2012. I have the following but it doesn't give me the expected output for a date like 8/15/2015:
import datetime
tester = datetime.datetime(2015, 8, 15)
calendar_date = datetime.datetime(tester.year - 1, 12, 31)
for dd in [(3, 31), (6, 30), (9, 30), (12, 31)]:
    diff = abs(datetime.datetime(tester.year, dd[0], dd[1]) - tester)
    if diff.days <= 45:
        calendar_date = datetime.datetime(tester.year, dd[0], dd[1])
        break
print(tester, calendar_date)
I've simplified by assuming each quarter is 90 days and taking half of that, 45 days (is there a better way?), but clearly that doesn't work for 8/15/2015, as it prints:
2015-08-15 00:00:00 2014-12-31 00:00:00
I was expecting 2015-09-30 00:00:00
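Without abs(), a datetime.timedelta can be negative, and diff.days <= 45 is always true for negative time deltas. With abs(), as in the code above, 8/15/2015 is 46 days from both 6/30/2015 and 9/30/2015, so no candidate passes the 45-day test and the default of 12/31/2014 is kept, hence the incorrect result.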
You already have a simple solution with the candidates in place. These are the last quarter-end of the year before the target date and all four quarter-ends of the current year.
datetime.timedelta objects have relative comparison operators, i.e. they form a total order, which means there's a minimum. As noted in the comments by Padraic Cunningham, you want the candidate with the minimum absolute distance to the target date:
import datetime
def get_closest_quarter(target):
    # candidate list, nicely enough none of these
    # are in February, so the month lengths are fixed
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # take the minimum according to the absolute distance to
    # the target date.
    return min(candidates, key=lambda d: abs(target - d))
The code here uses datetime.date for simplicity, but it should be easy to generalize to datetime.datetime if necessary.
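One possible way to generalize, sketched under the assumption that the time of day can simply be dropped: subtracting a datetime.date from a datetime.datetime raises a TypeError, so the input is converted to a date first. The name closest_quarter_end is made up for this example.

```python
import datetime

def closest_quarter_end(target):
    """Accepts a date or a datetime; returns the nearest quarter-end date."""
    # datetime.datetime is a subclass of datetime.date, so test for it first
    if isinstance(target, datetime.datetime):
        target = target.date()  # assumption: time of day is ignored
    candidates = [
        datetime.date(target.year - 1, 12, 31),
        datetime.date(target.year, 3, 31),
        datetime.date(target.year, 6, 30),
        datetime.date(target.year, 9, 30),
        datetime.date(target.year, 12, 31),
    ]
    # min() returns the first minimum, so the earlier quarter wins a tie
    return min(candidates, key=lambda d: abs(target - d))

print(closest_quarter_end(datetime.datetime(2014, 5, 27)))  # 2014-06-30
print(closest_quarter_end(datetime.date(2013, 2, 2)))       # 2012-12-31
```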
You can get away with comparing just two dates, using ind = (dt.month-1) // 3 + 1 to get the index for the current quarter:
from datetime import date
def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    # index of the quarter-end that closes the current quarter
    ind = (dt.month-1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind-1]
    return curr_qr if abs(curr_qr - dt) < abs(last_qr - dt) else last_qr
If you want to return the later quarter in the case of a tie as per your example date we just need to use <=:
from datetime import date
dt = date(2015, 8, 15)
def find_qrt(dt):
    qrts = [date(dt.year - 1, 12, 31), date(dt.year, 3, 31),
            date(dt.year, 6, 30), date(dt.year, 9, 30),
            date(dt.year, 12, 31),
            ]
    ind = (dt.month-1) // 3 + 1
    curr_qr, last_qr = qrts[ind], qrts[ind-1]
    # <= prefers the current quarter when both are equally close
    return curr_qr if abs(curr_qr - dt) <= abs(last_qr - dt) else last_qr
print(find_qrt(dt))
The first function returns 2015-06-30 because the earlier date wins the tie; the second returns 2015-09-30 because it takes the current quarter in the event of a tie.

Mirroring dates in Python

Say we have a system that runs continuously, and that we make some changes to it on a particular date start_date.
We would like to compare the effects of the changes between:
The time window of full days between start_date and today's date (shown in yellow below)
The equivalent time window (same days of the week) of full days that took place right before start_date (shown in blue below)
For example, say I started my experiments on March 25 (in red) and today is March 29 (in green). I would like to obtain the four dates that define time_window_before (the two dates in yellow) and time_window_after (the two dates in blue).
The idea is to compare the results of the experiment started on start_date before and after the experiment started, on the longest possible number of days on a time window that is symmetric (in terms of days of the week) to the date the experiment started.
In other words, given start_date and today's date, how can I find the pairs of dates that define time_window_before and time_window_after (as datetime objects)?
Update
Since I was asked what happens if start_date and today's date don't fall on the same week, below is one such example:
Python's datetime library has all the methods you need to add and subtract the dates:
from datetime import date, timedelta
def get_time_window_after(experiment_start_date, experiment_end_date):
    # Add 1 day to start and subtract 1 day from end
    print("After Start: %s" % (experiment_start_date + timedelta(days=1)))
    print("After End: %s" % (experiment_end_date - timedelta(days=1)))
def get_time_window_before(experiment_start_date, experiment_end_date):
    # Find the total length of the experiment
    delta = experiment_end_date - experiment_start_date
    # Determine how many weeks it covers (add 1 because same week would be 0)
    delta_magnitude = 1 + (delta.days // 7)
    # Subtract 7 days per week that the experiment covered, also add/subtract 1 day
    print("Before Start: %s" % (experiment_start_date - timedelta(days=7 * delta_magnitude) + timedelta(days=1)))
    print("Before End: %s" % (experiment_end_date - timedelta(days=7 * delta_magnitude) - timedelta(days=1)))
Here are the examples I ran the code with to make sure it works:
print("\nResults for March 25 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 25), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 25), date(2014, 3, 29))
print("\nResults for March 18 2014 to March 29 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 3, 29))
get_time_window_before(date(2014, 3, 18), date(2014, 3, 29))
print("\nResults for March 18 2014 to April 04 2014")
get_time_window_after(date(2014, 3, 18), date(2014, 4, 4))
get_time_window_before(date(2014, 3, 18), date(2014, 4, 4))
Note: If you make these functions return values and set variables, you could use the time_window_after as input into get_time_window_before() function and forego the duplicated timedelta(days = 1) logic.
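A rough sketch of what that refactoring could look like, keeping the same week arithmetic as the functions above. The names time_window_after and time_window_before are illustrative; each returns a (start, end) pair instead of printing.

```python
from datetime import date, timedelta

def time_window_after(start_date, end_date):
    # Full days strictly between the two dates
    return start_date + timedelta(days=1), end_date - timedelta(days=1)

def time_window_before(start_date, end_date):
    after_start, after_end = time_window_after(start_date, end_date)
    # Same week count as above: add 1 because the same week would be 0
    weeks = 1 + (end_date - start_date).days // 7
    shift = timedelta(days=7 * weeks)
    # Shifting by whole weeks keeps the days of the week aligned
    return after_start - shift, after_end - shift

print(time_window_after(date(2014, 3, 25), date(2014, 3, 29)))
print(time_window_before(date(2014, 3, 25), date(2014, 3, 29)))
```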
This works for at least your two samples; would that be good enough?
import datetime
experiment_start_date = datetime.date(2014, 3, 18)
now = datetime.date(2014, 3, 29)
day_after1 = experiment_start_date + datetime.timedelta(1)
day_after2 = now - datetime.timedelta(1)
day_before2 = experiment_start_date - datetime.timedelta(day_after2.weekday() - experiment_start_date.weekday() + 1)
day_before1 = day_before2 - (day_after2 - day_after1)
The following should do it:
from datetime import date, timedelta
def get_symmetric_time_window(ref_date, end_date):
    # Window after the change: full days strictly between ref_date and end_date
    da1 = ref_date + timedelta(1)
    da2 = end_date - timedelta(1)
    # Shift back by the smallest whole number of weeks that places the
    # mirrored window entirely before ref_date (weekdays stay aligned)
    weeks = (da2 - ref_date).days // 7 + 1
    db2 = da2 - timedelta(7 * weeks)
    db1 = db2 - (da2 - da1)
    return da1, da2, db1, db2
Test 1:
In: get_symmetric_time_window(date(2014, 3, 18), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 19),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 5),
datetime.date(2014, 3, 14))
Test 2:
In: get_symmetric_time_window(date(2014, 3, 25), date(2014, 3, 29))
Out:
(datetime.date(2014, 3, 26),
datetime.date(2014, 3, 28),
datetime.date(2014, 3, 19),
datetime.date(2014, 3, 21))
Test 3:
In: get_symmetric_time_window(date(2014, 7, 17), date(2014, 7, 23))
Out:
(datetime.date(2014, 7, 18),
datetime.date(2014, 7, 22),
datetime.date(2014, 7, 11),
datetime.date(2014, 7, 15))
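One way to sanity-check results like these is to test the three properties the question asks for: equal window lengths, matching days of the week, and a "before" window that ends before the start date. The helper below is only a sketch and its name is made up.

```python
from datetime import date

def is_mirrored(after_start, after_end, before_start, before_end, start_date):
    """True if the before-window mirrors the after-window (hypothetical check)."""
    same_length = (after_end - after_start) == (before_end - before_start)
    same_weekdays = (after_start.weekday() == before_start.weekday()
                     and after_end.weekday() == before_end.weekday())
    entirely_before = before_end < start_date
    return same_length and same_weekdays and entirely_before

# Values taken from Test 3 above
print(is_mirrored(date(2014, 7, 18), date(2014, 7, 22),
                  date(2014, 7, 11), date(2014, 7, 15),
                  date(2014, 7, 17)))  # True
```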

Pandas: Combine different timespans and cumsum

I have the following DataFrame:
from datetime import datetime
from pandas import DataFrame
df = DataFrame({
'Buyer': ['Carl', 'Carl', 'Carl', 'Carl', 'Joe', 'Carl'],
'Quantity': [18, 3, 5, 1, 9, 3],
'Date': [
datetime(2013, 9, 1, 13, 0),
datetime(2013, 9, 1, 13, 5),
datetime(2013, 10, 1, 20, 0),
datetime(2013, 10, 3, 10, 0),
datetime(2013, 12, 2, 12, 0),
datetime(2013, 9, 2, 14, 0),
]
})
First: I am looking to add another column to this DataFrame which sums up the purchases of the last 5 days for each buyer. In particular the result should look like this:
                  Quantity
Buyer Date
Carl  2013-09-01        21
      2013-09-02        24
      2013-10-01         5
      2013-10-03         6
Joe   2013-12-02         9
To do so I started with the following:
df1 = (df.set_index(['Date', 'Buyer'])
       .unstack(level=[1])
       .resample('D').sum()
       .fillna(0))
However, I do not know how to add another column to this DataFrame which can add up for each row the previous 5 row entries.
Second:
Add another column to this DataFrame that not only sums the purchases of the last 5 days, as in the first part, but also weights these purchases by their age. For example, purchases from 5 days ago should count 20%, those from 4 days ago 40%, those from 3 days ago 60%, those from 2 days ago 80%, and those from one day ago and from today 100%.
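To make the weighting concrete, here is a plain-Python sketch of the arithmetic (deliberately not pandas, so it is an illustration of the rule rather than a DataFrame solution). It assumes purchases are first summed per buyer and calendar day and that age is counted in whole calendar days; the names WEIGHTS and weighted_totals are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

# Percent weight by age in days, as described in the question
WEIGHTS = {0: 100, 1: 100, 2: 80, 3: 60, 4: 40, 5: 20}

purchases = [
    ("Carl", datetime(2013, 9, 1, 13, 0), 18),
    ("Carl", datetime(2013, 9, 1, 13, 5), 3),
    ("Carl", datetime(2013, 10, 1, 20, 0), 5),
    ("Carl", datetime(2013, 10, 3, 10, 0), 1),
    ("Joe", datetime(2013, 12, 2, 12, 0), 9),
    ("Carl", datetime(2013, 9, 2, 14, 0), 3),
]

def weighted_totals(purchases):
    # Step 1: total quantity per (buyer, calendar day)
    daily = defaultdict(int)
    for buyer, when, qty in purchases:
        daily[(buyer, when.date())] += qty
    # Step 2: for each day with a purchase, add up the same buyer's
    # purchases from the previous 5 days, scaled by their age
    result = {}
    for buyer, day in sorted(daily):
        total = 0
        for (other_buyer, other_day), qty in daily.items():
            age = (day - other_day).days
            if other_buyer == buyer and age in WEIGHTS:
                total += qty * WEIGHTS[age]
        result[(buyer, day)] = total / 100
    return result

for key, value in weighted_totals(purchases).items():
    print(key, value)
```

For Carl on 2013-10-03, for instance, this gives 1 (today, 100%) plus 5 from two days earlier at 80%, that is 5.0.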
