Iterate over every quarter of the year - python

I am trying to iterate over every quarter of the year. This is what I have so far:
from datetime import datetime, timedelta
from dateutil import rrule

now = datetime.now()
first_day = datetime(year=now.year, month=1, day=1)
print("--", first_day)
hundredDaysLater = first_day - timedelta(days=100)
for dt in rrule.rrule(rrule.MONTHLY, dtstart=first_day, bymonthday=(31, -1), count=6, interval=3):
    print(dt.replace(day=1))
    print(dt)
Output:
-- 2018-01-01 00:00:00
2018-01-01 00:00:00
2018-01-31 00:00:00
2018-04-01 00:00:00
2018-04-30 00:00:00
2018-07-01 00:00:00
2018-07-31 00:00:00
2018-10-01 00:00:00
2018-10-31 00:00:00
2019-01-01 00:00:00
2019-01-31 00:00:00
2019-04-01 00:00:00
Now I basically need to get this output:
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
But instead of counting forwards I need to count backwards.

Using relativedelta (from the dateutil package):
>>> from dateutil.relativedelta import relativedelta
>>> from datetime import date
>>> d = date(2019, 1, 1)
>>> day = relativedelta(days=1)
>>> quarter = relativedelta(months=3)
>>> while True:
...     print(d - day)
...     print(d - quarter)
...     d -= quarter
...
2018-12-31
2018-10-01
2018-09-30
2018-07-01
2018-06-30
2018-04-01
2018-03-31
2018-01-01
2017-12-31
2017-10-01
2017-09-30
2017-07-01
2017-06-30
2017-04-01
2017-03-31
...

Short solution with the pandas.date_range function (assuming import datetime and import pandas as pd):
In [708]: start_date = datetime.datetime(2018, 1, 1)
In [709]: data = pd.date_range(start=start_date, periods=4, freq='QS-JAN').union(
     ...:     pd.date_range(start=start_date, periods=4, freq='Q-DEC')).strftime('%Y-%m-%d')
In [710]: for i in range(0, data.size, 2):
     ...:     print(data[i], data[i+1], sep=', ')
     ...:
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
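For the backwards-counting pairs the question asks for, here is a minimal sketch that uses only the standard library. The function name quarters_backwards and its count parameter are made up for illustration, and it assumes calendar quarters starting in January.

```python
import calendar
from datetime import date


def quarters_backwards(year, count=4):
    """Yield (first_day, last_day) for `count` quarters, newest first,
    starting with Q4 of `year`. Hypothetical helper, not a library API."""
    q = year * 4 + 3  # quarter index counted from year 0; +3 selects Q4
    for _ in range(count):
        y, k = divmod(q, 4)            # k is 0..3 for Q1..Q4
        first_month = 3 * k + 1
        last_month = first_month + 2
        last_day = calendar.monthrange(y, last_month)[1]
        yield date(y, first_month, 1), date(y, last_month, last_day)
        q -= 1                         # step one quarter back


for start, end in quarters_backwards(2018):
    print(start, end, sep=", ")
# 2018-10-01, 2018-12-31
# 2018-07-01, 2018-09-30
# 2018-04-01, 2018-06-30
# 2018-01-01, 2018-03-31
```

Counting quarters as a single integer means the step from Q1 to the previous year's Q4 needs no special case.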

Related

Slice a df into windows of 3Y and 1M with a date range Python

I have a df with a date index as follows:
ind = pd.date_range(start="2015-12-31", end="2022-04-26", freq="D")
df = pd.DataFrame(
    {
        "col1": range(len(ind))
    },
    index=ind
)
What I need is to slice the df into windows that each start at a month end, beginning with 2017-08-31, and span 3 years plus 1 month. So far I have the following code:
from datetime import timedelta
from dateutil.relativedelta import relativedelta

n = timedelta(365 * 3) + relativedelta(months=1)
fechas_ = pd.date_range("2017-08-31", ind.max() - n, freq="M")
# create a for loop to check the beginning and the end of each window
for i in fechas_:
    print(f"start: {i}")
    print(f"end: {i + n}")
    print("\n")
My problem is that I need the last day of the month as the end of each window e.g.:
# first window
start: 2017-08-31 00:00:00
end: 2020-09-30 00:00:00
# second window
start: 2017-09-30 00:00:00
end: 2020-10-31 00:00:00
# so on
But I'm getting:
# first window
start: 2017-08-31 00:00:00
end: 2020-09-29 00:00:00
# second window
start: 2017-09-30 00:00:00
end: 2020-10-29 00:00:00
# 3
2017-10-31 00:00:00
2020-11-29 00:00:00
# 4
2017-11-30 00:00:00
2020-12-29 00:00:00
# 5
2017-12-31 00:00:00
2021-01-30 00:00:00
# 6
2018-01-31 00:00:00
2021-02-27 00:00:00
# 7
2018-02-28 00:00:00
2021-03-27 00:00:00
# 8
2018-03-31 00:00:00
2021-04-29 00:00:00
# 9
2018-04-30 00:00:00
2021-05-29 00:00:00
# 10
2018-05-31 00:00:00
2021-06-29 00:00:00
# 11
2018-06-30 00:00:00
2021-07-29 00:00:00
# 12
2018-07-31 00:00:00
2021-08-30 00:00:00
# 13
2018-08-31 00:00:00
2021-09-29 00:00:00
# 14
2018-09-30 00:00:00
2021-10-29 00:00:00
# 15
2018-10-31 00:00:00
2021-11-29 00:00:00
# 16
2018-11-30 00:00:00
2021-12-29 00:00:00
# 17
2018-12-31 00:00:00
2022-01-30 00:00:00
# 18
2019-01-31 00:00:00
2022-02-27 00:00:00
# 19
2019-02-28 00:00:00
2022-03-27 00:00:00
Does anyone know how I can solve this?
Thanks a lot
In your code
n = timedelta(365 * 3) + relativedelta(months=1)
try replacing it with
n = relativedelta(years=3, months=1, day=31)
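The original drifts because timedelta(365 * 3) is a fixed 1095 days, so a leap day inside the span lands the result one day early, and adding one month to the 30th never reaches the 31st. In relativedelta, the absolute day=31 argument is clamped to the last valid day of the resulting month. The same month-end arithmetic can be sketched with the standard library alone; the helper name month_end_after is hypothetical and only illustrates the idea.

```python
import calendar
from datetime import date


def month_end_after(start, years=3, months=1):
    """Return the last day of the month that lies `years` years and
    `months` months after the month of `start` (hypothetical helper)."""
    # Work in whole months counted from year 0 so year carries are automatic.
    total = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, last_day)


print(month_end_after(date(2017, 8, 31)))  # 2020-09-30
print(month_end_after(date(2017, 9, 30)))  # 2020-10-31
print(month_end_after(date(2018, 1, 31)))  # 2021-02-28
```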

After groupby, evaluate value in column against column values in all rows in the group

I am looking for the following functionality in python:
I have a Pandas DataFrame with 4 columns: ID, StartDate, EndDate, Moment (in the example below these are the index and the columns date1, date2 and moment).
I want to group by ID and, for each row in a group, evaluate whether its moment value falls inside ANY of the time windows [date1, date2] in that group. For example, the following DataFrame has two groups (ID=1 and ID=2) of 5 rows each, and I want one boolean per row.
import pandas as pd
i = pd.date_range('2018-04-11', periods=10, freq='2D20min')
i2 = pd.date_range('2018-04-12', periods=10, freq='2D20min')
i3 = pd.date_range('2018-04-9', periods=10, freq='1D6H')
id = ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2']
ts = pd.DataFrame({'date1': i, 'date2': i2, 'moment': i3}, index=id)
ID date1 date2 moment
1 2018-04-11 00:00:00 2018-04-12 00:00:00 2018-04-09 00:00:00
1 2018-04-13 00:20:00 2018-04-14 00:20:00 2018-04-10 06:00:00
1 2018-04-15 00:40:00 2018-04-16 00:40:00 2018-04-11 12:00:00
1 2018-04-17 01:00:00 2018-04-18 01:00:00 2018-04-12 18:00:00
1 2018-04-19 01:20:00 2018-04-20 01:20:00 2018-04-14 00:00:00
2 2018-04-21 01:40:00 2018-04-22 01:40:00 2018-04-15 06:00:00
2 2018-04-23 02:00:00 2018-04-24 02:00:00 2018-04-16 12:00:00
2 2018-04-25 02:20:00 2018-04-26 02:20:00 2018-04-17 18:00:00
2 2018-04-27 02:40:00 2018-04-28 02:40:00 2018-04-19 00:00:00
2 2018-04-29 03:00:00 2018-04-30 03:00:00 2018-04-20 06:00:00
In this case, the value for moment in the first row of the first group does not fall in any of the five time intervals. Neither does the second. The third value, 2018-04-11 12:00:00 does fall in the interval in the first row and I would thus want to have True returned.
The desired result would look as follows:
ID date1 date2 moment result
1 2018-04-11 00:00:00 2018-04-12 00:00:00 2018-04-09 00:00:00 False
1 2018-04-13 00:20:00 2018-04-14 00:20:00 2018-04-10 06:00:00 False
1 2018-04-15 00:40:00 2018-04-16 00:40:00 2018-04-11 12:00:00 True
1 2018-04-17 01:00:00 2018-04-18 01:00:00 2018-04-12 18:00:00 False
1 2018-04-19 01:20:00 2018-04-20 01:20:00 2018-04-14 00:00:00 True
2 2018-04-21 01:40:00 2018-04-22 01:40:00 2018-04-15 06:00:00 False
2 2018-04-23 02:00:00 2018-04-24 02:00:00 2018-04-16 12:00:00 False
2 2018-04-25 02:20:00 2018-04-26 02:20:00 2018-04-17 18:00:00 False
2 2018-04-27 02:40:00 2018-04-28 02:40:00 2018-04-19 00:00:00 False
2 2018-04-29 03:00:00 2018-04-30 03:00:00 2018-04-20 06:00:00 False
EDIT
I already 'solved' this problem with the following approach but am looking for a more pythonic and perhaps faster way...
boolean_result = []
for c in ts.index.unique():
    temp = ts.loc[ts.index == c]
    for row in temp.index:
        current_date = temp['moment'][row]
        boolean_result.append(max((temp['date1'] <= current_date)
                                  & (current_date <= temp['date2'])))
ts['Result'] = boolean_result
This may be very slow if your DataFrame is large, and there may well be a better solution than this one:
def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= x and x <= end:
        return True
    else:
        return False
# empty list to be appended
result = []
test_list = []
for i in ts.index.unique():
    temp_df = ts[ts.index == i]
    for j in range(0, len(temp_df)):
        for k in range(0, len(temp_df)):
            test_list.append(time_in_range(temp_df.date1.iloc[k], temp_df.date2.iloc[k], temp_df.moment.iloc[j]))
        result.append(any(test_list))
        # reset the list
        test_list = []
ts['result'] = result
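The nested loops compare every moment with every window in its group. A different technique, sketched here in plain Python rather than pandas, is to merge each group's overlapping windows once and then locate each moment with a binary search. The function names and the (id, start, end, moment) tuple layout are assumptions made for illustration; the sample rows are taken from the question's table.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def merge_windows(windows):
    """Merge overlapping closed intervals and return them sorted by start."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def moments_in_any_window(rows):
    """rows: iterable of (group_id, start, end, moment) tuples.
    Returns one boolean per row, in input order."""
    rows = list(rows)
    by_group = defaultdict(list)
    for gid, start, end, _ in rows:
        by_group[gid].append((start, end))
    merged = {gid: merge_windows(w) for gid, w in by_group.items()}
    starts = {gid: [s for s, _ in m] for gid, m in merged.items()}
    result = []
    for gid, _, _, moment in rows:
        # Index of the last merged window starting at or before the moment.
        i = bisect_right(starts[gid], moment) - 1
        result.append(i >= 0 and moment <= merged[gid][i][1])
    return result


rows = [
    ("1", datetime(2018, 4, 11), datetime(2018, 4, 12), datetime(2018, 4, 9)),
    ("1", datetime(2018, 4, 13, 0, 20), datetime(2018, 4, 14, 0, 20), datetime(2018, 4, 10, 6)),
    ("1", datetime(2018, 4, 15, 0, 40), datetime(2018, 4, 16, 0, 40), datetime(2018, 4, 11, 12)),
    ("2", datetime(2018, 4, 21, 1, 40), datetime(2018, 4, 22, 1, 40), datetime(2018, 4, 15, 6)),
]
print(moments_in_any_window(rows))  # [False, False, True, False]
```

Because merged windows are disjoint and sorted, only the last window that starts at or before the moment can contain it, so one lookup per row is enough.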

Error rounding time to previous 15 min - Python

I've developed a crude method to round timestamps to the previous 15 mins. For instance, if the timestamp is 8:10:00, it gets rounded to 8:00:00.
However, when it goes over 15 mins it rounds to the previous hour. For instance, if the timestamp was 8:20:00, it gets rounded to 7:00:00 for some reason? I'll list the two examples below.
Correct Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:00:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            -timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
08:00:00
Incorrect Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:20:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            -timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
07:00:00
I don't understand what I'm doing wrong?
- timedelta(hours=t.minute//15)
If minute is 20, then minute // 15 equals 1, so you're subtracting one hour.
Try this instead:
return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)
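As a self-contained illustration of that fix, here is a minimal sketch using only the standard library. The function name floor_to_minutes and the step parameter are made up for this example, and step is assumed to divide 60 evenly.

```python
from datetime import datetime


def floor_to_minutes(t, step=15):
    """Round t down to the previous multiple of `step` minutes in its hour.
    Assumes `step` divides 60 evenly (hypothetical helper)."""
    return t.replace(minute=t.minute // step * step, second=0, microsecond=0)


print(floor_to_minutes(datetime(2018, 1, 1, 8, 10)).time())      # 08:00:00
print(floor_to_minutes(datetime(2018, 1, 1, 8, 20)).time())      # 08:15:00
print(floor_to_minutes(datetime(2018, 1, 1, 8, 59, 59)).time())  # 08:45:00
```

Integer division followed by multiplication keeps the hour untouched and only snaps the minutes, which is why 8:20 now maps to 8:15 rather than 7:00.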
Use .dt.floor('15min') to round down to 15-minute intervals.
import pandas as pd
df = pd.DataFrame({'Time': pd.date_range('2018-01-01', freq='13.141min', periods=13)})
df['prev_15'] = df.Time.dt.floor('15min')
Output:
Time prev_15
0 2018-01-01 00:00:00.000 2018-01-01 00:00:00
1 2018-01-01 00:13:08.460 2018-01-01 00:00:00
2 2018-01-01 00:26:16.920 2018-01-01 00:15:00
3 2018-01-01 00:39:25.380 2018-01-01 00:30:00
4 2018-01-01 00:52:33.840 2018-01-01 00:45:00
5 2018-01-01 01:05:42.300 2018-01-01 01:00:00
6 2018-01-01 01:18:50.760 2018-01-01 01:15:00
7 2018-01-01 01:31:59.220 2018-01-01 01:30:00
8 2018-01-01 01:45:07.680 2018-01-01 01:45:00
9 2018-01-01 01:58:16.140 2018-01-01 01:45:00
10 2018-01-01 02:11:24.600 2018-01-01 02:00:00
11 2018-01-01 02:24:33.060 2018-01-01 02:15:00
12 2018-01-01 02:37:41.520 2018-01-01 02:30:00
There are also .dt.round() and .dt.ceil() if you need the nearest 15-minute interval or the following one, respectively.

How to subtract time when there is a date change in pandas?

I have the following DataFrame in pandas:
start_date start_time end_time
2018-01-01 23:55:00 00:05:00
2018-01-02 00:05:00 00:10:00
2018-01-03 23:59:00 00:05:00
I want to calculate the time difference, but for the 1st and 3rd observations the date changes before end_time.
How can I do it in pandas?
Currently, where end_time is less than start_time, I create one more column called end_date that increments start_date by 1 day, and then subtract the times.
Is there any other way to do it?
A solution working with timedeltas: if the days component of the difference equals -1, add one day:
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
d = df['end_time'] - df['start_time']
df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
Another solution:
import numpy as np
s = df['end_time'] - df['start_time']
df['diff'] = np.where(df['end_time'] < df['start_time'],
                      s + pd.Timedelta(1, unit='d'),
                      s)
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
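Both solutions add one day whenever the raw difference is negative. The same wrap-around can be expressed as a remainder modulo one day, shown here as a minimal sketch with plain datetime.timedelta values instead of pandas columns. The function name elapsed is hypothetical, and the sketch assumes the end time is less than 24 hours after the start.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)


def elapsed(start, end):
    """Time from start to end, assuming end is less than 24 hours after
    start. Both arguments are times of day expressed as timedeltas."""
    # A negative difference wraps around midnight; % maps it into [0, 1 day).
    return (end - start) % ONE_DAY


print(elapsed(timedelta(hours=23, minutes=55), timedelta(minutes=5)))  # 0:10:00
print(elapsed(timedelta(minutes=5), timedelta(minutes=10)))            # 0:05:00
print(elapsed(timedelta(hours=23, minutes=59), timedelta(minutes=5)))  # 0:06:00
```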

Stepping through date in Python

Let's say I have a starting date of datetime(2007, 2, 15).
I want to step this date in a loop so that it's advanced to the 1st and 15th of each month.
So datetime(2007, 2, 15) would step to datetime(2007, 3, 1).
In the next iteration, it would step to datetime(2007, 3, 15)... then to datetime(2007, 4, 1) and so forth.
Is there any way to do this with timedelta or dateutil, considering that the number of days it has to step by keeps changing?
from datetime import datetime
for m in range(1, 13):
    for d in (1, 15):
        print(str(datetime(2013, m, d)))
2013-01-01 00:00:00
2013-01-15 00:00:00
2013-02-01 00:00:00
2013-02-15 00:00:00
2013-03-01 00:00:00
2013-03-15 00:00:00
2013-04-01 00:00:00
2013-04-15 00:00:00
2013-05-01 00:00:00
2013-05-15 00:00:00
2013-06-01 00:00:00
2013-06-15 00:00:00
2013-07-01 00:00:00
2013-07-15 00:00:00
2013-08-01 00:00:00
2013-08-15 00:00:00
2013-09-01 00:00:00
2013-09-15 00:00:00
2013-10-01 00:00:00
2013-10-15 00:00:00
2013-11-01 00:00:00
2013-11-15 00:00:00
2013-12-01 00:00:00
2013-12-15 00:00:00
I tend to work with datetime more than date objects, but you could use datetime.date depending on your needs.
I'd iterate through each day and ignore any date where the day of month isn't 1 or 15. Example:
import datetime
current_time = datetime.datetime(2007,2,15)
end_time = datetime.datetime(2008,4,1)
while current_time <= end_time:
    if current_time.day in [1,15]:
        print(current_time)
    current_time += datetime.timedelta(days=1)
This way you can iterate across multiple years and start on the 15th, both of which would be problematic with doog's solution.
from datetime import datetime
d = datetime(month=2,year=2007,day=15)
current_day = next_day = d.day
current_month = next_month = d.month
current_year = next_year = d.year
for i in range(25):
    if current_day == 1:
        next_day = 15
    elif current_day == 15:
        next_day = 1
        if current_month == 12:
            next_month = 1
            next_year+=1
        else:
            next_month+=1
    new_date=datetime(month=next_month,year=next_year,day=next_day)
    print(new_date)
    current_day,current_month,current_year=next_day,next_month,next_year
2007-03-01 00:00:00
2007-03-15 00:00:00
2007-04-01 00:00:00
2007-04-15 00:00:00
2007-05-01 00:00:00
2007-05-15 00:00:00
2007-06-01 00:00:00
2007-06-15 00:00:00
2007-07-01 00:00:00
2007-07-15 00:00:00
2007-08-01 00:00:00
2007-08-15 00:00:00
2007-09-01 00:00:00
2007-09-15 00:00:00
2007-10-01 00:00:00
2007-10-15 00:00:00
2007-11-01 00:00:00
2007-11-15 00:00:00
2007-12-01 00:00:00
2007-12-15 00:00:00
2008-01-01 00:00:00
2008-01-15 00:00:00
2008-02-01 00:00:00
2008-02-15 00:00:00
2008-03-01 00:00:00
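Starting mid-month and crossing year boundaries can also be handled without visiting every day. Here is a minimal sketch using only the standard library; the generator name semi_monthly is made up for illustration, and it treats any start day before the 15th as stepping to the 15th of the same month.

```python
from datetime import datetime
from itertools import islice


def semi_monthly(start):
    """Yield the successive 1st and 15th of each month after `start`.
    Hypothetical helper; the time of day of `start` is preserved."""
    current = start
    while True:
        if current.day < 15:
            # Next stop is the 15th of the same month.
            current = current.replace(day=15)
        elif current.month == 12:
            # Roll over into January of the following year.
            current = current.replace(year=current.year + 1, month=1, day=1)
        else:
            current = current.replace(month=current.month + 1, day=1)
        yield current


for dt in islice(semi_monthly(datetime(2007, 2, 15)), 4):
    print(dt)
# 2007-03-01 00:00:00
# 2007-03-15 00:00:00
# 2007-04-01 00:00:00
# 2007-04-15 00:00:00
```

The generator is unbounded, so itertools.islice (or a break on an end date) decides how far to step.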
