Let's say I have a starting date of datetime(2007, 2, 15).
I want to step this date in a loop so that it's advanced to the 1st and 15th of each month.
So datetime(2007, 2, 15) would step to datetime(2007, 3, 1).
In the next iteration, it would step to datetime(2007, 3, 15)... then to datetime(2007, 4, 1) and so forth.
Is there any way to do this with timedelta or dateutil, given that the number of days it has to step by keeps changing?
from datetime import datetime
for m in range(1, 13):
    for d in (1, 15):
        print(datetime(2013, m, d))
2013-01-01 00:00:00
2013-01-15 00:00:00
2013-02-01 00:00:00
2013-02-15 00:00:00
2013-03-01 00:00:00
2013-03-15 00:00:00
2013-04-01 00:00:00
2013-04-15 00:00:00
2013-05-01 00:00:00
2013-05-15 00:00:00
2013-06-01 00:00:00
2013-06-15 00:00:00
2013-07-01 00:00:00
2013-07-15 00:00:00
2013-08-01 00:00:00
2013-08-15 00:00:00
2013-09-01 00:00:00
2013-09-15 00:00:00
2013-10-01 00:00:00
2013-10-15 00:00:00
2013-11-01 00:00:00
2013-11-15 00:00:00
2013-12-01 00:00:00
2013-12-15 00:00:00
I tend to work with datetime more than date objects, but you could use datetime.date depending on your needs.
I'd iterate through each day and ignore any date where the day of month isn't 1 or 15. Example:
import datetime
current_time = datetime.datetime(2007, 2, 15)
end_time = datetime.datetime(2008, 4, 1)
while current_time <= end_time:
    # only report the 1st and 15th; every other day is skipped
    if current_time.day in [1, 15]:
        print(current_time)
    current_time += datetime.timedelta(days=1)
This way you can iterate across multiple years and start on the 15th, both of which would be problematic with doog's solution.
from datetime import datetime
d = datetime(month=2, year=2007, day=15)
current_day = next_day = d.day
current_month = next_month = d.month
current_year = next_year = d.year
for i in range(25):
    if current_day == 1:
        # from the 1st, step to the 15th of the same month
        next_day = 15
    elif current_day == 15:
        # from the 15th, step to the 1st of the following month
        next_day = 1
        if current_month == 12:
            next_month = 1
            next_year += 1
        else:
            next_month += 1
    new_date = datetime(month=next_month, year=next_year, day=next_day)
    print(new_date)
    current_day, current_month, current_year = next_day, next_month, next_year
2007-03-01 00:00:00
2007-03-15 00:00:00
2007-04-01 00:00:00
2007-04-15 00:00:00
2007-05-01 00:00:00
2007-05-15 00:00:00
2007-06-01 00:00:00
2007-06-15 00:00:00
2007-07-01 00:00:00
2007-07-15 00:00:00
2007-08-01 00:00:00
2007-08-15 00:00:00
2007-09-01 00:00:00
2007-09-15 00:00:00
2007-10-01 00:00:00
2007-10-15 00:00:00
2007-11-01 00:00:00
2007-11-15 00:00:00
2007-12-01 00:00:00
2007-12-15 00:00:00
2008-01-01 00:00:00
2008-01-15 00:00:00
2008-02-01 00:00:00
2008-02-15 00:00:00
2008-03-01 00:00:00
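The stepping rule in the answers above can also be written as a small helper that works from any starting day, not only the 1st or 15th. This is a minimal sketch using only the standard library; the function name next_semi_monthly is made up for illustration and is not part of datetime or dateutil.

```python
from datetime import datetime

def next_semi_monthly(dt):
    """Return the next 1st or 15th that falls after dt's calendar day."""
    if dt.day < 15:
        # any day before the 15th steps to the 15th of the same month
        return dt.replace(day=15)
    if dt.month == 12:
        # the 15th or later in December rolls over to 1 January
        return dt.replace(year=dt.year + 1, month=1, day=1)
    # the 15th or later steps to the 1st of the following month
    return dt.replace(month=dt.month + 1, day=1)

d = datetime(2007, 2, 15)
steps = []
for _ in range(4):
    d = next_semi_monthly(d)
    steps.append(d)
    print(d)
```

Because the step is computed from the calendar fields instead of a fixed timedelta, the varying month lengths never have to be handled explicitly.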
I have a df with a date index as follows:
ind = pd.date_range(start="2015-12-31", end="2022-04-26", freq="D")
df = pd.DataFrame(
    {
        "col1": range(len(ind))
    },
    index=ind
)
What I need is to slice the df into windows that start at the end of each month, beginning with 2017-08-31, and that each span 3 years plus 1 month, so I have the following chunk of code:
from datetime import timedelta
from dateutil.relativedelta import relativedelta
n = timedelta(365 * 3) + relativedelta(months=1)
fechas_ = pd.date_range("2017-08-31", ind.max() - n, freq="M")
# create a for loop to check the beginning and the end of each window
for i in fechas_:
    print(f"start: {i}")
    print(f"end: {i + n}")
    print("\n")
My problem is that I need the last day of the month as the end of each window e.g.:
# first window
start: 2017-08-31 00:00:00
end: 2020-09-30 00:00:00
# second window
start: 2017-09-30 00:00:00
end: 2020-10-31 00:00:00
# so on
But I'm getting:
# first window
start: 2017-08-31 00:00:00
end: 2020-09-29 00:00:00
# second window
start: 2017-09-30 00:00:00
end: 2020-10-29 00:00:00
# 3
2017-10-31 00:00:00
2020-11-29 00:00:00
# 4
2017-11-30 00:00:00
2020-12-29 00:00:00
# 5
2017-12-31 00:00:00
2021-01-30 00:00:00
# 6
2018-01-31 00:00:00
2021-02-27 00:00:00
# 7
2018-02-28 00:00:00
2021-03-27 00:00:00
# 8
2018-03-31 00:00:00
2021-04-29 00:00:00
# 9
2018-04-30 00:00:00
2021-05-29 00:00:00
# 10
2018-05-31 00:00:00
2021-06-29 00:00:00
# 11
2018-06-30 00:00:00
2021-07-29 00:00:00
# 12
2018-07-31 00:00:00
2021-08-30 00:00:00
# 13
2018-08-31 00:00:00
2021-09-29 00:00:00
# 14
2018-09-30 00:00:00
2021-10-29 00:00:00
# 15
2018-10-31 00:00:00
2021-11-29 00:00:00
# 16
2018-11-30 00:00:00
2021-12-29 00:00:00
# 17
2018-12-31 00:00:00
2022-01-30 00:00:00
# 18
2019-01-31 00:00:00
2022-02-27 00:00:00
# 19
2019-02-28 00:00:00
2022-03-27 00:00:00
Does anyone know how I can solve this?
Thanks a lot
In your code
n = timedelta(365 * 3) + relativedelta(months=1)
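try replacing it with the following (day=31 is an absolute day that gets clamped to the last day of the resulting month):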
n = relativedelta(years=3, months=1, day=31)
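To see why the replacement helps, here is a small sketch that applies it to three of the start dates from the question. It assumes python-dateutil is installed; the variable names starts and ends are only for illustration. The original offset adds a fixed 1095 days, which ignores the leap day in 2020, whereas the singular day=31 argument sets the day of month after the years and months have been added and is capped at the month's length.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

# years/months are relative; day=31 is absolute and clamped to month end
n = relativedelta(years=3, months=1, day=31)

starts = [datetime(2017, 8, 31), datetime(2017, 9, 30), datetime(2018, 1, 31)]
ends = [s + n for s in starts]

for s, e in zip(starts, ends):
    print(f"start: {s}  end: {e}")
```

The first two windows end on 2020-09-30 and 2020-10-31, matching the expected output in the question.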
I have time data in a column and am trying to figure out how to get it into datetime format:
2000
2100
2300
2355
0
1
5
10
100
105
330
My question is how I can get these into datetime format.
The output should be:
20:00:00
21:00:00
23:00:00
23:55:00
00:00:00
00:01:00
00:05:00
00:10:00
01:00:00
01:05:00
03:30:00
tried:
1. da = pd.to_datetime(330, format='%H%M')
output: '03:30:00'
2. d= str(datetime.timedelta(minutes = 55 ))
output : '0:55:00'
But if I apply approach 1 to 100, it gives 10 hours.
e.g.: da = pd.to_datetime(100, format='%H%M')
output: '10:00:00'
Try:
pd.to_datetime(df['time'].astype(str).str.zfill(4), format = '%H%M').dt.time
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
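The padding matters because %H takes up to two digits before %M sees anything, so an unpadded "100" is read as hour 10, minute 0. A short standard-library sketch of the difference (datetime.strptime is used here as a stand-in for pd.to_datetime, on the assumption that both interpret %H%M the same way for these inputs):

```python
from datetime import datetime

# unpadded: "10" is consumed as the hour, "0" as the minute
unpadded = datetime.strptime("100", "%H%M").time()

# zero-padded to four characters: "01" is the hour, "00" the minute
padded = datetime.strptime("100".zfill(4), "%H%M").time()

print(unpadded)
print(padded)
```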
IIUC, you can use str.rjust:
pd.to_datetime(s.astype(str).str.rjust(4,'0'),format='%H%M').dt.time
Out[41]:
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
Name: x, dtype: object
Since this is aimed at novices, I am making things more explicit and adding notes on the %H and %M format codes:
df['cname'] = pd.to_datetime(df['cname'].astype(str).str.zfill(4), format = '%H%M').dt.time
print(df['cname'])
# %H Hour (24-hour clock) as a zero-padded decimal number. 07
# %M Minute as a zero-padded decimal number. 06
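As an alternative to string padding (not what the answers above do), the integer can be split arithmetically: the hundreds and above are the hour, the remainder is the minute. A minimal sketch, where hhmm_to_time is a hypothetical helper name:

```python
from datetime import time

def hhmm_to_time(value):
    """Convert an integer such as 330 or 2355 to a datetime.time."""
    hours, minutes = divmod(int(value), 100)
    return time(hours, minutes)

raw = [2000, 2355, 0, 5, 100, 330]
converted = [hhmm_to_time(v) for v in raw]
for t in converted:
    print(t)
```

With a pandas column this could be applied through Series.apply, at the cost of being slower than the vectorized string approach on large data.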
I am trying to iterate over every quarter of the year; this is what I have so far.
from datetime import datetime, timedelta
from dateutil import rrule
now = datetime.now()
first_day = datetime(year=now.year, month=1, day=1)
print("--", first_day)
hundredDaysLater = first_day - timedelta(days=100)
for dt in rrule.rrule(rrule.MONTHLY, dtstart=first_day, bymonthday=(31, -1), count=6, interval=3):
    print(dt.replace(day=1))
    print(dt)
output
-- 2018-01-01 00:00:00
2018-01-01 00:00:00
2018-01-31 00:00:00
2018-04-01 00:00:00
2018-04-30 00:00:00
2018-07-01 00:00:00
2018-07-31 00:00:00
2018-10-01 00:00:00
2018-10-31 00:00:00
2019-01-01 00:00:00
2019-01-31 00:00:00
2019-04-01 00:00:00
Now I basically need to get this output:
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
But instead of counting forwards I need to count backwards.
Using relativedelta (from the dateutil package):
>>> from dateutil.relativedelta import relativedelta
>>> from datetime import date
>>> d = date(2019, 1, 1)
>>> day = relativedelta(days=1)
>>> quarter = relativedelta(months=3)
>>> while True:
...     print(d - day)
...     print(d - quarter)
...     d -= quarter
...
2018-12-31
2018-10-01
2018-09-30
2018-07-01
2018-06-30
2018-04-01
2018-03-31
2018-01-01
2017-12-31
2017-10-01
2017-09-30
2017-07-01
2017-06-30
2017-04-01
2017-03-31
...
Short solution with pandas.date_range function:
In [708]: start_date = datetime.datetime(2018, 1, 1)
In [709]: data = pd.date_range(start=start_date, periods=4, freq='QS-JAN').union(
     ...:     pd.date_range(start=start_date, periods=4, freq='Q-DEC')).strftime('%Y-%m-%d')
In [710]: for i in range(0, data.size, 2):
     ...:     print(data[i], data[i+1], sep=', ')
     ...:
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
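For the backwards count asked about in the question, the quarter boundaries can also be produced as (first day, last day) pairs with the standard library alone. This is a sketch under the assumption that quarters are calendar quarters starting in January; quarters_backwards is a made-up name.

```python
from datetime import date, timedelta

def quarters_backwards(year, count):
    """Yield (first_day, last_day) pairs, starting at Q4 of `year` and going back."""
    y, q = year, 4
    for _ in range(count):
        start = date(y, 3 * q - 2, 1)
        # the quarter ends one day before the next quarter starts
        if q == 4:
            next_start = date(y + 1, 1, 1)
        else:
            next_start = date(y, 3 * q + 1, 1)
        yield start, next_start - timedelta(days=1)
        q -= 1
        if q == 0:
            y, q = y - 1, 4

pairs = list(quarters_backwards(2018, 5))
for first, last in pairs:
    print(first, last, sep=", ")
```

Unlike the while True loop above, the count argument bounds the iteration.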
I have a time series covering January 1979 at 6-hour intervals. The time is given as a running count of hours:
1
7
13
18
25
31
.
.
.
739
Is it possible to convert these ints to dates? For instance:
1979/01/01 - 1:00
1979/01/01 - 7:00
1979/01/01 - 13:00
1979/01/01 - 18:00
1979/01/02 - 1:00
Thank you so much!
Setup
df = pd.DataFrame({'hour': [1,7,13,18,25,31]})
Use pd.to_datetime with the unit argument, and set the origin argument to the beginning of your desired year.
pd.to_datetime(df.hour, unit='h', origin='1979-01-01')
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
5 1979-01-02 07:00:00
Name: hour, dtype: datetime64[ns]
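Conceptually, unit and origin mean "treat each number as that many hours after the origin". The same result can be sketched by converting the integers to timedeltas and adding them to a Timestamp; this assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

hours = pd.Series([1, 7, 13, 18, 25, 31], name="hour")

# each integer becomes an hour offset, added to the start of the year
stamps = pd.Timestamp("1979-01-01") + pd.to_timedelta(hours, unit="h")
print(stamps)
```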
Here is another way:
import pandas as pd
s = pd.Series([1,7,13])
s = pd.to_datetime(s*1e9*60*60+ pd.Timestamp(1979,1,1).value)
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
dtype: datetime64[ns]
Could also just do this:
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series([1,7,13,18,25])
s = s.apply(lambda h: datetime(1979, 1, 1) + timedelta(hours=h))
print(s)
Returns:
0 1979-01-01 01:00:00
1 1979-01-01 07:00:00
2 1979-01-01 13:00:00
3 1979-01-01 18:00:00
4 1979-01-02 01:00:00
dtype: datetime64[ns]
I have a group of dates. I would like to subtract each one from its forward neighbor to get the delta between them. My code looks like this:
import pandas, numpy, StringIO
txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pandas.read_csv(StringIO.StringIO(txt))
df = df.sort('DATE')
df.DATE = pandas.to_datetime(df.DATE)
grouped = df.groupby('ID')
df['X_SEQUENCE_GAP'] = pandas.concat([g['DATE'].sub(g['DATE'].shift(), fill_value=0) for title,g in grouped])
I am getting pretty incomprehensible results, so I assume I have a logic error.
The results I get are as follows:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 12277 days, 00:00:00
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 13275 days, 00:00:00
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 13216 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 13799 days, 00:00:00
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 11354 days, 00:00:00
I was expecting, for example, that rows 0 and 1 would both have a result of 0. Any help is most appreciated.
This is in 0.11rc1 (I don't think it will work on a prior version).
When you shift dates, the first one is NaT (like NaN, but for datetimes/timedeltas).
In [27]: df['X_SEQUENCE_GAP'] = grouped.apply(lambda g: g['DATE']-g['DATE'].shift())
In [30]: df.sort()
Out[30]:
ID DATE X_SEQUENCE_GAP
0 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 NaT
1 002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 00:00:00
2 0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 00:00:00 NaT
3 0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 00:00:00 27 days, 00:00:00
4 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 NaT
5 00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 00:00:00
6 0101d3286dfbd58642a7527ecbddb92e 2007-10-13 00:00:00 NaT
7 0101d3286dfbd58642a7527ecbddb92e 2007-10-27 00:00:00 14 days, 00:00:00
8 0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 NaT
9 0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 00:00:00 2544 days, 00:00:00
You can then fillna (but you have to do this awkward type conversion because of a numpy bug; it will get fixed in 0.12).
In [57]: df['X_SEQUENCE_GAP'].sort_index().astype('timedelta64[ns]').fillna(0)
Out[57]:
0 00:00:00
1 00:00:00
2 00:00:00
3 27 days, 00:00:00
4 00:00:00
5 00:00:00
6 00:00:00
7 14 days, 00:00:00
8 00:00:00
9 2544 days, 00:00:00
Name: X_SEQUENCE_GAP, dtype: timedelta64[ns]
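On recent pandas versions the same per-group gap can be sketched more directly with groupby(...).diff(), which performs the "value minus shifted value" step within each group. This is an assumption about current pandas behaviour and not what the 0.11-era answer above used; the IDs are shortened placeholders for the hashes in the question.

```python
import io
import pandas as pd

txt = """ID,DATE
a,2003-08-13
a,2003-08-13
b,2006-05-07
b,2006-06-03
"""

df = pd.read_csv(io.StringIO(txt), parse_dates=["DATE"])
df = df.sort_values(["ID", "DATE"])

# diff() leaves NaT on the first row of each group; replace it with a zero gap
df["X_SEQUENCE_GAP"] = df.groupby("ID")["DATE"].diff().fillna(pd.Timedelta(0))
print(df)
```

Sorting by both ID and DATE keeps each group's rows in chronological order, so a gap is never negative.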