Following in the spirit of this answer, I attempted the following to convert a DataFrame column of datetimes to a column of seconds since the epoch.
df['date'] = (df['date']+datetime.timedelta(hours=2)-datetime.datetime(1970,1,1))
df['date'].map(lambda td:td.total_seconds())
The second command raises the following error, which I don't understand. Any thoughts on what might be going on here? Replacing map with apply didn't help.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-99-7123e823f995> in <module>()
----> 1 df['date'].map(lambda td:td.total_seconds())
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in map(self, arg, na_action)
1932 return self._constructor(new_values, index=self.index).__finalize__(self)
1933 else:
-> 1934 mapped = map_f(values, arg)
1935 return self._constructor(mapped, index=self.index).__finalize__(self)
1936
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:43628)()
<ipython-input-99-7123e823f995> in <lambda>(td)
----> 1 df['date'].map(lambda td:td.total_seconds())
AttributeError: 'float' object has no attribute 'total_seconds'
Update:
In 0.15.0 Timedeltas became a full-fledged dtype.
So this becomes possible (as well as the methods below)
In [45]: s = Series(pd.timedelta_range('1 day',freq='1S',periods=5))
In [46]: s.dt.components
Out[46]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 1 0 0 0 0 0 0
1 1 0 0 1 0 0 0
2 1 0 0 2 0 0 0
3 1 0 0 3 0 0 0
4 1 0 0 4 0 0 0
In [47]: s.astype('timedelta64[s]')
Out[47]:
0 86400
1 86401
2 86402
3 86403
4 86404
dtype: float64
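For completeness, on recent pandas versions the same timedelta Series answers the original question directly through the .dt accessor; a small sketch:

import pandas as pd

s = pd.Series(pd.timedelta_range('1 day', freq='1s', periods=5))
# total_seconds() on the .dt accessor returns float seconds
print(s.dt.total_seconds())
# 0    86400.0
# 1    86401.0
# 2    86402.0
# 3    86403.0
# 4    86404.0
# dtype: float64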
Original Answer:
I see that you are on master (and 0.13 is coming out very shortly), so I'll assume you have numpy >= 1.7. Do the following; see here for the docs (this is frequency conversion).
In [5]: df = DataFrame(dict(date = date_range('20130101',periods=10)))
In [6]: df
Out[6]:
date
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
In [7]: df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)
Out[7]:
0 15706 days, 02:00:00
1 15707 days, 02:00:00
2 15708 days, 02:00:00
3 15709 days, 02:00:00
4 15710 days, 02:00:00
5 15711 days, 02:00:00
6 15712 days, 02:00:00
7 15713 days, 02:00:00
8 15714 days, 02:00:00
9 15715 days, 02:00:00
Name: date, dtype: timedelta64[ns]
In [9]: (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)) / np.timedelta64(1,'s')
Out[9]:
0 1357005600
1 1357092000
2 1357178400
3 1357264800
4 1357351200
5 1357437600
6 1357524000
7 1357610400
8 1357696800
9 1357783200
Name: date, dtype: float64
The contained values are np.timedelta64[ns] objects; they don't have the same methods as timedelta objects, so no total_seconds().
In [10]: s = (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1))
In [11]: s[0]
Out[11]: numpy.timedelta64(1357005600000000000,'ns')
You can astype them to int, which gives you back the value in nanoseconds.
In [12]: s[0].astype(int)
Out[12]: 1357005600000000000
You can also do this (but only on an individual element):
In [18]: s[0].astype('timedelta64[s]')
Out[18]: numpy.timedelta64(1357005600,'s')
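On current pandas, the whole conversion to epoch seconds can also be written without element-wise astype calls; a sketch of the floor-division recipe from the pandas docs, assuming a tz-naive column:

import datetime
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('20130101', periods=3)})
shifted = df['date'] + datetime.timedelta(hours=2)
# Floor division by a 1-second Timedelta yields int64 seconds since the epoch
epoch_s = (shifted - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(epoch_s)
# 0    1357005600
# 1    1357092000
# 2    1357178400
# Name: date, dtype: int64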
Related
I have a dataset like this:

date        Condition
20-01-2015  1
20-02-2015  1
20-03-2015  2
20-04-2015  2
20-05-2015  2
20-06-2015  1
20-07-2015  1
20-08-2015  2
20-09-2015  2
20-09-2015  1
I want a new column, date_new, which should look at the condition in the next column. If the condition is 1, do nothing. If the condition is 2, add a day to the date and store it in date_new.
Additional condition: there must be three consecutive 2's for this to apply.
The final output should look like this:

date        Condition  date_new
20-01-2015  1
20-02-2015  1
20-03-2015  2          21-02-2015
20-04-2015  2
20-05-2015  2
20-06-2015  1
20-07-2015  1
20-08-2015  2
20-09-2015  2
20-09-2015  1
Any help is appreciated. Thank you.
This solution is a little different: if the condition is 1, put None; otherwise, add (condition - 1) days to the date.
df['date_new'] = np.where(df['condition'] == 1, None,
                          (df['date'] + pd.to_timedelta(df['condition'] - 1, 'd')).dt.strftime('%d-%m-%Y'))
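A minimal runnable sketch of this approach, keeping the lowercase date/condition column names assumed in the answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['20-01-2015', '20-03-2015'],
                   'condition': [1, 2]})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
# condition 1 -> None; otherwise date + (condition - 1) days, formatted back to dd-mm-yyyy
df['date_new'] = np.where(df['condition'] == 1, None,
                          (df['date'] + pd.to_timedelta(df['condition'] - 1, 'd')).dt.strftime('%d-%m-%Y'))
print(df)
#         date  condition    date_new
# 0 2015-01-20          1        None
# 1 2015-03-20          2  21-03-2015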
Ok, so I've edited my answer and transformed it into a function:
def newdate(df):
    L = df.Condition
    # collect the leading value of every window of three consecutive equal values
    res = [i for i, j, k in zip(L, L[1:], L[2:]) if i == j == k]
    df1 = df  # fall back to the unchanged frame if no run of three 2's exists
    if 2 in res:
        df['date'] = pd.to_datetime(df['date'])
        df['new_date'] = df.apply(lambda x: x['date'] + pd.DateOffset(days=2)
                                  if x['Condition'] == 2 else pd.NA, axis=1)
        df['new_date'] = pd.to_datetime(df['new_date'])
        df1 = df
    return df1
#output:
index  date                 Condition  new_date
0      2015-01-20 00:00:00  1          NaT
1      2015-02-20 00:00:00  1          NaT
2      2015-03-20 00:00:00  2          2015-03-22 00:00:00
3      2015-04-20 00:00:00  2          2015-04-22 00:00:00
4      2015-05-20 00:00:00  2          2015-05-22 00:00:00
5      2015-06-20 00:00:00  1          NaT
6      2015-07-20 00:00:00  1          NaT
7      2015-08-20 00:00:00  2          2015-08-22 00:00:00
8      2015-09-20 00:00:00  2          2015-09-22 00:00:00
9      2015-09-20 00:00:00  1          NaT
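As an aside, the run-of-three check can be vectorized instead of zipping shifted slices; a sketch, under the assumption that date_new should only be filled on condition-2 rows inside a run of at least three consecutive 2's:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['20-03-2015', '20-04-2015',
                                           '20-05-2015', '20-06-2015'],
                                          format='%d-%m-%Y'),
                   'Condition': [2, 2, 2, 1]})
# Each change of value starts a new run; cumsum() labels the runs
run_id = (df['Condition'] != df['Condition'].shift()).cumsum()
run_len = df.groupby(run_id)['Condition'].transform('size')
mask = (df['Condition'] == 2) & (run_len >= 3)
# date + 1 day where the mask holds, NaT elsewhere
df['date_new'] = (df['date'] + pd.Timedelta(days=1)).where(mask)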
This question already has answers here: Pandas pd.to_datetime only keep time do not date (2 answers). Closed 3 years ago.
I am using some date and time variables, but I only want to use the time part.
For example, for two columns A and B, dtypes shows dtype('O') for both, but the output for A is:
0 2017-11-29 17:14:00
1 2017-02-15 15:35:00
2 2018-10-18 08:02:00
3 2017-06-22 09:25:00
And for B is:
0 2017-11-29 20:00:00
1 2017-02-15 16:43:00
2 2018-10-18 11:08:00
3 2017-06-22 11:29:00
Then I do this:
import datetime
from datetime import datetime
df = df[df['A'].apply(lambda v: isinstance(v, datetime))]
df = df[df['B'].apply(lambda v: isinstance(v, datetime))]
However, what I want to do is to subtract the time of A and B. Only the time, not the date.
For example, when I do df['A']-df['B'], I just want the output of the first line to be 02:46. Also, how can I transform this into minutes, but as an integer?
If this is the shape of your dataframe :
A B
0 2017-11-29 17:14:00 2017-11-29 20:00:00
1 2017-02-15 15:35:00 2017-02-15 16:43:00
2 2018-10-18 08:02:00 2018-10-18 11:08:00
3 2017-06-22 09:25:00 2017-06-22 11:29:00
then all you need to do is convert the columns to datetimes and apply your operation:
df[['A','B']] = df[['A','B']].apply(pd.to_datetime)
df['B'] - df['A']
0 02:46:00
1 01:08:00
2 03:06:00
3 02:04:00
dtype: timedelta64[ns]
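For the integer minutes the question also asked about, floor division of the timedelta result is one option; a short sketch:

# Floor division by a one-minute Timedelta yields int64 minutes
(df['B'] - df['A']) // pd.Timedelta(minutes=1)
# 0    166
# 1     68
# 2    186
# 3    124
# dtype: int64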
Another method, using pd.to_timedelta and np.timedelta64, assuming that both columns are already datetimes:
df['diff'] = abs(
    pd.to_timedelta(df["A"].dt.time.astype(str))
    - pd.to_timedelta(df["B"].dt.time.astype(str))
) / np.timedelta64(1, "h")
print(df)
A B diff
0 2017-11-29 17:14:00 2017-11-29 20:00:00 2.766667
1 2017-02-15 15:35:00 2017-02-15 16:43:00 1.133333
2 2018-10-18 08:02:00 2018-10-18 11:08:00 3.100000
3 2017-06-22 09:25:00 2017-06-22 11:29:00 2.066667
or
(
    pd.to_timedelta(df["A"].dt.time.astype(str))
    - pd.to_timedelta(df["B"].dt.time.astype(str))
)
0 -1 days +21:14:00
1 -1 days +22:52:00
2 -1 days +20:54:00
3 -1 days +21:56:00
dtype: timedelta64[ns]
A workaround would be to subtract the date from each datetime so you're only comparing times, and then take the difference:
(df.A - df.A.dt.floor('d')) - (df.B - df.B.dt.floor('d'))
0 -1 days +21:14:00
1 -1 days +22:52:00
2 -1 days +20:54:00
3 -1 days +21:56:00
dtype: timedelta64[ns]
You can do this:
pd.to_timedelta(df[['A', 'B']].astype('datetime64[ns]').diff(axis=1)['B'].dt.seconds * 10 ** 9)
# 0 02:46:00
# 1 01:08:00
# 2 03:06:00
# 3 02:04:00
# Name: B, dtype: timedelta64[ns]
This extracts the seconds portion of the timedelta and converts it back to a timedelta after factoring in the nanoseconds.
Or, if you only care up to the seconds:
pd.to_timedelta(df[['A', 'B']].astype('datetime64[ns]').diff(axis=1)['B'].dt.seconds, 's')
To explain, the steps taken were to:

1. Handle the dtype('O') and convert to a datetime64 object
2. Calculate the difference of A and B along axis=1
3. Extract the difference from the resulting column B
4. Extract the timedelta seconds (removing any days, months, etc.)
5. Convert the seconds back to a timedelta object
subtract the time of A and B. Only the time, not the date.
>>> a
0 2017-11-29 17:14:00
1 2017-02-15 15:35:00
2 2018-10-18 08:02:00
3 2017-06-22 09:25:00
dtype: datetime64[ns]
>>> b
0 2017-11-29 20:00:00
1 2017-02-15 16:43:00
2 2018-10-18 11:08:00
3 2017-06-22 11:29:00
dtype: datetime64[ns]
Subtract the seconds since midnight
>>> a1 = (a.dt.hour * 3600) + (a.dt.minute * 60) + a.dt.second + (a.dt.microsecond / 1000000)
>>> b1 = (b.dt.hour * 3600) + (b.dt.minute * 60) + b.dt.second + (b.dt.microsecond / 1000000)
>>> b1-a1
0 9960.0
1 4080.0
2 11160.0
3 7440.0
dtype: float64
Convert to a timedelta
>>> pd.to_timedelta(b1-a1, unit='S')
0 02:46:00
1 01:08:00
2 03:06:00
3 02:04:00
dtype: timedelta64[ns]
.dt accessor
>>> a2 = pd.DataFrame({'hr':a.dt.hour,'min':a.dt.minute,'microsec':a.dt.microsecond})
>>> b2 = pd.DataFrame({'hr':b.dt.hour,'min':b.dt.minute,'microsec':b.dt.microsecond})
>>> b2-a2
hr min microsec
0 3 -14 0
1 1 8 0
2 3 6 0
3 2 4 0
>>> c = b2-a2
>>> pd.to_timedelta(c['hr'],'hours') + pd.to_timedelta(c['min'],'minutes') + pd.to_timedelta(c['microsec'],'microseconds')
0 02:46:00
1 01:08:00
2 03:06:00
3 02:04:00
dtype: timedelta64[ns]
I have a dataframe with several columns. I want to sum the "gap" column where the time falls within certain time slots.
region. date. time. gap
0 1 2016-01-01 00:00:08 1
1 1 2016-01-01 00:00:48 0
2 1 2016-01-01 00:02:50 1
3 1 2016-01-01 00:00:52 0
4 1 2016-01-01 00:10:01 0
5 1 2016-01-01 00:10:03 1
6 1 2016-01-01 00:10:05 0
7 1 2016-01-01 00:10:08 0
I want to sum the gap column. I have time slots in a dict like this:
'slot1': '00:00:00', 'slot2': '00:10:00', 'slot3': '00:20:00'
After summation, the above dataframe should look like this:
region. date. time. gap
0 1 2016-01-01 00:10:00/slot1 2
1 1 2016-01-01 00:20:00/slot2 1
I have many regions and 144 time slots from 00:00:00 to 23:59:49. I have tried this.
regres=reg.groupby(['start_region_hash','Date','Time'])['Time'].apply(lambda x: (x >= hoursdict['slot1']) & (x <= hoursdict['slot2'])).sum()
But it doesn't work.
The idea is to convert the time column to datetimes floored to 10 minutes, then convert to HH:MM:SS strings:
d = {'slot1': '00:00:00', 'slot2': '00:10:00', 'slot3': '00:20:00'}
d1 = {v:k for k, v in d.items()}
df['time'] = pd.to_datetime(df['time']).dt.floor('10Min').dt.strftime('%H:%M:%S')
print (df)
region date time gap
0 1 2016-01-01 00:00:00 1
1 1 2016-01-01 00:00:00 0
2 1 2016-01-01 00:00:00 1
3 1 2016-01-01 00:00:00 0
4 1 2016-01-01 00:10:00 0
5 1 2016-01-01 00:10:00 1
6 1 2016-01-01 00:10:00 0
7 1 2016-01-01 00:10:00 0
Aggregate by sum, and lastly map the values using the dictionary with keys and values swapped:
regres = df.groupby(['region','date','time'], as_index=False)['gap'].sum()
regres['time'] = regres['time'] + '/' + regres['time'].map(d1)
print (regres)
region date time gap
0 1 2016-01-01 00:00:00/slot1 2
1 1 2016-01-01 00:10:00/slot2 1
If you want to display the next 10-minute slot instead:
d = {'slot1': '00:00:00', 'slot2': '00:10:00', 'slot3': '00:20:00'}
d1 = {v:k for k, v in d.items()}
times = pd.to_datetime(df['time']).dt.floor('10Min')
df['time'] = times.dt.strftime('%H:%M:%S')
df['time1'] = times.add(pd.Timedelta('10Min')).dt.strftime('%H:%M:%S')
print (df)
region date time gap time1
0 1 2016-01-01 00:00:00 1 00:10:00
1 1 2016-01-01 00:00:00 0 00:10:00
2 1 2016-01-01 00:00:00 1 00:10:00
3 1 2016-01-01 00:00:00 0 00:10:00
4 1 2016-01-01 00:10:00 0 00:20:00
5 1 2016-01-01 00:10:00 1 00:20:00
6 1 2016-01-01 00:10:00 0 00:20:00
7 1 2016-01-01 00:10:00 0 00:20:00
regres = df.groupby(['region','date','time','time1'], as_index=False)['gap'].sum()
regres['time'] = regres.pop('time1') + '/' + regres['time'].map(d1)
print (regres)
region date time gap
0 1 2016-01-01 00:10:00/slot1 2
1 1 2016-01-01 00:20:00/slot2 1
EDIT:
An improvement over flooring and converting to strings is to use binning with cut or searchsorted:
df['time'] = pd.to_timedelta(df['time'])
bins = pd.timedelta_range('00:00:00', '24:00:00', freq='10Min')
labels = np.array(['{}'.format(str(x)[-8:]) for x in bins])
labels = labels[:-1]
df['time1'] = pd.cut(df['time'], bins=bins, labels=labels)
df['time11'] = labels[np.searchsorted(bins, df['time'].values) - 1]
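To map the binned labels back to slot names, the swapped dictionary from above can be reused; a short sketch, assuming d1 is still in scope:

# Categorical bin labels -> plain strings -> slot names via the swapped dict
df['slot'] = df['time1'].astype(str).map(d1)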
Just to avoid the complication of the datetime comparison (unless that is your whole point, in which case ignore my answer), and to show the essence of this group-by-slot-window problem, I assume here that the times are integers.
df = pd.DataFrame({'time':[8, 48, 250, 52, 1001, 1003, 1005, 1008, 2001, 2003, 2056],
'gap': [1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1]})
slots = np.array([0, 1000, 1500])
df['slot'] = df.apply(func = lambda x: slots[np.argmax(slots[x['time']>slots])], axis=1)
df.groupby('slot')[['gap']].sum()
Output

      gap
slot
0       2
1000    1
1500    3
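The same slot lookup can be vectorized with np.searchsorted instead of a row-wise apply; a sketch on the same data:

import numpy as np

# side='left' mirrors the strict > comparison above; like the apply, it
# assumes every time is strictly greater than the first slot boundary
df['slot'] = slots[np.searchsorted(slots, df['time'], side='left') - 1]
df.groupby('slot')[['gap']].sum()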
The way to think about this problem is to convert your time column to the values you want first, and then do a groupby sum on that column.
The code below shows the approach I've used. I used np.select so I can include as many conditions and choice options as I want. After converting time to the values I wanted, I did a simple groupby sum.
None of the fuss of formatting times or converting strings is really needed; simply let the pandas DataFrame handle it.
import pandas as pd
import numpy as np  #This is the library you require for the np.select function

#Just creating the DataFrame using a dictionary here
regdict = {
    'time': ['00:00:08','00:00:48','00:02:50','00:00:52','00:10:01','00:10:03','00:10:05','00:10:08'],
    'gap': [1,0,1,0,0,1,0,0],}
df = pd.DataFrame(regdict)
#Add in all your conditions and options here
condlist = [df['time']<'00:10:00',df['time']<'00:20:00']
choicelist = ['00:10:00/slot1','00:20:00/slot2']
#Use np.select after you have defined all your conditions and options
answerlist = np.select(condlist, choicelist)
print (answerlist)
['00:10:00/slot1' '00:10:00/slot1' '00:10:00/slot1' '00:10:00/slot1'
'00:20:00/slot2' '00:20:00/slot2' '00:20:00/slot2' '00:20:00/slot2']
#Assign answerlist to df['time']
df['time'] = answerlist
print (df)
time gap
0 00:10:00 1
1 00:10:00 0
2 00:10:00 1
3 00:10:00 0
4 00:20:00 0
5 00:20:00 1
6 00:20:00 0
7 00:20:00 0
df = df.groupby('time', as_index=False)['gap'].sum()
print (df)
time gap
0 00:10:00 2
1 00:20:00 1
If you wish to keep the original time you can instead do df['timeNew'] = answerlist and then filter from there.
df['timeNew'] = answerlist
print (df)
time gap timeNew
0 00:00:08 1 00:10:00/slot1
1 00:00:48 0 00:10:00/slot1
2 00:02:50 1 00:10:00/slot1
3 00:00:52 0 00:10:00/slot1
4 00:10:01 0 00:20:00/slot2
5 00:10:03 1 00:20:00/slot2
6 00:10:05 0 00:20:00/slot2
7 00:10:08 0 00:20:00/slot2
#Use transform function here to retain all prior values
df['aggregate sum of gap'] = df.groupby('timeNew')['gap'].transform(sum)
print (df)
time gap timeNew aggregate sum of gap
0 00:00:08 1 00:10:00/slot1 2
1 00:00:48 0 00:10:00/slot1 2
2 00:02:50 1 00:10:00/slot1 2
3 00:00:52 0 00:10:00/slot1 2
4 00:10:01 0 00:20:00/slot2 1
5 00:10:03 1 00:20:00/slot2 1
6 00:10:05 0 00:20:00/slot2 1
7 00:10:08 0 00:20:00/slot2 1
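For completeness, pandas can also do this bucketing natively once date and time are combined into a real datetime column; a sketch using pd.Grouper, with assumed column names:

import pandas as pd

times = ['00:00:08', '00:00:48', '00:02:50', '00:00:52',
         '00:10:01', '00:10:03', '00:10:05', '00:10:08']
df = pd.DataFrame({'region': 1,
                   'ts': pd.to_datetime('2016-01-01 ' + pd.Series(times)),
                   'gap': [1, 0, 1, 0, 0, 1, 0, 0]})
# Grouper buckets the timestamps into left-closed 10-minute bins
out = df.groupby(['region', pd.Grouper(key='ts', freq='10Min')])['gap'].sum()
print(out)
# region  ts
# 1       2016-01-01 00:00:00    2
#         2016-01-01 00:10:00    1
# Name: gap, dtype: int64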
The datetime is given in the format YYYY-MM-DD HH:MM:SS in a dataframe. I want new Series of year, month and hour, for which I am trying the code below.
The problem is that Month and Hour are getting the same value; Year is fine.
Can anyone help me with this? I am using an IPython notebook with pandas and numpy.
Here is the code:
def extract_hour(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.hour

def extract_month(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.month

def extract_year(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.year

#month column
train['Month'] = train['datetime'].apply(lambda x: extract_month(x))
test['Month'] = test['datetime'].apply(lambda x: extract_month(x))

#year column
train['Year'] = train['datetime'].apply(lambda x: extract_year(x))
test['Year'] = test['datetime'].apply(lambda x: extract_year(x))

#Hour column
train['Hour'] = train['datetime'].apply(lambda x: extract_hour(x))
test['Hour'] = test['datetime'].apply(lambda x: extract_hour(x))
You can use the .dt accessors instead: train['datetime'].dt.month, train['datetime'].dt.year, train['datetime'].dt.hour (see the full list below).
Demo:
In [81]: train = pd.DataFrame(pd.date_range('2016-01-01', freq='1999H', periods=10), columns=['datetime'])
In [82]: train
Out[82]:
datetime
0 2016-01-01 00:00:00
1 2016-03-24 07:00:00
2 2016-06-15 14:00:00
3 2016-09-06 21:00:00
4 2016-11-29 04:00:00
5 2017-02-20 11:00:00
6 2017-05-14 18:00:00
7 2017-08-06 01:00:00
8 2017-10-28 08:00:00
9 2018-01-19 15:00:00
In [83]: train.datetime.dt.year
Out[83]:
0 2016
1 2016
2 2016
3 2016
4 2016
5 2017
6 2017
7 2017
8 2017
9 2018
Name: datetime, dtype: int64
In [84]: train.datetime.dt.month
Out[84]:
0 1
1 3
2 6
3 9
4 11
5 2
6 5
7 8
8 10
9 1
Name: datetime, dtype: int64
In [85]: train.datetime.dt.hour
Out[85]:
0 0
1 7
2 14
3 21
4 4
5 11
6 18
7 1
8 8
9 15
Name: datetime, dtype: int64
In [86]: train.datetime.dt.day
Out[86]:
0 1
1 24
2 15
3 6
4 29
5 20
6 14
7 6
8 28
9 19
Name: datetime, dtype: int64
List of all .dt accessors:
In [77]: train.datetime.dt.
train.datetime.dt.ceil train.datetime.dt.hour train.datetime.dt.month train.datetime.dt.to_pydatetime
train.datetime.dt.date train.datetime.dt.is_month_end train.datetime.dt.nanosecond train.datetime.dt.tz
train.datetime.dt.day train.datetime.dt.is_month_start train.datetime.dt.normalize train.datetime.dt.tz_convert
train.datetime.dt.dayofweek train.datetime.dt.is_quarter_end train.datetime.dt.quarter train.datetime.dt.tz_localize
train.datetime.dt.dayofyear train.datetime.dt.is_quarter_start train.datetime.dt.round train.datetime.dt.week
train.datetime.dt.days_in_month train.datetime.dt.is_year_end train.datetime.dt.second train.datetime.dt.weekday
train.datetime.dt.daysinmonth train.datetime.dt.is_year_start train.datetime.dt.strftime train.datetime.dt.weekday_name
train.datetime.dt.floor train.datetime.dt.microsecond train.datetime.dt.time train.datetime.dt.weekofyear
train.datetime.dt.freq train.datetime.dt.minute train.datetime.dt.to_period train.datetime.dt.year
I read a CSV file containing 150,000 lines into a pandas dataframe. This dataframe has a field, Date, with dates in yyyy-mm-dd format. I want to extract the month, day and year from it into the dataframe's Month, Day and Year columns. For a few hundred records the two methods below work fine, but for 150,000 records both take ridiculously long to execute. Is there a faster way to do this for 100,000+ records?
First method:
df = pandas.read_csv(filename)
for i in xrange(len(df)):
df.loc[i,'Day'] = int(df.loc[i,'Date'].split('-')[2])
Second method:
df = pandas.read_csv(filename)
for i in xrange(len(df)):
df.loc[i,'Day'] = datetime.strptime(df.loc[i,'Date'], '%Y-%m-%d').day
Thank you.
In 0.15.0 you will be able to use the new .dt accessor to do this with nice, clean syntax.
In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])
In [37]: df.head(5)
Out[37]:
Date
0 2000-01-01 00:00:00
1 2000-01-01 01:00:00
2 2000-01-01 02:00:00
3 2000-01-01 03:00:00
4 2000-01-01 04:00:00
[5 rows x 1 columns]
In [38]: %timeit f(df)
10 loops, best of 3: 22 ms per loop
In [39]: def f(df):
   ....:     df = df.copy()
   ....:     df['Year'] = DatetimeIndex(df['Date']).year
   ....:     df['Month'] = DatetimeIndex(df['Date']).month
   ....:     df['Day'] = DatetimeIndex(df['Date']).day
   ....:     return df
   ....:
In [40]: f(df).head()
Out[40]:
Date Year Month Day
0 2000-01-01 00:00:00 2000 1 1
1 2000-01-01 01:00:00 2000 1 1
2 2000-01-01 02:00:00 2000 1 1
3 2000-01-01 03:00:00 2000 1 1
4 2000-01-01 04:00:00 2000 1 1
[5 rows x 4 columns]
From 0.15.0 onwards (released at the end of September 2014), the following is possible with the new .dt accessor:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
I use the code below, which works very well for me:
df['Year']=[d.split('-')[0] for d in df.Date]
df['Month']=[d.split('-')[1] for d in df.Date]
df['Day']=[d.split('-')[2] for d in df.Date]
df.head(5)
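One caveat: split returns strings, so Year, Month and Day end up as object columns. A vectorized variant with str.split and astype gives integers; a sketch:

# expand=True gives one column per piece; astype(int) converts the strings
df[['Year', 'Month', 'Day']] = df['Date'].str.split('-', expand=True).astype(int)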
This is the cleanest answer I've found.
df = df.assign(**{t:getattr(df.data.dt,t) for t in nomtimes})
In [30]: df = pd.DataFrame({'data':pd.date_range(start, end)})
In [31]: df.head()
Out[31]:
data
0 2011-01-01
1 2011-01-02
2 2011-01-03
3 2011-01-04
4 2011-01-05
nomtimes = ["year", "hour", "month", "dayofweek"]
df = df.assign(**{t:getattr(df.data.dt,t) for t in nomtimes})
In [33]: df.head()
Out[33]:
data dayofweek hour month year
0 2011-01-01 5 0 1 2011
1 2011-01-02 6 0 1 2011
2 2011-01-03 0 0 1 2011
3 2011-01-04 1 0 1 2011
4 2011-01-05 2 0 1 2011