How to split date and time and create separate columns - Python

I want to split DATE_H_REAL into two columns, one for the date and one for the hour. I tried this:
from datetime import datetime
df_picru = datetime.strptime(df_picru['DATE_H_REAL'], '%Y-%m-%d %H:%M:%S')
df_picru['day'] = df_picru.strftime('%Y-%m-%d')
df_picru['hour'] = df_picru.strftime('%H:%M:%S')
My data look like this
0 NaN
1 NaN
2 NaN
3 02/02/2016 16:16
4 02/02/2016 16:17
5 02/02/2016 16:18

In pandas, use to_datetime together with Series.dt.strftime if you need the output as strings:
import numpy as np
import pandas as pd
df_picru = pd.DataFrame({'DATE_H_REAL':['02/02/2016 16:16',
                                        '02/02/2016 16:17', np.nan]})
df_picru['DATE_H_REAL'] = pd.to_datetime(df_picru['DATE_H_REAL'])
df_picru['day'] = df_picru['DATE_H_REAL'].dt.strftime('%Y-%m-%d')
df_picru['hour'] = df_picru['DATE_H_REAL'].dt.strftime('%H:%M:%S')
print (df_picru)
DATE_H_REAL day hour
0 2016-02-02 16:16:00 2016-02-02 16:16:00
1 2016-02-02 16:17:00 2016-02-02 16:17:00
2 NaT NaT NaT
print (type(df_picru.loc[0, 'day']))
<class 'str'>
print (type(df_picru.loc[0, 'hour']))
<class 'str'>
print (df_picru['DATE_H_REAL'].dtypes)
datetime64[ns]
Or use Series.dt.date and Series.dt.time if you need Python date and time objects:
df_picru['DATE_H_REAL'] = pd.to_datetime(df_picru['DATE_H_REAL'])
df_picru['day'] = df_picru['DATE_H_REAL'].dt.date
df_picru['hour'] = df_picru['DATE_H_REAL'].dt.time
print (df_picru)
DATE_H_REAL day hour
0 2016-02-02 16:16:00 2016-02-02 16:16:00
1 2016-02-02 16:17:00 2016-02-02 16:17:00
2 NaT NaN NaN
print (type(df_picru.loc[0, 'day']))
<class 'datetime.date'>
print (type(df_picru.loc[0, 'hour']))
<class 'datetime.time'>
print (df_picru['DATE_H_REAL'].dtypes)
datetime64[ns]
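If the two parts are needed for further arithmetic rather than for display, one option is to keep them as pandas types. This is only a sketch and not part of the answer above: it assumes the strings are day-first (`%d/%m/%Y %H:%M`), which the sample values cannot confirm, and the column name `time_of_day` is made up for illustration.

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame(
    {"DATE_H_REAL": [np.nan, "02/02/2016 16:16", "02/02/2016 16:17"]}
)

# Assumption: day comes before month; change the format if the data is month-first.
parsed = pd.to_datetime(
    df_picru["DATE_H_REAL"], format="%d/%m/%Y %H:%M", errors="coerce"
)

# Midnight of each day, still datetime64 (missing values stay NaT).
df_picru["day"] = parsed.dt.normalize()
# Elapsed time since midnight as a timedelta (hypothetical column name).
df_picru["time_of_day"] = parsed - parsed.dt.normalize()

print(df_picru.dtypes)
```

Both new columns stay vectorized, so they can be compared, sorted, or added back together without converting from Python objects.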

Related

Unable to convert the data type of this column (object to time)

I have a data frame with the data and dtypes shown below:
usr_id year
0 t961 00:50:03.158000
1 t964 03:25:57
2 t335 00:55:00
3 t829 00:04:25.714000
usr_id object
year object
dtype: object
I want to convert the year column data type to a datetime. I used the below code.
timefmt = "%H:%M"
test['year'] = pd.to_datetime(
test['year'], format=timefmt, errors='coerce').dt.time
I get below output
usr_id year
0 t961 NaT
1 t964 NaT
2 t335 NaT
3 t829 NaT
How can I convert the data type of this column (object to datetime)?
How can I drop seconds & microseconds?
Expected output
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04
Use to_datetime with Series.dt.strftime:
timefmt = "%H:%M"
test['year'] = pd.to_datetime(test['year'], errors='coerce').dt.strftime(timefmt)
print (test)
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04
Or you can use Series.str.rsplit with n=1 to split on the last : and select the first element of each list by indexing:
test['year'] = test['year'].str.rsplit(':', n=1).str[0]
print (test)
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04
Or the solution by @Akira:
test['year'] = test['year'].astype(str).str[:5]
As there is currently no actual date in your year column, you need to set a default one. Then you can pass a format to the pandas to_datetime function.
This could be done in a one-liner like this:
test['year'] = pd.to_datetime(test['year'].apply(lambda x: '1900-01-01 '+ x),format='%Y-%m-%d %H:%M:%S')
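The sample column mixes values with and without fractional seconds, which a single fixed format string may not match. A possible variation on the default-date idea, sketched here as an assumption rather than as the answerer's method, is to parse the strings as durations with `pd.to_timedelta` and then add an arbitrary placeholder date.

```python
import pandas as pd

test = pd.DataFrame({
    "usr_id": ["t961", "t964", "t335", "t829"],
    "year": ["00:50:03.158000", "03:25:57", "00:55:00", "00:04:25.714000"],
})

# Parse the clock strings as durations since midnight;
# the fractional seconds are optional for this parser.
elapsed = pd.to_timedelta(test["year"])

# Anchor on an arbitrary placeholder date (an assumption) to get real datetimes.
anchored = elapsed + pd.Timestamp("1900-01-01")

# Keep only hours and minutes, as in the expected output of the question.
test["year"] = anchored.dt.strftime("%H:%M")
print(test)
```

If the datetime values themselves are wanted, keep `anchored` instead of formatting it back to strings.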

Sorting timestamps python

I have a pandas df that contains a column of timestamps. Some of the timestamps are after midnight. These are in 24hr time. I'm trying to add 12hrs to these times so they are consistent.
import pandas as pd
import datetime as dt
import numpy as np
d = ({
'time' : ['9:00:00','10:00:00','11:00:00','12:00:00','01:00:00','02:00:00'],
})
df = pd.DataFrame(data=d)
I have used the following code from another question ("Convert incomplete 12h datetime-like strings into appropriate datetime type"), but I can't get it to include all the values. The dates are also not necessary.
ts = pd.to_datetime(df.time, format = '%H:%M:%S')
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
twelve = ts.dt.time == dt.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset.fillna(method='ffill', inplace=True)
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print(ts)
Output:
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')
My intended Output is
time
0 9:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
I think you need to change the last line of the code: call add with fill_value=0 to replace the missing values in ts, and then use dt.time for Python times or dt.strftime for strings:
ts = ts.add(offset, fill_value=0).dt.time
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'datetime.time'>
1 <class 'datetime.time'>
2 <class 'datetime.time'>
3 <class 'datetime.time'>
4 <class 'datetime.time'>
5 <class 'datetime.time'>
dtype: object
Or, for strings (starting again from the datetime ts, not from the result above):
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
dtype: object
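Since the dates are not needed, a shorter route may be possible. The following is a sketch under a stated assumption, not the approach used above: it assumes the rows are in chronological order, so any drop relative to the previous row means the 12-hour clock wrapped around.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["9:00:00", "10:00:00", "11:00:00", "12:00:00", "01:00:00", "02:00:00"],
})

# Treat each clock string as a duration since midnight.
elapsed = pd.to_timedelta(df["time"])

# Assumption: rows are chronological, so a drop marks a 12-hour wrap.
wrapped = elapsed.diff() < pd.Timedelta(0)

# Add 12 hours for every wrap seen so far.
fixed = elapsed + wrapped.cumsum() * pd.Timedelta(12, "h")

# Anchor on a placeholder date only to reuse strftime for formatting.
df["time"] = (fixed + pd.Timestamp("1900-01-01")).dt.strftime("%H:%M:%S")
print(df)
```

This only handles the morning-to-afternoon case from the question; data that runs past midnight would need the date handling from the original code.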

How to replace python data frame values and concatenate another string with where condition

I want to replace part of the values in the "Time Period" column and append another string, as shown below.
value: 2017M12
M replace with - and add '-01'
Final result: 2017-12-01
Frequency,Time Period,Date
3,2016M12
3,2016M1
3,2016M8
3,2016M7
3,2016M11
3,2016M10
dt['Date'] = dt.loc[dt['Frequency']=='3',replace('Time Period','M','-')]+'-01'
In [18]: df.loc[df.Frequency==3,'Date'] = \
pd.to_datetime(df.loc[df.Frequency==3, 'Time Period'],
format='%YM%m', errors='coerce')
In [19]: df
Out[19]:
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-01-01
2 3 2016M8 2016-08-01
3 3 2016M7 2016-07-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
In [20]: df.dtypes
Out[20]:
Frequency int64
Time Period object
Date datetime64[ns] # <--- NOTE
dtype: object
You can use apply:
dt['Date'] = dt[ dt['Frequency'] ==3]['Time Period'].apply(lambda x: x.replace('M','-')+"-01")
output
Frequency Time Period Date
0 3 2016M12 2016-12-01
1 3 2016M1 2016-1-01
2 3 2016M8 2016-8-01
3 3 2016M7 2016-7-01
4 3 2016M11 2016-11-01
5 3 2016M10 2016-10-01
Also, you don't need to create an empty 'Date' column first; assigning to dt['Date'] creates it automatically.
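If the result should be a zero-padded string such as 2016-01-01 rather than 2016-1-01, the two ideas can be combined: parse with to_datetime and format back with strftime. This is a sketch only; the third row (Frequency 1, "2016Q1") is invented here to show what happens to rows that do not match the condition.

```python
import pandas as pd

df = pd.DataFrame({
    "Frequency": [3, 3, 1],  # the last row is a hypothetical non-monthly entry
    "Time Period": ["2016M12", "2016M1", "2016Q1"],
})

# Only rows with Frequency == 3 use the <year>M<month> layout.
mask = df["Frequency"] == 3

# Parse the monthly periods; the day defaults to the first of the month.
parsed = pd.to_datetime(df.loc[mask, "Time Period"], format="%YM%m", errors="coerce")

# Write zero-padded strings back; rows outside the mask are left missing.
df.loc[mask, "Date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```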

Python: Adding hours to pandas timestamp

I read a csv file into pandas dataframe df and I get the following:
df.columns
Index([u'TDate', u'Hour', u'SPP'], dtype='object')
>>> type(df['TDate'][0])
<class 'pandas.tslib.Timestamp'>
type(df['Hour'][0])
<type 'numpy.int64'>
>>> type(df['TradingDate'])
<class 'pandas.core.series.Series'>
>>> type(df['Hour'])
<class 'pandas.core.series.Series'>
Both the Hour and TDate columns have 100 elements. I want to add the corresponding elements of Hour to TDate.
I tried the following:
import pandas as pd
from datetime import date, timedelta as td
z3 = pd.DatetimeIndex(df['TDate']).to_pydatetime() + td(hours = df['Hour'])
But I get an error, as it seems td doesn't take an array as an argument. How do I add each element of Hour to the corresponding element of TDate?
I think you can convert the Hour column with to_timedelta using unit='h' and add it to the TDate column:
df = pd.DataFrame({'TDate':['2005-01-03','2005-01-04','2005-01-05'],
'Hour':[4,5,6]})
df['TDate'] = pd.to_datetime(df.TDate)
print (df)
Hour TDate
0 4 2005-01-03
1 5 2005-01-04
2 6 2005-01-05
df['TDate'] += pd.to_timedelta(df.Hour, unit='h')
print (df)
Hour TDate
0 4 2005-01-03 04:00:00
1 5 2005-01-04 05:00:00
2 6 2005-01-05 06:00:00
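To see why the timedelta attempt fails and what the vectorized call replaces, it may help to put an explicit loop next to it. This is an illustrative sketch; the names `looped` and `vectorized` are made up here.

```python
from datetime import timedelta

import pandas as pd

df = pd.DataFrame({
    "TDate": pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-05"]),
    "Hour": [4, 5, 6],
})

# datetime.timedelta only accepts scalars, so a pure-Python version needs a loop.
looped = [ts + timedelta(hours=int(h)) for ts, h in zip(df["TDate"], df["Hour"])]

# pd.to_timedelta converts the whole column at once, so the addition stays vectorized.
vectorized = df["TDate"] + pd.to_timedelta(df["Hour"], unit="h")

# Both routes should give the same timestamps.
print(vectorized.tolist() == looped)
```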

Pandas and csv import into dataframe. How best to combine date and hour fields into one

I have a csv file that I am trying to import into pandas.
There are two columns of interest, date and hour, and they are the first two columns.
E.g.
date,hour,...
10-1-2013,0,
10-1-2013,0,
10-1-2013,0,
10-1-2013,1,
10-1-2013,1,
How do I import using pandas so that the hour and date are combined, or is that best done after the initial import?
df = DataFrame.from_csv('bingads.csv', sep=',')
If I do the initial import how do I combine the two as a date and then delete the hour?
Thanks
Define your own date_parser:
In [291]: from dateutil.parser import parse
In [292]: import datetime as dt
In [293]: def date_parser(x):
   .....:     date, hour = x.split(' ')
   .....:     return parse(date) + dt.timedelta(0, 3600*int(hour))
In [298]: pd.read_csv('test.csv', parse_dates=[[0,1]], date_parser=date_parser)
Out[298]:
date_hour a b c
0 2013-10-01 00:00:00 1 1 1
1 2013-10-01 00:00:00 2 2 2
2 2013-10-01 00:00:00 3 3 3
3 2013-10-01 01:00:00 4 4 4
4 2013-10-01 01:00:00 5 5 5
Apply read_csv instead of read_clipboard to handle your actual data:
>>> df = pd.read_clipboard(sep=',')
>>> df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='D')/24
>>> del df['hour']
>>> df
date ...
0 2013-10-01 00:00:00 NaN
1 2013-10-01 00:00:00 NaN
2 2013-10-01 00:00:00 NaN
3 2013-10-01 01:00:00 NaN
4 2013-10-01 01:00:00 NaN
[5 rows x 2 columns]
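The same read-then-combine idea can be written without the clipboard. The following is a self-contained sketch: the inline CSV text and the extra columns `a` and `b` are stand-ins, and the month-day-year format is an assumption about the data.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for the CSV file; columns a and b are hypothetical.
csv_text = """date,hour,a,b
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,1,3,8
"""

df = pd.read_csv(StringIO(csv_text))

# Assumption: dates are month-day-year. Hours are added as a timedelta.
df["date"] = (
    pd.to_datetime(df["date"], format="%m-%d-%Y")
    + pd.to_timedelta(df["hour"], unit="h")
)

# The hour is now part of the date column, so the separate column can go.
df = df.drop(columns="hour")
print(df)
```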
Take a look at the parse_dates argument which pandas.read_csv accepts.
You can do something like:
df = pandas.read_csv('some.csv', parse_dates=True)
# in which case pandas will parse all columns where it finds dates
df = pandas.read_csv('some.csv', parse_dates=[i,j,k])
# in which case pandas will parse the i, j and kth columns for dates
Since you are only using the two columns from the csv file and combining those into one, I would squeeze into a series of datetime objects like so:
import pandas as pd
from io import StringIO
import datetime as dt
txt='''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''
def date_parser(date, hour):
    # Combine each month-day-year string with its hour into one datetime
    dates=[]
    for ed, eh in zip(date, hour):
        month, day, year=list(map(int, ed.split('-')))
        hr=int(eh)
        dates.append(dt.datetime(year, month, day, hr))
    return dates
# Note: date_parser and squeeze are read_csv options from older pandas releases
p=pd.read_csv(StringIO(txt), usecols=[0,1],
              parse_dates=[[0,1]], date_parser=date_parser, squeeze=True)
print(p)
Prints:
0 2013-10-01 00:00:00
1 2013-10-01 00:00:00
2 2013-10-01 00:00:00
3 2013-10-01 01:00:00
4 2013-10-01 01:00:00
Name: date_hour, dtype: datetime64[ns]
