Python: Adding hours to pandas timestamp

I read a csv file into pandas dataframe df and I get the following:
>>> df.columns
Index([u'TDate', u'Hour', u'SPP'], dtype='object')
>>> type(df['TDate'][0])
<class 'pandas.tslib.Timestamp'>
>>> type(df['Hour'][0])
<type 'numpy.int64'>
>>> type(df['TDate'])
<class 'pandas.core.series.Series'>
>>> type(df['Hour'])
<class 'pandas.core.series.Series'>
Both the Hour and TDate columns have 100 elements. I want to add the corresponding elements of Hour to TDate.
I tried the following:
import pandas as pd
from datetime import date, timedelta as td
z3 = pd.DatetimeIndex(df['TDate']).to_pydatetime() + td(hours = df['Hour'])
But I get an error, because it seems td (datetime.timedelta) doesn't take an array as an argument. How do I add each element of Hour to the corresponding element of TDate?

I think you can convert the Hour column with to_timedelta using unit='h' and add the result to the TDate column:
df = pd.DataFrame({'TDate':['2005-01-03','2005-01-04','2005-01-05'],
'Hour':[4,5,6]})
df['TDate'] = pd.to_datetime(df.TDate)
print (df)
Hour TDate
0 4 2005-01-03
1 5 2005-01-04
2 6 2005-01-05
df['TDate'] += pd.to_timedelta(df.Hour, unit='h')
print (df)
Hour TDate
0 4 2005-01-03 04:00:00
1 5 2005-01-04 05:00:00
2 6 2005-01-05 06:00:00
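As a quick sanity check of the approach above, here is a small sketch that compares the vectorized to_timedelta addition with an element-wise datetime.timedelta loop, which is roughly what the question was attempting. The sample values and the extra column name Stamp are made up for illustration.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({'TDate': pd.to_datetime(['2005-01-03', '2005-01-04', '2005-01-05']),
                   'Hour': [4, 5, 6]})

# Vectorized: convert the whole Hour column to timedeltas, then add.
# 'Stamp' is an illustrative column name, not one from the question.
df['Stamp'] = df['TDate'] + pd.to_timedelta(df['Hour'], unit='h')

# Element-wise equivalent: timedelta() only accepts scalars, so loop over pairs.
looped = [d + timedelta(hours=int(h)) for d, h in zip(df['TDate'], df['Hour'])]

print(df['Stamp'].tolist() == looped)
```

Writing to a new column keeps the original TDate values available, which can be handy if the addition might be run more than once.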

Related

Sorting timestamps python

I have a pandas df that contains a column of timestamps. Some of the timestamps come after 12:00 but are written in 12-hour form (e.g. 01:00:00 rather than 13:00:00). I'm trying to add 12 hours to these times so that the whole column is consistently in 24-hour time.
import pandas as pd
import datetime as dt
import numpy as np
d = ({
'time' : ['9:00:00','10:00:00','11:00:00','12:00:00','01:00:00','02:00:00'],
})
df = pd.DataFrame(data=d)
I have used the following code from another question. But I can't get it to include all the values. The dates are also not necessary.
Convert incomplete 12h datetime-like strings into appropriate datetime type
ts = pd.to_datetime(df.time, format = '%H:%M:%S')
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
twelve = ts.dt.time == dt.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset.fillna(method='ffill', inplace=True)
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print(ts)
Output:
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')
My intended Output is
time
0 9:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
I think you need to change the last line of the code: use add with fill_value=0 to replace the missing values when adding offset to ts, and then use .dt.time for Python time objects or .dt.strftime for strings:
ts = ts.add(offset, fill_value=0).dt.time
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'datetime.time'>
1 <class 'datetime.time'>
2 <class 'datetime.time'>
3 <class 'datetime.time'>
4 <class 'datetime.time'>
5 <class 'datetime.time'>
dtype: object
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
dtype: object
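If the dummy dates are not needed at all, a different and simpler technique may be enough. This is only a sketch, and it assumes the rows are in chronological order within a single day, start in the morning, and roll over from 12-hour notation at most once. The names td, rolled_over and fixed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'time': ['9:00:00', '10:00:00', '11:00:00',
                            '12:00:00', '01:00:00', '02:00:00']})

# Parse as durations since midnight, so no dummy date is involved.
td = pd.to_timedelta(df['time'])

# A drop from one row to the next marks the AM -> PM rollover;
# every row from that point on gets a 12-hour shift.
# (Assumption: there is at most one such rollover in the column.)
rolled_over = (td.diff() < pd.Timedelta(0)).cumsum()
fixed = td + rolled_over * pd.Timedelta(hours=12)

# Format back to HH:MM:SS strings via an arbitrary anchor date.
df['time'] = (pd.Timestamp('1900-01-01') + fixed).dt.strftime('%H:%M:%S')
print(df)
```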

Convert Dataframe column to time format in python

I have a dataframe column which looks like this :
It reads M:S.MS. How can I convert it into a M:S:MS timeformat so I can plot it as a time series graph?
If I plot it as it is, python throws an Invalid literal for float() error.
Note: This dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
For custom date use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
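For plotting, it can also be convenient to turn the values into plain elapsed seconds. The following is a minimal sketch building on the timedelta conversion above; the column name seconds is made up for illustration, and the sample strings assume the M:S.f layout described in the question.

```python
import pandas as pd

df = pd.DataFrame({'date': ['00:02.0', '00:05.0', '00:08.1']})

# Prepend a zero hour field so the strings read as H:M:S.f durations.
elapsed = pd.to_timedelta('00:' + df['date'])

# Plain floats avoid the "invalid literal for float()" problem when plotting.
# 'seconds' is an illustrative column name.
df['seconds'] = elapsed.dt.total_seconds()
print(df)
```

Passing df['seconds'] as the x values of a plot should then work with any plotting library, since the column is ordinary float data.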

Convert strings to date format

I have a dataframe with a column of strings indicating month and year (MM-YY), but I need it to be like YYYY,MM,DD, e.g. 2015,10,01
for i in df['End Date (MM-YY)']:
    print i
Mar-16
Nov-16
Jan-16
Jan-16
print type(i)
<type 'str'>
<type 'str'>
<type 'str'>
<type 'str'>
I think you can use to_datetime with parameter format:
df = pd.DataFrame({'End Date (MM-YY)': {0: 'Mar-16',
1: 'Nov-16',
2: 'Jan-16',
3: 'Jan-16'}})
print df
End Date (MM-YY)
0 Mar-16
1 Nov-16
2 Jan-16
3 Jan-16
print pd.to_datetime(df['End Date (MM-YY)'], format='%b-%y')
0 2016-03-01
1 2016-11-01
2 2016-01-01
3 2016-01-01
Name: End Date (MM-YY), dtype: datetime64[ns]
df['date'] = pd.to_datetime(df['End Date (MM-YY)'], format='%b-%y')
If you need to convert the date column to the last day of the month, add a MonthEnd offset:
df['date-end-month'] = df['date'] + pd.offsets.MonthEnd()
print df
End Date (MM-YY) date date-end-month
0 Mar-16 2016-03-01 2016-03-31
1 Nov-16 2016-11-01 2016-11-30
2 Jan-16 2016-01-01 2016-01-31
3 Jan-16 2016-01-01 2016-01-31
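One detail worth checking with MonthEnd is what happens to a date that is already the last day of its month. The sketch below, with made-up sample dates, compares MonthEnd() with MonthEnd(0); which one is appropriate depends on your data.

```python
import pandas as pd

# Hypothetical dates: one at the start of a month, one already at month end.
dates = pd.Series(pd.to_datetime(['2016-03-01', '2016-03-31']))

# MonthEnd() (n=1) always moves forward, so a month-end date jumps a month.
plus_one = dates + pd.offsets.MonthEnd()

# MonthEnd(0) rolls forward only if the date is not already a month end.
plus_zero = dates + pd.offsets.MonthEnd(0)

print(plus_one.tolist())
print(plus_zero.tolist())
```

In the answer above every date is the first of the month, so both variants give the same result there.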
You can use lambda and map functions combined with to_datetime and its format parameter.
Can you provide more information on the data that you are using? I can refine my answer further based on that. Thanks!
If you are trying to do what I think you are...
Use the datetime.datetime.strptime method! It's a wonderful way to specify the format you expect dates to show up in a string, and it returns a nice datetime obj for you to do with what you will.
You can even turn it back into a differently formatted string with datetime.datetime.strftime!
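As a rough illustration of the strptime/strftime round trip, here is a sketch using the sample strings from the question. It assumes an English locale, since %b matches locale-dependent month abbreviations; the variable names are made up.

```python
from datetime import datetime

raw = ['Mar-16', 'Nov-16', 'Jan-16', 'Jan-16']

# '%b-%y' = abbreviated month name, dash, two-digit year.
# The day is not in the string, so it defaults to the 1st.
parsed = [datetime.strptime(s, '%b-%y') for s in raw]

# Turn the datetime objects back into 'YYYY,MM,DD' strings.
formatted = [d.strftime('%Y,%m,%d') for d in parsed]
print(formatted)
```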

Pandas and csv import into dataframe. How best to combine date and hour fields into one

I have a csv file that I am trying to import into pandas.
There are two columns of interest, date and hour, which are the first two columns.
E.g.
date,hour,...
10-1-2013,0,
10-1-2013,0,
10-1-2013,0,
10-1-2013,1,
10-1-2013,1,
How do I import using pandas so that the hour and date are combined, or is that best done after the initial import?
df = DataFrame.from_csv('bingads.csv', sep=',')
If I do the initial import how do I combine the two as a date and then delete the hour?
Thanks
Define your own date_parser:
In [291]: from dateutil.parser import parse
In [292]: import datetime as dt
In [293]: def date_parser(x):
   .....:     date, hour = x.split(' ')
   .....:     return parse(date) + dt.timedelta(0, 3600*int(hour))
In [298]: pd.read_csv('test.csv', parse_dates=[[0,1]], date_parser=date_parser)
Out[298]:
date_hour a b c
0 2013-10-01 00:00:00 1 1 1
1 2013-10-01 00:00:00 2 2 2
2 2013-10-01 00:00:00 3 3 3
3 2013-10-01 01:00:00 4 4 4
4 2013-10-01 01:00:00 5 5 5
Apply read_csv instead of read_clipboard to handle your actual data:
>>> df = pd.read_clipboard(sep=',')
>>> df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='D')/24
>>> del df['hour']
>>> df
date ...
0 2013-10-01 00:00:00 NaN
1 2013-10-01 00:00:00 NaN
2 2013-10-01 00:00:00 NaN
3 2013-10-01 01:00:00 NaN
4 2013-10-01 01:00:00 NaN
[5 rows x 2 columns]
Take a look at the parse_dates argument which pandas.read_csv accepts.
You can do something like:
df = pandas.read_csv('some.csv', parse_dates=True)
# in which case pandas will parse all columns where it finds dates
df = pandas.read_csv('some.csv', parse_dates=[i,j,k])
# in which case pandas will parse the i, j and kth columns for dates
Since you are only using the two columns from the csv file and combining those into one, I would squeeze them into a series of datetime objects like so:
import pandas as pd
from StringIO import StringIO
import datetime as dt
txt='''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''
def date_parser(date, hour):
    dates=[]
    for ed, eh in zip(date, hour):
        month, day, year=list(map(int, ed.split('-')))
        hour=int(eh)
        dates.append(dt.datetime(year, month, day, hour))
    return dates
p=pd.read_csv(StringIO(txt), usecols=[0,1],
    parse_dates=[[0,1]], date_parser=date_parser, squeeze=True)
print p
Prints:
0 2013-10-01 00:00:00
1 2013-10-01 00:00:00
2 2013-10-01 00:00:00
3 2013-10-01 01:00:00
4 2013-10-01 01:00:00
Name: date_hour, dtype: datetime64[ns]
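The date_parser and squeeze arguments used above are deprecated or removed in recent pandas releases, so on a newer version it may be simpler to combine the columns after the import. This is a minimal sketch under that assumption; it uses io.StringIO in place of a real file and assumes the dates are month-day-year, as in the parser above.

```python
import io

import pandas as pd

# Inline stand-in for the csv file; columns A and B are sample data.
txt = """date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10
"""

df = pd.read_csv(io.StringIO(txt))

# Parse the date strings (assumed month-day-year), then add the hours.
df['date'] = (pd.to_datetime(df['date'], format='%m-%d-%Y')
              + pd.to_timedelta(df['hour'], unit='h'))

# The hour column is now redundant.
df = df.drop(columns='hour')
print(df)
```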

Pandas Timedelta in Days

I have a dataframe in pandas called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to calculate people's ages from the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns, so that I can then do something like round(days/365.25). I cannot find a way to do this with a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However, I do not seem to be able to extract the days as an integer so that I can continue with my calculation.
Any help appreciated.
Using the Pandas type Timedelta available since v0.15.0 you also can do:
In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'),
pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]:
date today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15
In[5]: (df['today'] - df['date']).dt.days
Out[5]:
0 63
1 14
dtype: int64
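Applied to the question's own column names, the same idea might look like the sketch below. The sample dates are invented, and dividing by 365.25 and truncating is only an approximation of age in completed years (the question used round instead).

```python
import pandas as pd

# Hypothetical sample rows using the question's column names.
munged_data = pd.DataFrame({
    'entry_date': pd.to_datetime(['2013-04-19', '2013-04-19']),
    'dob': pd.to_datetime(['2001-01-01', '2004-06-01']),
})

# Vectorized difference in whole days.
days = (munged_data['entry_date'] - munged_data['dob']).dt.days

# Approximate age in completed years; astype(int) truncates toward zero.
munged_data['age'] = (days / 365.25).astype(int)
print(munged_data)
```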
You need 0.11 for this (0.11rc1 is out, final prob next week)
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
You need this odd apply at the end because there is not yet full support for timedelta64[ns] scalars (similar to how Timestamps are now used for datetime64[ns]); that is coming in 0.12.
Not sure if you still need it, but in Pandas 0.14 I usually use the .astype('timedelta64[X]') method
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]
Returns:
0 -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')
Returns:
0 -4
dtype: float64
Hope that will help
Suppose you have a pandas series named time_difference which has type numpy.timedelta64[ns].
One way of extracting just the day (or whatever desired attribute) is the following:
just_day = time_difference.apply(lambda x: pd.tslib.Timedelta(x).days)
This function is used because the numpy.timedelta64 object does not have a 'days' attribute.
To convert any type of data into days just use pd.Timedelta().days:
pd.Timedelta(1985, unit='Y').days
84494
