I have a pandas df that contains a column of timestamps. Some of the timestamps fall after noon but are written in 12hr time (01:00:00 instead of 13:00:00). I'm trying to add 12hrs to these times so the whole column is consistently in 24hr time.
import pandas as pd
import datetime as dt
import numpy as np
d = ({
'time' : ['9:00:00','10:00:00','11:00:00','12:00:00','01:00:00','02:00:00'],
})
df = pd.DataFrame(data=d)
I have used the following code from another question, "Convert incomplete 12h datetime-like strings into appropriate datetime type", but I can't get it to include all the values. The dates are also not necessary.
ts = pd.to_datetime(df.time, format = '%H:%M:%S')
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
twelve = ts.dt.time == dt.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset.fillna(method='ffill', inplace=True)
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print(ts)
Output:
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')
My intended Output is
time
0 9:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
I think you need to change the last row of the code to use add with fill_value=0, which replaces the missing values, and then use .dt.time for Python times or .dt.strftime for strings:
ts = ts.add(offset, fill_value=0).dt.time
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'datetime.time'>
1 <class 'datetime.time'>
2 <class 'datetime.time'>
3 <class 'datetime.time'>
4 <class 'datetime.time'>
5 <class 'datetime.time'>
dtype: object
ts = ts.add(offset, fill_value=0).dt.strftime('%H:%M:%S')
print (ts)
0 09:00:00
1 10:00:00
2 11:00:00
3 12:00:00
4 13:00:00
5 14:00:00
dtype: object
print (ts.apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
dtype: object
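If the times are known to be in chronological order within a single day, a shorter route may be to count how often the clock appears to jump backwards and add 12 hours per jump. This is only a sketch under that ordering assumption; the names td, wraps, fixed and the time_24h column are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({'time': ['9:00:00', '10:00:00', '11:00:00',
                            '12:00:00', '01:00:00', '02:00:00']})

# Parse the strings as durations since midnight.
td = pd.to_timedelta(df['time'])

# Assumption: rows are in chronological order, so a negative step
# (e.g. 12:00 -> 01:00) marks the switch from AM to PM.
wraps = (td.diff() < pd.Timedelta(0)).cumsum()

# Add 12 hours for every backwards jump seen so far.
fixed = td + pd.to_timedelta(wraps * 12, unit='h')

# Attach a dummy date only to format the result as HH:MM:SS strings.
df['time_24h'] = (pd.Timestamp('1900-01-01') + fixed).dt.strftime('%H:%M:%S')
print(df)
```

This only makes sense for a single AM-to-PM switch; a second backwards jump would push the value past 24 hours.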
Related
I have two columns, one of type datetime64 and the other of type datetime.time. The
first column has the day and the second one the hour and minutes. I
am having trouble parsing them:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')
You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
from datetime import datetime
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(x, sep=r'\s+')
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
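The 'NaT 00:00:00' in the error from the question suggests that some dates are missing, and the string-joining approach then hands an unparseable string to to_datetime. One possible way around that is to turn the time column into a timedelta and add it to the date column, so a missing date simply stays NaT. A minimal sketch, assuming the column names day and time from the Setup above and made-up sample values:

```python
import datetime as dt

import pandas as pd

# Hypothetical sample data with one missing date.
df = pd.DataFrame({
    'day': pd.to_datetime(['2016-02-17', None, '2016-06-10']),
    'time': [dt.time(11, 0), dt.time(0, 0), dt.time(10, 30)],
})

# str(datetime.time) gives 'HH:MM:SS', which to_timedelta can parse.
time_offset = pd.to_timedelta(df['time'].astype(str))

# datetime64 + timedelta64 is vectorized; NaT + timedelta stays NaT.
df['datetime'] = df['day'] + time_offset
print(df)
```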
In Pandas, I would like to add a number of days to a date/datetime (two columns).
Example:
dates = pd.Series(pd.date_range("20180101 00:00", "20180104 00:00"))
0 2018-01-01
1 2018-01-02
2 2018-01-03
3 2018-01-04
dtype: datetime64[ns]
days = pd.Series(np.arange(4)).astype('float')
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
What I have tried (and the associated error I get):
dates + days
TypeError: cannot operate on a series without a rhs of a series/ndarray of type datetime64[ns] or a timedelta
dates + days.astype('int')
TypeError: incompatible type for a datetime/timedelta operation [__add__]
dates + pd.DateOffset(days=days)
TypeError: DatetimeIndex cannot perform the operation +
dates + np.timedelta64(days.values)
ValueError: Could not convert object to NumPy timedelta
dates + pd.offsets.Day(days)
TypeError: cannot convert the series to
dates + pd.datetools.timedelta(days=days)
TypeError: unsupported type for timedelta days component: Series
At last I have found two methods:
dates + pd.to_timedelta(days, unit='D')
or
dates + pd.TimedeltaIndex(days, unit='D')
Which both produce:
0 2018-01-01
1 2018-01-03
2 2018-01-05
3 2018-01-07
dtype: datetime64[ns]
(It took ages to find the relevant documentation: https://pandas.pydata.org/pandas-docs/stable/timedeltas.html#to-timedelta)
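For reference, here is the first method as one self-contained snippet. It is a sketch reusing the dates and days series from the example; the half_day value is an extra illustration that float day counts may be fractional.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.date_range("2018-01-01", "2018-01-04"))
days = pd.Series(np.arange(4)).astype('float')

# Convert the numeric day counts to timedeltas, then add element-wise.
result = dates + pd.to_timedelta(days, unit='D')
print(result)

# Fractional days also work: 0.5 days is 12 hours.
half_day = pd.Timestamp('2018-01-01') + pd.to_timedelta(0.5, unit='D')
print(half_day)
```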
I want to split DATE_H_REAL and create two columns, one for the date and one for the hour. I use this:
from datetime import datetime
df_picru = datetime.strptime(df_picru['DATE_H_REAL'], '%Y-%m-%d %H:%M:%S')
df_picru['day'] = df_picru.strftime('%Y-%m-%d')
df_picru['hour'] = df_picru.strftime('%H:%M:%S')
My data look like this
0 NaN
1 NaN
2 NaN
3 02/02/2016 16:16
4 02/02/2016 16:17
5 02/02/2016 16:18
In pandas you need to_datetime + Series.dt.strftime if you need the output as strings:
df_picru = pd.DataFrame({'DATE_H_REAL':['02/02/2016 16:16',
'02/02/2016 16:17', np.nan]})
df_picru['DATE_H_REAL'] = pd.to_datetime(df_picru['DATE_H_REAL'])
df_picru['day'] = df_picru['DATE_H_REAL'].dt.strftime('%Y-%m-%d')
df_picru['hour'] = df_picru['DATE_H_REAL'].dt.strftime('%H:%M:%S')
print (df_picru)
DATE_H_REAL day hour
0 2016-02-02 16:16:00 2016-02-02 16:16:00
1 2016-02-02 16:17:00 2016-02-02 16:17:00
2 NaT NaT NaT
print (type(df_picru.loc[0, 'day']))
<class 'str'>
print (type(df_picru.loc[0, 'hour']))
<class 'str'>
print (df_picru['DATE_H_REAL'].dtypes)
datetime64[ns]
Or use Series.dt.date + Series.dt.time if you need the output as Python date and Python time objects:
df_picru['DATE_H_REAL'] = pd.to_datetime(df_picru['DATE_H_REAL'])
df_picru['day'] = df_picru['DATE_H_REAL'].dt.date
df_picru['hour'] = df_picru['DATE_H_REAL'].dt.time
print (df_picru)
DATE_H_REAL day hour
0 2016-02-02 16:16:00 2016-02-02 16:16:00
1 2016-02-02 16:17:00 2016-02-02 16:17:00
2 NaT NaN NaN
print (type(df_picru.loc[0, 'day']))
<class 'datetime.date'>
print (type(df_picru.loc[0, 'hour']))
<class 'datetime.time'>
print (df_picru['DATE_H_REAL'].dtypes)
datetime64[ns]
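A value like 02/02/2016 does not show whether the day or the month comes first, so it may be safer to pass an explicit format. The sketch below assumes the data is day-first (that is a guess, not something the question confirms) and uses a made-up second row, 03/02/2016, to make the difference visible. It also uses dt.normalize() to keep the day column as datetime64 instead of Python objects.

```python
import numpy as np
import pandas as pd

# Second value is hypothetical, chosen so day and month differ.
df_picru = pd.DataFrame({'DATE_H_REAL': ['02/02/2016 16:16',
                                         '03/02/2016 16:17', np.nan]})

# Assumption: day comes before month in the source strings.
parsed = pd.to_datetime(df_picru['DATE_H_REAL'], format='%d/%m/%Y %H:%M')

# normalize() resets the time part to midnight but keeps datetime64 dtype.
df_picru['day'] = parsed.dt.normalize()
df_picru['hour'] = parsed.dt.strftime('%H:%M:%S')
print(df_picru)
```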
I read a csv file into pandas dataframe df and I get the following:
df.columns
Index([u'TDate', u'Hour', u'SPP'], dtype='object')
>>> type(df['TDate'][0])
<class 'pandas.tslib.Timestamp'>
type(df['Hour'][0])
<type 'numpy.int64'>
>>> type(df['TDate'])
<class 'pandas.core.series.Series'>
>>> type(df['Hour'])
<class 'pandas.core.series.Series'>
Both the Hour and TDate columns have 100 elements. I want to add the corresponding elements of Hour to TDate.
I tried the following:
import pandas as pd
from datetime import date, timedelta as td
z3 = pd.DatetimeIndex(df['TDate']).to_pydatetime() + td(hours = df['Hour'])
But I get an error, as it seems td doesn't take an array as an argument. How do I add each element of Hour to the corresponding element of TDate?
I think you can convert the Hour column with to_timedelta and unit='h', then add it to the TDate column:
df = pd.DataFrame({'TDate':['2005-01-03','2005-01-04','2005-01-05'],
'Hour':[4,5,6]})
df['TDate'] = pd.to_datetime(df.TDate)
print (df)
Hour TDate
0 4 2005-01-03
1 5 2005-01-04
2 6 2005-01-05
df['TDate'] += pd.to_timedelta(df.Hour, unit='h')
print (df)
Hour TDate
0 4 2005-01-03 04:00:00
1 5 2005-01-04 05:00:00
2 6 2005-01-05 06:00:00
I have a dataframe with a column of strings indicating month and year (MM-YY), but I need it in the form YYYY,MM,DD, e.g. 2015,10,01.
for i in df['End Date (MM-YY)']:
print i
Mar-16
Nov-16
Jan-16
Jan-16
print type(i)
<type 'str'>
<type 'str'>
<type 'str'>
<type 'str'>
I think you can use to_datetime with parameter format:
df = pd.DataFrame({'End Date (MM-YY)': {0: 'Mar-16',
1: 'Nov-16',
2: 'Jan-16',
3: 'Jan-16'}})
print(df)
End Date (MM-YY)
0 Mar-16
1 Nov-16
2 Jan-16
3 Jan-16
print(pd.to_datetime(df['End Date (MM-YY)'], format='%b-%y'))
0 2016-03-01
1 2016-11-01
2 2016-01-01
3 2016-01-01
Name: End Date (MM-YY), dtype: datetime64[ns]
df['date'] = pd.to_datetime(df['End Date (MM-YY)'], format='%b-%y')
If you need to convert the date column to the last day of the month, use MonthEnd:
df['date-end-month'] = df['date'] + pd.offsets.MonthEnd()
print(df)
End Date (MM-YY) date date-end-month
0 Mar-16 2016-03-01 2016-03-31
1 Nov-16 2016-11-01 2016-11-30
2 Jan-16 2016-01-01 2016-01-31
3 Jan-16 2016-01-01 2016-01-31
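The question asks for strings of the form YYYY,MM,DD, which can be produced from the parsed column with dt.strftime. The sketch below shows that, and also uses MonthEnd(0) instead of MonthEnd(): with n=0 a date that is already the last day of its month is left alone rather than moved to the next month. The column names date_str and month_end are illustrative, and %b assumes English month abbreviations in the current locale.

```python
import pandas as pd

df = pd.DataFrame({'End Date (MM-YY)': ['Mar-16', 'Nov-16', 'Jan-16']})

# '%b-%y' = abbreviated month name, two-digit year; the day defaults to 1.
df['date'] = pd.to_datetime(df['End Date (MM-YY)'], format='%b-%y')

# Format as 'YYYY,MM,DD' strings.
df['date_str'] = df['date'].dt.strftime('%Y,%m,%d')

# MonthEnd(0) rolls forward to the month end only if not already there.
df['month_end'] = df['date'] + pd.offsets.MonthEnd(0)
print(df)
```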
You can use lambda and map functions combined with to_datetime with the format parameter.
Can you provide more information on the data that you are using? I can refine my answer further based on that information. Thanks!
If you are trying to do what I think you are...
Use the datetime.datetime.strptime method! It's a wonderful way to specify the format you expect dates to show up in a string, and it returns a nice datetime obj for you to do with what you will.
You can even turn it back into a differently formatted string with datetime.datetime.strftime!
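As a rough illustration of the strptime/strftime round trip without pandas, here is a sketch that maps a lambda over a plain list. The list raw stands in for the values of the column and is an assumption; %b again assumes English month abbreviations.

```python
from datetime import datetime

raw = ['Mar-16', 'Nov-16', 'Jan-16']

# Parse each 'Mon-YY' string, then re-format it as 'YYYY,MM,DD'.
# strptime fills in day 1 when the format has no day field.
converted = list(map(
    lambda s: datetime.strptime(s, '%b-%y').strftime('%Y,%m,%d'),
    raw,
))
print(converted)
```

For a whole DataFrame column, the vectorized pd.to_datetime with format is usually the more convenient choice.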