Convert time object to datetime format in python pandas


I have a dataset with a column named DateTime that has dtype object.
df['DateTime'] = pd.to_datetime(df['DateTime'])
I used the code above to convert it to datetime format, then split the column to get Date and Time separately:
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
but after the split both columns have dtype object, and converting the time column back to datetime raises this error: TypeError: is not convertible to datetime
How can I convert the time column to datetime format?

You can use datetime.datetime.combine in a list comprehension with zip:
import datetime
import pandas as pd
df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
# combine each date object with its matching time object
df['new'] = [datetime.datetime.combine(a, b) for a, b in zip(df['date'], df['time'])]
print(df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
Or convert both columns to strings, join them, and convert again:
df['new'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
print (df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
Alternatively, if you strip the times with floor and convert the times to timedeltas, you can simply add the two columns with +:
df['date'] = df['DateTime'].dt.floor('D')
df['time'] = pd.to_timedelta(df['DateTime'].dt.strftime('%H:%M:%S'))
df['new'] = df['date'] + df['time']
print (df)
DateTime date time new
0 2011-01-01 12:48:20 2011-01-01 12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45 2014-01-01 12:30:45 2014-01-01 12:30:45
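The same split-and-add idea can probably be done without the strftime round-trip. The sketch below is an alternative, not part of the original answer: it assumes you only need a midnight-normalized date column plus a timedelta time-of-day column, and uses dt.normalize() and plain subtraction instead of floor and strftime.

```python
import pandas as pd

df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])

# normalize() resets the time to midnight but keeps the datetime64 dtype
df['date'] = df['DateTime'].dt.normalize()
# subtracting midnight leaves the time of day as a timedelta64 column
df['time'] = df['DateTime'] - df['date']
# adding the two columns back reproduces the original timestamps
df['new'] = df['date'] + df['time']
print(df.dtypes)
```

Both helper columns stay in native pandas dtypes, so no object columns of datetime.date or datetime.time are created.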

"How can I convert the time column back to datetime format?"
There appears to be a misunderstanding. A pandas datetime series always stores both a date and a time component; there is no time-only datetime dtype. You can simply use pd.to_datetime without specifying a date, in which case the default date 1900-01-01 is used:
# data from jezrael's answer
print(pd.to_datetime(df['time'].astype(str), format='%H:%M:%S'))
0 1900-01-01 12:48:20
1 1900-01-01 12:30:45
Name: time, dtype: datetime64[ns]
Or use another date component, for example today's date:
today = pd.Timestamp('today').strftime('%Y-%m-%d')
print(pd.to_datetime(today + ' ' + df['time'].astype(str)))
0 2018-11-25 12:48:20
1 2018-11-25 12:30:45
Name: time, dtype: datetime64[ns]
Or recombine from your date and time series:
print(pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str)))
0 2011-01-01 12:48:20
1 2014-01-01 12:30:45
dtype: datetime64[ns]
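If you want to attach a date of your choosing without building strings for the whole timestamp, one possible sketch is to turn each datetime.time into a timedelta since midnight and add it to a single reference Timestamp. The times Series and the reference date 2018-11-25 below are hypothetical stand-ins for your own data.

```python
import datetime
import pandas as pd

# hypothetical column of datetime.time objects, like df['time'] above
times = pd.Series([datetime.time(12, 48, 20), datetime.time(12, 30, 45)], name='time')

# str(time) gives 'HH:MM:SS', which to_timedelta parses as a duration
offsets = pd.to_timedelta(times.astype(str))
# the reference date is an arbitrary choice; pick whatever day you need
result = pd.Timestamp('2018-11-25') + offsets
print(result)
```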

Related

Pandas, Datetime NaT Removal and replace with None, 'str' object has no attribute 'strftime'

I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss using:
d7['date'] = pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f')
but I am getting the error AttributeError: 'str' object has no attribute 'strftime'
I have tried implementing the approach from "Remove dtype datetime NaT" but am having a hard time combining it with the formatting above.
In my date column I get dates like 2028-01-31 00:00:00.000000 or NaT, and I want to turn the NaT values into blanks or None instead.
Sample dataframe:
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-04 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
NaT
Thank you.

"I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss", therefore 20180101 is an example of the initial date format.
The "Remove dtype datetime NaT" approach expects a column in datetime format, but pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f') already produces strings, so calling strftime on those values again fails.
import pandas as pd
# sample data
df = pd.DataFrame({'date': ['20180101', '', '20190101']})
# df view
date
0 20180101
1
2 20190101
# convert to datetime format
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# df view
date
0 2018-01-01
1 NaT
2 2019-01-01
# apply method from link
# use None if you want NoneType. If you want a string, use 'None'
df.date = df.date.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None)
# final output
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
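If blank strings are acceptable instead of None, a vectorized variant (not from the linked answer) is to format with dt.strftime and fill the gaps afterwards. This relies on the behaviour of recent pandas versions, where dt.strftime leaves NaT rows as missing values rather than raising.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20180101', '', '20190101']})
parsed = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')

# dt.strftime leaves NaT rows as missing values instead of raising
formatted = parsed.dt.strftime('%Y-%m-%d %H:%M:%S')
# fillna('') turns those missing rows into blank strings
df['date'] = formatted.fillna('')
print(df)
```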
To find problem rows, try the following code instead. The function returns a string or None, and prints any 'date' value that raises an AttributeError:
import pandas as pd
from typing import Optional
df = pd.DataFrame({'date': ['20180101', '', '20190101', 'abcd', 3456]})
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# convert from datetime format to string format
def convert(x) -> Optional[str]:
    try:
        return x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None
    except AttributeError:
        # report the offending value before giving up on it
        print(x)
        return None
# apply the function
df.date = df.date.apply(convert)
# output df
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
3 None
4 None
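Because errors='coerce' silently turns unparseable values into NaT, another way to find problem rows is to compare the raw strings with the parsed result. This sketch is an addition, not the original answer's method; the sample values, including the invalid '2019-13-01', are hypothetical.

```python
import pandas as pd

raw = pd.Series(['20180101', '', '20190101', 'abcd', '2019-13-01'])
parsed = pd.to_datetime(raw, format='%Y%m%d', errors='coerce')

# blank or missing inputs are expected to become NaT, so exclude them
is_blank = raw.isna() | (raw.astype(str).str.strip() == '')
# a row is a problem if it had a value but still ended up as NaT
bad_rows = raw[parsed.isna() & ~is_blank]
print(bad_rows)
```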

Parsing datetime64 and datetime.time Python 3.6.5

I have two columns: one of type datetime64 and one holding datetime.time objects. The first column has the day and the second one the hours and minutes. I am having trouble parsing them:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')

You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
df = pd.read_csv(x, sep=r'\s+')
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
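The ValueError in the question ('NaT 00:00:00') comes from a missing date being turned into the string 'NaT'. A possible workaround, sketched here under the assumption that only the date column has missing values, is to skip the string join and add the time as a timedelta, since NaT + timedelta stays NaT. The sample frame is hypothetical.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'ActStartDateExecution': pd.to_datetime(['2016-02-17', None, '2016-06-10']),
    'ActStartTimeExecution': [datetime.time(11, 0), datetime.time(7, 15), datetime.time(10, 30)],
})

# turn each datetime.time into a timedelta since midnight
offsets = pd.to_timedelta(df['ActStartTimeExecution'].astype(str))
# datetime64 + timedelta64 is vectorized; missing dates simply stay NaT
df['Start_datetime'] = df['ActStartDateExecution'] + offsets
print(df)
```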

Convert Dataframe column to time format in python

I have a dataframe column which looks like this:
It reads M:S.MS. How can I convert it into a M:S:MS time format so I can plot it as a time series graph?
If I plot it as it is, Python throws an "Invalid literal for float()" error.
Note: This dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.

import pandas as pd
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas (starting again from the original strings):
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
To use a custom date instead of 1900-01-01 (again starting from the original strings):
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
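For plotting, a numeric axis is often the simplest option. One possible sketch, building on the timedelta conversion above, is to turn each value into elapsed seconds as a float; the 'seconds' column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'date': ['00:02.0', '00:05.0', '00:08.1']})

# prefix an hour field so to_timedelta sees 'HH:MM:SS.f'
elapsed = pd.to_timedelta('00:' + df['date'])
# float seconds work directly as an x-axis in most plotting libraries
df['seconds'] = elapsed.dt.total_seconds()
print(df)
```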

Python Pandas remove date from timestamp

I have a large data set like this
user category
time
2014-01-01 00:00:00 21155349 2
2014-01-01 00:00:00 56347479 6
2014-01-01 00:00:00 68429517 13
2014-01-01 00:00:00 39055685 4
2014-01-01 00:00:00 521325 13
I want to make it as
user category
time
00:00:00 21155349 2
00:00:00 56347479 6
00:00:00 68429517 13
00:00:00 39055685 4
00:00:00 521325 13
How can I do this using pandas?

If you want to mutate a series (column) in pandas, the pattern is to apply a function to it (which updates one element of the series at a time) and then assign that series back into the dataframe:
import pandas
from io import StringIO
# load data
data = '''date,user,category
2014-01-01 00:00:00, 21155349, 2
2014-01-01 00:00:00, 56347479, 6
2014-01-01 00:00:00, 68429517, 13
2014-01-01 00:00:00, 39055685, 4
2014-01-01 00:00:00, 521325, 13'''
df = pandas.read_csv(StringIO(data))
df['date'] = pandas.to_datetime(df['date'])
# make the required change
without_date = df['date'].apply(lambda d: d.time())
df['date'] = without_date
# display results
print(df)
If the problem is because the date is the index, you've got a few more hoops to jump through:
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
# set_index returns a new frame, so assign the result
df = df.set_index(ser.apply(lambda d: d.time()))
As suggested by @DSM, if you have pandas later than 0.15.2, you can use the .dt accessor on the series to do fast updates:
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
df = df.set_index(ser.dt.time)
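When the index is already a DatetimeIndex, a shorter route is likely available: DatetimeIndex.time returns the time-of-day part as datetime.time objects, so it can be assigned straight back to the index. This is a sketch, not from the original answer; the two-row sample and parse_dates=True are assumptions.

```python
import pandas as pd
from io import StringIO

data = '''date,user,category
2014-01-01 00:00:00,21155349,2
2014-01-01 00:00:00,56347479,6'''

# parse_dates=True parses the index column into a DatetimeIndex
df = pd.read_csv(StringIO(data), index_col='date', parse_dates=True)
# DatetimeIndex.time returns an array of datetime.time objects
df.index = df.index.time
print(df)
```

For a regular column, the equivalent is df['date'].dt.time.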

Converting time zone pandas dataframe

I have data:
Symbol bid ask
Timestamp
2014-01-01 21:55:34.378000 EUR/USD 1.37622 1.37693
2014-01-01 21:55:40.410000 EUR/USD 1.37624 1.37698
2014-01-01 21:55:47.210000 EUR/USD 1.37619 1.37696
2014-01-01 21:55:57.963000 EUR/USD 1.37616 1.37696
2014-01-01 21:56:03.117000 EUR/USD 1.37616 1.37694
The timestamp is in GMT. Is there a way to convert that to Eastern?
Note when I do:
data.index
I get output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 21:55:34.378000, ..., 2014-01-01 21:56:03.117000]
Length: 5, Freq: None, Timezone: None

Localize the index (using tz_localize) to UTC (to make the Timestamps timezone-aware) and then convert to Eastern (using tz_convert):
import pytz
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
For example:
import pandas as pd
import pytz
index = pd.date_range('20140101 21:55', freq='15S', periods=5)
df = pd.DataFrame(1, index=index, columns=['X'])
print(df)
# X
# 2014-01-01 21:55:00 1
# 2014-01-01 21:55:15 1
# 2014-01-01 21:55:30 1
# 2014-01-01 21:55:45 1
# 2014-01-01 21:56:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 21:55:00, ..., 2014-01-01 21:56:00]
# Length: 5, Freq: 15S, Timezone: None
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
print(df)
# X
# 2014-01-01 16:55:00-05:00 1
# 2014-01-01 16:55:15-05:00 1
# 2014-01-01 16:55:30-05:00 1
# 2014-01-01 16:55:45-05:00 1
# 2014-01-01 16:56:00-05:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 16:55:00-05:00, ..., 2014-01-01 16:56:00-05:00]
# Length: 5, Freq: 15S, Timezone: US/Eastern

The simplest way is to parse the timestamps with to_datetime(utc=True), which makes them timezone-aware UTC, and then convert them with tz_convert:
import pandas as pd
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'],
utc=True)
df.index = df.index.tz_convert('America/New_York')
For more flexibility, you can convert time zones with tz_convert(). If your column/index is not timezone-aware, tz_convert() raises a TypeError, so first make the data timezone-aware with tz_localize:
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'])
df.index = df.index.tz_localize('GMT')
df.index = df.index.tz_convert('America/New_York')
This works the same way for timezone-aware datetime columns, but you need the .dt accessor after selecting the column:
df['column'] = df['column'].dt.tz_convert('America/New_York')
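For a column that is still naive, a minimal sketch is to chain tz_localize and tz_convert through the .dt accessor. The 'Timestamp' column name and the two sample values are assumptions taken from the question's data.

```python
import pandas as pd

df = pd.DataFrame({'Timestamp': pd.to_datetime(['2014-01-01 21:55:34.378',
                                                 '2014-01-01 21:56:03.117'])})

# a naive column must be localized (here to UTC) before it can be converted
df['Timestamp'] = df['Timestamp'].dt.tz_localize('UTC').dt.tz_convert('America/New_York')
print(df)
```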

To convert EST time into an Asian time zone:
df.index = df.index.tz_localize('EST').tz_convert('Asia/Kolkata')
Pandas now has built-in time zone conversion.
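One detail worth checking with the code above: 'EST' is a fixed UTC-5 offset with no daylight saving time, whereas 'America/New_York' switches to EDT (UTC-4) in summer. The sketch below, using a hypothetical July timestamp, shows that the two choices give different results once converted.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2014-07-01 12:00'])

# 'EST' is always UTC-5, even in July
fixed = idx.tz_localize('EST').tz_convert('Asia/Kolkata')
# 'America/New_York' follows daylight saving time (UTC-4 in July)
dst_aware = idx.tz_localize('America/New_York').tz_convert('Asia/Kolkata')
print(fixed[0], dst_aware[0])
```

If the source data follows US Eastern local time year-round, the DST-aware zone name is usually the safer choice.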

We Keep Coding
