I have two columns: one has dtype datetime64 and the other holds
datetime.time objects. The first column has the day and the second one
the hour and minutes. I am having trouble combining them into a single datetime:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')
You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
                   'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
df = pd.read_csv(x, sep=r'\s+')  # delim_whitespace=True is deprecated since pandas 2.2
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
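The error in the question ('NaT 00:00:00') comes from rows where the date is missing, which a plain string join does not handle. Below is a minimal sketch of two ways to tolerate such rows, assuming a missing date should simply become NaT in the result. The sample frame and the names combined_str and combined_td are made up for illustration.

```python
import datetime as dt
import pandas as pd

# Hypothetical sample: the second row has a missing date (NaT)
df = pd.DataFrame({
    'day': pd.to_datetime(['2016-02-17', None, '2016-06-10']),
    'time': [dt.time(11, 0), dt.time(0, 0), dt.time(10, 30)],
})

# Option 1: keep the string join, but coerce unparseable rows
# such as 'NaT 00:00:00' to NaT instead of raising
combined_str = pd.to_datetime(
    df['day'].astype(str) + ' ' + df['time'].astype(str), errors='coerce')

# Option 2: avoid date strings; add the time of day as a timedelta
# (NaT + timedelta stays NaT)
combined_td = df['day'] + pd.to_timedelta(df['time'].astype(str))

print(combined_str)
print(combined_td)
```

Option 2 avoids re-parsing the dates, which may matter on large frames, but both are sketches rather than the only way to do it.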
Related
I'd like to convert a dataframe column from str to datetime format with pd.to_datetime.
import pandas as pd
data = {'Event Number': [1, 2, 3],
        'Time': ['1PM', '2PM', '5:30PM']}
df = pd.DataFrame(data)
pd.to_datetime(df['Time'], format='%I%p')
However, I got an error message: time data '5:30PM' does not match format '%I%p' (match)
You can just let Pandas guess:
pd.to_datetime('2021-01-01 ' + df['Time'])
Output:
0 2021-01-01 13:00:00
1 2021-01-01 14:00:00
2 2021-01-01 17:30:00
Name: Time, dtype: datetime64[ns]
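How pandas guesses a format can differ between versions, so an explicit alternative may be useful. The sketch below assumes the column only ever contains the two shapes seen in the question ('1PM' and '5:30PM'). It parses with each format in turn and merges the results. The names with_minutes, hour_only, parsed and time_of_day are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Event Number': [1, 2, 3],
                   'Time': ['1PM', '2PM', '5:30PM']})

# Try the format with minutes first; rows that do not match become NaT
with_minutes = pd.to_datetime(df['Time'], format='%I:%M%p', errors='coerce')

# Then parse the hour-only shape the same way
hour_only = pd.to_datetime(df['Time'], format='%I%p', errors='coerce')

# Fill the gaps of one result with the other (date part defaults to 1900-01-01)
parsed = with_minutes.fillna(hour_only)

# Keep only the time of day if the placeholder date is not wanted
df['time_of_day'] = parsed.dt.time
print(df)
```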
I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss using:
d7['date'] = pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f')
but I am getting the error AttributeError: 'str' object has no attribute 'strftime'
I have tried implementing the approach from "Remove dtype datetime NaT", but I am having a hard time combining it with my formatting above.
In my date column I get dates like 2028-01-31 00:00:00.000000 or NaT; I want to make the NaT values blank or None instead.
Sample dataframe:
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-04 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
NaT
Thank you.
"I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss", therefore 20180101 is an example of the initial date format.
Remove dtype datetime NaT expects a datetime format, but pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f') creates a string.
import pandas as pd
# sample data
df = pd.DataFrame({'date': ['20180101', '', '20190101']})
# df view
date
0 20180101
1
2 20190101
# convert to datetime format
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# df view
date
0 2018-01-01
1 NaT
2 2019-01-01
# apply method from link
# use None if you want NoneType. If you want a string, use 'None'
df.date = df.date.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None)
# final output
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
To find problem rows, try the following code instead.
The function returns a str or None, and prints any 'date' value that raises an AttributeError.
df = pd.DataFrame({'date': ['20180101', '', '20190101', 'abcd', 3456]})
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# convert from datetime format to string format
def convert(x):
    """Return a formatted str, or None for missing or unformattable values."""
    try:
        return x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None
    except AttributeError:
        print(x)  # report the offending value before returning None
        return None
# apply the function
df.date = df.date.apply(convert)
# output df
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
3 None
4 None
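The question also asks for blanks as an alternative to None. A minimal sketch of that variant, assuming an empty string is an acceptable "blank", relies on .dt.strftime leaving missing values as NaN, which fillna can then replace. The name parsed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20180101', '', '20190101']})

# Unparseable or empty entries become NaT
parsed = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')

# Format the valid timestamps; NaT rows come back as missing values,
# which are then replaced with an empty string
df['date'] = parsed.dt.strftime('%Y-%m-%d %H:%M:%S').fillna('')
print(df)
```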
Convert a datetime column in a CSV file to an integer timestamp:
yyyy-mm-dd h:mm:ss (current column format) --(convert)--> 143141231 (nanoseconds)
problem
-> object does not match format '%Y-%m-%d %H:%M:%S'
csv file
ex)
time
2019-01-01 0:00:00
2019-01-01 8:50:00
2019-01-01 8:50:00
2019-01-01 8:51:00
2019-01-01 8:51:00
2019-01-01 8:51:00
code
import time
import datetime
import pandas as pd
df = pd.read_csv("data/test.csv")
df.head()
mydate =str(df["time"])
timestamp = time.mktime(datetime.datetime.strptime(mydate, "%Y-%m-%d %H:%M:%S").timetuple())
df["time"] = [str(df[timestamp()][t]) for t in range(len(df))]
df.head()
ns_precision = df
ns_precision.to_csv("data/test_convert.csv", index=False)
expect
time
1131314141
1412313131
1312414123
1231225244
2421412141
result
ValueError: time data '0
2019-01-01 0:00:00\n1 ... 2019-01-30 20:15:00\n
Name: time, Length: 77876,
dtype: object' does not match format '%Y-%m-%d %H:%M:%S'
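The error message shows that str(df["time"]) turned the whole column, index and dtype footer included, into one string, which strptime cannot match. A minimal sketch of a column-wise conversion follows, assuming the values are naive timestamps that should be counted from 1970-01-01 as if they were UTC. The inline csv_text stands in for data/test.csv and is not the real file.

```python
from io import StringIO
import pandas as pd

# Stand-in for data/test.csv, using rows from the question
csv_text = """time
2019-01-01 0:00:00
2019-01-01 8:50:00
2019-01-01 8:51:00
"""
df = pd.read_csv(StringIO(csv_text))

# Parse the whole column at once instead of calling strptime on str(df["time"])
parsed = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')

# Integer nanoseconds since the epoch; the explicit cast to datetime64[ns]
# pins the unit regardless of the resolution pandas chose when parsing
df['time'] = parsed.astype('datetime64[ns]').astype('int64')
print(df)
```

Writing the result with df.to_csv("data/test_convert.csv", index=False), as in the question, would then store the integers.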
I have a dataset with a column named DateTime that has dtype object.
df['DateTime'] = pd.to_datetime(df['DateTime'])
I used the code above to convert the column to datetime format, then split it into separate date and time columns:
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
After the split, however, both columns have object dtype, and converting the time column back to datetime raises: TypeError: <class 'datetime.time'> is not convertible to datetime
How can I convert the time column to datetime format?
You can use datetime.datetime.combine in a list comprehension with zip:
df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
import datetime
df['new'] = [datetime.datetime.combine(a, b) for a, b in zip(df['date'], df['time'])]
print(df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
Or convert to strings, join together and convert again:
df['new'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
print(df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
If you instead use floor to strip the time part and convert the times to timedeltas, you can simply add the two columns with +:
df['date'] = df['DateTime'].dt.floor('d')
df['time'] = pd.to_timedelta(df['DateTime'].dt.strftime('%H:%M:%S'))
df['new'] = df['date'] + df['time']
print(df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
"How to convert it back to datetime format the time column"
There appears to be a misunderstanding. Pandas datetime series must include date and time components. This is non-negotiable. You can simply use pd.to_datetime without specifying a date and use the default 1900-01-01 date:
# data from jezrael; astype(str) because the column holds datetime.time objects
print(pd.to_datetime(df['time'].astype(str), format='%H:%M:%S'))
0 1900-01-01 12:48:20
1 1900-01-01 12:30:45
Name: time, dtype: datetime64[ns]
Or use another date component, for example today's date:
today = pd.Timestamp('today').strftime('%Y-%m-%d')
print(pd.to_datetime(today + ' ' + df['time'].astype(str)))
0 2018-11-25 12:48:20
1 2018-11-25 12:30:45
Name: time, dtype: datetime64[ns]
Or recombine from your date and time series:
print(pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str)))
0 2011-01-01 12:48:20
1 2014-01-01 12:30:45
dtype: datetime64[ns]
I have data:
Symbol bid ask
Timestamp
2014-01-01 21:55:34.378000 EUR/USD 1.37622 1.37693
2014-01-01 21:55:40.410000 EUR/USD 1.37624 1.37698
2014-01-01 21:55:47.210000 EUR/USD 1.37619 1.37696
2014-01-01 21:55:57.963000 EUR/USD 1.37616 1.37696
2014-01-01 21:56:03.117000 EUR/USD 1.37616 1.37694
The timestamp is in GMT. Is there a way to convert that to Eastern?
Note when I do:
data.index
I get output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 21:55:34.378000, ..., 2014-01-01 21:56:03.117000]
Length: 5, Freq: None, Timezone: None
Localize the index (using tz_localize) to UTC (to make the Timestamps timezone-aware) and then convert to Eastern (using tz_convert):
import pytz
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
For example:
import pandas as pd
import pytz
index = pd.date_range('20140101 21:55', freq='15S', periods=5)
df = pd.DataFrame(1, index=index, columns=['X'])
print(df)
# X
# 2014-01-01 21:55:00 1
# 2014-01-01 21:55:15 1
# 2014-01-01 21:55:30 1
# 2014-01-01 21:55:45 1
# 2014-01-01 21:56:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 21:55:00, ..., 2014-01-01 21:56:00]
# Length: 5, Freq: 15S, Timezone: None
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
print(df)
# X
# 2014-01-01 16:55:00-05:00 1
# 2014-01-01 16:55:15-05:00 1
# 2014-01-01 16:55:30-05:00 1
# 2014-01-01 16:55:45-05:00 1
# 2014-01-01 16:56:00-05:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 16:55:00-05:00, ..., 2014-01-01 16:56:00-05:00]
# Length: 5, Freq: 15S, Timezone: US/Eastern
The simplest way is to use to_datetime with utc=True:
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'],
utc=True)
For more flexibility, you can convert timezones with tz_convert(). If your data column/index is not timezone-aware, tz_convert raises a TypeError, so first make the data timezone-aware with tz_localize.
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'])
df.index = df.index.tz_localize('GMT')
df.index = df.index.tz_convert('America/New_York')
This also works similarly for datetime columns, but you need dt after accessing the column:
df['column'] = df['column'].dt.tz_convert('America/New_York')
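For a column that is still timezone-naive, the same localize-then-convert sequence applies through the dt accessor. A minimal sketch, assuming the naive values are known to be UTC (the sample timestamps are taken from the question's data):

```python
import pandas as pd

df = pd.DataFrame({'column': pd.to_datetime(['2014-01-01 21:55:34.378',
                                             '2014-01-01 21:56:03.117'])})

# Naive timestamps: declare them as UTC first, then convert to Eastern
df['column'] = (df['column']
                .dt.tz_localize('UTC')
                .dt.tz_convert('America/New_York'))
print(df)
```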
To convert EST times to an Asian time zone:
df.index = df.index.tz_localize('EST')
df.index = df.index.tz_convert('Asia/Kolkata')
Pandas now has built-in time zone conversion.
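One caveat worth checking: 'EST' names a fixed UTC-05:00 offset, whereas 'America/New_York' follows daylight saving time. The sketch below uses a made-up summer timestamp to show how the choice shifts the converted result by an hour. The variable names are illustrative.

```python
import pandas as pd

# A summer timestamp, when US Eastern time observes daylight saving
naive = pd.DatetimeIndex(['2014-07-01 12:00'])

fixed_est = naive.tz_localize('EST')               # always UTC-05:00
new_york = naive.tz_localize('America/New_York')   # UTC-04:00 in July

# The same wall-clock time lands one hour apart in India
kolkata_from_est = fixed_est.tz_convert('Asia/Kolkata')
kolkata_from_ny = new_york.tz_convert('Asia/Kolkata')
print(kolkata_from_est[0], kolkata_from_ny[0])
```

Which zone name is right depends on whether the source data was recorded with daylight saving applied.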