I have a dataset with a column named DateTime that has dtype object.
df['DateTime'] = pd.to_datetime(df['DateTime'])
I used the code above to convert the column to datetime format, then split it into separate Date and Time columns:
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
but after the split both columns have dtype object, and converting the time column back to datetime raises: TypeError: is not convertible to datetime
How can I convert the time column to datetime format?
You can use datetime.datetime.combine in a list comprehension with zip:
df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['date'] = df['DateTime'].dt.date
df['time'] = df['DateTime'].dt.time
import datetime
df['new'] = [datetime.datetime.combine(a, b) for a, b in zip(df['date'], df['time'])]
print (df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
Or convert to strings, join together and convert again:
df['new'] = pd.to_datetime(df['date'].astype(str) + ' ' +df['time'].astype(str))
print (df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
Alternatively, if you use floor to remove the times and convert the times to timedeltas, you can simply add the two columns with +:
df['date'] = df['DateTime'].dt.floor('d')
df['time'] = pd.to_timedelta(df['DateTime'].dt.strftime('%H:%M:%S'))
df['new'] = df['date'] + df['time']
print (df)
             DateTime        date      time                 new
0 2011-01-01 12:48:20  2011-01-01  12:48:20 2011-01-01 12:48:20
1 2014-01-01 12:30:45  2014-01-01  12:30:45 2014-01-01 12:30:45
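As a quick sanity check, the three approaches above can be compared side by side. This is only a minimal sketch reusing the sample data from this answer; the variable names (dates, times, via_combine, via_strings, via_timedelta) are made up for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'DateTime': ['2011-01-01 12:48:20', '2014-01-01 12:30:45']})
df['DateTime'] = pd.to_datetime(df['DateTime'])

# split into Python date and time objects (both results have dtype object)
dates = df['DateTime'].dt.date
times = df['DateTime'].dt.time

# 1) combine the date and time objects pairwise
via_combine = pd.Series([datetime.datetime.combine(d, t) for d, t in zip(dates, times)])

# 2) join as strings and parse again
via_strings = pd.to_datetime(dates.astype(str) + ' ' + times.astype(str))

# 3) midnight-floored datetimes plus time-of-day timedeltas
via_timedelta = (df['DateTime'].dt.floor('D')
                 + pd.to_timedelta(df['DateTime'].dt.strftime('%H:%M:%S')))

# each approach should reproduce the original DateTime column
for name, result in [('combine', via_combine),
                     ('strings', via_strings),
                     ('timedelta', via_timedelta)]:
    print(name, (result == df['DateTime']).all())
```

The third variant stays vectorized throughout, which may matter on larger frames; the first two go through Python objects or strings.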
How to convert it back to datetime format the time column
There appears to be a misunderstanding. Pandas datetime series must include date and time components. This is non-negotiable. You can simply use pd.to_datetime without specifying a date and use the default 1900-01-01 date:
# data from jezrael's answer
print(pd.to_datetime(df['time'], format='%H:%M:%S'))
0 1900-01-01 12:48:20
1 1900-01-01 12:30:45
Name: time, dtype: datetime64[ns]
Or use another date component, for example today's date:
today = pd.Timestamp('today').strftime('%Y-%m-%d')
print(pd.to_datetime(today + ' ' + df['time'].astype(str)))
0 2018-11-25 12:48:20
1 2018-11-25 12:30:45
Name: time, dtype: datetime64[ns]
Or recombine from your date and time series:
print(pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str)))
0 2011-01-01 12:48:20
1 2014-01-01 12:30:45
dtype: datetime64[ns]
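To see why the split column stops being datetime64, it can help to inspect what .dt.time actually stores. The following is a small sketch, assuming the same sample data as above; going through str with an explicit format is just one cautious route back to datetime64, and the name as_datetime is made up for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'DateTime': pd.to_datetime(['2011-01-01 12:48:20',
                                               '2014-01-01 12:30:45'])})
df['time'] = df['DateTime'].dt.time

# the column holds plain datetime.time objects, so pandas stores it as object dtype
print(df['time'].dtype)
print(type(df['time'].iloc[0]))

# converting via strings with an explicit format gives datetime64 on the default date
as_datetime = pd.to_datetime(df['time'].astype(str), format='%H:%M:%S')
print(as_datetime)
```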
Related
I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss using:
d7['date'] = pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f')
but I am getting the error AttributeError: 'str' object has no attribute 'strftime'
I have tried implementing the approach from "Remove dtype datetime NaT" but am having a hard time combining it with my formatting above.
In my date column I get dates like 2028-01-31 00:00:00.000000 or NaT; I want to turn the NaT values into blanks or None instead.
Sample dataframe:
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-04 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
2019-11-01 00:00:00.000000
NaT
Thank you.
"I am converting my date from yyyymmdd to yyyy-mm-dd hh:mm:ss", therefore 20180101 is an example of the initial date format.
Remove dtype datetime NaT expects a datetime format, but pd.to_datetime(d7['date'], format='%Y%m%d', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S.%f') creates a string.
import pandas as pd
# sample data
df = pd.DataFrame({'date': ['20180101', '', '20190101']})
# df view
date
0 20180101
1
2 20190101
# convert to datetime format
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# df view
date
0 2018-01-01
1 NaT
2 2019-01-01
# apply method from link
# use None if you want NoneType. If you want a string, use 'None'
df.date = df.date.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None)
# final output
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
To find problem rows, try the following code instead.
The function returns a str or None, and prints any 'date' value that causes an AttributeError.
df = pd.DataFrame({'date': ['20180101', '', '20190101', 'abcd', 3456]})
df.date = pd.to_datetime(df.date, format='%Y%m%d', errors='coerce')
# convert from datetime format to string format
def convert(x) -> (str, None):
    try:
        return x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else None
    except AttributeError:
        # print the offending value before returning; a print after return is never reached
        print(x)
        return None
# apply the function
df.date = df.date.apply(convert)
# output df
date
0 2018-01-01 00:00:00
1 None
2 2019-01-01 00:00:00
3 None
4 None
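A compact variant of the same idea builds the result explicitly with object dtype, so that None is stored as-is rather than left to dtype inference. This is a sketch under that assumption, not the only way to do it; the names parsed and formatted are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20180101', '', '20190101', 'abcd']})

# unparseable or empty values become NaT because of errors='coerce'
parsed = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')

# format valid timestamps and keep None for NaT; dtype=object preserves None
formatted = pd.Series(
    [ts.strftime('%Y-%m-%d %H:%M:%S') if pd.notna(ts) else None for ts in parsed],
    index=df.index,
    dtype=object,
)
print(formatted.tolist())
```

Keeping the parsed datetime series separate from the formatted strings also makes it easy to go back to real datetimes later.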
I have two columns, one of type datetime64 and the other of type datetime.time. The
first column has the day and the second one the hour and minutes. I
am having trouble combining them:
Leistung_0011
ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00
Leistung_0011['Start_datetime'] = pd.to_datetime(Leistung_0011['ActStartDateExecution'].astype(str) + ' ' + Leistung_0011['ActStartTimeExecution'].astype(str))
ValueError: ('Unknown string format:', 'NaT 00:00:00')
You can convert to str and join with whitespace before passing to pd.to_datetime:
df['datetime'] = pd.to_datetime(df['day'].astype(str) + ' ' + df['time'].astype(str))
print(df, df.dtypes, sep='\n')
# day time datetime
# 0 2018-01-01 15:00:00 2018-01-01 15:00:00
# 1 2015-12-30 05:00:00 2015-12-30 05:00:00
# day datetime64[ns]
# time object
# datetime datetime64[ns]
# dtype: object
Setup
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'day': ['2018-01-01', '2015-12-30'],
'time': ['15:00', '05:00']})
df['day'] = pd.to_datetime(df['day'])
df['time'] = df['time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())
print(df['day'].dtype, type(df['time'].iloc[0]), sep='\n')
# datetime64[ns]
# <class 'datetime.time'>
Complete example including seconds:
import pandas as pd
from datetime import datetime
from io import StringIO
x = StringIO(""" ActStartDateExecution ActStartTimeExecution
0 2016-02-17 11:00:00
10 2016-04-15 07:15:00
20 2016-06-10 10:30:00""")
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(x, sep=r'\s+')
df['ActStartDateExecution'] = pd.to_datetime(df['ActStartDateExecution'])
df['ActStartTimeExecution'] = df['ActStartTimeExecution'].apply(lambda x: datetime.strptime(x, '%H:%M:%S').time())
df['datetime'] = pd.to_datetime(df['ActStartDateExecution'].astype(str) + ' ' + df['ActStartTimeExecution'].astype(str))
print(df.dtypes)
ActStartDateExecution datetime64[ns]
ActStartTimeExecution object
datetime datetime64[ns]
dtype: object
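The ValueError in the question mentions the string 'NaT 00:00:00', which is what a missing date turns into once it is cast to str and joined with a time. One way to sidestep that, sketched here with hypothetical sample data and variable names, is to add the time of day as a timedelta so that missing dates simply stay NaT.

```python
import datetime

import pandas as pd

# hypothetical sample with one missing day, mirroring the 'NaT 00:00:00' case
days = pd.to_datetime(pd.Series(['2018-01-01', None, '2015-12-30']))
times = pd.Series([datetime.time(15, 0), datetime.time(0, 0), datetime.time(5, 0)])

# convert the time of day to timedeltas and add; NaT in `days` propagates to the result
combined = days + pd.to_timedelta(times.astype(str))
print(combined)
```

If the string-joining approach is preferred, passing errors='coerce' to pd.to_datetime is another option for turning such rows into NaT.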
I have a dataframe column which looks like this :
It reads M:S.MS. How can I convert it into a M:S:MS timeformat so I can plot it as a time series graph?
If I plot it as it is, python throws an Invalid literal for float() error.
Note: This dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
For a custom date, use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
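For plotting, a numeric axis is sometimes easier to work with than datetimes. As a sketch (the names s, elapsed and seconds are made up for illustration), the timedelta route above can be taken one step further to seconds elapsed within the hour:

```python
import pandas as pd

s = pd.Series(['00:02.0', '00:05.0', '00:08.1'])

# prepend an hours field so the strings read as HH:MM:SS.f, then parse as timedeltas
elapsed = pd.to_timedelta('00:' + s)

# seconds since the start of the hour, as plain floats suitable for an x-axis
seconds = elapsed.dt.total_seconds()
print(seconds.tolist())
```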
I have a large data set like this
user category
time
2014-01-01 00:00:00 21155349 2
2014-01-01 00:00:00 56347479 6
2014-01-01 00:00:00 68429517 13
2014-01-01 00:00:00 39055685 4
2014-01-01 00:00:00 521325 13
I want to make it as
user category
time
00:00:00 21155349 2
00:00:00 56347479 6
00:00:00 68429517 13
00:00:00 39055685 4
00:00:00 521325 13
How do you do this using pandas?
If you want to mutate a series (column) in pandas, the pattern is to apply a function to it (one that updates one element in the series at a time), and then to assign that series back into the dataframe.
import pandas
from io import StringIO
# load data
data = '''date,user,category
2014-01-01 00:00:00, 21155349, 2
2014-01-01 00:00:00, 56347479, 6
2014-01-01 00:00:00, 68429517, 13
2014-01-01 00:00:00, 39055685, 4
2014-01-01 00:00:00, 521325, 13'''
df = pandas.read_csv(StringIO(data))
df['date'] = pandas.to_datetime(df['date'])
# make the required change
without_date = df['date'].apply(lambda d: d.time())
df['date'] = without_date
# display results
print(df)
If the problem is that the date is the index, you've got a few more hoops to jump through:
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
# set_index returns a new dataframe, so assign the result
df = df.set_index(ser.apply(lambda d: d.time()))
As suggested by @DSM, if you have pandas later than 0.15.2, you can use the .dt accessor on the series to do fast updates.
df = pandas.read_csv(StringIO(data), index_col='date')
ser = pandas.to_datetime(df.index).to_series()
df = df.set_index(ser.dt.time)
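When the timestamps already live in a DatetimeIndex, the intermediate series may not be needed at all. The sketch below uses hypothetical rows shaped like the question's data; note that the resulting index holds plain datetime.time objects, so it is an object-dtype index rather than a DatetimeIndex.

```python
import datetime

import pandas as pd

# hypothetical rows in the shape of the question's data
index = pd.DatetimeIndex(['2014-01-01 00:00:00', '2014-01-01 00:00:05'], name='time')
df = pd.DataFrame({'user': [21155349, 56347479], 'category': [2, 6]}, index=index)

# DatetimeIndex.time returns an array of datetime.time objects
df.index = pd.Index(df.index.time, name='time')
print(df)
```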
I have data:
Symbol bid ask
Timestamp
2014-01-01 21:55:34.378000 EUR/USD 1.37622 1.37693
2014-01-01 21:55:40.410000 EUR/USD 1.37624 1.37698
2014-01-01 21:55:47.210000 EUR/USD 1.37619 1.37696
2014-01-01 21:55:57.963000 EUR/USD 1.37616 1.37696
2014-01-01 21:56:03.117000 EUR/USD 1.37616 1.37694
The timestamp is in GMT. Is there a way to convert that to Eastern?
Note when I do:
data.index
I get output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 21:55:34.378000, ..., 2014-01-01 21:56:03.117000]
Length: 5, Freq: None, Timezone: None
Localize the index (using tz_localize) to UTC (to make the Timestamps timezone-aware) and then convert to Eastern (using tz_convert):
import pytz
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
For example:
import pandas as pd
import pytz
index = pd.date_range('20140101 21:55', freq='15S', periods=5)
df = pd.DataFrame(1, index=index, columns=['X'])
print(df)
# X
# 2014-01-01 21:55:00 1
# 2014-01-01 21:55:15 1
# 2014-01-01 21:55:30 1
# 2014-01-01 21:55:45 1
# 2014-01-01 21:56:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 21:55:00, ..., 2014-01-01 21:56:00]
# Length: 5, Freq: 15S, Timezone: None
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
print(df)
# X
# 2014-01-01 16:55:00-05:00 1
# 2014-01-01 16:55:15-05:00 1
# 2014-01-01 16:55:30-05:00 1
# 2014-01-01 16:55:45-05:00 1
# 2014-01-01 16:56:00-05:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 16:55:00-05:00, ..., 2014-01-01 16:56:00-05:00]
# Length: 5, Freq: 15S, Timezone: US/Eastern
The simplest way is to use to_datetime with utc=True:
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'],
utc=True)
For more flexibility, you can convert timezones with tz_convert(). If your data column/index is not timezone-aware, tz_convert() raises a TypeError, so you should first make the data timezone-aware with tz_localize().
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
'2014-01-01 21:55:40.410000',
'2014-01-01 21:55:47.210000',
'2014-01-01 21:55:57.963000',
'2014-01-01 21:56:03.117000'])
df.index = df.index.tz_localize('GMT')
df.index = df.index.tz_convert('America/New_York')
This also works similarly for datetime columns, but you need dt after accessing the column:
df['column'] = df['column'].dt.tz_convert('America/New_York')
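Put together for a column rather than an index, the localize-then-convert sequence might look like the sketch below. The column names Timestamp and Timestamp_eastern are hypothetical, and the naive timestamps are assumed to be in GMT/UTC as in the question.

```python
import datetime

import pandas as pd

# hypothetical column of naive timestamps that are known to be in GMT/UTC
df = pd.DataFrame({'Timestamp': pd.to_datetime(['2014-01-01 21:55:34.378',
                                                '2014-01-01 21:56:03.117'])})

# make the column timezone-aware first, then convert to Eastern time
df['Timestamp_eastern'] = (df['Timestamp']
                           .dt.tz_localize('UTC')
                           .dt.tz_convert('America/New_York'))
print(df)
```

The original naive column is left untouched, which keeps it easy to verify the conversion against the source values.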
To convert EST times to an Asian time zone:
df.index = df.index.tz_localize('EST')
df.index = df.index.tz_convert('Asia/Kolkata')
Pandas now has built-in time zone conversion.
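One caveat worth illustrating: the zone name 'EST' denotes a fixed UTC-05:00 offset, whereas 'America/New_York' (or 'US/Eastern') observes daylight saving time. The sketch below uses a hypothetical July timestamp to show that the two choices give results one hour apart; the variable names are made up for illustration.

```python
import pandas as pd

naive = pd.DatetimeIndex(['2014-07-01 12:00:00'])

# 'EST' is a fixed UTC-05:00 offset; 'America/New_York' is UTC-04:00 in July
fixed = naive.tz_localize('EST').tz_convert('Asia/Kolkata')
dst_aware = naive.tz_localize('America/New_York').tz_convert('Asia/Kolkata')

print(fixed[0])
print(dst_aware[0])
```

Which zone name is appropriate depends on how the source timestamps were recorded, so it is worth checking before localizing.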