pd.to_datetime wrong format - python

I have a dataframe df, where I want to set the column 'Time' to a datetimeindex. The column before converting looks like this:
01-10-19 09:05
01-10-19 10:04
01-10-19 11:05
01-10-19 12:04
01-10-19 13:04
...
31-05-20 22:05
31-05-20 23:05
01-06-20 00:05
01-06-20 01:05
01-06-20 02:05
So I tried the following line of code:
df['Time'] = pd.to_datetime(df['Time'], format='%d-%m-%Y %H:%M', errors='coerce')
This produced only NaT values in the column, and no DatetimeIndex was set. I also tried changing the format in several ways, such as '%%dd-%%mm-%%YY %%HH:%%MM' or '%d%d-%m%m-%Y%Y %H%H:%M%M', but the result was the same.
When I remove errors='coerce', I get the message: ValueError: time data '09:05' does not match format '%d-%m-%Y %H:%M' (match). What am I missing? Why is it the wrong format and how do I fix it? Thanks very much in advance!

Try this:
df['Time'] = pd.to_datetime(df['Time'], infer_datetime_format= True)
print(df)
#output:
Time
0 2019-01-10 09:05:00
1 2019-01-10 10:04:00
2 2019-01-10 11:05:00
3 2019-01-10 12:04:00
4 2019-01-10 13:04:00
5 2020-05-31 22:05:00
6 2020-05-31 23:05:00
7 2020-01-06 00:05:00
8 2020-01-06 01:05:00
9 2020-01-06 02:05:00
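Note that the inferred result above reads 01-10-19 as 10 January 2019. If the data are day-first, as rows like 31-05-20 suggest, an explicit format avoids that ambiguity. The following is a minimal sketch, assuming every row follows the day-month-two-digit-year pattern shown in the question; the three sample rows are only illustrative.

```python
import pandas as pd

# Small sample in the same layout as the question's column
df = pd.DataFrame({"Time": ["01-10-19 09:05", "31-05-20 22:05", "01-06-20 00:05"]})

# %y matches a two-digit year ("19"); %Y expects a four-digit year ("2019")
df["Time"] = pd.to_datetime(df["Time"], format="%d-%m-%y %H:%M")

# Use the parsed column as the index, which makes it a DatetimeIndex
df = df.set_index("Time")
print(df.index)
```

With an explicit format, a row that does not match raises an error instead of being silently reinterpreted, which makes malformed rows easier to spot.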

Related

DataFrame Pandas datetime error: hour must be in 0..23

I have the following time series and I want to convert to datetime in DataFrame using "pd.to_datetime". I am getting the following error: "hour must be in 0..23: 2017/ 01/01 24:00:00". How can I go around this error?
DateTime
0 2017/ 01/01 01:00:00
1 2017/ 01/01 02:00:00
2 2017/ 01/01 03:00:00
3 2017/ 01/01 04:00:00
...
22 2017/ 01/01 23:00:00
23 2017/ 01/01 24:00:00
Given:
DateTime
0 2017/01/01 01:00:00
1 2017/01/01 02:00:00
2 2017/01/01 03:00:00
3 2017/01/01 04:00:00
4 2017/01/01 23:00:00
5 2017/01/01 24:00:00
As the error says, 24:00:00 isn't a valid time. Depending on what it actually means, we can salvage it like this:
# Split up your Date and Time Values into separate Columns:
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
# Convert them separately, one as datetime, the other as timedelta.
df.Date = pd.to_datetime(df.Date)
df.Time = pd.to_timedelta(df.Time)
# Fix your DateTime Column, Drop the helper Columns:
df.DateTime = df.Date + df.Time
df = df.drop(['Date', 'Time'], axis=1)
print(df)
print(df.dtypes)
Output:
DateTime
0 2017-01-01 01:00:00
1 2017-01-01 02:00:00
2 2017-01-01 03:00:00
3 2017-01-01 04:00:00
4 2017-01-01 23:00:00
5 2017-01-02 00:00:00
DateTime datetime64[ns]
dtype: object
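Alternatively, pass an explicit format together with errors='coerce'. Values that cannot be parsed, including 24:00:00, then become NaT instead of raising an error:
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%Y/%m/%d %H:%M:%S', errors='coerce')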
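If the 24:00:00 rows should be kept rather than lost, another option is to rewrite them before parsing. This is a minimal sketch under the assumption that 24:00:00 means midnight at the start of the following day; the names is_24, fixed and parsed are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"DateTime": ["2017/01/01 23:00:00", "2017/01/01 24:00:00"]})

# Flag rows whose time part is 24:00:00 (assumed to mean the next day's midnight)
is_24 = df["DateTime"].str.endswith("24:00:00")

# Rewrite 24:00:00 as 00:00:00 so that the string parses
fixed = df["DateTime"].str.replace("24:00:00", "00:00:00", regex=False)
parsed = pd.to_datetime(fixed, format="%Y/%m/%d %H:%M:%S")

# Add one day back to the flagged rows only (0 days for all others)
df["DateTime"] = parsed + pd.to_timedelta(is_24.astype(int), unit="D")
print(df)
```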

split datetime column into date and time columns in pandas

I have a following question. I have a date_time column in my dataframe (and many other columns).
df["Date_time"].head()
0 2021-05-15 09:54
1 2021-05-27 17:04
2 2021-05-27 00:00
3 2021-05-27 09:36
4 2021-05-26 18:39
Name: Date_time, dtype: object
I would like to split this column into two (date and time).
I use this formula that works fine:
df["Date"] = ""
df["Time"] = ""
def split_date_time(data_frame):
    for i in range(0, len(data_frame)):
        df["Date"][i] = df["Date_time"][i].split()[0]
        df["Time"][i] = df["Date_time"][i].split()[1]
split_date_time(df)
But is there a more elegant way? Thanks
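The dt accessor can give you the date and time separately. Since the column holds strings (dtype: object), convert it with pd.to_datetime first:
df["Date_time"] = pd.to_datetime(df["Date_time"])
df["Date"] = df["Date_time"].dt.date
df["Time"] = df["Date_time"].dt.time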
to get
>>> df
Date_time Date Time
0 2021-05-15 09:54:00 2021-05-15 09:54:00
1 2021-05-27 17:04:00 2021-05-27 17:04:00
2 2021-05-27 00:00:00 2021-05-27 00:00:00
3 2021-05-27 09:36:00 2021-05-27 09:36:00
4 2021-05-26 18:39:00 2021-05-26 18:39:00
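If the two parts should stay as strings, as in the loop from the question, a vectorized split is a possible alternative. This is a sketch assuming each value contains exactly one block of whitespace between the date and the time.

```python
import pandas as pd

df = pd.DataFrame({"Date_time": ["2021-05-15 09:54", "2021-05-27 17:04"]})

# Split on whitespace once for the whole column; expand=True returns two columns
df[["Date", "Time"]] = df["Date_time"].str.split(expand=True)
print(df)
```

The difference from the dt accessor is the resulting type: here both new columns remain strings, whereas dt.date and dt.time produce Python date and time objects.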

Extract hour and minutes from timestamp but keep it in datetime format

I have a dataframe looking like this
open Start show Einde show
5 NaN 11:30 NaN
6 16:00 18:00 19:45
7 14:30 16:30 18:15
8 NaN NaN NaN
9 18:45 20:45 22:30
These hours are in string format and I would like to transform them to datetime format.
Whenever I try to use pd.to_datetime(evs['open'], errors='coerce') (to change one of the columns), it changes the hours to a full datetime like this: 2020-04-03 16:00:00, with today's date. I would like to have just the hour, but still in datetime format so I can add minutes etc.
When I use dt.hour to access the hour, it returns a string that is not in HH:MM format.
Can someone help me out please? I'm reading in a CSV through Pandas read_csv but when I use the date parser I get the same problem. Ideally this would get fixed in the read_csv section instead of separately but at this point I'll take anything.
Thanks!
As Chris commented, it is not possible to convert just the hours and minutes into datetime format. But you can use timedeltas to solve your problem.
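import datetime
import pandas as pd
def to_timedelta(date):
    date = pd.to_datetime(date)
    try:
        # midnight of the same day; fails with TypeError when date is NaT
        date_start = datetime.datetime(date.year, date.month, date.day, 0, 0)
    except TypeError:
        return pd.NaT  # keeps the dtype of the series; alternative: pd.Timedelta(0)
    return date - date_start
# assign the result back so the following steps operate on timedeltas
df['open'] = df['open'].apply(to_timedelta)
df['open']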
Output:
5 NaT
6 16:00:00
7 14:30:00
8 NaT
9 18:45:00
Name: open, dtype: timedelta64[ns]
Now you can use datetime.timedelta to add/subtract minutes, hours or whatever:
df['open'] + datetime.timedelta(minutes=15)
Output:
5 NaT
6 16:15:00
7 14:45:00
8 NaT
9 19:00:00
Name: open, dtype: timedelta64[ns]
Also, it is pretty easy to get back to full datetimes:
df['open'] + datetime.datetime(2020, 4, 4)
Output:
5 NaT
6 2020-04-04 16:00:00
7 2020-04-04 14:30:00
8 NaT
9 2020-04-04 18:45:00
Name: open, dtype: datetime64[ns]
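A shorter route to the same timedeltas is to let pd.to_timedelta parse the strings directly. This is a sketch assuming the column holds HH:MM strings or NaN, as in the question; the sample values are copied from the 'open' column.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"open": [nan, "16:00", "14:30", nan, "18:45"]})

# Append seconds so each string is unambiguously HH:MM:SS; NaN stays NaN
opens = pd.to_timedelta(df["open"] + ":00")

# Timedelta arithmetic works on the whole column; missing values remain NaT
later = opens + pd.Timedelta(minutes=15)
print(later)
```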

How to convert these 2 date/time columns into 1?

I've spent a few hours reading and trying things from the Python and Pandas docs and I'm not getting what I need...
I have 2 columns-- one is called DATE_GMT and one is called TIME_GMT. The date column is self-explanatory. The TIME column shows "0" through "24" as to which hour it is...
How do I convert the date and time columns, and then merge them so they are POSIX time supportive?
You can treat these two columns as strings and concatenate them, then pass the result to pandas' to_datetime together with a format string that matches the combined value.
Code
d = pd.DataFrame({'DATE_GMT': ['20-JAN-16', '20-JAN-16', '20-JAN-16', '20-JAN-16', '20-JAN-16'],
                  'HOUR_GMT': [23, 23, 23, 23, 23]})
d['combined_date'] = pd.to_datetime(d['DATE_GMT'].astype(str) + ' ' + d['HOUR_GMT'].astype(str), format='%d-%b-%y %H')
DATE_GMT HOUR_GMT combined_date
0 20-JAN-16 23 2016-01-20 23:00:00
1 20-JAN-16 23 2016-01-20 23:00:00
2 20-JAN-16 23 2016-01-20 23:00:00
3 20-JAN-16 23 2016-01-20 23:00:00
4 20-JAN-16 23 2016-01-20 23:00:00
Alternatively, you can pass to_datetime a dataframe whose columns hold the date components (year, month, day, hour).
## sample data
df = pd.DataFrame({'date': ['20-JAN-2016', '21-JAN-2016', '21-JAN-2016', '21-JAN-2016'],
                   'hour': [20, 21, 22, 23]})
# convert to datetime
df['date'] = pd.to_datetime(df['date'])
# extract date components
df['year'] = df.date.dt.year
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day
# remove date
df.drop('date', axis=1, inplace=True)
df['full_date'] = pd.to_datetime(df)
print(df)
hour year month day full_date
0 20 2016 1 20 2016-01-20 20:00:00
1 21 2016 1 21 2016-01-21 21:00:00
2 22 2016 1 21 2016-01-21 22:00:00
3 23 2016 1 21 2016-01-21 23:00:00
Use a combination of pd.to_datetime and pd.to_timedelta
pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='h')
0 2016-01-20 20:00:00
1 2016-01-21 21:00:00
2 2016-01-21 22:00:00
3 2016-01-21 23:00:00
dtype: datetime64[ns]
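Since the question asks for POSIX time, the combined timestamps can then be turned into seconds since the epoch. This is a sketch assuming the values are already in GMT/UTC and that an hour of 24 means midnight of the following day; the column name posix is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20-JAN-2016", "21-JAN-2016"], "hour": [23, 24]})

# Date plus an hour offset; an hour of 24 rolls over to 00:00 of the next day
stamp = pd.to_datetime(df["date"], format="%d-%b-%Y") + pd.to_timedelta(df["hour"], unit="h")

# Whole seconds elapsed since 1970-01-01 00:00:00 (values assumed to be UTC)
df["posix"] = (stamp - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(df)
```

Adding the hour as a timedelta, rather than parsing it with %H, is what lets the value 24 through without an error.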

set a pandas datetime64 pandas dataframe column as a datetimeindex without the time component

After importing data from a HDF5 file the index for my stock data has disappeared.
One of the columns in my dataframe "Date" is a Datetime64. How do I convert this date column to a datetimeindex column but without the time parts at the end.
So that slicing the dataframe like this data.ix["2016-01-01":"2016-02-06"] works.
IIUC, starting from a sample dataframe as:
Date x
0 2016-01-01 20:01 1
1 2016-01-02 20:02 2
you can do:
df = df.set_index(pd.DatetimeIndex(df['Date']).date)
which sets an index holding only the date part (as plain Python date objects rather than a true DatetimeIndex):
Date x
2016-01-01 2016-01-01 20:01 1
2016-01-02 2016-01-02 20:02 2
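To keep a real DatetimeIndex while still dropping the time of day, normalize() is one option. The sketch below assumes a 'Date' column of dtype datetime64 as described in the question; the third row is added only to show the slice excluding something.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01 20:01", "2016-01-02 20:02", "2016-02-10 09:00"]),
    "x": [1, 2, 3],
})

# normalize() sets every time to midnight but keeps the DatetimeIndex type
df = df.set_index(pd.DatetimeIndex(df["Date"]).normalize())

# Date-string slicing with .loc includes both endpoints (.ix was removed in pandas 1.0)
subset = df.loc["2016-01-01":"2016-02-06"]
print(subset)
```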
Use set_index to convert the column to an index. You do not need to trim the time part for it to work.
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'ts': pd.date_range('2015-12-20', periods=10, freq='12h'),
    'stuff': np.random.randn(10)
})
print(df)
stuff ts
0 0.942231 2015-12-20 00:00:00
1 1.229604 2015-12-20 12:00:00
2 -0.162319 2015-12-21 00:00:00
3 -0.142590 2015-12-21 12:00:00
4 1.057184 2015-12-22 00:00:00
5 -0.370927 2015-12-22 12:00:00
6 -0.358605 2015-12-23 00:00:00
7 -0.561857 2015-12-23 12:00:00
8 -0.020714 2015-12-24 00:00:00
9 0.552764 2015-12-24 12:00:00
print(df.set_index('ts').loc['2015-12-21':'2015-12-23'])
stuff
ts
2015-12-21 00:00:00 -0.162319
2015-12-21 12:00:00 -0.142590
2015-12-22 00:00:00 1.057184
2015-12-22 12:00:00 -0.370927
2015-12-23 00:00:00 -0.358605
2015-12-23 12:00:00 -0.561857
