I am dealing with a csv file containing a column called startTime, containing times.
When opening this file with Excel, the times appear as AM/PM times in the formula bar, although the timestamps in the column appear improperly formatted:
startTime
16:02.0
17:45.0
18:57.0
20:23.0
When reading this file using pandas' read_csv, I am unable to format these timestamps properly:
import pandas as pd
df = pd.read_csv('example_file.csv')
print(df.startTime)
Simply yields:
0 16:02.0
1 17:45.0
2 18:57.0
3 20:23.0
I first attempted to convert the output Series using pd.to_datetime(df.startTime,format=" %H%M%S") but this yields the following error message:
time data '16:02.0' does not match format ' %H%M%S' (match)
I then tried pd.to_datetime(df.startTime,format=" %I:%M:%S %p") based on this answer, in order to account for the AM/PM convention, but this returned the same error message.
How can I use pandas to format these timestamps like Excel automatically does?
Your CSV file contains text, not datetimes, so first convert the text in this column to pandas datetime objects; you can then render them in whatever format you want with the strftime method:
pd.to_datetime(df['startTime']).dt.strftime(date_format = '%I:%M:%S %p')
Outputs:
0 04:02:00 PM
1 05:45:00 PM
2 06:57:00 PM
3 08:23:00 PM
Note: these values are string values, not datetime.
Edit for this specific issue:
A quick fix is to prepend a zero hour (00:) to each timestamp before converting, so the values are read as minutes and seconds just after midnight (%I renders hour 0 as 12):
pd.to_datetime(df['startTime'].apply(lambda x: f'00:{x}')).dt.strftime(date_format = '%I:%M:%S %p')
Outputs:
0 12:16:02 AM
1 12:17:45 AM
2 12:18:57 AM
3 12:20:23 AM
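If the values really are minutes and seconds, as the zero-hour prefix above assumes, they may be easier to handle as durations than as clock times. The following is a minimal sketch under that assumption; the column name elapsed is made up for illustration.

```python
import pandas as pd

# Sample values copied from the question.
df = pd.DataFrame({"startTime": ["16:02.0", "17:45.0", "18:57.0", "20:23.0"]})

# Assumption: each value is minutes:seconds.fraction, so prefix a zero hour
# to get the HH:MM:SS.f shape that pd.to_timedelta understands.
df["elapsed"] = pd.to_timedelta("00:" + df["startTime"].str.strip())

print(df["elapsed"])
# Durations support arithmetic directly, e.g. total seconds per row.
print(df["elapsed"].dt.total_seconds().tolist())
```

Whether a duration or a time of day is the right reading depends on what the column actually records, which the CSV alone does not say.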
Try:
>>> pd.to_datetime(df['startTime'].str.strip(), format='%H:%M.%S')
0 1900-01-01 16:02:00
1 1900-01-01 17:45:00
2 1900-01-01 18:57:00
3 1900-01-01 20:23:00
Name: startTime, dtype: datetime64[ns]
Coerce to datetime and format the time using dt.strftime:
df['startTime']=pd.to_datetime(df['startTime']).dt.strftime('%I:%M.%S%p')
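The two steps from the answers above (parse with an explicit format, then render as 12-hour text) can be combined as in the sketch below. It assumes the values are hour:minute.second, and the column name startTime_12h is only illustrative.

```python
import pandas as pd

# Sample values copied from the question.
df = pd.DataFrame({"startTime": ["16:02.0", "17:45.0", "18:57.0", "20:23.0"]})

# Assumption: the text is hour:minute.second, so spell the format out
# instead of relying on pandas to guess it.
parsed = pd.to_datetime(df["startTime"].str.strip(), format="%H:%M.%S")

# strftime returns plain strings (object dtype), not datetimes.
df["startTime_12h"] = parsed.dt.strftime("%I:%M:%S %p")
print(df)
```

Keeping the parsed datetimes in a separate variable leaves them available for sorting or arithmetic, while the string column is used only for display.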
I am new to Pandas and Python. I want to do some date time operations in my script. I am getting date time information from a csv file in the following format: PT12M20S
How to convert it into pandas datetime format? Something like: 12:20
During conversion I get the error: Unknown string format: PT13M20S
Try this:
pd.to_datetime("PT12M20S", format='PT%HM%MS')
Replace the single string "PT12M20S" with your dataframe column.
EDIT: to reflect additional information given in the comments:
You can use errors="coerce" to turn every value that does not match the first format into NaT, and then fill those missing values by parsing with each of the other formats in turn.
import pandas as pd
df = pd.DataFrame(data=["PT20M23S", "PT45M", "PT25S"], columns=["date"])
df["date"] = pd.to_datetime(df["date"], format="PT%HM%MS", errors="coerce"). \
fillna(pd.to_datetime(df["date"], format="PT%MM", errors="coerce")). \
fillna(pd.to_datetime(df["date"], format="PT%SS", errors="coerce"))
output:
date
0 1900-01-01 20:23:00
1 1900-01-01 00:45:00
2 1900-01-01 00:00:25
Alternatively you can design your own function and apply it.
from datetime import datetime
def format_date(val):
    # try each known format in order; return None if none of them match
    for f in ("PT%HM%MS", "PT%MM", "PT%SS"):
        try:
            return datetime.strptime(val, f)
        except ValueError:
            continue
    return None
df["date"].apply(format_date)
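Strings such as PT12M20S follow the ISO 8601 duration notation, in which M after the T stands for minutes and S for seconds. If the values are meant as durations rather than clock times, one option is to parse them into timedelta objects. The sketch below uses only the standard library; parse_duration is a hypothetical helper and covers only the PT[nH][nM][nS] shapes that appear in this thread.

```python
import re
from datetime import timedelta

# Hypothetical helper pattern: optional hours, minutes and seconds after "PT".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_duration(val):
    """Return a timedelta for strings like 'PT12M20S', or None if unparseable."""
    match = _DURATION.match(val)
    if match is None or val == "PT":
        return None
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


for text in ["PT20M23S", "PT45M", "PT25S", "PT12M20S"]:
    print(text, "->", parse_duration(text))
```

With a DataFrame, the same function could be applied with df["date"].apply(parse_duration). Note that this reads PT20M23S as 20 minutes 23 seconds, which differs from the strptime formats above, where the first number lands in the hour field.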
I am trying to convert a column of strings to datetime format, but I am getting the following error. Could somebody please help?
The error: time data '42:53.700' does not match format '%H:%M:%S.%f' (match)
Code:
Merge_df['Time'] = pd.to_datetime(Merge_df['Time'], format='%H:%M:%S.%f')
You'll need to clean the data to get a common format before you can parse to data type 'datetime'. For example you can remove the colons and fill with zeros, then parse with the appropriate directive:
import pandas as pd
df = pd.DataFrame({'time': ["1:45.333", "45:22.394", "4:55:23.444", "23:44:01.004"]})
df['time'] = pd.to_datetime(df['time'].str.replace(':', '').str.zfill(10), format="%H%M%S.%f")
df['time']
0 1900-01-01 00:01:45.333
1 1900-01-01 00:45:22.394
2 1900-01-01 04:55:23.444
3 1900-01-01 23:44:01.004
Name: time, dtype: datetime64[ns]
Since the data looks more like a duration to me, here is how to convert it to the 'timedelta' data type, starting again from the original strings. You'll need to ensure HH:MM:SS.fff format, which takes a bit more work:
# ensure common string length
df['time'] = df['time'].str.zfill(12)
# ensure HH:MM:SS.fff format
df['time'] = df['time'].str[:2] + ":" + df['time'].str[3:5] + ":" + df['time'].str[6:]
df['timedelta'] = pd.to_timedelta(df['time'])
df['timedelta']
0 0 days 00:01:45.333000
1 0 days 00:45:22.394000
2 0 days 04:55:23.444000
3 0 days 23:44:01.004000
Name: timedelta, dtype: timedelta64[ns]
The advantage of using timedelta is that you can now also handle hours greater than 23.
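To illustrate the point about large hour values, here is a small sketch with made-up sample strings: a value of 25 hours is rejected by a datetime format but is accepted as a timedelta.

```python
import pandas as pd

# Made-up sample values; the first one has an hour field above 23.
durations = pd.to_timedelta(pd.Series(["25:30:00", "00:45:22.394"]))
print(durations)
print(durations.dt.total_seconds().tolist())

# The same text cannot be parsed as a time of day.
try:
    pd.to_datetime("25:30:00", format="%H:%M:%S")
except ValueError as exc:
    print("not a valid clock time:", exc)
```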
I have a variable in a df that looks like this
Datetime
10/27/2020 2:28:28 PM
8/2/2020 3:30:18 AM
6/15/2020 5:38:19 PM
How can I change it to this using python?
Date Time
10/27/2020 14:28:28
8/2/2020 3:30:18
6/15/2020 17:38:19
I understand how to separate date and time, but I am unsure how to convert it to 24-hour time.
I think this is the code you want:
from dateutil.parser import parse
# keep the AM/PM marker so the hour is converted to 24-hour time
dt = parse("10/27/2020 2:28:28 PM")
print(dt)
# 2020-10-27 14:28:28
# Create Date
date = f"{dt.month}/{dt.day}/{dt.year}"
# Create Time
time = f"{dt.hour}:{dt.minute:02d}:{dt.second:02d}"
You can use pd.to_datetime to convert a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object. Then you can use the accessor for datetime-like properties of the Series values (Series.dt) to obtain the time, which will already be in 24-hour form.
You can also use dt.strftime to format the output string which supports the same string format as the python standard library.
df['Datetime'] = pd.to_datetime(df.Datetime)
df['Date'] = df.Datetime.dt.strftime('%m/%d/%Y')
df['Time'] = df.Datetime.dt.time
print(df)
Datetime Date Time
0 2020-10-27 14:28:28 10/27/2020 14:28:28
1 2020-08-02 03:30:18 08/02/2020 03:30:18
2 2020-06-15 17:38:19 06/15/2020 17:38:19
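As a variation on the answer above, the input format can be spelled out explicitly, and both output columns can be produced as strings. This is a sketch using the sample rows from the question; %I and %p read the 12-hour clock, and %H writes the 24-hour one.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {"Datetime": ["10/27/2020 2:28:28 PM", "8/2/2020 3:30:18 AM", "6/15/2020 5:38:19 PM"]}
)

# Parse the 12-hour text with an explicit format.
parsed = pd.to_datetime(df["Datetime"], format="%m/%d/%Y %I:%M:%S %p")

# Write the date and the 24-hour time back out as strings.
df["Date"] = parsed.dt.strftime("%m/%d/%Y")
df["Time"] = parsed.dt.strftime("%H:%M:%S")
print(df)
```

Unlike dt.time, which yields datetime.time objects, both new columns here hold plain strings, and strftime zero-pads the month, day and hour.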
I have a column with the following format:
Original format:
mm/dd/YYYY
10/28/2021
10/28/2021
the output after:
print(df['mm/dd/YYYY'])
0 2021-10-28 00:00:00
1 2021-10-28 00:00:00
However when I am trying to convert to datetime I get the following error:
pd.to_datetime(df['mm/dd/YYYY'], format='%Y-%m-%d %H:%M:%S')
time data mm/dd/YYYY doesn't match format specified
You are passing the wrong format. Try
pd.to_datetime(df['mm/dd/YYYY'], format='%m/%d/%Y')
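The format string has to describe the text being read, not the output you hope to get. The sketch below shows this with the standard library's datetime.strptime, which uses the same directives; the sample value is taken from the question.

```python
from datetime import datetime

value = "10/28/2021"

# The format mirrors the input text: month/day/four-digit year.
parsed = datetime.strptime(value, "%m/%d/%Y")
print(parsed)

# A format describing some other layout raises ValueError.
try:
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
except ValueError as exc:
    print("mismatch:", exc)
```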
I am working with an Excel file in Pandas where I am trying to deal with a Date column in which the date is listed in ISO 8601 format. I want to take this column and store the date and time in two different columns. The values in these two columns need to be stored in Eastern Daylight Saving Time. This is what they are supposed to look like:
Date Date (New) Time (New)
1999-01-01T00:00:29.75 12/31/1998 6:59:58 PM
1999-01-01T00:00:30.00 12/31/1998 6:59:59 PM
1999-01-01T00:00:32.25 12/31/1998 7:00:00 PM
1999-01-01T00:00:30.50 12/31/1998 6:59:58 PM
I have achieved this, partially.
I have converted the values to Eastern Daylight Saving Time and stored the date value correctly. However, I want the time value to be stored in 12-hour format rather than the 24-hour format it is in right now.
This is what my output looks like so far.
Date Date (New) Time (New)
1999-01-01T00:00:29.75 1998-12-31 19:00:30
1999-01-01T00:00:30.00 1998-12-31 19:00:30
1999-01-01T00:00:32.25 1998-12-31 19:00:32
1999-01-01T00:00:30.50 1998-12-31 19:00:31
Does anyone have any idea what I can do about this?
from pytz import timezone
import dateutil.parser
from pytz import UTC
import datetime as dt
df3['Day']=pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M: %S.%f',errors='coerce').dt.tz_localize('UTC')
df3['Day']= df3['Day'].dt.tz_convert('US/Eastern')
df3['Date(New)'], df3['Time(New)'] = zip(*[(d.date(), d.time()) for d in df3['Day']])
You should use d.time().strftime("%I:%M:%S %p"), which will format the time as requested.
strftime() and strptime() Behavior
You can set the time format used for output. The time value itself is (and should be) stored as datetime.time(); if you want a specific string representation, you can create a string-type column in the format you want:
from pytz import timezone
import pandas as pd
import datetime as dt
df= pd.DataFrame([{"Date":dt.datetime.now()}])
df['Day']=pd.to_datetime( df['Date'], format='%Y-%m-%d %H:%M: %S.%f',
errors='coerce').dt.tz_localize('UTC')
df['Day']= df['Day'].dt.tz_convert('US/Eastern')
df['Date(New)'], df['Time(New)'] = zip(*[(d.date(), d.time()) for d in df['Day']])
# create strings with specific formatting
df['Date(asstring)'] = df['Day'].dt.strftime("%Y-%m-%d")
df['Time(asstring)'] = df["Day"].dt.strftime("%I:%M:%S %p")
# show resulting column / cell types
print(df.dtypes)
print(df.applymap(type))
# show df
print(df)
Output:
# df.dtypes
Date datetime64[ns]
Day datetime64[ns, US/Eastern]
Date(New) object
Time(New) object
Date(asstring) object
Time(asstring) object
# from df.applymap(type)
Date <class 'pandas._libs.tslib.Timestamp'>
Day <class 'pandas._libs.tslib.Timestamp'>
Date(New) <class 'datetime.date'>
Time(New) <class 'datetime.time'>
Date(asstring) <class 'str'>
Time(asstring) <class 'str'>
# from print(df)
Date Day Date(New) Time(New)
0 2019-01-04 00:40:02.802606 2019-01-03 19:40:02.802606-05:00 2019-01-03 19:40:02.802606
Date(asstring) Time(asstring)
2019-01-03 07:40:02 PM
It looks like you are very close. %H is the 24 hour format. You should use %I instead.
How can I account for period (AM/PM) with datetime.strptime?
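Putting the answers above together, the sketch below converts the question's ISO 8601 strings to US Eastern time and writes the time with %I and %p. It assumes the input timestamps are UTC, as the question's tz_localize('UTC') call suggests, and uses the America/New_York zone name.

```python
import pandas as pd

# Two sample values copied from the question.
df = pd.DataFrame({"Date": ["1999-01-01T00:00:29.75", "1999-01-01T00:00:32.25"]})

# Assumption: the input timestamps are in UTC.
eastern = pd.to_datetime(df["Date"], utc=True).dt.tz_convert("America/New_York")

# %I is the 12-hour clock and %p adds AM/PM; both columns hold strings.
df["Date (New)"] = eastern.dt.strftime("%m/%d/%Y")
df["Time (New)"] = eastern.dt.strftime("%I:%M:%S %p")
print(df)
```

The strings are for display only; keeping the timezone-aware eastern Series around is what allows further date arithmetic.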