I have a pandas DataFrame with several columns, two of which are start_time and end_time. The values in those columns look like this: 2020-01-04 01:38:33 +0000 UTC
I am not able to create a datetime object from these strings because I cannot get the format right:
df['start_time'] = pd.to_datetime(df['start_time'], format="yyyy-MM-dd HH:mm:ss +0000 UTC")
I also tried using yyyy-MM-dd HH:mm:ss %z UTC as a format
This gives the error -
ValueError: time data '2020-01-04 01:38:33 +0000 UTC' does not match format 'yyyy-MM-dd HH:mm:ss +0000 UTC' (match)
You just need to use the proper timestamp format that to_datetime will recognize
df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S +0000 UTC")
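As a minimal sketch of the line above (the two sample rows are made up for illustration), treating "+0000 UTC" as literal text yields timezone-naive timestamps, which can then be localized explicitly if you know every row is UTC:

```python
import pandas as pd

# Hypothetical sample data in the format from the question
df = pd.DataFrame({
    "start_time": [
        "2020-01-04 01:38:33 +0000 UTC",
        "2020-01-05 12:00:00 +0000 UTC",
    ]
})

# "+0000 UTC" is matched as literal text, so the result carries no timezone
parsed = pd.to_datetime(df["start_time"], format="%Y-%m-%d %H:%M:%S +0000 UTC")

# Assumption: every row really is UTC, so it is safe to attach the zone afterwards
localized = parsed.dt.tz_localize("UTC")
print(localized)
```

Note that a row with any other offset would fail to match this format, which may or may not be what you want.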
There are some notes below about this problem:
1. About your error
You passed a wrong datetime format, which causes the error. For the correct directives, check https://strftime.org/. The correct format for this problem would be: "%Y-%m-%d %H:%M:%S %z UTC"
2. Pandas limitation with timezone
Parsing the UTC offset with %z did not work on a pd.Series in older pandas releases (it only worked on index values). So if you use this, it may not work:
df['startTime'] = pd.to_datetime(df.startTime, format="%Y-%m-%d %H:%M:%S %z UTC", utc=True)
A workaround is to use Python's built-in datetime module to parse the strings:
from datetime import datetime
f = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S %z UTC")
df['startTime'] = pd.to_datetime(df.startTime.apply(f), utc=True)
@fmarm's answer only handles the date and time, not the UTC timezone.
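To see what the strptime-based approach produces for a single value, here is a small standard-library sketch (the sample string is taken from the question; the variable names are just for illustration). The %z directive consumes the +0000 offset, and the trailing "UTC" is matched as literal text:

```python
from datetime import datetime, timedelta

raw = "2020-01-04 01:38:33 +0000 UTC"

# %z parses the numeric offset; " UTC" in the format is plain literal text
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z UTC")

print(parsed.isoformat())   # 2020-01-04T01:38:33+00:00
print(parsed.utcoffset())   # 0:00:00
```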
Related
I have a column in a pandas DataFrame that holds datetime entries as strings.
I tried the syntax below, but it raises this error.
Syntax
pd.to_datetime(df['Datetime'], format = '%y-%m-%d %H:%M:%S')
Error
time data '2020-11-01 16:23:12' does not match format '%y-%m-%d %H:%M:%S'
Try %Y,
this is the cheatsheet: https://strftime.org/
Yes, you've used the wrong format for the year.
pd.to_datetime(df["Datetime"], format="%Y-%m-%d %H:%M:%S")
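The difference between the two directives can be checked with the standard library alone. This is a small sketch using the value from the error message: %y expects a two-digit year, so it consumes "20" and then fails on the next character, while %Y expects the full four-digit year.

```python
from datetime import datetime

raw = "2020-11-01 16:23:12"

# %y reads only two digits ("20"), then expects "-" but finds "2"
try:
    datetime.strptime(raw, "%y-%m-%d %H:%M:%S")
    two_digit_ok = True
except ValueError:
    two_digit_ok = False

# %Y reads the full four-digit year
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

print(two_digit_ok)  # False
print(parsed)        # 2020-11-01 16:23:12
```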
I'm working with big data in pandas and I have a problem with the format of the dates. This is the format of one column:
Wed Feb 24 12:06:14 +0000 2021
I think it would be easier to convert all the columns to a format like this:
'%d/%m/%Y, %H:%M:%S'
How can I do that?
Does this work for you?
pandas.to_datetime(s, format='%d/%m/%Y, %H:%M:%S')
Source: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
You can use the following function for your dataset.
import datetime as dt

def change_format(x):
    # Parse e.g. "Wed Feb 24 12:06:14 +0000 2021" into a timezone-aware datetime
    parsed = dt.datetime.strptime(x, "%a %b %d %H:%M:%S %z %Y")
    # Render it as day/month/year with a 24-hour time
    return parsed.strftime('%d/%m/%Y, %H:%M:%S')
Then apply it using df['date_column'] = df['date_column'].apply(change_format).
Here df is your DataFrame.
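If the column is large, a vectorized variant may be worth trying instead of apply. This is only a sketch: the Series below is a made-up one-row sample, and it assumes an English locale so that %a and %b match "Wed" and "Feb".

```python
import pandas as pd

# Hypothetical sample column in the format from the question
s = pd.Series(["Wed Feb 24 12:06:14 +0000 2021"])

# Parse the whole column at once; %z picks up the +0000 offset
parsed = pd.to_datetime(s, format="%a %b %d %H:%M:%S %z %Y")

# Render back to strings in the requested layout
formatted = parsed.dt.strftime("%d/%m/%Y, %H:%M:%S")
print(formatted.iloc[0])  # 24/02/2021, 12:06:14
```

Keep in mind that the result of strftime is a string column again, so only do this last step when you need the text for display or export.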
2020-11-20 23:07:59.381081 +0000 UTC
I am reading a CSV file with pandas into a DataFrame, and there is a timestamp column with dtype object. I was not able to convert it to datetime, nor to match the +0000 UTC part with a format.
I tried the following:
datetimeObj = datetime.strptime('2020-11-21 22:16:25.389601 +0000 UTC', '%Y-%m-%d %H:%M:%S.%f %Z')
but the %Z is giving me an error.
Any advice for a beginner in the Python and pandas world?
You also have to add %z before %Z.
%z refers to the UTC offset in the form +HHMM or -HHMM.
%Z refers to the time zone name.
Try this:
datetime_object = datetime.strptime('2020-11-20 23:07:59.381081 +0000 UTC', '%Y-%m-%d %H:%M:%S.%f %z %Z')
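A runnable version of that call, as a small sketch with the import it needs, shows that the result is timezone-aware and keeps the microseconds:

```python
from datetime import datetime, timedelta

raw = "2020-11-20 23:07:59.381081 +0000 UTC"

# %f handles the fractional seconds, %z the offset, %Z the zone name
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f %z %Z")

print(parsed.microsecond)  # 381081
print(parsed.utcoffset())  # 0:00:00
```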
Assuming you have datetime strings in that format in a pandas DataFrame, I'd suggest removing the +0000, since pd.to_datetime won't parse +0000 and UTC at the same time.
import pandas as pd
df = pd.DataFrame({'timestamp':["2020-11-20 23:07:59.381081 +0000 UTC"]})
df['datetime'] = pd.to_datetime(df['timestamp'].str.replace(" +0000", "", regex=False))
# df['datetime']
# 0 2020-11-20 23:07:59.381081+00:00
# Name: datetime, dtype: datetime64[ns, UTC]
Why not just strip the UTC? In contrast to a UTC offset of +0000, it's unambiguous. +0000 could also originate from a time zone that just happens to have UTC+0 at the time represented in the timestamp.
I've been trying to use the to_datetime function to convert values in my column to datetime:
df['date'] = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d %H:%M:%S %z %Z')
After that, I received only NaT values.
Example: Value Format in Column: '1979-01-01 00:00:00 +0000 UTC'
I think you can't parse the UTC offset (+0000) and the timezone name at the same time.
You might want to remove the UTC at the end and only parse the offset.
df['date'] = df.date.str[:-4]
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S %z')
Pandas can't manage both %z and %Z as you can see here. Note that Python's strptime can handle this, but doesn't deal with %Z.
In your case you might want to just peel off the last bit with ser.str and consider opening a feature request.
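Putting the "peel off the last bit" idea into a self-contained sketch (the one-row DataFrame is made up from the example value in the question, and utc=True is an optional extra that normalizes the result to UTC):

```python
import pandas as pd

# Hypothetical sample data using the value from the question
df = pd.DataFrame({"date": ["1979-01-01 00:00:00 +0000 UTC"]})

# Drop the trailing zone name so only the numeric offset remains
trimmed = df["date"].str.replace(r"\s+UTC$", "", regex=True)

# %z parses the offset; utc=True converts everything to UTC
df["date"] = pd.to_datetime(trimmed, format="%Y-%m-%d %H:%M:%S %z", utc=True)
print(df["date"])
```

Using a regular expression anchored at the end of the string, rather than slicing with str[:-4], leaves rows without the suffix untouched.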
I'm trying to convert strings in my dataset ('2016-01-01 00:00:00') to timestamps using pd.to_datetime.
I'm trying:
pd.to_datetime(train["timestamp"],format='%Y/%m/%d %I:%M:%S')
but I get
time data '2016-01-01 00:00:00' does not match format '%Y/%m/%d %I:%M:%S' (match)
How can I fix this?
If you want it to be in the specific format that you mentioned, that is %Y/%m/%d %I:%M:%S, then do it like this.
First convert your string to datetime format using to_datetime:
df['timestamp'] = pd.to_datetime(df['timestamp'])
Now that your column is in datetime format, convert to the following format using strftime:
df['timestamp'] = df['timestamp'].dt.strftime('%Y/%m/%d %I:%M:%S')
Output:
timestamp
0 2016/01/01 12:00:00
1 2016/01/01 12:00:00
As others pointed out, use %H instead of %I for 24 hour format, like this:
df['timestamp'] = df['timestamp'].dt.strftime('%Y/%m/%d %H:%M:%S')
That's because the format of the strings in your df is different from the one you passed. Try the following using -, and also use %H for the 24-hour clock:
pd.to_datetime(train["timestamp"],format='%Y-%m-%d %H:%M:%S')
2 issues here:
Use - instead of /
%I is for Hour 00-12, use %H for Hour 00-23
pd.to_datetime(train["timestamp"],format='%Y-%m-%d %H:%M:%S')
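Both points can be verified with the standard library. The sketch below uses the sample value from the question: parsing with %I rejects the hour "00", while %H accepts it, and formatting midnight with %I prints 12, which matches the output shown in the earlier answer.

```python
from datetime import datetime

raw = "2016-01-01 00:00:00"

# %I only accepts hours 01-12, so the midnight hour "00" is rejected
try:
    datetime.strptime(raw, "%Y-%m-%d %I:%M:%S")
    twelve_hour_ok = True
except ValueError:
    twelve_hour_ok = False

# %H accepts hours 00-23
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

as_24h = parsed.strftime("%Y/%m/%d %H:%M:%S")
as_12h = parsed.strftime("%Y/%m/%d %I:%M:%S")

print(twelve_hour_ok)  # False
print(as_24h)          # 2016/01/01 00:00:00
print(as_12h)          # 2016/01/01 12:00:00
```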