The following simple script:
from datetime import datetime as DT
ts = 'Mon Aug 17 12:49:28 EDT 2020'
fmt = '%a %b %d %H:%M:%S %Z %Y'
dts = DT.strptime(ts, fmt)
print(dts)
works normally, when I simply invoke Python with it:
% python3.7 t.py
2020-08-17 12:49:28
However, if I add a different timezone to the environment, the script fails:
% env TZ=UTC python3.7 t.py
Traceback (most recent call last):
File "t.py", line 5, in <module>
dts = DT.strptime(ts, fmt)
File "/opt/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/opt/lib/python3.7/_strptime.py", line 359, in _strptime
(data_string, format))
ValueError: time data 'Mon Aug 17 12:49:28 EDT 2020' does not match format '%a %b %d %H:%M:%S %Z %Y'
I tried earlier Python versions (2.7 and 3.6) and got the same error. Even setting TZ to EDT does not work, although a value of America/New_York (which is my computer's /etc/localtime) seems OK.
How can such timestamps be parsed reliably?
I suggest using dateutil's parser.parse with a time zone mapping dict:
import dateutil.parser
import dateutil.tz
ts = 'Mon Aug 17 12:49:28 EDT 2020'
# add more time zone names / abbreviations as key-value pairs here:
tzmapping = {'EDT': dateutil.tz.gettz('US/Eastern')}
dt = dateutil.parser.parse(ts, tzinfos=tzmapping)
print(dt)
print(repr(dt))
# 2020-08-17 12:49:28-04:00
# datetime.datetime(2020, 8, 17, 12, 49, 28, tzinfo=tzfile('US/Eastern'))
Time zone abbreviations are inherently ambiguous, and %Z will not parse them in general. UTC and GMT are exceptions, but be careful there too: %Z accepts e.g. a literal "UTC", yet the result is still a naive datetime object, not an aware one. Again, dateutil's parser does a better job than the standard lib's datetime.strptime.
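To illustrate that last point, here is a minimal standard-library sketch. The variable names are mine, and it assumes the default C/English locale so that 'Mon' and 'Aug' match. %Z consumes the literal "UTC", but the result stays naive until tzinfo is attached explicitly:

```python
from datetime import datetime, timezone

fmt = '%a %b %d %H:%M:%S %Z %Y'

# %Z matches the literal "UTC", but the parsed object carries no tzinfo
naive = datetime.strptime('Mon Aug 17 12:49:28 UTC 2020', fmt)
print(naive.tzinfo)

# attach UTC explicitly to get an aware datetime
aware = naive.replace(tzinfo=timezone.utc)
print(aware.isoformat())
```

Using replace() like this is only appropriate when you already know the string is in UTC.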
Related
I want to save the received date of emails from a Gmail account into a time-series database.
The problem is that I cannot convert the string that I got from the email to a timestamp.
I tried this:
from datetime import datetime
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z %Z')
print(date1_obj)
But got this error:
Traceback (most recent call last):
File "/format_date.py", line 11, in <module>
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z %Z')
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
(data_string, format))
ValueError: time data 'Thu, 28 May 2020 08:15:58 -0700 (PDT)' does not match format '%a, %d %b %Y %H:%M:%S %z %Z'
I tried with and without parentheses around the time zone.
I have read a lot, but found nothing about how to deal with date strings containing "(PDT)" or any other time zone name. It's very important to get the right date. If I run the same code without "(PDT)", I get an incorrect time (because of my local time).
I know I can use string methods to manipulate it and convert it to the right datetime, but I feel like this wouldn't be flexible.
Sorry for my terrible English.
Thank you!
You could use dateutil's parser to parse the string, automatically inferring the format:
import dateutil.parser
import dateutil.tz
s = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'
dt = dateutil.parser.parse(s)
# datetime.datetime(2020, 5, 28, 8, 15, 58, tzinfo=tzoffset('PDT', -25200))
dt.utcoffset().total_seconds()
# -25200.0
Note that although the timezone is given a name ("PDT"), it is only a fixed UTC offset of -25200 s (7 hours behind UTC). In many cases that is sufficient, at least to convert to UTC.
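As a rough illustration of why a fixed offset is enough for a UTC conversion, here is a standard-library sketch. It builds an equivalent fixed-offset zone by hand instead of calling dateutil, and the names pdt, dt_fixed and dt_utc are made up for this example:

```python
from datetime import datetime, timedelta, timezone

# a fixed-offset zone equivalent to the one attached by the parser;
# the name 'PDT' is only a label and carries no DST rules
pdt = timezone(timedelta(seconds=-25200), 'PDT')
dt_fixed = datetime(2020, 5, 28, 8, 15, 58, tzinfo=pdt)

# converting to UTC only needs the offset
dt_utc = dt_fixed.astimezone(timezone.utc)
print(dt_utc.isoformat())
```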
If you need the specific timezone (e.g. to account for DST transitions etc.), you can use a mapping dict that you supply to dateutil.parser.parse as tzinfos:
tzmap = {'PDT': dateutil.tz.gettz('US/Pacific'),
'PST': dateutil.tz.gettz('US/Pacific')}
dt = dateutil.parser.parse(s, tzinfos=tzmap)
# datetime.datetime(2020, 5, 28, 8, 15, 58, tzinfo=tzfile('US/Pacific'))
dt.utcoffset().total_seconds()
# -25200.0
Close, you forgot to put the bracket around the last entry.
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
Well, after all your answers, which were very helpful, I finally solved it.
This is how:
>>> from email.utils import parsedate_tz, mktime_tz
>>> date = 'Thu, 28 May 2020 08:15:58 -0700 (PST)'
>>> timestamp = mktime_tz(parsedate_tz(date))
>>> timestamp
1590678958
>>>
I checked that timestamp, and it corresponds to 12:15:58 local time, which is exactly what I was looking for.
Thank you very much to everybody who took a minute to answer.
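If an aware datetime is more convenient than a bare timestamp, email.utils also provides parsedate_to_datetime (Python 3.3+). A minimal sketch, assuming the same header format as above; it relies on the numeric offset, and the parenthesized name does not seem to affect the result:

```python
from email.utils import parsedate_to_datetime

date = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'

# returns an aware datetime with a fixed -07:00 offset
dt = parsedate_to_datetime(date)
print(dt.isoformat())

# the POSIX timestamp is the same one mktime_tz() produced above
print(int(dt.timestamp()))
```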
If it does not work even when you enclose %Z in brackets, then the problem lies with the %Z directive itself:
https://docs.python.org/3/library/time.html
Support for the %Z directive is based on the values contained in
tzname and whether daylight is true. Because of this, it is
platform-specific except for recognizing UTC and GMT which are always
known (and are considered to be non-daylight savings timezones).
For example, the following results in a ValueError for me (in Europe):
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (PST)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
print(date1_obj)
With GMT, on the other hand, the output is 2020-05-28 08:15:58-07:00:
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (GMT)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
print(date1_obj)
Based on your comment under this answer you could split the string if the Timezone bit is not important:
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (GMT)'
date1_obj = datetime.strptime(date1.split(" (")[0], '%a, %d %b %Y %H:%M:%S %z')
I'm curious why the timezone in this example, GMT, is not parsed as a valid one:
>>> from datetime import datetime
>>> import pytz
>>> b = 'Mon, 3 Oct 2016 21:24:17 GMT'
>>> fmt = '%a, %d %b %Y %H:%M:%S %Z'
>>> datetime.strptime(b, fmt).astimezone(pytz.utc)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: astimezone() cannot be applied to a naive datetime
Doing the same with a -0700 instead of GMT and %z instead of %Z in the format works just fine.
What's the proper way to parse dates ending in string time zones if not this?
Use the datetime object's .replace() method to set the time zone info:
>>> datetime.strptime(b, fmt).replace(tzinfo=pytz.utc)
datetime.datetime(2016, 10, 3, 21, 24, 17, tzinfo=<UTC>)
As you mentioned, .astimezone() works with %z instead of %Z in the format string. Although the two directives differ only in case, they represent entirely different things.
According to the documentation of the strftime directives:
%z : UTC offset in the form +HHMM or -HHMM (empty string if the object is naive).
%Z : Time zone name (empty string if the object is naive).
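The difference is easy to see side by side. This is a small sketch using only the standard library, with timezone.utc standing in for pytz.utc and variable names invented for illustration:

```python
from datetime import datetime, timezone

# %z parses a numeric offset and yields an aware datetime
aware = datetime.strptime('Mon, 3 Oct 2016 21:24:17 -0700',
                          '%a, %d %b %Y %H:%M:%S %z')
print(aware.astimezone(timezone.utc).isoformat())

# %Z only matches the name; tzinfo stays None
naive = datetime.strptime('Mon, 3 Oct 2016 21:24:17 GMT',
                          '%a, %d %b %Y %H:%M:%S %Z')
print(naive.tzinfo)

# since "GMT" is known to mean UTC here, attach it explicitly
fixed = naive.replace(tzinfo=timezone.utc)
print(fixed.isoformat())
```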
I was trying to convert a string to a datetime object.
The string I got from a news feed is in the following format:
"Thu, 16 Oct 2014 01:16:17 EDT"
I tried using datetime.strptime() to convert it.
i.e.,
datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')
And got the following error:
Traceback (most recent call last):
File "", line 1, in
datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')
File "C:\Anaconda\lib\_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data 'Thu, 16 Oct 2014 01:16:17 EDT' does not match
format '%a, %d %b %Y %H:%M:%S %Z'
However, if I tried the string without "EDT", it worked.
i.e.,
datetime.strptime('Thu, 16 Oct 2014 01:16:17','%a, %d %b %Y %H:%M:%S')
Does anyone know how to parse that "EDT" part?
To parse the date in RFC 2822 format, you could use email package:
from datetime import datetime, timedelta
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)
Note: parsedate_tz() assumes that EDT corresponds to -0400 UTC offset but it might be incorrect in Australia where EDT is +1100 (AEDT is used by pytz in this case) i.e., a timezone abbreviation may be ambiguous. See Parsing date/time string with timezone abbreviated name in Python?
Related Python bug: %Z in strptime doesn't match EST and others.
If your computer uses POSIX timestamps (likely), and you are sure the input date is within an acceptable range for your system (not too far into the future/past), and you don't need to preserve the microsecond precision then you could use datetime.utcfromtimestamp:
from datetime import datetime
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime.utcfromtimestamp(timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)
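If you would rather end up with an aware datetime than a naive one, a possible variant (a sketch, assuming Python 3) is to pass a tz argument to datetime.fromtimestamp:

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz

timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))

# same instant as above, but the result carries tzinfo=UTC
aware_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(aware_utc.isoformat())
```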
The email.utils.parsedate_tz() solution is good for 3-letter timezones but it does not work for 4 letters such as AEDT or CEST. If you need a mix, the answer under Parsing date/time string with timezone abbreviated name in Python? works for both with the most commonly used time zones.
I need to convert a date in the format below into a different format for display purposes. Before that, I am trying to convert the date string to a time object, but I am not able to do so.
>>> time.strptime("Thu Mar 13 23:15:13 2014 EDT", '%a %b %d %H:%M:%S %Y %Z')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib64/python2.4/_strptime.py", line 293, in strptime
raise ValueError("time data did not match format: data=%s fmt=%s" %
ValueError: time data did not match format: data=Thu Mar 13 23:15:13 2014 EDT fmt=%a %b %d %H:%M:%S %Y %Z
Through trial and error I found that '%Z' is causing the issue; the call below works fine (only %Z is removed):
>>> time.strptime("Thu Mar 13 23:15:13 2014", '%a %b %d %H:%M:%S %Y')
(2014, 3, 13, 23, 15, 13, 3, 72, -1)
The Python documentation (https://docs.python.org/2/library/time.html) says the timezone specifier is %Z, so what is the issue here? Please help me find it.
Python version: 2.4.3
From the Python documentation. https://docs.python.org/2/library/time.html#time.strptime
Support for the %Z directive is based on the values contained in tzname and whether daylight is true. Because of this, it is platform-specific except for recognizing UTC and GMT which are always known (and are considered to be non-daylight savings timezones).
In other words, time.strptime() only recognizes UTC, GMT, and the time zone names listed in time.tzname.
Hope this helps
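To see this dependence on time.tzname in action, here is a rough sketch that temporarily changes TZ and retries the parse. It assumes a Unix-like system (time.tzset() is not available on Windows) and an English/C locale; strptime_accepts is a helper name made up for this example:

```python
import os
import time

def strptime_accepts(tz_env, text, fmt):
    """Try time.strptime() with TZ temporarily set to tz_env (Unix only)."""
    old = os.environ.get('TZ')
    os.environ['TZ'] = tz_env
    time.tzset()
    try:
        time.strptime(text, fmt)
        return True
    except ValueError:
        return False
    finally:
        # restore the previous setting
        if old is None:
            del os.environ['TZ']
        else:
            os.environ['TZ'] = old
        time.tzset()

fmt = '%a %b %d %H:%M:%S %Y %Z'

# with TZ=UTC, "EDT" is not in time.tzname, so the parse is rejected
print(strptime_accepts('UTC', 'Thu Mar 13 23:15:13 2014 EDT', fmt))

# "UTC" is always known
print(strptime_accepts('UTC', 'Thu Mar 13 23:15:13 2014 UTC', fmt))
```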
%z only handles numeric UTC offsets, and only in Python 3.x. Here is a workaround for Python 2.x:
Instead of using:
datetime.strptime(t,'%Y-%m-%dT%H:%M %z')
use the timedelta to account for the timezone, like this:
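from datetime import datetime, timedelta

def dt_parse(t):
    # Expects e.g. '2014-10-16T01:16 +0530' and returns a naive datetime in UTC.
    ret = datetime.strptime(t[0:16], '%Y-%m-%dT%H:%M')
    # t[16] is the space, t[17] the sign, then HH and MM of the offset
    offset = timedelta(hours=int(t[18:20]), minutes=int(t[20:22]))
    if t[17] == '+':
        ret -= offset  # zone is ahead of UTC, so subtract to get UTC
    elif t[17] == '-':
        ret += offset  # zone is behind UTC, so add to get UTC
    return ret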
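For comparison, on Python 3 the same kind of string can be parsed directly with %z, with no manual slicing. A short sketch; the sample input is my own, chosen to match the format string above:

```python
from datetime import datetime, timezone

# %z turns the numeric offset into tzinfo, giving an aware datetime
aware = datetime.strptime('2014-10-16T01:16 +0530', '%Y-%m-%dT%H:%M %z')

# normalise to UTC if that is what you need downstream
as_utc = aware.astimezone(timezone.utc)
print(as_utc.isoformat())
```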
Given this string: "Fri, 09 Apr 2010 14:10:50 +0000" how does one convert it to a datetime object?
After doing some reading I feel like this should work, but it doesn't...
>>> from datetime import datetime
>>>
>>> str = 'Fri, 09 Apr 2010 14:10:50 +0000'
>>> fmt = '%a, %d %b %Y %H:%M:%S %z'
>>> datetime.strptime(str, fmt)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/_strptime.py", line 317, in _strptime
(bad_directive, format))
ValueError: 'z' is a bad directive in format '%a, %d %b %Y %H:%M:%S %z'
It should be noted that this works without a problem:
>>> from datetime import datetime
>>>
>>> str = 'Fri, 09 Apr 2010 14:10:50'
>>> fmt = '%a, %d %b %Y %H:%M:%S'
>>> datetime.strptime(str, fmt)
datetime.datetime(2010, 4, 9, 14, 10, 50)
But I'm stuck with "Fri, 09 Apr 2010 14:10:50 +0000". I would prefer to convert exactly that without changing (or slicing) it in any way.
It looks as if strptime doesn't always support %z. Python appears to just call the C function, and strptime doesn't support %z on your platform.
Note: from Python 3.2 onwards it will always work.
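For reference, this is roughly what the original attempt looks like on Python 3.2 or later, where %z is handled by Python itself. A sketch only; the variable is named s here to avoid shadowing the built-in str:

```python
from datetime import datetime, timezone

s = 'Fri, 09 Apr 2010 14:10:50 +0000'
fmt = '%a, %d %b %Y %H:%M:%S %z'

# the string is parsed unchanged, and the +0000 offset becomes tzinfo
dt = datetime.strptime(s, fmt)
print(repr(dt))
print(dt == datetime(2010, 4, 9, 14, 10, 50, tzinfo=timezone.utc))
```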