string to date - format string issue - python

I need to convert a date in the format below into a different format for display. Before that, I am trying to convert the date string to a time object, but I am not able to do so.
>>> time.strptime("Thu Mar 13 23:15:13 2014 EDT", '%a %b %d %H:%M:%S %Y %Z')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib64/python2.4/_strptime.py", line 293, in strptime
raise ValueError("time data did not match format: data=%s fmt=%s" %
ValueError: time data did not match format: data=Thu Mar 13 23:15:13 2014 EDT fmt=%a %b %d %H:%M:%S %Y %Z
By trial and error I found that '%Z' is causing the issue; the call below works fine (only %Z is removed):
>>> time.strptime("Thu Mar 13 23:15:13 2014", '%a %b %d %H:%M:%S %Y')
(2014, 3, 13, 23, 15, 13, 3, 72, -1)
The Python documentation (https://docs.python.org/2/library/time.html) says the timezone specifier is %Z, so what is the issue here? Please help me find it.
Python version: 2.4.3

From the Python documentation. https://docs.python.org/2/library/time.html#time.strptime
Support for the %Z directive is based on the values contained in tzname and whether daylight is true. Because of this, it is platform-specific except for recognizing UTC and GMT which are always known (and are considered to be non-daylight savings timezones).
This basically says that time.strptime() will only recognize UTC, GMT, and the timezone names listed in time.tzname.
Hope this helps
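One way around this limitation is to peel the abbreviation off and map it to an offset yourself. The following is only a minimal sketch, assuming Python 3 (datetime.timezone does not exist in Python 2.4). The names ABBREV_OFFSETS and parse_with_abbrev are hypothetical, and the table is an assumption as well, because abbreviations such as EDT are not unique worldwide.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a hand-maintained table of UTC offsets in hours.
# Abbreviations are ambiguous, so list only the ones your data really uses.
ABBREV_OFFSETS = {"EDT": -4, "EST": -5, "UTC": 0, "GMT": 0}

def parse_with_abbrev(text, fmt="%a %b %d %H:%M:%S %Y"):
    """Parse 'Thu Mar 13 23:15:13 2014 EDT' into an aware datetime."""
    # Split off the trailing abbreviation so strptime() never sees %Z.
    body, abbrev = text.rsplit(" ", 1)
    naive = datetime.strptime(body, fmt)
    tz = timezone(timedelta(hours=ABBREV_OFFSETS[abbrev]), abbrev)
    return naive.replace(tzinfo=tz)

dt = parse_with_abbrev("Thu Mar 13 23:15:13 2014 EDT")
print(dt.isoformat())  # 2014-03-13T23:15:13-04:00
```

An unknown abbreviation raises KeyError here, which is arguably safer than silently guessing an offset.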

%z only handles numeric UTC offsets such as -0400, and strptime() supports it only in Python 3.x. Here is a workaround for Python 2.x:
Instead of using:
datetime.strptime(t,'%Y-%m-%dT%H:%M %z')
use the timedelta to account for the timezone, like this:
from datetime import datetime, timedelta

def dt_parse(t):
    # Expects e.g. '2014-03-13T23:15 -0400'; returns a naive datetime in UTC.
    ret = datetime.strptime(t[0:16], '%Y-%m-%dT%H:%M')
    offset = timedelta(hours=int(t[18:20]), minutes=int(t[20:22]))
    if t[17] == '+':
        ret -= offset  # local time is ahead of UTC
    elif t[17] == '-':
        ret += offset  # local time is behind UTC
    return ret
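For comparison, on Python 3.2 or later the same normalization can lean on %z directly. This is a sketch under that version assumption; the function name to_utc_naive is made up for illustration.

```python
from datetime import datetime, timezone

def to_utc_naive(t):
    """Parse 'YYYY-MM-DDTHH:MM +HHMM' and return a naive datetime in UTC."""
    # %z turns the numeric offset into tzinfo, so no manual slicing is needed.
    aware = datetime.strptime(t, "%Y-%m-%dT%H:%M %z")
    # Shift to UTC, then drop tzinfo to match the naive result of the workaround.
    return aware.astimezone(timezone.utc).replace(tzinfo=None)

print(to_utc_naive("2014-03-13T23:15 -0400"))  # 2014-03-14 03:15:00
```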

Related

Convert date string (from gmail) to timestamp | Python

I want to save the received date of emails from a Gmail account into a time-series database.
The problem is that I cannot convert the string that I got from the email to timestamp.
I tried this:
from datetime import datetime
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z %Z')
print(date1_obj)
But got this error:
Traceback (most recent call last):
File "/format_date.py", line 11, in <module>
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z %Z')
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
(data_string, format))
ValueError: time data 'Thu, 28 May 2020 08:15:58 -0700 (PDT)' does not match format '%a, %d %b %Y %H:%M:%S %z %Z'
I tried with and without parentheses around the timezone.
I have read a lot, but found nothing about how to deal with date strings containing "(PDT)" or any other timezone name. It's very important to get the right date. If I run the same code without "(PDT)", I get an incorrect time (because of my local time).
I know I can use string methods to manipulate it and convert it to the right datetime, but I feel like this would not be flexible.
Sorry for my terrible English.
Thank you!
You could use dateutil's parser to parse the string; it infers the format automatically:
import dateutil.parser  # a bare "import dateutil" does not load the parser submodule
import dateutil.tz
s = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'
dt = dateutil.parser.parse(s)
# datetime.datetime(2020, 5, 28, 8, 15, 58, tzinfo=tzoffset('PDT', -25200))
dt.utcoffset().total_seconds()
# -25200.0
Note that although the timezone is given a name ("PDT"), it is only a fixed UTC offset of -25200 s (7 hours behind UTC). In many cases that is sufficient, at least to convert to UTC.
If you need the specific timezone (e.g. to account for DST transitions etc.), you can use a mapping dict that you supply to dateutil.parser.parse as tzinfos:
tzmap = {'PDT': dateutil.tz.gettz('US/Pacific'),
'PST': dateutil.tz.gettz('US/Pacific')}
dt = dateutil.parser.parse(s, tzinfos=tzmap)
# datetime.datetime(2020, 5, 28, 8, 15, 58, tzinfo=tzfile('US/Pacific'))
dt.utcoffset().total_seconds()
# -25200.0
Close, you forgot to put the parentheses around the last directive.
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
Well, after all your answers, which were very helpful, I finally solved it.
This is how:
>>> from email.utils import parsedate_tz, mktime_tz
>>> date = 'Thu, 28 May 2020 08:15:58 -0700 (PST)'
>>> timestamp = mktime_tz(parsedate_tz(date))
>>> timestamp
1590678958
>>>
I checked that timestamp, and it corresponds to 12:15:58 in my local time, which is exactly what I was looking for.
Thank you very much to everybody who took a minute to answer.
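If an aware datetime is more convenient than a bare timestamp, the same email.utils module offers parsedate_to_datetime() on Python 3.3 and later. A brief sketch, assuming that Python version; like parsedate_tz(), it uses the numeric offset and ignores the trailing "(PDT)" comment.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

date = 'Thu, 28 May 2020 08:15:58 -0700 (PDT)'

# Returns an aware datetime carrying the -07:00 offset from the header.
dt = parsedate_to_datetime(date)
timestamp = int(dt.timestamp())

print(dt.isoformat())  # 2020-05-28T08:15:58-07:00
print(timestamp)       # 1590678958
```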
If it does not work even when you enclose %Z in parentheses, then the problem lies within the %Z directive itself:
https://docs.python.org/3/library/time.html
Support for the %Z directive is based on the values contained in
tzname and whether daylight is true. Because of this, it is
platform-specific except for recognizing UTC and GMT which are always
known (and are considered to be non-daylight savings timezones).
For example, the following results in a ValueError for me (in Europe):
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (PST)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
print(date1_obj)
With GMT, however, the output is 2020-05-28 08:15:58-07:00:
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (GMT)'
date1_obj = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S %z (%Z)')
print(date1_obj)
Based on your comment under this answer, you could split the string if the timezone name is not important:
date1 = 'Thu, 28 May 2020 08:15:58 -0700 (GMT)'
date1_obj = datetime.strptime(date1.split(" (")[0], '%a, %d %b %Y %H:%M:%S %z')

Date with a time zone specified as a string is parsed as naive

I'm curious why the timezone in this example, GMT, is not parsed as a valid one:
>>> from datetime import datetime
>>> import pytz
>>> b = 'Mon, 3 Oct 2016 21:24:17 GMT'
>>> fmt = '%a, %d %b %Y %H:%M:%S %Z'
>>> datetime.strptime(b, fmt).astimezone(pytz.utc)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: astimezone() cannot be applied to a naive datetime
Doing the same with a -0700 instead of GMT and %z instead of %Z in the format works just fine.
What's the proper way to parse dates ending in string time zones if not this?
Use the .replace() method of the datetime object to set the time zone info.
>>> datetime.strptime(b, fmt).replace(tzinfo=pytz.utc)
datetime.datetime(2016, 10, 3, 21, 24, 17, tzinfo=<UTC>)
As you mentioned, .astimezone() works when you use %z instead of %Z in the format string. Although the two directives differ only in case, they represent completely different things.
As per the strftime directive documentation:
%z : UTC offset in the form +HHMM or -HHMM (empty string if the object is naive).
%Z : Time zone name (empty string if the object is naive).
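The .replace() approach also works without pytz. Below is a small sketch using only the standard library, assuming Python 3.2 or later for timezone.utc; it is valid only because the input is known to be in GMT.

```python
from datetime import datetime, timedelta, timezone

b = 'Mon, 3 Oct 2016 21:24:17 GMT'
fmt = '%a, %d %b %Y %H:%M:%S %Z'

# %Z only checks that the name is recognized; it does not set a UTC offset,
# so the offset is attached explicitly afterwards.
aware = datetime.strptime(b, fmt).replace(tzinfo=timezone.utc)

print(aware.isoformat())  # 2016-10-03T21:24:17+00:00
```

Note that .replace() relabels the datetime without shifting the clock time, so it should not be used to attach UTC to a value that was written in some other zone.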

Python: Date Conversion Error

I'm trying to convert a string into a date format, to be later stored into an SQLite database. Below is the code line at which I'm getting an error.
date_object = datetime.strptime(date, '%b %d, %Y %H:%M %Z')
And this is the error:
File "00Basic.py", line 20, in spider
date_object = datetime.strptime(date, '%b %d, %Y %H:%M %Z')
File "C:\Python27\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data 'Aug 19, 2016 08:13 IST' does not match format '%b %d, %Y %H %M %Z'
Question 1: How do I resolve this error?
Question 2: Is this the right approach for preparing to store the date in SQLite later?
Please Note: Very new to programming.
You could use pytz for the timezone conversion as shown:
from datetime import datetime
from pytz import timezone
s = "Aug 19, 2016 08:13 IST".replace('IST', '')
print(timezone('Asia/Calcutta').localize(datetime.strptime(s.rstrip(), '%b %d, %Y %H:%M')))
#2016-08-19 08:13:00+05:30
#<class 'datetime.datetime'>
I would suggest using dateutil in case you have to handle strings with several different timezones.
The problem is located in the %Z (Time zone) part of the format.
As the documentation explains:
%Z: Time zone name (empty string if the object is naive). Examples: (empty), UTC, EST, CST
UTC, EST and CST are only examples there. strptime() always recognizes UTC and GMT, plus the names in time.tzname for the local machine, so IST is rejected unless the machine's own timezone is IST.
To work around this, you could use the %z directive, which accepts any numeric UTC offset, like so:
struct_time = time.strptime("Aug 19, 2016 08:13 +0530", '%b %d, %Y %H:%M %z')
Update: Although this works fine in Python 3.2+, it raises an exception when run with Python 2.
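On Question 2, one common way to store the value is as an ISO 8601 string in UTC, which also sorts correctly as text. The sketch below rests on several assumptions: Python 3, "IST" meaning India Standard Time (+05:30), and a made-up table name articles with a made-up column published_utc.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: IST here means India Standard Time, UTC+05:30.
IST = timezone(timedelta(hours=5, minutes=30), "IST")

raw = "Aug 19, 2016 08:13 IST"
body = raw.rsplit(" ", 1)[0]  # drop the abbreviation strptime() cannot match
aware = datetime.strptime(body, "%b %d, %Y %H:%M").replace(tzinfo=IST)

# Normalize to UTC and keep the value as ISO 8601 text.
utc_text = aware.astimezone(timezone.utc).isoformat()

conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
conn.execute("CREATE TABLE articles (published_utc TEXT)")
conn.execute("INSERT INTO articles VALUES (?)", (utc_text,))
stored = conn.execute("SELECT published_utc FROM articles").fetchone()[0]
print(stored)  # 2016-08-19T02:43:00+00:00
```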

How to convert string with UTC offset

I have date as
In [1]: a = "Sun 10 May 2015 13:34:36 -0700"
When I try to convert it using strptime, it gives an error.
In [3]: datetime.strptime(a, "%a %d %b %Y %H:%M:%S %Z"
...: )
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-973ef1c6daca> in <module>()
----> 1 datetime.strptime(a, "%a %d %b %Y %H:%M:%S %Z"
2 )
/usr/lib/python2.7/_strptime.pyc in _strptime(data_string, format)
323 if not found:
324 raise ValueError("time data %r does not match format %r" %
--> 325 (data_string, format))
326 if len(data_string) != found.end():
327 raise ValueError("unconverted data remains: %s" %
ValueError: time data 'Sun 10 May 2015 13:34:36 -0700' does not match format '%a %d %b %Y %H:%M:%S %Z'
In [6]: datetime.strptime(a, "%a %d %b %Y %H:%M:%S %z")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-e4870e34edda> in <module>()
----> 1 datetime.strptime(a, "%a %d %b %Y %H:%M:%S %z")
/usr/lib/python2.7/_strptime.pyc in _strptime(data_string, format)
315 del err
316 raise ValueError("'%s' is a bad directive in format '%s'" %
--> 317 (bad_directive, format))
318 # IndexError only occurs when the format string is "%"
319 except IndexError:
ValueError: 'z' is a bad directive in format '%a %d %b %Y %H:%M:%S %z'
As per the docs, the correct format is %z, but I might be missing something.
From the link you provided for the Python docs, I see that you are using Python 2.7.
strptime() does not support %z in Python 2.7. Your traceback shows the pure-Python _strptime module rejecting 'z' as a bad directive, so this is not a limitation of the platform's C library.
Note: from Python 3.2 onwards it will always work.
I am using Python 3.4 in which it is working fine
>>> a = "Sun 10 May 2015 13:34:36 -0700"
>>> datetime.strptime(a, "%a %d %b %Y %H:%M:%S %z")
Update: using dateutil
$ pip install python-dateutil
>>> from dateutil import parser
>>> parsed_date = parser.parse(a)
>>> parsed_date
datetime.datetime(2015, 5, 10, 13, 34, 36, tzinfo=tzoffset(None, -25200))
Your format string is correct and works fine in Python 3.3:
>>> a = "Sun 10 May 2015 13:34:36 -0700"
>>> datetime.strptime(a, "%a %d %b %Y %H:%M:%S %z")
datetime.datetime(2015, 5, 10, 13, 34, 36, tzinfo=datetime.timezone(datetime.timedelta(-1, 61200)))
It gives the error in Python 2.7 indeed.
Unlike strftime(), which is implemented by calling the libc function, strptime() is implemented in the Python library. Here you can see that the version used in Python 2.7 doesn’t support the z format. On the other hand here is the version from Python 3.3, which supports that (I think this was added around 3.2).
So, basically, you have two options:
Using some external library that is able to handle z.
Implementing it yourself (e.g. by stripping the timezone from the string, feeding the first part to strptime() and parsing the second one manually). Looking at how this is done in the Python library might be helpful.
I tried to parse this to return an “aware” object, but it is somewhat complicated.
>>> a = "Sun 10 May 2015 13:34:36 -0700"
>>> time, tz = a.rsplit(' ', 1)
>>> d = datetime.strptime(time, '%a %d %b %Y %H:%M:%S')
datetime.datetime(2015, 5, 10, 13, 34, 36)
Now I have to call d.replace(tzinfo=…tz…) to replace the timezone, but the problem is that I can’t get an instance of tzinfo because just knowing the offset from UTC is not enough to identify a timezone.
In Python 3.2 there is a special timezone class that is a subclass of tzinfo representing a “fake” timezone defined by just its offset. So there are two ways to proceed:
Backport (basically, copy and paste) the timezone class from Python 3 and use it in your parser.
Return a “naive” object:
>>> sign = 1 if tz.startswith('-') else -1  # a negative offset means UTC is later
>>> d + sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
datetime.datetime(2015, 5, 10, 20, 34, 36)
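The first option can be approximated without copying the whole Python 3 class. The sketch below defines a tiny fixed-offset tzinfo subclass; FixedOffset and parse_with_offset are hypothetical names, and this is a simplified stand-in for the real timezone class rather than a backport of it.

```python
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """A 'fake' timezone defined only by its offset from UTC, in minutes."""

    def __init__(self, minutes):
        self._offset = timedelta(minutes=minutes)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)  # a fixed offset never observes DST

    def tzname(self, dt):
        return None

def parse_with_offset(text):
    # Split 'Sun 10 May 2015 13:34:36 -0700' into the stamp and '-0700'.
    stamp, tz = text.rsplit(' ', 1)
    sign = -1 if tz[0] == '-' else 1
    minutes = sign * (int(tz[1:3]) * 60 + int(tz[3:5]))
    naive = datetime.strptime(stamp, '%a %d %b %Y %H:%M:%S')
    return naive.replace(tzinfo=FixedOffset(minutes))

aware = parse_with_offset("Sun 10 May 2015 13:34:36 -0700")
print(aware.isoformat())  # 2015-05-10T13:34:36-07:00
```

Because it relies only on the tzinfo base class and strptime() without %z, the same code should also run on Python 2.7.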
You can parse your input format using only stdlib even in Python 2.7:
>>> from datetime import datetime
>>> from email.utils import mktime_tz, parsedate_tz
>>> mktime_tz(parsedate_tz("Sun 10 May 2015 13:34:36 -0700"))
1431290076
>>> datetime.utcfromtimestamp(_)
datetime.datetime(2015, 5, 10, 20, 34, 36)
The result is a naive datetime object that represents time in UTC.
See other solutions and the way to get an aware datetime object in Python: parsing date with timezone from an email.

Python : Converting string to datetime [duplicate]

I was trying to convert a string to a datetime object.
The string I got from a news feed is in the following format:
"Thu, 16 Oct 2014 01:16:17 EDT"
I tried using datetime.strptime() to convert it.
i.e.,
datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')
And got the following error:
Traceback (most recent call last):
File "", line 1, in
datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')
File "C:\Anaconda\lib\_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data 'Thu, 16 Oct 2014 01:16:17 EDT' does not match format '%a, %d %b %Y %H:%M:%S %Z'
However, if I tried the string without "EDT", it worked.
i.e.,
datetime.strptime('Thu, 16 Oct 2014 01:16:17','%a, %d %b %Y %H:%M:%S')
Does anyone know how to parse that "EDT" part?
To parse the date in RFC 2822 format, you could use email package:
from datetime import datetime, timedelta
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)
Note: parsedate_tz() assumes that EDT corresponds to -0400 UTC offset but it might be incorrect in Australia where EDT is +1100 (AEDT is used by pytz in this case) i.e., a timezone abbreviation may be ambiguous. See Parsing date/time string with timezone abbreviated name in Python?
Related Python bug: %Z in strptime doesn't match EST and others.
If your computer uses POSIX timestamps (likely), and you are sure the input date is within an acceptable range for your system (not too far into the future/past), and you don't need to preserve the microsecond precision then you could use datetime.utcfromtimestamp:
from datetime import datetime
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime.utcfromtimestamp(timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)
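Both variants above return a naive datetime in UTC. If an aware object is preferred, a possible variation (assuming Python 3.2 or later) passes a timezone to datetime.fromtimestamp(); recent Python 3 releases also deprecate utcfromtimestamp() in favour of this form.

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz

timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))

# Passing a tz makes the result aware instead of naive.
aware_utc = datetime.fromtimestamp(timestamp, timezone.utc)
print(aware_utc.isoformat())  # 2014-10-16T05:16:17+00:00
```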
The email.utils.parsedate_tz() solution is good for 3-letter timezones but it does not work for 4 letters such as AEDT or CEST. If you need a mix, the answer under Parsing date/time string with timezone abbreviated name in Python? works for both with the most commonly used time zones.
