Checking if a date string is in UTC format - python

I have a date string like "2011-11-06 14:00:00+00:00". Is there a way to check whether it is in UTC format? I tried to convert the string to a datetime object using utc = datetime.strptime('2011-11-06 14:00:00+00:00','%Y-%m-%d %H:%M%S+%z') so that I can compare it with pytz.utc, but I get 'ValueError: 'z' is a bad directive in format '%Y-%m-%d %H:%M%S+%z''
How can I check whether the date string is in UTC? An example would be really appreciated.
Thank you

A simple regular expression will do:
>>> import re
>>> RE = re.compile(r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$')
>>> bool(RE.search('2011-11-06 14:00:00+00:00'))
True
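Note that the regex above only checks the shape of the string; it accepts any offset, not just +00:00. A minimal sketch of an actual UTC check, assuming Python 3.7+ (where datetime.fromisoformat exists); the helper name is_utc is hypothetical:

```python
from datetime import datetime, timedelta

def is_utc(date_string):
    """Return True if date_string parses and carries a zero UTC offset."""
    try:
        dt = datetime.fromisoformat(date_string)
    except ValueError:
        return False  # not a parseable ISO 8601 string
    offset = dt.utcoffset()  # None for naive datetimes (no offset given)
    return offset is not None and offset == timedelta(0)

print(is_utc('2011-11-06 14:00:00+00:00'))  # True
print(is_utc('2011-11-06 14:00:00+05:30'))  # False
print(is_utc('2011-11-06 14:00:00'))        # False
```

A string without any offset is treated as "not UTC" here; whether that is the right call depends on where the data comes from.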

By 'in UTC format' do you actually mean ISO-8601? This is a pretty common question.

The problem with your format string is that strptime just passes the job of parsing time strings on to C's strptime, and different C libraries accept different directives. In your case (and mine, it seems), the %z directive is not accepted.
There's some ambiguity in the doc pages about this. The datetime.datetime.strptime docs point to the format specification for time.strptime which doesn't contain a lower-case %z directive, and indicates that
Additional directives may be supported on certain platforms, but only the ones listed here have a meaning standardized by ANSI C.
But then it also points here which does contain a lower-case %z, but reiterates that
The full set of format codes supported varies across platforms, because Python calls the platform C library’s strftime() function, and platform variations are common.
There's also a bug report about this issue.
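For readers on a newer interpreter, here is a hedged sketch of the parse the question was attempting. It assumes Python 3.7 or later, where strptime's %z accepts an offset containing a colon; the format string also restores the colon between %M and %S and drops the literal '+', both of which differ from the one in the question. The names FORMAT and parse_with_offset are illustrative only.

```python
from datetime import datetime, timedelta

# %z consumes the sign itself, so no literal '+' is needed in the format.
FORMAT = '%Y-%m-%d %H:%M:%S%z'

def parse_with_offset(date_string):
    # Assumption: Python 3.7+; earlier 3.x releases only accept "+0000".
    return datetime.strptime(date_string, FORMAT)

dt = parse_with_offset('2011-11-06 14:00:00+00:00')
print(dt.utcoffset() == timedelta(0))  # True
```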

Related

How to require a timestamp to be zero-padded during validation in Python?

I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).
Python's strptime seems to be very forgiving when it comes to validating zero-padding, see code example below (note that the hour is missing a leading zero):
>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)
It gracefully accepts a string that's not zero-padded for the hour for example, and doesn't throw a ValueError exception as I would expect.
Is there any way to enforce strptime to validate that it's zero-padded? Or is there any other built-in function in the standard libs of Python that does?
I would like to avoid writing my own regexp for this.
There is already an answer that parsing ISO8601 or RFC3339 date/time with Python strptime() is impossible: How to parse an ISO 8601-formatted date?
So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date.
Regarding the regex suggestions, a date string like
2020-14-32T45:33:44.123
would pass such a regex even though it is not a valid date. There are lots of Python modules (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values and much more.
To force strptime to validate leading zeros for you, you'll have to add your own literals to Python's _strptime._TimeRE_cache. The solution is very hacky, most likely not very portable, and requires writing a regex, although only for the hour part of a timestamp.
Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to tell whether the missing leading zero was in the hours, minutes or seconds.
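The round-trip idea can be sketched as follows. This is only an illustration, assuming Python 3.6+ (for isoformat's timespec argument) and exactly three fractional digits as in the question; parse_strict is a hypothetical name.

```python
from datetime import datetime

FORMAT = '%Y-%m-%dT%H:%M:%S.%f'

def parse_strict(s):
    """Parse s, then reject it unless re-formatting reproduces it exactly."""
    dt = datetime.strptime(s, FORMAT)  # raises ValueError on gross mismatches
    # isoformat(timespec='milliseconds') always zero-pads every field
    if dt.isoformat(timespec='milliseconds') != s:
        raise ValueError("time data %r is not zero-padded as %r" % (s, FORMAT))
    return dt

print(parse_strict('1985-08-23T03:00:00.000'))  # 1985-08-23 03:00:00
```

Calling parse_strict('1985-08-23T3:00:00.000') raises ValueError, but as noted above the message cannot say which field was short.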
You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:
import re
# Escape the dot and anchor both ends so that nothing extra slips through
date_pattern = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}$')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000'
]
for s in s_list:
    if date_pattern.match(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)
Output
1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid
The only thing I can think of outside of messing with Python internals is to test for the validity of the format by knowing what you are looking for.
So, if I understand correctly, the format is '%Y-%m-%dT%H:%M:%S.%f' and should be zero-padded.
Then you know the exact length of the string you are looking for and can reproduce the intended result:
import datetime
fmt = '%Y-%m-%dT%H:%M:%S.%f'
s = '1985-08-23T3:00:00.000'
parsed = datetime.datetime.strptime(s, fmt)
# A zero-padded timestamp in this format is exactly 23 characters long
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(parsed)  # just for good measure
>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'

Does python time.strftime process timezone options correctly (for RFC 3339)

I'm trying to get RFC3339 compatible output from python's time module, using the time.strftime() function.
With the Linux 'date' command, I can use a format string like the following: "date +%F_%T%:z"
$ date +%F_%T%:z
2017-06-29_16:13:29-07:00
When used with python time.strftime, the %:z appears to not be supported.
$ python
>>> import time
>>> print time.strftime("%F %T%:z")
2017-06-29 16:16:15%:z
Apparently, '%z' is supported, but '%:z' is not:
>>> print time.strftime("%F %T%z")
2017-05-29 16:15:35-0700
RFC3339 specifically uses the timezone offset with the embedded colon.
That would be 07:00 in my case, instead of 0700.
I believe the omission of support for the '%:z' option is due to the underlying C implementation of strftime() not supporting the timezone offset formatters with colons, that is '%:z', '%::z', etc.
Is there any workaround for this (e.g. another Python module, or some option I'm missing in the 'time' module), other than writing code to get the %z output and reformat it in %:z format?
EDIT: Another question (Generate RFC 3339 timestamp in Python) gives solutions using other modules that can produce RFC3339 output. I'm going to self-answer with information that I found for the question in the title.
The strict answer to the question in the title "Does python time.strftime process timezone options correctly (for RFC3339)?" is: No.
The "%:z" supported by the Linux 'date' command is a GNU extension, and is not in the POSIX spec, or in the C implementation of strftime (as of this writing).
With regard to workarounds (requested in the body of the question), the answers in Generate RFC 3339 timestamp in Python can be used as alternatives to time.strftime to produce RFC3339-compliant output.
Specifically, I used the pytz module to get timezone information, and datetime class isoformat() function to print in RFC3339-compliant format (with a colon in the timezone offset portion of the output). Like so:
(in Python 2.7 on Ubuntu 14.04)
>>> import pytz, datetime
>>> latz = pytz.timezone("America/Los_Angeles")
>>> latz
<DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>
>>> dt = datetime.datetime.now(latz)
>>> dt2 = dt.replace(microsecond=0)
>>> dt2.isoformat()
'2017-07-06T11:50:07-07:00'
Note the conversion from dt to dt2, which sets microseconds to 0 so that isoformat does not print a fractional-seconds part. RFC3339 permits fractional seconds, so this step is optional. Use replace() rather than building a new datetime with tzinfo=latz: passing a pytz timezone straight to the datetime constructor does not apply daylight saving time and yields the wrong offset in summer.
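On Python 3.6+ the same result is available from the standard library alone. The sketch below is an illustration under that assumption; rfc3339 and add_colon are hypothetical helper names, and add_colon shows the "reformat %z" workaround mentioned in the question.

```python
from datetime import datetime, timedelta, timezone

def rfc3339(dt):
    """Format an aware datetime with whole seconds and a colon in the offset."""
    if dt.utcoffset() is None:
        raise ValueError("naive datetime has no UTC offset")
    return dt.isoformat(timespec='seconds')

def add_colon(offset):
    """Turn strftime's '%z' output such as '-0700' into '-07:00'."""
    return offset[:3] + ':' + offset[3:]

pdt = timezone(timedelta(hours=-7))  # fixed offset, used only for the demo
print(rfc3339(datetime(2017, 6, 29, 16, 13, 29, tzinfo=pdt)))  # 2017-06-29T16:13:29-07:00
print(add_colon('-0700'))  # -07:00
print(rfc3339(datetime.now(timezone.utc).astimezone()))  # current local time
```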

Python - Getting the date format [duplicate]

I'm getting a date as a string, then I'm parsing it to datetime object.
Is there any way to check what the date format of the object is?
Let's say that this is the object that I'm creating:
modified_date = parser.parse("2015-09-01T12:34:15.601+03:00")
How can I print or get the exact date format of this object? I need this in order to verify that it's in the correct format, so that I'll be able to compute the difference between today's date and the given date.
I had a look in the source code and, unfortunately, python-dateutil doesn't expose the format. In fact it doesn't even generate a guess for the format at all, it just goes ahead and parses - the code is like a big nested spaghetti of conditionals.
You could have a look at dateinfer which looks to be what you're searching for, but these are unrelated libraries so there is no guarantee at all that python-dateutil will parse with the same format that dateinfer suggests.
>>> from dateinfer import infer
>>> s = "2015-09-01T12:34:15.601+03:00"
>>> infer([s])
'%Y-%d-%mT%I:%M:%S.601+%m:%d'
Look at that .601. Close, but no cigar. I think it has probably also mixed up the month and the day. You might get better results by giving it more than one date string to base the guess upon.
i need this in order to verify that it's in the correct format
If you know the expected time format (or a set of valid time formats) then you could just parse the input using it: if it succeeds then the time format is valid (the usual EAFP approach in Python):
from datetime import datetime

def parse_date(date_string, valid_date_formats):
    for date_format in valid_date_formats:
        try:
            return datetime.strptime(date_string, date_format), date_format
        except ValueError:  # wrong date format
            pass  # try the next format
    raise ValueError("{date_string} is not in the correct format. "
                     "valid formats: {valid_date_formats}".format(**vars()))
Here's a complete code example (in Russian -- ignore the text, look at the code).
If there are many valid date formats then to improve time performance you might want to combine them into a single regular expression or convert the regex to a deterministic or non-deterministic finite-state automaton (DFA or NFA).
In general, if you need to extract dates from a larger text that is too varied to create parsing rules manually, consider machine learning solutions, e.g. a NER system such as webstruct (for HTML input).
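For the original goal of diffing the given date against today, a minimal sketch follows. It assumes the expected format is the one from the question's example and Python 3.7+ (so that %z accepts "+03:00"); FORMAT and age are illustrative names, and the now parameter exists only to make the demo deterministic.

```python
from datetime import datetime, timezone

FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'  # assumed expected format

def age(date_string, now=None):
    """Return now - parsed date as a timedelta; raises ValueError on a bad format."""
    parsed = datetime.strptime(date_string, FORMAT)
    if now is None:
        now = datetime.now(timezone.utc)  # aware, so the subtraction is valid
    return now - parsed

fixed_now = datetime(2015, 9, 2, 9, 34, 15, 601000, tzinfo=timezone.utc)
print(age("2015-09-01T12:34:15.601+03:00", now=fixed_now))  # 1 day, 0:00:00
```

Both operands are timezone-aware, so the differing offsets are handled by the subtraction itself.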

time.strftime() incorrect timezone format [duplicate]

Every time I use:
time.strftime("%z")
I get:
Eastern Daylight Time
However, I would like the UTC offset in the form +HHMM or -HHMM. I have even tried:
time.strftime("%Z")
Which still yields:
Eastern Daylight Time
I have read several other posts related to strftime() and %z always seems to return the UTC offset in the proper +HHMM or -HHMM format. How do I get strftime() to output in the +HHMM or -HHMM format for python 3.3?
Edit: I'm running Windows 7
In 2.x, if you look at the docs for time.strftime, they don't even mention %z. It's not guaranteed to exist at all, much less to be consistent across platforms. In fact, as footnote 1 implies, it's left up to the C strftime function. In 3.x, on the other hand, they do mention %z, and the footnote that explains that it doesn't work the way you'd expect is not easy to see; that's an open bug.
However, in 2.6+ (including all 3.x versions), datetime.strftime is guaranteed to support %z as "UTC offset in the form +HHMM or -HHMM (empty string if the object is naive)." So, that makes for a pretty easy workaround: use datetime instead of time. Exactly how to change things depends on what you're trying to do. With python-dateutil's tz module, datetime.now(tz.tzlocal()).strftime('%z') is the way to get just the local timezone formatted as a GMT offset, but if you're trying to format a complete time the details will be a little different.
If you look at the source, time.strftime basically just checks the format string for valid-for-the-platform specifiers and calls the native strftime function, while datetime.strftime has a bunch of special handling for different specifiers, including %z; in particular, it will replace the %z with a formatted version of utcoffset before passing things on to strftime. The code has changed a few times since 2.7, and even been radically reorganized once, but the same difference is basically there even in the pre-3.5 trunk.
For a proper solution, see abarnert’s answer.
You can use time.altzone which returns a negative offset in seconds. For example, I’m on CEST at the moment (UTC+2), so I get this:
>>> time.altzone
-7200
And to put it in your desired format:
>>> '{}{:0>2}{:0>2}'.format('-' if time.altzone > 0 else '+', abs(time.altzone) // 3600, abs(time.altzone // 60) % 60)
'+0200'
As abarnert mentioned in the comments, time.altzone gives the offset when DST is active, while time.timezone gives it when DST is not active. To figure out which to use, you can do what J.F. Sebastian suggested in his answer to a different question. So you can get the correct offset like this:
time.altzone if time.daylight and time.localtime().tm_isdst > 0 else time.timezone
As also suggested by him, you can use the following in Python 3 to get the desired format using datetime.timezone:
>>> datetime.now(timezone.utc).astimezone().strftime('%z')
'+0200'
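The pieces above can be combined into one helper. This is a sketch only; format_offset and local_offset are hypothetical names, and the input follows the sign convention of time.timezone and time.altzone (seconds west of UTC).

```python
import time

def format_offset(seconds_west):
    """Format an offset given in seconds west of UTC as +HHMM or -HHMM."""
    sign = '-' if seconds_west > 0 else '+'
    hours, remainder = divmod(abs(seconds_west), 3600)
    return '{}{:02d}{:02d}'.format(sign, hours, remainder // 60)

def local_offset():
    """Pick altzone or timezone depending on whether DST is in effect now."""
    dst = time.daylight and time.localtime().tm_isdst > 0
    return format_offset(time.altzone if dst else time.timezone)

print(format_offset(-7200))  # +0200
print(format_offset(14400))  # -0400
print(local_offset())        # depends on the machine's timezone
```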
Use time.timezone to get the time offset in seconds.
Format it using:
("-" if time.timezone > 0 else "+") + time.strftime("%H:%M", time.gmtime(abs(time.timezone)))
to convert it to +/-HH:MM format.
By the way, isn't this supposed to be a bug, according to the strftime docs?
Also, I thought this SO answer might help you convert from a zone offset string to HH:MM format, but since "%z" is not working as expected, I feel it's moot.
NOTE: time.timezone does not account for daylight saving time.
It will come as no surprise that this bug persists in the latest Windows version currently available, Win 10 Version 1703 (Creators). However, time marches on and there is a lovely date-and-time library called pendulum that does what the question asks for. Sébastien Eustace (principal author of the product?) has shown me this.
>>> pendulum.now().strftime('%z')
'-0400'
pendulum assumes UTC/GMT unless told otherwise, and keeps timezone with the date-time object. There are many other possibilities, amongst them these:
>>> pendulum.now(tz='Europe/Paris').strftime('%z')
'+0200'
>>> pendulum.create(year=2016, month=11, day=5, hour=16, minute=23, tz='America/Winnipeg').strftime('%z')
'-0500'
>>> pendulum.now(tz='America/Winnipeg').strftime('%z')
'-0500'

Parsing datetime strings with microseconds in Python 2.5

I have a text file with a lot of datetime strings in isoformat. The strings are similar to this:
'2009-02-10 16:06:52.598800'
These strings were generated using str(datetime_object). The problem is that, for some reason, str(datetime_object) generates a different format when the datetime object has microseconds set to zero and some strings look like this:
'2009-02-10 16:06:52'
How can I parse these strings and convert them into a datetime object?
It's very important to get all the data in the object, including microseconds.
NOTE: I have to use Python 2.5, the format directive %f for microseconds doesn't exist in 2.5.
Alternatively:
from datetime import datetime

def str2datetime(s):
    # str(datetime) omits the fractional part when microsecond == 0
    parts = s.split('.')
    dt = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    if len(parts) > 1:
        dt = dt.replace(microsecond=int(parts[1]))
    return dt
This uses strptime itself to parse the date/time string (so there is no need to think up corner cases for a regex).
Use the dateutil module. It supports a much wider range of date and time formats than the built in Python ones.
You'll need to easy_install dateutil for the following code to work:
from dateutil.parser import parser
p = parser()
datetime_with_microseconds = p.parse('2009-02-10 16:06:52.598800')
print datetime_with_microseconds.microsecond
results in:
598799
Someone has already filed a bug for this issue: Issue 1982. Since you need this to work with Python 2.5, you must parse the value manually and then manipulate the datetime object.
It might not be the best solution, but you can use a regular expression:
m = re.match(r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{6}))?', datestr)
dt = datetime.datetime(*[int(x) for x in m.groups() if x])
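A self-contained variant of the regex approach is sketched below. It is an illustration rather than the answerer's code: PATTERN and parse are hypothetical names, the pattern is anchored at the end, and shorter fractions are right-padded on the assumption that '.5' should mean 500000 microseconds.

```python
import re
from datetime import datetime

PATTERN = re.compile(
    r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?$')

def parse(datestr):
    m = PATTERN.match(datestr)
    if m is None:
        raise ValueError("unrecognised datetime string: %r" % datestr)
    fields = [int(x) for x in m.groups()[:6]]
    fraction = m.group(7) or ''
    # Right-pad to six digits; an absent fraction becomes 0 microseconds
    micro = int(fraction.ljust(6, '0'))
    return datetime(*(fields + [micro]))

print(parse('2009-02-10 16:06:52.598800').microsecond)  # 598800
print(parse('2009-02-10 16:06:52').microsecond)         # 0
```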
