I have a text file with a lot of datetime strings in isoformat. The strings are similar to this:
'2009-02-10 16:06:52.598800'
These strings were generated using str(datetime_object). The problem is that, for some reason, str(datetime_object) generates a different format when the datetime object has microseconds set to zero and some strings look like this:
'2009-02-10 16:06:52'
How can I parse these strings and convert them into a datetime object?
It's very important to get all the data in the object, including microseconds.
NOTE: I have to use Python 2.5, the format directive %f for microseconds doesn't exist in 2.5.
Alternatively:
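from datetime import datetime

def str2datetime(s):
    # str(datetime_object) omits the fractional part when microseconds are zero.
    parts = s.split('.')
    dt = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S")
    if len(parts) > 1:
        dt = dt.replace(microsecond=int(parts[1]))
    return dt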
This uses strptime itself to parse the date/time string, so there is no need to think up corner cases for a regex.
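For readers who are not tied to Python 2.5, the same idea can be written without splitting the string. This is a minimal sketch, assuming Python 2.6 or later, where the %f directive is available. The names FORMATS and parse_str_datetime are made up for illustration.

```python
from datetime import datetime

# Hypothetical helper: try the format with microseconds first,
# then fall back to the format without them.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_str_datetime(s):
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized datetime string: %r" % s)

print(parse_str_datetime('2009-02-10 16:06:52.598800').microsecond)
print(parse_str_datetime('2009-02-10 16:06:52').microsecond)
```

Both forms produced by str(datetime_object) go through the same function, and anything else still raises ValueError.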
Use the dateutil module. It supports a much wider range of date and time formats than the built in Python ones.
You'll need to install dateutil (the package is published on PyPI as python-dateutil) for the following code to work:
from dateutil.parser import parser
p = parser()
datetime_with_microseconds = p.parse('2009-02-10 16:06:52.598800')
print datetime_with_microseconds.microsecond
results in:
598799
Someone has already filed a bug for this issue: Issue 1982. Since you need this to work with Python 2.5, you must parse the value manually and then manipulate the datetime object.
It might not be the best solution, but you can use a regular expression:
import datetime
import re
m = re.match(r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{6}))?', datestr)
dt = datetime.datetime(*[int(x) for x in m.groups() if x])
Related
I'm having trouble getting datetime to parse this time string. (This works on Windows, but not on Linux.)
Can anyone tell me why this won't work on Linux?
d1 = '2020-01-31T15:16:21+00:00'
d1 = datetime.datetime.strptime(d1, "%Y-%m-%dT%H:%M:%S%z")
It's not supposed to work on Windows either, unless that machine runs a newer Python. Before Python 3.7, the %z directive of strptime does not accept a colon between the hours and minutes of the UTC offset, as stated in the datetime documentation. On those versions you'll need to remove the colon first:
import datetime
import re
d1 = "2020-01-31T15:16:21+00:00"
# Remove all colons in the timezone part
d1 = re.sub(r"([\+-]\d\d):(\d\d)(?::(\d\d(?:\.\d+)?))?", r"\1\2\3", d1)
d1 = datetime.datetime.strptime(d1, "%Y-%m-%dT%H:%M:%S%z") # now works normally
Expected %z format: ±HHMM[SS[.ffffff]]
An explanation of the regular expression, which is based on the format above: https://regex101.com/r/EoOBHW/1
Alternatively, you can use the dateutil third-party library, which parses that string successfully:
import datetime
from dateutil.parser import parse
d1 = parse("2020-01-31T15:16:21+00:00")
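On newer interpreters the standard library can also read this string directly. The following is a minimal sketch, assuming Python 3.7 or later, where datetime.fromisoformat is available and accepts an offset written with a colon.

```python
from datetime import datetime, timedelta

# Assumes Python 3.7+; fromisoformat does not exist on older versions.
d1 = datetime.fromisoformat("2020-01-31T15:16:21+00:00")

print(d1)
print(d1.utcoffset() == timedelta(0))
```

The result is a timezone-aware datetime, so no regular expression or third-party package is needed in that case.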
I need Python code that produces a timestamp in the 'YYYY-MM-DD HH24:MI:SS.FF' format.
The result should look like this: '2019-07-27 12:07:00.0'
Sample code that I tried:
from datetime import datetime as dt
from datetime import timedelta
timestamp=(dt.now() - timedelta(1)).strftime('%Y-%m-%d %HH24:%MI:%SS.%FF')
Output: 2019-09-05 10H24:31I:57S.2019-09-05F
The result should look like 2019-09-05 10:31:57.0
Your format string just needs to be adapted. Each Python format directive is a percent sign followed by a single character, so the extra characters you added after each directive (such as the second H and the 24) are copied into the output literally.
Here is a corrected code example:
from datetime import datetime as dt
from datetime import timedelta
timestamp=(dt.now() - timedelta(1)).strftime('%Y-%m-%d %H:%M:%S.%f')
As you can see, I just removed some characters and wrote f in lowercase. The format characters that you chose already include padding and 24-hour format.
Example output: '2019-09-05 12:27:45.416157'
For a full list of format characters, please check the linked python documentation.
According to this documentation, you should use the following format:
timestamp=(dt.now() - timedelta(1)).strftime('%Y-%m-%d %H:%M:%S.%f')
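Note that %f always prints six digits, while the requested sample output ends in a single fractional digit ('.0'). If that single digit really is required, one option is to build it by hand. This is only a sketch under that assumption, and format_with_tenths is a made-up name.

```python
from datetime import datetime

def format_with_tenths(dt):
    # %f would emit six digits, so append tenths of a second manually
    # (truncated, not rounded).
    return dt.strftime("%Y-%m-%d %H:%M:%S") + ".%d" % (dt.microsecond // 100000)

print(format_with_tenths(datetime(2019, 9, 5, 10, 31, 57)))  # 2019-09-05 10:31:57.0
```

Truncating rather than rounding avoids having to carry a rounded-up fraction into the seconds field.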
I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).
Python's strptime seems to be very forgiving when it comes to validating zero-padding, see code example below (note that the hour is missing a leading zero):
>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)
It gracefully accepts a string whose hour is not zero-padded, for example, and doesn't raise a ValueError as I would expect.
Is there any way to force strptime to validate that the fields are zero-padded? Or is there any other built-in function in Python's standard library that does?
I would like to avoid writing my own regexp for this.
There is already an answer explaining that parsing ISO 8601 or RFC 3339 date/time strings with Python's strptime() is impossible: How to parse an ISO 8601-formatted date?
So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date.
Regarding the regex suggestions, a date string like
2020-14-32T45:33:44.123
would be accepted as valid by a regex that only checks digit counts. There are lots of Python modules for this (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values, and much more.
To force strptime to validate leading zeros for you, you would have to add your own literals to Python's _strptime._TimeRE_cache. That solution is very hacky, most likely not very portable, and requires writing a regex, although only for the hour part of the timestamp.
Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to distinguish between missing leading zeros in the hours, minutes, or seconds.
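The round-trip idea could look roughly like the sketch below. It assumes the '%Y-%m-%dT%H:%M:%S.%f' format from the question, and parse_zero_padded is a made-up name. Because strftime always writes six fractional digits, the input's fraction is right-padded with zeros before comparing. Be aware that %Y may not be zero-padded for years below 1000 on some platforms.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"

def parse_zero_padded(s):
    dt = datetime.strptime(s, FMT)  # raises ValueError for malformed input
    # strftime emits six fractional digits, so pad the input the same way.
    head, _, frac = s.rpartition(".")
    normalized = head + "." + frac.ljust(6, "0")
    if dt.strftime(FMT) != normalized:
        raise ValueError("time data %r is not zero-padded" % s)
    return dt

print(parse_zero_padded("1985-08-23T03:00:00.000"))
```

Calling parse_zero_padded("1985-08-23T3:00:00.000") raises ValueError, because the regenerated string contains "T03" while the input contains "T3".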
You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:
import re

# Escape the dot and anchor the end so only the exact shape matches.
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}$')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000'
]
for s in s_list:
    if date_pattern.match(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)
Output
1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid
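A pattern like this only checks the shape of the string, so a value such as 2020-14-32T45:33:44.123 still passes. One way to cover both concerns is to check the shape with the regex and then let strptime reject out-of-range fields. This is a sketch, assuming Python 3.4+ for fullmatch. ISO_SHAPE and is_valid_timestamp are made-up names.

```python
import re
from datetime import datetime

ISO_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}")
FMT = "%Y-%m-%dT%H:%M:%S.%f"

def is_valid_timestamp(s):
    # Step 1: exact shape, including zero-padding.
    if ISO_SHAPE.fullmatch(s) is None:
        return False
    # Step 2: field ranges (month 1-12, hour 0-23, and so on).
    try:
        datetime.strptime(s, FMT)
    except ValueError:
        return False
    return True

for s in ("1985-08-23T03:00:00.000", "1985-08-23T3:00:00.000", "2020-14-32T45:33:44.123"):
    print(s, is_valid_timestamp(s))
```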
The only thing I can think of, short of messing with Python internals, is to test the validity of the format based on what you know you are looking for.
So, if I understand correctly, the format is '%Y-%m-%dT%H:%M:%S.%f' and every field should be zero-padded.
Then you know the exact length of the string you are looking for and can reproduce the intended error yourself.
import datetime

s = '1985-08-23T3:00:00.000'
fmt = '%Y-%m-%dT%H:%M:%S.%f'
stripped = datetime.datetime.strptime(s, fmt)
# A fully zero-padded timestamp with three fractional digits is exactly 23 characters long.
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(stripped)  # just for good measure
>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'
I am converting a datetime string into a time string. My JSON datetime format is "2017-01-02T19:00:07.9181202Z". I have placed my code below:
from datetime import datetime
date_format = datetime.strptime('2017-01-02T19:00:07.9181202Z', '%Y-%m-%dT%H:%M:%S.%fZ')
time = date_format.strftime("%I:%M %p")
print(time)
Error message as below:
After that I read the Python datetime documentation. It says that the microsecond field can have at most 6 digits, but my JSON timestamp has 7 fractional digits.
Message from Python document:
%f is an extension to the set of format characters in the C standard
(but implemented separately in datetime objects, and therefore always
available). When used with the strptime() method, the %f directive
accepts from one to six digits and zero pads on the right.
I need a result in the format 07:00 PM. Is there any alternative method?
Thanks in advance.
If you're sure that the input will always be like that, you can just remove the extra digit before passing that string to strptime:
date_format = datetime.strptime('2017-01-02T19:00:07.9181202Z'[:-2] + 'Z', '%Y-%m-%dT%H:%M:%S.%fZ')
This is dirty, but gives the idea - remove the last two characters (the extra digit and "Z"), re-add the "Z".
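If the number of fractional digits can vary, a slightly more general variant is to trim anything beyond six digits with a regular expression before parsing. This is only a sketch, and parse_json_timestamp is a made-up name. The extra precision is discarded, not rounded.

```python
import re
from datetime import datetime

def parse_json_timestamp(s):
    # Keep the first six fractional digits and drop any that follow.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", s)
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%fZ")

date_format = parse_json_timestamp("2017-01-02T19:00:07.9181202Z")
print(date_format.strftime("%I:%M %p"))
```

Strings that already have six or fewer fractional digits pass through unchanged.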
I have a date string like "2011-11-06 14:00:00+00:00". Is there a way to check whether this is in UTC format or not? I tried to convert the above string to a datetime object using utc = datetime.strptime('2011-11-06 14:00:00+00:00', '%Y-%m-%d %H:%M%S+%z') so that I can compare it with pytz.utc, but I get: ValueError: 'z' is a bad directive in format '%Y-%m-%d %H:%M%S+%z'
How can I check whether the date string is in UTC? An example would be really appreciated.
Thank You
A simple regular expression will do:
>>> import re
>>> RE = re.compile(r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$')
>>> bool(RE.search('2011-11-06 14:00:00+00:00'))
True
By 'in UTC format', do you actually mean ISO 8601? This is a pretty common question.
The problem with your format string is that strptime just passes the job of parsing time strings on to C's strptime, and different flavors of C accept different directives. In your case (and mine, it seems), the %z directive is not accepted.
There's some ambiguity in the doc pages about this. The datetime.datetime.strptime docs point to the format specification for time.strptime which doesn't contain a lower-case %z directive, and indicates that
Additional directives may be supported on certain platforms, but only the ones listed here have a meaning standardized by ANSI C.
But then it also points here which does contain a lower-case %z, but reiterates that
The full set of format codes supported varies across platforms, because Python calls the platform C library’s strftime() function, and platform variations are common.
There's also a bug report about this issue.
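Where %z is available, checking for UTC comes down to comparing the parsed offset with zero. This is a minimal sketch, assuming Python 3.7 or later (where %z also accepts an offset written with a colon). The name is_utc is made up. Note that the format string has no literal "+" before %z, because the sign is part of the offset.

```python
from datetime import datetime, timedelta

def is_utc(s):
    # Assumes Python 3.7+, where %z parses offsets such as "+00:00".
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S%z")
    return dt.utcoffset() == timedelta(0)

print(is_utc("2011-11-06 14:00:00+00:00"))
print(is_utc("2011-11-06 14:00:00+05:30"))
```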