I have an extremely large dataset with date/time columns in various formats. I have a validation function that detects the possible date/time string formats and handles both 24-hour and 12-hour time. The separator is always :. A sample of the code is below. However, after profiling my code, it turns out this function can become a bottleneck and is expensive in terms of execution time. My question is whether there is a better way to do this without hurting performance.
import datetime
def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None
print(validate_time(time_str="9:21 PM"))
Instead of trying to parse using every format string, you could split by colons to obtain the segments of your string that denote hours, minutes, and everything that remains. Then you can parse the result depending on the number of values the split returns:
def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    try:
        if len(time_vals) == 1:
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            # [-2:] takes the last two characters of the final segment
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if last element ends with am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else:
            return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
To test this, let's time both methods for a bunch of test strings:
import timeit
print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")
old         new         old/new      test_string
0.001628    0.001143    1.424322     12:24
0.001567    0.001012    1.548661     12:23:42
0.000935    0.000979    0.955177     13:53
0.003004    0.000722    4.161657     1:53 PM
0.004523    0.001396    3.241204     12:24:43.220
0.002148    0.000025    84.897370    not a date
0.002262    0.000622    3.638629     54:23:21
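Before trusting the timings, it may be worth confirming that both functions return the same result for every test string. A minimal sketch, assuming the two functions behave as described above (compact versions are repeated here so the snippet runs on its own; results_match is a hypothetical helper name):

```python
import datetime

FORMATS = ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]

def validate_time(time_str):
    # Original approach: try each format in turn
    for time_format in FORMATS:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

def validate_time_new(time_str):
    # Split approach: choose a single format from the shape of the string
    time_vals = time_str.split(":")
    if len(time_vals) == 2:
        is_12h = time_vals[-1][-2:].lower() in ("am", "pm")
        time_format = "%I:%M %p" if is_12h else "%H:%M"
    elif len(time_vals) == 3:
        time_format = "%H:%M:%S.%f" if "." in time_vals[-1] else "%H:%M:%S"
    else:
        return None
    try:
        return datetime.datetime.strptime(time_str, time_format)
    except ValueError:
        return None

def results_match(samples):
    # True when both implementations agree on every sample (hypothetical helper)
    return all(validate_time(s) == validate_time_new(s) for s in samples)

samples = ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220",
           "not a date", "54:23:21"]
print(results_match(samples))
```

If this prints False for some input, the speed-up is not a fair comparison, because the two functions are no longer doing the same job.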
So I have been trying to add a time format to my REST calls in Python, but there always seems to be some kind of issue. First of all, here is the time format requirement; it has to be exact, or it won't work, unfortunately.
Use the following ISO-8601 compliant date/time format in request parameters.
yyyy-MM-dd'T'HH:mm:ss.SSSXXX
For example, May 26 2014 at 21:49:46 PM could have a format like one of the following:
- In PDT: 2014-05-26T21:49:46.000-07:00
- In UTC: 2014-05-26T21:49:46.000Z
Code    Description
yyyy    Four digit year
MM      Two-digit month (01=January, etc.)
dd      Two-digit day of month (01 through 31)
T       Separator for date/time
HH      Two digits of hour (00 through 23) (am/pm NOT allowed)
mm      Two digits of minute (00 through 59)
ss      Two digits of second (00 through 59)
SSS     Three digit milliseconds of the second
XXX     ISO 8601 time zone (Z or +hh:mm or -hh:mm)
So, what I have tried so far is:
def format_time(self, isnow):
    currentdt = datetime.datetime.utcnow()
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    dt = "%s.%03dZ" % (dt, int(micro) / 1000)
    return dt
Now, this might return it in roughly the right format, but there is still the problem with timezones.
The end result I am trying to accomplish is that when I execute this, it finds the current time (Amsterdam timezone, GMT/UTC+1) and renders it in this format.
The else branch should produce the same time, but with X seconds added.
Would anyone be so kind as to help me out here?
Ok, so you got the microseconds formatted as milliseconds, well done there.
Now your challenge is to handle the timezone offset; it can't only be Z.
And to make things more difficult, strftime's %z format gives + (or -) HHMM, instead of HH:MM.
So you'll need to deal with that. Here's one way to do it:
Python 3:
import datetime
def format_time(self, isnow):
    # Convert to the local zone first, so the time fields and the offset
    # describe the same instant
    currentdt = datetime.datetime.now(datetime.timezone.utc).astimezone()
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    tz_offset = currentdt.strftime('%z')
    tz_offset = "Z" if tz_offset == "" else tz_offset[:3] + ":" + tz_offset[3:]
    dt = "%s.%03d%s" % (dt, int(micro) // 1000, tz_offset)
    return dt
Python 2:
import datetime
import pytz
from dateutil.tz import tzlocal
def format_time(self, isnow):
    # Convert to the local zone first, so the time fields and the offset
    # describe the same instant
    currentdt = datetime.datetime.now(pytz.utc).astimezone(tzlocal())
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    tz_offset = currentdt.strftime('%z')
    tz_offset = "Z" if tz_offset == "" else tz_offset[:3] + ":" + tz_offset[3:]
    dt = "%s.%03d%s" % (dt, int(micro) // 1000, tz_offset)
    return dt
Response to comment:
I needed to make a few changes. It's remarkably non-trivial to find the current timezone. The easiest way I could find was from https://stackoverflow.com/a/25887393/1404311 and I've integrated those concepts into the code that is now above.
Basically, instead of utcnow(), you should use now(datetime.timezone.utc). The former gives a naive datetime, while the latter gives a datetime that is set to UTC and is aware of it. Then use astimezone() to convert it to your local timezone, and strftime('%z') to get the UTC offset from there. Then go through the string manipulation.
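On Python 3.6 or later, the same result can probably be reached with less string handling, because isoformat() accepts a timespec argument and already writes the offset as +hh:mm. This is a sketch, not a drop-in replacement: the function name and the extra_seconds and now parameters are made up for illustration, and UTC comes out as +00:00 rather than Z, which the quoted format description appears to allow.

```python
import datetime

def format_time_iso(extra_seconds=0, now=None):
    # 'now' is injectable for testing (an assumption of this sketch); by
    # default take the current time as an aware datetime in the local zone
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc).astimezone()
    now += datetime.timedelta(seconds=extra_seconds)
    # timespec='milliseconds' keeps exactly three fractional digits
    return now.isoformat(timespec="milliseconds")

pdt = datetime.timezone(datetime.timedelta(hours=-7))
sample = datetime.datetime(2014, 5, 26, 21, 49, 46, tzinfo=pdt)
print(format_time_iso(now=sample))
print(format_time_iso(extra_seconds=3, now=sample))
```

With the fixed sample value, the first line printed matches the PDT example from the format description in the question.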
I'm getting a datetime string in a format like "2009-05-28T16:15:00" (this is ISO 8601, I believe). One hackish option seems to be to parse the string using time.strptime and passing the first six elements of the tuple into the datetime constructor, like:
datetime.datetime(*time.strptime("2007-03-04T21:08:12", "%Y-%m-%dT%H:%M:%S")[:6])
I haven't been able to find a "cleaner" way of doing this. Is there one?
I prefer using the dateutil library for timezone handling and generally solid date parsing. If you were to get an ISO 8601 string like: 2010-05-08T23:41:54.000Z you'd have a fun time parsing that with strptime, especially if you didn't know up front whether or not the timezone was included. pyiso8601 has a couple of issues (check their tracker) that I ran into during my usage and it hasn't been updated in a few years. dateutil, by contrast, has been active and worked for me:
from dateutil import parser
yourdate = parser.parse(datestring)
Since Python 3.7, you can do this without external libraries by using the fromisoformat method of the datetime class:
datetime.datetime.fromisoformat('2019-01-04T16:41:24+02:00')
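One caveat, as far as I know: before Python 3.11, fromisoformat() only accepts numeric offsets and rejects a trailing Z. A small sketch of a workaround, where parse_iso is a hypothetical helper that rewrites the Z as +00:00 first:

```python
import datetime

def parse_iso(text):
    # Older fromisoformat() versions reject 'Z', so spell it as an explicit offset
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(text)

print(parse_iso("2019-01-04T16:41:24+02:00"))
print(parse_iso("2010-05-08T23:41:54.000Z"))
```

Both calls return timezone-aware datetime objects, so the results can be compared with each other directly.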
Python 2 doesn't support the %z format specifier, so it's best to explicitly use Zulu time everywhere if possible:
datetime.datetime.strptime("2007-03-04T21:08:12Z", "%Y-%m-%dT%H:%M:%SZ")
ISO 8601 allows many variations in which the optional colons and dashes may or may not be present, basically CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]. If you want to use strptime, you need to strip out those variations first.
The goal is to generate a UTC datetime object.
If you just want a basic case that works for UTC with the Z suffix, like 2016-06-29T19:36:29.3453Z:
datetime.datetime.strptime(timestamp.replace(':', '').replace('-', ''), "%Y%m%dT%H%M%S.%fZ")
If you want to handle timezone offsets like 2016-06-29T19:36:29.3453-0400 or 2008-09-03T20:56:35.450686+05:00 use the following. These will convert all variations into something without variable delimiters like 20080903T205635.450686+0500 making it more consistent/easier to parse.
import re
# This regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
datetime.datetime.strptime(conformed_timestamp, "%Y%m%dT%H%M%S.%f%z" )
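To see what the regex does, here is a small sketch that applies it to the two example timestamps from above and then converts the result to UTC. It assumes Python 3, where strptime supports %z; to_utc is a name made up for this example.

```python
import datetime
import re

# Removes all colons, and all dashes except the one that starts the UTC offset
CONFORM = re.compile(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))")

def to_utc(timestamp):
    conformed = CONFORM.sub("", timestamp)
    parsed = datetime.datetime.strptime(conformed, "%Y%m%dT%H%M%S.%f%z")
    # Normalise to UTC so inputs with different offsets become comparable
    return conformed, parsed.astimezone(datetime.timezone.utc)

for ts in ["2016-06-29T19:36:29.3453-0400", "2008-09-03T20:56:35.450686+05:00"]:
    conformed, utc_dt = to_utc(ts)
    print(conformed, "->", utc_dt.isoformat())
```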
If your system does not support the %z strptime directive (you see something like ValueError: 'z' is a bad directive in format '%Y%m%dT%H%M%S.%f%z') then you need to manually offset the time from Z (UTC). Note %z may not work on your system in Python versions < 3 as it depended on the C library support which varies across system/Python build type (i.e., Jython, Cython, etc.).
import re
import datetime
# This regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
# Split on the offset to remove it. Use a capture group to keep the delimiter
split_timestamp = re.split(r"([+-])",conformed_timestamp)
main_timestamp = split_timestamp[0]
if len(split_timestamp) == 3:
    sign = split_timestamp[1]
    offset = split_timestamp[2]
else:
    sign = None
    offset = None
# Parse the date and time fields, ignoring the offset for now
output_datetime = datetime.datetime.strptime(main_timestamp +"Z", "%Y%m%dT%H%M%S.%fZ" )
if offset:
    # Create timedelta based on offset
    offset_delta = datetime.timedelta(hours=int(sign+offset[:-2]), minutes=int(sign+offset[-2:]))
    # Subtract the offset to get UTC (local time = UTC + offset)
    output_datetime = output_datetime - offset_delta
Arrow looks promising for this:
>>> import arrow
>>> arrow.get('2014-11-13T14:53:18.694072+00:00').datetime
datetime.datetime(2014, 11, 13, 14, 53, 18, 694072, tzinfo=tzoffset(None, 0))
Arrow is a Python library that provides a sensible, intelligent way of creating, manipulating, formatting and converting dates and times. Arrow is simple, lightweight and heavily inspired by moment.js and requests.
You should keep an eye on the timezone information, as you might get into trouble when comparing non-tz-aware datetimes with tz-aware ones.
It's probably best to always make them tz-aware (even if only as UTC), unless you really know why it wouldn't be of any use to do so.
#-----------------------------------------------
import datetime
import pytz
import dateutil.parser
#-----------------------------------------------
utc = pytz.utc
BERLIN = pytz.timezone('Europe/Berlin')
#-----------------------------------------------
def to_iso8601(when=None, tz=BERLIN):
    if not when:
        when = datetime.datetime.now(tz)
    if not when.tzinfo:
        when = tz.localize(when)
    _when = when.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
    return _when[:-8] + _when[-5:] # Truncate microseconds to milliseconds
#-----------------------------------------------
def from_iso8601(when=None, tz=BERLIN):
    _when = dateutil.parser.parse(when)
    if not _when.tzinfo:
        _when = tz.localize(_when)
    return _when
#-----------------------------------------------
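The warning about mixing the two kinds of datetime is easy to reproduce with the standard library alone. A short sketch (variable names are arbitrary): an ordering comparison between a naive and an aware datetime raises TypeError, and attaching UTC to the naive value makes the two comparable.

```python
import datetime

naive = datetime.datetime(2014, 11, 13, 14, 53, 18)
aware = datetime.datetime(2014, 11, 13, 14, 53, 18, tzinfo=datetime.timezone.utc)

try:
    naive < aware
    comparable = True
except TypeError:
    # Raised because one side carries tzinfo and the other does not
    comparable = False
print(comparable)

# Declaring the naive value to be UTC makes the comparison well defined
as_utc = naive.replace(tzinfo=datetime.timezone.utc)
print(as_utc == aware)
```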
I haven't tried it yet, but pyiso8601 promises to support this.
import calendar, datetime
def convert_enddate_to_seconds(self, ts):
    """Takes ISO 8601 format(string) and converts into epoch time."""
    # The last six characters are the offset: sign, hours, ':', minutes
    offset = datetime.timedelta(hours=int(ts[-5:-3]),
                                minutes=int(ts[-2:]))*int(ts[-6:-5]+'1')
    # Subtract the offset to get UTC; timegm then ignores the local timezone
    dt = datetime.datetime.strptime(ts[:-6],'%Y-%m-%dT%H:%M:%S.%f') - offset
    seconds = calendar.timegm(dt.timetuple()) + dt.microsecond/1000000.0
    return seconds
This also includes the milliseconds and time zone.
If the time is '2012-09-30T15:31:50.262-08:00', this will convert into epoch time.
>>> import calendar, datetime
>>> ts = '2012-09-30T15:31:50.262-08:00'
>>> offset = datetime.timedelta(hours=int(ts[-5:-3]), minutes=int(ts[-2:]))*int(ts[-6:-5]+'1')
>>> dt = datetime.datetime.strptime(ts[:-6],'%Y-%m-%dT%H:%M:%S.%f') - offset
>>> seconds = calendar.timegm(dt.timetuple()) + dt.microsecond/1000000.0
>>> seconds
1349047910.262
Both ways:
Epoch to ISO time:
isoTime = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(epochTime))
ISO time to Epoch:
epochTime = calendar.timegm(time.strptime(isoTime, '%Y-%m-%dT%H:%M:%SZ'))  # needs "import calendar"; time.mktime would treat the UTC value as local time
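A round trip makes the two one-liners easy to check. This sketch uses calendar.timegm for the ISO-to-epoch direction, since it interprets the parsed fields as UTC, whereas time.mktime assumes local time; the function names are my own.

```python
import calendar
import time

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def epoch_to_iso(epoch_time):
    # gmtime converts seconds since the epoch to a UTC struct_time
    return time.strftime(ISO_FORMAT, time.gmtime(epoch_time))

def iso_to_epoch(iso_time):
    # timegm is the UTC counterpart of mktime
    return calendar.timegm(time.strptime(iso_time, ISO_FORMAT))

iso = epoch_to_iso(0)
print(iso, iso_to_epoch(iso))
```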
Isodate seems to have the most complete support.
aniso8601 should handle this. It also understands timezones, Python 2 and Python 3, and it has a reasonable coverage of the rest of ISO 8601, should you ever need it.
import aniso8601
aniso8601.parse_datetime('2007-03-04T21:08:12')
Here is a super simple way to do these kind of conversions.
No parsing, or extra libraries required.
It is clean, simple, and fast.
import datetime
import time
################################################
#
# Takes the time (in seconds),
# and returns a string of the time in ISO8601 format.
# Note: Timezone is UTC
#
################################################
def TimeToISO8601(seconds):
    # gmtime is used so that the result really is UTC, as the note above says
    return time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(seconds))
################################################
#
# Takes a string of the time in ISO8601 format,
# and returns the time (in seconds).
# Note: Timezone is UTC
#
################################################
def ISO8601ToTime(strISOTime):
    K1 = 0
    K2 = 9999999999
    K3 = 0
    counter = 0
    while counter < 95:
        K3 = (K1 + K2) / 2
        strK4 = TimeToISO8601(K3)
        if strK4 < strISOTime:
            K1 = K3
        if strK4 > strISOTime:
            K2 = K3
        counter = counter + 1
    return K3
################################################
#
# Takes a string of the time in ISO8601 (UTC) format,
# and returns a python DateTime object.
# Note: returned value is your local time zone.
#
################################################
def ISO8601ToDateTime(strISOTime):
    return time.gmtime(ISO8601ToTime(strISOTime))
#To test:
Test = "2014-09-27T12:05:06.9876"
print ("The test value is: " + Test)
Ans = ISO8601ToTime(Test)
print ("The answer in seconds is: " + str(Ans))
print ("And a Python datetime object is: " + str(ISO8601ToDateTime(Test)))
Currently I am logging stuff and I am using my own formatter with a custom formatTime():
def formatTime(self, _record, _datefmt):
    t = datetime.datetime.now()
    return t.strftime('%Y-%m-%d %H:%M:%S.%f')
My issue is that the microseconds, %f, are six digits. Is there any way to output fewer digits, like the first three digits of the microseconds?
The simplest way would be to use slicing to just chop off the last three digits of the microseconds:
def format_time():
    t = datetime.datetime.now()
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    return s[:-3]
I strongly recommend just chopping. I once wrote some logging code that rounded the timestamps rather than chopping, and I found it actually kind of confusing when the rounding changed the last digit. There was timed code that stopped running at a certain timestamp yet there were log events with that timestamp due to the rounding. Simpler and more predictable to just chop.
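For the logging use case in the question, the chopping approach could be wired into a Formatter subclass roughly like this. It is only a sketch: the class name and logger name are invented, and it uses the record's own creation time (record.created) instead of calling now() again.

```python
import datetime
import logging

class MillisecondFormatter(logging.Formatter):
    def formatTime(self, record, datefmt=None):
        # record.created is the epoch time at which the record was made
        t = datetime.datetime.fromtimestamp(record.created)
        # Chop the last three of the six microsecond digits
        return t.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]

handler = logging.StreamHandler()
handler.setFormatter(MillisecondFormatter('%(asctime)s %(message)s'))
logger = logging.getLogger('millisecond_demo')
logger.addHandler(handler)
logger.warning('something happened')
```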
If you want to actually round the number rather than just chopping, it's a little more work but not horrible:
def format_time():
    t = datetime.datetime.now()
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    head = s[:-7] # everything up to the '.'
    tail = s[-7:] # the '.' and the 6 digits after it
    f = float(tail)
    temp = "{:.03f}".format(f)  # for Python 2.x: temp = "%.3f" % f
    new_tail = temp[1:] # temp[0] is always '0'; get rid of it
    return head + new_tail
Obviously you can simplify the above with fewer variables; I just wanted it to be very easy to follow.
As of Python 3.6 the language has this feature built in:
def format_time():
    t = datetime.datetime.now()
    s = t.isoformat(timespec='milliseconds')
    return s
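Two details worth knowing here: timespec truncates rather than rounds, and isoformat() separates date and time with T unless sep is given. A quick sketch with a fixed datetime so the result is reproducible:

```python
import datetime

t = datetime.datetime(2016, 8, 5, 18, 18, 54, 776900)

# Default separator is 'T'; .776900 is cut to .776, not rounded to .777
with_t = t.isoformat(timespec='milliseconds')
print(with_t)

# sep=' ' gives the same layout as the strftime pattern in the question
with_space = t.isoformat(sep=' ', timespec='milliseconds')
print(with_space)
```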
This method should always return a timestamp that looks exactly like this (with or without the timezone depending on whether the input dt object contains one):
2016-08-05T18:18:54.776+0000
It takes a datetime object as input (which you can produce with datetime.datetime.now()). To get the time zone like in my example output you'll need to import pytz and pass datetime.datetime.now(pytz.utc).
import pytz, datetime
def time_format(dt):
    # %06.3f zero-pads the seconds, so 4.776 is written as 04.776
    return "%s:%06.3f%s" % (
        dt.strftime('%Y-%m-%dT%H:%M'),
        float("%.3f" % (dt.second + dt.microsecond / 1e6)),
        dt.strftime('%z')
    )
time_format(datetime.datetime.now(pytz.utc))
I noticed that some of the other methods above would omit the trailing zero if there was one (e.g. 0.870 became 0.87) and this was causing problems for the parser I was feeding these timestamps into. This method does not have that problem.
An easy solution that should work in all cases:
def format_time():
    t = datetime.datetime.now()
    if t.microsecond % 1000 >= 500:  # check if there will be rounding up
        t = t + datetime.timedelta(milliseconds=1)  # manually round up
    return t.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
Basically you do manual rounding on the date object itself first, then you can safely trim the microseconds.
Edit: As some pointed out in the comments below, the rounding of this solution (and the one above) introduces problems when the microsecond value reaches 999500, as 999.5 is rounded to 1000 (overflow).
Short of reimplementing strftime to support the format we want (the potential overflow caused by the rounding would need to be propagated up to seconds, then minutes, etc.), it is much simpler to just truncate to the first 3 digits as outlined in the accepted answer, or using something like:
'{:03}'.format(int(999999/1000))
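If rounding is really needed, one way to avoid the overflow is to let datetime arithmetic do the carrying: add half a millisecond, then truncate. A hedged sketch; format_time_rounded is a hypothetical name and the dt parameter exists only to make the function testable.

```python
import datetime

def format_time_rounded(dt=None):
    if dt is None:
        dt = datetime.datetime.now()
    # Adding 500 microseconds and truncating rounds half up; timedelta
    # arithmetic carries any overflow into seconds, minutes and so on
    dt = dt + datetime.timedelta(microseconds=500)
    return dt.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]

print(format_time_rounded(datetime.datetime(2021, 5, 14, 16, 11, 59, 999600)))
print(format_time_rounded(datetime.datetime(2021, 5, 14, 16, 11, 21, 916229)))
```

The first call is the problematic case: 59.9996 seconds rolls over cleanly into the next minute instead of producing a four-digit millisecond field.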
-- Original answer preserved below --
In my case, I was trying to format a datestamp with milliseconds formatted as 'ddd'. The solution I ended up using to get milliseconds was to use the microsecond attribute of the datetime object, divide it by 1000.0, pad it with zeros if necessary, and round it with format. It looks like this:
'{:03.0f}'.format(datetime.now().microsecond / 1000.0)
# Produces: '033', '499', etc.
You can subtract the microseconds from the current datetime.
d = datetime.datetime.now()
current_time = d - datetime.timedelta(microseconds=d.microsecond)
This will turn 2021-05-14 16:11:21.916229 into 2021-05-14 16:11:21
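An equivalent and arguably more direct spelling is replace(microsecond=0). A tiny sketch using the value from above:

```python
import datetime

d = datetime.datetime(2021, 5, 14, 16, 11, 21, 916229)

# Both expressions drop the fractional part of the second
by_subtraction = d - datetime.timedelta(microseconds=d.microsecond)
by_replace = d.replace(microsecond=0)

print(by_replace)
print(by_subtraction == by_replace)
```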
This method allows flexible precision and will consume the entire microsecond value if you specify too great a precision.
def formatTime(self, _record, _datefmt, precision=3):
    dt = datetime.datetime.now()
    us = "%06d" % dt.microsecond  # zero-pad so leading zeros are not lost
    f = us[:precision]  # slicing past the end simply returns all six digits
    return "%d-%02d-%02d %02d:%02d:%02d.%s" % (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, f)
This method implements rounding to 3 decimal places:
import datetime
from decimal import Decimal, ROUND_HALF_UP
def formatTime(self, _record, _datefmt, precision='0.001'):
    dt = datetime.datetime.now()
    # Zero-pad the microseconds so that e.g. 1234 becomes .001234, not .1234
    seconds = float("%d.%06d" % (dt.second, dt.microsecond))
    return "%d-%d-%d %d:%d:%s" % (dt.year, dt.month, dt.day, dt.hour, dt.minute,
        float(Decimal(seconds).quantize(Decimal(precision), rounding=ROUND_HALF_UP)))
I avoided using the strftime method purposely because I would prefer not to modify a fully serialized datetime object without revalidating it. This way also shows the date internals in case you want to modify it further.
In the rounding example, note that the precision is string-based for the Decimal module.
Here is my solution using regexp:
import re
# Capture 6 digits after dot in a group.
regexp = re.compile(r'\.(\d{6})')
def to_splunk_iso(dt):
    """Converts the datetime object to Splunk isoformat string."""
    # 6-digits string.
    microseconds = regexp.search(dt.isoformat()).group(1)
    # Zero-pad to three digits so that e.g. 5 ms is written as .005
    return regexp.sub('.%03d' % round(float(microseconds) / 1000), dt.isoformat())
Fixing the proposed solution based on Pablojim Comments:
from datetime import datetime
dt = datetime.now()
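# Round to whole milliseconds, expressed in microseconds.
# Values of 999500 and above round up to a full second, which replace() rejects.
dt_round_microsec = round(dt.microsecond / 1000) * 1000
dt = dt.replace(microsecond=dt_round_microsec)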
If you also want the day of the week (e.g. 'Sunday') in the result, slicing with '[:-3]' will not work, because the microseconds are no longer at the end of the string. In that case you can use:
dt = datetime.datetime.now()
print("{}.{:03d} {}".format(dt.strftime('%Y-%m-%d %I:%M:%S'), dt.microsecond//1000, dt.strftime("%A")))
#Output: '2019-05-05 03:11:22.211 Sunday'
%H - for 24 Hour format
%I - for 12 Hour format
Adding my two cents here, as this method lets you write your microsecond format as you would a float in C style. It takes advantage of the fact that both use %f.
import datetime
import re
def format_datetime(date, format):
    """Format a ``datetime`` object with microsecond precision.
    Pass your microsecond as you would format a c-string float.
    e.g "%.3f"
    Args:
        date (datetime.datetime): Your input ``datetime`` obj.
        format (str): Your strftime format string.
    Returns:
        str: Your formatted datetime string.
    """
    # We need to check if the format contains "%.xf" (x = a number)
    float_format = r"(%\.\d+f)"
    has_float_format = re.search(float_format, format)
    if has_float_format:
        # Express the microseconds as a fraction of a second (keeps leading zeros)
        microseconds = date.microsecond / 1e6
        ms_str = has_float_format.group(1) % microseconds
        format = re.sub(float_format, ms_str[2:], format)
    return date.strftime(format)
print(format_datetime(datetime.datetime.now(), "%H:%M:%S.%.3f"))
# '17:58:54.424'
I'm trying to learn python after spending the last 15 or so years working only in Perl and only occasionally.
I can't understand how to handle the two different kinds of results from the parse method of Calendar.parse() from parsedatetime
Given this script:
#!/usr/bin/python
import parsedatetime.parsedatetime as pdt
import parsedatetime.parsedatetime_consts as pdc
import sys
import os
# create an instance of Constants class so we can override some of the defaults
c = pdc.Constants()
# create an instance of the Calendar class and pass in our Constants
# object instead of letting it create a default
p = pdt.Calendar(c)
while True:
    reply = raw_input('Enter text:')
    if reply == 'stop':
        break
    else:
        result = p.parse(reply)
        print result
        print
And this sample run:
Enter text:tomorrow
(time.struct_time(tm_year=2009, tm_mon=11, tm_mday=28, tm_hour=9, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=332, tm_isdst=-1), 1)
Enter text:11/28
((2009, 11, 28, 14, 42, 55, 4, 331, 0), 1)
I can't figure out how to get the output such that I can consistently use result like so:
print result[0].tm_mon, result[0].tm_mday
That won't work in the case where the input is "11/28" because the output is just a tuple and not a struct_time.
Probably a simple thing.. but not for this newbie. From my perspective the output of Calendar.parse() is unpredictable and hard to use. Any help appreciated. Tia.
I know this is an old question but I ran into this yesterday and the answer here is incomplete (it will fail in the case that parse() returns a datetime).
From the parsedatetime docs:
parse() returns a tuple ( result, type ) where type specifies one of:
0 = not parsed at all
1 = parsed as a date (of type struct_time)
2 = parsed as a time (of type struct_time)
3 = parsed as a datetime (of type datetime.datetime)
Which is a little weird and maybe not the clearest way to do it, but it works and is pretty useful.
Here's a little chunk of code that will convert whatever it returns to a proper python datetime:
import datetime
import parsedatetime.parsedatetime as pdt
def datetimeFromString( s ):
    c = pdt.Calendar()
    result, what = c.parse( s )
    dt = None
    # what was returned (see http://code-bear.com/code/parsedatetime/docs/)
    # 0 = failed to parse
    # 1 = date (with current time, as a struct_time)
    # 2 = time (with current date, as a struct_time)
    # 3 = datetime
    if what in (1,2):
        # result is struct_time
        dt = datetime.datetime( *result[:6] )
    elif what == 3:
        # result is a datetime
        dt = result
    if dt is None:
        # Failed to parse
        raise ValueError("Don't understand date '"+s+"'")
    return dt
Use x = time.struct_time(result[0]) and you'll get a struct_time (so that you can check x.tm_mon and x.tm_mday) no matter whether that result[0] is a struct_time itself, or just a 9-tuple (I've never heard of parsedatetime so I don't know why it's inconsistent in its return type, but with this simple approach you can neutralize that inconsistency).
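The normalisation can be tried out without parsedatetime installed, because datetime.datetime(*result[:6]) works on both a struct_time and a plain 9-tuple. A sketch with stand-in values shaped like the two results shown in the question; to_datetime is a hypothetical helper:

```python
import datetime
import time

def to_datetime(result):
    # Both struct_time and a 9-tuple support slicing; the first six fields
    # are year, month, day, hour, minute, second
    return datetime.datetime(*result[:6])

# Stand-ins for the two kinds of value that parse() returned in the question
as_struct = time.struct_time((2009, 11, 28, 9, 0, 0, 5, 332, -1))
as_tuple = (2009, 11, 28, 14, 42, 55, 4, 331, 0)

print(to_datetime(as_struct))
print(to_datetime(as_tuple).month, to_datetime(as_tuple).day)
```

Either way the caller ends up with a datetime, so attributes such as month and day are available regardless of what parse() returned.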