I have an extremely large dataset with date/time columns in various formats. I have a validation function that detects the possible time string formats; it handles 24-hour as well as 12-hour time, and the separator is always ":". A sample is below. However, after profiling my code, it seems this function can become a bottleneck and is expensive in terms of execution time. My question is whether there is a better way to do this that does not hurt performance.
import datetime
def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None
print(validate_time(time_str="9:21 PM"))
Instead of trying to parse using every format string, you could split by colons to obtain the segments of your string that denote hours, minutes, and everything that remains. Then you can parse the result depending on the number of values the split returns:
def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    try:
        if len(time_vals) == 1:
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if last element ends with am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else:
            return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
To test this, let's time both methods for a bunch of test strings:
import timeit
print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")
old         new         old/new      test_string
0.001628    0.001143    1.424322     12:24
0.001567    0.001012    1.548661     12:23:42
0.000935    0.000979    0.955177     13:53
0.003004    0.000722    4.161657     1:53 PM
0.004523    0.001396    3.241204     12:24:43.220
0.002148    0.000025    84.897370    not a date
0.002262    0.000622    3.638629     54:23:21
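If the column contains many repeated strings, which is an assumption about the data and not something the question states, memoizing the parser may help regardless of which parsing strategy is used. The sketch below wraps the original format loop in functools.lru_cache; the name validate_time_cached and the cache size are illustrative choices.

```python
import datetime
from functools import lru_cache

# Formats tried in order; the same list as in the question.
TIME_FORMATS = ("%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p")

@lru_cache(maxsize=100_000)  # cache size is an arbitrary assumption
def validate_time_cached(time_str: str):
    """Parse time_str once; repeated strings are served from the cache."""
    for time_format in TIME_FORMATS:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

values = ["9:21 PM", "12:24", "9:21 PM", "not a date", "12:24"]
parsed = [validate_time_cached(v) for v in values]
# hits counts the lookups that skipped strptime entirely
print(validate_time_cached.cache_info())
```

Failed parses (None) are cached as well, so repeated invalid strings also skip the format loop. Whether this beats the split-based version depends on how often values repeat, so it is worth timing on real data.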
I am able to parse strings containing date/time with time.strptime
>>> import time
>>> time.strptime('30/03/09 16:31:32', '%d/%m/%y %H:%M:%S')
(2009, 3, 30, 16, 31, 32, 0, 89, -1)
How can I parse a time string that contains milliseconds?
>>> time.strptime('30/03/09 16:31:32.123', '%d/%m/%y %H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 333, in strptime
    data_string[found.end():])
ValueError: unconverted data remains: .123
Python 2.6 added a new strftime/strptime directive, %f. The docs are a bit misleading as they only mention microseconds, but %f actually parses any decimal fraction of seconds with up to 6 digits, meaning it also works for milliseconds or even centiseconds or deciseconds.
time.strptime('30/03/09 16:31:32.123', '%d/%m/%y %H:%M:%S.%f')
However, time.struct_time doesn't actually store milliseconds/microseconds. You're better off using datetime, like this:
>>> from datetime import datetime
>>> a = datetime.strptime('30/03/09 16:31:32.123', '%d/%m/%y %H:%M:%S.%f')
>>> a.microsecond
123000
As you can see, .123 is correctly interpreted as 123 000 microseconds.
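A quick way to check this is to parse the same timestamp with fractions of different lengths. This is a small sketch for a current Python 3; the list name results is only for illustration.

```python
from datetime import datetime

fmt = '%d/%m/%y %H:%M:%S.%f'
results = []
for text in ('30/03/09 16:31:32.1',
             '30/03/09 16:31:32.123',
             '30/03/09 16:31:32.123456'):
    # %f accepts one to six digits and zero-pads on the right
    parsed = datetime.strptime(text, fmt)
    results.append(parsed.microsecond)
    print(text, '->', parsed.microsecond)
```

The three inputs come out as 100000, 123000 and 123456 microseconds respectively.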
I know this is an older question but I'm still using Python 2.4.3 and I needed to find a better way of converting the string of data to a datetime.
If datetime doesn't support %f, this solution works without needing a try/except:
(dt, mSecs) = row[5].strip().split(".")
dt = datetime.datetime(*time.strptime(dt, "%Y-%m-%d %H:%M:%S")[0:6])
mSeconds = datetime.timedelta(microseconds = int(mSecs))
fullDateTime = dt + mSeconds
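This works for the input string "2010-10-06 09:42:52.266000". Note that it relies on the fraction having exactly six digits, because the digits are passed straight to microseconds.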
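When the fraction can be shorter than six digits, the split approach needs to pad it before converting. A minimal sketch, written for a current Python rather than 2.4; the function name parse_with_fraction and its default format are assumptions for illustration.

```python
import datetime

def parse_with_fraction(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse 'date time[.fraction]' without relying on %f."""
    head, _, frac = text.strip().partition(".")
    dt = datetime.datetime.strptime(head, fmt)
    # Right-pad to six digits so '.266' means 266000 microseconds, not 266.
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    return dt + datetime.timedelta(microseconds=micro)

print(parse_with_fraction("2010-10-06 09:42:52.266"))
```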
To give the code that nstehr's answer refers to (from its source):
def timeparse(t, format):
    """Parse a time string that might contain fractions of a second.

    Fractional seconds are supported using a fragile, miserable hack.
    Given a time string like '02:03:04.234234' and a format string of
    '%H:%M:%S', time.strptime() will raise a ValueError with this
    message: 'unconverted data remains: .234234'. If %S is in the
    format string and the ValueError matches as above, a datetime
    object will be created from the part that matches and the
    microseconds in the time string.
    """
    try:
        return datetime.datetime(*time.strptime(t, format)[0:6]).time()
    except ValueError as msg:
        if "%S" in format:
            msg = str(msg)
            mat = re.match(r"unconverted data remains:"
                           r" \.([0-9]{1,6})$", msg)
            if mat is not None:
                # fractional seconds are present - this is the style
                # used by datetime's isoformat() method
                frac = "." + mat.group(1)
                t = t[:-len(frac)]
                t = datetime.datetime(*time.strptime(t, format)[0:6])
                microsecond = int(float(frac)*1e6)
                return t.replace(microsecond=microsecond)
            else:
                mat = re.match(r"unconverted data remains:"
                               r" \,([0-9]{3,3})$", msg)
                if mat is not None:
                    # fractional seconds are present - this is the style
                    # used by the logging module
                    frac = "." + mat.group(1)
                    t = t[:-len(frac)]
                    t = datetime.datetime(*time.strptime(t, format)[0:6])
                    microsecond = int(float(frac)*1e6)
                    return t.replace(microsecond=microsecond)
        raise
DNS's answer above is actually incorrect. The OP is asking about milliseconds, but the answer is for microseconds. Unfortunately, Python's strptime doesn't have a directive for milliseconds, just microseconds (see doc), but you can work around it by appending three zeros to the end of the string and parsing the string as microseconds, something like:
datetime.strptime(time_str + '000', '%d/%m/%y %H:%M:%S.%f')
where time_str is formatted like 30/03/09 16:31:32.123.
Hope this helps.
My first thought was to try passing it '30/03/09 16:31:32.123' (with a period instead of a colon between the seconds and the milliseconds.) But that didn't work. A quick glance at the docs indicates that fractional seconds are ignored in any case...
Ah, version differences. This was reported as a bug and now in 2.6+ you can use "%S.%f" to parse it.
From the Python mailing lists: parsing millisecond thread. There is a function posted there that seems to get the job done, although, as mentioned in the author's comments, it is kind of a hack. It uses regular expressions to handle the exception that gets raised, and then does some calculations.
You could also try doing the regular expressions and calculations up front, before passing the string to strptime.
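Doing the regular expression work up front could look roughly like the sketch below: split off an optional fraction first, then hand only the remainder to strptime. The names _FRACTION and parse_time are made up for this example, and accepting "," as well as "." is an assumption borrowed from the logging module's timestamp style.

```python
import re
from datetime import datetime

# Everything up to an optional fraction introduced by '.' or ','.
_FRACTION = re.compile(r"^(.*?)(?:[.,](\d{1,6}))?$")

def parse_time(text, fmt="%H:%M:%S"):
    """Parse text with fmt, handling an optional trailing fraction of a second."""
    base, frac = _FRACTION.match(text).groups()
    parsed = datetime.strptime(base, fmt)
    if frac:
        # Right-pad so '123' is read as 123000 microseconds.
        parsed = parsed.replace(microsecond=int(frac.ljust(6, "0")))
    return parsed

print(parse_time("02:03:04.234234").time())
print(parse_time("02:03:04,123").time())
```

No exception message needs to be inspected this way; a string that strptime cannot parse still raises ValueError as usual.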
For Python 2 I did this:
print ( time.strftime("%H:%M:%S", time.localtime(time.time())) + "." + str(time.time()).split(".",1)[1])
It prints the time as "%H:%M:%S", then splits str(time.time()) into two substrings around the "." (xxxxxxx.xx). Since the .xx part is the fraction of the second, I append the second substring to my "%H:%M:%S".
Hope that makes sense :)
Example output:
13:31:21.72
Blink 01
13:31:21.81
END OF BLINK 01
13:31:26.3
Blink 01
13:31:26.39
END OF BLINK 01
13:31:34.65
Starting Lane 01
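On Python 3 the same kind of timestamp can be produced from a single clock reading, which also keeps the fraction at a fixed width (the output above shows both ".3" and ".39"). A small sketch; the function name timestamp_ms and its optional now parameter are assumptions for illustration.

```python
from datetime import datetime

def timestamp_ms(now=None):
    """Return HH:MM:SS.mmm from a single clock reading."""
    if now is None:
        now = datetime.now()
    # %f is always six digits, so dropping the last three leaves milliseconds.
    return now.strftime("%H:%M:%S.%f")[:-3]

print(timestamp_ms())
```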
I am trying to write a function in python whereby I can input a start time and end time and it will return the total hours.
Currently I have been able to write a function where I input for example ('07:30:00', '12:00:00') and it returns 4.5
I want to be able to import a list though. For example,
('07:30:00, 08:30:00', '12:00:00, 12:00:00') and have it return 4.5 , 3.5 etc....
How do I alter my code so I can do this?
Thanks
I have been messing around for hours but am very new to python so do not know how to progress from here
from datetime import datetime
def compute_opening_duration(opening_time, closing_time):
    while True:
        try:
            FORMAT = '%H:%M:%S'
            tdelta = datetime.strptime(closing_time, FORMAT) - datetime.strptime(opening_time, FORMAT)
            tdelta_s = tdelta.total_seconds()
            tdelta_m = tdelta_s/60
            tdelta_h = tdelta_m/60
            print(tdelta_h)
            break
        except ValueError:
            print('-1')
            break
Pass the arrays as parameters to the function. Check that the opening-time array has the same length as the closing-time array. Declare a result array, and after the line where you compute tdelta, append to the result array.
def compute_opening_duration(opening_time_arr, closing_time_arr):
    if len(opening_time_arr) != len(closing_time_arr):
        return
    resultTime = []
    for idx, closing_time in enumerate(closing_time_arr):
        try:
            FORMAT = '%H:%M:%S'
            tdelta = (datetime.strptime(closing_time, FORMAT) -
                      datetime.strptime(opening_time_arr[idx], FORMAT))
            tdelta_s = tdelta.total_seconds()
            tdelta_m = tdelta_s/60
            tdelta_h = tdelta_m/60
            # store the duration in hours, as in the original function
            resultTime.append(tdelta_h)
            #print(tdelta_h)
        except ValueError:
            pass
            #print('-1')
    return resultTime
If I got the question correctly:
def compute_opening_duration(time_list):
    # convert to datetime:
    FORMAT = '%H:%M:%S'
    time_list = [datetime.strptime(time, FORMAT) for time in time_list]
    # compute and return deltas between consecutive entries
    return [(close_time-open_time).total_seconds()/3600
            for open_time, close_time in zip(time_list[:-1], time_list[1:])]
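If the intent is one opening time paired with one closing time, zip over two lists expresses that directly. This is a sketch under that reading of the question; the name compute_opening_durations and the choice to record -1 for an unparseable pair (mirroring the '-1' the original function prints) are assumptions.

```python
from datetime import datetime

FORMAT = '%H:%M:%S'

def compute_opening_durations(opening_times, closing_times):
    """Return hours between each opening/closing pair; -1 marks a bad pair."""
    hours = []
    for opened, closed in zip(opening_times, closing_times):
        try:
            delta = datetime.strptime(closed, FORMAT) - datetime.strptime(opened, FORMAT)
            hours.append(delta.total_seconds() / 3600)
        except ValueError:
            hours.append(-1)
    return hours

print(compute_opening_durations(['07:30:00', '08:30:00'], ['12:00:00', '12:00:00']))
# [4.5, 3.5]
```

Keeping a placeholder for bad rows means the result stays aligned with the input lists.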
I have been trying to add a time format to my REST calls in Python, but there always seems to be some type of issue. First of all, here is the time format requirement; it has to be exact, or it won't work, unfortunately.
Use the following ISO-8601 compliant date/time format in request parameters.
yyyy-MM-dd'T'HH:mm:ss.SSSXXX
For example, May 26 2014 at 21:49:46 PM could have a format like one of the following:
- In PDT: 2014-05-26T21:49:46.000-07:00
- In UTC: 2014-05-26T21:49:46.000Z
Code    Description
yyyy    Four digit year
MM      Two-digit month (01=January, etc.)
dd      Two-digit day of month (01 through 31)
T       Separator for date/time
HH      Two digits of hour (00 through 23) (am/pm NOT allowed)
mm      Two digits of minute (00 through 59)
ss      Two digits of second (00 through 59)
SSS     Three digit milliseconds of the second
XXX     ISO 8601 time zone (Z or +hh:mm or -hh:mm)
So, what I have tried before is:
def format_time(self, isnow):
    currentdt = datetime.datetime.utcnow()
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    dt = "%s.%03dZ" % (dt, int(micro) / 1000)
    return dt
Now, this might return roughly the right format, but there is still the problem with timezones.
The end result I am trying to accomplish is that when I execute this, it finds the current time (Amsterdam timezone, GMT/UTC+1) and produces it in this format.
In the else case, it should give the same time with X seconds added.
Would anyone be so kind as to help me out here?
Ok, so you got the microseconds formatted as milliseconds, well done there.
Now your challenge is to handle the timezone offset; it can't only be Z.
And to make things more difficult, strftime's %z format gives + (or -) HHMM, instead of HH:MM.
So you'll need to deal with that. Here's one way to do it:
Python 3:
def format_time(self, isnow):
    # aware local time, so the clock fields and the offset below agree
    currentdt = datetime.datetime.now(datetime.timezone.utc).astimezone()
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    tz_offset = currentdt.strftime('%z')
    tz_offset = "Z" if tz_offset == "" else tz_offset[:3] + ":" + tz_offset[3:]
    dt = "%s.%03d%s" % (dt, int(micro) // 1000, tz_offset)
    return dt
Python 2:
import datetime
import pytz
from dateutil.tz import tzlocal
def format_time(self, isnow):
    # aware local time, so the clock fields and the offset below agree
    currentdt = datetime.datetime.now(pytz.utc).astimezone(tzlocal())
    if not isnow:
        currentdt += datetime.timedelta(0,3)
    (dt, micro) = currentdt.strftime('%Y-%m-%dT%H:%M:%S.%f').split('.')
    tz_offset = currentdt.strftime('%z')
    tz_offset = "Z" if tz_offset == "" else tz_offset[:3] + ":" + tz_offset[3:]
    dt = "%s.%03d%s" % (dt, int(micro) / 1000, tz_offset)
    return dt
Response to comment:
I needed to make a few changes. It's remarkably non-trivial to find the current timezone. The easiest way I could find was from https://stackoverflow.com/a/25887393/1404311 and I've integrated those concepts into the code that is now above.
Basically, instead of utcnow(), you should use now(datetime.timezone.utc). The former gives a naive datetime, while the latter gives a datetime set to UTC, but aware that it is. Then use astimezone() to make it aware of your local timezone, then use strftime('%z') to get the timezone offset from there. THEN go through the string manipulation.
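On Python 3.6 and later, isoformat(timespec='milliseconds') on an aware datetime already produces the +hh:mm offset form, so the string manipulation may not be needed. The sketch below is one possible shape, not the only one: the parameters delay_seconds and now are hypothetical additions for illustration, and astimezone() with no argument uses whatever local zone the machine is configured with (that this is Amsterdam is an assumption about the machine).

```python
from datetime import datetime, timedelta, timezone

def format_time(is_now, delay_seconds=3, now=None):
    """Return yyyy-MM-dd'T'HH:mm:ss.SSSXXX for now, or now plus a delay."""
    # Aware datetime in the system's local zone unless one is supplied.
    current = now if now is not None else datetime.now(timezone.utc).astimezone()
    if not is_now:
        current += timedelta(seconds=delay_seconds)
    text = current.isoformat(timespec='milliseconds')
    # isoformat() writes UTC as +00:00; the quoted spec also accepts Z.
    return (text[:-6] + 'Z') if text.endswith('+00:00') else text

pdt = timezone(timedelta(hours=-7))
print(format_time(True, now=datetime(2014, 5, 26, 21, 49, 46, tzinfo=pdt)))
# 2014-05-26T21:49:46.000-07:00
```

Because the clock fields and the offset come from the same aware object, they cannot disagree.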
Currently I am logging stuff and I am using my own formatter with a custom formatTime():
def formatTime(self, _record, _datefmt):
    t = datetime.datetime.now()
    return t.strftime('%Y-%m-%d %H:%M:%S.%f')
My issue is that the microseconds, %f, are six digits. Is there any way to output fewer digits, like the first three digits of the microseconds?
The simplest way would be to use slicing to just chop off the last three digits of the microseconds:
def format_time():
    t = datetime.datetime.now()
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    return s[:-3]
I strongly recommend just chopping. I once wrote some logging code that rounded the timestamps rather than chopping, and I found it actually kind of confusing when the rounding changed the last digit. There was timed code that stopped running at a certain timestamp yet there were log events with that timestamp due to the rounding. Simpler and more predictable to just chop.
If you want to actually round the number rather than just chopping, it's a little more work but not horrible:
def format_time():
    t = datetime.datetime.now()
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    head = s[:-7] # everything up to the '.'
    tail = s[-7:] # the '.' and the 6 digits after it
    f = float(tail)
    temp = "{:.03f}".format(f)  # for Python 2.x: temp = "%.3f" % f
    new_tail = temp[1:] # temp[0] is always '0'; get rid of it
    return head + new_tail
Obviously you can simplify the above with fewer variables; I just wanted it to be very easy to follow.
As of Python 3.6 the language has this feature built in:
def format_time():
    t = datetime.datetime.now()
    s = t.isoformat(timespec='milliseconds')
    return s
This method should always return a timestamp that looks exactly like this (with or without the timezone depending on whether the input dt object contains one):
2016-08-05T18:18:54.776+0000
It takes a datetime object as input (which you can produce with datetime.datetime.now()). To get the time zone like in my example output you'll need to import pytz and pass datetime.datetime.now(pytz.utc).
import pytz, datetime
def time_format(dt):
    # %06.3f zero-pads the seconds to two digits and keeps three decimals
    return "%s:%06.3f%s" % (
        dt.strftime('%Y-%m-%dT%H:%M'),
        dt.second + dt.microsecond / 1e6,
        dt.strftime('%z')
    )
time_format(datetime.datetime.now(pytz.utc))
I noticed that some of the other methods above would omit the trailing zero if there was one (e.g. 0.870 became 0.87) and this was causing problems for the parser I was feeding these timestamps into. This method does not have that problem.
An easy solution that should work in all cases:
def format_time():
    t = datetime.datetime.now()
    if t.microsecond % 1000 >= 500:  # check if there will be rounding up
        t = t + datetime.timedelta(milliseconds=1)  # manually round up
    return t.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
Basically you do manual rounding on the date object itself first, then you can safely trim the microseconds.
Edit: As some pointed out in the comments below, the rounding of this solution (and the one above) introduces problems when the microsecond value reaches 999500, as 999.5 is rounded to 1000 (overflow).
Short of reimplementing strftime to support the format we want (the potential overflow caused by the rounding would need to be propagated up to seconds, then minutes, etc.), it is much simpler to just truncate to the first 3 digits as outlined in the accepted answer, or using something like:
'{:03}'.format(int(999999/1000))
-- Original answer preserved below --
In my case, I was trying to format a datestamp with milliseconds formatted as 'ddd'. The solution I ended up using to get milliseconds was to use the microsecond attribute of the datetime object, divide it by 1000.0, pad it with zeros if necessary, and round it with format. It looks like this:
'{:03.0f}'.format(datetime.now().microsecond / 1000.0)
# Produces: '033', '499', etc.
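If rounding is really wanted, one way to sidestep the 999500 overflow is to let datetime arithmetic do the carrying: add half a millisecond to the datetime itself, then truncate. This is a sketch of that idea, not part of the answer above; the function name format_time_rounded is illustrative.

```python
from datetime import datetime, timedelta

def format_time_rounded(t):
    """Format t with milliseconds rounded half up."""
    # Adding 500 microseconds and truncating rounds to the nearest
    # millisecond; any overflow carries into seconds, minutes, and so on.
    t = t + timedelta(microseconds=500)
    return t.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]

print(format_time_rounded(datetime(2021, 5, 14, 16, 11, 59, 999500)))
# 2021-05-14 16:12:00.000
```

The caveat about log timestamps appearing later than the event still applies to any rounding scheme.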
You can subtract the microseconds from the current datetime.
d = datetime.datetime.now()
current_time = d - datetime.timedelta(microseconds=d.microsecond)
This will turn 2021-05-14 16:11:21.916229 into 2021-05-14 16:11:21
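The same result can also be had with replace(), which returns a copy with the given field overwritten. A short sketch using a fixed datetime so the output is reproducible:

```python
from datetime import datetime

d = datetime(2021, 5, 14, 16, 11, 21, 916229)
# replace() leaves d untouched and returns a new datetime
current_time = d.replace(microsecond=0)
print(current_time)  # 2021-05-14 16:11:21
```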
This method allows flexible precision and will consume the entire microsecond value if you specify too great a precision.
def formatTime(self, _record, _datefmt, precision=3):
    dt = datetime.datetime.now()
    us = "%06d" % dt.microsecond  # zero-padded so leading zeros are kept
    f = us[:precision] if len(us) > precision else us
    return "%d-%d-%d %d:%d:%d.%s" % (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, f)
This method implements rounding to 3 decimal places:
import datetime
from decimal import Decimal, ROUND_HALF_UP
def formatTime(self, _record, _datefmt, precision='0.001'):
    dt = datetime.datetime.now()
    # %06d keeps the leading zeros of the microsecond value
    seconds = Decimal("%d.%06d" % (dt.second, dt.microsecond))
    return "%d-%d-%d %d:%d:%s" % (dt.year, dt.month, dt.day, dt.hour, dt.minute,
                                   seconds.quantize(Decimal(precision), rounding=ROUND_HALF_UP))
I avoided using the strftime method purposely because I would prefer not to modify a fully serialized datetime object without revalidating it. This way also shows the date internals in case you want to modify it further.
In the rounding example, note that the precision is string-based for the Decimal module.
Here is my solution using regexp:
import re
# Capture 6 digits after dot in a group.
regexp = re.compile(r'\.(\d{6})')
def to_splunk_iso(dt):
    """Converts the datetime object to Splunk isoformat string."""
    # 6-digits string.
    microseconds = regexp.search(dt.isoformat()).group(1)
    # %03d keeps leading zeros in the millisecond value
    return regexp.sub('.%03d' % round(float(microseconds) / 1000), dt.isoformat())
Fixing the proposed solution based on Pablojim's comments:
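from datetime import datetime, timedelta
dt = datetime.now()
dt_round_microsec = round(dt.microsecond/1000) * 1000 # nearest millisecond, expressed in microseconds
dt = dt.replace(microsecond=0) + timedelta(microseconds=dt_round_microsec) # carries into the seconds if needed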
If one wants the day of the week (e.g. 'Sunday') along with the result, slicing with '[:-3]' will not work. In that case you may go with:
dt = datetime.datetime.now()
print("{}.{:03d} {}".format(dt.strftime('%Y-%m-%d %I:%M:%S'), dt.microsecond//1000, dt.strftime("%A")))
#Output: '2019-05-05 03:11:22.211 Sunday'
%H - for 24 Hour format
%I - for 12 Hour format
Thanks,
Adding my two cents here, as this method lets you write your microsecond format as you would a float in C style. It takes advantage of the fact that both use %f.
import datetime
import re
def format_datetime(date, format):
    """Format a ``datetime`` object with microsecond precision.

    Pass your microsecond as you would format a c-string float.
    e.g "%.3f"

    Args:
        date (datetime.datetime): Your input ``datetime`` obj.
        format (str): Your strftime format string.

    Returns:
        str: Your formatted datetime string.
    """
    # We need to check if format contains "%.xf" (x = a number)
    float_format = r"(%\.\d+f)"
    has_float_format = re.search(float_format, format)
    if has_float_format:
        # express the microseconds as a fraction of a second (0 <= x < 1)
        microseconds = date.microsecond / 1e6
        ms_str = has_float_format.group(1) % microseconds
        # ms_str looks like '0.424'; drop the leading '0.'
        format = re.sub(float_format, ms_str[2:], format)
    return date.strftime(format)
print(format_datetime(datetime.datetime.now(), "%H:%M:%S.%.3f"))
# '17:58:54.424'
I wrote a method to convert a date string to a date in python. When I get the date from an external method, the millisecond precision is lost, whereas when I do the casting within the method the precision is preserved. Could someone let me know what is the problem here? Thanks a lot!
from datetime import datetime
from pytz import timezone
def getUTCTimeFromString(date_string):
    #allow time with Z in it
    if date_string:
        if date_string.find('Z'):
            date_string = date_string[:len(date_string)-1]
        return datetime.strptime(date_string,"%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone('UTC'))
    return None
def getStringFromDate(dateObject):
    return dateObject.strftime('%Y-%m-%d %H:%M:%S.%f')
#Method being tested
# Input 2012-02-27T05:32:10.607Z
def getEasternTimeFromString(date_string):
    if date_string:
        if date_string.find('Z'):
            date_string = date_string[:len(date_string)-1]
        local_date = datetime.strptime(date_string,"%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone('UTC'))
        utc_date = getUTCTimeFromString(date_string)
        print 'utc date from external method --> '+getStringFromDate(utc_date)
        print 'utc date calculated locally -->' +getStringFromDate(local_date)
        return utc_date.astimezone(timezone('US/Eastern'))
    return None
This is the problem:
if date_string.find('Z'):
    date_string = date_string[:len(date_string)-1]
The problem is that str.find() returns the index of the match, or -1 if the target is not found. Neither value is zero here, so the if statement is true either way, and your code chops off the last character of the string (whether it was a Z or not).
You're doing this truncation twice in the case of utc_date and once in the case of local_date, thus your different results.
I would suggest:
if date_string.endswith('Z'):
    date_string = date_string[:-1]
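To see the difference, the check can be wrapped in a small helper and applied twice, as happens in the question's code path. A minimal sketch; the helper name strip_z is made up for this example.

```python
def strip_z(date_string):
    """Remove one trailing 'Z', if present."""
    return date_string[:-1] if date_string.endswith('Z') else date_string

once = strip_z('2012-02-27T05:32:10.607Z')
twice = strip_z(once)

# find() reports "not found" as -1, which is truthy
print(once.find('Z'), bool(once.find('Z')))
print(once, twice)
```

With endswith the second call is a no-op, so the millisecond digits survive however many times the string passes through the check.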