Python Regex: Mixed format string duration to seconds - python

I have a bunch of time durations in a list as follows
['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33', ...]
As you can guess, there are N permutations of time formatting being used.
What is the most efficient or simplest way to extract duration in seconds from each element in Python?

This is perhaps still a bit crude, but it handles all the data you've posted so far, and the second totals all come to what I would expect. It uses a combination of re and timedelta.
>>> import re
>>> from datetime import timedelta
First a dictionary of regexes: UPDATED BASED ON YOUR COMMENT
>>> d = {'hours': [re.compile(r'(\d+)(?=h)'),
...                re.compile(r'^(\d+)[:.]\d+[:.]\d+')],
...      'minutes': [re.compile(r'(\d+)(?=m)'),
...                  re.compile(r'^(\d+)[:.]\d+$'),
...                  re.compile(r'^\d+[.:](\d+)[.:]\d+')],
...      'seconds': [re.compile(r'(\d+)(?=s)'),
...                  re.compile(r'^\d+[.:]\d+[.:](\d+)'),
...                  re.compile(r'^\d+[:.](\d+)$')]}
Then a function to try out the regexes (perhaps still a bit crude):
>>> def convert_to_seconds(*time_str):
...     for t in time_str:
...         td = timedelta(0)
...         for key in d:
...             for regex in d[key]:
...                 match = regex.search(t)
...                 if match:
...                     # 'hours', 'minutes' and 'seconds' are also timedelta keyword names
...                     td += timedelta(**{key: int(match.group(1))})
...         # total_seconds() does not wrap around after 24 hours the way td.seconds does
...         print(int(td.total_seconds()))
...
Here are the results, with t bound to the list from the question:
>>> convert_to_seconds(*t)
1383
1414
3183
7380
1330
5013
You could add more regexes as you encounter more data, but only to an extent.
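As a rough alternative to a growing table of regexes, the same sample can be handled by treating each string as either clock-style (fields separated by : or .) or unit-style (a number followed by h, m or s). This is only a sketch: the names duration_to_seconds, CLOCK_RE and UNIT_RE are made up for illustration, and it assumes, as the results above do, that '22.10' means 22 minutes 10 seconds.

```python
import re

# Hypothetical helper names; not part of the original answer.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
# A number, optional spaces, a unit letter, then any trailing letters ("min", "sec").
UNIT_RE = re.compile(r"(\d+)\s*([hms])[a-z]*", re.IGNORECASE)
# Two or three numeric fields separated by ':' or '.'.
CLOCK_RE = re.compile(r"\d+(?:[:.]\d+){1,2}")


def duration_to_seconds(text):
    text = text.strip()
    if CLOCK_RE.fullmatch(text):
        # Assumption: the last field is seconds, the one before it minutes, then hours.
        total = 0
        for field in re.split(r"[:.]", text):
            total = total * 60 + int(field)
        return total
    pairs = UNIT_RE.findall(text)
    if not pairs:
        raise ValueError("unrecognised duration: %r" % text)
    return sum(int(number) * UNIT_SECONDS[unit.lower()] for number, unit in pairs)


samples = ['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33']
results = [duration_to_seconds(s) for s in samples]
print(results)  # [1383, 1414, 3183, 7380, 1330, 5013]
```

One known limitation of this sketch: only the first letter of a unit is inspected, so something like '5ms' would be read as 5 minutes.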

Related

python - combine 2 time formats

def convert(time):
    pos = ["s","m","h","d"]
    time_dict = {"s": 1,"m": 60,"h": 3600,"d": 24*3600 }
    unit = time[-1]
    if unit not in pos:
        return -1
    try:
        timeVal = int(time[:-1])
    except:
        return -2
    return timeVal*time_dict[unit]
Currently, this is my code, and I'm using it to translate strings like 5d or 30m to seconds. That works, but if I try to combine them (like 5d 30m), it gives me the output -2. I don't really see what's wrong here.
Your problem is that you're only checking the last character. You need to parse the string to find each individual number/unit group and then work from those:
import re
def convert(time):
    time_dict = {"s": 1,"m": 60,"h": 3600,"d": 24*3600 }
    # raw string so that \d is passed to the regex engine unchanged
    regex_groups = re.findall(r"(\d+)([smhd])", time)
    return sum(int(x) * time_dict[y] for x,y in regex_groups)
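One thing to be aware of with a findall-based approach is that text it does not recognise is silently skipped, so an input such as 5x yields 0 instead of an error. If that matters, a stricter variant could first check the whole string. The following is only a sketch; convert_strict and SECONDS_PER_UNIT are hypothetical names, and it assumes that tokens are separated by optional whitespace.

```python
import re

# Hypothetical names, used only for this illustration.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 24 * 3600}


def convert_strict(text):
    # The whole string must be a run of <number><unit> tokens, optionally spaced.
    if not re.fullmatch(r"\s*(?:\d+[smhd]\s*)+", text):
        raise ValueError("unrecognised duration: %r" % text)
    pairs = re.findall(r"(\d+)([smhd])", text)
    return sum(int(number) * SECONDS_PER_UNIT[unit] for number, unit in pairs)


print(convert_strict("5d 30m"))  # 433800
```

Raising an exception instead of returning -1 or -2 avoids confusing an error code with a real number of seconds.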
I don't really see what's wrong here.
Let's say you provide 5d 30m as input. [:-1] drops the last character, which leaves 5d 30. You then try to convert that to int, which fails because d is not allowed in an integer representation.
You first need to tokenize the elements, then convert each piece to a value in seconds, then sum them together. A simplified example with h and m only:
def to_seconds(token):
    q = {"h":3600,"m":60}
    return int(token[:-1])*q[token[-1]]
def convert(time):
    return sum(to_seconds(i) for i in time.split())
print(convert("5h 30m"))
output
19800
Disclaimer: this solution assumes that the elements are separated by whitespace.

Comparing two datetime strings

I have two DateTime strings. How would I compare them and tell which comes first?
A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'
This format has milliseconds in it, so it cannot be parsed by time.strptime. I chose to split according to the last colon, parse the left part, and manually convert the right part, add them together.
A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'
import time
def parse_date(s):
    date,millis = s.rsplit(":",1)
    return time.mktime(time.strptime(date,"%Y-%m-%d %H:%M:%S")) + int(millis)/1000.0
print(parse_date(A))
print(parse_date(B))
prints:
1549958505.145
1549958522.022
Now compare the results instead of printing them to get what you want.
If your convention on milliseconds is different (e.g. here 22 could also mean 220), then it's slightly different. Pad with zeroes on the right, then parse:
def parse_date(s):
    date,millis = s.rsplit(":",1)
    millis = millis+"0"*(3-len(millis)) # pad with zeroes
    return time.mktime(time.strptime(date,"%Y-%m-%d %H:%M:%S")) + int(millis)/1000.0
In that case the result is:
1549958505.145
1549958522.22
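The same split-and-add idea can also be written with datetime objects, which compare directly and do not depend on the machine's local timezone the way time.mktime does. This is a sketch only: parse_timestamp is a made-up name, and it assumes the trailing field is a literal millisecond count (so 22 means 22 ms).

```python
from datetime import datetime, timedelta


def parse_timestamp(s):
    # Hypothetical helper: split off the field after the last colon.
    head, millis = s.rsplit(":", 1)
    base = datetime.strptime(head, "%Y-%m-%d %H:%M:%S")
    # Assumption: the last field is a plain number of milliseconds.
    return base + timedelta(milliseconds=int(millis))


A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'
print(parse_timestamp(A) < parse_timestamp(B))  # True
```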
If both the date/time strings are in ISO 8601 format (YYYY-MM-DD hh:mm:ss) you can compare them with a simple string compare, like this:
a = '2019-02-12 15:01:45.145'
b = '2019-02-12 15:02:02.022'
if a < b:
    print('Time a comes before b.')
else:
    print('Time a does not come before b.')
Your strings, however, have an extra ':' after which come... milliseconds? I'm not sure. But if you convert them to a standard hh:mm:ss.xxx... form, then your date strings will be naturally comparable.
If there is no way to change the fact that you're receiving those strings in hh:mm:ss:xx format (I'm assuming that xx is milliseconds, but only you can say for sure), then you can "munge" the string slightly by parsing out the final ":xx" and re-attaching it as ".xxx", like this:
def mungeTimeString(timeString):
    """Converts a time string in "YYYY-MM-DD hh:mm:ss:xx" format
    to a time string in "YYYY-MM-DD hh:mm:ss.xxx" format."""
    head, _, tail = timeString.rpartition(':')
    return '{}.{:03d}'.format(head, int(tail))
Then call it with:
a = '2019-02-12 15:01:45:145'
b = '2019-02-12 15:02:02:22'
a = mungeTimeString(a)
b = mungeTimeString(b)
if a < b:
    print('Time a comes before b.')
else:
    print('Time a does not come before b.')

How to sort different date time formats?

I have the following code:
comments = sorted(comments, key=lambda k: k['time_created'])
How do I sort correctly if some elements have different formats, such as 2017-12-14T17:42:30.345244+0000 and 2017-12-14 00:23:23.468560, so that my code fails when it tries to compare them?
I need to keep seconds accuracy.
Is this a good solution?
comments = sorted(comments, key=lambda k: self.unix_time_millis(k['time_created']), reverse=True)
@staticmethod
def unix_time_millis(dt):
    epoch = datetime.datetime.utcfromtimestamp(0)
    return (dt - epoch).total_seconds() * 1000.0
Python datetime objects are comparable and therefore sortable. I assume that you currently don't use datetime objects but Strings. The following example code is taken from
How to format date string via multiple formats in python
import dateutil.parser
dateutil.parser.parse(date_string)
You would then convert a list of strings to datetime objects via
list_of_dt_objs = [dateutil.parser.parse(s) for s in list_of_strings]
Please note that dateutil is a third-party library, so you have to install it, for instance via pip.
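One caveat when mixing these two formats: the first string carries a UTC offset and the second does not, and Python refuses to compare timezone-aware and naive datetime objects (it raises TypeError). The sketch below uses only the standard library and assumes, purely for illustration, that timestamps without an offset are in UTC; to_utc and FORMATS are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical list of the formats seen in the question.
FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S.%f")


def to_utc(s):
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(s, fmt)
        except ValueError:
            continue  # try the next known format
        if dt.tzinfo is None:
            # Assumption: strings without an offset are already UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        return dt
    raise ValueError("unrecognised timestamp: %r" % s)


comments = [
    {"time_created": "2017-12-14T17:42:30.345244+0000"},
    {"time_created": "2017-12-14 00:23:23.468560"},
]
comments_sorted = sorted(comments, key=lambda c: to_utc(c["time_created"]))
print([c["time_created"] for c in comments_sorted])
# ['2017-12-14 00:23:23.468560', '2017-12-14T17:42:30.345244+0000']
```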
Something like this:
import re
import operator
def convert_to_secs(date_string):
    # Ordering weights for year, month (31 days), day, hour, minute, second.
    # The result is a sort key, not a real epoch value; the fraction and UTC offset are ignored.
    multipliers = [32140800,2678400,86400,3600,60,1]
    date_in_secs = 0
    index = 0
    for v in re.split(r':|-|T|\.|\+| ',date_string):
        if index < len(multipliers):
            date_in_secs = date_in_secs + int(v) * multipliers[index]
            index += 1
        else:
            break
    return date_in_secs
def sort_dates(my_dates_in_string):
    my_dates_dict = {}
    for string_date in my_dates_in_string:
        my_dates_dict[string_date] = convert_to_secs(string_date)
    return sorted(my_dates_dict.items(), key=operator.itemgetter(1))
print(sort_dates(["2017-12-14T17:42:30.345244+0000", "2017-12-14 00:23:23.468560"]))

How to trim spaces within timestamps using 'm/d/yy' format

I have a Python script that generates .csv files from other data sources.
Currently, an error happens when the user manually adds a space to a date by accident. Instead of inputting the date as "1/13/17", a space may be added at the front (" 1/13/17") so that there's a space in front of the month.
I've included the relevant part of my Python script below:
def processDateStamp(sourceStamp):
    matchObj = re.match(r'^(\d+)/(\d+)/(\d+)\s', sourceStamp)
    (month, day, year) = (matchObj.group(1), matchObj.group(2), matchObj.group(3))
    return "%s/%s/%s" % (month, day, year)
How do I trim the space issue in front of month and possibly on other components of the date (the day and year) as well for the future?
Thanks in advance.
Since you're dealing with dates, it might be more appropriate to use datetime.strptime than regex here. There are two advantages of this approach:
It makes it slightly clearer to anyone reading that you're trying to parse dates.
Your code will be more likely to raise exceptions when it tries to parse data that doesn't represent dates, or that represents dates in an incorrect format. This is good because it helps you catch and address issues that might otherwise go unnoticed.
Here's the code:
from datetime import datetime
def processDateStamp(sourceStamp):
    # %m is the month; %M would parse the field as minutes
    date = datetime.strptime(sourceStamp.replace(' ', ''), '%m/%d/%y')
    # {:%y} keeps the two-digit year of the m/d/yy input
    return '{}/{}/{:%y}'.format(date.month, date.day, date)
if __name__ == '__main__':
    print(processDateStamp('1/13/17')) # 1/13/17
    print(processDateStamp(' 1/13/17')) # 1/13/17
    print(processDateStamp(' 1 /13 /17')) # 1/13/17
You can also use the parser from the python-dateutil library. Its main benefit is that it can recognize the datetime format for you, which is sometimes useful:
from dateutil import parser
from datetime import datetime
def processDateTimeStamp(sourceStamp):
    dt = parser.parse(sourceStamp)
    return dt.strftime("%m/%d/%y")
processDateTimeStamp(" 1 /13 / 17") # returns 01/13/17
processDateTimeStamp(" jan / 13 / 17")
processDateTimeStamp(" 1 - 13 - 17")
processDateTimeStamp(" 1 .13 .17")
Once again, a perfect opportunity to use split, strip, and join:
def remove_spaces(date_string):
    date_list = date_string.split('/')
    result = '/'.join(x.strip() for x in date_list)
    return result
Examples
In [7]: remove_spaces('1/13/17')
Out[7]: '1/13/17'
In [8]: remove_spaces(' 1/13/17')
Out[8]: '1/13/17'
In [9]: remove_spaces(' 1/ 13/17')
Out[9]: '1/13/17'
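If the regex approach from the question is kept, the pattern itself can be made tolerant of stray whitespace around each component. The following is a sketch under that assumption; process_date_stamp and DATE_RE are hypothetical names, and anything after the year (for example a time of day) is simply ignored.

```python
import re

# Hypothetical pattern: optional whitespace before the month and around each slash.
DATE_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*/\s*(\d+)")


def process_date_stamp(source_stamp):
    match = DATE_RE.match(source_stamp)
    if match is None:
        # Fail loudly instead of hitting AttributeError on a None match object.
        raise ValueError("not an m/d/yy date: %r" % source_stamp)
    return "/".join(match.groups())


print(process_date_stamp(" 1/13/17"))        # 1/13/17
print(process_date_stamp(" 1 / 13 /17 "))    # 1/13/17
print(process_date_stamp("1/13/17 10:30"))   # 1/13/17
```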

Convert string array to datetime and compare

I just started programming with Python, and have some simple questions (probably). What I would like to do is compare some timestamps to find the closest one that isn't later than now.
Basically, what I am trying to do is get the track currently playing on the radio. They have a feed that shows the next 20 or so tracks, with the time each track starts. I want to get what's playing right now!
Here is an example array of strings:
examples = ['2012-12-10 02:06:45', '2012-12-10 02:02:43', '2012-12-10 01:58:53']
Now what I would like to do is find the time closest to now (but not later) to see what's currently playing.
This is my script so far:
import datetime, itertools, time
currentTimeMachine = datetime.datetime.now()
now = currentTimeMachine.strftime("%Y-%m-%d %H:%M:%S")
examples = ['2012-12-10 02:06:45', '2012-12-10 02:02:43', '2012-12-10 01:58:53']
tmsg = examples.strftime('%d%b%Y')
print [x for x in itertools.takewhile( lambda t: now > datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S"), examples )][-1]
The last bit there I picked up somewhere else, but I can't seem to get it to work.
Any help would be greatly appreciated!
The other answers have fixed your errors, so your algorithm now runs properly.
But the algorithm itself is wrong. You want to get the closest to the present without going over. But what you've written is:
[x for x in itertools.takewhile(pred, examples)][-1]
Think about what this means. First, takewhile will return examples until one of them fails the predicate. Then you're taking the last one that succeeded. So, if your examples looked like this:
[now-3, now-10, now-5, now+3, now-9, now-1, now+9]
First, takewhile will yield now-3, now-10, now-5 and then stop because pred(now+3) returns False. Then, you take the last one, now-5.
This would work if you sorted the examples in ascending order:
[now-10, now-9, now-5, now-3, now-1, now+3, now+9]
Now takewhile will yield everything up to now-1, so the last thing it yields is the one you want.
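The difference between the two orderings is easy to check with plain integers standing in for the timestamps. In this small sketch the numbers are assumed to be offsets in seconds relative to now, and not_after_now is a made-up predicate name.

```python
from itertools import takewhile


def not_after_now(offset):
    # Offsets are relative to "now": negative is past, positive is future.
    return offset <= 0


offsets = [-3, -10, -5, 3, -9, -1, 9]

# Unsorted: iteration stops at the first future value, so -9 and -1 are never seen.
unsorted_taken = list(takewhile(not_after_now, offsets))
print(unsorted_taken)  # [-3, -10, -5]

# Sorted ascending: everything up to -1 is yielded, so the last item is the answer.
sorted_taken = list(takewhile(not_after_now, sorted(offsets)))
print(sorted_taken[-1])  # -1
```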
But the examples in your initial question were in descending order, and in the comment to Anthony Kong's answer, you added some more that aren't in any order at all. So, you obviously can't rely on them being in sorted order. One possible fix is to sort them:
>>> import datetime, itertools, time
>>> currentTimeMachine = datetime.datetime.now()
>>> print [x for x in itertools.takewhile(lambda t: currentTimeMachine > datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S"), sorted(examples))][-1]
2012-12-10 02:06:45
Or, to make things a bit more readable, break up that last line, and get rid of the extraneous list comprehension:
>>> exampleDates = [datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in examples]
>>> def beforeNow(t):
...     return currentTimeMachine > t
...
>>> print list(itertools.takewhile(beforeNow, sorted(exampleDates)))[-1]
However, this is kind of a silly way to do things. What you really want is the maximum value in examples that isn't after the present. So just translate that English sentence into code:
>>> print max(x for x in exampleDates if x <= currentTimeMachine)
Let's put it all together:
>>> examples = ['2012-12-10 02:06:45', '2012-12-10 02:02:43', '2012-12-10 01:58:53']
>>> exampleDates = (datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in examples)
>>> currentTimeMachine = datetime.datetime.now()
>>> print max(t for t in exampleDates if t <= currentTimeMachine)
2012-12-10 02:06:45
I used a generator expression rather than a list for exampleDates because you don't actually need the list for anything; you just need to iterate over it once. If you want to keep it around for inspection or repeated use, change the parens to square brackets.
Also, I changed the < to <=, because you said "isn't later than now" rather than "is earlier than now" (in other words, now should count).
As a side note, because you happen to have ISO-esque timestamps, you actually can sort them as strings:
>>> now = datetime.datetime.now()
>>> currentTimeMachine = datetime.datetime.strftime(now, "%Y-%m-%d %H:%M:%S")
>>> print max(t for t in examples if t <= currentTimeMachine)
2012-12-10 02:06:45
There's no good reason to do things this way, and it will invariably lead you to bugs when you get timestamps in slightly different formats (e.g., '2012-12-10 02:06:45' compares before '2012-12-10Z01:06:45'), but it isn't actually a problem with your original code.
Since you did not post the error message, I'm going by the code in your post. There are a few issues:
1) tmsg = examples.strftime('%d%b%Y') won't work because you are calling strftime on the list.
2) As others have pointed out already, in the takewhile you're comparing a string with a datetime.
This will work:
>>> import datetime, itertools, time
>>> currentTimeMachine = datetime.datetime.now()
>>> print [x for x in itertools.takewhile( lambda t: currentTimeMachine > datetime.datetime.strptime(t, "%Y-%m-%d %H:%M:%S"), examples )][-1]
2012-12-10 01:58:53
Use datetime.datetime.strptime() to convert a string to a datetime object.
>>> import datetime
>>> examples = ['2012-12-10 02:06:45', '2012-12-10 02:02:43', '2012-12-10 01:58:53']
>>> parsed_datetimes = [datetime.datetime.strptime(e, "%Y-%m-%d %H:%M:%S") for e in examples]
>>> parsed_datetimes
[datetime.datetime(2012, 12, 10, 2, 6, 45), datetime.datetime(2012, 12, 10, 2, 2, 43), datetime.datetime(2012, 12, 10, 1, 58, 53)]
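This will then get the datetime with the minimum difference from the current datetime now ("closest to now"). Note that this assumes none of the timestamps are in the future: for a future timestamp, now - i is negative, so min would pick it ahead of every past one.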
>>> now = datetime.datetime.now()
>>> min_time_diff, min_datetime = min((now - i, i) for i in parsed_datetimes)
>>> min_time_diff, min_datetime
(datetime.timedelta(0, 36265, 626000), datetime.datetime(2012, 12, 10, 2, 6, 45))
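To stay correct when the feed also contains upcoming tracks, the future entries can be filtered out before taking the maximum. The sketch below is written for Python 3 (max with default needs 3.4 or later); current_track_start is a hypothetical name, and now is fixed to an assumed value so that the result is reproducible.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def current_track_start(timestamps, now):
    # Hypothetical helper: latest start time that is not after `now`.
    parsed = (datetime.strptime(t, FMT) for t in timestamps)
    # default=None avoids a ValueError when every timestamp is in the future.
    return max((t for t in parsed if t <= now), default=None)


examples = ['2012-12-10 02:06:45', '2012-12-10 02:02:43', '2012-12-10 01:58:53']
now = datetime(2012, 12, 10, 2, 4, 0)  # assumed "current" time for the example
print(current_track_start(examples, now))  # 2012-12-10 02:02:43
```

In real use, datetime.now() would be passed as the second argument.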
