How to sort different date time formats? - python

I have the following code:
comments = sorted(comments, key=lambda k: k['time_created'])
How can I sort correctly when some elements use a different format, such as 2017-12-14T17:42:30.345244+0000 and 2017-12-14 00:23:23.468560? My code fails when it tries to compare them.
I need to keep second-level accuracy.
Is this a good solution?
comments = sorted(comments, key=lambda k: self.unix_time_millis(k['time_created']), reverse=True)
@staticmethod
def unix_time_millis(dt):
    epoch = datetime.datetime.utcfromtimestamp(0)
    return (dt - epoch).total_seconds() * 1000.0

Python datetime objects are comparable and therefore sortable. I assume that you currently don't use datetime objects but Strings. The following example code is taken from
How to format date string via multiple formats in python
import dateutil.parser
dateutil.parser.parse(date_string)
You would then convert a list of strings to datetime objects via
list_of_dt_objs = [dateutil.parser.parse(s) for s in list_of_strings]
Please note that dateutil is an extension lib. So you have to install it, for instance via pip.
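If installing dateutil is not an option, the two formats from the question can also be handled with the standard library alone. The following is a minimal sketch, not a definitive implementation: the names FORMATS and parse_time are made up for illustration, and it assumes that timestamps without an offset are in UTC. That assumption matters because Python raises a TypeError when an offset-aware datetime is compared with a naive one, so every value needs to end up with a timezone before sorting.

```python
from datetime import datetime, timezone

# The two layouts shown in the question; extend this tuple if more appear.
FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S.%f")

def parse_time(value):
    """Try each known format and return a timezone-aware datetime."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            # ASSUMPTION: timestamps without an offset are in UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        return dt
    raise ValueError("unsupported time format: %r" % value)

comments = [
    {"time_created": "2017-12-14T17:42:30.345244+0000"},
    {"time_created": "2017-12-14 00:23:23.468560"},
]
comments_sorted = sorted(comments, key=lambda c: parse_time(c["time_created"]))
```

Sorting on the parsed datetime keeps the full microsecond precision, so no conversion to milliseconds is needed.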

Something like this:
import re
import operator
def convert_to_secs(date_string):
    # Approximate seconds per year, month, day, hour, minute and second.
    # The result is only a sort key, not a real timestamp; the fractional
    # part and the UTC offset are ignored.
    multipliers = [31557600, 2592000, 86400, 3600, 60, 1]
    date_in_secs = 0
    index = 0
    for v in re.split(r'[-:T.+ ]', date_string):
        if index < len(multipliers):
            date_in_secs = date_in_secs + int(v) * multipliers[index]
            index += 1
        else:
            break
    return date_in_secs
def sort_dates(my_dates_in_string):
    my_dates_dict = {}
    for string_date in my_dates_in_string:
        my_dates_dict[string_date] = convert_to_secs(string_date)
    return sorted(my_dates_dict.items(), key=operator.itemgetter(1))
print(sort_dates(["2017-12-14T17:42:30.345244+0000", "2017-12-14 00:23:23.468560"]))

Related

Sorting arrays by date with datetime in python 3.10

I have tried all sorts of code, but none of it works so far. This is as close as I have gotten; every attempt produces a similar error.
Here's the data I want to sort:
{'15/2/2022': 'C:\\Users\\me\\filelocation\\Stuff\\codestuff\\Python\\PDF\\files\\paySlip15Febuary.pdf',
'18/1/2022':' C:\\Users\\me\\filelocation\\Stuff\\codestuff\\Python\\PDF\\files\\paySlip18January.pdf' }
and so on
This is the code I am currently using, although I have tried many other variants:
def key(s):
    fmt = "%d-%m-%Y"
    s = ''.join(s.rsplit(':', 1)) # remove colon from offset
    return datetime.strptime(s, fmt)
sorted(dates.values(), key=key)
No matter what I try, I get this error.
By the way, in case you are wondering, I am creating an array of dates from a list.
raise ValueError("time data %r does not match format %r" %
Sorry guys, here is the correct answer:
from collections import OrderedDict
from datetime import datetime
sorted_dates = OrderedDict(sorted(dates.items(), key=lambda x: datetime.strptime(x[0], '%d/%m/%Y'), reverse=True))
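The original attempt fails for two visible reasons: the format string uses "-" while the keys use "/", and it sorts dates.values() (the file paths) rather than the date keys. A minimal sketch of the same fix with a plain dict is shown here; it assumes Python 3.7 or later, where dicts keep insertion order, and the shortened file names and the helper name parse_key are placeholders for illustration.

```python
from datetime import datetime

# Placeholder data: the file names are shortened stand-ins for the real paths.
dates = {
    "18/1/2022": "paySlip18January.pdf",
    "15/2/2022": "paySlip15Febuary.pdf",
}

def parse_key(item):
    """Sort key: parse the date text of a (date, path) pair."""
    date_text, _path = item
    return datetime.strptime(date_text, "%d/%m/%Y")

# Newest first, matching reverse=True in the answer above.
newest_first = dict(sorted(dates.items(), key=parse_key, reverse=True))
```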

Comparing two datetime strings

I have two DateTime strings. How would I compare them and tell which comes first?
A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'
This format has milliseconds after a final colon, so it cannot be parsed directly by time.strptime. I chose to split on the last colon, parse the left part, convert the right part manually, and add the two together.
A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'
import time
def parse_date(s):
    date, millis = s.rsplit(":", 1)
    return time.mktime(time.strptime(date, "%Y-%m-%d %H:%M:%S")) + int(millis) / 1000.0
print(parse_date(A))
print(parse_date(B))
prints:
1549958505.145
1549958522.022
Now compare the results instead of printing them to get what you want.
If your convention for milliseconds is different (e.g. here 22 could also mean 220), the parsing changes slightly. Pad with zeroes on the right, then parse:
def parse_date(s):
    date, millis = s.rsplit(":", 1)
    millis = millis + "0" * (3 - len(millis)) # pad with zeroes
    return time.mktime(time.strptime(date, "%Y-%m-%d %H:%M:%S")) + int(millis) / 1000.0
In that case the result is:
1549958505.145
1549958522.22
If both the date/time strings are in ISO 8601 format (YYYY-MM-DD hh:mm:ss) you can compare them with a simple string compare, like this:
a = '2019-02-12 15:01:45.145'
b = '2019-02-12 15:02:02.022'
if a < b:
    print('Time a comes before b.')
else:
    print('Time a does not come before b.')
Your strings, however, have an extra ':' after which come... milliseconds? I'm not sure. But if you convert them to a standard hh:mm:ss.xxx... form, then your date strings will be naturally comparable.
If there is no way to change the fact that you're receiving those strings in hh:mm:ss:xx format (I'm assuming that xx is milliseconds, but only you can say for sure), then you can "munge" the string slightly by parsing out the final ":xx" and re-attaching it as ".xxx", like this:
def mungeTimeString(timeString):
    """Converts a time string in "YYYY-MM-DD hh:mm:ss:xx" format
    to a time string in "YYYY-MM-DD hh:mm:ss.xxx" format."""
    head, _, tail = timeString.rpartition(':')
    return '{}.{:03d}'.format(head, int(tail))
Then call it with:
a = '2019-02-12 15:01:45:145'
b = '2019-02-12 15:02:02:22'
a = mungeTimeString(a)
b = mungeTimeString(b)
if a < b:
    print('Time a comes before b.')
else:
    print('Time a does not come before b.')
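Once the trailing ":xx" has been rewritten as ".xxx", the string can also be parsed into a real datetime object, which allows subtraction as well as comparison. This is a small sketch under the same assumption as above, namely that the last field is milliseconds; parse_stamp is a hypothetical helper name. Note that %f pads short values on the right, so the field is zero-padded to three digits first to keep "22" meaning 22 ms rather than 220 ms.

```python
from datetime import datetime

def parse_stamp(s):
    """Parse 'YYYY-MM-DD hh:mm:ss:xx', treating xx as milliseconds (assumption)."""
    head, _, millis = s.rpartition(":")
    # Left-pad to three digits so '22' becomes '.022', i.e. 22 ms.
    normalized = "{}.{:03d}".format(head, int(millis))
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")

A = '2019-02-12 15:01:45:145'
B = '2019-02-12 15:02:02:22'

a_first = parse_stamp(A) < parse_stamp(B)
gap = parse_stamp(B) - parse_stamp(A)  # a timedelta between the two stamps
```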

Return list element based on comparison of two other lists?

I am trying to return the element of a list based on a comparison between two other lists.
List 1 is a list of file names created from using glob.glob(path).
List 2 is identical to the first, but has the filenames parsed into datetimes using datetime.datetime.strptime. It is by definition the same length as the first list.
List 3 list is like the second, in that it is a list of datetimes parsed from a list of filenames. The lists of filenames are related, but not necessarily the same length.
What I am trying to do is return the filename (List 1) that corresponds to the last datetime in List 2 prior to a specific element of List 3. It's confusing, I know. Sorry about that.
I know that using next is a quick way to return values from a list based on comparisons, but I haven't found a way to use it to return a value from a list outside the comparison.
Here's what I have:
# get list of mat files and extract corresponding times
matFiles = []
matFileTimes = []
matFilePattern = re.compile('\\.*(\w*\s*\w*\s*)(\d+.\d+.\d+\s+\d+.\d+.\d+.\d+)(\s*\w*\s*\d*)?\.mat$')
for name in glob.glob(filePath[0] + '\*.mat'):
    event = re.search(matFilePattern, name)
    matFiles.append(event.group(0))
    matFileTimes.append(datetime.datetime.strptime(event.group(2),'%Y-%m-%d %H-%M-%S-%f'))
self.matFiles = sorted(matFiles)
self.matFileTimes = sorted(matFileTimes)
if 'audio' in listdir(filePath[0]):
    audioFiles = []
    audioFileTimes = []
    audioFilePattern = re.compile('\\.*(\w*\s*\w*\s*)(\d+.\d+.\d+\s+\d+.\d+.\d+.\d+)?\.wav$')
    for name in glob.glob(path.join(filePath[0], 'audio') + '\*.wav'):
        audioEvent = re.search(audioFilePattern, name)
        audioFiles.append(audioEvent.group(0))
        audioFileTimes.append(datetime.datetime.strptime(audioEvent.group(2),'%Y-%m-%d %H-%M-%S-%f'))
    self.audioFiles = sorted(audioFiles)
    self.audioFileTimes = sorted(audioFileTimes)
    for each in audioFileTimes:
        self.eventMenu.addItem(datetime.datetime.strftime(each, '%b %d %Y, %I:%M:%S %p'))
else:
    for each in matFileTimes:
        self.eventMenu.addItem(datetime.datetime.strftime(each, '%b %d %Y, %I:%M:%S %p'))
Then later (in a different class function):
if 'audio' in listdir(self.filePath):
fileToLoad = next(dt for dt in reversed(self.matFileTimes) if dt <= self.audioFileTimes[self.eventMenu.currentIndex()])
As it's implemented, next returns the datetime from "matFileTimes" that occurs immediately prior to the datetime indicated by the "eventMenu". What's the quickest Pythonic way to return the element of "matFiles" that corresponds to the "matFileTime" datetime?
Suggestions on better ways to do anything shown are also appreciated - I'm a bit new at this.
The quickest way to get the result is to look up the index of the matching time in the file time list and take the element at the same index in the file name list:
if 'audio' in listdir(self.filePath):
    timeToLoad = next(dt for dt in reversed(self.matFileTimes) if dt <= self.audioFileTimes[self.eventMenu.currentIndex()])
    fileToLoad = self.matFiles[self.matFileTimes.index(timeToLoad)]
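The index lookup relies on the sorted name list and the sorted time list staying aligned, which only holds when the file names sort in the same order as their timestamps. One way to avoid depending on that is to keep each time paired with its file name and search the pairs with bisect. The sketch below is an illustration under that assumption; latest_at_or_before and the sample file names are hypothetical and not part of the original code.

```python
from bisect import bisect_right
from datetime import datetime

def latest_at_or_before(pairs, target):
    """Return the file name whose time is the latest one <= target.

    pairs must be a list of (datetime, filename) tuples sorted by datetime.
    Returns None when every time is later than target.
    """
    times = [t for t, _name in pairs]
    i = bisect_right(times, target)
    if i == 0:
        return None
    return pairs[i - 1][1]

# Hypothetical sample data standing in for the parsed .mat files.
mat_pairs = sorted([
    (datetime(2015, 3, 1, 10, 0, 0), "event b.mat"),
    (datetime(2015, 3, 1, 9, 0, 0), "event a.mat"),
    (datetime(2015, 3, 1, 11, 0, 0), "event c.mat"),
])

file_to_load = latest_at_or_before(mat_pairs, datetime(2015, 3, 1, 10, 30, 0))
```

Because the search is a binary search on sorted times, it also avoids the second linear scan that list.index performs.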

Python Regex: Mixed format string duration to seconds

I have a bunch of time durations in a list as follows
['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33', ...]
As you can guess, there are N permutations of time formatting being used.
What is the most efficient or simplest way to extract duration in seconds from each element in Python?
This is perhaps still a bit crude, but it seems to do the trick for all the data you've posted so far. The second totals all come to what I would expect. A combination of re and timedelta seems to do the trick for this small sample.
>>> import re
>>> from datetime import timedelta
First a dictionary of regexes: UPDATED BASED ON YOUR COMMENT
d = {'hours': [re.compile(r'(\d+)(?=h)'), re.compile(r'^(\d+)[:.]\d+[:.]\d+')],
     'minutes': [re.compile(r'(\d+)(?=m)'), re.compile(r'^(\d+)[:.]\d+$'),
                 re.compile(r'^\d+[.:](\d+)[.:]\d+')],
     'seconds': [re.compile(r'(\d+)(?=s)'), re.compile(r'^\d+[.:]\d+[.:](\d+)'),
                 re.compile(r'^\d+[:.](\d+)$')]}
Then a function to try out the regexes (perhaps still a bit crude):
>>> def convert_to_seconds(*time_str):
...     timedeltas = []
...     for t in time_str:
...         td = timedelta(0)
...         for key in d:
...             for regex in d[key]:
...                 if regex.search(t):
...                     if key == 'hours':
...                         td += timedelta(hours=int(regex.search(t).group(1)))
...                     elif key == 'minutes':
...                         td += timedelta(seconds=int(regex.search(t).group(1)) * 60)
...                     elif key == 'seconds':
...                         td += timedelta(seconds=int(regex.search(t).group(1)))
...         print(td.seconds)
Here are the results:
>>> convert_to_seconds(*t)
1383
1414
3183
7380
1330
5013
You could add more regexes as you encounter more data, but only to an extent.
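For comparison, here is a more compact sketch that returns the value instead of printing it. It is a different technique from the regex dictionary above, not a drop-in replacement: it first looks for numbers followed by an h, m or s unit, and otherwise treats the string as colon- or dot-separated fields read from most to least significant. The name duration_to_seconds is made up for illustration, and the sketch assumes that a two-field value such as '23:34' or '22.10' means minutes and seconds, as the results above suggest.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}

def duration_to_seconds(text):
    """Convert strings such as '23m3s', '2h 3m' or '1:23:33' to seconds."""
    # Unit-suffixed form: a number followed by h, m or s ('min', 'sec' match too).
    units = re.findall(r"(\d+)\s*([hms])", text.lower())
    if units:
        return sum(int(number) * UNIT_SECONDS[unit] for number, unit in units)
    # Separator form: ASSUMPTION that the last field is always seconds.
    seconds = 0
    for part in re.split(r"[:.]", text.strip()):
        seconds = seconds * 60 + int(part)
    return seconds

samples = ['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33']
totals = [duration_to_seconds(s) for s in samples]
```

Inputs that fit neither shape raise a ValueError from int(), which makes unexpected formats visible instead of silently producing zero.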

Python: Joining and writing (XML.etrees) trees stored in a list

I'm looping over some XML files and producing trees that I would like to store in a defaultdict(list). On each iteration, the next child found is stored under a separate key of the dictionary.
d = defaultdict(list)
counter = 0
for child in root.findall(something):
    tree = ET.ElementTree(something)
    d[int(x)].append(tree)
    counter += 1
So then repeating this for several files would result in nicely indexed results; a set of trees that were in position 1 across different parsed files and so on. The question is, how do I then join all of d, and write the trees (as a cumulative tree) to a file?
I can loop through the dict to get each tree:
for x in d:
    for y in d[x]:
        print (y)
This gives a complete list of trees that were in my dict. Now, how do I produce one massive tree from this?
Sample input file 1
Sample input file 2
Required results from 1&2
Given the apparent difficulty in doing this, I'm happy to accept more general answers that show how I can otherwise get the result I am looking for from two or more files.
Use Spyne:
from spyne.model.primitive import *
from spyne.model.complex import *
class GpsInfo(ComplexModel):
    UTC = DateTime
    Latitude = Double
    Longitude = Double
    DopplerTime = Double
    Quality = Unicode
    HDOP = Unicode
    Altitude = Double
    Speed = Double
    Heading = Double
    Estimated = Boolean
class Header(ComplexModel):
    Name = Unicode
    Time = DateTime
    SeqNo = Integer
class CTrailData(ComplexModel):
    index = UnsignedInteger
    gpsInfo = GpsInfo
    Header = Header
class CTrail(ComplexModel):
    LastError = AnyXml
    MaxTrial = Integer
    Trail = Array(CTrailData)
from lxml import etree
from spyne.util.xml import *
file_1 = get_xml_as_object(etree.fromstring(open('file1').read()), CTrail)
file_2 = get_xml_as_object(etree.fromstring(open('file2').read()), CTrail)
file_1.Trail.extend(file_2.Trail)
file_1.Trail.sort(key=lambda x: x.index)
elt = get_object_as_xml(file_1, no_namespace=True)
print(etree.tostring(elt, pretty_print=True).decode())
While doing this, Spyne also converts the data fields from string to their native Python formats as well, so it'll be much easier for you to work with the data from this xml document.
Also, if you don't mind using the latest version from git, you can do e.g.:
class GpsInfo(ComplexModel):
    # (...)
    doppler_time = Double(sub_name="DopplerTime")
    # (...)
so that you can get data from the CamelCased tags without having to violate PEP8.
Use lxml.objectify:
from lxml import etree, objectify
obj_1 = objectify.fromstring(open('file1').read())
obj_2 = objectify.fromstring(open('file2').read())
obj_1.Trail.CTrailData.extend(obj_2.Trail.CTrailData)
# .sort() won't work as objectify's lists are not regular python lists.
obj_1.Trail.CTrailData = sorted(obj_1.Trail.CTrailData, key=lambda x: x.index)
print(etree.tostring(obj_1, pretty_print=True).decode())
It doesn't do the additional conversion work that the Spyne variant does, but for your use case, that might be enough.
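If neither Spyne nor lxml is available, the general idea of joining trees can be sketched with the standard library's xml.etree.ElementTree: create a new root, append the children of every parsed tree, then sort the children in place. Since the sample input files are not shown here, the element names, the index attribute and the function name merge_trees below are purely hypothetical.

```python
import xml.etree.ElementTree as ET

def merge_trees(trees, root_tag="merged"):
    """Collect the top-level children of several trees under one new root."""
    merged_root = ET.Element(root_tag)
    for tree in trees:
        merged_root.extend(list(tree.getroot()))
    return ET.ElementTree(merged_root)

# Hypothetical stand-ins for two parsed input files.
tree_1 = ET.ElementTree(ET.fromstring("<r><item index='2'/><item index='3'/></r>"))
tree_2 = ET.ElementTree(ET.fromstring("<r><item index='1'/></r>"))

merged = merge_trees([tree_1, tree_2])
root = merged.getroot()
# Reorder the children in place by their (hypothetical) index attribute.
root[:] = sorted(root, key=lambda e: int(e.get("index")))
xml_text = ET.tostring(root, encoding="unicode")
```

The cumulative tree can then be saved with merged.write(), passing the desired output path.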
