Sorting dictionary values by datetime in Python

How do I sort the values of a dictionary shaped like this by datetime:
{'user_0': [['item_805696', '2021-02-11 13:03:42'],
            ['item_386903', '2021-02-11 13:03:52'],
            ['item_3832', '2021-02-11 13:04:07'],
            ['item_849824', '2021-02-11 13:05:04']],
 'user_1': [['item_97057', '2021-02-11 13:03:42'],
            ['item_644971', '2021-02-11 13:09:32'],
            ['item_947129', '2021-02-11 13:15:27'],
            ['item_58840', '2021-02-11 13:16:11'],
            ['item_640213', '2021-02-11 13:17:40'],
 ...
I'm trying to sort each user's list by the datetime stored in the second element of each entry.

You can pass a key function to tell list.sort or sorted how to sort.
A key to sort by the first value is lambda x: x[0]; a key to sort by the second value is lambda x: x[1].
d = {'user_0': [['item_805696', '2021-02-11 13:03:42'],
                ['item_849824', '2021-02-11 13:05:04'],
                ['item_386903', '2021-02-11 13:03:52'],
                ['item_3832', '2021-02-11 13:04:07']],
     'user_1': [['item_58840', '2021-02-11 13:16:11'],
                ['item_947129', '2021-02-11 13:15:27'],
                ['item_97057', '2021-02-11 13:03:42'],
                ['item_640213', '2021-02-11 13:17:40'],
                ['item_644971', '2021-02-11 13:09:32']]}
for v in d.values():
    v.sort(key=lambda x: x[1])  # sort each list in place by its timestamp string
print(d)
# {'user_0': [['item_805696', '2021-02-11 13:03:42'],
# ['item_386903', '2021-02-11 13:03:52'],
# ['item_3832', '2021-02-11 13:04:07'],
# ['item_849824', '2021-02-11 13:05:04']],
# 'user_1': [['item_97057', '2021-02-11 13:03:42'],
# ['item_644971', '2021-02-11 13:09:32'],
# ['item_947129', '2021-02-11 13:15:27'],
# ['item_58840', '2021-02-11 13:16:11'],
# ['item_640213', '2021-02-11 13:17:40']]}
Note that this only works because the timestamps are presented in the very practical format 'yyyy-mm-dd HH:MM:SS', and strings are sorted lexicographically, i.e., from left to right. If the timestamps had been in a less friendly format, such as 'dd-mm-yyyy HH:MM:SS', then you'd need to use the key to parse the timestamps.
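For example, here is a minimal sketch with hypothetical data in the 'dd-mm-yyyy HH:MM:SS' format. Sorting the raw strings would put '02-03-2021' first, because '0' < '1'; parsing inside the key with datetime.strptime sorts chronologically instead:

```python
from datetime import datetime

# Hypothetical data in the less friendly 'dd-mm-yyyy HH:MM:SS' format.
d = {'user_0': [['item_a', '11-02-2021 13:05:04'],
                ['item_b', '02-03-2021 09:00:00'],
                ['item_c', '11-02-2021 13:03:42']]}

FMT = '%d-%m-%Y %H:%M:%S'

for v in d.values():
    # Parse the timestamp inside the key so comparisons are chronological,
    # not character by character.
    v.sort(key=lambda x: datetime.strptime(x[1], FMT))

print([x[0] for x in d['user_0']])  # ['item_c', 'item_a', 'item_b']
```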

You can follow this answer to convert the string representing the date to a number of seconds since the Unix epoch (1st January 1970): How to convert python timestamp string to epoch?
Then you can use the key argument of the built-in sorted function to sort the dictionary's lists however you want.
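A minimal sketch of that approach using only the standard library: time.strptime parses the string and calendar.timegm turns the result into seconds since the epoch. It assumes the timestamps are UTC (the data carry no time zone); for ordering purposes any fixed offset gives the same result. The helper name to_epoch is hypothetical.

```python
import calendar
import time

d = {'user_1': [['item_58840', '2021-02-11 13:16:11'],
                ['item_97057', '2021-02-11 13:03:42'],
                ['item_644971', '2021-02-11 13:09:32']]}

FMT = '%Y-%m-%d %H:%M:%S'

def to_epoch(ts):
    # Assumption: treat the timestamp as UTC and return whole seconds
    # since 1970-01-01 00:00:00.
    return calendar.timegm(time.strptime(ts, FMT))

sorted_dict = {k: sorted(v, key=lambda x: to_epoch(x[1])) for k, v in d.items()}
print([x[0] for x in sorted_dict['user_1']])  # ['item_97057', 'item_644971', 'item_58840']
```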

You can also use the datetime module from Python's standard library; here is the code:
from datetime import datetime
d = {'user_0': [['item_805696', '2021-02-11 13:03:42'],
                ['item_386903', '2021-02-11 13:03:52'],
                ['item_3832', '2021-02-11 13:04:07'],
                ['item_849824', '2021-02-11 13:05:04']],
     'user_1': [['item_97057', '2021-02-11 13:03:42'],
                ['item_644971', '2021-02-11 13:09:32'],
                ['item_947129', '2021-02-11 13:15:27'],
                ['item_58840', '2021-02-11 13:16:11'],
                ['item_640213', '2021-02-11 13:17:40']]}

def sort_by_datetime(val):
    # Parse the second element of each [item, timestamp] pair
    return datetime.strptime(val[1], '%Y-%m-%d %H:%M:%S')

sorted_dict = {k: sorted(v, key=sort_by_datetime) for k, v in d.items()}

Related

Iteration over two lists using index

I am trying to create a list of dates and times at specific intervals. The dates and times are present in a time-series CSV, and I want to write code that extracts data at specific time intervals. I made two lists, one for days and one for hours, and I am creating a new variable that stores the date and time of interest. I have tried the following code, but I get an error:
day = ['01', '02', '03', '04', "05", '06', '07', '08', '09', '10', '11', '12','13','14','15','16','17','18'
'19','20','21','22','23','24','25','26','27','28','29','30','31']
hour = ['0', '3', '6', '9', '12', '15','18','21']
year, month, day, hour = year, month, day, hour # 2016-01-01 #01:00 am
day_time = []
for i in day.index:
    for j in hour.index:
        day_time = int("".join(day[i], hour[j], "00",))
        print(day_time)
TypeError Traceback (most recent call last)
<ipython-input-72-15de17abf279> in <module>
6 year, month, day, hour = year, month, day, hour # 2016-01-01 #01:00 am
7 day_time = []
----> 8 for i in day.index:
9 for j in hour.index:
10 day_time = int("".join(day[i], hour[j], "00",))
TypeError: 'builtin_function_or_method' object is not iterable
can someone suggest a solution?
index is a method of list, not an attribute you can iterate over, so for i in day.index raises the TypeError above; please refer to Data structures.
Also, str.join takes a single iterable rather than several separate arguments; see the str.join documentation.
Also, as @Lecdi pointed out, you should use append to add to a list instead of reassigning the variable with =; see the list.append documentation.
To do what you want:
# note the comma after '18': without it Python silently joins '18' '19' into '1819'
day = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18',
       '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
hour = ['0', '3', '6', '9', '12', '15', '18', '21']
day_time = []
for day_i in day:
    for hour_i in hour:
        day_time.append(int("".join([day_i, hour_i, "00"])))
print(day_time)
I think enumerate() would work better for you if you also need the indices:
day_time = []
for indexDay, valueDay in enumerate(day):
    for indexHour, valueHour in enumerate(hour):
        day_time.append(int("".join([valueDay, valueHour, "00"])))
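If you don't need the indices at all, itertools.product (a standard-library alternative not used in the answers above) yields every (day, hour) pair in the same order as the nested loops. The output format here is an assumption: the sketch zero-pads the hour so every entry has the same DDHHMM width, and keeps the values as strings, because int() would drop the leading zero of days '01' to '09'. Building day with range() also sidesteps the missing-comma problem.

```python
from itertools import product

day = [f'{n:02d}' for n in range(1, 32)]  # '01' .. '31'
hour = ['0', '3', '6', '9', '12', '15', '18', '21']

# product(day, hour) yields (day, hour) pairs in nested-loop order;
# zfill(2) pads single-digit hours, e.g. '3' -> '03'.
day_time = [f'{d}{h.zfill(2)}00' for d, h in product(day, hour)]
print(day_time[:3])  # ['010000', '010300', '010600']
```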

Looping through a list of tuples and removing them

I'm having some difficulty understanding why my loop is not deleting invalid dates from a list of date tuples in the format dd/mm/yyyy. Here's what I have so far:
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
print(dates)
for date in dates:
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    if day > 31:
        dates.remove(date)
    if month > 12:
        dates.remove(date)
print(dates)
And here's the result:
[('12', '10', '1987'), ('13', '09', '2010'), ('34', '02', '2002'), ('02', '15', '2005'), ('37', '10', '2016'), ('39', '11', '2001')]
[('12', '10', '1987'), ('13', '09', '2010'), ('02', '15', '2005'), ('39', '11', '2001')]
I'm a total beginner and any help would be much appreciated.
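Never modify the (length of the) list you are looping over: removing an item shifts the following items one position to the left, so the loop skips the item right after each one you remove. That is why ('02', '15', '2005') and ('39', '11', '2001') survive in your output. Instead, use, for example, a temporary list: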
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
print(dates)
out = []
for date in dates:
    day = int(date[0])
    month = int(date[1])
    year = int(date[2])
    if day > 31 or month > 12:
        continue
    out.append(date)
dates = out
print(dates)
The continue statement skips the rest of the loop body and moves on to the next item, so the unwanted dates are never appended.
Better alternative concerning dates
Commenting on the "date checking" functionality of the program: it can be really hard to decide by your own rules which dates are acceptable and which are not. Consider, for example, 29 February, which is only valid in leap years.
What you could do instead is to use the datetime library to try to parse the strings to datetime objects, and if the parsing fails, you know the date is illegal.
import datetime as dt
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
def filter_bad_dates(dates):
    out = []
    for date in dates:
        try:
            # strptime raises ValueError for impossible dates
            dt.datetime.strptime('-'.join(date), '%d-%m-%Y')
        except ValueError:
            continue
        out.append(date)
    return out
dates = filter_bad_dates(dates)
print(dates)
This try/except style is often called EAFP ("easier to ask for forgiveness than permission") and is closely related to "duck typing":
If it looks like a date and gets parsed like a proper date, then it is probably a proper date.
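To illustrate the leap-year point, here is a small sketch of the same try/except idea using the datetime.date constructor instead of strptime (a swapped-in variant); is_valid_date is a hypothetical helper name. 1900 is not a leap year because it is divisible by 100 but not by 400, while 2000 is:

```python
import datetime as dt

def is_valid_date(day, month, year):
    # datetime.date raises ValueError for impossible dates,
    # including 29 February in non-leap years.
    try:
        dt.date(int(year), int(month), int(day))
    except ValueError:
        return False
    return True

dates = [('29', '02', '2019'), ('29', '02', '2020'), ('29', '02', '1900'), ('29', '02', '2000')]
print([d for d in dates if is_valid_date(*d)])  # [('29', '02', '2020'), ('29', '02', '2000')]
```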
You can easily accomplish that with this list comprehension:
dates = [('12','10','1987'),('13','09','2010'), ('34','02','2002'), ('02','15','2005'),('37','10','2016'),('39','11','2001')]
dates = [date for date in dates if int(date[1]) <= 12 and int(date[0]) <= 31]
print(dates)
Output:
[('12', '10', '1987'), ('13', '09', '2010')]
I like @AnnZen's comprehension approach (+1), though my tendency would be to go more symbolic at the cost of some time and space:
dates = [
    ('12', '10', '1987'),
    ('13', '09', '2010'),
    ('34', '02', '2002'),
    ('02', '15', '2005'),
    ('37', '10', '2016'),
    ('39', '11', '2001'),
]
# string comparison works here only because every day and month is two-digit, zero-padded
dates = [date for (day, month, _), date in zip(dates, dates) if day <= '31' and month <= '12']
print(dates)
OUTPUT
> python3 test.py
[('12', '10', '1987'), ('13', '09', '2010')]
>
As for @np8's "Never modify the list you are looping over.", that's excellent advice. Though, again, I might spend some space making a copy up front to keep my code simpler:
for date in list(dates):  # iterate over a copy
    day, month, _ = date
    if int(day) > 31 or int(month) > 12:
        dates.remove(date)
Though in the end, @np8's filtering through datetime seems the most reliable solution. (+1)

In Python: looking for simple code to output a string from a datetime and a float from a list

I would like to loop through each average (index 0) and each hour (index 1), in that order, in the first five lists of:
a = [[38.59, '15'], [23.81, '02'], [21.52, '20'], [16.8, '16'], [16.01, '21'], [14.74, '13'], [13.44, '10'], [13.24, '18'], [13.23, '14'], [11.46, '17']]
I would like to use the str.format() method to print the hour and average in the following format:
output str = "15:00: 38.59 average comments per post"
To format the hours, I can use the datetime.strptime() constructor to return a datetime object and then use the strftime() method to specify the format of the time.
To format the average, I can use {:.2f} to indicate that just two decimal places should be used.
How can I accomplish this in 2-3 lines of code?
from datetime import datetime as dt
a = [[38.59, '15'], [23.81, '02'], [21.52, '20'], [16.8, '16'], [16.01, '21'], [14.74, '13'], [13.44, '10'], [13.24, '18'], [13.23, '14'], [11.46, '17']]
for elem in a[:5]:
    dtObj = dt.strptime(elem[1], '%H')      # '15' -> 1900-01-01 15:00:00
    timeString = dtObj.strftime('%H:%M')    # -> '15:00'
    roundedAverageString = "{0:.2f}".format(elem[0])
    print("{}: {} average comments per post".format(timeString, roundedAverageString))
output example:
15:00: 38.59 average comments per post
Using datetime is overkill given the data:
data = [[38.59, '15'], [23.81, '02'], [21.52, '20'], [16.8, '16'], [16.01, '21'], [14.74, '13'], [13.44, '10'], [13.24, '18'], [13.23, '14'], [11.46, '17']]
for ave, hr in data[:5]:
    print(f'{hr}:00: {ave:5.2f} average comments per post')
Output:
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
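Since the question asks for str.format() specifically, here is a sketch of the same idea without datetime, using str.format() and a list comprehension (template and lines are illustrative names):

```python
data = [[38.59, '15'], [23.81, '02'], [21.52, '20'], [16.8, '16'], [16.01, '21'],
        [14.74, '13'], [13.44, '10'], [13.24, '18'], [13.23, '14'], [11.46, '17']]

# {:.2f} rounds the average to two decimal places
template = "{}:00: {:.2f} average comments per post"
lines = [template.format(hr, ave) for ave, hr in data[:5]]
print("\n".join(lines))
```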

How to compare multiple dates in a list in Python?

I am wondering how I can compare dates in a list. I would like to extract the "earliest" date.
(I used a for loop because I had to replace some characters with '-'.)
comment_list = comment_container.findAll("div", {"class" : "comment-date"})
D = []
for commentDate in comment_list:
    year, month, day = map(int, commentDate.split('-'))
    date_object = datetime(year, month, day)
    date_object = datetime.strptime(commentDate, '%Y-%m-%d').strftime('%Y-%m-%d')
    D.append(date_object)
print(D)
Output:
['2018-06-26', '2018-04-01', '2018-07-19', '2018-04-23', '2018-08-25', '2018-06-08', '2018-06-14', '2018-07-08', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15']
I want to extract the earliest date:
Eg.
'2018-04-01'
Just use the min function:
A = ['2018-06-26', '2018-04-01', '2018-07-19', '2018-04-23', '2018-08-25', '2018-06-08', '2018-06-14', '2018-07-08', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15']
print(min(A))
produces
2018-04-01
from datetime import datetime
comment_list = comment_container.findAll("div", {"class" : "comment-date"})
D = []
for commentDate in comment_list:
    year, month, day = map(int, commentDate.split('-'))
    date_object = datetime(year, month, day)
    D.append(date_object)
print(min(D))
You should keep the dates as datetime objects and then use the min() builtin function to determine the earliest date
from datetime import datetime
D = ['2018-06-26', '2018-04-01', '2018-07-19', '2018-04-23', '2018-08-25', '2018-06-08',
'2018-06-14', '2018-07-08', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15', '2019-03-15']
D.sort()
print(D[0])
Or this, if you don't want to change D:
T = D[:]
T.sort()
print(T[0])
As suggested by Siong, you can use min(D). You can achieve the same like this:
from datetime import datetime
comment_list = comment_container.findAll("div", {"class" : "comment-date"})
D = [datetime.strptime(commentDate, '%Y-%m-%d') for commentDate in comment_list]
print(min(D))
Working with datetime.datetime objects is usually preferable since the comparisons you make are not based on the formatting of the string. You can always convert to string later on:
min_date_str = min(D).strftime('%Y-%m-%d')
If you are sure that all dates are correctly zero-padded (i.e. 01 for January, not 1, and so on), then a simple min or max is enough. However, tuples of ints can also be compared, which is useful if you encounter a mix of padded and unpadded dates. Consider, for example:
d = ['2018-7-1','2018-08-01']
print(min(d)) #prints 2018-08-01 i.e. later date
print(min(d,key=lambda x:tuple(int(i) for i in x.split('-')))) #prints 2018-7-1
This solution assumes the data are well formed, i.e. every element produced by .split('-') can be converted to an int.
Alternatively, with the third-party dateutil package:
from dateutil.parser import parse
d = ['2018-7-1','2018-08-01']
date_mapping = {parse(x): x for x in d}  # parsed datetime -> original string
earliest_date = date_mapping[min(date_mapping)]
print(earliest_date)  # prints 2018-7-1
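If you'd rather not depend on dateutil, a standard-library sketch is to pass a parsing function as the key to min. This relies on strptime accepting one- or two-digit values for %m and %d, so both padded and unpadded dates parse:

```python
from datetime import datetime

d = ['2018-7-1', '2018-08-01']

# The key parses each string; min returns the original string, not the datetime.
earliest = min(d, key=lambda s: datetime.strptime(s, '%Y-%m-%d'))
print(earliest)  # 2018-7-1
```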

Python: parse a Java Calendar to an ISO date

I have data like this:
startDateTime: {'timeZoneID': 'America/New_York', 'date': {'year': '2014', 'day': '29', 'month': '1'}, 'second': '0', 'hour': '12', 'minute': '0'}
This is the representation of just one attribute; I have five other attributes like it (LastModified, created, etc.).
I want to convert each of them to the ISO date format yyyy-mm-dd hh:mi:ss. Is this the right way to do it?
def parse_date(datecol):
    x = datecol
    y = str(x.get('date').get('year'))+'-'+str(x.get('date').get('month')).zfill(2)+'-'+str(x.get('date').get('day')).zfill(2)+' '+str(x.get('hour')).zfill(2)+':'+str(x.get('minute')).zfill(2)+':'+str(x.get('second')).zfill(2)
    print(y)
    return
That works, but I'd say it's cleaner to use the string formatting operator here:
def parse_date(c):
    d = c["date"]
    # %d needs numbers, so convert the string fields with int, not str
    print("%04d-%02d-%02d %02d:%02d:%02d" % tuple(map(int, (d["year"], d["month"], d["day"], c["hour"], c["minute"], c["second"]))))
Alternatively, you can use the time module to convert your fields into a Python time value, and then format that using strftime. Remember the time zone, though.
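A sketch of the same idea using the datetime module instead of time (a swapped-in choice): building a datetime from the fields also validates them, since impossible values raise ValueError. to_iso is a hypothetical name, and the sketch ignores timeZoneID; on Python 3.9+ you could attach the zone by passing tzinfo=zoneinfo.ZoneInfo(c['timeZoneID']) to the constructor, provided time-zone data is available on the system.

```python
from datetime import datetime

def to_iso(c):
    # c follows the structure from the question; all fields are strings.
    d = c['date']
    value = datetime(int(d['year']), int(d['month']), int(d['day']),
                     int(c['hour']), int(c['minute']), int(c['second']))
    # c['timeZoneID'] is ignored, so this is a naive datetime.
    return value.strftime('%Y-%m-%d %H:%M:%S')

start = {'timeZoneID': 'America/New_York',
         'date': {'year': '2014', 'day': '29', 'month': '1'},
         'second': '0', 'hour': '12', 'minute': '0'}
print(to_iso(start))  # 2014-01-29 12:00:00
```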
