I have a list of strings and I want to use regex to filter the list to certain strings.
For example, here is the original list:
quoteTitle = ['\r\n ', ' ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 
2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
I want only the numbered items, from 30 down to 1, together with the text that follows each number. I can successfully filter out anything that doesn't start with a number using:
import re
p = re.compile(r'\w')
q = filter(p.match, quoteTitle)
p = re.compile(r'^\d+')
q = filter(p.match, q)
This gets me to
print(list(q)) --> ['30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 
6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
Now I want to remove the dates from the list.
I've tried a lot of combinations of this, but I think I'm missing something or misunderstanding something. My idea is to keep all strings in the list that do not follow the format of the date entries.
p = re.compile(r'[^'\d+/]')
q = filter(p.match, q)
They start with an apostrophe because it's a quoted string, and I think that might be my problem. Other than that, the format is:
apostrophe, number (between 1-12 so \d+), /
That should be enough to filter out the date entries, as long as I get it working correctly.
Update: I even tried this to search for elements of the list that contain AM or PM, and still had no luck:
p = re.compile(r'[^(AM|PM)]')
q = filter(p.search, q)
You can search for strings that start with one or more digits followed by a period:
import re
quoteTitle = ['\r\n ', ' ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 
2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
new_result = list(filter(lambda x: re.match(r'^\d+\.', x), quoteTitle))  # raw string avoids the invalid '\d' escape warning
Output:
['30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ']
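The attempts in the question fail because [^...] is a negated character class. It matches a single character that is not in the set, not a whole sequence like a date or AM|PM, so almost every string passes. Below is a minimal sketch that shows both approaches: keeping only the numbered titles, or explicitly dropping the date-like entries. It assumes the dates always start with M/D/YYYY. The sample list is a hypothetical subset of quoteTitle.

```python
import re

# Hypothetical subset of quoteTitle, for illustration only
items = ['\r\n ', '30. Loyalty', '6.Werewolf Hunting Experience ', 'Tags',
         '5/8/2018 3:45:00 PM', '12/11/2017 1:00:00 PM', 'TV']

# Strings that start with one or more digits followed by a period ("30. ...")
numbered_re = re.compile(r'^\d+\.')
# Strings that start with an M/D/YYYY date (assumed format)
date_re = re.compile(r'^\d{1,2}/\d{1,2}/\d{4}\b')

# Option 1: keep only the numbered titles
titles = [s for s in items if numbered_re.match(s)]

# Option 2: drop the date-like strings and keep everything else
non_dates = [s for s in items if not date_re.match(s)]

print(titles)
print(non_dates)
```

Option 1 is usually what you want here, because it also discards leftovers such as 'Tags' and 'TV'.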
Edit: to extract the text between double quotes, you can use the non-greedy pattern .*?:
quote = ['i dont want this', '\r\n ', '\r\n ', ' "this is the quote i want to extract" ', '" and also this one"', '\r\n "and me"']
new_results = list(map(lambda x:x[0], filter(None, [re.findall('"(.*?)"', i) for i in quote])))
Output:
['this is the quote i want to extract', ' and also this one', 'and me']
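The ? in .*? matters when one string contains more than one quoted part. The following small illustration uses a hypothetical sample string:

```python
import re

s = 'he said "one" and then "two"'

# Greedy: .* runs to the LAST double quote in the string
greedy = re.findall(r'"(.*)"', s)
# Non-greedy: .*? stops at the FIRST closing double quote
lazy = re.findall(r'"(.*?)"', s)

print(greedy)  # ['one" and then "two']
print(lazy)    # ['one', 'two']
```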
I have a pandas dataframe with the following column:
import pandas as pd
df = pd.DataFrame(['10/30/2022 7:00:00 AM +01:00', '10/31/2022 12:00:00 AM +01:00',
'10/30/2022 3:00:00 PM +01:00', '10/30/2022 9:00:00 PM +01:00',
'10/30/2022 5:00:00 PM +01:00', '10/30/2022 10:00:00 PM +01:00',
'10/30/2022 3:00:00 AM +01:00', '10/30/2022 2:00:00 AM +02:00',
'10/30/2022 10:00:00 AM +01:00', '10/30/2022 4:00:00 PM +01:00',
'10/30/2022 1:00:00 AM +02:00'], columns = ['Date'])
I want to convert the date values so it looks like the following:
2022-10-30T00:00:00+02:00
I tried the following:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%dT%H:%M:%S')
The code raises an error whenever the .dt is called:
AttributeError: Can only use .dt accessor with datetimelike values
Apparently pd.to_datetime() does not return datetime-like values here.
Does anyone know how to fix this?
# parse each value separately, so every row keeps its own UTC offset,
# then format it; %z writes the offset as +0100 (no colon)
df['Date'] = df['Date'].apply(pd.to_datetime).apply(lambda x: x.strftime('%Y-%m-%dT%H:%M:%S%z'))
0 2022-10-30T07:00:00+0100
1 2022-10-31T00:00:00+0100
2 2022-10-30T15:00:00+0100
3 2022-10-30T21:00:00+0100
4 2022-10-30T17:00:00+0100
5 2022-10-30T22:00:00+0100
6 2022-10-30T03:00:00+0100
7 2022-10-30T02:00:00+0200
8 2022-10-30T10:00:00+0100
9 2022-10-30T16:00:00+0100
10 2022-10-30T01:00:00+0200
Name: Date, dtype: object
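The question asks for the offset in the form +02:00, but strftime's %z produces +0100. Below is a minimal sketch that uses only the standard library rather than pandas. It parses each value on its own, so every row keeps its original offset, and then formats it with isoformat(), which writes the offset with a colon. The format string is an assumption based on the sample values, and to_iso is a hypothetical helper name.

```python
from datetime import datetime

raw = ['10/30/2022 7:00:00 AM +01:00', '10/31/2022 12:00:00 AM +01:00',
       '10/30/2022 2:00:00 AM +02:00']

def to_iso(s):
    # Assumed input format: M/D/YYYY h:mm:ss AM/PM +HH:MM
    # %z accepts an offset written with a colon (Python 3.7+)
    dt = datetime.strptime(s, '%m/%d/%Y %I:%M:%S %p %z')
    # isoformat() keeps the original offset and writes it as +HH:MM
    return dt.isoformat()

converted = [to_iso(s) for s in raw]
print(converted)
```

In pandas, the same function can be applied row by row with df['Date'].apply(to_iso).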
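Add utc=True when converting to datetime. This gives the whole column a single timezone (UTC), so the .dt accessor works. Note that the values are shifted to UTC. Try this: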
import pandas as pd
df = pd.DataFrame(['10/30/2022 7:00:00 AM +01:00', '10/31/2022 12:00:00 AM +01:00',
'10/30/2022 3:00:00 PM +01:00', '10/30/2022 9:00:00 PM +01:00',
'10/30/2022 5:00:00 PM +01:00', '10/30/2022 10:00:00 PM +01:00',
'10/30/2022 3:00:00 AM +01:00', '10/30/2022 2:00:00 AM +02:00',
'10/30/2022 10:00:00 AM +01:00', '10/30/2022 4:00:00 PM +01:00',
'10/30/2022 1:00:00 AM +02:00'], columns = ['Date'])
df['Date'] = pd.to_datetime(df['Date'], utc=True)
print(df['Date'].dt.strftime('%Y-%m-%dT%H:%M:%S'))
Output:
0 2022-10-30T06:00:00
1 2022-10-30T23:00:00
2 2022-10-30T14:00:00
3 2022-10-30T20:00:00
4 2022-10-30T16:00:00
5 2022-10-30T21:00:00
6 2022-10-30T02:00:00
7 2022-10-30T00:00:00
8 2022-10-30T09:00:00
9 2022-10-30T15:00:00
10 2022-10-29T23:00:00
Dataset
df_one = ['2017-07-27 04:00:00', '2017-08-07 04:00:00', '2017-08-11 20:00:00', '2017-08-15 16:00:00', '2017-08-21 20:00:00', '2017-08-23 08:00:00', '2017-08-23 16:00:00', '2017-08-31 20:00:00', '2017-09-01 08:00:00', '2017-09-01 16:00:00', '2017-09-01 20:00:00', '2017-09-04 00:00:00', '2017-09-04 20:00:00', '2017-09-05 00:00:00', '2017-09-05 04:00:00', '2017-09-12 12:00:00', '2017-09-13 12:00:00', '2017-09-14 00:00:00', '2017-09-18 04:00:00', '2017-09-21 08:00:00', '2017-09-22 16:00:00', '2017-09-25 08:00:00', '2017-10-10 12:00:00', '2017-10-16 16:00:00', '2017-10-19 12:00:00', '2017-10-23 04:00:00', '2017-10-26 00:00:00', '2017-10-27 00:00:00', '2017-11-10 04:00:00', '2017-11-21 08:00:00', '2017-11-22 16:00:00', '2017-11-30 00:00:00', '2017-11-30 08:00:00', '2017-11-30 16:00:00', '2017-12-01 00:00:00', '2017-12-04 20:00:00', '2017-12-14 08:00:00', '2017-12-15 12:00:00', '2017-12-15 16:00:00', '2017-12-18 00:00:00', '2017-12-19 12:00:00', '2018-01-08 20:00:00', '2018-01-11 20:00:00', '2018-02-06 04:00:00', '2018-02-13 20:00:00', '2018-02-20 08:00:00', '2018-03-02 20:00:00', '2018-03-09 08:00:00', '2018-03-13 20:00:00', '2018-03-16 00:00:00', '2018-03-20 08:00:00', '2018-03-20 16:00:00', '2018-03-22 08:00:00', '2018-03-29 04:00:00', '2018-04-09 20:00:00', '2018-04-13 20:00:00', '2018-04-16 00:00:00', '2018-04-20 08:00:00', '2018-05-11 20:00:00', '2018-05-15 16:00:00', '2018-05-31 16:00:00', '2018-06-13 12:00:00', '2018-06-14 00:00:00', '2018-06-14 20:00:00', '2018-06-22 16:00:00', '2018-06-27 20:00:00', '2018-06-29 20:00:00', '2018-07-03 00:00:00', '2018-07-03 04:00:00', '2018-07-12 04:00:00', '2018-07-16 20:00:00', '2018-07-18 00:00:00', '2018-07-20 20:00:00', '2018-07-27 04:00:00', '2018-07-31 00:00:00', '2018-08-02 00:00:00', '2018-08-20 04:00:00', '2018-09-03 00:00:00', '2018-09-06 08:00:00', '2018-09-07 20:00:00', '2018-09-13 00:00:00', '2018-09-27 16:00:00', '2018-10-11 08:00:00', '2018-10-17 20:00:00', '2018-11-02 00:00:00', '2018-11-05 20:00:00', '2018-11-06 
00:00:00', '2018-11-09 04:00:00', '2018-11-16 08:00:00', '2018-11-23 20:00:00', '2018-11-29 12:00:00', '2018-12-03 00:00:00', '2018-12-03 16:00:00', '2018-12-03 20:00:00', '2018-12-04 08:00:00', '2018-12-05 08:00:00', '2018-12-07 00:00:00', '2018-12-12 00:00:00', '2018-12-13 12:00:00', '2018-12-13 20:00:00', '2018-12-18 08:00:00', '2018-12-27 00:00:00', '2018-12-28 00:00:00', '2019-01-03 00:00:00', '2019-01-07 08:00:00', '2019-01-14 20:00:00', '2019-01-15 08:00:00', '2019-01-15 16:00:00', '2019-01-28 04:00:00', '2019-02-05 12:00:00', '2019-02-18 20:00:00', '2019-02-19 12:00:00', '2019-02-20 00:00:00', '2019-03-04 16:00:00', '2019-03-13 00:00:00', '2019-03-22 20:00:00', '2019-04-08 20:00:00', '2019-04-18 16:00:00', '2019-04-30 16:00:00', '2019-05-03 00:00:00', '2019-05-07 04:00:00', '2019-05-08 00:00:00', '2019-05-08 12:00:00', '2019-05-09 08:00:00', '2019-05-09 12:00:00', '2019-05-09 16:00:00', '2019-05-09 20:00:00', '2019-05-15 04:00:00', '2019-05-24 08:00:00', '2019-05-29 00:00:00', '2019-06-03 08:00:00', '2019-06-14 00:00:00', '2019-06-20 12:00:00', '2019-07-01 16:00:00', '2019-07-11 12:00:00', '2019-07-16 16:00:00', '2019-07-19 04:00:00', '2019-07-22 00:00:00', '2019-08-05 16:00:00', '2019-08-14 04:00:00', '2019-08-26 04:00:00', '2019-08-27 12:00:00', '2019-08-27 16:00:00', '2019-08-28 00:00:00', '2019-09-05 08:00:00', '2019-09-11 20:00:00', '2019-09-13 04:00:00', '2019-09-17 00:00:00', '2019-09-18 04:00:00', '2019-09-19 08:00:00', '2019-09-19 16:00:00', '2019-09-20 20:00:00', '2019-10-03 04:00:00', '2019-10-09 08:00:00', '2019-10-09 16:00:00', '2019-10-25 08:00:00', '2019-10-30 08:00:00', '2019-11-05 12:00:00', '2019-11-18 00:00:00', '2019-11-25 00:00:00', '2019-12-02 20:00:00', '2019-12-09 08:00:00', '2019-12-10 16:00:00', '2019-12-19 00:00:00', '2019-12-19 16:00:00', '2019-12-19 20:00:00', '2019-12-27 08:00:00', '2020-01-03 16:00:00', '2020-01-06 16:00:00', '2020-01-08 00:00:00', '2020-01-14 12:00:00', '2020-01-14 20:00:00', '2020-01-15 20:00:00', 
'2020-01-17 16:00:00', '2020-01-31 20:00:00', '2020-02-05 04:00:00', '2020-02-24 04:00:00', '2020-02-24 12:00:00', '2020-02-25 00:00:00', '2020-03-12 20:00:00', '2020-03-26 04:00:00', '2020-04-01 20:00:00', '2020-04-08 04:00:00', '2020-04-08 08:00:00', '2020-04-09 20:00:00', '2020-04-16 04:00:00', '2020-04-27 20:00:00', '2020-04-28 12:00:00', '2020-04-28 16:00:00', '2020-05-05 16:00:00', '2020-05-13 04:00:00', '2020-05-14 04:00:00', '2020-05-19 00:00:00', '2020-05-25 00:00:00', '2020-05-26 16:00:00', '2020-06-15 00:00:00', '2020-06-16 08:00:00', '2020-06-17 04:00:00', '2020-06-23 08:00:00', '2020-06-25 12:00:00', '2020-06-29 20:00:00', '2020-06-30 00:00:00', '2020-07-02 04:00:00', '2020-07-03 12:00:00', '2020-07-06 08:00:00', '2020-07-10 16:00:00', '2020-07-10 20:00:00', '2020-08-10 04:00:00', '2020-08-13 08:00:00', '2020-08-20 16:00:00', '2020-08-21 08:00:00', '2020-08-21 16:00:00', '2020-08-27 12:00:00', '2020-08-28 00:00:00', '2020-08-28 12:00:00', '2020-09-02 20:00:00', '2020-09-10 20:00:00', '2020-09-17 04:00:00', '2020-09-18 12:00:00', '2020-09-21 16:00:00', '2020-09-30 00:00:00', '2020-10-14 00:00:00', '2020-10-20 00:00:00', '2020-10-28 04:00:00', '2020-11-05 04:00:00', '2020-11-11 20:00:00', '2020-11-13 00:00:00', '2020-11-24 08:00:00', '2020-11-24 16:00:00', '2020-12-10 08:00:00', '2020-12-10 16:00:00', '2020-12-23 04:00:00', '2020-12-24 12:00:00', '2020-12-24 16:00:00', '2020-12-28 12:00:00', '2021-01-08 08:00:00', '2021-01-21 20:00:00', '2021-01-26 12:00:00', '2021-01-27 00:00:00', '2021-01-27 16:00:00', '2021-02-09 04:00:00', '2021-02-17 08:00:00', '2021-02-19 16:00:00', '2021-02-26 20:00:00', '2021-03-11 20:00:00', '2021-03-12 20:00:00', '2021-03-15 04:00:00', '2021-03-15 12:00:00', '2021-03-18 04:00:00', '2021-03-19 04:00:00', '2021-03-23 04:00:00', '2021-03-23 16:00:00', '2021-04-02 16:00:00', '2021-04-05 00:00:00', '2021-04-06 00:00:00', '2021-05-03 00:00:00', '2021-05-07 04:00:00', '2021-05-13 04:00:00', '2021-05-14 20:00:00', '2021-05-27 
08:00:00', '2021-06-01 00:00:00', '2021-06-02 16:00:00', '2021-06-03 08:00:00', '2021-06-03 12:00:00']
df_two = ['2017-08-11 23:59', '2017-09-14 23:59', '2017-10-10 23:59', '2017-10-12 23:59', '2017-10-16 23:59', '2017-10-25 23:59', '2018-04-23 23:59', '2018-07-09 23:59', '2018-07-31 23:59', '2018-08-30 23:59', '2018-09-05 23:59', '2018-09-28 23:59', '2018-11-20 23:59', '2019-01-03 23:59', '2019-01-16 23:59', '2019-01-29 23:59', '2019-02-06 23:59', '2019-04-18 23:59', '2019-05-10 23:59', '2019-06-04 23:59', '2019-06-05 23:59', '2019-07-03 23:59', '2019-07-10 23:59', '2019-07-16 23:59', '2019-08-05 23:59', '2019-10-15 23:59', '2019-10-29 23:59', '2019-12-10 23:59', '2019-12-26 23:59', '2020-01-08 23:59', '2020-01-14 23:59', '2020-01-20 23:59', '2020-02-03 23:59', '2020-03-30 23:59', '2020-05-01 23:59', '2020-05-19 23:59', '2020-10-02 23:59', '2020-10-05 23:59', '2020-10-14 23:59', '2020-11-11 23:59', '2021-01-19 23:59', '2021-01-20 23:59', '2021-02-02 23:59', '2021-02-12 23:59', '2021-02-19 23:59', '2021-02-22 23:59', '2021-03-02 23:59', '2021-04-14 23:59', '2021-04-16 23:59', '2021-05-05 23:59', '2021-05-06 23:59']
For each date in df_two, I'm looking to find the previous and current rows of df_one such that the df_two date falls between those two consecutive df_one rows.
The logic I'm looking to write is roughly the following:
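# naive version: compare each df_two date with every pair of consecutive df_one rows
for two in df_two:
    for prev, curr in zip(df_one, df_one[1:]):
        # these zero-padded "YYYY-MM-DD HH:MM" strings compare chronologically here
        if prev < two < curr:
            print(prev, '-', two, '-', curr)
            print('Found')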
Expected Output
2017-08-11 20:00:00 - 2017-08-11 23:59 - 2017-08-15 16:00:00
Found
2017-09-14 00:00:00 - 2017-09-14 23:59 - 2017-09-18 04:00:00
Found
2017-10-10 12:00:00 - 2017-10-10 23:59 - 2017-10-16 16:00:00
Found
2017-10-10 12:00:00 - 2017-10-12 23:59 - 2017-10-16 16:00:00
Found
2017-10-16 16:00:00 - 2017-10-16 23:59 - 2017-10-19 12:00:00
Found
2017-10-23 04:00:00 - 2017-10-25 23:59 - 2017-10-26 00:00:00
Found
2018-04-20 08:00:00 - 2018-04-23 23:59 - 2018-05-11 20:00:00
Found
2018-07-03 04:00:00 - 2018-07-09 23:59 - 2018-07-12 04:00:00
Found
2018-07-31 00:00:00 - 2018-07-31 23:59 - 2018-08-02 00:00:00
Found
2018-08-20 04:00:00 - 2018-08-30 23:59 - 2018-09-03 00:00:00
Found
2018-09-03 00:00:00 - 2018-09-05 23:59 - 2018-09-06 08:00:00
Found
2018-09-27 16:00:00 - 2018-09-28 23:59 - 2018-10-11 08:00:00
Found
2018-11-16 08:00:00 - 2018-11-20 23:59 - 2018-11-23 20:00:00
Found
2019-01-03 00:00:00 - 2019-01-03 23:59 - 2019-01-07 08:00:00
Found
2019-01-15 16:00:00 - 2019-01-16 23:59 - 2019-01-28 04:00:00
Found
2019-01-28 04:00:00 - 2019-01-29 23:59 - 2019-02-05 12:00:00
Found
2019-02-05 12:00:00 - 2019-02-06 23:59 - 2019-02-18 20:00:00
Found
2019-04-18 16:00:00 - 2019-04-18 23:59 - 2019-04-30 16:00:00
Found
2019-05-09 20:00:00 - 2019-05-10 23:59 - 2019-05-15 04:00:00
Found
2019-06-03 08:00:00 - 2019-06-04 23:59 - 2019-06-14 00:00:00
Found
2019-06-03 08:00:00 - 2019-06-05 23:59 - 2019-06-14 00:00:00
Found
2019-07-01 16:00:00 - 2019-07-03 23:59 - 2019-07-11 12:00:00
Found
2019-07-01 16:00:00 - 2019-07-10 23:59 - 2019-07-11 12:00:00
Found
2019-07-16 16:00:00 - 2019-07-16 23:59 - 2019-07-19 04:00:00
Found
2019-08-05 16:00:00 - 2019-08-05 23:59 - 2019-08-14 04:00:00
Found
2019-10-09 16:00:00 - 2019-10-15 23:59 - 2019-10-25 08:00:00
Found
2019-10-25 08:00:00 - 2019-10-29 23:59 - 2019-10-30 08:00:00
Found
2019-12-10 16:00:00 - 2019-12-10 23:59 - 2019-12-19 00:00:00
Found
2019-12-19 20:00:00 - 2019-12-26 23:59 - 2019-12-27 08:00:00
Found
2020-01-08 00:00:00 - 2020-01-08 23:59 - 2020-01-14 12:00:00
Found
2020-01-14 20:00:00 - 2020-01-14 23:59 - 2020-01-15 20:00:00
Found
2020-01-17 16:00:00 - 2020-01-20 23:59 - 2020-01-31 20:00:00
Found
2020-01-31 20:00:00 - 2020-02-03 23:59 - 2020-02-05 04:00:00
Found
2020-03-26 04:00:00 - 2020-03-30 23:59 - 2020-04-01 20:00:00
Found
2020-04-28 16:00:00 - 2020-05-01 23:59 - 2020-05-05 16:00:00
Found
2020-05-19 00:00:00 - 2020-05-19 23:59 - 2020-05-25 00:00:00
Found
2020-09-30 00:00:00 - 2020-10-02 23:59 - 2020-10-14 00:00:00
Found
2020-09-30 00:00:00 - 2020-10-05 23:59 - 2020-10-14 00:00:00
Found
2020-10-14 00:00:00 - 2020-10-14 23:59 - 2020-10-20 00:00:00
Found
2020-11-11 20:00:00 - 2020-11-11 23:59 - 2020-11-13 00:00:00
Found
2021-01-08 08:00:00 - 2021-01-19 23:59 - 2021-01-21 20:00:00
Found
2021-01-08 08:00:00 - 2021-01-20 23:59 - 2021-01-21 20:00:00
Found
2021-01-27 16:00:00 - 2021-02-02 23:59 - 2021-02-09 04:00:00
Found
2021-02-09 04:00:00 - 2021-02-12 23:59 - 2021-02-17 08:00:00
Found
2021-02-19 16:00:00 - 2021-02-19 23:59 - 2021-02-26 20:00:00
Found
2021-02-19 16:00:00 - 2021-02-22 23:59 - 2021-02-26 20:00:00
Found
2021-02-26 20:00:00 - 2021-03-02 23:59 - 2021-03-11 20:00:00
Found
2021-04-06 00:00:00 - 2021-04-14 23:59 - 2021-05-03 00:00:00
Found
2021-04-06 00:00:00 - 2021-04-16 23:59 - 2021-05-03 00:00:00
Found
2021-05-03 00:00:00 - 2021-05-05 23:59 - 2021-05-07 04:00:00
Found
2021-05-03 00:00:00 - 2021-05-06 23:59 - 2021-05-07 04:00:00
Found
From the looks of it, looping with for or while would not perform well. Could I get help writing a faster piece of code for this?
We can use np.searchsorted to find, for each timestamp in df_two, the index i in df_one where it would be inserted to keep df_one sorted. one[i - 1] and one[i] are then the surrounding rows, and an index of 0 or len(one) means there is no row on one side. Note: the timestamps in df_one must be sorted for searchsorted to work properly.
import numpy as np
import pandas as pd
one = pd.to_datetime(df_one)
two = pd.to_datetime(df_two)
i = np.searchsorted(one, two)
m = ~np.isin(i, [0, len(one)])
df = pd.DataFrame({'df_two': two})
df.loc[m, 'df_one_prev'] = one[i[m] - 1]
df.loc[m, 'df_one_curr'] = one[i[m]]
df['found'] = np.where(m, 'found', 'not found')
df_two df_one_prev df_one_curr found
0 2017-08-11 23:59:00 2017-08-11 20:00:00 2017-08-15 16:00:00 found
1 2017-09-14 23:59:00 2017-09-14 00:00:00 2017-09-18 04:00:00 found
2 2017-10-10 23:59:00 2017-10-10 12:00:00 2017-10-16 16:00:00 found
3 2017-10-12 23:59:00 2017-10-10 12:00:00 2017-10-16 16:00:00 found
4 2017-10-16 23:59:00 2017-10-16 16:00:00 2017-10-19 12:00:00 found
5 2017-10-25 23:59:00 2017-10-23 04:00:00 2017-10-26 00:00:00 found
6 2018-04-23 23:59:00 2018-04-20 08:00:00 2018-05-11 20:00:00 found
7 2018-07-09 23:59:00 2018-07-03 04:00:00 2018-07-12 04:00:00 found
8 2018-07-31 23:59:00 2018-07-31 00:00:00 2018-08-02 00:00:00 found
9 2018-08-30 23:59:00 2018-08-20 04:00:00 2018-09-03 00:00:00 found
10 2018-09-05 23:59:00 2018-09-03 00:00:00 2018-09-06 08:00:00 found
11 2018-09-28 23:59:00 2018-09-27 16:00:00 2018-10-11 08:00:00 found
12 2018-11-20 23:59:00 2018-11-16 08:00:00 2018-11-23 20:00:00 found
13 2019-01-03 23:59:00 2019-01-03 00:00:00 2019-01-07 08:00:00 found
14 2019-01-16 23:59:00 2019-01-15 16:00:00 2019-01-28 04:00:00 found
15 2019-01-29 23:59:00 2019-01-28 04:00:00 2019-02-05 12:00:00 found
16 2019-02-06 23:59:00 2019-02-05 12:00:00 2019-02-18 20:00:00 found
17 2019-04-18 23:59:00 2019-04-18 16:00:00 2019-04-30 16:00:00 found
18 2019-05-10 23:59:00 2019-05-09 20:00:00 2019-05-15 04:00:00 found
19 2019-06-04 23:59:00 2019-06-03 08:00:00 2019-06-14 00:00:00 found
20 2019-06-05 23:59:00 2019-06-03 08:00:00 2019-06-14 00:00:00 found
21 2019-07-03 23:59:00 2019-07-01 16:00:00 2019-07-11 12:00:00 found
22 2019-07-10 23:59:00 2019-07-01 16:00:00 2019-07-11 12:00:00 found
23 2019-07-16 23:59:00 2019-07-16 16:00:00 2019-07-19 04:00:00 found
24 2019-08-05 23:59:00 2019-08-05 16:00:00 2019-08-14 04:00:00 found
25 2019-10-15 23:59:00 2019-10-09 16:00:00 2019-10-25 08:00:00 found
26 2019-10-29 23:59:00 2019-10-25 08:00:00 2019-10-30 08:00:00 found
27 2019-12-10 23:59:00 2019-12-10 16:00:00 2019-12-19 00:00:00 found
28 2019-12-26 23:59:00 2019-12-19 20:00:00 2019-12-27 08:00:00 found
29 2020-01-08 23:59:00 2020-01-08 00:00:00 2020-01-14 12:00:00 found
30 2020-01-14 23:59:00 2020-01-14 20:00:00 2020-01-15 20:00:00 found
31 2020-01-20 23:59:00 2020-01-17 16:00:00 2020-01-31 20:00:00 found
32 2020-02-03 23:59:00 2020-01-31 20:00:00 2020-02-05 04:00:00 found
33 2020-03-30 23:59:00 2020-03-26 04:00:00 2020-04-01 20:00:00 found
34 2020-05-01 23:59:00 2020-04-28 16:00:00 2020-05-05 16:00:00 found
35 2020-05-19 23:59:00 2020-05-19 00:00:00 2020-05-25 00:00:00 found
36 2020-10-02 23:59:00 2020-09-30 00:00:00 2020-10-14 00:00:00 found
37 2020-10-05 23:59:00 2020-09-30 00:00:00 2020-10-14 00:00:00 found
38 2020-10-14 23:59:00 2020-10-14 00:00:00 2020-10-20 00:00:00 found
39 2020-11-11 23:59:00 2020-11-11 20:00:00 2020-11-13 00:00:00 found
40 2021-01-19 23:59:00 2021-01-08 08:00:00 2021-01-21 20:00:00 found
41 2021-01-20 23:59:00 2021-01-08 08:00:00 2021-01-21 20:00:00 found
42 2021-02-02 23:59:00 2021-01-27 16:00:00 2021-02-09 04:00:00 found
43 2021-02-12 23:59:00 2021-02-09 04:00:00 2021-02-17 08:00:00 found
44 2021-02-19 23:59:00 2021-02-19 16:00:00 2021-02-26 20:00:00 found
45 2021-02-22 23:59:00 2021-02-19 16:00:00 2021-02-26 20:00:00 found
46 2021-03-02 23:59:00 2021-02-26 20:00:00 2021-03-11 20:00:00 found
47 2021-04-14 23:59:00 2021-04-06 00:00:00 2021-05-03 00:00:00 found
48 2021-04-16 23:59:00 2021-04-06 00:00:00 2021-05-03 00:00:00 found
49 2021-05-05 23:59:00 2021-05-03 00:00:00 2021-05-07 04:00:00 found
50 2021-05-06 23:59:00 2021-05-03 00:00:00 2021-05-07 04:00:00 found
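To see why the i - 1 / i indexing and the 0 / len(one) mask work, here is the same idea using the standard-library bisect module on small hypothetical slices of the data. bisect_left behaves like np.searchsorted with its default side='left'. This is only an illustration of the technique; the NumPy version above is the vectorized one.

```python
from bisect import bisect_left
from datetime import datetime

# Small hypothetical slices of df_one (sorted) and df_two
one = [datetime.fromisoformat(s) for s in
       ['2017-08-11 20:00:00', '2017-08-15 16:00:00',
        '2017-09-14 00:00:00', '2017-09-18 04:00:00']]
two = [datetime.fromisoformat(s) for s in
       ['2017-08-11 23:59', '2017-09-14 23:59', '2017-01-01 00:00']]

results = []
for t in two:
    # Like np.searchsorted(one, t) with side='left':
    # i is the first position where one[i] >= t
    i = bisect_left(one, t)
    if 0 < i < len(one):
        # one[i - 1] < t <= one[i]
        results.append((one[i - 1], t, one[i]))
    else:
        # i == 0 or i == len(one): t lies outside df_one's range ("not found")
        results.append((None, t, None))

for prev, t, curr in results:
    print(prev, '-', t, '-', curr)
```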
I have the following list which I need to sort in ascending order:
tlist = ['10:10 AM - 10:20 AM', '10:20 AM - 10:30 AM', '10:30 AM - 10:40 AM', '10:40 AM - 10:50 AM', '10:50 AM - 11:00 AM', '11:00 AM - 11:10 AM', '11:10 AM - 11:20 AM', '11:20 AM - 11:30 AM', '11:30 AM - 11:40 AM', '11:40 AM - 11:50 AM', '11:50 AM - 12:00 PM', '12:00 PM - 12:10 PM', '12:10 PM - 12:20 PM', '12:20 PM - 12:30 PM', '12:30 PM - 12:40 PM', '12:40 PM - 12:50 PM', '12:50 PM - 1:00 PM', '1:00 PM - 1:10 PM', '1:10 PM - 1:20 PM', '1:20 PM - 1:30 PM', '1:30 PM - 1:40 PM', '1:40 PM - 1:50 PM', '1:50 PM - 2:00 PM', '2:00 PM - 2:10 PM', '2:10 PM - 2:20 PM', '2:20 PM - 2:30 PM', '2:30 PM - 2:40 PM', '2:40 PM - 2:50 PM', '2:50 PM - 3:00 PM', '3:00 PM - 3:10 PM', '3:10 PM - 3:20 PM', '3:20 PM - 3:30 PM', '3:30 PM - 3:40 PM', '3:40 PM - 3:50 PM', '3:50 PM - 4:00 PM', '4:00 PM - 4:10 PM', '4:10 PM - 4:20 PM', '4:20 PM - 4:30 PM', '4:30 PM - 4:40 PM', '4:40 PM - 4:50 PM', '4:50 PM - 5:00 PM', '5:00 PM - 5:10 PM', '5:10 PM - 5:20 PM', '5:20 PM - 5:30 PM', '5:30 PM - 5:40 PM', '5:40 PM - 5:50 PM', '5:50 PM - 6:00 PM', '6:00 PM - 6:10 PM', '6:10 PM - 6:20 PM', '6:20 PM - 6:30 PM', '6:30 PM - 6:40 PM', '6:40 PM - 6:50 PM', '6:50 PM - 7:00 PM', '7:00 PM - 7:10 PM', '7:10 AM - 7:20 AM', '7:10 PM - 7:20 PM', '7:20 AM - 7:30 AM', '7:20 PM - 7:30 PM', '7:30 AM - 7:40 AM', '7:30 PM - 7:40 PM', '7:40 AM - 7:50 AM', '7:40 PM - 7:50 PM', '7:50 AM - 8:00 AM', '7:50 PM - 8:00 PM', '8:00 AM - 8:10 AM', '8:00 PM - 8:10 PM', '8:10 AM - 8:20 AM', '8:10 PM - 8:20 PM', '8:20 AM - 8:30 AM', '8:20 PM - 8:30 PM', '8:30 AM - 8:40 AM', '8:30 PM - 8:40 PM', '8:40 AM - 8:50 AM', '8:40 PM - 8:50 PM', '8:50 AM - 9:00 AM', '8:50 PM - 9:00 PM', '9:00 AM - 9:10 AM', '9:00 PM - 9:10 PM', '9:10 AM - 9:20 AM', '9:10 PM - 9:20 PM', '9:20 AM - 9:30 AM', '9:20 PM - 9:30 PM', '9:30 AM - 9:40 AM', '9:40 AM - 9:50 AM', '9:50 AM - 10:00 AM']
While attempting to do that, I wrote a loop that parses each time string into a time object, but the conversion fails.
import time
tlist = ['10:10 AM - 10:20 AM', '10:20 AM - 10:30 AM', '10:30 AM - 10:40 AM', '10:40 AM - 10:50 AM', '10:50 AM - 11:00 AM', '11:00 AM - 11:10 AM', '11:10 AM - 11:20 AM', '11:20 AM - 11:30 AM', '11:30 AM - 11:40 AM', '11:40 AM - 11:50 AM', '11:50 AM - 12:00 PM', '12:00 PM - 12:10 PM', '12:10 PM - 12:20 PM', '12:20 PM - 12:30 PM', '12:30 PM - 12:40 PM', '12:40 PM - 12:50 PM', '12:50 PM - 1:00 PM', '1:00 PM - 1:10 PM', '1:10 PM - 1:20 PM', '1:20 PM - 1:30 PM', '1:30 PM - 1:40 PM', '1:40 PM - 1:50 PM', '1:50 PM - 2:00 PM', '2:00 PM - 2:10 PM', '2:10 PM - 2:20 PM', '2:20 PM - 2:30 PM', '2:30 PM - 2:40 PM', '2:40 PM - 2:50 PM', '2:50 PM - 3:00 PM', '3:00 PM - 3:10 PM', '3:10 PM - 3:20 PM', '3:20 PM - 3:30 PM', '3:30 PM - 3:40 PM', '3:40 PM - 3:50 PM', '3:50 PM - 4:00 PM', '4:00 PM - 4:10 PM', '4:10 PM - 4:20 PM', '4:20 PM - 4:30 PM', '4:30 PM - 4:40 PM', '4:40 PM - 4:50 PM', '4:50 PM - 5:00 PM', '5:00 PM - 5:10 PM', '5:10 PM - 5:20 PM', '5:20 PM - 5:30 PM', '5:30 PM - 5:40 PM', '5:40 PM - 5:50 PM', '5:50 PM - 6:00 PM', '6:00 PM - 6:10 PM', '6:10 PM - 6:20 PM', '6:20 PM - 6:30 PM', '6:30 PM - 6:40 PM', '6:40 PM - 6:50 PM', '6:50 PM - 7:00 PM', '7:00 PM - 7:10 PM', '7:10 AM - 7:20 AM', '7:10 PM - 7:20 PM', '7:20 AM - 7:30 AM', '7:20 PM - 7:30 PM', '7:30 AM - 7:40 AM', '7:30 PM - 7:40 PM', '7:40 AM - 7:50 AM', '7:40 PM - 7:50 PM', '7:50 AM - 8:00 AM', '7:50 PM - 8:00 PM', '8:00 AM - 8:10 AM', '8:00 PM - 8:10 PM', '8:10 AM - 8:20 AM', '8:10 PM - 8:20 PM', '8:20 AM - 8:30 AM', '8:20 PM - 8:30 PM', '8:30 AM - 8:40 AM', '8:30 PM - 8:40 PM', '8:40 AM - 8:50 AM', '8:40 PM - 8:50 PM', '8:50 AM - 9:00 AM', '8:50 PM - 9:00 PM', '9:00 AM - 9:10 AM', '9:00 PM - 9:10 PM', '9:10 AM - 9:20 AM', '9:10 PM - 9:20 PM', '9:20 AM - 9:30 AM', '9:20 PM - 9:30 PM', '9:30 AM - 9:40 AM', '9:40 AM - 9:50 AM', '9:50 AM - 10:00 AM']
for t in tlist:
f = t.split('-')[0]
print(f)
ft = time.strptime(f, "%I:%M %p")
print(f, ft)
I'm getting an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-0a2d8195df22> in <module>()
4 f = t.split('-')[0]
5 print(f)
----> 6 ft = time.strptime(f, "%I:%M %p")
7 print(f, ft)
/usr/lib/python3.6/_strptime.py in _strptime_time(data_string, format)
557 """Return a time struct based on the input string and the
558 format string."""
--> 559 tt = _strptime(data_string, format)[0]
560 return time.struct_time(tt[:time._STRUCT_TM_ITEMS])
561
/usr/lib/python3.6/_strptime.py in _strptime(data_string, format)
363 if len(data_string) != found.end():
364 raise ValueError("unconverted data remains: %s" %
--> 365 data_string[found.end():])
366
367 iso_year = year = None
ValueError: unconverted data remains:
How can I fix this error? Is there an easier technique of sorting these other than the tedious method of looping over the list and transferring to an intermediary list?
By doing
f = t.split('-')[0]
ft = time.strptime(f, "%I:%M %p")
you end up with whitespace left around each time string (e.g. '10:10 AM - 10:20 AM' becomes '10:10 AM ' and ' 10:20 AM'), so the start time you pass to strptime has a trailing space.
This is also what the error message is saying:
ValueError: unconverted data remains:
strptime applied the format %I:%M %p to f, but was left with a trailing space that the format does not account for.
The solution is to either
split on ' - ': f = t.split(' - ')[0]
or
use strip: f = t.split('-')[0].strip() (probably the better solution, as it is a bit more generic)
You could also include the whitespace in the format (time.strptime(f, "%I:%M %p ")), but that is an ad-hoc fix just waiting to break again in the future.
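To see the difference concretely, here is a small sketch using one string from tlist and only the standard library. It shows the stray whitespace produced by split('-') and checks that both suggested fixes give the same clean start time.

```python
import time

slot = '10:10 AM - 10:20 AM'

# Splitting on '-' alone leaves whitespace on both halves.
start_raw, end_raw = slot.split('-')
print(repr(start_raw), repr(end_raw))  # '10:10 AM ' ' 10:20 AM'

# Fix 1: split on the full ' - ' separator.
start_a = slot.split(' - ')[0]
# Fix 2: split on '-' and strip the leftover whitespace.
start_b = slot.split('-')[0].strip()

for start in (start_a, start_b):
    parsed = time.strptime(start, "%I:%M %p")
    print(parsed.tm_hour, parsed.tm_min)  # 10 10
```

Passing start_raw directly to time.strptime is what raises the "unconverted data remains" ValueError from the question.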
Change
f = t.split('-')[0]
To
f = t.split('-')[0].strip()
After split('-') you get two values, e.g. '10:10 AM ' and ' 10:20 AM', so you need to strip the spaces from them.
To sort the list, pass sorted a key that parses the start time.
Ex:
import time
tlist = ['10:10 AM - 10:20 AM', '10:20 AM - 10:30 AM', '10:30 AM - 10:40 AM', '10:40 AM - 10:50 AM', '10:50 AM - 11:00 AM', '11:00 AM - 11:10 AM', '11:10 AM - 11:20 AM', '11:20 AM - 11:30 AM', '11:30 AM - 11:40 AM', '11:40 AM - 11:50 AM', '11:50 AM - 12:00 PM', '12:00 PM - 12:10 PM', '12:10 PM - 12:20 PM', '12:20 PM - 12:30 PM', '12:30 PM - 12:40 PM', '12:40 PM - 12:50 PM', '12:50 PM - 1:00 PM', '1:00 PM - 1:10 PM', '1:10 PM - 1:20 PM', '1:20 PM - 1:30 PM', '1:30 PM - 1:40 PM', '1:40 PM - 1:50 PM', '1:50 PM - 2:00 PM', '2:00 PM - 2:10 PM', '2:10 PM - 2:20 PM', '2:20 PM - 2:30 PM', '2:30 PM - 2:40 PM', '2:40 PM - 2:50 PM', '2:50 PM - 3:00 PM', '3:00 PM - 3:10 PM', '3:10 PM - 3:20 PM', '3:20 PM - 3:30 PM', '3:30 PM - 3:40 PM', '3:40 PM - 3:50 PM', '3:50 PM - 4:00 PM', '4:00 PM - 4:10 PM', '4:10 PM - 4:20 PM', '4:20 PM - 4:30 PM', '4:30 PM - 4:40 PM', '4:40 PM - 4:50 PM', '4:50 PM - 5:00 PM', '5:00 PM - 5:10 PM', '5:10 PM - 5:20 PM', '5:20 PM - 5:30 PM', '5:30 PM - 5:40 PM', '5:40 PM - 5:50 PM', '5:50 PM - 6:00 PM', '6:00 PM - 6:10 PM', '6:10 PM - 6:20 PM', '6:20 PM - 6:30 PM', '6:30 PM - 6:40 PM', '6:40 PM - 6:50 PM', '6:50 PM - 7:00 PM', '7:00 PM - 7:10 PM', '7:10 AM - 7:20 AM', '7:10 PM - 7:20 PM', '7:20 AM - 7:30 AM', '7:20 PM - 7:30 PM', '7:30 AM - 7:40 AM', '7:30 PM - 7:40 PM', '7:40 AM - 7:50 AM', '7:40 PM - 7:50 PM', '7:50 AM - 8:00 AM', '7:50 PM - 8:00 PM', '8:00 AM - 8:10 AM', '8:00 PM - 8:10 PM', '8:10 AM - 8:20 AM', '8:10 PM - 8:20 PM', '8:20 AM - 8:30 AM', '8:20 PM - 8:30 PM', '8:30 AM - 8:40 AM', '8:30 PM - 8:40 PM', '8:40 AM - 8:50 AM', '8:40 PM - 8:50 PM', '8:50 AM - 9:00 AM', '8:50 PM - 9:00 PM', '9:00 AM - 9:10 AM', '9:00 PM - 9:10 PM', '9:10 AM - 9:20 AM', '9:10 PM - 9:20 PM', '9:20 AM - 9:30 AM', '9:20 PM - 9:30 PM', '9:30 AM - 9:40 AM', '9:40 AM - 9:50 AM', '9:50 AM - 10:00 AM']
print(sorted(tlist, key=lambda x: time.strptime(x.split("-")[0].strip(), "%I:%M %p")))
Output:
['7:10 AM - 7:20 AM', '7:20 AM - 7:30 AM', '7:30 AM - 7:40 AM', '7:40 AM - 7:50 AM', '7:50 AM - 8:00 AM', '8:00 AM - 8:10 AM', '8:10 AM - 8:20 AM', '8:20 AM - 8:30 AM', '8:30 AM - 8:40 AM', '8:40 AM - 8:50 AM', '8:50 AM - 9:00 AM', '9:00 AM - 9:10 AM', '9:10 AM - 9:20 AM', '9:20 AM - 9:30 AM', '9:30 AM - 9:40 AM', '9:40 AM - 9:50 AM', '9:50 AM - 10:00 AM', '10:10 AM - 10:20 AM', '10:20 AM - 10:30 AM', '10:30 AM - 10:40 AM', '10:40 AM - 10:50 AM', '10:50 AM - 11:00 AM', '11:00 AM - 11:10 AM', '11:10 AM - 11:20 AM', '11:20 AM - 11:30 AM', '11:30 AM - 11:40 AM', '11:40 AM - 11:50 AM', '11:50 AM - 12:00 PM', '12:00 PM - 12:10 PM', '12:10 PM - 12:20 PM', '12:20 PM - 12:30 PM', '12:30 PM - 12:40 PM', '12:40 PM - 12:50 PM', '12:50 PM - 1:00 PM', '1:00 PM - 1:10 PM', '1:10 PM - 1:20 PM', '1:20 PM - 1:30 PM', '1:30 PM - 1:40 PM', '1:40 PM - 1:50 PM', '1:50 PM - 2:00 PM', '2:00 PM - 2:10 PM', '2:10 PM - 2:20 PM', '2:20 PM - 2:30 PM', '2:30 PM - 2:40 PM', '2:40 PM - 2:50 PM', '2:50 PM - 3:00 PM', '3:00 PM - 3:10 PM', '3:10 PM - 3:20 PM', '3:20 PM - 3:30 PM', '3:30 PM - 3:40 PM', '3:40 PM - 3:50 PM', '3:50 PM - 4:00 PM', '4:00 PM - 4:10 PM', '4:10 PM - 4:20 PM', '4:20 PM - 4:30 PM', '4:30 PM - 4:40 PM', '4:40 PM - 4:50 PM', '4:50 PM - 5:00 PM', '5:00 PM - 5:10 PM', '5:10 PM - 5:20 PM', '5:20 PM - 5:30 PM', '5:30 PM - 5:40 PM', '5:40 PM - 5:50 PM', '5:50 PM - 6:00 PM', '6:00 PM - 6:10 PM', '6:10 PM - 6:20 PM', '6:20 PM - 6:30 PM', '6:30 PM - 6:40 PM', '6:40 PM - 6:50 PM', '6:50 PM - 7:00 PM', '7:00 PM - 7:10 PM', '7:10 PM - 7:20 PM', '7:20 PM - 7:30 PM', '7:30 PM - 7:40 PM', '7:40 PM - 7:50 PM', '7:50 PM - 8:00 PM', '8:00 PM - 8:10 PM', '8:10 PM - 8:20 PM', '8:20 PM - 8:30 PM', '8:30 PM - 8:40 PM', '8:40 PM - 8:50 PM', '8:50 PM - 9:00 PM', '9:00 PM - 9:10 PM', '9:10 PM - 9:20 PM', '9:20 PM - 9:30 PM']
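As a variation on the answer above, here is a minimal sketch that uses datetime.strptime and splits on the full ' - ' separator, so no strip is needed. The helper start_time and the four-item sample are illustrative, not from the original. Since the strings contain no date, strptime fills in 1900-01-01 for every entry, so the sort is purely by time of day.

```python
from datetime import datetime

def start_time(slot):
    """Parse the start of an 'H:MM AM - H:MM PM' slot into a datetime."""
    start = slot.split(' - ', 1)[0]
    return datetime.strptime(start, "%I:%M %p")

slots = ['1:00 PM - 1:10 PM', '12:50 PM - 1:00 PM',
         '7:10 AM - 7:20 AM', '10:10 AM - 10:20 AM']
ordered = sorted(slots, key=start_time)
print(ordered)
# ['7:10 AM - 7:20 AM', '10:10 AM - 10:20 AM', '12:50 PM - 1:00 PM', '1:00 PM - 1:10 PM']
```

%I together with %p maps 12 PM to hour 12 and 12 AM to hour 0, so noon and midnight sort correctly.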
I'm working on a Django application and I'm just trying to push data up to the front-end to display.
In my views.py here's what I have:
def index(request):
    ...
    context = RequestContext(request)
    rooms = dict(db.studybug.find_one())
    timeRange = [room.encode('utf-8') for room in rooms['timeRange']]
    return render_to_response('studybug/index.html', timeRange, context)
Here, timeRange is a list that contains the following:
timeRange = ['Room 203A 10:00 AM \xc2\xa0', 'Room 203A 10:30 AM \xc2\xa0', 'Room 203A 11:00 AM \xc2\xa0', 'Room 203A 11:30 AM \xc2\xa0', 'Room 203A 12:00 PM \xc2\xa0', 'Room 203A 12:30 PM \xc2\xa0', 'Room 203A 3:00 PM \xc2\xa0', 'Room 203A 3:30 PM \xc2\xa0', 'Room 203A 4:00 PM \xc2\xa0', 'Room 203A 4:30 PM \xc2\xa0', 'Room 203A 5:00 PM \xc2\xa0', 'Room 203A 5:30 PM \xc2\xa0', 'Room 203A 6:00 PM \xc2\xa0', 'Room 203A 6:30 PM \xc2\xa0', 'Room 203A 7:00 PM \xc2\xa0', 'Room 203A 7:30 PM \xc2\xa0', 'Room 203A 8:00 PM \xc2\xa0', 'Room 203A 8:30 PM \xc2\xa0', 'Room 203A 9:00 PM \xc2\xa0', 'Room 203A 9:30 PM \xc2\xa0', 'Room 203A 10:00 PM \xc2\xa0', 'Room 203A 10:30 PM \xc2\xa0', 'Room 203A 11:00 PM \xc2\xa0', 'Room 203A 11:30 PM \xc2\xa0']
And then in my template (index.html), I have the following loop:
<div class="row">
    ...
    <ul>
        {% for item in timeRange %}
            <li>{{ item }}</li>
        {% endfor %}
    </ul>
</div>
However, despite the list being generated in the backend, nothing is being displayed on the webpage. I know the list exists, but Django's rendering engine won't display it.
Am I missing something obvious here?
Thanks,
G
The second parameter of render_to_response should be a dict containing your data; you are passing a list.
Your render_to_response call should look like this:
return render_to_response('studybug/index.html', {'timeRange':timeRange}, context)
Given this base date:
base_date = "10/29 06:58 AM"
I want to find a tuple within the list that contains the closest date to the base_date, but it must not be an earlier date.
list_date = [('10/30 02:18 PM', '-103', '-107'), ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
So here the output should be ('10/30 02:17 PM', '+100', '-110'). It can't be the 3rd tuple, because that date is earlier than the base date.
My question is: is there a module for this kind of date comparison? I tried converting all the data to AM format first and then comparing, but my code gets ugly with lots of slicing.
Edit:
Big list to test:
[('10/30 02:18 PM', '+13 -103', '-13 -107'), ('10/30 02:17 PM', '+13 +100', '-13 -110'), ('10/30 02:15 PM', '+13 -101', '-13 -109'), ('10/30 02:14 PM', '+13 -103', '-13 -107'), ('10/30 01:59 PM', '+13 -105', '-13 -105'), ('10/30 01:46 PM', '+13 -106', '-13 -104'), ('10/30 01:37 PM', '+13 -105', '-13 -105'), ('10/30 01:24 PM', '+13 -107', '-13 -103'), ('10/30 01:23 PM', '+13 -106', '-13 -104'), ('10/30 01:05 PM', '+13 -103', '-13 -107'), ('10/30 01:02 PM', '+13 -104', '-13 -106'), ('10/30 12:55 PM', '+13 -103', '-13 -107'), ('10/30 12:51 PM', '+13.5 -110', '-13.5 +100'), ('10/30 12:44 PM', '+13.5 -108', '-13.5 -102'), ('10/30 12:38 PM', '+13.5 -107', '-13.5 -103'), ('10/30 12:35 PM', '+13 -102', '-13 -108'), ('10/30 12:34 PM', '+13 -103', '-13 -107'), ('10/30 12:06 PM', '+13.5 -110', '-13.5 +100'), ('10/30 11:57 AM', '+13.5 -108', '-13.5 -102'), ('10/30 11:36 AM', '+13.5 -107', '-13.5 -103'), ('10/30 09:01 AM', '+13.5 -110', '-13.5 +100'), ('10/30 08:59 AM', '+13.5 -108', '-13.5 -102'), ('10/30 08:13 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:11 AM', '+13.5 +100', '-13.5 -110'), ('10/30 06:09 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:04 AM', '+13.5 -110', '-13.5 +100'), ('10/30 05:32 AM', '+13.5 -105', '-13.5 -105'), ('10/30 04:48 AM', '+13.5 -107', '-13.5 -103'), ('10/30 12:51 AM', '+13.5 -110', '-13.5 +100'), ('10/29 01:31 PM', '+13.5 -105', '-13.5 -105'), ('10/29 01:31 PM', '+13 +103', '-13 -113'), ('10/29 01:28 PM', '+13 -102', '-13 -108'), ('10/29 07:59 AM', '+13 -105', '-13 -105'), ('10/29 07:20 AM', '+13 -103', '-13 -107'), ('10/29 07:14 AM', '+13 -105', '-13 -105'), ('10/29 04:47 AM', '+13 +100', '-13 -110'), ('10/29 04:14 AM', '+13 -105', '-13 -105'), ('10/28 08:17 PM', '+12.5 +100', '-12.5 -110'), ('10/28 12:52 PM', '+12.5 -105', '-12.5 -105')]
Big list to test2:
[('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
This can be done with the datetime module, which can parse a date string into a datetime object; datetime objects support comparison and arithmetic:
from datetime import datetime
# function for parsing strings using specific format
get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")
base = get_datetime(base_date)
later = filter(lambda d: get_datetime(d[0]) > base, list_date)
closest_date = min(later, key = lambda d: get_datetime(d[0]))
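For reference, here is a self-contained sketch of the same filter-then-min idea. Two small differences are assumptions on my part: it uses >= so a date equal to base_date counts as "not earlier", and it passes default=None to min so it returns None instead of raising ValueError when every date is earlier. The name closest_not_earlier is hypothetical. Because the strings carry no year, strptime fills in 1900, so dates that cross a year boundary will not compare correctly.

```python
from datetime import datetime

# No year in the data, so strptime assumes 1900 for every date.
FMT = "%m/%d %I:%M %p"

def closest_not_earlier(base_date, rows):
    """Return the row whose date (first field) is the earliest one that is
    not before base_date, or None if every row is earlier."""
    base = datetime.strptime(base_date, FMT)
    # Parse each date once and keep the (datetime, row) pairs that qualify.
    parsed = ((datetime.strptime(row[0], FMT), row) for row in rows)
    candidates = ((dt, row) for dt, row in parsed if dt >= base)
    best = min(candidates, key=lambda pair: pair[0], default=None)
    return None if best is None else best[1]

list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
print(closest_not_earlier("10/29 06:58 AM", list_date))
# ('10/30 02:17 PM', '+100', '-110')
```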
>>> from datetime import timedelta, datetime
>>> base_date = "10/29 06:58 AM"
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
...     d = datetime.strptime(x[0], "%m/%d %I:%M %p")
...     delta = d - b_d if d > b_d else timedelta.max
...     return delta
...
>>> min(list_date, key = func)
('10/30 02:17 PM', '+100', '-110')
datetime.strptime converts the string to a datetime object. The string has no year, so the year defaults to 1900, and b_d looks like this:
>>> b_d
datetime.datetime(1900, 10, 29, 6, 58)
Now we can write a function that can be passed to the key parameter of min:
delta = d - b_d if d > b_d else timedelta.max
If d > b_d, i.e. the date being examined is later than base_date, delta is their difference; otherwise delta is timedelta.max, so an earlier date can never beat a later one.
>>> timedelta.max
datetime.timedelta(999999999, 86399, 999999)
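One thing to watch with the timedelta.max sentinel: if no date in the list is later than base_date, every key is timedelta.max and min simply returns the first tuple, which is an earlier date. A small sketch (closest_later is a hypothetical name) that keeps the same key function but checks the winner's key and returns None in that case:

```python
from datetime import datetime, timedelta

FMT = "%m/%d %I:%M %p"

def closest_later(base_date, rows):
    """Like min(rows, key=func) above, but returns None instead of an
    earlier tuple when no date is later than base_date."""
    b_d = datetime.strptime(base_date, FMT)

    def func(x):
        # Distance to the base date, or the timedelta.max sentinel for
        # dates that are not later than it.
        d = datetime.strptime(x[0], FMT)
        return d - b_d if d > b_d else timedelta.max

    best = min(rows, key=func, default=None)
    # If even the winner carries the sentinel, nothing was later.
    if best is None or func(best) == timedelta.max:
        return None
    return best

list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
print(closest_later("10/29 06:58 AM", list_date))  # ('10/30 02:17 PM', '+100', '-110')
print(closest_later("11/01 09:00 AM", list_date))  # None
```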
Update:
>>> from datetime import timedelta, datetime
>>> base_date = '10/29 06:59 AM'
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
... d = datetime.strptime(x[0], "%m/%d %I:%M %p")
... delta = d - b_d if d > b_d else timedelta.max
... return delta
...
>>> lis2 = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
>>> min(lis2, key = func)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
Timing comparisons:
Script:
from datetime import datetime, timedelta
import sys
import time
list_date = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
base_date = "10/29 06:58 AM"
def func1(list_date):
    #http://stackoverflow.com/a/17249420/846892
    get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")
    base = get_datetime(base_date)
    later = filter(lambda d: get_datetime(d[0]) > base, list_date)
    return min(later, key = lambda d: get_datetime(d[0]))
def func2(list_date):
    #http://stackoverflow.com/a/17249470/846892
    b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
    def func(x):
        d = datetime.strptime(x[0], "%m/%d %I:%M %p")
        delta = d - b_d if d > b_d else timedelta.max
        return delta
    return min(list_date, key = func)
def func3(list_date):
    #http://stackoverflow.com/a/17249529/846892
    fmt = '%m/%d %I:%M %p'
    d = datetime.strptime(base_date, fmt)
    def foo(x):
        return (datetime.strptime(x[0],fmt)-d).total_seconds() > 0
    return sorted(list_date, key=foo)[-1]
def func4(list_date):
    #http://stackoverflow.com/a/17249441/846892
    fmt = '%m/%d %I:%M %p'
    base_d = datetime.strptime(base_date, fmt)
    candidates = ((datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
    candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
    return candidates[1:]
Results:
>>> from so import *
# check output first
>>> func1(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func2(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func3(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func4(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> %timeit func1(list_date)
100 loops, best of 3: 3.07 ms per loop
>>> %timeit func2(list_date)
100 loops, best of 3: 1.59 ms per loop #winner
>>> %timeit func3(list_date)
100 loops, best of 3: 1.91 ms per loop
>>> %timeit func4(list_date)
1000 loops, best of 3: 2.02 ms per loop
#increase the input size
>>> list_date = list_date *10**3
>>> len(list_date)
48000
>>> %timeit func1(list_date)
1 loops, best of 3: 3.6 s per loop
>>> %timeit func2(list_date) #winner
1 loops, best of 3: 1.99 s per loop
>>> %timeit func3(list_date)
1 loops, best of 3: 2.09 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 2.02 s per loop
#increase the input size again
>>> list_date = list_date *10
>>> len(list_date)
480000
>>> %timeit func1(list_date)
1 loops, best of 3: 36.4 s per loop
>>> %timeit func2(list_date) #winner
1 loops, best of 3: 20.2 s per loop
>>> %timeit func3(list_date)
1 loops, best of 3: 22.8 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 22.7 s per loop
Decorate, filter, find the closest date, undecorate:
>>> base_date = "10/29 06:58 AM"
>>> list_date = [
... ('10/30 02:18 PM', '-103', '-107'),
... ('10/30 02:17 PM', '+100', '-110'),
... ('10/29 02:15 AM', '-101', '-109')
... ]
>>> import datetime
>>> fmt = '%m/%d %I:%M %p'
>>> base_d = datetime.datetime.strptime(base_date, fmt)
>>> candidates = ((datetime.datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
>>> candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
>>> print(candidates[1:])
('10/30 02:17 PM', '+100', '-110')
You could put the dates into a pandas index and then use the truncate or get_loc method.
import pandas as pd
##Initial inputs
list_date = [('10/30 02:18 PM', '-103', '-107'), ('10/29 02:15 AM', '-101', '-109'), ('10/30 02:17 PM', '+100', '-110')]  # reordered to show the method is insensitive to input order
base_date = "10/29 06:58 AM"
##Make a data frame with data
df=pd.DataFrame(list_date)
df.columns=['date','val1','val2']
dateIndex=pd.to_datetime(df['date'], format='%m/%d %I:%M %p')
df=df.set_index(dateIndex)
df=df.sort_index(ascending=False) #latest date comes on top
##Find the result
base_dateObj=pd.to_datetime(base_date, format='%m/%d %I:%M %p')
result=df.truncate(after=base_dateObj).iloc[-1] #take the bottom value, or the 1st after the base date
(result['date'],result['val1'], result['val2']) # result is ('10/30 02:17 PM', '+100', '-110')
Linear search?
import time
base_date = "10/29 06:58 AM"
def str_to_my_time(my_str):
    return time.mktime(time.strptime(my_str, "%m/%d %I:%M %p"))
# assume year 1900...
base_dt = str_to_my_time(base_date)
list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]
best_delta = float('inf')
best_match = None
for t in list_date:
    the_dt = str_to_my_time(t[0])
    delta_sec = the_dt - base_dt
    if (delta_sec >= 0) and (delta_sec < best_delta):
        best_delta = delta_sec
        best_match = t
print(best_match, best_delta)
Producing:
('10/30 02:17 PM', '+100', '-110') 112740.0
import time
#The Function
def to_sec(date_string):
    return time.mktime(time.strptime(date_string, '%m/%d %I:%M %p'))
#The Test
base_date = "10/29 06:58 AM"
base_date_sec = to_sec(base_date)
result = None
difference = float('inf')
list_date = [
    ('10/30 02:18 PM', '-103', '-107'),
    ('10/30 02:17 PM', '+100', '-110'),
    ('10/29 02:15 AM', '-101', '-109') ]
for date_str in list_date:
    diff_sec = to_sec(date_str[0]) - base_date_sec
    if diff_sec >= 0 and diff_sec < difference:
        result = date_str
        difference = diff_sec
print(result)
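import datetime
fmt = '%m/%d %I:%M %p'
d = datetime.datetime.strptime(base_date, fmt)
def foo(x):
    # True for dates later than the base date, False otherwise
    return (datetime.datetime.strptime(x[0],fmt)-d).total_seconds() > 0
# sorted() is stable, so [-1] is the last later date in the original order;
# it is the closest one only if list_date is in descending date order
sorted(list_date, key=foo)[-1]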
I was looking up this problem and found some answers, most of which check all elements.
I have my dates sorted (and assume most people do), so if you do as well, use numpy:
import numpy as np
# dates is a numpy array of np.datetime64 objects
dates = np.array([date1, date2, date3, ...], dtype=np.datetime64)
timestamp = np.datetime64('Your date')
np.searchsorted(dates, timestamp)
searchsorted uses binary search, which relies on the dates being sorted, so each lookup needs only O(log n) comparisons instead of a full scan.
If you use pandas, this is possible:
dates = df.index # df is a DatetimeIndex-ed dataframe
timestamp = pd.to_datetime('your date here', format='its format')
np.searchsorted(dates, timestamp)
The function returns the insertion index of timestamp, i.e. the index of the first date that is not earlier than it (if the searched date is included in dates, its own index is returned; if that isn't wanted, pass side='right' to get the first strictly later date). So to get the date, do this:
dates[np.searchsorted(dates, timestamp)]
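If you'd rather not depend on numpy, the standard library's bisect module does the same binary search on an ascending list of datetime objects. This is a sketch; first_not_earlier and the four sample dates are hypothetical. It also covers a case the snippet above leaves open: when timestamp is later than every date, np.searchsorted returns len(dates), and indexing with that raises IndexError.

```python
import bisect
from datetime import datetime

FMT = "%m/%d %I:%M %p"

# Binary search only works if the dates are already in ascending order.
dates = [datetime.strptime(s, FMT) for s in
         ['10/28 12:52 PM', '10/29 01:31 PM', '10/30 12:17 AM', '10/30 04:30 PM']]

def first_not_earlier(dates, target):
    """Index of the first date >= target (like np.searchsorted with the
    default side='left'), or None if target is later than every date."""
    i = bisect.bisect_left(dates, target)
    return i if i < len(dates) else None

target = datetime.strptime("10/29 06:58 AM", FMT)
i = first_not_earlier(dates, target)
print(dates[i])  # 1900-10-29 13:31:00
```

For the "strictly later" behaviour that side='right' gives in numpy, bisect.bisect_right is the counterpart.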