Make timeline-chart out of .csv data - python

for item in list_of_dictionaries:
    print(item['created_time'])
I have a list of 70k dictionaries and each dict has the above key and the value is in this format:
Wed Sep 20 23:40:58 +0000 2017
What I need is a dictionary that tells me the number of entries for every hour of each day, such as:
for Tue:
[00:200,01:231,02 ... ]
Is there any way to convert that string (Wed Sep 20 23:40:58 2017) into a datetime so I can group the entries by hour, or by half hour?
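One possible approach, sketched with the standard library only: parse each value with datetime.strptime and count entries per weekday and per hour (or half hour). The sample list below is made up for illustration, and the format string assumes the "+0000" offset shown above is always present and that the default English locale is in use for the day and month names.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Matches strings such as "Wed Sep 20 23:40:58 +0000 2017"
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical sample data standing in for the 70k dictionaries
list_of_dictionaries = [
    {"created_time": "Wed Sep 20 23:40:58 +0000 2017"},
    {"created_time": "Wed Sep 20 23:05:10 +0000 2017"},
    {"created_time": "Tue Sep 19 00:15:00 +0000 2017"},
]


def half_hour_label(moment):
    """Return a bucket label such as '23:00' or '23:30'."""
    return f"{moment:%H}:{'30' if moment.minute >= 30 else '00'}"


hourly = defaultdict(Counter)       # weekday -> hour -> count
half_hourly = defaultdict(Counter)  # weekday -> half-hour bucket -> count

for item in list_of_dictionaries:
    created = datetime.strptime(item["created_time"], TIME_FORMAT)
    weekday = created.strftime("%a")
    hourly[weekday][created.strftime("%H")] += 1
    half_hourly[weekday][half_hour_label(created)] += 1

print(dict(hourly["Wed"]))
print(dict(half_hourly["Wed"]))
```

Grouping by weekday name merges, for example, all Wednesdays; to keep each calendar day separate, created.date() could be used as the outer key instead.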

Related

How to convert different rss feed dates with Python so they can be ordered

Trying to make an RSS feed reader using django, feedparser and dateutil
Getting this error: can't compare offset-naive and offset-aware datetimes
I just have five feeds right now. These are the datetimes from the feeds..
Sat, 10 Sep 2022 23:08:59 -0400
Sun, 11 Sep 2022 04:08:30 +0000
Sun, 11 Sep 2022 13:12:18 +0000
2022-09-10T01:01:16+00:00
Sat, 17 Sep 2022 11:27:15 EDT
I was able to order the first four feeds and then I got the error when I added the last one.
## create a list of lists - each inner list holds entries from a feed
parsed_feeds = [feedparser.parse(url)['entries'] for url in feed_urls]
## put all entries in one list
parsed_feeds2 = [item for feed in parsed_feeds for item in feed]
## sort entries by date
parsed_feeds2.sort(key=lambda x: dateutil.parser.parse(x['published']), reverse=True)
How can I make all the datetimes from the feeds the same so they can be ordered?
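A possible way to make the values comparable is to convert every one of them to an aware datetime in UTC before sorting. The sketch below swaps dateutil for the standard library (datetime.fromisoformat for the ISO 8601 value, email.utils.parsedate_to_datetime for the RFC 2822 style values, which also knows the "EDT" abbreviation). The helper name to_utc is hypothetical, and treating a value without any offset as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(text):
    """Parse an ISO 8601 or RFC 2822 date string into an aware UTC datetime."""
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: a value without any offset is treated as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# The five date strings quoted in the question
published = [
    "Sat, 10 Sep 2022 23:08:59 -0400",
    "Sun, 11 Sep 2022 04:08:30 +0000",
    "Sun, 11 Sep 2022 13:12:18 +0000",
    "2022-09-10T01:01:16+00:00",
    "Sat, 17 Sep 2022 11:27:15 EDT",
]

newest_first = sorted(published, key=to_utc, reverse=True)
for text in newest_first:
    print(to_utc(text).isoformat(), "<-", text)
```

With the feed entries from the question, the same helper could be used as the sort key, for example key=lambda x: to_utc(x['published']).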

Python dateparser fails when timezone is in middle

I'm trying to parse a date string using the following code:
from dateutil.parser import parse
datestring = 'Thu Jul 25 15:13:16 GMT+06:00 2019'
d = parse(datestring)
print (d)
The parsed date is:
datetime.datetime(2019, 7, 25, 15, 13, 16, tzinfo=tzoffset(None, -21600))
As you can see, instead of adding 6 hours to GMT, it actually subtracted 6 hours.
What am I doing wrong here? Any help on how I can parse a date string in this format?
There's a comment in the source: https://github.com/dateutil/dateutil/blob/cbcc0871792e7eed4a42cc62630a08ec7a78be30/dateutil/parser/_parser.py#L803.
# Check for something like GMT+3, or BRST+3. Notice
# that it doesn't mean "I am 3 hours after GMT", but
# "my time +3 is GMT". If found, we reverse the
# logic so that timezone parsing code will get it
# right.
The important parts:
- Notice that it doesn't mean "I am 3 hours after GMT", but "my time +3 is GMT".
- If found, we reverse the logic so that timezone parsing code will get it right.
The last sentence of that comment (the 2nd bullet point above) explains why 6 hours are subtracted: dateutil reads Thu Jul 25 15:13:16 GMT+06:00 2019 as 15:13:16 at an offset of -06:00, which is Thu Jul 25 21:13:16 2019 GMT.
Take a look at http://www.timebie.com/tz/timediff.php?q1=Universal%20Time&q2=GMT%20+6%20Time for more context.
dateutil.parser.parse does not change the clock time; it keeps 15:13:16 and only attaches an offset. The input is read as 15:13:16 in a zone where local time + 6 hours is GMT, so the result is 15:13:16-06:00.
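If the string is meant the other way round (local time six hours ahead of GMT), one option is to skip dateutil for this format and let datetime.strptime read the offset directly. This is only a sketch: it assumes the literal text "GMT" always precedes the offset, that the English day and month names match the current locale, and Python 3.7 or later, where %z accepts offsets written with a colon.

```python
from datetime import datetime, timedelta, timezone

datestring = 'Thu Jul 25 15:13:16 GMT+06:00 2019'

# "GMT" is matched literally; %z then reads "+06:00" as six hours ahead of UTC
d = datetime.strptime(datestring, '%a %b %d %H:%M:%S GMT%z %Y')

print(d)                           # local time with a +06:00 offset
print(d.astimezone(timezone.utc))  # the same instant expressed in UTC
```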

Selecting specific dates from dataframe

I have a dataset with the column 'Date', which has dates in several formats, including:
2018.05.07
01-Jun-2018
Reported 01 Jun 2018
Jun 2018
2018
before 1970
1941-1945
Ca. 1960
There are also invalid dates, such as:
190Feb-2010
I am trying to find the entries that contain an exact date (day, month, and year) and convert them to datetime. I also need to exclude entries with "Reported" in the field. Is there any way to filter such data without first working out every possible date format?
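Use the dateutil library.
Parse each string twice with different default values and compare the results: if any part of the date (day, month, or year) is missing, it is filled in from the default, the two results differ, and the string is skipped.
Use fuzzy=True only if you want to extract dates from strings such as "Reported 01 Jun 2018".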
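import datetime
import dateutil.parser

dates = ["2018.05.07","01-Jun-2018","Reported 01 Jun 2018","Jun 2018","2018","before 1970","1941-1945","Ca. 1960","190Feb-2010"]
formated_date = []
for date in dates:
    try:
        # Missing parts are taken from the default, so two different defaults
        # give two different results whenever the date is incomplete.
        first = dateutil.parser.parse(date, fuzzy=False, default=datetime.datetime(2015, 1, 1))
        second = dateutil.parser.parse(date, fuzzy=False, default=datetime.datetime(2016, 2, 2))
    except (ValueError, OverflowError):
        continue  # not parseable as a date
    if first == second:
        formated_date.append(first)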
Another solution: a brute-force method that checks each date against every format. Keep adding formats to cover more date styles. Note that this method is slow.
import datetime

dates = ["2018.05.07","01-Jun-2018","Reported 01 Jun 2018","Jun 2018","2018","before 1970","1941-1945","Ca. 1960","190Feb-2010"]
formats = ["%Y%m%d","%Y.%m.%d","%Y-%m-%d","%Y/%m/%d","%Y%a%d","%Y.%a.%d","%Y-%a-%d","%Y%A%d","%Y.%A.%d","%Y-%A-%d",
           "%d-%m-%Y","%d.%m.%Y","%d%m%Y","%d/%m/%Y","%d-%b-%Y","%d%b%Y","%d.%b.%Y","%d/%b/%Y"]
formated_date = []
for date in dates:
    for fmt in formats:
        try:
            dt = datetime.datetime.strptime(date, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        formated_date.append(dt)
        break  # stop at the first matching format
In [1]: string_with_dates = """entries are due by January 4th, 2017 at 8:00pm created 01/15/2005 by ACME Inc. and associates."""
In [2]: import datefinder
In [3]: matches = datefinder.find_dates(string_with_dates)
In [4]: for match in matches:
   ...:     print(match)
2017-01-04 20:00:00
2005-01-15 00:00:00
This should help you find dates inside strings that contain other text.
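To tie this back to the original question (keep only exact dates, drop anything marked "Reported"), here is a minimal standard-library sketch. The helper name parse_exact_date and the short format list are assumptions for illustration; the list would need to grow to cover other layouts in the real column, and %b relies on English month names in the current locale.

```python
from datetime import datetime

# Assumed formats that contain day, month, and year; extend as needed
EXACT_FORMATS = ["%Y.%m.%d", "%d-%b-%Y"]


def parse_exact_date(text):
    """Return a datetime for a full day/month/year date, otherwise None."""
    if "Reported" in text:
        return None  # the question asks to exclude these entries
    for fmt in EXACT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # format does not match; try the next one
    return None


raw_dates = ["2018.05.07", "01-Jun-2018", "Reported 01 Jun 2018", "Jun 2018",
             "2018", "before 1970", "1941-1945", "Ca. 1960", "190Feb-2010"]

exact = {}
for text in raw_dates:
    parsed = parse_exact_date(text)
    if parsed is not None:
        exact[text] = parsed

print(exact)
```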

Non-standard week numbering

I'm building an API for my college's timetable website. The default option is to look up today's timetable. However, the timetable system is using week numbering different to the one we normally use.
One of the 2 things I need to build the URL is the week number.
TIMETABLE_URL = 'http://timetable.ait.ie/reporting/textspreadsheet;student+set;id;{}?t=student+set+textspreadsheet&days=1-5&=&periods=3-20&=student+set+textspreadsheet&weeks={}&template=student+set+textspreadsheet'
Week numbering should start at this date: 27 Aug 2018 - 2 Sep 2018. Following this, week 2 would be 3 Sep 2018-9 Sep 2018 and so on. This should carry over New Years, the date of 31 Dec 2018-6 Jan 2019 would be week 19. This 'year' would have 52 weeks total.
I know how to check if a certain date is in between one of the ranges from above, but I want to avoid manually setting all the date ranges. How can I have a script know that, for example, it's in week 3 on 12 September?
Using datetime.datetime object:
from datetime import datetime
start = datetime.strptime('20180827', '%Y%m%d')
current = datetime.strptime('20180912', '%Y%m%d')
print((current - start).days//7 +1) # The week number
# Output: 3
This also works across year boundaries. Note that it only gives the right result when the start date is the first day of week 1 (a Monday here).
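The same arithmetic can be wrapped in a small function, sketched below with date objects. The names WEEK_ONE_START and week_number are hypothetical; the start date is the 27 Aug 2018 Monday given in the question.

```python
from datetime import date

# First day of week 1, as stated in the question
WEEK_ONE_START = date(2018, 8, 27)


def week_number(day, start=WEEK_ONE_START):
    """Return the 1-based timetable week that contains `day`."""
    return (day - start).days // 7 + 1


print(week_number(date(2018, 9, 12)))   # the 12 September example
print(week_number(date(2018, 12, 31)))  # a week that carries over New Year
```

For today's timetable, week_number(date.today()) would supply the value for the weeks= part of the URL.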

Increment times if date has changed?

I have a list of strings, and one of them looks like this:
'Thu Jun 18 19:58:02 2015
...many lines of data...
txup: 19:59:47 txdown: 20:05:22
rxup: 21:43:17 rxdown: 22:13:01'
But another may look like this:
'Fri Jun 19 23:12:12 2015
...many lines of data...
txup: 23:39:47 txdown: 23:57:22
rxup: 00:01:17 rxdown: 01:13:01'
As you can see, in some cases a time might cross midnight. When that happens, using the above string as an example, the date associated with that time would now be Jun 20 instead of Jun 19.
I need to write code that compares the 'rxup' time with the date/time at the start of the string and recognizes if and when it has moved on by a day because it passed midnight (all relative to the date/time at the beginning).
If it hasn't crossed midnight and is thus the same day, then I'm done. But if it has crossed midnight, I need to take the difference between that time and the time at the beginning probably as a timedelta object, and add that increment onto a copy of the time at the beginning. How would I do this?
If the times grow monotonically, you can simply compare them in lexicographic order. '00:01:17' is obviously less than '23:39:47', so each time the next timestamp is less than the current one, you increment the date.
Assuming that rxup always appears after txdown, but less than 24 hours later, you can compare it like below:
import datetime

# txdown, rxup: datetimes holding the respective times; the date part doesn't matter
# associated_date: datetime associated with the string
if rxup.time() < txdown.time():
    associated_date += datetime.timedelta(days=1)
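A fuller sketch of the same idea, under the same assumption that the times appear in chronological order and that consecutive times are less than 24 hours apart. The function name attach_dates is hypothetical, and the times are passed in as already extracted strings; pulling them out of the larger text block is not shown.

```python
from datetime import datetime, timedelta


def attach_dates(header, clock_times):
    """Give each HH:MM:SS string a full datetime, rolling over at midnight."""
    start = datetime.strptime(header, "%a %b %d %H:%M:%S %Y")
    day = start.date()
    previous = start.time()
    result = []
    for text in clock_times:
        current = datetime.strptime(text, "%H:%M:%S").time()
        if current < previous:
            # The clock went backwards, so midnight was crossed
            day += timedelta(days=1)
        result.append(datetime.combine(day, current))
        previous = current
    return result


# txup, txdown, rxup, rxdown from the second example string
stamps = attach_dates(
    "Fri Jun 19 23:12:12 2015",
    ["23:39:47", "23:57:22", "00:01:17", "01:13:01"],
)
for stamp in stamps:
    print(stamp)
```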
