What's the best way to get date strings from a website using Python?
The date strings can, for example, be in these forms:
April 1st, 2011
April 2nd, 2011
April 23, 2011
4/2/2011
04/23/2011
Would this have to be a ton of regex? What's the most elegant solution?
Consider this lib: http://code.google.com/p/parsedatetime/
From its examples wiki page, here are a couple of formats it can handle that look relevant to your question:
import parsedatetime
p = parsedatetime.Calendar()
result = p.parseDateText("March 5th, 1980")
result = p.parseDate("4/4/80")
EDIT: I now notice this is actually a duplicate of this SO question, where the same library was recommended!
month = r'(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]{0,6}'  # match with re.IGNORECASE
regex_strings = [r'%s(\.| )\d{1,2},? \d{2,4}' % month,  # Month.Day, Year
                 r'\d{1,2} %s,? \d{4}' % month,          # Day Month Year(4)
                 r'%s \d{1,2}\w{2},? \d{4}' % month,     # Mon Day(th), Year
                 r'\d{1,2} %s' % month,                  # Day Month
                 r'\d{1,2}\.\d{1,2}\.\d{4}',             # Month.Day.Year
                 r'\d{1,2}/\d{1,2}/\d{2,4}',             # Month/Day/Year{2,4}
                 ]
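The answer lists the patterns but not how to apply them. One way, sketched here under the assumption that matching should be case-insensitive and that the patterns should be tried in the order listed, is to join them into a single alternation wrapped in word boundaries and scan the page text with re.finditer. The name find_dates is hypothetical, and downloading the page (for example with urllib) is left out:

```python
import re

# Month names: three-letter prefix plus up to six more letters ("sep" + "tember")
MONTH = r'(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]{0,6}'

# Same patterns as above, with non-capturing groups so group(0) is the whole date
PATTERNS = [
    r'%s(?:\.| )\d{1,2},? \d{2,4}' % MONTH,  # Month.Day, Year
    r'\d{1,2} %s,? \d{4}' % MONTH,           # Day Month Year(4)
    r'%s \d{1,2}\w{2},? \d{4}' % MONTH,      # Mon Day(th), Year
    r'\d{1,2} %s' % MONTH,                   # Day Month
    r'\d{1,2}\.\d{1,2}\.\d{4}',              # Month.Day.Year
    r'\d{1,2}/\d{1,2}/\d{2,4}',              # Month/Day/Year{2,4}
]

# Alternatives are tried left to right at each position; \b stops matches
# from starting or ending in the middle of a word or number.
DATE_RE = re.compile(r'\b(?:%s)\b' % '|'.join(PATTERNS), re.IGNORECASE)


def find_dates(text):
    """Return every date-like substring of text, in order of appearance."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


print(find_dates("Posted April 1st, 2011 and updated 04/23/2011"))
```

A parser library is still needed afterwards if you want datetime objects rather than the matched strings.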
I'm querying a server for specific data, and I need to extract the year from the date field it returns. However, the format of the date field varies, for example:
2009
2009-10-8
2009-10
2017-10-22
2017-10
The obvious approach is to split the date into a list and take the max, but there is a problem:
year = max(d.split('-'))
For some reason this gives false positives: 22 comes out as the max instead of 2017. Also, if future calls to the server return the date as "2019/10/20", this will cause issues as well.
The problem is that, while 2017 > 22, '2017' < '22' because it's a string comparison. You could do this to resolve that:
year = max(map(int, d.split('-')))
But instead, if you don't mind being frowned upon by the Long Now Foundation, consider using a regular expression to extract any 4-digit number:
import re
match = re.search(r'\b\d{4}\b', d)
if match:
    year = int(match.group(0))
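Applied to all the sample values from the question, plus the slash-separated form it worries about, the regex approach could look like this minimal sketch (extract_year is a hypothetical helper name, and it returns None when no 4-digit number is present):

```python
import re

# A standalone 4-digit number; \b matches next to '-', '/' or the string edges
YEAR_RE = re.compile(r'\b\d{4}\b')


def extract_year(d):
    """Return the first standalone 4-digit number in d as an int, or None."""
    match = YEAR_RE.search(d)
    return int(match.group(0)) if match else None


samples = ['2009', '2009-10-8', '2009-10', '2017-10-22', '2017-10', '2019/10/20']
years = [extract_year(s) for s in samples]
print(years)  # [2009, 2009, 2009, 2017, 2017, 2019]
```

Because it never compares strings, the '22' vs '2017' problem cannot occur, and the separator does not matter.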
I would use the python-dateutil library to easily extract the year from a date string:
from dateutil.parser import parse
dates = ['2009', '2009-10-8', '2009-10']
for date in dates:
    print(parse(date).year)
Output:
2009
2009
2009
I'm trying to generate a week-number string using Python's time module, with weeks starting on Sunday.
If my interpretation of the official documentation is correct, this can be achieved with the following code:
import time
time.strftime("%U", time.localtime())
>> 37
My question is: is the above output correct? Shouldn't it be 38 instead, given the details below?
My timezone is IST (GMT+5:30)
import time
#Year
time.localtime()[0]
>> 2019
#Month
time.localtime()[1]
>> 9
#Day
time.localtime()[2]
>> 18
Yes, the output is correct. Week 1 started on January 6th, as that was the first Sunday in 2019. January 1st through 5th were week 0:
>>> time.strftime('%U', time.strptime("2019-1-1", "%Y-%m-%d"))
'00'
>>> time.strftime('%U', time.strptime("2019-1-6", "%Y-%m-%d"))
'01'
This is covered in the documentation:
All days in a new year preceding the first Sunday are considered to be in week 0.
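The week-0 rule can be checked by hand. This sketch assumes the usual C-library definition of %U, week = (zero-based day of year + 7 - Sunday-based weekday) // 7; the function name sunday_week_number is hypothetical:

```python
from datetime import date


def sunday_week_number(d):
    """Week of the year with Sunday as the first day, as strftime('%U') counts it.

    Days before the year's first Sunday land in week 0.
    """
    yday = d.timetuple().tm_yday - 1   # 0-based day of the year
    wday = (d.weekday() + 1) % 7       # convert Monday=0 numbering to Sunday=0
    return (yday + 7 - wday) // 7


# The date from the question: Wednesday, 18 September 2019
print(sunday_week_number(date(2019, 9, 18)))  # 37
```

Since 1 January 2019 was a Tuesday, days 1 to 5 give (yday + 7 - wday) < 7 and therefore week 0.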
You are perhaps looking for the ISO week date, but note that in this system the first day of the week is a Monday.
You can get the week number using that system with the datetime.date.isocalendar() method, or by formatting with %V:
>>> time.strftime("%V", time.localtime())
'38'
>>> from datetime import date
>>> date.today().isocalendar() # returns ISO year, week, and weekday
(2019, 38, 2)
>>> date.today().strftime("%V")
'38'
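For the date in the question, both numbering schemes side by side, as a short sketch using only the standard library:

```python
from datetime import date

d = date(2019, 9, 18)                              # the date from the question (a Wednesday)
sunday_based = d.strftime('%U')                    # '37': week 0 lasts until the first Sunday
iso_year, iso_week, iso_weekday = d.isocalendar()  # (2019, 38, 3): ISO 8601, Monday-based

# Near year boundaries the ISO year can differ from the calendar year:
# 31 December 2018 already belongs to ISO week 1 of 2019, so pair %V with %G, not %Y.
boundary = date(2018, 12, 31).isocalendar()[:2]    # (2019, 1)
```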
It's correct since you start counting from the first Sunday.
%U - week number of the current year, starting with the first Sunday as the first day of the first week
https://www.tutorialspoint.com/python/time_strftime.htm
It's correct. Since all days in a new year preceding the first Sunday are considered to be in week 0 (01/01 to 01/05), this week is week 37.
I haven't been able to find an answer to this problem. Basically, this is what I'm trying to do:
Take a date range, for example October 10th to November 25th. What is the best algorithm for determining how many of the days in the date range are in October and how many are in November?
Something like this:
def daysInMonthFromDaterange(daterange, month):
    # do stuff
    return days
I know that this is pretty easy to implement, I'm just wondering if there's a very good or efficient algorithm.
Borrowing the algorithm from this answer, How do I divide a date range into months in Python?, this might work. The inputs are date strings in YYYY-MM-DD format, but you could build the datetime objects directly instead:
import datetime
begin = '2018-10-10'
end = '2018-11-25'
dt_start = datetime.datetime.strptime(begin, '%Y-%m-%d')
dt_end = datetime.datetime.strptime(end, '%Y-%m-%d')
one_day = datetime.timedelta(1)
start_dates = [dt_start]
end_dates = []
today = dt_start
# Stop before dt_end so a range ending on the last day of a month
# does not open an extra, empty segment in the following month
while today < dt_end:
    tomorrow = today + one_day
    if tomorrow.month != today.month:
        # today is the last day of a month; tomorrow starts the next segment
        start_dates.append(tomorrow)
        end_dates.append(today)
    today = tomorrow
end_dates.append(dt_end)
out_fmt = '%d %B %Y'
for start, end in zip(start_dates, end_dates):
    diff = (end - start).days + 1  # +1 so both endpoints are counted
    print('{} to {}: {} days'.format(start.strftime(out_fmt), end.strftime(out_fmt), diff))
result:
10 October 2018 to 31 October 2018: 22 days
01 November 2018 to 25 November 2018: 25 days
The problem as stated may not have a unique answer. For example, what should daysInMonthFromDaterange('Feb 15 - Mar 15', 'February') return? That depends on the year!
But if you use actual dates, I would suggest converting them to integer day numbers and defining a month as running from the first of the month up to the first of the next month. The problem then reduces to intersecting intervals of integers, which is much easier.
Relying only on the fact that every month has a first day deals with months of different lengths and with months whose length varies from year to year, and even correctly handles the traditional placement of the switch from the Julian to the Gregorian calendar; see cal 1752 for that. (It will not handle that switch for all locations, though. If you are dealing with a library that does Romanian dates in 1919, you could have a problem...)
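The interval-intersection idea could be sketched like this, assuming half-open ranges [start, end) throughout; both function names are hypothetical. Python dates subtract to whole days, so explicit ordinals are not needed:

```python
from datetime import date, timedelta


def first_of_next_month(d):
    """Return the first day of the month following d's month."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def days_in_month_from_range(start, end, year, month):
    """Count the days of the half-open range [start, end) that fall in year/month.

    The month is the half-open interval [first of month, first of next month),
    so the answer is just the length of the overlap of the two intervals.
    """
    month_start = date(year, month, 1)
    month_end = first_of_next_month(month_start)
    lo = max(start, month_start)
    hi = min(end, month_end)
    return max(0, (hi - lo).days)  # no overlap gives 0, not a negative count


# October 10th to November 25th inclusive: pass the day after as the exclusive end
start, end = date(2018, 10, 10), date(2018, 11, 25) + timedelta(days=1)
print(days_in_month_from_range(start, end, 2018, 10))  # 22
print(days_in_month_from_range(start, end, 2018, 11))  # 25
```

With real years, the 'Feb 15 - Mar 15' example resolves itself: 14 February days in 2019, 15 in the leap year 2020.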
If you only need the total number of days between two dates, you can use the datetime module:
from datetime import datetime
start = datetime(2018,10,10)
end = datetime(2018,11,25)
print((end - start).days)
Something like this would work:
import datetime
def days_in_month(date1, date2, month):
    # Ordinals of every day in [date1, date2) whose year and month match those of `month`
    return [x for x in range(date1.toordinal(), date2.toordinal())
            if datetime.date.fromordinal(x).year == month.year
            and datetime.date.fromordinal(x).month == month.month]
print(len(days_in_month(datetime.date(2018, 10, 10), datetime.date(2018, 11, 25), datetime.date(2018, 10, 1))))
This just loops through all the days from date1 up to (but not including) date2, and includes a day's ordinal in the list if it matches the year and month of the third argument.
I have seen many ways to determine the week of the year. For example, datetime.date(2016, 2, 14).isocalendar()[1] gives 6, which means 14 Feb 2016 falls in the 6th week of the year. But I couldn't find any way to find the week of the month.
That is, if I give some_function(2016, 2, 16) as input,
I should get 3 as output, meaning that 16 Feb 2016 is in the 3rd week of February 2016.
[This is a different question from similar existing ones: here I'm asking about the week number of the month, not of the year.]
This function did what I wanted:
from math import ceil
def week_of_month(dt):
    # Weeks start on Monday; shift the day of the month by the weekday of the 1st
    first_day = dt.replace(day=1)
    dom = dt.day
    adjusted_dom = dom + first_day.weekday()
    return int(ceil(adjusted_dom/7.0))
I got this function from this Stack Overflow answer.
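import datetime
def week_number_of_month(date_value):
    # Difference between the ISO week of the date and that of the 1st of its month
    week_number = (date_value.isocalendar()[1] - date_value.replace(day=1).isocalendar()[1] + 1)
    # Special case for dates like 2018-12-31, whose ISO week wraps around to week 1;
    # other wrap-arounds, such as early January 2016, are not handled
    if week_number == -46:
        week_number = 6
    return week_number
date_given = datetime.datetime(year=2018, month=12, day=31).date()
week_number_of_month(date_given)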
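For comparison, here is a sketch that needs no special cases: count the days since the start of the month's first (possibly partial) week and divide by 7. The function name and the first_weekday parameter are illustrative, not taken from either answer. With the default Monday start it should agree with the ceil-based week_of_month above, and it avoids the ISO wrap-around that the -46 check patches, e.g. 5 January 2016, for which the ISO-difference version returns -51:

```python
from datetime import date


def week_of_month_from_first(d, first_weekday=0):
    """1-based week of the month for date d.

    first_weekday uses date.weekday() numbering (0 = Monday, 6 = Sunday).
    Week 1 is the (possibly partial) week that contains the 1st of the month.
    """
    first = d.replace(day=1)
    # Days of week 1 that fall before the 1st, i.e. in the previous month
    offset = (first.weekday() - first_weekday) % 7
    return (d.day - 1 + offset) // 7 + 1


print(week_of_month_from_first(date(2016, 2, 16)))  # 3
print(week_of_month_from_first(date(2018, 12, 31)))  # 6
```

Passing first_weekday=6 gives Sunday-based weeks instead.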
I am trying to get a date based on a number of the week, but there are some annoyances.
date.weekday() returns the day of the week, where 0 is Monday and 6 is Sunday.
The %w directive of date.strftime() and datetime.strptime() uses 0 for Sunday and 6 for Saturday.
This causes some really annoying issues when trying to figure out a date given a week number from date.weekday().
Is there a better way of getting a date from a week number?
EDIT:
Added the example.
import datetime
original_date = datetime.date(2014, 8, 24)
week_of_the_date = original_date.isocalendar()[1] # 34
day_of_the_date = original_date.isocalendar()[2] # 7
temp = '{0} {1} {2}'.format(*(2014, week_of_the_date, day_of_the_date-1))
date_from_week = datetime.datetime.strptime(temp, '%Y %W %w')
week_from_new_date = date_from_week.isocalendar()[1] # 35!!
EDIT 2:
I ultimately put the date handling in the view (using jQuery UI), which has more consistent notions of weeks.
I think the Sunday vs. Monday distinction between weekday and strftime with %W is moot: you could use isoweekday to get those to line up, or %U in strftime if you wanted Sunday as the first day of the week. The real problem is that strftime, based on the underlying C function, determines the first week of the year differently from the ISO definition. For %W the docs say: "All days in a new year preceding the first Monday are considered to be in week 0". ISO calendars instead count the week containing the first Thursday as week 1, which is the same as saying week 1 is the first week with at least four of its days in the new year.
Two ways I found to work with ISO weeks, either just getting datetime.date instances back or supporting a variety of operations, are:
this answer with a simple timedelta approach:
What's the best way to find the inverse of datetime.isocalendar()?
this third-party library: https://pypi.python.org/pypi/isoweek/
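In the question's example the round trip fails for two reasons: %W week numbers are one lower than ISO week numbers in 2014, and ISO weekday 7 (Sunday) minus 1 gives 6, which %w reads as Saturday. A minimal timedelta-based sketch of the inverse of isocalendar(), in the spirit of the first link but not necessarily that answer's exact code (iso_to_gregorian is a hypothetical name), plus the standard-library routes available in newer Python versions:

```python
import datetime


def iso_to_gregorian(iso_year, iso_week, iso_day):
    """Return the date for an ISO (year, week, weekday) triple.

    January 4th always falls in ISO week 1, so find the Monday of that week
    and count whole weeks and days forward from it.
    """
    jan4 = datetime.date(iso_year, 1, 4)
    week1_monday = jan4 - datetime.timedelta(days=jan4.isoweekday() - 1)
    return week1_monday + datetime.timedelta(weeks=iso_week - 1, days=iso_day - 1)


original_date = datetime.date(2014, 8, 24)
print(iso_to_gregorian(*original_date.isocalendar()))  # 2014-08-24

# Python 3.6+: strptime accepts the ISO directives %G (year), %V (week), %u (weekday 1-7)
parsed = datetime.datetime.strptime('2014 34 7', '%G %V %u').date()

# Python 3.8+: the inverse is built in
builtin = datetime.date.fromisocalendar(2014, 34, 7)
```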