Python: Parse String as Date with Formatting - python

A user can input a string and the string contains a date in the following formats MM/DD/YY or MM/DD/YYYY. Is there an efficient way to pull the date from the string? I was thinking of using RegEx for \d+\/\d+\/\d+. I also want the ability to be able to sort the dates. I.e. if the strings contain 8/17/15 and 08/16/2015, it would list the 8/16 date first and then 8/17

Have a look at datetime.strptime, a built-in method that creates a datetime object from a string. It takes the string to convert and the format the date is written in.
from datetime import datetime
def str_to_date(string):
    # choose the format from the length of the year part, not of the whole string
    year = string.rsplit('/', 1)[-1]
    pattern = '%m/%d/%Y' if len(year) == 4 else '%m/%d/%y'
    try:
        return datetime.strptime(string, pattern).date()
    except ValueError:
        raise  # TODO: handle invalid input
The function returns a date() object which can be directly compared with other date() objects (e.g. when sorting them).
Usage:
>>> d1 = str_to_date('08/13/2015')
>>> d2 = str_to_date('08/12/15')
>>> d1
datetime.date(2015, 8, 13)
>>> d2
datetime.date(2015, 8, 12)
>>> d1 > d2
True
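Since the question also asks about sorting, here is a minimal sketch of how the parsed dates could serve as a sort key. The helper name parse_date is made up for this illustration; it tries the four-digit-year format first and falls back to the two-digit one.

```python
from datetime import datetime

def parse_date(text):
    """Try the four-digit-year format first, then the two-digit one."""
    for fmt in ('%m/%d/%Y', '%m/%d/%y'):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError('not a recognised date: %r' % text)

raw = ['8/17/15', '08/16/2015']
# Sort the original strings by the date they represent.
ordered = sorted(raw, key=parse_date)
print(ordered)
```

The strings themselves are kept as entered; only the comparison uses the parsed date() objects.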
Update
OP explained in a comment that strings such as 'foo 08/13/2015 bar' should not be automatically thrown away, and that the date should be extracted from them.
To achieve that, we must first search for a candidate string in user's input:
import re
from datetime import date
user_string = input('Enter something')  # use raw_input() in Python 2.x
pattern = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})')  # 4 digits match first!
match = pattern.search(user_string)
if not match:
    d = None
else:
    month, day, year = map(int, match.groups())
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    try:
        d = date(year, month, day)
    except ValueError:
        d = None  # or handle error in a different way
print(d)
The code reads user input and then tries to find a pattern in it that represents a date in MM/DD/YYYY or MM/DD/YY format. Note that the last capturing group (in parentheses, i.e. ()) checks for either four or two consecutive digits.
If it finds a candidate date, it unpacks the capturing groups in the match, converting them to integers at the same time. It then uses the three matched pieces to try to create a new date() object. If that fails, the candidate date was invalid, e.g. '02/31/2015'.
Footnotes:
The code will only catch the first date candidate in the input.
The regular expression will, in its current form, also match dates in inputs like '12308/13/2015123'. If this is not desirable it would have to be modified, probably by adding some lookahead/lookbehind assertions.
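Both footnotes can be addressed together. The following is only a sketch under my own assumptions: the names DATE_RE and find_dates are invented here, two-digit years are assumed to mean 20YY, and invalid candidates are silently skipped.

```python
import re
from datetime import date

# (?<!\d) and (?!\d) reject candidates that are glued to other digits.
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})(?!\d)')

def find_dates(text):
    """Return every valid date found in text, in order of appearance."""
    found = []
    for m in DATE_RE.finditer(text):
        month, day, year = map(int, m.groups())
        if year < 100:
            year += 2000  # assumption: two-digit years mean 20YY
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # e.g. 02/31/2015 is not a real date
    return found

print(find_dates('foo 08/13/2015 bar 8/12/15'))
print(find_dates('12308/13/2015123'))
```

Using finditer instead of search picks up every candidate rather than only the first one.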

You could also try strptime:
import time
dates = ('08/17/15', '8/16/2015')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        ret = time.strptime(date, "%m/%d/%y")
    print(ret)
UPDATE
Update after comments:
This way you will get a valid date back, or None if the date cannot be parsed:
import time
dates = ('08/17/15', '8/16/2015', '02/31/15')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)
UPDATE 2
One more update after the comments about the requirements.
This version only takes care of the dates, not the text before or after them, but that text can easily be extracted using the regex group:
import re
import time
dates = ('foo 1 08/17/15', '8/16/2015 bar 2', 'foo 3 02/31/15 bar 4')
for date in dates:
    print(date)
    match = re.search(r'(?P<date>[0-9]+/[0-9]+/[0-9]+)', date)
    if match is None:
        continue  # no date-like substring in this input
    date_str = match.group('date')
    ret = None
    try:
        ret = time.strptime(date_str, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date_str, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)

Why not use strptime to store them as datetime objects? These objects can easily be compared and sorted.
import datetime
dateList = []
for s in ("08/03/2015", "08/04/15"):
    try:
        date = datetime.datetime.strptime(s, "%m/%d/%Y")
    except ValueError:
        date = datetime.datetime.strptime(s, "%m/%d/%y")
    dateList.append(date)
Note the difference between %Y (four-digit year) and %y (two-digit year). You can then compare dates made this way to see which is earlier or later. You can also sort the list using dateList.sort().
If you want the date as a string again you can use:
>>> dateString = dateList[0].strftime("%Y-%m-%d")
>>> print(dateString)
2015-08-03

Why bother with regex when you can use datetime.strptime?

You can use the date parser from pandas. In current pandas releases it is exposed as pd.to_datetime (the older pd.datetools module has been removed).
>>> import pandas as pd
>>> timestr = ['8/8/95', '8/15/2014']
>>> [pd.to_datetime(d) for d in timestr]
[Timestamp('1995-08-08 00:00:00'), Timestamp('2014-08-15 00:00:00')]

Using regex groups we'd get something like this:
import re
ddate = '08/16/2015'
reg = re.compile(r'(\d+)/(\d+)/(\d+)')
matching = reg.match(ddate)
if matching is not None:
    print(matching.groups())
This would yield
('08', '16', '2015')
You could parse this afterwards, but if you wanted to get rid of leading zeros in the first place you could use
reg = re.compile(r'0*(\d+)/0*(\d+)/(\d+)')

Related

Converting a date string to date format then do subtraction

I am given two dates as strings below, and I want to subtract them to get the number 16 as my output. I tried converting them to date format first and then doing the math, but it didn't work.
from datetime import datetime
date_string = '2021-05-27'
prev_date_string = '2021-05-11'
a = datetime.strptime(date_string '%y/%m/%d')
b = datetime.strptime(prev_date_string '%y/%m/%d')
c = a - b
print (c)
There are two problems with the strptime calls. First, they are missing commas (,) between the two arguments. Second, the format string you use must match the format of the dates you have.
Also, note the result of subtracting two datetime objects is a timedelta object. If you just want to print out the number 16, you'll need to extract the days property of the result:
a = datetime.strptime(date_string, '%Y-%m-%d')
b = datetime.strptime(prev_date_string, '%Y-%m-%d')
c = a-b
print (c.days)
A simpler alternative, if you can construct the dates directly instead of parsing strings:
from datetime import date
a = date(2021, 5, 11)
b = date(2021, 5, 27)
c = b - a
print(c.days)
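Because both input strings are already in ISO format (YYYY-MM-DD), another option is date.fromisoformat, available from Python 3.7 on. A small sketch, with days_between being a name chosen just for this example:

```python
from datetime import date

def days_between(later, earlier):
    """Whole days from earlier to later, both given as 'YYYY-MM-DD' strings."""
    return (date.fromisoformat(later) - date.fromisoformat(earlier)).days

print(days_between('2021-05-27', '2021-05-11'))  # 16
```

This avoids writing a format string at all, at the cost of only accepting ISO-formatted input.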

Adding day and month fields to date field which doesn't contain them

Say we have a list like this:
['1987', '1994-04', '2001-05-03']
We would like to convert these strings into datetime objects with a consistent format, then back into strings, something like this:
['1987-01-01', '1994-04-01', '2001-05-03']
In this case, we have decided if the date doesn't contain a month or a day, we set it to the first of that respective field. Is there a way to achieve this using a datetime or only by string detection?
Read about the datetime module to understand the basics of handling dates and date-like strings in Python.
The approach below, using try-except, is one of many ways to achieve the desired outcome:
from datetime import datetime
strings = ['1987', '1994-04', '2001-05-03']
INPUT_FORMATS = ('%Y', '%Y-%m', '%Y-%m-%d')
OUTPUT_FORMAT = '%Y-%m-%d'
output = []
for s in strings:
    dt = None
    # try each known format until one parses the string
    for f in INPUT_FORMATS:
        try:
            dt = datetime.strptime(s, f)
            break
        except ValueError:
            pass
    if not dt:
        raise ValueError("%s doesn't match any of the formats" % s)
    output.append(dt.date().strftime(OUTPUT_FORMAT))
print(output)
Output:
['1987-01-01', '1994-04-01', '2001-05-03']
The above code works only with the date formats listed in the INPUT_FORMATS tuple. You can add further formats to it if you know them beforehand.
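The question also asks whether this can be done by string detection alone. One possible sketch, assuming the parts are always separated by hyphens and given in year-month-day order (pad_partial_date is a hypothetical helper name): pad the missing parts with '01' and let strptime validate the result.

```python
from datetime import datetime

def pad_partial_date(text):
    """Fill in a missing month and/or day with '01' and validate the result."""
    parts = text.split('-')
    if not 1 <= len(parts) <= 3:
        raise ValueError("unexpected date string: %r" % text)
    parts += ['01'] * (3 - len(parts))
    padded = '-'.join(parts)
    # Round-trip through strptime so invalid values such as '1994-13' are rejected.
    return datetime.strptime(padded, '%Y-%m-%d').strftime('%Y-%m-%d')

print([pad_partial_date(s) for s in ['1987', '1994-04', '2001-05-03']])
```

This is shorter than looping over formats, but it is less flexible if other input layouts turn up later.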

Identify and Extract Date from String - Python

I am looking to identify and extract a date from a number of different strings. The dates may not be formatted the same. I have been using the datefinder package but I am having some issues saving the output.
Goal: Extract the date from a string, which may be formatted in a number of different ways (ie April,22 or 4/22 or 22-Apr etc) and if there is no date, set the value to 'None' and append the date list with either the date or 'None'.
Please see the examples below.
Example 1: (This returns a date, but does not get appended to my list)
import datefinder
extracted_dates = []
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)
Example 2: (This does not return a date, and does not get appended to my list)
import datefinder
extracted_dates = []
sample_text = 'As of the date, there were 28 dogs at the kennel.'
matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)
I tried your package, but there seemed to be no fast and general way of extracting the actual date in your example.
I used the dateparser package instead, specifically its search_dates method.
I have only briefly tested it on your examples.
from dateparser.search import search_dates
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
extracted_dates = []
# Returns a list of tuples of (substring containing the date, datetime.datetime object)
dates = search_dates(sample_text)
if dates is not None:
    for d in dates:
        extracted_dates.append(str(d[1]))
else:
    extracted_dates.append('None')
print(extracted_dates)
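As for why nothing is appended in Example 2 of the question: assuming datefinder.find_dates returns a generator, as its documentation describes, a text without dates yields nothing, so the loop body never runs and the match == None branch is never reached. A library-independent sketch of a fallback (dates_or_none is a hypothetical helper, and the inputs below are stand-ins rather than real datefinder output):

```python
from datetime import datetime

def dates_or_none(matches):
    """Turn an iterable of datetime-like matches into a list of strings,
    falling back to ['None'] when the iterable yields nothing."""
    found = [str(m) for m in matches]
    return found if found else ['None']

# Stand-ins for what a date finder might yield:
print(dates_or_none(iter([datetime(2019, 2, 27)])))
print(dates_or_none(iter([])))
```

With datefinder installed, the call would presumably look like extracted_dates.extend(dates_or_none(datefinder.find_dates(sample_text))).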

How to search for a substring, find the beginning and ending, and then check if that data is a weekday?

I've come up with the following which should be fairly close, but it's not quite right. I am getting the following error when I try to test if the data is a weekday. AttributeError: 'str' object has no attribute 'isoweekday'
Here is my feeble code:
offset = str(link).find('Run:')
amount = offset + 15
pos = str(link)[offset:amount]
if pos.isoweekday() in range(1, 6):
    outF.write(str(link))
    outF.write('\n')
I'm looking for the string 'Run:  ' (it always has 2 blanks after the colon) and then I want to move 15 spaces to the right, to capture the date. So, n-number of spaces to find 'Run:  ' and then get the date, like '2018-12-23' and test if this date is a weekday. If this substring is a weekday, I want to write the entire string to a line in a CSV file (the writing to a CSV file works fine). I'm just not sure how to find that one date (there are several dates in the string; I need the one immediately following 'Run:  ').
You've only forgotten to load it into a datetime object:
from datetime import datetime
# ...
pos_date = datetime.strptime(pos, "%Y-%m-%d")
if pos_date.isoweekday() in range(1, 6):
    # ...
Also, as you are using .isoweekday() and Monday is represented as 1, you don't really need to check the lower boundary:
if pos_date.isoweekday() <= 5:  # Monday..Friday
    # ...
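You may need to convert the substring to a datetime first, making sure it contains only the date:
from datetime import datetime
offset = str(link).find('Run:')
start = offset + 6  # skip 'Run:' and the two blanks after it
pos = str(link)[start:start + 10]  # the 10-character date, e.g. '2018-12-23'
if datetime.strptime(pos, '%Y-%m-%d').isoweekday() in range(1, 6):
    outF.write(str(link))
    outF.write('\n')
Then it should work as expected.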
Let's suppose your link is
link = "Your Link String is Run:  2018-12-21 21:15:48"
Your code correctly finds the offset at which Run: starts:
offset = str(link).find('Run:')
amount = offset + 16
Since there are two spaces after Run:, 16 (not 15) needs to be added to the offset: 6 characters for 'Run:  ' plus 10 for the date.
To extract exactly the date string 2018-12-21, add 6 to the offset, because 'Run:  ' takes up 6 characters before the date string starts.
pos = str(link)[offset + 6:amount]
Now parse the date string into a datetime object with
pos_date = datetime.strptime(pos, "%Y-%m-%d")
Remember to import datetime at the top of your program file:
from datetime import datetime
Now check whether the date is a weekday and display the result:
if pos_date.isoweekday() in range(1, 6):
    print("It's a Week Day!")
This prints It's a Week Day!.
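If counting characters feels fragile, a regular expression can locate the date that follows 'Run:' regardless of how many blanks there are. This is a sketch only; RUN_DATE and run_date_is_weekday are names invented for the example, and a missing or invalid date is treated as "not a weekday".

```python
import re
from datetime import datetime

# Capture the YYYY-MM-DD date that follows 'Run:' and any amount of whitespace.
RUN_DATE = re.compile(r'Run:\s+(\d{4}-\d{2}-\d{2})')

def run_date_is_weekday(text):
    """True if the date right after 'Run:' falls on Monday..Friday."""
    m = RUN_DATE.search(text)
    if m is None:
        return False
    try:
        run_date = datetime.strptime(m.group(1), '%Y-%m-%d')
    except ValueError:
        return False  # looked like a date but is not a valid one
    return run_date.isoweekday() <= 5

print(run_date_is_weekday("Your Link String is Run:  2018-12-21 21:15:48"))
```

In the original loop this would replace the slicing: write the link to the file only when run_date_is_weekday(str(link)) is true.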
link = "something something Run:  2018-12-24 ..."
offset = str(link).find('Run:')
amount = offset + 15  # should be 16
pos = str(link)[offset:amount]  # this is a string
In the example above pos is 'Run:  2018-12-2' (or 'Run:  2018-12-24' with offset + 16), so it does not capture the date exactly.
A string object does not have an isoweekday method, so pos.isoweekday() will raise an error. A datetime.datetime object does have that method.
A solution:
import datetime
link = "something something Run:  2018-12-24 ..."
offset = str(link).find('Run:')  # will only give the index of 'R', so offset will be 20
amount = offset + 16
pos = str(link)[offset:amount]  # pos is 'Run:  2018-12-24'
datestring = pos.split()[1]  # split and capture only the date string
# now convert the string into a datetime object
datelist = datestring.split('-')
date = datetime.datetime(int(datelist[0]), int(datelist[1]), int(datelist[2]))
if date.isoweekday() in range(1, 6):
    ....
This okay..?
Another alternative would be to use dateutil.parser:
from dateutil.parser import parse
try:
    if parse(pos).isoweekday() <= 5:
        ....
except ValueError:
    .....
The advantage here is that parse will accept a wide variety of date formats that datetime.strptime might reject.

Add one day to date (string format) when the input string format is not known

I have the format
day/month/year
And I have a task to define a function that takes a date and returns the date with 1 day increased
Example:
next_day("13/1/2018") returns 14/1/2018
next_day("31/3/2018") returns 1/4/2018
How can I do that, I don't know how to do this when the function takes date not day, month, year.
This is one way using the 3rd party dateutil library and datetime from the standard library.
import datetime
from dateutil import parser
def add_day(x):
    try:
        new = parser.parse(x) + datetime.timedelta(days=1)
    except ValueError:
        new = parser.parse(x, dayfirst=True) + datetime.timedelta(days=1)
    return new.strftime('%d/%m/%Y').lstrip('0').replace('/0', '/')
add_day('13/1/2018')  # '14/1/2018'
add_day('31/3/2018')  # '1/4/2018'
Trying to perform the same logic with datetime will be more restrictive, which is probably not what you want since it's not obvious you can guarantee the format of your input dates.
Explanation
Try parsing sequentially with month first (default), then day first.
Add a day using datetime.timedelta.
Use string formatting to remove leading zeros.
Pure datetime solution
import datetime
def add_day(x):
    try:
        new = datetime.datetime.strptime(x, '%m/%d/%Y') + datetime.timedelta(days=1)
    except ValueError:
        new = datetime.datetime.strptime(x, '%d/%m/%Y') + datetime.timedelta(days=1)
    return new.strftime('%d/%m/%Y').lstrip('0').replace('/0', '/')
add_day('13/1/2018')  # '14/1/2018'
add_day('31/3/2018')  # '1/4/2018'
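Both versions above try month-first before day-first, so an ambiguous input such as '1/4/2018' is read as January 4. If the input really is always day/month/year, as the question states, a stricter sketch avoids that ambiguity (next_day is the function name taken from the question; everything else is my assumption):

```python
from datetime import datetime, timedelta

def next_day(text):
    """Return the day after a 'day/month/year' date, without leading zeros."""
    day = datetime.strptime(text, '%d/%m/%Y').date() + timedelta(days=1)
    return '%d/%d/%d' % (day.day, day.month, day.year)

print(next_day("13/1/2018"))  # 14/1/2018
print(next_day("31/3/2018"))  # 1/4/2018
```

Building the output from the integer fields sidesteps the lstrip/replace trick for removing leading zeros.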
You can try this function to return the current date at least (note that this is Swift, not Python).
import Foundation
extension Date {
    var withWeekDayMonthDayAndYear: String {
        let formatter = DateFormatter()
        formatter.timeZone = TimeZone(abbreviation: "EST")
        formatter.dateFormat = "EEEE, MMMM dd, yyyy"
        return formatter.string(from: self)
    }
}
Then use the extension:
print(Date().withWeekDayMonthDayAndYear)
It's a start.
