Parsing long form dates from string - python

I am aware that there are other solutions to similar problems on stack overflow but they don't work in my particular situation.
I have some strings -- here are some examples of them.
string_with_dates = "random non-date text, 22 May 1945 and 11 June 2004"
string2 = "random non-date text, 01/01/1999 & 11 June 2004"
string3 = "random non-date text, 01/01/1990, June 23 2010"
string4 = "01/2/2010 and 25th of July 2020"
string5 = "random non-date text, 01/02/1990"
string6 = "random non-date text, 01/02/2010 June 10 2010"
I need a parser that can determine how many date-like substrings are in the string and then parse them into a list of actual dates. I can't find any solutions out there. Here is the desired output:
['05/22/1945','06/11/2004']
Or as actual datetime objects. Any ideas?
I have tried the solutions listed here but they don't work. How to parse multiple dates from a block of text in Python (or another language)
Here is what happens when I try the solutions suggested in that link:
import itertools
from dateutil import parser
jumpwords = set(parser.parserinfo.JUMP)
keywords = set(kw.lower() for kw in itertools.chain(
    parser.parserinfo.UTCZONE,
    parser.parserinfo.PERTAIN,
    (x for s in parser.parserinfo.WEEKDAYS for x in s),
    (x for s in parser.parserinfo.MONTHS for x in s),
    (x for s in parser.parserinfo.HMS for x in s),
    (x for s in parser.parserinfo.AMPM for x in s),
))
def parse_multiple(s):
    def is_valid_kw(s):
        try:  # is it a number?
            float(s)
            return True
        except ValueError:
            return s.lower() in keywords
    def _split(s):
        kw_found = False
        tokens = parser._timelex.split(s)
        for i in xrange(len(tokens)):
            if tokens[i] in jumpwords:
                continue
            if not kw_found and is_valid_kw(tokens[i]):
                kw_found = True
                start = i
            elif kw_found and not is_valid_kw(tokens[i]):
                kw_found = False
                yield "".join(tokens[start:i])
        # handle date at end of input str
        if kw_found:
            yield "".join(tokens[start:])
    return [parser.parse(x) for x in _split(s)]
parse_multiple(string_with_dates)
Output:
ParserError: Unknown string format: 22 May 1945 and 11 June 2004
Another method:
from dateutil.parser import _timelex, parser
a = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
p = parser()
info = p.info
def timetoken(token):
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))
def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue
            batch.append(token)
        else:
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)
for item in timesplit(string_with_dates):
    print "Found:", (item)
    print "Parsed:", p.parse(item)
Output:
ParserError: Unknown string format: 22 May 1945 11 June 2004
Any ideas?

Okay sorry to anyone who spent time on this -- but I was able to answer my own question. Leaving this up in case anyone else has the same issue.
This package was able to work perfectly: https://pypi.org/project/datefinder/
import datefinder
def DatesToList(x):
    # find_dates yields datetime objects; collect them into a list
    dates = datefinder.find_dates(x)
    lists = []
    for date in dates:
        lists.append(date)
    return lists
dates = DatesToList(string_with_dates)
Output:
[datetime.datetime(1945, 5, 22, 0, 0), datetime.datetime(2004, 6, 11, 0, 0)]
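If adding a dependency is not an option, the same idea can be roughly sketched with the standard library only. This is not how datefinder works internally; it is a minimal sketch that assumes just three date shapes (day month year, month day year, and slashed dates read as month/day/year, which is what the desired output above suggests) and English month names. The names find_dates, MONTHS and DATE_RE are made up for this example.

```python
import re
from datetime import datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
MONTH = "(?:" + "|".join(MONTHS) + ")"
SUFFIX = r"(?:st|nd|rd|th)?"

# "22 May 1945", "25th of July 2020"
DAY_MONTH_YEAR = (r"\b(?P<d1>\d{1,2})" + SUFFIX + r"\s+(?:of\s+)?(?P<m1>"
                  + MONTH + r")\s+(?P<y1>\d{4})\b")
# "June 23 2010", "June 23rd, 2010"
MONTH_DAY_YEAR = (r"\b(?P<m2>" + MONTH + r")\s+(?P<d2>\d{1,2})" + SUFFIX
                  + r",?\s+(?P<y2>\d{4})\b")
# "01/01/1999", assumed to mean month/day/year
SLASHED = r"\b(?P<m3>\d{1,2})/(?P<d3>\d{1,2})/(?P<y3>\d{4})\b"

DATE_RE = re.compile("|".join([DAY_MONTH_YEAR, MONTH_DAY_YEAR, SLASHED]),
                     re.IGNORECASE)


def find_dates(text):
    """Return a list of datetime objects for every supported date in text."""
    found = []
    for m in DATE_RE.finditer(text):
        g = m.groupdict()
        if g["m1"]:
            y, mo, d = g["y1"], MONTHS.index(g["m1"].lower()) + 1, g["d1"]
        elif g["m2"]:
            y, mo, d = g["y2"], MONTHS.index(g["m2"].lower()) + 1, g["d2"]
        else:
            y, mo, d = g["y3"], g["m3"], g["d3"]
        try:
            found.append(datetime(int(y), int(mo), int(d)))
        except ValueError:
            pass  # looked like a date but is not one, e.g. 31 February
    return found
```

Calling [d.strftime("%m/%d/%Y") for d in find_dates(string_with_dates)] then gives the string form asked for in the question. Anything outside the three assumed shapes (abbreviated months, two-digit years, ISO dates) is silently ignored, which is where a dedicated package earns its keep.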

Related

How to limit the input in Python for only dates?

I'm currently working on my project.
I want the user to only be allowed to input a date (e.g. January 2). If they enter anything other than a date, a message like "This is not a date, try again" should appear repeatedly until a real date is given. How do I do this?
My initial idea was to create a .txt file where I write all 365 dates and then somehow make the code accept only a string that matches one of the entries in the file, else try again.
I would really appreciate your help.
Use dateutil.parser to handle dates of arbitrary formats.
Code
import dateutil.parser
def valid_date(date_string):
    try:
        date = dateutil.parser.parse(date_string)
        return True
    except ValueError:
        return False
Test
for date in ['Somestring', 'Feb 20, 2021', 'Feb 20', 'Feb 30, 2021', 'January 25, 2011', '1/15/2020']:
    print(f'Valid date {date}: {valid_date(date)}')
Output
Valid date Somestring: False # detects non-date strings
Valid date Feb 20, 2021: True
Valid date Feb 20: True
Valid date Feb 30, 2021: False # Recognizes Feb 30 as invalid
Valid date January 25, 2011: True
Valid date 1/15/2020: True # Handles different formats
There is no need to store all possible valid dates in a file.
Use datetime.strptime() to parse a string (entered by the user) into a datetime object according to a specific format.
strptime will raise a ValueError if the input does not adhere to the pattern, so you can catch that exception and tell the user to try again.
Wrap it all in a while loop to make it work forever, until the user gets it right.
You can start with this:
from datetime import datetime
pattern = '%B %d, %Y' # e.g. January 2, 2021
inp = ''
date = None
while date is None:
    inp = input('Please enter a date: ')
    try:
        date = datetime.strptime(inp, pattern)
        break
    except ValueError:
        print(f'"{inp}" is not a valid date.')
        continue
For a full list of the %-codes that strptime supports, check out the Python docs.
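The question's own example ("January 2") has no year, so here is a hedged variant of the same loop for that case. It is only a sketch: the function names parse_month_day and ask_for_date are invented for this example, and appending the leap year 2000 before parsing is an assumption made so that "February 29" is accepted. Month names are matched according to the current locale, which is assumed to be English.

```python
from datetime import datetime


def parse_month_day(text):
    """Return (month, day) for input like 'January 2', or None if it is not a date."""
    try:
        # Append a leap year so that 'February 29' is accepted.
        parsed = datetime.strptime(text.strip() + " 2000", "%B %d %Y")
    except ValueError:
        return None
    return parsed.month, parsed.day


def ask_for_date(read=input, write=print):
    """Keep asking until the user enters something parse_month_day accepts."""
    while True:
        result = parse_month_day(read("Please enter a date: "))
        if result is not None:
            return result
        write("This is not a date, try again")
```

Passing read and write as parameters is just a convenience so the loop can be exercised without a keyboard; calling ask_for_date() with no arguments uses input and print.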
Here are several simple ways to verify a date. None of them is a strict check; pick one of the methods and add the detailed checks you need yourself.
Use date(year, month, day)
from datetime import date
def isValidDate(year, month, day):
    try:
        date(year, month, day)
    except (TypeError, ValueError):
        return False
    else:
        return True
Use date.fromisoformat()
from datetime import date
def isValidDate(datestr):
    try:
        date.fromisoformat(datestr)
    except (TypeError, ValueError):
        return False
    else:
        return True
Use strptime
from datetime import datetime
def check_date(i):
    # accepted formats: year-month-day, or year followed by month
    valids = ['%Y-%m-%d', '%Y%m']
    for valid in valids:
        try:
            return datetime.strptime(i, valid)
        except ValueError:
            pass
    return False
Use regex
import re
def check_date(s):
    # checks only the YYYY-MM-DD shape, not whether the date exists
    reg = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')
    return reg.match(s) is not None

How can I adjust 'the time' in python with module Re

This is a fun one.
I am trying to extract a valid date from some strings.
I use try-except blocks and the re module,
but something is wrong in my code: it can't deal with some tricky inputs.
As shown below, when I input a ridiculous date such as 1997-25-52 or 1996-42-120,
it still outputs an answer.
def regular_time(time):
    """
    Some movie dates include a country, e.g. '1994-09-10(加拿大)' ("Canada").
    Extract the date with a regular expression.
    """
    import re
    pattern = r'^(([1-2]\d{3})-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]))'
    try:
        matches = re.match(pattern, time, flags=0).group()
        return matches
    except Exception as e:
        try:
            pattern = r'^(([1-2]\d{3})-(0[1-9]|1[0-2]))'
            matches = re.match(pattern, time, flags=0).group()+'-01'
            return matches
        except:
            try:
                pattern = r'^(([1-2]\d{3}))'
                matches = re.match(pattern, time, flags=0).group() + '-01-01'
                return matches
            except:
                print('errors')
time='1996-12-58'
regular_time(time)
How can I deal with this problem? Many thanks if you could do me a favor
Question: Default date from invalid datestring
Using datetime handles also leap years!
datetime.datetime.strptime
datetime.date.strftime
For example:
import re
from datetime import datetime
def regular_time(time):
    _t = time.split('-')
    # always 3 items
    while len(_t) < 3:
        _t.append('01')
    # year, month and day ranges, each with the default used when out of range
    ymd = [(range(1900, 2099), '1900'),
           (range(1, 13), '01'),
           (range(1, 32), '01')
           ]
    # validate ranges
    for n in range(3):
        if not int(_t[n]) in ymd[n][0]:
            _t[n] = ymd[n][1]
    _time = '-'.join(_t)
    try:
        date = datetime.strptime(_time, '%Y-%m-%d')
        print('VALID:{} => {}'
              .format(time, date.strftime('%Y-%m-%d')))
    except ValueError as e:
        # compare against the message text; an exception object is not iterable
        if "day is out of range for month" in str(e):
            print('{} for {}, change to 01'.format(e, time))
            _t[2] = '01'
            regular_time('-'.join(_t))
        else:
            print('INVALID[{}]:{}'.format(_time, e))
for time in ['1996', '1996-18', '2019-09-31', '2019-01-31',
             '1996-12-58', '1997-25-52', '1996-42-120']:
    regular_time(time)
Output:
VALID:1996 => 1996-01-01
VALID:1996-18 => 1996-01-01
day is out of range for month for 2019-09-31, change to 01
VALID:2019-09-01 => 2019-09-01
VALID:2019-01-31 => 2019-01-31
VALID:1996-12-58 => 1996-12-01
VALID:1997-25-52 => 1997-01-01
VALID:1996-42-120 => 1996-01-01
Tested with Python 3.6
Your test case returns "1996-12-01" because it falls through to the second-level try-except: the first pattern fails since the day is unrealistic, but the string still matches the year-and-month pattern, so the code simply appends "-01" and returns the first day of the month.
If you want every part of the date to be realistic, don't overwrite the original pattern with a looser one; let the input fail at the first step.
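One way to read "fail at the first step" is sketched below. It is an illustration, not the asker's code: the name strict_regular_time and the pattern DATE_PREFIX are hypothetical. The regex only splits the leading year, month and day apart, and datetime.date decides whether the combination is real. Missing parts still default to 01, but a part that is present and impossible makes the whole input fail instead of being silently replaced.

```python
import re
from datetime import date

# Year, then optional month and day; trailing text such as "(加拿大)" is ignored.
DATE_PREFIX = re.compile(r"^(\d{4})(?:-(\d{1,3}))?(?:-(\d{1,3}))?")


def strict_regular_time(text):
    """Return 'YYYY-MM-DD' for a valid leading date, or None otherwise."""
    m = DATE_PREFIX.match(text)
    if m is None:
        return None
    # Parts that are absent default to 1; parts that are present are kept as-is.
    year, month, day = (int(p) if p else 1 for p in m.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # e.g. month 25 or day 58


for sample in ["1994-09-10(加拿大)", "1996", "1996-12", "1996-12-58", "1997-25-52"]:
    print(sample, "=>", strict_regular_time(sample))
```

Because date() also knows month lengths and leap years, '2019-09-31' is rejected as well, which the original regex alone would accept.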

How to convert a downloaded string to datetime format?

I am trying to check if today's date < date downloaded from text file online. Here is my code :
import datetime
import requests
URL = "http://directlinktotextfile.com/text.txt"
result = requests.get(URL)
today = datetime.datetime.now().date()
Url_date = result.text
Url_date.strip()
Url_date = datetime.date(Url_date)
if today < Url_date :
    print "Today is less than future date"
    raw_input()
else:
    print "Today is greater than or = to future date"
    raw_input()
The result that comes back is just this : 2018,02,14. I use .strip() in case there might be blank spaces or extra lines. I've printed out result.text after strip() and it shows the correct details. Why is it that I can't check if today < Url_date. It works fine if I enter manually a date into datetime.date(2018,02,14), but when I'm downloading the string it won't work. Any suggestions?
You are passing a single string to datetime.date(), which expects three separate integers (year, month, day).
Url_list = []
Url_list = Url_date.split(",")
yr = int(Url_list[0])
mn = int(Url_list[1])
d = int(Url_list[2])
Now pass these integers to datetime.date
Url_date = datetime.date(yr, mn, d)
The arguments you pass to datetime.date(arg1, arg2, arg3) are not strings as a whole. When you pass it from url, what you are actually doing is
datetime.date("2018,2,14")
Note that you are passing only one string argument and not 3 different integers. You should split the date string using comma and then convert each into integers and then pass them as arguments to datetime.date.
Here is what your code is trying to do:
Url_date = datetime.date("2018,02,14")
But what you want is:
Url_date = datetime.date(2018,02,14)
Do
Url_date.split(',') # Result: ['2018','02','14']
And then convert all the strings in the resulting list to integers.
It should be ok :)
Use strptime:
import datetime
today = datetime.datetime.now().date()
parsed = datetime.datetime.strptime("2018,02,14", "%Y,%m,%d").date()
print(today < parsed) # True only while today is before 2018-02-14
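A small sketch of how this fits the downloaded text, with the network call left out. The helper names parse_downloaded_date and is_before are made up for illustration, and the "%Y,%m,%d" format assumes the file really contains text like 2018,02,14. Note that str.strip() returns a new string rather than changing the original, so its result has to be used.

```python
from datetime import date, datetime


def parse_downloaded_date(text):
    """Turn text such as '2018,02,14\\n' into a datetime.date."""
    # strip() returns a new string; strings are never modified in place
    cleaned = text.strip()
    return datetime.strptime(cleaned, "%Y,%m,%d").date()


def is_before(today, downloaded_text):
    """True if today is strictly earlier than the downloaded date."""
    return today < parse_downloaded_date(downloaded_text)
```

In the question's script this would be called as is_before(datetime.now().date(), result.text).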

Date conversion from numbers to words in Python

I wrote a program in python(I'm a beginner still) that converts a date from numbers to words using dictionaries. The program is like this:
dictionary_1 = { 1:'first', 2:'second'...}
dictionary_2 = { 1:'January', 2:'February',...}
and three other more for tens, hundreds, thousands;
2 functions, one for years <1000, the other for years >1000;
an algorithm that verifies if it's a valid date.
In main I have:
a_random_date = raw_input("Enter a date: ")
(I've chosen raw_input for special chars. between numbers such as: 21/11/2014 or 21-11-2014 or 21.11.2014, only these three) and after verifying if it's a valid date I do not know nor did I find how to call upon the dictionaries to convert the date into words, when I run the program I want at the output for example if I typed 1/1/2015: first/January/two thousand fifteen.
And I would like to apply the program to a text document to seek the dates and convert them from numbers to words if it is possible.
Thank you!
You can split the date into a list and then look each part up in the matching dictionary, like this:
import re
dictionary_1 = { 1:'first', 2:'second'}
dictionary_2 = { 1:'January', 2:'February'}
dictionary_3 = { 1996:'asd', 1995:'asd1'}
input1 = raw_input("Enter date:")
lista = re.split(r'[.\/-]', input1)
print "lista: ", lista
day = lista[0]
month = lista[1]
year = lista[2]
everything_ok = False
if dictionary_1.get(int(day)) != None:
    day_print = dictionary_1.get(int(day))
    everything_ok = True
else:
    print "There is no such day"
if dictionary_2.get(int(month)) != None:
    month_print = dictionary_2.get(int(month))
    everything_ok = True
else:
    print "There is no such month"
    everything_ok = False
if dictionary_3.get(int(year)) != None:
    year_print = dictionary_3.get(int(year))
    everything_ok = True
else:
    print "There is no such year"
    everything_ok = False
if everything_ok == True:
    print "Date: ", day_print, "/", month_print, "/", year_print #or whatever format
else:
    pass
This is the output:
Enter date:1/2/1996
Date: first / February / asd
I hope this helps you.
Eventually you will need the re module. Learn to write a regular expression that can search for strings of a particular format. Here's a code example:
import re
with open("mydocument.txt") as f:
    contents = f.read()
fi = re.finditer(r"\d{1,2}-\d{1,2}-\d{4}", contents)
This will find all strings that are made up of 1 or 2 digits followed by a hyphen, followed by another 1 or 2 digits followed by a hyphen, followed by 4 digits. Then, you feed each string into datetime.strptime; it will parse your "date" string and decide if it is valid according to your specified format.
Have fun!
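To make the find-then-validate step concrete, here is a minimal sketch that works on a string instead of a file. The function name find_valid_dates is invented for this example, and it assumes day-month-year order with one of the three separators mentioned in the question (/, - or .), used consistently within a date.

```python
import re
from datetime import datetime

# day, separator, month, the same separator again, four-digit year
DATE_RE = re.compile(r"\b(\d{1,2})([-/.])(\d{1,2})\2(\d{4})\b")


def find_valid_dates(text):
    """Return datetime.date objects for every real date found in text."""
    dates = []
    for m in DATE_RE.finditer(text):
        day, _sep, month, year = m.groups()
        try:
            # strptime rejects impossible dates such as 31-02-2015
            dates.append(datetime.strptime(f"{day}-{month}-{year}", "%d-%m-%Y").date())
        except ValueError:
            pass
    return dates


print(find_valid_dates("Due 21/11/2014, not 31-02-2015 or 1.1.2015."))
```

Each returned date exposes .day, .month and .year as integers, which can then be used as keys into the word dictionaries from the question.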

parse multiple dates using dateutil

I am trying to parse multiple dates from a string in Python with the help of this code,
from dateutil.parser import _timelex, parser
a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info
def timetoken(token):
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))
def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue
            batch.append(token)
        else:
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)
for item in timesplit(a):
    print "Found:", item
    print "Parsed:", p.parse(item)
but the code takes "second" (from "second half") as a date and gives me this error:
raise ValueError, "unknown string format"
ValueError: unknown string format
When I change 'second half' to 'third half' or 'fourth half' it works fine.
Can anyone help me parse this string?
The parser can't handle the "second" that timesplit yields (timetoken accepts it because info.hms treats "second" as a time unit). If you set the fuzzy parameter to True it no longer raises, but it doesn't produce anything meaningful either.
from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    print "Parsed:", p.parse(StringIO(item),fuzzy=True)
out:
Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: 2013-01-11 00:00:00
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
You have to fix the timesplitting or handle the errors:
opt1:
lose the info.hms from timetoken
opt2:
from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    try:
        print "Parsed:", p.parse(StringIO(item))
    except ValueError:
        print 'Not Parsed!'
out:
Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Not Parsed!
Parsed: Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
If you only need the dates, you can extract them with a regex and then work with the matched strings.
a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
import re
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
pattern.findall(a)
['12/10/2012', '20/10/2012']
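The matched strings can then be turned into datetime objects with strptime. This is a sketch under one assumption: the dates are day/month/year, which 20/10/2012 implies since there is no month 20. The function name extract_dates is made up for the example.

```python
import re
from datetime import datetime


def extract_dates(text):
    """Find dd/mm/yyyy substrings and parse them as day-first dates."""
    matches = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)
    return [datetime.strptime(s, "%d/%m/%Y") for s in matches]


a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
for d in extract_dates(a):
    print(d.date())
```

Giving the format explicitly also avoids the ambiguity visible in the dateutil output earlier in this thread, where 12/10/2012 was read month-first as 2012-12-10.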
