This is a funny question.
I am trying to extract a valid date from some strings.
I use try/except and the re module,
but something is wrong in my code: it can't deal with some tough inputs.
As shown below, when I input a ridiculous date such as 1997-25-52 or 1996-42-120,
it still outputs an answer.
def regular_time(time):
    """
    Some movie dates include a country, e.g. '1994-09-10(加拿大)' (the suffix means "Canada").
    Extract the date with a regular expression.
    """
    import re
    pattern = r'^(([1-2]\d{3})-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]))'
    try:
        matches = re.match(pattern, time, flags=0).group()
        return matches
    except Exception as e:
        try:
            pattern = r'^(([1-2]\d{3})-(0[1-9]|1[0-2]))'
            matches = re.match(pattern, time, flags=0).group()+'-01'
            return matches
        except:
            try:
                pattern = r'^(([1-2]\d{3}))'
                matches = re.match(pattern, time, flags=0).group() + '-01-01'
                return matches
            except:
                print('errors')

time='1996-12-58'
regular_time(time)
How can I deal with this problem? Many thanks for any help.
Question: Default date from invalid datestring
Using datetime also handles leap years. The relevant functions are:
datetime.datetime.strptime
datetime.date.strftime
For example:
import re
from datetime import datetime

def regular_time(time):
    _t = time.split('-')
    # always 3 items
    while len(_t) < 3:
        _t.append('01')

    # year, month and day ranges, each with its default value
    ymd = [(range(1900, 2099), '1900'),
           (range(1, 13), '01'),
           (range(1, 32), '01')
           ]

    # validate ranges
    for n in range(3):
        if not int(_t[n]) in ymd[n][0]:
            _t[n] = ymd[n][1]

    _time = '-'.join(_t)
    try:
        date = datetime.strptime(_time, '%Y-%m-%d')
        print('VALID:{} => {}'
              .format(time, date.strftime('%Y-%m-%d')))
    except ValueError as e:
        # the exception object is not iterable, so test its message text
        if "day is out of range for month" in str(e):
            print('{} for {}, change to 01'.format(e, time))
            _t[2] = '01'
            regular_time('-'.join(_t))
        else:
            print('INVALID[{}]:{}'.format(_time, e))

for time in ['1996', '1996-18', '2019-09-31', '2019-01-31',
             '1996-12-58', '1997-25-52', '1996-42-120']:
    regular_time(time)
Output:
VALID:1996 => 1996-01-01
VALID:1996-18 => 1996-01-01
day is out of range for month for 2019-09-31, change to 01
VALID:2019-09-01 => 2019-09-01
VALID:2019-01-31 => 2019-01-31
VALID:1996-12-58 => 1996-12-01
VALID:1997-25-52 => 1997-01-01
VALID:1996-42-120 => 1996-01-01
Tested with Python 3.6
Your test case returns "1996-12-01" because it falls through to the second-level try/except: the first pattern fails because the day is unrealistic, but the second pattern matches the valid year and month, so the code simply appends "-01" and returns the first day of that month.
If you want every part of the date to be realistic, don't fall back to a looser pattern; fail at the first step instead.
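Failing at the first step can be sketched by letting datetime do the range checking instead of the regex. This is only a minimal sketch: the helper name strict_date is hypothetical, and it assumes that an unrealistic date should yield None rather than a default.

```python
import re
from datetime import datetime

# Loose pattern: only locates a leading YYYY[-MM[-DD]] candidate, no range checks.
_DATE_RE = re.compile(r'^(\d{4})(?:-(\d{1,3}))?(?:-(\d{1,3}))?')


def strict_date(text):
    """Return 'YYYY-MM-DD' for a realistic leading date, else None.

    A missing month or day defaults to '01'; out-of-range parts are rejected.
    """
    match = _DATE_RE.match(text)
    if match is None:
        return None
    year, month, day = match.groups(default='01')
    try:
        # strptime rejects month 25, day 58, Feb 29 in a non-leap year, etc.
        parsed = datetime.strptime('{}-{}-{}'.format(year, month, day), '%Y-%m-%d')
    except ValueError:
        return None
    return parsed.strftime('%Y-%m-%d')
```

With this sketch, '1996-12-58', '1997-25-52' and '1996-42-120' all give None, while '1996' still becomes '1996-01-01'.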
Related
I am aware that there are other solutions to similar problems on stack overflow but they don't work in my particular situation.
I have some strings -- here are some examples of them.
string_with_dates = "random non-date text, 22 May 1945 and 11 June 2004"
string2 = "random non-date text, 01/01/1999 & 11 June 2004"
string3 = "random non-date text, 01/01/1990, June 23 2010"
string4 = "01/2/2010 and 25th of July 2020"
string5 = "random non-date text, 01/02/1990"
string6 = "random non-date text, 01/02/2010 June 10 2010"
I need a parser that can determine how many date-like objects are in the string and then parse them into a list of actual dates. I can't find any solutions out there. Here is the desired output:
['05/22/1945','06/11/2004']
Or as actual datetime objects. Any ideas?
I have tried the solutions listed here, but they don't work: How to parse multiple dates from a block of text in Python (or another language)
Here is what happens when I try the solutions suggested in that link:
import itertools
from dateutil import parser

jumpwords = set(parser.parserinfo.JUMP)
keywords = set(kw.lower() for kw in itertools.chain(
    parser.parserinfo.UTCZONE,
    parser.parserinfo.PERTAIN,
    (x for s in parser.parserinfo.WEEKDAYS for x in s),
    (x for s in parser.parserinfo.MONTHS for x in s),
    (x for s in parser.parserinfo.HMS for x in s),
    (x for s in parser.parserinfo.AMPM for x in s),
))

def parse_multiple(s):
    def is_valid_kw(s):
        try:  # is it a number?
            float(s)
            return True
        except ValueError:
            return s.lower() in keywords

    def _split(s):
        kw_found = False
        tokens = parser._timelex.split(s)
        for i in xrange(len(tokens)):
            if tokens[i] in jumpwords:
                continue
            if not kw_found and is_valid_kw(tokens[i]):
                kw_found = True
                start = i
            elif kw_found and not is_valid_kw(tokens[i]):
                kw_found = False
                yield "".join(tokens[start:i])
        # handle date at end of input str
        if kw_found:
            yield "".join(tokens[start:])

    return [parser.parse(x) for x in _split(s)]

parse_multiple(string_with_dates)
Output:
ParserError: Unknown string format: 22 May 1945 and 11 June 2004
Another method:
from dateutil.parser import _timelex, parser

a = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
p = parser()
info = p.info

def timetoken(token):
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))

def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue
            batch.append(token)
        else:
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)

for item in timesplit(string_with_dates):
    print "Found:", (item)
    print "Parsed:", p.parse(item)
Output:
ParserError: Unknown string format: 22 May 1945 11 June 2004
Any ideas?
Okay, sorry to anyone who spent time on this, but I was able to answer my own question. Leaving this up in case anyone else has the same issue.
This package worked perfectly: https://pypi.org/project/datefinder/
import datefinder

def DatesToList(x):
    dates = datefinder.find_dates(x)
    lists = []
    for date in dates:
        lists.append(date)
    return lists

dates = DatesToList(string_with_dates)
Output:
[datetime.datetime(1945, 5, 22, 0, 0), datetime.datetime(2004, 6, 11, 0, 0)]
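If installing a package is not an option, a rough standard-library approximation is possible for the simpler formats in the question. The sketch below is not how datefinder works internally. It assumes English month names, month-first order for the slash format, and only three layouts ("22 May 1945", "June 23 2010", "01/01/1999"); phrases such as "25th of July 2020" are not covered. The name find_dates_stdlib is hypothetical.

```python
import re
from datetime import datetime

_MONTH = (r'(?:January|February|March|April|May|June|July|August'
          r'|September|October|November|December)')

# One alternative per supported layout (assumption: only these three).
_CANDIDATE = re.compile('|'.join([
    r'\d{1,2}/\d{1,2}/\d{4}',            # 01/01/1999
    r'\d{1,2} ' + _MONTH + r' \d{4}',    # 22 May 1945
    _MONTH + r' \d{1,2} \d{4}',          # June 23 2010
]))

# strptime formats tried in order for every candidate substring.
_FORMATS = ('%m/%d/%Y', '%d %B %Y', '%B %d %Y')


def find_dates_stdlib(text):
    """Return a list of datetime objects for the date-like substrings of text."""
    found = []
    for candidate in _CANDIDATE.findall(text):
        for fmt in _FORMATS:
            try:
                found.append(datetime.strptime(candidate, fmt))
                break
            except ValueError:
                continue  # wrong layout or impossible date, try the next format
    return found
```

Calling strftime('%m/%d/%Y') on each result gives the string form asked for in the question.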
I'm currently working on my project.
I want the user to be allowed to input only a date (e.g. January 2). If they enter anything other than a date, a message like "This is not a date, try again" should appear repeatedly until a real date is given. How do I do this?
My initial idea was to create a .txt file where I write all 365 dates and then somehow require that the user's input matches one of the entries in the file, or else try again.
I would really appreciate your help.
Use dateutil.parser to handle dates of arbitrary formats.
Code
import dateutil.parser

def valid_date(date_string):
    try:
        date = dateutil.parser.parse(date_string)
        return True
    except ValueError:
        return False
Test
for date in ['Somestring', 'Feb 20, 2021', 'Feb 20', 'Feb 30, 2021', 'January 25, 2011', '1/15/2020']:
    print(f'Valid date {date}: {valid_date(date)}')
Output
Valid date Somestring: False # detects non-date strings
Valid date Feb 20, 2021: True
Valid date Feb 20: True
Valid date Feb 30, 2021: False # Recognizes Feb 30 as invalid
Valid date January 25, 2011: True
Valid date 1/15/2020: True # Handles different formats
There is no need to store all possible valid dates in a file.
Use datetime.strptime() to parse a string (entered by the user) into a datetime object according to a specific format.
strptime will raise an exception if the input does not adhere to the pattern, so you can catch that exception and tell the user to try again.
Wrap it all in a while loop to make it work forever, until the user gets it right.
You can start with this:
from datetime import datetime

pattern = '%B %d, %Y'  # e.g. January 2, 2021
inp = ''
date = None
while date is None:
    inp = input('Please enter a date: ')
    try:
        date = datetime.strptime(inp, pattern)
        break
    except ValueError:
        print(f'"{inp}" is not a valid date.')
        continue
For a full list of the %-codes that strptime supports, check out the Python docs.
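The question's example ("January 2") has no year, so here is a hedged variant of the loop above for month-and-day input. It is a sketch with hypothetical helper names (parse_month_day, ask_for_date). It assumes English month names and appends the leap year 2000 before parsing so that "February 29" counts as a real date. The read parameter exists only so the loop can be exercised without a keyboard.

```python
from datetime import datetime


def parse_month_day(text):
    """Return (month, day) for input like 'January 2', or None if it is not a date."""
    try:
        # Append a leap year so that 'February 29' is accepted.
        parsed = datetime.strptime(text.strip() + ' 2000', '%B %d %Y')
    except ValueError:
        return None
    return parsed.month, parsed.day


def ask_for_date(read=input):
    """Keep asking until the user enters a valid month and day."""
    while True:
        result = parse_month_day(read('Please enter a date (e.g. January 2): '))
        if result is not None:
            return result
        print('This is not a date, try again')
```

Calling ask_for_date() with no arguments reads from the keyboard; passing a function such as lambda prompt: 'March 15' makes it easy to test.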
Here are several ways to validate a date. These are simple implementations without strict checks; pick one of them and add more detailed checks yourself.
Use date(year, month, day)
from datetime import date

def isValidDate(year, month, day):
    try:
        date(year, month, day)
    except (ValueError, TypeError):
        return False
    else:
        return True
Use date.fromisoformat() (Python 3.7+)
from datetime import date

def isValidDate(datestr):
    try:
        date.fromisoformat(datestr)
    except (ValueError, TypeError):
        return False
    else:
        return True
Use strptime
from datetime import datetime

def check_date(i):
    # accepted formats; %m is the month (%M would be minutes)
    valids = ['%Y-%m-%d', '%Y%m']
    for valid in valids:
        try:
            return datetime.strptime(i, valid)
        except ValueError:
            pass
    return False
Use regex (this checks only the shape of the string, not whether the date exists)
import re

def check_date(s):
    return re.match(r'^(\d{4})-(\d{2})-(\d{2})$', s) is not None
Say for example, I have the following strings and an input 4.0, which represents seconds:
John Time Made 11:05:20 in 2010
5.001 Kelly #1
6.005 Josh #8
And would like the following result:
John Time Made 11:05:24 in 2010 #Input 4.0 is added to the seconds of 11:05:20
1.001 Kelly #1 #4.0 is subtracted from the first number 5.001 = 1.001
2.005 Josh #8 #4.0 is subtracted from the first number 6.005 = 2.005
How can I recognize the hours:minutes:seconds in the first line, and #.### in the rest to add/subtract the input number?
Thank you in advance and will accept/upvote answer
This solution should work if your complete data has the same format as this particular sample you provided. You should have the data in the input.txt file.
val_to_add = 4

with open('input.txt') as fin:
    # processing first line
    first_line = fin.readline().strip()
    splitted = first_line.split(' ')

    # get hour, minute, second corresponding to time (11:05:20)
    time_values = splitted[3].split(':')

    # seconds is the last element
    seconds = int(time_values[-1])

    # add the value
    new_seconds = seconds + val_to_add

    # simple math to avoid values >= 60 for minutes and seconds
    # (this could probably be done with datetime or another lib, but it only takes a couple of lines)
    seconds = new_seconds % 60  # above 59 seconds, keep only the remainder; the rest carries over to minutes
    new_minutes = int(time_values[1]) + new_seconds // 60  # add the minutes carried over from the seconds
    minutes = new_minutes % 60  # same as for seconds
    hours = int(time_values[0]) + new_minutes // 60

    # convert back to strings so that join works (it operates only on strings), zero-padding 1-digit numbers
    time_values[0] = str(hours).rjust(2, '0')
    time_values[1] = str(minutes).rjust(2, '0')
    time_values[2] = str(seconds).rjust(2, '0')

    new_time_val = ':'.join(time_values)  # join the values to follow the HH:MM:SS format
    splitted[3] = new_time_val  # replace the old time with the new one (with the value added)
    first_line_modified = ' '.join(splitted)  # just join the modified list
    print(first_line_modified)

    # processing other lines
    for line in fin:
        # take the first (0th) value, subtract val_to_add and round to 3 digits (to avoid too many decimal places)
        stripped = line.strip()
        splitted = stripped.split(' ')
        splitted[0] = str(round(float(splitted[0]) - val_to_add, 3))
        modified_line = ' '.join(splitted)
        print(modified_line)
Although regex was discouraged in the comments, regex can be used to parse the time objects into datetime.time objects, perform the necessary calculations on them, then print them in the required format:
# datetime module for time calculations
import datetime
# regex module
import re

# seconds to add to time
myinp = 4

# List of data strings
# data = 'John Time Made 11:05:20 in 2010', '5.001 Kelly', '6.005 Josh'
with open('data.txt') as f:
    data = f.readlines()

new_data = []
# iterate through the list of data strings
for time in data:
    try:
        # First check for 'HH:MM:SS' time format in data string
        # regex taken from this question: http://stackoverflow.com/questions/8318236/regex-pattern-for-hhmmss-time-string
        match = re.findall(r"([0-1]?\d|2[0-3]):([0-5]?\d):([0-5]?\d)", time)
        # this regex returns a list of tuples of strings "[('HH', 'MM', 'SS')]",
        # which we join back together with ':' (colon) separators
        t = ':'.join(match[0])
        # create a datetime object from the first matched time in the list,
        # taken from this answer http://stackoverflow.com/questions/100210/what-is-the-standard-way-to-add-n-seconds-to-datetime-time-in-python
        # Indexing an empty match list raises an IndexError, which we catch in the `except` clause below
        orig = datetime.datetime(100,1,1,int(match[0][0]), int(match[0][1]), int(match[0][2]))
        # Add the number of seconds to the datetime object,
        # taken from this answer: http://stackoverflow.com/questions/656297/python-time-timedelta-equivalent
        newtime = (orig + datetime.timedelta(0, myinp)).time()
        # replace the time in the original data string with the newtime and store it
        new_data.append(time.replace(t, str(newtime)))
    # catch the IndexError, in which case we look for float-formatted seconds only
    except IndexError:
        # look for float-formatted seconds (s.xxx)
        # taken from this answer: http://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string
        match = re.findall(r"\d+\.\d+", time)
        # create a datetime object from the first matched number in the list,
        # specifying only seconds and microseconds; the fraction is in milliseconds, so multiply by 1000
        orig = datetime.datetime(100,1,1,second=int(match[0].split('.')[0]),microsecond=int(match[0].split('.')[1])*1000)
        # Subtract the seconds from the datetime object, similar to the time addition in the `try` clause above
        newtime = orig - datetime.timedelta(0, myinp)
        # format the newtime as seconds plus the microseconds converted to a fraction of a second
        newtime_fmt = newtime.second + newtime.microsecond/1000000.
        # Get the seconds value (index 0) by splitting the original string at the space between the seconds and the name
        t = time.split(' ')[0]
        # replace the time in the original data string with the newtime and store it
        new_data.append(time.replace(t , str(newtime_fmt)))

with open('new_data.txt', 'w') as nf:
    for newline in new_data:
        nf.write(newline)
new_data.txt file contents should read as:
John Time Made 11:05:24 in 2010
1.001 Kelly
2.005 Josh
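Both answers can be condensed into a per-line function that uses re for recognition and timedelta for the arithmetic. This is a sketch under assumptions, not a drop-in replacement: the name shift_line is hypothetical, only the first HH:MM:SS in a line is shifted, and a line without a clock time is assumed to start with a decimal number of seconds.

```python
import re
from datetime import datetime, timedelta

_CLOCK = re.compile(r'\b\d{1,2}:\d{2}:\d{2}\b')   # e.g. 11:05:20
_LEADING_SECONDS = re.compile(r'^\d+\.\d+')        # e.g. 5.001 at line start


def shift_line(line, seconds):
    """Add `seconds` to an HH:MM:SS time, or subtract it from a leading number."""
    def bump(match):
        # strptime raises ValueError for impossible times such as 25:00:00.
        moment = datetime.strptime(match.group(0), '%H:%M:%S')
        return (moment + timedelta(seconds=seconds)).strftime('%H:%M:%S')

    shifted, count = _CLOCK.subn(bump, line, count=1)
    if count:
        return shifted
    # No clock time found: treat the leading number as seconds and subtract.
    return _LEADING_SECONDS.sub(
        lambda m: '{:.3f}'.format(float(m.group(0)) - seconds), line, count=1)
```

Because the addition goes through timedelta, carries into minutes and hours are handled without manual modulo arithmetic.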
I am struggling to find the best way to convert the date input given by the user as mm/dd/yyyy to 3 variables. I am unable to split this because I receive an error since it is a 'float'.
>>> date=3/2/2016
>>> date.split('/')
Traceback (most recent call last):
File "<pyshell#152>", line 1, in <module> date.split('/')
AttributeError: 'float' object has no attribute 'split'
What do I need to add to this to make sure it doesn't evaluate the date as a division?
def main():
    date=input("Enter date mm/dd/yyyy: ")
I want the input date given as mm/dd/yyyy, and then a way to convert it to 3 variables: m=month, d=day, y=year.
What's the best way to do this?
Try str.split:
>>> test_date = "05/12/2016"
>>> month, day, year = test_date.split('/')
>>> print(f"Month = {month}, Day = {day}, Year = {year}")
Month = 05, Day = 12, Year = 2016
I wrote this following piece of code and it works perfectly fine.
>>> date='3/2/2016'
>>> new=date.split('/')
>>> new
['3', '2', '2016']
>>>
>>> m,d,year=new
>>> m
'3'
>>> d
'2'
>>> year
'2016'
>>>
As Jessica Smith has already pointed out, date=3/2/2016 is evaluated as an expression that divides the numbers. It has to be a string to be split.
The error "'float' object has no attribute 'split'" suggests that type(date) == float in your example. That implies you are trying to run Python 3 code with a Python 2 interpreter, where input() evaluates its input as a Python expression instead of returning it as a string.
To get the date as a string on Python 2, use raw_input() instead of input():
date_string = raw_input("Enter date mm/dd/yyyy: ")
To make it work on both Python 2 and 3, add at the top of your script:
try:  # make input() and raw_input() synonyms
    input = raw_input
except NameError:  # Python 3
    raw_input = input
If you need the old Python 2 input() behavior, you could call eval() explicitly.
To validate the input date, you could use datetime.strptime() and catch ValueError:
from datetime import datetime

try:
    d = datetime.strptime(date_string, '%m/%d/%Y')
except ValueError:
    print('wrong date string: {!r}'.format(date_string))
.strptime() guarantees that the input date is valid otherwise ValueError is raised. On success, d.year, d.month, d.day work as expected.
Putting it all together (not tested):
#!/usr/bin/env python
from datetime import datetime

try:  # make input() and raw_input() synonyms
    input = raw_input
except NameError:  # Python 3
    raw_input = input

while True:  # until a valid date is given
    date_string = raw_input("Enter date mm/dd/yyyy: ")
    try:
        d = datetime.strptime(date_string, '%m/%d/%Y')
    except ValueError:  # invalid date
        print('wrong date string: {!r}'.format(date_string))
    else:  # valid date
        break

# use the datetime object here
print("Year: {date.year}, Month: {date.month}, Day: {date.day}".format(date=d))
See Asking the user for input until they give a valid response.
You could use .split('/') instead of .strptime() if you must:
month, day, year = map(int, date_string.split('/'))
It doesn't validate whether the values form a valid date in the Gregorian calendar.
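If you prefer the split approach but still want validation, one option is to pass the three integers to datetime.date, which raises ValueError for impossible dates. A minimal sketch, with a hypothetical helper name split_date:

```python
from datetime import date


def split_date(date_string):
    """Split 'mm/dd/yyyy' into (month, day, year) ints, validating the date.

    Raises ValueError for malformed input or impossible dates.
    """
    month, day, year = map(int, date_string.split('/'))
    date(year, month, day)  # raises ValueError for dates such as 02/30/2016
    return month, day, year
```

For example, split_date('3/2/2016') returns (3, 2, 2016), while split_date('02/30/2016') raises ValueError.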
Try:
def main():
    month, day, year = [int(x) for x in raw_input("Enter date mm/dd/yyyy: ").split('/')]
    print "Month: {}\n".format(month), "Day: {}\n".format(day), "Year: {}".format(year)

main()
Output:
Enter date mm/dd/yyyy: 03/09/1987
Month: 3
Day: 9
Year: 1987
I am trying to parse multiple dates from a string in Python with the help of this code,
from dateutil.parser import _timelex, parser

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info

def timetoken(token):
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))

def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue
            batch.append(token)
        else:
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)

for item in timesplit(a):
    print "Found:", item
    print "Parsed:", p.parse(item)
but the code takes "second" from "second half" in the string as a second date and gives me this error:
    raise ValueError, "unknown string format"
ValueError: unknown string format
When I change 'second half' to 'third half' or 'fourth half', it works fine.
Can anyone help me parse this string?
Your parser couldn't handle the "second" found by timesplit. If you set the fuzzy parameter to True, it doesn't break, but it doesn't produce anything meaningful either.
from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    print "Parsed:", p.parse(StringIO(item),fuzzy=True)
out:
Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: 2013-01-11 00:00:00
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
You have to fix the timesplitting or handle the errors:
opt1:
drop info.hms from timetoken
opt2:
from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    try:
        print "Parsed:", p.parse(StringIO(item))
    except ValueError:
        print 'Not Parsed!'
out:
Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Not Parsed!
Parsed: Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
If you only need the dates, you can extract them with a regex and then work with the resulting date strings.
a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
import re
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
pattern.findall(a)
['12/10/2012', '20/10/2012']
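To go from the matched strings to actual date objects, each match can be passed to datetime.strptime. The sketch below assumes day-first order (dd/mm/yyyy), since 20/10/2012 is only valid that way. The name extract_dates is hypothetical.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r'\b\d{2}/\d{2}/\d{4}\b')


def extract_dates(text, fmt='%d/%m/%Y'):
    """Return datetime objects for every dd/mm/yyyy substring that is a real date."""
    dates = []
    for candidate in DATE_PATTERN.findall(text):
        try:
            dates.append(datetime.strptime(candidate, fmt))
        except ValueError:
            pass  # looked like a date but is not one, e.g. 31/02/2012
    return dates
```

Words such as "second" never reach the parser here, because only substrings with the date shape are considered.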