Check if string has date, any format - python

How do I check if a string can be parsed to a date?
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990
These are all valid dates. If the missing space in item #3 and in the last item above is a concern, it can easily be remedied by automatically inserting a space between letters and numbers.
But first, the basics:
I tried putting it in an if statement:
if datetime.strptime(item, '%Y') or datetime.strptime(item, '%b %d %y') or datetime.strptime(item, '%b %d %Y') or datetime.strptime(item, '%B %d %y') or datetime.strptime(item, '%B %d %Y'):
But that's in a try-except block, and keeps returning something like this:
16343 time data 'JUNE1890' does not match format '%Y'
That is, unless the item matches the first condition in the if statement.
To clarify, I don't actually need the value of the date - I just want to know if it is. Ideally, it would've been something like this:
if item is date:
    print date
else:
    print "Not a date"
Is there any way to do this?

The parse function in dateutil.parser is capable of parsing many date string formats to a datetime object.
If you simply want to know whether a particular string could represent or contain a valid date, you could try the following simple function:
from dateutil.parser import parse
def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.
    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except ValueError:
        return False
Then you have:
>>> is_date("1990-12-1")
True
>>> is_date("2005/3")
True
>>> is_date("Jan 19, 1990")
True
>>> is_date("today is 2019-03-27")
False
>>> is_date("today is 2019-03-27", fuzzy=True)
True
>>> is_date("Monday at 12:01am")
True
>>> is_date("xyz_not_a_date")
False
>>> is_date("yesterday")
False
Custom parsing
parse might recognise some strings as dates which you don't want to treat as dates. For example:
Parsing "12" and "1999" will return a datetime object representing the current date with the day and year substituted for the number in the string
"23, 4" and "23 4" will be parsed as datetime.datetime(2023, 4, 16, 0, 0).
"Friday" will return the date of the nearest Friday in the future.
Similarly "August" corresponds to the current date with the month changed to August.
Also parse is not locale aware, so does not recognise months or days of the week in languages other than English.
Both of these issues can be addressed to some extent by using a custom parserinfo class, which defines how month and day names are recognised:
from dateutil.parser import parserinfo
class CustomParserInfo(parserinfo):
    # three months in Spanish for illustration
    MONTHS = [("Enero", "Enero"), ("Feb", "Febrero"), ("Marzo", "Marzo")]
An instance of this class can then be used with parse:
>>> parse("Enero 1990")
# ValueError: Unknown string format
>>> parse("Enero 1990", parserinfo=CustomParserInfo())
datetime.datetime(1990, 1, 27, 0, 0)

If you want to parse those particular formats, you can just match against a list of formats:
txt='''\
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990'''
import datetime as dt
fmts = ('%Y','%b %d, %Y','%B %d, %Y','%B %d %Y','%m/%d/%Y','%m/%d/%y','%b %Y','%B%Y','%b %d,%Y')
parsed=[]
for e in txt.splitlines():
    # try each format in turn and keep the first one that matches
    for fmt in fmts:
        try:
            t = dt.datetime.strptime(e, fmt)
            parsed.append((e, fmt, t))
            break
        except ValueError:
            pass
# check that all the cases are handled
success={t[0] for t in parsed}
for e in txt.splitlines():
    if e not in success:
        print(e)
for t in parsed:
    print('"{:20}" => "{:20}" => {}'.format(*t))
Prints:
"Jan 19, 1990        " => "%b %d, %Y           " => 1990-01-19 00:00:00
"January 19, 1990    " => "%B %d, %Y           " => 1990-01-19 00:00:00
"Jan 19,1990         " => "%b %d,%Y            " => 1990-01-19 00:00:00
"01/19/1990          " => "%m/%d/%Y            " => 1990-01-19 00:00:00
"01/19/90            " => "%m/%d/%y            " => 1990-01-19 00:00:00
"1990                " => "%Y                  " => 1990-01-01 00:00:00
"Jan 1990            " => "%b %Y               " => 1990-01-01 00:00:00
"January1990         " => "%B%Y                " => 1990-01-01 00:00:00

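If only a yes/no answer is needed, as in the question, the same list of formats can be wrapped in a small predicate. This is just a minimal sketch: the names matches_known_format and FORMATS are made up for illustration, and the list only covers the formats shown above.

```python
from datetime import datetime

# Formats taken from the examples above; extend as needed.
FORMATS = ('%Y', '%b %d, %Y', '%B %d, %Y', '%b %d,%Y',
           '%m/%d/%Y', '%m/%d/%y', '%b %Y', '%B%Y')

def matches_known_format(text, formats=FORMATS):
    """Return True if text parses with at least one of the given formats."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False

print(matches_known_format("Jan 19,1990"))
print(matches_known_format("JUNE1890"))
print(matches_known_format("not a date"))
```

Unlike a fuzzy parser, this only accepts strings that match one of the listed formats exactly, so bare numbers such as "12" are rejected.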
Related

String to date in pandas

I have a dataset with dates encoded as strings formatted as %B %d, %Y, eg September 10, 2021.
Using:
df['sale_date'] = pd.to_datetime(df.sale_date, format = '%B %d, %Y')
produces this error:
ValueError: time data 'September 10, 2021' does not match format '%B %d, %Y' (match)
Manually checking with strptime:
datetime.strptime('September 10, 2021', '%B %d, %Y')
produces the correct datetime object.
Is there something I missed in the pd.to_datetime?
Thanks.
Upon further investigation, I found out that the error only happens on the first element of the series. It seems that the string has '\ufeff' added to it. So I just did a series.str.replace() and now it is working. Sorry for the bother. Question is how did that BOM end up there?
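The '\ufeff' character is a Unicode byte order mark (BOM). One common cause, though only a guess for this dataset, is a file saved as "UTF-8 with BOM" and then read as plain utf-8. Python's utf-8-sig codec strips the mark while decoding. A minimal sketch with made-up bytes:

```python
from datetime import datetime

# Hypothetical file contents: UTF-8 bytes that start with a BOM.
raw = b'\xef\xbb\xbfSeptember 10, 2021'

with_bom = raw.decode('utf-8')         # BOM survives as '\ufeff'
without_bom = raw.decode('utf-8-sig')  # BOM is stripped while decoding

print(repr(with_bom[0]))

try:
    datetime.strptime(with_bom, '%B %d, %Y')
except ValueError as err:
    print('fails:', err)

print(datetime.strptime(without_bom, '%B %d, %Y'))

# If the text is already decoded, strip the mark explicitly.
print(datetime.strptime(with_bom.lstrip('\ufeff'), '%B %d, %Y'))
```

With pandas, passing encoding='utf-8-sig' to pd.read_csv should have the same effect when loading the file.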
Very likely you have to strip some whitespace first.
If I add whitespace at the beginning, the end, or both:
datestring = ' September 10, 2021 '
datetime.datetime.strptime(datestring, '%B %d, %Y')
it results in the same error message as yours:
ValueError: time data ' September 10, 2021 ' does not match format '%B %d, %Y'
For a single value, use:
datestring = ' September 10, 2021 '
datestring = datestring.strip()
For a column in a DataFrame, use:
dummy = pd.DataFrame(columns=['Date'], data = [' September 10, 2021 ', ' September 11, 2021 ', ' September 12, 2021 '])
dummy.Date = dummy.Date.str.strip()

How to convert "01 January 2016" to UTC ISO format?

How do I convert the string " 01 January 2016" to UTC ISO date format?
I'm using the arrow module and the following code, but it's full of errors, and I was thinking that maybe there is a shorter, more elegant solution, since Python encourages elegant and easy ways of doing things. Anyway, here's my code:
updateStr = " 01 January 2016" #Note the space at the beginning
dateList = updateStr.split[' ']
dateDict = {"day" : dateList[1],"month": months.index(dateList[2])+1, "year" : dateList[3]}
dateStr = str(dateDict['day']) + "-" + str(dateDict["month"]) + "-" + str(dateDict["year"])
dateISO = arrow.get(dateStr, 'YYYY-MM-DD HH:mm:ss')
Please help, I have to convert it to UTC ISO format. Also, months is a list of the months of the year.
You can use datetime:
>>> updateStr = " 01 January 2016"
>>> import datetime as dt
>>> dt.datetime.strptime(updateStr, " %d %B %Y")
datetime.datetime(2016, 1, 1, 0, 0)
>>> _.isoformat()
'2016-01-01T00:00:00'
Keep in mind that is a 'naive' object without a timezone. Check out pytz to deal with timezones elegantly, or just add an appropriate utcoffset to the datetime object for UTC.
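On Python 3 the standard library can also do this without pytz. The sketch below assumes the input string is already meant as UTC, so it only attaches the timezone and does not shift the time; the variable names are illustrative.

```python
from datetime import datetime, timezone

update_str = " 01 January 2016"

# strip() removes the leading space, so the format needs no leading blank
naive = datetime.strptime(update_str.strip(), "%d %B %Y")

# Assumption: the value is already UTC, so only tzinfo is attached
aware = naive.replace(tzinfo=timezone.utc)

print(naive.isoformat())
print(aware.isoformat())
```

The naive value prints without an offset, while the aware one ends in +00:00.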
Using arrow:
>>> import arrow
>>> updateStr = " 01 January 2016"
>>> arrow.get(updateStr, "DD MMMM YYYY").isoformat()
'2016-01-01T00:00:00+00:00'
>>>
You can use datetime's methods to parse this date string and then reformat it in ISO 8601 format:
>>> from datetime import datetime
>>> updateStr = " 01 January 2016" #Note the space at the beginning
>>> d = datetime.strptime(updateStr, ' %d %B %Y') # Same space here
>>> s = d.isoformat()
>>> s
'2016-01-01T00:00:00'

Datetime pattern does not match in python even though bash recognizes it

I have the following code (based on http://strftime.org/):
try:
    datetime.datetime.strptime("Apr 14, 2016 9", '%b %d, %Y %-I')
    print "matched date format"
except ValueError:
    print "did NOT match date format"
The above prints:
$ python parse_log.py
did NOT match date format
However bash recognizes this date format:
$ date '+%b %d, %Y %-I'
Apr 14, 2016 1
What am I missing?
It seems that the %-I is the problem, since Python matches date without the %-I section:
try:
    datetime.datetime.strptime("Apr 14, 2016 ", '%b %d, %Y ')
    print "matched date format"
except ValueError:
    print "did NOT match date format"
output:
$ python parse_log.py
matched date format
I'm on python 2.6.6.
The actual pattern I need to match uses 12 hour clock and is:
datetime.datetime.strptime("Apr 14, 2016 9:59:54", '%b %d, %Y %-I:%M:%S')
You need to remove the - for strptime:
'%b %d, %Y %I:%M:%S'
In [17]: print datetime.datetime.strptime("Apr 14, 2016 9:59:54", '%b %d, %Y %I:%M:%S')
2016-04-14 09:59:54
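The %-I form is a platform-specific strftime extension that drops the zero padding; strptime does not support it, so use it only for output: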
In [15]: print datetime.datetime.strptime("Apr 14, 2016 9:59:54", '%b %d, %Y %I:%M:%S').strftime('%b %d, %Y %-I:%M:%S')
Apr 14, 2016 9:59:54
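Since %-I is not available on every platform, here is a small sketch showing that %I accepts the hour with or without a leading zero when parsing, plus one portable way to print the hour unpadded. The hour_12 helper variable is just for illustration.

```python
from datetime import datetime

parsed = datetime.strptime("Apr 14, 2016 9:59:54", "%b %d, %Y %I:%M:%S")
padded = datetime.strptime("Apr 14, 2016 09:59:54", "%b %d, %Y %I:%M:%S")
print(parsed == padded)

# Portable way to get the 12-hour value without a leading zero
hour_12 = parsed.hour % 12 or 12
print("{:%b %d, %Y} {}:{:%M:%S}".format(parsed, hour_12, parsed))
```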

Time data does not match specified format

I'm trying to convert a string given in "DD MM YYYY" format into a datetime object. Here's the code for the same:
import datetime
s = "23 July 2001"
d = datetime.datetime.strptime(s, "%d %m %Y")
However, I get the following error:
ValueError: time data '23 July 2001' does not match format '%d %m %Y'
What's wrong? Isn't the format of the string the same as the one specified by "%d %m %Y"?
%m means "Month as a zero-padded decimal number."
Your month is July so you should use %B, which is "Month as locale’s full name."
Reference: https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
As behzad.nouri said, use d = datetime.datetime.strptime(s, "%d %B %Y").
Or make s = '23 07 2001' and keep d = datetime.datetime.strptime(s, "%d %m %Y").

Convert datetime object to a String of date only in Python

I see a lot on converting a date string to a datetime object in Python, but I want to go the other way.
I've got
datetime.datetime(2012, 2, 23, 0, 0)
and I would like to convert it to string like '2/23/2012'.
You can use strftime to help you format your date.
E.g.,
import datetime
t = datetime.datetime(2012, 2, 23, 0, 0)
t.strftime('%m/%d/%Y')
will yield:
'02/23/2012'
For more information about formatting, see the official strftime documentation.
date and datetime objects (and time as well) support a mini-language to specify output, and there are three ways to access it:
direct method call: dt.strftime('format here')
format method (python 2.6+): '{:format here}'.format(dt)
f-strings (python 3.6+): f'{dt:format here}'
So your example could look like:
dt.strftime('The date is %b %d, %Y')
'The date is {:%b %d, %Y}'.format(dt)
f'The date is {dt:%b %d, %Y}'
In all three cases the output is:
The date is Feb 23, 2012
For completeness' sake: you can also directly access the attributes of the object, but then you only get the numbers:
'The date is %s/%s/%s' % (dt.month, dt.day, dt.year)
# The date is 2/23/2012
The time taken to learn the mini-language is worth it.
For reference, here are the codes used in the mini-language:
%a Weekday as locale’s abbreviated name.
%A Weekday as locale’s full name.
%w Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.
%d Day of the month as a zero-padded decimal number.
%b Month as locale’s abbreviated name.
%B Month as locale’s full name.
%m Month as a zero-padded decimal number. 01, ..., 12
%y Year without century as a zero-padded decimal number. 00, ..., 99
%Y Year with century as a decimal number. 1970, 1988, 2001, 2013
%H Hour (24-hour clock) as a zero-padded decimal number. 00, ..., 23
%I Hour (12-hour clock) as a zero-padded decimal number. 01, ..., 12
%p Locale’s equivalent of either AM or PM.
%M Minute as a zero-padded decimal number. 00, ..., 59
%S Second as a zero-padded decimal number. 00, ..., 59
%f Microsecond as a decimal number, zero-padded on the left. 000000, ..., 999999
%z UTC offset in the form +HHMM or -HHMM (empty if naive), +0000, -0400, +1030
%Z Time zone name (empty if naive), UTC, EST, CST
%j Day of the year as a zero-padded decimal number. 001, ..., 366
%U Week number of the year (Sunday is the first) as a zero padded decimal number.
%W Week number of the year (Monday is first) as a decimal number.
%c Locale’s appropriate date and time representation.
%x Locale’s appropriate date representation.
%X Locale’s appropriate time representation.
%% A literal '%' character.
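To see a few of these codes in action, here is a short sketch on a fixed datetime. Only locale-independent codes are used, so the results should not depend on the system locale.

```python
from datetime import datetime

dt = datetime(2012, 2, 23, 14, 5, 9)

print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2012-02-23 14:05:09
print(dt.strftime('%d/%m/%y'))           # 23/02/12
print(dt.strftime('%j'))                 # 054 (day of the year)
print(dt.strftime('%I:%M'))              # 02:05 (12-hour clock)
print(dt.strftime('%w'))                 # 4 (Thursday, with Sunday as 0)
print(dt.strftime('100%%'))              # 100%
```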
Another option:
import datetime
now=datetime.datetime.now()
now.isoformat()
# output --> '2016-03-09T08:18:20.860968'
If you are looking for a simple way to convert a datetime to a string and do not need a custom format, you can convert the datetime object with str() and then slice the string.
In [1]: from datetime import datetime
In [2]: now = datetime.now()
In [3]: str(now)
Out[3]: '2019-04-26 18:03:50.941332'
In [5]: str(now)[:10]
Out[5]: '2019-04-26'
In [6]: str(now)[:19]
Out[6]: '2019-04-26 18:03:50'
But note one thing: where other solutions raise an AttributeError when the variable is None, this one silently gives you the string 'None'.
In [9]: str(None)[:19]
Out[9]: 'None'
You could use simple string formatting methods:
>>> dt = datetime.datetime(2012, 2, 23, 0, 0)
>>> '{0.month}/{0.day}/{0.year}'.format(dt)
'2/23/2012'
>>> '%s/%s/%s' % (dt.month, dt.day, dt.year)
'2/23/2012'
You can easily convert the datetime to a string in this way:
from datetime import datetime
date_time = datetime(2012, 2, 23, 0, 0)
date = date_time.strftime('%m/%d/%Y')
print("date: %s" % date)
For the patterns you can use to convert a datetime to a string, see the official strftime documentation.
type-specific formatting can be used as well:
t = datetime.datetime(2012, 2, 23, 0, 0)
"{:%m/%d/%Y}".format(t)
Output:
'02/23/2012'
If you want the time as well, just go with
str(datetime.datetime.now())
which gives 2019-07-11 19:36:31.118766 for me.
The neatest version by far is with f-strings (Python 3.6+).
from datetime import datetime
print(f'{datetime.today():%Y-%m-%d}')
It is possible to convert a datetime object into a string by working directly with the components of the datetime object.
from datetime import date
myDate = date.today()
# print(myDate) would output 2017-05-23, because that is today
# myDate.month, myDate.day and myDate.year are integers (5, 23 and 2017),
# so each one is converted with str() and joined with "/"
dateStr = str(myDate.month) + "/" + str(myDate.day) + "/" + str(myDate.year)
# The final line prints the string.
print(dateStr)
Output --> 5/23/2017
You can convert datetime to string.
published_at = "{}".format(self.published_at)
String concatenation, str.join, can be used to build the string.
d = datetime.now()
'/'.join(str(x) for x in (d.month, d.day, d.year))
'3/7/2016'
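from datetime import datetime
# Parse the string first: a str has no strftime method
end_date = datetime.strptime("2021-04-18 16:00:00", "%Y-%m-%d %H:%M:%S")
end_date_string = end_date.strftime("%Y-%m-%d")
print(end_date_string)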
An approach for showing how long ago something happened.
It supports different languages by passing in li, a list of the time unit names, and lst, the suffix.
from datetime import datetime
from dateutil import parser
t1 = parser.parse("Tue May 26 15:14:45 2021")
t2 = parser.parse("Tue May 26 15:9:45 2021")
# 5min
t3 = parser.parse("Tue May 26 11:14:45 2021")
# 4h
t4 = parser.parse("Tue May 26 11:9:45 2021")
# 1day
t6 = parser.parse("Tue May 25 11:14:45 2021")
# 1day4h
t7 = parser.parse("Tue May 25 11:9:45 2021")
# 1day4h5min
t8 = parser.parse("Tue May 19 11:9:45 2021")
# 1w
t9 = parser.parse("Tue Apr 26 11:14:45 2021")
# 1m
t10 = parser.parse("Tue Oct 08 06:00:20 2019")
# 1y7m, 19m
t11 = parser.parse("Tue Jan 08 00:00:00 2019")
# 2y4m, 28m
# create: date of object creation
# now: time now
# li: a list of string indicate time (in any language)
# lst: suffix (in any language)
# long: display length
def howLongAgo(create, now, li, lst, long=2):
    # as written, the first argument must be the later of the two timestamps
    dif = create - now
    sec = dif.days * 24 * 60 * 60 + dif.seconds
    minute = sec // 60
    sec %= 60
    hour = minute // 60
    minute %= 60
    day = hour // 24
    hour %= 24
    # approximate a month as 30 days, then split the remainder into weeks and days
    month = day // 30
    day %= 30
    week = day // 7
    day %= 7
    year = month // 12
    month %= 12
    s = []
    for ii, tt in enumerate([sec, minute, hour, day, week, month, year]):
        ss = li[ii]
        if tt != 0:
            if tt == 1:
                s.append(str(tt) + ss)
            else:
                s.append(str(tt) + ss + 's')
    # keep only the `long` largest non-zero units
    return ' '.join(list(reversed(s))[:long]) + ' ' + lst
t = howLongAgo(t1, t11, [
'second',
'minute',
'hour',
'day',
'week',
'month',
'year',
], 'ago')
print(t)
# 2years 4months ago
I have used this method to insert dates to JSON object
my_json_string = json.dumps({'date_of_birth': '''{}'''.format(date_of_birth)})
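An alternative sketch, in case there are several date fields: json.dumps accepts a default callable that is invoked for objects it cannot serialize. The encode_dates helper and the sample record below are made up for illustration.

```python
import json
from datetime import date, datetime

def encode_dates(value):
    """Fallback for json.dumps: turn date/datetime objects into ISO strings."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    raise TypeError("Cannot serialize {!r}".format(value))

# Hypothetical record with one date and one datetime
record = {'date_of_birth': date(1990, 1, 19),
          'updated': datetime(2021, 4, 18, 16, 0)}

my_json_string = json.dumps(record, default=encode_dates, sort_keys=True)
print(my_json_string)
```

ISO 8601 strings have the advantage that they can be parsed back unambiguously on the receiving side.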
