How to handle spelled out days in strptime - python

I have an array of dates that is formatted like so:
['October 22nd, 2019', 'February 8th, 2020', 'July 31st, 2020', 'September 21st, 2020', ...]
I'd like to turn it into datetime objects using strptime, but I can't figure out how to handle the spelled-out parts of the days, e.g. 22nd or 8th, and the format documentation doesn't cover it.
The following works when there's no written out part of the day:
from datetime import datetime
dt_obj = datetime.strptime('October 22, 2019', '%B %d, %Y')
But I can't figure out how to parse a string that has the day written out:
in: dt_obj = datetime.strptime('October 22nd, 2019', '%B %d, %Y')
out: ValueError: time data 'October 22nd, 2019' does not match format '%B %d, %Y'
What's the proper format for this? Thank you!

"e.g. 22nd or 8th and it doesn't say in the format documentation"
You got it right: it is not mentioned in the documentation because there is no such format directive. One way to parse these strings is to use a regex to convert them into something that Python's datetime does have a format for.
import re
from datetime import datetime
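dates = ['October 22nd, 2019', 'February 8th, 2020', 'July 31st, 2020', 'September 21st, 2020']
# Keep the leading month name and the runs of digits; this drops the suffixes and commas
cleaned = [' '.join(re.findall(r'^\w+|\d+', each)) for each in dates]
[datetime.strptime(x, '%B %d %Y') for x in cleaned]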
#output:
[datetime.datetime(2019, 10, 22, 0, 0), datetime.datetime(2020, 2, 8, 0, 0), datetime.datetime(2020, 7, 31, 0, 0), datetime.datetime(2020, 9, 21, 0, 0)]
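An alternative to rebuilding the string from its pieces is to remove only the ordinal suffix and keep the original format. The following is a minimal sketch of that idea; the helper name parse_ordinal_date and the pattern name ORDINAL_SUFFIX are made up for illustration, and it assumes English suffixes (st, nd, rd, th) directly after the day number.

```python
import re
from datetime import datetime

# Matches "st", "nd", "rd" or "th" only when it directly follows a digit,
# so the "st" inside "August" is left alone.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

def parse_ordinal_date(text):
    """Strip the ordinal suffix from the day, then parse with strptime."""
    cleaned = ORDINAL_SUFFIX.sub("", text)
    return datetime.strptime(cleaned, "%B %d, %Y")

dates = ['October 22nd, 2019', 'February 8th, 2020', 'July 31st, 2020', 'August 1st, 2020']
parsed = [parse_ordinal_date(d) for d in dates]
print(parsed)
```

Because the comma stays in place, the format string is the same one that already worked for 'October 22, 2019'.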

Related

Convert Dates in python to compare

I have two dates in Python. One is received from the database:
(datetime.datetime(2017, 10, 10, 10, 10, 10),)
And the other from the email:
Thu, 18 Jan 2018 15:50:49 -0500.
I want to compare these two dates in Python for which I think these dates need to be converted in one specific format. How to convert these dates in one specific format to compare?
Convert both with pandas' to_datetime (pandas is very powerful with dates).
import datetime
import pandas as pd
dt_date = datetime.datetime(2017, 10, 10, 10, 10, 10)
str_date = 'Thu, 18 Jan 2018 15:50:49 -0500'
pd.to_datetime([dt_date, str_date])
Output:
DatetimeIndex(['2017-10-10 10:10:10', '2018-01-18 20:50:49']
Alternatively, using only the standard library (this format string leaves out the -0500 offset, so the result is a naive datetime):
from datetime import datetime
start_date = datetime(2017, 10, 10, 10, 10, 10)
email_date = datetime.strptime("Thu, 18 Jan 2018 15:50:49", "%a, %d %b %Y %H:%M:%S")
start_date, email_date
>>> (datetime.datetime(2017, 10, 10, 10, 10, 10), datetime.datetime(2018, 1, 18, 15, 50, 49))
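The email date carries a UTC offset (-0500), and Python refuses to compare a timezone-aware datetime with a naive one. Here is a rough sketch that keeps the offset by using email.utils.parsedate_to_datetime from the standard library. It assumes the database value is stored in UTC, which the question does not state, so adjust that step if your database uses another zone.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Value from the database: naive, ASSUMED here to represent UTC.
db_date = datetime(2017, 10, 10, 10, 10, 10)
db_date_utc = db_date.replace(tzinfo=timezone.utc)

# RFC 2822 style date from the email header, offset included.
email_date = parsedate_to_datetime("Thu, 18 Jan 2018 15:50:49 -0500")

# Both values are timezone-aware now, so comparison is well defined.
print(db_date_utc < email_date)
print(email_date.astimezone(timezone.utc))
```

Converting the email date to UTC gives 20:50:49 on the same day, which matches the pandas output above.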

Representing date format for three letter month in Python

How do I represent a 3 letter month date format in python such as the following:
Jan 16, 2012
I know that for January 16, 2012 the format is %B %d, %Y. Any ideas?
There's the three letter month format %b:
In [37]: datetime.strptime('Jan 16, 2012', '%b %d, %Y')
Out[37]: datetime.datetime(2012, 1, 16, 0, 0)
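To shorten a full month name to three letters before parsing:
def abbreviate_month(date_str):
    # Keep only the first three letters of the leading month name
    date_components = date_str.split(' ')
    date_components[0] = date_components[0][:3]
    return ' '.join(date_components)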

change datetime object to string

I want to convert datetime.datetime(2016, 11, 21, 5, 34, 38, 826339, tzinfo=<UTC>) to Nov. 21, 2016, 11:04 a.m.
The time in the datetime object is in UTC, but I want it converted to IST (UTC+05:30).
I tried using strftime as:
>>> datetime(2016, 11, 21, 5, 34, 38, 826339, tzinfo=<UTC>).isoformat(' ')
File "<stdin>", line 1
datetime(2016, 11, 21, 5, 34, 38, 826339, tzinfo=<UTC>).isoformat(' ')
^
SyntaxError: invalid syntax
Can I get some help here.
PS: I am using python
EDIT:
cr_date = datetime(2016, 11, 21, 5, 34, 38, 826339) #excluding the timezone
I can get part of the desired result with:
cr_date.strftime('%b. %d, %Y %H:%M')
'Nov. 21, 2016 05:34'
I didn't get the am/pm part, though.
For the am/pm part, couldn't you use %p for the last field? I don't know python, I'm just assuming python is taking syntax from the unix date command.
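Please find the code below:
from datetime import datetime
cr_date = datetime(2016, 11, 21, 5, 34, 38, 826339)
cr_date.strftime('%b. %d, %Y %H:%M %P')
'Nov. 21, 2016 05:34 am'
Use %p for 'AM' or 'PM'. %P gives lowercase 'am' or 'pm', but it is a platform-specific (glibc) extension that the Python documentation does not list, so it may not work everywhere.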
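None of the snippets above performs the UTC to IST conversion or produces the "a.m." spelling the question asks for. The following is a minimal sketch of one way to do both with the standard library only. The fixed-offset IST object and the hand-built suffix are choices made for this example, not something taken from the answers, and %b is assumed to produce English month abbreviations (the default C locale).

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for India Standard Time (UTC+05:30); IST has no DST.
IST = timezone(timedelta(hours=5, minutes=30), "IST")

cr_date = datetime(2016, 11, 21, 5, 34, 38, 826339, tzinfo=timezone.utc)
local = cr_date.astimezone(IST)

# Build the 12-hour clock and the "a.m."/"p.m." suffix by hand so the
# result does not depend on platform-specific strftime directives.
suffix = "a.m." if local.hour < 12 else "p.m."
hour12 = local.hour % 12 or 12
formatted = f"{local.strftime('%b')}. {local.day}, {local.year}, {hour12}:{local.minute:02d} {suffix}"
print(formatted)  # Nov. 21, 2016, 11:04 a.m.
```

Using local.day and hour12 directly also avoids the leading zeros that %d and %I would add.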

How can I splice out the date from similar strings?

I have a bunch of dates from some web scraping, but it seems that a country is also in the date string. Here is a sample:
Nov. 4, 2015Bangladesh
April 8, 2015Saudi Arabia
Jan. 14, 2016Indonesia
June 26, 2015Tunisia
Jan. 11, 2016France
I know regex is really great for working with strings, but I am just not experienced enough to know how to start.
How can I remove the country while keeping the dates intact?
This regex will get you just the date string from all of those. This could probably also be fixed by showing us your code for scraping the dates, but that's not what this question is about.
^.+?\s\d+,\s\d+
Example:
import re
dates = ["Nov. 4, 2015Bangladesh",
"April 8, 2015Saudi Arabia ",
"Jan. 14, 2016Indonesia ",
"June 26, 2015Tunisia ",
"Jan. 11, 2016France "]
for item in dates:
    print(re.match(r"^.+?\s\d+,\s\d+", item).group(0))
This prints:
Nov. 4, 2015
April 8, 2015
Jan. 14, 2016
June 26, 2015
Jan. 11, 2016
Explanation
^ - assert position at start of string
.+? - match any character except newline, one or more times (as few as possible)
\s - match a whitespace character
\d+ - match one or more digits
, - match a literal comma
\s - match a whitespace character
\d+ - match one or more digits
You could try the following:
^(.*\d{4})
Demo:
import re
dates = """Nov. 4, 2015Bangladesh
April 8, 2015Saudi Arabia
Jan. 14, 2016Indonesia
June 26, 2015Tunisia
Jan. 11, 2016France"""
print(re.findall(r'^(.*\d{4})', dates, re.M))
# ['Nov. 4, 2015', 'April 8, 2015', 'Jan. 14, 2016', 'June 26, 2015', 'Jan. 11, 2016']
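If the goal is to end up with real datetime objects rather than strings, the extracted text can be passed straight to strptime. The sketch below is one possible way to do that; the helper name split_date_and_country is hypothetical, and it assumes the only two month spellings are the ones in the sample (abbreviated with a period, such as "Nov.", or written in full, such as "April").

```python
import re
from datetime import datetime

# Group 1: everything up to and including the first four-digit year.
# Group 2: whatever follows it (the country).
DATE_RE = re.compile(r"^(.*?\d{4})(.*)$")

def split_date_and_country(item):
    """Return (datetime, country) for strings like 'Nov. 4, 2015Bangladesh'."""
    match = DATE_RE.match(item.strip())
    if match is None:
        raise ValueError(f"no date found in {item!r}")
    date_text, country = match.groups()
    # Try the abbreviated form first, then the full month name.
    for fmt in ("%b. %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(date_text, fmt), country.strip()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {date_text!r}")

for item in ["Nov. 4, 2015Bangladesh", "April 8, 2015Saudi Arabia "]:
    print(split_date_and_country(item))
```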

Pythonic way to get datetime from string without leading zero

Pythonic way to get datetime from a string without leading zeroes?
e.g. no leading zero for Hour (typical case)
'Date: Jul 10, 2014 4:41:28 PM'
dateutil handles it out of the box (fuzzy helps to ignore unrelated parts of the string):
>>> from dateutil import parser
>>> s = "Date: Jul 10, 2014 4:41:28 PM"
>>> parser.parse(s, fuzzy=True)
datetime.datetime(2014, 7, 10, 16, 41, 28)
Without dateutil:
>>> import datetime
>>> d = datetime.datetime.strptime(s, 'Date: %b %d, %Y %I:%M:%S %p')
>>> d.hour
16
>>> d
datetime.datetime(2014, 7, 10, 16, 41, 28)
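The strptime version works because the numeric directives (%d, %I and so on) accept values with or without a leading zero when parsing. A short sketch to illustrate this, using a made-up input where both the day and the hour are single digits:

```python
from datetime import datetime

fmt = '%b %d, %Y %I:%M:%S %p'

# Day "4" and hour "4" have no leading zero; strptime still accepts them.
unpadded = datetime.strptime('Jul 4, 2014 4:01:05 PM', fmt)
padded = datetime.strptime('Jul 04, 2014 04:01:05 PM', fmt)

print(unpadded)            # 2014-07-04 16:01:05
print(unpadded == padded)  # True
```

Formatting is different: strftime always pads %d and %I, so to print without leading zeros it is safer to use the attributes (for example d.day) than to rely on platform-specific flags.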
