How to strip a date string to datetime? [duplicate] - python

I'm trying to parse a date in Python. My code works for every month except September, written as "Sept."
Here is my code:
time.strptime("Sept. 30, 2014", "%b. %d, %Y")
I get this error
ValueError: time data 'Sept. 30, 2014' does not match format '%b. %d, %Y'

The abbreviation for September is Sep, not Sept.
>>> datetime.strptime("Sep. 30, 2014", "%b. %d, %Y")
datetime.datetime(2014, 9, 30, 0, 0)
In the en_US locale, the abbreviated month names are Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, and Dec.

Writing "Sept" in place of "Sep" seems to be common. As suggested by @morgan-thrapp, you have to replace the former with the latter before parsing:
time.strptime("Sept. 30, 2014".upper().replace("SEPT", "SEP"), "%b. %d, %Y")
Calling upper() first converts all lower-case characters to upper case, so a single replace catches any capitalization of "Sept". strptime matches month names case-insensitively, so the resulting "SEP" still parses.
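A slightly more targeted variant, as a sketch: replace only the standalone word "Sept" (in any capitalization) and leave the rest of the string untouched. The helper name parse_sept_date is made up for illustration, and the example assumes an English (or C) locale, since %b is locale-dependent.

```python
import re
from datetime import datetime


def parse_sept_date(text):
    """Parse dates like 'Sept. 30, 2014' (hypothetical helper)."""
    # \b keeps the pattern from touching the full word "September".
    normalized = re.sub(r"\bSept\b", "Sep", text, flags=re.IGNORECASE)
    return datetime.strptime(normalized, "%b. %d, %Y")


print(parse_sept_date("Sept. 30, 2014"))
print(parse_sept_date("Oct. 1, 2014"))
```

Strings that already use a valid abbreviation pass through the substitution unchanged.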

Related

String to date in pandas

I have a dataset with dates encoded as strings in the format %B %d, %Y, e.g. September 10, 2021.
Using:
df['sale_date'] = pd.to_datetime(df.sale_date, format='%B %d, %Y')
produces this error:
ValueError: time data 'September 10, 2021' does not match format '%B %d, %Y' (match)
Manually checking with strptime:
datetime.strptime('September 10, 2021', '%B %d, %Y')
produces the correct datetime object.
Is there something I missed in pd.to_datetime?
Thanks.
Upon further investigation, I found that the error only occurs on the first element of the series. That string has '\ufeff' (a byte order mark, or BOM) prepended to it, so I removed it with series.str.replace() and now it works. Sorry for the bother. The remaining question is: how did that BOM end up there?
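On where the BOM likely comes from: a file saved as "UTF-8 with BOM" (some spreadsheet and editor exports do this) starts with the bytes EF BB BF, and decoding it as plain utf-8 keeps them as '\ufeff' at the start of the first value. A minimal sketch with made-up sample data; decoding with the utf-8-sig codec drops the mark:

```python
from datetime import datetime

# Made-up sample: the bytes of a small file saved as UTF-8 with a BOM.
raw = b"\xef\xbb\xbfSeptember 10, 2021\nSeptember 11, 2021\n"

with_bom = raw.decode("utf-8").splitlines()         # BOM survives as '\ufeff'
without_bom = raw.decode("utf-8-sig").splitlines()  # BOM is removed

print(repr(with_bom[0]))     # '\ufeffSeptember 10, 2021'
print(repr(without_bom[0]))  # 'September 10, 2021'

# Every value now matches the format, including the first one.
dates = [datetime.strptime(s, "%B %d, %Y") for s in without_bom]
```

If the data comes from a CSV file, passing encoding='utf-8-sig' to pd.read_csv should have the same effect. Printing repr() of a suspicious value is a quick way to spot invisible characters like this.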
Very likely you need to strip some whitespace first.
If I add whitespace at the beginning, the end, or both:
datestring = ' September 10, 2021 '
datetime.datetime.strptime(datestring, '%B %d, %Y')
I get the same error message as you:
ValueError: time data ' September 10, 2021 ' does not match format '%B %d, %Y'
For a single value, use:
datestring = ' September 10, 2021 '
datestring = datestring.strip()  # strip() returns a new string, so reassign it
For a column in a dataframe, use:
dummy = pd.DataFrame(columns=['Date'], data=[' September 10, 2021 ', ' September 11, 2021 ', ' September 12, 2021 '])
dummy.Date = dummy.Date.str.strip()

Python Pandas Date in format 'Thursday, March 03, 2019' want to convert to %m/%d/%y

I am using the python pandas and datetime libraries to convert dates in a date column from the following format: 'Thursday, March 03, 2019' to: '3/3/2019'.
Below is the code I am using to get the result, but I keep getting a ValueError: 'unconverted data remains'.
Does anyone know a way around this issue?
df_['Date'] = df_['Date'].apply(lambda x: dt.datetime.strptime(x, '%A, %B %d, %Y').strftime('%d/%m/%Y'))
I think you can use exact=False in pandas.to_datetime if your date string is part of a larger string.
The exact parameter behaves as follows: if True, require an exact format match; if False, allow the format to match anywhere in the target string.
Example:
In [6]: pd.to_datetime("Send this to me on Thursday, March 31, 2015", format='%A, %B %d, %Y', exact=False)
Out[6]: Timestamp('2015-03-31 00:00:00')
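For the original goal (turning 'Thursday, March 03, 2019' into '3/3/2019'), a plain-Python sketch is below. It assumes the "unconverted data remains" error comes from stray trailing whitespace, which the question does not confirm, and the function name to_short_date is hypothetical. The month and day are formatted by hand because the strftime flags for dropping leading zeros differ between platforms.

```python
from datetime import datetime


def to_short_date(text):
    """Convert 'Thursday, March 03, 2019' to '3/3/2019' (hypothetical helper)."""
    # strip() removes surrounding whitespace, assumed here to be the cause
    # of the "unconverted data remains" error.
    parsed = datetime.strptime(text.strip(), "%A, %B %d, %Y")
    # Plain integers carry no leading zeros.
    return f"{parsed.month}/{parsed.day}/{parsed.year}"


print(to_short_date("Thursday, March 03, 2019 "))  # 3/3/2019
```

With pandas, this could be applied to the column as df_['Date'].apply(to_short_date).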

How to convert string date (Nov 13, 2020) to datetime in pandas?

I have a dataset with abbreviated month names, and I have tried to follow some other solutions posted here, such as :
r.Date = pd.to_datetime(r.Date, format='%MMM %d, %Y')
but unfortunately it gives me a ValueError: time data 'Nov 13, 2020' does not match format '%d %B, %Y' (match). The month names are all abbreviated.
Change the format to use %b, the directive for an abbreviated month name:
pd.to_datetime('Nov 13, 2020',format='%b %d, %Y')
Out[23]: Timestamp('2020-11-13 00:00:00')
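If a column might mix abbreviated and full month names, one option is to try each format in turn. This is a sketch using only the standard library; parse_month_date and the FORMATS tuple are names invented for the example.

```python
from datetime import datetime

# Candidate formats, tried in order: abbreviated month first, then full name.
FORMATS = ("%b %d, %Y", "%B %d, %Y")


def parse_month_date(text):
    """Return a datetime for 'Nov 13, 2020' or 'November 13, 2020'."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no known format matches {text!r}")


print(parse_month_date("Nov 13, 2020"))
print(parse_month_date("November 13, 2020"))
```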

Converting string in python to date format

I'm having trouble converting a string to a date format. I'm using the time module to convert a string to the YYYY-MM-DD format. The code below is what I've tried, but I get the following error:
sre_constants.error: redefinition of group name 'Y' as group 5; was group 3
Here is the code:
import time
review_date = "April 18, 2018"
review_date = time.strptime(review_date, '%m %d %Y %I:%Y%m%d')
Firstly, the error is because you're using %Y, %m, and %d twice in your time.strptime() call.
Secondly, you're using the wrong format. The format you pass to strptime() has to match the format of the date / time string you pass, which in this case is: %B %d, %Y.
This is a good reference on the different format types.
I normally use datetime for this:
from datetime import datetime
review_date = "April 18, 2018"
review_date = datetime.strptime(review_date, '%B %d, %Y').strftime('%Y-%m-%d')
This code sets review_date to '2018-04-18'. See https://docs.python.org/3/library/datetime.html
The format code for a full month name such as April is %B. strptime() converts the string to a datetime object, and .strftime() converts the datetime object back to a string.
time.strptime() is for parsing strings into date/time structures. It takes two arguments, the string to be parsed and another string describing the format of the string to be parsed.
Try this:
time.strptime("April 18, 2018", "%B %d, %Y")
... and notice that "%B %d, %Y" is:
Full month name in the current locale ("April")
[Space]
Day of the month (18)
[Comma]
[Space]
Four-digit year (2018)
The format string specification that you provided bears no resemblance to the formatting of your date string.
These "magic" formatting codes are enumerated in the documentation for time.strftime()
The strptime call should be:
review_date = time.strptime(review_date, '%B %d, %Y')
So the full code becomes:
import time
review_date = "April 18, 2018"
review_date = time.strptime(review_date, '%B %d, %Y')
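To get the YYYY-MM-DD string the question asks for while staying with the time module, the parsed struct_time can be passed to time.strftime(). A minimal sketch:

```python
import time

review_date = "April 18, 2018"

# Parse the string into a time.struct_time using the matching format.
parsed = time.strptime(review_date, "%B %d, %Y")

# Format the struct_time back into a string in the desired layout.
iso_date = time.strftime("%Y-%m-%d", parsed)
print(iso_date)  # 2018-04-18
```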

Datetime from string doesn't match

I am trying to match a specific datetime format from a string but I am receiving a ValueError and I am not sure why. I am using the following format:
t = datetime.datetime.strptime(t,"%b %d, %Y %H:%M:%S.%f Eastern Standard Time")
which is an attempt to match the following string:
Nov 19, 2017 20:09:14.071360000 Eastern Standard Time
Can anyone see why these do not match?
From the docs we can see that %f expects:
Microsecond as a decimal number, zero-padded on the left.
The problem with your string is that its fractional part has nine digits, padded with zeros on the right, whereas %f accepts at most six digits.
Here is one way to fix your issue:
new_t = t.partition(" Eastern Standard Time")[0].rstrip('0') + ' Eastern Standard Time'
print(new_t)
#Nov 19, 2017 20:09:14.07136 Eastern Standard Time
t2 = datetime.datetime.strptime(new_t,"%b %d, %Y %H:%M:%S.%f Eastern Standard Time")
print(t2)
#2017-11-19 20:09:14.071360
As noted by pault and the documentation, the issue is that the %f directive accepts at most 6 digits for the fractional seconds. While their solution works fine for your string, you might have an issue if your string is something like:
'Nov 19, 2017 20:09:14.071360123 Eastern Standard Time'
In that case, calling rstrip('0') would not cut the fractional seconds down to the proper length. You could instead truncate them with a regex:
import re
import datetime
date_string = 'Nov 19, 2017 20:09:14.071360123 Eastern Standard Time'
# use a regex to strip the microseconds to 6 decimal places:
new_date_string = ''.join(re.findall(r'(.*\.\d{6})\d+(.*)', date_string)[0])
print(new_date_string)
#Nov 19, 2017 20:09:14.071360 Eastern Standard Time
t = datetime.datetime.strptime(new_date_string,"%b %d, %Y %H:%M:%S.%f Eastern Standard Time")
print(t)
#2017-11-19 20:09:14.071360
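Note that the findall pattern above needs at least seven fractional digits and raises IndexError otherwise. A variant with re.sub, sketched below, truncates long fractions and leaves shorter ones alone. parse_est_timestamp is a hypothetical name, and the result is a naive datetime because the time zone text is matched literally rather than interpreted.

```python
import re
from datetime import datetime

FORMAT = "%b %d, %Y %H:%M:%S.%f Eastern Standard Time"


def parse_est_timestamp(text):
    """Parse a timestamp whose fractional seconds may exceed six digits."""
    # Keep the first six digits after the decimal point and drop any extras.
    # If there are six digits or fewer, nothing matches and text is unchanged.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", text)
    return datetime.strptime(trimmed, FORMAT)


print(parse_est_timestamp("Nov 19, 2017 20:09:14.071360123 Eastern Standard Time"))
print(parse_est_timestamp("Nov 19, 2017 20:09:14.5 Eastern Standard Time"))
```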
