datetime.strptime() conversion failure for year 10000 - python

The Python code below worked fine for dates up to 12-31-9999.
It raised an error once I changed the date to 01-01-10000.
from datetime import datetime
myNewDate = "12-31-9999"
myNewDateTime = datetime.strptime(myNewDate, '%m-%d-%Y').date()
print(myNewDateTime)

According to the docs, only 4 digit years are supported.
The years are limited by the 1989 C standard.

%Y matches a 4-digit year, so the trailing 0 of 10000 is left unmatched by the pattern. Python's datetime cannot represent years beyond 9999 (datetime.MAXYEAR): https://docs.python.org/3/library/datetime.html#datetime.MAXYEAR
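A minimal sketch of that limit, using only the standard library. The helper name try_parse is hypothetical and not part of datetime; it only wraps the ValueError that strptime raises.

```python
from datetime import datetime, date, MAXYEAR

def try_parse(text, fmt="%m-%d-%Y"):
    """Return a date, or None if strptime rejects the string (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # For "01-01-10000", %Y consumes four digits and a stray "0" remains.
        return None

print(MAXYEAR)                   # 9999
print(try_parse("12-31-9999"))   # 9999-12-31
print(try_parse("01-01-10000"))  # None
```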

Related

Why does pandas interpret Aug-30 as 1930-08, but not 2030-08? [duplicate]

I'm coming across something that is almost certainly a stupid mistake on my part, but I can't seem to figure out what's going on.
Essentially, I have a series of dates as strings in the format "%d-%b-%y", such as 26-Sep-05. When I go to convert them to datetime, the year is sometimes correct, but sometimes it is not.
E.g.:
dates = ['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55']
pd.to_datetime(dates, format="%d-%b-%y")
DatetimeIndex(['2005-09-26', '2005-09-26', '1970-06-15', '1994-12-05',
'2061-01-09', '2055-02-08'],
dtype='datetime64[ns]', freq=None)
The last two entries, which get returned as 2061 and 2055 for the years, are wrong. But this works fine for the 15-Jun-70 entry. What's going on here?
That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which falls between 68 and 69:
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)
>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)
Two-digit year ambiguity
So any %y value below 69 is assigned to the 2000s, and values from 69 upwards are assigned to the 1900s.
A two-digit %y year can only range from 00 to 99, which becomes ambiguous as soon as the data crosses centuries.
If there is no overlap, you could manually process it and annotate the century (kill the ambiguity)
I suggest you process your data manually and specify the century, e.g. you can decide that anything in your data that has the year between 17 and 68 is attributed to 1917 - 1968 (instead of 2017 - 2068).
If you have overlap then you can't process with insufficient year information, unless e.g. you have some ordered data and a reference
If you have overlap e.g. you have data from both 2016 and 1916 and both were logged as '16', that's ambiguous and there isn't sufficient information to parse this, unless the data is ordered by date in which case you can use heuristics to switch the century as you parse it.
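The no-overlap case could be sketched as follows. This assumes every date in the data is known to be no later than some year; the function name and the latest_year parameter are hypothetical, and %b assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, date

def parse_two_digit_year(text, fmt="%d-%b-%y", latest_year=2016):
    """Parse a %y date, then move anything after latest_year back one century.

    latest_year is an assumption about the data set, not something
    strptime knows about.
    """
    parsed = datetime.strptime(text, fmt)
    if parsed.year > latest_year:
        # Note: replace() raises ValueError for 29 February if the target
        # year is not a leap year (e.g. 2000 -> 1900).
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.date()

for raw in ["26-Sep-05", "15-Jun-70", "9-Jan-61", "8-Feb-55"]:
    print(raw, "->", parse_two_digit_year(raw))
```

With latest_year=2016, the years 17 to 68 end up in 1917 to 1968, which matches the manual rule described above.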
from the docs
Year 2000 (Y2K) issues: Python depends on the platform’s C library,
which generally doesn’t have year 2000 issues, since all dates and
times are represented internally as seconds since the epoch. Function
strptime() can parse 2-digit years when given %y format code. When
2-digit years are parsed, they are converted according to the POSIX
and ISO C standards: values 69–99 are mapped to 1969–1999, and values
0–68 are mapped to 2000–2068.
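To make the quoted rule concrete, here is a small sketch that restates it as a function and compares it with what strptime returns. The name posix_century is made up for illustration.

```python
from datetime import datetime

def posix_century(yy):
    """Expand a two-digit year as the quoted rule describes (illustrative helper)."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= 69 else 2000 + yy

# Compare the hand-written rule with strptime for the boundary values.
for yy in (0, 68, 69, 99):
    parsed_year = datetime.strptime(f"{yy:02d}", "%y").year
    print(yy, posix_century(yy), parsed_year)
```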
For anyone looking for a quick and dirty code snippet to fix these cases, this worked for me:
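import pandas as pd
col = 'date'
df[col] = pd.to_datetime(df[col])
# Compare against a Timestamp; recent pandas versions refuse to compare datetime64 with datetime.date
future = df[col] > pd.Timestamp(year=2050, month=1, day=1)
# DateOffset shifts by calendar years, so leap days are handled correctly
df.loc[future, col] = df.loc[future, col] - pd.DateOffset(years=100)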
You may need to tune the threshold date closer to the present depending on the earliest dates in your data.
You can write a simple function to correct the wrongly parsed years, as shown below:
import datetime
def fix_date(x):
    # Years after the threshold are assumed to belong to the previous century;
    # adjust 1989 to suit your data.
    if x.year > 1989:
        year = x.year - 100
    else:
        year = x.year
    return datetime.date(year, x.month, x.day)
df['date_column'] = df['date_column'].apply(fix_date)
Hope this helps..
Another quick solution to the problem:
import pandas as pd
import numpy as np
dates = pd.DataFrame(['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55'])
for i in dates:
    tempyear=pd.to_numeric(dates[i].str[-2:])
    dates["temp_year"]=np.where((tempyear>=44)&(tempyear<=99),tempyear+1900,tempyear+2000).astype(str)
    dates["temp_month"]=dates[i].str[:-2]
    dates["temp_flyr"]=dates["temp_month"]+dates["temp_year"]
    dates["pddt"]=pd.to_datetime(dates.temp_flyr.str.upper(), format='%d-%b-%Y', yearfirst=False)
    tempdrops=["temp_year","temp_month","temp_flyr",i]
    dates.drop(tempdrops, axis=1, inplace=True)
The output is as follows; the object column has been converted to pandas datetime format using pd.to_datetime:
pddt
0 2005-09-26
1 2005-09-26
2 1970-06-15
3 1994-12-05
4 1961-01-09
5 1955-02-08
As mentioned in some other answers this works best if there is no overlap between the dates of the two centuries.
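The same string-level idea can be sketched without pandas: expand the two-digit year to four digits before parsing, so that %Y is used and no century guessing takes place. The function name and the pivot default of 44 (taken from the snippet above) are illustrative assumptions, and %b assumes English month abbreviations.

```python
from datetime import datetime, date

def expand_year(text, pivot=44):
    """Rewrite a trailing two-digit year as four digits, then parse with %Y.

    Years >= pivot go to the 1900s, the rest to the 2000s. The pivot is an
    assumption about the data, not a library default.
    """
    head, yy = text[:-2], int(text[-2:])
    century = 1900 if yy >= pivot else 2000
    return datetime.strptime(f"{head}{century + yy}", "%d-%b-%Y").date()

for raw in ["26-Sep-05", "15-Jun-70", "5-Dec-94", "9-Jan-61", "8-Feb-55"]:
    print(raw, "->", expand_year(raw))
```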
If you run into the same problem with a pandas DataFrame, pick a cutoff (the current date, or a particular year) and apply a lambda function that shifts later dates back a century, similar to the following (assuming import pandas as pd):
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x > pd.Timestamp.now() else x)
or
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x.year > 2022 else x)

How to subtract days from a date using datetime [duplicate]

Something surely extremely simple, but I've been browsing around for almost one hour and couldn't find:
Working with Python, I have a date d="2020-01-22" (means January, 22nd, 2020) and I want to calculate the date corresponding to d - 57 days. With datetime, surely, but how, exactly?
Use the following code-
from datetime import datetime, timedelta
d = datetime.today()
new_d = d - timedelta(days=57)
Use package datetime.
# python3
import datetime
d = datetime.datetime.strptime("2020-01-22", '%Y-%m-%d')
print(d - datetime.timedelta(days=57)) # 2019-11-26 00:00:00
You can use datetime.strptime, the main function for parsing strings into datetimes. It can handle many formats, with the format determined by a format string you provide.
from datetime import datetime
date_time = datetime.strptime('2020-01-22', '%Y-%m-%d')
print(date_time)
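Putting the parsing and the subtraction together for the date in the question, a short sketch might look like this. The helper name days_before is hypothetical; date.fromisoformat requires Python 3.7 or later.

```python
from datetime import date, timedelta

def days_before(iso_date, days):
    """Return the ISO date that lies `days` days before `iso_date` (hypothetical helper)."""
    return (date.fromisoformat(iso_date) - timedelta(days=days)).isoformat()

print(days_before("2020-01-22", 57))  # 2019-11-26
```

Working with date rather than datetime avoids the trailing 00:00:00 in the printed result.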

Finding the day x years,months from a given time (Python) [duplicate]

Have a maths question here which I know to solve using only pen and paper. Takes a while with that approach, mind you. Does anybody know to do this using Python? I've done similar questions involving "dates" but none involving "days". Any of you folk able to figure this one out?
The date on 25/11/1998 is a Wednesday. What is the day on 29/08/2030?
Can anyone at least suggest an algorithm?
Cheers
Use the wonderful datetime module:
>>> import datetime
>>> mydate = datetime.datetime.strptime('29/08/2030', '%d/%m/%Y')
>>> print(mydate.strftime('%A'))
Thursday
The algorithm/math is quite simple: a week always has 7 days. Calculate the number of days between the two dates, add it to the weekday of the given date, then take the sum modulo 7.
>>> from datetime import datetime
>>> given_day = datetime(1998, 11, 25)
>>> cal_day = datetime(2030, 8, 29)
>>> print(cal_day.weekday())
3
>>> print((given_day.weekday() + (cal_day - given_day).days) % 7)
3
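The modulo-7 reasoning can also be sketched so that it relies only on the weekday stated in the question, not on weekday() for the known date. The function name and the DAY_NAMES list are illustrative.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_from_reference(known_date, known_day_name, target_date):
    """Derive the target weekday from a known (date, weekday) pair."""
    known_index = DAY_NAMES.index(known_day_name)
    elapsed_days = (target_date - known_date).days
    # Python's % always returns a non-negative result here, so earlier
    # target dates work as well.
    return DAY_NAMES[(known_index + elapsed_days) % 7]

print(weekday_from_reference(date(1998, 11, 25), "Wednesday", date(2030, 8, 29)))
# Thursday
```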

get year from string '07-JUL-50' [duplicate]

I have a birth date field like:
date = '07-JUL-50'
and when I wanna get the year I got this:
my_date = datetime.datetime.strptime(date, "%d-%b-%y")
>>> my_date.year
2050
Is there an elegant way of get '1950'??
Thx
Documentation says that:
When 2-digit years are accepted, they are converted according to the
POSIX or X/Open standard: values 69-99 are mapped to 1969-1999, and
values 0–68 are mapped to 2000–2068.
That's why you are seeing 2050.
If you want 1950 instead, it depends on what are allowed dates in your particular scenario. E.g. if date values can be only dates in the past, you can do something like this:
import datetime
def get_year(date):
    my_date = datetime.datetime.strptime(date, "%d-%b-%y")
    now = datetime.datetime.now()
    # A parsed date in the future must belong to the previous century
    if my_date > now:
        my_date = my_date.replace(year=my_date.year - 100)
    return my_date.year
print(get_year('07-JUL-50'))  # prints 1950 (until 7 July 2050)
print(get_year('07-JUL-13'))  # prints 2013
print(get_year('07-JUL-14'))  # printed 1914 when this was written in 2013; the result depends on the current date
Hope that helps.

How to convert the integer date format into YYYYMMDD?

Python and Matlab quite often have integer date representations as follows:
733828.0
733829.0
733832.0
733833.0
733834.0
733835.0
733836.0
733839.0
733840.0
733841.0
these numbers correspond to some dates this year. Do you guys know which function can convert them back to YYYYMMDD format?
thanks a million!
The datetime.datetime class can help you here. The following works, if those values are treated as integer days (you don't specify what they are).
>>> from datetime import datetime
>>> dt = datetime.fromordinal(733828)
>>> dt
datetime.datetime(2010, 2, 25, 0, 0)
>>> dt.strftime('%Y%m%d')
'20100225'
You show the values as floats, and the above doesn't take floats. If you can give more detail about what the data is (and where it comes from) it will be possible to give a more complete answer.
Since a Python example has already been shown, here is the MATLAB one:
>> datestr(733828, 'yyyymmdd')
ans =
20090224
Also, note that while looking similar these are actually different things in Matlab and Python:
Matlab
A serial date number represents the whole and fractional number of days
from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns
the number 1. (The year 0000 is merely a reference point and is not intended
to be interpreted as a real year in time.)
Python, datetime.date.fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1.
So they would differ by 366 days, which is apparently the length of the year 0.
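Given that 366-day offset, a MATLAB serial date number could be converted in Python roughly as follows. The function name is hypothetical, and the sketch assumes the input really is a MATLAB datenum, with the fractional part treated as a fraction of a day.

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB serial date number to a naive datetime (illustrative)."""
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    # MATLAB counts from year 0000, Python ordinals from year 1: shift by 366 days.
    return datetime.fromordinal(whole_days - 366) + timedelta(days=day_fraction)

print(matlab_datenum_to_datetime(733828.0).strftime("%Y%m%d"))  # 20090224
print(matlab_datenum_to_datetime(733828.5))                     # 2009-02-24 12:00:00
```

The first result agrees with the datestr output shown above.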
Dates like 733828.0 are Rata Die dates, counted from January 1, 1 A.D. (and decimal fraction of days). They may be UTC or by your timezone.
Julian Dates, used mostly by astronomers, count the days (and decimal fraction of days) since January 1, 4713 BC Greenwich noon. Julian date is frequently confused with Ordinal date, which is the date count from January 1 of the current year (Feb 2 = ordinal day 33).
So datetime is calling these things ordinal dates, but I think this only makes sense locally, in the world of python.
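To keep the three counts apart, here is a small sketch computing each of them for a date. The function names are made up for illustration; the constant 1721424.5 is the Julian Date of the midnight that starts proleptic Gregorian ordinal day 0, and the Julian Date returned is for midnight at the start of the given day, ignoring time zones.

```python
from datetime import date

JD_OFFSET = 1721424.5  # Julian Date at 00:00 of ordinal day 0

def rata_die(d):
    """Days counted from January 1 of year 1 (Python's 'ordinal')."""
    return d.toordinal()

def julian_date_at_midnight(d):
    """Julian Date at 00:00 on the given day."""
    return d.toordinal() + JD_OFFSET

def day_of_year(d):
    """Day count within the current year (the other meaning of 'ordinal date')."""
    return d.timetuple().tm_yday

print(rata_die(date(2010, 2, 25)))               # 733828
print(julian_date_at_midnight(date(2000, 1, 1)))  # 2451544.5
print(day_of_year(date(2021, 2, 2)))             # 33
```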
Is 733828.0 a timestamp? If so, you can do the following:
import datetime as dt
dt.date.fromtimestamp(733828.0).strftime('%Y%m%d')
PS
I think Peter Hansen is right :)
I am not a native English speaker. Just trying to help. I don't quite know the difference between a timestamp and an ordinal :(
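One way to tell the two readings apart is to try both and see which gives a plausible date. The sketch below interprets the value in UTC for the timestamp case so that the result does not depend on the local time zone; the variable names are illustrative.

```python
from datetime import datetime, timezone

value = 733828.0

# Read as a Unix timestamp: seconds since 1970-01-01 UTC.
as_timestamp = datetime.fromtimestamp(value, tz=timezone.utc)

# Read as a proleptic Gregorian ordinal: days since 0001-01-01.
as_ordinal = datetime.fromordinal(int(value))

print(as_timestamp.date())  # 1970-01-09
print(as_ordinal.date())    # 2010-02-25
```

A date in January 1970 is unlikely for recent data, which suggests the values are day counts and not timestamps.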
