I am trying to find the number of months between two dates. Some solutions are off by 1 month and others are off by several months. I found this solution on SO, but the answers there are either too complicated or incorrect.
For example, given the starting date of 04/30/12 and ending date of 03/31/16,
def diff_month(d1, d2):
    return (d1.year - d2.year)*12 + d1.month - d2.month
returns 47 months, not 48
and
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
returns 44 (the reason being that February has no 30th day, so rrule does not treat it as a valid date and skips that month)
I can of course fix that by doing
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt.replace(day=2), until=end_dt.replace(day=1))]
But this does not seem like a proper solution (I mean the answer is right but the method sucks).
Is there a proper way of calculating the # of months so that given my example dates, it would return 48?
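I realize this post doesn't have a Pandas tag, but if you are willing to use it, you can simply take the difference between two monthly periods. Note that this counts the month boundaries crossed, so it gives 47 for the example dates (recent pandas versions display the result as <47 * MonthEnds>, with the integer available as .n):
>>> import pandas as pd
>>> pd.Period('2016-3-31', 'M') - pd.Period('2012-4-30', 'M')
47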
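If the goal is to count every calendar month the range touches, including both the first and the last one, one option is to add 1 to the plain month difference. The sketch below assumes that inclusive interpretation (it is what makes the example dates yield 48); `months_spanned` is a hypothetical helper name, not a library function.

```python
from datetime import date

def months_spanned(start, end):
    """Count the calendar months touched by [start, end], both ends included."""
    # Assumption: a partially covered first or last month still counts as one month.
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

print(months_spanned(date(2012, 4, 30), date(2016, 3, 31)))  # 48
```

Because only the year and month are used, the day of the month never matters, which sidesteps the "February has no 30th" problem.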
I'm coming across something that is almost certainly a stupid mistake on my part, but I can't seem to figure out what's going on.
Essentially, I have a series of dates as strings in the format "%d-%b-%y", such as 26-Sep-05. When I go to convert them to datetime, the year is sometimes correct, but sometimes it is not.
E.g.:
dates = ['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55']
pd.to_datetime(dates, format="%d-%b-%y")
DatetimeIndex(['2005-09-26', '2005-09-26', '1970-06-15', '1994-12-05',
'2061-01-09', '2055-02-08'],
dtype='datetime64[ns]', freq=None)
The last two entries, which get returned as 2061 and 2055 for the years, are wrong. But this works fine for the 15-Jun-70 entry. What's going on here?
That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which falls between 68 and 69:
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)
>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)
Two-digit year ambiguity
So it seems that any %y value below 69 is assigned to the 2000s, and values from 69 upwards are assigned to the 1900s.
The two digits of %y can only range from 00 to 99, which becomes ambiguous as soon as the data crosses centuries.
If there is no overlap, you can process the data manually and annotate the century (killing the ambiguity)
I suggest you process your data manually and specify the century, e.g. you can decide that any year between 17 and 68 in your data belongs to 1917 - 1968 (instead of 2017 - 2068).
If there is overlap, the year information alone is insufficient, unless e.g. you have ordered data and a reference
If there is overlap, e.g. you have data from both 2016 and 1916 and both were logged as '16', the values are ambiguous and there isn't enough information to parse them, unless the data is ordered by date, in which case you can use heuristics to switch the century as you parse.
From the docs:
Year 2000 (Y2K) issues: Python depends on the platform’s C library,
which generally doesn’t have year 2000 issues, since all dates and
times are represented internally as seconds since the epoch. Function
strptime() can parse 2-digit years when given %y format code. When
2-digit years are parsed, they are converted according to the POSIX
and ISO C standards: values 69–99 are mapped to 1969–1999, and values
0–68 are mapped to 2000–2068.
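When the data is known not to extend past some year, the fixed 69 pivot can be replaced by your own. The sketch below is one way to do that, assuming strings in the "%d-%b-%y" format from the question and English month abbreviations; `parse_two_digit_year` and `latest_year` are hypothetical names introduced for illustration.

```python
from datetime import date, datetime

def parse_two_digit_year(text, latest_year):
    """Parse 'DD-Mon-YY', choosing the century so the year never exceeds latest_year."""
    day, month, yy = text.split("-")
    century = latest_year - latest_year % 100   # e.g. 2024 -> 2000
    year = century + int(yy)
    if year > latest_year:                      # would lie in the future: use the previous century
        year -= 100
    return datetime.strptime(f"{day}-{month}-{year}", "%d-%b-%Y").date()

print(parse_two_digit_year("9-Jan-61", latest_year=2024))   # 1961-01-09
print(parse_two_digit_year("26-Sep-05", latest_year=2024))  # 2005-09-26
```

Expanding the year before parsing, rather than shifting the parsed date afterwards, means an impossible date in the chosen century raises a ValueError instead of being silently accepted.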
For anyone looking for a quick and dirty code snippet to fix these cases, this worked for me:
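from datetime import timedelta
import pandas as pd
col = 'date'
df[col] = pd.to_datetime(df[col])
# compare against a Timestamp; newer pandas versions reject comparisons with datetime.date
future = df[col] > pd.Timestamp(year=2050, month=1, day=1)
# 36525 days is exactly 100 years as long as the span contains 25 leap days (true between 1901 and 2099)
df.loc[future, col] -= timedelta(days=365.25*100)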
You may need to tune the threshold date closer to the present depending on the earliest dates in your data.
You can write a simple function to correct the wrongly parsed year, as shown below:
import datetime
def fix_date(x):
    # years after the cutoff were parsed into the wrong century; adjust the cutoff to your data
    if x.year > 1989:
        year = x.year - 100
    else:
        year = x.year
    return datetime.date(year, x.month, x.day)
df['date_column'] = df['date_column'].apply(fix_date)
Hope this helps..
Another quick solution to the problem:
import pandas as pd
import numpy as np
dates = pd.DataFrame(['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55'])
for i in dates:
    tempyear = pd.to_numeric(dates[i].str[-2:])
    dates["temp_year"] = np.where((tempyear >= 44) & (tempyear <= 99), tempyear + 1900, tempyear + 2000).astype(str)
    dates["temp_month"] = dates[i].str[:-2]
    dates["temp_flyr"] = dates["temp_month"] + dates["temp_year"]
    dates["pddt"] = pd.to_datetime(dates.temp_flyr.str.upper(), format='%d-%b-%Y', yearfirst=False)
    tempdrops = ["temp_year", "temp_month", "temp_flyr", i]
    dates.drop(tempdrops, axis=1, inplace=True)
The output is as follows. Here I have converted the column from object to pandas datetime format using pd.to_datetime:
pddt
0 2005-09-26
1 2005-09-26
2 1970-06-15
3 1994-12-05
4 1961-01-09
5 1955-02-08
As mentioned in some other answers this works best if there is no overlap between the dates of the two centuries.
If you run into the same problem with a pandas DataFrame, compare each date against the current date, or its year against a chosen cutoff year, and apply a lambda function similar to the one below:
import datetime as dt
import pandas as pd
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x > dt.datetime.now() else x)
or
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x.year > 2022 else x)
I would like to extract a week number from a date in a pandas DataFrame, but counting from a SPECIFIC date.
For example, from 20th April:
20/04/2010 --> 1
27/04/2010 --> 2
04/05/2010 --> 3
and so on..
Any idea?
Thank you in advance!
Just calculate the difference in days between the two dates, floor-divide by 7 and add 1:
from datetime import date
origin = date(2010, 4, 20)
def week_number_from(my_date, origin):
    # floor division keeps the result an integer on Python 3
    return (my_date - origin).days // 7 + 1
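The same arithmetic can be applied to a whole DataFrame column at once. This is a minimal sketch, assuming the dates are day-first strings as in the question; the column names `date` and `week` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20/04/2010", "27/04/2010", "04/05/2010"]})
origin = pd.Timestamp("2010-04-20")

# Parse day-first strings, then count whole weeks elapsed since the origin.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["week"] = (parsed - origin).dt.days // 7 + 1

print(df["week"].tolist())  # [1, 2, 3]
```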
Use pandas to_datetime to parse your date column if it is not already in datetime format.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
Then use the weekday attribute of the .dt accessor.
https://docs.python.org/2/library/datetime.html
my_dataframe['week_nums'] = pandas.to_datetime(my_dataframe['date_col']).dt.weekday
Sorry, I see you want the week counted from a specific date. I will update the answer; it is easy to calculate the difference between two dates.
I have a list of dates as generated by:
from dateutil import parser
from datetime import date, timedelta
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
delta = (d2-d1).days
date_list = [d1 + timedelta(days=x) for x in range(0, delta+1)]
In this list there are 6 days in November 2015, 31 days in December 2015, 31 days in January 2016 and 6 days in February 2016. December 2015 and January 2016 are "full" months, i.e. the date list contains all days of those months.
How can I get this information programmatically in Python, in order to produce a list such as:
[(2015,11,6,False),(2015,12,31,True),(2016,1,31,True),(2016,2,6,False)]
Found a neat short solution:
from dateutil import parser
from datetime import date, timedelta
from collections import Counter
from calendar import monthrange
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
delta = (d2-d1).days
date_list = [d1 + timedelta(days=x) for x in range(0, delta+1)]
month_year_list = [(d.year, d.month) for d in date_list]
# a month is "full" when its day count equals the length of that month
result = [(k[0], k[1], v, monthrange(k[0], k[1])[1] == v)
          for k, v in Counter(month_year_list).items()]
print(result)
Walk the list and accumulate the number of days for each year/month combination:
from collections import defaultdict
days_in_year_month = defaultdict(int)
for each_date in date_list:
    days_in_year_month[(each_date.year, each_date.month)] += 1
Next, output the tuples with each year, month, count and T/F:
import calendar
result = []
for year_month in days_in_year_month.keys():
    days_in_ym = days_in_year_month[year_month]
    is_complete = days_in_ym == calendar.monthrange(year_month[0], year_month[1])[1]
    result.append((year_month[0], year_month[1], days_in_ym, is_complete))
So:
I learned about monthrange here: How do we determine the number of days for a given month in python
My solution sucks because it will do a total of 3 loops: the initial loop from your list comprehension, plus the two loops I added. Since you're walking the days in order for your list comprehension, this could be much optimized to run in a single loop.
I didn't test it :)
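The single-pass idea mentioned above can be sketched with itertools.groupby, which groups consecutive items sharing a key. This assumes the date list is already in chronological order (true for the list comprehension in the question); the stdlib `date` type stands in for the dateutil parser to keep the example self-contained.

```python
from calendar import monthrange
from datetime import date, timedelta
from itertools import groupby

d1, d2 = date(2015, 11, 25), date(2016, 2, 6)
date_list = [d1 + timedelta(days=x) for x in range((d2 - d1).days + 1)]

result = []
# groupby only merges adjacent items, so the list must be sorted by date.
for (year, month), days in groupby(date_list, key=lambda d: (d.year, d.month)):
    count = sum(1 for _ in days)
    result.append((year, month, count, count == monthrange(year, month)[1]))

print(result)
# [(2015, 11, 6, False), (2015, 12, 31, True), (2016, 1, 31, True), (2016, 2, 6, False)]
```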
The previously mentioned solutions seem OK, but I believe there is a more efficient one, since they need to build a list that contains every day. For a small date difference this won't be a problem, but as the difference grows, the list becomes a lot larger.
I want to give another approach that is more intuitive: you already know that all months strictly between the two dates are full, and that the months of the dates themselves are (in general) not full.
The code leverages that information, so the loop only iterates over the number of months between the dates.
The code:
from dateutil import parser
from calendar import monthrange
d1 = parser.parse("2015-11-25")
d2 = parser.parse("2016-02-06")
# needed to calculate the number of months between the dates
m1 = d1.year * 12 + (d1.month - 1)
m2 = d2.year * 12 + (d2.month - 1)
result = []
# append the first month, since this will not be full
result.append((d1.year, d1.month, monthrange(d1.year, d1.month)[1] - d1.day + 1, False))
current_month = d1.month
current_year = d1.year
# loop through the months and years that follow d1
for _ in range(m2 - m1 - 1):
    if current_month + 1 > 12:
        current_month = 1
        current_year += 1
    else:
        current_month += 1
    result.append((current_year, current_month, monthrange(current_year, current_month)[1], True))
# append the last month, since this will not be full either
result.append((d2.year, d2.month, d2.day, False))
print(result)
Keep in mind that the code I gave is only an example. For instance, it does not support the scenario where the two given dates fall in the same month.
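One way to cover the same-month case as well is to clip each month to the requested range and compare the covered days with the length of the month. The following is a sketch under the assumption that start <= end; `month_coverage` is a hypothetical helper name. Unlike the code above, it marks the first or last month as full when the range happens to cover it completely.

```python
from calendar import monthrange
from datetime import date

def month_coverage(start, end):
    """Return (year, month, days_covered, is_full) for each month in [start, end]."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        days_in_month = monthrange(year, month)[1]
        # Clip the month to the requested range.
        first = max(start, date(year, month, 1))
        last = min(end, date(year, month, days_in_month))
        covered = (last - first).days + 1
        result.append((year, month, covered, covered == days_in_month))
        # Advance to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result

print(month_coverage(date(2015, 11, 25), date(2016, 2, 6)))
# [(2015, 11, 6, False), (2015, 12, 31, True), (2016, 1, 31, True), (2016, 2, 6, False)]
```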
I have two yyyymm values that will be input by a user:
yyyymm_1 = '201406'
yyyymm_2 = '201501'
I want to be able to iterate through this range in increasing month order:
for yyyy and mm in the range of yyyymm_1 to yyyymm_2
my_function( yyyy, mm )
How can this be done in python?
Update:
Ideally, the solution should be as simple as possible without requiring external libraries. I'm not looking for a generic date manipulation solution, but a solution to answer the specific question I have asked above.
I had seen lots of generic solutions before posting my question. However, being a python noob, couldn't see how to adapt them to my question:
Generate a list of datetimes between an interval
Iterating through a range of dates in Python
On that note, the other questions linked to from this page are much more generic. If you are looking to generate a range of yyyymm values, I urge you to look at the selected answer on this page.
Here's another rather simple variant, without even using datetime. Just split the date, calculate the 'total month', and iterate.
def to_month(yyyymm):
    y, m = int(yyyymm[:4]), int(yyyymm[4:])
    return y * 12 + m
def iter_months(start, end):
    for month in range(to_month(start), to_month(end) + 1):
        y, m = divmod(month-1, 12)  # ugly fix to compensate
        yield y, m + 1              # for 12 % 12 == 0
for y, m in iter_months('201406', '201501'):
    print(y, m)
Output:
2014 6
2014 7
...
2014 12
2015 1
For output in the same yyyymm format, use print("%d%02d" % (y, m)).
You can do this using the builtin datetime module and the third party package dateutil.
The code first converts your strings to datetime.date objects using datetime.datetime.strptime followed by .date(). It then uses relativedelta from dateutil to create a period of one month that can be added to your dates.
Within the while loop you can either work with the date objects directly, or construct the month and year as strings using strftime. I've shown an example of both in the print calls.
import datetime as dt
from dateutil.relativedelta import relativedelta
yyyymm_1 = '201406'
yyyymm_2 = '201501'
MONTH = relativedelta(months=+1)
fmt = '%Y%m'
date_1 = dt.datetime.strptime(yyyymm_1, fmt).date()
date_2 = dt.datetime.strptime(yyyymm_2, fmt).date()
d = date_1
while d <= date_2:
    print(d)
    print(d.strftime('%Y'), d.strftime('%m'))
    d += MONTH
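If pandas is already a dependency, a monthly period range gives the same iteration without a manual loop counter. This is only a sketch of an alternative, not what the answer above uses; `iter_year_months` is a hypothetical helper name, and it assumes both inputs are six-character yyyymm strings.

```python
import pandas as pd

def iter_year_months(yyyymm_start, yyyymm_end):
    """Yield (year, month) for every month from start to end, inclusive."""
    # Turn 'yyyymm' into 'yyyy-mm', a form pandas parses unambiguously.
    start = f"{yyyymm_start[:4]}-{yyyymm_start[4:]}"
    end = f"{yyyymm_end[:4]}-{yyyymm_end[4:]}"
    for period in pd.period_range(start=start, end=end, freq="M"):
        yield period.year, period.month

months = list(iter_year_months("201406", "201501"))
print(months[0], months[-1], len(months))  # (2014, 6) (2015, 1) 8
```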
I am working with a dataset that encodes dates as the integer number of months since December 1899, so month 1 is January 1900 and month 1165 is January 1997. I would like to convert these to a pandas DatetimeIndex. So far the best I've come up with is:
month0 = np.datetime64('1899-12-15')
one_month = np.timedelta64(30, 'D') + np.timedelta64(10.5, 'h')
birthdates = pandas.DatetimeIndex(month0 + one_month * resp.cmbirth)
The start date is the 15th of the month, and the timedelta is 30 days 10.5 hours, the average length of a calendar month. So the date within the month drifts by a day or two.
So this seems a little hacky and I wondered if there's a better way.
You can use built-in pandas date-time functionality.
import pandas as pd
import numpy as np
indexed_months = np.random.randint(0, 1166, size=100)  # random month numbers in [0, 1165]; random_integers is deprecated
month0 = pd.to_datetime('1899-12-01')
date_list = [month0 + pd.DateOffset(months=mnt) for mnt in indexed_months]
birthdates = pd.DatetimeIndex(date_list)
I've made an assumption that your resp.cmbirth object looks like an array of integers between 0 and 1165.
I'm not quite clear on why you want the bin edges of the indices to be offset from the start or end of the month. This can be done:
shifted_birthdates = birthdates.shift(15, freq='D')
and similarly for hours if you want. There is also useful info in the answers to this SO question and the related pandas github issue.
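For large arrays, the month-number conversion can also be done without a Python-level loop by using NumPy's month-resolution datetime type. This is a sketch under the assumption that the month numbers are plain integers; the `cmbirth` array here is a made-up sample standing in for resp.cmbirth.

```python
import numpy as np
import pandas as pd

cmbirth = np.array([1, 1165])  # hypothetical sample of "months since December 1899"

# Month-resolution arithmetic: adding whole months never drifts within the month.
month0 = np.datetime64("1899-12", "M")
months = month0 + cmbirth.astype("timedelta64[M]")

# Convert to nanosecond resolution; each value lands on the first day of its month.
birthdates = pd.DatetimeIndex(months.astype("datetime64[ns]"))
print(birthdates.strftime("%Y-%m-%d").tolist())  # ['1900-01-01', '1997-01-01']
```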