I am trying to write a function that checks whether a date is in my Excel file and, if it is not, returns the closest date before it.
I already managed to do this for the closest date after, and here is my code.
It's only the "date before" case that I really can't get to work.
Here is what I tried for the day before:
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

def get_all_dates_between_2_dates_with_special_begin_substraction(Class, date_départ, date_de_fin, date_debut_analyse, exclus=False):
    date_depart = date_départ
    date_fin = date_de_fin
    result_dates = []
    inFile = "database/Calendar_US_Target.xlsx"
    inSheetName = "Sheet1"
    df = pd.read_excel(inFile, sheet_name=inSheetName)
    date_depart = datetime.datetime.strptime(date_depart, '%Y-%m-%d')
    date_fin = datetime.datetime.strptime(date_fin, '%Y-%m-%d')
    date_calcul_depart = datetime.datetime.strptime(date_debut_analyse, '%Y-%m-%d')
    var_date_depart = date_depart
    time_to_add = ""
    if Class.F0 == "mois":
        time_to_add = relativedelta(months=1)
    if Class.F0 == "trimestre":
        time_to_add = relativedelta(months=3)
    if Class.F0 == "semestre":
        time_to_add = relativedelta(months=6)
    if Class.F0 == "année":
        time_to_add = relativedelta(years=1)
    while var_date_depart <= date_fin:
        # ---- this is the part I can't get right (the "day before" lookup) ----
        df['mask'] = (var_date_depart <= df['TARGETirs_holi'])  # day before
        print(df.head())
        print(df[df['mask'] == True].head(1))  # want to check the last True value
        # ----------------------------------------------------------------------
        # result should come out of the lookup above; this is the piece I'm missing
        if result >= date_calcul_depart:
            result = str(result)[0:10]
            result = result[8:10] + "/" + result[5:7] + "/" + result[0:4]
            result_dates.append(str(result))
        var_date_depart = var_date_depart + time_to_add
    if exclus == True:
        result_dates = result_dates[1:-1]
    return result_dates
What I mean is: build a column (or a dataframe) of booleans that is True wherever the file's date is smaller than or equal to my date, and then take the last value that is True.
For example, with the dates [12-05-2022, 15-05-2022, 16-05-2022, 19-05-2022]:
if I pass 15-05-2022 it should give me 15-05-2022, but if I pass 18-05-2022 it should give me 16-05-2022.
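For reference, here is a minimal sketch of that lookup with pandas, assuming the calendar dates sit in the TARGETirs_holi column used in the code above:

import pandas as pd

def date_before_or_equal(df, target):
    """Return the last calendar date that is <= target, or None if there is none."""
    dates = pd.to_datetime(df["TARGETirs_holi"]).sort_values()
    candidates = dates[dates <= pd.to_datetime(target)]
    return candidates.iloc[-1] if not candidates.empty else None

# Example with the dates from the question:
df = pd.DataFrame({"TARGETirs_holi": ["2022-05-12", "2022-05-15", "2022-05-16", "2022-05-19"]})
print(date_before_or_equal(df, "2022-05-18"))  # 2022-05-16
print(date_before_or_equal(df, "2022-05-15"))  # 2022-05-15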
Thanks!
So I have a Python script that first aggregates and standardizes data into a file I call the "tripFile". Then the code tries to identify the differences between this most recent tripFile and a previous one.
If I export the tripFile from the first part of the code and re-import it for the second part, the second part takes around 5 minutes to run and says it is looping over a bit more than 4,000 objects.
newTripFile = pd.read_csv(PATH + today + ' Trip File v6.csv')
However, if I do not export and re-import the data (just keeping it in memory from the first part of the code), it takes a bit less than 24 hours (!!) and says it is looping over a bit more than 951,691 objects.
newTripFile = tripFile
My data is a dataframe; I checked its shape and it is identical to the file I export.
Any idea what could be causing that?
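One way to narrow this down (a hypothetical diagnostic on my part, reusing the names tripFile, PATH and today from the question) is to compare the dtypes of the in-memory dataframe and the re-imported one, since read_csv may infer different types than the ones built in memory:

import pandas as pd

# Hypothetical diagnostic: compare column dtypes of the two versions of the trip file.
in_memory = tripFile                                     # dataframe kept from the first part of the code
reloaded = pd.read_csv(PATH + today + ' Trip File v6.csv')

# Show only the columns whose dtypes differ between the two versions.
diff = pd.DataFrame({"in_memory": in_memory.dtypes, "reloaded": reloaded.dtypes})
print(diff[diff["in_memory"] != diff["reloaded"]])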
Here is the second part of my code:
oldTripFile = pd.read_excel(PATH + OLDTRIPFILE)
oldTripFile.drop(['id'], axis = 1, inplace = True)
oldTripFile['status'] = 'old'
# New version of trip file
newTripFile = pd.read_csv(PATH + today + ' Trip File v6.csv')
newTripFile.drop(['id'], axis = 1, inplace = True)
newTripFile['status'] = 'new'
db_trips = pd.concat([oldTripFile, newTripFile]) #concatenation of the two dataframes
db_trips = db_trips.reset_index(drop = True)
db_trips.drop_duplicates(keep = False, subset = [column for column in db_trips.columns[:-1] ], inplace = True)
db_trips = db_trips.reset_index(drop = True)
db_trips.head()
update_details = []
# Get the duplicates : only consider ['fromCode', 'toCode', 'mode'] for identifying duplicates
# Create a dataframe that contains only the trips that was deleted and was recently added
db_trips_delete_new = db_trips.drop_duplicates(keep = False, subset = ['fromCode', 'toCode', 'mode'])
db_trips_delete_new = db_trips_delete_new.reset_index(drop = True)
# New trips
new_trips = db_trips_delete_new[db_trips_delete_new['status'] == 'new'].values.tolist()
for trip in new_trips:
    trip.append('new trip added')
update_details = update_details + new_trips
# Deleted trips
old_trips = db_trips_delete_new[db_trips_delete_new['status'] == 'old'].values.tolist()
for trip in old_trips:
    trip.append('trip deleted')
update_details = update_details + old_trips
db_trips_delete_new.head()
# Updated trips
# Ocean: no need to check the transit time column
sea_trips = db_trips.loc[db_trips['mode'].isin(['sea', 'cfs'])]
sea_trips = sea_trips.reset_index(drop = True)
list_trips_sea_update = sea_trips[sea_trips.duplicated(subset = ['fromCode', 'toCode', 'mode'], keep = False)].values.tolist()
if len(list_trips_sea_update) != 0:
    for i in tqdm(range(0, len(list_trips_sea_update) - 1)):
        for j in range(i + 1, len(list_trips_sea_update)):
            if list_trips_sea_update[i][2] == list_trips_sea_update[j][2] and list_trips_sea_update[i][9] == list_trips_sea_update[j][9] and list_trips_sea_update[i][14] == list_trips_sea_update[j][14]:
                update_comment = ''
                # Check display from / to
                if list_trips_sea_update[i][5] != list_trips_sea_update[j][5]:
                    update_comment = update_comment + 'fromDisplayLocation was updated.'
                if list_trips_sea_update[i][12] != list_trips_sea_update[j][12]:
                    update_comment = update_comment + 'toDisplayLocation was updated.'
                # Get the updated trip (the row with status new)
                if list_trips_sea_update[i][17] == 'new' and list_trips_sea_update[j][17] != 'new':
                    list_trips_sea_update[i].append(update_comment)
                    update_details = update_details + [list_trips_sea_update[i]]
                else:
                    if list_trips_sea_update[j][17] == 'new' and list_trips_sea_update[i][17] != 'new':
                        list_trips_sea_update[j].append(update_comment)
                        update_details = update_details + [list_trips_sea_update[j]]
                    else:
                        print('excel files are not organized')
# Ground: transit time column need to be checked
ground_trips = db_trips[~db_trips['mode'].isin(['sea', 'cfs'])]
ground_trips = ground_trips.reset_index(drop = True)
list_trips_ground_update = ground_trips[ground_trips.duplicated(subset = ['fromCode', 'toCode', 'mode'], keep = False)].values.tolist()
if len(list_trips_ground_update) != 0:
    for i in tqdm(range(0, len(list_trips_ground_update) - 1)):
        for j in range(i + 1, len(list_trips_ground_update)):
            if list_trips_ground_update[i][2] == list_trips_ground_update[j][2] and list_trips_ground_update[i][9] == list_trips_ground_update[j][9] and list_trips_ground_update[i][14] == list_trips_ground_update[j][14]:
                update_comment = ''
                # Check display from / to
                if list_trips_ground_update[i][5] != list_trips_ground_update[j][5]:
                    update_comment = update_comment + 'fromDisplayLocation was updated.'
                if list_trips_ground_update[i][12] != list_trips_ground_update[j][12]:
                    update_comment = update_comment + 'toDisplayLocation was updated.'
                # Check transit time
                if list_trips_ground_update[i][15] != list_trips_ground_update[j][15]:
                    update_comment = update_comment + 'transit time was updated.'
                # Get the updated trip (the row with status new)
                if list_trips_ground_update[i][17] == 'new' and list_trips_ground_update[j][17] != 'new':
                    list_trips_ground_update[i].append(update_comment)
                    update_details = update_details + [list_trips_ground_update[i]]
                else:
                    if list_trips_ground_update[j][17] == 'new' and list_trips_ground_update[i][17] != 'new':
                        list_trips_ground_update[j].append(update_comment)
                        update_details = update_details + [list_trips_ground_update[j]]
                    else:
                        print('excel files are not organized')
And here is an example of what my trip file looks like:
Any help is appreciated :)
In case it is useful to someone else: the issue was coming from the types. When I kept my tripFile in memory, one of its columns held values like "10.0", whereas when imported the same column held "10".
Since I am comparing against another imported tripFile, when both files are imported the column has the same type in both; but when one file is kept in memory the column types differ, so every row is considered updated. That is why the run takes so much longer when the file is kept in memory.
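A minimal sketch of the fix that follows from that observation (my own illustration; 'transit_time' is a placeholder, since the real column name isn't given in the question):

import pandas as pd

# Hypothetical fix: force the mismatched column to one explicit dtype in both
# dataframes before concatenating them, so equal trips compare as equal.
for df in (oldTripFile, newTripFile):
    df["transit_time"] = pd.to_numeric(df["transit_time"], errors="coerce").astype(float)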
I am analysing daily rainfall data to calculate an insurance payout for farmers under excess rainfall coverage. The policy pays out if the consecutive 3-day cumulative rainfall is greater than 50 mm between 1 Oct and 31 Oct.
I was able to write Python code that finds the matching criteria, but when it rains continuously the result contains overlapping date windows, which is not an acceptable payout.
I need help calculating the best payout option when the windows overlap.
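For reference, here is a minimal sketch of the rule on made-up daily data (the real column layout isn't shown; the 3-day window and 50 mm threshold come from the description above). It computes the 3-day cumulative rainfall and then keeps one non-overlapping window per wet spell with a simple greedy rule, wettest window first:

import pandas as pd

# Hypothetical daily rainfall series for one mandal, indexed by date.
rain = pd.Series(
    [5.0, 30.0, 25.0, 10.0, 40.0, 20.0, 0.0, 60.0],
    index=pd.date_range("2022-10-01", periods=8),
)

# 3-day cumulative rainfall ending on each date.
cum3 = rain.rolling(window=3).sum()
qualifying = cum3[cum3 > 50]  # windows that trigger the cover

# Greedy selection: take the wettest qualifying window first, then discard any
# window whose 3-day span overlaps one that was already chosen.
chosen = []
for end_date, total in qualifying.sort_values(ascending=False).items():
    start_date = end_date - pd.Timedelta(days=2)
    if all(not (start_date <= c_end and end_date >= c_start) for c_start, c_end in chosen):
        chosen.append((start_date, end_date))
        print(f"{start_date.date()} to {end_date.date()}: {total} mm")

My actual code is below: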
for dist in data["dcode"].unique():
d_data = data[data["dcode"] == dist]
#print(dist)
for block in d_data["mandal"].unique():
prev_rain = 0
prev_to_date = "01/12/2022"
for each in rain_dev_input:
#[(rain_dev_input["TERM"] == "DEFICIT RAINFALL") & (rain_dev_input["DIST_CODE"] == 240)]:
#print(each)
distcode = each["DIST_CODE"]
#print(distcode)
term = str(each["TERM"])
if (distcode == dist) & (term == "EXCESS RAINFALL"):
start_date = each["FROM_PERIOD"]
end_date = each["TO_PERIOD"]
s_date = datetime.datetime.strptime(start_date, "%d/%m/%y")
e_date = datetime.datetime.strptime(end_date, "%d/%m/%y")
#s_date = start_date.strftime(start_date, "%Y-%m-%d")
#e_date = end_date.strftime(end_date, "%Y-%m-%d")
#count1 = daterange(s_date, e_date)
#print(s_date, ": ", e_date)
m_data = d_data[d_data["mandal"] == block]
p_data = m_data.loc[start_date:end_date]
for singledate in daterange(s_date, e_date):
#print("Inside Excess Rain")
from_date = datetime.datetime.strftime(singledate, "%Y-%m-%d")
to_date = datetime.datetime.strftime(
(singledate + timedelta(2)), "%Y-%m-%d"
)
total_rain = p_data.loc[from_date:to_date]["rain"].sum()
#print(total_rain)
range1 = float(each["RANGE1"])
range2 = float(each["RANGE2"])
if (total_rain >= range1) & (total_rain < range2):
#print("inside write to file")
if (from_date <= prev_to_date <= to_date) & (prev_rain <= total_rain):
temp["Max"] = total_rain;
prev_rain = total_rain
prev_to_date = to_date
temp = dict()
temp["district"] = each["DIST_NAME"]
temp["mandal"] = block
temp[
"category"
] = "excess rainfall for 3 consecutive days, cumulative"
temp["rainfall"] = total_rain
temp["from_date"] = from_date
temp["to_date"] = to_date
temp["phase"] = each["PHASE"]
#temp["insurance"] = insurance_excessrain(each, total_rain)
temp["insurance"] = (total_rain - range1)* each["PAYOUT"]
excess_rainfall.append(temp)
# %%
# output excess rainfall rain_dev_list data
excess_rain = pd.DataFrame(excess_rainfall)
excess_rain.to_csv(str(output_folder) + "/excess_rainfall_2020_h2.csv", index=False)
This result has overlapping dates
Daily Rainfall sample data
I am attempting to create a Python script that iterates over all the rows of a specific column in an Excel spreadsheet. This column contains dates; I need to compare these dates in order to find and return the oldest date in the sheet. After that, I will need to modify the data in that row.
I have tried appending the dates to a numpy array as datetime objects; this was working, but I could not traverse the array and compare the dates. I have also tried reformatting the dates in the Excel sheet to datetime objects in Python and then comparing them, but I get the following error:
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
I have tried some other unsuccessful methods. These are the ones where I got closest to achieving what I want. I'm quite lost, please help!
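As an aside, my reading of that traceback (not something stated in the question) is that it usually appears when the name "datetime" is bound to the class rather than the module, and the code then asks it for a ".datetime" attribute:

import datetime
from datetime import datetime as datetime_class

# With the plain "import datetime" used in the script below, this works:
print(datetime.datetime.strptime("2019-10-31", "%Y-%m-%d"))

# But if the name "datetime" refers to the class (from datetime import datetime),
# adding the extra ".datetime" level raises exactly the error in the question:
try:
    datetime_class.datetime.strptime("2019-10-31", "%Y-%m-%d")
except AttributeError as exc:
    print(exc)  # type object 'datetime.datetime' has no attribute 'datetime'

The full script is below: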
import openpyxl
import numpy as np
import datetime
def main():
    wb = openpyxl.load_workbook("C:\\Users\\User\\Desktop\\Python Telecom Project.xlsx")
    sheet = wb.active

def menuSelection():
    while True:
        menuChoice = input("Please select one of the following options:\n1. Add User\n2.Delete User\n3.Modify User\n")
        if menuChoice not in ('1', '2', '3'):
            print("The input entered is invalid, please try again")
            continue
        else:
            break
    return menuChoice
def findOldestDate():
    wb = openpyxl.load_workbook("C:\\Users\\User\\Desktop\\Python Telecom Project.xlsx")
    sheet = wb.active
    ## startMult = np.empty((0,1000), dtype='datetime64[D]')
    ## value = datetime.date.strftime("%Y-%m-%d")
    for rowNum in range(2, sheet.max_row+1):
        status = sheet.cell(row=rowNum, column=5).value
        d8 = sheet.cell(row=rowNum, column=6).value
        d8_2 = sheet.cell(row=rowNum+1, column=6).value
        d8.value = datetime.date.strftime(d8, "%Y-%m-%d")
        d8_2.value = datetime.date.strftime(d8_2, "%Y-%m-%d")
        d8.number_format = 'YYYY MM DD'
        d8_2.number_format = 'YYYY MM DD'
        if d8 < d8_2:
            oldestDate = d8
        elif d8 > d8_2:
            oldestDate = d8_2
        else:
            continue
    return oldestDate
## array.append(startMult, date)
##
## while counter < len(array)-1:
##
## if array[counter] < array[counter + 1]:
##
## oldestDate = array[counter]
## counter += 1
##
## elif array[counter] > array[counter + 1]:
##
## oldestDate = array[counter + 1]
## counter += 1
##
## else:
## oldestDate = array[counter]
## continue
##
## return oldestDate
def addUser():
    wb = openpyxl.load_workbook("C:\\Users\\User\\Desktop\\Python Telecom Project.xlsx")
    sheet = wb.active
    dateTimeObj = datetime.date.today()
    print("Please enter the following information:\n")
    inputName = input("Name: ")
    inputNTID = input("NTID: ")
    inputRATSID = input("RATSID: ")
    inputStatus = input("Status: ")
    inputTaskNum = input("Task #: ")
    for rowVal in range(2, sheet.max_row+1):
        oldestDate = findOldDate()
        phoneNum = sheet.cell(row=rowVal, column=1).value
        name = sheet.cell(row=rowVal, column=2).value
        ntID = sheet.cell(row=rowVal, column=3).value
        ratsID = sheet.cell(row=rowVal, column=4).value
        status = sheet.cell(row=rowVal, column=5).value
        date = sheet.cell(row=rowVal, column=6).value
        if date == oldestDate:
            name = inputName
            ntID = inputNTID
            ratsID = inputRATSID
            status = inputStatus
            date = dateTimeObj
    print("\nChanges have been implemented successfully!")
##def deleteUser():
##
##
##
##def modifyUser():
addUser()
This is the current error message:
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
Prior to this one, I was getting:
can't compare 'str' to 'datetime'
What I want is the oldest date in the column to be returned from this function.
Finding the oldest date can be achieved with a one-liner like the following:
from datetime import datetime as dt
from re import match
def oldest(sheet, column):
    """
    Returns the tuple (index, timestamp) of the oldest date in the given sheet at the given column.
    """
    return min([(i, dt.strptime(sheet.cell(row=i, column=column).value, '%Y %m %d').timestamp()) for i in range(2, sheet.max_row+1) if isinstance(sheet.cell(row=i, column=column).value, str) and match(r'\d{4}\s\d{2}\s\d{2}', sheet.cell(row=i, column=column).value)], key=lambda x: x[1])
The longer, slower but more readable version follows:
def oldest(sheet, column):
    """
    Returns the tuple (index, timestamp) of the oldest date in the given sheet at the given column.
    """
    format = '%Y %m %d'
    values = list()
    for i in range(2, sheet.max_row+1):
        if isinstance(sheet.cell(row=i, column=column).value, str) and match(r'\d{4}\s\d{2}\s\d{2}', sheet.cell(row=i, column=column).value):
            values.append((i, dt.strptime(sheet.cell(row=i, column=column).value, format).timestamp()))
    return min(values, key=lambda x: x[1])
If you need it, you can convert the retrieved timestamp back into the date format you had, as shown in this sample session at the Python REPL:
>>> row, timestamp = oldest(sheet, 1)
>>> date = dt.utcfromtimestamp(timestamp).strftime('%Y %m %d')
>>> date
'2019 10 31'
>>> row
30
I have the two functions below:
def create_base_df(start_date, end_date):
    base_df = pd.DataFrame({"dt": pd.date_range(start_date, end_date)})
    base_df["dt_num_key"] = base_df.dt.apply(lambda x: datetime.datetime.strftime(x, "%Y%m%d")).astype(int)
    base_df["cal_yr_nkey"] = base_df.dt.dt.strftime("%Y")
    base_df["cal_mon_ofyr_nkey"] = base_df.dt.dt.strftime("%m")
    base_df["cal_qtr_ofyr_nkey"] = base_df.dt.dt.quarter.astype(str).apply(lambda x: x.rjust(2, '0'))
    base_df["cal_wk_ofyr_nkey"] = base_df.dt.dt.week.astype(str)
    return base_df

def month_operations(df):
    df["cal_mon_nm"] = df.dt.dt.strftime("%B")
    df["cal_mon_shrt_nm"] = df.dt.dt.strftime("%b")
    df["cal_yr_mon_nkey"] = df["cal_yr_nkey"] + df["cal_mon_ofyr_nkey"]
    df["mon_seq_id"] = df.cal_yr_mon_nkey.sort_values().reset_index()["cal_yr_mon_nkey"].rank(method='dense').astype(int)
    df["dt_frst_dayof_mon"] = df.dt.apply(lambda x: datetime.datetime(x.year, x.month, 1))
    df["dt_frst_dayof_mon_nkey"] = df["dt_frst_dayof_mon"].dt.strftime("%Y%m%d")
    df["dt_lst_dayof_mon"] = df["dt_frst_dayof_mon"] + pd.tseries.offsets.DateOffset(months=1) - pd.tseries.offsets.DateOffset(days=1)
    df["dt_lst_dayof_mon_nkey"] = df["dt_lst_dayof_mon"].dt.strftime("%Y%m%d")
    df["dt_frst_dayof_lst_mon"] = df["dt_frst_dayof_mon"] - pd.DateOffset(months=1)
    df["dt_frst_dayof_lst_mon_nkey"] = df["dt_frst_dayof_lst_mon"].dt.strftime("%Y%m%d")
    df["dt_lst_mon"] = df.dt - pd.tseries.offsets.DateOffset(months=1)
    df["dt_lst_mon_nkey"] = df["dt_lst_mon"].dt.strftime("%Y%m%d")
    df["dt_lst_yr_lst_mon"] = df.dt_lst_mon - pd.tseries.offsets.DateOffset(years=1)
    df["dt_lst_yr_lst_mon_nkey"] = df["dt_lst_yr_lst_mon"].dt.strftime("%Y%m%d")
    return df
The columns dt_lst_yr_lst_mon_nkey, dt_lst_mon_nkey and dt_frst_dayof_lst_mon_nkey are returning values in datetime format ('1899-12-01 00:00:00') and I can't figure out why. All the other *_nkey columns return integers as expected.
My main looks like this:
base_df = create_base_df(start_date="01/01/1900", end_date="01/12/1900")
month_df = month_operations(base_df)
The expected output: if the value of dt_lst_yr_lst_mon is "1900-12-01 00:00:00", then dt_lst_yr_lst_mon_nkey should be "19001201".
Any pointers on where I am going wrong are appreciated.
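For context, here is a sketch of an alternative way to build those numeric keys without strftime (my own workaround idea, not part of the original code), in case the formatting step turns out to be the culprit:

import pandas as pd

# Hypothetical alternative: derive the YYYYMMDD key arithmetically from the
# datetime components instead of going through strftime.
def to_nkey(series: pd.Series) -> pd.Series:
    return (series.dt.year * 10000 + series.dt.month * 100 + series.dt.day).astype(int)

# e.g. df["dt_lst_yr_lst_mon_nkey"] = to_nkey(df["dt_lst_yr_lst_mon"])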
Thanks.
After solving a naive-datetime problem I am facing a new problem in a view that generates graphs: now I get "mktime argument out of range".
I have no idea how to solve it. I didn't write the code (I am reusing it from a colleague of mine) and I can't seem to understand why it fails. I think it has to do with a loop that runs on for too long, and that is when the error pops up.
#login_required(login_url='/accounts/login/')
def loggedin(request):
    data = []
    data2 = []
    data3 = []
    dicdata2 = {}
    dicdata3 = {}
    datainterior = []
    today = timezone.localtime(timezone.now()+timedelta(hours=1)).date()
    tomorrow = today + timedelta(1)
    semana = today - timedelta(7)
    today = today - timedelta(1)
    semana_start = datetime.combine(today, time())
    semana_start = timezone.make_aware(semana_start, timezone.utc)
    today_start = datetime.combine(today, time())
    today_start = timezone.make_aware(today_start, timezone.utc)
    today_end = datetime.combine(tomorrow, time())
    today_end = timezone.make_aware(today_end, timezone.utc)
    for modulo in Repository.objects.values("des_especialidade").distinct():
        dic = {}
        mod = str(modulo['des_especialidade'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(des_especialidade__iexact=mod).count()
        data.append(dic)
    for modulo in Repository.objects.values("modulo").distinct():
        dic = {}
        mod = str(modulo['modulo'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(modulo__iexact=mod, dt_diag__gte=semana_start).count()
        datainterior.append(dic)
        # print mod, Repository.objects.filter(modulo__iexact=mod).count()
        # data[mod] = Repository.objects.filter(modulo__iexact=mod).count()
    dicdata2['values'] = datainterior
    dicdata2['key'] = "Cumulative Return"
    dicdata3['values'] = data
    dicdata3['color'] = "#d67777"
    dicdata3['key'] = "Diagnosticos Identificados"
    data3.append(dicdata3)
    data2.append(dicdata2)
    # -------sunburst
    databurst = []
    dictburst = {}
    dictburst['name'] = "CHP"
    childrenmodulo = []
    for modulo in Repository.objects.values("modulo").distinct():
        childrenmodulodic = {}
        mod = str(modulo['modulo'])
        childrenmodulodic['name'] = mod
        childrenesp = []
        for especialidade in Repository.objects.filter(modulo__iexact=mod).values("des_especialidade").distinct():
            childrenespdic = {}
            esp = str(especialidade['des_especialidade'])
            childrenespdic['name'] = esp
            childrencode = []
            for code in Repository.objects.filter(modulo__iexact=mod, des_especialidade__iexact=esp).values("cod_diagnosis").distinct():
                childrencodedic = {}
                codee = str(code['cod_diagnosis'])
                childrencodedic['name'] = 'ICD9 - ' + codee
                childrencodedic['size'] = Repository.objects.filter(modulo__iexact=mod, des_especialidade__iexact=esp, cod_diagnosis__iexact=codee).count()
                childrencode.append(childrencodedic)
            childrenespdic['children'] = childrencode
            #childrenespdic['size'] = Repository.objects.filter(des_especialidade__iexact=esp).count()
            childrenesp.append(childrenespdic)
        childrenmodulodic['children'] = childrenesp
        childrenmodulo.append(childrenmodulodic)
    dictburst['children'] = childrenmodulo
    databurst.append(dictburst)
    # print databurst
    # --------stacked area chart
    datastack = []
    for modulo in Repository.objects.values("modulo").distinct():
        datastackdic = {}
        mod = str(modulo['modulo'])
        datastackdic['key'] = mod
        monthsarray = []
        year = timezone.localtime(timezone.now()+timedelta(hours=1)).year
        month = timezone.localtime(timezone.now()+timedelta(hours=1)).month
        last = timezone.localtime(timezone.now()+timedelta(hours=1)) - relativedelta(years=1)
        lastyear = int(last.year)
        lastmonth = int(last.month)
        #i = 1
        while lastmonth <= int(month) or lastyear < int(year):
            date = str(lastmonth) + '/' + str(lastyear)
            if (lastmonth < 12):
                datef = str(lastmonth + 1) + '/' + str(lastyear)
            else:
                lastmonth = 01
                lastyear = int(lastyear) + 1
                datef = str(lastmonth) + '/' + str(lastyear)
                lastmonth = 0
            datainicial = datetime.strptime(date, '%m/%Y')
            datainicial = timezone.make_aware(datainicial, timezone.utc)
            datafinal = datetime.strptime(datef, '%m/%Y')
            datafinal = timezone.make_aware(datafinal, timezone.utc)
            #print "lastmonth",lastmonth,"lastyear", lastyear
            #print "datainicial:",datainicial,"datafinal: ",datafinal
            filtro = Repository.objects.filter(modulo__iexact=mod)
            count = filtro.filter(dt_diag__gte=datainicial, dt_diag__lt=datafinal).count()
            conv = datetime.strptime(date, '%m/%Y')
            ms = datetime_to_ms_str(conv)
            monthsarray.append([ms, count])
            #i += 1
            lastmonth += 1
        datastackdic['values'] = monthsarray
        datastack.append(datastackdic)
    #print datastack
    if request.user.last_login is not None:
        #print(request.user.last_login)
        contador_novas = Repository.objects.filter(dt_diag__lte=today_end, dt_diag__gte=today_start).count()
    return render_to_response('loggedin.html',
                              {'user': request.user.username, 'contador': contador_novas, 'data': data, 'data2': data2,
                               'data3': data3,
                               'databurst': databurst, 'datastack': datastack})

def datetime_to_ms_str(dt):
    return str(1000 * mktime(dt.timetuple()))
I think the problem is with this condition.
while lastmonth <= int(month) or lastyear<int(year):
During December, month=12, so lastmonth <= int(month) will always be True. So the condition will always be True, even once lastyear is greater than the current year.
You want to loop if the loop is in the previous year, or if the loop is in the current year and the month is not in the future. Therefore, I think you want to change it to the following:
while lastyear < year or (lastyear == year and lastmonth <= month):
To be sure that the code is working and to understand it, you need to add lots of print statements to the loops, see how lastmonth and lastyear change, and check that the loop exits when you expect it to. You also need to test it for other values of year and month so that it doesn't break next month. Ideally you want to extract this bit of the code into a separate function. It would be easier to understand the loop if it only returned a list of (month, year) integers, instead of doing lots of date formatting at the same time. Then it would be easier to add unit tests.
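Following that suggestion, here is a minimal sketch (my own illustration, not the original author's code) of such a helper that only yields (month, year) integers, from the same month last year up to the current month:

from datetime import date

def last_twelve_months(today):
    """Return (month, year) integer pairs from the same month last year up to the current month."""
    months = []
    year, month = today.year - 1, today.month
    for _ in range(13):  # same month last year through the current month, inclusive
        months.append((month, year))
        month += 1
        if month > 12:
            month = 1
            year += 1
    return months

# Example usage: each pair can then be turned into datainicial/datafinal separately.
print(last_twelve_months(date(2015, 12, 15))[:3])  # [(12, 2014), (1, 2015), (2, 2015)]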