How to assign months to their numeric equivalents in Python / Pandas?

Currently, I'm using the following for loop, with an if condition for each month, to assign months to their numeric equivalents. It seems to be quite efficient in terms of runtime, but it is too manual and ugly for my taste.
How could this be done better? I imagine it could be improved by condensing the multiple if conditions somehow, or by using some sort of translator made for date conversions. Which of these would be preferable?
#make numeric month
combined = combined.sort_values('month')
combined.index = range(len(combined))
combined['month_numeric'] = None
for i in combined['month'].unique():
    first = combined['month'].searchsorted(i, side='left')
    last = combined['month'].searchsorted(i, side='right')
    first_num = list(first)[0] #gives first instance
    last_num = list(last)[0] #gives last instance
    if i == 'January':
        combined['month_numeric'][first_num:last_num] = "01"
    elif i == 'February':
        combined['month_numeric'][first_num:last_num] = "02"
    elif i == 'March':
        combined['month_numeric'][first_num:last_num] = "03"
    elif i == 'April':
        combined['month_numeric'][first_num:last_num] = "04"
    elif i == 'May':
        combined['month_numeric'][first_num:last_num] = "05"
    elif i == 'June':
        combined['month_numeric'][first_num:last_num] = "06"
    elif i == 'July':
        combined['month_numeric'][first_num:last_num] = "07"
    elif i == 'August':
        combined['month_numeric'][first_num:last_num] = "08"
    elif i == 'September':
        combined['month_numeric'][first_num:last_num] = "09"
    elif i == 'October':
        combined['month_numeric'][first_num:last_num] = "10"
    elif i == 'November':
        combined['month_numeric'][first_num:last_num] = "11"
    elif i == 'December':
        combined['month_numeric'][first_num:last_num] = "12"

You can use to_datetime, then month, convert to string and use zfill:
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
Sample:
import pandas as pd
df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
month
0 January
1 February
2 December
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
0 01
1 02
2 12
Name: month, dtype: object
Another solution is to map the column with a dict d:
d = {'January':'01','February':'02','December':'12'}
print (df['month'].map(d))
0 01
1 02
2 12
Name: month, dtype: object
Timings:
df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
df = pd.concat([df]*1000).reset_index(drop=True)
print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
print (df['month'].map({'January':'01','February':'02','December':'12'}))
In [200]: %timeit (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
100 loops, best of 3: 13.5 ms per loop
In [201]: %timeit (df['month'].map({'January':'01','February':'02','December':'12'}))
1000 loops, best of 3: 462 µs per loop

You can use a map:
month2int = {"January":1, "February":2, ...}
combined["month_numeric"] = combined["month"].map(month2int)
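The full mapping does not have to be typed by hand. A minimal sketch, assuming the month names are the English names that Python's `calendar` module returns under the default locale (the name `month_to_num` is illustrative and not taken from the answers above):

```python
import calendar

# calendar.month_name[0] is an empty string, so it is skipped.
# Assumption: the default (English) locale, so names look like "January".
month_to_num = {
    name: f"{number:02d}"
    for number, name in enumerate(calendar.month_name)
    if name
}

print(month_to_num["January"], month_to_num["December"])
```

With a DataFrame, this dict can be passed to `combined["month"].map(month_to_num)`; month names that are missing from the dict become NaN.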

Related

Parsing dates and I am lost

Write a program to read dates from input, one date per line. Each date's format must be as follows: March 1, 1990. Any date not following that format is incorrect and should be ignored. The input ends with -1 on a line alone. Output each correct date as: 3/1/1990.
Hint: Use string[start:end] to get a substring when parsing the string and extracting the date. Use the split() method to break the input into tokens.
Ex: If the input is:
March 1, 1990
April 2 1995
7/15/20
December 13, 2003
-1
then the output is:
3/1/1990
12/13/2003
This is what I have to start with and I am lost. Help?
def get_month_as_int(monthString):
    if monthString == 'January':
        month_int = 1
    elif monthString == 'February':
        month_int = 2
    elif monthString == 'March':
        month_int = 3
    elif monthString == 'April':
        month_int = 4
    elif monthString == 'May':
        month_int = 5
    elif monthString == 'June':
        month_int = 6
    elif monthString == 'July':
        month_int = 7
    elif monthString == 'August':
        month_int = 8
    elif monthString == 'September':
        month_int = 9
    elif monthString == 'October':
        month_int = 10
    elif monthString == 'November':
        month_int = 11
    elif monthString == 'December':
        month_int = 12
    else:
        month_int = 0
    return month_int
user_string = input()
# TODO: Read dates from input, parse the dates to find the one
# in the correct format, and output in m/d/yyyy format

import datetime
inputs = []
result = []
#read the inputs until the -1 sentinel
date = input()
while not date == "-1":
    inputs.append(date)
    date = input()
#check if the input is in the correct format and convert it.
for date_text in inputs:
    try:
        #expected input looks like: March 1, 1990
        parsed = datetime.datetime.strptime(date_text, "%B %d, %Y")
        #month/day/year without zero padding, e.g. 3/1/1990
        result.append("{}/{}/{}".format(parsed.month, parsed.day, parsed.year))
    except ValueError:
        pass
print(*result, sep = "\n")

s=0
d={'january':1,
   'february':2,
   'march':3,
   'april':4,
   'may':5,
   'june':6,
   'july':7,
   'august':8,
   'september':9,
   'october':10,
   'november':11,
   'december':12}
while s!='-1':
    s=input()
    if "," in s:
        s = s.split(",")
        ar = s[0].split(" ")
        if len(ar)<2:
            continue
        month, date = ar[0], ar[1]
        year = s[1].strip()
        if d.get(month.lower()):
            #month/day/year order, e.g. 3/1/1990
            print("{}/{}/{}".format(d[month.lower()], date, year))
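For comparison, here is a minimal sketch that follows the hint in the exercise (split() plus slicing). The names `MONTHS` and `parse_date` are made up for illustration, and the sample lines are hard-coded instead of being read with input(). It does not check that the day is valid for the month.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def parse_date(line):
    """Return 'm/d/yyyy' for input like 'March 1, 1990', otherwise None."""
    parts = line.split()
    if len(parts) != 3:
        return None
    month_name, day_token, year = parts
    # The day token must be digits followed by a trailing comma, e.g. "1,".
    if not day_token.endswith(",") or not day_token[:-1].isdigit():
        return None
    if month_name not in MONTHS or not year.isdigit():
        return None
    month = MONTHS.index(month_name) + 1
    return f"{month}/{int(day_token[:-1])}/{year}"

# Sample lines from the exercise statement (without the -1 sentinel).
for line in ["March 1, 1990", "April 2 1995", "7/15/20", "December 13, 2003"]:
    converted = parse_date(line)
    if converted is not None:
        print(converted)
```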

Split text columns into two columns in Pandas DataFrame, for different dataframes

I have six different dataframes, and some of these dataframes have 'NaN' values. I tried it without the if statements and it only worked on the dataframe that doesn't have 'NaN' values (on the other dataframes I get this error: "ValueError: Columns must be same length as key"). What I'm trying to do is create a function that splits each df column into two (the air quality value and the unit).
def formatting(df):
    """ split text columns into two columns and changes data type"""
    # setting all floats to 2 digits in general
    pd.options.display.float_format = "{:.2f}".format
    # NO2
    if 'NO2' != 'NaN':
        df[['NO2', 'NO2_UNIT']] = df.NO2.apply(lambda x: pd.Series(str(x).split(' ')))
    if 'NO2' != 'NaN':
        df['NO2'] = pd.to_numeric(df['NO2'], downcast="float")
    else:
        pass
    # SO2
    if 'SO2' != 'NaN':
        df[['SO2', 'SO2_UNIT']] = df.SO2.apply(lambda x: pd.Series(str(x).split(' ')))
    if 'SO2' != 'NaN':
        df['SO2'] = pd.to_numeric(df['SO2'], downcast="float")
    else:
        pass
    # O3
    if 'O3' != 'NaN':
        df[['O3', 'O3_UNIT']] = df.O3.apply(lambda x: pd.Series(str(x).split(' ')))
    if 'O3' != 'NaN':
        df['O3'] = pd.to_numeric(df['O3'], downcast="float")
    else:
        pass
    # PM10
    if 'PM10' != 'NaN':
        df[['PM10', 'PM10_UNIT']] = df.PM10.apply(lambda x: pd.Series(str(x).split(' ')))
    if 'PM10' != 'NaN':
        df['PM10'] = pd.to_numeric(df['PM10'], downcast="float")
    else:
        pass
    # PM2.5
    if 'PM2.5' != 'NaN':
        df.rename(columns={'PM2.5': 'PM25'}, inplace = True)
        df[['PM25', 'PM25_UNIT']] = df.PM25.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'PM2.5' != 'NaN':
        df['PM25'] = pd.to_numeric(df['PM25'], downcast="float")
    else:
        pass
    # CO
    if 'CO' != 'NaN':
        df[['CO', 'CO_UNIT']] = df.CO.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'CO' != 'NaN':
        df['CO'] = pd.to_numeric(df['CO'], downcast="float")
    else:
        pass
    # TEMP
    if 'TEMP' != 'NaN':
        df[['TEMP', 'TEMP_UNIT']] = df.TEMP.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'TEMP' != 'NaN':
        df['TEMP'] = pd.to_numeric(df['TEMP'], downcast="float")
    else:
        pass
    # HUM
    if 'HUM' != 'NaN':
        df[['HUM', 'HUM_UNIT']] = df.HUM.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'HUM' != 'NaN':
        df['HUM'] = pd.to_numeric(df['HUM'], downcast="float")
    else:
        pass
    # AIRPRES
    if 'AIRPRES' != 'NaN':
        df[['AIRPRES', 'AIRPRES_UNIT']] = df.AIRPRES.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'AIRPRES' != 'NaN':
        df['AIRPRES'] = df['AIRPRES'].replace(',', '', regex=True)
        df['AIRPRES'] = pd.to_numeric(df['AIRPRES'], downcast="float")
    else:
        pass
    # WS
    if 'WS' != 'NaN':
        df[['WS', 'WS_UNIT']] = df.WS.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'WS' != 'NaN':
        df['WS'] = pd.to_numeric(df['WS'], downcast="float")
    else:
        pass
    # WD
    if 'WD' != 'NaN':
        df[['WD', 'WD_UNIT']] = df.WD.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'WD' != 'NaN':
        df['WD'] = pd.to_numeric(df['WD'], downcast="float")
    else:
        pass
    # NO
    if 'NO' != 'NaN':
        df[['NO', 'NO_UNIT']] = df.NO.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'NO' != 'NaN':
        df['NO'] = pd.to_numeric(df['NO'], downcast="float")
    else:
        pass
    # BENZENE
    if 'BENZENE' != 'NaN':
        df[['BENZENE', 'BENZENE_UNIT']] = df.BENZENE.apply(lambda x: pd.Series(str(x).split(" ")))
    if 'BENZENE' != 'NaN':
        df['BENZENE'] = pd.to_numeric(df['BENZENE'], downcast="float")
    else:
        pass
    # order columns
    df = df[['TIMESTAMP', 'NO2', 'NO2_UNIT', 'SO2', 'SO2_UNIT', 'O3', 'O3_UNIT',
             'PM10', 'PM10_UNIT', 'PM25', 'PM25_UNIT', 'CO', 'CO_UNIT', 'TEMP',
             'TEMP_UNIT', 'HUM', 'HUM_UNIT', 'AIRPRES', 'AIRPRES_UNIT', 'WS',
             'WS_UNIT', 'WD', 'WD_UNIT', 'NO', 'NO_UNIT', 'BENZENE', 'BENZENE_UNIT']]
    return df
Then I'm planning to put all the dataframes in a list and use a for loop to run the function on each of them.
Here are the headers and the first three rows:
print(gharb.head(3).to_dict())
{'TIMESTAMP': {0: '26/01/2022 14:00', 1: '26/01/2022 13:00', 2: '26/01/2022 12:00'},
'NO2': {0: '1.3 µg/m3', 1: '1.41 µg/m3', 2: '2.11 µg/m3'},
'SO2': {0: '0.78 µg/m3', 1: '0.81 µg/m3', 2: '0.89 µg/m3'},
'O3': {0: '90.05 µg/m3', 1: '88.33 µg/m3', 2: '86.41 µg/m3'},
'PM10': {0: '1.9 µg/m3', 1: '2.18 µg/m3', 2: '3.28 µg/m3'},
'CO': {0: '0.19 mg/m3', 1: '0.19 mg/m3', 2: '0.19 mg/m3'},
'TEMP': {0: '10.1 °C', 1: '9.99 °C', 2: '9.79 °C'},
'HUM': {0: '64.98 %', 1: '63.59 %', 2: '64.63 %'},
'WS': {0: '4.92 m/s', 1: '5.24 m/s', 2: '5.37 m/s'},
'WD': {0: '249.15 Deg', 1: '232.48 Deg', 2: '238.07 Deg'},
'NO': {0: '0.12 µg/m3', 1: '0.14 µg/m3', 2: '0.31 µg/m3'},
'PM2.5': {0: 'None', 1: 'None', 2: 'None'},
'AIRPRES': {0: 'None', 1: 'None', 2: 'None'},
'BENZENE': {0: 'None', 1: 'None', 2: 'None'}}

Here is one way that should work with your input data:
def formatting(df):
    """ split text columns into two columns and changes data type"""
    # setting all floats to 2 digits in general
    pd.options.display.float_format = "{:.2f}".format
    # define all the columns to perform the split
    # could also be an input of the function
    cols = [ 'NO2', 'SO2', 'O3', 'PM10', 'CO', 'TEMP', 'HUM', 'WS',
             'WD', 'NO', 'PM2.5', 'AIRPRES', 'BENZENE']
    # to get all result columns available
    res_cols = ['TIMESTAMP']
    # iterate over the columns to split
    for col in cols:
        #use try/except instead of if to be able to handle weird columns
        try:
            # add the column to select in the result
            res_cols.append(col)
            # now split the column and expand one time only, in case several space
            df[[col, col+'_UNIT']] = df[col].astype(str).str.split(' ', expand=True, n=1)
            # add the unit column only if the split works
            res_cols.append(col+'_UNIT')
        # in case of the split does not work
        except ValueError:
            print(f'Error for column {col}')
        # from string to float, coerce (aka replace by NaN) if not possible
        df[col] = pd.to_numeric(df[col], downcast="float", errors='coerce')
    # order columns
    df = df[res_cols]
    return df
Now you get the output below. You can remove the print in the except block if you don't need the message.
df = formatting(df)
# Error for column PM2.5
# Error for column AIRPRES
# Error for column BENZENE
print(df)
# TIMESTAMP NO2 NO2_UNIT SO2 SO2_UNIT O3 O3_UNIT PM10 PM10_UNIT \
# 0 26/01/2022 14:00 1.30 µg/m3 0.78 µg/m3 90.05 µg/m3 1.90 µg/m3
# 1 26/01/2022 13:00 1.41 µg/m3 0.81 µg/m3 88.33 µg/m3 2.18 µg/m3
# 2 26/01/2022 12:00 2.11 µg/m3 0.89 µg/m3 86.41 µg/m3 3.28 µg/m3
# CO CO_UNIT TEMP TEMP_UNIT HUM HUM_UNIT WS WS_UNIT WD WD_UNIT \
# 0 0.19 mg/m3 10.10 °C 64.98 % 4.92 m/s 249.15 Deg
# 1 0.19 mg/m3 9.99 °C 63.59 % 5.24 m/s 232.48 Deg
# 2 0.19 mg/m3 9.79 °C 64.63 % 5.37 m/s 238.07 Deg
# NO NO_UNIT PM2.5 AIRPRES BENZENE
# 0 0.12 µg/m3 NaN NaN NaN
# 1 0.14 µg/m3 NaN NaN NaN
# 2 0.31 µg/m3 NaN NaN NaN
Note that if you rerun the function on the resulting df, an error message is printed for every column, but the result is still good.
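To make the per-cell logic easier to follow, here is a minimal plain-Python sketch of what happens to a single value, without pandas. The helper name `split_value_unit` is hypothetical, and the comma handling assumes commas only appear as thousands separators, as the AIRPRES handling in the question suggests.

```python
def split_value_unit(text):
    """Split '1.3 µg/m3' into (1.3, 'µg/m3'); return (None, None) if the number cannot be parsed."""
    # Split once only, so a unit that contains spaces stays in one piece.
    parts = str(text).split(" ", 1)
    try:
        # Assumption: commas are thousands separators and can be dropped.
        value = float(parts[0].replace(",", ""))
    except ValueError:
        # Covers placeholder strings such as 'None'.
        return None, None
    unit = parts[1] if len(parts) == 2 else None
    return value, unit

for raw in ["1.3 µg/m3", "10.1 °C", "None"]:
    print(raw, "->", split_value_unit(raw))
```

The function above does the same two steps per column: split once on the first space, then convert the numeric part and fall back to a missing value when the conversion fails.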

List comprehension with a exit once value found [duplicate]

This question already has answers here:
How can I simplify repetitive if-elif statements in my grading system function?
(14 answers)
Closed 2 years ago.
I am learning list comprehensions and want to replace a lengthy if statement with an elegant list comprehension. The if statement below is what I want to convert, followed by my attempt at the list comprehension. The comprehension doesn't do what I want yet, but at least you can see where I am trying to go with it.
I would like the list comprehension to give back only one value, as the if statement does.
Thank you in advance.
weight_kg = 8
if weight_kg <= 0.25:
    price_weight = 2.18
elif weight_kg <= 0.5:
    price_weight = 2.32
elif weight_kg <= 1:
    price_weight = 2.49
elif weight_kg <= 1.5:
    price_weight = 2.65
elif weight_kg <= 2:
    price_weight = 2.90
elif weight_kg <= 3:
    price_weight = 4.14
elif weight_kg <= 4:
    price_weight = 4.53
elif weight_kg <= 5:
    price_weight = 4.62
elif weight_kg <= 6:
    price_weight = 5.28
elif weight_kg <= 7:
    price_weight = 5.28
elif weight_kg <= 8:
    price_weight = 5.42
elif weight_kg <= 9:
    price_weight = 5.42
elif weight_kg <= 10:
    price_weight = 5.42
elif weight_kg <= 11:
    price_weight = 5.43
else:
    price_weight = 5.63
print(price_weight)
shipping_price = [{"weight": 0.25, "price" : 2.18}, {"weight": 0.5, "price" : 2.32}, {"weight": 1, "price" : 2.49}]
toy_weight = 0.6
price = [ship_price["weight"] for ship_price in shipping_price if ship_price["weight"] <= toy_weight]
print(price)

Since you only want the first value from the generator expression, you don't want a list at all. Just use next to pull the first value:
>>> shipping_price = [
... {"weight": 0.25, "price" : 2.18},
... {"weight": 0.5, "price" : 2.32},
... {"weight": 1, "price" : 2.49}
... ]
>>> toy_weight = 0.6
>>> next(sp["price"] for sp in shipping_price if sp["weight"] >= toy_weight)
2.49
I'd use tuples for this rather than dictionaries. Possibly NamedTuples if you have a lot of fields and want to give them names, but for two values I'd just use a plain old tuple like this:
>>> weights_and_prices = [(0.25, 2.18), (0.5, 2.32), (1, 2.49)]
>>> toy_weight = 0.6
>>> next(wp[1] for wp in weights_and_prices if wp[0] >= toy_weight)
2.49
Expand the weights_and_prices tuple list as needed (i.e. with all the remaining weight/price values from your original if/elif chain).
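Note that next() raises StopIteration when no entry matches, which is the case the final else branch covers in the original chain. One alternative, not taken from the answer above, is a sorted lookup with the standard bisect module. This is a minimal sketch; the names `WEIGHT_LIMITS`, `PRICES`, `FALLBACK_PRICE` and `price_for` are illustrative, and the values are copied from the if/elif chain in the question.

```python
from bisect import bisect_left

# Upper weight limits (kg) and the matching prices from the if/elif chain.
WEIGHT_LIMITS = [0.25, 0.5, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
PRICES = [2.18, 2.32, 2.49, 2.65, 2.90, 4.14, 4.53, 4.62,
          5.28, 5.28, 5.42, 5.42, 5.42, 5.43]
FALLBACK_PRICE = 5.63  # the final else branch

def price_for(weight_kg):
    # Index of the first limit that is >= weight_kg.
    index = bisect_left(WEIGHT_LIMITS, weight_kg)
    return PRICES[index] if index < len(PRICES) else FALLBACK_PRICE

print(price_for(0.6), price_for(8), price_for(12))
```

If the generator approach is kept, passing a second argument, as in next(generator, 5.63), returns that default instead of raising.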

Udacity's finish daysBetweenDates answer might be wrong

There might be a bug in this lesson.
I'm trying to move forward without needing to watch the answer videos for the daysBetweenDates quiz.
Long story short:
I figured out the code and it works for all test cases, except one.
The error was odd: the number of days I got differs from Udacity's answer by only 1. My notion is that if there were an error in my code, the difference between my answer and Udacity's expected answer would be more than 1, because the error should be recurring.
I tried to compute the number of days between the dates using a different approach, and I got the same number my program computed.
So the question is: is the number of days between 1900-1-1 and 1999-12-31 really 36523 (Udacity's answer) or 36524 (my answer)?
Here's my complete code.
I recommend you try it in your interpreter to check whether Udacity's answer for the last test case is correct.
# Credit goes to Websten from forums
#
# Use Dave's suggestions to finish your daysBetweenDates
# procedure. It will need to take into account leap years
# in addition to the correct number of days in each month.
number_of_days_in_month = 30
def nextDay(year, month, day):
    """Simple version: assume every month has 30 days"""
    number_of_days_in_month = setDaysInMonth(month, year)
    if day < number_of_days_in_month:
        return year, month, day + 1
    else:
        if month == 12:
            return year + 1, 1, 1
        else:
            return year, month + 1, 1
def dateIsBefore(year1, month1, day1, year2, month2, day2):
    """Returns True if year1-month1-day1 is before year2-month2-day2. Otherwise, returns False."""
    if year1 < year2:
        return True
    if year1 == year2:
        if month1 < month2:
            return True
        if month1 == month2:
            return day1 < day2
    return False
def daysBetweenDates(year1, month1, day1, year2, month2, day2):
    """Returns the number of days between year1/month1/day1
    and year2/month2/day2. Assumes inputs are valid dates
    in Gregorian calendar."""
    # program defensively! Add an assertion if the input is not valid!
    assert not dateIsBefore(year2, month2, day2, year1, month1, day1)
    number_of_days_between_dates = 0
    while dateIsBefore(year1, month1, day1, year2, month2, day2):
        year1, month1, day1 = nextDay(year1, month1, day1)
        number_of_days_between_dates += 1
    print number_of_days_between_dates
    return number_of_days_between_dates
def setDaysInMonth(month1, year1):
    if isLeapYear(year1) == False:
        if month1 == 1:
            number_of_days_in_month = 31
        if month1 == 3:
            number_of_days_in_month = 31
        if month1 == 5:
            number_of_days_in_month = 31
        if month1 == 7:
            number_of_days_in_month = 31
        if month1 == 8:
            number_of_days_in_month = 31
        if month1 == 10:
            number_of_days_in_month = 31
        if month1 == 12:
            number_of_days_in_month = 31
        if month1 == 4:
            number_of_days_in_month = 30
        if month1 == 6:
            number_of_days_in_month = 30
        if month1 == 9:
            number_of_days_in_month = 30
        if month1 == 11:
            number_of_days_in_month = 30
        if month1 == 2:
            number_of_days_in_month = 28
        return number_of_days_in_month
    else:
        if month1 == 1:
            number_of_days_in_month = 31
        if month1 == 3:
            number_of_days_in_month = 31
        if month1 == 5:
            number_of_days_in_month = 31
        if month1 == 7:
            number_of_days_in_month = 31
        if month1 == 8:
            number_of_days_in_month = 31
        if month1 == 10:
            number_of_days_in_month = 31
        if month1 == 12:
            number_of_days_in_month = 31
        if month1 == 4:
            number_of_days_in_month = 30
        if month1 == 6:
            number_of_days_in_month = 30
        if month1 == 9:
            number_of_days_in_month = 30
        if month1 == 11:
            number_of_days_in_month = 30
        if month1 == 2:
            number_of_days_in_month = 29
        return number_of_days_in_month
def isLeapYear(year1):
    if year1 % 4 == 0:
        return True
    return False
def numberOfLeapYears(year1, year2):
    number_of_leap_years = 0
    while year1 < year2:
        if year1 % 4 == 0:
            number_of_leap_years += 1
            year1 += 1
        else:
            year1 += 1
    #print "number of leap years: " + str(number_of_leap_years)
    return number_of_leap_years
def numberOfNonLeapYears(year1, year2):
    number_of_non_leap_years = 0
    while year1 < year2:
        if year1 % 4 == 0:
            year1 += 1
        else:
            number_of_non_leap_years += 1
            year1 += 1
    #print "number of non leap years: " + str(number_of_non_leap_years)
    return number_of_non_leap_years
def numberOfDays(year1, year2):
    number_of_leap_years = numberOfLeapYears(year1, year2)
    print number_of_leap_years
    number_of_non_leap_years = numberOfNonLeapYears(year1, year2)
    print number_of_non_leap_years
    number_of_days = 0
    number_of_days = number_of_leap_years * 366 + number_of_non_leap_years * 365
    #print number_of_days
    return number_of_days
def test():
    test_cases = [((2012,1,1,2012,2,28), 58),
                  ((2012,1,1,2012,3,1), 60),
                  ((2011,6,30,2012,6,30), 366),
                  ((2011,1,1,2012,8,8), 585 ),
                  ((1900,1,1,1999,12,31), 36523),
                  ((1900,1,1,1910,1,1), 3653)]
    for (args, answer) in test_cases:
        result = daysBetweenDates(*args)
        if result != answer:
            print "Test with data:", args, "failed"
        else:
            print "Test case passed!"
test()
#print isLeapYear(1900)
#print 366*3 + 365*7
#print numberOfLeapYears(1900, 1999)
#print numberOfNonLeapYears(1900, 1999)
print numberOfDays(1900, 2000)

You are using a naive test for leap years (% 4), but there are additional rules.
A leap year is any year evenly divisible by 4, unless it is divisible by 100 (then it is not a leap year), unless it is also divisible by 400 (then it is a leap year).
Thus 1500, 1700, 1800 and 1900 were not leap years, but 1600 and 2000 were leap years.
A simple Python version:
def is_leap_year(year):
    if year % 400 == 0:
        return True
    if year % 4 == 0 and not year % 100 == 0:
        return True
    return False
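The disputed test case can also be cross-checked against the standard library. This is a small sketch using `datetime.date` and `calendar.isleap`, both of which apply the Gregorian leap year rules described above:

```python
import calendar
from datetime import date

# 1900 is divisible by 100 but not by 400, so it is not a leap year.
print(calendar.isleap(1900), calendar.isleap(2000))

# Days between the two dates from the disputed test case.
days = (date(1999, 12, 31) - date(1900, 1, 1)).days
print(days)
```

Subtracting two date objects gives a timedelta, and its days attribute here matches Udacity's expected value of 36523.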

Days old udacity

I have a problem with these two cases:
print daysBetweenDates(2011, 1, 1, 2012, 8, 8)
print daysBetweenDates(1900,1,1, 1999,12, 31)
When I run them together with the other test cases, I get an answer that is wrong by 1 extra day, and sometimes by 2 days. Sometimes one of them gives me the right answer, but it still appears as:
Test with data: (2011, 1, 1, 2012, 8, 8) failed
Test with data: (1900, 1, 1, 1999, 12, 31) failed
But when I test each case alone, I get the right answer.
daysofmonths = [ 0,31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def leap_year(year):
    leap_day = 366
    common_day = 365
    if year % 4 != 0:
        return common_day
    elif year % 100 != 0:
        return leap_day
    elif year % 400 !=0:
        return common_day
    else:
        return leap_day
def daysBetweenDates(year1, month1, day1, year2, month2, day2):
    #code for same year
    if year1 == year2:
        if month1 == month2:
            return day2 - day1
        days = daysofmonths[month1] - day1
        month1 = month1 + 1
        while month1 < month2:
            if leap_year(year1) == 366:
                daysofmonths[2] = 29
            days = days + daysofmonths[month1]
            month1 = month1 + 1
        return days + day2
    ################################################
    days = daysofmonths[month1] - day1
    month1 = month1 + 1
    while month1 <= 12:
        if leap_year(year1) == 366:
            daysofmonths[2] = 29
        days = days + daysofmonths[month1]
        month1 = month1 + 1
    #print days
    year1 = year1 + 1
    ###########################################################
    days = days + day2
    month2 = month2 - 1
    while month2 >= 1:
        if leap_year(year2) == 366:
            daysofmonths[2] = 29
        days = days + daysofmonths[month2]
        month2 = month2 - 1
    #print days
    year2 = year2 - 1
    ###########################################################
    while year1 <= year2:
        days = days + leap_year(year1)
        year1 = year1 + 1
    return days
print daysBetweenDates(2011, 1, 1, 2012, 8, 8)
print daysBetweenDates(1900,1,1, 1999,12, 31)
def test():
    test_cases = [((2012,1,1,2012,2,28), 58),
                  ((2012,1,1,2012,3,1), 60),
                  ((2011,6,30,2012,6,30), 366),
                  ((2011,1,1,2012,8,8), 585 ),
                  ((1900,1,1,1999,12,31), 36523)]
    for (args, answer) in test_cases:
        result = daysBetweenDates(*args)
        if result != answer:
            print "Test with data:", args, "failed"
        else:
            print "Test case passed!"
test()

When you do:
daysofmonths[2] = 29
it changes the element in the list, and the changed list is then used for every subsequent call. If you added print(daysofmonths[2]) between the test cases, you would see that it is always 29 after the first case that needs to check February. So instead of conditionally changing the list with:
if leap_year(year1) == 366:
    daysofmonths[2] = 29
days = days + daysofmonths[month1]
just conditionally add to days:
if leap_year(year1) == 366 and month1 == 2:
    days = days + 29
else:
    days = days + daysofmonths[month1]
Then do the same thing lower down with year2 and month2. (I would highly recommend separating your code into more functions, as a lot of it is very repetitive.)
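The effect of the shared list can be reproduced in isolation. This is a minimal sketch written for Python 3; the names `COMMON_YEAR_DAYS`, `shared_days`, `is_leap`, `days_in_month_mutating` and `days_in_month` are made up for illustration and are not from the question.

```python
# A tuple cannot be modified, so it is safe to share between calls.
COMMON_YEAR_DAYS = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
# A module-level list, like daysofmonths in the question.
shared_days = list(COMMON_YEAR_DAYS)

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_month_mutating(year, month):
    # Mirrors the problem: the shared list is changed and never reset.
    if is_leap(year):
        shared_days[2] = 29
    return shared_days[month]

def days_in_month(year, month):
    # Decides per call and leaves the shared data untouched.
    if month == 2 and is_leap(year):
        return 29
    return COMMON_YEAR_DAYS[month]

# The leap year call leaks into the later call for 2011.
print(days_in_month_mutating(2012, 2), days_in_month_mutating(2011, 2))
print(days_in_month(2012, 2), days_in_month(2011, 2))
```

This is why each test passes when run alone but fails after an earlier test has touched a leap year.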

This is the only post I found here on this particular problem so I thought I would share my solution.
#days in the months of a non leap year
daysOfMonths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
#determine if a year is a leap year
def is_leap_year(year1):
    year = True
    if year1 % 4 != 0:
        year = False
    elif year1 % 100 != 0:
        year = True
    elif year1 % 400 != 0:
        year = False
    else: year = True
    return year
#returns the days in the given month of the given year
#I was trying to do something similar to the OP until I read this post
def days_in_month(year, month):
    days = 0
    if is_leap_year(year) and month == 2:
        days += 29
    else:
        days += daysOfMonths[month - 1]
    return days
#iterates through each month starting at year1 month1
#up to but not including month2 of year2 and
#returns the total number of days in that period
def total_days(year1, month1, year2, month2):
    days = 0
    while year1 < year2 or month1 < month2:
        days += days_in_month(year1, month1)
        month1 += 1
        if month1 == 13:
            year1 += 1
            month1 = 1
    return days
def daysBetweenDates(year1, month1, day1, year2, month2, day2):
    days = total_days(year1, month1, year2, month2)
    #because I included the beginning month I have to subtract day1
    #because I did not include the final month I have to add day2
    return days - day1 + day2
#I used print statements here to troubleshoot
#print days_in_month(2012, 1)
#print daysBetweenDates(2012, 1, 1, 2012, 2, 28)
#print daysBetweenDates(2012, 1, 1, 2012, 3, 1)
#print daysBetweenDates(2011,6,30,2012,6,30)
#print daysBetweenDates(2011,1,1,2012,8,8)
#print daysBetweenDates(1900,1,1,1999,12,31)
def test():
    test_cases = [((2012,1,1,2012,2,28), 58),
                  ((2012,1,1,2012,3,1), 60),
                  ((2011,6,30,2012,6,30), 366),
                  ((2011,1,1,2012,8,8), 585 ),
                  ((1900,1,1,1999,12,31), 36523)]
    for (args, answer) in test_cases:
        result = daysBetweenDates(*args)
        if result != answer:
            print "Test with data:", args, "failed"
        else:
            print "Test case passed!"
test()
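For readers on Python 3, here is a compact sketch of the same month-by-month idea as total_days, checked against `datetime.date` for the listed test cases. The function name `days_between` is made up for illustration, and `calendar.monthrange` stands in for the hand-written days_in_month helper.

```python
from calendar import monthrange
from datetime import date

def days_between(y1, m1, d1, y2, m2, d2):
    """Sum whole months from (y1, m1) up to but not including (y2, m2)."""
    days = 0
    year, month = y1, m1
    while (year, month) < (y2, m2):
        days += monthrange(year, month)[1]  # number of days in that month
        month += 1
        if month == 13:
            year, month = year + 1, 1
    # The starting month was counted in full, the final month not at all.
    return days - d1 + d2

test_cases = [((2012, 1, 1, 2012, 2, 28), 58),
              ((2012, 1, 1, 2012, 3, 1), 60),
              ((2011, 6, 30, 2012, 6, 30), 366),
              ((2011, 1, 1, 2012, 8, 8), 585),
              ((1900, 1, 1, 1999, 12, 31), 36523)]
for args, expected in test_cases:
    reference = (date(*args[3:]) - date(*args[:3])).days
    assert days_between(*args) == expected == reference
print("all test cases agree with datetime")
```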
