Python XML Parsing - need to correct while loop - python

Fairly new to Python. I'm parsing an XML file and the following code returns undesired results. I understand why I'm getting these results: there are two escalations in the XML for this deal and I'm getting a full set of results for each one. I need help updating my code so that it only returns the monthly rent that applies under each escalation in the XML:
<RentEscalations>
<RentEscalation ID="354781">
<BeginIn>7</BeginIn>
<Escalation>3.8</Escalation>
<RecurrenceInterval>12</RecurrenceInterval>
<EscalationType>bump</EscalationType>
</RentEscalation>
<RentEscalation ID="354782">
<BeginIn>61</BeginIn>
<Escalation>1.0</Escalation>
<RecurrenceInterval>12</RecurrenceInterval>
<EscalationType>bump</EscalationType>
</RentEscalation>
</RentEscalations>
The rent starts at $3.00/sqft for the first 6 months. The XML block shows that, starting in month 7 (BeginIn), the rent rises to $6.80/sqft ($3.00 base + $3.80 escalation) and stays there for 12 months (RecurrenceInterval). The following twelve months will be $10.60 ($6.80 + $3.80). Each year, the amount per square foot will increase by $3.80 until the 61st month of the term. At that point, the rent will increase by $1.00/sqft each year for the remainder of the term. The entire term of the lease is 120 months.
My output contains 114 rows based on the first escalation ($3.80/sqft), followed by 114 rows that behave as if the rent starts at $3.00/sqft and increases by $1.00/sqft each year.
Any help is appreciated!
import xml.etree.ElementTree as ET
import pyodbc
import dateutil.relativedelta as rd
import datetime as dt
tree = ET.parse('C:\\FileLocation\\DealData.xml')
root = tree.getroot()
for deal in root.findall("Deals"):
    for dl in deal.findall("Deal"):
        dealid = dl.get("DealID")
        for dts in dl.findall("DealTerms/DealTerm"):
            dtid = dts.get("ID")
            darea = float(dts.find("RentableArea").text)
            dterm = int(dts.find("LeaseTerm").text)
            for brrent in dts.findall("BaseRents/BaseRent"):
                brid = brrent.get("ID")
                rent = float(brrent.find("Rent").text)
                darea = float(dts.find("RentableArea").text)
                per = brrent.find("Period").text
                dtstart = dts.find("CommencementDate").text
                startyr = int(dtstart[0:4])
                startmo = int(dtstart[5:7])
                startday = int(dtstart[8:])
                start = dt.date(startyr, startmo, startday)
                end = start + rd.relativedelta(months=dterm)
                if brrent.find("Duration").text is None:
                    duration = 0
                else:
                    duration = int(brrent.find("Duration").text)
                termbal = dterm - duration
                for resc in dts.findall("RentEscalations/RentEscalation"):
                    rescid = resc.get("ID")
                    esctype = resc.find("EscalationType").text
                    begmo = int(resc.find("BeginIn").text)
                    esc = float(resc.find("Escalation").text)
                    intrvl = int(resc.find("RecurrenceInterval").text)
                    if intrvl != 0:
                        pers = termbal / intrvl
                    else:
                        pers = 0
                    escst = start + rd.relativedelta(months=begmo - 1)
                    i = 0
                    x = begmo
                    newrate = rent
                    while i < termbal:
                        billdt = escst + rd.relativedelta(months=i)
                        if per == "rsf/year":
                            monthlyamt = (newrate + esc) * darea / 12.0
                        if per == "month":
                            monthlyamt = newrate + esc
                        if per == "year":
                            monthlyamt = (newrate + esc) / 12.0
                        if per == "rsf/month":
                            monthlyamt = (newrate + esc) * darea
                        try:
                            if i % intrvl == 0:
                                level = x + 1
                                newrent = monthlyamt
                                x += 1
                                newrate += esc
                            else:
                                level = x
                        except ZeroDivisionError:
                            break
                        i += 1
                        if dealid == "1254278":
                            print(dealid, dtid, rescid, dterm, darea, escst, rent, intrvl, esctype, termbal, \
                                  monthlyamt, billdt, pers, level, newrate, newrent)
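One possible way to get a single schedule instead of one full run per escalation is to walk the months of the term once and, for each month, apply only the escalation that is currently in effect. The sketch below is not a drop-in fix for the code above. It assumes that the escalation with the latest BeginIn at or before the current month replaces the earlier one, and that a bump is applied in the BeginIn month and every RecurrenceInterval months after that. The helper name `rate_schedule` and its parameters are hypothetical. It returns the rate per square foot only; converting to a monthly amount (for example `rate * darea / 12.0` for "rsf/year") would follow the same rules as in the question.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """
<RentEscalations>
  <RentEscalation ID="354781">
    <BeginIn>7</BeginIn>
    <Escalation>3.8</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
  <RentEscalation ID="354782">
    <BeginIn>61</BeginIn>
    <Escalation>1.0</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
</RentEscalations>
"""


def rate_schedule(escalations, base_rate, term_months):
    """Return [(month, rate)] for months 1..term_months (hypothetical helper)."""
    # (BeginIn, Escalation, RecurrenceInterval), ordered by the month they start.
    steps = sorted(
        (
            int(e.findtext("BeginIn")),
            float(e.findtext("Escalation")),
            int(e.findtext("RecurrenceInterval")),
        )
        for e in escalations.findall("RentEscalation")
    )
    rate = base_rate
    schedule = []
    for month in range(1, term_months + 1):
        # Assumption: only the escalation with the latest BeginIn <= month is active.
        active = [s for s in steps if s[0] <= month]
        if active:
            begin, bump, interval = active[-1]
            elapsed = month - begin
            # Bump in the BeginIn month and on every recurrence after it.
            if elapsed == 0 or (interval > 0 and elapsed % interval == 0):
                rate += bump
        schedule.append((month, round(rate, 2)))
    return schedule


schedule = rate_schedule(ET.fromstring(XML.strip()), base_rate=3.0, term_months=120)
for month, rate in schedule:
    if month in (1, 7, 19, 61):
        print(month, rate)
```

With the sample XML this yields one row per month of the 120-month term, so the two escalations no longer produce two separate sets of rows.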

Related

Python: convert an Excel column using the currency from another column, with CurrencyConverter or another dynamic source

I have following table in my excel file called output.xlsx
import pandas as pd
import xlrd as xl
output_workbook = 'output.xlsx'
sheet1 = pd.read_excel(output_workbook, sheet_name="Sheet1")
a1_to_convert = sheet1['amount1']
a2_to_convert = sheet1['amount2']
a3_to_convert = sheet1['amount3']
currency = sheet1['Currency']
pd.to_numeric(a1_to_convert)
pd.to_numeric(a2_to_convert)
pd.to_numeric(a3_to_convert)
currency = ''
rate = 1
pd.to_numeric(rate)
sheet1['Exchange Rate'] = 0
wb = xl.open_workbook(output_workbook)
s1 = wb.sheet_by_index(0)
s1.cell_value(0,0)
print("No. of rows:", s1.nrows)
lg = s1.nrows - 1
for i in range (lg):
    currency = sheet1['Currency'].at[i]
    if currency == 'USD':
        rate = 1
    elif currency == "EUR":
        rate = 1.2035935
    elif currency == "CAD":
        rate = 0.79816741
    elif currency == "CHF":
        rate = 1.0924002
    elif currency == "GBP":
        rate = 1.3986061
    elif currency == "NOK":
        rate = 0.12056582
    elif currency == "DKK":
        rate = 0.16186762
    elif currency == "SEK":
        rate = 0.11905152
    sheet1['Exchange Rate'].at[i] = rate
convert_a1_usd = a1_to_convert * rate
sheet1['a1 USD'] = convert_a1_usd
convert_a2_usd = a2_to_convert * rate
sheet1['a2 USD'] = convert_a2_usd
convert_a3_usd = a3_to_convert * rate
sheet1['a3 USD'] = convert_a3_usd
print(sheet1)
sheet1.to_excel("output1.xlsx")
expected output
I need to use Python to convert the values in every row to USD in place,
and, if possible, I would also like help adding a new column with the USD value to the right of each current column.
Many thanks for your help
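A minimal sketch of one way to do this with pandas, assuming the column names from the question (`Currency`, `amount1`, `amount2`) and the hard-coded rates shown there. The sample rows are made up for illustration. The idea is to look up a rate per row with `Series.map` instead of multiplying the whole column by whatever rate the loop saw last.

```python
import pandas as pd

# Rates copied from the question; a dynamic source could fill this dict instead.
RATES_TO_USD = {
    "USD": 1.0,
    "EUR": 1.2035935,
    "CAD": 0.79816741,
    "CHF": 1.0924002,
    "GBP": 1.3986061,
    "NOK": 0.12056582,
    "DKK": 0.16186762,
    "SEK": 0.11905152,
}

# Hypothetical sample data standing in for pd.read_excel('output.xlsx').
sheet1 = pd.DataFrame(
    {
        "Currency": ["USD", "EUR", "GBP"],
        "amount1": [100.0, 100.0, 100.0],
        "amount2": [10.0, 20.0, 30.0],
    }
)

# One rate per row, looked up from the row's own currency.
sheet1["Exchange Rate"] = sheet1["Currency"].map(RATES_TO_USD)

for col in ["amount1", "amount2"]:
    usd = sheet1[col] * sheet1["Exchange Rate"]
    # Place the USD column immediately to the right of its source column.
    sheet1.insert(sheet1.columns.get_loc(col) + 1, col + " USD", usd)

print(sheet1)
```

Currencies missing from the dict come out as NaN in "Exchange Rate", which makes unmapped rows easy to spot.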

Rainfall based Crop Insurance Payout calculation

I am analysing daily rainfall data to calculate insurance payouts for farmers under excess-rainfall coverage. The policy pays out if the "consecutive 3-day cumulative rainfall" is greater than 50 mm between 1 October and 31 October.
I was able to write Python code that finds the periods matching this criterion. But when it rains continuously, the results contain overlapping date ranges, which is not acceptable for a payout.
I need help calculating the best payout option when the dates overlap.
for dist in data["dcode"].unique():
    d_data = data[data["dcode"] == dist]
    #print(dist)
    for block in d_data["mandal"].unique():
        prev_rain = 0
        prev_to_date = "01/12/2022"
        for each in rain_dev_input:
            #[(rain_dev_input["TERM"] == "DEFICIT RAINFALL") & (rain_dev_input["DIST_CODE"] == 240)]:
            #print(each)
            distcode = each["DIST_CODE"]
            #print(distcode)
            term = str(each["TERM"])
            if (distcode == dist) & (term == "EXCESS RAINFALL"):
                start_date = each["FROM_PERIOD"]
                end_date = each["TO_PERIOD"]
                s_date = datetime.datetime.strptime(start_date, "%d/%m/%y")
                e_date = datetime.datetime.strptime(end_date, "%d/%m/%y")
                #s_date = start_date.strftime(start_date, "%Y-%m-%d")
                #e_date = end_date.strftime(end_date, "%Y-%m-%d")
                #count1 = daterange(s_date, e_date)
                #print(s_date, ": ", e_date)
                m_data = d_data[d_data["mandal"] == block]
                p_data = m_data.loc[start_date:end_date]
                for singledate in daterange(s_date, e_date):
                    #print("Inside Excess Rain")
                    from_date = datetime.datetime.strftime(singledate, "%Y-%m-%d")
                    to_date = datetime.datetime.strftime(
                        (singledate + timedelta(2)), "%Y-%m-%d"
                    )
                    total_rain = p_data.loc[from_date:to_date]["rain"].sum()
                    #print(total_rain)
                    range1 = float(each["RANGE1"])
                    range2 = float(each["RANGE2"])
                    if (total_rain >= range1) & (total_rain < range2):
                        #print("inside write to file")
                        if (from_date <= prev_to_date <= to_date) & (prev_rain <= total_rain):
                            temp["Max"] = total_rain;
                        prev_rain = total_rain
                        prev_to_date = to_date
                        temp = dict()
                        temp["district"] = each["DIST_NAME"]
                        temp["mandal"] = block
                        temp[
                            "category"
                        ] = "excess rainfall for 3 consecutive days, cumulative"
                        temp["rainfall"] = total_rain
                        temp["from_date"] = from_date
                        temp["to_date"] = to_date
                        temp["phase"] = each["PHASE"]
                        #temp["insurance"] = insurance_excessrain(each, total_rain)
                        temp["insurance"] = (total_rain - range1)* each["PAYOUT"]
                        excess_rainfall.append(temp)
# %%
# output excess rainfall rain_dev_list data
excess_rain = pd.DataFrame(excess_rainfall)
excess_rain.to_csv(str(output_folder) + "/excess_rainfall_2020_h2.csv", index=False)
This result has overlapping dates
Daily Rainfall sample data
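If "best payout option" is read as "the set of non-overlapping 3-day windows with the largest total payout" (an assumption, since the policy wording is not shown here), the selection can be treated as weighted interval scheduling. The sketch below works on plain tuples of `(start_day, end_day, payout)` with inclusive integer day numbers, for example from `date.toordinal()`. The function name `best_non_overlapping` and the sample windows are hypothetical.

```python
from bisect import bisect_right


def best_non_overlapping(windows):
    """Pick non-overlapping windows with the largest total payout.

    windows: list of (start_day, end_day, payout), day bounds inclusive.
    Returns (total_payout, chosen_windows).
    """
    windows = sorted(windows, key=lambda w: w[1])  # order by end day
    ends = [w[1] for w in windows]
    best = [0.0] * (len(windows) + 1)  # best[k]: optimum using the first k windows
    prev = [0] * len(windows)          # how many earlier windows stay compatible
    take = [False] * len(windows)
    for k, (start, end, payout) in enumerate(windows):
        # Count earlier windows that end strictly before this one starts.
        prev[k] = bisect_right(ends, start - 1, 0, k)
        with_k = best[prev[k]] + payout
        if with_k > best[k]:
            best[k + 1] = with_k
            take[k] = True
        else:
            best[k + 1] = best[k]
    # Walk back through the decisions to recover the chosen windows.
    chosen = []
    k = len(windows)
    while k > 0:
        if take[k - 1]:
            chosen.append(windows[k - 1])
            k = prev[k - 1]
        else:
            k -= 1
    return best[-1], chosen[::-1]


# Five overlapping 3-day windows from a continuous wet spell (made-up payouts).
sample = [(1, 3, 10.0), (2, 4, 25.0), (3, 5, 12.0), (4, 6, 8.0), (5, 7, 30.0)]
total, chosen = best_non_overlapping(sample)
print(total, chosen)
```

In the loop above, this would run once per district and mandal on the windows that passed the range check, using the computed "insurance" value as the payout.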

python data frame filter conditions: any faster way

parts_list = imp_parts_df['Parts'].tolist()
sub_week_list = ['2016-12-11', '2016-12-04', '2016-11-27', '2016-11-20', '2016-11-13']
i = 0
start = DT.datetime.now()
for p in parts_list:
    for thisdate in sub_week_list:
        thisweek_start = pd.to_datetime(thisdate, format='%Y-%m-%d') #'2016/12/11'
        thisweek_end = thisweek_start + DT.timedelta(days=7) # add 7 days to the week date
        val_shipped = len(shipment_df[(shipment_df['loc'] == 'USW1') & (shipment_df['part'] == str(p)) & (shipment_df['shipped_date'] >= thisweek_start) & (shipment_df['shipped_date'] < thisweek_end)])
print((DT.datetime.now() - start).total_seconds())
shipment_df has around 35,000 records, parts_list has 436 parts, and sub_week_list has 5 dates in it.
Overall, this code took 438.13 seconds to run.
Is there a faster way to do it?
parts_list = imp_parts_df['Parts'].astype(str).tolist()
i = 0
start = DT.datetime.now()
for p in parts_list:
    q = 'loc == "xxx" & part == @p & "2016-11-20" <= shipped_date < "2016-11-27"'
    val_shipped = len(shipment_df.query(q))
print((DT.datetime.now() - start).total_seconds())
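Both versions above rescan the whole DataFrame once per part (and per week). A sketch of an alternative is to filter once and let `groupby` count every (part, week) pair in a single pass. It assumes the column names from the question and that the weeks are consecutive 7-day blocks starting at the earliest date in sub_week_list. The sample rows are made up.

```python
import pandas as pd

# Hypothetical stand-in for shipment_df.
shipment_df = pd.DataFrame(
    {
        "loc": ["USW1", "USW1", "USW1", "USE2", "USW1"],
        "part": ["A", "A", "B", "A", "B"],
        "shipped_date": pd.to_datetime(
            ["2016-11-13", "2016-11-15", "2016-11-21", "2016-11-14", "2016-12-18"]
        ),
    }
)
parts_list = ["A", "B"]

first_week = pd.Timestamp("2016-11-13")  # earliest entry in sub_week_list
n_weeks = 5
end = first_week + pd.Timedelta(days=7 * n_weeks)

# Filter once instead of once per (part, week) combination.
mask = (
    (shipment_df["loc"] == "USW1")
    & shipment_df["part"].isin(parts_list)
    & (shipment_df["shipped_date"] >= first_week)
    & (shipment_df["shipped_date"] < end)
)
sub = shipment_df.loc[mask].copy()

# Snap each shipment to the start of its 7-day block.
days_in = (sub["shipped_date"] - first_week).dt.days
sub["week_start"] = first_week + pd.to_timedelta(days_in // 7 * 7, unit="D")

# One pass over the filtered rows gives the count for every (part, week) pair.
counts = sub.groupby(["part", "week_start"]).size()
print(counts)
```

Pairs with no shipments are simply absent from `counts`, so a lookup with a default of 0 (for example `counts.to_dict().get(key, 0)`) would reproduce the `len(...)` result of the original loop.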

Calculate the future value for only one category using the IRR (Python)

import xlrd
import numpy
fileWorkspace = 'C://Users/jod/Desktop/'
wb1 = xlrd.open_workbook(fileWorkspace + 'assign2.xls')
sh1 = wb1.sheet_by_index(0)
time,amount,category = [],[],[]
for a in range(2,sh1.nrows):
    time.append(int(sh1.cell(a,0).value)) # Pulling time from excel (column A)
    amount.append(float(sh1.cell(a,1).value)) # Pulling amount from excel (column B)
    category.append(str(sh1.cell(a,2).value)) # Pulling category from excel (column C)
#print(time)
#print(amount)
#print(category)
print('\n')
p_p2 = str(sh1.cell(0,1))
p_p1 = p_p2.replace("text:'","")
pp = p_p1.replace("'","")
print(pp) # Printing the type of pay period (Row 1, col B)
c_p2 = str(sh1.cell(1,1))
c_p1 = c_p2.replace("text:'","")
cp = c_p1.replace("'","")
print(cp) # Printing the type of compound period (Row 2, col B)
netflow = 0
outflow = 0
inflow = 0
flow = 0
cat = ["Sales", "Salvage", "Subsidy", "Redeemable", "Utility", "Labor",
       "Testing", "Marketing", "Materials", "Logistics"]
if pp == "Years" and cp == "Years": # if pay period and compound period are both in years
    IRR = numpy.irr(amount) * 100 # Calculates the internal rate of return (IRR)
    print ("IRR:", round(IRR, 2), '%', '\n') # prints (IRR)
    for i in time: # for every value in time array
        if cat[5] in category: # if "Labor" for cat array is in category array or not
            # calculates the present values using all the amount values (col B) instead of
            # just using the ones that has "Labor" category label beside them
            # Need to make every other value 0, such as beside "Redeemable" and "Salvage"
            flow = amount[i] / numpy.power((1 + (IRR/100)), time[i])
            if flow>0:
                inflow = inflow + flow
            if flow<0:
                outflow = outflow + flow
            print ('Present Value (P) is:', round(flow,0), '\n')
    netflow = outflow + inflow
    print("In year 0 or current year")
    print("-------")
    print ('Outflow is: ', round(outflow,0))
    print ('Inflow is: ', round(inflow,0))
    print ('Netflow is: ', round(netflow,0), '\n')
    outflow2 = (round(outflow,0))*(1+(IRR/100))**(9)
    inflow2 = (round(inflow,0))*(1+(IRR/100))**(9)
    netflow2 = outflow2 + inflow2
    print("In year 9")
    print("-------")
    print ('Outflow is: ', round(outflow2,0))
    print ('Inflow is: ', round(inflow2,0))
    print ('Netflow is: ', round(netflow2,0), '\n')
I have commented important lines of code for clarification.
Here is the original question:
illustrate the breakdown of major project revenues and expenses by category as a percentage of that project’s future value in year 9. The illustration must also clearly indicate the total future value of the project in year 9 as well as the IRR.
There will be a total of 10 revenue and cost categories that a project may be composed of. The categories are: Sales, salvage, subsidy, redeemable, utility, labor, testing, marketing, materials and logistics. All revenues and expenses will fall in one of these ten categories. The project pay period and compound period will be identified at the top of the Excel sheet. Pay period and compound period may be designated as any of the following: years, quarters, months.
I am getting confused because I am not able to pull only the values that sit beside "Labor", "Redeemable", or "Salvage". I don't know whether I am making a mistake somewhere or whether something is incomplete. Below is the Excel file image:
Excel File Image 2
Excel File Image 3
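The check `cat[5] in category` asks whether "Labor" appears anywhere in the category list, so it is true for every row. A minimal sketch of a per-row approach is to iterate over the three lists together and add each discounted amount to its own category's total. The function name `present_value_by_category`, the sample rows, and the 10% rate are made up for illustration. In the real script the rate would come from the IRR calculation.

```python
def present_value_by_category(times, amounts, categories, rate):
    """Discount each cash flow and sum the results per category (hypothetical helper)."""
    totals = {}
    for t, amt, cat in zip(times, amounts, categories):
        pv = amt / (1 + rate) ** t  # present value of this single row
        totals[cat] = totals.get(cat, 0.0) + pv
    return totals


# Made-up rows standing in for columns A, B and C of the sheet.
times = [0, 1, 2]
amounts = [-100.0, 60.0, 60.0]
categories = ["Labor", "Sales", "Labor"]

totals = present_value_by_category(times, amounts, categories, rate=0.10)
print(totals["Labor"], totals["Sales"])
```

Picking out a single category is then a dictionary lookup, and categories that never occur are simply missing from the result.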
After revising, all cash flows are discounted at the IRR. The code does the following:
i) determineAdjustments takes the pay period (column A) and the compounding period and works out the adjustment: with yearly compounding, a monthly pay period is divided by 12 so that each amount lands in the proper year ended; with monthly compounding, amounts stay in their month ended and no adjustment is necessary.
ii) The IRR is calculated, and the compounding period is used to adjust the monthly IRR for monthly pay periods.
iii) All cash flows are discounted at the IRR and stored in a list per category: cat_contributions['category_name'] = [discounted period 1, discounted period 2 ... ]
iv) The net inflows and outflows are then the sums of these lists.
I can't type up the data in the spreadsheets from the images, as that would take a while, but maybe tinker with this and see if you can get it to work.
from __future__ import division
import xlrd
import numpy
import os
import math
def main(xls = 'xls_name.xlsx', sh = 0):
    #save script in same folder as the xls file
    os.chdir( os.getcwd() )
    wb = xlrd.open_workbook(xls)
    sh = wb.sheet_by_index(0)
    pay_period = sh.cell_value(0,1)
    compounding_period = sh.cell_value(1,1)
    compounding_factor, pay_factor = determineAdjustments(
        pay_period, compounding_period)
    number_of_periods = max( sh.col_values(0, start_rowx = 2) )
    flow_per_period = [ 0*i for i in range( int( math.ceil( number_of_periods/pay_factor ) ) + 1 ) ]#list of length number of pay_periods
    for r in range(2,sh.nrows):
        pay_period = int( math.ceil( sh.cell_value(r,0) / pay_factor ) )
        flow_per_period[pay_period] += sh.cell_value(r,1) #unadjusted cash flows
    irr = calculateIRR(flow_per_period, compounding_factor)
    cat_contributions = sortExpenditures(sh, irr, pay_factor)
    total_cat_contributions, netflow, total_outflow, total_inflow = calculateFlows(cat_contributions)
    printStats(cat_contributions, irr, compounding_factor, pay_factor,
               total_cat_contributions, netflow, total_outflow, total_inflow)
    return
def determineAdjustments(pay_period, compounding_period):
    if compounding_period == 'years':
        compounding_factor = 1
        if pay_period == 'months':
            pay_factor = 12
        if pay_period == 'years':
            pay_factor = 1
    #assume no days pay periods
    if compounding_period == 'months':
        compounding_factor = 12
        #assume no yearly payouts and that the
        #all payments are in months
        pay_factor = 1
    return compounding_factor, pay_factor
def calculateIRR(cashflow, compounding_factor):
    #note: numpy.irr was removed in NumPy 1.20; the numpy_financial
    #package provides irr() as its replacement
    irr = numpy.irr(cashflow)
    irr_comp = (1 + irr)**compounding_factor - 1
    #seems like in first example it uses rounded irr, can do something like:
    #irr_comp = round(irr_comp,4)
    return irr_comp
def sortExpenditures(sh, irr, pay_factor):
    #percentages and discounting occurs at the IRR calculated in the main
    #function
    cat = ["Sales", "Salvage", "Subsidy", "Redeemable", "Utility", "Labor",
           "Testing", "Marketing", "Materials", "Logistics"]
    #python dictionary to sort contributions into categories
    cat_contributions = {}
    for c in cat:
        cat_contributions[c] = []
    # create list of contributions of each list item to FV in a dictionary
    for r in range(2,sh.nrows):
        try:
            #discounted cash flow of each expenditure
            #using formula FV = expenditure/(1+i)^n
            cat_contributions[sh.cell_value(r,2)].append(
                sh.cell_value(r,1) / ( (1 + irr) ** (sh.cell_value(r,0)/pay_factor) )
            )
        except KeyError:
            print("No category for type: " + sh.cell_value(r,2) +'\n')
    return cat_contributions
def calculateFlows(cat_contributions):
    total_outflow = 0
    total_inflow = 0
    total_cat_contributions = {}
    for cat in cat_contributions:
        total_cat_contributions[cat] = sum( cat_contributions[cat] )
        if total_cat_contributions[cat] < 0:
            total_outflow += total_cat_contributions[cat]
        else:
            total_inflow += total_cat_contributions[cat]
    netflow = total_inflow + total_outflow
    return total_cat_contributions, netflow, total_outflow, total_inflow
def printStats(cat_contributions, irr, compounding_factor, pay_period,
               total_cat_contributions, netflow, total_outflow, total_inflow):
    print("IRR: "+str(irr*100) +' %')
    if compounding_factor == 1: print("Compounding: Yearly")
    if compounding_factor == 12: print("Compounding: Monthly")
    if pay_period == 1: print("Cashflows: Year Ended")
    if pay_period == 12: print("Cashflows: Month Ended")
    print("Future Value (Net Adjusted Cashflow): " +str(netflow))
    print("Adjusted Inflows: " + str(total_inflow))
    print("Adjusted Outflows: " + str(total_outflow) +'\n')
    for cat in total_cat_contributions:
        if total_cat_contributions[cat] != 0:
            print('-----------------------------------------------------')
            print(cat + '\n')
            print("Total Contribution to FV " + str( total_cat_contributions[cat] ))
            if total_cat_contributions[cat] < 0:
                print("Contribution to Expenses: " + str ( abs(100 * total_cat_contributions[cat]/total_outflow) ))
            else:
                print("Contribution to Revenues: " + str ( abs(100 * total_cat_contributions[cat]/total_inflow) ) +'\n')
main(xls='Book1.xlsx')

OverflowError mktime argument out of range

After solving a naive datetime problem, I am facing a new problem in a view that generates graphs: I now get "mktime argument out of range".
I have no idea how to solve it. I didn't write the code (it comes from a colleague of mine) and I can't seem to understand why it fails. I think it has to do with a function that runs for too long, after which the error pops up.
@login_required(login_url='/accounts/login/')
def loggedin(request):
    data = []
    data2 = []
    data3 = []
    dicdata2 = {}
    dicdata3 = {}
    datainterior = []
    today = timezone.localtime(timezone.now()+timedelta(hours=1)).date()
    tomorrow = today + timedelta(1)
    semana= today - timedelta(7)
    today = today - timedelta(1)
    semana_start = datetime.combine(today, time())
    semana_start = timezone.make_aware(semana_start, timezone.utc)
    today_start = datetime.combine(today, time())
    today_start = timezone.make_aware(today_start, timezone.utc)
    today_end = datetime.combine(tomorrow, time())
    today_end = timezone.make_aware(today_end, timezone.utc)
    for modulo in Repository.objects.values("des_especialidade").distinct():
        dic = {}
        mod = str(modulo['des_especialidade'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(des_especialidade__iexact=mod).count()
        data.append(dic)
    for modulo in Repository.objects.values("modulo").distinct():
        dic = {}
        mod = str(modulo['modulo'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(modulo__iexact=mod, dt_diag__gte=semana_start).count()
        datainterior.append(dic)
        # print mod, Repository.objects.filter(modulo__iexact=mod).count()
        # data[mod] = Repository.objects.filter(modulo__iexact=mod).count()
    dicdata2['values'] = datainterior
    dicdata2['key'] = "Cumulative Return"
    dicdata3['values'] = data
    dicdata3['color'] = "#d67777"
    dicdata3['key'] = "Diagnosticos Identificados"
    data3.append(dicdata3)
    data2.append(dicdata2)
    #-------sunburst
    databurst = []
    dictburst = {}
    dictburst['name'] = "CHP"
    childrenmodulo = []
    for modulo in Repository.objects.values("modulo").distinct():
        childrenmodulodic = {}
        mod = str(modulo['modulo'])
        childrenmodulodic['name'] = mod
        childrenesp = []
        for especialidade in Repository.objects.filter(modulo__iexact=mod).values("des_especialidade").distinct():
            childrenespdic = {}
            esp = str(especialidade['des_especialidade'])
            childrenespdic['name'] = esp
            childrencode = []
            for code in Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp).values("cod_diagnosis").distinct():
                childrencodedic = {}
                codee= str(code['cod_diagnosis'])
                childrencodedic['name'] = 'ICD9 - '+codee
                childrencodedic['size'] = Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp,cod_diagnosis__iexact=codee).count()
                childrencode.append(childrencodedic)
            childrenespdic['children'] = childrencode
            #childrenespdic['size'] = Repository.objects.filter(des_especialidade__iexact=esp).count()
            childrenesp.append(childrenespdic)
        childrenmodulodic['children'] = childrenesp
        childrenmodulo.append(childrenmodulodic)
    dictburst['children'] = childrenmodulo
    databurst.append(dictburst)
    # print databurst
    # --------stacked area chart
    datastack = []
    for modulo in Repository.objects.values("modulo").distinct():
        datastackdic = {}
        mod = str(modulo['modulo'])
        datastackdic['key'] = mod
        monthsarray = []
        year = timezone.localtime(timezone.now()+timedelta(hours=1)).year
        month = timezone.localtime(timezone.now()+timedelta(hours=1)).month
        last = timezone.localtime(timezone.now()+timedelta(hours=1)) - relativedelta(years=1)
        lastyear = int(last.year)
        lastmonth = int(last.month)
        #i = 1
        while lastmonth <= int(month) or lastyear<int(year):
            date = str(lastmonth) + '/' + str(lastyear)
            if (lastmonth < 12):
                datef = str(lastmonth + 1) + '/' + str(lastyear)
            else:
                lastmonth = 1
                lastyear = int(lastyear)+1
                datef = str(lastmonth)+'/'+ str(lastyear)
                lastmonth = 0
            datainicial = datetime.strptime(date, '%m/%Y')
            datainicial = timezone.make_aware(datainicial, timezone.utc)
            datafinal = datetime.strptime(datef, '%m/%Y')
            datafinal = timezone.make_aware(datafinal, timezone.utc)
            #print "lastmonth",lastmonth,"lastyear", lastyear
            #print "datainicial:",datainicial,"datafinal: ",datafinal
            filtro = Repository.objects.filter(modulo__iexact=mod)
            count = filtro.filter(dt_diag__gte=datainicial, dt_diag__lt=datafinal).count()
            conv = datetime.strptime(date, '%m/%Y')
            ms = datetime_to_ms_str(conv)
            monthsarray.append([ms, count])
            #i += 1
            lastmonth += 1
        datastackdic['values'] = monthsarray
        datastack.append(datastackdic)
    #print datastack
    if request.user.last_login is not None:
        #print(request.user.last_login)
        contador_novas = Repository.objects.filter(dt_diag__lte=today_end, dt_diag__gte=today_start).count()
        return render_to_response('loggedin.html',
                                  {'user': request.user.username, 'contador': contador_novas, 'data': data, 'data2': data2,
                                   'data3': data3,
                                   'databurst': databurst, 'datastack':datastack})
def datetime_to_ms_str(dt):
    return str(1000 * mktime(dt.timetuple()))
I think the problem is with this condition.
while lastmonth <= int(month) or lastyear<int(year):
During December, month=12, so lastmonth <= int(month) will always be True. The loop condition is therefore always True, even once lastyear is more than the current year, so the loop never exits and the dates keep advancing until mktime can no longer represent them.
You want to keep looping if the loop is in a previous year, or if the loop is in the current year and the month is not in the future. Therefore, I think you want to change it to the following:
while lastyear < year or (lastyear == year and lastmonth <= month):
To be sure that the code is working and to understand it, you need to add lots of print statements to the loops, see how lastmonth and lastyear change, and check that the loop exits when you expect it to. You also need to test it for other values of year and month so that it doesn't break next month. Ideally you want to extract this bit of the code into a separate function. It would be easier to understand the loop if it only returned a list of (month, year) integers, instead of doing lots of date formatting at the same time. Then it would be easier to add unit tests.
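A minimal sketch of the separate function suggested above, using the corrected loop condition. The name `months_between` and its parameters are hypothetical. It only returns (month, year) integer pairs and does no date formatting, which keeps it easy to unit test.

```python
def months_between(start_year, start_month, end_year, end_month):
    """Return (month, year) pairs from the start month to the end month, inclusive."""
    months = []
    year, month = start_year, start_month
    # Same condition as the corrected while loop in the answer.
    while year < end_year or (year == end_year and month <= end_month):
        months.append((month, year))
        month += 1
        if month > 12:  # roll over into January of the next year
            month = 1
            year += 1
    return months


# The December case that triggered the original problem.
december_case = months_between(2014, 12, 2015, 12)
print(len(december_case), december_case[0], december_case[-1])
```

The view could then loop over the returned pairs and build `datainicial` and `datafinal` from each one, and the function can be tested for every month of the year without touching the database.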
