Rainfall-based crop insurance payout calculation - Python

I am analysing daily rainfall data to calculate insurance payouts for farmers under excess-rainfall coverage. The policy pays out if the cumulative rainfall over 3 consecutive days is greater than 50 mm between 1 October and 31 October.
I was able to write Python code that finds the periods matching this criterion. However, when it rains continuously, the result contains overlapping date ranges, which is not acceptable for a payout.
I need help calculating the best payout option when the date ranges overlap.
for dist in data["dcode"].unique():
    d_data = data[data["dcode"] == dist]
    #print(dist)
    for block in d_data["mandal"].unique():
        prev_rain = 0
        prev_to_date = "01/12/2022"
        for each in rain_dev_input:
            #[(rain_dev_input["TERM"] == "DEFICIT RAINFALL") & (rain_dev_input["DIST_CODE"] == 240)]:
            #print(each)
            distcode = each["DIST_CODE"]
            #print(distcode)
            term = str(each["TERM"])
            if (distcode == dist) & (term == "EXCESS RAINFALL"):
                start_date = each["FROM_PERIOD"]
                end_date = each["TO_PERIOD"]
                s_date = datetime.datetime.strptime(start_date, "%d/%m/%y")
                e_date = datetime.datetime.strptime(end_date, "%d/%m/%y")
                #s_date = start_date.strftime(start_date, "%Y-%m-%d")
                #e_date = end_date.strftime(end_date, "%Y-%m-%d")
                #count1 = daterange(s_date, e_date)
                #print(s_date, ": ", e_date)
                m_data = d_data[d_data["mandal"] == block]
                p_data = m_data.loc[start_date:end_date]
                for singledate in daterange(s_date, e_date):
                    #print("Inside Excess Rain")
                    from_date = datetime.datetime.strftime(singledate, "%Y-%m-%d")
                    to_date = datetime.datetime.strftime(
                        (singledate + timedelta(2)), "%Y-%m-%d"
                    )
                    total_rain = p_data.loc[from_date:to_date]["rain"].sum()
                    #print(total_rain)
                    range1 = float(each["RANGE1"])
                    range2 = float(each["RANGE2"])
                    if (total_rain >= range1) & (total_rain < range2):
                        #print("inside write to file")
                        if (from_date <= prev_to_date <= to_date) & (prev_rain <= total_rain):
                            temp["Max"] = total_rain
                        prev_rain = total_rain
                        prev_to_date = to_date
                        temp = dict()
                        temp["district"] = each["DIST_NAME"]
                        temp["mandal"] = block
                        temp["category"] = "excess rainfall for 3 consecutive days, cumulative"
                        temp["rainfall"] = total_rain
                        temp["from_date"] = from_date
                        temp["to_date"] = to_date
                        temp["phase"] = each["PHASE"]
                        #temp["insurance"] = insurance_excessrain(each, total_rain)
                        temp["insurance"] = (total_rain - range1) * each["PAYOUT"]
                        excess_rainfall.append(temp)
# %%
# output excess rainfall rain_dev_list data
excess_rain = pd.DataFrame(excess_rainfall)
excess_rain.to_csv(str(output_folder) + "/excess_rainfall_2020_h2.csv", index=False)
This result has overlapping dates
Daily Rainfall sample data
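One possible way to resolve the overlaps is to treat every qualifying 3-day window as a candidate and choose the set of non-overlapping windows with the largest total payout (a weighted interval scheduling problem). The following is only a minimal sketch under stated assumptions: the rainfall is a plain list of daily values, the trigger is the 50 mm from the policy description, and the payout is the excess over the trigger times a flat rate. The names `best_payout` and `rate_per_mm` are hypothetical and do not appear in the original code.

```python
def best_payout(daily_rain, threshold=50.0, rate_per_mm=1.0, window=3):
    """Pick non-overlapping windows that maximise the total payout.

    Returns (total_payout, chosen), where chosen is a list of
    (start_index, end_index, window_rainfall) tuples.
    """
    n = len(daily_rain)
    # best[i] is the best total payout obtainable from days i..n-1
    best = [0.0] * (n + window + 1)
    take = [False] * n
    for i in range(n - 1, -1, -1):
        best[i] = best[i + 1]  # option 1: no window starts on day i
        if i + window <= n:
            total = sum(daily_rain[i:i + window])
            if total > threshold:
                # option 2: pay this window, then continue after it ends
                value = (total - threshold) * rate_per_mm + best[i + window]
                if value > best[i]:
                    best[i] = value
                    take[i] = True
    # Walk forward to recover which windows were actually chosen
    chosen = []
    i = 0
    while i < n:
        if take[i]:
            chosen.append((i, i + window - 1, sum(daily_rain[i:i + window])))
            i += window
        else:
            i += 1
    return best[0], chosen


# Assumed sample data: one value per day, in mm
rain = [0, 20, 25, 30, 10, 0, 0, 40, 15, 5]
total, windows = best_payout(rain)
print(total, windows)
```

In this sample the windows starting on days 1 and 2 overlap, and so do those starting on days 6 and 7; the sketch keeps only the better window of each overlapping group. The list indices would still need to be mapped back to calendar dates per district and mandal.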

Related

How can I take the last value -1 pandas

I am trying to write a function that checks whether a date is in my Excel file and, if it is not, retrieves the closest earlier date.
I managed to do this for the next date after it, and here is my code.
For the date before, however, I really can't get it to work.
I tried this for the day before:
def get_all_dates_between_2_dates_with_special_begin_substraction(Class, date_départ, date_de_fin, date_debut_analyse, exclus=False):
    date_depart = date_départ
    date_fin = date_de_fin
    result_dates = []
    inFile = "database/Calendar_US_Target.xlsx"
    inSheetName = "Sheet1"
    df = pd.read_excel(inFile, sheet_name=inSheetName)
    date_depart = datetime.datetime.strptime(date_depart, '%Y-%m-%d')
    date_fin = datetime.datetime.strptime(date_fin, '%Y-%m-%d')
    date_calcul_depart = datetime.datetime.strptime(date_debut_analyse, '%Y-%m-%d')
    var_date_depart = date_depart
    time_to_add = ""
    if (Class.F0 == "mois"):
        time_to_add = relativedelta(months=1)
    if (Class.F0 == "trimestre"):
        time_to_add = relativedelta(months=3)
    if (Class.F0 == "semestre"):
        time_to_add = relativedelta(months=6)
    if (Class.F0 == "année"):
        time_to_add = relativedelta(years=1)
    while var_date_depart <= date_fin:
        # -------------------------------------------------------------
        df['mask'] = (var_date_depart <= df['TARGETirs_holi']) # daybefore
        print(df.head())
        print(df[df.mask =="True"].head(1)) #want to check the last true value
        # -------------------------------------------------------------
        if (result >= date_calcul_depart):
            result = (str(result)[0:10])
            result = result[8:10] + "/" + result[5:7] + "/" + result[0:4]
            result_dates.append(str(result))
        var_date_depart = var_date_depart + time_to_add
    if (exclus == True):
        result_dates = result_dates[1:-1]
    return(result_dates)
What I want is a mask column (or a DataFrame) that is True wherever the date in the file is not later than my date, and then to take the last value that is True.
For example:
I have this array: [12-05-2022, 15-05-2022, 16-05-2022, 19-05-2022].
If I pass 15-05-2022, it should give me 15-05-2022, but if I pass 18-05-2022, it should give me 16-05-2022.
Thanks!
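The lookup described above ("the date itself if present, otherwise the closest earlier one") can be sketched with a binary search over a sorted list of dates. This is only an illustration using the standard library and the four dates from the example; `last_on_or_before` is a hypothetical helper name, and the list is assumed to be sorted in ascending order.

```python
from bisect import bisect_right
from datetime import date


def last_on_or_before(sorted_dates, target):
    """Return the latest date in sorted_dates that is <= target, or None."""
    pos = bisect_right(sorted_dates, target)
    return sorted_dates[pos - 1] if pos else None


# The dates from the example above, in ascending order
calendar = [date(2022, 5, 12), date(2022, 5, 15), date(2022, 5, 16), date(2022, 5, 19)]

print(last_on_or_before(calendar, date(2022, 5, 15)))  # 2022-05-15
print(last_on_or_before(calendar, date(2022, 5, 18)))  # 2022-05-16
```

With the DataFrame from the question, the same idea would presumably be to keep the rows where `df['TARGETirs_holi'] <= var_date_depart` and take the maximum of that column.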

Python XML Parsing - need to correct while loop

Fairly new to Python. I'm parsing an XML file and the following code returns undesired results. I can understand why I'm getting these results: there are two escalations in the XML for this deal and I'm getting a full set of results for each of them. I need help updating my code to return only the monthly rent for each escalation in the XML:
<RentEscalations>
  <RentEscalation ID="354781">
    <BeginIn>7</BeginIn>
    <Escalation>3.8</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
  <RentEscalation ID="354782">
    <BeginIn>61</BeginIn>
    <Escalation>1.0</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
</RentEscalations>
The rent starts at $3.00/sqft for the first 6 months. This XML block shows that, for each 12 months (RecurrenceInterval), the rent will be $6.80/sqft ($3.00 base + $3.80 escalation). The following twelve months will be $10.60 ($6.80 + 3.80). Each year, the amount per square foot will increase by $3.80 until the 61st month in the term. At that point, the rent will increase by $1.00/sqft for the remainder of the term. The entire term of the lease is 120 months.
My results include 114 results based on the first escalation (3.80/sqft) followed by 114 rows showing as if the rent starts at $3.00/sqft incrementing by $1.00/sqft each year.
Any help is appreciated!
import xml.etree.ElementTree as ET
import pyodbc
import dateutil.relativedelta as rd
import datetime as dt
tree = ET.parse('C:\\FileLocation\\DealData.xml')
root = tree.getroot()
for deal in root.findall("Deals"):
    for dl in deal.findall("Deal"):
        dealid = dl.get("DealID")
        for dts in dl.findall("DealTerms/DealTerm"):
            dtid = dts.get("ID")
            darea = float(dts.find("RentableArea").text)
            dterm = int(dts.find("LeaseTerm").text)
            for brrent in dts.findall("BaseRents/BaseRent"):
                brid = brrent.get("ID")
                rent = float(brrent.find("Rent").text)
                darea = float(dts.find("RentableArea").text)
                per = brrent.find("Period").text
                dtstart = dts.find("CommencementDate").text
                startyr = int(dtstart[0:4])
                startmo = int(dtstart[5:7])
                startday = int(dtstart[8:])
                start = dt.date(startyr, startmo, startday)
                end = start + rd.relativedelta(months=dterm)
                if brrent.find("Duration").text is None:
                    duration = 0
                else:
                    duration = int(brrent.find("Duration").text)
                termbal = dterm - duration
                for resc in dts.findall("RentEscalations/RentEscalation"):
                    rescid = resc.get("ID")
                    esctype = resc.find("EscalationType").text
                    begmo = int(resc.find("BeginIn").text)
                    esc = float(resc.find("Escalation").text)
                    intrvl = int(resc.find("RecurrenceInterval").text)
                    if intrvl != 0:
                        pers = termbal / intrvl
                    else:
                        pers = 0
                    escst = start + rd.relativedelta(months=begmo - 1)
                    i = 0
                    x = begmo
                    newrate = rent
                    while i < termbal:
                        billdt = escst + rd.relativedelta(months=i)
                        if per == "rsf/year":
                            monthlyamt = (newrate + esc) * darea / 12.0
                        if per == "month":
                            monthlyamt = newrate + esc
                        if per == "year":
                            monthlyamt = (newrate + esc) / 12.0
                        if per == "rsf/month":
                            monthlyamt = (newrate + esc) * darea
                        try:
                            if i % intrvl == 0:
                                level = x + 1
                                newrent = monthlyamt
                                x += 1
                                newrate += esc
                            else:
                                level = x
                        except ZeroDivisionError:
                            break
                        i += 1
                        if dealid == "1254278":
                            print(dealid, dtid, rescid, dterm, darea, escst, rent, intrvl, esctype, termbal, \
                                  monthlyamt, billdt, pers, level, newrate, newrent)
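One way to avoid getting a full set of rows per escalation is to walk the lease month by month once and, for each month, apply only the escalation that is currently active (the latest one whose BeginIn has been reached). The sketch below is an illustration, not a drop-in fix: it works on the XML fragment shown above, uses the $3.00 base rate and 120-month term from the description, and assumes that the second escalation also recurs every 12 months. `rate_schedule` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML = """
<RentEscalations>
  <RentEscalation ID="354781">
    <BeginIn>7</BeginIn>
    <Escalation>3.8</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
  <RentEscalation ID="354782">
    <BeginIn>61</BeginIn>
    <Escalation>1.0</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
</RentEscalations>
"""


def rate_schedule(xml_text, base_rate, term_months):
    """Return the rate per sqft for each month 1..term_months as a list."""
    root = ET.fromstring(xml_text)
    # (begin month, bump amount, recurrence interval), sorted by begin month
    escalations = sorted(
        (
            int(e.find("BeginIn").text),
            float(e.find("Escalation").text),
            int(e.find("RecurrenceInterval").text),
        )
        for e in root.findall("RentEscalation")
    )
    rates = []
    rate = base_rate
    for month in range(1, term_months + 1):
        # The active escalation is the latest one that has already begun
        active = None
        for esc in escalations:
            if esc[0] <= month:
                active = esc
        if active is not None:
            begin, amount, interval = active
            # Bump on the begin month and on every recurrence after it
            if month == begin or (interval > 0 and (month - begin) % interval == 0):
                rate += amount
        rates.append(round(rate, 2))
    return rates


rates = rate_schedule(XML, base_rate=3.0, term_months=120)
print(rates[5], rates[6], rates[18], rates[60], rates[119])
```

This produces one row per month of the term rather than one block of rows per escalation; converting the rate to a monthly amount would then follow the `per` logic from the original code.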

Count summer days between two dates

I want to count summer days between two dates. Summer is May first to August last.
This will count all days:
import datetime
startdate=datetime.datetime(2015,1,1)
enddate=datetime.datetime(2016,6,1)
delta=enddate-startdate
print(delta.days)
# 517
But how can I count only the summer days that have passed?
You could define a generator to iterate over every date between startdate and enddate, define a function to check if a date represents a summer day and use sum to count the summer days:
import datetime
startdate = datetime.datetime(2015,1,1)
enddate = datetime.datetime(2016,6,1)
all_dates = (startdate + datetime.timedelta(days=x) for x in range(0, (enddate-startdate).days))
def is_summer_day(date):
    return 5 <= date.month <= 8
print(sum(1 for date in all_dates if is_summer_day(date)))
# 154
Thanks to the generator, you don't need to create a huge list in memory with every day between startdate and enddate.
This iteration still considers every single day, even if it's not needed. For very large gaps, you could use the fact that every complete year has 123 summer days according to your definition.
You can create a few functions to count how many summer days you have between two days:
from datetime import date
def get_summer_start(year):
    return date(year, 5, 1)
def get_summer_end(year):
    # exclusive end: the day after August 31, so a full summer counts 123 days
    return date(year, 9, 1)
def get_start_date(date, year):
    return max(date, get_summer_start(year))
def get_end_date(date, year):
    return min(date, get_summer_end(year))
def count_summer_days(date1, date2):
    date1_year = date1.year
    date2_year = date2.year
    if date1_year == date2_year:
        s = get_start_date(date1, date1_year)
        e = get_end_date(date2, date1_year)
        return max(0, (e - s).days)
    else:
        s1 = max(date1, get_summer_start(date1_year))
        e1 = get_summer_end(date1_year)
        first_year = max(0, (e1 - s1).days)
        s2 = get_summer_start(date2_year)
        e2 = min(date2, get_summer_end(date2_year))
        last_year = max(0, (e2 - s2).days)
        other_years = date2_year - date1_year - 1
        summer_days_per_year = (get_summer_end(date1_year) - get_summer_start(date1_year)).days
        return first_year + last_year + (other_years * summer_days_per_year)
date1 = date(2015,1,1)
date2 = date(2016,6,1)
print(count_summer_days(date1, date2))
Here is a better solution for large periods:
first_summer_day = (5,1)
last_summer_day = (8,31)
from datetime import date, timedelta
startdate = date(2015,1,1)
enddate = date(2016,6,1)
# make sure that startdate <= enddate
if startdate > enddate:
    startdate, enddate = enddate, startdate
def iter_yearly_summer_days(startdate, enddate):
    for year in range(startdate.year, enddate.year+1):
        start_period = startdate if year == startdate.year else date(year, 1, 1)
        end_period = enddate if year == enddate.year else date(year, 12, 31)
        year_first_summer_day = date(year, *first_summer_day)
        # day after the last summer day, so that August 31 itself is counted
        year_summer_end = date(year, *last_summer_day) + timedelta(days=1)
        # clamp at 0 for years in which the period misses the summer entirely
        summer_days_that_year = max(0, (min(year_summer_end, end_period) - max(year_first_summer_day, start_period)).days)
        print('year {} had {} days of summer'.format(year, summer_days_that_year))
        yield summer_days_that_year
print(sum(iter_yearly_summer_days(startdate, enddate)))

python data frame filter conditions: any faster way

parts_list = imp_parts_df['Parts'].tolist()
sub_week_list = ['2016-12-11', '2016-12-04', '2016-11-27', '2016-11-20', '2016-11-13']
i = 0
start = DT.datetime.now()
for p in parts_list:
    for thisdate in sub_week_list:
        thisweek_start = pd.to_datetime(thisdate, format='%Y-%m-%d') #'2016/12/11'
        thisweek_end = thisweek_start + DT.timedelta(days=7) # add 7 days to the week date
        val_shipped = len(shipment_df[(shipment_df['loc'] == 'USW1') & (shipment_df['part'] == str(p)) & (shipment_df['shipped_date'] >= thisweek_start) & (shipment_df['shipped_date'] < thisweek_end)])
print((DT.datetime.now() - start).total_seconds())
shipment_df has around 35,000 records.
parts_list has 436 parts.
sub_week_list has 5 dates in it.
Overall, it took 438.13 seconds to run this code.
Is there a faster way to do it?
parts_list = imp_parts_df['Parts'].astype(str).tolist()
i = 0
start = DT.datetime.now()
for p in parts_list:
    # @p refers to the local Python variable p inside the query string
    q = 'loc == "xxx" & part == @p & "2016-11-20" <= shipped_date < "2016-11-27"'
    val_shipped = len(shipment_df.query(q))
print((DT.datetime.now() - start).total_seconds())
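Both loops above rescan the whole DataFrame once per part. A possible alternative is to filter by location once and then count all parts of a week in a single pass with `value_counts`, so the frame is scanned once per week rather than once per (part, week) pair. The sketch below uses small made-up data in place of `shipment_df` and `imp_parts_df`; the column names follow the question, everything else is an assumption.

```python
import pandas as pd

# Hypothetical sample data standing in for shipment_df and the parts list
shipment_df = pd.DataFrame({
    "loc": ["USW1", "USW1", "USW1", "USE1", "USW1"],
    "part": ["A", "A", "B", "A", "B"],
    "shipped_date": pd.to_datetime(
        ["2016-12-11", "2016-12-13", "2016-12-05", "2016-12-12", "2016-10-01"]
    ),
})
parts_list = ["A", "B", "C"]
sub_week_list = ["2016-12-11", "2016-12-04", "2016-11-27"]

# Filter by location once instead of once per (part, week) pair
usw1 = shipment_df[shipment_df["loc"] == "USW1"]

counts = {}
for week in sub_week_list:
    week_start = pd.to_datetime(week, format="%Y-%m-%d")
    week_end = week_start + pd.Timedelta(days=7)
    in_week = usw1[(usw1["shipped_date"] >= week_start) & (usw1["shipped_date"] < week_end)]
    # One pass counts the shipments of every part in this week
    counts[week] = in_week["part"].value_counts()

# Rows are parts, columns are week start dates; missing combinations become 0
val_shipped = pd.DataFrame(counts).reindex(parts_list).fillna(0).astype(int)
print(val_shipped)
```

Each cell of `val_shipped` then holds what the original loop computed as `val_shipped` for one part and one week.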

OverflowError mktime argument out of range

After solving a naive datetime problem, I am facing a new problem in a view that generates graphs: I now get "mktime argument out of range".
I have no idea how to solve it. I didn't write the code; I got it from a colleague, and I can't seem to understand why it fails. I think it has to do with a function that runs for too long before the error appears.
#login_required(login_url='/accounts/login/')
def loggedin(request):
    data = []
    data2 = []
    data3 = []
    dicdata2 = {}
    dicdata3 = {}
    datainterior = []
    today = timezone.localtime(timezone.now()+timedelta(hours=1)).date()
    tomorrow = today + timedelta(1)
    semana= today - timedelta(7)
    today = today - timedelta(1)
    semana_start = datetime.combine(today, time())
    semana_start = timezone.make_aware(semana_start, timezone.utc)
    today_start = datetime.combine(today, time())
    today_start = timezone.make_aware(today_start, timezone.utc)
    today_end = datetime.combine(tomorrow, time())
    today_end = timezone.make_aware(today_end, timezone.utc)
    for modulo in Repository.objects.values("des_especialidade").distinct():
        dic = {}
        mod = str(modulo['des_especialidade'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(des_especialidade__iexact=mod).count()
        data.append(dic)
    for modulo in Repository.objects.values("modulo").distinct():
        dic = {}
        mod = str(modulo['modulo'])
        dic["label"] = str(mod)
        dic["value"] = Repository.objects.filter(modulo__iexact=mod, dt_diag__gte=semana_start).count()
        datainterior.append(dic)
        # print mod, Repository.objects.filter(modulo__iexact=mod).count()
        # data[mod] = Repository.objects.filter(modulo__iexact=mod).count()
    dicdata2['values'] = datainterior
    dicdata2['key'] = "Cumulative Return"
    dicdata3['values'] = data
    dicdata3['color'] = "#d67777"
    dicdata3['key'] = "Diagnosticos Identificados"
    data3.append(dicdata3)
    data2.append(dicdata2)
    #-------sunburst
    databurst = []
    dictburst = {}
    dictburst['name'] = "CHP"
    childrenmodulo = []
    for modulo in Repository.objects.values("modulo").distinct():
        childrenmodulodic = {}
        mod = str(modulo['modulo'])
        childrenmodulodic['name'] = mod
        childrenesp = []
        for especialidade in Repository.objects.filter(modulo__iexact=mod).values("des_especialidade").distinct():
            childrenespdic = {}
            esp = str(especialidade['des_especialidade'])
            childrenespdic['name'] = esp
            childrencode = []
            for code in Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp).values("cod_diagnosis").distinct():
                childrencodedic = {}
                codee= str(code['cod_diagnosis'])
                childrencodedic['name'] = 'ICD9 - '+codee
                childrencodedic['size'] = Repository.objects.filter(modulo__iexact=mod,des_especialidade__iexact=esp,cod_diagnosis__iexact=codee).count()
                childrencode.append(childrencodedic)
            childrenespdic['children'] = childrencode
            #childrenespdic['size'] = Repository.objects.filter(des_especialidade__iexact=esp).count()
            childrenesp.append(childrenespdic)
        childrenmodulodic['children'] = childrenesp
        childrenmodulo.append(childrenmodulodic)
    dictburst['children'] = childrenmodulo
    databurst.append(dictburst)
    # print databurst
    # --------stacked area chart
    datastack = []
    for modulo in Repository.objects.values("modulo").distinct():
        datastackdic = {}
        mod = str(modulo['modulo'])
        datastackdic['key'] = mod
        monthsarray = []
        year = timezone.localtime(timezone.now()+timedelta(hours=1)).year
        month = timezone.localtime(timezone.now()+timedelta(hours=1)).month
        last = timezone.localtime(timezone.now()+timedelta(hours=1)) - relativedelta(years=1)
        lastyear = int(last.year)
        lastmonth = int(last.month)
        #i = 1
        while lastmonth <= int(month) or lastyear<int(year):
            date = str(lastmonth) + '/' + str(lastyear)
            if (lastmonth < 12):
                datef = str(lastmonth + 1) + '/' + str(lastyear)
            else:
                lastmonth = 01
                lastyear = int(lastyear)+1
                datef = str(lastmonth)+'/'+ str(lastyear)
                lastmonth = 0
            datainicial = datetime.strptime(date, '%m/%Y')
            datainicial = timezone.make_aware(datainicial, timezone.utc)
            datafinal = datetime.strptime(datef, '%m/%Y')
            datafinal = timezone.make_aware(datafinal, timezone.utc)
            #print "lastmonth",lastmonth,"lastyear", lastyear
            #print "datainicial:",datainicial,"datafinal: ",datafinal
            filtro = Repository.objects.filter(modulo__iexact=mod)
            count = filtro.filter(dt_diag__gte=datainicial, dt_diag__lt=datafinal).count()
            conv = datetime.strptime(date, '%m/%Y')
            ms = datetime_to_ms_str(conv)
            monthsarray.append([ms, count])
            #i += 1
            lastmonth += 1
        datastackdic['values'] = monthsarray
        datastack.append(datastackdic)
    #print datastack
    if request.user.last_login is not None:
        #print(request.user.last_login)
        contador_novas = Repository.objects.filter(dt_diag__lte=today_end, dt_diag__gte=today_start).count()
        return render_to_response('loggedin.html',
                                  {'user': request.user.username, 'contador': contador_novas, 'data': data, 'data2': data2,
                                   'data3': data3,
                                   'databurst': databurst, 'datastack':datastack})
def datetime_to_ms_str(dt):
    return str(1000 * mktime(dt.timetuple()))
I think the problem is with this condition.
while lastmonth <= int(month) or lastyear<int(year):
During December, month=12, so lastmonth <= int(month) will always be True. The loop condition therefore never becomes False, even once lastyear is greater than the current year.
You want to keep looping while the loop's date is in a previous year, or is in the current year and its month is not in the future. Therefore, I think you want to change it to the following:
while lastyear < year or (lastyear == year and lastmonth <= month):
To be sure that the code is working and to understand it, you need to add lots of print statements to the loops, see how lastmonth and lastyear change, and check that the loop exits when you expect it to. You also need to test it for other values of year and month so that it doesn't break next month. Ideally you want to extract this bit of the code into a separate function. It would be easier to understand the loop if it only returned a list of (month, year) integers, instead of doing lots of date formatting at the same time. Then it would be easier to add unit tests.
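The separate function suggested above could look roughly like the sketch below. It only produces the list of (month, year) integers and leaves all date formatting and querying to the caller; `month_range` is a hypothetical name, and the end month is assumed to be inclusive.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Return [(month, year), ...] from the start month through the end month."""
    months = []
    year, month = start_year, start_month
    # Comparing (year, month) tuples avoids the December problem: the loop
    # stops as soon as the pair passes (end_year, end_month)
    while (year, month) <= (end_year, end_month):
        months.append((month, year))
        month += 1
        if month > 12:
            month = 1
            year += 1
    return months


# One year back from December 2015, the case that never terminated
print(month_range(2014, 12, 2015, 12))
```

Because the function has no dependency on Django or on the current time, it can be unit tested for December and for any other month.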
