Optimising if statement, calculation, session query - python

After two years of hobby scripting in Python, I'm trying to optimize my old code.
For a script in the past I had a check for handing out resources to players. The whole calculation for 100,000 villages took about 25 minutes. This has been shortened to a mere 1 minute and 40 seconds with a filter: if a village's resources are equal to its maximum storage, its row is not fetched from the table.
There are still checks in place in case one of the 3 resource types is full (so the other resources will still receive their production bonus).
Production is divided by 4 since the script runs once every 15 minutes.
The old code had a 25-minute run time. The code below runs over 100,000 "villages" within 1 minute and 40 seconds:
2.3 seconds to withdraw all data
0.2 seconds to write all data to the database
the rest (1 minute and 37.5 seconds) is spent purely between for village in villages: and session.add(add_resources)
Is it possible to speed this up even further, or are these the limits of Python itself?
from datetime import datetime

time_start, villages = datetime.utcnow(), new_get_villages()
for village in villages:
    production_wood, production_stone, production_iron = int(village.wood_production/4), int(village.stone_production/4), int(village.iron_production/4)
    add_resources = session.query(VillageNew).filter(VillageNew.pk == village.pk).first()
    if add_resources.wood_stock != add_resources.max_storage:
        add_resources.wood_stock = add_resources.wood_stock + production_wood
        if add_resources.wood_stock > add_resources.max_storage:
            add_resources.wood_stock = add_resources.max_storage
    if add_resources.stone_stock != add_resources.max_storage:
        add_resources.stone_stock = add_resources.stone_stock + production_stone
        if add_resources.stone_stock > add_resources.max_storage:
            add_resources.stone_stock = add_resources.max_storage
    if add_resources.iron_stock != add_resources.max_storage:
        add_resources.iron_stock = add_resources.iron_stock + production_iron
        if add_resources.iron_stock > add_resources.max_storage:
            add_resources.iron_stock = add_resources.max_storage
    session.add(add_resources)
session.commit()
time_end = datetime.utcnow()
The session is set up as follows:
db_connection_string = conf.get_string('DBConf', 'db_connection_string')
engine = create_engine(db_connection_string, encoding='utf8')
Session = sessionmaker(bind=engine)
Code update after gimix's post:
for village in session.query(VillageNew).filter(or_(VillageNew.wood_stock != VillageNew.max_storage, VillageNew.stone_stock != VillageNew.max_storage, VillageNew.iron_stock != VillageNew.max_storage)).all():
    village.wood_stock = village.wood_stock + int(village.wood_production)
    if village.wood_stock > village.max_storage:
        village.wood_stock = village.max_storage
    village.stone_stock = village.stone_stock + int(village.stone_production)
    if village.stone_stock > village.max_storage:
        village.stone_stock = village.max_storage
    village.iron_stock = village.iron_stock + int(village.iron_production)
    if village.iron_stock > village.max_storage:
        village.iron_stock = village.max_storage
session.commit()
Code update 02/11/2021:
session.query(VillageNew).where(VillageNew.wood_stock < VillageNew.max_storage).update({VillageNew.wood_stock: VillageNew.wood_stock + VillageNew.wood_production})
session.query(VillageNew).where(VillageNew.stone_stock < VillageNew.max_storage).update({VillageNew.stone_stock: VillageNew.stone_stock + VillageNew.stone_production})
session.query(VillageNew).where(VillageNew.iron_stock < VillageNew.max_storage).update({VillageNew.iron_stock: VillageNew.iron_stock + VillageNew.iron_production})
session.query(VillageNew).where(VillageNew.wood_stock > VillageNew.max_storage).update({VillageNew.wood_stock: VillageNew.max_storage})
session.query(VillageNew).where(VillageNew.stone_stock > VillageNew.max_storage).update({VillageNew.stone_stock: VillageNew.max_storage})
session.query(VillageNew).where(VillageNew.iron_stock > VillageNew.max_storage).update({VillageNew.iron_stock: VillageNew.max_storage})

In the end, the fastest was pure SQL execution:
session.execute('UPDATE table.villagenew SET wood_stock = least(villagenew.wood_stock + villagenew.wood_production, villagenew.max_storage) WHERE villagenew.wood_stock < villagenew.max_storage')
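A minimal sketch of extending that statement to all three resources (assuming the same table.villagenew naming and a backend such as PostgreSQL or MySQL where LEAST() exists), so the increment and the clamp stay in one UPDATE per resource:

for res in ('wood', 'stone', 'iron'):
    # LEAST() clamps the incremented stock at max_storage in the same statement
    session.execute(
        f'UPDATE table.villagenew '
        f'SET {res}_stock = least(villagenew.{res}_stock + villagenew.{res}_production, villagenew.max_storage) '
        f'WHERE villagenew.{res}_stock < villagenew.max_storage'
    )
session.commit()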
In case feedback to the program was needed, e.g. to know when a storage is full:
session.query(VillageNew).where(VillageNew.wood_stock < VillageNew.max_storage).update({VillageNew.wood_stock: VillageNew.wood_stock + VillageNew.wood_production})
session.query(VillageNew).where(VillageNew.stone_stock < VillageNew.max_storage).update({VillageNew.stone_stock: VillageNew.stone_stock + VillageNew.stone_production})
session.query(VillageNew).where(VillageNew.iron_stock < VillageNew.max_storage).update({VillageNew.iron_stock: VillageNew.iron_stock + VillageNew.iron_production})
session.query(VillageNew).where(VillageNew.wood_stock > VillageNew.max_storage).update({VillageNew.wood_stock: VillageNew.max_storage})
session.query(VillageNew).where(VillageNew.stone_stock > VillageNew.max_storage).update({VillageNew.stone_stock: VillageNew.max_storage})
session.query(VillageNew).where(VillageNew.iron_stock > VillageNew.max_storage).update({VillageNew.iron_stock: VillageNew.max_storage})
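As a side note (an assumption on my part, not from the original post): Query.update() returns the number of rows it matched, so the same statements can double as the "is it full" feedback without an extra SELECT:

# rows that still had room and therefore received wood production
bumped = session.query(VillageNew).where(VillageNew.wood_stock < VillageNew.max_storage).update({VillageNew.wood_stock: VillageNew.wood_stock + VillageNew.wood_production})
full = session.query(VillageNew).count() - bumped  # villages whose wood storage was already full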


Filtering and saving subset of pandas

I have a function that does the following:
Inserting class values 1, 2, 3 based on timestamps. This works as expected, and in the first iteration of the first for loop I get the following class distribution:
mapping: {'Seizure': 1, 'Preictal': 2, 'Interictal': 3}
value counts:
3.0 3150000
2.0 450000
1.0 28000
Name: class, dtype:
So I have this number of rows for each class.
However, in the second for loop I iterate through the same list of timestamps and want to subset the data between the timestamps, with some conditions based on the classes I inserted in the first for loop.
This is the result for the same timestamps, e.g. the first iteration:
len sz: 28000
len prei: 450000
len pre int: 29700000
logging
len post int: 1485499
How the * do pre int and post int (the interictal class) get counts this high? They don't correspond at all to the interictal count from the first loop.
Here is my function:
def insert_class_col(dataframe, sz_info_list, date_converter, save_filename, save_path, file_sample_rate, file_channel):
    print(f"sz_info_list: {sz_info_list}")
    if "class" not in dataframe.columns:
        dataframe.insert(0, "class", np.nan)
    file_channel.extend(['timestamp', 'class'])
    dataframe = dataframe[file_channel]

    # Insert class attributes to ensure that seizure, preictal, interictal do not overlap.
    for index, container in enumerate(sz_info_list):
        delay = container.delay * 1000
        duration = container.duration * 1000
        sz_start = date_converter(container.time_emu) + delay
        sz_end = sz_start + duration
        print(f"sz_start index = {sz_start}")
        print(f"sz_end: {sz_end}")
        preictal_start = sz_start - (15 * 60 * 1000)
        interictal_start = sz_start - (1 * 60 * 60 * 1000)
        interictal_end = sz_end + (1 * 60 * 60 * 1000)
        dataframe['timestamp'] = pd.to_numeric(dataframe['timestamp'])
        # if the data is seizure, tag it seizure
        # if the data is preictal, tag it preictal/interictal, but not inside seizure data
        dataframe.loc[(dataframe['timestamp'] >= sz_start) & (dataframe['timestamp'] < sz_end), "class"] = class_mapping['Seizure']
        dataframe.loc[(dataframe['class'] != class_mapping['Seizure']) & (dataframe['timestamp'] >= preictal_start) & (dataframe['timestamp'] < sz_start), "class"] = class_mapping['Preictal']
        dataframe.loc[(dataframe['class'] != class_mapping['Seizure']) & (dataframe['class'] != class_mapping['Preictal']) & (dataframe['timestamp'] >= interictal_start) & (dataframe['timestamp'] < interictal_end), "class"] = class_mapping['Interictal']
        print(f"mapping: {class_mapping} \n value counts: \n{dataframe['class'].value_counts()}")

    print(f"Beginning current number of class in df {dataframe['class'].value_counts()}")

    # Saving to csv
    for index, container in enumerate(sz_info_list):
        delay = container.delay * 1000
        duration = container.duration * 1000
        sz_start = date_converter(container.time_emu) + delay
        sz_end = sz_start + duration
        print(f"sz_start index = {sz_start}")
        print(f"sz_end: {sz_end}")
        preictal_start = sz_start - (15 * 60 * 1000)
        interictal_start = sz_start - (1 * 60 * 60 * 1000)
        interictal_end = sz_end + (1 * 60 * 60 * 1000)
        dataframe['timestamp'] = pd.to_numeric(dataframe['timestamp'])

        # INSERTING SEIZURE CLASS
        sz_df = dataframe[(dataframe['timestamp'] >= sz_start) & (dataframe['timestamp'] < sz_end)].copy()
        print(f"len sz: {len(sz_df)}")
        #df_save_compress(f"Seizure_{index}_{save_filename}", save_path + "/Seizure", sz_df)
        #logging_info_txt(f"Seizure_{index}_{save_filename}", save_path, file_sample_rate, file_channel)

        # INSERTING PREICTAL
        prei_df = dataframe[(dataframe['timestamp'] >= preictal_start) & (dataframe['timestamp'] < sz_start) & (dataframe['class'] != class_mapping["Seizure"])].copy()
        print(f"len prei: {len(prei_df)}")
        #df_save_compress(f"Preictal_{index}_{save_filename}", save_path + "/Preictal", prei_df)
        #logging_info_txt(f"Preictal_{index}_{save_filename}", save_path, file_sample_rate, file_channel)

        # INSERTING INTERICTAL
        pre_int_df = dataframe[(dataframe['timestamp'] >= interictal_start) & (dataframe['timestamp'] < preictal_start) & (dataframe['class'] != class_mapping["Seizure"]) | (dataframe['class'] != class_mapping["Preictal"])].copy()
        print(f"len pre int: {len(pre_int_df)}")
        #df_save_compress(f"PreInt_{index}_{save_filename}", save_path + "/Interictal", pre_int_df)
        logging_info_txt(f"PreInt_{index}_{save_filename}", save_path, file_sample_rate, file_channel)

        post_int_df = dataframe[(dataframe['timestamp'] >= sz_end) & (dataframe['timestamp'] < interictal_end) & (dataframe['class'] != class_mapping["Seizure"]) & (dataframe['class'] != class_mapping["Preictal"])].copy()
        print(f"len post int: {len(post_int_df)}")
        #df_save_compress(f"PostInt_{index}_{save_filename}", save_path + "/Interictal", post_int_df)
        logging_info_txt(f"PostInt_{index}_{save_filename}", save_path, file_sample_rate, file_channel)

        #print(f"after = len df: {len(dataframe)} values class: \n {dataframe['class'].value_counts()}")
        # clean up
        del pre_int_df, post_int_df, sz_df, prei_df
        gc.collect()
Notice that pre int, which is interictal, is 29,700,000, while from the printed class counts it should be lower than 3,150,000.
Any ideas about this pandas behavior?
richardec answered the question; see the comments.
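For reference, the likely culprit (an inference on my part, consistent with the counts above, not quoted from the comments) is operator precedence in the pre_int_df filter: & binds tighter than |, so the condition groups as (timestamp checks & class != Seizure) | (class != Preictal), and class != Preictal alone is true for nearly every row, which is how pre int balloons to 29,700,000. A sketch of the corrected mask:

# group both class exclusions and combine them with &, not |
pre_int_df = dataframe[
    (dataframe['timestamp'] >= interictal_start)
    & (dataframe['timestamp'] < preictal_start)
    & (dataframe['class'] != class_mapping["Seizure"])
    & (dataframe['class'] != class_mapping["Preictal"])
].copy()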

Same Python code, same Data, outcomes different if Data imported or not?

So I have Python code that first aggregates and standardizes data into a file I called the "tripFile". Then the code tries to identify the differences between this most recent tripFile and a previous one.
For the second part of the code, if I export the tripFile from the first part and import it again, it takes around 5 minutes to run and says it is looping over a bit more than 4,000 objects.
newTripFile = pd.read_csv(PATH + today + ' Trip File v6.csv')
However, if I do not export and re-import the data (just keeping it from the first part of the code), it takes a bit less than 24 hours (!!) and says it is looping over a bit more than 951,691 objects.
newTripFile = tripFile
My data is a dataframe; I checked its shape and it is identical to the file I export.
Any idea what could be causing that?
Here is the second part of my code:
oldTripFile = pd.read_excel(PATH + OLDTRIPFILE)
oldTripFile.drop(['id'], axis=1, inplace=True)
oldTripFile['status'] = 'old'

# New version of trip file
newTripFile = pd.read_csv(PATH + today + ' Trip File v6.csv')
newTripFile.drop(['id'], axis=1, inplace=True)
newTripFile['status'] = 'new'

db_trips = pd.concat([oldTripFile, newTripFile])  # concatenation of the two dataframes
db_trips = db_trips.reset_index(drop=True)
db_trips.drop_duplicates(keep=False, subset=[column for column in db_trips.columns[:-1]], inplace=True)
db_trips = db_trips.reset_index(drop=True)
db_trips.head()

update_details = []
# Get the duplicates: only consider ['fromCode', 'toCode', 'mode'] for identifying duplicates
# Create a dataframe that contains only the trips that were deleted or recently added
db_trips_delete_new = db_trips.drop_duplicates(keep=False, subset=['fromCode', 'toCode', 'mode'])
db_trips_delete_new = db_trips_delete_new.reset_index(drop=True)

# New trips
new_trips = db_trips_delete_new[db_trips_delete_new['status'] == 'new'].values.tolist()
for trip in new_trips:
    trip.append('new trip added')
update_details = update_details + new_trips

# Deleted trips
old_trips = db_trips_delete_new[db_trips_delete_new['status'] == 'old'].values.tolist()
for trip in old_trips:
    trip.append('trip deleted')
update_details = update_details + old_trips
db_trips_delete_new.head()

# Updated trips
# Ocean: no need to check the transit time column
sea_trips = db_trips.loc[db_trips['mode'].isin(['sea', 'cfs'])]
sea_trips = sea_trips.reset_index(drop=True)
list_trips_sea_update = sea_trips[sea_trips.duplicated(subset=['fromCode', 'toCode', 'mode'], keep=False)].values.tolist()
if len(list_trips_sea_update) != 0:
    for i in tqdm(range(0, len(list_trips_sea_update) - 1)):
        for j in range(i + 1, len(list_trips_sea_update)):
            if list_trips_sea_update[i][2] == list_trips_sea_update[j][2] and list_trips_sea_update[i][9] == list_trips_sea_update[j][9] and list_trips_sea_update[i][14] == list_trips_sea_update[j][14]:
                update_comment = ''
                # Check display from / to
                if list_trips_sea_update[i][5] != list_trips_sea_update[j][5]:
                    update_comment = update_comment + 'fromDisplayLocation was updated.'
                if list_trips_sea_update[i][12] != list_trips_sea_update[j][12]:
                    update_comment = update_comment + 'toDisplayLocation was updated.'
                # Get the updated trip (the row with status new)
                if list_trips_sea_update[i][17] == 'new' and list_trips_sea_update[j][17] != 'new':
                    list_trips_sea_update[i].append(update_comment)
                    update_details = update_details + [list_trips_sea_update[i]]
                else:
                    if list_trips_sea_update[j][17] == 'new' and list_trips_sea_update[i][17] != 'new':
                        list_trips_sea_update[j].append(update_comment)
                        update_details = update_details + [list_trips_sea_update[j]]
                    else:
                        print('excel files are not organized')

# Ground: transit time column needs to be checked
ground_trips = db_trips[~db_trips['mode'].isin(['sea', 'cfs'])]
ground_trips = ground_trips.reset_index(drop=True)
list_trips_ground_update = ground_trips[ground_trips.duplicated(subset=['fromCode', 'toCode', 'mode'], keep=False)].values.tolist()
if len(list_trips_ground_update) != 0:
    for i in tqdm(range(0, len(list_trips_ground_update) - 1)):
        for j in range(i + 1, len(list_trips_ground_update)):
            if list_trips_ground_update[i][2] == list_trips_ground_update[j][2] and list_trips_ground_update[i][9] == list_trips_ground_update[j][9] and list_trips_ground_update[i][14] == list_trips_ground_update[j][14]:
                update_comment = ''
                # Check display from / to
                if list_trips_ground_update[i][5] != list_trips_ground_update[j][5]:
                    update_comment = update_comment + 'fromDisplayLocation was updated.'
                if list_trips_ground_update[i][12] != list_trips_ground_update[j][12]:
                    update_comment = update_comment + 'toDisplayLocation was updated.'
                # Check transit time
                if list_trips_ground_update[i][15] != list_trips_ground_update[j][15]:
                    update_comment = update_comment + 'transit time was updated.'
                # Get the updated trip (the row with status new)
                if list_trips_ground_update[i][17] == 'new' and list_trips_ground_update[j][17] != 'new':
                    list_trips_ground_update[i].append(update_comment)
                    update_details = update_details + [list_trips_ground_update[i]]
                else:
                    if list_trips_ground_update[j][17] == 'new' and list_trips_ground_update[i][17] != 'new':
                        list_trips_ground_update[j].append(update_comment)
                        update_details = update_details + [list_trips_ground_update[j]]
                    else:
                        print('excel files are not organized')
And here is an example of what my trip file looks like:
Any help is appreciated :)
In case it can be useful to someone else: the issue was coming from the types. When keeping my tripFile in memory, one of my columns was "10.0" for example, whereas when imported this column was "10".
As I'm comparing with another imported tripFile: if both files are imported, the column has the same type in both, but if one file is kept in memory the column types differ between the two files, so every row is considered updated. That is why it takes so much longer when the data is kept in memory.
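A minimal sketch of one guard against this (my own addition, assuming both frames share column names): align the in-memory frame's dtypes with the imported one before concatenating, so both code paths compare like with like:

# hypothetical guard: cast shared columns to a common dtype before the comparison
for col in oldTripFile.columns.intersection(newTripFile.columns):
    newTripFile[col] = newTripFile[col].astype(oldTripFile[col].dtype)
db_trips = pd.concat([oldTripFile, newTripFile])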

Python XML Parsing - need to correct while loop

Fairly new to Python. I'm parsing an XML file, and the following code returns undesired results. I can understand why I'm getting these results - there are two escalations in the XML for this deal and I'm getting results for each set. I need help updating my code to only return the monthly rent for each escalation in the XML:
<RentEscalations>
  <RentEscalation ID="354781">
    <BeginIn>7</BeginIn>
    <Escalation>3.8</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
  <RentEscalation ID="354782">
    <BeginIn>61</BeginIn>
    <Escalation>1.0</Escalation>
    <RecurrenceInterval>12</RecurrenceInterval>
    <EscalationType>bump</EscalationType>
  </RentEscalation>
</RentEscalations>
The rent starts at $3.00/sqft for the first 6 months. This XML block shows that, starting in month 7 and recurring every 12 months (RecurrenceInterval), the rent becomes $6.80/sqft ($3.00 base + $3.80 escalation). The following twelve months will be $10.60 ($6.80 + $3.80). Each year, the amount per square foot increases by $3.80 until the 61st month of the term. At that point, the rent increases by $1.00/sqft each year for the remainder of the term. The entire term of the lease is 120 months.
My results include 114 rows based on the first escalation ($3.80/sqft), followed by 114 rows computed as if the rent started at $3.00/sqft and incremented by $1.00/sqft each year.
Any help is appreciated!
import xml.etree.ElementTree as ET
import pyodbc
import dateutil.relativedelta as rd
import datetime as dt

tree = ET.parse('C:\\FileLocation\\DealData.xml')
root = tree.getroot()
for deal in root.findall("Deals"):
    for dl in deal.findall("Deal"):
        dealid = dl.get("DealID")
        for dts in dl.findall("DealTerms/DealTerm"):
            dtid = dts.get("ID")
            darea = float(dts.find("RentableArea").text)
            dterm = int(dts.find("LeaseTerm").text)
            for brrent in dts.findall("BaseRents/BaseRent"):
                brid = brrent.get("ID")
                rent = float(brrent.find("Rent").text)
                darea = float(dts.find("RentableArea").text)
                per = brrent.find("Period").text
                dtstart = dts.find("CommencementDate").text
                startyr = int(dtstart[0:4])
                startmo = int(dtstart[5:7])
                startday = int(dtstart[8:])
                start = dt.date(startyr, startmo, startday)
                end = start + rd.relativedelta(months=dterm)
                if brrent.find("Duration").text is None:
                    duration = 0
                else:
                    duration = int(brrent.find("Duration").text)
                termbal = dterm - duration
                for resc in dts.findall("RentEscalations/RentEscalation"):
                    rescid = resc.get("ID")
                    esctype = resc.find("EscalationType").text
                    begmo = int(resc.find("BeginIn").text)
                    esc = float(resc.find("Escalation").text)
                    intrvl = int(resc.find("RecurrenceInterval").text)
                    if intrvl != 0:
                        pers = termbal / intrvl
                    else:
                        pers = 0
                    escst = start + rd.relativedelta(months=begmo - 1)
                    i = 0
                    x = begmo
                    newrate = rent
                    while i < termbal:
                        billdt = escst + rd.relativedelta(months=i)
                        if per == "rsf/year":
                            monthlyamt = (newrate + esc) * darea / 12.0
                        if per == "month":
                            monthlyamt = newrate + esc
                        if per == "year":
                            monthlyamt = (newrate + esc) / 12.0
                        if per == "rsf/month":
                            monthlyamt = (newrate + esc) * darea
                        try:
                            if i % intrvl == 0:
                                level = x + 1
                                newrent = monthlyamt
                                x += 1
                                newrate += esc
                            else:
                                level = x
                        except ZeroDivisionError:
                            break
                        i += 1
                        if dealid == "1254278":
                            print(dealid, dtid, rescid, dterm, darea, escst, rent, intrvl, esctype, termbal,
                                  monthlyamt, billdt, pers, level, newrate, newrent)
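No answer is recorded here, but one way to avoid the duplicated output is to build the month-by-month rate once per deal term, letting the escalation whose BeginIn has most recently passed drive each bump, instead of restarting from the base rent for every RentEscalation element. A rough sketch under that interpretation, reusing rent, dterm, darea, and dts from the loop above and assuming per == "rsf/year":

# hypothetical single-pass schedule: collect (begin_month, amount, interval) first
escalations = []
for resc in dts.findall("RentEscalations/RentEscalation"):
    escalations.append((int(resc.find("BeginIn").text),
                        float(resc.find("Escalation").text),
                        int(resc.find("RecurrenceInterval").text)))

rate = rent  # $/sqft/year, starting at the base rent
for month in range(1, dterm + 1):
    # the escalation in force is the latest one whose BeginIn has been reached
    active = [e for e in escalations if month >= e[0]]
    if active:
        begin, amount, interval = max(active, key=lambda e: e[0])
        if interval and (month - begin) % interval == 0:
            rate += amount  # bump once per recurrence interval
    monthlyamt = rate * darea / 12.0
    print(month, rate, monthlyamt)

With the sample XML this yields $3.00/sqft for months 1-6, $6.80 from month 7, $3.80 bumps each year through month 60, then $1.00 bumps yearly from month 61.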

python data frame filter conditions: any faster way

parts_list = imp_parts_df['Parts'].tolist()
sub_week_list = ['2016-12-11', '2016-12-04', '2016-11-27', '2016-11-20', '2016-11-13']
i = 0
start = DT.datetime.now()
for p in parts_list:
    for thisdate in sub_week_list:
        thisweek_start = pd.to_datetime(thisdate, format='%Y-%m-%d')  # '2016/12/11'
        thisweek_end = thisweek_start + DT.timedelta(days=7)  # add 7 days to the week date
        val_shipped = len(shipment_df[(shipment_df['loc'] == 'USW1') & (shipment_df['part'] == str(p)) & (shipment_df['shipped_date'] >= thisweek_start) & (shipment_df['shipped_date'] < thisweek_end)])
print((DT.datetime.now() - start).total_seconds())
shipment_df has around 35,000 records
parts_list has 436 parts
sub_week_list has 5 dates in it
Overall it took 438.13 seconds to run this code.
Is there any faster way to do it?
A later attempt using DataFrame.query():
parts_list = imp_parts_df['Parts'].astype(str).tolist()
i = 0
start = DT.datetime.now()
for p in parts_list:
    q = 'loc == "xxx" & part == @p & "2016-11-20" <= shipped_date < "2016-11-27"'
    val_shipped = len(shipment_df.query(q))
print((DT.datetime.now() - start).total_seconds())
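A hedged alternative that drops the Python-level double loop entirely: filter once, bucket each shipment into its week, and let a single groupby count everything. A sketch, assuming shipped_date is already a datetime column and that sub_week_list holds week-start Sundays:

# one pass instead of 436 * 5 boolean filters
usw1 = shipment_df[shipment_df['loc'] == 'USW1'].copy()
# 'W-SAT' weeks end on Saturday, i.e. start on Sunday, matching sub_week_list
usw1['week'] = usw1['shipped_date'].dt.to_period('W-SAT').dt.start_time
counts = usw1.groupby(['part', 'week']).size()
# same lookups as the nested loop, now dictionary-style
for p in parts_list:
    for thisdate in sub_week_list:
        val_shipped = counts.get((str(p), pd.Timestamp(thisdate)), 0)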

Calculate the future value for only one category using the IRR (Python)

import xlrd
import numpy

fileWorkspace = 'C://Users/jod/Desktop/'
wb1 = xlrd.open_workbook(fileWorkspace + 'assign2.xls')
sh1 = wb1.sheet_by_index(0)

time, amount, category = [], [], []
for a in range(2, sh1.nrows):
    time.append(int(sh1.cell(a,0).value))      # Pulling time from excel (column A)
    amount.append(float(sh1.cell(a,1).value))  # Pulling amount from excel (column B)
    category.append(str(sh1.cell(a,2).value))  # Pulling category from excel (column C)
#print(time)
#print(amount)
#print(category)
print('\n')

p_p2 = str(sh1.cell(0,1))
p_p1 = p_p2.replace("text:'","")
pp = p_p1.replace("'","")
print(pp)  # Printing the type of pay period (Row 1, col B)

c_p2 = str(sh1.cell(1,1))
c_p1 = c_p2.replace("text:'","")
cp = c_p1.replace("'","")
print(cp)  # Printing the type of compound period (Row 2, col B)

netflow = 0
outflow = 0
inflow = 0
flow = 0
cat = ["Sales", "Salvage", "Subsidy", "Redeemable", "Utility", "Labor",
       "Testing", "Marketing", "Materials", "Logistics"]

if pp == "Years" and cp == "Years":  # if pay period and compound period are both in years
    IRR = numpy.irr(amount) * 100    # Calculates the internal rate of return (IRR)
    print("IRR:", round(IRR, 2), '%', '\n')  # prints (IRR)
    for i in time:  # for every value in time array
        if cat[5] in category:  # if "Labor" from the cat array is in the category array or not
            # calculates the present values using all the amount values (col B) instead of
            # just using the ones that have the "Labor" category label beside them
            # Need to make every other value 0, such as beside "Redeemable" and "Salvage"
            flow = amount[i] / numpy.power((1 + (IRR/100)), time[i])
            if flow > 0:
                inflow = inflow + flow
            if flow < 0:
                outflow = outflow + flow
            print('Present Value (P) is:', round(flow, 0), '\n')
    netflow = outflow + inflow
    print("In year 0 or current year")
    print("-------")
    print('Outflow is: ', round(outflow, 0))
    print('Inflow is: ', round(inflow, 0))
    print('Netflow is: ', round(netflow, 0), '\n')
    outflow2 = (round(outflow, 0)) * (1 + (IRR/100)) ** 9
    inflow2 = (round(inflow, 0)) * (1 + (IRR/100)) ** 9
    netflow2 = outflow2 + inflow2
    print("In year 9")
    print("-------")
    print('Outflow is: ', round(outflow2, 0))
    print('Inflow is: ', round(inflow2, 0))
    print('Netflow is: ', round(netflow2, 0), '\n')
I have commented the important lines of code for clarification.
Here is the original question:
illustrate the breakdown of major project revenues and expenses by category as a percentage of that project’s future value in year 9. The illustration must also clearly indicate the total future value of the project in year 9 as well as the IRR.
There will be a total of 10 revenue and cost categories that a project may be composed of. The categories are: Sales, salvage, subsidy, redeemable, utility, labor, testing, marketing, materials and logistics. All revenues and expenses will fall in one of these ten categories. The project pay period and compound period will be identified at the top of the Excel sheet. Pay period and compound period may be designated as any of the following: years, quarters, months.
I am getting confused because I am not able to pull only the values beside "Labor", "Redeemable", or "Salvage". I just don't know where I am making a mistake, or what is incomplete. Below are the Excel file images:
Excel File Image 2
Excel File Image 3
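As a minimal sketch of the zeroing step the code comments above describe (my own addition, not from the assignment): build a per-category cashflow list where every amount outside the wanted category is replaced by 0, then discount that list instead of the full amount list:

# hypothetical helper: keep one category's amounts, zero out the rest
labor_amounts = [amt if cat == "Labor" else 0.0
                 for amt, cat in zip(amount, category)]
# present value of the Labor-only cashflows, discounted at the IRR (IRR is in percent here)
labor_pv = sum(amt / (1 + IRR / 100) ** t
               for amt, t in zip(labor_amounts, time))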
After revising, all cashflows are discounted at the IRR. What is done is the following:
i) determineAdjustments takes the pay period (column A) and adjusts it: a monthly amount is placed in the proper year ended when yearly cash flows are needed (yearly compounding, dividing the period by 12), and with monthly compounding it stays in the month ended (no adjustment necessary).
ii) The IRR is calculated, and the compounding period is used to adjust the monthly IRR for monthly pay periods.
iii) All expenses are discounted at the IRR and put into a list per category: cat_contributions['category_name'] = [discounted period 1, discounted period 2, ...].
iv) Then the net inflows and outflows are the sums of these.
I can't type up the data in the spreadsheets from the images as that would take a while, but maybe tinker with this and see if you can get it to work.
from __future__ import division
import xlrd
import numpy
import os
import math

def main(xls='xls_name.xlsx', sh=0):
    # save script in same folder as the xls file
    os.chdir(os.getcwd())
    wb = xlrd.open_workbook(xls)
    sh = wb.sheet_by_index(0)
    pay_period = sh.cell_value(0,1)
    compounding_period = sh.cell_value(1,1)
    compounding_factor, pay_factor = determineAdjustments(
        pay_period, compounding_period)
    number_of_periods = max(sh.col_values(0, start_rowx=2))
    # list with one slot per pay period
    flow_per_period = [0 * i for i in range(int(math.ceil(number_of_periods / pay_factor)) + 1)]
    for r in range(2, sh.nrows):
        pay_period = int(math.ceil(sh.cell_value(r,0) / pay_factor))
        flow_per_period[pay_period] += sh.cell_value(r,1)  # unadjusted cash flows
    irr = calculateIRR(flow_per_period, compounding_factor)
    cat_contributions = sortExpenditures(sh, irr, pay_factor)
    total_cat_contributions, netflow, total_outflow, total_inflow = calculateFlows(cat_contributions)
    printStats(cat_contributions, irr, compounding_factor, pay_factor,
               total_cat_contributions, netflow, total_outflow, total_inflow)
    return

def determineAdjustments(pay_period, compounding_period):
    if compounding_period == 'years':
        compounding_factor = 1
        if pay_period == 'months':
            pay_factor = 12
        if pay_period == 'years':
            pay_factor = 1
        # assume no days pay periods
    if compounding_period == 'months':
        compounding_factor = 12
        # assume no yearly payouts and that
        # all payments are in months
        pay_factor = 1
    return compounding_factor, pay_factor

def calculateIRR(cashflow, compounding_factor):
    # note: numpy.irr was removed in NumPy 1.20; on newer NumPy use numpy_financial.irr
    irr = numpy.irr(cashflow)
    irr_comp = (1 + irr) ** compounding_factor - 1
    # seems like the first example uses a rounded irr; can do something like:
    # irr_comp = round(irr_comp, 4)
    return irr_comp

def sortExpenditures(sh, irr, pay_factor):
    # percentages and discounting occur at the IRR calculated in the main function
    cat = ["Sales", "Salvage", "Subsidy", "Redeemable", "Utility", "Labor",
           "Testing", "Marketing", "Materials", "Logistics"]
    # python dictionary to sort contributions into categories
    cat_contributions = {}
    for c in cat:
        cat_contributions[c] = []
    # create a list of contributions of each line item to FV in a dictionary
    for r in range(2, sh.nrows):
        try:
            # discounted cash flow of each expenditure
            # using formula FV = expenditure/(1+i)^n
            cat_contributions[sh.cell_value(r,2)].append(
                sh.cell_value(r,1) / ((1 + irr) ** (sh.cell_value(r,0) / pay_factor))
            )
        except KeyError:
            print("No category for type: " + sh.cell_value(r,2) + '\n')
    return cat_contributions

def calculateFlows(cat_contributions):
    total_outflow = 0
    total_inflow = 0
    total_cat_contributions = {}
    for cat in cat_contributions:
        total_cat_contributions[cat] = sum(cat_contributions[cat])
        if total_cat_contributions[cat] < 0:
            total_outflow += total_cat_contributions[cat]
        else:
            total_inflow += total_cat_contributions[cat]
    netflow = total_inflow + total_outflow
    return total_cat_contributions, netflow, total_outflow, total_inflow

def printStats(cat_contributions, irr, compounding_factor, pay_period,
               total_cat_contributions, netflow, total_outflow, total_inflow):
    print("IRR: " + str(irr * 100) + ' %')
    if compounding_factor == 1: print("Compounding: Yearly")
    if compounding_factor == 12: print("Compounding: Monthly")
    if pay_period == 1: print("Cashflows: Year Ended")
    if pay_period == 12: print("Cashflows: Month Ended")
    print("Future Value (Net Adjusted Cashflow): " + str(netflow))
    print("Adjusted Inflows: " + str(total_inflow))
    print("Adjusted Outflows: " + str(total_outflow) + '\n')
    for cat in total_cat_contributions:
        if total_cat_contributions[cat] != 0:
            print('-----------------------------------------------------')
            print(cat + '\n')
            print("Total Contribution to FV " + str(total_cat_contributions[cat]))
            if total_cat_contributions[cat] < 0:
                print("Contribution to Expenses: " + str(abs(100 * total_cat_contributions[cat] / total_outflow)))
            else:
                print("Contribution to Revenues: " + str(abs(100 * total_cat_contributions[cat] / total_inflow)) + '\n')

main(xls='Book1.xlsx')
