Conditionally setting values with df.loc inside a loop

Conditionally setting values with df.loc inside a loop - python

I'm querying an MS Access db to retrieve a set of leases. My task is to calculate monthly totals for base rent for the next 60 months. The leases have dates related to start and end in order to calculate the correct periods in the event a lease terminates prior to 60 periods. My current challenge comes in when I attempt to increase the base rent by a certain amount whenever it's time to increment for that specific lease. I'm at a beginner level with Python/pandas so my approach is likely not optimum and the code rough looking. It's likely a vectorized approach is better suited however i'm not quite able to execute such code yet.
Data:
Lease input & output
Code:
try:
sql = 'SELECT * FROM [tbl_Leases]'
#sql = 'SELECT * FROM [Copy Of tbl_Leases]'
df = pd.read_sql(sql, conn)
#print df
#df.to_csv('lease_output.csv', index_label='IndexNo')
df_fcst_periods = pd.DataFrame()
# init increments
periods = 0
i = 0
# create empty lists to store looped info from original df
fcst_months = []
fcst_lease_num = []
fcst_base_rent = []
fcst_method = []
fcst_payment_int = []
fcst_rate_inc_amt = []
fcst_rate_inc_int = []
fcst_rent_start = []
# create array for period deltas, rent interval calc, pmt interval calc
fcst_period_delta = []
fcst_rate_int_bool = []
fcst_pmt_int_bool = []
for row in df.itertuples():
# get min of forecast period or lease ending date
min_period = min(fcst_periods, df.Lease_End_Date[i])
# count periods to loop for future periods in new df_fcst
periods = (min_period.year - currentMonth.year) * 12 + (min_period.month - currentMonth.month)
for period in range(periods):
nextMonth = (currentMonth + monthdelta(period))
period_delta = (nextMonth.year - df.Rent_Start_Date[i].year) * 12 + (nextMonth.month - df.Rent_Start_Date[i].month)
period_delta = float(period_delta)
# period delta values allow us to divide by the payment & rent intervals looking for integers
rate_int_calc = period_delta/df['Rate_Increase_Interval'][i]
pmt_int_calc = period_delta/df['Payment_Interval'][i]
# float.is_integer() method - returns bool
rate_int_bool = rate_int_calc.is_integer()
pmt_int_bool = pmt_int_calc.is_integer()
# conditional logic to handle base rent increases
if df['Forecast_Method'][i] == "Percentage" and rate_int_bool:
rate_increase = df['Base_Rent'][i] * (1 + df['Rate_Increase_Amt'][i]/100)
df.loc[df.index, "Base_Rent"] = rate_increase
fcst_base_rent.append(df['Base_Rent'][i])
print "Both True"
else:
fcst_base_rent.append(df['Base_Rent'][i])
print rate_int_bool
fcst_rate_int_bool.append(rate_int_bool)
fcst_pmt_int_bool.append(pmt_int_bool)
fcst_months.append(nextMonth)
fcst_period_delta.append(period_delta)
fcst_rent_start.append(df['Rent_Start_Date'][i])
fcst_lease_num.append(df['Lease_Number'][i])
#fcst_base_rent.append(df['Base_Rent'][i])
fcst_method.append(df['Forecast_Method'][i])
fcst_payment_int.append(df['Payment_Interval'][i])
fcst_rate_inc_amt.append(df['Rate_Increase_Amt'][i])
fcst_rate_inc_int.append(df['Rate_Increase_Interval'][i])
i += 1
df_fcst_periods['Month'] = fcst_months
df_fcst_periods['Rent_Start_Date'] = fcst_rent_start
df_fcst_periods['Lease_Number'] = fcst_lease_num
df_fcst_periods['Base_Rent'] = fcst_base_rent
df_fcst_periods['Forecast_Method'] = fcst_method
df_fcst_periods['Payment_Interval'] = fcst_payment_int
df_fcst_periods['Rate_Increase_Amt'] = fcst_rate_inc_amt
df_fcst_periods['Rate_Increase_Interval'] = fcst_rate_inc_int
df_fcst_periods['Period_Delta'] = fcst_period_delta
df_fcst_periods['Rate_Increase_Interval_bool'] = fcst_rate_int_bool
df_fcst_periods['Payment_Interval_bool'] = fcst_pmt_int_bool
except Exception, e:
print str(e)
conn.close()

I ended up initializing a variable before the periods loop which allowed me to perform a calculation when looping to obtain the correct base rents for subsequent periods.
# init base rent, rate increase amount, new rate for leases
base_rent = df['Base_Rent'][i]
rate_inc_amt = float(df['Rate_Increase_Amt'][i])
new_rate = 0
for period in range(periods):
nextMonth = (currentMonth + monthdelta(period))
period_delta = (nextMonth.year - df.Rent_Start_Date[i].year) * 12 + (nextMonth.month - df.Rent_Start_Date[i].month)
period_delta = float(period_delta)
# period delta values allow us to divide by the payment & rent intervals looking for integers
rate_int_calc = period_delta/df['Rate_Increase_Interval'][i]
pmt_int_calc = period_delta/df['Payment_Interval'][i]
# float.is_integer() method - returns bool
rate_int_bool = rate_int_calc.is_integer()
pmt_int_bool = pmt_int_calc.is_integer()
# conditional logic to handle base rent increases
if df['Forecast_Method'][i] == "Percentage" and rate_int_bool:
new_rate = base_rent * (1 + rate_inc_amt/100)
base_rent = new_rate
fcst_base_rent.append(new_rate)
elif df['Forecast_Method'][i] == "Manual" and rate_int_bool:
new_rate = base_rent + rate_inc_amt
base_rent = new_rate
fcst_base_rent.append(new_rate)
else:
fcst_base_rent.append(base_rent)
Still open for any alternative approaches though!

Related

How to display the average price of bitcoin in Pandas and not through a loop?

y = []
n = 0
days = 1
for i in btc['Adj Close']:
averagePrice = (i + n) / days
n += i
days += 1
y.append(averagePrice)
btc['TopAverage'] = y

If btc is a pandas data frame (as it appears so), then:
btc.loc[:, 'Days'] = list(range(1, btc.shape[0] + 1))
btc.loc[:, 'n'] = btc['Adj Close'].cumsum().shift(periods=1, fill_value=0.)
btc.loc[:, 'TopAverage'] = (btc['Adj Close'] + btc['n']) / btc['Days']
reflects the logic in your code. This will add the columns 'Days' and 'n' to the data frame as well.

how to put a $ in front of my value table numbers

I'm new to python and wondering how I can put a dollar sign in this spot? i think it might have something to do with line 31 in my code but I cannot figure it out
https://i.stack.imgur.com/Flv4W.png
here is the code:
#constants
CITY_CLOSE_RATE = 2
CITY_DIST_RATE = 1
BNDRY_DIST = 20
#inputs
propValue = float(input('What is the cost of the property right now?'))
numYears = int(input('Value after how many years?'))
propDist = float(input('How far is the property from your city?'))
# select the right rate depending on the distance to the city
if propDist <= BNDRY_DIST:
rate = CITY_CLOSE_RATE
else:
rate = CITY_DIST_RATE
#calculate percantage
rate = rate / 100
#print header of the table
print(f'{"Years":>5} {"value":>15}')
# calculating property for every year. body of the table
for count in range (1, numYears +1) :
increment = propValue * rate
endVal = propValue + increment
print (f'{count:>5} {endVal :>15.2f}')
propValue= endVal
#print final prop value after appreciation
print(f'Value of the property after {count} years: ${propValue : .2f}')

One way might be to use an earlier f-string to make endVal in to a string with a $ added before. As such the for loop becomes;
for count in range (1, numYears +1):
increment = propValue * rate
endVal = propValue + increment
strEndVal = '$ ' + f'{endVal:,.2f}'
print (f'{count:>5} {strEndVal :>15}')
propValue= endVal

Calculating monthly growth percentage from cumulative total growth

I am trying to calculate a constant for month-to-month growth rate from an annual growth rate (goal) in Python.
My question has arithmetic similarities to this question, but was not completely answered.
For example, if total annual sales for 2018 are $5,600,000.00 and I have an expected 30% increase for the next year, I would expect total annual sales for 2019 to be $7,280,000.00.
BV_2018 = 5600000.00
Annual_GR = 0.3
EV_2019 = (BV * 0.3) + BV
I am using the last month of 2018 to forecast the first month of 2019
Last_Month_2018 = 522000.00
Month_01_2019 = (Last_Month_2018 * CONSTANT) + Last_Month_2018
For the second month of 2019 I would use
Month_02_2019 = (Month_01_2019 * CONSTANT) + Month_01_2019
...and so on and so forth
The cumulative sum of Month_01_2019 through Month_12_2019 needs to be equal to EV_2019.
Does anyone know how to go about calculating the constant in Python? I am familiar with the np.cumsum function, so that part is not an issue. My problem is I cannot solve for the constant I need.
Thank you in advance and please do not hesitate to ask for further clarification.
More clarification:
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
We are trying to get from BV to EV (which is a cumulative sum) by calculating the cumulative sum of the [12] monthly totals. Each monthly total will have a % increase from the previous month that is constant across months. It is this % increase that I want to solve for.
Keep in mind, BV is the last month of the previous year. It is from BV that our forecast (i.e., Months 1 through 12) will be calculated. So, I'm thinking that it makes sense to go from BV to the EV plus the BV. Then, just remove BV and its value from the list, giving us EV as the cumulative total of Months 1 through 12.
I imagine using this constant in a function like this:
def supplier_forecast_calculator(sales_at_cost_prior_year, sales_at_cost_prior_month, year_pct_growth_expected):
"""
Calculates monthly supplier forecast
Example:
monthly_forecast = supplier_forecast_calculator(sales_at_cost_prior_year = 5600000,
sales_at_cost_prior_month = 522000,
year_pct_growth_expected = 0.30)
monthly_forecast.all_metrics
"""
# get monthly growth rate
monthly_growth_expected = CONSTANT
# get first month sales at cost
month1_sales_at_cost = (sales_at_cost_prior_month*monthly_growth_expected)+sales_at_cost_prior_month
# instantiate lists
month_list = ['Month 1'] # for months
sales_at_cost_list = [month1_sales_at_cost] # for sales at cost
# start loop
for i in list(range(2,13)):
# Append month to list
month_list.append(str('Month ') + str(i))
# get sales at cost and append to list
month1_sales_at_cost = (month1_sales_at_cost*monthly_growth_expected)+month1_sales_at_cost
# append month1_sales_at_cost to sales at cost list
sales_at_cost_list.append(month1_sales_at_cost)
# add total to the end of month_list
month_list.insert(len(month_list), 'Total')
# add the total to the end of sales_at_cost_list
sales_at_cost_list.insert(len(sales_at_cost_list), np.sum(sales_at_cost_list))
# put the metrics into a df
all_metrics = pd.DataFrame({'Month': month_list,
'Sales at Cost': sales_at_cost_list}).round(2)
# return the df
return all_metrics

Let r = 1 + monthly_rate. Then, the problem we are trying to solve is
r + ... + r**12 = EV/BV. We can use numpy to get the numeric solution. This should be relatively fast in practice. We are solving a polynomial r + ... + r**12 - EV/BV = 0 and recovering monthly rate from r. There will twelve complex roots, but only one real positive one - which is what we want.
import numpy as np
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
def get_monthly(BV, EV):
coefs = np.ones(13)
coefs[-1] -= EV / BV + 1
# there will be a unique positive real root
roots = np.roots(coefs)
return roots[(roots.imag == 0) & (roots.real > 0)][0].real - 1
rate = get_monthly(BV, EV)
print(rate)
# 0.022913299846925694
Some comments:
roots.imag == 0 may be problematic in some cases since roots uses a numeric algorithm. As an alternative, we can pick a root with the least imaginary part (in absolute value) among all roots with a positive real part.
We can use the same method to get rates for other time intervals. For example, for weekly rates, we can replace 13 == 12 + 1 with 52 + 1.
The above polynomial has a solution by radicals, as outlined here.
Update on performance. We could also frame this as a fixed point problem, i.e. to look for a fixed point of a function
x = EV/BV * x ** 13 - EV/BV + 1
The fix point x will be equal to (1 + rate)**13.
The following pure-Python implementation is roughly four times faster than the above numpy version on my machine.
def get_monthly_fix(BV, EV, periods=12):
ratio = EV / BV
r = guess = ratio
while True:
r = ratio * r ** (1 / periods) - ratio + 1
if abs(r - guess) < TOLERANCE:
return r ** (1 / periods) - 1
guess = r
We can make this run even faster with a help of numba.jit.

I am not sure if this works (tell me if it doesn't) but try this.
def get_value(start, end, times, trials=100, _amount=None, _last=-1, _increase=None):
#don't call with _amount, _last, or _increase! Only start, end and times
if _amount is None:
_amount = start / times
if _increase is None:
_increase = start / times
attempt = 1
for n in range(times):
attempt = (attempt * _amount) + attempt
if attempt > end:
if _last != 0:
_increase /= 2
_last = 0
_amount -= _increase
elif attempt < end:
if _last != 1:
_increase /= 2
_last = 1
_amount += _increase
else:
return _amount
if trials <= 0:
return _amount
return get_value(start, end, times, trials=trials-1,
_amount=_amount, _last=_last, _increase=_increase)
Tell me if it works.
Used like this:
get_value(522000.00, 7280000.00, 12)

Comparing values in Python data frame efficiently

I'm trading daily on Cryptocurrencies and would like to find which are the most desirable Cryptos for trading.
I have CSV file for every Crypto with the following fields:
Date Sell Buy
43051.23918 1925.16 1929.83
43051.23919 1925.12 1929.79
43051.23922 1925.12 1929.79
43051.23924 1926.16 1930.83
43051.23925 1926.12 1930.79
43051.23926 1926.12 1930.79
43051.23927 1950.96 1987.56
43051.23928 1190.90 1911.56
43051.23929 1926.12 1930.79
I would like to check:
How many quotes will end with profit:
for Buy positions - if one of the following Sells > current Buy.
for Sell positions - if one of the following Buys < current Sell.
How much time it would take to a theoretical position to become profitable.
What can be the profit potential.
I'm using the following code:
#converting from OLE to datetime
OLE_TIME_ZERO = dt.datetime(1899, 12, 30, 0, 0, 0)
def ole(oledt):
return OLE_TIME_ZERO + dt.timedelta(days=float(oledt))
#variables initialization
buy_time = ole(43031.57567) - ole(43031.57567)
sell_time = ole(43031.57567) - ole(43031.57567)
profit_buy_counter = 0
no_profit_buy_counter = 0
profit_sell_counter = 0
no_profit_sell_counter = 0
max_profit_buy_positions = 0
max_profit_buy_counter = 0
max_profit_sell_positions = 0
max_profit_sell_counter = 0
df = pd.read_csv("C:/P/Crypto/bitcoin_test_normal_276k.csv")
#comparing to max
for index, row in df.iterrows():
a = index + 1
df_slice = df[a:]
if df_slice["Sell"].max() - row["Buy"] > 0:
max_profit_buy_positions += df_slice["Sell"].max() - row["Buy"]
max_profit_buy_counter += 1
for index1, row1 in df_slice.iterrows():
if row["Buy"] < row1["Sell"] :
buy_time += ole(row1["Date"])- ole(row["Date"])
profit_buy_counter += 1
break
else:
no_profit_buy_counter += 1
#comparing to sell
for index, row in df.iterrows():
a = index + 1
df_slice = df[a:]
if row["Sell"] - df_slice["Buy"].min() > 0:
max_profit_sell_positions += row["Sell"] - df_slice["Buy"].min()
max_profit_sell_counter += 1
for index2, row2 in df_slice.iterrows():
if row["Sell"] > row2["Buy"] :
sell_time += ole(row2["Date"])- ole(row["Date"])
profit_sell_counter += 1
break
else:
no_profit_sell_counter += 1
num_rows = len(df.index)
buy_avg_time = buy_time/num_rows
sell_avg_time = sell_time/num_rows
if max_profit_buy_counter == 0:
avg_max_profit_buy = "There is no profitable buy positions"
else:
avg_max_profit_buy = max_profit_buy_positions/max_profit_buy_counter
if max_profit_sell_counter == 0:
avg_max_profit_sell = "There is no profitable sell positions"
else:
avg_max_profit_sell = max_profit_sell_positions/max_profit_sell_counter
The code works fine for 10K-20K lines but for a larger amount (276K) it take a long time (more than 10 hrs)
What can I do in order to improve it?
Is there any "Pythonic" way to compare each value in a data frame to all following values?
note - the dates in the CSV are in OLE so I need to convert it to Datetime.
File for testing:
Thanks for your comment.
Here you can find the file that I used:

First, I'd want to create the cumulative maximum/minimum values for Sell and Buy per row, so it's easy to compare to. pandas has cummax and cummin, but they go the wrong way. So we'll do:
df['Max Sell'] = df[::-1]['Sell'].cummax()[::-1]
df['Min Buy'] = df[::-1]['Buy'].cummin()[::-1]
Now, we can just compare each row:
df['Buy Profit'] = df['Max Sell'] - df['Buy']
df['Sell Profit'] = df['Sell'] - df['Min Buy']
I'm positive this isn't exactly what you want as I don't perfectly understand what you're trying to do, but hopefully it leads you in the right direction.
After comparing your function and mine, there is a slight difference, as your a is offset one off the index. Removing that offset, you'll see that my method produces the same results as yours, only in vastly shorter time:
for index, row in df.iterrows():
a = index
df_slice = df[a:]
assert (df_slice["Sell"].max() - row["Buy"]) == df['Max Sell'][a] - df['Buy'][a]
else:
print("All assertions passed!")
Note this will still take the very long time required by your function. Note that this can be fixed with shift, but I don't want to run your function for long enough to figure out what way to shift it.

Python Script slowing down as it progresses?

I have a simulation running that has this basic structure:
from time import time
def CSV(*args):
#write * args to .CSV file
return
def timeleft(a,L,period):
print(#details on how long last period took, ETA#)
for L in range(0,6,4):
for a in range(1,100):
timeA = time()
for t in range(1,1000):
## Manufacturer in Supply Chain ##
inventory_accounting_lists.append(#simple calculations#)
# Simulation to determine the optimal B-value (Basestock level)
for B in range(1,100):
for tau in range(1,1000):
## simple inventory accounting operations##
## Distributor in Supply Chain ##
inventory_accounting_lists.append(#simple calculations#)
# Simulation to determine the optimal B-value (Basestock level)
for B in range(1,100):
for tau in range(1,1000):
## simple inventory accounting operations##
## Wholesaler in Supply Chain ##
inventory_accounting_lists.append(#simple calculations#)
# Simulation to determine the optimal B-value (Basestock level)
for B in range(1,100):
for tau in range(1,1000):
## simple inventory accounting operations##
## Retailer in Supply Chain ##
inventory_accounting_lists.append(#simple calculations#)
# Simulation to determine the optimal B-value (Basestock level)
for B in range(1,100):
for tau in range(1,1000):
## simple inventory accounting operations##
CSV(Simulation_Results)
timeB = time()
timeleft(a,L,timeB-timeA)
As the script continues, it seems to be getting slower and slower. Here is what it is for these values (and it increases linearly as a increases).
L = 0, a = 1: 1.15 minutes
L = 0, a = 99: 1.7 minutes
L = 2, a = 1: 2.7 minutes
L = 2, a = 99: 5.15 minutes
L = 4, a = 1: 4.5 minutes
L = 4, a = 15: 4.95 minutes (this is the latest value it has reached)
Why would each iteration take longer? Each iteration of the loop essentially resets everything except for a master global list, which is being added to each time. However, loops inside each "period" aren't accessing this master list -- they are accessing the same local list every time.
EDIT 1: I will post the simulation code here, in case anyone wants to wade through it, but I warn you, it is rather long, and the variable names are probably unnecessarily confusing.
#########
a = 0.01
L = 0
total = 1000
sim = 500
inv_cost = 1
bl_cost = 4
#########
# Functions
import random
from time import time
time0 = time()
# function to report ETA etc.
def timeleft(a,L,period_time):
if L==0:
periods_left = ((1-a)*100)-1+2*99
if L==2:
periods_left = ((1-a)*100)-1+99
if L==4:
periods_left = ((1-a)*100)-1+0*99
minute_time = period_time/60
minutes_left = (periods_left*period_time)/60
hours_left = (periods_left*period_time)/3600
percentage_complete = 100*((297-periods_left)/297)
print("Time for last period = ","%.2f" % minute_time," minutes")
print("%.2f" % percentage_complete,"% complete")
if hours_left<1:
print("%.2f" % minutes_left," minutes left")
else:
print("%.2f" % hours_left," hours left")
print("")
return
def dcopy(inList):
if isinstance(inList, list):
return list( map(dcopy, inList) )
return inList
# Save values to .CSV file
def CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
O_STD_1,O_STD_2,O_STD_3,O_STD_4):
pass
# Initialization
# These are the global, master lists of data
I_STD_1 = [[0],[0],[0]]
I_STD_2 = [[0],[0],[0]]
I_STD_3 = [[0],[0],[0]]
I_STD_4 = [[0],[0],[0]]
O_STD_0 = [[0],[0],[0]]
O_STD_1 = [[0],[0],[0]]
O_STD_2 = [[0],[0],[0]]
O_STD_3 = [[0],[0],[0]]
O_STD_4 = [[0],[0],[0]]
for L in range(0,6,2):
# These are local lists that are appended to at the end of every period
I_STD_1_L = []
I_STD_2_L = []
I_STD_3_L = []
I_STD_4_L = []
O_STD_0_L = []
O_STD_1_L = []
O_STD_2_L = []
O_STD_3_L = []
O_STD_4_L = []
test = []
for n in range(1,100): # THIS is the start of the 99 value loop
a = n/100
print ("L=",L,", alpha=",a)
# Initialization for each Period
F_1 = [0,10] # Forecast
F_2 = [0,10]
F_3 = [0,10]
F_4 = [0,10]
R_0 = [10] # Items Received
R_1 = [10]
R_2 = [10]
R_3 = [10]
R_4 = [10]
for i in range(L):
R_1.append(10)
R_2.append(10)
R_3.append(10)
R_4.append(10)
I_1 = [10] # Final Inventory
I_2 = [10]
I_3 = [10]
I_4 = [10]
IP_1 = [10+10*L] # Inventory Position
IP_2 = [10+10*L]
IP_3 = [10+10*L]
IP_4 = [10+10*L]
O_1 = [10] # Items Ordered
O_2 = [10]
O_3 = [10]
O_4 = [10]
BL_1 = [0] # Backlog
BL_2 = [0]
BL_3 = [0]
BL_4 = [0]
OH_1 = [20] # Items on Hand
OH_2 = [20]
OH_3 = [20]
OH_4 = [20]
OR_1 = [10] # Order received from customer
OR_2 = [10]
OR_3 = [10]
OR_4 = [10]
Db_1 = [10] # Running Average Demand
Db_2 = [10]
Db_3 = [10]
Db_4 = [10]
var_1 = [0] # Running Variance in Demand
var_2 = [0]
var_3 = [0]
var_4 = [0]
B_1 = [IP_1[0]+10] # Optimal Basestock
B_2 = [IP_2[0]+10]
B_3 = [IP_3[0]+10]
B_4 = [IP_4[0]+10]
D = [0,10] # End constomer demand
for i in range(total+1):
D.append(9)
D.append(12)
D.append(8)
D.append(11)
period = [0]
from time import time
timeA = time()
# 1000 time periods t
for t in range(1,total+1):
period.append(t)
#### MANUFACTURER ####
# Manufacturing order from previous time period put into production
R_4.append(O_4[t-1])
#recieve shipment from supplier, calculate items OH HAND
if I_4[t-1]<0:
OH_4.append(R_4[t])
else:
OH_4.append(I_4[t-1]+R_4[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_3[t-1] + BL_4[t-1]) <= OH_4[t]: # No Backlog
I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1]))
BL_4.append(0)
R_3.append(O_3[t-1]+BL_4[t-1])
else:
I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1])) # Backlogged
BL_4.append(-I_4[t])
R_3.append(OH_4[t])
# Update Inventory Position
IP_4.append(IP_4[t-1] + O_4[t-1] - O_3[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_4[t] + a*O_3[t-1]
F_4.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_4.append((1/t)*sum(O_3[0:t]))
s = 0
for i in range(0,t):
s+=(O_3[i]-Db_4[t])**2
if t==1:
var_4.append(0) # var(1) = 0
else:
var_4.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_4 = [10000000000]*10
Run_4 = [0]*10
for B in range(10,500):
S_OH_4 = OH_4[:]
S_I_4 = I_4[:]
S_R_4 = R_4[:]
S_BL_4 = BL_4[:]
S_IP_4 = IP_4[:]
S_O_4 = O_4[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_4[t] > 0:
S_O_4.append(B - S_IP_4[t])
else:
S_O_4.append(0)
c = 0
for i in range(t+1,t+sim+1):
S_R_4.append(S_O_4[i-1])
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_4[t+1],(var_4[t])**(.5))
# Receive simulated shipment, calculate simulated items on hand
if S_I_4[i-1]<0:
S_OH_4.append(S_R_4[i])
else:
S_OH_4.append(S_I_4[i-1]+S_R_4[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_4[i-1])
S_I_4.append(S_OH_4[i] - owed)
if owed <= S_OH_4[i]: # No Backlog
S_BL_4.append(0)
c += inv_cost*S_I_4[i]
else:
S_BL_4.append(-S_I_4[i]) # Backlogged
c += bl_cost*S_BL_4[i]
# Update Inventory Position
S_IP_4.append(S_IP_4[i-1] + S_O_4[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_4[i]) > 0:
S_O_4.append(B - S_IP_4[i])
else:
S_O_4.append(0)
# Log Simulation costs for that B-value
S_BC_4.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_4[B-i]-S_BC_4[B-i-1])
Run_4.append(sum(dummy)/float(len(dummy)))
if Run_4[B-3] > 0 and B>20:
break
else:
Run_4.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_4))
optimal_B = var[1]
B_4.append(optimal_B)
# Calculate O(t)
if B_4[t] - IP_4[t] > 0:
O_4.append(B_4[t] - IP_4[t])
else:
O_4.append(0)
#### DISTRIBUTOR ####
#recieve shipment from supplier, calculate items OH HAND
if I_3[t-1]<0:
OH_3.append(R_3[t])
else:
OH_3.append(I_3[t-1]+R_3[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_2[t-1] + BL_3[t-1]) <= OH_3[t]: # No Backlog
I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1]))
BL_3.append(0)
R_2.append(O_2[t-1]+BL_3[t-1])
else:
I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1])) # Backlogged
BL_3.append(-I_3[t])
R_2.append(OH_3[t])
# Update Inventory Position
IP_3.append(IP_3[t-1] + O_3[t-1] - O_2[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_3[t] + a*O_2[t-1]
F_3.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_3.append((1/t)*sum(O_2[0:t]))
s = 0
for i in range(0,t):
s+=(O_2[i]-Db_3[t])**2
if t==1:
var_3.append(0) # var(1) = 0
else:
var_3.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_3 = [10000000000]*10
Run_3 = [0]*10
for B in range(10,500):
S_OH_3 = OH_3[:]
S_I_3 = I_3[:]
S_R_3 = R_3[:]
S_BL_3 = BL_3[:]
S_IP_3 = IP_3[:]
S_O_3 = O_3[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_3[t] > 0:
S_O_3.append(B - S_IP_3[t])
else:
S_O_3.append(0)
c = 0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_3[t+1],(var_3[t])**(.5))
S_R_3.append(S_O_3[i-1])
# Receive simulated shipment, calculate simulated items on hand
if S_I_3[i-1]<0:
S_OH_3.append(S_R_3[i])
else:
S_OH_3.append(S_I_3[i-1]+S_R_3[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_3[i-1])
S_I_3.append(S_OH_3[i] - owed)
if owed <= S_OH_3[i]: # No Backlog
S_BL_3.append(0)
c += inv_cost*S_I_3[i]
else:
S_BL_3.append(-S_I_3[i]) # Backlogged
c += bl_cost*S_BL_3[i]
# Update Inventory Position
S_IP_3.append(S_IP_3[i-1] + S_O_3[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_3[i]) > 0:
S_O_3.append(B - S_IP_3[i])
else:
S_O_3.append(0)
# Log Simulation costs for that B-value
S_BC_3.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_3[B-i]-S_BC_3[B-i-1])
Run_3.append(sum(dummy)/float(len(dummy)))
if Run_3[B-3] > 0 and B>20:
break
else:
Run_3.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_3))
optimal_B = var[1]
B_3.append(optimal_B)
# Calculate O(t)
if B_3[t] - IP_3[t] > 0:
O_3.append(B_3[t] - IP_3[t])
else:
O_3.append(0)
#### WHOLESALER ####
#recieve shipment from supplier, calculate items OH HAND
if I_2[t-1]<0:
OH_2.append(R_2[t])
else:
OH_2.append(I_2[t-1]+R_2[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_1[t-1] + BL_2[t-1]) <= OH_2[t]: # No Backlog
I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1]))
BL_2.append(0)
R_1.append(O_1[t-1]+BL_2[t-1])
else:
I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1])) # Backlogged
BL_2.append(-I_2[t])
R_1.append(OH_2[t])
# Update Inventory Position
IP_2.append(IP_2[t-1] + O_2[t-1] - O_1[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_2[t] + a*O_1[t-1]
F_2.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_2.append((1/t)*sum(O_1[0:t]))
s = 0
for i in range(0,t):
s+=(O_1[i]-Db_2[t])**2
if t==1:
var_2.append(0) # var(1) = 0
else:
var_2.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_2 = [10000000000]*10
Run_2 = [0]*10
for B in range(10,500):
S_OH_2 = OH_2[:]
S_I_2 = I_2[:]
S_R_2 = R_2[:]
S_BL_2 = BL_2[:]
S_IP_2 = IP_2[:]
S_O_2 = O_2[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_2[t] > 0:
S_O_2.append(B - S_IP_2[t])
else:
S_O_2.append(0)
c = 0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_2[t+1],(var_2[t])**(.5))
# Receive simulated shipment, calculate simulated items on hand
S_R_2.append(S_O_2[i-1])
if S_I_2[i-1]<0:
S_OH_2.append(S_R_2[i])
else:
S_OH_2.append(S_I_2[i-1]+S_R_2[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_2[i-1])
S_I_2.append(S_OH_2[i] - owed)
if owed <= S_OH_2[i]: # No Backlog
S_BL_2.append(0)
c += inv_cost*S_I_2[i]
else:
S_BL_2.append(-S_I_2[i]) # Backlogged
c += bl_cost*S_BL_2[i]
# Update Inventory Position
S_IP_2.append(S_IP_2[i-1] + S_O_2[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_2[i]) > 0:
S_O_2.append(B - S_IP_2[i])
else:
S_O_2.append(0)
# Log Simulation costs for that B-value
S_BC_2.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_2[B-i]-S_BC_2[B-i-1])
Run_2.append(sum(dummy)/float(len(dummy)))
if Run_2[B-3] > 0 and B>20:
break
else:
Run_2.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_2))
optimal_B = var[1]
B_2.append(optimal_B)
# Calculate O(t)
if B_2[t] - IP_2[t] > 0:
O_2.append(B_2[t] - IP_2[t])
else:
O_2.append(0)
#### RETAILER ####
#recieve shipment from supplier, calculate items OH HAND
if I_1[t-1]<0:
OH_1.append(R_1[t])
else:
OH_1.append(I_1[t-1]+R_1[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (D[t] +BL_1[t-1]) <= OH_1[t]: # No Backlog
I_1.append(OH_1[t] - (D[t] + BL_1[t-1]))
BL_1.append(0)
R_0.append(D[t]+BL_1[t-1])
else:
I_1.append(OH_1[t] - (D[t] + BL_1[t-1])) # Backlogged
BL_1.append(-I_1[t])
R_0.append(OH_1[t])
# Update Inventory Position
IP_1.append(IP_1[t-1] + O_1[t-1] - D[t])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_1[t] + a*D[t]
F_1.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_1.append((1/t)*sum(D[1:t+1]))
s = 0
for i in range(1,t+1):
s+=(D[i]-Db_1[t])**2
if t==1: # Var(1) = 0
var_1.append(0)
else:
var_1.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_1 = [10000000000]*10
Run_1 = [0]*10
for B in range(10,500):
S_OH_1 = OH_1[:]
S_I_1 = I_1[:]
S_R_1 = R_1[:]
S_BL_1 = BL_1[:]
S_IP_1 = IP_1[:]
S_O_1 = O_1[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_1[t] > 0:
S_O_1.append(B - S_IP_1[t])
else:
S_O_1.append(0)
c=0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_1[t+1],(var_1[t])**(.5))
S_R_1.append(S_O_1[i-1])
# Receive simulated shipment, calculate simulated items on hand
if S_I_1[i-1]<0:
S_OH_1.append(S_R_1[i])
else:
S_OH_1.append(S_I_1[i-1]+S_R_1[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_1[i-1])
S_I_1.append(S_OH_1[i] - owed)
if owed <= S_OH_1[i]: # No Backlog
S_BL_1.append(0)
c += inv_cost*S_I_1[i]
else:
S_BL_1.append(-S_I_1[i]) # Backlogged
c += bl_cost*S_BL_1[i]
# Update Inventory Position
S_IP_1.append(S_IP_1[i-1] + S_O_1[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_1[i]) > 0:
S_O_1.append(B - S_IP_1[i])
else:
S_O_1.append(0)
# Log Simulation costs for that B-value
S_BC_1.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_1[B-i]-S_BC_1[B-i-1])
Run_1.append(sum(dummy)/float(len(dummy)))
if Run_1[B-3] > 0 and B>20:
break
else:
Run_1.append(0)
# Use minimum as your new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_1))
optimal_B = var[1]
B_1.append(optimal_B)
# Calculate O(t)
if B_1[t] - IP_1[t] > 0:
O_1.append(B_1[t] - IP_1[t])
else:
O_1.append(0)
### Calculate the Standard Devation of the last half of time periods ###
def STD(numbers):
k = len(numbers)
mean = sum(numbers) / k
SD = (sum([dev*dev for dev in [x-mean for x in numbers]])/(k-1))**.5
return SD
start = (total//2)+1
# Only use the last half of the time periods to calculate the standard deviation
I_STD_1_L.append(STD(I_1[start:]))
I_STD_2_L.append(STD(I_2[start:]))
I_STD_3_L.append(STD(I_3[start:]))
I_STD_4_L.append(STD(I_4[start:]))
O_STD_0_L.append(STD(D[start:]))
O_STD_1_L.append(STD(O_1[start:]))
O_STD_2_L.append(STD(O_2[start:]))
O_STD_3_L.append(STD(O_3[start:]))
O_STD_4_L.append(STD(O_4[start:]))
from time import time
timeB = time()
timeleft(a,L,timeB-timeA)
I_STD_1[L//2] = I_STD_1_L[:]
I_STD_2[L//2] = I_STD_2_L[:]
I_STD_3[L//2] = I_STD_3_L[:]
I_STD_4[L//2] = I_STD_4_L[:]
O_STD_0[L//2] = O_STD_0_L[:]
O_STD_1[L//2] = O_STD_1_L[:]
O_STD_2[L//2] = O_STD_2_L[:]
O_STD_3[L//2] = O_STD_3_L[:]
O_STD_4[L//2] = O_STD_4_L[:]
CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
O_STD_1,O_STD_2,O_STD_3,O_STD_4)
from time import time
timeE = time()
print("Run Time: ",(timeE-time0)/3600," hours")

This would be a good time to look at a profiler. You can profile the code to determine where time is being spent. It would appear likely that you issue is in the simulation code, but without being able to see that code the best help you're likely to get going to be vague.
Edit in light of added code:
You're doing a fair amount of copying of lists, which while not terribly expensive can consume a lot of time.
I agree the your code is probably unnecessarily confusing and would advise you to clean up the code. Changing the confusing names to meaningful ones may help you find where you're having a problem.
Finally, it may be the case that your simulation is simply computationally expensive. You might want to consider looking into a SciPy, Pandas, or some other Python mathematic package to get better performance and perhaps better tools for expressing the model you're simulating.

I experienced a similar problem with a Python 3.x script I wrote. The script randomly generated 1,000,000 (one million) JSON objects, writing them out to a file.
My problem was that the program was growing progressively slower as time proceeded. Here is a timestamp trace every 10,000 objects:
So far: Mar23-17:56:46: 0
So far: Mar23-17:56:48: 10000 ( 2 seconds)
So far: Mar23-17:56:50: 20000 ( 2 seconds)
So far: Mar23-17:56:55: 30000 ( 5 seconds)
So far: Mar23-17:57:01: 40000 ( 6 seconds)
So far: Mar23-17:57:09: 50000 ( 8 seconds)
So far: Mar23-17:57:18: 60000 ( 8 seconds)
So far: Mar23-17:57:29: 70000 (11 seconds)
So far: Mar23-17:57:42: 80000 (13 seconds)
So far: Mar23-17:57:56: 90000 (14 seconds)
So far: Mar23-17:58:13: 100000 (17 seconds)
So far: Mar23-17:58:30: 110000 (17 seconds)
So far: Mar23-17:58:51: 120000 (21 seconds)
So far: Mar23-17:59:12: 130000 (21 seconds)
So far: Mar23-17:59:35: 140000 (23 seconds)
As can be seen, the script takes progressively longer to generate groups of 10,000 records.
In my case it turned out to be the way I was generating unique ID numbers, each in the range of 10250000000000-10350000000000. To avoid regenerating the same ID twice, I stored a newly generated ID in a list, checking later that the ID does not exist in the list:
trekIdList = []
def GenerateRandomTrek ():
global trekIdList
while True:
r = random.randint (10250000000000, 10350000000000)
if not r in trekIdList:
trekIdList.append (r)
return r
The problem is that an unsorted list takes O(n) to search. As newly generated IDs are appended to the list, the time needed to traverse/search the list grows.
The solution was to switch to a dictionary (or map):
trekIdList = {}
. . .
def GenerateRandomTrek ():
global trekIdList
while True:
r = random.randint (10250000000000, 10350000000000)
if not r in trekIdList:
trekIdList [r] = 1
return r
The improvement was immediate:
So far: Mar23-18:11:30: 0
So far: Mar23-18:11:30: 10000
So far: Mar23-18:11:31: 20000
So far: Mar23-18:11:31: 30000
So far: Mar23-18:11:31: 40000
So far: Mar23-18:11:32: 50000
So far: Mar23-18:11:32: 60000
So far: Mar23-18:11:32: 70000
So far: Mar23-18:11:33: 80000
So far: Mar23-18:11:33: 90000
So far: Mar23-18:11:33: 100000
So far: Mar23-18:11:34: 110000
So far: Mar23-18:11:34: 120000
So far: Mar23-18:11:34: 130000
So far: Mar23-18:11:35: 140000
The reason is that accessing a value in a dictionary/map/hash is O(1).
Moral: When dealing with large numbers of items, use a dictionary/map or binary searching a sorted list rathen than an unordered list.

You can use cProfile and the like but many times it will still be hard to spot the issue. However knowing that slowness is in linear progression is at huge benefit for you since you already kind of know what the problem is, but not exactly where it is.
I'd start by elimination and simplifying:
Make a small fast example that demonstrates the sluggishness as a separate file.
Run the above and keep removing/commenting out huge portions of the code.
Once you have narrowed down enough, look for Python keywords values(), items(), in, for , deepcopy as good examples.
By continuously simplifying the example and re-running the test script you will eventually get down to the core issue.
Once you resolved one bottleneck, you might find that you still exhibit the sluggishness when you bring back the old code. Most probably there's more than 1 bottlenecks then.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Conditionally setting values with df.loc inside a loop - python

Related

How to display the average price of bitcoin in Pandas and not through a loop?

how to put a $ in front of my value table numbers

Calculating monthly growth percentage from cumulative total growth

Comparing values in Python data frame efficiently

Python Script slowing down as it progresses?

Categories

Resources