PuLP (Python): how to formulate a constraint with a binary variable

I am working on a logistics supply-demand problem involving the loading of vessels over a 7-day horizon.
I am trying to define a binary variable that indicates whether a vessel can load on a given date (1 = can load, 0 = can't load). This criterion is determined by inventory availability: if material is available, the vessel can load.
Current output
Currently the binary variable is always 1, which is incorrect because once the vessel is loaded the shipment is complete. As a result, the shipment variable (which signifies the date on which the vessel's demand is satisfied) is 1.0 for every day in the planning horizon.
Desired Output
I need the binary variable to signify when a vessel can load (1/0). The vessel can only load after it has arrived, and if sufficient material is available in the inventory.
Code and definitions below:
Owing to the amount of code in the model (including variable formation) it is very difficult to provide an MRE, so I am hoping that someone can spot the error in the definition of the binary variable/constraint expression.
vessel_grade_demand_tonnes[vessel, grade]: Constant. The required amount, in tonnes, of each grade required by each vessel.
vessel_sales_demand_vars[(vessel, grade, date)]: Variable. The date a vessel's demand requirements are fully satisfied, i.e. the ship is loaded with all the grades it requires.
vessel_load_start_date[vessel, date]: Binary. The date on which a vessel can be loaded. NOTE: a vessel can only load if the total amount it requires is available in the port inventory, port_inventory_vars[date, grade].
Code:
# Vessel can only load when sufficient material available.
for date, vessel, grade in vessel_sales_temp:
    model += vessel_load_start_date[vessel, date] * vessel_sales_demand_tonnes[date, vessel, grade] <= port_inventory_vars[date, grade]
# All vessel requirements must be satisfied on one day, defined by loading date
for grade in grades:
    for vessel, date in vessel_load_start_date:
        model += vessel_load_start_date[vessel, date] * vessel_grade_demand_tonnes[vessel, grade] == pulp.lpSum(vessel_grade_demand_tonnes[vessel, grade])
        model += vessel_sales_demand_vars[(vessel, grade, date)] <= vessel_load_start_date[vessel, date] * vessel_grade_demand_tonnes[vessel, grade]
# Vessel sales requirements vars must equal the total required sales tonnes
for vessel, grade, date in vessel_sales_demand_vars:
    model += pulp.lpSum(vessel_sales_demand_vars[vessel, grade, date]) == vessel_grade_demand_tonnes[vessel, grade]
All help gratefully received.
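One observation on the constraints as posted: an equality of the form x * D == D, written for every date, forces x = 1 on every date whenever D > 0, which matches the output described above. A possible formulation of the stated requirements is sketched below. It rests on assumptions rather than being a definitive model: x_{v,d} stands for vessel_load_start_date, D_{v,g} for vessel_grade_demand_tonnes, I_{d,g} for port_inventory_vars and s_{v,g,d} for vessel_sales_demand_vars, while the arrival day a_v is a hypothetical parameter that does not appear in the posted code.

```latex
\begin{aligned}
& x_{v,d} \in \{0,1\}            && \forall v, d \\
& \sum_{d} x_{v,d} = 1           && \forall v \\
& x_{v,d} = 0                    && \forall v,\ d < a_v \\
& D_{v,g}\, x_{v,d} \le I_{d,g}  && \forall v, g, d \\
& s_{v,g,d} = D_{v,g}\, x_{v,d}  && \forall v, g, d
\end{aligned}
```

The second line makes each vessel load on exactly one day (use "<= 1" if a vessel may remain unloaded within the horizon). If several vessels can load on the same day, the inventory constraint would presumably need the sum over vessels on its left-hand side.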

Related

Define the reward function to minimize costs

I have a problem with my reinforcement learning model. I am trying to simulate an electric battery storage.
The battery charges when electricity prices are low and discharges ONLY to the user at fixed hours during the day, every day.
Therefore, the only cost for the user is charging power * electricity price at that hour.
The reward function is set as the negative of the cumulative sum of the cost.
Is this a correct approach? How should I define it so that the overall cost of the purchased electricity is at its minimum at the end of the year?
The problem I have is that the battery always stays near maximum capacity and never takes advantage of the full range of MWh available.
1. Define a dataframe to store fictitious electricity prices for 365 days
df=pd.DataFrame(np.random.randint(0,500,size=(24, 365)))
2. Define the main parameters
Lookback_window_size=7
Current_day=Lookback_window_size
P_charge=2 #MW
P_discharge=3 #MW
3. Define the class Battery(Env)
class Battery(Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, df):
        #Import the dataframe
        self.df = df
        # The action space is a 1D array of shape (24,). Since we are simulating day-ahead market, the action space returns
        # the overall daily charge / no charge scenario
        # action = 1 means that we charge our battery, action = 0 means that we don't charge
        self.action_space= spaces.MultiBinary(24)
        # The observation space is a 1D array. Given a lookback window size of 1 day, then the first 48 columns represent
        # the electricity prices for the current day + all the days before included in the lookback window size.
        # The last two columns store SOC (state of charge) at the end of the day and overall cost
        # (how much we paid for electricity).
        self.observation_shape=(int((Lookback_window_size+1)*24+2),)
        self.observation_space = spaces.Box(low = 0, high=np.inf, shape=self.observation_shape, dtype=np.float64)

    def _next_observation(self):
        # Add the prices of the last days to the monitor matrix
        prices=[]
        for i in range(self.Current_day - Lookback_window_size,self.Current_day + 1):
            prices=np.concatenate([prices,self.df.iloc[0:,i].values])
        # Add extra values to monitor such as SOC, cost and day of the week (Monday=1,Tuesday=2,etc.)
        extra = [self.SOC, self.Cost]
        obs=np.concatenate([prices,extra])
        return obs

    def _take_action(self, action):
        # Being the action space an array, the for loop will check the action at every hour (action[i]) and update the
        # cost and the state of charge
        self.capacity=200 #MWh
        i=0
        for x in action:
            #When action = 1 then we charge our battery, if action = 0 then we don't charge
            if x == 1:
                # The cost increase based on the price of the electricity at that hour
                self.Cost+=self.df[self.Current_day][i]*P_charge
                # If we charge, then the state of charge (SOC) increases as well
                self.SOC+=P_charge
            # Everyday we discharge the battery always at the same hours
            if (i in range(8,14)):
                self.SOC-=P_discharge
                # if the battery is depleted, then we directly buy electricity from the grid
                if self.SOC<0:
                    self.Cost+=self.df[self.Current_day][i+1]*(-self.SOC)
                    self.SOC=0
            #the battery cannot charge above the capacity threshold.
            if self.capacity is not None:
                if self.SOC > self.capacity:
                    # We subtract the latest cost. Since it could not have happened being the SOC above the maximum.
                    self.Cost-=self.df[self.Current_day][i]*P_charge
                    # The capacity needs to be set to the threshold
                    self.SOC = min(self.SOC, self.capacity)
            i+=1

    def step(self, action):
        # Execute one time step within the environment
        self._take_action(action)
        self.Current_day += 1
        # Maximizing the reward means to minimize the costs
        reward = - self.Cost
        # Stop at the end of the dataframe
        done = self.Current_day >= len(self.df.columns)-1
        obs = self._next_observation()
        return obs, reward, done, {}

    def render(self, mode='human', close=False):
        print(f'Day: {self.Current_day}')
        print(f'SOC: {self.SOC}')
        print(f'Cost: {self.Cost}')
        print(f'Actions: {action}')

    def reset(self):
        self.Current_day = Lookback_window_size
        # Give an initial SOC value
        self.SOC = 50
        # Cost at day 0 is null
        self.Cost = 0
        return self._next_observation()
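One thing worth checking in the reward definition: in step() above, the reward is the negative of self.Cost, which keeps accumulating over the whole episode. An agent maximizes the sum of per-step rewards, so a cumulative reward counts early costs again on every later day. The sketch below compares the two conventions on made-up daily costs; the function names and the numbers are hypothetical, and it is an illustration rather than a drop-in fix for the environment.

```python
def cumulative_rewards(daily_costs):
    """Reward each day = minus the total cost paid so far (as in the posted step())."""
    total = 0.0
    rewards = []
    for cost in daily_costs:
        total += cost
        rewards.append(-total)
    return rewards


def incremental_rewards(daily_costs):
    """Reward each day = minus the cost paid on that day only."""
    return [-float(cost) for cost in daily_costs]


# Hypothetical costs for three days
daily_costs = [10.0, 20.0, 30.0]

episode_return_cumulative = sum(cumulative_rewards(daily_costs))
episode_return_incremental = sum(incremental_rewards(daily_costs))

print(episode_return_cumulative)   # day-1 cost is counted three times
print(episode_return_incremental)  # equals minus the total yearly cost
```

With the incremental convention the episode return is exactly minus the total cost, which is the quantity the question wants to minimize. In the environment this would presumably mean recording self.Cost before _take_action and returning the difference.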

Parse raw text data and extract a particular value in Python

One of the columns in my database stores text in the format shown below. The text is not in a standard format; sometimes there is additional text before the "Insurance Date" field. When I split the text in Python, "Insurance Date" may therefore end up in different columns, so I need to search for the value "Insurance Date" in all columns.
Sample text
"Accumulation Period - period of time insured must incur eligible medical expenses at least equal to the deductible amount in order to establish a benefit period under a major medical expense or comprehensive medical expense policy.\n
Insurance Date 12/17/2018\n
Insurance Number 235845\n
Carrier Name SKGP\n
Coverage $240000"
Expected result
INS_NO Insurance Date Carrier Name
235845 12/17/2018 SKGP
How do we parse raw text like this and extract the value of Insurance Date?
I'm using the logic below to split the text, but I don't know how to extract the date into another column:
df= pd.read_sql(query, conn)
df2=df["NOTES"].str.split("\n", expand=True)
Use regex
If the text follows a pattern (more or less), you could use regex.
See the python documentation for regular expressions operations here.
Example
See and try with the code of two possible solutions here.
Below you can find a simplified example.
import re

text = """
Accumulation Period - period of time insured must incur eligible medical expenses at least equal to the deductible amount in order to establish a benefit period under a major medical expense or comprehensive medical expense policy.
Insurance Date 12/17/2018
Insurance Number 235845
Carrier Name SKGP
Coverage $240000
"""
pattern = re.compile(r"Insurance Date (.*)\nInsurance Number (.*)\nCarrier Name (.*)\n")
match = pattern.search(text)
print("Found:")
if match:
    for g in match.groups():
        print(g)
The output
Found:
12/17/2018
235845
SKGP
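Since the question says extra text may appear before the fields, a variant that searches for each field independently may be more tolerant than one pattern spanning three consecutive lines. The following is a minimal sketch using only the standard library; FIELD_PATTERNS and extract_fields are hypothetical names, and the patterns assume each label is followed by whitespace and then its value.

```python
import re

# One pattern per field, so the order and surrounding text do not matter
FIELD_PATTERNS = {
    "INS_NO": re.compile(r"Insurance Number\s+(\S+)"),
    "Insurance Date": re.compile(r"Insurance Date\s+(\d{1,2}/\d{1,2}/\d{4})"),
    "Carrier Name": re.compile(r"Carrier Name\s+([^\n]+)"),
}


def extract_fields(notes):
    """Return a dict with one entry per field; None when a field is missing."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(notes)
        row[name] = match.group(1).strip() if match else None
    return row


sample = (
    "Some additional text before the fields.\n"
    "Insurance Date 12/17/2018\n"
    "Insurance Number 235845\n"
    "Carrier Name SKGP\n"
    "Coverage $240000"
)
result = extract_fields(sample)
print(result)
```

With pandas, the same function could be applied to the NOTES column, for example pd.DataFrame(df["NOTES"].map(extract_fields).tolist()), to get one column per field.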
If I understand you correctly, this may get you close to what you need:
insurance = """
"Accumulation Period - period of time insured must incur eligible medical expenses at least equal to the deductible amount in order to establish a benefit period under a major medical expense or comprehensive medical expense policy.\n
Insurance Date 12/17/2018\n
Insurance Number 235845\n
Carrier Name SKGP\n
Coverage $240000"
"""
items = insurance.split('\n')
filtered_items = list(filter(lambda x: x != "", items))
del filtered_items[0]
del filtered_items[-1]
row = []
for item in filtered_items:
    row.append(item.split(' ')[-1])
# The values come out in the order they appear in the text:
# Insurance Date, Insurance Number, Carrier Name
columns = ["Insurance Date", "INS_NO", "Carrier Name"]
df = pd.DataFrame([row], columns=columns)
df
Output:
  Insurance Date  INS_NO Carrier Name
0     12/17/2018  235845         SKGP

MIP using PULP not approaching result

I am trying to solve a MIP problem. I want to find the number of exams each tech should perform on each date of a week while minimizing the total number of techs used.
I have the demand, the time taken by each tech, the list of techs etc. in separate dataframes.
Initially, my cost function minimized the total time used to finish the demand, which @kabdulla helped me solve (link here).
Now, with the new cost function, the script gets stuck and doesn't seem to converge, and I am not able to identify the reason.
Below is my code so far:
# Instantiate problem class
model = pulp.LpProblem("Time minimizing problem", pulp.LpMinimize)
capacity = pulp.LpVariable.dicts("capacity",
                                 ((examdate , techname, region) for examdate, techname, region in tech_data_new.index),
                                 lowBound=0,
                                 cat='Integer')
tech_used = pulp.LpVariable.dicts("techs",
                                  ((examdate,techname) for examdate,techname,region in tech_data_new.index.unique()),
                                  cat='Binary')
model += pulp.lpSum(tech_used[examdate, techname] for examdate,techname in date_techname_index.index.unique())
for date in demand_data.index.get_level_values('Exam Date').unique():
    for i in demand_data.loc[date].index.tolist():
        model += pulp.lpSum([capacity[examdate,techname,region] for examdate, techname, region in tech_data_new.index if (date == examdate and i == region)]) == demand_data.loc[(demand_data.index.get_level_values('Exam Date') == date) & (demand_data.index.get_level_values('Body Region') == i), shiftname].item()
for examdate, techname,region in tech_data_new.index:
    model += (capacity[examdate, techname, region]) <= tech_data_new.loc[(examdate,techname,region), 'Max Capacity']*tech_used[examdate, techname]
# Number of techs used in a day should be less than 8
for examdate in tech_data_new.index.get_level_values('Exam Date').unique():
    model += pulp.lpSum(tech_used[examdate, techname] for techname in tech_data_new.index.get_level_values('Technologist Name').unique()) <=8
# Max time each tech should work in a day should be less than 8 hours(28800 secs)
for date in tech_data_new.index.get_level_values('Exam Date').unique():
    for name in tech_data_new.loc[date].index.get_level_values('Technologist Name').unique():
        #print(name)
        model += pulp.lpSum(capacity[examdate,techname,region] * tech_data_new.loc[(examdate,techname,region), 'Time taken'] for examdate, techname, region in tech_data_new.index if (date == examdate and name == techname)) <= 28800
The last constraint seems to be the problem: if I remove it, the model converges. However, I cannot work out why.
Please let me know what I am missing in my understanding. Thanks.

How to price a SimpleCashFlow

I would like to use QuantLib to price a portfolio of liabilities, which are modeled to be deterministic future cash-flows. I am now modelling them as a strip of FixedRateBonds with zero coupons, which seems like a very inelegant solution.
Problem:
Question 1: Is there a way to create an 'Instrument' that is just a 'SimpleCashFlow', 'Redemption' etc. and price it on a discount curve?
Question 2: Is it possible to construct a 'CashFlows' object or Instrument from multiple SimpleCashFlows and price it on a curve?
Many thanks in advance
Code Example:
See code below for an example of what I am trying to do.
from QuantLib import *
# set params
calc_date = Date(30, 3, 2017)
risk_free_rate = 0.01
discount_curve = YieldTermStructureHandle(
    FlatForward(calc_date, risk_free_rate, ActualActual()))
bond_engine = DiscountingBondEngine(discount_curve)
# characteristics of the cash-flow that I am trying to NPV
paymentdate = Date(30, 3, 2018)
paymentamount = 1000
# this works: pricing a fixed rate bond with no coupons
schedule = Schedule(paymentdate-1, paymentdate, Period(Annual), TARGET(),
                    Unadjusted, Unadjusted, DateGeneration.Backward, False)
fixed_rate_bond = FixedRateBond(0, paymentamount, schedule, [0.0],ActualActual())
bond_engine = DiscountingBondEngine(discount_curve)
fixed_rate_bond.setPricingEngine(bond_engine)
print(fixed_rate_bond.NPV())
# create a simple cashflow
simple_cash_flow = SimpleCashFlow(paymentamount, paymentdate)
# Q1: how to create instrument, set pricing engine and price a SimpleCashFlow?
#wrongcode:# simple_cash_flow.setPricingEngine(bond_engine)
#wrongcode:# print(simple_cash_flow.NPV())
# Q2: can I stick multiple cashflows into a single instrument, e.g.:
# how do I construct and price a CashFlows object from multiple 'SimpleCashFlow's?
simple_cash_flow2 = SimpleCashFlow(paymentamount, Date(30, 3, 2019))
#wrongcode:# cashflows_multiple = CashFlows([simple_cash_flow, simple_cash_flow2])
#wrongcode:# cashflows_multiple.setPricingEngine(bond_engine)
#wrongcode:# print(cashflows_multiple.NPV())
There are a couple of possible approaches. If you want to use an instrument, you can use a ZeroCouponBond instead of the fixed-rate one you're currently using:
bond = ZeroCouponBond(0, TARGET(), paymentamount, paymentdate)
bond.setPricingEngine(bond_engine)
print(bond.NPV())
Using an instrument will give you notifications and recalculation if the discount curve were to change, but might be overkill if you want a single pricing. In that case, you might work directly with the cashflows by using the methods of the CashFlows class:
cf = SimpleCashFlow(paymentamount, paymentdate)
print(CashFlows.npv([cf], discount_curve, True))
where the last parameter is True if you want to include any cashflow happening on today's date and False otherwise (note that this will give you a result a bit different from your calculation; that's because the payment date you used is a TARGET holiday, and the FixedRateBond constructor adjusts it to the next business day).
The above also works with several cash flows:
cfs = [SimpleCashFlow(paymentamount, paymentdate),
       SimpleCashFlow(paymentamount*0.5, paymentdate+180),
       SimpleCashFlow(paymentamount*2, paymentdate+360)]
print(CashFlows.npv(cfs, discount_curve, True))
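As a sanity check on what the discounting amounts to, the sum of discounted cash flows can be reproduced by hand. The sketch below uses only the standard library and makes simplifying assumptions that are not QuantLib's exact behavior: a flat, continuously compounded rate and an Actual/365 year fraction (the curve above uses ActualActual, so results can differ slightly). The function name npv_flat_curve is hypothetical.

```python
from datetime import date
from math import exp


def npv_flat_curve(cashflows, valuation_date, rate):
    """Sum of amount * exp(-rate * t) over cash flows on or after valuation_date.

    cashflows is a list of (payment_date, amount) pairs; t is the
    Actual/365 year fraction from valuation_date to payment_date.
    """
    total = 0.0
    for payment_date, amount in cashflows:
        if payment_date < valuation_date:
            continue  # cash flows already paid are ignored
        t = (payment_date - valuation_date).days / 365.0
        total += amount * exp(-rate * t)
    return total


flows = [(date(2018, 3, 30), 1000.0), (date(2019, 3, 30), 1000.0)]
value = npv_flat_curve(flows, date(2017, 3, 30), 0.01)
print(round(value, 4))
```

This is only a cross-check; calendars, day-count conventions and settlement rules are what the library handles for you.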
Finally, if you want to do the same with an instrument, you can use the base Bond class and pass the cashflows directly:
custom_bond = Bond(0, TARGET(), 100.0, Date(), Date(), cfs)
custom_bond.setPricingEngine(bond_engine)
print(custom_bond.NPV())
This works but is kind of a kludge: the bond uses the passed cashflows directly and ignores the passed face amount and maturity date.

Python dataset calculations

I have a data set recording different weeks and the new cases of dengue for that specific week and I am supposed to calculate the infection rate and recovery rate for each week. The infection rate can be calculated by dividing the number of newly infected patients by the susceptible population for that week while the recovery rate can be calculated by dividing the number of newly recovered patients by the infected population for that week. The infection rate is relatively simple but for the recovery rate I have to take into account that infected patients take exactly 2 weeks to recover and I'm stuck. Any help would be appreciated
t_pop = 4*10**6
s_pop = t_pop
i_pop = 0
r_pop = 0
weeks = 0
#Infection Rate
for index, row in data.iterrows():
    new_i = row['New Cases']
    s_pop -= new_i
    weeks += 1
    infection_rate = float(new_i)/float(s_pop)
    print('Week', weeks, ':' ,infection_rate)
*Note: t_pop refers to the total population, which we assume to be 4 million; s_pop refers to the population at risk of contracting dengue; and i_pop refers to the infected population.
You could create a dictionary to store the data for each week, and then use it to refer back to when you need to calculate the recovery rate. For example:
dengue_dict = {}
dengue_dict["Week 1"] = {"Infection Rate": infection_rate, "Recovery Rate": None}
I use None at first, because there's no recovery rate until at least two weeks have gone by. Later, you can either update existing weeks or add new ones right away. Here's an example for week 3:
recovery_rate = dengue_dict["Week 1"]["Infection Rate"]/infection_rate
And then update the entry in the dictionary:
dengue_dict["Week 3"]["Recovery Rate"] = recovery_rate
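An alternative that follows the definitions in the question more literally is to treat the patients who recover in a given week as the new cases from two weeks earlier, and divide by the infected population of that week. The sketch below is one reading of the problem under stated assumptions: populations are taken at the start of each week (the posted loop subtracts new cases before dividing), recovered patients do not return to the susceptible pool, and the function name weekly_rates and the sample numbers are hypothetical.

```python
def weekly_rates(new_cases, total_population, recovery_lag=2):
    """Return a list of (week, infection_rate, recovery_rate) tuples.

    recovery_rate is None while nobody is infected yet.
    """
    susceptible = total_population
    infected = 0
    rates = []
    for week, new_i in enumerate(new_cases, start=1):
        # Patients infected `recovery_lag` weeks ago recover this week
        lag_index = week - 1 - recovery_lag
        new_r = new_cases[lag_index] if lag_index >= 0 else 0

        infection_rate = new_i / susceptible
        recovery_rate = new_r / infected if infected > 0 else None
        rates.append((week, infection_rate, recovery_rate))

        # Update the compartments for the next week
        susceptible -= new_i
        infected += new_i - new_r
    return rates


# Made-up weekly new cases, with the 4 million population from the question
rates = weekly_rates([100, 200, 300, 400], 4 * 10**6)
for week, infection_rate, recovery_rate in rates:
    print('Week', week, ':', infection_rate, recovery_rate)
```

With the dataframe from the question, the input list would be data['New Cases'].tolist().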
