Define the reward function to minimize costs - python

I have a problem with my reinforcement learning model. I am trying to simulate an electric battery storage.
The battery charges when the electricity prices are low and discharge ONLY to the user at fixed hours during the day, every day.
Therefore, the only cost for the user is power of charge * electricity price at the hour.
The reward function is set as the opposite of the cumulative sum of the cost.
Is it a correct approach? How to properly define it so that the overall cost of the purchased electricity is at minimum at the end of the year?
The problem that I have is that the battery will always near the maximum capacity and never fully take advantage of the full range of MWh available.
1. Define a dataframe where to store fictitious electricity prices for 365 days
df=pd.DataFrame(np.random.randint(0,500,size=(24, 365)))
2. Define the main parameters
Lookback_window_size=7
Current_day=Lookback_window_size
P_charge=2 #MW
P_discharge=3 #MW
3. Define the class Battery(Env)
class Battery(Env):
metadata = {'render.modes': ['human']}
def __init__(self, df):
#Import the dataframe
self.df = df
# The action space is a 1D array of shape (24,). Since we are simulating day-ahead market, the action space returns
# the overall daily charge / no charge scenario
# action = 1 means that we charge our battery, action = 0 means that we don't charge
self.action_space= spaces.MultiBinary(24)
# The observation space is a 1D array. Given a lookback window size of 1 day, then The first 48 columns represent
# the electricity prices for the current day + all the days before included in the lookback window size.
# The last two columns store SOC (state of charge) at the end of the day and overall cost
# (how much we paid for electricity).
self.observation_shape=(int((Lookback_window_size+1)*24+2),)
self.observation_space = spaces.Box(low = 0, high=np.inf, shape=self.observation_shape, dtype=np.float64)
def _next_observation(self):
# Add the prices of the last days to the monitor matrix
prices=[]
for i in range(self.Current_day - Lookback_window_size,self.Current_day + 1):
prices=np.concatenate([prices,self.df.iloc[0:,i].values])
# Add extra values to monitor such as SOC, cost and day of the week (Monday=1,Tuesday=2,etc.)
extra = [self.SOC, self.Cost]
obs=np.concatenate([prices,extra])
return obs
def _take_action(self, action):
# Being the action space an array, the for loop will check the action at every hour (action[i]) and update the
# cost and the state of charge
self.capacity=200 #MWh
i=0
for x in action:
#When action = 1 then we charge our battery, if action = 0 then we don't charge
if x == 1:
# The cost increase based on the price of the electricity at that hour
self.Cost+=self.df[self.Current_day][i]*P_charge
# If we charge, then the state of charge (SOC) increases as well
self.SOC+=P_charge
# Everyday we discharge the battery always at the same hours
if (i in range(8,14)):
self.SOC-=P_discharge
# if the battery is depleted, then we directly buy electricity from the grid
if self.SOC<0:
self.Cost+=self.df[self.Current_day][i+1]*(-self.SOC)
self.SOC=0
#the battery cannot charge above the capacity threshold.
if self.capacity is not None:
if self.SOC > self.capacity:
# We subtract the latest cost. Since it could not have happened being the SOC above the maximum.
self.Cost-=self.df[self.Current_day][i]*P_charge
# The capacity needs to be set to the threshold
self.SOC = min(self.SOC, self.capacity)
i+=1
def step(self, action):
# Execute one time step within the environment
self._take_action(action)
self.Current_day += 1
# Maximizing the reward means to minimize the costs
reward = - self.Cost
# Stop at the end of the dataframe
done = self.Current_day >= len(self.df.columns)-1
obs = self._next_observation()
return obs, reward, done, {}
def render(self, mode='human', close=False):
print(f'Day: {self.Current_day}')
print(f'SOC: {self.SOC}')
print(f'Cost: {self.Cost}')
print(f'Actions: {action}')
def reset(self):
self.Current_day = Lookback_window_size
# Give an initial SOC value
self.SOC = 50
# Cost at day 0 is null
self.Cost = 0
return self._next_observation()

Related

Calculating EMA from SMA in python

I've defined a function that calculates SMA(Simple Moving average), compares the value to the closing price 'p' and gives a signal as to whether we should buy or sell, represented by a 1 or -1 respectively.
I now want to create a function for EMA(Exponential Moving Average). The formula for EMA is EMA = Closing price x multiplier + EMA (previous day) x (1-multiplier) where the multiplier is [2 รท (number of observations + 1)].
However, I'm not sure how to do this in Spyder/Python.
My code for SMA is:
def SMA(p,window=10,signal_type='buy only'):
#input price "p", look-back window "window",
#signal type = buy only (default) --gives long signals, sell only --gives sell signals, both --gives both long and short signals
#return a list of signals = 1 for long position and -1 for short position
signals = np.zeros(len(p))
if len(p)<window:
#no signal if no sufficient data
return signals
sma = list(np.zeros(window)+np.nan) #the first few prices does not give technical indicator values
sma += [np.average(p[k:k+window]) for k in np.arange(len(p)-window)]
for i in np.arange(len(p)-1):
if np.isnan(sma[i]):
continue #skip the open market time window
if sma[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
signals[i] = 1
elif sma[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
signals[i] = -1
return signals
This gets called in another file like so, I want to be able to just change 'backtest.SMA' to 'backtest.EMA' so I can see the signals that EMA produces.
signals = backtest.SMA(this_dat['Close'].values, window=10)
('this_dat['Close'].values' is a list of closing price values taken from a table)
I defined EMA in the same way I defined SMA but I'm not sure how to obtain EMA
def EMA(p,window=10,signal_type='buy only'):
signals = np.zeros(len(p))
if len(p)<window:
#no signal if no sufficient data
return signals
multiplier=(2/(len(p)+1))
ema = list(np.zeros(window)+np.nan)
#ema = Closing price x multiplier + EMA (previous day) x (1-multiplier)#
ema +=
for i in np.arange(len(p)-1):
if np.isnan(ema[i]):
continue #skip the open market time window
if ema[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
signals[i] = 1
elif ema[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
signals[i] = -1
return signals
I need help finding a way to write that forumla for EMA in python, I can't figure it out. If anyone knows a package I can use that might work then I'm also open to trying it but I would prefer if I could find EMA using my SMA function as a template.
Any help would be greatly appreciated, thanks

MIP using PULP not approaching result

I am trying to solve a MIP problem. I am trying to find the number of exams to be done by each tech on a date for a week by minimizing the total number of techs used.
I have demand, time taken by each tech, list of techs etc. in separate dataframes.
Initially, I was using the cost function as minimizing the total time used to finish demand which #kabdulla helped me solve, linkhere!
Now, with the new cost function, the script gets stuck and doesn't seem to converge and I am not able to identify the reason.
Below is my code so far:
# Instantiate problem class
model = pulp.LpProblem("Time minimizing problem", pulp.LpMinimize)
capacity = pulp.LpVariable.dicts("capacity",
((examdate , techname, region) for examdate, techname, region in tech_data_new.index),
lowBound=0,
cat='Integer')
tech_used = pulp.LpVariable.dicts("techs",
((examdate,techname) for examdate,techname,region in tech_data_new.index.unique()),
cat='Binary')
model += pulp.lpSum(tech_used[examdate, techname] for examdate,techname in date_techname_index.index.unique())
for date in demand_data.index.get_level_values('Exam Date').unique():
for i in demand_data.loc[date].index.tolist():
model += pulp.lpSum([capacity[examdate,techname,region] for examdate, techname, region in tech_data_new.index if (date == examdate and i == region)]) == demand_data.loc[(demand_data.index.get_level_values('Exam Date') == date) & (demand_data.index.get_level_values('Body Region') == i), shiftname].item()
for examdate, techname,region in tech_data_new.index:
model += (capacity[examdate, techname, region]) <= tech_data_new.loc[(examdate,techname,region), 'Max Capacity']*tech_used[examdate, techname]
# Number of techs used in a day should be less than 8
for examdate in tech_data_new.index.get_level_values('Exam Date').unique():
model += pulp.lpSum(tech_used[examdate, techname] for techname in tech_data_new.index.get_level_values('Technologist Name').unique()) <=8
# Max time each tech should work in a day should be less than 8 hours(28800 secs)
for date in tech_data_new.index.get_level_values('Exam Date').unique():
for name in tech_data_new.loc[date].index.get_level_values('Technologist Name').unique():
#print(name)
model += pulp.lpSum(capacity[examdate,techname,region] * tech_data_new.loc[(examdate,techname,region), 'Time taken'] for examdate, techname, region in tech_data_new.index if (date == examdate and name == techname)) <= 28800
The last condition seems to be the problem, if I remove it, the problem converges. However, I am not able to understand the problem.
Please let me know, what I am missing in my understanding. Thanks.

creating signals based on current and prior time periods

I'm trying to write a trading algo and I am very new to python.
Lots of things are easy to understand but I get lost easily. I have a strategy I want to use, but the coding is getting in the way.
I want to create two moving averages and when they cross I want that to be a signal.
The part im I am currently struggling with is also including information about the prior period.
When
MovingAverage1( last 10 candles ) == MovingAverage2( Last 20 candles ),
that's a signal,
but is it a buy or sell?
When
MovingAVerage1( last 10 candles after skipping most recent ) > MovingAverage2( last 10 candles after skipping most recent )
then sell.
Here is what I've got so far, where the MA-s I am using are being simplified for this question:
class MyMACrossStrategy (Strategy):
"""
Requires:
symbol - A stock symbol on which to form a strategy on.
bars - A DataFrame of bars for the above symbol.
short_window - Lookback period for short moving average.
long_window - Lookback period for long moving average."""
def __init__(self, symbol, bars, short_window=4, long_window=9):
self.symbol = symbol
self.bars = bars
self.short_window = short_window
self.long_window = long_window
# Function Helper for indicators
def fill_for_noncomputable_vals(input_data, result_data):
non_computable_values = np.repeat(
np.nan, len(input_data) - len(result_data)
)
filled_result_data = np.append(non_computable_values, result_data)
return filled_result_data
def simple_moving_average(data, period):
"""
Simple Moving Average.
Formula:
SUM(data / N)
"""
catch_errors.check_for_period_error(data, period)
# Mean of Empty Slice RuntimeWarning doesn't affect output so it is
# supressed
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=RuntimeWarning)
sma = [np.mean(data[idx-(period-1):idx+1]) for idx in range(0, len(data))]
sma = fill_for_noncomputable_vals(data, sma)
return sma
def hull_moving_average(data, period):
"""
Hull Moving Average.
Formula:
HMA = WMA(2*WMA(n/2) - WMA(n)), sqrt(n)
"""
catch_errors.check_for_period_error(data, period)
hma = wma(
2 * wma(data, int(period/2)) - wma(data, period), int(np.sqrt(period))
)
return hma
def generate_signals(self):
"""Returns the DataFrame of symbols containing the signals
to go long, short or hold (1, -1 or 0)."""
signals = pd.DataFrame(index=self.bars.index)
signals['signal'] = 0.0
# Create the set of moving averages over the
# respective periods
signals['Fast_Line'] = sma(bars['Close'], self.short_window)
signals['Slow_line'] = hma(bars['Close'], self.long_window)
signals1['Fast_Line'] = sma(bars['Close'], self.short_window[-1])
signals1['Slow_line'] = hma(bars['Close'], self.long_window[-1])
# Create a 'signal' (invested or not invested) when the short moving average crosses the long
# moving average, but only for the period greater than the shortest moving average window
signals['signal'][self.short_window:] = np.where(signals['Fast_Line'][self.short_window:]
> signals['Slow_line'][self.short_window:], 1.0, 0.0)
# Take the difference of the signals in order to generate actual trading orders
signals['positions'] = signals['signal'].diff()
if signals['Fast_Line'] = signals['Slow_Line'] and ...
return signals
Hopefully my question makes sense.
I am assuming that you want to test your strategy first before using it in live market. You can download the stock data from yahoo finance in csv format. And you can upload with below code:
import pandas as pd
import numpy as np
data = pd.read_csv('MSFT.csv')
once the data is stored in the pandas dataframe data, you can moving average of the Closing price with following code:
if you are planning the crossover strategy
sma_days=20
lma_days=50
data['SMA_20']=data['Close'].rolling(window=sma_days,center=False).mean()
data['SMA_50']=data['Close'].rolling(window=lma_days,center=False).mean()
data['SIGNAL']=np.where(data['SMA_20']>data['SMA_50'],'BUY','SELL')

Python - path dependent simulation

I've setup a simulation example below.
Setup:
I have weekly data, say 6 years of data each week of around 1000 stocks some weeks more other weeks less than 1000. I randomly chose 75 stocks at time t0. At t1 some stocks dies (probability p, goes out of fashion) or leave the index (structural such as merging). I need to simulate stocks so that every week I've exactly 75 stocks. Every week some stocks dies (between 0 and 75) and I pick new ones not from the existing 75. I also check if the stock leaves do to structural reasons. Every week I calculate the returns of the 75 stocks.
Questions: Is there an obvious why to improve the speed. I started with Pandas objects (group sort) which was to slow. I haven't tried to parallel the loop. I'm more interesting to hear if I should use numba (but it doesn't have the np.in1d function) or if there is a faster way to shuffle (I actually only need to shuffle the ones). I've also think about creating a fixed array with all stocks id using NaN, the problem here is that I need 75 names so I still need to filter out these NaN every week.
Maybe this is to detailed problem for this forum, I apologize if that's the case
Code:
from timeit import default_timer
import numpy as np
# Create dataset
n_weeks = 312 # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16) # Around 1000 stocks every week but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))
# We give each stock a stock idea
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs], dtype=np.uint16)
for j in range(1, n_weeks+1):
stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.cumsum(np.ones(n_stocks[j-1]))
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs) # Simulated forward (one week ahead) return for each stock
# Simulation part
# Week 0 pick randomly 75 stocks
# Week n >=1 a stock dies for two reasons
# 1) randomness (probability 'p')
# 2) structural event (could be merger, fall out of index).
# We cannot assume that it is always the high stockid which dies for structural reasons (as it looks like here)
# If a stock dies we randomely pick a stock from the "deak" stock dataset (not included the ones which dies this week)
n_sim = 100 # I want this to be 1 mill
n_stock_cand = 75 # For this example we pick 75 stocks
p_survial = 0.90
# The weekly periodcal returns
pf_rtn = np.zeros([n_weeks, n_sim])
start = default_timer()
for k in range(0, n_sim):
# Randomely choice n_stock_cand at time zero
boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
np.random.shuffle(boolean_list) # Shuffle the list
stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list]
stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]
# This part only simulate the Buzz portfolio names - later we simulate returns and from specific holdings of the 75 names
for j in range(1, n_weeks):
pf_rtn[j-1, k] = stock_rtn_this_week.mean()
# Find the number of stocks to keep
boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survial
# Next we need to check if a stock is still part of the universe next period
stock_cand_temp = stock_id[idx_new_week[j-1]:idx_new_week[j]]
stock_rtn_temp = stock_rtn[idx_new_week[j-1]:idx_new_week[j]]
boolean_keep_stocks = (boolean_keep_stocks) & (np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True))
n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum() # Number of new stocks to pick this week
if n_stocks_to_replace > 0:
# We have to pick from stocks which is not part of the portfolio already
boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
n_stocks_to_pick_from = boolean_cand.sum()
boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
np.random.shuffle(boolean_list) # Shuffle the list
# First avoid picking the same stock twich, next pick from the unique candidate list
stock_id_new = stock_cand_temp[boolean_cand][boolean_list] # The new stocks
stock_rtn_new = stock_rtn_temp[boolean_cand][boolean_list] # and their returns
stock_id_this_week = np.hstack((stock_id_this_week[boolean_keep_stocks], stock_id_new))
stock_rtn_this_week = np.hstack((stock_rtn_this_week[boolean_keep_stocks], stock_rtn_new))
else:
# No replacement of stocks / all surview but order might differ
boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=False)
stock_id_this_week = stock_cand_temp[boolean_cand]
stock_rtn_this_week = stock_rtn_temp[boolean_cand]
# PnL last period
pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean()
print(default_timer() - start)

Python dataset calculations

I have a data set recording different weeks and the new cases of dengue for that specific week and I am supposed to calculate the infection rate and recovery rate for each week. The infection rate can be calculated by dividing the number of newly infected patients by the susceptible population for that week while the recovery rate can be calculated by dividing the number of newly recovered patients by the infected population for that week. The infection rate is relatively simple but for the recovery rate I have to take into account that infected patients take exactly 2 weeks to recover and I'm stuck. Any help would be appreciated
t_pop = 4*10**6
s_pop = t_pop
i_pop = 0
r_pop = 0
weeks = 0
#Infection Rate
for index, row in data.iterrows():
new_i = row['New Cases']
s_pop -= new_i
weeks += 1
infection_rate = float(new_i)/float(s_pop)
print('Week', weeks, ':' ,infection_rate)
*Note: t_pop refers to total population which we assume to be 4million, s_pop refers to the population at risk of contracting dengue and i_pop refers to infected population
You could create a dictionary to store the data for each week, and then use it to refer back to when you need to calculate the recovery rate. For example:
dengue_dict = {}
dengue_dict["Week 1"] = {"Infection Rate": infection_rate, "Recovery Rate": None}
I use none at first, because there's no recovery rate until at least two weeks have gone by. Later, you can either update weeks or just add them right away. Here's an example for week 3:
recovery_rate = dengue_dict["Week 1"]["Infection Rate"]/infection_rate
And then update the entry in the dictionary:
dengue_dict["Week 3"]["Recovery Rate"] = recovery_rate

Categories